Audio Caption Generation From Image Using Deep Learning

Abdul Hannan Sunsara

Mobile Engineer

Software Engineer

Flask

Flutter

Python

• The Android application captures an image and passes it to a machine learning model, which generates a caption. The caption is then converted to audio that describes the whole image.

• The application is designed around the needs of blind users, giving them a better understanding of their surroundings. Built-in shortcuts let users navigate the app’s features and functionality with ease, so they can engage more readily with the world around them.

Technology Used: Python, Flutter, Flask, CNN, RNN.
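
The caption step described above can be illustrated with a minimal sketch. It assumes a common encoder-decoder arrangement: a CNN turns the image into a feature vector and an RNN emits one word at a time until it produces an end token. The project's actual model, vocabulary, and decoding strategy are not described here, so `toy_encoder`, `toy_decoder`, and the `<start>`/`<end>` tokens are hypothetical stand-ins, and the text-to-speech step is left out.

```python
from typing import Callable, List

# Hypothetical special tokens; the real vocabulary is not given in the source.
START, END = "<start>", "<end>"


def greedy_caption(
    features: List[float],
    next_word: Callable[[List[float], List[str]], str],
    max_len: int = 20,
) -> str:
    """Build a caption by repeatedly asking the decoder for the next word."""
    words = [START]
    for _ in range(max_len):
        word = next_word(features, words)
        if word == END:
            break
        words.append(word)
    # Drop the start token before returning the caption text.
    return " ".join(words[1:])


def toy_encoder(image_bytes: bytes) -> List[float]:
    """Stand-in for a CNN: maps image bytes to a (fake) feature vector."""
    return [len(image_bytes) / 1000.0]


def toy_decoder(features: List[float], words_so_far: List[str]) -> str:
    """Stand-in for an RNN step: replays a fixed caption, ignoring features."""
    script = ["a", "dog", "on", "grass", END]
    return script[len(words_so_far) - 1]


def describe(image_bytes: bytes) -> str:
    """Image bytes in, caption text out; audio conversion would follow."""
    return greedy_caption(toy_encoder(image_bytes), toy_decoder)
```

In a setup like this, the Flask backend would plausibly run `describe` on an uploaded image and return the caption for the Flutter app to speak aloud, though where the audio conversion actually happens in this project is not stated.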
