Flask-based AI app that summarizes surveillance videos using Whisper (audio), ViT-GPT2 (frame captions), and Groq LLM (narratives). Produces both general and law enforcement-style summaries.
Updated Jul 14, 2025 - Python
AI-powered image captioning using InceptionV3+LSTM and ViT-GPT2 models. Trained on Flickr8k dataset with interactive Streamlit interface.
Developed an image captioning system using the BLIP model to generate detailed, context-aware captions. Achieved an average BLEU score of 0.72, providing rich descriptions that enhance accessibility and inclusivity.
A Chrome extension that takes input images and generates captions for them.
A powerful Streamlit application that analyzes images using multiple vision models and responds to queries about visual content through conversational AI.
An AI-powered image captioning app built with Streamlit, using ViT-GPT2 for caption generation and YOLOv8 for object detection. The app provides enhanced captions by integrating detected objects into the generated text.