Make movies more accessible for blind and low-vision audiences by auto-generating concise, scene-aware audio descriptions.
This MVP fuses speech (Whisper), environmental audio (PANNs), face & pose/emotion (FaceNet / MediaPipe / DeepFace), and vision (YOLO + BLIP) with BigQuery AI to build a unified scene timeline and generate narration “stingers” that play naturally in the pauses between dialogue.
- `vision_keyframe_blip_yolo.py` – extracts keyframes with BLIP captions + YOLO objects
- `panns_tag_other.py` – sound events on the `other.wav` stem (Demucs output)
- `voice_cluster_speakers.py` – speaker embeddings + clustering (SpeechBrain + HDBSCAN)
- `face_cluster_characters.py` – character discovery from faces (FaceNet-PyTorch)
- `pose_emotion_on_frames.py` – skeleton/pose and basic emotions (MediaPipe + DeepFace)
- `merge_whisper_panns.py` – aligns the Whisper transcript with SFX events into one timeline (see the sketch below)
- `build_accessible_narration_en.py` – builds the final English narration file
- `bigquery/queries.sql` – sample BigQuery AI SQL showing generative + vector search
- `requirements.txt` – baseline dependencies to reproduce the pipeline
- `lady_clip.mp4` – 40MB demo video provided so judges/users can run the pipeline end-to-end without extra setup
- `run_demo_local.py` – one-click runner to process the demo clip and save narration outputs
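To give a feel for the merge step, here is a minimal sketch of the idea behind `merge_whisper_panns.py`: interleave Whisper speech segments with PANNs sound events on one timeline, then flag dialogue gaps wide enough to hold a narration stinger. The segment fields, the 1.5-second gap threshold, and the helper names are illustrative assumptions, not the script's actual interface.

```python
# Sketch: align speech segments and SFX events into one timeline and
# find dialogue gaps wide enough for a narration "stinger".
# (Illustrative only -- field names and thresholds are assumptions.)

MIN_STINGER_GAP = 1.5  # seconds of silence assumed to fit a short narration

def build_timeline(speech_segments, sfx_events):
    """Merge Whisper segments and PANNs events, sorted by start time.

    speech_segments: [{"start": float, "end": float, "text": str}, ...]
    sfx_events:      [{"start": float, "end": float, "label": str}, ...]
    """
    timeline = (
        [{"kind": "speech", **s} for s in speech_segments]
        + [{"kind": "sfx", **e} for e in sfx_events]
    )
    return sorted(timeline, key=lambda item: item["start"])

def find_stinger_slots(speech_segments):
    """Return (start, end) gaps between consecutive dialogue segments."""
    slots = []
    ordered = sorted(speech_segments, key=lambda s: s["start"])
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt["start"] - prev["end"] >= MIN_STINGER_GAP:
            slots.append((prev["end"], nxt["start"]))
    return slots

if __name__ == "__main__":
    speech = [
        {"start": 0.0, "end": 2.1, "text": "Where were you last night?"},
        {"start": 6.0, "end": 7.4, "text": "I told you already."},
    ]
    sfx = [{"start": 2.5, "end": 3.0, "label": "door slam"}]
    print(build_timeline(speech, sfx))
    print(find_stinger_slots(speech))  # -> [(2.1, 6.0)]
```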
Large artifacts (long videos, audio stems, frames, outputs) are intentionally ignored by Git via `.gitignore`.
For extended testing, you can replace `lady_clip.mp4` with your own short clip (under 50MB) or host larger media in Cloud Storage.
- Clone the repo and enter the project folder:

  ```bash
  git clone https://github.com/HosseinDataScience/accessible-narration-mvp.git
  cd accessible-narration-mvp
  ```

- Create a Python 3.11 virtual environment and install dependencies:

  ```bash
  python3.11 -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```

- Run the full demo pipeline on the included 40MB sample video:

  ```bash
  python run_demo_local.py
  ```

- Outputs will be saved in the `mvp_outputs/` folder (transcripts, vision CSV/JSON, and the final narration text).
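The BigQuery AI piece lives in `bigquery/queries.sql`. As a rough sketch of how those sample queries might be run from Python rather than the console, assuming a Google Cloud project with the BigQuery API enabled and `google-cloud-bigquery` installed (the project ID placeholder and the naive `;` split are assumptions, not part of the pipeline):

```python
# Sketch: run the sample BigQuery AI queries from Python.
# Assumes `pip install google-cloud-bigquery` and gcloud auth is set up;
# the project ID below is a placeholder.
from pathlib import Path

from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project")  # placeholder project ID

sql_text = Path("bigquery/queries.sql").read_text()

# Naive split on ";" -- assumes the sample file holds simple standalone
# statements with no semicolons inside string literals.
for statement in filter(None, (s.strip() for s in sql_text.split(";"))):
    for row in client.query(statement).result():
        print(dict(row.items()))
```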
This MVP currently focuses on generating audio narration for blind and low-vision users. In future updates, we plan to expand accessibility features, including:
- Scene context from costume & set design: using clothing and environment recognition to create richer mental images.
- Braille output option: instead of narration audio, output text could be delivered to refreshable Braille devices.
- Sign language support for deaf audiences: a multimodal extension where a face icon with lip and hand gestures conveys the story in sign language.
These directions aim to make cinema more inclusive for people with different accessibility needs.
---