A voice-activated object detection application designed to help visually impaired users understand their surroundings using artificial intelligence and speech feedback.
- Voice Activation: Say "start", "detect", "see", "look", or "scan" to trigger object detection
- Offline Operation: Uses Vosk for offline speech recognition and pyttsx3 for text-to-speech
- Real-time Detection: YOLOv8 for fast, accurate object detection
- Natural Descriptions: Converts detections into natural language descriptions
- Accessible UI: Clean KivyMD interface designed for accessibility
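The natural-language step can be sketched as a small helper that counts detected class labels and joins them into a sentence. This is an illustrative sketch, not the project's actual code; the function name and the list-of-labels input format are assumptions:

```python
from collections import Counter

def describe(labels):
    """Turn detected class labels into a spoken sentence.

    `labels` is assumed to be the YOLO class names for one frame,
    e.g. ["person", "laptop", "cup", "cup"].
    """
    if not labels:
        return "I don't see anything recognizable."
    counts = Counter(labels)  # preserves first-seen order (Python 3.7+)
    parts = []
    for name, n in counts.items():
        # Naive pluralization: "a person" for singletons, "2 cups" for repeats
        parts.append(f"a {name}" if n == 1 else f"{n} {name}s")
    if len(parts) == 1:
        listing = parts[0]
    elif len(parts) == 2:
        listing = " and ".join(parts)
    else:
        listing = ", ".join(parts[:-1]) + ", and " + parts[-1]
    return f"I see {listing}"
```

With the frame from the example interaction below, this produces "I see a person, a laptop, and 2 cups".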
```
abserny/
├─ app/                      # UI entry point
│  └─ main.py                # KivyMD application
├─ orchestrator/             # Main coordination logic
│  └─ supervisor.py          # Orchestrates all services
├─ domain/                   # Business logic
│  ├─ entities.py            # Data models
│  └─ usecases.py            # Use cases
├─ services/                 # External service wrappers
│  ├─ camera.py              # Camera interface
│  ├─ vision.py              # YOLO wrapper
│  ├─ audio.py               # Text-to-speech
│  └─ speech_recognition.py  # Vosk wrapper
├─ infra/                    # Infrastructure
│  └─ config.py              # Configuration
└─ requirements.txt
```
```bash
pip install -r requirements.txt
```

Download a Vosk model for your language:
- English: vosk-model-small-en-us-0.15 (official model list at https://alphacephei.com/vosk/models)
- Extract it to a `model` folder in the project root
```bash
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 model
```

The YOLOv8 nano model is downloaded automatically on first run.
```bash
python app/main.py
```

- The app starts listening for voice commands
- Say "start" or another trigger word
- The camera captures a frame
- YOLO detects objects in the frame
- The app speaks a natural description of what it sees
- Returns to listening mode
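The cycle above can be sketched as a supervisor function that wires the services together. The callable names (`listen`, `capture`, `detect`, `speak`) are illustrative stand-ins for the real service wrappers in `services/`, not the project's actual interfaces:

```python
def run_once(listen, capture, detect, speak,
             triggers=("start", "detect", "see", "look", "scan")):
    """One voice-triggered detection cycle.

    listen()      -> recognized text (e.g. from the Vosk wrapper)
    capture()     -> a single camera frame
    detect(frame) -> list of detected class labels (e.g. from YOLO)
    speak(text)   -> text-to-speech output (e.g. via pyttsx3)
    Returns True if a detection was performed, False if no trigger word
    was heard (the caller then simply loops back to listening).
    """
    text = listen()
    # Ignore anything that is not a trigger word
    if not any(word in text.lower().split() for word in triggers):
        return False
    frame = capture()
    labels = detect(frame)
    if labels:
        speak("Detecting... I see " + ", ".join(labels))
    else:
        speak("Detecting... I don't see anything recognizable")
    return True
```

Passing the services in as callables keeps the orchestration logic testable with stubs, without a real camera or microphone.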
- "start"
- "detect"
- "see"
- "look"
- "scan"
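Matching a recognized phrase against these trigger words can be as simple as a token-level set check (a sketch; the app's actual matching logic may differ):

```python
TRIGGERS = {"start", "detect", "see", "look", "scan"}

def is_trigger(phrase):
    """True if any trigger word appears as a whole word in the phrase.

    Splitting on whitespace avoids false positives such as
    "season" matching "see".
    """
    return not TRIGGERS.isdisjoint(phrase.lower().split())
```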
Edit infra/config.py to customize:
- Camera settings
- YOLO model and confidence threshold
- Trigger words
- Detection cooldown period
- TTS settings
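A minimal `infra/config.py` along these lines might look like the following; the constant names and default values are assumptions based on the settings listed above, not the project's actual identifiers:

```python
# infra/config.py (illustrative sketch; names and values are assumed)

# Camera settings
CAMERA_ID = 0                 # which webcam to open (try 1, 2, ... if 0 fails)
FRAME_WIDTH = 640
FRAME_HEIGHT = 480

# YOLO model and confidence threshold
YOLO_MODEL = "yolov8n.pt"     # nano model, auto-downloaded on first run
CONFIDENCE_THRESHOLD = 0.5    # drop detections below this score

# Voice triggers and pacing
TRIGGER_WORDS = ["start", "detect", "see", "look", "scan"]
DETECTION_COOLDOWN = 3.0      # seconds between consecutive detections

# Text-to-speech settings (pyttsx3)
TTS_RATE = 150                # words per minute
TTS_VOLUME = 1.0
```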
```
User says: "start"
App responds: "Detecting... I see a person, a laptop, and 2 cups"
```
- Python 3.8+
- Webcam
- Microphone
- Speakers/headphones
- Check that pyttsx3 is properly installed
- Try: `python -c "import pyttsx3; pyttsx3.init()"`
- Verify that the Vosk model is in the `model` folder
- Check microphone permissions
- Test the microphone: `python -m sounddevice`
- Check camera is connected and not in use
- Try changing `CAMERA_ID` in the config (0, 1, 2, etc.)
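Finding a working camera index can also be automated. This sketch takes a `probe` callable (e.g. a thin wrapper around OpenCV's `cv2.VideoCapture(i).isOpened()`) so the search logic stays testable; the helper name is an assumption, not part of the project:

```python
def first_working_camera(probe, max_ids=4):
    """Return the first camera index for which probe(i) succeeds, else None.

    `probe` would typically wrap OpenCV, e.g.:
        def probe(i):
            cap = cv2.VideoCapture(i)
            ok = cap.isOpened()
            cap.release()
            return ok
    """
    for i in range(max_ids):
        if probe(i):
            return i
    return None
```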
- Ensure an internet connection for the first download
- Alternatively, download the YOLOv8 weights manually from the Ultralytics GitHub releases
- Distance estimation
- Directional information (left/right/center)
- Hazard detection (stairs, obstacles)
- Object tracking across frames
- Custom object classes for indoor/outdoor environments
- Multi-language support
- Gesture control
This project is intended for educational purposes as a graduation project.
- YOLOv8: Ultralytics
- Vosk: Alpha Cephei
- KivyMD: KivyMD Team