Winner of "Best Use of Google Cloud" at HackHarvard 2022 🏆 https://devpost.com/software/realive
ReAlive is an innovative AI-powered web application that brings your old photos to life by adding realistic, contextually-aware audio. Using advanced computer vision and machine learning techniques, ReAlive analyzes images to generate depth maps, extract semantic information, and synthesize immersive audio experiences that recreate what the scene might have sounded like.
- Image Analysis: Advanced computer vision to extract visual elements and depth information
- Smart Audio Synthesis: AI-powered audio generation based on image content
- Depth Mapping: Monocular depth estimation using a CNN and OpenCV
- Web Interface: Clean, responsive web UI built with FastAPI and Bootstrap
- Cloud-Native: Fully deployed on Google Cloud Platform with containerized microservices
- Audio Mixing: Intelligent audio layering and intensity mapping
- Real-time Processing: Fast image-to-audio conversion pipeline
- Python 3.8 or higher
- Google Cloud Platform account
- Docker (for containerized deployment)
- Clone the repository

  ```bash
  git clone https://github.com/yourusername/realive.git
  cd realive
  ```

- Install dependencies

  ```bash
  # Install main requirements
  pip install -r requirements.txt

  # Install PyTorch-specific requirements
  pip install -r pytorch_requirements.txt

  # Install spaCy requirements
  pip install -r spacy_requirements.txt
  ```

- Set up Google Cloud credentials

  ```bash
  # Download your GCP service account key
  # Place it in the project root as 'plasma-myth-365608-02f3f88329f7.json'
  export GOOGLE_APPLICATION_CREDENTIALS="plasma-myth-365608-02f3f88329f7.json"
  ```

- Download required models: download the depth estimation model and place `depthmodel.h5` in the project root

- Run the application

  ```bash
  python main.py
  ```

- Access the web interface: open your browser and navigate to http://localhost:8000
ReAlive follows a microservices architecture with the following components:
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Web Frontend   │     │   Image2Text    │     │   Depth Map     │
│   (FastAPI)     │────►│    Service      │     │   Generator     │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Sound Mapper   │     │  Google Cloud   │     │  Audio Mixer    │
│  & Synthesizer  │     │    Storage      │     │   (Pydub)       │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
Web Frontend
- FastAPI-based web server
- Handles image uploads and processing requests
- Orchestrates the entire pipeline
- Returns processed videos and metadata
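The frontend's orchestration role can be sketched as a simple sequential pipeline. The stage functions below (`caption_image`, `estimate_depth`, `synthesize_audio`) are hypothetical stand-ins for the real microservice calls, not the project's actual API:

```python
# Minimal sketch of the pipeline orchestration; each stage function is a
# hypothetical stand-in for the corresponding microservice call.

def caption_image(image_bytes: bytes) -> str:
    # Real version POSTs the image to the Image2Text service.
    return "a forest with birds and a stream"

def estimate_depth(image_bytes: bytes) -> list:
    # Real version runs the Keras depth model; here, a dummy 2x2 map.
    return [[0.2, 0.4], [0.6, 0.8]]

def synthesize_audio(caption: str, depth_map: list) -> dict:
    # Real version selects and mixes ESC-50 samples with Pydub.
    return {"caption": caption, "duration_s": 4}

def run_pipeline(image_bytes: bytes) -> dict:
    """Run captioning, depth estimation, and audio synthesis in order."""
    caption = caption_image(image_bytes)
    depth_map = estimate_depth(image_bytes)
    audio_meta = synthesize_audio(caption, depth_map)
    return {"img2text": caption, **audio_meta}

result = run_pipeline(b"fake-image-bytes")
print(result["img2text"])
```

In the real service these stages are HTTP calls to the containerized microservices rather than local functions.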
Image2Text Service
- Flask-based microservice for image captioning
- Uses pre-trained models for scene description
- Deployed as a containerized service
Depth Map Generator
- Monocular depth estimation using a CNN
- Generates depth maps for audio intensity mapping
- Uses a custom Keras model with BilinearUpSampling2D
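As a rough illustration of how a depth map can drive audio intensity (an assumed mapping, not the project's exact formula): nearer scenes (smaller mean depth) play at full volume, farther scenes are attenuated.

```python
def depth_to_gain_db(depth_map, max_attenuation_db=12.0):
    """Map mean scene depth (normalized to [0, 1]) to a dB gain.

    Near scenes (mean depth ~0) play at 0 dB; far scenes (mean depth ~1)
    are attenuated by up to max_attenuation_db. Illustrative only.
    """
    values = [v for row in depth_map for v in row]
    mean_depth = sum(values) / len(values)
    return -max_attenuation_db * mean_depth

# A scene whose mean depth is 0.5 would be attenuated by 6 dB
gain = depth_to_gain_db([[0.0, 1.0]])
```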
Sound Mapper & Synthesizer
- Maps text descriptions to audio samples
- Handles audio synthesis and mixing
- Implements intensity-based volume control
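The intensity-based volume control could work along these lines; this is a sketch under assumed semantics of `intensityMap.json` (each label maps to a dB offset applied to its sample before mixing):

```python
# Example offsets in the style of data/databaseFiles/intensityMap.json
INTENSITY_MAP = {"bird": -2, "water": -1, "wind": -3, "traffic": 0}

def gain_for_label(label: str, default_db: float = 0.0) -> float:
    """Look up the dB offset applied to a label's audio sample."""
    return float(INTENSITY_MAP.get(label, default_db))

def db_to_amplitude(db: float) -> float:
    """Convert a dB gain to a linear amplitude ratio (20*log10 convention)."""
    return 10 ** (db / 20.0)

# A 'bird' sample would be layered in 2 dB quieter than neutral
bird_ratio = db_to_amplitude(gain_for_label("bird"))
```

With Pydub, such an offset would typically be applied via `AudioSegment.apply_gain` before overlaying tracks.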
Google Cloud Storage
- Manages file uploads and downloads
- Handles cloud storage operations
- Provides secure file access
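A storage helper in this spirit might look like the sketch below; the bucket name and object layout are assumptions, and the upload uses the standard google-cloud-storage client API (imported lazily so the URL helper works without credentials):

```python
def public_url(bucket_name: str, blob_name: str) -> str:
    """Build the public HTTPS URL for an object in a GCS bucket."""
    return f"https://storage.googleapis.com/{bucket_name}/{blob_name}"

def upload_file(bucket_name: str, local_path: str, blob_name: str) -> str:
    """Upload a local file to GCS and return its public URL.

    Requires GOOGLE_APPLICATION_CREDENTIALS to be set.
    """
    from google.cloud import storage  # standard GCS client, imported lazily

    client = storage.Client()
    bucket = client.bucket(bucket_name)
    bucket.blob(blob_name).upload_from_filename(local_path)
    return public_url(bucket_name, blob_name)
```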
```
realive/
├── main.py                       # Main FastAPI application
├── depth_image_generator.py      # Depth estimation module
├── sound_mapper_helpers.py       # Audio synthesis and mapping
├── gcp_helpers.py                # Google Cloud Platform utilities
├── tokenizer.py                  # Text processing utilities
├── layers.py                     # Custom Keras layers
├── post_req_helper.py            # HTTP request utilities
├── img2text/                     # Image-to-text microservice
│   ├── app.py                    # Flask application
│   ├── image_to_text.py          # Image captioning logic
│   ├── Dockerfile                # Container configuration
│   └── start_app.sh              # Startup script
├── heatmap/                      # Depth visualization service
│   ├── app.py                    # Heatmap generation API
│   ├── depth_image_generator.py  # Depth processing
│   └── Dockerfile                # Container configuration
├── templates/                    # Web templates
│   └── index.html                # Main web interface
├── data/                         # Configuration and data
│   ├── __init__.py               # Data module configuration
│   └── databaseFiles/            # Audio datasets and mappings
│       ├── esc50.csv             # Audio dataset metadata
│       ├── intensityMap.json     # Audio intensity mappings
│       └── textMusicMapping.json # Text-to-audio mappings
├── test_images/                  # Sample images for testing
├── requirements.txt              # Python dependencies
├── pytorch_requirements.txt      # PyTorch-specific dependencies
├── spacy_requirements.txt        # spaCy NLP dependencies
└── LICENSE                       # MIT License
```
GET /
- Description: Main web interface
- Response: HTML page with upload form

POST /upload/
- Description: Process uploaded image and generate audio-video
- Request: Multipart form with image file
- Response: JSON with processing results

```json
{
  "img2text": "A beautiful landscape with mountains and trees",
  "linkToFinalVideo": "https://storage.googleapis.com/bucket/video.mp4",
  "linkToHeatMap": "https://storage.googleapis.com/bucket/depth.jpg"
}
```
```python
class PipelineFinish(BaseModel):
    img2text: str          # Generated text description
    linkToFinalVideo: str  # URL to final video with audio
    linkToHeatMap: str     # URL to depth map visualization
```
Upload an image:

```bash
curl -X POST "http://localhost:8000/upload/" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@your_image.jpg"
```

Process via Python:

```python
import requests

with open('image.jpg', 'rb') as f:
    files = {'file': f}
    response = requests.post('http://localhost:8000/upload/', files=files)

result = response.json()
print(f"Description: {result['img2text']}")
print(f"Video: {result['linkToFinalVideo']}")
```
Modify `intensityMap.json` to adjust audio levels:

```json
{
  "bird": -2,
  "water": -1,
  "wind": -3,
  "traffic": 0
}
```

Adjust depth estimation parameters in `depth_image_generator.py`:

```python
def predict(model, images, minDepth=10, maxDepth=1000, batch_size=2):
    # Customize depth range and batch processing
    ...
```
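The minDepth/maxDepth parameters presumably bound the network's raw output; a sketch of that clamping step (illustrative, not the project's exact implementation):

```python
def clip_depth(raw_values, min_depth=10, max_depth=1000):
    """Clamp raw depth predictions into [min_depth, max_depth]."""
    return [min(max(v, min_depth), max_depth) for v in raw_values]

print(clip_depth([5, 50, 5000]))  # → [10, 50, 1000]
```

Narrowing the range compresses the depth-to-volume mapping, which makes quiet background layers more audible.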
Build containers:

```bash
# Build image-to-text service
cd img2text
docker build -t realive-img2text .

# Build heatmap service
cd ../heatmap
docker build -t realive-heatmap .
```

Deploy to Google Cloud Run:

```bash
# Deploy main application
gcloud run deploy realive-main --source . --platform managed --region us-central1

# Deploy microservices
gcloud run deploy realive-img2text --image realive-img2text --platform managed
gcloud run deploy realive-heatmap --image realive-heatmap --platform managed
```
```bash
export GCP_BUCKET_NAME="your-bucket-name"
export IMG2TEXT_API="https://your-img2text-service.run.app/img2txt/"
export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"
```

The test_images/ directory contains sample images for testing:
- Landscape photos
- Urban scenes
- Nature images
- Various lighting conditions
```bash
# Test the image processing pipeline
python -c "
from main import upload
test_image = 'test_images/photo-1502781252888-9143ba7f074e.jpeg'
with open(test_image, 'rb') as f:
    result = upload(f)
print('Test completed successfully!')
"
```

Depth Estimation
- Model: Custom CNN with BilinearUpSampling2D
- Input: RGB images (640x480)
- Output: Depth maps for audio intensity mapping
- Framework: Keras/TensorFlow
Image Captioning
- Model: Pre-trained vision-language model
- Input: RGB images
- Output: Natural language descriptions
- Framework: Transformers/Hugging Face
Audio Synthesis
- Dataset: ESC-50 environmental sound classification
- Mapping: Text-to-audio semantic matching
- Processing: Pydub for audio manipulation
- Output: Mixed stereo audio tracks
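Text-to-audio semantic matching can be approximated by keyword lookup against the mapping file. The sketch below uses plain token matching, and the mapping entries are illustrative (the real pipeline uses spaCy and `textMusicMapping.json`):

```python
# Illustrative entries in the style of textMusicMapping.json
TEXT_MUSIC_MAPPING = {
    "bird": "birds_chirping.wav",
    "water": "stream.wav",
    "traffic": "city_traffic.wav",
}

def match_sounds(caption: str) -> list:
    """Return audio samples whose keyword appears in the caption."""
    tokens = caption.lower().split()
    return [sample for keyword, sample in TEXT_MUSIC_MAPPING.items()
            if keyword in tokens]

print(match_sounds("A bird flying over water"))
```

A semantic matcher would replace the exact-token test with spaCy word-vector similarity so that, e.g., "sparrow" still matches "bird".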
- Batch Processing: Efficient depth estimation with configurable batch sizes
- Caching: Google Cloud Storage for processed files
- Async Processing: Non-blocking file operations
- Memory Management: Optimized image resizing and processing
We welcome contributions! Please see our contributing guidelines:

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
```bash
# Install development dependencies
pip install -r requirements.txt
pip install -r pytorch_requirements.txt
pip install -r spacy_requirements.txt

# Install pre-commit hooks
pre-commit install

# Run tests
python -m pytest tests/
```

- Processing Time: ~30-60 seconds per image
- Supported Formats: JPEG, PNG, WebP
- Max Image Size: 10MB
- Audio Duration: 2-4 seconds
- Video Output: MP4 (H.264)
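The limits above suggest a simple server-side validation step; the helper below is a hypothetical sketch (the format list and 10 MB cap come from the table, the function itself is not the project's code):

```python
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # 10 MB cap from the specs above

def validate_upload(filename: str, size_bytes: int) -> bool:
    """Accept only supported image formats under the size limit."""
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return ext in ALLOWED_EXTENSIONS and 0 < size_bytes <= MAX_IMAGE_BYTES

print(validate_upload("photo.jpg", 2_000_000))  # accepted
print(validate_upload("clip.gif", 2_000_000))   # rejected: unsupported format
```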
Missing depth model:

```bash
# Ensure the depth model is in the project root
ls -la depthmodel.h5
# If missing, download it from the model repository
```

GCP credential errors:

```bash
# Verify credentials
gcloud auth application-default login
export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"
```

Out of memory during depth estimation:

```python
# Reduce batch size in depth_image_generator.py
def predict(model, images, batch_size=1):  # Reduced from 2
    ...
```

Audio codec issues:

```bash
# Install additional audio codecs
sudo apt-get install ffmpeg
pip install pydub[scipy]
```

- Image Animation: Add subtle motion to static images
- Augmented Reality: AR integration for immersive experiences
- Real-time Processing: Optimize for faster response times
- Mobile App: Native iOS and Android applications
- Advanced Audio: 3D spatial audio and surround sound
- Batch Processing: Multiple image processing capabilities
- Custom Models: User-trainable audio synthesis models
This project is licensed under the MIT License - see the LICENSE file for details.
HackHarvard 2022 Team
- Samanvya Tripathi - Lead Developer & Project Manager
- Team Members - Full-stack development, ML engineering, Cloud architecture
- HackHarvard 2022 for the amazing hackathon experience
- Google Cloud Platform for providing the infrastructure and winning the "Best Use of Google Cloud" award
- Open Source Community for the incredible tools and libraries
- ESC-50 Dataset creators for the environmental sound data
- Keras/TensorFlow team for the depth estimation models
- Project Link: https://github.com/yourusername/realive
- Devpost: https://devpost.com/software/realive
Made with ❤️ at HackHarvard 2022
Bringing memories to life, one photo at a time 🎵📸