Winner of "Best Use of Google Cloud" at HackHarvard 2022 🏆 https://devpost.com/software/realive
ReAlive is an innovative AI-powered web application that brings your old photos to life by adding realistic, contextually-aware audio. Using advanced computer vision and machine learning techniques, ReAlive analyzes images to generate depth maps, extract semantic information, and synthesize immersive audio experiences that recreate what the scene might have sounded like.
- Image Analysis: Advanced computer vision to extract visual elements and depth information
- Smart Audio Synthesis: AI-powered audio generation based on image content
- Depth Mapping: Monocular depth estimation using a CNN and OpenCV
- Web Interface: Clean, responsive web UI built with FastAPI and Bootstrap
- Cloud-Native: Fully deployed on Google Cloud Platform with containerized microservices
- Audio Mixing: Intelligent audio layering and intensity mapping
- Real-time Processing: Fast image-to-audio conversion pipeline
- Python 3.8 or higher
- Google Cloud Platform account
- Docker (for containerized deployment)
- Clone the repository

  ```bash
  git clone https://github.com/yourusername/realive.git
  cd realive
  ```

- Install dependencies

  ```bash
  # Install main requirements
  pip install -r requirements.txt

  # Install PyTorch-specific requirements
  pip install -r pytorch_requirements.txt

  # Install spaCy requirements
  pip install -r spacy_requirements.txt
  ```

- Set up Google Cloud credentials

  ```bash
  # Download your GCP service account key
  # Place it in the project root as 'plasma-myth-365608-02f3f88329f7.json'
  export GOOGLE_APPLICATION_CREDENTIALS="plasma-myth-365608-02f3f88329f7.json"
  ```

- Download required models: download the depth estimation model and place `depthmodel.h5` in the project root

- Run the application

  ```bash
  python main.py
  ```

- Access the web interface: open your browser and navigate to http://localhost:8000
ReAlive follows a microservices architecture with the following components:
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Web Frontend   │     │   Image2Text    │     │   Depth Map     │
│   (FastAPI)     │────►│    Service      │     │   Generator     │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Sound Mapper   │     │  Google Cloud   │     │  Audio Mixer    │
│  & Synthesizer  │     │    Storage      │     │   (Pydub)       │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
Web Frontend
- FastAPI-based web server
- Handles image uploads and processing requests
- Orchestrates the entire pipeline
- Returns processed videos and metadata
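The frontend's orchestration role can be sketched as a simple sequential pipeline. The stage functions below (`caption_image`, `estimate_depth`, `synthesize_audio`) are hypothetical stand-ins for the real microservice calls, not the project's actual API:

```python
# Minimal sketch of the pipeline orchestration; each stage function is a
# hypothetical stand-in for the corresponding microservice call.

def caption_image(image_bytes: bytes) -> str:
    # Real version POSTs the image to the Image2Text service.
    return "a forest with birds and a stream"

def estimate_depth(image_bytes: bytes) -> list:
    # Real version runs the Keras depth model; here, a dummy 2x2 map.
    return [[0.2, 0.4], [0.6, 0.8]]

def synthesize_audio(caption: str, depth_map: list) -> dict:
    # Real version selects and mixes ESC-50 samples with Pydub.
    return {"caption": caption, "duration_s": 4}

def run_pipeline(image_bytes: bytes) -> dict:
    """Run captioning, depth estimation, and audio synthesis in order."""
    caption = caption_image(image_bytes)
    depth_map = estimate_depth(image_bytes)
    audio_meta = synthesize_audio(caption, depth_map)
    return {"img2text": caption, **audio_meta}

result = run_pipeline(b"fake-image-bytes")
print(result["img2text"])
```

In the real service these stages are HTTP calls to the containerized microservices rather than local functions.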
Image2Text Service
- Flask-based microservice for image captioning
- Uses pre-trained models for scene description
- Deployed as a containerized service
Depth Map Generator
- Monocular depth estimation using a CNN
- Generates depth maps for audio intensity mapping
- Uses a custom Keras model with BilinearUpSampling2D
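As a rough illustration of how a depth map can drive audio intensity (an assumed mapping, not the project's exact formula): nearer scenes (smaller mean depth) play at full volume, farther scenes are attenuated.

```python
def depth_to_gain_db(depth_map, max_attenuation_db=12.0):
    """Map mean scene depth (normalized to [0, 1]) to a dB gain.

    Near scenes (mean depth ~0) play at 0 dB; far scenes (mean depth ~1)
    are attenuated by up to max_attenuation_db. Illustrative only.
    """
    values = [v for row in depth_map for v in row]
    mean_depth = sum(values) / len(values)
    return -max_attenuation_db * mean_depth

# A scene whose mean depth is 0.5 would be attenuated by 6 dB
gain = depth_to_gain_db([[0.0, 1.0]])
```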
Sound Mapper & Synthesizer
- Maps text descriptions to audio samples
- Handles audio synthesis and mixing
- Implements intensity-based volume control
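The intensity-based volume control could work along these lines; this is a sketch under assumed semantics of `intensityMap.json` (each label maps to a dB offset applied to its sample before mixing):

```python
# Example offsets in the style of data/databaseFiles/intensityMap.json
INTENSITY_MAP = {"bird": -2, "water": -1, "wind": -3, "traffic": 0}

def gain_for_label(label: str, default_db: float = 0.0) -> float:
    """Look up the dB offset applied to a label's audio sample."""
    return float(INTENSITY_MAP.get(label, default_db))

def db_to_amplitude(db: float) -> float:
    """Convert a dB gain to a linear amplitude ratio (20*log10 convention)."""
    return 10 ** (db / 20.0)

# A 'bird' sample would be layered in 2 dB quieter than neutral
bird_ratio = db_to_amplitude(gain_for_label("bird"))
```

With Pydub, such an offset would typically be applied via `AudioSegment.apply_gain` before overlaying tracks.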
Google Cloud Storage
- Manages file uploads and downloads
- Handles cloud storage operations
- Provides secure file access
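A storage helper in this spirit might look like the sketch below; the bucket name and object layout are assumptions, and the upload uses the standard google-cloud-storage client API (imported lazily so the URL helper works without credentials):

```python
def public_url(bucket_name: str, blob_name: str) -> str:
    """Build the public HTTPS URL for an object in a GCS bucket."""
    return f"https://storage.googleapis.com/{bucket_name}/{blob_name}"

def upload_file(bucket_name: str, local_path: str, blob_name: str) -> str:
    """Upload a local file to GCS and return its public URL.

    Requires GOOGLE_APPLICATION_CREDENTIALS to be set.
    """
    from google.cloud import storage  # standard GCS client, imported lazily

    client = storage.Client()
    bucket = client.bucket(bucket_name)
    bucket.blob(blob_name).upload_from_filename(local_path)
    return public_url(bucket_name, blob_name)
```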
```
realive/
├── main.py                       # Main FastAPI application
├── depth_image_generator.py      # Depth estimation module
├── sound_mapper_helpers.py       # Audio synthesis and mapping
├── gcp_helpers.py                # Google Cloud Platform utilities
├── tokenizer.py                  # Text processing utilities
├── layers.py                     # Custom Keras layers
├── post_req_helper.py            # HTTP request utilities
├── img2text/                     # Image-to-text microservice
│   ├── app.py                    # Flask application
│   ├── image_to_text.py          # Image captioning logic
│   ├── Dockerfile                # Container configuration
│   └── start_app.sh              # Startup script
├── heatmap/                      # Depth visualization service
│   ├── app.py                    # Heatmap generation API
│   ├── depth_image_generator.py  # Depth processing
│   └── Dockerfile                # Container configuration
├── templates/                    # Web templates
│   └── index.html                # Main web interface
├── data/                         # Configuration and data
│   ├── __init__.py               # Data module configuration
│   └── databaseFiles/            # Audio datasets and mappings
│       ├── esc50.csv             # Audio dataset metadata
│       ├── intensityMap.json     # Audio intensity mappings
│       └── textMusicMapping.json # Text-to-audio mappings
├── test_images/                  # Sample images for testing
├── requirements.txt              # Python dependencies
├── pytorch_requirements.txt      # PyTorch-specific dependencies
├── spacy_requirements.txt        # spaCy NLP dependencies
└── LICENSE                       # MIT License
```
GET /
- Description: Main web interface
- Response: HTML page with upload form

POST /upload/
- Description: Process uploaded image and generate audio-video
- Request: Multipart form with image file
- Response: JSON with processing results

```json
{
  "img2text": "A beautiful landscape with mountains and trees",
  "linkToFinalVideo": "https://storage.googleapis.com/bucket/video.mp4",
  "linkToHeatMap": "https://storage.googleapis.com/bucket/depth.jpg"
}
```
```python
class PipelineFinish(BaseModel):
    img2text: str          # Generated text description
    linkToFinalVideo: str  # URL to final video with audio
    linkToHeatMap: str     # URL to depth map visualization
```
Upload an image:

```bash
curl -X POST "http://localhost:8000/upload/" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@your_image.jpg"
```

Process via Python:

```python
import requests

with open('image.jpg', 'rb') as f:
    files = {'file': f}
    response = requests.post('http://localhost:8000/upload/', files=files)

result = response.json()
print(f"Description: {result['img2text']}")
print(f"Video: {result['linkToFinalVideo']}")
```
Modify `intensityMap.json` to adjust audio levels:

```json
{
  "bird": -2,
  "water": -1,
  "wind": -3,
  "traffic": 0
}
```

Adjust depth estimation parameters in `depth_image_generator.py`:

```python
def predict(model, images, minDepth=10, maxDepth=1000, batch_size=2):
    # Customize depth range and batch processing
    ...
```
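The minDepth/maxDepth parameters presumably bound the network's raw output; a sketch of that clamping step (illustrative, not the project's exact implementation):

```python
def clip_depth(raw_values, min_depth=10, max_depth=1000):
    """Clamp raw depth predictions into [min_depth, max_depth]."""
    return [min(max(v, min_depth), max_depth) for v in raw_values]

print(clip_depth([5, 50, 5000]))  # → [10, 50, 1000]
```

Narrowing the range compresses the depth-to-volume mapping, which makes quiet background layers more audible.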
Build containers:

```bash
# Build image-to-text service
cd img2text
docker build -t realive-img2text .

# Build heatmap service
cd ../heatmap
docker build -t realive-heatmap .
```

Deploy to Google Cloud Run:

```bash
# Deploy main application
gcloud run deploy realive-main --source . --platform managed --region us-central1

# Deploy microservices
gcloud run deploy realive-img2text --image realive-img2text --platform managed
gcloud run deploy realive-heatmap --image realive-heatmap --platform managed
```
```bash
export GCP_BUCKET_NAME="your-bucket-name"
export IMG2TEXT_API="https://your-img2text-service.run.app/img2txt/"
export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"
```

The test_images/ directory contains sample images for testing:
- Landscape photos
- Urban scenes
- Nature images
- Various lighting conditions
```bash
# Test the image processing pipeline
python -c "
from main import upload
test_image = 'test_images/photo-1502781252888-9143ba7f074e.jpeg'
with open(test_image, 'rb') as f:
    result = upload(f)
print('Test completed successfully!')
"
```

Depth Estimation
- Model: Custom CNN with BilinearUpSampling2D
- Input: RGB images (640x480)
- Output: Depth maps for audio intensity mapping
- Framework: Keras/TensorFlow
Image Captioning
- Model: Pre-trained vision-language model
- Input: RGB images
- Output: Natural language descriptions
- Framework: Transformers/Hugging Face
Audio Synthesis
- Dataset: ESC-50 environmental sound classification
- Mapping: Text-to-audio semantic matching
- Processing: Pydub for audio manipulation
- Output: Mixed stereo audio tracks
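Text-to-audio semantic matching can be approximated by keyword lookup against the mapping file. The sketch below uses plain token matching, and the mapping entries are illustrative (the real pipeline uses spaCy and `textMusicMapping.json`):

```python
# Illustrative entries in the style of textMusicMapping.json
TEXT_MUSIC_MAPPING = {
    "bird": "birds_chirping.wav",
    "water": "stream.wav",
    "traffic": "city_traffic.wav",
}

def match_sounds(caption: str) -> list:
    """Return audio samples whose keyword appears in the caption."""
    tokens = caption.lower().split()
    return [sample for keyword, sample in TEXT_MUSIC_MAPPING.items()
            if keyword in tokens]

print(match_sounds("A bird flying over water"))
```

A semantic matcher would replace the exact-token test with spaCy word-vector similarity so that, e.g., "sparrow" still matches "bird".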
- Batch Processing: Efficient depth estimation with configurable batch sizes
- Caching: Google Cloud Storage for processed files
- Async Processing: Non-blocking file operations
- Memory Management: Optimized image resizing and processing
We welcome contributions! Please see our contributing guidelines:

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
```bash
# Install development dependencies
pip install -r requirements.txt
pip install -r pytorch_requirements.txt
pip install -r spacy_requirements.txt

# Install pre-commit hooks
pre-commit install

# Run tests
python -m pytest tests/
```

- Processing Time: ~30-60 seconds per image
- Supported Formats: JPEG, PNG, WebP
- Max Image Size: 10MB
- Audio Duration: 2-4 seconds
- Video Output: MP4 (H.264)
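The limits above suggest a simple server-side validation step; the helper below is a hypothetical sketch (the format list and 10 MB cap come from the table, the function itself is not the project's code):

```python
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # 10 MB cap from the specs above

def validate_upload(filename: str, size_bytes: int) -> bool:
    """Accept only supported image formats under the size limit."""
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return ext in ALLOWED_EXTENSIONS and 0 < size_bytes <= MAX_IMAGE_BYTES

print(validate_upload("photo.jpg", 2_000_000))  # accepted
print(validate_upload("clip.gif", 2_000_000))   # rejected: unsupported format
```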
Missing depth model:

```bash
# Ensure the depth model is in the project root
ls -la depthmodel.h5
# If missing, download it from the model repository
```

GCP credential errors:

```bash
# Verify credentials
gcloud auth application-default login
export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"
```

Out of memory during depth estimation:

```python
# Reduce batch size in depth_image_generator.py
def predict(model, images, batch_size=1):  # Reduced from 2
    ...
```

Audio codec issues:

```bash
# Install additional audio codecs
sudo apt-get install ffmpeg
pip install pydub[scipy]
```

- Image Animation: Add subtle motion to static images
- Augmented Reality: AR integration for immersive experiences
- Real-time Processing: Optimize for faster response times
- Mobile App: Native iOS and Android applications
- Advanced Audio: 3D spatial audio and surround sound
- Batch Processing: Multiple image processing capabilities
- Custom Models: User-trainable audio synthesis models
This project is licensed under the MIT License - see the LICENSE file for details.
HackHarvard 2022 Team
- Samanvya Tripathi - Lead Developer & Project Manager
- Team Members - Full-stack development, ML engineering, Cloud architecture
- HackHarvard 2022 for the amazing hackathon experience
- Google Cloud Platform for providing the infrastructure and winning the "Best Use of Google Cloud" award
- Open Source Community for the incredible tools and libraries
- ESC-50 Dataset creators for the environmental sound data
- Keras/TensorFlow team for the depth estimation models
- Project Link: https://github.com/yourusername/realive
- Devpost: https://devpost.com/software/realive
Made with ❤️ at HackHarvard 2022
Bringing memories to life, one photo at a time 🎵📸