Portfolio: somiljain7.github.io
Location: Bengaluru, Karnataka, India
💼 Current Role: Founding ML Engineer @ Vaani
I'm an ML engineer and researcher who specializes in breaking systems apart, understanding their failure modes, and rebuilding them to withstand production reality. My work sits at the intersection of language research, speech systems, and production-grade AI infrastructure.
I've moved past the era of impressive demos. What drives me now is latency optimization, system robustness, and shipping AI models that actually survive user traffic at scale. I believe the most exciting ML work happens when you're debugging edge cases at 3 AM, not when you're hitting 99% on a clean benchmark.
- 🎯 Production ML systems that handle millions of requests without breaking
- 🗣️ Voice AI that feels natural, responsive, and culturally aware
- 🔬 Research with real-world impact: papers are great, but I care more about what ships
- ⚡ Low-latency architectures where every millisecond counts
- Multilingual & localized AI that works for more than just English speakers
Building the core product capabilities and voice AI infrastructure that power hundreds of voice AI companies and millions of deployments globally.
Key responsibilities:
- Architecting scalable ML pipelines for real-time speech processing
- Optimizing inference latency for STT, LLM, and TTS components
- Building production-grade voice AI systems that handle edge cases gracefully
- Developing tools and infrastructure for rapid experimentation and deployment
- Working on multilingual and code-mixed speech recognition
Impact:
- Enabling voice AI at scale for diverse use cases across industries
- Contributing to infrastructure that processes millions of voice interactions
- Pushing the boundaries of what's possible in real-time conversational AI
Developed machine learning solutions for aerospace applications, focusing on predictive maintenance and operational optimization.
Conducted research in machine learning and AI, contributing to academic publications and experimental systems in collaboration with faculty and fellow researchers.
- Delivered technical talks and seminars on Machine Learning and Blockchain at various institutions
- Mentored students and junior developers in ML/AI fundamentals
- Led workshops on practical ML implementation and deployment strategies
- Led a community of 100+ students passionate about open-source technology
- Organized hackathons, workshops, and tech talks on web technologies and open-source contribution
- Fostered collaboration between students and the broader Mozilla community
- Promoted web literacy and open-source values on campus
An immersive voice AI experience featuring "Moriarty Bhai", a localized Hindi/English persona that brings character and cultural context to conversational AI.
What makes it special:
- Ultra-low latency pipeline: Real-time STT → LLM → Emotional TTS loop optimized for <500 ms end-to-end latency
- Cultural localization: Seamless code-switching between Hindi and English with culturally appropriate responses
- Emotional intelligence: Dynamic TTS that adapts tone and emotion based on conversation context
- Immersive experience: Designed as an interactive game/experience, not just a chatbot
- Production-grade architecture: Built to handle real user interactions, not just demos
Technical highlights:
- Custom STT preprocessing for Hindi-English code-mixing
- Optimized LLM inference with streaming responses
- Real-time emotional TTS generation with voice cloning
- WebSocket-based bidirectional communication for minimal latency
- Robust error handling and graceful degradation
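
As a rough illustration of how a latency budget and graceful degradation can fit together, here is a minimal, non-streaming sketch of one conversational turn. It is not the project's actual code: `fake_stt`, `fake_llm`, `fake_tts`, `run_turn`, and `FALLBACK_AUDIO` are hypothetical stand-ins, and a real pipeline would stream partial results between stages rather than await each stage in full.

```python
import asyncio
import time

LATENCY_BUDGET_S = 0.5  # mirrors the <500 ms end-to-end target above
FALLBACK_AUDIO = b"Sorry, could you say that again?"  # hypothetical pre-rendered clip


async def fake_stt(audio: bytes) -> str:
    """Stand-in for a speech-to-text call (hypothetical)."""
    await asyncio.sleep(0.01)
    return audio.decode("utf-8")


async def fake_llm(text: str, delay_s: float) -> str:
    """Stand-in for an LLM call; delay_s simulates a slow model."""
    await asyncio.sleep(delay_s)
    return f"You said: {text}"


async def fake_tts(text: str) -> bytes:
    """Stand-in for a text-to-speech call (hypothetical)."""
    await asyncio.sleep(0.01)
    return text.encode("utf-8")


async def run_turn(audio: bytes, llm_delay_s: float = 0.01):
    """Run one STT -> LLM -> TTS turn under the latency budget.

    Returns (audio_out, elapsed_seconds, degraded).
    """
    start = time.perf_counter()

    async def pipeline() -> bytes:
        text = await fake_stt(audio)
        reply = await fake_llm(text, llm_delay_s)
        return await fake_tts(reply)

    try:
        out = await asyncio.wait_for(pipeline(), timeout=LATENCY_BUDGET_S)
        degraded = False
    except asyncio.TimeoutError:
        # Graceful degradation: play a canned clip instead of failing the turn.
        out = FALLBACK_AUDIO
        degraded = True
    return out, time.perf_counter() - start, degraded
```

Calling `asyncio.run(run_turn(b"namaste"))` completes within the budget, while passing `llm_delay_s=1.0` forces the timeout path and returns the fallback clip with `degraded` set to `True`.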
Built for: OpenAI x Peak XV Ventures Hackathon
Repository: [Coming soon]
🎥 Demo: [Coming soon]
Research and implementation of ASR systems for Indian languages with limited training data.
Key contributions:
- Transfer learning from high-resource to low-resource language models
- Data augmentation techniques for speech in noisy environments
- Fine-tuning strategies for code-mixed conversations
Built a production-ready audio processing system for voice applications.
Features:
- WebRTC integration for low-latency audio streaming
- Noise suppression and audio enhancement
- VAD (Voice Activity Detection) for efficient processing
- Multi-speaker diarization
- Scalable microservice architecture
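
To make the VAD item concrete, here is a minimal energy-threshold sketch. It only illustrates the idea and is not the detector this system uses, which is not specified here; `frame_rms`, `detect_speech`, the 160-sample frame (10 ms at an assumed 16 kHz sample rate), and the 0.02 threshold are all assumptions chosen for illustration. Production systems typically rely on trained VAD models rather than a fixed energy threshold.

```python
import math
from typing import List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(
    samples: Sequence[float],
    frame_size: int = 160,    # assumed: 10 ms at 16 kHz
    threshold: float = 0.02,  # assumed: tune per microphone and noise floor
) -> List[bool]:
    """Return one flag per full frame: True if the frame's RMS meets the threshold."""
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        flags.append(frame_rms(frame) >= threshold)
    return flags


# Usage: two silent frames followed by two frames of a 440 Hz tone.
silence = [0.0] * 320
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]
flags = detect_speech(silence + tone)
```

Only frames flagged `True` would be forwarded to downstream stages such as STT, which is where the efficiency gain comes from.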
Primary: Python (PyTorch, TensorFlow), C++ (for performance-critical components)
Secondary: JavaScript/TypeScript (for tooling and APIs), Bash/Shell scripting
- Deep Learning: PyTorch (primary), TensorFlow, JAX
- NLP/Speech: HuggingFace Transformers, Whisper, Wav2Vec2, FastSpeech2
- LLMs: OpenAI API, Anthropic Claude, Llama, Mistral
- Audio Processing: librosa, soundfile, pydub, WebRTC
- ML Ops: Weights & Biases, MLflow, TensorBoard
- Containerization: Docker, Docker Compose, Kubernetes
- Cloud Platforms: AWS (SageMaker, Lambda, EC2), GCP (Vertex AI, Cloud Run)
- CI/CD: GitHub Actions, GitLab CI, Jenkins
- Monitoring: Prometheus, Grafana, ELK Stack
- Databases: PostgreSQL, Redis, MongoDB, Vector DBs (Pinecone, Weaviate)
- Speech Tools: Kaldi, ESPnet, Coqui TTS, XTTS
- Optimization: ONNX, TensorRT, quantization techniques
- API Development: FastAPI, Flask, gRPC
- Real-time Communication: WebSockets, WebRTC, Socket.io
I'm particularly interested in the following areas:
- 🗣️ Speech Processing: ASR, TTS, speaker recognition, emotion detection
- Multilingual NLP: Code-mixing, low-resource languages, cross-lingual transfer
- ⚡ Model Optimization: Quantization, pruning, knowledge distillation for edge deployment
- Conversational AI: Dialog systems, persona consistency, context management
- Audio Understanding: Sound event detection, audio classification, music information retrieval
- 🤖 Production ML: MLOps, model monitoring, A/B testing, serving infrastructure
Open to collaborations on:
- Speech and language technology for Indian languages
- Real-time multimodal AI systems
- Open-source ML tools and frameworks
- Research with clear paths to production deployment
I occasionally write about ML engineering, production AI systems, and lessons learned from shipping models to production.
Topics I cover:
- Production ML war stories and debugging techniques
- Latency optimization for real-time AI systems
- Building voice AI that doesn't sound like a robot
- The gap between research and production (and how to bridge it)
Blog/Medium links coming soon
I'm always interested in connecting with fellow ML engineers, researchers, and builders. Whether you want to discuss a potential collaboration, talk about production ML challenges, or just chat about the latest in AI, feel free to reach out!
| Platform | Link |
|---|---|
| 💬 Telegram | @coderatwork7 |
| 📧 Email | somiljain71100@gmail.com |
| 💼 LinkedIn | somiljain7 |
| GitHub | @somiljain7 |
| 🐦 Twitter/X | @coderatwork7 |
| Portfolio | somiljain7.github.io |
- Quick questions: Twitter/X DM
- Technical discussions: Email or LinkedIn
- Casual chat: Telegram
- Collaboration proposals: Email with subject line "Collaboration: [Brief Topic]"
- Ship production voice AI systems serving 10M+ users
- Open-source a key component of our voice AI stack
- Contribute to 3+ impactful open-source ML projects
- Write 12+ technical blog posts on production ML
- Speak at 2+ ML/AI conferences or meetups
- Mentor 10+ aspiring ML engineers
- OpenAI x Peak XV Ventures Hackathon - Moriarty's Game (Voice AI track)
- Junior Research Fellow - Selected for research position at NIT Karnataka
- 👥 Mozilla Campus Club President - Led 100+ member community (2021-2022)
- 📢 Tech Speaker - Delivered talks at multiple institutions on ML & Blockchain
"The best ML models are the ones that ship. The second-best are the ones that actually work when users touch them. Everything else is just interesting math."
I believe in:
- Pragmatism over perfectionism: 90% accuracy in production beats 99% in a notebook
- Latency as a feature: fast models create better user experiences
- Learning by shipping: you learn more from one production failure than ten successful benchmarks
- Open collaboration: the best ideas come from diverse perspectives
- Continuous improvement: today's state-of-the-art is tomorrow's baseline
I'm always interested in:
- 🤝 Collaborations on speech/NLP projects, especially for Indian languages
- 💼 Opportunities to work on challenging production ML problems
- 🧑‍🤝‍🧑 Connections with researchers and engineers pushing the boundaries of voice AI
- Knowledge sharing: if you're working on similar problems, let's chat!
This README is licensed under CC BY 4.0. Feel free to use it as inspiration for your own profile!
I like hard problems, real systems, and work that ships.
Let's build something that matters.
Last updated: February 2026

