Organizations

@fossasia @NVIDIAGameWorks @mozilla-campus-club-bvducoep @decentralizedautomata

somiljain7/readme.md

BONJOUR MON AMI 👋

Somil Jain (@coderatwork7)

🌐 Portfolio: somiljain7.github.io
📍 Location: Bengaluru, Karnataka, India
💼 Current Role: Founding ML Engineer @ Vaani


👨‍💻 About Me

I'm an ML engineer and researcher who specializes in breaking systems apart, understanding their failure modes, and rebuilding them to withstand production reality. My work sits at the intersection of language research, speech systems, and production-grade AI infrastructure.

I've moved past the era of impressive demos. What drives me now is latency optimization, system robustness, and shipping AI models that actually survive user traffic at scale. I believe the most exciting ML work happens when you're debugging edge cases at 3 AM, not when you're hitting 99% on a clean benchmark.

What gets me excited:

  • 🎯 Production ML systems that handle millions of requests without breaking
  • 🗣️ Voice AI that feels natural, responsive, and culturally aware
  • 🔬 Research with real-world impact: papers are great, but I care more about what ships
  • ⚡ Low-latency architectures where every millisecond counts
  • 🌍 Multilingual & localized AI that works for more than just English speakers

🚀 Current Focus

🧑‍💻 Founding ML Engineer @ Vaani

Building the core product capabilities and voice AI infrastructure that power hundreds of voice AI companies and millions of deployments globally.

Key responsibilities:

  • Architecting scalable ML pipelines for real-time speech processing
  • Optimizing inference latency for STT, LLM, and TTS components
  • Building production-grade voice AI systems that handle edge cases gracefully
  • Developing tools and infrastructure for rapid experimentation and deployment
  • Working on multilingual and code-mixed speech recognition

Impact:

  • Enabling voice AI at scale for diverse use cases across industries
  • Contributing to infrastructure that processes millions of voice interactions
  • Pushing the boundaries of what's possible in real-time conversational AI

💼 Professional Experience

🤖 AI/ML Developer @ Aerocraft Engineering

Developed machine learning solutions for aerospace applications, focusing on predictive maintenance and operational optimization.

🔬 Junior Research Fellow @ NIT Karnataka, Surathkal

Conducted research in machine learning and AI, contributing to academic publications and experimental systems in collaboration with faculty and fellow researchers.

🎤 Tech Speaker & Educator

  • Delivered technical talks and seminars on Machine Learning and Blockchain at various institutions
  • Mentored students and junior developers in ML/AI fundamentals
  • Led workshops on practical ML implementation and deployment strategies

🤝 President, Mozilla Campus Club BVDUCOEP (2021–2022)

  • Led a community of 100+ students passionate about open-source technology
  • Organized hackathons, workshops, and tech talks on web technologies and open-source contribution
  • Fostered collaboration between students and the broader Mozilla community
  • Promoted web literacy and open-source values on campus

🧩 Featured Projects

🕵️ Moriarty's Game – Real-time Multimodal Voice Agent

An immersive voice AI experience featuring "Moriarty Bhai", a localized Hindi/English persona that brings character and cultural context to conversational AI.

What makes it special:

  • Ultra-low latency pipeline: Real-time STT → LLM → Emotional TTS loop optimized for <500ms end-to-end latency
  • Cultural localization: Seamless code-switching between Hindi and English with culturally appropriate responses
  • Emotional intelligence: Dynamic TTS that adapts tone and emotion based on conversation context
  • Immersive experience: Designed as an interactive game/experience, not just a chatbot
  • Production-grade architecture: Built to handle real user interactions, not just demos

Technical highlights:

  • Custom STT preprocessing for Hindi-English code-mixing
  • Optimized LLM inference with streaming responses
  • Real-time emotional TTS generation with voice cloning
  • WebSocket-based bidirectional communication for minimal latency
  • Robust error handling and graceful degradation
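
A minimal sketch of the streaming idea behind that loop, under stated assumptions: the three stages below (`stt_stream`, `llm_stream`, `tts_chunk`) are hypothetical stand-ins that only sleep and echo text, not the models or code used in Moriarty's Game. The point it illustrates is that synthesizing audio per streamed token lets the first audio chunk leave well before the full reply exists, which is what time-to-first-audio measures.

```python
import asyncio
import time
from typing import AsyncIterator


# NOTE: every stage here is a hypothetical stub for illustration only.
async def stt_stream(audio_chunks: list[bytes]) -> AsyncIterator[str]:
    """Pretend STT: emit one partial transcript word per audio chunk."""
    for i, _chunk in enumerate(audio_chunks):
        await asyncio.sleep(0.01)  # simulated recognition time
        yield f"word{i}"


async def llm_stream(prompt: str) -> AsyncIterator[str]:
    """Pretend LLM: stream reply tokens instead of returning one big string."""
    for token in ("echo:", *prompt.split()):
        await asyncio.sleep(0.005)  # simulated per-token decode time
        yield token


async def tts_chunk(text: str) -> bytes:
    """Pretend TTS: turn one token into a chunk of 'audio' bytes."""
    await asyncio.sleep(0.005)  # simulated synthesis time
    return text.encode("utf-8")


async def run_turn(audio_chunks: list[bytes]):
    """Run one conversational turn and record time-to-first-audio."""
    start = time.perf_counter()
    # Simplification: wait for the full transcript before prompting the LLM.
    transcript = " ".join([w async for w in stt_stream(audio_chunks)])

    first_audio_latency = None
    audio_out = []
    async for token in llm_stream(transcript):
        audio = await tts_chunk(token)  # synthesize as soon as a token arrives
        if first_audio_latency is None:
            first_audio_latency = time.perf_counter() - start
        audio_out.append(audio)
    return transcript, audio_out, first_audio_latency


if __name__ == "__main__":
    text, audio, latency = asyncio.run(run_turn([b"a", b"b", b"c"]))
    print(f"transcript={text!r} chunks={len(audio)} first_audio={latency * 1000:.0f} ms")
```

In a real system each stub would be replaced by a network or model call, and the per-stage timings are what I would watch against a latency budget.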

Built for: OpenAI x Peak XV Ventures Hackathon
🔗 Repository: [Coming soon]
🎥 Demo: [Coming soon]


🗣️ Speech Recognition for Low-Resource Languages

Research and implementation of ASR systems for Indian languages with limited training data.

Key contributions:

  • Transfer learning from high-resource to low-resource language models
  • Data augmentation techniques for speech in noisy environments
  • Fine-tuning strategies for code-mixed conversations

🔊 Real-time Audio Processing Pipeline

Built a production-ready audio processing system for voice applications.

Features:

  • WebRTC integration for low-latency audio streaming
  • Noise suppression and audio enhancement
  • VAD (Voice Activity Detection) for efficient processing
  • Multi-speaker diarization
  • Scalable microservice architecture
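
To make the VAD item concrete, here is a minimal sketch of a frame-level, energy-based detector. It is a deliberately simple stand-in and an assumption on my part, not the detector used in this pipeline: production systems typically use a trained or statistical model, and the `frame_len` and `threshold_db` defaults below are hypothetical values chosen for the example.

```python
import math


def frame_energy_vad(samples, frame_len=160, threshold_db=-35.0):
    """Return one bool per full frame: True when the frame's RMS level
    (in dB relative to full scale 1.0) exceeds threshold_db."""
    decisions = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        level_db = 20 * math.log10(max(rms, 1e-10))  # floor avoids log10(0)
        decisions.append(level_db > threshold_db)
    return decisions


if __name__ == "__main__":
    # 20 ms of silence followed by 20 ms of a 440 Hz tone at 16 kHz.
    silence = [0.0] * 320
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]
    print(frame_energy_vad(silence + tone))
```

Skipping frames marked False before they reach STT is where the efficiency gain comes from; a real deployment would also add hangover smoothing so that short pauses inside a word are not cut.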

🛠️ Technical Stack

Languages

Python C++ JavaScript SQL

Primary: Python (PyTorch, TensorFlow), C++ (for performance-critical components)
Secondary: JavaScript/TypeScript (for tooling and APIs), Bash/Shell scripting

ML/AI Frameworks & Tools

PyTorch TensorFlow scikit-learn HuggingFace

  • Deep Learning: PyTorch (primary), TensorFlow, JAX
  • NLP/Speech: HuggingFace Transformers, Whisper, Wav2Vec2, FastSpeech2
  • LLMs: OpenAI API, Anthropic Claude, Llama, Mistral
  • Audio Processing: librosa, soundfile, pydub, WebRTC
  • ML Ops: Weights & Biases, MLflow, TensorBoard

Infrastructure & DevOps

Docker Kubernetes AWS GCP

  • Containerization: Docker, Docker Compose, Kubernetes
  • Cloud Platforms: AWS (SageMaker, Lambda, EC2), GCP (Vertex AI, Cloud Run)
  • CI/CD: GitHub Actions, GitLab CI, Jenkins
  • Monitoring: Prometheus, Grafana, ELK Stack
  • Databases: PostgreSQL, Redis, MongoDB, Vector DBs (Pinecone, Weaviate)

Specialized Tools

  • Speech Tools: Kaldi, ESPnet, Coqui TTS, XTTS
  • Optimization: ONNX, TensorRT, quantization techniques
  • API Development: FastAPI, Flask, gRPC
  • Real-time Communication: WebSockets, WebRTC, Socket.io

📚 Research Interests

I'm particularly interested in the following areas:

  • 🗣️ Speech Processing: ASR, TTS, speaker recognition, emotion detection
  • 🌐 Multilingual NLP: Code-mixing, low-resource languages, cross-lingual transfer
  • ⚡ Model Optimization: Quantization, pruning, knowledge distillation for edge deployment
  • 🎭 Conversational AI: Dialog systems, persona consistency, context management
  • 🔊 Audio Understanding: Sound event detection, audio classification, music information retrieval
  • 🤖 Production ML: MLOps, model monitoring, A/B testing, serving infrastructure

Open to collaborations on:

  • Speech and language technology for Indian languages
  • Real-time multimodal AI systems
  • Open-source ML tools and frameworks
  • Research with clear paths to production deployment

📝 Writing & Talks

I occasionally write about ML engineering, production AI systems, and lessons learned from shipping models to production.

Topics I cover:

  • Production ML war stories and debugging techniques
  • Latency optimization for real-time AI systems
  • Building voice AI that doesn't sound like a robot
  • The gap between research and production (and how to bridge it)

Blog/Medium links coming soon


📊 GitHub Statistics

GitHub Stats

GitHub Streak

Top Languages

Profile Views


🤝 Let's Connect

I'm always interested in connecting with fellow ML engineers, researchers, and builders. Whether you want to discuss a potential collaboration, talk about production ML challenges, or just chat about the latest in AI, feel free to reach out!

📫 Contact Information

💬 Telegram: @coderatwork7
📧 Email: somiljain71100@gmail.com
💼 LinkedIn: somiljain7
🐙 GitHub: @somiljain7
🐦 Twitter/X: @coderatwork7
🌐 Portfolio: somiljain7.github.io

💡 Best ways to reach me:

  • Quick questions: Twitter/X DM
  • Technical discussions: Email or LinkedIn
  • Casual chat: Telegram
  • Collaboration proposals: Email with subject line "Collaboration: [Brief Topic]"

🎯 Current Goals (2026)

  • Ship production voice AI systems serving 10M+ users
  • Open-source a key component of our voice AI stack
  • Contribute to 3+ impactful open-source ML projects
  • Write 12+ technical blog posts on production ML
  • Speak at 2+ ML/AI conferences or meetups
  • Mentor 10+ aspiring ML engineers

🏆 Achievements & Recognition

  • 🏅 OpenAI x Peak XV Ventures Hackathon - Moriarty's Game (Voice AI track)
  • 🎓 Junior Research Fellow - Selected for research position at NIT Karnataka
  • 👥 Mozilla Campus Club President - Led 100+ member community (2021-2022)
  • 📢 Tech Speaker - Delivered talks at multiple institutions on ML & Blockchain

💭 Philosophy

"The best ML models are the ones that ship. The second-best are the ones that actually work when users touch them. Everything else is just interesting math."

I believe in:

  • Pragmatism over perfectionism: 90% accuracy in production beats 99% in a notebook
  • Latency as a feature: fast models create better user experiences
  • Learning by shipping: you learn more from one production failure than ten successful benchmarks
  • Open collaboration: the best ideas come from diverse perspectives
  • Continuous improvement: today's state-of-the-art is tomorrow's baseline

🔍 What I'm Looking For

I'm always interested in:

  • 🤝 Collaborations on speech/NLP projects, especially for Indian languages
  • 💼 Opportunities to work on challenging production ML problems
  • 🧑‍🤝‍🧑 Connections with researchers and engineers pushing the boundaries of voice AI
  • 📚 Knowledge sharing: if you're working on similar problems, let's chat!

📜 License

This README is licensed under CC BY 4.0. Feel free to use it as inspiration for your own profile!


I like hard problems, real systems, and work that ships.

Let's build something that matters. 🚀


Last updated: February 2026

Pinned

  1. AI (Public)

    This repository contains my algorithm implementations, EDA, and data visualization

    Jupyter Notebook 10 9

  2. location-geo-api (Public)

    Python

  3. A Python script for converting Praat TextGrid files to SRT files

    ##############################################################################################################
    #    @author Somiljain7
    ##############################################################################################################
    import argparse
    import textgrid # pip install textgrid

  4. Script for generating an .ark file consisting of recordings and their features

    ###############################################################################
    # Script for generating .ark file consisting of recordings and their features #
    ###############################################################################
    import struct
    import numpy as np