- Lost-in-the-Middle in Industrial Contexts
  - Evaluated state-of-the-art 7-8B parameter models on the GM-Extract benchmark.
  - Investigated spatial retrieval failures and "middle-context" performance drops.
- Auto-formalization for LLM Verification
  - A framework for translating LLM reasoning into formal specifications.
  - Achieves "closed-loop" verification against ground-truth data.
- Implicit Knowledge for Hierarchical RAG
  - Developed a novel method for retrieving tree-structured data from vector databases.
  - Improved efficiency for complex engineering documentation.
Most of my code is developed within private enterprise environments (including work with engineering teams at General Motors).
- Note on Public Repos: My GitHub activity does not reflect my daily output, as most of it is proprietary industrial work.
- Open Research: While the core implementation code is private, I am committed to publishing the methodologies, benchmarks (like GM-Extract), and theoretical findings via arXiv and research conferences.
- Research Interests: LLM Evaluation, Neuro-symbolic AI, Retrieval-Augmented Generation (RAG), Formal Verification.
- Expertise: Long-context optimization, hierarchical data structures, and automated truth-verification pipelines.
- Stack: Python, PyTorch, Hugging Face, LangChain, Vector Databases (Milvus/Pinecone/Weaviate).
- LinkedIn: https://www.linkedin.com/in/mihir-gupte-0408/
- Google Scholar: https://scholar.google.com/citations?user=y1WEYT0AAAAJ
- Email: mihir.a.gupte@gmail.com
