Mihir Gupte mihirgupte

Hi there, I'm Mihir Gupte 👋

Lost-in-the-Middle in Industrial Contexts
- Evaluated 7-8B SOTA models on the GM-Extract benchmark.
- Investigated spatial retrieval failures and "middle-context" performance drops.
Auto-formalization for LLM Verification
- Framework for translating LLM reasoning into formal specifications.
- Achieving "closed-loop" verification against ground-truth data.
Implicit Knowledge for Hierarchical RAG
- Developed a novel method for retrieving tree-structured data stored in vector databases.
- Improved retrieval efficiency for complex engineering documentation.

Most of my code is developed within private enterprise environments (including work with engineering teams at General Motors).

Note on Public Repos: My GitHub activity does not reflect my daily output, as most of my work is proprietary and industrial.
Open Research: While the core implementation code is private, I am committed to publishing the methodologies, benchmarks (like GM-Extract), and theoretical findings via arXiv and research conferences.

Research Interests: LLM Evaluation, Neuro-symbolic AI, Retrieval-Augmented Generation (RAG), Formal Verification.
Expertise: Long-context optimization, hierarchical data structures, and automated truth-verification pipelines.
Stack: Python, PyTorch, Hugging Face, LangChain, Vector Databases (Milvus/Pinecone/Weaviate).