AI-powered natural language search for codebases using vector embeddings
Try it now: semantic-code-search.vercel.app
Upload a ZIP file of your code and search using natural language queries like:
- "authentication functions"
- "database error handling"
- "JWT token generation"
- Semantic Understanding - Search by meaning, not just keywords
- Lightning Fast - Search results in ~50-60ms (vector similarity search)
- ZIP Upload - Drag & drop your codebase (no GitHub integration needed)
- Syntax Highlighting - Beautiful code display with Prism.js
- Rate Limited - Protected against abuse (10 uploads/hour, 50 searches/hour)
- Multi-Language - Python, JavaScript, TypeScript, Go, Java, Rust, C++, C
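As a rough illustration of how a ZIP upload might be filtered down to the supported languages, here is a minimal sketch using only the standard library. The extension mapping and the names `SUPPORTED_EXTENSIONS`, `scan_zip`, and `make_demo_zip` are assumptions made for this example, not the project's actual code.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumed extension-to-language mapping; the project's real list may differ.
SUPPORTED_EXTENSIONS = {
    ".py": "python",
    ".js": "javascript",
    ".ts": "typescript",
    ".go": "go",
    ".java": "java",
    ".rs": "rust",
    ".cpp": "cpp",
    ".c": "c",
}


def scan_zip(zip_bytes: bytes) -> list[tuple[str, str, str]]:
    """Return (path, language, source) for each supported file in the archive."""
    found = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            suffix = PurePosixPath(info.filename).suffix.lower()
            language = SUPPORTED_EXTENSIONS.get(suffix)
            if language is None:
                continue  # skip unsupported file types such as docs or images
            source = archive.read(info).decode("utf-8", errors="replace")
            found.append((info.filename, language, source))
    return found


def make_demo_zip() -> bytes:
    """Build a tiny in-memory archive so the sketch can be tried without files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("src/auth.py", "def login():\n    pass\n")
        archive.writestr("README.md", "# docs\n")
        archive.writestr("web/app.ts", "export const x = 1;\n")
    return buffer.getvalue()


for path, language, _ in scan_zip(make_demo_zip()):
    print(path, language)
```

Decoding with `errors="replace"` keeps the scan from failing on files that are not valid UTF-8, at the cost of some garbled characters in those files.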
Backend:
- FastAPI (Python 3.11)
- ChromaDB (vector database)
- HuggingFace Inference API (embeddings)
- Sentence Transformers (all-MiniLM-L6-v2)
Frontend:
- React 18 + TypeScript
- Vite (build tool)
- TailwindCSS (styling)
- Lucide React (icons)
Deployment:
- Upload - User uploads ZIP file containing code
- Extract - Backend extracts and scans for supported file types
- Chunk - Code is split into semantic chunks (functions/blocks)
- Embed - HuggingFace API generates 384-dim vectors for each chunk
- Store - Vectors stored in ChromaDB with metadata
- Search - User query is embedded and similarity search finds matches
- Results - Ranked results with score, file path, and syntax highlighting
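The chunk, embed, store, and search steps above can be sketched end to end in plain Python. This is only an illustration under stated assumptions: `toy_embed` is a hashed bag-of-words stand-in for the HuggingFace model, a plain list replaces ChromaDB, and chunking handles Python files only through the `ast` module. All function names here are hypothetical.

```python
import ast
import hashlib
import math
import re

DIM = 384  # same vector size as mentioned in the steps above


def chunk_python(source: str, file_path: str) -> list[dict]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            code = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            chunks.append({"code": code, "file_path": file_path, "start_line": node.lineno})
    return chunks


def toy_embed(text: str) -> list[float]:
    """Stand-in for the real model: hashed token counts, L2-normalised."""
    vector = [0.0] * DIM
    for token in re.findall(r"[a-z]+", text.lower()):
        digest = hashlib.sha256(token.encode()).digest()
        vector[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


def search(index: list[dict], query: str, top_k: int = 5) -> list[dict]:
    """Rank chunks by cosine similarity (dot product of unit vectors)."""
    q = toy_embed(query)
    scored = [
        {**chunk, "score": sum(a * b for a, b in zip(q, chunk["vector"]))}
        for chunk in index
    ]
    scored.sort(key=lambda c: c["score"], reverse=True)
    return scored[:top_k]


SOURCE = '''
def authenticate_user(username, password):
    return check_password(username, password)

def connect_database(url):
    return open_connection(url)
'''

# "Store" step: keep each chunk together with its vector and metadata.
index = [{**c, "vector": toy_embed(c["code"])} for c in chunk_python(SOURCE, "app.py")]
best = search(index, "authenticate user password", top_k=1)[0]
print(best["file_path"], best["start_line"], round(best["score"], 2))
```

Unlike a real sentence-embedding model, the toy embedding only matches shared tokens, so it cannot capture meaning. It is here purely to make the ranking step runnable without network access.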
- Code Review - "Find all authentication checks"
- Onboarding - "Show me database connection code"
- Refactoring - "Where do we handle errors?"
- Learning - "How is JWT implemented here?"
- Python 3.11+
- Node.js 18+
- HuggingFace API key (free tier)
1. Clone the repository
git clone https://github.com/tommyvio/semantic-code-search.git
cd semantic-code-search
2. Backend Setup
cd backend
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your HUGGINGFACE_API_KEY
uvicorn app.main:app --reload
3. Frontend Setup (in a second terminal, from the repository root)
cd frontend
npm install
cp .env.example .env
# Edit .env: VITE_API_URL=http://localhost:8000
npm run dev
Visit http://localhost:5173
The frontend includes an "Explain AI" button, but it requires your own API key to function.
To enable AI explanations:
- Get a Gemini API key from Google AI Studio
- Add it to your .env file: GEMINI_API_KEY=your_api_key_here
- Restart the backend
This feature uses Google's Gemini API to explain code snippets in plain English.
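One way a backend could treat the two keys differently is to fail fast when the required one is missing and simply disable explanations when the optional one is absent. The sketch below assumes the .env values have already been loaded into the process environment; `load_settings` and the returned dictionary keys are hypothetical names, not the project's actual configuration code.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read API keys from the environment; variable names follow the .env entries."""
    env = os.environ if env is None else env
    hf_key = env.get("HUGGINGFACE_API_KEY", "").strip()
    if not hf_key:
        # Embeddings cannot be generated without this key, so stop early.
        raise RuntimeError("HUGGINGFACE_API_KEY is not set")
    gemini_key = env.get("GEMINI_API_KEY", "").strip()
    return {
        "huggingface_api_key": hf_key,
        "gemini_api_key": gemini_key or None,
        "explain_enabled": bool(gemini_key),  # optional feature toggle
    }


settings = load_settings({"HUGGINGFACE_API_KEY": "hf_example"})
print(settings["explain_enabled"])
```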
Base URL: https://semantic-code-search.onrender.com
| Endpoint | Method | Description | Avg Response Time |
|---|---|---|---|
| /api/upload | POST | Upload ZIP file and index code | ~20s (depends on file size) |
| /api/search | POST | Search indexed code with natural language | ~50-60ms |
| /api/stats | GET | Get indexing statistics | ~10ms |
| /health | GET | Check API health | ~5ms |
curl -X POST https://semantic-code-search.onrender.com/api/search \
-H "Content-Type: application/json" \
-d '{"query": "authentication functions", "top_k": 5}'
Response:
{
"results": [
{
"code": "def authenticate_user(username, password)...",
"file_path": "auth.py",
"start_line": 1,
"score": 0.87
}
],
"total_results": 3,
"search_time": 0.057
}
See full API documentation: API Docs
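The same search call can be made from Python with only the standard library. This sketch assumes the request and response shapes shown in the curl example; `build_search_request`, `summarize`, and `search_api` are hypothetical helper names, and only `search_api` performs a network call.

```python
import json
import urllib.request

BASE_URL = "https://semantic-code-search.onrender.com"


def build_search_request(query: str, top_k: int = 5) -> urllib.request.Request:
    """Prepare the POST request for /api/search without sending it."""
    payload = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(response: dict) -> list[str]:
    """Turn a search response into one short line per hit."""
    return [
        f'{hit["file_path"]}:{hit["start_line"]} (score {hit["score"]:.2f})'
        for hit in response["results"]
    ]


def search_api(query: str, top_k: int = 5) -> list[str]:
    """Send the request and summarise the hits (requires network access)."""
    request = build_search_request(query, top_k)
    with urllib.request.urlopen(request, timeout=30) as reply:
        return summarize(json.load(reply))


# Offline demonstration using the sample response from above.
sample = {
    "results": [
        {
            "code": "def authenticate_user(username, password)...",
            "file_path": "auth.py",
            "start_line": 1,
            "score": 0.87,
        }
    ],
    "total_results": 3,
    "search_time": 0.057,
}
print(summarize(sample))
```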
To prevent abuse and protect API quotas:
- Uploads: 10 per IP per hour
- Searches: 50 per IP per hour
Exceeding limits returns 429 Too Many Requests.
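A per-IP limit like this can be implemented in several ways. The sketch below shows one common approach, an in-memory sliding window, and is not necessarily what the project uses. `SlidingWindowLimiter` and its parameters are hypothetical, and the injectable `clock` exists only to make the behaviour easy to test.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit: int, window: float = 3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller would respond with 429 Too Many Requests
        hits.append(now)
        return True


# Limits taken from the list above: 10 uploads and 50 searches per hour.
upload_limiter = SlidingWindowLimiter(limit=10)
search_limiter = SlidingWindowLimiter(limit=50)

allowed = [upload_limiter.allow("203.0.113.7") for _ in range(11)]
print(allowed.count(True), allowed[-1])
```

Because the counters live in process memory, they reset on restart and are not shared between instances, which is why a shared store is suggested for multi-instance deployments.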
Free Tier Constraints:
- Database resets on server restart (ephemeral storage)
- 512MB RAM limit on Render free tier
- HuggingFace free tier: ~30k API calls/month
- No authentication (public demo)
For Production:
- Add persistent disk storage ($7/month on Render)
- Implement user authentication
- Use dedicated embedding server or local models
- Add Redis for rate limiting across instances
Contributions welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing)
- Open a Pull Request
MIT License - see LICENSE file for details
- HuggingFace for free inference API
- ChromaDB for vector database
- Sentence Transformers for embeddings
- FastAPI for backend framework
Created by @tommyvio
Note: This is a demo project. On the free tier the database uses ephemeral storage and resets whenever the server restarts. For production use, upgrade to persistent storage.