Semantic Code Search

AI-powered natural language search for codebases using vector embeddings

Live Demo

Try it now: semantic-code-search.vercel.app

Upload a ZIP file of your code and search using natural language queries like:

"authentication functions"
"database error handling"
"JWT token generation"

Features

Semantic Understanding - Search by meaning, not just keywords
Lightning Fast - Search results in ~50-60ms (vector similarity search)
ZIP Upload - Drag & drop your codebase (no GitHub integration needed)
Syntax Highlighting - Beautiful code display with Prism.js
Rate Limited - Protected against abuse (10 uploads/hour, 50 searches/hour)
Multi-Language - Python, JavaScript, TypeScript, Go, Java, Rust, C++, C

Architecture

Tech Stack

Backend:

FastAPI (Python 3.11)
ChromaDB (vector database)
HuggingFace Inference API (embeddings)
Sentence Transformers (all-MiniLM-L6-v2)

Frontend:

React 18 + TypeScript
Vite (build tool)
TailwindCSS (styling)
Lucide React (icons)

Deployment:

Backend: Render (free tier)
Frontend: Vercel
Database: Ephemeral (resets on restart)

How It Works

Upload - User uploads ZIP file containing code
Extract - Backend extracts and scans for supported file types
Chunk - Code is split into semantic chunks (functions/blocks)
Embed - HuggingFace API generates 384-dim vectors for each chunk
Store - Vectors stored in ChromaDB with metadata
Search - User query is embedded and similarity search finds matches
Results - Ranked results with score, file path, and syntax highlighting
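The chunk and search steps above can be illustrated with a short Python sketch. It is not the project's backend code. It assumes Python files are chunked at top-level `def`/`class` boundaries using the standard `ast` module, and it ranks chunks by cosine similarity over vectors that have already been computed. The names `Chunk`, `chunk_python_source`, `cosine` and `top_k` are hypothetical. In the real system the vectors come from the HuggingFace API and ChromaDB performs the nearest-neighbour search; the distance metric it is configured with is not stated here.

```python
import ast
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    file_path: str
    start_line: int  # 1-based, like "start_line" in the search response
    code: str


def chunk_python_source(file_path: str, source: str) -> list[Chunk]:
    """Hypothetical chunker: one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Include decorators; ast line numbers are 1-based and end_lineno is inclusive
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            code = "\n".join(lines[start - 1 : node.end_lineno])
            chunks.append(Chunk(file_path, start, code))
    return chunks


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: list[float],
    chunk_vecs: list[list[float]],
    chunks: list[Chunk],
    k: int = 5,
) -> list[tuple[float, Chunk]]:
    """Brute-force ranking of chunks by similarity to the query, highest score first."""
    scored = [(cosine(query_vec, v), c) for v, c in zip(chunk_vecs, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Chunking on whole functions keeps each hit self-contained and lets a result map directly to a file path and start line. Languages other than Python would need their own parser or a simpler line-based splitter.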

Use Cases

Code Review - "Find all authentication checks"
Onboarding - "Show me database connection code"
Refactoring - "Where do we handle errors?"
Learning - "How is JWT implemented here?"

Getting Started

Prerequisites

Python 3.11+
Node.js 18+
HuggingFace API key (free tier)

Local Development

1. Clone the repository

```
git clone https://github.com/tommyvio/semantic-code-search.git
cd semantic-code-search
```

2. Backend Setup

```
cd backend
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your HUGGINGFACE_API_KEY
uvicorn app.main:app --reload
```

3. Frontend Setup

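In a second terminal (the backend server keeps running in the first), from the repository root:

```
cd frontend
npm install
cp .env.example .env
# Edit .env: VITE_API_URL=http://localhost:8000
npm run dev
```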

Visit http://localhost:5173
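To confirm the backend is reachable before opening the UI, a stdlib-only check against the `/health` endpoint might look like the sketch below. The function name is hypothetical, and it only checks for an HTTP 200 status because the body returned by `/health` is not documented here.

```python
import urllib.request


def backend_is_healthy(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts all derive from OSError
        return False
```

Calling `backend_is_healthy()` with no arguments targets the default `uvicorn` address used in step 2.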

AI Code Explanation (Optional)

The frontend includes an "Explain AI" button, but it requires your own API key to function.

To enable AI explanations:

Get a Gemini API key from Google AI Studio
Add it to the backend's .env file:
```
GEMINI_API_KEY=your_api_key_here
```
Restart the backend

This feature uses Google's Gemini API to explain code snippets in plain English.
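How the backend detects the key is not documented. A minimal sketch, assuming it reads `GEMINI_API_KEY` from the environment (for example after loading `.env`) and leaves explanations disabled when the variable is missing or blank; both function names are hypothetical.

```python
import os
from typing import Optional


def gemini_api_key() -> Optional[str]:
    """Return the configured Gemini key, or None if the feature should stay disabled."""
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    return key or None


def explanations_enabled() -> bool:
    # Hypothetical gate for the "Explain AI" feature
    return gemini_api_key() is not None
```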

API Documentation

Base URL: https://semantic-code-search.onrender.com

| Endpoint | Method | Description | Avg. response time |
|---|---|---|---|
| `/api/upload` | POST | Upload a ZIP file and index its code | ~20 s (depends on file size) |
| `/api/search` | POST | Search indexed code with natural language | ~50-60 ms |
| `/api/stats` | GET | Get indexing statistics | ~10 ms |
| `/health` | GET | Check API health | ~5 ms |
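The upload endpoint takes a ZIP archive. The sketch below builds one in memory from a local directory and posts it with only the standard library. Two details are assumptions, not documented behaviour: the extension list (guessed from the languages under Features) and the multipart form field name `"file"`. The function names are hypothetical.

```python
import io
import urllib.request
import uuid
import zipfile
from pathlib import Path

# Assumed extensions for the languages listed under Features; the backend's exact list is not documented
SUPPORTED_EXTENSIONS = {".py", ".js", ".jsx", ".ts", ".tsx", ".go", ".java", ".rs",
                        ".cpp", ".cc", ".hpp", ".c", ".h"}


def build_zip(root: Path) -> bytes:
    """Zip every supported source file under `root`, keeping paths relative to it."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file() and path.suffix in SUPPORTED_EXTENSIONS:
                zf.write(path, arcname=path.relative_to(root).as_posix())
    return buf.getvalue()


def upload_zip(base_url: str, zip_bytes: bytes, filename: str = "code.zip") -> bytes:
    """POST the archive as multipart/form-data; the field name "file" is an assumption."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/zip\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    req = urllib.request.Request(
        f"{base_url}/api/upload",
        data=head + zip_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:  # indexing can take ~20 s
        return resp.read()
```

Filtering on the client side keeps the archive small, which matters given the 512MB RAM limit on the free tier.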

Example: Search Request

```
curl -X POST https://semantic-code-search.onrender.com/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "authentication functions", "top_k": 5}'
```

Response (truncated to one of the three results):

```
{
  "results": [
    {
      "code": "def authenticate_user(username, password)...",
      "file_path": "auth.py",
      "start_line": 1,
      "score": 0.87
    }
  ],
  "total_results": 3,
  "search_time": 0.057
}
```
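The same request can be made from Python. This sketch uses only the standard library and relies solely on the request and response fields shown in the example above; the function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://semantic-code-search.onrender.com"


def search(query: str, top_k: int = 5, base_url: str = BASE_URL) -> dict:
    """POST a natural-language query to /api/search and return the decoded JSON."""
    payload = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def format_results(response: dict) -> list[str]:
    """Render each hit as 'path:line (score)' using the fields in the example response."""
    return [
        f'{r["file_path"]}:{r["start_line"]} ({r["score"]:.2f})'
        for r in response.get("results", [])
    ]
```

For example, `format_results(search("authentication functions"))` yields one line per hit. A rate-limited request surfaces as `urllib.error.HTTPError` with status 429.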

See full API documentation: API Docs

Rate Limiting

To prevent abuse and protect API quotas:

Uploads: 10 per IP per hour
Searches: 50 per IP per hour

Exceeding either limit returns HTTP 429 Too Many Requests.
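How the limits are enforced is not described here. One common approach is a per-IP sliding window, sketched below in memory. The class name is hypothetical, and an in-memory store only covers a single process, which is the gap the "Add Redis for rate limiting across instances" item under Limitations points to.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds for each key (e.g. a client IP)."""

    def __init__(self, limit: int, window: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._events: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        events = self._events[key]
        # Drop timestamps that have fallen out of the window
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False  # the caller would respond with HTTP 429
        events.append(now)
        return True


# Limits from this section: 10 uploads and 50 searches per IP per hour
uploads = SlidingWindowLimiter(limit=10)
searches = SlidingWindowLimiter(limit=50)
```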

Limitations

Free Tier Constraints:

Database resets on server restart (ephemeral storage)
512MB RAM limit on Render free tier
HuggingFace free tier: ~30k API calls/month
No authentication (public demo)

For Production:

Add persistent disk storage ($7/month on Render)
Implement user authentication
Use dedicated embedding server or local models
Add Redis for rate limiting across instances

Contributing

Contributions welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing)
Commit changes (git commit -m 'Add amazing feature')
Push to branch (git push origin feature/amazing)
Open a Pull Request

License

MIT License - see LICENSE file for details

Acknowledgments

HuggingFace for the free Inference API
ChromaDB for vector database
Sentence Transformers for embeddings
FastAPI for backend framework

Contact

Created by @tommyvio

Note: This is a demo project. The database resets periodically on the free tier. For production use, upgrade to persistent storage.
