NL2SQL is an enterprise-grade, multi-agent NL→SQL system that delivers accurate, safe, and deterministic SQL with schema retrieval, validation, and full observability.

Enterprise NL2SQL Engine

A Production-Grade Natural Language to SQL Engine built on the principles of Zero Trust and Deterministic Execution.

This platform treats "Text-to-SQL" not as a prompt engineering problem, but as a Distributed Systems problem. It replaces fragile one-shot generation with a robust, compiled pipeline that bridges the gap between Unstructured Intention (User Language) and Structured Execution (SQL Databases).


🏗️ System Topology

The architecture is composed of five distinct planes, ensuring separation of concerns and failure isolation.

1. The Control Plane (The Graph)

Responsibility: Reasoning, Planning, and Orchestration.

  • Agentic Graph: Implemented as a Directed Cyclic Graph (LangGraph) to enable "Refinement Loops". If a plan fails validation, the system self-corrects.
  • State Management: Deterministic state transitions ensure auditability and reproducibility of every decision.
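
A refinement loop of this kind can be sketched in plain Python. This is a minimal illustration only, not the engine's LangGraph code: `PlanState`, `run_refinement_loop`, and the toy planner and validator are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PlanState:
    """Hypothetical state passed between nodes; not the project's real schema."""
    question: str
    plan: Optional[str] = None
    errors: List[str] = field(default_factory=list)
    attempts: int = 0
    history: List[str] = field(default_factory=list)  # ordered audit trail


def run_refinement_loop(
    state: PlanState,
    plan_fn: Callable[[PlanState], str],
    validate_fn: Callable[[str], List[str]],
    max_attempts: int = 3,
) -> PlanState:
    """Plan, validate, and loop back to planning until valid or out of budget."""
    while state.attempts < max_attempts:
        state.attempts += 1
        state.plan = plan_fn(state)
        state.history.append(f"plan#{state.attempts}")
        state.errors = validate_fn(state.plan)
        state.history.append("invalid" if state.errors else "valid")
        if not state.errors:
            return state
    state.plan = None  # never hand an unvalidated plan downstream
    return state


def toy_planner(state: PlanState) -> str:
    # The first attempt omits a LIMIT; the retry reacts to validator feedback.
    if state.errors:
        return "SELECT name FROM users LIMIT 10"
    return "SELECT name FROM users"


def toy_validator(plan: str) -> List[str]:
    return [] if "LIMIT" in plan else ["missing LIMIT"]


result = run_refinement_loop(PlanState("list users"), toy_planner, toy_validator)
print(result.attempts, result.history)
```

The cycle is the edge from the validator back to the planner. Recording each transition in `history` is one simple way to keep every decision replayable.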

2. The Security Plane (The Firewall)

Responsibility: Invariant Enforcement.

  • Valid-by-Construction: The LLM never executes SQL directly. It generates an Abstract Syntax Tree (AST).
  • Static Analysis: The Validator Node enforces Row-Level Security (RLS) and type safety on the AST before compilation.
  • Intent Classification: Upstream detection of adversarial prompts (Jailbreaks/Injections).
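
To illustrate validating a plan before any SQL string exists, here is a small sketch under stated assumptions. The `SelectPlan` and `Filter` types, the `POLICY` table, and both functions are hypothetical and far simpler than a real plan AST; they do not reflect the project's Validator Node.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Filter:
    column: str
    op: str
    value: object


@dataclass(frozen=True)
class SelectPlan:
    """Hypothetical, heavily simplified plan AST (one table, AND-ed filters)."""
    table: str
    columns: Tuple[str, ...]
    filters: Tuple[Filter, ...] = ()


# Assumed policy shape: table -> (tenant column, readable columns).
POLICY = {"orders": ("tenant_id", {"id", "total", "tenant_id"})}
ALLOWED_OPS = {"=", "<", ">", "<=", ">="}


def validate(plan: SelectPlan, tenant: str) -> List[str]:
    """Return a list of violations; an empty list means the plan may be compiled."""
    if plan.table not in POLICY:
        return [f"table not allowed: {plan.table}"]
    tenant_col, readable = POLICY[plan.table]
    errors = [f"column not allowed: {c}" for c in plan.columns if c not in readable]
    errors += [
        f"filter not allowed: {f.column} {f.op}"
        for f in plan.filters
        if f.column not in readable or f.op not in ALLOWED_OPS
    ]
    scoped = any(
        f.column == tenant_col and f.op == "=" and f.value == tenant
        for f in plan.filters
    )
    if not scoped:
        errors.append(f"missing row-level filter on {tenant_col}")
    return errors


def compile_sql(plan: SelectPlan) -> Tuple[str, list]:
    """Compile a validated plan to parameterized SQL; values stay out of the string."""
    sql = f"SELECT {', '.join(plan.columns)} FROM {plan.table}"
    if plan.filters:
        sql += " WHERE " + " AND ".join(f"{f.column} {f.op} ?" for f in plan.filters)
    return sql, [f.value for f in plan.filters]


good = SelectPlan("orders", ("id", "total"), (Filter("tenant_id", "=", "acme"),))
bad = SelectPlan("orders", ("id", "ssn"))
print(validate(bad, "acme"))
print(compile_sql(good))
```

Because identifiers are checked against an allowlist and values travel as bind parameters, the compiler in this sketch only ever sees plans that have passed `validate`.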

3. The Data Plane (The Sandbox)

Responsibility: Semantic Search and Execution.

  • Blast Radius Isolation: SQL Drivers (ODBC/C-Ext) run in a dedicated Sandboxed Process Pool. A segfault in a driver kills a disposable worker, not the Agent.
  • Partitioned Retrieval: The Orchestrator uses Partitioned MMR (Maximal Marginal Relevance) to inject only relevant, non-redundant schema context, preventing context window overflow.
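
For readers unfamiliar with MMR (Maximal Marginal Relevance), the sketch below shows the plain, unpartitioned selection rule on toy two-dimensional embeddings. How the engine partitions candidates is not described here, so that part is omitted; `mmr_select`, the table names, and the vectors are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedily pick k items, trading relevance to the query against redundancy."""
    selected: List[str] = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        def score(name: str) -> float:
            relevance = cosine(query, remaining[name])
            redundancy = max(
                (cosine(remaining[name], candidates[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy embeddings: "orders_backup" is nearly a duplicate of "orders".
schema = {
    "orders": [1.0, 0.05],
    "orders_backup": [1.0, 0.0],
    "customers": [0.0, 1.0],
}
query_vec = [0.8, 0.6]

print(mmr_select(query_vec, schema, k=2, lam=0.5))  # balances relevance and diversity
print(mmr_select(query_vec, schema, k=2, lam=1.0))  # pure relevance ranking
```

With `lam=1.0` the rule degenerates to ranking by relevance alone, so the near-duplicate table takes a slot that a more diverse pick would have used.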

4. The Reliability Plane (The Guard)

Responsibility: Fault Tolerance and Stability.

  • Layered Defense: A combination of Retries, Circuit Breakers, and Sandboxing keeps the engine itself responsive, returning controlled errors instead of hanging, even when LLMs or Databases go down.
  • Fail-Fast: We stop processing immediately if a dependency is unresponsive, preserving resources.
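
The fail-fast behavior of a circuit breaker can be sketched as follows. This is a generic, minimal version of the pattern, not the engine's implementation; the class name, thresholds, and the injectable `clock` parameter are assumptions made for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected without touching the dependency."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock  # injectable so the breaker can be tested without sleeping
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # a success closes the circuit again
        return result


# Usage with a fake clock and a dependency that always times out.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_after=10.0, clock=lambda: now[0])


def flaky_db() -> str:
    raise TimeoutError("db down")


for _ in range(2):
    try:
        breaker.call(flaky_db)
    except TimeoutError:
        pass  # after the second failure the circuit is open
```

While the circuit is open, callers get `CircuitOpenError` immediately and the unhealthy dependency receives no traffic, which is what stops retry storms from compounding an outage.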

5. The Observability Plane (The Watchtower)

Responsibility: Visibility, Forensics, and Compliance.

  • Full-Stack Telemetry: Native OpenTelemetry integration provides distributed tracing (Jaeger) and metrics (Prometheus) for every node execution.
  • Forensic Audit Logs: A tamper-evident, persistent Audit Log records every AI decision (Prompt/Response/Reasoning) for compliance and debugging.
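
One common way to make a log tamper-evident is a hash chain, where each entry commits to the hash of the previous one. The sketch below assumes that technique purely for illustration; this README does not say how the Audit Log achieves tamper evidence, and the entry layout and function names are hypothetical. Persistence is left out.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: Dict[str, str]) -> str:
    # Canonical JSON (sorted keys) so the same payload always hashes the same.
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append_entry(log: List[dict], payload: Dict[str, str]) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append(
        {"payload": payload, "prev": prev_hash, "hash": _digest(prev_hash, payload)}
    )


def verify(log: List[dict]) -> bool:
    """Recompute the chain; any edited, removed, or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: List[dict] = []
append_entry(audit_log, {"node": "planner", "prompt": "list users", "response": "plan-1"})
append_entry(audit_log, {"node": "validator", "prompt": "plan-1", "response": "ok"})
print(verify(audit_log))
```

Editing any stored payload after the fact changes its digest, so `verify` fails from that entry onward.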

📐 Architectural Invariants

| Invariant | Rationale | Mechanism |
| --- | --- | --- |
| No Unvalidated SQL | Prevent Hallucinations & Data Leaks | All plans pass through LogicalValidator (AST) + PhysicalValidator (Dry Run) before execution. |
| Zero Shared State | Crash Safety | Execution happens in isolated processes; no shared memory with the Control Plane. |
| Fail-Fast | Reliability | Circuit Breakers and Strict Timeouts prevent cascading failures (Retry Storms). |
| Determinism | Debuggability | Temperature-0 generation + Strict Typing (Pydantic) for all LLM outputs. |

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Docker (Optional, for full integration environment)

1. Installation

git clone https://github.com/nadeem4/nl2sql.git
cd nl2sql

# Set up environment
python -m venv venv
source venv/bin/activate

# Install Core Engine, CLI & Adapter SDK (editable mode)
pip install -e packages/core
pip install -e packages/cli
pip install -e packages/adapter-sdk

2. Run Demo (Lite Mode)

Boot the engine with an in-memory SQLite database (No Docker required).

nl2sql setup --demo

📚 Technical Documentation


📦 Repository Structure

packages/
├── core/               # The Engine (Graph, State, Logic)
├── cli/                # Terminal Interface & Ops Tools
├── adapter-sdk/        # Interface Contract for new Databases
└── adapters/           # Official Dialects (Postgres, MSSQL, MySQL)
configs/                # Runtime Configuration (Policies, Prompts)
docs/                   # Architecture & Operations Manual
