The AI-Native Web Data Infrastructure for Developers & Agents
Thordata is the next-generation web scraping and proxy infrastructure designed for the AI era. While traditional providers focus on manual scraping, we build pipelines that feed data directly into LLMs, RAG systems, and AI Agents.
We process more than 100M requests daily with a focus on speed, success rates, and developer experience.
- AI-First Architecture: Native support for MCP (Model Context Protocol) and LangChain.
- Unblockable Infrastructure: Proprietary Web Unlocker technology that handles CAPTCHAs, browser fingerprinting, and JS rendering automatically.
- Massive IP Network: Ethical access to 60M+ Residential, Mobile, and ISP IPs in 181+ locations.
- Developer Centric: Modern SDKs, strictly typed responses, and "Copy-Paste" ready examples.
Empower your AI agents to browse the web and fetch real-time context.
| Repository | Description | Status |
|---|---|---|
| thordata-mcp-server | 🤖 AI Bridge: Connect Claude Desktop / OpenAI directly to real-world web data via MCP. | ✅ Stable |
| thordata-rag-pipeline | 🔍 RAG Ready: Clean, structured data extraction pipeline optimized for Vector Databases. | 🚧 Beta |
| thordata-langchain-tools | 🦜🔗 LangChain: Official tools to turn Thordata into a web-browsing tool for your agents. | 🚧 TBD |
Type-safe, robust, and production-ready libraries for your stack.
| Language | Repository | Features |
|---|---|---|
| Python | thordata-python-sdk | The flagship SDK. Async support, full typing, deeply integrated with Pandas/AI stacks. |
| Node.js | thordata-js-sdk | TypeScript ready. Perfect for serverless and puppeteer/playwright integrations. |
| Go | thordata-go-sdk | High-concurrency client for enterprise-grade scraping systems. |
| Java | thordata-java-sdk | Enterprise compliant, thread-safe implementation. |
From raw HTML to structured JSON, we handle the complexity.
- SERP API: Real-time search results from Google, Bing, Yandex (Search, Shopping, Maps, News).
- Web Scraper API: "Swiss Army Knife" for any URL. Handles rendering, waiting, and extraction.
- Scraping Browser: Headless browsers hosted on our cloud. Connect via CDP/Selenium/Puppeteer.
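To illustrate what structured JSON makes possible downstream, here is a minimal sketch that flattens a SERP-style response into ranked rows. The response shape (an `organic_results` list whose items carry `title` and `link`) is an assumption based on the field names used in this README's quickstart, and `flatten_organic` is a hypothetical helper, not part of any Thordata SDK.

```python
from typing import Any


def flatten_organic(results: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one {'rank', 'title', 'link'} row per organic result.

    Assumes `results` looks like {"organic_results": [{"title": ..., "link": ...}]}.
    """
    rows = []
    for rank, item in enumerate(results.get("organic_results", []), start=1):
        # Skip entries without a link instead of raising KeyError.
        link = item.get("link")
        if not link:
            continue
        rows.append({"rank": rank, "title": item.get("title", ""), "link": link})
    return rows


if __name__ == "__main__":
    # Hand-written sample data, not a real API response.
    sample = {
        "organic_results": [
            {"title": "First", "link": "https://a.example"},
            {"title": "No link here"},
            {"title": "Third", "link": "https://c.example"},
        ]
    }
    for row in flatten_organic(sample):
        print(row["rank"], row["title"], row["link"])
```

Rows keep their original position as `rank`, so a skipped entry leaves a visible gap rather than silently renumbering the results.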
Install our flagship SDK:

```bash
pip install thordata
```

Scenario: Search Google for "AI Agents" and get JSON results

```python
import os

from thordata import ThorClient

# Initialize with your tokens
client = ThorClient(
    scraper_token=os.getenv("THORDATA_SCRAPER_TOKEN"),
    public_token=os.getenv("THORDATA_PUBLIC_TOKEN"),
    public_key=os.getenv("THORDATA_PUBLIC_KEY")
)

# 1. SERP Search (Google)
results = client.serp.search(
    engine="google",
    q="AI Agents using Web Data",
    location="United States",
    num=5
)
for item in results.get('organic_results', []):
    print(f"Title: {item['title']}")
    print(f"Link: {item['link']}")

# 2. Universal Scrape (Any URL)
html_content = client.universal.request(
    url="https://www.example.com",
    js_render=True,
    country="us"
)
```

We provide the foundation for anonymous web access.
| Type | Repository / Docs | Use Case |
|---|---|---|
| Residential | Docs | 60M+ IPs. Perfect for high-trust scraping (Social, E-commerce). |
| Datacenter | Docs | High speed, low cost. Best for market intelligence. |
| ISP | Docs | Static residential IPs. Keep the same session for banking/login flows. |
| Mobile | Docs | 3G/4G/5G IPs for mobile-only app verification. |
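For readers who have not routed traffic through a proxy before, the following is a minimal sketch using only the Python standard library. The host `proxy.example.com`, port `8000`, and the credentials are placeholders, not real Thordata endpoints; take the actual values and any session or country parameters from the Docs linked above.

```python
import urllib.request
from urllib.parse import quote


def build_proxy_url(username: str, password: str, host: str, port: int) -> str:
    """Build an HTTP proxy URL with embedded basic-auth credentials."""
    # Percent-encode credentials so characters like '@' or ':' cannot break the URL.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"http://{user}:{secret}@{host}:{port}"


def make_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    # Placeholder endpoint and credentials (assumptions for illustration only).
    proxy_url = build_proxy_url("USERNAME", "PASSWORD", "proxy.example.com", 8000)
    opener = make_opener(proxy_url)
    # With real credentials, this would fetch the page via the proxy:
    # body = opener.open("https://www.example.com", timeout=30).read()
    print(proxy_url)
```

Building the opener explicitly, rather than setting environment variables, keeps the proxy scoped to the requests that need it.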
We are building for developers.
- 🐛 Found a bug? Open an issue in the respective repository.
- 💡 Feature Request? Check our Roadmap or discuss in Discussions.
- 📧 Enterprise Inquiry? Contact business@thordata.com for custom plans (>1TB/month).