OSINT tool that combines passive infrastructure discovery, document metadata extraction, organizational intelligence gathering, and LLM-powered analysis.
- Passive enumeration via bbot — discovers subdomains, IP addresses, open ports, technologies, email addresses, and cloud storage buckets
- Google dorking via Serper.dev API — finds exposed documents, directory listings, error pages, and sensitive files
- Document metadata extraction — downloads discovered documents and extracts author names, internal paths, software versions, and email addresses from PDF, DOCX, XLSX, PPTX, DOC, XLS, and PPT files
- Organizational reconnaissance — answers six intelligence questions about the target (leadership, business model, history, locations, revenue, corporate structure) using web search and SEC EDGAR filings
- LLM-powered analysis — optional AI analysis of the collected findings, producing an executive summary, risk assessment, and technical recommendations
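For the Office Open XML formats (DOCX, XLSX, PPTX), several of the fields listed above live in two small XML parts inside the ZIP container: `docProps/core.xml` (author, last editor, timestamps) and `docProps/app.xml` (producing application and version). The sketch below reads them using only the standard library. It is an illustration, not reconagent's extractor: the function name `extract_ooxml_metadata` is hypothetical, and PDF and the legacy DOC/XLS/PPT formats need different parsers.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the OOXML core and extended (app) property parts.
CORE_NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
APP_NS = {"ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"}

# (part name, namespace map, {output key: path relative to the part's root element})
PARTS = [
    ("docProps/core.xml", CORE_NS, {
        "author": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
        "created": "dcterms:created",
        "modified": "dcterms:modified",
    }),
    ("docProps/app.xml", APP_NS, {
        "application": "ep:Application",
        "app_version": "ep:AppVersion",
    }),
]


def extract_ooxml_metadata(data: bytes) -> dict[str, str]:
    """Return author and software fields from DOCX/XLSX/PPTX bytes (hypothetical helper)."""
    result: dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        for part, ns, fields in PARTS:
            if part not in names:
                continue  # property parts are optional in OOXML packages
            root = ET.fromstring(zf.read(part))
            for key, path in fields.items():
                elem = root.find(path, ns)
                if elem is not None and elem.text:
                    result[key] = elem.text.strip()
    return result
```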
- Python 3.12+
- uv package manager
- A Serper.dev API key (free tier: 2,500 queries) — required for dorking and orgrecon
- An LLM API key (Anthropic, OpenAI, etc.) — optional, needed for orgrecon and AI analysis
```bash
git clone https://github.com/dejisec/reconagent.git && cd reconagent
uv sync
```

Copy the example environment file and update the variables:

```bash
cp .env.example .env
```

### General
| Variable | Default | Description |
|---|---|---|
| `RECONAGENT_SERPER_API_KEY` | (required) | Serper.dev API key for dorking and orgrecon |
| `RECONAGENT_OUTPUT_DIR` | `./reports` | Output directory for HTML reports |
| `RECONAGENT_MAX_DOWNLOAD_SIZE_MB` | `50` | Maximum file size to download (MB) |
| `RECONAGENT_DOWNLOAD_TIMEOUT` | `30` | Download timeout (seconds) |
| `RECONAGENT_MAX_CONCURRENT_DOWNLOADS` | `5` | Parallel download limit (1–20) |
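`RECONAGENT_MAX_CONCURRENT_DOWNLOADS` and `RECONAGENT_DOWNLOAD_TIMEOUT` describe a common asyncio pattern: a semaphore caps how many downloads run at once, and each download gets its own timeout. A minimal sketch, assuming the caller supplies an async `fetch(url)` coroutine; `download_all` and `fetch` are hypothetical names, and reconagent may implement this differently.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable


async def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[object]],
    max_concurrent: int = 5,
    timeout: float = 30.0,
) -> dict[str, object]:
    """Run fetch(url) for every URL, at most `max_concurrent` at a time.

    Maps each URL to its result, or to None if that fetch timed out.
    """
    if not 1 <= max_concurrent <= 20:  # mirrors the documented 1–20 range
        raise ValueError("max_concurrent must be between 1 and 20")
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url: str) -> tuple[str, object]:
        async with semaphore:
            try:
                # The timeout covers only the fetch, not time spent queued.
                return url, await asyncio.wait_for(fetch(url), timeout)
            except asyncio.TimeoutError:
                return url, None

    pairs = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(pairs)
```

Because the timeout wraps only the fetch, a URL waiting for a free slot is not penalized for the time it spends in the queue.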
### bbot

| Variable | Default | Description |
|---|---|---|
| `RECONAGENT_BBOT_EXCLUDE_FLAGS` | `["slow","active","deadly"]` | bbot module flags to exclude |
| `RECONAGENT_BBOT_REQUIRE_FLAGS` | `["passive"]` | bbot module flags to require |
| `RECONAGENT_BBOT_MAX_SCAN_DURATION` | `600` | Maximum bbot scan time (seconds) |
Optional bbot module API keys (e.g. VirusTotal, Shodan, Censys, SecurityTrails) can enhance scan results. See .env.example for the full list.
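The two flag variables correspond to bbot's `-rf` (require flags) and `-ef` (exclude flags) command-line options. The sketch below builds such a command from the environment. It assumes the list variables are JSON arrays, as their defaults suggest; the helper `bbot_command` is hypothetical, and reconagent might drive bbot through its Python API rather than the CLI.

```python
import json
import os
from collections.abc import Mapping


def bbot_command(target: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Build a passive bbot CLI invocation from RECONAGENT_BBOT_* variables (hypothetical helper)."""
    # Assumption: the variables hold JSON arrays, matching the documented defaults.
    exclude = json.loads(env.get("RECONAGENT_BBOT_EXCLUDE_FLAGS", '["slow","active","deadly"]'))
    require = json.loads(env.get("RECONAGENT_BBOT_REQUIRE_FLAGS", '["passive"]'))
    cmd = ["bbot", "-t", target]
    if require:
        cmd += ["-rf", *require]  # only modules carrying every required flag
    if exclude:
        cmd += ["-ef", *exclude]  # drop modules carrying any excluded flag
    return cmd
```

With the defaults, `bbot_command("example.com", env={})` returns `["bbot", "-t", "example.com", "-rf", "passive", "-ef", "slow", "active", "deadly"]`. A cap like `RECONAGENT_BBOT_MAX_SCAN_DURATION` could then be enforced by passing `timeout=` to `subprocess.run`.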
### Dorking

| Variable | Default | Description |
|---|---|---|
| `RECONAGENT_DORKING_REQUEST_DELAY` | `1.0` | Delay between Serper API calls (seconds) |
| `RECONAGENT_DORKING_MAX_RESULTS_PER_QUERY` | `10` | Maximum results per dork query (1–100) |
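Serper.dev exposes Google search as a JSON API: a POST to `https://google.serper.dev/search` with an `X-API-KEY` header and a body such as `{"q": ..., "num": ...}`, with web hits returned under `organic`. The sketch below pairs that call with the two dorking settings, pausing `delay` seconds between queries. The dork templates and helper names are illustrative assumptions, not reconagent's actual query set.

```python
import json
import time
import urllib.request

SERPER_URL = "https://google.serper.dev/search"

# Illustrative dork templates (assumption): reconagent's real query list is not documented here.
DORK_TEMPLATES = [
    "site:{domain} filetype:pdf",
    "site:{domain} filetype:docx",
    'site:{domain} intitle:"index of"',
]


def build_dorks(domain: str) -> list[str]:
    """Expand each template for the target domain."""
    return [template.format(domain=domain) for template in DORK_TEMPLATES]


def serper_request(query: str, api_key: str, num: int = 10) -> urllib.request.Request:
    """Build (but do not send) a Serper search request."""
    if not 1 <= num <= 100:  # mirrors the documented 1–100 range
        raise ValueError("num must be between 1 and 100")
    body = json.dumps({"q": query, "num": num}).encode("utf-8")
    return urllib.request.Request(
        SERPER_URL,
        data=body,
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def run_dorks(domain: str, api_key: str, num: int = 10, delay: float = 1.0) -> dict[str, list]:
    """Send every dork sequentially, sleeping `delay` seconds between calls."""
    results: dict[str, list] = {}
    for i, query in enumerate(build_dorks(domain)):
        if i:
            time.sleep(delay)  # RECONAGENT_DORKING_REQUEST_DELAY
        with urllib.request.urlopen(serper_request(query, api_key, num)) as resp:
            results[query] = json.load(resp).get("organic", [])
    return results
```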
### Organizational Recon

| Variable | Default | Description |
|---|---|---|
| `RECONAGENT_ORGRECON_MAX_SEARCH_RESULTS` | `10` | Maximum Serper results per question query (1–100) |
| `RECONAGENT_ORGRECON_MAX_EXTRACT_URLS` | `8` | Maximum web pages to extract per question (1–20) |
| `RECONAGENT_ORGRECON_MAX_CONTENT_LENGTH` | `15000` | Maximum characters per extracted page (1,000–100,000) |
| `RECONAGENT_ORGRECON_SERPER_DELAY` | `0.5` | Delay between Serper API calls (seconds) |
| `RECONAGENT_ORGRECON_EDGAR_DELAY` | `0.2` | Delay between SEC EDGAR API calls (seconds) |
| `RECONAGENT_ORGRECON_SKIP_EDGAR` | `false` | Skip SEC EDGAR lookups (for private companies) |
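SEC EDGAR's JSON APIs are keyed by a zero-padded, 10-digit CIK: filing history comes from `data.sec.gov/submissions/` and XBRL financial facts from `data.sec.gov/api/xbrl/companyfacts/`. The sketch below builds those URLs, fetches them after a fixed delay (the role `RECONAGENT_ORGRECON_EDGAR_DELAY` appears to play), and truncates extracted page text to `RECONAGENT_ORGRECON_MAX_CONTENT_LENGTH`. Helper names and the User-Agent string are placeholders, and the CIK resolution step is not shown. The SEC asks automated clients to identify themselves with contact details and to stay under 10 requests per second.

```python
import json
import time
import urllib.request

# Placeholder User-Agent: SEC asks automated clients to identify themselves
# with a name and contact address. Replace with real details.
EDGAR_HEADERS = {"User-Agent": "example-org security-research contact@example.com"}


def pad_cik(cik) -> str:
    """Normalize a CIK (int or string, padded or not) to 10 digits."""
    return str(int(cik)).zfill(10)


def submissions_url(cik) -> str:
    """Filing history (10-K, 10-Q, DEF 14A, ...) for one company."""
    return f"https://data.sec.gov/submissions/CIK{pad_cik(cik)}.json"


def companyfacts_url(cik) -> str:
    """All XBRL financial facts reported by one company."""
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{pad_cik(cik)}.json"


def fetch_edgar_json(url: str, delay: float = 0.2) -> dict:
    """Sleep `delay` seconds (RECONAGENT_ORGRECON_EDGAR_DELAY), then GET and decode JSON."""
    time.sleep(delay)
    request = urllib.request.Request(url, headers=EDGAR_HEADERS)
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def truncate_page(text: str, max_chars: int = 15000) -> str:
    """Cap extracted page text (RECONAGENT_ORGRECON_MAX_CONTENT_LENGTH)."""
    if not 1_000 <= max_chars <= 100_000:
        raise ValueError("max_chars must be between 1,000 and 100,000")
    return text[:max_chars]
```

At the default 0.2-second delay, sequential calls stay around five per second, inside the SEC's limit.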
### LLM Analysis

| Variable | Default | Description |
|---|---|---|
| `RECONAGENT_LLM_PROVIDER` | `none` | LLM provider: `anthropic`, `openai`, `openai-compat`, `ollama`, or `none` |
| `RECONAGENT_LLM_MODEL` | `""` | Model identifier (e.g. `claude-sonnet-4-5-20250929`, `gpt-4o-mini`) |
| `RECONAGENT_LLM_API_KEY` | `""` | API key for the LLM provider (not needed for Ollama) |
| `RECONAGENT_LLM_BASE_URL` | `""` | Base URL for OpenAI-compatible or Ollama providers |
| `RECONAGENT_LLM_MAX_TOKENS` | `4096` | Maximum tokens per LLM response (256–32,768) |
| `RECONAGENT_LLM_TIMEOUT` | `120` | Timeout per LLM call in seconds (10–600) |
| `RECONAGENT_LLM_TEMPERATURE` | `0.2` | Sampling temperature (0.0–2.0; lower is more deterministic) |
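The LLM variables and their documented ranges translate naturally into a small, validated settings object. A standard-library sketch: the `LLMSettings` class is hypothetical, and requiring an API key only for `anthropic` and `openai` is an assumption drawn from the table's note that Ollama needs none.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

PROVIDERS = {"anthropic", "openai", "openai-compat", "ollama", "none"}
# Assumption: only the hosted providers strictly need a key.
KEY_REQUIRED = {"anthropic", "openai"}


@dataclass(frozen=True)
class LLMSettings:
    provider: str = "none"
    model: str = ""
    api_key: str = ""
    base_url: str = ""
    max_tokens: int = 4096
    timeout: int = 120
    temperature: float = 0.2

    def validate(self) -> None:
        """Enforce the ranges documented in the configuration table."""
        if self.provider not in PROVIDERS:
            raise ValueError(f"unknown provider: {self.provider!r}")
        if not 256 <= self.max_tokens <= 32_768:
            raise ValueError("max_tokens must be in 256..32768")
        if not 10 <= self.timeout <= 600:
            raise ValueError("timeout must be in 10..600 seconds")
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be in 0.0..2.0")
        if self.provider in KEY_REQUIRED and not self.api_key:
            raise ValueError(f"{self.provider} requires RECONAGENT_LLM_API_KEY")

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "LLMSettings":
        """Read RECONAGENT_LLM_* variables, falling back to the documented defaults."""
        p = "RECONAGENT_LLM_"
        settings = cls(
            provider=env.get(p + "PROVIDER", "none").strip().lower(),
            model=env.get(p + "MODEL", ""),
            api_key=env.get(p + "API_KEY", ""),
            base_url=env.get(p + "BASE_URL", ""),
            max_tokens=int(env.get(p + "MAX_TOKENS", "4096")),
            timeout=int(env.get(p + "TIMEOUT", "120")),
            temperature=float(env.get(p + "TEMPERATURE", "0.2")),
        )
        settings.validate()
        return settings
```

Validating once at startup surfaces a bad range or a missing key before any recon work begins, rather than partway through a long scan.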
```bash
# Full recon (passive infrastructure + orgrecon + LLM analysis)
uv run reconagent example.com

# Infrastructure only (no orgrecon, no LLM)
uv run reconagent example.com --skip-orgrecon --skip-llm

# Orgrecon only (skip bbot and dorking)
uv run reconagent example.com --skip-bbot --skip-dorking
```

- bbot — Passive reconnaissance discovers subdomains, IPs, open ports, technologies, emails, and storage buckets.
- Google Dorking — Queries Serper.dev with targeted dork queries (filetype, sensitive keywords, directory listings, error pages) to find exposed documents.
- Document Download — Downloads discovered documents that have extractable file types, with size limits and content-type validation.
- Metadata Extraction — Extracts author names, internal paths, software versions, emails, and timestamps from PDF, Office, and legacy Office formats.
- LLM Analysis — Optional AI-powered analysis of infrastructure findings using the configured LLM provider. Produces an executive summary, risk assessment, and technical recommendations.
- SEC EDGAR Lookup — Resolves the target company's CIK and ticker via EDGAR full-text search. Pulls 10-K, 10-Q, and DEF 14A filings plus XBRL financial facts. Skipped for private companies.
- Six-Question Intelligence Gathering — For each question (leadership, business, history, locations, revenue, structure): generates targeted search queries, executes them via Serper, extracts full-text content from top results, fetches question-specific EDGAR data, and synthesises a structured answer via LLM with source citations and confidence ratings.
- Risk Scoring — Applies category-specific scoring rules to all collected data and generates prioritised findings and actionable recommendations.
- Report Generation — Renders a self-contained HTML report combining infrastructure findings, org intelligence, risk tables, and executive summary.
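The Document Download step's size limit and content-type validation can be checked from response headers before the body is fetched, for example after a HEAD request. A minimal sketch, assuming a header mapping, the registered MIME types for the seven formats the extractor handles, and 1 MB = 1024 × 1024 bytes; `should_download` is a hypothetical name, and tolerating `application/octet-stream` is a judgment call because many servers send it for any binary file.

```python
import posixpath
from collections.abc import Mapping
from urllib.parse import urlparse

# Registered MIME types for the formats the metadata extractor handles.
EXTRACTABLE = {
    ".pdf": {"application/pdf"},
    ".docx": {"application/vnd.openxmlformats-officedocument.wordprocessingml.document"},
    ".xlsx": {"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"},
    ".pptx": {"application/vnd.openxmlformats-officedocument.presentationml.presentation"},
    ".doc": {"application/msword"},
    ".xls": {"application/vnd.ms-excel"},
    ".ppt": {"application/vnd.ms-powerpoint"},
}
# Many servers label every binary download this way, so it is tolerated.
GENERIC_TYPES = {"application/octet-stream"}


def should_download(url: str, headers: Mapping[str, str], max_size_mb: int = 50) -> bool:
    """Decide from the URL and response headers whether a document is worth fetching."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    if ext not in EXTRACTABLE:
        return False
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    content_type = h.get("content-type", "").split(";")[0].strip().lower()
    if content_type and content_type not in EXTRACTABLE[ext] | GENERIC_TYPES:
        return False  # e.g. an HTML error page served at a .pdf URL
    length = h.get("content-length", "").strip()
    if length.isdigit() and int(length) > max_size_mb * 1024 * 1024:
        return False  # RECONAGENT_MAX_DOWNLOAD_SIZE_MB
    return True
```

Because `Content-Length` can be missing or wrong, a real downloader would still stop reading once the stream passes the limit.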