raria2

A wrapper for aria2 to mirror open directories.

This CLI tool tries to emulate the same behavior of wget --recursive, but with a couple of filters, checks and caching and by using aria2c to perform the download of resources.

The crawler/link extraction works on HTTP(S) directory listings and HTML pages; individual resources can be downloaded from any scheme supported by aria2 (e.g. HTTP(S), FTP/FTPS).

Compile

go build .

Usage

Usage: raria2 [--output OUTPUT] [--dry-run] [--max-connection-per-server CONNECTIONS] [--max-concurrent-downloads DOWNLOADS] [--threads THREADS] [--aria2-session-size SIZE] [--max-depth DEPTH] [--accept EXT] [--reject EXT] [--accept-filename GLOB] [--reject-filename GLOB] [--case-insensitive-paths] [--accept-path PATTERN] [--reject-path PATTERN] [--visited-cache FILE] [--write-batch FILE] [--http-timeout DURATION] [--user-agent UA] [--log-level LEVEL] [--rate-limit RATE] URL [-- ARIA2_OPTS...]

Positional arguments:
  URL                    The URL from where to fetch the resources from
  ARIA2_OPTS             Options forwarded to aria2c after the URL (use -- before them if
                         they look like flags)

Options:
  --output OUTPUT, -o OUTPUT
                         Output directory. If omitted, raria2 mirrors into
                         <host>/<path>/ derived from the URL (similar to wget)
  --dry-run, -d          Dry Run [default: false]
  --max-connection-per-server, -x
                         Parallel connections per download [default: 5]
  --max-concurrent-downloads, -j
                         Maximum concurrent downloads [default: 5]
  --threads THREADS, -w THREADS
                         Concurrent crawler threads [default: 5]
  --aria2-session-size SIZE
                         Number of links to feed a single aria2 process before
                         closing stdin and restarting it. Defaults to 100 entries;
                         set to 0 to keep a single session. See "Aria2 stdin bug
                         workaround" below for details.
                         [default: 100]
  --max-depth DEPTH      Maximum HTML depth to crawl (-1 for unlimited) [default: -1]
  --accept EXT           Comma-separated list(s) of extensions to include (no dot, case-insensitive)
  --reject EXT           Comma-separated list(s) of extensions to exclude
  --accept-filename GLOB  Comma-separated list(s) of filename globs to include
  --reject-filename GLOB  Comma-separated list(s) of filename globs to exclude
  --case-insensitive-paths
                         Make path matching case-insensitive
  --accept-path PATTERN  Path glob (default) or regex:<expr> that must match to crawl/download
  --reject-path PATTERN  Path glob or regex to skip
  --visited-cache FILE   Persist visited URLs to this file so interrupted runs can resume
  --write-batch FILE     Write aria2 input file to disk instead of executing
  --http-timeout DURATION
                         HTTP client timeout as Go duration string (e.g. 30s, 2m) [default: 30s]
  --user-agent UA         Custom User-Agent string [default: raria2/1.0]
  --log-level LEVEL       Log level (panic,fatal,error,warn,info,debug,trace) [default: info]
  --rate-limit RATE       Rate limit for HTTP requests (requests per second) [default: 0]
  --respect-robots        Respect robots.txt when crawling [default: false]
  --accept-mime TYPES     Comma-separated list of MIME types to include
  --reject-mime TYPES     Comma-separated list of MIME types to exclude
  --help, -h             display this help and exit

Example

# dry run mirroring into host/path structure automatically
raria2 -d 'https://proof.ovh.net/files/' -- --max-download-limit=1M

# explicitly setting output directory and concurrency knobs (crawler + aria2)
raria2 -d -o output -x 10 -j 8 -w 12 'https://mirror.nforce.com/pub/speedtests/' -- --max-download-limit=1M

# customize the HTTP timeout (here: 2 minutes)
raria2 --http-timeout=2m 'https://example.com/pub/'

# limit crawl depth to first directory level
raria2 --max-depth=1 'https://example.com/pub/'

# only download .iso files inside /iso/ paths and persist visited cache
raria2 --accept=iso --accept-path='glob:/iso/**' --visited-cache=visited.txt 'https://mirror.example.com/'

# Advanced filtering: filename globs and case-insensitive paths
raria2 --accept-filename='release-*' --reject-filename='*.tmp' --case-insensitive-paths --accept-path='glob:**/Files/**' 'https://example.com/pub/'

# Generate batch file for later use instead of immediate download
raria2 --write-batch downloads.txt --dry-run 'https://example.com/pub/'
# Later use: aria2c --input-file=downloads.txt

# Rate-limited crawling with custom User-Agent
raria2 --rate-limit=2 --user-agent='MyBot/1.0' 'https://example.com/pub/'

# Respect robots.txt while crawling
raria2 --respect-robots 'https://example.com/pub/'

# MIME-type filtering - only download PDFs and images
raria2 --accept-mime 'application/pdf,image/jpeg,image/png' 'https://example.com/files/'

# Exclude binary executables and archives
raria2 --reject-mime 'application/octet-stream,application/x-executable,application/zip' 'https://example.com/pub/'

Aria2 stdin bug workaround

Aria2 has an open bug (aria2/aria2#1161) where downloads fed via stdin might not start until the input stream closes. When crawling large trees, use --aria2-session-size to periodically close and restart aria2 so downloads begin before the crawl finishes. This option only applies when streaming URLs directly to aria2 (normal mode), not when --write-batch is used.

Session Management

The --visited-cache option allows you to persist visited URLs between runs, enabling resume functionality:

# First run - crawl and cache visited URLs
raria2 --visited-cache=visited.txt 'https://example.com/pub/'

# Interrupted run - resume from where you left off
raria2 --visited-cache=visited.txt 'https://example.com/pub/'

Batch File Generation

Use --write-batch to create an aria2 input file for manual control. If you hit the aria2 stdin bug (aria2/aria2#1161) on large crawls, combine --aria2-session-size with --write-batch to generate smaller chunks:

# Create batch file without downloading
raria2 --write-batch downloads.txt 'https://example.com/pub/'

# Use the batch file later with aria2
aria2c --input-file=downloads.txt

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.github/workflows		.github/workflows
pkg		pkg
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
go.mod		go.mod
go.sum		go.sum
main.go		main.go
main_test.go		main_test.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

raria2

Compile

Usage

Example

Aria2 stdin bug workaround

Session Management

Batch File Generation

About

Uh oh!

Releases 8

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

smeinecke/raria2

Folders and files

Latest commit

History

Repository files navigation

raria2

Compile

Usage

Example

Aria2 stdin bug workaround

Session Management

Batch File Generation

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 8

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages