pyragify is a Python-based tool designed to transform your Python code repositories into a format that's ready for analysis with large language models (LLMs), specifically NotebookLM. It breaks down complex code structures into manageable semantic chunks, making it easier to understand, analyze, and extract insights from your code.
- Boost Code Comprehension: pyragify makes it easier to digest large codebases by dividing them into smaller, logical units.
- Effortless Analysis: The structured output simplifies the process of analyzing code, identifying patterns, and extracting knowledge.
- Unlock the Power of NotebookLM: pyragify prepares your code for use with NotebookLM, allowing you to leverage the power of LLMs for tasks like code summarization, documentation generation, and question answering.
- Semantic Chunking: pyragify intelligently extracts functions, classes, and comments from Python files, as well as headers and sections from Markdown files, preserving the context and meaning.
- Wide Format Support: It handles Python (.py), Markdown (.md, .markdown), HTML (.html), CSS (.css), and other common file types, ensuring all your repository content is processed.
- Smart Parsing: Uses AST for Python files, regex-based parsing for HTML/CSS, and header-based chunking for Markdown files.
- Seamless Integration with NotebookLM: The output format is specifically designed for compatibility with NotebookLM, making it easy to analyze your code with powerful LLMs.
- Flexible Configuration: Tailor the processing through a YAML file or command-line arguments to fit your specific needs.
- File Skipping: Respect your .gitignore and .dockerignore files, and define custom skip patterns for even more control.
- Word Limit Control: Automatically chunks output files based on a configurable word limit to ensure manageable file sizes.
- Input Validation: Validates repository paths and provides clear error messages for invalid inputs.
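
The AST-based extraction mentioned in the feature list can be illustrated with the standard library alone. This is a minimal sketch of the general technique, not pyragify's actual implementation: the function name `extract_chunks` and the chunk dictionary layout are assumptions made for the example, and only top-level functions and classes are handled.

```python
import ast


def extract_chunks(source: str) -> list[dict]:
    """Return one chunk per top-level function or class in `source` (hypothetical helper)."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            chunks.append(
                {
                    "type": kind,
                    "name": node.name,
                    # get_source_segment recovers the exact source text of the node
                    "code": ast.get_source_segment(source, node),
                }
            )
    return chunks


sample = '''
def greet(name):
    """Say hello."""
    return f"Hello, {name}"


class Greeter:
    def run(self):
        return greet("world")
'''

chunks = extract_chunks(sample)
names = [(c["type"], c["name"]) for c in chunks]
print(names)  # [('function', 'greet'), ('class', 'Greeter')]
```

Working on the syntax tree rather than on raw lines keeps each function or class intact, which is what makes the resulting chunks semantic units rather than arbitrary slices.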
- Using uv (Recommended):
    uv pip install pyragify
  uv is a blazing fast Python package manager that handles virtual environments and dependencies automatically.
- Using pip:
    pip install pyragify
- From Source:
    git clone https://github.com/ThomasBury/pyragify.git
    cd pyragify
    uv pip install -e .
- Best Practice with uv:
    uv run pyragify --config-file config.yaml
  See below for details about the configuration file.
- Direct CLI Execution:
    python -m pyragify --config-file config.yaml
  See pyragify --help for a full list of options.
- --config-file: Path to the YAML configuration file (default: config.yaml).
- --repo-path: Override the repository path.
- --output-dir: Override the output directory.
- --max-words: Override the maximum words per output file.
- --max-file-size: Override the maximum file size (in bytes) to process.
- --skip-patterns: Override file patterns to skip.
- --skip-dirs: Override directories to skip.
- --verbose: Enable detailed logging for debugging.
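
Most of the options above are described as overrides, which implies that a value given on the command line takes precedence over the same key in the YAML file. The sketch below shows one way that precedence could work. It is an assumption about the behavior, not pyragify's code: `merge_config` is a hypothetical helper, and plain dictionaries stand in for the parsed YAML file and the parsed CLI arguments.

```python
def merge_config(file_config: dict, cli_overrides: dict) -> dict:
    """Overlay CLI values on file values; None means 'not given on the CLI'."""
    merged = dict(file_config)
    for key, value in cli_overrides.items():
        if value is not None:
            merged[key] = value
    return merged


# Stand-in for values loaded from config.yaml
file_config = {
    "repo_path": "/path/to/repository",
    "max_words": 200000,
    "verbose": False,
}

# Stand-in for parsed CLI arguments; repo_path was not passed
cli_overrides = {"repo_path": None, "max_words": 100000, "verbose": True}

config = merge_config(file_config, cli_overrides)
print(config)
# {'repo_path': '/path/to/repository', 'max_words': 100000, 'verbose': True}
```

Treating `None` as "not provided" lets the file value survive whenever an option is omitted from the command line.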
repo_path: /path/to/repository
output_dir: /path/to/output
max_words: 200000
max_file_size: 10485760 # 10 MB
skip_patterns:
- "*.log"
- "*.tmp"
skip_dirs:
- "__pycache__"
- "node_modules"
verbose: false

- Prepare Your Repository: Make sure your repository contains the code you want to process. Utilize .gitignore or .dockerignore to exclude unwanted files or directories.
- Configure pyragify: Create a config.yaml file with your desired settings or use the default configuration.
- Process the Repository: Run pyragify using uv (recommended):
    uv run pyragify --config-file config.yaml
- Check the Output: Your processed content is neatly organized by file type in the specified output directory.
- Navigate to NotebookLM.
- Upload the chunk_0.txt file (or other relevant chunks) from the pyragify output directory to a new notebook.
- Start asking questions and get insights with precise citations! You can even generate a podcast from your code.

The processed content is saved as .txt files and categorized into subdirectories based on the file type:
- python/: Contains chunks of Python functions, classes, and their code.
- markdown/: Contains sections of Markdown files split by headers.
- html/: Contains HTML script and style chunks extracted from HTML files.
- css/: Contains CSS rule chunks from CSS files.
- other/: Contains plain-text versions of unsupported file types.
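
Within each subdirectory, the output is split into numbered files such as chunk_0.txt, capped by the configured word limit. The following is a minimal sketch of how chunks could be packed greedily under such a limit. The function `pack_chunks` and the whitespace-based word count are assumptions for illustration, and pyragify's real splitting logic may differ.

```python
def pack_chunks(chunks: list[str], max_words: int) -> list[str]:
    """Greedily group chunks so each output file stays within max_words when possible."""
    files, current, count = [], [], 0
    for chunk in chunks:
        words = len(chunk.split())
        # Start a new file when adding this chunk would exceed the limit
        if current and count + words > max_words:
            files.append("\n\n".join(current))
            current, count = [], 0
        current.append(chunk)
        count += words
    if current:
        files.append("\n\n".join(current))
    return files


chunks = ["one two three", "four five", "six seven eight nine", "ten"]
files = pack_chunks(chunks, max_words=5)
for i, text in enumerate(files):
    print(f"chunk_{i}.txt: {len(text.split())} words")
# chunk_0.txt: 5 words
# chunk_1.txt: 5 words
```

In this sketch a single chunk larger than the limit is never split. It is written to its own file, so one function or class is not cut in half.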
- Input Validation: Validates repository paths and provides clear error messages for invalid inputs.
- Respect for Ignore Files: pyragify automatically honors .gitignore and .dockerignore patterns.
- Incremental Processing: MD5 hashes are used to efficiently skip unchanged files during subsequent runs.
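
The incremental processing idea can be sketched with `hashlib`: store each file's MD5 digest after a run, then reprocess only the files whose digest has changed. The helper names and the `hashes.json` cache file below are hypothetical, and how pyragify actually stores its hashes is not described here.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file's contents."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def changed_files(paths: list[Path], cache_file: Path) -> list[Path]:
    """Return files whose hash differs from the cached one, then update the cache."""
    previous = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    current = {str(p): md5_of(p) for p in paths}
    cache_file.write_text(json.dumps(current))
    return [p for p in paths if previous.get(str(p)) != current[str(p)]]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    a, b = root / "a.py", root / "b.py"
    a.write_text("x = 1\n")
    b.write_text("y = 2\n")
    cache = root / "hashes.json"  # hypothetical cache location

    first = changed_files([a, b], cache)   # no cache yet: everything is new
    b.write_text("y = 3\n")
    second = changed_files([a, b], cache)  # only b.py changed

print([p.name for p in first])   # ['a.py', 'b.py']
print([p.name for p in second])  # ['b.py']
```

MD5 is used here only as a change detector, not for security, so its known collision weaknesses are not a concern for this purpose.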
We welcome contributions! To contribute to pyragify:
- Clone the repository.
- Install dependencies.
- Run tests. (Test suite is under development).
Feel free to create a GitHub issue for any questions, bug reports, or feature requests.
This project is licensed under the MIT License. See the LICENSE file for details.
Process a Repository with Default Settings:
uv run pyragify --config-file config.yaml
Process a Specific Repository with Custom Settings:
uv run pyragify \
--repo-path /my/repo \
--output-dir /my/output \
--max-words 100000 \
--max-file-size 5242880 \
--skip-patterns "*.log,*.tmp" \
--skip-dirs "__pycache__,node_modules" \
--verbose
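
The command above passes skip patterns and skip directories as comma-separated strings. As a rough illustration of how such values could be applied to a file path, here is a sketch using `fnmatch` from the standard library. The function `should_skip` is hypothetical, and the assumption that patterns match the file name only while directory names match any parent folder may not reflect pyragify's exact rules.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def should_skip(path: str, skip_patterns: str, skip_dirs: str) -> bool:
    """Return True if `path` is inside a skipped directory or matches a skip pattern."""
    patterns = [p.strip() for p in skip_patterns.split(",") if p.strip()]
    dirs = {d.strip() for d in skip_dirs.split(",") if d.strip()}
    parts = PurePosixPath(path).parts
    # Any parent folder named in skip_dirs excludes the file
    if any(part in dirs for part in parts[:-1]):
        return True
    # Otherwise compare the file name against each glob pattern
    return any(fnmatch(parts[-1], pattern) for pattern in patterns)


patterns = "*.log,*.tmp"
dirs = "__pycache__,node_modules"

for candidate in [
    "src/app.py",
    "logs/run.log",
    "src/__pycache__/app.cpython-311.pyc",
    "node_modules/pkg/index.js",
]:
    print(candidate, should_skip(candidate, patterns, dirs))
```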