A streamlined CLI tool for profiling Tenstorrent's TT-Metal tests and extracting device kernel performance metrics
- Automated Profiling: Seamlessly runs Tenstorrent's TT-Metal profiler with pytest
- CSV Analysis: Automatically extracts and parses performance CSV files
- Real-time Output: Shows profiling progress in real-time
- Performance Metrics: Calculates total DEVICE KERNEL DURATION
- Simple CLI: Easy-to-use command-line interface
- Flexible: Supports named profiles and various test paths
- Operation-based Profiling: Profile specific operations by name (e.g., ttperf add)
- Dynamic Configuration: Customize tensor shape, dtype, and layout for operations
# Install from PyPI (recommended)
pip install ttperf

Or install from source:
# Clone the repository
git clone https://github.com/Aswintechie/ttperf.git
cd ttperf
# Install the package
pip install -e .

ttperf automatically searches for your TT-Metal installation using a simple two-step process:
# Option 1: Set PYTHONPATH to your tt-metal location
export PYTHONPATH=/path/to/your/tt-metal
ttperf add
# Option 2: Run from within tt-metal directory (or any subdirectory)
cd /path/to/your/tt-metal
ttperf relu
# Or from a subdirectory
cd /path/to/your/tt-metal/tests
ttperf matmul

tt-metal Path Search Order:
- PYTHONPATH environment variable (if specified)
- Current working directory (walks up directory tree to find tt-metal root)
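The search order above can be sketched in Python. This is a minimal illustration, not ttperf's actual code: the function names are hypothetical, and it assumes a tt-metal root can be recognised by the presence of tools/tracy/profile_this.py (the script that appears in the sample output further down).

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Union

# Assumption: a directory counts as a tt-metal root if it contains the
# profiler script. The real tool may use a different marker.
MARKER = Path("tools") / "tracy" / "profile_this.py"


def is_tt_metal_root(path: Path) -> bool:
    """Return True if the marker file exists under the given directory."""
    return (path / MARKER).is_file()


def find_tt_metal_root(
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Union[str, Path]] = None,
) -> Optional[Path]:
    """Hypothetical helper: check PYTHONPATH first, then walk up from cwd."""
    env = os.environ if env is None else env

    # Step 1: every non-empty PYTHONPATH entry is a candidate root.
    for entry in env.get("PYTHONPATH", "").split(os.pathsep):
        if entry and is_tt_metal_root(Path(entry)):
            return Path(entry).resolve()

    # Step 2: the working directory and each of its parents, innermost first.
    start = (Path.cwd() if cwd is None else Path(cwd)).resolve()
    for candidate in (start, *start.parents):
        if is_tt_metal_root(candidate):
            return candidate

    return None
```

Passing env and cwd explicitly keeps the helper easy to test; calling find_tt_metal_root() with no arguments uses the real environment and working directory.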
# Run profiling on a specific test
ttperf test_performance.py
# Run with a custom profile name
ttperf my_profile pytest test_performance.py
# Run on a specific test method
ttperf tests/test_ops.py::test_matmul
# Profile specific operations by name
ttperf add
ttperf relu
ttperf matmul
# Profile operations with custom profile names
ttperf my_add_profile add
ttperf my_relu_profile relu
# Profile operations with custom configuration
ttperf add --shape 1,1,32,32 --dtype bfloat16 --layout tile
ttperf relu --shape 1,1,64,64 --dtype float32 --layout row_major
# Profile operations with memory configuration
ttperf add --dram # Use DRAM memory (default)
ttperf relu --l1 # Use L1 memory
ttperf add --shape 1,1,64,64 --l1 # Combined options

ttperf test_conv.py

ttperf conv_benchmark pytest test_conv.py

ttperf tests/ops/test_matmul.py::test_basic_matmul

# Basic operations
ttperf add
ttperf subtract
ttperf multiply
ttperf divide
# Activation functions
ttperf relu
ttperf sigmoid
ttperf tanh
ttperf gelu
# Mathematical operations
ttperf sqrt
ttperf exp
ttperf log
ttperf sin
ttperf cos
# Comparison operations
ttperf gt
ttperf lt
ttperf eq
ttperf ne
# Reduction operations
ttperf max
ttperf min
ttperf mean
ttperf sum
# Backward operations
ttperf add_bw
ttperf relu_bw
ttperf sigmoid_bw

# Custom tensor shape
ttperf add --shape 1,1,32,32
ttperf relu --shape 2,3,64,128
# Custom data type
ttperf add --dtype float32
ttperf multiply --dtype int32
# Custom memory layout
ttperf add --layout row_major
ttperf relu --layout tile
# Combined configuration
ttperf add --shape 1,1,64,64 --dtype float32 --layout row_major
ttperf gelu --shape 2,1,32,32 --dtype bfloat16 --layout tile
# Memory configuration options
ttperf add --memory-config dram # Explicit DRAM
ttperf relu --memory-config l1 # Explicit L1
ttperf add --dram --shape 1,1,128,128 # DRAM with custom shape
ttperf relu --l1 --dtype float32 # L1 with custom dtype

ttperf --list-ops
# or
ttperf -l

Using custom configuration:
Shape: (1, 1, 32, 32)
Dtype: bfloat16
Layout: tile
Auto-generated profile name: temp_test_add
Running: ./tools/tracy/profile_this.py -n temp_test_add -c "pytest temp_test_add.py"
... (profiling output) ...
Found CSV path: /path/to/profile_results.csv
DEVICE KERNEL DURATION [ns] total: 1234567.89 ns
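The sample output above shows the profiler's output being echoed and a CSV path being picked out of it. A minimal sketch of that pattern follows. The function name and the regular expression are assumptions for illustration; the exact line format the real tool matches is not documented here.

```python
import re
import subprocess
import sys
from typing import List, Optional

# Assumption: the CSV location appears in the output as an absolute
# POSIX-style path ending in .csv.
CSV_PATH_RE = re.compile(r"(/\S+\.csv)")


def run_and_find_csv(command: List[str]) -> Optional[str]:
    """Run a command, echo its output line by line, return the last CSV path seen."""
    csv_path = None
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so nothing is hidden
        text=True,
        bufsize=1,  # line-buffered, so lines arrive as they are produced
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        print(line, end="", flush=True)  # real-time echo
        match = CSV_PATH_RE.search(line)
        if match:
            csv_path = match.group(1)
    proc.wait()
    return csv_path


if __name__ == "__main__":
    # Stand-in command; a real run would invoke the profiler instead.
    demo = [sys.executable, "-c", "print('report: /tmp/demo/profile_results.csv')"]
    print("CSV:", run_and_find_csv(demo))
```

Reading from the pipe line by line, rather than waiting for the process to finish, is what keeps the progress output live.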
- Command Parsing: Analyzes input arguments to determine profile name and test path/operation
- Operation Detection: If an operation name is provided, maps it to the corresponding test method
- Dynamic Configuration: If custom configuration is provided, generates a temporary test file with the specified parameters
- Profile Execution: Runs Tenstorrent's TT-Metal profiler with the specified test
- Output Monitoring: Streams profiling output in real-time
- CSV Extraction: Parses the output to find the generated CSV file path
- Performance Analysis: Reads the CSV and calculates total device kernel duration
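The first two steps (command parsing and operation detection) could look roughly like the sketch below. The rules are inferred from the usage examples in this README and are assumptions, not the tool's actual logic; KNOWN_OPS, Invocation, and parse_invocation are hypothetical names, and the operation set is only a subset of the operations listed earlier.

```python
from typing import List, NamedTuple, Optional

# Hypothetical subset of supported operation names, for illustration only.
KNOWN_OPS = {"add", "subtract", "multiply", "divide", "relu", "sigmoid", "matmul"}


class Invocation(NamedTuple):
    profile_name: Optional[str]  # None means "auto-generate a name"
    kind: str                    # "test" or "operation"
    target: str                  # test path or operation name


def parse_invocation(args: List[str]) -> Invocation:
    """Classify positional arguments as [profile_name] [pytest] <target>."""
    # Assumption: a literal "pytest" token is optional noise and can be dropped.
    tokens = [a for a in args if a != "pytest"]
    if not tokens or len(tokens) > 2:
        raise ValueError("expected: [profile_name] [pytest] <test_path | operation>")

    target = tokens[-1]
    profile_name = tokens[0] if len(tokens) == 2 else None

    # Assumption: anything mentioning a .py file is a test path, which also
    # covers pytest node IDs such as tests/test_ops.py::test_matmul.
    if ".py" in target:
        return Invocation(profile_name, "test", target)
    if target in KNOWN_OPS:
        return Invocation(profile_name, "operation", target)
    raise ValueError(f"unknown operation or test path: {target!r}")
```

For example, parse_invocation(["my_profile", "pytest", "test_performance.py"]) yields a named test run, while parse_invocation(["add"]) yields an operation run with no explicit profile name.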
The tool extracts the following key metrics:
- DEVICE KERNEL DURATION [ns]: Total time spent in device kernels
- CSV Path: Location of the detailed profiling results
- Real-time Progress: Live output during profiling
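As an illustration of the headline metric, the sketch below sums the DEVICE KERNEL DURATION [ns] column of a profiler CSV. It uses the standard-library csv module to stay dependency-free, whereas the tool itself lists pandas as a requirement. The function name and the "OP CODE" column in the example data are hypothetical.

```python
import csv
import io
from typing import TextIO

COLUMN = "DEVICE KERNEL DURATION [ns]"


def total_kernel_duration_ns(csv_file: TextIO) -> float:
    """Sum the duration column, skipping rows where it is missing or empty."""
    total = 0.0
    for row in csv.DictReader(csv_file):
        value = (row.get(COLUMN) or "").strip()
        if value:
            total += float(value)
    return total


if __name__ == "__main__":
    # Made-up rows purely to exercise the function.
    sample = io.StringIO(
        "OP CODE,DEVICE KERNEL DURATION [ns]\n"
        "add,100\n"
        "relu,\n"
        "matmul,250.5\n"
    )
    print(f"{COLUMN} total: {total_kernel_duration_ns(sample)} ns")
```

Skipping empty cells matters because not every row in a profiling report necessarily corresponds to a device kernel.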
Shape (--shape):
- Format: Comma-separated integers (e.g., 1,1,32,32)
- Default: (1, 1, 1024, 1024)
- Example: --shape 2,3,64,128

Dtype (--dtype):
- Valid Options: bfloat16, float32, int32
- Default: bfloat16
- Example: --dtype float32

Layout (--layout):
- Valid Options: tile, row_major
- Default: tile
- Example: --layout row_major
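One way to accept these three options with the documented defaults is an argparse parser like the sketch below. It is an illustration under assumptions, not ttperf's actual parser: parse_shape and build_parser are hypothetical names, and rejecting non-positive dimensions is an extra check not stated in this README.

```python
import argparse
from typing import Tuple


def parse_shape(text: str) -> Tuple[int, ...]:
    """Turn '1,1,32,32' into (1, 1, 32, 32)."""
    try:
        dims = tuple(int(part) for part in text.split(","))
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid shape: {text!r}")
    # Assumption: dimensions must be positive.
    if any(d <= 0 for d in dims):
        raise argparse.ArgumentTypeError(f"shape dimensions must be positive: {text!r}")
    return dims


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ttperf-options-sketch")
    parser.add_argument("--shape", type=parse_shape, default=(1, 1, 1024, 1024))
    parser.add_argument("--dtype", choices=["bfloat16", "float32", "int32"], default="bfloat16")
    parser.add_argument("--layout", choices=["tile", "row_major"], default="tile")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--shape", "2,3,64,128", "--dtype", "float32"])
    print(args.shape, args.dtype, args.layout)
```

Using choices lets argparse reject unsupported dtype or layout values with a clear error before any profiling starts.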
- Python 3.8+
- pandas
- Tenstorrent's TT-Metal development environment
- pytest
ttperf/
├── ttperf.py # Main CLI implementation
├── pyproject.toml # Project configuration
├── README.md # This file
└── .gitignore # Git ignore rules
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
This tool is an independent utility that interfaces with Tenstorrent's TT-Metal profiling tools. It is not affiliated with or endorsed by Tenstorrent Inc. The tool serves as a convenience wrapper around existing TT-Metal profiling infrastructure.
If you encounter any issues, please create an issue on GitHub.
Aswin Z
- GitHub: @Aswintechie
- Portfolio: aswinlocal.in
- Tenstorrent's TT-Metal development team for the profiling tools
- Python community for excellent libraries like pandas