Analyze AMD GPU profiling traces using SQL queries and export results to Excel with color-coded comparisons.
- Query Perfetto/PyTorch traces using SQL
- Compare two traces with LCS (Longest Common Subsequence) alignment
- Color-coded results: GREEN = new is better, RED = new is worse
- Connect to running
trace_processorHTTP servers for fast iteration - Export to Excel with multiple sheets (one per query)
# Activate your Python environment
pyenv activate 31015 # or your environment
# Install dependencies
pip install perfetto pandas openpyxl# Start server for the NEW/experimental trace on port 9001
tmux new-session -d -s new_trace "trace_processor -D --http-port 9001 /path/to/new_trace.json.gz"
# Start server for the BASELINE trace on port 9002
tmux new-session -d -s baseline "trace_processor -D --http-port 9002 /path/to/baseline_trace.json.gz"Create queries.sql:
-- @query: total_kernel_time
-- Total kernel execution time
SELECT SUM(dur) / 1000000.0 as total_kernel_ms FROM slice WHERE cat = 'kernel';
-- @query: top_20_kernels
-- @align: name
-- Top 20 kernels by total execution time
SELECT name, COUNT(*) as calls, SUM(dur)/1000000.0 as total_ms
FROM slice WHERE cat = 'kernel'
GROUP BY name ORDER BY total_ms DESC LIMIT 20;
-- @query: gemm_comparison
-- @align: name
-- @lower_is_better: total_ms, avg_us
-- GEMM kernel comparison
SELECT name, COUNT(*) as calls, SUM(dur)/1000000.0 as total_ms, AVG(dur)/1000.0 as avg_us
FROM slice WHERE cat = 'kernel' AND name LIKE '%Cijk%'
GROUP BY name ORDER BY total_ms DESC LIMIT 10;python trace_analyzer.py localhost:9001 localhost:9002 \
-s queries.sql \
-o comparison.xlsx \
-n embsim baselinepython trace_analyzer.py [OPTIONS] TRACE1 [TRACE2] -s QUERIES.sql -o OUTPUT.xlsx
Arguments:
TRACE1, TRACE2 Trace files or HTTP servers (host:port)
- First trace = NEW/experimental
- Second trace = BASELINE
Options:
-s, --sql FILE SQL file with queries (required)
-o, --output FILE Output Excel file (default: trace_analysis.xlsx)
-n, --names Names for traces (used as column suffixes)
| Directive | Description |
|---|---|
@query: name |
Names the query (becomes Excel sheet name) |
@align: column |
Column to use for LCS alignment between traces |
@lower_is_better: col1, col2 |
Columns where lower values are better (for color coding) |
-- @query: kernel_performance
-- @align: name
-- @lower_is_better: total_ms, avg_us
-- Compare kernel performance between traces
SELECT
name,
COUNT(*) as calls,
SUM(dur)/1000000.0 as total_ms,
AVG(dur)/1000.0 as avg_us,
MIN(dur)/1000.0 as min_us,
MAX(dur)/1000.0 as max_us
FROM slice
WHERE cat = 'kernel' AND dur > 0
GROUP BY name
ORDER BY total_ms DESC
LIMIT 20;When comparing two traces:
| Color | Meaning |
|---|---|
| 🟢 Green | New trace is better than baseline (>1% improvement) |
| 🔴 Red | New trace is worse than baseline (>1% regression) |
| No color | Within 1% or not comparable |
Automatic detection: Columns with _ms, _us, time, dur, latency are assumed to be "lower is better".
Manual override: Use @lower_is_better: col1, col2 to specify additional columns.
python trace_analyzer.py /path/to/trace.json.gz -s queries.sql -o analysis.xlsxpython trace_analyzer.py new_trace.pb baseline_trace.pb \
-s queries.sql \
-o comparison.xlsx \
-n new baseline# Faster for large traces - load once, query many times
python trace_analyzer.py localhost:9001 localhost:9002 \
-s queries.sql \
-o comparison.xlsx \
-n embsim noeqSELECT SUM(dur) / 1000000.0 as total_kernel_ms FROM slice WHERE cat = 'kernel';SELECT cat, COUNT(*) as cnt, SUM(dur)/1000000.0 as total_ms
FROM slice WHERE dur > 0
GROUP BY cat ORDER BY total_ms DESC;SELECT name, COUNT(*) as calls, SUM(dur)/1000000.0 as total_ms, AVG(dur)/1000.0 as avg_us
FROM slice WHERE cat = 'kernel' AND name LIKE '%Cijk%'
GROUP BY name ORDER BY total_ms DESC LIMIT 10;SELECT name, COUNT(*) as calls, SUM(dur)/1000000.0 as total_ms, AVG(dur)/1000.0 as avg_us
FROM slice WHERE cat = 'cuda_runtime'
GROUP BY name ORDER BY total_ms DESC LIMIT 15;SELECT name, COUNT(*) as call_count, SUM(dur)/1000000.0 as total_ms, AVG(dur)/1000.0 as avg_us
FROM slice WHERE name LIKE '%wvSplitK%'
GROUP BY name ORDER BY total_ms DESC;Make sure to use the http:// prefix when connecting:
# The tool adds http:// automatically, but if issues persist:
python trace_analyzer.py http://localhost:9001 ...For traces >1GB, use HTTP server mode:
# Start server (loads trace once)
trace_processor -D --http-port 9001 large_trace.json.gz
# Run multiple analyses without reloading
python trace_analyzer.py localhost:9001 -s queries1.sql -o results1.xlsx
python trace_analyzer.py localhost:9001 -s queries2.sql -o results2.xlsxcurl http://127.0.0.1:9001/statustrace_analyzer.py- Main analysis toolcomparison_queries.sql- Example SQL queries for AMD GPU profilingexample_queries.sql- Basic example queries