Skip to content

aghamari/trace-compare

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Trace Analyzer

Analyze AMD GPU profiling traces using SQL queries and export results to Excel with color-coded comparisons.

Features

  • Query Perfetto/PyTorch traces using SQL
  • Compare two traces with LCS (Longest Common Subsequence) alignment
  • Color-coded results: GREEN = new is better, RED = new is worse
  • Connect to running trace_processor HTTP servers for fast iteration
  • Export to Excel with multiple sheets (one per query)

Installation

# Activate your Python environment
pyenv activate 31015  # or your environment

# Install dependencies
pip install perfetto pandas openpyxl

Quick Start

1. Start trace_processor servers (recommended for large traces)

# Start server for the NEW/experimental trace on port 9001
tmux new-session -d -s new_trace "trace_processor -D --http-port 9001 /path/to/new_trace.json.gz"

# Start server for the BASELINE trace on port 9002
tmux new-session -d -s baseline "trace_processor -D --http-port 9002 /path/to/baseline_trace.json.gz"

2. Create a SQL queries file

Create queries.sql:

-- @query: total_kernel_time
-- Total kernel execution time
SELECT SUM(dur) / 1000000.0 as total_kernel_ms FROM slice WHERE cat = 'kernel';

-- @query: top_20_kernels
-- @align: name
-- Top 20 kernels by total execution time
SELECT name, COUNT(*) as calls, SUM(dur)/1000000.0 as total_ms
FROM slice WHERE cat = 'kernel'
GROUP BY name ORDER BY total_ms DESC LIMIT 20;

-- @query: gemm_comparison
-- @align: name
-- @lower_is_better: total_ms, avg_us
-- GEMM kernel comparison
SELECT name, COUNT(*) as calls, SUM(dur)/1000000.0 as total_ms, AVG(dur)/1000.0 as avg_us
FROM slice WHERE cat = 'kernel' AND name LIKE '%Cijk%'
GROUP BY name ORDER BY total_ms DESC LIMIT 10;

3. Run the comparison

python trace_analyzer.py localhost:9001 localhost:9002 \
    -s queries.sql \
    -o comparison.xlsx \
    -n embsim baseline

Usage

python trace_analyzer.py [OPTIONS] TRACE1 [TRACE2] -s QUERIES.sql -o OUTPUT.xlsx

Arguments:
  TRACE1, TRACE2    Trace files or HTTP servers (host:port)
                    - First trace = NEW/experimental
                    - Second trace = BASELINE

Options:
  -s, --sql FILE    SQL file with queries (required)
  -o, --output FILE Output Excel file (default: trace_analysis.xlsx)
  -n, --names       Names for traces (used as column suffixes)

SQL File Format

Directives

Directive Description
@query: name Names the query (becomes Excel sheet name)
@align: column Column to use for LCS alignment between traces
@lower_is_better: col1, col2 Columns where lower values are better (for color coding)

Example

-- @query: kernel_performance
-- @align: name
-- @lower_is_better: total_ms, avg_us
-- Compare kernel performance between traces
SELECT
    name,
    COUNT(*) as calls,
    SUM(dur)/1000000.0 as total_ms,
    AVG(dur)/1000.0 as avg_us,
    MIN(dur)/1000.0 as min_us,
    MAX(dur)/1000.0 as max_us
FROM slice
WHERE cat = 'kernel' AND dur > 0
GROUP BY name
ORDER BY total_ms DESC
LIMIT 20;

Color Coding

When comparing two traces:

Color Meaning
🟢 Green New trace is better than baseline (>1% improvement)
🔴 Red New trace is worse than baseline (>1% regression)
No color Within 1% or not comparable

Automatic detection: Columns with _ms, _us, time, dur, latency are assumed to be "lower is better".

Manual override: Use @lower_is_better: col1, col2 to specify additional columns.

Examples

Single trace analysis

python trace_analyzer.py /path/to/trace.json.gz -s queries.sql -o analysis.xlsx

Compare trace files directly

python trace_analyzer.py new_trace.pb baseline_trace.pb \
    -s queries.sql \
    -o comparison.xlsx \
    -n new baseline

Connect to running trace_processor servers

# Faster for large traces - load once, query many times
python trace_analyzer.py localhost:9001 localhost:9002 \
    -s queries.sql \
    -o comparison.xlsx \
    -n embsim noeq

Common SQL Queries for AMD GPU Profiling

Total kernel time

SELECT SUM(dur) / 1000000.0 as total_kernel_ms FROM slice WHERE cat = 'kernel';

Category breakdown

SELECT cat, COUNT(*) as cnt, SUM(dur)/1000000.0 as total_ms
FROM slice WHERE dur > 0
GROUP BY cat ORDER BY total_ms DESC;

Top GEMM kernels (hipBLASLt)

SELECT name, COUNT(*) as calls, SUM(dur)/1000000.0 as total_ms, AVG(dur)/1000.0 as avg_us
FROM slice WHERE cat = 'kernel' AND name LIKE '%Cijk%'
GROUP BY name ORDER BY total_ms DESC LIMIT 10;

HIP Runtime API calls

SELECT name, COUNT(*) as calls, SUM(dur)/1000000.0 as total_ms, AVG(dur)/1000.0 as avg_us
FROM slice WHERE cat = 'cuda_runtime'
GROUP BY name ORDER BY total_ms DESC LIMIT 15;

wvSplitK analysis

SELECT name, COUNT(*) as call_count, SUM(dur)/1000000.0 as total_ms, AVG(dur)/1000.0 as avg_us
FROM slice WHERE name LIKE '%wvSplitK%'
GROUP BY name ORDER BY total_ms DESC;

Troubleshooting

Connection timeout to trace_processor

Make sure to use the http:// prefix when connecting:

# The tool adds http:// automatically, but if issues persist:
python trace_analyzer.py http://localhost:9001 ...

Large traces

For traces >1GB, use HTTP server mode:

# Start server (loads trace once)
trace_processor -D --http-port 9001 large_trace.json.gz

# Run multiple analyses without reloading
python trace_analyzer.py localhost:9001 -s queries1.sql -o results1.xlsx
python trace_analyzer.py localhost:9001 -s queries2.sql -o results2.xlsx

Check if trace_processor is running

curl http://127.0.0.1:9001/status

Files

  • trace_analyzer.py - Main analysis tool
  • comparison_queries.sql - Example SQL queries for AMD GPU profiling
  • example_queries.sql - Basic example queries

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages