cudad is an experimental CUDA SASS decompiler.
⚠️ WarningThis project is heavily vibe-coded and experimental. Do not rely on it for production reverse engineering, security decisions, or correctness-critical workflows.
It currently emphasizes:
- stable parse/CFG/SSA construction,
- conservative structurization,
- optional semantic lifting and name recovery,
- ABI-aware display/declaration inference,
- golden-based regression testing.
It does not yet aim for broad architecture/version coverage or production-grade decompilation quality.
Yes, there is already nice prior work on CUDA SASS reversing/decompilation (for example: Jeb's SASS decompiler).
This project exists mostly because:
- it is fun,
- it is a hands-on learning exercise for compiler/decompiler internals,
- it is an experiment to probe LLM ability and limits on graph-heavy tasks (CFG, dominators, SSA, structurization).
So the goal here is not to compete with mature tools; it is to learn by building and iterating.
The .sass input used by this project is expected to be disassembly text produced by NVIDIA CUDA Toolkit tools (for example cuobjdump / nvdisasm).
Current parsing/testing is primarily aligned with cuobjdump-style text dumps seen in the fixtures.
PTX is usually easier to read and reverse than SASS.
However, PTX can be stripped from CUDA binaries, while SASS machine code is what must exist for execution.
So this project focuses on SASS as the more robust target when PTX is missing.
- Current fixture coverage is primarily one SASS family/style: mostly
sm_89withcode version = [1,7]-style dumps. - Coverage across older/newer SM targets and different disassembly formats is still limited.
...
/*0000*/ IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28] ;
/*0010*/ S2R R0, SR_CTAID.X ;
/*0020*/ ISETP.GE.AND P0, PT, R0, c[0x0][0x184], PT ;
/*0030*/ @P0 EXIT ;
/*0040*/ S2R R2, SR_TID.X ;
/*0050*/ ULDC.64 UR6, c[0x0][0x118] ;
/*0060*/ BSSY B0, 0xf0 ;
/*0070*/ ISETP.GT.AND P1, PT, R2, 0xff, PT ;
/*0080*/ ISETP.NE.AND P0, PT, R2, RZ, PT ;
/*0090*/ @P1 BRA 0xe0 ;
/*00a0*/ STS.U8 [R2], R2 ;
/*00b0*/ IADD3 R2, R2, c[0x0][0x0], RZ ;
/*00c0*/ ISETP.GE.AND P1, PT, R2, 0x100, PT ;
/*00d0*/ @!P1 BRA 0xa0 ;
......
void kernel(uint8_t* arg0_ptr, int32_t arg2, uint8_t* arg4_ptr, uint8_t* arg6_ptr, uint32_t arg8, uint32_t arg9) {
BB0 {
v0 = abi_internal_0x28;
ctaid_x = blockIdx.x;
}
if (!((int32_t)(ctaid_x) >= (int32_t)(arg9))) {
BB1 {
tid_x = threadIdx.x;
b1 = (int32_t)(tid_x) > (int32_t)(255);
b2 = tid_x != 0;
}
if (!((int32_t)(tid_x) > (int32_t)(255))) {
while (!((int32_t)(tid_x) >= (int32_t)(256))) {
shmem_u8[tid_x] = tid_x;
tid_x = tid_x + blockDimX;
}
}
}
}
...This is the style of transformation the project targets: low-level SASS blocks/predicates into conservative structured pseudocode with optional naming and ABI hints.
- Parse SASS
parse_sassinsrc/parser.rs - Build CFG
build_cfginsrc/cfg.rs - Build SSA IR
build_ssainsrc/ir.rs - Structurize control flow
Structurizer::structure_functioninsrc/structurizer.rs - Render structured pseudocode
Structurizer::pretty_printinsrc/structurizer.rs
- Semantic lifting (expression-level cleanup)
lift_function_irinsrc/semantic_lift.rs - Name recovery + post-render cleanup
recover_structured_output_namesinsrc/name_recovery.rs - ABI-aware typing/display pass
ABI profile detection, const-memory annotation, arg aliasing and typed signatures insrc/abi.rs
src/test.rs composes CFG + SSA + structurizer + lift + name recovery + ABI rendering, then compares against fixtures in:
test_cu/golden/test_cu/golden_lifted/test_cu/golden_lifted_named/test_cu/golden_full_pass/
- End-to-end parse → CFG → SSA → structured output is stable.
- Semantic lifting reduces raw opcode noise while preserving conservative fallbacks.
- Name recovery deterministically rewrites SSA tokens to C-like names.
- ABI profile/alias inference provides typed signatures and more readable const-memory semantics.
- Predication cleanup has improved, especially for predicated-only temporary handling and fake-merge ternary patterns.
- Golden fixtures are synchronized with current behavior.
- Test suite currently passes (
cargo test): 145 passed, 0 failed.
-
Push predication cleanup earlier in pipeline
Move more predication semantics from post-render text rewriting into IR/structurizer-level representation. -
Reduce variable reuse ambiguity
Improve naming/SSA presentation so reused temps (e.g.v3) are less likely to look semantically conflated. -
Control-flow recovery expansion
Improve structurizer handling for harder loop/branch shapes and reduce fallbackgotousage. -
Switch/multi-way branch support
Add normalization for branch tables / multi-way control flow. -
Richer type propagation
Strengthen inferred local/argument typing and pointer/value distinction beyond current heuristics. -
Golden/test quality gates
Add more fixture coverage for predication corner cases and loop-carried dataflow. -
Cross-version/cross-arch validation
Expand test corpus across multiple SM targets and SASS dump formats.
cargo run --bin main -- --input test_cu/if_loop.sass --cfg-dot > cfg.dotcargo run --bin main -- --input test_cu/if_loop.sass --ssa-dot --output ssa.dotIf you have Graphviz installed, render to SVG:
dot -Tsvg ssa.dot -o ssa.svgcargo run --bin main -- --input test_cu/if_loop.sass --struct-codecargo run --bin main -- --input test_cu/if_loop.sass --struct-code --semantic-lift --recover-names --typed-decls --abi-mapcargo run --bin main -- --input test_cu/if_loop.sass --struct-code --recover-names --phi-merge-commentsNotes:
- We do not currently provide a first-class standalone def-use chain dump CLI yet.
- Def-use information exists internally in SSA/lifting/name-recovery passes, and partial hints can be seen via phi/live-in comments.
Use the binary on a SASS file:
cargo run --bin main -- --input test_cu/if_loop.sass --struct-codeUseful options (can be combined):
--semantic-lift--recover-names--typed-decls--abi-map--abi-profile auto|legacy140|modern160
cargo testcargo run --example regen_goldens- Design notes:
docs/dev/decompiler_design.MD - Instruction notes:
docs/dev/insts.MD
This project is in active development. It is best viewed as a learning and experimentation codebase that prioritizes conservative behavior over aggressive prettification.