Comments I added while going through the Triton GPU tutorial code (using Claude AI).

Ghiora/TritonLearning

Use the same training instructions as in
  ../transformer_translation_python/HowToTrain.README


python transformer_translation_triton.py
python transformer_translation_triton.py --train train.en train.de


RTX 4090 optimizations:

    Block sizes tuned for SM89 architecture (BLOCK_M=64, BLOCK_N=64)
    Memory-efficient attention avoids materializing full attention matrix
    Fused operations reduce global memory traffic
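To illustrate the second point above, here is a minimal NumPy sketch (my addition, not code from this repo) of flash-attention-style tiling: keys and values are processed in tiles using the stated BLOCK_M=64, BLOCK_N=64 sizes, so no tile larger than (BLOCK_M, BLOCK_N) is ever materialized, yet the result matches the naive full-matrix softmax attention. The actual kernel in this repo is written in Triton; this is only a CPU model of the algorithm.

```python
import numpy as np

BLOCK_M, BLOCK_N = 64, 64  # tile sizes from the RTX 4090 tuning above

def naive_attention(Q, K, V):
    # Materializes the full (seq_q, seq_k) attention matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V):
    # Online-softmax (flash-attention-style) variant: only one
    # (BLOCK_M, BLOCK_N) score tile exists at a time.
    scale = 1.0 / np.sqrt(Q.shape[-1])
    out = np.empty((Q.shape[0], V.shape[1]))
    for i in range(0, Q.shape[0], BLOCK_M):
        q = Q[i:i + BLOCK_M]
        m = np.full(q.shape[0], -np.inf)          # running row max
        l = np.zeros(q.shape[0])                  # running softmax denominator
        acc = np.zeros((q.shape[0], V.shape[1]))  # running weighted sum
        for j in range(0, K.shape[0], BLOCK_N):
            s = q @ K[j:j + BLOCK_N].T * scale    # one score tile
            m_new = np.maximum(m, s.max(axis=-1))
            p = np.exp(s - m_new[:, None])
            correction = np.exp(m - m_new)        # rescale old partial sums
            l = l * correction + p.sum(axis=-1)
            acc = acc * correction[:, None] + p @ V[j:j + BLOCK_N]
            m = m_new
        out[i:i + BLOCK_M] = acc / l[:, None]
    return out
```

Because each tile's softmax contribution is rescaled by the running row maximum, the blocked loop reproduces the exact softmax without ever holding the full attention matrix, which is what keeps global memory traffic low on the GPU.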
