- pytorch/ao#3306 FP8 blockwise quantization benchmark
- pytorch/ao#3342 Replaced torch._scaled_mm with torch.nn.functional.scaled_mm
- NovaSky-AI/SkyRL#758 Benchmarking and optimising logprobs tracking as the default setting
- NovaSky-AI/SkyRL#691 Benchmarking and fusing kernels
- NovaSky-AI/SkyRL#680 top_k sampling
- NovaSky-AI/SkyRL#880 CUDA tiling for expert parallelism



