Overview
- NVIDIA describes CUDA Toolkit 13.1 as the biggest update to CUDA since the platform’s 2006 debut.
- The release introduces the CUDA Tile programming model and Tile IR, with cuTile Python enabling tile‑centric GPU kernels at a higher level than SIMT.
- In NVIDIA’s examples, a 15‑line cuTile Python kernel can match the performance of a hand‑tuned CUDA C++ kernel of roughly 200 lines.
- Initial Tile capability is limited to Blackwell GPUs (compute capability 10.x and 12.x), with NVIDIA stating broader architecture support will arrive in future CUDA versions.
- Core libraries and tools gain Blackwell‑focused features, including grouped GEMM in cuBLAS, batched eigendecomposition in cuSOLVER, and Tile‑aware profiling in Nsight; Green Contexts are now also available in the runtime API.
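
The hardware restriction noted above (Tile support initially limited to compute capability 10.x and 12.x) can be expressed as a small guard. The following is only a sketch: `supports_cuda_tile` and `TILE_SUPPORTED_MAJORS` are hypothetical names introduced for illustration, not part of any NVIDIA API, and the set of supported versions is taken solely from this overview, so it may not hold for later CUDA releases.

```python
# Hypothetical helper for illustration; not part of CUDA, cuTile, or any NVIDIA library.
# Major compute-capability versions this overview lists for initial Tile support.
TILE_SUPPORTED_MAJORS = frozenset({10, 12})


def supports_cuda_tile(major: int, minor: int = 0) -> bool:
    """Return True if compute capability major.minor is in the 10.x or 12.x family."""
    if major < 0 or minor < 0:
        raise ValueError("compute capability components must be non-negative")
    # Only the major version matters here: any 10.x or 12.x device qualifies.
    return major in TILE_SUPPORTED_MAJORS


if __name__ == "__main__":
    # Sample (major, minor) pairs chosen by hand, not queried from a real device.
    for cc in [(9, 0), (10, 0), (12, 0)]:
        print(cc, supports_cuda_tile(*cc))
```

In practice the `(major, minor)` pair would come from a device query, for example `torch.cuda.get_device_capability()` in PyTorch, which returns exactly such a tuple. A guard like this lets code fall back to a conventional SIMT kernel on older GPUs.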