2025-12-04 Hacker News Top Articles and Their Summaries
1. CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication Through RL
Total comment count: 2
Summary
CUDA-L2 is a system that uses large language models and reinforcement learning to automatically optimize half-precision GEMM CUDA kernels. It reportedly outperforms major matmul baselines, including PyTorch’s matmul and NVIDIA cuBLAS/cuBLASLt. The kernels were optimized on A100 hardware, so the reported speedups are not guaranteed on other GPUs; kernels for other machines will be released progressively. For dimensions not covered, users can pad to the nearest configuration or open GitHub issues for requests....
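The padding workaround mentioned in the summary can be sketched roughly as follows. This is a minimal illustration, not CUDA-L2's own code: the `SUPPORTED_SIZES` list, the helper names, and the pure-Python `matmul` standing in for an optimized kernel are all hypothetical. It also assumes that "nearest configuration" means rounding each dimension up, since zero-padding can only grow a matrix.

```python
# Hypothetical per-dimension sizes. The real set of supported (M, N, K)
# configurations is defined by the CUDA-L2 project, not by this sketch.
SUPPORTED_SIZES = [64, 128, 256, 512, 1024]


def round_up(dim, sizes=SUPPORTED_SIZES):
    """Return the smallest supported size that is >= dim."""
    for size in sorted(sizes):
        if size >= dim:
            return size
    raise ValueError(f"dimension {dim} exceeds the largest supported size")


def pad(matrix, rows, cols):
    """Zero-pad a list-of-lists matrix to rows x cols."""
    padded = [row + [0.0] * (cols - len(row)) for row in matrix]
    padded += [[0.0] * cols for _ in range(rows - len(matrix))]
    return padded


def matmul(a, b):
    """Reference matmul standing in for an optimized kernel (assumption)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def padded_matmul(a, b, sizes=SUPPORTED_SIZES):
    """Pad A (m x k) and B (k x n) up to supported sizes, multiply, crop."""
    m, k, n = len(a), len(b), len(b[0])
    big_m, big_k, big_n = round_up(m, sizes), round_up(k, sizes), round_up(n, sizes)
    c = matmul(pad(a, big_m, big_k), pad(b, big_k, big_n))
    # Zero rows and columns contribute nothing to the products, so the
    # top-left m x n block of the padded result equals A @ B.
    return [row[:n] for row in c[:m]]
```

The cost of this approach is the extra work spent on the padded zeros, so whether it pays off depends on how far a shape sits from the next supported configuration.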