2023-06-13 Hacker News Top Articles and Their Summaries
1. Llama.cpp: Full CUDA GPU Acceleration
   Total comments: 25
   Summary: The article describes a new pull request that adds GPU acceleration for ggml tensors to improve prompt-processing and generation performance. The author added CUDA kernels for scale, cpy, diag_mask_inf, and soft_max, as well as two specialized kernels for matrix-vector multiplication. The PR is still being tested on lower-end GPUs and for potential issues with VRAM usage....
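   The kernels named in this summary correspond to standard steps of transformer attention. As a rough CPU reference, the plain-Python sketch below shows what scale, diag_mask_inf, soft_max, and a matrix-vector product compute. It is not the PR's CUDA code. The function names and signatures are hypothetical, and the masking convention (row i keeps columns up to n_past + i) is an assumption modeled on ggml's CPU ops rather than taken from the PR.

```python
import math

def scale(rows, s):
    # Multiply every element by the scalar s (in attention, typically 1/sqrt(head_dim))
    return [[x * s for x in row] for row in rows]

def diag_mask_inf(rows, n_past):
    # Causal mask (assumed convention): in row i, every column j > n_past + i
    # is replaced with -inf so later tokens cannot be attended to
    return [[x if j <= n_past + i else -math.inf for j, x in enumerate(row)]
            for i, row in enumerate(rows)]

def soft_max(rows):
    # Row-wise softmax; subtracting the row max keeps exp() from overflowing,
    # and exp(-inf) == 0.0, so masked positions receive zero weight
    out = []
    for row in rows:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def mat_vec(matrix, vec):
    # Matrix-vector product: one dot product per matrix row
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

# Usage: a 2x2 score matrix scaled by 0.5, causally masked, then softmaxed
probs = soft_max(diag_mask_inf(scale([[1.0, 2.0], [3.0, 4.0]], 0.5), n_past=0))
print(probs[0])                          # first token can only attend to itself
print(mat_vec([[1, 2], [3, 4]], [1, 1]))
```

   On a GPU, each of these row-wise or dot-product loops is what a dedicated kernel would parallelize; keeping them on the device avoids copying intermediate tensors back to the CPU between steps.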