6,903 stars | 871 forks | Cuda
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
What it does
DeepGEMM is a high-performance CUDA library of tensor core kernels, including efficient FP8 GEMM (general matrix multiplication) operations with fine-grained scaling. It aims to keep GPU kernel optimization techniques simple to follow while matching or surpassing the performance of expert-tuned libraries.
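To make "fine-grained scaling" concrete, here is a minimal reference sketch in plain Python. It is not DeepGEMM's API or kernel code. The function names, the block size of 4, and the simplified rounding (4 significant bits, clamped to the FP8 E4M3 maximum of 448, subnormals ignored) are all assumptions chosen for illustration. The idea shown is that each small block along the reduction dimension carries its own scale factor, so a block of small values is not crushed by a large value elsewhere in the row.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format


def round_to_fp8_like(x):
    """Round x to 4 significant bits and clamp to +-448.

    Simplified stand-in for FP8 E4M3 rounding (assumption): subnormals
    and exponent underflow are ignored.
    """
    if x == 0.0:
        return 0.0
    mantissa, exponent = math.frexp(x)  # x == mantissa * 2**exponent
    rounded = math.ldexp(round(mantissa * 16) / 16, exponent)
    return max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, rounded))


def quantize_blocks(row, block):
    """Quantize a row block by block; return (values, one scale per block)."""
    values, scales = [], []
    for start in range(0, len(row), block):
        chunk = row[start:start + block]
        amax = max(abs(v) for v in chunk)
        # Map the block's largest magnitude onto the FP8 maximum.
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        scales.append(scale)
        values.extend(round_to_fp8_like(v / scale) for v in chunk)
    return values, scales


def scaled_dot(a_q, a_s, b_q, b_s, block):
    """Dot product that rescales each block's partial sum by both scales."""
    total = 0.0
    for i, (sa, sb) in enumerate(zip(a_s, b_s)):
        lo, hi = i * block, min((i + 1) * block, len(a_q))
        partial = sum(x * y for x, y in zip(a_q[lo:hi], b_q[lo:hi]))
        total += partial * sa * sb
    return total


def scaled_gemm(a, b_t, block=4):
    """Compute A @ B, where b_t holds the columns of B as rows."""
    a_quant = [quantize_blocks(r, block) for r in a]
    b_quant = [quantize_blocks(c, block) for c in b_t]
    return [[scaled_dot(aq, asc, bq, bsc, block) for bq, bsc in b_quant]
            for aq, asc in a_quant]


# One row mixing small and large magnitudes, multiplied by a column of ones.
a = [[1.0, 2.0, 3.0, 4.0, 1000.0, 2000.0, 3000.0, 4000.0]]
b_t = [[1.0] * 8]
c = scaled_gemm(a, b_t, block=4)
print(c[0][0], "vs exact", sum(a[0]))
```

With a single scale for the whole row, the first four values would be divided by the same factor as the thousands and lose most of their precision. Per-block scales keep both groups within the representable range.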
Why it matters: DeepGEMM targets AI workloads that depend on fast tensor core matrix multiplication.
Trending today with 109 new stars