diff --git a/README.md b/README.md
index 0b07eaf..16f65cf 100644
--- a/README.md
+++ b/README.md
@@ -1,13 +1,14 @@
-GemLite is a collection of straightforward CUDA and Triton kernels for efficient, fused low-bit matrix multiplication. It is specifically designed for simplicity and reasubility.
+# GemLite
+GemLite is a collection of straightforward CUDA and Triton kernels for efficient, fused low-bit matrix multiplication. It is specifically designed for simplicity and reasubility.
 This project was initiated because we found it challenging to customize the low-bit kernels that are currently available. GemLite provides both flexibility and performance, enabling users to easily modify the codebase to develop high-performance kernels tailored to their specific needs.
-While GemLite can outperform the best existing implementations on large matrices, there's still potential for further optimization!
+While GemLite can outperform the best existing implementations on large matrices, there's still potential for further optimization!