From 2bafee56030f3aa631ed773b194fda94aed54ba3 Mon Sep 17 00:00:00 2001
From: Raj Ghosh <148113238+bxbee@users.noreply.github.com>
Date: Sat, 31 Jan 2026 21:28:00 +0530
Subject: [PATCH] Revise README for performance improvements and features

Updated README to improve clarity and fix formatting issues.
---
 src/README.md | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/README.md b/src/README.md
index f713b9ab2..658e67488 100644
--- a/src/README.md
+++ b/src/README.md
@@ -1,20 +1,18 @@
 # BitNet CPU Inference Optimization
 
-This update provides significant performance improvements for BitNet inference on CPU through paralleled kernel implementations, native I2_S GEMM/GEMV support, configurable tiling block size and embedding quantization.
+This update delivers significant performance improvements for BitNet inference on CPU through parallelized kernel implementations, native I2_S GEMM/GEMV support, configurable tiling block sizes, and embedding quantization.
 
 ## Update
 
-- **Parallel Weight & Activation Computation**
-  Implemented parallel processing of weights and activations in the W2A8 vet_dot kernel, achieving improved throughput on both x86 and ARM architectures.
+- **Parallel Weight & Activation Computation**
+  Implemented parallel processing of weights and activations in the W2A8 vec_dot kernel, achieving higher throughput on both x86 and ARM architectures.
+- **Native I2_S GEMM & GEMV Support**
+  Integrated I2_S GEMM and GEMV operations into the ggml library, ensuring full compatibility with the llama.cpp architecture. This enables seamless integration with existing inference pipelines.
+- **Configurable Tiling & Parallelism**
+  Introduced configurable GEMM and GEMV block sizes along with adjustable parallelism levels, allowing fine-tuned performance optimization across different CPU architectures.
+- **Embedding Quantization**
+  Added support for embedding layer quantization using the Q6_K format, reducing memory footprint and improving inference speed while maintaining high accuracy.
 
-- **Native I2_S GEMM & GEMV Support**
-  Integrated I2_S GEMM and GEMV operations into ggml library, making them fully compatible with the llama.cpp architecture. This enables seamless integration with existing inference pipelines.
-
-- **Configurable Tiling & Parallelism**
-  Introduced configurable GEMM & GEMV block sizes and parallelism levels, allowing performance fine-tuning for different CPU architectures.
-
-- **Embedding Quantization**
-  Added support for embedding layer quantization with Q6_K format, reducing memory footprint and improving inference speed while maintaining high accuracy.
 
 
 ## Usage
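
Aside for reviewers, not part of the patch above: the first bullet describes parallelizing the W2A8 (2-bit weight, 8-bit activation) vec_dot path. Below is a minimal sketch of that idea; the packed weight layout, the {-1, 0, +1} decode table, the OpenMP pragma, and all names are assumptions for illustration and do not reflect the actual kernel this change adds to ggml.

```c
/* Illustrative sketch only: 2-bit ternary weights packed four per byte,
 * multiplied against int8 activations, with rows processed in parallel. */
#include <stdint.h>
#include <stddef.h>

/* Assumed mapping from a 2-bit code to a ternary weight value. */
static const int8_t k_ternary[4] = { -1, 0, 1, 0 };

/* Dot product of one packed 2-bit weight row with an int8 activation vector.
 * Four weights per byte, so n must be a multiple of 4. */
static int32_t w2a8_vec_dot(const uint8_t *w_packed, const int8_t *act, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n / 4; ++i) {
        const uint8_t byte = w_packed[i];
        for (int j = 0; j < 4; ++j) {
            const int8_t w = k_ternary[(byte >> (2 * j)) & 0x3];
            acc += (int32_t)w * act[4 * i + j];
        }
    }
    return acc;
}

/* GEMV: every output row is an independent dot product, so rows can be
 * distributed across threads (here via OpenMP). */
void w2a8_gemv(const uint8_t *w_packed, const int8_t *act,
               const float *row_scales, float *out,
               size_t rows, size_t cols)
{
    #pragma omp parallel for
    for (size_t r = 0; r < rows; ++r) {
        out[r] = row_scales[r] * (float)w2a8_vec_dot(w_packed + r * (cols / 4), act, cols);
    }
}
```

Built with -fopenmp the row loop spreads across cores; without it the pragma is ignored and the code remains correct single-threaded.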