Bonus Performance Optimization: Cache Blocking Technique #10

joachimasare · 2024-11-19T03:05:14Z

I implemented a cache blocking technique to optimize matrix multiplication and measured a speedup over the reference implementation of about 1.5x on the first run and 1.6x on the second (approximately 49.2% and 59.5% faster, respectively).

Optimization Details:

Technique: Divided large matrices into smaller submatrices or blocks. These blocks were sized to align with my CPU’s cache hierarchy, reducing cache misses and improving data reuse.
Block Sizes:
- Rows (BM): 32
- Columns (BN): 32
- Reduction dimension (BK): 32
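
A minimal sketch of how a blocked loop nest with these sizes can look. It assumes row-major `float` matrices; `matmul_blocked` and `matmul_naive` are hypothetical helper names used only for illustration, and this is not the code from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Block sizes as listed in the PR description.
constexpr std::size_t BM = 32;  // rows of C per block
constexpr std::size_t BN = 32;  // columns of C per block
constexpr std::size_t BK = 32;  // slice of the reduction dimension per block

// Hypothetical helper: C = A * B with row-major A (M x K), B (K x N), C (M x N).
void matmul_blocked(const std::vector<float>& A, const std::vector<float>& B,
                    std::vector<float>& C, std::size_t M, std::size_t N,
                    std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
    std::fill(C.begin(), C.end(), 0.0f);
    for (std::size_t i0 = 0; i0 < M; i0 += BM) {
        const std::size_t i_end = std::min(i0 + BM, M);
        for (std::size_t k0 = 0; k0 < K; k0 += BK) {
            const std::size_t k_end = std::min(k0 + BK, K);
            for (std::size_t j0 = 0; j0 < N; j0 += BN) {
                const std::size_t j_end = std::min(j0 + BN, N);
                // The inner loops touch one tile each of A, B and C,
                // so the same cache lines are reused many times.
                for (std::size_t i = i0; i < i_end; ++i) {
                    for (std::size_t k = k0; k < k_end; ++k) {
                        const float a = A[i * K + k];
                        for (std::size_t j = j0; j < j_end; ++j) {
                            C[i * N + j] += a * B[k * N + j];
                        }
                    }
                }
            }
        }
    }
}

// Plain triple loop, included only to check the blocked version against.
void matmul_naive(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, std::size_t M, std::size_t N,
                  std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < K; ++k) sum += A[i * K + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
    }
}
```

The `std::min` bounds let the sketch handle matrix dimensions that are not multiples of 32; a tuned kernel would typically treat those edge tiles separately.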
CPU cache details considered when choosing the block size:
- L2 Cache Size: 18432 KB
- L3 Cache Size: 24576 KB
  The block size (32x32x32) was chosen to fit within the available L2/L3 cache for efficient data storage and retrieval during computation.
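
As a rough check of the sizing argument, the working set of one block step can be estimated as one tile each of A, B and C. The sketch below assumes 4-byte `float` elements, which the description does not state; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Bytes touched by one block step: an A tile (bm x bk), a B tile (bk x bn)
// and a C tile (bm x bn). Hypothetical helper for illustration.
constexpr std::size_t block_working_set_bytes(std::size_t bm, std::size_t bn,
                                              std::size_t bk,
                                              std::size_t elem_size) {
    return (bm * bk + bk * bn + bm * bn) * elem_size;
}

// True when the working set is no larger than a cache of `cache_kb` KB.
constexpr bool fits_in_cache(std::size_t working_set_bytes,
                             std::size_t cache_kb) {
    return working_set_bytes <= cache_kb * 1024;
}

// Share of the cache that the working set occupies (0.0 to 1.0 when it fits).
double cache_fraction(std::size_t working_set_bytes, std::size_t cache_kb) {
    assert(cache_kb > 0);
    return static_cast<double>(working_set_bytes) /
           (static_cast<double>(cache_kb) * 1024.0);
}

// Assumption: float elements. Three 32x32 tiles of 4 bytes each.
constexpr std::size_t kWorkingSet =
    block_working_set_bytes(32, 32, 32, sizeof(float));

static_assert(kWorkingSet == 12288, "three 32x32 float tiles are 12 KB");
static_assert(fits_in_cache(kWorkingSet, 18432), "fits in the reported L2");
static_assert(fits_in_cache(kWorkingSet, 24576), "fits in the reported L3");
```

Under that assumption the three tiles take 12 KB, well below both reported cache sizes, so there may be room to experiment with larger blocks.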

1st run result


2nd run result


619135593 · 2024-11-19T03:25:04Z

Hi, I have run ./evaluate.sh reference, but when I run ./chat, the result is a gibberish response too. What can I do?


joachimasare · 2024-12-15T08:34:39Z

Update to PR:

Hi @sxtyzhangzk Zhekai Zhang,
Following up on the feedback on Canvas, I have combined the cache blocking optimization with all the other techniques as requested and tested the performance with compiler optimization enabled (-Ofast). The commits were pushed to my branch recently. The results from the rerun are below:

Updated Results Table:

| Implementation | Total Time (ms) | Average Time (ms) | Count | GOPs |
| --- | --- | --- | --- | --- |
| Reference | 522.049011 | 52.204002 | 10 | 5.021444 |
| All Techniques (Without Cache Blocking) | 40.069000 | 4.006000 | 10 | 65.423145 |
| All Techniques (With Cache Blocking) | 37.713001 | 3.771000 | 10 | 69.510250 |

Performance Improvement:

All Techniques without Cache Blocking:
- GOPs: 65.42 (13.0x improvement over reference).
All Techniques with Cache Blocking:
- GOPs: 69.51 (13.8x improvement over reference).
- This represents a 6.25% improvement over the previous "all techniques" implementation.
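
The ratios can be recomputed from the GOPs column of the results table. The sketch below is only that arithmetic; `speedup` and `percent_gain` are hypothetical helper names, and the constants are copied from the table.

```cpp
#include <cassert>

// How many times faster `candidate` is than `baseline`, given throughput in GOPs.
double speedup(double candidate_gops, double baseline_gops) {
    assert(baseline_gops > 0.0);
    return candidate_gops / baseline_gops;
}

// Relative throughput gain of `candidate` over `baseline`, in percent.
double percent_gain(double candidate_gops, double baseline_gops) {
    return (speedup(candidate_gops, baseline_gops) - 1.0) * 100.0;
}

// GOPs values copied from the results table.
constexpr double kReferenceGops = 5.021444;
constexpr double kAllTechniquesGops = 65.423145;
constexpr double kWithBlockingGops = 69.510250;
```

With these values, `percent_gain(kWithBlockingGops, kAllTechniquesGops)` comes out at roughly 6.25%, and `speedup(kWithBlockingGops, kReferenceGops)` gives the factor over the reference.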

Screenshots:

Reference Implementation:
![Reference]()
All Techniques (Without Cache Blocking):
![All Techniques]()
All Techniques (With Cache Blocking):
![All Techniques + Cache Blocking]()

Commit Details

Added cache blocking to all_techniques.cc with fallback for ARM architecture.
Enabled compiler optimizations (-Ofast) in the Makefile for better performance evaluation.
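
The thread does not show how the ARM fallback is implemented. One common pattern, sketched here purely as an assumption, is to guard x86 intrinsics behind a preprocessor check and keep a portable loop for every other target. The `dot` function is a hypothetical example and is not taken from all_techniques.cc.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__AVX__)
#include <immintrin.h>  // x86 AVX intrinsics; not available on ARM
#endif

// Hypothetical example: dot product of two float arrays of length n.
float dot(const float* a, const float* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    std::size_t i = 0;
    float sum = 0.0f;
#if defined(__AVX__)
    // x86 path: multiply and accumulate 8 floats per iteration.
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        const __m256 va = _mm256_loadu_ps(a + i);
        const __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    for (float lane : lanes) sum += lane;
#endif
    // Portable path: handles everything on ARM (or any build without AVX)
    // and only the leftover tail elements on x86.
    for (; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
```

Because the check happens at compile time, the same source builds on both architectures, and the compiler is free to auto-vectorize the portable loop when optimizations are enabled.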

Summary:

After integrating cache blocking with all the other techniques, I measured a further 6.25% performance improvement over the previous "All Techniques" implementation.

"All Techniques" achieved a ~13.0x improvement over the reference implementation.
"All Techniques + Cache Blocking" further improved performance to ~13.8x compared to the reference.

sxtyzhangzk · 2024-12-16T19:37:55Z

Got a 6% speedup over our reference solution on my desktop. Great job!

joachimasare · 2024-12-18T20:51:02Z

great! thanks.

Addition of cache blocking optimization for bonus performance evaluation by joachim asare

b949215

Integrated the cache blocking into all_techniques as asked to include to the bonus optimization with ARM fallback and compiler optimizations

61626c6

joachimasare closed this Dec 18, 2024

joachimasare reopened this Dec 18, 2024
