Systolic arrays sizes #43

mateo-vm · 2021-12-13T13:30:49Z

I am testing the systolic array accelerator, and I am having problems when changing the array sizes. For the default configuration (8x8), everything works fine.

For smaller sizes, the simulation never finishes: it gets stuck in a loop and never leaves it. The following is an example of the loop for a 4x4 systolic array. This happens even with the base test.c.

963152000: system.systolic_array_acc.input_fetch0: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963152000: system.systolic_array_acc.input_fetch1: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963152000: system.systolic_array_acc.input_fetch2: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963152000: system.systolic_array_acc.input_fetch3: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963152000: system.systolic_array_acc.weight_fetch0: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963152000: system.systolic_array_acc.weight_fetch1: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963152000: system.systolic_array_acc.weight_fetch2: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963152000: system.systolic_array_acc.weight_fetch3: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963153000: global: Weight fold barrier, arrived: 1.
963153000: global: Weight fold barrier, arrived: 2.
963153000: global: evaluate
963153000: system.systolic_array_acc.input_fetch0: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963153000: system.systolic_array_acc.input_fetch1: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963153000: system.systolic_array_acc.input_fetch2: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963153000: system.systolic_array_acc.input_fetch3: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963153000: system.systolic_array_acc.weight_fetch0: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963153000: system.systolic_array_acc.weight_fetch1: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963153000: system.systolic_array_acc.weight_fetch2: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963153000: system.systolic_array_acc.weight_fetch3: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963154000: global: Weight fold barrier, arrived: 3.
963154000: global: Weight fold barrier, arrived: 4.
963154000: global: evaluate
963154000: system.systolic_array_acc.input_fetch0: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963154000: system.systolic_array_acc.input_fetch1: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963154000: system.systolic_array_acc.input_fetch2: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963154000: system.systolic_array_acc.input_fetch3: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963154000: system.systolic_array_acc.weight_fetch0: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963154000: system.systolic_array_acc.weight_fetch1: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963154000: system.systolic_array_acc.weight_fetch2: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963154000: system.systolic_array_acc.weight_fetch3: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963155000: global: Weight fold barrier, arrived: 5.
963155000: global: Weight fold barrier, arrived: 6.
963155000: global: evaluate
963155000: system.systolic_array_acc.input_fetch0: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963155000: system.systolic_array_acc.input_fetch1: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963155000: system.systolic_array_acc.input_fetch2: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963155000: system.systolic_array_acc.input_fetch3: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963155000: system.systolic_array_acc.weight_fetch0: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963155000: system.systolic_array_acc.weight_fetch1: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963155000: system.systolic_array_acc.weight_fetch2: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963155000: system.systolic_array_acc.weight_fetch3: Fetch queue occupied space: 0 / 32, allFetched: 1, allConsumed: 1, arrived at barrier: 1.
963156000: global: Weight fold barrier, arrived: 7.
963156000: global: Weight fold barrier, arrived: 8.
963156000: global: All have arrived at the weight fold barrier.

As for the bigger SAs, there are two cases. With the base test.c, SAs smaller than 16x16 work as expected. For 16x16 and bigger, I get a segmentation fault.

When I simulate bigger layers (e.g., ResNet's conv3) on bigger SAs, I get the following error:

fatal: Streaming out premature data!

How can these errors be fixed?

xyzsam · 2021-12-20T16:13:47Z

Yuan did you get a chance to look into this? IIRC we've tested out both 4x4 and 16x16 arrays in the past.

yaoyuannnn · 2021-12-20T19:25:44Z

> Yuan did you get a chance to look into this? IIRC we've tested out both 4x4 and 16x16 arrays in the past.

Sam, I haven't taken a look. It could be bugs introduced by more recent changes. Will look into this this week.

mateo-vm · 2022-01-14T11:03:40Z

Hi! Is there any update on this issue?

yaoyuannnn · 2022-01-17T05:40:10Z

Sorry for the late response. After a bit of investigation, I think the hang in the 4x4 PE configuration is caused by a bug in the commit unit (which collects data from the PEs and writes results to the local SRAM). It currently assumes the number of PE columns is larger than the writeback line size. I will upload a fix for this tomorrow.
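For readers following along, here is a minimal sketch of how an assumption like this can lead to a hang. It is not the actual commit unit code: the function names and the line size of 8 elements are hypothetical, chosen only for illustration. If the number of writeback lines per row is computed with floor division, a 4-column array yields zero lines, so a unit waiting for "all lines written" never makes progress. Rounding up treats a partially filled line as one writeback.

```python
# Hypothetical illustration only; names and the line size are assumptions,
# not taken from the simulator's source.

def writeback_lines_floor(pe_cols: int, line_elems: int) -> int:
    """Floor division: implicitly assumes pe_cols >= line_elems."""
    return pe_cols // line_elems


def writeback_lines_ceil(pe_cols: int, line_elems: int) -> int:
    """Ceiling division: a partially filled line still counts as one writeback."""
    return (pe_cols + line_elems - 1) // line_elems


LINE_ELEMS = 8  # assumed number of elements per writeback line

for cols in (4, 8, 16):
    print(cols, writeback_lines_floor(cols, LINE_ELEMS),
          writeback_lines_ceil(cols, LINE_ELEMS))
```

With these assumed values, only the 4-column case differs between the two functions, which matches the symptom that the default 8x8 configuration works while 4x4 does not.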

xyzsam · 2022-02-07T23:31:33Z

@mateo-vm: did this resolve your problems?

yaoyuannnn · 2022-02-08T00:21:47Z

Hey Sam, the MRs didn't fix the bug for the 16x16 configuration. Sorry, I've been quite busy lately, but I will get to this soon.

mateo-vm · 2022-02-10T11:34:27Z

> @mateo-vm: did this resolve your problems?

Yes, thank you very much. In the end I focused on arrays up to 8x8, so #45 really helped. On my side it is solved, but I won't close the issue in case you still want to solve the 16x16 problem.

yaoyuannnn self-assigned this Dec 15, 2021

yaoyuannnn added the bug label Dec 15, 2021

This was referenced Jan 18, 2022

systolic-array: Fix access size in scratchpad. #44

Merged

systolic-array: Fix bugs for small PE configurations. #45

Merged

mateo-vm mentioned this issue Feb 18, 2022

GEMM in Systolic Array #46

Closed
