[SW] Initial support for compilation in Linux environment #312
Commits on Jun 19, 2024
- 665b6ae
- 193f385
- d5e2fb2
- e4c2466
- 2c9d5ba
- 26af2ec
- 7bcfcbf
- 7b0191a
- 7d6da86: [hardware] 🐛 Suboptimal fix to reshuffle with LMUL > 1

  If LMUL_X has X > 1, Ara injects one reshuffle at a time for each register between Vn and V(n+X-1) that has an EEW mismatch. Each of these reshuffles targets a different Vm with LMUL_1, but from the point of view of the hazard checks of the next instruction that depends on Vn with LMUL_X, they all target the same register (Vn with LMUL_X). We cannot simply inject one macro reshuffle, since the registers between Vn and V(n+X-1) can have different encodings; we need finer-grained reshuffles, and these mess up the dependency tracking.

  For example, vst @, v0 (LMUL_8) uses registers v0 to v7. If they all need reshuffling, we end up with 8 reshuffle instructions with IDs from 0 to 7. The store then sees a dependency only on the reshuffle ID that targets v0. This is wrong: if the store operand requester is faster than the slide operand requester once the v0 reshuffle is over, it violates the RAW dependency. To avoid this, the safest and most suboptimal fix is to simply wait in WAIT_IDLE after a reshuffle with LMUL > 1.

  Possible optimizations (see the sketch after this entry):
  1) When LMUL > 1, check whether more than one register was reshuffled. If only one register was reshuffled, WAIT_IDLE can be skipped.
  2) Check whether all X registers need to be reshuffled (the common case). If so, inject one large reshuffle with LMUL_X and skip WAIT_IDLE.
  3) Instead of waiting until idle, inject the reshuffles starting from V(n+X-1) rather than Vn. This automatically adjusts the dependency check and speeds up the whole operation a bit.
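A minimal Python sketch of the hazard described above, assuming a simplified scoreboard in which every injected micro-operation gets a sequential ID and a dependent instruction records only the producer ID of the base register of its LMUL group (function and variable names are illustrative, not Ara's RTL):

```python
# Toy scoreboard model of the LMUL > 1 reshuffle hazard (illustrative names,
# not Ara's actual RTL). Each injected micro-op gets a sequential ID; the
# dependent store only records the producer ID of its base register.

def inject_reshuffles(regs, scoreboard, next_id):
    """Inject one reshuffle per register; record producer IDs in program order."""
    for r in regs:
        scoreboard[r] = next_id      # producer ID for this vector register
        next_id += 1
    return next_id

def store_dependency(base_reg, scoreboard):
    """The store's hazard check only sees the base register of the LMUL group."""
    return scoreboard.get(base_reg)

# vst @, v0 with LMUL_8: v0..v7 all need a reshuffle.
group = [f"v{i}" for i in range(8)]

# Naive order (v0 first): the store waits only on ID 0, while reshuffles
# 1..7 may still be in flight -> potential RAW violation.
sb = {}
inject_reshuffles(group, sb, next_id=0)
print("naive order, store waits on ID:", store_dependency("v0", sb))     # 0

# Optimization 3: inject from v7 down to v0, so the base register's
# reshuffle gets the last (highest) ID. With in-order completion, waiting
# on that ID implies all earlier reshuffles have already retired.
sb = {}
inject_reshuffles(list(reversed(group)), sb, next_id=0)
print("reversed order, store waits on ID:", store_dependency("v0", sb))  # 7
```

With the naive order, the store's recorded dependency (ID 0) retires while reshuffles 1 to 7 may still be in flight; with the reversed order of optimization 3, waiting on the base register's ID also covers all earlier reshuffles, assuming the slide unit completes them in order.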
- 4733f20
- a8426f3
- 2fed184
Commits on Jun 25, 2024
- dd0047b
- b6363d2: Add MMU interface (just a mock); refactoring
- fc8dd42: Switch from pulp-platform/cva6 to MaistoV/cva6_fork; bump axi to v0.39.0
- 310c4da: Support the vstart CSR for operand read, VALU, VLSU
  * vstart support for vector unit-stride loads and stores
  * vstart support for vector strided loads and stores
  * vstart support for VALU operations (mask operations not tested)
  * Preliminary work on vstart support for vector indexed loads and stores
  * Minor fixes
  * Refactoring
  * Explanatory comments
  A sketch of the vstart semantics follows this entry.
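For reference, the RISC-V V specification defines vstart so that elements with index below vstart are left unmodified, execution resumes at vstart, and vstart is reset to zero when the instruction completes. The helper below is an illustrative software model of that behavior for a unit-stride load, not Ara's RTL:

```python
# Reference-model sketch of vstart semantics for a unit-stride vector load
# (RISC-V V spec behavior; illustrative code, not Ara's RTL).

def vle_unit_stride(vreg, memory, base, vl, vstart, eew_bytes=1):
    """Load vl elements of eew_bytes each from memory[base...], resuming at vstart."""
    for i in range(vstart, vl):
        addr = base + i * eew_bytes
        vreg[i] = memory[addr]        # elements [0, vstart) are not modified
    return 0                          # vstart resets to 0 on completion

memory = list(range(100, 164))
vreg = [0] * 8

# Resume after a (hypothetical) trap that left vstart = 3:
new_vstart = vle_unit_stride(vreg, memory, base=0, vl=8, vstart=3)
print(vreg)        # [0, 0, 0, 103, 104, 105, 106, 107]
print(new_vstart)  # 0
```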
- 9dd870f
- d0a026a
- 37faafa
- a3d46d2
- 09b927f: [hardware] Support VLD and VST with vstart > 0
  - Restrict the memory bus to EW if vstore, vstart > 0, and EW < 64 bit.
    If vstart > 0 and EW < 64, the situation is similar to a memory address that is misaligned with respect to the memory bus. Because of the VRF byte layout, and since the granularity of each lane's payload to the store unit is 64 bit, all the packets can contain valid data even though the beat has not been completed. So either we calculate in the addrgen the effective length of a burst with unequal beats, or we add a buffer and aligner in the store unit, or we handle the ready signals at byte level, or we simply reduce the effective memory bus to the element width (worst case). We do the latter; a sketch follows this entry. It is low performance, but a vstore with vstart > 0 only happens after an exception, so the throughput drop should be acceptable.
  - Data packets from VRF to STU: the operand requesters now send balanced payloads from all the lanes if vstart > 0. The store unit identifies the valid ones by itself and only has to handshake balanced payloads.
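A small sketch of the worst-case narrowing chosen above, assuming a 64-bit memory bus whose effective per-beat width is clamped to the element width whenever a store starts with vstart > 0 and EW < 64 bit (the helper and constant names are illustrative, not Ara's addrgen interface):

```python
# Sketch of the "reduce the effective memory bus to the element width" fallback
# for stores resuming with vstart > 0 (illustrative helper, not Ara's addrgen).

BUS_BYTES = 8  # 64-bit memory bus

def effective_bus_bytes(is_store: bool, vstart: int, eew_bytes: int) -> int:
    """Return how many bytes per beat the store unit may actually use."""
    if is_store and vstart > 0 and eew_bytes * 8 < 64:
        return eew_bytes      # worst case: one element per beat
    return BUS_BYTES          # normal case: full bus width

# Normal store, EW = 16 bit: full 8-byte beats.
print(effective_bus_bytes(is_store=True, vstart=0, eew_bytes=2))  # 8
# Store resuming after an exception with vstart = 5, EW = 16 bit: 2-byte beats.
print(effective_bus_bytes(is_store=True, vstart=5, eew_bytes=2))  # 2
```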
- a9411ea: [hardware] 🐛 Flush st-opqueue and reset st-requester upon exception
  - Time the STU exception flush with the operand queues
- 929bcac
- 22031a1
- 62f6d5a
- 0532ffb
- 9be56e9
- f12be75
- 699a2c9: [hardware] 🐛 Don't use vstart to drop elements for slides
  The vstart signal within the lanes is not the architectural vstart. For all instructions, it is the architectural vstart manipulated to reflect the per-lane "vstart" used to calculate VRF fetch addresses. Memory instructions, which support architectural vstart > 0, can also use that vstart signal to resize the number of elements to fetch from the VRF. Slide instructions, instead, further modify vstart for addressing purposes only, and must not use the vstart signal to resize the number of elements to fetch. A rough model follows this entry.
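A rough software model of the distinction above, assuming elements of a vector register are striped round-robin across the lanes (this striping and all names below are assumptions for illustration, not Ara's actual mapping):

```python
# Rough model of the lane-local "vstart" used for VRF fetch addressing in a
# multi-lane VRF with round-robin element striping (illustrative, not Ara's RTL).

NR_LANES = 4

def lane_vstart(arch_vstart: int, lane: int) -> int:
    """Lane-local start index used for VRF address calculation."""
    return arch_vstart // NR_LANES + (1 if (arch_vstart % NR_LANES) > lane else 0)

def lane_elements_to_fetch(vl: int, fetch_start: int, lane: int) -> int:
    """How many elements this lane fetches, starting from fetch_start."""
    total = vl // NR_LANES + (1 if (vl % NR_LANES) > lane else 0)
    return max(total - fetch_start, 0)

arch_vstart, vl = 5, 16
for lane in range(NR_LANES):
    lv = lane_vstart(arch_vstart, lane)
    # Memory ops resume at arch_vstart: use the lane-local vstart both for the
    # fetch address and to shrink the element count.
    mem_cnt = lane_elements_to_fetch(vl, lv, lane)
    # Slides offset the lane vstart for addressing only; the number of elements
    # to fetch must still be derived from vl, not from that modified signal.
    slide_cnt = lane_elements_to_fetch(vl, 0, lane)
    print(f"lane {lane}: lane_vstart={lv}, mem fetch={mem_cnt}, slide fetch={slide_cnt}")
```

The point is only that the lane-local vstart is an addressing quantity; for slides it must not also shrink the per-lane element count.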
- 097049f
- 404ce2f
- 9067762
- 11be9ed
- e385500
- 0a994f9: [apps] Extended sw build for Linux
  * Added a LINUX switch, default LINUX=0