[SW] Initial support for compilation in Linux environment #312

mp-17 · 2024-06-25T10:12:38Z

#269 rework

Introduce initial support for kernel compilation in a Linux environment

Changelog

Fixed

Description of changes

Added

Description of changes

Changed

Description of changes

Checklist

Automated tests pass
Changelog updated
Code style guideline is observed


If LMUL_X has X > 1, Ara injects one reshuffle at a time for each register between Vn and V(n+X-1) that has an EEW mismatch. Each of these reshuffles operates on a different Vm with LMUL_1. However, from the point of view of the hazard checks on the next instruction that depends on Vn with LMUL_X, they all target the same register (Vn with LMUL_X). We cannot just inject one macro reshuffle, since the registers between Vn and V(n+X-1) can have different encodings. So we need finer-grained reshuffles, and these mess up the dependency tracking.

For example, vst @, v0 (LMUL_8) uses the registers from v0 to v7. If they are all reshuffled, we end up with 8 reshuffle instructions with IDs 0 to 7. The store then sees a dependency only on the reshuffle ID that targets v0. This is wrong: once the v0 reshuffle is over, if the store opreq is faster than the slide opreq, the store violates the RAW dependency on v1-v7.

To avoid this, the safest (and least efficient) fix is to wait in WAIT_IDLE after a reshuffle with LMUL > 1. Several optimizations are possible:

1) When LMUL > 1, check whether more than one register was reshuffled. If only one register is reshuffled, WAIT_IDLE can also be skipped.
2) Check whether all X registers need to be reshuffled (the common case). If so, inject a single large reshuffle with LMUL_X and skip WAIT_IDLE.
3) Instead of waiting in WAIT_IDLE, inject the reshuffles starting from V(n+X-1) rather than Vn. This automatically adjusts the dependency check and speeds up the whole operation a bit.
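The ordering problem and optimization 3 can be illustrated with a small behavioral model. This is a hypothetical sketch, not Ara's RTL: it assumes every register in Vn..V(n+X-1) needs a reshuffle, that reshuffles receive consecutive IDs in injection order and complete in that order, and that the dependent instruction waits only on the reshuffle whose destination is Vn. The function names are illustrative.

```python
# Hypothetical behavioral model, not Ara's RTL.
# Assumptions: all registers Vn..V(n+X-1) are reshuffled, reshuffles get
# consecutive IDs in injection order and complete in order, and the dependent
# instruction (e.g. a vst on Vn with LMUL_X) waits only on Vn's reshuffle.


def inject_reshuffles(base_reg, lmul, reverse=False):
    """Return (instr_id, dest_reg) pairs for the per-register reshuffles."""
    regs = list(range(base_reg, base_reg + lmul))
    if reverse:
        # Optimization 3: inject from V(n+X-1) down to Vn.
        regs.reverse()
    return list(enumerate(regs))


def raw_is_safe(reshuffles, base_reg):
    """True if waiting on Vn's reshuffle also covers every other reshuffle."""
    tracked_id = next(iid for iid, reg in reshuffles if reg == base_reg)
    last_id = max(iid for iid, _ in reshuffles)
    # With in-order completion, only the last-injected reshuffle covers all.
    return tracked_id == last_id


# vst @, v0 with LMUL_8: v0..v7 are all reshuffled.
forward = inject_reshuffles(0, 8)
backward = inject_reshuffles(0, 8, reverse=True)
print(raw_is_safe(forward, 0))   # False: the store only waits on ID 0
print(raw_is_safe(backward, 0))  # True: v0 gets ID 7, the last one
```

In this model, reversing the injection order makes Vn's reshuffle the youngest one, so the existing single-ID dependency check becomes sufficient without WAIT_IDLE.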

* Add MMU interface (just mock)
* Refactoring

* Switch from pulp-platform/cva6 to MaistoV/cva6_fork
* Bump axi to v0.39.0

* vstart support for vector unit-stride loads and stores
* vstart support for vector strided loads and stores
* vstart support for valu operations, mask operations not tested
* Preliminary work on vstart support for vector indexed loads and stores
* Minor fixes
* Refactoring
* Explanatory comments

- Restrict mem bus to EW if vstore, vstart > 0, and EW < 64-bit

  If vstart > 0 and EW < 64, the situation is similar to when the memory address is misaligned with respect to the memory bus. Because of the VRF byte layout, and since the granularity of each lane's payload to the store unit is 64 bits, all the packets can contain valid data while the beat is not yet complete. So we can either compute in the addrgen the effective length of a burst with unequal beats, add a buffer and aligner in the store unit, handle the ready signals at byte level, or simply reduce the effective memory bus width to the element width (worst case). We do the latter. Performance is low, but vector stores with vstart > 0 happen after an exception, so the throughput drop should be acceptable.

- Data packets from VRF to STU

  If vstart > 0, the operand requesters now send balanced payloads from all the lanes. The store unit identifies the valid ones by itself and only has to handshake balanced payloads.
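The cost of the worst-case choice (shrinking the bus to the element width) can be estimated with a little arithmetic. The sketch below is a hypothetical model, not Ara's addrgen: `AXI_DATA_WIDTH = 128` is an assumed example value, the 64-bit lane payload comes from the text above, and address misalignment is ignored to keep the beat count simple.

```python
# Hypothetical model of the worst-case bus restriction; not Ara's RTL.
# AXI_DATA_WIDTH is an assumed example value; misalignment is ignored.

AXI_DATA_WIDTH = 128    # bits per memory beat (assumed)
LANE_PAYLOAD_BITS = 64  # per-lane payload granularity to the store unit


def effective_bus_bytes(is_store, vstart, eew_bits, axi_data_width=AXI_DATA_WIDTH):
    """Bytes per beat used for a unit-stride access."""
    if is_store and vstart > 0 and eew_bits < LANE_PAYLOAD_BITS:
        # Worst case: shrink the bus to one element per beat.
        return eew_bits // 8
    return axi_data_width // 8


def num_beats(vl, vstart, eew_bits, is_store):
    """Beats needed to move elements vstart..vl-1 (misalignment ignored)."""
    remaining_bytes = (vl - vstart) * (eew_bits // 8)
    bus_bytes = effective_bus_bytes(is_store, vstart, eew_bits)
    return -(-remaining_bytes // bus_bytes)  # ceiling division


# A 32-bit access resumed at vstart = 3 with vl = 16:
print(num_beats(16, 3, 32, is_store=True))   # one element per beat
print(num_beats(16, 3, 32, is_store=False))  # full-width beats
```

Under these assumptions, the resumed store takes 13 single-element beats instead of 4 full-width ones, which is the throughput drop the text considers acceptable for the post-exception path.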

- Time the STU exception flush with the opqueues

The vstart signal within the lanes is not the architectural vstart. For all the instructions, it corresponds to the architectural vstart manipulated to reflect the "vstart" in every lane for VRF fetch address calculation purposes. Memory instructions, which support arch vstart > 0, can use that vstart signal to resize the number of elements to fetch from the VRF. Slide instructions, instead, further modify the vstart only for addressing purposes, and should not use the vstart signal to resize the number of elements to fetch.
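How an architectural vstart might translate into a per-lane "vstart" can be sketched as follows. This is a hypothetical model, not Ara's RTL: it assumes that element i of a vector register lives in lane i % nr_lanes and that 0 <= vstart <= vl. The helper names are illustrative. The second function shows how a memory instruction could use the per-lane value to resize the number of elements fetched from the VRF, which, per the text above, slide instructions should not do.

```python
# Hypothetical sketch, not Ara's RTL. Assumes element i of a vector register
# is stored in lane i % nr_lanes and that 0 <= arch_vstart <= vl.


def elements_before(n, lane, nr_lanes):
    """Count indices i < n with i % nr_lanes == lane."""
    return n // nr_lanes + (1 if lane < n % nr_lanes else 0)


def lane_vstart(arch_vstart, lane, nr_lanes):
    """Per-lane vstart: elements of `lane` that precede the architectural vstart."""
    return elements_before(arch_vstart, lane, nr_lanes)


def elements_to_fetch(vl, arch_vstart, lane, nr_lanes):
    """Elements a lane still fetches from the VRF for a memory op resumed at vstart."""
    return elements_before(vl, lane, nr_lanes) - lane_vstart(arch_vstart, lane, nr_lanes)


# 4 lanes, vl = 10, resuming at vstart = 6 (elements 6..9 remain):
for lane in range(4):
    print(lane, lane_vstart(6, lane, 4), elements_to_fetch(10, 6, lane, 4))
```

Summed over all lanes, the per-lane values add back up to the architectural vstart and to vl - vstart, respectively.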

* Added LINUX switch, default LINUX=0

mp-17 · 2024-10-16T13:37:03Z

Continue this in #319

MaistoV and others added 15 commits June 19, 2024 18:13

Rename CSRs in ara dispatcher

665b6ae

Stall Ara upon operations on vector CSRs

193f385

Change "errors" to "exceptions"

d5e2fb2

Extend and fix Ara exception reporting from VLSU

e4c2466

Set vstart=0 for successful vector instructions

2c9d5ba

[hardware] Fix vstart handling in dispatcher

26af2ec

[hardware] 🐛 Fix reshuffling bug in dispatcher

7bcfcbf

[hardware] 🐛 Fix eew_q update during reshuffle

7b0191a

[hardware] 🐛 Fix reshuffle

4733f20

[hardware] 🐛 Consider LMUL when deciding whether to reshuffle vd

a8426f3

[hardware] Bump CVA6

2fed184

Refactoring addrgen

dd0047b

Extensions and bug fixes

b6363d2


Update submodules

fc8dd42


mp-17 mentioned this pull request Jun 25, 2024

[Draft] ✨ Extend sw build flow for Linux environment #269

Closed


MaistoV and others added 14 commits June 25, 2024 12:17

tmp commit Adding MMU logic

9dd870f

tmp commit (MMU stub)

d0a026a

tmp commit (Extending Ara for MMU stub exceptions)

37faafa

[hardware] 🐛 Fix end addr computation in addrgen

a3d46d2

[hardware] 🐛 Flush st-opqueue and reset st-requester upon exception

a9411ea


[hardware] 🐛 Reshuffle before a vst with vstart > 0

929bcac

[hardware] 🐛 Fix packet count hw

22031a1

[hardware] 🐛 Fix case when ld exception is not on the first beat

62f6d5a

[hardware] 🐛 Fix case when st exception is not on the first beat

0532ffb

[hardware] 🐛 Stop Ara frontend until idle after an exception

9be56e9

[hardware] 🐛 Signal mem op done passing from VLDU and VSTU

f12be75

mp-17 added 5 commits June 25, 2024 12:17

[hardware] 🐛 Fix addr check on AXI transactions

097049f

[hardware] Fix verilator lint

404ce2f

[hardware] Compress accelerator MMU interface

9067762

[hardware] Fix rebase errors

11be9ed

[Bender] Bump CVA6

e385500

mp-17 force-pushed the mp/sw/os branch from 94bd57a to fb6985f June 25, 2024 10:22

[apps] Extended sw build for Linux

0a994f9


mp-17 force-pushed the mp/sw/os branch from fb6985f to 0a994f9 June 25, 2024 11:40

mp-17 closed this Oct 16, 2024
