This repository has been archived by the owner on Nov 28, 2023. It is now read-only.

Implement r0 crate in assembly #123

Merged 5 commits into rust-embedded:master from r0-in-asm on Oct 2, 2023

Conversation

coastalwhite
Contributor

This implements the r0::init_data and r0::zero_bss routines in assembly. A single generic implementation serves both riscv32 and riscv64, since a riscv64-specific one would run into alignment problems. The routines stay at their old call site so that only one hardware thread calls them; consequently, they are also inlined into the start_rust function.

One special consideration I ran into is the alignment constraints for the .bss and .data sections. The r0 documentation notes that the .bss section should also be 4-byte aligned. This makes sense, since r0 accessed the sections as u32s, which requires that alignment. The current link.x file does not enforce this constraint for .bss. It does enforce it for .data, so I am assuming the omission is a bug. I added the constraint, since this patch also requires it to work correctly.
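As a rough illustration of why the 4-byte alignment matters for word-wise initialization, here is a minimal host-side sketch. It is not the crate's code: `word_count` and `zero_words` are hypothetical helpers, and the addresses passed in merely stand in for linker symbols such as `_sbss` and `_ebss`.

```rust
/// Number of 32-bit words between two section bounds, or `None` when a
/// word-wise loop could not cover the range exactly.
///
/// Hypothetical helper: `start` and `end` stand in for linker symbols such
/// as `_sbss`/`_ebss`; it is not part of `r0` or `riscv-rt`.
fn word_count(start: usize, end: usize) -> Option<usize> {
    // A loop that advances 4 bytes at a time only lands exactly on `end`
    // when both bounds are multiples of 4.
    if start % 4 != 0 || end % 4 != 0 || end < start {
        return None;
    }
    Some((end - start) / 4)
}

/// Model of a word-wise `.bss` clear over an already aligned region.
fn zero_words(region: &mut [u32]) {
    for word in region.iter_mut() {
        *word = 0;
    }
}
```

With unaligned bounds the helper reports `None`, which corresponds to the case the linker-script assertion is meant to rule out before the assembly routine ever runs.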

Ideally, the riscv64 implementation would use the 64-bit load and store instructions. This is currently not possible. We could work around the 4-byte alignment without changing the link.x script by using the following code.

core::arch::asm!(
    "
        // Copy over .data
        la      {start},_sdata
        la      {end},_edata
        la      {input},_sidata

        // Copy one leading word if the destination is not 8-byte aligned
        andi    {a},{start},4
        beqz    {a},1f
        lw      {b},0({input})
        sw      {b},0({start})
        addi    {input},{input},4
        addi    {start},{start},4

    1:
        // Copy doublewords while at least 8 bytes remain
        addi    {a},{start},8
        bgtu    {a},{end},1f
        ld      {b},0({input})
        sd      {b},0({start})
        addi    {start},{start},8
        addi    {input},{input},8
        j       1b

    1:
        // Copy one trailing word if the end is not 8-byte aligned
        andi    {a},{end},4
        beqz    {a},1f
        lw      {b},0({input})
        sw      {b},0({start})

    1:
        // Zero out .bss
        la      {start},_sbss
        la      {end},_ebss

        // Zero one leading word if the start is not 8-byte aligned
        andi    {b},{start},4
        beqz    {b},2f
        sw      zero,0({start})
        addi    {start},{start},4

    2:
        // Zero doublewords while at least 8 bytes remain
        addi    {a},{start},8
        bgtu    {a},{end},2f
        sd      zero,0({start})
        addi    {start},{start},8
        j       2b

    2:
        // Zero one trailing word if the end is not 8-byte aligned
        andi    {b},{end},4
        beqz    {b},2f
        sw      zero,0({start})

    2:
        // Clear the scratch registers
        li      {start},0
        li      {end},0
        li      {input},0
    ",
    start = out(reg) _,
    end = out(reg) _,
    input = out(reg) _,
    a = out(reg) _,
    b = out(reg) _,
);

I did try this, but it can still misbehave, since _sidata may be aligned differently than _sdata. In that case, the load and store instructions operate on unaligned addresses, which may cause exceptions according to the RISC-V manual. Therefore, we would also need to ensure the alignment of _sidata. We could assert this, but again, that might be undesirable.
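To make the alignment concern concrete, the following host-side sketch models how such a copy would be split into a leading word, a run of doublewords, and a trailing word. It is only an illustration under stated assumptions: `CopyPlan` and `plan_rv64_copy` are hypothetical names, and the arguments stand in for the `_sdata`, `_edata`, and `_sidata` addresses.

```rust
/// How a 4-byte-aligned copy would be split on RV64 (hypothetical model).
#[derive(Debug, PartialEq, Eq)]
struct CopyPlan {
    head_words: usize,  // 0 or 1 leading `lw`/`sw` pair
    doublewords: usize, // number of `ld`/`sd` pairs
    tail_words: usize,  // 0 or 1 trailing `lw`/`sw` pair
}

/// Returns `None` when the doubleword strategy is not safe to use.
fn plan_rv64_copy(sdata: usize, edata: usize, sidata: usize) -> Option<CopyPlan> {
    // All bounds must be word aligned, as the linker script asserts.
    if sdata % 4 != 0 || edata % 4 != 0 || sidata % 4 != 0 || edata < sdata {
        return None;
    }
    // `ld` and `sd` are both aligned only if source and destination agree
    // modulo 8; otherwise one side of every doubleword access is unaligned.
    if sdata % 8 != sidata % 8 {
        return None;
    }
    let len = edata - sdata;
    let head_words = if sdata % 8 != 0 && len >= 4 { 1 } else { 0 };
    let rest = len - 4 * head_words;
    Some(CopyPlan {
        head_words,
        doublewords: rest / 8,
        tail_words: (rest % 8) / 4,
    })
}
```

For instance, a destination at `0x1000` with a load address at `0x2004` yields `None`: every `ld` from the source would be misaligned even though both addresses satisfy the 4-byte constraint.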

As also proposed in #122, we could add a feature to zero out the entire RAM, but I believe that is a separate issue unto itself.

Fixes #122

@coastalwhite coastalwhite requested a review from a team as a code owner September 24, 2023 20:21
@romancardenas
Contributor

Thanks! I'll review it ASAP

Contributor

@romancardenas romancardenas left a comment

I left you a few comments. I had some doubts with the loops, maybe I'm missing something.

Regarding RISC-V 64 specific routines, we can use feature gates to select one or another method. However, the current implementation should work for both RISC-V 32 and 64 (I guess).

romancardenas previously approved these changes Sep 29, 2023
Contributor

@romancardenas romancardenas left a comment

LGTM

@coastalwhite
Contributor Author

Is there anything you want me to do about the RV64 specific routine? I can add that. I just need to modify the link.x to do so.

@romancardenas
Contributor

I've been reviewing the current link.x file and think the best approach is adding those extra assertions instead of changing the alignment requirements. What do you think? Then, we can provide a RISCV64-specific routine as you suggest.

@romancardenas
Contributor

BTW! Please, add your changes to CHANGELOG.md before merging this! (I need to set up a CI workflow to make me not forget about this step...)

@coastalwhite
Contributor Author

So, I now added the RV64 version. Could you please check to make sure I didn't make any mistakes?

romancardenas previously approved these changes Oct 1, 2023
Contributor

@romancardenas romancardenas left a comment

LGTM.

The only "improvement" I can propose now depends on whether the new assertion is only required in RISC-V 64 targets. If so, we can add link-32.x and link-64.x linker files with arch-specific assertions. Then, build.rs would add to the compilation the correct additional linker file with a generic name such as link-arch.x, which will be included in the current link.x file. The hifive1 crate does something similar for a set of boards.

Another improvement would be modifying start_rust to look more like in esp-riscv-rt. I'd love to update riscv-rt using some of the ideas of esp-riscv-rt. Maybe @MabezDev can give us more insights about this :)

In any case, these improvements can be tackled in future PRs.

link.x Outdated
@@ -150,6 +150,10 @@ BUG(riscv-rt): .data is not 4-byte aligned");
ASSERT(_sidata % 4 == 0, "
BUG(riscv-rt): the LMA of .data is not 4-byte aligned");

/* Make sure that we can safely perform .data initialization on RV64 */
ASSERT(_sidata % 8 == _sdata % 8, "
Contributor

Is this assertion necessary for RISC-V 32 targets?

@romancardenas romancardenas linked an issue Oct 1, 2023 that may be closed by this pull request
@romancardenas
Contributor

CI fails... this RISC-V 64 optimization seems difficult to add without modifying the alignment rules.

I propose reverting the RISC-V 64 optimization and leaving the 32-bit-compliant version for now. Then, we can develop a slightly smarter build script that applies different alignment rules depending on the target pointer width and add this optimization again.

@coastalwhite
Contributor Author

Maybe, this is better.

I have now added separate linker scripts for rv32 and rv64. The only difference between them is the alignment. The build script chooses which one to use based on the target.

One added benefit of doing this is that the riscv64 assembly code can be massively simplified.
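A build script along these lines could pick the script from the target's pointer width. This is only a sketch and not the code from this PR: the file names `link-rv32.x` and `link-rv64.x` and both function names are assumptions for illustration. `CARGO_CFG_TARGET_POINTER_WIDTH` is the environment variable Cargo sets for build scripts.

```rust
use std::env;

/// Map a target pointer width to a linker script name.
/// The file names are hypothetical placeholders.
fn linker_script_for(pointer_width: &str) -> Option<&'static str> {
    match pointer_width {
        "32" => Some("link-rv32.x"),
        "64" => Some("link-rv64.x"),
        // Any other width is not a target this sketch knows about.
        _ => None,
    }
}

/// Read the width Cargo exposes to build scripts and select a script.
/// Returns `None` outside a build script, where the variable is unset.
fn linker_script_from_env() -> Option<&'static str> {
    let width = env::var("CARGO_CFG_TARGET_POINTER_WIDTH").ok()?;
    linker_script_for(&width)
}
```

A `build.rs` would then copy the selected file into `OUT_DIR` under a fixed name and emit a `cargo:rustc-link-search` line for that directory, so that the rest of the crate does not need to know which variant was chosen.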

Contributor

@romancardenas romancardenas left a comment

LGTM. Let's see if the CI passes now

@romancardenas romancardenas added this pull request to the merge queue Oct 2, 2023
Merged via the queue into rust-embedded:master with commit a4fd9fa Oct 2, 2023
67 checks passed
@coastalwhite coastalwhite deleted the r0-in-asm branch October 2, 2023 14:56

Successfully merging this pull request may close these issues.

Reimplement r0, since it is deprecated RAM init code violates pointer provenance and aliasing rules
2 participants