Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How can I tell the Rust compiler &mut [u8] has changed after a DMA operation #537

Open
schultetwin1 opened this issue Oct 7, 2024 · 41 comments

Comments

@schultetwin1
Copy link

Hi all!

I'm trying to force the Rust compiler to assume the contents of a &mut [u8] has changed after a DMA operation.

In the std library, calls to things like std::fs::File::read take a &mut [u8] and after the call, the contents of that slice have changed. The Rust compiler knows it can not assume the contents of that buffer after the call to read.

I need to do something similar with DMA. After a DMA operation has been completed (and hardware has written bytes to memory), the Rust compiler must assume the buf has changed.

My current solution (below) is to use asm!. Will that do the trick? Is there a more portable way?

Something like:

fn dma_read(&mut self, buf: &mut [u8]) {
    // Zero out buffer
    buf.fill(0);

    let addr = buf.as_mut_ptr() as usize;
    let len = buf.len();

    // Setup DMA via MMIO
    self.reg.dma_addr.set(addr);
    self.reg.dma_len.set(len);

    // Start DMA via MMIO
    self.reg.dma_start.set(1);

    // Wait for DMA to complete via MMIO
    while self.reg.dma_done.get() != 1 {}

    // Insert an `asm!` block to force Rust to assume `buf` has been modified
    unsafe { core::arch::asm!("mov {},{}", out(reg) _, in(reg) addr) };

    // Read "buf"
    // At this point, how can I ensure the Rust compiler will assume buf has been modified?
}
@Diggsey
Copy link

Diggsey commented Oct 7, 2024

Assuming self.reg.dma_addr.set does some FFI or ASM internally that the compiler can't see into, then I wouldn't expect anything else to be necessary, since it's equivalent to passing addr to a C function that writes into it.

There are a few extra complexities which I am not completely sure of:

  1. Strict provenance. With strict provenance you can't round-trip pointers through non-pointer-types (ie. the cast to as usize is invalid. You should probably use https://doc.rust-lang.org/std/primitive.pointer.html#method.expose_provenance when stable instead of as usize so that it's clear that the code is not compatible with strict provenance. To solve this, the DMA library you are using would need to be rewritten to accept pointers.
  2. Synchronization. Since the DMA happens at some unspecified time rather than on the self.reg.dma_addr.set(addr) line, there needs to be some kind of fence to prevent the compiler moving any reads from buf above the while self.reg.dma_done.get() != 1 {} loop. However, I would expect that to be part of the guarantees of the .get() function (ie. it's at least as strong as an acquire load).

Why do you think the asm block is necessary? Did you run into a problem without it?

@chorman0773
Copy link
Contributor

I'd agree with the above with the caveat that strict provenance is not a ruleset enforced on rust code by the language, only a coding standard. Though I agree that using expose_provenance to communicate intent is a better option that as casts willy nilly.

@korran
Copy link

korran commented Oct 8, 2024

Assuming self.reg.dma_addr.set does some FFI or ASM internally that the compiler can't see into, then I wouldn't expect anything else to be necessary, since it's equivalent to passing addr to a C function that writes into it.

self.reg.dma_addr.set() is likely just a thin wrapper around std::ptr::write_volatile(), which the compiler can see into.

@schultetwin1
Copy link
Author

Assuming self.reg.dma_addr.set does some FFI or ASM internally that the compiler can't see into, then I wouldn't expect anything else to be necessary, since it's equivalent to passing addr to a C function that writes into it.

As @korran mentioned, the self.reg.dma_addr.set (and all self.reg writes and reads) are thin wrappers around core::ptr::write_volatile() or core::ptr::read_volatile().

There are a few extra complexities which I am not completely sure of:

  1. Strict provenance. With strict provenance you can't round-trip pointers through non-pointer-types...

I am aware of this and very much looking forward to when strict_provenace is stable! For now, I think as usize is the best option I have in stable Rust (please correct me if I'm wrong).

  1. Synchronization. Since the DMA happens at some unspecified time rather than on the self.reg.dma_addr.set(addr) line, there needs to be some kind

Yes, I'm aware of this as well, but was omitted it for code brevity sake. That being said, there is another issue (#347) that talks about how compiler fences may not work in a case like this.

Why do you think the asm block is necessary? Did you run into a problem without it?

Yes, I was running into issues (I did not complete debug the disassembly) where Rust would assume the buf was still filled with zeros after while self.reg.dma_done.get() != 1 {}.

@Lokathor
Copy link
Contributor

Lokathor commented Oct 8, 2024

volatile isn't allowed to access the memory of any address other that the one being accessed, so the volatile read checking for the dma complete isn't allowed to manipulate any address other than the pin check address with each read.

You should probably pass a pointer to the buffer into asm after the DMA, and that should make llvm assume the memory is now changed. The actual asm doesn't need to do anything at all, it can just be a comment. llvm isn't allowed to check what you wrote in your asm block so it won't know.

@RalfJung
Copy link
Member

RalfJung commented Oct 8, 2024

I'm trying to force the Rust compiler to assume the contents of a &mut [u8] has changed after a DMA operation.

The memory a mutable reference points to must not be aliased, which means it must not be read or written by anything else. This includes DMA operations.

So, your code quite simply has UB, and the fix is to use raw pointers or shared references with UnsafeCell. DMA memory is inherently shared mutable memory, which cannot be accessed using Rust references. Remember, Rust references enforce a model of "aliasing XOR mutability", you can't have both (with the exception of interior mutability).

@schultetwin1
Copy link
Author

Thanks @RalfJung!

The memory a mutable reference points to must not be aliased, which means it must not be read or written by anything else

That makes sense, but in that case, why does std::fs::File::read not need to use an UnsafeCell? In the case of std::fs::File::read, the memory points to a buffer which is filled by a syscall instead of a DMA operation but it seems like both are outside of Rust's memory model.

@RalfJung
Copy link
Member

RalfJung commented Oct 8, 2024

When you give an &mut reference to a function, the function may write that reference. This is possible even in safe code, so naturally it must be allowed. But the part where the reference is explicitly given to that function is crucial -- the function writes to that memory through a pointer that is derived from your reference, and that's what makes this legal.

@Lokathor
Copy link
Contributor

Lokathor commented Oct 8, 2024

But the DMA is also writing to a pointer derived from the mutable reference

@RalfJung
Copy link
Member

RalfJung commented Oct 8, 2024

Oh I see... hm well, not really, since the provenance chain is broken. "derived from" refers to provenance. Once there is an integer, it's not "derived from" any more.

But the provenance is getting exposed. So what's missing is the right kind of fence, right? If set/get were relaxed atomic accesses, then what we'd have to add is a release fence before the set(addr), and an acquire fence after the while.

So IMO the proper fix is, yet again, to make (sufficiently small) volatile accesses relaxed atomic, and tell people to add these fences. But that's not something you can do in your code. Absent that, there's probably hacks involving inline asm as has been suggested above...

@Lokathor
Copy link
Contributor

Lokathor commented Oct 8, 2024

Personally, I'd normally tell people to do the entire DMA sequence in asm when possible. Just pass the pointer to the buffer to asm, then set all the registers and spin loop within that asm. Don't move control flow back to Rust until the DMA is over, if possible. Then the asm block as a whole becomes "a slightly weird memcpy", which is easy to reason about. A pointer goes in, and when the asm is done there's new data in the pointed-to area, we're all used to that.

This specific example usage code does seem to allow for that, so that's what I'd suggest here.

@Diggsey
Copy link

Diggsey commented Oct 8, 2024

That being said, there is another issue (#347) that talks about how compiler fences may not work in a case like this.

That issue is about compiler fences, which are not relevant here. You need an atomic fence or equivalent to synchronize with the DMA operation since it's effectively another thread.

Yes, I was running into issues (I did not complete debug the disassembly) where Rust would assume the buf was still filled with zeros after while self.reg.dma_done.get() != 1 {}.

It sounds like the issues you are seeing are from this lack of synchronization, rather than anything to do with the mutable reference.

What you need as others have mentioned is a volatile and atomic Acquire read, but Rust doesn't provide primitives for this, so you need to do this in ASM.

@VorpalBlade
Copy link

Personally, I'd normally tell people to do the entire DMA sequence in asm when possible. Just pass the pointer to the buffer to asm, then set all the registers and spin loop within that asm. Don't move control flow back to Rust until the DMA is over, if possible. Then the asm block as a whole becomes "a slightly weird memcpy", which is easy to reason about. A pointer goes in, and when the asm is done there's new data in the pointed-to area, we're all used to that.

This specific example usage code does seem to allow for that, so that's what I'd suggest here.

That approach is not a general solution. You often want to do other things while DMA is running in the background (otherwise, why off-load it, why not just bit bang?). In particular it is not compatible with an async executor such as embassy.

@RalfJung
Copy link
Member

RalfJung commented Oct 8, 2024

It should be sufficient to use one inline asm block that logically does "release fence and then write to DMA address", and another inline asm block that does "wait until DMA is done and then acquire" (or "test if DMA is done, and acquire if yes"). Then you can have normal Rust code in between.

@kupiakos
Copy link

kupiakos commented Oct 8, 2024

the proper fix is, yet again, to make (sufficiently small) volatile accesses relaxed atomic, and tell people to add these fences

There's no way that can land without major upheaval of existing embedded codebases. Many are on platforms that don't define atomic operations, and even if they did, they could not afford that code size cost. There must be a solution that works on systems that don't define any atomic intrinsics and are consistently single-threaded. Perhaps this means two sets of volatile operations, one that works as it does today and another that interacts with atomic fences?

@RalfJung
Copy link
Member

RalfJung commented Oct 8, 2024

@kupiakos on single-core platforms, a (relaxed) atomic access can just be compiled to a regular boring load or store. In fact that's true on multi-core platforms as well (at least under the current C++ memory model). So I don't know why you think this is a problem?

@kupiakos
Copy link

kupiakos commented Oct 8, 2024

I don't know why you think this is a problem?

It's a problem if a volatile access tries to generate atomic intrinsics and they're not defined for the platform, causing a compile failure for Tier-3 targets. If it can properly identify when atomics aren't available and explicitly avoid using them, then that shouldn't affect us.

Then, my concern is with the rather unexpected automatic conflation of atomics (concurrency) and volatililty (ensuring order of operations and preventing elision). If Rust is going to extend a raw pointer operation to do more than what the equivalent C would do, we should provide a fallback for the C-like behavior.

@Lokathor
Copy link
Contributor

Lokathor commented Oct 8, 2024

The last time I tried to use atomics on a no-atomic target, just after that was introduced to the compiler, there was an unfortunate amount of code generated. The situation might have improved since then, but an inspection of what the compiler actually does on several common older targets is definitely worth doing.

@RalfJung
Copy link
Member

RalfJung commented Oct 8, 2024

Something seems seriously wrong if a relaxed atomic access needs intrinsics of any sort. On which architecture is this more complicated than a single load/store instruction?

But anyway there's a cop-out here, which is that I said "sufficiently small" volatile accesses must be atomic. The threshold for "sufficient" will be platform-dependent, and some platforms can define it in such a way that no volatile accesses at all are atomic.

Then, my concern is with the rather unexpected automatic conflation of atomics (concurrency) and volatililty (ensuring order of operations and preventing elision)

These things are closely related though. Volatile is meant for using the memory to communicate with another component, i.e., it only ever makes sense to use volatile on memory that is in some sense "shared" with something else. That something else runs concurrently with the current thread. And atomics are all about properly modeling concurrent memory accesses.

In particular, atomics are also all about ensuring a certain order of operations -- the (in)famous "happens-before" order. And it's exactly that kind of ordering which we currently cannot express with volatile accesses, as the example in the OP shows. Atomics have all the vocabulary we need to fill this gap, it would be pointless to try to reinvent this.

In my view, a non-atomic volatile access makes very little sense. (Note that I mean "atomic" here in the sense of "participating in the concurrency memory model". Volatiles accesses can of course in general tear, so they are closer to "atomic bytewise memcpy". But actually of course one often does not want tearing for volatile accesses, and yet we do not currently make any guarantee in that direction -- yet another reason to make sufficiently small volatile accesses atomic.)

@korran
Copy link

korran commented Oct 8, 2024

In my view, a non-atomic volatile access makes very little sense.

The vast majority of MMIO register accesses do not affect the contents of memory. Registers that control and monitor DMA actions are the exception. If atomic volatile accesses induce the compiler optimizer to be more conservative with reloading CPU registers from RAM, many embedded developers will complain VERY loudly about the additional code bloat; we often jump through many hoops just to save 100 bytes of instruction memory. We only want to pay the cost of atomic-volatile accesses when working with DMA MMIO-registers.

@RalfJung
Copy link
Member

RalfJung commented Oct 8, 2024

The vast majority of MMIO register accesses do not affect the contents of memory.

They are still communicating with some concurrently running system. It doesn't matter whether there are any sort of persistent "contents" there.

If atomic volatile accesses induce the compiler optimizer to be more conservative with reloading registers from RAM

Volatile accesses already need to be reloaded from RAM each time, that's what makes them volatile...

I'm afraid you are going to have to spell out your concern in more details. Please skip the rhetoric and give us some technical details. :) I have now said multiple times that a relaxed atomic access has exactly the same cost (in the final assembly) as a non-atomic access, the difference is entirely in the memory model. (The "relaxed" part is important!) Until you provide any evidence to the contrary I will claim that the effects you are describing are therefore impossible. (I'm totally willing to believe that there's some potential for trouble here, but you will need to get a bit more specific.)

@korran
Copy link

korran commented Oct 8, 2024

They are still communicating with some concurrently running system. It doesn't matter whether there are any sort of persistent "contents" there.

Sure, but these non-DMA-related MMIO accesses won't affect (for example) the contents of the stack, unlike a DMA operation.

I have now said multiple times that a relaxed atomic access has exactly the same cost (in the final assembly) as a non-atomic access,

If this is true globally, then I don't have a concern. I'm sure the load/store itself is lightweight; it's how the atomic access influences the compiler optimizer's decision whether to reload (for example) unrelated local variables from the stack or re-use the ones in the registers that concerns me.

I think the only way to ease my fears is to run an experiment and see how making every volatile access atomic influences the code-size of representative firmware.

@Diggsey
Copy link

Diggsey commented Oct 8, 2024

In particular, atomics are also all about ensuring a certain order of operations -- the (in)famous "happens-before" order. And it's exactly that kind of ordering which we currently cannot express with volatile accesses, as the example in the OP shows. Atomics have all the vocabulary we need to fill this gap, it would be pointless to try to reinvent this.

I don't like this explanation because volatile is clearly also about ordering.

  • Volatile operations have a total order amongst themselves, but no order exists between volatile and non-volatile acceses. They are literally IO / a side effect of the program, which the compiler is required to preserve regardless of what other transformations it does. The fact that this IO involves memory is basically a distraction. A platform will define what kinds of volatile accesses are allowed to tear.

  • Atomic operations allow expressing various ordering constraints between the atomic access and other non-atomic accesses. They also guarantee that no tearing occurs.

This implies that you can use volatile operations for communication between threads, it's just very limited because you can never share any information via non-volatile accesses, and the resulting program will not be portable because of the tearing guarantees. Hence why you almost always want atomics.

I think it's an interesting question as to whether we need all four combinations of atomic/non-atomic and volatile/non-volatile accesses, or whether we can make volatile imply atomic to some degree. My view is that while you probably could add some atomicity guarantees to volatile accesses, it is simpler to understand if they are separate.

Also:

"sufficiently small" volatile accesses must be atomic.

What memory ordering would these volatile accesses have? The DMA case requires at least "Acquire" level on the final volatile read.

@RalfJung
Copy link
Member

RalfJung commented Oct 11, 2024

@korran

Sure, but these non-DMA-related MMIO accesses won't affect (for example) the contents of the stack, unlike a DMA operation.
[...]
it's how the atomic access influences the compiler optimizer's decision whether to reload (for example) unrelated local variables from the stack or re-use the ones in the registers that concerns me.

DMA operations coordinated with volatile operations may only read/write the stack when there are fences to induce synchronization. Without fences, optimizations should be largely unaffected. But that is indeed hard to say without actually trying it.

@Diggsey

I don't like this explanation because volatile is clearly also about ordering.

I would say they are primarily about being an observable effect. That indeed implies that the outside world (e.g. the hardware on the other end of that MMIO register) can observe the order of two accesses, but we don't currently let the program itself observe that order in any way.

This is a quite surprising state of affairs that many people are confused by. That's why I think everything gets less confusing if we make volatile accesses be at least "bytewise atomic memcpy" (with relaxed ordering). Additionally we need platform-specific tearing guarantees, where non-tearing accesses are then entirely equivalent to relaxed atomic accesses.

We don't yet have atomic bytewise memcpy though, so today the best we can do is to make small accesses truly relaxed atomic, and larger accesses stay non-atomic.

This implies that you can use volatile operations for communication between threads,

Not really, since we say racy volatile accesses are UB.

What memory ordering would these volatile accesses have? The DMA case requires at least "Acquire" level on the final volatile read.

They are relaxed. The DMA case then needs an acquire fence after the loop, so it's fine for the accesses to be relaxed.

@Diggsey
Copy link

Diggsey commented Oct 11, 2024

Not really, since we say racy volatile accesses are UB.

I agree this is non-sensical:

When using volatile to access hardware, accesses are always racy. And the program can't possibly distinguish a volatile access to hardware from a volatile access to another thread in the program. In fact a common pattern that absolutely must work would be setting a value with a volatile write, and then waiting for it to be cleared by hardware.

I believe it's impossible for a compiler to both faithfully treat volatile accesses as side-effectful, and also use knowledge that a racy volatile access cannot occur in order to optimize a program in any way.

So I agree that it makes sense for volatile accesses to be similar to an "elementwise atomic access" (where element size is platform dependent).

However, I don't necessarily think that precludes the idea of operations that are both atomic and volatile, where ordering can be specified, etc.

@RalfJung
Copy link
Member

So I agree that it makes sense for volatile accesses to be similar to an "elementwise atomic access" (where element size is platform dependent).

That means we need to say which ordering they have (to make them fit into the concurrency model), and "relaxed" is the only reasonable choice IMO.

However, I don't necessarily think that precludes the idea of operations that are both atomic and volatile, where ordering can be specified, etc.

Oh sure, we could have more APIs where one can do volatile acquire operations or something like that. But I think there's very little need for that -- release/acquire operations can be very reasonably approximated with relaxed operations + fences, which only leaves volatile SeqCst operations missing and that just seems truly cursed. ;)

Anyway, I'm not saying we shouldn't do that. All I am saying is we should fix our existing read_volatile/write_volatile operations to make more sense, by making them relaxed atomic.

@VorpalBlade
Copy link

This is a quite surprising state of affairs that many people are confused by. That's why I think everything gets less confusing if we make volatile accesses be at least "bytewise atomic memcpy" (with relaxed ordering). Additionally we need platform-specific tearing guarantees, where non-tearing accesses are then entirely equivalent to relaxed atomic accesses.

Hm, is there any weird thing that people want to do that would be incompatible with that? Or any architecture where the actual ISA isn't relaxed by default for loads/stores?

So, I may be way off here, but how would that work with memory mapped write combining memory? For example GPU memory is generally mapped as write combining / non-temporal (in APIs such as Vulkan). Even on x86, which is... interesting.

@RalfJung
Copy link
Member

RalfJung commented Oct 11, 2024

how would that work with memory mapped write combining memory?

There has to still be some heavy platform-dependent aspect to volatile reads, which defines what exactly is being "acquired" by a read_volatile(); fence(Acquire); sequence. If this isn't actual memory and you are just asking some device to for a value, the answer likely is "nothing". I assume the same is true for volatile reads from write-combining memory. (Non-volatile reads/writes to such memory are just UB.)

This is not surprising: in formal models of release/acquire operations, we think of memory as recording all past stores that ever happened. A read operation picks some previous store whose value it will return (subject to a lower bound: you are guaranteed to not see a store that's "too old", which models the concept of which things you have "synchronized with"), and that store has attached metadata about which things can be "acquired" through it. For volatile reads, that store was done by the hardware (or whatever), so it is entirely up to the hardware to define what the things "acquired" through this store are.

@RalfJung
Copy link
Member

RalfJung commented Oct 11, 2024

I guess a different approach would be to not use volatile at all for things like the DMA case described above, and just use atomics. This requires modeling the hardware as "another thread in the AM" and pretending we're sharing memory with that thread. Whether that is compatible with compiler optimizations depends on how one obtains the address of this shared memory -- but it's almost certainly already some construct that the compiler cannot trace back to its origins, and therefore the compiler already has to assume that it is accessible to some other thread in the AM and can't just optimize it all away.

@chorman0773
Copy link
Contributor

chorman0773 commented Oct 11, 2024

Actually, yeah, I don't think even "per-byte atomic" is upheld entirely by any normal access to specifically WT/UC memory on x86 at all (fully atomic definitely isn't for sure). WC memory also comes with the caveat of needing a machine fence to actually do an acquire operation. The reason I don't think its even per-byte atomic is because I actually don't believe WT memory is fully coherent.

WC and WT memory types are incredibly common for MMIO on x86, as is UC typing (which makes reads incoherent arround writes on other cpus, unless you use hardware fences). You thus can't do language level atomic operations on MMIO memory often, so demanding volatile be effectively an atomic operation isn't going to fly.

@Diggsey
Copy link

Diggsey commented Oct 11, 2024

demanding volatile be effectively an atomic operation isn't going to fly.

The problem is that only atomic operations are allowed to race, or else you get UB. It seems like atomic is too strong but volatile is currently too weak.

At the very least, if a program only accesses some memory location via volatile accesses, that should not be UB, even if there is no synchronisation.

@Lokathor
Copy link
Contributor

In the llvm memory model for concurrent operations, they seem to list volatile as taking priority over the rest of the access stuff. Is it only rust that's calling it a data race? can we change rust's rules without having to even poke llvm about all this?

@chorman0773
Copy link
Contributor

At the very least, if a program only accesses some memory location via volatile accesses, that should not be UB, even if there is no synchronisation.

The catch, though, is DMA here, since you can't link a fence to a volatile operation. I think the only real way is to inline-asm (or volatile atomics, provided the memory type co-operates).

@RalfJung
Copy link
Member

RalfJung commented Oct 11, 2024

Actually, yeah, I don't think even "per-byte atomic" is upheld entirely

In which sense is it not upheld? volatile doesn't even have to be actual memory where writes and reads "see" the same thing. Flagging them as atomic doesn't change that, it just gives volatile accesses the option to also interact with the concurrency memory model.

Saying that volatile accesses are atomic is not intended to require anything new from targets. A volatile read is still basically an FFI call. However, currently the "FFI call" is severely limited in which state it is allowed to depend on or mutate, and in particular it cannot mutate the acquire-fence clock that determines which happens-before relationships are established by the next acquire fence. By making the operation atomic, we are giving this "FFI call" a few more options, so that it can update the acquire-fence clock. (And correspondingly, the "FFI call" that happens on a volatile write must be permitted to depend on the current thread's release-fence-clock, which it currently is not.)

(These clocks are pieces of per-thread state that are completely implicit in the typical graph-based "axiomatic" memory model, but made explicit when translating them into a more operational form, such as the "promising semantics" model. This explains e.g. why a nomem asm block is not allowed to contain a fence, or at least the fence does not actually have its intended effect -- such an asm block is not allowed to read or write these clocks.)

In the llvm memory model for concurrent operations, they seem to list volatile as taking priority over the rest of the access stuff

It says the result is "target-dependent", which is pretty useless. Where does it spell out how the targets define it? Maybe it's UB.

It also says "It does not generally provide cross-thread synchronization", whatever that means -- but it does sound like these are not actually atomic operations, e.g. "volatile read; acquire fence" doesn't work as intended for the DMA example above.

Generally the LLVM memory model is fairly poorly documented, has no formal model I am aware of, and does some rather unconventional things. I don't think it is suitable for direct use by Rust at this point.

@scottmcm
Copy link
Member

I guess a different approach would be to not use volatile at all for things like the DMA case described above, and just use atomics. This requires modeling the hardware as "another thread in the AM" and pretending we're sharing memory with that thread.

I'm a big fan of this direction.

To me, the point of volatile is for things where the act of reading/writing does something, and thus writing the same value twice in a row right next to each other (or reading twice right in a row) can't be optimized in the obvious way. Anything that doesn't need that shouldn't be volatile, IMHO. (Much as various people love to abuse volatile for random stuff in C.)

For something like a vram buffer that does work like memory, and thus coalescing reads/writes like that is completely reasonable, treating it the same way we do something like shared memory between user-mode processes (or even just shared memory in the same process) and using atomics feels like a better fit anyway.

@chorman0773
Copy link
Contributor

In which sense is it not upheld? volatile doesn't even have to be actual memory where writes and reads "see" the same thing. Flagging them as atomic doesn't change that, it just gives volatile accesses the option to also interact with the concurrency memory model.

They may not link up with fences properly or at all, and they aren't necessarily coherent.

@RalfJung
Copy link
Member

@chorman0773

They may not link up with fences properly or at all, and they aren't necessarily coherent.

Yeah that's fine. They already don't satisfy tons of property of regular memory. The key point is to give them the option to interact with the concurrency model -- which, in my understanding, requires marking them as relaxed atomic in LLVM.

@scottmcm

To me, the point of volatile is for things where the act of reading/writing does something, and thus writing the same value twice in a row right next to each other (or reading twice right in a row) can't be optimized in the obvious way. Anything that doesn't need that shouldn't be volatile, IMHO. (Much as various people love to abuse volatile for random stuff in C.)

But doesn't that apply to the DMA example? self.reg.dma_start.set(1) is a magic access that does something, namely inform the other side of this exchange to go start the DMA. The other side literally knows that this write happened, not because it changed the value in memory, but because the write itself is a thing it can see.

@chorman0773
Copy link
Contributor

Yeah that's fine. They already don't satisfy tons of property of regular memory. The key point is to give them the option to interact with the concurrency model -- which, in my understanding, requires marking them as relaxed atomic in LLVM.

I'm not sure this reasoning holds. "They don't interact with atomic rules because they don't interact with atomic rules" is an interesting tautology, but I don't think it holds. One property lost by WT memory is Write-write coherence, which is observable here, because it's the order that the writes occur. I don't think we can make them atomic without that property existing.

@RalfJung
Copy link
Member

I don't think we can make them atomic without that property existing.

I think we can. You keep ignoring my arguments. LLVM can already not assume that volatile accesses behave like normal memory accesses. Nothing about that changes just because we add an atomic flag to them.

Please show concrete evidence of any issue. I just don't buy your hypotheticals.

@chorman0773
Copy link
Contributor

chorman0773 commented Oct 11, 2024

My assumption is that if we provide atomicy coherence, a user could rely on the order of writes between threads, because those writes are observable behaviour. This is the not the case with WT memory, even with a clear inter-thread-happens before.

What the proposal is "Yeah, these operations are atomic, but you the user can't actually rely on any atomicy properties. They just, are because we say so."

At least with a proper atomic (or atomic-volatile), we could say "Yeah, it's UB on memory types that don't support atomic operations", but saying that for all volatile operations is a non-starter.

@RalfJung
Copy link
Member

RalfJung commented Oct 11, 2024

That seems far-fetched to me. Users already cannot rely on volatile reads/writes behaving like normal non-atomic reads/writes since there might not even be any memory there. Why would they suddenly forget about that lesson when we let volatile accesses have atomicity effects?

Users that work on non-standard memory are hardly going to be surprised that they get non-standard semantics. They will be happy that we give them any way at all to use that memory!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants