-
Notifications
You must be signed in to change notification settings - Fork 210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Have a Move-In-Move-Out DMA transaction API in addition (again) #1512
Comments
Could you briefly explain the lifetime issue you have run into and how the old API made it viable? |
Sure! Probably easiest to explain with some code which (hopefully) resembles the problem: use std::{marker::PhantomData, ptr::addr_of_mut};
fn main() {
let dma = Dma {
phantom: PhantomData,
};
let mut foo = Foobar {
dma,
inflight: None,
};
foo.process();
// foo.process(); // foo is still borrowed mutably
}
struct Foobar<'d, 't> {
dma: Dma<'d>,
inflight: Option<Transaction<'t>>,
}
impl<'d, 't> Foobar<'d, 't> {
fn process(&'d mut self) where 'd: 't {
if let Some(previous) = self.inflight.take() {
previous.wait();
}
let mut count = 5;
let transaction = loop {
let transaction = self.dma.transaction(dma_buffer()); // without Polonius: `self.dma` was mutably borrowed here in the previous iteration of the loop
if count > 0 {
transaction.wait();
count -= 1;
} else {
break transaction;
}
};
self.inflight.replace(transaction);
}
}
fn dma_buffer() -> &'static mut [u8; 1024] {
static mut BUFFER: [u8; 1024] = [0u8; 1024];
unsafe { &mut *addr_of_mut!(BUFFER) }
}
struct Dma<'d> {
phantom: PhantomData<&'d ()>,
}
impl<'d> Dma<'d> {
pub fn transaction<'t>(&'t mut self, _buffer: &'t mut [u8]) -> Transaction<'t> {
Transaction {
phantom: PhantomData,
}
}
}
struct Transaction<'t> {
phantom: PhantomData<&'t ()>,
}
impl<'t> Transaction<'t> {
pub fn wait(self) {}
} With the commented-out second call to The old API moved SPI and the buffers into the transaction and we can store the "inflight" transaction in a struct field to finish it in the next iteration |
I might have an alternative idea. Need to try that |
I happen to also have a use case for Move-In-Move-Out (DMA) transactions, where I want to schedule semi-continuous transfers with interrupts with minimal involvement from the application code. I'm writing a driver for the MAX3421E USB chip and with the current |
Would my proposed unsafe SPI master API help? |
It does but having to use unsafe is a bit unfortunate. |
Yes, the move-in-move-out API was my first thought (and actually how it was before) 😄 |
Sorry I haven't had much time to think about this recently. I'm not against the Move-In-Out API, but I'd like to avoid two APIs that do the same thing. I think fundamentally, what we might want is the ability to queue multiple DMA transactions. If you squint a little bit, the Move-In-Out API gives us a queue of depth 1. So I think to achieve that, we probably want to split out the "preparation" part from the actual DMA transaction. All of what I've mentioned above is very hand wavey though, I don't know much about the DMA impl and this may be impossible to achieve, or a bad idea :D. I would appreciate your inputs on this. |
Just some thoughts here and there about the topic. Apologies for the wall of text, I've been thinking about this for 2 days. I think the current ergonomic (but technically unsound) API can be built on top of a move based API, it just means we need to live with some I would really love to be able to queue multiple (DMA or not) transactions like esp-idf allows you to (but slightly better). I don't really know if all of this belongs in the hal, with a move based API, users could just DIY it on top of the driver. Or perhaps it's useful enough for everyone to keep in here, idk. Can decide this later I suppose.
It's important that one is able to queue up more transactions whilst there are already transactions in flight. Splitting out the "preparation" part from the transaction itself would mean you can queue multiple transactions at once but once they start you have to wait for them to finish to queue some more, killing throughput. To achieve this, I think we just need a
If ESP-IDF can do it, then esp-hal can do it too. 😛 A more interesting implementation could also take advantage of the segmented transfer feature that the ESP32S3 has (Idk about other chips). It basically let's you queue up multiple transactions to different devices in one go. It appears to use GDMA to configure the SPI registers. |
Thanks for sharing your thoughts - I think segmented transfers are definitely something we need to explore. I avoided it before because of the added complexity but definitely has some advantages
I also thought about that - I was a bit concerned about the need for Just to explain the original motivation for this issue a little bit: That linked repository in the first comment basically starts a DMA transfer to push pixels to a display and returns so that the user code can prepare the next frame. (It does that in chunks to avoid needing a 300x240x2 bytes buffer which wouldn't be possible to transfer in one go anyways - but that's not important here). When the next frame is ready to push to the display, we need to make sure the previous transfer is done before starting the next one. With the old API it was possible to store the Transaction as a struct member and wait for its completion when the next frame is ready - getting back the SPI instance (and the buffers) to push new pixels to the display. With the unsafe API doing all this is very easy and convenient but unsafe With the current API it's impossible unfortunately A queue might help but I agree with:
I guess for that we'd need some Currently the DMA transactions API is quite simple and duplicated for each peripheral (because it's almost the same but a little bit different). Creating something more advanced ideally wouldn't be duplicated code for each driver (and it would be great to get rid of the duplicated code we already have) |
Yeah this has always bugged me about the DMA code in each driver, it always feels like there's room for dedup but not an obvious way of how to.
Yeah probably. It'd work for SPI at least, that has the concept of transactions. It may be tricky to fill up the DMA buffer without accidentally feeding data into the previous transactions? I guess this would need separate descriptor lists. Or perhaps users can just provide a
Well if you're not unhappy with the idea I'd be happy to draft something. Though my initial use case if for SPI without DMA, which at the moment isn't async (for understandable reasons). My transactions never go above 64 bytes. |
I've come back to wanting this again and have some more concrete ideas after reading the TRM (of the ESP32-S3). To layout some hardware facts, feel free to skip if you already know how the hardware works.
Having laid that out, my ideas build on top of @MabezDev's idea about splitting the preparation from the transaction. The first idea I have is basically the Move API but with an important change; The driver no longer takes ownership of the DMA descriptors. To illustrate with pseudo code, the current API looks like this: struct SpiDriver {
gdma: Channel, // with descriptors
// other things....
}
impl SpiDriver {
pub fn write(&mut self, buffer: &[u8]) -> DmaTransfer;
} I want to change it to something like this: struct SpiDriver {
gdma: Channel, // without descriptors
// other things....
}
pub struct DmaTxBuf(buffer: &[u8], descriptors: &[DmaDescriptor]);
impl SpiDriver {
pub fn write(self, buffer: DmaTxBuf) -> DmaTransfer;
} With this change, users can prepare several This is quite a big change, especially for the DMA driver which now needs to be descriptor free..... Having said all that, I think it's best that any queuing should be left to the user if possible and the hal should just make it safe to write queuing implementation outside of it. The main reason is some folks may want to perform some actions between each transaction, like toggling an out of band gpio pin (think of mipidsi D/C pins) or software CS pin. Also locking implementations would depend on what the user is doing, so best to leave them to their devices haha. The second idea is to make the FIFO a separate object to the driver. let (mut spi, mut fifo) = Spi::new(peripherals.SPI3, .....);
fifo.fill(&[0x01, 0x02, .....]); // Assert buf < 64 (FIFO size)
spi.write(&mut fifo); I have three reasons for this. The first being that users will now be able fill the FIFO whilst a DMA transfer is happening. This works well for devices like SD cards, where you have one big DMA transfer to write some blocks to the device, then a small transfer to simply check the status of the previous transfer. The second being that one can now "split" the FIFO in two (see hardware fact 7), then use one for preparation and the other for the active transfer. This allows for a safe queuing implementation based on the FIFO. Picture something like this: let (mut spi, fifo) = Spi::new(peripherals.SPI3, .....);
let (mut fifo_first_half, mut fifo_last_half) = fifo.split();
fifo_first_half.fill(&[0x01, 0x02, .....]); // Assert buf < 32 (Half of FIFO size)
spi.write(&mut fifo_first_half); // Can also take ownership of the FIFO for async transfers.
fifo_second_half.fill(&[0x05, 0x0A, .....]); // Assert buf < 32 (Half of FIFO size)
spi.write(&mut fifo_second_half); // Can also take ownership of the FIFO for async transfers.
let fifo = fifo_first_half.join(fifo_second_half); // Can undo the split for bigger transfers. The third being that users will be able to take advantage of hardware fact 2, use FIFO for write and use DMA for read, or vice versa. Another nice side effect of this is, in a full duplex transfer, one can use one half of the FIFO for writing and one half for reading. This is useful for polling, where you're basically writing the same command over and over again until you get a certain result. Of course, there will need to be some kind of type safety to make sure you cannot use SPI2's FIFO with SPI3 or vice versa. Now to address the aftermath of these changes. The base SPI driver will no longer be able to implement let (spi, fifo) = Spi::new(peripherals.SPI3, .....);
// e-hal implementation that does memcpy's to the FIFO like the current one does
let mut driver = SpiBusDriver::new(spi, fifo);
driver.write(&[0, 0, 0, 0]);
driver.transfer_in_place(&mut [0, 0, 0, 0]); or in the DMA case. let (spi, _) = Spi::new(peripherals.SPI3, .....);
let spi_dma = spi.with_dma(dma_channel);
// e-hal implementation that does memcpy's to the DMA buf like FlashSafeDma does.
let mut driver = SpiBusDmaDriver::new(spi_dma, dma_tx_buf, dma_rx_buf); This could probably have some helper functions to make construction shorter. For a future improvement on this, the Once all this is done, I also have semi concrete ideas for configurable segmented transfers that go on top of this
Thoughts? 🙂 I quite like how this came out. |
I like this idea, a lot of the operations we do with (especially interrupt-less) SPI devices is polling which can be made much less costly with this. |
I really like the first proposal, it's exactly what I had in mind! One thing to think about is if we can still have the "easy" way to use the API by default, IMO it's not a dealbreaker if we can't but it would be nice if a user just wants to do one transaction at a time we make it easy as it is now, making the more complex queuing as an opt-in API. Either way, I'd really like to see something like this implemented, I think it's a great proposal.
I totally agree with this too. In the future we could provide some opt-in wrapper type that could handle the queuing, but I would say for now it's quite an advanced API and it should be used manually when the performance is needed.
This part I'm not so sure of yet. I like the idea in principle, but in reality, when would you use this? You mentioned SD Cards over SPI, but these are inherently quite slow, if you want performance you would use SDIO. I'm not sure adding this complexity is worth it. Especially since it changes the most common use of SPI with the embedded-hal traits. We could flip the functionality of your proposed PR on its head, where by default its a e-hal compatible SPI device, but then we could be split into the two parts; again though I'm not sure its worth the effort unless I'm missing something. (also sorry again for not getting to this sooner! I really appreciate the effort required to come up with these proposals :).) |
I agree with @MabezDev (and also sorry I was scared away by the wall of text initially)
I might have an idea which sounds great in my head but maybe isn't in code - that would be possible to do in a follow-up PR (I'd volunteer for that), i.e. I guess it's fine to start with the move approach to not have that initial PR even more complex than it will for sure be
I can support in testing those. And at least we now have a test for I2S (breaking I2S was the main reason I often scared away from touching DMA code before 😆) |
I'm glad you both like the proposal!
It's precisely because they're slow that you'd want to do this. If the SPI clock rate is low, then pulling out DMA and faffing around with I've personally started with a queueing implementation in my project based on
I did consider this initially but it gets in the way of a future proposal of mine which was to expose app interrupts. The SPI peripheral (on the ESP32-S3 at least) has "app interrupts" which pretty do the same thing as Oh actually! Maybe two separate c'tors make sense here. The existing
Fantastic! |
This would be great, since the SPI already has a FIFO of 64 bytes and it would be similar to the "interrupt-based" transactions in the official docs. For example, my application mainly uses the SPI to read IMU packets (20-byte), at 20MHz that would be ~2000 cpu cycles wasted. Setting up DMA for this application is also quite wasteful. |
Turns out this doesn't work well. Particularly because of the In addition to the "app interrupts", I was also hoping (in a future PR) to return some kind of "interrupt as a resource" thing from let (spi, buf, app_intr, intr_handler) = Spi::new(....);
let async_bus = AsyncBusDriver::new(spi, buf, intr_handler); or unsafe fn spi_handler() {}
let (spi, buf, app_intr, intr_handler) = Spi::new(....);
intr_handler.set(spi_handler); |
In #1238 using DMA was made easier but it turned out that the new API doesn't support all use-cases supported by the previous API.
Especially something like https://github.com/georgik/esp-display-interface-spi-dma isn't possible this way.
(Using Polonius partially helps a bit but still doesn't fully solve the issue)
The text was updated successfully, but these errors were encountered: