Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

axi-spi-engine driver watchdog timer still calls back after being canceled #2298

Open
dlech opened this issue Oct 20, 2023 · 3 comments
Open

Comments

@dlech
Copy link
Collaborator

dlech commented Oct 20, 2023

          Unrelated, but I noticed another odd thing when everything is working properly (no timeout): the watchdog timer doesn't seem to be canceled on successful completion and still fires 5 seconds afterwards.
        iio_attr-1170    [000] d.h2.   129.119232: spi_engine_read_rx_fifo <-spi_engine_irq
        iio_attr-1170    [000] d.h2.   129.119233: spi_engine_read_buff <-spi_engine_read_rx_fifo
        iio_attr-1170    [000] d.h2.   129.119235: spi_engine_rx_next <-spi_engine_read_rx_fifo
        iio_attr-1170    [000] d.h2.   129.119236: spi_engine_xfer_next <-spi_engine_rx_next
        iio_attr-1170    [000] d.h2.   129.119238: spi_engine_complete_message <-spi_engine_irq
          <idle>-0       [000] ..s2.   134.152771: spi_engine_timeout <-call_timer_fn

del_timer() is called immediately before spi_engine_complete_message() so I would expect it to be canceled and not call the callback.

Originally posted by @dlech in #2287 (comment)

@nunojsa
Copy link
Collaborator

nunojsa commented Jan 24, 2024

Sooo, I had to deal with this thing and saw the same behavior. As it turns out the problem is not as crazy or in core code as we were thinking... Note the below log (additional printk placed by me):

[  110.207544] Transfer one spi message... c1c9bda8
[  110.207578] Del watchdog timer (0)
[  110.207585] complete spi message c1c9bda8
[  110.207620] Modify Timer base...
root@analog:~# dmesg -c
[  115.280465] SPI ENGINE TIMEOUT

From the above it's clear that we're being faster in processing the spi transfer than calling mod_timer(). That's also visible by the return code from del_timer() which is 0 - > deactivated timer. So, if we move the mod_timer() call before activating the interrupts, we get:

[  382.201474] Transfer one spi message... c6919da8
[  382.201503] Modify Timer base...
[  382.201520] Del watchdog timer (1)
[  382.201532] complete spi message c6919da8

and no spurious soft interrupt. This was actually causing a kernel panic in a usecase where we play with overlays and a loop with load + unloading the overlay would eventually panic as we would end up (at some point) in the timer callback with the spi controller unbound...

So, a simple patch for this would be:

diff --git a/drivers/spi/spi-axi-spi-engine.c b/drivers/spi/spi-axi-spi-engine.c
index 050096874afe..44cbc40abde3 100644
--- a/drivers/spi/spi-axi-spi-engine.c
+++ b/drivers/spi/spi-axi-spi-engine.c
@@ -651,12 +651,13 @@ static int spi_engine_transfer_one_message(struct spi_master *master,

        int_enable |= SPI_ENGINE_INT_SYNC;

+       mod_timer(&spi_engine->watchdog_timer, jiffies + 5 * HZ);
+
        writel_relaxed(int_enable,
                spi_engine->base + SPI_ENGINE_REG_INT_ENABLE);
        spi_engine->int_enable = int_enable;
        spin_unlock_irqrestore(&spi_engine->lock, flags);

-       mod_timer(&spi_engine->watchdog_timer, jiffies + 5*HZ);

        return 0;
 }

Now we are calling mod_timer() with the spin lock held but AFAICT, it should not be an issue... Do you see any issue with it @dlech?

I just checked and this should not be an issue in the upstream version of the driver but we will probably only sync the upstream version of the controller in the main branch and not in 2022_R2 where we also need a fix and that is why I'm asking if this looks sane to you :).

I can also just call mod_timer() right before acquiring the spin...

@dlech
Copy link
Collaborator Author

dlech commented Jan 24, 2024

I think it is fine in the spinlock since the timer should have the TIMER_IRQSAFE flag. But if you want to be consistent with the upstream driver, I just put it before the spinlock since it isn't possible for two calls to spi_engine_transfer_one_message() to happen at the same time.

@nunojsa
Copy link
Collaborator

nunojsa commented Jan 24, 2024

Alright,

I'll do this for the 2022_R2 branch and assume for main we well get it in sync with upstream + offload stuff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants