Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASoC: SOF: sof-pcm/pm: Stop paused streams before the system suspend #5058

Open
wants to merge 1 commit into
base: topic/sof-dev
Choose a base branch
from

Conversation

ujfalusi
Copy link
Collaborator

Paused streams will not receive a suspend trigger, they will be marked by ALSA core as suspended and it's state is saved.
Since the pause stream is not in a fully stopped state, for example DMA might be still enabled (just the trigger source is removed/disabled) we need to make sure that the hardware is ready to handle the suspend.

This involves a bit more than just stopping a DMA since we also need to communicate with the firmware in a delicate sequence to follow IP programming flows.
To make things a bit more challenging, these flows are different between IPC versions due to the fact that they use different messages to implement the same functionality.

To avoid adding yet another path, callbacks and sequencing for handling the corner case of suspending while a stream is paused, and do this for each IPC versions and platforms, we can move the stream back to running just to put it to stopped state.

Explanation of the change:
Streams moved to SUSPENDED state from PAUSED without trigger. If a stream does not support RESUME then on system resume the RESUME trigger is not sent, the stream's state and suspended_state remains untouched. When the user space releases the pause then the core will reject this because the state of the stream is not PAUSED, it is still SUSPENDED.

From this point user space will do the normal (hw_params) prepare and START, PAUSE_RELEASE trigger will not be sent by the core after the system has resumed.

Link: #5035

Paused streams will not receive a suspend trigger, they will be marked by
ALSA core as suspended and it's state is saved.
Since the pause stream is not in a fully stopped state, for example DMA
might be still enabled (just the trigger source is removed/disabled) we
need to make sure that the hardware is ready to handle the suspend.

This involves a bit more than just stopping a DMA since we also need to
communicate with the firmware in a delicate sequence to follow IP
programming flows.
To make things a bit more challenging, these flows are different between
IPC versions due to the fact that they use different messages to implement
the same functionality.

To avoid adding yet another path, callbacks and sequencing for handling the
corner case of suspending while a stream is paused, and do this for each
IPC versions and platforms, we can move the stream back to running just to
put it to stopped state.

Explanation of the change:
Streams moved to SUSPENDED state from PAUSED without trigger. If a stream
does not support RESUME then on system resume the RESUME trigger is not
sent, the stream's state and suspended_state remains untouched.
When the user space releases the pause then the core will reject this
because the state of the stream is _not_ PAUSED, it is still SUSPENDED.

From this point user space will do the normal (hw_params) prepare and
START, PAUSE_RELEASE trigger will not be sent by the core after the
system has resumed.

Link: thesofproject#5035
Signed-off-by: Peter Ujfalusi <[email protected]>
@ujfalusi
Copy link
Collaborator Author

Right, so this is a resurrection of #5040 with extended comments and explanation.

Replaces: #5054
With this patch all of the bugs can remain as they are not going to be hit (we are stopping the paused stream before suspend after all) and I did not wanted to water up this PR with 'unrelated' patches.

@ujfalusi
Copy link
Collaborator Author

To note: this patch can handles so far all cases that I have thrown at it as long as we don't support RESUME, With RESUME advertised the system locks up and parts of #5054 is needed to fix the lock, but we really cannot handle this corner case in any other sane way.

continue;

ret = substream->ops->trigger(substream, cmd);
if (ret) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the core still thinks that the stream is in the paused state when in fact we went ahead and stopped it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core thinks that the stream is suspended, the state it was before it got suspended was paused as far as the core knows, but if the SNDRV_PCM_INFO_RESUME is not set then on system resume the stream will stay in this state and on an attempt to un-pause it, it will fail as the stream is not in paused state, it is still in suspended.

Might be a bug in core, but this is how it has been, so it is a feature now.
We don't support RESUME, so after resume the stream must be restarted, this applies to running or paused streams, so to cut down on things, we just stop the paused ones to avoid dealing with different flows.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ujfalusi will this handle both the BE and FE pipelines?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ujfalusi the other concern I have with this is that when you stop during suspend and then after system resumne, the user releases the pause, the started_count will not be incremented right? what happens during stop after pause/release then? Wont the started_count go below zero?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ujfalusi will this handle both the BE and FE pipelines?

Yes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ujfalusi the other concern I have with this is that when you stop during suspend and then after system resumne, the user releases the pause, the started_count will not be incremented right? what happens during stop after pause/release then? Wont the started_count go below zero?

The user releasing the pause will not going to be propagated down here, it is going to error out in the core because the stream is in SUSPENDED state.
Since the release of pause failed, user space will start the stream again.
This patch handles this correctly as well, counters are all well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not move the state back to PAUSED on resume? It seems like a better idea that building on a design that relies on an error concealment by userspace?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not move the state back to PAUSED on resume? It seems like a better idea that building on a design that relies on an error concealment by userspace?

I don't think we are able to do that easily. To move to PAUSED we need to start first right after the resume and since we don't support RESUME we would need someone to call prepare() then trigger:START.

This is what happens if we suspend with active audio as well. prepare() followed by trigger() and this is what happens if we were in PAUSED state: core will reject a PAUSE_RELEASE after the system has resumed, so user space will do a complete restart (or close the PCM and gives up).

return ret;

return sof_pcm_trigger_suspended_paused_streams(sdev,
SNDRV_PCM_TRIGGER_STOP);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it actually important to first release and then to stop all streams or would it also work if you send release + stop to each of them in turn? That would be easier technically (one loop instead of 2) and the window to get some audio in or out for individual streams would get smaller.
Also, doesn't the commit description have to be updated - it's still saying that we just release paused streams

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it actually important to first release and then to stop all streams or would it also work if you send release + stop to each of them in turn? That would be easier technically (one loop instead of 2) and the window to get some audio in or out for individual streams would get smaller.

One loop does not work for some reason, if I do that then some refcounting will got crazy and we will end up with all sorts of errors.
I can re-test it again, but it was failing either with IPC3 or IPC4.

Also, doesn't the commit description have to be updated - it's still saying that we just release paused streams

Not sure what you mean, I think it is saying that we stop them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it actually important to first release and then to stop all streams or would it also work if you send release + stop to each of them in turn? That would be easier technically (one loop instead of 2) and the window to get some audio in or out for individual streams would get smaller.

One loop does not work for some reason, if I do that then some refcounting will got crazy and we will end up with all sorts of errors. I can re-test it again, but it was failing either with IPC3 or IPC4.

@lyakh, the old #5040 implemented a single loop (it was with SUSPEND instead of STOP, but used STOP later) and there were a reason I added this comment: https://github.com/thesofproject/linux/pull/5040/files/358f4ee05c54dfa8fb74b6235ca6d3e08dc947ae#r1634986648

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, doesn't the commit description have to be updated - it's still saying that we just release paused streams

Not sure what you mean, I think it is saying that we stop them.

Right, sorry, I misunderstood

@ujfalusi
Copy link
Collaborator Author

I think the important note is that this is fixing an extremely rare corner case. It is as rare as I think it is broken for sever years and we did not received a single report of it, but if it happens your machine will lock up hard.

@plbossart
Copy link
Member

Coming back after a short break, I am not sure why this is needed?

we have code in hda_dsp_dais_suspend() that supposedly takes care of the suspend-while-paused configuration. I am 100% sure we tested this with @ranj063

So the questions are:

a) what is missing in the existing code?
b) what is new in the suggested PR?

@ujfalusi
Copy link
Collaborator Author

Coming back after a short break, I am not sure why this is needed?

we have code in hda_dsp_dais_suspend() that supposedly takes care of the suspend-while-paused configuration. I am 100% sure we tested this with @ranj063

I'm also 100% sure that this was tested, but a long-long time ago and it got broken at some point (it was not teted in CI or by developers).

So the questions are:

a) what is missing in the existing code?

A completely new state machine addition to handle this corner case for IPC4 and for IPC3.

b) what is new in the suggested PR?

This PR cuts the Gordian-knot by placing the firmware, hardware and kernel states to what they should be using existing and well tested flows.

@plbossart
Copy link
Member

Still not following if we need to keep the existing code in hda_dsp_dais_suspend() ??

I am now starting to wonder if the issue only happens with IPC4.

@ujfalusi
Copy link
Collaborator Author

Still not following if we need to keep the existing code in hda_dsp_dais_suspend() ??

In theory the hda_dsp_dais_suspend() will be a NOP as we stop all paused streams properly and not leaving anything to be handled later, but I would wait a bit for the removal.

I am now starting to wonder if the issue only happens with IPC4.

The issue happens with both IPC3 and IPC4.

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still many opens...

* supporting RESUME.
*
* First we need to move (trigger) the paused streams to RUNNING state,
* then we need to stop them
Copy link
Member

@plbossart plbossart Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this necessary? it's perfectly legal to go from PAUSED to STOPPED. it's not required to go back to RUNNING

This is the case for soc-pcm.c:

		case SNDRV_PCM_TRIGGER_STOP:
			if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_START) &&
			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
				goto next;

and likewise IPC4 transitions are fine with a PAUSE->STOP.

IMHO the transition from PAUSED to RUNNING creates a racy behavior where the state can change and some samples might be generated before stopping again.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this necessary? it's perfectly legal to go from PAUSED to STOPPED. it's not required to go back to RUNNING

This is the case for soc-pcm.c:

		case SNDRV_PCM_TRIGGER_STOP:
			if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_START) &&
			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
				goto next;

@plbossart, it might be so in ASoC DPCM, but if a stream is PAUSED then the core would not send a STOP trigger:

static int snd_pcm_do_stop(struct snd_pcm_substream *substream,
			   snd_pcm_state_t state)
{
	if (substream->runtime->trigger_master == substream &&
	    snd_pcm_running(substream)) {
		substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_STOP);
		substream->runtime->stop_operating = true;
	}
	return 0; /* unconditionally stop all substreams */
}

and likewise IPC4 transitions are fine with a PAUSE->STOP.

Furthermore, we in sof do not handle well the PAUSE->STOP if we handle it all.
Yes, I have tested this as the first thing.

IMHO the transition from PAUSED to RUNNING creates a racy behavior where the state can change and some samples might be generated before stopping again.

Yes, that might happen. We are talking about a rare corner case. The options is to end up with a broken stack or a possible glitch.

continue;

ret = substream->ops->trigger(substream, cmd);
if (ret) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not move the state back to PAUSED on resume? It seems like a better idea that building on a design that relies on an error concealment by userspace?

@ranj063
Copy link
Collaborator

ranj063 commented Jul 17, 2024

@ujfalusi can I suggest a different approach? ALSA suspends stream only if the dpcm state is START. How about we add PAUSED to this also? I mean invoke the dai suspend trigger but leave the state as PAUSED. That way we wont have to do any special handling in the SOF driver at all?

Copy link
Collaborator Author

@ujfalusi ujfalusi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart, the state transitions are just too complicated and they differ between IPC3 and IPC4. If we want to handle the suspend while paused we would need unknown (but not small!) amount of tweaking, extending, rewriting of the state machines and ops in core and in vendor code.
I'm not saying that it is not possible, but every time we do that, the state handling goes more and more complex to comprehend what is going on.

Cutting the knot with a good amount of documentation and explanation is far better solution, imho.

* supporting RESUME.
*
* First we need to move (trigger) the paused streams to RUNNING state,
* then we need to stop them
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this necessary? it's perfectly legal to go from PAUSED to STOPPED. it's not required to go back to RUNNING

This is the case for soc-pcm.c:

		case SNDRV_PCM_TRIGGER_STOP:
			if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_START) &&
			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
				goto next;

@plbossart, it might be so in ASoC DPCM, but if a stream is PAUSED then the core would not send a STOP trigger:

static int snd_pcm_do_stop(struct snd_pcm_substream *substream,
			   snd_pcm_state_t state)
{
	if (substream->runtime->trigger_master == substream &&
	    snd_pcm_running(substream)) {
		substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_STOP);
		substream->runtime->stop_operating = true;
	}
	return 0; /* unconditionally stop all substreams */
}

and likewise IPC4 transitions are fine with a PAUSE->STOP.

Furthermore, we in sof do not handle well the PAUSE->STOP if we handle it all.
Yes, I have tested this as the first thing.

IMHO the transition from PAUSED to RUNNING creates a racy behavior where the state can change and some samples might be generated before stopping again.

Yes, that might happen. We are talking about a rare corner case. The options is to end up with a broken stack or a possible glitch.

continue;

ret = substream->ops->trigger(substream, cmd);
if (ret) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not move the state back to PAUSED on resume? It seems like a better idea that building on a design that relies on an error concealment by userspace?

I don't think we are able to do that easily. To move to PAUSED we need to start first right after the resume and since we don't support RESUME we would need someone to call prepare() then trigger:START.

This is what happens if we suspend with active audio as well. prepare() followed by trigger() and this is what happens if we were in PAUSED state: core will reject a PAUSE_RELEASE after the system has resumed, so user space will do a complete restart (or close the PCM and gives up).

@ujfalusi
Copy link
Collaborator Author

@ujfalusi can I suggest a different approach? ALSA suspends stream only if the dpcm state is START. How about we add PAUSED to this also? I mean invoke the dai suspend trigger but leave the state as PAUSED. That way we wont have to do any special handling in the SOF driver at all?

@ranj063, I did thought about that, but that will have effect on all drivers. I cannot predict how they will handle this fundamental change.

@ranj063
Copy link
Collaborator

ranj063 commented Jul 17, 2024

@ranj063, I did thought about that, but that will have effect on all drivers. I cannot predict how they will handle this fundamental change.

@ujfalusi why not make it opt-in? One suggestion I have is to make this a property of struct snd_soc_pcm_runtime that can be set by the driver opting-in in the be_hw_params_fixup callback

@ujfalusi
Copy link
Collaborator Author

@ranj063, I did thought about that, but that will have effect on all drivers. I cannot predict how they will handle this fundamental change.

@ujfalusi why not make it opt-in? One suggestion I have is to make this a property of struct snd_soc_pcm_runtime that can be set by the driver opting-in in the be_hw_params_fixup callback

it is ALSA core which stops the triggers.

@ranj063
Copy link
Collaborator

ranj063 commented Jul 17, 2024

@ranj063, I did thought about that, but that will have effect on all drivers. I cannot predict how they will handle this fundamental change.

@ujfalusi why not make it opt-in? One suggestion I have is to make this a property of struct snd_soc_pcm_runtime that can be set by the driver opting-in in the be_hw_params_fixup callback

it is ALSA core which stops the triggers.

@ujfalusi this is what I had in mind

diff --git a/include/sound/soc.h b/include/sound/soc.h
index 33671437ee89..62664631fbc0 100644
--- a/include/sound/soc.h
+++ b/include/sound/soc.h
@@ -1210,6 +1210,7 @@ struct snd_soc_pcm_runtime {
 
        int num_components;
        struct snd_soc_component *components[]; /* CPU/Codec/Platform */
+       bool suspend_trigger_on_pause;
 };
 
 /* see soc_new_pcm_runtime()  */
diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
index 711b2f49ed88..86f3f7f05581 100644
--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -2266,8 +2266,14 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 
                        break;
                case SNDRV_PCM_TRIGGER_SUSPEND:
-                       if (be->dpcm[stream].state != SND_SOC_DPCM_STATE_START)
+                       if (be->dpcm[stream].state != SND_SOC_DPCM_STATE_START) {
+                               /* send suspend trigger for paused stream if opted-in. Leave the state as paused though */
+                               if (be->dpcm[stream].state == SND_SOC_DPCM_STATE_PAUSED &&
+                                   be->suspend_trigger_on_pause)
+                                       ret = soc_pcm_trigger(be_substream, cmd);
+
                                goto next;
+                       }
 
                        be->dpcm[stream].be_start--;
                        if (be->dpcm[stream].be_start != 0)
diff --git a/sound/soc/sof/pcm.c b/sound/soc/sof/pcm.c
index baad4c1445aa..9eb45620eb34 100644
--- a/sound/soc/sof/pcm.c
+++ b/sound/soc/sof/pcm.c
@@ -617,6 +617,9 @@ int sof_pcm_dai_link_fixup(struct snd_soc_pcm_runtime *rtd, struct snd_pcm_hw_pa
                return 0;
        }
 
+       /* set the flag to receive the suspend trigger for paused streams */
+       rtd->suspend_trigger_on_pause = true;
+
        if (pcm_ops && pcm_ops->dai_link_fixup)
                return pcm_ops->dai_link_fixup(rtd, params);

@ujfalusi
Copy link
Collaborator Author

@ranj063, that is all fine, but the SUSPEND will not reach that point since in sound/core/pcm_native.c we have this:

static int snd_pcm_do_suspend(struct snd_pcm_substream *substream,
			      snd_pcm_state_t state)
{
	struct snd_pcm_runtime *runtime = substream->runtime;
	if (runtime->trigger_master != substream)
		return 0;
	if (! snd_pcm_running(substream))
		return 0;
	substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_SUSPEND);
	runtime->stop_operating = true;
	return 0; /* suspend unconditionally */
}

The trigger is 'eaten' up in core level.

@ranj063
Copy link
Collaborator

ranj063 commented Jul 17, 2024

snd_pcm_running

Can we extend this to snd_pcm_runtime also to do the same or do you think its too invasive?

@ujfalusi
Copy link
Collaborator Author

snd_pcm_running

Can we extend this to snd_pcm_runtime also to do the same or do you think its too invasive?

Since this trigger sequence has never been tested (as it was not even possible to get SUSPEND while PAUSED) it is highly likely that it is broken. I have reservations on how simple is to handle this also.

@ranj063
Copy link
Collaborator

ranj063 commented Jul 17, 2024

Since this trigger sequence has never been tested (as it was not even possible to get SUSPEND while PAUSED) it is highly likely that it is broken. I have reservations on how simple is to handle this also.

But you're trying to do the same within our driver anyway no by calling the suspend trigger for paused streams? Wouldnt it make sense to simplify it?

@ranj063
Copy link
Collaborator

ranj063 commented Jul 17, 2024

But you're trying to do the same within our driver anyway no by calling the suspend trigger for paused streams? Wouldnt it make sense to simplify it?

@ujfalusi how about consulting with Mark or Takashi about this instead of assuming that we have to deal with this in the SOF driver?

@ujfalusi
Copy link
Collaborator Author

Since this trigger sequence has never been tested (as it was not even possible to get SUSPEND while PAUSED) it is highly likely that it is broken. I have reservations on how simple is to handle this also.

But you're trying to do the same within our driver anyway no by calling the suspend trigger for paused streams? Wouldnt it make sense to simplify it?

Not really, I'm moving the paused stream to RUNNING and then to STOPPED. If we move it to SUSPENDED then there were other issues popping up. After system resume a SUSPENDED stream is handled differently than a PAUSED stream, given the SNDRV_PCM_INFO_RESUME is not set.

For example: if the stream was PAUSED, it will not receive the RESUME trigger...

@lgirdwood
Copy link
Member

@ujfalusi @ranj063 still needed or can we close ?

@ranj063
Copy link
Collaborator

ranj063 commented Oct 22, 2024

@ujfalusi @ranj063 still needed or can we close ?

@ujfalusi should we take the issue up with Mark/Takashi?

@ujfalusi
Copy link
Collaborator Author

@ranj063, yes, it is just I never had time to think about he proper not too long intro..
@lgirdwood, yes, this or something along the lines is a must.

@lgirdwood
Copy link
Member

@ranj063, yes, it is just I never had time to think about he proper not too long intro.. @lgirdwood, yes, this or something along the lines is a must.

ok, pls confirm this only impacts paused streams at suspend and normal running streams are stopped properly ?

If so - we dont have any impact atm as none of our pipeline advertise pause today. We can revisit this though if ever needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants