Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] ipc timed out for 0x13000004|GLB_SET_PIPELINE_STATE with multiple-pipeline-capture on TGLU_NOCODEC_IPC4 #7361

Closed
keqiaozhang opened this issue Apr 3, 2023 · 19 comments
Assignees
Labels
bug Something isn't working as expected I2S Applies to I2S bus for codec connection IPC timeout IPC timeout observed IPC4 Issues observed with IPC4 (same IPC as Windows) MTL Applies to Meteor Lake platform P1 Blocker bugs or important features TGL Applies to Tiger Lake
Milestone

Comments

@keqiaozhang
Copy link
Collaborator

Describe the bug
Observed this issue in CI testing and only happened on TGLU_RVP_NOCODEC_IPC4ZPH so far. The reproduction rate >50%.

To Reproduce
~/sof-test/test-case/multiple-pipeline.sh -f c -l 50

Reproduction Rate

50%

dmesg

[ 5532.578324] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx reply: 0x33000000|0x1: GLB_SET_PIPELINE_STATE
[ 5532.578391] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx done : 0x13000003|0x1: GLB_SET_PIPELINE_STATE [data size: 12]
[ 5532.578396] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx      : 0x13000004|0x1: GLB_SET_PIPELINE_STATE [data size: 12]
[ 5533.058084] kernel: snd_sof:sof_pcm_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm: trigger stream 0 dir 1 cmd 0
[ 5533.058103] kernel: snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-tgl 0000:00:1f.3: trigger cmd: 0 state: 3
[ 5533.065031] kernel: snd_sof:sof_pcm_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm: trigger stream 12 dir 1 cmd 0
[ 5533.065045] kernel: snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-tgl 0000:00:1f.3: trigger cmd: 0 state: 3
[ 5533.080515] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc timed out for 0x13000004|0x1
[ 5533.080523] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: Attempting to prevent DSP from entering D3 state to preserve context
[ 5533.080525] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump start ]------------
[ 5533.080542] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: hda irq intsts 0x00000000 intlctl 0xc000000b rirb 00
[ 5533.080545] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: dsp irq ppsts 0x00000000 adspis 0x00000000
[ 5533.080568] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: Host IPC initiator: 0x93000004|0x1|0x0, target: 0x0|0x0|0x0, ctl: 0x3
[ 5533.080571] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump end ]------------
[ 5533.080574] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump start ]------------
[ 5533.080576] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: IPC timeout
[ 5533.080579] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (7)
[ 5533.080588] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: 0x00000005: module: ROM, state: FW_ENTERED, running
[ 5533.080613] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: extended rom status:  0x5 0x0 0x0 0x0 0x0 0x0 0x0 0x1
[ 5533.080615] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump end ]------------
[ 5533.080642] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: failed to set final state 4 for all pipelines
[ 5533.080647] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -110
[ 5533.080651] kernel:  Port2: ASoC: trigger FE cmd: 1 failed: -110

mtrace

[  111.733180] <inf> ll_schedule: task add 0xbe0bf0c0 0x20180U priority 0 flags 0x0
[  111.733628] <inf> host_comp: comp:7 0x4000b no bytes to copy, available bytes: 0, free_bytes: 3072
[  111.733700] <inf> dai_intel_dmic: dmic_start(), dmic->fifo_a
[  111.733711] <inf> dai_intel_dmic: dai_dmic_update_bits base 10000, reg 0, mask 6000000, value 4000000
[  111.733723] <inf> dai_intel_dmic: dmic_start(), pdm0 mic_a = 1, mic_b = 1
[  111.733733] <inf> dai_intel_dmic: dai_dmic_update_bits base 10000, reg 1000, mask c000, value c000
[  111.733745] <inf> dai_intel_dmic: dai_dmic_update_bits base 10000, reg 100c, mask 3, value 3
[  111.733756] <inf> dai_intel_dmic: dai_dmic_update_bits base 10000, reg 1020, mask 80, value 80
[  111.733766] <inf> dai_intel_dmic: dmic_start(), pdm1 mic_a = 1, mic_b = 1
[  111.733778] <inf> dai_intel_dmic: dai_dmic_update_bits base 10000, reg 2000, mask c000, value c000
[  111.733788] <inf> dai_intel_dmic: dai_dmic_update_bits base 10000, reg 200c, mask 3, value 3
[  111.733800] <inf> dai_intel_dmic: dai_dmic_update_bits base 10000, reg 2020, mask 80, value 80
[  111.733811] <inf> dai_intel_dmic: dai_dmic_update_bits base 10000, reg 1000, mask 10000, value 0
[  111.733820] <inf> dai_intel_dmic: dmic_start(), cic 0x0000c001
[  111.733831] <inf> dai_intel_dmic: dai_dmic_update_bits base 10000, reg 2000, mask 10000, value 0
[  111.733840] <inf> dai_intel_dmic: dmic_start(), cic 0x0000c001
[  111.733851] <inf> dai_intel_dmic: dmic_start(), dmic_active_fifos_mask = 0x1
[  111.733871] <wrn> dai_comp: comp:5 0x40008 dai_copy(): nothing to copy
[  111.734566] <inf> host_comp: comp:7 0x4000b no bytes to copy, available bytes: 0, free_bytes: 3072
[  111.734601] <inf> dai_intel_dmic: dai_dmic_update_bits base 10000, reg 1000, mask 2, value 0
[  111.734613] <inf> dai_intel_dmic: dai_dmic_update_bits base 10000, reg 2000, mask 2, value 0
[  111.734705] <inf> ipc: rx	: 0x13000003|0x1
[  111.734750] <wrn> ipc: ipc_pipeline_complete(): no scheduling component specified, use comp 262151
[  111.735293] <inf> ipc: rx	: 0x13000004|0x1
[  111.7Terminated

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
  2. Name of the topology file
    • Topology: {avs-tplg/sof-tgl-nocodec.tplg}
  3. Name of the platform(s) on which the bug is observed.
    • Platform: {TGLU_NOCODEC_IPC4}

dmesg.txt

mtrace.txt

@keqiaozhang keqiaozhang added bug Something isn't working as expected P1 Blocker bugs or important features I2S Applies to I2S bus for codec connection IPC timeout IPC timeout observed TGL Applies to Tiger Lake IPC4 Issues observed with IPC4 (same IPC as Windows) labels Apr 3, 2023
@kv2019i
Copy link
Collaborator

kv2019i commented Apr 4, 2023

I think this has the same rootcause as #7191 (or at least this will mask 7191).

I was trying to reproduce 7191 with current main, but I get a IPC timeout like the one here instead. I'll keep these as separate until we can prove these have the same rootcause. But as I can't repro 7191 without hitting this one first, assigning this to myself.

@kv2019i kv2019i self-assigned this Apr 4, 2023
@lgirdwood lgirdwood added this to the v2.6 milestone Apr 5, 2023
@kv2019i
Copy link
Collaborator

kv2019i commented Apr 10, 2023

I could now reproduce this with the reported steps. No DSP panic and no error in the firmware log prior to IPC timeout, so no clues yet on what is the cause.

@wszypelt
Copy link

@keqiaozhang Does the issue only affect TGL?

@kv2019i
Copy link
Collaborator

kv2019i commented Apr 18, 2023

There's a fix now for #7191 . Let's see a few daily rounds whether this bug still occurs. Either this bug starts to appear again (i.e. it was masked by #7191), or it has been solved. I don't think the rootcause is the same as this bug spefifically happens around dmic start, while #7191 was related to smart-amp-test.

@wszypelt
Copy link

@keqiaozhang Does the issue only affect TGL?
and unfortunately, the logs are missing Payloads data, request for access to IPC with Payloads

@keqiaozhang
Copy link
Collaborator Author

This issue has not appeared in CI within three weeks. Closing it.

@keqiaozhang
Copy link
Collaborator Author

@keqiaozhang keqiaozhang reopened this Apr 24, 2023
@wszypelt
Copy link

if the issue is not related to MTL, I lower the priority to P2

@wszypelt wszypelt added P2 Critical bugs or normal features and removed P1 Blocker bugs or important features labels Apr 28, 2023
@fredoh9
Copy link
Contributor

fredoh9 commented May 3, 2023

Found same issue with ADLP_RVP_NOCODEC_IPC4ZPH. multiple-pipeline-all failed in PCM27, DMIC0 RAW pipeline.

[ 5734.391959] kernel: snd_sof:sof_pcm_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm: trigger stream 27 dir 1 cmd 0
[ 5734.391967] kernel: snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-tgl 0000:00:1f.3: trigger cmd: 0 state: 3
[ 5734.399658] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc timed out for 0x13120004|0x0
[ 5734.399671] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: Attempting to prevent DSP from entering D3 state to preserve context
[ 5734.399675] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump start ]------------
[ 5734.399694] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: hda irq intsts 0x00000000 intlctl 0xc000078f rirb 00
[ 5734.399699] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: dsp irq ppsts 0x00000000 adspis 0x00000000
[ 5734.399720] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: Host IPC initiator: 0x13120004|0x0|0x0, target: 0x0|0x0|0x0, ctl: 0x3
[ 5734.399725] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump end ]------------
[ 5734.399729] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump start ]------------
[ 5734.399732] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: IPC timeout
[ 5734.399735] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (7)
[ 5734.399750] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: 0x00000005: module: ROM, state: FW_ENTERED, running
[ 5734.399782] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: extended rom status:  0x5 0x0 0x0 0x0 0x0 0x0 0x0 0x1
[ 5734.399785] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump end ]------------
[ 5734.399812] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: failed to set final state 4 for all pipelines
[ 5734.399832] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -110
[ 5734.399841] kernel:  DMIC: ASoC: trigger FE cmd: 1 failed: -110

Intel internal daily test:
planresultdetail/24774?model=ADLP_RVP_NOCODEC_IPC4ZPH&testcase=multiple-pipeline-all-25

@keqiaozhang
Copy link
Collaborator Author

We only observed this issue on TGL/ADL-IPC4 platforms, no reproductions on MTL, but the reproduce rate is high. After this issue occurs, the audio will be in a bad state, which will affect subsequent test cases.

Changing the priority to P1.

@keqiaozhang keqiaozhang added P1 Blocker bugs or important features and removed P2 Critical bugs or normal features labels May 11, 2023
@keqiaozhang
Copy link
Collaborator Author

@keqiaozhang keqiaozhang added the MTL Applies to Meteor Lake platform label May 11, 2023
@wszypelt
Copy link

Potential fix: #7439

@fredoh9
Copy link
Contributor

fredoh9 commented May 17, 2023

After #7439 merged, this is not reproducible for 3 consecutive daily tests. Good sign.

@wszypelt
Copy link

@keqiaozhang please check if the issue still exists

@keqiaozhang
Copy link
Collaborator Author

@keqiaozhang please check if the issue still exists

The reproduce rate of this issue is decreased, but I still encountered this issue once yesterday.
https://sof-ci.ostc.intel.com/#/result/planresultdetail/26188?model=MTLP_RVP_NOCODEC-sof-ipc4-nocodec&testcase=multiple-pause-resume-50

@kv2019i
Copy link
Collaborator

kv2019i commented May 25, 2023

@keqiaozhang @fredoh9 Potential fix that might solve the remaining failues in #7660 . Please report if you see ayny cases of this error with this PR merged.

@keqiaozhang
Copy link
Collaborator Author

Observed this issue again in CI. Test ID:26591.
dmesg error is the same as before. But mtrace error is somewhat different.

mtrace

[00:00:43.026,265] <inf> dai_intel_ssp: dai_ssp_get_properties: dai_ssp_get_properties init delay 0
[00:00:43.026,276] <inf> dai_intel_ssp: dai_ssp_pre_start: dai_ssp_pre_start
[00:00:43.026,311] <inf> dma_intel_adsp_gpdma: intel_adsp_gpdma_power_on: intel_adsp_gpdma_power_on: dma dma@7c000 initialized
[00:00:43.026,345] <inf> dai_intel_ssp: dai_ssp_start: dai_ssp_start
[00:00:43.026,358] <inf> dai_intel_ssp: dai_ssp_update_bits: dai_ssp_update_bits base 28000, reg 30, mask 100, value 100
ASSERTION FAIL [aligned_addr == addr] @ ZEPHYR_BASE/soc/xtensa/intel_adsp/common/include/intel_adsp_hda.h:163
	Buffer must be 128 byte aligned
[00:00:43.026,540] <err> os:  ** FATerminated

@kv2019i , please help to confirm if they are the same issue?

@kv2019i
Copy link
Collaborator

kv2019i commented May 29, 2023

@keqiaozhang The test case looks a bit different, in 26591, it's check-playback.sh while this bug (and 7191) have capture and playback concurrently (which led to the bug fixed in #7660 ). The mtrace log does look very similar to the original #7191. I think we can use #7191 still to follow this.

@keqiaozhang
Copy link
Collaborator Author

@keqiaozhang The test case looks a bit different, in 26591, it's check-playback.sh while this bug (and 7191) have capture and playback concurrently (which led to the bug fixed in #7660 ). The mtrace log does look very similar to the original #7191. I think we can use #7191 still to follow this.

Thanks for the confirmation. The original bug cannot be reproduced in CI. Let's close this bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected I2S Applies to I2S bus for codec connection IPC timeout IPC timeout observed IPC4 Issues observed with IPC4 (same IPC as Windows) MTL Applies to Meteor Lake platform P1 Blocker bugs or important features TGL Applies to Tiger Lake
Projects
None yet
Development

No branches or pull requests

5 participants