Maximising window hard freezes Cosmic #887

Open
SoloRobo opened this issue Sep 27, 2024 · 6 comments

Comments

@SoloRobo

Maximising any window appears to hard freeze Cosmic, to the point where I have to hard reset the physical machine.

I wonder if it is because I am using a 3:2 screen on a Framework 13 (2880x1920), as this is pretty non-standard and perhaps a calculation is assuming 16:9?

Fedora 41 Beta
cosmic-comp 1.0.0~alpha.2^git20240923afdb656

This also happened before I moved to Fedora 41 Beta.

Alternatively, it's simply a combination of options I have set with the applets.

Is there any debug info I can generate or find?

@onlyreportingissues

onlyreportingissues commented Sep 27, 2024

Probably related:

@ids1024
Member

ids1024 commented Sep 27, 2024

When tty switching keybindings don't work, it's possible to use magic SysRq to take the keyboard out of raw mode (the unraw command, SysRq+R), so the kernel will handle the tty switch binding. (On many distros, this requires the kernel.sysrq sysctl to be changed first.)
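
As a rough illustration of the kernel.sysrq caveat: the sysctl is 0 (SysRq disabled), 1 (everything enabled), or a bitmask of allowed function groups, and keyboard control (which covers the unraw command used to get tty switching back) is bit 0x4. The sketch below only reads the current value and reports whether that command would be accepted. The function name `unraw_allowed` is made up for this example, and it ignores the `sysrq_always_enabled` boot parameter.

```rust
use std::fs;

/// Bit in the kernel.sysrq bitmask that enables keyboard control (SAK, unraw).
const SYSRQ_ENABLE_KEYBOARD: u32 = 0x4;

/// Returns true if a kernel.sysrq value permits the unraw command.
/// 0 disables SysRq, 1 enables every function, and any other value
/// is treated as a bitmask of allowed function groups.
fn unraw_allowed(sysrq: u32) -> bool {
    sysrq == 1 || (sysrq & SYSRQ_ENABLE_KEYBOARD) != 0
}

fn main() {
    // /proc/sys/kernel/sysrq is the procfs view of the kernel.sysrq sysctl.
    match fs::read_to_string("/proc/sys/kernel/sysrq") {
        Ok(raw) => match raw.trim().parse::<u32>() {
            Ok(value) => println!(
                "kernel.sysrq = {value}, unraw allowed: {}",
                unraw_allowed(value)
            ),
            Err(err) => eprintln!("unexpected sysrq value {raw:?}: {err}"),
        },
        Err(err) => eprintln!("could not read kernel.sysrq: {err}"),
    }
}
```

If this reports false (for example with a value of 16, which only allows sync), the sysctl would need to be set to 1 or to a mask that includes 4 before the freeze is reproduced.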

> I wonder if it is because I am using a 3:2 screen on a Framework 13 (2880x1920), as this is pretty non-standard and perhaps a calculation is assuming 16:9?

I don't think resolution would be the problem. The GPU model is probably more relevant. Presumably it's an iGPU. Exactly what model of CPU does the system have?

Perhaps related to direct scanout.

Not sure if there's an easy way to debug this, but what I would do is run a debug build of cosmic-comp, ssh into the machine from a different system, and attach gdb to the process to see which line cosmic-comp is freezing on. (Doing this on a tty, after switching with magic SysRq, may also help.) Definitely not the easiest way to test.

Does it produce any dmesg errors?

@onlyreportingissues

onlyreportingissues commented Sep 27, 2024

Not OP, but same issue:

OS: Fedora Linux 41 (Workstation Edition) x86_64
Kernel: Linux 6.12.0-0.rc0.20240927gt075dbe9f.413.vanilla.fc41.x86_64
Resolution: 2560x1440 @ 100 Hz [External]
CPU: AMD Ryzen 5 5600
GPU: AMD Radeon RX 6600 [Discrete]

The problem doesn't occur with kernel 6.10.

@Quackdoc
Contributor

> Perhaps related to direct scanout.

Is there a way to disable direct scanout? I think direct scanout is causing the issues I'm having with #868.

@ids1024
Member

ids1024 commented Sep 28, 2024

Good point; we should probably have an env var to test without direct scanout, like Anvil.

You can try this:

diff --git a/src/backend/kms/surface/mod.rs b/src/backend/kms/surface/mod.rs
index d0cfb8d..32aaf4a 100644
--- a/src/backend/kms/surface/mod.rs
+++ b/src/backend/kms/surface/mod.rs
@@ -624,7 +624,8 @@ impl SurfaceThreadState {
             cursor_size,
             Some(gbm),
         ) {
-            Ok(compositor) => {
+            Ok(mut compositor) => {
+                compositor.use_direct_scanout(false);
                 self.active.store(true, Ordering::SeqCst);
                 self.compositor = Some(compositor);
                 Ok(())
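
For the env var idea mentioned above, a minimal standalone sketch of the decision logic might look like the following. The variable name `COSMIC_DISABLE_DIRECT_SCANOUT` and the helper `direct_scanout_enabled` are assumptions made for this example, not something taken from cosmic-comp or Anvil; only `use_direct_scanout` comes from the patch above.

```rust
use std::env;

/// Hypothetical variable name, chosen for this sketch only.
const DISABLE_VAR: &str = "COSMIC_DISABLE_DIRECT_SCANOUT";

/// Decides whether direct scanout should stay enabled, given the raw
/// value of the environment variable (None when it is unset).
/// Unset, empty, "0" and "false" leave it enabled; anything else disables it.
fn direct_scanout_enabled(value: Option<&str>) -> bool {
    match value.map(str::trim) {
        None | Some("") | Some("0") | Some("false") => true,
        Some(_) => false,
    }
}

fn main() {
    let value = env::var(DISABLE_VAR).ok();
    let enabled = direct_scanout_enabled(value.as_deref());
    // In the compositor, this is where the patch above could pass `enabled`
    // to compositor.use_direct_scanout(..) instead of hard-coding false.
    println!("direct scanout enabled: {enabled}");
}
```

Keeping the parsing in a small pure function means the default (direct scanout on) is explicit and the behaviour can be checked without starting a compositor.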

@man0lis

man0lis commented Sep 30, 2024

I can also reproduce the problem. It seems to happen when maximizing a window (Super+M), swapping a window (Super+X) or when trying to stack 2 windows (Super+U, Super+S). All actions performed in tiling mode. In all cases a version of this stack trace is printed to dmesg and the UI freezes. SSH into the host still works.

[ 3000.716511] BUG: unable to handle page fault for address: 00000000212d216e
[ 3000.716519] #PF: supervisor read access in kernel mode
[ 3000.716524] #PF: error_code(0x0000) - not-present page
[ 3000.716527] PGD 0 P4D 0 
[ 3000.716535] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 3000.716543] CPU: 10 UID: 0 PID: 3064 Comm: kworker/u64:35 Tainted: G           O       6.11.0 #1-NixOS
[ 3000.716550] Tainted: [O]=OOT_MODULE
[ 3000.716553] Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.05 03/29/2024
[ 3000.716557] Workqueue: events_unbound commit_work
[ 3000.716581] RIP: 0010:copy_stream_update_to_stream.isra.0+0x2df/0x6f0 [amdgpu]
[ 3000.717078] Code: 1f 48 8b 10 49 89 97 f0 00 00 00 48 8b 50 08 49 89 97 f8 00 00 00 8b 40 10 41 89 87 00 01 00 00 49 8b 44 24 78 48 85 c0 74 0a <0f> b6 00 41 88 87 88 64 00 00 49 8b 44 24 60 48 85 c0 74 36 48 8b
[ 3000.717082] RSP: 0018:ffffa108465eb9d8 EFLAGS: 00010202
[ 3000.717086] RAX: 00000000212d216e RBX: 0000000000000004 RCX: 0000000000000000
[ 3000.717090] RDX: ffff8fb2da8a9e30 RSI: ffff8fb2e37f8000 RDI: 0000000000000000
[ 3000.717093] RBP: ffffa108465eba30 R08: 0000000000000000 R09: 0000000000000000
[ 3000.717095] R10: 0000000000000000 R11: ffff8fb2da8a99e0 R12: ffff8fb2da8a9e30
[ 3000.717098] R13: ffff8fb151000000 R14: ffff8fb2da8a9e30 R15: ffff8fb2e37f8000
[ 3000.717101] FS:  0000000000000000(0000) GS:ffff8fbfa1f00000(0000) knlGS:0000000000000000
[ 3000.717104] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3000.717107] CR2: 00000000212d216e CR3: 000000014473e000 CR4: 0000000000f50ef0
[ 3000.717110] PKRU: 55555554
[ 3000.717113] Call Trace:
[ 3000.717118]  <TASK>
[ 3000.717123]  ? __die+0x23/0x70
[ 3000.717131]  ? page_fault_oops+0x173/0x5a0
[ 3000.717141]  ? exc_page_fault+0x71/0x150
[ 3000.717149]  ? asm_exc_page_fault+0x26/0x30
[ 3000.717159]  ? copy_stream_update_to_stream.isra.0+0x2df/0x6f0 [amdgpu]
[ 3000.717539]  ? psi_task_switch+0xd6/0x230
[ 3000.717546]  update_planes_and_stream_state+0x23e/0x520 [amdgpu]
[ 3000.717903]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 3000.717910]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 3000.717914]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 3000.717919]  ? commit_minimal_transition_state+0x113/0x350 [amdgpu]
[ 3000.718169]  update_planes_and_stream_v2+0x1b4/0x5f0 [amdgpu]
[ 3000.718309]  ? __entry_text_end+0x101e86/0x101e89
[ 3000.718315]  ? dma_fence_array_release+0x7c/0xa0
[ 3000.718318]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 3000.718320]  ? kfree+0x2b7/0x300
[ 3000.718325]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 3000.718326]  ? kvfree_call_rcu+0x21f/0x360
[ 3000.718331]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 3000.718333]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 3000.718335]  ? wait_for_completion_timeout+0x135/0x160
[ 3000.718339]  ? commit_tail+0x91/0x130
[ 3000.718342]  ? process_one_work+0x18f/0x3b0
[ 3000.718346]  ? worker_thread+0x21f/0x330
[ 3000.718348]  ? __pfx_worker_thread+0x10/0x10
[ 3000.718350]  ? kthread+0xcd/0x100
[ 3000.718353]  ? __pfx_kthread+0x10/0x10
[ 3000.718355]  ? ret_from_fork+0x31/0x50
[ 3000.718359]  ? __pfx_kthread+0x10/0x10
[ 3000.718361]  ? ret_from_fork_asm+0x1a/0x30
[ 3000.718366]  </TASK>
[ 3000.718367] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device rfcomm ccm af_packet cmac algif_hash algif_skcipher af_alg bnep nls_iso8859_1 nls_cp437 vfat fat iwlmvm snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp xt_conntrack snd_sof_pci nf_conntrack snd_sof_xtensa_dsp mousedev mac80211 nf_defrag_ipv6 nf_defrag_ipv4 snd_sof hid_sensor_als hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common snd_sof_utils industrialio snd_pci_ps snd_amd_sdw_acpi xt_policy soundwire_amd ptp soundwire_generic_allocation soundwire_bus pps_core snd_hda_codec_realtek libarc4 edac_mce_amd ip6t_rpfilter joydev snd_hda_codec_generic snd_hda_scodec_component edac_core snd_soc_core ipt_rpfilter snd_hda_codec_hdmi intel_rapl_msr spd5118 amd_atl snd_compress intel_rapl_common snd_hda_intel ac97_bus hid_multitouch hid_sensor_hub snd_pcm_dmaengine xt_pkttype iwlwifi kvm_amd snd_rpl_pci_acp6x snd_intel_dspcfg snd_intel_sdw_acpi xt_LOG snd_acp_pci
[ 3000.718427]  snd_hda_codec snd_acp_legacy_common btusb hid_generic nf_log_syslog snd_hda_core sp5100_tco ip6t_REJECT kvm cfg80211 snd_pci_acp6x btrtl snd_hwdep btintel watchdog amd_pmf snd_pci_acp5x nf_reject_ipv6 snd_pcm amdtee crct10dif_pclmul btbcm snd_rn_pci_acp3x crc32_pclmul ucsi_acpi snd_acp_config ipt_REJECT amd_sfh nf_reject_ipv4 polyval_clmulni i2c_piix4 btmtk snd_timer typec_ucsi snd_soc_acpi polyval_generic cros_ec_hwmon cros_ec_sysfs cros_ec_debugfs cros_ec_chardev snd ghash_clmulni_intel bluetooth rapl typec k10temp framework_laptop(O) tiny_power_button wmi_bmof tpm_crb platform_profile rfkill ccp soundcore snd_pci_acp3x i2c_smbus thermal roles ac i2c_hid_acpi xt_tcpudp i2c_hid tpm_tis cros_charge_control leds_cros_ec hid tpm_tis_core led_class_multicolor evdev cros_usbpd_logger button cros_usbpd_charger battery amd_pmc cros_kbd_led_backlight cros_usbpd_notify mac_hid gpio_cros_ec nft_compat serio_raw cros_ec_dev nf_tables sch_fq_codel loop tun tap macvlan bridge stp llc cros_ec_lpcs cros_ec fuse
[ 3000.718497]  efi_pstore configfs nfnetlink efivarfs dmi_sysfs ip_tables x_tables autofs4 dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm rng_core libaescfb ecdh_generic ecc input_leds led_class atkbd xhci_pci libps2 xhci_pci_renesas vivaldi_fmap sha512_ssse3 thunderbolt nvme sha256_ssse3 sha1_ssse3 xhci_hcd aesni_intel nvme_core gf128mul crypto_simd cryptd i8042 nvme_auth rtc_cmos serio amdgpu video wmi backlight amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper drm_buddy drm_display_helper firmware_class cec crc16 dm_mod dax btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq
[ 3000.718548] CR2: 00000000212d216e
[ 3000.718550] ---[ end trace 0000000000000000 ]---
[ 3001.038324] pstore: backend (efi_pstore) writing error (-28)
[ 3001.038326] RIP: 0010:copy_stream_update_to_stream.isra.0+0x2df/0x6f0 [amdgpu]
[ 3001.038499] Code: 1f 48 8b 10 49 89 97 f0 00 00 00 48 8b 50 08 49 89 97 f8 00 00 00 8b 40 10 41 89 87 00 01 00 00 49 8b 44 24 78 48 85 c0 74 0a <0f> b6 00 41 88 87 88 64 00 00 49 8b 44 24 60 48 85 c0 74 36 48 8b
[ 3001.038500] RSP: 0018:ffffa108465eb9d8 EFLAGS: 00010202
[ 3001.038502] RAX: 00000000212d216e RBX: 0000000000000004 RCX: 0000000000000000
[ 3001.038504] RDX: ffff8fb2da8a9e30 RSI: ffff8fb2e37f8000 RDI: 0000000000000000
[ 3001.038505] RBP: ffffa108465eba30 R08: 0000000000000000 R09: 0000000000000000
[ 3001.038506] R10: 0000000000000000 R11: ffff8fb2da8a99e0 R12: ffff8fb2da8a9e30
[ 3001.038507] R13: ffff8fb151000000 R14: ffff8fb2da8a9e30 R15: ffff8fb2e37f8000
[ 3001.038508] FS:  0000000000000000(0000) GS:ffff8fbfa1f00000(0000) knlGS:0000000000000000
[ 3001.038509] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3001.038510] CR2: 00000000212d216e CR3: 000000014473e000 CR4: 0000000000f50ef0
[ 3001.038512] PKRU: 55555554
[ 3001.038513] note: kworker/u64:35[3064] exited with irqs disabled

Did not see this posted in one of the issues, hope it helps. Maybe a bug in amdgpu?

EDIT:
The problem goes away when I apply the provided patch for disabling direct scanout.
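
When comparing traces like the one above across reports, the RIP line is the quickest thing to check, since it names the faulting function and the module it lives in. The sketch below pulls those fields out of such a line; the type `RipLocation` and the function `parse_rip` are hypothetical names for this example, and the parser only assumes the `<segment>:<symbol>+<offset>/<size> [module]` layout seen in this log.

```rust
/// Location reported on an oops "RIP:" line.
#[derive(Debug, PartialEq)]
struct RipLocation {
    symbol: String,
    offset: u64,
    size: u64,
    module: Option<String>,
}

/// Parses a hexadecimal number written with a leading "0x".
fn parse_hex(s: &str) -> Option<u64> {
    u64::from_str_radix(s.strip_prefix("0x")?, 16).ok()
}

/// Parses a line such as
/// "[ 3000.716581] RIP: 0010:copy_stream_update_to_stream.isra.0+0x2df/0x6f0 [amdgpu]".
fn parse_rip(line: &str) -> Option<RipLocation> {
    // Everything after "RIP: " is "<segment>:<symbol>+<offset>/<size> [module]".
    let rest = line.split_once("RIP: ")?.1.trim();
    let (_segment, rest) = rest.split_once(':')?;
    // The bracketed module name is absent for symbols built into the kernel.
    let (location, module) = match rest.split_once(" [") {
        Some((loc, m)) => (loc, Some(m.trim_end_matches(']').to_string())),
        None => (rest, None),
    };
    let (symbol, offsets) = location.rsplit_once('+')?;
    let (offset, size) = offsets.split_once('/')?;
    Some(RipLocation {
        symbol: symbol.to_string(),
        offset: parse_hex(offset)?,
        size: parse_hex(size)?,
        module,
    })
}

fn main() {
    let line =
        "[ 3000.716581] RIP: 0010:copy_stream_update_to_stream.isra.0+0x2df/0x6f0 [amdgpu]";
    match parse_rip(line) {
        Some(loc) => println!("{loc:?}"),
        None => println!("not a RIP line"),
    }
}
```

Two reports with the same symbol and module are likely the same kernel-side fault even when the compositor action that triggered them differs.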
