New 3DS CHD Performance Degraded #575
Comments
It would be helpful to know exactly which commit caused the performance issue. CHD does have a performance cost on the CPU, mostly related to decompressing LZMA2 and FLAC. Possible suspects are: CDROM timing changes (068a613); Icache interpreter merge (7a81171); ARM: Always look up verify_dirty literals from offsets, by neonloop (12aa995). This is the commit that's probably affecting you performance-wise. It's also possible, however, that the degradation was caused by a compiler upgrade, if there was one. EDIT: You also mentioned that you have a New 3DS and not an old 3DS. It's possible that the CDDA code made the New 3DS models slower, as it used to be threaded. However, because of #521 and Kyuukentai's CDROM delay, the threaded CDDA code was eventually removed, as it caused crashes and also caused performance issues on platforms with just one CPU/core. |
Is this still an issue nowadays? |
Older CHDv5 files use LZMA + FLAC |
libchdr was updated already (notaz#339) but I have no way to test on N3DS. |
Well, I can only recommend that @MarioKartFan keep using PBP or use a zstd CHD |
MiSTer FPGA can also have stuttering issues with CHDs when using CD speed hacks, so I did some digging on this topic recently. If you just want the solution, try recreating the CHD with

My understanding from snooping around in the MiSTer and libchdr source code is that before the emulator can read a sector off the CHD file, it has to read the compressed hunk off of the file, then decompress the entire thing. After that, it can reuse the same decompressed hunk to read the next couple of sectors "for free", before this process repeats itself. Hopefully by then the OS will have pre-fetched the next couple of pages from the disk/SD card.

The hunk size is important because it determines how many 4 KB pages the compressed hunk will span on the filesystem (and in the OS's page cache), and because bigger hunks mean more data to decompress. chdman defaults to 8 sectors or 19,584 bytes; this is too large to get full speed on MiSTer FPGA even with Zstd compression. Final Fantasy IX discs compress down to about 60% of their original size, so a hunk of that size would span three 4 KB pages.

The other issue is the compression used. chdman will throw every algorithm at the CD trying to get the file size down as much as possible, but anything besides FLAC for audio plus either Zstd (newer and faster but less widely supported) or Zlib/Deflate (old reliable) is going to slow decompression down a lot for very little size gain.

A hunk size of 4 sectors/9792 bytes was fast enough to get the full x8 CD speed on MiSTer reading the file off of an SD card, even with Zlib compression instead of the faster Zstd. That's small enough to go down from 3 pages to 2, and some games can squeeze down to one page. It's also ~10 KB less that needs to be decompressed.
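
The page arithmetic above can be checked with a short sketch. This is only an illustration in Python: the function name `hunk_stats` is hypothetical, the 60% ratio is the rough Final Fantasy IX figure quoted above, and the page count is a best case that assumes each compressed hunk starts on a page boundary (a hunk that straddles a boundary would touch one more page).

```python
import math

SECTOR_BYTES = 2448  # 2352-byte raw CD sector + 96 bytes of subcode, as stored in a CD CHD
PAGE_BYTES = 4096    # the 4 KB page size assumed in the comment above


def hunk_stats(sectors_per_hunk, compression_ratio=0.60):
    """Return (raw hunk bytes, approx. compressed bytes, minimum 4 KB pages spanned).

    compression_ratio is an assumption: ~60% is the rough FFIX figure quoted above.
    """
    hunk_bytes = sectors_per_hunk * SECTOR_BYTES
    compressed = hunk_bytes * compression_ratio
    pages = math.ceil(compressed / PAGE_BYTES)
    return hunk_bytes, round(compressed), pages


for sectors in (8, 4):
    raw, compressed, pages = hunk_stats(sectors)
    print(f"{sectors} sectors: {raw} bytes raw, ~{compressed} compressed, {pages} page(s)")
```

Under these assumptions the 8-sector default comes out at 3 pages and the 4-sector hunk at 2, and the raw difference between them is 9792 bytes, which is the "~10 KB less" to decompress.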
Additionally, the size of some games can blow up at lower hunk sizes; when I tested different hunk sizes on my collection of PSX games, 9792 bytes was the start of the diminishing returns on compression and had the least amount of size edge cases: |
Dug out my New 3DS XL and tried running FFIX disc 1 compressed with Zstd + FLAC at various hunk sizes, but none would play the intro FMV 100% free of stutter and audio pops. (Don't do a hunk size of 2448 though, it really started stuttering since every single sector read had to go to the SD card.) Then again, even with an uncompressed CHD I still get a small amount of stutter. Maybe the 3DS's SD card slot just isn't fast enough. |
Grabbed some footage of the attract mode FMV with my phone. I can't tell a difference between a CHD compressed with zstd + FLAC (9792 byte hunks) and an uncompressed CHD (hunk size reduced slightly to 7344 so it reads a similar amount of bytes.) Unfortunately when I tried to load a CHD made with the standard chdman settings (lzma + zlib + FLAC, 19584 byte hunks), the core crashed, so I don't have the worst case scenario for the comparison. I've attached the dump file. Tested on a New 3DS XL and core version r24l 237887e. |
For the sake of completeness, I tried loading it up as a bin/cue; it stuttered as badly as setting the CHD hunk size to 2448. I'm guessing the I/O is completely unbuffered and going to the SD card for every sector read. |
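
To illustrate why single-sector reads hurt so much, here is a minimal Python sketch that counts how often a sequential read has to go back to storage when only the most recently fetched hunk is kept in memory. It is a simplified model, not libchdr's or the core's actual caching logic; `count_storage_reads` is a hypothetical name, and 150 sectors is used because it is about one second of 2x CD streaming (75 sectors per second at 1x).

```python
def count_storage_reads(total_sectors, sectors_per_hunk):
    """Count hunk fetches for a sequential read with a one-hunk cache.

    Simplified model (an assumption): only the last fetched hunk stays cached,
    so every sector that lands in a different hunk costs one storage read.
    """
    cached_hunk = None
    reads = 0
    for sector in range(total_sectors):
        hunk = sector // sectors_per_hunk
        if hunk != cached_hunk:
            reads += 1          # cache miss: fetch (and decompress) a new hunk
            cached_hunk = hunk
    return reads


for sectors_per_hunk in (1, 4, 8):
    reads = count_storage_reads(150, sectors_per_hunk)
    print(f"{sectors_per_hunk} sector(s) per hunk: {reads} storage reads")
```

In this model a 2448-byte hunk (or unbuffered bin/cue) needs one storage access per sector, while 4- and 8-sector hunks cut that to roughly a quarter and an eighth. If each access has a high fixed start-up cost, that alone could account for the stutter.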
Hard to blame the CHD reading code then. |
I agree, whatever performance issue was reported originally probably isn't a problem any more if your CHD uses the right settings. But I do find it troubling that running a CHD made with chdman's defaults crashed the core. That's arguably a worse problem, since the documentation for the various CHD-compatible cores doesn't instruct users to change the hunk size or compression algorithm. |
Probably worth compiling 3DS without LZMA support in libchdr to reject such CHD files. |
I/O stalling issues could probably be relieved by reviving threaded cdrom code, although I'm not sure there are many people still using 3DS for PSX... As for the crash, I need to have at least the exact binary to find the faulty code, otherwise the crash dump isn't useful. The nightly builds are different every time because they are statically linked to RetroArch itself which is changing a lot. Or ideally it would be useful to get the dump file (no screenshot needed) for the following build for which I have the debug symbols: Another useful thing would be a .chd that is known to crash the 3DS which could be investigated for out-of-bounds reads or whatever. Perhaps some homebrew could be repacked like from here. |
For what it's worth, I'm one of those people that is still using (New) 3DS for PSX since it's still a very nice and compact handheld that has all the controls needed in order to play PSX games. I assume there are many more out there that don't watch GitHub issues and won't answer here. Anyways, I was interested in testing the recently merged zstd compression in libchdr, I'm actually using a really old version of RetroArch in the New 3DS and I think part of the reason for that was the performance regression mentioned in the first post. I should try the games I found to be most problematic, PaRappa The Rapper and Vib-Ribbon, with zstd and report back. It would also be interesting to test that threaded cdrom code but I understand it's low priority right now. |
For a start fixing the crash would be great, for which I need a crash .dmp created on a build I posted above (assuming it works at all). |
Didn't mean to leave you all hanging, I was just very busy last weekend. I can get those crash dumps tonight, but I'm not sure what to do with the retroarch_3ds.elf file. As for the core, I assume I have to install the CIA? Sorry, I've never had to deal with debug builds before. |
Just ignore any files that you don't need. I zipped all the files that came out of the build together so I wouldn't have to track which files correspond to which build. |
I'm hoping the debug core was installed correctly; I have no idea how to verify. This crash dump is for the PS1 Graphics Demo homebrew from the link you shared. This crash dump is for Final Fantasy IX (USA) (Disc1) (Rev1). Both CHD files were made with chdman v0.268 using the default settings (equivalent to |
The dump matches the binary but it shows the crash happens in |
That's strange. All right then, I turned up both the frontend and core log verbosity to Debug, enabled logging to file with timestamps and tried loading up the graphics demo CHD again. Not sure how much this'll help, but here's the log file and crash dump. |
I've stared at the code some more and it looks like it is indeed running out of memory. When the heap memory usage grows (for larger CHD hunks, I guess), RetroArch has special code to reduce the other "linear heap" that seems to be used for texture memory. With that, the video driver fails to allocate texture memory, but error handling is missing and it carries on just to crash later in Unclear what can be done about this though. |
I'm skeptical it's an issue with the hunk size. It didn't crash when the CHDs used zlib, zstd, or no compression. When I was messing around with different hunk sizes to see if any of them could run with 0 stutters, I even tested 1 MB hunks (the maximum chdman supports) and it loaded up fine. The bug seems to be related to using lzma. At any rate, sounds like it's definitely a frontend bug. Sorry for sending you on a wild goose chase! |
Well, even if the frontend handled OOM gracefully, the emulator would still not work, with an error message at best. Could you try this build? Please post a log regardless of whether it crashes or not; it should print some memory usage info I'm curious about. |
Here's the logs and crash dump after installing CIA file for that build. Still trying to run the PS1 Graphics Demo CHD, same as last time. |
Thanks, here is another one. This might require quite a few tries; it's part of the normal debugging process, sadly... pcsx_rearmed_libretro_v1.2-39855-gd08b867e7d_237887e8_patch4.zip |
Hey, that one booted up the demo! |
As guessed, it shows all 128MB used (which is an apparent limit?) and it's starting to eat into some reserved linear memory. It's kind of weird, as the standalone version of pcsxr on r-pi4 shows ~68MB usage, and that's on 64bit. Here is a test build with threaded cdrom code: |
Here's the RetroArch logs from the PS1 Graphics Demo CHD that used to crash the core. Still boots fine on this build. retroarch__2024_10_04__20_18_21.log The threaded CD access definitely makes a difference. FFIX's attract mode FMV seems to run full speed now. (I didn't do a side by side with real hardware or anything, but the stutters were quite noticeable before, and I didn't notice any this time.) Here's a recording of it; I left the CD read-ahead at its default value of 12. When I first hacked my 3DS I remember Brave Fencer Musashi stuttering occasionally when voiceovers were playing, and those seem to be gone too. More importantly, Battle Arena Toshinden couldn't run at full speed at all (presumably because it constantly streams music tracks.) With the new build, I still get some slowdowns, but it holds 60 FPS most of the time. I know the 3DS isn't the best system to be running RetroArch, but it's still the best solution for Virtual Boy, DS and 3DS games in my opinion, so I still use it from time to time. It's really cool to see PSX games running this well on it. Thanks for all your hard work getting this working. |
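
For readers unfamiliar with the idea, threaded CD access with read-ahead can be sketched roughly as below. This is a generic producer/consumer illustration in Python, not the core's actual implementation: `read_sector` and `ReadAheadReader` are hypothetical names, the slow storage read is faked, and the depth of 12 only echoes the default read-ahead value mentioned above (its exact meaning in the core is an assumption here).

```python
import queue
import threading


def read_sector(lba):
    """Hypothetical stand-in for a slow SD card read; returns fake sector data."""
    return f"sector-{lba}".encode()


class ReadAheadReader:
    """Fetch sectors on a worker thread so the consumer rarely waits on I/O."""

    def __init__(self, start_lba, end_lba, depth=12):
        # Bounded queue: the worker runs at most `depth` sectors ahead.
        self._queue = queue.Queue(maxsize=depth)
        self._thread = threading.Thread(
            target=self._worker, args=(start_lba, end_lba), daemon=True
        )
        self._thread.start()

    def _worker(self, start_lba, end_lba):
        for lba in range(start_lba, end_lba):
            # put() blocks while the buffer is full, throttling the read-ahead.
            self._queue.put((lba, read_sector(lba)))
        self._queue.put(None)  # sentinel: no more sectors

    def __iter__(self):
        while True:
            item = self._queue.get()  # blocks only if the worker has fallen behind
            if item is None:
                return
            yield item


sectors = list(ReadAheadReader(0, 30))
print(len(sectors), sectors[0])
```

A real implementation would also have to handle seeks (discarding buffered sectors), read errors, and shutdown, which this sketch leaves out.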
The OP report says there were problems after fast forwarding, does my implementation handle it ok? What about streaming things on lzma compressed CHDs and raw cue/bin? In either case thanks for all the testing. |
Good news! I was about to test a default settings FFIX CHD (lzma + zlib + flac, 19584 byte hunk size) but after looking at my RetroArch history, I noticed I already accidentally used it in that last video I recorded. So yeah, it runs fine; the threaded I/O is enough to make that particular game run fine even without optimizing the CHD settings. (I definitely used a zstd CHD when I tested Battle Arena Toshinden though, which just goes to show that some games can still push the SD card to the limit.) Tried doing some Fast Forward with that last build and didn't notice any issues, but for whatever it's worth I didn't have any FF issues when I was testing the CHD stutters in the previous builds either. |
What about single-sector formats (cue/bin or CHD with hunk size 2448)? Your previous reports showed those were the worst. From those reports it would seem it takes a long time to start an SD card transfer on the 3DS, so small transfers are problematic. Or maybe the mismatch of SD card and CD sector sizes causes some SD card controller or driver inefficiency or something. |
Just tested out the FFIX attract mode with single sector formats and it worked pretty well. I did notice a stutter or two, but you'd have to be paying attention to notice. This used to stutter so much the intro felt like it was running at half speed. I recorded a video of the CHD with no compression and 2448 byte hunk size. (Sorry that it came out sideways, not sure why my phone did that.) I did test bin/cue as well but the result was more or less the same, wasn't worth capturing a second video. https://youtube.com/shorts/I3MP3AaPAiY?feature=share So yeah, the threaded CD build makes the two most common and slowest use cases (bin/cue and lzma-compressed 8 sector CHDs) run smoother than the Zstd + 4 sector hunk CHD and the uncompressed 3 sector hunk CHD I tested out in this comment. Seems like a slam dunk to me. |
Just wanted to say I grabbed the latest build through RetroArch (as opposed to the test build you attached here) and didn't notice any differences. |
@InquisitiveCoder could you try another build? This one has multithreaded dynarec, it does seem to help on the switch but no idea if it's any good for 3ds, if it works there at all. It can be disabled in "System->DynaRec threading" where "auto" is the same as "on" on multicore systems (the 3ds is one of them). The desired effect is to reduce stutters when things happen for the first time and MIPS code is recompiled to ARM. v1.2-42455-g75c647d3ca-g907c42ea.zip As for this issue I'm closing it as I think I did what I could for CD image reading slowdowns. |
Since you said it seems to help on Switch, do you know of any games that had issues? Or should I just load up random games and see if there's any difference between threaded dynarec on and off? |
"Brave Fencer Musashi" had stutters at the start of the level, despite a fast CPU the Switch has. 3DS ARM11 design could be considered ancient in comparison so I'd guess most games would be affected. Not to mention that out of 4 cores 2 seem to be reserved for the system on 3DS... |
Sorry, it was a very busy week for me so it took me a while to test this thoroughly. I started a new game in Brave Fencer Musashi and played up to the Steam Knight. I used a zstd+flac CHD with 4 sector hunk size as usual. Most of the emulator settings were left at their defaults except I turned on synchronous threaded rendering; I wanted to lighten the load on the main thread and also see if having threaded rendering, CD access and dynarec simultaneously would cause any contention issues with the 3DS's 2 user space cores. The good news is that enabling threaded dynarec doesn't seem to cause any problems. However, it also didn't seem to change any of the slowdowns. To be clear, the game runs at a steady 30 FPS most of the time, with the only notable slowdown happening whenever the transparent artwork for an Assimilate ability appears (probably related to issue #607 ), and when the wall explodes at the start of the cutscene with Rootrick. In short I don't see any harm in providing this option but I also wasn't able to find a scenario where it helps. If you have other games or more specific test setups you'd like me to try, I'd be happy to test further. |
Thanks again for the testing. I got hold of an old3ds. Even if it's possible to run code on the syscore, running the recompiler there makes things a lot worse for the old3ds. I'll just default it to off for the 3ds. |
Returned to this port on my New 3DS after a few months of absence. Last version I regularly used was probably somewhere around 1.9.9 or 1.9.10.
I could not believe how poor performance had become in Final Fantasy IX. Began seeing frame tearing running through towns, which never happened previously; far more stutters in fights, sound crackling in movies (after fast forwarding briefly) etc. There has always been a slight sound stutter at the start of fights, going back to when Justin Weiss first added async rendering support. But the issues that exist now are far worse.
I checked all settings, deleted configs and set everything up properly. Issues remained.
Finally tried using PBP instead of CHD. Fixed all problems immediately.
This surprised me as I would have expected CHD to provide similar performance. I then did a very silly performance test. I started a new game using a CHD and a PBP. Using fast forward, the introduction ended with Zidane control in 1:10:15 on CHD and 1:05:01 on PBP. Not saying this is scientific at all but just another data point.