Rework iso9660 view action #199
base: master
Apparently this would fix https://midnight-commander.org/ticket/2851
Oh, I wanted to ask for tests again, but I realized that this is only for the view action, and there is not even a framework for it at the moment, if I'm correct... Also, it means that the linked ticket is only related, but not exactly the same issue. Can you confirm that the copy-out works though, or, if not, also have a look at fixing it?
Also this one is indirectly related: https://midnight-commander.org/ticket/4488
Don't proceed with the merge/review yet, I'm writing a longer text on isoinfo to post here
After looking deeper into the whole isoinfo stuff, I've found something bad. Below I'll use a sample Rock Ridge+Joliet
Rock Ridge doesn't feature a "charset" concept for filenames. By default iso9660 tools print the names as-is, and that is not a big problem these days, since the names are most likely utf-8 encoded and the terminals are utf-8 as well. xorriso since 2009 supports
isoinfo is a simple tool, it always prints RR names raw, which is fine:
Joliet filenames are UCS-2 encoded, as the standard requires. When iso9660 tools create an image, they convert from whatever the input charset is to UCS-2. When they list an image's content, they convert from UCS-2 to the local charset. It sounds much better than the RR case, but there is a problem: isoinfo can't convert to utf-8. It can only convert to a selection of 1-byte charsets, the conversion tables are under
The underscored name can be used to extract the file:
It already looks bad, but there is more to it. Let's create another sample iso:
and list it:
All the files look the same, as expected. Let's see what it would extract by
It printed the contents of ALL the files whose names got squashed into
It is possible to produce the correct listing with isoinfo:
but it only works because we know ahead of time that the symbols used in the filenames can be converted to cp1251 without issues. This trick can be used with extraction as well:
To summarize, Joliet support in isoinfo is inadequate. It only works well for Latin characters. It can't convert non-Latin filenames to utf-8, which is a must these days. I would suggest at least ignoring the Joliet tree when using isoinfo. In that case, it would be always
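A minimal Python sketch of why converting Joliet names to a 1-byte charset is lossy. It is not what isoinfo runs internally: `decode_joliet_name` and `squash` are hypothetical helpers, and the `_` replacement is an assumption based on the "underscored name" mentioned above. Joliet's UCS-2 is big-endian, so Python's `utf-16-be` codec decodes it.

```python
def decode_joliet_name(raw: bytes) -> str:
    """Decode a Joliet (big-endian UCS-2) filename into a Python string."""
    return raw.decode("utf-16-be")


def squash(name: str, charset: str) -> str:
    """Convert to a 1-byte charset the lossy way.

    Assumption: every character the charset cannot represent becomes '_'.
    """
    out = []
    for ch in name:
        try:
            ch.encode(charset)
            out.append(ch)
        except UnicodeEncodeError:
            out.append("_")
    return "".join(out)


# Two different Cyrillic names as they would be stored in a Joliet tree.
first = decode_joliet_name("аб.txt".encode("utf-16-be"))
second = decode_joliet_name("вг.txt".encode("utf-16-be"))

# A Latin-only charset collapses both names into the same string.
print(squash(first, "iso8859-1"), squash(second, "iso8859-1"))

# cp1251 happens to cover these characters, so the names stay distinct.
print(squash(first, "cp1251"), squash(second, "cp1251"))
```

Once two names collapse into the same string, nothing on the command line can tell the files apart any more, which matches the extraction behaviour described above. Decoding to utf-8 instead keeps every name intact.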
If we decide to ignore the Joliet tree with isoinfo, this whole PR would become a one-line change:
Sounds pretty brutal to build a complex fallback chain with the Joliet problem in between. Maybe just ignore Joliet with isoinfo (a reference to this PR can be added in the code), and if you want to make it better, use xorriso if available and only then fall back to isoinfo?
Locally I've remade it into xorriso + isoinfo (no Joliet).
The following text is about the outdated p7zip 16.02 still present in ubuntu 22.04.
Here is another question. In the current view action code for iso9660 there is a fallback to
Next, 7z prefers Joliet over Rock Ridge, and there is no cli option to change that [1]. When Joliet is present,
But when there is only Rock Ridge, it assumes the filenames are encoded in some 1-byte encoding (idk what
It could be tricked to print the names raw:
But the same trick breaks it for Joliet images:
So, to correctly list some iso with
To summarize:
What do you think, should
[1] https://github.com/p7zip-project/p7zip/blob/36f6b74222/CPP/7zip/Archive/Iso/IsoHandler.cpp#L203-L206
Well, I don't think it hurts too much to keep 7z as a fallback. In the worst case the filenames are unreadable, but at least you've got something at all. I was thinking that maybe it makes sense to report a bug against 7z, but it's kind of difficult. What should they do? I think the only thing that would help a bit would be to validate the filenames as a utf-8 stream and, if they are valid, output them raw as a special case. But somehow I'm skeptical they would want to invest much into it.
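The "validate as utf-8, else treat as a 1-byte charset" idea can be sketched in a few lines of Python. This is only an illustration of the suggestion, not 7z code: `display_name` is a hypothetical helper, and `cp1252` as the fallback charset is an assumption, since the thread does not establish which 1-byte encoding 7z picks.

```python
def display_name(raw: bytes, fallback_charset: str = "cp1252") -> str:
    """Return a printable name for raw Rock Ridge filename bytes.

    If the bytes form valid utf-8, use them as-is; otherwise fall back to
    a 1-byte charset (hypothetical default) instead of failing.
    """
    try:
        # Strict decoding doubles as the utf-8 validity check.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback_charset, errors="replace")


# Valid utf-8 is passed through untouched.
print(display_name("файл.txt".encode("utf-8")))

# 0xE9 followed by '.' is not valid utf-8, so the fallback charset applies.
print(display_name(b"caf\xe9.txt"))
```

The check is a heuristic: a short name in a 1-byte charset can happen to be valid utf-8 too, so it reduces the number of unreadable listings rather than eliminating them.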
You mean let it be just
Also, I'd like to have my findings about isoinfo and 7z added somewhere in the repo, like README.iso9660, so that I can refer to it in comments. Which dir can I put it into?
Yep.
I think this would be a good place: https://github.com/MidnightCommander/mc/tree/master/src/vfs/extfs/helpers I don't think anyone would look for it, or find it, where the scripts are: https://github.com/MidnightCommander/mc/tree/master/misc/ext.d
More details on 7z. I've figured out there are two flavours of 7z on my ubuntu 22.04 system. One is named p7zip (p7zip-full package, 7z and 7za binaries). 7-zip.org quote:
Its most recent version is 16.02. Ubuntu shipped that version from 16.10 through 23.10. In ubuntu 24.04 it became a transitional package to 7zip.
The other 7z flavour is 7zip (7zip package, 7zz binary), available since ubuntu 22.04. It is built from 7-zip.org sources. This one is actively maintained, and it does not have the problem with RR names that I noticed in p7zip-full. Since ubuntu 24.04, the 7zip package provides the 7z, 7za and 7zr binaries, and the 7zip-standalone package provides the 7zz binary.
Interestingly, archlinux currently ships p7zip 17.05-2 from this fork https://github.com/p7zip-project/p7zip. It does not have the RR problem either. There are builds from upstream in AUR as well (1 2)
So, a rule of thumb for iso9660 listing with 7z: try 7zz first, fall back to 7z. But it still sucks in preferring Joliet over RR.
Ready to merge now. I'll look into the tickets you mentioned above later.
- use xorriso -> isoinfo -> 7z fallback chain
- ignore the Joliet tree with isoinfo
- improve error reporting
- dev notes: src/vfs/extfs/helpers/README.iso9660
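For illustration only, the selection logic behind such a fallback chain can be sketched in Python; the actual helper in the repo is not reproduced here. `TOOL_CHAIN` and `pick_tool` are hypothetical names, and placing 7zz ahead of 7z follows the rule of thumb from the discussion above rather than anything confirmed about the merged script. No tool flags are shown because the real invocations were not preserved in this thread.

```python
import shutil
from typing import Callable, Optional, Sequence

# Preference order: xorriso, then isoinfo, then the 7z flavours.
# Assumption: 7zz (upstream 7-Zip) is tried before 7z (possibly old p7zip).
TOOL_CHAIN = ("xorriso", "isoinfo", "7zz", "7z")


def pick_tool(
    chain: Sequence[str] = TOOL_CHAIN,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first tool from the chain found on PATH, or None."""
    for tool in chain:
        if which(tool) is not None:
            return tool
    return None


if __name__ == "__main__":
    tool = pick_tool()
    if tool is None:
        # Explicit error instead of silently printing nothing.
        print("no iso9660 listing tool found (tried: %s)" % ", ".join(TOOL_CHAIN))
    else:
        print("would list the image with:", tool)
```

Passing `which` as a parameter keeps the lookup testable without having any of the tools installed.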
e9e72d9 to de7d72c
Sample iso images to try it on:
Example of how to inspect those for Rock Ridge/Joliet support:
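Independently of whatever tool is used, Joliet presence can also be checked straight from the image bytes. The following is a rough Python sketch, not code from this PR, and `has_joliet` is a hypothetical helper. It relies on the ISO 9660 layout: volume descriptors start at sector 16, each is 2048 bytes and carries the `CD001` identifier, and Joliet is announced by a supplementary volume descriptor (type 2) with one of the UCS-2 escape sequences at byte offset 88. Rock Ridge detection needs parsing of the directory records and is left out.

```python
SECTOR = 2048
# Escape sequences for UCS-2 levels 1, 2 and 3 as used by Joliet.
JOLIET_ESCAPES = (b"%/@", b"%/C", b"%/E")


def has_joliet(image: bytes) -> bool:
    """Scan the volume descriptor set for a Joliet supplementary descriptor."""
    offset = 16 * SECTOR  # the first 16 sectors are the system area
    while offset + SECTOR <= len(image):
        vd = image[offset:offset + SECTOR]
        if vd[1:6] != b"CD001":
            break  # not an ISO 9660 volume descriptor
        if vd[0] == 255:
            break  # volume descriptor set terminator
        if vd[0] == 2 and vd[88:91] in JOLIET_ESCAPES:
            return True
        offset += SECTOR
    return False
```

The descriptor set sits near the start of the image, so reading the first few hundred kilobytes of a file in binary mode and passing them to `has_joliet` is normally enough.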