Specification for names of OPC-parts leaves some open questions #42
Comments
OPC Section 9.1.1.1 Part Name Syntax has the following: An IRI allows Unicode characters, while a Part URI allows only ASCII, with escaped sequences for Unicode characters.
Could you please cite your source? Because RFC 3987, which defines IRI, specifies percent-prefixed encoding in a similar way to URI, I don't understand how a consumer would tell whether the names are URI or IRI encoded if IRI did not use the percent-escape rule.
I found it. The "OPC Section 9.1.1.1 Part Name Syntax" says
But I think this is not correct. Indeed, just the next section
which is copied from RFC 3987, defines the IRI encoding, which does NOT allow direct use of Unicode characters. I think OPC Section 1.1.1. needs amendment.
According to the OPC spec, OPC part names could be URI or IRI encoded. The open points are:
According to https://datatracker.ietf.org/doc/html/rfc3987#section-2.1
But that is the only mention of "unreserved" characters in RFC 3987. I understand this paragraph as "characters beyond U+007F, subject to the limitations ... could be stored into IRI verbatim without encoding". However, https://datatracker.ietf.org/doc/html/rfc3987#section-2.2 does not reference the "unreserved" characters at all. IMHO RFC 3987 is ambiguous and not quite complete. This Stack Overflow post seems to explain a lot.
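To make the URI vs. IRI distinction concrete, here is a minimal Python sketch of the IRI-to-URI direction as I read RFC 3987: ASCII characters pass through unchanged and every non-ASCII character is replaced by the percent-encoded bytes of its UTF-8 form. The function name `iri_to_uri` and the sample part name are hypothetical, and the sketch deliberately ignores normalization and host-name handling, so treat it as an illustration rather than a conforming implementation.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode every non-ASCII character of an IRI as UTF-8 bytes.

    Simplified assumption: ASCII characters (including existing "%XX"
    escapes) are copied verbatim; no Unicode normalization is applied.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            # ASCII stays as written in the IRI
            out.append(ch)
        else:
            # Non-ASCII: one "%XX" triplet per UTF-8 byte
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


# Hypothetical part name, not taken from any real package
print(iri_to_uri("/3D/Textures/ú.png"))
```

A name that is already pure ASCII comes out unchanged, which is why a consumer cannot tell from such a name alone whether the producer intended URI or IRI form.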
It is my understanding that IRI allows UTF-8 characters above %xA0 to be stored directly without escaping (with some exceptions), while the rest of the characters still need to be escaped. Thus the "OPC Section 9.1.1.1 Part Name Syntax" says
which is quite imprecise. It should say
I wonder whether any 3MF consumer / producer has ever stored a character that should have been escaped according to the URI or IRI specification but was not. Luckily, PrusaSlicer only generates part names with printable 7-bit characters, and that may be the case for other producers as well. If it is not, and those part names were NOT URI / IRI encoded, enforcing URI / IRI encoding may break backwards compatibility with existing 3MFs.

I believe the OPC part names specification is clear now.

For names or identifiers other than OPC part names, I believe we do not have to worry, as we declare our XMLs as UTF-8 encoded. As long as these other names do not address a ZIP directory entry and do not point to a file or URL, names and IDs may use the UTF-8 charset without any limitation. If they are used as identifiers, there is a risk that two IDs that should be equal are not, because one is in canonical form and the other is not. For example, the Czech character 'ú' could be encoded in UTF-8 as a sequence of two characters ('u' followed by a combining acute accent) or as a single 'ú', while both will be displayed the same (or nearly the same). A second issue may be that some client cannot display an ID because it lacks the fonts (for example, Chinese fonts may not be installed on the user's machine).

A report from my colleague Lukas follows:

Report - Open Packaging Conventions

Microsoft OPC implementation

Because the OPC specification was initiated by Microsoft, we went ahead and tested using Microsoft's own tools. While MS Word / Excel only generate simple OPC part names, Microsoft 3D Builder allows saving files with custom names (for example, an image used as a texture) in 3MF (OPC package) files. According to our tests, custom names containing UTF-8 characters are always stored encoded as URI. The URI-encoded Part Name is used both in the ZIP file header and within the XML (3dmodel.model, etc.).

According to https://docs.microsoft.com/en-us/windows/win32/api/_opc/, OPC packages can also be created through Win32 API calls. To insert a new Part Name, you need to call the method IOpcPartSet::CreatePart, which takes as a parameter the interface IOpcPartUri created by the method IOpcFactory::CreatePartUri. This function ensures that each input passed is encoded as URI before it is inserted into the OPC package. We haven't found a way to create a Part Name through the Win32 API encoded as IRI instead of URI.

Summarization
Here is an example of a URI-encoded texture file name produced by Microsoft 3D Builder, containing non-7-bit ASCII characters:

Content of

The decoded relationship target as shown by https://www.urldecoder.io/

The file name as stored inside the 3MF ZIP package:

The file name is clearly URI encoded by Microsoft 3D Builder. Most likely Microsoft uses the same OPC implementation for various OPC-derived formats, thus most likely they use URI only across the board.
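For anyone who wants to reproduce this kind of check without an online decoder, here is a small Python sketch using only the standard library. It percent-encodes a part name, stores it as a ZIP entry name, and decodes it again. The part name is a made-up example (not the one from the 3D Builder file), and using `urllib.parse.quote` with `safe="/"` is my assumption about an encoding that matches the observed behaviour, not a statement about what Microsoft's implementation does internally.

```python
import io
import zipfile
from urllib.parse import quote, unquote

# Hypothetical part name with non-ASCII characters (not from the thread)
part_name = "/3D/Textures/žlutá.png"

# Percent-encode the UTF-8 bytes of non-ASCII characters; keep "/" as separator
encoded = quote(part_name, safe="/")

# Write a tiny in-memory package whose ZIP entry name is the encoded part name
# (ZIP entry names carry no leading slash)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(encoded.lstrip("/"), b"placeholder")

# Read the entry name back, as a consumer would see it in the ZIP directory
with zipfile.ZipFile(buf) as zf:
    stored = zf.namelist()[0]

# Decoding the stored name recovers the original Unicode part name
decoded = unquote("/" + stored)

print(encoded)
print(stored)
print(decoded)
```

With URI encoding the ZIP entry name is pure ASCII, so it does not depend on how a given ZIP tool interprets non-ASCII file names.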
https://github.com/3MFConsortium/spec_core/blob/master/3MF%20Core%20Specification.md#22-part-naming-recommendations does not give a hint as to which characters may be used for OPC part names.
What does OPC actually say about that?
Other things (like part numbers) are specified well:
https://github.com/3MFConsortium/spec_core/blob/master/3MF%20Core%20Specification.md#3431-item-element refers to the standard XML simple type xs:string (https://www.w3.org/TR/xmlschema11-2/#string)