Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

attempt to use encoding header for decoding #11

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

finnboeger
Copy link
Contributor

The encoding of some files is getting detected incorrectly. If a character, that gets changed due to a different encoding being used, is present in the audio file path it will lead to a canonicalzation error.

While automatic encoding detection is not something that I will touch with a 10ft pole we can work around this by honoring the #ENCODING header if it is present.
As these chars are always interpreted correctly regardless of the encoding we can decode as ASCII and disregard any invalid bytes. If the header is not present it falls back on the previous auto-detection algorithm.

(Example File that gets misidentified: petit milady - 360° Hoshi no Orchestra (TV).txt)

@finnboeger
Copy link
Contributor Author

The added test to verify that without the encoding header the file would be misinterpreted is subject to failing if chardet changes. However if we don't test that it fails in absence of the header, we can't be sure that the presence of the header fixed the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant