UTF-8 encoding error in cmudict-0.7b #5
Comments
Danilo,
I believe that iconv is doing the right thing, at least for me (using a git
shell):
$ cat /proc/version
MINGW64_NT-10.0-19043 version 3.1.7-340.x86_64 ***@***.***) (gcc
version 10.2.0 (GCC) ) 2021-03-26 22:17 UTC
$ iconv --version
iconv (GNU libiconv 1.16)
Copyright (C) 2000-2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Bruno Haible.
$ file cmudict-0.7b
cmudict-0.7b: ASCII text, with CRLF line terminators
$ iconv -f UTF-8 cmudict-0.7b > foo
$ diff foo cmudict-0.7b
$
By default, iconv converts to the encoding of your current locale. I'm
not sure, but it's likely that ASCII stays as-is and is supplemented by a
language-appropriate code page, so it may be that no conversion is needed.
In any case, I'm not sure what's different in your case.
Alex
On Wed, Dec 8, 2021 at 2:02 AM Danylo Mysak ***@***.***> wrote:
The output of iconv -f UTF-8 cmudict-0.7b > /dev/null; echo $? (as
suggested here
<https://stackoverflow.com/questions/115210/how-to-check-whether-a-file-is-valid-utf-8/115262#115262>)
is:
iconv: cmudict-0.7b:35733:1: cannot convert
|
Alex, thanks for the reply! I think it might be the case that your version of
I believe, the |
The file encoding is CP-1252: byte C9 represents É and C0 represents À. Assuming you want the file in UTF-8, this is the right incantation:
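As an illustration of that byte mapping (not the command posted in this comment), here is a minimal Python sketch. It assumes the input really is CP-1252; the helper name `cp1252_to_utf8` and the sample entry are hypothetical.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Reinterpret CP-1252 bytes as text and re-encode them as UTF-8."""
    return raw.decode("cp1252").encode("utf-8")


# Hypothetical sample entry containing the two bytes named above:
# 0xC9 (É in CP-1252) and 0xC0 (À in CP-1252).
sample = b"D\xc9J\xc0  D EY2 ZH AA1\r\n"
converted = cp1252_to_utf8(sample)

# Show the re-encoded bytes; each accented letter is now two bytes long.
print(converted)
```

For a real file, one would read it in binary mode, pass the bytes through the helper, and write the result back out in binary mode. Python's `cp1252` codec raises `UnicodeDecodeError` on the few byte values that CP-1252 leaves undefined (such as 0x81), which would signal that the encoding guess is wrong.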
|
The output of
iconv -f UTF-8 cmudict-0.7b > /dev/null; echo $?
(as suggested here) is:
iconv: cmudict-0.7b:35733:1: cannot convert
Removing the line fixes things.
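The same validity check can be reproduced without iconv. This is a minimal Python sketch, assuming the file is small enough to read into memory; `invalid_utf8_lines` is a hypothetical helper name and the sample bytes are made up for illustration.

```python
from typing import List


def invalid_utf8_lines(data: bytes) -> List[int]:
    """Return the 1-based numbers of lines that are not valid UTF-8."""
    bad = []
    # Splitting on the newline byte is safe: 0x0A never occurs inside a
    # multi-byte UTF-8 sequence.
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(lineno)
    return bad


# Made-up three-line sample; the second line holds raw CP-1252 bytes.
sample = b"ABLE  EY1 B AH0 L\r\nD\xc9J\xc0  D EY2 ZH AA1\r\nZOO  Z UW1\r\n"
print(invalid_utf8_lines(sample))
```

To check a real file, pass it the result of `open("cmudict-0.7b", "rb").read()`. An empty list means every line decodes as UTF-8.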