Handle UTF-8 in SOSI 4.5 files #9
Comments
Branch H_unicode should be able to handle this.
Besides the path settings for fyut.h in GM.cpp, the use of __int64 (instead of UT_INT64) on line 249 of fyut.h, and the use of the Windows-specific SW_SHOWNORMAL and HANDLE on lines 327 and 328 of fyut.h, there are some more error messages on Linux (Ubuntu 12.04): DELDIR.cpp: In function 'short int UT_DeleteDir(const wchar_t*)': In the old openfyba version, DELDIR.cpp (and also FILNACMP.cpp) used only “char”; in the 4.5 version this was changed to “wchar_t”, which is now causing problems…
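The __int64 problem described above is commonly avoided by building on a fixed-width type from the standard library instead of a compiler-specific keyword. The following is a minimal sketch under that assumption: the actual definition of UT_INT64 in fyut.h is not shown in this thread, so the typedef here is only a stand-in, and ut_example_size is a hypothetical helper added for illustration.

```cpp
#include <cstdint>

// Stand-in for the UT_INT64 typedef mentioned above (assumption: the real
// definition in fyut.h is not shown in this thread). std::int64_t is
// available on MSVC, GCC and Clang, unlike the MSVC-only __int64 keyword.
typedef std::int64_t UT_INT64;

// Compile-time check that the type really is 64 bits wide.
static_assert(sizeof(UT_INT64) == 8, "UT_INT64 must be 64 bits wide");

// Hypothetical helper: a file size that does not fit in 32 bits (5 GiB).
inline UT_INT64 ut_example_size() {
    return static_cast<UT_INT64>(5) * 1024 * 1024 * 1024;
}
```

SW_SHOWNORMAL and HANDLE have no standard equivalent, so they would still need to be guarded by a platform check.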
Yes, wchar_t is a major source of issues when porting, and is also (wrongly) used to store filenames and directories.
Talking about path names: I wonder if the fully free replacements of SPLITPTH.cpp and FULLPATH.cpp by @Kagee were merged into the 4.5 version? Or was the Borland stuff replaced by some other improvements?
UT_splitpath now uses _wsplitpath_s, and UT_FullPath uses _wfullpath/_fullpath.
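Since _wsplitpath_s is specific to the Microsoft C runtime, a port to Linux needs some other way to split a path. The following is a minimal sketch of that splitting step in standard C++. It is not the FYBA code and not the replacement by @Kagee mentioned above; PathParts and split_path are hypothetical names, drive letters are ignored, and a leading dot (as in ".profile") is treated as part of the name rather than as an extension.

```cpp
#include <string>

// Hypothetical result type: directory (with trailing separator),
// base name, and extension (with leading dot).
struct PathParts {
    std::string dir;
    std::string name;
    std::string ext;
};

// Hypothetical portable splitter for narrow (char) paths.
inline PathParts split_path(const std::string& path) {
    PathParts parts;

    // Everything up to and including the last separator is the directory.
    const std::string::size_type slash = path.find_last_of("/\\");
    std::string file = path;
    if (slash != std::string::npos) {
        parts.dir = path.substr(0, slash + 1);
        file = path.substr(slash + 1);
    }

    // The extension starts at the last dot, unless that dot is the first
    // character of the file name or there is no dot at all.
    const std::string::size_type dot = file.find_last_of('.');
    if (dot == std::string::npos || dot == 0) {
        parts.name = file;
    } else {
        parts.name = file.substr(0, dot);
        parts.ext = file.substr(dot);
    }
    return parts;
}
```

Because the function works on bytes and only looks for ASCII separators and dots, it is also safe for UTF-8 encoded path names.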
The use of wchar_t is also causing trouble when building UT (UT1.cpp, UT2.cpp, UT3.cpp, UT4.cpp, SETSIZE.cpp, CREDIR.cpp, and CopyFile.cpp) with MS Visual Studio 2010 (which OSGeo4W is using). Example:
We also use VS2010 with no problems. Have you changed the Property Page for fyba, ut and gm? The General | Character Set option must be set to "Use Unicode Character Set".
Arh, OK. I used the project files from the repository. With "Use Unicode Character Set", MS Visual Studio compiles the code without problems. While talking about MS Visual Studio: OSGeo4W also requests a 64-bit package. Do you have any experience with a 64-bit build of fyba?
As a test, I compiled a 64-bit version of fyba 1-2 years ago. It worked fine, but we have only used it in a test application.
For portability, it seems doable to redefine wchar_t to char in the functions which handle path and file names. However, UT_StrCopy and UT_StrCat are likely to be a bit tricky, because they are used both on path/file names and on text within the files, if I am not mistaken...
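One way to picture the idea of using char for path names is a character type that is chosen per platform and used only by the path-handling functions, leaving the functions that handle file content untouched. The following is a minimal sketch under that assumption, not code from FYBA: UT_PathChar, UT_PathString, UT_PATH_LIT and ut_join_path are all hypothetical names.

```cpp
#include <string>

// Hypothetical per-platform path character type: wide on Windows,
// narrow (typically UTF-8) everywhere else.
#ifdef _WIN32
typedef wchar_t UT_PathChar;
#define UT_PATH_LIT(s) L##s
#else
typedef char UT_PathChar;
#define UT_PATH_LIT(s) s
#endif

typedef std::basic_string<UT_PathChar> UT_PathString;

// Hypothetical helper: joins a directory and a file name with '/',
// which is accepted as a separator on both Windows and POSIX systems.
inline UT_PathString ut_join_path(const UT_PathString& dir,
                                  const UT_PathString& name) {
    if (dir.empty()) {
        return name;
    }
    if (dir[dir.size() - 1] == UT_PATH_LIT('/')) {
        return dir + name;  // separator already present
    }
    return dir + UT_PATH_LIT('/') + name;
}
```

With a split like this, path functions take UT_PathChar while text read from the SOSI file keeps its own type, so a shared helper such as UT_StrCopy would need one variant per character type.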
Are there any updates on this issue? Are you planning to port the new version of the fyba-lib to Unix? The earlier version serves us well on our Ubuntu server (e.g. for loading data into PostGIS)...
The N-map series from 2014 has some UTF-8 name files (e.g. no_n250_navn_utf8.sos) which cannot be opened with OGR compiled against fyba-master... It would be nice to be able to read them as well...
Hi again, I tried to port the Unicode branch to Unix with my humble amateur knowledge of programming and no education in C/C++. The modified version I have now does compile on Unix too, but I am sure there are some ugly hacks in the code. Anyway, would you mind if I publish the modified version on my GitHub, with the aim of contributing it back to kartverket when it is a bit more mature? I also worked on adjusting the OGR driver code to the new library. For the OGR driver I am almost there (I think): it does compile, but it segfaults at runtime... In that context, it would increase the chances of getting help from the GDAL people (or others) if the modified library were available to them (e.g. on GitHub)...
Probably a bit late to the party here, but regarding the comment "Todo: check that it compiles on *X systems." Answer: No :\ I do not know any C, so this is cryptic to me, but it seems to be an error related to fyut.h when running make, see below.
Workaround for Bash:

#!/bin/bash
set -euxo pipefail
for path in "$@"
do
    # Remove BOM (not needed for UTF-8)
    sed 's/\xef\xbb\xbf//' "$path" |
        iconv -f 'UTF-8' -t 'ISO-8859-10' |
        sed 's/TEGNSETT UTF-8/TEGNSETT ISO8859-10/' > "${path%.*}_iso8859-10.${path##*.}"
done
SOSI 4.5 files allow for TEGNSETT UTF8, along with other changes in the standard.
The current TestHode method (correctly) does not accept these files, as the driver does not handle them correctly. Hence, they are not identified as SOSI files by the OGR driver.
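To illustrate what recognising such a file could involve, the following is a minimal sketch of reading the declared character set from the header. It is not how TestHode works and not part of FYBA or the OGR driver. It assumes the character set appears on a header line starting with "..TEGNSETT", as in the Bash workaround in the comments, and accepts both the "UTF-8" and "UTF8" spellings seen in this thread. All function names and the 50-line limit are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the value of the first "..TEGNSETT" line
// among the first max_lines lines, or an empty string if none is found.
inline std::string sosi_detect_charset(std::istream& in, int max_lines = 50) {
    const std::string key = "..TEGNSETT";
    std::string line;
    for (int i = 0; i < max_lines && std::getline(in, line); ++i) {
        // Drop a UTF-8 byte order mark at the very start of the file.
        if (i == 0 && line.compare(0, 3, "\xEF\xBB\xBF") == 0) {
            line.erase(0, 3);
        }
        if (line.compare(0, key.size(), key) != 0) {
            continue;  // not the TEGNSETT line
        }
        // Trim blanks before the value and blanks or a CR after it.
        const std::string::size_type first =
            line.find_first_not_of(" \t\r", key.size());
        if (first == std::string::npos) {
            return std::string();
        }
        const std::string::size_type last = line.find_last_not_of(" \t\r");
        return line.substr(first, last - first + 1);
    }
    return std::string();
}

// Convenience wrapper for header text that is already in memory.
inline std::string sosi_detect_charset_in_text(const std::string& text) {
    std::istringstream in(text);
    return sosi_detect_charset(in);
}

// Accepts both spellings that occur in this thread.
inline bool sosi_is_utf8(const std::string& charset) {
    return charset == "UTF-8" || charset == "UTF8";
}
```

A driver could use a check like this to decide whether text values need to be transcoded before they are handed on, instead of rejecting the file outright.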
Two possible solutions, possibly in parallel: