Handle UTF-8 in SOSI 4.5 files #9

Open
relet opened this issue Oct 23, 2013 · 15 comments

@relet
Contributor

relet commented Oct 23, 2013

SOSI 4.5 files allow for TEGNSETT UTF8, along with other changes in the standard.

The current TestHode method (rightly) rejects these files, because the driver cannot handle them yet. As a result, the OGR driver does not identify them as SOSI files.

Two possible solutions, possibly in parallel:

  • Enable UTF8 in TestHode and see how these files are handled.
  • @rosand can publish a newer, but more heavily Windows-biased SOSI version in its own branch. That one would need to be merged and confirmed working on Unix systems.
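
To make the first option more concrete, here is a minimal sketch of a header check that recognises a UTF-8 declaration. It is not fyba's TestHode: the names FindTegnsett and IsUtf8Header are hypothetical, and it assumes the character set is declared on a line of the form "..TEGNSETT <value>". Since both "UTF8" and "UTF-8" spellings appear in practice, the value is normalised before comparison.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper, not part of fyba: returns the value of the first
// TEGNSETT line in a header, upper-cased and with '-' and '_' removed,
// or an empty string if no such line is found.
std::string FindTegnsett(const std::string& header)
{
    std::istringstream lines(header);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string key, value;
        if (!(fields >> key >> value)) {
            continue;  // Line has no "key value" pair.
        }
        // Strip the leading dots that mark the SOSI nesting level.
        key.erase(0, key.find_first_not_of('.'));
        if (key != "TEGNSETT") {
            continue;
        }
        std::string normalised;
        for (unsigned char c : value) {
            if (c != '-' && c != '_') {
                normalised += static_cast<char>(std::toupper(c));
            }
        }
        return normalised;
    }
    return "";
}

// True if the header declares UTF-8 in either spelling.
bool IsUtf8Header(const std::string& header)
{
    return FindTegnsett(header) == "UTF8";
}
```

Accepting the header is only the first step, of course; whether the rest of the driver copes with multi-byte text is the part that needs testing.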
@relet
Contributor Author

relet commented Dec 18, 2013

Branch H_unicode should be able to handle this.
https://github.com/kartverket/fyba/tree/H_unicode
Todo: check that it compiles on *X systems.

@ninsbl
Contributor

ninsbl commented Dec 18, 2013

Besides the path settings for fyut.h in GM.cpp, the use of __int64 (instead of UT_INT64) on line 249 of fyut.h, and the use of the Windows-specific SW_SHOWNORMAL and HANDLE on lines 327 and 328 of fyut.h, there are some more error messages on Linux (Ubuntu 12.04):

DELDIR.cpp: In function 'short int UT_DeleteDir(const wchar_t*)':
DELDIR.cpp:51:30: error: cannot convert 'const wchar_t*' to 'const char*' for argument '1' to 'int rmdir(const char*)'
DELDIR.cpp:55:30: error: cannot convert 'const wchar_t*' to 'const char*' for argument '1' to 'int rmdir(const char*)'
DELDIR.cpp:70:1: warning: control reaches end of non-void function [-Wreturn-type]

In the old openfyba version, DELDIR.cpp (and also FILNACMP.cpp) used only “char”; in the 4.5 version this was changed to “wchar_t”, which is now causing problems…

@relet
Contributor Author

relet commented Dec 18, 2013

Yes, wchar_t is a major source of issues when porting, and is also (wrongly) used to store filenames and directories.
It should be valid to change it back to char on UTF-8 POSIX systems.
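
Where a wchar_t path cannot be changed back right away, a thin conversion layer is another way to get past the rmdir error quoted above. This is only a sketch under assumptions: WideToUtf8 and RemoveDirWide are hypothetical names (not fyba functions), each wchar_t is assumed to hold one Unicode code point (true for the 32-bit wchar_t on Linux, not for UTF-16 on Windows), and the input is not validated.

```cpp
#include <cassert>
#include <string>
#include <unistd.h>  // rmdir (POSIX only)

// Hypothetical helper: encode a wchar_t string as UTF-8.
// Assumes one Unicode code point per wchar_t; no validation is done.
std::string WideToUtf8(const wchar_t* text)
{
    assert(text != nullptr);
    std::string out;
    for (; *text != L'\0'; ++text) {
        const unsigned long cp = static_cast<unsigned long>(*text);
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical wrapper: remove a directory given a wide-character path.
// Returns the result of rmdir (0 on success, -1 on failure).
int RemoveDirWide(const wchar_t* path)
{
    return rmdir(WideToUtf8(path).c_str());
}
```

Changing the signatures back to char is still the simpler route on POSIX; a wrapper like this mainly helps if the wide-character interface has to stay identical across platforms.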

@ninsbl
Contributor

ninsbl commented Dec 18, 2013

Talking about path names: I wonder whether the fully free replacements of SPLITPTH.cpp and FULLPATH.cpp by @Kagee were merged into the 4.5 version. Or was the Borland code replaced by some other improvements?

@rosand

rosand commented Dec 18, 2013

UT_splitpath now uses _wsplitpath_s, and UT_FullPath uses _wfullpath/_fullpath.

@ninsbl
Contributor

ninsbl commented Dec 19, 2013

The use of wchar_t is also causing trouble when building UT (UT1.cpp, UT2.cpp, UT3.cpp, UT4.cpp, SETSIZE.cpp, CREDIR.cpp, and CopyFile.cpp) with MS Visual Studio 2010 (which OSGeo4W is using). Example:
fyba-h_unicode\src\ut\ut4.cpp(144): error C2664: 'int _snprintf_s(char *,size_t,size_t,const char *,...)' : cannot convert parameter 1 from 'wchar_t [80]' to 'char *'
Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast

@rosand

rosand commented Dec 19, 2013

We also use VS2010 with no problems. Have you changed the Property Page for fyba, ut and gm? The General | Character Set option should be set to "Use Unicode Character Set".

@ninsbl
Contributor

ninsbl commented Dec 19, 2013

Ah, OK. I used the project files from the repository. With "Use Unicode Character Set", MS Visual Studio compiles the code without problems.

While talking about MS Visual Studio: OSGeo4W also requests a 64-bit package. Do you have any experience with a 64-bit build of fyba?

@rosand

rosand commented Dec 19, 2013

As a test, I compiled a 64-bit version of fyba 1-2 years ago. It worked fine, but we have only used it in a test application.

@ninsbl
Contributor

ninsbl commented Dec 20, 2013

For portability, it seems doable to redefine wchar_t to char in the functions that handle path and file names. However, UT_StrCopy and UT_StrCat are likely to be a bit tricky, because they are used both on path/file names and on text within the files, if I am not mistaken...
Furthermore, the functions _vscwprintf and vswprintf_s, which are used to build UT_fwprintfUTF8 (in UT1.cpp), are not available on Unix/Linux. It looks like UT1.cpp requires the most work for porting to Unix/Linux...
But I am no C developer...
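
For what it is worth, the "measure, then format" pattern behind _vscwprintf and vswprintf_s has a standard counterpart for char strings: calling vsnprintf with a null buffer returns the required length. The sketch below assumes the text is already UTF-8 in char buffers on the Unix side; FormatToString is a hypothetical name, and it is not a drop-in replacement for UT_fwprintfUTF8, whose exact behaviour I have not checked.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: printf-style formatting into a std::string.
// Returns an empty string if formatting fails or produces no output.
std::string FormatToString(const char* format, ...)
{
    assert(format != nullptr);

    va_list args;
    va_start(args, format);

    // First pass: ask vsnprintf how many characters are needed.
    // A va_list cannot be reused after a call, so work on a copy.
    va_list args_copy;
    va_copy(args_copy, args);
    const int length = std::vsnprintf(nullptr, 0, format, args_copy);
    va_end(args_copy);

    std::string result;
    if (length > 0) {
        // Second pass: format into a buffer with room for the terminator.
        std::vector<char> buffer(static_cast<std::size_t>(length) + 1);
        std::vsnprintf(buffer.data(), buffer.size(), format, args);
        result.assign(buffer.data(), static_cast<std::size_t>(length));
    }
    va_end(args);
    return result;
}
```

The resulting string could then be written with fwrite or fputs; because the bytes are already UTF-8, no further conversion should be needed under that assumption.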

@ninsbl
Contributor

ninsbl commented Feb 10, 2014

Are there any updates on this issue? Are you planning to port the new version of the fyba-lib to Unix? The earlier version serves us well on our Ubuntu server (e.g. for loading data into PostGIS)...
If you don't mind, I would package the SOSI 4.5 UTF-8 version for OSGeo4W at the next opportunity. But it would be nice to have an equivalent on Unix/Linux...

@ninsbl
Contributor

ninsbl commented Oct 4, 2014

The 2014 N-Mapseries include some UTF-8 name files (e.g. no_n250_navn_utf8.sos) which cannot be opened with OGR compiled against fyba-master... It would be nice to be able to read them as well...

@ninsbl
Contributor

ninsbl commented Jan 9, 2015

Hi again, I tried to port the Unicode branch to Unix with my humble amateur knowledge of programming and no education in C/C++. The modified version I have now does compile on Unix too, but I am sure there are some ugly hacks in the code. Anyway, would you mind if I publish the modified version on my github, with the aim of contributing it back to kartverket when it is a bit more mature?

I also worked on adjusting the OGR driver code to the new library. For the OGR driver I am almost there (I think): it compiles but segfaults at runtime... In that context, having the modified library available (e.g. on GitHub) would increase the chances of getting help from the GDAL people (or others)...

@atlefren

Probably a bit late to the party here, but regarding the comment

"Todo: check that it compiles on *X systems."

Answer: No :\

I do not know any C, so this is cryptic to me, but it seems to be an error related to fyut.h when running make, see below.

[atlefren@sprocket] ~/code/fyba (H_unicode)$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:        16.04
Codename:       xenial

[atlefren@sprocket] ~/code/fyba (H_unicode)$ git branch
* H_unicode
  master

[atlefren@sprocket] ~/code/fyba (H_unicode)$ git status
On branch H_unicode
Your branch is up-to-date with 'origin/H_unicode'.
nothing to commit, working directory clean

[atlefren@sprocket] ~/code/fyba (H_unicode)$ autoreconf --force --install 
libtoolize: putting auxiliary files in '.'.
libtoolize: copying file './ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: copying file 'm4/libtool.m4'
libtoolize: copying file 'm4/ltoptions.m4'
libtoolize: copying file 'm4/ltsugar.m4'
libtoolize: copying file 'm4/ltversion.m4'
libtoolize: copying file 'm4/lt~obsolete.m4'
configure.ac:6: warning: AM_INIT_AUTOMAKE: two- and three-arguments forms are deprecated.  For more info, see:
configure.ac:6: http://www.gnu.org/software/automake/manual/automake.html#Modernize-AM_005fINIT_005fAUTOMAKE-invocation
configure.ac:7: installing './compile'
configure.ac:7: installing './config.guess'
configure.ac:7: installing './config.sub'
configure.ac:6: installing './install-sh'
configure.ac:6: installing './missing'
src/FYBA/Makefile.am: installing './depcomp'

[atlefren@sprocket] ~/code/fyba (H_unicode)$ ./configure                                                       
checking for a BSD-compatible install... /usr/bin/install -c                                                   
checking whether build environment is sane... yes                                                              
checking for a thread-safe mkdir -p... /bin/mkdir -p                                                           
checking for gawk... no                                                                                        
checking for mawk... mawk                                                                                      
checking whether make sets $(MAKE)... yes                                                                      
checking whether make supports nested variables... yes                                                         
checking build system type... x86_64-pc-linux-gnu                                                              
checking host system type... x86_64-pc-linux-gnu                                                               
checking how to print strings... printf                                                                        
checking for style of include used by make... GNU                                                              
checking for gcc... gcc                                                                                        
checking whether the C compiler works... yes                                                                   
checking for C compiler default output file name... a.out                                                      
checking for suffix of executables...                                                                          
checking whether we are cross compiling... no                                                                  
checking for suffix of object files... o                                                                       
checking whether we are using the GNU C compiler... yes                                                        
checking whether gcc accepts -g... yes                                                                         
checking for gcc option to accept ISO C89... none needed                                                       
checking whether gcc understands -c and -o together... yes                                                     
checking dependency style of gcc... gcc3                                                                       
checking for a sed that does not truncate output... /bin/sed                                                   
checking for grep that handles long lines and -e... /bin/grep                                                  
checking for egrep... /bin/grep -E                                                                             
checking for fgrep... /bin/grep -F                                                                             
checking for ld used by gcc... /usr/bin/ld                                                                     
checking if the linker (/usr/bin/ld) is GNU ld... yes                                                          
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B                                          
checking the name lister (/usr/bin/nm -B) interface... BSD nm                                                  
checking whether ln -s works... yes                                                                            
checking the maximum length of command line arguments... 1572864                                               
checking how to convert x86_64-pc-linux-gnu file names to x86_64-pc-linux-gnu format... func_convert_file_noop 
checking how to convert x86_64-pc-linux-gnu file names to toolchain format... func_convert_file_noop           
checking for /usr/bin/ld option to reload object files... -r                                                   
checking for objdump... objdump                                                                                
checking how to recognize dependent libraries... pass_all                                                      
checking for dlltool... no                                                                                     
checking how to associate runtime and link libraries... printf %s\n                                            
checking for ar... ar                                                                                          
checking for archiver @FILE support... @                                                                       
checking for strip... strip                                                                                    
checking for ranlib... ranlib                                                                                  
checking command to parse /usr/bin/nm -B output from gcc object... ok                                          
checking for sysroot... no                                                                                     
checking for a working dd... /bin/dd                                                                           
checking how to truncate binary pipes... /bin/dd bs=4096 count=1                                               
checking for mt... mt                                                                                          
checking if mt is a manifest tool... no                                                                        
checking how to run the C preprocessor... gcc -E                                                               
checking for ANSI C header files... yes                                                                        
checking for sys/types.h... yes                                                                                
checking for sys/stat.h... yes                                                                                 
checking for stdlib.h... yes                                                                                   
checking for string.h... yes                                                                                   
checking for memory.h... yes                                                                                   
checking for strings.h... yes                                                                                  
checking for inttypes.h... yes                                                                                 
checking for stdint.h... yes                                                                                   
checking for unistd.h... yes                                                                                   
checking for dlfcn.h... yes                                                                                    
checking for objdir... .libs                                                                                   
checking if gcc supports -fno-rtti -fno-exceptions... no                                                       
checking for gcc option to produce PIC... -fPIC -DPIC                                                          
checking if gcc PIC flag -fPIC -DPIC works... yes                                                              
checking if gcc static flag -static works... yes                                                               
checking if gcc supports -c -o file.o... yes                                                                   
checking if gcc supports -c -o file.o... (cached) yes                                                          
checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes                   
checking whether -lc should be explicitly linked in... no                                                      
checking dynamic linker characteristics... GNU/Linux ld.so                                                     
checking how to hardcode library paths into programs... immediate                                              
checking whether stripping libraries is possible... yes                                                        
checking if libtool supports shared libraries... yes                                                           
checking whether to build shared libraries... yes                                                              
checking whether to build static libraries... yes                                                              
checking for g++... g++                                                                                        
checking whether we are using the GNU C++ compiler... yes                                                      
checking whether g++ accepts -g... yes                                                                         
checking dependency style of g++... gcc3                                                                       
checking how to run the C++ preprocessor... g++ -E                                                             
checking for ld used by g++... /usr/bin/ld -m elf_x86_64                                                       
checking if the linker (/usr/bin/ld -m elf_x86_64) is GNU ld... yes                                            
checking whether the g++ linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes                   
checking for g++ option to produce PIC... -fPIC -DPIC                                                          
checking if g++ PIC flag -fPIC -DPIC works... yes                                                              
checking if g++ static flag -static works... yes                                                               
checking if g++ supports -c -o file.o... yes                                                                   
checking if g++ supports -c -o file.o... (cached) yes                                                          
checking whether the g++ linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes                   
checking dynamic linker characteristics... (cached) GNU/Linux ld.so                                            
checking how to hardcode library paths into programs... immediate                                              
checking for gcc... (cached) gcc                                                                               
checking whether we are using the GNU C compiler... (cached) yes                                               
checking whether gcc accepts -g... (cached) yes                                                                
checking for gcc option to accept ISO C89... (cached) none needed                                              
checking whether gcc understands -c and -o together... (cached) yes                                            
checking dependency style of gcc... (cached) gcc3                                                              
checking how to run the C preprocessor... gcc -E                                                               
checking fcntl.h usability... yes                                                                              
checking fcntl.h presence... yes                                                                               
checking for fcntl.h... yes                                                                                    
checking float.h usability... yes                                                                              
checking float.h presence... yes                                                                               
checking for float.h... yes                                                                                    
checking for inttypes.h... (cached) yes                                                                        
checking limits.h usability... yes                                                                             
checking limits.h presence... yes                                                                              
checking for limits.h... yes                                                                                   
checking locale.h usability... yes                                                                             
checking locale.h presence... yes                                                                              
checking for locale.h... yes                                                                                   
checking for memory.h... (cached) yes                                                                          
checking for stdint.h... (cached) yes                                                                          
checking for stdlib.h... (cached) yes                                                                          
checking for string.h... (cached) yes                                                                          
checking sys/ioctl.h usability... yes                                                                          
checking sys/ioctl.h presence... yes                                                                           
checking for sys/ioctl.h... yes                                                                                
checking sys/statvfs.h usability... yes                                                                        
checking sys/statvfs.h presence... yes                                                                         
checking for sys/statvfs.h... yes                                                                              
checking sys/time.h usability... yes                                                                           
checking sys/time.h presence... yes                                                                            
checking for sys/time.h... yes                                                                                 
checking sys/vfs.h usability... yes                                                                            
checking sys/vfs.h presence... yes                                                                             
checking for sys/vfs.h... yes                                                                                  
checking termios.h usability... yes                                                                            
checking termios.h presence... yes                                                                             
checking for termios.h... yes                                                                                  
checking for unistd.h... (cached) yes                                                                          
checking for stdbool.h that conforms to C99... yes                                                             
checking for _Bool... yes                                                                                      
checking for mode_t... yes                                                                                     
checking for size_t... yes                                                                                     
checking for stdlib.h... (cached) yes                                                                          
checking for GNU libc compatible malloc... yes                                                                 
checking for stdlib.h... (cached) yes                                                                          
checking for GNU libc compatible realloc... yes                                                                
checking for working strcoll... yes
checking for working strtod... yes
checking for floor... no
checking for getcwd... yes
checking for memmove... yes
checking for memset... yes
checking for mkdir... yes
checking for modf... yes
checking for pow... no
checking for rmdir... yes
checking for sqrt... no
checking for strchr... yes
checking for strerror... yes
checking for strpbrk... yes
checking for strstr... yes
checking for strtol... yes
checking for strtoul... yes
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating config.h
config.status: executing depfiles commands
config.status: executing libtool commands
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/GM/Makefile
config.status: creating src/UT/Makefile
config.status: creating src/FYBA/Makefile
config.status: creating doc/Makefile
config.status: creating config.h
config.status: config.h is unchanged
config.status: executing depfiles commands
config.status: executing libtool commands

[atlefren@sprocket] ~/code/fyba (H_unicode)$ make
make  all-recursive
make[1]: Entering directory '/home/atlefren/code/fyba'
Making all in src/GM
make[2]: Entering directory '/home/atlefren/code/fyba/src/GM'
/bin/bash ../../libtool  --tag=CXX   --mode=compile g++ -DHAVE_CONFIG_H -I. -I../..  --pedantic -Wno-long-long -Wall -O2 -D_FILE_OFFSET_BITS=64 -DUNIX -DLINUX -fPIC -Wno-write-strings   -g -O2 -MT GM.lo -MD -MP -MF .deps/GM.Tpo -c -o GM.lo GM.cpp
libtool: compile:  g++ -DHAVE_CONFIG_H -I. -I../.. --pedantic -Wno-long-long -Wall -O2 -D_FILE_OFFSET_BITS=64 -DUNIX -DLINUX -fPIC -Wno-write-strings -g -O2 -MT GM.lo -MD -MP -MF .deps/GM.Tpo -c GM.cpp  -fPIC -DPIC -o .libs/GM.o
GM.cpp:14:18: fatal error: fyut.h: No such file or directory
 #include <fyut.h>
                  ^
compilation terminated.
Makefile:452: recipe for target 'GM.lo' failed
make[2]: *** [GM.lo] Error 1
make[2]: Leaving directory '/home/atlefren/code/fyba/src/GM'
Makefile:401: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/home/atlefren/code/fyba'
Makefile:333: recipe for target 'all' failed
make: *** [all] Error 2

@frafra

frafra commented Feb 2, 2023

Workaround for Bash:

#!/bin/bash

set -euxo pipefail

for path in "$@"
do
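    # Remove the UTF-8 byte order mark (not needed for UTF-8), transcode
    # to ISO-8859-10, then rewrite the TEGNSETT header line to match.
    # The result is written next to the input as <name>_iso8859-10.<ext>.
    sed 's/\xef\xbb\xbf//' "$path" |
        iconv -f 'UTF-8' -t 'ISO-8859-10' |
        sed 's/TEGNSETT UTF-8/TEGNSETT ISO8859-10/' > "${path%.*}_iso8859-10.${path##*.}"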
done
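
If the BOM handling ever moves into the library itself, the first step of the script could look roughly like the sketch below. StripUtf8Bom is a hypothetical name, not an existing fyba function, and it only covers BOM removal at the very start of the data; the transcoding and header rewrite done by iconv and sed are not reproduced here.

```cpp
#include <string>

// Hypothetical helper: remove a leading UTF-8 byte order mark in place.
// Returns true if a BOM was found and removed.
bool StripUtf8Bom(std::string& data)
{
    static const char kBom[] = "\xEF\xBB\xBF";
    // compare() treats a shorter input as a mismatch, so short strings are safe.
    if (data.compare(0, 3, kBom) == 0) {
        data.erase(0, 3);
        return true;
    }
    return false;
}
```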
