I wrote this program as an exercise: the only RFC I had implemented before this one was the internet checksum, in 8086 assembly, which is very simple. This RFC is more complicated and seemed like a good next step.
If you actually intend to use this program, please contact me, and I will pay more attention to development.
The library itself has no dependencies. The command line utility and the tests depend on either libbsd or libobsd. The command line utility additionally depends on libunistring.
Compilation follows the usual Meson process:
$ meson setup build
$ meson compile -C build
The shell utility and the tests can be disabled:
$ meson setup build -Dutility=false -Dtests=false
This removes the need for their dependencies.
If you have issues finding libunistring, specify the package library and include paths manually:
$ meson setup build "-Dpkg_paths=['/usr/local/lib', '/usr/local/include']"
Replace /usr/local/lib and /usr/local/include with paths appropriate for your system.
In your build system, tell pkgconf to look for libpunycode. This is the only supported method of linking against the library and providing its include path.
In your C or C++ program, follow the library manual.
If you desire more complete example code, read the source of the utility at src/punycode.c.
You can also integrate the library into your source tree; to do that, copy src/libpunycode.c and src/punycode.h into your program. They were written with standalone usage in mind.
The utility is a filter:
$ echo dog | build/punycode
dog-
$ echo Leoš Janáček | build/punycode
leo janek-61a89bk6a
Usage information is present in the utility manual.
The API is designed with the idea that an implementation of a standard should do what ought to be done within the standard, not literally everything the standard allows. As such, there is no support for mixed case because it complicates the encoder and its usage massively and makes no functional difference.
The API uses idiomatic C and targets modern systems and ease of use, so none of the weirdness of the reference implementation is present here. For instance, the reference implementation does not create valid C strings and requires the length of the source string to be passed explicitly as an argument; this implementation does what you'd expect of a C function that takes an input string and writes an output string.
Read the manuals and the source code for in-depth information about the design and implementation of this punycode codec.
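As a rough illustration of the design described above, here is a minimal sketch of the RFC 3492 encoding procedure with a C-string-in, C-string-out interface and no mixed-case support. It is not the library's code or API: the names sketch_encode, decode_utf8, MAX_CP and PUT are hypothetical, the UTF-8 decoder does only minimal validation, no case folding is performed, and an ASCII-compatible execution character set is assumed for basic code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Punycode parameters from RFC 3492; MAX_CP is an arbitrary sketch limit. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128, MAX_CP = 64 };

/* Append one char, always keeping room for the terminating NUL. */
#define PUT(ch) do { if (o + 1 >= out_size) return -1; out[o++] = (ch); } while (0)

/* Map a digit value 0..35 to a..z, 0..9. */
static char digit(uint32_t d) { return "abcdefghijklmnopqrstuvwxyz0123456789"[d]; }

/* Bias adaptation, following the RFC 3492 pseudocode. */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int first)
{
    uint32_t k = 0;
    delta = first ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Decode a NUL-terminated UTF-8 string; returns the count or (size_t)-1. */
static size_t decode_utf8(const char *s, uint32_t *cp, size_t max)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;
    while (*p) {
        uint32_t c = *p++;
        int extra = c < 0x80 ? 0 : (c >> 5) == 0x06 ? 1
                  : (c >> 4) == 0x0E ? 2 : (c >> 3) == 0x1E ? 3 : -1;
        if (extra < 0 || n == max) return (size_t)-1;
        if (extra) c &= 0x3Fu >> extra;      /* keep the lead byte payload */
        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80) return (size_t)-1;
            c = (c << 6) | (*p++ & 0x3Fu);
        }
        cp[n++] = c;
    }
    return n;
}

/* Returns 0 on success, -1 on bad input, overflow or a too-small buffer. */
int sketch_encode(const char *in, char *out, size_t out_size)
{
    uint32_t cp[MAX_CP], n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
    size_t len = decode_utf8(in, cp, MAX_CP), o = 0, h, b, i;

    if (len == (size_t)-1 || out_size == 0) return -1;
    for (i = 0; i < len; i++)                /* copy basic code points */
        if (cp[i] < 0x80) PUT((char)cp[i]);
    h = b = o;
    if (b > 0) PUT('-');                     /* delimiter */
    while (h < len) {
        uint32_t m = UINT32_MAX;
        for (i = 0; i < len; i++)            /* smallest code point >= n */
            if (cp[i] >= n && cp[i] < m) m = cp[i];
        if (m - n > (UINT32_MAX - delta) / (h + 1)) return -1;
        delta += (m - n) * (uint32_t)(h + 1);
        n = m;
        for (i = 0; i < len; i++) {
            if (cp[i] < n && ++delta == 0) return -1;
            if (cp[i] == n) {                /* emit delta as a variable-length integer */
                uint32_t q = delta, k, t;
                for (k = BASE; ; k += BASE) {
                    t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
                    if (q < t) break;
                    PUT(digit(t + (q - t) % (BASE - t)));
                    q = (q - t) / (BASE - t);
                }
                PUT(digit(q));
                bias = adapt(delta, (uint32_t)(h + 1), h == b);
                delta = 0;
                h++;
            }
        }
        delta++;
        n++;
    }
    out[o] = '\0';
    return 0;
}
```

With a sufficiently large buffer buf, calling sketch_encode("dog", buf, sizeof buf) should leave "dog-" in buf, matching the utility output shown earlier. The caller passes only the output buffer size; the input length is found from the terminating NUL.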
src/libpunycode.c and src/punycode.h contain the encoder and are intended to be usable standalone. They are intended to be maximally compatible: they are written in pure C99 and make no assumptions about the underlying machine and operating system. The assumptions we do not make include, but are not limited to, the source or execution character sets (although input and output are UTF-8) and the size of int.
spec/ contains the specification and the reference implementation, useful for development.
The other source files target a loosely POSIX 2008+ operating system with UTF-8 support.
src/ files are written with API simplicity, implementation speed, and robustness in mind. test/ files are written with only simplicity in mind.
Future directions are to write more tests and a decoder.