Skip to content
/ idna Public
forked from ada-url/idna

C++ library implementing the to_ascii and to_unicode functions from the Unicode Technical Standard.

License

Notifications You must be signed in to change notification settings

reqable/idna

 
 

Repository files navigation

Unicode IDNA

OpenSSF Scorecard Badge VS17-CI Alpine Linux Alpine Linux

The ada-url/ada library is a C++ library implementing the to_ascii and to_unicode functions from the Unicode Technical Standard supporting a wide range of systems. It is suitable for URL parsing.

Our IDNA library is used by the Node.js runtime and by ClickHouse.

According to our benchmarks, it can be faster than ICU.

Requirements

  • A recent C++ compiler supporting C++20. We test GCC 12 or better, LLVM 12 or better and Microsoft Visual Studio 2022.

Usage

std::string_view input = u8"meßagefactory.ca";// non-empty UTF-8 string, must be percent decoded
std::string idna_ascii = ada::idna::to_ascii(input);
if(idna_ascii.empty()) {
    // There was an error.
}
std::cout << idna_ascii << std::endl;
// outputs 'xn--meagefactory-m9a.ca' if the input is u8"meßagefactory.ca"

Benchmarks

You may build a benchmarking tool with the library as follows under macOS and Linux:

cmake -D ADA_IDNA_BENCHMARKS=ON -B build
cmake --build build
./build/benchmarks/to_ascii

The commands for users of Visual Studio are slightly different.

Sample result (LLVM 14, Apple M1 Max processor):

---------------------------------------------------------------------
Benchmark           Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------
Ada              1504 ns         1504 ns       440984 speed=48.5371M/s time/byte=20.6028ns time/domain=250.667ns url/s=3.98935M/s
Icu              1898 ns         1897 ns       369967 speed=38.4721M/s time/byte=25.9928ns time/url=316.246ns url/s=3.16209M/s

License

This code is made available under the Apache License 2.0 as well as the MIT license.

Our tests include third-party code and data. The benchmarking code includes third-party code: it is provided for research purposes only and not part of the library.

About

C++ library implementing the to_ascii and to_unicode functions from the Unicode Technical Standard.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 96.0%
  • C 2.4%
  • Other 1.6%