A Unicode codepoint is always modeled as a char32_t. Because I don’t like typedefs, there is no codepoint_t.
Customization point objects codepoint and grapheme, whose default behavior is to treat the View they are given as a sequence of UTF-8 code units and to produce, respectively, a codepoint_view (a View over char32_t codepoints) and a grapheme_view (a View over Views of char32_t codepoints). Error handling is draconian: they produce nothing if the underlying data is not well formed. There may be other transcoding objects providing replacements or callbacks for error handling.
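To make "producing nothing" concrete, here is a minimal sketch of strict UTF-8 decoding. The name decode_utf8_strict is hypothetical and not part of the design above, and it decodes eagerly into a std::u32string instead of returning a lazy View, purely to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (an assumption, not part of the design above).
// Decodes UTF-8 strictly: any ill-formed input yields std::nullopt,
// which is the "produce nothing" policy in its simplest form.
inline std::optional<std::u32string> decode_utf8_strict(std::string_view in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const auto b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        char32_t min = 0;  // smallest codepoint that legitimately needs `len` bytes
        if (b0 < 0x80)                { cp = b0;        len = 1; min = 0x0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min = 0x10000; }
        else return std::nullopt;  // stray continuation byte or invalid lead byte

        if (in.size() - i < len) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const auto b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min) return std::nullopt;                      // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;  // surrogate
        if (cp > 0x10FFFF) return std::nullopt;                 // beyond Unicode range
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Usage: well-formed input decodes, ill-formed input yields nothing.
inline void decode_utf8_strict_demo() {
    assert(decode_utf8_strict("h\xC3\xA9") == std::u32string(U"h\u00E9"));
    assert(!decode_utf8_strict("\xC0\x80"));  // overlong encoding of U+0000
    assert(!decode_utf8_strict("\xE2\x82"));  // truncated three-byte sequence
}
```

A transcoding object with replacement or callback behavior would differ only in what happens at each early return.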
A Concept, UnicodeText, which requires that codepoint(T const& t) and grapheme(T const& t) are well formed.
Concepts CodepointView and GraphemeView, which syntactically require that a type is a View of char32_t and a View of Views of char32_t respectively, but have the additional semantic requirement of being ranges of codepoints and grapheme clusters.
Unicode algorithms are constrained either on UnicodeText or on the specific Concept they require (CodepointView or GraphemeView), and the latter is always producible from the former.
std::text will model UnicodeText.
std::text does not model Range. Access to code units is via text::data(), to codepoints via text::codepoint(), and to extended grapheme clusters (EGCs) via text::grapheme().