Unicode Strawman

1 A C++20 Range API for Unicode Text (for C++23)

A Unicode codepoint is always modeled as a char32_t. Because I don’t like typedefs, there is no codepoint_t.

The customization point objects codepoint and grapheme by default treat the View they are given as a sequence of UTF-8 code units. They produce, respectively, a codepoint_view (a View over char32_t codepoints) and a grapheme_view (a View over Views of char32_t codepoints). Error handling is draconian: nothing is produced if the underlying data is not well formed. There may be other transcoding objects that provide replacement characters or callbacks for error handling.
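To make the draconian policy concrete, here is a minimal sketch of an all-or-nothing UTF-8 decoder. The name decode_utf8_strict is hypothetical and not part of the strawman, and the function is eager (it returns a std::vector) rather than a lazy View, purely to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of the strawman: eagerly decodes UTF-8 into
// char32_t codepoints and returns an empty vector if any sequence is ill formed.
inline std::vector<char32_t> decode_utf8_strict(std::string_view in) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const auto b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        char32_t min = 0;  // smallest value allowed for this length (rejects overlongs)
        if (b0 < 0x80)                { cp = b0;        len = 1; min = 0x0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min = 0x10000; }
        else return {};  // stray continuation byte or invalid lead byte
        if (in.size() - i < len) return {};  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const auto b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return {};  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, values past U+10FFFF, and surrogates.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return {};
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

One consequence of "producing nothing" is visible here: an empty result does not distinguish empty input from ill-formed input, which is one reason alternative transcoding objects with replacement or callback policies may be useful.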

A Concept, UnicodeText, which requires that the expressions codepoint(t) and grapheme(t) are well formed for a t of type T const&.

Concepts CodepointView and GraphemeView, which syntactically require that a type is a View of char32_t and a View of Views of char32_t respectively, and which carry the additional semantic requirement that the elements are codepoints and grapheme clusters.

Unicode algorithms accept either a UnicodeText or the more specific Concept they actually require (CodepointView or GraphemeView); the latter can always be produced from the former.

std::text will model UnicodeText.

std::text does not model Range. Access to code units is via text::data(), to codepoints via text::codepoint(), and to extended grapheme clusters (EGCs) via text::grapheme().