Unicode data and manipulation.
Required: pandas, pudzu-utils.
unicode_resources: List the *.txt Unicode resource files present in the package (or a given path).
>> unicode_resources()
['Blocks',
'DerivedCoreProperties',
'PropList',
'ScriptExtensions',
'Scripts',
'UnicodeData',
'emoji-data']
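As a rough illustration of what such a listing involves (a minimal sketch, not the package's actual implementation; `list_txt_stems` is a hypothetical name), the resource names can be taken from the *.txt files in a directory using only the standard library:

```python
from pathlib import Path

def list_txt_stems(path):
    """Return the sorted names (without the .txt suffix) of the *.txt files in `path`."""
    # Hypothetical helper for illustration only; the packaged function may differ.
    return sorted(p.stem for p in Path(path).glob("*.txt"))
```

Calling it on a directory that holds the Unicode resource files should give a list in the same shape as the one above.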
unicode_data: Extract the core Unicode data and the specified properties, either from the packaged files or from a given path. The data is extracted afresh on every call, so for repeated use it may make sense to save the result.
>> unicode_data(("Blocks", "Scripts"))
[10:53:32] unicode:INFO - Extracting UnicodeData.txt from package...
[10:53:33] unicode:INFO - Extracting Blocks.txt from package...
[10:53:36] unicode:INFO - Extracting Scripts.txt from package...
                          Name  General_Category  Canonical_Combining_Class  ...  Code_Point                          Blocks    Scripts
0                    <control>                Cc                          0  ...        0000                     Basic Latin     Common
1                    <control>                Cc                          0  ...        0001                     Basic Latin     Common
2                    <control>                Cc                          0  ...        0002                     Basic Latin     Common
3                    <control>                Cc                          0  ...        0003                     Basic Latin     Common
4                    <control>                Cc                          0  ...        0004                     Basic Latin     Common
...                        ...               ...                        ...  ...         ...                             ...        ...
917995  VARIATION SELECTOR-252                Mn                          0  ...       e01eb  Variation Selectors Supplement  Inherited
917996  VARIATION SELECTOR-253                Mn                          0  ...       e01ec  Variation Selectors Supplement  Inherited
917997  VARIATION SELECTOR-254                Mn                          0  ...       e01ed  Variation Selectors Supplement  Inherited
917998  VARIATION SELECTOR-255                Mn                          0  ...       e01ee  Variation Selectors Supplement  Inherited
917999  VARIATION SELECTOR-256                Mn                          0  ...       e01ef  Variation Selectors Supplement  Inherited
[137994 rows x 15 columns]
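Since each call extracts the data again, one way to save the result for repeated use is a small on-disk cache. The following is a minimal sketch under stated assumptions: `cached` is a hypothetical helper, not part of the package, and it relies on the result being picklable (pandas DataFrames are).

```python
import pickle
from pathlib import Path

def cached(loader, cache_path):
    """Return loader()'s result, reusing a pickle at `cache_path` if one exists."""
    # Hypothetical helper for illustration only; not part of the package.
    cache_path = Path(cache_path)
    if cache_path.exists():
        # A previous call already saved the result: load it instead of re-extracting.
        with cache_path.open("rb") as f:
            return pickle.load(f)
    result = loader()
    with cache_path.open("wb") as f:
        pickle.dump(result, f)
    return result

# Assumed usage, with unicode_data imported from wherever this module lives:
# df = cached(lambda: unicode_data(("Blocks", "Scripts")), "unicode_data.pkl")
```

The cache is keyed only by file name, so it should be deleted or renamed when a different set of properties or a different data path is requested. Pickle files should only be loaded from trusted sources.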
Includes Unicode 13.0.0 data from https://www.unicode.org/Public/13.0.0/ucd/. For terms of use, see http://www.unicode.org/terms_of_use.html.