This is an ongoing collaborative work.
† Mainly contributed by Minghao
‡ Mainly contributed by Randy
- dict_celebrity †
  - Hong Kong celebrities.
- dict_institution †
- dict_location_hk †
  - Locations related to Hong Kong. They do not have to be physically in Hong Kong.
- dict_politics ‡
- dict_covid19_vaccine †
- dict_covid19 †
- dict_general_en ‡ †
- dict_general_zh ‡ †
- ad_words †
  - Summarized keywords related to advertising and promotional content that appears frequently in Hong Kong public media.
  - Keywords are in regular expression format.
- stopwords_canto ‡ †
  - Stopwords in Cantonese.
- stopwords_en ‡ †
  - Stopwords in English.
- stopwords_simple †
  - A simplified list of Cantonese stopwords. Numbers are included.
- †
  - A script that works around a problem when loading a user-defined dictionary with jieba (jieba3k 0.35.1).
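
Since the `ad_words` keywords are regular expressions and the stopword lists are plain word lists, one way they might be applied is sketched below. This is only an illustration: the sample patterns, the sample stopwords, and the helper names `compile_keywords`, `is_advertisement`, and `remove_stopwords` are all hypothetical and are not taken from this repository.

```python
import re

# Hypothetical sample entries; the real ad_words list is not reproduced here.
AD_PATTERNS = [r"優惠", r"限時", r"\d+\s*折", r"free\s+trial"]

# Hypothetical stopwords standing in for stopwords_canto / stopwords_en.
STOPWORDS = {"嘅", "咗", "the", "a"}


def compile_keywords(patterns):
    """Join regex keywords into one case-insensitive alternation."""
    # Each pattern is wrapped in a non-capturing group so that any
    # alternation inside a single keyword stays local to that keyword.
    return re.compile("|".join(f"(?:{p})" for p in patterns), re.IGNORECASE)


def is_advertisement(text, compiled):
    """Return True if any keyword pattern occurs anywhere in the text."""
    return compiled.search(text) is not None


def remove_stopwords(tokens, stopwords):
    """Drop tokens whose lowercase form is in the stopword set."""
    return [t for t in tokens if t.lower() not in stopwords]


ad_regex = compile_keywords(AD_PATTERNS)
print(is_advertisement("全場貨品 8 折", ad_regex))
print(remove_stopwords(["The", "vaccine", "嘅", "消息"], STOPWORDS))
```

Compiling the keywords into a single pattern keeps matching to one pass per text. If individual keywords need to be reported, compiling them separately would be the simpler choice.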