Bulgarian wordlist 1 by Miglen Georgiev
Version: f46eff1 (2022-04-26)
Source: https://github.com/miglen/bulgarian-wordlists/blob/master/wordlists/bg-words-validated-cyrillic.txt
License: https://github.com/miglen/bulgarian-wordlists/blob/master/LICENSE

Bulgarian wordlist 2 by michmech
Version: 9c91fe4
Source: https://github.com/michmech/lemmatization-lists/blob/master/lemmatization-bg.txt
License: https://github.com/michmech/lemmatization-lists/blob/master/LICENCE

Bulgarian wordlist 3 by chitanka
Source: https://rechnik.chitanka.info/about
GitHub: https://github.com/chitanka/rechko
License: None stated; the site only says "free download", so public domain is assumed.

Also used wooorm's Hunspell-compatible dictionary to determine which words must start with a capital letter.
Link: https://github.com/wooorm/dictionaries/tree/main/dictionaries/bg
Git commit: 13 Apr 2022 [0c78cc810c8aafb2e6f5140bb6dcd4026b247eb8]

Additionally, removed duplicate words and manually added some missing ones.
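The merging, capitalization and de-duplication steps described above could look roughly like the Python sketch below. It is illustrative only and is not the script actually used to build this wordlist. The function names are hypothetical, and the input formats are assumptions: each wordlist is taken as an iterable of words, and the Hunspell .dic file is taken as an optional entry-count line followed by "word/FLAGS" lines.

```python
def parse_hunspell_dic(lines):
    """Return the set of entries in a Hunspell .dic file, with flags stripped."""
    entries = set()
    for i, line in enumerate(lines):
        line = line.strip()
        if not line or (i == 0 and line.isdigit()):
            continue  # skip blank lines and the leading entry-count line
        entries.add(line.split("/", 1)[0])
    return entries


def merge_wordlists(wordlists, dic_entries):
    """Merge wordlists in order, drop duplicates, and fix capitalization.

    A word is capitalized only when the dictionary lists it in a
    non-lowercase form and never in an all-lowercase form (assumption).
    """
    lower_entries = {w for w in dic_entries if w == w.lower()}
    capitalized = {
        w.lower(): w
        for w in dic_entries
        if w != w.lower() and w.lower() not in lower_entries
    }
    seen = set()
    merged = []
    for wordlist in wordlists:
        for word in wordlist:
            word = word.strip()
            if not word:
                continue
            word = capitalized.get(word.lower(), word)
            if word not in seen:  # keep the first occurrence only
                seen.add(word)
                merged.append(word)
    return merged
```

Words added by hand can be passed as one more list in wordlists, so they go through the same capitalization and duplicate checks.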

Word frequencies obtained from the "General" word frequency dictionary by the Department of Computational Linguistics of the Bulgarian Academy of Sciences.
Link: https://dcl.bas.bg/frequency.html