Command-line scripts for processing lexicographical data from Wikidata.
Which labels contain spaces?
$ make labels.tsv
$ awk -F $'\t' '$3 ~ / / {print}' labels.tsv
Which properties are used how frequently on lexemes and forms?
$ make properties.tsv
$ awk '{print $2}' properties.tsv | ./histogram
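The `./histogram` script ships with this repository and its output format is not shown here. As a rough sketch of the same idea, assuming it counts identical input lines and lists the most frequent first, a shell function could look like the following. The name `histogram_sketch` and the sample property IDs are made up for illustration.

```shell
# Hypothetical stand-in for ./histogram (an assumption, not the real script):
# count identical lines on stdin and print them, most frequent first.
histogram_sketch() {
  sort | uniq -c | sort -rn
}

# Made-up sample input instead of the second column of properties.tsv
printf 'P5137\nP5185\nP5137\n' | histogram_sketch
```

Each output line holds a count followed by the value, so the property used most often appears at the top.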
Which language codes are used how often?
$ make languages.tsv
The following extended processing requires wikidata-cli to be installed.
Which properties are used how frequently, with property labels?
$ make properties.tsv plabels.tsv
$ awk '{print $2}' properties.tsv | sort | join plabels.tsv - | ./histogram
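Note that `join` expects both inputs to be sorted on the join field, which is why the property IDs pass through `sort` first. The sketch below illustrates that step on made-up data. It assumes the label file has the property ID in its first column and is already sorted by ID; the real layout of plabels.tsv may differ, and `label_properties` is a hypothetical helper name.

```shell
# Hypothetical helper: read property IDs on stdin and attach labels from
# the label file given as $1 (assumed: ID in column 1, sorted by ID).
label_properties() {
  sort | join "$1" -
}

# Made-up sample data standing in for plabels.tsv and properties.tsv
labels=$(mktemp)
printf 'P5137\titem for this sense\nP5185\tgrammatical gender\n' > "$labels"
printf 'P5185\nP5137\nP5137\n' | label_properties "$labels"
rm -f "$labels"
```

Every input ID yields one output line with its label attached, so piping the result into a line counter gives a labelled frequency table.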