Merge pull request #454 from catreedle/wikidata

move language_data_extraction under wikidata and lowercase languages
scribe-org · Oct 23, 2024 · 180950d
2 parents 399efe2 + 7e0c521
Showing 397 changed files with 127 additions and 1,619 deletions.
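
The bulk of the changed files comes from the directory move and rename named in the commit message. As a rough illustration only, the restructuring could be sketched as below. This is not the script used in the PR. The helper name `relocate_and_lowercase` is hypothetical, and so are the sample directory names `English` and `German`. The sketch also assumes that "lowercase languages" refers to the top-level language directory names.

```python
import shutil
import tempfile
from pathlib import Path


def relocate_and_lowercase(package_root: Path) -> list[str]:
    """Move language_data_extraction/ under wikidata/ and lowercase language dirs.

    Hypothetical helper for illustration; not part of Scribe-Data.
    """
    old_dir = package_root / "language_data_extraction"
    new_dir = package_root / "wikidata" / "language_data_extraction"
    new_dir.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(old_dir), str(new_dir))

    # Only the top-level language directories are renamed in this sketch.
    for child in sorted(new_dir.iterdir()):
        if child.is_dir() and child.name != child.name.lower():
            # Rename in two steps so the change also takes effect on
            # case-insensitive filesystems.
            tmp = child.with_name(child.name + ".tmp")
            child.rename(tmp)
            tmp.rename(new_dir / child.name.lower())

    return sorted(p.name for p in new_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Build a throwaway tree with assumed directory names and run the helper.
    root = Path(tempfile.mkdtemp())
    (root / "language_data_extraction" / "English" / "nouns").mkdir(parents=True)
    (root / "language_data_extraction" / "German" / "verbs").mkdir(parents=True)
    print(relocate_and_lowercase(root))
```

In a real repository the move would normally be done with `git mv` so that history follows the files; the sketch only shows the resulting layout.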
diff --git a/README.md b/README.md
@@ -41,7 +41,7 @@ Check out Scribe's [architecture diagrams](https://github.com/scribe-org/Organiz
 
 The CLI commands defined within [scribe_data/cli](https://github.com/scribe-org/Scribe-Data/blob/main/src/scribe_data/cli) and the notebooks within the various [scribe_data](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data) directories are used to update all data for [Scribe-iOS](https://github.com/scribe-org/Scribe-iOS), with this functionality later being expanded to update [Scribe-Android](https://github.com/scribe-org/Scribe-Android) and [Scribe-Desktop](https://github.com/scribe-org/Scribe-Desktop) once they're active.
 
-The main data update process in triggers [language based SPARQL queries](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/language_data_extraction) to query language data from [Wikidata](https://www.wikidata.org/) using [SPARQLWrapper](https://github.com/RDFLib/sparqlwrapper) as a URI. The autosuggestion process derives popular words from [Wikipedia](https://www.wikipedia.org/) as well as those words that normally follow them for an effective baseline feature until natural language processing methods are employed. Functions to generate autosuggestions are ran in [gen_autosuggestions.ipynb](https://github.com/scribe-org/Scribe-Data/blob/main/src/scribe_data/wikipedia/gen_autosuggestions.ipynb). Emojis are further sourced from [Unicode CLDR](https://github.com/unicode-org/cldr), with this process being ran via the `scribe-data get -lang LANGUAGE -dt emoji-keywords` command.
+The main data update process in triggers [language based SPARQL queries](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/wikidata/language_data_extraction) to query language data from [Wikidata](https://www.wikidata.org/) using [SPARQLWrapper](https://github.com/RDFLib/sparqlwrapper) as a URI. The autosuggestion process derives popular words from [Wikipedia](https://www.wikipedia.org/) as well as those words that normally follow them for an effective baseline feature until natural language processing methods are employed. Functions to generate autosuggestions are ran in [gen_autosuggestions.ipynb](https://github.com/scribe-org/Scribe-Data/blob/main/src/scribe_data/wikipedia/gen_autosuggestions.ipynb). Emojis are further sourced from [Unicode CLDR](https://github.com/unicode-org/cldr), with this process being ran via the `scribe-data get -lang LANGUAGE -dt emoji-keywords` command.
 
 <a id="cli-usage"></a>
 
@@ -197,7 +197,7 @@ See the [contribution guidelines](https://github.com/scribe-org/Scribe-Data/blob
 
 # Supported Languages [`⇧`](#contents)
 
-Scribe's goal is functional, feature-rich keyboards and interfaces for all languages. Check the [language_data_extraction](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/language_data_extraction) directory for queries for currently supported languages and those that have substantial data on [Wikidata](https://www.wikidata.org/).
+Scribe's goal is functional, feature-rich keyboards and interfaces for all languages. Check the [language_data_extraction](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/wikidata/language_data_extraction) directory for queries for currently supported languages and those that have substantial data on [Wikidata](https://www.wikidata.org/).
 
 The following table shows the supported languages and the amount of data available for each on [Wikidata](https://www.wikidata.org/) and via [Unicode CLDR](https://github.com/unicode-org/cldr) for emojis:
 

diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -40,11 +40,8 @@
     "numpydoc",
     "sphinx.ext.viewcode",
     "sphinx.ext.imgmath",
-    "nbsphinx",
 ]
 
-nbsphinx_allow_errors = True
-nbsphinx_execute = "never"
 numpydoc_show_inherited_class_members = False
 numpydoc_show_class_members = False
 

diff --git a/docs/source/scribe_data/index.rst b/docs/source/scribe_data/index.rst
@@ -6,7 +6,6 @@ Scribe-Data
 .. toctree::
     :maxdepth: 2
 
-    language_data_extraction/index
     load/index
     unicode/index
     wikidata/index

diff --git a/docs/source/scribe_data/wikidata/index.rst b/docs/source/scribe_data/wikidata/index.rst
@@ -7,6 +7,7 @@ wikidata/
     :maxdepth: 2
 
     check_query/index
+    language_data_extraction/index
 
 .. toctree::
     :maxdepth: 1

diff --git a/...e_data/language_data_extraction/index.rst b/...kidata/language_data_extraction/index.rst
@@ -1,7 +1,7 @@
 language_data_extraction/
 =========================
 
-`View code on Github <https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/language_data_extraction>`_
+`View code on Github <https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/wikidata/language_data_extraction>`_
 
 This directory contains all language extraction and formatting code for Scribe-Data. The structure is broken down by language, with each language sub-directory then including directories for nouns, prepositions, translations and verbs if needed. Within these data type directories are :code:`query_DATA_TYPE.sparql` SPARQL files that are ran to query Wikidata and then formatted with the given :code:`format_DATA_TYPE.py` Python files.
 

diff --git a/docs/source/scribe_data/wikipedia/gen_autosuggestions.rst b/docs/source/scribe_data/wikipedia/gen_autosuggestions.rst
@@ -5,8 +5,4 @@ gen_autosuggestions.ipynb
 
 This notebook is used to run the functions found in Scribe-Data to extract, clean and load autosuggestion files into Scribe apps.
 
-.. toctree::
-
-   notebook.ipynb
-
 Use the :code:`View code on GitHub` link above to view the notebook and explore the process!
diff --git a/docs/source/scribe_data/wikipedia/notebook.ipynb b/docs/source/scribe_data/wikipedia/notebook.ipynb
diff --git a/requirements.txt b/requirements.txt
@@ -6,7 +6,6 @@ flax>=0.8.2
 iso639-lang>=2.2.3
 m2r2>=0.3.3
 mwparserfromhell>=0.6
-nbsphinx>=0.9.5
 numpydoc>=1.6.0
 packaging>=20.9
 pandas>=1.5.3