Merge branch 'scribe-org:main' into AK-Latin

scribe-org · Oct 23, 2024 · febc71e
2 parents 35a1c18 + 399efe2

Showing 174 changed files with 3,555 additions and 2,206 deletions.
diff --git a/.github/workflows/check_query_forms.yaml b/.github/workflows/check_query_forms.yaml
@@ -0,0 +1,46 @@
+name: Check Query Forms
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+    types: [opened, reopened, synchronize]
+
+jobs:
+  format_check:
+    strategy:
+      fail-fast: false
+      matrix:
+        os:
+          - ubuntu-latest
+        python-version:
+          - "3.9"
+
+    runs-on: ${{ matrix.os }}
+
+    name: Run Check Query Forms
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Add project root to PYTHONPATH
+        run: echo "PYTHONPATH=$(pwd)/src" >> $GITHUB_ENV
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -r requirements.txt
+
+      - name: Run check_query_forms.py
+        working-directory: ./src/scribe_data/check
+        run: python check_query_forms.py
+
+      - name: Post-run status
+        if: failure()
+        run: echo "Project SPARQL query forms check failed. Please fix the reported errors."
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -16,9 +16,9 @@ Emojis for the following are chosen based on [gitmoji](https://gitmoji.dev/).
 
 - Scribe-Data is now a fully functional CLI.
   - Querying Wikidata lexicographical data can be done via the `--query` command ([#159](https://github.com/scribe-org/Scribe-Data/issues/159)).
-    - The output type of queries can be in JSON, CSV, TSV and SQLite, with conversions output types also being possible ([#145](https://github.com/scribe-org/Scribe-Data/issues/145), [#146](https://github.com/scribe-org/Scribe-Data/issues/146))
-    - Output paths can be set for query results ([#144](https://github.com/scribe-org/Scribe-Data/issues/144)).
-    - The version of the CLI can be printed to the command line and the CLI can further be used to upgrade itself ([#186](https://github.com/scribe-org/Scribe-Data/issues/186), [#157 ](https://github.com/scribe-org/Scribe-Data/issues/157)).
+  - The output type of queries can be in JSON, CSV, TSV and SQLite, with conversions output types also being possible ([#145](https://github.com/scribe-org/Scribe-Data/issues/145), [#146](https://github.com/scribe-org/Scribe-Data/issues/146))
+  - Output paths can be set for query results ([#144](https://github.com/scribe-org/Scribe-Data/issues/144)).
+  - The version of the CLI can be printed to the command line and the CLI can further be used to upgrade itself ([#186](https://github.com/scribe-org/Scribe-Data/issues/186), [#157 ](https://github.com/scribe-org/Scribe-Data/issues/157)).
   - Total Wikidata lexemes for languages and data types can be derived with the `--total` command ([#147](https://github.com/scribe-org/Scribe-Data/issues/147)).
   - Commands can be used via an interactive mode with the `--interactive` command ([#158](https://github.com/scribe-org/Scribe-Data/issues/158)).
 - Articles are removed from machine translations so they're more directly useful in Scribe applications ([#96](https://github.com/scribe-org/Scribe-Data/issues/96)).

diff --git a/docs/source/_static/CONTRIBUTING.rst b/docs/source/_static/CONTRIBUTING.rst
@@ -16,7 +16,7 @@ Contents
 -  `First steps as a contributor <#first-steps-as-a-contributor>`__
 -  `Learning the tech stack <#learning-the-tech-stack>`__
 -  `Development environment <#development-environment>`__
--  `Issues and projects <#issues-projects>`__
+-  `Issues and projects <#issues-and-projects>`__
 -  `Bug reports <#bug-reports>`__
 -  `Feature requests <#feature-requests>`__
 -  `Pull requests <#pull-requests>`__

diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -40,8 +40,11 @@
     "numpydoc",
     "sphinx.ext.viewcode",
     "sphinx.ext.imgmath",
+    "nbsphinx",
 ]
 
+nbsphinx_allow_errors = True
+nbsphinx_execute = "never"
 numpydoc_show_inherited_class_members = False
 numpydoc_show_class_members = False
 

diff --git a/docs/source/notes.rst b/docs/source/notes.rst
@@ -1,9 +1,9 @@
-.. mdinclude:: _static/CONTRIBUTING.rst
+.. include:: _static/CONTRIBUTING.rst
 
 License
 =======
 
 .. literalinclude:: ../../LICENSE.txt
     :language: text
 
-.. mdinclude:: ../../CHANGELOG.md
+.. include:: ../../CHANGELOG.md
diff --git a/docs/source/scribe_data/cli.rst b/docs/source/scribe_data/cli.rst
@@ -67,7 +67,10 @@ Example output:
     adverbs
     emoji-keywords
     nouns
+    personal-pronouns
+    postpositions
     prepositions
+    proper-nouns
     verbs
     -----------------------------------
 
@@ -94,7 +97,10 @@ Example output:
     adverbs
     emoji-keywords
     nouns
+    personal-pronouns
+    postpositions
     prepositions
+    proper-nouns
     verbs
     -----------------------------------
 
@@ -115,7 +121,10 @@ Example output:
     adverbs
     emoji-keywords
     nouns
+    personal-pronouns
+    postpositions
     prepositions
+    proper-nouns
     verbs
     -----------------------------------
 
@@ -137,6 +146,7 @@ Options:
 - ``-dt, --data-type DATA_TYPE``: The data type(s) to get.
 - ``-od, --output-dir OUTPUT_DIR``: The output directory path for results.
 - ``-ot, --output-type {json,csv,tsv}``: The output file type.
+- ``-ope, --outputs-per-entry OUTPUTS_PER_ENTRY``: How many outputs should be generated per data entry.
 - ``-o, --overwrite``: Whether to overwrite existing files (default: False).
 - ``-a, --all ALL``: Get all languages and data types.
 - ``-i, --interactive``: Run in interactive mode.
@@ -257,7 +267,7 @@ Examples:
 
 .. code-block:: text
 
-    $scribe-data total -lang English -dt nouns
+    $scribe-data total -lang English -dt nouns  # verbs, adjectives, etc
     Language: English
     Data type: nouns
     Total number of lexemes: 12345
@@ -278,7 +288,4 @@ Options:
 
 - ``-f, --file FILE``: The file to convert to a new type.
 - ``-ko, --keep-original``: Whether to keep the file to be converted (default: True).
-- ``-json, --to-json TO_JSON``: Convert the file to JSON format.
-- ``-csv, --to-csv TO_CSV``: Convert the file to CSV format.
-- ``-tsv, --to-tsv TO_TSV``: Convert the file to TSV format.
-- ``-sqlite, --to-sqlite TO_SQLITE``: Convert the file to SQLite format.
+- ``-ot, --output-type {json,csv,tsv,sqlite}``: The output file type.
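Note on the `convert` options hunk above: the four per-format flags (`--to-json`, `--to-csv`, `--to-tsv`, `--to-sqlite`) are replaced in the docs by a single `--output-type` option with a fixed set of choices. A minimal sketch of how such an option could be declared with `argparse` follows. This is an assumption for illustration only: `build_convert_parser` is a hypothetical name, and the project's real CLI parser is not shown in this diff and may be structured differently.

```python
import argparse


def build_convert_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring two of the documented `convert` options.

    Not the project's actual implementation; only the option names and the
    choice list are taken from the documentation hunk above.
    """
    parser = argparse.ArgumentParser(prog="scribe-data convert")
    parser.add_argument(
        "-f",
        "--file",
        help="The file to convert to a new type.",
    )
    parser.add_argument(
        "-ot",
        "--output-type",
        choices=["json", "csv", "tsv", "sqlite"],
        help="The output file type.",
    )
    return parser


if __name__ == "__main__":
    # Example invocation with a made-up input file name.
    args = build_convert_parser().parse_args(["-f", "nouns.json", "-ot", "csv"])
    print(args.file, args.output_type)
```

With a single option and `choices`, an unsupported format is rejected by the parser itself rather than by a separate check for each flag.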
diff --git a/docs/source/scribe_data/wikidata/query_profanity.rst b/docs/source/scribe_data/wikidata/query_profanity.rst
@@ -24,8 +24,7 @@ Queries all profane words from a given language to be removed from autosuggest o
         }.
 
         FILTER EXISTS {?sense wdt:P6191 ?filter.}.
-
-        }
+    }
 
     ORDER BY
         lcase(?lemma)

diff --git a/docs/source/scribe_data/wikipedia/gen_autosuggestions.rst b/docs/source/scribe_data/wikipedia/gen_autosuggestions.rst
@@ -3,9 +3,10 @@ gen_autosuggestions.ipynb
 
 `View code on Github <https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/wikipedia/gen_autosuggestions.ipynb>`_
 
-Scribe Autosuggest Generation
------------------------------
-
 This notebook is used to run the functions found in Scribe-Data to extract, clean and load autosuggestion files into Scribe apps.
 
+.. toctree::
+
+   notebook.ipynb
+
 Use the :code:`View code on GitHub` link above to view the notebook and explore the process!