Merge branch 'main' into AK-Contributions-Emoji-Functionality
andrewtavis authored Oct 24, 2024
2 parents 8066f2e + 52c8363 commit e3e6870
Showing 405 changed files with 6,359 additions and 4,872 deletions.
32 changes: 32 additions & 0 deletions .github/ISSUE_TEMPLATE/documentation.yml
@@ -0,0 +1,32 @@
name: 📝 Documentation
description: Suggest improvements or updates to the documentation of Scribe-Data.
labels: ["documentation"]
projects: ["scribe-org/1"]
body:
  - type: checkboxes
    id: doc-enhancement
    attributes:
      label: Terms
      options:
        - label: I have searched all [open documentation issues](https://github.com/scribe-org/Scribe-Data/issues?q=is%3Aopen+is%3Aissue+label%3Adocumentation)
          required: true
        - label: I agree to follow Scribe-Data's [Code of Conduct](https://github.com/scribe-org/Scribe-Data/blob/main/.github/CODE_OF_CONDUCT.md)
          required: true
  - type: textarea
    attributes:
      label: Current Documentation
      placeholder: |
        Provide a brief description or link to the current documentation you want to enhance.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Suggested Enhancement
      placeholder: |
        Describe the improvements or changes you'd like to see in the documentation.
    validations:
      required: true
  - type: markdown
    attributes:
      value: |
        Thanks for helping improve our documentation!
46 changes: 46 additions & 0 deletions .github/workflows/check_query_forms.yaml
@@ -0,0 +1,46 @@
name: Check Query Forms
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
    types: [opened, reopened, synchronize]

jobs:
  format_check:
    strategy:
      fail-fast: false
      matrix:
        os:
          - ubuntu-latest
        python-version:
          - "3.9"

    runs-on: ${{ matrix.os }}

    name: Run Check Query Forms

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Add project root to PYTHONPATH
        run: echo "PYTHONPATH=$(pwd)/src" >> $GITHUB_ENV

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Run check_query_forms.py
        working-directory: ./src/scribe_data/check
        run: python check_query_forms.py

      - name: Post-run status
        if: failure()
        run: echo "Project SPARQL query forms check failed. Please fix the reported errors."
6 changes: 3 additions & 3 deletions CHANGELOG.md
@@ -16,9 +16,9 @@ Emojis for the following are chosen based on [gitmoji](https://gitmoji.dev/).

- Scribe-Data is now a fully functional CLI.
- Querying Wikidata lexicographical data can be done via the `--query` command ([#159](https://github.com/scribe-org/Scribe-Data/issues/159)).
- The output type of queries can be in JSON, CSV, TSV and SQLite, with conversions output types also being possible ([#145](https://github.com/scribe-org/Scribe-Data/issues/145), [#146](https://github.com/scribe-org/Scribe-Data/issues/146))
- Output paths can be set for query results ([#144](https://github.com/scribe-org/Scribe-Data/issues/144)).
- The version of the CLI can be printed to the command line and the CLI can further be used to upgrade itself ([#186](https://github.com/scribe-org/Scribe-Data/issues/186), [#157 ](https://github.com/scribe-org/Scribe-Data/issues/157)).
- The output type of queries can be in JSON, CSV, TSV and SQLite, with conversions output types also being possible ([#145](https://github.com/scribe-org/Scribe-Data/issues/145), [#146](https://github.com/scribe-org/Scribe-Data/issues/146))
- Output paths can be set for query results ([#144](https://github.com/scribe-org/Scribe-Data/issues/144)).
- The version of the CLI can be printed to the command line and the CLI can further be used to upgrade itself ([#186](https://github.com/scribe-org/Scribe-Data/issues/186), [#157 ](https://github.com/scribe-org/Scribe-Data/issues/157)).
- Total Wikidata lexemes for languages and data types can be derived with the `--total` command ([#147](https://github.com/scribe-org/Scribe-Data/issues/147)).
- Commands can be used via an interactive mode with the `--interactive` command ([#158](https://github.com/scribe-org/Scribe-Data/issues/158)).
- Articles are removed from machine translations so they're more directly useful in Scribe applications ([#96](https://github.com/scribe-org/Scribe-Data/issues/96)).
4 changes: 2 additions & 2 deletions README.md
@@ -41,7 +41,7 @@ Check out Scribe's [architecture diagrams](https://github.com/scribe-org/Organiz

The CLI commands defined within [scribe_data/cli](https://github.com/scribe-org/Scribe-Data/blob/main/src/scribe_data/cli) and the notebooks within the various [scribe_data](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data) directories are used to update all data for [Scribe-iOS](https://github.com/scribe-org/Scribe-iOS), with this functionality later being expanded to update [Scribe-Android](https://github.com/scribe-org/Scribe-Android) and [Scribe-Desktop](https://github.com/scribe-org/Scribe-Desktop) once they're active.

The main data update process in triggers [language based SPARQL queries](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/language_data_extraction) to query language data from [Wikidata](https://www.wikidata.org/) using [SPARQLWrapper](https://github.com/RDFLib/sparqlwrapper) as a URI. The autosuggestion process derives popular words from [Wikipedia](https://www.wikipedia.org/) as well as those words that normally follow them for an effective baseline feature until natural language processing methods are employed. Functions to generate autosuggestions are ran in [gen_autosuggestions.ipynb](https://github.com/scribe-org/Scribe-Data/blob/main/src/scribe_data/wikipedia/gen_autosuggestions.ipynb). Emojis are further sourced from [Unicode CLDR](https://github.com/unicode-org/cldr), with this process being ran via the `scribe-data get -lang LANGUAGE -dt emoji-keywords` command.
The main data update process in triggers [language based SPARQL queries](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/wikidata/language_data_extraction) to query language data from [Wikidata](https://www.wikidata.org/) using [SPARQLWrapper](https://github.com/RDFLib/sparqlwrapper) as a URI. The autosuggestion process derives popular words from [Wikipedia](https://www.wikipedia.org/) as well as those words that normally follow them for an effective baseline feature until natural language processing methods are employed. Functions to generate autosuggestions are ran in [gen_autosuggestions.ipynb](https://github.com/scribe-org/Scribe-Data/blob/main/src/scribe_data/wikipedia/gen_autosuggestions.ipynb). Emojis are further sourced from [Unicode CLDR](https://github.com/unicode-org/cldr), with this process being ran via the `scribe-data get -lang LANGUAGE -dt emoji-keywords` command.

<a id="cli-usage"></a>

@@ -197,7 +197,7 @@ See the [contribution guidelines](https://github.com/scribe-org/Scribe-Data/blob

# Supported Languages [`⇧`](#contents)

Scribe's goal is functional, feature-rich keyboards and interfaces for all languages. Check the [language_data_extraction](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/language_data_extraction) directory for queries for currently supported languages and those that have substantial data on [Wikidata](https://www.wikidata.org/).
Scribe's goal is functional, feature-rich keyboards and interfaces for all languages. Check the [language_data_extraction](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/wikidata/language_data_extraction) directory for queries for currently supported languages and those that have substantial data on [Wikidata](https://www.wikidata.org/).
The following table shows the supported languages and the amount of data available for each on [Wikidata](https://www.wikidata.org/) and via [Unicode CLDR](https://github.com/unicode-org/cldr) for emojis:
2 changes: 1 addition & 1 deletion docs/source/_static/CONTRIBUTING.rst
@@ -16,7 +16,7 @@ Contents
- `First steps as a contributor <#first-steps-as-a-contributor>`__
- `Learning the tech stack <#learning-the-tech-stack>`__
- `Development environment <#development-environment>`__
- `Issues and projects <#issues-projects>`__
- `Issues and projects <#issues-and-projects>`__
- `Bug reports <#bug-reports>`__
- `Feature requests <#feature-requests>`__
- `Pull requests <#pull-requests>`__
4 changes: 2 additions & 2 deletions docs/source/notes.rst
@@ -1,9 +1,9 @@
.. mdinclude:: _static/CONTRIBUTING.rst
.. include:: _static/CONTRIBUTING.rst

License
=======

.. literalinclude:: ../../LICENSE.txt
:language: text

.. mdinclude:: ../../CHANGELOG.md
.. include:: ../../CHANGELOG.md
36 changes: 19 additions & 17 deletions docs/source/scribe_data/cli.rst
@@ -56,20 +56,22 @@ Example output:
$ scribe-data list
Language ISO QID
-----------------------
==========================
English en Q1860
...
-----------------------
Available data types: All languages
-----------------------------------
===================================
adjectives
adverbs
emoji-keywords
nouns
personal-pronouns
postpositions
prepositions
proper-nouns
verbs
-----------------------------------
@@ -78,46 +80,48 @@ Example output:
$scribe-data list --language
Language ISO QID
-----------------------
==========================
English en Q1860
...
-----------------------
.. code-block:: text
$scribe-data list -dt
Available data types: All languages
-----------------------------------
===================================
adjectives
adverbs
emoji-keywords
nouns
personal-pronouns
postpositions
prepositions
proper-nouns
verbs
-----------------------------------
.. code-block:: text
$scribe-data list -a
Language ISO QID
-----------------------
==========================
English en Q1860
...
-----------------------
Available data types: All languages
-----------------------------------
===================================
adjectives
adverbs
emoji-keywords
nouns
personal-pronouns
postpositions
prepositions
proper-nouns
verbs
-----------------------------------
Get Command
~~~~~~~~~~~
@@ -137,6 +141,7 @@ Options:
- ``-dt, --data-type DATA_TYPE``: The data type(s) to get.
- ``-od, --output-dir OUTPUT_DIR``: The output directory path for results.
- ``-ot, --output-type {json,csv,tsv}``: The output file type.
- ``-ope, --outputs-per-entry OUTPUTS_PER_ENTRY``: How many outputs should be generated per data entry.
- ``-o, --overwrite``: Whether to overwrite existing files (default: False).
- ``-a, --all ALL``: Get all languages and data types.
- ``-i, --interactive``: Run in interactive mode.
@@ -257,7 +262,7 @@ Examples:
.. code-block:: text
$scribe-data total -lang English -dt nouns
$scribe-data total -lang English -dt nouns # verbs, adjectives, etc
Language: English
Data type: nouns
Total number of lexemes: 12345
@@ -278,7 +283,4 @@ Options:

- ``-f, --file FILE``: The file to convert to a new type.
- ``-ko, --keep-original``: Whether to keep the file to be converted (default: True).
- ``-json, --to-json TO_JSON``: Convert the file to JSON format.
- ``-csv, --to-csv TO_CSV``: Convert the file to CSV format.
- ``-tsv, --to-tsv TO_TSV``: Convert the file to TSV format.
- ``-sqlite, --to-sqlite TO_SQLITE``: Convert the file to SQLite format.
- ``-ot, --output-type {json,csv,tsv,sqlite}``: The output file type.
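
As a rough illustration of the option change in this hunk (the four `--to-*` flags folded into a single `-ot, --output-type` option), a minimal `argparse` sketch might look like the following. The `build_convert_parser` helper, the `prog` string, and the example file name are assumptions made for illustration; this is not Scribe-Data's actual CLI code.

```python
import argparse


def build_convert_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring two of the documented convert options."""
    parser = argparse.ArgumentParser(prog="scribe-data convert")
    parser.add_argument("-f", "--file", help="The file to convert to a new type.")
    # One option with a fixed set of choices instead of one flag per format.
    parser.add_argument(
        "-ot",
        "--output-type",
        choices=["json", "csv", "tsv", "sqlite"],
        help="The output file type.",
    )
    return parser


# "nouns.json" is a made-up file name used only for this example.
args = build_convert_parser().parse_args(["-f", "nouns.json", "-ot", "csv"])
print(args.file, args.output_type)  # prints "nouns.json csv"
```

With `choices`, an unsupported value such as `-ot xml` is rejected by the parser itself, so the conversion code would not need its own format validation.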
1 change: 0 additions & 1 deletion docs/source/scribe_data/index.rst
@@ -6,7 +6,6 @@ Scribe-Data
.. toctree::
:maxdepth: 2

language_data_extraction/index
load/index
unicode/index
wikidata/index
1 change: 1 addition & 0 deletions docs/source/scribe_data/wikidata/index.rst
@@ -7,6 +7,7 @@ wikidata/
:maxdepth: 2

check_query/index
language_data_extraction/index

.. toctree::
:maxdepth: 1
@@ -1,7 +1,7 @@
language_data_extraction/
=========================

`View code on Github <https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/language_data_extraction>`_
`View code on Github <https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/wikidata/language_data_extraction>`_

This directory contains all language extraction and formatting code for Scribe-Data. The structure is broken down by language, with each language sub-directory then including directories for nouns, prepositions, translations and verbs if needed. Within these data type directories are :code:`query_DATA_TYPE.sparql` SPARQL files that are ran to query Wikidata and then formatted with the given :code:`format_DATA_TYPE.py` Python files.
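
To make the naming convention described above concrete, here is a small sketch that builds the query and formatter paths for one language and data type. The `expected_files` helper is hypothetical, and the lowercase `english` directory name is an assumption; only the `query_DATA_TYPE.sparql` and `format_DATA_TYPE.py` patterns and the directory root come from the text.

```python
from pathlib import PurePosixPath


def expected_files(root: str, language: str, data_type: str) -> tuple:
    """Return the (query, formatter) paths implied by the naming convention."""
    data_type_dir = PurePosixPath(root) / language / data_type
    return (
        data_type_dir / f"query_{data_type}.sparql",
        data_type_dir / f"format_{data_type}.py",
    )


# The language directory name is illustrative; actual casing may differ.
query_path, format_path = expected_files(
    "src/scribe_data/wikidata/language_data_extraction", "english", "nouns"
)
print(query_path)
print(format_path)
```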

3 changes: 1 addition & 2 deletions docs/source/scribe_data/wikidata/query_profanity.rst
@@ -24,8 +24,7 @@ Queries all profane words from a given language to be removed from autosuggest o
}.
FILTER EXISTS {?sense wdt:P6191 ?filter.}.
}
}
ORDER BY
lcase(?lemma)
3 changes: 0 additions & 3 deletions docs/source/scribe_data/wikipedia/gen_autosuggestions.rst
@@ -3,9 +3,6 @@ gen_autosuggestions.ipynb

`View code on Github <https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/wikipedia/gen_autosuggestions.ipynb>`_

Scribe Autosuggest Generation
-----------------------------

This notebook is used to run the functions found in Scribe-Data to extract, clean and load autosuggestion files into Scribe apps.

Use the :code:`View code on GitHub` link above to view the notebook and explore the process!
6 changes: 3 additions & 3 deletions src/scribe_data/check/check_project_structure.py
@@ -26,17 +26,17 @@

import os

from scribe_data.cli.cli_utils import (
from scribe_data.utils import (
    LANGUAGE_DATA_EXTRACTION_DIR,
    data_type_metadata,
    language_metadata,
)

# Expected languages and data types.
LANGUAGES = [lang.capitalize() for lang in language_metadata.keys()]
LANGUAGES = list(language_metadata.keys())
DATA_TYPES = data_type_metadata.keys()
SUB_DIRECTORIES = {
    k.capitalize(): [lang.capitalize() for lang in v["sub_languages"].keys()]
    k: list(v["sub_languages"].keys())
    for k, v in language_metadata.items()
    if len(v.keys()) == 1 and "sub_languages" in v.keys()
}
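
The non-capitalizing forms of `LANGUAGES` and `SUB_DIRECTORIES` in this hunk can be tried in isolation with a stand-in mapping. The `language_metadata` dictionary below is a hypothetical sample, not the data shipped in `scribe_data.utils`; it only assumes that a language with sub-languages is stored as a dictionary whose single key is `sub_languages`, which is what the filter condition implies.

```python
# Hypothetical stand-in for scribe_data.utils.language_metadata; the real
# mapping is provided by the package and its contents may differ.
language_metadata = {
    "english": {"iso": "en", "qid": "Q1860"},
    "norwegian": {
        "sub_languages": {
            "bokmål": {"iso": "nb"},
            "nynorsk": {"iso": "nn"},
        }
    },
}

# Keys are used as-is, without .capitalize().
LANGUAGES = list(language_metadata.keys())

# Only entries whose sole key is "sub_languages" contribute sub-directories.
SUB_DIRECTORIES = {
    k: list(v["sub_languages"].keys())
    for k, v in language_metadata.items()
    if len(v.keys()) == 1 and "sub_languages" in v.keys()
}

print(LANGUAGES)
print(SUB_DIRECTORIES)
```

Under this sample, `english` is skipped by the filter because its entry has more than one key and no `sub_languages` key, while `norwegian` maps to its two sub-language names.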