Add OpenSearch implementation #107

ElliottKasoar · 2023-09-04T12:22:34Z

OpenSearchDatabase

The focus of this PR is the implementation of the OpenSearchDatabase class in abcd/backends/atoms_opensearch.py, designed to mirror the MongoDatabase class in abcd/backends/atoms_pymongo.py. Where possible, functions should behave equivalently between the two classes, although in at least one case (OpenSearchDatabase.property) a more efficient alternative is provided (OpenSearchDatabase.count_property).

While it would be possible to use OpenSearch in combination with MongoDB (both generally, and as a relatively straightforward extension of this implementation), it seems to make more sense to use OpenSearch as the database itself, since the efficiency of OpenSearch queries comes from processing on ingestion. Once ingested, the data is stored in OpenSearch as JSON documents, so also storing it in MongoDB would duplicate most, if not all, of the data.

Unit tests have also been written, both in mock form, similar to those currently written for MongoDB, and as a more complete set of new tests designed to connect to a live containerised database through GitHub Actions.

Properties

A new class in abcd/backends/atoms_properties.py is designed to read in extra information from a CSV file, as well as infer units and the relevant structure files via a template. Unit tests for this class have also been written.
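As a rough illustration of the idea only, and not the class added in this PR, the sketch below reads a small CSV, infers units from headers, and fills in a structure file path from a template. The "key / unit" header convention, the column names, the template string, and the function names are all assumptions made for this example.

```python
import csv
import io
import re
from string import Template

# Hypothetical CSV content: units are assumed to follow the key after a slash.
CSV_TEXT = """name,energy / eV,pressure / GPa
water,-14.2,0.1
ice,-14.6,0.0
"""

# Matches headers such as "energy / eV" (assumed convention for this sketch).
UNIT_PATTERN = re.compile(r"^(?P<key>.+?)\s*/\s*(?P<unit>\S+)$")


def split_header(header):
    """Split 'energy / eV' into ('energy', 'eV'); return (header, None) if no unit."""
    match = UNIT_PATTERN.match(header.strip())
    if match:
        return match.group("key"), match.group("unit")
    return header.strip(), None


def read_properties(text, struct_template):
    """Return (rows, units) from CSV text, adding a templated struct_file to each row."""
    reader = csv.DictReader(io.StringIO(text))
    columns = [split_header(header) for header in reader.fieldnames]
    units = {key: unit for key, unit in columns if unit is not None}
    rows = []
    for raw in reader:
        # Re-key each row by the unit-free column name; values stay as strings.
        row = {key: raw[header] for header, (key, _) in zip(reader.fieldnames, columns)}
        # Fill the template from the row itself, e.g. ${name} -> "water".
        row["struct_file"] = Template(struct_template).substitute(row)
        rows.append(row)
    return rows, units


rows, units = read_properties(CSV_TEXT, "structures/${name}.xyz")
print(units)
print(rows[0])
```

Keeping units in a separate mapping, rather than inside each row, means they are stored once per column instead of once per structure.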

Query parsing

OpenSearch queries can be relatively complex to construct, so this PR proposes using Luqum, which allows queries to be written in the Lucene query syntax and parsed into an Elasticsearch/OpenSearch string query.
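For orientation, the sketch below shows the shape of the request body that a Lucene-syntax string ends up in when sent as an OpenSearch query_string query. It does not use Luqum and is not the code from this PR; the helper name and the field names in the example query (elements, n_atoms) are hypothetical.

```python
import json


def query_string_body(lucene_query, default_field=None):
    """Wrap a Lucene-syntax string in an OpenSearch `query_string` request body."""
    inner = {"query": lucene_query}
    if default_field is not None:
        # Field searched when a term in the query names no field explicitly.
        inner["default_field"] = default_field
    return {"query": {"query_string": inner}}


# Hypothetical field names, used purely for illustration.
body = query_string_body("elements:Fe AND n_atoms:[1 TO 10]")
print(json.dumps(body, indent=2))
```

A dictionary like this could then be passed as the body of a search request; parsing the string first, as Luqum does, additionally allows the query to be validated or transformed before it is sent.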

The parsing in abcd/parsers/extras.py that enables extra information to be added is largely unchanged, although I extended it slightly to allow expressions in the form of Lucene queries (e.g. key:value).
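To make the key:value form concrete, here is a minimal sketch of a parser that accepts either separator. It is not the parser in abcd/parsers/extras.py; the existing key=value form, the type coercion, and all names below are assumptions for illustration.

```python
import re

# Accept either "key:value" (Lucene-style) or "key=value" (assumed existing form).
PAIR = re.compile(r"^\s*(?P<key>\w+)\s*[:=]\s*(?P<value>.+?)\s*$")


def coerce(text):
    """Convert to int or float where possible; otherwise return the unquoted string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text.strip('"')


def parse_extras(expressions):
    """Build a dict of extra information from a list of key/value expressions."""
    result = {}
    for expression in expressions:
        match = PAIR.match(expression)
        if match is None:
            raise ValueError(f"Cannot parse extra info: {expression!r}")
        result[match.group("key")] = coerce(match.group("value"))
    return result


extras = parse_extras(["user:alice", "temperature=300", 'note:"test run"'])
print(extras)
```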

Misc

Note: The initial commits are required for the later OpenSearch commits, but were written as a separate branch, as they focus on implementing poetry for package installation and dependency management, and GitHub Actions for unit testing, as well as fixes to query parsing and pymongo for newer versions of the packages. A separate PR could therefore be made for these changes that are not specific to OpenSearch, if desired. More general changes to legacy code due to the use of flake8 and black could also be separated out, but would be more work to untangle.

To do

Remaining work is documented in more detail here. Of the features that already exist for MongoDB, testing integration with the GUI is perhaps the most significant one still to be worked on. However, a number of new features will also be required for PSDI, including integration with AiiDA and external databases, storage of potentials, and new metadata.

ElliottKasoar · 2024-06-10T19:24:58Z

Note: further changes to be added following merge of ElliottKasoar#31

As discussed with @stenczelt, ideally this will be split into 2-3 PRs (CI + OpenSearch)

codecov · 2024-06-12T14:48:00Z

Codecov Report

Attention: Patch coverage is 78.87029% with 101 lines in your changes missing coverage. Please review.

Please upload report for BASE (master@25a79ff). Learn more about missing BASE report.

Files	Patch %	Lines
abcd/backends/atoms_opensearch.py	87.94%	34 Missing ⚠️
abcd/backends/utils.py	39.62%	32 Missing ⚠️
abcd/frontends/commandline/commands.py	48.14%	14 Missing ⚠️
abcd/backends/atoms_pymongo.py	52.38%	10 Missing ⚠️
abcd/backends/atoms_properties.py	89.85%	7 Missing ⚠️
abcd/frontends/commandline/decorators.py	75.00%	3 Missing ⚠️
abcd/model.py	66.66%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff            @@
##             master     #107   +/-   ##
=========================================
  Coverage          ?   59.29%           
=========================================
  Files             ?       25           
  Lines             ?     1646           
  Branches          ?        0           
=========================================
  Hits              ?      976           
  Misses            ?      670           
  Partials          ?        0


Co-authored-by: Jacob Wilkins <[email protected]>

ElliottKasoar force-pushed the add_opensearch branch 2 times, most recently from 114c10a to 861e685 Compare September 7, 2023 14:58

ElliottKasoar force-pushed the add_opensearch branch 8 times, most recently from 1be979f to 04089d3 Compare June 12, 2024 08:17

ElliottKasoar mentioned this pull request Jun 12, 2024

modernisation: convert test suite to pytest #110

Closed

ElliottKasoar force-pushed the add_opensearch branch 2 times, most recently from ea1907e to fa1ca36 Compare June 12, 2024 14:45

ElliottKasoar force-pushed the add_opensearch branch 9 times, most recently from 8e4036d to 0e30256 Compare June 12, 2024 19:54

ElliottKasoar added 6 commits June 13, 2024 21:17

Create initial GitHub Actions workflow

a382a58

Add OpenSearch dependency

7e4662c

Create initial OpenSearch interface

7d35230

Add OpenSearch insertion and deletion functions

86cf6fa

Add OpenSearch property functions

0f59b23

Add openmock dependency

21bbae2

ElliottKasoar and others added 29 commits June 13, 2024 21:19

Fix CLI code formatting

f0314d3

Apply suggestions from code review

ef08360

Co-authored-by: Jacob Wilkins <[email protected]>

Tidy for flake8

0f47586

Tidy README formatting

8702649

Tidy optional type hints

562d1d4

Add return type hint

7781612

Apply suggestions from code review

932b4e2

Co-authored-by: Jacob Wilkins <[email protected]>

Update abcd/backends/atoms_opensearch.py

7a6affe

Co-authored-by: Jacob Wilkins <[email protected]>

Fix type Optional Union type hints

5abd153

Fix connection type

5356d39

Apply suggestions from code review

9b05307

Co-authored-by: Jacob Wilkins <[email protected]>

Fix renaming keys

c546d6d

Tidy setting db

3dddc38

Tidy code

c77316c

Tidy logs

3164fbc

Tidy code

70449ee

Fix extra info

2a1ef18

Tidy code

3276277

Update mongomock tests for pytest

8da3c11

Update mock opensearch tests for pytest

02afdc9

Update CLI tests for pytest

cd95560

Update property tests for pytest

cf70beb

Update opensearch tests for pytest

b95b0b8

Fix opensearch mock tests

84637db

Fix opensearch test

be2d64b

Tidy code

db3c36a

Fix opensearch test

74d24c7

Fix mock opensearch tests

3ba579d

Fix histogram query

39086be

ElliottKasoar force-pushed the add_opensearch branch from 03212eb to 39086be Compare June 13, 2024 19:27
