Add OpenSearch implementation #107
Note: further changes will be added following the merge of ElliottKasoar#31. As discussed with @stenczelt, ideally this will be split into 2-3 PRs (CI + OpenSearch).
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #107 +/- ##
=========================================
Coverage ? 59.29%
=========================================
Files ? 25
Lines ? 1646
Branches ? 0
=========================================
Hits ? 976
Misses ? 670
Partials ? 0
Co-authored-by: Jacob Wilkins <[email protected]>
OpenSearchDatabase
The focus of this PR is the implementation of the OpenSearchDatabase class in abcd/backends/atoms_opensearch.py, designed to mirror the MongoDatabase class in abcd/backends/atoms_pymongo.py. Where possible, functions should behave equivalently between the two classes, although in at least one case (OpenSearchDatabase.property), a more efficient alternative is provided (OpenSearchDatabase.count_property).
While it would be possible to use OpenSearch in combination with MongoDB (both generally, and as a relatively straightforward extension of this implementation), it seems to make more sense to use OpenSearch as the database itself, because the efficiency of OpenSearch queries comes from processing at ingestion. Once ingested, the data is stored in OpenSearch as JSON documents, so also storing it in MongoDB would duplicate most, if not all, of the data.
Unit tests have also been written, both in mock form, similar to those currently written for MongoDB, and as a more complete set of new tests designed to connect to a live containerised database through GitHub Actions.
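To illustrate why a server-side count can be cheaper than retrieving every value of a property, the sketch below builds an OpenSearch terms aggregation request body and reads the counts back from a response. It is not the code of OpenSearchDatabase.count_property; the function names, the aggregation name value_counts, and the bucket limit are hypothetical, and it assumes the field is mapped as a keyword or numeric type.

```python
from typing import Any, Dict


def build_count_body(field: str, max_buckets: int = 100) -> Dict[str, Any]:
    """Build a search body that counts documents per distinct value of `field`.

    Hypothetical helper, not part of the PR.
    """
    return {
        "size": 0,  # return no documents, only the aggregation results
        "aggs": {
            "value_counts": {
                "terms": {"field": field, "size": max_buckets},
            }
        },
    }


def counts_from_response(response: Dict[str, Any]) -> Dict[Any, int]:
    """Convert the terms-aggregation buckets of a response into {value: count}."""
    buckets = response["aggregations"]["value_counts"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

The body would be passed to a search call on the relevant index, so only one small response crosses the network instead of one value per document.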
Properties
A new class in abcd/backends/atoms_properties.py is designed to read in extra information from a CSV file, as well as infer units and the relevant structure files via a template. Unit tests for this class have also been written.
Query parsing
OpenSearch queries can be relatively complex to construct, so this PR proposes the use of Luqum, which allows queries to be written using the Lucene Query DSL and parsed into an Elastic/OpenSearch string query.
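For context, a Lucene-syntax string can be handed to OpenSearch through its query_string query type. The sketch below builds such a request body using only the standard library. It does not use Luqum and is not the parsing code in this PR; the function name lucene_to_search_body and the field names in the usage example are hypothetical.

```python
from typing import Any, Dict


def lucene_to_search_body(query: str, default_field: str = "*") -> Dict[str, Any]:
    """Wrap a Lucene-syntax query string in an OpenSearch search body.

    Hypothetical helper: an empty query falls back to matching all documents.
    """
    query = query.strip()
    if not query:
        return {"query": {"match_all": {}}}
    return {
        "query": {
            "query_string": {
                "query": query,  # e.g. "key:value"
                "default_field": default_field,  # used for terms without a field prefix
            }
        }
    }


body = lucene_to_search_body("n_atoms:[1 TO 10] AND formula:H2O")
```

A parser such as Luqum adds value on top of this by validating and transforming the query tree before it is sent, rather than passing the string through unchecked.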
Parsing to enable extra information to be added in abcd/parsers/extras.py is largely unchanged, although I extended it slightly to allow expressions in the form of Lucene queries (e.g. key:value).
Misc
Note: The initial commits are required for later OpenSearch commits, but were written as a separate branch, as they focus on implementing poetry for package installation and dependency management, and GitHub Actions for unit testing, as well as fixes to query parsing and pymongo for newer versions of the packages. A separate PR could, therefore, be made for these non-OpenSearch oriented changes, if desired. More general changes to legacy code due to the use of flake8 and black could also be separated out, but would be more work to untangle.
To do
Remaining work is documented in more detail here. Of this, testing integration with the GUI is perhaps the most significant feature that already exists for MongoDB but still needs work for OpenSearch. A number of new features will also be required for PSDI, including integration with AiiDA and external databases, storage of potentials, and new metadata.