Python API for interacting with dserver.
This package offers a class-based asynchronous lookup API within dtool_lookup_api.core.LookupClient, a simple class-less wrapper around it at dtool_lookup_api.asynchronous, and a synchronous interface on top at dtool_lookup_api.synchronous.
Direct imports of utility functions from dtool_lookup_api in the examples below forward to the synchronous API variant.
To install the dtool_lookup_api package, run
pip install dtool_lookup_api
This package depends on a dserver instance to talk to.
The API needs to know the URL of the lookup server:
export DSERVER_URL=https://localhost:5000
You may also need to specify an access token generated on the server:
export DSERVER_TOKEN=$(flask user token testuser)
Instead of specifying the access token directly, it is also possible to provide
export DSERVER_TOKEN_GENERATOR_URL=https://localhost:5001
export DSERVER_USERNAME=my-username
export DSERVER_PASSWORD=my-password
for the API to request a token. This, however, is intended only for testing purposes and is strongly discouraged in a production environment, as your password would reside in clear text within environment variables or the dtool config file.
Our recommended setup is a combination of
export DSERVER_URL=https://localhost:5000
export DSERVER_TOKEN_GENERATOR_URL=https://localhost:5001
set in your environment or dtool configuration. When used interactively, the API then asks for your credentials at the first interaction and caches the provided values for the rest of the session, for example:
In [1]: from dtool_lookup_api import query
   ...: res = query(
   ...:     {
   ...:         'readme.owners.name': {'$regex': '^Testing User$'},
   ...:     }
   ...: )
Authentication URL https://localhost:5001/token username:my-username
Authentication URL https://localhost:5001/token password:
In [2]: res
Out[2]:
[{'base_uri': 'smb://test-share',
  'created_at': 'Sun, 08 Nov 2020 18:38:40 GMT',
  'creator_username': 'jotelha',
  'dtoolcore_version': '3.17.0',
  'frozen_at': 'Wed, 11 Nov 2020 17:20:30 GMT',
  'name': 'simple_test_dataset',
  'tags': [],
  'type': 'dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]
In [3]: from dtool_lookup_api import all
   ...: all()
Out[3]:
[{'base_uri': 'smb://test-share',
  'created_at': 1604860720.736269,
  'creator_username': 'jotelha',
  'frozen_at': 1604921621.719575,
  'name': 'simple_test_dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]
Credential caching and interactive prompting can be turned off with
In [1]: import dtool_lookup_api.core.config
   ...: dtool_lookup_api.core.config.Config.interactive = False
   ...: dtool_lookup_api.core.config.Config.cache = False
In [2]: from dtool_lookup_api import all
   ...: all()
...
RuntimeError: Authentication failed
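The "prompt once, then cache for the session" behaviour, and its interaction with the two Config switches, can be modelled roughly as below. This is a hypothetical sketch. CredentialsSketch and its prompt parameter are illustrative names, not the package's implementation. Only the RuntimeError message is taken from the session above.

```python
class CredentialsSketch:
    """Hypothetical model of interactive credential prompting with session caching."""

    def __init__(self, prompt, interactive=True, cache=True):
        self.prompt = prompt            # callable(label) -> str, e.g. input
        self.interactive = interactive  # plays the role of Config.interactive
        self.cache = cache              # plays the role of Config.cache
        self._cached = None

    def get(self, username=None, password=None):
        # Explicitly configured credentials always take precedence.
        if username is not None and password is not None:
            return username, password
        # Reuse values entered earlier in this session, if caching is enabled.
        if self.cache and self._cached is not None:
            return self._cached
        # Without prompting there is no way to obtain credentials.
        if not self.interactive:
            raise RuntimeError("Authentication failed")
        creds = (self.prompt("username:"), self.prompt("password:"))
        if self.cache:
            self._cached = creds
        return creds
```

For brevity the sketch uses a single prompt callable for both values. In a real terminal, getpass.getpass would be the natural choice for the password so it is not echoed.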
For testing purposes, it is possible to disable SSL certificate validation with
export DSERVER_VERIFY_SSL=false
As usual, these settings may also be specified within the default dtool configuration file, i.e. at ~/.config/dtool/dtool.json:
{
"DSERVER_TOKEN_GENERATOR_URL": "https://localhost:5001/token",
"DSERVER_URL": "https://localhost:5000"
}
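The following sketch shows how a client could resolve such settings. It assumes that an environment variable takes precedence over the same key in dtool.json, which is the convention of dtoolcore's configuration lookup. The exact precedence and the parsing of DSERVER_VERIFY_SSL inside dtool_lookup_api may differ. get_setting and get_verify_ssl are hypothetical helpers.

```python
import json
import os
from pathlib import Path

DEFAULT_CONFIG_PATH = Path.home() / ".config" / "dtool" / "dtool.json"


def get_setting(key, config_path=DEFAULT_CONFIG_PATH, default=None):
    """Look up key in the environment first, then in the dtool JSON config file."""
    if key in os.environ:
        return os.environ[key]
    try:
        with open(config_path) as fh:
            return json.load(fh).get(key, default)
    except FileNotFoundError:
        return default


def get_verify_ssl(config_path=DEFAULT_CONFIG_PATH):
    """Interpret DSERVER_VERIFY_SSL; anything but a 'false'-like value keeps validation on."""
    value = get_setting("DSERVER_VERIFY_SSL", config_path, default=True)
    if isinstance(value, bool):  # JSON config files may store a real boolean
        return value
    return str(value).strip().lower() not in ("false", "0", "no", "off")
```

Defaulting to validation being on means a typo in the variable never silently disables certificate checks.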
To list all registered datasets
In [1]: from dtool_lookup_api import get_datasets
   ...: res = get_datasets()
In [2]: res
Out[2]:
[{'base_uri': 'smb://test-share',
  'created_at': 1604860720.736269,
  'creator_username': 'jotelha',
  'frozen_at': 1604921621.719575,
  'name': 'simple_test_dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]
To look up URIs from a dataset UUID within Python
In [1]: from dtool_lookup_api import get_datasets_by_uuid
   ...: uuid = "1a1f9fad-8589-413e-9602-5bbd66bfe675"
   ...: res = get_datasets_by_uuid(uuid)
In [2]: res
Out[2]:
[{'base_uri': 'smb://test-share',
  'created_at': 1604860720.736269,
  'creator_username': 'jotelha',
  'frozen_at': 1604921621.719575,
  'name': 'simple_test_dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]
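The result is a list of records rather than a single URI, presumably because copies of the same dataset keep their UUID and may be registered under several base URIs. A small hypothetical helper, uris_by_uuid, can group the returned records into URIs per UUID. It works purely on the dictionaries shown above.

```python
from collections import defaultdict


def uris_by_uuid(records):
    """Group dataset URIs by UUID (hypothetical helper, works on plain dicts)."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["uuid"]].append(record["uri"])
    return dict(grouped)


# Abbreviated copy of the record returned in the session above.
res = [{
    "base_uri": "smb://test-share",
    "name": "simple_test_dataset",
    "uri": "smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675",
    "uuid": "1a1f9fad-8589-413e-9602-5bbd66bfe675",
}]
print(uris_by_uuid(res))
```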
Full text search for the word "test"
In [1]: from dtool_lookup_api import get_datasets
   ...: res = get_datasets(free_text="test")
In [2]: res
Out[2]:
[{'base_uri': 'smb://test-share',
  'created_at': 1604860720.736,
  'creator_username': 'jotelha',
  'dtoolcore_version': '3.17.0',
  'frozen_at': 1605027357.308,
  'name': 'simple_test_dataset',
  'tags': [],
  'type': 'dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]
Request the manifest of a particular dataset by URI
In [1]: from dtool_lookup_api import get_manifest
   ...: uri = 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675'
   ...: res = get_manifest(uri)
In [2]: res
Out[2]:
{'dtoolcore_version': '3.17.0',
 'hash_function': 'md5sum_hexdigest',
 'items': {'eb58eb70ebcddf630feeea28834f5256c207edfd': {'hash': '2f7d9c3e0cfd47e8fcab0c12447b2bf0',
   'relpath': 'simple_text_file.txt',
   'size_in_bytes': 17,
   'utc_timestamp': 1605027357.284966}}}
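The manifest is a plain dictionary, so it can be summarized or used to check local copies of items. The sketch assumes that 'md5sum_hexdigest' denotes the MD5 hex digest of the file content, as its name suggests. manifest_summary and md5_matches are hypothetical helpers.

```python
import hashlib


def manifest_summary(manifest):
    """Return (number of items, total size in bytes, sorted relpaths)."""
    items = manifest["items"].values()
    return (
        len(manifest["items"]),
        sum(item["size_in_bytes"] for item in items),
        sorted(item["relpath"] for item in items),
    )


def md5_matches(path, item):
    """Compare a local file's MD5 hex digest with a manifest item's 'hash' entry."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not have to fit into memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == item["hash"]
```

Applied to the manifest above, manifest_summary(res) yields one item of 17 bytes with relpath simple_text_file.txt.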
Request the readme content of a particular dataset by URI
In [1]: from dtool_lookup_api import get_readme
   ...: res = get_readme('smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675')
In [2]: import yaml
   ...: yaml.safe_load(res)
Out[2]:
{'creation_date': '2020-11-08',
 'description': 'testing description',
 'expiration_date': '2022-11-08',
 'funders': [{'code': 'testing_code',
   'organization': 'testing_organization',
   'program': 'testing_program'}],
 'owners': [{'email': '[email protected]',
   'name': 'Testing User',
   'orcid': 'testing_orcid',
   'username': 'testing_user'}],
 'project': 'testing project'}
To list all datasets at a certain base URI whose names match a regular expression pattern, send a MongoDB query directly to the server with
In [15]: from dtool_lookup_api import query
    ...: res = query(
    ...:     {
    ...:         'base_uri': 'smb://test-share',
    ...:         'name': {'$regex': 'test'},
    ...:     }
    ...: )
In [16]: res
Out[16]:
[{'base_uri': 'smb://test-share',
  'created_at': 'Sun, 08 Nov 2020 18:38:40 GMT',
  'creator_username': 'jotelha',
  'dtoolcore_version': '3.17.0',
  'frozen_at': 'Tue, 10 Nov 2020 16:55:57 GMT',
  'name': 'simple_test_dataset',
  'tags': [],
  'type': 'dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]
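When the name fragment comes from user input, characters such as '.' or '+' would be interpreted as regex operators. The following hypothetical helper, name_query, builds the query dictionary and escapes the fragment with Python's re.escape. Characters escaped this way are treated as literals by MongoDB's PCRE-style regular expressions as well. The resulting dictionary would then be passed to query as in the session above.

```python
import re


def name_query(base_uri, name_fragment):
    """Hypothetical helper: datasets under base_uri whose name contains name_fragment literally."""
    return {
        "base_uri": base_uri,
        # Escape regex metacharacters so the fragment is matched as plain text.
        "name": {"$regex": re.escape(name_fragment)},
    }


print(name_query("smb://test-share", "simple_test"))
```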
It is possible to search readme content via
In [21]: from dtool_lookup_api import query
    ...: res = query(
    ...:     {
    ...:         'readme.owners.name': {'$regex': '^Testing User$'},
    ...:     }
    ...: )
In [22]: res
Out[22]:
[{'base_uri': 'smb://test-share',
  'created_at': 'Sun, 08 Nov 2020 18:38:40 GMT',
  'creator_username': 'jotelha',
  'dtoolcore_version': '3.17.0',
  'frozen_at': 'Tue, 10 Nov 2020 16:55:57 GMT',
  'name': 'simple_test_dataset',
  'tags': [],
  'type': 'dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]
This requires the server-side dserver-direct-mongo-plugin.
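The following toy, in-memory emulation illustrates what the query dictionaries above ask for. It covers only the small subset of MongoDB matching semantics those queries use: equality, $regex, and dotted paths that descend into arrays. Under these semantics, 'readme.owners.name' matches if any owner's name matches. This is not how dserver or the plugin evaluate queries; the real matching happens in MongoDB on the server. Python's re stands in for MongoDB's regex engine, and the two agree for simple patterns like these. _values_at and matches are hypothetical names.

```python
import re


def _values_at(doc, path):
    """Yield every value reachable via a dotted path, descending into lists like MongoDB."""
    if isinstance(doc, list):
        for element in doc:
            yield from _values_at(element, path)
        return
    head, _, rest = path.partition(".")
    if not isinstance(doc, dict) or head not in doc:
        return
    value = doc[head]
    if rest:
        yield from _values_at(value, rest)
    elif isinstance(value, list):
        yield value        # the array itself can match an equality condition
        yield from value   # and so can each of its elements
    else:
        yield value


def matches(doc, query):
    """Tiny subset of MongoDB matching: equality and {'$regex': pattern}, conditions ANDed."""
    for path, condition in query.items():
        values = list(_values_at(doc, path))
        if isinstance(condition, dict) and "$regex" in condition:
            pattern = re.compile(condition["$regex"])
            ok = any(isinstance(v, str) and pattern.search(v) for v in values)
        else:
            ok = any(v == condition for v in values)
        if not ok:
            return False
    return True
```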
TODO: Response from server-side direct mongo plugin still yields dates as strings. Fix within https://github.com/IMTEK-Simulation/dserver-direct-mongo-plugin.
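As the sessions above show, created_at and frozen_at currently arrive either as POSIX timestamps or as HTTP-style date strings such as 'Sun, 08 Nov 2020 18:38:40 GMT', depending on the function used. Until this is fixed, a small hypothetical helper along these lines can normalize both forms on the client side. It uses only the standard library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def to_datetime(value):
    """Normalize a dataset timestamp to a timezone-aware UTC datetime.

    Accepts POSIX timestamps (int or float) and RFC 1123 date strings.
    """
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    return parsedate_to_datetime(value).astimezone(timezone.utc)


print(to_datetime("Sun, 08 Nov 2020 18:38:40 GMT"))
print(to_datetime(1604860720.736269))
```

Both example values above refer to the same creation time of simple_test_dataset, 2020-11-08 18:38:40 UTC, differing only in the fractional seconds carried by the timestamp.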
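The synchronous interface is currently implemented via asgiref.async_to_sync
(https://github.com/django/asgiref), which refuses to run in a thread that already runs an event loop.
This hinders the use of the synchronous interface within Jupyter notebooks, since the notebook kernel runs such a loop.
Use the asynchronous API directly instead: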
import dtool_lookup_api.asynchronous as dl
res = await dl.query({
'base_uri': 'smb://test-share',
'name': {'$regex': 'test'},
})
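Because the asynchronous functions are coroutines, several requests can also be awaited concurrently with asyncio.gather. In the sketch below, fake_query is a hypothetical stand-in for dl.query so that the example runs without a server. Whether concurrent requests actually save time depends on the server.

```python
import asyncio


async def fake_query(query_dict):
    """Hypothetical stand-in for dl.query; a real call would await a dserver response."""
    await asyncio.sleep(0.1)  # simulate network latency
    return [{"query": query_dict}]


async def run_queries(query_dicts):
    # Schedule all queries at once; gather returns the results in input order.
    return await asyncio.gather(*(fake_query(q) for q in query_dicts))


# In a notebook: results = await run_queries([...])
# In a plain script:
results = asyncio.run(run_queries([
    {"base_uri": "smb://test-share"},
    {"name": {"$regex": "test"}},
]))
print(results)
```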
The drawback of this approach is that the same code does not work both in plain Python and in Jupyter (await outside of a function is a syntax error in a non-interactive Python context). The code below can be executed in both contexts:
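import asyncio
import dtool_lookup_api.asynchronous as dl
# get_running_loop() raises RuntimeError when no event loop is running (plain Python script)
try:
    asyncio.get_running_loop()
    in_running_loop = True
except RuntimeError:
    in_running_loop = False
if in_running_loop:
    # then we are in a Jupyter notebook, which already runs an event loop
    # nest_asyncio allows nested event loops, i.e. calls to asyncio.run inside the notebook as well
    # This way, the same code works in a notebook and in plain Python
    import nest_asyncio
    nest_asyncio.apply()
def query(query_dict):
    return asyncio.run(dl.query(query_dict))
query({
    'base_uri': 'smb://test-share',
    'name': {'$regex': 'test'},
})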
For background, see the comment in jupyter/notebook#3397 and https://ipython.readthedocs.io/en/stable/interactive/autoawait.html
To install the package in editable mode together with its testing requirements, run
pip install -e .[test]
By default, tests rely on the demo.dtool.dev demonstrator instance.
This can be changed by configuring
DSERVER_ADDRESS_AND_CREDENTIALS = {
"DSERVER_URL": "https://demo.dtool.dev/lookup",
"DSERVER_TOKEN_GENERATOR_URL": "https://demo.dtool.dev/token",
"DSERVER_USERNAME": "testuser",
"DSERVER_PASSWORD": "test_password",
"DSERVER_VERIFY_SSL": False,
}
within tests/conftest.py.