Skip to content

Commit

Permalink
improve docs
Browse files Browse the repository at this point in the history
  • Loading branch information
jmosbacher committed Sep 12, 2023
1 parent 128903b commit 3c4aba9
Show file tree
Hide file tree
Showing 8 changed files with 194 additions and 2 deletions.
Empty file added docs/advanced_usage.rst
Empty file.
Empty file added docs/api_server.rst
Empty file.
3 changes: 3 additions & 0 deletions docs/usage.rst → docs/basic_usage.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,3 +5,6 @@ Usage
To use rframe in a project::

import rframe


.. include:: ../README.rst
Empty file added docs/data_access_apis.rst
Empty file.
9 changes: 9 additions & 0 deletions docs/data_backends.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
=============
Data Backends
=============


- MongoDB
- TinyDB
- Pandas Dataframes
- Lists of dictionaries
5 changes: 3 additions & 2 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,10 @@ Welcome to rframe's documentation!
:maxdepth: 2
:caption: Contents:

readme
introduction
installation
usage
quickstart
writing_schemas
modules
contributing
authors
Expand Down
27 changes: 27 additions & 0 deletions docs/introduction.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@

============
Introduction
============

Data reading
------------

- Read data from multiple formats (e.g. mongodb, pandas) and locations with a simple unified interface.
- Custom logic implemented on the document class, e.g. creating a tensorflow model from the data etc.
- Multiple APIs for reading data, fun functional, ODM style, pandas and xarray.
- Read data as objects, dataframes, dicts, json.

Writing data
------------

- Write data to multiple storage backends with the same interface
- Custom per-collection rules for data insertion, deletion and updating.
- Schema validation and type coercion so storage has uniform and consistent data.

Other
-----

- Custom panel widgets for graphical representation of data, web client
- Auto-generated API server and client + openapi documentation
- CLI for viewing and downloading data

152 changes: 152 additions & 0 deletions docs/writing_schemas.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,152 @@
===============
Writing Schemas
===============

Schema definitions
------------------

To query data you need a schema, a python class that inherits from `rframe.BaseSchema`.
Lets say we want to store a collection of versioned documents
indexed by the experiment, detector and version

.. code-block:: python
import rframe
from typing import Literal
class ExampleSchema(rframe.BaseSchema):
_ALIAS = 'simple_dataframe'
# The simplest index, matches on exact value.
# This is how we define a versioned document without
# time dependence
experiment: Literal['1t', 'nt'] = rframe.Index()
detector: Literal['tpc', 'nveto','muveto'] = rframe.Index()
version: str = rframe.Index()
# we can define as many fields as we like, each with its own type
# defined using standard python annotations
value: float
error: float
unit: str
creator: str
def pre_insert(self, db):
# the base class implementation checks if any documents already exist at the index
# and raises an error if it does
super().pre_insert(db)
# here you can add any special logic to perform prior to inserts
def pre_update(self, db, new):
# the base class implementation raises an error if new values dont match
# the current ones for this index. we allow replacing the current values
# with identical ones because the current values may be inferred (i.e interpolated)
# in which case we allow new documents with the interpolated values since that wont
# change any interpolated values.
super().pre_update(db, new)
# add any extra logic/checks to perform here
Database Queries
----------------

Once we have a schema, we can use it to query any of the supported data backends
This allows for a consistent interface to data stored in any backend.

.. code-block:: python
import pymongo
db = pymongo.MongoClient()['cmt2']['simple_dataframe']
# or
db = pd.read_csv("pandas_dataframe.csv")
docs = ExampleSchema.find(datasource=db, experiment=..., detector=..., version=...)
# or
doc = ExampleSchema.find_one(datasource=db, experiment=..., detector=..., version=...)
Inserting data
--------------
A schema can also help you insert new data into a data source, e.g. a mogodb collection.

.. code-block:: python
import pymongo
db = pymongo.MongoClient()['cmt2']['simple_dataframe']
doc = ExampleSchema(experiment=..., detector=..., version=..., value=..., ...)
doc.save(db) # this will insert the document into the collection
The details of how to insert data into are handled by the framework so that you can use
the same code independent of the data source.


The RemoteFrame
---------------

Alternatively we can use the ``RemoteFrame`` class to access/store documents in any supported backend.

.. code-block:: python
rf = ExampleSchema.rframe(db)
# or
rf = rframe.RemoteFrame(ExampleSchema, db)
**Reading specific rows**

Rows can be accessed by calling the dataframe with the rows index values, using pandas-like indexing ``df.loc[idx]``, ``df.at[idx, column]``, ``df[column].loc[idx]`` or with the xarray style ``df.sel(index_name=idx)`` method

.. code-block:: python
# These methods will al return an identical pandas dataframe
df = rf.loc[experiment,detector, version]
df = rf.sel(experiment=experiment, detector=detector, version=version)
df = rf.loc[experiment,detector, version]
# Access a specific column to get a series back
df = rf['value'].loc[experiment,detector, version]
df = rf.value.loc[experiment,detector, version]
# pandas-style scalar lookup returns a scalar
value = rdf.at[(experiment,detector, version), 'value']
# or call the dataframe with the column as argyment and index values as keyword arguments
value = rf('value', experiment=experiment, detector=detector, version=version)
**Slicing**

You can also omit indices to get results back matching all values of the omitted index

.. code-block:: python
df = rf.sel(version=version)
# or
df = rf.loc[experiment, detector, :]
# or
df = rf.loc[experiment]
# or pass a list a values you want to match on:
df = rf.sel(version=[0,1], experiment=experiment)
# Slicing is also supported
df = rf.sel(version=slice(2,10), detector=detector)
The interval index also supports passing a tuple/slice/begin,end keywords to query all intervals overlapping the given interval

.. code-block:: python
df = rf.sel(version=version, time=(time1,time2))
df = rf.loc[version, time1:time2]
df = rf.get(version=version, begin=time1, end=time2)

0 comments on commit 3c4aba9

Please sign in to comment.