improve docs

XENONnT · Sep 12, 2023 · 3c4aba9 · 3c4aba9
1 parent 128903b
commit 3c4aba9
Show file tree

Hide file tree

Showing 8 changed files with 194 additions and 2 deletions.
diff --git a/docs/advanced_usage.rst b/docs/advanced_usage.rst
diff --git a/docs/api_server.rst b/docs/api_server.rst
diff --git a/docs/usage.rst → docs/basic_usage.rst b/docs/usage.rst → docs/basic_usage.rst
@@ -5,3 +5,6 @@ Usage
 To use rframe in a project::
 
     import rframe
+
+
+.. include:: ../README.rst
diff --git a/docs/data_access_apis.rst b/docs/data_access_apis.rst
diff --git a/docs/data_backends.rst b/docs/data_backends.rst
@@ -0,0 +1,9 @@
+=============
+Data Backends
+=============
+
+
+- MongoDB
+- TinyDB
+- Pandas Dataframes
+- Lists of dictionaries
diff --git a/docs/index.rst b/docs/index.rst
@@ -5,9 +5,10 @@ Welcome to rframe's documentation!
    :maxdepth: 2
    :caption: Contents:
 
-   readme
+   introduction
    installation
-   usage
+   quickstart
+   writing_schemas
    modules
    contributing
    authors

diff --git a/docs/introduction.rst b/docs/introduction.rst
@@ -0,0 +1,27 @@
+
+============
+Introduction
+============
+
+Data reading
+------------
+
+- Read data from multiple formats (e.g. mongodb, pandas) and locations with a simple unified interface.
+- Custom logic implemented on the document class, e.g. creating a tensorflow model from the data etc.
+- Multiple APIs for reading data, fun functional, ODM style, pandas and xarray.
+- Read data as objects, dataframes, dicts, json.
+
+Writing data
+------------
+
+- Write data to multiple storage backends with the same interface
+- Custom per-collection rules for data insertion, deletion and updating.
+- Schema validation and type coercion so storage has uniform and consistent data.
+
+Other
+-----
+
+- Custom panel widgets for graphical representation of data, web client
+- Auto-generated API server and client + openapi documentation
+- CLI for viewing and downloading data
+
diff --git a/docs/writing_schemas.rst b/docs/writing_schemas.rst
@@ -0,0 +1,152 @@
+===============
+Writing Schemas
+===============
+
+Schema definitions
+------------------
+
+To query data you need a schema, a python class that inherits from `rframe.BaseSchema`.
+Lets say we want to store a collection of versioned documents 
+indexed by the experiment, detector and version
+
+.. code-block:: python
+    import rframe
+    from typing import Literal
+
+
+    class ExampleSchema(rframe.BaseSchema):
+        _ALIAS = 'simple_dataframe'
+
+        # The simplest index, matches on exact value. 
+        # This is how we define a versioned document without 
+        # time dependence
+        experiment: Literal['1t', 'nt'] = rframe.Index()
+        detector: Literal['tpc', 'nveto','muveto'] = rframe.Index()
+        version: str = rframe.Index()
+
+        # we can define as many fields as we like, each with its own type
+        # defined using standard python annotations
+        value: float
+        error: float
+        unit: str
+        creator: str
+
+    def pre_insert(self, db):
+            # the base class implementation checks if any documents already exist at the index 
+        # and raises an error if it does
+            super().pre_insert(db)
+            # here you can add any special logic to perform prior to inserts
+
+    def pre_update(self, db, new):
+            # the base class implementation raises an error if new values dont match
+            # the current ones for this index. we allow replacing the current values
+            # with identical ones because the current values may be inferred (i.e interpolated)
+            # in which case we allow new documents with the interpolated values since that wont
+            # change any interpolated values.
+            super().pre_update(db, new)
+
+            # add any extra logic/checks to perform here 
+
+
+Database Queries
+----------------
+
+Once we have a schema, we can use it to query any of the supported data backends
+This allows for a consistent interface to data stored in any backend.
+
+.. code-block:: python
+
+    import pymongo
+    
+
+    db = pymongo.MongoClient()['cmt2']['simple_dataframe']
+
+    # or 
+    db = pd.read_csv("pandas_dataframe.csv")
+
+    docs = ExampleSchema.find(datasource=db, experiment=..., detector=..., version=...)
+
+    # or
+    doc = ExampleSchema.find_one(datasource=db, experiment=..., detector=..., version=...)
+
+
+Inserting data
+--------------
+A schema can also help you insert new data into a data source, e.g. a mogodb collection.
+
+.. code-block:: python
+
+    import pymongo
+
+    db = pymongo.MongoClient()['cmt2']['simple_dataframe']
+
+    doc = ExampleSchema(experiment=..., detector=..., version=..., value=..., ...)
+
+    doc.save(db) # this will insert the document into the collection
+
+The details of how to insert data into are handled by the framework so that you can use
+the same code independent of the data source.
+
+
+The RemoteFrame
+---------------
+
+Alternatively we can use the ``RemoteFrame`` class to access/store documents in any supported backend.
+
+.. code-block:: python
+
+    rf = ExampleSchema.rframe(db)
+    # or
+    rf = rframe.RemoteFrame(ExampleSchema, db) 
+
+**Reading specific rows**
+
+Rows can be accessed by calling the dataframe with the rows index values, using pandas-like indexing ``df.loc[idx]``, ``df.at[idx, column]``, ``df[column].loc[idx]`` or with the xarray style ``df.sel(index_name=idx)`` method
+
+.. code-block:: python
+
+    # These methods will al return an identical pandas dataframe
+
+    df = rf.loc[experiment,detector, version]
+    
+    df = rf.sel(experiment=experiment, detector=detector, version=version)
+    
+    df = rf.loc[experiment,detector, version]
+    
+    # Access a specific column to get a series back
+    df = rf['value'].loc[experiment,detector, version]
+    df = rf.value.loc[experiment,detector, version]
+
+    # pandas-style scalar lookup returns a scalar
+    value = rdf.at[(experiment,detector, version), 'value']
+    # or call the dataframe with the column as argyment and index values as keyword arguments
+    value = rf('value', experiment=experiment, detector=detector, version=version)
+
+**Slicing**
+
+You can also omit indices to get results back matching all values of the omitted index
+
+.. code-block:: python
+
+    df = rf.sel(version=version)
+
+    # or
+    df = rf.loc[experiment, detector, :]
+
+    # or
+    df = rf.loc[experiment]
+
+    # or pass a list a values you want to match on:
+    df = rf.sel(version=[0,1], experiment=experiment)
+
+    # Slicing is also supported
+    df = rf.sel(version=slice(2,10), detector=detector)
+
+
+The interval index also supports passing a tuple/slice/begin,end keywords to query all intervals overlapping the given interval
+
+.. code-block:: python
+
+    df = rf.sel(version=version, time=(time1,time2))
+    df = rf.loc[version, time1:time2]
+    df = rf.get(version=version, begin=time1, end=time2)