Supporting other data types #16

abkfenris · 2023-10-02T20:13:09Z

abkfenris
Oct 2, 2023
Maintainer

At the IOOS DMAC meeting, there was talk about supporting other data types, such as tabular or forecast collections. Since I've said a few times that this should be possible and then was asked afterwards how to make it happen, here's a sketch of how to implement it for dataframes.

To give the same functionality as we currently have with datasets, we would want to allow plugins to be dataframe providers and dataframe routers. To do that, we need a plugin that both specifies the hooks that other plugins can implement and mounts and serves the routers from those plugins.

Imports

from typing import Iterable, Optional, Sequence, Callable

from fastapi import APIRouter, HTTPException
import pandas as pd
from pydantic import Field
from xpublish import Plugin, hookimpl, hookspec, Dependencies

The first thing to do is to define the additional dependencies that our dataframe plugins can use.

class DataFrameDeps(Dependencies):
    dataframe_ids: Callable[..., list[str]] = Field(
        description="Returns a list of all valid dataframe ids"
    )
    dataframe: Callable[[str], pd.DataFrame] = Field(
        description='Returns a dataframe, using the ``/<dataframe_id>`` in the path.'
    )

This extended xpublish.Dependencies now allows other plugins to depend upon a dataframe, or a list of all dataframe IDs.

Now we create a specification for the new dataframe hooks, which can largely match the specs defined for the dataset plugin methods.

class DataframePluginSpec(Plugin):

    @hookspec
    def get_dataframe_ids(self) -> Iterable[str]:
        """Return an iterable of dataframe IDs that the plugin can provide"""

    @hookspec(firstresult=True)
    def get_dataframe(self, dataframe_id: str) -> Optional[pd.DataFrame]:
        """Return a dataframe requested by dataframe_id.
        
        If the plugin does not have the dataframe, return None"""

    @hookspec
    def dataframe_router(self, deps: DataFrameDeps) -> APIRouter:
        """Return an API router that can work with Dataframes"""
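
The difference between a plain hookspec and one marked firstresult=True is easier to see in isolation. Xpublish's hooks are built on pluggy, so here is a rough sketch using pluggy directly. The "demo" project name, the two provider classes, and the plain dicts standing in for dataframes are all made up for illustration and are not part of Xpublish.

```python
from typing import Iterable, Optional

import pluggy

# "demo" is an arbitrary project name for this sketch.
hookspec = pluggy.HookspecMarker("demo")
hookimpl = pluggy.HookimplMarker("demo")


class DataframeSpec:
    @hookspec
    def get_dataframe_ids(self) -> Iterable[str]:
        """Every implementation is called; the caller gets a list of results."""

    @hookspec(firstresult=True)
    def get_dataframe(self, dataframe_id: str) -> Optional[dict]:
        """Calling stops at the first implementation returning non-None."""


class BuoyProvider:
    """Hypothetical provider that knows a single ID."""

    @hookimpl
    def get_dataframe_ids(self):
        return ["a01_met"]

    @hookimpl
    def get_dataframe(self, dataframe_id):
        # A dict stands in for a real pandas DataFrame here.
        return {"source": "buoy"} if dataframe_id == "a01_met" else None


class GliderProvider:
    """Second hypothetical provider, to show results being combined."""

    @hookimpl
    def get_dataframe_ids(self):
        return ["glider_1"]

    @hookimpl
    def get_dataframe(self, dataframe_id):
        return {"source": "glider"} if dataframe_id == "glider_1" else None


pm = pluggy.PluginManager("demo")
pm.add_hookspecs(DataframeSpec)
pm.register(BuoyProvider())
pm.register(GliderProvider())

# Plain hookspec: one entry per plugin, so the caller has to flatten.
ids_per_plugin = pm.hook.get_dataframe_ids()
all_ids = sorted(i for ids in ids_per_plugin for i in ids)

# firstresult hookspec: a single value, or None if no plugin claims the ID.
found = pm.hook.get_dataframe(dataframe_id="glider_1")
missing = pm.hook.get_dataframe(dataframe_id="nope")
```

This is why get_dataframe_ids results need to be flattened by whoever calls the hook, while a get_dataframe result can be used directly once it has been checked for None.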

Now that we have those, we create our plugin.

class DataframePlugin(Plugin):
    name: str = "dataframe"

    ...

    @hookimpl
    def register_hookspec(self):
        return DataframePluginSpec

    ...

The first method that needs to be implemented is register_hookspec(), which returns the spec we defined above. (In theory we have docs on this, but largely it's an 'I'll get to it later'.) This tells the plugin system which methods other plugins can implement, and lets us really start extending Xpublish without bringing new things into core.

Then we create an app_router method that can both list all dataframes and pull in other plugins, both for loading dataframes and for adding new routes for them.

class DataframePlugin(Plugin):
    ...

    app_router_prefix: str = "/dataframes"
    app_router_tags: Sequence[str] = ["dataframe"]

    ...
    
    @hookimpl
    def app_router(self, deps: Dependencies):
        router = APIRouter(
            prefix=self.app_router_prefix,
            tags=self.app_router_tags
        )

        def get_dataframe_ids():
            """Return the known dataframe IDs from all dataframe provider plugins"""
            df_ids = []

            for new_ids in deps.plugin_manager().hook.get_dataframe_ids():
                df_ids.extend(new_ids)

            return df_ids
        
        def get_dataframe(dataframe_id: str):
            """Returns a dataframe from dataframe provider plugins"""
            df = deps.plugin_manager().hook.get_dataframe(dataframe_id=dataframe_id)

            if df is not None:
                return df
            
            raise HTTPException(
                status_code=404,
                detail=f"Dataframe {dataframe_id} not found."
            )
        
        @router.get("/")
        def dataframe_ids():
            """Returns known dataframe IDs"""
            return get_dataframe_ids()

        df_deps = DataFrameDeps(
            **deps.model_dump(), 
            dataframe=get_dataframe, 
            dataframe_ids=get_dataframe_ids
        )

        for new_router in deps.plugin_manager().hook.dataframe_router(deps=df_deps):
            router.include_router(new_router, prefix="/{dataframe_id}")

        return router

Within our app_router we start by defining functions to get a single dataframe and to get all dataframe IDs. These directly access deps.plugin_manager().hook. This is the same pattern that the core of Xpublish uses to implement deps.dataset_ids and deps.dataset.

Then we create a route to return all dataframe IDs.

Next we build our new dataframe dependencies, with the existing deps, and the addition of our two new dependencies.
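
As a rough illustration of that step, here is the same "dump the existing model, then add fields" pattern with plain pydantic v2 models. BaseDeps and ExtendedDeps are hypothetical stand-ins for xpublish.Dependencies and DataFrameDeps, and the two lookup functions are invented for the example.

```python
from typing import Callable

from pydantic import BaseModel, Field


class BaseDeps(BaseModel):
    """Hypothetical stand-in for xpublish.Dependencies."""

    dataset_ids: Callable[..., list[str]] = Field(
        description="Returns a list of all valid dataset ids"
    )


class ExtendedDeps(BaseDeps):
    """Hypothetical stand-in for DataFrameDeps: adds one more callable."""

    dataframe_ids: Callable[..., list[str]] = Field(
        description="Returns a list of all valid dataframe ids"
    )


def get_dataset_ids() -> list[str]:
    return ["air_temperature"]


def get_dataframe_ids() -> list[str]:
    return ["a01_met"]


base = BaseDeps(dataset_ids=get_dataset_ids)

# Carry over everything the base model already holds, then add the new field.
extended = ExtendedDeps(**base.model_dump(), dataframe_ids=get_dataframe_ids)

existing_ids = extended.dataset_ids()
new_ids = extended.dataframe_ids()
```

The upshot is that a dataframe router still has access to everything a regular router would get from deps, plus the two dataframe callables.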

Then we play with the plugin_manager again, asking it for all the dataframe_router implementations and passing it the new dataframe dependencies. Each router that comes back gets included in the app router we're building, under the /{dataframe_id} prefix.

Now for some example plugins

A dataframe provider.

import pandas as pd
from xpublish import Plugin, hookimpl


class DataFrameProviderPlugin(Plugin):
    name: str = "df-provider"

    @hookimpl
    def get_dataframe_ids(self):
        return ["a01_met"]
    
    @hookimpl
    def get_dataframe(self, dataframe_id: str):
        if dataframe_id == "a01_met":
            return pd.read_csv("https://data.neracoos.org/erddap/tabledap/A01_met.csv?time%2Cwind_speed%2Cwind_gust%2Cwind_direction%2Cair_temperature&time%3E=2023-09-25T00%3A00%3A00Z&time%3C=2023-10-02T17%3A00%3A00Z")

        # Not one of ours: return None so other provider plugins get a chance
        return None

And a CSV router

from typing import Sequence

from fastapi import APIRouter, Depends, Response
import pandas as pd
from xpublish import Plugin, hookimpl

from df import DataFrameDeps


class DataFrameCSVPlugin(Plugin):
    name: str = "df-csv"

    dataframe_router_prefix: str = ""
    dataframe_router_tags: Sequence[str] = ["dataframe", "csv"]

    @hookimpl
    def dataframe_router(self, deps: DataFrameDeps) -> APIRouter:
        """Return an API router that can work with Dataframes"""
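        # Use the configurable prefix and tags declared on the plugin
        router = APIRouter(
            prefix=self.dataframe_router_prefix,
            tags=list(self.dataframe_router_tags),
        )

        @router.get(".csv")
        def csv(dataframe: pd.DataFrame = Depends(deps.dataframe)):
            # deps.dataframe resolves the dataframe from the dataframe_id in the path
            csv_text = dataframe.to_csv()

            return Response(
                csv_text,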
                media_type="text/csv",
                headers={"Content-Disposition": 'attachment; filename="dataframe.csv"'},
            )

        return router

Hopefully this gives some folks something to start from to explore adding new data types to Xpublish.
