
Design Prediction Module Interface #28

Closed
1 task
chauhankaranraj opened this issue Mar 31, 2021 · 13 comments · Fixed by #35
Assignees
chauhankaranraj
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@chauhankaranraj
Member

As a data scientist,
I want to establish the API for the disk health prediction python module to be created by #27,
So that I can design the python module around the kinds of inputs to expect and the kind of output to return.

As a ceph developer,
I want to establish the interface between the ceph manager daemon and the disk health prediction python module (to be created by #27),
So that I can ensure the ceph manager is able to provide the inputs this module needs in the correct format, and is able to consume the output it returns.

Acceptance Criteria

  • doc outlining the interface added to the docs directory
@goern
Contributor

goern commented Apr 1, 2021

/kind feature

@sesheta sesheta added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 1, 2021
@MichaelClifford
Member

@chauhankaranraj does #24 close this issue?

@chauhankaranraj
Member Author

> @chauhankaranraj does #24 close this issue?

I believe it does not. If I recall correctly, in #24 we only decided that there should be a separate prediction module, but didn't really describe in detail what the interface to this module should look like.

@chauhankaranraj
Member Author

The WIP design doc for the prediction module is here. Ideas and suggestions are welcome and appreciated!

@MichaelClifford MichaelClifford changed the title Prediction Module Interface Design Prediction Module Interface May 5, 2021
@chauhankaranraj chauhankaranraj self-assigned this May 5, 2021
@goern
Contributor

goern commented May 6, 2021

@chauhankaranraj could you dump a png somewhere?

@chauhankaranraj
Member Author

> @chauhankaranraj could you dump a png somewhere?

Yeah, here are some thoughts I had:

  • A model "store" / "vault" / "hub" can be set up, where pretrained models are stored. This store could be of different kinds too.
    [diagram: model store variants]

  • The directory structure of a model store could look something like this:
    [diagram: model store directory structure]

  • And finally, a Predictor class could be made to consume the models/preprocessors/feature-extractors within a given model directory (a rough sketch follows at the end of this comment):
    [diagram: Predictor class]

  • So overall the user (e.g. ceph) workflow could be something like this:
    [diagram: user workflow]

These are just some ideas and very much WIP, so they could be completely off base too :) Let me know what you think!
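
For concreteness, here's a very rough sketch of what that Predictor class could look like. The file names and the pickle format here are pure assumptions, not a settled layout:

import os
import pickle


class Predictor:
    """Consumes the artifacts found in a given model directory and chains them."""

    def __init__(self, model_dir):
        self.feature_extractor = self._load(model_dir, "feature_extractor.pkl")
        self.preprocessor = self._load(model_dir, "preprocessor.pkl")
        self.model = self._load(model_dir, "model.pkl")

    @staticmethod
    def _load(model_dir, fname):
        with open(os.path.join(model_dir, fname), "rb") as f:
            return pickle.load(f)

    def predict(self, raw_smart_data):
        # raw SMART metrics -> features -> scaled features -> prediction
        features = self.feature_extractor.transform(raw_smart_data)
        features = self.preprocessor.transform(features)
        return self.model.predict(features)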

@durandom
Member

durandom commented May 12, 2021

I would defer the model-store stuff to a library and just use basic inheritance/interface design.

You want to make sure that the inference payload matches the expected values and the result matches an expected schema.

@chauhankaranraj @MichaelClifford @TreeinRandomForest is there a common way to specify input and output?
Like https://www.tensorflow.org/api_docs/python/tf/compat/v1/saved_model/signature_def_utils

From a ceph perspective it would be:

import rh_classifier as disk_health_classifier  # hyphens aren't valid in Python module names

estimate = disk_health_classifier.predict(smart_ctl)

Managing and loading the models is a matter for the ceph code.
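
As a rough sketch of that inheritance/interface idea (every class, method, and field name here is made up, not a settled API):

from abc import ABC, abstractmethod
from typing import Any, Dict, List


class DiskHealthPredictor(ABC):
    """Interface that any concrete disk health predictor implements."""

    @abstractmethod
    def initialize(self, model_dir: str) -> None:
        """Load model artifacts from a local directory."""

    @abstractmethod
    def predict(self, smart_data: List[Dict[str, Any]]) -> str:
        """Map a device's SMART records to a health label."""


class DummyDiskHealthClassifier(DiskHealthPredictor):
    """Trivial stand-in: flags a disk as soon as sectors get reallocated."""

    def initialize(self, model_dir: str) -> None:
        pass  # nothing to load for the dummy

    def predict(self, smart_data: List[Dict[str, Any]]) -> str:
        latest = smart_data[-1]
        return "bad" if latest.get("reallocated_sector_ct", 0) > 0 else "good"

ceph would then program against DiskHealthPredictor only, and stay agnostic to whichever concrete model sits behind it.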

@MichaelClifford
Member

@durandom, I'm not sure of a common standard. But there should certainly be a check on the inference payload before it is applied to the model. You can look at this older example here, where a validator function is applied to check that the input data has the correct dimensions. But this approach could be expanded into a more robust check on the data.

As far as output, this will be dictated by the model architecture and part of the design. As long as the input is correct and the model doesn't throw an error, the output should always be of the same schema. We could also create a check on the output data, but I think it would only be useful if the model architecture changed somehow.
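
For illustration, a rough sketch of such a payload check (the expected feature count of 20 is just a placeholder, borrowed from the ONNX example below):

import numpy as np

EXPECTED_NUM_FEATURES = 20  # placeholder; whatever the model was trained on


def validate_payload(X):
    """Basic checks on the inference payload before it reaches the model."""
    X = np.asarray(X, dtype=np.float32)
    if X.ndim != 2 or X.shape[1] != EXPECTED_NUM_FEATURES:
        raise ValueError(
            f"expected shape (n_samples, {EXPECTED_NUM_FEATURES}), got {X.shape}"
        )
    if not np.isfinite(X).all():
        raise ValueError("payload contains NaN or infinite values")
    return X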

@chauhankaranraj
Member Author

> @durandom, I'm not sure of a common standard. But there should certainly be a check on the inference payload before it is applied to the model. You can look at this older example here, where a validator function is applied to check that the input data has the correct dimensions. But this approach could be expanded into a more robust check on the data.

> @chauhankaranraj @MichaelClifford @TreeinRandomForest is there a common way to specify input and output?

Hm, is ONNX something we can consider here? Maybe we can solve multiple issues with it:

  • Instead of saving sklearn/tf/pytorch models in their native formats, we could convert them to ONNX compute graphs and save those. This way, there's a universal format in which the models are stored.

  • This should let us specify the input schema, e.g. the input has to be a float tensor of shape (None, 20):

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# my_classifier is a fitted sklearn model; the input is declared as a
# float tensor with 20 features and a variable batch dimension
initial_type = [('float_input', FloatTensorType([None, 20]))]
onx = convert_sklearn(my_classifier, initial_types=initial_type)

with open("rf_iris.onnx", "wb") as f:
    f.write(onx.SerializeToString())

  • Users (e.g. ceph) won't need to install sklearn/tf/pytorch/mxnet based on how the model was created, but rather install just the ONNX runtime to run inference for all kinds of models.

import numpy as np
import onnxruntime.backend as backend

# load graph
rep = backend.prepare('rf_iris.onnx', 'CPU')

# convert input to float32, otherwise onnxruntime will throw an error
X_test = X_test.astype(np.float32)

prediction = rep.run(X_test)
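
And since the schema is stored in the graph itself, a consumer could inspect it at load time. A rough sketch using onnxruntime's InferenceSession API (X_test as above):

import onnxruntime as ort

sess = ort.InferenceSession("rf_iris.onnx", providers=["CPUExecutionProvider"])

# the input name, type, and shape declared at conversion time
inp = sess.get_inputs()[0]
print(inp.name, inp.type, inp.shape)  # e.g. float_input tensor(float) [None, 20]

prediction = sess.run(None, {inp.name: X_test})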

What do you think?

@yaarith

yaarith commented May 14, 2021

> I would defer the model-store stuff to a library and just use basic inheritance/interface design.

I agree with @durandom; I think we can hold off on the abstraction for now and simply import the models from a library.

@chauhankaranraj
Member Author

>> I would defer the model-store stuff to a library and just use basic inheritance/interface design.
>
> I agree with @durandom; I think we can hold off on the abstraction for now and simply import the models from a library.

I see, so if I understand correctly, you're suggesting we should have the models within the module itself, right?

My initial thought was that the models should not directly be a part of the module. Instead, they should be stored in a separate store (e.g. an MLflow server / an AIOps GH repo / someone else's GH repo / a ceph bucket) and "pulled" by the module as required by the user (same idea as torch.hub shown here).
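
For concreteness, the pull-and-cache flow could look roughly like this (the store URL and cache path are completely made up):

import os
import urllib.request

MODEL_STORE_URL = "https://example.com/model-store"  # hypothetical store
CACHE_DIR = os.path.expanduser("~/.cache/disk-health-models")


def fetch_model(name):
    """Download a model on first use; reuse the cached copy afterwards."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    local_path = os.path.join(CACHE_DIR, f"{name}.onnx")
    if not os.path.exists(local_path):
        urllib.request.urlretrieve(f"{MODEL_STORE_URL}/{name}.onnx", local_path)
    return local_path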

But if this adds more complexity than needed, we can definitely defer it and directly include the pretrained models in the module itself :)

@yaarith

yaarith commented May 17, 2021

> I see, so if I understand correctly, you're suggesting we should have the models within the module itself, right?

Yes; not all clusters have access to the internet, and in those that do, we would need to cache the model so it isn't fetched from the internet on every run. So we can start by keeping it local in the module; we can always add access to the other services in the future.

@chauhankaranraj
Member Author

> Yes; not all clusters have access to the internet, and in those that do, we would need to cache the model so it isn't fetched from the internet on every run. So we can start by keeping it local in the module; we can always add access to the other services in the future.

Makes sense! Thanks for the feedback :)
