
Design Prediction Module Interface #28

Closed
1 task
chauhankaranraj opened this issue Mar 31, 2021 · 13 comments · Fixed by #35
Assignees
chauhankaranraj
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@chauhankaranraj
Member

As a data scientist,
I want to establish the API for the disk health prediction python module to be created by #27,
So that I can design the python module around the kinds of inputs to expect and the kind of output to return.

As a ceph developer,
I want to establish the interface between the ceph manager daemon and the disk health prediction python module (to be created by #27),
So that I can ensure the ceph manager is able to provide the inputs this module needs in the correct format, and is able to consume the output it returns.

Acceptance Criteria

  • doc outlining the interface added to the docs directory
@goern
Contributor

goern commented Apr 1, 2021

/kind feature

@sesheta sesheta added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 1, 2021
@MichaelClifford
Member

@chauhankaranraj does #24 close this issue?

@chauhankaranraj
Member Author

> @chauhankaranraj does #24 close this issue?

I believe it does not. If I recall correctly, in #24 we only decided that there should be a separate prediction module, but didn't really describe in detail what the interface to this module should look like.

@chauhankaranraj
Member Author

The WIP design doc for the prediction module is here. Ideas and suggestions are welcome and appreciated!

@MichaelClifford MichaelClifford changed the title Prediction Module Interface Design Prediction Module Interface May 5, 2021
@chauhankaranraj chauhankaranraj self-assigned this May 5, 2021
@goern
Contributor

goern commented May 6, 2021

@chauhankaranraj could you dump a png somewhere?

@chauhankaranraj
Member Author

> @chauhankaranraj could you dump a png somewhere?

Yeah, here are some thoughts I had:

  • A model "store" / "vault" / "hub" can be set up, where pretrained models are stored. This store could be of different kinds too.
    [diagram: model store variants]

  • The directory structure of a model store could look something like this:
    [diagram: model store directory structure]

  • And finally, a Predictor class could be made to consume the models/preprocessors/feature-extractors within a given model directory (a rough sketch follows at the end of this comment):
    [diagram: Predictor class]

  • So overall the user (e.g. ceph) workflow could be something like this:
    [diagram: user workflow]

These are just some ideas and very much WIP, so they could be completely off base too :) Let me know what you think!
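
For concreteness, here's a very rough sketch of what that Predictor class could look like. The file names and the pickle format here are pure assumptions, not a settled layout:

import os
import pickle


class Predictor:
    """Consumes the artifacts found in a given model directory and chains them."""

    def __init__(self, model_dir):
        self.feature_extractor = self._load(model_dir, "feature_extractor.pkl")
        self.preprocessor = self._load(model_dir, "preprocessor.pkl")
        self.model = self._load(model_dir, "model.pkl")

    @staticmethod
    def _load(model_dir, fname):
        with open(os.path.join(model_dir, fname), "rb") as f:
            return pickle.load(f)

    def predict(self, raw_smart_data):
        # raw SMART metrics -> features -> scaled features -> prediction
        features = self.feature_extractor.transform(raw_smart_data)
        features = self.preprocessor.transform(features)
        return self.model.predict(features)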

@durandom
Member

durandom commented May 12, 2021

I would defer the model-store stuff to a library and just use basic inheritance/interface design.

You want to make sure that the inference payload matches the expected values and the result matches an expected schema.

@chauhankaranraj @MichaelClifford @TreeinRandomForest is there a common way to specify input and output?
Like https://www.tensorflow.org/api_docs/python/tf/compat/v1/saved_model/signature_def_utils

From a ceph perspective it would be:

import rh_classifier as disk_health_classifier  # hyphens aren't valid in Python module names

estimate = disk_health_classifier.predict(smart_ctl)

Managing and loading the models is a matter for the ceph code.
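
As a rough sketch of that inheritance/interface idea (every class, method, and field name here is made up, not a settled API):

from abc import ABC, abstractmethod
from typing import Any, Dict, List


class DiskHealthPredictor(ABC):
    """Interface that any concrete disk health predictor implements."""

    @abstractmethod
    def initialize(self, model_dir: str) -> None:
        """Load model artifacts from a local directory."""

    @abstractmethod
    def predict(self, smart_data: List[Dict[str, Any]]) -> str:
        """Map a device's SMART records to a health label."""


class DummyDiskHealthClassifier(DiskHealthPredictor):
    """Trivial stand-in: flags a disk as soon as sectors get reallocated."""

    def initialize(self, model_dir: str) -> None:
        pass  # nothing to load for the dummy

    def predict(self, smart_data: List[Dict[str, Any]]) -> str:
        latest = smart_data[-1]
        return "bad" if latest.get("reallocated_sector_ct", 0) > 0 else "good"

ceph would then program against DiskHealthPredictor only, and stay agnostic to whichever concrete model sits behind it.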

@MichaelClifford
Member

@durandom, I'm not sure of a common standard. But there should certainly be a check on the inference payload before it is applied to the model. You can look at this older example here, where a validator function is applied to check that the input data has the correct dimensions. But this approach could be expanded into a more robust check on the data.

As far as output, this will be dictated by the model architecture and part of the design. As long as the input is correct and the model doesn't throw an error, the output should always be of the same schema. We could also create a check on the output data, but I think it would only be useful if the model architecture changed somehow.
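
For illustration, a rough sketch of such a payload check (the expected feature count of 20 is just a placeholder, borrowed from the ONNX example below):

import numpy as np

EXPECTED_NUM_FEATURES = 20  # placeholder; whatever the model was trained on


def validate_payload(X):
    """Basic checks on the inference payload before it reaches the model."""
    X = np.asarray(X, dtype=np.float32)
    if X.ndim != 2 or X.shape[1] != EXPECTED_NUM_FEATURES:
        raise ValueError(
            f"expected shape (n_samples, {EXPECTED_NUM_FEATURES}), got {X.shape}"
        )
    if not np.isfinite(X).all():
        raise ValueError("payload contains NaN or infinite values")
    return X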

@chauhankaranraj
Member Author

> @durandom, I'm not sure of a common standard. But there should certainly be a check on the inference payload before it is applied to the model. You can look at this older example here, where a validator function is applied to check that the input data has the correct dimensions. But this approach could be expanded into a more robust check on the data.

> @chauhankaranraj @MichaelClifford @TreeinRandomForest is there a common way to specify input and output?

Hm, is ONNX something we can consider here? Maybe we can solve multiple issues with it:

  • Instead of saving sklearn/tf/pytorch models in their native formats, we could convert them to ONNX compute graphs and save those. This way, there's a universal format in which the models are stored.

  • This should let us specify the input schema, e.g. the input has to be a float tensor of shape (None, 20):

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# my_classifier is a fitted sklearn model; the input is declared as a
# float tensor with 20 features and a variable batch dimension
initial_type = [('float_input', FloatTensorType([None, 20]))]
onx = convert_sklearn(my_classifier, initial_types=initial_type)

with open("rf_iris.onnx", "wb") as f:
    f.write(onx.SerializeToString())

  • Users (e.g. ceph) won't need to install sklearn/tf/pytorch/mxnet based on how the model was created, but rather install just the ONNX runtime to run inference for all kinds of models.

import numpy as np
import onnxruntime.backend as backend

# load graph
rep = backend.prepare('rf_iris.onnx', 'CPU')

# convert input to float32, otherwise onnxruntime will throw an error
X_test = X_test.astype(np.float32)

prediction = rep.run(X_test)
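
And since the schema is stored in the graph itself, a consumer could inspect it at load time. A rough sketch using onnxruntime's InferenceSession API (X_test as above):

import onnxruntime as ort

sess = ort.InferenceSession("rf_iris.onnx", providers=["CPUExecutionProvider"])

# the input name, type, and shape declared at conversion time
inp = sess.get_inputs()[0]
print(inp.name, inp.type, inp.shape)  # e.g. float_input tensor(float) [None, 20]

prediction = sess.run(None, {inp.name: X_test})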

What do you think?

@yaarith

yaarith commented May 14, 2021

> I would defer the model-store stuff to a library and just use basic inheritance/interface design.

I agree with @durandom; I think we can hold off on the abstraction for now and simply import the models from a library.

@chauhankaranraj
Member Author

>> I would defer the model-store stuff to a library and just use basic inheritance/interface design.
>
> I agree with @durandom; I think we can hold off on the abstraction for now and simply import the models from a library.

I see, so if I understand correctly, you're suggesting we should have the models within the module itself, right?

My initial thought was that the models should not directly be a part of the module. Instead, they should be stored in a separate store (e.g. an MLflow server / an AIOps GH repo / someone else's GH repo / a ceph bucket) and "pulled" by the module as required by the user (same idea as torch.hub shown here).
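
For concreteness, the pull-and-cache flow could look roughly like this (the store URL and cache path are completely made up):

import os
import urllib.request

MODEL_STORE_URL = "https://example.com/model-store"  # hypothetical store
CACHE_DIR = os.path.expanduser("~/.cache/disk-health-models")


def fetch_model(name):
    """Download a model on first use; reuse the cached copy afterwards."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    local_path = os.path.join(CACHE_DIR, f"{name}.onnx")
    if not os.path.exists(local_path):
        urllib.request.urlretrieve(f"{MODEL_STORE_URL}/{name}.onnx", local_path)
    return local_path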

But if this adds more complexity than needed, we can definitely defer it and directly include the pretrained models in the module itself :)

@yaarith

yaarith commented May 17, 2021

> I see, so if I understand correctly, you're suggesting we should have the models within the module itself, right?

Yes; not all clusters have access to the internet, and in those that do, we would need to cache the model so it isn't fetched from the internet on every run. So we can start by keeping it local in the module; we can always add access to the other services in the future.

@chauhankaranraj
Member Author

> Yes; not all clusters have access to the internet, and in those that do, we would need to cache the model so it isn't fetched from the internet on every run. So we can start by keeping it local in the module; we can always add access to the other services in the future.

Makes sense! Thanks for the feedback :)
