Add default way to specify upstream data for this node #69

Open
kzecchini opened this issue Mar 3, 2020 · 0 comments

kzecchini commented Mar 3, 2020

Right now, custom nodes need their own logic to find upstream data, sometimes by filtering on keys. However, upstream operations can cause those keys to have different names or a different format.

I think we could implement a standard way to find upstream data in the AbstractNode class. Every node would accept a standard configuration telling it where to search for its upstream data, which also gives us a way to specify the potential "data args" for each node.

For example, if I am looking for data_1 but it is keyed as my_upstream_key_1, we can add a configuration that fixes this mapping for us. Example:

```yaml
...
class: MyCustomNode
upstream_data:
  filter_for_key: my_filter
  data_1_key: my_upstream_key_1
  data_2_key: my_upstream_key_2
...
```
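A minimal sketch of how AbstractNode could apply such a configuration. Everything here is hypothetical: the function name `resolve_upstream_data`, the upstream layout `{upstream_node_name: {data_key: value}}`, and the reading of `filter_for_key` as "only search upstream nodes whose name contains this string". Each data arg uses its `<arg>_key` entry when one is configured and falls back to the documented arg name otherwise.

```python
from typing import Any, Dict, Mapping, Optional, Sequence

# Hypothetical layout of upstream data: {upstream_node_name: {data_key: value}}.
UpstreamData = Mapping[str, Mapping[str, Any]]


def resolve_upstream_data(
    arg_names: Sequence[str],
    upstream: UpstreamData,
    upstream_config: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Return {arg_name: value} for each data arg a node needs.

    The upstream key for an arg such as "data_1" comes from the config entry
    "data_1_key" when present; otherwise the arg name itself is used.
    """
    config = dict(upstream_config or {})
    node_filter = config.get("filter_for_key")

    # Assumed meaning of filter_for_key: only search upstream nodes whose
    # name contains the filter string.
    candidates = [
        data
        for node_name, data in upstream.items()
        if node_filter is None or node_filter in node_name
    ]

    resolved = {}
    for arg in arg_names:
        # Optional remapping; fall back to the documented arg name.
        key = config.get(f"{arg}_key", arg)
        matches = [data[key] for data in candidates if key in data]
        if not matches:
            raise KeyError(f"no upstream data for {arg!r} (looked for key {key!r})")
        if len(matches) > 1:
            raise KeyError(f"key {key!r} for {arg!r} found in several upstream nodes")
        resolved[arg] = matches[0]
    return resolved


if __name__ == "__main__":
    upstream = {
        "my_filter_reader": {"my_upstream_key_1": [[1, 2], [3, 4]], "my_upstream_key_2": 5},
        "other_node": {"my_upstream_key_2": 10},
    }
    config = {
        "filter_for_key": "my_filter",
        "data_1_key": "my_upstream_key_1",
        "data_2_key": "my_upstream_key_2",
    }
    print(resolve_upstream_data(["data_1", "data_2"], upstream, config))
    # → {'data_1': [[1, 2], [3, 4]], 'data_2': 5}
```

Without `filter_for_key`, `my_upstream_key_2` would match both upstream nodes, and the sketch raises instead of guessing. Failing loudly on ambiguity is a design choice made here, not something the proposal specifies.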

The documentation for each class can then list the data the node needs in the data_object, for example:

data_1 (pd.DataFrame): dataframe of training data
data_2 (int): number of cv folds
...
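A minimal sketch, assuming a hypothetical `DATA_ARGS` class attribute, of how a node could declare the data documented above so that the base class can check it. `AbstractNodeSketch` and `validate_data` are illustrative names, not an existing API, and `list` stands in for `pd.DataFrame` so the sketch runs without pandas.

```python
from typing import Any, Dict, Mapping


class AbstractNodeSketch:
    """Hypothetical stand-in for AbstractNode (not its real API).

    Subclasses declare the upstream data they need as DATA_ARGS, a mapping of
    arg name to expected type that mirrors the class documentation.
    """

    DATA_ARGS: Dict[str, type] = {}

    @classmethod
    def validate_data(cls, data: Mapping[str, Any]) -> Dict[str, Any]:
        """Return only the declared args from `data`, checking presence and type."""
        missing = [arg for arg in cls.DATA_ARGS if arg not in data]
        if missing:
            raise KeyError(f"{cls.__name__} is missing upstream data: {missing}")
        for arg, expected in cls.DATA_ARGS.items():
            if not isinstance(data[arg], expected):
                raise TypeError(
                    f"{cls.__name__}: {arg} should be {expected.__name__}, "
                    f"got {type(data[arg]).__name__}"
                )
        # Extra upstream entries are ignored; only declared args are returned.
        return {arg: data[arg] for arg in cls.DATA_ARGS}


class MyCustomNode(AbstractNodeSketch):
    """Upstream data:
        data_1 (list): training data (stand-in for pd.DataFrame)
        data_2 (int): number of cv folds
    """

    DATA_ARGS = {"data_1": list, "data_2": int}


if __name__ == "__main__":
    print(MyCustomNode.validate_data({"data_1": [[1, 2]], "data_2": 5, "extra": "ignored"}))
    # → {'data_1': [[1, 2]], 'data_2': 5}
```

Declaring the args once in code gives the base class a single list of names to search for and check, instead of each custom node reimplementing that logic.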

This way, whenever a node searches upstream, it can always find its data, because the optional remapping translates upstream keys into the names the node expects.
