Pipelines / Transform functionality discussion #720
-
@rufuspollock @risenW
https://colab.research.google.com/drive/1C4dFWDExyxzGIwLUovrDQZghZK4JK2PD
-
@roll great, and I've read through that. What do you think of doing a bit of syncing / planning before we proceed much further? Also some questions:
-
@rufuspollock Another point: in my opinion, it shouldn't be driven by some kind of committee 😃. Honestly, I won't know whether it's a good enough solution until real people start using it in real projects and give us feedback. For example, as far as I can remember, this project https://github.com/datasets/covid-19 was initially driven by DataFlows and it uses Pandas now. I guess you ran into some shortcomings of the pipeline and had to switch. It would also be very interesting to see whether we can prototype something like this using Frictionless Transform.
I use a pretty simple adapter that makes a Resource compatible with the PETL table container interface - https://petl.readthedocs.io/en/stable/intro.html#conventions-table-containers-and-table-iterators. This allowed us to reuse all of their battle-tested data processors and write only the metadata updates in our own processors. That said, we fully wrap PETL as a project, so our users won't need to go to their documentation once we have finished ours.
It's streaming.
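To illustrate the adapter idea, here is a minimal sketch of what "making a resource compatible with the table container convention" could look like. It is not the actual Frictionless code: `FakeResource`, `ResourceView`, and `rename_field` are hypothetical names invented for this example, and the sketch uses only the standard library rather than importing PETL. It relies only on the convention described at the PETL link above: a container's `__iter__` returns a fresh iterator whose first item is the header row, followed by the data rows.

```python
class FakeResource:
    """Hypothetical stand-in for a tabular resource (NOT the frictionless API)."""

    def __init__(self, field_names, open_rows):
        self.field_names = list(field_names)
        # Assumption: open_rows is a callable returning a fresh iterable of rows,
        # so the data can be streamed again on every pass.
        self._open_rows = open_rows

    def read_rows(self):
        return iter(self._open_rows())


class ResourceView:
    """Adapter exposing a resource as a PETL-style table container."""

    def __init__(self, resource):
        self._resource = resource

    def __iter__(self):
        # Header row first, then data rows, produced lazily (streaming).
        yield tuple(self._resource.field_names)
        for row in self._resource.read_rows():
            yield tuple(row)


def rename_field(table, old, new):
    """Toy processor: rewrites only the header and streams the data rows through."""
    rows = iter(table)
    header = next(rows)
    yield tuple(new if name == old else name for name in header)
    yield from rows


resource = FakeResource(["id", "name"], lambda: [(1, "english"), (2, "german")])
view = ResourceView(resource)
renamed = list(rename_field(view, "name", "title"))
```

Because `ResourceView.__iter__` is a generator function, each call produces an independent iterator, which is what lets several processors read the same source. In this sketch the processor touches only the header, which mirrors the split described above: data handling is delegated to existing row-level tooling while the wrapper takes care of metadata.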
-
Hi @roll, do you think we can close this? Has it been resolved in frictionless-py?
-
We're planning to have some kind of recommended pipeline / transform system(s) for Frictionless toolkit.
@roll I (rufus) am opening this issue so we have a place to discuss plans for transform / pipelines functionality. I think this is something worth discussing a bit and maybe even speccing in an RFC.
It's something @rufuspollock has now been involved with implementing several times (including dataflows and a new system called AirCan https://github.com/datopian/aircan), so we may want to think about whether we directly reuse this or build anew. It is also an area where there is a lot of existing open source tooling, so it is worth thinking about what we reuse vs. what we build ourselves.
Tasks
Analysis
Existing work (related to frictionless)
Existing work (non frictionless) - see also https://tech.datopian.com/flows/research/
Please preserve this line to notify @lwinfree (lead of this repository)