Thanks @vlcinsky! We'll be working on this in the coming month, and your suggestions will be really useful. cc @shashigharti
When I found the Frictionless Data framework and libraries, it looked like a dream come true.
Importing 24 CSV files into an SQL database in 3 lines - a great experience
Having 24 CSV files with complex relationships, I described them properly (incl. foreign keys etc.) in `data/package.yaml`. The following lines were supposed to import the files into an SQLite database:
And it really worked, including importing the files in the proper order.
This was a really astonishing experience; that does not happen often.
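The "proper order" is essentially a topological sort: a table can only be loaded after every table its foreign keys point to. As an illustration only (not the `frictionless` implementation), here is a minimal sketch using the standard library's `graphlib` (Python 3.9+), assuming the foreign-key relationships have already been collected into a dict. The table names are made up.

```python
import graphlib

# Hypothetical tables: each maps to the tables its foreign keys reference.
FOREIGN_KEYS = {
    "countries": set(),
    "cities": {"countries"},
    "customers": {"cities"},
    "products": set(),
    "orders": {"customers", "products"},
}


def import_order(references: dict[str, set[str]]) -> list[str]:
    """Return table names so that every referenced table precedes its referrers.

    Raises graphlib.CycleError if the foreign keys form a cycle.
    """
    # TopologicalSorter expects "node -> predecessors", which is exactly
    # "table -> tables it references".
    return list(graphlib.TopologicalSorter(references).static_order())


print(import_order(FOREIGN_KEYS))
```

Tables without mutual dependencies may come out in any relative order; only the "referenced before referrer" constraint is guaranteed.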
Importing data from Excel and formatting the table - frustrating
My next task was to import Excel files from multiple people. There was an agreement on column names, but I did not expect everyone to keep the same column order, so my plan was to have a pipeline doing the following steps:
The resulting resource (table) should be ready for further processing.
This time, I spent a few days trying to understand the `frictionless` Python package and its usage, and the experience was rather frustrating.
Transform functions have very limited documentation
The code contains many notes about incomplete implementation
The code of the functions is full of notes that sound very dangerous, such as:
Some transformation step functions are broken
E.g. `field_filter` messes up the field and value order (see issue #1155).
The transform raises/reports strange errors
The errors were about things such as:
Lazy evaluation makes debugging transformations difficult
By design, the processing uses lazy evaluation. This has obvious advantages, but it also makes debugging rather difficult.
I am sure the core team members know some effective way to debug such situations. It would be very handy to share it with users.
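The method the core team uses is not described here. One generic technique for lazy pipelines is to wrap every step so that an error raised during iteration is attributed to the step that caused it, and to evaluate a few rows eagerly while inspecting. The sketch below uses plain Python generators rather than the `frictionless` API; `StepError`, `traced`, `run`, `peek` and the two steps are hypothetical names.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

Row = dict
Step = Callable[[Iterable[Row]], Iterator[Row]]


class StepError(RuntimeError):
    """Raised when a lazy step fails; the message names the failing step."""


def traced(name: str, step: Step) -> Step:
    """Wrap a lazy step so that an error raised while iterating names the step."""
    def wrapper(rows: Iterable[Row]) -> Iterator[Row]:
        count = 0
        try:
            for row in step(rows):
                yield row
                count += 1
        except StepError:
            raise  # already attributed by an upstream wrapper
        except Exception as error:
            raise StepError(f"step {name!r} failed after {count} row(s)") from error
    return wrapper


def run(rows: Iterable[Row], steps: list[tuple[str, Step]]) -> Iterator[Row]:
    """Chain the steps lazily; nothing executes until the result is consumed."""
    stream: Iterable[Row] = rows
    for name, step in steps:
        stream = traced(name, step)(stream)
    return iter(stream)


def peek(rows: Iterator[Row], n: int = 3) -> list[Row]:
    """Evaluate the first n rows eagerly (note: they are consumed from rows)."""
    return list(islice(rows, n))


# Two hypothetical steps, used only for illustration.
def strip_keys(rows: Iterable[Row]) -> Iterator[Row]:
    for row in rows:
        yield {key.strip(): value for key, value in row.items()}


def parse_int(field: str) -> Step:
    def step(rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            yield {**row, field: int(row[field])}
    return step
```

For example, `rows = run([{" id ": "1"}, {" id ": "x"}], [("strip_keys", strip_keys), ("parse_int", parse_int("id"))])` builds the pipeline without running anything; `peek(rows, 1)` returns `[{'id': 1}]`, and consuming the rest raises `StepError` naming `parse_int`, with the original `ValueError` attached as its cause.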
Documentation about custom steps is too short
The Transform Guide shows a short custom step defined as a plain function, while the Step Guide uses a class derived from `Step`. Apart from these two different code samples, there is no explanation of how to use them and what to take care of.
How to improve the experience
To summarize my experience with the Python library `frictionless`, I would say:
The following actions could improve the experience:
Document functions properly using docstrings
For functions intended for general reuse (e.g. in pipelines), take care to write a proper specification of what they do in the docstring.
Docstrings are easily accessible to developers from the IDE as help text for the given function.
If there is only one place to put a function description, it should be the docstring. There are tools to generate reference documentation from docstrings.
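A sketch of what such a docstring could contain, for a hypothetical pipeline helper `normalize_header` (not part of `frictionless`). The same text is what an IDE shows as help, what `help()` and `inspect.getdoc()` return, and what tools such as Sphinx autodoc render into reference documentation.

```python
import inspect


def normalize_header(header: list[str]) -> list[str]:
    """Return a normalized copy of a table header.

    Each name is stripped of surrounding whitespace and lower-cased.
    The order of names is preserved and the input list is not modified.

    Args:
        header: Field names as read from the source file.

    Returns:
        A new list with the normalized names, in the original order.

    Raises:
        ValueError: If two names become equal after normalization.
    """
    result = [name.strip().lower() for name in header]
    if len(set(result)) != len(result):
        raise ValueError(f"duplicate field names after normalization: {result}")
    return result


# The docstring is available programmatically, just as an IDE shows it.
print(inspect.getdoc(normalize_header).splitlines()[0])
```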
Mark incomplete functions as experimental (in the docstring)
If a function's implementation is incomplete from a functional point of view, declare it in the docstring (very early) as "experimental", "WIP" or similar, to warn the coder that there might be surprises.
Missing optimization is not a reason to mark a function as experimental.
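One possible convention, sketched with a hypothetical `experimental` decorator and `ExperimentalWarning` class (neither is an existing `frictionless` helper): the notice is placed at the very top of the docstring, where IDEs and generated docs show it first, and a warning is emitted whenever the function is called.

```python
import functools
import warnings


class ExperimentalWarning(UserWarning):
    """Issued when a function marked as experimental is called."""


def experimental(reason: str):
    """Mark a function as experimental in its docstring and warn on each call."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__qualname__} is experimental: {reason}",
                ExperimentalWarning,
                stacklevel=2,  # point the warning at the caller, not the wrapper
            )
            return func(*args, **kwargs)

        # Put the notice first, so IDE help and generated docs show it immediately.
        wrapper.__doc__ = f"EXPERIMENTAL: {reason}\n\n{func.__doc__ or ''}"
        return wrapper

    return decorate


@experimental("fields missing from `order` are not handled yet")
def reorder_fields(names: list[str], order: list[str]) -> list[str]:
    """Return names sorted by their position in order (hypothetical example)."""
    # Incomplete on purpose: raises ValueError for names missing from order.
    return sorted(names, key=order.index)
```

Because `ExperimentalWarning` is its own category, users can silence or escalate it with the usual `warnings` filters, e.g. `warnings.simplefilter("error", ExperimentalWarning)` in a test suite.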
Document (in docstring) all aspects of behavior
E.g. for `field_filter`, describe that:
Such a description would help not only users but also library developers, as it is a great specification for creating test cases.
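As an illustration, here is a hypothetical pure-Python `filter_fields` (not the `frictionless` step) whose docstring spells out its behavior point by point, with `doctest` examples that double as test cases. The listed rules are assumptions chosen for the example.

```python
import doctest


def filter_fields(header: list[str], rows: list[list], names: list[str]):
    """Keep only the named fields of a table (hypothetical example).

    Behavior:
      * Kept fields stay in their original ``header`` order, whatever the
        order of ``names``.
      * Every value stays aligned with its field name.
      * A name that is not in ``header`` raises ``ValueError``.
      * The inputs are not modified; new lists are returned.

    >>> filter_fields(["id", "name", "age"], [[1, "Ann", 30]], ["age", "id"])
    (['id', 'age'], [[1, 30]])
    >>> filter_fields(["id"], [[1]], ["missing"])
    Traceback (most recent call last):
    ...
    ValueError: unknown field(s): ['missing']
    """
    unknown = [name for name in names if name not in header]
    if unknown:
        raise ValueError(f"unknown field(s): {unknown}")
    wanted = set(names)
    # Positions are taken from the header, so the original order is preserved.
    positions = [i for i, field in enumerate(header) if field in wanted]
    new_header = [header[i] for i in positions]
    new_rows = [[row[i] for i in positions] for row in rows]
    return new_header, new_rows


if __name__ == "__main__":
    doctest.testmod()  # runs the examples in the docstring as tests
```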
Write guides with checklists for specific coding topics
Identify important coding topics and write developer instructions for them. These should explain the principles of the task and the things to take care of (e.g. a checklist of important aspects such as "modify the data" + "update the schema incl. xxyy").
Such a text would help not only internal developers but also external contributors.
Possible topics include (some already exist, but are too short at the moment):
Focus on completing existing functionality
Seeing very promising functionality that later turns out to be broken is frustrating.
Turn all "implementation incomplete" notes into issues
Incomplete implementation (but not incomplete optimization) is really an issue.
Having such an issue in the tracker makes it possible to:
Update test suite where needed
Using the updated function documentation and the checklists from the guides, one can make the test suites more complete.
Hint: If a test would fail due to an incomplete function implementation, you may mark it as an expected failure until it is fixed. This also serves as a very clear sign to potential users of which cases would probably not work properly.
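With the standard library's `unittest`, such a test can be decorated with `unittest.expectedFailure` (pytest offers `pytest.mark.xfail`, which also accepts a `reason`). A sketch around a hypothetical, deliberately incomplete `select_fields` function:

```python
import unittest


def select_fields(header: list[str], names: list[str]) -> list[str]:
    """Hypothetical and deliberately incomplete: should keep header order,
    but currently returns the names in the requested order."""
    return [name for name in names if name in header]


class SelectFieldsTest(unittest.TestCase):
    def test_keeps_selected_names(self):
        self.assertEqual(set(select_fields(["a", "b", "c"], ["c", "a"])), {"a", "c"})

    # Known gap in the implementation: kept as an expected failure until it is
    # fixed (ideally with a link to the issue tracking it).
    @unittest.expectedFailure
    def test_keeps_header_order(self):
        self.assertEqual(select_fields(["a", "b", "c"], ["c", "a"]), ["a", "c"])
```

Running it with `python -m unittest` reports one expected failure; once the function is fixed, the test is reported as an unexpected success, a reminder to remove the decorator.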
Resolve the "implementation incomplete" issues
Every completed implementation brings real value to end users.