Usability issues #738

vlcinsky · 2022-07-09T20:50:33Z


Importing 24 CSV files to SQL database in 3 lines - great experience

Having 24 CSV files with complex relationships, I described them properly (incl. foreign keys etc.) in data/package.yaml.

The following lines were supposed to import the files into an SQLite database:

    from frictionless import Package

    # Load the descriptor and write all resources into an SQLite database
    package = Package("data/package.yaml")
    package.to_sql("sqlite:///sqlite.db")

And it really worked, including doing the import in the proper order.

This was a really astonishing experience; that does not happen often.

Import data from Excel and format the table - frustrating

My next task was to import Excel files from multiple people. There was an agreement about column names, but I did not expect everyone to follow the agreed column order, so my plan was to have a pipeline with these steps:

Import an Excel file
Select only fields with agreed names
Filter out empty records

The resulting resource (table) should be ready for further processing.
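To make the intended behavior concrete, here is a minimal sketch of the last two steps in plain Python. It deliberately does not use the frictionless API; the column names in AGREED_FIELDS and the helper names select_fields, drop_empty and pipeline are hypothetical, and rows are assumed to arrive as dictionaries already read from a sheet.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]

# Hypothetical agreed column names; the real ones are not part of this post.
AGREED_FIELDS = ["id", "name", "amount"]


def select_fields(rows: Iterable[Row], names: List[str]) -> Iterator[Row]:
    """Keep only the agreed fields, in the agreed order, whatever the input order was."""
    for row in rows:
        yield {name: row.get(name) for name in names}


def drop_empty(rows: Iterable[Row]) -> Iterator[Row]:
    """Skip records in which every value is None or a blank string."""
    for row in rows:
        if any(value is not None and str(value).strip() != "" for value in row.values()):
            yield row


def pipeline(rows: Iterable[Row]) -> List[Row]:
    """Run both steps and materialize the result for further processing."""
    return list(drop_empty(select_fields(rows, AGREED_FIELDS)))


# Rows as they might arrive from one contributor: an extra column,
# a different column order, and one completely empty record.
raw_rows = [
    {"amount": 10, "note": "x", "name": "alpha", "id": 1},
    {"amount": None, "note": "", "name": "", "id": None},
    {"amount": 7, "note": "y", "name": "beta", "id": 2},
]

result = pipeline(raw_rows)
```

The field selection runs before the empty-record filter, so a record that has values only in non-agreed columns also counts as empty in this sketch.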

This time, I spent a few days trying to understand the frictionless Python package and its usage, and the experience was rather frustrating.

Transform functions have very limited documentation

The Python code itself has docstrings with a single line that simply repeats what the function name already says
The Transform Steps documentation shows only sample code to run, mostly without a word of explanation
- There are 40+ transformation steps, and only two of them (table_aggregate and table_normalize) have a sentence or two of explanation
- The table_normalize documentation talks about the function "fixing dimensions". Real usage reveals that columns are indeed fixed properly, but empty rows are not removed, so I do not understand why the word "dimensions" is plural.

Code contains many notes about incomplete implementation

The code for the functions is full of notes that sound very dangerous, such as:

    # Currently, metadata profiles are not fully finished; will require improvements
    # Some of the following step use **options - we need to review/fix it
    # We need to review table_pivot step as it's not fully implemented/tested
    # We need to review table_validate step as it's not fully implemented/tested
    # We need to review table_write step as it's not fully implemented/tested
    # We need to review how we use "target.schema.fields.clear()"

Some transformation step functions are broken

E.g. field_filter messes up field and value order (see issue #1155).

The transform raises/reports strange errors

The errors were about things such as:

the real number of rows differs from the number reported by the resource
the resource.sample property is empty

Lazy evaluation makes debugging transformations difficult

The processing is by design using lazy evaluation. This has apparent advantages, but also makes debugging rather difficult.
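The difficulty can be illustrated with plain Python generators. This is only a generic sketch of lazy evaluation, not frictionless code; parse_numbers, double and failing_step are hypothetical names. Building the pipeline raises nothing, and the error surfaces only when the rows are consumed, far from the step that caused it.

```python
from typing import Iterable, Iterator, List


def parse_numbers(lines: Iterable[str]) -> Iterator[int]:
    """Lazy step: nothing is parsed until a consumer asks for a value."""
    for line in lines:
        yield int(line)  # fails on bad input, but only when this row is reached


def double(numbers: Iterable[int]) -> Iterator[int]:
    """Second lazy step, stacked on top of the first."""
    for number in numbers:
        yield number * 2


source = ["1", "2", "oops", "4"]

# Building the pipeline runs no step code, so no error is raised here.
lazy_result = double(parse_numbers(source))

# The failure appears only on consumption, after some rows already passed.
consumed: List[int] = []
error_message = ""
try:
    for value in lazy_result:
        consumed.append(value)
except ValueError as exc:
    error_message = str(exc)


def failing_step(lines: List[str]) -> str:
    """Materialize after each step to find out which step raises."""
    try:
        parsed = list(parse_numbers(lines))
    except ValueError:
        return "parse_numbers"
    try:
        list(double(parsed))
    except ValueError:
        return "double"
    return ""
```

Materializing after every step, as failing_step does, gives up the memory benefit of laziness, so it is something to do only while debugging.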

I am sure core team members know a working method for debugging such situations effectively. It would be very handy to share it with users.

Too short documentation about custom steps

The Transform Guide shows a short custom step defined as a plain function, while the Step Guide uses a class derived from Step.

Apart from the two different code samples, there is no explanation of how to use them or what to take care of.

How to improve the experience

To summarize my experience with the Python library frictionless, I would say:

very promising
some features are already working excellently
generally under-documented
the implementation of a significant number of functions is incomplete, and this fact is not clearly declared to users

The following actions could improve the experience:

Document functions properly using docstrings

For functions intended for general reuse (e.g. in pipelines), take care to write a proper specification of what each one does in its docstring.

Docstrings are easily accessible to a developer from the IDE as the help string for a given function.

If there is only one place to put a function description, it is the docstring. There are tools to generate reference documentation from docstrings.

Mark (in docstring) incomplete functions as experimental

If a function's implementation is incomplete from a functional point of view, declare it in the docstring (very early) as "experimental", "WIP" or similar to inform the coder that there might be surprises.

Missing optimization is not a reason for being marked as experimental.
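As a sketch of what such a marker could look like, the hypothetical function group_rows below puts the word EXPERIMENTAL at the very start of its docstring and says what is missing. The runtime warning is an optional extra that I am assuming some projects would want; the docstring alone already covers the suggestion.

```python
import warnings
from typing import Dict, List

Row = Dict[str, object]


def group_rows(rows: List[Row], key: str) -> Dict[object, List[Row]]:
    """[EXPERIMENTAL] Group rows by the value of ``key``.

    Incomplete: only grouping is implemented; aggregating the grouped
    values is not, so results may differ from what the name suggests.

    Rows that lack ``key`` are grouped under ``None``.
    """
    # Optional: repeat the docstring marker at run time.
    warnings.warn("group_rows is experimental", FutureWarning, stacklevel=2)
    grouped: Dict[object, List[Row]] = {}
    for row in rows:
        grouped.setdefault(row.get(key), []).append(row)
    return grouped
```

Because the marker is on the first docstring line, it shows up in IDE tooltips and in the one-line summaries of generated reference documentation.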

Document (in docstring) all aspects of behavior

E.g. for field_filter, describe that:

the order of names is significant, as it reorders fields in the output
when a name that does not exist among the field names is used, that field will not be present in the output
the resulting table can have zero fields
metadata are changed as follows:
- schema fields are updated
- the unique and foreign key properties of the schema are (not?) updated if they contain fields that are removed

Such a description would help not only a user, but also a library developer, as it is a great specification for creating test cases.
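The following sketch shows how such a docstring could read and how each documented point turns into a test case. It is plain Python, not the frictionless field_filter: filter_fields is a hypothetical function, the schema is a simplified dictionary with a list of field names, and dropping a primary key that refers to a removed field is an assumption of this sketch, since the actual library behavior is exactly the open question above.

```python
from typing import Dict, List, Tuple

Schema = Dict[str, list]
Row = Dict[str, object]


def filter_fields(schema: Schema, rows: List[Row], names: List[str]) -> Tuple[Schema, List[Row]]:
    """Keep only the fields listed in ``names``.

    - The order of ``names`` is significant: output fields follow it.
    - A name that matches no existing field is ignored.
    - The result may have zero fields.
    - Metadata: ``schema["fields"]`` is updated; ``schema["primaryKey"]`` is
      dropped when it refers to a removed field (assumption of this sketch).
    """
    kept = [name for name in names if name in schema["fields"]]
    new_schema: Schema = {"fields": kept}
    key = schema.get("primaryKey", [])
    if key and all(name in kept for name in key):
        new_schema["primaryKey"] = list(key)
    new_rows = [{name: row.get(name) for name in kept} for row in rows]
    return new_schema, new_rows


schema = {"fields": ["id", "name", "age"], "primaryKey": ["id"]}
rows = [{"id": 1, "name": "a", "age": 30}]

# One call per documented behavior; each can become a test case.
reordered = filter_fields(schema, rows, ["age", "id"])
unknown = filter_fields(schema, rows, ["name", "missing"])
empty = filter_fields(schema, rows, [])
```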

Write guides with checklists for specific coding topics

Identify important coding topics and write developer instructions for them. These should explain the principles of the task and the things to take care of (e.g. using a checklist of important aspects such as "modify the data" + "update the schema incl. xxyy").

Such a text would help not only internal developers, but also external contributors.

Possible topics include (some already exist, but are too short at the moment):

transformation function or class for resource
transformation function or class for dataset
transformation function written as a function
transformation function written as a Step subclass
creating Plugin…

Focus on completing existing functionality

Seeing very promising functionality which later turns out to be broken is frustrating.

Turn all "implementation incomplete" notes into issues

Incomplete implementation (but not incomplete optimization) is really an issue.

Having such an issue in the tracker makes it possible to:

describe the incompleteness in detail
discuss it
focus on completing it

Update test suite where needed

Using the updated function documentation and the checklists from the guides, one can make the test suites more complete.

Hint: If there is a test which would fail due to an incomplete function implementation, you may mark it as an expected failure until it is fixed. It also serves as a very clear sign to potential users of which cases would probably not work properly.
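With the standard library unittest module, the hint could look like the sketch below. The function drop_empty_rows and the test class are hypothetical stand-ins for an incomplete step; the project's own test runner may offer an equivalent marker.

```python
import unittest


def drop_empty_rows(rows):
    """Hypothetical step under test; removing blank rows is not implemented yet."""
    return list(rows)


class DropEmptyRowsTest(unittest.TestCase):
    def test_keeps_filled_rows(self):
        self.assertEqual(drop_empty_rows([{"a": 1}]), [{"a": 1}])

    @unittest.expectedFailure
    def test_removes_blank_rows(self):
        # Known gap: this fails until the implementation is completed.
        self.assertEqual(drop_empty_rows([{"a": 1}, {"a": None}]), [{"a": 1}])


def run_suite() -> unittest.TestResult:
    """Run the test case programmatically and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DropEmptyRowsTest)
    result = unittest.TestResult()
    suite.run(result)
    return result


outcome = run_suite()
```

The suite still counts as successful while the gap exists, and the marked test is listed separately as an expected failure, so the missing case stays visible to anyone reading the tests.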

Resolve the "implementation incomplete" issues

Every completed implementation brings real value to end users.

roll · 2022-07-11T13:50:18Z

Maintainer

Thanks @vlcinsky!

We'll be working on this in the coming month, and your suggestions will be really useful. cc @shashigharti
