Review ETL processes for sanitization, robustness and correctness #45

seanshahkarami · 2017-09-07T14:46:48Z

We should do a simple review of the main processes that load data into the databases, process it, and so on. Some examples of what we're looking for:

Do they apply sanitization? For example, ensure consistent node_ids, encoding, naming, etc.
Do they handle invalid data correctly? At least one process just drops bad blobs on failure. We would probably like to flag that data and put it into an error queue or similar for later inspection.
Are they tolerant of database and broker delays, timeouts, etc.? This means not crashing immediately if the database is busy, ensuring messages are properly acknowledged, and so on.
Are they relatively efficient in their implementation?
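
As a rough illustration of the first two points, here is a minimal sketch of a worker step that sanitizes a record and routes bad blobs to an error queue instead of dropping them. Everything in it is an assumption made for illustration: the JSON payload format, the `normalize_node_id` rule (lowercase hex, zero-padded to 16 characters), and the `store` and `error_queue` parameters are hypothetical and do not describe the existing beehive workers.

```python
import json
import queue


def normalize_node_id(node_id):
    """Hypothetical rule: trim, lowercase, and zero-pad a hex id to 16 characters."""
    if not isinstance(node_id, str):
        raise ValueError(f"node_id must be a string, got {type(node_id).__name__}")
    cleaned = node_id.strip().lower()
    if not cleaned or len(cleaned) > 16 or any(c not in "0123456789abcdef" for c in cleaned):
        raise ValueError(f"invalid node_id: {node_id!r}")
    return cleaned.rjust(16, "0")


def process_blob(blob, store, error_queue):
    """Sanitize one blob and store it; on failure, keep the blob for later inspection.

    Returns True if the record was stored, False if it went to the error queue.
    """
    try:
        record = json.loads(blob.decode("utf-8"))
        record["node_id"] = normalize_node_id(record["node_id"])
    except (ValueError, KeyError, TypeError) as exc:
        # UnicodeDecodeError and json.JSONDecodeError are both ValueError subclasses.
        error_queue.put({"blob": blob, "error": repr(exc)})
        return False
    store(record)
    return True


# Usage: an in-memory queue and list stand in for the real broker and database.
errors = queue.Queue()
stored = []
process_blob(b'{"node_id": " 1A2B "}', stored.append, errors)
process_blob(b"\xff\xfe", stored.append, errors)
```

In a real deployment the error queue would more likely be a separate broker queue than an in-process one, so that flagged data survives a worker restart.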

This is worth looking at and getting correct now, as these will be part of our architecture regardless of how we redesign beehive.

seanshahkarami · 2017-09-08T22:33:02Z

All the workers now have proper connection retries when starting, so that should cut down on workers crashing immediately and restarting when the message broker is down.
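
For reference, one common way to do a startup retry like this is exponential backoff. The sketch below is not the code that went into the workers; `connect_with_retry`, its parameters, and the use of `ConnectionError` as the failure signal are all assumptions for illustration.

```python
import time


def connect_with_retry(connect, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call connect() until it succeeds, waiting longer after each failure.

    `connect` is any zero-argument callable that raises ConnectionError on failure.
    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise
            # Delay doubles each attempt: base, 2*base, 4*base, ... capped at max_delay.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(delay)


# Usage: a fake broker connection that fails twice before succeeding.
attempts = {"count": 0}


def flaky_connect():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("broker not ready")
    return "connected"


waits = []
result = connect_with_retry(flaky_connect, sleep=waits.append)
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.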

seanshahkarami added data reliability labels Sep 7, 2017

seanshahkarami added this to the Beehive "Good Enough" Candidate milestone Sep 7, 2017

seanshahkarami changed the title ~~Review ETL processes for data sanitization, robustness and correctness~~ Review ETL processes for sanitization, robustness and correctness Sep 7, 2017

gemblerz self-assigned this Sep 8, 2017
