This repository has been archived by the owner on Feb 17, 2022. It is now read-only.
We should do a simple review of the main processes that load data into the databases, process it, and so on. Some examples of what we're looking for:
Do they sanitize their input? For example, do they ensure consistent node_ids, encoding, naming, etc.?
Do they handle invalid data correctly? At least one process just drops bad blobs on failure. We would probably rather flag that data and put it into an error queue or something similar for later inspection.
Are they tolerant of database and broker delays, timeouts, etc.? This means things like not crashing immediately if the database is busy, making sure messages are acknowledged properly, etc.
Are they relatively efficient in their implementation?
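As a rough illustration of the first two questions, here is a minimal sketch of a worker step that normalizes `node_id` and flags bad messages instead of dropping them. Everything in it is an assumption made for the example: the 16-character lowercase hex form of `node_id`, the dict-shaped message with `node_id` and `payload` keys, and the plain lists standing in for the database and the error queue. None of it is taken from the actual beehive workers.

```python
import re

# Assumed canonical form: 16 lowercase hex characters.
NODE_ID_RE = re.compile(r"[0-9a-f]{16}")


def normalize_node_id(raw):
    """Return a canonical node_id, or raise ValueError if it cannot be one."""
    if isinstance(raw, bytes):
        # UnicodeDecodeError is a subclass of ValueError.
        raw = raw.decode("utf-8")
    if not isinstance(raw, str):
        raise ValueError(f"node_id has unsupported type: {type(raw).__name__}")
    # Trim whitespace, lowercase, and left-pad short ids with zeros.
    node_id = raw.strip().lower().rjust(16, "0")
    if not NODE_ID_RE.fullmatch(node_id):
        raise ValueError(f"invalid node_id: {raw!r}")
    return node_id


def process(message, store, error_queue):
    """Store a valid message; flag an invalid one instead of dropping it.

    Returns True if the message was stored, False if it was flagged.
    `store` and `error_queue` are hypothetical stand-ins (plain lists).
    """
    try:
        record = (normalize_node_id(message["node_id"]), message["payload"])
    except (KeyError, ValueError) as exc:
        # Keep the original message and the reason for later inspection.
        error_queue.append({"message": message, "error": repr(exc)})
        return False
    store.append(record)
    return True
```

In a real worker the message would presumably be acknowledged to the broker only after one of the two writes (store or error queue) has succeeded, so that a crash in between leads to redelivery rather than data loss.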
This is worth looking at and getting correct now, as these will be part of our architecture regardless of how we redesign beehive.
seanshahkarami changed the title from "Review ETL processes for data sanitization, robustness and correctness" to "Review ETL processes for sanitization, robustness and correctness" on Sep 7, 2017
All the workers now have proper connection retries on startup, so that should cut down on workers crashing immediately and restarting when the message broker is down.
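For reference, a startup retry of this kind could look roughly like the sketch below. This is not the code that went into the workers. `connect_with_retry`, its parameters, and the use of the built-in `ConnectionError` are all assumptions for illustration; a real broker or database client will usually raise its own exception types, which would need to be caught instead.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call `connect()` until it succeeds, backing off exponentially.

    `connect` is any zero-argument callable that returns a connection or
    raises ConnectionError. The last failure is re-raised once `attempts`
    tries have been used up.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise
            sleep(delay)
            # Double the wait each time, up to max_delay.
            delay = min(delay * 2, max_delay)
```

Capping the number of attempts still lets a supervisor restart the worker if the broker stays down for a long time, while short outages are absorbed without a crash.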