Improving CBS tables and pipeline #1791

Open
10 tasks
atalyaalon opened this issue May 12, 2021 · 0 comments

atalyaalon commented May 12, 2021

  • Relevant documentation and possible tasks can be found here
  • Read the attached CBS doc, the code, and the raw data carefully.
    Important note: the CBS loading code now lives in the anyway repo and runs every week in anyway-etl, in this flow.
    Ideally we will run the code using anyway-etl (the code itself can stay in the anyway repo).
    The anyway-etl implementation is not ready yet: it has a logical bug, discussed here
  • Prioritize tasks.

First priority tasks:

  • Make sure street and city names don't contain stray whitespace (such as the \t found in this issue)
  • Make sure Hebrew DB tables (markers_hebrew, vehicles_hebrew, involved_hebrew, involved_markers_hebrew, vehicles_markers_hebrew) don't have missing data when loading data. From the discussion in this issue it seems data is always consistent but please re-verify it.
  • Make sure the raw CBS DB tables (markers, vehicles, involved) don't have missing data while data is being loaded. Currently, when CBS data is loaded in executor.py, the ANYWAY map is “unstable” for a few minutes (data disappears for a while), because the data in the markers, vehicles and involved tables is deleted and then reloaded. This happens in this flow, with this command: python3 main.py process cbs --source s3
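
For the whitespace task, a minimal sketch of a cleanup helper that could be applied to street and city names before they are written to the DB. The function name clean_name is hypothetical and not taken from the anyway code base. Collapsing internal whitespace runs into a single space is an assumption about the desired behavior, not something stated in the linked issue.

```python
import re

# Matches any run of whitespace: tabs, newlines, repeated spaces, etc.
_WHITESPACE_RUN = re.compile(r"\s+")


def clean_name(value):
    """Collapse whitespace runs into one space and trim both ends.

    Hypothetical helper: `value` is a street or city name as read from
    the raw CBS files. None is passed through unchanged.
    """
    if value is None:
        return None
    return _WHITESPACE_RUN.sub(" ", value).strip()
```

A check along the same lines could be run over the loaded tables to verify that no stored name differs from its cleaned form.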

Future tasks (to be discussed before they are prioritized, since they are not yet a priority):

  • Revise the current CBS tables and create a specification document for the new/updated CBS table schema (including documentation tools and/or methods). Current table sizes
  • Create a specification document for CBS Pipeline
  • After the above documents are reviewed by team members - we'll start the implementation
  • Change the cbs_locations table to a materialized view, OR make sure it is stable (right now it is truncated and then recalculated, so there is a short period with no data; see here)
  • Delete Infographics caches that were created in the past but are no longer in use
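
One possible way to avoid the "no data" window described for cbs_locations (and for markers, vehicles and involved) is to build the new data in a staging table and swap it in within a single transaction, so readers see either the old rows or the new rows. The sketch below only illustrates the idea: it uses SQLite from the Python standard library so that it runs on its own, whereas the real pipeline and its database may behave differently. The function name, the two-column schema and the _staging/_old suffixes are all hypothetical.

```python
import sqlite3


def reload_table_atomically(conn, table, rows):
    """Rebuild `table` in a staging copy and swap it in atomically.

    Assumes `conn` was opened with isolation_level=None so that
    transactions are controlled explicitly here. `table` must be a
    trusted internal name, since it is interpolated into the SQL.
    The (id, city) schema is a placeholder, not the real CBS schema.
    """
    staging = f"{table}_staging"
    old = f"{table}_old"
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP TABLE IF EXISTS {staging}")
        conn.execute(f"CREATE TABLE {staging} (id INTEGER PRIMARY KEY, city TEXT)")
        conn.executemany(f"INSERT INTO {staging} (id, city) VALUES (?, ?)", rows)
        # Swap: the live table is replaced only after staging is fully loaded.
        conn.execute(f"ALTER TABLE {table} RENAME TO {old}")
        conn.execute(f"ALTER TABLE {staging} RENAME TO {table}")
        conn.execute(f"DROP TABLE {old}")
        conn.execute("COMMIT")
    except Exception:
        # Any failure leaves the previous table contents untouched.
        conn.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE cbs_locations (id INTEGER PRIMARY KEY, city TEXT)")
    conn.execute("INSERT INTO cbs_locations VALUES (1, 'old')")
    reload_table_atomically(conn, "cbs_locations", [(1, "Haifa"), (2, "Eilat")])
    print(conn.execute("SELECT id, city FROM cbs_locations ORDER BY id").fetchall())
```

Whether a swap like this or a materialized view fits better is part of the discussion this task asks for; the sketch only shows that a reload does not have to pass through an empty table.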

Important notes:

  • Additional thoughts and improvements on CBS loading process and loading outputs are welcome
@atalyaalon atalyaalon added this to the v0.6.0 milestone May 12, 2021
@atalyaalon atalyaalon assigned atalyaalon and unassigned atalyaalon May 12, 2021
@atalyaalon atalyaalon assigned OriHoch and unassigned OriHoch Jun 22, 2021
@atalyaalon atalyaalon modified the milestones: v0.6.0, v0.7.0 Jul 3, 2021
@atalyaalon atalyaalon changed the title POC - rebuilding CBS tables for better widgets performance POC - rebuilding CBS tables and pipeline Jul 7, 2021
@atalyaalon atalyaalon changed the title POC - rebuilding CBS tables and pipeline Rebuilding CBS tables and pipeline Jul 7, 2021
@atalyaalon atalyaalon modified the milestones: v0.7.0, v0.8.0 Aug 14, 2021
@atalyaalon atalyaalon modified the milestones: v0.8.0, v0.10.0, v0.9.0 Sep 11, 2021
@atalyaalon atalyaalon changed the title Rebuilding CBS tables and pipeline Improving CBS tables and pipeline Dec 18, 2021
@atalyaalon atalyaalon modified the milestones: v0.9.0, v1.1.0, v1.2.0 Dec 18, 2021
@atalyaalon atalyaalon assigned atalyaalon and ziv17 and unassigned atalyaalon Dec 18, 2021
@atalyaalon atalyaalon modified the milestones: v1.2.0, v0.19.0 Mar 30, 2022