Make initial run complete faster #279

Open
rymurr opened this issue Sep 8, 2023 · 4 comments
Comments

@rymurr
Contributor

rymurr commented Sep 8, 2023

It would be nice if the initial run could materialize just the most recent month (or so) of data first, so users can start trying out OpsCenter immediately. The problem is that the algorithm only works in the forward direction. We have (at least) two options:

  1. make it work in both directions
  2. after the initial write is completed just have it start over again and write all data (the first month shouldn't be stupid expensive to do again)

I am in favor of 2. atm.
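
A rough sketch of option 2, assuming the forward-only update can be pointed at an arbitrary start time. The names `plan_passes`, `earliest`, and `recent_window` are hypothetical and not taken from the OpsCenter code; the 30-day default is only a stand-in for "1 month (or so)".

```python
from datetime import datetime, timedelta


def plan_passes(now, earliest, recent_window=timedelta(days=30)):
    """Return the (start, end) ranges to materialize, in order.

    Hypothetical helper: pass 1 covers only the recent window so users
    see data quickly; pass 2 starts over from the earliest data and
    rewrites everything, including the recent window again.
    """
    recent_start = max(earliest, now - recent_window)
    passes = [(recent_start, now)]
    if recent_start > earliest:
        # Only needed when history extends beyond the recent window.
        passes.append((earliest, now))
    return passes
```

Each pass still runs strictly forward in time, so the existing algorithm would not need to work in both directions; the cost is that the recent window is written twice.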

@rymurr
Contributor Author

rymurr commented Sep 8, 2023

I think the initial write should also be done on the 'main' opscenter thread rather than via a task. We want to take care that CI and testing dbs don't all recreate the entire enriched query view from scratch on every run, and this would help.

@joshelser
Contributor

> We want to take care that CI and testing dbs don't all recreate the entire enriched query view from scratch on every run and this would help

I was just thinking about this last night. +1

@joshelser
Contributor

> after the initial write is completed just have it start over again and write all data (the first month shouldn't be stupid expensive to do again)

This approach makes sense to me. It should be straightforward to track the most recent month being materialized, and then rebuild from the beginning of time with the constructs we already have. I think the transaction inside the query history update procedure should also insulate the Streamlit reports from having to know how much data we have materialized. We can just show a banner telling the user that the most recent month's worth of data is materialized and the historical data is being materialized in the background.

@rymurr
Contributor Author

rymurr commented Sep 8, 2023

The update code already has some functionality to zero out and start over. I think you can trigger it by setting one timestamp to 0 in the task history table.
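
As a sketch of what "set one timestamp to 0" could mean for a forward-only update: the function below is hypothetical, and `last_run_end` merely stands in for whichever timestamp the task history table actually stores (assumed here to be epoch seconds).

```python
def next_range(last_run_end, now):
    """Pick the range for a forward-only incremental update.

    Hypothetical: a stored timestamp of 0 is treated as "no prior run",
    so the update starts over from the beginning instead of resuming.
    """
    full_rebuild = last_run_end == 0
    return {"start": last_run_end, "end": now, "full_rebuild": full_rebuild}
```

Under that assumption, restarting after the initial one-month write would amount to resetting the stored timestamp and letting the next run proceed as usual.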


2 participants