Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extensions of dump_clean.sql for selective and daily dumps #699

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

sasharevzin
Copy link
Collaborator

@sasharevzin sasharevzin commented Jun 15, 2020

@matschaffer two new crons to setup on a server:

# https://api.safecast.org/system/dump_clean_daily.tar.gz
cron/dump_clean_daily 

@sasharevzin sasharevzin linked an issue Jun 15, 2020 that may be closed by this pull request
Copy link
Contributor

@matschaffer matschaffer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per #427 (comment) we should probably tweak a few things:

  • Remove the blacklisting (this is for historical stuff and shouldn't be required for a new daily export)
  • rename it from "mclean" to avoid confusion

@Frangible should be able to advise here if there are any other questions beyond what was on the original PR

@sasharevzin
Copy link
Collaborator Author

  • Remove the blacklisting (this is for historical stuff and shouldn't be required for a new daily export)

@matschaffer did you mean removing these filters: https://github.com/Safecast/safecastapi/pull/699/files#diff-2a92cbfc2514ad867e2c7fa579a3d0c9R21-R38

@matschaffer
Copy link
Contributor

I'm pretty sure that's what @Frangible meant. Would be nice if he could confirm.

@matschaffer
Copy link
Contributor

Also wondering if we should leave cron/dump_clean_selective.sql out for now. The original ask was for the daily exports (#419)

Not sure how a second bulk export got into the request. @absalomshu can you provide more info on why you need another selective bulk export (instead of just the daily?)

Those full-data-set exports are really expensive in terms of DB resources, so I'd rather not have two.

I'm okay with the one we have today plus a daily incremental export.

@Frangible
Copy link
Contributor

Frangible commented Jun 18, 2020 via email

Comment on lines 28 to 54
AND captured_at IS NOT NULL
AND value IS NOT NULL
AND LOCATION IS NOT NULL
AND ((unit = 'cpm'
AND ((device_id IS NULL
AND value BETWEEN 10.00 AND 350000.0)
OR ((device_id <= 24
OR device_id IN (69,
89))
AND value BETWEEN 10.00 AND 30000.0)))
OR (unit = 'celsius'
AND value BETWEEN -80 AND 80))
AND (ST_X(LOCATION::geometry) != 0.0
OR ST_Y(LOCATION::geometry) != 0.0)
AND ST_Y(LOCATION::geometry) BETWEEN -85.05 AND 85.05
AND ST_X(LOCATION::geometry) BETWEEN -180.00 AND 180.00
AND (user_id NOT IN (9,
442)
OR value < 35.0
OR ST_Y(LOCATION::geometry) NOT BETWEEN 35.4489 AND 35.7278
OR ST_X(LOCATION::geometry) NOT BETWEEN 139.5706 AND 139.9186)
AND (user_id != 366
OR value < 35.0
OR ST_Y(LOCATION::geometry) NOT BETWEEN -45.5201 AND -7.6228
OR ST_X(LOCATION::geometry) NOT BETWEEN 111.3241 AND 153.8620
OR (value < 105.0
AND ST_X(LOCATION::geometry) < 111.3241)) ) ;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Frangible @matschaffer Let me know if we need these filters

@julovi
Copy link

julovi commented Jun 19, 2020 via email

@julovi
Copy link

julovi commented Jun 19, 2020 via email

@sasharevzin
Copy link
Collaborator Author

Hi, I don’t think they’re needed, other filtering can be done locally. By the way, where the daily exports would be located and how many will be stored? Cheers, Julian Villegas, Ph.D.

@julovi On a server itself and can be found here: https://api.safecast.org/system/dump_clean_daily.tar.gz

@matschaffer
Copy link
Contributor

matschaffer commented Jun 20, 2020 via email

@matschaffer
Copy link
Contributor

Certainly less than the $75/mo we’d have to pay to run kubernetes ;)

@matschaffer
Copy link
Contributor

Btw, we might want to get these moved to s3 first. Then we can implement retention via s3 lifetime rules

@auspicacious
Copy link
Contributor

@sasharevzin and @matschaffer I'm thinking that we:

  1. Fix the shell script in this PR for the simplest possible case (every day, we create https://api.safecast.org/system/dump_clean_daily.tar.gz) and accept this PR, so we can see it working

  2. Create a separate task for moving all scheduled dumps to S3 with retention rules

What do you think?

@matschaffer
Copy link
Contributor

Yep. I'd say #571 is basically that "move it to s3 issue", just need to make a note about the retention rules.

@sasharevzin
Copy link
Collaborator Author

@sasharevzin and @matschaffer I'm thinking that we:

  1. Fix the shell script in this PR for the simplest possible case (every day, we create https://api.safecast.org/system/dump_clean_daily.tar.gz) and accept this PR, so we can see it working
  2. Create a separate task for moving all scheduled dumps to S3 with retention rules

What do you think?

@matschaffer @auspicacious I think we are good to go with the first item, for a second I created a ticket: #710
Let me know if something is still missing in this PR.

Thanks

@matschaffer
Copy link
Contributor

matschaffer commented Jul 10, 2020

Sorry for the back and forth here, but two things:

  1. I think we should probably remove this bit
(measurement_import_id NOT IN
            (SELECT id
             FROM measurement_imports
             WHERE subtype NOT IN ('None',
                                   'Drive'))
          OR measurement_import_id IS NULL)

Just make it a straight measurements export.

  1. Similarly let's take clean out of the name and just call it dump_measurements_daily since we're really not doing any cleaning just dropping whatever came into the table to disk.

@matschaffer
Copy link
Contributor

Then yeah, let's do #710 after we set up the s3 storage.

Then we don't have to worry about local clean up we can just s3 cp right after we export, then rm the file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extensions of dump_clean.sql for selective and daily dumps
5 participants