-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Staging database restore DAG: base restore #2099
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥳 This looks great! No blocking feedback. Code lgtm and I did test locally with SKIP_STAGING_DATABASE_RESTORE
. Splitting this into another PR with the next steps makes total sense to me as well.
I learned so much about TaskFlow reviewing this 😄 Very, very cool.
catalog/dags/database/staging_database_restore/staging_database_restore_dag.py
Outdated
Show resolved
Hide resolved
should_skip >> staging_details | ||
|
||
restore_snapshot = restore_staging_from_snapshot(latest_snapshot, staging_details) | ||
ensure_snapshot_ready >> restore_snapshot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all works as is, but I'm curious why we didn't need to explicitly set either:
latest_snapshot >> ensure_snapshot_ready
or
staging_details >> restore_snapshot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little sneaky, behind the scenes stuff from the TaskFlow API 😅 the restore_snapshot = restore_staging_from_snapshot(latest_snapshot, staging_details)
line creates an inherent dependency relationship from latest_snapshot
and staging_details
to restore_snapshot
, without us having to explicitly define that relationship with the >>
operator. Same for ensure_snapshot_ready
- it has latest_snapshot
as one of its params, so Airflow knows implicitly it has to run that step first!
raise ValueError(f"No staging DB found for {constants.STAGING_IDENTIFIER}") | ||
staging_db = instances[0] | ||
# While it might be tempting to log this information, it contains sensitive | ||
# values. Instead, we'll select only the information we need, then log that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is filtering out sensitive values only important for logging? Ie, could we only restrict logging to REQUIRED_DB_INFO
, but use everything to create the new db? Would that provide any extra resilience to configuration drift, or can we expect that REQUIRED_DB_INFO
will never need to change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are a few reasons why the current approach might be best:
- Easy prevention of logging secrets accidentally
REQUIRED_DB_INFO
isn't likely to change on the features that we care about at least- There are some values that are returned from the details API that have to have their shape changed before they can be fed into the create from snapshot API (see the lines below on subnet group name and VPC security group IDs). There are dozens of values returned from the former API, so trying to manage the appropriate transformations on all of them so they can be fed into the latter API seems unnecessary at this point IMO.
|
||
|
||
@task.short_circuit | ||
def skip_restore(should_skip: bool = False) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this take should_skip
as an argument? Will a value ever be passed this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't be passed in normal control flow, but it's useful for testing! See: https://github.com/WordPress/openverse/pull/2099/files#diff-cbfa38a24878311bad02d5a1140c5bc6ab29e4e8a468c0c7e90433cdaf45a904R23-R24
log = logging.getLogger(__name__) | ||
|
||
|
||
@task.short_circuit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool, TIL!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stumbled across this one by accident while looking for something else, couldn't help but use it!
) | ||
) | ||
if not should_continue: | ||
notify_slack.function( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat!
|
||
@functools.wraps(func) | ||
def wrapped(*args, **kwargs): | ||
rds_hook = kwargs.pop("rds_hook", None) or RdsHook( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
if db_identifier not in constants.SAFE_TO_MUTATE: | ||
raise ValueError( | ||
f"The target function must be called with the staging database " | ||
f"identifier ({constants.STAGING_IDENTIFIER}), not {db_identifier}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a little confusing, since the TEMP identifiers also work, right? Maybe The target function must be called with a non-production database identifier
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh shoot, that's a great point. This function went through a few iterations (initially it was only staging, then SAFE_TO_MUTATE
came about). I'll update this!
staging_database_restore.slack, "send_message" | ||
) as mock_send_message: | ||
actual = staging_database_restore.skip_restore.function(should_skip) | ||
assert actual == (not should_skip) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add tests that mock the SKIP_STAGING_DATABASE_RESTORE
variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think since that's a Variable.get
with a simple or
comparison between that and should_skip
, that testing the conditions there is sufficient. This seemed easier than mocking the Variable.get
step, but I can change it to that kind of test and remove the should_skip
param if you think that's best!
rule = TriggerRule.NONE_FAILED | ||
with DAG(dag_id="test_make_rename_task_group", start_date=datetime(1970, 1, 1)): | ||
group = staging_database_restore.make_rename_task_group("dibble", "crim", rule) | ||
assert group.group_id == "rename_dibble_to_crim" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥳 This looks great! No blocking feedback. Code lgtm and I did test locally with SKIP_STAGING_DATABASE_RESTORE
. Splitting this into another PR with the next steps makes total sense to me as well.
I learned so much about TaskFlow reviewing this 😄 Very, very cool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested this locally, and it skipped all of the steps :)
Oh, and got scared at first of all the errors when I set the variable value to True
(capitalized as in Python). So everything worked as expected.
Co-authored-by: Staci Mullins <[email protected]>
Fixes
This is a step towards #1989 but does not address it completely (steps 12+ from the implementation plan will be added in follow-up PRs).
Description
This PR adds the initial restore-from-snapshot steps of the staging database restore DAG. See the linked IP above for the full definition.
I recommend reviewing in the following order:
constants.py
utils.py
staging_database_restore.py
staging_database_restore_dag.py
Since this doesn't have the last required steps for the DAG, we won't enable it once this is merged. I felt that splitting this up into several PRs was a better route to take though, otherwise the full DAG would have been huge!
Testing Instructions
Tests should pass locally. You can also set the
SKIP_STAGING_DATABASE_RESTORE
Airflow Variable totrue
and kick off the DAG to see that it should skip everything.Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin