-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CUMULUS-3757: Update granule upsert logic to allow updating collection info #3887
Conversation
…/nasa/cumulus into es_for_move_collections_backport
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @Nnaga1 - did a once over, the approach looks good 🎉 , but have a few implementation questions for discussion. Take a look when you have a moment and let's chat!
@@ -354,3 +360,61 @@ export const getGranulesByGranuleId = async ( | |||
.where({ granule_id: granuleId }); | |||
return records; | |||
}; | |||
|
|||
/** | |||
* Change a granules' PG record and its files' PG record based on collection move |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: formatting. "Change granules PG record and files" or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to better define what 'change' means. Is this an upsert, a bounded modification, etc.
/** | ||
* Change a granules' PG record and its files' PG record based on collection move | ||
* | ||
* @param {Knex | Knex.Transaction} knexOrTransaction - DB client or transaction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Should this be either? I think we probably don't want to allow a transaction to create a subtransactio here
* Change a granules' PG record and its files' PG record based on collection move | ||
* | ||
* @param {Knex | Knex.Transaction} knexOrTransaction - DB client or transaction | ||
* @param {Array<ApiGranuleRecord>} [granules] - updated ApiGranule records |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these going to be full ApiGranuleRecords, or is a partial allowed? This feels like you might want to consider ApiGranule instead, or else enforce a new type that has update fields if they're required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the master PR you mentioned making them ApiGranuleRecords: #3872 (comment)
last_update_date_time: pgGranule.last_update_date_time, | ||
} | ||
); | ||
if (granule.files) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look like it's handling file object set mismatches 'correctly' - if we're going to expose this business logic as a DB package method, we need to make sure it handles that case correctly or put a logic-bound on making sure the behavior is understood.
e.g. what does updateGranulesAndFiles
behavior look like in the case where you have a files mismatch.
Based on a first read through, it looks like this does a .update to only file records that match (e.g. already exist)... which means if you pass in a file object with less or more files, that's ignored.
Documentation (or documentation via units) and guide-rails are probably needed here if we're writing a method that specific.
} = TableNames; | ||
|
||
try { | ||
await createRejectableTransaction(knexOrTransaction, async (trx) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it appropriate/intended for the batch update to be all or nothing?
Nit: This should probably be documented in the header
} = req.testContext || {}; | ||
const payload = req.body; | ||
try { | ||
await updateGranulesAndFiles(knex, payload); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔
tl;dr.
We should probably move the concurrency logic to the API package, handle bounding/configuring concurrency and address validation via something like zod.
Details/concerns:
-
We should consider implementing a concurrency or size bound somehow the API is limited by design to 2 DB connections >by design<, if you throw a large enough request at this endpoint I'd expect it will either timeout or overwhelm the knex connection pool, or both. That's probably not appropriate to the db package (and why we wound up with this sort of logic in the
api/lib
write 'stuff' as an approach. -
Users aren't going to just use this for the intended approach for 3757, if it's part of the public API we likely should use zod to validate inputs/schemas/etc before passing a payload to a @cumulus/db method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this should probably just update one granule and its file, while the api/lib or something else, would do the promise.all/concurrency stuff,
I'm all for changing the approach, I was really hoping in the review it would be to make it better than what I had bc theres stuff I was sure about (concurrency, api calls, the api timing out, how to call this.......), so whatever would be better, this has also changed a lot in the way it should be done so that has also thrown me off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so should I do it like this:
make this endpoint just update one granule and its files in PG and ES -> make an api-client call for that -> elswhere, call the api-client fn concurrently with the list of target_granules (the changes to make)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to make a bulk update feature like this utilize a single granule call - that's going to force scaling of lambda invocations per granule, and is super inefficient.
There's room for various approaches here, but particularly, we need to put configurations/limits on concurrency around await Promise.all(granules.map(async (granule) => {
so that we don't overwhelm the limited knex connection pool or make an exception for this bulk call and increase it, however that has user monitoring/tuning concerns.
As an opinionated take: I'd expect that best looks like having the DB package implement a single granule + all files update and associated tests, with an api lib method that does it for a bunch of granules and has concurrency controls in it. That allows for solid testing around a single granule update,and decouples concurrency concerns from the DB package. If the team feels strongly we should support bulk updates as part of the domain of the DB package, we could easily do that same approach entirely within the DB package.
@@ -115,7 +115,7 @@ async function genericRecordUpdate(esClient, id, doc, index, type, parent) { | |||
* @param {string} type - Elasticsearch type | |||
* @returns {Promise} Elasticsearch response | |||
*/ | |||
async function updateExistingRecord(esClient, id, doc, index, type) { | |||
async function updateExistingRecord(esClient, id, doc, index, type, parent = undefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this change needed?
Summary: Summary of changes
Addresses CUMULUS-XX: Develop amazing new feature
Changes
PR Checklist