CUMULUS-3757: Update granule upsert logic to allow updating collection info #3872
base: master
Conversation
Appreciate the update. I think it's good to go, but one nit (remove a commented-out line of code).
Good to have partial-state tests, and parallelize as much as possible.
Taking a quick look at this this morning.
packages/db/src/lib/granule.ts (Outdated)
    files: filesTable,
  } = TableNames;
  await Promise.all(granules.map(async (granule) => {
    const pgGranule = await translateApiGranuleToPostgresGranule({
Theory/nit: Should the DB package be taking API granules as input for anything other than translation, generally? I see we're doing it for fields in other methods, so I'm not resting on package dogma. But seeing "take a set of API granules and translate" in the method makes me wonder if we should offload that concurrency to the calling method and just take a set of updates as incoming Knex objects.
Not really sure; that does sound like it would make it better. Change-wise, does that entail passing the translation method into the function?
Ohhh, you're saying that instead of sending API granules to the function, we send them already translated to PG. OK, that doesn't seem like any more work, except that Ethan's task would need to do that 🤔
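For illustration, a minimal sketch of the alternative being discussed, assuming the caller performs the ApiGranule-to-Postgres translation before calling the DB helper. The PostgresGranuleUpdate shape, the hardcoded table name, and the function body are stand-ins, not the package's actual types or API:

```ts
import { Knex } from 'knex';

// Hypothetical stand-in for an already-translated, Knex-ready granule update;
// real code would use something like the package's PostgresGranule type.
interface PostgresGranuleUpdate {
  granule_id: string;
  collection_cumulus_id: number;
  [column: string]: unknown;
}

// Sketch: the helper no longer calls translateApiGranuleToPostgresGranule;
// the caller (e.g. the move-collections task) translates first and passes
// Knex-ready objects in.
export const updateGranulesAndFiles = async (
  knexOrTransaction: Knex | Knex.Transaction,
  granules: PostgresGranuleUpdate[]
): Promise<void> => {
  await Promise.all(granules.map((granule) =>
    knexOrTransaction('granules')
      .where({ granule_id: granule.granule_id })
      .update(granule)));
};
```

The trade-off is that the translation work (and any translation failures) moves to the calling task, which is what the comment above is weighing.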
packages/db/src/lib/granule.ts (Outdated)
 * @param {Array<ApiGranule>} [granules] - updated ApiGranule records
 * @returns {Promise<void>}
 */
export const updateGranuleAndFiles = async (
This takes an array as input, so it should probably be updateGranulesAndFiles.
packages/db/src/lib/granule.ts (Outdated)
 */
export const updateGranuleAndFiles = async (
  knexOrTransaction: Knex | Knex.Transaction,
  granules: Array<ApiGranule>
Consider ApiGranuleRecord (these shouldn't be new records, right?)
What's the difference? They're not new records (they're the same granules with the updated files/collectionId based on the move). I assumed ApiGranule was fine; I'll change it, but just wondering.
Hey @Nnaga1 - adding a review flag based on a couple of initial comments, working on a full review.
@@ -354,3 +357,39 @@ export const getGranulesByGranuleId = async (
    .where({ granule_id: granuleId });
  return records;
};

/**
 * Change a granules' PG record and its files' PG record based on collection move
This method looks like its purpose is to create a bulk granule update method that circumvents the existing granule write logic. That probably isn't specific to collections.
If we are intending that it be specific to moving a collection... I'm assuming the need to rewrite the entire object is due to the intent to move files as well, rather than parallelizing writeGranuleFromApi, because the business logic should be irrelevant to this use case?
"This method looks like its purpose is to create a bulk granule update method that circumvents the existing granule write logic."
: that is the intention of this method, to avoid changing core API functionality and instead do this.
"If we are intending that it be specific to moving a collection"
: this part I'm not sure about. I intended it to be used for Ethan's task, but if there are potential applications elsewhere then I'd assume it can be reused.
packages/db/src/lib/granule.ts (Outdated)
    dynamoRecord: granule,
    knexOrTransaction,
  });
  await knexOrTransaction(granulesTable).where('granule_id', '=', pgGranule.granule_id).update(
I think we need to enforce a transaction here, not just have it be possible it's a transaction object. We don't want partial granule/file record updates.
"I think we need to enforce a transaction here"
: can you point me to an example of this in the code that I can refer to? I'm not too familiar with it beyond just doing a straight await knex(...).
@Nnaga1 I'd read up on https://knexjs.org/guide/transactions.html / take a look at what we're doing in the API logic for https://github.com/nasa/cumulus/blob/CUMULUS-3757-move-granule/packages/api/lib/writeRecords/write-granules.js#L488
To be clear, I'm suggesting that we commit all granule/file updates together (and roll back on failure) in a transaction, instead of making updates serially in a way that doesn't roll back if any of them fail.
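A minimal sketch of the suggestion, assuming a plain knex client is passed in and using knex.transaction so every granule and file update commits or rolls back as a unit. The table and column names here mirror the snippet above but are assumptions, not the exact schema:

```ts
import { Knex } from 'knex';

// Sketch: perform every granule/file update inside one transaction. If any
// update throws, knex rolls the whole transaction back, so there are no
// partial granule/file record updates.
export const updateGranulesAndFilesTransactionally = async (
  knex: Knex,
  updates: {
    granuleId: string;
    granuleUpdate: Record<string, unknown>;
    fileUpdates: { cumulus_id: number; changes: Record<string, unknown> }[];
  }[]
): Promise<void> => {
  await knex.transaction(async (trx) => {
    for (const { granuleId, granuleUpdate, fileUpdates } of updates) {
      await trx('granules')
        .where({ granule_id: granuleId })
        .update(granuleUpdate);
      for (const file of fileUpdates) {
        await trx('files')
          .where({ cumulus_id: file.cumulus_id })
          .update(file.changes);
      }
    }
    // Returning normally commits; any rejection inside the callback rolls back.
  });
};
```

The design point is that a failure partway through the batch leaves the database as it was before the call, rather than with some granules moved and others not.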
packages/db/src/lib/granule.ts (Outdated)
    granules: granulesTable,
    files: filesTable,
  } = TableNames;
  await Promise.all(granules.map(async (granule) => {
Question: If the map of granules is 10k and it fails on granule 1,001, is it an acceptable usage scenario that 1k granules were moved and 9k failed, or do we want it to move 9,999 of them and fail only the one with a metadata/connection/whatever issue? Apologies if that should be obvious; I may be lacking context.
Edit: reviewed the tests. The re-run/idempotent intent is probably fine here, but a doc annotation for @error is probably warranted in the header.
I'm not exactly sure on this; I'd assume the latter ("or do we want it to move 9,999 of them and fail the one with a metadata/connection/whatever issue?"). I think the failure of writes in this function is, or should be, dealt with task-side, but if there's something I can change here, I can do it.
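If the preferred behavior is the second option (move whatever can be moved and report only the failures back to the task), a hedged sketch using Promise.allSettled might look like the following; updateOneGranule is a placeholder for whatever per-granule write is settled on, and AggregateError requires Node 15+:

```ts
// Sketch: attempt every granule update, collecting failures instead of
// aborting the whole batch on the first rejection.
async function updateAllGranules<T extends { granuleId: string }>(
  granules: T[],
  updateOneGranule: (granule: T) => Promise<void>
): Promise<void> {
  const results = await Promise.allSettled(granules.map(updateOneGranule));

  const failures = results
    .map((result, index) => ({ result, granuleId: granules[index].granuleId }))
    .filter((entry): entry is { result: PromiseRejectedResult; granuleId: string } =>
      entry.result.status === 'rejected');

  if (failures.length > 0) {
    // Surface which granules failed so the caller (e.g. the move task) can retry just those.
    throw new AggregateError(
      failures.map(({ result }) => result.reason),
      `Failed to update ${failures.length} granule(s): ${failures.map((f) => f.granuleId).join(', ')}`
    );
  }
}
```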
  }
});

test.serial('updateGranuleAndFiles successfully updates a complete list of granules, 1/2 of which have already been moved', async (t) => {
I think we need a more complete set of units testing all granule updates, given the method isn't written to update just the files and the collection ID, but all granule fields.
If that's not intentional and this method is meant to be limited to updating file locations and the collectionId, we should probably make the method more defensive somehow.
So Ethan's task would send the target_granules to me, which are the granule records post collection move, i.e. just complete granule records, so I can add a more complete set of units testing all granule updates. Would that just enforce that the other fields are the same as they were before?
@Nnaga1 to some degree that depends on what the intent is. As written, I don't believe this method enforces the ticket context (only update collection and file paths).
If the intent is to provide a method that takes an array of granule objects and updates them using only translation conventions, ignoring API write business logic, then we should test that broader case.
If the intent is that this helper only updates those specific fields, we should update it to do only that, and test it.
This is important because this method is an exposed package method and creates a user contract.
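For the narrower interpretation (a helper limited to the ticket's scope), a minimal defensive sketch that only touches the collection reference and file locations could look like the following; the table and column names follow common Cumulus PG conventions but should be read as assumptions:

```ts
import { Knex } from 'knex';

// Sketch of a defensive variant: only the collection reference and file
// locations can change through this helper, so callers cannot rewrite
// arbitrary granule fields through it.
interface GranuleMoveUpdate {
  granuleCumulusId: number;
  collectionCumulusId: number;
  files: { cumulusId: number; bucket: string; key: string }[];
}

export const updateGranuleCollectionAndFiles = async (
  knex: Knex,
  updates: GranuleMoveUpdate[]
): Promise<void> => {
  await knex.transaction(async (trx) => {
    for (const update of updates) {
      await trx('granules')
        .where({ cumulus_id: update.granuleCumulusId })
        .update({ collection_cumulus_id: update.collectionCumulusId });
      for (const file of update.files) {
        await trx('files')
          .where({ cumulus_id: file.cumulusId })
          .update({ bucket: file.bucket, key: file.key });
      }
    }
  });
};
```

Limiting the accepted input to these fields also makes the user contract of the exported package method explicit, which is the concern raised above.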