CUMULUS-3757: Update granule upsert logic to allow updating collection info #3872

Open
wants to merge 52 commits into
base: master

Conversation

@Nnaga1 Nnaga1 (Contributor) commented Dec 2, 2024

Summary: Summary of changes

Addresses CUMULUS-XX: Develop amazing new feature

Changes

  • Detailed list or prose of changes
  • ...

PR Checklist

  • Update CHANGELOG
  • Unit tests
  • Ad-hoc testing - Deploy changes and test manually
  • Integration tests

@etcart etcart (Contributor) left a comment:

Appreciate the update. I think it's good to go, but one nit (remove a commented-out line of code).
Good to have partial state tests and parallelize as much as possible.

@Jkovarik Jkovarik (Member) commented:

Taking a quick look at this this morning.

```
    files: filesTable,
  } = TableNames;
  await Promise.all(granules.map(async (granule) => {
    const pgGranule = await translateApiGranuleToPostgresGranule({
```
@Jkovarik Jkovarik (Member) commented Dec 12, 2024:

Theory/Nit: Should the DB package be taking API granules as input for anything other than translation generally? I see we're doing it for fields in other methods, so I'm not resting on package dogma, but seeing 'take a set of API granules and translate' in the method makes me wonder if we should be offloading that concurrency to the calling method and just take a set of updates to incoming Knex objects.

@Nnaga1 Nnaga1 (Contributor, Author) replied:

Not really sure; that does sound like it would make it better. Does that entail passing the translation method into the function, changes-wise?

@Nnaga1 Nnaga1 (Contributor, Author) replied:

Ohhhhhhhhhhhh, you're saying instead of sending API granules to the function, send them already translated to PG. OK, that doesn't seem like any more work, except that Ethan's task would need to do that 🤔
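A minimal sketch of that shape, with the caller doing the translation and a hypothetical PG-record variant of the helper (import paths and names are assumptions, not the actual Cumulus API):

```ts
import { Knex } from 'knex';
// Assumed import paths; the real Cumulus packages may differ slightly.
import { translateApiGranuleToPostgresGranule, PostgresGranule } from '@cumulus/db';
import { ApiGranuleRecord } from '@cumulus/types';

// Hypothetical DB-package helper that takes already-translated PG records.
declare const updateGranulesAndFiles: (
  knex: Knex,
  granules: PostgresGranule[],
) => Promise<void>;

// Caller side (e.g. the move-collection task): translate up front,
// then hand the DB package only Postgres-shaped records.
const moveGranulesToCollection = async (
  knex: Knex,
  apiGranules: ApiGranuleRecord[],
): Promise<void> => {
  const pgGranules = await Promise.all(
    apiGranules.map((granule) => translateApiGranuleToPostgresGranule({
      dynamoRecord: granule,
      knexOrTransaction: knex,
    })),
  );
  await updateGranulesAndFiles(knex, pgGranules);
};
```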

```
 * @param {Array<ApiGranule>} [granules] - updated ApiGranule records
 * @returns {Promise<void>}
 */
export const updateGranuleAndFiles = async (
```
@Jkovarik Jkovarik (Member) commented:

This takes an input array of multiple granules, so it should probably be updateGranulesAndFiles.

```
 */
export const updateGranuleAndFiles = async (
  knexOrTransaction: Knex | Knex.Transaction,
  granules: Array<ApiGranule>
```
@Jkovarik Jkovarik (Member) commented:

Consider ApiGranuleRecord (these shouldn't be new records, right?)

@Nnaga1 Nnaga1 (Contributor, Author) replied Dec 12, 2024:

What's the difference? They're not new records (they're the same granules with the updated files/collectionId based on the move), so I assumed ApiGranule was fine. I'll change it, but just wondering.

@Jkovarik Jkovarik (Member) left a comment:

Hey @Nnaga1 - adding a review flag based on a couple of initial comments, working on a full review.

```
@@ -354,3 +357,39 @@ export const getGranulesByGranuleId = async (
    .where({ granule_id: granuleId });
  return records;
};

/**
 * Change a granule's PG record and its files' PG records based on a collection move
```
@Jkovarik Jkovarik (Member) commented Dec 12, 2024:

This method looks like its purpose is to create a bulk granule update method that circumvents the existing granule write logic. That probably isn't specific to collections.

If we are intending that it be specific to moving a collection... I'm assuming the need to re-write the entire object is due to the intent to move files as well, but not try to parallelize writeGranuleFromApi, because the business logic should be irrelevant to this use case?

@Nnaga1 Nnaga1 (Contributor, Author) replied:

"This method looks like its purpose is to create a bulk granule update method that circumvents the existing granule write logic": that is the intention of this method, to avoid changing core API functionality and do this instead.

"If we are intending that it be specific to moving a collection": this part I'm not sure about. I intended it to be used for Ethan's task, but if there are potential applications elsewhere then I'd assume it can be re-used.

```
      dynamoRecord: granule,
      knexOrTransaction,
    });
    await knexOrTransaction(granulesTable).where('granule_id', '=', pgGranule.granule_id).update(
```
@Jkovarik Jkovarik (Member) commented:

I think we need to enforce a transaction here, not just have it be possible it's a transaction object. We don't want partial granule/file record updates.

@Nnaga1 Nnaga1 (Contributor, Author) replied:

"I think we need to enforce a transaction here": can you point me to an example of this in the code I can refer to? Not too familiar with it beyond just doing a straight-up `await knex(....)....`

@Jkovarik Jkovarik (Member) replied:

To be clear, I'm suggesting that we commit all granule/file updates together (and roll back on failure) in a transaction, instead of making updates serially in a way that doesn't roll back if any of them fail.
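A rough sketch of what that could look like with a plain `knex.transaction` wrapping the whole batch (table handling and translation mirror the diff above; treat the details as illustrative):

```ts
export const updateGranulesAndFiles = async (
  knex: Knex,
  granules: Array<ApiGranuleRecord>
): Promise<void> => {
  const { granules: granulesTable, files: filesTable } = TableNames;
  // One transaction for the whole batch: if any granule or file update throws,
  // everything rolls back and no partial state is left behind.
  await knex.transaction(async (trx) => {
    await Promise.all(granules.map(async (granule) => {
      const pgGranule = await translateApiGranuleToPostgresGranule({
        dynamoRecord: granule,
        knexOrTransaction: trx,
      });
      await trx(granulesTable)
        .where({ granule_id: pgGranule.granule_id })
        .update(pgGranule);
      // ...corresponding file updates go against trx(filesTable) here...
    }));
  });
};
```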

```
    granules: granulesTable,
    files: filesTable,
  } = TableNames;
  await Promise.all(granules.map(async (granule) => {
```
@Jkovarik Jkovarik (Member) commented Dec 12, 2024:

Question: If the map of granules is 10k, is it an acceptable usage scenario that, when it fails on granule 1,001, 1k granules were moved and 9k failed? Or do we want it to move 9,999 of them and fail only the one with a metadata/connection/whatever issue? Apologies if that should be obvious, I may be lacking context.

Edit: reviewed the tests. The re-run/idempotent intent is probably fine here, but a doc annotation for @error is probably warranted in the header.

@Nnaga1 Nnaga1 (Contributor, Author) replied Dec 12, 2024:

I'm not exactly sure on this; I'd assume this one: "or do we want it to move 9,999 of them and fail the one with a metadata/connection/whatever issue?" I think the failure of writes from this function is, or should be, dealt with task-side, but if there's something I can change here then I can do it.

```
  }
});

test.serial('updateGranuleAndFiles successfully updates a complete list of granules, 1/2 of which have already been moved', async (t) => {
```
@Jkovarik Jkovarik (Member) commented:

I think we need a more complete set of units testing all granule updates, given that the method isn't written to update only files and the collection ID, but all granule fields.

If that's not intentional and this method is intended to be limited to updating file locations and the collectionID, we should probably make the method more defensive somehow.

@Nnaga1 Nnaga1 (Contributor, Author) replied:

So Ethan's task would send the target_granules to me, which are the granule records post collection move, i.e. just complete granule records, so I can do this: "a more complete set of units testing all granule updates". Would it just enforce that the other fields are... the same as they were before?

@Jkovarik Jkovarik (Member) replied:

@Nnaga1 to some degree that depends on what the intent is. As written, I don't believe this method enforces the ticket context (only update collection and file paths).

If the intent is to provide a method that takes an array of granule objects and updates it using only translation conventions, ignoring API write business logic, then we should test that broader case.

If the intent is that this helper only updates those specific fields, we should update it to do only that and test accordingly.

This is important, as this method is an exposed package method and creates a user contract.
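If the second intent wins, one defensive shape is to let only the collection and file columns ever reach the UPDATE, e.g. (a sketch; the specific column names are assumptions about the schema):

```ts
// Restrict the granule UPDATE to the fields the collection move actually changes.
const pgGranule = await translateApiGranuleToPostgresGranule({
  dynamoRecord: granule,
  knexOrTransaction: trx,
});
await trx(granulesTable)
  .where({ granule_id: pgGranule.granule_id })
  .update({
    collection_cumulus_id: pgGranule.collection_cumulus_id,
    updated_at: pgGranule.updated_at,
  });
// File rows get the same treatment: update only the bucket/key/path columns, nothing else.
```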
