Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple legacy entity tables from results queries #4342

Merged
merged 3 commits into from
Sep 4, 2024

Conversation

JAORMX
Copy link
Contributor

@JAORMX JAORMX commented Sep 2, 2024

Summary

This is another step into moving away from the central entity tables and
into the new per-entity tables.

Fixes #4316

Change Type

Mark the type of change your PR introduces:

  • Bug fix (resolves an issue without affecting existing features)
  • Feature (adds new functionality without breaking changes)
  • Breaking change (may impact existing functionalities or require documentation updates)
  • Documentation (updates or additions to documentation)
  • Refactoring or test improvements (no bug fixes or new functionality)

Testing

Review Checklist:

  • Reviewed my own code for quality and clarity.
  • Added comments to complex or tricky code sections.
  • Updated any affected documentation.
  • Included tests that validate the fix or feature.
  • Checked that related changes are merged.

@JAORMX JAORMX marked this pull request as draft September 2, 2024 13:56
internal/history/service.go Outdated Show resolved Hide resolved
Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, this is not easy code to modify. About the issues with the code referencing repoName and repoOwner separately, it looks like we're either constructing the repo slug again anyway or just returning them in a generic map[string]string through the entityInfo variable, so it should be possible to use the name straight away hopefully?

@JAORMX JAORMX force-pushed the results-entity-instances branch 9 times, most recently from 1755b21 to 2df7309 Compare September 3, 2024 11:01
@JAORMX JAORMX changed the title Use decouple legacy entity tables from results queries Decouple legacy entity tables from results queries Sep 3, 2024
@JAORMX JAORMX force-pushed the results-entity-instances branch 2 times, most recently from 5d15418 to a548c03 Compare September 3, 2024 11:39
@coveralls
Copy link

coveralls commented Sep 3, 2024

Coverage Status

coverage: 52.849% (-0.2%) from 53.06%
when pulling 720aac6 on results-entity-instances
into 5428799 on main.

@JAORMX JAORMX requested a review from jhrozek September 3, 2024 14:15
@JAORMX JAORMX marked this pull request as ready for review September 3, 2024 18:48
entityInfo["entity_type"] = efp.Entity.Type.ToString()
entityInfo["entity_id"] = rs.EntityID.String()

// temporary: These will be replaced by entity_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as well as these could be simplified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll have a set of properties we output and have a general way of displaying them, so this will be replaced with a provider-specific call IMO.

artRepoOwner := efp.Properties.GetProperty(ghprop.ArtifactPropertyRepoOwner).GetString()
artRepoName := efp.Properties.GetProperty(ghprop.ArtifactPropertyRepoName).GetString()
if artRepoOwner != "" && artRepoName != "" {
repoPath = fmt.Sprintf("%s/%s", artRepoOwner, artRepoName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this name ever going to be used in anything but presentation? I wonder if we should make the RepoOwner and RepoName properties not tied to github provider (and move them into contants.go like we have properties.RepoPropertyIsPrivate). Then we could use this code for non-github artifacts and maybe even make the code more generic by constructing new properties and calling provider.GetEntityName(REPOSITORY, theNewProperties)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, our output is heavily tied to GitHub, this is actually used to get the alert which won't exist in other providers. I can't wait to deprecate that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answering your question, I still think it's a github-specific attribute.

@@ -70,8 +70,6 @@ func TestListEvaluationHistoryFilters(t *testing.T) {
require.Equal(t, es1, row.EvaluationID)
require.Equal(t, EntitiesRepository, row.EntityType)
require.Equal(t, repo1.ID, row.EntityID)
require.Equal(t, repo1.RepoOwner, row.RepoOwner.String)
require.Equal(t, repo1.RepoName, row.RepoName.String)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did we have to remove these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because else we'd need to return the repo owner via a JOIN in the ListEvaluationHistory call. And the intent was to remove that. We should not rely on any entity-specific info.

ehs := &evaluationHistoryService{
providerManager: providerManager,
propServiceBuilder: func(qtx db.ExtendQuerier) propertiessvc.PropertiesService {
return propertiessvc.NewPropertiesService(qtx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be OK to just require the PropertiesService interface as a parameter but the options don't hurt..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was done for testing only.

return nil, fmt.Errorf("error fetching entity for properties: %w", err)
}

err = propsvc.RetrieveAllPropertiesForEntity(ctx, efp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is potentially a lot of calls isn't it? Do we actually need up-to-date data here, couldn't we just fetch the entities and properties from cache regardless of whether it's expired or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are at a weird point where we don't have everything in cache just yet. The idea is to fetch from cache if it's possible, and persist it if it isn't there. Perhaps I should instead add "options" to this call and rely on the cache even if the data is marked as stale. wdyt?

}

fetchByProps, err := properties.NewProperties(map[string]any{
properties.PropertyName: ent.Name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not saying this is right or wrong and it's not for this PR, but I was wondering whether the way I coded up fetching properties is correct - for example for PRs you need the PR number, the repo owner and repo name. So the way the fetcher retrieves the information is either 1) those properties are in the lookByProps properties structure or 2) the fetcher parses the name.

I wonder if doing 2) is too "magic" and if a function like this one should rather just return all the cached properties and let the fetcher decide based on the properties and the type of the entity.

// RetrieveAllPropertiesForEntity fetches a single property for the given an entity
// for properties model. Note that properties will be updated in place.
func (ps *propertiesService) RetrieveAllPropertiesForEntity(
ctx context.Context, efp *models.EntityForProperties,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but do you think that in a follow up we could change RetrieveAllProperties to accept EntityForProperties or just an Entity that alrady has projectID, providerID and type and do something like:

newProps, err := ps.RetrieveAllProperties(ctx, prov, entity, lookupProps)
efp := NewEntityFromPropertiesWithInstance(entity, newProps)

*EntityWithProperties

// Provider is the provider for the entity
Provider provifv1.Provider
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why keep track of the provider and not just use the providerID in the EntityWithProperties? I wonder if the cleanest way would be to just pass the providerID to the propertyService, let the propertyService structure contain a providerManager and always instantiate the provider...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good observation. I was just trying to make it easier to instantiate everything. But this could actually be moved to the RetrieveAllPropertiesForEntity so we would not need the new wrapper. Let me refactor this.

Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work!

The code reads easier now that it doesn't handle the entity type separately. I put some comments inline, they are mostly to make sure that the original design of the PropertyService wasn't bad to start with and that we're not adding hacks to address the shortcomings instead.

The biggest question I have is though - why do we call the Refreshes in this service at all and not just rely on the database properties? Wouldn't we cause a lot of upstream traffic this way?

This is another step into moving away from the central entity tables and
into the new per-entity tables.

Signed-off-by: Juan Antonio Osorio <[email protected]>
@JAORMX
Copy link
Contributor Author

JAORMX commented Sep 4, 2024

The biggest question I have is though - why do we call the Refreshes in this service at all and not just rely on the database properties? Wouldn't we cause a lot of upstream traffic this way?

I added a comment about that above. I think we can deal with that by adding options to the retrieve calls.

This is not needed and redundant. Instead the logic is moved towards the
properties service

Signed-off-by: Juan Antonio Osorio <[email protected]>
@JAORMX JAORMX requested a review from jhrozek September 4, 2024 08:44
if rs.EntityType == db.EntitiesRepository {
entityInfo["repository_id"] = efp.Entity.ID.String()
} else if rs.EntityType == db.EntitiesArtifact {
entityInfo["artifact_id"] = efp.Entity.ID.String()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could just use the upstream ID as a string here to simplify the logic (OK in a follow-up)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use these for anything?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I don't know. Let's try removing them subsequently.

@JAORMX JAORMX merged commit 1a5c09e into main Sep 4, 2024
22 checks passed
@JAORMX JAORMX deleted the results-entity-instances branch September 4, 2024 11:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rely on central entities table for results
3 participants