Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abstract resolution mechanisms from service implementations #514

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

snazy
Copy link
Member

@snazy snazy commented Dec 9, 2024

Currently all service implementations, both the Polaris admin service and the (Iceberg REST) catalog service need knowledge about the implementation details about how exactly entities and their instances are managed. This is rather a persistence implementation concern, not a service implementation concern. Services should tell, if necessary, tell a properly abstracted builder which entities it will need.

Another concern this change tackles is to eventually allow a more efficient data model that is suitable for horizontally scalable databases without having to use pgsql with the implicitly required serializable isolation level.

This change abstracts the two related but still different entity resolution (ResolutionManifest and Resolver) via interfaces and concrete implementation classes, eliminating the hard dependency of services to concrete persistence specific implementations, and also replace the C/Go-style error handling with exceptions.

On top, this change moves types to separate packages.

This PR is part of a series of changes to abstract implementations and separate those by concern. Follow ups of this change include:

  • Remove the dependency on "entity cache" from Resolver - consumers of Resolver unwrap those in all cases
  • Rather unify ResolutionManifest and Resolver - those are very tightly coupled things, doing quite some work
  • Implement proper exception abstraction (the changes in this PR are already better, but not as good as it could be)
  • Eventually let service implementation work only with their natural keys, and not interact with many types (meta-store-manager-factory, meta-store-manager, meta-store-session, entity-manager, entity-cache).

@snazy snazy force-pushed the hide-entity-resolution-from-services branch 7 times, most recently from e287e41 to 10d4a70 Compare December 10, 2024 10:34
@snazy snazy changed the title Hide entity resolution from service implementations (WIP - DONT REVIEW!) Abstract resolution mechanisms from service implementations (WIP - DONT REVIEW!) Dec 10, 2024
@snazy snazy force-pushed the hide-entity-resolution-from-services branch 4 times, most recently from ef6390f to aba25f7 Compare December 10, 2024 11:07
@snazy snazy changed the title Abstract resolution mechanisms from service implementations (WIP - DONT REVIEW!) Abstract resolution mechanisms from service implementations Dec 10, 2024
@snazy snazy marked this pull request as ready for review December 10, 2024 16:12
throw ex;
}
}
;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: spurious ;

@collado-mike
Copy link
Contributor

See the doc I attached to #526 . In my mind, much of this code goes away and is replaced by a cache-capable PolarisMetaStoreManager implementation with an additional batch read method.

Currently all service implementations, both the Polaris admin service and the (Iceberg REST) catalog service need knowledge about the implementation details about _how exactly_ entities and their instances are managed. This is rather a persistence implementation concern, not a service implementation concern. Services should tell, if necessary, tell a properly abstracted builder which entities it will need.

Another concern this change tackles is to eventually allow a more efficient data model that is suitable for horizontally scalable databases without having to use pgsql with the implicitly required serializable isolation level.

This change abstracts the two related but still different entity resolution (`ResolutionManifest` and `Resolver`) via interfaces and concrete implementation classes, eliminating the hard dependency of services to concrete persistence specific implementations, and also replace the C/Go-style error handling with exceptions.

On top, this change moves types to separate packages.

This PR is part of a series of changes to abstract implementations and separate those by concern. Follow ups of this change include:
* Remove the dependency on "entity cache" from `Resolver` - consumers of `Resolver` unwrap those in all cases
* Rather unify `ResolutionManifest` and `Resolver` - those are very tightly coupled things, doing quite some work
* Implement proper exception abstraction (the changes in this PR are already better, but not as good as it could be)
* Eventually let service implementation work only with their natural keys, and not interact with many types (meta-store-manager-factory, meta-store-manager, meta-store-session, entity-manager, entity-cache).
@snazy
Copy link
Member Author

snazy commented Dec 11, 2024

See the doc I attached to #526 . In my mind, much of this code goes away and is replaced by a cache-capable PolarisMetaStoreManager implementation with an additional batch read method.

I don't think that this and the doc really conflict and we agree on the overall effort to encapsulate things better. This one is one "baby step" towards the persistence-model doc shared

@snazy snazy force-pushed the hide-entity-resolution-from-services branch from aba25f7 to 33b6529 Compare December 11, 2024 06:50
Copy link
Contributor

@collado-mike collado-mike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the direction, but I think @dennishuo should take a good look at this.

Comment on lines +91 to +93
ResolutionManifest buildResolved(PolarisEntitySubType subType);

ResolutionManifest buildResolved();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build or buildManifest

*/
package org.apache.polaris.core.persistence.resolution;

import com.google.errorprone.annotations.CanIgnoreReturnValue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use these annotations anywhere else?

@@ -1006,6 +1012,20 @@ public Map<String, String> deserializeProperties(PolarisCallContext callCtx, Str
: new PrincipalSecretsResult(secrets);
}

@Override
@Nonnull
public ResolutionManifestBuilder newResolutionManifestBuilder(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be completely separate from the PolarisMetaStoreManager. I think the Resolver process is business logic, not persistence and I think the PolarisMetaStoreManager interface should be focused on persistence only.

Comment on lines +159 to +165
var resolver = primaryResolver.buildResolved();

List<PrincipalRoleEntity> activatedPrincipalRoles =
resolver.getResolvedCallerPrincipalRoles().stream()
.map(ce -> PrincipalRoleEntity.of(ce.getEntity()))
.collect(Collectors.toList());
this.authenticatedPrincipal.setActivatedPrincipalRoles(activatedPrincipalRoles);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this pulled out of the Resolver process? I mean, I don't think it should be in there, but I think it should be part of the construction of the AuthenticatedPolarisPrincipal, not here. I'd scope this change to keep the logic inside the Resolver process for now, then extract the PrincipalRole resolution as part of a new change.

import org.apache.polaris.core.persistence.cache.EntityCacheByNameKey;
import org.apache.polaris.core.persistence.cache.EntityCacheEntry;

final class ResolverImpl implements Resolver {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not push these fields into the PolarisResolutionManifest? It seems that the ResolverImpl is just a container, but the PolarisResolutionManifestBuilder seems to build a fully resolved manifest. Can we just consolidate these?

Copy link

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jan 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants