Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce AWS Security Mapping #21622

Merged
merged 4 commits into from
Jun 4, 2024

Conversation

imjalpreet
Copy link
Member

@imjalpreet imjalpreet commented Jan 2, 2024

Description

This will allow flexible security mapping for AWS Lake Formation(and AWS Glue) and AWS S3 API calls, allowing for separate IAM roles or IAM credentials for specific users.

Motivation and Context

#20851

Impact

This PR introduces the ability to use multiple IAM roles / IAM credentials for different Presto Users to access AWS S3.
Follow-up PRs will allow the same for AWS Lake Formation.

Test Plan

Tested with different IAM Roles for different Presto Users when accessing AWS S3 using the Hive Connector

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Hive Changes
* Introduce AWS Security Mapping which will allow flexible mapping of Presto Users to AWS Credentials or IAM Roles for different AWS Services

@imjalpreet imjalpreet self-assigned this Jan 2, 2024
@imjalpreet imjalpreet requested a review from a team as a code owner January 2, 2024 21:10
@imjalpreet imjalpreet requested a review from presto-oss January 2, 2024 21:10
@imjalpreet imjalpreet requested a review from tdcmeehan January 2, 2024 22:18
@tdcmeehan tdcmeehan self-assigned this Jan 3, 2024
@imjalpreet
Copy link
Member Author

@tdcmeehan can you please help review this when you get a chance? Thanks

@steveburnett
Copy link
Contributor

When you can, please add documentation for how users can specify IAM roles or IAM credentials. If I can help, let me know!

@imjalpreet
Copy link
Member Author

@steveburnett Yes sure, I have a draft in my local, I will refine it and push it as well for your review.

private static final Logger log = Logger.get(AWSSecurityMappingsSupplier.class);
private final Supplier<AWSSecurityMappings> mappingsSupplier;

public AWSSecurityMappingsSupplier(Optional<File> configFile, Optional<Duration> refreshPeriod)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validate inputs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to check the file exists here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the check in the getMappings method

}

Supplier<AWSSecurityMappings> supplier = () -> parseJson(configFile.get().toPath(), AWSSecurityMappings.class);
return refreshPeriod.<Supplier<AWSSecurityMappings>>map(duration -> Suppliers.memoizeWithExpiration(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clean up formatting. orElseGet in new line, appropriately indented.

Copy link

github-actions bot commented Apr 2, 2024

Codenotify: Notifying subscribers in CODENOTIFY files for diff 39e11a5...d23c401.

No notifications.

@imjalpreet
Copy link
Member Author

@tdcmeehan sorry for the delay, please let me know in case you have more feedback. I have made the updates based on your previous review comments.

@steveburnett I have pushed the documentation as you had requested. Please let me know if you would like any changes.

@steveburnett
Copy link
Contributor

@imjalpreet, when I try to build the doc from the branch in this PR I get the following error:

"python3 -m sphinx -b html -n -d target/doctrees -j auto src/main/sphinx target/html
Running Sphinx v4.4.0

Sphinx version error:
The sphinxcontrib.applehelp extension used by this project needs at least Sphinx v5.0; it therefore cannot be built with this version.
make: *** [html] Error 2"

I think this may be because this PR predates PR 21708.

Would you update this branch to pick up the change in PR 21708? I'm happy to try again after that.

@imjalpreet imjalpreet force-pushed the AWSSecurityMapping branch from a90d3e6 to 49f93f7 Compare April 2, 2024 22:26
@imjalpreet
Copy link
Member Author

@steveburnett Thanks, I have rebased the branch on latest master.

@steveburnett
Copy link
Contributor

@steveburnett Thanks, I have rebased the branch on latest master.

Thanks! A local doc build is successful now and I see your new docs in the result.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the documentation! It makes sense to me. I commented on a few nits, and suggested moving some of the text to separate the s3 and lakeformation information as their own sections. Let me know what you think, and if any of my suggestions are unclear.

presto-docs/src/main/sphinx/connector/hive.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/hive.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/hive.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/hive.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/hive.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/hive.rst Outdated Show resolved Hide resolved
Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, just waiting for earlier comments to be addressed.

@imjalpreet
Copy link
Member Author

@steveburnett Thank you for the review and suggestions, I have made the changes.

@tdcmeehan sorry for the delay, the review comments have been addressed.

@imjalpreet imjalpreet requested a review from steveburnett April 24, 2024 05:27
steveburnett
steveburnett previously approved these changes Apr 24, 2024
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, new local build of docs, everything looks good. Thanks!

@imjalpreet
Copy link
Member Author

@tdcmeehan just a reminder, all the comments were addressed.

Copy link
Contributor

@elharo elharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks to me like it belongs to a separate module, not presto-hive-common

@@ -95,6 +105,11 @@
<artifactId>drift-api</artifactId>
</dependency>

<dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to add this dependency to common code. Anyway it can be isolated to just the aws package?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realised that I am only using one class from this dependency in the class com.facebook.presto.hive.aws.security.s3.AWSS3SecurityMapping.

It is being used to form the credentials object here: https://github.com/prestodb/presto/pull/21622/files#diff-33e654b04e220ef6b736fee85d4cf10f15e0d5255c6af614ce8d4a2000db2b27R51.

We have a couple of options on how we can exclude the aws dependency from hive-common:

  1. Make a clone(or similar class) of the BasicAWSCredentials class within Presto and use that here
  2. Rather than creating the credentials object here, modify the current implementation and keep the access key and secret key as is in this class and, create the credentials object when required.

CC: @tdcmeehan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option 1 sounds fine to me, because we just need the key and secret key, and those can just be opaque strings.

"No AWS Security mappings configured");
checkArgument(
(awsLakeFormationSecurityMappings == null) || (awsS3SecurityMappings == null),
"AWS S3 Security Mapping is not allowed if AWS Lake Formation Security Mapping is configured");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and vice versa


Supplier<AWSSecurityMappings> supplier = () -> parseJson(configFile.get().toPath(), AWSSecurityMappings.class);
return refreshPeriod.<Supplier<AWSSecurityMappings>>map(duration -> Suppliers.memoizeWithExpiration(
() -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log message not really needed; level debug at most

@tdcmeehan
Copy link
Contributor

This all looks to me like it belongs to a separate module, not presto-hive-common

@imjalpreet what do you think about this?

I know that Lake Formation also supports Iceberg tables. Would that mean it would need to be placed in a common module?

@imjalpreet
Copy link
Member Author

This all looks to me like it belongs to a separate module, not presto-hive-common

what do you think about this?

I know that Lake Formation also supports Iceberg tables. Would that mean it would need to be placed in a common module?

@elharo @tdcmeehan The reason I had to include these classes in hive-common as these classes would be used from both presto-hive and presto-hive-metastore. What would be the ideal package that you think these should be included in? I can try to see if that would be possible.

@imjalpreet imjalpreet force-pushed the AWSSecurityMapping branch from 19d82dc to 5e8bedb Compare May 22, 2024 23:20
@imjalpreet
Copy link
Member Author

@tdcmeehan @elharo I have updated the PR based on the discussion and review comments. Please let me know if there are any additional comments.

elharo
elharo previously requested changes May 23, 2024
public Optional<File> getConfigFile()
{
if (configFile != null) {
checkArgument(configFile.exists() && configFile.isFile(), "AWS Lake Formation Security Mapping config file does not exist: %s", configFile);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

configFile is not an argument so this should be an IllegalStateException or an IOException, not an IllegalArgumentException. I'm tempted to say we shouldn't check for existence here at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved this check to where the file is getting used as you suggested

@ConfigDescription("JSON configuration file containing AWS IAM Security mappings")
public AWSSecurityMappingConfig setConfigFile(File configFile)
{
this.configFile = configFile;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely don't check in getConfig. Possibly check when the object is created or when the file is read, but not in the getter method.


@Config("hive.aws.security-mapping.config-file")
@ConfigDescription("JSON configuration file containing AWS IAM Security mappings")
public AWSSecurityMappingConfig setConfigFile(File configFile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we explicitly mark things nullable? This is nullable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the annotation

@MinDuration("0ms")
@Config("hive.aws.security-mapping.refresh-period")
@ConfigDescription("Time interval after which AWS IAM security mapping configuration will be refreshed")
public AWSSecurityMappingConfig setRefreshPeriod(Duration refreshPeriod)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want this to be nullable? It is.

I'm a little surprised the object is mutable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want this to be nullable? It is.

I don't think this will ever be null since null is not an acceptable value for Duration. Do you see any issue?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little surprised the object is mutable.

Since new users can be added or credentials for existing users might be modified at any time. To make that change dynamic without a cluster restart, I have made this object mutable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All someone has to do to make this null is pass null to this method. If that's nopt OK, put a requireNonNull here. If it is OK, annotate it as nullable.

private static final Logger log = Logger.get(AWSSecurityMappingsSupplier.class);
private final Supplier<AWSSecurityMappings> mappingsSupplier;

public AWSSecurityMappingsSupplier(Optional<File> configFile, Optional<Duration> refreshPeriod)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to check the file exists here


applyMapping(provider, selector, configuration);

assertEquals(configuration.get(S3_ACCESS_KEY), mappingResult.getAccessKey().orElse(null));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

orElse in an assert feels strange to me. Just use get(). If it's not present the assert will fail

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified this

import static com.facebook.presto.hive.s3.security.TestAWSS3SecurityMapping.MappingSelector.empty;
import static com.google.common.io.Resources.getResource;
import static java.util.Objects.requireNonNull;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal preference, but I try to stick to a single test framework for simplicity. Mixing and matching is confusing

{
Configuration configuration = new Configuration(false);

assertThatThrownBy(() -> applyMapping(provider, selector, configuration))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the expectedExceptions parameter in the @Test annotation instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to exceptedExceptions.

There are many classes in Presto that are using assertThatThrownBy along with TestNG. We can plan the refactor later.

}
}

protected static class MappingResult
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private


private MappingResult(Optional<String> accessKey, Optional<String> secretKey, Optional<String> role)
{
this.accessKey = requireNonNull(accessKey, "accessKey is null");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requireNonNull really isn't needed in private test code

@imjalpreet imjalpreet force-pushed the AWSSecurityMapping branch from 5e8bedb to 2e3fea1 Compare May 28, 2024 09:46
@imjalpreet
Copy link
Member Author

@elharo I have added some changes and comments above, please let me know in case you have more comments.

AWS Security Mapping
^^^^^^^^^^^^^^^^^^^^

Presto supports flexible security mapping for AWS Lake Formation, AWS Glue, and AWS S3 API calls, allowing for separate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "security mapping" defined somewhere else? This is not a term I've heard before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a generic term, we named it like that since it maps different Presto User Identities to AWS Credentials. Let me know if you have any ideas on the naming.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you just need to say that in the docs and the code then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current explanation here on line 389-390: "... allowing for separate credentials or IAM roles for specific users." do you think we should rephrase this better? We tried to explain what we mean by security mapping here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are more details on what each mapping should include in the following lines

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you still never say what a security mapping is. You need to define that. Easiest way might be to rewrite the first paragraph without using the phrase "security mapping".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will rephrase it.

@JsonProperty("lakeformation") List<AWSLakeFormationSecurityMapping> awsLakeFormationSecurityMappings,
@JsonProperty("s3") List<AWSS3SecurityMapping> awsS3SecurityMappings)
{
checkArgument(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awkward. Anyway there could be two constructors or two classes to handle the two different cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my initial design, I had both of them(AWS S3 and AWS Lake Formation) as separate features, which meant both had their own set of configs to enable the feature. I felt it would be similar code for both the features and since they are a subset of AWS services, we could have a common feature that can be used either for S3 or Lake Formation depending upon the use-case.

Since only one constructor can be annotated with @JsonCreator, we had to do it this way since we use this class to parse the Json File which has the mappings.

ImmutableList.of();
}

public boolean isAWSLakeFormationSecurityMappingEnabled()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These is methods really make me think adding some subclasses would clean this up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud, if I add a new parameter in the json file at the beginning something like service which can have two values s3 or lakeformation. Instead of the "is" methods, I can introduce the getServiceName method based on which the relevant feature would be enabled, for eg. in AWSS3SecurityMappingConfigurationProvider#updateConfiguration I will enable it only if service is S3.

Do you think this would be more readable/cleaner?

@elharo elharo dismissed their stale review May 28, 2024 17:57

done

@imjalpreet
Copy link
Member Author

@tdcmeehan Can you please have another look in case you have any other comments?

AWS Security Mapping
^^^^^^^^^^^^^^^^^^^^

Presto supports flexible security mapping for AWS Lake Formation, AWS Glue, and AWS S3 API calls, allowing for separate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you still never say what a security mapping is. You need to define that. Easiest way might be to rewrite the first paragraph without using the phrase "security mapping".

A security mapping entry can be of two types: ``s3`` or ``lakeformation``.

The security mapping entries are processed in the order listed in the configuration
file. More specific mappings should thus be specified before less specific mappings.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete "thus"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unaddressed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated it, somehow missed it earlier.

@MinDuration("0ms")
@Config("hive.aws.security-mapping.refresh-period")
@ConfigDescription("Time interval after which AWS IAM security mapping configuration will be refreshed")
public AWSSecurityMappingConfig setRefreshPeriod(Duration refreshPeriod)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All someone has to do to make this null is pass null to this method. If that's nopt OK, put a requireNonNull here. If it is OK, annotate it as nullable.

import static com.google.common.base.MoreObjects.toStringHelper;
import static java.util.Objects.requireNonNull;

public class AWSLakeFormationSecurityMapping
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like AWSLakeFormationSecurityMapping and AWSS3SecurityMapping should have a common, possibly abstract, SecurityMapping superclass. That might allow you to avoid some of the problems in AWSSecurityMappings which could work exclusively with the supertype.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me see what is the best way to simplify the implementation.

@imjalpreet
Copy link
Member Author

@elharo I have refactored the previous implementation to try to address the points you mentioned, can you please have a look and let me know if this looks better? I have kept a separate commit for now.

return this;
}

@AssertTrue(message = "MAPPING TYPE(" + MAPPING_TYPE + ") must be configured when AWS Security Mapping Config File(" + CONFIG_FILE + ") is set and vice versa")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen this annotation before. Where is this coming from?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part of the below dependency:

<dependency>
            <groupId>javax.validation</groupId>
            <artifactId>validation-api</artifactId>
</dependency>

This is being used in a few more places already in Presto like com.facebook.presto.server.InternalCommunicationConfig#isRequiredSharedSecretSet, com.facebook.presto.sql.analyzer.FeaturesConfig#isSpillerSpillPathsConfiguredIfSpillEnabled

elharo
elharo previously approved these changes May 31, 2024
@imjalpreet imjalpreet force-pushed the AWSSecurityMapping branch from d728296 to 0881c39 Compare May 31, 2024 20:10
@imjalpreet
Copy link
Member Author

@elharo Thank you for the review.

@tdcmeehan The PR is ready for a final review from your side.

@imjalpreet
Copy link
Member Author

@tdcmeehan just a reminder, this PR is ready for final review.

Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of nits, happy to merge once addressed


AWSSecurityMapping awsS3SecurityMapping = mappings.get().getAWSS3SecurityMapping(context.getIdentity().getUser());

verify(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
verify(
checkArgument(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdcmeehan just wanted to confirm, I used verify since the expression I was trying to evaluate was not an argument to this method. Is it still okay to use checkArgument?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think checkArgument is more appropriate, because it throws an illegal argument exception, which more accurately describes the problem by my read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I have updated it to checkArgument.

This will allow flexible security mapping for AWS Lake Formation(and AWS Glue) and AWS S3 API calls, allowing for separate IAM roles or IAM credentials for specific users.
@imjalpreet imjalpreet merged commit 07c1b33 into prestodb:master Jun 4, 2024
57 checks passed
@wanglinsong wanglinsong mentioned this pull request Jun 25, 2024
36 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants