-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce AWS Security Mapping #21622
Conversation
0b99300
to
a08e98b
Compare
@tdcmeehan can you please help review this when you get a chance? Thanks |
When you can, please add documentation for how users can specify IAM roles or IAM credentials. If I can help, let me know! |
@steveburnett Yes sure, I have a draft in my local, I will refine it and push it as well for your review. |
private static final Logger log = Logger.get(AWSSecurityMappingsSupplier.class); | ||
private final Supplier<AWSSecurityMappings> mappingsSupplier; | ||
|
||
public AWSSecurityMappingsSupplier(Optional<File> configFile, Optional<Duration> refreshPeriod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate inputs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to check the file exists here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added the check in the getMappings
method
} | ||
|
||
Supplier<AWSSecurityMappings> supplier = () -> parseJson(configFile.get().toPath(), AWSSecurityMappings.class); | ||
return refreshPeriod.<Supplier<AWSSecurityMappings>>map(duration -> Suppliers.memoizeWithExpiration( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clean up formatting. orElseGet in new line, appropriately indented.
a08e98b
to
a90d3e6
Compare
Codenotify: Notifying subscribers in CODENOTIFY files for diff 39e11a5...d23c401. No notifications. |
@tdcmeehan sorry for the delay, please let me know in case you have more feedback. I have made the updates based on your previous review comments. @steveburnett I have pushed the documentation as you had requested. Please let me know if you would like any changes. |
@imjalpreet, when I try to build the doc from the branch in this PR I get the following error: "python3 -m sphinx -b html -n -d target/doctrees -j auto src/main/sphinx target/html Sphinx version error: I think this may be because this PR predates PR 21708. Would you update this branch to pick up the change in PR 21708? I'm happy to try again after that. |
a90d3e6
to
49f93f7
Compare
@steveburnett Thanks, I have rebased the branch on latest master. |
Thanks! A local doc build is successful now and I see your new docs in the result. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the documentation! It makes sense to me. I commented on a few nits, and suggested moving some of the text to separate the s3 and lakeformation information as their own sections. Let me know what you think, and if any of my suggestions are unclear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, just waiting for earlier comments to be addressed.
49f93f7
to
11812b8
Compare
@steveburnett Thank you for the review and suggestions, I have made the changes. @tdcmeehan sorry for the delay, the review comments have been addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull updated branch, new local build of docs, everything looks good. Thanks!
@tdcmeehan just a reminder, all the comments were addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks to me like it belongs to a separate module, not presto-hive-common
presto-hive-common/pom.xml
Outdated
@@ -95,6 +105,11 @@ | |||
<artifactId>drift-api</artifactId> | |||
</dependency> | |||
|
|||
<dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer not to add this dependency to common code. Anyway it can be isolated to just the aws package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realised that I am only using one class from this dependency in the class com.facebook.presto.hive.aws.security.s3.AWSS3SecurityMapping.
It is being used to form the credentials object here: https://github.com/prestodb/presto/pull/21622/files#diff-33e654b04e220ef6b736fee85d4cf10f15e0d5255c6af614ce8d4a2000db2b27R51.
We have a couple of options on how we can exclude the aws dependency from hive-common:
- Make a clone(or similar class) of the BasicAWSCredentials class within Presto and use that here
- Rather than creating the credentials object here, modify the current implementation and keep the access key and secret key as is in this class and, create the credentials object when required.
CC: @tdcmeehan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Option 1 sounds fine to me, because we just need the key and secret key, and those can just be opaque strings.
"No AWS Security mappings configured"); | ||
checkArgument( | ||
(awsLakeFormationSecurityMappings == null) || (awsS3SecurityMappings == null), | ||
"AWS S3 Security Mapping is not allowed if AWS Lake Formation Security Mapping is configured"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and vice versa
|
||
Supplier<AWSSecurityMappings> supplier = () -> parseJson(configFile.get().toPath(), AWSSecurityMappings.class); | ||
return refreshPeriod.<Supplier<AWSSecurityMappings>>map(duration -> Suppliers.memoizeWithExpiration( | ||
() -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log message not really needed; level debug at most
@imjalpreet what do you think about this? I know that Lake Formation also supports Iceberg tables. Would that mean it would need to be placed in a common module? |
@elharo @tdcmeehan The reason I had to include these classes in hive-common as these classes would be used from both presto-hive and presto-hive-metastore. What would be the ideal package that you think these should be included in? I can try to see if that would be possible. |
11812b8
to
19d82dc
Compare
19d82dc
to
5e8bedb
Compare
@tdcmeehan @elharo I have updated the PR based on the discussion and review comments. Please let me know if there are any additional comments. |
public Optional<File> getConfigFile() | ||
{ | ||
if (configFile != null) { | ||
checkArgument(configFile.exists() && configFile.isFile(), "AWS Lake Formation Security Mapping config file does not exist: %s", configFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
configFile is not an argument so this should be an IllegalStateException or an IOException, not an IllegalArgumentException. I'm tempted to say we shouldn't check for existence here at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have moved this check to where the file is getting used as you suggested
@ConfigDescription("JSON configuration file containing AWS IAM Security mappings") | ||
public AWSSecurityMappingConfig setConfigFile(File configFile) | ||
{ | ||
this.configFile = configFile; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, definitely don't check in getConfig. Possibly check when the object is created or when the file is read, but not in the getter method.
|
||
@Config("hive.aws.security-mapping.config-file") | ||
@ConfigDescription("JSON configuration file containing AWS IAM Security mappings") | ||
public AWSSecurityMappingConfig setConfigFile(File configFile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we explicitly mark things nullable? This is nullable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the annotation
@MinDuration("0ms") | ||
@Config("hive.aws.security-mapping.refresh-period") | ||
@ConfigDescription("Time interval after which AWS IAM security mapping configuration will be refreshed") | ||
public AWSSecurityMappingConfig setRefreshPeriod(Duration refreshPeriod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want this to be nullable? It is.
I'm a little surprised the object is mutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want this to be nullable? It is.
I don't think this will ever be null since null is not an acceptable value for Duration. Do you see any issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little surprised the object is mutable.
Since new users can be added or credentials for existing users might be modified at any time. To make that change dynamic without a cluster restart, I have made this object mutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All someone has to do to make this null is pass null to this method. If that's nopt OK, put a requireNonNull here. If it is OK, annotate it as nullable.
private static final Logger log = Logger.get(AWSSecurityMappingsSupplier.class); | ||
private final Supplier<AWSSecurityMappings> mappingsSupplier; | ||
|
||
public AWSSecurityMappingsSupplier(Optional<File> configFile, Optional<Duration> refreshPeriod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to check the file exists here
|
||
applyMapping(provider, selector, configuration); | ||
|
||
assertEquals(configuration.get(S3_ACCESS_KEY), mappingResult.getAccessKey().orElse(null)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
orElse in an assert feels strange to me. Just use get(). If it's not present the assert will fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified this
import static com.facebook.presto.hive.s3.security.TestAWSS3SecurityMapping.MappingSelector.empty; | ||
import static com.google.common.io.Resources.getResource; | ||
import static java.util.Objects.requireNonNull; | ||
import static org.assertj.core.api.Assertions.assertThatThrownBy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personal preference, but I try to stick to a single test framework for simplicity. Mixing and matching is confusing
{ | ||
Configuration configuration = new Configuration(false); | ||
|
||
assertThatThrownBy(() -> applyMapping(provider, selector, configuration)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the expectedExceptions parameter in the @Test
annotation instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to exceptedExceptions.
There are many classes in Presto that are using assertThatThrownBy
along with TestNG. We can plan the refactor later.
} | ||
} | ||
|
||
protected static class MappingResult |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private
|
||
private MappingResult(Optional<String> accessKey, Optional<String> secretKey, Optional<String> role) | ||
{ | ||
this.accessKey = requireNonNull(accessKey, "accessKey is null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requireNonNull really isn't needed in private test code
5e8bedb
to
2e3fea1
Compare
@elharo I have added some changes and comments above, please let me know in case you have more comments. |
AWS Security Mapping | ||
^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Presto supports flexible security mapping for AWS Lake Formation, AWS Glue, and AWS S3 API calls, allowing for separate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "security mapping" defined somewhere else? This is not a term I've heard before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a generic term, we named it like that since it maps different Presto User Identities to AWS Credentials. Let me know if you have any ideas on the naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you just need to say that in the docs and the code then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current explanation here on line 389-390: "... allowing for separate credentials or IAM roles for specific users." do you think we should rephrase this better? We tried to explain what we mean by security mapping here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are more details on what each mapping should include in the following lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you still never say what a security mapping is. You need to define that. Easiest way might be to rewrite the first paragraph without using the phrase "security mapping".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will rephrase it.
@JsonProperty("lakeformation") List<AWSLakeFormationSecurityMapping> awsLakeFormationSecurityMappings, | ||
@JsonProperty("s3") List<AWSS3SecurityMapping> awsS3SecurityMappings) | ||
{ | ||
checkArgument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awkward. Anyway there could be two constructors or two classes to handle the two different cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my initial design, I had both of them(AWS S3 and AWS Lake Formation) as separate features, which meant both had their own set of configs to enable the feature. I felt it would be similar code for both the features and since they are a subset of AWS services, we could have a common feature that can be used either for S3 or Lake Formation depending upon the use-case.
Since only one constructor can be annotated with @JsonCreator
, we had to do it this way since we use this class to parse the Json File which has the mappings.
ImmutableList.of(); | ||
} | ||
|
||
public boolean isAWSLakeFormationSecurityMappingEnabled() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These is
methods really make me think adding some subclasses would clean this up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking out loud, if I add a new parameter in the json file at the beginning something like service
which can have two values s3 or lakeformation. Instead of the "is" methods, I can introduce the getServiceName method based on which the relevant feature would be enabled, for eg. in AWSS3SecurityMappingConfigurationProvider#updateConfiguration
I will enable it only if service is S3.
Do you think this would be more readable/cleaner?
@tdcmeehan Can you please have another look in case you have any other comments? |
AWS Security Mapping | ||
^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Presto supports flexible security mapping for AWS Lake Formation, AWS Glue, and AWS S3 API calls, allowing for separate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you still never say what a security mapping is. You need to define that. Easiest way might be to rewrite the first paragraph without using the phrase "security mapping".
A security mapping entry can be of two types: ``s3`` or ``lakeformation``. | ||
|
||
The security mapping entries are processed in the order listed in the configuration | ||
file. More specific mappings should thus be specified before less specific mappings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete "thus"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unaddressed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated it, somehow missed it earlier.
@MinDuration("0ms") | ||
@Config("hive.aws.security-mapping.refresh-period") | ||
@ConfigDescription("Time interval after which AWS IAM security mapping configuration will be refreshed") | ||
public AWSSecurityMappingConfig setRefreshPeriod(Duration refreshPeriod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All someone has to do to make this null is pass null to this method. If that's nopt OK, put a requireNonNull here. If it is OK, annotate it as nullable.
import static com.google.common.base.MoreObjects.toStringHelper; | ||
import static java.util.Objects.requireNonNull; | ||
|
||
public class AWSLakeFormationSecurityMapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like AWSLakeFormationSecurityMapping and AWSS3SecurityMapping should have a common, possibly abstract, SecurityMapping superclass. That might allow you to avoid some of the problems in AWSSecurityMappings which could work exclusively with the supertype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me see what is the best way to simplify the implementation.
@elharo I have refactored the previous implementation to try to address the points you mentioned, can you please have a look and let me know if this looks better? I have kept a separate commit for now. |
return this; | ||
} | ||
|
||
@AssertTrue(message = "MAPPING TYPE(" + MAPPING_TYPE + ") must be configured when AWS Security Mapping Config File(" + CONFIG_FILE + ") is set and vice versa") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen this annotation before. Where is this coming from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is part of the below dependency:
<dependency>
<groupId>javax.validation</groupId>
<artifactId>validation-api</artifactId>
</dependency>
This is being used in a few more places already in Presto like com.facebook.presto.server.InternalCommunicationConfig#isRequiredSharedSecretSet
, com.facebook.presto.sql.analyzer.FeaturesConfig#isSpillerSpillPathsConfiguredIfSpillEnabled
d728296
to
0881c39
Compare
@elharo Thank you for the review. @tdcmeehan The PR is ready for a final review from your side. |
@tdcmeehan just a reminder, this PR is ready for final review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of nits, happy to merge once addressed
|
||
AWSSecurityMapping awsS3SecurityMapping = mappings.get().getAWSS3SecurityMapping(context.getIdentity().getUser()); | ||
|
||
verify( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
verify( | |
checkArgument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tdcmeehan just wanted to confirm, I used verify since the expression I was trying to evaluate was not an argument to this method. Is it still okay to use checkArgument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think checkArgument
is more appropriate, because it throws an illegal argument exception, which more accurately describes the problem by my read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I have updated it to checkArgument.
This will allow flexible security mapping for AWS Lake Formation(and AWS Glue) and AWS S3 API calls, allowing for separate IAM roles or IAM credentials for specific users.
0881c39
to
d23c401
Compare
Description
This will allow flexible security mapping for AWS Lake Formation(and AWS Glue) and AWS S3 API calls, allowing for separate IAM roles or IAM credentials for specific users.
Motivation and Context
#20851
Impact
This PR introduces the ability to use multiple IAM roles / IAM credentials for different Presto Users to access AWS S3.
Follow-up PRs will allow the same for AWS Lake Formation.
Test Plan
Tested with different IAM Roles for different Presto Users when accessing AWS S3 using the Hive Connector
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.