Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s2a: Add gRPC S2A #11113

Merged
merged 28 commits into from
Sep 14, 2024
Merged

s2a: Add gRPC S2A #11113

merged 28 commits into from
Sep 14, 2024

Conversation

rmehta19
Copy link
Contributor

@rmehta19 rmehta19 commented Apr 18, 2024

Add S2A Java client to gRPC Java.

Context: https://github.com/google/s2a-go/blob/main/README.md

Copy link

linux-foundation-easycla bot commented Apr 18, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@matthewstevenson88
Copy link
Contributor

Thanks @rmehta19. Can you PTAL at the test failures?

@rmehta19
Copy link
Contributor Author

rmehta19 commented Apr 22, 2024

@matthewstevenson88, I made 2 changes and it looks like 2/3 linux runs are passing. tests(11) is failing with an error in code that was not affected by this PR, so I don't think that failure is related.

The changes:

  • The build originally failed because the generated grpc code in this PR did not match the code generated by the codegen plugin. I couldn't easily compile the codegen myself(due to reasons explained in: protoc-gen-grpc-java not available on apple m1 #7690). So instead I used the latest published binary available on https://mvnrepository.com/artifact/io.grpc/protoc-gen-grpc-java (1.63) to generate S2AServiceGrpc.java (Created and used a separate simple maven project just to run the plugin and get grpc gen code). Then I changed line 8 & 9 of S2AServiceGrpc.java to match other generated code in this repo (ex:

    value = "by gRPC proto compiler",
    comments = "Source: grpc/gcp/handshaker.proto")
    ). This resolved the errors thrown by
    - name: Check for modified codegen
    run: test -z "$(git status --porcelain)" || (git status && echo Error Working directory is not clean. Forget to commit generated files? && false)

  • A failure in the tests was caused when IntegrationTest.java is in package io.grpc.s2a: GetAuthenticationMechanismsTest.java fails because the flag is not being set by JCommander (or perhaps it is getting overwritten). When I moved IntegrationTest.java into the same package as GetAuthenticationMechanismsTest.java: io.grpc.s2a.handshaker, this error does not happen.

@matthewstevenson88
Copy link
Contributor

LGTM, thanks @rmehta19! When you have a chance, please remove "draft" to the PR title and can we say a bit more in the CL description including pointing to the S2A-Go README (similar to what we did in the proto PR)?

After that, can we get sign-off from @erm-g and @ejona86?

@rmehta19 rmehta19 changed the title [DRAFT]s2a: Add gRPC S2A s2a: Add gRPC S2A Apr 26, 2024
@rmehta19
Copy link
Contributor Author

LGTM, thanks @rmehta19! When you have a chance, please remove "draft" to the PR title and can we say a bit more in the CL description including pointing to the S2A-Go README (similar to what we did in the proto PR)?

After that, can we get sign-off from @erm-g and @ejona86?

Done -- thank you for the review @matthewstevenson88!

Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure I understand, s2a is the public API, and s2a/channel and s2a/handshaker are internal APIs. Are those internal APIs to be used anywhere, even within Google?

s2a/src/main/java/io/grpc/s2a/S2AChannelCredentials.java Outdated Show resolved Hide resolved
s2a/src/main/proto/grpc/gcp/common.proto Outdated Show resolved Hide resolved
@rmehta19
Copy link
Contributor Author

rmehta19 commented Sep 9, 2024

Thanks @larry-safran, the item's have been addressed

@rmehta19
Copy link
Contributor Author

Thanks for the review @larry-safran ! Please let me know if there is anything else to address.

@larry-safran larry-safran added the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Sep 13, 2024
@grpc-kokoro grpc-kokoro removed the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Sep 13, 2024
@larry-safran larry-safran merged commit b8c1aa5 into grpc:master Sep 14, 2024
15 checks passed
Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some random things I noticed.

* Configures an {@code S2AChannelCredentials.Builder} instance with credentials used to establish a
* connection with the S2A to support talking to the S2A over mTLS.
*/
public final class MtlsToS2AChannelCredentials {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a different builder than S2aChannelCredentials? If it just is for different arguments... that's why we have the Builder.

File certChainFile = new File(certChainPath);
File trustBundleFile = new File(trustBundlePath);

AdvancedTlsX509KeyManager keyManager = new AdvancedTlsX509KeyManager();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need AdvancedTlsX509KeyManager here. All you needed was:

TlsChannelCredentials.newBuilder()
    .keyManager(certChainFile, privateKeyFile)
    .trustManager(trustBundleFile)
    .build();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 9c44754

* @param s2aAddress the address of the S2A server used to secure the connection.
* @return a {@code S2AChannelCredentials.Builder} instance.
*/
public static Builder createBuilder(String s2aAddress) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should follow the naming convention of newBuilder()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 467ddf5

public static final class Builder {
private final String s2aAddress;
private ObjectPool<Channel> s2aChannelPool;
private Optional<ChannelCredentials> s2aChannelCredentials;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using Optional, default this to InsecureChannelCredentials.create().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in ef53441

@Override
public Channel create() {
EventLoopGroup eventLoopGroup =
new NioEventLoopGroup(1, new DefaultThreadFactory("S2A channel pool", true));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why create your own event loop? With ALTS that was necessary because it would block (and it was supposed to be temporary). We should do no blocking with s2a.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the usage of own event loop in 03ced73

*/
@ThreadSafe
public final class S2AHandshakerServiceChannel {
private static final ConcurrentMap<String, Resource<Channel>> SHARED_RESOURCE_CHANNELS =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm uncomfortable with s2aChannelCredentials not being part of the cache key.

Do we even need this global? Having at most one connection per s2a-using Channel doesn't seem that bad. The channel can be configured with idleTimeout() or we tear it down when we aren't doing any handshakes.

Copy link
Contributor Author

@rmehta19 rmehta19 Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was forked from alts, but we don't need it for S2A. I've removed the usage of the global in df413b1

try {
isDelegateTerminated =
delegate.awaitTermination(DELEGATE_TERMINATION_TIMEOUT.getSeconds(), SECONDS);
} catch (InterruptedException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing Thread.currentThread().interrupt()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in c1375bb

* Configures gRPC to use S2A for transport security when establishing a secure channel. Only for
* use on the client side of a gRPC connection.
*/
public final class S2AChannelCredentials {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the public APIs should be labeled with @ExperimentalApi. The string is a tracking issue; just creating one for all the APIs is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 2a169ba

SessionResp resp;
try {
resp = stub.send(reqBuilder.build());
} catch (IOException | InterruptedException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing Thread.currentThread().interrupt()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in c1375bb

@Override
public void onSuccess(SslContext sslContext) {
ChannelHandler handler =
InternalProtocolNegotiators.tls(sslContext).newHandler(grpcHandler);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are blocking in the S2ATrustManager, you must use tls(SslContext, ObjectPool<Executor>) for SSLHandler to avoid blocking on the event loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out, although in order to do this, I think we need to add a new accessor to

public static ProtocolNegotiator tls(SslContext sslContext,
from InternalProtocolNegotiators.

Changes + adding the new accessor done in: 03ced73

Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More random stuff. This is mostly from just skimming randomly and noticing unexpected code shapes.

public final class IntegrationTest {
private static final Logger logger = Logger.getLogger(FakeS2AServer.class.getName());

private static final String CERT_CHAIN =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do they have an expiration date? In all other cases we have documentation on how these were created so we 1) know what they are and 2) know how to recreate them. FakeWriter has even more...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the hardcoded certs from this file and FakeWriter. I changed these files to read the certs from the resources folder like the rest of the tests. I regenerated the certs used in these two files and added instructions for how to do so in the resources folder. All of the certs used in the S2A tests expire in 20 yrs. See readme for commands to regenerate.

Done in 19b8f21

@@ -0,0 +1,16 @@
-----BEGIN CERTIFICATE REQUEST-----
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is weird to commit the csr. That is temporary, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the README,the csr is created as part of the steps to regenerate the certs in this folder, which have a lifetime of 20 yrs. This resources directory is similar to the one we have the S2A Go client: https://github.com/google/s2a-go/tree/main/testdata

subjectAltName = @alt_names

[alt_names]
IP.1 = ::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is happening here? How would this ever work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so that the generated certs have localhost IP addr, so that certificate verification passes in tests

private static final ImmutableList<ByteString> FAKE_CERT_DER_CHAIN =
ImmutableList.of(
ByteString.copyFrom(
new byte[] {'f', 'a', 'k', 'e', '-', 'd', 'e', 'r', '-', 'c', 'h', 'a', 'i', 'n'}));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: "fake-der-chain".getBytes(StardardCharsets.US_ASCII)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e3f3413


/** Manages access tokens for authenticating to the S2A. */
@ThreadSafe
public final class AccessTokenManager {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move all the subpackages that aren't supposed to be used into an internal subpackage, like io.grpc.s2a.internal.handshaker.tokenmanager here.

@Override
public void onNext(SessionResp resp) {
verify(!doneReading);
responses.offer(Result.createWithResponse(resp));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the offer()s scare me. The queue is sort of randomly sized (not just 2), which also doesn't feel me with confidence. It'd be much better to use add() which throws an exception if the queue is full instead of silently losing data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in d13bd88

exception = e;
}
responses.clear();
if (exception != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you allow non-exception results here? If there was ever a non-exception present, that guarantees 1) that this code doesn't know what is going on and 2) there is a race (as that message could have been delayed until send() sent the request, at which point it looks like it is the response for the wrong request).

It seems clear to me you've mirrored some of the ALTS code, but I don't know why it was made complicate/worse in ways like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic existed before we added withWaitForReady on the S2AStub. We had this logic to handle a very specific case:
sometimes the client tries sending the first RPC before the S2A is healthy, so the response returned was always the response to the first RPC. This hasn't caused any problems yet. But now that we are using withWaitForReady, we don't need this type of logic, so I've removed it in: 9efaa6a

* @return a {@link String} representing the ciphersuite.
* @throws AssertionError if the {@link Ciphersuite} is not one of the supported ciphersuites.
*/
static String convertCiphersuite(Ciphersuite ciphersuite) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like dead code.

Copy link
Contributor Author

@rmehta19 rmehta19 Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 577b4e5

* @return a {@link String} representation of the TLS version.
* @throws AssertionError if the {@code tlsVersion} is not one of the supported TLS versions.
*/
static String convertTlsProtocolVersion(TLSVersion tlsVersion) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VisibleForTesting? Looks like it is only used privately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 59573ec

case TLS_VERSION_1_0:
return "TLSv1";
default:
throw new AssertionError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an error for an unexpected argument is asking for huge trouble. From what I can tell, this is not currently checked by the caller and this comes from a remote system.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the tls version info is coming from S2A. Would it be appropriate if we threw an IllegalArgumentException here, and dealt with it in the caller (SslContextFactory)?


Channel ch = channelPool.getChannel();
S2AServiceGrpc.S2AServiceStub stub = S2AServiceGrpc.newStub(ch);
S2AStub s2aStub = S2AStub.newInstance(stub);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Stub is never closed. This causes IntegrationTest (after Larry's reordering of awaitTermination) to not complete tests quickly, as the RPC is outstanding so awaitTermination() hangs. I think once we shut down the top-level channel that releases the S2AHandshakerServiceChannel.ChannelResource after 1 second, which then calls shutdownNow() on the s2a channel which cancels the RPC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we never call close on the stub directly. When the
close gets invoked, everything you mentioned happens.

@rmehta19
Copy link
Contributor Author

Thanks for the comments @ejona86; I'll address them in separate commits at #11534. Let me know if you prefer separate PRs for them, or any other way of organizing.

@ejona86
Copy link
Member

ejona86 commented Sep 18, 2024

There's plenty of smaller/isolated items. Feel free to group those together as you wish, as they are easy to review. A few of the comments might turn into larger/plumbing changes. Some of those may be best to be separate. But overall, if you do lots of the "easy" stuff together that'd be fine to review/merge while you work on something more difficult.

@rmehta19
Copy link
Contributor Author

@ejona86 , @larry-safran I have addressed all the comments on this PR in the following 3 PRs:

PTAL at these 3 and leave any comments.

I will address the comment to move things to internal package and combine the MTLS and non-MTLS apis in the following 2 PRs, once the above 3 are merged.

Besides these changes, please let me know anything else we can do to enable s2a to be included in a grpc-java release. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants