Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-43296: [C++][FlightRPC] Remove Flight UCX transport #43297

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

raulcd
Copy link
Member

@raulcd raulcd commented Jul 17, 2024

Rationale for this change

This was experimental and we should not encourage it's use.

What changes are included in this PR?

Removal of UCX transport for Flight

Are these changes tested?

Via CI

Are there any user-facing changes?

Yes, we do remove UCX as a possible transport for Flight.

If users want something similar we should encourage the Dissassociated IPC protocol implementation

Copy link

⚠️ GitHub issue #43296 has been automatically assigned in GitHub to PR creator.

@raulcd
Copy link
Member Author

raulcd commented Jul 17, 2024

@kou @lidavidm should we just remove it, as proposed here, or do we have to announce the deprecation on one release and then do it on a follow up?
I did directly remove it because it was supposed to be experimental but let me know if you think it would be best to do it as a more "planned" deprecation process

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting committer review Awaiting committer review labels Jul 17, 2024
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

How about adding "Apache Arrow Flight UCX transport will be removed in 18.0.0" or something to apache/arrow-site#537 ?

@lidavidm
Copy link
Member

Good idea, I added a suggestion to the PR

@amirgon
Copy link
Contributor

amirgon commented Jul 18, 2024

We currently utilize the UCX ArrowFlight transport, which provides significant benefits, particularly in reducing CPU utilization. I understand that with Arrow 18, we'll need to implement this transport ourselves to maintain our current functionality.

However, I have several concerns and questions regarding this decision:

  1. Rationale for Removal: It's unclear why the UCX ArrowFlight transport is being removed and discouraged. As this feature was distributed as part of Arrow, we didn't anticipate it being discontinued.

  2. Recommended Alternative: If Disassociated IPC is now the recommended approach, why isn't the UCX ArrowFlight transport being reimplemented using Disassociated IPC and distributed as part of Arrow?

  3. User Abstraction: From a user perspective, the ability to choose a transport by simply prefixing the URL (e.g., ucx: or grpc+tcp:) provides a convenient abstraction. What's the motivation behind removing this abstraction and requiring users to implement protocol details themselves?

  4. Impact on Existing Users: Given that this feature is already in use and providing benefits, have you considered the impact on existing users who rely on this functionality?

  5. Migration Path: Could you provide guidance or a migration path for users who need to transition away from the UCX ArrowFlight transport?

We would appreciate more clarity on these points and any considerations for maintaining similar functionality within the Arrow project.

@raulcd
Copy link
Member Author

raulcd commented Jul 18, 2024

Hi,

It is great that you shared your concerns. Thanks for doing that I am happy to have a discussion around this as this was the expectation when I sent the initial email to the Mailing list over a month ago (See: https://lists.apache.org/thread/g89x2y6pvlq6gyf0d1jnxfl2onsrkyt8)

@amirgon I am not an expert on UCX but as the one proposing the removal after some discussions with other maintainers, I am going to answer here. I would love to move the conversation to the mailing list to give more visibility to the issue and so other experts can join the conversation too.

1. **Rationale for Removal**: It's unclear why the UCX ArrowFlight transport is being removed and discouraged. As this feature was distributed as part of Arrow, we didn't anticipate it being discontinued.

Since the beginning this was an experimental proof of concept. Quoting from the Mailing list:

Replacing gRPC was not the intent.
The disassociated protocol is worded very generically, but works over UCX and libfabric, so it is essentially equivalent but does not force you to use the predefined Flight RPC method names so it is more flexible in that recard.

As you comment, this was distributed as part of Arrow but as experimental. Usually there is a discussion on the mailing list and sometimes those experimental features might be deprecated in favour of other alternatives, from the documentation we can see it was always experimental:

The standard transport for Arrow Flight is gRPC_. The C++
implementation also experimentally supports a transport based on
UCX_. To use it, use the protocol scheme ``ucx:`` when starting a
server or creating a client.

Non-experimental features require a more strict deprecation method in case we as a community decide to do so, that's what happened with Plasma, for example.

2. **Recommended Alternative**: If Disassociated IPC is now the recommended approach, why isn't the UCX ArrowFlight transport being reimplemented using Disassociated IPC and distributed as part of Arrow?

As shared on the ML, this can be done but the Disassociated IPC protocol also allows a more flexible approach without having to use the same methods Flight RPC methods so being more flexible. Whether we want to reimplement ArrowFlight UCX using Dissassociated IPC is something that can be proposed / requested and done. I don't see an impediment to that.

3. **User Abstraction**: From a user perspective, the ability to choose a transport by simply prefixing the URL (e.g., `ucx:` or `grpc+tcp:`) provides a convenient abstraction. What's the motivation behind removing this abstraction and requiring users to implement protocol details themselves?

I don't have context about this and I am sure this could be discussed with experts on that area on the ML thread.

4. **Impact on Existing Users**: Given that this feature is already in use and providing benefits, have you considered the impact on existing users who rely on this functionality?

Yes, that's why we sent the mail to the mailing list to get feedback and where development discussion should happen. We also were going to add information to the blog post of the 17.0.0 release and I was suggesting whether a two release process for announcing the deprecation vs removing it was necessary or not, as we did for example for Plasma.

5. **Migration Path**: Could you provide guidance or a migration path for users who need to transition away from the UCX ArrowFlight transport?

In the past we've done similar guides for example when plasma was deprecated, see this mailing list thread here: https://lists.apache.org/thread/lk277x3b9gjol42sjg27bst2ggm5s0j2
I suppose we could try and do something like this.

We would appreciate more clarity on these points and any considerations for maintaining similar functionality within the Arrow project.

Please @lidavidm @zeroshade as the subject matter experts let me know if you want me to clarify something around that.

@lidavidm
Copy link
Member

(3) is the same argument: it may be convenient but it locks you into those methods, and I at least have come around to thinking that is a mistake for non-trivial applications. The disassociated IPC proposal is more flexible than that, and eventually I would like to make the base gRPC transport equally flexible.

For (4), if there's people willing to maintain it's possible the conversation may be different. Note that the tests were never fully enabled in CI because of flakiness, so there were already known issues in the implementation.

@raulcd
Copy link
Member Author

raulcd commented Jul 23, 2024

I plan to merge this if no more concerns are raised. @kou @lidavidm should I update the blog post for 17.0.0 release, send an email to the Mailing List (dev + user) announcing we are removing it or do we want to hold this until 19.0.0?

@lidavidm
Copy link
Member

Hmm. Given that there are some users and we haven't exactly demonstrated how to port to the Disassociated IPC approach, maybe we should hold it for one more cycle? And add explicit deprecation warnings for 18 instead.

@lidavidm
Copy link
Member

That said given the known deadlocks I'm surprised it actually works in practice 😅

@amirgon
Copy link
Contributor

amirgon commented Jul 23, 2024

That said given the known deadlocks I'm surprised it actually works in practice 😅

@lidavidm Could you elaborate on these "known deadlocks"? Where are they documented?

@kou
Copy link
Member

kou commented Jul 23, 2024

I agree with David. Can we defer this after the 18.0.0 release?

@lidavidm
Copy link
Member

@amirgon

#35742
#35587
#31543

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants