IFRT proxy: asynchronous and faster MakeArrayFromHostBuffer #19407

Open: wants to merge 1 commit into base branch main.


copybara-service[bot]


Note: I use the term 'control path' to refer to everything except `HostBufferStore.Store()` and `HostBufferStore.Lookup()` operations.

This CL improves performance with the following changes:

  • The client now generates the array handles for `MakeArrayFromHostBufferRequest` (instead of the server generating them) and returns to the caller immediately after sending the request. Since control-path requests stay ordered across the proxy, subsequent operations on the array need no special handling.
  • The client does not order the data-path `HostBufferStoreRequest` relative to its corresponding `MakeArrayFromHostBufferRequest`; it may send it before or after. On the server side, when the loop that handles control-path requests encounters a `MakeArrayFromHostBufferRequest`, it blocks until the corresponding `HostBufferStoreRequest` has been processed.
  • The data path (the `HostBufferStore` implementation) and the control path now use separate gRPC channels.
  • Resulting performance: BM_HostToDeviceAsync/1M/2k now exceeds 3 GB/s, at which point it is bottlenecked by gRPC `stream.Write()` latency. BM_HostToDeviceAsync/1K/98k (~4 MB/s) was already bottlenecked by gRPC `stream.Write()` latency.
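The handshake in the first three bullets can be modeled as a single-process sketch in standard C++. This is not the IFRT proxy code: `MakeArrayRequest`, `ControlChannel`, `ToyHostBufferStore`, `ToyClient`, and `RunControlLoop` are hypothetical stand-ins, an in-memory FIFO plays the ordered control channel, and a per-request thread plays the separate data channel. It illustrates the three properties described above: the client picks the handle and returns without waiting for a reply, the data-path store is unordered relative to the control request, and the server's control loop blocks in `Lookup()` until the matching buffer arrives.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical control-path request; not the real IFRT proxy proto.
struct MakeArrayRequest {
  uint64_t array_handle;        // chosen by the client, not the server
  uint64_t host_buffer_handle;  // names the bytes sent on the data path
};

// Ordered control channel: the server receives requests in send order.
class ControlChannel {
 public:
  void Send(MakeArrayRequest req) {
    { std::lock_guard<std::mutex> lock(mu_); queue_.push(req); }
    cv_.notify_one();
  }
  MakeArrayRequest Receive() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !queue_.empty(); });
    MakeArrayRequest req = queue_.front();
    queue_.pop();
    return req;
  }
 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<MakeArrayRequest> queue_;
};

// Toy server-side buffer store; Store() and Lookup() may happen in either order.
class ToyHostBufferStore {
 public:
  void Store(uint64_t handle, std::string data) {
    { std::lock_guard<std::mutex> lock(mu_); buffers_[handle] = std::move(data); }
    cv_.notify_all();
  }
  // Blocks until `handle` has been stored, then removes and returns its bytes.
  std::string Lookup(uint64_t handle) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return buffers_.count(handle) > 0; });
    std::string data = std::move(buffers_[handle]);
    buffers_.erase(handle);
    return data;
  }
 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::map<uint64_t, std::string> buffers_;
};

class ToyClient {
 public:
  ToyClient(ControlChannel& control, ToyHostBufferStore& data)
      : control_(control), data_(data) {}
  ~ToyClient() {
    for (std::thread& t : senders_) t.join();  // wait for in-flight data sends
  }
  // Returns a client-chosen handle right after sending; never waits for a reply.
  uint64_t MakeArrayFromHostBuffer(std::string bytes) {
    uint64_t array_handle = next_handle_.fetch_add(1);
    uint64_t buffer_handle = next_handle_.fetch_add(1);
    {
      // Data path: its own thread, unordered relative to the control request.
      std::lock_guard<std::mutex> lock(mu_);
      senders_.emplace_back([this, buffer_handle, b = std::move(bytes)]() mutable {
        data_.Store(buffer_handle, std::move(b));
      });
    }
    control_.Send({array_handle, buffer_handle});  // control path: ordered
    return array_handle;
  }
 private:
  ControlChannel& control_;
  ToyHostBufferStore& data_;
  std::atomic<uint64_t> next_handle_{1};
  std::mutex mu_;
  std::vector<std::thread> senders_;
};

// Server control loop: handles `n` requests in order; each MakeArray request
// blocks until its host buffer has arrived on the data path.
std::map<uint64_t, std::string> RunControlLoop(ControlChannel& control,
                                               ToyHostBufferStore& store, int n) {
  std::map<uint64_t, std::string> arrays;
  for (int i = 0; i < n; ++i) {
    MakeArrayRequest req = control.Receive();
    arrays[req.array_handle] = store.Lookup(req.host_buffer_handle);
  }
  return arrays;
}
```

Because the handle exists before any server reply, a caller can immediately issue further control-path requests that name it; the FIFO ordering means the server will already have handled the `MakeArray` request, blocking only as long as the data path needs.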
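As a back-of-the-envelope check (not part of the CL), assume the first benchmark argument is the number of bytes carried by each `stream.Write()`, reading 1M and 1K as 2^20 and 2^10 bytes. That reading of the benchmark names is an assumption. Under it, the two reported throughputs imply per-write times of the same order:

```math
t_{\text{write}} \approx \frac{\text{bytes per write}}{\text{throughput}}:\qquad
\frac{2^{20}\ \text{B}}{3\times 10^{9}\ \text{B/s}} \approx 350\ \mu\text{s},\qquad
\frac{2^{10}\ \text{B}}{4\times 10^{6}\ \text{B/s}} \approx 256\ \mu\text{s}
```

A per-write cost that stays at a few hundred microseconds while the message grows 1024× fits the latency-bound reading. Because 3 GB/s is a lower bound, ~350 µs is an upper bound for the larger case.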

This CL also:

  • Adds more XProf TraceMe annotations.
  • Introduces `global_flags.h` and `global_flags_google.cc` so that command-line flags can be used conveniently in the proxy client. This may not be ideal from a clean-code perspective, but it makes the client much easier to develop and debug.

copybara-service bot force-pushed the test_693790827 branch 3 times, most recently from db99e81 to 2f258d9 on November 16, 2024 01:06.

PiperOrigin-RevId: 693790827