-
Notifications
You must be signed in to change notification settings - Fork 239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A80: gRPC Metrics for TCP connection #428
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM to me as discussed offline, I'll leave others to approve.
A80-grpc-metrics-for-tcp-connection
Outdated
|
||
| Name | Disposition | Description | | ||
| ----------- | ----------- | ----------- | | ||
| grpc.tcp.remote_peer_address | optional | Store the peer address info in the format as `ip:port`. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markdroth @yashykt I feel like there's an equivalent label already in use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a relatively expensive fan-out for a busy server
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, we discussed about this offline. Is there a way to support an opaque user-defined peer info string? In prod, by default we only record the cell information due to the fan-out concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to figure out the right format here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be clarified more. For instance, is ipv4:///1.2.3.4:567
also OK? What is the format for ipv6
? What other schemes do we support? @ejona86 is concerned that ipv4:1.2.3.4:567
is not a URI we should support in the first place. So maybe this should just be specified fully as strictly:
- ipv4:<ipv4-address>:<port-number>
OR
- ipv6:[<ipv6-address>]:<port-number>
A80-grpc-metrics-for-tcp-connection
Outdated
|
||
| Name | Type | Unit | Labels | Description | | ||
| ------------- | ----- | ----- | ------- | ----------- | | ||
| grpc.tcp.min_rtt | Distribution | s | grpc.tcp.remote_peer_string | Reports TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yashykt how well does seconds here align with other metrics we have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are following OpenTelemetry's conventions mostly which advocates for seconds 's' as the unit for duration
A80-grpc-metrics-for-tcp-connection
Outdated
| grpc.tcp.delivery_rate | Distribution | bit/s | grpc.tcp.remote_peer_string | Records the most recent non-app-limited throughput at the time that Fathom samples the connection statistics. | | ||
| grpc.tcp.packets_sent | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets TCP sends in the calculation period. | | ||
| grpc.tcp.packets_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. | | ||
| grpc.tcp.packets_spurious_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets spuriously retransmitted packets in the calculation period. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what spurious means in this context... maybe we could link to some write-up, or include some more text defining these things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not find an authoritative source we can link, but I think we can include what's on go/fathom-metrics i.e. "These are retransmissions that TCP later discovered unnecessary.". The definition is public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I will add that in the description.
A80-grpc-metrics-for-tcp-connection
Outdated
|
||
| Name | Type | Unit | Labels | Description | | ||
| ------------- | ----- | ----- | ------- | ----------- | | ||
| grpc.tcp.min_rtt | Distribution | s | grpc.tcp.remote_peer_string | Reports TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of remarks on this:
- Do you know how portable this is going to be? It would help to explain how you plan to collect that data. Do you plan to use
getsockopt
+TCP_INFO
that yields a min_rtt in some cases? Even on linux, I'm not sure all TCP algorithms measure the minimum RTT. - What is a Distribution type? Do you mean a histogram?
- Since this is a property of the transport, this probably belongs under
grpc.transport.tcp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think min_rtt is general enough that we can include, I presume most congestion control tracks RTT or equivalent to keep track of the estimated BDP.
And I think this gRFC is supposed to cover the definition of the metric and not about the collection method. @yashykt Yes, in Linux we can get it via either getsockopt(TCP_INFO)
or TCP_NLA_MIN_RTT
from cmsg
when timestamping is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think min_rtt is general enough that we can include, I presume most congestion control tracks RTT or equivalent to keep track of the estimated BDP.
OK, and even if it weren't the case, there is probably no problem with omitting it.
And I think this gRFC is supposed to cover the definition of the metric and not about the collection method.
OK, yes. My point is more that the definition needs to be precise enough so that different implementations measure the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if it's important for the correctness of the metrics, it makes sense to give more information on this metric. Maybe you could say that a valid implementation of this is through getsockopt
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a summary of how the metrics are collected and linked some references. Let me know if anything is unclear. Thanks.
A80-grpc-metrics-for-tcp-connection
Outdated
| Name | Type | Unit | Labels | Description | | ||
| ------------- | ----- | ----- | ------- | ----------- | | ||
| grpc.tcp.min_rtt | Distribution | s | grpc.tcp.remote_peer_string | Reports TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. | | ||
| grpc.tcp.delivery_rate | Distribution | bit/s | grpc.tcp.remote_peer_string | Records the most recent non-app-limited throughput at the time that Fathom samples the connection statistics. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is Fathom? A quick internet search yields https://dl.acm.org/doi/pdf/10.1145/3603269.3604815 which sounds related (but I have not read it)... Do you think anything that uses it is going to useful to anyone outside Google?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, and app-limitedness is also not a general property either. I think we should update to something like "Latest throughput measured of the TCP connection."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, updated.
A80-grpc-metrics-for-tcp-connection
Outdated
|
||
| Name | Disposition | Description | | ||
| ----------- | ----------- | ----------- | | ||
| grpc.tcp.remote_peer_address | optional | Store the peer address info in the format as `ip:port`. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet
A80-grpc-metrics-for-tcp-connection
Outdated
|
||
| Name | Type | Unit | Labels | Description | | ||
| ------------- | ----- | ----- | ------- | ----------- | | ||
| grpc.tcp.min_rtt | Distribution | s | grpc.tcp.remote_peer_string | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe we should add the remote_string and maybe the local_string as optional labels on these metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, updated.
A80-grpc-metrics-for-tcp-connection
Outdated
|
||
| Name | Type | Unit | Labels | Description | | ||
| ------------- | ----- | ----- | ------- | ----------- | | ||
| grpc.tcp.min_rtt | Distribution | s | grpc.tcp.remote_peer_string | Reports TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are following OpenTelemetry's conventions mostly which advocates for seconds 's' as the unit for duration
A80-grpc-metrics-for-tcp-connection
Outdated
|
||
| Name | Type | Unit | Labels | Description | | ||
| ------------- | ----- | ----- | ------- | ----------- | | ||
| grpc.tcp.min_rtt | Distribution | s | grpc.tcp.remote_peer_string | Reports TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if it's important for the correctness of the metrics, it makes sense to give more information on this metric. Maybe you could say that a valid implementation of this is through getsockopt
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(approved by mistake)
A80-grpc-metrics-for-tcp-connection
Outdated
|
||
| Name | Disposition | Description | | ||
| ----------- | ----------- | ----------- | | ||
| grpc.tcp.remote_peer_address | optional | Store the peer address info in the format as `ip:port`. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…rics-for-tcp-connection.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ------------- | ----- | ----- | ------- | ----------- | | ||
| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. | | ||
| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest throughput measured of the TCP connection. | | ||
| grpc.tcp.packets_sent | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yashykt What types should we be defining here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the Counter type to uint64 based on the register methods in metrics.h.
@yashykt Feel free to modify if it will use other types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int64
and double
are very C/C++ specific. I prefer integer
and floating-point
to keep it generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks for the suggestion.
|
||
This document proposes changes to the following gRPC components. | ||
|
||
#### Per-Connection TCP Metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this can be ###
instead of ####
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks for the suggestion.
|
||
| Name | Type | Unit | Labels | Description | | ||
| ------------- | ----- | ----- | ------- | ----------- | | ||
| grpc.tcp.min_rtt | Histogram (double) | s | None | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.<br /> RTT = packet acked timestamp - packet sent timestamp. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Buckets for histogram are explicitly specified for request latency in https://github.com/grpc/proposal/blob/master/A66-otel-stats.md. Do you plan to reuse the same buckets? It might be worth specifying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, but using buckets for the min_rtt
metric wouldn't offer much significant insights. The value of min_rtt
is bound to the physical length of the path between sender and receiver or the queueing time caused by throttling and load. A step-change in min_rtt
values usually means that traffic is being throttled or experiencing congestion, or has been re-routed through a different path. I think it's better to opt out buckets in this situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nanahpang If you do not specify buckets, we will get the default OpenTelemetry buckets which is { 0, 5, 10, 25, 50, 75,100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000}. Are there better defaults that we can suggest or do we leave it to the user to figure it out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding OpenTelemetry buckets, if you're referring to a plugin used for exporting metrics, that would be a separate concern from the metric types we're defining here. However, I agree that specifying a smaller range for the min_rtt metric buckets is the right approach, especially since we're using seconds as the unit (to align with the open-source standard) while the underlying measurements are usually in microseconds.
A high-level approach to collecting TCP metrics (on Linux) is as follows: | ||
1) **Enable Network Timestamps for Metric Calculation:** Enable the `SO_TIMESTAMPING` option in the kernel's TCP stack through the `setsocketopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission. | ||
2) **Calculate Metrics from Timestamps:** Linux kernel calculates TCP connection metrics based on the captured packet timestamps. These metrics can be retrieved using the `getsockopt(TCP_INFO)` system call. For example, the delivery_rate metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow ([code](https://elixir.bootlin.com/linux/v5.11.1/source/net/ipv4/tcp.c#L391)). | ||
3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 5 minutes), gRPC aggregates the calculated metrics and updates the corresponding statistics records. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could be a bit more precise. I have 3 questions:
- Do you plan to have this interval be specified by users somehow? Or should it be fixed to 5 minutes? 5 minutes seems long to me for updating metrics, ideally it should be something close to the collection period (which iiuc is more typically in the 1 minute ballpark).
- How should the aggregation logic look like? I imagine you have something like this in mind:
- for counters, iterate over sockets and sum up packets_sent, packets_restransmitted of all open sockets.
- for histograms, for each socket, record the value (delivery_rate & min_rtt) in the histogram for the corresponding metric.
That's the only thing that really makes sense given the definition of the metrics above, so perhaps it's fine to leave it implicit.
- What happens to sockets that have been closed in between the interval? Do we just not collect statistics for those and loose the data? I'm not sure that there is an alternative that works, such as calling
getsockopts
and updating statistics just before closing the socket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in C++ the interval is not fixed, and the default value is 5 minutes in Fathom. For other language implementations, the interval can be adjusted as needed. @yousukseung for other questions of the Fathom implementation. Thanks.
For context, this high-level plan aims to provide a general understanding of the existing metric collection process in C++ (implemented through Fathom), while offering flexibility for adaptation in other languages. To maintain clarity and focus, implementation details have been omitted from this proposal and can be found in the Fathom documentation.
|
||
| Name | Type | Unit | Labels | Description | | ||
| ------------- | ----- | ----- | ------- | ----------- | | ||
| grpc.tcp.min_rtt | Histogram (double) | s | None | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.<br /> RTT = packet acked timestamp - packet sent timestamp. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nanahpang If you do not specify buckets, we will get the default OpenTelemetry buckets which is { 0, 5, 10, 25, 50, 75,100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000}. Are there better defaults that we can suggest or do we leave it to the user to figure it out?
A high-level approach to collecting TCP metrics (on Linux) is as follows: | ||
1) **Enable Network Timestamps for Metric Calculation:** Enable the `SO_TIMESTAMPING` option in the kernel's TCP stack through the `setsocketopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission. | ||
2) **Calculate Metrics from Timestamps:** Linux kernel calculates TCP connection metrics based on the captured packet timestamps. These metrics can be retrieved using the `getsockopt(TCP_INFO)` system call. For example, the delivery_rate metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow ([code](https://elixir.bootlin.com/linux/v5.11.1/source/net/ipv4/tcp.c#L391)). | ||
3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 5 minutes), gRPC aggregates the calculated metrics and updates the corresponding statistics records. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"At a specified time interval" means the interval will be configurable. How is it configured?
| ------------- | ----- | ----- | ------- | ----------- | | ||
| grpc.tcp.min_rtt | Histogram (double) | s | None | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.<br /> RTT = packet acked timestamp - packet sent timestamp. | | ||
| grpc.tcp.delivery_rate | Histogram (double) | bit/s | None | Records latest goodput measured of the TCP connection. <br /> Elapsed time = packet acked timestamp - last packet acked timestamp. <br /> Delivery rate = packet acked bytes / elapsed time. | | ||
| grpc.tcp.packets_sent | Counter (uint64) | {packet} | None | Records total packets TCP sends in the calculation period. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is a "calculation period"?
|
||
A high-level approach to collecting TCP metrics (on Linux) is as follows: | ||
1) **Enable Network Timestamps for Metric Calculation:** Enable the `SO_TIMESTAMPING` option in the kernel's TCP stack through the `setsocketopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission. | ||
2) **Calculate Metrics from Timestamps:** Linux kernel calculates TCP connection metrics based on the captured packet timestamps. These metrics can be retrieved using the `getsockopt(TCP_INFO)` system call. For example, the delivery_rate metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow ([code](https://elixir.bootlin.com/linux/v5.11.1/source/net/ipv4/tcp.c#L391)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still like to know what the mapping is for each metric. The first few are easy because the names here mirror tcp_info (I assume). But then it gets less obvious, and tcp_info isn't documented very well.
Add the proposal of gRPC TCP per-connection metrics.