OpenOnload-201805
=================
This is a major feature release, adding support for the new X2
adapters from Solarflare, extensions to TCPDirect, and new Onload
scalable filter capabilities.
The support for X2 adapters includes the new cut-through PIO (CTPIO)
mechanism, which reduces latency on the send path.
The notes below describe some of the new features, including CTPIO and
how it is used with OpenOnload, TCPDirect and ef_vi.
See the Onload user guide for full details of new features and associated
configuration options.
See the accompanying ChangeLog for a list of bugs fixed.
Linux distribution support
--------------------------
This package is supported on:
- Red Hat Enterprise Linux 6.7 - 6.9
- Red Hat Enterprise Linux 7.1 - 7.5
- SuSE Linux Enterprise Server 11 sp4
- SuSE Linux Enterprise Server 12 sp2 and sp3
- Canonical Ubuntu Server LTS 16.04, 18.04
- Canonical Ubuntu Server 17.10
- Debian 8 "Jessie"
- Debian 9 "Stretch"
- Linux kernels 3.0 - 4.16.6
CTPIO
-----
CTPIO improves send latency by transferring packets from the PCIe bus to
the network port with minimal delay. It can be used in three modes:
1) Cut-through: The frame is transmitted onto the network as it is
streamed across the PCIe bus. This mode offers the best latency.
2) Store-and-forward: The frame is buffered on the adapter before being
transmitted onto the network.
3) Store-and-forward with poison disabled: As for (2), except that it is
guaranteed that frames are never poisoned. When this mode is enabled
on any VI, all VIs are placed into store-and-forward mode.
Underrun, poisoning and fallback:
When using cut-through mode, if the frame is not streamed to the adapter
at line rate or faster, then the frame is likely to be poisoned. This is
most likely to happen if the application thread is interrupted while
writing the frame to the adapter. In the underrun case, the frame is
terminated with an invalid FCS -- this is referred to as "poisoning" --
and so will be discarded by the link partner. Cut-through mode is
currently expected to perform well only on 10G links.
CTPIO may be aborted for other reasons, including timeout while writing a
frame, contention between threads using CTPIO at the same time, and the
CPU writing data to the adapter in the wrong order.
In all of the above failure cases the adapter falls back to sending via
the traditional DMA mechanism, which incurs a latency penalty. A valid
copy of the packet is therefore always transmitted, whether the CTPIO
operation succeeds or not.
Normally only an underrun in cut-through mode will result in a poisoned
frame being transmitted. In rare cases it is also possible for a poisoned
frame to be emitted in store-and-forward mode. If it is necessary to
strictly prevent poisoned packets from reaching the network, then
poisoning can be disabled globally.
CTPIO with OpenOnload
---------------------
When using a suitable adapter, CTPIO is enabled by default in "sf-np"
mode. The operation of CTPIO can be controlled via the following
Onload configuration variables:
EF_CTPIO: 0 (disable)
          1 (enable; default)
          2 (enable, fail if not available)

EF_CTPIO_MODE: ct    (cut-through mode)
               sf    (store-and-forward mode)
               sf-np (store-and-forward with no poisoning)

EF_CTPIO_MAX_FRAME_LEN: CTPIO is only used for packets whose length does
                        not exceed this value. Packets exceeding this
                        length are sent using another mechanism (DMA or
                        PIO).
CTPIO bypasses the main adapter datapath, and as a result packets sent
by CTPIO cannot be looped back in hardware. Consequently, CTPIO is not
enabled by default on interfaces running full-featured firmware. This
behaviour can be overridden using the EF_CTPIO_SWITCH_BYPASS
configuration variable.
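For example, CTPIO can be forced into cut-through mode, failing if it is
unavailable, by setting the variables when launching a process under
Onload (the application name here is a placeholder):

  EF_CTPIO=2 EF_CTPIO_MODE=ct onload ./trading_app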
CTPIO with TCPDirect
--------------------
When using a suitable adapter, CTPIO is enabled by default in "sf-np"
mode. The operation of CTPIO can be controlled via the following
TCPDirect attributes:
ctpio: 0 (disable)
       1 (enable; default)
       2 (enable, warn if not available)
       3 (enable, fail if not available)

ctpio_mode: ct    (cut-through mode)
            sf    (store-and-forward mode)
            sf-np (store-and-forward with no poisoning)
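The attributes can also be set programmatically. The following is a
minimal sketch, assuming the zf_attr_alloc(), zf_attr_set_int() and
zf_attr_set_str() accessors from the TCPDirect attribute API, and that
zf_init() has already been called; real code should check every return
code:

  #include <zf/zf.h>

  /* Request CTPIO in store-and-forward mode, warning if it is not
   * available, before allocating a stack. */
  static int make_ctpio_attr(struct zf_attr** attr_out)
  {
    struct zf_attr* attr;
    if( zf_attr_alloc(&attr) < 0 )
      return -1;
    zf_attr_set_int(attr, "ctpio", 2);          /* warn if unavailable */
    zf_attr_set_str(attr, "ctpio_mode", "sf");  /* store-and-forward   */
    *attr_out = attr;         /* pass to zf_stack_alloc() and friends  */
    return 0;
  }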
CTPIO with ef_vi
----------------
A complete example showing the use of CTPIO with ef_vi is given in
openonload/src/tests/rtt/rtt_efvi.c.
N.B. CTPIO bypasses the main adapter datapath, and as a result does not
support checksum offload: Frames sent with CTPIO are placed on the wire
without modification, so checksum fields must be completed in software before
sending. For the same reason, frames sent with CTPIO cannot be looped
back by the adapter.
Here is the sequence of steps needed (a brief sketch follows the list):
1) When allocating a VI (ef_vi_alloc_from_pd()) set the EF_VI_TX_CTPIO
flag.
2) To initiate a send, form a complete Ethernet frame (including L3/L4
checksums, but excluding FCS) in host memory. Initiate the send with
ef_vi_transmit_ctpio() or ef_vi_transmitv_ctpio().
3) Post a fall-back descriptor using ef_vi_transmit_ctpio_fallback() or
ef_vi_transmitv_ctpio_fallback(). These calls are used just like the
standard DMA send calls (ef_vi_transmit() etc.), and so must be
provided with a copy of the frame in registered memory.
The posting of a fall-back descriptor is not on the latency critical
path, provided the CTPIO operation succeeds. However, it must be
posted before posting any further sends on the same VI.
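The following sketch pulls these steps together using the calls above.
The variable names are illustrative; it assumes the VI was allocated
with EF_VI_TX_CTPIO and that a copy of the frame is already present in
registered memory at frame_dma_addr:

  #include <etherfabric/ef_vi.h>

  /* CTPIO send with DMA fall-back.  'frame' is a complete Ethernet
   * frame with L3/L4 checksums filled in and no FCS. */
  static void ctpio_send(ef_vi* vi, const void* frame, size_t frame_len,
                         ef_addr frame_dma_addr, ef_request_id dma_id,
                         unsigned ct_threshold)
  {
    /* Stream the frame to the adapter via CTPIO. */
    ef_vi_transmit_ctpio(vi, frame, frame_len, ct_threshold);

    /* Post the fall-back descriptor.  This is off the critical path
     * when CTPIO succeeds, but must be done before posting any further
     * sends on this VI.  Real code should check the return code, which
     * indicates failure when the TX ring is full. */
    ef_vi_transmit_ctpio_fallback(vi, frame_dma_addr, frame_len, dma_id);
  }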
CTPIO diagnostics
-----------------
The adapter maintains counters that show whether CTPIO is being used, and
any reasons for CTPIO sends failing. These can be inspected as follows:
ethtool -S ethX | grep ctpio
Note that some of these counters are maintained on a per-adapter basis,
whereas others are per network port.
Delegated sends in TCPDirect: technology preview
------------------------------------------------
This release adds new API functions to TCPDirect that allow the user to
have TCPDirect handle the TCP state machine while performing critical
sends through another mechanism (such as ef_vi) to achieve lower
latency. This is analogous both in principle and in the arrangement of
the API to the delegated sends feature of Onload. The new API is
offered as a technology preview: it is functionally complete but has
had limited testing, and the API may change if we find problems.
Customers are invited to experiment with the feature and to direct
feedback to support@solarflare.com.
Zockets are created through the TCPDirect API as normal. The user can
then request that TCPDirect delegate sending to their application.
Once the application has completed a send through (for example) ef_vi,
it updates TCPDirect and TCPDirect will handle the TCP state machinery,
retransmissions, and so on.
There is an example application at
<onload_install_dir>/src/tests/trade_sim/trader_tcpdirect_ds_efvi.c
Please see the README file in the same directory for further details.
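As an outline of the sequence described above, here is a minimal
sketch. It assumes the zf_delegated_send_prepare(),
zf_delegated_send_tcp_update() and zf_delegated_send_complete() calls
from the preview API and an illustrative header-buffer size; the shipped
example application remains the definitive reference:

  #include <sys/uio.h>
  #include <zf/zf.h>

  /* Delegate a send on TCP zocket 'ts', transmit out-of-band (e.g. via
   * ef_vi), then hand the bytes back to TCPDirect. */
  static int delegated_send(struct zft* ts, const void* payload, int len)
  {
    char hdr_buf[128];                            /* illustrative size */
    struct zf_ds ds = { .headers = hdr_buf,
                        .headers_size = sizeof(hdr_buf) };

    /* Reserve send window for delegated sends; TCPDirect fills in
     * prototype packet headers. */
    if( zf_delegated_send_prepare(ts, len, 0, 0, &ds) !=
        ZF_DELEGATED_SEND_RC_OK )
      return -1;

    /* Patch the headers for a packet of 'len' bytes with PSH set. */
    zf_delegated_send_tcp_update(&ds, len, 1);

    /* ... transmit ds.headers followed by 'payload' via ef_vi here ... */

    /* Tell TCPDirect what was sent so that it can generate ACKs and
     * handle retransmissions. */
    struct iovec iov = { .iov_base = (void*) payload, .iov_len = len };
    return zf_delegated_send_complete(ts, &iov, 1, 0);
  }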
The delegated sends API should be used carefully: there are windows of
time (while the send has been delegated to the application) during which
the TCP sequence numbers used by TCPDirect (for ACKs) and by the
application (for messages) differ. Delegated sends have been implemented
in a way that should make this harmless, but it is important to ensure
the API is used correctly, as incorrect use has the potential to confuse
other TCP implementations.
Onload scalable filters
-----------------------
A major overhaul to clustering and scalable filters enables new data
center use cases.
Scalable filters direct all traffic for a given VI to Onload so that
the filter table size is not a constraint on the number of concurrent
connections supported efficiently. This release allows scalable filters
to be combined for both passive and active open connections and with RSS,
enabling very high transaction rates for use cases such as a reverse web
proxy.
The extended feature requires modern CPUs that support the CLMUL
instruction.
See the user guide for details of how to work with the new scalable
filter modes using new and existing Onload configuration options,
including:
EF_SCALABLE_FILTERS: any=rss:active:passive
This option has been extended to support active-open sockets, and to
support multiple interfaces.
EF_SCALABLE_LISTEN_MODE: 1
This option avoids the performance limitations of kernel listening
sockets when there is a very large number of listening sockets.
The most effective way to use scalable filters is with a dedicated
VI created with a MACVLAN. The new inject_kernel_gid module option
controls the injection of packets not handled by Onload back into the
kernel when the VI is instead shared with other functions.
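As an illustration, a reverse proxy (the binary name is a placeholder)
might combine these options as follows:

  EF_SCALABLE_FILTERS="any=rss:active:passive" EF_SCALABLE_LISTEN_MODE=1 \
      onload ./reverse_proxy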
NGINX application support
-------------------------
Onload's clustering capability can support the NGINX application's
master process pattern and hot restart operation by correctly
associating Onload stacks and application worker processes, e.g.:
EF_SCALABLE_FILTERS_ENABLE: 2
EF_CLUSTER_HOT_RESTART: 1
See the user guide for full details. The Onload runtime profile
"nginx_reverse_proxy.opf" has an example set of relevant configurations.
Use of this mode is not compatible with use of the onload extensions
stackname API.
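For example, assuming the profile is installed where Onload can find it,
NGINX might be launched as:

  onload --profile=nginx_reverse_proxy nginx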
Improvements to socket caching and EF_TCP_SHARED_LOCAL_PORTS
------------------------------------------------------------
Applications such as web proxies that create and close large numbers of
sockets can benefit significantly from Onload's reduced overhead
compared to the kernel in the handling of the socket lifecycle. With
these use cases in mind, some configuration options have been added to
the EF_TCP_SHARED_LOCAL_PORTS feature:
EF_TCP_SHARED_LOCAL_PORTS_REUSE_FAST allows the pool of shared local
ports to be recycled more rapidly, and EF_TCP_SHARED_LOCAL_PORTS_PER_IP
enables shared local ports to be reserved dynamically for the IP
addresses that require them, avoiding exhaustion of the ephemeral port
range.
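For example (the pool size and binary name are illustrative):

  EF_TCP_SHARED_LOCAL_PORTS=32768 EF_TCP_SHARED_LOCAL_PORTS_REUSE_FAST=1 \
      EF_TCP_SHARED_LOCAL_PORTS_PER_IP=1 onload ./web_proxy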
Similarly, socket caching has been extended to improve the support for
such applications. Applications using many listening sockets with
scalable filters can now use a common cache of sockets accepted from
them, improving utilization of the cache. Also, when a listening
socket is used by multiple processes simultaneously, file descriptors
can now be cached per-process, whereas before, accepted sockets were
cacheable only in the process that originated them. This is of
particular benefit to server applications such as NGINX that support
dynamic reconfiguration by spawning a new process reusing existing
listening sockets. This feature is not compatible with sockets that
require O_CLOEXEC.
The default value for EF_PER_SOCKET_CACHE_MAX has changed from 0 to -1,
but the default behaviour is unchanged: there is still no
per-listen-socket limit to the number of cached accepted sockets. A
value of 0 now means that no accepted sockets will be cached for any
listening sockets; this allows active-open socket caching to be enabled
without also enabling passive-open socket caching.
Please see the User Guide for full details of these features.
Changes to EF_UL_EPOLL=3
------------------------
EF_UL_EPOLL=3 allows epoll_wait() to perform well on epoll sets that
contain a large number of accelerated sockets. Previously, each socket
could gain this scalability benefit in at most one epoll set at a time.
Now, this is possible in up to four epoll sets for any given socket,
provided that each epoll set is in a different process.
Other new Onload configuration options
--------------------------------------
In addition to those already mentioned, this release of Onload
introduces the following new environment variables:
- EF_KERNEL_PACKETS_BATCH_SIZE
- EF_KERNEL_PACKETS_TIMER_USEC
These options control how often packets destined for the kernel are
forwarded there when kernel packet injection is enabled using the
inject_kernel_gid module option.
- EF_PERIODIC_TIMER_CPU
Specifies CPU affinity for periodic kernel tasks.
- EF_PREALLOC_PACKETS
Allocates the maximum number of packet buffers when the stack is
created.
- EF_SCALABLE_ACTIVE_WILDS_NEED_FILTER
Normally this option is set automatically according to the
scalable-filter mode, but it may be used manually to override
whether TCP shared local ports use IP hardware filters.
- EF_SLEEP_SPIN_USEC
Causes threads spinning in epoll_wait() to sleep periodically after
the initial spin timeout has elapsed.
- EF_SYNC_CPLANE_AT_CREATE
Controls how Onload queries network interface configuration when
creating stacks.
- EF_TCP_MIN_CWND
Sets the minimum size of the congestion window for TCP connections.
- EF_TCP_SHARED_LOCAL_PORTS_NO_FALLBACK
Causes connect() to fail for TCP sockets if no shared local ports
are available.
Finally, the default value, but not the default behaviour, of the
EF_PER_SOCKET_CACHE_MAX variable has changed: please see the section above
on socket caching.
Please see the User Guide for more details.
32-bit kernels
--------------
This release drops support for 32-bit kernels. 32-bit userspace
applications are still supported.
CONFIG_KALLSYMS
---------------
The CONFIG_KALLSYMS kernel configuration option is required for Onload to
operate correctly. The dependency of Onload on this option is not new in
this release, but very recent kernels -- namely, those containing the
commit 21d375b6b34ff511a507de27bf316b3dde6938d9 that removes the x86
syscall-dispatch fast path -- will be unable to load the Onload driver
when CONFIG_KALLSYMS is not defined. All of the Linux distributions
listed at the start of this file set this option in their default
kernel configurations and so are not affected.
Monitoring tools
----------------
The 'orm_json' tool for monitoring Onload stacks and reporting statistics
has been improved to include more comprehensive statistics and further
details of Onload stacks' status. To ease integration with standard
performance monitoring tools, the new version features an additional, more
machine-friendly output format, as well as the facility to generate
metadata for the reported statistics.