Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add TLS support to gnet #435

Closed
wants to merge 58 commits into from
Closed

feat: Add TLS support to gnet #435

wants to merge 58 commits into from

Conversation

0-haha
Copy link
Contributor

@0-haha 0-haha commented Jan 27, 2023

1. Are you opening this pull request for bug-fixes, optimizations or new feature?

new feature

2. Please describe how these code changes achieve your intention.

This PR is to add the TLS support to get.

Change of the source code

  • The main TLS library is packaged in the directory pkg/tls/
  • internal/boring is the dummy package for boring TLS as it is required by the standard Golang TLS library
  • Other changes go to acceptor.go, connection.go, and eventloop.go. Basically, once the TLS enabled on the server side,
    it will call gnetConn.UpgradeTLS() to upgrade the protocol to TLS. Then, all reads and writes will go to the gnetConn.readTLS() and gnetConn.writeTLS()

The gnet TLS implementation

  • is developed based on https://github.com/luyu6056/tls.
  • merges the upstream go v1.20rc3 standard TLS library.
    • Since go 1.20 uses crypto/ecdh in crypto/tls/key_agreement.go, which is not available in go <= 1.19, go.mod is bumped up to 1.20.
  • adds the Kernel TLS support. So one can offload the encryption to the kernel.
    • The implementation of kernel TLS was based on @jim3m's implementation, which was based on @FiloSottile's implementation.
    • Kernel TLS is totally depending on the kernel version.
    • Kernel TLS Features
      • KTLS 1.2 TX & RX
      • KTLS 1.3 TX & RX
      • zerocopy and no pad for TLS 1.3
      • ciphersuites: AES-GCM-128, AES-GCM-256, CHACHA20POLY1305
    • Kernel TLS TODO
      • KTLS 1.3 RX disabled on kernel < 5.19 as it causes weird package lost
      • zero copy and no pad have not been tested yet. zero copy is enabled on kernel >= 5.19, and no pad is enabled on kernel >= 6.0.
      • sendfile api

Examples

An example of using gnet TLS can found at https://github.com/0-haha/gnet_tls_examples. You should be able to run the repo in docker.

Other Comments

I would say the majority of the implementation has been completed. Open to ongoing conversations.
中文可以聊。

3. Please link to the relevant issues (if any).

Fixes #16

4. Which documentation changes (if any) need to be made/updated because of this PR?

The gnet.Run command will accept the tls.Config like this:

cer, _ := tls.LoadX509KeyPair("server.crt", "server.key")
// server only uses TLS 1.2 and TLS 1.3
config := &tls.Config{
    MinVersion:   tls.VersionTLS12,
    Certificates: []tls.Certificate{cer},
}
gnet.Run(echo, "tcp://192.168.0.100:8000", gnet.WithTLS(config))

4. Checklist

  • [ x] I have squashed all insignificant commits.
  • [ x] I have commented my code for explaining package types, values, functions, and non-obvious lines.
  • I have written unit tests and verified that all tests passes (if needed).
  • I have documented feature info on the README (only when this PR is adding a new feature).
  • [ x] (optional) I am willing to help maintain this change if there are issues with it later.

0-haha added 17 commits January 25, 2023 23:37
2. change the gnet API name for the TLS server & client
3. gnet TLS write returns the exact number of bytes
    written to the socket rather than the lenght of data.
to MsgBuffer so that the tls conn not longer holds the actual buffer
when the connection is idle.

Other updates:
1. add defaultSize in MsgBuffer
2. fix the condition to clean up the buffer
     (i > blockSize to i >= blockSize)
1. The kernel TLS implementation is based on
     https://github.com/jim3ma/go.git
     branch: dev.ktls.1.16.3
2. Supports: TLS1.2 & TLS 1.3
3. Supported cipher suites:
     AES_128_GCM_SHA256
     AES_256_GCM_SHA384
     CHACHA20_POLY1305_SHA256
4. Server side has been tested and it works.
    Client side needs to be tested later
5. TODO: add sendfile(), TLS_TX_ZEROCOPY_RO (device offload),
    and TLS_RX_EXPECT_NO_PAD. (See
    https://docs.kernel.org/networking/tls.html#optional-optimizations)
    for details.
data should use the local declaration rather than re-declaring
in the if statement, which results len(data) is 0 on line 794,
resulting EOF.
=======================================
1. disable kTLS 1.3 RX on kernel 5.15
2. check zero copy on kernel 5.19
3. check tls 1.3 no pad on kernel 6.0
======================================
1. TLS writes the data into the socket directly rather than writing the
    data into the buffer. the data is buffered only if error unix.EAGAIN
    occurs.
2. Add "tlsEnabled bool" to control when to use tlsconn.Write(). The
    reason is that tlsconn.Write() encrypt the data, then calls
    gnetConn.Write() which could potently call either gnetConn.write()
    or gnetConn.writeTLS(). Therefore, we make "tlsEnabled" to false
    before calling tlsconn.Write(), and then restore "tlsEnabled" to
    true after that.
3. tlsconn.flush() calls gnetConn.Flush() to flush the buffer
    immediately. Therefore, we don't need to call gnetConn.Flush() in
    gnet TLS handshake phase as tlsconn.Handshake() calls
    gnetConn.Flush() implicitly.
@0-haha 0-haha marked this pull request as ready for review January 27, 2023 23:13
@panjf2000
Copy link
Owner

Thank you for implementing TLS and opening this PR.

This might cost me a lot of time to absorb and review the code, but I'll do this as fast as possible.

========================================
Redesign the buffer in gnet TLS implementation
to achieve zero-copy.

Background:
- tlsconn.rawInput: raw input from TCP to hold the TLS record
- tlsconn.input: buffer to hold decrypted TLS record
- tlsconn.hand: buffer to hold handshake data
- tlsconn.sendBuf: buffer to hold sending data

Problems:
- Memory copy in TLS read:
  In the previous implementation, tlsconn.input refers
  to the gnetConn.inboundBuffer. To decrypted, we copy
  el.buffer to tlsconn.rawInput. The TLS connection,
  write the decrypted data to tlsconn.input, which is
  gnetConn.inboundBuffer. When el.eventHandler.OnTraffic()
  is triggered, gnetConn.Next() and gnet.Conn.Peek()
  can trigger more data copy as it can write to c.loop.cache()
- Memory copy in TLS write:
  In the previous implementation, all encrypted data are
  first written to tlsconn.sendBuf, which refers to
  gnetConn.outboundBuffer. Then, tlsconn.Write() calls
  gnetConn.Write() which flushes the buffer to the socket

New implementation:
We designed LazyBuffer (lb) which has a buf []byte and its
reference ref *[]byte. In the lazy mode, lb.ref is always nil,
lb.buf is readonly. When calling lb.Write(), lb request a buffer
from the sync.Pool, and copies lb.buf to the new buffer.
Both lb.buf and lb.ref point to the new buffer.
- New TLS read:
  With LazyBuffer, we let tlsconn.rawInput refer to el.buffer.
  Decrypted data stores in tlsconn.rawInput as well. tlsconn.Data()
  returns the reference of all decrypted data, and will be
  assigned to gnetConn.buffer.
- New TLS write:
  tlsconn.Write() first encrypts the data, then calls
  gnetConn.WriteTCP() which directly writes the data
  to the socket.
- New TLS handshake:
  we restore the tlsconn.Buffering flag which is only used in
  the handshake. Incoming handshake data is stored
  in tlsconn.hand and will be discarded immediately
  after being used. Outgoing handshake data is buffered
  in tlsconn.sendBuf, and will be flushed after calling
  tlsconn.flush() which calls gnetConn.WriteTCP() which
  directly writes the data to the socket.
@0-haha
Copy link
Contributor Author

0-haha commented Jan 31, 2023

I optimized the memory copy and buffer usage for TLS read, write, and handshake. The following briefly describes the implementation idea which would be helpful to review the code

Buffers used in TLS

  • rawInput: stores the TLS record from TCP
  • input: stores the decrypted TLS record, and the memory is owned by rawInput
  • hand: stores the handshake data
  • sendBuf: stores the sending data, and it is only used during the handshake

TLS handshake (starts in eventloop.read()):

  1. Attach eventloop.buffer to tlsconn.rawInput (zero-copy)
  2. Call the tlsconn.handshake()
  3. If tlsconn.HandshakeComplete(), call eventloop.readTLS(); Otherwise, return and will restart at 1 in the next round

Note: In tlsconn.handshake(), it extracts the handshake message from tlsconn.rawInput and zero-copys the message to tls.hand.
tls.hand discards the messages immediately after using it.

TLS read:

  1. Attach eventloop.buffer to tlsconn.rawInput (zero-copy)
  2. Call eventloop.readTLS(). Since tlsconn.rawInput can holds multiple TLS records, we iteratively process all TLS records.
    for {
        Extract the TLS record from "tlsconn.rawInput"
        Decrypt the record into "tlsconn.rawInput"
        tlsconn.data = decrypted record (store the reference, zero-copy)
        gnetConn.buffer = tlsconn.Data() (return tlsconn.data, zero-copy)
        eventHandler.OnTraffic()
        c.inboundBuffer.Write(c.buffer)
        if no more TLS records in "tlsconn.rawInput" {
            discard the data in "tlsconn.rawInput" and "tlsconn.data"
            if there is data left in "tlsconn.rawInput", we cache it. so, the left data is not owned by "eventloop.buffer" which will be used by another "gnetConn"
        }
    }
    
    The data pipeline in each iteration is
    eventloop.buffer (TLS records from TCP) -> (zero-copy) -> tlsconn.rawInput -> (decrypt TLS records) -> tlsconn.rawInput -> (zero-copy) -> tlsconn.Data
    -> (zero-copy) -> gnetConn.buffer -> (used by eventHandler.OnTraffic()) -> (write rest into to) -> c.inboundBuffer
  • Kernel TLS read pipeline
    eventloop.buffer -> (attach to, zero-copy) -> tlsconn.rawInput
    ktlsReadRecord -> (decrypt) -> tlsconn.rawInput (referring to eventloop.buffer) -> (zero-copy) -> tlsconn.Data

    The rest follow the standard gnet TLS read.

TLS write:

The data pipeline is
data -> (encrypt) -> buffer from sync.Pool -> (call) -> gnetConn.writeTCP() -> (call) -> gnetConn.write() (standard gnet write function) -> return buffer tosync.Pool

  • Kernel TLS write
    call gnetConn.write(data). When gnetConn writes the data into the socket (call unix.Write()), kernel encrypts the data automatically.

0-haha and others added 3 commits February 5, 2023 10:48
Message marshalling makes use of BytesOrPanic a lot, under the
assumption that it will never panic. This assumption was incorrect, and
specifically crafted handshakes could trigger panics. Rather than just
surgically replacing the usages of BytesOrPanic in paths that could
panic, replace all usages of it with proper error returns in case there
are other ways of triggering panics which we didn't find.

In one specific case, the tree routed by expandLabel, we replace the
usage of BytesOrPanic, but retain a panic. This function already
explicitly panicked elsewhere, and returning an error from it becomes
rather painful because it requires changing a large number of APIs.
The marshalling is unlikely to ever panic, as the inputs are all either
fixed length, or already limited to the sizes required. If it were to
panic, it'd likely only be during development. A close inspection shows
no paths for a user to cause a panic currently.

This patches ends up being rather large, since it requires routing
errors back through functions which previously had no error returns.
Where possible I've tried to use helpers that reduce the verbosity
of frequently repeated stanzas, and to make the diffs as minimal as
possible.

Thanks to Marten Seemann for reporting this issue.

Updates #58001
Fixes #58359
Fixes CVE-2022-41724

Change-Id: Ieb55867ef0a3e1e867b33f09421932510cb58851
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/1679436
Reviewed-by: Julie Qiu <[email protected]>
TryBot-Result: Security TryBots <[email protected]>
Run-TryBot: Roland Shoemaker <[email protected]>
Reviewed-by: Damien Neil <[email protected]>
(cherry picked from commit 1d4e6ca9454f6cf81d30c5361146fb5988f1b5f6)
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/1728205
Reviewed-by: Tatiana Bradley <[email protected]>
Reviewed-on: https://go-review.googlesource.com/c/go/+/468121
Reviewed-by: Than McIntosh <[email protected]>
Auto-Submit: Michael Pratt <[email protected]>
TryBot-Bypass: Michael Pratt <[email protected]>
Run-TryBot: Michael Pratt <[email protected]>
@panjf2000 panjf2000 force-pushed the dev branch 3 times, most recently from fa1dc24 to 5d1cf9e Compare March 10, 2024 04:32
Copy link

This PR is marked as stale because it has been open for 21 days with no activity.

You should take one of the following actions:

  • Manually close this PR if it is no longer relevant
  • Push new commits or comment if you have more information to share

This PR will be automatically closed in 7 days if no further activity occurs.

@github-actions github-actions bot added the stale label Oct 13, 2024
@github-actions github-actions bot removed the stale label Oct 15, 2024
Copy link

github-actions bot commented Nov 5, 2024

This PR is marked as stale because it has been open for 21 days with no activity.

You should take one of the following actions:

  • Manually close this PR if it is no longer relevant
  • Push new commits or comment if you have more information to share

This PR will be automatically closed in 7 days if no further activity occurs.

@github-actions github-actions bot added the stale label Nov 5, 2024
@github-actions github-actions bot removed the stale label Nov 6, 2024
Copy link

This PR is marked as stale because it has been open for 21 days with no activity.

You should take one of the following actions:

  • Manually close this PR if it is no longer relevant
  • Push new commits or comment if you have more information to share

This PR will be automatically closed in 7 days if no further activity occurs.

@github-actions github-actions bot added the stale label Nov 27, 2024
@ruimya
Copy link

ruimya commented Nov 29, 2024

@panjf2000 hope to further test this library and make gnet officially support tls as soon as possible.

maybe this library gnet-tls-go1-20 should be merged into gnet-io?

@github-actions github-actions bot removed the stale label Nov 30, 2024
@panjf2000
Copy link
Owner

@panjf2000 hope to further test this library and make gnet officially support tls as soon as possible.

maybe this library gnet-tls-go1-20 should be merged into gnet-io?

The owner hasn't responded to my code review for a long time, and this PR has been falling the GitHub CI. Therefore, I've decided to close this PR, but for anyone who wants to implement TLS on gnet, legit PRs are always welcome!

@panjf2000 panjf2000 closed this Nov 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file new feature optimization Some small optimizations pending development Requested PR owner to improve code and waiting for the result working on it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TLS support?
6 participants