feat(caddy): persist replay cache across config reloads (#212)
* Create a new config format so we can expand listener configuration for proxy protocol.

* Remove unused `fakeAddr`.

* Split `startPort` up between TCP and UDP.

* Use listeners to configure TCP and/or UDP services as needed.

* Remove commented out line.

* Use `ElementsMatch` to compare the services irrespective of element ordering.

* Do not ignore the `keys` field if `services` is used as well.

* Add some more tests for failure scenarios and empty files.

* Remove unused `GetPort()`.

* Move `ResolveAddr` to config.go.

* Remove use of `net.Addr` type.

* Pull listener creation into its own function.

* Move listener validation/creation to `config.go`.

* Use a custom type for listener type.

* Fix accept handler.

* Add doc comment.

* Fix tests still supplying the port.

* Move old config parsing to `loadConfig`.

* Lowercase `readConfig`.

* Use `Config` suffix for config types.

* Remove the IP version specifiers from the `newListener` config handling.

* refactor: remove use of port in probe metric

* Fix tests.

* Add a TODO comment to allow short-form direct listener config.

* Make legacy key config name consistent with type.

* Move config validation out of the `loadConfig` function.

* Remove unused port from bad merge.

* Add comment describing keys.

* Move validation of listeners to config's `Validate()` function.

* Introduce a `NetworkAddr` to centralize parsing and creation of listeners.

* Use `net.ListenConfig` to listen.

* Simplify how we create new listeners.

This does not yet deal with reused sockets.

* Do not use `io.Closer`.

* Use an inline error check.

* Use shared listeners and packet connections.

This allows us to reload a config while the existing one is still
running. They share the same underlying listener, which is actually
closed when the last user closes it.

* Close existing listeners once the new ones are serving.

* Elevate failure to stop listeners to `ERROR` level.

* Be more lenient in config validation to allow empty listeners or keys.

* Ensure the address is an IP address.

* Use `yaml.v3`.

* Move file reading back to `main.go`.

* Do not embed the `net.Listener` type.

* Use a `Service` object to abstract away some of the complex logic of managing listeners.

* Fix how we deal with legacy services.

* Remove commented out lines.

* Use `tcp` and `udp` types for direct listeners.

* Use a `ListenerManager` instead of globals to manage listener state.

* Add validation check that no two services have the same listener.

* Use channels to notify shared listeners they need to stop accepting.

* Pass TCP timeout to service.

* Move goroutine call up.

* Allow inserting single elements directly into the cipher list.

* Add the concept of a listener set to track existing listeners and close them all.

* Refactor how we create listeners.

We introduce shared listeners that allow us to keep an old config
running while we set up a new config. This is done by keeping track of
the usage of the listeners and only closing them when the last user is
done with the shared listener.

* Update comments.

* `go mod tidy`.

* refactor: don't link the TCP handler to a specific listener

* Protect new cipher handling methods with mutex.

* Move `listeners.go` under `/service`.

* Use callback instead of passing in key and manager.

* Move config start into a goroutine for easier cleanup.

* Make a `StreamListener` type.

* Rename `closeFunc` to `onCloseFunc`.

* Rename `globalListener`.

* Don't track usage in the shared listeners.

* Add `getAddr()` to avoid some duplicate code.

* Move listener set creation out of the inner function.

* Remove `PushBack()` from `CipherList`.

* Move listener set to `main.go`.

* Close the accept channel with an atomic value.

* Update comment.

* Address review comments.

* Close before deleting key.

* `server.Stop()` does not return a value

* Add a comment for `StreamListener`.

* Do not delete the listener from the manager until the last user has closed it.

* Consolidate usage counting inside a `listenAddress` type.

* Remove `atomic.Value`.

* Add some missing comments.

* address review comments

* Add type guard for `sharedListener`.

* Stop the existing config in a goroutine.

* Add a TODO to wait for all handlers to be stopped.

* Run `stopConfig` in a goroutine in `Stop()` as well.

* Create a `TCPListener` that implements a `StreamListener`.

* Track close functions instead of the entire listener, which is not needed.

* Delegate usage tracking to a reference counter.

* Remove the `Get()` method from `refCount`.

* Return immediately.

* Rename `shared` to `virtual` as they are not actually shared.

* Simplify `listenAddr`.

* Fix use of the ref count.

* Add simple test case for early closing of stream listener.

* Add tests for creating stream listeners.

* Create handlers on demand.

* Refactor create methods.

* Address review comments.

* Use a mutex to ensure another user doesn't acquire a new closer while we're closing it.

* Move mutex up.

* Manage the ref counting next to the listener creation.

* Do the lazy initialization inside an anonymous function.

* Fix concurrent access to `acceptCh` and `closeCh`.

* Use `/` in key instead of `-`.

* Return error from stopping listeners.

* Use channels to ensure `virtualPacketConn`s get closed.

* Add more test cases for packet listeners.

* Only log errors from stopping old configs.

* Remove the `closed` field from the virtual listeners.

* Remove the `RefCount`.

* Implement channel-based packet read for virtual connections.

* Use a done channel.

* Set listeners and `onCloseFunc`'s to nil when closing.

* Set `onCloseFunc`'s to nil when closing.

* Fix race condition.

* Add some benchmarks for listener manager.

* Add structured logging with `slog`.

* Structure forgotten log.

* Another forgotten log.

* Move IPInfo logic from TCP and UDP handling into the metrics collector.

* Refactor metrics into separate collectors.

* Rename some types to remove `Collector` suffix.

* Use an LRU cache to manage the ipInfos for Prometheus metrics.

* Use `nil` instead of `context.TODO()`.

* Use `LogAttrs` for `debug...()` log functions.

* Update logging in `metrics.go`.

* Fix another race condition.

* Revert renaming.

* Replace LRU cache with a simpler map that expires unused items.

* Move `SetBuildInfo()` call up.

* refactor: change `outlineMetrics` to implement the `prometheus.Collector` interface

* Address review comments.

* Refactor collectors so the connections/associations keep track of the connection metrics.

* Address review comments.

* Make metrics interfaces for bytes consistently use `int64`.

* Add license header.

* Support multi-module workspaces so we can develop Caddy and ss-server at the same time.

* Rename `Collector` to `Metrics`.

* Move service creation into the service package so it can be re-used by Caddy.

* Ignore custom Caddy binary.

* refactor: create re-usable service that can be re-used by Caddy

* Remove need to return errors in opt functions.

* Move the service into `shadowsocks.go`.

* Add Caddy module with app and handler.

* Refactor metrics to not share with Caddy.

* Set Prometheus metrics handler.

* Catch already registered collectors instead of using `sync.Once`.

* refactor: pass in logger to service so caller can control logs

* Fix test.

* Add `--watch` flag to README.

* Remove changes moved to another PR.

* Remove arguments from `Logger()`.

* Use `slog` instead of `zap`.

* Log error in `Provision()` instead of `defineMetrics()`.

* Do not panic on bad metrics registrations.

* Check if the cast to `OutlineApp` is ok.

* Remove `version` from the config.

* Use `outline_` prefix for Caddy metrics.

* Remove unused `NatTimeoutSec` config option.

* Move initialization of handlers to the constructor.

* Pass a `list.List` instead of a `CipherList`.

* Rename `SSServer` to `OutlineServer`.

* refactor: make connection metrics optional

* Make setting the logger a setter function.

* Revert "Pass a `list.List` instead of a `CipherList`."

This reverts commit 1259af8.

* Create noop metrics if nil.

* Revert some more changes.

* Use a noop metrics struct if no metrics provided.

* Add noop implementation for `ShadowsocksConnMetrics`.

* Move logger arg.

* Resolve nil metrics.

* Set logger explicitly to `noopLogger` in service creation.

* Address review comments.

* Set `noopLogger` in `NewShadowsocksStreamAuthenticator()` if nil.

* Fix logger reference.

* Add TODO comment to persist replay cache.

* Remove use of zap.

* feat: persist replay cache across config reloads

* Update comment.

* Fix bad merge and don't use a global.

* Address review comments.

* Update tests.

* Add more context to error message.
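The shared-listener bullets above describe the core reload trick: the old config keeps serving on the same socket while the new config starts, and the underlying listener is only closed when its last user releases it. Below is a minimal sketch of that reference-counting idea; `refCount` and `onLastClose` here are illustrative stand-ins, not the repository's exact types.

```go
package main

import (
	"fmt"
	"sync"
)

// refCount tracks how many users share an underlying resource (e.g. a
// listening socket) and invokes onLastClose exactly once, when the final
// user calls Release.
type refCount struct {
	mu          sync.Mutex
	count       int
	onLastClose func()
}

func (r *refCount) Acquire() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.count++
}

func (r *refCount) Release() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.count--
	if r.count == 0 && r.onLastClose != nil {
		r.onLastClose()
	}
}

func main() {
	closed := false
	rc := &refCount{onLastClose: func() { closed = true }}

	rc.Acquire() // old config's listener
	rc.Acquire() // new config reuses the same address

	rc.Release() // old config stops once the new one is serving
	fmt.Println("closed after first release:", closed) // false

	rc.Release() // last user done: the underlying socket is actually closed
	fmt.Println("closed after last release:", closed) // true
}
```

This is why a reload never drops connections: both configs hold a reference to the same address, and only the final `Release` tears the socket down.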
sbruens authored Oct 7, 2024
1 parent 3dfecd8 commit 3c24817
Showing 4 changed files with 106 additions and 6 deletions.
15 changes: 11 additions & 4 deletions caddy/app.go
```diff
@@ -19,6 +19,7 @@ package caddy

 import (
 	"errors"
+	"fmt"
 	"log/slog"

 	outline_prometheus "github.com/Jigsaw-Code/outline-ss-server/prometheus"
@@ -30,9 +31,14 @@ import (
 const outlineModuleName = "outline"

 func init() {
+	replayCache := outline.NewReplayCache(0)
 	caddy.RegisterModule(ModuleRegistration{
-		ID:  outlineModuleName,
-		New: func() caddy.Module { return new(OutlineApp) },
+		ID: outlineModuleName,
+		New: func() caddy.Module {
+			app := new(OutlineApp)
+			app.ReplayCache = replayCache
+			return app
+		},
 	})
 }
@@ -65,8 +71,9 @@ func (app *OutlineApp) Provision(ctx caddy.Context) error {
 	app.logger.Info("provisioning app instance")

 	if app.ShadowsocksConfig != nil {
-		// TODO: Persist replay cache across config reloads.
-		app.ReplayCache = outline.NewReplayCache(app.ShadowsocksConfig.ReplayHistory)
+		if err := app.ReplayCache.Resize(app.ShadowsocksConfig.ReplayHistory); err != nil {
+			return fmt.Errorf("failed to configure replay history with capacity %d: %v", app.ShadowsocksConfig.ReplayHistory, err)
+		}
 	}

 	if err := app.defineMetrics(); err != nil {
```
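The `app.go` change is the heart of the fix: the `ReplayCache` is created once when the module is registered and captured by the `New` closure, so every `OutlineApp` instance created on a config reload shares the same cache, and `Provision` only resizes it instead of replacing it. A stripped-down sketch of the pattern, using simplified stand-in types (`replayCache` and `app` here are illustrative, not the real ones):

```go
package main

import "fmt"

// replayCache is a toy model of the replay cache: it remembers which
// handshake salts it has seen.
type replayCache struct{ seen map[string]bool }

// Add reports whether the salt is new; a repeated salt is a replay.
func (c *replayCache) Add(salt string) bool {
	if c.seen[salt] {
		return false
	}
	c.seen[salt] = true
	return true
}

type app struct{ cache *replayCache }

func main() {
	cache := &replayCache{seen: map[string]bool{}} // created once, at init()
	newApp := func() *app { return &app{cache: cache} }

	a1 := newApp() // first config load
	a1.cache.Add("salt-1")

	a2 := newApp() // config reload: fresh app instance, same cache
	fmt.Println("replay detected:", !a2.cache.Add("salt-1")) // true
}
```

Had the cache been built inside `Provision` (as the removed TODO noted), every reload would have started with an empty cache and briefly re-opened the replay window.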
2 changes: 1 addition & 1 deletion caddy/shadowsocks_handler.go
```diff
@@ -31,7 +31,7 @@ const ssModuleName = "layer4.handlers.shadowsocks"

 func init() {
 	caddy.RegisterModule(ModuleRegistration{
-		ID:  ssModuleName,
+		ID: ssModuleName,
 		New: func() caddy.Module { return new(ShadowsocksHandler) },
 	})
 }
```
17 changes: 16 additions & 1 deletion service/replay.go
```diff
@@ -16,6 +16,7 @@ package service

 import (
 	"encoding/binary"
+	"errors"
 	"sync"
 )
@@ -92,11 +93,25 @@ func (c *ReplayCache) Add(id string, salt []byte) bool {
 		return false
 	}
 	_, inArchive := c.archive[hash]
-	if len(c.active) == c.capacity {
+	if len(c.active) >= c.capacity {
 		// Discard the archive and move active to archive.
 		c.archive = c.active
 		c.active = make(map[uint32]empty, c.capacity)
 	}
 	c.active[hash] = empty{}
 	return !inArchive
 }
+
+// Resize adjusts the capacity of the ReplayCache.
+func (c *ReplayCache) Resize(capacity int) error {
+	if capacity > MaxCapacity {
+		return errors.New("ReplayCache capacity would result in too many false positives")
+	}
+	c.mutex.Lock()
+	defer c.mutex.Unlock()
+	c.capacity = capacity
+	// NOTE: The active handshakes and archive lists are not explicitly shrunk.
+	// Their sizes will naturally adjust as new handshakes are added and the cache
+	// adheres to the updated capacity.
+	return nil
+}
```
78 changes: 78 additions & 0 deletions service/replay_test.go
```diff
@@ -17,6 +17,9 @@ package service
 import (
 	"encoding/binary"
 	"testing"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
 )

 const keyID = "the key"
@@ -91,6 +94,81 @@ func TestReplayCache_Archive(t *testing.T) {
 	}
 }

+func TestReplayCache_Resize(t *testing.T) {
+	t.Run("Smaller resizes active and archive maps", func(t *testing.T) {
+		salts := makeSalts(10)
+		cache := NewReplayCache(5)
+		for _, s := range salts {
+			cache.Add(keyID, s)
+		}
+
+		err := cache.Resize(3)
+
+		require.NoError(t, err)
+		assert.Equal(t, cache.capacity, 3, "Expected capacity to be updated")
+
+		// Adding a new salt should trigger a shrinking of the active map as it hits the new
+		// capacity immediately.
+		cache.Add(keyID, salts[0])
+		assert.Len(t, cache.active, 1, "Expected active handshakes length to have shrunk")
+		assert.Len(t, cache.archive, 5, "Expected archive handshakes length to not have shrunk")
+
+		// Adding more new salts should eventually trigger a shrinking of the archive map as well,
+		// when the shrunken active map gets moved to the archive.
+		for _, s := range salts {
+			cache.Add(keyID, s)
+		}
+		assert.Len(t, cache.archive, 3, "Expected archive handshakes length to have shrunk")
+	})
+
+	t.Run("Larger resizes active and archive maps", func(t *testing.T) {
+		salts := makeSalts(10)
+		cache := NewReplayCache(5)
+		for _, s := range salts {
+			cache.Add(keyID, s)
+		}
+
+		err := cache.Resize(10)
+
+		require.NoError(t, err)
+		assert.Equal(t, cache.capacity, 10, "Expected capacity to be updated")
+		assert.Len(t, cache.active, 5, "Expected active handshakes length not to have changed")
+		assert.Len(t, cache.archive, 5, "Expected archive handshakes length not to have changed")
+	})
+
+	t.Run("Still detect salts", func(t *testing.T) {
+		salts := makeSalts(10)
+		cache := NewReplayCache(5)
+		for _, s := range salts {
+			cache.Add(keyID, s)
+		}
+
+		cache.Resize(10)
+
+		for _, s := range salts {
+			if cache.Add(keyID, s) {
+				t.Error("Should still be able to detect the salts after resizing")
+			}
+		}
+
+		cache.Resize(3)
+
+		for _, s := range salts {
+			if cache.Add(keyID, s) {
+				t.Error("Should still be able to detect the salts after resizing")
+			}
+		}
+	})
+
+	t.Run("Exceeding maximum capacity", func(t *testing.T) {
+		cache := &ReplayCache{}
+
+		err := cache.Resize(MaxCapacity + 1)
+
+		require.Error(t, err)
+	})
+}
+
 // Benchmark to determine the memory usage of ReplayCache.
 // Note that NewReplayCache only allocates the active set,
 // so the eventual memory usage will be roughly double.
```
