Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce Any stack usage with map + list of lookups #1307

Closed
wants to merge 7 commits into from

Commits on Jul 28, 2023

  1. Table test TestFieldConstructors (#1306)

    Tiny test refactoring to make debugging easier. Spotted in #1305.
    rabbbit committed Jul 28, 2023
    Configuration menu
    Copy the full SHA
    8d1bd8c View commit details
    Browse the repository at this point in the history

Commits on Jul 31, 2023

  1. Optimize zap.Any to use less mamory on stack (unrolled version)

    This is an alternative to:
    - #1301 and #1302 and #1304 - a series of PRs that are faster than this
      one. However, they rely on unsafe.
    - #1303 - my own PR that uses closures, to reduce the stack size by 60%.
    
    This PR reduces the stack size from:
    ```
     field.go:420          0xd16c3                 4881ecf8120000          SUBQ $0x12f8, SP   // 4856
    ```
    to
    ```
      field.go:420          0xcb603                 4881ecb8000000          SUBQ $0xb8, SP // 184
    ```
    so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed
    methods, like `zap.String`, etc.
    
    The downside is the (slight) incrase in the code maitenance - we unroll
    as much as we can and rely on the compiler correctly re-using small
    variable sizes. While this is not pretty, it feels safe - the changes
    were purely mechanical. Future changes and extensions should be easy to
    review.
    
    Additionally, the new code is (slightly) faster in all cases since we
    remove 1-2 function calls from all paths. The "in new goroutine" path is
    most affected, as shown in benchmarks below.
    
    This was largely inspired by conversations with @cdvr1993. We started
    looking at this in parallel, but I would have given up if it wasn't for
    our conversations.
    This particular version was inspired by an earlier version of #1304 -
    where I realized that @cdvr1993 is doing a similar dispatching mechanism
    that zap is already doing via `zapcore` - a possible optimization.
    
    Longer version:
    
    We have identified an issue where zap.Any can cause performance
    degradation due to stack size.
    
    This is apparently cased by the compiler assigning 4.8kb (a zap.Field
    per arm of the switch statement) for zap.Any on stack. This can result
    in an unnecessary runtime.newstack+runtime.copystack.
    A github issue against Go language is pending.
    
    This can be particularly bad if `zap.Any` was to be used in a new
    goroutine, since the default goroutine sizes can be as low as 2kb (it can
    vary depending on average stack size - see golang/go#18138).
    
    *Most crucially, `zap.Any` is now as fast as a direct dispatch like
    `zap.String`.*
    
    10 runs.
    ```
    goos: darwin
    goarch: arm64
    pkg: go.uber.org/zap
                               │ before-final.txt │           after-final.txt            │
                               │      sec/op      │    sec/op     vs base                │
    Any/str-no-logger                3.106n ±  2%   3.160n ±  1%   +1.75% (p=0.025 n=10)
    Any/str-no-logger-2              3.171n ±  4%   3.142n ±  1%        ~ (p=0.593 n=10)
    Any/str-no-logger-4              3.108n ±  3%   3.139n ±  2%   +0.97% (p=0.004 n=10)
    Any/str-no-logger-8              3.099n ±  2%   3.143n ±  2%        ~ (p=0.086 n=10)
    Any/any-no-logger                13.89n ±  2%   12.98n ±  2%   -6.59% (p=0.000 n=10)
    Any/any-no-logger-2              13.97n ±  2%   12.96n ±  2%   -7.27% (p=0.000 n=10)
    Any/any-no-logger-4              13.83n ±  2%   12.89n ±  2%   -6.83% (p=0.000 n=10)
    Any/any-no-logger-8              13.77n ±  2%   12.88n ±  2%   -6.43% (p=0.000 n=10)
    Any/str-with-logger              384.1n ±  2%   383.9n ±  6%        ~ (p=0.810 n=10)
    Any/str-with-logger-2            367.8n ±  2%   368.5n ±  3%        ~ (p=0.971 n=10)
    Any/str-with-logger-4            372.4n ±  2%   368.6n ±  4%        ~ (p=0.912 n=10)
    Any/str-with-logger-8            369.8n ±  3%   368.3n ±  3%        ~ (p=0.698 n=10)
    Any/any-with-logger              383.8n ±  3%   383.3n ±  6%        ~ (p=0.838 n=10)
    Any/any-with-logger-2            370.0n ±  3%   367.6n ±  1%        ~ (p=0.239 n=10)
    Any/any-with-logger-4            370.0n ±  3%   368.2n ±  4%        ~ (p=0.631 n=10)
    Any/any-with-logger-8            367.6n ±  2%   369.7n ±  3%        ~ (p=0.756 n=10)
    Any/str-in-go                    1.334µ ±  3%   1.347µ ±  3%        ~ (p=0.271 n=10)
    Any/str-in-go-2                  754.5n ±  3%   744.8n ±  5%        ~ (p=0.481 n=10)
    Any/str-in-go-4                  420.2n ± 11%   367.7n ± 31%        ~ (p=0.086 n=10)
    Any/str-in-go-8                  557.6n ±  4%   547.1n ± 12%        ~ (p=0.579 n=10)
    Any/any-in-go                    2.562µ ±  4%   1.447µ ±  3%  -43.53% (p=0.000 n=10)
    Any/any-in-go-2                 1361.0n ±  4%   761.4n ±  7%  -44.06% (p=0.000 n=10)
    Any/any-in-go-4                  732.1n ±  9%   397.1n ± 11%  -45.76% (p=0.000 n=10)
    Any/any-in-go-8                  541.3n ± 13%   564.6n ±  5%   +4.30% (p=0.041 n=10)
    Any/str-in-go-with-stack         1.420µ ±  1%   1.428µ ±  3%        ~ (p=0.670 n=10)
    Any/str-in-go-with-stack-2       749.5n ±  4%   771.8n ±  4%        ~ (p=0.123 n=10)
    Any/str-in-go-with-stack-4       433.2n ± 15%   400.7n ± 14%        ~ (p=0.393 n=10)
    Any/str-in-go-with-stack-8       494.0n ±  7%   490.1n ± 10%        ~ (p=0.853 n=10)
    Any/any-in-go-with-stack         2.586µ ±  3%   1.471µ ±  4%  -43.14% (p=0.000 n=10)
    Any/any-in-go-with-stack-2      1343.0n ±  3%   773.7n ±  4%  -42.39% (p=0.000 n=10)
    Any/any-in-go-with-stack-4       697.7n ±  8%   403.4n ±  9%  -42.17% (p=0.000 n=10)
    Any/any-in-go-with-stack-8       490.8n ±  9%   492.8n ±  8%        ~ (p=0.796 n=10)
    geomean                          206.3n         182.9n        -11.35%
    ```
    
    On absolute terms:
    
    Before, on arm64:
    ```
    goos: darwin
    goarch: arm64
    pkg: go.uber.org/zap
    BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
    BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
    PASS
    ok      go.uber.org/zap 50.238s
    ```
    
    After:
    ```
    ❯  go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
    
    goos: darwin
    goarch: arm64
    pkg: go.uber.org/zap
    BenchmarkAny/str-with-logger             3202051               382.1 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-2           3301683               371.2 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-4           3186028               364.8 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-8           3061030               371.0 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger             3203704               378.2 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-2           3281462               372.8 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-4           3252879               371.5 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-8           3246148               373.9 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-in-go                    804132              1404 ns/op             133 B/op          2 allocs/op
    BenchmarkAny/str-in-go-2                 1686093               758.0 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-4                 3075596               430.7 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-8                 2101650               543.7 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go                    845822              1424 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-2                 1531311               736.6 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-4                 2618665               464.6 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-8                 2130280               536.2 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack         818583              1440 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-2      1533379               739.4 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-4      2507131               399.6 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-8      2348804               453.1 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack         807199              1526 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-2      1590476               783.0 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-4      3026263               383.3 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-8      2615467               493.8 ns/op            88 B/op          2 allocs/op
    PASS
    ok      go.uber.org/zap 51.077s
    ```
    
    And amd64, before:
    ```
     % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
    goos: linux
    goarch: amd64
    pkg: go.uber.org/zap
    cpu: AMD EPYC 7B13
    BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
    BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
    PASS
    ok      go.uber.org/zap 51.671s
    ```
    
    After:
    ```
     % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
    goos: linux
    goarch: amd64
    pkg: go.uber.org/zap
    cpu: AMD EPYC 7B13
    BenchmarkAny/str-no-logger              100000000               11.77 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/str-no-logger-2            100000000               11.75 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/str-no-logger-4            100000000               11.76 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/str-no-logger-8            100000000               11.69 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/any-no-logger              49795383                24.33 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/any-no-logger-2            48821454                24.31 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/any-no-logger-4            49452686                24.79 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/any-no-logger-8            49359926                24.26 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/str-with-logger             1808188               700.1 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-2           1894179               643.9 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-4           1858263               649.7 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-8           1879894               645.1 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger             1817276               663.7 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-2           1906438               637.7 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-4           1837354               641.5 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-8           1909658               648.3 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-in-go                    468484              2463 ns/op              96 B/op          2 allocs/op
    BenchmarkAny/str-in-go-2                  726475              1465 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-4                 1285284               958.9 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-8                 1746547               573.1 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go                    426568              2715 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-2                  611106              1703 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-4                 1000000              1017 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-8                 2220459               625.7 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack         429721              2673 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-2       637306              1593 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-4      1301713               902.1 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-8      2012583               651.6 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack         391810              2833 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-2       675589              1639 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-4      1219318               970.3 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-8      1825632               574.6 ns/op            88 B/op          2 allocs/op
    PASS
    ok      go.uber.org/zap 50.294s
    ```
    rabbbit authored and prashantv committed Jul 31, 2023
    Configuration menu
    Copy the full SHA
    fcf187c View commit details
    Browse the repository at this point in the history
  2. tests

    rabbbit authored and prashantv committed Jul 31, 2023
    Configuration menu
    Copy the full SHA
    d51f401 View commit details
    Browse the repository at this point in the history
  3. coverage

    rabbbit authored and prashantv committed Jul 31, 2023
    Configuration menu
    Copy the full SHA
    f71daae View commit details
    Browse the repository at this point in the history
  4. Reduce Any stack usage with map + list of lookups

    Alternative to #1301 and #1303, prebuild a map by type
    to determine which function to call.
    
    This only works for concrete types, so we also need a list for
    interfaces, ordered in precedence order (since multiple interfaces
    may match a single type).
    prashantv committed Jul 31, 2023
    Configuration menu
    Copy the full SHA
    ed06c10 View commit details
    Browse the repository at this point in the history
  5. Simplify type lookup map

    prashantv committed Jul 31, 2023
    Configuration menu
    Copy the full SHA
    56159ed View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    74d8149 View commit details
    Browse the repository at this point in the history