[Compactor] Compactor has concurrent map writes / iterations and writes in v0.32.3 #6742

Closed
namco1992 opened this issue Sep 21, 2023 · 5 comments · Fixed by #6746

@namco1992

Thanos, Prometheus and Golang version used:

thanos --version
thanos, version 0.32.3 (branch: , revision: 3d98d7ce7a254b893e4c8ee8122f7f6edd3174bd-modified)
  build user:
  build date:
  go version:       go1.21.0
  platform:         linux/amd64
  tags:             unknown

Object Storage Provider: In house Ceph cluster.

What happened:
The Thanos compactor crashes with either "fatal error: concurrent map iteration and map write" or "fatal error: concurrent map writes".

How to reproduce it (as minimally and precisely as possible):
TBD.

Full logs to relevant components:
fatal error: concurrent map writes: https://pastebin.com/raw/iWPVTQVV
fatal error: concurrent map iteration and map write: https://pastebin.com/raw/wrtHv76Q

Anything else we need to know:

Thanos compact args:

  - args:
    - --compact.concurrency=12
    - --block-files-concurrency=5
    - --compact.blocks-fetch-concurrency=5
    - --debug.max-compaction-level=3
    - --retention.resolution-raw=390d
    - --downsampling.disable
    - --data-dir=/var/thanos/data/compact
    - --log.level=debug
    - --no-debug.halt-on-error
    - --wait
    - --block-viewer.global.sync-block-timeout=15m
    - --delete-delay=30m
    - --consistency-delay=4h
    - --deduplication.func=penalty
    - --deduplication.replica-label=prometheus
    - --objstore.config-file=/etc/thanos/bucket.yaml
    command:
    - /bin/thanos
    - compact
@saswatamcode
Member

Hmm, we haven't changed anything specific in the compactor in 0.32.3. We did bump the objstore version, though: thanos-io/objstore@c042a6a...eb06103

cc: @fpetkovski

@namco1992
Author

namco1992 commented Sep 21, 2023

Hi @saswatamcode, we went from 0.30.2 -> 0.32.3, so the bug could have been introduced in any version in between.

To add some more context around the error:

ts=2023-09-20T23:58:28.378530888Z caller=s3.go:487 level=warn msg="could not guess file size for multipart upload; upload might be not optimized" name=01HATGANYEW7Q7HW8WJ24X3QYP/chunks/000001 err="unsupported type of io.Reader: io.nopCloser"
ts=2023-09-20T23:58:28.378597185Z caller=s3.go:487 level=warn msg="could not guess file size for multipart upload; upload might be not optimized" name=01HATGANYEW7Q7HW8WJ24X3QYP/chunks/000004 err="unsupported type of io.Reader: io.nopCloser"
ts=2023-09-20T23:58:28.378597404Z caller=s3.go:487 level=warn msg="could not guess file size for multipart upload; upload might be not optimized" name=01HATGANYEW7Q7HW8WJ24X3QYP/chunks/000002 err="unsupported type of io.Reader: io.nopCloser"
ts=2023-09-20T23:58:28.378593014Z caller=s3.go:487 level=warn msg="could not guess file size for multipart upload; upload might be not optimized" name=01HATGANYEW7Q7HW8WJ24X3QYP/chunks/000003 err="unsupported type of io.Reader: io.nopCloser"
fatal error: concurrent map writes
...

(Btw, the "could not guess file size for multipart upload" warning is also new. It was introduced by the objstore bump and is fixed in thanos-io/objstore#77.)

Update: I think it's fixed in thanos-io/objstore#78. It's a race condition in the objstore, so I suppose we can close this issue.

@saswatamcode
Member

While that log message is old, I think something changed to trigger it, and that is addressed in thanos-io/objstore#77.

For the concurrent map panic, I think the fix might just be thanos-io/objstore#78.

@saswatamcode
Member

saswatamcode commented Sep 21, 2023

FWIW, we don't see the panics with the config below (but we do see the "could not guess file size" warning):

        - compact
        - '--wait'
        - '--log.level=warn'
        - '--log.format=logfmt'
        - '--objstore.config=$(OBJSTORE_CONFIG)'
        - '--data-dir=/var/thanos/compact'
        - '--debug.accept-malformed-index'
        - '--retention.resolution-raw=30d'
        - '--retention.resolution-5m=30d'
        - '--retention.resolution-1h=30d'
        - '--delete-delay=48h'
        - '--compact.concurrency=1'
        - '--downsample.concurrency=1'
        - '--deduplication.replica-label=replica'
        - '--debug.max-compaction-level=3'
        - '--no-downsampling.disable'

@saswatamcode
Member

v0.32.4 is now available with these fixes: https://github.com/thanos-io/thanos/releases/tag/v0.32.4
