Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed marshalling when handling captured traffic #29

Open
alongir opened this issue Jan 25, 2024 · 19 comments
Open

Failed marshalling when handling captured traffic #29

alongir opened this issue Jan 25, 2024 · 19 comments
Assignees

Comments

@alongir
Copy link
Member

alongir commented Jan 25, 2024

When I tried v52.1.7-dev2, I sometimes see crypto/tls traffic and not open ssl. And I see the following logs in the worker, which makes me think there's a problem with parsing. These logs appear only with TLS traffic.
Also considering this: #30, it was challaning to debug, but this might be important.
I did validate that the socket worked on the tracer and worker.

2024-01-25T04:37:17Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:17Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:21Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting request to HAR"                                       │
│ 2024-01-25T04:37:21Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:21Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:22Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:25Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting request to HAR"                                       │
│ 2024-01-25T04:37:27Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:27Z INF server/middlewares/logger.go:69 > body_size=17 client_id=192.168.50.165 latency="26.528µs" method=GET path=/pcaps/total-size status_code=200                                     │
│ 2024-01-25T04:37:30Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting request to HAR"                                       │
│ 2024-01-25T04:37:30Z INF server/middlewares/logger.go:69 > body_size=17 client_id=192.168.50.165 latency="17.398µs" method=GET path=/pcaps/total-size status_code=200                                     │
│ 2024-01-25T04:37:31Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:31Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:31Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:33Z INF server/middlewares/logger.go:69 > body_size=17 client_id=192.168.50.165 latency="26.152µs" method=GET path=/pcaps/total-size status_code=200                                     │
│ 2024-01-25T04:37:35Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting request to HAR"                                       │
│ 2024-01-25T04:37:35Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:35Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:36Z INF server/middlewares/logger.go:69 > body_size=17 client_id=192.168.50.165 latency="22.976µs" method=GET path=/pcaps/total-size status_code=200                                     │
│ 2024-01-25T04:37:39Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting request to HAR"                                       │
│ 2024-01-25T04:37:39Z INF server/middlewares/logger.go:69 > body_size=17 client_id=192.168.50.165 latency="22.534µs" method=GET path=/pcaps/total-size status_code=200                                     │
│ 2024-01-25T04:37:39Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:40Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:41Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:41Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:43Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting request to HAR"                                       │
│ 2024-01-25T04:37:44Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:44Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:46Z INF server/middlewares/logger.go:69 > body_size=17 client_id=192.168.50.165 latency="17.879µs" method=GET path=/pcaps/total-size status_code=200                                     │
│ 2024-01-25T04:37:48Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting request to HAR"                                       │
│ 2024-01-25T04:37:48Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:49Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:51Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:51Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:52Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting request to HAR"                                       │
│ 2024-01-25T04:37:53Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:54Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:54Z INF server/middlewares/logger.go:69 > body_size=17 client_id=192.168.50.165 latency="25.976µs" method=GET path=/pcaps/total-size status_code=200                                     │
│ 2024-01-25T04:37:57Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting request to HAR"                                       │
│ 2024-01-25T04:37:57Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:37:58Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:38:01Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:38:01Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:38:02Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR"                                      │
│ 2024-01-25T04:38:02Z ERR main.go:173 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR" 
@alongir
Copy link
Member Author

alongir commented Jan 26, 2024

BTW, a way to reproduce:

  1. Run: kubectl apply -f https://raw.githubusercontent.com/kubeshark/sock-shop-demo/master/deploy/kubernetes/tls-demo.yaml
  2. You can look at the logs of each running pod to see the riming of a new message.
  3. Watch the Worker logs (worker on the same node as the test pods). See the log message.

@alongir
Copy link
Member Author

alongir commented Jan 26, 2024

@iluxa I tested: v52.1.7-dev3 and saw that the tracer_dump.pcap was getting a lot of payload for both crypto/tls and openssl. The dashboard was showing only crypto/tls only. Both workers logs were filled with the above message.
If I need to assume anything, I believe we are losing payload in this line: https://github.com/kubeshark/worker/blob/2ded70fb2d4ec1eb373d70c639729e67ee2351c8/main.go#L170

@alongir
Copy link
Member Author

alongir commented Jan 27, 2024

I made some more runs.
It;s pretty much unpredictable. I can see good from both pods and I can start missing traffic from one or the two.
The PCAPs, are actually quite interesting. a lot of good traffic and some weird.
I'll send you both PCAPs in slack.

Image

@alongir
Copy link
Member Author

alongir commented Jan 27, 2024

No description provided.

@iluxa
Copy link
Collaborator

iluxa commented Jan 27, 2024

Yes, I see receiver buffer is not enough, changed it

please test v52.1.7-dev4 and if the issue still exist, please provide tracer logs

@alongir
Copy link
Member Author

alongir commented Jan 27, 2024

So, I tested a few times. Only a few times, I don't see any failures. And most times I do.

See the logs attached:

kubeshark_logs.zip

│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379268422] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379268422] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379268422] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379268422] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379268422] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379268422] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379268422] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379268422] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379268422] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379268422] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379268422] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379268422] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379258578] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379258578] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379258578] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379258578] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379268422] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379258578] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379258578] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379258578] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379258579] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379258578] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379258578] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379258578] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379258578] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379268422] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379268422] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379258578] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379258579] Unable to get go user-kernel context [fd: 3154250937]]                                                                        │
│ 2024-01-27T18:03:51Z ERR tracer/bpf_logger.go:112 > [bpf] [1500571379258578] Unable to get go user-kernel context [fd: 3154250937]]      

@iluxa
Copy link
Collaborator

iluxa commented Jan 28, 2024

Are there original errors (Failed marshalling item: error="json: error calling MarshalJSON...) with v52.1.7-dev4 version or only new ones (Unable to get go user-kernel context...) ?

@alongir
Copy link
Member Author

alongir commented Jan 28, 2024

Sorry for the confusion. I'm helping debugging the original messages that occur in the Worker. These ones occur in the Tracer. I thought it would help.

@alongir
Copy link
Member Author

alongir commented Feb 1, 2024

@iluxa:

  1. PCAP + logs. the Marshaling problem persists
  2. Couldn't find obvious buffer errors in PCAPs

kubeshark_logs.zip
pcaps.tar.gz

@iluxa
Copy link
Collaborator

iluxa commented Feb 1, 2024

@alongir
I do not see previous inconsistency in pcaps (when packets are less size than expected)

However I see new issue in the tracer logs records like:

Buffer is full, dropped XX chunks

I see two issues here:

  1. packets get captured during disassembly still in progress. Disassembly process and packets polling process are competing for CPU. This can be fixed in the source code by starting capturing only after disassembly procedure is completed, however there is second issue:
  2. After the disassembly is completed there are still records with Buffer is full... which means CPU limit is to low for such amount of traffic

At this point I would recommend to retest this issue without any cpu limit and with a higher cpu limit

@alongir
Copy link
Member Author

alongir commented Feb 1, 2024

@iluxa , I ran a few tests:

  1. I confirm Buffer is full.. happens when Tracer's CPU reaches it limits.
  2. It happens also when tracer enjoys unlimited cpu/mem.

Image

@iluxa
Copy link
Collaborator

iluxa commented Feb 1, 2024

@alongir Can you please confirm Buffer is full... happens with unlimited cpu and memory limits, and provide logs and cpu graph metrics for this case.

I expect this case Buffer is full... happens only during disassembly. If this get confirmed, then issue can be fixed for the case with higher cpu/memory limit

@alongir
Copy link
Member Author

alongir commented Feb 1, 2024

I confirm Buffer is full.. happens when tracer enjoys unlimited cpu/mem.

kubeshark_logs.zip

In terms of Grafana, see the above/below image
Image

@alongir
Copy link
Member Author

alongir commented Feb 1, 2024

@iluxa this commit by @mertyildiran can help debug the marshaling error: https://github.com/kubeshark/worker/commit/2e5dbaa53c062862b9b206f29a1f85f6e1b6b19f
Can you make sure this commit exists in the next image you create?

@iluxa
Copy link
Collaborator

iluxa commented Feb 2, 2024

@alongir
PRs #31 has increased chunks buffer. This significantly increases performance based on my tests. There is still slow code need to be optimized, for example:

if err := binary.Read(buffer, binary.LittleEndian, &chunk); err != nil {

worker tag v52.1.20-dev2 includes https://github.com/kubeshark/worker/commit/2e5dbaa53c062862b9b206f29a1f85f6e1b6b19f

@alongir
Copy link
Member Author

alongir commented Feb 2, 2024

What do you recommend?

BTW:

  1. When there's no significant load on the cluster, I confirm I see the TLS payload in full (first time)
  2. I still see this error: │ 2024-02-02T20:10:04Z ERR main.go:174 > Failed marshalling item: error="json: error calling MarshalJSON for type http.HTTPPayload: Failed converting response to HAR", but I'm not sure what are the results (e.g. DNS maybe, TCP maybe?). I don't see @mertyildiran commit https://github.com/kubeshark/worker/commit/2e5dbaa53c062862b9b206f29a1f85f6e1b6b19f (e.g. logging the errors). Is it in the image?
  3. I don't see tracer errors in a busy cluster

Shall I approve this PR?

@iluxa
Copy link
Collaborator

iluxa commented Feb 3, 2024

@alongir

I see both changes from https://github.com/kubeshark/worker/commit/2e5dbaa53c062862b9b206f29a1f85f6e1b6b19f in the tag provided:
https://github.com/kubeshark/worker/blob/c2af2c5874fd25ce0ca72af3397d5ac1152ca6bb/pkg/extensions/http/structs.go#L33
https://github.com/kubeshark/worker/blob/c2af2c5874fd25ce0ca72af3397d5ac1152ca6bb/pkg/extensions/http/structs.go#L73

If there are no more tracer errors, then worker side output can be compared to what exists in tls_last.pcap

Shall I approve this #31?

related changes are already merged in the worker, #31 and https://github.com/kubeshark/worker/pull/62 are new PRs which use new https://github.com/kubeshark/tracerproto module

@alongir
Copy link
Member Author

alongir commented Feb 6, 2024

@iluxa
Using: v52.1.23-dev2
I the dashboard I see (NO LIB-SSL traffic):

Image

However, the Tracer PCAPs look good,. The logs are filled with errors

kubeshark_logs.zip
pcaps.tar.gz

@iluxa
Copy link
Collaborator

iluxa commented Feb 7, 2024

Issue was found which causes error calling MarshalJSON.... on the client side:

Client sometimes does not read full answer received, only first part, usually one packet only: kubeshark/sock-shop-demo#26

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants