Go TLS tracing not working on Amazon Linux 2023 #1986
Comments
@SamuraiPrinciple thanks for reporting this. I'm aiming to try to reproduce the issue later this week and will report back when I have more information. |
Hi @ddelnano - many thanks for the quick reply, it really is appreciated. Do let me know if I could be of any help (more logs, screenshare, ...). Take care and wish you a great day. |
This is still on my radar. I'm expecting to have time to look into this at the end of the week. |
That's great, thanks for the update. I've tried with the latest pixie version (0.14.11) and it's still not working. |
Hi @ddelnano hope all is well :) Wondering if you've managed to reproduce the issue? Happy to provide more details (EKS AMIs, pixie deployment config and sample go app) in case you didn't manage to get a repro... Many thanks, |
@SamuraiPrinciple sorry for failing to follow up on this! I'll be giving this some attention tomorrow or Thursday. |
No, no, no problem at all - just let me know if you're struggling to reproduce it, very happy to make it as easy as possible for you. Thanks for the reply and your hard work! |
Since our existing Go TLS tracing test hasn't been updated for Go 1.22 yet, I wanted to rule out any obvious incompatibility. We will eventually have that version under test, but until that is upstream I've created this branch, which adds coverage for Go 1.22. The test passes on that branch, which is a good sign. My next step is to reproduce the issue on EKS w/ AL2023. @SamuraiPrinciple while I haven't tried the EKS steps yet, are you able to share an example Go application that exhibits the problem, along with the compiler version? |
Hi, thanks for picking this up! I will provide you with the sample app later, but just as a quick note - if I rotate EKS nodes so that they use AL2 AMIs, the tracing then works with everything else being exactly the same. |
No problem! Looking forward to seeing the example app as I didn't have luck with the reproduction on EKS. |
The repo contains a simple go program that polls https://example.com as well as a Dockerfile that can be used to build an image. There is also a very simple k8s deployment manifest that runs two containers (one for HTTP 1.1 and the other for HTTP/2). |
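For readers without access to that repo, a minimal sketch of such a poller might look like the following. It is not the reporter's actual program: the flag names (`-http1`, `-interval`, `-count`) and the helper names are assumptions made for illustration. It relies on the standard `net/http` behaviour that a non-nil, empty `TLSNextProto` map disables HTTP/2 on a transport.

```go
package main

import (
	"crypto/tls"
	"flag"
	"io"
	"log"
	"net/http"
	"time"
)

// newTransport returns a transport that either speaks HTTP/1.1 only or
// attempts HTTP/2 over TLS. (Hypothetical helper, not from the reporter's repo.)
func newTransport(disableHTTP2 bool) *http.Transport {
	tr := &http.Transport{}
	if disableHTTP2 {
		// A non-nil, empty TLSNextProto map turns off HTTP/2 in net/http.
		tr.TLSNextProto = map[string]func(string, *tls.Conn) http.RoundTripper{}
	} else {
		tr.ForceAttemptHTTP2 = true
	}
	return tr
}

// poll issues one GET, drains the body, and reports the negotiated
// protocol (e.g. "HTTP/1.1" or "HTTP/2.0") and the body size in bytes.
func poll(client *http.Client, url string) (string, int64, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", 0, err
	}
	defer resp.Body.Close()
	n, err := io.Copy(io.Discard, resp.Body)
	return resp.Proto, n, err
}

func main() {
	http1 := flag.Bool("http1", false, "disable HTTP/2 and speak HTTP/1.1 only")
	interval := flag.Duration("interval", time.Second, "delay between requests")
	count := flag.Int("count", 3, "number of requests to send; 0 polls forever")
	flag.Parse()

	client := &http.Client{Transport: newTransport(*http1), Timeout: 10 * time.Second}
	for i := 0; *count == 0 || i < *count; i++ {
		if i > 0 {
			time.Sleep(*interval)
		}
		proto, n, err := poll(client, "https://example.com")
		if err != nil {
			log.Printf("request failed: %v", err)
			continue
		}
		log.Printf("%s response, %d bytes", proto, n)
	}
}
```

Running one container with `-http1 -count=0` and another with `-count=0` would give the two long-running cases (HTTP 1.1 and HTTP/2) described above.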
Thanks for the quick response. Does the HTTP/2 case work on AL2? Pixie's HTTP/2 tracing should only work for gRPC and when Go binaries have debug symbols (docs). As for the HTTP 1.1 case, can you explain more about the success case (when running on AL2)? I crafted a sample application very similar to yours and didn't see any traffic for AL2 or AL2023. I didn't consider my case a reproduction since I thought it was more similar to #899, which was never fully understood. I'll be able to continue debugging this on my end, but I'd still like to understand the AL2 case to help narrow down where the problem might be. |
Hi, mega-thanks for the quick reply! To be honest, we're not using HTTP/2 (i.e. we're actively disabling it - long story, but it's problematic for various reasons, unrelated to pixie), so I've only included that 'just in case'. I can confirm that on AL2 I can indeed see payloads for outbound TLS requests (HTTP 1.1). |
For your AL2 environment, can you run the PEM with You can add that arg to the DaemonSet even though the PID won't match on the other instances. Just make sure to grab the logs from the instance that has the application with the PID. |
I'm deploying pixie using a helm chart - do you happen to know what the quickest way is to pass an argument to pem daemonset? or do I have to manually update the pem daemonset? |
I think I've got it... |
Ok, I've tracked down the issue. The get_goid function is broken on AL23. I made the following changes to print out all the variables and here is the output that I'm seeing.
|
@SamuraiPrinciple can you try installing the I was able to verify that the issue above occurs because Pixie's prepackaged headers (which are used if kernel headers aren't available on the host) cause this thread lookup to fail. The fsbase is essential for locating the goid, so incorrectly reading that value breaks tracing. Essentially, AL23 has enough kernel differences (backports, etc.) compared to vanilla Linux that Pixie's prepackaged headers aren't compatible. Our docs mention that it's highly recommended to install Linux headers, although we haven't had a good way to surface this to end users. Unfortunately, these kernel header incompatibilities manifest in strange ways, so it's usually time-consuming to identify the problem. Coincidentally, I've been working on #2051 and it is getting close to making it into a release. I intend to make that check part of the Pixie cli and helm install process in addition to the |
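To check by hand whether a node has headers for its running kernel, something like the sketch below could be used. It is not the check from #2051; it only looks in the conventional install locations (`/lib/modules/<release>/build`, `/usr/src/kernels/<release>` on RPM-based distros, `/usr/src/linux-headers-<release>` on Debian-based ones). The `root` parameter and the function names are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// headerDirs lists conventional header locations for a kernel release,
// relative to root. root is "/" on the node itself; a different value
// could point at a mounted host filesystem (an assumption of this sketch).
func headerDirs(root, release string) []string {
	return []string{
		filepath.Join(root, "lib", "modules", release, "build"),
		filepath.Join(root, "usr", "src", "kernels", release),
		filepath.Join(root, "usr", "src", "linux-headers-"+release),
	}
}

// findHeaders returns the first candidate that exists as a directory.
// os.Stat follows symlinks, so a dangling "build" link counts as missing.
func findHeaders(root, release string) (string, bool) {
	for _, dir := range headerDirs(root, release) {
		if info, err := os.Stat(dir); err == nil && info.IsDir() {
			return dir, true
		}
	}
	return "", false
}

func main() {
	// /proc/sys/kernel/osrelease holds the same string as `uname -r`.
	raw, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("cannot determine kernel release:", err)
		return
	}
	release := strings.TrimSpace(string(raw))
	if dir, ok := findHeaders("/", release); ok {
		fmt.Printf("kernel headers for %s found at %s\n", release, dir)
	} else {
		fmt.Printf("no kernel headers found for %s\n", release)
	}
}
```

If this reports no headers on an AL2023 node, that matches the situation described above, where Pixie falls back to its prepackaged headers.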
Hi! Many thanks for the investigation and the info. I can confirm that installing the Linux headers package indeed resolves the issue. The HTTP 1.1 TLS connections can be traced (I have noticed that gzipped responses are not rendered in the UI - "resp_body: <removed: non-text content-type>,"). Happy to close this issue. |
No problem, sorry it took some time for me to dig into this and I greatly appreciate your help throughout the process! I'm hopeful that surfacing the lack of Linux headers in Pixie's diagnostic tools will help uncover and fix these problems before they become month-long bugs! That tooling should be making its way into a release in the next few weeks (across vizier, operator, cli, etc). |
Glad I could be useful :) and many, many thanks for your help with this! |
Describe the bug
Go TLS tracing does not work on Amazon Linux 2023.
To Reproduce
1. Provision an EKS cluster (v1.30) with two node groups, one running AL2 and one running AL2023.
2. Deploy Pixie.
3. Run a Go (1.21 or 1.22) app (pod/deployment) that makes outbound HTTPS requests (HTTP 1.1).
4. Go to px/http_data and try to observe said outbound traffic.
Only traffic initiated by the pods running on the AL2 node group appears.
Expected behavior
Traffic initiated by pods scheduled on the node group running AL2023 should also be visible.
Logs
Please attach the logs by running the following command:
App information (please complete the following information):
pem-AL2.log
pem-al2023.log