
[Test Refactor] Enhance output way of e2e case --- using CAPI streams the logs of all pods #208

Open
wants to merge 4 commits into base: main
Conversation

Contributor

@huchen2021 huchen2021 commented Nov 4, 2021

Background:

When we first introduced the GitHub runners, any failing e2e case was hard to debug: we couldn't ssh into those runners, and things would work fine in a local environment. So, as a way to display logs when there is a failure, we have a debugging mechanism in place. In the current implementation, we show the pod logs via the CLI command "kubectl logs -n ${podNamespace} ${podName} --kubeconfig /tmp/mgmt.conf -c manager". But this wasn't an ideal approach. In this PR, I change it to use CAPI streams to get the pod's logs, which is the more official way.

What I did in this PR:

  1. Enhance the output of the e2e cases --- use CAPI streams to get the logs of the byoh-controller-manager pods
  2. There were a lot of arguments passed between functions, which made the code hard to read and triggered golangci-lint errors, so do some refactoring. Define 3 variables, caseContextData, collectInfoData, and byoHostPoolData, for each case; they are used to save data, and any function that needs the data takes them as arguments. This makes the code cleaner and easier to read.

@huchen2021 huchen2021 added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress and removed cla-not-required labels Nov 4, 2021
@huchen2021 huchen2021 added cla-not-required and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress labels Nov 4, 2021
Contributor

@jamiemonserrate jamiemonserrate left a comment


@huchen2021 - Can I ask how does extracting a CaseContext struct help? Is it to help with passing data around?

Contributor Author

huchen2021 commented Nov 5, 2021

Yes, before this change we needed to pass several arguments; after it, we only need to pass one argument, CaseContext, so the code is cleaner. The same reasoning applies to CollectInfoContext, WriteDeploymentLogContext, and ByoHostPoolContext.


@jamiemonserrate
Contributor

@huchen2021 - I'm worried that the test is turning into a bunch of functions that do stuff, and we are passing data around using structs. I'm worried this is only going to become harder to maintain.

In my ideal world, we would tie the functions closer to the data they operate on. So have structs, and methods on that struct that can use the data. For example, there could be one struct to create a BYOHost and get its logs. There could be another struct that represents our management cluster.

I'm happy to spend some more time thinking and giving a more concrete suggestion if you agree with the approach I am suggesting. What do you think @huchen2021 ?

Contributor Author

huchen2021 commented Nov 9, 2021


@jamiemonserrate Actually, I am doing exactly what you suggested. Let me say more about this PR. There are 4 structs: CaseContext, CollectInfoContext, WriteDeploymentLogContext, and ByoHostPoolContext.

  • Struct CaseContext represents the per-case data. Since we have 3 e2e cases, each case has an independent CaseContext variable, which prevents the cases from influencing each other.

  • Struct CollectInfoContext represents the log-collection data, for example where the logs are located. Since we have 3 e2e cases, each case has an independent CollectInfoContext variable to save its logs. All Show** functions take a CollectInfoContext variable as input and display its contents.

  • Struct WriteDeploymentLogContext represents deployment data. When you need to show a deployment's log, you just create a WriteDeploymentLogContext variable with DeploymentName, DeploymentNamespace, and ContainerName, and pass it to the WriteDeploymentLogs function, which saves the log into the specified directory. For now, we only care about byoh-system/byoh-controller-manager/manager.

  • Struct ByoHostPoolContext represents the ByoHostPool data; all host agent containers are kept in a byohost pool, and that data is saved in this struct variable. Since we have 3 e2e cases, each case has an independent ByoHostPoolContext variable. Function setupByohostPool creates a byohost pool, and function cleanByohostPool cleans one up. When we need to create a new e2e sub-test, we just call those two functions as follows, with no need to copy many lines like before. You can check the code before this PR; there is a lot of repeated code.

byoHostPoolData = new(ByoHostPoolContext)
byoHostPoolData.Capacity = 2
setupByohostPool(caseContextData, collectInfoData, byoHostPoolData)
cleanByohostPool(caseContextData, byoHostPoolData)

@jamiemonserrate
Contributor


@huchen2021 - I see what you are saying, and you are right, it is similar to what I was suggesting. I guess I am personally not a fan of the structs themselves. The structs feel like data holders, and the functions operate on this data. I agree with the functions you have identified, but I think the way the structs are written, it's making it hard to read the test.

So more concretely, what I would have loved to see is only a struct like ByoHostPool. This struct would have methods to create the pool and retrieve the logs. And we don't need any other struct. All the functions should operate on the ByoHostPool. Thoughts?

Contributor Author

huchen2021 commented Nov 10, 2021


@jamiemonserrate The struct ByoHostPoolContext alone is not enough; it only covers ByoHostPool concerns. If we pass too many arguments to a function, golangci-lint reports an error, and at that point we still need to package those arguments into a struct variable.

If you look at the test-framework project, it also defines a struct as the input for functions, for example ApplyClusterTemplateAndWait, ScaleAndWaitMachineDeployment, and DeleteAllClustersAndWait. If you don't pass a struct variable, then you need to pass a lot of individual variables instead, which makes the function signature really long, and golangci-lint will give you an error and suggest replacing them with a struct variable.

I don't think it makes the code hard to read.

@dharmjit
Contributor

Hi @huchen2021, is it possible to share some snippets of the new output logs for the e2e tests, and how they differ from the existing output?

Contributor Author

huchen2021 commented Nov 12, 2021


@dharmjit There is no difference from the existing output. This PR doesn't change the output logs; it just changes the implementation.

In the current code, I show the pod logs via the CLI command "kubectl logs -n ${podNamespace} ${podName} --kubeconfig /tmp/mgmt.conf -c manager". In this PR, I use CAPI streams to get the pod's logs, which is the more official way.

If you want to see the log output, please look at a failed e2e case in a PR. For example, click this one. At the end of the e2e logs, it shows something like this:

######################Start: Content of /tmp/read-byoh-controller-manager-log.sh##################
podNamespace=$(kubectl get pods --all-namespaces --kubeconfig /tmp/mgmt.conf | grep byoh-controller-manager | awk '{print $1}')
podName=$(kubectl get pods --all-namespaces --kubeconfig /tmp/mgmt.conf | grep byoh-controller-manager | awk '{print $2}')
kubectl logs -n ${podNamespace} ${podName} --kubeconfig /tmp/mgmt.conf -c manager

######################End: Content of /tmp/read-byoh-controller-manager-log.sh##################

#######################Start: execute result of /tmp/read-byoh-controller-manager-log.sh##################
2021-10-07T10:40:15.200Z INFO controller-runtime.metrics metrics server is starting to listen {"addr": "127.0.0.1:8080"}
2021-10-07T10:40:15.210Z INFO setup starting manager
I1007 10:40:15.211191 1 leaderelection.go:243] attempting to acquire leader lease byoh-system/controller-leader-election-caph...
2021-10-07T10:40:15.211Z INFO controller-runtime.manager starting metrics server {"path": "/metrics"}
I1007 10:40:15.470315 1 leaderelection.go:253] successfully acquired lease byoh-system/controller-leader-election-caph
2021-10-07T10:40:15.471Z DEBUG controller-runtime.manager.events Normal {"object": {"kind":"ConfigMap","namespace":"byoh-system","name":"controller-leader-election-caph","uid":"733a2ea0-0f36-430d-ab80-1b47ea4afd17","apiVersion":"v1","resourceVersion":"1174"}, "reason": "LeaderElection", "message": "byoh-controller-manager-5cff9bff45-2qxjw_a24dca25-cc38-4d3e-bdc0-088ba79868bb became leader"}
2021-10-07T10:40:15.471Z DEBUG controller-runtime.manager.events Normal {"object": {"kind":"Lease","namespace":"byoh-system","name":"controller-leader-election-caph","uid":"e1f2aafc-8d20-4915-a4e2-bb977e9b71f5","apiVersion":"coordination.k8s.io/v1","resourceVersion":"1177"}, "reason": "LeaderElection", "message": "byoh-controller-manager-5cff9bff45-2qxjw_a24dca25-cc38-4d3e-bdc0-088ba79868bb became leader"}
2021-10-07T10:40:15.512Z INFO controller-runtime.manager.controller.byocluster Starting EventSource {"reconciler group": "infrastructure.cluster.x-k8s.io", "reconciler kind": "ByoCluster", "source": "kind source: /, Kind="}
2021-10-07T10:40:15.512Z INFO controller-runtime.manager.controller.byocluster Starting EventSource {"reconciler group": "infrastructure.cluster.x-k8s.io", "reconciler kind": "ByoCluster", "source": "kind source: /, Kind="}
2021-10-07T10:40:15.513Z INFO controller-runtime.manager.controller.byocluster Starting Controller {"reconciler group": "infrastructure.cluster.x-k8s.io", "reconciler kind": "ByoCluster"}
2021-10-07T10:40:15.513Z INFO controller-runtime.manager.controller.cluster Starting EventSource {"reconciler group": "cluster.x-k8s.io", "reconciler kind": "Cluster", "source": "kind source: /, Kind="}
2021-10-07T10:40:15.513Z INFO controller-runtime.manager.controller.cluster Starting Controller {"reconciler group": "cluster.x-k8s.io", "reconciler kind": "Cluster"}

......
######################End: execute result of /tmp/read-byoh-controller-manager-log.sh##################

Yes, the current code generates a series of commands and runs them to get the logs. This PR just uses a different way to get the logs and saves them into a specified directory. When a case ends in failure, a function reads all the log files from that directory and shows them as follows:

#######################Start: Content of -<container.Name>.log##################
2021-10-07T10:40:15.200Z INFO controller-runtime.metrics metrics server is starting to listen {"addr": "127.0.0.1:8080"}
2021-10-07T10:40:15.210Z INFO setup starting manager
I1007 10:40:15.211191 1 leaderelection.go:243] attempting to acquire leader lease byoh-system/controller-leader-election-caph...
2021-10-07T10:40:15.211Z INFO controller-runtime.manager starting metrics server {"path": "/metrics"}
I1007 10:40:15.470315 1 leaderelection.go:253] successfully acquired lease byoh-system/controller-leader-election-caph
2021-10-07T10:40:15.471Z DEBUG controller-runtime.manager.events Normal {"object": {"kind":"ConfigMap","namespace":"byoh-system","name":"controller-leader-election-caph","uid":"733a2ea0-0f36-430d-ab80-1b47ea4afd17","apiVersion":"v1","resourceVersion":"1174"}, "reason": "LeaderElection", "message": "byoh-controller-manager-5cff9bff45-2qxjw_a24dca25-cc38-4d3e-bdc0-088ba79868bb became leader"}
2021-10-07T10:40:15.471Z DEBUG controller-runtime.manager.events Normal {"object": {"kind":"Lease","namespace":"byoh-system","name":"controller-leader-election-caph","uid":"e1f2aafc-8d20-4915-a4e2-bb977e9b71f5","apiVersion":"coordination.k8s.io/v1","resourceVersion":"1177"}, "reason": "LeaderElection", "message": "byoh-controller-manager-5cff9bff45-2qxjw_a24dca25-cc38-4d3e-bdc0-088ba79868bb became leader"}
2021-10-07T10:40:15.512Z INFO controller-runtime.manager.controller.byocluster Starting EventSource {"reconciler group": "infrastructure.cluster.x-k8s.io", "reconciler kind": "ByoCluster", "source": "kind source: /, Kind="}
2021-10-07T10:40:15.512Z INFO controller-runtime.manager.controller.byocluster Starting EventSource {"reconciler group": "infrastructure.cluster.x-k8s.io", "reconciler kind": "ByoCluster", "source": "kind source: /, Kind="}
2021-10-07T10:40:15.513Z INFO controller-runtime.manager.controller.byocluster Starting Controller {"reconciler group": "infrastructure.cluster.x-k8s.io", "reconciler kind": "ByoCluster"}
2021-10-07T10:40:15.513Z INFO controller-runtime.manager.controller.cluster Starting EventSource {"reconciler group": "cluster.x-k8s.io", "reconciler kind": "Cluster", "source": "kind source: /, Kind="}
2021-10-07T10:40:15.513Z INFO controller-runtime.manager.controller.cluster Starting Controller {"reconciler group": "cluster.x-k8s.io", "reconciler kind": "Cluster"}
......
#######################End: Content of -<container.Name>.log##################

@jamiemonserrate
Contributor

@huchen2021 - Let me try to find some time this week to attempt to sketch out what I mean. I will get back to you on Wednesday.

4 participants