Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the ERROR level for logs that report erroneous user inputs #4684

Open
hoyhbx opened this issue Aug 24, 2022 · 0 comments
Open

Use the ERROR level for logs that report erroneous user inputs #4684

hoyhbx opened this issue Aug 24, 2022 · 0 comments

Comments

@hoyhbx
Copy link
Contributor

hoyhbx commented Aug 24, 2022

The PR can be found at #4685.

Basically, we propose to use ERROR logs that report erroneous user input when rejecting them. In principle, any errors should be logged with ERROR (user error is no exception). Currently, INFO logs are used which makes it very difficult to identify those errors from a large number of INFO logs.

We carefully go over all the log statement related to user inputs and revise the log level in the PR.

Bug Report

What version of Kubernetes are you using?

Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.1", GitCommit:"d224476cd0730baca2b6e357d144171ed74192d6", GitTreeState:"clean", BuildDate:"2020-01-14T21:04:32Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-05-19T19:53:08Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

What version of TiDB Operator are you using?

TiDB Operator Version: version.Info{GitVersion:"v1.3.0-45+1470cfb46e1ffb-dirty", GitCommit:"1470cfb46e1ffb8bb86f74ba455865a95b825413", GitTreeState:"dirty", BuildDate:"2022-07-07T21:33:51Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?

$ kubectl get sc
NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
standard (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  9m4s

$ kubectl get pvc -n {tidb-cluster-namespace}
NAME                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pd-advanced-tidb-pd-0       Bound    pvc-5d85aab1-3a5b-4bd2-a096-83be8a6e4c63   10Gi       RWO            standard       8m37s
pd-advanced-tidb-pd-1       Bound    pvc-22015e6c-601d-409e-b480-07458ce53711   10Gi       RWO            standard       8m37s
pd-advanced-tidb-pd-2       Bound    pvc-aa1bfdda-b2be-44ed-a721-a17ef5d6140c   10Gi       RWO            standard       8m37s
tikv-advanced-tidb-tikv-0   Bound    pvc-a702225b-1b7f-4363-8975-04f9170f5853   100Gi      RWO            standard       7m11s
tikv-advanced-tidb-tikv-1   Bound    pvc-fc633404-97d4-4359-9921-61faaf26b4c8   100Gi      RWO            standard       7m11s
tikv-advanced-tidb-tikv-2   Bound    pvc-69b6c9eb-6dcf-4ae6-91b8-912551c3c1d4   100Gi      RWO            standard       7m11s

What's the status of the TiDB cluster pods?

NAME                                       READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
advanced-tidb-discovery-6998694d4c-zmg5d   1/1     Running   0          9m21s   10.244.3.2   test-worker2   <none>           <none>
advanced-tidb-pd-0                         1/1     Running   0          9m20s   10.244.2.3   test-worker    <none>           <none>
advanced-tidb-pd-1                         1/1     Running   0          9m20s   10.244.3.4   test-worker2   <none>           <none>
advanced-tidb-pd-2                         1/1     Running   0          9m20s   10.244.1.5   test-worker3   <none>           <none>
advanced-tidb-tidb-0                       2/2     Running   0          6m48s   10.244.2.6   test-worker    <none>           <none>
advanced-tidb-tidb-1                       2/2     Running   0          6m48s   10.244.3.7   test-worker2   <none>           <none>
advanced-tidb-tidb-2                       2/2     Running   0          6m48s   10.244.1.8   test-worker3   <none>           <none>
advanced-tidb-tikv-0                       1/1     Running   0          7m54s   10.244.2.5   test-worker    <none>           <none>
advanced-tidb-tikv-1                       1/1     Running   0          7m54s   10.244.3.6   test-worker2   <none>           <none>
advanced-tidb-tikv-2                       1/1     Running   0          7m54s   10.244.1.7   test-worker3   <none>           <none>
tidb-controller-manager-5c775c65f5-24sw4   1/1     Running   0          9m36s   10.244.1.3   test-worker3   <none>           <none>
tidb-scheduler-5665b7f8fb-j4864            2/2     Running   0          9m36s   10.244.1.2   test-worker3   <none>           <none>

What did you do?

We discovered that many error messages related to invalid input are printed via klog.Infof (for example, code) instead of klog.Errorf, while the messages themselves should be raised at the ERROR level.

The error messages we got that were printed at the INFO level include (but are not limited to):

Deployment.apps "test-cluster-discovery" is invalid: [spec.template.spec.securityContext.fsGroupChangePolicy: Unsupported value: "OnRootMissmatch": supported values: "Always", "OnRootMismatch",
Deployment.apps "test-cluster-discovery" is invalid: spec.template.spec.dnsConfig.nameservers[0]: Invalid value: "MyIPAddress": must be valid IP address, requeuing
Deployment.apps "test-cluster-discovery" is invalid: spec.template.spec.containers[0].volumeMounts[0].name: Not found: "Volume", requeuing
Deployment.apps "test-cluster-discovery" is invalid: spec.template.spec.dnsPolicy: Unsupported value: "nil": supported values: "ClusterFirstWithHostNet", "ClusterFirst", "Default", "None", requeuing
Deployment.apps "test-cluster-discovery" is invalid: [spec.template.spec.containers[0].resources.limits[MyType]: Invalid value: "MyType": must be a standard resource type or fully qualified

What did you expect to see?
We expect to see the error messages printed at the ERROR level, as all the messages we show above are indeed errors.

What did you see instead?
These messages are shown at the INFO level in the log.

Additional comment
We inspected the source code and we believe that this line should use klog.Errorf rather than klog.Infof to print the log message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant