-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(grafana): repair node selection & metrics name #158
Conversation
@huntergregory to review. |
8a7d8aa
to
2416abb
Compare
This PR will be closed in 7 days due to inactivity. |
Please have a look @vakalapa @huntergregory @rbtr |
hey @aslafy-z, thanks for working on this fix. I see that you have the DCO "signed-off-by" on all your commits, but we also need a cryptographic sig to be able to guarantee origin. Would you update these with a signature? Here's how: https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits |
Signed-off-by: Zadkiel AHARONIAN <[email protected]>
@rbtr I just rebased, squashed and signed my commits :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, just need final sign-off from @huntergregory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @aslafy-z, sorry for the delay. I missed the notifications from March 😕. Thanks for your PR and the interest in improving these dashboards.
Do you mind updating the PR description with details/examples as needed for the bugs/fixes? Also, if you have a working pod-level dashboard, could you help fix #271?
Added more details in the comment, but I don't think it would make sense to filter by node in the "Fleet View". I'm also not sure that we can/should change datasource
to DS_PROMETHEUS
.
}, | ||
"editorMode": "code", | ||
"expr": "sum(rate(networkobservability_forward_count{direction=\"egress\", cluster=\"$cluster\", instance=~\"$Nodes\"}[$__rate_interval]))", | ||
"expr": "sum(rate(networkobservability_forward_count{direction=\"egress\", cluster=\"$cluster\", instance=~\"($Nodes):[0-9]+\"}[$__rate_interval]))", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does the :
do in this regex? Also, could you help me understand scenarios where the node selection is broken?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The $Nodes variable has one or multiple node ips formated like 1.1.1.1
or 1.1.1.1|2.2.2.2
.
The instance label however has the node:port 1.1.1.1:1234
.
This edit makes it possible to select multiple nodes.
}, | ||
"editorMode": "code", | ||
"expr": "sum by (cluster) (rate(networkobservability_drop_count[$__rate_interval]))", | ||
"expr": "sum by (cluster) (rate(networkobservability_drop_count{instance=~\"($Nodes):[0-9]+\"}[$__rate_interval]))", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and above queries are part of the "Fleet View" panels, where the dashboard summarizes metrics across clusters. I'm not sure it makes sense to filter based on node here, since the Nodes
variable only contains nodes for the selected cluster (there is always exactly one cluster selected):
retina/deploy/grafana/dashboards/clusters.json
Lines 3727 to 3732 in 6946bab
"name": "Nodes", | |
"options": [], | |
"query": { | |
"query": "label_values(kube_node_info{cluster=\"$cluster\"},node)", | |
"refId": "PrometheusVariableQueryEditor-VariableQuery" | |
}, |
There are some analogous panels below where someone can filter by node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out that this dashboard is broken and not importable #271 (at least on my grafana setup). If you have a working version, would you actually be able to export it for sharing externally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved out from my previous job and has no access to a cluster where I can install retina right now. I'll try on a kind when back with my personal laptop in a few days and see how it goes.
"refId": "StandardVariableQuery" | ||
}, | ||
"refresh": 1, | ||
"regex": "/.*_podname=\"([^\"]*).*/", | ||
"regex": "/.*podname=\"([^\"]*).*/", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch. You must have been using the advanced local-context metric mode. Just noting how this prompted initial thoughts on #344
@@ -107,7 +107,7 @@ | |||
{ | |||
"datasource": { | |||
"type": "prometheus", | |||
"uid": "${datasource}" | |||
"uid": "${DS_PROMETHEUS}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update datasource variable to DS_PROMETHEUS convention
Do you have a link to this convention? I just glanced at the top dashboard on Grafana.com, and it uses datasource
rather than DS_PROMETHEUS
as the variable. Seems the same for the built-in dashboards in Azure's managed Grafana.
I'm also afraid that this might be a breaking change to someone's existing dashboard setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've observed this "naming convention" widely used in recent years. If the dashboard needs to incorporate another type of datasource in the future, the name will clearly indicate the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can revert the change if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Added some thoughts here #158 (review)
This PR will be closed in 7 days due to inactivity. |
This PR will be closed in 7 days due to inactivity. |
"hide": 0, | ||
"includeAll": true, | ||
"label": "Nodes", | ||
"multi": true, | ||
"name": "Nodes", | ||
"options": [], | ||
"query": { | ||
"query": "label_values(kube_node_info{cluster=\"$cluster\"},node)", | ||
"query": "label_values(kube_node_info{cluster=\"$cluster\"},internal_ip)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could use named capture groups here to separate displayed value and used value, e.g.
"query": {
"qryType": 3,
"query": "query_result(kube_node_info)",
"refId": "PrometheusVariableQueryEditor-VariableQuery"
},
"refresh": 2,
"regex": "/node=\"(?<text>[^\"]+)|internal_ip=\"(?<value>[^\"]+)/g",
This would allow users to select nodes based on names but filter panels by underlying IPs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dashboard files have moved since #432. Could you please apply the changes in the new files? Also, I think @cmergenthaler makes a good suggestion about displaying the node names.
If changing datasource.uid
would break someone's dashboard when updating the dashboard to the new version, then I would prefer we not change it. Either way, it would be nice to keep this PR's scope smaller and make a change to datasource.uid
in another PR (there are also test files that depend on that value).
This PR will be closed in 7 days due to inactivity. |
Pull request closed due to inactivity. |
@rbtr @huntergregory @cmergenthaler |
…e groups in clusters dash Signed-off-by: Simone Rodigari <[email protected]>
…e groups in clusters dash Signed-off-by: Simone Rodigari <[email protected]>
…re groups in clusters dash Signed-off-by: Simone Rodigari <[email protected]>
…dash (#797) # Description This PR is to fix #158 * reduce scope of PR * [make it possible to select multiple nodes on clusters dash](#158 (comment)) * [fix pod-level regex](#158 (comment)) * [~~use named capture groups here to separate displayed value and used value in clusters dash~~](#158 (comment)) >NOTE: I have reverted the change to DS_PROMETHEUS not to break existing deployments and tests. This was requested in [this comment](#158 (review)) ## Related Issue fix #271 If this pull request is related to any issue, please mention it here. Additionally, make sure that the issue is assigned to you before submitting this pull request. ## Checklist - [x] I have read the [contributing documentation](https://retina.sh/docs/contributing). - [x] I signed and signed-off the commits (`git commit -S -s ...`). See [this documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification) on signing commits. - [x] I have correctly attributed the author(s) of the code. - [x] I have tested the changes locally. - [x] I have followed the project's style guidelines. - [x] I have updated the documentation, if necessary. - [x] I have added tests, if applicable. ## Screenshots (if applicable) or Testing Completed Please add any relevant screenshots or GIFs to showcase the changes made. ### All dashboards ![Screenshot 2024-10-01 152822](https://github.com/user-attachments/assets/6b15f10d-dc12-4405-9898-7da59b2fcdd9) ![Screenshot 2024-10-01 152846](https://github.com/user-attachments/assets/5e1763ce-2a48-4dd9-b4c5-f2b52a7cb3d5) ![Screenshot 2024-10-01 152917](https://github.com/user-attachments/assets/3e4aab9d-7b44-4357-a709-d137e3bb8e47) ### Node selection fix ![Screenshot 2024-10-02 103738](https://github.com/user-attachments/assets/5b61ce34-6a1e-414b-8c9e-1b35f89f7efb) ![Screenshot 2024-10-02 103802](https://github.com/user-attachments/assets/529d6b9f-85a9-48e6-be52-252ecadd066b) ## Additional Notes Thanks to @aslafy-z for the original PR #158 --- Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more information on how to contribute to this project. Signed-off-by: Simone Rodigari <[email protected]>
Note: The pod-level grafana dashboard still have some old metrics to update.