issue-465 Create a documentation section to use Grafana DataSource with SonataFlow Prometheus metrics #693

Merged
merged 5 commits into from
Jan 17, 2025

Conversation

jianrongzhang89
Contributor

@jianrongzhang89 jianrongzhang89 commented Dec 10, 2024

Fix apache/incubator-kie-kogito-serverless-operator#465

Update the document to cover Prometheus and Grafana installation, Grafana Data Source configuration, and importing the default dashboard.

  • You have read the contributions doc
  • Pull Request title is properly formatted: Issue-XYZ Subject
  • Pull Request title contains the target branch if not targeting main: [0.9.x] Issue-XYZ Subject
  • The nav.adoc file has a link to this guide in the proper category
  • The index.adoc file has a card to this guide in the proper category, with a meaningful description

@ricardozanini
Member

@jianrongzhang89 can you please take a look at the CI?

@jianrongzhang89 jianrongzhang89 force-pushed the monitoring branch 2 times, most recently from 3250a02 to d95b765 Compare December 11, 2024 10:22
Contributor

github-actions bot commented Dec 11, 2024

🎊 PR Preview f1a88df has been successfully built and deployed. See the documentation preview: https://sonataflow-docs-preview-pr-693.surge.sh

@jianrongzhang89
Contributor Author

@jianrongzhang89 can you please take a look at the CI?

@ricardozanini fixed CI errors.

Contributor

@wmedvede wmedvede left a comment

I followed the whole document for the OpenShift installation and it worked fine.
See the image below with my workflows.

image

Guide is working.
LGTM

@wmedvede
Contributor

Would you mind checking the procedure for regular Kubernetes clusters? @domhanak

@jianrongzhang89 jianrongzhang89 force-pushed the monitoring branch 2 times, most recently from eb1bac5 to 4892c25 Compare December 11, 2024 21:10
Member

@ricardozanini ricardozanini left a comment

Many thanks, @jianrongzhang89. This documentation seems good. Thanks, @wmedvede, for verifying the steps in the cluster!

@ricardozanini
Member

@kaldesai mind taking a look too?

Contributor

@wmedvede wmedvede left a comment

Hi @jianrongzhang89, I couldn't avoid adding some more nitpicks when re-reading 😄


In {product_name}, you can check the following metrics:

* `kogito_process_instance_started_total`: Number of started workflows (a workflow that has started might be running or completed)
Contributor

Suggested change
* `kogito_process_instance_started_total`: Number of started workflows (a workflow that has started might be running or completed)
* `kogito_process_instance_started_total`: Number of started workflows.

Contributor Author

ok


* `kogito_process_instance_started_total`: Number of started workflows (a workflow that has started might be running or completed)
* `kogito_process_instance_running_total`: Number of running workflows
* `kogito_process_instance_completed_total`: Number of completed workflows
Contributor

Suggested change
* `kogito_process_instance_completed_total`: Number of completed workflows
* `kogito_process_instance_completed_total`: Number of completed workflows.

Contributor Author

ok

* `kogito_process_instance_started_total`: Number of started workflows (a workflow that has started might be running or completed)
* `kogito_process_instance_running_total`: Number of running workflows
* `kogito_process_instance_completed_total`: Number of completed workflows
* `kogito_process_instance_error`: Number of workflows that report an error ( a workflow with an error might be still running or have been completed)
Contributor

Suggested change
* `kogito_process_instance_error`: Number of workflows that report an error ( a workflow with an error might be still running or have been completed)
* `kogito_process_instance_error`: Number of workflows that report an error.

* `kogito_process_instance_running_total`: Number of running workflows
* `kogito_process_instance_completed_total`: Number of completed workflows
* `kogito_process_instance_error`: Number of workflows that report an error ( a workflow with an error might be still running or have been completed)
* `kogito_process_instance_duration_seconds`: Duration of a process instance in seconds
Contributor

Suggested change
* `kogito_process_instance_duration_seconds`: Duration of a process instance in seconds
* `kogito_process_instance_duration_seconds`: Duration of a workflow instance in seconds.

Contributor Author

done

* `kogito_process_instance_completed_total`: Number of completed workflows
* `kogito_process_instance_error`: Number of workflows that report an error ( a workflow with an error might be still running or have been completed)
* `kogito_process_instance_duration_seconds`: Duration of a process instance in seconds
* `kogito_node_instance_duration_milliseconds`: Duration of relevant nodes in milliseconds (a workflow is composed by nodes, user might be interested on the time consumed by an specific node type)
Contributor

Suggested change
* `kogito_node_instance_duration_milliseconds`: Duration of relevant nodes in milliseconds (a workflow is composed by nodes, user might be interested on the time consumed by an specific node type)
* `kogito_node_instance_duration_milliseconds`: Duration of relevant nodes in milliseconds.

Contributor Author

ok


[NOTE]
====
Internally, workflows are referred as processes. Therefore, the `processId` and `processName` is workflow ID and name respectively.
Contributor

Suggested change
Internally, workflows are referred as processes. Therefore, the `processId` and `processName` is workflow ID and name respectively.
Internally, workflows are referred as processes. Therefore, the `processId` and `processName` are workflow id and name respectively.

Contributor Author

ok

Internally, workflows are referred as processes. Therefore, the `processId` and `processName` is workflow ID and name respectively.
====

Each of the metrics mentioned previously contains a label for a specific workflow ID. For example, the `kogito_process_instance_completed_total` metric below contains the labels for `callbackstatetimeouts` workflow:
Contributor

Suggested change
Each of the metrics mentioned previously contains a label for a specific workflow ID. For example, the `kogito_process_instance_completed_total` metric below contains the labels for `callbackstatetimeouts` workflow:
Each of the metrics mentioned previously contains a label for a specific workflow id. For example, the `kogito_process_instance_completed_total` metric below contains the labels for `callbackstatetimeouts` workflow:

Contributor Author

ok
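
As a rough illustration of the per-workflow labels discussed in this thread, the sketch below parses a few lines in the Prometheus text exposition format and sums one metric per workflow. It is only a sketch under stated assumptions: the sample lines, their values, the `version` label, and the use of `processId` as the label name (borrowed from the wording of the note above) are hypothetical, and the labels actually exposed by a running workflow may be named differently. The helper `sum_by_label` is likewise made up for this example.

```python
import re
from collections import defaultdict

# One sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} 3.0 [optional timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+\d+)?$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def sum_by_label(exposition_text, metric_name, label):
    """Sum the samples of `metric_name`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric_name:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        if label in labels:
            totals[labels[label]] += float(match.group("value"))
    return dict(totals)


# Hypothetical scrape output: values and label names are illustrative only.
SAMPLE = """\
# TYPE kogito_process_instance_completed_total counter
kogito_process_instance_completed_total{processId="callbackstatetimeouts",version="1.0"} 3.0
kogito_process_instance_completed_total{processId="callbackstatetimeouts",version="2.0"} 2.0
kogito_process_instance_completed_total{processId="callbackstatetimeouts-gitops",version="1.0"} 1.0
kogito_process_instance_running_total{processId="callbackstatetimeouts",version="1.0"} 4.0
"""

if __name__ == "__main__":
    print(sum_by_label(SAMPLE, "kogito_process_instance_completed_total", "processId"))
```

Keeping the label name as a parameter means the same helper can be pointed at whichever label the real endpoint exposes, without changing the parsing logic.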

----

=== kogito_process_instance_duration_seconds
Calculates duration of a workflow instance that has reached a terminal state,, i.e. `Aborted` or `Completed`. This metric is registered when the process reaches the terminal state.
Contributor

Suggested change
Calculates duration of a workflow instance that has reached a terminal state,, i.e. `Aborted` or `Completed`. This metric is registered when the process reaches the terminal state.
Calculates duration of a workflow instance that has reached a terminal state, i.e. `Aborted` or `Completed`. This metric is registered when the process reaches the terminal state.

Contributor Author

ok

* `kogito_process_instance_error`: Number of workflows that report an error ( a workflow with an error might be still running or have been completed)
* `kogito_process_instance_duration_seconds`: Duration of a process instance in seconds
* `kogito_node_instance_duration_milliseconds`: Duration of relevant nodes in milliseconds (a workflow is composed by nodes, user might be interested on the time consumed by an specific node type)
* `sonataflow_input_parameters_counter`: Records input parameters, the occurrences of <"param_name","param_value"> per `processId`.
Contributor

Suggested change
* `sonataflow_input_parameters_counter`: Records input parameters, the occurrences of <"param_name","param_value"> per `processId`.
* `sonataflow_input_parameters_counter_total`: Records input parameters, the occurrences of <"param_name","param_value"> per `processId`.

Contributor Author

ok

@jianrongzhang89 jianrongzhang89 force-pushed the monitoring branch 2 times, most recently from 1a39e31 to 80de021 Compare December 20, 2024 01:46
Contributor

@wmedvede wmedvede left a comment

Hi @RichardW98, a few nitpicks. I have also re-installed the Grafana dashboard after these last modifications.
It is working well, great work!

Just an observation regarding the dashboard, see screenshots please:

In my tests, I have these workflows: callbackstatetimeouts and callbackstatetimeouts-gitops.

The dashboard works fine:

image

However, in the filters a "greeting" value is shown.

Screenshot from 2024-12-20 10-39-10

Screenshot from 2024-12-20 10-39-17

@jianrongzhang89
Contributor Author

@wmedvede I updated the PR based on your comments above. Thanks.

@jianrongzhang89 jianrongzhang89 force-pushed the monitoring branch 2 times, most recently from c58e684 to 681b553 Compare December 21, 2024 23:14
Contributor

@wmedvede wmedvede left a comment

LGTM, thanks!

@ricardozanini
Member

@domhanak mind taking a look so we can close this one?

…ith SonataFlow Prometheus metrics: address review comments
@ricardozanini ricardozanini merged commit 4f4c9e6 into apache:main Jan 17, 2025
2 checks passed

Successfully merging this pull request may close these issues.

Create a documentation section to use Grafana DataSource with SonataFlow Prometheus metrics
3 participants