Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(triggers): Add Support for defining MBean Triggers to the cryostat agent #197

Merged
merged 37 commits into from
Oct 3, 2023

Conversation

Josh-Matsuoka
Copy link
Contributor

This PR Addresses #130 and #131 , It makes use of a bit of manual parsing to handle durations, and google's Common Expression Language to do the heavy lifting evaluating conditions for triggering a Flight Recording based on the contents of the MBeans.

The Syntax looks a little bit like this:

[Condition1 &&/|| Condition2 &&/|| .... targetDuration >/</= duration("$DURATION")]~recordingTemplate

In more human readable form, consider this example

[ProcessCpuUsage>0.2&&targetDuration<30s]~foo

Would trigger a recording if the OS MBean reports ProcessCpuUsage under 0.2 for 30 seconds or longer. This gets passed to the agent as an argument on startup.

I also took the time to refactor MBeanContext and pull the MBean reading out into a separate class since both this feature and the MBeanContext can make use of it.

Copy link
Member

@andrewazores andrewazores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far, I'm excited to test this out.

src/main/java/io/cryostat/agent/model/MBeanInfo.java Outdated Show resolved Hide resolved
@andrewazores andrewazores added feat New feature or request safe-to-test labels Sep 21, 2023
Copy link
Member

@andrewazores andrewazores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. I have a few more minor comments here, plus some refactoring/cleanup ideas of my own. Please rebase this and I will commit + push my cleanup for you to look over. I also have some ideas about the re-triggering functionality and things that I will open a follow-up PR for later - would you be able to help review that as well when it's ready?

README.md Outdated Show resolved Hide resolved
pom.xml Outdated Show resolved Hide resolved
src/main/java/io/cryostat/agent/FlightRecorderModule.java Outdated Show resolved Hide resolved
@andrewazores
Copy link
Member

@Josh-Matsuoka how do my latest changes look? It's just some refactoring/cleanup, there should be no functional changes other than the ScheduledExecutorService update period is now configurable (but kept the same value as the default).

@Josh-Matsuoka
Copy link
Contributor Author

@andrewazores looks good! Thanks for cleaning that up

@andrewazores
Copy link
Member

Okay, after all that and the following patch:

diff --git a/smoketest.sh b/smoketest.sh
index 187c5a7e..a8774067 100755
--- a/smoketest.sh
+++ b/smoketest.sh
@@ -202,7 +202,7 @@ runDemoApps() {
     podman run \
         --name quarkus-test-agent-1 \
         --pod cryostat-pod \
-        --env JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Dcom.sun.management.jmxremote.port=9097 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -javaagent:/deployments/app/cryostat-agent.jar" \
+        --env JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Dcom.sun.management.jmxremote.port=9097 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -javaagent:/deployments/app/cryostat-agent.jar=[ThreadCount>0]~profile" \
         --env QUARKUS_HTTP_PORT=10010 \
         --env ORG_ACME_CRYOSTATSERVICE_ENABLED="false" \
         --env CRYOSTAT_AGENT_APP_NAME="quarkus-test-agent-1" \
@@ -221,7 +221,7 @@ runDemoApps() {
     podman run \
         --name quarkus-test-agent-2 \
         --pod cryostat-pod \
-        --env JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/deployments/app/cryostat-agent.jar" \
+        --env JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/deployments/app/cryostat-agent.jar=[ThreadCount>0]~default.jfc" \
         --env QUARKUS_HTTP_PORT=10011 \
         --env ORG_ACME_CRYOSTATSERVICE_ENABLED="false" \
         --env CRYOSTAT_AGENT_APP_NAME="quarkus-test-agent-2" \

and a mvn install of the -agent and local rebuild of quarkus-test;

image

I haven't tried out a smart trigger with a sustained duration yet, but at least this basic case seems to be working well.

@andrewazores
Copy link
Member

I haven't had much luck:

diff --git a/smoketest.sh b/smoketest.sh
index 187c5a7e..6d075a1d 100755
--- a/smoketest.sh
+++ b/smoketest.sh
@@ -202,7 +202,7 @@ runDemoApps() {
     podman run \
         --name quarkus-test-agent-1 \
         --pod cryostat-pod \
-        --env JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Dcom.sun.management.jmxremote.port=9097 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -javaagent:/deployments/app/cryostat-agent.jar" \
+        --env JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Dcom.sun.management.jmxremote.port=9097 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -javaagent:/deployments/app/cryostat-agent.jar=[ThreadCount>0]~profile,[ProcessCpuLoad<100.0&&TargetDuration>duration(\"10s\")]~default" \
         --env QUARKUS_HTTP_PORT=10010 \
         --env ORG_ACME_CRYOSTATSERVICE_ENABLED="false" \
         --env CRYOSTAT_AGENT_APP_NAME="quarkus-test-agent-1" \
@@ -221,7 +221,7 @@ runDemoApps() {
     podman run \
         --name quarkus-test-agent-2 \
         --pod cryostat-pod \
-        --env JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/deployments/app/cryostat-agent.jar" \
+        --env JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/deployments/app/cryostat-agent.jar=[ThreadCount>0]~default.jfc" \
         --env QUARKUS_HTTP_PORT=10011 \
         --env ORG_ACME_CRYOSTATSERVICE_ENABLED="false" \
         --env CRYOSTAT_AGENT_APP_NAME="quarkus-test-agent-2" \

The first condition for ThreadCount>0 triggers essentially immediately as expected, but the second condition for the CPU load over a duration never seems to trigger. I do see that it gets correctly parsed and registered at least.

@andrewazores
Copy link
Member

diff --git a/smoketest.sh b/smoketest.sh
index 187c5a7e..a48adbd4 100755
--- a/smoketest.sh
+++ b/smoketest.sh
@@ -202,7 +202,7 @@ runDemoApps() {
     podman run \
         --name quarkus-test-agent-1 \
         --pod cryostat-pod \
-        --env JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Dcom.sun.management.jmxremote.port=9097 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -javaagent:/deployments/app/cryostat-agent.jar" \
+        --env JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Dcom.sun.management.jmxremote.port=9097 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -javaagent:/deployments/app/cryostat-agent.jar=[ThreadCount>0]~default.jfc" \
         --env QUARKUS_HTTP_PORT=10010 \
         --env ORG_ACME_CRYOSTATSERVICE_ENABLED="false" \
         --env CRYOSTAT_AGENT_APP_NAME="quarkus-test-agent-1" \
@@ -221,7 +221,7 @@ runDemoApps() {
     podman run \
         --name quarkus-test-agent-2 \
         --pod cryostat-pod \
-        --env JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/deployments/app/cryostat-agent.jar" \
+        --env JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/deployments/app/cryostat-agent.jar=[ThreadCount>0&&TargetDuration>duration(\"30s\")]~Profiling" \
         --env QUARKUS_HTTP_PORT=10011 \
         --env ORG_ACME_CRYOSTATSERVICE_ENABLED="false" \
         --env CRYOSTAT_AGENT_APP_NAME="quarkus-test-agent-2" \
@@ -234,6 +234,7 @@ runDemoApps() {
         --env CRYOSTAT_AGENT_TRUST_ALL="true" \
         --env CRYOSTAT_AGENT_AUTHORIZATION="Basic $(echo user:pass | base64)" \
         --env CRYOSTAT_AGENT_API_WRITES_ENABLED="true" \
+        --env CRYOSTAT_AGENT_SMART_TRIGGER_EVALUATION_PERIOD_MS=10000 \
         --rm -d quay.io/andrewazores/quarkus-test:latest
 
     # copy a jboss-client.jar into /clientlib first

Alright, with the latest changes and this patch to smoketest.sh, both my example triggers are working as desired :-) @Josh-Matsuoka could you confirm that and help me try out some other variants of trigger expressions and see if we can catch any other corner cases?

@Josh-Matsuoka
Copy link
Contributor Author

Confirmed it's looking good :) I'll try combining a few different constraints and see if anything breaks

@andrewazores
Copy link
Member

andrewazores commented Oct 2, 2023

Last couple of commits fix duration constraints like TargetDuration>=duration("30s") and TargetDuration>duration("2m"), and also remove the rawCondition.substring() bit in favour of just using more regex grouping.

@Josh-Matsuoka
Copy link
Contributor Author

Works nicely for me, I've tried a couple of different configurations and combinations of constraints and haven't had anything break yet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat New feature or request safe-to-test
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants