Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(agent): two-way communications #1608

Merged
merged 14 commits into from
Sep 19, 2023

Conversation

andrewazores
Copy link
Member

@andrewazores andrewazores commented Aug 8, 2023

Welcome to Cryostat! 👋

Before contributing, make sure you have:

  • Read the contributing guidelines
  • Linked a relevant issue which this PR resolves
  • Linked any other relevant issues, PR's, or documentation, if any
  • Resolved all conflicts, if any
  • Rebased your branch PR on top of the latest upstream main branch
  • Attached at least one of the following labels to the PR: [chore, ci, docs, feat, fix, test]
  • Signed all commits using a GPG signature

To recreate commits with GPG signature git fetch upstream && git rebase --force --gpg-sign upstream/main


Fixes: #1578
Depends on cryostatio/cryostat-agent#175

Description of the change:

This change adds allows a match expression example to be copied to the clipboard...

Motivation for the change:

This change is helpful because users may want to copy the example for easier use...

How to manually test:

  1. Run CRYOSTAT_IMAGE=quay.io... sh smoketest.sh...
  2. ...

@andrewazores andrewazores linked an issue Aug 8, 2023 that may be closed by this pull request
5 tasks
@andrewazores andrewazores marked this pull request as draft August 8, 2023 14:15
@github-actions github-actions bot added the needs-triage Needs thorough attention from code reviewers label Aug 8, 2023
@mergify mergify bot added the safe-to-test label Aug 8, 2023
@andrewazores andrewazores added feat New feature or request and removed needs-triage Needs thorough attention from code reviewers labels Aug 8, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Aug 8, 2023

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1608-18e46b6197a68e9ab0d26a0ea3a295a81e47535e-linux-amd64 sh smoketest.sh

@github-actions
Copy link
Contributor

github-actions bot commented Aug 8, 2023

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1608-18e46b6197a68e9ab0d26a0ea3a295a81e47535e-linux-arm64 sh smoketest.sh

@github-actions
Copy link
Contributor

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1608-f1ed1c2578dc596a27d877afc581ac9fdcd24a7b-linux-arm64 sh smoketest.sh

@github-actions
Copy link
Contributor

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1608-f1ed1c2578dc596a27d877afc581ac9fdcd24a7b-linux-amd64 sh smoketest.sh

@github-actions
Copy link
Contributor

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1608-5b947e0fbedac416de9b57b9bc6fe4d4aff5ddfa-linux-arm64 sh smoketest.sh

@github-actions
Copy link
Contributor

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1608-5b947e0fbedac416de9b57b9bc6fe4d4aff5ddfa-linux-amd64 sh smoketest.sh

@andrewazores andrewazores force-pushed the 1578-epic-two-way-agent-communications branch from 5b947e0 to 01e7aea Compare August 14, 2023 18:18
@github-actions
Copy link
Contributor

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1608-01e7aeae6fbf45d98efe7412f108f1236d5d34ed-linux-arm64 sh smoketest.sh

@github-actions
Copy link
Contributor

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1608-01e7aeae6fbf45d98efe7412f108f1236d5d34ed-linux-amd64 sh smoketest.sh

@andrewazores andrewazores force-pushed the 1578-epic-two-way-agent-communications branch from 01e7aea to f80f01d Compare August 14, 2023 18:35
@github-actions
Copy link
Contributor

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1608-f80f01da356523d0f023c7bda0a88ec2f2ca2d3c-linux-arm64 sh smoketest.sh

@github-actions
Copy link
Contributor

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1608-f80f01da356523d0f023c7bda0a88ec2f2ca2d3c-linux-amd64 sh smoketest.sh

@andrewazores andrewazores force-pushed the 1578-epic-two-way-agent-communications branch from f80f01d to 3fed4a0 Compare August 14, 2023 19:09
@github-actions
Copy link
Contributor

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1608-3fed4a037e6b7c803c03dfa3eebdced408680c79-linux-arm64 sh smoketest.sh

@github-actions
Copy link
Contributor

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1608-3fed4a037e6b7c803c03dfa3eebdced408680c79-linux-amd64 sh smoketest.sh

mergify[bot]
mergify bot previously requested changes Sep 13, 2023
Copy link

@mergify mergify bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request blocked. web-client submodule updates are performed automatically by CI when that repository is updated. Please revert or drop all changes to the web-client submodule from this PR and perform any required frontend work by opening and merging a PR against cryostat-web.

@andrewazores andrewazores force-pushed the 1578-epic-two-way-agent-communications branch from 04ff2f7 to c7755d3 Compare September 13, 2023 16:43
@andrewazores andrewazores marked this pull request as ready for review September 13, 2023 16:45
@andrewazores
Copy link
Member Author

/build_test

@andrewazores
Copy link
Member Author

@aali309 aali309 self-requested a review September 15, 2023 13:42
@andrewazores andrewazores force-pushed the 1578-epic-two-way-agent-communications branch 3 times, most recently from 0575b89 to 2e0dee0 Compare September 18, 2023 18:52
@andrewazores
Copy link
Member Author

cryostatio/cryostat-agent#175 (comment)

There seems to be a bug with stopping recordings on the quarkus-test-agent-1. That's a JMX-based target that uses the Agent for discovery, and the recording being stopped is one that is created by the Agent's Harvester: https://github.com/cryostatio/cryostat/blob/5f578f93c9ac3bf9dd25f5e7072f24fc7306184d/smoketest.sh#L205

@andrewazores
Copy link
Member Author

INFO: Outgoing WS message: {"meta":{"category":"ActiveRecordingStopped","type":{"type":"application","subType":"json"},"serverTime":1695129660},"message":{"target":"service:jmx:rmi:///jndi/rmi://cryostat:9097/jmxrmi","recording":{"downloadUrl":"https://localhost:8181/api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2Fcryostat:9097%2Fjmxrmi/recordings/cryostat-agent","reportUrl":"https://localhost:8181/api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2Fcryostat:9097%2Fjmxrmi/reports/cryostat-agent","metadata":{"labels":{}},"archiveOnStop":false,"id":1,"name":"cryostat-agent","state":"STOPPED","startTime":1695129573590,"duration":0,"continuous":true,"toDisk":true,"maxSize":0,"maxAge":60000}}}
Sep 19, 2023 1:21:00 PM io.cryostat.core.log.Logger error
SEVERE: HTTP 500: No value present
io.vertx.ext.web.handler.HttpException: Internal Server Error
Caused by: java.util.NoSuchElementException: No value present
	at java.base/java.util.Optional.get(Optional.java:143)
	at io.cryostat.recordings.RecordingTargetHelper.lambda$stopRecording$6(RecordingTargetHelper.java:300)
	at io.cryostat.net.TargetConnectionManager.executeConnectedTask(TargetConnectionManager.java:151)
	at io.cryostat.recordings.RecordingTargetHelper.stopRecording(RecordingTargetHelper.java:279)
	at io.cryostat.recordings.RecordingTargetHelper.stopRecording(RecordingTargetHelper.java:272)
	at io.cryostat.net.web.http.api.v1.TargetRecordingPatchStop.handle(TargetRecordingPatchStop.java:40)
	at io.cryostat.net.web.http.api.v1.TargetRecordingPatchHandler.handleAuthenticated(TargetRecordingPatchHandler.java:105)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:80)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:50)
	at io.vertx.ext.web.impl.BlockingHandlerDecorator.lambda$handle$0(BlockingHandlerDecorator.java:48)
	at io.vertx.core.impl.ContextBase.lambda$null$0(ContextBase.java:137)
	at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:264)
	at io.vertx.core.impl.ContextBase.lambda$executeBlocking$1(ContextBase.java:135)
	at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)

The stoppage notification is emitted, but after navigating away and back to the Recordings page and triggering a re-query of data, the recording state is actually still running. That matches what would be expected from the stack trace. This behaviour is also present on main.

@andrewazores
Copy link
Member Author

Agent-side logs:

2023-09-19 13:29:59,648 INFO  [io.cry.age.Harvester] (RMI TCP Connection(12)-10.0.2.100) cryostat-agent(3) STOPPED
2023-09-19 13:29:59,649 INFO  [io.cry.age.Harvester] (RMI TCP Connection(12)-10.0.2.100) cryostat-agent(3) CLOSED
2023-09-19 13:29:59,838 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-0) POST 200 (quarkus-test-agent-1_default_20230919T132959Z.jfr -> https://localhost:8181/api/beta/recordings/KnY4JVq_p8hVVoh-GsmOAarLxC6bwdSX9P2qDlU0520=): 392 KB/PT0.188978532S
2023-09-19 13:29:59,851 INFO  [io.cry.age.Harvester] (cryostat-agent-harvester) cryostat-agent(4) RUNNING
2023-09-19 13:30:03,031 INFO  [io.cry.age.CryostatClient] (cryostat-agent-registration) GET https://localhost:8181/api/v2.2/discovery/d5535a73-4774-40f4-be79-4721ef5787f2?token=eyJjdHkiOiJKV1QiLCJlbmMiOiJBMjU2R0NNIiwiYWxnIjoiZGlyIn0..ZupKK7gWmaotvvF_.EuQ2znPibxbWpN2a9NQjTIR3De8PPlglQ0aDz9A97MJnFDKWAM1-UQ2ohLPZxiiE5we7DOloGk9rTIsSdGgQkn_RHSkQQzLDCjk1ZJaHQa566LoNA4ZlpmRZspYq6WFvaY3BFobeytzgmAiFTupahelVap-yM0g3CWvKQNTNgKWOS5g-bUMZVNZHXBFnlOLgBb8ZV_HKO57IyhdwIR_nl8QSUJ2LykBtGOiQvZVat0sHO00NiIB__D-w1v1IXo4BpLHPPrAj14o3E_9B-ZV6AOv35HNmUg_ftYnZmhtFO0oBh3UxAov1gAx2K_Xt5mIzdL2Pf6NmXsNNno6BC_YF17oRdOb-vZke3_zFhqmZWIiBWRQqGGRXFyYBVXW45YL7tdVAMh-kprt5V91_9Ok0OBf66pdsTye9NBn8lu72klKc6seAoVOyAmL_wRlftOr_9e3ei_QYA4XL8aXsMU2UWdwn2KWojwkx6dWGJES4BpV55EqHonok4b7oiqmYonD43weHv3qKzWoCzyn0dBSJS40p4FkP7NZWDJS1d-EOUZvH5wYa1pIJ-U6FYkQ4AB37n6Uqrb4d154u.oPDrIfTyxrzX_jenTxZkrQ HTTP/1.1
2023-09-19 13:30:03,054 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-2) GET https://localhost:8181/api/v2.2/discovery/d5535a73-4774-40f4-be79-4721ef5787f2?token=eyJjdHkiOiJKV1QiLCJlbmMiOiJBMjU2R0NNIiwiYWxnIjoiZGlyIn0..ZupKK7gWmaotvvF_.EuQ2znPibxbWpN2a9NQjTIR3De8PPlglQ0aDz9A97MJnFDKWAM1-UQ2ohLPZxiiE5we7DOloGk9rTIsSdGgQkn_RHSkQQzLDCjk1ZJaHQa566LoNA4ZlpmRZspYq6WFvaY3BFobeytzgmAiFTupahelVap-yM0g3CWvKQNTNgKWOS5g-bUMZVNZHXBFnlOLgBb8ZV_HKO57IyhdwIR_nl8QSUJ2LykBtGOiQvZVat0sHO00NiIB__D-w1v1IXo4BpLHPPrAj14o3E_9B-ZV6AOv35HNmUg_ftYnZmhtFO0oBh3UxAov1gAx2K_Xt5mIzdL2Pf6NmXsNNno6BC_YF17oRdOb-vZke3_zFhqmZWIiBWRQqGGRXFyYBVXW45YL7tdVAMh-kprt5V91_9Ok0OBf66pdsTye9NBn8lu72klKc6seAoVOyAmL_wRlftOr_9e3ei_QYA4XL8aXsMU2UWdwn2KWojwkx6dWGJES4BpV55EqHonok4b7oiqmYonD43weHv3qKzWoCzyn0dBSJS40p4FkP7NZWDJS1d-EOUZvH5wYa1pIJ-U6FYkQ4AB37n6Uqrb4d154u.oPDrIfTyxrzX_jenTxZkrQ : 200

It looks like what is happening is that the remote JMX connection invokes the stopRecording operation, which the target JVM accepts and performs. Immediately, the Agent Harvester code executing within the target JVM sees this updated condition and reacts:

https://github.com/cryostatio/cryostat-agent/blob/8db9b4cf177781090baec976d924fc87533f82ec/src/main/java/io/cryostat/agent/Harvester.java#L193

The Agent dumps the stopped recording contents to a file, closes the recording, uploads the file dump to the server, and then starts a new replacement recording with the same name and other settings according to the Harvester configuration. The server meanwhile, after sending the JMX stopRecording command:

                    Optional<IRecordingDescriptor> descriptor =
                            getDescriptorByName(connection, recordingName);
                    if (descriptor.isPresent()) {
                        IRecordingDescriptor d = descriptor.get();
                        if (d.getState().equals(RecordingState.STOPPED) && quiet) {
                            return d;
                        }
                        connection.getService().stop(d);
                        this.cancelScheduledTasksIfExists(targetId, recordingName);
                        HyperlinkedSerializableRecordingDescriptor linkedDesc =
                                new HyperlinkedSerializableRecordingDescriptor(
                                        d,
                                        webServer.get().getDownloadURL(connection, d.getName()),
                                        webServer.get().getReportURL(connection, d.getName()),
                                        RecordingState.STOPPED);
                        this.issueNotification(targetId, linkedDesc, STOP_NOTIFICATION_CATEGORY);
                        return getDescriptorByName(connection, recordingName).get();
                    } else {
                        throw new RecordingNotFoundException(targetId, recordingName);
                    }

does a little other work of its own, then tries to query the target for the updated state of the recording by the same name so that it can inform the original client of the latest state. The expectation should be that only the running -> stopped state has changed, but this re-query is done to ensure that we really report the latest state to the client. I believe the NoSuchElementException arises because this follow-up query happens to occur in the timing interval where the Agent has deleted the original recording and has not yet re-started its replacement, so there is actually no recording with the expected name in the target JVM anymore.

tl;dr this is a conflict between trying to use manual recording stop operations on the Agent Harvester-managed recording. I'm not sure if it's really a bug in that case but maybe there is something to be done to make this result a little more clear and explicit to the user. In the Agent HTTP connection case that is probably easier to do since we have more control. In this JMX case, I don't know what can be done really.

andrewazores and others added 14 commits September 19, 2023 10:45
* chore(svc): extract EventOptionsBuilder to -core and use new CryostatFlightRecorderService

* add unimplemented overrides

* test(smoketest): enable API writes on one agent-equipped sample app

* chore(serial): extract recording descriptor to -core

* chore(activerecordings): clean up an error handler

* feat(agent): implement dynamic start of JFR over HTTP

* bump -core version
* build(deps): bump dagger from 2.45 to 2.47

Bumps [dagger](https://github.com/google/dagger) from 2.45 to 2.47.
- [Release notes](https://github.com/google/dagger/releases)
- [Changelog](https://github.com/google/dagger/blob/master/CHANGELOG.md)
- [Commits](google/dagger@dagger-2.45...dagger-2.47)

---
updated-dependencies:
- dependency-name: com.google.dagger:dagger
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* use single version for dagger artifacts

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Azores <[email protected]>
* build(pom): restore dagger compiler property

Signed-off-by: Elliott Baron <[email protected]>

* Use dagger.version for property

---------

Signed-off-by: Elliott Baron <[email protected]>
@andrewazores andrewazores force-pushed the 1578-epic-two-way-agent-communications branch from 2e0dee0 to 9e2ba17 Compare September 19, 2023 14:45
@github-actions
Copy link
Contributor

This PR/issue depends on:

@andrewazores andrewazores merged commit 3b46806 into main Sep 19, 2023
7 checks passed
@andrewazores andrewazores deleted the 1578-epic-two-way-agent-communications branch September 19, 2023 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat New feature or request safe-to-test
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[Epic] Two-way Agent communications
4 participants