Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

One tool #429

Closed
wants to merge 24 commits into from
Closed

One tool #429

wants to merge 24 commits into from

Conversation

atheurer
Copy link
Contributor

@atheurer atheurer commented Nov 27, 2023

This PR will have multiple commits for different areas of work:

  1. Enabling remotehost endpoint to launch one tool per engine,
    plus any changes needed in rickshaw-*:
    • Add communication to endpoiont-deploy endpoint to receive messages about new roadblock followers from more than 1 endpint
  2. Enabling k8s endpoint to launch one tool per engine
  3. Enabling openstack endpoint to lunch one tool per engine
  4. Having rickshaw-run source/build an image per tool for the run
  5. Having all endpoints use the specific image for each tool they launch

Copy link
Contributor

@k-rister k-rister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does this completely eliminate the concept of the user creating a profiler engine themselves? I see a couple places where you removed it from a set of roles that are being checked for so I'm guessing it does. If so, we are probably going to have CI issues with this that I will need to resolve since we explicitly test profiler engine creation.

Also, I think I identified a couple of local changes that slipped in that we don't actually want to merge upstream.

endpoints/remotehost/remotehost Outdated Show resolved Hide resolved
rickshaw-run Outdated Show resolved Hide resolved
rickshaw-run Outdated Show resolved Hide resolved
rickshaw-run Show resolved Hide resolved
@atheurer
Copy link
Contributor Author

So does this completely eliminate the concept of the user creating a profiler engine themselves? I see a couple places where you removed it from a set of roles that are being checked for so I'm guessing it does. If so, we are probably going to have CI issues with this that I will need to resolve since we explicitly test profiler engine creation.

Also, I think I identified a couple of local changes that slipped in that we don't actually want to merge upstream.

I must be inadvertently removing something I did not understand had a purpose. Is there a specific example of a crucible run that has a user creating a profiler engine? I can then use that to make sure it works here.

@k-rister
Copy link
Contributor

I must be inadvertently removing something I did not understand had a purpose. Is there a specific example of a crucible run that has a user creating a profiler engine? I can then use that to make sure it works here.

https://github.com/perftool-incubator/crucible-ci/blob/main/.github/actions/integration-tests/run-ci-stage1#L388

I think the intended purpose of this functionality was to be able to launch profilers on something like a KVM host, storage server, etc. where you wanted to collect data but weren't actually running a workload.

@atheurer
Copy link
Contributor Author

I must be inadvertently removing something I did not understand had a purpose. Is there a specific example of a crucible run that has a user creating a profiler engine? I can then use that to make sure it works here.

https://github.com/perftool-incubator/crucible-ci/blob/main/.github/actions/integration-tests/run-ci-stage1#L388

I think the intended purpose of this functionality was to be able to launch profilers on something like a KVM host, storage server, etc. where you wanted to collect data but weren't actually running a workload.

Ahh, OK, I misunderstood your question. I thought you were referring to some kind of user-provided profiler, which we of course don't have [yet], and so it did not make sense to me. I'll make sure we can still include remotehost endpoints which are only profilers. I'll test that next.

- engine naming needs to be resolved to allow multiple remotehost
  endpoints
- Engines which run tools need to have a more specific label, one that
  includes th endpoint-label, to avoid duplicate labels like
  "profiler-1"
- Logic for adding RB followers was a bit broken, only looking at the
  first RB message with new followers.
- There is still work to do to avoid duplicate tool collection when
  more than 1 remotehost endpoint use the sme host (one could argue to
  not do this, but I'd like it to just avoid duplicte tools in case
  someone does).
-ensure tool_name gets into ES
-void duplicate tool collection on remotehost endpoints that have same
host
-disable debug logging
-reducing some code duplication
-documenting globals in each function
-cpu-part needs to be fixed still
-other minor cleanups
-it's possible that a user used a remotehost endpoint as a "profiler"
only, but then the tool-params.json was empty, resulting in an endpoint
that still runs but doe snot launch engines.  A small change to not
create env-vars.json is needed.
@atheurer atheurer marked this pull request as ready for review January 31, 2024 21:35
@k-rister k-rister linked an issue Jan 31, 2024 that may be closed by this pull request
-also fix call in k8s
- I think there's some conflict with by-ref var name and local vars in
  the function, which if true is disappointing
@atheurer atheurer requested review from k-rister and removed request for k-rister February 3, 2024 02:50
Copy link
Contributor

@k-rister k-rister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass review. I plan on running some tests with it myself just to better understand it, but I'll get to that later.

endpoints/base Outdated Show resolved Hide resolved
endpoints/base Outdated Show resolved Hide resolved
endpoints/base Show resolved Hide resolved
endpoints/base Show resolved Hide resolved
endpoints/base Show resolved Hide resolved
endpoints/remotehost/remotehost Outdated Show resolved Hide resolved
engine/engine-script-library Outdated Show resolved Hide resolved
rickshaw-post-process-bench Show resolved Hide resolved
rickshaw-run Outdated Show resolved Hide resolved
rickshaw-run Outdated Show resolved Hide resolved
rickshaw-settings.json Outdated Show resolved Hide resolved
@atheurer atheurer closed this Feb 6, 2024
@atheurer atheurer deleted the one-tool branch February 6, 2024 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move to a one tool per engine execution model
2 participants