One tool #429

atheurer · 2023-11-27T22:09:35Z

This PR will have multiple commits for different areas of work:

- Enabling the remotehost endpoint to launch one tool per engine, plus any changes needed in rickshaw-*:
  - Add communication to the endpoint-deploy endpoint to receive messages about new roadblock followers from more than one endpoint
- Enabling the k8s endpoint to launch one tool per engine
- Enabling the openstack endpoint to launch one tool per engine
- Having rickshaw-run source/build an image per tool for the run
- Having all endpoints use the specific image for each tool they launch

k-rister

So does this completely eliminate the concept of the user creating a profiler engine themselves? I see a couple places where you removed it from a set of roles that are being checked for so I'm guessing it does. If so, we are probably going to have CI issues with this that I will need to resolve since we explicitly test profiler engine creation.

Also, I think I identified a couple of local changes that slipped in that we don't actually want to merge upstream.

endpoints/remotehost/remotehost

rickshaw-run

atheurer · 2023-11-28T15:25:29Z

> So does this completely eliminate the concept of the user creating a profiler engine themselves? I see a couple places where you removed it from a set of roles that are being checked for so I'm guessing it does. If so, we are probably going to have CI issues with this that I will need to resolve since we explicitly test profiler engine creation.
>
> Also, I think I identified a couple of local changes that slipped in that we don't actually want to merge upstream.

I must be inadvertently removing something I did not understand had a purpose. Is there a specific example of a crucible run that has a user creating a profiler engine? I can then use that to make sure it works here.

k-rister · 2023-11-28T15:36:20Z

> I must be inadvertently removing something I did not understand had a purpose. Is there a specific example of a crucible run that has a user creating a profiler engine? I can then use that to make sure it works here.

https://github.com/perftool-incubator/crucible-ci/blob/main/.github/actions/integration-tests/run-ci-stage1#L388

I think the intended purpose of this functionality was to be able to launch profilers on something like a KVM host, storage server, etc. where you wanted to collect data but weren't actually running a workload.

atheurer · 2023-11-29T17:44:04Z

> > I must be inadvertently removing something I did not understand had a purpose. Is there a specific example of a crucible run that has a user creating a profiler engine? I can then use that to make sure it works here.
>
> https://github.com/perftool-incubator/crucible-ci/blob/main/.github/actions/integration-tests/run-ci-stage1#L388
>
> I think the intended purpose of this functionality was to be able to launch profilers on something like a KVM host, storage server, etc. where you wanted to collect data but weren't actually running a workload.

Ahh, OK, I misunderstood your question. I thought you were referring to some kind of user-provided profiler, which we of course don't have [yet], and so it did not make sense to me. I'll make sure we can still include remotehost endpoints which are only profilers. I'll test that next.

- Engines which run tools need a more specific label, one that includes the endpoint-label, to avoid duplicate labels like "profiler-1".
- Logic for adding RB followers was a bit broken: it only looked at the first RB message with new followers.
- There is still work to do to avoid duplicate tool collection when more than one remotehost endpoint uses the same host (one could argue to not do this, but I'd like it to just avoid duplicate tools in case someone does).
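
To make the labeling and de-duplication points above concrete, here is a minimal Python sketch. It is not the code in this PR: the function name, the dict keys (`label`, `host`, `tools`), and the `profiler-<endpoint-label>-<n>` label format are all assumptions chosen for illustration, and the tool names are just sample values.

```python
from collections import defaultdict


def build_tool_engines(endpoints):
    """Return one engine entry per (host, tool) pair.

    `endpoints` is a list of dicts with hypothetical keys:
    "label" (e.g. "remotehost-1"), "host", and "tools" (list of tool names).
    """
    seen = set()                 # (host, tool) pairs that already have an engine
    counters = defaultdict(int)  # per-endpoint engine counter
    engines = []
    for ep in endpoints:
        for tool in ep["tools"]:
            key = (ep["host"], tool)
            if key in seen:
                # Another endpoint already collects this tool on this host.
                continue
            seen.add(key)
            counters[ep["label"]] += 1
            engines.append({
                # Including the endpoint label keeps engine labels unique
                # across endpoints (assumed format, not the real one).
                "label": f"profiler-{ep['label']}-{counters[ep['label']]}",
                "host": ep["host"],
                "tool": tool,
            })
    return engines


if __name__ == "__main__":
    sample = [
        {"label": "remotehost-1", "host": "host-a", "tools": ["sysstat", "procstat"]},
        {"label": "remotehost-2", "host": "host-a", "tools": ["sysstat"]},
        {"label": "remotehost-3", "host": "host-b", "tools": ["sysstat"]},
    ]
    for engine in build_tool_engines(sample):
        print(engine["label"], engine["host"], engine["tool"])
```

In this sketch the second endpoint shares `host-a` with the first, so its `sysstat` request is skipped rather than producing a second collector on the same host.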

k-rister

First pass review. I plan on running some tests with it myself just to better understand it, but I'll get to that later.

endpoints/base

endpoints/remotehost/remotehost

engine/engine-script-library

rickshaw-post-process-bench

rickshaw-run

rickshaw-settings.json

atheurer force-pushed the one-tool branch from 6bfface to 5bceab2 on November 27, 2023 22:10

atheurer requested a review from k-rister November 27, 2023 22:11

atheurer marked this pull request as draft November 27, 2023 22:11

atheurer added the DO NOT MERGE label Nov 27, 2023

k-rister requested changes Nov 27, 2023

atheurer force-pushed the one-tool branch from 842bc85 to 2837181 on January 9, 2024 02:01

atheurer removed the DO NOT MERGE label Jan 31, 2024

atheurer added 13 commits January 31, 2024 16:05

Run one tool per engine for remotehost

65a7e80

- engine naming needs to be resolved to allow multiple remotehost endpoints

Fixes

ddacc3d

- ensure tool_name gets into ES
- avoid duplicate tool collection on remotehost endpoints that have the same host
- disable debug logging

major cleanup of remotehost

40b6519

- reducing some code duplication
- documenting globals in each function
- cpu-part needs to be fixed still

add cpu-part back and fix chroot

dec817f

-other minor cleanups

more fixes for cpu-part

d5697da

more fixes for one-tool

3484d16

specify proper image for tools

ac9a1ce

debug

c9c1634

missed cleanup

b9e9e0c

support an endpoint that has no engines

381feb5

- it's possible that a user used a remotehost endpoint as a "profiler" only, but then the tool-params.json was empty, resulting in an endpoint that still runs but does not launch engines. A small change to not create env-vars.json is needed.

1toolPerEngine support for osp endpoint

c44c8a9

cleanup

b7bd349

atheurer force-pushed the one-tool branch from 8179beb to b7bd349 on January 31, 2024 21:29

atheurer marked this pull request as ready for review January 31, 2024 21:35

k-rister linked an issue Jan 31, 2024 that may be closed by this pull request

Move to a one tool per engine execution model #288

Closed

atheurer requested a review from k-rister January 31, 2024 22:06

atheurer added 3 commits January 31, 2024 19:38

rename get_image

1766b9f

-also fix call in k8s

fix

a09fdd6

fix

6653f2d

- I think there's some conflict with by-ref var name and local vars in the function, which if true is disappointing

atheurer added 5 commits February 1, 2024 19:43

more support for one-tool for k8s

d310ba0

fix

4798fbb

fix

16c4fbe

fixes to pod spec

a81a9af

fixes to add_profiler_engines

da57fed

atheurer requested review from k-rister and removed request for k-rister February 3, 2024 02:50

k-rister reviewed Feb 4, 2024

fixes based on feedback

3897a99

k-rister reviewed Feb 5, 2024

rickshaw-settings.json

atheurer added 2 commits February 5, 2024 14:51

separate bench and tool default userenvs

adc8605

more fixes based on feedback

25e76b2

atheurer requested a review from k-rister February 6, 2024 13:39

atheurer closed this Feb 6, 2024

atheurer deleted the one-tool branch February 6, 2024 14:23
