Policy automations: run script #17129

Closed
51 of 54 tasks
dherder opened this issue Feb 23, 2024 · 64 comments
Labels
~apple-mdm-maturity Contributes to maturity in macOS, iOS, or iPadOS MDM product category. ~csa Issue was created by or deemed important by the Customer Solutions Architect. customer-cisneros customer-deebradel customer-easterwood customer-flacourtia customer-flavia customer-knopfia customer-mozartia customer-numa customer-pingali customer-reedtimmer customer-rosner customer-schur ~dogfood Issue resulted from Fleet's product dogfooding. #g-endpoint-ops Endpoint ops product group P2 Prioritize as urgent :product Product Design department (shows up on 🦢 Drafting board) prospect-brashear prospect-cloutier prospect-konrad prospect-oaxaca prospect-pingouin prospect-rembrandt prospect-themis ~sc Request is a requirement in a presales opportunity story A user story defining an entire feature

@dherder
Contributor

dherder commented Feb 23, 2024

Goal

User story
As a Fleet user,
I want a policy failure in Fleet to trigger a script run on a host
so that I can run scripts on many hosts w/o having to use a third-party automation tool (ex. Tines).

Similar to "Policy automations: install software" (#19551), except that here a failing policy triggers a script run.

Context

Changes

Product

Engineering

  • Database schema migrations: TODO
  • Load testing: TODO

ℹ️  Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

QA

Risk assessment

  • Requires load testing: TODO
  • Risk level: Low / High TODO
  • Risk description: TODO

Manual testing steps

Migration

  • These automation workflows work when starting from a script and policy created in 4.57.x or earlier

Regression avoidance

  • Manual script execution works
  • Manual script execution errors when the same script is already queued
  • Software install automation works (@jacobshandling and I QA'd this one for 4.57 if you need pointers), on both team and no-team

UI

  • Script automation is available for teams, including No Team
  • Script automation is not available for global policies
  • Script automation dialog allows adding/changing/removing scripts from team-specific policies (global-inherited policies should not be shown)
  • Scripts error on deletion attempt if they are associated with a policy, with useful error text
  • Scripts can be deleted if they are removed from a policy automation
  • Adding or changing a script automation for a policy clears that policy's stats/host statuses
  • Removing a script automation for a policy does not clear that policy's stats/host statuses
  • Changing a policy's name does not clear that policy's stats/host statuses
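The stats-reset rules in the last few UI checks boil down to a small predicate. This is a hypothetical sketch, not Fleet's implementation: it assumes a policy's script automation is identified by a script ID (`None` when there is none), and `should_reset_policy_stats` is an illustrative name. A rename never reaches this check, which matches "changing a policy's name does not clear stats".

```python
from typing import Optional


def should_reset_policy_stats(old_script_id: Optional[int],
                              new_script_id: Optional[int]) -> bool:
    """Hypothetical rule: does saving this automation change clear the policy's stats?"""
    if new_script_id is None:
        # Removing an automation (or never having one) keeps existing results.
        return False
    # Adding (None -> id) or changing (id -> other id) resets;
    # re-saving the same script does not.
    return old_script_id != new_script_id
```

Keying on the script rather than on the policy as a whole is what lets a rename or a removal leave host statuses intact.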

Policy automation execution

* Known issue: No author on upcoming/past script run activity (fix incoming, pending product confirmation)

  • PowerShell scripts work on Windows
  • shell scripts work on macOS
  • zsh scripts work
  • shell scripts work on Linux
  • Pending activity visible for script run once queued
  • Script run activity shows in Past once executed
  • Manual script run fails when a policy failure has queued the same script
No-ops
  • Passing policies
  • Policies not assigned to the host's platform, even if the script could run (e.g. policy for macOS that would run a shell script, but host is Linux)
  • Policies with no script automation
  • Identical policy on another team has a script automation, but this team's policy version doesn't
  • Policies failing -> failing
  • Vanilla osquery
  • Host scripts are not enabled (if we don't have up-to-date information server-side, this may be an attempted run, followed by an exit code of -2)
  • Scripts are globally disabled
  • Too many (1k+) pending scripts (can test this with an offline host)
  • Same script is already pending for this host
  • Host is Windows and script is a shell script
  • Host is not-Windows and script is a PowerShell script
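The no-op cases above amount to a chain of guard checks that run before a script is queued. A minimal sketch follows. Every name in it (`Host`, `Policy`, `should_queue_script`, `MAX_PENDING_SCRIPTS`) is hypothetical, platforms are simplified to `darwin`/`windows`/`linux`, a policy with no previous result is treated as newly failing, and the script type is inferred from the `.ps1` extension. These are assumptions, not necessarily how Fleet decides.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical cap mirroring the "too many (1k+) pending scripts" check.
MAX_PENDING_SCRIPTS = 1000


@dataclass
class Host:
    platform: str                      # simplified: "darwin", "windows", or "linux"
    has_fleetd: bool                   # False for vanilla osquery hosts
    scripts_enabled: bool              # host-level setting as last known server-side
    pending_script_ids: Set[int] = field(default_factory=set)


@dataclass
class Policy:
    platforms: Set[str]                # empty set means "all platforms" (assumption)
    script_id: Optional[int]           # this team policy's own script automation
    previous_passed: Optional[bool]    # last result; None if never reported


def script_matches_platform(host_platform: str, script_name: str) -> bool:
    # PowerShell only on Windows; shell scripts only on other platforms.
    is_powershell = script_name.lower().endswith(".ps1")
    return is_powershell == (host_platform == "windows")


def should_queue_script(host: Host, policy: Policy, passed: bool,
                        script_name: str, scripts_globally_disabled: bool) -> bool:
    if passed:
        return False                   # passing policies are no-ops
    if policy.previous_passed is False:
        return False                   # failing -> failing is a no-op
    if policy.platforms and host.platform not in policy.platforms:
        return False                   # policy doesn't target this host's platform
    if policy.script_id is None:
        return False                   # no script automation on this team's policy
    if not host.has_fleetd or not host.scripts_enabled or scripts_globally_disabled:
        return False                   # host or deployment can't run scripts
    if len(host.pending_script_ids) >= MAX_PENDING_SCRIPTS:
        return False                   # too many scripts already queued
    if policy.script_id in host.pending_script_ids:
        return False                   # same script already pending for this host
    return script_matches_platform(host.platform, script_name)
```

Ordering the cheap policy-level checks first means most passing or unrelated policies exit before any host script state is consulted.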

GitOps

  • Known issue: Non-functional on No Team due to a path mismatch (so test on a different team; fix incoming as part of the GitOps script path fix)
  • Succeeds in setting up (confirm via UI) with correct YAML in the team file:
      controls:
        scripts:
          - path: ../path/to/script.sh
      policies:
        - # normal policy
          run_script:
            path: ../path/to/script.sh
  • Succeeds when policy is defined in its own file, in a directory at a different nesting level than the team file

Changing existing configuration

  • If policy automation is dropped from YAML, it's dropped on-apply to the server
  • If policy automation is dropped and script is dropped from YAML, application is successful (script is deleted, policy automation is removed, no fkey issues)
  • If script contents change but path does not, script is updated in-place but policy is not reset
  • If the script path changes (it must be changed in both controls and run_script), policy status/hosts are reset

Validation errors

  • Fails when attempted on global
  • Fails when script not found at path
  • Fails when script isn't also specified for the team
  • Fails on malformed YAML (e.g. missing value on path property)
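The validation cases above can be sketched as checks over an already-parsed team YAML (a plain `dict`). This is illustrative only: `validate_run_script` is a hypothetical name, YAML parsing itself is out of scope, and the sketch assumes every path resolves against a single `base_dir`, which ignores the case of a policy defined in its own file at a different nesting level.

```python
import posixpath
from typing import List, Set


def validate_run_script(team_config: dict, is_global: bool,
                        existing_files: Set[str], base_dir: str = ".") -> List[str]:
    """Return human-readable errors for run_script automations (hypothetical helper)."""

    def resolve(path: str) -> str:
        # Assumption: all paths are relative to one base directory.
        return posixpath.normpath(posixpath.join(base_dir, path))

    controls = team_config.get("controls") or {}
    team_scripts = {resolve(s["path"]) for s in controls.get("scripts") or []
                    if isinstance(s, dict) and s.get("path")}

    errors: List[str] = []
    for i, policy in enumerate(team_config.get("policies") or []):
        if "run_script" not in policy:
            continue                              # no script automation on this policy
        if is_global:
            errors.append(f"policy {i}: run_script is not supported for global policies")
            continue
        run_script = policy["run_script"]
        path = run_script.get("path") if isinstance(run_script, dict) else None
        if not path:
            errors.append(f"policy {i}: run_script.path is missing")
            continue
        resolved = resolve(path)
        if resolved not in existing_files:
            errors.append(f"policy {i}: script not found at {path}")
        elif resolved not in team_scripts:
            errors.append(f"policy {i}: {path} is not listed in controls.scripts")
    return errors
```

For example, with `base_dir="teams"` a `run_script` path of `../lib/fix.sh` resolves to `lib/fix.sh`, which must both exist and appear under the team's `controls.scripts`.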

Confirmation

  1. Engineer (@____): Added comment to user story confirming successful completion of QA.
  2. QA (@____): Added comment to user story confirming successful completion of QA.
@dherder dherder added :product Product Design department (shows up on 🦢 Drafting board) ~feature fest Will be reviewed at next Feature Fest customer-rosner labels Feb 23, 2024
@noahtalerman
Member

noahtalerman commented Feb 27, 2024

I would like to execute a script automatically when a policy fails instead of triggering a webhook.

@dherder we'll get to this but I think there's an iteration or two before we build it.

Currently, the customer can consume the failing policies webhook in Tines and execute a script using the Fleet API, right?

I think the first iteration will be sending one webhook per host that includes all of that host's failing policies. I think this simplifies the Tines story, which becomes:

  1. Receive new webhook that includes a specific host's failing policies
  2. Loop through policies and take remediation action specific to each failing policy (via script or some other tool)
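For illustration, the two-step Tines story could look like the handler below. The payload shape (`host.id`, `failing_policies[].id`), the policy-to-script mapping, and the `run_script` callback are all hypothetical; in a real flow the callback would call the Fleet API to run the script on the host.

```python
from typing import Callable, Dict, List


def handle_host_failing_policies(
    payload: dict,
    remediation_scripts: Dict[int, int],       # hypothetical: policy ID -> script ID
    run_script: Callable[[int, int], None],    # hypothetical: (host ID, script ID) -> None
) -> List[int]:
    """Step 1: receive one host's failing policies. Step 2: remediate each one."""
    host_id = payload["host"]["id"]
    queued: List[int] = []
    for policy in payload.get("failing_policies", []):
        script_id = remediation_scripts.get(policy["id"])
        if script_id is None:
            continue                            # no remediation configured for this policy
        run_script(host_id, script_id)
        queued.append(script_id)
    return queued
```

Passing `run_script` in as a callback keeps the loop testable without a live Fleet server.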

@noahtalerman noahtalerman removed the :product Product Design department (shows up on 🦢 Drafting board) label Feb 27, 2024
@dherder
Contributor Author

dherder commented Feb 29, 2024

@noahtalerman would also be good to get a Fleet desktop notification on failed policies similar to #16264

@noahtalerman
Member

would also be good to get a Fleet desktop notification on failed policies

@dherder the current plan is to solve the problem of notifying the end user by getting in their calendar: #17230

@dherder
Contributor Author

dherder commented Mar 7, 2024

@noahtalerman I see the calendar remediation as a separate issue. It works great when you want an end user to do something like update an app or perform an OS update. It doesn't work so well when the remediation is "execute a root-level script": if the user is a standard user, they simply wouldn't be able to do it.

@noahtalerman
Member

Where it doesn't work so great is if you want the remediation to be "execute a root level script", where if the user is a standard user, they just simply wouldn't be able to do it.

@dherder I think the first iteration of "Fleet in your calendar" will address this.

The high level flow of the feature:

  1. IT admin chooses which policies trigger calendar events
  2. Calendar event is created when end user fails at least one of these policies
  3. Webhook fires when the calendar event starts
  4. Automation tool (ex. Tines) receives the webhook and runs auto-remediation (ex. script)

Check out the user story for more details on the flow: #17230

What do you think?

Also, we didn't have room for this "Auto remediation of policy failure" story in the current design sprint (4.48).

@noahtalerman noahtalerman added prospect-konrad and removed ~feature fest Will be reviewed at next Feature Fest labels Mar 11, 2024
@dherder dherder added the ~feature fest Will be reviewed at next Feature Fest label Apr 1, 2024
@noahtalerman noahtalerman removed the ~feature fest Will be reviewed at next Feature Fest label Apr 19, 2024
@pintomi1989 pintomi1989 added the ~csa Issue was created by or deemed important by the Customer Solutions Architect. label Apr 23, 2024
@nonpunctual
Contributor

@noahtalerman it still does not solve the problem of requiring a third-party integration, which is a blocker for some of our current customers and especially for prospective customers.

The expectation is that if Fleet has the script server-side & Fleet has a policy to check for a client state or attribute, that it would also have a way of executing the script on a policy failure without 3rd party integration required.

Couldn't Fleet just send the policy failure webhook to its own API endpoint for executing a script? Is there a technical concern like load on server due to script execution? Thanks.

cc @dherder @willmayhone88 @spokanemac @ksatter @pacamaster

@nonpunctual nonpunctual added customer-flacourtia ~feature fest Will be reviewed at next Feature Fest labels May 2, 2024
@dherder
Contributor Author

dherder commented May 2, 2024

@noahtalerman I presented the option of remediation through third-party automation tools today (IT buying scenario), and the feedback was that it would be a blocker to moving forward with Fleet.

@nonpunctual nonpunctual changed the title Auto remediation of policy failure Auto remediation (script execution) on policy failure May 2, 2024
@noahtalerman
Copy link
Member

Couldn't Fleet just send the policy failure webhook to its own API endpoint for executing a script? Is there a technical concern like load on server due to script execution? Thanks.

@nonpunctual no technical concern that I know of. It's just a matter of priorities/timing. Let's chat about it at feature fest!

@dherder dherder added the ~sc Request is a requirement in a presales opportunity label May 9, 2024
@noahtalerman
Member

Hey @iansltx when you get the chance, can you please sanity check me here?

@noahtalerman
Member

  • Changes to paid features or tiers: Available to Fleet Premium users only. Updating fleetdm.com/pricing is still TODO

Let's update the guide that "Device remediation" points to (remediation) so that it links to the guides for automatically running scripts and installing software:

[Screenshot 2024-10-28 at 2:17:52 PM]

We can frame these features (paid only) as device remediation.

rachaelshaw pushed a commit that referenced this issue Oct 28, 2024
…) (#23300)

- Update guides to reflect use case: automatically run scripts and
install software
- @noahtalerman: I removed top image from "Automatically run scripts"
b/c I think it looked rushed/unexpected
  - Update "execute" language to "run" and add "manual" language
- Clarify when a policy's host counts are reset
- Clarify support for policy automations: team v. default (global) v. no
team
- Update `software.packages` example to best practice: separate file
  - Inline is supported for backwards compatibility
- Remove `policies` and `controls` call outs about "No team." This info
is covered in the starter files in fleetdm/gitops. For an example, see
`teams/no-team.yml` here:
https://github.com/fleetdm/fleet-gitops/blob/main/teams/no-team.yml
@noahtalerman
Copy link
Member

Hey @iansltx just giving you another ping! Can you please sanity check me here?

This is what we have documented in the permissions guide: https://fleetdm.com/guides/role-based-access#user-permissions

[Screenshot 2024-10-29 at 9:24:27 AM]

@iansltx
Copy link
Member

iansltx commented Oct 29, 2024

@noahtalerman Re: permissions, as implemented in the API the team-specific policy automations (software install, script run) only require policy write permissions, so they're available to Maintainers as well as Admins and GitOps. My guess is that global automations are only available to admins, and that's what the existing permissions line item is referencing.

If we need to tighten down permissions for scripts/software it's doable, and could land in 4.59.0 if needed, but that would be a change from 4.57/4.58, and I'm not sure what the UI enforces here.

@noahtalerman
Copy link
Member

noahtalerman commented Oct 29, 2024

available to Maintainers as well as Admins and GitOps

@iansltx ah, ok. I think no need to update the permissions in the code. We just want the documentation to be accurate.

UPDATE: @noahtalerman: I opened a draft PR here: #23433

When you get the chance, can you please take a pass at a PR to the permissions guide? https://fleetdm.com/guides/role-based-access

@noahtalerman
Copy link
Member

in the API the team-specific policy automations (software install, script run) only require policy write permissions, so they're available to Maintainers as well as Admins and GitOps. My guess is that global automations are only available to admins, and that's what the existing permissions line item is referencing.

@iansltx when you get the chance can you please double check that these^ are the current permissions? I opened up a draft PR to the permissions table here: #23433

I'm not sure what the UI enforces here.

@RachelElysia are the permissions mentioned above also enforced in the UI?

@noahtalerman noahtalerman assigned noahtalerman and unassigned iansltx Oct 31, 2024
@RachelElysia
Member

RachelElysia commented Oct 31, 2024

@noahtalerman

According to the code for the UI: For policy automations dropdown on the policy page, the user has to be a global admin or a team admin, and they need to be viewing a team policy table with at least one team policy shown on the UI table. The UI button for managing automations for policies is hidden for maintainers.

Just logged in as a team maintainer and confirmed Policy Automations dropdown is hidden for maintainers.

@iansltx
Copy link
Member

iansltx commented Oct 31, 2024

So, given the above, we have the API enforcing looser permissions than the UI. Do we want to:

  1. Tighten the API up (in which case docs stay the same)
  2. Allow maintainers to edit team-specific policy automations in the UI (in which case we should have a new line item in docs for software install/script run policies as their permissions are distinct from global automations like webhooks and calendars)
  3. Do nothing (in which case, what should the docs say?)

@iansltx
Copy link
Member

iansltx commented Oct 31, 2024

Per design review just now, we're taking the second option of the above.

Action items (all on me):

  1. Verify that global automations are indeed limited to Admin or above (if I'm wrong here and global automations work for maintainers, the next two items will look different)
  2. Add a frontend bug for mismatched permissions (FE should show install/script automations to maintainers)
  3. Update "Team maintainers can manage policy automations" (#23433) with an additional line item for install/script automations (Maintainer or above permissions), and remove the new maintainer permission note on the existing policy automations line item

Self-assigning this until the above are done.

@iansltx iansltx assigned iansltx and unassigned noahtalerman Oct 31, 2024
@iansltx
Copy link
Member

iansltx commented Nov 1, 2024

Confirmed that global automations are admin-or-above; modifications to global automations hit the global config endpoint, which is controlled by the app_config.write permission, which is gated to admin or gitops.

Docs update incoming.
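The API-side permissions described in this thread can be sketched as a lookup. Role names and automation keys are hypothetical simplifications (global and team roles are collapsed), and it is an assumption that every non-team automation goes through the global config endpoint. Note that at this point the UI was stricter than the API for maintainers.

```python
# Hypothetical role sets, simplified from this thread.
POLICY_WRITE_ROLES = frozenset({"maintainer", "admin", "gitops"})  # team policy automations
APP_CONFIG_WRITE_ROLES = frozenset({"admin", "gitops"})            # global automations

# Automations attached to a single team policy (only need policy write).
TEAM_POLICY_AUTOMATIONS = frozenset({"run_script", "install_software"})


def can_edit_automation(role: str, automation: str) -> bool:
    """Return True if `role` may change `automation` via the API (sketch)."""
    if automation in TEAM_POLICY_AUTOMATIONS:
        return role in POLICY_WRITE_ROLES
    # Everything else (e.g. failing-policy webhooks, calendar events) is assumed
    # to be saved through the global config endpoint, gated by app_config.write.
    return role in APP_CONFIG_WRITE_ROLES
```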

@iansltx
Copy link
Member

iansltx commented Nov 1, 2024

Going to set this up as a new PR to clean up the approval flow (and since the content of the PR is going to wind up quite different from the original docs change).

iansltx added a commit that referenced this issue Nov 1, 2024
…ftware install (#19551) and script execution (#17129) policy automations
@iansltx
Copy link
Member

iansltx commented Nov 1, 2024

RBAC docs PR is up: #23447

@iansltx
Copy link
Member

iansltx commented Nov 1, 2024

#23448 created for matching UI permissions with API permissions. Reassigning this ticket back to @noahtalerman for continuation of confirmation and celebration.

@iansltx iansltx assigned noahtalerman and unassigned iansltx Nov 1, 2024
iansltx added a commit that referenced this issue Nov 4, 2024
…and software install (#19551) and script execution (#17129) policy automations (#23447)

Co-authored-by: Noah Talerman
@noahtalerman
Copy link
Member

@Patagonia121 @pintomi1989 @zayhanlon @ambrusps @phtardif1 @AnthonySnyder8 heads up that this user story was shipped in 4.58 🚀

Here's the guide.

(we wait to close the issue until reference docs are updated and guide is published)

@fleet-release
Copy link
Contributor

Script triggers rise,
Like sun on distant hosts gleams,
Effortless, we thrive.
