Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Improve Existing Best Practice Guide]: Automated checking for general sensitive information within Git #89

Closed
riverma opened this issue Feb 13, 2023 · 22 comments
Assignees
Labels
medium complexity Ticket is relatively straightforward but may have a difficult sub-task most requested Highly requested by community members software lifecycle Process improvements involving developing, testing, integrating, deploying software

Comments

@riverma
Copy link
Collaborator

riverma commented Feb 13, 2023

Checked for duplicates

Yes - I've already checked

Best Practice Guide

Continuous Integration

Best Practice Guide Sections

Starter Kits

Describe the improvement

We have some existing recommendations for checking sensitive AWS credential information via using git-secrets described here. However, we've received feedback that this could be improved via the following:

  • Sample pattern files to check for more specific sensitive information such as IPs, username / passwords, ARNs, security-groups
  • A GitHub-side automation that checks repositories even if folks have committed and pushed sensitive information

To support these two needs, we should evaluate if git-secrets is the right tool, or if it should be augmented or replaced with a better solution.

@riverma riverma moved this to 📋 Backlog in SLIM Planning Board Feb 14, 2023
@riverma riverma added low complexity Ticket is relatively easy & straightforward to resolve most requested Highly requested by community members software lifecycle Process improvements involving developing, testing, integrating, deploying software labels Feb 14, 2023
@jordanpadams
Copy link

jordanpadams commented Feb 15, 2023

One other idea:

  • absolute file paths - not sure how feasible/possible this is, but absolute file paths on file systems are considered sensitive from SAs

Per this:

A GitHub-side automation that checks repositories even if folks have committed and pushed sensitive information

The guidelines should also include a link to documentation about how to deep clean this from your commit history. Additionally, this automated check should also include GitHub Issues, which can include sensitive information.

@riverma
Copy link
Collaborator Author

riverma commented Feb 15, 2023

Great suggestions @jordanpadams - we will plan to include these in our scope.

@riverma
Copy link
Collaborator Author

riverma commented Feb 16, 2023

FYI @perryzjc see folks who are also interested in this scope of work here.

@sneely333
Copy link

We should search for anything that includes information regarding our infrastructure including sg's, vpc's, subnets, aws account numbers, ami's, bucket names, ip addresses, hostnames, roles, arn's, usernames, internal url's, and passwords.

@riverma
Copy link
Collaborator Author

riverma commented Feb 17, 2023

We should search for anything that includes information regarding our infrastructure including sg's, vpc's, subnets, aws account numbers, ami's, bucket names, ip addresses, hostnames, roles, arn's, usernames, internal url's, and passwords.

That's a very comprehensive list of tips - thank you very much @sneely333. We will look these over.

@perryzjc
Copy link
Member

perryzjc commented Feb 27, 2023

I did some Trade Studies on four tools and put the references in the table. I feel they are quite similar.

  • They all support customized regular expression, which provides a possibility to solve the needs in this ticket and other potential needs.

The main difference is about the support of Entropy Analysis. I have done some trials on this feature. It's sometimes useful for complex passwords and TOKEN format. If the current set of regular expressions didn't work, this feature could sometimes remind us, but not always. Therefore, inspecting sensitive information still needs to be taken care of.

The other slight differences are Commit Messages and File Name. We may need to use a combination of those tools based on the needs.

Tool Name File Content File Name Commit Message Pre-commit-hook Check history GitHub-side Automation Customized Pattern Entropy Analysis
git-secrets Yes No Yes Yes Yes Yes (doable) Yes No
gitleaks Yes No No Yes Yes Yes Yes Yes
trufflehog Yes No No Yes Yes Yes Yes Yes
talisman Yes Yes No Yes Yes Yes (doable) Yes Yes

@riverma riverma moved this from 📋 Backlog to 🏗 In Progress in SLIM Planning Board Feb 27, 2023
@riverma
Copy link
Collaborator Author

riverma commented Feb 27, 2023

@perryzjc - great work here. Will scope this and provide feedback. One tip is you'll want to also include GitHub's own secrets scanning tool in your trade-study. What is missing from GitHub's tool that these other tools support?

Example screenshot available in project settings on GitHub:
Screenshot 2023-02-27 at 11 09 07 AM

@riverma
Copy link
Collaborator Author

riverma commented Mar 2, 2023

I did some Trade Studies on four tools and put the references in the table. I feel they are quite similar.

  • They all support customized regular expression, which provides a possibility to solve the needs in this ticket and other potential needs.

The main difference is about the support of Entropy Analysis. I have done some trials on this feature. It's sometimes useful for complex passwords and TOKEN format. If the current set of regular expressions didn't work, this feature could sometimes remind us, but not always. Therefore, inspecting sensitive information still needs to be taken care of.

The other slight differences are Commit Messages and File Name. We may need to use a combination of those tools based on the needs.

Solid analysis here @perryzjc - thanks! The entropy analysis feature is interesting. That could help in identifying sensitive information, though it might flag memory addresses in code as well.

Based on the tools listed, which do you recommend proceeding with and why? One or more tools? It'd be great to get an architecture / flow diagram of where the tool(s) solution proposed fit in with the following scenarios:

  1. New code commits (locally) -> code pushes (to remote) -> code CI (on GitHub.com)
  2. Full codebase scans (locally)
  3. Full codebase history, including previous commits

Additionally - how can we make use of these tool solutions plug-and-play? The GitHub Action route is has obvious appeal, but how about client side? You might want to look at https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks.

@perryzjc
Copy link
Member

perryzjc commented Mar 2, 2023

I did some Trade Studies on four tools and put the references in the table. I feel they are quite similar.

  • They all support customized regular expression, which provides a possibility to solve the needs in this ticket and other potential needs.

The main difference is about the support of Entropy Analysis. I have done some trials on this feature. It's sometimes useful for complex passwords and TOKEN format. If the current set of regular expressions didn't work, this feature could sometimes remind us, but not always. Therefore, inspecting sensitive information still needs to be taken care of.
The other slight differences are Commit Messages and File Name. We may need to use a combination of those tools based on the needs.

Solid analysis here @perryzjc - thanks! The entropy analysis feature is interesting. That could help in identifying sensitive information, though it might flag memory addresses in code as well.

Based on the tools listed, which do you recommend proceeding with and why? One or more tools? It'd be great to get an architecture / flow diagram of where the tool(s) solution proposed fit in with the following scenarios:

  1. New code commits (locally) -> code pushes (to remote) -> code CI (on GitHub.com)
  2. Full codebase scans (locally)
  3. Full codebase history, including previous commits

Additionally - how can we make use of these tool solutions plug-and-play? The GitHub Action route is has obvious appeal, but how about client side? You might want to look at https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks.

Hi @riverma, thank you for the guidance! I will do one more trade study on GiHub’s own secret scanner and then make a architecture graph answering those questions.

@perryzjc
Copy link
Member

perryzjc commented Mar 8, 2023

As an comprehensive Architecture Diagram is taking longer than I expected (will post soon within next 2 days), I would like to provide an update on my trade study of GitHub Action's Secret scanning firstly.

GitHub Action's Secret scanning looks like a convenient and user-friendly product, particularly for GitHub-side automation. It offers additional features compared to other tools, but it also has its limitations, which is not friendly for public repositories.

Updated Trade Study table, compared to the old one

Tool Name File Content File Name Commit Message Pre-commit-hook Check history GitHub-side automation Customized Pattern Entropy Analysis
git-secrets Yes No Yes Yes Yes Yes (doable) Yes No
gitleaks Yes No No Yes Yes Yes Yes Yes
trufflehog Yes No No Yes Yes Yes Yes Yes
talisman Yes Yes No Yes Yes Yes (doable) Yes Yes
Secret scanning (GitHub's) Yes No No Similar as push protection Yes Yes, and more convenient as a built-in product Yes No

Here are the unique features of GitHub's Secret scanning

Pros

  • GitHub's built-in products offer easier configuration for GitHub-side automation.
  • In addition to the factors mentioned above, GitHub's Secret scanning also offers an additional feature : Scan Issue description and comment

Cons

My thought

Because of the Cons, I feel GitHub's secret scanning is not useful for public repository, it's more like a product helping companies discover whether the tokens they provide to users have been abused

@riverma
Copy link
Collaborator Author

riverma commented Mar 9, 2023

Hey @perryzjc - thanks for the deep dive analysis of GH Secrets Scanning. Appreciate the opinions and evidence you've brought forth. Great work here!

The one unique factor GH Secrets Scanning is checking issue tickets for secrets, though I think it'd be far more useful if custom patterns were supported. Often sensitive file paths, internal URLs appear in issue tickets that we don't want there. I'm curious if a GitHub action can be written (using one of the tools you've suggested) to scan not only code, but issue tickets as well without much additional work.

Look forward to your architecture / recommendation for this ticket!

@perryzjc
Copy link
Member

perryzjc commented Mar 10, 2023

Hey @perryzjc - thanks for the deep dive analysis of GH Secrets Scanning. Appreciate the opinions and evidence you've brought forth. Great work here!

The one unique factor GH Secrets Scanning is checking issue tickets for secrets, though I think it'd be far more useful if custom patterns were supported. Often sensitive file paths, internal URLs appear in issue tickets that we don't want there. I'm curious if a GitHub action can be written (using one of the tools you've suggested) to scan not only code, but issue tickets as well without much additional work.

Look forward to your architecture / recommendation for this ticket!

Hi @riverma - yes, I think it's doable to scan the issue tickets. Here is the relevant screenshot:
image

I have been doing a lot of research lately, consulting with other software engineers, and testing out multiple tools (in addition to the ones I mentioned previously). From what I've found so far, it seems that there isn't a single tool or combination of tools in the open source world that can fully meet all of our needs. For instance, it's challenging to find a tool that can scan file content, file names, commit information, history, issue tickets, support pre-commit-hooks, support regular expressions, and have entropy analysis capabilities all at once. This means that some customization will likely be necessary.

However, I recently stumbled upon a tool called "detect-secrets" that was recommended by Microsoft. It supports entropy analysis, has some commonly used regular expressions built-in, and scans quickly. It's also written in Python and designed to be scalable, which is a big plus.

While it doesn't currently support detecting file names and commit information, I've found that it's entirely feasible and relatively straightforward to modify the Python code and create a commit-msg hook.

The tool works well on GitHub Actions, but I did encounter one issue: the free version of GitHub doesn't support pre-receive hook. This means that although GitHub Action can detect the presence of sensitive information, the file has already been uploaded to the branch. This same issue also applies to issue tickets.

As of now, I don't have a solution for these two problems, but I think that the current approach is the most optimal solution compared to other options out there. It covers a wide range of needs and can potentially solve those two problems.

In addition, when it comes to scanning history, trufflehog is particularly powerful. If we could incorporate our customized detect-secrets tool into that type of historical scan, I believe the results would be excellent.

I'll be providing an architecture diagram shortly.

@perryzjc
Copy link
Member

perryzjc commented Mar 11, 2023

Here is the Scope of Work my solution able to provide:

Screen Shot 2023-03-10 at 10 06 53 PM

Note

The priority of each implementation is based on the needs from the community

Scope of Work:

  • Research and implement a workflow that can effectively manage secrets in git and GitHub repositories.
    • Able to identify various types of secrets, such as IPs, username / passwords, ARNs, security-groups, absolute file paths, sg's, vpc's, subnets, aws account numbers, ami's, bucket names, ip addresses, hostnames, roles, arn's, usernames, internal url's, and passwords.
    • Able to detect potential secrets that may not have been aware yet.
    • Utilize different methods to detect secrets, such as file content, filename, commit message, GitHub issue description and comments.
    • Scan the complete codebase history, including previous commits, to identify secrets.
    • Automatically detect secrets in both local commits and remote pushes.
      • Would be nice to Implement commit protect and push protect functionality to prevent accidental secrets exposure.
    • Provide guidelines on how to clean secrets from commit history, including relevant documentation.

@riverma
Copy link
Collaborator Author

riverma commented Mar 11, 2023

Hi @perryzjc - thanks for the write up here!

My thoughts:

  • Preference on the client-side scanning over GitHub if you're having trouble handling the latter. GitHub should serve as a backup layer to prevent sensitive info (more alerting than stopping), but understandably it won't have all the safety features of a git pre-hook. I think if someone writes code on GitHub itself and pushes to a branch, we could have the automation point to docs about purging the repo history. We don't want to require GitHub Enterprise features btw.
  • In terms of features to support, I think prioritize the features specifically mentioned in this ticket over others that might be nice but may get us bogged down
  • Some use cases to make things more tangible for your architecture diagram / approach: (1) client-side full scan of existing code base, (2) client-side scan of updated code upon Git commit, (3) server-side push to GitHub.com from client, or writing code on GitHub.com itself and being warned about sensitive info at earliest possible stage and pointers on how to purge / fix

@perryzjc
Copy link
Member

perryzjc commented Mar 11, 2023

Here is my diagram about how each tool relates to each need.

graph TD
subgraph solution
  subgraph Development-Tools
    subgraph open-source-tools
      detect_secrets[Detect Secrets]
      trufflehog[Trufflehog]
      style detect_secrets fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
      style trufflehog fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
    end
    subgraph automation-tools
      subgraph local-git-hooks
        pre-commit[pre-commit hook]
        commit-msg[commit-msg hook]
        style pre-commit fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
        style commit-msg fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
      end
      subgraph remote-GitHub-Action
        pre-receive[pre-receive hook]
        workflows[workflows on push]
        webhook[webhook]
        style pre-receive fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
        style workflows fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
        style webhook fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
      end
    end
    style Development-Tools fill:#5DADE2,stroke:#333,stroke-width:2px
  end

  subgraph Needs
    Detecting-diverse-types-of-secrets{{Detecting-diverse-types-of-secrets}}
    Local-commit-protection[[fa:fa-ban Local-commit-protection]]
    Alarm-detected-secrets>fa:fa-camera-retro Alarm-detected-secrets]
    Push-protection[[fa:fa-ban Push-protection]]
    Detect-secrets-in-full-code-history[(Detect-secrets-in-full-code-history)]
    Detection-of-secrets-in-different-media{{Detection-of-secrets-in-different-media}}
  end

  detect_secrets == solves ==> Detecting-diverse-types-of-secrets
  local-git-hooks == solves ==> Local-commit-protection
  remote-GitHub-Action == solves ==> Alarm-detected-secrets
  pre-receive == solves if GitHub enterprise version ==> Push-protection
  remote-GitHub-Action == solves ==> Detection-of-secrets-in-different-media
  local-git-hooks == solves ==> Detection-of-secrets-in-different-media
  trufflehog == solves ==> Detect-secrets-in-full-code-history

  subgraph Other-notes
    subgraph development
      subgraph Server-may-needed
      end
      subgraph Web-crawling-may-needed
      end
    end
    subgraph needs
      subgraph webhook's-post-request
      end
      subgraph Secrets-in-previous-issues
      end
    end
    PRI([Becasue of the scalability of detect secrets, <br> all feature proposed here are implementable. <br> But priority is based on the needs of the community.])
    style PRI fill:#AF505C,stroke:#333,stroke-width:2px
  end

  webhook's-post-request -. need to be handled by .-> Server-may-needed
  Secrets-in-previous-issues -. can be scaned by .-> Web-crawling-may-needed

  webhook -- sends --> webhook's-post-request

  style solution fill:#EC7063,stroke:#666,stroke-width:4px
  style Detecting-diverse-types-of-secrets fill:#48C9B0,stroke:#333,stroke-width:2px
  style Local-commit-protection fill:#48C9B0,stroke:#333,stroke-width:2px
  style Push-protection fill:#48C9B0,stroke:#333,stroke-width:2px
  style Alarm-detected-secrets fill:#48C9B0,stroke:#333,stroke-width:2px
  style Detection-of-secrets-in-different-media fill:#48C9B0,stroke:#333,stroke-width:2px
  style Detect-secrets-in-full-code-history fill:#48C9B0,stroke:#333,stroke-width:2px
  style Server-may-needed fill:#58D68D,stroke:#333,stroke-width:2px
  style Web-crawling-may-needed fill:#58D68D,stroke:#333,stroke-width:2px
  style webhook's-post-request fill:#58D68D,stroke:#333,stroke-width:2px
  style Secrets-in-previous-issues fill:#58D68D,stroke:#333,stroke-width:2px
end
Loading

Development Tools:

  • Open source tools
  • Automation tools
    • local - git hooks
      1. pre-commit hook
      2. commit-msg hook)
    • remote - GitHub Action
      1. workflows on push
      2. webhook
      3. pre-receive hook if using GitHub enterprise

Notes

Becasue of the scalability of detect secrets, all feature proposed here are implementable.
But priority is based on the needs of the community.

Usage

detect secrets

  • written in python
  • easy to configure
  • has
    • Built-in plug-ins to detect popular patterns
    • Include Entrophy Analysis plug-in -- useful to detect -> the pattern not in the plug-ins yet
    • Scalable way to create customized plug-in -- helpful to detect -> the special needs from community, such as absolute file paths
      • plug-in is not limited to regular expression, any python code logic could work!
  • -- solves -> Detecting diverse types of secret
  • -- can work with -> local - git hook
  • -- can work with -> remote - GitHub Action

local - git hook

  • written in shell
  • has
    • pre-commit hook -- acheive -> commit protection
    • commit-msg hook -- acheive -> detect commit message
  • -- solves -> Local commit protection

remote - GitHub Action

  • written in yaml and shell
  • has
    • pre-receive hook (only for GitHub enterprise, not available for free version)
    • workflows on push -- arise alarm for -> detected secrets in new push
    • webhook -- support the secrets detect on -> a wider range of GitHub activities (including Issue discussion)
  • -- solves if GitHub enterprise version -> push protection
  • -- solves -> alarm 🚨 for detected secrets

local - git hook (AND) remote - GitHub Action

  • -- solves -> Detection of new secrets in different media

trufflehog

  • written in Go
  • has
    • convinient and strong functionality on scanning history
      • -- able to scans -> the history of a repository
      • -- able to scans -> the history of a organization
  • -- solves -> Detecting secrets in full code history
    • But it's not as scalable as detect secrets , so it's not easy to scan the history of secrets that appears on filename, commit message, and the previsou issue

Other notes

  • GitHub Action's webhook work in this way:
    • Once there is an event triggered, GitHub send the information to an URL
      • So if we want to detect the secrets, we probably need to hold a server to handle the post request. Call detect secrets to detect the information inside the post request
  • GitHub Action's webhook get triggered only for new event
    • So if we want to scan the previous issues, we may need to implement Web crawling to obtain the information of all issue tickets, then call our existing function to handle the detected secrets.

@perryzjc
Copy link
Member

Hi @perryzjc - thanks for the write up here!

My thoughts:

  • Preference on the client-side scanning over GitHub if you're having trouble handling the latter. GitHub should serve as a backup layer to prevent sensitive info (more alerting than stopping), but understandably it won't have all the safety features of a git pre-hook. I think if someone writes code on GitHub itself and pushes to a branch, we could have the automation point to docs about purging the repo history. We don't want to require GitHub Enterprise features btw.
  • In terms of features to support, I think prioritize the features specifically mentioned in this ticket over others that might be nice but may get us bogged down
  • Some use cases to make things more tangible for your architecture diagram / approach: (1) client-side full scan of existing code base, (2) client-side scan of updated code upon Git commit, (3) server-side push to GitHub.com from client, or writing code on GitHub.com itself and being warned about sensitive info at earliest possible stage and pointers on how to purge / fix

Hi @riverma, thanks for the suggestions!

I'll prioritize the needs of the community for the actual implementation. The first diagram was just to show the potential features of my solution.

I've added another diagram to show how each tool relates to each need.

Next up, I'll work on the diagram you mentioned.

With these three diagrams, hope it can provide people a better understanding of what my solution can do and how to use it.

@perryzjc
Copy link
Member

@riverma Here are my other diagrams of the solution. It includes the most essential parts of my solution.

Solution Structure Diagram

graph TD
subgraph SolutionStructure
  subgraph SecretsDetectionApproach
    subgraph Layer1["Layer 1: Push to GitHub.com (server-side)"]
      style Layer1 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
    end
    subgraph Layer2["Layer 2: Scan of updated code upon Git commit (client-side)"]
      style Layer2 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
    end
    subgraph Layer3["Layer 3: Full scan of the existing code base (client-side)"]
      style Layer3 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
    end
  end

  subgraph Tools
    subgraph CoreTool["Core Tool: Detect Secrets"]
      style CoreTool fill:#5DADE2,stroke:#333,stroke-width:2px
      detect_secrets{{detect-secrets}}
      style detect_secrets fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
    end

    subgraph OtherTools["Other Tools"]
      pre_commit_ci[pre-commit.ci]
      github_action[GitHub Action]
      pre_commit_manager[pre-commit manager]
      style github_action fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
      style pre_commit_ci fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
      style pre_commit_manager fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
    end
  end

  subgraph LayerDetails
    subgraph Layer1Details[Layer 1 Details]
      Compatible_with_all_local_machines{{Compatible with all local machines}}
      Protection_for_main_branch{{Protection for the main branch}}
      Error_notifications_via_GitHub_email{{Error notifications via GitHub and email}}
      Implemented_with_detect_secrets_workflow[[Implemented with detect-secrets workflow]]
    end
    subgraph Layer2Details[Layer 2 Details]
      Optional_due_to_compatibility_issues{{Optional due to compatibility issues}}
      Early_stage_secrets_detection{{Early-stage secrets detection}}
      Commit_prevention_and_error_messages{{Commit prevention and error messages}}
      Implemented_with_pre_commit_manager[[Implemented with pre-commit manager]]
    end
    subgraph Layer3Details[Layer 3 Details]
      Direct_use_of_detect_secrets{{Direct use of detect-secrets}}
      Error_messages_for_detected_secrets{{Error messages for detected secrets}}
    end
  end

  Layer1 -->|uses| pre_commit_ci
  Layer1 -->|uses| github_action
  Layer2 -->|uses| pre_commit_manager
  Layer1 -->|uses| detect_secrets
  Layer2 -->|uses| detect_secrets
  Layer3 -->|uses| detect_secrets


  style SolutionStructure fill:#EC7063,stroke:#666,stroke-width:4px
  style SecretsDetectionApproach fill:#AF7AC5,stroke:#333,stroke-width:2px
  style Tools fill:#48C9B0,stroke:#333,stroke-width:2px
  style LayerDetails fill:#F1948A,stroke:#333,stroke-width:2px
  style Compatible_with_all_local_machines fill:#58D68D,stroke:#333,stroke-width:2px
  style Protection_for_main_branch fill:#58D68D,stroke:#333,stroke-width:2px
  style Error_notifications_via_GitHub_email fill:#58D68D,stroke:#333,stroke-width:2px
  style Implemented_with_detect_secrets_workflow fill:#58D68D,stroke:#333,stroke-width:2px
  style Optional_due_to_compatibility_issues fill:#58D68D,stroke:#333,stroke-width:2px
  style Early_stage_secrets_detection fill:#58D68D,stroke:#333,stroke-width:2px
  style Commit_prevention_and_error_messages fill:#58D68D,stroke:#333,stroke-width:2px
  style Implemented_with_pre_commit_manager fill:#58D68D,stroke:#333,stroke-width:2px
  style Direct_use_of_detect_secrets fill:#58D68D,stroke:#333,stroke-width:2px
  style Error_messages_for_detected_secrets fill:#58D68D,stroke:#333,stroke-width:2px
end
Loading

User Workflow Diagram

flowchart TB
  User([fa:fa-user User])

  subgraph UserWorkflow["User Workflow to Secure Secrets"]
    Layer1["1. Layer 1: GitHub.com (server-side)"]
    Layer2["2. Layer 2: Git commit scan (client-side)"]
    Layer3["3. Layer 3: Full scan (client-side)"]

    Layer1 -->|If Secrets Detected| Clean1[Purge or Fix the commit manually]
    Layer2 -->|If Secrets Detected| Clean2[Clean local file directly. <br> Don't need to worry about cleaning commit history]
    Layer3 -->|If Secrets Detected| Clean3[Clean local file directly.]

    Secure["Only Main branch is in safe. <br> Secrets are leaked on other branch before cleaning"]
    Clean1 --> Secure
    
    SaveTime["It saves your time. And secrets are safe from GitHub"]
    Clean2 --> SaveTime
    Clean3 --> SaveTime
  end

  User -->|At least use| Layer1
  User -->|Helpful to use| Layer2
  User -->|Optional to use| Layer3

  style User fill:#F6F5F3,stroke:#333,stroke-width:1px
  style UserWorkflow fill:#AF7AC5,stroke:#333,stroke-width:2px
  style Layer1 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
  style Layer2 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
  style Layer3 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
  style Clean1 fill:#5A88ED,stroke:#333,stroke-width:2px
  style Clean2 fill:#5A88ED,stroke:#333,stroke-width:2px
  style Clean3 fill:#5A88ED,stroke:#333,stroke-width:2px
  style SaveTime fill:#5ABF9B,stroke:#333,stroke-width:2px
  style Secure fill:#AF3034,stroke:#333,stroke-width:2px
Loading

Documentation

Solution Structure Diagram

  1. Secrets Detection Approach

    • Layer 1: Push to GitHub.com from the client (server-side)
    • Layer 2: Scan of updated code upon Git commit (client-side)
    • Layer 3: Full scan of the existing code base (client-side)
  2. Tools:

  3. Layer Details

    • Layer 1: Server-side push to GitHub.com
      • Compatible with all local machines
      • Protection for the main branch
      • If secrets detected,
        • Error notifications (including guideline of fix/ purge) via GitHub and email
      • Implemented with .github/workflows/detect-secrets.yml, pre-commit.ci, and detect-secrets
    • Layer 2: Client-side scan upon Git commit
      • May have compatibility issues
      • Early-stage secrets detection
      • If secrets detected,
        • Commit prevention and error messages
      • Implemented with pre-commit manager and .pre-commit-config.yaml using detect-secrets
    • Layer 3: Full scan of existing code base

User Workflow Diagram

  1. User Interaction with Layers:

    • Layer 1: GitHub.com (server-side)
      • The user should at least use this layer for securing secrets.
    • Layer 2: Git commit scan (client-side)
      • Using this layer is helpful for the user to detect secrets early on.
    • Layer 3: Full scan (client-side)
      • This layer is optional for the user to use for additional security.
  2. Actions to be taken if secrets are detected:

    • Layer 1: Purge or fix the commit manually
      • If secrets are detected, the user must purge or fix the commit manually to ensure the main branch remains secure.
    • Layer 2: Clean local file directly
      • If secrets are detected, the user can clean the local file directly, without worrying about cleaning the commit history.
    • Layer 3: Clean local file directly
      • If secrets are detected, the user can clean the local file directly.
  3. Effects of using different layers:

    • Using Layer 1 ensures that only the main branch is safe, and secrets are leaked on other branches before cleaning.
    • Using Layer 2 and Layer 3 saves the user's time and keeps secrets safe from GitHub.

@riverma
Copy link
Collaborator Author

riverma commented Mar 21, 2023

Hi @perryzjc -

Excellent work here with the research and brining this all together. I support your plan here, but I have a couple questions and suggestions.

Questions:

  • How would developers be notified of sensitive information being accidentally pushed via the "Layer 1" workflow?
  • With detect-secrets - where are patterns / RegExes stored such that users can customize further (from our baseline) which sensitive patterns to search for?
  • How active of a project would you say detect-secrets is and how does that affect the risk of using the software? I see the last release was in Oct 2022, and last commit Dec 2022. On the other hand, TruffleHog seems far more active. One way to assess is to reach out to the project's community and see how soon they respond to your questions.

Suggestions:

  • Keep it simple for the user: the "Layer 1" workflow should be a stand-alone GitHub Action that can be deployed with a single click. The "Layer 2/3" workflows should be as simple as a two-step process: (1) installing the software / dependencies using a package manager set of commands, (2) loading a custom configuration you've created and running away
  • In terms of prioritization: I'd suggest you start with Layer 1, then Layer 2, then Layer 3. This order of priority would make infusion to projects the simplest.
  • I think the key for the "Layer 1" workflow is we don't want public alerts for sensitive information found, we want to alert the developers so they can quickly and surreptitiously make the fix.
  • For the "Layer 1" workflow - it'd be good to link to clean-up instructions for past commits.

@perryzjc
Copy link
Member

perryzjc commented Mar 22, 2023

Hi Rishi, here is my response to the Questions:

  1. Developers will be notified by email sent from GitHub Action.

  2. detect-secrets has a plug-in folder We can add features (other patterns) by putting or modifying the Python scripts in that folder. It's scalable and convenient.

  3. The newest release of detect-secrets was on October 5, 2022. It's much more active than git-secrets, whose latest update was three years ago.

  4. Also, detect-secrets is an enterprise-friendly way of detecting and preventing secrets in code. Currently, I've found that IBM and Yelp are using it. I've also found that it is recommended by Microsoft

Also, thank you for the Suggestions! When it comes to the implementation, I will try to complete those features and make them as convenient and secure as possible.

@jpl-jengelke jpl-jengelke added medium complexity Ticket is relatively straightforward but may have a difficult sub-task and removed low complexity Ticket is relatively easy & straightforward to resolve labels Mar 22, 2023
@perryzjc
Copy link
Member

perryzjc commented May 11, 2023

Here are three sequence diagrams to help people better understand the three layers.

Layer1 - Server-side push to GitHub.com

sequenceDiagram
    participant User as Developer
    participant GH as GitHub
    participant Config as .pre-commit-config.yaml
    participant CI as Pre-commit CI
    participant DS as Detect-Secrets

    Note over User,GH: Developer creates pull request or pushes to branch
    User->>+GH: Creates pull request / pushes to branch
    GH->>+Config: Fetches pre-commit config
    Config->>CI: Returns config with Detect-Secrets setup
    CI->>DS: Requests secret scan
    DS->>DS: Scans pull request / branch for secrets with custom plugins
    alt Secrets Detected
        DS-->>CI: Returns detected secrets
        CI-->>GH: Reports status check as failed
        GH-->>User: Prevents merge / push & reports status check
    else No Secrets Detected
        DS-->>CI: Returns clean result
        CI-->>GH: Reports status check as passed
        GH-->>User: Allows merge / push
    end

Loading

Layer2 - Git commit scan (client-side)

sequenceDiagram
    participant User as Developer
    participant Local as Local Environment
    participant Config as .pre-commit-config.yaml
    participant PCH as Pre-commit Hook
    participant DS as Detect-Secrets
    participant File as Baseline File

    Note over User,Local: Developer attempts to commit
    User->>+Local: Request commit
    Local->>+Config: Fetches pre-commit config
    Config->>PCH: Returns config with Detect-Secrets setup
    PCH->>DS: Request secret scan with existing baseline
    DS->>File: Fetches baseline file
    File->>DS: Returns baseline file
    DS->>DS: Scans changes for secrets with custom plugins
    alt New Secrets Detected
        DS-->>PCH: Returns detected secrets
        PCH-->>Local: Prevents commit & reports detected secrets
        Local-->>User: Prevents commit & reports detected secrets
    else No New Secrets Detected
        DS-->>PCH: Returns clean result
        PCH-->>Local: Allows commit
        Local-->>User: Commits changes
    end

Loading

Layer3 - Full scan and audit (client-side)

sequenceDiagram
    participant Dev as Developer
    participant Env as Local Environment
    participant DS as Detect-Secrets
    participant File as Baseline File
    participant Audit as Audit Tool

    Note over Dev,Env: Developer initiates a direct scan for secrets
    Dev->>+Env: Triggers direct scan
    Env->>+DS: Requests scan on the codebase
    DS->>DS: Performs secret scanning
    DS->>File: Generates new baseline file
    File->>DS: Acknowledges file creation
    DS-->>-Env: Returns scan results and new baseline file
    Env-->>Dev: Presents scan results and new baseline file
    Note over Dev,File: Developer may audit the new baseline file
    Dev->>Audit: Initiates audit on the new baseline file
    Audit->>File: Fetches details from the baseline file
    File->>Audit: Returns secret details
    Audit-->>Dev: Presents detailed information of detected secrets

Loading

perryzjc added a commit to NASA-AMMOS/slim-config-detect-secrets that referenced this issue May 11, 2023
perryzjc added a commit to perryzjc/slim that referenced this issue May 11, 2023
Configuration files (yaml, baseline file, and plugins)
are stored at another repository:
https://github.com/NASA-AMMOS/slim-config-detect-secrets
perryzjc added a commit to perryzjc/slim that referenced this issue May 12, 2023
Configuration files (yaml, baseline file, and plugins) are stored at another repository:
https://github.com/NASA-AMMOS/slim-config-detect-secrets
@riverma riverma moved this from 🏗 In Progress to 👀 In Review in SLIM Planning Board May 12, 2023
perryzjc added a commit to perryzjc/slim that referenced this issue May 19, 2023
@riverma
Copy link
Collaborator Author

riverma commented Jun 10, 2023

@perryzjc has proposed his plugins as PR's to Yelp's Detect Secrets core codebase here. Once those are accepted, we may no longer need to host a separate fork of detect secrets in the future.

riverma added a commit that referenced this issue Jun 10, 2023
Issue #89: Update documentation for detect-secrets
@riverma riverma moved this from 👀 In Review to ✅ Work Complete in SLIM Planning Board Jun 14, 2023
@riverma
Copy link
Collaborator Author

riverma commented Jun 15, 2023

Resolved.

@riverma riverma closed this as completed Jun 23, 2023
ingyhere pushed a commit to ingyhere/slim-starterkit-python that referenced this issue Mar 15, 2024
ingyhere pushed a commit to ingyhere/slim-starterkit-python that referenced this issue Mar 15, 2024
ingyhere added a commit to ingyhere/slim-starterkit-python that referenced this issue Mar 15, 2024
ingyhere added a commit to NASA-AMMOS/slim-starterkit-python that referenced this issue Mar 15, 2024
NASA-AMMOS/slim#89: Functional GitHub Actions-based secrets detection…
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
medium complexity Ticket is relatively straightforward but may have a difficult sub-task most requested Highly requested by community members software lifecycle Process improvements involving developing, testing, integrating, deploying software
Projects
Archived in project
Development

No branches or pull requests

5 participants