-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Improve Existing Best Practice Guide]: Automated checking for general sensitive information within Git #89
Comments
One other idea:
Per this:
The guidelines should also include a link to documentation about how to deep clean this from your commit history. Additionally, this automated check should also include GitHub Issues, which can include sensitive information. |
Great suggestions @jordanpadams - we will plan to include these in our scope. |
We should search for anything that includes information regarding our infrastructure including sg's, vpc's, subnets, aws account numbers, ami's, bucket names, ip addresses, hostnames, roles, arn's, usernames, internal url's, and passwords. |
That's a very comprehensive list of tips - thank you very much @sneely333. We will look these over. |
I did some Trade Studies on four tools and put the references in the table. I feel they are quite similar.
The main difference is about the support of Entropy Analysis. I have done some trials on this feature. It's sometimes useful for complex passwords and TOKEN format. If the current set of regular expressions didn't work, this feature could sometimes remind us, but not always. Therefore, inspecting sensitive information still needs to be taken care of. The other slight differences are Commit Messages and File Name. We may need to use a combination of those tools based on the needs.
|
@perryzjc - great work here. Will scope this and provide feedback. One tip is you'll want to also include GitHub's own secrets scanning tool in your trade-study. What is missing from GitHub's tool that these other tools support? |
Solid analysis here @perryzjc - thanks! The entropy analysis feature is interesting. That could help in identifying sensitive information, though it might flag memory addresses in code as well. Based on the tools listed, which do you recommend proceeding with and why? One or more tools? It'd be great to get an architecture / flow diagram of where the tool(s) solution proposed fit in with the following scenarios:
Additionally - how can we make use of these tool solutions plug-and-play? The GitHub Action route is has obvious appeal, but how about client side? You might want to look at https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks. |
Hi @riverma, thank you for the guidance! I will do one more trade study on GiHub’s own secret scanner and then make a architecture graph answering those questions. |
As an comprehensive Architecture Diagram is taking longer than I expected (will post soon within next 2 days), I would like to provide an update on my trade study of GitHub Action's Secret scanning firstly. GitHub Action's Secret scanning looks like a convenient and user-friendly product, particularly for GitHub-side automation. It offers additional features compared to other tools, but it also has its limitations, which is not friendly for public repositories. Updated Trade Study table, compared to the old one
Here are the unique features of GitHub's Secret scanningPros
Cons
My thoughtBecause of the Cons, I feel GitHub's secret scanning is not useful for public repository, it's more like a product helping companies discover whether the tokens they provide to users have been abused |
Hey @perryzjc - thanks for the deep dive analysis of GH Secrets Scanning. Appreciate the opinions and evidence you've brought forth. Great work here! The one unique factor GH Secrets Scanning is checking issue tickets for secrets, though I think it'd be far more useful if custom patterns were supported. Often sensitive file paths, internal URLs appear in issue tickets that we don't want there. I'm curious if a GitHub action can be written (using one of the tools you've suggested) to scan not only code, but issue tickets as well without much additional work. Look forward to your architecture / recommendation for this ticket! |
Hi @riverma - yes, I think it's doable to scan the issue tickets. Here is the relevant screenshot: I have been doing a lot of research lately, consulting with other software engineers, and testing out multiple tools (in addition to the ones I mentioned previously). From what I've found so far, it seems that there isn't a single tool or combination of tools in the open source world that can fully meet all of our needs. For instance, it's challenging to find a tool that can scan file content, file names, commit information, history, issue tickets, support pre-commit-hooks, support regular expressions, and have entropy analysis capabilities all at once. This means that some customization will likely be necessary. However, I recently stumbled upon a tool called "detect-secrets" that was recommended by Microsoft. It supports entropy analysis, has some commonly used regular expressions built-in, and scans quickly. It's also written in Python and designed to be scalable, which is a big plus. While it doesn't currently support detecting file names and commit information, I've found that it's entirely feasible and relatively straightforward to modify the Python code and create a commit-msg hook. The tool works well on GitHub Actions, but I did encounter one issue: the free version of GitHub doesn't support pre-receive hook. This means that although GitHub Action can detect the presence of sensitive information, the file has already been uploaded to the branch. This same issue also applies to issue tickets. As of now, I don't have a solution for these two problems, but I think that the current approach is the most optimal solution compared to other options out there. It covers a wide range of needs and can potentially solve those two problems. In addition, when it comes to scanning history, trufflehog is particularly powerful. If we could incorporate our customized detect-secrets tool into that type of historical scan, I believe the results would be excellent. I'll be providing an architecture diagram shortly. |
Here is the Scope of Work my solution able to provide: NoteThe priority of each implementation is based on the needs from the community Scope of Work:
|
Hi @perryzjc - thanks for the write up here! My thoughts:
|
Here is my diagram about how each tool relates to each need. graph TD
subgraph solution
subgraph Development-Tools
subgraph open-source-tools
detect_secrets[Detect Secrets]
trufflehog[Trufflehog]
style detect_secrets fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style trufflehog fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
end
subgraph automation-tools
subgraph local-git-hooks
pre-commit[pre-commit hook]
commit-msg[commit-msg hook]
style pre-commit fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style commit-msg fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
end
subgraph remote-GitHub-Action
pre-receive[pre-receive hook]
workflows[workflows on push]
webhook[webhook]
style pre-receive fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style workflows fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style webhook fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
end
end
style Development-Tools fill:#5DADE2,stroke:#333,stroke-width:2px
end
subgraph Needs
Detecting-diverse-types-of-secrets{{Detecting-diverse-types-of-secrets}}
Local-commit-protection[[fa:fa-ban Local-commit-protection]]
Alarm-detected-secrets>fa:fa-camera-retro Alarm-detected-secrets]
Push-protection[[fa:fa-ban Push-protection]]
Detect-secrets-in-full-code-history[(Detect-secrets-in-full-code-history)]
Detection-of-secrets-in-different-media{{Detection-of-secrets-in-different-media}}
end
detect_secrets == solves ==> Detecting-diverse-types-of-secrets
local-git-hooks == solves ==> Local-commit-protection
remote-GitHub-Action == solves ==> Alarm-detected-secrets
pre-receive == solves if GitHub enterprise version ==> Push-protection
remote-GitHub-Action == solves ==> Detection-of-secrets-in-different-media
local-git-hooks == solves ==> Detection-of-secrets-in-different-media
trufflehog == solves ==> Detect-secrets-in-full-code-history
subgraph Other-notes
subgraph development
subgraph Server-may-needed
end
subgraph Web-crawling-may-needed
end
end
subgraph needs
subgraph webhook's-post-request
end
subgraph Secrets-in-previous-issues
end
end
PRI([Becasue of the scalability of detect secrets, <br> all feature proposed here are implementable. <br> But priority is based on the needs of the community.])
style PRI fill:#AF505C,stroke:#333,stroke-width:2px
end
webhook's-post-request -. need to be handled by .-> Server-may-needed
Secrets-in-previous-issues -. can be scaned by .-> Web-crawling-may-needed
webhook -- sends --> webhook's-post-request
style solution fill:#EC7063,stroke:#666,stroke-width:4px
style Detecting-diverse-types-of-secrets fill:#48C9B0,stroke:#333,stroke-width:2px
style Local-commit-protection fill:#48C9B0,stroke:#333,stroke-width:2px
style Push-protection fill:#48C9B0,stroke:#333,stroke-width:2px
style Alarm-detected-secrets fill:#48C9B0,stroke:#333,stroke-width:2px
style Detection-of-secrets-in-different-media fill:#48C9B0,stroke:#333,stroke-width:2px
style Detect-secrets-in-full-code-history fill:#48C9B0,stroke:#333,stroke-width:2px
style Server-may-needed fill:#58D68D,stroke:#333,stroke-width:2px
style Web-crawling-may-needed fill:#58D68D,stroke:#333,stroke-width:2px
style webhook's-post-request fill:#58D68D,stroke:#333,stroke-width:2px
style Secrets-in-previous-issues fill:#58D68D,stroke:#333,stroke-width:2px
end
Development Tools:
NotesBecasue of the scalability of detect secrets, all feature proposed here are implementable. Usagedetect secrets
local - git hook
remote - GitHub Action
local - git hook (AND) remote - GitHub Action
trufflehog
Other notes
|
Hi @riverma, thanks for the suggestions! I'll prioritize the needs of the community for the actual implementation. The first diagram was just to show the potential features of my solution. I've added another diagram to show how each tool relates to each need. Next up, I'll work on the diagram you mentioned. With these three diagrams, hope it can provide people a better understanding of what my solution can do and how to use it. |
@riverma Here are my other diagrams of the solution. It includes the most essential parts of my solution. Solution Structure Diagramgraph TD
subgraph SolutionStructure
subgraph SecretsDetectionApproach
subgraph Layer1["Layer 1: Push to GitHub.com (server-side)"]
style Layer1 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
end
subgraph Layer2["Layer 2: Scan of updated code upon Git commit (client-side)"]
style Layer2 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
end
subgraph Layer3["Layer 3: Full scan of the existing code base (client-side)"]
style Layer3 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
end
end
subgraph Tools
subgraph CoreTool["Core Tool: Detect Secrets"]
style CoreTool fill:#5DADE2,stroke:#333,stroke-width:2px
detect_secrets{{detect-secrets}}
style detect_secrets fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
end
subgraph OtherTools["Other Tools"]
pre_commit_ci[pre-commit.ci]
github_action[GitHub Action]
pre_commit_manager[pre-commit manager]
style github_action fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style pre_commit_ci fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style pre_commit_manager fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
end
end
subgraph LayerDetails
subgraph Layer1Details[Layer 1 Details]
Compatible_with_all_local_machines{{Compatible with all local machines}}
Protection_for_main_branch{{Protection for the main branch}}
Error_notifications_via_GitHub_email{{Error notifications via GitHub and email}}
Implemented_with_detect_secrets_workflow[[Implemented with detect-secrets workflow]]
end
subgraph Layer2Details[Layer 2 Details]
Optional_due_to_compatibility_issues{{Optional due to compatibility issues}}
Early_stage_secrets_detection{{Early-stage secrets detection}}
Commit_prevention_and_error_messages{{Commit prevention and error messages}}
Implemented_with_pre_commit_manager[[Implemented with pre-commit manager]]
end
subgraph Layer3Details[Layer 3 Details]
Direct_use_of_detect_secrets{{Direct use of detect-secrets}}
Error_messages_for_detected_secrets{{Error messages for detected secrets}}
end
end
Layer1 -->|uses| pre_commit_ci
Layer1 -->|uses| github_action
Layer2 -->|uses| pre_commit_manager
Layer1 -->|uses| detect_secrets
Layer2 -->|uses| detect_secrets
Layer3 -->|uses| detect_secrets
style SolutionStructure fill:#EC7063,stroke:#666,stroke-width:4px
style SecretsDetectionApproach fill:#AF7AC5,stroke:#333,stroke-width:2px
style Tools fill:#48C9B0,stroke:#333,stroke-width:2px
style LayerDetails fill:#F1948A,stroke:#333,stroke-width:2px
style Compatible_with_all_local_machines fill:#58D68D,stroke:#333,stroke-width:2px
style Protection_for_main_branch fill:#58D68D,stroke:#333,stroke-width:2px
style Error_notifications_via_GitHub_email fill:#58D68D,stroke:#333,stroke-width:2px
style Implemented_with_detect_secrets_workflow fill:#58D68D,stroke:#333,stroke-width:2px
style Optional_due_to_compatibility_issues fill:#58D68D,stroke:#333,stroke-width:2px
style Early_stage_secrets_detection fill:#58D68D,stroke:#333,stroke-width:2px
style Commit_prevention_and_error_messages fill:#58D68D,stroke:#333,stroke-width:2px
style Implemented_with_pre_commit_manager fill:#58D68D,stroke:#333,stroke-width:2px
style Direct_use_of_detect_secrets fill:#58D68D,stroke:#333,stroke-width:2px
style Error_messages_for_detected_secrets fill:#58D68D,stroke:#333,stroke-width:2px
end
User Workflow Diagramflowchart TB
User([fa:fa-user User])
subgraph UserWorkflow["User Workflow to Secure Secrets"]
Layer1["1. Layer 1: GitHub.com (server-side)"]
Layer2["2. Layer 2: Git commit scan (client-side)"]
Layer3["3. Layer 3: Full scan (client-side)"]
Layer1 -->|If Secrets Detected| Clean1[Purge or Fix the commit manually]
Layer2 -->|If Secrets Detected| Clean2[Clean local file directly. <br> Don't need to worry about cleaning commit history]
Layer3 -->|If Secrets Detected| Clean3[Clean local file directly.]
Secure["Only Main branch is in safe. <br> Secrets are leaked on other branch before cleaning"]
Clean1 --> Secure
SaveTime["It saves your time. And secrets are safe from GitHub"]
Clean2 --> SaveTime
Clean3 --> SaveTime
end
User -->|At least use| Layer1
User -->|Helpful to use| Layer2
User -->|Optional to use| Layer3
style User fill:#F6F5F3,stroke:#333,stroke-width:1px
style UserWorkflow fill:#AF7AC5,stroke:#333,stroke-width:2px
style Layer1 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style Layer2 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style Layer3 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style Clean1 fill:#5A88ED,stroke:#333,stroke-width:2px
style Clean2 fill:#5A88ED,stroke:#333,stroke-width:2px
style Clean3 fill:#5A88ED,stroke:#333,stroke-width:2px
style SaveTime fill:#5ABF9B,stroke:#333,stroke-width:2px
style Secure fill:#AF3034,stroke:#333,stroke-width:2px
DocumentationSolution Structure Diagram
User Workflow Diagram
|
Hi @perryzjc - Excellent work here with the research and brining this all together. I support your plan here, but I have a couple questions and suggestions. Questions:
Suggestions:
|
Hi Rishi, here is my response to the Questions:
Also, thank you for the Suggestions! When it comes to the implementation, I will try to complete those features and make them as convenient and secure as possible. |
Here are three sequence diagrams to help people better understand the three layers. Layer1 - Server-side push to GitHub.comsequenceDiagram
participant User as Developer
participant GH as GitHub
participant Config as .pre-commit-config.yaml
participant CI as Pre-commit CI
participant DS as Detect-Secrets
Note over User,GH: Developer creates pull request or pushes to branch
User->>+GH: Creates pull request / pushes to branch
GH->>+Config: Fetches pre-commit config
Config->>CI: Returns config with Detect-Secrets setup
CI->>DS: Requests secret scan
DS->>DS: Scans pull request / branch for secrets with custom plugins
alt Secrets Detected
DS-->>CI: Returns detected secrets
CI-->>GH: Reports status check as failed
GH-->>User: Prevents merge / push & reports status check
else No Secrets Detected
DS-->>CI: Returns clean result
CI-->>GH: Reports status check as passed
GH-->>User: Allows merge / push
end
Layer2 - Git commit scan (client-side)sequenceDiagram
participant User as Developer
participant Local as Local Environment
participant Config as .pre-commit-config.yaml
participant PCH as Pre-commit Hook
participant DS as Detect-Secrets
participant File as Baseline File
Note over User,Local: Developer attempts to commit
User->>+Local: Request commit
Local->>+Config: Fetches pre-commit config
Config->>PCH: Returns config with Detect-Secrets setup
PCH->>DS: Request secret scan with existing baseline
DS->>File: Fetches baseline file
File->>DS: Returns baseline file
DS->>DS: Scans changes for secrets with custom plugins
alt New Secrets Detected
DS-->>PCH: Returns detected secrets
PCH-->>Local: Prevents commit & reports detected secrets
Local-->>User: Prevents commit & reports detected secrets
else No New Secrets Detected
DS-->>PCH: Returns clean result
PCH-->>Local: Allows commit
Local-->>User: Commits changes
end
Layer3 - Full scan and audit (client-side)sequenceDiagram
participant Dev as Developer
participant Env as Local Environment
participant DS as Detect-Secrets
participant File as Baseline File
participant Audit as Audit Tool
Note over Dev,Env: Developer initiates a direct scan for secrets
Dev->>+Env: Triggers direct scan
Env->>+DS: Requests scan on the codebase
DS->>DS: Performs secret scanning
DS->>File: Generates new baseline file
File->>DS: Acknowledges file creation
DS-->>-Env: Returns scan results and new baseline file
Env-->>Dev: Presents scan results and new baseline file
Note over Dev,File: Developer may audit the new baseline file
Dev->>Audit: Initiates audit on the new baseline file
Audit->>File: Fetches details from the baseline file
File->>Audit: Returns secret details
Audit-->>Dev: Presents detailed information of detected secrets
|
Configuration files (yaml, baseline file, and plugins) are stored at another repository: https://github.com/NASA-AMMOS/slim-config-detect-secrets
Configuration files (yaml, baseline file, and plugins) are stored at another repository: https://github.com/NASA-AMMOS/slim-config-detect-secrets
@perryzjc has proposed his plugins as PR's to Yelp's Detect Secrets core codebase here. Once those are accepted, we may no longer need to host a separate fork of detect secrets in the future. |
Issue #89: Update documentation for detect-secrets
Resolved. |
NASA-AMMOS/slim#89: Functional GitHub Actions-based secrets detection…
Checked for duplicates
Yes - I've already checked
Best Practice Guide
Continuous Integration
Best Practice Guide Sections
Starter Kits
Describe the improvement
We have some existing recommendations for checking sensitive AWS credential information via using
git-secrets
described here. However, we've received feedback that this could be improved via the following:To support these two needs, we should evaluate if
git-secrets
is the right tool, or if it should be augmented or replaced with a better solution.The text was updated successfully, but these errors were encountered: