[URI] Do not attempt multiple verification attempts if host is non-resolvable #3656

Open: rgmz wants to merge 1 commit into main from feat/uri-ignore-invalid

Contributor

@rgmz rgmz commented Nov 21, 2024

Description:

The URI detector currently sends HTTP verification requests to every matched host, regardless of whether that host resolves. This wastes network bandwidth and fills the logs with entries like the ones below.

In addition to de-duplicating matches, this PR updates the URI detector to track hosts that are not found and to skip verification for them.

Found unverified result 🐷🔑❓
Verification issue: lookup proxy.example.com: no such host
Detector Type: URI
Decoder Type: PLAIN
Raw result: http://username:[email protected]
Commit: 0dec3cdfe8cbd1c7fd6b5bdd3d8f108d4cc42311
Email: Toan <[email protected]>
File: reactjs.zip
Line: 127
Link: https://github.com/azureossd/Deployment-Oryx-Samples/blob/0dec3cdfe8cbd1c7fd6b5bdd3d8f108d4cc42311/reactjs.zip#L127
Repository: https://github.com/azureossd/Deployment-Oryx-Samples.git
Timestamp: 2020-04-23 01:20:33 +0000

Found unverified result 🐷🔑❓
Verification issue: lookup hostname: no such host
Detector Type: URI
Decoder Type: PLAIN
Raw result: http://username:pass%3Aword@hostname
Commit: 0dec3cdfe8cbd1c7fd6b5bdd3d8f108d4cc42311
Email: Toan <[email protected]>
File: reactjs.zip
Line: 1
Link: https://github.com/azureossd/Deployment-Oryx-Samples/blob/0dec3cdfe8cbd1c7fd6b5bdd3d8f108d4cc42311/reactjs.zip#L1
Repository: https://github.com/azureossd/Deployment-Oryx-Samples.git
Timestamp: 2020-04-23 01:20:33 +0000

Found unverified result 🐷🔑❓
Verification issue: lookup _jabber._tcp.google.com: no such host
Detector Type: URI
Decoder Type: PLAIN
Raw result: http://user:pass@_jabber._tcp.google.com/test
Commit: 0dec3cdfe8cbd1c7fd6b5bdd3d8f108d4cc42311
Email: Toan <[email protected]>
File: reactjs.zip
Line: 7
Link: https://github.com/azureossd/Deployment-Oryx-Samples/blob/0dec3cdfe8cbd1c7fd6b5bdd3d8f108d4cc42311/reactjs.zip#L7
Repository: https://github.com/azureossd/Deployment-Oryx-Samples.git
Timestamp: 2020-04-23 01:20:33 +0000
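
As a rough illustration of the behaviour described above, the following Go sketch remembers hosts whose lookup failed with "no such host" and skips any further verification for them. It is not the code from this PR: `hostNotFoundCache` here is a hand-rolled stand-in for the `simple.NewCache` value in the diff, and `verifyOnce` and `failLookup` are hypothetical names used only for this example.

```go
package main

import (
	"errors"
	"net"
	"sync"
)

// hostNotFoundCache is a hypothetical, minimal set of hosts that failed DNS resolution.
type hostNotFoundCache struct {
	mu    sync.RWMutex
	hosts map[string]struct{}
}

func newHostNotFoundCache() *hostNotFoundCache {
	return &hostNotFoundCache{hosts: make(map[string]struct{})}
}

// Has reports whether host was previously recorded as not found.
func (c *hostNotFoundCache) Has(host string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	_, ok := c.hosts[host]
	return ok
}

// Add records host as not found.
func (c *hostNotFoundCache) Add(host string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.hosts[host] = struct{}{}
}

// isHostNotFound reports whether err wraps a DNS "no such host" failure.
func isHostNotFound(err error) bool {
	var dnsErr *net.DNSError
	return errors.As(err, &dnsErr) && dnsErr.IsNotFound
}

// verifyOnce runs verify for host unless the host is already known not to resolve.
// It returns whether a verification attempt was actually made.
func verifyOnce(cache *hostNotFoundCache, host string, verify func(string) error) (bool, error) {
	if cache.Has(host) {
		return false, nil // skip: an earlier lookup already failed
	}
	err := verify(host)
	if isHostNotFound(err) {
		cache.Add(host)
	}
	return true, err
}

// failLookup simulates a verifier whose DNS lookup always fails (illustration only).
func failLookup(host string) error {
	return &net.DNSError{Err: "no such host", Name: host, IsNotFound: true}
}
```

With this shape, a second call to `verifyOnce` for `proxy.example.com` returns immediately without touching the network, which is the saving the description is after.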

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint; this requires golangci-lint)?

@rgmz rgmz requested a review from a team as a code owner November 21, 2024 23:51
@rgmz rgmz force-pushed the feat/uri-ignore-invalid branch 2 times, most recently from a9b65f5 to 9c2590c Compare November 21, 2024 23:59
Comment on lines +37 to +38

hostNotFoundCache = simple.NewCache[struct{}]()
Contributor Author

I think this cache would benefit from being centralized and shared by all detectors. A domain found not to exist by one detector (Azure Container Registry, Azure OpenAI, JDBC (below), etc.) does not exist for any other detector either.

Found unverified result 🐷🔑❓
Verification issue: lookup your_server.database.windows.net: no such host
Detector Type: JDBC
Decoder Type: PLAIN
Raw result: jdbc:sqlserver://your_server.database.windows.net:1433;
Commit: aa081336863641da6061a892c7304f9823b4e8d6
Commit_source: refs/remotes/origin/pull/2/head (hidden ref)
Email: Brian Gianforcaro <[email protected]>
File: Java/sample_java.java
Line: 11
Link: https://github.com/Azure/azure-sql-database-samples/blob/aa081336863641da6061a892c7304f9823b4e8d6/Java/sample_java.java#L11
Repository: https://github.com/Azure/azure-sql-database-samples.git
Timestamp: 2015-09-28 17:26:38 +0000
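
One way the centralized cache suggested above could look is sketched below in Go. This is only an assumption about a possible design, not something that exists in the repository: `unresolvableHosts`, `normalizeHost`, `MarkUnresolvable`, and `IsUnresolvable` are hypothetical names. The idea is a process-wide set keyed by a normalized host, so that a JDBC result with a port and a URI result without one hit the same entry.

```go
package main

import (
	"net"
	"strings"
	"sync"
)

// unresolvableHosts is a hypothetical process-wide set shared by every detector.
// sync.Map is safe for concurrent use without extra locking.
var unresolvableHosts sync.Map

// normalizeHost drops any port, strips a trailing dot, and lowercases the host
// so that different detectors produce the same key for the same domain.
func normalizeHost(hostport string) string {
	host := hostport
	if h, _, err := net.SplitHostPort(hostport); err == nil {
		host = h
	}
	return strings.ToLower(strings.TrimSuffix(host, "."))
}

// MarkUnresolvable records that a lookup for the host returned "no such host".
func MarkUnresolvable(hostport string) {
	unresolvableHosts.Store(normalizeHost(hostport), struct{}{})
}

// IsUnresolvable reports whether any detector has already seen the host fail to resolve.
func IsUnresolvable(hostport string) bool {
	_, ok := unresolvableHosts.Load(normalizeHost(hostport))
	return ok
}
```

A real shared cache would probably also want expiry, since a host that does not resolve now may resolve later; that is left out here to keep the sketch short.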

)

// Keywords are used for efficiently pre-filtering chunks.
// Use identifiers in the secret preferably, or the provider name.
func (s Scanner) Keywords() []string {
-	return []string{"http"}
+	return []string{"http://", "https://"}
Contributor Author

As far as I can tell, http:// and https:// are more accurate and more efficient keywords: the bare http keyword also matches chunks that mention "http" without containing a URL.
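
To make the difference concrete, here is a small Go sketch of keyword pre-filtering. `matchesAny` is a hypothetical, simplified stand-in using plain substring checks; the real pre-filter is not shown in this thread and may work differently.

```go
package main

import "strings"

// matchesAny reports whether chunk contains at least one keyword.
// It is a simplified stand-in for keyword pre-filtering of chunks.
func matchesAny(chunk string, keywords []string) bool {
	lower := strings.ToLower(chunk)
	for _, kw := range keywords {
		if strings.Contains(lower, kw) {
			return true
		}
	}
	return false
}
```

Under this model, a chunk such as `import "net/http"` passes the old `"http"` keyword and reaches the detector's regex for nothing, while the new `"http://"` and `"https://"` keywords reject it and still accept `http://user:pass@hostname`.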

var _ detectors.CustomFalsePositiveChecker = (*Scanner)(nil)
var _ interface {
detectors.Detector
detectors.CustomFalsePositiveChecker
Contributor

Is there any reason why we ignore all false positives for this detector with custom logic?

Contributor Author

I suspect there were too many results being filtered by the wordlist, so someone decided to do this workaround instead of removing the problematic entries.

See #3246

@rgmz rgmz force-pushed the feat/uri-ignore-invalid branch 2 times, most recently from ecc2700 to 4a0f5c2 Compare December 2, 2024 13:28
@rgmz rgmz force-pushed the feat/uri-ignore-invalid branch 2 times, most recently from ce72986 to 57620fe Compare December 21, 2024 16:00
@rgmz rgmz force-pushed the feat/uri-ignore-invalid branch from 57620fe to c7bf326 Compare December 31, 2024 14:47
@rgmz rgmz force-pushed the feat/uri-ignore-invalid branch from c7bf326 to 41e205e Compare January 11, 2025 18:56