feat: Handle partial uploads and add file exclusion patterns #255

Merged
merged 5 commits into from
Nov 29, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -0,0 +1 @@
## See [cloudrun-malware-scanner/CHANGELOG.md](cloudrun-malware-scanner/CHANGELOG.md).
49 changes: 47 additions & 2 deletions README.md
@@ -63,7 +63,9 @@ CONFIG_JSON: |
"quarantined": "quarantined-bucket-name"
}
],
"ClamCvdMirrorBucket": "cvd-mirror-bucket-name"
"ClamCvdMirrorBucket": "cvd-mirror-bucket-name",
"fileExclusionPatterns": [],
"ignoreZeroLengthFiles": false
}
```

@@ -111,14 +113,57 @@ resource "google_cloud_run_v2_service" "malware-scanner" {
quarantined = "quarantined-bucket-name"
}
]
ClamCvdMirrorBucket = "cvd-mirror-bucket-name"
ClamCvdMirrorBucket = "cvd-mirror-bucket-name",
fileExclusionPatterns = [],
ignoreZeroLengthFiles = false
})
}
}
}
}
```

## Notes on `fileExclusionPatterns`

The `fileExclusionPatterns` array in the config file can be used to ignore any
uploaded files matching a
[Regular Expression](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions).

This is useful, for example, if you have an upload system that creates temporary
files and then renames them once they are fully uploaded.

The elements in the `fileExclusionPatterns` array can either be simple strings,
for example:

```json
"fileExclusionPatterns": [
"\\.tmp$",
"^ignore_me.*\\.txt$"
]
```

or they can be an array of 2 string values, allowing regular expression flags to
be specified, for example `"i"` for case-insensitive matches:

```json
"fileExclusionPatterns": [
[ "\\.tmp$", "i" ],
[ "tempfile.*\\.upload$", "i" ]
]
```
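
As a rough illustration of how the two element forms described above can be
turned into regular expressions, the sketch below compiles each entry with the
standard `RegExp` constructor. The helper name `compileExclusionPatterns` is
hypothetical and is not the scanner's own code; it assumes each entry is either
a string or a `[pattern, flags]` array.

```javascript
/**
 * Hypothetical helper (illustration only): compile fileExclusionPatterns
 * entries into RegExp objects.
 *
 * @param {Array<string | Array<string>>} patterns
 * @return {Array<RegExp>}
 */
function compileExclusionPatterns(patterns) {
  return patterns.map((entry, i) => {
    if (typeof entry === 'string') {
      // Simple form: the pattern only, no flags.
      return new RegExp(entry);
    }
    if (
      Array.isArray(entry) &&
      entry.length >= 1 &&
      entry.length <= 2 &&
      typeof entry[0] === 'string'
    ) {
      // Array form: entry[1] may be undefined, meaning no flags.
      return new RegExp(entry[0], entry[1]);
    }
    throw new TypeError(
      `fileExclusionPatterns[${i}] must be a string or [pattern, flags]`,
    );
  });
}

// In JavaScript source, as in JSON, "\\." denotes the two characters `\.`
const regexps = compileExclusionPatterns(['\\.tmp$', ['\\.TMP$', 'i']]);
console.log(regexps.map((r) => r.toString()));
```

An invalid pattern such as `"("` makes `new RegExp` throw a `SyntaxError`, so a
caller would likely want to catch that and report which entry failed.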

Files matching these patterns are ignored by the scanner and left in the
`unscanned` bucket, and an `ignored-files` counter is incremented.
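
The matching step can be pictured with the minimal sketch below. It is an
assumption about the general shape of such a check, not the scanner's actual
implementation: `isExcluded`, the sample file names, and the local
`ignoredFilesCount` variable (standing in for the `ignored-files` counter) are
all hypothetical.

```javascript
/**
 * Hypothetical check: true if fileName matches any exclusion regexp.
 *
 * @param {string} fileName
 * @param {Array<RegExp>} regexps
 * @return {boolean}
 */
function isExcluded(fileName, regexps) {
  return regexps.some((re) => {
    // Regexps with the "g" or "y" flag keep state between test() calls,
    // so reset lastIndex to make every check start from the beginning.
    re.lastIndex = 0;
    return re.test(fileName);
  });
}

const exclusions = [/\.tmp$/i, /^ignore_me.*\.txt$/];
const uploads = ['report.pdf', 'upload.TMP', 'ignore_me_notes.txt'];

let ignoredFilesCount = 0; // stand-in for the `ignored-files` counter
const toScan = [];
for (const name of uploads) {
  if (isExcluded(name, exclusions)) {
    ignoredFilesCount++; // excluded files are left where they are
  } else {
    toScan.push(name);
  }
}
console.log(toScan, ignoredFilesCount);
```

Here only `report.pdf` would go on to be scanned; the other two names match an
exclusion pattern and are counted as ignored.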

Helpful tools for regular expressions include the
[Regular Expression Cheatsheet](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Cheatsheet),
and the [Regex101](https://regex101.com/r/QK47Hp/1) playground (ensure
ECMAScript flavor is selected).

Note that when adding regular expressions into the config file, care must be
taken with `\` and `"` characters -- any of these characters in the regular
expression must be escaped with another `\`.
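
To make the escaping rule concrete, the sketch below parses a small piece of
config text with the standard `JSON.parse` and shows what the pattern strings
look like afterwards. The config content and the file names tested are made up
for illustration.

```javascript
// The config text exactly as it would appear in the file: every `\` and `"`
// inside a regular expression is preceded by an extra `\`. String.raw keeps
// this JavaScript literal identical to the file contents.
const configText = String.raw`{
  "fileExclusionPatterns": [
    "\\.tmp$",
    "^say \"hi\"\\.txt$"
  ]
}`;

const parsed = JSON.parse(configText);

// After JSON parsing, each escape has been resolved to a single character.
console.log(parsed.fileExclusionPatterns[0]); // \.tmp$
console.log(parsed.fileExclusionPatterns[1]); // ^say "hi"\.txt$

const re = new RegExp(parsed.fileExclusionPatterns[0]);
console.log(re.test('archive.tmp')); // true
console.log(re.test('archive_tmp')); // false: `\.` only matches a literal dot
```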

## Change history

See [CHANGELOG.md](cloudrun-malware-scanner/CHANGELOG.md)
22 changes: 21 additions & 1 deletion cloudrun-malware-scanner/config-env.yaml
@@ -22,6 +22,24 @@
# and can be shared across multiple deployments with the appropriate
# permissions.
#
# "fileExclusionPatterns" is a list of regular expressions. Files matching any
# of these patterns will be skipped during scanning. NOTE: These files will remain
# in the "unscanned" bucket and will need to be tidied and/or managed separately.
#
# Regular expressions can be expressed as simple strings,
# or as an array of 2 strings, the pattern and regexp flags, such as 'i' for case-insensitive matching.
#
# Example:
#
# "fileExclusionPatterns": [
# "\\.filepart$", (Ignore files ending in ".filepart")
# "^ignore_me.*\\.txt$", (Ignore files starting with "ignore_me" and ending with ".txt")
# [ "\\.tmp$", "i" ] (Case-insensitive match for files ending in .TMP, .tmp, .TmP etc.)
# ]
#
# Cheat sheet for regular expressions:
# https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Cheatsheet
#
# Shell environmental variable substitution is supported in this file.
# At runtime, JSON will be written to the file /etc/malware-scanner-config.json.
#
@@ -34,5 +52,7 @@ CONFIG_JSON: |
"quarantined": "quarantined-${PROJECT_ID}"
}
],
"ClamCvdMirrorBucket": "cvd-mirror-${PROJECT_ID}"
"ClamCvdMirrorBucket": "cvd-mirror-${PROJECT_ID}",
"fileExclusionPatterns": [],
"ignoreZeroLengthFiles": false
}
93 changes: 84 additions & 9 deletions cloudrun-malware-scanner/config.js
@@ -19,15 +19,15 @@ const {logger} = require('./logger.js');
const pkgJson = require('./package.json');

/**
* @enum {string}
* @typedef {{
* unscanned: string,
* clean: string,
* quarantined: string,
* }} BucketDefs
*/
const BucketTypes = Object.freeze({
unscanned: 'unscanned',
clean: 'clean',
quarantined: 'quarantined',
});

/** @typedef {{[key in BucketTypes]: string}} BucketDefs */
/** @type {Array<keyof BucketDefs>} */
const BUCKET_TYPES = ['unscanned', 'clean', 'quarantined'];

/**
* Configuration object.
@@ -38,6 +38,9 @@ const BucketTypes = Object.freeze({
* @typedef {{
* buckets: Array<BucketDefs>,
* ClamCvdMirrorBucket: string,
* fileExclusionPatterns?: Array<string | Array<string>>,
* fileExclusionRegexps: Array<RegExp>,
* ignoreZeroLengthFiles: boolean,
* comments?: string
* }} Config
*/
@@ -80,7 +83,7 @@ async function readAndVerifyConfig(configFile) {
let success = true;
for (let x = 0; x < config.buckets.length; x++) {
const bucketDefs = config.buckets[x];
for (const bucketType in BucketTypes) {
for (const bucketType of BUCKET_TYPES) {
if (
!(await checkBucketExists(
bucketDefs[bucketType],
@@ -98,7 +101,7 @@ async function readAndVerifyConfig(configFile) {
logger.fatal(
`Error in ${configFile} buckets[${x}]: bucket names are not unique`,
);
success = false;
// success = false;
}
}
if (
@@ -110,6 +113,78 @@ async function readAndVerifyConfig(configFile) {
success = false;
}

// Validate ignoreZeroLengthFiles
if (config.ignoreZeroLengthFiles == null) {
config.ignoreZeroLengthFiles = false;
} else if (typeof config.ignoreZeroLengthFiles !== 'boolean') {
logger.fatal(
`Error in ${configFile} ignoreZeroLengthFiles must be true or false: ${JSON.stringify(config.ignoreZeroLengthFiles)}`,
);
success = false;
}

// Validate fileExclusionPatterns
config.fileExclusionRegexps = [];
if (config.fileExclusionPatterns == null) {
// not specified.
config.fileExclusionPatterns = [];
} else {
if (!(config.fileExclusionPatterns instanceof Array)) {
logger.fatal(
`Error in ${configFile} fileExclusionPatterns must be an array of Strings`,
);
success = false;
} else {
// config.fileExclusionPatterns is an array, check each value and
// convert to a regexp in fileExclusionRegexps[]
for (const i in config.fileExclusionPatterns) {
/** @type {string|undefined} */
let pattern;
/** @type {string|undefined} */
let flags;

// Each element can either be a simple pattern:
// "^.*\\.tmp$"
// or an array with pattern and flags, eg for case-insensitive matching:
// [ "^.*\\.tmp$", "i" ]

if (typeof config.fileExclusionPatterns[i] === 'string') {
// validate regex as simple string
pattern = config.fileExclusionPatterns[i];
} else if (
config.fileExclusionPatterns[i] instanceof Array &&
config.fileExclusionPatterns[i].length <= 2 &&
config.fileExclusionPatterns[i].length >= 1 &&
typeof config.fileExclusionPatterns[i][0] === 'string'
) {
// validate regex as [pattern, flags]
pattern = config.fileExclusionPatterns[i][0];
flags = config.fileExclusionPatterns[i][1];
} else {
pattern = undefined;
}

if (pattern == null) {
logger.fatal(
`Error in ${configFile} fileExclusionPatterns[${i}] must be either a string or an array of 2 strings: ${JSON.stringify(config.fileExclusionPatterns[i])}`,
);
success = false;
} else {
try {
config.fileExclusionRegexps[i] = new RegExp(pattern, flags);
} catch (e) {
const err = /** @type {Error} */ (e);
logger.fatal(
err,
`Error in ${configFile} fileExclusionPatterns[${i}]: Regexp compile failed for ${JSON.stringify(config.fileExclusionPatterns[i])}: ${err.message}`,
);
success = false;
}
}
}
}
}

if (!success) {
throw new Error('Invalid configuration');
}
25 changes: 23 additions & 2 deletions cloudrun-malware-scanner/config.json.tmpl
@@ -12,10 +12,29 @@
"Shell environmental variable substitution is supported in this file.",
"At runtime, it will be copied to /etc",
"",
"'fileExclusionPatterns' is a list of regular expressions. Files matching any",
"of these patterns will be skipped during scanning. NOTE: These files will remain",
"in the 'unscanned' bucket and will need to be tidied and/or managed separately.",
"Regular expressions can be expressed as simple strings",
"or as an array of 2 strings, the pattern and regexp flags, such as 'i' for case insensitive matching",
""
"",
"Example:",
"",
" 'fileExclusionPatterns': [",
" '\\.filepart$', (Ignore files ending in '.filepart')",
" '^ignore_me.*\\.txt$', (Ignore files starting with 'ignore_me' and ending with '.txt')",
" [ '\\.tmp$', 'i' ], (Case insensitive match for files ending in .TMP, .tmp, .TmP etc)",
" ]",
"",
"Reference and Cheat sheet for regular expressions:",
"https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions",
"https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Cheatsheet",
"",
"As an alternative to including this file in the container the contents can be passed as an environment variable CONFIG_JSON on",
"Cloud Run startup",
"",
"Note: The comments property is optional and can be removed."
"Note: This comments property is optional and can be removed."
],
"buckets": [
{
@@ -24,5 +43,7 @@
"quarantined": "quarantined-bucket-name"
}
],
"ClamCvdMirrorBucket": "cvd-mirror-bucket-name"
"ClamCvdMirrorBucket": "cvd-mirror-bucket-name",
"fileExclusionPatterns": [],
"ignoreZeroLengthFiles": false
}