Scan package files and extract for packages #1207

Open
wants to merge 1 commit into base: main


@AyanSinhaMahapatra (Member) commented May 6, 2024

In all of the following pipelines:

  • rootfs
  • docker
  • docker-windows
  • scan_codebase

when we scan files for licenses, copyrights and other data, we skip every codebase resource that already has a status before this step, so anything tagged as application-package or system-package is not scanned.
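The skip described above can be illustrated with a minimal, self-contained sketch. It does not use the real ScanCode.io models or pipes: the `Resource` class and the `select_for_scan` helper are hypothetical stand-ins, and the only thing taken from this PR is the pair of status values.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical stand-in for a codebase resource (not the real model)."""

    path: str
    status: str = ""


def select_for_scan(resources):
    """Return only resources without a status, mimicking the current skip."""
    return [resource for resource in resources if not resource.status]


resources = [
    Resource("usr/bin/tool", status="system-package"),
    Resource("app/setup.py", status="application-package"),
    Resource("app/src/main.py"),
]

# Only the untagged file is selected; both package files are left unscanned.
to_scan = select_for_scan(resources)
print([resource.path for resource in to_scan])
```

In this sketch the two package files never reach the scanner, which is the gap this PR is about.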

In the match_not_analyzed_to_system_packages pipe of the rootfs pipeline, we match all codebase resources that are part of a package to the discovered package object and also update their status to system-package. (It seems that earlier we also did this for application packages with the match_not_analyzed_to_application_packages function, but that function is not used anywhere anymore.)

Similarly, in the docker pipelines, the create_system_package function of the collect_and_create_system_packages step updates the status of package files to system-package.

We can either:

  1. stop tagging the status of files which are part of a system-package
  2. or re-scan all package files tagged as system/application package

In this PR I've tried the second approach, since this is what we do in SCTK as well. Here, however, we have to add a new update_status argument and pass it to the function that saves scan data to resources after the scan, so that it does not overwrite the system-package or application-package status of codebase resources with scanned, which was a side effect of the file scans.
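A rough sketch of the update_status idea, for illustration only. The `Resource` class, the `save_scan_results` function, and the shape of `scan_results` are assumptions made for this example and are not the actual code in this PR; only the update_status name and the status values come from the description above.

```python
from dataclasses import dataclass, field

SCANNED = "scanned"  # assumed status value set after a file scan


@dataclass
class Resource:
    """Hypothetical stand-in for a codebase resource (not the real model)."""

    path: str
    status: str = ""
    licenses: list = field(default_factory=list)


def save_scan_results(resource, scan_results, update_status=True):
    """Store scan data; only touch the status when update_status is True."""
    resource.licenses = scan_results.get("licenses", [])
    if update_status:
        resource.status = SCANNED
    return resource


# A package file keeps its existing tag while still receiving scan data.
package_file = Resource("usr/lib/libfoo.so", status="system-package")
save_scan_results(package_file, {"licenses": ["mit"]}, update_status=False)

# A regular file is flagged as scanned, as before.
regular_file = Resource("app/src/main.py")
save_scan_results(regular_file, {"licenses": ["apache-2.0"]})
```

Defaulting update_status to True in this sketch keeps the previous behavior for callers that do not pass the new argument.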

Since all these pipelines already scanned application package files (those that were not metadata files or lockfiles), I'm assuming we also want to scan the metadata files that were not being scanned? Otherwise #762 does not make any sense. Note that the license scan done as part of a package scan (parsing the manifest and then running license detection only on the extracted part) can differ, for some complex files, from a plain license scan of the same file, and we might need to improve how we handle this in SCTK to avoid confusion. See aboutcode-org/scancode-toolkit#3024 for details.

Reference: #762
Reference: #1194
Reference: #83

For rootfs pipelines (rootfs, docker, docker-windows) all package files
which were a part of system packages had their status updated and
consequently were not being scanned for licenses, copyrights, emails and
urls. We were also not scanning package metadata files tagged as application
packages in scan_codebase and the rootfs pipelines. This commit scans all
package files and package metadata files to make sure we are not missing
any information.

Reference: #762
Reference: #1194
Reference: #83
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>

@pombredanne requested a review from tdruez May 7, 2024 14:43

@pombredanne (Member) left a comment

Thanks!
This feature's step should be optional, e.g., enabled through a "group".
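One way to read the suggestion above, sketched in plain Python. The `group` decorator, the step names, and the `run_steps` runner below are all hypothetical and written only for this illustration; they are not the ScanCode.io pipeline API.

```python
def group(*names):
    """Hypothetical decorator that tags a step with one or more group names."""

    def decorator(step):
        step.groups = set(names)
        return step

    return decorator


def scan_unpackaged_files():
    return "scan_unpackaged_files"


@group("package_files")
def scan_package_files():
    return "scan_package_files"


def run_steps(steps, selected_groups=()):
    """Run untagged steps always; run tagged steps only if a group is selected."""
    selected = set(selected_groups)
    executed = []
    for step in steps:
        groups = getattr(step, "groups", set())
        if groups and not (groups & selected):
            continue  # optional step whose group was not requested
        executed.append(step())
    return executed


steps = [scan_unpackaged_files, scan_package_files]
print(run_steps(steps))
print(run_steps(steps, selected_groups=["package_files"]))
```

With this shape, the extra scan of package files would only run when the user opts in by selecting the group.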
