MeetingMinutes2022
We meet online on Mondays at 16:00 UTC as a reference. See https://www.timeanddate.com/worldclock/meeting.html to get the time in your timezone.
Join us at https://meet.jit.si/AboutCode
The current meeting notes are at:
These are the meeting minutes from all meetings in 2022:
Participants:
- Tushar @tg1999
- Chirag Bablani
- Swastik Sharma
- Omkar
- Hritik
Agenda:
- Packaging issues discovered in scancode.io
Discussion:
- Swastik brought up the breakage of the packaging library in scancode.io: https://github.com/nexB/scancode.io/issues/576
Participants:
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- Akhil @lf32
- Ajay @35C4n0r
- Philippe @pombredanne
- Swastik Sharma
Agenda:
- dark mode in scancode.io
- scantext PR
- nuget inspector
- scancode-toolkit release
Discussion:
- Work on the scancode license detection follow-up PR is almost complete except for a few minor improvements. We should be able to do the final review tomorrow and try to get a beta release out this week.
- More checks in scantext? Tests are also not complete yet. These can be done in the open PR if they are critical; otherwise they can be done in subsequent PRs.
- Dark mode in scancode.io: https://blog.openreplay.com/implementing-dark-mode-with-bulma/ This will also be in vulnerablecode. We can also think about making a common UI repo, as this UI is reused in scancode.io, vulnerablecode and now also in purldb.
- On https://github.com/nexB/fetchcode/issues/64: it would be better to introduce some parameters for this rather than removing the code.
- A minimal first version of nuget-inspector is out. We got some issues reported and now have something that works better, with added support for metadata and target frameworks. PR: nexB/nuget-inspector#9
Participants:
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- Jono Yang @jyang
- Philippe @pombredanne
Agenda:
- license sync script bugs
- scancode release
Participants:
- Tushar @tg1999
- C35
- Chirag
- Philippe @pombredanne
Agenda:
- Chirag is looking for good first issues to work on. Tushar and Philippe pointed out that we may have a few bite-sized good first issues, possibly with https://github.com/nexB/vulnerablecode/issues/597, such as project_kb_msr2019. Reach out online for extra details.
- Philippe is working on a paper and a design for federated data collection and sharing.
- Public release of matchcode: Jono and Philippe are working on it. Likely to be done in the purldb repo.
- Tushar and Philippe discussed VCIO, where we can have conflicting advisory ranges. We may need to keep track of which advisory reports which range.
- On Friday, Philippe went to an event in Brussels https://swforum.eu/events/open-source-workshops-computing-sustainability and met multiple users and possible backers of our projects.
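A minimal Python sketch of keeping track of which advisory reports which range. It assumes each advisory for a single vulnerability is a dict with hypothetical `id`, `purl` and `affected_range` (a vers string) keys. This only illustrates the bookkeeping and is not VulnerableCode's data model.

```python
from collections import defaultdict


def index_ranges_by_advisory(advisories):
    """Map each (purl, affected range) pair to the IDs of advisories reporting it.

    `advisories` is an iterable of dicts with hypothetical keys
    "id", "purl" and "affected_range" (a vers string).
    """
    reported_by = defaultdict(set)
    for advisory in advisories:
        key = (advisory["purl"], advisory["affected_range"])
        reported_by[key].add(advisory["id"])
    return reported_by


def conflicting_ranges(reported_by):
    """Return {purl: ranges} for purls where advisories disagree on the range."""
    ranges_per_purl = defaultdict(set)
    for purl, affected_range in reported_by:
        ranges_per_purl[purl].add(affected_range)
    return {purl: ranges for purl, ranges in ranges_per_purl.items() if len(ranges) > 1}
```

If advisory IDs are kept per range instead of merging ranges early, any conflict can be traced back to the advisories that caused it.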
Participants:
- Tushar @tg1999
- Philippe @pombredanne
- akhil @lf32
Agenda:
- npm PURL parsing: https://github.com/package-url/packageurl-python/pull/106
- Various funding proposals for AboutCode projects
- Fixing bugs in SCTK: https://github.com/nexB/scancode-toolkit/issues/3160
- Test the latest versions of our dependencies to ensure they do not fail at runtime.
- Release a new gem parser to fix https://github.com/nexB/scancode-toolkit/issues/3160
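As background for the npm PURL parsing item, here is a simplified, stdlib-only sketch of how a (possibly scoped) npm PURL splits into namespace, name and version. It drops qualifiers and subpaths and is not the code from the linked PR. Real code should use packageurl-python's `PackageURL.from_string()` instead.

```python
from urllib.parse import unquote


def parse_npm_purl(purl):
    """Split a simple npm PURL into namespace (scope), name and version.

    Simplified sketch: qualifiers ("?...") and subpaths ("#...") are dropped.
    """
    prefix = "pkg:npm/"
    if not purl.startswith(prefix):
        raise ValueError(f"not an npm PURL: {purl!r}")
    remainder = purl[len(prefix):].split("#", 1)[0].split("?", 1)[0]

    # The version follows the last "@" of the final path segment, so a scope
    # written as a literal "@babel" is not mistaken for a version separator.
    version = None
    if "@" in remainder.rsplit("/", 1)[-1]:
        remainder, version = remainder.rsplit("@", 1)
        version = unquote(version)

    namespace = None
    if "/" in remainder:
        namespace, remainder = remainder.rsplit("/", 1)
        namespace = unquote(namespace)  # "%40angular" -> "@angular"
    return {"namespace": namespace, "name": unquote(remainder), "version": version}
```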
Participants:
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- Philippe @pombredanne
Agenda:
- New license detection: a beta release of version 32. This is a major change that will require some changes in scancode.io.
- New license detection:
  - Top-level detection is implemented.
  - Much of the license detection process depends on the resource level.
  - A model object can be deserialized from JSON.
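A minimal sketch of deserializing a model object from JSON so that nested objects are rebuilt instead of left as plain dicts. It uses stdlib dataclasses. The `LicenseDetection` and `Match` classes and their fields are hypothetical and much smaller than scancode's actual models.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Match:
    # Hypothetical, trimmed-down match record.
    license_expression: str
    start_line: int
    end_line: int


@dataclass
class LicenseDetection:
    license_expression: str
    matches: list = field(default_factory=list)

    @classmethod
    def from_dict(cls, data):
        # Rebuild nested Match objects rather than keeping plain dicts.
        matches = [Match(**m) for m in data.get("matches", [])]
        return cls(license_expression=data["license_expression"], matches=matches)

    @classmethod
    def from_json(cls, text):
        return cls.from_dict(json.loads(text))
```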
Participants:
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- omkar @OmkarPh
- Philippe @pombredanne
- @keshavspace
- Jono Yang @jyang
Agenda:
- scancode license PRs, other issues before next release
- nuget inspector dependency resolution
- executable for scancode.io/toolkit
- shared model: experimenting
Participants:
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- Jono Yang @jyang
- Philippe @pombredanne
- @keshavspace
- omkar @OmkarPh
Agenda:
- new repos and releases
- model sync between projects
Discussion:
- New pipeline in scancode.io to check for vulnerabilities
- nexB/purldb with minecode and packagedb:
  - visitors: fetch data from package indexes
  - mappers: transform data into a scancode package model
  - checks whether PURLs actually exist
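The visitor/mapper split described for purldb can be sketched in a few lines of Python. Everything here is an assumption made for illustration: an in-memory dict stands in for a remote package index, and `Package` is a trimmed, hypothetical stand-in for the scancode package model.

```python
from dataclasses import dataclass


@dataclass
class Package:
    # Hypothetical, heavily trimmed stand-in for a scancode package model.
    type: str
    name: str
    version: str
    declared_license: str = ""


def visit_index(index):
    """Visitor: yield raw metadata records from a package index.

    `index` is an in-memory dict standing in for a remote package index:
    {name: {version: raw_metadata}}.
    """
    for name, releases in index.items():
        for version, raw in releases.items():
            yield {"name": name, "version": version, **raw}


def map_record(raw, package_type="pypi"):
    """Mapper: transform one raw record into the package model."""
    return Package(
        type=package_type,
        name=raw["name"],
        version=raw["version"],
        declared_license=raw.get("license", ""),
    )


def purl_exists(package, index):
    """Check that the package name and version are present in the index."""
    return package.version in index.get(package.name, {})
```

With visitors (fetching) kept separate from mappers (transforming), each can be tested on its own. For example, mappers can be tested against saved raw records without network access.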
Participants:
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- Jono Yang @jyang
- Philippe @pombredanne
- keshav @keshavspace
- omkar @OmkarPh
Agenda:
- clarity scoring
- scancode versions/clearlydefined
- unknown references to packages
- matching/repo of scans
- workbench update
Participants:
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- Jono Yang @jyang
- Philippe @pombredanne
- @keshavspace
- omkar @OmkarPh
Agenda:
- vulnerablecode release
- scancode dot release
Discussion:
- PRs to merge for a scancode dot release (31.2.0): see https://github.com/nexB/scancode-toolkit/milestone/16. All PRs there are ready to merge.
- VulnerableCode v30.0.0 is released: see https://github.com/nexB/vulnerablecode/releases/tag/v30.0.0 for details.
- Omkar needs Philippe's review on https://github.com/nexB/scancode-workbench/pull/532 to merge it.
Participants:
- omkar @OmkarPh
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- ziad @ziadhany
- akhil @lf32
- @keshavspace
- Thomas @tdruez
- Philippe @pombredanne
- Steven @majurg
Agenda:
- GSoC updates
GSoC Status:
- omkar's workbench update: dependencies/packages page grouped by type; UI updates on the Packages view
- ziad's update: git importer
- lf32's update: licensetext UI update
- keshav's update: vulntotal benchmarking
Discussion:
- scantext UI update: we should link the license expressions on the left to the text and highlighting on the right; otherwise it is difficult to connect them. One option is to apply the same background color on the left as the text highlight on the right.
- workbench package view: the heading for dependencies without packages should say so clearly, rather than just "other packages". Also show dependencies and packages directly by their PURL, and use the library for parsing and recreating the PURL strings.
- We need to discuss and create a VersionRange class for the GitHub importer in vers after looking at the version range examples in more detail.
Participants:
- omkar @OmkarPh
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- ziad @ziadhany
- akhil @lf32
- @keshavspace
- Thomas @tdruez
- Philippe @pombredanne
- Steven @majurg
Agenda:
- GSoC updates
GSoC Status:
- omkar's workbench update: Created packages > dependencies page (Top level packages overview)
- ziad's update: added support for Rust ranges
- lf32's update: improved the layout for license details
- keshav's update: streamlined the VulnTotal CLI; added JSON and YAML output; added support for grouping vulnerabilities by CVE
Discussion:
- Rust deps requirements doc: https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html
- Demo by Keshav of the VulnTotal CLI: given a PURL on the command line, the tool shows the affected and fixed packages, grouped by CVE.
- Demo by Omkar of the workbench prototype: top-level packages and dependencies, with packages and their dependencies nested on the left and the JSON data shown on the right. (Philippe: this could be YAML, and packages could be shown by their PURL on the left.)
- We also need to return URLs to rules from scancode itself, as it is problematic to build these URLs elsewhere.
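A sketch of the grouping-by-CVE idea from the VulnTotal demo. It assumes each datasource returns records as dicts with hypothetical `datasource`, `cve`, `affected` and `fixed` keys. VulnTotal's actual data structures may differ.

```python
from collections import defaultdict


def group_by_cve(records):
    """Group datasource records by CVE.

    Each record is a dict with hypothetical keys "datasource", "cve",
    "affected" and "fixed" (lists of version strings).
    """
    grouped = defaultdict(lambda: {"datasources": set(), "affected": set(), "fixed": set()})
    for record in records:
        entry = grouped[record["cve"]]
        entry["datasources"].add(record["datasource"])
        entry["affected"].update(record.get("affected", []))
        entry["fixed"].update(record.get("fixed", []))
    return dict(grouped)
```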
Participants:
- Jono @jyang
- omkar @OmkarPh
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- ziad @ziadhany
- akhil @lf32
- @keshavspace
- @tdruez
- Philippe @pombredanne
Agenda:
- GSoC updates
GSoC Status:
- omkar's workbench update: created automatic releases using GitHub Actions and a dropzone for files
- lf32's scancode.io update: highlight license matches
- ziad's update: migrated the Rust importer
Discussion:
- Rust version ranges are not present in univers. They are semver-like, but semver covers versions rather than ranges, so we have to create a Cargo VersionRange for this.
- Can we create a generic version range for most of the semver cases? Not sure whether we can have a generic one.
- Download release archives of the new workbench prototype here: https://github.com/OmkarPh/scancode-workbench/releases/tag/v4.0.0betaPowershell2, give it a try, and report issues and give feedback.
- On the scancode.io license text detection project, we can now highlight matches. A few points for improvement:
  1. Also support overlapping matches.
  2. Separate the template code from views.py.
  3. Fix the details page so that it is for one match.
  4. Make the highlighting continuous, i.e. the stopwords/punctuation/symbols between matched words should be highlighted (but not other unmatched words).
- Discussion on the new LicenseDetection format next Monday.
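For the Cargo VersionRange discussed above, the Rust requirements doc linked earlier defines caret requirements, which are the default when a bare version like `1.2.3` is given. Under that rule, `^1.2.3` means `>=1.2.3, <2.0.0`, `^0.2.3` means `>=0.2.3, <0.3.0` and `^0.0.3` means `>=0.0.3, <0.0.4`. A minimal Python sketch of the rule follows. It ignores pre-release and build metadata and handles only caret requirements (not `~`, `*` or comparison operators). The `vers:cargo/...` rendering is an assumption about how such a range could be expressed and is not univers' API.

```python
def caret_bounds(requirement):
    """Return (lower_inclusive, upper_exclusive) version tuples for a Cargo
    caret requirement such as "^1.2.3" or a bare "1.2.3"."""
    parts = [int(p) for p in requirement.strip().lstrip("^").split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported requirement: {requirement!r}")
    lower = tuple(parts + [0] * (3 - len(parts)))
    # Bump the leftmost non-zero component that was given; if all given
    # components are zero, bump the last one given.
    bump = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = [0, 0, 0]
    upper[:bump] = parts[:bump]
    upper[bump] = parts[bump] + 1
    return lower, tuple(upper)


def caret_vers(requirement):
    """Render the bounds as a vers-style constraint string (sketch only)."""
    lower, upper = caret_bounds(requirement)
    return "vers:cargo/>={}|<{}".format(
        ".".join(map(str, lower)), ".".join(map(str, upper))
    )
```

For example, `caret_vers("1.2.3")` gives `vers:cargo/>=1.2.3|<2.0.0`.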
Participants:
- Jono @jyang
- omkar @OmkarPh
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- ziad @ziadhany
- akhil @lf32
- Thomas @tdruez
- Avishrant @AvishrantSh
Agenda:
- GSoC updates
GSoC Status:
- omkar's workbench update: completed menu actions, added release scripts
- keshav's update: added the snyk.io DataSource and tests for it
- lf32's scancode.io update: created charts for licenses; worked on highlighting matches together in a single text (WIP); worked on the details page (WIP)
Participants:
- Jono @jyang
- omkar @OmkarPh
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
- ziad @ziadhany
- akhil @lf32
- Thomas @tdruez
- Avishrant @AvishrantSh
Agenda:
- GSoC updates
GSoC Status:
- omkar's workbench update: landing page; buttons above the table view are functional; header information is shown on another page. Looking into newer output formats.
- ziad's update: Refactor Gitimporter using fetchcode
- keshav's update: added VulnerableCodeDataSource; added OSS-Index DataSource
- lf32's scancode.io update: worked on highlighting matches together in a single text (WIP)
Discussion:
- Workbench progress reviewed. Landing page UI changes are made, and the buttons above the table view for file/copyright/license/package filters are functional now.
- Started supporting newer output format versions in workbench. Header information is now shown in a new tab. The recent 31.x.x releases have major data structure changes that we still have to support.
- For scantext in scancode.io, a details tab with all the details would be nice, alongside the existing highlight tab, which should merge the different highlighted matches.
- We have to revamp the tabular outputs by deprecating the current CSV output and introducing separate CSV outputs for files/packages/dependencies, and also introduce an XLSX output. See #3043 for more info.
Participants:
- Jono @jyang
- omkar @OmkarPh
- Tushar @tg1999
- Kevin @KevinJi22
- Ayan @AyanSinhaMahapatra
- ziad @ziadhany
- akhil @lf32
- Thomas @tdruez
- steven @majurg
Agenda:
- GSoC updates
- scancode release
GSoC Status:
- omkar's workbench update: worked on column filters for the table view and SQLite imports (history adjustments)
- kevin's scancode update: modified rule and license validation based on Philippe's and Jono's comments
- ziad's update: Refactor Gitimporter using fetchcode
- keshav's update:
- lf32's scancode.io update: fixed nexB/scancode.io#293, failing tests for nexB/scancode.io#450
Discussion:
- Trying to keep a clone and update it is a premature optimization; it is better to clone each time (even CIs do this, as it can be complicated and problematic otherwise).
- We need to use a different color for each matched text, possibly as different backgrounds. To implement this, we have to create a new tokenization function and class that can accommodate multiple matches.
- Added license/package and other column buttons. Some issues: wrapping should be word-based, and there are some empty lines between license keys. The import button could be moved to the left.
- It would be nice to build workbench from source for macOS/Windows/Linux. Example test-and-build workflows to follow for workbench: https://github.com/inveniosoftware/intbitset/blob/master/.github/workflows/test-and-build.yml https://github.com/WojciechMula/pyahocorasick/blob/master/.github/workflows/test-and-build.yml https://github.com/nexB/scancode-toolkit/blob/develop/.github/workflows/scancode-release.yml
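One possible way to show each match in its own color when matches overlap is to cut the text at every match boundary and tag each segment with the indexes of the matches that cover it. This is an assumption, not the scancode.io implementation. The UI can then give each match index its own background, and blend or stack backgrounds where a segment is covered by several matches.

```python
def segment_matches(text, matches):
    """Split `text` into (segment_text, covering_match_indexes) pairs.

    `matches` is a list of (start, end) character offsets, end exclusive.
    Overlapping matches produce segments tagged with several match indexes.
    """
    boundaries = {0, len(text)}
    for start, end in matches:
        boundaries.update((start, end))
    points = sorted(b for b in boundaries if 0 <= b <= len(text))
    segments = []
    for seg_start, seg_end in zip(points, points[1:]):
        covering = [
            i for i, (start, end) in enumerate(matches)
            if start <= seg_start and seg_end <= end
        ]
        segments.append((text[seg_start:seg_end], covering))
    return segments
```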
Participants:
- Jono @jyang
- omkar @OmkarPh
- Tushar @tg1999
- Kevin @KevinJi22
- keshav @keshav-space
- Ayan @AyanSinhaMahapatra
- ziad @ziadhany
- akhil @lf32
- Thomas @tdruez
Agenda:
- GSoC updates
- workbench build
- lf32 issue
- license detection testing
- GSoC evaluation
GSoC Status:
- omkar's workbench update: Filetree customizations, History, Build improvements, Chart view and other minor things
- kevin's scancode update: added validation during index creation time to check new licenses/rules
- ziad's update: npm importer - improver migration
- keshav's update: added a GitHub validator and tests for it; enabled supported ecosystem listing in the CLI
- lf32's scancode.io update: added resource navigation buttons https://github.com/nexB/scancode.io/pull/469; tried to fix highlighting issues
Discussion:
- Hyperlink upstream package repos in lockfiles, see https://github.com/nexB/scancode.io/issues/403#issuecomment-1194318783. The ace editor has limitations for hyperlinks and highlighting, but we cannot replace it because it has useful functions. However, the license detection app should not use or be constrained by the ace editor.
- See examples of building a LicenseIndex from a handful of rules: https://github.com/nexB/scancode-toolkit/blob/develop/tests/licensedcode/test_match.py#L1349
Participants:
- Jono @jyang
- omkar
- Tushar @tg1999
- Kevin @KevinJi22
- Ayan @AyanSinhaMahapatra
- avishrant
- steven @majurg
- ziad
- steven
- lf32
- Thomas @tdruez
Agenda:
- GSoC updates
GSoC Status:
- omkar's workbench update: completed the bar chart section, and fixed SQLite model types
- kevin's scancode update: 1. Added documentation for how to install and use license plugins 2. Expanded the feature to include installing and using rules for new licenses 3. Moved licensedcode_test_utils into src and added documentation for how to use it. The PR is here: nexB/scancode-toolkit#2979
- ziad's update: migrated Ruby to the new importers; added a doctest for FireEye
- lf32's scancode.io update: Working on license view, adding barchart views
Participants:
- Jono @jyang
- omkar
- Tushar @tg1999
- Kevin @KevinJi22
- Ayan @AyanSinhaMahapatra
- ziad
- steven
- lf32
Agenda:
- GSoC updates
- scancode release
GSoC Status:
- omkar's workbench update: Fixed querying issues, worked on path and column selection
- kevin's scancode update: added a CI job that installs licenses and tests license detection
- ziad's update: added the FireEye importer; added a GSD test
- keshav's update: off this week for final exams
- lf32's scancode.io update: improved ui for scancode.io license detection view
Discussion:
- Licensing issue in https://github.com/nexB/vulnerablecode/issues/792; we have to ask Dennis. Also ask about the licensing of the data at https://github.com/mandiant/Vulnerability-Disclosures
- Add license index checks for licenses installed by wheel or from a folder. We could either run these checks when reindexing licenses (if that doesn't take more than 20-30 seconds), or run them separately and recommend running them after adding licenses.
- If we highlight based on start and end lines, it will be easier but less correct: full lines aren't always matched, so partial matches are not displayed accurately. We should organize a UI review session.
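A sketch of highlighting by token positions instead of start/end lines. It includes the "continuous" highlighting asked for earlier, where stopwords and punctuation between matched words are highlighted but other unmatched words are not. The tokenizer regex and stopword list are illustrative assumptions, not scancode's own tokenizer or stopwords.

```python
import re

# Words, or single punctuation/symbol characters (not scancode's tokenizer).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
PUNCT_RE = re.compile(r"[^\w\s]")
# Illustrative stopword list only.
STOPWORDS = {"a", "an", "and", "of", "or", "the"}


def highlight_spans(text, matched_positions):
    """Return (start, end) character spans to highlight.

    `matched_positions` holds the indexes of matched tokens. Gaps between
    matched tokens are bridged when they contain only stopwords or
    punctuation; any other unmatched word closes the current span.
    """
    spans = []
    current = None  # [start, end] of the span being built
    for pos, token in enumerate(TOKEN_RE.finditer(text)):
        word = token.group()
        if pos in matched_positions:
            if current is None:
                current = [token.start(), token.end()]
            else:
                current[1] = token.end()  # extend across any bridged gap
        elif word.lower() in STOPWORDS or PUNCT_RE.fullmatch(word):
            continue  # bridgeable token: keep the current span open
        elif current is not None:
            spans.append(tuple(current))  # an unmatched word ends the span
            current = None
    if current is not None:
        spans.append(tuple(current))
    return spans
```

Because spans are character offsets rather than whole lines, a partial-line match is highlighted exactly. This is the accuracy that line-based highlighting gives up.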
Participants:
- Jono @jyang
- omkar
- lf32
- Keshav
- Tushar @tg1999
- Kevin @KevinJi22
- thomas
- ziad
- steven
- avishrant
- Ayan @AyanSinhaMahapatra
Agenda:
- GSoC updates
GSoC Status:
- omkar's workbench update: Implemented FileTree & path selection (updates are synced across all components on path change) next: implementation of chart views. Would require feedback session with users.
- kevin's scancode update: implemented functionality to use installed external license plugins in license detection
- ziad's update: opened a PR to add a GSD importer: https://github.com/nexB/vulnerablecode/pull/787
- keshav's update: opened PRs for the deps validator and CLI support: https://github.com/nexB/vulnerablecode/pull/789 and to add an OSV validator: https://github.com/nexB/vulnerablecode/pull/788
- lf32's scancode.io update: improved templates for the web app, see https://github.com/nexB/scancode.io/pull/450 for more details and discussions.
Participants:
- Jono @jyang
- omkar
- lf32
- Keshav
- Tushar @tg1999
- Kevin @KevinJi22
- thomas
- ziad
- steven
- avishrant
Agenda:
- GSoC updates
GSoC Status:
- omkar's workbench update: working on the views and data visualization for the TypeScript implementation.
- lf32's scancode.io update: start with simple text and highlighting and put forward a nice UI; don't spend time on supporting binaries just yet. Developed views and tweaked templates, see https://github.com/nexB/scancode.io/pull/450 for more details and discussions.
- Kevin's update on scancode-toolkit external licenses: external licenses are being added successfully from folders; need to upload to PyPI and test that an extra license is detected after being installed from PyPI.
- ziad's update: opened PRs to add support for CWE: https://github.com/nexB/vulnerablecode/pull/782 and to add a PyPA importer: https://github.com/nexB/vulnerablecode/pull/780
- keshav's update: opened PRs for the deps validator and CLI support: https://github.com/nexB/vulnerablecode/pull/789 and to add an OSV validator: https://github.com/nexB/vulnerablecode/pull/788
Participants:
- Jono @jyang
- omkar
- lf32
- Keshav
- Tushar @tg1999
- Kevin @KevinJi22
Agenda:
- External licenses
- Keshav PR
- workbench project
- Vulnerablecode release
- cyclonedx plugin
GSoC Status:
- Kevin is working on https://github.com/nexB/scancode-toolkit/pull/2979, where external licenses are added to the license index. He is now working on getting external licenses installed by wheels: https://github.com/nexB/scancode-toolkit/issues/2994. Open question: how do we find out which plugins have been installed in the first place? Maybe use the entry_points declared in each package's setup.py, but it is unclear how to use that to get all the installed plugins. If plugins are installed, we shouldn't have to specify them with options; we should just use the external licenses.
- Omkar worked on updating all dependencies to their latest versions for the new workbench implementation, managed to get data for the views, and showed a demo. He will work on a UI similar to the current workbench, with table views and the other existing views.
- lf32 has added https://github.com/nexB/scancode.io/pull/450, see discussions and comments for more details, no questions on the call specifically.
- keshav added PR https://github.com/nexB/vulnerablecode/pull/777 to add initial config for vulntotal, to be discussed in detail at https://meet.jit.si/VulnerableCode weekly call.
- ziad having final exams so will start from the following week.
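On Kevin's question of how to find which plugins are installed: the standard library's `importlib.metadata.entry_points()` can list, at runtime, the entry points declared by installed distributions. The group name below is hypothetical. The real group would be whatever scancode-toolkit chooses for external license plugins.

```python
from importlib.metadata import entry_points

# Hypothetical entry point group name; the actual group used by
# scancode-toolkit external license plugins may differ.
LICENSE_PLUGIN_GROUP = "scancode_external_licenses"


def installed_license_plugins(group=LICENSE_PLUGIN_GROUP):
    """Return {name: loaded_object} for every installed distribution that
    declares an entry point in `group` (e.g. via entry_points in setup.py)."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9 return a dict of group -> entry points
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}
```

A plugin package would then declare something like `entry_points={"scancode_external_licenses": ["my_licenses = my_pkg:get_location"]}` in its setup.py (again with hypothetical names). Scancode could then pick up installed plugins without extra command-line options.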
Other Discussion:
- VulnerableCode release process started at https://github.com/nexB/vulnerablecode/pull/776 for 30.0.0rc1 - Tushar and Philippe
- Finished upgrading scancode-toolkit in scancode.io - Jono
- The cyclonedx output is failing because the top-level packages attribute is required, and the cyclonedx output takes all its data from packages. There are two options: 1. make the cyclonedx plugin depend on --packages or --system-packages, so that we show a message if that isn't the case; 2. just add warnings to the CLI and/or to the output file if there are no packages. Option 1 is not feasible because we also run --from-json and then convert to cyclonedx, and there it should only depend on whether the packages top-level attribute is present. - Jono and Ayan
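Option 2 only needs the top-level `packages` attribute, so it works the same for a fresh scan and for `--from-json` input. Below is a minimal sketch with a hypothetical helper name. It returns warning messages that could be printed on the CLI and/or written to the output file.

```python
def packages_for_cyclonedx(scan):
    """Return (packages, warnings) for a scan results mapping.

    Only the top-level "packages" attribute is inspected, so the check is
    the same whether the scan was just run or loaded from JSON.
    """
    packages = scan.get("packages") or []
    warnings = []
    if not packages:
        warnings.append(
            "No top-level packages found in the scan: "
            "the CycloneDX output will contain no components."
        )
    return packages, warnings
```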
Participants:
- Philippe @pombredanne
- Jono @jyang
- Thomas @tdruez
- omkar
- lf32
- Keshav
- Tushar @tg1999
- Kevin @KevinJi22
- Steven @majurg
Agenda:
- lf32 - Questions about scancode.io and PR
- Philippe - Status on SCTK build issues and resolution, release update.
Discussion:
- The scancode configure script has been updated to run on MacBooks with Apple Silicon. The modifications involved include rerunning scancode using Rosetta via bash when the configure script detects you are running on the native ARM terminal.
- A strange bug in the PyPI package parser has been fixed, where the PyPI EndToEnd test would fail seemingly at random. This bug was due to differences in the walk() function used in Azure's version of Python.
- lf32 had questions about how his project should be displayed/presented in scancode.io. He was wondering whether each license text to be scanned should have its own project, as is the current usage in scancode.io. Philippe suggests something simpler, where license text scanning would be more like using Google Translate: something that just shows the results and does not keep them around for long.
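The minutes do not say exactly how the walk() ordering bug was fixed. A common remedy for order-dependent tests is to sort what `os.walk()` returns, because the order of directory entries depends on the OS and filesystem. The sketch below shows this with a hypothetical `sorted_walk()` helper.

```python
import os


def sorted_walk(top):
    """os.walk() with a stable, platform-independent traversal order.

    Sorting `dirs` in place (top-down walk only) also fixes the order in
    which os.walk() descends into subdirectories; files are sorted too.
    """
    for root, dirs, files in os.walk(top):
        dirs.sort()
        yield root, dirs, sorted(files)
```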
Participants:
- Philippe @pombredanne
- Jono @jyang
- Thomas @tdruez
- omkar
- lf32
- Keshav
- catalyn
- Alexander
- Ernest
- Hritik
- Tushar @tg1999
- Ayan @AyanSinhaMahapatra
Agenda:
- workbench with omkar
- questions from lf32
- using latest features of scancode in scancode.io
- release of vulnerablecode
- test issue suite vulnerablecode
- Kubernetes
Discussion:
- Report attached, generated from the vulnerablecode importers; issue: https://github.com/nexB/vulnerablecode/issues/755. We still don't have bower support, but we query npm instead. It would be interesting to see which CVE/PURL mappings we don't have, and why. It would also be nice to attach a CSV.
- Tushar tested and verified all the data sources; the release could potentially be out this week. Still pending: proper support of native code on Apple M1/Intel, and scancode.io system package support.
- https://github.com/OmkarPh/workbench-prototype has some of the experiments porting to TypeScript. Need to use the skeleton/other files from workbench. All the dependabot PRs should be closed in favour of the single PR by Omkar that updates packages.
- We don't have a solution yet: we need to recreate the get_installed_packages functions so that we can seamlessly get package files in the case of system packages.
Participants:
- Philippe @pombredanne
- Jono @jyang
- Thomas @tdruez
- omkar
- Tushar @tg1999
- Hritik
- Keshav
- Ayan @AyanSinhaMahapatra
- lf32
Agenda:
- vulnerablecode release
- pending issues in univers
- system packages scanning porting broken
- Bazel PR from Philippe
- talk submission
Discussion:
- GSoC kickoff presentation: https://drive.google.com/file/d/1_KLaAVWVbQeEeGxkSuqwWfy2xeNhv4G5/view?usp=sharing. Will have discussions on the proposals next week.
- https://github.com/bazelbuild/rules_docker/pull/2065, a PR by Philippe, was merged in Bazel to add the md5sums files list to distroless containers.
- We need to check out alternative virtual filesystem implementations supporting different storages and options to standardise this across our tools. Some existing implementations: https://github.com/PyFilesystem/pyfilesystem2 and https://github.com/fsspec/filesystem_spec
- Tushar and Philippe to sync on submitting a talk for Open Source Summit (North America 2022), and maybe also ask for an extension. The deadline is tomorrow.
- Semver issues in univers, see https://github.com/nexB/univers/issues/74 and https://github.com/nexB/univers/pull/69 for more info.
Participants:
- Philippe @pombredanne
- Jono @jyang
- Thomas @tdruez
- omkar
- Tushar @tg1999
- Hritik
- Keshav
- Ayan @AyanSinhaMahapatra
- Alexander
- Ernest
- lf32
Agenda:
- gsoc
- composer versions
- scancode-toolkit release
- scancode.io reuse
- vulnerablecode release
- holder normalisation in summary @jono
- scancode kubernetes packaging
Discussion:
- https://github.com/xerrni/scancode-kube has been created to deploy scancode.io with Kubernetes. Please give it a try. Also added https://github.com/nexB/scancode.io/pull/442 to link to it.
- Holders are normalized from a list, and because of this the original detections can't be referenced correctly. We could remove suffixes when tallying holders here and just use company names. See https://github.com/nexB/scancode-toolkit/issues/2972 for more details.
- VulnerableCode should have a new release with all the importer-improvers newly ported to the new model; it should be ready by next week. Some problems occur because the Red Hat API is rate limited; this is fixed by https://github.com/nexB/vulnerablecode/pull/757. The per-page number can safely be 1000 instead of 10000 here.
- scancode.io docker pipeline is failing because of missing functions in scancode get_installed_files. The codebase/resource model is being reimplemented to be a list of paths, to better facilitate getting a list of files from scancode to scancode.io and assigning files to packages.
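A sketch of the holder idea above: strip corporate suffixes only when tallying, and keep the original holder strings so the original detections can still be referenced. The suffix list is illustrative and is not scancode's normalization list.

```python
import re
from collections import Counter

# Illustrative list of corporate suffixes; not scancode's actual list.
SUFFIX_RE = re.compile(
    r"[,\s]+(inc|incorporated|ltd|limited|llc|gmbh|corp|corporation|co)\.?$",
    re.IGNORECASE,
)


def company_name(holder):
    """Strip trailing corporate suffixes, e.g. 'Acme Co., Ltd.' -> 'Acme'."""
    name = holder.strip()
    while True:
        stripped = SUFFIX_RE.sub("", name)
        if stripped == name:
            return name
        name = stripped


def tally_holders(holders):
    """Count holders by company name while keeping the original strings."""
    counts = Counter()
    originals = {}
    for holder in holders:
        key = company_name(holder)
        counts[key] += 1
        originals.setdefault(key, []).append(holder)
    return counts, originals
```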
Participants:
- Philippe @pombredanne
- Jono @jyang
- omkar
- Tushar @tg1999
- Hritik
- Keshav
- Ayan @AyanSinhaMahapatra
- lf32
Agenda:
- Severity models in vulnerablecode
- PR#69 univers
- New approach commoncode
Discussion:
- Severity is tightly related to a reference, but not to the vulnerability. Is there a case where a reference lists more than two vulnerabilities? A reference only exists in the context of a vulnerability; if a reference is reused across vulnerabilities, that doesn't work: it can have the same values, but it isn't the same instance. Severity shouldn't be an attribute of a reference; instead, it should exist directly at the vulnerability level.
- Should https://github.com/nexB/univers/pull/69#discussion_r870417205 be put in a different issue? Nginx shouldn't be used as the base for testing; maybe use npm/nuget/maven/pypi instead. Also, don't make up test cases: real examples are always better.
- Codebase/Resource model has been changed such that codebase now holds a flat dictionary of paths. This enables us to use the new package models in scancode.io by passing the files for a package in a codebase scan. Also making all the related functions for traversing the tree return a list always, and an empty one if it doesn't find anything, essentially not failing. 31.0.0b1 released, feedback needed here.
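The severity modeling conclusion can be illustrated with plain dataclasses. Here the same reference object is shared by two vulnerabilities that carry different scores, which could not be represented if severity lived on the reference. Class and field names are hypothetical and are not VulnerableCode's Django models. Each severity keeps the URL of the reference that reported it, which preserves the link between severity and reference.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reference:
    # A reference is just a pointer (e.g. an advisory URL). It can be shared
    # by several vulnerabilities, so it carries no severity itself.
    url: str


@dataclass(frozen=True)
class Severity:
    system: str      # scoring system, e.g. "cvssv3.1"
    value: str
    source_url: str  # URL of the reference that reported this score


@dataclass
class Vulnerability:
    vulnerability_id: str
    references: list = field(default_factory=list)
    severities: list = field(default_factory=list)  # severity lives here
```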
Participants:
- Aditya
- Philippe @pombredanne
- Jono @jyang
- omkar
- Tushar @tg1999
- Priya
- Keshav
- Ayan @AyanSinhaMahapatra
Agenda:
- scancode release
- LicenseDetection
- gemversion in univers
- PR#726 in vulnerablecode
- scancode issue
Discussion:
- https://github.com/nexB/vulnerablecode/pull/726 from threatrix.io fixes an infinite loop bug
- https://github.com/nexB/scancode.io/issues/409: we systematically ignore some files/directories (based on the distro/type of rootfs). These are not scanned and are tagged as uninteresting from an origin/license/security perspective. scancode.io needs documentation on what is tagged as uninteresting and not scanned.
- https://github.com/nexB/univers/pull/69 was created in univers to patch gemversion; need to revisit constraint validation as there could be duplication.
- scancode has beta releases out on PyPI and as GitHub releases. This is now automated: builds are made and published on pushing tags. It is a bit more complicated than the initial implementation in univers done by Tushar to push to PyPI, as scancode has to be released as archives for the different supported platforms. Will incorporate bugfixes and release a major version soon.
- We should have two file-level buckets, one for LicenseDetections and the other for clues. The LicenseDetection list would be in the licenses list that had LicenseMatch objects before, and there would be another list (possibly license-clues) which would be a list of LicenseMatch objects from which LicenseDetections couldn’t be created. There should also be some rules which would be tagged as clues, which are not detections of a License in the proper sense but could be a reference to a license potentially (like say a link to ghostscript website is a clue, not a detection)
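A sketch of the two file-level buckets. It assumes matches have already been grouped and that a hypothetical `can_make_detection` predicate decides whether a group yields a proper LicenseDetection. Matches from rules tagged as clues always go to the clues bucket. All names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class LicenseMatch:
    rule_identifier: str
    license_expression: str
    is_clue: bool = False  # rule tagged as a clue, e.g. a bare project URL


def split_detections_and_clues(match_groups, can_make_detection):
    """Sort groups of matches into two file-level buckets.

    `match_groups` is a list of lists of LicenseMatch (already grouped, e.g.
    by proximity); `can_make_detection` is a hypothetical predicate that
    decides whether a group of matches yields a proper LicenseDetection.
    """
    detections, clues = [], []
    for group in match_groups:
        real = [m for m in group if not m.is_clue]
        clues.extend(m for m in group if m.is_clue)
        if real and can_make_detection(real):
            detections.append(real)
        else:
            clues.extend(real)
    return {"license_detections": detections, "license_clues": clues}
```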
Participants:
- Philippe @pombredanne
- Jono @jyang
- Tushar @tg1999
- hritik
- Keshav
- Ayan @AyanSinhaMahapatra
- steven @majurg
- avishrant
Agenda:
- Scancode released betas and status
- scancode.io sync for the package models
- gsoc mentors
- vulnerablecode PR
- package resources fix assigning to for_packages
- conclusions wrt workbench
- release build automation
- storing version ranges
Discussion:
- conclusions: manually edit scans and conclude results based on research. Was an experiment in workbench. Whether something needs reviews or not, and having concluded/review status. We want to tag things that should be reviewed manually and things that don't need any review. This should be in scancode.io and should be removed from workbench as it's confusing there.
- Vulnerability ID: should we use the UUID, the VULCOID, or both, and what should be done here? A vulnerability ID is needed; it used to be a UUID, and we should move away from this. We shouldn't worry too much about vulnerablecode instances and should focus on releases instead. We should be able to query by vulnerability ID and aliases.
- Bugs are present in assigning resources to a package in various ecosystems (npm, pypi etc.); we have to report and fix these.
- scancode released: please run a small scan on different kinds of macOS/Windows systems if available, and report any problems.
- Releases vs tags: releases can be automated but require manual intervention, while tags are much easier. Should we build and publish to PyPI on tag and make the release optional, or release on tag automatically?
Participants:
- Philippe @pombredanne
- Jono @jyang
- Tushar @tg1999
- hritik
- Viraj Dhanushka
- lf32
Agenda:
- Dependabot PR to update Django
- Fetchcode
- Model Changes
- Release of scancode toolkit
Discussion:
- Merging PRs in scancode toolkit, release is very near
- Model changes on Package License
- Automate release of all the projects
- PR to update Django, we are not using the vulnerable part of Django, we are using hardcoded values. It's a dot version, so we are updating it
- We will be using fetchcode for cloning git repos, we are not supporting incremental clones as of now, but this should be done in future as an enhancement
- Needed review on https://github.com/nexB/vulnerablecode/pull/667
Participants:
- Ayan @AyanSinhaMahapatra
- Philippe @pombredanne
- Jono @jyang
- Tushar @tg1999
- hritik
- Viraj Dhanushka
- lf32
- Keshav
- Omkar
Agenda:
- GSoC proposals
- package files status and release
- vulcoid
- gsoc lf32 scan single license test
- viraj conclusions scancode.io project
- vulnerablecode advisory
- openssl prs raised
Discussion:
- Issue: https://github.com/nexB/vulnerablecode/issues/695
  We tried to shorten the VULCOID; does this work?
  VULCOID-srQji2x-4McmdRSewqgI5Q==
  Also see https://pypi.org/project/shortuuid/. We want to have numbers that resemble CVE IDs but are our own. It could be a hash, but that is not very readable. We could also shorten a hash.
- This year in GSoC there are large/medium projects, in terms of how many hours the participant will spend on the project. This has to be discussed and selected carefully based on how much time the participant plans to spend and how much time the project requires. Also new this year, project timelines can be extended based on discussions between the mentor and the participant if the situation arises.
- Conclusions are the process of reviewing detections (license, copyright, and by extension any detected results) for possible errors, so the UI/models should enable this review workflow and process.
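The example VULCOID above appears to be 16 bytes (the size of a UUID) in URL-safe base64 with its `==` padding. Assuming that is how it was built, the stdlib sketch below drops the padding to get a 22-character suffix that still round-trips to the UUID. It uses `base64` rather than shortuuid.

```python
import base64
import uuid

PREFIX = "VULCOID-"


def short_vulcoid(value=None):
    """Encode a UUID (random if not given) as VULCOID- plus 22 base64 chars."""
    if value is None:
        value = uuid.uuid4()
    encoded = base64.urlsafe_b64encode(value.bytes).decode("ascii").rstrip("=")
    return PREFIX + encoded


def vulcoid_to_uuid(vulcoid):
    """Decode a VULCOID back to its UUID, with or without "==" padding."""
    if not vulcoid.startswith(PREFIX):
        raise ValueError(f"not a VULCOID: {vulcoid!r}")
    encoded = vulcoid[len(PREFIX):].rstrip("=")
    return uuid.UUID(bytes=base64.urlsafe_b64decode(encoded + "=="))
```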
Participants:
- Aditya
- Alexander
- Ayan @AyanSinhaMahapatra
- Chaitanya
- Ernest
- Philippe @pombredanne
- Jono @jyang
- Keshav
- Sujit
- Tushar
Agenda:
- Kubernetes presentation
- Jono tests
- Chaitanya GSoC
- GSoC proposal aditya
- status package files and how it'll go to scancode.io
- vulnerablecode
Discussion:
- Kubernetes presentation slides link
- Will there be improvements for Alpine and APKBUILD files? Yes, in terms of general
- Compare regex and other data structures by time complexity, e.g. a Rust implementation of Aho-Corasick.
- Proper testing on nginx is required. Reaching a stable state for vulnerablecode is required, as people are interested in vulnerablecode and CPEs. Some semantics were modified. The code that fetches package versions from APIs is super complicated and difficult to test and mock; we shouldn't do premature optimization there.
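For the regex vs. data structure comparison, here is a minimal pure-Python Aho-Corasick sketch. It is illustrative only: it is neither the pyahocorasick library nor a Rust crate, and the function names are hypothetical. Building the automaton is linear in the total pattern length, and a search is linear in the text length plus the number of matches, however many patterns there are, whereas a large alternation in a backtracking regex engine such as Python's `re` may try many alternatives at each position.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def build_automaton(patterns: Iterable[str]):
    """Return (goto, fail, out) tables of an Aho-Corasick automaton."""
    goto: List[Dict[str, int]] = [{}]  # state 0 is the root
    fail: List[int] = [0]
    out: List[List[str]] = [[]]
    # 1. Insert every pattern into a trie.
    for pattern in patterns:
        state = 0
        for char in pattern:
            if char not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][char] = len(goto) - 1
            state = goto[state][char]
        out[state].append(pattern)
    # 2. Compute failure links breadth-first; depth-1 states fail to the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for char, child in goto[state].items():
            queue.append(child)
            fallback = fail[state]
            while fallback and char not in goto[fallback]:
                fallback = fail[fallback]
            fail[child] = goto[fallback].get(char, 0)
            # A state also reports the patterns of its failure state.
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def find_all(text: str, automaton) -> List[Tuple[int, str]]:
    """Return (start offset, pattern) for every match, including overlaps."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for index, char in enumerate(text):
        while state and char not in goto[state]:
            state = fail[state]
        state = goto[state].get(char, 0)
        for pattern in out[state]:
            matches.append((index - len(pattern) + 1, pattern))
    return matches


if __name__ == "__main__":
    automaton = build_automaton(["he", "she", "his", "hers"])
    print(find_all("ushers", automaton))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Timing this against a `re.compile("|".join(...))` alternation on a realistic pattern set would be one way to run the comparison discussed above.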
Participants:
- Akshat
- Avishrant
- Ayan @AyanSinhaMahapatra
- Chaitanya
- Hritik
- Philippe @pombredanne
- lf32
- Jono @jyang
- Keshav
- Steven
- Thomas
- Tushar
Agenda:
- vulnerablecode cpe
- nvd reporter
- license text detection webapp
- summary plugin scancode
- vulnerablecode prs
- mypy integration
- package files
- resource codebase
Discussion:
- CPEs are important: how should we reference them in our models? We need to map purls to CPEs to add them as packages, which is unsolved. We could add them as references, which seems okay, but we don't have any URLs associated with CPEs, which is a problem.
- NVD data doesn't have any affected packages.
- Inferring a PURL from a CPE (GSoC project): do some searching, build a mapping, and build an improver.
- Originally, the summary plugin would look at all license expressions, authors, copyrights, languages, etc. To deduplicate the contents of the primary license and summary plugins, we need to access data from other plugins, so should we have a single plugin? If we still want this summarization in the toolkit, it makes sense to merge the license-clarity-score and summary plugins. For the classify plugin, there should still be separate functions.
- Resource paths should be optimized for speed. It is currently difficult to find a resource by path: we have to walk the codebase and check paths. We need to be able to find a path quickly.
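As a starting point for the CPE discussion, here is a minimal sketch that parses a CPE 2.3 formatted string and infers a Package URL through an explicit lookup table, i.e. the "build some mapping" approach. The `KNOWN_MAPPINGS` table and the function names are hypothetical, and the purl types in it are only examples. A real improver would need a curated mapping and should build purls with packageurl-python to get proper percent-encoding.

```python
import re
from typing import Dict, Optional, Tuple

# Component names of a CPE 2.3 formatted string, after the "cpe:2.3:" header.
CPE_FIELDS = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)

# Hypothetical, hand-curated (vendor, product) -> (purl type, namespace, name) table.
KNOWN_MAPPINGS: Dict[Tuple[str, str], Tuple[str, Optional[str], str]] = {
    ("openssl", "openssl"): ("generic", None, "openssl"),
    ("nginx", "nginx"): ("generic", None, "nginx"),
}


def parse_cpe23(cpe: str) -> Dict[str, str]:
    """Split a CPE 2.3 formatted string on unescaped colons into named fields."""
    parts = re.split(r"(?<!\\):", cpe)  # escaped "\:" stays inside a field, unprocessed
    if len(parts) != 13 or parts[:2] != ["cpe", "2.3"]:
        raise ValueError(f"Not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE_FIELDS, parts[2:]))


def infer_purl(cpe: str) -> Optional[str]:
    """Return a purl string for a known vendor/product, or None if unmapped."""
    fields = parse_cpe23(cpe)
    mapping = KNOWN_MAPPINGS.get((fields["vendor"], fields["product"]))
    if mapping is None:
        return None  # unsolved case: the CPE could be kept as a plain reference
    purl_type, namespace, name = mapping
    # No percent-encoding here; packageurl-python would handle that properly.
    purl = f"pkg:{purl_type}/" + (f"{namespace}/" if namespace else "") + name
    # "*" (ANY) and "-" (NA) carry no concrete version.
    if fields["version"] not in ("*", "-"):
        purl += f"@{fields['version']}"
    return purl
```

For example, `infer_purl("cpe:2.3:a:openssl:openssl:1.1.1k:*:*:*:*:*:*:*")` returns `"pkg:generic/openssl@1.1.1k"`, while an unmapped vendor/product returns `None`.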
Participants:
- Ayan @AyanSinhaMahapatra
- Philippe @pombredanne
- Aditya
- Avishrant
- Jono @jyang
- Keshav
- Tushar
Agenda:
- scancode release/packages
- gsoc sessions
- scancode-toolkit summary plugins deprecation
- openssl PR
- false positives/analyzer
- packages in scancode
Discussion:
- Ayan and Philippe: The LicenseDetection implementation has some more accurate detection-level merging (for example, "see license" and unknown intros), which is more accurate than other cases such as merging matches into a detection when the detection is inaccurate. Should these analyzer features still be moved into scancode-toolkit proper? Yes, unless they rely on large machine learning models with more expensive computations and larger requirements. The parts of the analyzer that get unique detections out of all the detections should also be moved into scancode proper, even though this is slow. See [RFC False Positives Issue](https://github.com/nexB/scancode-toolkit/issues/2878#issuecomment-1079639973) for more background on this issue.
- Work on packages is being completed; it should handle package files satisfactorily for both system packages and application packages. RPM and Alpine are supported; Debian is complicated, but has been added with some warts. This needs review tomorrow.
- See https://github.com/nexB/scancode-toolkit/issues/2842#issuecomment-1041910505, where the deprecation of various summary plugins was discussed, in favour of making some options default and also making the primary license a default. We will now add deprecation messages, which will be displayed, added to the headers, and also printed to stderr for people testing for that in their tests. This gives people a heads up that these options are being deprecated in favour of default options and better summarization with the primary license.
- A data migration should be done for the PostgreSQL md5 index issue, see [here](https://github.com/nexB/vulnerablecode/pull/653). Also, computing a single md5 over the three strings instead of having three separate fields would not be any less expensive in computation or memory, and would make some cases more complex, so there is no point doing that.
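The deprecation plan (a message both in the scan headers and on stderr) could look roughly like this. It is a hypothetical sketch: the option names, the `warnings` key in the header dict, and the message wording are assumptions for illustration, not scancode-toolkit's actual CLI code.

```python
import sys
from typing import Dict, List, TextIO

# Hypothetical deprecated options mapped to what replaces them.
DEPRECATED_OPTIONS = {
    "--legacy-summary": "the default summary",
    "--legacy-summary-details": "the default summary with the primary license",
}


def report_deprecations(
    used_options: List[str], headers: Dict, stderr: TextIO = sys.stderr
) -> List[str]:
    """
    Return deprecation messages for the deprecated options in `used_options`,
    printing each one to `stderr` and appending them to headers["warnings"].
    """
    messages = []
    for option in used_options:
        replacement = DEPRECATED_OPTIONS.get(option)
        if replacement is None:
            continue
        messages.append(
            f"{option} is deprecated and will be removed in a future release; "
            f"use {replacement} instead."
        )
    for message in messages:
        # stderr: visible to interactive users and to tests that check stderr.
        print(f"Warning: {message}", file=stderr)
    # Headers: visible to anyone reading the scan results later.
    headers.setdefault("warnings", []).extend(messages)
    return messages
```

Python's `warnings` module with `DeprecationWarning` is another option, but such warnings are hidden by default unless triggered from `__main__`, which is why this sketch prints explicitly.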
Participants:
- Ayan @AyanSinhaMahapatra
- Philippe @pombredanne
- Hritik
- Aditya
- Avishrant
- Keshav
- Nashit10
- Tushar
Agenda:
- scancode release/packages
- gsoc sessions
- npm licenseref-LICENSE
- pypi osv
- openssl PR
Discussion:
- Philippe: working on some changes in packages, specifically model/class simplification; will push a branch for review soon.
- Responsibilities for the GSoC session and slides were divided: scancode: Philippe and Ayan; vulnerablecode: Hritik; fetchcode, vers and packageurl: Tushar. There will be a projects page and an ideas page with one or two ideas. "What to expect" pages: Ayan and Tushar. The session could be recorded tomorrow.
- https://github.com/nexB/scancode-toolkit/issues/2872 has to be fixed in code, and not by rules.
Participants:
- Ayan @AyanSinhaMahapatra
- Philippe @pombredanne
- Jono Yang @jyang
Agenda:
- scancode release
- gsoc sessions
- packages
- summary plugin
Discussion:
- We should make classify a default option only after file info is also always enabled.
- Support for system packages (i.e. Debian, Alpine, RPM) is also being added to the new package model before this release.
- The 31 release prep PR https://github.com/nexB/scancode-toolkit/pull/2888 is merged.
- There should be a GSoC session on Thursday.
Participants:
- Aditya @adii21-Ux
- Avishrant @AvishrantsSh
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Philippe @pombredanne
- Keshav Space @keshav-space
- Tushar Goel @TG1999
Agenda:
- scancode release
- gsoc sessions
- vulnerablecode status update
- openssl issues
Discussion:
- Some openssl data has no CVE; this is likely a one-off case. A CVE is no longer mandatory in vulnerablecode: it's just an important field, and it is now an alias. There is a vulnerability ID for the openssl advisory, but that is all it is, and there could be multiple advisories in a day. If we had an alias for this, like openssl-20141015-CVE-CVENAME, the ones without a CVE would just omit that part, e.g. OPENSSL-20141015 and OPENSSL-20141015-CVE-2001-3567. See also https://www.openssl.org/news/secadv/, where multiple advisories on the same day get -2 appended to the ID, like 20101116 and 20101116-2.
- Tushar is working on the GitHub importer, which should be coming soon. An OSV importer for Python is also being worked on.
- A scancode beta release is coming soon. One issue being worked on is building native code for one of the dependencies, which has its share of rabbit holes.
- GSoC sessions at two times: maybe one in the morning Pacific time, and one in the early afternoon India time.
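The openssl alias scheme proposed above can be written down as a small function. This is a sketch under assumptions: the function name is hypothetical, the uppercase `OPENSSL-<date>[-<CVE>]` layout follows the examples in these notes, and placing the same-day `-2` suffix before the CVE part is a guess, not a decided format.

```python
import re
from typing import Optional

DATE_RE = re.compile(r"^\d{8}$")
CVE_RE = re.compile(r"^CVE-\d{4}-\d{4,}$")


def get_openssl_alias(advisory_date: str, sequence: int = 1, cve: Optional[str] = None) -> str:
    """
    Return an alias such as OPENSSL-20141015 or OPENSSL-20141015-CVE-2001-3567.

    `advisory_date` is the YYYYMMDD date of the openssl advisory, `sequence` > 1
    marks a later advisory on the same day (like 20101116-2), and `cve` is
    optional since not every openssl advisory entry has a CVE.
    """
    if not DATE_RE.match(advisory_date):
        raise ValueError(f"Expected a YYYYMMDD date, got: {advisory_date!r}")
    alias = f"OPENSSL-{advisory_date}"
    if sequence > 1:
        alias += f"-{sequence}"  # assumed position for the same-day suffix
    if cve:
        cve = cve.upper()
        if not CVE_RE.match(cve):
            raise ValueError(f"Not a CVE id: {cve!r}")
        alias += f"-{cve}"
    return alias
```

For example, `get_openssl_alias("20101116", sequence=2)` gives `OPENSSL-20101116-2`, and passing `cve="CVE-2001-3567"` with `"20141015"` gives `OPENSSL-20141015-CVE-2001-3567`.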
Participants:
- Alexander
- Aditya @adii21-Ux
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Jono @jyang
- Purna
- Philippe @pombredanne
- Keshav Space @keshav-space
- Sanket
- Tushar Goel @TG1999
Agenda:
- scancode devel problems
- alpine importers
- improvers doc
- scancode.io alexander
- debian inspector bug
- vulnerablecode
- false positives
- gsoc
Discussion:
- Philippe/Jono: There was https://github.com/nexB/debian-inspector/issues/25. This was caused by a spelling variant of the license field name (it's "License" in the spec). Whether we should always accept both spellings of license in the index is not clear, but at least this should not fail right away, and the paragraph should be handled as a usual license paragraph.
- We need to have a public deployment of vulnerablecode, but there's no need to rush it unnecessarily, as we want to make sure it's robust and accurate.
- Just do the migration first: 1. enable more importers once the pysec importer for OSV is merged; 2. we can make it generic later, as OSV has been interpreted differently. Even though the OSV format includes a Package URL, it isn't present in the GitHub data. We may need to handle different data formats and sometimes import twice; this is not ideal, but being able to handle these differences is a plus for vulnerablecode.
- A plan for false positive detections: https://github.com/nexB/scancode-toolkit/issues/2878
- GSoC project idea priorities should be discussed; a meeting time was set to make the project ideas page https://github.com/nexB/aboutcode/wiki/GSOC-2022 more ready.
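One way to make the parser tolerant, as suggested for the debian-inspector issue, is to normalize field names before looking them up. This is a minimal sketch, not debian-inspector's API: `FIELD_ALIASES` and the function names are hypothetical, and treating "licence" as the accepted variant of "license" is an assumption about which misspelling was involved. Debian copyright field names are case-insensitive, so lowercasing is safe.

```python
from typing import Dict, List, Optional

# Hypothetical aliases: alternative spellings mapped to the spec's field name.
FIELD_ALIASES = {"licence": "license"}


def normalize_field_name(name: str) -> str:
    """Return a lowercase field name, mapping known spelling variants."""
    lowered = name.strip().lower()
    return FIELD_ALIASES.get(lowered, lowered)


def parse_paragraph(text: str) -> Dict[str, str]:
    """
    Return {normalized field name: value} for one deb822-style paragraph.
    Continuation lines (starting with a space or tab) are appended to the
    previous field's value; this simplified parser ignores comments.
    """
    fields: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and current is not None:
            fields[current].append(line.strip())
        elif ":" in line:
            name, _, value = line.partition(":")
            current = normalize_field_name(name)
            fields.setdefault(current, []).append(value.strip())
    return {name: "\n".join(values) for name, values in fields.items()}


def is_license_paragraph(fields: Dict[str, str]) -> bool:
    """A standalone license paragraph has License but no Files or Format field."""
    return "license" in fields and "files" not in fields and "format" not in fields
```

With this, a paragraph starting with `Licence: GPL-2+` is parsed under the `license` key and is treated as a usual license paragraph instead of failing.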
Participants:
- Aditya @adii21-Ux
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Philippe @pombredanne
- Keshav Space @keshav-space
- Tushar Goel @TG1999
Agenda:
- roadmap wrt. summary plugins
- primary licenses
- openssl wrt. univers
- osv importers in vulnerablecode
- GSoC project IDs
- building native wheels
Discussion:
- Hritik/Philippe: we want a single importer method, with wrappers/subclasses from there. Importers could keep other data in extra_data; it shouldn't be discarded.
- Keshav/Philippe: openssl has FIPS versioning (some versions are FIPS certified and have fips in their version numbers); how should we deal with this? Example: `<affects base="fips-1.1" version="fips-1.1.1"/>`. There could be two separate importers/improvers: 1. FIPS versions of openssl, 2. all other versions of openssl, without FIPS.
- Tushar: vers support for Alpine was added, along with a new release of vers. A new release automation system was added for vers: on release, a GitHub Action is triggered to test and push wheels to PyPI. Will add this to skeleton.
- Ayan: We had summary options which were optional and used to aggregate data on license expressions, authors, copyrights and their counts, along with key files, facets and detailed summaries. These will be deprecated in favour of a much simpler, default summary option, which will have primary data like the primary license aggregated. The classify option will also become a default option in the process.
- Philippe: intbitset and pyahocorasick are built from native code; I have contributed the wheel builds upstream.
- Philippe: We set up a separate third-party wheel repo to avoid supply chain attacks related to yanked and changed packages. It crashed last week because we had more jobs running in Azure, as we have been given extra credits there. We will move away from DreamHost, and/or keep it as a second option. https://thirdparty.aboutcode.org/pypi/
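The two-stream idea for openssl FIPS versions can be sketched as a partition done before versions are handed to separate importers/improvers. This assumes `<affects>` elements shaped like the example above, wrapped in some parent element (the `<issue>` wrapper in the usage example is an assumption); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

FIPS_PREFIX = "fips-"


def split_affected_versions(advisory_xml: str) -> Tuple[List[str], List[str]]:
    """
    Return (fips_versions, regular_versions) from the <affects> elements of an
    advisory XML document, so each list can feed its own importer/improver.
    """
    root = ET.fromstring(advisory_xml)
    fips_versions, regular_versions = [], []
    # iter() walks the whole tree, so <affects> can be nested at any depth.
    for affects in root.iter("affects"):
        version = (affects.get("version") or "").strip()
        if not version:
            continue  # skip entries without a version attribute
        if version.lower().startswith(FIPS_PREFIX):
            fips_versions.append(version)
        else:
            regular_versions.append(version)
    return fips_versions, regular_versions


if __name__ == "__main__":
    sample = (
        '<issue>'
        '<affects base="fips-1.1" version="fips-1.1.1"/>'
        '<affects base="1.0.2" version="1.0.2k"/>'
        '</issue>'
    )
    print(split_affected_versions(sample))  # (['fips-1.1.1'], ['1.0.2k'])
```

Keeping the `fips-` prefix on the FIPS versions keeps them unambiguous if the two lists are ever merged again.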
Participants:
- Abhishek
- Aditya
- Alexander
- Aman Mawar
- Avishrant
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Philippe @pombredanne
- Purna Chandra Mansingh
- Jono @jyang
- Karan Vaishnav
- Keshav Space @keshav-space
- Tushar Goel @TG1999
Agenda:
- vulnerablecode issue
- univers for alpine
- skeleton
- azure CI
- GSoC projects
- ONAP scancode
- fetchcode
Discussion:
- New version of packageurl-python released with support for generic URLs
- Web application for package evaluation in a specific domain. (GSoC Idea)
- Currently, the model relationship is that a package either introduced or fixed a vulnerability. We have both an attribute and a flag, which is a wart. This should be discussed at the vulnerablecode call tomorrow (11 AM CET).
- ScanCode.io is running at scancode.onap.eu with https://github.com/nexB/scancode.io/pull/397.
- Version handling in Gentoo is done by their tool; we have to check where the bug is. If it needs to be fixed upstream, we need to do that. It could be an Alpine-only wart.
- Failing nix tests https://github.com/nexB/vulnerablecode/issues/617.
- We are not using the same approach in CI: Azure, GitHub CI, AppVeyor and Travis are used in various places. We need to streamline this and make it a part of the skeleton.
- scancode.io deployment on Kubernetes: where should we host a chart? We could have it in the same repo or in a separate one. See https://github.com/bitnami/charts
- alpine APKBUILD and fetchcode work needs to be looked at.
- We will be applying to GSoC this week; people asked about mentoring for GSoC.
- A Language Server Protocol server using scancode as a backend, see https://microsoft.github.io/language-server-protocol/ (GSoC Idea)
Participants:
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Philippe @pombredanne
- Purna Chandra Mansingh
- Karan Vaishnav
- Keshav Space @keshav-space
Agenda:
- vulnerablecode priorities
- project ideas for upcoming gsoc
- fossdem
- scancode packaging
- mocking in tests
Discussion:
- Migrate the openssl importers to the new model: versions are there and explicit, not ranges. We could also document this importer migration process, which could be reused and could help others. Rebase and merge whenever necessary, as the code is changing a lot.
- There's also an openssh vulnerabilities data source; we should also import from there.
- FOSDEM happened this weekend, remotely. Philippe co-organized the Software Composition and Dependency Management devroom, and gave a session on Package URL and version ranges. See https://fosdem.org/2022/schedule/event/package_url_and_version_range_spec/
- Scancode can be used in two ways: as an application which we download and run, or as a library from PyPI, where dependencies are fetched and installed. There were problems with the first one and a lot of issues were reported; these have all been added to the next milestone. Now we're looking at https://cibuildwheel.readthedocs.io/en/stable/, which builds wheels across all the combinations of OS, Python versions and others. This could also be a GSoC project, as there is more work to be done here.
- Which importers should we start with? We still lack documentation on getting started with vulnerablecode. We should start with importers that have high-volume, good-quality data. Some are github, npm, nginx, osv, openssh, gitlab etc. See the list at https://github.com/nexB/vulnerablecode/issues/597, which was updated to change the order. Whenever we have an importer for a package type, that type should probably be in the Package URL spec. We need a new version range scheme for openssl; it looks a lot like pypi.
Participants:
- Aditya
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Jono @jyang
- Philippe @pombredanne
- Purna Chandra Mansingh
- Sanket
- Tushar @TG1999
- Karan Vaishnav
Agenda:
- project ideas for upcoming gsoc
- scancode release
- vulnerablecode priorities
- scancode-toolkit and scancode.io directions
- Software Heritage client PR, coding conventions
Discussion:
- It's best to use imperative style for function naming (and commits). We name the function after what is returned, and also document that in the docstring. We can give a hint of what it does, but it's better to put that in the code as comments, or to make the code more readable. Look at other projects' style guides to see how their code makes sense. We have to understand that code is written once but read hundreds of times, so code should be self-explanatory and have good docs.
- We were not returning the correct set of packages (and versions) with vulnerabilities, and were reporting both false negatives and false positives, and in some cases skipping data. Major changes were introduced to rectify these, and a user complained at https://github.com/nexB/vulnerablecode/issues/597. Now all the importers are disabled except one, as they need to be ported to the new model. After this update, the previous data has to be dropped completely, as the data is wrong and we can't figure out how. We have also moved away from two branches (main and develop) to a develop branch with tags.
- We need to put together the project ideas list for GSoC this week, some discussed ideas were:
  - Update the importers in vulnerablecode (also good introductory issues, as they are small and have clearly verifiable outcomes)
  - copyright detection quality: rules (we have a very large dataset to review manually)
  - copyright detection speed: regex is slow and uses FSMs and finite-state automata (could be improved)
  - scancode workbench update and support
  - scancode-toolkit API docs
  - scancode.io new views for more data visualization and details
  - scancode workbench port to python-electron
  - list of updates needed on package ecosystems
- A scancode release is upcoming. Some of the problems are supporting different versions of macOS, and having wheels built for different Python versions, OSes and others. Tools to help with that: https://github.com/pypa/cibuildwheel/, also see https://github.com/pypa/cibuildwheel/discussions/1007 and https://github.com/pypa/cibuildwheel/discussions/1006. On intbitset, an alternative is https://github.com/RoaringBitmap/RoaringBitmap.
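To illustrate the naming advice from this meeting (name a function after what it returns, and document that in the docstring), here is a small hypothetical example; the function and its input shape are made up for illustration.

```python
from typing import Dict, Iterable, List


def get_unique_license_expressions(file_scans: Iterable[Dict]) -> List[str]:
    """
    Return a sorted list of the unique license expressions found in
    `file_scans`, an iterable of per-file mappings that may have a
    "license_expressions" list.
    """
    unique_expressions = set()
    for file_scan in file_scans:
        # A missing or null value counts as "no expressions".
        unique_expressions.update(file_scan.get("license_expressions") or [])
    return sorted(unique_expressions)
```

A reader of the call site knows what comes back from the name alone, which a vague name like `process_scans()` would not convey; the "how" stays in the body and its comments.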
Participants:
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Jono @jyang
- Philippe @pombredanne
- Tushar @TG1999
- Karan Vaishnav
Agenda:
- GSoC introductory session
- scancode release
- vulnerablecode updates
- scancode-toolkit and scancode.io directions
Discussion:
- Philippe: Will work on the univers release. One problematic thing there is stars in version numbers, meaning whatever the highest number could be. PyPI also has star version ranges, but defined differently, as the star is a string pattern in that case. We have to cope with that. Now in univers we error out on cases we don't support, stating explicitly why they are not supported.
- Hritik: vulnerablecode: porting improvers to the new model. There is a lot to do, as there are also a lot of open issues which need attention, and a lot of technical debt. Example: importers still use the importer yielder; they should be like improvers, where we have a list of explicit classes. We could provide data dumps.
- Philippe: We should evaluate the amount of work here, because a lot of the data is not usable or fully useful, and see how we can move in the direction of something more useful. It's very hard to know where we got the data from. Is there any order that could be thought of for the importers?
- Hritik: We could first look at data sources which are more important, have more data, or are used more.
- Philippe: New data sources could be interesting to look at and write importers for. Also interesting: we don't track the licensing of the data sources, which is okay for now. But we don't log import operations, i.e. which importers are importing what and when. If we don't know that a vulnerability was imported by a given process, we can't instill confidence, or track and look into issues at scale. One option is an import log, like that of a web server, which logs record creation, errors and run times.
- Hritik: We could redirect the debug log to a file instead of a database maybe?
- Philippe: We need to be able to go back to the errors, which would be more actionable: per-record logs on the vulnerabilities/importers/improvers and on the package relations. This is useful to do sooner rather than later, and can be done in parallel in a separate thread. But it's definitely a lower priority.
- Waiting for Google to announce the program; we will organize a session after that, likely one or two, where we will present the projects which we are considering.
- Philippe: On the scancode-toolkit side, we need to make sure that we correctly report packages, package dependencies and package-like data. Are DLLs/kernel modules packages? Probably not: they are executables that have metadata, and it is misleading to call them packages. All of these would be package-level data. We are also finishing the work on which files are part of a package instance. Everything around detecting and collecting packages, dependencies and package instances is all in scancode-toolkit. We'll have scancode.io store dependencies and package instances. It already takes package files data, but there isn't much data that scancode returns correctly, which needs to be improved. Given a purl: 1. we need to fetch the registry information; 2. we need to fetch the packages for it (there could be multiple things, like various wheels in PyPI, or jars and POMs in Maven). We need to put all this together, and it needs to be used in two contexts: 1. fetchcode, for a live scan; 2. more commonly, in scancode, where as part of a larger complex scan we want to fetch metadata, enrich/compare metadata, and fetch sources and scan them for more enriched data.
- Another direction is everything around simplifying and reporting less data. For example, in scancode we report line-level and technical details, which are useful if you want to dive into details. So we need to separate two things: 1. detections with their technical details, like line details and other technical things; 2. the actual data from detections (like just the license expression, as opposed to the whole matches with details). Also report primary things, as opposed to details. There could also be a primary package: keep the secondary ones, but don't report them by default.
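PEP 440 defines `==X.Y.*` as a prefix match on whole release segments (so `1.20` does not match `==1.2.*`), which is quite different from a star meaning "any highest number". A minimal sketch, assuming only plain numeric releases: the function names are hypothetical, this is not the univers API, and anything unsupported raises an explicit error, in the spirit of what univers does.

```python
import re
from typing import Tuple

RELEASE_RE = re.compile(r"^\d+(?:\.\d+)*$")


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the numeric release segments of a plain version like "1.2.3"."""
    if not RELEASE_RE.match(version):
        # Fail explicitly instead of guessing on unsupported input.
        raise ValueError(f"Unsupported version, only N(.N)* is handled: {version!r}")
    return tuple(int(part) for part in version.split("."))


def matches_star_specifier(version: str, specifier: str) -> bool:
    """
    Return True if `version` matches a PEP 440 prefix specifier like "==1.2.*".
    """
    if not (specifier.startswith("==") and specifier.endswith(".*")):
        raise ValueError(f"Unsupported specifier, expected '==X.Y.*': {specifier!r}")
    prefix = parse_release(specifier[2:-2])
    release = parse_release(version)
    # Zero-pad the candidate release so that "1" matches "==1.0.*".
    padded = release + (0,) * (len(prefix) - len(release))
    return padded[: len(prefix)] == prefix
```

Comparing integer tuples rather than strings is what keeps `1.20` from matching `==1.2.*`.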
Participants:
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Jono @jyang
- Philippe @pombredanne
- Tushar @TG1999
- Utkarsh
- Aditya
Agenda:
- starting path scancode.io
- packages vs dependencies
- GSoC introductory session
- scancode release
- vulnerablecode updates
Discussion:
- Philippe: Working on a new scancode release, fixing vulnerabilities in lxml and dparse. Will release a smaller point release if the full release process takes more time. The newer releases will potentially also support Python 3.10 and drop support for 3.6.
- Philippe and Tushar: A GSoC introductory session is needed; after discussion, it was decided that it could happen after GSoC announces its dates.
- Philippe: We need to report dependencies separately from packages, as they are really separate: for example, a repository could be a project and not a package, and still have a dependencies file. Dependencies also need separate processing for fetching package data.
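The separation described above can be sketched as two distinct record types, so that a project that is not itself a package can still report the dependencies from, say, its requirements file. These dataclasses, field names and the loose requirement parsing are hypothetical, not the scancode-toolkit models.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Very loose "name + rest" split of a requirement line (not a full PEP 508 parser).
REQUIREMENT_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


@dataclass
class Package:
    purl: str
    datafile_paths: List[str] = field(default_factory=list)


@dataclass
class Dependency:
    purl: str  # the package that is depended on
    extracted_requirement: str  # e.g. ">=2.0", as written in the datafile
    datafile_path: str  # where the dependency was declared
    for_package_purl: Optional[str] = None  # None if no owning package is known


@dataclass
class ScanResult:
    packages: List[Package] = field(default_factory=list)
    dependencies: List[Dependency] = field(default_factory=list)


def collect_requirements(path: str, text: str) -> ScanResult:
    """
    Return a ScanResult with one Dependency per requirement line in `text`.
    No Package is created: a requirements file alone does not define one.
    """
    result = ScanResult()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        match = REQUIREMENT_RE.match(line)
        if not match:
            continue
        name, requirement = match.groups()
        # purl pypi names are lowercased with "_" replaced by "-".
        purl = "pkg:pypi/" + name.lower().replace("_", "-")
        result.dependencies.append(
            Dependency(purl=purl, extracted_requirement=requirement.strip(), datafile_path=path)
        )
    return result
```

Keeping `for_package_purl` optional is the key design choice: dependencies can be reported even when no package instance owns them.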
Participants:
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Philippe @pombredanne
- Tushar @TG1999
Agenda:
- unknown licenses in scancode
- dparse security issue
- univers issue
- planning on aboutcode: scancode/package
- planning on aboutcode: vulnerablecode
- organizing: function/package
Discussion:
- Philippe:
Unknown license detection, done by Akansha, has finally been merged after some improvements, and it works great now. After the usual license match generation, if there are parts of legalese text that are not matched or are weakly matched, we run unknown license detection with an n-gram index built from all the license texts, then filter and refine those matches and add them to the license matches.
Also working on a CVE vulnerability in dparse, which is a scancode dependency that I maintained; I got in touch with folks from GitHub Dependabot, and they disclosed information about it.
- Hritik:
There's an issue reported at https://github.com/nexB/univers/issues/20, where there's a failure to parse version ranges because they aren't hashable.
- Philippe:
Yes, we should have made the constraints a tuple attribute in attrs, and used a converter to convert a list to a tuple. Opened https://github.com/nexB/univers/pull/21 with the correction.
- Ayan:
Shares https://docs.google.com/document/d/1cHAxXZ_VxwEDxRF4BcOXTSSjGp3-_tYLVxXTx2X8oC4/edit?usp=sharing, the Package design doc. In the example, we have a simple case of two Python manifests, a setup.py and a requirements.txt, under the same path, so we could just take the files under that path. But in cases where the manifests are not under the same path, we could take the files under each path and collect them.
- Philippe:
Yes. Here a setup.py is a manifest which is usually at the project root, but the same can't be said for a requirements file, so the files for a package could be manifest-specific, and a function of the manifest class itself. But it could also be done for a package instance: for all the package manifests it is made of, their functions are used to get package files, and then the results are aggregated. Also, this can't be done on the fly; it has to happen after a package instance is created from its package manifests.
Asks everyone about planning in scancode/scancode.io, vulnerablecode and other projects.
- Ayan:
We have a lot of code about packages in a lot of places: package manifest parsing and other ecosystem-specific functions in scancode, some code in fetchcode, some code for mining/matching, and some code for fetching data via network calls. These are all separated in different places, and there should be some effort towards unifying this package code. Maybe put everything that requires network access/downloads in scancode.io pipelines, and the rest of the core functionality in scancode-toolkit, as discussed in planning before.
- Hritik:
The work on univers and supporting version ranges has to be completed. There are importers and now the improvers concept in vulnerablecode; Shivam worked on time-travel type improvers for the data, and there were also some ideas on using NLP to improve data from text sources, which could be looked into.
- Philippe:
There's also some planning to be done on function vs. packages: there are a lot of smaller packages like debian inspector and rpm inspector, and smaller package-specific functions could go in a package inspector. There are other specific packages like fetchcode, and then there are repos like univers which implement one function and are a dependency of all the packages which use version ranges. We have to think more about which approach is better.
Also, there's the question of whether scancode could get more data from network calls and aggregate more data from lockfiles or similar, behind an optional flag. We would only keep tasks which are resource- and compute-heavy, like downloading and scanning, in scancode.io; all the code related to aggregating and getting data over network calls would be in scancode itself, and scancode.io pipelines would import it from scancode.
(The entire conversation was more of a discussion, opinions were asked and aggregated, this doesn't reflect all that, just what was discussed).
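The univers fix discussed in this meeting (a tuple attribute with a converter, so that version ranges become hashable) can be illustrated with a minimal sketch. univers uses attrs, where this is spelled roughly `constraints = attr.ib(converter=tuple)` on a frozen class; to stay dependency-free, this sketch shows the same idea with a standard-library frozen dataclass, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VersionRange:
    """A hypothetical version range whose constraints are always a tuple."""

    scheme: str
    constraints: Tuple[str, ...] = ()

    def __post_init__(self):
        # Convert any iterable (e.g. a list) to a tuple; frozen dataclasses
        # need object.__setattr__ to change a field during initialization.
        object.__setattr__(self, "constraints", tuple(self.constraints))
```

Without the conversion, `hash(VersionRange("pypi", [">=1.0"]))` raises `TypeError: unhashable type: 'list'`; with it, ranges built from lists and from tuples compare equal and can be used in sets and as dict keys.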
Participants:
- Ayan @AyanSinhaMahapatra
- Philippe @pombredanne
- Tushar @TG1999
Agenda:
- new scancode release
- scancode workbench
- new gsoc page
Discussion:
Philippe:
- A new scancode release is on the way, with updated license detection. Previously, a couple thousand false positive rules were added for commonly seen license lists; they are being deleted, and a filter is added instead which filters out these matches: if a lot of different licenses are detected on consecutive lines, this is likely a license list, and it is removed from the license matches. Other updates: key phrases can now be defined in rules, and there is support for CycloneDX output. The package manifest PR is also getting merged, which introduces package manifest classes for all manifests.
Tushar:
- An issue has been reported in the Gitter discussion channel for scancode-workbench; is this the same issue as reported before?
Ayan:
- The report is for the scancode-workbench develop branch, and it said the issues weren't present in the releases, so this is likely a new issue.
Philippe:
- We should open a new ticket for that.
Tushar:
- We added a new page for GSoC 2022 Ideas: https://github.com/nexB/aboutcode/wiki/GSOC-2022
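The license-list filter described for the new scancode release can be sketched like this: if many different licenses are matched on consecutive lines, the run is treated as a license list and dropped. A minimal sketch under assumptions: the `Match` shape, the threshold of 3 distinct licenses and the one-line gap are illustrative values, not the thresholds used in scancode-toolkit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    license_key: str
    start_line: int
    end_line: int


def filter_license_lists(
    matches: List[Match], min_distinct: int = 3, max_gap: int = 1
) -> List[Match]:
    """
    Return `matches` minus runs that look like a list of license names.

    A run is a sequence of matches (sorted by start line) where each match
    starts at most `max_gap` lines after the previous one ends; a run with at
    least `min_distinct` different license keys is treated as a license list.
    """
    kept: List[Match] = []
    run: List[Match] = []

    def flush() -> None:
        # Keep the run only if it does not look like a license list.
        if len({m.license_key for m in run}) < min_distinct:
            kept.extend(run)
        run.clear()

    for match in sorted(matches, key=lambda m: (m.start_line, m.end_line)):
        if run and match.start_line - run[-1].end_line > max_gap:
            flush()
        run.append(match)
    flush()
    return kept
```

A single real license notice survives, while a block of one-line matches for four different licenses on lines 10 to 13 is removed.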