MeetingMinutes
We meet online on Mondays at 16:00 UTC as a reference. See https://www.timeanddate.com/worldclock/meeting.html to get the time in your timezone.
Join us at https://meet.jit.si/AboutCode
Old meeting notes have been moved to:
- https://github.com/nexB/aboutcode/wiki/MeetingMinutes2021
- https://github.com/nexB/aboutcode/wiki/MeetingMinutes202
Here are the running meeting notes:
Participants and Agenda
- Dependency/Packages improvement: Phillipe
- SCTK planning: Ayan
- Also present: Jono, Hritik, Tushar, Keshav, Omkar, Jay, Swastik
Discussion:
Main points from the discussion on Dependencies issue:
- We will add packages for resolved packages too and dependencies would be foreign key relationships
- Sub-packages could be called "contained", similar to SPDX
- A design doc is needed for implementation details
Participants and Agenda
- Dependencies/packages/requirements
- Present: Phillipe, Thomas, Jono, Ayan, Tushar, Keshav, Omkar, Swastik, Jay
Discussion:
See discussion on dependencies/packages summarized at: https://github.com/nexB/scancode.io/issues/1066#issuecomment-1946385355
Participants and Agenda
- Jono: Priority on PurlDB
- Omkar: Need PR review
- Phillipe: FOSDEM
- Keshav and Tushar also present
Discussion:
- Omkar needs PR review on https://github.com/nexB/scancode-workbench/pull/628
- The FOSDEM Fringe event was fully packed, with around 75 attendees, including Allan Friedman from CISA. People from major European corporations like ORT, FOSSology, and Software 360 were also present. Folks want to see more cooperation between SPDX and CDX.
  FOSDEM was also incredibly packed, with around 10K people. Philippe gave two presentations on DejaCode and PackageURL in the SBOM dev room.
Participants and Agenda
- Phillipe: Flit/python-packaging, Issue cleanup
- Ayan: License Detection updates
- Keshav: fetchcode PR
- Omkar, Tushar also present
Discussion:
- Also modify the match attributes order to the following: license_expression, license_expression_spdx, from_file, start_line, end_line, boolean flags, matcher, score, matched_length, rule_length, match_coverage, rule_relevance, rule_id, rule_url, referenced_filenames, rule_notes, matched_text, rule_text. We need to check the SPDX matching guidelines (we have allowed extra words that we want to include), see https://github.com/nexB/scancode-toolkit/issues/3601
- We need to add default support for license file references in package manifests. We currently do this with specific rules that depend on the filename, so we would fail on unconventional filenames.
- Migrating package managers PR in fetchcode is ready to review: https://github.com/nexB/fetchcode/pull/93
- We are organizing an issue cleanup for the end of the year, in the 18-22 December week. This Thursday we are doing an issue cleanup for scancode-workbench.
- We want to look into using https://github.com/pypa/flit in scancode-toolkit and elsewhere to package modules/files as required to reduce code duplication and copying. This is super minimal, only need to add the version in the module, add a pyproject.toml file, set up the release scripts and start publishing. We can also do it for license-data and index, but we can't ignore directories in flit currently. It's not likely that the project will accept this feature, so we can fork and add this too if required. But this can be done later.
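The match attribute order proposed above could be applied with a small helper along these lines. This is only a sketch: `reorder_match` and the two key lists are hypothetical names, and since the notes do not name the individual boolean flags, any key not listed is assumed to be a flag and placed right after `end_line`.

```python
# Keys that come first, in the order given in the notes.
HEAD = [
    "license_expression",
    "license_expression_spdx",
    "from_file",
    "start_line",
    "end_line",
]

# Keys that come after the boolean flags, in the order given in the notes.
TAIL = [
    "matcher",
    "score",
    "matched_length",
    "rule_length",
    "match_coverage",
    "rule_relevance",
    "rule_id",
    "rule_url",
    "referenced_filenames",
    "rule_notes",
    "matched_text",
    "rule_text",
]


def reorder_match(match):
    """Return a new dict with the keys of ``match`` in the proposed order.

    Assumption: keys that are in neither HEAD nor TAIL are treated as the
    "boolean flags" and keep their relative order after ``end_line``.
    """
    known = set(HEAD) | set(TAIL)
    extras = [key for key in match if key not in known]
    ordered = [k for k in HEAD if k in match] + extras + [k for k in TAIL if k in match]
    return {key: match[key] for key in ordered}
```

Missing keys are simply skipped, so the same helper works on partial match mappings.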
Participants and Agenda
- Omkar: Workbench rule details
- Phillipe: Planning
- Jono: matchcode API
- Tushar, Swastik, Keshav, Jay, Ayan
Discussion:
- We are putting matching in the server. Previously we did lookups on specific resources, but now we send the codebase data to the server to be matched at once. Maybe we should have something like PackageMatch, as we are matching files/directories.
Participants and Agenda
- Omkar: workbench review
- Ayan: Test fixtures with SCTK
- Phillipe: License reference data location
- Jono: update purldb fields and history
- Keshav, Tushar also present
Discussion:
- Matches tables should ideally be shown before the file regions. We should probably also have the rule details as a new modal and have the link to the text there. See suggested layout:
  | license expression |
  | license expression spdx |
  | matched text |
  | matched_length/rule_length (match_coverage) |
  | score | rule_relevance |
  | Rule_details | matcher |
  See https://github.com/nexB/scancode-workbench/pull/610 for more info.
- There is a bug with mismatched License Detection IDs: license detections shown at the top level cannot be found in the files. Looking into this.
- Package field update would be a new method now, which will also be used to add to the history. It's always nice to have history entries, as we will be updating package data in various forms. File level history is not relevant so this is not being implemented. See https://github.com/nexB/purldb/pull/222 for more details.
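A minimal sketch of what such a field update method could look like, assuming the history is kept as plain text with one timestamped line per entry (as described elsewhere in these notes). The `Package` class, `update_field` and `append_to_history` are hypothetical names for illustration, not the purldb models from the PR.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Package:
    """Hypothetical stand-in for a package record with a text history."""

    purl: str
    version: str = ""
    history: str = ""  # one "<timestamp> <message>" entry per line

    def append_to_history(self, message, timestamp=None):
        """Append one timestamped line to the history text."""
        timestamp = timestamp or datetime.now(timezone.utc)
        self.history += f"{timestamp.isoformat()} {message}\n"

    def update_field(self, field_name, value):
        """Set ``field_name`` to ``value`` and record the change.

        Return True if the value changed, False if it was already set.
        """
        old_value = getattr(self, field_name)
        if old_value == value:
            return False
        setattr(self, field_name, value)
        self.append_to_history(
            f"{field_name} changed from {old_value!r} to {value!r}"
        )
        return True
```

Routing every update through one method keeps the history complete without each caller having to remember to log the change.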
Participants and Agenda
- Ziad: GSoC presentation
- Jay: GSoC presentation
- Ayan, Tushar, Keshav, Omkar also attending
Presentation Links:
- Jay's presentation: https://docs.google.com/presentation/d/1LGuC7-dXLYqlc73WwXR0ufEHzDHwVE_x5Rrx3YqHXpM/edit?usp=sharing
- Ziad's presentation: https://docs.google.com/presentation/d/1YGVoSvPV-hpEdrcWiXvZa-oVDih7zg1dlGapYR9jauQ/edit?usp=sharing
Recorded videos will also be published shortly.
Participants and Agenda
- Keshav, Phillipe, Hritik, Ayan and Jono also present
- Omkar - workbench
- discussion on purldb
Discussion:
- Support for the --todo option in workbench is being added, reviewing UI for https://github.com/nexB/scancode-workbench/pull/610 Here we need to make the utility of the checkbox more apparent, maybe by adding a Reviewed header above these. We also need to decide what we want to use here, Vetted/Reviewed or some other name. Only support for license todos is being added now; the packages part of this will be added later, when this feature is more tested/used in SCTK. Also note that we are not computing any of this in workbench, only reusing the SCTK data and todo attribute. We also want to change the review_comments field to something more appropriate/verbose; for now we can rename this field in the workbench UI. We could call this issue type instead.
- Phillipe: On workbench we need to make the matched text diff with rule text a more central element of the license explorer view, instead of hiding this behind a click. Same for more rule attributes, we should display these too.
- purldb was discussed in full detail with a live demo from Phillipe, with respect to SCIO, d2d, current capabilities and future plans.
Participants and Agenda
- Keshav, Phillipe, Hritik and Jono also present
- Omkar - workbench
- Ziad - purl-sync
- Ayan - SCTK planning
Discussion:
- workbench:
  - https://github.com/nexB/scancode-workbench/issues/605
  - https://github.com/nexB/scancode-workbench/releases/tag/v4.0.0rc4
  - https://scancode-workbench.readthedocs.io/en/update-docs/
  - https://github.com/nexB/scancode-workbench/issues/607
- Run django toolbar, and also run querysets and filters in the shell to see whether we are able to model everything in a query
Participants and Agenda
- Ziad: GSoC presentation
- Hritik: Transitive support
- Ayan, Tushar, Jono, Hritik also attending
- Matching legal file to licensedb - keshav
- fetchcode doc tests - keshav
- rpm PR - Omkar
- purldb:metapackage - phillipe
Discussion:
- When we try to match a resource which is a legal file, we get a lot of matches to purldb, for example if it's an Apache license file. So we should not send these files to purldb for matching, and instead run the usual license detection on these (rather than some special hash matching for licenses). We can later also add a step to validate whether it is consistent with the package/directory licenses. For now this can just be a dangling license file which needs review.
- How can we extract an .rpm file, edit the files inside, and then re-compress this into an .rpm file such that it is still a valid .rpm file which loads the same way, so that we only keep the essential parts for the tests? The tests had .rpm files where the metadata was kept and the payload was truncated, which is why the rpm tool will not work.
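For the legal-file point above, the routing could be sketched roughly as below. Everything here is an assumption for illustration: the name hints, `is_legal_file` and `split_for_matching` are hypothetical, and a filename-based check is only a crude stand-in for however legal files are actually flagged in the scan data.

```python
# Hypothetical filename prefixes that suggest a legal file.
LEGAL_NAME_HINTS = ("license", "licence", "copying", "notice", "copyright")


def is_legal_file(path):
    """Return True if the file name (not the directory) looks like a legal file."""
    name = path.replace("\\", "/").rsplit("/", 1)[-1].lower()
    return name.startswith(LEGAL_NAME_HINTS)


def split_for_matching(paths):
    """Split paths into (send to purldb matching, run license detection only)."""
    to_match = []
    to_license_detection = []
    for path in paths:
        if is_legal_file(path):
            to_license_detection.append(path)
        else:
            to_match.append(path)
    return to_match, to_license_detection
```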
Participants and Agenda
- Omkar: Content for "How-to Guide" in SCWB
- Ziad: Issue while mocking in multiple functions in pytest
- Jay, Ayan, Tushar, Jono, Hritik, and Keshav also attending
Discussion:
- Omkar: What should be included in the how-to guide? The detail is already present in the UI Reference section. Ayan: We don't need to document every small detail; we just want a GIF or video to quickly get started with SCWB, highlighting the "Package view", "License view" and "Table view".
- Ziad: Unable to mock test test_remote_person_follow_purl in https://github.com/nexB/vulnerablecode/pull/1209/. Solution: Need to mock multiple functions using side_effect and return_value similar to what we do in SCIO https://github.com/nexB/scancode.io/blob/main/scanpipe/tests/test_pipelines.py#L986-L1023
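A minimal sketch of the suggested approach with `unittest.mock`, mocking two functions at once: `side_effect` gives a different result per call, and `return_value` gives one fixed result. `FakeClient` and `follow` are hypothetical stand-ins, not the vulnerablecode or SCIO code.

```python
from unittest import mock


class FakeClient:
    """Hypothetical client whose real methods must never run in tests."""

    def fetch(self, url):
        raise RuntimeError("network access is disabled in tests")

    def post(self, url, data):
        raise RuntimeError("network access is disabled in tests")


def follow(client, url):
    """Hypothetical code under test: two fetches, then one post."""
    profile = client.fetch(url)
    inbox = client.fetch(profile["inbox"])
    return client.post(inbox["url"], {"type": "Follow"})


def run_example():
    client = FakeClient()
    with mock.patch.object(client, "fetch") as fetch, \
            mock.patch.object(client, "post") as post:
        # side_effect with a list: each call returns the next item.
        fetch.side_effect = [
            {"inbox": "https://example.org/inbox"},
            {"url": "https://example.org/inbox/1"},
        ]
        # return_value: every call returns the same object.
        post.return_value = 201
        result = follow(client, "https://example.org/user")
    return result, fetch.call_count, post.call_args
```

In a pytest test, the assertions would go on the returned status and on `call_count` / `call_args` of each mock.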
Participants and Agenda
- Jay: GSoC updates
- Ziad: GSoC updates + performance for exporter
- Phillipe: Origin of Code
- Ayan: SCTK wheels
- Omkar: ToDo feature in workbench
- Tushar, Keshav, Jono also attending
Discussion:
- Jay: Adding some new features in bitcode to implement proper set operations to fix some bugs in the SCTK implementation. Ayan: We need to set up releases for these 3 repos from aboutcode-publisher, and temporarily I have added the package releases from my fork which can be used for testing/CI. See https://pypi.org/project/ahocode-test/ and https://pypi.org/project/bitcode-test
- Ziad: Implemented pagination, but the export is still slow. What can be done? Will be discussed more in the weekly vulnerablecode call.
- ToDo feature in SCWB: https://github.com/nexB/scancode-workbench/issues/593 This is creating the UI in existing views to show license/package detections which need to be reviewed, and the initial step would be to do this for licenses, as it's simpler. In the license view we have unique license detections which are referenced in the todo plugin, and we need to show an exclamation mark there on the left pane. For packages we need to support cases which are undisplayed package data from the file level, and this would be similar to the license clues display.
- SCTK wheels: This will be the subject of further discussion, as we need to examine in more detail what is needed, and we need to support all the install options of SCTK without much hassle. We should simplify the install-app experience rather than the developer workflow, since for development we need to keep the option to modify the licenses/license index and keep these changes together. Also related: https://github.com/nexB/scancode-toolkit/issues/3497
- We need to have models for code origin and package origin in purldb which would help us in matching to packages, selecting the correct origin.
Participants and Agenda
- Jay: GSoC updates
- Ziad: GSoC updates + test exported vuln data format
- hritik: cdxgen
- Tushar: pyDelhi presentation
- Phillipe, Omkar, Tushar, Keshav, Jono, Swastik, Ayan also attending
- Undiscussed: Origin of Code, SCTK wheels (will be on agenda next week)
Discussion:
- Couple of thoughts on Vulnerablecode data: let's not sort VCID keys, we might want to use saneyaml instead to output YAML. See https://github.com/nexB/saneyaml/. We should also serialize and output separately.
- On a separate note, we have issues related to qualifiers in Vulnerablecode, we want to discuss this on the next Vulnerablecode call. Maybe we should keep a qualifier only if the specifier is specific to the vulnerability.
- Hritik: wanted to add transitive dependencies support in SCIO and cdxgen. Added support for importing dependencies from CycloneDX BOMs. We are using hoppr models to import/export, which have had a lot of updates since. So we should look into whether we can safely update to the latest there. When creating package/dependency UUIDs we are trusting the tools to create safe UUIDs. We also need to track how a package was created, or from which pipeline/process.
- There are also issues in creating dependencies from CycloneDX. Here a package is something concrete you find in the codebase, whereas a dependency is something that is referenced and may not be in the codebase; it could be optional/unresolved. CycloneDX has a dependency tree with the relationships between all the listed packages, but we only had packages created and could not capture the relationships.
- Jay's project has been extended from the standard 12 weeks to 14 weeks. We need to make sure the fallback libraries are actually released so the tests can run in SCTK CI and the issues can be debugged properly. We should also discuss the issues in detail, preferably late this week.
- Tushar gave a presentation on python-inspector at pyDelhi last Saturday. https://www.youtube.com/watch?v=HclGLQVLBhM https://conference.pydelhi.org/#section-schedule
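To illustrate the package vs. dependency distinction discussed above: a CycloneDX JSON BOM lists `components` (each with a `bom-ref`) and a separate `dependencies` array whose entries have a `ref` and a `dependsOn` list. A rough sketch of capturing both, where `read_bom` and the returned shapes are hypothetical and not the SCIO models:

```python
def read_bom(bom):
    """Return (packages, relations) from a CycloneDX-style dict.

    packages: mapping of bom-ref -> purl for every listed component.
    relations: list of (from_ref, to_ref, is_resolved) tuples, where
    is_resolved is False when the target is referenced in the dependency
    tree but is not among the listed components.
    """
    packages = {
        component["bom-ref"]: component.get("purl", "")
        for component in bom.get("components", [])
    }

    relations = []
    for entry in bom.get("dependencies", []):
        for target in entry.get("dependsOn", []):
            relations.append((entry["ref"], target, target in packages))
    return packages, relations
```

Keeping the relations as their own records, instead of folding them into the packages, is what lets an unresolved or optional dependency exist without a matching package.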
Participants and Agenda
- Jay: GSoC updates
- Ziad: GSoC updates
- Omkar: testing in workbench
- Tushar, keshav, Ayan: Nothing to discuss
Discussion:
- Jay: tests are failing in SCTK integration. Had to use gitpod and for some reason the integration tests for the libraries were passing, but on testing locally recently there were failures. Need to extend the GSoC project. Also need to make sure the libraries are actually released so the tests can run in SCTK CI and so there won't be issues like this.
- Ziad: Wants help looking into authentication and sending auth data. Will open an issue as this was not resolved on the call.
- Omkar: .jar files can return empty package data without a purl, and this needs to be handled differently. In the dependency table we will remove the 'No Value Detected' row, but in the pie charts it's still ok to show these cases.
Participants and Agenda
- Jay: GSoC updates
- Ziad: GSoC updates
- Omkar: testing in workbench
- Tushar: quine zip file in extractcode
- Jono, keshav, Ayan: Nothing to discuss
Discussions:
- Need to discuss: https://github.com/nexB/vulnerablecode/issues/1231
- Philippe walked through scancode-toolkit/src/packagedcode/maven.py so that Jay can update the relevant sections for the pymaven changes. We should move code from maven.py to pymaven and not subclass, which makes things more complex.
- We have an issue with quine zip files, which are recursive archives, and this crashes extractcode. See issue at: https://github.com/nexB/extractcode/issues/50
- Showing errors at the file/codebase level in workbench. What to test in workbench? Should we test UI elements along with the logical parsing elements?
- Philippe: We patched a vulnerability in SCIO and published an advisory at https://github.com/nexB/scancode.io/security/advisories/GHSA-2ggp-cmvm-f62f
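One common way to guard against recursive archives like the quine zip above is a nesting depth limit. The sketch below uses only the standard `zipfile` module and is not how extractcode works; `list_nested_members`, `ArchiveTooDeep` and the `!/` path separator are hypothetical choices. It does not bound the total extracted size, which a real guard would also need.

```python
import io
import zipfile


class ArchiveTooDeep(Exception):
    """Raised when nested archives exceed the allowed depth."""


def list_nested_members(data, max_depth=5, _depth=0, _prefix=""):
    """List member paths of a zip given as bytes, descending into nested zips.

    Raise ArchiveTooDeep instead of recursing forever on self-containing
    archives.
    """
    if _depth > max_depth:
        raise ArchiveTooDeep(f"nesting deeper than {max_depth} at {_prefix!r}")

    names = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            path = _prefix + info.filename
            names.append(path)
            if info.is_dir():
                continue
            payload = archive.read(info)
            if zipfile.is_zipfile(io.BytesIO(payload)):
                names.extend(
                    list_nested_members(payload, max_depth, _depth + 1, path + "!/")
                )
    return names


def make_nested_zip(levels):
    """Build a test zip nested ``levels`` deep around a small text file."""
    data, name = b"hello", "leaf.txt"
    for level in range(levels):
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w") as archive:
            archive.writestr(name, data)
        data, name = buffer.getvalue(), f"level{level}.zip"
    return data
```

`make_nested_zip` only produces finitely nested archives for testing the limit; a true quine would hit the same `ArchiveTooDeep` path.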
Participants and Agenda
- Jay: GSoC updates
- Ziad: GSoC updates
- keshav: vers support in purldb
- Hritik: purl.fyi
- Omkar: testing in workbench
- Tushar, Ayan, Jono, Swastik: Nothing to discuss
Discussions:
- Jay:
some tests failing on sanexml, will push a PR for the same, need some help there. Also added PR in SCTK for fallback libraries integration: https://github.com/nexB/scancode-toolkit/pull/3476
- Ziad:
Do we have regex for identifying purl? We can probably use the same thing in vulnerablecode where we lookup by purl and check for valid purl, we can do the same thing here too. Just importing with the PackageURL library should also work.
- Keshav:
Support for univers in purldb package index: In cases where dependencies are not pinned, we only submit lowest package version here, and maybe we should send packageURL + vers to be indexed? https://github.com/nexB/univers/blob/main/src/univers/version_range.py Maybe we can start a basic implementation there, maybe list of tuples/mappings.
- Hritik:
For https://purl.fyi/ it would be nice to consolidate all code we have scattered in different places, this could be a new option in purl_to_url, but there would be new dependencies, so we could also do this in purldb where we have existing code for source_urls etc. We had a GSoC project idea also on this: https://github.com/nexB/aboutcode/wiki/GSOC-2023#purldb-on-demand-retrieval-of-package-metadataarchives
- Omkar:
Discussion on test files for workbench. We also need to support SCIO outputs once it has licenses support. It would be just adding more tests. Also UI review on deps dashboard and package/deps explorer, looks great, just one point about splitting the package-type and number of packages column into two.
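On Ziad's question about a regex for identifying a purl: a rough pre-check could look like the sketch below. It only tests the overall `pkg:type/...` shape and is not a full validation against the purl spec; `PURL_PRECHECK` and `looks_like_purl` are hypothetical names. As noted above, parsing with the PackageURL library is the more reliable option.

```python
import re

# Scheme "pkg:", optional slashes, a type that starts with a letter, then a
# slash and a non-empty remainder without whitespace. This is a shape check
# only; it does not validate namespace, name, version, or qualifiers.
PURL_PRECHECK = re.compile(r"^pkg:/*[A-Za-z][A-Za-z0-9.+-]*/\S+$")


def looks_like_purl(text):
    """Return True if ``text`` has the rough shape of a package URL."""
    return bool(PURL_PRECHECK.match(text.strip()))

# For real validation, packageurl-python's PackageURL.from_string(text)
# raises ValueError on an invalid purl.
```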
Participants and Agenda
- Jay: update on GSoC
- Jono: updating skeleton merge skeleton
- Ziad: detecting PURL, NLP
- Tushar, Keshav, Ayan, Omkar: no topic
- Philippe: github using clearlydefined data (i.e. scancode)
- Hritik: Vulntotal updates
Discussions:
- Jay: update on project
  - ahocode and bitcode implementation complete.
  - lxml fallback dependency WIP: https://github.com/nexB/sanexml
  - will open PR for SCTK integration for ahocode
- Ziad:
  - Following PackageURL page, how to subscribe? Email or something like a feed
  - more to be discussed on vulnerablecode call tomorrow
- Jono:
  - Merge conflicts in docs from skeleton in license-expression
- Phillipe:
  - https://github.blog/2009-02-13-this-github-is-going-to-the-boids/
Participants and Agenda
- Jay: update on your project
- Jono: feedback on how to display history on packages in purlDB
- Ziad: safe HTML in Django
- Tushar, Keshav: no topic
- Ayan: absent, excused
- Philippe: Skeleton
- Omkar: queries on testing and dependencies
Discussions
- Jay: update on project
  - Some issues in sanexml wrt. lxml to fix
  - Next up will be integration in ScanCode and running the tests, making them pass
- Jono: feedback on how to display history on packages in purlDB
  - History is a simple text field. Each line is a timestamp and message
  - Should we return the history every time with a purl or have a different end-point?
  - A different end-point makes most sense
- Ziad: safe HTML in Django
  - Need review of how to get the content of a file in git
  - Need to discuss purl-sync vocabulary
  - We discussed the data for following PackageURL
- Philippe: Skeleton
  - The https://github.com/nexB/skeleton needs to be updated to remove Ubuntu 18
  - We need a script to automate the base skeleton in many repos. Jono will give it a shot
- Omkar: queries on testing and displaying dependencies
  - We discussed the display of the dependencies summary and provided feedback
  - We need an issue in SCTK so that it returns a name and icon for each package type or data_source
  - We discussed testing, including tests that are data-driven
Participants:
- Tushar @tg1999
- Keshav @keshavspace
- Ayan @AyanSinhaMahapatra
- swastik sharma @swastkk
- Jay @35C4n0r
- Akhil @lf32
- Omkar @OmkarPh
Agenda:
- GSoC
- Misc
Discussion:
- purldb still has to be updated with latest scancode and in a stable state for us to start adding good first issues there, so maybe this is better to do a month later. Meanwhile we can mark good first issues in other repositories, in vulnerablecode, scancode-toolkit, scancode.io etc for first time contributors.
- Conclusions pipeline: Conclusions/alerts/review/to-do items in scancode.io is a workflow where we can review detections which are incorrect or need careful manual review, and where the data can be updated in place. This was asked about as a tentative GSoC project, but we are still finalizing the project ideas and it is advised to start looking at the project ideas list after aboutcode is selected for GSoC.
- https://github.com/nexB/python-inspector/pull/119 was opened by Swastik and was failing tests, as live packages are used for the python-inspector tests and we need to regenerate these. We will also add this to the documentation.
- Akhil has updated https://github.com/nexB/scancode.io/pull/450 with binary file support and replaced the Scan Text button with a Utilities drop down with the Detect License option which goes to /scantext/.
- Phillipe needs to review https://github.com/nexB/scancode-workbench/pull/532, and please use scancode v31 with this as v32 is not supported yet here, see https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html for more updates there.
Participants:
- Tushar @tg1999
- phillipe @pombredanne
- Keshav @keshavspace
- Ayan @AyanSinhaMahapatra
- swastik sharma @swastkk
- Jay @35C4n0r
- Shrey Parekh
- Shrijal Acharya
Agenda:
- GSoC
- corrupted advisories
- yaml output
- scancode toolkit reference scans
- packaging and operating system support
- cyclonedx input in scancode.io
Discussion:
- Need to review swastik's PR: https://github.com/nexB/python-inspector/pull/119
- Should we use both cyclonedx libraries, the one from cyclonedx-python and the hoppr library? Keshav's links: https://gitlab.com/hoppr/hoppr-cyclonedx-models/ and https://github.com/CycloneDX/cyclonedx-python Short term: work with these projects to merge features. We don't use XML and don't care about old versions. The hoppr library covers the last 2 cyclonedx versions, and it uses the JSON schema to create the models. We can start using hoppr/hoppr-cyclonedx-models in scancode.io and then maybe later we can use it in scancode-toolkit too.
- JSON to XML conversion for cyclonedx -> library exists which works as a single executable in linux/windows/mac.
- Advisories which were imported by previous importers are not compatible with the current models. We can delete everything from an importer when we are reimporting from the same. There's a problem of stale and outdated data, and there's a problem of not discarding data that is also used elsewhere. We can consider archiving for this, or consider adding a deprecated flag.
- More people are running non-Intel architectures, which don't work. The key thing would be a single executable, like Jono's work on a scancode.io appimage. We should also have app archives for all Python versions, which is Python 3.7-3.11, on linux/mac/windows. No arm for now, but it would be nice. Another thing would be https://github.com/nexB/scancode-toolkit/issues/3205 If we are using other libraries, we have to write wrappers on them to match the same API. Serializing is another problem. Pyahocorasick is going to be the hardest, as this is a trie structure and saving/loading from disk is not simple.
- https://github.com/nexB/aboutcode/wiki/GSOC-2023 GSoC project ideas were discussed, and we need to further edit this so that every project has a clear goal and some detailed instructions to explain it better. Ideas related to vulnerablecode will be discussed in the vulnerablecode call tomorrow, see https://github.com/nexB/vulnerablecode/wiki/WeeklyMeetings.
- We uncovered that the scancode yaml output does not produce valid yaml in certain cases where there are license references and/or matched text in the yaml output and the license text has whitespaces/blank lines. For example, this happens in the case of the apache-2.0 license text. The solution can't be just to remove whitespaces as they are important, but the check has to be done at saneyaml and we have to produce valid yaml there.
- scancode-toolkit-reference-scan scripts are not working because of the dependency issues present while pip installing older versions, and maybe we should be using git checkout instead of pip install here.
Participants:
- Tushar @tg1999
- Jay @35C4n0r
- phillipe @pombredanne
- swastik sharma @swastkk
- Keshav @keshavspace
- Ayan @AyanSinhaMahapatra
- Jono @jyang
- Akhil @lf32
- Heet Dhorajiya
Agenda:
- scancode.io appimage
- dependency issues
- scancode-toolkit release
- GSoC project ideas
- skeleton
Discussion:
- https://github.com/nexB/scancode.io/tree/scancode.io-appimage/etc/scripts/appimage-build
- https://github.com/nexB/skeleton#usage
- Tushar Goel says: assert req is None or isinstance(req, Requirement), req
- https://github.com/nexB/python-inspector/pull/115
- https://github.com/nexB/packvers/issues/2
- https://www.tdcommons.org/dpubs_series/5632/
Participants:
- Tushar @tg1999
- Hritik @Hritik14
- Jay @35C4n0r
- phillipe @pombredanne
- swastik sharma @swastkk
- Keshav @keshavspace
Agenda:
- Hritik: nothing
- Swastik Sharma: SCIO: issue with installing SCIO, with LegacyVersion and SPDX. These are due to https://github.com/pypa/packaging/issues/530 solved with https://github.com/nexB/packvers/ and the SPDX tools updates https://github.com/nexB/scancode-toolkit/pull/3173
- Keshav: VCIO: discuss https://hex.pm/ and Elixir advisory
- Philippe: SCIO/SCTK: SPDX library issues. Get ready for planning next week
- Tushar:
  - VCIO: About a day away from getting all importers migrated for VC
  - VCIO: made a release for VC 31
  - VCIO: Will need hex in the GH importer
- 35C/Ajay: FetchCode: made 2 PRs in FetchCode. Question wrt. https://github.com/nexB/scancode-toolkit/issues/3138
  A: there are some likely updates in https://github.com/nexB/scancode-toolkit/pull/3150
- Question: what are scancode toolkit plugins?