Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with query fetching vulnerabilities for a large sbom and large number of advisories #1129

Closed
dejanb opened this issue Jan 9, 2025 · 6 comments
Assignees
Labels
DB Database aspects. Cross cutting concerns. Ownership of the entire data model. Ensuring it's design w performance

Comments

@dejanb
Copy link
Contributor

dejanb commented Jan 9, 2025

Intro

The issue happens when we have a large SBOM (in this concrete case rhel 7.9 with around 50K packages) and large number of vex entries for these packages (in this case all of Red Hat CSAF files from 2024).

This issue is in the https://github.com/trustification/trustify/blob/main/modules/fundamental/src/sbom/model/details.rs#L51 query, which resolves to

SELECT
    "advisory"."id" AS "advisory$id",
    "advisory"."identifier" AS "advisory$identifier",
    "advisory"."version" AS "advisory$version",
    "advisory"."document_id" AS "advisory$document_id",
    "advisory"."deprecated" AS "advisory$deprecated",
    "advisory"."issuer_id" AS "advisory$issuer_id",
    "advisory"."published" AS "advisory$published",
    "advisory"."modified" AS "advisory$modified",
    "advisory"."withdrawn" AS "advisory$withdrawn",
    "advisory"."title" AS "advisory$title",
    "advisory"."labels" AS "advisory$labels",
    "advisory"."source_document_id" AS "advisory$source_document_id",
    "vulnerability"."id" AS "vulnerability$id",
    "vulnerability"."title" AS "vulnerability$title",
    "vulnerability"."reserved" AS "vulnerability$reserved",
    "vulnerability"."published" AS "vulnerability$published",
    "vulnerability"."modified" AS "vulnerability$modified",
    "vulnerability"."withdrawn" AS "vulnerability$withdrawn",
    "vulnerability"."cwes" AS "vulnerability$cwes",
    "base_purl"."id" AS "base_purl$id",
    "base_purl"."type" AS "base_purl$type",
    "base_purl"."namespace" AS "base_purl$namespace",
    "base_purl"."name" AS "base_purl$name",
    "versioned_purl"."id" AS "versioned_purl$id",
    "versioned_purl"."base_purl_id" AS "versioned_purl$base_purl_id",
    "versioned_purl"."version" AS "versioned_purl$version",
    "qualified_purl"."id" AS "qualified_purl$id",
    "qualified_purl"."versioned_purl_id" AS "qualified_purl$versioned_purl_id",
    "qualified_purl"."qualifiers" AS "qualified_purl$qualifiers",
    "qualified_purl"."purl" AS "qualified_purl$purl",
    "sbom_package"."sbom_id" AS "sbom_package$sbom_id",
    "sbom_package"."node_id" AS "sbom_package$node_id",
    "sbom_package"."version" AS "sbom_package$version",
    "sbom_node"."sbom_id" AS "sbom_node$sbom_id",
    "sbom_node"."node_id" AS "sbom_node$node_id",
    "sbom_node"."name" AS "sbom_node$name",
    "status"."id" AS "status$id",
    "status"."slug" AS "status$slug",
    "status"."name" AS "status$name",
    "status"."description" AS "status$description",
    "cpe"."id" AS "cpe$id",
    "cpe"."part" AS "cpe$part",
    "cpe"."vendor" AS "cpe$vendor",
    "cpe"."product" AS "cpe$product",
    "cpe"."version" AS "cpe$version",
    "cpe"."update" AS "cpe$update",
    "cpe"."edition" AS "cpe$edition",
    "cpe"."language" AS "cpe$language"
FROM
    "sbom_package"
    INNER JOIN "sbom" ON "sbom"."sbom_id" = "sbom_package"."sbom_id"
    JOIN "sbom_node" ON "sbom_package"."sbom_id" = "sbom_node"."sbom_id"
    AND "sbom_package"."node_id" = "sbom_node"."node_id"
    LEFT JOIN "sbom_package_purl_ref" ON "sbom_package"."sbom_id" = "sbom_package_purl_ref"."sbom_id"
    AND "sbom_package"."node_id" = "sbom_package_purl_ref"."node_id"
    LEFT JOIN "qualified_purl" ON "sbom_package_purl_ref"."qualified_purl_id" = "qualified_purl"."id"
    LEFT JOIN "versioned_purl" ON "qualified_purl"."versioned_purl_id" = "versioned_purl"."id"
    LEFT JOIN "base_purl" ON "versioned_purl"."base_purl_id" = "base_purl"."id"
    JOIN "purl_status" ON "base_purl"."id" = "purl_status"."base_purl_id"
    JOIN "status" ON "purl_status"."status_id" = "status"."id"
    LEFT JOIN "version_range" ON "purl_status"."version_range_id" = "version_range"."id"
    LEFT JOIN "cpe" ON "purl_status"."context_cpe_id" = "cpe"."id"
    JOIN "advisory" ON "purl_status"."advisory_id" = "advisory"."id"
    JOIN "vulnerability" ON "purl_status"."vulnerability_id" = "vulnerability"."id"
WHERE
    "sbom"."sbom_id" = '019444fc-7ef5-7f91-adc2-af4b0f35a3e0' 
    AND version_matches("versioned_purl"."version", "version_range".*)

This leads to query fail with the message

ERROR: could not write to file "base/pgsql_tmp/pgsql_tmp3964.175": No space left on device

Meaning there's no enough space for intermediary results.

First analysis

It turns out that for this particular scenario there are around million entries in purl_status table for the base purls contained in the SBOM. Than the join is tried with the version_range table and caling version_matches function which leads to resources exhaustion.

Note: this might be a problem only in the local environment, but it still can lead to the issues with ingesting even more advisories and large SBOMs

Here's the query plan

Hash Join  (cost=203683.78..1006970.79 rows=1295446 width=1374)
  Hash Cond: ((purl_status.vulnerability_id)::text = (vulnerability.id)::text)
  ->  Hash Join  (cost=203497.35..1003380.38 rows=1295446 width=1117)
        Hash Cond: (purl_status.advisory_id = advisory.id)
        ->  Hash Left Join  (cost=203083.12..999562.47 rows=1295446 width=774)
              Hash Cond: (purl_status.context_cpe_id = cpe.id)
              ->  Hash Join  (cost=203056.94..996107.69 rows=1295446 width=736)
                    Hash Cond: (purl_status.status_id = status.id)
                    ->  Hash Left Join  (cost=203033.89..992659.24 rows=1295446 width=640)
                          Hash Cond: (purl_status.version_range_id = version_range.id)
                          Filter: version_matches((versioned_purl.version)::text, version_range.*)
                          ->  Hash Join  (cost=115607.43..221134.63 rows=3886337 width=656)
                                Hash Cond: (versioned_purl.base_purl_id = base_purl.id)
                                ->  Hash Join  (cost=12821.26..34309.17 rows=50379 width=532)
                                      Hash Cond: ((sbom_node.node_id)::text = (sbom_package.node_id)::text)
                                      ->  Nested Loop  (cost=10625.24..31981.09 rows=50303 width=515)
                                            ->  Index Only Scan using sbom_pkey on sbom  (cost=0.15..8.17 rows=1 width=16)
                                                  Index Cond: (sbom_id = '019444fc-7ef5-7f91-adc2-af4b0f35a3e0'::uuid)
                                            ->  Hash Join  (cost=10625.09..31469.89 rows=50303 width=499)
                                                  Hash Cond: ((sbom_package_purl_ref.node_id)::text = (sbom_node.node_id)::text)
                                                  ->  Hash Join  (cost=8549.07..29256.83 rows=52202 width=424)
                                                        Hash Cond: (qualified_purl.versioned_purl_id = versioned_purl.id)
                                                        ->  Hash Join  (cost=2075.99..16187.72 rows=52202 width=349)
                                                              Hash Cond: (qualified_purl.id = sbom_package_purl_ref.qualified_purl_id)
                                                              ->  Seq Scan on qualified_purl  (cost=0.00..12660.06 rows=247906 width=288)
                                                              ->  Hash  (cost=1423.46..1423.46 rows=52202 width=77)
                                                                    ->  Seq Scan on sbom_package_purl_ref  (cost=0.00..1423.46 rows=52202 width=77)
                                                                          Filter: (sbom_id = '019444fc-7ef5-7f91-adc2-af4b0f35a3e0'::uuid)
                                                        ->  Hash  (cost=3165.48..3165.48 rows=131248 width=75)
                                                              ->  Seq Scan on versioned_purl  (cost=0.00..3165.48 rows=131248 width=75)
                                                  ->  Hash  (cost=1423.56..1423.56 rows=52197 width=75)
                                                        ->  Seq Scan on sbom_node  (cost=0.00..1423.56 rows=52197 width=75)
                                                              Filter: (sbom_id = '019444fc-7ef5-7f91-adc2-af4b0f35a3e0'::uuid)
                                      ->  Hash  (cost=1544.47..1544.47 rows=52124 width=94)
                                            ->  Seq Scan on sbom_package  (cost=0.00..1544.47 rows=52124 width=94)
                                                  Filter: (sbom_id = '019444fc-7ef5-7f91-adc2-af4b0f35a3e0'::uuid)
                                ->  Hash  (cost=49168.42..49168.42 rows=1624379 width=140)
                                      ->  Hash Join  (cost=716.78..49168.42 rows=1624379 width=140)
                                            Hash Cond: (purl_status.base_purl_id = base_purl.id)
                                            ->  Seq Scan on purl_status  (cost=0.00..44186.79 rows=1624379 width=94)
                                            ->  Hash  (cost=453.57..453.57 rows=21057 width=46)
                                                  ->  Seq Scan on base_purl  (cost=0.00..453.57 rows=21057 width=46)
                          ->  Hash  (cost=38314.76..38314.76 rows=1632776 width=117)
                                ->  Seq Scan on version_range  (cost=0.00..38314.76 rows=1632776 width=117)
                    ->  Hash  (cost=15.80..15.80 rows=580 width=112)
                          ->  Seq Scan on status  (cost=0.00..15.80 rows=580 width=112)
              ->  Hash  (cost=19.97..19.97 rows=497 width=54)
                    ->  Seq Scan on cpe  (cost=0.00..19.97 rows=497 width=54)
        ->  Hash  (cost=356.88..356.88 rows=4588 width=359)
              ->  Seq Scan on advisory  (cost=0.00..356.88 rows=4588 width=359)
  ->  Hash  (cost=133.97..133.97 rows=4197 width=271)
        ->  Seq Scan on vulnerability  (cost=0.00..133.97 rows=4197 width=271)
JIT:
  Functions: 96
  Options: Inlining true, Optimization true, Expressions true, Deforming true

Reproduce

  1. Start trustify locally
  2. Ingest ds3 http POST localhost:8080/api/v1/dataset @etc/datasets/ds3.zip
  3. Ingest rhel 7.9 SBOM csaf scoop http://localhost:8080/api/v1/sbom etc/test-data/rhel-7.9.z.json
  4. Start redhat-csaf-vex-2024 importer
  5. Try to access SBOM

Solution

I still didn't find a proper solution. I had a few tries at rewriting the query to limit the impact of matching version ranges, but still without the success

@JimFuller-RedHat
Copy link
Collaborator

thx for this - on my plate

@JimFuller-RedHat JimFuller-RedHat added the DB Database aspects. Cross cutting concerns. Ownership of the entire data model. Ensuring it's design w label Jan 10, 2025
@JimFuller-RedHat JimFuller-RedHat moved this to Ready in Trustify Jan 10, 2025
@JimFuller-RedHat JimFuller-RedHat moved this from Ready to In progress in Trustify Jan 10, 2025
@dejanb
Copy link
Contributor Author

dejanb commented Jan 10, 2025

@JimFuller-RedHat Great. I managed to get by the error, by adding more resources to the postgres container, but still the query is really slow. I think it can be easily improved by eliminating not-affected statuses for starters.

@dejanb
Copy link
Contributor Author

dejanb commented Jan 10, 2025

Here's zipped sbom for the reference. It might be not the best example, but we can test with others as well
rhel-7.9.z.json.zip

@JimFuller-RedHat
Copy link
Collaborator

FWIW - I was able to reproduce ...

@JimFuller-RedHat
Copy link
Collaborator

temporarily setting in postgres:

SET enable_seqscan = OFF;

I can force the indexes to be used and its faster ... the reason why this query plan is coming up with seqscans is because of STATISTICS setting on the various tables ... I think in this case we can ensure a good query plan if we tweak some of the stats on some of the larger tables.

@dejanb
Copy link
Contributor Author

dejanb commented Jan 13, 2025

This helps some. The query time for go from

Time: 13359.658 ms (00:13.360)

to

Time: 10276.297 ms (00:10.276)

But that still seems a lot.

Here's the new query plan

                                                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=119029.14..414201.85 rows=75813 width=1306)
   Hash Cond: ((purl_status.vulnerability_id)::text = (vulnerability.id)::text)
   ->  Nested Loop  (cost=118441.51..413415.02 rows=75813 width=1049)
         ->  Hash Left Join  (cost=118441.22..410997.36 rows=75813 width=706)
               Hash Cond: (purl_status.context_cpe_id = cpe.id)
               ->  Hash Join  (cost=118211.80..410568.27 rows=75813 width=668)
                     Hash Cond: (purl_status.status_id = status.id)
                     ->  Nested Loop Left Join  (cost=118147.70..410303.70 rows=75813 width=572)
                           Filter: version_matches((versioned_purl.version)::text, version_range.*)
                           ->  Hash Join  (cost=118147.27..231037.57 rows=227439 width=588)
                                 Hash Cond: (purl_status.base_purl_id = versioned_purl.base_purl_id)
                                 ->  Index Scan using purl_status_x on purl_status  (cost=0.43..81402.42 rows=1709251 width=94)
                                 ->  Hash  (cost=117977.67..117977.67 rows=13534 width=510)
                                       ->  Nested Loop  (cost=61160.15..117977.67 rows=13534 width=510)
                                             ->  Nested Loop  (cost=61159.73..111945.87 rows=13534 width=442)
                                                   ->  Nested Loop  (cost=61159.31..105346.15 rows=13534 width=387)
                                                         ->  Nested Loop  (cost=61158.88..63441.39 rows=13534 width=179)
                                                               ->  Index Only Scan using sbom_pkey on sbom  (cost=0.28..8.29 rows=1 width=16)
                                                                     Index Cond: (sbom_id = '019444fc-7ef5-7f91-adc2-af4b0f35a3e0'::uuid)
                                                               ->  Hash Join  (cost=61158.61..63297.75 rows=13534 width=179)
                                                                     Hash Cond: ((sbom_node.node_id)::text = (sbom_package.node_id)::text)
                                                                     ->  Hash Join  (cost=58381.24..60484.73 rows=13581 width=166)
                                                                           Hash Cond: ((sbom_package_purl_ref.node_id)::text = (sbom_node.node_id)::text)
                                                                           ->  Index Scan using sbom_package_purl_ref_sbom on sbom_package_purl_ref  (cost=0.43..1969.70 rows=51132 width=77)
                                                                                 Index Cond: (sbom_id = '019444fc-7ef5-7f91-adc2-af4b0f35a3e0'::uuid)
                                                                           ->  Hash  (cost=57697.38..57697.38 rows=54674 width=89)
                                                                                 ->  Bitmap Heap Scan on sbom_node  (cost=4284.28..57697.38 rows=54674 width=89)
                                                                                       Recheck Cond: (sbom_id = '019444fc-7ef5-7f91-adc2-af4b0f35a3e0'::uuid)
                                                                                       ->  Bitmap Index Scan on sbom_node_pkey  (cost=0.00..4270.61 rows=54674 width=0)
                                                                                             Index Cond: (sbom_id = '019444fc-7ef5-7f91-adc2-af4b0f35a3e0'::uuid)
                                                                     ->  Hash  (cost=2089.62..2089.62 rows=55020 width=74)
                                                                           ->  Index Scan using sbom_package_sbom on sbom_package  (cost=0.43..2089.62 rows=55020 width=74)
                                                                                 Index Cond: (sbom_id = '019444fc-7ef5-7f91-adc2-af4b0f35a3e0'::uuid)
                                                         ->  Index Scan using qualified_purl_id_idx on qualified_purl  (cost=0.42..3.10 rows=1 width=224)
                                                               Index Cond: (id = sbom_package_purl_ref.qualified_purl_id)
                                                   ->  Index Scan using package_version_pkey on versioned_purl  (cost=0.42..0.49 rows=1 width=55)
                                                         Index Cond: (id = qualified_purl.versioned_purl_id)
                                             ->  Index Scan using base_purl_id_idx on base_purl  (cost=0.42..0.45 rows=1 width=68)
                                                   Index Cond: (id = versioned_purl.base_purl_id)
                           ->  Index Scan using version_range_pkey on version_range  (cost=0.43..0.53 rows=1 width=117)
                                 Index Cond: (id = purl_status.version_range_id)
                     ->  Hash  (cost=56.85..56.85 rows=580 width=112)
                           ->  Index Scan using status_pkey on status  (cost=0.15..56.85 rows=580 width=112)
               ->  Hash  (cost=214.11..214.11 rows=1225 width=54)
                     ->  Index Scan using cpe_pkey on cpe  (cost=0.28..214.11 rows=1225 width=54)
         ->  Memoize  (cost=0.29..0.31 rows=1 width=359)
               Cache Key: purl_status.advisory_id
               Cache Mode: logical
               ->  Index Scan using advisory_pkey on advisory  (cost=0.28..0.30 rows=1 width=359)
                     Index Cond: (id = purl_status.advisory_id)
   ->  Hash  (cost=532.72..532.72 rows=4393 width=271)
         ->  Index Scan using vulnerability_pkey on vulnerability  (cost=0.28..532.72 rows=4393 width=271)
 JIT:
   Functions: 83
   Options: Inlining false, Optimization false, Expressions true, Deforming true

What I think would be greatly beneficial is to filter for statuses with appropriate CPEs early in the query. At the moment we do this in the code after the query https://github.com/trustification/trustify/blob/main/modules/fundamental/src/sbom/model/details.rs#L225. So even if we do this expensive query, we end up with 0 result.

If we would code this as postgres function and use we could significantly improve the speed.
The simple test to prove this is to add something like

AND cpe.product = 'enterprise_linux'
    and cpe.version = '7'

to the query, in which case the execution time goes to

Time: 2979.886 ms (00:02.980)   

@JimFuller-RedHat JimFuller-RedHat moved this from In progress to In review in Trustify Jan 14, 2025
@JimFuller-RedHat JimFuller-RedHat moved this from In review to Done in Trustify Jan 15, 2025
@JimFuller-RedHat JimFuller-RedHat closed this as completed by moving to Done in Trustify Jan 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DB Database aspects. Cross cutting concerns. Ownership of the entire data model. Ensuring it's design w performance
Projects
Status: Done
Development

No branches or pull requests

2 participants