3.5: Further optimize Archive List API call / DB query #13601

Open
4 tasks done
agilgur5 opened this issue Sep 14, 2024 · 22 comments
Labels
area/api: Argo Server API
area/workflow-archive
P2: Important. All bugs with >=3 thumbs up that aren't P0 or P1, plus: Any other bugs deemed important
solution/suggested: A solution to the bug has been suggested. Someone needs to implement it.
type/bug
type/regression: Regression from previous behavior (a specific type of bug)

Comments

@agilgur5
Member

agilgur5 commented Sep 14, 2024

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened? What did you expect to happen?

This is a follow-up to #13295 and #13566. We can further improve the query performance, seemingly by another order of magnitude, by adding a union index:

CREATE INDEX argo_archived_workflows_i5 ON argo_archived_workflows (clustername, startedat);

This can be added as a migration after this line in migration.go.

We should also test the performance of this index with and without the subquery. If the subquery is no longer necessary, then we can revert #13566 along with this change
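
For reference, a minimal sketch of such a comparison (placeholder filter values; run the same EXPLAIN ANALYZE against both the plain query below and the #13566 subquery variant, and check whether the plan shows an ordered/backward scan on argo_archived_workflows_i5 instead of a sort):

```sql
-- Sketch only: after creating argo_archived_workflows_i5, compare plans with and
-- without the subquery. The cluster/namespace values here are placeholders.
EXPLAIN ANALYZE
SELECT name, namespace, uid, phase, startedat, finishedat
FROM argo_archived_workflows
WHERE clustername = 'default'
  AND instanceid = ''
  AND namespace = 'argo'
ORDER BY startedat DESC
LIMIT 20;
```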

More context

Quoting myself in #13566 (review):

Note that this query seems to be further optimizable per #13295 (comment), so we may want to open a new issue to add further optimizations, but this at least fixes the performance issues and gets us to a much more reasonable baseline

Quoting @Danny5487401 from #13563 (comment)

@agilgur5 three methods I tested:

  1. nice implementation by subquery, according to fix(api): optimise archived list query. Fixes #13295 #13566

[...]

it drops from 37s 768ms to 1s 981ms

  2. force index (argo_archived_workflows_i4)
SELECT
    [...]
FROM `argo_archived_workflows` force index (argo_archived_workflows_i4)
WHERE [...]
ORDER BY `startedat` DESC
LIMIT 20;

it drops from 37s 768ms to 138 ms

  3. union index (clustername, startedat) to use a Backward index scan instead of a filesort
CREATE INDEX argo_archived_workflows_i5 ON argo_archived_workflows (clustername, startedat);

it drops from 37s to 140 ms

And then my response:

It sounds like the force index and the new index are both faster than the subquery, at least on MySQL 8.0, so I wonder if we should use those instead of the subquery. force index is potentially not the most resilient, but the new index seems like a "why not?" to me

We should also add a simplified/minimal "performance regression test" to ensure that, well, the performance doesn't regress, per #13566 (comment).

Version(s)

v3.5, v3.5.10, 729ac17

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

n/a - this is API focused

Logs from the workflow controller

n/a - this is API focused

Logs from in your workflow's wait container

n/a - this is API focused
@agilgur5 agilgur5 added the type/bug, type/regression, area/api, area/workflow-archive, and P2 labels Sep 14, 2024
@agilgur5 agilgur5 added this to the v3.5.x patches milestone Sep 14, 2024
@agilgur5 agilgur5 added the solution/suggested label Sep 14, 2024
@kodieg

kodieg commented Sep 26, 2024

I tested fetching the list of workflows using the new version v3.5.11, but still observed quite large times (~20s). We're using a PostgreSQL database.

I tested converting json to jsonb and observed times of approximately 9s.

Finally, I tested the generated columns variant proposed here and it worked super fast.

Is it possible to pursue the generated columns optimization for Postgres?

@agilgur5
Member Author

agilgur5 commented Sep 26, 2024

You should try the union index as written above from #13563 (comment).

Your attempts suggest that your DB engine is, for some reason, still parsing the JSON of every row instead of only the single returned row. What version of Postgres are you on?

The generated columns can cause a very long migration time per #13295 (comment) and require a newer Postgres, so they are not suitable for a patch upgrade. They can also slow down certain queries, since they increase the size of the table and rows. Lastly, they are unsustainable in the long run per #13295 (comment), as we'd have to keep adding more columns as users want to filter or query more.

Generated columns are also a band-aid and don't solve the root cause of the performance issue, which is that the DB is extracting JSON in every row, despite it not being used in the WHERE filter. The subquery and different indices actually attempt to fix the root cause.
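
For context, the subquery approach from #13566 has roughly this shape (simplified from the full queries later in this thread; Postgres syntax):

```sql
-- Simplified sketch of the #13566 subquery: filter, sort, and LIMIT on a narrow
-- subquery first, then extract JSON fields only for the matching uids.
SELECT name, namespace, uid, phase, startedat, finishedat,
       coalesce((workflow::json)->'metadata'->>'labels', '{}') AS labels
FROM argo_archived_workflows
WHERE clustername = 'default'
  AND uid IN (
    SELECT uid
    FROM argo_archived_workflows
    WHERE clustername = 'default' AND instanceid = '' AND namespace = 'argo'
    ORDER BY startedat DESC
    LIMIT 20
  );
```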

@kodieg

kodieg commented Sep 26, 2024

I already tested this with the union index added.

We're using PostgreSQL 12.4.

Maybe my use case is a bit different. We're fetching information about workflows using /api/v1/workflows/argo-workflows,
so we want to fetch multiple rows (not a single row). This can take many seconds even if we only want to fetch 1000 workflows. In this case, with a limit and almost no filtering, parsing the JSON is slow. Even a query that fetches all of the data is slow. Even just migrating to jsonb helps a lot.

Maybe a way to improve the current situation would be to migrate the workflow column to the jsonb type, or add a new jsonb column with its contents limited to what we want to be available for filtering and fetching in this query (because these workflow JSONs are huge).

@agilgur5
Member Author

agilgur5 commented Sep 26, 2024

I already tested this with the union index added.

Can you clarify the performance of that? You didn't mention it in your first comment.

This can take many seconds even if we only want to fetch 1000 workflows.

Oh, well if you're intentionally not paginating, it's rather unsurprising that it's slow. Then the indexing is not the main issue, but the actual retrieval is.
Unpaginated queries weren't the original purpose of this issue, though I suppose that too is a 3.5 regression due to the JSON retrieval

@agilgur5
Member Author

Even just migrating to jsonb helps a lot.

JSONB is only supported by Postgres, but we could migrate Postgres to that if the migration time is minimal. You'd have to test that and then PR the migration and query differences, for Postgres only.
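
For reference, a minimal sketch of what that Postgres-only migration could look like (a real PR would also need to adjust the queries that currently cast workflow::JSON):

```sql
-- Sketch only: convert the workflow column from json to jsonb (Postgres-only).
-- This rewrites every row, so the migration time on a large table must be measured first.
ALTER TABLE argo_archived_workflows
  ALTER COLUMN workflow TYPE jsonb USING workflow::jsonb;
```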

or add a new jsonb column with its contents limited to what we want to be available for filtering and fetching in this query

This will have similar sustainability issues, as the fields stored will change over time. That causes unexpected problems in the future, since newly added fields will only be populated for new rows and not old rows. So that approach seems quite suboptimal.

@kodieg

kodieg commented Sep 27, 2024

I tested the times on a small Postgres instance. However, I think it will be faster than the text-to-json migration that may already have been applied.

For 100k workflows it took 96 seconds (on a pretty small and slow instance).

Currently, we're cleaning up this table every few hours, leaving us without history.

@agilgur5
Member Author

Currently, we're cleaning up this table every few hours, leaving us without history.

You should just use API requests with smaller limits and keep your history.

@MasonM
Contributor

MasonM commented Oct 2, 2024

I mentioned in the contributor meeting that I'd look into this. To evaluate these kinds of optimizations, you need a ton of rows in argo_archived_workflows, but there doesn't seem to be a way of generating that with the dev environment. Have you all been using real-world DB dumps? I don't have access to a dump with that kind of data, unfortunately.

I think there should be a script (probably under hack/) to generate dummy data and insert that into the DB. The schema for argo_archived_workflows is pretty straightforward, so that should be easy without introducing any new tools/libraries. I can work on that, unless anybody has any better ideas.
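
For quick experiments before such a tool exists, a rough Postgres-only sketch using generate_series can also seed the table (illustrative values only; the column list is inferred from the queries quoted in this thread, and realistic performance numbers would need realistically large workflow blobs):

```sql
-- Rough sketch: insert 100,000 dummy archived workflows (Postgres-only, fake values).
INSERT INTO argo_archived_workflows
  (uid, name, phase, namespace, instanceid, clustername, startedat, finishedat, workflow)
SELECT
  md5(random()::text || i::text),                -- fake uid
  'wf-' || i,                                    -- fake workflow name
  'Succeeded',
  'argo',
  '',
  'default',
  now() - (i || ' minutes')::interval,           -- staggered start times
  now() - (i || ' minutes')::interval + interval '5 minutes',
  '{"metadata":{"labels":{}},"spec":{},"status":{"phase":"Succeeded"}}'::json
FROM generate_series(1, 100000) AS i;
```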

@Joibel
Member

Joibel commented Oct 2, 2024

I don't have better ideas - I agree that would be useful though.

@agilgur5
Member Author

agilgur5 commented Oct 2, 2024

To evaluate these kinds of optimizations, you need a ton of rows in argo_archived_workflows, but there doesn't seem to be a way of generating that with the dev environment

Indeed that is a challenge of testing this and one reason why higher scale performance regressions are hard to catch early -- both dev and test would need to operate with such large numbers.

Have you all been using real-world DB dumps?

No, we've actually been relying on user feedback to test much of the time.

I think there should be a script (probably under hack/) to generate dummy data and insert that into the DB.

Yea you can probably get this working with faker or similar tooling

so that should be easy without introducing any new tools/libraries

The JSON blob is the primary column needing an optimization, and its schema is the entire CR, so that part would be difficult without tooling 😅

MasonM added a commit to MasonM/argo-workflows that referenced this issue Oct 6, 2024
This adds a small CLI tool for developers to use when working with the
DB locally. It provides two functions:
1. Explicitly migrate the DB (which is normally only done when the
   workflow controller starts) to help test migrations:

     ```
     $ go run ./hack/db migrate
     INFO[0000] Migrating database schema                     clusterName=default dbType=postgres
     ```
2. Insert randomly-generated archived workflows into the DB, which is
   intended to help test query performance (e.g.
   argoproj#13601)

    ```
    $ time go run ./hack/db/main.go fake-archived-workflows --rows 100000
    Using seed 2600415997737113470
    Clusters: [cx5j6rrzm5vnkfqqr4d]
    Namespaces: [bs98pxsfhs9z v95848 hq76xbj4kq7vrdp49vm ghzj6vcrth 262wb8w2b8v2wd2p2p9]
    Inserted 100000 rows

    real    17m56.554s
    user    1m53.833s
    sys     0m43.581s
    ```

  This is obviously not as efficient as it could be, but it's only
  intended to be run on an adhoc basis when manually testing query
  performance. To make it fast, we'd probably have to switch to direct
  SQL inserts, which would couple this script to the logic in
  `persist/sqldb/workflow_archive.go`.

Signed-off-by: Mason Malone <[email protected]>

@MasonM
Contributor

MasonM commented Oct 6, 2024

Okay, I entered a draft PR with a simple CLI tool to populate the DB with randomly-generated workflows: #13715

Unfortunately, I can't reproduce the performance issues reported in #13566. @ryancurrah said he had 93,135 workflows in #13566 (comment), so I used the tool to populate the DB with 100,000 randomly-generated workflows.

$  go run ./hack/db/main.go fake-archived-workflows --rows 100000 --seed 2600415997737113470
Using seed 2600415997737113470
Clusters: [cx5j6rrzm5vnkfqqr4d]
Namespaces: [bs98pxsfhs9z v95848 hq76xbj4kq7vrdp49vm ghzj6vcrth 262wb8w2b8v2wd2p2p9]
Inserted 100000 rows

Every workflow is ~5900 bytes after storage (edited; originally misstated as ~1466 bytes) and associated with the same cluster ("cx5j6rrzm5vnkfqqr4d"). The tool generated 5 namespaces that are randomly assigned to each workflow (i.e. each namespace has ~20,000 workflows associated with it). Then, I tried executing a simple variation of the "before" query from that issue:

EXPLAIN ANALYZE SELECT name,
       namespace,
       UID,
       phase,
       startedat,
       finishedat,
       coalesce((workflow::JSON)->'metadata'->>'labels', '{}') AS labels,
       coalesce((workflow::JSON)->'metadata'->>'annotations', '{}') AS annotations,
       coalesce((workflow::JSON)->'metadata'->>'creationTimestamp', '') AS creationtimestamp,
       (workflow::JSON)->'spec'->>'suspend' AS suspend,
       coalesce((workflow::JSON)->'status'->>'message', '') AS message,
       coalesce((workflow::JSON)->'status'->>'estimatedDuration', '0') AS estimatedduration,
       coalesce((workflow::JSON)->'status'->>'resourcesDuration', '{}') AS resourcesduration
FROM "argo_archived_workflows"
WHERE (("clustername" = 'cx5j6rrzm5vnkfqqr4d'
        AND "instanceid" = '')
       AND "namespace" = 'bs98pxsfhs9z'
       AND EXISTS
         (SELECT 1
          FROM argo_archived_workflows_labels
          WHERE clustername = argo_archived_workflows.clustername
            AND UID = argo_archived_workflows.uid
            AND name = 'workflows.argoproj.io/phase'
            AND value = 'Succeeded')
       AND EXISTS
         (SELECT 1
          FROM argo_archived_workflows_labels
          WHERE clustername = argo_archived_workflows.clustername
            AND UID = argo_archived_workflows.uid
            AND name = 'workflows.argoproj.io/workflow-archiving-status'
            AND value = 'Persisted'))
ORDER BY "startedat" DESC
LIMIT 1;

but it consistently executes in <2ms:

Limit  (cost=1.13..60.41 rows=1 width=318) (actual time=1.067..1.068 rows=1 loops=1)
  ->  Nested Loop  (cost=1.13..100778.36 rows=1700 width=318) (actual time=1.065..1.066 rows=1 loops=1)
        Join Filter: ((argo_archived_workflows_labels.uid)::text = (argo_archived_workflows_labels_1.uid)::text)
        ->  Nested Loop  (cost=0.71..89644.36 rows=3364 width=1617) (actual time=0.071..0.071 rows=1 loops=1)
              ->  Index Scan Backward using argo_archived_workflows_i4 on argo_archived_workflows  (cost=0.29..24325.29 rows=19917 width=1580) (actual time=0.026..0.027 rows=1 loops=1)
                    Filter: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((instanceid)::text = ''::text) AND ((namespace)::text = 'bs98pxsfhs9z'::text))
                    Rows Removed by Filter: 2
              ->  Index Scan using argo_archived_workflows_labels_pkey on argo_archived_workflows_labels  (cost=0.42..3.28 rows=1 width=57) (actual time=0.040..0.040 rows=1 loops=1)
                    Index Cond: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((uid)::text = (argo_archived_workflows.uid)::text) AND ((name)::text = 'workflows.argoproj.io/phase'::text))
                    Filter: ((value)::text = 'Succeeded'::text)
        ->  Index Scan using argo_archived_workflows_labels_pkey on argo_archived_workflows_labels argo_archived_workflows_labels_1  (cost=0.42..3.28 rows=1 width=57) (actual time=0.016..0.016 rows=1 loops=1)
              Index Cond: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((uid)::text = (argo_archived_workflows.uid)::text) AND ((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text))
              Filter: ((value)::text = 'Persisted'::text)
Planning Time: 1.916 ms
Execution Time: 1.226 ms

The query plan shows it's doing an index scan on the 19,917 rows associated with the namespace bs98pxsfhs9z, which is what it should be doing. I'm not sure why @ryancurrah got such different results. Maybe the average size of the workflows in his DB is very large? If so, that would indicate TOAST issues. You can use this query to check:

select avg(octet_length(workflow::text)), avg(pg_column_size(workflow)) from argo_archived_workflows;

Results for me:

          avg          |          avg          
-----------------------+-----------------------
 5907.8400000000000000 | 1488.8100000000000000
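
A complementary check, a sketch assuming Postgres, is to compare the table size with and without its out-of-line TOAST data:

```sql
-- pg_table_size includes TOAST data, pg_relation_size does not, so the difference
-- gives a rough idea of how much workflow data is stored out-of-line.
SELECT pg_size_pretty(pg_table_size('argo_archived_workflows'))    AS table_including_toast,
       pg_size_pretty(pg_relation_size('argo_archived_workflows')) AS main_table_only;
```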

@agilgur5
Member Author

agilgur5 commented Oct 7, 2024

Every workflow is ~1466 bytes

Maybe the average size of the workflows in his DB is larger than 2KB?

Yea I wouldn't be surprised by that; after including the status, workflows can be very large. See also node status offloading and #7121

@MasonM
Contributor

MasonM commented Oct 7, 2024

I edited my previous comment because I think I was wrong about the workflow size being ~1466 bytes. That was the value returned by select avg(pg_column_size(workflow)) from argo_archived_workflows, which I assumed included TOAST'd values, but it appears it doesn't (relevant StackOverflow). The actual size is ~5900 bytes per select avg(octet_length(workflow::text)) from argo_archived_workflows.

@kodieg

kodieg commented Oct 7, 2024

These 100,000 workflows were from a single namespace. I calculated the average and max size of the workflow column:

argo=# select avg(octet_length(workflow::text)) from argo_archived_workflows;
        avg
--------------------
 18599.709432882786

argo=# select max(octet_length(workflow::text)) from argo_archived_workflows;
  max
-------
 65875

Also, your EXPLAIN indicates that you are retrieving a single row. That works instantly for me as well, but retrieving 1000 rows takes much more time:

argo=# \timing

SELECT name,
namespace,
UID,
phase,
startedat,
finishedat,
coalesce((workflow::JSON)->'metadata'->>'labels', '{}') AS labels,
coalesce((workflow::JSON)->'metadata'->>'annotations', '{}') AS annotations,
coalesce((workflow::JSON)->'metadata'->>'creationTimestamp', '') AS creationtimestamp,
(workflow::JSON)->'spec'->>'suspend' AS suspend,
coalesce((workflow::JSON)->'status'->>'message', '') AS message,
coalesce((workflow::JSON)->'status'->>'estimatedDuration', '0') AS estimatedduration,
coalesce((workflow::JSON)->'status'->>'resourcesDuration', '{}') AS resourcesduration
FROM "argo_archived_workflows"
WHERE (("clustername" = 'default'
AND "instanceid" = '')
AND "namespace" = 'argo-workflows'
AND EXISTS
(SELECT 1
FROM argo_archived_workflows_labels
WHERE clustername = argo_archived_workflows.clustername
AND UID = argo_archived_workflows.uid
AND name = 'workflows.argoproj.io/phase'
AND value = 'Succeeded') AND EXISTS (SELECT 1
FROM argo_archived_workflows_labels
WHERE clustername = argo_archived_workflows.clustername
AND UID = argo_archived_workflows.uid
AND name = 'workflows.argoproj.io/workflow-archiving-status'
AND value = 'Persisted'))
ORDER BY "startedat" DESC
LIMIT 1000;

Time: 61666.463 ms (01:01.666)

With JSONB it takes less than 5 seconds:

SELECT name,
       namespace,
       UID,
       phase,
       startedat,
       finishedat,
       coalesce((workflow::JSONB)->'metadata'->>'labels', '{}') AS labels,
       coalesce((workflow::JSONB)->'metadata'->>'annotations', '{}') AS annotations,
       coalesce((workflow::JSONB)->'metadata'->>'creationTimestamp', '') AS creationtimestamp,
       (workflow::JSONB)->'spec'->>'suspend' AS suspend,
       coalesce((workflow::JSONB)->'status'->>'message', '') AS message,
       coalesce((workflow::JSONB)->'status'->>'estimatedDuration', '0') AS estimatedduration,
       coalesce((workflow::JSONB)->'status'->>'resourcesDuration', '{}') AS resourcesduration
FROM "argo_archived_workflows"
WHERE (("clustername" = 'default'
        AND "instanceid" = '')
       AND "namespace" = 'argo-workflows'
       AND EXISTS
         (SELECT 1
          FROM argo_archived_workflows_labels
          WHERE clustername = argo_archived_workflows.clustername
            AND UID = argo_archived_workflows.uid
            AND name = 'workflows.argoproj.io/phase'
            AND value = 'Succeeded') AND EXISTS (SELECT 1
          FROM argo_archived_workflows_labels
          WHERE clustername = argo_archived_workflows.clustername
            AND UID = argo_archived_workflows.uid
            AND name = 'workflows.argoproj.io/workflow-archiving-status'
            AND value = 'Persisted'))
ORDER BY "startedat" DESC
LIMIT 1000;

Even so, fetching a single row takes ~60ms:

SELECT name,
       namespace,
       UID,
       phase,
       startedat,
       finishedat,
       coalesce((workflow::JSON)->'metadata'->>'labels', '{}') AS labels,
       coalesce((workflow::JSON)->'metadata'->>'annotations', '{}') AS annotations,
       coalesce((workflow::JSON)->'metadata'->>'creationTimestamp', '') AS creationtimestamp,
       (workflow::JSON)->'spec'->>'suspend' AS suspend,
       coalesce((workflow::JSON)->'status'->>'message', '') AS message,
       coalesce((workflow::JSON)->'status'->>'estimatedDuration', '0') AS estimatedduration,
       coalesce((workflow::JSON)->'status'->>'resourcesDuration', '{}') AS resourcesduration
FROM "argo_archived_workflows"
WHERE (("clustername" = 'default'
        AND "instanceid" = '')
       AND "namespace" = 'argo-workflows'
       AND EXISTS
         (SELECT 1
          FROM argo_archived_workflows_labels
          WHERE clustername = argo_archived_workflows.clustername
            AND UID = argo_archived_workflows.uid
            AND name = 'workflows.argoproj.io/phase'
            AND value = 'Succeeded') AND EXISTS (SELECT 1
          FROM argo_archived_workflows_labels
          WHERE clustername = argo_archived_workflows.clustername
            AND UID = argo_archived_workflows.uid
            AND name = 'workflows.argoproj.io/workflow-archiving-status'
            AND value = 'Persisted'))
ORDER BY "startedat" DESC
LIMIT 1;

Time: 61.373 ms

Below is the execution plan for fetching a single row:

                                                                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.12..693.65 rows=1 width=343) (actual time=1.253..1.254 rows=1 loops=1)
   ->  Nested Loop  (cost=1.12..65098.89 rows=94 width=343) (actual time=1.252..1.253 rows=1 loops=1)
         Join Filter: ((argo_archived_workflows_labels.uid)::text = (argo_archived_workflows_labels_1.uid)::text)
         ->  Nested Loop  (cost=0.71..60559.85 rows=1096 width=182) (actual time=0.039..0.039 rows=1 loops=1)
               ->  Index Scan Backward using argo_archived_workflows_i4 on argo_archived_workflows  (cost=0.29..2133.01 rows=14172 width=145) (actual time=0.018..0.018 rows=1 loops=1)
                     Filter: (((clustername)::text = 'default'::text) AND ((instanceid)::text = ''::text) AND ((namespace)::text = 'argo-workflows'::text))
               ->  Index Scan using argo_archived_workflows_labels_pkey on argo_archived_workflows_labels  (cost=0.42..4.12 rows=1 width=45) (actual time=0.018..0.018 rows=1 loops=1)
                     Index Cond: (((clustername)::text = 'default'::text) AND ((uid)::text = (argo_archived_workflows.uid)::text) AND ((name)::text = 'workflows.argoproj.io/phase'::text))
                     Filter: ((value)::text = 'Succeeded'::text)
         ->  Index Scan using argo_archived_workflows_labels_pkey on argo_archived_workflows_labels argo_archived_workflows_labels_1  (cost=0.42..4.12 rows=1 width=45) (actual time=0.008..0.008 rows=1 loops=1)
               Index Cond: (((clustername)::text = 'default'::text) AND ((uid)::text = (argo_archived_workflows.uid)::text) AND ((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text))
               Filter: ((value)::text = 'Persisted'::text)
 Planning Time: 0.981 ms
 Execution Time: 1.316 ms
(14 rows)

Time: 58.798 ms

And for fetching 1000 rows:

 Limit  (cost=5133.53..5133.77 rows=94 width=343) (actual time=58700.688..58700.913 rows=1000 loops=1)
   ->  Sort  (cost=5133.53..5133.77 rows=94 width=343) (actual time=58700.687..58700.843 rows=1000 loops=1)
         Sort Key: argo_archived_workflows.startedat DESC
         Sort Method: top-N heapsort  Memory: 2080kB
         ->  Nested Loop  (cost=2490.79..5130.45 rows=94 width=343) (actual time=29.798..58665.875 rows=13214 loops=1)
               Join Filter: ((argo_archived_workflows_labels.uid)::text = (argo_archived_workflows.uid)::text)
               ->  Hash Join  (cost=2490.38..4741.70 rows=94 width=90) (actual time=18.732..67.268 rows=13214 loops=1)
                     Hash Cond: ((argo_archived_workflows_labels_1.uid)::text = (argo_archived_workflows_labels.uid)::text)
                     ->  Bitmap Heap Scan on argo_archived_workflows_labels argo_archived_workflows_labels_1  (cost=180.87..2429.01 rows=1215 width=45) (actual time=3.345..27.844 rows=13983 loops=1)
                           Recheck Cond: (((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text) AND ((value)::text = 'Persisted'::text))
                           Filter: ((clustername)::text = 'default'::text)
                           Heap Blocks: exact=2321
                           ->  Bitmap Index Scan on argo_archived_workflows_labels_i1  (cost=0.00..180.57 rows=1215 width=0) (actual time=2.986..2.986 rows=14183 loops=1)
                                 Index Cond: (((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text) AND ((value)::text = 'Persisted'::text))
                     ->  Hash  (cost=2295.81..2295.81 rows=1096 width=45) (actual time=15.378..15.379 rows=13214 loops=1)
                           Buckets: 16384 (originally 2048)  Batches: 1 (originally 1)  Memory Usage: 1122kB
                           ->  Bitmap Heap Scan on argo_archived_workflows_labels  (cost=163.65..2295.81 rows=1096 width=45) (actual time=3.275..11.281 rows=13214 loops=1)
                                 Recheck Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
                                 Filter: ((clustername)::text = 'default'::text)
                                 Heap Blocks: exact=2283
                                 ->  Bitmap Index Scan on argo_archived_workflows_labels_i1  (cost=0.00..163.38 rows=1096 width=0) (actual time=2.924..2.924 rows=13414 loops=1)
                                       Index Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
               ->  Index Scan using argo_archived_workflows_pkey on argo_archived_workflows  (cost=0.41..4.05 rows=1 width=145) (actual time=0.011..0.011 rows=1 loops=13214)
                     Index Cond: (((clustername)::text = 'default'::text) AND ((uid)::text = (argo_archived_workflows_labels_1.uid)::text))
                     Filter: (((instanceid)::text = ''::text) AND ((namespace)::text = 'argo-workflows'::text))
 Planning Time: 0.959 ms
 Execution Time: 58701.536 ms
(27 rows)

Time: 58733.573 ms (00:58.734)

I started to dig a little bit more and found a single row that takes that long to fetch if I query it with some offset.

                                                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5133.34..5133.34 rows=1 width=365) (actual time=60255.876..60255.880 rows=1 loops=1)
   ->  Sort  (cost=5133.31..5133.55 rows=94 width=365) (actual time=60255.872..60255.877 rows=12 loops=1)
         Sort Key: argo_archived_workflows.startedat DESC
         Sort Method: top-N heapsort  Memory: 50kB
         ->  Nested Loop  (cost=2490.79..5131.16 rows=94 width=365) (actual time=29.639..60230.904 rows=13029 loops=1)
               Join Filter: ((argo_archived_workflows_labels.uid)::text = (argo_archived_workflows.uid)::text)
               ->  Hash Join  (cost=2490.38..4741.70 rows=94 width=90) (actual time=19.018..63.536 rows=13029 loops=1)
                     Hash Cond: ((argo_archived_workflows_labels_1.uid)::text = (argo_archived_workflows_labels.uid)::text)
                     ->  Bitmap Heap Scan on argo_archived_workflows_labels argo_archived_workflows_labels_1  (cost=180.87..2429.01 rows=1215 width=45) (actual time=3.701..26.502 rows=13798 loops=1)
                           Recheck Cond: (((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text) AND ((value)::text = 'Persisted'::text))
                           Filter: ((clustername)::text = 'default'::text)
                           Heap Blocks: exact=2291
                           ->  Bitmap Index Scan on argo_archived_workflows_labels_i1  (cost=0.00..180.57 rows=1215 width=0) (actual time=3.346..3.347 rows=14198 loops=1)
                                 Index Cond: (((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text) AND ((value)::text = 'Persisted'::text))
                     ->  Hash  (cost=2295.81..2295.81 rows=1096 width=45) (actual time=15.308..15.309 rows=13029 loops=1)
                           Buckets: 16384 (originally 2048)  Batches: 1 (originally 1)  Memory Usage: 1108kB
                           ->  Bitmap Heap Scan on argo_archived_workflows_labels  (cost=163.65..2295.81 rows=1096 width=45) (actual time=3.407..11.517 rows=13029 loops=1)
                                 Recheck Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
                                 Filter: ((clustername)::text = 'default'::text)
                                 Heap Blocks: exact=2253
                                 ->  Bitmap Index Scan on argo_archived_workflows_labels_i1  (cost=0.00..163.38 rows=1096 width=0) (actual time=2.962..2.963 rows=13429 loops=1)
                                       Index Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
               ->  Index Scan using argo_archived_workflows_pkey on argo_archived_workflows  (cost=0.41..4.05 rows=1 width=145) (actual time=0.010..0.010 rows=1 loops=13029)
                     Index Cond: (((clustername)::text = 'default'::text) AND ((uid)::text = (argo_archived_workflows_labels_1.uid)::text))
                     Filter: (((instanceid)::text = ''::text) AND ((namespace)::text = 'argo-workflows'::text))
 Planning Time: 0.977 ms
 Execution Time: 60255.951 ms
(27 rows)

Time: 60343.720 ms (01:00.344)

@kodieg

kodieg commented Oct 7, 2024

I noticed that you used the old query (and I did as well, the one with EXISTS) that supposedly parses all the JSON. I tested with the new query:

EXPLAIN ANALYZE SELECT name,
       namespace,
       UID,
       phase,
       startedat,
       finishedat,
       coalesce((workflow::JSON)->'metadata'->>'labels', '{}') AS labels,
       coalesce((workflow::JSON)->'metadata'->>'annotations', '{}') AS annotations,
       coalesce((workflow::JSON)->'status'->>'progress', '') AS progress,
       coalesce((workflow::JSON)->'metadata'->>'creationTimestamp', '') AS creationtimestamp,
       (workflow::JSON)->'spec'->>'suspend' AS suspend,
       coalesce((workflow::JSON)->'status'->>'message', '') AS message,
       coalesce((workflow::JSON)->'status'->>'estimatedDuration', '0') AS estimatedduration,
       coalesce((workflow::JSON)->'status'->>'resourcesDuration', '{}') AS resourcesduration
FROM "argo_archived_workflows"
WHERE "clustername" = 'default'
  AND UID IN
    (SELECT UID
     FROM "argo_archived_workflows"
     WHERE (("clustername" = 'default'
             AND "instanceid" = '')
            AND "namespace" = 'argo-workflows'
            AND EXISTS
              (SELECT 1
               FROM argo_archived_workflows_labels
               WHERE clustername = argo_archived_workflows.clustername
                 AND UID = argo_archived_workflows.uid
                 AND name = 'workflows.argoproj.io/phase'
                 AND value = 'Succeeded')
           )
     ORDER BY "startedat" DESC
     LIMIT 1000);

The JSON version takes about 5 seconds:

                                                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=4276.07..6182.55 rows=1000 width=375) (actual time=47.982..5625.164 rows=1000 loops=1)
   Hash Cond: ((argo_archived_workflows.uid)::text = ("ANY_subquery".uid)::text)
   ->  Seq Scan on argo_archived_workflows  (cost=0.00..1778.15 rows=14172 width=137) (actual time=0.008..7.680 rows=13812 loops=1)
         Filter: ((clustername)::text = 'default'::text)
   ->  Hash  (cost=4263.57..4263.57 rows=1000 width=37) (actual time=36.108..36.112 rows=1000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 76kB
         ->  Subquery Scan on "ANY_subquery"  (cost=4251.07..4263.57 rows=1000 width=37) (actual time=35.608..35.879 rows=1000 loops=1)
               ->  Limit  (cost=4251.07..4253.57 rows=1000 width=45) (actual time=35.607..35.738 rows=1000 loops=1)
                     ->  Sort  (cost=4251.07..4253.81 rows=1096 width=45) (actual time=35.606..35.666 rows=1000 loops=1)
                           Sort Key: argo_archived_workflows_1.startedat DESC
                           Sort Method: top-N heapsort  Memory: 205kB
                           ->  Hash Join  (cost=2309.51..4195.73 rows=1096 width=45) (actual time=15.081..28.178 rows=13043 loops=1)
                                 Hash Cond: ((argo_archived_workflows_1.uid)::text = (argo_archived_workflows_labels.uid)::text)
                                 ->  Seq Scan on argo_archived_workflows argo_archived_workflows_1  (cost=0.00..1849.01 rows=14172 width=53) (actual time=0.003..7.618 rows=13812 loops=1)
                                       Filter: (((clustername)::text = 'default'::text) AND ((instanceid)::text = ''::text) AND ((namespace)::text = 'argo-workflows'::text))
                                 ->  Hash  (cost=2295.81..2295.81 rows=1096 width=45) (actual time=15.071..15.072 rows=13043 loops=1)
                                       Buckets: 16384 (originally 2048)  Batches: 1 (originally 1)  Memory Usage: 1109kB
                                       ->  Bitmap Heap Scan on argo_archived_workflows_labels  (cost=163.65..2295.81 rows=1096 width=45) (actual time=3.226..11.272 rows=13043 loops=1)
                                             Recheck Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
                                             Filter: ((clustername)::text = 'default'::text)
                                             Heap Blocks: exact=2255
                                             ->  Bitmap Index Scan on argo_archived_workflows_labels_i1  (cost=0.00..163.38 rows=1096 width=0) (actual time=2.883..2.883 rows=13443 loops=1)
                                                   Index Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
 Planning Time: 0.641 ms
 Execution Time: 5625.530 ms
(25 rows)

Time: 5714.274 ms (00:05.714)

while the JSONB version takes about 500ms:

                                                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=4276.07..6142.55 rows=1000 width=375) (actual time=36.683..494.291 rows=1000 loops=1)
   Hash Cond: ((argo_archived_workflows.uid)::text = ("ANY_subquery".uid)::text)
   ->  Seq Scan on argo_archived_workflows  (cost=0.00..1778.15 rows=14172 width=137) (actual time=0.008..5.076 rows=13812 loops=1)
         Filter: ((clustername)::text = 'default'::text)
   ->  Hash  (cost=4263.57..4263.57 rows=1000 width=37) (actual time=35.711..35.716 rows=1000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 76kB
         ->  Subquery Scan on "ANY_subquery"  (cost=4251.07..4263.57 rows=1000 width=37) (actual time=35.226..35.483 rows=1000 loops=1)
               ->  Limit  (cost=4251.07..4253.57 rows=1000 width=45) (actual time=35.225..35.355 rows=1000 loops=1)
                     ->  Sort  (cost=4251.07..4253.81 rows=1096 width=45) (actual time=35.224..35.284 rows=1000 loops=1)
                           Sort Key: argo_archived_workflows_1.startedat DESC
                           Sort Method: top-N heapsort  Memory: 205kB
                           ->  Hash Join  (cost=2309.51..4195.73 rows=1096 width=45) (actual time=14.685..27.776 rows=13043 loops=1)
                                 Hash Cond: ((argo_archived_workflows_1.uid)::text = (argo_archived_workflows_labels.uid)::text)
                                 ->  Seq Scan on argo_archived_workflows argo_archived_workflows_1  (cost=0.00..1849.01 rows=14172 width=53) (actual time=0.003..7.707 rows=13812 loops=1)
                                       Filter: (((clustername)::text = 'default'::text) AND ((instanceid)::text = ''::text) AND ((namespace)::text = 'argo-workflows'::text))
                                 ->  Hash  (cost=2295.81..2295.81 rows=1096 width=45) (actual time=14.676..14.678 rows=13043 loops=1)
                                       Buckets: 16384 (originally 2048)  Batches: 1 (originally 1)  Memory Usage: 1109kB
                                       ->  Bitmap Heap Scan on argo_archived_workflows_labels  (cost=163.65..2295.81 rows=1096 width=45) (actual time=3.230..11.149 rows=13043 loops=1)
                                             Recheck Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
                                             Filter: ((clustername)::text = 'default'::text)
                                             Heap Blocks: exact=2255
                                             ->  Bitmap Index Scan on argo_archived_workflows_labels_i1  (cost=0.00..163.38 rows=1096 width=0) (actual time=2.884..2.885 rows=13443 loops=1)
                                                   Index Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
 Planning Time: 0.596 ms
 Execution Time: 494.462 ms
(25 rows)

Time: 535.514 ms

I tested fetching 10, 100, and 1000 rows and the times increase almost linearly, but the JSONB version is consistently ~10x faster.

@MasonM
Contributor

MasonM commented Oct 8, 2024

@kodieg Thanks for the details!

Also explain indicates that you are retrieving a single row. It works instantly as well for me, but retrieving 1000 of rows takes much more time

The queries in the PR description of #13566 all had LIMIT 1, and the screenshot shows the query taking 11495ms before the change. If it's nearly instant for you, that means you can't reproduce it either. Of course, there are still other performance issues here, and fixing those might end up fixing the original issue, but we can't be sure if we can't reproduce it.

In any case, I tried to replicate your setup by creating a large workflow to use as a template: https://gist.github.com/MasonM/15cfb3c4f497324ac67d1a1a334d65db

Then I ran the following command to insert 100,000 randomized workflows using that template, all of which are associated with a single namespace (bs98pxsfhs9z):

$ time go run ./hack/db/main.go fake-archived-workflows --rows 100000 --namespaces 1 --seed 2600415997737113470 --template '@large-workflow.yaml'
Using seed 2600415997737113470
Clusters: [cx5j6rrzm5vnkfqqr4d]
Namespaces: [bs98pxsfhs9z]
Inserted 100000 rows

real    19m7.916s
user    2m47.567s
sys     0m48.239s

The average workflow size is now ~19KB:

postgres=# select avg(octet_length(workflow::text)), avg(pg_column_size(workflow)) from argo_archived_workflows;
        avg         |          avg          
--------------------+-----------------------
 18900.846520000000 | 1707.9891100000000000
(1 row)

After that, I ran the query from my original comment (the "before" query), which completes in <4ms, so I still can't reproduce the original issue. If I change LIMIT 1 to LIMIT 1000, the execution time jumps to ~450ms, which I'd consider to still be pretty decent, since it's rare someone would need to list 1000 workflows at once. However, the "new" query from #13566 is much slower for 1,000 rows, taking ~1250ms:

postgres=# EXPLAIN ANALYZE SELECT name,
postgres-#        namespace,
postgres-#        UID,
postgres-#        phase,
postgres-#        startedat,
postgres-#        finishedat,
postgres-#        coalesce((workflow::JSON)->'metadata'->>'labels', '{}') AS labels,
postgres-#        coalesce((workflow::JSON)->'metadata'->>'annotations', '{}') AS annotations,
postgres-#        coalesce((workflow::JSON)->'status'->>'progress', '') AS progress,
postgres-#        coalesce((workflow::JSON)->'metadata'->>'creationTimestamp', '') AS creationtimestamp,
postgres-#        (workflow::JSON)->'spec'->>'suspend' AS suspend,
postgres-#        coalesce((workflow::JSON)->'status'->>'message', '') AS message,
postgres-#        coalesce((workflow::JSON)->'status'->>'estimatedDuration', '0') AS estimatedduration,
postgres-#        coalesce((workflow::JSON)->'status'->>'resourcesDuration', '{}') AS resourcesduration
postgres-# FROM "argo_archived_workflows"
postgres-# WHERE "clustername" = 'cx5j6rrzm5vnkfqqr4d'
postgres-#   AND UID IN
postgres-#     (SELECT UID
postgres(#      FROM "argo_archived_workflows"
postgres(#      WHERE (("clustername" = 'cx5j6rrzm5vnkfqqr4d'
postgres(#              AND "instanceid" = '')
postgres(#             AND "namespace" = 'bs98pxsfhs9z'
postgres(#             AND EXISTS
postgres(#               (SELECT 1
postgres(#                FROM argo_archived_workflows_labels
postgres(#                WHERE clustername = argo_archived_workflows.clustername
postgres(#                  AND UID = argo_archived_workflows.uid
postgres(#                  AND name = 'workflows.argoproj.io/phase'
postgres(#                  AND value = 'Succeeded')
postgres(#            )
postgres(#      ORDER BY "startedat" DESC
postgres(#      LIMIT 1000);
                                                                                                                   QUERY PLAN                                                                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=9212.87..16488.55 rows=1000 width=349) (actual time=33.046..1213.150 rows=1000 loops=1)
   ->  HashAggregate  (cost=9212.45..9222.45 rows=1000 width=37) (actual time=31.731..32.037 rows=1000 loops=1)
         Group Key: ("ANY_subquery".uid)::text
         ->  Subquery Scan on "ANY_subquery"  (cost=1000.74..9209.95 rows=1000 width=37) (actual time=11.299..31.039 rows=1000 loops=1)
               ->  Limit  (cost=1000.74..9199.95 rows=1000 width=45) (actual time=11.293..30.882 rows=1000 loops=1)
                     ->  Gather Merge  (cost=1000.74..136353.38 rows=16508 width=45) (actual time=11.290..30.783 rows=1000 loops=1)
                           Workers Planned: 2
                           Workers Launched: 2
                           ->  Nested Loop  (cost=0.71..133447.92 rows=6878 width=45) (actual time=0.222..27.746 rows=767 loops=3)
                                 ->  Parallel Index Scan Backward using argo_archived_workflows_i4 on argo_archived_workflows argo_archived_workflows_1  (cost=0.29..102561.01 rows=41667 width=65) (actual time=0.051..7.306 rows=2262 loops=3)
                                       Filter: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((instanceid)::text = ''::text) AND ((namespace)::text = 'bs98pxsfhs9z'::text))
                                 ->  Index Scan using argo_archived_workflows_labels_pkey on argo_archived_workflows_labels  (cost=0.42..0.74 rows=1 width=57) (actual time=0.009..0.009 rows=0 loops=6787)
                                       Index Cond: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((uid)::text = (argo_archived_workflows_1.uid)::text) AND ((name)::text = 'workflows.argoproj.io/phase'::text))
                                       Filter: ((value)::text = 'Succeeded'::text)
                                       Rows Removed by Filter: 1
   ->  Index Scan using argo_archived_workflows_pkey on argo_archived_workflows  (cost=0.42..7.29 rows=1 width=1801) (actual time=0.005..0.005 rows=1 loops=1000)
         Index Cond: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((uid)::text = ("ANY_subquery".uid)::text))
 Planning Time: 2.211 ms
 Execution Time: 1213.592 ms
(19 rows)

My best guess is that the performance is highly label-dependent. Could you please run the following query so we can get some stats?

select count(*) total_labels, avg(labels.count) average_labels_per_workflow, avg(labels.avg) average_label_size from (select count(*), avg(octet_length(name)) from argo_archived_workflows_labels group by uid) labels;

@agilgur5
Member Author

agilgur5 commented Oct 8, 2024

My best guess is that the performance is highly label-dependent

all of which are associated with a single namespace (bs98pxsfhs9z):

The namespace is also possibly a confounding variable. The DB version is another (more optimizations, engines, etc in newer versions).

However, the "new" query from #13566 is much slower for 1,000 rows, taking ~1250ms:

Interesting. I'd expect it to lack some performance as it adds a subquery, but did not expect a contradictory result.
That being said, the original goal of this issue (per its description) is to replace that with a union index anyway.

Also hopefully @ryancurrah will see this soon and add some reproducible color
The EXPLAIN logs in the original issue might help as well.

@ryancurrah
Contributor

In our deployment we only use one namespace for workflows. The workflows have around 10 labels; 7 of them are ones we added and the rest are from Argo.

Result of:

select avg(octet_length(workflow::text)), avg(pg_column_size(workflow)) from argo_archived_workflows;

"avg" "avg-2"
69368.494313104989 12034.487900223380

Result of:

select count(*) total_labels, avg(labels.count) average_labels_per_workflow, avg(labels.avg) average_label_size from (select count(*), avg(octet_length(name)) from argo_archived_workflows_labels group by uid) labels;

"total_labels" "average_labels_per_workflow" "average_label_size"
107440 12.6245904690990320 21.3524800539094153

Hope this helps.

MasonM added a commit to MasonM/argo-workflows that referenced this issue Oct 16, 2024
This adds basic benchmarks for listing and counting archived workflows
so we can evaluate potential optimizations, e.g. argoproj#13601

```
$ make BenchmarkWorkflowArchive
GIT_COMMIT=f218fc540367840d013a99642590d3509560de51 GIT_BRANCH=feat-postgresql-jsonb GIT_TAG=untagged GIT_TREE_STATE=dirty RELEASE_TAG=false DEV_BRANCH=true VERSION=latest
KUBECTX=k3d-k3s-default DOCKER_DESKTOP=false K3D=true DOCKER_PUSH=false TARGET_PLATFORM=linux/amd64
RUN_MODE=local PROFILE=minimal AUTH_MODE=hybrid SECURE=false  STATIC_FILES=false ALWAYS_OFFLOAD_NODE_STATUS=false UPPERIO_DB_DEBUG=0 LOG_LEVEL=debug NAMESPACED=true
go test --tags api,cli,cron,executor,examples,corefunctional,functional,plugins ./test/e2e -run='BenchmarkWorkflowArchive' -benchmem -bench .
WARN[0000] Non-transient error: <nil>
WARN[0000] Non-transient error: <nil>
goos: linux
goarch: amd64
pkg: github.com/argoproj/argo-workflows/v3/test/e2e
cpu: 12th Gen Intel(R) Core(TM) i5-12400
BenchmarkWorkflowArchive/ListWorkflows-12                      6         167109091 ns/op          527468 B/op       8614 allocs/op
--- BENCH: BenchmarkWorkflowArchive/ListWorkflows-12
    workflow_archive_test.go:27: Found 100 workflows
    workflow_archive_test.go:27: Found 100 workflows
    workflow_archive_test.go:27: Found 100 workflows
    workflow_archive_test.go:27: Found 100 workflows
    workflow_archive_test.go:27: Found 100 workflows
    workflow_archive_test.go:27: Found 100 workflows
    workflow_archive_test.go:27: Found 100 workflows
BenchmarkWorkflowArchive/CountWorkflows-12                    31          36799882 ns/op            9022 B/op        212 allocs/op
--- BENCH: BenchmarkWorkflowArchive/CountWorkflows-12
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
        ... [output truncated]
PASS
ok      github.com/argoproj/argo-workflows/v3/test/e2e  3.392s
```

Signed-off-by: Mason Malone <[email protected]>
@MasonM
Contributor

MasonM commented Oct 17, 2024

Okay, I spent a lot of time looking into this, and I'm pretty sure the main problem is a TOAST issue (as I alluded to in #13601 (comment)) introduced in #12912. That PR changed the query used to list workflows from select workflow from argo_archived_workflows... to

select name, namespace, uid, phase, startedat, finishedat, coalesce((workflow::json)->'metadata'->>'labels', '{}') as labels, coalesce((workflow::json)->'metadata'->>'annotations', '{}') as annotations, coalesce((workflow::json)->'status'->>'progress', '') as progress from argo_archived_workflows ...

While this change reduces the amount of data that needs to be transferred, it significantly reduces performance for users with a large number of workflows, due to TOAST. The article Jsonb: few more stories about the performance has a good explanation:

At first glance there is no significant difference between them, but this impression is wrong. In the first case data is detoasted just once, everything is fine. Unfortunately, in the second case data is detoasted twice for both keys. And it’s like that not only for jsonb, but for all composite data types in PostgreSQL. In general, it’s not a big issue, but if our document is quite big and we’re trying to extract too many keys, we can end up with huge overhead.

In this case, the workflow column is being detoasted three times. If the average workflow is small, the performance impact is negligible, but it can become very high if you are storing 50KB+ workflows.
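
For illustration, one possible way to pay the detoast/conversion cost only once is to cast in a materialized CTE and extract all keys from the already-converted value. This is only a sketch (it assumes Postgres 12+ for MATERIALIZED, and placeholder filter values), and whether it actually avoids the repeated detoasting in practice would need verification:

```sql
-- Sketch only: detoast/convert the workflow column once for the LIMITed rows,
-- then extract all JSON keys from the materialized jsonb value.
WITH w AS MATERIALIZED (
  SELECT name, namespace, uid, phase, startedat, finishedat,
         workflow::jsonb AS wf
  FROM argo_archived_workflows
  WHERE clustername = 'default' AND instanceid = '' AND namespace = 'argo'
  ORDER BY startedat DESC
  LIMIT 20
)
SELECT name, namespace, uid, phase, startedat, finishedat,
       coalesce(wf->'metadata'->>'labels', '{}')      AS labels,
       coalesce(wf->'metadata'->>'annotations', '{}') AS annotations,
       coalesce(wf->'status'->>'progress', '')        AS progress
FROM w;
```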

Reverting #12912 solves this, which I verified with the following steps:

  1. Use test: simple DB CLI for local development #13715 to generate 100,000 randomized workflows, with https://gist.github.com/MasonM/52932ff6644c3c0ccea9e847780bfd90 as a template:

    $ time go run ./hack/db fake-archived-workflows  --template "@very-large-workflow.yaml" --rows 100000
    Using seed 1935828722624432788
    Clusters: [default]
    Namespaces: [argo]
    Inserted 100000 rows
    
    real    18m35.316s
    user    3m2.447s
    sys     0m44.972s
    
  2. Run the benchmarks using test: basic benchmarks for workflow archive DB operations #13767:

    make BenchmarkWorkflowArchive > postgres_before_10000_workflows.txt
    
  3. Checkout this branch: main...MasonM:argo-workflows:revert-12912

  4. Re-run benchmarks:

    make BenchmarkWorkflowArchive > postgres_revert_10000_workflows.txt
    
  5. Compare results using benchstat:

    $ benchstat postgres_before_10000_workflows.txt postgres_revert_10000_workflows.txt
    goos: linux
    goarch: amd64
    pkg: github.com/argoproj/argo-workflows/v3/test/e2e
    cpu: 12th Gen Intel(R) Core(TM) i5-12400
                                                         │ postgres_before_10000_workflows.txt │  postgres_revert_10000_workflows.txt  │
                                                         │               sec/op                │    sec/op     vs base                 │
    WorkflowArchive/ListWorkflows-12                                             184.47m ± ∞ ¹   47.90m ± ∞ ¹        ~ (p=1.000 n=1) ²
    WorkflowArchive/ListWorkflows_with_label_selector-12                         184.23m ± ∞ ¹   46.70m ± ∞ ¹        ~ (p=1.000 n=1) ²
    WorkflowArchive/CountWorkflows-12                                             13.35m ± ∞ ¹   12.99m ± ∞ ¹        ~ (p=1.000 n=1) ²
    geomean                                                                       76.84m         30.74m        -59.99%
    ¹ need >= 6 samples for confidence interval at level 0.95
    ² need >= 4 samples to detect a difference at alpha level 0.05
    

However, reverting #12912 could re-introduce the memory issues mentioned in that PR, and will probably worsen query performance for MySQL (though I haven't tested that).

Migrating the workflow column to be of type jsonb is the biggest thing I could identify to improve performance. It doesn't eliminate the detoast overhead, but JSON operators are so much faster on jsonb that it doesn't matter. The only downside is it takes a long time to migrate if you have a lot of workflows. I'll enter a followup PR with more details, but I figured I'd get my findings out first. EDIT: Here's the PR: #13779
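
For reference, a minimal sketch of that migration (the same statement appears verbatim in the migration log quoted below):

```
-- Converts the workflow column from json to jsonb. This rewrites every row
-- and holds an exclusive lock on the table while it runs, which is why it
-- takes a long time on large archives.
ALTER TABLE argo_archived_workflows
  ALTER COLUMN workflow SET DATA TYPE jsonb USING workflow::jsonb;
```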

MasonM added a commit to MasonM/argo-workflows that referenced this issue Oct 17, 2024
…j#13601

As explained in
argoproj#13601 (comment),
I believe argoproj#12912
introduced a performance regression when listing workflows for
PostgreSQL users. Reverting that PR could re-introduce the memory issues
mentioned in the PR description, so instead this mitigates the impact by
converting the `workflow` column to be of type `jsonb`.

Initially `workflow` was of type `text`, and was migrated to `json` in
argoproj#2152. I'm not sure why
`jsonb` wasn't chosen, but [based on this comment in the linked
issue](argoproj#2133 (comment)),
I think it was simply an oversight. Here's the relevant docs
(https://www.postgresql.org/docs/current/datatype-json.html):

> The json and jsonb data types accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.
>
> Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.
>
> In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys.
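
For illustration, a minimal sketch (plain SQL against any recent PostgreSQL) of the normalization difference the docs describe:

```
-- json keeps the input text verbatim; jsonb normalizes it:
-- whitespace is dropped, keys are reordered, and only the last duplicate key is kept.
SELECT '{"b": 1, "a": 2, "a": 3}'::json  AS as_json,   -- {"b": 1, "a": 2, "a": 3}
       '{"b": 1, "a": 2, "a": 3}'::jsonb AS as_jsonb;  -- {"a": 3, "b": 1}
```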

I'm pretty sure we don't care about key order or whitespace. We do care
somewhat about insertion speed, but archived workflows are read much
more frequently than written, so a slight reduction in write speed that
gives a large improvement in read speed is a good tradeoff.

Here's steps to test this:
1. Use argoproj#13715 to generate 100,000 randomized workflows, with https://gist.github.com/MasonM/52932ff6644c3c0ccea9e847780bfd90 as a template:

     ```
     $ time go run ./hack/db fake-archived-workflows  --template "@very-large-workflow.yaml" --rows 100000
     Using seed 1935828722624432788
     Clusters: [default]
     Namespaces: [argo]
     Inserted 100000 rows

     real    18m35.316s
     user    3m2.447s
     sys     0m44.972s
     ```
2. Run the benchmarks using argoproj#13767:
    ```
    make BenchmarkWorkflowArchive > postgres_before_10000_workflows.txt
    ```
3. Run the migration using the DB CLI:
    ```
    $ time go run ./hack/db migrate
    INFO[0000] Migrating database schema                     clusterName=default dbType=postgres
    INFO[0000] applying database change                      change="alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb" changeSchemaVersion=60
    2024/10/17 18:07:42     Session ID:     00001
            Query:          alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb
            Stack:
                    fmt.(*pp).handleMethods@/usr/local/go/src/fmt/print.go:673
                    fmt.(*pp).printArg@/usr/local/go/src/fmt/print.go:756
                    fmt.(*pp).doPrint@/usr/local/go/src/fmt/print.go:1208
                    fmt.Append@/usr/local/go/src/fmt/print.go:289
                    log.(*Logger).Print.func1@/usr/local/go/src/log/log.go:261
                    log.(*Logger).output@/usr/local/go/src/log/log.go:238
                    log.(*Logger).Print@/usr/local/go/src/log/log.go:260
                    github.com/argoproj/argo-workflows/v3/persist/sqldb.ansiSQLChange.apply@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/ansi_sql_change.go:11
                    github.com/argoproj/argo-workflows/v3/persist/sqldb.migrate.applyChange.func1@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/migrate.go:295
                    github.com/argoproj/argo-workflows/v3/persist/sqldb.migrate.applyChange@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/migrate.go:284
                    github.com/argoproj/argo-workflows/v3/persist/sqldb.migrate.Exec@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/migrate.go:273
                    main.NewMigrateCommand.func1@/home/vscode/go/src/github.com/argoproj/argo-workflows/hack/db/main.go:50
                    github.com/spf13/cobra.(*Command).execute@/home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:985
                    github.com/spf13/cobra.(*Command).ExecuteC@/home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:1117
                    github.com/spf13/cobra.(*Command).Execute@/home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:1041
                    main.main@/home/vscode/go/src/github.com/argoproj/argo-workflows/hack/db/main.go:39
                    runtime.main@/usr/local/go/src/runtime/proc.go:272
                    runtime.goexit@/usr/local/go/src/runtime/asm_amd64.s:1700
            Rows affected:  0
            Error:          upper: slow query
            Time taken:     69.12755s
            Context:        context.Background

    real    1m10.087s
    user    0m1.541s
    sys     0m0.410s
    ```
4. Re-run the benchmarks:
    ```
    make BenchmarkWorkflowArchive > postgres_after_10000_workflows.txt
    ```
5. Compare results using [benchstat](https://pkg.go.dev/golang.org/x/perf/cmd/benchstat):
    ```
    $ benchstat postgres_before_10000_workflows3.txt postgres_after_10000_workflows2.txt
    goos: linux
    goarch: amd64
    pkg: github.com/argoproj/argo-workflows/v3/test/e2e
    cpu: 12th Gen Intel(R) Core(TM) i5-12400
                                                         │ postgres_before_10000_workflows3.txt │  postgres_after_10000_workflows2.txt  │
                                                         │                sec/op                │    sec/op     vs base                 │
    WorkflowArchive/ListWorkflows-12                                              183.83m ± ∞ ¹   24.69m ± ∞ ¹        ~ (p=1.000 n=1) ²
    WorkflowArchive/ListWorkflows_with_label_selector-12                          192.71m ± ∞ ¹   25.87m ± ∞ ¹        ~ (p=1.000 n=1) ²
    WorkflowArchive/CountWorkflows-12                                              13.04m ± ∞ ¹   11.75m ± ∞ ¹        ~ (p=1.000 n=1) ²
    geomean                                                                        77.31m         19.58m        -74.68%
    ¹ need >= 6 samples for confidence interval at level 0.95
    ² need >= 4 samples to detect a difference at alpha level 0.05

                                                         │ postgres_before_10000_workflows3.txt │  postgres_after_10000_workflows2.txt  │
                                                         │                 B/op                 │     B/op       vs base                │
    WorkflowArchive/ListWorkflows-12                                              497.2Ki ± ∞ ¹   497.5Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
    WorkflowArchive/ListWorkflows_with_label_selector-12                          503.1Ki ± ∞ ¹   503.9Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
    WorkflowArchive/CountWorkflows-12                                             8.972Ki ± ∞ ¹   8.899Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
    geomean                                                                       130.9Ki         130.7Ki        -0.20%
    ¹ need >= 6 samples for confidence interval at level 0.95
    ² need >= 4 samples to detect a difference at alpha level 0.05

                                                         │ postgres_before_10000_workflows3.txt │ postgres_after_10000_workflows2.txt  │
                                                         │              allocs/op               │  allocs/op    vs base                │
    WorkflowArchive/ListWorkflows-12                                               8.373k ± ∞ ¹   8.370k ± ∞ ¹       ~ (p=1.000 n=1) ²
    WorkflowArchive/ListWorkflows_with_label_selector-12                           8.410k ± ∞ ¹   8.406k ± ∞ ¹       ~ (p=1.000 n=1) ²
    WorkflowArchive/CountWorkflows-12                                               212.0 ± ∞ ¹    212.0 ± ∞ ¹       ~ (p=1.000 n=1) ³
    geomean                                                                        2.462k         2.462k        -0.03%
    ¹ need >= 6 samples for confidence interval at level 0.95
    ² need >= 4 samples to detect a difference at alpha level 0.05
    ³ all samples are equal
    ```

Signed-off-by: Mason Malone <[email protected]>
@kodieg

kodieg commented Oct 18, 2024

One option I found on stack overflow: https://stackoverflow.com/questions/65820380/improving-performance-of-postgres-jsonb-queries-combined-with-relational-queries

I did quick tests of a few variations of these queries:

EXPLAIN ANALYZE
WITH tab as (SELECT name,
       namespace,
       UID,
       phase,
       startedat,
       finishedat,
       (workflow::jsonb)->'metadata' as metadata,
       (workflow::jsonb)->'status' as status,
       (workflow::jsonb)->'spec' as spec
FROM "argo_archived_workflows"
LIMIT 1000
)
SELECT namespace, UID, phase, startedat, finishedat,
       coalesce(metadata->>'labels', '{}') AS labels,
       coalesce(metadata->>'annotations', '{}') AS annotations,
       coalesce(status->>'progress', '') AS progress,
       coalesce(metadata->>'creationTimestamp', '') AS creationtimestamp,
       spec->>'suspend' AS suspend,
       coalesce(status->>'message', '') AS message,
       coalesce(status->>'estimatedDuration', '0') AS estimatedduration,
       coalesce(status->>'resourcesDuration', '{}') AS resourcesduration
FROM tab

and

EXPLAIN ANALYZE
WITH tab as (SELECT name,
       namespace,
       UID,
       phase,
       startedat,
       finishedat,
       (workflow::jsonb) as workflow
FROM "argo_archived_workflows"
LIMIT 1000
)
SELECT namespace, UID, phase, startedat, finishedat,
       coalesce((workflow)->'metadata'->>'labels', '{}') AS labels,
       coalesce((workflow)->'metadata'->>'annotations', '{}') AS annotations,
       coalesce((workflow)->'status'->>'progress', '') AS progress,
       coalesce((workflow)->'metadata'->>'creationTimestamp', '') AS creationtimestamp,
       (workflow)->'spec'->>'suspend' AS suspend,
       coalesce((workflow)->'status'->>'message', '') AS message,
       coalesce((workflow)->'status'->>'estimatedDuration', '0') AS estimatedduration,
       coalesce((workflow)->'status'->>'resourcesDuration', '{}') AS resourcesduration
FROM tab

The first one worked better for a jsonb column; the second helps mitigate the case where the column still has the json type.

@MasonM could you maybe add those queries to your benchmarks?

@MasonM
Contributor

MasonM commented Oct 19, 2024

@kodieg Thanks! The first query (with the CTE) is indeed a lot faster when I test it on PostgreSQL (~40% with json and ~60% with jsonb). I can enter a PR with more details once #13767 is merged (trying to juggle all these branches is getting to be a bit of a hassle).

@tooptoop4
Contributor

#9516 might help
