fix: compress node status when offloading to database. Fixes #13290 #13313

Open
imliuda wants to merge 3 commits into main from offload-compress


imliuda
Contributor

@imliuda imliuda commented Jul 6, 2024

Fixes #13290

Motivation

Reduce the size of the offloaded node status by compressing it before saving it to the database. This improves database performance and reduces slow queries. In our test cases, the compressed node status can be 1/8 the size of the uncompressed data.

Modifications

Add a new database column, "compressednodes", to argo_workflows. Node status is compressed before it is saved and decompressed after it is retrieved from the database. The change stays compatible with the old method of storing uncompressed nodes.
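
As an illustration only, the save/load flow described above can be sketched in Python. This is not the PR's code: the gzip plus base64 encoding, the function names, and the `row` dict with "nodes" and "compressednodes" keys are all assumptions made for the example.

```python
import base64
import gzip
import json


def pack_nodes(nodes: dict) -> str:
    """Serialize node status to JSON, gzip it, and base64-encode it for a text column."""
    raw = json.dumps(nodes, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def unpack_nodes(packed: str) -> dict:
    """Reverse pack_nodes: base64-decode, gunzip, then parse the JSON."""
    return json.loads(gzip.decompress(base64.b64decode(packed)).decode("utf-8"))


def load_nodes(row: dict) -> dict:
    """Prefer the compressed column; fall back to the old uncompressed column.

    `row` is a hypothetical dict standing in for a database record with the
    keys "nodes" (old format, JSON text) and "compressednodes" (new format).
    """
    if row.get("compressednodes"):
        return unpack_nodes(row["compressednodes"])
    return json.loads(row["nodes"])
```

Reading through a fallback like `load_nodes` is one way rows written by an older controller could stay readable after the migration, while new writes only fill the compressed column.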

Verification

I tested the SQL migration for both MySQL and Postgres: I ran some workflows using the old version, killed the controller, and then started the controller of the new version. I verified that the database schema had been changed and that new writes to the database are saved to the compressednodes column. I also tested the Argo UI; it can list workflows and get workflow details with no errors.

For compression of nodes when archiving, I tested the database schema migration, and listing and getting archives in the UI for both old and new workflows. I also tested with the argo CLI command-line tool.

@imliuda imliuda marked this pull request as ready for review July 6, 2024 16:36
@tooptoop4
Contributor

Does this fix only affect MySQL, or does Postgres benefit too?

@imliuda
Contributor Author

imliuda commented Jul 7, 2024

In theory, it affects Postgres too. The key is that the size can decrease a lot, so the performance of writing the database binlog also improves a lot.

@imliuda imliuda force-pushed the offload-compress branch 4 times, most recently from 96290a0 to b5c8645 Compare July 7, 2024 18:54
@imliuda imliuda changed the title fix: compress node status when offloading to database. Fixes #13290 fix: compress node status when offloading and archiving to database. Fixes #13290 Jul 8, 2024
@agilgur5
Member

agilgur5 commented Jul 9, 2024

In our test cases, the compressed node status can be 1/8 the size of the uncompressed data.

What was the speed difference observed? Did it make a significant reduction in query time?

@imliuda
Contributor Author

imliuda commented Jul 9, 2024

MySQL long_query_time is set to 1s. After compression, the slow queries disappeared. Before that, there were 13033 slow queries, with an average query time of 10.72s in our test. The following is our test workflow; we ran 1000 of them in parallel.

Test Workflow
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: argo-test-
spec:
  entrypoint: argo-test
  ttlStrategy:
    secondsAfterCompletion: 7200
    secondsAfterFailure: 7200
    secondsAfterSuccess: 7200
  podGC:
    strategy: OnPodCompletion
  arguments:
    parameters:
    - name: param
      value: |
        # all these lines are random generated
        # 80 lines, each 128 chars
        knDCHPuPJz3Pr8q8KkA6yMKm4eZ9NrW9Dkt1tP1Y5YpWALjBY0LaQMxePwFgJSafM9MXx42LkzFvw90PhMzCJaFzRaw9PkuLwdKNrQPyp7T1FtmL91t4mTzW6WNwZXKg
        FpvLmSJzzgyReu7ZmKXM1gSijaW39MFu3g4G37UBimSK4xquFxwFWCQdXiHvtiu6r0EDhBZ0qJSXWfedUg7R2yedZ1n2xg7SbnK5iA8JpS0PqevPLLQwP1m3NmEYEYUf
        NaGt1nwvuP5RPYGzwCwUBppC0bpTbjuqdiNi86HmEgPcwV6uxEjw60fwuF3TBaEYZjHK8iTgDjFBKba9QGGe3bvDpq54uRca1691Hni669d2P3e7jGnHWmmiWc9YAjNK
        wpb9DfKDq0Wm503LtAc5f7V3tVQ6SDPSv0dpMaSV7HDK6kDrdvYCAuJh8gqdzSHbNFY45HJFqDUk6zYJ7h9cZyiCSPJCCKSfQ6Vp5H6Gp9mvqnWHWpwDRJEXZLK53bhf
        GS6beY1KZVkk9A85VAhUikzW1hqmFcZdv8hzHjjbFtz1zWTBjgLmiSzVzKRGQ1aSzEV5YESWwzU65UBEk3YnHimP6iEM2q2VzZzzNUB2dH4MTwhDE6hXmeUqCdLT3a5B
        v864kqpUAZ8B5WcUGSAVbNWrwmJFjcitbB1idH5bNghM9edMY0gN7WNGWmW5hitfTkZqMzx7tWzExFm5VZD02rWAT9FLGyKzHCi9Duj1Z5My24GmGuVXvEZ0Vr2SMP4G
        hbEZfRKY8rjcJpifx33PAFKCtQ66qDnbcLmVp0CVkpEjKUu8TUWeZTxZTYrMGZGuktphEczMy8cBfdtPKei30cn0MkPVA3ph3Q5hAUrMeYFbthRYyRakcqVmezz2wBhB
        GAuiHSRNhi7fNyLqqxmm8cghrp8rf14BESzRSHSt0CduFnFpieZV5ZTTGrr8CLvfqzCcS7MFcrVeGNDya7T9N8etye80SWvNvuezvzUc24Fd5Cw0SRt14QbXpQq3Dkkq
        FXy7gb5MJerHQtChKjznFnuS56n7rBuek4ep0XX6wtGh7u1AT80RGzBxU88DhC2p5fPeS7BpRdTAz85xB0R3TqvcVcfEwjA1QKNQ5cQfQ49hMSPhxRuaNECeYifkxxhj
        C3z0ijkndYa64ANrjVRTMS4hx59tF8mSDrWKCSQw9kaEYUEeMAJpLzFuwE05ky7bJ4YUnP2UTe9dqpnGuSx2UBJ0FJXtrbxbML66q7z0znLMB2LKeHNh65SUwje1HG6f
        d9jB6whtnfNgUa3PeiZj97qvY7nMwx1XdDSfdVmWQfUmEF2kVviTJp9SZKpyUTFzhjN0U38vPBDjK67qfn82xChvQkwWd2S6axATYUGH9n0gB8FFJxSk93c3texmgDfM
        pGbM5BiAzvaeLWhwLv5YZvhYCTxWHFC812qMLHEKbTwJgyQcQ40NLrmkAckdXj4x1KhhtSqCWSkHABfR0JRYaT59VfPHF5nHxPRNQXbZMcQ6bCJ85yGR8Gd2kcgxp51R
        x28zSBnYwmegzFwtMhzmUqiUBJRQGm6UTVH4pZQu421YRf4xtryEqqKS5SVqEDX4mW7VjTSpkWGbMLqaQ4ZBA3ZtZL5q8RmSjXVXbgVfGnw0CMhQQyJLHN1DRi5168aH
        YtfaKSq3hedzi0C6D6EYDRFRTXDu0Jz6xnbtDPkPbN9rvKQQ4PnmhRrgKnNkfxXBZz2mND1VKz8qhc6n2mQ5rqwZgFEW7H3wFHQb6h5yLu0mgLwyX4VpCN70f64CSLmH
        USLJih8zpFrbZRxZzYYp7q2ZxxmuWZnzULveHPiwMu96jSASTpMQGtqfCbMfT0CD9PhLUU3VWLvNdWyVPteXmELiZ98iqMzTh0jexyGczqC9eAtUGNQQmB9kGfCcV4Hi
        66aQ4X1ArE6nxY5yP9WZQ4MVVrr9N4D65cLkB9qZkM2jFcreYiLD0YneVtiPciFaYbbq2ShG7GGKhr8jeVYKiJdkjtFJvvCfZ4xt2fWuQXSLhui5Xy7ePLwfhRAGkjRr
        5ccHed0DPeE5K2DTUZ32bQkFwNrncVLb6UrMkCJSQ2QwYUDHjr4HP8k6fQpmPedAnSEGVDrHBGXNX013nEx3UZ6Vn83ESckrXR4qcMhNWTVrzyHkF7WmySSVtbadF8Bp
        Qf0MSaWzZ3QCfviPZXEaEKSf0PHpX0SDuS1B7Jeb69KitZnqX5Ky1FhH3Gcy8CrdzjHCt6c5b9TUuS1FgxFeVLZMgArEEyChJXtKtTuKwShMjauNcNc5ZQSUEfZB5fW0
        w2ZvhAHy7x7ZVGHPM003x8CF8fgvyTJB0xaJSfgq9fZLKWHqq5cL0epwR7DWZakAHayi2GEkSzF9b1NNJFZRky2yE5WmiPc8XLZWNtWSHng8rNrCN179GaRMaduWF6NS
        cKATQMdWheUxAptj44Sg71jxhHCJfuKAzyZnxgJqFgq4wqL4bqREKNwLJ4NUeQhYp5DpgVzLS1GJccv7xgVv6vcDUzvYFcuWzw3K7d8qqLJzBB0Yad1yUBa38xTfSPjW
        H3BBnDm7MjeXxU9B3ckWn89GZDU0SkL6yY41h3HitUfBcn7fHhjLY7cy2reDZcfEcf9m9jMpLUqbqyCQei0KHNDWKSKwph5fiQrUPGWHaZExbCX0nyrDmqgzD6B6SRqM
        Xk7CvSMaMSa2C4Ke1Vzn3L3Hh6W8J9WAa5Dr46rTZammu7SxrtjyZQbPBh57pN7vbc93HXbZQCSzwwNLytPcuLDtmjXXTz3B7but89M6zYf7jE5vFkwQti6vaehNeBmp
        gx4DB2nfttazcRQ5p8bTp5izCmJgMTQdSmG3e24KfzyGmwrBa7TMqyWnmRFPGECHNCBJXMGC4yeQ3gH7tpBvjhC3Yi9THyk2VkYMq3x1eRcTJmVEhWfeUxES3DbuTk0q
        Yb6bJjzA4tk3FAA0eDGnHci9dUQ1VUmntqfp8L7dtYxbBbgxgN4MuGSKDdE95QdPqPjdDujb6yTWXVfHSF96QFYvMkm3BNdprJKSLkW8RNU2uVPSFNhwk5BmY3KrHk6j
        pZtQc531ADhzi00DcBB1p5SBhL1SaN57zSdXThNQehyEDijhRpiTwBPben55pGBnQaf8D4hYyNbebcFULjvxe9PSFBvrfLWNjcBYUmNptwyuKhcHRCgKRWJaDpdSGDyq
        FeZXDHp11n6t1h1kPrkGm8gnUycApvSXGYzvz3R1mydgAt76M98YTdRaJ9dxVwC9LWjw75G78G66ZLZaEJtP7THPdC466jVmfXqDS5Ba4p9wD8KfkicDp4zAqnRC1TeE
        kKxi6aTNyxmvPpGgJv3VVuMzEdVKfcWyKfT89Uci837TZUUtfcew6u2bpD9w6TtXC5xiwu6yu3Dq3HPYpYpXnbRHV62ZXUMrjc4nKyD46V4Eb6qxQtpPMXpkj3GFWW27
        G8nUKQgek6e32uDpwDrrh4RkBGF708K2TTuSbLu3myC7SiSQnrUWWAqx7HzUHgZYP4RbhBbe40jZxETdZqd7Huq374UWdDULdnW9FAYE8ViRWBSgPeje48LqJMh4qKJA
        P0kmdwEPkGrFA6xrzmbJBiQz4YkaA78AjQZygipfjiCLaHM792vAVtYGaqNweUwnBvwTwpub4vSeVqhmjtUi9D4VV7k801XZQ7GW7F7BBcKL9Lm7u1dxCwrymVGtXVzh
        JiTWyjZfQLbVRdZQdmnJ2EK5nEFaLjDPeRNQqhJq3QLhPCv2Lfg9ev7Q5xyQb5vd5iidK0W1B296y9MuhQc8p2LrDkCi0yRqHFG5uHaCau7XzNaj7U4Yd4yhe46Makbw
        k9pkP0ipMA55aZYkyn4K3Bm4p0PvrQZ1aWubXezBrcZqU5uvzL9PBvAtVqYB4kPXvLgykwqz9EdycVFMenW1CnFPFwJRXCxTSY8xmV8rvtE4kK1buD2J5BRFff2PiGR6
        5An0Yzwz1RAqm6tvQLkpK2fJSZuNxhUUbtfeb3J6UrdHyGwKUDNELqUrzCYiSpy80pb6WXTYPZjhZzDGetGt0JdANPdyN4jGpYiBZJqfRr6Afjqp7Tgmy5AbLN0PQ7ZR
        S0SrN7aZFMicUZSQhc8GyjnXpbSCMSWYjLvahxvhwtku1YX64M30JZcGfmNCzbKL63dPRu83d1Lyb9EDCmdNTiWWycdZ1NKDGF35atS3ynXQ9BKjAf9rtLgXKCSPSi39
        HnG3wC6v01V0AdjXc8HBWufxSC6j7wnrMNF1RFe0tF7va5440Sx1kFrYkiW4kW1guHXqUph6gpqybQvyjPN8C7AfQ32ShxWQbKVbcmg8BS6vjveX2kHr3XZvig5iAM8B
        bdGt4rSVpCBJYN8idb5TSiAQ7qMnyPVp0eepuw5afCTXabdgWLdTBtBR3EUjTYTmDVE2tdpLfpygxLYxqkemCZig6FEr1myArdNFwEqE7CbUVgCKYdnf0XbxdUVw6t6U
        4NzQA70ccDtd4trz5WyBKa2qWkLQ7kPxMSx1DqCwrijVxL1rYVctCqU15Mm9ByM7FmfHTPp5EbSBgMQzK8fMiTGTvfr8btuEKp8gfrHeuR8U3LStK7TFmqQ9dHaL2Q0Y
        wiJ4uBFwpGV8BiUEDTv1EYP6ygmKbMGLi6pbb5JC7tBc7m20Z56WZGy6nTG4xNjqAVHF9QwEdzR3B0KFyB41rTu6D7HDAcX5gJcKeaEyQZjDU4Wd3nQcXMZDLDTTGS1X
        iWChjpSxUG5LMnTUQZrFwaFqyQGpWQpfuSHWk18TTwRnQ9G4WCY1v15LbgABDV972ZQj3LhVyeUxxDC9yS4fJSrb0h12BNkG7nydVKgNYZAgcpBXXffbSiX1MW38ccxv
        d23RD6KpMW8YVxdhQNqqHSy2nEGfMrAm8v78CVxgkqhuuPYMpkdDGMK2zgm8xczWdbcaTfN57kiTb9fG9Td9fYpPKC6HpRwg0ZUWeaSwVJxdmSvEfPSNY66WNmKVHBDp
        gCnSRRvtb0q7MLJLzex2QtBpMAUEzHWjmWamNuJjZQ47g5Fc8ZZVPwxc10zUcw3q1ay1zVFPAXFM5G8NLqz2vuQXqzT0uXhx1HkN9CJCkxpit0VE6epdkRq6xMzQAVta
        HhqH3513N62524PceKrjCRatDWvmb7m4kt6tkytWRA7kT3TmWXwvCtRENKvu7qc0WJXGmNMxeUdqE7rbvB9Ud86HrweUy83SdZE0VAravfykP6SDdupAGGpid4Sre5Wv
        iQcjvR5ZHAgH9gLVE1z31maPEBuL7SaSfAmkZdV3P4WzuEJTZLJ7TJSG3rPgQiF9z9W99KEih7uxAiUUQaSdgU1gGYtgr28cUKcr4VMrwK6iT063MXkxYQivQdWPjrPm
        LfN8thTa7BFznPiuxDNaG3N55XzUYCiQJ9ewxDn5aUP4bccxED9Qdb8BNRwWE04EuLPKpHh0uUJ3U7ChWpK6L8BQpTCZ4JDyWxkuD90rh4BeECX0x7jPYmvMpMN0GQ9k
        kaArKANnCgxw82kf3TiVctG2pA8D39XVamcLAYfZhcYfF9YD2Jik7x1d9cB5KNNgma9tLihgKZHyGVmuGUM3XTWMP8n956kptCxra7S18BAJEH7xTLBeq49eBiDJuwtd
        TVhgvNMpevcbVTJzZchrA8f6xkuCQpPt1jf2Y1ic0zw24w5f1K9dBLJzVaCPX3572HeVnYVyWzegL2t15R9CX10dQpHtKf4VBbaPUJz0Q9yJf5e1hzgwxYrET6abtKea
        cMumFzm9pe7ad50FQUaV0mNDtBKLNhea3BhdRPk7rhGW6vMHg23Q6jp8JYYHg3S2WE7zBXpK9RT05Jqewvbi7CdQuEk5Hd3QmL7N0iCeKJaUCWe133qCWMx6PdkJKSAE
        Gk7XtRr9McjFHMpYG4Bbzcgmrrtzb96ki75TQu4t5EBDZYypzaLaVx77HbR4hEXtW2Q2wJmDR8uNY63xxzSBPZf3be4m9aBnuMx34gS4TJfFdKFrPBYq0atpAgzWLixj
        wFyQNij91CbCh1FHDBLkr8NNHGQ7HGHmZRpVmrcWRLvZ1bSvCrt8jHSV4bC4tZvTVkNjeuueq2KMuHjMnR6zdUJZx3VkEMnkCGxRzR8hQ7WG9ZmDhm38TRkk24JArAnM
        FttWCUbzWUx5b82XvPgBg9La9fBqgkGrTuC9E18qS2qLvHhTpnQF0cGPRZSLGr6e7vcbXcGxZNNR15rffwfWVipb2XpLJhdLGng4ykdM967wR3jzyh3361KUzGiBzrqt
        6bLbhRipTuPA0DRYmP1jBCAXP0ckWXtWhy2eLbUw8xPP5td7LzdvbkjZDvMXExnduLKJUcJXhdiKCjVPYJDHEdZ2xWaRqS1PmQqmVETumHQixVHvzqCYCx6CEgzkTRdn
        Ud1SBRCurt4xQrfbprDE0Vt0MiqfBWwUaCihjDvybfbA74YrZZXeVJkWV0x0B7nZ3yarSdcJVnw4xydPdW1YQe0e8m6krEncnJAKiMYZXe3PvDPjZGfzVVPrdYhJ9qhc
        WBYxyjjLrx4hjQ7BLtTx1BzAA5wYk6zN2ZeJxuWk65CUErmUfAXWgapXPXVVLFWA1nyWR2cnCXfhYXmt8cyFcJGULn4cbLTSRgF7QyQweYJmhB7w9RXKRKBdhnyuYeNZ
        a8PykBBY8Xzj60rpMqwkz4GLhnbiwUgM3qW1Gmu8dzPHbQPiEJZQGmnRAiHCKVHb74WB8W0kiMvWUaajfigreknBAdvfUYFinDyJXH58NPEygR14QXGY3VY3Xy9UMCwc
        b1yBM7qe9YmRAQnXMjgv707qxaw0JM2cS22wqiHAywejbpHTnVTUhHXnJvFiLxXuVqSAt9KiSE78NNFp0aAtUHxrAw0XMfPxek7SvazJctA3H07uC0HKV5Q8YN7DCdb5
        Z3NB4XHRjtHUMMj8FDUAfSz6i5vBWhVLBz2raSVUQ6fe2bGTUazNuu8ED5SVBKTfk9vZ2urbWT2k6GTBfinF4KuYdwcYrpH3vJCLKwKFbKrnBRJ4DUkZwZuUz0ggcNRb
        UzHV6CSCyZdjB7F08TtfPDRxg44St7dq6SHjWTUEeiYFS8eDtARA0bTpwmQG4KwcxBnDXVNNWN9HbUa8AEBheXNRMF9CLJmSV05DQFShHwKUr4VTDUBmravwycFEjgC5
        nd8FcrxG5fUr8G4QnTiXiPh58nKkrSN7JTXUSY7VBpPgwc381digLd2i0PxatW2E8PDPpcmkzfmj2zFUCND5k3rrZKF9hrV9hq3uLEttKPg9ECZTQD4MfuKcQtV6cyw5
        4iQcWxaVqSPripUp7xxyc5yq1KaQ0YiQmPBR3GvY1Tr7uBiL5xqftnba5EekvpTZrBr7S4teWJE72v7NFjJgAb1LvXNmEHA91JzDx4ZWBPhaE6nfg59wfuh4FXKPQ703
        m3nXhJJdEZueKTh6TX97bfDLQ9qPtwuR83vLMwaEGpYkSJAbVdDhLSWQPt652WFPAt8cVKQazanmvQL0yLFNzA0QAeYCwxdjSN4aifJqLrWkZJbTEhdUj7DEw0GS0VKt
        UyvgYMc2tL7n5WUGbm6Gh06n9LZNQawZhYRDkwYy5aqjvB91xRrBnTDGJ0FZV0r9rTxd5MRy04wiKetpNAJrhuZ0YPwkKL5EUGpzL74MfS2kgityBaJv4vjFvdpM1mJb
        9uvjuNUaHkZQTiaK69dQ1xGzNyp1HP8J5C6YhHCuPHP9yGe9rhUE30miErZMch0bYrCR0DyhGJfrqb3RLYZw81TGHqBuQKA9fje9u8tTPSkDrB6MWVrVTbe20JF6FLGY
        eJ2by1VefH5LL9Kbz2M4eYZcD1aFy6ic8JB3ECFRFxZZV1Bcy56xdXa9ACLrDZ1yLvFgFtnXzF3iEzNFG9jDfKjjta3vtZk19zkA7vekSBrA748YD0Qeec5788ckmxM2
        giw5T54kPz1qxL251B8dDLXCNcBzXkCAPW7ebyf659aT1KgR898gbSvnbrxx4jmCCDwYHHV5FK1FirdP2G1CzHCBLVehmH5m4j2pd8wdW8hpdbWqHfSjXM8v0dzZQHwB
        LiaiaMUuU7Ct9VyBvUcGQ0HSnfXRrZbYurRnSqfq7SWdyz7vfhwM06bWxJyhENC6R6wGLxVHu2bKDGwekFb7jp55QhNBw3udWq7MKBX6gxcA43PXdViKYHXYmWEM9ueS
        LzwhrdWEKd6Jcxg9tFjLxhfNG0LBjDVtXZ9pKQQjHeZjgxbB1kVWy3WSfUt87G25Fpx7EwiqpRfW9HUARGC7cfQaAhvm75ayMQ7wBLL0waQRa80JmjFdzJCd88jpkx9R
        tiyAM2rSxirGHJwgeNKN6AynLjrQ3nXxNgw0QKMvpcZ6Qci4nWaTdFpBzKnLgNVike4p3B5ufwtDFzmEYtKCWgBj01xzzLAjSgu6k6JDuJY5JT0F4mvCkQPLq91iEXeC
        M9Em6TBu2hL29XMHYwPKhVumdByW1EPMgBcaQ0ffjtStk1Kggpr94GrefCZPS4xLZRmrWKnH6NtiFA4mC3DtwC7V5wQBSBZmtrCQa7fCiTxJinuNJpxJFvSW2Y2U6UPn
        MN7i57UQpXgPEnTFnxvjj6kehXee9k1nfQBDF71CwwjS9eaCVDXWzSjFkGXC2DwMBUjXH7traneD4YZQ47f9WBCxnwBhecedgLanCpXfaNik7SZPQvDXx0gCBgf06zYH
        pAJVDcRemJTKHge5SK4MjCmTpjamuL1Ajvib0XYrQNH5MSE6SUP1dYbgk9T3neP1K0NgmDcivFAk0q5EDTbtwFWYVjSngt03gyyRTXFb5uD6ZfW5qvSgzdaK7MMSCy4k
        BHU7p9jEM9iGfkhAxdA8X7rc3rjkRSLUyFpvZCebrPKWJfYpAR5zuH39DNz4PDfw4CKH5XQua7zZHWc8cp69wu0DFMk7NzdP9wQgJby3BrZWyx53NvkzBhPyzUYZ40RQ
        Ym8z4x0qnJUFN4AJ37UaRYX8kJYM2vRbQZWKRtAmKrumYZhAt6Cr5y9KxNqf5VKyZ8fyyuqt4ZF7imM78PYHwAHCaBF9EwbNqUBxH70yvG0qd1RADZYB4bm1DX7yzRnx
        VVBnh269t2zTS50GVGLPyat62Y6iXnVrgk424RcUQnKfU3EVN0amC54RCZTBMeEYRSzBkHLciQGwwtUeKcFavPd4wqRfxh32zLuq7dReSeAxiCH3vSBnuJVpk23xBm8g
        g87ie7rJdi11EXnVJaMdPZN8aVVkGq7191gF8DvRr1h6VQAzdy8fjuxZRPQ4VqE3jT1ZeXEBN3SSRmNB8ZXqnAaPSqJBTAV1ATqnFT60fGjXYLgLqh250zmJJJMHdkJf
        9v48h93xwwaaFCQLLQcZStCRxfavkMt8HaySjYDnzwPEmcPKGvPigiH94CEtrAKBbNzy6JFHyMQ6WX87LEwb7S4JBA7UuUfmdMZqwaVJHAhfHrdiFgPCyBXD2bn7nbaQ
        UwKP6cmyAYHyKJNqYhcPMHfQFvBUZqmKTPtCaj4KHYZW6hKXg45D8AQfmb51S25aT6LH6cJ24RUMvMSM5XtML22c3bkYQ3NRr2tmBKTx4aRaH9QnF7NMmwXFyB6uxdCJ
        6HjuH8kLpTSaYGnNUKuYYq6hzc9EMWzHpJBrhANtJiw3T9q16923hR72G9kvtJv7U9ZvB4C09eZ7K08YqkJNSf1JA1j1n6w6EZbZGiGjpSLkXbMcwMbSxEU0c2W2XC7K
        zBcaLLA712uRfit2Yhn3yWd298hkW9wR1xS7mNYrpY7AXGt356ixH8ERHxq9LrYLpizEUFCZhRyAd5Gfn9uc6Pf0bcMU9CC8mLnTUWPcYayy9wuPhaH2xrVmUpC6MawS
        xLPGU8brYfUKDgh4W3nAdBP2DpiztJP29G5vzUyx8yjBwNrVZPu9rMziNintG63yx7TdeE66pdK9uebLpMEM26RfzUavRgGAdU5tZ5tuPdQii23qLu8rH7GX78REngyT
        4pA1bNxn3H7WhBvB768gRkxMdujhgdrP90EQWxtmyXhArn6aE8ANH8Fb2eNgGCfgjF2d7eu7WzRnvaw8dU0WT2e0yUHfEFpCV3YnZ9LXT2SRUH3fnCKJMrhYJzUn1VJS
        y4ePgaqAn0JSwQz1EWExrYHXZKZgqYYZjgimTZ6uW0SZ65CjiLUS02SJj92v3qGKFUET2RtTNB63bLB2zhHh3Tk1aLtAkLbCRnkacCUaB9JLMkdYt7W7cpc4w9LQTvQU
  templates:
  - name: argo-test
    steps:
    - - name: do-work1
        template: do-work
        withParam: "[1,2,3,4,5,6,7,8,9,10]"
    - - name: do-work2
        template: do-work
        withParam: "[1,2,3,4,5,6,7,8,9,10]"
    - - name: do-work3
        template: do-work
        withParam: "[1,2,3,4,5,6,7,8,9,10]"
    - - name: do-work4
        template: do-work
        withParam: "[1,2,3,4,5,6,7,8,9,10]"
    - - name: noop1-1
        template: noop
      - name: noop1-2
        template: noop
      - name: noop1-3
        template: noop
    - - name: noop2-1
        template: noop
      - name: noop2-2
        template: noop
      - name: noop2-3
        template: noop
    - - name: noop3-1
        template: noop
      - name: noop3-2
        template: noop
      - name: noop3-3
        template: noop
    - - name: noop4-1
        template: noop
      - name: noop4-2
        template: noop
      - name: noop4-3
        template: noop
    - - name: noop5-1
        template: noop
      - name: noop5-2
        template: noop
      - name: noop5-3
        template: noop
    - - name: noop6-1
        template: noop
      - name: noop6-2
        template: noop
      - name: noop6-3
        template: noop
    - - name: noop7-1
        template: noop
      - name: noop7-2
        template: noop
      - name: noop7-3
        template: noop
    - - name: noop8-1
        template: noop
      - name: noop8-2
        template: noop
      - name: noop8-3
        template: noop
    - - name: noop9-1
        template: noop
      - name: noop9-2
        template: noop
      - name: noop9-3
        template: noop
    - - name: noop10-1
        template: noop
      - name: noop10-2
        template: noop
      - name: noop10-3
        template: noop
    - - name: noop11-1
        template: noop
      - name: noop11-2
        template: noop
      - name: noop11-3
        template: noop
    - - name: noop12-1
        template: noop
      - name: noop12-2
        template: noop
      - name: noop12-3
        template: noop
    - - name: noop13-1
        template: noop
      - name: noop13-2
        template: noop
      - name: noop13-3
        template: noop
    - - name: noop14-1
        template: noop
      - name: noop14-2
        template: noop
      - name: noop14-3
        template: noop
    - - name: noop15-1
        template: noop
      - name: noop15-2
        template: noop
      - name: noop15-3
        template: noop
    - - name: noop16-1
        template: noop
      - name: noop16-2
        template: noop
      - name: noop16-3
        template: noop
    - - name: noop17-1
        template: noop
      - name: noop17-2
        template: noop
      - name: noop17-3
        template: noop
    - - name: noop18-1
        template: noop
      - name: noop18-2
        template: noop
      - name: noop18-3
        template: noop
    - - name: noop19-1
        template: noop
      - name: noop19-2
        template: noop
      - name: noop19-3
        template: noop
    - - name: noop20-1
        template: noop
      - name: noop20-2
        template: noop
      - name: noop20-3
        template: noop
    - - name: noop21-1
        template: noop
      - name: noop21-2
        template: noop
      - name: noop21-3
        template: noop
    - - name: noop22-1
        template: noop
      - name: noop22-2
        template: noop
      - name: noop22-3
        template: noop
    - - name: noop23-1
        template: noop
      - name: noop23-2
        template: noop
      - name: noop23-3
        template: noop
    - - name: noop24-1
        template: noop
      - name: noop24-2
        template: noop
      - name: noop24-3
        template: noop
    - - name: noop25-1
        template: noop
      - name: noop25-2
        template: noop
      - name: noop25-3
        template: noop
    - - name: noop26-1
        template: noop
      - name: noop26-2
        template: noop
      - name: noop26-3
        template: noop
    - - name: noop27-1
        template: noop
      - name: noop27-2
        template: noop
      - name: noop27-3
        template: noop
    - - name: noop28-1
        template: noop
      - name: noop28-2
        template: noop
      - name: noop28-3
        template: noop
    - - name: noop29-1
        template: noop
      - name: noop29-2
        template: noop
      - name: noop29-3
        template: noop
    - - name: noop30-1
        template: noop
      - name: noop30-2
        template: noop
      - name: noop30-3
        template: noop
  - name: do-work
    inputs:
      parameters:
      - name: param
        value: "{{workflow.parameters.param}}"
    container:
      image: centos:7
      command: [sh, -c, 'sleep 1']
  - name: noop
    container:
      image: centos:7
      command: [sh, -c, 'sleep 1']

The following pictures compare the uncompressed and compressed results.

[Images: three pairs of screenshots comparing uncompressed and compressed results]

Note that the size decrease we reported was calculated from archived workflows.

SELECT
  length(JSON_EXTRACT(workflow, '$.status.nodes')) AS origin,
  length(TO_BASE64(COMPRESS(JSON_EXTRACT(workflow, '$.status.nodes')))) AS compressed
FROM argo_archived_workflows
WHERE finishedat > timestamp('2024-06-26')
  AND finishedat < timestamp('2024-06-27');
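
For readers without a MySQL instance at hand, the same measurement can be approximated in Python. This is a rough sketch under stated assumptions: the synthetic nodes below (40 nodes that each repeat one random 80-line parameter) are invented for illustration and are not the data from our test. MySQL's COMPRESS() stores a 4-byte uncompressed length followed by a zlib stream, which the helper mimics; TO_BASE64 in MySQL also inserts line breaks, which this sketch ignores.

```python
import base64
import json
import random
import string
import struct
import zlib
from typing import Tuple


def mysql_like_sizes(nodes_json: str) -> Tuple[int, int]:
    """Return (origin, compressed) byte lengths, approximating the SQL query."""
    raw = nodes_json.encode("utf-8")
    # Approximates COMPRESS(): 4-byte little-endian uncompressed length + zlib stream.
    packed = struct.pack("<I", len(raw)) + zlib.compress(raw)
    # Approximates TO_BASE64() without MySQL's line breaks.
    return len(raw), len(base64.b64encode(packed))


# Synthetic stand-in for status.nodes: many nodes repeating one random parameter.
rng = random.Random(0)
alphabet = string.ascii_letters + string.digits
param = "\n".join("".join(rng.choices(alphabet, k=128)) for _ in range(80))
nodes = {
    f"node-{i}": {
        "phase": "Succeeded",
        "inputs": {"parameters": [{"name": "param", "value": param}]},
    }
    for i in range(40)
}

origin, compressed = mysql_like_sizes(json.dumps(nodes))
print(f"origin={origin} compressed={compressed} ratio={compressed / origin:.3f}")
```

The random parameter itself compresses poorly, but because it is repeated in every node, the node status as a whole shrinks substantially. The exact ratio depends on the workflow shape, so the figure printed here should not be read as our measured result.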

@agilgur5
Member

agilgur5 commented Jul 9, 2024

Wow! Great repro, info, & metrics! I'm surprised it's that impactful since it's not a field that's filtered on. I guess the sheer size of the column was causing (order of magnitude) slowdowns, I imagine due to JSON parsing or something -- see #13295 which this might make no longer needed.

One additional question: how's API performance? There I would think the difference is less significant since it's more read heavy and decompression will take up time.

@imliuda
Contributor Author

imliuda commented Jul 9, 2024

It looks like both we and #13295 have database performance issues: they encountered slow selects on argo_archived_workflows, while we encountered slow inserts into argo_workflows. Adding indexes or something else (other than compression) might solve the slow selects on argo_archived_workflows, but our scenario is slow inserts, which I think are mainly caused by the network bandwidth of disk I/O.

image

image

By the way, in our environment, with millions of archived workflows, the database size can reach hundreds of GiB with uncompressed workflows.

One additional question: how's API performance? There I would think the difference is less significant since it's more read heavy and decompression will take up time.

We are using version 3.4.x, where workflows are separate from archived workflows, so I haven't looked at API server performance. We should consider API server performance if we compress archived workflows, right? In that case, I can remove the code for compressing archived workflows from this PR and open a separate PR for that issue if compression helps.

@imliuda imliuda changed the title fix: compress node status when offloading and archiving to database. Fixes #13290 fix: compress node status when offloading to database. Fixes #13290 Jul 10, 2024
@imliuda
Contributor Author

imliuda commented Jul 10, 2024

After looking at some related issues, I found this is a little complicated, so I rolled back the commits about archived workflows and the CRD generation logic. For now, this PR is only for offloading performance. @agilgur5 Can you review it again?

@imliuda imliuda force-pushed the offload-compress branch 2 times, most recently from c46b42c to 5eea6d0 Compare August 8, 2024 18:33
…ack and db stability, fix CI errors

Signed-off-by: 刘达 <[email protected]>
@imliuda
Contributor Author

imliuda commented Aug 9, 2024

I ran a stress test: the average offload rate is above 500 ops per second with MySQL 8.0, with no deadlocks.

[[email protected] root]# export OFFLOAD_NODE_STATUS_TTL=20m

./main -db-uri 'mysql:******@tcp(x.x.x.x:3306)/argoperftest' -parallel 2000 --archive-count 200000 --archive-clean-size 50000 -archive-workers 20 --duration=90m -node-status-size 20480 -archive-node-status-size 204800

......
time="2024-07-20T19:53:21+08:00" level=info msg="Create archives done"
time="2024-07-20T19:53:21+08:00" level=info msg="First started workflow time is 2024-01-03 09:40:00 +0000 UTC"
time="2024-07-20T19:53:21+08:00" level=info msg="Performing periodic GC" periodicity=5m0s
time="2024-07-20T19:56:21+08:00" level=info msg="Average rate is: 765.323467"
time="2024-07-20T19:58:21+08:00" level=info msg="Performing periodic workflow GC"
time="2024-07-20T19:58:29+08:00" level=info msg="Deleting old offloads that are not live" len_wfs=2588
......
time="2024-07-20T20:31:43+08:00" level=error msg="getdriver: bad connection"
[mysql] 2024/07/20 20:31:43 packets.go:37: read tcp 10.0.64.143:36094->10.0.1.54:3306: read: connection reset by peer
......
[mysql] 2024/07/20 21:40:53 packets.go:37: read tcp 10.0.64.143:44080->10.0.1.54:3306: read: connection reset by peer
time="2024-07-20T21:40:53+08:00" level=error msg="savedriver: bad connection"
[mysql] 2024/07/20 21:40:53 packets.go:37: read tcp 10.0.64.143:44090->10.0.1.5
......
time="2024-07-20T21:41:37+08:00" level=info msg="Deleted archived workflows" rowsAffected=78800
time="2024-07-20T21:41:37+08:00" level=info msg="Cleaning 50000 archives cost 1h45m16.140031751s"
time="2024-07-20T21:41:37+08:00" level=info msg="Average rate when cleaning archives is: 537.429186"

@imliuda
Contributor Author

imliuda commented Aug 10, 2024

@agilgur5 Can you review this PR again?

workflow/packer/packer.go (outdated, resolved)
workflow/packer/packer.go (resolved)
…ack and db stability, small fixes

Signed-off-by: 刘达 <[email protected]>
@imliuda
Contributor Author

imliuda commented Sep 2, 2024

Is anyone still working on this? 😃

@NullHypothesis

Bumping this issue. We've been having problems with workflow offloading because of this (e.g., our database crashes because of OOM) and judging by the issue tracker, we're not alone with this.

@agilgur5 agilgur5 added the area/offloading Node status offloading label Sep 24, 2024

Successfully merging this pull request may close these issues.

Improve mysql write performance and stability when offloading
5 participants