
Prod Release 06/02/24 #546

Merged
merged 25 commits into from
Feb 6, 2024

Commits on Oct 21, 2023

  1. fix: upgrade @next/font from 13.1.6 to 13.5.3

    Snyk has created this PR to upgrade @next/font from 13.1.6 to 13.5.3.
    
    See this project in Snyk:
    https://app.snyk.io/org/gabehamilton/project/f1490843-1830-4eb0-a957-99816aa5edcc?utm_source=github&utm_medium=referral&page=upgrade-pr
    snyk-bot committed Oct 21, 2023 (c71b7d1)
  2. fix: upgrade @types/node from 18.13.0 to 18.18.1

    Snyk has created this PR to upgrade @types/node from 18.13.0 to 18.18.1.
    
    See this project in Snyk:
    https://app.snyk.io/org/gabehamilton/project/f1490843-1830-4eb0-a957-99816aa5edcc?utm_source=github&utm_medium=referral&page=upgrade-pr
    snyk-bot committed Oct 21, 2023 (d4ff2ba)
  3. fix: upgrade @types/react from 18.0.28 to 18.2.23

    Snyk has created this PR to upgrade @types/react from 18.0.28 to 18.2.23.
    
    See this project in Snyk:
    https://app.snyk.io/org/gabehamilton/project/f1490843-1830-4eb0-a957-99816aa5edcc?utm_source=github&utm_medium=referral&page=upgrade-pr
    snyk-bot committed Oct 21, 2023 (d4c40e2)
  4. fix: upgrade eslint from 8.34.0 to 8.50.0

    Snyk has created this PR to upgrade eslint from 8.34.0 to 8.50.0.
    
    See this project in Snyk:
    https://app.snyk.io/org/gabehamilton/project/f1490843-1830-4eb0-a957-99816aa5edcc?utm_source=github&utm_medium=referral&page=upgrade-pr
    snyk-bot committed Oct 21, 2023 (64c06e5)

Commits on Oct 26, 2023

  1. fix: frontend/package.json to reduce vulnerabilities

    The following vulnerabilities are fixed with an upgrade:
    - https://snyk.io/vuln/SNYK-JS-NEXT-6032387
    snyk-bot committed Oct 26, 2023 (34cb33d)

Commits on Jan 24, 2024

  1. feat: Capture errors thrown within Coordinator (#515)

    Currently, errors thrown within Coordinator V2 will bubble up to
    `main()` and cause the entire application to exit. This PR captures
    those errors, handling them accordingly.
    
    Errors are handled in one of two ways:
    1. Exponential retry: data which is 'critical' to the control loop,
    i.e. the indexer registry and the executor/stream lists, will be
    retried continuously, blocking the control loop from further
    progress, as it would not make sense to continue without this
    information.
    2. Swallow: actions such as starting/stopping executors/streams will
    be logged and swallowed. This is preferable to exponential retry, as
    individual failures will not block the progress of the control loop,
    allowing other indexers to be acted on. Skipping is fine here, as
    the action will be retried in the next loop.
    
    I expect this behaviour to evolve over time as we learn more about
    the system; the important thing here is that Coordinator will not
    crash on errors.
    morgsmccauley authored Jan 24, 2024 (aac2273)
  2. feat: Only start indexers set within the allowlist (#518)

    This PR updates Coordinator V2 to read an `allowlist` from Redis, and
    only start indexers for accounts which have been added to that list.
    The `allowlist` is a single key in Redis, stored as stringified JSON, e.g.:
    ```json
    [{ "account_id": "morgs.near" }]
    ```
    
    I've used JSON as each entry will eventually contain more fields, such
    as an acknowledgement from Coordinator V1, and a flag to set when
    completing the migration.
    
    Additionally, I've added various logs across Coordinator V2.
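    For illustration, reading and deserialising that key might look
    something like the sketch below, assuming the `redis`, `serde` and
    `serde_json` crates; the key name and function are hypothetical, not
    the actual Coordinator code.
    ```rs
    use redis::AsyncCommands;
    use serde::Deserialize;

    #[derive(Deserialize, Debug)]
    struct AllowlistEntry {
        account_id: String,
    }

    // Hypothetical key name for the sketch.
    const ALLOWLIST_KEY: &str = "allowlist";

    async fn fetch_allowlist(
        connection: &mut redis::aio::MultiplexedConnection,
    ) -> anyhow::Result<Vec<AllowlistEntry>> {
        // The allowlist is a single key holding stringified JSON.
        let raw: String = connection.get(ALLOWLIST_KEY).await?;
        Ok(serde_json::from_str(&raw)?)
    }
    ```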
    morgsmccauley authored Jan 24, 2024 (acec01a)
  3. 32d6fa9

Commits on Jan 25, 2024

  1. feat: Support Deployment of Block Streamer (#516)

    Block Streamer needs a Dockerfile in order to build the image and be
    deployable either locally or in GCP. I've created one for the service
    and updated the compose file to set the correct env variables. This
    also serves as a record of which env variables need to be set during
    Terraform deployments to GCP.

    In addition, there's a small issue with running QueryApi locally: if
    the user runs Runner locally through `yarn start`, provisioning of
    resources through Hasura fails, because Runner populates the Postgres
    host as 'localhost' when it should be 'postgres', since the calls are
    processed inside the Hasura Docker container. This happens because
    PGHOST is both used by Runner and passed to Hasura. To fix this, I
    created a separate env variable, PGHOST_HASURA, which can be set to
    'postgres'. This way, Runner can use 'localhost' while passing
    'postgres' to Hasura's container.
    darunrs authored Jan 25, 2024 (a431a77)
  2. feat: Enable Block Streams Start from V1 Interruption (#517)

    Coordinator V2 currently defaults to starting block streams with the
    start_block_height value populated in the registry. In order for a
    smooth migration from V1 to V2, we instead want Coordinator V2 to create
    block streams starting where the current indexers left off.
    
    Coordinator V1 now writes a last_published_block value for all indexers
    it manages on each matching block.
    
    Coordinator V2's start_block_height handling needs refactoring: the
    existing flow defaults to using start_block_height as it is present
    in the registry, even after we have begun processing a stream from
    that block height. We need to refactor the flow so that the start
    block height is used only once, and otherwise use the last published
    block height from Redis, or a fallback if neither is present.
    
    The new flow to handle edge cases will be tracked in
    #521.
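    As a sketch of the intended precedence (hypothetical names, not the
    refactored code itself): prefer the last published block from Redis,
    use the registry's start_block_height only when starting fresh, and
    fall back otherwise.
    ```rs
    // `last_published_block` is written to Redis by Coordinator V1;
    // `start_block_height` comes from the registry and should apply only once.
    fn resolve_start_block(
        last_published_block: Option<u64>,
        start_block_height: Option<u64>,
        latest_block: u64, // stand-in fallback when neither value is present
    ) -> u64 {
        match (last_published_block, start_block_height) {
            // Continue from where the stream left off under V1.
            (Some(last_published), _) => last_published + 1,
            // Fresh stream: honour the configured start height once.
            (None, Some(start)) => start,
            // Fallback, e.g. start from the latest block.
            (None, None) => latest_block,
        }
    }
    ```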
    darunrs authored Jan 25, 2024 (920407a)
  3. 0b6c08a
  4. Merge pull request #314 from near/snyk-upgrade-4ba709d36fa2e38a5ac1105f6ae38549
    
    [Snyk] Upgrade @types/node from 18.13.0 to 18.18.1
    Kevin101Zhang authored Jan 25, 2024 (c5b1e41)
  5. f568bd0
  6. Merge pull request #313 from near/snyk-upgrade-c7dc22e6846ada68552465a4cbc363b0
    
    [Snyk] Upgrade @next/font from 13.1.6 to 13.5.3
    Kevin101Zhang authored Jan 25, 2024 (a23c51e)
  7. feat: Logging & Error updates (#526)

    - feat: Cap exponential retry to 30 secs
    - feat: Debug log stream/executor responses
    - feat: Add more logging to coordinator
    morgsmccauley authored Jan 25, 2024 (46e363c)

Commits on Jan 26, 2024

  1. b6fc249
  2. Merge pull request #315 from near/snyk-upgrade-a7fd032540f59a0e0521b8386f1e3465
    
    [Snyk] Upgrade @types/react from 18.0.28 to 18.2.23
    Kevin101Zhang authored Jan 26, 2024 (6116c53)
  3. Merge pull request #334 from near/snyk-fix-415cfdc1a838c227b64a7b254d253c56
    
    [Snyk] Security upgrade next from 13.1.6 to 13.5.0
    Kevin101Zhang authored Jan 26, 2024 (2317f36)
  4. 102409a
  5. Merge pull request #316 from near/snyk-upgrade-fdf2121163d14312e34f796d4e4ec0e4
    
    [Snyk] Upgrade eslint from 8.34.0 to 8.50.0
    Kevin101Zhang authored Jan 26, 2024 (70ea36d)

Commits on Jan 31, 2024

  1. fix: Prevent Coordinator from stopping V1 executors (#544)

    As V1 executors are returned via the Runner gRPC service, Coordinator
    V2 will attempt to synchronise them. These executors should be
    ignored completely within the synchronisation logic.
    morgsmccauley authored Jan 31, 2024 (593f305)

Commits on Feb 1, 2024

  1. fix: Resolve duplicate processing of messages in Runner (#545)

    It was observed that Runner would execute on the same block multiple
    times. This was verified again when @Kevin101Zhang saw that the
    components indexer, which he modified, incorrectly incremented the star
    counter.
    
    I noticed that some indexers, such as sweat_blockheight, had
    duplicate runs only very rarely. To validate whether Runner was the
    problem, I shut down Runner in dev and triggered some social feed
    interactions. I verified that each block appeared in Redis only once,
    indicating Coordinator was not the problem. When I started Runner
    again, the problem reappeared.
    
    After that, I worked on replicating the issue locally. I built up 3
    messages on a social feed indexer and modified Runner to write its
    executions to a file. Then I ran Runner, searched the file, and found
    that the 3 blocks appeared in sequence like so: block 1, block 2,
    block 3, block 1, block 2, block 3, and so on. This indicated that
    duplicate workers were not the problem, but rather that the same
    messages were being read into the block array again after they had
    all been read.
    
    With that, I found the problem. When Runner fills its array of S3
    promises, it reads a stream message and increments a stored message
    ID. Subsequent stream fetches request only messages after that ID.
    This is needed because stream messages can only be deleted after a
    message is SUCCESSFULLY processed, to avoid problems if Runner is
    reset.
    
    However, the code which handles the case where no messages are
    available in the stream reset that counting stream ID to '0', as a
    mechanism to ensure messages that were somehow skipped are definitely
    read. This resetting of the ID turned out to be the cause. To
    illustrate, here's a scenario:
    
    1. The stream has 3 messages in it. 
    2. Runner starts with 0 and then reads all 3 messages. The counting
    stream ID is the ID of the last read message + 1.
    3. The messages are added to the array as promises.
    4. The first message begins to process. 
    5. The producer loop sees that no messages are found anymore. 
    6. **The producer waits 100ms and then resets the ID.**
    7. The producer, while the first message was processing, fetches the
    same 3 messages again.
    8. This will repeat as long as no new messages appear, while messages
    are in the promise array but not processed.
    
    The fix is simple: remove the resetting of the stream ID. It is no
    longer necessary, as the problem of messages remaining in the stream
    was fixed back in December.
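    A minimal sketch of the corrected producer loop follows (hypothetical
    types and names; Runner itself is TypeScript, this just illustrates
    the logic): the last-seen stream ID is only ever advanced, never
    reset.
    ```rs
    use std::time::Duration;

    struct StreamMessage {
        id: String,
    }

    struct Stream;

    impl Stream {
        // Stand-in for a Redis XREAD-style fetch of messages after `last_id`.
        async fn fetch_after(&mut self, _last_id: &str) -> Vec<StreamMessage> {
            Vec::new()
        }
    }

    async fn producer_loop(stream: &mut Stream, queue: &mut Vec<StreamMessage>) {
        let mut last_id = String::from("0");
        loop {
            let messages = stream.fetch_after(&last_id).await;
            if messages.is_empty() {
                // The bug lived here: this branch used to reset `last_id`
                // to "0", so the same still-unprocessed messages were
                // fetched and enqueued again. The fix is to simply wait
                // and poll again with the same ID.
                tokio::time::sleep(Duration::from_millis(100)).await;
                continue;
            }
            for message in messages {
                // Advance past the last message seen; deletion from the
                // stream only happens after successful processing.
                last_id = message.id.clone();
                queue.push(message);
            }
        }
    }
    ```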
    darunrs authored Feb 1, 2024 (10e2963)

Commits on Feb 5, 2024

  1. fix: Executors would crash when DmlHandler.create times out (#547)

    In each invocation of runFunction on a block, a DmlHandler is
    created. It's responsible for making calls to the database. Part of
    setting it up involves fetching the credentials for the user's DB
    from Hasura and creating a PG client. This takes time, so the process
    was run as an unawaited async request. While other setup and some of
    the user code ran, the DmlHandler setup would complete in the
    background. The first context.db call would await its completion, and
    subsequent calls would have it ready.
    
    However, it was observed that when the call to Hasura for the DB
    credentials timed out, the error, instead of propagating into a try
    catch, would be treated by the JavaScript runtime as an unhandled
    promise rejection, which would terminate the worker thread and stop
    the indexer.
    
    To fix this problem, we need to move away from keeping
    DmlHandler.create as an unresolved promise across multiple contexts.
    The approach I've taken is to defer creating the Hasura call promise
    until the first call of context.db. This adds latency to the first
    context.db call, as it now must wait for the entire setup to
    complete. However, it also does not penalize indexers that don't use
    context.db, as their code no longer needs to connect to Hasura at
    all.
    
    Very soon, we will overhaul this logic by migrating the Hasura
    credentials call out of runFunctions, which eliminates the underlying
    problem, as no unresolved promises remain afterward. The focus here
    is therefore to address the bug, which is critical, without too much
    change, as the workflow will be refactored again soon anyway.
    
    
    I also fixed a small bug where context.db calls were logged under the
    wrong function name in the indexer logs table.
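    The fix pattern, sketched in Rust with tokio's `OnceCell` (Runner
    itself is TypeScript; the types and names here are illustrative):
    nothing is created, and no promise/future is left dangling, until the
    first `context.db` call.
    ```rs
    use tokio::sync::OnceCell;

    struct DmlHandler; // stand-in for the real handler

    struct Context {
        dml_handler: OnceCell<DmlHandler>,
    }

    impl Context {
        async fn db(&self) -> anyhow::Result<&DmlHandler> {
            // The first caller pays the setup cost (Hasura credentials +
            // PG client); later callers reuse the initialised handler.
            // Errors now surface inside the caller's error handling
            // instead of becoming an unhandled rejection that kills the
            // worker.
            self.dml_handler
                .get_or_try_init(|| async {
                    // ... fetch credentials from Hasura, create the PG client ...
                    Ok(DmlHandler)
                })
                .await
        }
    }
    ```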
    darunrs authored Feb 5, 2024 (2c28dc3)

Commits on Feb 6, 2024

  1. feat: Auto migrate indexers to Control Plane (#527)

    This PR builds on the Redis `allowlist` to auto-migrate indexers from
    the existing infrastructure to the Control Plane. An account
    migration requires coordination between the V1 and V2 architectures:
    the indexer must be removed/ignored from V1, and then correctly
    configured within V2.
    
    ## Allowlist shape
    Each `allowlist` entry now contains the following properties:
    ```rs
    pub struct AllowlistEntry {
        account_id: AccountId, // The account which should be migrated
        v1_ack: bool,          // True if Coordinator V1 has acknowledged the entry
        migrated: bool,        // True if the migration was successful
        failed: bool,          // True if the migration failed
    }
    ```
    
    ## Coordinator V1
    For Coordinator V1, the `allowlist` is effectively a denylist, so the
    code/types have been named as such. Accounts within the "denylist"
    should be ignored completely by V1. Because we cannot guarantee the
    timing of when this "ignore" actually happens, a flag (`v1_ack`) will
    be set by V1. V2 will only take over once this flag has been set.
    
    Accounts within the "denylist" will be filtered out of the in-memory
    registry. Any new indexer "registrations" will also be ignored.
    In-progress historical backfills haven't been considered as we'll
    disable this functionality anyway.
    
    ## Coordinator V2
    Once acknowledged by V1, Coordinator V2 will attempt to migrate all
    functions under the relevant account. The steps for migration are:
    1. Remove the streams from the Redis `streams` set - preventing Runner
    from starting these indexers implicitly
    2. Stop the existing V1 executors which have already been started via
    the `streams` set
    3. Merge the existing historical (if it exists) and real-time streams
    
    Once migrated, accounts which have `v1_ack && migrated && !failed` will
    be exposed to the control loop, prompting V2 to act on these indexers.
    
    ### `migrated` flag
    For now, the `migrated` flag will not be set on success, preventing V2
    from running the indexer on the new architecture. There are some issues
    around V2 continuing from the right block correctly, so it's best to not
    run them for now. This allows us to test the migration in isolation, not
    worrying about what V2 does after that. I'll add this logic back in once
    #536 is complete.
    
    ### `failed` flag
    If any part of the migration fails, the `failed` flag will be set for
    that account. Covering every edge case in code would take a
    significant amount of time, so it is faster to set this flag to
    ignore the account, fix the migration manually, and then reset the
    `failed` flag.
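    For illustration, the gate on the control loop could be sketched as
    below (the struct mirrors the one above, with `AccountId` simplified
    to `String`; the function names are hypothetical):
    ```rs
    struct AllowlistEntry {
        account_id: String,
        v1_ack: bool,
        migrated: bool,
        failed: bool,
    }

    fn is_synchronisable(entry: &AllowlistEntry) -> bool {
        // Only accounts acknowledged by V1 and successfully migrated are
        // exposed to the control loop.
        entry.v1_ack && entry.migrated && !entry.failed
    }

    fn accounts_to_synchronise(allowlist: &[AllowlistEntry]) -> Vec<&AllowlistEntry> {
        allowlist.iter().filter(|entry| is_synchronisable(entry)).collect()
    }
    ```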
    morgsmccauley authored Feb 6, 2024 (9ef1f29)
  2. fix: Crashed Runner Executors would continue to display RUNNING (#550)

    Runner sets the status of an indexer after a successful run, or after
    a failed run whose error was caught. However, if the executor itself
    crashed, the error would not be caught, as the worker is terminated.
    As a result, the status of the indexer would continue to display
    RUNNING, which was incorrect and misleading.

    I updated the status handling to ensure crashed workers are set to
    STOPPED. In addition, I added a new FAILING status, which is set when
    an indexer is still running but repeatedly failing on the same block.
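    Sketched as a state set (illustrative Rust; Runner's actual statuses
    live in its TypeScript codebase):
    ```rs
    // Illustrative status set; FAILING is the newly added state.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum IndexerStatus {
        Running,
        Failing, // still running, but repeatedly failing on the same block
        Stopped, // now also set when the worker itself crashes
    }
    ```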
    darunrs authored Feb 6, 2024 (fd64a8c)