
Scheduler Longhaul Tests #238

Open
wants to merge 9 commits into base: master

Conversation


@cicoyle commented Oct 21, 2024

1. Robust Jobs API test (all with TTL set; see the sketch below):
   • schedule 100 oneshot jobs indefinitely (repeat = 1)
   • schedule 100 indefinite jobs indefinitely (repeat not set, trigger every 30s)
   • schedule a repeat-job job indefinitely (repeat = 5, trigger every 1s, due immediately)
   • indefinitely schedule and delete a create-delete-job job (repeat = 1, trigger every 1s)
2. Fun Scheduler Actor Reminders test that simulates a player session where the player's health decays over time, and then simulates them getting a health pack or the like that periodically increases their health. Once the player's health reaches 0, the player dies, the reminders (health-decay reminder and health-increase reminder) are unregistered, and then the player is revived and the reminders are started again. Note the SchedulerReminders config is set.

3. Workflow app that repeatedly starts, pauses, and resumes workflows. It defines a TestWorkflow that interacts with activities, handles external events, and logs the workflow's progress through stages. The app continuously monitors and terminates workflows as necessary. Note the SchedulerReminders config is set.

I added instructions for how to run locally.
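
For reference, a minimal sketch of what scenario 1 schedules, assuming the go-sdk's alpha Jobs API as used by the app code in this PR (job names, due time, and TTL here are illustrative, not the exact values the app uses):

```go
package main

import (
	"context"
	"fmt"
	"log"

	dapr "github.com/dapr/go-sdk/client"
)

func main() {
	daprClient, err := dapr.NewClient()
	if err != nil {
		log.Fatalf("failed to create Dapr client: %v", err)
	}
	defer daprClient.Close()

	// Scenario 1: schedule 100 oneshot jobs (repeat = 1), all with a TTL set.
	for i := 0; i < 100; i++ {
		job := &dapr.Job{
			Name:    fmt.Sprintf("oneshot-%d", i), // illustrative name
			Repeats: 1,
			DueTime: "0s",
			TTL:     "40s",
		}
		if err := daprClient.ScheduleJobAlpha1(context.Background(), job); err != nil {
			log.Printf("failed to schedule job %d: %v", i, err)
		}
	}
}
```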

@cicoyle marked this pull request as ready for review October 30, 2024 15:05
@elena-kolevska (Contributor) left a comment:

Looks amazing @cicoyle! I ran the local multi-app run and it worked great. Before going into a more detailed review of the apps, I'll share a few notes so you can work on them in the meantime:

Also, please double-check whether we need to specify persistence for etcd, or whether we're using the default. I don't remember what we decided last time.

@elena-kolevska (Contributor) left a comment:

Submitting feedback for the scheduler jobs app. I'm checking the other two tomorrow.


To run locally as a process: `go run scheduler-jobs.go`, assuming you have a scheduler and dapr running accordingly (see below).

Run Scheduler:
elena-kolevska (Contributor):

Let's use the Dapr CLI in the instructions for how to run the sidecar and scheduler.

)

const appPort = ":3006"
const daprGRPCPort = "3501"
elena-kolevska (Contributor):

I saw the readme file specifying the gRPC port for the sidecar when running locally, but we'd need to do the same for the test run itself on k8s, otherwise the app would try to connect to a dead port.

I think it will be easier if we avoid overriding the default, which is whatever was set by the injector in the DAPR_GRPC_PORT env variable. The client should respect it by default if no argument is passed.
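
A minimal sketch of that default behavior, assuming the go-sdk client is used:

```go
package main

import (
	"log"

	dapr "github.com/dapr/go-sdk/client"
)

func main() {
	// NewClient reads the DAPR_GRPC_PORT env variable (set by the injector
	// on k8s) and falls back to the SDK default when it's unset, so no
	// explicit port needs to be passed here.
	daprClient, err := dapr.NewClient()
	if err != nil {
		log.Fatalf("failed to create Dapr client: %v", err)
	}
	defer daprClient.Close()
}
```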

@@ -0,0 +1,17 @@
# How-To Run Locally:
elena-kolevska (Contributor):

Please add a short paragraph summing up what the app does. You can just copy-paste what you already wrote in the PR description.

}
}(ctx)

// --dapr-grpc-port=3501
elena-kolevska (Contributor):

Suggested change: remove the line `// --dapr-grpc-port=3501`.

Comment on lines +221 to +222
log.Println("waiting a few seconds to let connections establish")
time.Sleep(5 * time.Second)
elena-kolevska (Contributor):

Does the SDK provide a way to know this for certain? If not, we should probably think about adding one.
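
If the go-sdk version used here exposes the client's Wait helper, something like this could replace the fixed sleep (a sketch, assuming daprClient is the app's go-sdk client):

```go
// Block until the gRPC connection to the sidecar is actually ready,
// instead of sleeping for an arbitrary 5 seconds.
if err := daprClient.Wait(ctx, 10*time.Second); err != nil {
	log.Fatalf("sidecar connection not ready: %v", err)
}
```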


// Schedule additional oneshot jobs once 100 are triggered
go func(ctx context.Context) {
select {
elena-kolevska (Contributor):

I think a for loop that reschedules the 100 jobs every time they finish would be a better fit for a longhaul test than only doing it twice.
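
A rough sketch of that loop, assuming a hypothetical allTriggered channel that the app's job event handler signals once all 100 jobs have fired (imports for context, fmt, log, and the go-sdk client are assumed):

```go
// rescheduleForever re-creates the batch of 100 oneshot jobs every time
// they all finish, rather than only twice. allTriggered is hypothetical:
// the job event handler would signal it after the 100th trigger.
func rescheduleForever(ctx context.Context, daprClient dapr.Client, allTriggered <-chan struct{}) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-allTriggered:
			for i := 0; i < 100; i++ {
				job := &dapr.Job{
					Name:    fmt.Sprintf("oneshot-%d", i), // hypothetical naming
					Repeats: 1,
					DueTime: "0s",
					TTL:     "40s",
				}
				if err := daprClient.ScheduleJobAlpha1(ctx, job); err != nil {
					log.Printf("failed to reschedule job %d: %v", i, err)
				}
			}
		}
	}
}
```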

elena-kolevska (Contributor):

Same for the other types.

![scheduler_debugger_config.png](scheduler_debugger_config.png)

Run Dapr sidecar:
![sidecar_debugger_config.png](sidecar_debugger_config.png)
elena-kolevska (Contributor):

Same as above, but update the app port to 3006.

}
}(ctx)

// Schedule job to trigger immediately every second for 1s
elena-kolevska (Contributor):

..."every second for 10s"?

TTL: "40s",
Data: nil,
}
err := daprClient.ScheduleJobAlpha1(context.Background(), req)
elena-kolevska (Contributor):

Can we add timeouts to the context of the schedule/delete calls, please? We should also check that the main context is still alive in the loop.
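
A sketch of both suggestions, assuming daprClient and the job value from the snippet above: a per-call timeout plus an early check that the main context is still alive.

```go
// scheduleWithTimeout bails out if the main context is already cancelled,
// then bounds the schedule call with its own timeout.
func scheduleWithTimeout(ctx context.Context, daprClient dapr.Client, job *dapr.Job) error {
	if err := ctx.Err(); err != nil {
		return err // main context is gone; stop scheduling
	}
	callCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()
	return daprClient.ScheduleJobAlpha1(callCtx, job)
}
```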

Run Dapr sidecar:
![sidecar_debugger_config.png](sidecar_debugger_config.png)

To run locally as a container:
elena-kolevska (Contributor):

Wouldn't we need additional networking config (host.docker.internal?) for an app running in a Docker container to reach the daprd and scheduler processes running locally?

@elena-kolevska (Contributor) left a comment:

Adding review for the scheduler-workflow app. Looking at actor reminders next.

WorkflowComponent: "",
})
if err != nil {
log.Fatalf("Failed to pause workflow: %v\n", err)
elena-kolevska (Contributor):

We're logging the error on StartWorkflow, but exiting on this one.
I can see the benefit of exiting the app and noticing the container getting restarted in the Grafana dashboard, but I feel like we're already testing this in our other test suites. I'd prefer the app to run for as long as possible, meaning we should log the errors but not exit.
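
A minimal sketch of the suggested change, assuming the pause call runs inside the app's main loop:

```go
if err != nil {
	// Log and keep the longhaul app running instead of exiting.
	log.Printf("Failed to pause workflow: %v\n", err)
	continue
}
```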


const appPort = ":8484"

var stage int
elena-kolevska (Contributor):

This is probably never going to be accessed in parallel, but just to keep it clean, since it's a global variable that can be accessed from different goroutines, let's make it an atomic int.
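
A minimal sketch of that change, assuming Go 1.19+ for sync/atomic's typed integers:

```go
package main

import "sync/atomic"

// stage may be touched from multiple goroutines, so use an atomic
// integer instead of a plain int.
var stage atomic.Int64

func advanceStage() int64 { return stage.Add(1) }
func currentStage() int64 { return stage.Load() }
```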
