Move App Systemd Tasks to Active Jobs #552
base: master
Conversation
This introduces substantial changes to email pulling.
* Email pulling is no longer managed with systemd and rake.
* The email pulling job no longer manages its own lifecycle. Since we are no longer using systemd, we lose its ability to recover from unexpected failures by restarting the task. On the other hand, we no longer need to manage an IMAP connection lifecycle or work around rate-limiting errors.
* IMAP failures are now logged as errors, since we do not expect rate-limiting or dropped-connection errors on a short-lived connection.
raise "No acceptable authentication mechanisms" | ||
end | ||
rescue Net::IMAP::NoResponseError, SocketError, Faraday::ConnectionFailed => error | ||
logger.error("Could not authenticate for #{config[:email]}, error: #{error.message}") |
Since we no longer anticipate rate-limits or connection drops, I changed this (and one below) to log an error. I figure that while it is an error, it is not likely to be one resulting from our app logic, but should it raise an error instead?
I think yes, it should raise an error so that the job is properly marked as failed. We can then set retry_on or discard_on depending on how it errored.
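For illustration, a minimal sketch of what that could look like, assuming the rescue is removed so the error propagates out of perform (the error list and retry settings here are assumptions, not final values):

class PullEmailJob < ApplicationJob
  queue_as :default

  # Transient network problems: let Active Job retry a few times with a delay.
  retry_on Net::IMAP::NoResponseError, SocketError, Faraday::ConnectionFailed,
           wait: 10.minutes, attempts: 3

  # A failure we would never want to retry could be discarded instead
  # (the error class here is purely illustrative):
  # discard_on SomeUnrecoverableConfigurationError

  def perform
    # ... authenticate and pull mail; if authentication fails, the raised error
    # marks this run as failed and the callbacks above decide what happens next.
  end
end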
This is great! The main thing it is missing is something to initially start these jobs and then a before_perform to keep them going.
Alternatively, we could use something like activejob-scheduler or sidekiq-scheduler. The Sidekiq one seems more popular, and the schedules aren't really needed for development (which would use the async backend instead of Sidekiq).
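A rough sketch of the self-perpetuating variant, assuming a 20-minute interval (a scheduler gem would replace the callback entirely):

class PullEmailJob < ApplicationJob
  queue_as :default

  # Enqueue the next run before doing any work, so a failure inside perform
  # does not break the chain. Note that a retry would trigger this callback
  # again, which is one of the self-scheduling pitfalls discussed further down.
  before_perform do |job|
    job.class.set(wait: 20.minutes).perform_later
  end

  def perform
    # ... pull email once ...
  end
end

# Something still has to enqueue the very first run, e.g. a one-off rake task,
# a console call, or a deploy hook:
# PullEmailJob.perform_later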
raise "No acceptable authentication mechanisms" | ||
end | ||
rescue Net::IMAP::NoResponseError, SocketError, Faraday::ConnectionFailed => error | ||
logger.error("Could not authenticate for #{config[:email]}, error: #{error.message}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think yes it should raise an error so that the job is properly marked as failed. We can then set retry_on
or discard_on
depending on how it errored.
waiting = Thread.start do
  sleep(20.minutes)

  imap.idle_done
end

imap.idle do |response|
  if response.respond_to?(:name) && response.name == 'EXISTS'
    waiting.kill
    imap.idle_done
  end
end
This is interesting--it seems that we force it to wait for 20 minutes, and then the imap.idle command will return when there are new inbox items. I'm wondering if we should have a separate task to spawn jobs with this method. We could do imap.idle(20*60) { |res| PullEmailJob.perform_later }, which would cause it to run every 20 minutes, or earlier if there is a notification. That job would have an after_perform to restart it.
https://ruby-doc.org/stdlib-2.5.3/libdoc/net/imap/rdoc/Net/IMAP.html#method-i-idle
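Very roughly, that separate watcher could look like the sketch below. The WatchInboxJob name, host, and credentials are placeholders; Net::IMAP#idle takes an optional timeout in seconds, so the block returns after 20 minutes or as soon as the server pushes a response:

class WatchInboxJob < ApplicationJob
  queue_as :default

  # Keep exactly one watcher idling by restarting it when it finishes.
  after_perform do |job|
    job.class.perform_later
  end

  def perform
    imap = Net::IMAP.new('imap.example.com', port: 993, ssl: true) # placeholder host
    imap.login('user@example.com', 'secret')                       # placeholder credentials
    imap.select('INBOX')

    # Block for up to 20 minutes, or until the server reports new mail.
    imap.idle(20 * 60) do |response|
      imap.idle_done if response.respond_to?(:name) && response.name == 'EXISTS'
    end

    # Hand the actual pulling off to a short-lived job.
    PullEmailJob.perform_later
  ensure
    imap&.disconnect
  end
end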
On some level I'd rather have more frequent short-lived jobs than long-running ones. Sidekiq at least can run multiple threads, but it still hogs one of them. However, with Docker scaling plus threading, I don't think it will matter to us, so I'm happy to re-roll the idling.
Another concern of mine is that it sounds like if we immediately reschedule, it would also be possible to end up in a fail-loop and get rate-limited. I think if we do idle, we should just resume at the next scheduled time.
I think we could experiment with a shorter idle delay--maybe 60 seconds to 5 minutes. That way a job never runs longer than 5 minutes. If it fails, it seems there is a wait parameter on the retry_on method, so we can have it wait 10 minutes before retrying. That should cover the looping-rate-limit issue. It could then re-queue itself or rely on another scheduler (although we would need the scheduler to not run if a retry is currently waiting).
https://edgeapi.rubyonrails.org/classes/ActiveJob/Exceptions/ClassMethods.html#method-i-retry_on
So: 1) the initial job starts, 2) the job ends and goes into idle (either launching an idle job or just continuing the current job into idle), 3) the idle job ends and either re-queues itself or a scheduler starts another.
I'm not really worried about thread hogging. Even though the task is running, this is blocking I/O so it won't consume anything else on the system. We can just treat our scaling as n + 1, where 1 is a worker dedicated to email. But also Sidekiq might recognize the blocking I/O and not lock up that worker (I have not tested this).
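Sketching that flow, with a 5-minute idle and a 10-minute back-off assumed, and with pull_new_mail / idle_for_new_mail as hypothetical stand-ins for the existing IMAP code:

class PullEmailJob < ApplicationJob
  queue_as :default

  # On transient failure, wait 10 minutes before retrying to avoid a tight
  # fail-loop against the mail server.
  retry_on Net::IMAP::NoResponseError, SocketError, wait: 10.minutes, attempts: 5

  # Step 3: when a run finishes cleanly, queue the next one (or drop this
  # callback and let an external scheduler enqueue runs instead).
  after_perform do |job|
    job.class.perform_later
  end

  def perform
    pull_new_mail             # step 1: the actual work (hypothetical helper)
    idle_for_new_mail(5 * 60) # step 2: wait up to 5 minutes for a push (hypothetical helper)
  end
end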
class PullEmailJob < ApplicationJob
  queue_as :default

  def perform(*args)
I think you can have no arguments if you aren't using any.
STDOUT.sync = true

logger = Logger.new(STDOUT)
logger.level = Logger::INFO
Not sure if we still want these
reconnectSleep = 1
Suggested change: remove this line.
begin
  imap.select(config[:name])

  while true
We don't want this while true anymore, since we will control the frequency by running the job many times (not a single long-lasting job).
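For example, the body of perform could collapse to a single pass (imap and config here come from the surrounding job code in this diff; process_messages is a hypothetical stand-in for the old loop body):

def perform
  imap.select(config[:name])
  process_messages # hypothetical helper: handle whatever is in the mailbox right now
ensure
  imap.disconnect
end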
class SendEventSlackNotificationsJob < ApplicationJob
  queue_as :default

  def perform(*args)
Same here, I'm not sure we have to specify args if they aren't used.
STDOUT.sync = true

logger = Logger.new(STDOUT)
Not sure if we still want these for STDOUT.
class SendEventSlackNotificationsJob < ApplicationJob
  queue_as :default
This job probably needs a before_perform to schedule the next run on the next hour interval.
A later PR could break this into two jobs: one to send notifications for each event and one to schedule a notification job for each event (so that it isn't restricted to running on the hour). This would also need to keep track of events that have already been notified (and clear that record if the time changes to a later time).
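A rough sketch of that hourly scheduling, assuming "next hour interval" means the top of the next hour:

class SendEventSlackNotificationsJob < ApplicationJob
  queue_as :default

  # Queue the next run at the top of the next hour before doing any work.
  before_perform do |job|
    job.class.set(wait_until: Time.current.at_beginning_of_hour + 1.hour).perform_later
  end

  def perform
    # ... send Slack notifications for events in the coming hour ...
  end
end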
require 'logger'

class PullEmailJob < ApplicationJob
  queue_as :default
This job probably needs a before_perform to schedule the next run on the next 20-minute interval. Or, per the other comment, we could have a separate job that schedules this one and incorporates the IMAP idle notifications.
We can run them from the same container (but separately) and just change the command. This is how I run Sidekiq (and multiple scaled instances of it) in the same Compose instance as Rails:

sidekiq:
  image: ghcr.io/example/example:main
  restart: always
  environment:
    DATABASE_URL: "mariadb://example:example@postgresql/example_production"
    REDIS_URL: "redis://redis:6379/1"
    RAILS_MAILER_DEFAULT_URL_HOST: "example"
    RAILS_MAILER_DEFAULT_URL_PORT: "443"
    RAILS_MAILER_DEFAULT_URL_PROTOCOL: "https"
    RAILS_MAILER_SMTP_HOST: "example"
    RAILS_MAILER_SMTP_PORT: "587"
  command: bundle exec sidekiq
  networks:
    - app-mariadb
    - app-redis
  depends_on:
    - redis
    - mariadb
  deploy:
    replicas: 3
redis:
  image: redis
  restart: always
  networks:
    - app-redis
  volumes:
    - redis:/data
  command: redis-server --save 60 1 --loglevel warning
I don't like the idea of writing our own scheduling code, whether it is self-requeueing or scheduling other jobs. There are a lot of pitfalls, and it might result in more maintenance time than it is worth. It seems like sidekiq-scheduler got most of its race-condition problems sorted out last month (after they were raised in 2016). I like how it looks in general.
For recurring jobs I agree, self-requeueing has a lot of race conditions to work through. If it ever stops, there is a chance it won't start again. However, spawning non-requeueing jobs from jobs is fine. For example, a recurring job every 5 minutes (running on a scheduler) could spawn individual one-shot jobs to send event notifications at exactly the right time (though that feature should go in a separate PR).
Partial progress towards #544.
Removed:
Moved:
Note: email pulling is no longer long-running.
Same:
Note: for moving to Docker, it may make sense to put the TS server into its own container so that Docker can handle its lifecycle.