
Move App Systemd Tasks to Active Jobs #552

Status: Open · wants to merge 3 commits into base: master
Conversation

NoRePercussions (Contributor):

Partial progress towards #544.

Removed:

  • Backup cleaning reminder

Moved:

  • Email pulling
  • Slack notifications

Note: email pulling is no longer long-running.

Same:

  • TS and TS Indexing

Note: for moving to Docker, it may make sense to put the TS server into its own container so that Docker can handle its lifecycle.

This introduces substantial changes to email pulling.

* Email pulling is no longer managed with systemd and rake.

* The email pulling job no longer manages its own lifecycle. Since we are no
longer using systemd, we lose its ability to recover from unexpected
failures by restarting the task. On the other hand, we no longer need to
manage an IMAP connection lifecycle or work around rate-limiting errors.

* IMAP failures are now logged as errors, as we do not expect
rate-limiting or dropped connection errors on a short-lived connection.
raise "No acceptable authentication mechanisms"
end
rescue Net::IMAP::NoResponseError, SocketError, Faraday::ConnectionFailed => error
logger.error("Could not authenticate for #{config[:email]}, error: #{error.message}")
NoRePercussions (Contributor, PR author):

Since we no longer anticipate rate-limits or connection drops, I changed this (and one below) to log an error. I figure that while it is an error, it is not likely to be one resulting from our app logic, but should it raise an error instead?

DaAwesomeP (Member):

I think yes it should raise an error so that the job is properly marked as failed. We can then set retry_on or discard_on depending on how it errored.

@DaAwesomeP (Member) left a comment:

This is great! The main thing it's missing is something to initially start these jobs, and then some `before_perform` to keep them going.

Alternatively, we could use something like activejob-scheduler or sidekiq-scheduler. The Sidekiq one seems more popular, and the schedules aren't really needed for development (which would use the async backend instead of Sidekiq).
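As a sketch of the sidekiq-scheduler route mentioned above (illustrative only; the job names and intervals are placeholders, not decisions made in this PR), the gem's documented dynamic-schedule pattern assigns a schedule hash at server startup:

```ruby
# config/initializers/sidekiq.rb -- hypothetical initializer for the
# sidekiq-scheduler option. Job names and intervals are placeholders.
Sidekiq.configure_server do |config|
  config.on(:startup) do
    # sidekiq-scheduler's documented pattern: assign a schedule hash,
    # then reload it on the running server process.
    Sidekiq.schedule = {
      'pull_email' => { 'every' => '20m', 'class' => 'PullEmailJob' },
      'event_slack_notifications' => { 'cron' => '0 * * * *',
                                       'class' => 'SendEventSlackNotificationsJob' }
    }
    SidekiqScheduler::Scheduler.instance.reload_schedule!
  end
end
```

This only runs in the Sidekiq server process, so development with the async backend would be unaffected.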


Comment on lines -88 to -99
waiting = Thread.start do
  sleep(20.minutes)

  imap.idle_done
end

imap.idle do |response|
  if response.respond_to?(:name) && response.name == 'EXISTS'
    waiting.kill
    imap.idle_done
  end
end
DaAwesomeP (Member):

This is interesting--it seems that we force it to wait for 20 minutes, and then the `imap.idle` command will return when there are new inbox items. I'm wondering if we should have a separate task that spawns jobs with this method. We could do `imap.idle(20*60) do |res| PullEmailJob.perform_later`, which would cause it to run every 20 minutes, or earlier if there is a notification. That job would have an `after_perform` to restart it.

https://ruby-doc.org/stdlib-2.5.3/libdoc/net/imap/rdoc/Net/IMAP.html#method-i-idle
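The dispatcher idea above could be sketched roughly like this (illustrative, not code from this PR; `run_idle_dispatcher` and the `cycles:` cap are hypothetical names). In the app, `imap` would be an authenticated `Net::IMAP` with the mailbox selected, and the block would call `PullEmailJob.perform_later`:

```ruby
IDLE_INTERVAL = 20 * 60 # seconds; Net::IMAP#idle returns early on activity

# One long-lived loop holds the IMAP connection and, each time idle
# returns (timeout expired or new mail arrived), enqueues a short-lived
# pull job via the block.
def run_idle_dispatcher(imap, cycles:, &enqueue)
  cycles.times do
    # Net::IMAP#idle takes an optional timeout and yields untagged server
    # responses; idle_done ends the IDLE command early when mail arrives.
    imap.idle(IDLE_INTERVAL) do |response|
      imap.idle_done if response.respond_to?(:name) && response.name == 'EXISTS'
    end
    enqueue.call
  end
end
```

In production the `cycles:` cap would be unbounded (or the loop restarted by the job system); it is here only so the sketch terminates.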

NoRePercussions (Contributor, PR author):

On some level I'd rather have more frequent short-lived jobs than long-running ones. Sidekiq at least can run multiple threads, but a long-running job still hogs one of them. However, with Docker scaling plus threading, I don't think it will matter to us, so I'm happy to re-roll the idling.

Another concern of mine is that it sounds like if we immediately reschedule, it would also be possible to end up in a fail loop and get rate-limited. I think if we do idle, we should just resume at the next scheduled time.

DaAwesomeP (Member):

I think we could experiment with a shorter idle delay--maybe 60 seconds to 5 minutes. That way a job never runs longer than 5 minutes. If it fails, there is a wait parameter on the `retry_on` method, so we can have it wait 10 minutes before retrying. That should cover the looping-rate-limit issue. It could then re-queue itself or rely on another scheduler (although we would need the scheduler to not run if a retry is currently waiting).

https://edgeapi.rubyonrails.org/classes/ActiveJob/Exceptions/ClassMethods.html#method-i-retry_on

So 1) initial job starts, 2) job ends and it goes into idle (either launches an idle job or just continues current job to idle), 3) idle job ends and re-queue or scheduler starts another.

I'm not really worried about thread hogging. Even though the task is running, this is blocking I/O, so it won't consume anything else on the system. We can just treat our scaling as n + 1, where 1 is a worker dedicated to email. But also, Sidekiq might recognize the blocking I/O and not lock up that worker (I have not tested this).
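The `retry_on` idea above might look like this (a sketch, not code from this PR; the `ApplicationJob` stand-in below only exists so the snippet runs outside Rails, where `retry_on` is ActiveJob's real macro, and `Net::IMAP::NoResponseError` would be listed alongside `SocketError`):

```ruby
require 'socket' # for SocketError

# Minimal stand-in so this sketch runs without Rails; in the app this is
# the real ActiveJob base class, and retry_on actually schedules retries.
class ApplicationJob
  def self.queue_as(_queue); end
  def self.retry_on(*exceptions, **options)
    @retry_config = [exceptions, options]
  end
  class << self; attr_reader :retry_config; end
end

# Keep each run short (idle capped inside perform) and space retries out
# by 10 minutes so a crash loop cannot hammer the IMAP server.
class PullEmailJob < ApplicationJob
  queue_as :default
  retry_on SocketError, wait: 10 * 60, attempts: 5

  def perform
    # connect, select the mailbox, idle for at most 5 minutes, pull mail...
  end
end
```

With real ActiveJob, `wait:` also accepts a proc or `:polynomially_longer` if a fixed delay turns out to be too blunt.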

class PullEmailJob < ApplicationJob
  queue_as :default

  def perform(*args)
DaAwesomeP (Member):

I think you can have no arguments if you aren't using any.

Comment on lines +7 to +10
STDOUT.sync = true

logger = Logger.new(STDOUT)
logger.level = Logger::INFO
DaAwesomeP (Member):
Not sure if we still want these

Comment on lines +34 to +35
reconnectSleep = 1

DaAwesomeP (Member):

Suggested change: remove the `reconnectSleep = 1` line.
begin
  imap.select(config[:name])

  while true
DaAwesomeP (Member):

We don't want this `while true` anymore, since we will control the frequency by running the job many times (not a single long-lasting job).

class SendEventSlackNotificationsJob < ApplicationJob
  queue_as :default

  def perform(*args)
DaAwesomeP (Member):

Same here: I'm unsure that we have to specify args if they're not used.

Comment on lines +7 to +9
STDOUT.sync = true

logger = Logger.new(STDOUT)
DaAwesomeP (Member):

Not sure if we still want these for STDOUT

@@ -0,0 +1,102 @@
class SendEventSlackNotificationsJob < ApplicationJob
  queue_as :default

DaAwesomeP (Member):

This job probably needs a `before_perform` to schedule the job for the next hour interval.

A later PR could break this into two jobs: One to send notifications for each event and one to schedule a notification job for each event (so that it isn't restricted to on the hour). This would also need to keep track of events that have already been notified (and clear that record if the time changes to a later time).
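The interval-scheduling idea above could be implemented roughly like this (a sketch, not code from this PR; `next_boundary` is a hypothetical helper name): align each re-enqueue to a wall-clock boundary, so the job runs on the hour regardless of how long the previous run took.

```ruby
# Next strict multiple of `interval` seconds after `from` (a Time).
# E.g. with interval = 3600, any time inside an hour maps to the top of
# the next hour, and a time exactly on the hour maps to the hour after.
def next_boundary(from, interval)
  Time.at(((from.to_i / interval) + 1) * interval)
end

# In the job this would be used roughly as (ActiveJob API):
#   before_perform do
#     self.class.set(wait_until: next_boundary(Time.now, 3600)).perform_later
#   end
```

Doing this in `before_perform` (rather than `after_perform`) means the next run is queued even if the current run raises, at the cost of possible duplicates if the scheduler also enqueues one.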

require 'logger'

class PullEmailJob < ApplicationJob
  queue_as :default
DaAwesomeP (Member):

This job probably needs a `before_perform` to schedule the job for the next 20-minute interval. Or, see the other comment on the IMAP idle code: we can have a separate job scheduling this job that incorporates the IMAP idle notifications.

@DaAwesomeP (Member):

> Note: for moving to Docker, it may make sense to put the TS server into its own container so that Docker can handle its lifecycle.

We can run them from the same container (but separately) and just change the command. This is how I run Sidekiq (and multiple scaled instances of it) in the same Compose instance as Rails:

sidekiq:
  image: ghcr.io/example/example:main
  restart: always
  environment:
    DATABASE_URL: "mariadb://example:example@mariadb/example_production"
    REDIS_URL: "redis://redis:6379/1"
    RAILS_MAILER_DEFAULT_URL_HOST: "example"
    RAILS_MAILER_DEFAULT_URL_PORT: "443"
    RAILS_MAILER_DEFAULT_URL_PROTOCOL: "https"
    RAILS_MAILER_SMTP_HOST: "example"
    RAILS_MAILER_SMTP_PORT: "587"
  command: bundle exec sidekiq
  networks:
    - app-mariadb
    - app-redis
  depends_on:
    - redis
    - mariadb
  deploy:
    replicas: 3
redis:
  image: redis
  restart: always
  networks:
    - app-redis
  volumes:
    - redis:/data
  command: redis-server --save 60 1 --loglevel warning

@NoRePercussions (Contributor, PR author) commented Jun 10, 2024:

I don't like the idea of writing our own scheduling code, whether it is self-requeuing or scheduling other jobs. There are a lot of pitfalls, and it might result in more maintenance time than it is worth. It seems like sidekiq-scheduler got most of their own race condition problems sorted out last month (after they were raised in 2016). I like how it looks in general.

@DaAwesomeP (Member):

> I don't like the idea of writing our own scheduling code, whether it is self-requeuing or scheduling other jobs. There are a lot of pitfalls, and it might result in more maintenance time than it is worth. It seems like sidekiq-scheduler got most of their own race condition problems sorted out last month (after they were raised in 2016). I like how it looks in general.

For recurring jobs I agree: self-requeuing has a lot of race conditions to work through. If it ever stops, there is a chance it won't start again.

However, spawning non-requeuing jobs from jobs is fine. For example, a recurring job every 5 minutes (running on a scheduler) could spawn individual one-shot jobs to send event notifications at exactly the right time (this example feature should go in a separate PR, though).
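The two-job split described above might select events like this (illustrative only, not code from this PR; `Event` is a plain stand-in struct, not the app's model, and the enqueue step would be something like a hypothetical `NotifyEventJob.set(wait_until: event.notify_at).perform_later`):

```ruby
# Plain stand-in for an event record with a notification time.
Event = Struct.new(:id, :notify_at)

# Events whose notification time falls before the scheduler's next run
# and that haven't already been handed a one-shot job. Tracking
# `already_enqueued` is what lets the schedule survive time changes:
# clear an id from it if the event moves to a later time.
def events_to_enqueue(events, already_enqueued, next_run_at)
  events.select do |event|
    event.notify_at <= next_run_at && !already_enqueued.include?(event.id)
  end
end
```

The one-shot jobs themselves never requeue, so all the racy state lives in one place: the recurring scheduler job.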
