
Log errors with level error #140

Open

jaap3 wants to merge 1 commit into pinax:master from jaap3:patch-1
Conversation

@jaap3 (Contributor) commented Jan 18, 2021

We missed a critical configuration change (the mailserver password had changed) last week and didn't notice until end users started complaining about not receiving mail. It took a while for us to figure out what the culprit was; luckily, the error message is stored in the log table. However, it would be better if deliverability errors were logged at a higher log level, so our error tracking system (Sentry) could've picked them up and alerted us.

It might be possible to use logger.exception here, but I'm not 100% sure.
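
For illustration, logger.exception logs at the ERROR level and attaches the active traceback, which is exactly the kind of event Sentry groups and alerts on. A minimal sketch of the idea; the function and names below are assumptions for the example, not django-mailer's actual send loop:

import logging
import smtplib

logger = logging.getLogger(__name__)

def deliver(message, connection):
    """Illustrative only: report a delivery failure with the full traceback."""
    try:
        connection.send_messages([message])
    except smtplib.SMTPException:
        # logger.exception logs at ERROR level and includes the current
        # traceback, so an error tracker sees the full stack trace rather
        # than just a formatted message string.
        logger.exception("message delivery failed")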

@coveralls commented Jan 18, 2021

Coverage increased (+0.06%) to 86.461% when pulling 9641ba9 on jaap3:patch-1 into ae0475c on pinax:master.

@spookylukey (Contributor)

I think we may need some more fine-grained handling here, and need to consider all the cases. Being too 'loud' is just as much an issue as being too 'quiet', and I normally go by the rule that you shouldn't do 'warning' or 'error' unless there is a correct and sufficiently fine-grained way to silence those things. Otherwise, the true signal of things you need to attend to gets lost in the noise of things you don't.

Normally error level is something that gets reported to some bug tracking software e.g. Sentry. You 'resolve' the issue only when the bug is fixed, and if it occurs again, it means you have a regression or the bug wasn't fixed properly.

But here, one of the errors caught is smtplib.SMTPRecipientsRefused, which I think can happen just when the recipient address is wrong. We almost certainly don't want this to trigger an 'error' log, because it's not really an application error - it can just as easily be triggered by someone putting in the wrong email address for themselves. In this case there is not much you can do to resolve the error - it is expected behaviour of the app that you simply can't deliver such messages. So we don't want to group that with others that indicate a developer or devops error.

We need to know exactly when the different errors occur, and log at appropriate levels.
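
For illustration, a rough sketch of the kind of split I mean; the exact exception types and messages below are assumptions, not django-mailer's actual code:

import logging
import smtplib

logger = logging.getLogger(__name__)

def log_delivery_failure(err):
    """Illustrative only: pick a log level based on what the failure means."""
    if isinstance(err, smtplib.SMTPRecipientsRefused):
        # Expected behaviour: a bad recipient address is not an application
        # error, so keep it out of the error-level stream.
        logger.warning("recipient refused, message deferred: %s", err)
    elif isinstance(err, smtplib.SMTPAuthenticationError):
        # Almost certainly a devops/configuration problem (e.g. a changed
        # mailserver password), so this one should reach the error tracker.
        logger.error("SMTP authentication failed, message deferred: %s", err)
    else:
        logger.error("message delivery failed, message deferred: %s", err)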

@jaap3 (Contributor, Author) commented Jan 18, 2021

I get what you are saying; I was just going by other uses of logger.error in this file, e.g. acquire_lock also logs "temporary" failures at the error level:

def acquire_lock():
    logger.debug("acquiring lock...")
    if LOCK_PATH is not None:
        lock_file_path = LOCK_PATH
    else:
        lock_file_path = "send_mail"
    lock = lockfile.FileLock(lock_file_path)
    try:
        lock.acquire(LOCK_WAIT_TIMEOUT)
    except lockfile.AlreadyLocked:
        logger.error("lock already in place. quitting.")
        return False, lock
    except lockfile.LockTimeout:
        logger.error("waiting for the lock timed out. quitting.")
        return False, lock
    logger.debug("acquired.")
    return True, lock

I think that is fine. If the lock cannot be cleared for an extended period of time, I'd like to investigate. If it's just a transient error, I can see in Sentry that it "cleared" itself, so that's fine as well.

I think most delivery issues require some manual intervention, right? Even SMTPRecipientsRefused requires you to delete the message so it won't be retried indefinitely.

As it is now, it's too easy for django-mailer to not work at all without us being notified.

P.S. As I was typing this message I got a Sentry alert with the message "lock already in place. quitting." What a coincidence.

@jaap3 (Contributor, Author) commented Jan 18, 2021

Well, having the locking error logged at the error level just saved us from even more debugging. It turns out that the reload for the config change didn't clear the lock, so I had to manually clear it.

I think in case of errors it's better to err on the side of being too loud instead of too quiet. It's fairly easy to fully ignore an error in Sentry (I think every type of error will be grouped individually in Sentry, as the exception is part of the message); not getting notified about them, however, can be much more catastrophic.

But if more fine-grained handling/logging is required for you to merge/release this PR, I'd be happy to do so. I do need some guidance, however, as I cannot decide what types of errors should be logged at the warning or even info level.

@jaap3 (Contributor, Author) commented Feb 3, 2021

As a workaround for the current behavior I resorted to a logging config with a special filter. It's slightly more complex than I would like, but it works:

import logging

class MailerErrorFilter(logging.Filter):
    """
    Filter django-mailer log messages so only
    errors remain.
    """
    # XXX: Workaround for https://github.com/pinax/django-mailer/pull/140
    def filter(self, record):
        if record.name.startswith('mailer.'):
            return (
                record.levelno > logging.INFO
                or record.msg.startswith('message deferred')
            )
        else:
            return True

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'sentry': {
            'class': 'sentry_sdk.integrations.logging.EventHandler',
            'filters': ['mailer_filter'],
            'level': 'DEBUG',
        },
    },
    'filters': {
        'mailer_filter': {
            '()': MailerErrorFilter
        }
    },
    'loggers': {
        'mailer': {  # django-mailer
            'handlers': ['sentry'],
            'level': 'INFO',
        }
    }
}
