Skip to content

Logging

Kevin Quick edited this page Jul 8, 2020 · 11 revisions

Purpose and Context

Logging in Matterhorn is something performed in a specific instance and controlled by the user; it does not interact with syslog() or other system-wide logging. Matterhorn logging is intended to record a stream of internal events for either local development debugging, or diagnosing an issue encountered by a user.

General logging can be done via either a startup argument, which specifies a filename to log to, or via the /log-start command. In both cases, the user specifies the specific file to log to and therefore have full control and opportunity to inspect and redact that file before sharing the information.

In addition there is the /log-snapshot command. This is especially useful in cases where something unexpected has occurred. Matterhorn will internally maintain an in-memory ring buffer of the latest messages that would normally have been logged. The /log-snapshot <path> command will dump that ring buffer to the specified file. This is very useful to log the most recent events: just after noticing a problem.

Users can type the /log command prefix in Matterhorn and see the set of logging commands available.

The Matterhorn logging is generally intended as a debugging/diagnostic tool for the developers. Any concern about sensitive information is handled by writing the information to this file, that the user can then edit or remove information from before (for example) providing to the development team to diagnose an issue; something which would be a bit more difficult to do with a screenshot. File-based logging also provides a historical context of events leading up to the actual error which a screenshot display of an error would not provide.

Matterhorn generally does not log information that is known to be sensitive (e.g. passwords), but we don't always control the information that gets placed into exceptions and error messages, so this methodology provides an additional layer of inspection for the user before information leaves their local environment.

Design and Implementation

High-level Concerns

  • Logging should cost nothing or almost nothing if it is disabled.
  • Logging should be possible without triggering redraws.
  • The mattermost-api library should be able to participate in the same logging infrastructure that is used by the main application. This may require modifying the API library.
  • Recent messages should be available in an internal buffer regardless of whether logging is enabled. (Logging is "enabled" if log messages are being written to an external location such as a disk file.)
  • It should be possible to obtain a log of all recent events even if logging to an output file wasn't fully enabled in order to help the diagnosis of unusual situations.

Approach

  • Ordered log levels have lots of problems. Initially we'll go with an approach that doesn't assign ordered levels to log messages, but rather assigns tags that are similar to levels (for categorization) but have no ordering relationship.
  • DONE: There will initially be no logging UI in the program itself Instead there will be keybindings and commands added to perform the following:
    • start logging: open the log file for append, write the current ring buffer contents to the log file, and then begin appending log messages as they arrive from the program.
    • stop logging: close the log file.
  • DONE: The start and stop operations should emit additional output lines indicating the beginning and end of the buffer flush as well as the start or end of logging.
  • DONE: All log messages should be timestamped. This can be done by the logging thread (see below) upon reception instead of by the caller; the delta between generation and incorporation is currently not a concern.
  • DONE: We'll use a Sequence to hold the internal buffer for efficient modification.
  • DONE: The size of log buffer (in entries) should be a configuration option with a reasonable default.
  • DONE: We'll have a logging thread responsible for owning the ring buffer and managing output to the log file (when enabled). We'll need a channel to that thread for sending logging protocol messages:
    • DONE: log a message
    • DONE: start logging to a disk file
    • stop logging to a disk file
  • DONE: We'll need a function mhLog :: Tag -> Text -> MH () for logging. The tag type is for classifying messages.
  • DONE: We'll add an implicit logging context to MH. The context can then be manipulated with a combinator withContext :: LogContext -> MH a -> MH a. The log context can include data relevant to any logging done in the monadic action, such as channel ID. All log messages will include a log context.
  • DONE: mhError and similar MH functions should be instrumented with calls to mhLog.

Various Considerations

  • Two primary usage cases: assisting the development process and collecting information for user-reported issues.

  • In-app presentation of logging is fancier, but there are a number of ways this could be presented (e.g. live or modally), with varying impacts upon existing functionality, modals, keybindings, etc. For simplicity, the initial approach will be focused on logging to a file which allows out-of-band display, filtering, and monitoring of the logging output.

  • Using Brick's suspend feature as a way to display the log buffer was determined as non-optimal because it only interrupts the display thread and not the other operational threads.

  • A potential future in-app presentation enhancement might be to create a pseudo-channel into which can display the logging buffer contents and allows all the standard message display functionality (scrolling, highlighting, etc.), and which can also be used to display errors instead of the current-channel injection methodology used by conventional Matterhorn.

  • The current implementation as of f70d9da1a882b9dd4db59c92cc093f378520de1c makes calls to getCurrentTime in MH. There is a concern that due to system I/O this could have a potentially significant runtime cost. Since we care about ensuring that event handlers don't block and execute as quickly as possible to keep input latency low, this could become a problem. We might need to check on whether this happens and explore alternatives, such as:

    • Using an alternative library call to get the current time that doesn't suffer from this problem, or
    • Moving the call to getCurrentTime to the logging thread to avoid paying that cost in the event handler. The trade-off in this case is that we lose a little precision since the timestamp associated with a log message is the time at which the message is removed from the log thread queue rather than the time at which the log message was constructed in the event handler, but this loss in precision might be an acceptable loss given the performance impact.