-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Troubleshooting
This page contains tips for troubleshooting ZFS on Linux and what info developers might want for bug triage.
Log files can be very useful for troubleshooting. In some cases, interesting information is stored in multiple log files that are correlated to system events.
Pro tip: logging infrastructure tools like elasticsearch, fluentd, influxdb, or splunk can simplify log analysis and event correlation.
Typically, Linux kernel log messages are available from dmesg -T
, /var/log/syslog
, or where kernel log messages are sent (eg by rsyslogd
).
The ZFS kernel modules use an internal log buffer for detailed logging information.
This log information is available in the pseudo file /proc/spl/kstat/zfs/dbgmsg
for ZFS builds where ZFS module parameter zfs_dbgmsg_enable = 1
Symptom: zfs
or zpool
command appear hung, does not return, and is not killable
Likely cause: kernel thread hung or panic
Log files of interest: Generic Kernel Log, ZFS Kernel Module Debug Messages
Important information: if a kernel thread is stuck, then a backtrace of the stuck thread can be in the logs. In some cases, the stuck thread is not logged until the deadman timer expires. See also debug tunables
ZFS uses an event-based messaging interface for communication of important events to
other consumers running on the system. The ZFS Event Daemon (zed) is a userland daemon that
listens for these events and processes them. zed is extensible so you can write shell scripts
or other programs that subscribe to events and take action. For example, the script usually
installed at /etc/zfs/zed.d/all-syslog.sh
writes a formatted event message to syslog.
See the man page for zed(8)
for more information.
A history of events is also available via the zpool events
command. This history begins at
ZFS kernel module load and includes events from any pool. These events are stored in RAM and
limited in count to a value determined by the kernel tunable zfs_event_len_max.
zed
has an internal throttling mechanism to prevent overconsumption of system resources
processing ZFS events.
More detailed information about events is observable using zpool events -v
The contents of the verbose events is subject to change, based on the event and information
available at the time of the event.
Each event has a class identifier used for filtering event types. Commonly seen events are
those related to pool management with class sysevent.fs.zfs.*
including import, export,
configuration updates, and zpool history
updates.
Events related to errors are reported as class ereport.*
These can be invaluable for
troubleshooting. Some faults can cause multiple ereports as various layers of the software
deal with the fault. For example, on a simple pool without parity protection, a faulty
disk could cause an ereport.io
during a read from the disk that results in an
erport.fs.zfs.checksum
at the pool level. These events are also reflected by the error
counters observed in zpool status
If you see checksum or read/write errors in zpool status
then there should be one or more
corresponding ereports in the zpool events
output.