Add support for detecting and reporting library files (marked as (deleted) or DEL) updated/replaced by package updates #110

Open
atc0005 opened this issue Mar 9, 2023 · 2 comments
Labels: enhancement, linux

atc0005 commented Mar 9, 2023

Overview

From the man page:

TYPE
is the type of the node associated with the file - e.g., GDIR, GREG, VDIR, VREG, etc.
...
or ''DEL'' for a Linux map file that has been deleted;

and from https://stackoverflow.com/a/37160579/903870:

lsof usually reports entries from the Linux /proc/<PID>/maps file with mem in the TYPE column. However, when lsof can't stat(2) a path in the process maps file and the maps file entry contains (deleted), indicating the file was deleted after it had been opened, lsof reports the file type as DEL.

Yes, Simply those files are deleted after they are read by the process. If you have updated/replaced those files then you probably want to restart the service/process.
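
To make the quoted behavior concrete, here is a minimal sketch of spotting those entries without lsof. It assumes the usual maps line layout (address, perms, offset, dev, inode, pathname) with a trailing " (deleted)" marker; the function name list_deleted_mappings is made up for illustration and is not part of this project.

```shell
#!/bin/bash

# Print the unique paths that a maps-format file lists as "(deleted)".
# Argument: path to a maps file, e.g. /proc/1234/maps.
# (list_deleted_mappings is a hypothetical helper name.)
list_deleted_mappings() {
    local maps_file="$1"

    # Each line: address perms offset dev inode pathname
    # Deleted files carry a trailing " (deleted)" after the pathname.
    # The first sed expression drops the five leading fields, the second
    # drops the marker, leaving only the path.
    grep -E ' \(deleted\)$' "${maps_file}" \
        | sed -E 's/^([^ ]+ +){5}//; s/ \(deleted\)$//' \
        | sort -u
}
```

Calling list_deleted_mappings /proc/self/maps on a Linux system would list any deleted files mapped by the current shell; reading another user's maps file generally needs root.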

References

atc0005 added the enhancement label Mar 9, 2023
atc0005 added this to the Future milestone Mar 9, 2023

atc0005 commented Mar 9, 2023

A shell script that I used to help flag systems in need of a reboot is below. An enhancement to this project providing similar support would attempt to do so directly (e.g., parsing entries from the /proc filesystem) instead of parsing the output from lsof.

#!/bin/bash

# Set Internal Field Separator to newlines only so spaces won't trigger a new
# array entry.
IFS=$'\n'

declare -a processes_with_old_lib_references library_references

# Pattern which indicates a file descriptor for a "deleted" library. We use this
# to find processes that have references to old copies of updated library files
# still open.
deleted_file_descriptor="DEL"

# grep -E compatible patterns that we want to filter out from lsof output
exclude_regex="/dev/zero|/SYSV|/\[aio\]|/usr/share|/var/lib/samba|/tmp"

# grep -E compatible patterns that we want to keep from lsof output
# (these patterns are applied first to thin the list)
include_regex="/(usr|lib)"

#   -w = disable warnings
#   -n = no host names
# +c 0 = show full process name (no truncation)
#   -d = limit to file descriptor set (comma separated)
processes_with_old_lib_references=($(
    lsof -w -n +c 0 -d "${deleted_file_descriptor}" \
        | grep -E "${include_regex}" \
        | grep -Ev "${exclude_regex}" \
        | tr -s ' ' \
        | cut -d ' ' -f 1 \
        | sort -u
))

# Example output contained in ${processes_with_old_lib_references[@]}:
#
# fail2ban-server
# mailgraph
# nginx
# sshd


reboot_required_indicator_file="/var/run/reboot-required"
reboot_required_package_list_file="/var/run/reboot-required.pkgs"

# The regex pattern for package names which appear in /var/run/reboot-required.pkgs
# whenever a new kernel has been installed or removed. This indicates that
# the system has not yet been rebooted.
kernel_package_change_indicator="linux-image|linux-base"


# The package list file may be missing even when the indicator file exists,
# so check for both before reading it.
if [ -f "${reboot_required_indicator_file}" ] && [ -f "${reboot_required_package_list_file}" ]
then

    # Grab the contents of the file (one entry per line), flatten to a single
    # line (space separated) and then replace each space with a comma and a
    # trailing space so that all items fit on a single line to improve Nagios
    # status information display.
    reboot_required_package_list=$(echo $(cat "${reboot_required_package_list_file}") | sed 's/ /, /g')

fi


# reboot_required_package_list is a plain string, so test it for content
# instead of counting array elements.
if [[ -n "${reboot_required_package_list}" ]]; then

    # Check $reboot_required_package_list for $kernel_package_change_indicator
    if [[ "${reboot_required_package_list}" =~ ${kernel_package_change_indicator} ]]
    then
        echo -e "\n[${HOSTNAME}] WARNING: Kernel installed or removed. Reboot needed.\n"
        exit 1
    fi
fi


if [[ "${#processes_with_old_lib_references[@]}" -eq 0 ]]; then

    echo "[${HOSTNAME}] OK: No old library references or pending kernel changes found. Exiting ..."

    exit 0

else

    #
    #  Output affected process and old library file references
    #

    echo -e "\n[${HOSTNAME}] WARNING: The following processes need to be restarted:\n"

    for process_name in "${processes_with_old_lib_references[@]}"
    do

        # -w   = disable warnings
        # -n   = no host names
        # +c 0 = show full process name (no truncation)
        # -c   = filter on specific process name
        # -d   = limit to file descriptor set (comma separated)
        library_references=($(
            lsof -w -n +c 0 -c "${process_name}" -d "${deleted_file_descriptor}" \
            | grep -Ev "${exclude_regex}" \
            | tr -s ' ' \
            | cut -d ' ' -f 8 \
            | sort -u \
            | grep -E "${include_regex}"
        ))

        # Only print out process name if there are actual library references
        if [[ "${#library_references[@]}" -ne 0 ]]; then

            process_needing_restart_entry="${process_name}:"

            for lib_ref in "${library_references[@]}"
            do
                # Each entry is already a single path, so basename is enough.
                lib_filename="$(basename "${lib_ref}")"
                process_needing_restart_entry="${process_needing_restart_entry} ${lib_filename}"
            done

            echo "${process_needing_restart_entry}"

        fi

    done

    echo -e "\n"

    # Exit non-zero, as the kernel check above does, so callers can detect
    # the WARNING state.
    exit 1

fi

I used this from an Ansible role named mass-patch, which did just that: mass patch systems and reboot any that appeared to need it (using the above script) after applying patches. Rebooting only the systems that needed it, rather than every system, saved time during short maintenance windows.

If I recall correctly (it has been a few years), we also had Nagios watching for systems that should have been restarted, but were not.
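
For comparison, the direct approach mentioned before the script (reading the /proc filesystem instead of parsing lsof output) could look roughly like the sketch below. It reuses the same include/exclude patterns, assumes the standard maps line layout, and takes the process name from each process's comm file (which the kernel truncates to 15 characters, unlike lsof +c 0). The function name report_stale_processes and its optional proc_root argument are made up for illustration; the argument only exists so the function can be pointed at a fixture directory.

```shell
#!/bin/bash

# Same filters as the lsof-based script.
include_regex="/(usr|lib)"
exclude_regex="/dev/zero|/SYSV|/\[aio\]|/usr/share|/var/lib/samba|/tmp"

# Print "<process name>: <library> <library> ..." for every process under
# the given proc root (default: /proc) that still maps a deleted file.
# (report_stale_processes and proc_root are hypothetical names.)
report_stale_processes() {
    local proc_root="${1:-/proc}"
    local maps_file pid_dir name libs

    for maps_file in "${proc_root}"/[0-9]*/maps; do
        # Skip an unmatched glob and processes we are not allowed to read.
        [[ -r "${maps_file}" ]] || continue
        pid_dir="${maps_file%/maps}"

        # The process may exit between the glob and this read.
        name="$(cat "${pid_dir}/comm" 2>/dev/null)" || continue

        # Keep deleted entries, strip the five leading fields and the
        # "(deleted)" marker, apply the filters, reduce to file names.
        libs="$(
            grep -E ' \(deleted\)$' "${maps_file}" 2>/dev/null \
                | sed -E 's/^([^ ]+ +){5}//; s/ \(deleted\)$//' \
                | grep -E "${include_regex}" \
                | grep -Ev "${exclude_regex}" \
                | sed 's|.*/||' \
                | sort -u \
                | tr '\n' ' '
        )"

        if [[ -n "${libs}" ]]; then
            echo "${name}: ${libs% }"
        fi
    done
}
```

Run as root, report_stale_processes with no argument would scan every process. Unlike the lsof version it reports each PID separately, so a name such as nginx may appear once per worker process.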


atc0005 commented Mar 9, 2023

A shell script that I used to help flag systems in need of a reboot is below. An enhancement to this project providing similar support would attempt to do so directly (e.g., parsing entries from the /proc filesystem) instead of parsing the output from lsof.

Reminder to self when I loop back to this:

The atc0005/check-process project is already parsing the /proc filesystem for its purposes. It might be that the internal code to provide that functionality is moved into an external module so that both projects can share the logic.
