Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logratation causes that file is read duplicately #225

Open
gulycka opened this issue Dec 3, 2018 · 2 comments
Open

Logratation causes that file is read duplicately #225

gulycka opened this issue Dec 3, 2018 · 2 comments

Comments

@gulycka
Copy link

gulycka commented Dec 3, 2018

Hi,

I just upgraded my ELK cluster from 2.x version on Debian 8 to 6.x version on Debian 9.
Some basic information about my setup.
I have rsyslog that receives data and write them into the files.
Logstash with input plugin file reads files using wildcard path and after some processing it sends them to Elasticsearch nodes.
Files are rotated by logrotate with nocopy/nocopytruncate options.
So inodes of file is kept, files are only renamed (from test.log.1 -> test.log.2, test.log -> test.log.1, etc.) and new file is created (test.log).

On Debian 8 with ELK v2.x everything worked just fine.
After upgrading to Debian 9 with ELK v6.x I encountered a problems with duplicately processed files after logrotation takes place. I was trying to replicate it to have some debug informations. The issue doesn't occur everytime in my testing environment.

In this particular case I was able to replicate the issue with last file beeing processed duplicately by input plugin file.
For example, if I have:
test.log
test.log.1
test.log.2
Then during logrotation test.log.2 is renamed to test.log.3, test.log.1 is renamed to test.log.2, test.log is renamed to test.log.1 and new file test.log is created.
Logstash watches all the files and trackes them in sincedb file.
Sometimes after logrotation, logstash picks the last file, test.log.3 and reads it again.

In tested case that is stated below the problem was with the file test.log.17 that was processed again after logrotation as test.log.18.

Logrotate config:

/var/ls/remotelogs/test/*.log
{
        hourly
        rotate 24
        maxage 26
        nocopy
        nocopytruncate
        create 640 root logstash
        nocompress
        nodelaycompress
        missingok
        notifempty
        sharedscripts
        postrotate
                invoke-rc.d rsyslog rotate > /dev/null
        endscript
}

Logstash input configuration (ignore_older and sincedb_clean_after are kept in default settings what should be OK):

input {
    file {
        path => [ "/var/ls/remotelogs/test/test.log*" ]
        codec => "plain"
        type => "test"
        discover_interval => 15
        sincedb_path => "/var/ls/sincedb/monitored/test.touch"
        sincedb_write_interval => 1
        start_position => "beginning"
        stat_interval => 2
    }
}

Sincedb file before final logratation (problematic file on the top, with inode number 14, named test.log.17):

14 0 65035 122278 1543852831.1367729 /var/ls/remotelogs/test/test.log.17
12 0 65035 41873 1543852831.340915 /var/ls/remotelogs/test/test.log.16
13 0 65035 25786 1543852831.340071 /var/ls/remotelogs/test/test.log.15
15 0 65035 28589 1543852831.339257 /var/ls/remotelogs/test/test.log.14
16 0 65035 1529487 1543852831.341586 /var/ls/remotelogs/test/test.log.13
17 0 65035 5708664 1543852831.324611 /var/ls/remotelogs/test/test.log.12
18 0 65035 1828226 1543852831.282213 /var/ls/remotelogs/test/test.log.11
19 0 65035 1249527 1543852831.336627 /var/ls/remotelogs/test/test.log.10
20 0 65035 2587309 1543852831.33844 /var/ls/remotelogs/test/test.log.9
21 0 65035 5794124 1543852831.337893 /var/ls/remotelogs/test/test.log.8
22 0 65035 17483839 1543852831.340473 /var/ls/remotelogs/test/test.log.7
23 0 65035 23506699 1543852831.3483942 /var/ls/remotelogs/test/test.log.6
24 0 65035 7435294 1543852831.338851 /var/ls/remotelogs/test/test.log.5
25 0 65035 7865804 1543852831.33967 /var/ls/remotelogs/test/test.log.4
26 0 65035 13471598 1543852831.3373759 /var/ls/remotelogs/test/test.log.3
27 0 65035 3561704 1543852831.34687 /var/ls/remotelogs/test/test.log.2
28 0 65035 22756747 1543852831.3479078 /var/ls/remotelogs/test/test.log.1
29 0 65035 54957474 1543852999.732651 /var/ls/remotelogs/test/test.log

ls -li:
14 -rw-r----- 1 root logstash 122278 Dec 3 15:45 test.log.17

files from /var/ls/remotelogs/test/ that are opened by logstash process:
lr-x------ 1 logstash logstash 64 Dec 3 17:00 96 -> /var/ls/remotelogs/test/test.log

Just after logrotation, see the sincedb file (the file with inode 14 is still there, but with old filename test.log.17. test.log.17 is there twice, as test.log.16 that was renamed to test.log.17 with inode number 12 and test.log.17 that has inode number 14, but now is named test.log.18 - this change stays unnoticed by logstash input filter):

14 0 65035 122278 1543852831.1367729 /var/ls/remotelogs/test/test.log.17
12 0 65035 41873 1543853007.849294 /var/ls/remotelogs/test/test.log.17
13 0 65035 25786 1543853007.8600118 /var/ls/remotelogs/test/test.log.16
15 0 65035 28589 1543853007.860524 /var/ls/remotelogs/test/test.log.15
16 0 65035 1529487 1543853007.8608618 /var/ls/remotelogs/test/test.log.14
17 0 65035 5708664 1543853007.8611422 /var/ls/remotelogs/test/test.log.13
18 0 65035 1828226 1543853007.862993 /var/ls/remotelogs/test/test.log.12
19 0 65035 1249527 1543853007.863268 /var/ls/remotelogs/test/test.log.11
20 0 65035 2587309 1543853007.863535 /var/ls/remotelogs/test/test.log.10
21 0 65035 5794124 1543853007.863792 /var/ls/remotelogs/test/test.log.9
22 0 65035 17483839 1543853007.8640602 /var/ls/remotelogs/test/test.log.8
23 0 65035 23506699 1543853007.86431 /var/ls/remotelogs/test/test.log.7
24 0 65035 7435294 1543853007.8645642 /var/ls/remotelogs/test/test.log.6
25 0 65035 7865804 1543853007.864824 /var/ls/remotelogs/test/test.log.5
26 0 65035 13471598 1543853007.8650851 /var/ls/remotelogs/test/test.log.4
27 0 65035 3561704 1543853007.865335 /var/ls/remotelogs/test/test.log.3
28 0 65035 22756747 1543853007.865582 /var/ls/remotelogs/test/test.log.2
29 0 65035 55107447 1543853007.871568 /var/ls/remotelogs/test/test.log.1
30 0 65035 54944 1543853007.99283 /var/ls/remotelogs/test/test.lo

ls-li shows that file is really renamed and logrotation was as expected (without inode change):

11 drwx------ 2 root root        16384 Dec  3 13:12 lost+found
30 -rw-r----- 1 root logstash    85681 Dec  3 17:03 test.log
29 -rw-r----- 1 root logstash 55108493 Dec  3 17:03 test.log.1
20 -rw-r----- 1 root logstash  2587309 Dec  3 15:58 test.log.10
19 -rw-r----- 1 root logstash  1249527 Dec  3 15:57 test.log.11
18 -rw-r----- 1 root logstash  1828226 Dec  3 15:56 test.log.12
17 -rw-r----- 1 root logstash  5708664 Dec  3 15:55 test.log.13
16 -rw-r----- 1 root logstash  1529487 Dec  3 15:52 test.log.14
15 -rw-r----- 1 root logstash    28589 Dec  3 15:50 test.log.15
13 -rw-r----- 1 root logstash    25786 Dec  3 15:48 test.log.16
12 -rw-r----- 1 root logstash    41873 Dec  3 15:47 test.log.17
14 -rw-r----- 1 root logstash   122278 Dec  3 15:45 test.log.18
28 -rw-r----- 1 root logstash 22756747 Dec  3 16:42 test.log.2
27 -rw-r----- 1 root logstash  3561704 Dec  3 16:32 test.log.3
26 -rw-r----- 1 root logstash 13471598 Dec  3 16:31 test.log.4
25 -rw-r----- 1 root logstash  7865804 Dec  3 16:25 test.log.5
24 -rw-r----- 1 root logstash  7435294 Dec  3 16:21 test.log.6
23 -rw-r----- 1 root logstash 23506699 Dec  3 16:18 test.log.7
22 -rw-r----- 1 root logstash 17483839 Dec  3 16:08 test.log.8
21 -rw-r----- 1 root logstash  5794124 Dec  3 16:01 test.log.9

After few seconds, filename is updated in sincedb file is updated (notice that offset counter was also updated as the file was processed again):

14 0 65035 98121 1543853016.239249 /var/ls/remotelogs/test/test.log.18
12 0 65035 41873 1543853007.849294 /var/ls/remotelogs/test/test.log.17
13 0 65035 25786 1543853007.8600118 /var/ls/remotelogs/test/test.log.16
15 0 65035 28589 1543853007.860524 /var/ls/remotelogs/test/test.log.15
16 0 65035 1529487 1543853007.8608618 /var/ls/remotelogs/test/test.log.14
17 0 65035 5708664 1543853007.8611422 /var/ls/remotelogs/test/test.log.13
18 0 65035 1828226 1543853007.862993 /var/ls/remotelogs/test/test.log.12
19 0 65035 1249527 1543853007.863268 /var/ls/remotelogs/test/test.log.11
20 0 65035 2587309 1543853007.863535 /var/ls/remotelogs/test/test.log.10
21 0 65035 5794124 1543853007.863792 /var/ls/remotelogs/test/test.log.9
22 0 65035 17483839 1543853007.8640602 /var/ls/remotelogs/test/test.log.8
23 0 65035 23506699 1543853007.86431 /var/ls/remotelogs/test/test.log.7
24 0 65035 7435294 1543853007.8645642 /var/ls/remotelogs/test/test.log.6
25 0 65035 7865804 1543853007.864824 /var/ls/remotelogs/test/test.log.5
26 0 65035 13471598 1543853007.8650851 /var/ls/remotelogs/test/test.log.4
27 0 65035 3561704 1543853007.865335 /var/ls/remotelogs/test/test.log.3
28 0 65035 22756747 1543853007.865582 /var/ls/remotelogs/test/test.log.2
29 0 65035 55108493 1543853010.088067 /var/ls/remotelogs/test/test.log.1
30 0 65035 262418 1543853014.146191 /var/ls/remotelogs/test/test.log

Process is now reading three files (test.log and test.log.1 is expected, but also test.log.18 is beeing read):

lr-x------ 1 logstash logstash 64 Dec  3 17:03 113 -> /var/ls/remotelogs/test/test.log.1
lr-x------ 1 logstash logstash 64 Dec  3 17:03 114 -> /var/ls/remotelogs/test/test.log.18
lr-x------ 1 logstash logstash 64 Dec  3 17:00 96 -> /var/ls/remotelogs/test/test.log

Stat on test.log.18 file:

  File: test.log.18
  Size: 122278          Blocks: 240        IO Block: 4096   regular file
Device: fe0bh/65035d    Inode: 14          Links: 1
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (  999/logstash)
Access: 2018-12-03 15:37:59.878413469 +0100
Modify: 2018-12-03 15:45:03.793572841 +0100
Change: 2018-12-03 17:03:25.975984162 +0100
 Birth: -

Basic version info of current setup:

  • Logstash version: 6.4.2-1
  • Logrotate version: 3.11.0-0.1
  • OS: Debian 9.6
@guyboertje
Copy link
Contributor

Please try the latest version - changelog

bin/logstash-plugin install logstash-input-file --version 4.1.8

@martinscholz83
Copy link

Is this fixed in the latest release? Because i currently face a similarly problem. Here is my environment. On server A i use filebeat with file output and rotation. Then i transfer these files to server B (logstash server) with sftp every 10 minutes. The inodes of the transferred files remain the same. Now the problem is that every time the new files were transferred to logstash server and logstash read them, it seems that it always reads all files again and put them into elasticsearch. Lets say i have 1000 entries in filebeat, filebeat.1, filebeat.2, .... Then on first read i have 1000 entries in elasticsearch. On second read it's 1800, 2100, 2600, ... even though the number of entries in the logfiles only increase by 10 or 20.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants