Missing records from project logs #6

Open
lukas-vlcek opened this issue Jun 21, 2016 · 9 comments

@lukas-vlcek (Member) commented Jun 21, 2016

Summary

Missing log messages in Elasticsearch indices when running the openshift-test.sh script.

Details

Note: the fix from PR #5 needs to be applied/merged first.

Assume the following ENV variables:

export USE_FLUENTD=true
export NMESSAGES=110
export NPROJECTS=10

When openshift-test.sh is executed, it fails (times out) while verifying the expected records in the index for the last project (i.e. project-09). Specifically:

error: 110 messages not found in this-is-project-09

Further investigation reveals that this index is missing some records (note that the index this-is-project-09.1.2016.06.21 contains only 88 documents instead of 110):

# curl http://10.40.2.198:9200/_cat/indices?v
health status index                           pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   this-is-project-07.1.2016.06.21   5   1        110            0     52.6kb         52.6kb 
yellow open   this-is-project-09.1.2016.06.21   5   1         88            0     42.6kb         42.6kb 
yellow open   .operations.2016.06.21            5   1        110            0     37.9kb         37.9kb 
yellow open   this-is-project-10.1.2016.06.21   5   1        110            0     53.2kb         53.2kb 
yellow open   this-is-project-06.1.2016.06.21   5   1        110            0     52.9kb         52.9kb 
yellow open   this-is-project-08.1.2016.06.21   5   1        110            0     53.1kb         53.1kb 
yellow open   this-is-project-04.1.2016.06.21   5   1        110            0     53.1kb         53.1kb 
yellow open   this-is-project-03.1.2016.06.21   5   1        110            0     52.6kb         52.6kb 
yellow open   this-is-project-02.1.2016.06.21   5   1        110            0     52.6kb         52.6kb 
yellow open   this-is-project-01.1.2016.06.21   5   1        110            0     53.1kb         53.1kb 
yellow open   this-is-project-05.1.2016.06.21   5   1        110            0     52.6kb         52.6kb 
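The low count can also be cross-checked directly against the suspect index with the count API, e.g. (same host as above):

# query the document count of the suspect index directly
curl 'http://10.40.2.198:9200/this-is-project-09.1.2016.06.21/_count?pretty'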

Note the sizes of the source log files in the /tmp/tmp.pKiGhuIsLw/data/docker folder below; they are all equal, which is expected and means they all contain the same number of log messages:

+ ls -R -alrtF /tmp/tmp.pKiGhuIsLw
/tmp/tmp.pKiGhuIsLw:
total 240
drwx------.  4 root root    100 Jun 21 13:47 ./
-rw-r--r--.  1 root root 243210 Jun 21 13:47 orig
drwxr-xr-x.  2 root root     80 Jun 21 13:47 config/
drwxr-xr-x.  3 root root    120 Jun 21 13:47 data/
drwxrwxrwt. 13 root root    280 Jun 21 13:50 ../

/tmp/tmp.pKiGhuIsLw/config:
total 12
drwx------. 4 root root  100 Jun 21 13:47 ../
-rw-r--r--. 1 root root  583 Jun 21 13:47 syslog-input.conf
-rw-r--r--. 1 root root 4582 Jun 21 13:47 fluent.conf
drwxr-xr-x. 2 root root   80 Jun 21 13:47 ./

/tmp/tmp.pKiGhuIsLw/data:
total 40
drwx------. 4 root root   100 Jun 21 13:47 ../
drwxr-xr-x. 2 root root   240 Jun 21 13:47 docker/
-rw-r--r--. 1 root root 30030 Jun 21 13:47 messages
-rw-r--r--. 1 root root    52 Jun 21 13:47 system.pos
drwxr-xr-x. 3 root root   120 Jun 21 13:47 ./
-rw-r--r--. 1 root root  1740 Jun 21 13:47 es-containers.log.pos

/tmp/tmp.pKiGhuIsLw/data/docker:
total 320
drwxr-xr-x. 2 root root   240 Jun 21 13:47 ./
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-01_this-is-project-01_this-is-container-01-4355a46b19d348dc2f57c046f8ef63d4538ebb936000f3c9ee954a27460dd865.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-02_this-is-project-02_this-is-container-02-53c234e5e8472b6ac51c1ae1cab3fe06fad053beb8ebfd8977b010655bfdd3c3.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-03_this-is-project-03_this-is-container-03-1121cfccd5913f0a63fec40a6ffd44ea64f9dc135c66634ba001d10bcf4302a2.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-04_this-is-project-04_this-is-container-04-7de1555df0c2700329e815b93b32c571c3ea54dc967b89e81ab73b9972b72d1d.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-05_this-is-project-05_this-is-container-05-f0b5c2c2211c8d67ed15e75e656c7862d086e9245420892a7de62cd9ec582a06.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-06_this-is-project-06_this-is-container-06-06e9d52c1720fca412803e3b07c4b228ff113e303f4c7ab94665319d832bbfb7.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-07_this-is-project-07_this-is-container-07-10159baf262b43a92d95db59dae1f72c645127301661e0a3ce4e38b295a97c58.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-08_this-is-project-08_this-is-container-08-aa67a169b0bba217aa0aa88a65346920c84c42447c36ba5f7ea65f422c1fe5d8.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-09_this-is-project-09_this-is-container-09-2e6d31a5983a91251bfae5aefa1c0a19d8ba3cf601d0e8a706b4cfa9661a6b8a.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-10_this-is-project-10_this-is-container-10-917df3320d778ddbaa5c5c7742bc4046bf803c36ed2b050f30844ed206783469.log
drwxr-xr-x. 3 root root   120 Jun 21 13:47 ../

Further, we can see that there was a possible issue pushing the data to Elasticsearch (there was 1 rejected bulk request, see bulk.rejected):

# curl http://10.40.2.198:9200/_cat/thread_pool?v
host         ip         bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected 
15dfd31fb381 172.17.0.2           0          0             1            0           0              0             0            0               0 

However, the Elasticsearch log does not show any errors. We can see that after the cluster starts, the expected indices are created and the mappings are updated as documents are indexed. That is all:

# docker logs --tail=2000 15dfd31fb381
CLUSTER_NAME not set - using viaq
+ set -e
+ PORT=9200
+ HOST=localhost
+ LOG_FILE=elasticsearch_connect_log.txt
+ RETRY_COUNT=30
+ RETRY_INTERVAL=1
+ retry=30
+ max_time=30
+ timeouted=false
+ '[' -z '' ']'
+ echo CLUSTER_NAME not set - using viaq
+ export CLUSTER_NAME=viaq
+ CLUSTER_NAME=viaq
+ mkdir -p /elasticsearch/viaq
+ '[' -n '' ']'
+ regex='^([[:digit:]]+)([GgMm])$'
+ [[ 8G =~ ^([[:digit:]]+)([GgMm])$ ]]
+ num=8
+ unit=G
+ [[ G =~ [Gg] ]]
+ (( num = num * 1024 ))
+ [[ 8192 -lt 512 ]]
+ ES_JAVA_OPTS=' -Xms256M -Xmx4096m'
+ /usr/share/elasticsearch/bin/elasticsearch
+ add_index_template
+ wait_for_port_open
+ rm -f elasticsearch_connect_log.txt
+ echo -n 'Checking if Elasticsearch is ready on localhost:9200 '
+ curl -s --max-time 30 -o elasticsearch_connect_log.txt localhost:9200
+ '[' false == false ']'
+ echo -n .
+ sleep 1
Checking if Elasticsearch is ready on localhost:9200 .[2016-06-21 11:47:32,270][INFO ][node                     ] [Mister Sensitive] version[1.5.2], pid[11], build[62ff986/2015-04-27T09:21:06Z]
[2016-06-21 11:47:32,271][INFO ][node                     ] [Mister Sensitive] initializing ...
[2016-06-21 11:47:32,275][INFO ][plugins                  ] [Mister Sensitive] loaded [], sites []
+ ((  retry -= 1  ))
+ ((  retry == 0  ))
+ curl -s --max-time 30 -o elasticsearch_connect_log.txt localhost:9200
+ '[' false == false ']'
+ echo -n .
+ sleep 1
+ ((  retry -= 1  ))
+ ((  retry == 0  ))
+ curl -s --max-time 30 -o elasticsearch_connect_log.txt localhost:9200
+ '[' false == false ']'
+ echo -n .
+ sleep 1
..[2016-06-21 11:47:34,351][INFO ][node                     ] [Mister Sensitive] initialized
[2016-06-21 11:47:34,353][INFO ][node                     ] [Mister Sensitive] starting ...
[2016-06-21 11:47:34,487][INFO ][transport                ] [Mister Sensitive] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/172.17.0.2:9300]}
[2016-06-21 11:47:34,499][INFO ][discovery                ] [Mister Sensitive] elasticsearch/ABy2trHPR2-4ve1jp0SiEQ
+ ((  retry -= 1  ))
+ ((  retry == 0  ))
+ curl -s --max-time 30 -o elasticsearch_connect_log.txt localhost:9200
+ '[' false == false ']'
+ echo -n .
+ sleep 1
+ ((  retry -= 1  ))
+ ((  retry == 0  ))
+ curl -s --max-time 30 -o elasticsearch_connect_log.txt localhost:9200
+ '[' false == false ']'
+ echo -n .
+ sleep 1
+ ((  retry -= 1  ))
+ ((  retry == 0  ))
+ curl -s --max-time 30 -o elasticsearch_connect_log.txt localhost:9200
+ '[' false == false ']'
+ echo -n .
+ sleep 1
...[2016-06-21 11:47:37,520][INFO ][cluster.service          ] [Mister Sensitive] new_master [Mister Sensitive][ABy2trHPR2-4ve1jp0SiEQ][15dfd31fb381][inet[/172.17.0.2:9300]], reason: zen-disco-join (elected_as_master)
[2016-06-21 11:47:37,545][INFO ][http                     ] [Mister Sensitive] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.17.0.2:9200]}
[2016-06-21 11:47:37,545][INFO ][node                     ] [Mister Sensitive] started
[2016-06-21 11:47:37,595][INFO ][gateway                  ] [Mister Sensitive] recovered [0] indices into cluster_state
+ ((  retry -= 1  ))
+ ((  retry == 0  ))
+ curl -s --max-time 30 -o elasticsearch_connect_log.txt localhost:9200
+ '[' -f elasticsearch_connect_log.txt ']'
+ grep -q 200 elasticsearch_connect_log.txt
+ echo '- connection successful'
- connection successful
+ curl -v -X GET 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* About to connect() to localhost port 9200 (#0)
*   Trying ::1...
* Connected to localhost (::1) port 9200 (#0)
> GET /_cluster/health?wait_for_status=yellow&timeout=30s HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:9200
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Content-Length: 252
< 
{ [data not shown]
100   252  100   252    0     0  15247      0 --:--:-- --:--:-- --:--:-- 14823
* Connection #0 to host localhost left intact
+ curl -v -X PUT -d@/usr/share/elasticsearch/config/com.redhat.viaq-template.json http://localhost:9200/_template/viaq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* About to connect() to localhost port 9200 (#0)
*   Trying ::1...
* Connected to localhost (::1) port 9200 (#0)
> PUT /_template/viaq HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:9200
> Accept: */*
> Content-Length: 7320
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
} [data not shown]
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Content-Length: 21
< 
{ [data not shown]
100  7341  100    21  100  7320    246  86075 --:--:-- --:--:-- --:--:-- 87142
* Connection #0 to host localhost left intact
{"cluster_name":"elasticsearch","status":"green","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":0,"active_shards":0,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"number_of_pending_tasks":0}{"acknowledged":true}[2016-06-21 11:47:49,686][INFO ][cluster.metadata         ] [Mister Sensitive] [.operations.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:49,881][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-02.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:49,963][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-07.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:50,030][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-01.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:50,088][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-10.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:50,134][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-05.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:50,184][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-06.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:50,243][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-03.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:50,293][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-08.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:50,374][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-09.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:50,432][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-04.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:53,609][INFO ][cluster.metadata         ] [Mister Sensitive] [.operations.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:53,695][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-05.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:53,789][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-01.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:53,868][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-06.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:53,933][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-02.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:53,981][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-07.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:54,035][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-08.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:54,084][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-03.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:54,146][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-10.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:54,148][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-04.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:54,252][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-09.1.2016.06.21] update_mapping [fluentd] (dynamic)
@lukas-vlcek (Member Author)

The situation seems to get much worse as the number of files Fluentd needs to tail increases.

For example, if we use the following ENV variables for the test:

export NMESSAGES=20
export NPROJECTS=99

(^^ this means Fluentd will be tailing 100 log files each containing only 20 records)

We end up with the following index state:

# curl http://10.40.2.198:9200/_cat/indices?v
health status index                           pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   this-is-project-84.1.2016.06.21   5   1          4            0      6.9kb          6.9kb 
yellow open   this-is-project-93.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-02.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-95.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-29.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-27.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-83.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-19.1.2016.06.21   5   1          3            0      6.7kb          6.7kb 
yellow open   this-is-project-97.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-20.1.2016.06.21   5   1          4            0      6.9kb          6.9kb 
yellow open   this-is-project-86.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-82.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-35.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-45.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-78.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-96.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-30.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-64.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-34.1.2016.06.21   5   1         20            0     32.6kb         32.6kb 
yellow open   this-is-project-36.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-17.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-04.1.2016.06.21   5   1         12            0     19.7kb         19.7kb 
yellow open   this-is-project-01.1.2016.06.21   5   1         13            0       20kb           20kb 
yellow open   this-is-project-03.1.2016.06.21   5   1          4            0      6.9kb          6.9kb 
yellow open   this-is-project-18.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-31.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-49.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-74.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-32.1.2016.06.21   5   1          8            0     13.3kb         13.3kb 
yellow open   this-is-project-81.1.2016.06.21   5   1         20            0     32.5kb         32.5kb 
yellow open   this-is-project-89.1.2016.06.21   5   1          3            0      6.7kb          6.7kb 
yellow open   this-is-project-55.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-14.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-54.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-24.1.2016.06.21   5   1         20            0     32.5kb         32.5kb 
yellow open   this-is-project-63.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-13.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-58.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-47.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-79.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-50.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-65.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-91.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-11.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-40.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-61.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-08.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-06.1.2016.06.21   5   1         20            0     32.5kb         32.5kb 
yellow open   this-is-project-57.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-51.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-37.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-66.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-69.1.2016.06.21   5   1          4            0      6.9kb          6.9kb 
yellow open   this-is-project-12.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-41.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-77.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-33.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-15.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-09.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-23.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-44.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-72.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-21.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-07.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-25.1.2016.06.21   5   1         20            0     32.5kb         32.5kb 
yellow open   this-is-project-48.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-98.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-68.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-43.1.2016.06.21   5   1         20            0     32.5kb         32.5kb 
yellow open   this-is-project-22.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-85.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-76.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-94.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-59.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-70.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-05.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-38.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-90.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-60.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   .operations.2016.06.21            5   1         20            0     23.8kb         23.8kb 
yellow open   this-is-project-28.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-16.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-88.1.2016.06.21   5   1          5            0      7.1kb          7.1kb 
yellow open   this-is-project-75.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-53.1.2016.06.21   5   1         20            0     32.5kb         32.5kb 
yellow open   this-is-project-73.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-71.1.2016.06.21   5   1         20            0     32.6kb         32.6kb 
yellow open   this-is-project-92.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-42.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-67.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-52.1.2016.06.21   5   1         16            0     26.1kb         26.1kb 
yellow open   this-is-project-26.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-46.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-80.1.2016.06.21   5   1          9            0     13.6kb         13.6kb 
yellow open   this-is-project-39.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-99.1.2016.06.21   5   1          4            0      6.9kb          6.9kb 
yellow open   this-is-project-56.1.2016.06.21   5   1          4            0      6.9kb          6.9kb 
yellow open   this-is-project-87.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-62.1.2016.06.21   5   1         11            0     19.5kb         19.5kb 
yellow open   this-is-project-10.1.2016.06.21   5   1          0            0       575b           575b 

Only a few indices have any documents, and just a few have the expected 20 documents.

We are getting 429 (!) rejected bulk requests (bulk.rejected):

# curl http://10.40.2.198:9200/_cat/thread_pool?v
host         ip         bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected 
c681d5122401 172.17.0.2           0          0           429            0           0              0             0            0               0          

Just for the record, if we investigate the Fluentd plugin status (via the REST API, which is the subject of PR #4), we can see that the elasticsearch output does not hold any data in its buffers:

      {  
         "plugin_id":"object:1099830",
         "plugin_category":"output",
         "type":"elasticsearch_dynamic",
         "config":{  
            "@type":"elasticsearch_dynamic",
            "host":"viaq-elasticsearch",
            "port":"9200",
            "scheme":"http",
            "index_name":"${record['kubernetes_namespace_name']}.${record['kubernetes_namespace_id']}.${Time.at(time).getutc.strftime(@logstash_dateformat)}",
            "client_key":"",
            "client_cert":"",
            "ca_file":"",
            "flush_interval":"5s",
            "max_retry_wait":"300",
            "disable_retry_limit":""
         },
         "output_plugin":true,
         "buffer_queue_length":0,
         "buffer_total_queued_size":0,
         "retry_count":0
      }
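For reference, this plugin status can be pulled from Fluentd's in_monitor_agent endpoint; a sketch, assuming the agent is enabled on its default port 24220:

# query the running Fluentd for per-plugin buffer/retry counters
curl -s http://localhost:24220/api/plugins.json | python -m json.tool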

@lukas-vlcek (Member Author)

We might be suffering from the default thread pool queue size settings: https://www.elastic.co/guide/en/elasticsearch/reference/1.5/modules-threadpool.html
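If that is the case, enlarging the bulk queue should make the rejections disappear; a minimal sketch, assuming the ES 1.x thread pool settings can still be updated dynamically via the cluster settings API (500 is just a test value):

# temporarily raise the bulk thread pool queue size on the test node
curl -X PUT 'http://10.40.2.198:9200/_cluster/settings' -d '{
  "transient" : { "threadpool.bulk.queue_size" : 500 }
}'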

@lukas-vlcek (Member Author) commented Jun 21, 2016

We are running into the same issue when using rsyslog instead of Fluentd.

For the case:

export NMESSAGES=20
export NPROJECTS=99

We are getting bulk.rejected = 264

@lukas-vlcek (Member Author)

Actually, as explained in the official guide, if a bulk request fails due to rejection it is not considered an error on the ES side, but it should be a signal to the client:

"Rejections are not errors: they just mean you should try again later."

Which means we need to check that all clients (fluentd and rsyslog ATM) handle the relevant HTTP response codes correctly. See here, it could be a good starting point.
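One way to see what the clients actually get back is to inspect a bulk response directly; even when the overall HTTP status is 200, rejected items are reported per item. A sketch (bulk-payload.json is a placeholder for a captured bulk body):

# count rejected items inside a bulk response; the request itself usually still returns HTTP 200
curl -s -X POST 'http://10.40.2.198:9200/_bulk' --data-binary @bulk-payload.json \
  | grep -o 'EsRejectedExecutionException' | wc -l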

@lukas-vlcek (Member Author)

In the case of rsyslog and omelasticsearch we need to configure an errorfile (which is optional ATM). rsyslog should write all errored requests to this file (which should also cover rejected bulk requests, but I am not sure whether this specific use case is tested in the rsyslog code!). My understanding is that there is no mechanism for resubmitting data from this error file; it is the user's responsibility to set up some process to periodically check the file and investigate/decide what to do.

See http://www.rsyslog.com/doc/v8-stable/configuration/modules/omelasticsearch.html and rsyslog/rsyslog#104. Also check rsyslog/rsyslog#246 for possible improvements/changes in newer versions of rsyslog (i.e. depending on the rsyslog version we use, we should be able to get better support for resubmitting errored requests).
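A minimal sketch of what enabling the error file could look like (parameter names are from the omelasticsearch doc above; the config path and values are placeholders):

# append an omelasticsearch action that records failed/rejected requests to an error file
cat >> /etc/rsyslog.d/viaq-es.conf <<'EOF'
action(type="omelasticsearch"
       server="viaq-elasticsearch"
       serverport="9200"
       bulkmode="on"
       errorfile="/var/log/rsyslog-es-errors.json")
EOF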

@lukas-vlcek (Member Author) commented Jun 24, 2016

Note that starting with ES 2.2 there should be a configurable out-of-the-box bulk error retry mechanism that can self-heal from some of the issues we see today; see elastic/elasticsearch#14620 and elastic/elasticsearch#14829 (see also the "Java API" section of https://www.elastic.co/guide/en/elasticsearch/reference/2.3/release-notes-2.2.0.html#enhancement-2.2.0).
However, this seems to be an improvement only for short peaks; we still need to watch for cases where ES cannot keep up with high bulk indexing traffic.

@lukas-vlcek (Member Author)

In the case of the fluentd elasticsearch plugin, it seems that any sent data associated with an HTTP response code other than 200 is discarded, see uken/fluent-plugin-elasticsearch#105

@cosmo0920

Is this discarding issue still present in fluent-plugin-elasticsearch v1.11.1 and v2.1.1?

@richm (Member) commented Nov 21, 2017

Is this discarding issue still present in fluent-plugin-elasticsearch v1.11.1 and v2.1.1?

Probably not - we haven't tested with that
