Missing records from project logs #6
The situation seems to get much worse as the number of files Fluentd needs to tail increases. For example, if we use the following ENV variables for the test:
(i.e. Fluentd will be tailing 100 log files, each containing only 20 records), we end up with the following state of the indices:
Only a few indices have any documents, and just a few of those have the expected 20 documents. We are getting 429 (!)
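One way to spot incomplete indices quickly is to compare per-index document counts against the expected number. The following is a minimal Python sketch, assuming the counts were fetched as plain text with `_cat/indices?h=index,docs.count` (one index name and count per line, no header). The function name and the index names in the usage line are hypothetical. Indices that were never created do not show up in `_cat` output at all, so the optional `expected_indices` list reports them with a count of 0.

```python
from typing import Dict, Iterable


def find_incomplete_indices(cat_output: str, expected_docs: int,
                            expected_indices: Iterable[str] = ()) -> Dict[str, int]:
    """Return {index: docs_count} for every index whose count differs from expected_docs.

    Assumes cat_output is the body of GET _cat/indices?h=index,docs.count
    (two whitespace-separated columns, no header row).
    """
    counts: Dict[str, int] = {}
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) != 2 or not fields[1].isdigit():
            continue  # skip blank or unexpected lines
        counts[fields[0]] = int(fields[1])
    for name in expected_indices:
        counts.setdefault(name, 0)  # index never created: treat as empty
    return {name: n for name, n in sorted(counts.items()) if n != expected_docs}


# Hypothetical sample: project-c was never created.
sample = "project-a 20\nproject-b 7\n"
print(find_incomplete_indices(sample, 20, ["project-a", "project-b", "project-c"]))
# {'project-b': 7, 'project-c': 0}
```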
Just for the record, if we check the Fluentd plugin status (via the REST API, which is the subject of PR #4), we can see that the Elasticsearch output plugin does not hold any data in its buffers:
We might be suffering from the default thread pool queue sizes: https://www.elastic.co/guide/en/elasticsearch/reference/1.5/modules-threadpool.html
We are running into the same issue when using rsyslog instead of Fluentd. For this case:
We are getting:
Actually, as explained in the official guide, if a bulk request fails due to rejection, it is not considered an error on the ES side; instead, it should be treated as a signal to the client:
This means we need to check that all clients (Fluentd and rsyslog at the moment) handle the relevant HTTP response codes correctly. See here; this could be a good starting point.
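Because rejections are reported per item, a client cannot rely on the HTTP status of the bulk request alone: the bulk request as a whole can return 200 while individual items carry status 429. The Python sketch below shows the kind of check a client would need, assuming the bulk response shape of recent Elasticsearch versions (a top-level `errors` flag and one entry in `items` per action, in request order, keyed by the operation type such as `index` or `create`). The function name and the choice to retry only 429 are assumptions for illustration.

```python
import json
from typing import Any, List, Sequence, Tuple

RETRYABLE = {429}  # assumption: only "too many requests" rejections are retried


def triage_bulk_response(actions: Sequence[Any], response_body: str) -> Tuple[List[Any], List[tuple]]:
    """Split the actions of one bulk request into (retry, failed).

    retry:  actions rejected with a retryable status (resend them later)
    failed: (action, status, error) for other non-2xx items (log, do not resend)
    """
    body = json.loads(response_body)
    retry: List[Any] = []
    failed: List[tuple] = []
    if not body.get("errors"):
        return retry, failed  # every item succeeded
    for action, item in zip(actions, body["items"]):
        result = next(iter(item.values()))  # e.g. {"status": 429, "error": ...}
        status = result.get("status", 0)
        if status in RETRYABLE:
            retry.append(action)
        elif status >= 300:
            failed.append((action, status, result.get("error")))
    return retry, failed
```

A client that only looks at the top-level HTTP status would treat the whole batch as delivered and silently drop the rejected items.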
In the case of rsyslog and omelasticsearch, we need to configure … (see http://www.rsyslog.com/doc/v8-stable/configuration/modules/omelasticsearch.html and rsyslog/rsyslog#104). Also check rsyslog/rsyslog#246 for possible improvements/changes in newer versions of rsyslog (i.e. depending on the version of rsyslog we use, we should be able to get better support for resubmitting failed requests).
Note that starting with ES 2.2 there should be a configurable, out-of-the-box retry mechanism for bulk errors that can self-heal from some of the issues we see today; see elastic/elasticsearch#14620 and elastic/elasticsearch#14829 (see also the "Java API" entry in https://www.elastic.co/guide/en/elasticsearch/reference/2.3/release-notes-2.2.0.html#enhancement-2.2.0).
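The same idea can be applied on the client side: resend only the rejected items, wait exponentially longer between attempts, and give up after a fixed number of retries. The Python sketch below is a generic illustration of that pattern, not the Elasticsearch `BackoffPolicy` implementation. `send_bulk` is a hypothetical callable that submits a batch and returns the items that were rejected, and the default delay values are arbitrary.

```python
import time
from typing import Callable, List, Sequence


def exponential_delays(initial: float, retries: int, factor: float = 2.0) -> List[float]:
    """Delays in seconds before each retry: initial, initial*factor, initial*factor**2, ..."""
    return [initial * factor ** i for i in range(retries)]


def send_with_backoff(batch: Sequence, send_bulk: Callable[[Sequence], Sequence],
                      initial: float = 0.1, retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Sequence:
    """Send `batch`, resending rejected items with exponential backoff.

    Returns the items still rejected after all retries (empty on success),
    so the caller can decide whether to buffer, log, or drop them.
    """
    pending = send_bulk(batch)
    for delay in exponential_delays(initial, retries):
        if not pending:
            break
        sleep(delay)  # back off before resubmitting only the rejected items
        pending = send_bulk(pending)
    return pending
```

Returning the leftover items instead of discarding them keeps the decision explicit, which is exactly the step that appears to be missing in the clients discussed here.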
In the case of the Fluentd Elasticsearch plugin, it seems that any sent data associated with an HTTP response code other than … is discarded.
Is this discarding issue still present in fluent-plugin-elasticsearch v1.11.1 and v2.1.1?
Probably not; we haven't tested with those versions.
Summary
Missing log messages in Elasticsearch indices when running the openshift-test.sh script.
Details
Note: the fix from PR #5 needs to be applied/merged first.
Assume the following ENV variables:
When the openshift-test.sh test is executed, it fails (times out) while verifying the expected records in the index for the last project (i.e. project-09). Specifically:
Further investigation reveals that this index is missing some records (note that the index this-is-project-09.1.2016.06.21 contains only 88 documents instead of 110):
Note the sizes of the source log files in the /tmp/tmp.pKiGhuIsLw/data/docker folder below; they are all equal (which is expected and means they all contain the same number of log messages):
Further, we can see that there was a possible issue pushing the data to Elasticsearch (there was 1 rejected bulk request, see bulk.rejected):
However, the Elasticsearch log does not show any errors. We can see that after the cluster is started, the expected indices are created and the mapping is updated as documents are indexed. That is all: