
v1.2-debian-cloudwatch crashes where v0.12 works (k8s 1.8.13 CoreOS 1745.5.0) #150

Open
whereisaaron opened this issue Jun 15, 2018 · 1 comment
Labels
help wanted We need your help!

Comments

@whereisaaron

Deploying for Kubernetes 1.8.13 on CoreOS 1745.5.0 using fluent/fluentd-kubernetes-daemonset

Deploying with v0.12-debian-cloudwatch works great, as in the past; however, after switching to v1.2-debian-cloudwatch, every Pod on every node crashes after ~1 minute of run time. Occasionally they get as far as creating a log stream and even logging some entries first, but they always crash. They keep getting restarted, but they just crash again. They also stay in step with each other, so after a while they all have exactly e.g. 12 crashes, which suggests they each run for the same amount of time before crashing.

Everything else about the config remains unchanged. I wondered if Debian needed more memory, so I removed that limit, but on every node in the cluster the container would still run for maybe a minute and then crash.

2018-06-15 21:38:39 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2018-06-15 21:38:46 +0000 [info]: using configuration file: <ROOT>
  <match fluent.**>
    @type null
  </match>
  <source>
    @type tail
    path "/var/log/containers/*.log"
    pos_file "/var/log/fluentd-containers.log.pos"
    time_format %Y-%m-%dT%H:%M:%S.%NZ
    tag "kubernetes.*"
    format json
    read_from_head true
    <parse>
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      @type json
      time_type string
    </parse>
  </source>
  <filter kubernetes.**>
    @type kubernetes_metadata
  </filter>
  <filter kubernetes.**>
    @type record_transformer
    enable_ruby true
    <record>
      kubehost ${record.fetch("kubernetes", Hash.new).fetch("host", "unknown_host")}
    </record>
  </filter>
  <match kubernetes.**>
    @type cloudwatch_logs
    log_group_name "anthill-cluster-containers"
    log_stream_name_key "kubehost"
    remove_log_group_name_key true
    auto_create_stream true
    put_log_events_retry_limit 20
  </match>
</ROOT>
2018-06-15 21:38:46 +0000 [info]: starting fluentd-1.2.2 pid=5 ruby="2.3.3"
2018-06-15 21:38:46 +0000 [info]: spawn command to main:  cmdline=["/usr/bin/ruby2.3", "-Eascii-8bit:ascii-8bit", "/fluentd/vendor/bundle/ruby/2.3.0/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "-p", "/fluentd/plugins", "--gemfile", "/fluentd/Gemfile", "--under-supervisor"]
2018-06-15 21:38:50 +0000 [info]: gem 'fluent-plugin-cloudwatch-logs' version '0.5.0'
2018-06-15 21:38:50 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '2.1.2'
2018-06-15 21:38:50 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.1'
2018-06-15 21:38:50 +0000 [info]: gem 'fluentd' version '1.2.2'
2018-06-15 21:38:50 +0000 [info]: adding match pattern="fluent.**" type="null"
2018-06-15 21:38:50 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2018-06-15 21:38:54 +0000 [info]: adding filter pattern="kubernetes.**" type="record_transformer"
2018-06-15 21:38:54 +0000 [info]: adding match pattern="kubernetes.**" type="cloudwatch_logs"
2018-06-15 21:38:57 +0000 [info]: adding source type="tail"
2018-06-15 21:38:57 +0000 [info]: #0 starting fluentd worker pid=16 ppid=5 worker=0
2018-06-15 21:38:57 +0000 [info]: #0 following tail of /var/log/containers/kube-prometheus-exporter-node-fwnkt_prometheus_node-exporter-1412af047f962327fb4e3f7949fac5028ae156606e68d064240a78d37fd8af65.log
2018-06-15 21:38:57 +0000 [info]: #0 following tail of /var/log/containers/kube-node-drainer-ds-bghgj_kube-system_main-7a733ef08fe677ea9c3998026c6e3149b30ffbf031c9ddfba8450dcb9ce8dae6.log
2018-06-15 21:38:57 +0000 [info]: #0 disable filter chain optimization because [Fluent::Plugin::KubernetesMetadataFilter, Fluent::Plugin::RecordTransformerFilter] uses `#filter_stream` method.

My config:

  <match fluent.**>
    @type null
  </match>

  <source>
    @type tail
    path /var/log/containers/*.log
    pos_file /var/log/fluentd-containers.log.pos
    time_format %Y-%m-%dT%H:%M:%S.%NZ
    tag kubernetes.*
    format json
    read_from_head true
  </source>

  <filter kubernetes.**>
    @type kubernetes_metadata
  </filter>

  <filter kubernetes.**>
    @type record_transformer
    enable_ruby true
    <record>
      kubehost ${record.fetch("kubernetes", Hash.new).fetch("host", "unknown_host")}
    </record>
  </filter>

  <match kubernetes.**>
    @type cloudwatch_logs
    log_group_name "#{ENV['LOG_GROUP_NAME']}"
    log_stream_name_key kubehost
    remove_log_group_name_key true
    auto_create_stream true
    put_log_events_retry_limit 20
  </match>
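For anyone reading the record_transformer block above: with `enable_ruby true`, the `${...}` expression is evaluated as plain Ruby against each record. A minimal sketch of how the `kubehost` fallback resolves (the sample records here are hypothetical, not from my cluster):

```ruby
# Hypothetical records illustrating the fetch-with-default chain from the
# record_transformer filter above.
with_meta    = { "kubernetes" => { "host" => "ip-10-0-1-23" } }
without_meta = {}  # e.g. the kubernetes_metadata filter attached no metadata

# Same expression as in the <record> section: outer fetch falls back to an
# empty Hash, inner fetch falls back to the "unknown_host" placeholder.
kubehost = ->(record) {
  record.fetch("kubernetes", Hash.new).fetch("host", "unknown_host")
}

puts kubehost.call(with_meta)     # => "ip-10-0-1-23"
puts kubehost.call(without_meta)  # => "unknown_host"
```

So even when a record carries no kubernetes metadata, `kubehost` is always set, which `log_stream_name_key kubehost` relies on.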
@repeatedly
Member

Does anyone have a good idea about this issue?
Vanilla fluentd v1.2 doesn't have this problem, so we want to know what the cause is.
