Need Support for Dynamic CSV Headers in filelog Receiver. #36415

Open
VenuEmmadi opened this issue Nov 18, 2024 · 11 comments
Labels: discussion needed, enhancement, receiver/filelog

Comments

@VenuEmmadi
Contributor

Component(s)

receiver/filelog

What happened?

Description

I am using the filelog receiver in the OpenTelemetry Collector Contrib to parse CSV log files. When parsing a single file with a predefined header, the configuration works as expected. However, when attempting to process multiple CSV files with different headers, there is no way to dynamically handle varying headers.

If the header is omitted, the configuration fails with an error. This limitation makes it impossible to manage directories containing multiple CSV files with different structures efficiently.

Steps to Reproduce

  1. Configure the filelog receiver to parse a single CSV file with a specified header:

    receivers:
      filelog/LightningInteractionLogs_quoted:
        include: [/u01/SFLogs/8292024/continuationcallout_hundred.csv]
        start_at: beginning
        operators:
          - type: csv_parser
            header: ApplicationName, page_app_name, Application_Version, Environment, HostName, EventType, timestamp, user_id, user_name, url, duration, request_form_size, response_size, status_code, success, TimestampDerived

  2. Attempt to configure the receiver to include multiple CSV files with varying headers:

    receivers:
      filelog/LightningInteractionLogs_multiple:
        include: [/u01/SFLogs/*.csv]
        start_at: beginning
        operators:
          - type: csv_parser
            # No way to handle multiple headers dynamically

  3. Observe the failure when the header is not explicitly provided:

    Error: failed to build pipelines: failed to create "filelog/LightningInteractionLogs_multiple" receiver for data type "logs"; missing required field "header" or "header_attribute"

Expected Result

The csv_parser operator should be able to:

Dynamically detect headers from the first row of the CSV file (e.g., via a dynamic_header option).
Alternatively, allow mapping specific headers to specific files or file patterns using a header_attribute or similar configuration.

For example:
receivers:
  filelog/LightningInteractionLogs_dynamic:
    include: [/u01/SFLogs/*.csv]
    start_at: beginning
    operators:
      - type: csv_parser
        dynamic_header: true
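
For comparison, the closest thing that seems possible today is one receiver per header layout, each with a static header. A minimal sketch, assuming the files can be told apart by name; the receiver names, the `layout_a_*` and `layout_b_*` patterns, and the column names are placeholders, not real files:

```yaml
receivers:
  # Hypothetical: files that share the first header layout
  filelog/layout_a:
    include: [/u01/SFLogs/layout_a_*.csv]
    start_at: beginning
    operators:
      - type: csv_parser
        header: headerA,headerB,headerC
  # Hypothetical: files that share the second header layout
  filelog/layout_b:
    include: [/u01/SFLogs/layout_b_*.csv]
    start_at: beginning
    operators:
      - type: csv_parser
        header: headerX,headerY,headerZ
```

This does not scale to directories where new layouts keep appearing, and as far as I can tell the header row of each file would still be parsed as an ordinary record unless it is filtered out.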

Actual Result

The configuration fails when header is not explicitly provided, making it impossible to process multiple CSV files with different headers in the same receiver configuration.

Error message:
Error: failed to build pipelines: failed to create "filelog/LightningInteractionLogs_multiple" receiver for data type "logs"; missing required field "header" or "header_attribute"

Collector version

v0.109.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

receivers:
  filelog/LightningInteractionLogs_multiple:
    include: [/u01/SFLogs/*.csv]
    start_at: beginning
    operators:
      - type: csv_parser
exporters:
  logging:
    loglevel: debug

service:
  pipelines:
    logs:
      receivers: [filelog/LightningInteractionLogs_multiple]
      exporters: [logging]

Log output

Error: failed to build pipelines: failed to create "filelog/LightningInteractionLogs_multiple" receiver for data type "logs"; missing required field "header" or "header_attribute"

Additional context

No response

@VenuEmmadi added the bug and needs triage labels Nov 18, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@VihasMakwana
Contributor

To me, this sounds like a valid enhancement request, but I'm unsure how to accomplish it.
Maybe the code owners have thoughts on this?

@VihasMakwana added the discussion needed and enhancement labels and removed the needs triage and bug labels Nov 20, 2024
@VihasMakwana
Contributor

This is an enhancement, not a bug. Please let me know if you disagree.

@VenuEmmadi
Contributor Author

This is an enhancement, not a bug. Please let me know if you disagree

I’m not entirely sure whether this qualifies as an enhancement or a bug. At the very least, the receiver should not accept wildcard patterns for file names when csv_parser is used without a way to handle different headers. Please share your thoughts.

@VihasMakwana
Contributor

I see what you mean, but I'm not very supportive of that idea. It seems like a strange restriction to me.

I'll explore the codebase and see what we can do here. In the meantime, I'll ask @djaglowski to share his thoughts on this issue.

@djaglowski
Member

missing required field "header" or "header_attribute"

Have you tried using header_attribute?

@VihasMakwana
Contributor

@djaglowski I think the user is looking for the following use case:

Consider two csv files, file1.csv and file2.csv:

file1.csv:

headerA, headerB, headerC
...,
...

file2.csv:

headerX, headerY, headerZ
...,
...

With the following config, the user may want the parser to detect the headers of each CSV file automatically.

receivers:
  filelog/LightningInteractionLogs_multiple:
    include: [file*.csv]
    start_at: beginning
    operators:
      - type: csv_parser
exporters:
  logging:
    loglevel: debug

service:
  pipelines:
    logs:
      receivers: [filelog/LightningInteractionLogs_multiple]
      exporters: [logging]

For all logs emitted from file1.csv, the parser would automatically use headerA, headerB, headerC as the headers, and for file2.csv it would use headerX, headerY, headerZ.

@VihasMakwana
Contributor

VihasMakwana commented Dec 2, 2024

Dan, is it possible to use header_attribute for multiple files with different headers under the same stanza? I'm not sure 😅

EDIT: closed by mistake.

@VihasMakwana
Contributor

Found an older related issue: #10275

@djaglowski
Member

Thanks for clarifying @VihasMakwana.

I think it might be possible to accomplish this using the filelog receiver's header settings. The main problem is that I'm not sure how you'd write the header.pattern so that it matches the header row but not the lines that follow. Maybe we need to add another option, such as header.lines: 1, which could be used instead of header.pattern.

The general idea would be that you configure the header section to move the header value into a known attribute, then use the csv parser's header_attribute.
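
As an untested sketch of that idea: the config below assumes the `filelog.allowHeaderMetadataParsing` feature gate is enabled and, as a stand-in for the missing `header.lines` option, that every header row (and no data row) starts with the literal text `header`, as in the file1.csv and file2.csv example. The receiver name and the `csv_header` attribute name are made up for illustration.

```yaml
receivers:
  filelog/csv_dynamic:            # hypothetical receiver name
    include: [file*.csv]
    start_at: beginning           # header parsing requires reading from the start
    header:
      # Assumption: only the header row starts with "header"
      pattern: '^header'
      metadata_operators:
        # Capture the whole header row into an attribute that is then
        # attached to every log entry read from the same file
        - type: regex_parser
          regex: '^(?P<csv_header>.*)$'
    operators:
      - type: csv_parser
        # Read the column names from the attribute set above
        header_attribute: csv_header
```

The collector would need to be started with `--feature-gates=filelog.allowHeaderMetadataParsing`. For real files whose header row has no distinguishing prefix, the pattern approach breaks down, which is the gap a `header.lines`-style option would fill.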

@VihasMakwana
Contributor

I can take this up.
