Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fingerprint file identity by default and migrate all existing filestream inputs to it #40197

Closed
belimawr opened this issue Jul 8, 2024 · 10 comments · Fixed by #41762
Closed
Assignees
Labels
Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Comments

@belimawr
Copy link
Contributor

belimawr commented Jul 8, 2024

We started seeing cases where the filestream-monitoring will log an entry stating the Elastic-Agent log file was truncated:

{
  "log.level": "info",
  "@timestamp": "2024-07-05T15:42:27.135Z",
  "message": "File was truncated. Reading file from offset 0. Path=/var/lib/elastic-agent/data/elastic-agent-8.14.1-1348b9/logs/elastic-agent-20240705-25.ndjson",
  "component": {
    "binary": "filebeat",
    "dataset": "elastic_agent.filebeat",
    "id": "filestream-monitoring",
    "type": "filestream"
  },
  "log": {
    "source": "filestream-monitoring"
  },
  "service.name": "filebeat",
  "id": "filestream-monitoring-agent",
  "path": "/var/lib/elastic-agent/data/elastic-agent-8.14.1-1348b9/logs/elastic-agent-20240705-25.ndjson",
  "ecs.version": "1.6.0",
  "log.logger": "input.filestream",
  "log.origin": {
    "file.line": 300,
    "file.name": "filestream/input.go",
    "function": "github.com/elastic/beats/v7/filebeat/input/filestream.(*filestream).openFile"
  },
  "source_file": "filestream::filestream-monitoring-agent::native::524773-66308",
  "state-id": "native::524423-66428"
}

We know the Elastic-Agent does not truncate files, the only reason for Filebeat to detect a truncation is if inodes are re-used.

The best way to prevent inode reuse from affecting the monitoring logs is to use a file identity that won't be re-used, the best option for that is the fingerprint file identity.

We should either start using the fingerprint file identity or at least make it configurable so users facing this issue can work around the inode reuse.

One very important thing to bear in mind is that if we change the file identity, all existing files will be considered new and re-ingested.

@cmacknz
Copy link
Member

cmacknz commented Jul 9, 2024

Is there a reason we can't change the file identity starting from a specific point in the registry at a conceptual level?

Can the file identity used for a registry entry be included in the entry itself, so we can deal with a transition period? That is the configured file identity becomes used for new events, but we can still deal with old events?

This would let us switch to fingerprint everywhere without concern about data duplication.

CC @rdner

@belimawr
Copy link
Contributor Author

belimawr commented Jul 9, 2024

Is there a reason we can't change the file identity starting from a specific point in the registry at a conceptual level?

At a conceptual level, no there isn't a reason.

The file identity is part of the key we use in the registry, we have some abstractions to generate the key from a "source", in simple terms it concatenates: input type, input ID, file identity and the actual identifier for the file, e.g: filestream::filestream-monitoring-agent::native::524042-42000. The parts of the code responsible for generating registry key and fetching data from the registry are not really connected. When fetching an entry from the registry we need to know the file identity that was used to generate the key.

Because the input type and the input ID works as a sort of namespace in the registry, an input can fetch all of its entries, check whether the file identity is different, if it is we open the file, calculate the new file identity and save it in the registry as a new entry. This will even work for the autodiscover case where a input instance is created for each file.

After the migration we can proceed with a normal execution.

Can the file identity used for a registry entry be included in the entry itself, so we can deal with a transition period?

Not without re-writing the whole registry and interactions with it, but I believe the migration I mentioned above will work well.

@ycombinator
Copy link
Contributor

ycombinator commented Jul 10, 2024

Because the input type and the input ID works as a sort of namespace in the registry, an input can fetch all of its entries, check whether the file identity is different, if it is we open the file, calculate the new file identity and save it in the registry as a new entry. This will even work for the autodiscover case where a input instance is created for each file.

@belimawr Please correct me if I'm wrong but the above seems to be the key migration work that needs to be implemented in Beats to resolve this issue? If so, we can transfer this issue to the Beats repository.

@cmacknz cmacknz changed the title Make filestream-monitoring file identity configurable or fingerprint Migrate all filestream inputs to filestream by default Jul 11, 2024
@cmacknz cmacknz transferred this issue from elastic/elastic-agent Jul 11, 2024
@cmacknz cmacknz changed the title Migrate all filestream inputs to filestream by default Migrate all filestream inputs to the fingerprint file identity by default Jul 11, 2024
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jul 11, 2024
@cmacknz
Copy link
Member

cmacknz commented Jul 11, 2024

I transferred it, and also renamed the issue since I think the value of migrating everything to fingerprint by default is much higher than the original title specific to the filestream-monitoring instance.

@cmacknz cmacknz added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Jul 11, 2024
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jul 11, 2024
@belimawr
Copy link
Contributor Author

I transferred it, and also renamed the issue since I think the value of migrating everything to fingerprint by default is much higher than the original title specific to the filestream-monitoring instance.

What would be "all filestream inputs" here? All started by the Elastic-Agent? By integrations? Or it becomes more focused on implementing the option to switch file identity without losing the state?

@cmacknz
Copy link
Member

cmacknz commented Jul 11, 2024

It means all of them regardless of whether they run in Beats or Elastic Agent, unless there is a reason we think non-fingerprint file identities are superior in some cases.

This would perhaps be better titled, "Use fingerprint file identity by default and migrate all existing filestream inputs to it".

@cmacknz cmacknz changed the title Migrate all filestream inputs to the fingerprint file identity by default Use fingerprint file identity by default and migrate all existing filestream inputs to it Jul 11, 2024
@jlind23
Copy link
Collaborator

jlind23 commented Sep 16, 2024

@pierrehilbert I think you can plan this one as soon as possible in order to make it land in 9.0 and give it enough time to soak in test environment.

@jlind23
Copy link
Collaborator

jlind23 commented Nov 25, 2024

@belimawr looks like this is already under implementation, is there a PR we can link to this issue?

@belimawr
Copy link
Contributor Author

@belimawr looks like this is already under implementation, is there a PR we can link to this issue?

Yes, it's in draft because I just pushed code that works, but still needs work and tests. Honestly, I only pushed it to run the whole test suit and see where it breaks.

I'll update the title and add this issue in the related issue section today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants