Remove task action audit logging and druid_taskLog metadata table #16309

Merged: 16 commits, Jul 17, 2024

kfaraz (Contributor) commented Apr 18, 2024

Note

  • This PR does not pertain to the audit logging system that Druid uses to audit all major update actions, such as running a task, updating rules, updating dynamic configs, or creating a supervisor. That information is persisted in the druid_audit metadata table (if druid.audit.manager.type=sql) or simply logged.
  • Instead, it deals with the audit logging used only for task actions, i.e., the druid_taskLog metadata table.

Description

Task action audit logging was first deprecated and disabled by default in Druid 0.13 (#6368).

As called out in the original discussion #5859, there are several drawbacks to persisting task action audit logs.

  • The only use of the task audit logs is to serve the API /indexer/v1/task/{taskId}/segments,
    which returns the list of segments created by a task.
  • The use case is very narrow, and production clusters are not known to rely on this information.
  • There may be better ways of obtaining this information, such as the metric
    segment/added/bytes, which reports both the segment ID and task ID
    when a segment is committed by a task. We could also include committed segment IDs in task reports.
  • A task that persists several segments bloats the audit log table, putting unnecessary strain
    on metadata storage.
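
As a rough illustration of the metric-based alternative, the sketch below groups segment IDs by task ID from a stream of segment/added/bytes events. The SegmentAddedEvent class and its field names are hypothetical stand-ins invented for this example; they are not Druid's emitter schema, and a real consumer would read events from whatever metrics sink the cluster uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified view of one emitted segment/added/bytes event.
// The field names are assumptions for illustration, not Druid's emitter schema.
final class SegmentAddedEvent {
  final String taskId;
  final String segmentId;
  final long bytes;

  SegmentAddedEvent(String taskId, String segmentId, long bytes) {
    this.taskId = taskId;
    this.segmentId = segmentId;
    this.bytes = bytes;
  }
}

final class SegmentsByTaskSketch {
  // Groups segment IDs by the task that committed them, keeping event order.
  static Map<String, List<String>> segmentsByTask(List<SegmentAddedEvent> events) {
    Map<String, List<String>> byTask = new LinkedHashMap<>();
    for (SegmentAddedEvent event : events) {
      byTask.computeIfAbsent(event.taskId, id -> new ArrayList<>()).add(event.segmentId);
    }
    return byTask;
  }

  public static void main(String[] args) {
    List<SegmentAddedEvent> events = List.of(
        new SegmentAddedEvent("task_a", "segment_1", 1024L),
        new SegmentAddedEvent("task_b", "segment_2", 2048L),
        new SegmentAddedEvent("task_a", "segment_3", 512L));
    System.out.println(segmentsByTask(events));
  }
}
```

Unlike the druid_taskLog table, this keeps the lookup outside metadata storage, at the cost of depending on metric retention.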

Changes

  • Remove TaskAuditLogConfig
  • Remove method TaskAction.isAudited(). No task action is audited anymore.
  • Remove SegmentInsertAction as it is not used anymore. Its replacement, SegmentTransactionalInsertAction, has been in use for a while.
  • Deprecate MetadataStorageActionHandler.addLog() and getLogs(). These are not used anymore but need to be retained for backward compatibility of extensions.
  • Do not create druid_taskLog metadata table anymore.

Release notes

  • Task action audit logging was deprecated in Druid 0.13 and is being completely removed in this release.
  • The API /indexer/v1/task/{taskId}/segments is no longer supported and returns a 404 Not Found response.
  • Druid will not write to or read from the metadata table druid_taskLog anymore.
  • The property druid.indexer.auditlog.enabled will be ignored by Druid.
  • The metric task/action/log/time will not be emitted anymore.

Extension dev notes

The changes in this PR are backward compatible with all existing metadata storage extensions.
The methods addLog and getLogs of MetadataStorageActionHandler are now deprecated
and not used by the Druid code.
Any new metadata storage extension need not implement these methods.
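
One common way to keep deprecated interface methods compatible with existing implementations is to give them default bodies. The sketch below shows that general pattern only. ActionHandlerSketch, its method signatures, and both implementing classes are hypothetical; this is not the actual MetadataStorageActionHandler code, and the behavior of the real default implementations is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, trimmed-down stand-in for a metadata storage handler interface.
interface ActionHandlerSketch<LogType> {
  // Assumed signature; retained only so that older extensions still compile.
  @Deprecated
  default boolean addLog(String taskId, LogType log) {
    throw new UnsupportedOperationException("task action audit logs are no longer written");
  }

  // Assumed signature; retained only so that older extensions still compile.
  @Deprecated
  default List<LogType> getLogs(String taskId) {
    throw new UnsupportedOperationException("task action audit logs are no longer read");
  }
}

// A new extension can leave both deprecated methods out entirely.
final class NewExtensionHandler implements ActionHandlerSketch<String> {}

// An older extension that still overrides them keeps compiling unchanged.
final class LegacyExtensionHandler implements ActionHandlerSketch<String> {
  private final List<String> logs = new ArrayList<>();

  @Override
  @Deprecated
  public boolean addLog(String taskId, String log) {
    return logs.add(taskId + ":" + log);
  }

  @Override
  @Deprecated
  public List<String> getLogs(String taskId) {
    return new ArrayList<>(logs);
  }
}

final class DeprecatedHandlerDemo {
  @SuppressWarnings("deprecation")
  public static void main(String[] args) {
    ActionHandlerSketch<String> legacy = new LegacyExtensionHandler();
    legacy.addLog("task_a", "insert");
    System.out.println(legacy.getLogs("task_a"));
  }
}
```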

Rolling upgrade concerns

There are no upgrade concerns because no task uses SegmentInsertAction.

Future solutions

Which task created a segment?

A preferable approach would be to simply add a task_id column to the segments table.
Something similar has been recently done for pending segments in #16144.

Alternatively, it may be possible to determine the list of segments committed by a task by inspecting
the reports of the task or the emitted metrics.

Which user created a segment?

Task submission is already logged and/or persisted, depending on configuration, by the Druid audit system.
Once we can associate segments with task IDs, we would also be able to identify which user created a given
segment.
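
The two-step lookup described above (segment to task, then task to user) can be sketched as a simple join over two maps. Both maps are hypothetical inputs: one stands in for a future segment-to-task association (for example, a task_id column), the other for task submission entries from the audit system. Neither reflects an existing Druid API.

```java
import java.util.Map;
import java.util.Optional;

final class SegmentCreatorLookupSketch {
  // Resolves the user behind a segment by chaining two lookups.
  // Returns empty if the segment or its task has no known association.
  static Optional<String> userForSegment(
      String segmentId,
      Map<String, String> segmentToTask,  // hypothetical: segment ID -> task ID
      Map<String, String> taskToUser) {   // hypothetical: task ID -> submitting user
    return Optional.ofNullable(segmentToTask.get(segmentId)).map(taskToUser::get);
  }

  public static void main(String[] args) {
    Map<String, String> segmentToTask = Map.of("segment_1", "task_a");
    Map<String, String> taskToUser = Map.of("task_a", "alice");
    System.out.println(userForSegment("segment_1", segmentToTask, taskToUser));
  }
}
```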


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@kfaraz kfaraz changed the title from "Remove task audit logging" to "Remove task action audit logging" Apr 19, 2024
@kfaraz kfaraz changed the title from "Remove task action audit logging" to "Remove task action audit logging and druid_taskLog metadata table" Jun 6, 2024
@kfaraz kfaraz requested review from clintropolis and removed request for abhishekrb19 June 26, 2024 07:33
@kfaraz kfaraz requested a review from clintropolis June 26, 2024 11:38
@kfaraz kfaraz merged commit 9f6ce6d into apache:master Jul 17, 2024
88 checks passed
@kfaraz kfaraz deleted the remove_task_action_audit branch July 17, 2024 11:39
kfaraz (Contributor, Author) commented Jul 17, 2024

Thanks for the review, @clintropolis !

@kfaraz kfaraz mentioned this pull request Jul 17, 2024
10 tasks
edgar2020 pushed a commit to edgar2020/druid that referenced this pull request Jul 19, 2024
sreemanamala pushed a commit to sreemanamala/druid that referenced this pull request Aug 6, 2024
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024