Remove task action audit logging and druid_taskLog metadata table #16309
Merged
kfaraz changed the title from "Remove task action audit logging" to "Remove task action audit logging and druid_taskLog metadata table" on Jun 6, 2024
clintropolis approved these changes on Jul 17, 2024
Thanks for the review, @clintropolis!
edgar2020 pushed a commit to edgar2020/druid that referenced this pull request on Jul 19, 2024
…ache#16309)
Description:
Task action audit logging was first deprecated and disabled by default in Druid 0.13, apache#6368.
As called out in the original discussion apache#5859, there are several drawbacks to persisting task action audit logs.
- Only usage of the task audit logs is to serve the API `/indexer/v1/task/{taskId}/segments` which returns the list of segments created by a task.
- The use case is really narrow and no prod clusters really use this information.
- There can be better ways of obtaining this information, such as the metric `segment/added/bytes` which reports both the segment ID and task ID when a segment is committed by a task. We could also include committed segment IDs in task reports.
- A task persisting several segments would bloat up the audit logs table putting unnecessary strain on metadata storage.
Changes:
- Remove `TaskAuditLogConfig`
- Remove method `TaskAction.isAudited()`. No task action is audited anymore.
- Remove `SegmentInsertAction` as it is not used anymore. `SegmentTransactionalInsertAction` is the new incarnation which has been in use for a while.
- Deprecate `MetadataStorageActionHandler.addLog()` and `getLogs()`. These are not used anymore but need to be retained for backward compatibility of extensions.
- Do not create `druid_taskLog` metadata table anymore.
sreemanamala pushed a commit to sreemanamala/druid that referenced this pull request on Aug 6, 2024
Note
- … `druid_audit` metadata table (if `druid.audit.manager.type=sql`) or simply logged.
- … `druid_taskLog` metadata table.
Description
Task action audit logging was first deprecated and disabled by default in Druid 0.13, #6368.
As called out in the original discussion #5859, there are several drawbacks to persisting task action audit logs.
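- The only usage of the task audit logs is to serve the API `/indexer/v1/task/{taskId}/segments`, which returns the list of segments created by a task.
- The use case is really narrow and no prod clusters really use this information.
- There can be better ways of obtaining this information, such as the metric `segment/added/bytes`, which reports both the segment ID and task ID when a segment is committed by a task. We could also include committed segment IDs in task reports.
- A task persisting several segments would bloat up the audit logs table, putting unnecessary strain on metadata storage.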
Changes
- Remove `TaskAuditLogConfig`.
- Remove method `TaskAction.isAudited()`. No task action is audited anymore.
- Remove `SegmentInsertAction` as it is not used anymore. `SegmentTransactionalInsertAction` is the new incarnation which has been in use for a while.
- Deprecate `MetadataStorageActionHandler.addLog()` and `getLogs()`. These are not used anymore but need to be retained for backward compatibility of extensions.
- Do not create the `druid_taskLog` metadata table anymore.
Release notes
- The API `/indexer/v1/task/{taskId}/segments` is not supported anymore and will give a 404 NOT FOUND response.
- Druid does not create the metadata table `druid_taskLog` anymore.
- The property `druid.indexer.auditlog.enabled` will be ignored by Druid.
- The metric `task/action/log/time` will not be emitted anymore.
Extension dev notes
The changes in this PR are backward compatible with all existing metadata storage extensions.
The methods `addLog` and `getLogs` of `MetadataStorageActionHandler` are now deprecated and not used by the Druid code.
Any new metadata storage extension need not implement these methods.
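One way an interface can keep deprecated methods without forcing new implementations to provide them is to give those methods default bodies. The following is a minimal sketch of that pattern only. `ActionHandler`, `InMemoryHandler`, `insert`, and the method signatures are hypothetical stand-ins; the PR text does not state the real signatures of `MetadataStorageActionHandler` or whether it uses default methods.

```java
import java.util.ArrayList;
import java.util.List;

public class DeprecatedLogMethodsSketch {

    /** Simplified, hypothetical stand-in; NOT the real MetadataStorageActionHandler. */
    interface ActionHandler<LogType> {
        // Stand-in for the methods an extension still has to implement.
        void insert(String entryId);

        // Assumed signature: kept only so that older extensions overriding it still compile.
        @Deprecated
        default boolean addLog(String entryId, LogType log) {
            throw new UnsupportedOperationException("Task action logs are not supported");
        }

        // Assumed signature: kept only so that older extensions overriding it still compile.
        @Deprecated
        default List<LogType> getLogs(String entryId) {
            throw new UnsupportedOperationException("Task action logs are not supported");
        }
    }

    /** A new implementation that does not override the deprecated methods. */
    static class InMemoryHandler implements ActionHandler<String> {
        private final List<String> entries = new ArrayList<>();

        @Override
        public void insert(String entryId) {
            entries.add(entryId);
        }

        List<String> entries() {
            return entries;
        }
    }
}
```

With this shape, an existing extension that overrides `addLog` and `getLogs` continues to compile, while a new one can omit them entirely.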
Rolling upgrade concerns
No upgrade concerns as none of the tasks use the `SegmentInsertAction`.
Future solutions
Which task created a segment?
A preferable approach would be to simply add a `task_id` column in the `segments` table.
Something similar has been recently done for pending segments in #16144.
Alternatively, it could also be possible to determine the list of segments committed by a task by inspecting
the reports of the task or emitted metrics.
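As a rough illustration of the metrics-based alternative, the sketch below groups segment IDs by task ID from a list of already-collected metric events. The `MetricEvent` class and its field names are hypothetical; the description only says that `segment/added/bytes` reports both the segment ID and the task ID, not how an event is laid out or how it would be collected.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class SegmentsByTaskSketch {

    /** Hypothetical, flattened view of one emitted metric event. */
    static final class MetricEvent {
        final String metric;
        final String taskId;
        final String segmentId;

        MetricEvent(String metric, String taskId, String segmentId) {
            this.metric = metric;
            this.taskId = taskId;
            this.segmentId = segmentId;
        }

        String taskId() {
            return taskId;
        }

        String segmentId() {
            return segmentId;
        }
    }

    /** Returns task ID -> segment IDs, using only `segment/added/bytes` events. */
    static Map<String, List<String>> segmentsByTask(List<MetricEvent> events) {
        return events.stream()
                // Ignore every metric other than the one that carries both IDs.
                .filter(e -> "segment/added/bytes".equals(e.metric))
                .collect(Collectors.groupingBy(
                        MetricEvent::taskId,
                        TreeMap::new, // sorted by task ID for stable output
                        Collectors.mapping(MetricEvent::segmentId, Collectors.toList())));
    }
}
```

This only works if the metric events are retained somewhere queryable, which is a deployment decision outside the scope of this PR.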
Which user created a segment?
Task submission is already logged and/or persisted depending on configuration by the Druid audit system.
Once we can associate segments to task IDs, we would also be able to identify which user created a given
segment.
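The reasoning above amounts to composing two lookups: segment to task, then task to submitting user. A minimal sketch, assuming both mappings are already available as in-memory maps (the class name, method name, and map shapes are hypothetical, and nothing here reflects an actual Druid API):

```java
import java.util.Map;
import java.util.Optional;

public class SegmentOwnerLookup {

    /**
     * Resolves the user who submitted the task that created the given segment.
     * Returns an empty Optional if either link in the chain is missing.
     */
    static Optional<String> userForSegment(
            String segmentId,
            Map<String, String> segmentToTask,
            Map<String, String> taskToUser) {
        return Optional.ofNullable(segmentToTask.get(segmentId))
                // Optional.map yields empty if the task has no recorded user.
                .map(taskToUser::get);
    }
}
```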