-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Batch report worker poc #1
Conversation
server/channels/jobs/batch_worker.go
Outdated
"github.com/mattermost/mattermost/server/v8/channels/store" | ||
) | ||
|
||
type BatchWorker[T interface{}] struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BatchWorker
provides an interface for a job to create a worker that will loop a method over and over at a specified interval, with parameters that the method can modify to execute over a large dataset. The method can also tell the stop to stop when it's done, or the JobService can pause the job and resume with the last saved parameters.
This was refactored out of the work @lieut-data did on the BatchMigrationWorker
, which now extends this module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first blush, it's not obvious to me what the generic offers here. It seems like BatchWorker
itself doesn't use the app T
, so would it make more sense for any embedding types to just accept an interface useful to itself?
See also https://go.dev/blog/when-generics and the section on Don’t replace interface types with type parameters
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is something I can iterate on a bit. One thing for the batch_report_worker
is that it does need app methods, and I noticed our pattern was to define the interface and have the job server fill it with what's usually AppIFace
. Is there a reason we do that? I could just make it require AppIFace
I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the generics from BatchWorker, but I can't seem to get around using them for the report worker if I want to be able to define an extended interface on the job itself.
Is there something else I can do with this?
return data | ||
} | ||
|
||
func getData(jobData model.StringMap, app ExportUsersToCSVAppIFace) ([]model.ReportableObject, model.StringMap, bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is called in a loop, with specified job data saved in the DB. It will pull a page of users from the DB and send it out to be saved. Then it will increment the page and continue looping.
} | ||
} | ||
|
||
func (u *UserReport) ToReport() []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ToReport
method is required for an object to be saved using the reporting module. It is responsible for morphing the data into a string format to be exported.
@@ -1039,6 +1040,59 @@ type UserReport struct { | |||
UserPostStats | |||
} | |||
|
|||
func (u UserReport) GetHeaders() []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GetHeaders
method is required for an object to be saved using the reporting module. It is a static method that defines the ordering and the column names for a report.
getData func(jobData model.StringMap, app T) ([]model.ReportableObject, model.StringMap, bool, error) | ||
} | ||
|
||
func MakeBatchReportWorker[T BatchReportWorkerAppIFace]( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A reporting job will create this worker, specifying the report format and the headers to use once compiled.
return true | ||
} | ||
|
||
for k, v := range nextData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Data is only saved back once the batch is successful, so we can resume if an error occurs and still be at the right spot.
job.Data[k] = v | ||
} | ||
|
||
// TODO add progress? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be doable if it's not too inefficient to count the number of pages that we will have.
} | ||
|
||
func (a *App) SendReportToUser(userID string, filename string) *model.AppError { | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should send the report to the user via DM. Will be implemented later.
return nil | ||
} | ||
|
||
func (a *App) CompileReportChunks(format string, prefix string, numberOfChunks int, headers []string) (string, *model.AppError) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the job has batched everything, upon completion we should merge all the files together.
For a CSV, this just means appending everything into one file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two high level thoughts on the overall design below -- I haven't reviewed everything line-by-line.
server/channels/jobs/batch_worker.go
Outdated
"github.com/mattermost/mattermost/server/v8/channels/store" | ||
) | ||
|
||
type BatchWorker[T interface{}] struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first blush, it's not obvious to me what the generic offers here. It seems like BatchWorker
itself doesn't use the app T
, so would it make more sense for any embedding types to just accept an interface useful to itself?
See also https://go.dev/blog/when-generics and the section on Don’t replace interface types with type parameters
.
SendReportToUser(userID string, filename string) *model.AppError | ||
} | ||
|
||
type BatchReportWorker[T BatchReportWorkerAppIFace] struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we anticipate having multiple of these batch report workers, or is this abstraction preemptive? If yes, this is fine, but if not, I imagine it would be best to fold this into the ExportUsersToCSV job lest we find ourselves fighting the interface with the second such job instead of /then/ figuring out the commonalities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we will have a definition for exporting channels to CSV as well coming soon, among any other number of reports we want to generate.
case "csv": | ||
return a.saveCSVChunk(prefix, count, reportData) | ||
} | ||
return model.NewAppError("SaveReportChunk", "", nil, "unsupported report format", http.StatusInternalServerError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should probably be a status bad request as the incorrect format comes from client.
for _, report := range reportData { | ||
err := w.Write(report.ToReport()) | ||
if err != nil { | ||
return model.NewAppError("saveCSVChunk", "", nil, "failed to write report data to CSV", http.StatusInternalServerError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to wrap original err
.
w := csv.NewWriter(&headerBuf) | ||
err := w.Write(headers) | ||
if err != nil { | ||
return "", model.NewAppError("compileCSVChunks", "", nil, "failed to write headers", http.StatusInternalServerError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to wrap original err
.
No description provided.