Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch report worker poc #1

Merged
merged 13 commits into from
Dec 14, 2023
Merged

Batch report worker poc #1

merged 13 commits into from
Dec 14, 2023

Conversation

devinbinnie
Copy link
Owner

No description provided.

"github.com/mattermost/mattermost/server/v8/channels/store"
)

type BatchWorker[T interface{}] struct {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BatchWorker provides an interface for a job to create a worker that will loop a method over and over at a specified interval, with parameters that the method can modify to execute over a large dataset. The method can also tell the stop to stop when it's done, or the JobService can pause the job and resume with the last saved parameters.

This was refactored out of the work @lieut-data did on the BatchMigrationWorker, which now extends this module.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first blush, it's not obvious to me what the generic offers here. It seems like BatchWorker itself doesn't use the app T, so would it make more sense for any embedding types to just accept an interface useful to itself?

See also https://go.dev/blog/when-generics and the section on Don’t replace interface types with type parameters.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is something I can iterate on a bit. One thing for the batch_report_worker is that it does need app methods, and I noticed our pattern was to define the interface and have the job server fill it with what's usually AppIFace. Is there a reason we do that? I could just make it require AppIFace I suppose.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the generics from BatchWorker, but I can't seem to get around using them for the report worker if I want to be able to define an extended interface on the job itself.

Is there something else I can do with this?

return data
}

func getData(jobData model.StringMap, app ExportUsersToCSVAppIFace) ([]model.ReportableObject, model.StringMap, bool, error) {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is called in a loop, with specified job data saved in the DB. It will pull a page of users from the DB and send it out to be saved. Then it will increment the page and continue looping.

}
}

func (u *UserReport) ToReport() []string {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ToReport method is required for an object to be saved using the reporting module. It is responsible for morphing the data into a string format to be exported.

@@ -1039,6 +1040,59 @@ type UserReport struct {
UserPostStats
}

func (u UserReport) GetHeaders() []string {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GetHeaders method is required for an object to be saved using the reporting module. It is a static method that defines the ordering and the column names for a report.

getData func(jobData model.StringMap, app T) ([]model.ReportableObject, model.StringMap, bool, error)
}

func MakeBatchReportWorker[T BatchReportWorkerAppIFace](
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A reporting job will create this worker, specifying the report format and the headers to use once compiled.

return true
}

for k, v := range nextData {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data is only saved back once the batch is successful, so we can resume if an error occurs and still be at the right spot.

job.Data[k] = v
}

// TODO add progress?
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be doable if it's not too inefficient to count the number of pages that we will have.

}

func (a *App) SendReportToUser(userID string, filename string) *model.AppError {
return nil
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should send the report to the user via DM. Will be implemented later.

return nil
}

func (a *App) CompileReportChunks(format string, prefix string, numberOfChunks int, headers []string) (string, *model.AppError) {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the job has batched everything, upon completion we should merge all the files together.
For a CSV, this just means appending everything into one file.

Copy link

@lieut-data lieut-data left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two high level thoughts on the overall design below -- I haven't reviewed everything line-by-line.

"github.com/mattermost/mattermost/server/v8/channels/store"
)

type BatchWorker[T interface{}] struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first blush, it's not obvious to me what the generic offers here. It seems like BatchWorker itself doesn't use the app T, so would it make more sense for any embedding types to just accept an interface useful to itself?

See also https://go.dev/blog/when-generics and the section on Don’t replace interface types with type parameters.

SendReportToUser(userID string, filename string) *model.AppError
}

type BatchReportWorker[T BatchReportWorkerAppIFace] struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we anticipate having multiple of these batch report workers, or is this abstraction preemptive? If yes, this is fine, but if not, I imagine it would be best to fold this into the ExportUsersToCSV job lest we find ourselves fighting the interface with the second such job instead of /then/ figuring out the commonalities.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we will have a definition for exporting channels to CSV as well coming soon, among any other number of reports we want to generate.

case "csv":
return a.saveCSVChunk(prefix, count, reportData)
}
return model.NewAppError("SaveReportChunk", "", nil, "unsupported report format", http.StatusInternalServerError)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should probably be a status bad request as the incorrect format comes from client.

for _, report := range reportData {
err := w.Write(report.ToReport())
if err != nil {
return model.NewAppError("saveCSVChunk", "", nil, "failed to write report data to CSV", http.StatusInternalServerError)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to wrap original err.

w := csv.NewWriter(&headerBuf)
err := w.Write(headers)
if err != nil {
return "", model.NewAppError("compileCSVChunks", "", nil, "failed to write headers", http.StatusInternalServerError)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to wrap original err.

@devinbinnie devinbinnie merged commit 07697a0 into MM-55726 Dec 14, 2023
15 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants