Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Demonstrate internal job row type using struct conversion #37

Closed
wants to merge 1 commit into from

Conversation

brandur
Copy link
Contributor

@brandur brandur commented Nov 18, 2023

I wanted to throw this one up as a possible alternative to a public
facing rivertype module in #36, in which we demonstrate an alternate
strategy using struct conversions. The problems it aims to solve are the
same:

  • A way to access core job types from within subpackages, allowing more
    modularity (e.g. Rescuer could live with other maintenance services).

  • Long term, a way of giving drivers access to job primitives without
    having to reference the top-level river.

The advantage of this approach is there's fewer user-facing changes, and
the user never has to interact with an alternate package.

It works by doing a struct conversion as necessary when moving from an
internal package to one that bubbles up to river. Go allows pure
struct conversions as long as their fields are the same, but there are
some limitations:

  • Substructs like AttemptError can't be converted, even if identical.

  • Custom types like JobState can't be converted, even if identical.

We work around these issues by:

  • Keeping errors as a [][]byte instead of it being unmarshaled by pgx.
    A helper called ErrorsUnmarshaled() is added to the top-level
    JobRow to give easy access to unmarshaled errors.

  • State is changed to a basic string akin to functions in http. This
    is technically a little less type-safe, but practically speaking would
    make very little difference because all these string-based types can
    be converted to and from each other with no checks anyway. Also, users
    aren't expected to access job state all that often. When they do, they
    can still use job.State == river.JobStateAvailable because the
    constants have been made available as strings.

I think this approach does also work reasonably well, although I worry a
little bit that there may be some sharp edge in the future that ends up
limiting us in some unexpected way. Also, having to use helpers like
ErrorsUnmarshaled() is definitely a little bit worse. That said, ~no
user-facing API changes, and has some marginal performance benefits:

  • Converting structs between each other should be ~free.

  • Errors probably aren't used that much, so by not unmarshaling them
    most of the time, we save on a lot of unnecessary JSON parsing.

Unlike #36, I didn't push the rescuer down (just in case this is a total
dead end), but this approach should make that possible.

I wanted to throw this one up as a possible alternative to a public
facing `rivertype` module in #36, in which we demonstrate an alternate
strategy using struct conversions. The problems it aims to solve are the
same:

* A way to access core job types from within subpackages, allowing more
  modularity (e.g. `Rescuer` could live with other maintenance services).

* Long term, a way of giving drivers access to job primitives without
  having to reference the top-level `river`.

The advantage of this approach is there's fewer user-facing changes, and
the user never has to interact with an alternate package.

It works by doing a struct conversion as necessary when moving from an
internal package to one that bubbles up to `river`. Go allows pure
struct conversions as long as their fields are the same, but there are
some limitations:

* Substructs like `AttemptError` can't be converted, even if identical.

* Custom types like `JobState` can't be converted, even if identical.

We work around these issues by:

* Keeping errors as a `[][]byte` instead of it being unmarshaled by pgx.
  A helper called `ErrorsUnmarshaled()` is added to the top-level
  `JobRow` to give easy access to unmarshaled errors.

* `State` is changed to a basic string akin to functions in `http`. This
  is technically a little less type-safe, but practically speaking would
  make very little difference because all these string-based types can
  be converted to and from each other with no checks anyway. Also, users
  aren't expected to access job state all that often. When they do, they
  can still use `job.State == river.JobStateAvailable` because the
  constants have been made available as strings.

I think this approach does also work reasonably well, although I worry a
little bit that there may be some sharp edge in the future that ends up
limiting us in some unexpected way. Also, having to use helpers like
`ErrorsUnmarshaled()` is definitely a little bit worse. That said, ~no
user-facing API changes, and has some marginal performance benefits:

* Converting structs between each other should be ~free.

* Errors probably aren't used that much, so by not unmarshaling them
  most of the time, we save on a lot of unnecessary JSON parsing.

Unlike #36, I didn't push the rescuer down (just in case this is a total
dead end), but this approach should make that possible.
@@ -809,7 +809,7 @@ func (c *Client[TTx]) distributeJobCompleterCallback(update jobcompleter.Complet
c.statsNumJobs++
}()

c.distributeJob(jobRowFromInternal(update.Job), jobStatisticsFromInternal(update.JobStats))
c.distributeJob((*JobRow)(dbsqlc.JobRowFromInternal(update.Job)), jobStatisticsFromInternal(update.JobStats))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that all of these currently convert river.JobRow <-> dbsqlc.RiverJob, but in the future we'd expect dbsqlc to get pushed entirely into the driver.

(dbsqlc is pgx v5 specific, so if we were to support an alternate package, it'd need its own dbsqlc-like package. For example, if we supported pgx v4, we'd have a dbsqlc configured to emit pgx v4, and that'd live in the riverpgxv4 driver.)

So the driver would convert to and return a rivertype.JobRow, and then all these conversion would become simple (*JobRow)(update.Job).

// JobRow contains the properties of a job that are persisted to the database.
// Use of `Job[T]` will generally be preferred in user-facing code like worker
// interfaces.
type JobRow struct {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identical to river.JobRow so we're allowed to convert between them.

// there's no way to put a helper in a shared package that can produce one,
// which is why we have this copy/pasta. There are some potential alternatives
// to this, but none of them are great.
func jobRowFromInternal(internal *dbsqlc.RiverJob) *JobRow {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This moves into dbsqlc, and like #36, we can remove the duplicate that's in rivertest.

@brandur brandur requested a review from bgentry November 18, 2023 20:40
@brandur
Copy link
Contributor Author

brandur commented Nov 18, 2023

@bgentry Sorry I know these are going to be complex as hell to try and parse, but this is one change I think we should make before stabilizing the API so we have more runway for reconfiguring things down the line. Both the approaches in #36 and #37 seem plausible to me, with this one being the slightly more exotic/risky option, but with the smaller user-facing change.

@@ -69,7 +68,7 @@ type JobRow struct {

// Errors is a set of errors that occurred when the job was worked, one for
// each attempt. Ordered from earliest error to the latest error.
Errors []AttemptError
Errors [][]byte
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a fairly big downside and it's the reason I think #36 is the only viable option. I know we'll also be making use of metadata in the future and I think it's important that we keep these both as usable as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's definitely a sizable downside.

I played around with this a little more and one thing we could do there is rename the existing JobRow to JobRowRaw:

type JobRowRaw struct {
	// ID of the job. Generated as part of a Postgres sequence and generally
	// ascending in nature, but there may be gaps in it as transactions roll
	// back.
	ID int64

...

And then make JobRow embed a raw row with unmarshaled Errors that shadows the raw errors:

type JobRow struct {
	*JobRowRaw

	Errors []AttemptError
}

It does work pretty well actually, but ugh, the downside is that it unfortunately does still complicate the inheritance a little much because we now have JobRowRaw -> JobRow -> Job which is starting to get gnarly.

@brandur
Copy link
Contributor Author

brandur commented Nov 25, 2023

Closing this one as we went with #36 instead.

@brandur brandur closed this Nov 25, 2023
@brandur brandur deleted the brandur-struct-conversions branch November 25, 2023 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants