Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

migrate to zap logging #135

Merged

Conversation

ewollesen
Copy link
Contributor

As requested originally in #126 then re-worked to build on top of #134

BACK-2500

The function was writing HTTP responses and logging, both of which are better
done in the actual HTTP handler methods. When these helper methods know about
HTTP and can (but don't always) write responses, it makes the handlers awkward
to write, because it's never clear whether they should just return afterwards,
or if they can do any error recovery or better logging, etc.

Without the logging and sometimes writing of HTTP responses, the helper wasn't
doing anything, and so it's easily removed.

Also, the error handling of FindConfirmation is made consistent as an internal
server error.

BACK-2500
The function was writing HTTP responses and logging, both of which are better
done in the actual HTTP handler methods. When these helper methods know about
HTTP and can (but don't always) write responses, it makes the handlers awkward
to write, because it's never clear whether they should just return afterwards,
or if they can do any error recovery or better logging, etc.

As one example, findResetConfirmation would write 404 responses if the
confirmation wasn't found. The handler ignored that one particular case, and
it wasn't obvious why it was ignoring it. Even as the handler ignored it, it
still had to have a nil check/guard statement, which indented the code
awkwardly and left the reader wondering just what was returned the user if
conf was nil.

BACK-2500
During code review, it was requested that the zap logger be used.

I might have gone a bit overboard, but I've moved everything over to zap. I
think it's still TBD if zap is what we really want, but at least now we
can see it in action, and better evaluate if it brings advantages over other
options.

Zap produces a stack trace at the Error log level, so I used that as a
guide. In general, any time a function returned an error, and that error
caused the HTTP handler to abort its processing with a 5XX HTTP status, I used
the Error level for the log, so we'll log the stack trace that led up to the
error.

In the event that an error was returned and the handler returned a 4XX HTTP
status, I used the Info log level. My thinking was if the handler was telling
the client that the error was on their side, then whatever the Go error was,
we didn't need a stack trace. This happens a lot for example when a request
body contains invalid JSON. The Unmarshal call will fail with an error and the
handler returns a HTTP 400 or similar. A stack trace in that event wouldn't be
useful.

In general, I didn't use the Warn level much, as it behaves differently in
prod versus non-prod. In the former, it doesn't include a stack trace, while
in the latter it does. I used it during an optional permissions check. In
development, we'll get better debug info, but in production we'll skip the
stack trace, as the error isn't going to cause an abort of the handler method.

I also refactored many API responses to use the existing sendError and other
related methods. This should help maintain consistency in the future, and
should we decide to remove zap for example, we'll have fewer places to modify,
as the move to using sendError and friends has greatly reduced the number of
log calls that were scattered around the code base. (Last I checked, the
number of logging sites was reduced from 153 to 49, according to rough grep
checks.)

Another request from code review, was that logging should include the userId
where one existed. This was accomplished using a middleware function, such
that the userId is added to the logger's context and will automatically appear
in the log output. There's no need to manually specify it at the logging call
site.

The zap dependency version was updated to 1.22 to get access to the
WithOptions method of the Sugared logger, which makes some of our logging
cleaner.

BACK-2500
That's the InvocationParams found in from go.uber.org/fx

BACK-2500
When migrating to zap logging, the Api.logger was modified by middleware that
was adding the request's userID to the logger's context in a way that wasn't
thread-safe. The result was that many many non-sensical userIDs were logged in
the context, producing log output that wasn't at all useful.

The best way to fix this was to properly generate a child logger for each
request with the appropriate userID in its logging context, then pass that
logger into the handlers via their request's context.

As a result, the fix touches a lot of code—anywhere that logs, now needs to
retrieve the request-specific logger from that request's context.

Tests were written that attempt to exacerbate the problem, but as its a caused
by race conditions, it's very hard to reliably reproduce. Flappiness in the
test is a red flag though, and comments were added to indicate that.

BACK-2500
From a previous code review it was pointed out that some amount of context was
lost from some log messages. I have reviewed all the logging changes, and
found these places, and restored them. I may have missed a few, but I made a
best-effort attempt.

BACK-2500
@ewollesen
Copy link
Contributor Author

TL;DR If you see a logging site that has lost context, please point it out specifically!

Included in this new PR is another round of review for every logging site, looking at the potential loss of any context. Sometimes it may look like a loss of context at first glance, but there are several factors that aren't always obvious that I took into consideration. A partial list:

  1. if the function caller logs the same information
  2. if the information is now included via the context logger (e.g. userID or the function caller's name)
  3. some calls were demoted from warn to info, as warn will include a backtrace, and where there's no error to report on, backtraces don't make much sense

So please, if you see a place where you think context has been lost, please double check that one of the above doesn't replace the missing context/information, and if there's still a lack somewhere, then please call out each site. I've been through every logging site laboriously looking for places where I missed something, and I'd really rather not having to comb over everything yet again. :) Thank you!

return
}

defer req.Body.Close()
var ib = &ClinicInvite{}
if err := json.NewDecoder(req.Body).Decode(ib); err != nil {
a.logger.Errorw("error decoding invite", zap.Error(err))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this info is included in the sendError call.

@@ -85,37 +86,27 @@ func (a *Api) passwordReset(res http.ResponseWriter, req *http.Request, vars map
a.logMetricAsServer("reset confirmation sent")
} else {
a.logMetricAsServer("reset confirmation failed to be sent")
log.Print("Something happened generating a passwordReset email")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaving this out as a) addOrUpdateConfirmation already does logging, and b) it's not useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean createAndSendNotification?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooops, yes. And I just double checked again to make sure, and yes, everywhere it returns false, it logs.

if err != nil {
log.Printf("findResetConfirmation: error [%s]\n", err.Error())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller of this function will report the error, there' no need for duplications.

The same applies to all the logging in this function. The context of whether it wasn't found, or expired is maintained.

Here's the only place this method is called if you want to double check:

https://github.com/tidepool-org/hydrophone/blob/eric%2Fsend-invite-v2.2-logging/api/forgot.go#L138

// Find this confirmation
// write error if it fails
func (a *Api) addProfile(conf *models.Confirmation) error {
if conf.CreatorId != "" {
if err := a.seagull.GetCollection(conf.CreatorId, "profile", a.sl.TokenProvide(), &conf.Creator.Profile); err != nil {
log.Printf("error getting the creators profile [%v] ", err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked that the caller logs the error in all cases.

return
}
if conf == nil {
statusErr := &status.StatusError{Status: status.NewStatus(http.StatusNotFound, statusInviteNotFoundMessage)}
log.Println("AcceptInvite ", statusErr.Error())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AcceptInvite will be logged by zap, as it logs the function name from which it was called.

@@ -163,7 +132,6 @@ func (a *Api) sendSignUp(res http.ResponseWriter, req *http.Request, vars map[st
return
} else {
a.logMetric("signup confirmation failed to be sent", req)
log.Print("Something happened generating a signup email")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not useful. Plus, createAndSendNotification writes log messages that actually contain useful information.

@@ -232,7 +202,6 @@ func (a *Api) resendSignUp(res http.ResponseWriter, req *http.Request, vars map[
a.logMetricAsServer("signup confirmation re-sent")
} else {
a.logMetricAsServer("signup confirmation failed to be sent")
log.Print("resendSignUp: Something happened trying to resend a signup email")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not useful. Plus, createAndSendNotification writes log messages that actually contain useful information.

@@ -144,7 +145,6 @@ func (c *MongoStoreClient) FindConfirmation(ctx context.Context, confirmation *m
opts := options.FindOne().SetSort(bson.D{{Key: "created", Value: -1}})

if err = confirmationsCollection(c).FindOne(ctx, query, opts).Decode(&result); err != nil && err != mongo.ErrNoDocuments {
log.Printf("FindConfirmation: something bad happened [%v]", err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked that the caller logs these errors.

@@ -110,23 +110,7 @@ func (c *SesNotifier) Send(to []string, subject string, msg string) (int, string

// Display error messages if they occur.
if err != nil {
if aerr, ok := err.(awserr.Error); ok {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this information is output in the string representation of the AWS error, so nothing's lost here.

@@ -26,7 +25,6 @@ func NewHandler(store clients.StoreClient) events.EventHandler {
}

func (h *handler) HandleDeleteUserEvent(payload events.DeleteUserEvent) error {
log.Printf("Deleting confirmations for user %v", payload.UserID)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed as not useful. I guess I could re-add it as a trace-level, if this is somehow useful to someone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. Since we are deleting user data it should even be logged at info level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix.

Requested in a previous code review.

BACK-2500
Copy link
Contributor

@toddkazakov toddkazakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found some issues and some nitpicks

api/clinic.go Outdated
if err != nil {
a.sendError(res, http.StatusInternalServerError, STATUS_ERR_FINDING_CONFIRMATION, err)
a.sendError(ctx, res, http.StatusInternalServerError, STATUS_ERR_FINDING_CONFIRMATION, err,
fmt.Sprintf("clinic %s user already has or had an invite from %v", clinicId, inviterID))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's better if you pass clinicId and inviterId a structured arguments to the logger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I just copied the existing text, but I can do that.

statusErr := a.sendClinicianConfirmation(req, confirmation)
if statusErr != nil {
a.sendError(res, statusErr.Code, statusErr.Reason, statusErr.Error())
code, msg, optionalErr := a.sendClinicianConfirmation(req, confirmation)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having more than two arguments is a code smell. Consider wrapping all non-error return args in a struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out the code is superfluous anyway, removing it.

if statusErr != nil {
a.sendError(res, statusErr.Code, statusErr.Reason, statusErr.Error())
code, msg, optionalErr := a.sendClinicianConfirmation(req, confirmation)
if code != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic value. What is 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's removed. :D

}

a.logMetric("clinician_invite_sent", req)
return nil
return 0, "", nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having more than two arguments is a code smell. There's already a struct that wraps a code and a message status.Status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

api/forgot.go Outdated
return
}

resetCnf.Email = email
//there is nothing more to do other than notify the user
resetCnf.UpdateStatus(models.StatusCompleted)
a.logger(ctx).With(zap.String("email", email)).Info(STATUS_RESET_NO_ACCOUNT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a duplicate of L63, but with a different error level

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reviewing what was there previously, I've removed this line.

c.Handle("/resend/signup/{useremail}", varsHandler(a.resendSignUp)).Methods("POST")
c.Handle("/resend/invite/{inviteId}", varsHandler(a.ResendInvite)).Methods("PATCH")
c.Handle("/resend/signup/{useremail}", vars(a.resendSignUp)).Methods("POST")
c.Handle("/resend/invite/{inviteId}", uid(vars(a.ResendInvite))).Methods("PATCH")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary use of uid

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix, thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was another on 247 that I've also removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and I double checked the rest.

rtr.Handle("/v1/clinics/{clinicId}/invites/clinicians/{inviteId}", varsHandler(a.GetClinicianInvite)).Methods("GET")
rtr.Handle("/v1/clinics/{clinicId}/invites/clinicians/{inviteId}", varsHandler(a.ResendClinicianInvite)).Methods("PATCH")
rtr.Handle("/v1/clinics/{clinicId}/invites/clinicians/{inviteId}", varsHandler(a.CancelClinicianInvite)).Methods("DELETE")
c.Handle("/v1/clinics/{clinicId}/invites/patients", vars(a.GetPatientInvites)).Methods("GET")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth adding clinicId to the logger

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will take a stab at this and see how it works.

nonErrs = append(nonErrs, extra)
}
}
return nonErrs, errs, fields
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider creating a struct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -26,7 +25,6 @@ func NewHandler(store clients.StoreClient) events.EventHandler {
}

func (h *handler) HandleDeleteUserEvent(payload events.DeleteUserEvent) error {
log.Printf("Deleting confirmations for user %v", payload.UserID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. Since we are deleting user data it should even be logged at info level.

a.sendModelAsResWithStatus(res, statusErr, http.StatusInternalServerError)
// addOrUpdateConfirmation logs and writes a response on errors
if !a.addOrUpdateConfirmation(ctx, conf, res) {
a.sendError(ctx, res, http.StatusInternalServerError, STATUS_ERR_SAVING_CONFIRMATION, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error is already sent at this point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, good catch.

- fixes a logic error (re-factor gone wrong)
- some logging tweaks to level or structured variables
- re-worked sendClinicianConfirmation down to 2 return values
- removed findResetConfirmation (low value, confusing results)
- re-worked splitExtrasAndErrorsAndFields to return a struct

Next commit will have a few more related tweaks.

BACK-2500
As requested in code review. The addUserIDToLogger method was made
general-purpose. In addition, instead of using it only on those handlers that
have a userId/clinicId parameter, it's used as a generic middleware on all
requests. This small performance hit makes it so that any future endpoints
that are added will automatically log userid and clinicid, plus the endpoint
definition code is a bit cleaner, as it's not having to call uid helpers all
over the place.

BACK-2500
@ewollesen
Copy link
Contributor Author

ewollesen commented Nov 30, 2023

I believe I've addressed all of your comments to date, Todd. If you see something that I didn't respond to, call it out, maybe GitHub just hid it from me. :/ But I didn't ignore anything intentionally.

@@ -394,11 +394,11 @@ func (a *Api) sendClinicianConfirmation(req *http.Request, confirmation *models.
}

if !a.createAndSendNotification(req, confirmation, emailContent) {
return http.StatusInternalServerError, STATUS_ERR_SENDING_EMAIL, nil
return STATUS_ERR_SENDING_EMAIL, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should always return an error if you return a message otherwise in some places the code may not return a response, e.g.:

		msg, err := a.sendClinicianConfirmation(req, confirmation)
		if err != nil {
			a.sendError(ctx, res, http.StatusInternalServerError, msg, err)
			return
		}

As reported in code review, returning an error is the best indicator that
something has gone wrong.

BACK-2500
Copy link
Contributor

@toddkazakov toddkazakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think extensive regression testing is required.

@ewollesen ewollesen merged commit 029f2af into eric/send-invite-v2.2-fracture Dec 4, 2023
3 checks passed
@ewollesen ewollesen deleted the eric/send-invite-v2.2-logging branch December 4, 2023 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants