Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RSDK-8779: Remove additional cases where signaling server errors override successful PeerConnection attempts. #388

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

dgottlieb
Copy link
Member

Sorry I didn't really take the time to explain the step-by-step. I'm a little impatient with these errors at this point.

@viambot viambot added the safe to test Mark as safe to test label Nov 11, 2024
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Nov 11, 2024
Copy link
Member

@benjirewis benjirewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely agree with the changes, some small nits.

rpc/wrtc_client.go Outdated Show resolved Hide resolved
rpc/wrtc_client.go Outdated Show resolved Hide resolved
rpc/wrtc_client.go Outdated Show resolved Hide resolved
rpc/wrtc_client.go Outdated Show resolved Hide resolved
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Nov 12, 2024
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Nov 12, 2024
Copy link
Member

@benjirewis benjirewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM % the lint issues.

rpc/wrtc_client.go Outdated Show resolved Hide resolved
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Nov 12, 2024
@dgottlieb
Copy link
Member Author

There's a test unhappy with these changes. It wants to assert on error messages. The test is otherwise valuable, so I'm evaluating options.

such as when the initial offer/answer SDP is not valid
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Nov 15, 2024
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Nov 15, 2024
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Nov 15, 2024
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Nov 15, 2024
@@ -25,6 +25,10 @@ type ILogger interface {
type ZapCompatibleLogger interface {
Desugar() *zap.Logger

// Not defined: Named(name string) *zap.SugaredLogger
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a whole episode of adding Named here before realizing why we explicitly need to avoid it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we explicitly need to avoid Named? Because it creates a new logger that we "lose track of" whereas Sublogger retains the hierarchy needed to allow setting a level on a top-level logger and having it propagate down to subloggers? Or some other interface issue I'm forgetting?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we explicitly need to avoid Named? Because it creates a new logger that we "lose track of" whereas Sublogger retains the hierarchy needed to allow setting a level on a top-level logger and having it propagate down to subloggers?

Correct

dOpts.authMaterial = dOpts.webrtcOpts.SignalingExternalAuthAuthMaterial
dOpts.creds = Credentials{}
}
}

if dOpts.debug {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Felt unnecessary now

@@ -1150,10 +1150,11 @@ func TestDialMulticastDNS(t *testing.T) {

t.Run("fix mdns instance name", func(t *testing.T) {
rpcServer, err := NewServer(
logger,
logger.Named("server"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% intentional. I think* these are fine given the logger is golog and this is in tests and not called from RDK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think* these are fine given the logger is golog and this is in tests and not called from RDK.

If by fine you mean that we don't need to be using Sublogger here, I would agree. Any reason you needed to create better logger names here, though? Easier to read test output or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason you needed to create better logger names here, though? Easier to read test output or something?

"easier to read" is a polite way of putting it. At least in the authenticated subtest, we make 3 separate dials. It was nigh impossible to parse which "base channel/stream" was client and which was server.

@@ -152,8 +152,37 @@ func HeartbeatsAllowedFromCtx(ctx context.Context) bool {
return md[HeartbeatsAllowedMetadataField][0] == "true"
}

func (srv *WebRTCSignalingServer) asyncSetOfferError(host string, uuid string, offerErr error) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the big lambda i factored out.

srv.callMu.RUnlock()

utils.PanicCapturingGo(func() {
defer srv.callWorkers.Done()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changed to a defer

defer func() {
var errToSend error

if callErr == nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notable behavior change is that this callErr == nil block will never result in calling SendOfferError.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, sounds correct to me. Avoiding more of these "exchange happened without error but here's an irrelevant context error for you" situations.

sendCtx, cancel := context.WithTimeout(context.Background(), time.Second*5)
defer cancel()

sendErr := srv.callQueue.SendOfferError(sendCtx, host, uuid, offerErr)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also referenced in the deleted code: a "context timeout" that's concurrent with a "return nil // the connection has succeeded" should never result in this SendOfferError being invoked.

@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Nov 15, 2024
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Nov 16, 2024
Copy link
Member

@benjirewis benjirewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much LGTM; one comment about naming.

@@ -25,6 +25,10 @@ type ILogger interface {
type ZapCompatibleLogger interface {
Desugar() *zap.Logger

// Not defined: Named(name string) *zap.SugaredLogger
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we explicitly need to avoid Named? Because it creates a new logger that we "lose track of" whereas Sublogger retains the hierarchy needed to allow setting a level on a top-level logger and having it propagate down to subloggers? Or some other interface issue I'm forgetting?

@@ -138,11 +138,11 @@ func dial(
if !dOpts.mdnsOptions.Disable && tryLocal && isJustDomain {
wg.Add(1)
go func(dOpts dialOptions) {
mdnsLogger := utils.Sublogger(logger, "mdns")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the extra subloggers and fewer guards behind dOpts.debug here; thanks.

@@ -1150,10 +1150,11 @@ func TestDialMulticastDNS(t *testing.T) {

t.Run("fix mdns instance name", func(t *testing.T) {
rpcServer, err := NewServer(
logger,
logger.Named("server"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think* these are fine given the logger is golog and this is in tests and not called from RDK.

If by fine you mean that we don't need to be using Sublogger here, I would agree. Any reason you needed to create better logger names here, though? Easier to read test output or something?

@@ -152,8 +152,37 @@ func HeartbeatsAllowedFromCtx(ctx context.Context) bool {
return md[HeartbeatsAllowedMetadataField][0] == "true"
}

func (srv *WebRTCSignalingServer) asyncSetOfferError(host, uuid string, offerErr error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this is a separate method, and it's much easier to read than whatever was in that defer for Call before.

Suggested change
func (srv *WebRTCSignalingServer) asyncSetOfferError(host, uuid string, offerErr error) {
func (srv *WebRTCSignalingServer) asyncSendOfferError(host, uuid string, offerErr error) {

Distinction between Set and Send here could be helpful in naming. Set, to me, implies setting some local value representative of error, whereas Send makes me think of SendOfferError.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely a selfish change that I can just revert -- not a lot of utility and is a bit of a lightning rod. Almost all of the value of "publishing" the error is for tests. And specifically the tests that are setting up a failing connection attempt and want to assert we failed for the right reason.

And in those cases, we are using the "in memory" signaling service.

So your intuition on set vs. send was the intention I was aiming to achieve.

defer func() {
var errToSend error

if callErr == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, sounds correct to me. Avoiding more of these "exchange happened without error but here's an irrelevant context error for you" situations.

@@ -263,6 +250,8 @@ func (srv *WebRTCSignalingServer) Call(req *webrtcpb.CallRequest, server webrtcp
},
},
}); err != nil {
// Don't set an error for the connection attempt. Some candidates may have been
// exchanged that can result in a successful connection.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks as always for the in-line comments; totally agree with not sending error here.

@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Nov 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
safe to test Mark as safe to test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants