Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EXC_BAD_ACCESS AWSIoTMQTTClient.m:1326 #5453

Open
AndrKonovalov opened this issue Oct 24, 2024 · 4 comments
Open

EXC_BAD_ACCESS AWSIoTMQTTClient.m:1326 #5453

AndrKonovalov opened this issue Oct 24, 2024 · 4 comments
Labels
bug Something isn't working iot Issues related to the IoT SDK

Comments

@AndrKonovalov
Copy link

AndrKonovalov commented Oct 24, 2024

Describe the bug
App randomly crashes with error:

EXC_BAD_ACCESS AWSIoTMQTTClient.m:1326
Attempted to dereference garbage pointer 0x8.

To Reproduce
Steps to reproduce the behavior:
N/A

Observed Behavior
App is crashing

Expected Behavior
I'm expecting to reconnect to the AWS client.

Stack Trace

to large attached as [threads.txt](https://github.com/user-attachments/files/17513554/threads.txt)

Code Snippet
N/A
Unique Configuration
N/A

Areas of the SDK you are using (AWSMobileClient, Cognito, Pinpoint, IoT, etc)?

Screenshots
N/A

Environment(please complete the following information):

  • SDK Version: [2.36.7]
  • Dependency Manager: [Cocoapods]
  • Swift Version : [5.9]
  • Xcode Version: [15.4]

Device Information (please complete the following information):

  • Device: [Iphone 12 pro ]
  • iOS Version: [iOS 17.6.1]
  • Specific to simulators:

Logs

EXC_BAD_ACCESS: Attempted to dereference garbage pointer 0x8.

0 CoreFoundation +0xa263c _CFRunLoopAddTimer
1 AWSIoT +0x4ddac -[AWSIoTMQTTClient scheduleReconnection] (AWSIoTMQTTClient.m:1326:9)
2 AWSIoT +0x4a928 __43-[AWSIoTMQTTClient initiateReconnectTimer:]_block_invoke (AWSIoTMQTTClient.m:780:9)
3 libdispatch.dylib +0x2138 __dispatch_call_block_and_release
4 libdispatch.dylib +0x3dd0 __dispatch_client_callout
5 libdispatch.dylib +0xb3fc __dispatch_lane_serial_drain
6 libdispatch.dylib +0xbf2c __dispatch_lane_invoke
7 libdispatch.dylib +0x16cb0 __dispatch_root_queue_drain_deferred_wlh
8 libdispatch.dylib +0x16524 __dispatch_workloop_worker_thread
9 libsystem_pthread.dylib +0x4930 __pthread_wqthread

After a little deep dive made by David Webb it seems that issue arises because of the cleanupReconnectTimer and scheduleReconnection methods run on different threads. cleanupReconnectTimer is specifically designed to run on the reconnectThread, whereas scheduleReconnection operates on a different timerQueue. This lack of synchronization between threads can cause a race condition where the reconnectTimer is being set up by scheduleReconnection while being invalidated or cleaned up by cleanupReconnectTimer. The cleanupReconnectTimer method ensures it queues cleanup operations on the correct thread (reconnectThread). However, if scheduleReconnection is in the middle of setting up the reconnect timer, and cleanupReconnectTimer is called at the same time on another thread, the timer could be invalidated before it is fully initialized, leading to unexpected behavior or crashes.

call AWSIoTDataManager.m disconnect method

- (void)disconnect{
    if ( !_userDidIssueConnect || _userDidIssueDisconnect ) {
        //Have to be connected to make this call. noop this call by returning
        return ;
    }
    _userDidIssueConnect = NO;
    _userDidIssueDisconnect = YES;
    [self.mqttClient disconnect];
}

which is calling AWSIoTMQTTClient.m

- (void)disconnect {
    if (self.userDidIssueDisconnect ) {
        //Issuing disconnect multiple times. Turn this function into a noop by returning here.
        return;
    }

    //Invalidate the reconnect timer so that there are no reconnect attempts.
    [self cleanupReconnectTimer];
    
    //Set the userDisconnect flag to true to indicate that the user has initiated the disconnect.
    self.userDidIssueDisconnect = YES;
    self.userDidIssueConnect = NO;

    //call disconnect on the session.
    [self.session disconnect];
    self.connectionAgeInSeconds = 0;

    //Cancel the current streams thread
    [self.streamsThread cancelAndDisconnect:YES];

    __weak AWSIoTMQTTClient *weakSelf = self;
    self.streamsThread.onStop = ^{
        __strong AWSIoTMQTTClient *strongSelf = weakSelf;
        //If the userDidIssueDisconnect has been set to NO, it means a new connection has been requested,
        //so we should disregard these updates
        if (!strongSelf || !strongSelf.userDidIssueDisconnect) {
            return;
        }

        //Invalidate connection age timer and close socket
        if (strongSelf.connectionAgeTimer != nil) {
            [strongSelf.connectionAgeTimer invalidate];
            strongSelf.connectionAgeTimer = nil;
        }

        if (strongSelf.webSocket) {
            [strongSelf.webSocket close];
            strongSelf.webSocket = nil;
        }

        //Notify disconnected status.
        strongSelf.mqttStatus = AWSIoTMQTTStatusDisconnected;
        [strongSelf notifyConnectionStatus];
    };

    AWSDDLogInfo(@"AWSIoTMQTTClient: Disconnect message issued.");
}

disconnect has no synchronization it calls cleanupReconnectTimer
which is invalidates and removes reference to the reconnect timer on the correct thread to avoid creating a memory leak.

 @discussion If called on any thread other than the reconnect thread the work is queued up on
 the reconnect thread but the method returns without waiting for the invalidation to be completed.
 This is called initially on the thread the consumer is calling the client's disconnect method on.
 */
- (void)cleanupReconnectTimer {
    if (self.reconnectTimer == nil) {
        return;
    }

    if (self.reconnectThread) {
        if (!self.reconnectThread.isFinished && ![[NSThread currentThread] isEqual:self.reconnectThread]) {
            // Move to reconnect thread to cleanup only if it's still running
            [self performSelector:@selector(cleanupReconnectTimer)
                         onThread:self.reconnectThread
                       withObject:nil
                    waitUntilDone:NO];
            return;
        }
        
        [self.reconnectTimer invalidate];
        self.reconnectTimer = nil;
    }
}

The code that then has issues is:

- (void)scheduleReconnection {
    dispatch_assert_queue(self.timerQueue);

    BOOL isConnectingOrConnected = self.mqttStatus == AWSIoTMQTTStatusConnected || self.mqttStatus == AWSIoTMQTTStatusConnecting;
    if (!self.reconnectTimer && !isConnectingOrConnected) {
        self.reconnectTimer = [NSTimer timerWithTimeInterval:self.currentReconnectTime
                                                      target:self
                                                    selector: @selector(reconnectToSession)
                                                    userInfo:nil
                                                     repeats:NO];
        [[NSRunLoop currentRunLoop] addTimer:self.reconnectTimer forMode:NSDefaultRunLoopMode];
        [[NSRunLoop currentRunLoop] runMode:NSDefaultRunLoopMode beforeDate:[NSDate distantFuture]];
    }
}
@github-actions github-actions bot added pending-triage Issue is pending triage pending-maintainer-response Issue is pending response from an Amplify team member labels Oct 24, 2024
@edisooon edisooon added bug Something isn't working iot Issues related to the IoT SDK follow up Requires follow up from maintainers and removed pending-triage Issue is pending triage pending-maintainer-response Issue is pending response from an Amplify team member labels Oct 25, 2024
@edisooon
Copy link
Member

Thanks for your insight, one of our team members will do some investigation on this as soon as possible

@ruisebas
Copy link
Member

ruisebas commented Oct 25, 2024

Thanks @AndrKonovalov for the detailed analysis and your conclusions.

Based on them, I've implemented a simple fix in the fix/iot-crash branch. I've tested it with a sample app and everything seems to be working fine, but would you be able to check if it fixes the crash in your application?

Thanks!

@ruisebas ruisebas removed the follow up Requires follow up from maintainers label Oct 25, 2024
@AndrKonovalov
Copy link
Author

Sure, will try

@github-actions github-actions bot added the pending-maintainer-response Issue is pending response from an Amplify team member label Oct 28, 2024
@ruisebas ruisebas added pending-community-response Issue is pending response from the issue requestor and removed pending-maintainer-response Issue is pending response from an Amplify team member labels Oct 28, 2024
@AndrKonovalov
Copy link
Author

Hey, your changes fixed it. Issue is gone. Thanks!

@github-actions github-actions bot added pending-maintainer-response Issue is pending response from an Amplify team member and removed pending-community-response Issue is pending response from the issue requestor labels Nov 13, 2024
@edisooon edisooon removed the pending-maintainer-response Issue is pending response from an Amplify team member label Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working iot Issues related to the IoT SDK
Projects
None yet
Development

No branches or pull requests

3 participants