fix: iOS app crash caused by the request operation canceling #48350

zhouzh1 · 2024-12-20T10:45:02Z

Summary:

We have observed many iOS app crashes caused by the [RCTFileRequestHandler invalidate] method, as shown in the screenshot below.

Changelog:

[IOS] [FIXED] - App crash caused by the [RCTFileRequestHandler invalidate] method

Test Plan:

I have not been able to reproduce this issue locally, so the changes in this PR are based entirely on inference, and I am not sure they are correct. Please take a deeper look. Thanks.

zhouzh1 · 2024-12-20T10:47:02Z

@mojodna @jsierles @augustl Could you take a look at this PR? Thanks.

augustl · 2024-12-20T10:53:52Z

@zhouzh1 hi! Out of curiosity, why did you tag me in this PR? I keep getting tagged in React Native PRs for some reason, and I'm not sure why :)

cipolleschi

I'm not sure this is the proper fix for the issue.
Could you share more about the crash?
Is there an error message?
Can you share the full crashlog?
Is it happening in production only or also in debug?
Is the app in background or in foreground?

cipolleschi · 2024-12-20T12:13:12Z

packages/react-native/Libraries/Network/RCTFileRequestHandler.mm

+  if (_fileQueue) {
+    for (NSOperation *operation in _fileQueue.operations) {
+      if ([operation isKindOfClass:[NSOperation class]] && !operation.isCancelled && !operation.isFinished) {
+        [operation cancel];
+      }
+    }
+    _fileQueue = nil;
+  }


This is not necessary. Only NSOperation instances can be added to an NSOperationQueue, and cancelAllOperations runs the same code you are writing manually.

@cipolleschi It seems that cancelAllOperations doesn't check the status of each operation, and the stack trace in my app's crash report shows that the crash happens inside cancelAllOperations itself.

These are the official Apple docs for cancelAllOperations:

This method calls the cancel method on all operations currently in the queue.
Canceling the operations does not automatically remove them from the queue or stop those that are currently executing. For operations that are queued and waiting execution, the queue must still attempt to execute the operation before recognizing that it is canceled and moving it to the finished state. For operations that are already executing, the operation object itself must check for cancellation and stop what it is doing so that it can move to the finished state. In both cases, a finished (or canceled) operation is still given a chance to execute its completion block before it is removed from the queue.

And these are the docs for NSOperation cancel.

In any case, calling cancel on an already cancelled or finished operation does not crash the app.
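The reviewer's point can be modeled in a few lines of C++. This is a sketch, not Apple's implementation: `Operation` and `OperationQueue` are hypothetical stand-ins for NSOperation and NSOperationQueue. It assumes, per Apple's docs, that cancelling the queue calls `cancel` on every queued operation and that `cancel` has no effect on an operation that has already finished. Under those assumptions, the manual status checks in the diff leave the operations in the same state as the plain loop.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for NSOperation. cancel() only flips a flag and has
// no effect once the operation has finished, so calling it repeatedly or on
// a finished operation is harmless.
class Operation {
 public:
  void cancel() {
    if (!finished_.load()) cancelled_.store(true);
  }
  void finish() { finished_.store(true); }
  bool isCancelled() const { return cancelled_.load(); }
  bool isFinished() const { return finished_.load(); }

 private:
  std::atomic<bool> cancelled_{false};
  std::atomic<bool> finished_{false};
};

// Hypothetical stand-in for NSOperationQueue.
class OperationQueue {
 public:
  void add(std::shared_ptr<Operation> op) { ops_.push_back(std::move(op)); }

  // Models cancelAllOperations: call cancel() on every queued operation.
  void cancelAll() {
    for (auto &op : ops_) op->cancel();
  }

  // Models the loop in the diff: check the status first, then cancel.
  void cancelAllWithChecks() {
    for (auto &op : ops_) {
      if (!op->isCancelled() && !op->isFinished()) op->cancel();
    }
  }

 private:
  std::vector<std::shared_ptr<Operation>> ops_;
};
```

In this model the extra checks only skip calls that would have been no-ops anyway.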

I believe the crash is happening inside one of the operations being cancelled, and that is why the crash reporter attributes it to cancelAllOperations.

Is it possible that, even though the relevant code is already meant to run only on the main thread or on a single other thread (e.g. the JS thread), the operation object is still shared across multiple threads because it is referenced by pointer, which would lead to the situation you describe?

cipolleschi · 2024-12-20T12:14:46Z

packages/react-native/React/CxxBridge/RCTCxxBridge.mm

-  if (_didInvalidate) {
-    return;
-  }
+  RCTUnsafeExecuteOnMainQueueSync(^{


This could potentially deadlock. We should not use the unsafe variant of this method. Can you replace it with RCTExecuteOnMainQueue?
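The following C++ model illustrates why the asynchronous variant is safer. It is not React Native code. It assumes RCTExecuteOnMainQueue runs the block inline when the caller is already on the main queue and otherwise enqueues it asynchronously, while the unsafe sync variant blocks the caller until the main queue has run the block. If the main thread is itself waiting on the calling thread, that second wait never ends. `SerialQueue` and `executeOnQueue` are hypothetical names, and the worker thread stands in for the main thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical serial queue backed by one worker thread; it plays the role
// of the main queue in this model.
class SerialQueue {
 public:
  SerialQueue() : worker_([this] { run(); }) {}
  ~SerialQueue() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();  // remaining tasks are drained before the thread exits
  }

  bool isCurrent() const { return std::this_thread::get_id() == worker_.get_id(); }

  void post(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push_back(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and fully drained
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so the members above exist before it starts
};

// Models RCTExecuteOnMainQueue: run inline when already on the queue,
// otherwise enqueue and return immediately. The caller never waits for the
// block to run, so it cannot deadlock against a queue that is waiting on it.
void executeOnQueue(SerialQueue &queue, std::function<void()> block) {
  if (queue.isCurrent()) {
    block();
  } else {
    queue.post(std::move(block));
  }
}
```

The trade-off of the asynchronous path is that code after the call can no longer assume the block has already run.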

cipolleschi · 2024-12-20T12:15:45Z

packages/react-native/React/CxxBridge/RCTCxxBridge.mm


-  RCTAssertMainQueue();
-  RCTLogInfo(@"Invalidating %@ (parent: %@, executor: %@)", self, _parentBridge, [self executorClass]);
+    RCTAssertMainQueue();


One thing that confuses me: in your stack trace, the crash happens on Thread 26, but this assert should guarantee that the code runs on the main thread, which is not Thread 26. How is this possible?

@cipolleschi Good question. That's because RCTAssertMainQueue only takes effect in dev builds; in release builds it does nothing.

That's why I added the RCTUnsafeExecuteOnMainQueueSync wrapper: to ensure the code runs on the main thread.

As you can see, when `NS_BLOCK_ASSERTIONS` is defined, the `RCTAssert` macro expands to nothing, and `NS_BLOCK_ASSERTIONS` is generally defined in release builds.
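The effect described here applies to any assertion macro that a build flag compiles out. The sketch below is a C++ model, not the actual `RCTAssert` definition. `SKETCH_ASSERT` and `SKETCH_BLOCK_ASSERTIONS` are hypothetical names that play the roles of `RCTAssert` and `NS_BLOCK_ASSERTIONS`, and the flag is defined at the top to simulate a release build. `invalidate` is a hypothetical stand-in for a method guarded by `RCTAssertMainQueue()`.

```cpp
#include <cassert>
#include <stdexcept>

// Simulate a release build. Without this line, the "debug" branch below is
// used and a failed check throws.
#define SKETCH_BLOCK_ASSERTIONS 1

#ifdef SKETCH_BLOCK_ASSERTIONS
// Release: the macro does nothing; sizeof keeps `cond` unevaluated.
#define SKETCH_ASSERT(cond) ((void)sizeof(cond))
#else
// Debug: a failed check stops execution (modeled here by throwing).
#define SKETCH_ASSERT(cond) \
  do { if (!(cond)) throw std::logic_error("assertion failed: " #cond); } while (0)
#endif

// Hypothetical stand-in for a method guarded by a main-thread assertion.
// Returns true if execution got past the guard.
bool invalidate(bool onMainThread) {
  SKETCH_ASSERT(onMainThread);
  // Teardown work would run here, on whatever thread called this function.
  return true;
}
```

In the simulated release configuration, a call from the wrong thread passes the guard silently, and the teardown runs on that thread.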

That's a good explanation, but then we should see the assertion crash the app in development. And IIUC, the app does not crash in development, right?

Looking at the crash log, the JS thread is triggering the invalidation. I think this is the root of the problem: after the JS thread detects the invalidation, we should jump to the UI thread to invalidate everything...

I haven't encountered this crash in development, but I can't rule out that it happens there too; it's an intermittent issue in itself.

zhouzh1 · 2024-12-20T12:59:07Z

Hey @augustl, sorry, I thought you were a member of the react-native dev team, because your username appeared in the suggestion list when I typed @ in the comment box. 😸

zhouzh1 · 2024-12-20T13:23:17Z

@cipolleschi Here is more information for your reference:

1. I have only observed this crash in production, via Sentry; I have not seen it in dev.
2. According to the Sentry breadcrumbs, most of the events seem to have happened after the app was moved to the background.
3. You can see the relevant error messages collected by Sentry in the screenshot below.
4. Here is a complete crash report: [79c34beca2e542da9ab6df2ad54ec60d-symbolicated.crash.zip](https://github.com/user-attachments/files/18211158/79c34beca2e542da9ab6df2ad54ec60d-symbolicated.crash.zip)

RSNara · 2024-12-20T15:12:05Z

@zhouzh1, this diff assumes that the crash occurs because file reader invalidation should occur on the main thread. Have we validated that assumption? If not, we should! If so, then why not just surgically modify the file reader instead of the bridge? As it stands, your changes will cause a lot of code to execute on the main thread, which could have significant performance implications.

zhouzh1 · 2024-12-21T12:41:00Z

@RSNara As you can see from the code and the conversation above with @cipolleschi, there is already an RCTAssertMainQueue() call at the start of the invalidation, which suggests the original intention was for this code to run on the main queue. Perhaps the team simply didn't notice that RCTAssertMainQueue has no effect in most release builds (this is just my guess; if the team knew this and did it intentionally, please ignore it).

RSNara · 2024-12-23T18:47:57Z

I just took a look!

RCTCxxBridge's invalidate should only ever be called from RCTBridge, and RCTBridge's invalidate and reload (but not dealloc) schedule RCTCxxBridge's invalidate on the main thread:

https://github.com/facebook/react-native/blob/main/packages/react-native/React/Base/RCTBridge.mm?fbclid=IwZXh0bgNhZW0CMTEAAR0vNpP3fC5jmfsIbD3EGeU8ynLhumgdGU2JgStLpsMNIT8VNT9PqeT3N2I_aem__qaj5Hd0GAQmvSVLs7dQZA#L368

So it's very curious that you're running into this issue. Could it be that your code relies on RCTBridge's dealloc method, which synchronously deallocates the RCTCxxBridge on the current (i.e., potentially non-main) thread?
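This hypothesis rests on the fact that, under ARC, dealloc runs synchronously on whichever thread performs the final release. The same ownership rule can be sketched in C++ with `std::shared_ptr`, whose managed object is destroyed on the thread that drops the last reference. `Bridge` and `releaseOnBackgroundThread` are hypothetical names; the sketch models the hypothesis and is not RCTBridge code.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical stand-in for an object whose teardown ought to happen on the
// main thread. It records which thread actually ran its destructor.
struct Bridge {
  explicit Bridge(std::thread::id *deallocThread) : deallocThread_(deallocThread) {}
  ~Bridge() { *deallocThread_ = std::this_thread::get_id(); }
  std::thread::id *deallocThread_;
};

// Drops a reference to `bridge` on a background thread. If that was the last
// reference, ~Bridge() runs right there, on the background thread, rather than
// on the thread that created the object.
void releaseOnBackgroundThread(std::shared_ptr<Bridge> bridge) {
  std::thread worker([b = std::move(bridge)]() mutable { b.reset(); });
  worker.join();
}
```

If the caller still held another reference, the destructor would instead run wherever that last reference is eventually released.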

zhouzh1 · 2024-12-24T02:42:57Z

I share your curiosity. Before submitting this PR, I also suspected that somewhere in my code or in third-party library code, RCTBridge or RCTCxxBridge deallocation was being invoked explicitly, but I couldn't find it. Either way, we always need to ensure that RCTCxxBridge invalidation runs on the main thread, right? If so, doesn't it make sense to wrap it in RCTExecuteOnMainQueue?

cipolleschi · 2024-12-24T10:36:05Z

@zhouzh1 is your app using Expo? I wonder if Expo does something under the hood to manage the lifecycle of the Bridge. Similarly, some libraries might attempt to do the same. If they wire a private API, such as reload, to a JS function, the invalidation process might start from the JS thread instead of the main thread. 🤔

This is a hypothesis that we need to validate, though. It would be helpful to know which dependencies you are using. Also, are you using a crash reporting solution like Sentry? Those products usually let you leave breadcrumbs that can be used to investigate crashes. What was the user doing in the app when the crash occurred? What was the last action issued?

zhouzh1 · 2024-12-24T11:05:14Z

Yes, our app uses Expo and many of its associated libraries (e.g. expo-updates, expo-camera, and so on), so I think what you said makes sense.
According to the Sentry breadcrumbs we collected, most of the crash events occurred after the app was moved to the background, while some requests were still pending.

zhouzh1 · 2024-12-25T10:43:28Z

@cipolleschi Any ideas about the above information I provided?

fix: iOS app crash caused by canceling request operations

7181c31

facebook-github-bot added the CLA Signed and Shared with Meta labels Dec 20, 2024

cipolleschi requested changes Dec 20, 2024

refactor: use the RCTExecuteOnMainQueue instead of its unsafe variant

dc302f3
