Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

application_start event is not sent for Datadog sdk version >= 2.6.0 #2041

Open
saad352 opened this issue Sep 9, 2024 · 16 comments
Open

application_start event is not sent for Datadog sdk version >= 2.6.0 #2041

saad352 opened this issue Sep 9, 2024 · 16 comments
Labels
bug Something isn't working

Comments

@saad352
Copy link

saad352 commented Sep 9, 2024

Describe the bug

Datadog SDK sends action of type application_start. This is working fine until the version 2.5.1 but this event is not being sent from version 2.6.0 onwards.

More info
I have scanned and tried to debug through the sdk code. The data for application_start seems to be gathered properly and written using FileWriter but I don't see data in the RUM Web dashboard. So my guess is that it is being collected but not sent. More over I see this code change to send N batches sequentially. Not sure but could be related to this. My reasoning for pointing to this PR is that we are only missing the first event which is application_start, the rest of the events I see are fine. So may be it is missing the first event and sending the later ones just fine? Or may be something else.

Reproduction steps

Use ddsdk version 2.5.1 and we can see the action application_start but when bumping to 2.6.0 the event does not seems to be sent.

SDK logs

N/A

Expected behavior

application_start event should be seen in the RUM dashboard.

Affected SDK versions

>= 2.6.0

Latest working SDK version

2.5.1

Did you confirm if the latest SDK version fixes the bug?

Yes, it does not fix the issue.

Integration Methods

Cocoapods

Xcode Version

15.4

Swift Version

swiftlang-5.10.0.13

MacOS Version

No response

Deployment Target

iOS 17

Device Information

All

Other relevant information

No response

@saad352 saad352 added the bug Something isn't working label Sep 9, 2024
@mariedm
Copy link
Contributor

mariedm commented Sep 10, 2024

Hi @saad352, thank you for reporting this issue. Just to confirm, are you experiencing this on the latest version (2.16) as well?

@saad352
Copy link
Author

saad352 commented Sep 10, 2024

Hey @mariedm,
Yes it is also occurring on latest version. 2.16.0. The last working version is 2.5.1

@mariedm
Copy link
Contributor

mariedm commented Sep 11, 2024

Hi @saad352, thank you for your response. I tested it on my side and can see the application_start event in the RUM dashboard. Could you share a code sample so I can reproduce the issue? Also, are you making any other changes in your code when testing with versions 2.5.1 and 2.16.0?

@mariedm mariedm added the awaiting response Waiting for response / confirmation from the reporter label Sep 11, 2024
@saad352
Copy link
Author

saad352 commented Sep 12, 2024

Hi @mariedm,
I have tried to reproduce the issue in the sample code base but the issue is not happening in the sample code base with 2.16.0. I have copied pasted the same code in our application and it does not send the application_start event with version 2.16.0 but it does send the same event with version 2.5.1.

Below is the the code which I am using in both.

Datadog.initialize(
        with: Datadog.Configuration(
            clientToken: clientToken,
            env: environment,
            site: .us1
        ),
        trackingConsent: .granted
    )

  RUM.enable(
      with: RUM.Configuration(
          applicationID: appID,
          uiKitViewsPredicate: DefaultUIKitRUMViewsPredicate(),
          uiKitActionsPredicate: DefaultUIKitRUMActionsPredicate()
      )
  )
 

Could there be something else which could be interfering with this in our codebase?

Also, are you making any other changes in your code when testing with versions 2.5.1 and 2.16.0? "

No, we are not making any changes. Just bumping the version.

@saad352
Copy link
Author

saad352 commented Sep 12, 2024

With the above new findings that issue is only happening in our codebase, I again checked the changes between 2.5.1(good) and 2.6.0(bad) and I see that there was swizzling introduced in that. I think some other framework in our code base is swizzling maybe same thing and that could cause the issue. I am trying to look more on what is that method which is being swizzled. If you know please share that.

@mariedm
Copy link
Contributor

mariedm commented Sep 12, 2024

Hi @saad352,

Thank you for the additional information and for debugging on your side, this is very appreciated.

Could you enable debug logs to see if any error is reported by the SDK, please?
Datadog.verbosityLevel = .debug
RUMMonitor.shared().debug = true

Something else you could try is to set a breakpoint when the application_start event is being sent here, and verify that it's being triggered.

@saad352
Copy link
Author

saad352 commented Sep 13, 2024

Hi @mariedm,
No, I don't see any error reported by SDK.

Something else you could try is to set a breakpoint when the application_start event is being sent here, and verify that it's being triggered.

Yes, this part is being triggered and I can see the values.

Do you think if swizzling of ViewController lifecycle events could be an issue here?

@mariedm
Copy link
Contributor

mariedm commented Sep 13, 2024

I don't think swizzling has anything to do with it. The application_start event is triggered when enabling RUM, which is independant of any ViewController lifecycle event. Additionally, you've confirmed the event is collected and written.

Given that you are not facing this issue with a code sample, it seems to be specific to your app. Without your code, it's difficult for us to investigate further.

You might also try filtering actions by type using the search bar with action.type:application_start.

image

@dcacenabes
Copy link

Hi @mariedm ,
I'm pairing with @saad352 on this one. We have identified the commit that broke this action for us:
d622d2182043fa74801e862eab25c3c6577d0780
After playing around with the changes introduced in this PR, we found that this is the code that makes the application_start action disappear from our RUM:

// Start "ApplicationLaunch" view immediatelly:
startApplicationLaunchView(on: command, context: context, writer: writer)

Commenting that one line out makes the action appear again in Datadog RUM.

@maxep
Copy link
Member

maxep commented Sep 19, 2024

Hi @dcacenabes , @saad352 👋

Thank you very much for investigating further. I would be super helpful if you had a sample project that reproduce the issue, I understand it's a big request but would be quicker for us to troubleshoot. If that's not possible, I would kindly ask if you could do further debugging.

Check the app start propagation; make sure the application launch view is properly created by setting breakpoints:
- DatadogRUM/Sources/RUMMonitor/Scopes/RUMApplicationScope.swift#L235
- DatadogRUM/Sources/RUMMonitor/Scopes/RUMApplicationScope.swift#L242
- DatadogRUM/Sources/RUMMonitor/Scopes/RUMSessionScope.swift#L254
- DatadogRUM/Sources/RUMMonitor/Scopes/RUMViewScope.swift#L158

I'm interested to know where and why the propagation fails.

The fact that it works with a vanilla project suggests that the SDK init flow is different in your prod app. Where do you init the SDK? is it called from the app delegate/SwiftUI.App or later during the app process?

Thank you again for your help 🙏

@dcacenabes
Copy link

dcacenabes commented Sep 19, 2024

Hi @maxep ,
Thanks for chipping in!

The fact that it works with a vanilla project suggests that the SDK init flow is different in your prod app. Where do you init the SDK? is it called from the app delegate/SwiftUI.App or later during the app process?

We are initializing the SDK in a call from

  func application(
    _: UIApplication,
    didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?
  ) -> Bool

We're doing it like this:

Datadog.initialize(
      with: .init(
        clientToken: ...,
        env: ...,
        service: ...
      ),
      trackingConsent: ...
    )

    RUM.enable(with: ...)
    Trace.enable(with: ...)
    Logs.enable()

Check the app start propagation; make sure the application launch view is properly created by setting breakpoints:

On it! 🏃 🕵️ Will share my learnings later.

Thank you very much for investigating further. I would be super helpful if you had a sample project that reproduce the issue, I understand it's a big request but would be quicker for us to troubleshoot. If that's not possible, I would kindly ask if you could do further debugging.

We are already working on trimming our project to the bare minimum that reproduces the issue, but it is a complex project 🙏

@dcacenabes
Copy link

@maxep I have made some progress on debugging the issue. Although I have not yet figured out exactly where the error is, I have narrowed it down a lot, so I wanted to share my findings here already in case they can be of help.
The issue seems to be related with trackingConsent. There seems to be a race condition somewhere in the DD SDK code.

  1. We initialize the Datadog SDK with trackingConsent: .pending
  2. Very shortly after initializing the SDK with tracking consent .granted
  3. Datadog SDK calls the migrateUnauthorizedData(toConsent:)
  4. the application_start action is then written into the unauthorized directory. <- Here's the problem.
  5. Since the migrateUnauthorizedData had already been called, the file that contains application_start action stays in the unauthorized directory and never gets migrated, and because of that it never gets sent.

When we comment out the code:

// Start "ApplicationLaunch" view immediatelly:
startApplicationLaunchView(on: command, context: context, writer: writer)

Then the action is written to the right directory.

@dcacenabes
Copy link

I can consistently reproduce it with a sample project.

You simply need to do this in your App Delegate:

func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
    Datadog.initialize(
      with: Datadog.Configuration(
        clientToken: "token",
        env: "stage",
        service: "service-name"
      ),
      trackingConsent: .pending
    )
    RUM.enable(
      with: RUM.Configuration(
        applicationID: "app-id",
        uiKitViewsPredicate: DefaultUIKitRUMViewsPredicate(),
        uiKitActionsPredicate: DefaultUIKitRUMActionsPredicate(),
        urlSessionTracking: RUM.Configuration.URLSessionTracking()
      )
    )
    RUMMonitor.shared().addAttribute(forKey: "build", value: "research-app-start")
    // This one works
//    DispatchQueue.main.async {
//      Datadog.set(trackingConsent: .granted)
//    }
    // This one does not work
     Datadog.set(trackingConsent: .granted)

    return true
  }

@maxep
Copy link
Member

maxep commented Sep 20, 2024

Hey @dcacenabes 👋

That's a very interesting finding, thank you very much 🙏

We have now steps to reproduce and we will investigate further on our side, it looks indeed like an issue of the SDK.
As a workaround, could you set the trackingConsent decision as a SDK init parameter? or set it before enabling RUM?

I hope this helps. Thank you again for the detailed troubleshooting!

@maxep maxep removed the awaiting response Waiting for response / confirmation from the reporter label Sep 20, 2024
@dcacenabes
Copy link

As a workaround, could you set the trackingConsent decision as a SDK init parameter? or set it before enabling RUM?

We're using reactive programming and the tracking consent is being set from a stream, so we can't pass it from the beginning in an easy way. The workaround for us is changing:

.receive(on: ExecutionContext.immediateOnMain)

to

.receive(on: ExecutionContext.main)

This workaround is enough to get the application_start submitted to datadog, but this has now unveiled another issue.
We are setting a few attributes to the RUM by using:
RUMMonitor.shared().addAttribute(forKey: "key", value: "value")
These attributes are correctly set for everything we send except for the application_start action.
We can't call RUMMonitor.shared() before calling RUM.enable(with:), and RUM.enable is the one creating the application_start action.
So the current state after the latest findings for us is:

  1. Introducing a delay allows us to successfully submit the application_start event
  2. the submitted application_start event is missing the custom attributes we add by using RUMMonitor.shared().addAttribute(forKey: "key", value: "value"), so we can't correctly filter the data in the Datadog RUM web.

@mariedm
Copy link
Contributor

mariedm commented Nov 8, 2024

Hi @dcacenabes, thank you for the thorough investigation and detailed report; it's very helpful. We addressed the race condition in PR #2063, which will be included in our next release. Keep an eye out for the update!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants