Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Posthog ruby does not work in background job systems like Sidekiq and Resque #10

Open
ajsharp opened this issue Sep 12, 2021 · 10 comments

Comments

@ajsharp
Copy link
Contributor

ajsharp commented Sep 12, 2021

Problem

The posthog ruby client uses a background thread to send requests to posthog. This doesn't work in Resque or background job systems, because Resque doesn't wait for background threads to complete before completing a job.

I see this line logged, but the request is never finished.

Currently, the Client#flush method exists, but it has an arbitrary 0.1 second sleep call in it -- not ideal for a non-exit scenario.

Temporary Solution

To get posthog to send events, I had to monkeypatch PostHog::Client to run the worker in the current thread, and PostHog::Worker to not run in a continuous loop, which would block the main thread.

This is my local monkey patched code:

class PostHog::Client
  def ensure_worker_running
    @worker_mutex.synchronize { @worker.run_sync }
  end
end

class PostHog::Worker
  def run_sync
    return if @queue.empty?

    @lock.synchronize do
      consume_message_from_queue! until @batch.full? || @queue.empty?
    end

    res = @transport.send @api_key, @batch
    @on_error.call(res.status, res.error) unless res.status == 200

    @lock.synchronize { @batch.clear }
  end
end

Suggested Solution

Ideally, you could configure the posthog client to do some version of the above at runtime. This way you could allow Posthog to run async if it's running in an app server context, and synchronously in a background worker context. Or, even better would be direct support for Resque or Sidekiq. There are other libraries that do something similar.

@wuputah
Copy link

wuputah commented May 19, 2022

Interesting. I did not expect the client to be sending work into a background thread for me. I used ActiveJob reflexively to move the work into the background, which worked fine as I am just using the thread pool back-end at the moment. An async/sync option would be preferred, and please add this to the docs.

Edit: Ah, from the docs:

It uses an internal queue to make calls fast and non-blocking. It also batches requests and flushes asynchronously, making it perfect to use in any part of your web app or other server-side application that needs performance.

@itadventurer
Copy link

It seems like the PostHog::Worker was renamed to PostHog::SendWorker in one of the previous releases. So, here is the updated Monkeypatch:

class PostHog::Client
  def ensure_worker_running
    @worker_mutex.synchronize { @worker.run_sync }
  end
end

class PostHog::SendWorker
  def run_sync
    return if @queue.empty?

    @lock.synchronize do
      consume_message_from_queue! until @batch.full? || @queue.empty?
    end

    res = @transport.send @api_key, @batch
    @on_error.call(res.status, res.error) unless res.status == 200

    @lock.synchronize { @batch.clear }
  end
end

@jetaggart
Copy link

This is biting me as well. We have a lot of out of band conversion events via webhooks/polling that we want to track. The built in non-blocking queue is not working correctly within resque, even with the monkey patches applied (some events make it, some don't).

Is there any progress/updates on this?

@stefanoverna
Copy link

stefanoverna commented Oct 23, 2023

This is a client wrapper that sends events synchronously if it is being called inside a Sidekiq worker:

  class SidekiqCompatiblePosthogClient
    def collect_events
      return unless client

      @inside_block = true

      if Sidekiq.server?
        @collected_events = []

        yield

        if @collected_events.any?
          # Posthog client does not work well in Sidekiq, so we're sending
          # the calls syncronously:  https://github.com/PostHog/posthog-ruby/issues/10

          response = PostHog::Transport.new(api_host: ENV["POSTHOG_HOST"]).send(
            ENV["POSTHOG_API_KEY"],
            @collected_events,
          )

          unless response.status == 200
            Rollbar.error(Exception.new("Posthog error: #{response.status}, #{response.error}"))
          end
        end
      else
        # Posthog client takes care of sending the events in a parallel thread
        yield
      end

      @inside_block = false
    end

    def group_identify(attrs)
      @inside_block or
        raise ArgumentError, "group_identify must be called within a #collect_events block"

      if Sidekiq.server?
        client.symbolize_keys!(attrs)
        add_to_collected_events(PostHog::FieldParser.parse_for_group_identify(attrs))
      else
        client.group_identify(attrs)
      end
    end

    def capture(attrs)
      @inside_block or
        raise ArgumentError, "capture must be called within a #collect_events block"

      if Sidekiq.server?
        client.symbolize_keys!(attrs)
        add_to_collected_events(PostHog::FieldParser.parse_for_capture(attrs))
      else
        client.capture(attrs)
      end
    end

    private

    def add_to_collected_events(action)
      action[:messageId] ||= client.uid
      @collected_events << action
    end

    def client
      # Initialize your regular PostHog client here
    end
  end

Usage:

client = SidekiqCompatiblePosthogClient.new

client.collect_events do
  client.capture(...)
end

@jclusso
Copy link
Contributor

jclusso commented Feb 6, 2024

Are there any plans to fix this? It's been open for 2+ years...

@pauldambra
Copy link
Member

A few of us at PostHog have written Ruby in the past but none of us recently

We already swap out the worker in tests

@worker = if opts[:test_mode]
        NoopWorker.new(@queue)
      else
        SendWorker.new(@queue, @api_key, opts)
      end

So, I guess a solution here would be to have a MainThreadWorker that we swap in based on config...

This is, however, just a long way of saying - you all know Ruby better than us and PRs are always welcome 😊

@neilkakkar
Copy link
Contributor

To add, if you want to see a reference, we have sync_mode in posthog-python that runs everything in the main thread, allowing it to be used in background job systems: https://github.com/PostHog/posthog-python/blob/a5e8b7d7fb9f5ae0e9fc3395d2bfefa73964e1fe/posthog/client.py#L44

@jclusso
Copy link
Contributor

jclusso commented Feb 7, 2024

I or someone on my team is going to work on a PR for this since it's essential that this works so we can use your paid service.

Additionally, as someone that operates a SaaS that has libraries in other languages, I understand it may not be the easiest thing to mantain when it's nobody's primary language.

That being said, it's rather alarming that this issue has been open for years and nobody internally has even proposed a PR to solve it for the community to help review (if that is necessary). It's not some wierd edge situation that nobody will have. In my opinion it's a pretty major probelm because it will create phantom data integrity issues for people that don't realize.

@jetaggart
Copy link

jetaggart commented Feb 9, 2024

FWIW, it's really unexpected to me that the library is doing queuing for me, I'm not convinced or really trust that your implementation won't cause memory leaks/perf issues/etc. I'd expect a simple api client only that I can use to talk to PH, leaving queuing concerns to the us. It'd also be way easier for you all to maintain!

We ended up removing the library all together and just making the http requests ourselves.

@jclusso
Copy link
Contributor

jclusso commented Feb 9, 2024

We ended up removing the library all together and just making the http requests ourselves.

We've decided on the same thing. After looking at this library more I realized it's not worth the headache for the same reasons. Also, it's some sort of forked version of analytics-ruby that hasn't been kept up to date with all the issues they've solved over there.

I might share a gist or make a simple gem once our implementation is done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants