-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Posthog ruby does not work in background job systems like Sidekiq and Resque #10
Comments
Interesting. I did not expect the client to be sending work into a background thread for me. I used ActiveJob reflexively to move the work into the background, which worked fine as I am just using the thread pool back-end at the moment. An async/sync option would be preferred, and please add this to the docs. Edit: Ah, from the docs:
|
It seems like the class PostHog::Client
def ensure_worker_running
@worker_mutex.synchronize { @worker.run_sync }
end
end
class PostHog::SendWorker
def run_sync
return if @queue.empty?
@lock.synchronize do
consume_message_from_queue! until @batch.full? || @queue.empty?
end
res = @transport.send @api_key, @batch
@on_error.call(res.status, res.error) unless res.status == 200
@lock.synchronize { @batch.clear }
end
end |
This is biting me as well. We have a lot of out of band conversion events via webhooks/polling that we want to track. The built in non-blocking queue is not working correctly within resque, even with the monkey patches applied (some events make it, some don't). Is there any progress/updates on this? |
This is a client wrapper that sends events synchronously if it is being called inside a Sidekiq worker: class SidekiqCompatiblePosthogClient
def collect_events
return unless client
@inside_block = true
if Sidekiq.server?
@collected_events = []
yield
if @collected_events.any?
# Posthog client does not work well in Sidekiq, so we're sending
# the calls syncronously: https://github.com/PostHog/posthog-ruby/issues/10
response = PostHog::Transport.new(api_host: ENV["POSTHOG_HOST"]).send(
ENV["POSTHOG_API_KEY"],
@collected_events,
)
unless response.status == 200
Rollbar.error(Exception.new("Posthog error: #{response.status}, #{response.error}"))
end
end
else
# Posthog client takes care of sending the events in a parallel thread
yield
end
@inside_block = false
end
def group_identify(attrs)
@inside_block or
raise ArgumentError, "group_identify must be called within a #collect_events block"
if Sidekiq.server?
client.symbolize_keys!(attrs)
add_to_collected_events(PostHog::FieldParser.parse_for_group_identify(attrs))
else
client.group_identify(attrs)
end
end
def capture(attrs)
@inside_block or
raise ArgumentError, "capture must be called within a #collect_events block"
if Sidekiq.server?
client.symbolize_keys!(attrs)
add_to_collected_events(PostHog::FieldParser.parse_for_capture(attrs))
else
client.capture(attrs)
end
end
private
def add_to_collected_events(action)
action[:messageId] ||= client.uid
@collected_events << action
end
def client
# Initialize your regular PostHog client here
end
end Usage: client = SidekiqCompatiblePosthogClient.new
client.collect_events do
client.capture(...)
end |
Are there any plans to fix this? It's been open for 2+ years... |
A few of us at PostHog have written Ruby in the past but none of us recently We already swap out the worker in tests
So, I guess a solution here would be to have a This is, however, just a long way of saying - you all know Ruby better than us and PRs are always welcome 😊 |
To add, if you want to see a reference, we have sync_mode in posthog-python that runs everything in the main thread, allowing it to be used in background job systems: https://github.com/PostHog/posthog-python/blob/a5e8b7d7fb9f5ae0e9fc3395d2bfefa73964e1fe/posthog/client.py#L44 |
I or someone on my team is going to work on a PR for this since it's essential that this works so we can use your paid service. Additionally, as someone that operates a SaaS that has libraries in other languages, I understand it may not be the easiest thing to mantain when it's nobody's primary language. That being said, it's rather alarming that this issue has been open for years and nobody internally has even proposed a PR to solve it for the community to help review (if that is necessary). It's not some wierd edge situation that nobody will have. In my opinion it's a pretty major probelm because it will create phantom data integrity issues for people that don't realize. |
FWIW, it's really unexpected to me that the library is doing queuing for me, I'm not convinced or really trust that your implementation won't cause memory leaks/perf issues/etc. I'd expect a simple api client only that I can use to talk to PH, leaving queuing concerns to the us. It'd also be way easier for you all to maintain! We ended up removing the library all together and just making the http requests ourselves. |
We've decided on the same thing. After looking at this library more I realized it's not worth the headache for the same reasons. Also, it's some sort of forked version of analytics-ruby that hasn't been kept up to date with all the issues they've solved over there. I might share a gist or make a simple gem once our implementation is done. |
Problem
The posthog ruby client uses a background thread to send requests to posthog. This doesn't work in Resque or background job systems, because Resque doesn't wait for background threads to complete before completing a job.
I see this line logged, but the request is never finished.
Currently, the
Client#flush
method exists, but it has an arbitrary 0.1 secondsleep
call in it -- not ideal for a non-exit scenario.Temporary Solution
To get posthog to send events, I had to monkeypatch
PostHog::Client
to run the worker in the current thread, andPostHog::Worker
to not run in a continuous loop, which would block the main thread.This is my local monkey patched code:
Suggested Solution
Ideally, you could configure the posthog client to do some version of the above at runtime. This way you could allow Posthog to run async if it's running in an app server context, and synchronously in a background worker context. Or, even better would be direct support for Resque or Sidekiq. There are other libraries that do something similar.
The text was updated successfully, but these errors were encountered: