You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Mar 31, 2021. It is now read-only.
I'm unsure how to approach this so I thought I might put it up here for some discussion.
Some background:
I'm using the snowplow-nodejs-tracker in an AWS Lambda environment to send events. I've detected an issue where some requests are not being sent out every once in a while (rarely).
I think I've narrowed it down to an interplay on how AWS Lambda's operating environment works, and the async nature of the emitter in snowplow-nodejs-tracker.
A quick primer on how AWS Lambda runs your code. When a request to run your function comes in, it spins up an instance of your function, and runs your code. When your function finishes execution, it puts the instance into a kind of sleep mode where your code is no longer running, but all objects in memory are kept intact waiting for the next execution run. Future executions can then take advantage of a speed boost because it does not have to re-initialize an instance of the function. This idea is called a cold/hot start in Lambda terminology.
The issue comes in how Lambda decides what 'finishes execution' means. There is 2 interface for your code to execute within Lambda, a NodeJS style callback function (i.e. module.exports = (event, context, callback) => { callback(null, 'success') }) or an 'async/await' style, available for Node 8.10, where you return a promise in your handler (i.e. module.exports = async (event, context) => { return 'ok' }
In the callback style, when callback is invoked, Lambda will not immediately end the function execution, but rather wait for NodeJs' Event Loop to drain before ending your method. This behaviour is controllable via a property (callbackWaitsForEmptyEventLoop) on the context object give to you by AWS, where a value of false would immediately end execution on the function. The default is true.
In the promise style, the behaviour is to terminate as soon as the return Promise resolves/rejects. Basically acts as if callbackWaitsForEmptyEventLoop is false.
The issue:
How this interplays with snowplow-nodejs-tracker's emitter is that, because the events are flushed asynchronously, there's a chance that the request has not been sent out before the function execution is terminated, even if you were to call flush.
Due to some interplay with how some other libraries work (namely some libraries like sequelize (an ORM) assume a long running nodejs environment and create some infinitely running setIntervals that manage its connection pool), its not always possible to set the callbackWaitsForEmptyEventLoop to true.
It would be a rather big gotcha to anybody who wants to use NodeJS Lambda to process snowplow events, which, given how well Lambda otherwise fits with the snowplow architecture, would be a shame.
I've not really seen a very good solution to this problem as there aren't a lot of libraries that especially cater for the Lambda operating environment. My experience so far with various libraries is to write hacks and work-arounds to make sure they properly clean themselves up when I want the function to end.
My proposal:
I was going to approach this by making the flush method return a Promise that signifies that a success/unsuccessful attempt to flush the events. Would this be something that would be accepted? The return type of flush is currently void so its technically a breaking change, but I'm thinking people aren't actually depending on that(?). Alternatively I could add another method called flushAsPromised (I'm not good at naming methods) that has the correct behaviour.
Alternatively, if this is not a good idea, could I get some guidance on whether or not the the Emitter interface is considered public? I would then implement my own Emitter thats more amenable to the Lambda execution environment.
The only other tracker I've used is the snowplow-ruby-tracker which differentiates between AsyncEmitter and Emitter. It looks like the Emitter class in this project is more akin to the AsyncEmitter in snowplow-ruby-tracker.
Thanks for reading. What do the maintainers think of this? Happy to chat more about this as well.
The text was updated successfully, but these errors were encountered:
clark-pan
changed the title
Add support synchronous flushing of events
Add support for results of flushing of events
Feb 13, 2019
This sounds like a nice improvement to me and I like the idea of improving the Lambda support of this library.
I'm planning on doing some tidy up / modernisation of this tracker for 0.4.0, but I can see this being a nice feature to drop in for 0.5.0.
I'm unsure how to approach this so I thought I might put it up here for some discussion.
Some background:
I'm using the
snowplow-nodejs-tracker
in an AWS Lambda environment to send events. I've detected an issue where some requests are not being sent out every once in a while (rarely).I think I've narrowed it down to an interplay on how AWS Lambda's operating environment works, and the async nature of the emitter in
snowplow-nodejs-tracker
.A quick primer on how AWS Lambda runs your code. When a request to run your function comes in, it spins up an instance of your function, and runs your code. When your function finishes execution, it puts the instance into a kind of
sleep mode
where your code is no longer running, but all objects in memory are kept intact waiting for the next execution run. Future executions can then take advantage of a speed boost because it does not have to re-initialize an instance of the function. This idea is called a cold/hot start in Lambda terminology.The issue comes in how Lambda decides what 'finishes execution' means. There is 2 interface for your code to execute within Lambda, a NodeJS style callback function (i.e.
module.exports = (event, context, callback) => { callback(null, 'success') }
) or an 'async/await' style, available for Node 8.10, where you return a promise in your handler (i.e.module.exports = async (event, context) => { return 'ok' }
In the
callback
style, whencallback
is invoked, Lambda will not immediately end the function execution, but rather wait for NodeJs' Event Loop to drain before ending your method. This behaviour is controllable via a property (callbackWaitsForEmptyEventLoop
) on thecontext
object give to you by AWS, where a value offalse
would immediately end execution on the function. The default istrue
.In the
promise
style, the behaviour is to terminate as soon as the returnPromise
resolves/rejects. Basically acts as ifcallbackWaitsForEmptyEventLoop
isfalse
.The issue:
How this interplays with
snowplow-nodejs-tracker
's emitter is that, because the events are flushed asynchronously, there's a chance that the request has not been sent out before the function execution is terminated, even if you were to callflush
.Due to some interplay with how some other libraries work (namely some libraries like
sequelize
(an ORM) assume a long running nodejs environment and create some infinitely runningsetIntervals
that manage its connection pool), its not always possible to set thecallbackWaitsForEmptyEventLoop
totrue
.It would be a rather big gotcha to anybody who wants to use NodeJS Lambda to process snowplow events, which, given how well Lambda otherwise fits with the snowplow architecture, would be a shame.
I've not really seen a very good solution to this problem as there aren't a lot of libraries that especially cater for the Lambda operating environment. My experience so far with various libraries is to write hacks and work-arounds to make sure they properly clean themselves up when I want the function to end.
My proposal:
I was going to approach this by making the
flush
method return aPromise
that signifies that a success/unsuccessful attempt to flush the events. Would this be something that would be accepted? The return type offlush
is currentlyvoid
so its technically a breaking change, but I'm thinking people aren't actually depending on that(?). Alternatively I could add another method calledflushAsPromised
(I'm not good at naming methods) that has the correct behaviour.Alternatively, if this is not a good idea, could I get some guidance on whether or not the the
Emitter
interface is considered public? I would then implement my ownEmitter
thats more amenable to the Lambda execution environment.The only other tracker I've used is the
snowplow-ruby-tracker
which differentiates betweenAsyncEmitter
andEmitter
. It looks like theEmitter
class in this project is more akin to theAsyncEmitter
insnowplow-ruby-tracker
.Thanks for reading. What do the maintainers think of this? Happy to chat more about this as well.
The text was updated successfully, but these errors were encountered: