Report snowplow-specific metrics #178

colmsnowplow · 2022-07-26T10:02:16Z

Even though the app is data-agnostic, there's a good argument that we should still report snowplow-specific metrics. Our usage of the app is snowplow-specific, and reporting latency from collector to target is valuable.

I think we should consider how to fit this into the design and see if we can accommodate it. Perhaps some setting that specifies that it's Snowplow data and grabs collector tstamp for metrics reporting purposes.
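As a rough sketch of what such a setting could look like (the names `MetricsSettings`, `snowplow_enriched` and `collector_latency_ms` are hypothetical, not existing options in the app), collector-to-target latency would only be reported when the flag is on and a collector tstamp was found:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class MetricsSettings:
    # Hypothetical flag: the operator declares that the stream carries Snowplow data.
    snowplow_enriched: bool = False


def collector_latency_ms(
    settings: MetricsSettings,
    collector_tstamp: Optional[datetime],
    delivered_at: datetime,
) -> Optional[int]:
    """Return collector-to-target latency in milliseconds, or None when not applicable."""
    if not settings.snowplow_enriched or collector_tstamp is None:
        # Data-agnostic mode, or no collector tstamp available: report nothing.
        return None
    delta = delivered_at - collector_tstamp
    return int(delta.total_seconds() * 1000)
```

With the flag off, the function returns `None`, so the data-agnostic behaviour is left unchanged.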

jbeemster · 2022-07-26T10:07:06Z

You could possibly try to infer the data-input type, perhaps with some pattern matching. I think there are generally three different Snowplow inputs, plus a fallback:

raw: thrift decoder required
enriched: you already can parse this with analytics SDK
bad: JSON -> would need a decoder that would let you pull the correct value
other: default to timestamp of the record on the stream (what we do currently as far as I can remember)

One question from me though is what would be the cost implications of parsing every inbound event to extract the timestamp?

colmsnowplow · 2022-07-26T10:15:03Z

> One question from me though is what would be the cost implications of parsing every inbound event to extract the timestamp?

I have the same concern - hopefully it'd be minimal, since we constructed the analytics SDK in such a way that we can retrieve individual fields without processing the entire event. (Filters operate this way and are relatively efficient.) But yes, I'd want to keep an eye on it.
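To illustrate why pulling one field can be cheap, here is a minimal sketch that does not use the analytics SDK at all. It assumes the enriched event is a tab-separated line with `collector_tstamp` as the fourth field (index 3) in `YYYY-MM-DD HH:MM:SS[.fff]` format; both the index and the format are assumptions to check against the actual enriched schema, and the function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumption: collector_tstamp is the fourth tab-separated field of an enriched event.
COLLECTOR_TSTAMP_INDEX = 3


def extract_collector_tstamp(line: str) -> Optional[datetime]:
    """Read only the collector tstamp, leaving the rest of the event unsplit."""
    # maxsplit stops splitting once the wanted field has been isolated.
    fields = line.split("\t", COLLECTOR_TSTAMP_INDEX + 1)
    if len(fields) <= COLLECTOR_TSTAMP_INDEX:
        return None  # too few fields to be an enriched event
    raw = fields[COLLECTOR_TSTAMP_INDEX]
    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
        try:
            # Assumption: timestamps are in UTC.
            return datetime.strptime(raw, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    return None
```

The cost per event is then a bounded split plus one timestamp parse, regardless of how many fields follow.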

> You could try and infer data-input type possibly - perhaps with some pattern matching. I think you could have three different Snowplow inputs generally:
>
> raw: thrift decoder required
> enriched: you already can parse this with analytics SDK
> bad: JSON -> would need a decoder that would let you pull the correct value
> other: default to timestamp of the record on the stream (what we do currently as far as I can remember)

Decoding thrift for the sake of grabbing the collector tstamp seems like overkill. And we don't have a use case for stream replicator-ing bad data at the moment. So my suggestion here would be to worry about enriched, and wait for requirements for other formats to surface themselves if they exist.
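The enriched-only approach could be sketched roughly as below: try to read the collector tstamp, and fall back to the stream record's timestamp for anything that does not parse. `reference_tstamp` and the `parse_enriched` callback are hypothetical names; the callback stands in for whatever enriched-event parsing ends up being used.

```python
from datetime import datetime
from typing import Callable, Optional


def reference_tstamp(
    data: bytes,
    record_tstamp: datetime,
    parse_enriched: Callable[[bytes], Optional[datetime]],
) -> datetime:
    """Pick the timestamp that latency is measured from."""
    collector_tstamp = parse_enriched(data)
    if collector_tstamp is not None:
        return collector_tstamp
    # Raw, bad, or non-Snowplow data: keep the existing default.
    return record_tstamp
```

Support for other formats could later be added by swapping in a different callback, without touching the fallback.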

colmsnowplow mentioned this issue Jul 26, 2022

Release/1.0.0 #143

Merged

14 tasks
