Releases: snowplow/enrich
5.0.0
This release adds the possibility to emit failed events to a third stream, with the exact same format as enriched events (TSV). For each error that happened, a failure entity gets added to the derived_contexts field.
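As an illustration only, here is a hypothetical sketch of what the extra output block could look like for enrich-pubsub; the failed key name and the topic values are assumptions and should be checked against the configuration reference:
{
  "output": {
    "good": {
      "topic": "projects/my-project/topics/enriched"
    }
    "bad": {
      "topic": "projects/my-project/topics/bad"
    }
    # Hypothetical third sink receiving failed events in the same TSV format as enriched events
    "failed": {
      "topic": "projects/my-project/topics/failed"
    }
  }
}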
CHANGELOG
- Add possibility to emit failed events in TSV format into a third stream (#872)
4.2.1
- Replace mysql-connector with mariadb-client
- Upgrade fs2-kafka to 3.5.1
- Update schema for com.mandrill/message_opened/jsonschema/1-0-3 (#893)
- Bump sbt-snowplow-release to 0.3.2 (#892)
- Don't pass field value to ValidatorReport if validation fails (#892)
Full Changelog: 4.2.0...4.2.1
4.2.0
This release brings a few changes to some of the bad rows emitted by Enrich.
What's new
Switch from EnrichmentFailures to SchemaViolations for some errors
The following errors were previously emitted as EnrichmentFailures bad rows and will now get emitted as SchemaViolations (against the atomic schema):
- When the context added by an enrichment is invalid.
- When something goes wrong while the input fields of the HTTP request are mapped to the fields of the enriched event (e.g. when tr_tt is converted from string to number and mapped to tr_total).
- When an atomic field is longer than the limit.
More errors wrapped inside the same bad row
Before 4.2.0, if there was any error in the mapping of the atomic fields, a bad row would get emitted right away and we would not try to validate the entities and the unstructured event. All these errors are now wrapped inside the same SchemaViolations bad row.
Likewise, before 4.2.0, when an enrichment context was invalid, we were emitting a bad row right away and we were not checking the lengths of the atomic fields. Now all these errors are wrapped inside the same SchemaViolations bad row.
So 4.2.0 is more exhaustive in the errors that get wrapped inside a single bad row.
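For illustration, an abridged sketch of the shape of such a bad row (the schema version, processor details and message contents are placeholders, and the real messages are structured objects rather than plain strings):
{
  "schema": "iglu:com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema/2-0-0",
  "data": {
    "processor": { "artifact": "enrich", "version": "4.2.0" },
    "failure": {
      "timestamp": "2024-01-01T00:00:00Z",
      "messages": [
        "first violation, e.g. tr_tt could not be converted to a number",
        "second violation, e.g. an atomic field longer than the limit"
      ]
    },
    "payload": "the original payload, omitted here"
  }
}
Both violations end up in the messages array of a single bad row instead of producing two separate bad rows.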
Upgrading to 4.2.0
When upgrading from 4.0.0 or 4.1.0, you only need to bump the version.
Check out Enrich documentation for the full guide on running and configuring the app.
CHANGELOG
4.1.0
What's new
Cross Navigation Enrichment
In this version, we introduce a new enrichment: the Cross Navigation Enrichment. This enrichment parses the extended cross navigation format in the _sp querystring parameter and attaches the cross_navigation context to the event.
The _sp parameter can be attached by our Web (see cross-domain tracking) and mobile trackers and contains user, session and app identifiers (e.g., domain user and session IDs, business user ID, source app ID). The information to include in the parameters is configurable in the trackers. This is useful for tracking the movement of users across different apps and platforms.
The extended cross navigation format can be described by _sp={domainUserId}.{timestamp}.{sessionId}.{subjectUserId}.{sourceId}.{platform}.{reason}
More information about this enrichment can be found here.
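To enable it, add an enrichment configuration file along these lines (a minimal sketch; double-check the schema URI against the enrichment's documentation):
{
  "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/cross_navigation_config/jsonschema/1-0-0",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow.enrichments",
    "name": "cross_navigation_config",
    "enabled": true
  }
}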
Multiple JS enrichments support
Starting with this version, it is possible to have multiple JS enrichments. This makes it possible to implement new enrichments in JavaScript and easily add them to Enrich. Currently, the order in which they run is not defined.
Passing an object of parameters to the JS enrichment
As mentioned above, we added support for multiple JS enrichments. This simplifies implementing custom enrichments in JavaScript and adding them to Enrich. However, most enrichments take parameters. To avoid having to change the JavaScript code (and re-encode it in base64) every time a parameter changes, we've added the capability to pass these parameters in the enrichment configuration, for example:
{
"schema": "iglu:com.snowplowanalytics.snowplow/javascript_script_config/jsonschema/1-0-1",
"data": {
"vendor": "com.snowplowanalytics.snowplow",
"name": "javascript_script_config",
"enabled": true,
"parameters": {
"script": "script",
"config": {
"foo": 3,
"nested": {
"bar": "test"
}
}
}
}
}
The parameter object can be accessed in JavaScript enrichment code via the second
parameter of the process function, for example:
function process(event, params) {
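  // params is the 'config' object from the enrichment configuration above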
event.setApp_id(params.nested.bar);
return [];
}
Authentication with Azure Event Hubs using OAuth2 in Enrich Kafka
The new version of Enrich Kafka makes it possible to authenticate with Azure Event Hubs using OAuth2. If you would like to use this authentication method with Azure Event Hubs, you don't have to pass anything extra in the config. It is enough to remove the security.protocol, sasl.mechanism and sasl.jaas.config properties from the consumerConf and producerConf sections; the application sets the necessary properties with the required values by default.
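For illustration, a minimal sketch of the relevant parts of an enrich-kafka config with this in effect (topic names, the Event Hubs namespace and the extra Kafka properties are placeholders; check the configuration reference for the full set of options):
{
  "input": {
    "topicName": "collector-payloads"
    "bootstrapServers": "my-namespace.servicebus.windows.net:9093"
    # No security.protocol, sasl.mechanism or sasl.jaas.config here:
    # the application then sets up OAuth2 authentication by itself
    "consumerConf": {
      "group.id": "enrich"
    }
  }
  "output": {
    "good": {
      "topicName": "enriched"
      "bootstrapServers": "my-namespace.servicebus.windows.net:9093"
      "producerConf": {
        "acks": "all"
      }
    }
  }
}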
Stopping publishing jar files
Starting with 4.1.0, we no longer publish jar files of the applications. If you are still using jar files to run the application, we recommend switching to Docker. You can find the running instructions for Docker on the docs page of the respective component.
Changelog
- enrich-kafka: authenticate with Event Hubs using OAuth2 (#863)
- Add Cross Navigation Enrichment (#855)
- Allow multiple javascript enrichments (#868)
- Add the message delayed event to Mandrill adapter and update schema versions (#815)
- Stop publishing fat jars (#862)
- Allow passing an object of parameters to the JS enrichment (#871)
- Make cookie extractor enrichment case insensitive (#877)
- Add tracking scenario ID in observed_event if defined (#807)
- Rename tracking_scenario to event_specification (#879)
- Bump nimbus-jose-jwt to 9.37.2 (#880)
- Bump postgres driver to 42.7.2 (#880)
4.0.1
4.0.0
What's new
Atomic field lengths configurable
Several atomic fields, such as mkt_clickid, have length limits defined (in this case, 128 characters). Recent versions of Enrich enforce these limits, so that oversized data does not break loading into the warehouse columns. However, over time we’ve observed that valid data does not always fit these limits. For example, TikTok click ids can be up to 500 (or 1000, according to some sources) characters long.
In this release, we are adding a way to configure the limits, and we are increasing the default limits for several fields:
- mkt_clickid limit increased from 128 to 1000
- page_url limit increased from 4096 to 10000
- page_referrer limit increased from 4096 to 10000
Depending on your configuration, this might be a breaking change:
- If you have featureFlags.acceptInvalid set to true in Enrich, then you probably don’t need to worry, because you had no validation in the first place (although we do recommend to enable it).
- If you have featureFlags.acceptInvalid set to false (default), then previously invalid events might become valid (which is a good thing), and you need to prepare your warehouse for this eventuality:
  - For Redshift, you should resize the respective columns, e.g. to VARCHAR(1000) for mkt_clickid. If you don’t, Redshift will truncate the values.
  - For Snowflake and Databricks, we recommend removing the VARCHAR limit altogether. Otherwise, loading might break with longer values. Alternatively, you can alter the Enrich configuration to revert the changes in the defaults.
  - For BigQuery, no steps are necessary.
Below is an example of how to configure these limits:
{
...
# Optional. Configuration section for various validation-oriented settings.
"validation": {
# Optional. Configuration for custom maximum atomic fields (strings) length.
# Map-like structure with keys being field names and values being their max allowed length
"atomicFieldsLimits": {
"app_id": 5
"mkt_clickid": 100000
# ...and any other 'atomic' field with custom limit
}
}
}
Azure Blob Storage support
enrich-kafka can now download enrichments' assets (e.g. MaxMind database) from Azure Blob Storage.
See the configuration reference for the setup.
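For example, the MaxMind database of the IP Lookups enrichment could be fetched from a blob container; the account, container and path below are placeholders:
{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoLite2-City.mmdb",
        "uri": "https://myaccount.blob.core.windows.net/snowplow-assets/maxmind"
      }
    }
  }
}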
New license
Following our recent licensing announcement, Enrich is now released under the Snowplow Limited Use License Agreement.
stream-enrich assets and enrich-rabbitmq deprecated
As announced a while ago, stream-enrich assets and enrich-rabbitmq are now deprecated.
Only one asset now exists for each type of message queue.
Setup guide for each can be found on this page.
Upgrading to 4.0.0
Migration guide can be found on this page.
Changelog
- Bump aws-msk-iam-auth to 2.0.3 (#857)
- Scan enrich-kafka and enrich-nsq Docker images (#857)
- Remove lacework workflow (#859)
- Use SLF4J for Cats Effect starvation warning message (#858)
- Bump jackson to 2.16.1 (#857)
- Bump azure-identity to 1.11.1 (#857)
- Bump http4s to 0.23.25 (#857)
- Bump fs2-blobstore to 0.9.12 (#857)
- Bump AWS SDK v2 to 2.23.9 (#857)
- Bump AWS SDK to 1.12.643 (#857)
- Bump mysql-connector-j to 8.3.0 (#857)
- Make atomic field limits configurable (#850)
- Switch from Blaze client to Ember client (#853)
- Upgrade to Cats Effect 3 ecosystem (#837)
- Add headset to the list of valid platform codes (#851)
- Add mandatory SLULA license acceptance flag (#848)
- Move to Snowplow Limited Use License (#846)
- Add different types of authentication for azure blob storage (#845)
- Remove config logging (#843)
- enrich-kafka: support for multiple Azure blob storage accounts (#842)
- enrich-kafka: add blob storage support (#831)
- Deprecate enrich-rabbitmq (#822)
- Deprecate Stream Enrich (#788)
3.9.0
This release bumps dependencies for potential security vulnerabilities. It also sets the user-agent header in the Pubsub publisher and consumer.
Changelog
3.8.2
3.8.1
It is now possible to ignore API and SQL enrichment errors thanks to a new parameter: ignoreOnError (SQL and API). When set to true, no bad row will be emitted if the enrichment fails, and the enriched event will be emitted without the context added by the enrichment.
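For example, in the API enrichment this flag sits next to the other parameters (everything else is omitted below; make sure to use a schema version of the enrichment configuration that includes ignoreOnError):
"parameters": {
  "inputs": [ ... ],
  "api": { ... },
  "outputs": [ ... ],
  "cache": { ... },
  "ignoreOnError": true
}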
S3 and GCS dependencies were added to the enrich-nsq asset so that it can be used in Mini, following our plan to deprecate Stream Enrich assets.
CHANGELOG
- Github Actions: split testing and releasing (#806)
- Bump AWS SDK to 1.12.506 (#805)
- Bump snakeyaml to 1.33 (#804)
- Bump jackson to 2.15.2 (#803)
- Bump uap-java to 1.5.4 (#802)
- Bump log4j to 2.20.0 (#801)
- Remove bench module (#799)
- enrich-nsq: add S3 and GCS dependencies (#793)
- Add eventVolume and platform to observed_event (#795)
- Makes schemas configurable in adapters (#791)
- Update Iglu Scala Client to 1.5.0 (#794)
- common: ignore API/SQL enrichments when failing (#760)
3.8.0
This version comes with a new Enrich app, enrich-nsq. It also has the following improvements:
- Superseding schemas
- Improvements in API/SQL enrichments
- Making derived contexts accessible to the JavaScript enrichment
Superseding schemas
Schemas define the structure of the data that you collect. Each schema defines what fields are recorded with each event that is captured, and provides validation criteria for each field. Schemas are also used to describe the structure of entities that are attached to events.
However, there are some cases where we want to replace schema versions in incoming events with another version due to some problem in the tracking code. The new superseding schemas feature makes this possible.
So, how does this work exactly? If we want a schema to be replaced by another one, we state this with the $supersededBy field of the schema. Later, when an event with the superseded schema arrives, the superseded schema version will be replaced by the specified superseding schema version.
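For example, to have a hypothetical com.acme/button_press 1-0-0 schema replaced by 1-0-1, the 1-0-0 schema gets the $supersededBy field:
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "$supersededBy": "1-0-1",
  "self": {
    "vendor": "com.acme",
    "name": "button_press",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {}
}
Events sent with button_press 1-0-0 would then be validated against and rewritten to 1-0-1.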
Improvements in API/SQL enrichments
The API enrichment lets you perform dimension widening on a Snowplow event via your own or a third-party proprietary HTTP(S) API. The SQL enrichment is the relational database counterpart of the API Enrichment. It allows you to use a relational database to perform dimension widening.
Enrich caches the results of API requests and SQL queries made by the API/SQL enrichments to avoid continuous calls. We've made some improvements to the way errors are cached:
- The TTL for errors is now set to a tenth of the TTL for successful results. This way, API/SQL requests can be retried faster in case of a cached error (see the fragment below).
- When we get an error, the error is cached but we return the last known 'old' good value for further processing. This fallback allows Enrich to produce fewer bad rows in case of 'getting stuck' with errors in the enrichment cache.
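As a concrete illustration of the first point, with a cache configured like this fragment of the API/SQL enrichment parameters (the values are just examples), a successful lookup is kept for 60 seconds while an error is retried after roughly 6 seconds:
"cache": {
  "size": 3000,
  "ttl": 60
}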
More details about the caching improvements can be found here.
Also, we've made some changes to the way we handle database connections with SQL enrichment. These changes should lead to acquiring database connections in a better way and better usage of existing database connections.
enrich-nsq, a new member of the 2nd generation Enrich apps
In this release, enrich-nsq becomes the newest member of the 2nd generation Enrich apps. It can read from and write to NSQ topics.
Instructions to setup and configure Enrich can be found on our docs website.
Making derived contexts accessible to the JavaScript enrichment
Previously, the JavaScript enrichment allowed users to call event.getDerived_contexts(); however, it always returned null. Starting with Enrich 3.8.0, it is possible to access derived contexts in the JavaScript enrichment.
Changelog
- Take superseding schema into account during validation (#751)
- common: Provide derived contexts to JS enrichment (#769)
- Scan Docker images in Snyk Github action (#772)
- common: do not validate enrichment names (#767)
- common: SQL enrichment: get connection only if request not cached (#765)
- common: SQL enrichment: put getConnection in Blocker (#763)
- common-fs2: fix env var substitution for JSON files (#753)
- Add enrich-nsq (#740)
- fix: add mskAuth to kafka dependencies (#746)
- common: improve caching in API/SQL enrichments (#747)