4.0.0
What's new
Atomic field lengths configurable
Several atomic fields, such as mkt_clickid, have length limits defined (in this case, 128 characters). Recent versions of Enrich enforce these limits, so that oversized data does not break loading into the warehouse columns. However, over time we’ve observed that valid data does not always fit these limits. For example, TikTok click ids can be up to 500 (or 1000, according to some sources) characters long.
In this release, we are adding a way to configure the limits, and we are increasing the default limits for several fields:
- mkt_clickid: limit increased from 128 to 1000
- page_url: limit increased from 4096 to 10000
- page_referrer: limit increased from 4096 to 10000
Depending on your configuration, this might be a breaking change:
- If you have featureFlags.acceptInvalid set to true in Enrich, then you probably don’t need to worry, because you had no validation in the first place (although we do recommend enabling it).
- If you have featureFlags.acceptInvalid set to false (the default), then previously invalid events might become valid (which is a good thing), and you need to prepare your warehouse for this eventuality:
  - For Redshift, you should resize the respective columns, e.g. to VARCHAR(1000) for mkt_clickid. If you don’t, Redshift will truncate the values.
  - For Snowflake and Databricks, we recommend removing the VARCHAR limit altogether. Otherwise, loading might break with longer values.
  - For BigQuery, no steps are necessary.

Alternatively, you can alter the Enrich configuration to revert the changes in the defaults (see the snippet after the configuration example below).
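For orientation, here is where the flag mentioned above lives. A minimal sketch of the relevant fragment of the Enrich config, assuming the same HOCON layout as the validation example below:

{
  ...
  "featureFlags": {
    # When true, Enrich accepts events that fail validation (not recommended).
    # The default is false.
    "acceptInvalid": false
  }
}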
Below is an example of how to configure these limits:
{
  ...
  # Optional. Configuration section for various validation-oriented settings.
  "validation": {
    # Optional. Configuration for custom maximum atomic fields (strings) length.
    # Map-like structure with keys being field names and values being their max allowed length.
    "atomicFieldsLimits": {
      "app_id": 5
      "mkt_clickid": 100000
      # ...and any other 'atomic' field with a custom limit
    }
  }
}
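And here is the alternative mentioned above: a sketch of the same validation section that reverts the new defaults back to the pre-4.0.0 limits listed earlier:

{
  ...
  "validation": {
    "atomicFieldsLimits": {
      # Revert to the pre-4.0.0 defaults
      "mkt_clickid": 128
      "page_url": 4096
      "page_referrer": 4096
    }
  }
}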
Azure Blob Storage support
enrich-kafka can now download enrichment assets (e.g. the MaxMind database) from Azure Blob Storage. See the configuration reference for the setup.
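To give a rough idea of what this looks like, below is a hypothetical sketch. The blobStorage/azureStorage key names, the account names, and the auth types are all assumptions here, so check the configuration reference for the authoritative structure:

{
  ...
  # Hypothetical sketch: exact keys may differ, see the configuration reference
  "blobStorage": {
    "azureStorage": {
      # Storage accounts Enrich is allowed to download assets from
      "accounts": [
        { "name": "exampleaccount1" }
        { "name": "exampleaccount2", "auth": { "type": "default" } }
      ]
    }
  }
}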
New license
Following our recent licensing announcement, Enrich is now released under the Snowplow Limited Use License Agreement.
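Accepting the license is now mandatory (see the SLULA license acceptance flag in the changelog below). A minimal sketch, assuming a license.accept key in the HOCON config; verify the exact key against the configuration reference:

{
  ...
  # Assumed key name for the mandatory license acceptance flag
  "license": {
    "accept": true
  }
}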
stream-enrich assets and enrich-rabbitmq deprecated
As announced a while ago, stream-enrich assets and enrich-rabbitmq are now deprecated. Only one asset now exists for each type of message queue. A setup guide for each can be found on this page.
Upgrading to 4.0.0
The migration guide can be found on this page.
Changelog
- Bump aws-msk-iam-auth to 2.0.3 (#857)
- Scan enrich-kafka and enrich-nsq Docker images (#857)
- Remove lacework workflow (#859)
- Use SLF4J for Cats Effect starvation warning message (#858)
- Bump jackson to 2.16.1 (#857)
- Bump azure-identity to 1.11.1 (#857)
- Bump http4s to 0.23.25 (#857)
- Bump fs2-blobstore to 0.9.12 (#857)
- Bump AWS SDK v2 to 2.23.9 (#857)
- Bump AWS SDK to 1.12.643 (#857)
- Bump mysql-connector-j to 8.3.0 (#857)
- Make atomic field limits configurable (#850)
- Switch from Blaze client to Ember client (#853)
- Upgrade to Cats Effect 3 ecosystem (#837)
- Add headset to the list of valid platform codes (#851)
- Add mandatory SLULA license acceptance flag (#848)
- Move to Snowplow Limited Use License (#846)
- Add different types of authentication for azure blob storage (#845)
- Remove config logging (#843)
- enrich-kafka: support for multiple Azure blob storage accounts (#842)
- enrich-kafka: add blob storage support (#831)
- Deprecate enrich-rabbitmq (#822)
- Deprecate Stream Enrich (#788)