Releases: snowplow/snowplow-rdb-loader
5.6.2
Fixes a regression that, under rare circumstances, caused exceptions like:
Load failed and will not be retried: [Amazon](500310) Invalid operation: cannot alter column "xyz" of relation "com_example_foo_2", target column size should be different; = SqlState: 0A000: [Amazon](500310) Invalid operation: cannot alter column "xyz" of relation "com_example_foo_2", target column size should be different;
Changelog
- Fix pattern matching on known exception for alter table failures (#1283)
5.6.1
A patch release to address small bugs that crept in with the 5.5.x series. These bugs only affect pipelines using SSH tunnels, or pipelines sending failed events to Kinesis from the batch transformer.
Changelog
5.6.0
Starting with this version, loaders automatically create the database schema named in your config during initialization, if it does not already exist. No further configuration is needed to enable this.
The loader's database user must have permission to create schemas for this feature to work. If the user lacks that permission, the loader simply skips this step, and you will need to create the schema manually before running the loader.
This feature only affects new deployments. If you are already successfully running the loader, nothing will change.
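For example, on Redshift you could grant the required privilege with a statement like the sketch below (snowplow_db, loader_user and loader_role are placeholder names; the Snowflake variant is included for comparison):
-- Redshift: allow the loader's database user to create schemas
GRANT CREATE ON DATABASE snowplow_db TO loader_user;
-- Snowflake: the equivalent privilege is granted to a role
GRANT CREATE SCHEMA ON DATABASE snowplow_db TO ROLE loader_role;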
Changelog
5.5.0
Config parsing improvements
Before version 5.5.0, the only way to pass configuration to the applications was to provide base64-encoded HOCON (for the application config) and JSON (for the Iglu resolver config) as command line options.
Starting from version 5.5.0, it's possible to provide a full path to the configuration files instead. Here is an example that mounts a config directory into the Docker container at run time:
docker run \
-v /path/to/config:/myconfig \
snowplow/rdb-loader-redshift:5.5.0 \
--config /myconfig/loader.hocon \
--iglu-config /myconfig/resolver.json
It's no longer necessary to use base64-encoded strings on the command line, but to preserve compatibility the old way of configuring is still supported.
What's more, it's now possible to provide a HOCON file for the Iglu resolver configuration, just like for the application configuration. This is important, as it lets you use all the great features of the HOCON format for Iglu as well, such as environment variable resolution. A plain JSON file is still supported.
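For example, a resolver.hocon along these lines could read an Iglu Server API key from an environment variable via HOCON substitution (the repository name, URI and variable name below are placeholders):
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "My Iglu Server",
        "priority": 0,
        "vendorPrefixes": [],
        "connection": {
          "http": {
            "uri": "https://iglu.example.com/api",
            "apikey": ${?IGLU_SERVER_APIKEY}
          }
        }
      }
    ]
  }
}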
These changes apply to all the loader (Redshift, Snowflake, Databricks) and transformer (batch, streaming) applications.
Improved robustness of the loader
We've made quite a few small under-the-hood improvements, which we hope will make the loader more resilient to transient failures. We identified some of the most common edge-case error scenarios where previous versions of the loader might hit an error, e.g. due to a stale connection or a network issue. The changes include better handling of old connections and retrying on transient failures.
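Much of this retry behaviour is tunable through the retries block of the loader config. A minimal sketch, assuming the keys documented in the configuration reference for your version (the values here are illustrative only):
"retries": {
  "backoff": "30 seconds",
  "strategy": "EXPONENTIAL",
  "attempts": 3,
  "cumulativeBound": "1 hour"
}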
Batch Transformer: transform_duration metric
The batch transformer can now send a new metric to CloudWatch, if configured: transform_duration, which records the time taken to transform an input folder.
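As a rough sketch of how this might be enabled, assuming a monitoring.metrics.cloudwatch block in the batch transformer config (treat the exact key names as an assumption and check the configuration reference for your version):
"monitoring": {
  "metrics": {
    "cloudwatch": {
      "namespace": "snowplow/transformer"
    }
  }
}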
Upgrading
If you are already using a recent version of RDB Loader (3.0.0 or higher), then upgrading to 5.5.0 is as simple as pulling the newest docker images. There are no changes needed to your configuration files.
docker pull snowplow/rdb-loader-redshift:5.5.0
docker pull snowplow/rdb-loader-snowflake:5.5.0
docker pull snowplow/rdb-loader-databricks:5.5.0
docker pull snowplow/transformer-pubsub:5.5.0
docker pull snowplow/transformer-kinesis:5.5.0
Starting from this version, the batch transformer requires Java 11 on EMR (the default is Java 8). You can enable it, for instance, by running this script as a bootstrap action (it needs to be stored on S3):
#!/bin/bash
set -e
sudo update-alternatives --set java /usr/lib/jvm/java-11-amazon-corretto.x86_64/bin/java
exit 0
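Once the script is uploaded to S3, it can be attached as a bootstrap action when the cluster is created. A sketch using the AWS CLI (the bucket name, script path and cluster options are placeholders for your own setup):
# create an EMR cluster with the Java 11 script attached as a bootstrap action
aws emr create-cluster \
  --release-label emr-6.10.0 \
  --applications Name=Spark \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --bootstrap-actions Path=s3://my-bucket/set-java-11.sh,Name=SetJava11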
The Snowplow docs website has a full guide for running the RDB Loader and the transformer.
Changelog
- Bump Snowflake driver to 3.13.30 (#1256)
- Upgrade Databricks JDBC driver (#1254)
- Config parsing improvements (#1252)
- Loader: limit the total time spent retrying a failed load (#1251)
- Loader: do not skip batches on warehouse connection failures (#1250)
- Loader: Do not attempt rollback when connection is already closed (#1240)
- Use sbt-snowplow-release to build docker images (#1222)
- Loader: Improvements to webhook alerts (#1238)
- Add load_tstamp column to table definitions (#1233)
- Loader: Disable warnings on incomplete shredding for the streaming transformer (#967)
- Batch Transformer: emit transform_duration metric (#1236)
- Batch Transformer: use JDK 11 in assembly (#1241)
- Bump dependencies with CVEs (#1234)
- Loader: Retry failures for all warehouse operations (#1225)
- Loader: Avoid errors for "Connection is not available" (#1223)
- Upgrade to Cats Effect 3 (#1219)
5.4.3
5.4.2
5.4.1
5.4.0
This release brings a few features and bug fixes improving stability and observability of RDB Loader.
Full changelog
- Transformer: add flag disabling atomic fields truncation to the transformer (#1217)
- Loader fix: loaded batches must not get stuck in the retry queue (#1210)
- Databricks loader: resilient to duplicate entries in manifest table (#1213)
- Loader: make temp credentials session duration configurable (#1215)
- Upgrade schema-ddl to 0.17.1 (#1207)
- Snowflake ready check that does not require operate permission on the warehouse (#1195)
- Add Kinesis/Pubsub badrows sink to streaming transformer (#1189)
- Add Kinesis badrows sink to batch transformer (#1188)
- Transformer: set event limit per parquet partition (#1178)
- Transformer bad row count metric (#1171)
- Don't use Spark sink for parquet badrows (#1168)
5.3.2
This patch release brings a few features and bug fixes improving the stability and observability of RDB Loader.
Full changelog
- Loader: single COPY statement for each unique schema for Redshift (#1202)
- Loader: Improve management of temporary credentials (#1205)
- Scan Docker images with Snyk container monitor in ci.yml (#1191)
- Add alert webhook message with summary of folder monitoring issues (#1173)
- Enforce timeout on rolling back failed transaction (#1194)
- Loader: Databricks surface driver log level (#1180)
5.3.1
In 5.3.0, we introduced a bug in the Snowflake Loader that prevented it from copying contexts and unstruct events to the events table. We've fixed this problem in version 5.3.1. Thanks to mgkeen for reporting this issue. Recovery instructions for the missing data can be found in the Discourse post.
Also, in this version, the Databricks Loader now uses VARCHAR instead of CHAR for standard fields when creating the events table (GitHub issue).