Redshift: use zstd compression encoding instead of text255 for enum field columns (#215)

Currently, columns for enum fields are created with the text255 compression encoding.
However, Redshift does not allow a text255-encoded column to be altered, so we get an
exception when trying to resize such a column. To solve this problem, we decided to use
the zstd compression encoding for enum fields too, as we already do for other string columns.
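The Amazon Redshift `ALTER TABLE` documentation states that `ALTER COLUMN ... TYPE` only changes the size of a VARCHAR column, and that it is rejected for columns encoded with BYTEDICT, RUNLENGTH, TEXT255 or TEXT32K. Here is a minimal sketch of that rule. It uses hypothetical names (`RedshiftResize`, `canResize`, `resizeStatement`) that do not come from RDB Loader:

```scala
object RedshiftResize {
  // Encodings that, per the Redshift ALTER TABLE docs, block ALTER COLUMN ... TYPE
  val nonAlterableEncodings: Set[String] = Set("BYTEDICT", "RUNLENGTH", "TEXT255", "TEXT32K")

  // True when a VARCHAR column stored with this encoding can be resized in place
  def canResize(encoding: String): Boolean =
    !nonAlterableEncodings.contains(encoding.toUpperCase)

  // Hypothetical helper: builds an illustrative resize statement for a VARCHAR column
  def resizeStatement(table: String, column: String, newSize: Int): String =
    s"ALTER TABLE $table ALTER COLUMN $column TYPE VARCHAR($newSize)"
}
```

With these definitions, `canResize("TEXT255")` is `false` (the old enum encoding) and `canResize("ZSTD")` is `true` (the encoding used after this change).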

After this change, newly created columns for enum fields will use the zstd encoding.
However, the encoding of existing columns won't be updated, so we will continue to get
an exception when trying to resize those columns. RDB Loader will catch and ignore
those exceptions.
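The commit message does not show how RDB Loader catches the failed resize. One way to express "try the resize and ignore the failure" is with `scala.util.Try`. In this sketch, `ResizeOrSkip`, `ResizeOutcome` and the `execute` callback are hypothetical stand-ins for the loader's real database layer:

```scala
import scala.util.{Failure, Success, Try}

object ResizeOrSkip {
  sealed trait ResizeOutcome
  case object Resized extends ResizeOutcome
  final case class Skipped(reason: String) extends ResizeOutcome

  // `execute` stands in for whatever actually runs SQL against Redshift
  def resizeOrSkip(statement: String)(execute: String => Unit): ResizeOutcome =
    Try(execute(statement)) match {
      case Success(_) => Resized
      // Swallow the failure (e.g. a TEXT255 column that cannot be altered) and keep going
      case Failure(e) => Skipped(Option(e.getMessage).getOrElse(e.getClass.getName))
    }
}
```

`Try` only captures non-fatal exceptions, so fatal errors still propagate. Swallowing every non-fatal failure is broader than what the commit describes. A production version would more likely match on the specific Redshift error so that unrelated problems still surface.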
spenes committed Dec 13, 2024
1 parent 5e21a96 commit 00bb416
Showing 2 changed files with 5 additions and 10 deletions.
Expand Up @@ -35,11 +35,9 @@ case class ShredModelEntry(
.flatMap(_.apply(subSchema))
.getOrElse(ShredModelEntry.ColumnType.RedshiftVarchar(ShredModelEntry.VARCHAR_SIZE))

lazy val compressionEncoding: ShredModelEntry.CompressionEncoding = (subSchema.`enum`, columnType) match {
case (Some(_), ShredModelEntry.ColumnType.RedshiftVarchar(size)) if size <= 255 =>
ShredModelEntry.CompressionEncoding.Text255Encoding
case (_, ShredModelEntry.ColumnType.RedshiftBoolean) => ShredModelEntry.CompressionEncoding.RunLengthEncoding
case (_, ShredModelEntry.ColumnType.RedshiftDouble) => ShredModelEntry.CompressionEncoding.RawEncoding
lazy val compressionEncoding: ShredModelEntry.CompressionEncoding = columnType match {
case ShredModelEntry.ColumnType.RedshiftBoolean => ShredModelEntry.CompressionEncoding.RunLengthEncoding
case ShredModelEntry.ColumnType.RedshiftDouble => ShredModelEntry.CompressionEncoding.RawEncoding
case _ => ShredModelEntry.CompressionEncoding.ZstdEncoding
}

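After the change, the encoding depends only on the column type, and `subSchema.enum` is no longer consulted. The self-contained sketch below re-declares cut-down copies of `ColumnType` and `CompressionEncoding` so it compiles without the rest of `ShredModelEntry`. Only the cases visible in this diff are modelled:

```scala
object EncodingChoice {
  // Cut-down copy of ShredModelEntry.ColumnType (only the cases seen in the diff)
  sealed trait ColumnType
  case object RedshiftBoolean extends ColumnType
  case object RedshiftDouble extends ColumnType
  final case class RedshiftVarchar(size: Int) extends ColumnType

  // Cut-down copy of ShredModelEntry.CompressionEncoding after Text255Encoding was removed
  sealed trait CompressionEncoding
  case object RawEncoding extends CompressionEncoding
  case object RunLengthEncoding extends CompressionEncoding
  case object ZstdEncoding extends CompressionEncoding

  // Mirrors the post-change match: enum columns fall through to ZSTD like any other VARCHAR
  def compressionEncoding(columnType: ColumnType): CompressionEncoding =
    columnType match {
      case RedshiftBoolean => RunLengthEncoding
      case RedshiftDouble  => RawEncoding
      case _               => ZstdEncoding
    }
}
```

The enum column in the updated spec is a `RedshiftVarchar` of at most 255 characters, as the old expectation of `Text255Encoding` implies. It therefore now maps to `ZstdEncoding`.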
```diff
@@ -169,7 +167,6 @@ object ShredModelEntry {
 
     implicit val compressionEncodingShow: Show[CompressionEncoding] = Show.show {
       case RawEncoding => s"ENCODE RAW"
-      case Text255Encoding => s"ENCODE TEXT255"
       case ZstdEncoding => s"ENCODE ZSTD"
       case RunLengthEncoding => "ENCODE RUNLENGTH"
     }
```
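With `Text255Encoding` removed from both the ADT and the `Show` instance, this code can no longer render `ENCODE TEXT255`. The sketch below shows the rendering without external dependencies. A plain `show` method stands in for the cats `Show` instance, and `columnDefinition` is a hypothetical helper that illustrates where the string lands in a `CREATE TABLE` column list:

```scala
object EncodingDdl {
  sealed trait CompressionEncoding
  case object RawEncoding extends CompressionEncoding
  case object RunLengthEncoding extends CompressionEncoding
  case object ZstdEncoding extends CompressionEncoding

  // Plain-Scala stand-in for the cats Show[CompressionEncoding] instance in the diff
  def show(encoding: CompressionEncoding): String = encoding match {
    case RawEncoding       => "ENCODE RAW"
    case RunLengthEncoding => "ENCODE RUNLENGTH"
    case ZstdEncoding      => "ENCODE ZSTD"
  }

  // Hypothetical helper: one column definition as it might appear in CREATE TABLE
  def columnDefinition(name: String, sqlType: String, encoding: CompressionEncoding): String =
    s"$name $sqlType ${show(encoding)}"
}
```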
```diff
@@ -178,8 +175,6 @@ object ShredModelEntry {
 
     case object RunLengthEncoding extends CompressionEncoding
 
-    case object Text255Encoding extends CompressionEncoding
-
     case object ZstdEncoding extends CompressionEncoding
   }
 
```
```diff
@@ -85,9 +85,9 @@ class ShredModelEntrySpec extends Specification {
     }
 
   "suggest compression" should {
-    "suggest Text255Encoding for enums less then 255 in length" in {
+    "suggest zstd for enums less then 255 in length" in {
       val props = json"""{"type": "string", "enum": ["one", "two"], "maxLength": 42}""".schema
-      ShredModelEntry(dummyPtr, props).compressionEncoding must beEqualTo(Text255Encoding)
+      ShredModelEntry(dummyPtr, props).compressionEncoding must beEqualTo(ZstdEncoding)
     }
 
     "suggest RunLengthEncoding for booleans" in {
```