Update tutorials to use blob2 (#4284)
sffc authored Nov 17, 2023
1 parent cd9cf17 commit abaff69
Showing 3 changed files with 12 additions and 8 deletions.
2 changes: 1 addition & 1 deletion docs/tutorials/crates/buffer/Makefile
@@ -18,7 +18,7 @@ bin/tutorial_buffer:

buffer_data.postcard: ../bin/icu4x-datagen bin/tutorial_buffer
../bin/icu4x-datagen \
-	--format blob \
+	--format blob2 \
--keys-for-bin bin/tutorial_buffer \
--locales my en-ZA \
--cldr-tag latest \
8 changes: 5 additions & 3 deletions docs/tutorials/data_management.md
@@ -135,14 +135,16 @@ $ cargo add icu --features serde
$ cargo add icu_provider_blob
```

-We can generate data for it using the `--format blob` flag:
+We can generate data for it using the `--format blob2` flag:

```console
-$ icu4x-datagen --keys all --locales ja --format blob --out my_data_blob.postcard
+$ icu4x-datagen --keys all --locales ja --format blob2 --out my_data_blob.postcard
```

This will generate a `my_data_blob.postcard` file containing the serialized data for all components. The file is several megabytes large; we will optimize it later!

+💡 Note: `--format blob2` generates version 2 of the blob format. Alternatively, `--format blob` produces an older blob format which works with ICU4X prior to 1.4 but is not as optimized.
+
## Locale Fallbacking

Unlike `BakedDataProvider`, `BlobDataProvider` (and `FsDataProvider`) does not perform locale fallbacking. For example, if `en-US` is requested but only `en` data is available, then the data request will fail. To enable fallback, we can wrap the provider in a `LocaleFallbackProvider`.
@@ -191,7 +193,7 @@ As you can see in the second `expect` message, it's not possible to statically t
You might have noticed that the blob we generated is a hefty 13MB. This is no surprise, as we used `--keys all`. However, our binary only uses date formatting data in Japanese. There's room for optimization:

```console
-$ icu4x-datagen --keys-for-bin target/debug/myapp --locales ja --format blob --out my_data_blob.postcard --overwrite
+$ icu4x-datagen --keys-for-bin target/debug/myapp --locales ja --format blob2 --out my_data_blob.postcard --overwrite
```

The `--keys-for-bin` argument tells `icu4x-datagen` to analyze the binary and only include keys that are used by its code. This significantly reduces the blob's file size, to 54KB, and our program still works. Quite the improvement!
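
To check the size reduction described above on your own machine, a small shell helper along these lines may be useful. This is only a sketch: `blob_sizes` is a hypothetical function, not part of `icu4x-datagen`, and the file names in the usage comment assume you kept a copy of the full blob before running the command with `--overwrite`.

```shell
# Hypothetical helper (not part of icu4x-datagen): print the size in bytes
# of each file passed as an argument, one file per line.
blob_sizes() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            # Reading from stdin keeps wc from echoing the file name;
            # tr strips the padding some wc implementations add.
            size=$(wc -c < "$f" | tr -d ' ')
            printf '%s\t%s bytes\n' "$f" "$size"
        else
            # Report missing files on stderr so stdout stays machine-readable.
            printf '%s\tmissing\n' "$f" >&2
        fi
    done
}

# Usage (assumed file names; my_data_blob_all.postcard would be a copy of the
# --keys all blob made before it was overwritten):
#   blob_sizes my_data_blob_all.postcard my_data_blob.postcard
```

The exact numbers will depend on the CLDR version and on which keys your binary uses, so treat the 13MB and 54KB figures as indicative.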
10 changes: 6 additions & 4 deletions docs/tutorials/data_management_interactive.md
@@ -26,12 +26,14 @@ cargo install icu_datagen
We're ready to generate the data. We will use the blob format, and create a blob that will contain just Chakma data. At runtime we can then load it as needed.

```console
-$ icu4x-datagen --keys all --locales ccp --format blob --out ccp.blob
+$ icu4x-datagen --keys all --locales ccp --format blob2 --out ccp.blob
```

This will generate a `ccp.blob` file containing data for Chakma.

-Note: if you're having technical difficulties, this file is available [here](https://storage.googleapis.com/static-493776/icu4x_2023-11-03/ccp.blob).
+💡 Note: if you're having technical difficulties, this file is available [here](https://storage.googleapis.com/static-493776/icu4x_2023-11-03/ccp.blob).

+💡 Note: `--format blob2` generates version 2 of the blob format. Alternatively, `--format blob` produces an older blob format which works with ICU4X prior to 1.4 but is not as optimized.
+
## 3. Using the data pack

@@ -133,7 +135,7 @@ Note: the following steps are currently only possible in Rust. 🤷
When we ran `icu4x-datagen`, we passed `--keys all`, which makes it generate *all* data for the Chakma locale, even though we only need date formatting. We can make `icu4x-datagen` analyze our binary to figure out which keys are needed:

```console
-$ icu4x-datagen --keys-for-bin target/debug/tutorial --locales ccp --format blob --out ccp_smaller.blob
+$ icu4x-datagen --keys-for-bin target/debug/tutorial --locales ccp --format blob2 --out ccp_smaller.blob
```

Note: you usually want to build with the `--release` flag, and analyze that binary, but we don't have all day.
@@ -176,7 +178,7 @@ Now we can run datagen with `--keys-for-bin` again:

```console
$ cargo build
-$ icu4x-datagen --keys-for-bin target/debug/tutorial --locales ccp --format blob --out ccp_smallest.blob
+$ icu4x-datagen --keys-for-bin target/debug/tutorial --locales ccp --format blob2 --out ccp_smallest.blob
```

The output will be much shorter: