diff --git a/docs/tutorials/crates/buffer/Makefile b/docs/tutorials/crates/buffer/Makefile
index 8342d4a2cbd..68ec080aee0 100755
--- a/docs/tutorials/crates/buffer/Makefile
+++ b/docs/tutorials/crates/buffer/Makefile
@@ -18,7 +18,7 @@ bin/tutorial_buffer:
 
 buffer_data.postcard: ../bin/icu4x-datagen bin/tutorial_buffer
 	../bin/icu4x-datagen \
-		--format blob \
+		--format blob2 \
 		--keys-for-bin bin/tutorial_buffer \
 		--locales my en-ZA \
 		--cldr-tag latest \
diff --git a/docs/tutorials/data_management.md b/docs/tutorials/data_management.md
index 38f66ce34b9..ce53513091b 100644
--- a/docs/tutorials/data_management.md
+++ b/docs/tutorials/data_management.md
@@ -135,14 +135,16 @@ $ cargo add icu --features serde
 $ cargo add icu_provider_blob
 ```
 
-We can generate data for it using the `--format blob` flag:
+We can generate data for it using the `--format blob2` flag:
 
 ```console
-$ icu4x-datagen --keys all --locales ja --format blob --out my_data_blob.postcard
+$ icu4x-datagen --keys all --locales ja --format blob2 --out my_data_blob.postcard
 ```
 
 This will generate a `my_data_blob.postcard` file containing the serialized data for all components. The file is several megabytes large; we will optimize it later!
 
+💡 Note: `--format blob2` generates version 2 of the blob format. Alternatively, `--format blob` produces an older blob format which works with ICU4X prior to 1.4 but is not as optimized.
+
 ## Locale Fallbacking
 
 Unlike `BakedDataProvider`, `BlobDataProvider` (and `FsDataProvider`) does not perform locale fallbacking. For example, if `en-US` is requested but only `en` data is available, then the data request will fail. To enable fallback, we can wrap the provider in a `LocaleFallbackProvider`.
@@ -191,7 +193,7 @@ As you can see in the second `expect` message, it's not possible to statically t
 You might have noticed that the blob we generated is a hefty 13MB. This is no surprise, as we used `--keys all`. However, our binary only uses date formatting data in Japanese. There's room for optimization:
 
 ```console
-$ icu4x-datagen --keys-for-bin target/debug/myapp --locales ja --format blob --out my_data_blob.postcard --overwrite
+$ icu4x-datagen --keys-for-bin target/debug/myapp --locales ja --format blob2 --out my_data_blob.postcard --overwrite
 ```
 
 The `--keys-for-bin` argument tells `icu4x-datagen` to analyze the binary and only include keys that are used by its code. This significantly reduces the blob's file size, to 54KB, and our program still works. Quite the improvement!
diff --git a/docs/tutorials/data_management_interactive.md b/docs/tutorials/data_management_interactive.md
index 1f9137fc216..6dbc5103280 100644
--- a/docs/tutorials/data_management_interactive.md
+++ b/docs/tutorials/data_management_interactive.md
@@ -26,12 +26,14 @@ cargo install icu_datagen
 We're ready to generate the data. We will use the blob format, and create a blob that will contain just Chakma data. At runtime we can then load it as needed.
 
 ```console
-$ icu4x-datagen --keys all --locales ccp --format blob --out ccp.blob
+$ icu4x-datagen --keys all --locales ccp --format blob2 --out ccp.blob
 ```
 
 This will generate a `ccp.blob` file containing data for Chakma.
 
-Note: if you're having technical difficulties, this file is available [here](https://storage.googleapis.com/static-493776/icu4x_2023-11-03/ccp.blob).
+💡 Note: if you're having technical difficulties, this file is available [here](https://storage.googleapis.com/static-493776/icu4x_2023-11-03/ccp.blob).
+
+💡 Note: `--format blob2` generates version 2 of the blob format. Alternatively, `--format blob` produces an older blob format which works with ICU4X prior to 1.4 but is not as optimized.
 
 ## 3. Using the data pack
 
@@ -133,7 +135,7 @@ Note: the following steps are currently only possible in Rust. 🤷
 When we ran `icu4x-datagen`, we passed `--keys all`, which make it generate *all* data for the Chakma locale, even though we only need date formatting. We can make `icu4x-datagen` analyze our binary to figure out which keys are needed:
 
 ```console
-$ icu4x-datagen --keys-for-bin target/debug/tutorial --locales ccp --format blob --out ccp_smaller.blob
+$ icu4x-datagen --keys-for-bin target/debug/tutorial --locales ccp --format blob2 --out ccp_smaller.blob
 ```
 
 Note: you usually want to build with the `--release` flag, and analyze that binary, but we don't have all day.
@@ -176,7 +178,7 @@ Now we can run datagen with `--keys-for-bin` again:
 
 ```console
 $ cargo build
-$ icu4x-datagen --keys-for-bin target/debug/tutorial --locales ccp --format blob --out ccp_smallest.blob
+$ icu4x-datagen --keys-for-bin target/debug/tutorial --locales ccp --format blob2 --out ccp_smallest.blob
 ```
 
 The output will be much shorter: