
misc typo/format fixes
aeluce committed Dec 27, 2024
1 parent 59f07d9 commit e0d55e1
Showing 13 changed files with 62 additions and 57 deletions.
4 changes: 2 additions & 2 deletions site/docs/concepts/advanced/evolutions.md
@@ -53,12 +53,12 @@ When you attempt to publish a breaking change to a collection in the Flow web ap

Click the **Apply** button to trigger an evolution and update all necessary specification to keep your Data Flow functioning. Then, review and publish your draft.

-If you enabled [AutoDiscover](../captures.md#autodiscover) on a capture, any breaking changes that it introduces will trigger an automatic schema evolution, so long as you selected the **Breaking change re-versions collections** option(`evolveIncompatibleCollections`).
+If you enabled [AutoDiscover](../captures.md#autodiscover) on a capture, any breaking changes that it introduces will trigger an automatic schema evolution, so long as you selected the **Breaking change re-versions collections** option (`evolveIncompatibleCollections`).

## What do schema evolutions do?

The schema evolution feature is available in the Flow web app when you're editing pre-existing Flow entities.
-It notices when one of your edit would cause other components of the Data Flow to fail, alerts you, and gives you the option to automatically update the specs of these components to prevent failure.
+It notices when one of your edits would cause other components of the Data Flow to fail, alerts you, and gives you the option to automatically update the specs of these components to prevent failure.

In other words, evolutions happen in the *draft* state. Whenever you edit, you create a draft.
Evolutions add to the draft so that when it is published and updates the active data flow, operations can continue seamlessly.
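
For reference, the capture-side setting is part of the `autoDiscover` stanza. A minimal sketch, assuming a hypothetical capture name and connector image:

```yaml
captures:
  acmeCo/example/source-postgres:
    autoDiscover:
      # The "Breaking change re-versions collections" option in the web app.
      evolveIncompatibleCollections: true
    endpoint:
      connector:
        image: ghcr.io/estuary/source-postgres:dev
        config: config.yaml
    bindings: []
```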
2 changes: 1 addition & 1 deletion site/docs/concepts/collections.md
@@ -332,7 +332,7 @@ If desired, a derivation could re-key the collection
on `[/userId, /name]` to materialize the various `/name`s seen for a `/userId`.

This property makes keys less lossy than they might otherwise appear,
-and it is generally good practice to chose a key that reflects how
+and it is generally good practice to choose a key that reflects how
you wish to _query_ a collection, rather than an exhaustive key
that's certain to be unique for every document.
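
As a sketch of this practice, here is a collection keyed for per-user queries (names and schema file are illustrative); a derivation could then re-key the same documents on `[/userId, /name]` as described above:

```yaml
collections:
  acmeCo/example/users:
    schema: users.schema.yaml
    # Key on how you wish to query: one document per user.
    key: [/userId]
```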

2 changes: 1 addition & 1 deletion site/docs/concepts/connectors.md
@@ -219,7 +219,7 @@ sops:
```
You then use this `config.yaml` within your Flow specification.
-The Flow runtime knows that this document is protected by `sops`
+The Flow runtime knows that this document is protected by `sops`,
will continue to store it in its protected form,
and will attempt a decryption only when invoking a connector on your behalf.
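
For illustration, a `sops`-protected `config.yaml` looks roughly like this, with encrypted values abbreviated and the metadata block trimmed:

```yaml
host: ENC[AES256_GCM,data:...,iv:...,tag:...,type:str]
password: ENC[AES256_GCM,data:...,iv:...,tag:...,type:str]
sops:
  version: 3.8.1
  lastmodified: "2024-12-27T00:00:00Z"
  mac: ENC[AES256_GCM,data:...,type:str]
  # ...key-provider entries (kms, gcp_kms, age, etc.)...
```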

7 changes: 4 additions & 3 deletions site/docs/concepts/derivations.md
@@ -218,8 +218,8 @@ into JSON arrays or objects and embeds them into the mapped document:
`{"greeting": "hello", "items": [1, "two", 3]}`.
If parsing fails, the raw string is used instead.

-If you would like to select all columns of the input collection,
-rather than `select *`, use `select JSON($flow_document)`, e.g.
+If you would like to select all columns of the input collection,
+rather than `select *`, use `select JSON($flow_document)`, e.g.
`select JSON($flow_document) where $status = open;`.

As a special case, if your query selects a _single_ column
@@ -608,6 +608,7 @@ Flow read delays are very efficient and scale better
than managing very large numbers of fine-grain timers.

[See Grouped Windows of Transfers for an example using a read delay](#grouped-windows-of-transfers)

[Learn more from the Citi Bike "idle bikes" example](https://github.com/estuary/flow/blob/master/examples/citi-bike/idle-bikes.flow.yaml)
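
A read delay is declared on the transform that reads the source collection. A minimal SQLite sketch, assuming hypothetical collection names, schema file, and delay value:

```yaml
collections:
  acmeCo/example/delayed-events:
    schema: events.schema.yaml
    key: [/id]
    derive:
      using:
        sqlite: {}
      transforms:
        - name: delayedEvents
          source: acmeCo/example/events
          readDelay: "2h"   # documents are processed two hours after publication
          lambda: select $id, $value;
```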

### Read priority
@@ -639,7 +640,7 @@ For SQLite derivations,
the entire SQLite database is the internal state of the task.
TypeScript derivations can use in-memory states with a
recovery and checkpoint mechanism.
-Estuary intends to offer an additional mechanisms for
+Estuary intends to offer additional mechanisms for
automatic internal state snapshot and recovery in the future.

The exact nature of internal task states varies,
2 changes: 1 addition & 1 deletion site/docs/concepts/import.md
@@ -3,7 +3,7 @@ sidebar_position: 7
---
# Imports

-When you work on a draft Data Flow [using `flowctl draft`](../concepts/flowctl.md#working-with-drafts),
+When you work on a draft Data Flow [using `flowctl draft`](../guides/flowctl/edit-draft-from-webapp.md),
your Flow specifications may be spread across multiple files.
For example, you may have multiple **materializations** that read from collections defined in separate files,
or you could store a **derivation** separately from its **tests**.
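
The top-level file stitches the others together with an `import` section. A minimal sketch with illustrative paths:

```yaml
# flow.yaml
import:
  - marketing/materializations.flow.yaml
  - sales/derivations.flow.yaml
```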
31 changes: 17 additions & 14 deletions site/docs/concepts/materialization.md
@@ -26,7 +26,7 @@ You define and configure materializations in **Flow specifications**.
Materializations use real-time [connectors](./connectors.md) to connect to many endpoint types.

When you use a materialization connector in the Flow web app,
-flow helps you configure it through the **discovery** workflow.
+Flow helps you configure it through the **discovery** workflow.

To begin discovery, you tell Flow the connector you'd like to use, basic information about the endpoint,
and the collection(s) you'd like to materialize there.
@@ -67,7 +67,7 @@ materializations:
# Name of the collection to be read.
# Required.
name: acmeCo/example/collection
-# Lower bound date-time for documents which should be processed.
+# Lower bound date-time for documents which should be processed.
# Source collection documents published before this date-time are filtered.
# `notBefore` is *only* a filter. Updating its value will not cause Flow
# to re-process documents that have already been read.
@@ -93,11 +93,11 @@ materializations:
# Priority applied to documents processed by this binding.
# When all bindings are of equal priority, documents are processed
# in order of their associated publishing time.
-#
+#
# However, when one binding has a higher priority than others,
# then *all* ready documents are processed through the binding
# before *any* documents of other bindings are processed.
-#
+#
# Optional. Default: 0, integer >= 0
priority: 0

@@ -362,24 +362,27 @@ field implemented. Consult the individual connector documentation for details.
### How It Works

1. **Source Capture Level:**
-  - If the source capture provides a schema or namespace, it will be used as the default schema for all bindings in
-  - the materialization.

+  If the source capture provides a schema or namespace, it will be used as the default schema for all bindings in the materialization.

2. **Manual Overrides:**
-  - You can still manually configure schema names for each binding, overriding the default schema if needed.

+  You can still manually configure schema names for each binding, overriding the default schema if needed.

3. **Materialization-Level Configuration:**
-  - The default schema name can be set at the materialization level, ensuring that all new captures within that
-  - materialization automatically inherit the default schema name.

+  The default schema name can be set at the materialization level, ensuring that all new captures within that materialization automatically inherit the default schema name.

### Configuration Steps

1. **Set Default Schema at Source Capture Level:**
-  - When defining your source capture, specify the schema or namespace. If no schema is provided, Estuary Flow will
-  - automatically assign a default schema.

+  When defining your source capture, specify the schema or namespace. If no schema is provided, Estuary Flow will automatically assign a default schema.

2. **Override Schema at Binding Level:**
-  - For any binding, you can manually override the default schema by specifying a different schema name.

+  For any binding, you can manually override the default schema by specifying a different schema name.

3. **Set Default Schema at Materialization Level:**
-  - During the materialization configuration, set a default schema name for all captures within the materialization.

+  During the materialization configuration, set a default schema name for all captures within the materialization.
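
Concretely, these settings live in the connector's endpoint and resource configuration. A sketch for a hypothetical warehouse materialization; the exact field names vary by connector, so treat this as illustrative and consult the connector's documentation:

```yaml
materializations:
  acmeCo/example/to-warehouse:
    endpoint:
      connector:
        image: ghcr.io/estuary/materialize-snowflake:dev
        config:
          # ...credentials and connection details...
          schema: PUBLIC        # assumed: default schema for all bindings
    bindings:
      - source: acmeCo/example/collection
        resource:
          table: example_table
          schema: ANALYTICS     # assumed: manual override for this binding
```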
4 changes: 2 additions & 2 deletions site/docs/concepts/schemas.md
@@ -45,7 +45,7 @@ Flow can usually generate suitable JSON schemas on your behalf.

For systems like relational databases, Flow will typically generate a complete JSON schema by introspecting the table definition.

-For systems that store unstructured data, Flow will typically generate a very minimal schema, and will rely on schema inferrence to fill in the details. See [continuous schema inferenece](#continuous-schema-inference) for more information.
+For systems that store unstructured data, Flow will typically generate a very minimal schema, and will rely on schema inference to fill in the details. See [continuous schema inference](#continuous-schema-inference) for more information.

### Translations

@@ -72,7 +72,7 @@ Schema inference is also used to provide translations into other schema flavors:
### Annotations

The JSON Schema standard introduces the concept of
-[annotations](http://json-schema.org/understanding-json-schema/reference/generic.html#annotations),
+[annotations](https://json-schema.org/understanding-json-schema/reference/annotations),
which are keywords that attach metadata to a location within a validated JSON document.
For example, `title` and `description` can be used to annotate a schema with its meaning:
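
A minimal illustration:

```yaml
type: object
properties:
  duration:
    type: integer
    title: Ride duration
    description: Elapsed time of the ride, in seconds.
```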

2 changes: 1 addition & 1 deletion site/docs/concepts/storage-mappings.md
@@ -22,7 +22,7 @@ Flow tasks — captures, derivations, and materializations — use recovery logs
Recovery logs are an opaque binary log, but may contain user data.

The recovery logs of a task are always prefixed by `recovery/`,
-so a task named `acmeCo/produce-TNT` would have a recovery log called `recovery/acmeCo/roduce-TNT`
+so a task named `acmeCo/produce-TNT` would have a recovery log called `recovery/acmeCo/produce-TNT`

Flow prunes data from recovery logs once it is no longer required.

18 changes: 9 additions & 9 deletions site/docs/guides/flowctl/edit-draft-from-webapp.md
@@ -41,13 +41,13 @@ Drafts aren't currently visible in the Flow web app, but you can get a list with

2. Run `flowctl draft list`

-flowctl outputs a table of all the drafts to which you have access, from oldest to newest.
+flowctl outputs a table of all the drafts to which you have access, from oldest to newest.

3. Use the name and timestamp to find the draft you're looking for.

-Each draft has an **ID**, and most have a name in the **Details** column. Note the **# of Specs** column.
-For drafts created in the web app, materialization drafts will always contain one specification.
-A number higher than 1 indicates a capture with its associated collections.
+Each draft has an **ID**, and most have a name in the **Details** column. Note the **# of Specs** column.
+For drafts created in the web app, materialization drafts will always contain one specification.
+A number higher than 1 indicates a capture with its associated collections.

4. Copy the draft ID.

@@ -57,10 +57,10 @@ Drafts aren't currently visible in the Flow web app, but you can get a list with

7. Browse the source files.

-The source files and their directory structure will look slightly different depending on the draft.
-Regardless, there will always be a top-level file called `flow.yaml` that *imports* all other YAML files,
-which you'll find in a subdirectory named for your catalog prefix.
-These, in turn, contain the specifications you'll want to edit.
+The source files and their directory structure will look slightly different depending on the draft.
+Regardless, there will always be a top-level file called `flow.yaml` that *imports* all other YAML files,
+which you'll find in a subdirectory named for your catalog prefix.
+These, in turn, contain the specifications you'll want to edit.

## Edit the draft and publish

Expand All @@ -76,7 +76,7 @@ Next, you'll make changes to the specification(s), test, and publish the draft.

3. When you're done, sync the local work to the global draft: `flowctl draft author --source flow.yaml`.

-Specifying the top-level `flow.yaml` file as the source ensures that all entities in the draft are imported.
+Specifying the top-level `flow.yaml` file as the source ensures that all entities in the draft are imported.

4. Publish the draft: `flowctl draft publish`
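
Taken together, the command-line loop looks roughly like this; the `select` and `develop` steps fall in the elided portion of this guide, so they're shown here as assumptions:

```shell
flowctl draft list                        # find the draft ID
flowctl draft select --id <draft-id>      # assumed: activate the draft
flowctl draft develop                     # assumed: pull its source files locally
# ...edit the specifications...
flowctl draft author --source flow.yaml   # sync local work to the global draft
flowctl draft publish                     # publish the draft
```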

12 changes: 6 additions & 6 deletions site/docs/guides/flowctl/edit-specification-locally.md
@@ -79,7 +79,7 @@ Using these names, you'll identify and pull the relevant specifications for edit

* Pull a group of specifications by prefix or type filter, for example: `flowctl catalog pull-specs --prefix myOrg/marketing --collections`

-The source files are written to your current working directory.
+The source files are written to your current working directory.

4. Browse the source files.

@@ -106,15 +106,15 @@ Next, you'll complete your edits, test that they were performed correctly, and r
3. When you're done, you can test your changes:
`flowctl catalog test --source flow.yaml`

-You'll almost always use the top-level `flow.yaml` file as the source here because it imports all other Flow specifications
-in your working directory.
+You'll almost always use the top-level `flow.yaml` file as the source here because it imports all other Flow specifications
+in your working directory.

-Once the test has passed, you can publish your specifications.
+Once the test has passed, you can publish your specifications.

4. Re-publish all the specifications you pulled: `flowctl catalog publish --source flow.yaml`

-Again you'll almost always want to use the top-level `flow.yaml` file. If you want to publish only certain specifications,
-you can provide a path to a different file.
+Again you'll almost always want to use the top-level `flow.yaml` file. If you want to publish only certain specifications,
+you can provide a path to a different file.

5. Return to the web app or use `flowctl catalog list` to check the status of the entities you just published.
Their publication time will be updated to reflect the work you just did.
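
The full edit cycle, using only commands shown in this guide:

```shell
flowctl catalog pull-specs --prefix myOrg/marketing --collections   # pull specs to edit
# ...edit the generated source files...
flowctl catalog test --source flow.yaml       # test your changes
flowctl catalog publish --source flow.yaml    # re-publish the specifications
flowctl catalog list                          # confirm the new publication time
```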
2 changes: 1 addition & 1 deletion site/docs/guides/schema-evolution.md
@@ -173,7 +173,7 @@ Regardless of whether the field is materialized or not, it must still pass schem

Database and data warehouse materializations tend to be somewhat restrictive about changing column types. They typically only allow dropping `NOT NULL` constraints. This means that you can safely change a schema to make a required field optional, or to add `null` as a possible type, and the materialization will continue to work normally. Most other types of changes will require materializing into a new table.
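
For example, this kind of relaxation is typically safe, sketched in JSON schema terms:

```yaml
# Before: `age` is required and must be an integer.
required: [id, age]
properties:
  age: {type: integer}

# After: `age` is optional and nullable, equivalent to dropping a NOT NULL
# constraint, which most connectors accept without a new table.
required: [id]
properties:
  age: {type: [integer, "null"]}
```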

-The best way to find out whether a change is acceptable to a given connector is to run test or attempt to re-publish. Failed attempts to publish won't affect any tasks that are already running.
+The best way to find out whether a change is acceptable to a given connector is to run a test or attempt to re-publish. Failed attempts to publish won't affect any tasks that are already running.

**Web app workflow**

30 changes: 15 additions & 15 deletions site/docs/guides/system-specific-dataflows/s3-to-snowflake.md
@@ -52,7 +52,7 @@ credentials provided by your Estuary account manager.

3. Find the **Amazon S3** tile and click **Capture**.

-A form appears with the properties required for an S3 capture.
+A form appears with the properties required for an S3 capture.

4. Type a name for your capture.

@@ -69,23 +69,23 @@ credentials provided by your Estuary account manager.

* **Prefix**: You might organize your S3 bucket using [prefixes](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-prefixes.html), which emulate a directory structure. To capture *only* from a specific prefix, add it here.

-* **Match Keys**: Filters to apply to the objects in the S3 bucket. If provided, only data whose absolute path matches the filter will be captured. For example, `*\.json` will only capture JSON file.
+* **Match Keys**: Filters to apply to the objects in the S3 bucket. If provided, only data whose absolute path matches the filter will be captured. For example, `*\.json` will only capture JSON files.

See the S3 connector documentation for information on [advanced fields](../../reference/Connectors/capture-connectors/amazon-s3.md#endpoint) and [parser settings](../../reference/Connectors/capture-connectors/amazon-s3.md#advanced-parsing-cloud-storage-data). (You're unlikely to need these for most use cases.)

6. Click **Next**.

-Flow uses the provided configuration to initiate a connection to S3.
+Flow uses the provided configuration to initiate a connection to S3.

-It generates a permissive schema and details of the Flow collection that will store the data from S3.
+It generates a permissive schema and details of the Flow collection that will store the data from S3.

-You'll have the chance to tighten up each collection's JSON schema later, when you materialize to Snowflake.
+You'll have the chance to tighten up each collection's JSON schema later, when you materialize to Snowflake.

7. Click **Save and publish**.

-You'll see a notification when the capture publishes successfully.
+You'll see a notification when the capture publishes successfully.

-The data currently in your S3 bucket has been captured, and future updates to it will be captured continuously.
+The data currently in your S3 bucket has been captured, and future updates to it will be captured continuously.

8. Click **Materialize Collections** to continue.

@@ -95,7 +95,7 @@ Next, you'll add a Snowflake materialization to connect the captured data to its

1. Locate the **Snowflake** tile and click **Materialization**.

-A form appears with the properties required for a Snowflake materialization.
+A form appears with the properties required for a Snowflake materialization.

2. Choose a unique name for your materialization like you did when naming your capture; for example, `acmeCo/mySnowflakeMaterialization`.

@@ -112,12 +112,12 @@ Next, you'll add a Snowflake materialization to connect the captured data to its

4. Click **Next**.

-Flow uses the provided configuration to initiate a connection to Snowflake.
+Flow uses the provided configuration to initiate a connection to Snowflake.

-You'll be notified if there's an error. In that case, fix the configuration form or Snowflake setup as needed and click **Next** to try again.
+You'll be notified if there's an error. In that case, fix the configuration form or Snowflake setup as needed and click **Next** to try again.

-Once the connection is successful, the Endpoint Config collapses and the **Source Collections** browser becomes prominent.
-It shows the collection you captured previously, which will be mapped to a Snowflake table.
+Once the connection is successful, the Endpoint Config collapses and the **Source Collections** browser becomes prominent.
+It shows the collection you captured previously, which will be mapped to a Snowflake table.

5. In the **Collection Selector**, optionally change the name in the **Table** field.

@@ -127,9 +127,9 @@ Next, you'll add a Snowflake materialization to connect the captured data to its

7. Apply a stricter schema to the collection for the materialization.

-S3 has a flat data structure.
-To materialize this data effectively to Snowflake, you should apply a schema that can translate to a table structure.
-Flow's **Schema Inference** tool can help.
+S3 has a flat data structure.
+To materialize this data effectively to Snowflake, you should apply a schema that can translate to a table structure.
+Flow's **Schema Inference** tool can help.

1. In the **Source Collections** browser, click the collection's **Collection** tab.

3 changes: 2 additions & 1 deletion site/docs/guides/transform_data_using_typescript.md
@@ -273,7 +273,8 @@ You can use `flowctl` to quickly verify your derivation before publishing it. Us

As you can see, the output format matches the defined schema. The last step would be to publish your derivation to Flow, which you can also do using `flowctl`.

-:::warning Publishing the derivation will initialize the transformation on the live, real-time Wikipedia stream, make sure to delete it after completing the tutorial.
+:::warning
+Publishing the derivation will initialize the transformation on the live, real-time Wikipedia stream, make sure to delete it after completing the tutorial.
:::

```shell
