diff --git a/site/docs/concepts/advanced/evolutions.md b/site/docs/concepts/advanced/evolutions.md index 28c3f882e1..8b8d73fdd1 100644 --- a/site/docs/concepts/advanced/evolutions.md +++ b/site/docs/concepts/advanced/evolutions.md @@ -53,12 +53,12 @@ When you attempt to publish a breaking change to a collection in the Flow web ap Click the **Apply** button to trigger an evolution and update all necessary specifications to keep your Data Flow functioning. Then, review and publish your draft. -If you enabled [AutoDiscover](../captures.md#autodiscover) on a capture, any breaking changes that it introduces will trigger an automatic schema evolution, so long as you selected the **Breaking change re-versions collections** option(`evolveIncompatibleCollections`). +If you enabled [AutoDiscover](../captures.md#autodiscover) on a capture, any breaking changes that it introduces will trigger an automatic schema evolution, so long as you selected the **Breaking change re-versions collections** option (`evolveIncompatibleCollections`). ## What do schema evolutions do? The schema evolution feature is available in the Flow web app when you're editing pre-existing Flow entities. -It notices when one of your edit would cause other components of the Data Flow to fail, alerts you, and gives you the option to automatically update the specs of these components to prevent failure. +It notices when one of your edits would cause other components of the Data Flow to fail, alerts you, and gives you the option to automatically update the specs of these components to prevent failure. In other words, evolutions happen in the *draft* state. Whenever you edit, you create a draft. Evolutions add to the draft so that when it is published and updates the active data flow, operations can continue seamlessly.
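For reference, a capture opts into this behavior through its `autoDiscover` stanza. A minimal sketch (the task name, connector image, and config path are illustrative, not part of the change above):

```yaml
captures:
  acmeCo/example/source-postgres:
    # Periodically re-run discovery and fold the results into this task.
    autoDiscover:
      # Re-version collections when a discovered change is incompatible —
      # the "Breaking change re-versions collections" option in the web app.
      evolveIncompatibleCollections: true
    endpoint:
      connector:
        image: ghcr.io/estuary/source-postgres:dev
        config: config.yaml
    bindings: []
```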
diff --git a/site/docs/concepts/collections.md b/site/docs/concepts/collections.md index dfd160905e..9930c1c534 100644 --- a/site/docs/concepts/collections.md +++ b/site/docs/concepts/collections.md @@ -332,7 +332,7 @@ If desired, a derivation could re-key the collection on `[/userId, /name]` to materialize the various `/name`s seen for a `/userId`. This property makes keys less lossy than they might otherwise appear, -and it is generally good practice to chose a key that reflects how +and it is generally good practice to choose a key that reflects how you wish to _query_ a collection, rather than an exhaustive key that's certain to be unique for every document. diff --git a/site/docs/concepts/connectors.md b/site/docs/concepts/connectors.md index bfb312f23b..f2ea738dac 100644 --- a/site/docs/concepts/connectors.md +++ b/site/docs/concepts/connectors.md @@ -219,7 +219,7 @@ sops: ``` You then use this `config.yaml` within your Flow specification. -The Flow runtime knows that this document is protected by `sops` +The Flow runtime knows that this document is protected by `sops`, will continue to store it in its protected form, and will attempt a decryption only when invoking a connector on your behalf. diff --git a/site/docs/concepts/derivations.md b/site/docs/concepts/derivations.md index 1714de71e1..de7e5745e3 100644 --- a/site/docs/concepts/derivations.md +++ b/site/docs/concepts/derivations.md @@ -218,8 +218,8 @@ into JSON arrays or objects and embeds them into the mapped document: `{"greeting": "hello", "items": [1, "two", 3]}`. If parsing fails, the raw string is used instead. -If you would like to select all columns of the input collection, -rather than `select *`, use `select JSON($flow_document)`, e.g. +If you would like to select all columns of the input collection, +rather than `select *`, use `select JSON($flow_document)`, e.g. `select JSON($flow_document) where $status = 'open';`.
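In context, such a lambda appears inline in a SQLite derivation's transform. A sketch assuming a hypothetical source collection `acmeCo/example/orders` whose documents carry a `status` field (all names here are illustrative):

```yaml
collections:
  acmeCo/example/open-orders:
    schema: open-orders.schema.yaml
    key: [/id]
    derive:
      using:
        sqlite: {}
      transforms:
        - name: filterOpenOrders
          source: acmeCo/example/orders
          # Emit the full source document, but only for open orders.
          lambda: select JSON($flow_document) where $status = 'open';
```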
As a special case, if your query selects a _single_ column @@ -608,6 +608,7 @@ Flow read delays are very efficient and scale better than managing very large numbers of fine-grain timers. [See Grouped Windows of Transfers for an example using a read delay](#grouped-windows-of-transfers) + [Learn more from the Citi Bike "idle bikes" example](https://github.com/estuary/flow/blob/master/examples/citi-bike/idle-bikes.flow.yaml) ### Read priority @@ -639,7 +640,7 @@ For SQLite derivations, the entire SQLite database is the internal state of the task. TypeScript derivations can use in-memory states with a recovery and checkpoint mechanism. -Estuary intends to offer an additional mechanisms for +Estuary intends to offer additional mechanisms for automatic internal state snapshot and recovery in the future. The exact nature of internal task states varies, diff --git a/site/docs/concepts/import.md b/site/docs/concepts/import.md index c5435b50d0..9645a61232 100644 --- a/site/docs/concepts/import.md +++ b/site/docs/concepts/import.md @@ -3,7 +3,7 @@ sidebar_position: 7 --- # Imports -When you work on a draft Data Flow [using `flowctl draft`](../concepts/flowctl.md#working-with-drafts), +When you work on a draft Data Flow [using `flowctl draft`](../guides/flowctl/edit-draft-from-webapp.md), your Flow specifications may be spread across multiple files. For example, you may have multiple **materializations** that read from collections defined in separate files, or you could store a **derivation** separately from its **tests**. diff --git a/site/docs/concepts/materialization.md b/site/docs/concepts/materialization.md index e714aeabb8..2a300a3fd9 100644 --- a/site/docs/concepts/materialization.md +++ b/site/docs/concepts/materialization.md @@ -26,7 +26,7 @@ You define and configure materializations in **Flow specifications**. Materializations use real-time [connectors](./connectors.md) to connect to many endpoint types.
When you use a materialization connector in the Flow web app, -flow helps you configure it through the **discovery** workflow. +Flow helps you configure it through the **discovery** workflow. To begin discovery, you tell Flow the connector you'd like to use, basic information about the endpoint, and the collection(s) you'd like to materialize there. @@ -67,7 +67,7 @@ materializations: # Name of the collection to be read. # Required. name: acmeCo/example/collection - # Lower bound date-time for documents which should be processed. + # Lower bound date-time for documents which should be processed. # Source collection documents published before this date-time are filtered. # `notBefore` is *only* a filter. Updating its value will not cause Flow # to re-process documents that have already been read. @@ -93,11 +93,11 @@ materializations: # Priority applied to documents processed by this binding. # When all bindings are of equal priority, documents are processed # in order of their associated publishing time. - # + # # However, when one binding has a higher priority than others, # then *all* ready documents are processed through the binding # before *any* documents of other bindings are processed. - # + # # Optional. Default: 0, integer >= 0 priority: 0 @@ -362,24 +362,27 @@ field implemented. Consult the individual connector documentation for details. ### How It Works 1. **Source Capture Level:** - - If the source capture provides a schema or namespace, it will be used as the default schema for all bindings in - - the materialization. + + If the source capture provides a schema or namespace, it will be used as the default schema for all bindings in the materialization. 2. **Manual Overrides:** - - You can still manually configure schema names for each binding, overriding the default schema if needed. + + You can still manually configure schema names for each binding, overriding the default schema if needed. 3. 
**Materialization-Level Configuration:** - - The default schema name can be set at the materialization level, ensuring that all new captures within that - - materialization automatically inherit the default schema name. + + The default schema name can be set at the materialization level, ensuring that all new captures within that materialization automatically inherit the default schema name. ### Configuration Steps 1. **Set Default Schema at Source Capture Level:** - - When defining your source capture, specify the schema or namespace. If no schema is provided, Estuary Flow will - - automatically assign a default schema. - + + When defining your source capture, specify the schema or namespace. If no schema is provided, Estuary Flow will automatically assign a default schema. + 2. **Override Schema at Binding Level:** - - For any binding, you can manually override the default schema by specifying a different schema name. + + For any binding, you can manually override the default schema by specifying a different schema name. 3. **Set Default Schema at Materialization Level:** - - During the materialization configuration, set a default schema name for all captures within the materialization. + + During the materialization configuration, set a default schema name for all captures within the materialization. diff --git a/site/docs/concepts/schemas.md b/site/docs/concepts/schemas.md index b1f92d2a97..812e0c0ca2 100644 --- a/site/docs/concepts/schemas.md +++ b/site/docs/concepts/schemas.md @@ -45,7 +45,7 @@ Flow can usually generate suitable JSON schemas on your behalf. For systems like relational databases, Flow will typically generate a complete JSON schema by introspecting the table definition. -For systems that store unstructured data, Flow will typically generate a very minimal schema, and will rely on schema inferrence to fill in the details. See [continuous schema inferenece](#continuous-schema-inference) for more information. 
+For systems that store unstructured data, Flow will typically generate a very minimal schema, and will rely on schema inference to fill in the details. See [continuous schema inference](#continuous-schema-inference) for more information. ### Translations @@ -72,7 +72,7 @@ Schema inference is also used to provide translations into other schema flavors: ### Annotations The JSON Schema standard introduces the concept of -[annotations](http://json-schema.org/understanding-json-schema/reference/generic.html#annotations), +[annotations](https://json-schema.org/understanding-json-schema/reference/annotations), which are keywords that attach metadata to a location within a validated JSON document. For example, `title` and `description` can be used to annotate a schema with its meaning: diff --git a/site/docs/concepts/storage-mappings.md b/site/docs/concepts/storage-mappings.md index 92d143d6cf..07cd39270b 100644 --- a/site/docs/concepts/storage-mappings.md +++ b/site/docs/concepts/storage-mappings.md @@ -22,7 +22,7 @@ Flow tasks — captures, derivations, and materializations — use recovery logs Recovery logs are an opaque binary log, but may contain user data. The recovery logs of a task are always prefixed by `recovery/`, -so a task named `acmeCo/produce-TNT` would have a recovery log called `recovery/acmeCo/roduce-TNT` +so a task named `acmeCo/produce-TNT` would have a recovery log called `recovery/acmeCo/produce-TNT` Flow prunes data from recovery logs once it is no longer required. diff --git a/site/docs/concepts/web-app.md b/site/docs/concepts/web-app.md index c2949e8913..0eeea40c95 100644 --- a/site/docs/concepts/web-app.md +++ b/site/docs/concepts/web-app.md @@ -22,7 +22,7 @@ With the Flow web app, you can perform most common workflows, including: * Viewing users and permissions. * Granting permissions to other users. * Authenticating with the flowctl CLI. -* Manage billing details. +* Managing billing details. 
Some advanced workflows, like transforming data with **derivations**, aren't fully available in the web app. @@ -31,7 +31,7 @@ it provides a quicker and easier path to create captures and materializations. Y ## Signing in -You use a Google, Microsoft, or GitHub account to sign into Flow. +You use a Google, Microsoft, or GitHub account to sign into Flow. Alternatively, [contact us](https://estuary.dev/contact-us) about Single Sign-On (SSO) options. ![](<./webapp-images/login-screen.png>) @@ -54,14 +54,14 @@ import Mermaid from '@theme/Mermaid'; `}/> While you may choose to [use the tabs in this sequence](../guides/create-dataflow.md), it's not necessary. -All Flow entities exist individually, outside of the context of complete Data Flow. +All Flow entities exist individually, outside of the context of a complete Data Flow. You can use the different pages in the web app to monitor and manage your items in a number of other ways, as described below. ## Captures page The **Captures** page shows you a table of existing Flow [captures](./captures.md) to which you have [access](../reference/authentication.md). The **New Capture** button is also visible. -You use the table to monitor your captures. +You can use the table to monitor your captures. ![](<./webapp-images/capture-page.png>) @@ -93,12 +93,11 @@ you can find it by filtering for `acmeCo*source-postgres`. **8:** Capture [statistics](./advanced/logs-stats.md#statistics). The **Data Written** column shows the total amount of data, in bytes and in [documents](./collections.md#documents), that the capture has written to its associated collections within a configurable time interval. -Click the time interval in the header to select from **Today**, **Yesterday**, **This Week**, **Last Week**, **This Month**, or **Last Month**. -Note that the time intervals are in UTC. 
+Click the time interval in the header to select from **Today**, **Yesterday**, **This Week**, **Last Week**, **This Month**, **Last Month**, or **All Time**. -**9:** Associated collections. The **Writes to** column shows all the collections to which the capture writes data. For captures with a large number of collections, there is a chip stating how many collections are hidden. Clicking on this will all you to hover over this column and scroll to view the full list. These are also links to the details page of the collection. +**9:** Associated collections. The **Writes to** column shows all the collections to which the capture writes data. For captures with a large number of collections, there is a chip stating how many collections are hidden. Clicking on this will allow you to hover over this column and scroll to view the full list. These also link to the details page of the collection. -**10:** Publish time. Hover over this value to see the exact UTC time the capture was last published. +**10:** Publish time. Hover over this value to see the exact time the capture was last published. **11:** Options. Click to open the menu to **Edit Specification**. @@ -125,7 +124,8 @@ You can proceed to the materialization, or opt to exit to a different page of th ## Collections page -The **Collections** page shows a read-only table of [collections](./collections.md) to which you have access. +The **Collections** page shows a table of [collections](./collections.md) to which you have access. There is also a button to begin a new derivation, or transformation. + The table has nearly all of the same features as the **Captures** table, with several important distinctions that are called out in the image below. @@ -155,17 +155,16 @@ In the event that the server cannot be reached, the indicator will show "Unknown **6:** Collection [statistics](./advanced/logs-stats.md#statistics). 
The **Data Written** column shows the total amount of data, in bytes and in [documents](./collections.md#documents), that has been written to each collection from its associated capture or derivation within a configurable time interval. -Click the time interval in the header to select from **Today**, **Yesterday**, **This Week**, **Last Week**, **This Month**, or **Last Month**. -Note that the time intervals are in UTC. +Click the time interval in the header to select from **Today**, **Yesterday**, **This Week**, **Last Week**, **This Month**, **Last Month**, or **All Time**. -**7:** Publish time. Hover over this value to see the exact UTC time the collection was last published. +**7:** Publish time. Hover over this value to see the exact time the collection was last published. ## Materializations page The **Materializations** page shows you a table of existing Flow [materializations](./materialization.md) to which you have [access](../reference/authentication.md). The **New Materialization** button is also visible. -You use the table to monitor your materializations. It's nearly identical to the table on the [Captures page](#captures-page), with a few exceptions. +You can use the table to monitor your materializations. It's nearly identical to the table on the [Captures page](#captures-page), with a few exceptions. ![](<./webapp-images/materialization-page.png>) @@ -194,12 +193,11 @@ you can find it by filtering for `acmeCo*mysql`. **7:** Materialization [statistics](./advanced/logs-stats.md#statistics). The **Data Read** column shows the total amount of data, in bytes and in [documents](./collections.md#documents), that the materialization has read from its associated collections within a configurable time interval. -Click the time interval in the header to select from **Today**, **Yesterday**, **This Week**, **Last Week**, **This Month**, or **Last Month**. -Note that the time intervals are in UTC. 
+Click the time interval in the header to select from **Today**, **Yesterday**, **This Week**, **Last Week**, **This Month**, **Last Month**, or **All Time**. -**8:** Associated collections. The **Reads from** column shows all the collections from which the materialization reads data. For materializations with a large number of collections, there is a chip stating how many collections are hidden. Clicking on this will all you to hover over this column and scroll to view the full list. These are also links to the details page of the collection. +**8:** Associated collections. The **Reads from** column shows all the collections from which the materialization reads data. For materializations with a large number of collections, there is a chip stating how many collections are hidden. Clicking on this will allow you to hover over this column and scroll to view the full list. These also link to the details page of the collection. -**9:** Publish time. Hover over this value to see the exact UTC time the materialization was last published. +**9:** Publish time. Hover over this value to see the exact time the materialization was last published. **10:** Options. Click to open the menu to **Edit Specification**. @@ -239,15 +237,15 @@ When you click on the **name** of a capture on the [captures page](#captures-pag **1:** The full name of the capture. -**2:** Capture [statistics](./advanced/logs-stats.md#statistics). The **Usage** section displays the total amount of data, in bytes and in [documents](./collections.md#documents) written by the capture, per hour. The number of hours being displayed in the chart can be changed by clicking the time interval in the header to select from **6 hours**, **12 hours**, **24 hours**. +**2:** Capture [statistics](./advanced/logs-stats.md#statistics). The **Usage** section displays the total amount of data, in bytes and in [documents](./collections.md#documents) written by the capture, per hour. 
The number of hours being displayed in the chart can be changed by clicking the time interval in the header to select from **6 hours**, **12 hours**, **24 hours**, **48 hours**, or **30 days**. -**3:** The **Details** section shows different pieces of information about the capture. When it was last updated, when it was created, the connector being used, and the collections to which the capture writes data. +**3:** The **Details** section shows information about the capture: when it was last updated, when it was created, the connector being used, and the collections to which the capture writes data. **4:** Detailed tooltip. You can hover over a section in the graph to see the specific data of that hour. **5:** The most recent hour. This will automatically update every 15 seconds with the most recent data and docs. -**6:** Associated collections. Shows all the collections to which the capture writes data and when clicked will take you to the collection's [detail page](#collection-details-page) +**6:** Associated collections. Shows all the collections to which the capture writes data and when clicked will take you to the collection's [detail page](#collection-details-page). **7:** The **Shard Information** section shows the full identifier of the shard(s) that back your capture. If there's an error, you'll see an alert identifying the failing shard(s). Use the drop-down to open an expanded view of the failed shard's logs. @@ -266,15 +264,15 @@ When you click on the **name** of a collection on the [collections page](#collec **1:** The full name of the collection. -**2:** Collection [statistics](./advanced/logs-stats.md#statistics). The **Usage** section shows the total amount of data, in bytes and in [documents](./collections.md#documents) passing through a collection, per hour. The number of hours being displayed in the chart can be changed by clicking the time interval in the header to select from **6 hours**, **12 hours**, **24 hours**. 
+**2:** Collection [statistics](./advanced/logs-stats.md#statistics). The **Usage** section shows the total amount of data, in bytes and in [documents](./collections.md#documents) passing through a collection, per hour. The number of hours being displayed in the chart can be changed by clicking the time interval in the header to select from **6 hours**, **12 hours**, **24 hours**, **48 hours**, or **30 days**. -**3:** The **Details** section shows different pieces of information about the collection. When it was last updated, when it was created, and the associated collections (if any). +**3:** The **Details** section shows information about the collection: when it was last updated, when it was created, and the associated collections (if any). **4:** Detailed tooltip. You can hover over a section in the graph to see the specific data of that hour. **5:** The most recent hour. This will automatically update every 15 seconds with the most recent data and docs. -**6:** Associated collections. Shows all the collections to which the capture writes data and when clicked will take you to the collection's [detail page](#collection-details-page) +**6:** Associated collections. Shows source collections that this collection reads from. Click to go to the collection's [detail page](#collection-details-page). **7:** The **Shard Information** section (for derivations) shows the full identifier of the shard(s) that back your derivation. If there's an error, you'll see an alert identifying the failing shard(s). Use the drop-down to open an expanded view of the failed shard's logs. @@ -289,7 +287,7 @@ Documents are organized by their collection key value. Click a key from the list **2:** The collection's [schema](./schemas.md) displayed in a read only table. The table columns can be sorted to more easily find what you need. :::tip -If you need to modify a collection, edit the [capture](#editing-captures) that it came from. 
+If you need to modify a collection, edit the [capture](#editing-captures) or [derivation](./derivations.md) that provides its data. ::: ## Materialization Details Page @@ -303,15 +301,15 @@ When you click on the **name** of a materialization on the [materializations pag **1:** The full name of the materialization. -**2:** Materialization [statistics](./advanced/logs-stats.md#statistics). The **Usage** section shows the total amount of data, in bytes and in [documents](./collections.md#documents) read by a materialization, per hour. The number of hours being displayed in the chart can be changed by clicking the time interval in the header to select from **6 hours**, **12 hours**, **24 hours**. +**2:** Materialization [statistics](./advanced/logs-stats.md#statistics). The **Usage** section shows the total amount of data, in bytes and in [documents](./collections.md#documents) read by a materialization, per hour. The number of hours being displayed in the chart can be changed by clicking the time interval in the header to select from **6 hours**, **12 hours**, **24 hours**, **48 hours**, or **30 days**. -**3:** The **Details** section shows different pieces of information about the materialization. When it was last updated, when it was created, and the associated collections. +**3:** The **Details** section shows information about the materialization: when it was last updated, when it was created, and the associated collections. **4:** Detailed tooltip. You can hover over a section in the graph to see the specific data of that hour. **5:** The most recent hour. This will automatically update every 15 seconds with the most recent data and docs. -**6:** Associated collections. Shows all the collections to which the capture writes data and when clicked will take you to the collection's [detail page](#collection-details-page) +**6:** Associated collections. Shows all the collections that provide data to this materialization. 
Click to go to the collection's [detail page](#collection-details-page). **7:** The **Shard Information** section shows the full identifier of the shard(s) that back your materialization. If there's an error, you'll see an alert identifying the failing shard(s). Use the drop-down to open an expanded view of the failed shard's logs. @@ -323,46 +321,63 @@ In the **Spec** tab, you can view the specification of the materialization itsel ## Admin page On the **Admin** page, you can view users' access grants, your organization's cloud storage locations, and a complete list of connectors. -You can also get an access token to authenticate with flowctl and update your cookie preferences. +You can also get an access token to authenticate with flowctl and manage billing information. -#### Users +### Account Access -The **Users** tab shows you all provisioned access grants on objects to which you also have access. +The **Account Access** tab shows you all provisioned access grants on objects to which you also have access. Both users and catalog prefixes can receive access grants. -These are split up into two tables called **Users** and **Prefixes**. +These are split up into two tables called **Organization Membership** and **Data Sharing**. Each access grant has its own row, so a given user or prefix may have multiple rows. -For example, if you had read access to `foo/` and write access to `bar/`, you'd have a separate table row in the **Users** table for each of these capabilities. +For example, if you had read access to `foo/` and write access to `bar/`, you'd have a separate table row in the **Organization Membership** table for each of these capabilities. If users Alice, Bob, and Carol each had write access on `foo/`, you'd see three more table rows representing these access grants. -Taking this a step further, the prefix `foo/` could have read access to `buz/`. 
You'd see this in the **Prefixes** table, +Taking this a step further, the prefix `foo/` could have read access to `buz/`. You'd see this in the **Data Sharing** table, and it'd signify that everyone who has access to `foo/` also inherits read access to `buz/`. Use the search boxes to filter by username, prefix, or object. +You can manage access by generating new user invitations, granting data sharing access, or selecting users or prefixes to revoke access. + +![](<./webapp-images/access-grant-invitation.png>) + +Generating a new invitation will create a URL with a grant token parameter. This token will allow access based on the prefix, capability, and type you select. Copy the URL and share it with its intended recipient to invite them to your organization. + [Learn more about capabilities and access.](../reference/authentication.md) -#### Storage Mappings +### Settings + +The **Settings** tab includes additional configuration, such as organization notifications and storage mappings. + +#### Organization Notifications -The **Storage Mappings** tab includes a table of the cloud storage locations that back your Flow collections. +Here, you are able to configure which email address(es) will receive notifications related to your organization or prefix. + +#### Cloud Storage + +This section provides a table of the cloud storage locations that back your Flow collections. You're able to view the table if you're an admin. Each top-level Flow [prefix](./catalogs.md#namespace) is backed by one or more cloud storage bucket that you own. You typically have just one prefix: your organization name, which you provided when configuring your Flow organizational account. -If you're a trial user, your prefix is `trial/`, and this tab isn't applicable to you; -your data is stored temporarily in Estuary's cloud storage bucket for your trial period. +If you're a trial user, your data is stored temporarily in Estuary's cloud storage bucket for your trial period. 
[Learn more about storage mappings.](./storage-mappings.md) -#### Connectors +### Billing -The **Connectors** tab offers a complete view of all connectors that are currently available through the web application, including both capture and materialization connectors. -If a connector you need is missing, you can request it. +The **Billing** tab allows you to view and manage information related to past usage, the current billing cycle, and payment methods. -#### CLI-API +Your usage is broken down by the amount of data processed and number of task hours. View usage trends across previous months in the **Usage by Month** chart and preview your bill based on usage for the current month. If you are on the free tier (up to 2 connectors and 10 GB per month), you will still be able to preview your bill breakdown, and will have a "Free tier credit" deduction. To help estimate your bill, also see the [Pricing Calculator](https://estuary.dev/pricing/#pricing-calculator). -The **CLI-API** tab provides the access token required to [authenticate with flowctl](../reference/authentication.md#authenticating-flow-using-the-cli). +To pay your bill, add a payment method to your account. You can choose to pay via card or bank account. You will not be charged until you exceed the free tier's limits. + +### Connectors + +The **Connectors** tab offers a complete view of all connectors that are currently available through the web application, including both capture and materialization connectors. +If a connector you need is missing, you can [request it](https://github.com/estuary/connectors/issues/new/choose). -#### Cookie Preferences +### CLI-API -You use the **Cookie Preferences** tab to view and modify cookie settings. \ No newline at end of file +The **CLI-API** tab provides the access token required to [authenticate with flowctl](../reference/authentication.md#authenticating-flow-using-the-cli). You can also revoke old tokens. 
diff --git a/site/docs/concepts/webapp-images/access-grant-invitation.png b/site/docs/concepts/webapp-images/access-grant-invitation.png new file mode 100644 index 0000000000..81ede8bfc9 Binary files /dev/null and b/site/docs/concepts/webapp-images/access-grant-invitation.png differ diff --git a/site/docs/concepts/webapp-images/login-screen.png b/site/docs/concepts/webapp-images/login-screen.png index 1a49b14fb7..cf722738c9 100644 Binary files a/site/docs/concepts/webapp-images/login-screen.png and b/site/docs/concepts/webapp-images/login-screen.png differ diff --git a/site/docs/guides/flowctl/edit-draft-from-webapp.md b/site/docs/guides/flowctl/edit-draft-from-webapp.md index a1d08ffa32..cbc4cbfa4f 100644 --- a/site/docs/guides/flowctl/edit-draft-from-webapp.md +++ b/site/docs/guides/flowctl/edit-draft-from-webapp.md @@ -41,13 +41,13 @@ Drafts aren't currently visible in the Flow web app, but you can get a list with 2. Run `flowctl draft list` - flowctl outputs a table of all the drafts to which you have access, from oldest to newest. + flowctl outputs a table of all the drafts to which you have access, from oldest to newest. 3. Use the name and timestamp to find the draft you're looking for. - Each draft has an **ID**, and most have a name in the **Details** column. Note the **# of Specs** column. - For drafts created in the web app, materialization drafts will always contain one specification. - A number higher than 1 indicates a capture with its associated collections. + Each draft has an **ID**, and most have a name in the **Details** column. Note the **# of Specs** column. + For drafts created in the web app, materialization drafts will always contain one specification. + A number higher than 1 indicates a capture with its associated collections. 4. Copy the draft ID. @@ -57,10 +57,10 @@ Drafts aren't currently visible in the Flow web app, but you can get a list with 7. Browse the source files. 
- The source files and their directory structure will look slightly different depending on the draft. - Regardless, there will always be a top-level file called `flow.yaml` that *imports* all other YAML files, - which you'll find in a subdirectory named for your catalog prefix. - These, in turn, contain the specifications you'll want to edit. + The source files and their directory structure will look slightly different depending on the draft. + Regardless, there will always be a top-level file called `flow.yaml` that *imports* all other YAML files, + which you'll find in a subdirectory named for your catalog prefix. + These, in turn, contain the specifications you'll want to edit. ## Edit the draft and publish @@ -76,7 +76,7 @@ Next, you'll make changes to the specification(s), test, and publish the draft. 3. When you're done, sync the local work to the global draft: `flowctl draft author --source flow.yaml`. - Specifying the top-level `flow.yaml` file as the source ensures that all entities in the draft are imported. + Specifying the top-level `flow.yaml` file as the source ensures that all entities in the draft are imported. 4. Publish the draft: `flowctl draft publish` diff --git a/site/docs/guides/flowctl/edit-specification-locally.md b/site/docs/guides/flowctl/edit-specification-locally.md index 8c95b612d8..f91cd64109 100644 --- a/site/docs/guides/flowctl/edit-specification-locally.md +++ b/site/docs/guides/flowctl/edit-specification-locally.md @@ -79,7 +79,7 @@ Using these names, you'll identify and pull the relevant specifications for edit * Pull a group of specifications by prefix or type filter, for example: `flowctl catalog pull-specs --prefix myOrg/marketing --collections` - The source files are written to your current working directory. + The source files are written to your current working directory. 4. Browse the source files. @@ -106,15 +106,15 @@ Next, you'll complete your edits, test that they were performed correctly, and r 3. 
When you're done, you can test your changes: `flowctl catalog test --source flow.yaml` - You'll almost always use the top-level `flow.yaml` file as the source here because it imports all other Flow specifications - in your working directory. + You'll almost always use the top-level `flow.yaml` file as the source here because it imports all other Flow specifications + in your working directory. - Once the test has passed, you can publish your specifications. + Once the test has passed, you can publish your specifications. 4. Re-publish all the specifications you pulled: `flowctl catalog publish --source flow.yaml` - Again you'll almost always want to use the top-level `flow.yaml` file. If you want to publish only certain specifications, - you can provide a path to a different file. + Again you'll almost always want to use the top-level `flow.yaml` file. If you want to publish only certain specifications, + you can provide a path to a different file. 5. Return to the web app or use `flowctl catalog list` to check the status of the entities you just published. Their publication time will be updated to reflect the work you just did. diff --git a/site/docs/guides/schema-evolution.md b/site/docs/guides/schema-evolution.md index ccf2f5925f..2976b69b56 100644 --- a/site/docs/guides/schema-evolution.md +++ b/site/docs/guides/schema-evolution.md @@ -173,7 +173,7 @@ Regardless of whether the field is materialized or not, it must still pass schem Database and data warehouse materializations tend to be somewhat restrictive about changing column types. They typically only allow dropping `NOT NULL` constraints. This means that you can safely change a schema to make a required field optional, or to add `null` as a possible type, and the materialization will continue to work normally. Most other types of changes will require materializing into a new table. -The best way to find out whether a change is acceptable to a given connector is to run test or attempt to re-publish. 
Failed attempts to publish won't affect any tasks that are already running. +The best way to find out whether a change is acceptable to a given connector is to run a test or attempt to re-publish. Failed attempts to publish won't affect any tasks that are already running. **Web app workflow** diff --git a/site/docs/guides/system-specific-dataflows/s3-to-snowflake.md b/site/docs/guides/system-specific-dataflows/s3-to-snowflake.md index de9e8b891f..738a36a157 100644 --- a/site/docs/guides/system-specific-dataflows/s3-to-snowflake.md +++ b/site/docs/guides/system-specific-dataflows/s3-to-snowflake.md @@ -52,7 +52,7 @@ credentials provided by your Estuary account manager. 3. Find the **Amazon S3** tile and click **Capture**. - A form appears with the properties required for an S3 capture. + A form appears with the properties required for an S3 capture. 4. Type a name for your capture. @@ -69,23 +69,23 @@ credentials provided by your Estuary account manager. * **Prefix**: You might organize your S3 bucket using [prefixes](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-prefixes.html), which emulate a directory structure. To capture *only* from a specific prefix, add it here. - * **Match Keys**: Filters to apply to the objects in the S3 bucket. If provided, only data whose absolute path matches the filter will be captured. For example, `*\.json` will only capture JSON file. + * **Match Keys**: Filters to apply to the objects in the S3 bucket. If provided, only data whose absolute path matches the filter will be captured. For example, `*\.json` will only capture JSON files. See the S3 connector documentation for information on [advanced fields](../../reference/Connectors/capture-connectors/amazon-s3.md#endpoint) and [parser settings](../../reference/Connectors/capture-connectors/amazon-s3.md#advanced-parsing-cloud-storage-data). (You're unlikely to need these for most use cases.) 6. Click **Next**. 
- Flow uses the provided configuration to initiate a connection to S3. + Flow uses the provided configuration to initiate a connection to S3. - It generates a permissive schema and details of the Flow collection that will store the data from S3. + It generates a permissive schema and details of the Flow collection that will store the data from S3. - You'll have the chance to tighten up each collection's JSON schema later, when you materialize to Snowflake. + You'll have the chance to tighten up each collection's JSON schema later, when you materialize to Snowflake. 7. Click **Save and publish**. - You'll see a notification when the capture publishes successfully. + You'll see a notification when the capture publishes successfully. - The data currently in your S3 bucket has been captured, and future updates to it will be captured continuously. + The data currently in your S3 bucket has been captured, and future updates to it will be captured continuously. 8. Click **Materialize Collections** to continue. @@ -95,7 +95,7 @@ Next, you'll add a Snowflake materialization to connect the captured data to its 1. Locate the **Snowflake** tile and click **Materialization**. - A form appears with the properties required for a Snowflake materialization. + A form appears with the properties required for a Snowflake materialization. 2. Choose a unique name for your materialization like you did when naming your capture; for example, `acmeCo/mySnowflakeMaterialization`. @@ -112,12 +112,12 @@ Next, you'll add a Snowflake materialization to connect the captured data to its 4. Click **Next**. - Flow uses the provided configuration to initiate a connection to Snowflake. + Flow uses the provided configuration to initiate a connection to Snowflake. - You'll be notified if there's an error. In that case, fix the configuration form or Snowflake setup as needed and click **Next** to try again. + You'll be notified if there's an error. 
In that case, fix the configuration form or Snowflake setup as needed and click **Next** to try again. - Once the connection is successful, the Endpoint Config collapses and the **Source Collections** browser becomes prominent. - It shows the collection you captured previously, which will be mapped to a Snowflake table. + Once the connection is successful, the Endpoint Config collapses and the **Source Collections** browser becomes prominent. + It shows the collection you captured previously, which will be mapped to a Snowflake table. 5. In the **Collection Selector**, optionally change the name in the **Table** field. @@ -127,9 +127,9 @@ Next, you'll add a Snowflake materialization to connect the captured data to its 7. Apply a stricter schema to the collection for the materialization. - S3 has a flat data structure. - To materialize this data effectively to Snowflake, you should apply a schema that can translate to a table structure. - Flow's **Schema Inference** tool can help. + S3 has a flat data structure. + To materialize this data effectively to Snowflake, you should apply a schema that can translate to a table structure. + Flow's **Schema Inference** tool can help. 1. In the **Source Collections** browser, click the collection's **Collection** tab. diff --git a/site/docs/guides/transform_data_using_typescript.md b/site/docs/guides/transform_data_using_typescript.md index 3df0a292a4..53950c6d0a 100644 --- a/site/docs/guides/transform_data_using_typescript.md +++ b/site/docs/guides/transform_data_using_typescript.md @@ -273,7 +273,8 @@ You can use `flowctl` to quickly verify your derivation before publishing it. Us As you can see, the output format matches the defined schema.  The last step would be to publish your derivation to Flow, which you can also do using `flowctl`. -:::warning Publishing the derivation will initialize the transformation on the live, real-time Wikipedia stream, make sure to delete it after completing the tutorial. 
+:::warning +Publishing the derivation will initialize the transformation on the live, real-time Wikipedia stream; make sure to delete it after completing the tutorial. ::: ```shell