Data structures via the cli supporting docs #1017

Draft · wants to merge 11 commits into base: `main`
366 changes: 366 additions & 0 deletions docs/recipes/recipe-data-structures-in-git/index.md

```diff
@@ -1,7 +1,7 @@
 ---
 title: "Managing data structures via the API"
 sidebar_label: "Using the API"
-sidebar_position: 2
+sidebar_position: 3
 sidebar_custom_props:
   offerings:
     - bdp
```
@@ -0,0 +1,125 @@
---
title: "Managing data structures via the CLI"
description: "Use the 'snowplow-cli data-structures' command to manage your data structures."
sidebar_label: "Using the CLI"
sidebar_position: 2
sidebar_custom_props:
  offerings:
    - bdp
> **Reviewer (Contributor):** Should cloud be included as well?
>
> **Reply (Author):** As far as I can tell, this is just how the docs differentiate between BDP and Community. There are no other options.
>
> **Reply (Contributor):** Maybe I'm mixing something up, or the docs were changed since the last time I did anything there.
---

```mdx-code-block
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
```

The `data-structures` subcommand of [Snowplow CLI](https://github.com/snowplow-product/snowplow-cli) provides a collection of commands that ease the integration of custom development and publishing workflows.

## Snowplow CLI Prerequisites

### Download

Releases can be found on GitHub: https://github.com/snowplow-product/snowplow-cli/releases.

For systems with `curl` available, the following commands should get you started with the latest version. Take care to replace `darwin_arm64` with the correct operating system and architecture for your system.

```bash
curl -L -o snowplow-cli https://github.com/snowplow-product/snowplow-cli/releases/latest/download/snowplow-cli_darwin_arm64
chmod u+x snowplow-cli
```
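
If you would rather not hard-code the platform, the asset name can be derived from `uname`. This is a sketch: the `snowplow-cli_<os>_<arch>` naming pattern is an assumption inferred from the example above, so check the actual asset names on the releases page before relying on it.

```shell
# Derive the release asset name from the local platform instead of
# hard-coding it. The snowplow-cli_<os>_<arch> pattern is assumed from
# the example above - verify against the actual release assets.
OS=$(uname -s | tr '[:upper:]' '[:lower:]')   # e.g. darwin, linux
ARCH=$(uname -m)                              # e.g. x86_64, arm64, aarch64
case "$ARCH" in
  x86_64) ARCH=amd64 ;;
  aarch64) ARCH=arm64 ;;
esac
ASSET="snowplow-cli_${OS}_${ARCH}"
echo "https://github.com/snowplow-product/snowplow-cli/releases/latest/download/${ASSET}"
# Then download it, e.g.:
# curl -L -o snowplow-cli "<printed URL>" && chmod u+x snowplow-cli
```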
:::info
The following examples assume you remain in the folder containing `snowplow-cli`.
:::

### Configure

You will need three values: an API key ID, an API key secret, and your organization ID.

The API key ID and secret are generated from the [credentials section](https://console.snowplowanalytics.com/credentials) in BDP Console.

The organization ID can be retrieved from the URL, immediately following `.com`, when visiting BDP Console:

![](images/orgID.png)

Snowplow CLI can take its configuration from a variety of sources. More details are available from `./snowplow-cli data-structures --help`. Variations on these three examples should serve most cases.

<Tabs groupId="config">
<TabItem value="env" label="env variables" default>

```bash
SNOWPLOW_CONSOLE_API_KEY_ID=********-****-****-****-************
SNOWPLOW_CONSOLE_API_KEY=********-****-****-****-************
SNOWPLOW_CONSOLE_ORG_ID=********-****-****-****-************
```

</TabItem>
<TabItem value="defaultconfig" label="$HOME/.config/snowplow/snowplow.yml" >

```yaml
console:
  api-key-id: ********-****-****-****-************
  api-key: ********-****-****-****-************
  org-id: ********-****-****-****-************
```

</TabItem>
<TabItem value="args" label="inline arguments" >

```bash
./snowplow-cli data-structures --api-key-id ********-****-****-****-************ --api-key ********-****-****-****-************ --org-id ********-****-****-****-************
```

</TabItem>
</Tabs>


## Available commands

### Creating data structures

```bash
./snowplow-cli ds generate login_click ./folder-name
```

This will create a minimal data structure template in a new file, `./folder-name/login_click.yaml`. Note that you will need to add a vendor name to the template before it will pass validation; alternatively, supply a vendor at creation time with the `--vendor com.acme` flag.
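
The exact template emitted may vary between CLI versions, but as an illustrative sketch (the field names below are assumptions, not guaranteed output), a data structure file pairs some console metadata with a self-describing JSON schema:

```yaml
# Illustrative sketch only - the exact template generated by
# `ds generate` may differ between snowplow-cli versions.
apiVersion: v1
resourceType: data-structure
meta:
  hidden: false
  schemaType: event
  customData: {}
data:
  $schema: http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#
  self:
    vendor: com.acme        # supplied via --vendor or added by hand
    name: login_click
    format: jsonschema
    version: 1-0-0
  type: object
  properties: {}
  additionalProperties: false
```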


### Downloading data structures

```bash
./snowplow-cli ds download
```

This command retrieves all of your organization's data structures. By default it places them in a folder named `data-structures` in the current working directory, broken down further by a combination of vendor and name.

Given a data structure with `vendor: com.acme` and `name: link_click`, and assuming the default yaml format, the resulting path will be `./data-structures/com.acme/link_click.yaml`.


### Validating data structures

```bash
./snowplow-cli ds validate ./folder-name
```

This command finds all files under `./folder-name` (defaulting to `./data-structures` if omitted) and attempts to validate them using BDP Console. It asserts the following:

1. That each file is in a valid format (yaml/json) with the expected fields
2. That the schema in each file conforms to [Snowplow expectations](/docs/understanding-your-pipeline/schemas/#the-anatomy-of-a-schema)
3. That, given the organization's [loading configuration](/docs/storing-querying/loading-process/), no schema version number choice will have a potentially negative effect on data loading

If any validations fail, the command reports the problems to stdout and exits with status code 1.
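
Because the command exits non-zero on failure, it slots naturally into CI. A hypothetical GitHub Actions step (the step name and secret names here are assumptions, not part of the tool):

```yaml
# Hypothetical CI step: fail the pipeline when validation fails.
- name: Validate data structures
  run: ./snowplow-cli ds validate ./data-structures
  env:
    SNOWPLOW_CONSOLE_API_KEY_ID: ${{ secrets.SNOWPLOW_CONSOLE_API_KEY_ID }}
    SNOWPLOW_CONSOLE_API_KEY: ${{ secrets.SNOWPLOW_CONSOLE_API_KEY }}
    SNOWPLOW_CONSOLE_ORG_ID: ${{ secrets.SNOWPLOW_CONSOLE_ORG_ID }}
```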


### Publishing data structures

```bash
./snowplow-cli ds publish dev ./folder-name
```

This command finds all files under `./folder-name` (defaulting to `./data-structures` if omitted) and attempts to publish them to BDP Console in the environment provided (`dev` or `prod`).

Publishing to `dev` also validates the data structures, as with the `validate` command, before upload. Publishing to `prod` does not validate, but requires all referenced data structures to already be present on `dev`.
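
One way to wire the two environments into a workflow is a sketch like the following: publish to `dev` on every merge to the main branch, and promote to `prod` when a release is published. The triggers, branch name, and credential wiring here are assumptions, not a prescribed setup.

```yaml
# Hypothetical GitHub Actions workflow: dev on merge, prod on release.
# Credentials would be supplied via SNOWPLOW_CONSOLE_* env vars/secrets.
on:
  push:
    branches: [main]
  release:
    types: [published]

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Publish to dev
        if: github.event_name == 'push'
        run: ./snowplow-cli ds publish dev ./data-structures
      - name: Promote to prod
        if: github.event_name == 'release'
        run: ./snowplow-cli ds publish prod ./data-structures
```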



@@ -113,3 +113,17 @@ If you have hidden a Data Structure and wish to restore it, navigate to the bottom
![](images/image-9.png)

This will take you to a list of hidden Data Structures, locate the one you wish to restore and click **'Restore data structure'** to show it in the main listing.

* * *

## Externally managed Data Structures

Data Structures can be managed from an external repository using our [snowplow-cli](/docs/understanding-tracking-design/managing-your-data-structures/cli/) tool.

When a Data Structure is managed this way, it becomes locked in the UI and all editing is disabled. A banner explains the situation and gives users with the 'publish to production' capability (granted to admin users by default) the ability to unlock it.

![](images/locked-ds.png)

:::caution
Having a single source of truth for a data structure is a good idea. If your source of truth is an external repository, unlocking and editing in the UI will cause conflicts.
:::