Add basic docs for integration setup and config (opensearch-project#1613)

Signed-off-by: Simeon Widdis <[email protected]>
Swiddis authored Mar 23, 2024
1 parent 4e1e0e5 commit 8f7950f
Showing 3 changed files with 128 additions and 0 deletions.
8 changes: 8 additions & 0 deletions docs/integrations/README.md
@@ -0,0 +1,8 @@
# OpenSearch Integrations

This is the developer documentation for OpenSearch Integrations.

Some major documents to look at:
- [Setup](setup.md) explains the major steps of the integration setup process behind the scenes,
  which gives context for how integration content is assembled.
- [Config](config.md) is the related document for getting more directly into developing
  integrations.
85 changes: 85 additions & 0 deletions docs/integrations/config.md
@@ -0,0 +1,85 @@
# Integration Configuration

**Date:** March 22, 2024

The bulk of an integration's functionality is defined in its config. To get a better understanding
of what information it contains, let's look at the config for the current
[Nginx integration](https://github.com/opensearch-project/dashboards-observability/blob/4e1e0e585/server/adaptors/integrations/__data__/repository/nginx/nginx-1.0.0.json),
with some fields pruned for legibility.

```json5
{
  "name": "nginx",
  "version": "1.0.0",
  "workflows": [
    {
      "name": "queries"
    },
    {
      "name": "dashboards"
    }
  ],
  "components": [
    {
      "name": "communication",
      "version": "1.0.0"
    },
    {
      "name": "http",
      "version": "1.0.0"
    },
    {
      "name": "logs",
      "version": "1.0.0"
    }
  ],
  "assets": [
    {
      "name": "nginx",
      "version": "1.0.0",
      "extension": "ndjson",
      "type": "savedObjectBundle",
      "workflows": ["dashboards"]
    },
    {
      "name": "create_table",
      "version": "1.0.0",
      "extension": "sql",
      "type": "query"
    },
    {
      "name": "create_mv",
      "version": "1.0.0",
      "extension": "sql",
      "type": "query",
      "workflows": ["dashboards"]
    }
  ],
  "sampleData": {
    "path": "sample.json"
  }
}
```

There are generally four key components to an integration's functionality. Most of what's left is metadata or is used for rendering.

- `assets` are the items that are associated with the integration, including queries, dashboards,
  and index patterns. Originally the assets were just one `ndjson` file of exported Saved Objects
  (today a `savedObjectBundle`), but to support further options it was transformed to a list with
  further types. The assets are available under the [directory of the same name](https://github.com/opensearch-project/dashboards-observability/tree/4e1e0e585/server/adaptors/integrations/__data__/repository/nginx/assets).
  The currently supported asset types are:
  - `savedObjectBundle`: A saved object export. This typically includes an index pattern and a dashboard querying it, and it indicates that the integration expects data that conforms to this index pattern (see `components` below).
  - `query`: A SQL query that is sent to OpenSearch Spark. You can read more about it at the
    [opensearch-spark repository](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md).
- `workflows` are conditional flags that toggle whether or not an asset should be installed. They're
  selected by the user before installing the integration. By default, an asset is included under
  every workflow. Currently, workflows are only enabled for integrations that support S3 data source
  installations, and the assets in the selected workflows are run in order of type (`query`s are
  always run before `savedObjectBundle`s).
- `components` define the format of the data expected by saved queries and dashboards. They are
  typically shared between related integrations to allow things like correlation by field. The
  current standard components defined here and in the
  [OpenSearch Catalog](https://github.com/opensearch-project/opensearch-catalog) are heavily
  inspired by [OpenTelemetry](https://opentelemetry.io/). The components can be used for validation
  when connecting an integration to an index pattern. It's highly recommended to reuse existing
  components where possible.
- `sampleData` is loaded after the rest of the integration setup process when users select the "Try it" option.
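
The interaction between `assets` and `workflows` can be sketched in TypeScript as follows. This is only an illustration and is not taken from the plugin's code: the `IntegrationAsset` type and the `selectAssets` helper are hypothetical, and the sketch assumes that an asset with a `workflows` list is installed when at least one of its workflows was selected by the user.

```typescript
// Hypothetical types covering only the asset fields shown in the config above.
type AssetType = "savedObjectBundle" | "query";

interface IntegrationAsset {
  name: string;
  version: string;
  extension: string;
  type: AssetType;
  workflows?: string[]; // omitted: the asset is included under every workflow
}

// Queries are run before saved object bundles.
const TYPE_ORDER: Record<AssetType, number> = { query: 0, savedObjectBundle: 1 };

// Hypothetical helper: pick the assets enabled by the selected workflows,
// then order them by type while keeping the config order within each type.
function selectAssets(assets: IntegrationAsset[], selected: string[]): IntegrationAsset[] {
  return assets
    .filter(
      (asset) =>
        asset.workflows === undefined ||
        asset.workflows.some((workflow) => selected.indexOf(workflow) !== -1)
    )
    .map((asset, index) => ({ asset, index }))
    .sort((a, b) => TYPE_ORDER[a.asset.type] - TYPE_ORDER[b.asset.type] || a.index - b.index)
    .map(({ asset }) => asset);
}

// The assets from the Nginx config above.
const nginxAssets: IntegrationAsset[] = [
  { name: "nginx", version: "1.0.0", extension: "ndjson", type: "savedObjectBundle", workflows: ["dashboards"] },
  { name: "create_table", version: "1.0.0", extension: "sql", type: "query" },
  { name: "create_mv", version: "1.0.0", extension: "sql", type: "query", workflows: ["dashboards"] },
];

const dashboardsInstall = selectAssets(nginxAssets, ["dashboards"]).map((asset) => asset.name);
const queriesInstall = selectAssets(nginxAssets, ["queries"]).map((asset) => asset.name);
```

Under these assumptions, selecting only the `queries` workflow installs just `create_table`, since it is the only asset without a `workflows` restriction. Selecting `dashboards` installs both queries first and the `nginx` bundle last.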
35 changes: 35 additions & 0 deletions docs/integrations/setup.md
@@ -0,0 +1,35 @@
# Integrations Setup

**Date:** March 22, 2024

When an integration is installed, several steps are executed to get everything up and running.
This document describes the major steps of an installation that happen behind the scenes, to make
it clearer how to implement content. It's generally recommended to read this along with the
[Config document](config.md).

Currently, two types of integration assets are supported with a synchronous install. The full
installation process installs these separately, in two major chunks.

- The frontend side of the setup is in
  [setup_integration.tsx](https://github.com/opensearch-project/dashboards-observability/blob/4e1e0e585/public/components/integrations/components/setup_integration.tsx#L450).
  This is where the installation flow is selected based on the type of integration being installed,
  integration `query`s are run if available, and eventually the build request is sent to the
  backend.
- On the backend, the request is routed to a
  [builder](https://github.com/opensearch-project/dashboards-observability/blob/4e1e0e585/server/adaptors/integrations/integrations_builder.ts#L32)
  that handles some further reference tidying (rewriting UUIDs to avoid collisions, modifying which
  index is read, etc.) and makes the final integration instance object.

In hindsight, the author recognizes that this process is a little confusing and perhaps more
convoluted than it needs to be.
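
To give a rough idea of what "rewriting UUIDs to avoid collisions" involves, here is a minimal TypeScript sketch. It is not the builder's actual code: the `remapIds` helper and the trimmed-down `SavedObject` shape are hypothetical, and the sketch assumes each exported saved object carries an `id` and a list of `references` to other objects' IDs.

```typescript
// Hypothetical, trimmed-down shape of an exported saved object.
interface SavedObjectRef {
  id: string;
  name: string;
  type: string;
}

interface SavedObject {
  id: string;
  type: string;
  references: SavedObjectRef[];
}

// Hypothetical helper: give every object a fresh ID and update references to match,
// so two installs of the same bundle do not overwrite each other.
function remapIds(objects: SavedObject[], newId: () => string): SavedObject[] {
  const idMap = new Map<string, string>();
  for (const obj of objects) {
    idMap.set(obj.id, newId());
  }
  return objects.map((obj) => ({
    ...obj,
    id: idMap.get(obj.id) ?? obj.id,
    // References to objects outside the bundle keep their original ID.
    references: obj.references.map((ref) => ({ ...ref, id: idMap.get(ref.id) ?? ref.id })),
  }));
}

// A counter stands in for a real UUID generator to keep the example deterministic.
let counter = 0;
const remapped = remapIds(
  [
    { id: "pattern-1", type: "index-pattern", references: [] },
    {
      id: "dash-1",
      type: "dashboard",
      references: [{ id: "pattern-1", name: "panel_0", type: "index-pattern" }],
    },
  ],
  () => `new-${counter++}`
);
```

The ID generator is passed in as a parameter so that a real installation could supply a UUID function, while the example stays reproducible.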

## Query Mapping

When working on S3-based integrations, it's worth noting that queries have some values
[substituted](https://github.com/opensearch-project/dashboards-observability/blob/4e1e0e585/public/components/integrations/components/setup_integration.tsx#L438) when installing. They are:

- `{s3_bucket_location}`, used to locate data.
- `{s3_checkpoint_location}`, used to store intermediate results, which is required by Spark.
- `{object_name}`, used to give tables a unique name per integration to avoid collisions.

For some query examples, it can be worth looking at the assets for the
[VPC integration](https://github.com/opensearch-project/dashboards-observability/blob/4e1e0e585/server/adaptors/integrations/__data__/repository/aws_vpc_flow/assets/README.md).
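
The substitution itself can be pictured with a short TypeScript sketch. This is an illustration under assumptions and not the code at the link above: the `fillQueryTemplate` helper, the `QuerySubstitutions` type, the sample query text, and all sample values are hypothetical. Only the three placeholder names come from the list above.

```typescript
// Hypothetical container for the three substituted values.
interface QuerySubstitutions {
  s3_bucket_location: string;
  s3_checkpoint_location: string;
  object_name: string;
}

// Hypothetical helper: replace each known `{placeholder}` with its value.
// A callback is used so that `$` characters in a value are inserted literally.
function fillQueryTemplate(query: string, subs: QuerySubstitutions): string {
  return query.replace(
    /\{(s3_bucket_location|s3_checkpoint_location|object_name)\}/g,
    (_match: string, key: string) => subs[key as keyof QuerySubstitutions]
  );
}

// Illustrative query text, not copied from any shipped integration.
const template =
  "CREATE TABLE {object_name} (message STRING) USING json LOCATION '{s3_bucket_location}'";

const filled = fillQueryTemplate(template, {
  s3_bucket_location: "s3://example-bucket/nginx/",
  s3_checkpoint_location: "s3://example-bucket/checkpoints/",
  object_name: "my_catalog.default.nginx_logs",
});
```

Text in braces that is not one of the three known placeholders is left untouched by this sketch.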
