[Doc] fix jdbc catalog doc (backport #43595) #43607

Closed
wants to merge 1 commit into from
148 changes: 148 additions & 0 deletions docs/en/data_source/catalog/jdbc_catalog.md
@@ -0,0 +1,148 @@
---
displayed_sidebar: "English"
---

# JDBC catalog

StarRocks supports JDBC catalogs from v3.0 onwards.

A JDBC catalog is a kind of external catalog that enables you to query data from data sources accessed through JDBC without ingestion.

Also, you can directly transform and load data from JDBC data sources by using [INSERT INTO](../../sql-reference/sql-statements/data-manipulation/INSERT.md) based on JDBC catalogs.
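
As a rough sketch of that load path, the statement below copies rows from a table in a JDBC catalog into a StarRocks table. It assumes that the internal catalog is named `default_catalog` and that three-part `catalog.db.table` names are accepted; every name in angle brackets is a placeholder, not something defined in this topic.

```SQL
-- Sketch only: all names in angle brackets are placeholders.
-- default_catalog is assumed to be the name of the internal StarRocks catalog.
INSERT INTO default_catalog.<target_db>.<target_table>
SELECT <column_list>
FROM <catalog_name>.<db_name>.<table_name>
WHERE <filter_condition>;
```

The `SELECT` list and `WHERE` clause are where any transformation or filtering of the source rows would go.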

JDBC catalogs currently support MySQL and PostgreSQL.

## Prerequisites

- The FEs and BEs or CNs in your StarRocks cluster can download the JDBC driver from the download URL specified by the `driver_url` parameter.
- `JAVA_HOME` in the **$BE_HOME/bin/start_be.sh** file on each BE or CN node is set to a JDK path, not a JRE path. For example, you can configure `export JAVA_HOME=<JDK_absolute_path>` (with no spaces around `=`). You must add this configuration at the beginning of the script and restart the BE or CN for the configuration to take effect.

## Create a JDBC catalog

### Syntax

```SQL
CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES ("key"="value", ...)
```

### Parameters

#### `catalog_name`

The name of the JDBC catalog. The naming conventions are as follows:

- The name can contain letters, digits (0-9), and underscores (_). It must start with a letter.
- The name is case-sensitive and cannot exceed 1023 characters in length.

#### `comment`

The description of the JDBC catalog. This parameter is optional.

#### `PROPERTIES`

The properties of the JDBC catalog. `PROPERTIES` must include the following parameters:

| **Parameter** | **Description** |
| ----------------- | ------------------------------------------------------------ |
| type | The type of the resource. Set the value to `jdbc`. |
| user | The username that is used to connect to the target database. |
| password | The password that is used to connect to the target database. |
| jdbc_uri | The URI that the JDBC driver uses to connect to the target database. For MySQL, the URI is in the `"jdbc:mysql://ip:port"` format. For PostgreSQL, the URI is in the `"jdbc:postgresql://ip:port/db_name"` format. For more information, see the [PostgreSQL JDBC driver documentation](https://jdbc.postgresql.org/documentation/head/connect.html). |
| driver_url | The download URL of the JDBC driver JAR package. HTTP URLs and file URLs are supported, for example, `https://repo1.maven.org/maven2/org/postgresql/postgresql/42.3.3/postgresql-42.3.3.jar` and `file:///home/disk1/postgresql-42.3.3.jar`.<br />**NOTE**<br />You can also place the JDBC driver at the same path on every FE and BE or CN node and set `driver_url` to that path, which must be in the `file:///<path>/to/the/driver` format. |
| driver_class | The class name of the JDBC driver. The JDBC driver class names of common database engines are as follows:<ul><li>MySQL: `com.mysql.jdbc.Driver` (MySQL v5.x and earlier) and `com.mysql.cj.jdbc.Driver` (MySQL v6.x and later)</li><li>PostgreSQL: `org.postgresql.Driver`</li></ul> |

> **NOTE**
>
> The FEs download the JDBC driver JAR package at the time of JDBC catalog creation, and the BEs or CNs download the JDBC driver JAR package at the time of the first query. The amount of time taken for the download varies depending on network conditions.

### Examples

The following example creates two JDBC catalogs: `jdbc0` and `jdbc1`.

```SQL
CREATE EXTERNAL CATALOG jdbc0
PROPERTIES
(
"type"="jdbc",
"user"="postgres",
"password"="changeme",
"jdbc_uri"="jdbc:postgresql://127.0.0.1:5432/jdbc_test",
"driver_url"="https://repo1.maven.org/maven2/org/postgresql/postgresql/42.3.3/postgresql-42.3.3.jar",
"driver_class"="org.postgresql.Driver"
);

CREATE EXTERNAL CATALOG jdbc1
PROPERTIES
(
"type"="jdbc",
"user"="root",
"password"="changeme",
"jdbc_uri"="jdbc:mysql://127.0.0.1:3306",
"driver_url"="https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.28/mysql-connector-java-8.0.28.jar",
"driver_class"="com.mysql.cj.jdbc.Driver"
);
```
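
The two catalogs above download the driver over HTTPS. As a further sketch, the statement below reads the driver from a local file instead. The catalog name `jdbc2` is hypothetical, and the JAR path is the sample path from the `driver_url` description, assumed to exist at the same location on every FE and BE or CN node.

```SQL
-- Sketch only: jdbc2 is a hypothetical catalog name.
-- The JAR is assumed to be present at this path on every FE and BE or CN node.
CREATE EXTERNAL CATALOG jdbc2
PROPERTIES
(
"type"="jdbc",
"user"="postgres",
"password"="changeme",
"jdbc_uri"="jdbc:postgresql://127.0.0.1:5432/jdbc_test",
"driver_url"="file:///home/disk1/postgresql-42.3.3.jar",
"driver_class"="org.postgresql.Driver"
);
```

A file URL avoids a network download at catalog creation and at first query, at the cost of having to distribute the JAR to each node yourself.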

## View JDBC catalogs

You can use [SHOW CATALOGS](../../sql-reference/sql-statements/data-manipulation/SHOW_CATALOGS.md) to query all catalogs in the current StarRocks cluster:

```SQL
SHOW CATALOGS;
```

You can also use [SHOW CREATE CATALOG](../../sql-reference/sql-statements/data-manipulation/SHOW_CREATE_CATALOG.md) to query the creation statement of an external catalog. The following example queries the creation statement of a JDBC catalog named `jdbc0`:

```SQL
SHOW CREATE CATALOG jdbc0;
```

## Drop a JDBC catalog

You can use [DROP CATALOG](../../sql-reference/sql-statements/data-definition/DROP_CATALOG.md) to drop a JDBC catalog.

The following example drops a JDBC catalog named `jdbc0`:

```SQL
DROP CATALOG jdbc0;
```

## Query a table in a JDBC catalog

1. Use [SHOW DATABASES](../../sql-reference/sql-statements/data-manipulation/SHOW_DATABASES.md) to view the databases in your JDBC-compatible cluster:

```SQL
SHOW DATABASES FROM <catalog_name>;
```

2. Use [SET CATALOG](../../sql-reference/sql-statements/data-definition/SET_CATALOG.md) to switch to the destination catalog in the current session:

```SQL
SET CATALOG <catalog_name>;
```

Then, use [USE](../../sql-reference/sql-statements/data-definition/USE.md) to specify the active database in the current session:

```SQL
USE <db_name>;
```

Or, you can use [USE](../../sql-reference/sql-statements/data-definition/USE.md) to directly specify the active database in the destination catalog:

```SQL
USE <catalog_name>.<db_name>;
```

3. Use [SELECT](../../sql-reference/sql-statements/data-manipulation/SELECT.md) to query the destination table in the specified database:

```SQL
SELECT * FROM <table_name>;
```
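
The steps above switch the session to the catalog and database before querying. As an alternative sketch, assuming three-part `catalog.db.table` names are accepted for external catalogs, the table can be addressed directly without changing the session context. All names are placeholders.

```SQL
-- Sketch only: queries the table by its fully qualified name,
-- leaving the session's current catalog and database unchanged.
SELECT * FROM <catalog_name>.<db_name>.<table_name>;
```

This form is convenient when a single statement needs to reference tables from more than one catalog.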

## FAQ

What do I do if an error suggesting "Malformed database URL, failed to parse the main URL sections" is thrown?

If you encounter such an error, the URI that you passed in `jdbc_uri` is invalid. Check the URI that you pass and make sure it is valid. For more information, see the parameter descriptions in the "[PROPERTIES](#properties)" section of this topic.
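
One possible way to recover, sketched under the assumption that the faulty catalog is the MySQL catalog `jdbc1` from the examples in this topic, is to drop the catalog and create it again with a URI in the documented `jdbc:mysql://ip:port` format. The malformed URI mentioned in the comment is a hypothetical illustration.

```SQL
-- Sketch only: recreates jdbc1 with a well-formed URI.
-- A hypothetical malformed value would be "jdbc:mysql//127.0.0.1:3306",
-- which is missing the colon before the double slash.
DROP CATALOG jdbc1;

CREATE EXTERNAL CATALOG jdbc1
PROPERTIES
(
"type"="jdbc",
"user"="root",
"password"="changeme",
"jdbc_uri"="jdbc:mysql://127.0.0.1:3306",
"driver_url"="https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.28/mysql-connector-java-8.0.28.jar",
"driver_class"="com.mysql.cj.jdbc.Driver"
);
```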
149 changes: 149 additions & 0 deletions docs/zh/data_source/catalog/jdbc_catalog.md
@@ -0,0 +1,149 @@
---
displayed_sidebar: "Chinese"
---

# JDBC catalog

StarRocks supports JDBC catalogs from v3.0 onwards.

A JDBC catalog is a kind of external catalog. With a JDBC catalog, you can query data in JDBC data sources directly, without ingesting it first.

In addition, you can use [INSERT INTO](../../sql-reference/sql-statements/data-manipulation/INSERT.md) based on JDBC catalogs to transform and load data from JDBC data sources.

JDBC catalogs currently support MySQL and PostgreSQL.

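## Prerequisites

- Make sure that the FEs and BEs (or CNs) can download the required JDBC driver from the download path specified by `driver_url`.
- `JAVA_HOME` must be configured in the startup script **$BE_HOME/bin/start_be.sh** on each machine that hosts a BE (or CN). It must point to a JDK environment, not a JRE environment, for example, `export JAVA_HOME=<JDK_absolute_path>`. Add this configuration at the very beginning of the BE (or CN) startup script, and restart the BE (or CN) after you add it.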

## Create a JDBC catalog

### Syntax

```SQL
CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES ("key"="value", ...)
```

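### Parameters

#### `catalog_name`

The name of the JDBC catalog. The naming requirements are as follows:

- The name must consist of letters (a-z or A-Z), digits (0-9), or underscores (_), and must start with a letter.
- The name cannot exceed 1023 characters in length.
- The catalog name is case-sensitive.

#### `comment`

The description of the JDBC catalog. This parameter is optional.

#### PROPERTIES

The properties of the JDBC catalog, which include the following required parameters: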

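| **Parameter** | **Description** |
| ------------ | ------------------------------------------------------------ |
| type | The type of the resource. The value is fixed to `jdbc`. |
| user | The username that is used to log in to the target database. |
| password | The password that is used to log in to the target database. |
| jdbc_uri | The URI that the JDBC driver uses to connect to the target database. For MySQL, the format is `"jdbc:mysql://ip:port"`. For PostgreSQL, the format is `"jdbc:postgresql://ip:port/db_name"`. |
| driver_url | The URL used to download the JDBC driver JAR package. The HTTP protocol and the file protocol are supported, for example, `https://repo1.maven.org/maven2/org/postgresql/postgresql/42.3.3/postgresql-42.3.3.jar` and `file:///home/disk1/postgresql-42.3.3.jar`.<br />**NOTE**<br />You can also deploy the JDBC driver at the same path on every FE and BE (or CN) node and set `driver_url` to that path, in the `file:///<path>/to/the/driver` format. |
| driver_class | The class name of the JDBC driver. The JDBC driver class names of common database engines are as follows:<ul><li>MySQL: `com.mysql.jdbc.Driver` (MySQL 5.x and earlier) and `com.mysql.cj.jdbc.Driver` (MySQL 6.x and later)</li><li>PostgreSQL: `org.postgresql.Driver`</li></ul> |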

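> **NOTE**
>
> The FEs fetch the JDBC driver when the JDBC catalog is created, and the BEs (or CNs) fetch the driver when the first query is executed. The time taken to fetch the driver depends on network conditions.

### Examples

The following example creates two JDBC catalogs: `jdbc0` and `jdbc1`.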

```SQL
CREATE EXTERNAL CATALOG jdbc0
PROPERTIES
(
"type"="jdbc",
"user"="postgres",
"password"="changeme",
"jdbc_uri"="jdbc:postgresql://127.0.0.1:5432/jdbc_test",
"driver_url"="https://repo1.maven.org/maven2/org/postgresql/postgresql/42.3.3/postgresql-42.3.3.jar",
"driver_class"="org.postgresql.Driver"
);

CREATE EXTERNAL CATALOG jdbc1
PROPERTIES
(
"type"="jdbc",
"user"="root",
"password"="changeme",
"jdbc_uri"="jdbc:mysql://127.0.0.1:3306",
"driver_url"="https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.28/mysql-connector-java-8.0.28.jar",
"driver_class"="com.mysql.cj.jdbc.Driver"
);
```

## View JDBC catalogs

You can use [SHOW CATALOGS](../../sql-reference/sql-statements/data-manipulation/SHOW_CATALOGS.md) to query all catalogs in the current StarRocks cluster:

```SQL
SHOW CATALOGS;
```

You can also use [SHOW CREATE CATALOG](../../sql-reference/sql-statements/data-manipulation/SHOW_CREATE_CATALOG.md) to query the creation statement of an external catalog. The following example queries the creation statement of the JDBC catalog `jdbc0`:

```SQL
SHOW CREATE CATALOG jdbc0;
```

## Drop a JDBC catalog

You can use [DROP CATALOG](../../sql-reference/sql-statements/data-definition/DROP_CATALOG.md) to drop a JDBC catalog.

The following example drops the JDBC catalog `jdbc0`:

```SQL
DROP CATALOG jdbc0;
```

## Query a table in a JDBC catalog

1. Use [SHOW DATABASES](../../sql-reference/sql-statements/data-manipulation/SHOW_DATABASES.md) to view the databases in the cluster that the specified catalog belongs to:

```SQL
SHOW DATABASES FROM <catalog_name>;
```

2. Use [SET CATALOG](../../sql-reference/sql-statements/data-definition/SET_CATALOG.md) to switch the catalog that is active in the current session:

```SQL
SET CATALOG <catalog_name>;
```

Then, use [USE](../../sql-reference/sql-statements/data-definition/USE.md) to specify the database that is active in the current session:

```SQL
USE <db_name>;
```

Or, you can use [USE](../../sql-reference/sql-statements/data-definition/USE.md) to switch the session directly to a specified database in the destination catalog:

```SQL
USE <catalog_name>.<db_name>;
```

3. Use [SELECT](../../sql-reference/sql-statements/data-manipulation/SELECT.md) to query the destination table in the destination database:

```SQL
SELECT * FROM <table_name>;
```

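## FAQ

What do I do if the system returns the error "Malformed database URL, failed to parse the main URL sections"?

This error is usually caused by an invalid URI passed in `jdbc_uri`. Check the URI that you pass and make sure that it is correct. For more information, see the parameter descriptions in the "[PROPERTIES](#properties)" section of this topic.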