
update docs
Signed-off-by: chenxu <[email protected]>
dmetasoul01 committed Aug 22, 2023
1 parent bd4b341 commit 3256522
Showing 4 changed files with 29 additions and 10 deletions.
8 changes: 4 additions & 4 deletions website/docs/01-Getting Started/01-setup-local-env.md
@@ -39,14 +39,14 @@ export lakesoul_home=/opt/soft/pg.property
You can put customized database configuration information in this file.
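As a sketch of what that file could contain (the property keys below follow LakeSoul's documented defaults and the `lakesoul_test` database used later in this guide; verify them against your LakeSoul release):

```bash
# hypothetical contents for /opt/soft/pg.property; adjust host, database and
# credentials to your actual PostgreSQL deployment
cat > /opt/soft/pg.property <<EOF
lakesoul.pg.driver=com.lakesoul.shaded.org.postgresql.Driver
lakesoul.pg.url=jdbc:postgresql://localhost:5432/lakesoul_test?stringtype=unspecified
lakesoul.pg.username=lakesoul_test
lakesoul.pg.password=lakesoul_test
EOF
```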

## Install an Apache Spark environment
-You can download a Spark distribution from https://spark.apache.org/downloads.html; please choose Spark 3.3.0 or above. Note that the official Apache Spark package does not include the hadoop-cloud component. We provide a Spark package with Hadoop cloud dependencies; download it from https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/spark/spark-3.3.2-bin-hadoop-3.3.5.tgz.
+You can download a Spark distribution from https://spark.apache.org/downloads.html; please choose Spark 3.3.0 or above. Note that the official Apache Spark package does not include the hadoop-cloud component. We provide a Spark package with Hadoop cloud dependencies; download it from https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/spark/spark-3.3.2-bin-hadoop3.tgz.

After unpacking the Spark package, you can find the LakeSoul release jar at https://github.com/lakesoul-io/LakeSoul/releases. Download the jar file and put it into the `jars` directory of your Spark environment.

```bash
-wget https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/spark/spark-3.3.2-bin-hadoop-3.3.5.tgz
-tar xf spark-3.3.2-bin-hadoop-3.3.5.tgz
-export SPARK_HOME=${PWD}/spark-3.3.2-bin-dmetasoul
+wget https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/spark/spark-3.3.2-bin-hadoop3.tgz
+tar xf spark-3.3.2-bin-hadoop3.tgz
+export SPARK_HOME=${PWD}/spark-3.3.2-bin-hadoop3
wget https://github.com/lakesoul-io/LakeSoul/releases/download/v2.3.0/lakesoul-spark-2.3.0-spark-3.3.jar -P $SPARK_HOME/jars
```
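As a quick smoke test of the resulting environment, you could start a shell with LakeSoul enabled (the extension and catalog class names below are taken from LakeSoul's documentation; verify them against the release you downloaded):

```bash
# launch spark-shell with the LakeSoul SQL extension and catalog registered
$SPARK_HOME/bin/spark-shell \
  --conf spark.sql.extensions=com.dmetasoul.lakesoul.sql.LakeSoulSparkSessionExtension \
  --conf spark.sql.catalog.lakesoul=org.apache.spark.sql.lakesoul.catalog.LakeSoulCatalog \
  --conf spark.sql.defaultCatalog=lakesoul
```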

9 changes: 9 additions & 0 deletions website/docs/03-Usage Docs/02-setup-spark.md
@@ -126,6 +126,15 @@ export LAKESOUL_PG_PASSWORD=root
````
:::
+:::tip
+If you need to access S3, you also need to download the [`flink-s3-fs-hadoop`](https://mvnrepository.com/artifact/org.apache.flink/flink-s3-fs-hadoop) jar corresponding to your Flink version and put it into the `$FLINK_HOME/lib` directory.
+
+If access to a Hadoop environment is required, you can declare the HADOOP_CLASSPATH environment variable:
+```bash
+export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
+```
+For details, please refer to [Flink on Hadoop](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/resource-providers/yarn/).
+:::
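For example, assuming Flink 1.14.5 (the docs link above targets the 1.14 release line; substitute the version actually deployed), the jar can be fetched from Maven Central:

```bash
FLINK_VERSION=1.14.5  # assumed here; must match your Flink distribution
wget https://repo1.maven.org/maven2/org/apache/flink/flink-s3-fs-hadoop/${FLINK_VERSION}/flink-s3-fs-hadoop-${FLINK_VERSION}.jar -P $FLINK_HOME/lib
```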

:::tip
LakeSoul may use an extra amount of off-heap memory; consider increasing the off-heap memory size for the task manager:
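For example (the 3000m value mirrors the `flink-conf.yaml` setting shown in the Chinese docs below; size it to your workload):

```bash
# append the off-heap setting to Flink's configuration file
echo "taskmanager.memory.task.off-heap.size: 3000m" >> $FLINK_HOME/conf/flink-conf.yaml
```
:::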
@@ -19,12 +19,12 @@ PGPASSWORD=lakesoul_test psql -h localhost -p 5432 -U lakesoul_test -f script/me
```

## Install a Spark environment
-Since the official Apache Spark download packages do not include hadoop-cloud, AWS S3 and related dependencies, we provide a Spark package that bundles the necessary hadoop-cloud and S3 dependencies: https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/spark/spark-3.3.2-bin-lakesoul-8e167b33.tgz
+Since the official Apache Spark download packages do not include hadoop-cloud, AWS S3 and related dependencies, we provide a Spark package that bundles the necessary hadoop-cloud and S3 dependencies: https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/spark/spark-3.3.2-bin-hadoop3.tgz

```bash
-wget https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/spark/spark-3.3.2-bin-hadoop-3.3.5.tgz
-tar xf spark-3.3.2-bin-hadoop-3.3.5.tgz
-export SPARK_HOME=${PWD}/spark-3.3.2-bin-dmetasoul
+wget https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/spark/spark-3.3.2-bin-hadoop3.tgz
+tar xf spark-3.3.2-bin-hadoop3.tgz
+export SPARK_HOME=${PWD}/spark-3.3.2-bin-hadoop3
```
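As a quick sanity check (exact jar names vary by Hadoop version, so the pattern below is only indicative), you can confirm that the bundled cloud dependencies are present:

```bash
# the bundled package should ship hadoop-cloud / AWS jars alongside Spark's own
ls $SPARK_HOME/jars | grep -i -e hadoop -e aws
```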

@@ -116,11 +116,11 @@ containerized.taskmanager.env.LAKESOUL_PG_URL: jdbc:postgresql://localhost:5432/
Note that the environment variables need to be set for both the master and the taskmanager.
:::tip
The connection information, username, and password of the Postgres database need to be modified according to the actual deployment.
:::
:::caution
Note that if you launch a job in Session mode, i.e., submit the job as a client to a Flink Standalone Cluster, the `flink run` client will not read the configuration above, so the environment variables need to be configured separately:

```bash
export LAKESOUL_PG_DRIVER=com.lakesoul.shaded.org.postgresql.Driver
# also export the LAKESOUL_PG_URL, username and password settings shown above
```
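With the variables exported in the same shell, the job is then submitted as usual; the entry class and jar below are placeholders for your own application:

```bash
# placeholder main class and jar; replace with your actual job
$FLINK_HOME/bin/flink run -c com.example.MyLakeSoulJob my-lakesoul-job.jar
```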
@@ -143,6 +143,16 @@ taskmanager.memory.task.off-heap.size: 3000m

And put the jar file under `$FLINK_HOME/lib`. After that, you can start a Flink session cluster or application as usual.

+:::tip
+If you need to access S3, you also need to download the [`flink-s3-fs-hadoop`](https://mvnrepository.com/artifact/org.apache.flink/flink-s3-fs-hadoop) jar corresponding to your Flink version and put it into the `$FLINK_HOME/lib` directory.
+
+If access to a Hadoop environment is required, you can declare the HADOOP_CLASSPATH environment variable:
+```bash
+export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
+```
+For details, please refer to [Flink on Hadoop](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/resource-providers/yarn/).
+:::
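For instance, with HADOOP_CLASSPATH declared, a detached Flink session cluster could be started on YARN as follows (a sketch assuming a working Hadoop/YARN environment):

```bash
export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
# start a detached session cluster on YARN
$FLINK_HOME/bin/yarn-session.sh --detached
```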

### Add the LakeSoul Flink Maven dependency to your Java project

Add the following to your project's pom.xml: