
Commit

Parquet avro link
donPain committed Dec 12, 2024
1 parent 378f7bf commit ac4ccfb
Showing 2 changed files with 18 additions and 18 deletions.
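The change is uniform across both files: in every Avro documentation link, the pinned `1.10.0` path segment is replaced with `++version++`. A minimal sketch of that substitution follows, assuming the links all share the `https://avro.apache.org/docs/<version>/` prefix seen in this diff. The helper name `rewrite_avro_links` and the regular expression are illustrative and are not part of the commit.

```python
import re

# Matches the pinned version segment of an Avro docs URL, e.g. ".../docs/1.10.0/".
# The pattern is an assumption based on the URLs visible in this diff.
AVRO_DOCS_VERSION = re.compile(r"(https://avro\.apache\.org/docs/)\d+\.\d+\.\d+(/)")


def rewrite_avro_links(text, placeholder="++version++"):
    """Replace the pinned version in Avro docs URLs with `placeholder`."""
    # A function replacement avoids any escape handling in the placeholder string.
    return AVRO_DOCS_VERSION.sub(
        lambda m: m.group(1) + placeholder + m.group(2), text
    )


old = "- [Generic record](https://avro.apache.org/docs/1.10.0/api/java/index.html)"
new = rewrite_avro_links(old)
print(new)
```

Because the pattern only matches a numeric version segment, running the helper on an already rewritten line leaves it unchanged.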
18 changes: 9 additions & 9 deletions docs/content.zh/docs/connectors/datastream/formats/parquet.md
Expand Up @@ -198,14 +198,14 @@ ds = env.from_source(source, WatermarkStrategy.no_watermarks(), "file-source")

Flink supports three ways to read Parquet files and create Avro records (PyFlink supports only Generic record):

-- [Generic record](https://avro.apache.org/docs/1.10.0/api/java/index.html)
-- [Specific record](https://avro.apache.org/docs/1.10.0/api/java/index.html)
-- [Reflect record](https://avro.apache.org/docs/1.10.0/api/java/org/apache/avro/reflect/package-summary.html)
+- [Generic record](https://avro.apache.org/docs/++version++/api/java/index.html)
+- [Specific record](https://avro.apache.org/docs/++version++/api/java/index.html)
+- [Reflect record](https://avro.apache.org/docs/++version++/api/java/org/apache/avro/reflect/package-summary.html)

### Generic record

-Avro schemas are defined using JSON. You can find more information about Avro schemas and types in the [Avro specification](https://avro.apache.org/docs/1.10.0/spec.html).
-This example uses an Avro schema similar to the one described in the [official Avro tutorial](https://avro.apache.org/docs/1.10.0/gettingstartedjava.html):
+Avro schemas are defined using JSON. You can find more information about Avro schemas and types in the [Avro specification](https://avro.apache.org/docs/++version++/spec.html).
+This example uses an Avro schema similar to the one described in the [official Avro tutorial](https://avro.apache.org/docs/++version++/gettingstartedjava.html):

```json lines
{"namespace": "example.avro",
Expand All @@ -219,11 +219,11 @@ Flink 支持三种方式来读取 Parquet 文件并创建 Avro records (PyFlin
}
```
This schema defines a user record with three fields: name, favoriteNumber, and favoriteColor. See the
-[record specification](https://avro.apache.org/docs/1.10.0/spec.html#schema_record) for more details on how to define an Avro schema.
+[record specification](https://avro.apache.org/docs/++version++/spec.html#schema_record) for more details on how to define an Avro schema.

In this example, you will create a DataStream containing Parquet records as Avro Generic records.
Flink parses the Avro schema from the JSON string. There are many other ways to parse a schema, for example from java.io.File or java.io.InputStream.
-Please refer to [Avro Schema](https://avro.apache.org/docs/1.10.0/api/java/org/apache/avro/Schema.html) for more details.
+Please refer to [Avro Schema](https://avro.apache.org/docs/++version++/api/java/org/apache/avro/Schema.html) for more details.
After that, you can create an `AvroParquetRecordFormat` for Avro Generic records via `AvroParquetReaders`.

{{< tabs "GenericRecord" >}}
Expand Down Expand Up @@ -286,7 +286,7 @@ stream = env.from_source(source, WatermarkStrategy.no_watermarks(), "file-source
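Based on the previously defined schema, you can generate classes by using Avro code generation.
Once the classes have been generated, there is no need to use the schema directly in your program.
You can use `avro-tools.jar` to generate the code manually, or use the Avro Maven plugin to run code generation on any .avsc files in the configured source directory.
-Please refer to [Avro Getting Started](https://avro.apache.org/docs/1.10.0/gettingstartedjava.html) for more information.
+Please refer to [Avro Getting Started](https://avro.apache.org/docs/++version++/gettingstartedjava.html) for more information.

This example uses the sample schema {{< gh_link file="flink-formats/flink-parquet/src/test/resources/avro/testdata.avsc" name="testdata.avsc" >}}: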

Expand Down Expand Up @@ -335,7 +335,7 @@ final DataStream<GenericRecord> stream =

Besides Avro Generic and Specific records, which must be predefined, Flink also supports creating a DataStream from Parquet files based on existing Java POJO classes.
In this scenario, Avro uses Java reflection to generate schemas and protocols for these POJO classes.
-Please refer to the [Avro reflect](https://avro.apache.org/docs/1.10.0/api/java/index.html) documentation for more details on how Java types map to Avro schemas.
+Please refer to the [Avro reflect](https://avro.apache.org/docs/++version++/api/java/index.html) documentation for more details on how Java types map to Avro schemas.

This example uses a simple Java POJO class {{< gh_link file="flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/avro/Datum.java" name="Datum" >}}:

Expand Down
18 changes: 9 additions & 9 deletions docs/content/docs/connectors/datastream/formats/parquet.md
Expand Up @@ -196,14 +196,14 @@ ds = env.from_source(source, WatermarkStrategy.no_watermarks(), "file-source")

Flink supports producing three types of Avro records by reading Parquet files (Only Generic record is supported in PyFlink):

-- [Generic record](https://avro.apache.org/docs/1.10.0/api/java/index.html)
-- [Specific record](https://avro.apache.org/docs/1.10.0/api/java/index.html)
-- [Reflect record](https://avro.apache.org/docs/1.10.0/api/java/org/apache/avro/reflect/package-summary.html)
+- [Generic record](https://avro.apache.org/docs/++version++/api/java/index.html)
+- [Specific record](https://avro.apache.org/docs/++version++/api/java/index.html)
+- [Reflect record](https://avro.apache.org/docs/++version++/api/java/org/apache/avro/reflect/package-summary.html)

### Generic record

-Avro schemas are defined using JSON. You can get more information about Avro schemas and types from the [Avro specification](https://avro.apache.org/docs/1.10.0/spec.html).
-This example uses an Avro schema similar to the one described in the [official Avro tutorial](https://avro.apache.org/docs/1.10.0/gettingstartedjava.html):
+Avro schemas are defined using JSON. You can get more information about Avro schemas and types from the [Avro specification](https://avro.apache.org/docs/++version++/spec.html).
+This example uses an Avro schema similar to the one described in the [official Avro tutorial](https://avro.apache.org/docs/++version++/gettingstartedjava.html):

```json lines
{"namespace": "example.avro",
Expand All @@ -217,10 +217,10 @@ This example uses an Avro schema example similar to the one described in the [of
}
```

-This schema defines a record representing a user with three fields: name, favoriteNumber, and favoriteColor. See the [record specification](https://avro.apache.org/docs/1.10.0/spec.html#schema_record) for details on how to define an Avro schema.
+This schema defines a record representing a user with three fields: name, favoriteNumber, and favoriteColor. See the [record specification](https://avro.apache.org/docs/++version++/spec.html#schema_record) for details on how to define an Avro schema.
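Because an Avro schema is plain JSON, its structure can be inspected with any JSON parser. The sketch below uses only the Python standard library to list the field names of a record schema. The schema text is hypothetical: the namespace and the three field names come from the description above, while the record name `User` and the field types are assumptions, since the diff collapses those lines.

```python
import json

# Hypothetical schema. The namespace and field names follow the text above;
# the record name and the field types are assumptions for illustration.
USER_SCHEMA_JSON = """
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favoriteNumber", "type": ["int", "null"]},
     {"name": "favoriteColor", "type": ["string", "null"]}
 ]
}
"""


def field_names(schema_json):
    """Return the field names of an Avro record schema given as a JSON string."""
    schema = json.loads(schema_json)
    if schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    return [field["name"] for field in schema["fields"]]


print(field_names(USER_SCHEMA_JSON))
```

This only checks the JSON shape. It does not validate the schema against the Avro specification, which an Avro library would do when parsing.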

In the following example, you will create a DataStream containing Parquet records as Avro Generic records.
-It will parse the Avro schema based on the JSON string. There are many other ways to parse a schema, e.g. from java.io.File or java.io.InputStream. Please refer to [Avro Schema](https://avro.apache.org/docs/1.10.0/api/java/org/apache/avro/Schema.html) for details.
+It will parse the Avro schema based on the JSON string. There are many other ways to parse a schema, e.g. from java.io.File or java.io.InputStream. Please refer to [Avro Schema](https://avro.apache.org/docs/++version++/api/java/org/apache/avro/Schema.html) for details.
After that, you will create an `AvroParquetRecordFormat` via `AvroParquetReaders` for Avro Generic records.

{{< tabs "GenericRecord" >}}
Expand Down Expand Up @@ -284,7 +284,7 @@ Based on the previously defined schema, you can generate classes by leveraging A
Once the classes have been generated, there is no need to use the schema directly in your programs.
You can either use `avro-tools.jar` to generate code manually or use the Avro Maven plugin to perform
code generation on any .avsc files present in the configured source directory. Please refer to
-[Avro Getting Started](https://avro.apache.org/docs/1.10.0/gettingstartedjava.html) for more information.
+[Avro Getting Started](https://avro.apache.org/docs/++version++/gettingstartedjava.html) for more information.

The following example uses the example schema {{< gh_link file="flink-formats/flink-parquet/src/test/resources/avro/testdata.avsc" name="testdata.avsc" >}}:

Expand Down Expand Up @@ -334,7 +334,7 @@ final DataStream<GenericRecord> stream =
Beyond Avro Generic and Specific records, which require a predefined Avro schema,
Flink also supports creating a DataStream from Parquet files based on existing Java POJO classes.
In this case, Avro will use Java reflection to generate schemas and protocols for these POJO classes.
-Java types are mapped to Avro schemas; please refer to the [Avro reflect](https://avro.apache.org/docs/1.10.0/api/java/index.html) documentation for more details.
+Java types are mapped to Avro schemas; please refer to the [Avro reflect](https://avro.apache.org/docs/++version++/api/java/index.html) documentation for more details.
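The idea of deriving a record schema from a class definition can be illustrated outside Java. The Python sketch below is an analogy only, not Avro's reflect API: it reads the type hints of a dataclass and builds an Avro-style record schema as a dictionary. The `Datum` dataclass, its fields, the `reflect_schema` helper, and the type mapping are all hypothetical.

```python
import dataclasses
import json
import typing

# Assumed mapping from Python types to Avro primitive type names.
PRIMITIVES = {str: "string", int: "long", float: "double", bool: "boolean", bytes: "bytes"}


@dataclasses.dataclass
class Datum:
    # Hypothetical fields, chosen only for illustration.
    a: str
    b: int


def reflect_schema(cls, namespace="example.avro"):
    """Build an Avro-style record schema dict from a dataclass's type hints."""
    hints = typing.get_type_hints(cls)
    fields = [
        {"name": f.name, "type": PRIMITIVES[hints[f.name]]}
        for f in dataclasses.fields(cls)
    ]
    return {"type": "record", "name": cls.__name__, "namespace": namespace, "fields": fields}


print(json.dumps(reflect_schema(Datum), indent=2))
```

The sketch handles only flat classes with primitive fields. Nested types, nullability, and collections, which a real reflection-based mapping has to cover, are left out.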

This example uses a simple Java POJO class {{< gh_link file="flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/avro/Datum.java" name="Datum" >}}:

Expand Down
