diff --git a/docs/admin-manual/data-admin/backup-restore/backup.md b/docs/admin-manual/data-admin/backup-restore/backup.md new file mode 100644 index 0000000000000..60b416c75d505 --- /dev/null +++ b/docs/admin-manual/data-admin/backup-restore/backup.md @@ -0,0 +1,265 @@ +--- +{ + "title": "Backup", + "language": "en" +} +--- + + + +For concepts related to backup, please refer to [Backup and Restore](./overview.md). This guide provides the steps to create a Repository and back up data. + +## 1. Create Repository + + + +Use the appropriate statement to create a Repository based on your storage choice. For detailed usage, please refer to [Create Repository](../../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/CREATE-REPOSITORY.md). When backing up using the same path for the Repository across different clusters, ensure to use different labels to avoid conflicts that may cause data confusion. + +### Option 1: Create Repository on S3 + +To create a Repository on S3 storage, use the following SQL command: + +```sql +CREATE REPOSITORY `s3_repo` +WITH S3 +ON LOCATION "s3://bucket_name/s3_repo" +PROPERTIES +( + "s3.endpoint" = "s3.us-east-1.amazonaws.com", + "s3.region" = "us-east-1", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` + +- Replace `bucket_name` with your S3 bucket name. +- Provide the appropriate endpoint, access key, secret key, and region for S3 setup. + +### Option 2: Create Repository on Azure + +**Azure is supported since 2.0.8 or 3.0.4.** + +To create a Repository on Azure storage, use the following SQL command: + +```sql +CREATE REPOSITORY `azure_repo` +WITH S3 +ON LOCATION "s3://bucket_name/azure_repo" +PROPERTIES +( + "s3.endpoint" = "selectdbcloudtestwestus3.blob.core.windows.net", + "s3.region" = "dummy_region", + "s3.access_key" = "ak", + "s3.secret_key" = "sk", + "provider" = "AZURE" +); +``` + +- Replace `bucket_name` with your Azure container name. 
+- Provide your Azure storage account and key for authentication. +- `s3.region` is a dummy but required field. +- The `provider` must be set to `AZURE` for Azure storage. + +### Option 3: Create Repository on GCP + +To create a Repository on Google Cloud Platform (GCP) storage, use the following SQL command: + +```sql +CREATE REPOSITORY `gcp_repo` +WITH S3 +ON LOCATION "s3://bucket_name/backup/gcp_repo" +PROPERTIES +( + "s3.endpoint" = "storage.googleapis.com", + "s3.region" = "US-WEST2", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` + +- Replace `bucket_name` with your GCP bucket name. +- Provide your GCP endpoint, access key, and secret key. +- `s3.region` is a dummy but required field. + +### Option 4: Create Repository on OSS (Alibaba Cloud Object Storage Service) + +To create a Repository on OSS, use the following SQL command: + +```sql +CREATE REPOSITORY `oss_repo` +WITH S3 +ON LOCATION "s3://bucket_name/oss_repo" +PROPERTIES +( + "s3.endpoint" = "oss.aliyuncs.com", + "s3.region" = "cn-hangzhou", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` +- Replace `bucket_name` with your OSS bucket name. +- Provide your OSS endpoint, region, access key, and secret key. + +### Option 5: Create Repository on MinIO + +To create a Repository on MinIO storage, use the following SQL command: + +```sql +CREATE REPOSITORY `minio_repo` +WITH S3 +ON LOCATION "s3://bucket_name/minio_repo" +PROPERTIES +( + "s3.endpoint" = "yourminio.com", + "s3.region" = "dummy-region", + "s3.access_key" = "ak", + "s3.secret_key" = "sk", + "use_path_style" = "true" +); +``` + +- Replace `bucket_name` with your MinIO bucket name. +- Provide your MinIO endpoint, access key, and secret key. +- `s3.region` is a dummy but required field. +- If you do not enable Virtual Host-style, then `use_path_style` must be true. 
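+
+Once a Repository has been created with any of the options in this section, it can be useful to confirm that it is registered before running a backup. A minimal sketch, assuming the `SHOW REPOSITORIES` and `DROP REPOSITORY` statements are available in your Doris version and reusing the `minio_repo` name from the example above:
+
+```sql
+-- List the Repositories known to the cluster and check that `minio_repo` appears.
+SHOW REPOSITORIES;
+
+-- If the Repository was created with wrong properties, drop it and create it again.
+-- Dropping is expected to remove only the mapping inside Doris, not the remote data;
+-- verify this against the documentation for your version before relying on it.
+-- DROP REPOSITORY `minio_repo`;
+```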
+ +### Option 6: Create Repository on HDFS + +To create a Repository on HDFS storage, use the following SQL command: + +```sql +CREATE REPOSITORY `hdfs_repo` +WITH hdfs +ON LOCATION "/prefix_path/hdfs_repo" +PROPERTIES +( + "fs.defaultFS" = "hdfs://127.0.0.1:9000", + "hadoop.username" = "doris-test" +); +``` + +- Replace `prefix_path` with the actual path. +- Provide your HDFS endpoint and username. + +## 2. Backup + +Refer to the following statements to back up databases, tables, or partitions. For detailed usage, please refer to [Backup](../../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/BACKUP.md). + +Use meaningful label names, for example names that include the databases and tables covered by the backup. + +### Option 1: Backup Current Database + +The following SQL statement backs up the current database to a Repository named `example_repo`, using the snapshot label `exampledb_20241225`. + +```sql +BACKUP SNAPSHOT exampledb_20241225 +TO example_repo; +``` + +### Option 2: Backup Specified Database + +The following SQL statement backs up a database named `destdb` to a Repository named `example_repo`, using the snapshot label `destdb_20241225`. + +```sql +BACKUP SNAPSHOT destdb.`destdb_20241225` +TO example_repo; +``` + +### Option 3: Backup Specified Tables + +The following SQL statement backs up two tables, `example_tbl` and `example_tbl1`, to a Repository named `example_repo`, using the snapshot label `exampledb_tbl_tbl1_20241225`. + +```sql +BACKUP SNAPSHOT exampledb_tbl_tbl1_20241225 +TO example_repo +ON (example_tbl, example_tbl1); +``` + +### Option 4: Backup Specified Partitions + +The following SQL statement backs up partitions `p1` and `p2` of the table `example_tbl`, together with the whole table `example_tbl2`, to a Repository named `example_repo`, using the snapshot label `example_tbl_p1_p2_tbl1_20241225`.
+ +```sql +BACKUP SNAPSHOT example_tbl_p1_p2_tbl1_20241225 +TO example_repo +ON +( + example_tbl PARTITION (p1,p2), + example_tbl2 +); +``` + +### Option 5: Backup Current Database Excluding Certain Tables + +The following SQL statement backs up the current database to a Repository named `example_repo`, using the snapshot label `exampledb_20241225`, excluding two tables named `example_tbl` and `example_tbl1`. + +```sql +BACKUP SNAPSHOT exampledb_20241225 +TO example_repo +EXCLUDE +( + example_tbl, + example_tbl1 +); +``` + +## 3. View Recent Backup Job Execution Status + +The following SQL statement can be used to view the execution status of recent backup jobs. + +```sql +mysql> show BACKUP\G; +*************************** 1. row *************************** + JobId: 17891847 + SnapshotName: exampledb_20241225 + DbName: example_db + State: FINISHED + BackupObjs: [example_db.example_tbl] + CreateTime: 2022-04-08 15:52:29 + SnapshotFinishedTime: 2022-04-08 15:52:32 + UploadFinishedTime: 2022-04-08 15:52:38 + FinishedTime: 2022-04-08 15:52:44 + UnfinishedTasks: + Progress: + TaskErrMsg: + Status: [OK] + Timeout: 86400 + 1 row in set (0.01 sec) +``` + +## 4. View Existing Backups in Repository + +The following SQL statement can be used to view existing backups in a Repository named `example_repo`, where the Snapshot column is the snapshot label, and the Timestamp is the timestamp. + +```sql +mysql> SHOW SNAPSHOT ON example_repo; ++-----------------+---------------------+--------+ +| Snapshot | Timestamp | Status | ++-----------------+---------------------+--------+ +| exampledb_20241225 | 2022-04-08-15-52-29 | OK | ++-----------------+---------------------+--------+ +1 row in set (0.15 sec) +``` + +## 5. Cancel Backup (if necessary) + +You can use `CANCEL BACKUP FROM db_name;` to cancel a backup task in a database. For more specific usage, refer to [Cancel Backup](../../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/CANCEL-BACKUP.md). 
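+
+As a minimal sketch of the cancel workflow, assuming a backup job is currently running in a database named `example_db` (the name shown in the `SHOW BACKUP` output above):
+
+```sql
+-- Cancel the backup job that is currently running in example_db.
+CANCEL BACKUP FROM example_db;
+
+-- Check the job afterwards; the State column is expected to read CANCELLED.
+SHOW BACKUP FROM example_db;
+```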
\ No newline at end of file diff --git a/docs/admin-manual/data-admin/backup-restore/overview.md b/docs/admin-manual/data-admin/backup-restore/overview.md new file mode 100644 index 0000000000000..e89ddc1c292eb --- /dev/null +++ b/docs/admin-manual/data-admin/backup-restore/overview.md @@ -0,0 +1,88 @@ +--- +{ + "title": "Backup and Restore Overview", + "language": "en" +} +--- + + + +## Introduction + +Doris provides support for backup and restore operations. These features allow users to back up data from databases, tables, or partitions to remote storage systems and restore it when needed. + +## Requirements + +- **Administrator Privileges**: Only users with **ADMIN** privileges can perform backup and restore operations. + +## Key Concepts + +**Snapshot**: + A snapshot is a time-point capture of data in a database, table, or partition. When creating a snapshot, a snapshot label must be specified, and a timestamp is generated upon completion, which can identify a snapshot through the Repository, snapshot label, and timestamp. + +**Repository**: + The remote storage location where backup files are stored. Supported remote storage includes S3, Azure, GCP, OSS, COS, MinIO, HDFS, and other S3-compatible object storage. + +**Backup Operation**: + The backup operation involves creating a snapshot of a database, table, or partition, uploading the snapshot file to a remote Repository, and storing metadata related to the backup. + +**Restore Operation**: + The restore operation involves retrieving a backup from the remote Repository and restoring it to the Doris cluster. + +## Key Features + +1. **Backup Data**: + Doris allows you to back up data from tables, partitions, or entire databases by creating snapshots. Data is backed up in file format and stored in HDFS, S3, or other S3-compatible remote storage systems. + +2. **Restore Data**: + You can restore backup data from the remote Repository to any Doris cluster. 
This includes full database restoration, full table restoration, and partition-level restoration, allowing for flexible data recovery. + +3. **Snapshot Management**: + Data is backed up in the form of snapshots. These snapshots are uploaded to remote storage systems and can be restored when needed. The restoration process involves downloading the snapshot files and mapping them to local metadata so that the restored data takes effect. + +4. **Data Migration**: + In addition to backup and restore, this feature also supports data migration between different Doris clusters. You can back up data to a remote storage system and restore it to another Doris cluster, which makes it suitable for cluster migration. + +5. **Replication Control**: + When restoring data, you can specify the number of replicas for the restored data to ensure redundancy and fault tolerance. + +## Limitations + +1. **Decoupling of Storage and Computing**: + Backup and restore are not supported in the storage-compute decoupled mode. + +2. **Asynchronous Materialized Views (MTMV) Not Supported**: + Backup or restore of **asynchronous materialized views (MTMV)** is not supported. These views are skipped in backup and restore operations. + +3. **Tables with Storage Policies Not Supported**: + Tables that use [**storage policies**](../../../table-desgin/tiered-storage/remote-storage.md) **do not support** backup and restore operations. + +4. **Incremental Backup**: + Currently, Doris only supports full backups. Incremental backups (storing only the data changed since the last backup) are not supported. As a workaround, you can back up only specific partitions to approximate an incremental backup. + +5. **colocate_with Attribute**: + During backup or restore operations, Doris does not retain the `colocate_with` attribute of the table. Colocated tables may need this attribute to be reconfigured after restoration. + +6. **Dynamic Partition Support**: + After restoring a table, you need to manually re-enable the dynamic partition attribute using the `ALTER TABLE` command. + +7. 
**Single Concurrency**: + Only one backup or restore task can run simultaneously under a single database. + diff --git a/docs/admin-manual/data-admin/backup-restore/restore.md b/docs/admin-manual/data-admin/backup-restore/restore.md new file mode 100644 index 0000000000000..9e3bd5469e280 --- /dev/null +++ b/docs/admin-manual/data-admin/backup-restore/restore.md @@ -0,0 +1,148 @@ +--- +{ + "title": "Restore", + "language": "en" +} +--- + + + +## Prerequisites + +1. Ensure you have **administrator** privileges to perform the restore operation. +2. Ensure you have an existing **Repository** to store the backup. If not, follow the steps to create a Repository and perform a [backup](backup.md). +3. Ensure you have a valid **backup** snapshot available for restoration. + +## 1. Get the Backup Timestamp of the Snapshot + +The following SQL statement can be used to view existing backups in the Repository named `example_repo`. + + ```sql + mysql> SHOW SNAPSHOT ON example_repo; + +-----------------+---------------------+--------+ + | Snapshot | Timestamp | Status | + +-----------------+---------------------+--------+ + | exampledb_20241225 | 2022-04-08-15-52-29 | OK | + +-----------------+---------------------+--------+ + 1 row in set (0.15 sec) + ``` + +## 2. Restore from Snapshot + +### Option 1: Restore Snapshot to Current Database + +The following SQL statement restores the snapshot labeled `restore_label1` with the timestamp `2022-04-08-15-52-29` from the Repository named `example_repo` to the current database. + +```sql +RESTORE SNAPSHOT `restore_label1` +FROM `example_repo` +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 2: Restore Snapshot to Specified Database + +The following SQL statement restores the snapshot labeled `restore_label1` with the timestamp `2022-04-08-15-52-29` from the Repository named `example_repo` to a database named `destdb`. 
+ +```sql +RESTORE SNAPSHOT destdb.`restore_label1` +FROM `example_repo` +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 3: Restore a Single Table from Snapshot + +Restore the table `backup_tbl` from the snapshot in `example_repo` to the current database, with the snapshot labeled `restore_label1` and timestamp `2022-04-08-15-52-29`. + +```sql +RESTORE SNAPSHOT `restore_label1` +FROM `example_repo` +ON ( `backup_tbl` ) +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 4: Restore Partitions and Tables from Snapshot + +Restore partitions `p1` and `p2` of the table `backup_tbl`, as well as the table `backup_tbl2` (renamed to `new_tbl`), to the current database `example_db1`, from the snapshot labeled `restore_label1` with timestamp `2022-04-08-15-55-43` in the Repository `example_repo`. + + ```sql + RESTORE SNAPSHOT `restore_label1` + FROM `example_repo` + ON + ( + `backup_tbl` PARTITION (`p1`, `p2`), + `backup_tbl2` AS `new_tbl` + ) + PROPERTIES + ( + "backup_timestamp"="2022-04-08-15-55-43" + ); + ``` + +## 3. Check the Execution Status of the Restore Job + + ```sql + mysql> SHOW RESTORE\G; + *************************** 1. 
row *************************** + JobId: 17891851 + Label: snapshot_label1 + Timestamp: 2022-04-08-15-52-29 + DbName: default_cluster:example_db1 + State: FINISHED + AllowLoad: false + ReplicationNum: 3 + RestoreObjs: { + "name": "snapshot_label1", + "database": "example_db", + "backup_time": 1649404349050, + "content": "ALL", + "olap_table_list": [ + { + "name": "backup_tbl", + "partition_names": [ + "p1", + "p2" + ] + } + ], + "view_list": [], + "odbc_table_list": [], + "odbc_resource_list": [] + } + CreateTime: 2022-04-08 15:59:01 + MetaPreparedTime: 2022-04-08 15:59:02 + SnapshotFinishedTime: 2022-04-08 15:59:05 + DownloadFinishedTime: 2022-04-08 15:59:12 + FinishedTime: 2022-04-08 15:59:18 + UnfinishedTasks: + Progress: + TaskErrMsg: + Status: [OK] + Timeout: 86400 + 1 row in set (0.01 sec) + ``` diff --git a/docs/admin-manual/data-admin/data-recovery.md b/docs/admin-manual/data-admin/data-recovery.md new file mode 100644 index 0000000000000..4f8e23a9c9a8e --- /dev/null +++ b/docs/admin-manual/data-admin/data-recovery.md @@ -0,0 +1,54 @@ +--- +{ + "title": "Data Recovery", + "language": "en" +} +--- + + + +# Repair Data + +For the Unique Key Merge on Write table, there are bugs in some Doris versions, which may cause errors when the system calculates the delete bitmap, resulting in duplicate primary keys. At this time, the full compaction function can be used to repair the data. This function is invalid for non-Unique Key Merge on Write tables. + +This feature requires Doris version 2.0+. + +To use this function, it is necessary to stop the import as much as possible, otherwise problems such as import timeout may occur. + +## Brief principle explanation + +After the full compaction is executed, the delete bitmap will be recalculated, and the wrong delete bitmap data will be deleted to complete the data restoration. 
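+
+To check whether a table is affected, and to confirm the repair once full compaction has finished, one option is to look for key values that occur more than once. A minimal sketch, assuming a hypothetical Unique Key Merge on Write table `example_tbl` whose key is the single column `id`:
+
+```sql
+-- Any row returned here is a key that appears more than once,
+-- which should not happen in a Unique Key table.
+SELECT id, COUNT(*) AS cnt
+FROM example_tbl
+GROUP BY id
+HAVING COUNT(*) > 1;
+```
+
+An empty result after the compaction completes suggests that the duplicates have been removed.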
+ +## Instructions for use + +`POST /api/compaction/run?tablet_id={int}&compact_type=full` + +or + +`POST /api/compaction/run?table_id={int}&compact_type=full` + +Note that exactly one of `tablet_id` or `table_id` must be specified; they cannot be specified at the same time. When `table_id` is specified, full compaction is executed automatically for all tablets of that table. + +## Example of use + +``` +curl -X POST "http://127.0.0.1:8040/api/compaction/run?tablet_id=10015&compact_type=full" +curl -X POST "http://127.0.0.1:8040/api/compaction/run?table_id=10104&compact_type=full" +``` \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current.json b/i18n/zh-CN/docusaurus-plugin-content-docs/current.json index de58e14d98f1d..8acd0c981b00d 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current.json +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current.json @@ -543,8 +543,12 @@ "message": "分层存储", "description": "The label for category Tiered Storage in sidebar docs" }, - "sidebar.docs.category.Business continuity & data recovery": { + "sidebar.docs.category.Business Continuity & Data Recovery": { "message": "业务连续性和数据恢复", - "description": "The label for category Business continuity & data recovery in sidebar docs" + "description": "The label for category Business Continuity & Data Recovery in sidebar docs" + }, + "sidebar.docs.category.Backup & Restore": { + "message": "备份与恢复", + "description": "The label for category Backup & Restore in sidebar docs" } } diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/backup-restore/backup.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/backup-restore/backup.md new file mode 100644 index 0000000000000..45bdb0026dcd9 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/backup-restore/backup.md @@ -0,0 +1,265 @@ +--- +{ + "title": "备份", + "language": "zh-CN" +} +--- + + + 
+有关备份的概念,请参阅[备份与恢复](./overview.md)。本指南提供了创建 Repository 和备份数据的操作步骤。 + +## 1. 创建 Repository + + + +根据您的存储选择适当的语句来创建 Repository。有关详细用法,请参阅[创建 Repository ](../../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/CREATE-REPOSITORY.md)。在不同集群使用相同路径的 Repository 进行备份时,请确保使用不同的 Label,以避免冲突造成数据错乱。 + +### Option 1: 在 S3 上创建 Repository + +要在 S3 存储上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `s3_repo` +WITH S3 +ON LOCATION "s3://bucket_name/s3_repo" +PROPERTIES +( + "s3.endpoint" = "s3.us-east-1.amazonaws.com", + "s3.region" = "us-east-1", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` + +- 将 bucket_name 替换为您的 S3 存储桶名称。 +- 提供适当的 endpoint、access key、 secret key 和 region 以进行 S3 设置。 + +### Option 2: 在 Azure 上创建 Repository + +**2.1.8 以及 3.1.4 开始支持** +要在 Azure 存储上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `azure_repo` +WITH S3 +ON LOCATION "s3://bucket_name/azure_repo" +PROPERTIES +( + "s3.endpoint" = "selectdbcloudtestwestus3.blob.core.windows.net", + "s3.region" = "dummy_region", + "s3.access_key" = "ak", + "s3.secret_key" = "sk", + "provider" = "AZURE" +); +``` + +- 将 bucket_name 替换为您的 Azure 容器名称。 +- 提供您的 Azure 存储帐户和密钥以进行身份验证。 +- `s3.region` 只是一个虚假的 region,任意指定一个即可,但是必须要指定。 +- `provider` 必须为 `AZURE`。 + +### Option 3: 在 GCP 上创建 Repository + +要在 Google Cloud Platform (GCP) 存储上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `gcp_repo` +WITH S3 +ON LOCATION "s3://bucket_name/backup/gcp_repo" +PROPERTIES +( + "s3.endpoint" = "storage.googleapis.com", + "s3.region" = "US-WEST2", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` + +- 将 bucket_name 替换为您的 GCP 存储桶名称。 +- 提供您的 GCP endpoint、access key 和 secret key。 +- `s3.region` 只是一个虚假的 region,任意指定一个即可,但是必须要指定。 + +### Option 4: 在 OSS(阿里云对象存储服务)上创建 Repository + +要在 OSS 上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `oss_repo` +WITH S3 +ON LOCATION "s3://bucket_name/oss_repo" +PROPERTIES +( + "s3.endpoint" = "oss.aliyuncs.com", + "s3.region" = 
"cn-hangzhou", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` +- 将 bucket_name 替换为您的 OSS 存储桶名称。 +- 提供您的 OSS endpoint、region、access key 和 secret key。 + +### Option 5: 在 MinIO 上创建 Repository + +要在 MinIO 存储上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `minio_repo` +WITH S3 +ON LOCATION "s3://bucket_name/minio_repo" +PROPERTIES +( + "s3.endpoint" = "yourminio.com", + "s3.region" = "dummy-region", + "s3.access_key" = "ak", + "s3.secret_key" = "sk", + "use_path_style" = "true" +); +``` + +- 将 bucket_name 替换为您的 MinIO 存储桶名称。 +- 提供您的 MinIO endpoint、access key和 secret key。 +- `s3.region` 只是一个虚假的 region,任意指定一个即可,但是必须要指定。 +- 如果您不启用 Virtual Host-style,则 'use_path_style' 必须为 true。 + +### Option 6: 在 HDFS 上创建 Repository + +要在 HDFS 存储上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `hdfs_repo` +WITH hdfs +ON LOCATION "/prefix_path/hdfs_repo" +PROPERTIES +( + "fs.defaultFS" = "hdfs://127.0.0.1:9000", + "hadoop.username" = "doris-test" +) +``` + +- 将 prefix_path 替换为真实路径。 +- 提供您的 hdfs endpoint 和用户名。 + +## 2. 
备份 + +请参考以下语句以备份数据库、表或分区。有关详细用法,请参阅[备份](../../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/BACKUP.md)。 + +建议使用有意义的 Label 名称,例如包含备份中包含的数据库和表。 + +### Option 1: 备份当前数据库 + +以下 SQL 语句将当前数据库备份到名为 `example_repo` 的 Repository ,并使用快照 Label `exampledb_20241225`。 + +```sql +BACKUP SNAPSHOT exampledb_20241225 +TO example_repo; +``` + +### Option 2: 备份指定数据库 + +以下 SQL 语句将名为 destdb 的数据库备份到名为 `example_repo` 的 Repository ,并使用快照 Label `destdb_20241225`。 + +```sql +BACKUP SNAPSHOT destdb.`destdb_20241225` +TO example_repo; +``` + +### Option 3: 备份指定表 + +以下 SQL 语句将两个表备份到名为 `example_repo` 的 Repository ,并使用快照 Label `exampledb_tbl_tbl1_20241225`。 + +```sql +BACKUP SNAPSHOT exampledb_tbl_tbl1_20241225 +TO example_repo +ON (example_tbl, example_tbl1); +``` + +### Option 4: 备份指定分区 + +以下 SQL 语句将名为 `example_tbl2` 的表和名为 `p1` 和 `p2` 的两个分区备份到名为 `example_repo` 的 Repository ,并使用快照 Label `example_tbl_p1_p2_tbl1_20241225`。 + +```sql +BACKUP SNAPSHOT example_tbl_p1_p2_tbl1_20241225 +TO example_repo +ON +( + example_tbl PARTITION (p1,p2), + example_tbl2 +); +``` + +### Option 5: 备份当前数据库,排除某些表 + +以下 SQL 语句将当前数据库备份到名为 `example_repo` 的 Repository ,并使用快照 Label `exampledb_20241225`,排除两个名为 `example_tbl` 和 `example_tbl1` 的表。 + +```sql +BACKUP SNAPSHOT exampledb_20241225 +TO example_repo +EXCLUDE +( + example_tbl, + example_tbl1 +); +``` + +## 3. 查看最近备份作业的执行情况 + +以下 SQL 语句可用于查看最近备份作业的执行情况。 + +```sql +mysql> show BACKUP\G; +*************************** 1. row *************************** + JobId: 17891847 + SnapshotName: exampledb_20241225 + DbName: example_db + State: FINISHED + BackupObjs: [example_db.example_tbl] + CreateTime: 2022-04-08 15:52:29 + SnapshotFinishedTime: 2022-04-08 15:52:32 + UploadFinishedTime: 2022-04-08 15:52:38 + FinishedTime: 2022-04-08 15:52:44 + UnfinishedTasks: + Progress: + TaskErrMsg: + Status: [OK] + Timeout: 86400 + 1 row in set (0.01 sec) +``` + +## 4. 
查看 Repository 中的现有备份 + +以下 SQL 语句可用于查看名为 `example_repo` 的 Repository 中的现有备份,其中 Snapshot 列是快照 Label,Timestamp 是时间戳。 + +```sql +mysql> SHOW SNAPSHOT ON example_repo; ++-----------------+---------------------+--------+ +| Snapshot | Timestamp | Status | ++-----------------+---------------------+--------+ +| exampledb_20241225 | 2022-04-08-15-52-29 | OK | ++-----------------+---------------------+--------+ +1 row in set (0.15 sec) +``` + +## 5. 取消备份(如有需要) + +可以使用 `CANCEL BACKUP FROM db_name;` 取消一个数据库中的备份任务。更具体的用法可以参考[取消备份](../../../sql-manual/sql-statements/Data-Definition-Statements/Backup-and-Restore/CANCEL-BACKUP.md)。 + diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/backup-restore/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/backup-restore/overview.md new file mode 100644 index 0000000000000..458e5cace8d94 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/backup-restore/overview.md @@ -0,0 +1,88 @@ +--- +{ + "title": "备份和恢复概述", + "language": "zh-CN" +} +--- + + + +## 介绍 + +Doris 提供了备份和恢复操作支持。这些功能使用户能够将数据从库、表或者分区备份到远程存储系统,并在需要时进行恢复。 + +## 要求 + +- **管理员权限**:只有具有 **ADMIN** 权限的用户才能执行备份和恢复操作。 + +## 关键概念 + +**快照**: + 快照是数据库、表或分区中数据的时间点捕获,创建快照时需要指定一个快照 Label,快照完成时会生成一个时间戳,可以通过 Repository、快照 Label 和时间戳标识一个快照。 + +**Repository**: + 备份文件存储的远程存储位置,支持的远程存储包括 S3、Azure、GCP、OSS、COS、MinIO、HDFS 和其它兼容 S3 的对象存储。 + +**备份操作**: + 备份操作涉及创建数据库、表或分区的快照,将快照文件上传到远程 Repository,并存储与备份相关的元数据。 + +**恢复操作**: + 恢复操作涉及从远程 Repository 中备份并将其恢复到 Doris 集群。 + +## 关键特性 + +1. **备份数据**: + Doris 允许您通过创建快照来备份表、分区或整个数据库的数据。数据以文件格式备份并存储在 HDFS、S3 或其他兼容 S3 的远程存储系统上。 + +2. **恢复数据**: + 您可以从远程 Repository 恢复备份数据到任何 Doris 集群。这包括完整数据库恢复、完整表恢复和分区级恢复,允许灵活的数据恢复。 + +3. **快照管理**: + 数据以快照的形式备份。这些快照被上传到远程存储系统,并可以在需要时恢复。恢复过程涉及下载快照文件并将其映射到本地元数据以使其有效。 + +4. **数据迁移**: + 除了备份和恢复,此功能还支持在不同 Doris 集群之间的数据迁移。您可以将数据备份到远程存储系统并恢复到另一个 Doris 集群,帮助进行集群迁移场景。 + +5. 
**复制控制**: + 在恢复数据时,您可以指定恢复数据的副本数量,以确保冗余和容错。 + +## 限制 + +1. **存储与计算解耦**: + 存算分离模式不支持备份和恢复。 + +2. **不支持异步物化视图 (MTMV)**: + 不支持备份或恢复 **异步物化视图 (MTMV)**。在备份和恢复操作中,这些视图不被考虑。 + +3. **不支持具有存储策略的表**: + 使用了 [**存储策略**](../../../table-desgin/tiered-storage/remote-storage.md) 的表 **不支持** 备份和恢复操作。 + +4. **增量备份**: + 目前,Doris 仅支持全量备份。增量备份(仅存储自上次备份以来更改的数据)尚不支持,您可以可以备份特定分区来实现增量备份。 + +5. **colocate_with 属性**: + 在备份或恢复操作期间,Doris 不会保留表的 `colocate_with` 属性。这可能需要在恢复后重新配置共置表。 + +6. **动态分区支持**: + 恢复表之后,需要使用 `ALTER TABLE` 命令手动启用此属性。 + +7. **单并发**: + 一个数据库下同时只能运行一个备份或者恢复任务。 + diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/backup-restore/restore.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/backup-restore/restore.md new file mode 100644 index 0000000000000..0513de5b6285a --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/backup-restore/restore.md @@ -0,0 +1,147 @@ +--- +{ + "title": "恢复", + "language": "zh-CN" +} +--- + + + +## 前提条件 + +1. 确保您拥有**管理员**权限以执行恢复操作。 +2. 确保您有一个有效的**备份**快照可供恢复,请参考[备份](backup.md)。 + +## 1. 获取快照的备份时间戳 + +以下SQL语句可用于查看名为`example_repo`的 Repository 中的现有备份。 + + ```sql + mysql> SHOW SNAPSHOT ON example_repo; + +-----------------+---------------------+--------+ + | Snapshot | Timestamp | Status | + +-----------------+---------------------+--------+ + | exampledb_20241225 | 2022-04-08-15-52-29 | OK | + +-----------------+---------------------+--------+ + 1 row in set (0.15 sec) + ``` + +## 2. 
从快照恢复 + +### Option 1:恢复快照到当前数据库 + +以下SQL语句从名为`example_repo`的 Repository 中恢复标签为 `restore_label1` 和时间戳为 `2022-04-08-15-52-29` 的快照到当前数据库。 + +```sql +RESTORE SNAPSHOT `restore_label1` +FROM `example_repo` +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 2:恢复快照到指定数据库 + +以下SQL语句从名为`example_repo`的 Repository 中恢复标签为 `restore_label1` 和时间戳为 `2022-04-08-15-52-29` 的快照到名为 `destdb` 的数据库。 + +```sql +RESTORE SNAPSHOT destdb.`restore_label1` +FROM `example_repo` +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 3:从快照恢复单个表 + +从`example_repo`中的快照恢复表`backup_tbl`到当前数据库,快照的标签为 `restore_label1`,时间戳为 `2022-04-08-15-52-29`。 + +```sql +RESTORE SNAPSHOT `restore_label1` +FROM `example_repo` +ON ( `backup_tbl` ) +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 4:从快照恢复分区和表 + +从`example_repo`中的备份快照`snapshot_2`恢复表`backup_tbl`的分区p1和p2,以及表`backup_tbl2`到当前数据库`example_db1`,并将其重命名为`new_tbl`,快照标签为时间版本为`"2018-05-04-17-11-01"`。 + + ```sql + RESTORE SNAPSHOT `restore_label1` + FROM `example_repo` + ON + ( + `backup_tbl` PARTITION (`p1`, `p2`), + `backup_tbl2` AS `new_tbl` + ) + PROPERTIES + ( + "backup_timestamp"="2022-04-08-15-55-43" + ); + ``` + +## 3. 查看恢复作业的执行情况 + + ```sql + mysql> SHOW RESTORE\G; + *************************** 1. 
row *************************** + JobId: 17891851 + Label: snapshot_label1 + Timestamp: 2022-04-08-15-52-29 + DbName: default_cluster:example_db1 + State: FINISHED + AllowLoad: false + ReplicationNum: 3 + RestoreObjs: { + "name": "snapshot_label1", + "database": "example_db", + "backup_time": 1649404349050, + "content": "ALL", + "olap_table_list": [ + { + "name": "backup_tbl", + "partition_names": [ + "p1", + "p2" + ] + } + ], + "view_list": [], + "odbc_table_list": [], + "odbc_resource_list": [] + } + CreateTime: 2022-04-08 15:59:01 + MetaPreparedTime: 2022-04-08 15:59:02 + SnapshotFinishedTime: 2022-04-08 15:59:05 + DownloadFinishedTime: 2022-04-08 15:59:12 + FinishedTime: 2022-04-08 15:59:18 + UnfinishedTasks: + Progress: + TaskErrMsg: + Status: [OK] + Timeout: 86400 + 1 row in set (0.01 sec) + ``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/overview.md index dccf974e71fad..63382fde92561 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/overview.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/data-admin/ccr/overview.md @@ -24,13 +24,13 @@ under the License. 
## 概览 -CCR (Cross Cluster Replication) 是一种跨集群数据同步机制,能够在库或表级别将源集群的数据变更同步到目标集群。它主要用于提升在线服务的数据可用性、读写负载隔离和建设两地三中心架构。CCR 目前不支持存算分离模式。 +CCR (Cross Cluster Replication) 是一种跨集群数据同步机制,能够在库或表级别将源集群的数据变更同步到目标集群。 ### 适用场景 CCR 适用于以下几种常见场景: -- **容灾备份**:将企业数据备份到另一集群和机房,确保在业务中断或数据丢失时能够恢复数据,或快速实现主备切换。金融、医疗、电子商务等行业通常需要这种高 SLA 的容灾备份。 +- **容灾备份**:将企业数据备份到另一集群和机房,确保在业务中断或数据丢失时能够恢复数据。 - **读写分离**:通过将数据的查询操作与写入操作分离,减小读写之间的相互影响,提升服务稳定性。对于高并发或写入压力大的场景,采用读写分离可以有效分散负载,提升数据库性能和稳定性。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1.json b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1.json index f7ba177768fa2..172490726bec1 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1.json +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1.json @@ -659,8 +659,12 @@ "message": "分层存储", "description": "The label for category Tiered Storage in sidebar docs" }, - "sidebar.docs.category.Business continuity & data recovery": { + "sidebar.docs.category.Business Continuity & Data Recovery": { "message": "业务连续性和数据恢复", - "description": "The label for category Business continuity & data recovery in sidebar docs" + "description": "The label for category Business Continuity & Data Recovery in sidebar docs" + }, + "sidebar.docs.category.Backup & Restore": { + "message": "备份与恢复", + "description": "The label for category Backup & Restore in sidebar docs" } } diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/backup-restore/backup.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/backup-restore/backup.md new file mode 100644 index 0000000000000..b55abcf9a1b91 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/backup-restore/backup.md @@ -0,0 +1,265 @@ +--- +{ + "title": "备份", + "language": "zh-CN" +} +--- + + + +有关备份的概念,请参阅[备份与恢复](./overview.md)。本指南提供了创建 Repository 和备份数据的操作步骤。 + +## 1. 
创建 Repository + + + +根据您的存储选择适当的语句来创建 Repository。有关详细用法,请参阅[创建 Repository ](../../../sql-manual/sql-statements/data-modification/backup-and-restore/CREATE-REPOSITORY.md)。在不同集群使用相同路径的 Repository 进行备份时,请确保使用不同的 Label,以避免冲突造成数据错乱。 + +### Option 1: 在 S3 上创建 Repository + +要在 S3 存储上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `s3_repo` +WITH S3 +ON LOCATION "s3://bucket_name/s3_repo" +PROPERTIES +( + "s3.endpoint" = "s3.us-east-1.amazonaws.com", + "s3.region" = "us-east-1", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` + +- 将 bucket_name 替换为您的 S3 存储桶名称。 +- 提供适当的 endpoint、access key、 secret key 和 region 以进行 S3 设置。 + +### Option 2: 在 Azure 上创建 Repository + +**2.1.8 以及 3.1.4 开始支持** +要在 Azure 存储上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `azure_repo` +WITH S3 +ON LOCATION "s3://bucket_name/azure_repo" +PROPERTIES +( + "s3.endpoint" = "selectdbcloudtestwestus3.blob.core.windows.net", + "s3.region" = "dummy_region", + "s3.access_key" = "ak", + "s3.secret_key" = "sk", + "provider" = "AZURE" +); +``` + +- 将 bucket_name 替换为您的 Azure 容器名称。 +- 提供您的 Azure 存储帐户和密钥以进行身份验证。 +- `s3.region` 只是一个虚假的 region,任意指定一个即可,但是必须要指定。 +- `provider` 必须为 `AZURE`。 + +### Option 3: 在 GCP 上创建 Repository + +要在 Google Cloud Platform (GCP) 存储上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `gcp_repo` +WITH S3 +ON LOCATION "s3://bucket_name/backup/gcp_repo" +PROPERTIES +( + "s3.endpoint" = "storage.googleapis.com", + "s3.region" = "US-WEST2", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` + +- 将 bucket_name 替换为您的 GCP 存储桶名称。 +- 提供您的 GCP endpoint、access key 和 secret key。 +- `s3.region` 只是一个虚假的 region,任意指定一个即可,但是必须要指定。 + +### Option 4: 在 OSS(阿里云对象存储服务)上创建 Repository + +要在 OSS 上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `oss_repo` +WITH S3 +ON LOCATION "s3://bucket_name/oss_repo" +PROPERTIES +( + "s3.endpoint" = "oss.aliyuncs.com", + "s3.region" = "cn-hangzhou", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` +- 将 
bucket_name 替换为您的 OSS 存储桶名称。 +- 提供您的 OSS endpoint、region、access key 和 secret key。 + +### Option 5: 在 MinIO 上创建 Repository + +要在 MinIO 存储上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `minio_repo` +WITH S3 +ON LOCATION "s3://bucket_name/minio_repo" +PROPERTIES +( + "s3.endpoint" = "yourminio.com", + "s3.region" = "dummy-region", + "s3.access_key" = "ak", + "s3.secret_key" = "sk", + "use_path_style" = "true" +); +``` + +- 将 bucket_name 替换为您的 MinIO 存储桶名称。 +- 提供您的 MinIO endpoint、access key和 secret key。 +- `s3.region` 只是一个虚假的 region,任意指定一个即可,但是必须要指定。 +- 如果您不启用 Virtual Host-style,则 'use_path_style' 必须为 true。 + +### Option 6: 在 HDFS 上创建 Repository + +要在 HDFS 存储上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `hdfs_repo` +WITH hdfs +ON LOCATION "/prefix_path/hdfs_repo" +PROPERTIES +( + "fs.defaultFS" = "hdfs://127.0.0.1:9000", + "hadoop.username" = "doris-test" +) +``` + +- 将 prefix_path 替换为真实路径。 +- 提供您的 hdfs endpoint 和用户名。 + +## 2. 备份 + +请参考以下语句以备份数据库、表或分区。有关详细用法,请参阅[备份](../../../sql-manual/sql-statements/data-modification/backup-and-restore/BACKUP.md)。 + +建议使用有意义的 Label 名称,例如包含备份中包含的数据库和表。 + +### Option 1: 备份当前数据库 + +以下 SQL 语句将当前数据库备份到名为 `example_repo` 的 Repository ,并使用快照 Label `exampledb_20241225`。 + +```sql +BACKUP SNAPSHOT exampledb_20241225 +TO example_repo; +``` + +### Option 2: 备份指定数据库 + +以下 SQL 语句将名为 destdb 的数据库备份到名为 `example_repo` 的 Repository ,并使用快照 Label `destdb_20241225`。 + +```sql +BACKUP SNAPSHOT destdb.`destdb_20241225` +TO example_repo; +``` + +### Option 3: 备份指定表 + +以下 SQL 语句将两个表备份到名为 `example_repo` 的 Repository ,并使用快照 Label `exampledb_tbl_tbl1_20241225`。 + +```sql +BACKUP SNAPSHOT exampledb_tbl_tbl1_20241225 +TO example_repo +ON (example_tbl, example_tbl1); +``` + +### Option 4: 备份指定分区 + +以下 SQL 语句将名为 `example_tbl2` 的表和名为 `p1` 和 `p2` 的两个分区备份到名为 `example_repo` 的 Repository ,并使用快照 Label `example_tbl_p1_p2_tbl1_20241225`。 + +```sql +BACKUP SNAPSHOT example_tbl_p1_p2_tbl1_20241225 +TO example_repo +ON +( + example_tbl PARTITION (p1,p2), 
+ example_tbl2 +); +``` + +### Option 5: 备份当前数据库,排除某些表 + +以下 SQL 语句将当前数据库备份到名为 `example_repo` 的 Repository ,并使用快照 Label `exampledb_20241225`,排除两个名为 `example_tbl` 和 `example_tbl1` 的表。 + +```sql +BACKUP SNAPSHOT exampledb_20241225 +TO example_repo +EXCLUDE +( + example_tbl, + example_tbl1 +); +``` + +## 3. 查看最近备份作业的执行情况 + +以下 SQL 语句可用于查看最近备份作业的执行情况。 + +```sql +mysql> show BACKUP\G; +*************************** 1. row *************************** + JobId: 17891847 + SnapshotName: exampledb_20241225 + DbName: example_db + State: FINISHED + BackupObjs: [example_db.example_tbl] + CreateTime: 2022-04-08 15:52:29 + SnapshotFinishedTime: 2022-04-08 15:52:32 + UploadFinishedTime: 2022-04-08 15:52:38 + FinishedTime: 2022-04-08 15:52:44 + UnfinishedTasks: + Progress: + TaskErrMsg: + Status: [OK] + Timeout: 86400 + 1 row in set (0.01 sec) +``` + +## 4. 查看 Repository 中的现有备份 + +以下 SQL 语句可用于查看名为 `example_repo` 的 Repository 中的现有备份,其中 Snapshot 列是快照 Label,Timestamp 是时间戳。 + +```sql +mysql> SHOW SNAPSHOT ON example_repo; ++-----------------+---------------------+--------+ +| Snapshot | Timestamp | Status | ++-----------------+---------------------+--------+ +| exampledb_20241225 | 2022-04-08-15-52-29 | OK | ++-----------------+---------------------+--------+ +1 row in set (0.15 sec) +``` + +## 5. 
取消备份(如有需要) + +可以使用 `CANCEL BACKUP FROM db_name;` 取消一个数据库中的备份任务。更具体的用法可以参考[取消备份](../../../sql-manual/sql-statements/data-modification/backup-and-restore/CANCEL-BACKUP.md)。 + diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/backup-restore/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/backup-restore/overview.md new file mode 100644 index 0000000000000..458e5cace8d94 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/backup-restore/overview.md @@ -0,0 +1,88 @@ +--- +{ + "title": "备份和恢复概述", + "language": "zh-CN" +} +--- + + + +## 介绍 + +Doris 提供了备份和恢复操作支持。这些功能使用户能够将数据从库、表或者分区备份到远程存储系统,并在需要时进行恢复。 + +## 要求 + +- **管理员权限**:只有具有 **ADMIN** 权限的用户才能执行备份和恢复操作。 + +## 关键概念 + +**快照**: + 快照是数据库、表或分区中数据的时间点捕获,创建快照时需要指定一个快照 Label,快照完成时会生成一个时间戳,可以通过 Repository、快照 Label 和时间戳标识一个快照。 + +**Repository**: + 备份文件存储的远程存储位置,支持的远程存储包括 S3、Azure、GCP、OSS、COS、MinIO、HDFS 和其它兼容 S3 的对象存储。 + +**备份操作**: + 备份操作涉及创建数据库、表或分区的快照,将快照文件上传到远程 Repository,并存储与备份相关的元数据。 + +**恢复操作**: + 恢复操作涉及从远程 Repository 中备份并将其恢复到 Doris 集群。 + +## 关键特性 + +1. **备份数据**: + Doris 允许您通过创建快照来备份表、分区或整个数据库的数据。数据以文件格式备份并存储在 HDFS、S3 或其他兼容 S3 的远程存储系统上。 + +2. **恢复数据**: + 您可以从远程 Repository 恢复备份数据到任何 Doris 集群。这包括完整数据库恢复、完整表恢复和分区级恢复,允许灵活的数据恢复。 + +3. **快照管理**: + 数据以快照的形式备份。这些快照被上传到远程存储系统,并可以在需要时恢复。恢复过程涉及下载快照文件并将其映射到本地元数据以使其有效。 + +4. **数据迁移**: + 除了备份和恢复,此功能还支持在不同 Doris 集群之间的数据迁移。您可以将数据备份到远程存储系统并恢复到另一个 Doris 集群,帮助进行集群迁移场景。 + +5. **复制控制**: + 在恢复数据时,您可以指定恢复数据的副本数量,以确保冗余和容错。 + +## 限制 + +1. **存储与计算解耦**: + 存算分离模式不支持备份和恢复。 + +2. **不支持异步物化视图 (MTMV)**: + 不支持备份或恢复 **异步物化视图 (MTMV)**。在备份和恢复操作中,这些视图不被考虑。 + +3. **不支持具有存储策略的表**: + 使用了 [**存储策略**](../../../table-desgin/tiered-storage/remote-storage.md) 的表 **不支持** 备份和恢复操作。 + +4. **增量备份**: + 目前,Doris 仅支持全量备份。增量备份(仅存储自上次备份以来更改的数据)尚不支持,您可以可以备份特定分区来实现增量备份。 + +5. **colocate_with 属性**: + 在备份或恢复操作期间,Doris 不会保留表的 `colocate_with` 属性。这可能需要在恢复后重新配置共置表。 + +6. 
**动态分区支持**: + 恢复表之后,需要使用 `ALTER TABLE` 命令手动启用此属性。 + +7. **单并发**: + 一个数据库下同时只能运行一个备份或者恢复任务。 + diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/backup-restore/restore.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/backup-restore/restore.md new file mode 100644 index 0000000000000..0513de5b6285a --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/data-admin/backup-restore/restore.md @@ -0,0 +1,147 @@ +--- +{ + "title": "恢复", + "language": "zh-CN" +} +--- + + + +## 前提条件 + +1. 确保您拥有**管理员**权限以执行恢复操作。 +2. 确保您有一个有效的**备份**快照可供恢复,请参考[备份](backup.md)。 + +## 1. 获取快照的备份时间戳 + +以下SQL语句可用于查看名为`example_repo`的 Repository 中的现有备份。 + + ```sql + mysql> SHOW SNAPSHOT ON example_repo; + +-----------------+---------------------+--------+ + | Snapshot | Timestamp | Status | + +-----------------+---------------------+--------+ + | exampledb_20241225 | 2022-04-08-15-52-29 | OK | + +-----------------+---------------------+--------+ + 1 row in set (0.15 sec) + ``` + +## 2. 
从快照恢复 + +### Option 1:恢复快照到当前数据库 + +以下SQL语句从名为`example_repo`的 Repository 中恢复标签为 `restore_label1` 和时间戳为 `2022-04-08-15-52-29` 的快照到当前数据库。 + +```sql +RESTORE SNAPSHOT `restore_label1` +FROM `example_repo` +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 2:恢复快照到指定数据库 + +以下SQL语句从名为`example_repo`的 Repository 中恢复标签为 `restore_label1` 和时间戳为 `2022-04-08-15-52-29` 的快照到名为 `destdb` 的数据库。 + +```sql +RESTORE SNAPSHOT destdb.`restore_label1` +FROM `example_repo` +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 3:从快照恢复单个表 + +从`example_repo`中的快照恢复表`backup_tbl`到当前数据库,快照的标签为 `restore_label1`,时间戳为 `2022-04-08-15-52-29`。 + +```sql +RESTORE SNAPSHOT `restore_label1` +FROM `example_repo` +ON ( `backup_tbl` ) +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 4:从快照恢复分区和表 + +从`example_repo`中的备份快照`snapshot_2`恢复表`backup_tbl`的分区p1和p2,以及表`backup_tbl2`到当前数据库`example_db1`,并将其重命名为`new_tbl`,快照标签为时间版本为`"2018-05-04-17-11-01"`。 + + ```sql + RESTORE SNAPSHOT `restore_label1` + FROM `example_repo` + ON + ( + `backup_tbl` PARTITION (`p1`, `p2`), + `backup_tbl2` AS `new_tbl` + ) + PROPERTIES + ( + "backup_timestamp"="2022-04-08-15-55-43" + ); + ``` + +## 3. 查看恢复作业的执行情况 + + ```sql + mysql> SHOW RESTORE\G; + *************************** 1. 
row *************************** + JobId: 17891851 + Label: snapshot_label1 + Timestamp: 2022-04-08-15-52-29 + DbName: default_cluster:example_db1 + State: FINISHED + AllowLoad: false + ReplicationNum: 3 + RestoreObjs: { + "name": "snapshot_label1", + "database": "example_db", + "backup_time": 1649404349050, + "content": "ALL", + "olap_table_list": [ + { + "name": "backup_tbl", + "partition_names": [ + "p1", + "p2" + ] + } + ], + "view_list": [], + "odbc_table_list": [], + "odbc_resource_list": [] + } + CreateTime: 2022-04-08 15:59:01 + MetaPreparedTime: 2022-04-08 15:59:02 + SnapshotFinishedTime: 2022-04-08 15:59:05 + DownloadFinishedTime: 2022-04-08 15:59:12 + FinishedTime: 2022-04-08 15:59:18 + UnfinishedTasks: + Progress: + TaskErrMsg: + Status: [OK] + Timeout: 86400 + 1 row in set (0.01 sec) + ``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0.json b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0.json index 2994e1f59771b..adf9e0052ae50 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0.json +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0.json @@ -663,8 +663,12 @@ "message": "分层存储", "description": "The label for category Tiered Storage in sidebar docs" }, - "sidebar.docs.category.Business continuity & data recovery": { + "sidebar.docs.category.Business Continuity & Data Recovery": { "message": "业务连续性和数据恢复", - "description": "The label for category Business continuity & data recovery in sidebar docs" + "description": "The label for category Business Continuity & Data Recovery in sidebar docs" + }, + "sidebar.docs.category.Backup & Restore": { + "message": "备份与恢复", + "description": "The label for category Backup & Restore in sidebar docs" } } diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/backup-restore/backup.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/backup-restore/backup.md new file mode 100644 index 
0000000000000..b55abcf9a1b91 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/backup-restore/backup.md @@ -0,0 +1,265 @@ +--- +{ + "title": "备份", + "language": "zh-CN" +} +--- + + + +有关备份的概念,请参阅[备份与恢复](./overview.md)。本指南提供了创建 Repository 和备份数据的操作步骤。 + +## 1. 创建 Repository + + + +根据您的存储选择适当的语句来创建 Repository。有关详细用法,请参阅[创建 Repository ](../../../sql-manual/sql-statements/data-modification/backup-and-restore/CREATE-REPOSITORY.md)。在不同集群使用相同路径的 Repository 进行备份时,请确保使用不同的 Label,以避免冲突造成数据错乱。 + +### Option 1: 在 S3 上创建 Repository + +要在 S3 存储上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `s3_repo` +WITH S3 +ON LOCATION "s3://bucket_name/s3_repo" +PROPERTIES +( + "s3.endpoint" = "s3.us-east-1.amazonaws.com", + "s3.region" = "us-east-1", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` + +- 将 bucket_name 替换为您的 S3 存储桶名称。 +- 提供适当的 endpoint、access key、 secret key 和 region 以进行 S3 设置。 + +### Option 2: 在 Azure 上创建 Repository + +**2.1.8 以及 3.1.4 开始支持** +要在 Azure 存储上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `azure_repo` +WITH S3 +ON LOCATION "s3://bucket_name/azure_repo" +PROPERTIES +( + "s3.endpoint" = "selectdbcloudtestwestus3.blob.core.windows.net", + "s3.region" = "dummy_region", + "s3.access_key" = "ak", + "s3.secret_key" = "sk", + "provider" = "AZURE" +); +``` + +- 将 bucket_name 替换为您的 Azure 容器名称。 +- 提供您的 Azure 存储帐户和密钥以进行身份验证。 +- `s3.region` 只是一个虚假的 region,任意指定一个即可,但是必须要指定。 +- `provider` 必须为 `AZURE`。 + +### Option 3: 在 GCP 上创建 Repository + +要在 Google Cloud Platform (GCP) 存储上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `gcp_repo` +WITH S3 +ON LOCATION "s3://bucket_name/backup/gcp_repo" +PROPERTIES +( + "s3.endpoint" = "storage.googleapis.com", + "s3.region" = "US-WEST2", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` + +- 将 bucket_name 替换为您的 GCP 存储桶名称。 +- 提供您的 GCP endpoint、access key 和 secret key。 +- `s3.region` 只是一个虚假的 region,任意指定一个即可,但是必须要指定。 + +### Option 4: 在 
OSS(阿里云对象存储服务)上创建 Repository + +要在 OSS 上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `oss_repo` +WITH S3 +ON LOCATION "s3://bucket_name/oss_repo" +PROPERTIES +( + "s3.endpoint" = "oss.aliyuncs.com", + "s3.region" = "cn-hangzhou", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` +- 将 bucket_name 替换为您的 OSS 存储桶名称。 +- 提供您的 OSS endpoint、region、access key 和 secret key。 + +### Option 5: 在 MinIO 上创建 Repository + +要在 MinIO 存储上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `minio_repo` +WITH S3 +ON LOCATION "s3://bucket_name/minio_repo" +PROPERTIES +( + "s3.endpoint" = "yourminio.com", + "s3.region" = "dummy-region", + "s3.access_key" = "ak", + "s3.secret_key" = "sk", + "use_path_style" = "true" +); +``` + +- 将 bucket_name 替换为您的 MinIO 存储桶名称。 +- 提供您的 MinIO endpoint、access key和 secret key。 +- `s3.region` 只是一个虚假的 region,任意指定一个即可,但是必须要指定。 +- 如果您不启用 Virtual Host-style,则 'use_path_style' 必须为 true。 + +### Option 6: 在 HDFS 上创建 Repository + +要在 HDFS 存储上创建 Repository ,请使用以下 SQL 命令: + +```sql +CREATE REPOSITORY `hdfs_repo` +WITH hdfs +ON LOCATION "/prefix_path/hdfs_repo" +PROPERTIES +( + "fs.defaultFS" = "hdfs://127.0.0.1:9000", + "hadoop.username" = "doris-test" +) +``` + +- 将 prefix_path 替换为真实路径。 +- 提供您的 hdfs endpoint 和用户名。 + +## 2. 
备份 + +请参考以下语句以备份数据库、表或分区。有关详细用法,请参阅[备份](../../../sql-manual/sql-statements/data-modification/backup-and-restore/BACKUP.md)。 + +建议使用有意义的 Label 名称,例如包含备份中包含的数据库和表。 + +### Option 1: 备份当前数据库 + +以下 SQL 语句将当前数据库备份到名为 `example_repo` 的 Repository ,并使用快照 Label `exampledb_20241225`。 + +```sql +BACKUP SNAPSHOT exampledb_20241225 +TO example_repo; +``` + +### Option 2: 备份指定数据库 + +以下 SQL 语句将名为 destdb 的数据库备份到名为 `example_repo` 的 Repository ,并使用快照 Label `destdb_20241225`。 + +```sql +BACKUP SNAPSHOT destdb.`destdb_20241225` +TO example_repo; +``` + +### Option 3: 备份指定表 + +以下 SQL 语句将两个表备份到名为 `example_repo` 的 Repository ,并使用快照 Label `exampledb_tbl_tbl1_20241225`。 + +```sql +BACKUP SNAPSHOT exampledb_tbl_tbl1_20241225 +TO example_repo +ON (example_tbl, example_tbl1); +``` + +### Option 4: 备份指定分区 + +以下 SQL 语句将名为 `example_tbl2` 的表和名为 `p1` 和 `p2` 的两个分区备份到名为 `example_repo` 的 Repository ,并使用快照 Label `example_tbl_p1_p2_tbl1_20241225`。 + +```sql +BACKUP SNAPSHOT example_tbl_p1_p2_tbl1_20241225 +TO example_repo +ON +( + example_tbl PARTITION (p1,p2), + example_tbl2 +); +``` + +### Option 5: 备份当前数据库,排除某些表 + +以下 SQL 语句将当前数据库备份到名为 `example_repo` 的 Repository ,并使用快照 Label `exampledb_20241225`,排除两个名为 `example_tbl` 和 `example_tbl1` 的表。 + +```sql +BACKUP SNAPSHOT exampledb_20241225 +TO example_repo +EXCLUDE +( + example_tbl, + example_tbl1 +); +``` + +## 3. 查看最近备份作业的执行情况 + +以下 SQL 语句可用于查看最近备份作业的执行情况。 + +```sql +mysql> show BACKUP\G; +*************************** 1. row *************************** + JobId: 17891847 + SnapshotName: exampledb_20241225 + DbName: example_db + State: FINISHED + BackupObjs: [example_db.example_tbl] + CreateTime: 2022-04-08 15:52:29 + SnapshotFinishedTime: 2022-04-08 15:52:32 + UploadFinishedTime: 2022-04-08 15:52:38 + FinishedTime: 2022-04-08 15:52:44 + UnfinishedTasks: + Progress: + TaskErrMsg: + Status: [OK] + Timeout: 86400 + 1 row in set (0.01 sec) +``` + +## 4. 
查看 Repository 中的现有备份 + +以下 SQL 语句可用于查看名为 `example_repo` 的 Repository 中的现有备份,其中 Snapshot 列是快照 Label,Timestamp 是时间戳。 + +```sql +mysql> SHOW SNAPSHOT ON example_repo; ++-----------------+---------------------+--------+ +| Snapshot | Timestamp | Status | ++-----------------+---------------------+--------+ +| exampledb_20241225 | 2022-04-08-15-52-29 | OK | ++-----------------+---------------------+--------+ +1 row in set (0.15 sec) +``` + +## 5. 取消备份(如有需要) + +可以使用 `CANCEL BACKUP FROM db_name;` 取消一个数据库中的备份任务。更具体的用法可以参考[取消备份](../../../sql-manual/sql-statements/data-modification/backup-and-restore/CANCEL-BACKUP.md)。 + diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/backup-restore/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/backup-restore/overview.md new file mode 100644 index 0000000000000..458e5cace8d94 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/backup-restore/overview.md @@ -0,0 +1,88 @@ +--- +{ + "title": "备份和恢复概述", + "language": "zh-CN" +} +--- + + + +## 介绍 + +Doris 提供了备份和恢复操作支持。这些功能使用户能够将数据从库、表或者分区备份到远程存储系统,并在需要时进行恢复。 + +## 要求 + +- **管理员权限**:只有具有 **ADMIN** 权限的用户才能执行备份和恢复操作。 + +## 关键概念 + +**快照**: + 快照是数据库、表或分区中数据的时间点捕获,创建快照时需要指定一个快照 Label,快照完成时会生成一个时间戳,可以通过 Repository、快照 Label 和时间戳标识一个快照。 + +**Repository**: + 备份文件存储的远程存储位置,支持的远程存储包括 S3、Azure、GCP、OSS、COS、MinIO、HDFS 和其它兼容 S3 的对象存储。 + +**备份操作**: + 备份操作涉及创建数据库、表或分区的快照,将快照文件上传到远程 Repository,并存储与备份相关的元数据。 + +**恢复操作**: + 恢复操作涉及从远程 Repository 中备份并将其恢复到 Doris 集群。 + +## 关键特性 + +1. **备份数据**: + Doris 允许您通过创建快照来备份表、分区或整个数据库的数据。数据以文件格式备份并存储在 HDFS、S3 或其他兼容 S3 的远程存储系统上。 + +2. **恢复数据**: + 您可以从远程 Repository 恢复备份数据到任何 Doris 集群。这包括完整数据库恢复、完整表恢复和分区级恢复,允许灵活的数据恢复。 + +3. **快照管理**: + 数据以快照的形式备份。这些快照被上传到远程存储系统,并可以在需要时恢复。恢复过程涉及下载快照文件并将其映射到本地元数据以使其有效。 + +4. **数据迁移**: + 除了备份和恢复,此功能还支持在不同 Doris 集群之间的数据迁移。您可以将数据备份到远程存储系统并恢复到另一个 Doris 集群,帮助进行集群迁移场景。 + +5. 
**复制控制**: + 在恢复数据时,您可以指定恢复数据的副本数量,以确保冗余和容错。 + +## 限制 + +1. **存储与计算解耦**: + 存算分离模式不支持备份和恢复。 + +2. **不支持异步物化视图 (MTMV)**: + 不支持备份或恢复 **异步物化视图 (MTMV)**。在备份和恢复操作中,这些视图不被考虑。 + +3. **不支持具有存储策略的表**: + 使用了 [**存储策略**](../../../table-desgin/tiered-storage/remote-storage.md) 的表 **不支持** 备份和恢复操作。 + +4. **增量备份**: + 目前,Doris 仅支持全量备份。增量备份(仅存储自上次备份以来更改的数据)尚不支持,您可以可以备份特定分区来实现增量备份。 + +5. **colocate_with 属性**: + 在备份或恢复操作期间,Doris 不会保留表的 `colocate_with` 属性。这可能需要在恢复后重新配置共置表。 + +6. **动态分区支持**: + 恢复表之后,需要使用 `ALTER TABLE` 命令手动启用此属性。 + +7. **单并发**: + 一个数据库下同时只能运行一个备份或者恢复任务。 + diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/backup-restore/restore.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/backup-restore/restore.md new file mode 100644 index 0000000000000..0513de5b6285a --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/data-admin/backup-restore/restore.md @@ -0,0 +1,147 @@ +--- +{ + "title": "恢复", + "language": "zh-CN" +} +--- + + + +## 前提条件 + +1. 确保您拥有**管理员**权限以执行恢复操作。 +2. 确保您有一个有效的**备份**快照可供恢复,请参考[备份](backup.md)。 + +## 1. 获取快照的备份时间戳 + +以下SQL语句可用于查看名为`example_repo`的 Repository 中的现有备份。 + + ```sql + mysql> SHOW SNAPSHOT ON example_repo; + +-----------------+---------------------+--------+ + | Snapshot | Timestamp | Status | + +-----------------+---------------------+--------+ + | exampledb_20241225 | 2022-04-08-15-52-29 | OK | + +-----------------+---------------------+--------+ + 1 row in set (0.15 sec) + ``` + +## 2. 
从快照恢复 + +### Option 1:恢复快照到当前数据库 + +以下SQL语句从名为`example_repo`的 Repository 中恢复标签为 `restore_label1` 和时间戳为 `2022-04-08-15-52-29` 的快照到当前数据库。 + +```sql +RESTORE SNAPSHOT `restore_label1` +FROM `example_repo` +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 2:恢复快照到指定数据库 + +以下SQL语句从名为`example_repo`的 Repository 中恢复标签为 `restore_label1` 和时间戳为 `2022-04-08-15-52-29` 的快照到名为 `destdb` 的数据库。 + +```sql +RESTORE SNAPSHOT destdb.`restore_label1` +FROM `example_repo` +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 3:从快照恢复单个表 + +从`example_repo`中的快照恢复表`backup_tbl`到当前数据库,快照的标签为 `restore_label1`,时间戳为 `2022-04-08-15-52-29`。 + +```sql +RESTORE SNAPSHOT `restore_label1` +FROM `example_repo` +ON ( `backup_tbl` ) +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 4:从快照恢复分区和表 + +从`example_repo`中的备份快照`snapshot_2`恢复表`backup_tbl`的分区p1和p2,以及表`backup_tbl2`到当前数据库`example_db1`,并将其重命名为`new_tbl`,快照标签为时间版本为`"2018-05-04-17-11-01"`。 + + ```sql + RESTORE SNAPSHOT `restore_label1` + FROM `example_repo` + ON + ( + `backup_tbl` PARTITION (`p1`, `p2`), + `backup_tbl2` AS `new_tbl` + ) + PROPERTIES + ( + "backup_timestamp"="2022-04-08-15-55-43" + ); + ``` + +## 3. 查看恢复作业的执行情况 + + ```sql + mysql> SHOW RESTORE\G; + *************************** 1. 
row *************************** + JobId: 17891851 + Label: snapshot_label1 + Timestamp: 2022-04-08-15-52-29 + DbName: default_cluster:example_db1 + State: FINISHED + AllowLoad: false + ReplicationNum: 3 + RestoreObjs: { + "name": "snapshot_label1", + "database": "example_db", + "backup_time": 1649404349050, + "content": "ALL", + "olap_table_list": [ + { + "name": "backup_tbl", + "partition_names": [ + "p1", + "p2" + ] + } + ], + "view_list": [], + "odbc_table_list": [], + "odbc_resource_list": [] + } + CreateTime: 2022-04-08 15:59:01 + MetaPreparedTime: 2022-04-08 15:59:02 + SnapshotFinishedTime: 2022-04-08 15:59:05 + DownloadFinishedTime: 2022-04-08 15:59:12 + FinishedTime: 2022-04-08 15:59:18 + UnfinishedTasks: + Progress: + TaskErrMsg: + Status: [OK] + Timeout: 86400 + 1 row in set (0.01 sec) + ``` diff --git a/sidebars.json b/sidebars.json index 1696e27d814e5..814848ffeaaaf 100644 --- a/sidebars.json +++ b/sidebars.json @@ -497,11 +497,18 @@ }, { "type": "category", - "label": "Business continuity & data recovery", + "label": "Business Continuity & Data Recovery", "items": [ "admin-manual/data-admin/overview", - "admin-manual/data-admin/backup", - "admin-manual/data-admin/restore", + { + "type": "category", + "label": "Backup & Restore", + "items": [ + "admin-manual/data-admin/backup-restore/overview", + "admin-manual/data-admin/backup-restore/backup", + "admin-manual/data-admin/backup-restore/restore" + ] + }, { "type": "category", "label": "Cross Cluster Replication", diff --git a/versioned_docs/version-2.1/admin-manual/data-admin/backup-restore/backup.md b/versioned_docs/version-2.1/admin-manual/data-admin/backup-restore/backup.md new file mode 100644 index 0000000000000..11eecd5d45575 --- /dev/null +++ b/versioned_docs/version-2.1/admin-manual/data-admin/backup-restore/backup.md @@ -0,0 +1,265 @@ +--- +{ + "title": "Backup", + "language": "en" +} +--- + + + +For concepts related to backup, please refer to [Backup and Restore](./overview.md). 
This guide provides the steps to create a Repository and back up data. + +## 1. Create Repository + + + +Use the appropriate statement to create a Repository based on your storage choice. For detailed usage, please refer to [Create Repository](../../../sql-manual/sql-statements/data-modification/backup-and-restore/CREATE-REPOSITORY.md). When backing up using the same path for the Repository across different clusters, ensure to use different labels to avoid conflicts that may cause data confusion. + +### Option 1: Create Repository on S3 + +To create a Repository on S3 storage, use the following SQL command: + +```sql +CREATE REPOSITORY `s3_repo` +WITH S3 +ON LOCATION "s3://bucket_name/s3_repo" +PROPERTIES +( + "s3.endpoint" = "s3.us-east-1.amazonaws.com", + "s3.region" = "us-east-1", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` + +- Replace `bucket_name` with your S3 bucket name. +- Provide the appropriate endpoint, access key, secret key, and region for S3 setup. + +### Option 2: Create Repository on Azure + +**Azure is supported since 2.0.8 or 3.0.4.** + +To create a Repository on Azure storage, use the following SQL command: + +```sql +CREATE REPOSITORY `azure_repo` +WITH S3 +ON LOCATION "s3://bucket_name/azure_repo" +PROPERTIES +( + "s3.endpoint" = "selectdbcloudtestwestus3.blob.core.windows.net", + "s3.region" = "dummy_region", + "s3.access_key" = "ak", + "s3.secret_key" = "sk", + "provider" = "AZURE" +); +``` + +- Replace `bucket_name` with your Azure container name. +- Provide your Azure storage account and key for authentication. +- `s3.region` is a dummy but required field. +- The `provider` must be set to `AZURE` for Azure storage. 
+ +### Option 3: Create Repository on GCP + +To create a Repository on Google Cloud Platform (GCP) storage, use the following SQL command: + +```sql +CREATE REPOSITORY `gcp_repo` +WITH S3 +ON LOCATION "s3://bucket_name/backup/gcp_repo" +PROPERTIES +( + "s3.endpoint" = "storage.googleapis.com", + "s3.region" = "US-WEST2", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` + +- Replace `bucket_name` with your GCP bucket name. +- Provide your GCP endpoint, access key, and secret key. +- `s3.region` is a dummy but required field. + +### Option 4: Create Repository on OSS (Alibaba Cloud Object Storage Service) + +To create a Repository on OSS, use the following SQL command: + +```sql +CREATE REPOSITORY `oss_repo` +WITH S3 +ON LOCATION "s3://bucket_name/oss_repo" +PROPERTIES +( + "s3.endpoint" = "oss.aliyuncs.com", + "s3.region" = "cn-hangzhou", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` +- Replace `bucket_name` with your OSS bucket name. +- Provide your OSS endpoint, region, access key, and secret key. + +### Option 5: Create Repository on MinIO + +To create a Repository on MinIO storage, use the following SQL command: + +```sql +CREATE REPOSITORY `minio_repo` +WITH S3 +ON LOCATION "s3://bucket_name/minio_repo" +PROPERTIES +( + "s3.endpoint" = "yourminio.com", + "s3.region" = "dummy-region", + "s3.access_key" = "ak", + "s3.secret_key" = "sk", + "use_path_style" = "true" +); +``` + +- Replace `bucket_name` with your MinIO bucket name. +- Provide your MinIO endpoint, access key, and secret key. +- `s3.region` is a dummy but required field. +- If you do not enable Virtual Host-style, then `use_path_style` must be true. 
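
After running any of the `CREATE REPOSITORY` statements in this section, it can help to confirm that the Repository was registered before starting a backup. The following is a minimal sketch, assuming a MySQL client session connected to the Doris FE as a user with ADMIN privileges. The exact output columns may differ between versions.

```sql
-- List every Repository registered in the cluster.
-- A Repository created above (for example `minio_repo`) should appear in the result;
-- a non-empty error column usually points to wrong credentials or an unreachable endpoint.
SHOW REPOSITORIES;
```

If the Repository does not appear, or reports an error, recheck the endpoint, region, and keys used in the `PROPERTIES` clause.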
+ +### Option 6: Create Repository on HDFS + +To create a Repository on HDFS storage, use the following SQL command: + +```sql +CREATE REPOSITORY `hdfs_repo` +WITH hdfs +ON LOCATION "/prefix_path/hdfs_repo" +PROPERTIES +( + "fs.defaultFS" = "hdfs://127.0.0.1:9000", + "hadoop.username" = "doris-test" +); +``` + +- Replace `prefix_path` with the actual path. +- Set `fs.defaultFS` to your HDFS NameNode address and `hadoop.username` to your HDFS username. + +## 2. Backup + +Refer to the following statements to back up databases, tables, or partitions. For detailed usage, please refer to [Backup](../../../sql-manual/sql-statements/data-modification/backup-and-restore/BACKUP.md). + +Use meaningful label names, for example names that include the databases and tables covered by the backup. + +### Option 1: Backup Current Database + +The following SQL statement backs up the current database to a Repository named `example_repo`, using the snapshot label `exampledb_20241225`. + +```sql +BACKUP SNAPSHOT exampledb_20241225 +TO example_repo; +``` + +### Option 2: Backup Specified Database + +The following SQL statement backs up a database named `destdb` to a Repository named `example_repo`, using the snapshot label `destdb_20241225`. + +```sql +BACKUP SNAPSHOT destdb.`destdb_20241225` +TO example_repo; +``` + +### Option 3: Backup Specified Tables + +The following SQL statement backs up the tables `example_tbl` and `example_tbl1` to a Repository named `example_repo`, using the snapshot label `exampledb_tbl_tbl1_20241225`. + +```sql +BACKUP SNAPSHOT exampledb_tbl_tbl1_20241225 +TO example_repo +ON (example_tbl, example_tbl1); +``` + +### Option 4: Backup Specified Partitions + +The following SQL statement backs up the partitions `p1` and `p2` of the table `example_tbl`, together with the whole table `example_tbl2`, to a Repository named `example_repo`, using the snapshot label `example_tbl_p1_p2_tbl1_20241225`.
+ +```sql +BACKUP SNAPSHOT example_tbl_p1_p2_tbl1_20241225 +TO example_repo +ON +( + example_tbl PARTITION (p1,p2), + example_tbl2 +); +``` + +### Option 5: Backup Current Database Excluding Certain Tables + +The following SQL statement backs up the current database to a Repository named `example_repo`, using the snapshot label `exampledb_20241225`, excluding two tables named `example_tbl` and `example_tbl1`. + +```sql +BACKUP SNAPSHOT exampledb_20241225 +TO example_repo +EXCLUDE +( + example_tbl, + example_tbl1 +); +``` + +## 3. View Recent Backup Job Execution Status + +The following SQL statement can be used to view the execution status of recent backup jobs. + +```sql +mysql> show BACKUP\G; +*************************** 1. row *************************** + JobId: 17891847 + SnapshotName: exampledb_20241225 + DbName: example_db + State: FINISHED + BackupObjs: [example_db.example_tbl] + CreateTime: 2022-04-08 15:52:29 + SnapshotFinishedTime: 2022-04-08 15:52:32 + UploadFinishedTime: 2022-04-08 15:52:38 + FinishedTime: 2022-04-08 15:52:44 + UnfinishedTasks: + Progress: + TaskErrMsg: + Status: [OK] + Timeout: 86400 + 1 row in set (0.01 sec) +``` + +## 4. View Existing Backups in Repository + +The following SQL statement can be used to view existing backups in a Repository named `example_repo`, where the Snapshot column is the snapshot label, and the Timestamp is the timestamp. + +```sql +mysql> SHOW SNAPSHOT ON example_repo; ++-----------------+---------------------+--------+ +| Snapshot | Timestamp | Status | ++-----------------+---------------------+--------+ +| exampledb_20241225 | 2022-04-08-15-52-29 | OK | ++-----------------+---------------------+--------+ +1 row in set (0.15 sec) +``` + +## 5. Cancel Backup (if necessary) + +You can use `CANCEL BACKUP FROM db_name;` to cancel a backup task in a database. For more specific usage, refer to [Cancel Backup](../../../sql-manual/sql-statements/data-modification/backup-and-restore/CANCEL-BACKUP.md). 
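
The cancel step above can be sketched as a short sequence: first check whether a backup job is still running, then cancel it. This is an illustrative sketch only. `example_db` is an assumed database name, taken from the sample `SHOW BACKUP` output earlier in this guide.

```sql
-- Inspect the most recent backup job of the database and check its State column.
SHOW BACKUP FROM example_db;

-- Cancel the backup job that is currently running in that database.
CANCEL BACKUP FROM example_db;
```

Running `SHOW BACKUP FROM example_db;` again afterwards should show whether the job was cancelled.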
\ No newline at end of file diff --git a/versioned_docs/version-2.1/admin-manual/data-admin/backup-restore/overview.md b/versioned_docs/version-2.1/admin-manual/data-admin/backup-restore/overview.md new file mode 100644 index 0000000000000..e89ddc1c292eb --- /dev/null +++ b/versioned_docs/version-2.1/admin-manual/data-admin/backup-restore/overview.md @@ -0,0 +1,88 @@ +--- +{ + "title": "Backup and Restore Overview", + "language": "en" +} +--- + + + +## Introduction + +Doris provides support for backup and restore operations. These features allow users to back up data from databases, tables, or partitions to remote storage systems and restore it when needed. + +## Requirements + +- **Administrator Privileges**: Only users with **ADMIN** privileges can perform backup and restore operations. + +## Key Concepts + +**Snapshot**: + A snapshot is a point-in-time capture of data in a database, table, or partition. A snapshot label must be specified when creating a snapshot, and a timestamp is generated when the snapshot completes. A snapshot is identified by the combination of Repository, snapshot label, and timestamp. + +**Repository**: + The remote storage location where backup files are stored. Supported remote storage includes S3, Azure, GCP, OSS, COS, MinIO, HDFS, and other S3-compatible object storage. + +**Backup Operation**: + The backup operation involves creating a snapshot of a database, table, or partition, uploading the snapshot file to a remote Repository, and storing metadata related to the backup. + +**Restore Operation**: + The restore operation involves retrieving a backup from the remote Repository and restoring it to the Doris cluster. + +## Key Features + +1. **Backup Data**: + Doris allows you to back up data from tables, partitions, or entire databases by creating snapshots. Data is backed up in file format and stored in HDFS, S3, or other S3-compatible remote storage systems. + +2.
**Restore Data**: + You can restore backup data from the remote Repository to any Doris cluster. This includes full database restoration, full table restoration, and partition-level restoration, allowing for flexible data recovery. + +3. **Snapshot Management**: + Data is backed up in the form of snapshots. These snapshots are uploaded to remote storage systems and can be restored when needed. The restoration process involves downloading the snapshot files and mapping them to local metadata so that the restored data takes effect. + +4. **Data Migration**: + In addition to backup and restore, this feature also supports data migration between different Doris clusters. You can back up data to a remote storage system and restore it to another Doris cluster, which is useful for cluster migration. + +5. **Replication Control**: + When restoring data, you can specify the number of replicas for the restored data to ensure redundancy and fault tolerance. + +## Limitations + +1. **Decoupling of Storage and Computing**: + Backup and restore are not supported in the compute-storage decoupled mode. + +2. **Asynchronous Materialized Views (MTMV) Not Supported**: + Backup or restore of **asynchronous materialized views (MTMV)** is not supported. These views are not included in backup and restore operations. + +3. **Tables with Storage Policies Not Supported**: + Tables that use [**storage policies**](../../../table-desgin/tiered-storage/remote-storage.md) **do not support** backup and restore operations. + +4. **Incremental Backup**: + Currently, Doris only supports full backups. Incremental backups (storing only the data changed since the last backup) are not yet supported. As a workaround, you can back up specific partitions to approximate an incremental backup. + +5. **colocate_with Attribute**: + During backup or restore operations, Doris does not retain the `colocate_with` attribute of the table. You may need to reconfigure colocated tables after restoration. + +6.
**Dynamic Partition Support**: + After restoring a table, you need to manually re-enable its dynamic partition attribute using the `ALTER TABLE` command. + +7. **Single Concurrency**: + Only one backup or restore task can run at a time within a single database. + diff --git a/versioned_docs/version-2.1/admin-manual/data-admin/backup-restore/restore.md b/versioned_docs/version-2.1/admin-manual/data-admin/backup-restore/restore.md new file mode 100644 index 0000000000000..9e3bd5469e280 --- /dev/null +++ b/versioned_docs/version-2.1/admin-manual/data-admin/backup-restore/restore.md @@ -0,0 +1,148 @@ +--- +{ + "title": "Restore", + "language": "en" +} +--- + + + +## Prerequisites + +1. Ensure you have **administrator** privileges to perform the restore operation. +2. Ensure you have an existing **Repository** to store the backup. If not, follow the steps to create a Repository and perform a [backup](backup.md). +3. Ensure you have a valid **backup** snapshot available for restoration. + +## 1. Get the Backup Timestamp of the Snapshot + +The following SQL statement can be used to view existing backups in the Repository named `example_repo`. + + ```sql + mysql> SHOW SNAPSHOT ON example_repo; + +-----------------+---------------------+--------+ + | Snapshot | Timestamp | Status | + +-----------------+---------------------+--------+ + | exampledb_20241225 | 2022-04-08-15-52-29 | OK | + +-----------------+---------------------+--------+ + 1 row in set (0.15 sec) + ``` + +## 2. Restore from Snapshot + +### Option 1: Restore Snapshot to Current Database + +The following SQL statement restores the snapshot labeled `restore_label1` with the timestamp `2022-04-08-15-52-29` from the Repository named `example_repo` to the current database. 
+ +```sql +RESTORE SNAPSHOT `restore_label1` +FROM `example_repo` +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 2: Restore Snapshot to Specified Database + +The following SQL statement restores the snapshot labeled `restore_label1` with the timestamp `2022-04-08-15-52-29` from the Repository named `example_repo` to a database named `destdb`. + +```sql +RESTORE SNAPSHOT destdb.`restore_label1` +FROM `example_repo` +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 3: Restore a Single Table from Snapshot + +The following SQL statement restores the table `backup_tbl` from the snapshot labeled `restore_label1` with the timestamp `2022-04-08-15-52-29` in the Repository named `example_repo` to the current database. + +```sql +RESTORE SNAPSHOT `restore_label1` +FROM `example_repo` +ON ( `backup_tbl` ) +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 4: Restore Partitions and Tables from Snapshot + +The following SQL statement restores partitions `p1` and `p2` of the table `backup_tbl`, as well as the table `backup_tbl2` (renamed to `new_tbl`), to the current database `example_db1`, from the snapshot labeled `restore_label1` with the timestamp `2022-04-08-15-55-43` in the Repository named `example_repo`. + + ```sql + RESTORE SNAPSHOT `restore_label1` + FROM `example_repo` + ON + ( + `backup_tbl` PARTITION (`p1`, `p2`), + `backup_tbl2` AS `new_tbl` + ) + PROPERTIES + ( + "backup_timestamp"="2022-04-08-15-55-43" + ); + ``` + +## 3. Check the Execution Status of the Restore Job + + ```sql + mysql> SHOW RESTORE\G; + *************************** 1. 
row *************************** + JobId: 17891851 + Label: snapshot_label1 + Timestamp: 2022-04-08-15-52-29 + DbName: default_cluster:example_db1 + State: FINISHED + AllowLoad: false + ReplicationNum: 3 + RestoreObjs: { + "name": "snapshot_label1", + "database": "example_db", + "backup_time": 1649404349050, + "content": "ALL", + "olap_table_list": [ + { + "name": "backup_tbl", + "partition_names": [ + "p1", + "p2" + ] + } + ], + "view_list": [], + "odbc_table_list": [], + "odbc_resource_list": [] + } + CreateTime: 2022-04-08 15:59:01 + MetaPreparedTime: 2022-04-08 15:59:02 + SnapshotFinishedTime: 2022-04-08 15:59:05 + DownloadFinishedTime: 2022-04-08 15:59:12 + FinishedTime: 2022-04-08 15:59:18 + UnfinishedTasks: + Progress: + TaskErrMsg: + Status: [OK] + Timeout: 86400 + 1 row in set (0.01 sec) + ``` diff --git a/versioned_docs/version-3.0/admin-manual/data-admin/backup-restore/backup.md b/versioned_docs/version-3.0/admin-manual/data-admin/backup-restore/backup.md new file mode 100644 index 0000000000000..11eecd5d45575 --- /dev/null +++ b/versioned_docs/version-3.0/admin-manual/data-admin/backup-restore/backup.md @@ -0,0 +1,265 @@ +--- +{ + "title": "Backup", + "language": "en" +} +--- + + + +For concepts related to backup, please refer to [Backup and Restore](./overview.md). This guide provides the steps to create a Repository and back up data. + +## 1. Create Repository + + + +Use the appropriate statement to create a Repository based on your storage choice. For detailed usage, please refer to [Create Repository](../../../sql-manual/sql-statements/data-modification/backup-and-restore/CREATE-REPOSITORY.md). When different clusters back up to the same Repository path, be sure to use different labels to avoid conflicts that could mix up the data. 
+ +### Option 1: Create Repository on S3 + +To create a Repository on S3 storage, use the following SQL command: + +```sql +CREATE REPOSITORY `s3_repo` +WITH S3 +ON LOCATION "s3://bucket_name/s3_repo" +PROPERTIES +( + "s3.endpoint" = "s3.us-east-1.amazonaws.com", + "s3.region" = "us-east-1", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` + +- Replace `bucket_name` with your S3 bucket name. +- Provide the appropriate endpoint, access key, secret key, and region for S3 setup. + +### Option 2: Create Repository on Azure + +**Azure is supported since 2.0.8 or 3.0.4.** + +To create a Repository on Azure storage, use the following SQL command: + +```sql +CREATE REPOSITORY `azure_repo` +WITH S3 +ON LOCATION "s3://bucket_name/azure_repo" +PROPERTIES +( + "s3.endpoint" = "selectdbcloudtestwestus3.blob.core.windows.net", + "s3.region" = "dummy_region", + "s3.access_key" = "ak", + "s3.secret_key" = "sk", + "provider" = "AZURE" +); +``` + +- Replace `bucket_name` with your Azure container name. +- Provide your Azure storage account and key for authentication. +- `s3.region` is a dummy but required field. +- The `provider` must be set to `AZURE` for Azure storage. + +### Option 3: Create Repository on GCP + +To create a Repository on Google Cloud Platform (GCP) storage, use the following SQL command: + +```sql +CREATE REPOSITORY `gcp_repo` +WITH S3 +ON LOCATION "s3://bucket_name/backup/gcp_repo" +PROPERTIES +( + "s3.endpoint" = "storage.googleapis.com", + "s3.region" = "US-WEST2", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` + +- Replace `bucket_name` with your GCP bucket name. +- Provide your GCP endpoint, access key, and secret key. +- `s3.region` is a dummy but required field. 
+ +### Option 4: Create Repository on OSS (Alibaba Cloud Object Storage Service) + +To create a Repository on OSS, use the following SQL command: + +```sql +CREATE REPOSITORY `oss_repo` +WITH S3 +ON LOCATION "s3://bucket_name/oss_repo" +PROPERTIES +( + "s3.endpoint" = "oss.aliyuncs.com", + "s3.region" = "cn-hangzhou", + "s3.access_key" = "ak", + "s3.secret_key" = "sk" +); +``` +- Replace `bucket_name` with your OSS bucket name. +- Provide your OSS endpoint, region, access key, and secret key. + +### Option 5: Create Repository on MinIO + +To create a Repository on MinIO storage, use the following SQL command: + +```sql +CREATE REPOSITORY `minio_repo` +WITH S3 +ON LOCATION "s3://bucket_name/minio_repo" +PROPERTIES +( + "s3.endpoint" = "yourminio.com", + "s3.region" = "dummy-region", + "s3.access_key" = "ak", + "s3.secret_key" = "sk", + "use_path_style" = "true" +); +``` + +- Replace `bucket_name` with your MinIO bucket name. +- Provide your MinIO endpoint, access key, and secret key. +- `s3.region` is a dummy but required field. +- If virtual-hosted-style access is not enabled, `use_path_style` must be set to `true`. + +### Option 6: Create Repository on HDFS + +To create a Repository on HDFS storage, use the following SQL command: + +```sql +CREATE REPOSITORY `hdfs_repo` +WITH hdfs +ON LOCATION "/prefix_path/hdfs_repo" +PROPERTIES +( + "fs.defaultFS" = "hdfs://127.0.0.1:9000", + "hadoop.username" = "doris-test" +); +``` + +- Replace `prefix_path` with the actual path. +- Provide your HDFS endpoint and username. + +## 2. Backup + +Refer to the following statements to back up databases, tables, or partitions. For detailed usage, please refer to [Backup](../../../sql-manual/sql-statements/data-modification/backup-and-restore/BACKUP.md). + +It is recommended to use meaningful label names, for example names that include the database and tables covered by the backup.
+ +### Option 1: Backup Current Database + +The following SQL statement backs up the current database to a Repository named `example_repo`, using the snapshot label `exampledb_20241225`. + +```sql +BACKUP SNAPSHOT exampledb_20241225 +TO example_repo; +``` + +### Option 2: Backup Specified Database + +The following SQL statement backs up a database named `destdb` to a Repository named `example_repo`, using the snapshot label `destdb_20241225`. + +```sql +BACKUP SNAPSHOT destdb.`destdb_20241225` +TO example_repo; +``` + +### Option 3: Backup Specified Tables + +The following SQL statement backs up two tables named `example_tbl` and `example_tbl1` to a Repository named `example_repo`, using the snapshot label `exampledb_tbl_tbl1_20241225`. + +```sql +BACKUP SNAPSHOT exampledb_tbl_tbl1_20241225 +TO example_repo +ON (example_tbl, example_tbl1); +``` + +### Option 4: Backup Specified Partitions + +The following SQL statement backs up partitions `p1` and `p2` of the table `example_tbl`, together with the entire table `example_tbl2`, to a Repository named `example_repo`, using the snapshot label `example_tbl_p1_p2_tbl1_20241225`. + +```sql +BACKUP SNAPSHOT example_tbl_p1_p2_tbl1_20241225 +TO example_repo +ON +( + example_tbl PARTITION (p1,p2), + example_tbl2 +); +``` + +### Option 5: Backup Current Database Excluding Certain Tables + +The following SQL statement backs up the current database to a Repository named `example_repo`, using the snapshot label `exampledb_20241225`, excluding two tables named `example_tbl` and `example_tbl1`. + +```sql +BACKUP SNAPSHOT exampledb_20241225 +TO example_repo +EXCLUDE +( + example_tbl, + example_tbl1 +); +``` + +## 3. View Recent Backup Job Execution Status + +The following SQL statement can be used to view the execution status of recent backup jobs. + +```sql +mysql> SHOW BACKUP\G; +*************************** 1. 
row *************************** + JobId: 17891847 + SnapshotName: exampledb_20241225 + DbName: example_db + State: FINISHED + BackupObjs: [example_db.example_tbl] + CreateTime: 2022-04-08 15:52:29 + SnapshotFinishedTime: 2022-04-08 15:52:32 + UploadFinishedTime: 2022-04-08 15:52:38 + FinishedTime: 2022-04-08 15:52:44 + UnfinishedTasks: + Progress: + TaskErrMsg: + Status: [OK] + Timeout: 86400 + 1 row in set (0.01 sec) +``` + +## 4. View Existing Backups in Repository + +The following SQL statement can be used to view existing backups in a Repository named `example_repo`, where the Snapshot column is the snapshot label, and the Timestamp is the timestamp. + +```sql +mysql> SHOW SNAPSHOT ON example_repo; ++-----------------+---------------------+--------+ +| Snapshot | Timestamp | Status | ++-----------------+---------------------+--------+ +| exampledb_20241225 | 2022-04-08-15-52-29 | OK | ++-----------------+---------------------+--------+ +1 row in set (0.15 sec) +``` + +## 5. Cancel Backup (if necessary) + +You can use `CANCEL BACKUP FROM db_name;` to cancel a backup task in a database. For more specific usage, refer to [Cancel Backup](../../../sql-manual/sql-statements/data-modification/backup-and-restore/CANCEL-BACKUP.md). \ No newline at end of file diff --git a/versioned_docs/version-3.0/admin-manual/data-admin/backup-restore/overview.md b/versioned_docs/version-3.0/admin-manual/data-admin/backup-restore/overview.md new file mode 100644 index 0000000000000..e89ddc1c292eb --- /dev/null +++ b/versioned_docs/version-3.0/admin-manual/data-admin/backup-restore/overview.md @@ -0,0 +1,88 @@ +--- +{ + "title": "Backup and Restore Overview", + "language": "en" +} +--- + + + +## Introduction + +Doris provides support for backup and restore operations. These features allow users to back up data from databases, tables, or partitions to remote storage systems and restore it when needed. 
+ +## Requirements + +- **Administrator Privileges**: Only users with **ADMIN** privileges can perform backup and restore operations. + +## Key Concepts + +**Snapshot**: + A snapshot is a point-in-time capture of the data in a database, table, or partition. A snapshot label must be specified when creating a snapshot, and a timestamp is generated when the snapshot completes. Together, the Repository, snapshot label, and timestamp identify a snapshot. + +**Repository**: + The remote storage location where backup files are stored. Supported remote storage includes S3, Azure, GCP, OSS, COS, MinIO, HDFS, and other S3-compatible object storage. + +**Backup Operation**: + The backup operation involves creating a snapshot of a database, table, or partition, uploading the snapshot file to a remote Repository, and storing metadata related to the backup. + +**Restore Operation**: + The restore operation involves retrieving a backup from the remote Repository and restoring it to the Doris cluster. + +## Key Features + +1. **Backup Data**: + Doris allows you to back up data from tables, partitions, or entire databases by creating snapshots. Data is backed up in file format and stored in HDFS, S3, or other S3-compatible remote storage systems. + +2. **Restore Data**: + You can restore backup data from the remote Repository to any Doris cluster. This includes full database restoration, full table restoration, and partition-level restoration, allowing for flexible data recovery. + +3. **Snapshot Management**: + Data is backed up in the form of snapshots. These snapshots are uploaded to remote storage systems and can be restored when needed. The restoration process involves downloading the snapshot file and mapping it to local metadata so that the restored data takes effect. + +4. **Data Migration**: + In addition to backup and restore, this feature also supports data migration between different Doris clusters. 
You can back up data to a remote storage system and restore it to another Doris cluster, facilitating cluster migration scenarios. + +5. **Replication Control**: + When restoring data, you can specify the number of replicas for the restored data to ensure redundancy and fault tolerance. + +## Limitations + +1. **Decoupling of Storage and Computing**: + The storage-computing separation model does not support backup and restore. + +2. **Asynchronous Materialized Views (MTMV) Not Supported**: + Backup or restore of **asynchronous materialized views (MTMV)** is not supported. These views are not considered in backup and restore operations. + +3. **Tables with Storage Policies Not Supported**: + Tables that use [**storage policies**](../../../table-desgin/tiered-storage/remote-storage.md) **do not support** backup and restore operations. + +4. **Incremental Backup**: + Currently, Doris only supports full backups. Incremental backups (storing only the data changed since the last backup) are not supported. As a workaround, you can back up only specific partitions to achieve a similar effect. + +5. **colocate_with Attribute**: + During backup or restore operations, Doris does not retain the `colocate_with` attribute of the table. For colocated tables, you may need to reconfigure this attribute after restoration. + +6. **Dynamic Partition Support**: + After restoring a table, you need to manually re-enable its dynamic partition attribute using the `ALTER TABLE` command. + +7. **Single Concurrency**: + Only one backup or restore task can run at a time within a single database. + diff --git a/versioned_docs/version-3.0/admin-manual/data-admin/backup-restore/restore.md b/versioned_docs/version-3.0/admin-manual/data-admin/backup-restore/restore.md new file mode 100644 index 0000000000000..9e3bd5469e280 --- /dev/null +++ b/versioned_docs/version-3.0/admin-manual/data-admin/backup-restore/restore.md @@ -0,0 +1,148 @@ +--- +{ + "title": "Restore", + "language": "en" +} +--- + + + +## Prerequisites + +1. 
Ensure you have **administrator** privileges to perform the restore operation. +2. Ensure you have an existing **Repository** to store the backup. If not, follow the steps to create a Repository and perform a [backup](backup.md). +3. Ensure you have a valid **backup** snapshot available for restoration. + +## 1. Get the Backup Timestamp of the Snapshot + +The following SQL statement can be used to view existing backups in the Repository named `example_repo`. + + ```sql + mysql> SHOW SNAPSHOT ON example_repo; + +-----------------+---------------------+--------+ + | Snapshot | Timestamp | Status | + +-----------------+---------------------+--------+ + | exampledb_20241225 | 2022-04-08-15-52-29 | OK | + +-----------------+---------------------+--------+ + 1 row in set (0.15 sec) + ``` + +## 2. Restore from Snapshot + +### Option 1: Restore Snapshot to Current Database + +The following SQL statement restores the snapshot labeled `restore_label1` with the timestamp `2022-04-08-15-52-29` from the Repository named `example_repo` to the current database. + +```sql +RESTORE SNAPSHOT `restore_label1` +FROM `example_repo` +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 2: Restore Snapshot to Specified Database + +The following SQL statement restores the snapshot labeled `restore_label1` with the timestamp `2022-04-08-15-52-29` from the Repository named `example_repo` to a database named `destdb`. + +```sql +RESTORE SNAPSHOT destdb.`restore_label1` +FROM `example_repo` +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 3: Restore a Single Table from Snapshot + +Restore the table `backup_tbl` from the snapshot in `example_repo` to the current database, with the snapshot labeled `restore_label1` and timestamp `2022-04-08-15-52-29`. 
+ +```sql +RESTORE SNAPSHOT `restore_label1` +FROM `example_repo` +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 2: Restore Snapshot to Specified Database + +The following SQL statement restores the snapshot labeled `restore_label1` with the timestamp `2022-04-08-15-52-29` from the Repository named `example_repo` to a database named `destdb`. + +```sql +RESTORE SNAPSHOT destdb.`restore_label1` +FROM `example_repo` +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 3: Restore a Single Table from Snapshot + +The following SQL statement restores the table `backup_tbl` from the snapshot labeled `restore_label1` with the timestamp `2022-04-08-15-52-29` in the Repository named `example_repo` to the current database. + +```sql +RESTORE SNAPSHOT `restore_label1` +FROM `example_repo` +ON ( `backup_tbl` ) +PROPERTIES +( + "backup_timestamp"="2022-04-08-15-52-29" +); +``` + +### Option 4: Restore Partitions and Tables from Snapshot + +The following SQL statement restores partitions `p1` and `p2` of the table `backup_tbl`, as well as the table `backup_tbl2` (renamed to `new_tbl`), to the current database `example_db1`, from the snapshot labeled `restore_label1` with the timestamp `2022-04-08-15-55-43` in the Repository named `example_repo`. + + ```sql + RESTORE SNAPSHOT `restore_label1` + FROM `example_repo` + ON + ( + `backup_tbl` PARTITION (`p1`, `p2`), + `backup_tbl2` AS `new_tbl` + ) + PROPERTIES + ( + "backup_timestamp"="2022-04-08-15-55-43" + ); + ``` + +## 3. Check the Execution Status of the Restore Job + + ```sql + mysql> SHOW RESTORE\G; + *************************** 1. row *************************** + JobId: 17891851 + Label: snapshot_label1 + Timestamp: 2022-04-08-15-52-29 + DbName: default_cluster:example_db1 + State: FINISHED + AllowLoad: false + ReplicationNum: 3 + RestoreObjs: { + "name": "snapshot_label1", + "database": "example_db", + "backup_time": 1649404349050, + "content": "ALL", + "olap_table_list": [ + { + "name": "backup_tbl", + "partition_names": [ + "p1", + "p2" + ] + } + ], + "view_list": [], + "odbc_table_list": [], + "odbc_resource_list": [] + } + CreateTime: 2022-04-08 15:59:01 + MetaPreparedTime: 2022-04-08 15:59:02 + SnapshotFinishedTime: 2022-04-08 15:59:05 + DownloadFinishedTime: 2022-04-08 15:59:12 + FinishedTime: 2022-04-08 15:59:18 + UnfinishedTasks: + Progress: + TaskErrMsg: + Status: [OK] + Timeout: 86400 + 1 row in set (0.01 sec) + ``` diff --git a/versioned_sidebars/version-2.1-sidebars.json b/versioned_sidebars/version-2.1-sidebars.json index 9d0e11d4421ab..94209cc3b409d 100644 --- a/versioned_sidebars/version-2.1-sidebars.json +++ b/versioned_sidebars/version-2.1-sidebars.json @@ -461,11 +461,18 @@ }, { "type": "category", - "label": "Business continuity & data recovery", + "label": "Business 
Continuity & Data Recovery", "items": [ "admin-manual/data-admin/overview", - "admin-manual/data-admin/backup", - "admin-manual/data-admin/restore", + { + "type": "category", + "label": "Backup & Restore", + "items": [ + "admin-manual/data-admin/backup-restore/overview", + "admin-manual/data-admin/backup-restore/backup", + "admin-manual/data-admin/backup-restore/restore" + ] + }, "admin-manual/data-admin/ccr", "admin-manual/data-admin/recyclebin" ] diff --git a/versioned_sidebars/version-3.0-sidebars.json b/versioned_sidebars/version-3.0-sidebars.json index 78a3461ee47dd..3ed3a38bbbcf2 100644 --- a/versioned_sidebars/version-3.0-sidebars.json +++ b/versioned_sidebars/version-3.0-sidebars.json @@ -493,11 +493,18 @@ }, { "type": "category", - "label": "Business continuity & data recovery", + "label": "Business Continuity & Data Recovery", "items": [ "admin-manual/data-admin/overview", - "admin-manual/data-admin/backup", - "admin-manual/data-admin/restore", + { + "type": "category", + "label": "Backup & Restore", + "items": [ + "admin-manual/data-admin/backup-restore/overview", + "admin-manual/data-admin/backup-restore/backup", + "admin-manual/data-admin/backup-restore/restore" + ] + }, "admin-manual/data-admin/ccr", "admin-manual/data-admin/recyclebin" ]