[opt](backup) optimize backup doc
Yongqiang YANG committed Dec 24, 2024
1 parent a159479 commit 27b43fc
Showing 4 changed files with 483 additions and 0 deletions.
---
{
"title": "Backup",
"language": "en"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

Doris can back up the current data as files to a remote storage system. You can then restore the data from that remote storage system to any Doris cluster with the restore command. This makes it possible to take periodic snapshot backups of your data and to migrate data between clusters.

This feature requires Doris 0.8.2 or later.

## Permission Requirements

- Backup and restore operations can currently be performed only by users with ADMIN privileges.


## 1. Create a repository

Create a repository by following [Creating a Repository](create-repository.md).

## 2. Back up tables or a database

### Option 1: Back up the table example_tbl in example_db

```sql
BACKUP SNAPSHOT example_db.snapshot_label1
TO example_repo
ON (example_tbl)
PROPERTIES ("type" = "full");
```
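The `type` property currently accepts only `full`, since Doris supports full backups only. A job timeout can also be set with the `timeout` property, in seconds; the default is 86400 seconds (one day), which matches the `Timeout` value reported by `SHOW BACKUP`. A sketch with a hypothetical label `snapshot_label3` and a two-hour timeout:

```sql
-- snapshot_label3 is a hypothetical label; "timeout" is expressed in seconds.
BACKUP SNAPSHOT example_db.snapshot_label3
TO example_repo
ON (example_tbl)
PROPERTIES
(
"type" = "full",
"timeout" = "7200"
);
```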

### Option 2: Back up partitions p1 and p2 of example_tbl, plus the table example_tbl2, in example_db

```sql
BACKUP SNAPSHOT example_db.snapshot_label2
TO example_repo
ON
(
example_tbl PARTITION (p1,p2),
example_tbl2
);
```
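If the `ON` clause is omitted, every table in the database is backed up; recent Doris versions also accept an `EXCLUDE` clause to skip specific tables (check the BACKUP reference for your version). Only one backup or restore job can run in a database at a time, so run statements like these one after another. The labels `snapshot_label4` and `snapshot_label5` are hypothetical:

```sql
-- Back up every table in example_db (no ON clause).
BACKUP SNAPSHOT example_db.snapshot_label4
TO example_repo;

-- Back up every table in example_db except example_tbl.
BACKUP SNAPSHOT example_db.snapshot_label5
TO example_repo
EXCLUDE (example_tbl);
```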

## 3. View the execution of the most recent backup job

```sql
mysql> SHOW BACKUP\G
*************************** 1. row ***************************
JobId: 17891847
SnapshotName: snapshot_label1
DbName: example_db
State: FINISHED
BackupObjs: [default_cluster:example_db.example_tbl]
CreateTime: 2022-04-08 15:52:29
SnapshotFinishedTime: 2022-04-08 15:52:32
UploadFinishedTime: 2022-04-08 15:52:38
FinishedTime: 2022-04-08 15:52:44
UnfinishedTasks:
Progress:
TaskErrMsg:
Status: [OK]
Timeout: 86400
1 row in set (0.01 sec)
```
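Without a `FROM` clause, `SHOW BACKUP` reports on the current database. A `FROM` clause targets a specific database, and a job that is still running can be cancelled with `CANCEL BACKUP`. A sketch using the `example_db` database from the examples above:

```sql
-- Show the most recent backup job of example_db, regardless of the current database.
SHOW BACKUP FROM example_db;

-- Cancel the backup job that is currently running in example_db.
CANCEL BACKUP FROM example_db;
```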

## 4. View existing backups in remote repositories

```sql
mysql> SHOW SNAPSHOT ON example_repo WHERE SNAPSHOT = "snapshot_label1";
+-----------------+---------------------+--------+
| Snapshot | Timestamp | Status |
+-----------------+---------------------+--------+
| snapshot_label1 | 2022-04-08-15-52-29 | OK |
+-----------------+---------------------+--------+
1 row in set (0.15 sec)
```
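Adding a `TIMESTAMP` condition returns more detail about one specific snapshot. The `Timestamp` value shown above is also the value that the `backup_timestamp` property of `RESTORE` expects, so it is worth recording:

```sql
-- Show the details of one snapshot; the timestamp comes from the previous output.
SHOW SNAPSHOT ON example_repo
WHERE SNAPSHOT = "snapshot_label1" AND TIMESTAMP = "2022-04-08-15-52-29";
```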


## More Help

For more detailed syntax and best practices, refer to the [BACKUP](../../sql-manual/sql-statements/data-modification/backup-and-restore/BACKUP.md) command manual.
---
{
"title": "Creating a Repository",
"language": "en"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

## Overview

In Apache Doris, a **repository** is a remote storage location used for backing up and restoring data. Repositories support various storage systems, including **S3**, **Azure**, **GCP**, **OSS**, **COS**, **MinIO**, **HDFS**, and other S3-compatible storage. This guide walks you through creating a repository for backup and restore operations in Doris.

## Permission Requirements

- Only users with **ADMIN** privileges are allowed to create repositories for backup and restore operations.

## Supported Storage Systems

- **S3**
- **Azure**
- **GCP**
- **OSS**
- **COS**
- **MinIO**
- **HDFS**
- Other S3-compatible storage

## Creating a Repository for S3

To create a repository for S3 storage, use the following SQL command:

```sql
CREATE REPOSITORY `s3_repo`
WITH S3
ON LOCATION "s3://bucket_name/test"
PROPERTIES
(
"AWS_ENDPOINT" = "http://xxxx.xxxx.com",
"AWS_ACCESS_KEY" = "xxxx",
"AWS_SECRET_KEY" = "xxx",
"AWS_REGION" = "xxx"
);
```

- Replace bucket_name with the name of your S3 bucket.
- Provide the appropriate endpoint, access key, secret key, and region for your S3 setup.

## Creating a Repository for Azure

To create a repository for Azure storage, use the following SQL command:

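```sql
CREATE REPOSITORY `azure_repo`
WITH S3
ON LOCATION "s3://bucket_name/azure_repo"
PROPERTIES
(
"s3.endpoint" = "your_storage_account.blob.core.windows.net",
"s3.region" = "dummy_region",
"s3.access_key" = "your_storage_account",
"s3.secret_key" = "your_storage_key",
"provider" = "AZURE"
);
```

- Replace bucket_name with the name of your Azure container.
- Provide your Azure storage account name and account key as the access key and secret key.
- Set `provider` to `AZURE`. Azure has no S3-style region, so `s3.region` is shown with a placeholder value.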

## Creating a Repository for GCP

To create a repository for Google Cloud Platform (GCP) storage, use the following SQL command:

```sql
CREATE REPOSITORY `gcp_repo`
WITH S3
ON LOCATION "s3://bucket_name/gcp_repo"
PROPERTIES
(
"s3.endpoint" = "storage.googleapis.com",
"s3.region" = "your_region",
"s3.access_key" = "your_access_key",
"s3.secret_key" = "your_secret_key"
);
```

- Replace bucket_name with the name of your GCS bucket.
- Doris reaches GCS through its S3-compatible interoperability endpoint, so provide an HMAC access key and secret along with the bucket's region.

## Creating a Repository for OSS (Alibaba Cloud Object Storage Service)

To create a repository for OSS, use the following SQL command:

```sql
CREATE REPOSITORY `oss_repo`
WITH S3
ON LOCATION "s3://bucket_name/oss_repo"
PROPERTIES
(
"s3.endpoint" = "your_oss_endpoint",
"s3.region" = "your_oss_region",
"s3.access_key" = "your_access_key",
"s3.secret_key" = "your_secret_key"
);
```

- Replace bucket_name with the name of your OSS bucket.
- Provide your OSS endpoint (for example, `oss-cn-beijing.aliyuncs.com`), region, access key, and secret key.
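## Creating a Repository for COS (Tencent Cloud Object Storage)

COS is listed as supported above but has no example. The following is a sketch that assumes COS is reached through Doris's `WITH S3` support with `s3.*` properties; `cos_repo`, the endpoint, and the region are placeholders to replace with your own values:

```sql
-- Assumption: COS accessed through the S3 protocol; all values are placeholders.
CREATE REPOSITORY `cos_repo`
WITH S3
ON LOCATION "s3://bucket_name/cos_repo"
PROPERTIES
(
"s3.endpoint" = "cos.ap-guangzhou.myqcloud.com",
"s3.region" = "ap-guangzhou",
"s3.access_key" = "your_access_key",
"s3.secret_key" = "your_secret_key"
);
```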

## Creating a Repository for MinIO

To create a repository for MinIO storage, use the following SQL command:

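```sql
CREATE REPOSITORY `minio_repo`
WITH S3
ON LOCATION "s3://bucket_name/minio_repo"
PROPERTIES
(
"s3.endpoint" = "your_minio_endpoint",
"s3.region" = "your_region",
"s3.access_key" = "your_access_key",
"s3.secret_key" = "your_secret_key",
"use_path_style" = "true"
);
```

- Replace bucket_name with the name of your MinIO bucket.
- Provide your MinIO endpoint (host and port), access key, and secret key.
- MinIO deployments are commonly addressed with path-style URLs, so `use_path_style` is set to `true`.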

## Creating a Repository for HDFS

To create a repository for HDFS, use the following SQL command:

```sql
CREATE REPOSITORY `hdfs_repo`
WITH HDFS
ON LOCATION "hdfs://namenode_host:port/path"
PROPERTIES
(
"fs.defaultFS" = "hdfs://namenode_host:port",
"hadoop.username" = "hadoop_user"
);
```

For more detailed usage instructions and examples, refer to the CREATE REPOSITORY documentation.
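After creating a repository, you can confirm that it is registered with `SHOW REPOSITORIES` and remove it with `DROP REPOSITORY` when it is no longer needed. Dropping a repository only removes its mapping in Doris; the backup files in remote storage are not deleted. A sketch using the `s3_repo` name from the S3 example:

```sql
-- List all repositories registered in this cluster.
SHOW REPOSITORIES;

-- Remove the mapping for s3_repo; the remote backup data is left in place.
DROP REPOSITORY `s3_repo`;
```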
---
{
"title": "Backup and Restore Overview",
"language": "en"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

## Introduction

Apache Doris provides robust support for backup and restore operations. These features enable users to back up data from tables or entire databases to remote storage systems and restore it as needed. The system supports snapshot-based backups, which capture the state of the data at a particular point in time, and these snapshots can be stored in remote repositories like HDFS, S3, and MinIO.

Backup and restore operations are crucial for disaster recovery, data migration between clusters, and ensuring data integrity over time.

## Requirements

- **ADMIN Privileges**: Only users with **ADMIN** privileges are authorized to perform backup and restore operations. This ensures secure handling of sensitive data and prevents unauthorized access to backup processes.

- Doris version 0.8.2 or higher.

## Key Concepts

1. **Snapshot**:
A snapshot is a point-in-time capture of the data in a table or partition. It is an efficient operation, as it only creates a hard link to the existing data files.

2. **Repository**:
A remote storage location where backup files are stored. Supported repositories include HDFS, S3, MinIO, and other object storage systems.

3. **Backup Operation**:
A backup operation involves creating a snapshot of a table or partition, uploading the snapshot files to a remote repository, and storing the metadata related to the backup.

4. **Restore Operation**:
A restore operation involves downloading the backup from a remote repository and restoring it to a Doris cluster.
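A minimal restore sketch, reusing the hypothetical names from the backup guide (`example_db`, `example_repo`, `snapshot_label1`, `example_tbl`). The `backup_timestamp` value must be the snapshot's `Timestamp` as reported by `SHOW SNAPSHOT`:

```sql
-- Restore example_tbl from snapshot_label1 into example_db.
RESTORE SNAPSHOT example_db.snapshot_label1
FROM example_repo
ON (example_tbl)
PROPERTIES
(
"backup_timestamp" = "2022-04-08-15-52-29"
);

-- Check the progress of the restore job.
SHOW RESTORE FROM example_db;
```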

## Key Features

1. **Backup Data**:
Doris allows you to back up data from a table, a partition, or an entire database by creating snapshots. The data is backed up as files and stored on remote storage systems such as HDFS, S3, or other compatible systems, either directly or through a Broker process.

2. **Restore Data**:
You can restore the backup data from a remote repository to any Doris cluster. This includes full database restores, full table restores, and partition-level restores, allowing for flexibility in recovering data.

3. **Snapshot Management**:
Data is backed up in the form of snapshots. These snapshots are uploaded to remote storage systems and can be later restored as needed. The restore process involves downloading snapshot files and mapping them to local metadata to make them effective.

4. **Data Migration**:
In addition to backup and restore, this functionality enables data migration between different Doris clusters. You can back up data to a remote storage system and restore it to another Doris cluster, helping in cluster migration scenarios.

5. **Replication Control**:
When restoring data, you can specify the number of replicas for the restored data to ensure redundancy and fault tolerance.
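A sketch of migrating a backup to another cluster while setting the replica count. On the target cluster, the same remote location is registered as a read-only repository so the target cannot write to it, and `replication_num` sets the number of replicas for the restored table. The repository name `migration_repo`, the placeholder credentials, and the snapshot names are hypothetical:

```sql
-- On the target cluster: register the existing backup location as read-only.
CREATE READ ONLY REPOSITORY `migration_repo`
WITH S3
ON LOCATION "s3://bucket_name/test"
PROPERTIES
(
"AWS_ENDPOINT" = "http://xxxx.xxxx.com",
"AWS_ACCESS_KEY" = "xxxx",
"AWS_SECRET_KEY" = "xxx",
"AWS_REGION" = "xxx"
);

-- Restore the snapshot with three replicas for the restored table.
RESTORE SNAPSHOT example_db.snapshot_label1
FROM migration_repo
ON (example_tbl)
PROPERTIES
(
"backup_timestamp" = "2022-04-08-15-52-29",
"replication_num" = "3"
);
```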

## Limitations

While Doris provides powerful backup and restore capabilities, there are some limitations and unsupported features in certain scenarios:

1. **Async Materialized View (MTMV) Not Supported**:
Doris does not currently support backing up or restoring tables associated with **Async Materialized Views (MTMV)**. Backup or restore jobs that involve such views may not work as expected and can lead to data consistency or integrity problems.

2. **Tables with Storage Policy Not Supported**:
Tables with a **storage policy** defined (for example, tables that use the `storage_policy` property to move cold data to remote storage) are **not supported** for backup and restore. Their storage configuration may conflict with the snapshot process.

3. **Incremental Backup**:
At present, Doris only supports full backups. Incremental backups (where only the changed data since the last backup is stored) are not yet supported, although this may be included in future versions.

4. **Colocate With Property**:
Doris does not preserve the `colocate_with` property of tables during backup or restore, so you may need to reconfigure colocation for these tables after restoring them.

5. **Dynamic Partition Support**:
Dynamic partitioning is supported in Doris, but the dynamic partition attribute is disabled during backup. After restoring the data, re-enable it manually with the `ALTER TABLE` command.
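A sketch of the follow-up `ALTER TABLE` statements after a restore. The table `example_db.example_tbl` and the colocation group `group_name` are placeholders, and the table must still meet the colocation requirements of the group it joins:

```sql
-- Re-enable dynamic partitioning on the restored table.
ALTER TABLE example_db.example_tbl SET ("dynamic_partition.enable" = "true");

-- Put the restored table back into its colocation group.
ALTER TABLE example_db.example_tbl SET ("colocate_with" = "group_name");
```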

For detailed usage instructions, please refer to the backup and restore user guides.
