[Feature] Support new syntax for Backup/Restore #52729

Open
wants to merge 4 commits into
base: main

@srlch (Contributor) commented Nov 8, 2024

Why we need new syntax for Backup/Restore

  1. The semantic design of the old syntax is not easy to understand and is not user-friendly.
  2. The new syntax is more extensible and flexible: it leaves room to support Backup/Restore of (multiple) catalogs, multiple object types, and multiple databases in the future.

New syntax design for Backup/Restore

BACKUP [DATABASE <db_name>] SNAPSHOT [<dbname>.]<snapshot_name>
TO <repo_name>
[ ON ( backup_restore_object [, ...] ) ]
[PROPERTIES]

RESTORE SNAPSHOT [<dbname>.]<snapshot_name> FROM <repo_name>
[DATABASE <db_name> [AS <db_alias>] ]
[ ON ( backup_restore_object [, ...] ) ]
[PROPERTIES]

backup_restore_object ::=
    ALL TABLE[S]             | (TABLE | TABLES) <table_name> [ PARTITION (...) ] [AS <alias>] |
    ALL MATERIALIZED VIEW[S] | MATERIALIZED (VIEW | VIEWS) <mv_name> [AS <alias>] |
    ALL VIEW[S]              | (VIEW | VIEWS) <view_name> [AS <alias>] |
    ALL FUNCTION[S]          | (FUNCTION | FUNCTIONS) <func_name> [AS <alias>] |
    <table_name> [ PARTITION (...) ] [AS <alias>]
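A few statements that appear to fit the grammar above, kept as Java string constants so they could be dropped into a parser unit test. This is only an illustration: every database, table, view, function, snapshot, and repository name is made up, and the statements were derived from the grammar text in this description, not run against a StarRocks build.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative statements only; all object names below are hypothetical. */
final class BackupRestoreSyntaxExamples {
    private BackupRestoreSyntaxExamples() {}

    static final List<String> STATEMENTS = Collections.unmodifiableList(Arrays.asList(
        // Back up a whole database, named after the DATABASE keyword.
        "BACKUP DATABASE sales SNAPSHOT snap_1 TO repo_1",
        // Database given as a prefix of the snapshot name, with a mixed ON clause.
        // No aliases here: the PR forbids aliases in BACKUP statements.
        "BACKUP SNAPSHOT sales.snap_2 TO repo_1 "
            + "ON (ALL TABLES, MATERIALIZED VIEW mv_daily, FUNCTION my_udf)",
        // No database name and no ON clause.
        "RESTORE SNAPSHOT snap_1 FROM repo_1",
        // Restore database sales as sales_copy, renaming one table and taking all views.
        "RESTORE SNAPSHOT snap_1 FROM repo_1 DATABASE sales AS sales_copy "
            + "ON (TABLE orders AS orders_copy, ALL VIEWS)"
    ));
}
```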

The behavioral changes in Backup/Restore syntax

Expansion of ON clause:
We introduce the keywords TABLE(S), VIEW(S), MATERIALIZED VIEW(S), and FUNCTION(S) to identify the type of each Backup/Restore object, and use ALL to select every object of a given type, which is much clearer than before.

Allow the database to be specified explicitly, separately from the snapshot name:
Backup: The user can specify the database after the DATABASE keyword, or before the snapshot name as before.

Restore:

  1. If the user specifies no dbName at all (neither before the snapshot name nor after the DATABASE keyword) and no ON clause, a database with the same name as in the snapshot file is created and all data is restored into it.
  2. If dbName is specified after the DATABASE keyword, no dbName may be specified before the snapshot name.
  3. The dbName after the DATABASE keyword must exactly match the database name in the snapshot file. DATABASE <db_name> [AS <db_alias>] restores the database named db_name in the snapshot into the database named db_alias in the current cluster. If AS is omitted, the data is restored into a database with the same name.
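The Restore rules above can be read as a small decision procedure. The sketch below is an illustration under assumptions, not the PR's implementation: the class `RestoreDbResolver`, its method, and its error texts are hypothetical. It also assumes that a dbName written before the snapshot name keeps acting as the target database, and that an ON clause without any database name is rejected, as the parser error message added in this PR suggests.

```java
import java.util.Objects;

/** Hypothetical helper (not part of the PR) mirroring the RESTORE database rules. */
final class RestoreDbResolver {
    private RestoreDbResolver() {}

    /**
     * @param legacyDb    dbName written before the snapshot name, or null
     * @param explicitDb  dbName written after the DATABASE keyword, or null
     * @param dbAlias     alias written after AS, or null
     * @param hasOnClause whether an ON (...) clause is present
     * @param snapshotDb  database name recorded in the snapshot file
     * @return the database in the current cluster that receives the data
     */
    static String resolveTargetDb(String legacyDb, String explicitDb, String dbAlias,
                                  boolean hasOnClause, String snapshotDb) {
        Objects.requireNonNull(snapshotDb, "snapshotDb");
        if (explicitDb != null) {
            // Rule 2: DATABASE <db_name> excludes a dbName before the snapshot name.
            if (legacyDb != null) {
                throw new IllegalArgumentException("dbName given both before snapshot name and after DATABASE");
            }
            // Rule 3: the name must exactly match the database in the snapshot.
            if (!explicitDb.equals(snapshotDb)) {
                throw new IllegalArgumentException("database " + explicitDb + " not found in snapshot");
            }
            // AS <db_alias> renames the target; otherwise keep the same name.
            return dbAlias != null ? dbAlias : explicitDb;
        }
        if (dbAlias != null) {
            // The grammar only allows AS together with DATABASE <db_name>.
            throw new IllegalArgumentException("alias requires DATABASE <db_name>");
        }
        if (legacyDb != null) {
            // Assumption: the old form keeps its old meaning.
            return legacyDb;
        }
        // Rule 1 applies only without an ON clause (assumed to be an error otherwise).
        if (hasOnClause) {
            throw new IllegalArgumentException("ON clause requires an explicit database");
        }
        return snapshotDb;
    }
}
```

For example, `resolveTargetDb(null, "sales", "sales_copy", true, "sales")` yields `sales_copy`, while passing both a legacy dbName and a DATABASE name throws.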

Fixes #52746

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

@@ -8246,6 +8356,7 @@ public List<String> getColumnNames(StarRocksParser.ColumnAliasesContext context)
}

protected NodePosition createPos(ParserRuleContext context) {
Preconditions.checkState(context != null);
return createPos(context.start, context.stop);
}


@@ -136,7 +151,7 @@ public Void visitBackupStatement(BackupStmt backupStmt, ConnectContext context)

// analyze and get Function for stmt
List<FunctionRef> fnRefs = backupStmt.getFnRefs();
- if (!withOnClause) {
+ if (!withOnClause || allFunction) /* without `On` or contains `ALL` */ {
fnRefs.add(new FunctionRef(database.getFunctions()));
} else {
backupStmt.getFnRefs().stream().forEach(x -> x.analyzeForBackup(database));

The most risky bug in this code is:
Potential unintended backup inclusion due to condition logic.

You can modify the code like this:

// Adjust the conditional logic for backing up all tables, MVs, or views correctly
if (!withOnClause && (allTable || allMV || allView)) {
    for (Table tbl : GlobalStateMgr.getCurrentState().getLocalMetastore().getTables(database.getId())) {
        if (!Config.enable_backup_materialized_view && tbl.isMaterializedView()) {
            LOG.info("Skip backup materialized view: {} because " +
                     "backup of materialized views is disabled", tbl.getName());
            continue;
        }
        if (tbl.isTemporaryTable()) {
            continue;
        }

        if ((tbl.isOlapTable() && !allTable) || 
            (tbl.isOlapMaterializedView() && !allMV) ||
            (tbl.isOlapView() && !allView)) {
            continue;
        }

        TableName tableName = new TableName(dbName, tbl.getName());
        TableRef tableRef = new TableRef(tableName, null, null);
        tableRefs.add(tableRef);
    }
}

This modification clarifies that all specific objects should only be added when the corresponding flags are set, correcting logical errors from the original inclusive checks.

.filter(x -> x.isOlapView()).map(x -> x.getName()).collect(Collectors.toSet()));
}
}

// only retain restore tables
jobInfo.retainTables(allTbls);
}

The most risky bug in this code is:
A null pointer dereference when dbName is null during the creation or access of the database.

You can modify the code like this:

// check if db exist
String dbName = stmt.getDbName();
Database db = globalStateMgr.getLocalMetastore().getDb(dbName);

if (db == null) {
    if (stmt instanceof RestoreStmt) {
        // Ensure jobInfo is initialized before using it
        if (jobInfo == null && dbName != null) {
            ErrorReport.reportDdlException(ErrorCode.ERR_COMMON_ERROR, "JobInfo is not available for database: " + dbName);
        }

        if (dbName == null) {
            // Check if jobInfo has the necessary data to avoid null pointer exception
            if (jobInfo == null || jobInfo.dbName == null) {
                ErrorReport.reportDdlException(ErrorCode.ERR_COMMON_ERROR, 
                    "Cannot determine database name from job info during restore process");
            }
            // use dbName in snapshot if target dbName is null
            dbName = jobInfo.dbName;
        }

        try {
            globalStateMgr.getLocalMetastore().createDb(dbName, null);
            db = globalStateMgr.getLocalMetastore().getDb(dbName);
        } catch (Exception e) {
            ErrorReport.reportDdlException(ErrorCode.ERR_COMMON_ERROR,
                        "Cannot create database: " + dbName + " in restore process");
        }
    } else {
        ErrorReport.reportDdlException(ErrorCode.ERR_BAD_DB_ERROR, dbName);
    }
}

@srlch changed the title from "[WIP] support new syntax for backup restore" to "[Feature] support new syntax for backup restore" on Nov 8, 2024
@srlch changed the title from "[Feature] support new syntax for backup restore" to "[Feature] Support new syntax for Backup/Restore" on Nov 8, 2024
Signed-off-by: srlch <[email protected]>

[FE Incremental Coverage Report]

pass : 60 / 69 (86.96%)

file detail

path | covered_line | new_line | coverage | not_covered_line_detail
🔵 com/starrocks/backup/BackupHandler.java | 29 | 37 | 78.38% | [274, 275, 276, 298, 301, 303, 305, 473]
🔵 com/starrocks/sql/analyzer/BackupRestoreAnalyzer.java | 8 | 9 | 88.89% | [109]
🔵 com/starrocks/sql/ast/AbstractBackupStmt.java | 23 | 23 | 100.00% | []

: tableDesc | FUNCTION qualifiedName
backupRestoreObjectDesc
: backupRestoreTableDesc
| (ALL (FUNCTION | FUNCTIONS) | (FUNCTION | FUNCTIONS) qualifiedName (AS identifier)?)
Contributor

Why are the parentheses needed here? This syntax cannot enforce that there is either a single ALL FUNCTIONS or several explicitly named FUNCTIONS.

Contributor

Also, I may want to restore all functions and give some of them an alias. Can that be supported?

Contributor Author

The () is not needed here; it is only grouping in the syntax definition in the .g4 file. ALL FUNCTIONS and FUNCTIONS can each be specified multiple times; see the definition in backupStatement.

Contributor Author

If ALL FUNCTIONS is used, an alias cannot be set.
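The alias restrictions mentioned in this thread (no alias together with ALL, and no alias at all in a BACKUP statement, per the error message this PR adds) could be checked along these lines. This is a minimal sketch under assumptions: `BackupObjectAliasCheck` and its parameters are hypothetical and do not come from the PR, and in the real grammar the ALL case may already be excluded at parse time.

```java
/** Hypothetical check (not the PR's code) for the alias rules discussed in this review. */
final class BackupObjectAliasCheck {
    private BackupObjectAliasCheck() {}

    /**
     * @param isBackup   true for BACKUP, false for RESTORE
     * @param objectName name of the object, or null when the item is ALL <type>
     * @param alias      alias written after AS, or null
     * @throws IllegalArgumentException if the alias is not allowed in this position
     */
    static void validate(boolean isBackup, String objectName, String alias) {
        if (alias == null) {
            return; // nothing to check
        }
        if (isBackup) {
            // BACKUP never accepts an alias for a backup object.
            throw new IllegalArgumentException("alias is forbidden in a BACKUP statement");
        }
        if (objectName == null) {
            // ALL <type> names no single object, so there is nothing to rename.
            throw new IllegalArgumentException("alias cannot be combined with ALL");
        }
    }
}
```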

@BaseMessage("Specify alias for backup object is forbidden in BACKUP stmt")
String unsupportedSepcifyAliasInBackupStmt();

@BaseMessage("`ON` clause is forbidden if no Database explicitly completely in Restore stmt")
Contributor

What does "explicitly completely" mean?

Contributor Author

fixed

@@ -222,4 +222,13 @@ public interface ParserErrorMsg {
String nullIdentifierCancelBackupRestore();
@BaseMessage("Value count in PIVOT {0} must match number of FOR columns {1}")
String pivotValueArityMismatch(int a0, int a1);

@BaseMessage("Specify dbName after snapshot name is forbidden if the DbName is specified explicitly in BACKUP/RESTORE")
Contributor

after -> before?

Contributor Author

fixed

this.withOnClause = !(this.tblRefs.isEmpty() && this.fnRefs.isEmpty());
this.originDbName = "";

this.withOnClause = false;
Contributor

Why not directly set the default in variable declaration?

Contributor Author

fixed

public ParseNode visitBackupStatement(StarRocksParser.BackupStatementContext context) {
QualifiedName qualifiedName = getQualifiedName(context.qualifiedName());
LabelName labelName = qualifiedNameToLabelName(qualifiedName);
private ParseNode getObjectRef(StarRocksParser.QualifiedNameContext qualifiedNameContext,
Contributor

It'll be clearer to split it into 2 functions: getFunctionRef, getTableRef

Contributor Author

Fixed

FROM identifier
(ON '(' restoreObjectDesc (',' restoreObjectDesc) * ')')?
FROM repoName=identifier
(DATABASE dbName=identifier (AS dbAlias=identifier)?)?
Contributor

Will it be simpler to extract a sub-statement like:

backupRestorePackageStmt
    DATABASE dbName=identifier (AS dbAlias=identifier)? |
    .....

Other package types will be supported/used later.

Contributor Author

I will consider this change in the next PR, for external catalogs. A backupRestorePackageStmt designed for both DATABASE and CATALOG seems better.

return new BackupStmt(labelName, repoName, tblRefs, fnRefs, properties, createPos(context));
AbstractBackupStmt stmt = null;
if (backupContext != null) {
stmt = (AbstractBackupStmt) (new BackupStmt(labelName, repoName, mixTblRefs,
Contributor

(AbstractBackupStmt) is not needed.

Contributor Author

fixed

String originDb = null;

boolean withOnClause = false;
boolean allTable = false;
Contributor

I think it would be simpler to use vectors to store the allXxx flags and the Refs, like:

  1. ArrayList allObjects = false * 4. with enum TABLE/MV/VIEW/FUNCTION.
  2. ArrayList<ArrayList<>> allRefs = {mixTblRefs, mixTblRefs, mixTblRefs, new List()};

Contributor Author

Fixed No. 1. For No. 2, keeping the original implementation seems better, because TABLE/MV/VIEW are all treated as TableRef.
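The idea in this exchange, replacing the separate allTable/allMV/allView/allFunction booleans with one collection keyed by an enum, might look roughly like the sketch below. It assumes an `EnumSet` as the container, which the thread does not specify. The enum name `BackupObjectType` follows the rename a reviewer suggests in this conversation; `AllMarkers` and its methods are hypothetical.

```java
import java.util.EnumSet;

/** Object kinds named in the review suggestion: TABLE/MV/VIEW/FUNCTION. */
enum BackupObjectType { TABLE, MV, VIEW, FUNCTION }

/** Hypothetical holder for the "ALL <type>" markers of one statement. */
final class AllMarkers {
    private final EnumSet<BackupObjectType> all = EnumSet.noneOf(BackupObjectType.class);

    /** Record that the ON clause contained ALL for this object type. */
    void markAll(BackupObjectType type) {
        all.add(type);
    }

    boolean isAll(BackupObjectType type) {
        return all.contains(type);
    }

    /**
     * Same shape as the condition in the diff: without an ON clause, or with
     * ALL FUNCTIONS, every function in the database is selected.
     */
    boolean selectsEveryFunction(boolean withOnClause) {
        return !withOnClause || isAll(BackupObjectType.FUNCTION);
    }
}
```

One set replaces four fields, and adding another object type later only requires a new enum constant.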

Signed-off-by: srlch <[email protected]>
Signed-off-by: srlch <[email protected]>

sonarcloud bot commented Nov 13, 2024

Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)


public class AbstractBackupStmt extends DdlStmt {
public enum BackupRestoreAllMarker {
Contributor

BackupObjectType

@@ -223,12 +223,12 @@ public interface ParserErrorMsg {
@BaseMessage("Value count in PIVOT {0} must match number of FOR columns {1}")
String pivotValueArityMismatch(int a0, int a1);

- @BaseMessage("Specify dbName after snapshot name is forbidden if the DbName is specified explicitly in BACKUP/RESTORE")
+ @BaseMessage("Specify dbName before snapshot name is forbidden if the DbName is specified explicitly in BACKUP/RESTORE")
Contributor

Specifying ...

}
TableRef tableRef = new TableRef(tableName, alias, partitionNames, position);
Contributor

Why not directly return new TableRef(...)?

@@ -3424,58 +3423,58 @@ private ParseNode parseBackupRestoreStatement(ParserRuleContext context) {
}

if (specifiedFunction) {
- if (allFunction) {
+ if (allMarker.contains(AbstractBackupStmt.BackupRestoreAllMarker.FUNCTION)) {
Contributor

Import xxx.AbstractBackupStmt.BackupObjectType directly; then the usage is simpler.


[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)


[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

Successfully merging this pull request may close these issues.

[Feature] Support new syntax for Backup/Restore
2 participants