Skip to content
This repository has been archived by the owner on Sep 27, 2019. It is now read-only.

[15721] Blocking Schema Changes #1341

Open
wants to merge 69 commits into
base: master
Choose a base branch
from
Open

[15721] Blocking Schema Changes #1341

wants to merge 69 commits into from

Conversation

sxzh93
Copy link

@sxzh93 sxzh93 commented May 5, 2018

Overview:
This PR contains all blocking schema changes implementation for Add/Drop column, rename column, and change column type. Alter table for Add/Drop column and change column type operations could be aggregated in one single SQL statement; rename column is separated alone.

Implementation Changes:
1.Functional code: We construct a new schema object for target table in parser, this object includes all changes related to this alter statement. Executor will execute the alter plan and invoke catalog to replace old table with new table using the new schema object. Catalog will use a scan to apply the changes to all tuples.
2.JUnit test: add unit test for add/drop column, change column type.

Known issues:
1.Multiple transactions bugs.
2.We don't handle drop column contains foreign key in this PR.

DeanChensj and others added 30 commits April 28, 2018 13:53
This reverts commit 6704d0e, reversing
changes made to 39fa518.
    - added alter and rename plan and statements
    - modified optimizor to bind db name and make plans
    - modified parsenodes
    - added node to string support and vice versa
    - added nodetransform of alter table
This reverts commit 6704d0e, reversing
changes made to 39fa518.
    - added alter and rename plan and statements
    - modified optimizor to bind db name and make plans
    - modified parsenodes
    - added node to string support and vice versa
    - added nodetransform of alter table
This reverts commit 6704d0e, reversing
changes made to 39fa518.
This reverts commit 6704d0e, reversing
changes made to 39fa518.
// txn->RecordDrop(database_oid, old_table->GetOid(), INVALID_OID);

// Final step of physical change should be moved to commit time
database->ReplaceTableWithOid(table_oid, new_table);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[memory leak]
old table object is not explicitly freed.
database->ReplaceTableWithOid just replace the table pointer to new_table but old_table still exists.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out! We left a TODO here, and we are still looking for the correct way to handle this.

new_table->AddIndex(new_index);

// reinsert record into pg_index
pg_index->InsertIndex(
Copy link

@star013 star013 May 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[race condition]
It is risky to manipulate index directly under multiple transactions, because it is not protected by any consistency control mechanism like locks. It is better to do such operations through an executor just like your step 4(use a SeqScanPlan). The index executor with lock mechanism will take care of consistency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually InsertIndex uses a InsertPlan to insert a new tuple into pg_attribute so we can get concurrency control for free.

for (oid_t old_column_id = 0;
old_column_id < old_schema->GetColumnCount(); old_column_id++) {
old_column_ids.push_back(old_column_id);
for (oid_t new_column_id = 0;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[inefficient implementation]
It is not efficient to get a matched name in a for loop. It is better to use a hashmap to do such a thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, thanks for your suggestion.

auto new_column_name = node.GetNewName();
auto old_column_name = node.GetOldName();

ResultType result = catalog::Catalog::GetInstance()->RenameColumn(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[race condition]
If there are two transactions trying to rename a column in a table at the same time, it is not safe to do operations on catalog instance without locks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We planed to add lock logic here and in the alterTable, should we merge your lock manager and executor implementation?

Copy link

@star013 star013 May 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.
Although our current implementation of lock manager is correct, we are still working on it to support more functions. Most interface may not change in future, but I suggest that you keep an eye on our future pull request to guarantee your codes can run on our latest version of lock manager and index executors.
If you find any bugs in our lock manager or index executor, please feel free to contact us. We are pleasure to fix those problems.

oldName(nullptr),
newName(nullptr) {
dropped_names =
type == AlterTableType::RENAME ? nullptr : (new std::vector<char *>);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[code style]
It seems better to bracket the judgement in the clause.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, thanks for pointing out.

Copy link

@star013 star013 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General Questions:

  1. The codes seem to work. They modified postgres parser, planner, executor, and catelog, which is the typical information flow to change schema. Some places need to add locks to protect consistency under multiple transactions.
  2. All the codes are easily understood because of detailed comments and clear names.
  3. There are a few redundant codes like using for loop to find a match string, but most of codes are neat.
  4. The codes are modular and they re-use other modules properly.
  5. There exist some commented out codes, like some test codes, but I believe they retain those codes to debug in future.
  6. They use debug log functions properly and most of them are at TRACE level, which is suitable to avoid too many debug info.

Documentation Questions:

  1. Comments describe intents of codes correctly.
  2. All important functions are commented. Although some functions are not commented, their names are good enough suggestions.
  3. No unusual behavior observed.
  4. No other third-party libraries.
  5. Garbage collection for the old table is not completed.

Testing Questions:

  1. They have tests and they are comprehensive on single transaction operations, but they do not test concurrent situations.
  2. Their test is valid because they use JDBC to send SQL instructions and the parser in peloton will convert it into plans to execute.
  3. No hardcoded answers.
  4. They tests cover all their basic functions like rename, type change etc.

Copy link

@alphalzh alphalzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Major]: Major issues need to notice
[Minor]: Minor issues need to notice
[Trivial]: Suggestions

Improvements:
+ Much better use of comments
+ Much more test cases
+ Overall LGTM

Documentation:
+ Proper use of TODO
+ No incomplete code
+ Good comments

Tests:
+ Test exists and comprehensive
- Some tests still fails

Generally better coding style, keep it up!

public void test_RenameCol_Base() throws SQLException {
conn.createStatement().execute(SQL_RENAME_COLUMN);
ResultSet rs = conn.createStatement().executeQuery(SQL_SELECT_STAR);
rs.next();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[suggestion] If rs doesn't have next, this will through an exception that is not easy to catch. Maybe add some more handling here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we should use try-catch here.

conn.commit();
conn2.commit();
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Still commented out code, need to address this in the next PR

new int [] {5, 400});
assertNoMoreRows(rs);
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[comment] great tests!

// ALTER TABLE
//===--------------------------------------------------------------------===//

/**

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[good] Good comments!


if (txn == nullptr)
throw CatalogException("Alter table requires transaction");
try {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Why double try catch block?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we'll fix this.

@@ -26,8 +26,8 @@

// Fix for PRId64 (See https://stackoverflow.com/a/18719205)
#if defined(__cplusplus) && !defined(__STDC_FORMAT_MACROS)
#define __STDC_FORMAT_MACROS 1 // Not sure where to put this
#endif
#define __STDC_FORMAT_MACROS 1 // Not sure where to put this

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[trivial] Is this a auto-formatting change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes


class AlterExecutor : public AbstractExecutor {
public:
AlterExecutor(const AlterExecutor &) = delete;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[good] Nice use of delete

/**
* @class AlterTableStatement
* @brief Represents "ALTER TABLE add column COLUMN_NAME COLUMN_TYPE"
* TODO: add implementation of AlterTableStatement

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[trivial] maybe you forgot to remove todo...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out!

this->table_name.c_str(),
this->database_name.c_str());
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[trivial] Maybe putting longer functions to .cpp file

if ((*parse_tree->added_columns)[i].get()->not_null) {
catalog::Constraint constraint(ConstraintType::NOTNULL,
"con_not_null");
column.AddConstraint(constraint);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[trivial] Easy-to-mixup variable names column and columns

conn2.commit();
}

// The following tests are currently broken.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen when running these test cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the system will crash due to infinite retry in the binder.

conn.createStatement().execute(sql);
ResultSet rs = conn.createStatement().executeQuery(SQL_SELECT_STAR);
rs.next();
checkRow(rs,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check can't ensure there is only one column left.

for (auto index_oid_pair : old_index_oids) {
oid_t index_oid = index_oid_pair.first;
// delete record in pg_index
pg_index->DeleteIndex(index_oid, txn);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not delete index in pg_index only if not all indexed columns still exists?

txn);
column_offset++;
}
// TODO: Add gc logic
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you guys finished this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're still working on this, sorry about that.

auto old_schema = old_table->GetSchema();
std::vector<oid_t> column_ids;

// Step 1: remove drop columns from old schema
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good strategy to remove dropped columns.

bool missing_ok; /* skip error if table missing */
} AlterTableStmt;

typedef enum AlterTableType {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very comprehensive types.

const std::unique_ptr<catalog::Schema> &added_columns,
const std::unique_ptr<catalog::Schema> &changed_type_columns,
AlterType a_type);
explicit AlterPlan(const std::string &database_name,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you guys have changed rename columns from using vector to string, why still vector here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, will correct it. Thanks for pointing out.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants