Skip to content
This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

Internal catalog tables cannot be updated with SQL #1448

Open
jkosh44 opened this issue Jan 26, 2021 · 1 comment · May be fixed by #1643
Open

Internal catalog tables cannot be updated with SQL #1448

jkosh44 opened this issue Jan 26, 2021 · 1 comment · May be fixed by #1643
Labels
beginner Good for newcomers. Mark issues with this. bug Something isn't working (correctness). Mark issues with this.

Comments

@jkosh44
Copy link
Contributor

jkosh44 commented Jan 26, 2021

Bug Report

Summary

If you try and update an internal catalog table through SQL then the indexes won't be properly updated. The issue is in how we store internal catalog index schemas and how update plans are generated.

Looking at pg_proc as an example, if we look at how the schema is created we see the following:

IndexSchema Builder::GetProcOidIndexSchema(db_oid_t db) {
std::vector<IndexSchema::Column> columns;
columns.emplace_back("prooid", type::TypeId::INTEGER, false,
parser::ColumnValueExpression(db, PgProc::PRO_TABLE_OID, PgProc::PROOID.oid_));
columns.back().SetOid(indexkeycol_oid_t(1));
// Primary
IndexSchema schema(columns, storage::index::IndexType::HASHMAP, true, true, false, true);
return schema;
}

Notice that the ColumnValueExpression in the IndexSchema::Column does not contain the column name. All other internal catalog tables follow this same pattern.

If we look at the PlanGenerator for Update, we differentiate between indexed updates and non-indexed updates with the following logic:

auto indexes = accessor_->GetIndexes(op->GetTableOid());
ExprSet cves;
for (auto index : indexes) {
for (auto &column : index.second.GetColumns()) {
// TODO(tanujnay112) big cheating with the const_cast but as the todo in abstract_expression.h says
// these are supposed to be immutable anyway. We need to either go around consting everything or document
// this assumption better somewhere
parser::ExpressionUtil::GetTupleValueExprs(
&cves, common::ManagedPointer(const_cast<parser::AbstractExpression *>(column.StoredExpression().Get())));
}
}
std::unordered_set<std::string> update_column_names;
// Evaluate update expression and add to target list
auto updates = op->GetUpdateClauses();
for (auto &update : updates) {
auto col = tbl_schema.GetColumn(update->GetColumnName());
auto col_id = col.Oid();
if (update_col_offsets.find(col_id) != update_col_offsets.end())
throw SYNTAX_EXCEPTION("Multiple assignments to same column");
// We need to EvaluateExpression since column value expressions in the update
// clause should refer to column values coming from the child.
update_col_offsets.insert(col_id);
auto upd_value = parser::ExpressionUtil::EvaluateExpression(children_expr_map_, update->GetUpdateValue()).release();
builder.AddSetClause(std::make_pair(col_id, common::ManagedPointer(upd_value)));
update_column_names.insert(update->GetColumnName());
RegisterPointerCleanup<parser::AbstractExpression>(upd_value, true, true);
}
bool indexed_update = false;
// TODO(tanujnay112) can optimize if we stored updated column oids in the update nodes during binding
// such that we didn't have to store string sets
for (auto &cve : cves) {
if (update_column_names.find(cve.CastManagedPointerTo<parser::ColumnValueExpression>()->GetColumnName()) !=
update_column_names.end()) {
indexed_update = true;
break;
}
}

To summarize this, we collect the column names of all columns from all indexes from the table we are updating. If any of the column names that are being updated are found in this list of column names from the indexes, then it is an indexed update. The difference being that for indexed updates we must update the index and the underlying table, but for non-indexed updates we can just update the underlying table.

The issue here is that the internal catalog tables do not store the column names in the schema, as pointed out above. Therefore all updates on internal catalog tables will incorrectly be categorized as a non-indexed update. This leads to all sorts of issues, specifically that the indexes aren't updated when they should be.

As far as I know there isn't any current use cases that require the internal catalog to be updated through SQL. Still it would probably be best to handle this properly because we may have use cases for this in the future, it prevents people from accidentally corrupting their internal catalogs, and it prevents people from wasting a couple hours trying to figure out why they can't update the internal catalog tables (cough me cough).

Below are two possible approaches to fixing this

  1. In the binder, for UpdateStatements collect the column oids of all the columns that are being updated. Then in the PlanGenerator for Update instead of comparing column names you can compare column oids. Credit to @tanujnay112's comment the PlanGenerator.
  2. When creating the indexes for the internal catalog tables, add the column name to the ColumnValueExpressions in the IndexSchema::Columns.

I personally prefer the first option because comparing col_oid_t is more efficient that comparing strings and it's probably more consistent too.

This would be a good first issue for someone because it's pretty straightforward, not critical, and would expose you to various different parts of the database.

Environment

OS: Ubuntu (LTS) 20.04

Compiler: GCC 7.0+

CMake Profile: Debug

Jenkins/CI: N/A

Note: This issue is environment independent though the above is the environment I used.

Steps to Reproduce

  1. Compile with the following args: -DCMAKE_BUILD_TYPE=Debug -DNOISEPAGE_USE_ASAN=ON
  2. Run NoisePage with parallel execution turned off noisepage -parallel_execution=false
  3. Try and update an internal catalog table

Expected Behavior

noisepage=# SELECT * FROM pg_language;
 lanoid | lanname  | lanispl | lanpltrusted | lanplcallfoid | laninline | lanvalidator 
--------+----------+---------+--------------+---------------+-----------+--------------
     75 | plpgsql  | f       | t            |               |           |             
     74 | internal | f       | t            |               |           |             
(2 rows)

noisepage=# UPDATE pg_language SET lanoid = 75 WHERE lanoid = 74;
ERROR:  Query failed.
noisepage=# SELECT * FROM pg_language;
 lanoid | lanname  | lanispl | lanpltrusted | lanplcallfoid | laninline | lanvalidator 
--------+----------+---------+--------------+---------------+-----------+--------------
     75 | plpgsql  | f       | t            |               |           |             
     74 | internal | f       | t            |               |           |             
(2 rows)

noisepage=# UPDATE pg_language SET lanoid = 74 WHERE lanoid = 75;
ERROR:  Query failed.
noisepage=# SELECT * FROM pg_language;
 lanoid | lanname  | lanispl | lanpltrusted | lanplcallfoid | laninline | lanvalidator 
--------+----------+---------+--------------+---------------+-----------+--------------
     75 | plpgsql  | f       | t            |               |           |             
     74 | internal | f       | t            |               |           |             
(2 rows)

Actual Behavior

noisepage=# SELECT * FROM pg_language;
 lanoid | lanname  | lanispl | lanpltrusted | lanplcallfoid | laninline | lanvalidator 
--------+----------+---------+--------------+---------------+-----------+--------------
     75 | plpgsql  | f       | t            |               |           |             
     74 | internal | f       | t            |               |           |             
(2 rows)

noisepage=# UPDATE pg_language SET lanoid = 75 WHERE lanoid = 74;
UPDATE 1
noisepage=# SELECT * FROM pg_language;
 lanoid | lanname  | lanispl | lanpltrusted | lanplcallfoid | laninline | lanvalidator 
--------+----------+---------+--------------+---------------+-----------+--------------
     75 | plpgsql  | f       | t            |               |           |             
     75 | internal | f       | t            |               |           |             
(2 rows)

noisepage=# UPDATE pg_language SET lanoid = 74 WHERE lanoid = 75;
UPDATE 1
noisepage=# SELECT * FROM pg_language;
 lanoid | lanname  | lanispl | lanpltrusted | lanplcallfoid | laninline | lanvalidator 
--------+----------+---------+--------------+---------------+-----------+--------------
     74 | plpgsql  | f       | t            |               |           |             
     75 | internal | f       | t            |               |           |             
(2 rows)

A lot of weird stuff going on with this query. lanoid is supposed to be unique but we're breaking that constraint. Our second update should update both rows but it only updates one. The underlying issue is that the table is being updated but not the index.

@jkosh44 jkosh44 added beginner Good for newcomers. Mark issues with this. bug Something isn't working (correctness). Mark issues with this. labels Jan 26, 2021
@jkosh44
Copy link
Contributor Author

jkosh44 commented Jan 26, 2021

The other option is that we don't ever want to update a catalog table with SQL, in which case we could just return an error or something.

liyichao added a commit to liyichao/noisepage that referenced this issue Apr 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
beginner Good for newcomers. Mark issues with this. bug Something isn't working (correctness). Mark issues with this.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant