You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Feb 20, 2023. It is now read-only.
If you try and update an internal catalog table through SQL then the indexes won't be properly updated. The issue is in how we store internal catalog index schemas and how update plans are generated.
Looking at pg_proc as an example, if we look at how the schema is created we see the following:
Notice that the ColumnValueExpression in the IndexSchema::Column does not contain the column name. All other internal catalog tables follow this same pattern.
If we look at the PlanGenerator for Update, we differentiate between indexed updates and non-indexed updates with the following logic:
// TODO(tanujnay112) can optimize if we stored updated column oids in the update nodes during binding
// such that we didn't have to store string sets
for (auto &cve : cves) {
if (update_column_names.find(cve.CastManagedPointerTo<parser::ColumnValueExpression>()->GetColumnName()) !=
update_column_names.end()) {
indexed_update = true;
break;
}
}
To summarize this, we collect the column names of all columns from all indexes from the table we are updating. If any of the column names that are being updated are found in this list of column names from the indexes, then it is an indexed update. The difference being that for indexed updates we must update the index and the underlying table, but for non-indexed updates we can just update the underlying table.
The issue here is that the internal catalog tables do not store the column names in the schema, as pointed out above. Therefore all updates on internal catalog tables will incorrectly be categorized as a non-indexed update. This leads to all sorts of issues, specifically that the indexes aren't updated when they should be.
As far as I know there isn't any current use cases that require the internal catalog to be updated through SQL. Still it would probably be best to handle this properly because we may have use cases for this in the future, it prevents people from accidentally corrupting their internal catalogs, and it prevents people from wasting a couple hours trying to figure out why they can't update the internal catalog tables (cough me cough).
Below are two possible approaches to fixing this
In the binder, for UpdateStatements collect the column oids of all the columns that are being updated. Then in the PlanGenerator for Update instead of comparing column names you can compare column oids. Credit to @tanujnay112's comment the PlanGenerator.
When creating the indexes for the internal catalog tables, add the column name to the ColumnValueExpressions in the IndexSchema::Columns.
I personally prefer the first option because comparing col_oid_t is more efficient that comparing strings and it's probably more consistent too.
This would be a good first issue for someone because it's pretty straightforward, not critical, and would expose you to various different parts of the database.
Environment
OS: Ubuntu (LTS) 20.04
Compiler: GCC 7.0+
CMake Profile: Debug
Jenkins/CI: N/A
Note: This issue is environment independent though the above is the environment I used.
Steps to Reproduce
Compile with the following args: -DCMAKE_BUILD_TYPE=Debug -DNOISEPAGE_USE_ASAN=ON
Run NoisePage with parallel execution turned off noisepage -parallel_execution=false
Try and update an internal catalog table
Expected Behavior
noisepage=# SELECT * FROM pg_language;
lanoid | lanname | lanispl | lanpltrusted | lanplcallfoid | laninline | lanvalidator
--------+----------+---------+--------------+---------------+-----------+--------------
75 | plpgsql | f | t | | |
74 | internal | f | t | | |
(2 rows)
noisepage=# UPDATE pg_language SET lanoid = 75 WHERE lanoid = 74;
ERROR: Query failed.
noisepage=# SELECT * FROM pg_language;
lanoid | lanname | lanispl | lanpltrusted | lanplcallfoid | laninline | lanvalidator
--------+----------+---------+--------------+---------------+-----------+--------------
75 | plpgsql | f | t | | |
74 | internal | f | t | | |
(2 rows)
noisepage=# UPDATE pg_language SET lanoid = 74 WHERE lanoid = 75;
ERROR: Query failed.
noisepage=# SELECT * FROM pg_language;
lanoid | lanname | lanispl | lanpltrusted | lanplcallfoid | laninline | lanvalidator
--------+----------+---------+--------------+---------------+-----------+--------------
75 | plpgsql | f | t | | |
74 | internal | f | t | | |
(2 rows)
Actual Behavior
noisepage=# SELECT * FROM pg_language;
lanoid | lanname | lanispl | lanpltrusted | lanplcallfoid | laninline | lanvalidator
--------+----------+---------+--------------+---------------+-----------+--------------
75 | plpgsql | f | t | | |
74 | internal | f | t | | |
(2 rows)
noisepage=# UPDATE pg_language SET lanoid = 75 WHERE lanoid = 74;
UPDATE 1
noisepage=# SELECT * FROM pg_language;
lanoid | lanname | lanispl | lanpltrusted | lanplcallfoid | laninline | lanvalidator
--------+----------+---------+--------------+---------------+-----------+--------------
75 | plpgsql | f | t | | |
75 | internal | f | t | | |
(2 rows)
noisepage=# UPDATE pg_language SET lanoid = 74 WHERE lanoid = 75;
UPDATE 1
noisepage=# SELECT * FROM pg_language;
lanoid | lanname | lanispl | lanpltrusted | lanplcallfoid | laninline | lanvalidator
--------+----------+---------+--------------+---------------+-----------+--------------
74 | plpgsql | f | t | | |
75 | internal | f | t | | |
(2 rows)
A lot of weird stuff going on with this query. lanoid is supposed to be unique but we're breaking that constraint. Our second update should update both rows but it only updates one. The underlying issue is that the table is being updated but not the index.
The text was updated successfully, but these errors were encountered:
jkosh44
added
beginner
Good for newcomers. Mark issues with this.
bug
Something isn't working (correctness). Mark issues with this.
labels
Jan 26, 2021
Bug Report
Summary
If you try and update an internal catalog table through SQL then the indexes won't be properly updated. The issue is in how we store internal catalog index schemas and how update plans are generated.
Looking at
pg_proc
as an example, if we look at how the schema is created we see the following:noisepage/src/catalog/postgres/builder.cpp
Lines 796 to 807 in d60c55a
Notice that the
ColumnValueExpression
in theIndexSchema::Column
does not contain the column name. All other internal catalog tables follow this same pattern.If we look at the
PlanGenerator
forUpdate
, we differentiate between indexed updates and non-indexed updates with the following logic:noisepage/src/optimizer/plan_generator.cpp
Lines 888 to 930 in d60c55a
To summarize this, we collect the column names of all columns from all indexes from the table we are updating. If any of the column names that are being updated are found in this list of column names from the indexes, then it is an indexed update. The difference being that for indexed updates we must update the index and the underlying table, but for non-indexed updates we can just update the underlying table.
The issue here is that the internal catalog tables do not store the column names in the schema, as pointed out above. Therefore all updates on internal catalog tables will incorrectly be categorized as a non-indexed update. This leads to all sorts of issues, specifically that the indexes aren't updated when they should be.
As far as I know there isn't any current use cases that require the internal catalog to be updated through SQL. Still it would probably be best to handle this properly because we may have use cases for this in the future, it prevents people from accidentally corrupting their internal catalogs, and it prevents people from wasting a couple hours trying to figure out why they can't update the internal catalog tables (cough me cough).
Below are two possible approaches to fixing this
UpdateStatement
s collect the column oids of all the columns that are being updated. Then in thePlanGenerator
forUpdate
instead of comparing column names you can compare column oids. Credit to @tanujnay112's comment thePlanGenerator
.ColumnValueExpression
s in theIndexSchema::Column
s.I personally prefer the first option because comparing
col_oid_t
is more efficient that comparing strings and it's probably more consistent too.This would be a good first issue for someone because it's pretty straightforward, not critical, and would expose you to various different parts of the database.
Environment
OS: Ubuntu (LTS) 20.04
Compiler: GCC 7.0+
CMake Profile:
Debug
Jenkins/CI: N/A
Note: This issue is environment independent though the above is the environment I used.
Steps to Reproduce
Expected Behavior
Actual Behavior
A lot of weird stuff going on with this query.
lanoid
is supposed to be unique but we're breaking that constraint. Our second update should update both rows but it only updates one. The underlying issue is that the table is being updated but not the index.The text was updated successfully, but these errors were encountered: