-
Notifications
You must be signed in to change notification settings - Fork 623
Modified pg_column_stats initialization #1352
base: master
Are you sure you want to change the base?
Conversation
@poojanilangekar Nice! |
a625053
to
92d95fa
Compare
@mengranwo @GustavoAngulo We need to merge this in ASAP as its blocking the performance benchmarks. Please take a look at it when you get the time. |
I think it might be good to keep As for the the schema, I'm not sure which inconsistency you're referring to @poojanilangekar ? @chenboy do you see an inconsistency? |
@GustavoAngulo I have modified the function to Regarding the inconsistency, the declaration creates two SKEY indexes and the table has no primary key. While the header file states that the table should contain only one index with the primary key. |
The peloton/src/catalog/column_stats_catalog.cpp Line 135 in d68ab71
peloton/src/catalog/column_stats_catalog.cpp Line 186 in d68ab71
So I think we should stick to the declaration, not the header. |
@GustavoAngulo @camellyx Can you please review these changes? |
@pervazea Can you take a look since these other people are out of town? |
This is an important PR that we should merge. We don't want to lose track of this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Please add function headers / documentation as requested (and elsewhere too if I've missed anything).
Also, needs merge conflicts resolved :-( due to recent re-factors.
|
||
//===--------------------------------------------------------------------===// | ||
// write Related API | ||
//===--------------------------------------------------------------------===// | ||
bool InsertColumnStats(oid_t database_id, oid_t table_id, oid_t column_id, | ||
int num_rows, double cardinality, double frac_null, | ||
bool InsertColumnStats(oid_t table_id, oid_t column_id, int num_rows, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment documenting function purpose and arguments.
std::string most_common_vals, | ||
std::string most_common_freqs, | ||
std::string histogram_bounds, std::string column_name, | ||
bool has_index, type::AbstractPool *pool, | ||
concurrency::TransactionContext *txn); | ||
bool DeleteColumnStats(oid_t database_id, oid_t table_id, oid_t column_id, | ||
bool DeleteColumnStats(oid_t table_id, oid_t column_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment documenting function purpose and arguments.
concurrency::TransactionContext *txn); | ||
|
||
//===--------------------------------------------------------------------===// | ||
// Read-only Related API | ||
//===--------------------------------------------------------------------===// | ||
std::unique_ptr<std::vector<type::Value>> GetColumnStats( | ||
oid_t database_id, oid_t table_id, oid_t column_id, | ||
concurrency::TransactionContext *txn); | ||
oid_t table_id, oid_t column_id, concurrency::TransactionContext *txn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment document function, args and return information.
@@ -59,6 +60,13 @@ class SystemCatalogs { | |||
return pg_attribute_; | |||
} | |||
|
|||
ColumnStatsCatalog *GetColumnStatsCatalog() { | |||
if (!pg_column_stats_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a recoverable error? If not, this should be a PELOTON_ASSERT.
@@ -71,8 +67,8 @@ class StatsStorage { | |||
|
|||
/* Functions for triggerring stats collection */ | |||
|
|||
ResultType AnalyzeStatsForAllTables( | |||
concurrency::TransactionContext *txn = nullptr); | |||
ResultType AnalyzeStatsForAllTablesWithDatabaseOid( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add function header describing this function, args, etc.
database_id, table_id, column_id, num_rows, cardinality, frac_null, | ||
most_common_vals, most_common_freqs, histogram_bounds, column_name, | ||
has_index, pool_.get(), txn); | ||
pg_column_stats->DeleteColumnStats(table_id, column_id, txn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DeleteColumnStats if they exist, I assume. May be helpful to add comment stating that, if correct.
@@ -108,14 +110,18 @@ TEST_F(StatsStorageTests, InsertAndGetTableStatsTest) { | |||
table_stats_collector.get()); | |||
|
|||
VerifyAndPrintColumnStats(data_table.get(), 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, this function VerifyAndPrint... does print but very little verify. Acknowledge that this wasn't modified, but it does mean some of the old tests are not particularly useful.
This PR modifies the
ColumnStatsCatalog
constructor to use a predefined schema instead of using DDL. Additionally, it creates thepg_column_stats
table to a per database basis. Initially this was not done because of the dependencies of theStatsStorage
on the view of "Global Stats".Additionally, it changes the
AnalyzeStatsForAllTables
toAnalyzeStatsForAllTablesWithDatabaseOid
. Now that we maintainColumnStats
on a per database basis, it makes sense to also collect the stats on a per database basis.