[15721] Compiled Catalog Query Access #1339

nwang57 · 2018-05-04T15:44:42Z

We enabled compiled catalog lookups in the first pull request. Building on that, we now support compiled insert and delete queries for all catalogs. We also fix the following bugs:

QueryCache Bug #1298: the tuple value expression needs to be bound manually so that equality checks work correctly.
Sequential scan assumes that the output columns start at offset 0; otherwise PerformBinding will not find the corresponding column attributes.
ZoneMapCatalog needs special care to avoid a chicken-and-egg problem. Each sequential scan plan checks the zone map to decide whether to scan a tile, but checking the zone map requires ZoneMap catalog access, which leads to an infinite loop. So we ensure in the ZoneMap catalog manager that if the table being scanned is the ZoneMap catalog itself, it is scanned without checking the ZoneMap.
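
The ZoneMap guard described in the last item could look roughly like the sketch below. This is only an illustration and not the code in this PR: `kZoneMapCatalogOid`, `ZoneMapSaysSkip`, and `ShouldScanTileGroup` are hypothetical names, and the zone map lookup is replaced by a dummy rule.

```cpp
#include <cstdint>

using oid_t = std::uint32_t;

// Hypothetical oid of the ZoneMap catalog table (assumption for illustration).
constexpr oid_t kZoneMapCatalogOid = 42;

// Hypothetical stand-in for a zone map lookup. In a real system this lookup
// would itself scan the ZoneMap catalog table.
bool ZoneMapSaysSkip(oid_t /*table_oid*/, std::uint32_t tile_group_id) {
  return tile_group_id % 2 == 1;  // dummy rule: odd tile groups can be skipped
}

// Decide whether a sequential scan has to read the given tile group.
bool ShouldScanTileGroup(oid_t table_oid, std::uint32_t tile_group_id) {
  // When scanning the ZoneMap catalog itself, never consult the zone map;
  // doing so would trigger another scan of the same table and never finish.
  if (table_oid == kZoneMapCatalogOid) return true;
  return !ZoneMapSaysSkip(table_oid, tile_group_id);
}
```

The early return is the whole fix: the recursion is cut before the lookup is issued.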

add in predicate fix enable debug enable debug logger use seq scan for get table test compile fix unused write table object constructor fix bug query execute fix bug fix bug fix bug fix query param print wrapped tuple fix bug cache cache fix try fix shared pointer clean up performance test GetDatabaseObject column_catalog.pp/h database_catalog fix seqscan fix db_catalog db_catalog fix bug db_catalog column_catalog.cpp/h index catalog lan catalog column stat lang catalog settings catalog trigger catalog zone map catalog proc_catolog.cpp/h delete code clean up db and trigger cleanup settings catalog delete code restore catalog test format fix include format fix bug in scan plan by prashanth add index test fixed binding for tuple_value_expression, and changed query_cache_test added Insert and Delete with Compiled Query in abstract_catalog and table_catalog compiled seq plan for table catalog by looking up table_oid fix trigger changed trigger_test, changed the wrong assumption that triggers are in a certain order fix settings catalog query metrics catalog query metrics catalog Changed zone_map_catalog, having issue running zone_map_scan_test using expressionPtr Edited cloumn_catalog, index_catalog, proc_catalog, settings_catalog Added Insert and Delete with Compiled Query Fixed Binding for TupleValueExpression database catalog insert database catalog bound index metrics catalog insert query history catalog table metrics insert database catalog delete modify catalog inserts change to complied insert plan intex metrics catalog delete table metrics catalog delete update catalog to use compiled delete plan fixed code review addressed issue trigger catalog bound table catalog bound language catalog bound added insert, delete and bouding for zone_map_catalog clean up deleted redundant comments in index_catalog index cache uncomment fix proc catalog fix proc catalog added bind in language_catalog's delete add comment to zone map manager

* Adding new mapping table
* Revert "Fixing non-unique key insert problem" This reverts commit 4267752.
* Revert LOG_INFO to LOG_TRACE
* Fix segment fault problem by moving munmap() to after ~EpochManager()
* Avoid compiler error
* Enhance log message for mmap()'ed mapping table

db-ol

Make sure the newly added header files are necessary. Some optimizations can also be made on the engineering side.

db-ol · 2018-05-05T10:59:18Z

src/catalog/abstract_catalog.cpp

+* @param   insert_values     tuples to be inserted
+* @param   txn       TransactionContext
+* @return  Whether insertion is Successful
+*/


As mentioned last time, move comments to header files.

db-ol · 2018-05-05T10:59:44Z

src/catalog/abstract_catalog.cpp

+* @param   predicate        Predicate used in the seq scan
+* @param   txn           TransactionContext
+* @return  Whether deletion is Successful
+*/


As above, move comments to header files.

db-ol · 2018-05-05T11:00:08Z

src/catalog/abstract_catalog.cpp

+* @param   txn               TransactionContext
+*
+* @return  Unique pointer of vector of logical tiles
+*/


Move comments to header files.

db-ol · 2018-05-05T11:00:24Z

src/catalog/abstract_catalog.cpp

+ * @param   column_offsets    columns used for seq scan
+ * @param   predicate         Predicate used in the seq scan
+ * @return  true if successfully executes
+ */


Move comments to header files.

db-ol · 2018-05-05T11:20:13Z

src/catalog/column_catalog.cpp

+  auto constant_expr_7 = new expression::ConstantValueExpression(
+    val7);
+
+  tuples.push_back(std::vector<ExpressionPtr>());


Use emplace_back instead of push_back, since the former constructs the element in place instead of moving in a temporary.
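
A small sketch of the difference, under stated assumptions: `Expr` is a stand-in type, and `ExpressionPtr` is assumed here to be a `std::unique_ptr` alias. `BuildTuples` is a hypothetical helper, not a function from the PR.

```cpp
#include <memory>
#include <vector>

// Stand-in for an expression node (assumption for illustration).
struct Expr {
  explicit Expr(int v) : value(v) {}
  int value;
};
using ExpressionPtr = std::unique_ptr<Expr>;

std::vector<std::vector<ExpressionPtr>> BuildTuples() {
  std::vector<std::vector<ExpressionPtr>> tuples;

  // push_back: builds a temporary empty row, then moves it into the vector.
  tuples.push_back(std::vector<ExpressionPtr>());

  // emplace_back: default-constructs the empty row directly in the vector.
  tuples.emplace_back();

  // Fill the most recently added row.
  tuples.back().push_back(std::make_unique<Expr>(7));
  return tuples;
}
```

For an empty row the saving is small, but the `emplace_back()` form is shorter and avoids the temporary.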

db-ol · 2018-05-05T13:12:36Z

src/catalog/settings_catalog.cpp

-      config_value = (*result_tiles)[0]->GetValue(0, 0).ToString();
-    }
+  PELOTON_ASSERT(result_tuples.size() <= 1);
+  if (result_tuples.size() != 0) {


Use the empty() method to check for emptiness.

db-ol · 2018-05-05T13:12:44Z

src/catalog/settings_catalog.cpp

-      config_value = (*result_tiles)[0]->GetValue(0, 0).ToString();
-    }
+  PELOTON_ASSERT(result_tuples.size() <= 1);
+  if (result_tuples.size() != 0) {


Use the empty() method to check for emptiness.

db-ol · 2018-05-05T14:45:05Z

test/catalog/catalog_test.cpp

+// EXPECT_EQ(nullptr, table_object_1);
+// txn_manager.CommitTransaction(txn);
+//}
+//


Do you have problems passing this test? Are you going to uncomment it later?

This is the test we create for #1336 . Now it has been fixed.

db-ol · 2018-05-05T14:46:31Z

test/catalog/catalog_test.cpp

@@ -24,6 +24,7 @@
 #include "storage/storage_manager.h"
 #include "type/ephemeral_pool.h"
 #include "sql/testing_sql_util.h"
+#include "common/timer.h"


Why do you need to include this header file?

This is needed for performance testing. We plan to remove it later.

db-ol · 2018-05-05T14:52:24Z

src/include/catalog/trigger_catalog.h

@@ -17,7 +17,7 @@
 // 0: oid (pkey)
 // 1: tgrelid   : table_oid
 // 2: tgname    : trigger_name
-// 3: tgfoid    : function_oid
+// 3: tgfoid    : function_name


Should tgfoid be changed if function_oid was meant to be function_name?

db-ol · 2018-05-06T00:00:12Z

src/catalog/abstract_catalog.cpp

+ codegen::BufferingConsumer buffer{{}, context};
+
+
+ bool cached;


Would be better to define cached below where it's first assigned to a value.

db-ol · 2018-05-06T01:26:06Z

src/catalog/abstract_catalog.cpp

+
+ codegen::BufferingConsumer buffer{column_offsets, context};
+
+ bool cached;


Move the cached definition to where it's first assigned.

db-ol · 2018-05-06T01:27:02Z

src/catalog/abstract_catalog.cpp

+
+  // Create consumer
+  codegen::BufferingConsumer buffer{column_offsets, scan_context};
+  bool cached;


Move the cached definition to where it's first assigned.

db-ol · 2018-05-06T01:27:09Z

src/catalog/abstract_catalog.cpp

+
+     codegen::BufferingConsumer buffer{column_offsets, context};
+
+     bool cached;


Move the cached definition to where it's first assigned.

db-ol · 2018-05-06T01:27:15Z

test/codegen/query_cache_test.cpp

+
+      // execute SELECT a FROM table where a == 40;
+      codegen::BufferingConsumer buffer_1{{0, 1}, context_1};
+      bool cached;


Move the cached definition to where it's first assigned.

gvos94 · 2018-05-10T22:12:28Z

src/catalog/column_catalog.cpp

@@ -142,9 +145,9 @@ bool ColumnCatalog::InsertColumn(oid_t table_oid,
                                 const std::vector<Constraint> &constraints,
                                 type::AbstractPool *pool,
                                 concurrency::TransactionContext *txn) {
+  (void) pool;


You could probably use the UNUSED_ATTRIBUTE specifier instead.

The same applies to a bunch of other places where "(void) attr" has been used to pacify the compiler.

I am not sure why we would want to keep the parameter if we are not using it at all.
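
A sketch of the two alternatives to `(void) pool;`. The macro definition below is an assumption (a GCC/Clang attribute), written out only so the example is self-contained. `AbstractPool` and both functions are simplified stand-ins.

```cpp
#include <cstdint>

// Assumed definition for illustration; works on GCC and Clang.
#define UNUSED_ATTRIBUTE __attribute__((unused))

struct AbstractPool {};  // stand-in type

// Option 1: keep the parameter name but mark it as intentionally unused.
bool InsertColumnMarked(std::uint32_t table_oid,
                        UNUSED_ATTRIBUTE AbstractPool *pool) {
  return table_oid != 0;
}

// Option 2: leave the parameter unnamed, which also silences the warning.
bool InsertColumnUnnamed(std::uint32_t table_oid, AbstractPool * /*pool*/) {
  return table_oid != 0;
}
```

Both keep the signature stable for existing callers. Dropping the parameter entirely is cleaner when no caller needs it.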

gvos94 · 2018-05-10T22:46:46Z

src/catalog/query_metrics_catalog.cpp

+  values.emplace_back(new expression::ConstantValueExpression(
+      val11));
+  values.emplace_back(new expression::ConstantValueExpression(
+      val12));



Again, the multiple malloc invocations can be coalesced by allocating an array of objects.

gvos94 · 2018-05-10T23:37:52Z

src/catalog/column_catalog.cpp

+
+  auto *table_oid_expr =
+      new expression::TupleValueExpression(type::TypeId::INTEGER, 0,
+                                           ColumnId::TABLE_OID);


I don't think the allocated memory has been freed. Irrespective, you should probably use smart pointers instead of working with raw pointers. The same applies to a bunch of other places in the code where raw pointers could be replaced with smart pointers.
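
A minimal sketch of the smart-pointer version, using simplified stand-in expression classes rather than the real `expression::` types. `g_live_expressions` is a hypothetical counter added only to make the cleanup observable.

```cpp
#include <memory>
#include <utility>

int g_live_expressions = 0;  // hypothetical counter, for illustration only

struct Expression {
  Expression() { ++g_live_expressions; }
  virtual ~Expression() { --g_live_expressions; }
};

struct TupleValueExpression : Expression {
  explicit TupleValueExpression(int column_id) : column_id(column_id) {}
  int column_id;
};

struct ConstantValueExpression : Expression {
  explicit ConstantValueExpression(int value) : value(value) {}
  int value;
};

// Takes ownership of both children, so nothing has to be deleted by hand.
struct ComparisonExpression : Expression {
  ComparisonExpression(std::unique_ptr<Expression> l,
                       std::unique_ptr<Expression> r)
      : left(std::move(l)), right(std::move(r)) {}
  std::unique_ptr<Expression> left;
  std::unique_ptr<Expression> right;
};

// Builds "column == table_oid" without any raw owning pointer.
std::unique_ptr<Expression> MakeTableOidPredicate(int column_id,
                                                  int table_oid) {
  auto column = std::make_unique<TupleValueExpression>(column_id);
  auto constant = std::make_unique<ConstantValueExpression>(table_oid);
  return std::make_unique<ComparisonExpression>(std::move(column),
                                                std::move(constant));
}
```

When the returned pointer goes out of scope, the whole expression tree is destroyed, including on early-return and exception paths.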

gvos94 · 2018-05-10T23:49:06Z

src/catalog/column_stats_catalog.cpp

+  most_common_freqs = tuple.GetValue(ColumnId::MOST_COMMON_FREQS);
+  hist_bounds = tuple.GetValue(ColumnId::HISTOGRAM_BOUNDS);
+  column_name = tuple.GetValue(ColumnId::COLUMN_NAME);
+  has_index = tuple.GetValue(ColumnId::HAS_INDEX);


Pooja mentioned in the last meeting that tuple.GetValue() is going to be deprecated. However, I vaguely remember her mentioning that it could still be used with catalog tables. You might want to talk to her/Andy about this.

gvos94 · 2018-05-11T00:17:33Z

src/include/index/bwtree.h

@@ -2864,6 +2866,21 @@ class BwTree : public BwTreeBase {
   * the mapping table rather than CAS with nullptr
   */
  void InitMappingTable() {
+    mapping_table = (std::atomic<const BaseNode *> *) \
+                    mmap(NULL, 1024 * 1024 * 1024, 


The value (1024 x 1024 x 1024) shouldn't be hardcoded (MAPPING_TABLE_SIZE is #defined to 1MB).

InitMappingTable() seems to be invoked from the bwtree constructor. Is 1GB mmapped for every index created? If so, this seems like overkill.
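
One way to avoid the repeated literal is a single named constant shared by the `mmap()` and `munmap()` calls. The sketch below assumes a POSIX system. `kMappingTableEntries`, `MapMappingTable`, and `UnmapMappingTable` are hypothetical names, and the entry count is only a placeholder, not a recommendation.

```cpp
#include <sys/mman.h>

#include <atomic>
#include <cstddef>

struct BaseNode;  // opaque here; only pointers to it are stored
using MappingEntry = std::atomic<const BaseNode *>;

// Hypothetical sizing constants: one place to change the table size.
constexpr std::size_t kMappingTableEntries = std::size_t{1} << 20;
constexpr std::size_t kMappingTableBytes =
    kMappingTableEntries * sizeof(MappingEntry);

// Reserve anonymous, zero-filled memory for the mapping table.
// Returns nullptr if the mapping fails.
MappingEntry *MapMappingTable() {
  void *mem = mmap(nullptr, kMappingTableBytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return mem == MAP_FAILED ? nullptr : static_cast<MappingEntry *>(mem);
}

// Release the table using the same constant that sized it.
bool UnmapMappingTable(MappingEntry *table) {
  return munmap(table, kMappingTableBytes) == 0;
}
```

Because both calls read the same constant, the mapped and unmapped lengths cannot drift apart.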

gvos94 · 2018-05-11T00:18:14Z

src/include/index/bwtree.h

+      // NOTE: Only unmap memory here because we need to access the mapping
+      // table in the above routine. If it was unmapped in ~BwTree() then this
+      // function will invoke illegal memory access
+      int munmap_ret = munmap(tree_p->mapping_table, 1024 * 1024 * 1024);


Again, a hard-coded value shouldn't be used.

gvos94 · 2018-05-11T00:41:02Z

src/include/catalog/column_stats_catalog.h

@@ -107,6 +107,7 @@ class ColumnStatsCatalog : public AbstractCatalog {

 private:
  ColumnStatsCatalog(concurrency::TransactionContext *txn);
+  std::vector<oid_t> all_column_ids = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};



It seems the purpose of all_column_ids is to use it with a copy constructor to construct a vector with all the column oids.

Firstly, I don't think you need to declare an all_column_ids vector; it can easily be constructed explicitly at runtime. Moreover, hardcoding the values is not a good idea, since anyone who changes the schema of the catalog table must also remember to update the vector declaration.

A better way to construct this at runtime without breaking abstractions is to use the "NumColumns" method and fill the vector using std::iota.

The same applies to every declaration of all_column_ids and column_ids.
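
A minimal sketch of the `std::iota` approach. `AllColumnIds` is a hypothetical helper, and the column count is passed in as a plain argument because the exact accessor for it is an assumption here.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

using oid_t = std::uint32_t;

// Build {0, 1, ..., column_count - 1} at runtime instead of hardcoding it.
std::vector<oid_t> AllColumnIds(std::size_t column_count) {
  std::vector<oid_t> ids(column_count);
  std::iota(ids.begin(), ids.end(), oid_t{0});
  return ids;
}
```

If the catalog schema gains or loses a column, the vector follows automatically as long as the count comes from the schema.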

gvos94 · 2018-05-11T00:50:21Z

src/include/catalog/constraint.h

-  type::Value* getDefaultValue() {
-    return default_value.get();
-  }
+  type::Value *getDefaultValue() { return default_value.get(); }



This can probably be made inline.

gvos94 · 2018-05-11T00:59:48Z

src/include/catalog/zone_map_catalog.h

-    TYPE_OFF = 2 
-  };
+  enum ZoneMapOffset { MINIMUM_OFF = 0, MAXIMUM_OFF = 1, TYPE_OFF = 2 };
+


I'm not sure why a strongly typed enum has been replaced by an old-style enum. It's better to stick to strongly typed enums.

The same applies to a bunch of places where old-style enums have been used.
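
For reference, a sketch of the strongly typed form using the enumerator names from the diff. `ToIndex` is a hypothetical helper for the places that need the numeric offset.

```cpp
#include <cstddef>

// Scoped enum: enumerators do not leak into the enclosing scope and do not
// convert to integers implicitly.
enum class ZoneMapOffset : std::size_t {
  MINIMUM_OFF = 0,
  MAXIMUM_OFF = 1,
  TYPE_OFF = 2
};

// Explicit conversion for call sites that index by offset.
constexpr std::size_t ToIndex(ZoneMapOffset offset) {
  return static_cast<std::size_t>(offset);
}
```

The explicit cast makes each integer use visible, which is the main benefit over the old-style enum.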

gvos94 · 2018-05-11T01:56:37Z

src/catalog/column_catalog.cpp

+    val6);
+  auto constant_expr_7 = new expression::ConstantValueExpression(
+    val7);
+


Since the number of objects to be allocated is known at compile time, the multiple malloc invocations can be replaced by a single one that allocates an array of objects with a parameterized constructor.

eg: arr_constant_expr = new expression::ConstantValueExpression[8]{{val0},{val1}...{val7}}

The same applies to a bunch of other places where the number of objects to be allocated is known at compile time.

gvos94

Overall, the code looks clean, other than some minor changes. Good job on including the doxygen comments.

latelatif · 2018-05-12T02:07:11Z

src/catalog/abstract_catalog.cpp

+
+ // search for query
+ codegen::Query *query = codegen::QueryCache::Instance().Find(insert_plan);;
+ std::unique_ptr<codegen::Query> compiled_query(nullptr);


The compiled_query object is never used if the query is cached. Why create it every time? It can be moved inside the if block.

latelatif · 2018-05-12T02:22:27Z

src/catalog/abstract_catalog.cpp

+ query->Execute(std::move(executor_context), buffer,
+                [&ret](executor::ExecutionResult result) { ret = result; });
+
+ return ret.m_result == peloton::ResultType::SUCCESS;


Should we keep the query in our cache even if it fails?
I am not sure whether this is overkill or even correct, but we could probably keep only the compiled queries that return success in our cache.

latelatif · 2018-05-12T02:41:49Z

src/catalog/abstract_catalog.cpp

+   new executor::ExecutorContext(txn, std::move(parameters)));
+
+ // search for query
+ codegen::Query *query = codegen::QueryCache::Instance().Find(insert_plan);;


We should probably make this a function and reuse the code instead of rewriting it in every other function. Something like GetCompiledQuery(), which searches for the query in the cache and, if it is not found, compiles it and adds it to the cache.

We can even make it a macro or an inline function if it is on the critical path and we want to save a function call.
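
A rough sketch of such a helper, under stated assumptions. The `Query`, `QueryCache`, and `Compile` definitions below are simplified stand-ins keyed by a string, not the real `codegen` classes, and `g_compile_count` exists only to show how often compilation runs.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for a compiled query (assumption for illustration).
struct Query {
  std::string plan;
};

// Simplified cache keyed by a plan description.
class QueryCache {
 public:
  // Returns the cached query, or nullptr if there is none.
  Query *Find(const std::string &plan_key) const {
    auto it = entries_.find(plan_key);
    return it == entries_.end() ? nullptr : it->second.get();
  }

  // Stores the query and returns a non-owning pointer to it.
  Query *Add(const std::string &plan_key, std::unique_ptr<Query> query) {
    auto &slot = entries_[plan_key];
    slot = std::move(query);
    return slot.get();
  }

 private:
  std::unordered_map<std::string, std::unique_ptr<Query>> entries_;
};

int g_compile_count = 0;  // hypothetical counter, for illustration only

// Stand-in for the expensive compilation step.
std::unique_ptr<Query> Compile(const std::string &plan_key) {
  ++g_compile_count;
  return std::make_unique<Query>(Query{plan_key});
}

// Look up the query; compile and cache it only on a miss.
Query *GetCompiledQuery(QueryCache &cache, const std::string &plan_key) {
  if (Query *cached = cache.Find(plan_key)) return cached;
  return cache.Add(plan_key, Compile(plan_key));
}
```

With this shape, the insert, delete, and scan paths would each reduce to one call, and the owning `std::unique_ptr` is only created on a cache miss.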

latelatif · 2018-05-12T02:49:37Z

src/catalog/abstract_catalog.cpp

+
+     size_t column_count = catalog_table_->GetSchema()->GetColumnCount();
+     for (size_t col_itr = 0; col_itr < column_count; col_itr++) {
+      // Skip any column for update


Not sure what this comment means here. Could you please help me understand what it means?

I think it is only looking for the columns in the tuple which need to be updated. For a more detailed explanation, you can ask @mengranwo.

latelatif · 2018-05-12T03:02:06Z

src/catalog/column_catalog.cpp

-    : table_oid(tile->GetValue(tupleId, ColumnCatalog::ColumnId::TABLE_OID)
+
+ColumnCatalogObject::ColumnCatalogObject(codegen::WrappedTuple wrapped_tuple)
+    : table_oid(wrapped_tuple.GetValue(ColumnCatalog::ColumnId::TABLE_OID)


As mentioned by @gandeevan, GetValue is deprecated now. Please talk to @poojanilangekar to make sure you can use this for a wrapped tuple object safely

latelatif · 2018-05-12T03:08:49Z

src/catalog/column_stats_catalog.cpp

-  tuple->SetValue(ColumnId::COLUMN_NAME, val_column_name, pool);
-  tuple->SetValue(ColumnId::HAS_INDEX, val_has_index, nullptr);
+  tuples.emplace_back();
+//  tuples.push_back(std::vector<ExpressionPtr>());


latelatif · 2018-05-12T03:15:38Z

src/catalog/language_catalog.cpp

-LanguageCatalogObject::LanguageCatalogObject(executor::LogicalTile *tuple)
-    : lang_oid_(tuple->GetValue(0, 0).GetAs<oid_t>()),
-      lang_name_(tuple->GetValue(0, 1).GetAs<const char *>()) {}
+LanguageCatalogObject::LanguageCatalogObject(codegen::WrappedTuple tuple)


Do we want to make a copy of the WrappedTuple parameter? Wouldn't using a reference suffice?
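
A small sketch of what the by-value constructor costs. `WrappedTuple` here is a simplified stand-in with a counting copy constructor, and `ByValueObject` / `ByRefObject` are hypothetical catalog-object types.

```cpp
#include <string>
#include <utility>
#include <vector>

int g_tuple_copies = 0;  // hypothetical counter, for illustration only

// Stand-in for a wrapped tuple; the copy constructor counts its calls.
struct WrappedTuple {
  explicit WrappedTuple(std::vector<std::string> v) : values(std::move(v)) {}
  WrappedTuple(const WrappedTuple &other) : values(other.values) {
    ++g_tuple_copies;
  }
  std::vector<std::string> values;
};

// Pass by value: the whole tuple is copied for every constructed object.
struct ByValueObject {
  explicit ByValueObject(WrappedTuple tuple) : name(tuple.values[1]) {}
  std::string name;
};

// Pass by const reference: only the needed field is copied.
struct ByRefObject {
  explicit ByRefObject(const WrappedTuple &tuple) : name(tuple.values[1]) {}
  std::string name;
};
```

Since the constructors only read fields out of the tuple, a `const` reference is sufficient.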

latelatif · 2018-05-12T03:18:04Z

src/catalog/proc_catalog.cpp


 namespace peloton {
 namespace catalog {

 #define PROC_CATALOG_NAME "pg_proc"

-ProcCatalogObject::ProcCatalogObject(executor::LogicalTile *tile,
+ProcCatalogObject::ProcCatalogObject(codegen::WrappedTuple wrapped_tuple,


Same as above. Is a reference not good enough? This applies to all such uses in other constructors as well

latelatif · 2018-05-12T03:23:36Z

src/catalog/trigger_catalog.cpp

+      GetResultWithCompiledSeqScan(column_ids, predicate, txn);
+
+  // carefull! the result could be null!
+  LOG_INFO("size of the result tiles = %lu", result_tuples.size());


Is this still needed? Should it be LOG_INFO?

latelatif · 2018-05-12T03:25:43Z

src/include/catalog/column_catalog.h

@@ -40,7 +40,7 @@ namespace catalog {

 class ColumnCatalogObject {
 public:
-  ColumnCatalogObject(executor::LogicalTile *tile, int tupleId = 0);
+  ColumnCatalogObject(codegen::WrappedTuple wrapped_tuple);


Use a reference instead

latelatif

Some of the test cases fail when running make check

The following tests FAILED:
52 - stats_test (Failed)
54 - zone_map_scan_test (Failed)
Errors while running CTest
make[3]: *** [test/CMakeFiles/check] Error 8
make[2]: *** [test/CMakeFiles/check.dir/all] Error 2
make[1]: *** [test/CMakeFiles/check.dir/rule] Error 2
make: *** [check] Error 2
akanjani@dev3:~/review/peloton/build$ git status
On branch pcq-cp2
Your branch is up-to-date with 'origin/pcq-cp2'.

nothing to commit, working directory clean
akanjani@dev3:~/review/peloton/build$

coveralls · 2018-05-18T05:44:00Z

Coverage increased (+0.04%) to 77.595% when pulling 89161d8 on nwang57:pcq-cp2 into 5686479 on cmu-db:master.

apavlo · 2018-06-21T13:44:46Z

I am reviving this PR. We will need this when we get rid of the interpreted engine. It also has bug fix for #1362

nwang57 and others added 7 commits May 2, 2018 23:01

fix column id and remove index catalog singleton

8d2dc8b

add table insert and delete test

7b5d5f2

compiled update plan

644a73b

modify scheme catalog

b318853

remove catalog object constructor from logical tile

038f7e2

db-ol reviewed May 5, 2018

nwang57 and others added 4 commits May 5, 2018 11:26

move comment to headers

dedc056

some optimization

d215620

fixed CreateThenDropTable in catalog_test.cpp

96e0ac8

changed push_back to emplace_back, and uses empty() instead of size()==1

90439c3

db-ol reviewed May 6, 2018

saatviks added do not merge class-project labels May 6, 2018

gvos94 assigned gvos94 and unassigned gvos94 May 10, 2018

gvos94 self-requested a review May 10, 2018 21:14

gvos94 reviewed May 10, 2018

gvos94 reviewed May 11, 2018

latelatif reviewed May 12, 2018

Zeninma added 5 commits May 12, 2018 16:58

There are 3 bugs need to be fixed:

6e40ce2

1. valgrind memory leak during fillepredicatearray 2. uninitialized Value being moved/destroyed - this has been solved 3. Concurrency issue, need to put an issue on this

fix zone_map_scan_test memory leak

d2f0a9a

fixed value operator bug

ca66c5b

remove some debug messages to avoid log file size exceeds limit on Tr…

c718372

…avis

for CatalogObject constructor change the wrapped_tuple from pass by v…

b410e3d

…alue to pass by reference. Also add UNUSED_ATTRIBUTE instead of (void).

This was referenced May 13, 2018

ZoneMap bug when using cached compiled query #1362

Open

Destruction of uninitialized Value in FillPredicateArray #1363

Open

nwang57 added 2 commits May 17, 2018 21:19

Merge branch 'master' into pcq-cp2

f581859

fix tests

89161d8

apavlo added ready_for_review and removed class-project do not merge labels Jun 21, 2018

apavlo requested a review from pervazea June 21, 2018 13:44

apavlo mentioned this pull request Jun 21, 2018

[15721] Pre-compiled catalog access #1281

Closed

