Skip to content
This repository has been archived by the owner on Sep 27, 2019. It is now read-only.

Schema - Layout decoupling #1327

Merged
merged 42 commits into from
May 14, 2018
Merged

Schema - Layout decoupling #1327

merged 42 commits into from
May 14, 2018

Conversation

poojanilangekar
Copy link
Contributor

@poojanilangekar poojanilangekar commented Apr 24, 2018

This PR decouples the logical schema from the physical layout of the data. Previously, each TileGroup contained a vector of schemas for which violates the notion of the storage layer being a physical store. We also had cases where the LayoutTuner modifies the column_map which ended up modifying the schemas vector via the DataTable. There were multiple other instances where the Schema was used to get the layout of a TileGroup while all we needed was the physical mapping between column_ids and a <tile_id, column_offset> pair. This PR now makes it possible to make schema changes like changing column name, adding & dropping columns without making any calls to the storage layer (i.e., It can now be done entirely in the catalog).

Changes:

  • Added Layout and LayoutCatalog classes to store the layout in a physical object and the pg_layout.
  • Added a default_layotut_ object to each table (replacing default_partition) and tile_group_layout_ to each TileGroup.
  • Persist the HYBRID layouts to the pg_catalog table. The two pre-defined layouts, ROW and COLUMN stores are not persisted.
  • Added support in the LLVM engine to read the Layout on a per TileGroup while invoking the TableScanTranslator.
  • Modified the LLVM updater and inserter to check that we insert new versions only if the TileGroup has a LayoutType::ROW layout.
  • Modified LayoutTuner to tune the Layout rather than the column_map. Additionally, the tuning decisions are now transactional.
  • Added the pg_catalog to the SystemsCatalog to ensure that the each layout is stored in a catalog with respect its database.
  • Minor cleanup in the catalog (some typos and a couple of magic numbers which caused tests to fail).

Testing:

  • Modified the LayoutTuner test to ensure it works as expected after the refactor.
  • Minor changes in the existing TileGroupLayoutTest and SeqScanTest which ensures that the old engine works as expected.
  • Added tests to the TableScanTranslatorTests to replicate the tests in TileGroupLayoutTest and SeqScanTest. The tests check that the LLVM engine can perform scans on tables different layouts and on tables where each TileGroup has a different Layout.
  • Added a test to CatalogTests to ensure that the pg_layout stores and retrieves data as expected and its state is transactional.

Disclaimer:

  • Each Tile still contains a Schema which will be used by the old engine, the GetValue and SetValue functions and (currently) the LLVM engines to determine the stride in case of a HYBRID layout. I believe we have to continue supporting the GetValue and SetValue functions even after the old engine is deprecated because we will be needing these interfaces in the Catalog and possibly the Optimizer.
  • In order to make the new layout change compatible with the LogicalTile generated by the executor engine, I had to reconstruct the Schemas. This isn't very efficient but the idea is that it is currently only used in tests and will soon be deprecated.
  • Unlike the other Catalogs (i.e., ColumnCatalog, StatsCatalog), the LayoutCatalog directly returns a Layout rather than a LayoutCatalogObject since the API are designed to be read only. In essence, the layout is immutable because if you were changing the layout, you need to change the physical representation. So it felt redundant to create a read only object to be returned by the catalog.

@mengranwo Can you please review the catalog changes?
@pmenon Can you please review the changes to the LLVM engine? Mainly, RuntimeFunctions, Inserter, Updater, TestingCodegenUtil and TableScanTranslatorTests.
@pervazea Can you please review the overall changes? I have modified a few APIs in the old engine. The current code doesn't break the existing tests and it seems to function as expected. But please let me know if you think there is anything I need to change.
@jarulraj I know you may not have the bandwidth for this, but if possible could you please review the changes to LayoutTuner and LayoutTunerTest. Nobody else has modified that code and it would be great if you could take a look.

@coveralls
Copy link

coveralls commented Apr 24, 2018

Coverage Status

Coverage increased (+0.1%) to 77.551% when pulling 38a5614 on poojanilangekar:master into d68ab71 on cmu-db:master.

}
return new_layout;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wondered what is the difference between function CreateLayout() and CreateDefaultLayout() cause they both have the same parameters passes in and same returned value. Is CreateLayout() only a helper function that is being called in test cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And from my understanding, only HYBRID layout type will be persist in pg_layout catalog table, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current transformation infrastructure, the layouts are changed in two contexts:

  1. Transform a specific TileGroup. In this case we want to change the layout of that TileGroup but not the default_layout_ of the table (used to allocate new TileGroups).
  2. The LayoutTuner changes the default_layout_ of the entire table based on its tuning mechanism.
    I have provided the APIs to support both operations, however, we might later want to change this because currently the Inserter only supports inserts for LayoutType::Row. I believe this can be decided once the tuner is integrated into the system.

Yes, we only need to persist HYBRID layouts because they contain real mapping information. For ROW and COLUMN layouts, you can reconstruct it as long as you know the number of columns.


column_map_type GetDefaultLayout() const;
inline void ResetDefaultLayout(LayoutType type = LayoutType::ROW) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add more comments to clarify under what circumstances the Default LayoutType would be set to COLUMN(Seem like only appears in test cases)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right, currently it is only done in test cases. However, once the tuner is in place, the brain may decide that a particular table is always used for OLAP workloads and hence it might make sense to transform it to COLUMN layout.

return false;
}

valid_layout_objects_ = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why set to be true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made this function similar to TableCatalog::InsertColumnObject. My understanding is that this updates the validity of the TableCatalogObject cache and makes sure that subsequent requests are serviced by the cache directly. Was this assumption incorrect? Please let me know if i should change this.

Copy link
Contributor

@mengranwo mengranwo May 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From our logic within TableCatalogObject, we only the parameter valid_table_objects_ when ALL the table objects are inserted into cache. For your case, set valid_layout_objects_ only when all the layout objects from this table is inserted into txn cache.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

tile->pool->Free(varlen_ptr);
}
}
void GCManager::CheckAndReclaimVarlenColumns(storage::TileGroup *tile_group,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything else has been change inside gc_manager.cpp except formatting and change the name of pass in parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, There is no tile_schemas object in the TileGroup. So it changes the way we check whether or not we are freeing a VARCHAR attribute.

@@ -73,6 +79,15 @@ class TableCatalogObject {
std::shared_ptr<ColumnCatalogObject> GetColumnObject(
const std::string &column_name, bool cached_only = false);

// Evict all layouts from the cache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may remember this wrong, but last time we talked you were saying add an extra column in pg_table to record the largest layout_id for each data table.

txn_manager.CommitTransaction(txn);

txn = txn_manager.BeginTransaction();
auto table =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to use table catalog object instead of data table to obtain information like database oid and table oid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@mengranwo
Copy link
Contributor

mengranwo commented May 4, 2018

@poojanilangekar
Hi, this PR looks good for me except for some minor changes involved and some questions I posted in the comment area. I mainly focus on the logic of catalog & its corresponding test cases and I also take a look at the layout.h/layout.cpp, it stands.

And thank you for fix the typo in DEFAULT_SCHEMA_NAME!

// Drop the default layout.
auto default_layout_oid = default_layout->GetOid();
txn = txn_manager.BeginTransaction();
EXPECT_EQ(ResultType::SUCCESS, catalog->DropLayout(database_oid, table_oid,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You said Persist the HYBRID layouts to the pg_catalog table. The two pre-defined layouts, ROW and COLUMN stores are not persisted.
I may understand this wrong, the word ** Persist** means store this information in catalog table, right? If that is true, why DropLayout(default_layout_oid) returns SUCCESS? Since default_layout is ROW stored, it should not be inserted into pg_layout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are correct.
I was returning SUCCESS to indicate that the layout has been removed from pg_layout.

The initial default_layout_ of each table is always ROW store. However, it can be changed via CreateDefaultLayout function, wherein it creates a layout, persists it in pg_layout and updates the DataTable. In the case of this test case, I transformed the table to contain a HYBRID layout as the default_layout_ of the table.

Does this test case make sense? Or should I change it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it makes sense now. Thanks for your explanation

Copy link
Contributor

@mengranwo mengranwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing I found confusing.

Copy link
Contributor

@pervazea pervazea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial comments added. Will read further. Mostly looks good.

@@ -161,7 +162,7 @@ void Catalog::BootstrapSystemCatalogs(storage::Database *database,
system_catalogs->GetSchemaCatalog()->InsertSchema(
CATALOG_SCHEMA_OID, CATALOG_SCHEMA_NAME, pool_.get(), txn);
system_catalogs->GetSchemaCatalog()->InsertSchema(
DEFUALT_SCHEMA_OID, DEFUALT_SCHEMA_NAME, pool_.get(), txn);
DEFUALT_SCHEMA_OID, DEFAULT_SCHEMA_NAME, pool_.get(), txn);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well fix:
DEFUALT_SCHEMA_OID to be DEFAULT_....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

/*
* @brief create a new layout for a table and make it the deafult if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deafult -> default

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -555,6 +561,63 @@ ResultType Catalog::CreateIndex(
return ResultType::SUCCESS;
}

/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The headers documenting the functions should be in the .h file (not the .cpp file).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* @param table_oid the table to which the layout belongs
* @param layout_oid the layout to be dropped
* @param txn TransactionContext
* @return TransactionContext ResultType(SUCCESS or FAILURE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete word TransactionContext? Just a ResultType.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

bool TableCatalogObject::InsertLayout(
std::shared_ptr<const storage::Layout> layout) {
// Invalid object
if (layout == nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to be nice and permit nullptr or should this be an assertion that it is non-null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the existing TableCatalog convention. I have updated it to check for INVALID_OID as well.
@mengranwo Do you think this is fine or should we change this and all other instances?


// Check that both tile groups have the same schema
// Currently done by checking that the number of columns are equal
// TODO Pooja: Handle schena equality for multiple schema versions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schena -> schema
Haven't looked at where / how equality is used yet, but ...
Is equality via same number of columns sufficient? Should equality also check equality of column types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.
Currently it is only called in tests. I agree it is not sufficient. The reason this code existed earlier is because we didn't support schema changes. So a table always had the same schema and so the old code worked fine. I plan to add additional checks once we have the schema changes in place.

if (catalog->CreateDefaultLayout(database_oid, table_oid, column_map, txn) ==
nullptr) {
txn_manager.AbortTransaction(txn);
LOG_DEBUG("Layout Update to failed.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the correct thing to do here?
void UpdateDefaultPartition, so no success/fail returnable at the moment, but the function, I assume, expects success. Therefore throwing an assertion here might be more correct. Failing with just a debug message is probably not ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is going to be a part of the brain code to Tune the table layout. A transaction may fail because it conflicts with other concurrent updates. In an ideal case it must fail because tuning is a background task while the other transaction would be user specified. Since this is a background process, in case of failure, it would come around in the next iteration and attempt the change again.
I am not sure if it would be appropriate to throw an exception when the failure was due to an internal event rather than a user action. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I think it is fine not to throw an exception.
This is not your issue, but I'm wondering why the tuner doesn't apply a bit more intelligence. Just re-trying an operation without keeping any stats or ensuring you don't end up re-trying it forever, seems optimistic.

Copy link
Contributor Author

@poojanilangekar poojanilangekar May 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change this function to return a bool to indicate success. Later, @malin1993ml can determine how to handle failures in the Tuner as a part of thebrain. Can you please review the rest of the changes? This needs to be merged in order to support schema changes.

Also the query_logger_test failure is the same as #1307. So I am assuming we can ignore it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pervazea I have changed that part. Please let me know if there are any other changes I need to make.

EXPECT_EQ(
11,
CATALOG_TABLES_COUNT + 3,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The +3 hardcoded number... improvement that now using CATALOG_TABLES_COUNT. Can we fix it all the way by using define constant instead of the 3 or updating CATALOG_TABLES_COUNT if that is the right solution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -6,7 +6,7 @@
//
// Identification: test/tuning/layout_tuner_test.cpp
//
// Copyright (c) 2015-16, Carnegie Mellon University Database Group
// Copyright (c) 2015-16, Carnegie Mellon University Databa e Group
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidental deletion of s.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


class LayoutCatalog : public AbstractCatalog {
public:
// Global Singleton, only the first call requires passing parameters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we trying to eliminate our use of singletons?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was accidental, removed it.

Copy link
Contributor Author

@poojanilangekar poojanilangekar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mengranwo @pervazea Thanks for reviewing the PR. I have addressed all your comments inline. Please take a look at it again and let me know if I need to make any other changes.

tile->pool->Free(varlen_ptr);
}
}
void GCManager::CheckAndReclaimVarlenColumns(storage::TileGroup *tile_group,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, There is no tile_schemas object in the TileGroup. So it changes the way we check whether or not we are freeing a VARCHAR attribute.

@@ -161,7 +162,7 @@ void Catalog::BootstrapSystemCatalogs(storage::Database *database,
system_catalogs->GetSchemaCatalog()->InsertSchema(
CATALOG_SCHEMA_OID, CATALOG_SCHEMA_NAME, pool_.get(), txn);
system_catalogs->GetSchemaCatalog()->InsertSchema(
DEFUALT_SCHEMA_OID, DEFUALT_SCHEMA_NAME, pool_.get(), txn);
DEFUALT_SCHEMA_OID, DEFAULT_SCHEMA_NAME, pool_.get(), txn);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

/*
* @brief create a new layout for a table and make it the deafult if
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

txn_manager.CommitTransaction(txn);

txn = txn_manager.BeginTransaction();
auto table =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* @param table_oid the table to which the layout belongs
* @param layout_oid the layout to be dropped
* @param txn TransactionContext
* @return TransactionContext ResultType(SUCCESS or FAILURE)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -6,7 +6,7 @@
//
// Identification: test/tuning/layout_tuner_test.cpp
//
// Copyright (c) 2015-16, Carnegie Mellon University Database Group
// Copyright (c) 2015-16, Carnegie Mellon University Databa e Group
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


class LayoutCatalog : public AbstractCatalog {
public:
// Global Singleton, only the first call requires passing parameters.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was accidental, removed it.

// Drop the default layout.
auto default_layout_oid = default_layout->GetOid();
txn = txn_manager.BeginTransaction();
EXPECT_EQ(ResultType::SUCCESS, catalog->DropLayout(database_oid, table_oid,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are correct.
I was returning SUCCESS to indicate that the layout has been removed from pg_layout.

The initial default_layout_ of each table is always ROW store. However, it can be changed via CreateDefaultLayout function, wherein it creates a layout, persists it in pg_layout and updates the DataTable. In the case of this test case, I transformed the table to contain a HYBRID layout as the default_layout_ of the table.

Does this test case make sense? Or should I change it?


// Check that both tile groups have the same schema
// Currently done by checking that the number of columns are equal
// TODO Pooja: Handle schena equality for multiple schema versions.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.
Currently it is only called in tests. I agree it is not sufficient. The reason this code existed earlier is because we didn't support schema changes. So a table always had the same schema and so the old code worked fine. I plan to add additional checks once we have the schema changes in place.

if (catalog->CreateDefaultLayout(database_oid, table_oid, column_map, txn) ==
nullptr) {
txn_manager.AbortTransaction(txn);
LOG_DEBUG("Layout Update to failed.");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is going to be a part of the brain code to Tune the table layout. A transaction may fail because it conflicts with other concurrent updates. In an ideal case it must fail because tuning is a background task while the other transaction would be user specified. Since this is a background process, in case of failure, it would come around in the next iteration and attempt the change again.
I am not sure if it would be appropriate to throw an exception when the failure was due to an internal event rather than a user action. What do you think?

mengranwo
mengranwo previously approved these changes May 7, 2018
Copy link
Member

@pmenon pmenon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick aesthetic/style review. Will take a closer look at design soon.

// Get the tile offset assuming that it is still a row store
// Hence the Tile offset is 0.
auto layout = tile_group->GetLayout();
PELOTON_ASSERT(layout.IsRowStore());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. You probably want to make this a const reference to avoid the copy.
  2. You'll need to add UNUSED_ATTRIBUTE to pass Release builds.


namespace storage {
class Layout;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close namespace.

@@ -17,6 +17,10 @@

namespace peloton {

namespace catalog {
class Schema;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close namespace.

/* tile_map_type used to store the mapping between a tile and its columns
* <tile index> to vector{<original column index, tile column offset>}
*/
typedef std::map<oid_t, std::vector<std::pair<oid_t, oid_t>>> tile_map_type;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sold that these are common enough types for the whole system that they should be here ... but I guess it's okay.

@@ -247,9 +249,17 @@ class DataTable : public AbstractTable {

void ClearLayoutSamples();

void SetDefaultLayout(const column_map_type &layout);
inline void SetDefaultLayout(std::shared_ptr<const Layout> new_layout) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the inline keyword.


namespace catalog {
class Schema;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close namespace.

//
// Identification: src/include/storage/layout.h
//
// Copyright (c) 2015-18, Carnegie Mellon University Database Group
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use the formatter.py script, it'll add the properly formatted copyright.

Layout(const column_map_type &column_map, oid_t layout_oid);

/** @brief Check whether this layout is a row store. */
inline bool IsRowStore() const { return (layout_type_ == LayoutType::ROW); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove inline.

inline bool IsRowStore() const { return (layout_type_ == LayoutType::ROW); }

/** @brief Check whether this layout is a column store. */
inline bool IsColumnStore() const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove inline.

@@ -35,6 +36,122 @@ class TableScanTranslatorTest : public PelotonCodeGenTest {
CreateAndLoadAllColsTable();
}

void ExecuteTileGroupTest(peloton::LayoutType layout_type) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you modify the base test class functions to accept a LayoutType? Then you should reuse the create/loading functions rather than creating a one-off function. This also allows other tests to use custom table layouts for free.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have addressed all your other comments except this one.
From what I understood, the PelotonCodeGenTest does not use the LLVM inserter. The LLVM inserter would only insert into LayoutType::ROW.
Can I make that assumption or do we plan to change that later?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, the tests do not use the inserter, but instead directly insert into the table here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmenon I have modified the CreateTestTables function to get the layout as a parameter. However, since its called from the constructor, so the entire test suit would get called with the given layout. I have however moved the CreateAndLoad function to the base class. Please let me know if you want me to change anything else.

Copy link
Contributor

@pervazea pervazea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I did not read the logic as completely as I would have liked, as there is a lot of code here.

Comments added are all style issues. The main one is that oid_t is used in many places where uint32_t or other flavor of int should be used.

void GCManager::CheckAndReclaimVarlenColumns(storage::TileGroup *tile_group,
oid_t tuple_id) {
oid_t tile_count = tile_group->tile_count_;
oid_t tile_col_count;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tile_count and tile_col_count should be uint32_t not oid_t.
While use of oid_t will compile and run correctly, it is confusing and incorrect style.

@@ -99,7 +100,8 @@ class Catalog {
const std::string &database_name, const std::string &schema_name,
const std::string &table_name, std::unique_ptr<catalog::Schema>,
concurrency::TransactionContext *txn, bool is_catalog = false,
oid_t tuples_per_tilegroup = DEFAULT_TUPLES_PER_TILEGROUP);
oid_t tuples_per_tilegroup = DEFAULT_TUPLES_PER_TILEGROUP,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oid_t -> uint32_t

@@ -214,13 +190,12 @@ class TileGroup : public Printable {
oid_t num_tuple_slots;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not oid_t

@@ -214,13 +190,12 @@ class TileGroup : public Printable {
oid_t num_tuple_slots;

// number of tiles
oid_t tile_count;
oid_t tile_count_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not oid_t


storage::Tile *tile = tile_group->GetTile(tile_itr);
const catalog::Schema &schema = *(tile->GetSchema());
oid_t tile_column_count = schema.GetColumnCount();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not oid_t

double GetLayoutDifference(const storage::Layout &other) const;

/** @brief Returns the tile_id of the given column_id. */
oid_t GetTileIdFromColumnId(oid_t column_id) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not oid_t

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should also be oid_t because the function returns a tile_id.

oid_t GetTileIdFromColumnId(oid_t column_id) const;

/** @brief Returns the column offset of the column_id. */
oid_t GetTileColumnOffset(oid_t column_id) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not oid_t

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be oid_t because it's the column_oid within the Tile.

oid_t GetTileColumnOffset(oid_t column_id) const;

/** @brief Returns the number of columns in the layout. */
oid_t GetColumnCount() const { return num_columns_; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not oid_t

/** @brief Sets the tile id and column id w.r.t. of the tile corresponding
* to the specified tile group column id.
*/
void LocateTileAndColumn(oid_t column_offset, oid_t &tile_offset,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misnamed? More correctly something more like SetTileAndColumn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry! This function was previously a part of TileGroup, I copied the comments as is.

I have changed it now. Essentially, we are using references here because we need to return both the tile_id and tile_column_id.

PELOTON_ASSERT(num_columns_ > column_offset);

// For row store layout, tile id is always 0 and the tile
// column_id and tile column_id is the same.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment text is unclear...
and the tile column_id and tile column_id is the same

Copy link
Member

@pmenon pmenon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The design is nice, code is pretty well documented. I just had a few questions below.

bool result =
pg_layout->InsertLayout(table_oid, new_layout, pool_.get(), txn);
if (!result) {
LOG_DEBUG("Failed to create a new layout for table %u", table_oid);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be LOG_ERROR?

new const storage::Layout(column_map, layout_oid));

// Add the layout the pg_layout table
auto pg_layout = catalog_map_[database_oid]->GetLayoutCatalog();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but this map seems to be accessed concurrently without a lock in Catalog::BootstrapSystemCatalogs()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. There are other small bugs I noticed in #1356 and #1352. I will raise a Catalog cleanup PR once this gets merged in.

/** @brief used to store the mapping between a tile and its columns
* <tile index> to vector{<original column index, tile column offset>}
*/
typedef std::map<oid_t, std::vector<std::pair<oid_t, oid_t>>> tile_map_type;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to rename this to something more descriptive? Maybe TileToColumnMap?

@@ -76,7 +75,7 @@ class TileGroup : public Printable {
// Tile group constructor
TileGroup(BackendType backend_type, TileGroupHeader *tile_group_header,
AbstractTable *table, const std::vector<catalog::Schema> &schemas,
const column_map_type &column_map, int tuple_count);
std::shared_ptr<const Layout> layout, int tuple_count);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're passing in a shared pointer, does that mean this can be null? If not, should we make it a reference?

Also, in terms of ownership, does the TileGroup own the layout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The layout is actually shared among multiple TileGroups which are initialized with the same layout. We would want the ptr to be alive as long as we have atleast one TileGroup with the same.

To make sure a TileGroup always get a non-null layout, I am now throwing an exception in the TileGroupFactory if it passed a nullptr. Let me know if you want me to handle this differently.

@pervazea pervazea merged commit 5686479 into cmu-db:master May 14, 2018
mtunique pushed a commit to mtunique/peloton that referenced this pull request Apr 16, 2019
* Added layout.h. Modified Runtime functions to handle row stores.

* Modify TileGroup api

* Changed Updater & Inserter to use Layout object

* Remove calls from optimizer and executor

* Removed all calls to schemas vector from TileGroup

* Removed Calls to LocateTileAndColumn for TileGroup

* Moved definition of column_map_type. Removed column_map from tile_group

* Added GetInfo function

* make check pases. Need to change a few more function calls

* Convert PL_ASSERT -> PELOTON_ASSERT

* Modified DataTable and LayoutTunerTest

* Added Layout Test for codegen

* Get the tests to actually use different layouts

* Modified storage layer, builds successfully. Need to fix tests

* Moved GetColumnMapStats -> GetColumnLayoutStats

* Minor change in TableScanTranslator

* Modify TransformTileGroup

* Change layout.cpp to better handle HYBRID layouts

* use std::make_shared

* Modified tests to change calls to GetTileGroup()

* make check passes. Need to add test with hybrid layout.

* Modify print functions

* Add TODOs for catalog

* Added LayoutCatalog. Yet to modify TableCatalog

* Added catalog functions

* Added DeleteLayouts to delete all layouts of a table_oid

* LayoutTunerTest fixed

* Fix build failures

* Access LayoutCatalog via the global Catalog object

* Added Multi layout scan test

* Fixed tests after catalog refactor

* Failing LayoutCatalog test because of deserialization

* Fix all tests

* Added documentation

* Style Fix

* Address review comments + minor clean-up in the layout.h API

* Change valid_layout_objects_

* Revert unused changes

* Addressed Prashanth's initial comments

* Modify CreateTable for tests + modify LayoutTuner

* Address Pervaze's review comments.

* Changed after Prashanth's review
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants