-
Notifications
You must be signed in to change notification settings - Fork 623
Conversation
} | ||
return new_layout; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wondered what is the difference between function CreateLayout()
and CreateDefaultLayout()
cause they both have the same parameters passes in and same returned value. Is CreateLayout()
only a helper function that is being called in test cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And from my understanding, only HYBRID layout type will be persist in pg_layout catalog table, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current transformation infrastructure, the layouts are changed in two contexts:
- Transform a specific
TileGroup
. In this case we want to change the layout of thatTileGroup
but not thedefault_layout_
of the table (used to allocate newTileGroups
). - The
LayoutTuner
changes thedefault_layout_
of the entire table based on its tuning mechanism.
I have provided the APIs to support both operations, however, we might later want to change this because currently theInserter
only supports inserts forLayoutType::Row
. I believe this can be decided once the tuner is integrated into the system.
Yes, we only need to persist HYBRID
layouts because they contain real mapping information. For ROW
and COLUMN
layouts, you can reconstruct it as long as you know the number of columns.
src/include/storage/data_table.h
Outdated
|
||
column_map_type GetDefaultLayout() const; | ||
inline void ResetDefaultLayout(LayoutType type = LayoutType::ROW) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add more comments to clarify under what circumstances the Default LayoutType would be set to COLUMN(Seem like only appears in test cases)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right, currently it is only done in test cases. However, once the tuner is in place, the brain
may decide that a particular table is always used for OLAP workloads and hence it might make sense to transform it to COLUMN
layout.
src/catalog/table_catalog.cpp
Outdated
return false; | ||
} | ||
|
||
valid_layout_objects_ = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why set to be true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made this function similar to TableCatalog::InsertColumnObject
. My understanding is that this updates the validity of the TableCatalogObject
cache and makes sure that subsequent requests are serviced by the cache directly. Was this assumption incorrect? Please let me know if i should change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From our logic within TableCatalogObject
, we only the parameter valid_table_objects_
when ALL the table objects are inserted into cache. For your case, set valid_layout_objects_
only when all the layout objects from this table is inserted into txn cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tile->pool->Free(varlen_ptr); | ||
} | ||
} | ||
void GCManager::CheckAndReclaimVarlenColumns(storage::TileGroup *tile_group, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything else has been change inside gc_manager.cpp except formatting and change the name of pass in parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, There is no tile_schemas
object in the TileGroup
. So it changes the way we check whether or not we are freeing a VARCHAR
attribute.
@@ -73,6 +79,15 @@ class TableCatalogObject { | |||
std::shared_ptr<ColumnCatalogObject> GetColumnObject( | |||
const std::string &column_name, bool cached_only = false); | |||
|
|||
// Evict all layouts from the cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may remember this wrong, but last time we talked you were saying add an extra column in pg_table to record the largest layout_id for each data table.
test/catalog/catalog_test.cpp
Outdated
txn_manager.CommitTransaction(txn); | ||
|
||
txn = txn_manager.BeginTransaction(); | ||
auto table = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to use table catalog object instead of data table to obtain information like database oid and table oid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@poojanilangekar And thank you for fix the typo in DEFAULT_SCHEMA_NAME! |
// Drop the default layout. | ||
auto default_layout_oid = default_layout->GetOid(); | ||
txn = txn_manager.BeginTransaction(); | ||
EXPECT_EQ(ResultType::SUCCESS, catalog->DropLayout(database_oid, table_oid, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You said Persist the HYBRID layouts to the pg_catalog table. The two pre-defined layouts, ROW and COLUMN stores are not persisted.
I may understand this wrong, the word ** Persist** means store this information in catalog table, right? If that is true, why DropLayout(default_layout_oid)
returns SUCCESS? Since default_layout
is ROW stored, it should not be inserted into pg_layout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are correct.
I was returning SUCCESS
to indicate that the layout has been removed from pg_layout
.
The initial default_layout_
of each table is always ROW
store. However, it can be changed via CreateDefaultLayout
function, wherein it creates a layout, persists it in pg_layout
and updates the DataTable
. In the case of this test case, I transformed the table to contain a HYBRID
layout as the default_layout_
of the table.
Does this test case make sense? Or should I change it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it makes sense now. Thanks for your explanation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing I found confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial comments added. Will read further. Mostly looks good.
src/catalog/catalog.cpp
Outdated
@@ -161,7 +162,7 @@ void Catalog::BootstrapSystemCatalogs(storage::Database *database, | |||
system_catalogs->GetSchemaCatalog()->InsertSchema( | |||
CATALOG_SCHEMA_OID, CATALOG_SCHEMA_NAME, pool_.get(), txn); | |||
system_catalogs->GetSchemaCatalog()->InsertSchema( | |||
DEFUALT_SCHEMA_OID, DEFUALT_SCHEMA_NAME, pool_.get(), txn); | |||
DEFUALT_SCHEMA_OID, DEFAULT_SCHEMA_NAME, pool_.get(), txn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well fix:
DEFUALT_SCHEMA_OID to be DEFAULT_....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/catalog/catalog.cpp
Outdated
} | ||
|
||
/* | ||
* @brief create a new layout for a table and make it the deafult if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deafult -> default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/catalog/catalog.cpp
Outdated
@@ -555,6 +561,63 @@ ResultType Catalog::CreateIndex( | |||
return ResultType::SUCCESS; | |||
} | |||
|
|||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The headers documenting the functions should be in the .h file (not the .cpp file).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/catalog/catalog.cpp
Outdated
* @param table_oid the table to which the layout belongs | ||
* @param layout_oid the layout to be dropped | ||
* @param txn TransactionContext | ||
* @return TransactionContext ResultType(SUCCESS or FAILURE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete word TransactionContext? Just a ResultType.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/catalog/table_catalog.cpp
Outdated
bool TableCatalogObject::InsertLayout( | ||
std::shared_ptr<const storage::Layout> layout) { | ||
// Invalid object | ||
if (layout == nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to be nice and permit nullptr or should this be an assertion that it is non-null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed the existing TableCatalog
convention. I have updated it to check for INVALID_OID
as well.
@mengranwo Do you think this is fine or should we change this and all other instances?
src/storage/data_table.cpp
Outdated
|
||
// Check that both tile groups have the same schema | ||
// Currently done by checking that the number of columns are equal | ||
// TODO Pooja: Handle schena equality for multiple schema versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
schena -> schema
Haven't looked at where / how equality is used yet, but ...
Is equality via same number of columns sufficient? Should equality also check equality of column types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Currently it is only called in tests. I agree it is not sufficient. The reason this code existed earlier is because we didn't support schema changes. So a table always had the same schema and so the old code worked fine. I plan to add additional checks once we have the schema changes in place.
if (catalog->CreateDefaultLayout(database_oid, table_oid, column_map, txn) == | ||
nullptr) { | ||
txn_manager.AbortTransaction(txn); | ||
LOG_DEBUG("Layout Update to failed."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the correct thing to do here?
void UpdateDefaultPartition, so no success/fail returnable at the moment, but the function, I assume, expects success. Therefore throwing an assertion here might be more correct. Failing with just a debug message is probably not ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is going to be a part of the brain
code to Tune
the table layout. A transaction may fail because it conflicts with other concurrent updates. In an ideal case it must fail because tuning is a background task while the other transaction would be user specified. Since this is a background process, in case of failure, it would come around in the next iteration and attempt the change again.
I am not sure if it would be appropriate to throw an exception when the failure was due to an internal event rather than a user action. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I think it is fine not to throw an exception.
This is not your issue, but I'm wondering why the tuner doesn't apply a bit more intelligence. Just re-trying an operation without keeping any stats or ensuring you don't end up re-trying it forever, seems optimistic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change this function to return a bool
to indicate success. Later, @malin1993ml can determine how to handle failures in the Tuner
as a part of thebrain
. Can you please review the rest of the changes? This needs to be merged in order to support schema changes.
Also the query_logger_test
failure is the same as #1307. So I am assuming we can ignore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pervazea I have changed that part. Please let me know if there are any other changes I need to make.
test/catalog/catalog_test.cpp
Outdated
EXPECT_EQ( | ||
11, | ||
CATALOG_TABLES_COUNT + 3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The +3 hardcoded number... improvement that now using CATALOG_TABLES_COUNT. Can we fix it all the way by using define constant instead of the 3 or updating CATALOG_TABLES_COUNT if that is the right solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
test/tuning/layout_tuner_test.cpp
Outdated
@@ -6,7 +6,7 @@ | |||
// | |||
// Identification: test/tuning/layout_tuner_test.cpp | |||
// | |||
// Copyright (c) 2015-16, Carnegie Mellon University Database Group | |||
// Copyright (c) 2015-16, Carnegie Mellon University Databa e Group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidental deletion of s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/include/catalog/layout_catalog.h
Outdated
|
||
class LayoutCatalog : public AbstractCatalog { | ||
public: | ||
// Global Singleton, only the first call requires passing parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't we trying to eliminate our use of singletons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was accidental, removed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mengranwo @pervazea Thanks for reviewing the PR. I have addressed all your comments inline. Please take a look at it again and let me know if I need to make any other changes.
tile->pool->Free(varlen_ptr); | ||
} | ||
} | ||
void GCManager::CheckAndReclaimVarlenColumns(storage::TileGroup *tile_group, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, There is no tile_schemas
object in the TileGroup
. So it changes the way we check whether or not we are freeing a VARCHAR
attribute.
src/catalog/catalog.cpp
Outdated
@@ -161,7 +162,7 @@ void Catalog::BootstrapSystemCatalogs(storage::Database *database, | |||
system_catalogs->GetSchemaCatalog()->InsertSchema( | |||
CATALOG_SCHEMA_OID, CATALOG_SCHEMA_NAME, pool_.get(), txn); | |||
system_catalogs->GetSchemaCatalog()->InsertSchema( | |||
DEFUALT_SCHEMA_OID, DEFUALT_SCHEMA_NAME, pool_.get(), txn); | |||
DEFUALT_SCHEMA_OID, DEFAULT_SCHEMA_NAME, pool_.get(), txn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/catalog/catalog.cpp
Outdated
} | ||
|
||
/* | ||
* @brief create a new layout for a table and make it the deafult if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
test/catalog/catalog_test.cpp
Outdated
txn_manager.CommitTransaction(txn); | ||
|
||
txn = txn_manager.BeginTransaction(); | ||
auto table = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/catalog/catalog.cpp
Outdated
* @param table_oid the table to which the layout belongs | ||
* @param layout_oid the layout to be dropped | ||
* @param txn TransactionContext | ||
* @return TransactionContext ResultType(SUCCESS or FAILURE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
test/tuning/layout_tuner_test.cpp
Outdated
@@ -6,7 +6,7 @@ | |||
// | |||
// Identification: test/tuning/layout_tuner_test.cpp | |||
// | |||
// Copyright (c) 2015-16, Carnegie Mellon University Database Group | |||
// Copyright (c) 2015-16, Carnegie Mellon University Databa e Group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/include/catalog/layout_catalog.h
Outdated
|
||
class LayoutCatalog : public AbstractCatalog { | ||
public: | ||
// Global Singleton, only the first call requires passing parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was accidental, removed it.
// Drop the default layout. | ||
auto default_layout_oid = default_layout->GetOid(); | ||
txn = txn_manager.BeginTransaction(); | ||
EXPECT_EQ(ResultType::SUCCESS, catalog->DropLayout(database_oid, table_oid, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are correct.
I was returning SUCCESS
to indicate that the layout has been removed from pg_layout
.
The initial default_layout_
of each table is always ROW
store. However, it can be changed via CreateDefaultLayout
function, wherein it creates a layout, persists it in pg_layout
and updates the DataTable
. In the case of this test case, I transformed the table to contain a HYBRID
layout as the default_layout_
of the table.
Does this test case make sense? Or should I change it?
src/storage/data_table.cpp
Outdated
|
||
// Check that both tile groups have the same schema | ||
// Currently done by checking that the number of columns are equal | ||
// TODO Pooja: Handle schena equality for multiple schema versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Currently it is only called in tests. I agree it is not sufficient. The reason this code existed earlier is because we didn't support schema changes. So a table always had the same schema and so the old code worked fine. I plan to add additional checks once we have the schema changes in place.
if (catalog->CreateDefaultLayout(database_oid, table_oid, column_map, txn) == | ||
nullptr) { | ||
txn_manager.AbortTransaction(txn); | ||
LOG_DEBUG("Layout Update to failed."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is going to be a part of the brain
code to Tune
the table layout. A transaction may fail because it conflicts with other concurrent updates. In an ideal case it must fail because tuning is a background task while the other transaction would be user specified. Since this is a background process, in case of failure, it would come around in the next iteration and attempt the change again.
I am not sure if it would be appropriate to throw an exception when the failure was due to an internal event rather than a user action. What do you think?
c16f90a
to
cf631f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick aesthetic/style review. Will take a closer look at design soon.
src/codegen/updater.cpp
Outdated
// Get the tile offset assuming that it is still a row store | ||
// Hence the Tile offset is 0. | ||
auto layout = tile_group->GetLayout(); | ||
PELOTON_ASSERT(layout.IsRowStore()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- You probably want to make this a
const
reference to avoid the copy. - You'll need to add
UNUSED_ATTRIBUTE
to pass Release builds.
src/include/catalog/table_catalog.h
Outdated
|
||
namespace storage { | ||
class Layout; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close namespace.
@@ -17,6 +17,10 @@ | |||
|
|||
namespace peloton { | |||
|
|||
namespace catalog { | |||
class Schema; | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close namespace.
src/include/common/internal_types.h
Outdated
/* tile_map_type used to store the mapping between a tile and its columns | ||
* <tile index> to vector{<original column index, tile column offset>} | ||
*/ | ||
typedef std::map<oid_t, std::vector<std::pair<oid_t, oid_t>>> tile_map_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sold that these are common enough types for the whole system that they should be here ... but I guess it's okay.
src/include/storage/data_table.h
Outdated
@@ -247,9 +249,17 @@ class DataTable : public AbstractTable { | |||
|
|||
void ClearLayoutSamples(); | |||
|
|||
void SetDefaultLayout(const column_map_type &layout); | |||
inline void SetDefaultLayout(std::shared_ptr<const Layout> new_layout) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove the inline
keyword.
src/include/storage/layout.h
Outdated
|
||
namespace catalog { | ||
class Schema; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close namespace.
src/include/storage/layout.h
Outdated
// | ||
// Identification: src/include/storage/layout.h | ||
// | ||
// Copyright (c) 2015-18, Carnegie Mellon University Database Group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use the formatter.py
script, it'll add the properly formatted copyright.
src/include/storage/layout.h
Outdated
Layout(const column_map_type &column_map, oid_t layout_oid); | ||
|
||
/** @brief Check whether this layout is a row store. */ | ||
inline bool IsRowStore() const { return (layout_type_ == LayoutType::ROW); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove inline
.
src/include/storage/layout.h
Outdated
inline bool IsRowStore() const { return (layout_type_ == LayoutType::ROW); } | ||
|
||
/** @brief Check whether this layout is a column store. */ | ||
inline bool IsColumnStore() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove inline
.
@@ -35,6 +36,122 @@ class TableScanTranslatorTest : public PelotonCodeGenTest { | |||
CreateAndLoadAllColsTable(); | |||
} | |||
|
|||
void ExecuteTileGroupTest(peloton::LayoutType layout_type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you modify the base test class functions to accept a LayoutType
? Then you should reuse the create/loading functions rather than creating a one-off function. This also allows other tests to use custom table layouts for free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have addressed all your other comments except this one.
From what I understood, the PelotonCodeGenTest
does not use the LLVM inserter. The LLVM inserter would only insert into LayoutType::ROW
.
Can I make that assumption or do we plan to change that later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, the tests do not use the inserter, but instead directly insert into the table here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmenon I have modified the CreateTestTables
function to get the layout as a parameter. However, since its called from the constructor, so the entire test suit would get called with the given layout. I have however moved the CreateAndLoad
function to the base class. Please let me know if you want me to change anything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. I did not read the logic as completely as I would have liked, as there is a lot of code here.
Comments added are all style issues. The main one is that oid_t is used in many places where uint32_t or other flavor of int should be used.
src/gc/gc_manager.cpp
Outdated
void GCManager::CheckAndReclaimVarlenColumns(storage::TileGroup *tile_group, | ||
oid_t tuple_id) { | ||
oid_t tile_count = tile_group->tile_count_; | ||
oid_t tile_col_count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tile_count and tile_col_count should be uint32_t not oid_t.
While use of oid_t will compile and run correctly, it is confusing and incorrect style.
src/include/catalog/catalog.h
Outdated
@@ -99,7 +100,8 @@ class Catalog { | |||
const std::string &database_name, const std::string &schema_name, | |||
const std::string &table_name, std::unique_ptr<catalog::Schema>, | |||
concurrency::TransactionContext *txn, bool is_catalog = false, | |||
oid_t tuples_per_tilegroup = DEFAULT_TUPLES_PER_TILEGROUP); | |||
oid_t tuples_per_tilegroup = DEFAULT_TUPLES_PER_TILEGROUP, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oid_t -> uint32_t
src/include/storage/tile_group.h
Outdated
@@ -214,13 +190,12 @@ class TileGroup : public Printable { | |||
oid_t num_tuple_slots; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not oid_t
src/include/storage/tile_group.h
Outdated
@@ -214,13 +190,12 @@ class TileGroup : public Printable { | |||
oid_t num_tuple_slots; | |||
|
|||
// number of tiles | |||
oid_t tile_count; | |||
oid_t tile_count_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not oid_t
|
||
storage::Tile *tile = tile_group->GetTile(tile_itr); | ||
const catalog::Schema &schema = *(tile->GetSchema()); | ||
oid_t tile_column_count = schema.GetColumnCount(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not oid_t
double GetLayoutDifference(const storage::Layout &other) const; | ||
|
||
/** @brief Returns the tile_id of the given column_id. */ | ||
oid_t GetTileIdFromColumnId(oid_t column_id) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not oid_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should also be oid_t
because the function returns a tile_id
.
oid_t GetTileIdFromColumnId(oid_t column_id) const; | ||
|
||
/** @brief Returns the column offset of the column_id. */ | ||
oid_t GetTileColumnOffset(oid_t column_id) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not oid_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be oid_t
because it's the column_oid
within the Tile
.
src/include/storage/layout.h
Outdated
oid_t GetTileColumnOffset(oid_t column_id) const; | ||
|
||
/** @brief Returns the number of columns in the layout. */ | ||
oid_t GetColumnCount() const { return num_columns_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not oid_t
src/include/storage/layout.h
Outdated
/** @brief Sets the tile id and column id w.r.t. of the tile corresponding | ||
* to the specified tile group column id. | ||
*/ | ||
void LocateTileAndColumn(oid_t column_offset, oid_t &tile_offset, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Misnamed? More correctly something more like SetTileAndColumn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry! This function was previously a part of TileGroup
, I copied the comments as is.
I have changed it now. Essentially, we are using references here because we need to return both the tile_id
and tile_column_id
.
src/storage/layout.cpp
Outdated
PELOTON_ASSERT(num_columns_ > column_offset); | ||
|
||
// For row store layout, tile id is always 0 and the tile | ||
// column_id and tile column_id is the same. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment text is unclear...
and the tile column_id and tile column_id is the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The design is nice, code is pretty well documented. I just had a few questions below.
src/catalog/catalog.cpp
Outdated
bool result = | ||
pg_layout->InsertLayout(table_oid, new_layout, pool_.get(), txn); | ||
if (!result) { | ||
LOG_DEBUG("Failed to create a new layout for table %u", table_oid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be LOG_ERROR
?
new const storage::Layout(column_map, layout_oid)); | ||
|
||
// Add the layout the pg_layout table | ||
auto pg_layout = catalog_map_[database_oid]->GetLayoutCatalog(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, but this map seems to be accessed concurrently without a lock in Catalog::BootstrapSystemCatalogs()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/include/storage/layout.h
Outdated
/** @brief used to store the mapping between a tile and its columns | ||
* <tile index> to vector{<original column index, tile column offset>} | ||
*/ | ||
typedef std::map<oid_t, std::vector<std::pair<oid_t, oid_t>>> tile_map_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to rename this to something more descriptive? Maybe TileToColumnMap
?
@@ -76,7 +75,7 @@ class TileGroup : public Printable { | |||
// Tile group constructor | |||
TileGroup(BackendType backend_type, TileGroupHeader *tile_group_header, | |||
AbstractTable *table, const std::vector<catalog::Schema> &schemas, | |||
const column_map_type &column_map, int tuple_count); | |||
std::shared_ptr<const Layout> layout, int tuple_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're passing in a shared pointer, does that mean this can be null? If not, should we make it a reference?
Also, in terms of ownership, does the TileGroup own the layout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The layout is actually shared among multiple TileGroups which are initialized with the same layout. We would want the ptr to be alive as long as we have atleast one TileGroup with the same.
To make sure a TileGroup always get a non-null layout, I am now throwing an exception in the TileGroupFactory if it passed a nullptr
. Let me know if you want me to handle this differently.
d2ce8d2
to
d4e151f
Compare
* Added layout.h. Modified Runtime functions to handle row stores. * Modify TileGroup api * Changed Updater & Inserter to use Layout object * Remove calls from optimizer and executor * Removed all calls to schemas vector from TileGroup * Removed Calls to LocateTileAndColumn for TileGroup * Moved definition of column_map_type. Removed column_map from tile_group * Added GetInfo function * make check pases. Need to change a few more function calls * Convert PL_ASSERT -> PELOTON_ASSERT * Modified DataTable and LayoutTunerTest * Added Layout Test for codegen * Get the tests to actually use different layouts * Modified storage layer, builds successfully. Need to fix tests * Moved GetColumnMapStats -> GetColumnLayoutStats * Minor change in TableScanTranslator * Modify TransformTileGroup * Change layout.cpp to better handle HYBRID layouts * use std::make_shared * Modified tests to change calls to GetTileGroup() * make check passes. Need to add test with hybrid layout. * Modify print functions * Add TODOs for catalog * Added LayoutCatalog. Yet to modify TableCatalog * Added catalog functions * Added DeleteLayouts to delete all layouts of a table_oid * LayoutTunerTest fixed * Fix build failures * Access LayoutCatalog via the global Catalog object * Added Multi layout scan test * Fixed tests after catalog refactor * Failing LayoutCatalog test because of deserialization * Fix all tests * Added documentation * Style Fix * Address review comments + minor clean-up in the layout.h API * Change valid_layout_objects_ * Revert unused changes * Addressed Prashanth's initial comments * Modify CreateTable for tests + modify LayoutTuner * Address Pervaze's review comments. * Changed after Prashanth's review
This PR decouples the logical schema from the physical layout of the data. Previously, each TileGroup contained a vector of schemas for which violates the notion of the storage layer being a physical store. We also had cases where the
LayoutTuner
modifies thecolumn_map
which ended up modifying theschemas
vector via theDataTable
. There were multiple other instances where theSchema
was used to get the layout of aTileGroup
while all we needed was the physical mapping between column_ids and a <tile_id, column_offset> pair. This PR now makes it possible to make schema changes like changing column name, adding & dropping columns without making any calls to the storage layer (i.e., It can now be done entirely in the catalog).Changes:
Layout
andLayoutCatalog
classes to store the layout in a physical object and thepg_layout
.default_layotut_
object to each table (replacingdefault_partition
) andtile_group_layout_
to eachTileGroup
.HYBRID
layouts to thepg_catalog
table. The two pre-defined layouts,ROW
andCOLUMN
stores are not persisted.Layout
on a perTileGroup
while invoking theTableScanTranslator
.TileGroup
has aLayoutType::ROW
layout.LayoutTuner
to tune theLayout
rather than thecolumn_map
. Additionally, the tuning decisions are now transactional.pg_catalog
to theSystemsCatalog
to ensure that the each layout is stored in a catalog with respect its database.Testing:
LayoutTuner
test to ensure it works as expected after the refactor.TileGroupLayoutTest
andSeqScanTest
which ensures that the old engine works as expected.TableScanTranslatorTests
to replicate the tests inTileGroupLayoutTest
andSeqScanTest
. The tests check that the LLVM engine can perform scans on tables different layouts and on tables where eachTileGroup
has a differentLayout
.CatalogTests
to ensure that thepg_layout
stores and retrieves data as expected and its state is transactional.Disclaimer:
Tile
still contains aSchema
which will be used by the old engine, theGetValue
andSetValue
functions and (currently) the LLVM engines to determine the stride in case of aHYBRID
layout. I believe we have to continue supporting theGetValue
andSetValue
functions even after the old engine is deprecated because we will be needing these interfaces in theCatalog
and possibly theOptimizer
.LogicalTile
generated by the executor engine, I had to reconstruct theSchemas
. This isn't very efficient but the idea is that it is currently only used in tests and will soon be deprecated.ColumnCatalog
,StatsCatalog
), theLayoutCatalog
directly returns aLayout
rather than aLayoutCatalogObject
since the API are designed to be read only. In essence, the layout is immutable because if you were changing the layout, you need to change the physical representation. So it felt redundant to create a read only object to be returned by the catalog.@mengranwo Can you please review the catalog changes?
@pmenon Can you please review the changes to the LLVM engine? Mainly,
RuntimeFunctions
,Inserter
,Updater
,TestingCodegenUtil
andTableScanTranslatorTests
.@pervazea Can you please review the overall changes? I have modified a few APIs in the old engine. The current code doesn't break the existing tests and it seems to function as expected. But please let me know if you think there is anything I need to change.
@jarulraj I know you may not have the bandwidth for this, but if possible could you please review the changes to
LayoutTuner
andLayoutTunerTest
. Nobody else has modified that code and it would be great if you could take a look.