-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat[WIP]: JIT compiled SQL expressions #2891
Conversation
…build-dev6 Add ability to compile against MariaDB with new cost model
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that this is work in progress. Hopefully these comments are useful. It looks like we so far only have some interfaces but not any actual compilation, which would have been the main focus of my review.
I tried to compile this in the submodule storage/columnstore/columnstore
under MariaDB/server@313c5a1. I got the build to complete, but the directory storage/columnstore/columnstore/jit/CMakeFiles/jit.dir
in my cmake -G Ninja
based build directory is empty. Other ColumnStore compilation units were being built.
jit/jit.cpp
Outdated
module_identifier_to_memory_manager.erase(module_memory_manager_it); | ||
compiled_code_size.fetch_sub(module.size, std::memory_order_relaxed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which code or data structure owns the memory that module
referes to here? Could the memory actually be freed in the erase
call? If so, then these statements should be swapped with each other to make the code safe.
jit/jit.h
Outdated
// TODO lock when compiling a module | ||
mutable std::mutex jit_lock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For any mutex or any potentially concurrently accessed data structure, it would be useful to document the protection rules. Many of these data members are actually protected by jit_lock
in deleteCompiledModule()
.
jit/jit.h
Outdated
std::unique_ptr<JITSymbolResolver> symbol_resolver; | ||
std::unordered_map<uint64_t, std::unique_ptr<JITModuleMemoryManager>> module_identifier_to_memory_manager; | ||
uint64_t current_module_key = 0; | ||
std::atomic<size_t> compiled_code_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d like to see comment for documenting the lifetime and purpose of each data member. For compiled_code_size
it would be useful to know what (if any) kind of padding or alignment overhead it would include.
Due to the lack of comments, it took me some time to find a counterpart of deleteCompiledModule()
that would increment compiled_code_size
and add something to module_identifier_to_memory_manager
. It is compileModule(std::unique_ptr<llvm::Module>)
. It seems to lack proper jit_lock
protection.
jit/jit.cpp
Outdated
Arena() : page_size(8) | ||
{ | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could live with an implicit default constructor if the initializer expression for page_size
was revised to this value.
jit/jit.cpp
Outdated
inline size_t getPageSize() const | ||
{ | ||
return page_size; | ||
} | ||
|
||
private: | ||
size_t page_size = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see any modification of this page_size
in the code. I think that it could be declared simply as
const size_t page_size = 8;
or even static const
or static constexpr
to save some memory, and then we could avoid explicitly declaring Arena()
. I think that it is rather pointless to declare const
data members private
and to define public accessors for them.
What is the unit here? Bytes? Kilobytes? Some comments would be helpful.
Typically, sysconf(PAGESIZE)
should return 4096 or a larger value.
utils/funcexp/functor.h
Outdated
bool isCompilable(const execplan::CalpontSystemCatalog::ColType& colType) | ||
{ | ||
return false; | ||
} | ||
llvm::Value* compile() | ||
{ | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the actual interesting part is still missing. The body of compile()
seems to be missing return nullptr;
.
jit/compilefunction.cpp
Outdated
void compileFunction(llvm::Module& module, const Func& function) | ||
{ | ||
llvm::IRBuilder<> b(module.getContext()); | ||
function-> | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The member function body does not look like valid C++ to me.
@@ -38,6 +38,8 @@ | |||
|
|||
#include "dataconvert.h" | |||
|
|||
#include <llvm/IR/IRBuilder.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a rather large header file that will #include
many other header files. I think that including that kind of a "jungle" in a rather popular header file (functor.h
is being included from many other header files, which in turn could be included from many compilation units) could significantly include the compilation time.
We only seem to need this for the type name llvm::Value
, which compile()
would return a pointer to. I think that it would be better to define an opaque type and avoid including the header here. Something like this might work:
namespace llvm { class Value; }
jit/compileoperator.cpp
Outdated
} | ||
auto* func_type = llvm::FunctionType::get(return_type, {data_type, isNull_type}, false); | ||
auto* func = | ||
llvm::Function::Create(func_type, llvm::Function::ExternalLinkage, expression->alias(), module); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest
func->setDoesNotThrow();
so that no .eh_frame
or .rela.eh_frame
section will be generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
jit/compileoperator.h
Outdated
JITCompiledOperator<int64_t> compiled_function_int64; | ||
JITCompiledOperator<uint64_t> compiled_function_uint64; | ||
JITCompiledOperator<double> compiled_function_double; | ||
JITCompiledOperator<float> compiled_function_float; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can multiple compiled_function_
be set at a time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, only one pointer will be set at a time.
jit/expressionjit.h
Outdated
int64_t getIntVal(rowgroup::Row& row, bool& isNull) override | ||
{ | ||
return compiledOperator.compiled_function_int64(row.getData(), isNull); | ||
} | ||
uint64_t getUintVal(rowgroup::Row& row, bool& isNull) override | ||
{ | ||
return compiledOperator.compiled_function_int64(row.getData(), isNull); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compiled_function_uint64
seems to be unused. Do we really need this distinction? The ABI for signed and unsigned should be the same.
I would suggest to add some tests and to demonstrate that all this code is covered by the test cases.
jit/expressionjit.h
Outdated
double getDoubleVal(rowgroup::Row& row, bool& isNull) override | ||
{ | ||
return compiledOperator.compiled_function_double(row.getData(), isNull); | ||
} | ||
float getFloatVal(rowgroup::Row& row, bool& isNull) override | ||
{ | ||
return compiledOperator.compiled_function_float(row.getData(), isNull); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we simply have one compiled_function
member that will be reinterpret_cast
to the appropriate type? That could avoid some run-time code duplication.
CMakeLists.txt
Outdated
@@ -29,7 +29,7 @@ endif() | |||
INCLUDE(ExternalProject) | |||
INCLUDE(CheckCXXSourceCompiles) | |||
|
|||
SET(CMAKE_CXX_STANDARD 11) | |||
SET(CMAKE_CXX_STANDARD 14) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use c++20 if compiler handles it via compiler flags. Is this set is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is necessary. If it is not specified as 14 during compilation, an error will be reported.
CMakeLists.txt
Outdated
@@ -358,6 +372,7 @@ ADD_SUBDIRECTORY(writeengine/splitter) | |||
ADD_SUBDIRECTORY(storage-manager) | |||
ADD_SUBDIRECTORY(datatypes) | |||
ADD_SUBDIRECTORY(tests) | |||
ADD_SUBDIRECTORY(jit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This directory must be moved into ./utils and the paths changed accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
jit/CMakeLists.txt
Outdated
|
||
add_library(jit SHARED ${jit_LIB_SRCS}) | ||
|
||
target_link_libraries(jit ${NETSNMP_LIBRARIES} ${LLVM_LIBS} ${LLVM_SYSTEM_LIBS} ${LLVM_LDFLAGS}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can drop ${NETSNMP_LIBRARIES} as this is an obsolete prerequisite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
jit/compileoperator.cpp
Outdated
{ | ||
auto columns = expression.get()->simpleColumnList(); | ||
llvm::IRBuilder<> b(module.getContext()); | ||
compileExternalFunction(module, b); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How many times compileExternalFunction will be actually called?
jit/compileoperator.cpp
Outdated
b.CreateRet(ret); | ||
|
||
// Print the IR | ||
llvm::verifyFunction(*func); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose these two lines will go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
dbcon/execplan/arithmeticoperator.h
Outdated
inline llvm::Value* compile(llvm::IRBuilder<>& b, llvm::Value* data, llvm::Value* isNull, | ||
rowgroup::Row& row, ParseTree* lop, ParseTree* rop, | ||
CalpontSystemCatalog::ColDataType dataType) override; | ||
inline llvm::Value* compileI(llvm::IRBuilder<>& b, llvm::Value* l, llvm::Value* r); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inline llvm::Value* compileI(llvm::IRBuilder<>& b, llvm::Value* l, llvm::Value* r); | |
inline llvm::Value* compileInt_(llvm::IRBuilder<>& b, llvm::Value* l, llvm::Value* r); |
dbcon/execplan/arithmeticoperator.h
Outdated
rowgroup::Row& row, ParseTree* lop, ParseTree* rop, | ||
CalpontSystemCatalog::ColDataType dataType) override; | ||
inline llvm::Value* compileI(llvm::IRBuilder<>& b, llvm::Value* l, llvm::Value* r); | ||
inline llvm::Value* compileF(llvm::IRBuilder<>& b, llvm::Value* l, llvm::Value* r); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inline llvm::Value* compileF(llvm::IRBuilder<>& b, llvm::Value* l, llvm::Value* r); | |
inline llvm::Value* compileFloat_(llvm::IRBuilder<>& b, llvm::Value* l, llvm::Value* r); |
[mod] add CMake config for llvm. [feat] function [feat] msc_jit [fix] errors caused by namespace [mod] resolved the build error
[feat] adjust the structure of the compile method and implement IR for expression [feat] add some testing code for JIT [fix] arena memory allocation error issue [mod] rollback the lib of llvm to the system built-in library [feat] add JIT optimization for simpleColumnInt type [feat] LogicOperator compile
[mod] add isCompilable for column [mod] add check for null row [fix] the issue of compilation errors in the algorithm column IR
[mod] remove some useless param [mod] remove some useless file [feat] add support for day() function [feat] jit add support for func_month() function
[feat] func_day add support for timestamp [feat] jit add external function timestampValueToInt [mod] Modify the implementation of compileExpression
[fix] the logic for func_hour [feat] add the jit support for func_second [mod] delete some useless define
[mod] move jit to utils dir [mod] remove link for NETSNMP_LIBRARIES [mod] standardize the function name
add temp add temp [fix] some Werror feat(cmake): hardcoded llvm include paths for a system llvm-13-dev @ubuntu20 WIP rebase on develop and change the code to compile with system llvm libs >= llvm-13
no hardcoded llvm version
contos7 llvm cmake config llvm conditional include
add llvm deps centos7 dep fix
update llvm to 12 on Ubuntu 20 and to 16 on Debian11 WIP llvm-16 compilation issue
WIP WIP compilation fix WIP introduce an expression name to disambigue JIT symbols in a JIT module
WIP compilation fix
WIP avoid SECOND(TIMESTAMP) compilation
work moved to #3165 |
Summary
MCS uses interepreted execution to calculate SQL expressions results. Here is an example of a SQL expression 'table1.col1 + FLOOR(table2.col1)'. Given that table1.col1 is DECIMAL and table2.col1 is DOUBLE there is a number of conditions that drives the calculation of this relatively simple example in runtime. Given that SQL types and expression tree are known before the query begins it is possible to replace interpretation with JIT to produce specialized compiled bytecode that is:
small
has no or almost no branches
optimized for the specific platform it is run at
This is a research mostly project which goal is to produce a set of microbenchmarks that:
leverages any JIT compiler available, e.g. LLVM, MIR
demonstrates a negative and positive effects of using JIT
Related
gsoc-2023