Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat[WIP]: JIT compiled SQL expressions #2891

Closed
wants to merge 21 commits into from

Conversation

jun0315
Copy link

@jun0315 jun0315 commented Jul 4, 2023

Summary

MCS uses interepreted execution to calculate SQL expressions results. Here is an example of a SQL expression 'table1.col1 + FLOOR(table2.col1)'. Given that table1.col1 is DECIMAL and table2.col1 is DOUBLE there is a number of conditions that drives the calculation of this relatively simple example in runtime. Given that SQL types and expression tree are known before the query begins it is possible to replace interpretation with JIT to produce specialized compiled bytecode that is:

small
has no or almost no branches
optimized for the specific platform it is run at
This is a research mostly project which goal is to produce a set of microbenchmarks that:

leverages any JIT compiler available, e.g. LLVM, MIR
demonstrates a negative and positive effects of using JIT

Related

gsoc-2023

@drrtuy drrtuy requested review from drrtuy and dr-m July 5, 2023 10:07
Copy link
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this is work in progress. Hopefully these comments are useful. It looks like we so far only have some interfaces but not any actual compilation, which would have been the main focus of my review.

I tried to compile this in the submodule storage/columnstore/columnstore under MariaDB/server@313c5a1. I got the build to complete, but the directory storage/columnstore/columnstore/jit/CMakeFiles/jit.dir in my cmake -G Ninja based build directory is empty. Other ColumnStore compilation units were being built.

jit/jit.cpp Outdated
Comment on lines 431 to 432
module_identifier_to_memory_manager.erase(module_memory_manager_it);
compiled_code_size.fetch_sub(module.size, std::memory_order_relaxed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which code or data structure owns the memory that module referes to here? Could the memory actually be freed in the erase call? If so, then these statements should be swapped with each other to make the code safe.

jit/jit.h Outdated
Comment on lines 43 to 44
// TODO lock when compiling a module
mutable std::mutex jit_lock;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For any mutex or any potentially concurrently accessed data structure, it would be useful to document the protection rules. Many of these data members are actually protected by jit_lock in deleteCompiledModule().

jit/jit.h Outdated
std::unique_ptr<JITSymbolResolver> symbol_resolver;
std::unordered_map<uint64_t, std::unique_ptr<JITModuleMemoryManager>> module_identifier_to_memory_manager;
uint64_t current_module_key = 0;
std::atomic<size_t> compiled_code_size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d like to see comment for documenting the lifetime and purpose of each data member. For compiled_code_size it would be useful to know what (if any) kind of padding or alignment overhead it would include.

Due to the lack of comments, it took me some time to find a counterpart of deleteCompiledModule() that would increment compiled_code_size and add something to module_identifier_to_memory_manager. It is compileModule(std::unique_ptr<llvm::Module>). It seems to lack proper jit_lock protection.

jit/jit.cpp Outdated
Comment on lines 37 to 39
Arena() : page_size(8)
{
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could live with an implicit default constructor if the initializer expression for page_size was revised to this value.

jit/jit.cpp Outdated
Comment on lines 108 to 114
inline size_t getPageSize() const
{
return page_size;
}

private:
size_t page_size = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see any modification of this page_size in the code. I think that it could be declared simply as

  const size_t page_size = 8;

or even static const or static constexpr to save some memory, and then we could avoid explicitly declaring Arena(). I think that it is rather pointless to declare const data members private and to define public accessors for them.

What is the unit here? Bytes? Kilobytes? Some comments would be helpful.

Typically, sysconf(PAGESIZE) should return 4096 or a larger value.

Comment on lines 87 to 113
bool isCompilable(const execplan::CalpontSystemCatalog::ColType& colType)
{
return false;
}
llvm::Value* compile()
{
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the actual interesting part is still missing. The body of compile() seems to be missing return nullptr;.

Comment on lines 12 to 16
void compileFunction(llvm::Module& module, const Func& function)
{
llvm::IRBuilder<> b(module.getContext());
function->
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The member function body does not look like valid C++ to me.

@@ -38,6 +38,8 @@

#include "dataconvert.h"

#include <llvm/IR/IRBuilder.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a rather large header file that will #include many other header files. I think that including that kind of a "jungle" in a rather popular header file (functor.h is being included from many other header files, which in turn could be included from many compilation units) could significantly include the compilation time.

We only seem to need this for the type name llvm::Value, which compile() would return a pointer to. I think that it would be better to define an opaque type and avoid including the header here. Something like this might work:

namespace llvm { class Value; }

}
auto* func_type = llvm::FunctionType::get(return_type, {data_type, isNull_type}, false);
auto* func =
llvm::Function::Create(func_type, llvm::Function::ExternalLinkage, expression->alias(), module);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest

func->setDoesNotThrow();

so that no .eh_frame or .rela.eh_frame section will be generated.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

JITCompiledOperator<int64_t> compiled_function_int64;
JITCompiledOperator<uint64_t> compiled_function_uint64;
JITCompiledOperator<double> compiled_function_double;
JITCompiledOperator<float> compiled_function_float;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can multiple compiled_function_ be set at a time?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, only one pointer will be set at a time.

Comment on lines 42 to 49
int64_t getIntVal(rowgroup::Row& row, bool& isNull) override
{
return compiledOperator.compiled_function_int64(row.getData(), isNull);
}
uint64_t getUintVal(rowgroup::Row& row, bool& isNull) override
{
return compiledOperator.compiled_function_int64(row.getData(), isNull);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compiled_function_uint64 seems to be unused. Do we really need this distinction? The ABI for signed and unsigned should be the same.

I would suggest to add some tests and to demonstrate that all this code is covered by the test cases.

Comment on lines 50 to 57
double getDoubleVal(rowgroup::Row& row, bool& isNull) override
{
return compiledOperator.compiled_function_double(row.getData(), isNull);
}
float getFloatVal(rowgroup::Row& row, bool& isNull) override
{
return compiledOperator.compiled_function_float(row.getData(), isNull);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we simply have one compiled_function member that will be reinterpret_cast to the appropriate type? That could avoid some run-time code duplication.

CMakeLists.txt Outdated
@@ -29,7 +29,7 @@ endif()
INCLUDE(ExternalProject)
INCLUDE(CheckCXXSourceCompiles)

SET(CMAKE_CXX_STANDARD 11)
SET(CMAKE_CXX_STANDARD 14)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use c++20 if compiler handles it via compiler flags. Is this set is necessary?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is necessary. If it is not specified as 14 during compilation, an error will be reported.

CMakeLists.txt Outdated
@@ -358,6 +372,7 @@ ADD_SUBDIRECTORY(writeengine/splitter)
ADD_SUBDIRECTORY(storage-manager)
ADD_SUBDIRECTORY(datatypes)
ADD_SUBDIRECTORY(tests)
ADD_SUBDIRECTORY(jit)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This directory must be moved into ./utils and the paths changed accordingly.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


add_library(jit SHARED ${jit_LIB_SRCS})

target_link_libraries(jit ${NETSNMP_LIBRARIES} ${LLVM_LIBS} ${LLVM_SYSTEM_LIBS} ${LLVM_LDFLAGS})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can drop ${NETSNMP_LIBRARIES} as this is an obsolete prerequisite.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

{
auto columns = expression.get()->simpleColumnList();
llvm::IRBuilder<> b(module.getContext());
compileExternalFunction(module, b);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many times compileExternalFunction will be actually called?

b.CreateRet(ret);

// Print the IR
llvm::verifyFunction(*func);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose these two lines will go away.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

inline llvm::Value* compile(llvm::IRBuilder<>& b, llvm::Value* data, llvm::Value* isNull,
rowgroup::Row& row, ParseTree* lop, ParseTree* rop,
CalpontSystemCatalog::ColDataType dataType) override;
inline llvm::Value* compileI(llvm::IRBuilder<>& b, llvm::Value* l, llvm::Value* r);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
inline llvm::Value* compileI(llvm::IRBuilder<>& b, llvm::Value* l, llvm::Value* r);
inline llvm::Value* compileInt_(llvm::IRBuilder<>& b, llvm::Value* l, llvm::Value* r);

rowgroup::Row& row, ParseTree* lop, ParseTree* rop,
CalpontSystemCatalog::ColDataType dataType) override;
inline llvm::Value* compileI(llvm::IRBuilder<>& b, llvm::Value* l, llvm::Value* r);
inline llvm::Value* compileF(llvm::IRBuilder<>& b, llvm::Value* l, llvm::Value* r);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
inline llvm::Value* compileF(llvm::IRBuilder<>& b, llvm::Value* l, llvm::Value* r);
inline llvm::Value* compileFloat_(llvm::IRBuilder<>& b, llvm::Value* l, llvm::Value* r);

utils/rowgroup/rowgroup.cpp Outdated Show resolved Hide resolved
@drrtuy drrtuy marked this pull request as ready for review September 25, 2023 19:21
@jun0315 jun0315 marked this pull request as draft June 2, 2024 17:01
jun0315 added 9 commits June 3, 2024 22:21
[mod] add CMake config for llvm.

[feat] function

[feat] msc_jit

[fix] errors caused by namespace

[mod] resolved the build error
[feat] adjust the structure of the compile method and implement IR for expression

[feat] add some testing code for JIT

[fix] arena memory allocation error issue

[mod] rollback the lib of llvm to the system built-in library

[feat] add JIT optimization for simpleColumnInt type

[feat] LogicOperator compile
[mod] add isCompilable for column

[mod] add check for null row

[fix] the issue of compilation errors in the algorithm column IR
[mod] remove some useless param

[mod] remove some useless file

[feat] add support for day() function

[feat] jit add support for func_month() function
[feat] func_day add support for timestamp

[feat] jit add external function timestampValueToInt

[mod] Modify the implementation of compileExpression
[fix] the logic for func_hour

[feat] add the jit support for func_second

[mod] delete some useless define
[mod] move jit to utils dir

[mod] remove link for NETSNMP_LIBRARIES

[mod] standardize the function name
add temp

add temp

[fix] some Werror

feat(cmake): hardcoded llvm include paths for a system llvm-13-dev @ubuntu20

WIP rebase on develop and change the code to compile with system llvm libs >= llvm-13
no hardcoded llvm version
jun0315 added 6 commits June 3, 2024 22:23
contos7 llvm cmake config

llvm conditional include
add llvm deps

centos7 dep fix
update llvm to 12 on Ubuntu 20 and to 16 on Debian11

WIP llvm-16 compilation issue
WIP

WIP compilation fix

WIP introduce an expression name to disambigue JIT symbols in a JIT module
WIP avoid SECOND(TIMESTAMP) compilation
@mariadb-LeonidFedorov
Copy link
Collaborator

work moved to #3165

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants