theme

background

title

info

class

drawings

transition

mdc

overviewSnapshots

coverAuthor

coverAuthorUrl

academic

https://cover.sli.dev

YeSQL Report

## YeSQL Report Presentation slides for my colleague and only for academic purposes. Learn more at https://mocusez.site/zh-CN/posts/78ca1/

text-center

persist

slide-left

true

MocusEZ

https://github.com/mocusez

YeSQL: “You extend SQL”

with Rich and Highly Performant User-Defined Functions in Relational Databases

Presenter: MocusEZ

transition: fade layout: two-cols

Authors

Yannis Foufoulas
- U. of Athens, Athena R.C.
- johnfouf@di.uoa.gr
Alkis Simitsis
- Athena Research Center
- alkis@athenarc.gr
Lefteris Stamatogiannakis
- University of Athens
- estama@gmail.com
Yannis Ioannidis
- U. of Athens, Athena R.C.
- yannis@di.uoa.gr

::right::

transition: fade

layout: two-cols

Motivation

Many programming language tools to assist developers design pipelines
- But: Complicated ecosystem, unscalable processing
Relational DBMSs offer efficient large data processing
- But: SQL has limited expressive power
UDFs in SQL merge relational and programming syntax and semantics
- But: Impedance mismatch between declarative (SQL) and procedural (e.g., Python) operation

transition: fade

Main issues addressed

Mismatch between the relational (SQL) evaluation and the procedural (Python) execution

(a) Context switching overhead:
- one facility needs to invoke the other through various levels of indirection. This is potentially expensive when performed frequently.
(b) Data conversion overhead:
- data is represented differently in the two environments and need to be wrapped/unwrapped or checked (e.g., for overflow) and encoded/decoded.

transition: fade

Our Approach: YeSQL

Usability and expressiveness
- Stateful, parametric, polymorphic, dynamically typed, scalar/aggregate/table UDFs
JIT-compiled UDFs and stateful UDFs
- UDF parallelization and UDF fusion

Performance enhancements
- Tracing JIT compilation
- Seamless integration with the DBMS(Mainly dependent on CFFI)
- UDF fusion
- Parallelism
- Stateful UDFs.

transition: fade layout: two-cols

Tracing JIT compilation & Parallelism

"Tracing Just-In-Time (JIT) compilation is an approach to dynamic code optimization that focuses on identifying and optimizing the most frequently executed paths, or "hot paths," within a program. Instead of compiling entire methods or functions, tracing JIT compilers record and compile a sequence of instructions as the program runs, specifically targeting these hot paths to produce highly optimized machine code. This optimized code can then be reused in future executions, making frequently executed code much faster."

Examples of tracing JIT implementations include PyPy (for Python) and LuaJIT (for Lua), both of which achieve significant performance gains by optimizing frequently executed paths in their respective interpreted languages.

::right::

Using PyPy Instead of CPython

transition: fade

Seamless Integration with DBMS

"UDFs are wrapped using embedded CFFI"

transition: fade

UDF Fusion

Fusable UDFs
- The second UDF’s input data is the same as the first UDF’s output
- The argument data types are available in the query plan

Example: fuse two scalar UDFs

transition: fade layout: two-cols

Stateful UDFs

Stateful UDFs are functions that can maintain state (memory) across multiple invocations during data processing. Unlike regular UDFs which process each row independently, stateful UDFs can remember information from previous rows and use it in processing subsequent rows.

Key characteristics of Stateful UDFs:

State Maintenance

Can store and update variables between function calls

Useful for running calculations, aggregations, or pattern detection

Support in Different Systems:

Apache Flink: Extensive support for stateful operations

Apache Spark: Supported through mapGroupsWithState and flatMapGroupsWithState

Many streaming systems support stateful processing

::right::

e.g., via a global dictionary

globaldict = {}
def var(arg1, arg2 = None):
if arg2 is not None:
  globaldict[arg1] = arg2
  return True
else:
  return globaldict[arg1]

SELECT var(‘a’, ’HELLO WORLD’);
>> 1
SELECT lower(var(‘a’));
>> hello world

transition: fade

Evaluation Results

Significant performance improvements, achieving up to 68x speedups compared to other traditional python methods.

layout: iframe-right url: https://mocusez.site/zh-CN/posts/78ca1/ class: my-cool-content-on-the-left

More Deatils Information

My Chinese Blog：
如何让数据库中的Python跑的更快-VLDB22-YeSQL文章阅读

Relative Information:
31st Symposium on Advanced Database Systems

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

slides.md

slides.md

YeSQL: “You extend SQL”

with Rich and Highly Performant User-Defined Functions in Relational Databases

transition: fade layout: two-cols

Authors

layout: two-cols

Motivation

transition: fade

Main issues addressed

Mismatch between the relational (SQL) evaluation and the procedural (Python) execution

transition: fade

Our Approach: YeSQL

transition: fade layout: two-cols

Tracing JIT compilation & Parallelism

transition: fade

Seamless Integration with DBMS

transition: fade

UDF Fusion

transition: fade layout: two-cols

Stateful UDFs

transition: fade

Evaluation Results

layout: iframe-right url: https://mocusez.site/zh-CN/posts/78ca1/ class: my-cool-content-on-the-left

More Deatils Information

layout: center

Thanks for Listening

Files

slides.md

Latest commit

History

slides.md

File metadata and controls

YeSQL: “You extend SQL”

with Rich and Highly Performant User-Defined Functions in Relational Databases

transition: fade layout: two-cols

Authors

layout: two-cols

Motivation

transition: fade

Main issues addressed

Mismatch between the relational (SQL) evaluation and the procedural (Python) execution

transition: fade

Our Approach: YeSQL

transition: fade layout: two-cols

Tracing JIT compilation & Parallelism

transition: fade

Seamless Integration with DBMS

transition: fade

UDF Fusion

transition: fade layout: two-cols

Stateful UDFs

transition: fade

Evaluation Results

layout: iframe-right url: https://mocusez.site/zh-CN/posts/78ca1/ class: my-cool-content-on-the-left

More Deatils Information

layout: center

Thanks for Listening