About Substrait

Substrait: Cross-Language Serialization for Relational Algebra

Project Vision

The Substrait project aims to create a well-defined, cross-language specification for data compute operations. The specification declares a set of common operations, defines their semantics, and describes their behavior unambiguously. The project also defines extension points and serialized representations of the specification.

In many ways, the goal of this project is similar to that of the Apache Arrow project. Arrow is focused on a standardized memory representation of columnar data. Substrait is focused on what should be done to data.

Why not use SQL?

SQL is a well-known language for describing queries against relational data. It is designed to be simple and to be easily read and written by humans. Substrait is not intended as a replacement for SQL; it works alongside SQL to provide capabilities that SQL lacks. SQL is not a great fit for the systems that actually execute a query, because it does not provide sufficient detail and is not represented in a format that is easy for machines to process. Because of this, most modern systems first translate the SQL query into a query plan, sometimes called the execution plan. There can be multiple levels of query plan (e.g. logical and physical), a plan may be split up and distributed across multiple systems, and a plan often undergoes simplifying or optimizing transformations. The SQL standard does not define the format of the query or execution plan, and there is no open plan format supported by a broad set of systems. Substrait was created to provide a standard, open format for these query plans.
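A query plan is essentially a tree of relational operators. As a toy illustration (using ad-hoc Python dicts, not Substrait's actual protobuf-based representation), the query `SELECT name FROM users WHERE age > 21` might decompose into a read at the leaf, a filter above it, and a projection at the root:

```python
# A simplified sketch of a logical query plan as nested dicts.
# This is NOT the Substrait schema; it only illustrates the kind of
# operator tree that a standard plan format must be able to capture.

# SELECT name FROM users WHERE age > 21
logical_plan = {
    "project": {
        "expressions": [{"field": "name"}],
        "input": {
            "filter": {
                "condition": {"gt": [{"field": "age"}, {"literal": 21}]},
                "input": {"read": {"table": "users"}},
            }
        },
    }
}

def root_operation(plan):
    """Return the name of the plan's root relational operation."""
    return next(iter(plan))

print(root_operation(logical_plan))  # the projection sits at the root
```

A consumer walks such a tree from the root down (or executes it from the leaves up); a shared, unambiguous encoding of this structure is exactly what a SQL string does not provide.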

Why not just do this within an existing OSS project?

A key goal of the Substrait project is not to be coupled to any single existing technology. Trying to get people involved in something can be difficult when it seems to be driven primarily by the opinions and habits of a single community. In many ways, this situation is similar to the early days of Arrow. The precursor to Arrow was the Apache Drill ValueVectors concept. As part of creating Arrow, Wes and Jacques recognized the need to create a new community to build a fresh consensus (beyond just what the Apache Drill community wanted). This separation and new independent community was a key ingredient in Arrow's current success. The needs here are much the same: many separate communities could benefit from Substrait, but each has its own pain points, type systems, development processes, and timelines. To help resolve these tensions, one approach proposed in Substrait is to set a bar: at least two of the top four OSS data technologies (Arrow, Spark, Iceberg, Trino) must support something before it is incorporated directly into the Substrait specification. (Another goal is to support strong extension points at key locations so that this bar does not limit broad adoption.)

  • Apache Calcite: Many ideas in Substrait are inspired by the Calcite project. Calcite is a great JVM-based SQL query parsing and optimization framework. A key goal of the Substrait project is to expose Calcite capabilities more easily to non-JVM technologies as well as expose query planning operations as microservices.
  • Apache Arrow: The Arrow format for data is what the Substrait specification attempts to be for compute expressions. A key goal of Substrait is to enable Substrait producers to execute work within the Arrow Rust and C++ compute kernels.

Why the name Substrait?

A strait is a narrow channel of water connecting two larger bodies of water. In analytics, data is often thought of as water. Substrait is focused on the instructions related to the data: what defines or supports the movement of water between one or more larger systems. Thus, the underlayment for the strait connecting different pools of water: sub-strait.

Wt?A.errors:[A]}var f=this._finalizers;if(f){this._finalizers=null;try{for(var u=ue(f),h=u.next();!h.done;h=u.next()){var w=h.value;try{co(w)}catch(A){i=i!=null?i:[],A instanceof Wt?i=z(z([],V(i)),V(A.errors)):i.push(A)}}}catch(A){o={error:A}}finally{try{h&&!h.done&&(n=u.return)&&n.call(u)}finally{if(o)throw o.error}}}if(i)throw new Wt(i)}},e.prototype.add=function(t){var r;if(t&&t!==this)if(this.closed)co(t);else{if(t instanceof e){if(t.closed||t._hasParent(this))return;t._addParent(this)}(this._finalizers=(r=this._finalizers)!==null&&r!==void 0?r:[]).push(t)}},e.prototype._hasParent=function(t){var r=this._parentage;return r===t||Array.isArray(r)&&r.includes(t)},e.prototype._addParent=function(t){var r=this._parentage;this._parentage=Array.isArray(r)?(r.push(t),r):r?[r,t]:t},e.prototype._removeParent=function(t){var r=this._parentage;r===t?this._parentage=null:Array.isArray(r)&&Ve(r,t)},e.prototype.remove=function(t){var r=this._finalizers;r&&Ve(r,t),t instanceof e&&t._removeParent(this)},e.EMPTY=function(){var t=new e;return t.closed=!0,t}(),e}();var Er=Ie.EMPTY;function Dt(e){return e instanceof Ie||e&&"closed"in e&&k(e.remove)&&k(e.add)&&k(e.unsubscribe)}function co(e){k(e)?e():e.unsubscribe()}var ke={onUnhandledError:null,onStoppedNotification:null,Promise:void 0,useDeprecatedSynchronousErrorHandling:!1,useDeprecatedNextContext:!1};var lt={setTimeout:function(e,t){for(var r=[],o=2;o0},enumerable:!1,configurable:!0}),t.prototype._trySubscribe=function(r){return this._throwIfClosed(),e.prototype._trySubscribe.call(this,r)},t.prototype._subscribe=function(r){return this._throwIfClosed(),this._checkFinalizedStatuses(r),this._innerSubscribe(r)},t.prototype._innerSubscribe=function(r){var o=this,n=this,i=n.hasError,s=n.isStopped,a=n.observers;return i||s?Er:(this.currentObservers=null,a.push(r),new Ie(function(){o.currentObservers=null,Ve(a,r)}))},t.prototype._checkFinalizedStatuses=function(r){var 
o=this,n=o.hasError,i=o.thrownError,s=o.isStopped;n?r.error(i):s&&r.complete()},t.prototype.asObservable=function(){var r=new j;return r.source=this,r},t.create=function(r,o){return new vo(r,o)},t}(j);var vo=function(e){se(t,e);function t(r,o){var n=e.call(this)||this;return n.destination=r,n.source=o,n}return t.prototype.next=function(r){var o,n;(n=(o=this.destination)===null||o===void 0?void 0:o.next)===null||n===void 0||n.call(o,r)},t.prototype.error=function(r){var o,n;(n=(o=this.destination)===null||o===void 0?void 0:o.error)===null||n===void 0||n.call(o,r)},t.prototype.complete=function(){var r,o;(o=(r=this.destination)===null||r===void 0?void 0:r.complete)===null||o===void 0||o.call(r)},t.prototype._subscribe=function(r){var o,n;return(n=(o=this.source)===null||o===void 0?void 0:o.subscribe(r))!==null&&n!==void 0?n:Er},t}(v);var St={now:function(){return(St.delegate||Date).now()},delegate:void 0};var Ot=function(e){se(t,e);function t(r,o,n){r===void 0&&(r=1/0),o===void 0&&(o=1/0),n===void 0&&(n=St);var i=e.call(this)||this;return i._bufferSize=r,i._windowTime=o,i._timestampProvider=n,i._buffer=[],i._infiniteTimeWindow=!0,i._infiniteTimeWindow=o===1/0,i._bufferSize=Math.max(1,r),i._windowTime=Math.max(1,o),i}return t.prototype.next=function(r){var o=this,n=o.isStopped,i=o._buffer,s=o._infiniteTimeWindow,a=o._timestampProvider,c=o._windowTime;n||(i.push(r),!s&&i.push(a.now()+c)),this._trimBuffer(),e.prototype.next.call(this,r)},t.prototype._subscribe=function(r){this._throwIfClosed(),this._trimBuffer();for(var o=this._innerSubscribe(r),n=this,i=n._infiniteTimeWindow,s=n._buffer,a=s.slice(),c=0;c0?e.prototype.requestAsyncId.call(this,r,o,n):(r.actions.push(this),r._scheduled||(r._scheduled=ut.requestAnimationFrame(function(){return r.flush(void 0)})))},t.prototype.recycleAsyncId=function(r,o,n){var i;if(n===void 0&&(n=0),n!=null?n>0:this.delay>0)return e.prototype.recycleAsyncId.call(this,r,o,n);var s=r.actions;o!=null&&((i=s[s.length-1])===null||i===void 
0?void 0:i.id)!==o&&(ut.cancelAnimationFrame(o),r._scheduled=void 0)},t}(zt);var yo=function(e){se(t,e);function t(){return e!==null&&e.apply(this,arguments)||this}return t.prototype.flush=function(r){this._active=!0;var o=this._scheduled;this._scheduled=void 0;var n=this.actions,i;r=r||n.shift();do if(i=r.execute(r.state,r.delay))break;while((r=n[0])&&r.id===o&&n.shift());if(this._active=!1,i){for(;(r=n[0])&&r.id===o&&n.shift();)r.unsubscribe();throw i}},t}(qt);var de=new yo(xo);var L=new j(function(e){return e.complete()});function Kt(e){return e&&k(e.schedule)}function _r(e){return e[e.length-1]}function Je(e){return k(_r(e))?e.pop():void 0}function Ae(e){return Kt(_r(e))?e.pop():void 0}function Qt(e,t){return typeof _r(e)=="number"?e.pop():t}var dt=function(e){return e&&typeof e.length=="number"&&typeof e!="function"};function Yt(e){return k(e==null?void 0:e.then)}function Bt(e){return k(e[ft])}function Gt(e){return Symbol.asyncIterator&&k(e==null?void 0:e[Symbol.asyncIterator])}function Jt(e){return new TypeError("You provided "+(e!==null&&typeof e=="object"?"an invalid object":"'"+e+"'")+" where a stream was expected. 
You can provide an Observable, Promise, ReadableStream, Array, AsyncIterable, or Iterable.")}function Di(){return typeof Symbol!="function"||!Symbol.iterator?"@@iterator":Symbol.iterator}var Xt=Di();function Zt(e){return k(e==null?void 0:e[Xt])}function er(e){return ao(this,arguments,function(){var r,o,n,i;return Ut(this,function(s){switch(s.label){case 0:r=e.getReader(),s.label=1;case 1:s.trys.push([1,,9,10]),s.label=2;case 2:return[4,ot(r.read())];case 3:return o=s.sent(),n=o.value,i=o.done,i?[4,ot(void 0)]:[3,5];case 4:return[2,s.sent()];case 5:return[4,ot(n)];case 6:return[4,s.sent()];case 7:return s.sent(),[3,2];case 8:return[3,10];case 9:return r.releaseLock(),[7];case 10:return[2]}})})}function tr(e){return k(e==null?void 0:e.getReader)}function N(e){if(e instanceof j)return e;if(e!=null){if(Bt(e))return Ni(e);if(dt(e))return Vi(e);if(Yt(e))return zi(e);if(Gt(e))return Eo(e);if(Zt(e))return qi(e);if(tr(e))return Ki(e)}throw Jt(e)}function Ni(e){return new j(function(t){var r=e[ft]();if(k(r.subscribe))return r.subscribe(t);throw new TypeError("Provided object does not correctly implement Symbol.observable")})}function Vi(e){return new j(function(t){for(var r=0;r=2;return function(o){return o.pipe(e?g(function(n,i){return e(n,i,o)}):ce,ye(1),r?Qe(t):jo(function(){return new or}))}}function $r(e){return e<=0?function(){return L}:x(function(t,r){var o=[];t.subscribe(S(r,function(n){o.push(n),e=2,!0))}function le(e){e===void 0&&(e={});var t=e.connector,r=t===void 0?function(){return new v}:t,o=e.resetOnError,n=o===void 0?!0:o,i=e.resetOnComplete,s=i===void 0?!0:i,a=e.resetOnRefCountZero,c=a===void 0?!0:a;return function(p){var l,f,u,h=0,w=!1,A=!1,Z=function(){f==null||f.unsubscribe(),f=void 0},te=function(){Z(),l=u=void 0,w=A=!1},J=function(){var C=l;te(),C==null||C.unsubscribe()};return x(function(C,ct){h++,!A&&!w&&Z();var Ne=u=u!=null?u:r();ct.add(function(){h--,h===0&&!A&&!w&&(f=Pr(J,c))}),Ne.subscribe(ct),!l&&h>0&&(l=new it({next:function(Pe){return 
Ne.next(Pe)},error:function(Pe){A=!0,Z(),f=Pr(te,n,Pe),Ne.error(Pe)},complete:function(){w=!0,Z(),f=Pr(te,s),Ne.complete()}}),N(C).subscribe(l))})(p)}}function Pr(e,t){for(var r=[],o=2;oe.next(document)),e}function R(e,t=document){return Array.from(t.querySelectorAll(e))}function P(e,t=document){let r=me(e,t);if(typeof r=="undefined")throw new ReferenceError(`Missing element: expected "${e}" to be present`);return r}function me(e,t=document){return t.querySelector(e)||void 0}function Re(){var e,t,r,o;return(o=(r=(t=(e=document.activeElement)==null?void 0:e.shadowRoot)==null?void 0:t.activeElement)!=null?r:document.activeElement)!=null?o:void 0}var la=T(d(document.body,"focusin"),d(document.body,"focusout")).pipe(be(1),q(void 0),m(()=>Re()||document.body),B(1));function vt(e){return la.pipe(m(t=>e.contains(t)),Y())}function Vo(e,t){return T(d(e,"mouseenter").pipe(m(()=>!0)),d(e,"mouseleave").pipe(m(()=>!1))).pipe(t?be(t):ce,q(!1))}function Ue(e){return{x:e.offsetLeft,y:e.offsetTop}}function zo(e){return T(d(window,"load"),d(window,"resize")).pipe(Me(0,de),m(()=>Ue(e)),q(Ue(e)))}function ir(e){return{x:e.scrollLeft,y:e.scrollTop}}function et(e){return T(d(e,"scroll"),d(window,"resize")).pipe(Me(0,de),m(()=>ir(e)),q(ir(e)))}function qo(e,t){if(typeof t=="string"||typeof t=="number")e.innerHTML+=t.toString();else if(t instanceof Node)e.appendChild(t);else if(Array.isArray(t))for(let r of t)qo(e,r)}function E(e,t,...r){let o=document.createElement(e);if(t)for(let n of Object.keys(t))typeof t[n]!="undefined"&&(typeof t[n]!="boolean"?o.setAttribute(n,t[n]):o.setAttribute(n,""));for(let n of r)qo(o,n);return o}function ar(e){if(e>999){let t=+((e-950)%1e3>99);return`${((e+1e-6)/1e3).toFixed(t)}k`}else return e.toString()}function gt(e){let t=E("script",{src:e});return H(()=>(document.head.appendChild(t),T(d(t,"load"),d(t,"error").pipe(b(()=>Ar(()=>new ReferenceError(`Invalid script: ${e}`))))).pipe(m(()=>{}),_(()=>document.head.removeChild(t)),ye(1))))}var Ko=new 
v,ma=H(()=>typeof ResizeObserver=="undefined"?gt("https://unpkg.com/resize-observer-polyfill"):$(void 0)).pipe(m(()=>new ResizeObserver(e=>{for(let t of e)Ko.next(t)})),b(e=>T(qe,$(e)).pipe(_(()=>e.disconnect()))),B(1));function pe(e){return{width:e.offsetWidth,height:e.offsetHeight}}function Ee(e){return ma.pipe(y(t=>t.observe(e)),b(t=>Ko.pipe(g(({target:r})=>r===e),_(()=>t.unobserve(e)),m(()=>pe(e)))),q(pe(e)))}function xt(e){return{width:e.scrollWidth,height:e.scrollHeight}}function sr(e){let t=e.parentElement;for(;t&&(e.scrollWidth<=t.scrollWidth&&e.scrollHeight<=t.scrollHeight);)t=(e=t).parentElement;return t?e:void 0}var Qo=new v,fa=H(()=>$(new IntersectionObserver(e=>{for(let t of e)Qo.next(t)},{threshold:0}))).pipe(b(e=>T(qe,$(e)).pipe(_(()=>e.disconnect()))),B(1));function yt(e){return fa.pipe(y(t=>t.observe(e)),b(t=>Qo.pipe(g(({target:r})=>r===e),_(()=>t.unobserve(e)),m(({isIntersecting:r})=>r))))}function Yo(e,t=16){return et(e).pipe(m(({y:r})=>{let o=pe(e),n=xt(e);return r>=n.height-o.height-t}),Y())}var cr={drawer:P("[data-md-toggle=drawer]"),search:P("[data-md-toggle=search]")};function Bo(e){return cr[e].checked}function Be(e,t){cr[e].checked!==t&&cr[e].click()}function We(e){let t=cr[e];return d(t,"change").pipe(m(()=>t.checked),q(t.checked))}function ua(e,t){switch(e.constructor){case HTMLInputElement:return e.type==="radio"?/^Arrow/.test(t):!0;case HTMLSelectElement:case HTMLTextAreaElement:return!0;default:return e.isContentEditable}}function da(){return T(d(window,"compositionstart").pipe(m(()=>!0)),d(window,"compositionend").pipe(m(()=>!1))).pipe(q(!1))}function Go(){let e=d(window,"keydown").pipe(g(t=>!(t.metaKey||t.ctrlKey)),m(t=>({mode:Bo("search")?"search":"global",type:t.key,claim(){t.preventDefault(),t.stopPropagation()}})),g(({mode:t,type:r})=>{if(t==="global"){let o=Re();if(typeof o!="undefined")return!ua(o,r)}return!0}),le());return da().pipe(b(t=>t?L:e))}function ve(){return new URL(location.href)}function 
st(e,t=!1){if(G("navigation.instant")&&!t){let r=E("a",{href:e.href});document.body.appendChild(r),r.click(),r.remove()}else location.href=e.href}function Jo(){return new v}function Xo(){return location.hash.slice(1)}function Zo(e){let t=E("a",{href:e});t.addEventListener("click",r=>r.stopPropagation()),t.click()}function ha(e){return T(d(window,"hashchange"),e).pipe(m(Xo),q(Xo()),g(t=>t.length>0),B(1))}function en(e){return ha(e).pipe(m(t=>me(`[id="${t}"]`)),g(t=>typeof t!="undefined"))}function At(e){let t=matchMedia(e);return nr(r=>t.addListener(()=>r(t.matches))).pipe(q(t.matches))}function tn(){let e=matchMedia("print");return T(d(window,"beforeprint").pipe(m(()=>!0)),d(window,"afterprint").pipe(m(()=>!1))).pipe(q(e.matches))}function Ur(e,t){return e.pipe(b(r=>r?t():L))}function Wr(e,t){return new j(r=>{let o=new XMLHttpRequest;return o.open("GET",`${e}`),o.responseType="blob",o.addEventListener("load",()=>{o.status>=200&&o.status<300?(r.next(o.response),r.complete()):r.error(new Error(o.statusText))}),o.addEventListener("error",()=>{r.error(new Error("Network error"))}),o.addEventListener("abort",()=>{r.complete()}),typeof(t==null?void 0:t.progress$)!="undefined"&&(o.addEventListener("progress",n=>{var i;if(n.lengthComputable)t.progress$.next(n.loaded/n.total*100);else{let s=(i=o.getResponseHeader("Content-Length"))!=null?i:0;t.progress$.next(n.loaded/+s*100)}}),t.progress$.next(5)),o.send(),()=>o.abort()})}function De(e,t){return Wr(e,t).pipe(b(r=>r.text()),m(r=>JSON.parse(r)),B(1))}function rn(e,t){let r=new DOMParser;return Wr(e,t).pipe(b(o=>o.text()),m(o=>r.parseFromString(o,"text/html")),B(1))}function on(e,t){let r=new DOMParser;return Wr(e,t).pipe(b(o=>o.text()),m(o=>r.parseFromString(o,"text/xml")),B(1))}function nn(){return{x:Math.max(0,scrollX),y:Math.max(0,scrollY)}}function an(){return T(d(window,"scroll",{passive:!0}),d(window,"resize",{passive:!0})).pipe(m(nn),q(nn()))}function sn(){return{width:innerWidth,height:innerHeight}}function 
cn(){return d(window,"resize",{passive:!0}).pipe(m(sn),q(sn()))}function pn(){return Q([an(),cn()]).pipe(m(([e,t])=>({offset:e,size:t})),B(1))}function pr(e,{viewport$:t,header$:r}){let o=t.pipe(X("size")),n=Q([o,r]).pipe(m(()=>Ue(e)));return Q([r,t,n]).pipe(m(([{height:i},{offset:s,size:a},{x:c,y:p}])=>({offset:{x:s.x-c,y:s.y-p+i},size:a})))}function ba(e){return d(e,"message",t=>t.data)}function va(e){let t=new v;return t.subscribe(r=>e.postMessage(r)),t}function ln(e,t=new Worker(e)){let r=ba(t),o=va(t),n=new v;n.subscribe(o);let i=o.pipe(ee(),oe(!0));return n.pipe(ee(),$e(r.pipe(U(i))),le())}var ga=P("#__config"),Et=JSON.parse(ga.textContent);Et.base=`${new URL(Et.base,ve())}`;function we(){return Et}function G(e){return Et.features.includes(e)}function ge(e,t){return typeof t!="undefined"?Et.translations[e].replace("#",t.toString()):Et.translations[e]}function Te(e,t=document){return P(`[data-md-component=${e}]`,t)}function ne(e,t=document){return R(`[data-md-component=${e}]`,t)}function xa(e){let t=P(".md-typeset > :first-child",e);return d(t,"click",{once:!0}).pipe(m(()=>P(".md-typeset",e)),m(r=>({hash:__md_hash(r.innerHTML)})))}function mn(e){if(!G("announce.dismiss")||!e.childElementCount)return L;if(!e.hidden){let t=P(".md-typeset",e);__md_hash(t.innerHTML)===__md_get("__announce")&&(e.hidden=!0)}return H(()=>{let t=new v;return t.subscribe(({hash:r})=>{e.hidden=!0,__md_set("__announce",r)}),xa(e).pipe(y(r=>t.next(r)),_(()=>t.complete()),m(r=>F({ref:e},r)))})}function ya(e,{target$:t}){return t.pipe(m(r=>({hidden:r!==e})))}function fn(e,t){let r=new v;return r.subscribe(({hidden:o})=>{e.hidden=o}),ya(e,t).pipe(y(o=>r.next(o)),_(()=>r.complete()),m(o=>F({ref:e},o)))}function Ct(e,t){return t==="inline"?E("div",{class:"md-tooltip md-tooltip--inline",id:e,role:"tooltip"},E("div",{class:"md-tooltip__inner md-typeset"})):E("div",{class:"md-tooltip",id:e,role:"tooltip"},E("div",{class:"md-tooltip__inner md-typeset"}))}function 
un(e,t){if(t=t?`${t}_annotation_${e}`:void 0,t){let r=t?`#${t}`:void 0;return E("aside",{class:"md-annotation",tabIndex:0},Ct(t),E("a",{href:r,class:"md-annotation__index",tabIndex:-1},E("span",{"data-md-annotation-id":e})))}else return E("aside",{class:"md-annotation",tabIndex:0},Ct(t),E("span",{class:"md-annotation__index",tabIndex:-1},E("span",{"data-md-annotation-id":e})))}function dn(e){return E("button",{class:"md-clipboard md-icon",title:ge("clipboard.copy"),"data-clipboard-target":`#${e} > code`})}function Dr(e,t){let r=t&2,o=t&1,n=Object.keys(e.terms).filter(c=>!e.terms[c]).reduce((c,p)=>[...c,E("del",null,p)," "],[]).slice(0,-1),i=we(),s=new URL(e.location,i.base);G("search.highlight")&&s.searchParams.set("h",Object.entries(e.terms).filter(([,c])=>c).reduce((c,[p])=>`${c} ${p}`.trim(),""));let{tags:a}=we();return E("a",{href:`${s}`,class:"md-search-result__link",tabIndex:-1},E("article",{class:"md-search-result__article md-typeset","data-md-score":e.score.toFixed(2)},r>0&&E("div",{class:"md-search-result__icon md-icon"}),r>0&&E("h1",null,e.title),r<=0&&E("h2",null,e.title),o>0&&e.text.length>0&&e.text,e.tags&&e.tags.map(c=>{let p=a?c in a?`md-tag-icon md-tag--${a[c]}`:"md-tag-icon":"";return E("span",{class:`md-tag ${p}`},c)}),o>0&&n.length>0&&E("p",{class:"md-search-result__terms"},ge("search.result.term.missing"),": ",...n)))}function hn(e){let t=e[0].score,r=[...e],o=we(),n=r.findIndex(l=>!`${new URL(l.location,o.base)}`.includes("#")),[i]=r.splice(n,1),s=r.findIndex(l=>l.scoreDr(l,1)),...c.length?[E("details",{class:"md-search-result__more"},E("summary",{tabIndex:-1},E("div",null,c.length>0&&c.length===1?ge("search.result.more.one"):ge("search.result.more.other",c.length))),...c.map(l=>Dr(l,1)))]:[]];return E("li",{class:"md-search-result__item"},p)}function bn(e){return E("ul",{class:"md-source__facts"},Object.entries(e).map(([t,r])=>E("li",{class:`md-source__fact md-source__fact--${t}`},typeof r=="number"?ar(r):r)))}function Nr(e){let 
t=`tabbed-control tabbed-control--${e}`;return E("div",{class:t,hidden:!0},E("button",{class:"tabbed-button",tabIndex:-1,"aria-hidden":"true"}))}function vn(e){return E("div",{class:"md-typeset__scrollwrap"},E("div",{class:"md-typeset__table"},e))}function Ea(e){let t=we(),r=new URL(`../${e.version}/`,t.base);return E("li",{class:"md-version__item"},E("a",{href:`${r}`,class:"md-version__link"},e.title))}function gn(e,t){return e=e.filter(r=>{var o;return!((o=r.properties)!=null&&o.hidden)}),E("div",{class:"md-version"},E("button",{class:"md-version__current","aria-label":ge("select.version")},t.title),E("ul",{class:"md-version__list"},e.map(Ea)))}var wa=0;function Ta(e,t){document.body.append(e);let{width:r}=pe(e);e.style.setProperty("--md-tooltip-width",`${r}px`),e.remove();let o=sr(t),n=typeof o!="undefined"?et(o):$({x:0,y:0}),i=T(vt(t),Vo(t)).pipe(Y());return Q([i,n]).pipe(m(([s,a])=>{let{x:c,y:p}=Ue(t),l=pe(t),f=t.closest("table");return f&&t.parentElement&&(c+=f.offsetLeft+t.parentElement.offsetLeft,p+=f.offsetTop+t.parentElement.offsetTop),{active:s,offset:{x:c-a.x+l.width/2-r/2,y:p-a.y+l.height+8}}}))}function Ge(e){let t=e.title;if(!t.length)return L;let r=`__tooltip_${wa++}`,o=Ct(r,"inline"),n=P(".md-typeset",o);return n.innerHTML=t,H(()=>{let i=new v;return 
i.subscribe({next({offset:s}){o.style.setProperty("--md-tooltip-x",`${s.x}px`),o.style.setProperty("--md-tooltip-y",`${s.y}px`)},complete(){o.style.removeProperty("--md-tooltip-x"),o.style.removeProperty("--md-tooltip-y")}}),T(i.pipe(g(({active:s})=>s)),i.pipe(be(250),g(({active:s})=>!s))).subscribe({next({active:s}){s?(e.insertAdjacentElement("afterend",o),e.setAttribute("aria-describedby",r),e.removeAttribute("title")):(o.remove(),e.removeAttribute("aria-describedby"),e.setAttribute("title",t))},complete(){o.remove(),e.removeAttribute("aria-describedby"),e.setAttribute("title",t)}}),i.pipe(Me(16,de)).subscribe(({active:s})=>{o.classList.toggle("md-tooltip--active",s)}),i.pipe(_t(125,de),g(()=>!!e.offsetParent),m(()=>e.offsetParent.getBoundingClientRect()),m(({x:s})=>s)).subscribe({next(s){s?o.style.setProperty("--md-tooltip-0",`${-s}px`):o.style.removeProperty("--md-tooltip-0")},complete(){o.style.removeProperty("--md-tooltip-0")}}),Ta(o,e).pipe(y(s=>i.next(s)),_(()=>i.complete()),m(s=>F({ref:e},s)))}).pipe(ze(ie))}function Sa(e,t){let r=H(()=>Q([zo(e),et(t)])).pipe(m(([{x:o,y:n},i])=>{let{width:s,height:a}=pe(e);return{x:o-i.x+s/2,y:n-i.y+a/2}}));return vt(e).pipe(b(o=>r.pipe(m(n=>({active:o,offset:n})),ye(+!o||1/0))))}function xn(e,t,{target$:r}){let[o,n]=Array.from(e.children);return H(()=>{let i=new v,s=i.pipe(ee(),oe(!0));return 
i.subscribe({next({offset:a}){e.style.setProperty("--md-tooltip-x",`${a.x}px`),e.style.setProperty("--md-tooltip-y",`${a.y}px`)},complete(){e.style.removeProperty("--md-tooltip-x"),e.style.removeProperty("--md-tooltip-y")}}),yt(e).pipe(U(s)).subscribe(a=>{e.toggleAttribute("data-md-visible",a)}),T(i.pipe(g(({active:a})=>a)),i.pipe(be(250),g(({active:a})=>!a))).subscribe({next({active:a}){a?e.prepend(o):o.remove()},complete(){e.prepend(o)}}),i.pipe(Me(16,de)).subscribe(({active:a})=>{o.classList.toggle("md-tooltip--active",a)}),i.pipe(_t(125,de),g(()=>!!e.offsetParent),m(()=>e.offsetParent.getBoundingClientRect()),m(({x:a})=>a)).subscribe({next(a){a?e.style.setProperty("--md-tooltip-0",`${-a}px`):e.style.removeProperty("--md-tooltip-0")},complete(){e.style.removeProperty("--md-tooltip-0")}}),d(n,"click").pipe(U(s),g(a=>!(a.metaKey||a.ctrlKey))).subscribe(a=>{a.stopPropagation(),a.preventDefault()}),d(n,"mousedown").pipe(U(s),ae(i)).subscribe(([a,{active:c}])=>{var p;if(a.button!==0||a.metaKey||a.ctrlKey)a.preventDefault();else if(c){a.preventDefault();let l=e.parentElement.closest(".md-annotation");l instanceof HTMLElement?l.focus():(p=Re())==null||p.blur()}}),r.pipe(U(s),g(a=>a===o),Ye(125)).subscribe(()=>e.focus()),Sa(e,t).pipe(y(a=>i.next(a)),_(()=>i.complete()),m(a=>F({ref:e},a)))})}function Oa(e){return e.tagName==="CODE"?R(".c, .c1, .cm",e):[e]}function Ma(e){let t=[];for(let r of Oa(e)){let o=[],n=document.createNodeIterator(r,NodeFilter.SHOW_TEXT);for(let i=n.nextNode();i;i=n.nextNode())o.push(i);for(let i of o){let s;for(;s=/(\(\d+\))(!)?/.exec(i.textContent);){let[,a,c]=s;if(typeof c=="undefined"){let p=i.splitText(s.index);i=p.splitText(a.length),t.push(p)}else{i.textContent=a,t.push(i);break}}}}return t}function yn(e,t){t.append(...Array.from(e.childNodes))}function lr(e,t,{target$:r,print$:o}){let n=t.closest("[id]"),i=n==null?void 0:n.id,s=new Map;for(let a of Ma(t)){let[,c]=a.textContent.match(/\((\d+)\)/);me(`:scope > 
li:nth-child(${c})`,e)&&(s.set(c,un(c,i)),a.replaceWith(s.get(c)))}return s.size===0?L:H(()=>{let a=new v,c=a.pipe(ee(),oe(!0)),p=[];for(let[l,f]of s)p.push([P(".md-typeset",f),P(`:scope > li:nth-child(${l})`,e)]);return o.pipe(U(c)).subscribe(l=>{e.hidden=!l,e.classList.toggle("md-annotation-list",l);for(let[f,u]of p)l?yn(f,u):yn(u,f)}),T(...[...s].map(([,l])=>xn(l,t,{target$:r}))).pipe(_(()=>a.complete()),le())})}function En(e){if(e.nextElementSibling){let t=e.nextElementSibling;if(t.tagName==="OL")return t;if(t.tagName==="P"&&!t.children.length)return En(t)}}function wn(e,t){return H(()=>{let r=En(e);return typeof r!="undefined"?lr(r,e,t):L})}var Tn=jt(zr());var La=0;function Sn(e){if(e.nextElementSibling){let t=e.nextElementSibling;if(t.tagName==="OL")return t;if(t.tagName==="P"&&!t.children.length)return Sn(t)}}function _a(e){return Ee(e).pipe(m(({width:t})=>({scrollable:xt(e).width>t})),X("scrollable"))}function On(e,t){let{matches:r}=matchMedia("(hover)"),o=H(()=>{let n=new v,i=n.pipe($r(1));n.subscribe(({scrollable:c})=>{c&&r?e.setAttribute("tabindex","0"):e.removeAttribute("tabindex")});let s=[];if(Tn.default.isSupported()&&(e.closest(".copy")||G("content.code.copy")&&!e.closest(".no-copy"))){let c=e.closest("pre");c.id=`__code_${La++}`;let p=dn(c.id);c.insertBefore(p,e),G("content.tooltips")&&s.push(Ge(p))}let a=e.closest(".highlight");if(a instanceof HTMLElement){let c=Sn(a);if(typeof c!="undefined"&&(a.classList.contains("annotate")||G("content.code.annotate"))){let p=lr(c,e,t);s.push(Ee(a).pipe(U(i),m(({width:l,height:f})=>l&&f),Y(),b(l=>l?p:L)))}}return _a(e).pipe(y(c=>n.next(c)),_(()=>n.complete()),m(c=>F({ref:e},c)),$e(...s))});return G("content.lazy")?yt(e).pipe(g(n=>n),ye(1),b(()=>o)):o}function Aa(e,{target$:t,print$:r}){let o=!0;return T(t.pipe(m(n=>n.closest("details:not([open])")),g(n=>e===n),m(()=>({action:"open",reveal:!0}))),r.pipe(g(n=>n||!o),y(()=>o=e.open),m(n=>({action:n?"open":"close"}))))}function Mn(e,t){return H(()=>{let r=new 
v;return r.subscribe(({action:o,reveal:n})=>{e.toggleAttribute("open",o==="open"),n&&e.scrollIntoView()}),Aa(e,t).pipe(y(o=>r.next(o)),_(()=>r.complete()),m(o=>F({ref:e},o)))})}var Ln=".node circle,.node ellipse,.node path,.node polygon,.node rect{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}marker{fill:var(--md-mermaid-edge-color)!important}.edgeLabel .label rect{fill:#0000}.label{color:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.label foreignObject{line-height:normal;overflow:visible}.label div .edgeLabel{color:var(--md-mermaid-label-fg-color)}.edgeLabel,.edgeLabel rect,.label div .edgeLabel{background-color:var(--md-mermaid-label-bg-color)}.edgeLabel,.edgeLabel rect{fill:var(--md-mermaid-label-bg-color);color:var(--md-mermaid-edge-color)}.edgePath .path,.flowchart-link{stroke:var(--md-mermaid-edge-color);stroke-width:.05rem}.edgePath .arrowheadPath{fill:var(--md-mermaid-edge-color);stroke:none}.cluster rect{fill:var(--md-default-fg-color--lightest);stroke:var(--md-default-fg-color--lighter)}.cluster span{color:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}g #flowchart-circleEnd,g #flowchart-circleStart,g #flowchart-crossEnd,g #flowchart-crossStart,g #flowchart-pointEnd,g #flowchart-pointStart{stroke:none}g.classGroup line,g.classGroup rect{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}g.classGroup text{fill:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.classLabel .box{fill:var(--md-mermaid-label-bg-color);background-color:var(--md-mermaid-label-bg-color);opacity:1}.classLabel .label{fill:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.node .divider{stroke:var(--md-mermaid-node-fg-color)}.relation{stroke:var(--md-mermaid-edge-color)}.cardinality{fill:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.cardinality text{fill:inherit!important}defs 
#classDiagram-compositionEnd,defs #classDiagram-compositionStart,defs #classDiagram-dependencyEnd,defs #classDiagram-dependencyStart,defs #classDiagram-extensionEnd,defs #classDiagram-extensionStart{fill:var(--md-mermaid-edge-color)!important;stroke:var(--md-mermaid-edge-color)!important}defs #classDiagram-aggregationEnd,defs #classDiagram-aggregationStart{fill:var(--md-mermaid-label-bg-color)!important;stroke:var(--md-mermaid-edge-color)!important}g.stateGroup rect{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}g.stateGroup .state-title{fill:var(--md-mermaid-label-fg-color)!important;font-family:var(--md-mermaid-font-family)}g.stateGroup .composit{fill:var(--md-mermaid-label-bg-color)}.nodeLabel,.nodeLabel p{color:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.node circle.state-end,.node circle.state-start,.start-state{fill:var(--md-mermaid-edge-color);stroke:none}.end-state-inner,.end-state-outer{fill:var(--md-mermaid-edge-color)}.end-state-inner,.node circle.state-end{stroke:var(--md-mermaid-label-bg-color)}.transition{stroke:var(--md-mermaid-edge-color)}[id^=state-fork] rect,[id^=state-join] rect{fill:var(--md-mermaid-edge-color)!important;stroke:none!important}.statediagram-cluster.statediagram-cluster .inner{fill:var(--md-default-bg-color)}.statediagram-cluster rect{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}.statediagram-state rect.divider{fill:var(--md-default-fg-color--lightest);stroke:var(--md-default-fg-color--lighter)}defs 
#statediagram-barbEnd{stroke:var(--md-mermaid-edge-color)}.attributeBoxEven,.attributeBoxOdd{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}.entityBox{fill:var(--md-mermaid-label-bg-color);stroke:var(--md-mermaid-node-fg-color)}.entityLabel{fill:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.relationshipLabelBox{fill:var(--md-mermaid-label-bg-color);fill-opacity:1;background-color:var(--md-mermaid-label-bg-color);opacity:1}.relationshipLabel{fill:var(--md-mermaid-label-fg-color)}.relationshipLine{stroke:var(--md-mermaid-edge-color)}defs #ONE_OR_MORE_END *,defs #ONE_OR_MORE_START *,defs #ONLY_ONE_END *,defs #ONLY_ONE_START *,defs #ZERO_OR_MORE_END *,defs #ZERO_OR_MORE_START *,defs #ZERO_OR_ONE_END *,defs #ZERO_OR_ONE_START *{stroke:var(--md-mermaid-edge-color)!important}defs #ZERO_OR_MORE_END circle,defs #ZERO_OR_MORE_START circle{fill:var(--md-mermaid-label-bg-color)}.actor{fill:var(--md-mermaid-sequence-actor-bg-color);stroke:var(--md-mermaid-sequence-actor-border-color)}text.actor>tspan{fill:var(--md-mermaid-sequence-actor-fg-color);font-family:var(--md-mermaid-font-family)}line{stroke:var(--md-mermaid-sequence-actor-line-color)}.actor-man circle,.actor-man line{fill:var(--md-mermaid-sequence-actorman-bg-color);stroke:var(--md-mermaid-sequence-actorman-line-color)}.messageLine0,.messageLine1{stroke:var(--md-mermaid-sequence-message-line-color)}.note{fill:var(--md-mermaid-sequence-note-bg-color);stroke:var(--md-mermaid-sequence-note-border-color)}.loopText,.loopText>tspan,.messageText,.noteText>tspan{stroke:none;font-family:var(--md-mermaid-font-family)!important}.messageText{fill:var(--md-mermaid-sequence-message-fg-color)}.loopText,.loopText>tspan{fill:var(--md-mermaid-sequence-loop-fg-color)}.noteText>tspan{fill:var(--md-mermaid-sequence-note-fg-color)}#arrowhead 
path{fill:var(--md-mermaid-sequence-message-line-color);stroke:none}.loopLine{fill:var(--md-mermaid-sequence-loop-bg-color);stroke:var(--md-mermaid-sequence-loop-border-color)}.labelBox{fill:var(--md-mermaid-sequence-label-bg-color);stroke:none}.labelText,.labelText>span{fill:var(--md-mermaid-sequence-label-fg-color);font-family:var(--md-mermaid-font-family)}.sequenceNumber{fill:var(--md-mermaid-sequence-number-fg-color)}rect.rect{fill:var(--md-mermaid-sequence-box-bg-color);stroke:none}rect.rect+text.text{fill:var(--md-mermaid-sequence-box-fg-color)}defs #sequencenumber{fill:var(--md-mermaid-sequence-number-bg-color)!important}";var qr,ka=0;function Ha(){return typeof mermaid=="undefined"||mermaid instanceof Element?gt("https://unpkg.com/mermaid@10.7.0/dist/mermaid.min.js"):$(void 0)}function _n(e){return e.classList.remove("mermaid"),qr||(qr=Ha().pipe(y(()=>mermaid.initialize({startOnLoad:!1,themeCSS:Ln,sequence:{actorFontSize:"16px",messageFontSize:"16px",noteFontSize:"16px"}})),m(()=>{}),B(1))),qr.subscribe(()=>ro(this,null,function*(){e.classList.add("mermaid");let t=`__mermaid_${ka++}`,r=E("div",{class:"mermaid"}),o=e.textContent,{svg:n,fn:i}=yield mermaid.render(t,o),s=r.attachShadow({mode:"closed"});s.innerHTML=n,e.replaceWith(r),i==null||i(s)})),qr.pipe(m(()=>({ref:e})))}var An=E("table");function Cn(e){return e.replaceWith(An),An.replaceWith(vn(e)),$({ref:e})}function $a(e){let t=e.find(r=>r.checked)||e[0];return T(...e.map(r=>d(r,"change").pipe(m(()=>P(`label[for="${r.id}"]`))))).pipe(q(P(`label[for="${t.id}"]`)),m(r=>({active:r})))}function kn(e,{viewport$:t,target$:r}){let o=P(".tabbed-labels",e),n=R(":scope > input",e),i=Nr("prev");e.append(i);let s=Nr("next");return e.append(s),H(()=>{let a=new v,c=a.pipe(ee(),oe(!0));Q([a,Ee(e)]).pipe(U(c),Me(1,de)).subscribe({next([{active:p},l]){let f=Ue(p),{width:u}=pe(p);e.style.setProperty("--md-indicator-x",`${f.x}px`),e.style.setProperty("--md-indicator-width",`${u}px`);let 
h=ir(o);(f.xh.x+l.width)&&o.scrollTo({left:Math.max(0,f.x-16),behavior:"smooth"})},complete(){e.style.removeProperty("--md-indicator-x"),e.style.removeProperty("--md-indicator-width")}}),Q([et(o),Ee(o)]).pipe(U(c)).subscribe(([p,l])=>{let f=xt(o);i.hidden=p.x<16,s.hidden=p.x>f.width-l.width-16}),T(d(i,"click").pipe(m(()=>-1)),d(s,"click").pipe(m(()=>1))).pipe(U(c)).subscribe(p=>{let{width:l}=pe(o);o.scrollBy({left:l*p,behavior:"smooth"})}),r.pipe(U(c),g(p=>n.includes(p))).subscribe(p=>p.click()),o.classList.add("tabbed-labels--linked");for(let p of n){let l=P(`label[for="${p.id}"]`);l.replaceChildren(E("a",{href:`#${l.htmlFor}`,tabIndex:-1},...Array.from(l.childNodes))),d(l.firstElementChild,"click").pipe(U(c),g(f=>!(f.metaKey||f.ctrlKey)),y(f=>{f.preventDefault(),f.stopPropagation()})).subscribe(()=>{history.replaceState({},"",`#${l.htmlFor}`),l.click()})}return G("content.tabs.link")&&a.pipe(Le(1),ae(t)).subscribe(([{active:p},{offset:l}])=>{let f=p.innerText.trim();if(p.hasAttribute("data-md-switching"))p.removeAttribute("data-md-switching");else{let u=e.offsetTop-l.y;for(let w of R("[data-tabs]"))for(let A of R(":scope > input",w)){let Z=P(`label[for="${A.id}"]`);if(Z!==p&&Z.innerText.trim()===f){Z.setAttribute("data-md-switching",""),A.click();break}}window.scrollTo({top:e.offsetTop-u});let h=__md_get("__tabs")||[];__md_set("__tabs",[...new Set([f,...h])])}}),a.pipe(U(c)).subscribe(()=>{for(let p of R("audio, video",e))p.pause()}),$a(n).pipe(y(p=>a.next(p)),_(()=>a.complete()),m(p=>F({ref:e},p)))}).pipe(ze(ie))}function Hn(e,{viewport$:t,target$:r,print$:o}){return T(...R(".annotate:not(.highlight)",e).map(n=>wn(n,{target$:r,print$:o})),...R("pre:not(.mermaid) > 
code",e).map(n=>On(n,{target$:r,print$:o})),...R("pre.mermaid",e).map(n=>_n(n)),...R("table:not([class])",e).map(n=>Cn(n)),...R("details",e).map(n=>Mn(n,{target$:r,print$:o})),...R("[data-tabs]",e).map(n=>kn(n,{viewport$:t,target$:r})),...R("[title]",e).filter(()=>G("content.tooltips")).map(n=>Ge(n)))}function Ra(e,{alert$:t}){return t.pipe(b(r=>T($(!0),$(!1).pipe(Ye(2e3))).pipe(m(o=>({message:r,active:o})))))}function $n(e,t){let r=P(".md-typeset",e);return H(()=>{let o=new v;return o.subscribe(({message:n,active:i})=>{e.classList.toggle("md-dialog--active",i),r.textContent=n}),Ra(e,t).pipe(y(n=>o.next(n)),_(()=>o.complete()),m(n=>F({ref:e},n)))})}function Pa({viewport$:e}){if(!G("header.autohide"))return $(!1);let t=e.pipe(m(({offset:{y:n}})=>n),Ke(2,1),m(([n,i])=>[nMath.abs(i-n.y)>100),m(([,[n]])=>n),Y()),o=We("search");return Q([e,o]).pipe(m(([{offset:n},i])=>n.y>400&&!i),Y(),b(n=>n?r:$(!1)),q(!1))}function Rn(e,t){return H(()=>Q([Ee(e),Pa(t)])).pipe(m(([{height:r},o])=>({height:r,hidden:o})),Y((r,o)=>r.height===o.height&&r.hidden===o.hidden),B(1))}function Pn(e,{header$:t,main$:r}){return H(()=>{let o=new v,n=o.pipe(ee(),oe(!0));o.pipe(X("active"),je(t)).subscribe(([{active:s},{hidden:a}])=>{e.classList.toggle("md-header--shadow",s&&!a),e.hidden=a});let i=fe(R("[title]",e)).pipe(g(()=>G("content.tooltips")),re(s=>Ge(s)));return r.subscribe(o),t.pipe(U(n),m(s=>F({ref:e},s)),$e(i.pipe(U(n))))})}function Ia(e,{viewport$:t,header$:r}){return pr(e,{viewport$:t,header$:r}).pipe(m(({offset:{y:o}})=>{let{height:n}=pe(e);return{active:o>=n}}),X("active"))}function In(e,t){return H(()=>{let r=new v;r.subscribe({next({active:n}){e.classList.toggle("md-header__title--active",n)},complete(){e.classList.remove("md-header__title--active")}});let o=me(".md-content h1");return typeof o=="undefined"?L:Ia(o,t).pipe(y(n=>r.next(n)),_(()=>r.complete()),m(n=>F({ref:e},n)))})}function Fn(e,{viewport$:t,header$:r}){let 
o=r.pipe(m(({height:i})=>i),Y()),n=o.pipe(b(()=>Ee(e).pipe(m(({height:i})=>({top:e.offsetTop,bottom:e.offsetTop+i})),X("bottom"))));return Q([o,n,t]).pipe(m(([i,{top:s,bottom:a},{offset:{y:c},size:{height:p}}])=>(p=Math.max(0,p-Math.max(0,s-c,i)-Math.max(0,p+c-a)),{offset:s-i,height:p,active:s-i<=c})),Y((i,s)=>i.offset===s.offset&&i.height===s.height&&i.active===s.active))}function Fa(e){let t=__md_get("__palette")||{index:e.findIndex(o=>matchMedia(o.getAttribute("data-md-color-media")).matches)},r=Math.max(0,Math.min(t.index,e.length-1));return $(...e).pipe(re(o=>d(o,"change").pipe(m(()=>o))),q(e[r]),m(o=>({index:e.indexOf(o),color:{media:o.getAttribute("data-md-color-media"),scheme:o.getAttribute("data-md-color-scheme"),primary:o.getAttribute("data-md-color-primary"),accent:o.getAttribute("data-md-color-accent")}})),B(1))}function jn(e){let t=R("input",e),r=E("meta",{name:"theme-color"});document.head.appendChild(r);let o=E("meta",{name:"color-scheme"});document.head.appendChild(o);let n=At("(prefers-color-scheme: light)");return H(()=>{let i=new v;return i.subscribe(s=>{if(document.body.setAttribute("data-md-color-switching",""),s.color.media==="(prefers-color-scheme)"){let a=matchMedia("(prefers-color-scheme: light)"),c=document.querySelector(a.matches?"[data-md-color-media='(prefers-color-scheme: light)']":"[data-md-color-media='(prefers-color-scheme: dark)']");s.color.scheme=c.getAttribute("data-md-color-scheme"),s.color.primary=c.getAttribute("data-md-color-primary"),s.color.accent=c.getAttribute("data-md-color-accent")}for(let[a,c]of Object.entries(s.color))document.body.setAttribute(`data-md-color-${a}`,c);for(let a=0;a{let s=Te("header"),a=window.getComputedStyle(s);return 
o.content=a.colorScheme,a.backgroundColor.match(/\d+/g).map(c=>(+c).toString(16).padStart(2,"0")).join("")})).subscribe(s=>r.content=`#${s}`),i.pipe(Oe(ie)).subscribe(()=>{document.body.removeAttribute("data-md-color-switching")}),Fa(t).pipe(U(n.pipe(Le(1))),at(),y(s=>i.next(s)),_(()=>i.complete()),m(s=>F({ref:e},s)))})}function Un(e,{progress$:t}){return H(()=>{let r=new v;return r.subscribe(({value:o})=>{e.style.setProperty("--md-progress-value",`${o}`)}),t.pipe(y(o=>r.next({value:o})),_(()=>r.complete()),m(o=>({ref:e,value:o})))})}var Kr=jt(zr());function ja(e){e.setAttribute("data-md-copying","");let t=e.closest("[data-copy]"),r=t?t.getAttribute("data-copy"):e.innerText;return e.removeAttribute("data-md-copying"),r.trimEnd()}function Wn({alert$:e}){Kr.default.isSupported()&&new j(t=>{new Kr.default("[data-clipboard-target], [data-clipboard-text]",{text:r=>r.getAttribute("data-clipboard-text")||ja(P(r.getAttribute("data-clipboard-target")))}).on("success",r=>t.next(r))}).pipe(y(t=>{t.trigger.focus()}),m(()=>ge("clipboard.copied"))).subscribe(e)}function Dn(e,t){return e.protocol=t.protocol,e.hostname=t.hostname,e}function Ua(e,t){let r=new Map;for(let o of R("url",e)){let n=P("loc",o),i=[Dn(new URL(n.textContent),t)];r.set(`${i[0]}`,i);for(let s of R("[rel=alternate]",o)){let a=s.getAttribute("href");a!=null&&i.push(Dn(new URL(a),t))}}return r}function mr(e){return on(new URL("sitemap.xml",e)).pipe(m(t=>Ua(t,new URL(e))),he(()=>$(new Map)))}function Wa(e,t){if(!(e.target instanceof Element))return L;let r=e.target.closest("a");if(r===null)return L;if(r.target||e.metaKey||e.ctrlKey)return L;let o=new URL(r.href);return o.search=o.hash="",t.has(`${o}`)?(e.preventDefault(),$(new URL(r.href))):L}function Nn(e){let t=new Map;for(let r of R(":scope > *",e.head))t.set(r.outerHTML,r);return t}function Vn(e){for(let t of R("[href], [src]",e))for(let r of["href","src"]){let o=t.getAttribute(r);if(o&&!/^(?:[a-z]+:)?\/\//i.test(o)){t[r]=t[r];break}}return $(e)}function 
Da(e){for(let o of["[data-md-component=announce]","[data-md-component=container]","[data-md-component=header-topic]","[data-md-component=outdated]","[data-md-component=logo]","[data-md-component=skip]",...G("navigation.tabs.sticky")?["[data-md-component=tabs]"]:[]]){let n=me(o),i=me(o,e);typeof n!="undefined"&&typeof i!="undefined"&&n.replaceWith(i)}let t=Nn(document);for(let[o,n]of Nn(e))t.has(o)?t.delete(o):document.head.appendChild(n);for(let o of t.values()){let n=o.getAttribute("name");n!=="theme-color"&&n!=="color-scheme"&&o.remove()}let r=Te("container");return Fe(R("script",r)).pipe(b(o=>{let n=e.createElement("script");if(o.src){for(let i of o.getAttributeNames())n.setAttribute(i,o.getAttribute(i));return o.replaceWith(n),new j(i=>{n.onload=()=>i.complete()})}else return n.textContent=o.textContent,o.replaceWith(n),L}),ee(),oe(document))}function zn({location$:e,viewport$:t,progress$:r}){let o=we();if(location.protocol==="file:")return L;let n=mr(o.base);$(document).subscribe(Vn);let i=d(document.body,"click").pipe(je(n),b(([c,p])=>Wa(c,p)),le()),s=d(window,"popstate").pipe(m(ve),le());i.pipe(ae(t)).subscribe(([c,{offset:p}])=>{history.replaceState(p,""),history.pushState(null,"",c)}),T(i,s).subscribe(e);let a=e.pipe(X("pathname"),b(c=>rn(c,{progress$:r}).pipe(he(()=>(st(c,!0),L)))),b(Vn),b(Da),le());return T(a.pipe(ae(e,(c,p)=>p)),e.pipe(X("pathname"),b(()=>e),X("hash")),e.pipe(Y((c,p)=>c.pathname===p.pathname&&c.hash===p.hash),b(()=>i),y(()=>history.back()))).subscribe(c=>{var p,l;history.state!==null||!c.hash?window.scrollTo(0,(l=(p=history.state)==null?void 0:p.y)!=null?l:0):(history.scrollRestoration="auto",Zo(c.hash),history.scrollRestoration="manual")}),e.subscribe(()=>{history.scrollRestoration="manual"}),d(window,"beforeunload").subscribe(()=>{history.scrollRestoration="auto"}),t.pipe(X("offset"),be(100)).subscribe(({offset:c})=>{history.replaceState(c,"")}),a}var Qn=jt(Kn());function Yn(e){let 
t=e.separator.split("|").map(n=>n.replace(/(\(\?[!=<][^)]+\))/g,"").length===0?"\uFFFD":n).join("|"),r=new RegExp(t,"img"),o=(n,i,s)=>`${i}${s}`;return n=>{n=n.replace(/[\s*+\-:~^]+/g," ").trim();let i=new RegExp(`(^|${e.separator}|)(${n.replace(/[|\\{}()[\]^$+*?.-]/g,"\\$&").replace(r,"|")})`,"img");return s=>(0,Qn.default)(s).replace(i,o).replace(/<\/mark>(\s+)]*>/img,"$1")}}function Ht(e){return e.type===1}function fr(e){return e.type===3}function Bn(e,t){let r=ln(e);return T($(location.protocol!=="file:"),We("search")).pipe(He(o=>o),b(()=>t)).subscribe(({config:o,docs:n})=>r.next({type:0,data:{config:o,docs:n,options:{suggest:G("search.suggest")}}})),r}function Gn({document$:e}){let t=we(),r=De(new URL("../versions.json",t.base)).pipe(he(()=>L)),o=r.pipe(m(n=>{let[,i]=t.base.match(/([^/]+)\/?$/);return n.find(({version:s,aliases:a})=>s===i||a.includes(i))||n[0]}));r.pipe(m(n=>new Map(n.map(i=>[`${new URL(`../${i.version}/`,t.base)}`,i]))),b(n=>d(document.body,"click").pipe(g(i=>!i.metaKey&&!i.ctrlKey),ae(o),b(([i,s])=>{if(i.target instanceof Element){let a=i.target.closest("a");if(a&&!a.target&&n.has(a.href)){let c=a.href;return!i.target.closest(".md-version")&&n.get(c)===s?L:(i.preventDefault(),$(c))}}return L}),b(i=>{let{version:s}=n.get(i);return mr(new URL(i)).pipe(m(a=>{let p=ve().href.replace(t.base,"");return a.has(p.split("#")[0])?new URL(`../${s}/${p}`,t.base):new URL(i)}))})))).subscribe(n=>st(n,!0)),Q([r,o]).subscribe(([n,i])=>{P(".md-header__topic").appendChild(gn(n,i))}),e.pipe(b(()=>o)).subscribe(n=>{var s;let i=__md_get("__outdated",sessionStorage);if(i===null){i=!0;let a=((s=t.version)==null?void 0:s.default)||"latest";Array.isArray(a)||(a=[a]);e:for(let c of a)for(let p of n.aliases.concat(n.version))if(new RegExp(c,"i").test(p)){i=!1;break e}__md_set("__outdated",i,sessionStorage)}if(i)for(let a of ne("outdated"))a.hidden=!1})}function 
Ka(e,{worker$:t}){let{searchParams:r}=ve();r.has("q")&&(Be("search",!0),e.value=r.get("q"),e.focus(),We("search").pipe(He(i=>!i)).subscribe(()=>{let i=ve();i.searchParams.delete("q"),history.replaceState({},"",`${i}`)}));let o=vt(e),n=T(t.pipe(He(Ht)),d(e,"keyup"),o).pipe(m(()=>e.value),Y());return Q([n,o]).pipe(m(([i,s])=>({value:i,focus:s})),B(1))}function Jn(e,{worker$:t}){let r=new v,o=r.pipe(ee(),oe(!0));Q([t.pipe(He(Ht)),r],(i,s)=>s).pipe(X("value")).subscribe(({value:i})=>t.next({type:2,data:i})),r.pipe(X("focus")).subscribe(({focus:i})=>{i&&Be("search",i)}),d(e.form,"reset").pipe(U(o)).subscribe(()=>e.focus());let n=P("header [for=__search]");return d(n,"click").subscribe(()=>e.focus()),Ka(e,{worker$:t}).pipe(y(i=>r.next(i)),_(()=>r.complete()),m(i=>F({ref:e},i)),B(1))}function Xn(e,{worker$:t,query$:r}){let o=new v,n=Yo(e.parentElement).pipe(g(Boolean)),i=e.parentElement,s=P(":scope > :first-child",e),a=P(":scope > :last-child",e);We("search").subscribe(l=>a.setAttribute("role",l?"list":"presentation")),o.pipe(ae(r),Ir(t.pipe(He(Ht)))).subscribe(([{items:l},{value:f}])=>{switch(l.length){case 0:s.textContent=f.length?ge("search.result.none"):ge("search.result.placeholder");break;case 1:s.textContent=ge("search.result.one");break;default:let u=ar(l.length);s.textContent=ge("search.result.other",u)}});let c=o.pipe(y(()=>a.innerHTML=""),b(({items:l})=>T($(...l.slice(0,10)),$(...l.slice(10)).pipe(Ke(4),jr(n),b(([f])=>f)))),m(hn),le());return c.subscribe(l=>a.appendChild(l)),c.pipe(re(l=>{let f=me("details",l);return typeof f=="undefined"?L:d(f,"toggle").pipe(U(o),m(()=>f))})).subscribe(l=>{l.open===!1&&l.offsetTop<=i.scrollTop&&i.scrollTo({top:l.offsetTop})}),t.pipe(g(fr),m(({data:l})=>l)).pipe(y(l=>o.next(l)),_(()=>o.complete()),m(l=>F({ref:e},l)))}function Qa(e,{query$:t}){return t.pipe(m(({value:r})=>{let o=ve();return o.hash="",r=r.replace(/\s+/g,"+").replace(/&/g,"%26").replace(/=/g,"%3D"),o.search=`q=${r}`,{url:o}}))}function Zn(e,t){let r=new 
v,o=r.pipe(ee(),oe(!0));return r.subscribe(({url:n})=>{e.setAttribute("data-clipboard-text",e.href),e.href=`${n}`}),d(e,"click").pipe(U(o)).subscribe(n=>n.preventDefault()),Qa(e,t).pipe(y(n=>r.next(n)),_(()=>r.complete()),m(n=>F({ref:e},n)))}function ei(e,{worker$:t,keyboard$:r}){let o=new v,n=Te("search-query"),i=T(d(n,"keydown"),d(n,"focus")).pipe(Oe(ie),m(()=>n.value),Y());return o.pipe(je(i),m(([{suggest:a},c])=>{let p=c.split(/([\s-]+)/);if(a!=null&&a.length&&p[p.length-1]){let l=a[a.length-1];l.startsWith(p[p.length-1])&&(p[p.length-1]=l)}else p.length=0;return p})).subscribe(a=>e.innerHTML=a.join("").replace(/\s/g," ")),r.pipe(g(({mode:a})=>a==="search")).subscribe(a=>{switch(a.type){case"ArrowRight":e.innerText.length&&n.selectionStart===n.value.length&&(n.value=e.innerText);break}}),t.pipe(g(fr),m(({data:a})=>a)).pipe(y(a=>o.next(a)),_(()=>o.complete()),m(()=>({ref:e})))}function ti(e,{index$:t,keyboard$:r}){let o=we();try{let n=Bn(o.search,t),i=Te("search-query",e),s=Te("search-result",e);d(e,"click").pipe(g(({target:c})=>c instanceof Element&&!!c.closest("a"))).subscribe(()=>Be("search",!1)),r.pipe(g(({mode:c})=>c==="search")).subscribe(c=>{let p=Re();switch(c.type){case"Enter":if(p===i){let l=new Map;for(let f of R(":first-child [href]",s)){let u=f.firstElementChild;l.set(f,parseFloat(u.getAttribute("data-md-score")))}if(l.size){let[[f]]=[...l].sort(([,u],[,h])=>h-u);f.click()}c.claim()}break;case"Escape":case"Tab":Be("search",!1),i.blur();break;case"ArrowUp":case"ArrowDown":if(typeof p=="undefined")i.focus();else{let l=[i,...R(":not(details) > [href], summary, details[open] [href]",s)],f=Math.max(0,(Math.max(0,l.indexOf(p))+l.length+(c.type==="ArrowUp"?-1:1))%l.length);l[f].focus()}c.claim();break;default:i!==Re()&&i.focus()}}),r.pipe(g(({mode:c})=>c==="global")).subscribe(c=>{switch(c.type){case"f":case"s":case"/":i.focus(),i.select(),c.claim();break}});let a=Jn(i,{worker$:n});return 
T(a,Xn(s,{worker$:n,query$:a})).pipe($e(...ne("search-share",e).map(c=>Zn(c,{query$:a})),...ne("search-suggest",e).map(c=>ei(c,{worker$:n,keyboard$:r}))))}catch(n){return e.hidden=!0,qe}}function ri(e,{index$:t,location$:r}){return Q([t,r.pipe(q(ve()),g(o=>!!o.searchParams.get("h")))]).pipe(m(([o,n])=>Yn(o.config)(n.searchParams.get("h"))),m(o=>{var s;let n=new Map,i=document.createNodeIterator(e,NodeFilter.SHOW_TEXT);for(let a=i.nextNode();a;a=i.nextNode())if((s=a.parentElement)!=null&&s.offsetHeight){let c=a.textContent,p=o(c);p.length>c.length&&n.set(a,p)}for(let[a,c]of n){let{childNodes:p}=E("span",null,c);a.replaceWith(...Array.from(p))}return{ref:e,nodes:n}}))}function Ya(e,{viewport$:t,main$:r}){let o=e.closest(".md-grid"),n=o.offsetTop-o.parentElement.offsetTop;return Q([r,t]).pipe(m(([{offset:i,height:s},{offset:{y:a}}])=>(s=s+Math.min(n,Math.max(0,a-i))-n,{height:s,locked:a>=i+n})),Y((i,s)=>i.height===s.height&&i.locked===s.locked))}function Qr(e,o){var n=o,{header$:t}=n,r=to(n,["header$"]);let i=P(".md-sidebar__scrollwrap",e),{y:s}=Ue(i);return H(()=>{let a=new v,c=a.pipe(ee(),oe(!0)),p=a.pipe(Me(0,de));return p.pipe(ae(t)).subscribe({next([{height:l},{height:f}]){i.style.height=`${l-2*s}px`,e.style.top=`${f}px`},complete(){i.style.height="",e.style.top=""}}),p.pipe(He()).subscribe(()=>{for(let l of R(".md-nav__link--active[href]",e)){if(!l.clientHeight)continue;let f=l.closest(".md-sidebar__scrollwrap");if(typeof f!="undefined"){let u=l.offsetTop-f.offsetTop,{height:h}=pe(f);f.scrollTo({top:u-h/2})}}}),fe(R("label[tabindex]",e)).pipe(re(l=>d(l,"click").pipe(Oe(ie),m(()=>l),U(c)))).subscribe(l=>{let f=P(`[id="${l.htmlFor}"]`);P(`[aria-labelledby="${l.id}"]`).setAttribute("aria-expanded",`${f.checked}`)}),Ya(e,r).pipe(y(l=>a.next(l)),_(()=>a.complete()),m(l=>F({ref:e},l)))})}function oi(e,t){if(typeof t!="undefined"){let r=`https://api.github.com/repos/${e}/${t}`;return 
Lt(De(`${r}/releases/latest`).pipe(he(()=>L),m(o=>({version:o.tag_name})),Qe({})),De(r).pipe(he(()=>L),m(o=>({stars:o.stargazers_count,forks:o.forks_count})),Qe({}))).pipe(m(([o,n])=>F(F({},o),n)))}else{let r=`https://api.github.com/users/${e}`;return De(r).pipe(m(o=>({repositories:o.public_repos})),Qe({}))}}function ni(e,t){let r=`https://${e}/api/v4/projects/${encodeURIComponent(t)}`;return De(r).pipe(he(()=>L),m(({star_count:o,forks_count:n})=>({stars:o,forks:n})),Qe({}))}function ii(e){let t=e.match(/^.+github\.com\/([^/]+)\/?([^/]+)?/i);if(t){let[,r,o]=t;return oi(r,o)}if(t=e.match(/^.+?([^/]*gitlab[^/]+)\/(.+?)\/?$/i),t){let[,r,o]=t;return ni(r,o)}return L}var Ba;function Ga(e){return Ba||(Ba=H(()=>{let t=__md_get("__source",sessionStorage);if(t)return $(t);if(ne("consent").length){let o=__md_get("__consent");if(!(o&&o.github))return L}return ii(e.href).pipe(y(o=>__md_set("__source",o,sessionStorage)))}).pipe(he(()=>L),g(t=>Object.keys(t).length>0),m(t=>({facts:t})),B(1)))}function ai(e){let t=P(":scope > :last-child",e);return H(()=>{let r=new v;return r.subscribe(({facts:o})=>{t.appendChild(bn(o)),t.classList.add("md-source__repository--active")}),Ga(e).pipe(y(o=>r.next(o)),_(()=>r.complete()),m(o=>F({ref:e},o)))})}function Ja(e,{viewport$:t,header$:r}){return Ee(document.body).pipe(b(()=>pr(e,{header$:r,viewport$:t})),m(({offset:{y:o}})=>({hidden:o>=10})),X("hidden"))}function si(e,t){return H(()=>{let r=new v;return r.subscribe({next({hidden:o}){e.hidden=o},complete(){e.hidden=!1}}),(G("navigation.tabs.sticky")?$({hidden:!1}):Ja(e,t)).pipe(y(o=>r.next(o)),_(()=>r.complete()),m(o=>F({ref:e},o)))})}function Xa(e,{viewport$:t,header$:r}){let o=new Map,n=R(".md-nav__link",e);for(let a of n){let c=decodeURIComponent(a.hash.substring(1)),p=me(`[id="${c}"]`);typeof p!="undefined"&&o.set(a,p)}let i=r.pipe(X("height"),m(({height:a})=>{let c=Te("main"),p=P(":scope > :first-child",c);return a+.8*(p.offsetTop-c.offsetTop)}),le());return 
Ee(document.body).pipe(X("height"),b(a=>H(()=>{let c=[];return $([...o].reduce((p,[l,f])=>{for(;c.length&&o.get(c[c.length-1]).tagName>=f.tagName;)c.pop();let u=f.offsetTop;for(;!u&&f.parentElement;)f=f.parentElement,u=f.offsetTop;let h=f.offsetParent;for(;h;h=h.offsetParent)u+=h.offsetTop;return p.set([...c=[...c,l]].reverse(),u)},new Map))}).pipe(m(c=>new Map([...c].sort(([,p],[,l])=>p-l))),je(i),b(([c,p])=>t.pipe(Rr(([l,f],{offset:{y:u},size:h})=>{let w=u+h.height>=Math.floor(a.height);for(;f.length;){let[,A]=f[0];if(A-p=u&&!w)f=[l.pop(),...f];else break}return[l,f]},[[],[...c]]),Y((l,f)=>l[0]===f[0]&&l[1]===f[1])))))).pipe(m(([a,c])=>({prev:a.map(([p])=>p),next:c.map(([p])=>p)})),q({prev:[],next:[]}),Ke(2,1),m(([a,c])=>a.prev.length{let i=new v,s=i.pipe(ee(),oe(!0));if(i.subscribe(({prev:a,next:c})=>{for(let[p]of c)p.classList.remove("md-nav__link--passed"),p.classList.remove("md-nav__link--active");for(let[p,[l]]of a.entries())l.classList.add("md-nav__link--passed"),l.classList.toggle("md-nav__link--active",p===a.length-1)}),G("toc.follow")){let a=T(t.pipe(be(1),m(()=>{})),t.pipe(be(250),m(()=>"smooth")));i.pipe(g(({prev:c})=>c.length>0),je(o.pipe(Oe(ie))),ae(a)).subscribe(([[{prev:c}],p])=>{let[l]=c[c.length-1];if(l.offsetHeight){let f=sr(l);if(typeof f!="undefined"){let u=l.offsetTop-f.offsetTop,{height:h}=pe(f);f.scrollTo({top:u-h/2,behavior:p})}}})}return G("navigation.tracking")&&t.pipe(U(s),X("offset"),be(250),Le(1),U(n.pipe(Le(1))),at({delay:250}),ae(i)).subscribe(([,{prev:a}])=>{let c=ve(),p=a[a.length-1];if(p&&p.length){let[l]=p,{hash:f}=new URL(l.href);c.hash!==f&&(c.hash=f,history.replaceState({},"",`${c}`))}else c.hash="",history.replaceState({},"",`${c}`)}),Xa(e,{viewport$:t,header$:r}).pipe(y(a=>i.next(a)),_(()=>i.complete()),m(a=>F({ref:e},a)))})}function Za(e,{viewport$:t,main$:r,target$:o}){let n=t.pipe(m(({offset:{y:s}})=>s),Ke(2,1),m(([s,a])=>s>a&&a>0),Y()),i=r.pipe(m(({active:s})=>s));return 
Q([i,n]).pipe(m(([s,a])=>!(s&&a)),Y(),U(o.pipe(Le(1))),oe(!0),at({delay:250}),m(s=>({hidden:s})))}function pi(e,{viewport$:t,header$:r,main$:o,target$:n}){let i=new v,s=i.pipe(ee(),oe(!0));return i.subscribe({next({hidden:a}){e.hidden=a,a?(e.setAttribute("tabindex","-1"),e.blur()):e.removeAttribute("tabindex")},complete(){e.style.top="",e.hidden=!0,e.removeAttribute("tabindex")}}),r.pipe(U(s),X("height")).subscribe(({height:a})=>{e.style.top=`${a+16}px`}),d(e,"click").subscribe(a=>{a.preventDefault(),window.scrollTo({top:0})}),Za(e,{viewport$:t,main$:o,target$:n}).pipe(y(a=>i.next(a)),_(()=>i.complete()),m(a=>F({ref:e},a)))}function li({document$:e}){e.pipe(b(()=>R(".md-ellipsis")),re(t=>yt(t).pipe(U(e.pipe(Le(1))),g(r=>r),m(()=>t),ye(1))),g(t=>t.offsetWidth{let r=t.innerText,o=t.closest("a")||t;return o.title=r,Ge(o).pipe(U(e.pipe(Le(1))),_(()=>o.removeAttribute("title")))})).subscribe(),e.pipe(b(()=>R(".md-status")),re(t=>Ge(t))).subscribe()}function mi({document$:e,tablet$:t}){e.pipe(b(()=>R(".md-toggle--indeterminate")),y(r=>{r.indeterminate=!0,r.checked=!1}),re(r=>d(r,"change").pipe(Fr(()=>r.classList.contains("md-toggle--indeterminate")),m(()=>r))),ae(t)).subscribe(([r,o])=>{r.classList.remove("md-toggle--indeterminate"),o&&(r.checked=!1)})}function es(){return/(iPad|iPhone|iPod)/.test(navigator.userAgent)}function fi({document$:e}){e.pipe(b(()=>R("[data-md-scrollfix]")),y(t=>t.removeAttribute("data-md-scrollfix")),g(es),re(t=>d(t,"touchstart").pipe(m(()=>t)))).subscribe(t=>{let r=t.scrollTop;r===0?t.scrollTop=1:r+t.offsetHeight===t.scrollHeight&&(t.scrollTop=r-1)})}function ui({viewport$:e,tablet$:t}){Q([We("search"),t]).pipe(m(([r,o])=>r&&!o),b(r=>$(r).pipe(Ye(r?400:100))),ae(e)).subscribe(([r,{offset:{y:o}}])=>{if(r)document.body.setAttribute("data-md-scrolllock",""),document.body.style.top=`-${o}px`;else{let 
n=-1*parseInt(document.body.style.top,10);document.body.removeAttribute("data-md-scrolllock"),document.body.style.top="",n&&window.scrollTo(0,n)}})}Object.entries||(Object.entries=function(e){let t=[];for(let r of Object.keys(e))t.push([r,e[r]]);return t});Object.values||(Object.values=function(e){let t=[];for(let r of Object.keys(e))t.push(e[r]);return t});typeof Element!="undefined"&&(Element.prototype.scrollTo||(Element.prototype.scrollTo=function(e,t){typeof e=="object"?(this.scrollLeft=e.left,this.scrollTop=e.top):(this.scrollLeft=e,this.scrollTop=t)}),Element.prototype.replaceWith||(Element.prototype.replaceWith=function(...e){let t=this.parentNode;if(t){e.length===0&&t.removeChild(this);for(let r=e.length-1;r>=0;r--){let o=e[r];typeof o=="string"?o=document.createTextNode(o):o.parentNode&&o.parentNode.removeChild(o),r?t.insertBefore(this.previousSibling,o):t.replaceChild(o,this)}}}));function ts(){return location.protocol==="file:"?gt(`${new URL("search/search_index.js",Yr.base)}`).pipe(m(()=>__index),B(1)):De(new URL("search/search_index.json",Yr.base))}document.documentElement.classList.remove("no-js");document.documentElement.classList.add("js");var rt=No(),Rt=Jo(),wt=en(Rt),Br=Go(),_e=pn(),ur=At("(min-width: 960px)"),hi=At("(min-width: 1220px)"),bi=tn(),Yr=we(),vi=document.forms.namedItem("search")?ts():qe,Gr=new v;Wn({alert$:Gr});var Jr=new v;G("navigation.instant")&&zn({location$:Rt,viewport$:_e,progress$:Jr}).subscribe(rt);var di;((di=Yr.version)==null?void 0:di.provider)==="mike"&&Gn({document$:rt});T(Rt,wt).pipe(Ye(125)).subscribe(()=>{Be("drawer",!1),Be("search",!1)});Br.pipe(g(({mode:e})=>e==="global")).subscribe(e=>{switch(e.type){case"p":case",":let t=me("link[rel=prev]");typeof t!="undefined"&&st(t);break;case"n":case".":let r=me("link[rel=next]");typeof r!="undefined"&&st(r);break;case"Enter":let o=Re();o instanceof 
HTMLLabelElement&&o.click()}});li({document$:rt});mi({document$:rt,tablet$:ur});fi({document$:rt});ui({viewport$:_e,tablet$:ur});var tt=Rn(Te("header"),{viewport$:_e}),$t=rt.pipe(m(()=>Te("main")),b(e=>Fn(e,{viewport$:_e,header$:tt})),B(1)),rs=T(...ne("consent").map(e=>fn(e,{target$:wt})),...ne("dialog").map(e=>$n(e,{alert$:Gr})),...ne("header").map(e=>Pn(e,{viewport$:_e,header$:tt,main$:$t})),...ne("palette").map(e=>jn(e)),...ne("progress").map(e=>Un(e,{progress$:Jr})),...ne("search").map(e=>ti(e,{index$:vi,keyboard$:Br})),...ne("source").map(e=>ai(e))),os=H(()=>T(...ne("announce").map(e=>mn(e)),...ne("content").map(e=>Hn(e,{viewport$:_e,target$:wt,print$:bi})),...ne("content").map(e=>G("search.highlight")?ri(e,{index$:vi,location$:Rt}):L),...ne("header-title").map(e=>In(e,{viewport$:_e,header$:tt})),...ne("sidebar").map(e=>e.getAttribute("data-md-type")==="navigation"?Ur(hi,()=>Qr(e,{viewport$:_e,header$:tt,main$:$t})):Ur(ur,()=>Qr(e,{viewport$:_e,header$:tt,main$:$t}))),...ne("tabs").map(e=>si(e,{viewport$:_e,header$:tt})),...ne("toc").map(e=>ci(e,{viewport$:_e,header$:tt,main$:$t,target$:wt})),...ne("top").map(e=>pi(e,{viewport$:_e,header$:tt,main$:$t,target$:wt})))),gi=rt.pipe(b(()=>os),$e(rs),B(1));gi.subscribe();window.document$=rt;window.location$=Rt;window.target$=wt;window.keyboard$=Br;window.viewport$=_e;window.tablet$=ur;window.screen$=hi;window.print$=bi;window.alert$=Gr;window.progress$=Jr;window.component$=gi;})(); -//# sourceMappingURL=bundle.bd41221c.min.js.map + `):"",this.name="UnsubscriptionError",this.errors=r}});function Ve(e,t){if(e){var r=e.indexOf(t);0<=r&&e.splice(r,1)}}var Ie=function(){function e(t){this.initialTeardown=t,this.closed=!1,this._parentage=null,this._finalizers=null}return e.prototype.unsubscribe=function(){var t,r,o,n,i;if(!this.closed){this.closed=!0;var a=this._parentage;if(a)if(this._parentage=null,Array.isArray(a))try{for(var s=ue(a),c=s.next();!c.done;c=s.next()){var 
p=c.value;p.remove(this)}}catch(A){t={error:A}}finally{try{c&&!c.done&&(r=s.return)&&r.call(s)}finally{if(t)throw t.error}}else a.remove(this);var l=this.initialTeardown;if(k(l))try{l()}catch(A){i=A instanceof Wt?A.errors:[A]}var f=this._finalizers;if(f){this._finalizers=null;try{for(var u=ue(f),h=u.next();!h.done;h=u.next()){var w=h.value;try{co(w)}catch(A){i=i!=null?i:[],A instanceof Wt?i=z(z([],V(i)),V(A.errors)):i.push(A)}}}catch(A){o={error:A}}finally{try{h&&!h.done&&(n=u.return)&&n.call(u)}finally{if(o)throw o.error}}}if(i)throw new Wt(i)}},e.prototype.add=function(t){var r;if(t&&t!==this)if(this.closed)co(t);else{if(t instanceof e){if(t.closed||t._hasParent(this))return;t._addParent(this)}(this._finalizers=(r=this._finalizers)!==null&&r!==void 0?r:[]).push(t)}},e.prototype._hasParent=function(t){var r=this._parentage;return r===t||Array.isArray(r)&&r.includes(t)},e.prototype._addParent=function(t){var r=this._parentage;this._parentage=Array.isArray(r)?(r.push(t),r):r?[r,t]:t},e.prototype._removeParent=function(t){var r=this._parentage;r===t?this._parentage=null:Array.isArray(r)&&Ve(r,t)},e.prototype.remove=function(t){var r=this._finalizers;r&&Ve(r,t),t instanceof e&&t._removeParent(this)},e.EMPTY=function(){var t=new e;return t.closed=!0,t}(),e}();var Er=Ie.EMPTY;function Dt(e){return e instanceof Ie||e&&"closed"in e&&k(e.remove)&&k(e.add)&&k(e.unsubscribe)}function co(e){k(e)?e():e.unsubscribe()}var ke={onUnhandledError:null,onStoppedNotification:null,Promise:void 0,useDeprecatedSynchronousErrorHandling:!1,useDeprecatedNextContext:!1};var lt={setTimeout:function(e,t){for(var r=[],o=2;o0},enumerable:!1,configurable:!0}),t.prototype._trySubscribe=function(r){return this._throwIfClosed(),e.prototype._trySubscribe.call(this,r)},t.prototype._subscribe=function(r){return this._throwIfClosed(),this._checkFinalizedStatuses(r),this._innerSubscribe(r)},t.prototype._innerSubscribe=function(r){var o=this,n=this,i=n.hasError,a=n.isStopped,s=n.observers;return 
i||a?Er:(this.currentObservers=null,s.push(r),new Ie(function(){o.currentObservers=null,Ve(s,r)}))},t.prototype._checkFinalizedStatuses=function(r){var o=this,n=o.hasError,i=o.thrownError,a=o.isStopped;n?r.error(i):a&&r.complete()},t.prototype.asObservable=function(){var r=new j;return r.source=this,r},t.create=function(r,o){return new vo(r,o)},t}(j);var vo=function(e){se(t,e);function t(r,o){var n=e.call(this)||this;return n.destination=r,n.source=o,n}return t.prototype.next=function(r){var o,n;(n=(o=this.destination)===null||o===void 0?void 0:o.next)===null||n===void 0||n.call(o,r)},t.prototype.error=function(r){var o,n;(n=(o=this.destination)===null||o===void 0?void 0:o.error)===null||n===void 0||n.call(o,r)},t.prototype.complete=function(){var r,o;(o=(r=this.destination)===null||r===void 0?void 0:r.complete)===null||o===void 0||o.call(r)},t.prototype._subscribe=function(r){var o,n;return(n=(o=this.source)===null||o===void 0?void 0:o.subscribe(r))!==null&&n!==void 0?n:Er},t}(g);var St={now:function(){return(St.delegate||Date).now()},delegate:void 0};var Ot=function(e){se(t,e);function t(r,o,n){r===void 0&&(r=1/0),o===void 0&&(o=1/0),n===void 0&&(n=St);var i=e.call(this)||this;return i._bufferSize=r,i._windowTime=o,i._timestampProvider=n,i._buffer=[],i._infiniteTimeWindow=!0,i._infiniteTimeWindow=o===1/0,i._bufferSize=Math.max(1,r),i._windowTime=Math.max(1,o),i}return t.prototype.next=function(r){var o=this,n=o.isStopped,i=o._buffer,a=o._infiniteTimeWindow,s=o._timestampProvider,c=o._windowTime;n||(i.push(r),!a&&i.push(s.now()+c)),this._trimBuffer(),e.prototype.next.call(this,r)},t.prototype._subscribe=function(r){this._throwIfClosed(),this._trimBuffer();for(var o=this._innerSubscribe(r),n=this,i=n._infiniteTimeWindow,a=n._buffer,s=a.slice(),c=0;c0?e.prototype.requestAsyncId.call(this,r,o,n):(r.actions.push(this),r._scheduled||(r._scheduled=ut.requestAnimationFrame(function(){return r.flush(void 0)})))},t.prototype.recycleAsyncId=function(r,o,n){var i;if(n===void 
0&&(n=0),n!=null?n>0:this.delay>0)return e.prototype.recycleAsyncId.call(this,r,o,n);var a=r.actions;o!=null&&((i=a[a.length-1])===null||i===void 0?void 0:i.id)!==o&&(ut.cancelAnimationFrame(o),r._scheduled=void 0)},t}(zt);var yo=function(e){se(t,e);function t(){return e!==null&&e.apply(this,arguments)||this}return t.prototype.flush=function(r){this._active=!0;var o=this._scheduled;this._scheduled=void 0;var n=this.actions,i;r=r||n.shift();do if(i=r.execute(r.state,r.delay))break;while((r=n[0])&&r.id===o&&n.shift());if(this._active=!1,i){for(;(r=n[0])&&r.id===o&&n.shift();)r.unsubscribe();throw i}},t}(qt);var de=new yo(xo);var L=new j(function(e){return e.complete()});function Kt(e){return e&&k(e.schedule)}function _r(e){return e[e.length-1]}function Je(e){return k(_r(e))?e.pop():void 0}function Ae(e){return Kt(_r(e))?e.pop():void 0}function Qt(e,t){return typeof _r(e)=="number"?e.pop():t}var dt=function(e){return e&&typeof e.length=="number"&&typeof e!="function"};function Yt(e){return k(e==null?void 0:e.then)}function Bt(e){return k(e[ft])}function Gt(e){return Symbol.asyncIterator&&k(e==null?void 0:e[Symbol.asyncIterator])}function Jt(e){return new TypeError("You provided "+(e!==null&&typeof e=="object"?"an invalid object":"'"+e+"'")+" where a stream was expected. 
You can provide an Observable, Promise, ReadableStream, Array, AsyncIterable, or Iterable.")}function Di(){return typeof Symbol!="function"||!Symbol.iterator?"@@iterator":Symbol.iterator}var Xt=Di();function Zt(e){return k(e==null?void 0:e[Xt])}function er(e){return ao(this,arguments,function(){var r,o,n,i;return Ut(this,function(a){switch(a.label){case 0:r=e.getReader(),a.label=1;case 1:a.trys.push([1,,9,10]),a.label=2;case 2:return[4,ot(r.read())];case 3:return o=a.sent(),n=o.value,i=o.done,i?[4,ot(void 0)]:[3,5];case 4:return[2,a.sent()];case 5:return[4,ot(n)];case 6:return[4,a.sent()];case 7:return a.sent(),[3,2];case 8:return[3,10];case 9:return r.releaseLock(),[7];case 10:return[2]}})})}function tr(e){return k(e==null?void 0:e.getReader)}function N(e){if(e instanceof j)return e;if(e!=null){if(Bt(e))return Ni(e);if(dt(e))return Vi(e);if(Yt(e))return zi(e);if(Gt(e))return Eo(e);if(Zt(e))return qi(e);if(tr(e))return Ki(e)}throw Jt(e)}function Ni(e){return new j(function(t){var r=e[ft]();if(k(r.subscribe))return r.subscribe(t);throw new TypeError("Provided object does not correctly implement Symbol.observable")})}function Vi(e){return new j(function(t){for(var r=0;r=2;return function(o){return o.pipe(e?b(function(n,i){return e(n,i,o)}):ce,ye(1),r?Qe(t):jo(function(){return new or}))}}function $r(e){return e<=0?function(){return L}:x(function(t,r){var o=[];t.subscribe(S(r,function(n){o.push(n),e=2,!0))}function le(e){e===void 0&&(e={});var t=e.connector,r=t===void 0?function(){return new g}:t,o=e.resetOnError,n=o===void 0?!0:o,i=e.resetOnComplete,a=i===void 0?!0:i,s=e.resetOnRefCountZero,c=s===void 0?!0:s;return function(p){var l,f,u,h=0,w=!1,A=!1,Z=function(){f==null||f.unsubscribe(),f=void 0},te=function(){Z(),l=u=void 0,w=A=!1},J=function(){var C=l;te(),C==null||C.unsubscribe()};return x(function(C,ct){h++,!A&&!w&&Z();var Ne=u=u!=null?u:r();ct.add(function(){h--,h===0&&!A&&!w&&(f=Pr(J,c))}),Ne.subscribe(ct),!l&&h>0&&(l=new it({next:function(Pe){return 
Ne.next(Pe)},error:function(Pe){A=!0,Z(),f=Pr(te,n,Pe),Ne.error(Pe)},complete:function(){w=!0,Z(),f=Pr(te,a),Ne.complete()}}),N(C).subscribe(l))})(p)}}function Pr(e,t){for(var r=[],o=2;oe.next(document)),e}function R(e,t=document){return Array.from(t.querySelectorAll(e))}function P(e,t=document){let r=me(e,t);if(typeof r=="undefined")throw new ReferenceError(`Missing element: expected "${e}" to be present`);return r}function me(e,t=document){return t.querySelector(e)||void 0}function Re(){var e,t,r,o;return(o=(r=(t=(e=document.activeElement)==null?void 0:e.shadowRoot)==null?void 0:t.activeElement)!=null?r:document.activeElement)!=null?o:void 0}var la=T(d(document.body,"focusin"),d(document.body,"focusout")).pipe(be(1),q(void 0),m(()=>Re()||document.body),B(1));function vt(e){return la.pipe(m(t=>e.contains(t)),Y())}function Vo(e,t){return T(d(e,"mouseenter").pipe(m(()=>!0)),d(e,"mouseleave").pipe(m(()=>!1))).pipe(t?be(t):ce,q(!1))}function Ue(e){return{x:e.offsetLeft,y:e.offsetTop}}function zo(e){return T(d(window,"load"),d(window,"resize")).pipe(Me(0,de),m(()=>Ue(e)),q(Ue(e)))}function ir(e){return{x:e.scrollLeft,y:e.scrollTop}}function et(e){return T(d(e,"scroll"),d(window,"resize")).pipe(Me(0,de),m(()=>ir(e)),q(ir(e)))}function qo(e,t){if(typeof t=="string"||typeof t=="number")e.innerHTML+=t.toString();else if(t instanceof Node)e.appendChild(t);else if(Array.isArray(t))for(let r of t)qo(e,r)}function E(e,t,...r){let o=document.createElement(e);if(t)for(let n of Object.keys(t))typeof t[n]!="undefined"&&(typeof t[n]!="boolean"?o.setAttribute(n,t[n]):o.setAttribute(n,""));for(let n of r)qo(o,n);return o}function ar(e){if(e>999){let t=+((e-950)%1e3>99);return`${((e+1e-6)/1e3).toFixed(t)}k`}else return e.toString()}function gt(e){let t=E("script",{src:e});return H(()=>(document.head.appendChild(t),T(d(t,"load"),d(t,"error").pipe(v(()=>Ar(()=>new ReferenceError(`Invalid script: ${e}`))))).pipe(m(()=>{}),_(()=>document.head.removeChild(t)),ye(1))))}var Ko=new 
g,ma=H(()=>typeof ResizeObserver=="undefined"?gt("https://unpkg.com/resize-observer-polyfill"):$(void 0)).pipe(m(()=>new ResizeObserver(e=>{for(let t of e)Ko.next(t)})),v(e=>T(qe,$(e)).pipe(_(()=>e.disconnect()))),B(1));function pe(e){return{width:e.offsetWidth,height:e.offsetHeight}}function Ee(e){return ma.pipe(y(t=>t.observe(e)),v(t=>Ko.pipe(b(({target:r})=>r===e),_(()=>t.unobserve(e)),m(()=>pe(e)))),q(pe(e)))}function xt(e){return{width:e.scrollWidth,height:e.scrollHeight}}function sr(e){let t=e.parentElement;for(;t&&(e.scrollWidth<=t.scrollWidth&&e.scrollHeight<=t.scrollHeight);)t=(e=t).parentElement;return t?e:void 0}var Qo=new g,fa=H(()=>$(new IntersectionObserver(e=>{for(let t of e)Qo.next(t)},{threshold:0}))).pipe(v(e=>T(qe,$(e)).pipe(_(()=>e.disconnect()))),B(1));function yt(e){return fa.pipe(y(t=>t.observe(e)),v(t=>Qo.pipe(b(({target:r})=>r===e),_(()=>t.unobserve(e)),m(({isIntersecting:r})=>r))))}function Yo(e,t=16){return et(e).pipe(m(({y:r})=>{let o=pe(e),n=xt(e);return r>=n.height-o.height-t}),Y())}var cr={drawer:P("[data-md-toggle=drawer]"),search:P("[data-md-toggle=search]")};function Bo(e){return cr[e].checked}function Be(e,t){cr[e].checked!==t&&cr[e].click()}function We(e){let t=cr[e];return d(t,"change").pipe(m(()=>t.checked),q(t.checked))}function ua(e,t){switch(e.constructor){case HTMLInputElement:return e.type==="radio"?/^Arrow/.test(t):!0;case HTMLSelectElement:case HTMLTextAreaElement:return!0;default:return e.isContentEditable}}function da(){return T(d(window,"compositionstart").pipe(m(()=>!0)),d(window,"compositionend").pipe(m(()=>!1))).pipe(q(!1))}function Go(){let e=d(window,"keydown").pipe(b(t=>!(t.metaKey||t.ctrlKey)),m(t=>({mode:Bo("search")?"search":"global",type:t.key,claim(){t.preventDefault(),t.stopPropagation()}})),b(({mode:t,type:r})=>{if(t==="global"){let o=Re();if(typeof o!="undefined")return!ua(o,r)}return!0}),le());return da().pipe(v(t=>t?L:e))}function ve(){return new URL(location.href)}function 
st(e,t=!1){if(G("navigation.instant")&&!t){let r=E("a",{href:e.href});document.body.appendChild(r),r.click(),r.remove()}else location.href=e.href}function Jo(){return new g}function Xo(){return location.hash.slice(1)}function Zo(e){let t=E("a",{href:e});t.addEventListener("click",r=>r.stopPropagation()),t.click()}function ha(e){return T(d(window,"hashchange"),e).pipe(m(Xo),q(Xo()),b(t=>t.length>0),B(1))}function en(e){return ha(e).pipe(m(t=>me(`[id="${t}"]`)),b(t=>typeof t!="undefined"))}function At(e){let t=matchMedia(e);return nr(r=>t.addListener(()=>r(t.matches))).pipe(q(t.matches))}function tn(){let e=matchMedia("print");return T(d(window,"beforeprint").pipe(m(()=>!0)),d(window,"afterprint").pipe(m(()=>!1))).pipe(q(e.matches))}function Ur(e,t){return e.pipe(v(r=>r?t():L))}function Wr(e,t){return new j(r=>{let o=new XMLHttpRequest;return o.open("GET",`${e}`),o.responseType="blob",o.addEventListener("load",()=>{o.status>=200&&o.status<300?(r.next(o.response),r.complete()):r.error(new Error(o.statusText))}),o.addEventListener("error",()=>{r.error(new Error("Network error"))}),o.addEventListener("abort",()=>{r.complete()}),typeof(t==null?void 0:t.progress$)!="undefined"&&(o.addEventListener("progress",n=>{var i;if(n.lengthComputable)t.progress$.next(n.loaded/n.total*100);else{let a=(i=o.getResponseHeader("Content-Length"))!=null?i:0;t.progress$.next(n.loaded/+a*100)}}),t.progress$.next(5)),o.send(),()=>o.abort()})}function De(e,t){return Wr(e,t).pipe(v(r=>r.text()),m(r=>JSON.parse(r)),B(1))}function rn(e,t){let r=new DOMParser;return Wr(e,t).pipe(v(o=>o.text()),m(o=>r.parseFromString(o,"text/html")),B(1))}function on(e,t){let r=new DOMParser;return Wr(e,t).pipe(v(o=>o.text()),m(o=>r.parseFromString(o,"text/xml")),B(1))}function nn(){return{x:Math.max(0,scrollX),y:Math.max(0,scrollY)}}function an(){return T(d(window,"scroll",{passive:!0}),d(window,"resize",{passive:!0})).pipe(m(nn),q(nn()))}function sn(){return{width:innerWidth,height:innerHeight}}function 
cn(){return d(window,"resize",{passive:!0}).pipe(m(sn),q(sn()))}function pn(){return Q([an(),cn()]).pipe(m(([e,t])=>({offset:e,size:t})),B(1))}function pr(e,{viewport$:t,header$:r}){let o=t.pipe(X("size")),n=Q([o,r]).pipe(m(()=>Ue(e)));return Q([r,t,n]).pipe(m(([{height:i},{offset:a,size:s},{x:c,y:p}])=>({offset:{x:a.x-c,y:a.y-p+i},size:s})))}function ba(e){return d(e,"message",t=>t.data)}function va(e){let t=new g;return t.subscribe(r=>e.postMessage(r)),t}function ln(e,t=new Worker(e)){let r=ba(t),o=va(t),n=new g;n.subscribe(o);let i=o.pipe(ee(),oe(!0));return n.pipe(ee(),$e(r.pipe(U(i))),le())}var ga=P("#__config"),Et=JSON.parse(ga.textContent);Et.base=`${new URL(Et.base,ve())}`;function we(){return Et}function G(e){return Et.features.includes(e)}function ge(e,t){return typeof t!="undefined"?Et.translations[e].replace("#",t.toString()):Et.translations[e]}function Te(e,t=document){return P(`[data-md-component=${e}]`,t)}function ie(e,t=document){return R(`[data-md-component=${e}]`,t)}function xa(e){let t=P(".md-typeset > :first-child",e);return d(t,"click",{once:!0}).pipe(m(()=>P(".md-typeset",e)),m(r=>({hash:__md_hash(r.innerHTML)})))}function mn(e){if(!G("announce.dismiss")||!e.childElementCount)return L;if(!e.hidden){let t=P(".md-typeset",e);__md_hash(t.innerHTML)===__md_get("__announce")&&(e.hidden=!0)}return H(()=>{let t=new g;return t.subscribe(({hash:r})=>{e.hidden=!0,__md_set("__announce",r)}),xa(e).pipe(y(r=>t.next(r)),_(()=>t.complete()),m(r=>F({ref:e},r)))})}function ya(e,{target$:t}){return t.pipe(m(r=>({hidden:r!==e})))}function fn(e,t){let r=new g;return r.subscribe(({hidden:o})=>{e.hidden=o}),ya(e,t).pipe(y(o=>r.next(o)),_(()=>r.complete()),m(o=>F({ref:e},o)))}function Ct(e,t){return t==="inline"?E("div",{class:"md-tooltip md-tooltip--inline",id:e,role:"tooltip"},E("div",{class:"md-tooltip__inner md-typeset"})):E("div",{class:"md-tooltip",id:e,role:"tooltip"},E("div",{class:"md-tooltip__inner md-typeset"}))}function 
un(e,t){if(t=t?`${t}_annotation_${e}`:void 0,t){let r=t?`#${t}`:void 0;return E("aside",{class:"md-annotation",tabIndex:0},Ct(t),E("a",{href:r,class:"md-annotation__index",tabIndex:-1},E("span",{"data-md-annotation-id":e})))}else return E("aside",{class:"md-annotation",tabIndex:0},Ct(t),E("span",{class:"md-annotation__index",tabIndex:-1},E("span",{"data-md-annotation-id":e})))}function dn(e){return E("button",{class:"md-clipboard md-icon",title:ge("clipboard.copy"),"data-clipboard-target":`#${e} > code`})}function Dr(e,t){let r=t&2,o=t&1,n=Object.keys(e.terms).filter(c=>!e.terms[c]).reduce((c,p)=>[...c,E("del",null,p)," "],[]).slice(0,-1),i=we(),a=new URL(e.location,i.base);G("search.highlight")&&a.searchParams.set("h",Object.entries(e.terms).filter(([,c])=>c).reduce((c,[p])=>`${c} ${p}`.trim(),""));let{tags:s}=we();return E("a",{href:`${a}`,class:"md-search-result__link",tabIndex:-1},E("article",{class:"md-search-result__article md-typeset","data-md-score":e.score.toFixed(2)},r>0&&E("div",{class:"md-search-result__icon md-icon"}),r>0&&E("h1",null,e.title),r<=0&&E("h2",null,e.title),o>0&&e.text.length>0&&e.text,e.tags&&e.tags.map(c=>{let p=s?c in s?`md-tag-icon md-tag--${s[c]}`:"md-tag-icon":"";return E("span",{class:`md-tag ${p}`},c)}),o>0&&n.length>0&&E("p",{class:"md-search-result__terms"},ge("search.result.term.missing"),": ",...n)))}function hn(e){let t=e[0].score,r=[...e],o=we(),n=r.findIndex(l=>!`${new URL(l.location,o.base)}`.includes("#")),[i]=r.splice(n,1),a=r.findIndex(l=>l.scoreDr(l,1)),...c.length?[E("details",{class:"md-search-result__more"},E("summary",{tabIndex:-1},E("div",null,c.length>0&&c.length===1?ge("search.result.more.one"):ge("search.result.more.other",c.length))),...c.map(l=>Dr(l,1)))]:[]];return E("li",{class:"md-search-result__item"},p)}function bn(e){return E("ul",{class:"md-source__facts"},Object.entries(e).map(([t,r])=>E("li",{class:`md-source__fact md-source__fact--${t}`},typeof r=="number"?ar(r):r)))}function Nr(e){let 
t=`tabbed-control tabbed-control--${e}`;return E("div",{class:t,hidden:!0},E("button",{class:"tabbed-button",tabIndex:-1,"aria-hidden":"true"}))}function vn(e){return E("div",{class:"md-typeset__scrollwrap"},E("div",{class:"md-typeset__table"},e))}function Ea(e){let t=we(),r=new URL(`../${e.version}/`,t.base);return E("li",{class:"md-version__item"},E("a",{href:`${r}`,class:"md-version__link"},e.title))}function gn(e,t){return e=e.filter(r=>{var o;return!((o=r.properties)!=null&&o.hidden)}),E("div",{class:"md-version"},E("button",{class:"md-version__current","aria-label":ge("select.version")},t.title),E("ul",{class:"md-version__list"},e.map(Ea)))}var wa=0;function Ta(e,t){document.body.append(e);let{width:r}=pe(e);e.style.setProperty("--md-tooltip-width",`${r}px`),e.remove();let o=sr(t),n=typeof o!="undefined"?et(o):$({x:0,y:0}),i=T(vt(t),Vo(t)).pipe(Y());return Q([i,n]).pipe(m(([a,s])=>{let{x:c,y:p}=Ue(t),l=pe(t),f=t.closest("table");return f&&t.parentElement&&(c+=f.offsetLeft+t.parentElement.offsetLeft,p+=f.offsetTop+t.parentElement.offsetTop),{active:a,offset:{x:c-s.x+l.width/2-r/2,y:p-s.y+l.height+8}}}))}function Ge(e){let t=e.title;if(!t.length)return L;let r=`__tooltip_${wa++}`,o=Ct(r,"inline"),n=P(".md-typeset",o);return n.innerHTML=t,H(()=>{let i=new g;return 
i.subscribe({next({offset:a}){o.style.setProperty("--md-tooltip-x",`${a.x}px`),o.style.setProperty("--md-tooltip-y",`${a.y}px`)},complete(){o.style.removeProperty("--md-tooltip-x"),o.style.removeProperty("--md-tooltip-y")}}),T(i.pipe(b(({active:a})=>a)),i.pipe(be(250),b(({active:a})=>!a))).subscribe({next({active:a}){a?(e.insertAdjacentElement("afterend",o),e.setAttribute("aria-describedby",r),e.removeAttribute("title")):(o.remove(),e.removeAttribute("aria-describedby"),e.setAttribute("title",t))},complete(){o.remove(),e.removeAttribute("aria-describedby"),e.setAttribute("title",t)}}),i.pipe(Me(16,de)).subscribe(({active:a})=>{o.classList.toggle("md-tooltip--active",a)}),i.pipe(_t(125,de),b(()=>!!e.offsetParent),m(()=>e.offsetParent.getBoundingClientRect()),m(({x:a})=>a)).subscribe({next(a){a?o.style.setProperty("--md-tooltip-0",`${-a}px`):o.style.removeProperty("--md-tooltip-0")},complete(){o.style.removeProperty("--md-tooltip-0")}}),Ta(o,e).pipe(y(a=>i.next(a)),_(()=>i.complete()),m(a=>F({ref:e},a)))}).pipe(ze(ae))}function Sa(e,t){let r=H(()=>Q([zo(e),et(t)])).pipe(m(([{x:o,y:n},i])=>{let{width:a,height:s}=pe(e);return{x:o-i.x+a/2,y:n-i.y+s/2}}));return vt(e).pipe(v(o=>r.pipe(m(n=>({active:o,offset:n})),ye(+!o||1/0))))}function xn(e,t,{target$:r}){let[o,n]=Array.from(e.children);return H(()=>{let i=new g,a=i.pipe(ee(),oe(!0));return 
i.subscribe({next({offset:s}){e.style.setProperty("--md-tooltip-x",`${s.x}px`),e.style.setProperty("--md-tooltip-y",`${s.y}px`)},complete(){e.style.removeProperty("--md-tooltip-x"),e.style.removeProperty("--md-tooltip-y")}}),yt(e).pipe(U(a)).subscribe(s=>{e.toggleAttribute("data-md-visible",s)}),T(i.pipe(b(({active:s})=>s)),i.pipe(be(250),b(({active:s})=>!s))).subscribe({next({active:s}){s?e.prepend(o):o.remove()},complete(){e.prepend(o)}}),i.pipe(Me(16,de)).subscribe(({active:s})=>{o.classList.toggle("md-tooltip--active",s)}),i.pipe(_t(125,de),b(()=>!!e.offsetParent),m(()=>e.offsetParent.getBoundingClientRect()),m(({x:s})=>s)).subscribe({next(s){s?e.style.setProperty("--md-tooltip-0",`${-s}px`):e.style.removeProperty("--md-tooltip-0")},complete(){e.style.removeProperty("--md-tooltip-0")}}),d(n,"click").pipe(U(a),b(s=>!(s.metaKey||s.ctrlKey))).subscribe(s=>{s.stopPropagation(),s.preventDefault()}),d(n,"mousedown").pipe(U(a),ne(i)).subscribe(([s,{active:c}])=>{var p;if(s.button!==0||s.metaKey||s.ctrlKey)s.preventDefault();else if(c){s.preventDefault();let l=e.parentElement.closest(".md-annotation");l instanceof HTMLElement?l.focus():(p=Re())==null||p.blur()}}),r.pipe(U(a),b(s=>s===o),Ye(125)).subscribe(()=>e.focus()),Sa(e,t).pipe(y(s=>i.next(s)),_(()=>i.complete()),m(s=>F({ref:e},s)))})}function Oa(e){return e.tagName==="CODE"?R(".c, .c1, .cm",e):[e]}function Ma(e){let t=[];for(let r of Oa(e)){let o=[],n=document.createNodeIterator(r,NodeFilter.SHOW_TEXT);for(let i=n.nextNode();i;i=n.nextNode())o.push(i);for(let i of o){let a;for(;a=/(\(\d+\))(!)?/.exec(i.textContent);){let[,s,c]=a;if(typeof c=="undefined"){let p=i.splitText(a.index);i=p.splitText(s.length),t.push(p)}else{i.textContent=s,t.push(i);break}}}}return t}function yn(e,t){t.append(...Array.from(e.childNodes))}function lr(e,t,{target$:r,print$:o}){let n=t.closest("[id]"),i=n==null?void 0:n.id,a=new Map;for(let s of Ma(t)){let[,c]=s.textContent.match(/\((\d+)\)/);me(`:scope > 
li:nth-child(${c})`,e)&&(a.set(c,un(c,i)),s.replaceWith(a.get(c)))}return a.size===0?L:H(()=>{let s=new g,c=s.pipe(ee(),oe(!0)),p=[];for(let[l,f]of a)p.push([P(".md-typeset",f),P(`:scope > li:nth-child(${l})`,e)]);return o.pipe(U(c)).subscribe(l=>{e.hidden=!l,e.classList.toggle("md-annotation-list",l);for(let[f,u]of p)l?yn(f,u):yn(u,f)}),T(...[...a].map(([,l])=>xn(l,t,{target$:r}))).pipe(_(()=>s.complete()),le())})}function En(e){if(e.nextElementSibling){let t=e.nextElementSibling;if(t.tagName==="OL")return t;if(t.tagName==="P"&&!t.children.length)return En(t)}}function wn(e,t){return H(()=>{let r=En(e);return typeof r!="undefined"?lr(r,e,t):L})}var Tn=jt(zr());var La=0;function Sn(e){if(e.nextElementSibling){let t=e.nextElementSibling;if(t.tagName==="OL")return t;if(t.tagName==="P"&&!t.children.length)return Sn(t)}}function _a(e){return Ee(e).pipe(m(({width:t})=>({scrollable:xt(e).width>t})),X("scrollable"))}function On(e,t){let{matches:r}=matchMedia("(hover)"),o=H(()=>{let n=new g,i=n.pipe($r(1));n.subscribe(({scrollable:c})=>{c&&r?e.setAttribute("tabindex","0"):e.removeAttribute("tabindex")});let a=[];if(Tn.default.isSupported()&&(e.closest(".copy")||G("content.code.copy")&&!e.closest(".no-copy"))){let c=e.closest("pre");c.id=`__code_${La++}`;let p=dn(c.id);c.insertBefore(p,e),G("content.tooltips")&&a.push(Ge(p))}let s=e.closest(".highlight");if(s instanceof HTMLElement){let c=Sn(s);if(typeof c!="undefined"&&(s.classList.contains("annotate")||G("content.code.annotate"))){let p=lr(c,e,t);a.push(Ee(s).pipe(U(i),m(({width:l,height:f})=>l&&f),Y(),v(l=>l?p:L)))}}return _a(e).pipe(y(c=>n.next(c)),_(()=>n.complete()),m(c=>F({ref:e},c)),$e(...a))});return G("content.lazy")?yt(e).pipe(b(n=>n),ye(1),v(()=>o)):o}function Aa(e,{target$:t,print$:r}){let o=!0;return T(t.pipe(m(n=>n.closest("details:not([open])")),b(n=>e===n),m(()=>({action:"open",reveal:!0}))),r.pipe(b(n=>n||!o),y(()=>o=e.open),m(n=>({action:n?"open":"close"}))))}function Mn(e,t){return H(()=>{let r=new 
g;return r.subscribe(({action:o,reveal:n})=>{e.toggleAttribute("open",o==="open"),n&&e.scrollIntoView()}),Aa(e,t).pipe(y(o=>r.next(o)),_(()=>r.complete()),m(o=>F({ref:e},o)))})}var Ln=".node circle,.node ellipse,.node path,.node polygon,.node rect{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}marker{fill:var(--md-mermaid-edge-color)!important}.edgeLabel .label rect{fill:#0000}.label{color:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.label foreignObject{line-height:normal;overflow:visible}.label div .edgeLabel{color:var(--md-mermaid-label-fg-color)}.edgeLabel,.edgeLabel rect,.label div .edgeLabel{background-color:var(--md-mermaid-label-bg-color)}.edgeLabel,.edgeLabel rect{fill:var(--md-mermaid-label-bg-color);color:var(--md-mermaid-edge-color)}.edgePath .path,.flowchart-link{stroke:var(--md-mermaid-edge-color);stroke-width:.05rem}.edgePath .arrowheadPath{fill:var(--md-mermaid-edge-color);stroke:none}.cluster rect{fill:var(--md-default-fg-color--lightest);stroke:var(--md-default-fg-color--lighter)}.cluster span{color:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}g #flowchart-circleEnd,g #flowchart-circleStart,g #flowchart-crossEnd,g #flowchart-crossStart,g #flowchart-pointEnd,g #flowchart-pointStart{stroke:none}g.classGroup line,g.classGroup rect{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}g.classGroup text{fill:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.classLabel .box{fill:var(--md-mermaid-label-bg-color);background-color:var(--md-mermaid-label-bg-color);opacity:1}.classLabel .label{fill:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.node .divider{stroke:var(--md-mermaid-node-fg-color)}.relation{stroke:var(--md-mermaid-edge-color)}.cardinality{fill:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.cardinality text{fill:inherit!important}defs 
#classDiagram-compositionEnd,defs #classDiagram-compositionStart,defs #classDiagram-dependencyEnd,defs #classDiagram-dependencyStart,defs #classDiagram-extensionEnd,defs #classDiagram-extensionStart{fill:var(--md-mermaid-edge-color)!important;stroke:var(--md-mermaid-edge-color)!important}defs #classDiagram-aggregationEnd,defs #classDiagram-aggregationStart{fill:var(--md-mermaid-label-bg-color)!important;stroke:var(--md-mermaid-edge-color)!important}g.stateGroup rect{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}g.stateGroup .state-title{fill:var(--md-mermaid-label-fg-color)!important;font-family:var(--md-mermaid-font-family)}g.stateGroup .composit{fill:var(--md-mermaid-label-bg-color)}.nodeLabel,.nodeLabel p{color:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.node circle.state-end,.node circle.state-start,.start-state{fill:var(--md-mermaid-edge-color);stroke:none}.end-state-inner,.end-state-outer{fill:var(--md-mermaid-edge-color)}.end-state-inner,.node circle.state-end{stroke:var(--md-mermaid-label-bg-color)}.transition{stroke:var(--md-mermaid-edge-color)}[id^=state-fork] rect,[id^=state-join] rect{fill:var(--md-mermaid-edge-color)!important;stroke:none!important}.statediagram-cluster.statediagram-cluster .inner{fill:var(--md-default-bg-color)}.statediagram-cluster rect{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}.statediagram-state rect.divider{fill:var(--md-default-fg-color--lightest);stroke:var(--md-default-fg-color--lighter)}defs 
#statediagram-barbEnd{stroke:var(--md-mermaid-edge-color)}.attributeBoxEven,.attributeBoxOdd{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}.entityBox{fill:var(--md-mermaid-label-bg-color);stroke:var(--md-mermaid-node-fg-color)}.entityLabel{fill:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.relationshipLabelBox{fill:var(--md-mermaid-label-bg-color);fill-opacity:1;background-color:var(--md-mermaid-label-bg-color);opacity:1}.relationshipLabel{fill:var(--md-mermaid-label-fg-color)}.relationshipLine{stroke:var(--md-mermaid-edge-color)}defs #ONE_OR_MORE_END *,defs #ONE_OR_MORE_START *,defs #ONLY_ONE_END *,defs #ONLY_ONE_START *,defs #ZERO_OR_MORE_END *,defs #ZERO_OR_MORE_START *,defs #ZERO_OR_ONE_END *,defs #ZERO_OR_ONE_START *{stroke:var(--md-mermaid-edge-color)!important}defs #ZERO_OR_MORE_END circle,defs #ZERO_OR_MORE_START circle{fill:var(--md-mermaid-label-bg-color)}.actor{fill:var(--md-mermaid-sequence-actor-bg-color);stroke:var(--md-mermaid-sequence-actor-border-color)}text.actor>tspan{fill:var(--md-mermaid-sequence-actor-fg-color);font-family:var(--md-mermaid-font-family)}line{stroke:var(--md-mermaid-sequence-actor-line-color)}.actor-man circle,.actor-man line{fill:var(--md-mermaid-sequence-actorman-bg-color);stroke:var(--md-mermaid-sequence-actorman-line-color)}.messageLine0,.messageLine1{stroke:var(--md-mermaid-sequence-message-line-color)}.note{fill:var(--md-mermaid-sequence-note-bg-color);stroke:var(--md-mermaid-sequence-note-border-color)}.loopText,.loopText>tspan,.messageText,.noteText>tspan{stroke:none;font-family:var(--md-mermaid-font-family)!important}.messageText{fill:var(--md-mermaid-sequence-message-fg-color)}.loopText,.loopText>tspan{fill:var(--md-mermaid-sequence-loop-fg-color)}.noteText>tspan{fill:var(--md-mermaid-sequence-note-fg-color)}#arrowhead 
path{fill:var(--md-mermaid-sequence-message-line-color);stroke:none}.loopLine{fill:var(--md-mermaid-sequence-loop-bg-color);stroke:var(--md-mermaid-sequence-loop-border-color)}.labelBox{fill:var(--md-mermaid-sequence-label-bg-color);stroke:none}.labelText,.labelText>span{fill:var(--md-mermaid-sequence-label-fg-color);font-family:var(--md-mermaid-font-family)}.sequenceNumber{fill:var(--md-mermaid-sequence-number-fg-color)}rect.rect{fill:var(--md-mermaid-sequence-box-bg-color);stroke:none}rect.rect+text.text{fill:var(--md-mermaid-sequence-box-fg-color)}defs #sequencenumber{fill:var(--md-mermaid-sequence-number-bg-color)!important}";var qr,ka=0;function Ha(){return typeof mermaid=="undefined"||mermaid instanceof Element?gt("https://unpkg.com/mermaid@10.7.0/dist/mermaid.min.js"):$(void 0)}function _n(e){return e.classList.remove("mermaid"),qr||(qr=Ha().pipe(y(()=>mermaid.initialize({startOnLoad:!1,themeCSS:Ln,sequence:{actorFontSize:"16px",messageFontSize:"16px",noteFontSize:"16px"}})),m(()=>{}),B(1))),qr.subscribe(()=>ro(this,null,function*(){e.classList.add("mermaid");let t=`__mermaid_${ka++}`,r=E("div",{class:"mermaid"}),o=e.textContent,{svg:n,fn:i}=yield mermaid.render(t,o),a=r.attachShadow({mode:"closed"});a.innerHTML=n,e.replaceWith(r),i==null||i(a)})),qr.pipe(m(()=>({ref:e})))}var An=E("table");function Cn(e){return e.replaceWith(An),An.replaceWith(vn(e)),$({ref:e})}function $a(e){let t=e.find(r=>r.checked)||e[0];return T(...e.map(r=>d(r,"change").pipe(m(()=>P(`label[for="${r.id}"]`))))).pipe(q(P(`label[for="${t.id}"]`)),m(r=>({active:r})))}function kn(e,{viewport$:t,target$:r}){let o=P(".tabbed-labels",e),n=R(":scope > input",e),i=Nr("prev");e.append(i);let a=Nr("next");return e.append(a),H(()=>{let s=new g,c=s.pipe(ee(),oe(!0));Q([s,Ee(e)]).pipe(U(c),Me(1,de)).subscribe({next([{active:p},l]){let f=Ue(p),{width:u}=pe(p);e.style.setProperty("--md-indicator-x",`${f.x}px`),e.style.setProperty("--md-indicator-width",`${u}px`);let 
h=ir(o);(f.xh.x+l.width)&&o.scrollTo({left:Math.max(0,f.x-16),behavior:"smooth"})},complete(){e.style.removeProperty("--md-indicator-x"),e.style.removeProperty("--md-indicator-width")}}),Q([et(o),Ee(o)]).pipe(U(c)).subscribe(([p,l])=>{let f=xt(o);i.hidden=p.x<16,a.hidden=p.x>f.width-l.width-16}),T(d(i,"click").pipe(m(()=>-1)),d(a,"click").pipe(m(()=>1))).pipe(U(c)).subscribe(p=>{let{width:l}=pe(o);o.scrollBy({left:l*p,behavior:"smooth"})}),r.pipe(U(c),b(p=>n.includes(p))).subscribe(p=>p.click()),o.classList.add("tabbed-labels--linked");for(let p of n){let l=P(`label[for="${p.id}"]`);l.replaceChildren(E("a",{href:`#${l.htmlFor}`,tabIndex:-1},...Array.from(l.childNodes))),d(l.firstElementChild,"click").pipe(U(c),b(f=>!(f.metaKey||f.ctrlKey)),y(f=>{f.preventDefault(),f.stopPropagation()})).subscribe(()=>{history.replaceState({},"",`#${l.htmlFor}`),l.click()})}return G("content.tabs.link")&&s.pipe(Le(1),ne(t)).subscribe(([{active:p},{offset:l}])=>{let f=p.innerText.trim();if(p.hasAttribute("data-md-switching"))p.removeAttribute("data-md-switching");else{let u=e.offsetTop-l.y;for(let w of R("[data-tabs]"))for(let A of R(":scope > input",w)){let Z=P(`label[for="${A.id}"]`);if(Z!==p&&Z.innerText.trim()===f){Z.setAttribute("data-md-switching",""),A.click();break}}window.scrollTo({top:e.offsetTop-u});let h=__md_get("__tabs")||[];__md_set("__tabs",[...new Set([f,...h])])}}),s.pipe(U(c)).subscribe(()=>{for(let p of R("audio, video",e))p.pause()}),$a(n).pipe(y(p=>s.next(p)),_(()=>s.complete()),m(p=>F({ref:e},p)))}).pipe(ze(ae))}function Hn(e,{viewport$:t,target$:r,print$:o}){return T(...R(".annotate:not(.highlight)",e).map(n=>wn(n,{target$:r,print$:o})),...R("pre:not(.mermaid) > 
code",e).map(n=>On(n,{target$:r,print$:o})),...R("pre.mermaid",e).map(n=>_n(n)),...R("table:not([class])",e).map(n=>Cn(n)),...R("details",e).map(n=>Mn(n,{target$:r,print$:o})),...R("[data-tabs]",e).map(n=>kn(n,{viewport$:t,target$:r})),...R("[title]",e).filter(()=>G("content.tooltips")).map(n=>Ge(n)))}function Ra(e,{alert$:t}){return t.pipe(v(r=>T($(!0),$(!1).pipe(Ye(2e3))).pipe(m(o=>({message:r,active:o})))))}function $n(e,t){let r=P(".md-typeset",e);return H(()=>{let o=new g;return o.subscribe(({message:n,active:i})=>{e.classList.toggle("md-dialog--active",i),r.textContent=n}),Ra(e,t).pipe(y(n=>o.next(n)),_(()=>o.complete()),m(n=>F({ref:e},n)))})}function Pa({viewport$:e}){if(!G("header.autohide"))return $(!1);let t=e.pipe(m(({offset:{y:n}})=>n),Ke(2,1),m(([n,i])=>[nMath.abs(i-n.y)>100),m(([,[n]])=>n),Y()),o=We("search");return Q([e,o]).pipe(m(([{offset:n},i])=>n.y>400&&!i),Y(),v(n=>n?r:$(!1)),q(!1))}function Rn(e,t){return H(()=>Q([Ee(e),Pa(t)])).pipe(m(([{height:r},o])=>({height:r,hidden:o})),Y((r,o)=>r.height===o.height&&r.hidden===o.hidden),B(1))}function Pn(e,{header$:t,main$:r}){return H(()=>{let o=new g,n=o.pipe(ee(),oe(!0));o.pipe(X("active"),je(t)).subscribe(([{active:a},{hidden:s}])=>{e.classList.toggle("md-header--shadow",a&&!s),e.hidden=s});let i=fe(R("[title]",e)).pipe(b(()=>G("content.tooltips")),re(a=>Ge(a)));return r.subscribe(o),t.pipe(U(n),m(a=>F({ref:e},a)),$e(i.pipe(U(n))))})}function Ia(e,{viewport$:t,header$:r}){return pr(e,{viewport$:t,header$:r}).pipe(m(({offset:{y:o}})=>{let{height:n}=pe(e);return{active:o>=n}}),X("active"))}function In(e,t){return H(()=>{let r=new g;r.subscribe({next({active:n}){e.classList.toggle("md-header__title--active",n)},complete(){e.classList.remove("md-header__title--active")}});let o=me(".md-content h1");return typeof o=="undefined"?L:Ia(o,t).pipe(y(n=>r.next(n)),_(()=>r.complete()),m(n=>F({ref:e},n)))})}function Fn(e,{viewport$:t,header$:r}){let 
o=r.pipe(m(({height:i})=>i),Y()),n=o.pipe(v(()=>Ee(e).pipe(m(({height:i})=>({top:e.offsetTop,bottom:e.offsetTop+i})),X("bottom"))));return Q([o,n,t]).pipe(m(([i,{top:a,bottom:s},{offset:{y:c},size:{height:p}}])=>(p=Math.max(0,p-Math.max(0,a-c,i)-Math.max(0,p+c-s)),{offset:a-i,height:p,active:a-i<=c})),Y((i,a)=>i.offset===a.offset&&i.height===a.height&&i.active===a.active))}function Fa(e){let t=__md_get("__palette")||{index:e.findIndex(o=>matchMedia(o.getAttribute("data-md-color-media")).matches)},r=Math.max(0,Math.min(t.index,e.length-1));return $(...e).pipe(re(o=>d(o,"change").pipe(m(()=>o))),q(e[r]),m(o=>({index:e.indexOf(o),color:{media:o.getAttribute("data-md-color-media"),scheme:o.getAttribute("data-md-color-scheme"),primary:o.getAttribute("data-md-color-primary"),accent:o.getAttribute("data-md-color-accent")}})),B(1))}function jn(e){let t=R("input",e),r=E("meta",{name:"theme-color"});document.head.appendChild(r);let o=E("meta",{name:"color-scheme"});document.head.appendChild(o);let n=At("(prefers-color-scheme: light)");return H(()=>{let i=new g;return i.subscribe(a=>{if(document.body.setAttribute("data-md-color-switching",""),a.color.media==="(prefers-color-scheme)"){let s=matchMedia("(prefers-color-scheme: light)"),c=document.querySelector(s.matches?"[data-md-color-media='(prefers-color-scheme: light)']":"[data-md-color-media='(prefers-color-scheme: dark)']");a.color.scheme=c.getAttribute("data-md-color-scheme"),a.color.primary=c.getAttribute("data-md-color-primary"),a.color.accent=c.getAttribute("data-md-color-accent")}for(let[s,c]of Object.entries(a.color))document.body.setAttribute(`data-md-color-${s}`,c);for(let s=0;sa.key==="Enter"),ne(i,(a,s)=>s)).subscribe(({index:a})=>{a=(a+1)%t.length,t[a].click(),t[a].focus()}),i.pipe(m(()=>{let a=Te("header"),s=window.getComputedStyle(a);return 
o.content=s.colorScheme,s.backgroundColor.match(/\d+/g).map(c=>(+c).toString(16).padStart(2,"0")).join("")})).subscribe(a=>r.content=`#${a}`),i.pipe(Oe(ae)).subscribe(()=>{document.body.removeAttribute("data-md-color-switching")}),Fa(t).pipe(U(n.pipe(Le(1))),at(),y(a=>i.next(a)),_(()=>i.complete()),m(a=>F({ref:e},a)))})}function Un(e,{progress$:t}){return H(()=>{let r=new g;return r.subscribe(({value:o})=>{e.style.setProperty("--md-progress-value",`${o}`)}),t.pipe(y(o=>r.next({value:o})),_(()=>r.complete()),m(o=>({ref:e,value:o})))})}var Kr=jt(zr());function ja(e){e.setAttribute("data-md-copying","");let t=e.closest("[data-copy]"),r=t?t.getAttribute("data-copy"):e.innerText;return e.removeAttribute("data-md-copying"),r.trimEnd()}function Wn({alert$:e}){Kr.default.isSupported()&&new j(t=>{new Kr.default("[data-clipboard-target], [data-clipboard-text]",{text:r=>r.getAttribute("data-clipboard-text")||ja(P(r.getAttribute("data-clipboard-target")))}).on("success",r=>t.next(r))}).pipe(y(t=>{t.trigger.focus()}),m(()=>ge("clipboard.copied"))).subscribe(e)}function Dn(e,t){return e.protocol=t.protocol,e.hostname=t.hostname,e}function Ua(e,t){let r=new Map;for(let o of R("url",e)){let n=P("loc",o),i=[Dn(new URL(n.textContent),t)];r.set(`${i[0]}`,i);for(let a of R("[rel=alternate]",o)){let s=a.getAttribute("href");s!=null&&i.push(Dn(new URL(s),t))}}return r}function mr(e){return on(new URL("sitemap.xml",e)).pipe(m(t=>Ua(t,new URL(e))),he(()=>$(new Map)))}function Wa(e,t){if(!(e.target instanceof Element))return L;let r=e.target.closest("a");if(r===null)return L;if(r.target||e.metaKey||e.ctrlKey)return L;let o=new URL(r.href);return o.search=o.hash="",t.has(`${o}`)?(e.preventDefault(),$(new URL(r.href))):L}function Nn(e){let t=new Map;for(let r of R(":scope > *",e.head))t.set(r.outerHTML,r);return t}function Vn(e){for(let t of R("[href], [src]",e))for(let r of["href","src"]){let o=t.getAttribute(r);if(o&&!/^(?:[a-z]+:)?\/\//i.test(o)){t[r]=t[r];break}}return $(e)}function 
Da(e){for(let o of["[data-md-component=announce]","[data-md-component=container]","[data-md-component=header-topic]","[data-md-component=outdated]","[data-md-component=logo]","[data-md-component=skip]",...G("navigation.tabs.sticky")?["[data-md-component=tabs]"]:[]]){let n=me(o),i=me(o,e);typeof n!="undefined"&&typeof i!="undefined"&&n.replaceWith(i)}let t=Nn(document);for(let[o,n]of Nn(e))t.has(o)?t.delete(o):document.head.appendChild(n);for(let o of t.values()){let n=o.getAttribute("name");n!=="theme-color"&&n!=="color-scheme"&&o.remove()}let r=Te("container");return Fe(R("script",r)).pipe(v(o=>{let n=e.createElement("script");if(o.src){for(let i of o.getAttributeNames())n.setAttribute(i,o.getAttribute(i));return o.replaceWith(n),new j(i=>{n.onload=()=>i.complete()})}else return n.textContent=o.textContent,o.replaceWith(n),L}),ee(),oe(document))}function zn({location$:e,viewport$:t,progress$:r}){let o=we();if(location.protocol==="file:")return L;let n=mr(o.base);$(document).subscribe(Vn);let i=d(document.body,"click").pipe(je(n),v(([c,p])=>Wa(c,p)),le()),a=d(window,"popstate").pipe(m(ve),le());i.pipe(ne(t)).subscribe(([c,{offset:p}])=>{history.replaceState(p,""),history.pushState(null,"",c)}),T(i,a).subscribe(e);let s=e.pipe(X("pathname"),v(c=>rn(c,{progress$:r}).pipe(he(()=>(st(c,!0),L)))),v(Vn),v(Da),le());return T(s.pipe(ne(e,(c,p)=>p)),e.pipe(X("pathname"),v(()=>e),X("hash")),e.pipe(Y((c,p)=>c.pathname===p.pathname&&c.hash===p.hash),v(()=>i),y(()=>history.back()))).subscribe(c=>{var p,l;history.state!==null||!c.hash?window.scrollTo(0,(l=(p=history.state)==null?void 0:p.y)!=null?l:0):(history.scrollRestoration="auto",Zo(c.hash),history.scrollRestoration="manual")}),e.subscribe(()=>{history.scrollRestoration="manual"}),d(window,"beforeunload").subscribe(()=>{history.scrollRestoration="auto"}),t.pipe(X("offset"),be(100)).subscribe(({offset:c})=>{history.replaceState(c,"")}),s}var Qn=jt(Kn());function Yn(e){let 
t=e.separator.split("|").map(n=>n.replace(/(\(\?[!=<][^)]+\))/g,"").length===0?"\uFFFD":n).join("|"),r=new RegExp(t,"img"),o=(n,i,a)=>`${i}${a}`;return n=>{n=n.replace(/[\s*+\-:~^]+/g," ").trim();let i=new RegExp(`(^|${e.separator}|)(${n.replace(/[|\\{}()[\]^$+*?.-]/g,"\\$&").replace(r,"|")})`,"img");return a=>(0,Qn.default)(a).replace(i,o).replace(/<\/mark>(\s+)]*>/img,"$1")}}function Ht(e){return e.type===1}function fr(e){return e.type===3}function Bn(e,t){let r=ln(e);return T($(location.protocol!=="file:"),We("search")).pipe(He(o=>o),v(()=>t)).subscribe(({config:o,docs:n})=>r.next({type:0,data:{config:o,docs:n,options:{suggest:G("search.suggest")}}})),r}function Gn({document$:e}){let t=we(),r=De(new URL("../versions.json",t.base)).pipe(he(()=>L)),o=r.pipe(m(n=>{let[,i]=t.base.match(/([^/]+)\/?$/);return n.find(({version:a,aliases:s})=>a===i||s.includes(i))||n[0]}));r.pipe(m(n=>new Map(n.map(i=>[`${new URL(`../${i.version}/`,t.base)}`,i]))),v(n=>d(document.body,"click").pipe(b(i=>!i.metaKey&&!i.ctrlKey),ne(o),v(([i,a])=>{if(i.target instanceof Element){let s=i.target.closest("a");if(s&&!s.target&&n.has(s.href)){let c=s.href;return!i.target.closest(".md-version")&&n.get(c)===a?L:(i.preventDefault(),$(c))}}return L}),v(i=>{let{version:a}=n.get(i);return mr(new URL(i)).pipe(m(s=>{let p=ve().href.replace(t.base,"");return s.has(p.split("#")[0])?new URL(`../${a}/${p}`,t.base):new URL(i)}))})))).subscribe(n=>st(n,!0)),Q([r,o]).subscribe(([n,i])=>{P(".md-header__topic").appendChild(gn(n,i))}),e.pipe(v(()=>o)).subscribe(n=>{var a;let i=__md_get("__outdated",sessionStorage);if(i===null){i=!0;let s=((a=t.version)==null?void 0:a.default)||"latest";Array.isArray(s)||(s=[s]);e:for(let c of s)for(let p of n.aliases.concat(n.version))if(new RegExp(c,"i").test(p)){i=!1;break e}__md_set("__outdated",i,sessionStorage)}if(i)for(let s of ie("outdated"))s.hidden=!1})}function 
Ka(e,{worker$:t}){let{searchParams:r}=ve();r.has("q")&&(Be("search",!0),e.value=r.get("q"),e.focus(),We("search").pipe(He(i=>!i)).subscribe(()=>{let i=ve();i.searchParams.delete("q"),history.replaceState({},"",`${i}`)}));let o=vt(e),n=T(t.pipe(He(Ht)),d(e,"keyup"),o).pipe(m(()=>e.value),Y());return Q([n,o]).pipe(m(([i,a])=>({value:i,focus:a})),B(1))}function Jn(e,{worker$:t}){let r=new g,o=r.pipe(ee(),oe(!0));Q([t.pipe(He(Ht)),r],(i,a)=>a).pipe(X("value")).subscribe(({value:i})=>t.next({type:2,data:i})),r.pipe(X("focus")).subscribe(({focus:i})=>{i&&Be("search",i)}),d(e.form,"reset").pipe(U(o)).subscribe(()=>e.focus());let n=P("header [for=__search]");return d(n,"click").subscribe(()=>e.focus()),Ka(e,{worker$:t}).pipe(y(i=>r.next(i)),_(()=>r.complete()),m(i=>F({ref:e},i)),B(1))}function Xn(e,{worker$:t,query$:r}){let o=new g,n=Yo(e.parentElement).pipe(b(Boolean)),i=e.parentElement,a=P(":scope > :first-child",e),s=P(":scope > :last-child",e);We("search").subscribe(l=>s.setAttribute("role",l?"list":"presentation")),o.pipe(ne(r),Ir(t.pipe(He(Ht)))).subscribe(([{items:l},{value:f}])=>{switch(l.length){case 0:a.textContent=f.length?ge("search.result.none"):ge("search.result.placeholder");break;case 1:a.textContent=ge("search.result.one");break;default:let u=ar(l.length);a.textContent=ge("search.result.other",u)}});let c=o.pipe(y(()=>s.innerHTML=""),v(({items:l})=>T($(...l.slice(0,10)),$(...l.slice(10)).pipe(Ke(4),jr(n),v(([f])=>f)))),m(hn),le());return c.subscribe(l=>s.appendChild(l)),c.pipe(re(l=>{let f=me("details",l);return typeof f=="undefined"?L:d(f,"toggle").pipe(U(o),m(()=>f))})).subscribe(l=>{l.open===!1&&l.offsetTop<=i.scrollTop&&i.scrollTo({top:l.offsetTop})}),t.pipe(b(fr),m(({data:l})=>l)).pipe(y(l=>o.next(l)),_(()=>o.complete()),m(l=>F({ref:e},l)))}function Qa(e,{query$:t}){return t.pipe(m(({value:r})=>{let o=ve();return o.hash="",r=r.replace(/\s+/g,"+").replace(/&/g,"%26").replace(/=/g,"%3D"),o.search=`q=${r}`,{url:o}}))}function Zn(e,t){let r=new 
g,o=r.pipe(ee(),oe(!0));return r.subscribe(({url:n})=>{e.setAttribute("data-clipboard-text",e.href),e.href=`${n}`}),d(e,"click").pipe(U(o)).subscribe(n=>n.preventDefault()),Qa(e,t).pipe(y(n=>r.next(n)),_(()=>r.complete()),m(n=>F({ref:e},n)))}function ei(e,{worker$:t,keyboard$:r}){let o=new g,n=Te("search-query"),i=T(d(n,"keydown"),d(n,"focus")).pipe(Oe(ae),m(()=>n.value),Y());return o.pipe(je(i),m(([{suggest:s},c])=>{let p=c.split(/([\s-]+)/);if(s!=null&&s.length&&p[p.length-1]){let l=s[s.length-1];l.startsWith(p[p.length-1])&&(p[p.length-1]=l)}else p.length=0;return p})).subscribe(s=>e.innerHTML=s.join("").replace(/\s/g," ")),r.pipe(b(({mode:s})=>s==="search")).subscribe(s=>{switch(s.type){case"ArrowRight":e.innerText.length&&n.selectionStart===n.value.length&&(n.value=e.innerText);break}}),t.pipe(b(fr),m(({data:s})=>s)).pipe(y(s=>o.next(s)),_(()=>o.complete()),m(()=>({ref:e})))}function ti(e,{index$:t,keyboard$:r}){let o=we();try{let n=Bn(o.search,t),i=Te("search-query",e),a=Te("search-result",e);d(e,"click").pipe(b(({target:c})=>c instanceof Element&&!!c.closest("a"))).subscribe(()=>Be("search",!1)),r.pipe(b(({mode:c})=>c==="search")).subscribe(c=>{let p=Re();switch(c.type){case"Enter":if(p===i){let l=new Map;for(let f of R(":first-child [href]",a)){let u=f.firstElementChild;l.set(f,parseFloat(u.getAttribute("data-md-score")))}if(l.size){let[[f]]=[...l].sort(([,u],[,h])=>h-u);f.click()}c.claim()}break;case"Escape":case"Tab":Be("search",!1),i.blur();break;case"ArrowUp":case"ArrowDown":if(typeof p=="undefined")i.focus();else{let l=[i,...R(":not(details) > [href], summary, details[open] [href]",a)],f=Math.max(0,(Math.max(0,l.indexOf(p))+l.length+(c.type==="ArrowUp"?-1:1))%l.length);l[f].focus()}c.claim();break;default:i!==Re()&&i.focus()}}),r.pipe(b(({mode:c})=>c==="global")).subscribe(c=>{switch(c.type){case"f":case"s":case"/":i.focus(),i.select(),c.claim();break}});let s=Jn(i,{worker$:n});return 
T(s,Xn(a,{worker$:n,query$:s})).pipe($e(...ie("search-share",e).map(c=>Zn(c,{query$:s})),...ie("search-suggest",e).map(c=>ei(c,{worker$:n,keyboard$:r}))))}catch(n){return e.hidden=!0,qe}}function ri(e,{index$:t,location$:r}){return Q([t,r.pipe(q(ve()),b(o=>!!o.searchParams.get("h")))]).pipe(m(([o,n])=>Yn(o.config)(n.searchParams.get("h"))),m(o=>{var a;let n=new Map,i=document.createNodeIterator(e,NodeFilter.SHOW_TEXT);for(let s=i.nextNode();s;s=i.nextNode())if((a=s.parentElement)!=null&&a.offsetHeight){let c=s.textContent,p=o(c);p.length>c.length&&n.set(s,p)}for(let[s,c]of n){let{childNodes:p}=E("span",null,c);s.replaceWith(...Array.from(p))}return{ref:e,nodes:n}}))}function Ya(e,{viewport$:t,main$:r}){let o=e.closest(".md-grid"),n=o.offsetTop-o.parentElement.offsetTop;return Q([r,t]).pipe(m(([{offset:i,height:a},{offset:{y:s}}])=>(a=a+Math.min(n,Math.max(0,s-i))-n,{height:a,locked:s>=i+n})),Y((i,a)=>i.height===a.height&&i.locked===a.locked))}function Qr(e,o){var n=o,{header$:t}=n,r=to(n,["header$"]);let i=P(".md-sidebar__scrollwrap",e),{y:a}=Ue(i);return H(()=>{let s=new g,c=s.pipe(ee(),oe(!0)),p=s.pipe(Me(0,de));return p.pipe(ne(t)).subscribe({next([{height:l},{height:f}]){i.style.height=`${l-2*a}px`,e.style.top=`${f}px`},complete(){i.style.height="",e.style.top=""}}),p.pipe(He()).subscribe(()=>{for(let l of R(".md-nav__link--active[href]",e)){if(!l.clientHeight)continue;let f=l.closest(".md-sidebar__scrollwrap");if(typeof f!="undefined"){let u=l.offsetTop-f.offsetTop,{height:h}=pe(f);f.scrollTo({top:u-h/2})}}}),fe(R("label[tabindex]",e)).pipe(re(l=>d(l,"click").pipe(Oe(ae),m(()=>l),U(c)))).subscribe(l=>{let f=P(`[id="${l.htmlFor}"]`);P(`[aria-labelledby="${l.id}"]`).setAttribute("aria-expanded",`${f.checked}`)}),Ya(e,r).pipe(y(l=>s.next(l)),_(()=>s.complete()),m(l=>F({ref:e},l)))})}function oi(e,t){if(typeof t!="undefined"){let r=`https://api.github.com/repos/${e}/${t}`;return 
Lt(De(`${r}/releases/latest`).pipe(he(()=>L),m(o=>({version:o.tag_name})),Qe({})),De(r).pipe(he(()=>L),m(o=>({stars:o.stargazers_count,forks:o.forks_count})),Qe({}))).pipe(m(([o,n])=>F(F({},o),n)))}else{let r=`https://api.github.com/users/${e}`;return De(r).pipe(m(o=>({repositories:o.public_repos})),Qe({}))}}function ni(e,t){let r=`https://${e}/api/v4/projects/${encodeURIComponent(t)}`;return De(r).pipe(he(()=>L),m(({star_count:o,forks_count:n})=>({stars:o,forks:n})),Qe({}))}function ii(e){let t=e.match(/^.+github\.com\/([^/]+)\/?([^/]+)?/i);if(t){let[,r,o]=t;return oi(r,o)}if(t=e.match(/^.+?([^/]*gitlab[^/]+)\/(.+?)\/?$/i),t){let[,r,o]=t;return ni(r,o)}return L}var Ba;function Ga(e){return Ba||(Ba=H(()=>{let t=__md_get("__source",sessionStorage);if(t)return $(t);if(ie("consent").length){let o=__md_get("__consent");if(!(o&&o.github))return L}return ii(e.href).pipe(y(o=>__md_set("__source",o,sessionStorage)))}).pipe(he(()=>L),b(t=>Object.keys(t).length>0),m(t=>({facts:t})),B(1)))}function ai(e){let t=P(":scope > :last-child",e);return H(()=>{let r=new g;return r.subscribe(({facts:o})=>{t.appendChild(bn(o)),t.classList.add("md-source__repository--active")}),Ga(e).pipe(y(o=>r.next(o)),_(()=>r.complete()),m(o=>F({ref:e},o)))})}function Ja(e,{viewport$:t,header$:r}){return Ee(document.body).pipe(v(()=>pr(e,{header$:r,viewport$:t})),m(({offset:{y:o}})=>({hidden:o>=10})),X("hidden"))}function si(e,t){return H(()=>{let r=new g;return r.subscribe({next({hidden:o}){e.hidden=o},complete(){e.hidden=!1}}),(G("navigation.tabs.sticky")?$({hidden:!1}):Ja(e,t)).pipe(y(o=>r.next(o)),_(()=>r.complete()),m(o=>F({ref:e},o)))})}function Xa(e,{viewport$:t,header$:r}){let o=new Map,n=R(".md-nav__link",e);for(let s of n){let c=decodeURIComponent(s.hash.substring(1)),p=me(`[id="${c}"]`);typeof p!="undefined"&&o.set(s,p)}let i=r.pipe(X("height"),m(({height:s})=>{let c=Te("main"),p=P(":scope > :first-child",c);return s+.8*(p.offsetTop-c.offsetTop)}),le());return 
Ee(document.body).pipe(X("height"),v(s=>H(()=>{let c=[];return $([...o].reduce((p,[l,f])=>{for(;c.length&&o.get(c[c.length-1]).tagName>=f.tagName;)c.pop();let u=f.offsetTop;for(;!u&&f.parentElement;)f=f.parentElement,u=f.offsetTop;let h=f.offsetParent;for(;h;h=h.offsetParent)u+=h.offsetTop;return p.set([...c=[...c,l]].reverse(),u)},new Map))}).pipe(m(c=>new Map([...c].sort(([,p],[,l])=>p-l))),je(i),v(([c,p])=>t.pipe(Rr(([l,f],{offset:{y:u},size:h})=>{let w=u+h.height>=Math.floor(s.height);for(;f.length;){let[,A]=f[0];if(A-p=u&&!w)f=[l.pop(),...f];else break}return[l,f]},[[],[...c]]),Y((l,f)=>l[0]===f[0]&&l[1]===f[1])))))).pipe(m(([s,c])=>({prev:s.map(([p])=>p),next:c.map(([p])=>p)})),q({prev:[],next:[]}),Ke(2,1),m(([s,c])=>s.prev.length{let i=new g,a=i.pipe(ee(),oe(!0));if(i.subscribe(({prev:s,next:c})=>{for(let[p]of c)p.classList.remove("md-nav__link--passed"),p.classList.remove("md-nav__link--active");for(let[p,[l]]of s.entries())l.classList.add("md-nav__link--passed"),l.classList.toggle("md-nav__link--active",p===s.length-1)}),G("toc.follow")){let s=T(t.pipe(be(1),m(()=>{})),t.pipe(be(250),m(()=>"smooth")));i.pipe(b(({prev:c})=>c.length>0),je(o.pipe(Oe(ae))),ne(s)).subscribe(([[{prev:c}],p])=>{let[l]=c[c.length-1];if(l.offsetHeight){let f=sr(l);if(typeof f!="undefined"){let u=l.offsetTop-f.offsetTop,{height:h}=pe(f);f.scrollTo({top:u-h/2,behavior:p})}}})}return G("navigation.tracking")&&t.pipe(U(a),X("offset"),be(250),Le(1),U(n.pipe(Le(1))),at({delay:250}),ne(i)).subscribe(([,{prev:s}])=>{let c=ve(),p=s[s.length-1];if(p&&p.length){let[l]=p,{hash:f}=new URL(l.href);c.hash!==f&&(c.hash=f,history.replaceState({},"",`${c}`))}else c.hash="",history.replaceState({},"",`${c}`)}),Xa(e,{viewport$:t,header$:r}).pipe(y(s=>i.next(s)),_(()=>i.complete()),m(s=>F({ref:e},s)))})}function Za(e,{viewport$:t,main$:r,target$:o}){let n=t.pipe(m(({offset:{y:a}})=>a),Ke(2,1),m(([a,s])=>a>s&&s>0),Y()),i=r.pipe(m(({active:a})=>a));return 
Q([i,n]).pipe(m(([a,s])=>!(a&&s)),Y(),U(o.pipe(Le(1))),oe(!0),at({delay:250}),m(a=>({hidden:a})))}function pi(e,{viewport$:t,header$:r,main$:o,target$:n}){let i=new g,a=i.pipe(ee(),oe(!0));return i.subscribe({next({hidden:s}){e.hidden=s,s?(e.setAttribute("tabindex","-1"),e.blur()):e.removeAttribute("tabindex")},complete(){e.style.top="",e.hidden=!0,e.removeAttribute("tabindex")}}),r.pipe(U(a),X("height")).subscribe(({height:s})=>{e.style.top=`${s+16}px`}),d(e,"click").subscribe(s=>{s.preventDefault(),window.scrollTo({top:0})}),Za(e,{viewport$:t,main$:o,target$:n}).pipe(y(s=>i.next(s)),_(()=>i.complete()),m(s=>F({ref:e},s)))}function li({document$:e}){e.pipe(v(()=>R(".md-ellipsis")),re(t=>yt(t).pipe(U(e.pipe(Le(1))),b(r=>r),m(()=>t),ye(1))),b(t=>t.offsetWidth{let r=t.innerText,o=t.closest("a")||t;return o.title=r,Ge(o).pipe(U(e.pipe(Le(1))),_(()=>o.removeAttribute("title")))})).subscribe(),e.pipe(v(()=>R(".md-status")),re(t=>Ge(t))).subscribe()}function mi({document$:e,tablet$:t}){e.pipe(v(()=>R(".md-toggle--indeterminate")),y(r=>{r.indeterminate=!0,r.checked=!1}),re(r=>d(r,"change").pipe(Fr(()=>r.classList.contains("md-toggle--indeterminate")),m(()=>r))),ne(t)).subscribe(([r,o])=>{r.classList.remove("md-toggle--indeterminate"),o&&(r.checked=!1)})}function es(){return/(iPad|iPhone|iPod)/.test(navigator.userAgent)}function fi({document$:e}){e.pipe(v(()=>R("[data-md-scrollfix]")),y(t=>t.removeAttribute("data-md-scrollfix")),b(es),re(t=>d(t,"touchstart").pipe(m(()=>t)))).subscribe(t=>{let r=t.scrollTop;r===0?t.scrollTop=1:r+t.offsetHeight===t.scrollHeight&&(t.scrollTop=r-1)})}function ui({viewport$:e,tablet$:t}){Q([We("search"),t]).pipe(m(([r,o])=>r&&!o),v(r=>$(r).pipe(Ye(r?400:100))),ne(e)).subscribe(([r,{offset:{y:o}}])=>{if(r)document.body.setAttribute("data-md-scrolllock",""),document.body.style.top=`-${o}px`;else{let 
n=-1*parseInt(document.body.style.top,10);document.body.removeAttribute("data-md-scrolllock"),document.body.style.top="",n&&window.scrollTo(0,n)}})}Object.entries||(Object.entries=function(e){let t=[];for(let r of Object.keys(e))t.push([r,e[r]]);return t});Object.values||(Object.values=function(e){let t=[];for(let r of Object.keys(e))t.push(e[r]);return t});typeof Element!="undefined"&&(Element.prototype.scrollTo||(Element.prototype.scrollTo=function(e,t){typeof e=="object"?(this.scrollLeft=e.left,this.scrollTop=e.top):(this.scrollLeft=e,this.scrollTop=t)}),Element.prototype.replaceWith||(Element.prototype.replaceWith=function(...e){let t=this.parentNode;if(t){e.length===0&&t.removeChild(this);for(let r=e.length-1;r>=0;r--){let o=e[r];typeof o=="string"?o=document.createTextNode(o):o.parentNode&&o.parentNode.removeChild(o),r?t.insertBefore(this.previousSibling,o):t.replaceChild(o,this)}}}));function ts(){return location.protocol==="file:"?gt(`${new URL("search/search_index.js",Yr.base)}`).pipe(m(()=>__index),B(1)):De(new URL("search/search_index.json",Yr.base))}document.documentElement.classList.remove("no-js");document.documentElement.classList.add("js");var rt=No(),Rt=Jo(),wt=en(Rt),Br=Go(),_e=pn(),ur=At("(min-width: 960px)"),hi=At("(min-width: 1220px)"),bi=tn(),Yr=we(),vi=document.forms.namedItem("search")?ts():qe,Gr=new g;Wn({alert$:Gr});var Jr=new g;G("navigation.instant")&&zn({location$:Rt,viewport$:_e,progress$:Jr}).subscribe(rt);var di;((di=Yr.version)==null?void 0:di.provider)==="mike"&&Gn({document$:rt});T(Rt,wt).pipe(Ye(125)).subscribe(()=>{Be("drawer",!1),Be("search",!1)});Br.pipe(b(({mode:e})=>e==="global")).subscribe(e=>{switch(e.type){case"p":case",":let t=me("link[rel=prev]");typeof t!="undefined"&&st(t);break;case"n":case".":let r=me("link[rel=next]");typeof r!="undefined"&&st(r);break;case"Enter":let o=Re();o instanceof 
HTMLLabelElement&&o.click()}});li({document$:rt});mi({document$:rt,tablet$:ur});fi({document$:rt});ui({viewport$:_e,tablet$:ur});var tt=Rn(Te("header"),{viewport$:_e}),$t=rt.pipe(m(()=>Te("main")),v(e=>Fn(e,{viewport$:_e,header$:tt})),B(1)),rs=T(...ie("consent").map(e=>fn(e,{target$:wt})),...ie("dialog").map(e=>$n(e,{alert$:Gr})),...ie("header").map(e=>Pn(e,{viewport$:_e,header$:tt,main$:$t})),...ie("palette").map(e=>jn(e)),...ie("progress").map(e=>Un(e,{progress$:Jr})),...ie("search").map(e=>ti(e,{index$:vi,keyboard$:Br})),...ie("source").map(e=>ai(e))),os=H(()=>T(...ie("announce").map(e=>mn(e)),...ie("content").map(e=>Hn(e,{viewport$:_e,target$:wt,print$:bi})),...ie("content").map(e=>G("search.highlight")?ri(e,{index$:vi,location$:Rt}):L),...ie("header-title").map(e=>In(e,{viewport$:_e,header$:tt})),...ie("sidebar").map(e=>e.getAttribute("data-md-type")==="navigation"?Ur(hi,()=>Qr(e,{viewport$:_e,header$:tt,main$:$t})):Ur(ur,()=>Qr(e,{viewport$:_e,header$:tt,main$:$t}))),...ie("tabs").map(e=>si(e,{viewport$:_e,header$:tt})),...ie("toc").map(e=>ci(e,{viewport$:_e,header$:tt,main$:$t,target$:wt})),...ie("top").map(e=>pi(e,{viewport$:_e,header$:tt,main$:$t,target$:wt})))),gi=rt.pipe(v(()=>os),$e(rs),B(1));gi.subscribe();window.document$=rt;window.location$=Rt;window.target$=wt;window.keyboard$=Br;window.viewport$=_e;window.tablet$=ur;window.screen$=hi;window.print$=bi;window.alert$=Gr;window.progress$=Jr;window.component$=gi;})(); +//# sourceMappingURL=bundle.1e8ae164.min.js.map diff --git a/assets/javascripts/bundle.bd41221c.min.js.map b/assets/javascripts/bundle.1e8ae164.min.js.map similarity index 82% rename from assets/javascripts/bundle.bd41221c.min.js.map rename to assets/javascripts/bundle.1e8ae164.min.js.map index 1663daba..6c33b8e8 100644 --- a/assets/javascripts/bundle.bd41221c.min.js.map +++ b/assets/javascripts/bundle.1e8ae164.min.js.map @@ -1,7 +1,7 @@ { "version": 3, "sources": ["node_modules/focus-visible/dist/focus-visible.js", 
"node_modules/clipboard/dist/clipboard.js", "node_modules/escape-html/index.js", "src/templates/assets/javascripts/bundle.ts", "node_modules/rxjs/node_modules/tslib/tslib.es6.js", "node_modules/rxjs/src/internal/util/isFunction.ts", "node_modules/rxjs/src/internal/util/createErrorClass.ts", "node_modules/rxjs/src/internal/util/UnsubscriptionError.ts", "node_modules/rxjs/src/internal/util/arrRemove.ts", "node_modules/rxjs/src/internal/Subscription.ts", "node_modules/rxjs/src/internal/config.ts", "node_modules/rxjs/src/internal/scheduler/timeoutProvider.ts", "node_modules/rxjs/src/internal/util/reportUnhandledError.ts", "node_modules/rxjs/src/internal/util/noop.ts", "node_modules/rxjs/src/internal/NotificationFactories.ts", "node_modules/rxjs/src/internal/util/errorContext.ts", "node_modules/rxjs/src/internal/Subscriber.ts", "node_modules/rxjs/src/internal/symbol/observable.ts", "node_modules/rxjs/src/internal/util/identity.ts", "node_modules/rxjs/src/internal/util/pipe.ts", "node_modules/rxjs/src/internal/Observable.ts", "node_modules/rxjs/src/internal/util/lift.ts", "node_modules/rxjs/src/internal/operators/OperatorSubscriber.ts", "node_modules/rxjs/src/internal/scheduler/animationFrameProvider.ts", "node_modules/rxjs/src/internal/util/ObjectUnsubscribedError.ts", "node_modules/rxjs/src/internal/Subject.ts", "node_modules/rxjs/src/internal/scheduler/dateTimestampProvider.ts", "node_modules/rxjs/src/internal/ReplaySubject.ts", "node_modules/rxjs/src/internal/scheduler/Action.ts", "node_modules/rxjs/src/internal/scheduler/intervalProvider.ts", "node_modules/rxjs/src/internal/scheduler/AsyncAction.ts", "node_modules/rxjs/src/internal/Scheduler.ts", "node_modules/rxjs/src/internal/scheduler/AsyncScheduler.ts", "node_modules/rxjs/src/internal/scheduler/async.ts", "node_modules/rxjs/src/internal/scheduler/AnimationFrameAction.ts", "node_modules/rxjs/src/internal/scheduler/AnimationFrameScheduler.ts", "node_modules/rxjs/src/internal/scheduler/animationFrame.ts", 
"node_modules/rxjs/src/internal/observable/empty.ts", "node_modules/rxjs/src/internal/util/isScheduler.ts", "node_modules/rxjs/src/internal/util/args.ts", "node_modules/rxjs/src/internal/util/isArrayLike.ts", "node_modules/rxjs/src/internal/util/isPromise.ts", "node_modules/rxjs/src/internal/util/isInteropObservable.ts", "node_modules/rxjs/src/internal/util/isAsyncIterable.ts", "node_modules/rxjs/src/internal/util/throwUnobservableError.ts", "node_modules/rxjs/src/internal/symbol/iterator.ts", "node_modules/rxjs/src/internal/util/isIterable.ts", "node_modules/rxjs/src/internal/util/isReadableStreamLike.ts", "node_modules/rxjs/src/internal/observable/innerFrom.ts", "node_modules/rxjs/src/internal/util/executeSchedule.ts", "node_modules/rxjs/src/internal/operators/observeOn.ts", "node_modules/rxjs/src/internal/operators/subscribeOn.ts", "node_modules/rxjs/src/internal/scheduled/scheduleObservable.ts", "node_modules/rxjs/src/internal/scheduled/schedulePromise.ts", "node_modules/rxjs/src/internal/scheduled/scheduleArray.ts", "node_modules/rxjs/src/internal/scheduled/scheduleIterable.ts", "node_modules/rxjs/src/internal/scheduled/scheduleAsyncIterable.ts", "node_modules/rxjs/src/internal/scheduled/scheduleReadableStreamLike.ts", "node_modules/rxjs/src/internal/scheduled/scheduled.ts", "node_modules/rxjs/src/internal/observable/from.ts", "node_modules/rxjs/src/internal/observable/of.ts", "node_modules/rxjs/src/internal/observable/throwError.ts", "node_modules/rxjs/src/internal/util/EmptyError.ts", "node_modules/rxjs/src/internal/util/isDate.ts", "node_modules/rxjs/src/internal/operators/map.ts", "node_modules/rxjs/src/internal/util/mapOneOrManyArgs.ts", "node_modules/rxjs/src/internal/util/argsArgArrayOrObject.ts", "node_modules/rxjs/src/internal/util/createObject.ts", "node_modules/rxjs/src/internal/observable/combineLatest.ts", "node_modules/rxjs/src/internal/operators/mergeInternals.ts", "node_modules/rxjs/src/internal/operators/mergeMap.ts", 
"node_modules/rxjs/src/internal/operators/mergeAll.ts", "node_modules/rxjs/src/internal/operators/concatAll.ts", "node_modules/rxjs/src/internal/observable/concat.ts", "node_modules/rxjs/src/internal/observable/defer.ts", "node_modules/rxjs/src/internal/observable/fromEvent.ts", "node_modules/rxjs/src/internal/observable/fromEventPattern.ts", "node_modules/rxjs/src/internal/observable/timer.ts", "node_modules/rxjs/src/internal/observable/merge.ts", "node_modules/rxjs/src/internal/observable/never.ts", "node_modules/rxjs/src/internal/util/argsOrArgArray.ts", "node_modules/rxjs/src/internal/operators/filter.ts", "node_modules/rxjs/src/internal/observable/zip.ts", "node_modules/rxjs/src/internal/operators/audit.ts", "node_modules/rxjs/src/internal/operators/auditTime.ts", "node_modules/rxjs/src/internal/operators/bufferCount.ts", "node_modules/rxjs/src/internal/operators/catchError.ts", "node_modules/rxjs/src/internal/operators/scanInternals.ts", "node_modules/rxjs/src/internal/operators/combineLatest.ts", "node_modules/rxjs/src/internal/operators/combineLatestWith.ts", "node_modules/rxjs/src/internal/operators/debounceTime.ts", "node_modules/rxjs/src/internal/operators/defaultIfEmpty.ts", "node_modules/rxjs/src/internal/operators/take.ts", "node_modules/rxjs/src/internal/operators/ignoreElements.ts", "node_modules/rxjs/src/internal/operators/mapTo.ts", "node_modules/rxjs/src/internal/operators/delayWhen.ts", "node_modules/rxjs/src/internal/operators/delay.ts", "node_modules/rxjs/src/internal/operators/distinctUntilChanged.ts", "node_modules/rxjs/src/internal/operators/distinctUntilKeyChanged.ts", "node_modules/rxjs/src/internal/operators/throwIfEmpty.ts", "node_modules/rxjs/src/internal/operators/endWith.ts", "node_modules/rxjs/src/internal/operators/finalize.ts", "node_modules/rxjs/src/internal/operators/first.ts", "node_modules/rxjs/src/internal/operators/takeLast.ts", "node_modules/rxjs/src/internal/operators/merge.ts", 
"node_modules/rxjs/src/internal/operators/mergeWith.ts", "node_modules/rxjs/src/internal/operators/repeat.ts", "node_modules/rxjs/src/internal/operators/scan.ts", "node_modules/rxjs/src/internal/operators/share.ts", "node_modules/rxjs/src/internal/operators/shareReplay.ts", "node_modules/rxjs/src/internal/operators/skip.ts", "node_modules/rxjs/src/internal/operators/skipUntil.ts", "node_modules/rxjs/src/internal/operators/startWith.ts", "node_modules/rxjs/src/internal/operators/switchMap.ts", "node_modules/rxjs/src/internal/operators/takeUntil.ts", "node_modules/rxjs/src/internal/operators/takeWhile.ts", "node_modules/rxjs/src/internal/operators/tap.ts", "node_modules/rxjs/src/internal/operators/throttle.ts", "node_modules/rxjs/src/internal/operators/throttleTime.ts", "node_modules/rxjs/src/internal/operators/withLatestFrom.ts", "node_modules/rxjs/src/internal/operators/zip.ts", "node_modules/rxjs/src/internal/operators/zipWith.ts", "src/templates/assets/javascripts/browser/document/index.ts", "src/templates/assets/javascripts/browser/element/_/index.ts", "src/templates/assets/javascripts/browser/element/focus/index.ts", "src/templates/assets/javascripts/browser/element/hover/index.ts", "src/templates/assets/javascripts/browser/element/offset/_/index.ts", "src/templates/assets/javascripts/browser/element/offset/content/index.ts", "src/templates/assets/javascripts/utilities/h/index.ts", "src/templates/assets/javascripts/utilities/round/index.ts", "src/templates/assets/javascripts/browser/script/index.ts", "src/templates/assets/javascripts/browser/element/size/_/index.ts", "src/templates/assets/javascripts/browser/element/size/content/index.ts", "src/templates/assets/javascripts/browser/element/visibility/index.ts", "src/templates/assets/javascripts/browser/toggle/index.ts", "src/templates/assets/javascripts/browser/keyboard/index.ts", "src/templates/assets/javascripts/browser/location/_/index.ts", "src/templates/assets/javascripts/browser/location/hash/index.ts", 
"src/templates/assets/javascripts/browser/media/index.ts", "src/templates/assets/javascripts/browser/request/index.ts", "src/templates/assets/javascripts/browser/viewport/offset/index.ts", "src/templates/assets/javascripts/browser/viewport/size/index.ts", "src/templates/assets/javascripts/browser/viewport/_/index.ts", "src/templates/assets/javascripts/browser/viewport/at/index.ts", "src/templates/assets/javascripts/browser/worker/index.ts", "src/templates/assets/javascripts/_/index.ts", "src/templates/assets/javascripts/components/_/index.ts", "src/templates/assets/javascripts/components/announce/index.ts", "src/templates/assets/javascripts/components/consent/index.ts", "src/templates/assets/javascripts/templates/tooltip/index.tsx", "src/templates/assets/javascripts/templates/annotation/index.tsx", "src/templates/assets/javascripts/templates/clipboard/index.tsx", "src/templates/assets/javascripts/templates/search/index.tsx", "src/templates/assets/javascripts/templates/source/index.tsx", "src/templates/assets/javascripts/templates/tabbed/index.tsx", "src/templates/assets/javascripts/templates/table/index.tsx", "src/templates/assets/javascripts/templates/version/index.tsx", "src/templates/assets/javascripts/components/tooltip/index.ts", "src/templates/assets/javascripts/components/content/annotation/_/index.ts", "src/templates/assets/javascripts/components/content/annotation/list/index.ts", "src/templates/assets/javascripts/components/content/annotation/block/index.ts", "src/templates/assets/javascripts/components/content/code/_/index.ts", "src/templates/assets/javascripts/components/content/details/index.ts", "src/templates/assets/javascripts/components/content/mermaid/index.css", "src/templates/assets/javascripts/components/content/mermaid/index.ts", "src/templates/assets/javascripts/components/content/table/index.ts", "src/templates/assets/javascripts/components/content/tabs/index.ts", "src/templates/assets/javascripts/components/content/_/index.ts", 
"src/templates/assets/javascripts/components/dialog/index.ts", "src/templates/assets/javascripts/components/header/_/index.ts", "src/templates/assets/javascripts/components/header/title/index.ts", "src/templates/assets/javascripts/components/main/index.ts", "src/templates/assets/javascripts/components/palette/index.ts", "src/templates/assets/javascripts/components/progress/index.ts", "src/templates/assets/javascripts/integrations/clipboard/index.ts", "src/templates/assets/javascripts/integrations/sitemap/index.ts", "src/templates/assets/javascripts/integrations/instant/index.ts", "src/templates/assets/javascripts/integrations/search/highlighter/index.ts", "src/templates/assets/javascripts/integrations/search/worker/message/index.ts", "src/templates/assets/javascripts/integrations/search/worker/_/index.ts", "src/templates/assets/javascripts/integrations/version/index.ts", "src/templates/assets/javascripts/components/search/query/index.ts", "src/templates/assets/javascripts/components/search/result/index.ts", "src/templates/assets/javascripts/components/search/share/index.ts", "src/templates/assets/javascripts/components/search/suggest/index.ts", "src/templates/assets/javascripts/components/search/_/index.ts", "src/templates/assets/javascripts/components/search/highlight/index.ts", "src/templates/assets/javascripts/components/sidebar/index.ts", "src/templates/assets/javascripts/components/source/facts/github/index.ts", "src/templates/assets/javascripts/components/source/facts/gitlab/index.ts", "src/templates/assets/javascripts/components/source/facts/_/index.ts", "src/templates/assets/javascripts/components/source/_/index.ts", "src/templates/assets/javascripts/components/tabs/index.ts", "src/templates/assets/javascripts/components/toc/index.ts", "src/templates/assets/javascripts/components/top/index.ts", "src/templates/assets/javascripts/patches/ellipsis/index.ts", "src/templates/assets/javascripts/patches/indeterminate/index.ts", 
"src/templates/assets/javascripts/patches/scrollfix/index.ts", "src/templates/assets/javascripts/patches/scrolllock/index.ts", "src/templates/assets/javascripts/polyfills/index.ts"], - "sourcesContent": ["(function (global, factory) {\n typeof exports === 'object' && typeof module !== 'undefined' ? factory() :\n typeof define === 'function' && define.amd ? define(factory) :\n (factory());\n}(this, (function () { 'use strict';\n\n /**\n * Applies the :focus-visible polyfill at the given scope.\n * A scope in this case is either the top-level Document or a Shadow Root.\n *\n * @param {(Document|ShadowRoot)} scope\n * @see https://github.com/WICG/focus-visible\n */\n function applyFocusVisiblePolyfill(scope) {\n var hadKeyboardEvent = true;\n var hadFocusVisibleRecently = false;\n var hadFocusVisibleRecentlyTimeout = null;\n\n var inputTypesAllowlist = {\n text: true,\n search: true,\n url: true,\n tel: true,\n email: true,\n password: true,\n number: true,\n date: true,\n month: true,\n week: true,\n time: true,\n datetime: true,\n 'datetime-local': true\n };\n\n /**\n * Helper function for legacy browsers and iframes which sometimes focus\n * elements like document, body, and non-interactive SVG.\n * @param {Element} el\n */\n function isValidFocusTarget(el) {\n if (\n el &&\n el !== document &&\n el.nodeName !== 'HTML' &&\n el.nodeName !== 'BODY' &&\n 'classList' in el &&\n 'contains' in el.classList\n ) {\n return true;\n }\n return false;\n }\n\n /**\n * Computes whether the given element should automatically trigger the\n * `focus-visible` class being added, i.e. 
whether it should always match\n * `:focus-visible` when focused.\n * @param {Element} el\n * @return {boolean}\n */\n function focusTriggersKeyboardModality(el) {\n var type = el.type;\n var tagName = el.tagName;\n\n if (tagName === 'INPUT' && inputTypesAllowlist[type] && !el.readOnly) {\n return true;\n }\n\n if (tagName === 'TEXTAREA' && !el.readOnly) {\n return true;\n }\n\n if (el.isContentEditable) {\n return true;\n }\n\n return false;\n }\n\n /**\n * Add the `focus-visible` class to the given element if it was not added by\n * the author.\n * @param {Element} el\n */\n function addFocusVisibleClass(el) {\n if (el.classList.contains('focus-visible')) {\n return;\n }\n el.classList.add('focus-visible');\n el.setAttribute('data-focus-visible-added', '');\n }\n\n /**\n * Remove the `focus-visible` class from the given element if it was not\n * originally added by the author.\n * @param {Element} el\n */\n function removeFocusVisibleClass(el) {\n if (!el.hasAttribute('data-focus-visible-added')) {\n return;\n }\n el.classList.remove('focus-visible');\n el.removeAttribute('data-focus-visible-added');\n }\n\n /**\n * If the most recent user interaction was via the keyboard;\n * and the key press did not include a meta, alt/option, or control key;\n * then the modality is keyboard. 
Otherwise, the modality is not keyboard.\n * Apply `focus-visible` to any current active element and keep track\n * of our keyboard modality state with `hadKeyboardEvent`.\n * @param {KeyboardEvent} e\n */\n function onKeyDown(e) {\n if (e.metaKey || e.altKey || e.ctrlKey) {\n return;\n }\n\n if (isValidFocusTarget(scope.activeElement)) {\n addFocusVisibleClass(scope.activeElement);\n }\n\n hadKeyboardEvent = true;\n }\n\n /**\n * If at any point a user clicks with a pointing device, ensure that we change\n * the modality away from keyboard.\n * This avoids the situation where a user presses a key on an already focused\n * element, and then clicks on a different element, focusing it with a\n * pointing device, while we still think we're in keyboard modality.\n * @param {Event} e\n */\n function onPointerDown(e) {\n hadKeyboardEvent = false;\n }\n\n /**\n * On `focus`, add the `focus-visible` class to the target if:\n * - the target received focus as a result of keyboard navigation, or\n * - the event target is an element that will likely require interaction\n * via the keyboard (e.g. 
a text box)\n * @param {Event} e\n */\n function onFocus(e) {\n // Prevent IE from focusing the document or HTML element.\n if (!isValidFocusTarget(e.target)) {\n return;\n }\n\n if (hadKeyboardEvent || focusTriggersKeyboardModality(e.target)) {\n addFocusVisibleClass(e.target);\n }\n }\n\n /**\n * On `blur`, remove the `focus-visible` class from the target.\n * @param {Event} e\n */\n function onBlur(e) {\n if (!isValidFocusTarget(e.target)) {\n return;\n }\n\n if (\n e.target.classList.contains('focus-visible') ||\n e.target.hasAttribute('data-focus-visible-added')\n ) {\n // To detect a tab/window switch, we look for a blur event followed\n // rapidly by a visibility change.\n // If we don't see a visibility change within 100ms, it's probably a\n // regular focus change.\n hadFocusVisibleRecently = true;\n window.clearTimeout(hadFocusVisibleRecentlyTimeout);\n hadFocusVisibleRecentlyTimeout = window.setTimeout(function() {\n hadFocusVisibleRecently = false;\n }, 100);\n removeFocusVisibleClass(e.target);\n }\n }\n\n /**\n * If the user changes tabs, keep track of whether or not the previously\n * focused element had .focus-visible.\n * @param {Event} e\n */\n function onVisibilityChange(e) {\n if (document.visibilityState === 'hidden') {\n // If the tab becomes active again, the browser will handle calling focus\n // on the element (Safari actually calls it twice).\n // If this tab change caused a blur on an element with focus-visible,\n // re-apply the class when the user switches back to the tab.\n if (hadFocusVisibleRecently) {\n hadKeyboardEvent = true;\n }\n addInitialPointerMoveListeners();\n }\n }\n\n /**\n * Add a group of listeners to detect usage of any pointing devices.\n * These listeners will be added when the polyfill first loads, and anytime\n * the window is blurred, so that they are active when the window regains\n * focus.\n */\n function addInitialPointerMoveListeners() {\n document.addEventListener('mousemove', onInitialPointerMove);\n 
document.addEventListener('mousedown', onInitialPointerMove);\n document.addEventListener('mouseup', onInitialPointerMove);\n document.addEventListener('pointermove', onInitialPointerMove);\n document.addEventListener('pointerdown', onInitialPointerMove);\n document.addEventListener('pointerup', onInitialPointerMove);\n document.addEventListener('touchmove', onInitialPointerMove);\n document.addEventListener('touchstart', onInitialPointerMove);\n document.addEventListener('touchend', onInitialPointerMove);\n }\n\n function removeInitialPointerMoveListeners() {\n document.removeEventListener('mousemove', onInitialPointerMove);\n document.removeEventListener('mousedown', onInitialPointerMove);\n document.removeEventListener('mouseup', onInitialPointerMove);\n document.removeEventListener('pointermove', onInitialPointerMove);\n document.removeEventListener('pointerdown', onInitialPointerMove);\n document.removeEventListener('pointerup', onInitialPointerMove);\n document.removeEventListener('touchmove', onInitialPointerMove);\n document.removeEventListener('touchstart', onInitialPointerMove);\n document.removeEventListener('touchend', onInitialPointerMove);\n }\n\n /**\n * When the polfyill first loads, assume the user is in keyboard modality.\n * If any event is received from a pointing device (e.g. mouse, pointer,\n * touch), turn off keyboard modality.\n * This accounts for situations where focus enters the page from the URL bar.\n * @param {Event} e\n */\n function onInitialPointerMove(e) {\n // Work around a Safari quirk that fires a mousemove on whenever the\n // window blurs, even if you're tabbing out of the page. \u00AF\\_(\u30C4)_/\u00AF\n if (e.target.nodeName && e.target.nodeName.toLowerCase() === 'html') {\n return;\n }\n\n hadKeyboardEvent = false;\n removeInitialPointerMoveListeners();\n }\n\n // For some kinds of state, we are interested in changes at the global scope\n // only. 
For example, global pointer input, global key presses and global\n // visibility change should affect the state at every scope:\n document.addEventListener('keydown', onKeyDown, true);\n document.addEventListener('mousedown', onPointerDown, true);\n document.addEventListener('pointerdown', onPointerDown, true);\n document.addEventListener('touchstart', onPointerDown, true);\n document.addEventListener('visibilitychange', onVisibilityChange, true);\n\n addInitialPointerMoveListeners();\n\n // For focus and blur, we specifically care about state changes in the local\n // scope. This is because focus / blur events that originate from within a\n // shadow root are not re-dispatched from the host element if it was already\n // the active element in its own scope:\n scope.addEventListener('focus', onFocus, true);\n scope.addEventListener('blur', onBlur, true);\n\n // We detect that a node is a ShadowRoot by ensuring that it is a\n // DocumentFragment and also has a host property. This check covers native\n // implementation and polyfill implementation transparently. If we only cared\n // about the native implementation, we could just check if the scope was\n // an instance of a ShadowRoot.\n if (scope.nodeType === Node.DOCUMENT_FRAGMENT_NODE && scope.host) {\n // Since a ShadowRoot is a special kind of DocumentFragment, it does not\n // have a root element to add a class to. 
So, we add this attribute to the\n // host element instead:\n scope.host.setAttribute('data-js-focus-visible', '');\n } else if (scope.nodeType === Node.DOCUMENT_NODE) {\n document.documentElement.classList.add('js-focus-visible');\n document.documentElement.setAttribute('data-js-focus-visible', '');\n }\n }\n\n // It is important to wrap all references to global window and document in\n // these checks to support server-side rendering use cases\n // @see https://github.com/WICG/focus-visible/issues/199\n if (typeof window !== 'undefined' && typeof document !== 'undefined') {\n // Make the polyfill helper globally available. This can be used as a signal\n // to interested libraries that wish to coordinate with the polyfill for e.g.,\n // applying the polyfill to a shadow root:\n window.applyFocusVisiblePolyfill = applyFocusVisiblePolyfill;\n\n // Notify interested libraries of the polyfill's presence, in case the\n // polyfill was loaded lazily:\n var event;\n\n try {\n event = new CustomEvent('focus-visible-polyfill-ready');\n } catch (error) {\n // IE11 does not support using CustomEvent as a constructor directly:\n event = document.createEvent('CustomEvent');\n event.initCustomEvent('focus-visible-polyfill-ready', false, false, {});\n }\n\n window.dispatchEvent(event);\n }\n\n if (typeof document !== 'undefined') {\n // Apply the polyfill to the global document, so that no JavaScript\n // coordination is required to use the polyfill in the top-level document:\n applyFocusVisiblePolyfill(document);\n }\n\n})));\n", "/*!\n * clipboard.js v2.0.11\n * https://clipboardjs.com/\n *\n * Licensed MIT \u00A9 Zeno Rocha\n */\n(function webpackUniversalModuleDefinition(root, factory) {\n\tif(typeof exports === 'object' && typeof module === 'object')\n\t\tmodule.exports = factory();\n\telse if(typeof define === 'function' && define.amd)\n\t\tdefine([], factory);\n\telse if(typeof exports === 'object')\n\t\texports[\"ClipboardJS\"] = 
factory();\n\telse\n\t\troot[\"ClipboardJS\"] = factory();\n})(this, function() {\nreturn /******/ (function() { // webpackBootstrap\n/******/ \tvar __webpack_modules__ = ({\n\n/***/ 686:\n/***/ (function(__unused_webpack_module, __webpack_exports__, __webpack_require__) {\n\n\"use strict\";\n\n// EXPORTS\n__webpack_require__.d(__webpack_exports__, {\n \"default\": function() { return /* binding */ clipboard; }\n});\n\n// EXTERNAL MODULE: ./node_modules/tiny-emitter/index.js\nvar tiny_emitter = __webpack_require__(279);\nvar tiny_emitter_default = /*#__PURE__*/__webpack_require__.n(tiny_emitter);\n// EXTERNAL MODULE: ./node_modules/good-listener/src/listen.js\nvar listen = __webpack_require__(370);\nvar listen_default = /*#__PURE__*/__webpack_require__.n(listen);\n// EXTERNAL MODULE: ./node_modules/select/src/select.js\nvar src_select = __webpack_require__(817);\nvar select_default = /*#__PURE__*/__webpack_require__.n(src_select);\n;// CONCATENATED MODULE: ./src/common/command.js\n/**\n * Executes a given operation type.\n * @param {String} type\n * @return {Boolean}\n */\nfunction command(type) {\n try {\n return document.execCommand(type);\n } catch (err) {\n return false;\n }\n}\n;// CONCATENATED MODULE: ./src/actions/cut.js\n\n\n/**\n * Cut action wrapper.\n * @param {String|HTMLElement} target\n * @return {String}\n */\n\nvar ClipboardActionCut = function ClipboardActionCut(target) {\n var selectedText = select_default()(target);\n command('cut');\n return selectedText;\n};\n\n/* harmony default export */ var actions_cut = (ClipboardActionCut);\n;// CONCATENATED MODULE: ./src/common/create-fake-element.js\n/**\n * Creates a fake textarea element with a value.\n * @param {String} value\n * @return {HTMLElement}\n */\nfunction createFakeElement(value) {\n var isRTL = document.documentElement.getAttribute('dir') === 'rtl';\n var fakeElement = document.createElement('textarea'); // Prevent zooming on iOS\n\n fakeElement.style.fontSize = '12pt'; // Reset box 
model\n\n fakeElement.style.border = '0';\n fakeElement.style.padding = '0';\n fakeElement.style.margin = '0'; // Move element out of screen horizontally\n\n fakeElement.style.position = 'absolute';\n fakeElement.style[isRTL ? 'right' : 'left'] = '-9999px'; // Move element to the same position vertically\n\n var yPosition = window.pageYOffset || document.documentElement.scrollTop;\n fakeElement.style.top = \"\".concat(yPosition, \"px\");\n fakeElement.setAttribute('readonly', '');\n fakeElement.value = value;\n return fakeElement;\n}\n;// CONCATENATED MODULE: ./src/actions/copy.js\n\n\n\n/**\n * Create fake copy action wrapper using a fake element.\n * @param {String} target\n * @param {Object} options\n * @return {String}\n */\n\nvar fakeCopyAction = function fakeCopyAction(value, options) {\n var fakeElement = createFakeElement(value);\n options.container.appendChild(fakeElement);\n var selectedText = select_default()(fakeElement);\n command('copy');\n fakeElement.remove();\n return selectedText;\n};\n/**\n * Copy action wrapper.\n * @param {String|HTMLElement} target\n * @param {Object} options\n * @return {String}\n */\n\n\nvar ClipboardActionCopy = function ClipboardActionCopy(target) {\n var options = arguments.length > 1 && arguments[1] !== undefined ? arguments[1] : {\n container: document.body\n };\n var selectedText = '';\n\n if (typeof target === 'string') {\n selectedText = fakeCopyAction(target, options);\n } else if (target instanceof HTMLInputElement && !['text', 'search', 'url', 'tel', 'password'].includes(target === null || target === void 0 ? void 0 : target.type)) {\n // If input type doesn't support `setSelectionRange`. Simulate it. 
https://developer.mozilla.org/en-US/docs/Web/API/HTMLInputElement/setSelectionRange\n selectedText = fakeCopyAction(target.value, options);\n } else {\n selectedText = select_default()(target);\n command('copy');\n }\n\n return selectedText;\n};\n\n/* harmony default export */ var actions_copy = (ClipboardActionCopy);\n;// CONCATENATED MODULE: ./src/actions/default.js\nfunction _typeof(obj) { \"@babel/helpers - typeof\"; if (typeof Symbol === \"function\" && typeof Symbol.iterator === \"symbol\") { _typeof = function _typeof(obj) { return typeof obj; }; } else { _typeof = function _typeof(obj) { return obj && typeof Symbol === \"function\" && obj.constructor === Symbol && obj !== Symbol.prototype ? \"symbol\" : typeof obj; }; } return _typeof(obj); }\n\n\n\n/**\n * Inner function which performs selection from either `text` or `target`\n * properties and then executes copy or cut operations.\n * @param {Object} options\n */\n\nvar ClipboardActionDefault = function ClipboardActionDefault() {\n var options = arguments.length > 0 && arguments[0] !== undefined ? arguments[0] : {};\n // Defines base properties passed from constructor.\n var _options$action = options.action,\n action = _options$action === void 0 ? 'copy' : _options$action,\n container = options.container,\n target = options.target,\n text = options.text; // Sets the `action` to be performed which can be either 'copy' or 'cut'.\n\n if (action !== 'copy' && action !== 'cut') {\n throw new Error('Invalid \"action\" value, use either \"copy\" or \"cut\"');\n } // Sets the `target` property using an element that will be have its content copied.\n\n\n if (target !== undefined) {\n if (target && _typeof(target) === 'object' && target.nodeType === 1) {\n if (action === 'copy' && target.hasAttribute('disabled')) {\n throw new Error('Invalid \"target\" attribute. 
Please use \"readonly\" instead of \"disabled\" attribute');\n }\n\n if (action === 'cut' && (target.hasAttribute('readonly') || target.hasAttribute('disabled'))) {\n throw new Error('Invalid \"target\" attribute. You can\\'t cut text from elements with \"readonly\" or \"disabled\" attributes');\n }\n } else {\n throw new Error('Invalid \"target\" value, use a valid Element');\n }\n } // Define selection strategy based on `text` property.\n\n\n if (text) {\n return actions_copy(text, {\n container: container\n });\n } // Defines which selection strategy based on `target` property.\n\n\n if (target) {\n return action === 'cut' ? actions_cut(target) : actions_copy(target, {\n container: container\n });\n }\n};\n\n/* harmony default export */ var actions_default = (ClipboardActionDefault);\n;// CONCATENATED MODULE: ./src/clipboard.js\nfunction clipboard_typeof(obj) { \"@babel/helpers - typeof\"; if (typeof Symbol === \"function\" && typeof Symbol.iterator === \"symbol\") { clipboard_typeof = function _typeof(obj) { return typeof obj; }; } else { clipboard_typeof = function _typeof(obj) { return obj && typeof Symbol === \"function\" && obj.constructor === Symbol && obj !== Symbol.prototype ? 
\"symbol\" : typeof obj; }; } return clipboard_typeof(obj); }\n\nfunction _classCallCheck(instance, Constructor) { if (!(instance instanceof Constructor)) { throw new TypeError(\"Cannot call a class as a function\"); } }\n\nfunction _defineProperties(target, props) { for (var i = 0; i < props.length; i++) { var descriptor = props[i]; descriptor.enumerable = descriptor.enumerable || false; descriptor.configurable = true; if (\"value\" in descriptor) descriptor.writable = true; Object.defineProperty(target, descriptor.key, descriptor); } }\n\nfunction _createClass(Constructor, protoProps, staticProps) { if (protoProps) _defineProperties(Constructor.prototype, protoProps); if (staticProps) _defineProperties(Constructor, staticProps); return Constructor; }\n\nfunction _inherits(subClass, superClass) { if (typeof superClass !== \"function\" && superClass !== null) { throw new TypeError(\"Super expression must either be null or a function\"); } subClass.prototype = Object.create(superClass && superClass.prototype, { constructor: { value: subClass, writable: true, configurable: true } }); if (superClass) _setPrototypeOf(subClass, superClass); }\n\nfunction _setPrototypeOf(o, p) { _setPrototypeOf = Object.setPrototypeOf || function _setPrototypeOf(o, p) { o.__proto__ = p; return o; }; return _setPrototypeOf(o, p); }\n\nfunction _createSuper(Derived) { var hasNativeReflectConstruct = _isNativeReflectConstruct(); return function _createSuperInternal() { var Super = _getPrototypeOf(Derived), result; if (hasNativeReflectConstruct) { var NewTarget = _getPrototypeOf(this).constructor; result = Reflect.construct(Super, arguments, NewTarget); } else { result = Super.apply(this, arguments); } return _possibleConstructorReturn(this, result); }; }\n\nfunction _possibleConstructorReturn(self, call) { if (call && (clipboard_typeof(call) === \"object\" || typeof call === \"function\")) { return call; } return _assertThisInitialized(self); }\n\nfunction _assertThisInitialized(self) { if 
(self === void 0) { throw new ReferenceError(\"this hasn't been initialised - super() hasn't been called\"); } return self; }\n\nfunction _isNativeReflectConstruct() { if (typeof Reflect === \"undefined\" || !Reflect.construct) return false; if (Reflect.construct.sham) return false; if (typeof Proxy === \"function\") return true; try { Date.prototype.toString.call(Reflect.construct(Date, [], function () {})); return true; } catch (e) { return false; } }\n\nfunction _getPrototypeOf(o) { _getPrototypeOf = Object.setPrototypeOf ? Object.getPrototypeOf : function _getPrototypeOf(o) { return o.__proto__ || Object.getPrototypeOf(o); }; return _getPrototypeOf(o); }\n\n\n\n\n\n\n/**\n * Helper function to retrieve attribute value.\n * @param {String} suffix\n * @param {Element} element\n */\n\nfunction getAttributeValue(suffix, element) {\n var attribute = \"data-clipboard-\".concat(suffix);\n\n if (!element.hasAttribute(attribute)) {\n return;\n }\n\n return element.getAttribute(attribute);\n}\n/**\n * Base class which takes one or more elements, adds event listeners to them,\n * and instantiates a new `ClipboardAction` on each click.\n */\n\n\nvar Clipboard = /*#__PURE__*/function (_Emitter) {\n _inherits(Clipboard, _Emitter);\n\n var _super = _createSuper(Clipboard);\n\n /**\n * @param {String|HTMLElement|HTMLCollection|NodeList} trigger\n * @param {Object} options\n */\n function Clipboard(trigger, options) {\n var _this;\n\n _classCallCheck(this, Clipboard);\n\n _this = _super.call(this);\n\n _this.resolveOptions(options);\n\n _this.listenClick(trigger);\n\n return _this;\n }\n /**\n * Defines if attributes would be resolved using internal setter functions\n * or custom functions that were passed in the constructor.\n * @param {Object} options\n */\n\n\n _createClass(Clipboard, [{\n key: \"resolveOptions\",\n value: function resolveOptions() {\n var options = arguments.length > 0 && arguments[0] !== undefined ? 
arguments[0] : {};\n this.action = typeof options.action === 'function' ? options.action : this.defaultAction;\n this.target = typeof options.target === 'function' ? options.target : this.defaultTarget;\n this.text = typeof options.text === 'function' ? options.text : this.defaultText;\n this.container = clipboard_typeof(options.container) === 'object' ? options.container : document.body;\n }\n /**\n * Adds a click event listener to the passed trigger.\n * @param {String|HTMLElement|HTMLCollection|NodeList} trigger\n */\n\n }, {\n key: \"listenClick\",\n value: function listenClick(trigger) {\n var _this2 = this;\n\n this.listener = listen_default()(trigger, 'click', function (e) {\n return _this2.onClick(e);\n });\n }\n /**\n * Defines a new `ClipboardAction` on each click event.\n * @param {Event} e\n */\n\n }, {\n key: \"onClick\",\n value: function onClick(e) {\n var trigger = e.delegateTarget || e.currentTarget;\n var action = this.action(trigger) || 'copy';\n var text = actions_default({\n action: action,\n container: this.container,\n target: this.target(trigger),\n text: this.text(trigger)\n }); // Fires an event based on the copy operation result.\n\n this.emit(text ? 
'success' : 'error', {\n action: action,\n text: text,\n trigger: trigger,\n clearSelection: function clearSelection() {\n if (trigger) {\n trigger.focus();\n }\n\n window.getSelection().removeAllRanges();\n }\n });\n }\n /**\n * Default `action` lookup function.\n * @param {Element} trigger\n */\n\n }, {\n key: \"defaultAction\",\n value: function defaultAction(trigger) {\n return getAttributeValue('action', trigger);\n }\n /**\n * Default `target` lookup function.\n * @param {Element} trigger\n */\n\n }, {\n key: \"defaultTarget\",\n value: function defaultTarget(trigger) {\n var selector = getAttributeValue('target', trigger);\n\n if (selector) {\n return document.querySelector(selector);\n }\n }\n /**\n * Allow fire programmatically a copy action\n * @param {String|HTMLElement} target\n * @param {Object} options\n * @returns Text copied.\n */\n\n }, {\n key: \"defaultText\",\n\n /**\n * Default `text` lookup function.\n * @param {Element} trigger\n */\n value: function defaultText(trigger) {\n return getAttributeValue('text', trigger);\n }\n /**\n * Destroy lifecycle.\n */\n\n }, {\n key: \"destroy\",\n value: function destroy() {\n this.listener.destroy();\n }\n }], [{\n key: \"copy\",\n value: function copy(target) {\n var options = arguments.length > 1 && arguments[1] !== undefined ? arguments[1] : {\n container: document.body\n };\n return actions_copy(target, options);\n }\n /**\n * Allow fire programmatically a cut action\n * @param {String|HTMLElement} target\n * @returns Text cutted.\n */\n\n }, {\n key: \"cut\",\n value: function cut(target) {\n return actions_cut(target);\n }\n /**\n * Returns the support of the given action, or all actions if no action is\n * given.\n * @param {String} [action]\n */\n\n }, {\n key: \"isSupported\",\n value: function isSupported() {\n var action = arguments.length > 0 && arguments[0] !== undefined ? arguments[0] : ['copy', 'cut'];\n var actions = typeof action === 'string' ? 
[action] : action;\n var support = !!document.queryCommandSupported;\n actions.forEach(function (action) {\n support = support && !!document.queryCommandSupported(action);\n });\n return support;\n }\n }]);\n\n return Clipboard;\n}((tiny_emitter_default()));\n\n/* harmony default export */ var clipboard = (Clipboard);\n\n/***/ }),\n\n/***/ 828:\n/***/ (function(module) {\n\nvar DOCUMENT_NODE_TYPE = 9;\n\n/**\n * A polyfill for Element.matches()\n */\nif (typeof Element !== 'undefined' && !Element.prototype.matches) {\n var proto = Element.prototype;\n\n proto.matches = proto.matchesSelector ||\n proto.mozMatchesSelector ||\n proto.msMatchesSelector ||\n proto.oMatchesSelector ||\n proto.webkitMatchesSelector;\n}\n\n/**\n * Finds the closest parent that matches a selector.\n *\n * @param {Element} element\n * @param {String} selector\n * @return {Function}\n */\nfunction closest (element, selector) {\n while (element && element.nodeType !== DOCUMENT_NODE_TYPE) {\n if (typeof element.matches === 'function' &&\n element.matches(selector)) {\n return element;\n }\n element = element.parentNode;\n }\n}\n\nmodule.exports = closest;\n\n\n/***/ }),\n\n/***/ 438:\n/***/ (function(module, __unused_webpack_exports, __webpack_require__) {\n\nvar closest = __webpack_require__(828);\n\n/**\n * Delegates event to a selector.\n *\n * @param {Element} element\n * @param {String} selector\n * @param {String} type\n * @param {Function} callback\n * @param {Boolean} useCapture\n * @return {Object}\n */\nfunction _delegate(element, selector, type, callback, useCapture) {\n var listenerFn = listener.apply(this, arguments);\n\n element.addEventListener(type, listenerFn, useCapture);\n\n return {\n destroy: function() {\n element.removeEventListener(type, listenerFn, useCapture);\n }\n }\n}\n\n/**\n * Delegates event to a selector.\n *\n * @param {Element|String|Array} [elements]\n * @param {String} selector\n * @param {String} type\n * @param {Function} callback\n * @param {Boolean} 
useCapture\n * @return {Object}\n */\nfunction delegate(elements, selector, type, callback, useCapture) {\n // Handle the regular Element usage\n if (typeof elements.addEventListener === 'function') {\n return _delegate.apply(null, arguments);\n }\n\n // Handle Element-less usage, it defaults to global delegation\n if (typeof type === 'function') {\n // Use `document` as the first parameter, then apply arguments\n // This is a short way to .unshift `arguments` without running into deoptimizations\n return _delegate.bind(null, document).apply(null, arguments);\n }\n\n // Handle Selector-based usage\n if (typeof elements === 'string') {\n elements = document.querySelectorAll(elements);\n }\n\n // Handle Array-like based usage\n return Array.prototype.map.call(elements, function (element) {\n return _delegate(element, selector, type, callback, useCapture);\n });\n}\n\n/**\n * Finds closest match and invokes callback.\n *\n * @param {Element} element\n * @param {String} selector\n * @param {String} type\n * @param {Function} callback\n * @return {Function}\n */\nfunction listener(element, selector, type, callback) {\n return function(e) {\n e.delegateTarget = closest(e.target, selector);\n\n if (e.delegateTarget) {\n callback.call(element, e);\n }\n }\n}\n\nmodule.exports = delegate;\n\n\n/***/ }),\n\n/***/ 879:\n/***/ (function(__unused_webpack_module, exports) {\n\n/**\n * Check if argument is a HTML element.\n *\n * @param {Object} value\n * @return {Boolean}\n */\nexports.node = function(value) {\n return value !== undefined\n && value instanceof HTMLElement\n && value.nodeType === 1;\n};\n\n/**\n * Check if argument is a list of HTML elements.\n *\n * @param {Object} value\n * @return {Boolean}\n */\nexports.nodeList = function(value) {\n var type = Object.prototype.toString.call(value);\n\n return value !== undefined\n && (type === '[object NodeList]' || type === '[object HTMLCollection]')\n && ('length' in value)\n && (value.length === 0 || 
exports.node(value[0]));\n};\n\n/**\n * Check if argument is a string.\n *\n * @param {Object} value\n * @return {Boolean}\n */\nexports.string = function(value) {\n return typeof value === 'string'\n || value instanceof String;\n};\n\n/**\n * Check if argument is a function.\n *\n * @param {Object} value\n * @return {Boolean}\n */\nexports.fn = function(value) {\n var type = Object.prototype.toString.call(value);\n\n return type === '[object Function]';\n};\n\n\n/***/ }),\n\n/***/ 370:\n/***/ (function(module, __unused_webpack_exports, __webpack_require__) {\n\nvar is = __webpack_require__(879);\nvar delegate = __webpack_require__(438);\n\n/**\n * Validates all params and calls the right\n * listener function based on its target type.\n *\n * @param {String|HTMLElement|HTMLCollection|NodeList} target\n * @param {String} type\n * @param {Function} callback\n * @return {Object}\n */\nfunction listen(target, type, callback) {\n if (!target && !type && !callback) {\n throw new Error('Missing required arguments');\n }\n\n if (!is.string(type)) {\n throw new TypeError('Second argument must be a String');\n }\n\n if (!is.fn(callback)) {\n throw new TypeError('Third argument must be a Function');\n }\n\n if (is.node(target)) {\n return listenNode(target, type, callback);\n }\n else if (is.nodeList(target)) {\n return listenNodeList(target, type, callback);\n }\n else if (is.string(target)) {\n return listenSelector(target, type, callback);\n }\n else {\n throw new TypeError('First argument must be a String, HTMLElement, HTMLCollection, or NodeList');\n }\n}\n\n/**\n * Adds an event listener to a HTML element\n * and returns a remove listener function.\n *\n * @param {HTMLElement} node\n * @param {String} type\n * @param {Function} callback\n * @return {Object}\n */\nfunction listenNode(node, type, callback) {\n node.addEventListener(type, callback);\n\n return {\n destroy: function() {\n node.removeEventListener(type, callback);\n }\n }\n}\n\n/**\n * Add an event listener 
to a list of HTML elements\n * and returns a remove listener function.\n *\n * @param {NodeList|HTMLCollection} nodeList\n * @param {String} type\n * @param {Function} callback\n * @return {Object}\n */\nfunction listenNodeList(nodeList, type, callback) {\n Array.prototype.forEach.call(nodeList, function(node) {\n node.addEventListener(type, callback);\n });\n\n return {\n destroy: function() {\n Array.prototype.forEach.call(nodeList, function(node) {\n node.removeEventListener(type, callback);\n });\n }\n }\n}\n\n/**\n * Add an event listener to a selector\n * and returns a remove listener function.\n *\n * @param {String} selector\n * @param {String} type\n * @param {Function} callback\n * @return {Object}\n */\nfunction listenSelector(selector, type, callback) {\n return delegate(document.body, selector, type, callback);\n}\n\nmodule.exports = listen;\n\n\n/***/ }),\n\n/***/ 817:\n/***/ (function(module) {\n\nfunction select(element) {\n var selectedText;\n\n if (element.nodeName === 'SELECT') {\n element.focus();\n\n selectedText = element.value;\n }\n else if (element.nodeName === 'INPUT' || element.nodeName === 'TEXTAREA') {\n var isReadOnly = element.hasAttribute('readonly');\n\n if (!isReadOnly) {\n element.setAttribute('readonly', '');\n }\n\n element.select();\n element.setSelectionRange(0, element.value.length);\n\n if (!isReadOnly) {\n element.removeAttribute('readonly');\n }\n\n selectedText = element.value;\n }\n else {\n if (element.hasAttribute('contenteditable')) {\n element.focus();\n }\n\n var selection = window.getSelection();\n var range = document.createRange();\n\n range.selectNodeContents(element);\n selection.removeAllRanges();\n selection.addRange(range);\n\n selectedText = selection.toString();\n }\n\n return selectedText;\n}\n\nmodule.exports = select;\n\n\n/***/ }),\n\n/***/ 279:\n/***/ (function(module) {\n\nfunction E () {\n // Keep this empty so it's easier to inherit from\n // (via https://github.com/lipsmack from 
https://github.com/scottcorgan/tiny-emitter/issues/3)\n}\n\nE.prototype = {\n on: function (name, callback, ctx) {\n var e = this.e || (this.e = {});\n\n (e[name] || (e[name] = [])).push({\n fn: callback,\n ctx: ctx\n });\n\n return this;\n },\n\n once: function (name, callback, ctx) {\n var self = this;\n function listener () {\n self.off(name, listener);\n callback.apply(ctx, arguments);\n };\n\n listener._ = callback\n return this.on(name, listener, ctx);\n },\n\n emit: function (name) {\n var data = [].slice.call(arguments, 1);\n var evtArr = ((this.e || (this.e = {}))[name] || []).slice();\n var i = 0;\n var len = evtArr.length;\n\n for (i; i < len; i++) {\n evtArr[i].fn.apply(evtArr[i].ctx, data);\n }\n\n return this;\n },\n\n off: function (name, callback) {\n var e = this.e || (this.e = {});\n var evts = e[name];\n var liveEvents = [];\n\n if (evts && callback) {\n for (var i = 0, len = evts.length; i < len; i++) {\n if (evts[i].fn !== callback && evts[i].fn._ !== callback)\n liveEvents.push(evts[i]);\n }\n }\n\n // Remove event from queue to prevent memory leak\n // Suggested by https://github.com/lazd\n // Ref: https://github.com/scottcorgan/tiny-emitter/commit/c6ebfaa9bc973b33d110a84a307742b7cf94c953#commitcomment-5024910\n\n (liveEvents.length)\n ? 
e[name] = liveEvents\n : delete e[name];\n\n return this;\n }\n};\n\nmodule.exports = E;\nmodule.exports.TinyEmitter = E;\n\n\n/***/ })\n\n/******/ \t});\n/************************************************************************/\n/******/ \t// The module cache\n/******/ \tvar __webpack_module_cache__ = {};\n/******/ \t\n/******/ \t// The require function\n/******/ \tfunction __webpack_require__(moduleId) {\n/******/ \t\t// Check if module is in cache\n/******/ \t\tif(__webpack_module_cache__[moduleId]) {\n/******/ \t\t\treturn __webpack_module_cache__[moduleId].exports;\n/******/ \t\t}\n/******/ \t\t// Create a new module (and put it into the cache)\n/******/ \t\tvar module = __webpack_module_cache__[moduleId] = {\n/******/ \t\t\t// no module.id needed\n/******/ \t\t\t// no module.loaded needed\n/******/ \t\t\texports: {}\n/******/ \t\t};\n/******/ \t\n/******/ \t\t// Execute the module function\n/******/ \t\t__webpack_modules__[moduleId](module, module.exports, __webpack_require__);\n/******/ \t\n/******/ \t\t// Return the exports of the module\n/******/ \t\treturn module.exports;\n/******/ \t}\n/******/ \t\n/************************************************************************/\n/******/ \t/* webpack/runtime/compat get default export */\n/******/ \t!function() {\n/******/ \t\t// getDefaultExport function for compatibility with non-harmony modules\n/******/ \t\t__webpack_require__.n = function(module) {\n/******/ \t\t\tvar getter = module && module.__esModule ?\n/******/ \t\t\t\tfunction() { return module['default']; } :\n/******/ \t\t\t\tfunction() { return module; };\n/******/ \t\t\t__webpack_require__.d(getter, { a: getter });\n/******/ \t\t\treturn getter;\n/******/ \t\t};\n/******/ \t}();\n/******/ \t\n/******/ \t/* webpack/runtime/define property getters */\n/******/ \t!function() {\n/******/ \t\t// define getter functions for harmony exports\n/******/ \t\t__webpack_require__.d = function(exports, definition) {\n/******/ \t\t\tfor(var key in definition) 
{\n/******/ \t\t\t\tif(__webpack_require__.o(definition, key) && !__webpack_require__.o(exports, key)) {\n/******/ \t\t\t\t\tObject.defineProperty(exports, key, { enumerable: true, get: definition[key] });\n/******/ \t\t\t\t}\n/******/ \t\t\t}\n/******/ \t\t};\n/******/ \t}();\n/******/ \t\n/******/ \t/* webpack/runtime/hasOwnProperty shorthand */\n/******/ \t!function() {\n/******/ \t\t__webpack_require__.o = function(obj, prop) { return Object.prototype.hasOwnProperty.call(obj, prop); }\n/******/ \t}();\n/******/ \t\n/************************************************************************/\n/******/ \t// module exports must be returned from runtime so entry inlining is disabled\n/******/ \t// startup\n/******/ \t// Load entry module and return exports\n/******/ \treturn __webpack_require__(686);\n/******/ })()\n.default;\n});", "/*!\n * escape-html\n * Copyright(c) 2012-2013 TJ Holowaychuk\n * Copyright(c) 2015 Andreas Lubbe\n * Copyright(c) 2015 Tiancheng \"Timothy\" Gu\n * MIT Licensed\n */\n\n'use strict';\n\n/**\n * Module variables.\n * @private\n */\n\nvar matchHtmlRegExp = /[\"'&<>]/;\n\n/**\n * Module exports.\n * @public\n */\n\nmodule.exports = escapeHtml;\n\n/**\n * Escape special characters in the given string of html.\n *\n * @param {string} string The string to escape for inserting into HTML\n * @return {string}\n * @public\n */\n\nfunction escapeHtml(string) {\n var str = '' + string;\n var match = matchHtmlRegExp.exec(str);\n\n if (!match) {\n return str;\n }\n\n var escape;\n var html = '';\n var index = 0;\n var lastIndex = 0;\n\n for (index = match.index; index < str.length; index++) {\n switch (str.charCodeAt(index)) {\n case 34: // \"\n escape = '"';\n break;\n case 38: // &\n escape = '&';\n break;\n case 39: // '\n escape = ''';\n break;\n case 60: // <\n escape = '<';\n break;\n case 62: // >\n escape = '>';\n break;\n default:\n continue;\n }\n\n if (lastIndex !== index) {\n html += str.substring(lastIndex, index);\n }\n\n lastIndex = 
index + 1;\n html += escape;\n }\n\n return lastIndex !== index\n ? html + str.substring(lastIndex, index)\n : html;\n}\n", "/*\n * Copyright (c) 2016-2024 Martin Donath \n *\n * Permission is hereby granted, free of charge, to any person obtaining a copy\n * of this software and associated documentation files (the \"Software\"), to\n * deal in the Software without restriction, including without limitation the\n * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or\n * sell copies of the Software, and to permit persons to whom the Software is\n * furnished to do so, subject to the following conditions:\n *\n * The above copyright notice and this permission notice shall be included in\n * all copies or substantial portions of the Software.\n *\n * THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE\n * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS\n * IN THE SOFTWARE.\n */\n\nimport \"focus-visible\"\n\nimport {\n EMPTY,\n NEVER,\n Observable,\n Subject,\n defer,\n delay,\n filter,\n map,\n merge,\n mergeWith,\n shareReplay,\n switchMap\n} from \"rxjs\"\n\nimport { configuration, feature } from \"./_\"\nimport {\n at,\n getActiveElement,\n getOptionalElement,\n requestJSON,\n setLocation,\n setToggle,\n watchDocument,\n watchKeyboard,\n watchLocation,\n watchLocationTarget,\n watchMedia,\n watchPrint,\n watchScript,\n watchViewport\n} from \"./browser\"\nimport {\n getComponentElement,\n getComponentElements,\n mountAnnounce,\n mountBackToTop,\n mountConsent,\n mountContent,\n mountDialog,\n mountHeader,\n mountHeaderTitle,\n mountPalette,\n mountProgress,\n mountSearch,\n 
mountSearchHiglight,\n mountSidebar,\n mountSource,\n mountTableOfContents,\n mountTabs,\n watchHeader,\n watchMain\n} from \"./components\"\nimport {\n SearchIndex,\n setupClipboardJS,\n setupInstantNavigation,\n setupVersionSelector\n} from \"./integrations\"\nimport {\n patchEllipsis,\n patchIndeterminate,\n patchScrollfix,\n patchScrolllock\n} from \"./patches\"\nimport \"./polyfills\"\n\n/* ----------------------------------------------------------------------------\n * Functions - @todo refactor\n * ------------------------------------------------------------------------- */\n\n/**\n * Fetch search index\n *\n * @returns Search index observable\n */\nfunction fetchSearchIndex(): Observable {\n if (location.protocol === \"file:\") {\n return watchScript(\n `${new URL(\"search/search_index.js\", config.base)}`\n )\n .pipe(\n // @ts-ignore - @todo fix typings\n map(() => __index),\n shareReplay(1)\n )\n } else {\n return requestJSON(\n new URL(\"search/search_index.json\", config.base)\n )\n }\n}\n\n/* ----------------------------------------------------------------------------\n * Application\n * ------------------------------------------------------------------------- */\n\n/* Yay, JavaScript is available */\ndocument.documentElement.classList.remove(\"no-js\")\ndocument.documentElement.classList.add(\"js\")\n\n/* Set up navigation observables and subjects */\nconst document$ = watchDocument()\nconst location$ = watchLocation()\nconst target$ = watchLocationTarget(location$)\nconst keyboard$ = watchKeyboard()\n\n/* Set up media observables */\nconst viewport$ = watchViewport()\nconst tablet$ = watchMedia(\"(min-width: 960px)\")\nconst screen$ = watchMedia(\"(min-width: 1220px)\")\nconst print$ = watchPrint()\n\n/* Retrieve search index, if search is enabled */\nconst config = configuration()\nconst index$ = document.forms.namedItem(\"search\")\n ? 
fetchSearchIndex()\n : NEVER\n\n/* Set up Clipboard.js integration */\nconst alert$ = new Subject()\nsetupClipboardJS({ alert$ })\n\n/* Set up progress indicator */\nconst progress$ = new Subject()\n\n/* Set up instant navigation, if enabled */\nif (feature(\"navigation.instant\"))\n setupInstantNavigation({ location$, viewport$, progress$ })\n .subscribe(document$)\n\n/* Set up version selector */\nif (config.version?.provider === \"mike\")\n setupVersionSelector({ document$ })\n\n/* Always close drawer and search on navigation */\nmerge(location$, target$)\n .pipe(\n delay(125)\n )\n .subscribe(() => {\n setToggle(\"drawer\", false)\n setToggle(\"search\", false)\n })\n\n/* Set up global keyboard handlers */\nkeyboard$\n .pipe(\n filter(({ mode }) => mode === \"global\")\n )\n .subscribe(key => {\n switch (key.type) {\n\n /* Go to previous page */\n case \"p\":\n case \",\":\n const prev = getOptionalElement(\"link[rel=prev]\")\n if (typeof prev !== \"undefined\")\n setLocation(prev)\n break\n\n /* Go to next page */\n case \"n\":\n case \".\":\n const next = getOptionalElement(\"link[rel=next]\")\n if (typeof next !== \"undefined\")\n setLocation(next)\n break\n\n /* Expand navigation, see https://bit.ly/3ZjG5io */\n case \"Enter\":\n const active = getActiveElement()\n if (active instanceof HTMLLabelElement)\n active.click()\n }\n })\n\n/* Set up patches */\npatchEllipsis({ document$ })\npatchIndeterminate({ document$, tablet$ })\npatchScrollfix({ document$ })\npatchScrolllock({ viewport$, tablet$ })\n\n/* Set up header and main area observable */\nconst header$ = watchHeader(getComponentElement(\"header\"), { viewport$ })\nconst main$ = document$\n .pipe(\n map(() => getComponentElement(\"main\")),\n switchMap(el => watchMain(el, { viewport$, header$ })),\n shareReplay(1)\n )\n\n/* Set up control component observables */\nconst control$ = merge(\n\n /* Consent */\n ...getComponentElements(\"consent\")\n .map(el => mountConsent(el, { target$ })),\n\n /* Dialog 
*/\n ...getComponentElements(\"dialog\")\n .map(el => mountDialog(el, { alert$ })),\n\n /* Header */\n ...getComponentElements(\"header\")\n .map(el => mountHeader(el, { viewport$, header$, main$ })),\n\n /* Color palette */\n ...getComponentElements(\"palette\")\n .map(el => mountPalette(el)),\n\n /* Progress bar */\n ...getComponentElements(\"progress\")\n .map(el => mountProgress(el, { progress$ })),\n\n /* Search */\n ...getComponentElements(\"search\")\n .map(el => mountSearch(el, { index$, keyboard$ })),\n\n /* Repository information */\n ...getComponentElements(\"source\")\n .map(el => mountSource(el))\n)\n\n/* Set up content component observables */\nconst content$ = defer(() => merge(\n\n /* Announcement bar */\n ...getComponentElements(\"announce\")\n .map(el => mountAnnounce(el)),\n\n /* Content */\n ...getComponentElements(\"content\")\n .map(el => mountContent(el, { viewport$, target$, print$ })),\n\n /* Search highlighting */\n ...getComponentElements(\"content\")\n .map(el => feature(\"search.highlight\")\n ? mountSearchHiglight(el, { index$, location$ })\n : EMPTY\n ),\n\n /* Header title */\n ...getComponentElements(\"header-title\")\n .map(el => mountHeaderTitle(el, { viewport$, header$ })),\n\n /* Sidebar */\n ...getComponentElements(\"sidebar\")\n .map(el => el.getAttribute(\"data-md-type\") === \"navigation\"\n ? 
at(screen$, () => mountSidebar(el, { viewport$, header$, main$ }))\n : at(tablet$, () => mountSidebar(el, { viewport$, header$, main$ }))\n ),\n\n /* Navigation tabs */\n ...getComponentElements(\"tabs\")\n .map(el => mountTabs(el, { viewport$, header$ })),\n\n /* Table of contents */\n ...getComponentElements(\"toc\")\n .map(el => mountTableOfContents(el, {\n viewport$, header$, main$, target$\n })),\n\n /* Back-to-top button */\n ...getComponentElements(\"top\")\n .map(el => mountBackToTop(el, { viewport$, header$, main$, target$ }))\n))\n\n/* Set up component observables */\nconst component$ = document$\n .pipe(\n switchMap(() => content$),\n mergeWith(control$),\n shareReplay(1)\n )\n\n/* Subscribe to all components */\ncomponent$.subscribe()\n\n/* ----------------------------------------------------------------------------\n * Exports\n * ------------------------------------------------------------------------- */\n\nwindow.document$ = document$ /* Document observable */\nwindow.location$ = location$ /* Location subject */\nwindow.target$ = target$ /* Location target observable */\nwindow.keyboard$ = keyboard$ /* Keyboard observable */\nwindow.viewport$ = viewport$ /* Viewport observable */\nwindow.tablet$ = tablet$ /* Media tablet observable */\nwindow.screen$ = screen$ /* Media screen observable */\nwindow.print$ = print$ /* Media print observable */\nwindow.alert$ = alert$ /* Alert subject */\nwindow.progress$ = progress$ /* Progress indicator subject */\nwindow.component$ = component$ /* Component observable */\n", "/*! *****************************************************************************\r\nCopyright (c) Microsoft Corporation.\r\n\r\nPermission to use, copy, modify, and/or distribute this software for any\r\npurpose with or without fee is hereby granted.\r\n\r\nTHE SOFTWARE IS PROVIDED \"AS IS\" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH\r\nREGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY\r\nAND FITNESS. 
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,\r\nINDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM\r\nLOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR\r\nOTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR\r\nPERFORMANCE OF THIS SOFTWARE.\r\n***************************************************************************** */\r\n/* global Reflect, Promise */\r\n\r\nvar extendStatics = function(d, b) {\r\n extendStatics = Object.setPrototypeOf ||\r\n ({ __proto__: [] } instanceof Array && function (d, b) { d.__proto__ = b; }) ||\r\n function (d, b) { for (var p in b) if (Object.prototype.hasOwnProperty.call(b, p)) d[p] = b[p]; };\r\n return extendStatics(d, b);\r\n};\r\n\r\nexport function __extends(d, b) {\r\n if (typeof b !== \"function\" && b !== null)\r\n throw new TypeError(\"Class extends value \" + String(b) + \" is not a constructor or null\");\r\n extendStatics(d, b);\r\n function __() { this.constructor = d; }\r\n d.prototype = b === null ? 
Object.create(b) : (__.prototype = b.prototype, new __());\r\n}\r\n\r\nexport var __assign = function() {\r\n __assign = Object.assign || function __assign(t) {\r\n for (var s, i = 1, n = arguments.length; i < n; i++) {\r\n s = arguments[i];\r\n for (var p in s) if (Object.prototype.hasOwnProperty.call(s, p)) t[p] = s[p];\r\n }\r\n return t;\r\n }\r\n return __assign.apply(this, arguments);\r\n}\r\n\r\nexport function __rest(s, e) {\r\n var t = {};\r\n for (var p in s) if (Object.prototype.hasOwnProperty.call(s, p) && e.indexOf(p) < 0)\r\n t[p] = s[p];\r\n if (s != null && typeof Object.getOwnPropertySymbols === \"function\")\r\n for (var i = 0, p = Object.getOwnPropertySymbols(s); i < p.length; i++) {\r\n if (e.indexOf(p[i]) < 0 && Object.prototype.propertyIsEnumerable.call(s, p[i]))\r\n t[p[i]] = s[p[i]];\r\n }\r\n return t;\r\n}\r\n\r\nexport function __decorate(decorators, target, key, desc) {\r\n var c = arguments.length, r = c < 3 ? target : desc === null ? desc = Object.getOwnPropertyDescriptor(target, key) : desc, d;\r\n if (typeof Reflect === \"object\" && typeof Reflect.decorate === \"function\") r = Reflect.decorate(decorators, target, key, desc);\r\n else for (var i = decorators.length - 1; i >= 0; i--) if (d = decorators[i]) r = (c < 3 ? d(r) : c > 3 ? d(target, key, r) : d(target, key)) || r;\r\n return c > 3 && r && Object.defineProperty(target, key, r), r;\r\n}\r\n\r\nexport function __param(paramIndex, decorator) {\r\n return function (target, key) { decorator(target, key, paramIndex); }\r\n}\r\n\r\nexport function __metadata(metadataKey, metadataValue) {\r\n if (typeof Reflect === \"object\" && typeof Reflect.metadata === \"function\") return Reflect.metadata(metadataKey, metadataValue);\r\n}\r\n\r\nexport function __awaiter(thisArg, _arguments, P, generator) {\r\n function adopt(value) { return value instanceof P ? 
value : new P(function (resolve) { resolve(value); }); }\r\n return new (P || (P = Promise))(function (resolve, reject) {\r\n function fulfilled(value) { try { step(generator.next(value)); } catch (e) { reject(e); } }\r\n function rejected(value) { try { step(generator[\"throw\"](value)); } catch (e) { reject(e); } }\r\n function step(result) { result.done ? resolve(result.value) : adopt(result.value).then(fulfilled, rejected); }\r\n step((generator = generator.apply(thisArg, _arguments || [])).next());\r\n });\r\n}\r\n\r\nexport function __generator(thisArg, body) {\r\n var _ = { label: 0, sent: function() { if (t[0] & 1) throw t[1]; return t[1]; }, trys: [], ops: [] }, f, y, t, g;\r\n return g = { next: verb(0), \"throw\": verb(1), \"return\": verb(2) }, typeof Symbol === \"function\" && (g[Symbol.iterator] = function() { return this; }), g;\r\n function verb(n) { return function (v) { return step([n, v]); }; }\r\n function step(op) {\r\n if (f) throw new TypeError(\"Generator is already executing.\");\r\n while (_) try {\r\n if (f = 1, y && (t = op[0] & 2 ? y[\"return\"] : op[0] ? 
y[\"throw\"] || ((t = y[\"return\"]) && t.call(y), 0) : y.next) && !(t = t.call(y, op[1])).done) return t;\r\n if (y = 0, t) op = [op[0] & 2, t.value];\r\n switch (op[0]) {\r\n case 0: case 1: t = op; break;\r\n case 4: _.label++; return { value: op[1], done: false };\r\n case 5: _.label++; y = op[1]; op = [0]; continue;\r\n case 7: op = _.ops.pop(); _.trys.pop(); continue;\r\n default:\r\n if (!(t = _.trys, t = t.length > 0 && t[t.length - 1]) && (op[0] === 6 || op[0] === 2)) { _ = 0; continue; }\r\n if (op[0] === 3 && (!t || (op[1] > t[0] && op[1] < t[3]))) { _.label = op[1]; break; }\r\n if (op[0] === 6 && _.label < t[1]) { _.label = t[1]; t = op; break; }\r\n if (t && _.label < t[2]) { _.label = t[2]; _.ops.push(op); break; }\r\n if (t[2]) _.ops.pop();\r\n _.trys.pop(); continue;\r\n }\r\n op = body.call(thisArg, _);\r\n } catch (e) { op = [6, e]; y = 0; } finally { f = t = 0; }\r\n if (op[0] & 5) throw op[1]; return { value: op[0] ? op[1] : void 0, done: true };\r\n }\r\n}\r\n\r\nexport var __createBinding = Object.create ? (function(o, m, k, k2) {\r\n if (k2 === undefined) k2 = k;\r\n Object.defineProperty(o, k2, { enumerable: true, get: function() { return m[k]; } });\r\n}) : (function(o, m, k, k2) {\r\n if (k2 === undefined) k2 = k;\r\n o[k2] = m[k];\r\n});\r\n\r\nexport function __exportStar(m, o) {\r\n for (var p in m) if (p !== \"default\" && !Object.prototype.hasOwnProperty.call(o, p)) __createBinding(o, m, p);\r\n}\r\n\r\nexport function __values(o) {\r\n var s = typeof Symbol === \"function\" && Symbol.iterator, m = s && o[s], i = 0;\r\n if (m) return m.call(o);\r\n if (o && typeof o.length === \"number\") return {\r\n next: function () {\r\n if (o && i >= o.length) o = void 0;\r\n return { value: o && o[i++], done: !o };\r\n }\r\n };\r\n throw new TypeError(s ? 
\"Object is not iterable.\" : \"Symbol.iterator is not defined.\");\r\n}\r\n\r\nexport function __read(o, n) {\r\n var m = typeof Symbol === \"function\" && o[Symbol.iterator];\r\n if (!m) return o;\r\n var i = m.call(o), r, ar = [], e;\r\n try {\r\n while ((n === void 0 || n-- > 0) && !(r = i.next()).done) ar.push(r.value);\r\n }\r\n catch (error) { e = { error: error }; }\r\n finally {\r\n try {\r\n if (r && !r.done && (m = i[\"return\"])) m.call(i);\r\n }\r\n finally { if (e) throw e.error; }\r\n }\r\n return ar;\r\n}\r\n\r\n/** @deprecated */\r\nexport function __spread() {\r\n for (var ar = [], i = 0; i < arguments.length; i++)\r\n ar = ar.concat(__read(arguments[i]));\r\n return ar;\r\n}\r\n\r\n/** @deprecated */\r\nexport function __spreadArrays() {\r\n for (var s = 0, i = 0, il = arguments.length; i < il; i++) s += arguments[i].length;\r\n for (var r = Array(s), k = 0, i = 0; i < il; i++)\r\n for (var a = arguments[i], j = 0, jl = a.length; j < jl; j++, k++)\r\n r[k] = a[j];\r\n return r;\r\n}\r\n\r\nexport function __spreadArray(to, from, pack) {\r\n if (pack || arguments.length === 2) for (var i = 0, l = from.length, ar; i < l; i++) {\r\n if (ar || !(i in from)) {\r\n if (!ar) ar = Array.prototype.slice.call(from, 0, i);\r\n ar[i] = from[i];\r\n }\r\n }\r\n return to.concat(ar || Array.prototype.slice.call(from));\r\n}\r\n\r\nexport function __await(v) {\r\n return this instanceof __await ? 
(this.v = v, this) : new __await(v);\r\n}\r\n\r\nexport function __asyncGenerator(thisArg, _arguments, generator) {\r\n if (!Symbol.asyncIterator) throw new TypeError(\"Symbol.asyncIterator is not defined.\");\r\n var g = generator.apply(thisArg, _arguments || []), i, q = [];\r\n return i = {}, verb(\"next\"), verb(\"throw\"), verb(\"return\"), i[Symbol.asyncIterator] = function () { return this; }, i;\r\n function verb(n) { if (g[n]) i[n] = function (v) { return new Promise(function (a, b) { q.push([n, v, a, b]) > 1 || resume(n, v); }); }; }\r\n function resume(n, v) { try { step(g[n](v)); } catch (e) { settle(q[0][3], e); } }\r\n function step(r) { r.value instanceof __await ? Promise.resolve(r.value.v).then(fulfill, reject) : settle(q[0][2], r); }\r\n function fulfill(value) { resume(\"next\", value); }\r\n function reject(value) { resume(\"throw\", value); }\r\n function settle(f, v) { if (f(v), q.shift(), q.length) resume(q[0][0], q[0][1]); }\r\n}\r\n\r\nexport function __asyncDelegator(o) {\r\n var i, p;\r\n return i = {}, verb(\"next\"), verb(\"throw\", function (e) { throw e; }), verb(\"return\"), i[Symbol.iterator] = function () { return this; }, i;\r\n function verb(n, f) { i[n] = o[n] ? function (v) { return (p = !p) ? { value: __await(o[n](v)), done: n === \"return\" } : f ? f(v) : v; } : f; }\r\n}\r\n\r\nexport function __asyncValues(o) {\r\n if (!Symbol.asyncIterator) throw new TypeError(\"Symbol.asyncIterator is not defined.\");\r\n var m = o[Symbol.asyncIterator], i;\r\n return m ? m.call(o) : (o = typeof __values === \"function\" ? 
__values(o) : o[Symbol.iterator](), i = {}, verb(\"next\"), verb(\"throw\"), verb(\"return\"), i[Symbol.asyncIterator] = function () { return this; }, i);\r\n function verb(n) { i[n] = o[n] && function (v) { return new Promise(function (resolve, reject) { v = o[n](v), settle(resolve, reject, v.done, v.value); }); }; }\r\n function settle(resolve, reject, d, v) { Promise.resolve(v).then(function(v) { resolve({ value: v, done: d }); }, reject); }\r\n}\r\n\r\nexport function __makeTemplateObject(cooked, raw) {\r\n if (Object.defineProperty) { Object.defineProperty(cooked, \"raw\", { value: raw }); } else { cooked.raw = raw; }\r\n return cooked;\r\n};\r\n\r\nvar __setModuleDefault = Object.create ? (function(o, v) {\r\n Object.defineProperty(o, \"default\", { enumerable: true, value: v });\r\n}) : function(o, v) {\r\n o[\"default\"] = v;\r\n};\r\n\r\nexport function __importStar(mod) {\r\n if (mod && mod.__esModule) return mod;\r\n var result = {};\r\n if (mod != null) for (var k in mod) if (k !== \"default\" && Object.prototype.hasOwnProperty.call(mod, k)) __createBinding(result, mod, k);\r\n __setModuleDefault(result, mod);\r\n return result;\r\n}\r\n\r\nexport function __importDefault(mod) {\r\n return (mod && mod.__esModule) ? mod : { default: mod };\r\n}\r\n\r\nexport function __classPrivateFieldGet(receiver, state, kind, f) {\r\n if (kind === \"a\" && !f) throw new TypeError(\"Private accessor was defined without a getter\");\r\n if (typeof state === \"function\" ? receiver !== state || !f : !state.has(receiver)) throw new TypeError(\"Cannot read private member from an object whose class did not declare it\");\r\n return kind === \"m\" ? f : kind === \"a\" ? f.call(receiver) : f ? 
f.value : state.get(receiver);\r\n}\r\n\r\nexport function __classPrivateFieldSet(receiver, state, value, kind, f) {\r\n if (kind === \"m\") throw new TypeError(\"Private method is not writable\");\r\n if (kind === \"a\" && !f) throw new TypeError(\"Private accessor was defined without a setter\");\r\n if (typeof state === \"function\" ? receiver !== state || !f : !state.has(receiver)) throw new TypeError(\"Cannot write private member to an object whose class did not declare it\");\r\n return (kind === \"a\" ? f.call(receiver, value) : f ? f.value = value : state.set(receiver, value)), value;\r\n}\r\n", "/**\n * Returns true if the object is a function.\n * @param value The value to check\n */\nexport function isFunction(value: any): value is (...args: any[]) => any {\n return typeof value === 'function';\n}\n", "/**\n * Used to create Error subclasses until the community moves away from ES5.\n *\n * This is because compiling from TypeScript down to ES5 has issues with subclassing Errors\n * as well as other built-in types: https://github.com/Microsoft/TypeScript/issues/12123\n *\n * @param createImpl A factory function to create the actual constructor implementation. The returned\n * function should be a named function that calls `_super` internally.\n */\nexport function createErrorClass(createImpl: (_super: any) => any): T {\n const _super = (instance: any) => {\n Error.call(instance);\n instance.stack = new Error().stack;\n };\n\n const ctorFunc = createImpl(_super);\n ctorFunc.prototype = Object.create(Error.prototype);\n ctorFunc.prototype.constructor = ctorFunc;\n return ctorFunc;\n}\n", "import { createErrorClass } from './createErrorClass';\n\nexport interface UnsubscriptionError extends Error {\n readonly errors: any[];\n}\n\nexport interface UnsubscriptionErrorCtor {\n /**\n * @deprecated Internal implementation detail. 
Do not construct error instances.\n * Cannot be tagged as internal: https://github.com/ReactiveX/rxjs/issues/6269\n */\n new (errors: any[]): UnsubscriptionError;\n}\n\n/**\n * An error thrown when one or more errors have occurred during the\n * `unsubscribe` of a {@link Subscription}.\n */\nexport const UnsubscriptionError: UnsubscriptionErrorCtor = createErrorClass(\n (_super) =>\n function UnsubscriptionErrorImpl(this: any, errors: (Error | string)[]) {\n _super(this);\n this.message = errors\n ? `${errors.length} errors occurred during unsubscription:\n${errors.map((err, i) => `${i + 1}) ${err.toString()}`).join('\\n ')}`\n : '';\n this.name = 'UnsubscriptionError';\n this.errors = errors;\n }\n);\n", "/**\n * Removes an item from an array, mutating it.\n * @param arr The array to remove the item from\n * @param item The item to remove\n */\nexport function arrRemove(arr: T[] | undefined | null, item: T) {\n if (arr) {\n const index = arr.indexOf(item);\n 0 <= index && arr.splice(index, 1);\n }\n}\n", "import { isFunction } from './util/isFunction';\nimport { UnsubscriptionError } from './util/UnsubscriptionError';\nimport { SubscriptionLike, TeardownLogic, Unsubscribable } from './types';\nimport { arrRemove } from './util/arrRemove';\n\n/**\n * Represents a disposable resource, such as the execution of an Observable. 
A\n * Subscription has one important method, `unsubscribe`, that takes no argument\n * and just disposes the resource held by the subscription.\n *\n * Additionally, subscriptions may be grouped together through the `add()`\n * method, which will attach a child Subscription to the current Subscription.\n * When a Subscription is unsubscribed, all its children (and its grandchildren)\n * will be unsubscribed as well.\n *\n * @class Subscription\n */\nexport class Subscription implements SubscriptionLike {\n /** @nocollapse */\n public static EMPTY = (() => {\n const empty = new Subscription();\n empty.closed = true;\n return empty;\n })();\n\n /**\n * A flag to indicate whether this Subscription has already been unsubscribed.\n */\n public closed = false;\n\n private _parentage: Subscription[] | Subscription | null = null;\n\n /**\n * The list of registered finalizers to execute upon unsubscription. Adding and removing from this\n * list occurs in the {@link #add} and {@link #remove} methods.\n */\n private _finalizers: Exclude[] | null = null;\n\n /**\n * @param initialTeardown A function executed first as part of the finalization\n * process that is kicked off when {@link #unsubscribe} is called.\n */\n constructor(private initialTeardown?: () => void) {}\n\n /**\n * Disposes the resources held by the subscription. 
May, for instance, cancel\n * an ongoing Observable execution or cancel any other type of work that\n * started when the Subscription was created.\n * @return {void}\n */\n unsubscribe(): void {\n let errors: any[] | undefined;\n\n if (!this.closed) {\n this.closed = true;\n\n // Remove this from it's parents.\n const { _parentage } = this;\n if (_parentage) {\n this._parentage = null;\n if (Array.isArray(_parentage)) {\n for (const parent of _parentage) {\n parent.remove(this);\n }\n } else {\n _parentage.remove(this);\n }\n }\n\n const { initialTeardown: initialFinalizer } = this;\n if (isFunction(initialFinalizer)) {\n try {\n initialFinalizer();\n } catch (e) {\n errors = e instanceof UnsubscriptionError ? e.errors : [e];\n }\n }\n\n const { _finalizers } = this;\n if (_finalizers) {\n this._finalizers = null;\n for (const finalizer of _finalizers) {\n try {\n execFinalizer(finalizer);\n } catch (err) {\n errors = errors ?? [];\n if (err instanceof UnsubscriptionError) {\n errors = [...errors, ...err.errors];\n } else {\n errors.push(err);\n }\n }\n }\n }\n\n if (errors) {\n throw new UnsubscriptionError(errors);\n }\n }\n }\n\n /**\n * Adds a finalizer to this subscription, so that finalization will be unsubscribed/called\n * when this subscription is unsubscribed. If this subscription is already {@link #closed},\n * because it has already been unsubscribed, then whatever finalizer is passed to it\n * will automatically be executed (unless the finalizer itself is also a closed subscription).\n *\n * Closed Subscriptions cannot be added as finalizers to any subscription. Adding a closed\n * subscription to a any subscription will result in no operation. (A noop).\n *\n * Adding a subscription to itself, or adding `null` or `undefined` will not perform any\n * operation at all. (A noop).\n *\n * `Subscription` instances that are added to this instance will automatically remove themselves\n * if they are unsubscribed. 
Functions and {@link Unsubscribable} objects that you wish to remove\n * will need to be removed manually with {@link #remove}\n *\n * @param teardown The finalization logic to add to this subscription.\n */\n add(teardown: TeardownLogic): void {\n // Only add the finalizer if it's not undefined\n // and don't add a subscription to itself.\n if (teardown && teardown !== this) {\n if (this.closed) {\n // If this subscription is already closed,\n // execute whatever finalizer is handed to it automatically.\n execFinalizer(teardown);\n } else {\n if (teardown instanceof Subscription) {\n // We don't add closed subscriptions, and we don't add the same subscription\n // twice. Subscription unsubscribe is idempotent.\n if (teardown.closed || teardown._hasParent(this)) {\n return;\n }\n teardown._addParent(this);\n }\n (this._finalizers = this._finalizers ?? []).push(teardown);\n }\n }\n }\n\n /**\n * Checks to see if a this subscription already has a particular parent.\n * This will signal that this subscription has already been added to the parent in question.\n * @param parent the parent to check for\n */\n private _hasParent(parent: Subscription) {\n const { _parentage } = this;\n return _parentage === parent || (Array.isArray(_parentage) && _parentage.includes(parent));\n }\n\n /**\n * Adds a parent to this subscription so it can be removed from the parent if it\n * unsubscribes on it's own.\n *\n * NOTE: THIS ASSUMES THAT {@link _hasParent} HAS ALREADY BEEN CHECKED.\n * @param parent The parent subscription to add\n */\n private _addParent(parent: Subscription) {\n const { _parentage } = this;\n this._parentage = Array.isArray(_parentage) ? (_parentage.push(parent), _parentage) : _parentage ? 
      [_parentage, parent] : parent;
  }

  /**
   * Called on a child when it is removed via {@link #remove}.
   * @param parent The parent to remove
   */
  private _removeParent(parent: Subscription) {
    const { _parentage } = this;
    if (_parentage === parent) {
      this._parentage = null;
    } else if (Array.isArray(_parentage)) {
      arrRemove(_parentage, parent);
    }
  }

  /**
   * Removes a finalizer from this subscription that was previously added with the {@link #add} method.
   *
   * Note that `Subscription` instances, when unsubscribed, will automatically remove themselves
   * from every other `Subscription` they have been added to. This means that using the `remove` method
   * is not a common thing and should be used thoughtfully.
   *
   * If you add the same finalizer instance of a function or an unsubscribable object to a `Subscription` instance
   * more than once, you will need to call `remove` the same number of times to remove all instances.
   *
   * All finalizer instances are removed to free up memory upon unsubscription.
   *
   * @param teardown The finalizer to remove from this subscription
   */
  remove(teardown: Exclude<TeardownLogic, void>): void {
    const { _finalizers } = this;
    _finalizers && arrRemove(_finalizers, teardown);

    if (teardown instanceof Subscription) {
      teardown._removeParent(this);
    }
  }
}

export const EMPTY_SUBSCRIPTION = Subscription.EMPTY;

export function isSubscription(value: any): value is Subscription {
  return (
    value instanceof Subscription ||
    (value && 'closed' in value && isFunction(value.remove) && isFunction(value.add) && isFunction(value.unsubscribe))
  );
}

function execFinalizer(finalizer: Unsubscribable | (() => void)) {
  if (isFunction(finalizer)) {
    finalizer();
  } else {
    finalizer.unsubscribe();
  }
}

// --- config.ts ---

import { Subscriber } from './Subscriber';
import { ObservableNotification } from './types';

/**
 * The {@link GlobalConfig} object for RxJS.
 * It is used to configure things
 * like how to react on unhandled errors.
 */
export const config: GlobalConfig = {
  onUnhandledError: null,
  onStoppedNotification: null,
  Promise: undefined,
  useDeprecatedSynchronousErrorHandling: false,
  useDeprecatedNextContext: false,
};

/**
 * The global configuration object for RxJS, used to configure things
 * like how to react on unhandled errors. Accessible via the {@link config}
 * object.
 */
export interface GlobalConfig {
  /**
   * A registration point for unhandled errors from RxJS. These are errors that
   * were not handled by consuming code in the usual subscription path. For
   * example, if you have this configured, and you subscribe to an observable without
   * providing an error handler, errors from that subscription will end up here. This
   * will _always_ be called asynchronously on another job in the runtime. This is because
   * we do not want errors thrown in this user-configured handler to interfere with the
   * behavior of the library.
   */
  onUnhandledError: ((err: any) => void) | null;

  /**
   * A registration point for notifications that cannot be sent to subscribers because they
   * have completed, errored, or have been explicitly unsubscribed. By default, next, complete,
   * and error notifications sent to stopped subscribers are noops. However, sometimes callers
   * might want a different behavior. For example, with sources that attempt to report errors
   * to stopped subscribers, a caller can configure RxJS to throw an unhandled error instead.
   * This will _always_ be called asynchronously on another job in the runtime.
   * This is because
   * we do not want errors thrown in this user-configured handler to interfere with the
   * behavior of the library.
   */
  onStoppedNotification: ((notification: ObservableNotification<any>, subscriber: Subscriber<any>) => void) | null;

  /**
   * The promise constructor used by default for {@link Observable#toPromise toPromise} and
   * {@link Observable#forEach forEach} methods.
   *
   * @deprecated As of version 8, RxJS will no longer support this sort of injection of a
   * Promise constructor. If you need a Promise implementation other than native promises,
   * please polyfill/patch Promise as you see appropriate. Will be removed in v8.
   */
  Promise?: PromiseConstructorLike;

  /**
   * If true, turns on synchronous error rethrowing, which is a deprecated behavior
   * in v6 and higher. This behavior enables bad patterns like wrapping a subscribe
   * call in a try/catch block. It also enables producer interference, a nasty bug
   * where a multicast can be broken for all observers by a downstream consumer with
   * an unhandled error. DO NOT USE THIS FLAG UNLESS IT'S NEEDED TO BUY TIME
   * FOR MIGRATION REASONS.
   *
   * @deprecated As of version 8, RxJS will no longer support synchronous throwing
   * of unhandled errors. All errors will be thrown on a separate call stack to prevent bad
   * behaviors described above.
   * Will be removed in v8.
   */
  useDeprecatedSynchronousErrorHandling: boolean;

  /**
   * If true, enables an as-of-yet undocumented feature from v5: the ability to access
   * `unsubscribe()` via `this` context in `next` functions created in observers passed
   * to `subscribe`.
   *
   * This is being removed because the performance was severely problematic, and it could also cause
   * issues when types other than POJOs are passed to subscribe as subscribers, as they will likely have
   * their `this` context overwritten.
   *
   * @deprecated As of version 8, RxJS will no longer support altering the
   * context of next functions provided as part of an observer to Subscribe. Instead,
   * you will have access to a subscription or a signal or token that will allow you to do things like
   * unsubscribe and test closed status. Will be removed in v8.
   */
  useDeprecatedNextContext: boolean;
}

// --- scheduler/timeoutProvider.ts ---

import type { TimerHandle } from './timerHandle';

type SetTimeoutFunction = (handler: () => void, timeout?: number, ...args: any[]) => TimerHandle;
type ClearTimeoutFunction = (handle: TimerHandle) => void;

interface TimeoutProvider {
  setTimeout: SetTimeoutFunction;
  clearTimeout: ClearTimeoutFunction;
  delegate:
    | {
        setTimeout: SetTimeoutFunction;
        clearTimeout: ClearTimeoutFunction;
      }
    | undefined;
}

export const timeoutProvider: TimeoutProvider = {
  // When accessing the delegate, use the variable rather than `this` so that
  // the functions can be called without being bound to the provider.
  setTimeout(handler: () => void, timeout?: number, ...args) {
    const { delegate } = timeoutProvider;
    if (delegate?.setTimeout) {
      return delegate.setTimeout(handler, timeout, ...args);
    }
    return setTimeout(handler, timeout, ...args);
  },
  clearTimeout(handle) {
    const { delegate } = timeoutProvider;
    return (delegate?.clearTimeout || clearTimeout)(handle as any);
  },
  delegate: undefined,
};

// --- util/reportUnhandledError.ts ---

import { config } from '../config';
import {
  timeoutProvider } from '../scheduler/timeoutProvider';

/**
 * Handles an error on another job either with the user-configured {@link onUnhandledError},
 * or by throwing it on that new job so it can be picked up by `window.onerror`, `process.on('error')`, etc.
 *
 * This should be called whenever there is an error that is out-of-band with the subscription
 * or when an error hits a terminal boundary of the subscription and no error handler was provided.
 *
 * @param err the error to report
 */
export function reportUnhandledError(err: any) {
  timeoutProvider.setTimeout(() => {
    const { onUnhandledError } = config;
    if (onUnhandledError) {
      // Execute the user-configured error handler.
      onUnhandledError(err);
    } else {
      // Throw so it is picked up by the runtime's uncaught error mechanism.
      throw err;
    }
  });
}

// --- util/noop.ts ---

/* tslint:disable:no-empty */
export function noop() { }

// --- NotificationFactories.ts ---

import { CompleteNotification, NextNotification, ErrorNotification } from './types';

/**
 * A completion object optimized for memory use and created to be the
 * same "shape" as other notifications in v8.
 * @internal
 */
export const COMPLETE_NOTIFICATION = (() => createNotification('C', undefined, undefined) as CompleteNotification)();

/**
 * Internal use only. Creates an optimized error notification that is the same "shape"
 * as other notifications.
 * @internal
 */
export function errorNotification(error: any): ErrorNotification {
  return createNotification('E', undefined, error) as any;
}

/**
 * Internal use only.
 * Creates an optimized next notification that is the same "shape"
 * as other notifications.
 * @internal
 */
export function nextNotification<T>(value: T) {
  return createNotification('N', value, undefined) as NextNotification<T>;
}

/**
 * Ensures that all notifications created internally have the same "shape" in v8.
 *
 * TODO: This is only exported to support a crazy legacy test in `groupBy`.
 * @internal
 */
export function createNotification(kind: 'N' | 'E' | 'C', value: any, error: any) {
  return {
    kind,
    value,
    error,
  };
}

// --- util/errorContext.ts ---

import { config } from '../config';

let context: { errorThrown: boolean; error: any } | null = null;

/**
 * Handles dealing with errors for super-gross mode. Creates a context, in which
 * any synchronously thrown errors will be passed to {@link captureError}, which
 * will record the error such that it will be rethrown after the callback is complete.
 * TODO: Remove in v8
 * @param cb An immediately executed function.
 */
export function errorContext(cb: () => void) {
  if (config.useDeprecatedSynchronousErrorHandling) {
    const isRoot = !context;
    if (isRoot) {
      context = { errorThrown: false, error: null };
    }
    cb();
    if (isRoot) {
      const { errorThrown, error } = context!;
      context = null;
      if (errorThrown) {
        throw error;
      }
    }
  } else {
    // This is the general non-deprecated path for everyone that
    // isn't crazy enough to use super-gross mode (useDeprecatedSynchronousErrorHandling).
    cb();
  }
}

/**
 * Captures errors only in super-gross mode.
 * @param err the error to capture
 */
export function captureError(err: any) {
  if (config.useDeprecatedSynchronousErrorHandling && context) {
    context.errorThrown = true;
    context.error = err;
  }
}

// --- Subscriber.ts ---

import { isFunction } from './util/isFunction';
import { Observer, ObservableNotification } from './types';
import { isSubscription, Subscription } from './Subscription';
import { config } from './config';
import {
  reportUnhandledError } from './util/reportUnhandledError';
import { noop } from './util/noop';
import { nextNotification, errorNotification, COMPLETE_NOTIFICATION } from './NotificationFactories';
import { timeoutProvider } from './scheduler/timeoutProvider';
import { captureError } from './util/errorContext';

/**
 * Implements the {@link Observer} interface and extends the
 * {@link Subscription} class. While the {@link Observer} is the public API for
 * consuming the values of an {@link Observable}, all Observers get converted to
 * a Subscriber, in order to provide Subscription-like capabilities such as
 * `unsubscribe`. Subscriber is a common type in RxJS, and crucial for
 * implementing operators, but it is rarely used as a public API.
 *
 * @class Subscriber<T>
 */
export class Subscriber<T> extends Subscription implements Observer<T> {
  /**
   * A static factory for a Subscriber, given a (potentially partial) definition
   * of an Observer.
   * @param next The `next` callback of an Observer.
   * @param error The `error` callback of an Observer.
   * @param complete The `complete` callback of an Observer.
   * @return A Subscriber wrapping the (partially defined)
   * Observer represented by the given arguments.
   * @nocollapse
   * @deprecated Do not use. Will be removed in v8. There is no replacement for this
   * method, and there is no reason to be creating instances of `Subscriber` directly.
   * If you have a specific use case, please file an issue.
   */
  static create<T>(next?: (x?: T) => void, error?: (e?: any) => void, complete?: () => void): Subscriber<T> {
    return new SafeSubscriber(next, error, complete);
  }

  /** @deprecated Internal implementation detail, do not use directly. Will be made internal in v8. */
  protected isStopped: boolean = false;
  /** @deprecated Internal implementation detail, do not use directly. Will be made internal in v8.
   */
  protected destination: Subscriber<any> | Observer<any>; // this `any` is the escape hatch to erase extra type param (e.g. R)

  /**
   * @deprecated Internal implementation detail, do not use directly. Will be made internal in v8.
   * There is no reason to directly create an instance of Subscriber. This type is exported for typings reasons.
   */
  constructor(destination?: Subscriber<any> | Observer<any>) {
    super();
    if (destination) {
      this.destination = destination;
      // Automatically chain subscriptions together here.
      // if destination is a Subscription, then it is a Subscriber.
      if (isSubscription(destination)) {
        destination.add(this);
      }
    } else {
      this.destination = EMPTY_OBSERVER;
    }
  }

  /**
   * The {@link Observer} callback to receive notifications of type `next` from
   * the Observable, with a value. The Observable may call this method 0 or more
   * times.
   * @param {T} [value] The `next` value.
   * @return {void}
   */
  next(value?: T): void {
    if (this.isStopped) {
      handleStoppedNotification(nextNotification(value), this);
    } else {
      this._next(value!);
    }
  }

  /**
   * The {@link Observer} callback to receive notifications of type `error` from
   * the Observable, with an attached `Error`. Notifies the Observer that
   * the Observable has experienced an error condition.
   * @param {any} [err] The `error` exception.
   * @return {void}
   */
  error(err?: any): void {
    if (this.isStopped) {
      handleStoppedNotification(errorNotification(err), this);
    } else {
      this.isStopped = true;
      this._error(err);
    }
  }

  /**
   * The {@link Observer} callback to receive a valueless notification of type
   * `complete` from the Observable.
   * Notifies the Observer that the Observable
   * has finished sending push-based notifications.
   * @return {void}
   */
  complete(): void {
    if (this.isStopped) {
      handleStoppedNotification(COMPLETE_NOTIFICATION, this);
    } else {
      this.isStopped = true;
      this._complete();
    }
  }

  unsubscribe(): void {
    if (!this.closed) {
      this.isStopped = true;
      super.unsubscribe();
      this.destination = null!;
    }
  }

  protected _next(value: T): void {
    this.destination.next(value);
  }

  protected _error(err: any): void {
    try {
      this.destination.error(err);
    } finally {
      this.unsubscribe();
    }
  }

  protected _complete(): void {
    try {
      this.destination.complete();
    } finally {
      this.unsubscribe();
    }
  }
}

/**
 * This bind is captured here because we want to be able to have
 * compatibility with monoid libraries that tend to use a method named
 * `bind`. In particular, a library called Monio requires this.
 */
const _bind = Function.prototype.bind;

function bind<Fn extends (...args: any[]) => any>(fn: Fn, thisArg: any): Fn {
  return _bind.call(fn, thisArg);
}

/**
 * Internal optimization only, DO NOT EXPOSE.
 * @internal
 */
class ConsumerObserver<T> implements Observer<T> {
  constructor(private partialObserver: Partial<Observer<T>>) {}

  next(value: T): void {
    const { partialObserver } = this;
    if (partialObserver.next) {
      try {
        partialObserver.next(value);
      } catch (error) {
        handleUnhandledError(error);
      }
    }
  }

  error(err: any): void {
    const { partialObserver } = this;
    if (partialObserver.error) {
      try {
        partialObserver.error(err);
      } catch (error) {
        handleUnhandledError(error);
      }
    } else {
      handleUnhandledError(err);
    }
  }

  complete(): void {
    const { partialObserver } = this;
    if (partialObserver.complete) {
      try {
        partialObserver.complete();
      } catch (error) {
        handleUnhandledError(error);
      }
    }
  }
}

export class SafeSubscriber<T> extends Subscriber<T> {
  constructor(
    observerOrNext?: Partial<Observer<T>> | ((value: T) => void) |
      null,
    error?: ((e?: any) => void) | null,
    complete?: (() => void) | null
  ) {
    super();

    let partialObserver: Partial<Observer<T>>;
    if (isFunction(observerOrNext) || !observerOrNext) {
      // The first argument is a function, not an observer. The next
      // two arguments *could* be observers, or they could be empty.
      partialObserver = {
        next: (observerOrNext ?? undefined) as (((value: T) => void) | undefined),
        error: error ?? undefined,
        complete: complete ?? undefined,
      };
    } else {
      // The first argument is a partial observer.
      let context: any;
      if (this && config.useDeprecatedNextContext) {
        // This is a deprecated path that made `this.unsubscribe()` available in
        // next handler functions passed to subscribe. This only exists behind a flag
        // now, as it is *very* slow.
        context = Object.create(observerOrNext);
        context.unsubscribe = () => this.unsubscribe();
        partialObserver = {
          next: observerOrNext.next && bind(observerOrNext.next, context),
          error: observerOrNext.error && bind(observerOrNext.error, context),
          complete: observerOrNext.complete && bind(observerOrNext.complete, context),
        };
      } else {
        // The "normal" path.
        // Just use the partial observer directly.
        partialObserver = observerOrNext;
      }
    }

    // Wrap the partial observer to ensure it's a full observer, and
    // make sure proper error handling is accounted for.
    this.destination = new ConsumerObserver(partialObserver);
  }
}

function handleUnhandledError(error: any) {
  if (config.useDeprecatedSynchronousErrorHandling) {
    captureError(error);
  } else {
    // Ideal path, we report this as an unhandled error,
    // which is thrown on a new call stack.
    reportUnhandledError(error);
  }
}

/**
 * An error handler used when no error handler was supplied
 * to the SafeSubscriber -- meaning no error handler was supplied
 * to the `subscribe` call on our observable.
 * @param err The error to handle
 */
function defaultErrorHandler(err: any) {
  throw err;
}

/**
 * A handler for notifications that cannot be sent to a stopped subscriber.
 * @param notification The notification being sent
 * @param subscriber The stopped subscriber
 */
function handleStoppedNotification(notification: ObservableNotification<any>, subscriber: Subscriber<any>) {
  const { onStoppedNotification } = config;
  onStoppedNotification && timeoutProvider.setTimeout(() => onStoppedNotification(notification, subscriber));
}

/**
 * The observer used as a stub for subscriptions where the user did not
 * pass any arguments to `subscribe`. Comes with the default error handling
 * behavior.
 */
export const EMPTY_OBSERVER: Readonly<Observer<any>> & { closed: true } = {
  closed: true,
  next: noop,
  error: defaultErrorHandler,
  complete: noop,
};

// --- symbol/observable.ts ---

/**
 * Symbol.observable or a string "@@observable".
 * Used for interop.
 *
 * @deprecated We will no longer be exporting this symbol in upcoming versions of RxJS.
 * Instead polyfill and use Symbol.observable directly *or* use https://www.npmjs.com/package/symbol-observable
 */
export const observable: string | symbol = (() => (typeof Symbol === 'function' && Symbol.observable) || '@@observable')();

// --- util/identity.ts ---

/**
 * This function takes one parameter and just returns it. Simply put,
 * this is like `<T>(x: T): T => x`.
 *
 * ## Examples
 *
 * This is useful in some cases when using things like `mergeMap`:
 *
 * ```ts
 * import { interval, take, map, range, mergeMap, identity } from 'rxjs';
 *
 * const source$ = interval(1000).pipe(take(5));
 *
 * const result$ = source$.pipe(
 *   map(i => range(i)),
 *   mergeMap(identity) // same as mergeMap(x => x)
 * );
 *
 * result$.subscribe({
 *   next: console.log
 * });
 * ```
 *
 * Or when you want to selectively apply an operator:
 *
 * ```ts
 * import { interval, take, identity } from 'rxjs';
 *
 * const shouldLimit = () => Math.random() < 0.5;
 *
 * const source$ = interval(1000);
 *
 * const result$ = source$.pipe(shouldLimit() ?
 *   take(5) : identity);
 *
 * result$.subscribe({
 *   next: console.log
 * });
 * ```
 *
 * @param x Any value that is returned by this function
 * @returns The value passed as the first parameter to this function
 */
export function identity<T>(x: T): T {
  return x;
}

// --- util/pipe.ts ---

import { identity } from './identity';
import { UnaryFunction } from '../types';

export function pipe(): typeof identity;
export function pipe<T, A>(fn1: UnaryFunction<T, A>): UnaryFunction<T, A>;
export function pipe<T, A, B>(fn1: UnaryFunction<T, A>, fn2: UnaryFunction<A, B>): UnaryFunction<T, B>;
export function pipe<T, A, B, C>(fn1: UnaryFunction<T, A>, fn2: UnaryFunction<A, B>, fn3: UnaryFunction<B, C>): UnaryFunction<T, C>;
export function pipe<T, A, B, C, D>(
  fn1: UnaryFunction<T, A>,
  fn2: UnaryFunction<A, B>,
  fn3: UnaryFunction<B, C>,
  fn4: UnaryFunction<C, D>
): UnaryFunction<T, D>;
export function pipe<T, A, B, C, D, E>(
  fn1: UnaryFunction<T, A>,
  fn2: UnaryFunction<A, B>,
  fn3: UnaryFunction<B, C>,
  fn4: UnaryFunction<C, D>,
  fn5: UnaryFunction<D, E>
): UnaryFunction<T, E>;
export function pipe<T, A, B, C, D, E, F>(
  fn1: UnaryFunction<T, A>,
  fn2: UnaryFunction<A, B>,
  fn3: UnaryFunction<B, C>,
  fn4: UnaryFunction<C, D>,
  fn5: UnaryFunction<D, E>,
  fn6: UnaryFunction<E, F>
): UnaryFunction<T, F>;
export function pipe<T, A, B, C, D, E, F, G>(
  fn1: UnaryFunction<T, A>,
  fn2: UnaryFunction<A, B>,
  fn3: UnaryFunction<B, C>,
  fn4: UnaryFunction<C, D>,
  fn5: UnaryFunction<D, E>,
  fn6: UnaryFunction<E, F>,
  fn7: UnaryFunction<F, G>
): UnaryFunction<T, G>;
export function pipe<T, A, B, C, D, E, F, G, H>(
  fn1: UnaryFunction<T, A>,
  fn2: UnaryFunction<A, B>,
  fn3: UnaryFunction<B, C>,
  fn4: UnaryFunction<C, D>,
  fn5: UnaryFunction<D, E>,
  fn6: UnaryFunction<E, F>,
  fn7: UnaryFunction<F, G>,
  fn8: UnaryFunction<G, H>
): UnaryFunction<T, H>;
export function pipe<T, A, B, C, D, E, F, G, H, I>(
  fn1: UnaryFunction<T, A>,
  fn2: UnaryFunction<A, B>,
  fn3: UnaryFunction<B, C>,
  fn4: UnaryFunction<C, D>,
  fn5: UnaryFunction<D, E>,
  fn6: UnaryFunction<E, F>,
  fn7: UnaryFunction<F, G>,
  fn8: UnaryFunction<G, H>,
  fn9: UnaryFunction<H, I>
): UnaryFunction<T, I>;
export function pipe<T, A, B, C, D, E, F, G, H, I>(
  fn1: UnaryFunction<T, A>,
  fn2: UnaryFunction<A, B>,
  fn3: UnaryFunction<B, C>,
  fn4: UnaryFunction<C, D>,
  fn5: UnaryFunction<D, E>,
  fn6: UnaryFunction<E, F>,
  fn7: UnaryFunction<F, G>,
  fn8: UnaryFunction<G, H>,
  fn9: UnaryFunction<H, I>,
  ...fns: UnaryFunction<any, any>[]
):
  UnaryFunction<T, unknown>;

/**
 * pipe() can be called on one or more functions, each of which can take one argument ("UnaryFunction")
 * and uses it to return a value.
 * It returns a function that takes one argument, passes it to the first UnaryFunction, and then
 * passes the result to the next one, passes that result to the next one, and so on.
 */
export function pipe(...fns: Array<UnaryFunction<any, any>>): UnaryFunction<any, any> {
  return pipeFromArray(fns);
}

/** @internal */
export function pipeFromArray<T, R>(fns: Array<UnaryFunction<T, R>>): UnaryFunction<T, R> {
  if (fns.length === 0) {
    return identity as UnaryFunction<any, any>;
  }

  if (fns.length === 1) {
    return fns[0];
  }

  return function piped(input: T): R {
    return fns.reduce((prev: any, fn: UnaryFunction<any, any>) => fn(prev), input as any);
  };
}

// --- Observable.ts ---

import { Operator } from './Operator';
import { SafeSubscriber, Subscriber } from './Subscriber';
import { isSubscription, Subscription } from './Subscription';
import { TeardownLogic, OperatorFunction, Subscribable, Observer } from './types';
import { observable as Symbol_observable } from './symbol/observable';
import { pipeFromArray } from './util/pipe';
import { config } from './config';
import { isFunction } from './util/isFunction';
import { errorContext } from './util/errorContext';

/**
 * A representation of any set of values over any amount of time. This is the most basic building block
 * of RxJS.
 *
 * @class Observable<T>
 */
export class Observable<T> implements Subscribable<T> {
  /**
   * @deprecated Internal implementation detail, do not use directly. Will be made internal in v8.
   */
  source: Observable<any> | undefined;

  /**
   * @deprecated Internal implementation detail, do not use directly. Will be made internal in v8.
   */
  operator: Operator<any, T> | undefined;

  /**
   * @constructor
   * @param {Function} subscribe the function that is called when the Observable is
   * initially subscribed to.
   * This function is given a Subscriber, to which new values
   * can be `next`ed, or an `error` method can be called to raise an error, or
   * `complete` can be called to notify of a successful completion.
   */
  constructor(subscribe?: (this: Observable<T>, subscriber: Subscriber<T>) => TeardownLogic) {
    if (subscribe) {
      this._subscribe = subscribe;
    }
  }

  // HACK: Since TypeScript inherits static properties too, we have to
  // fight against TypeScript here so Subject can have a different static create signature.
  /**
   * Creates a new Observable by calling the Observable constructor.
   * @owner Observable
   * @method create
   * @param {Function} subscribe? the subscriber function to be passed to the Observable constructor
   * @return {Observable} a new observable
   * @nocollapse
   * @deprecated Use `new Observable()` instead. Will be removed in v8.
   */
  static create: (...args: any[]) => any = <T>(subscribe?: (subscriber: Subscriber<T>) => TeardownLogic) => {
    return new Observable<T>(subscribe);
  };

  /**
   * Creates a new Observable, with this Observable instance as the source, and the passed
   * operator defined as the new observable's operator.
   * @method lift
   * @param operator the operator defining the operation to take on the observable
   * @return a new observable with the Operator applied
   * @deprecated Internal implementation detail, do not use directly. Will be made internal in v8.
   * If you have implemented an operator using `lift`, it is recommended that you create an
   * operator by simply returning `new Observable()` directly.
   * See the "Creating new operators from
   * scratch" section here: https://rxjs.dev/guide/operators
   */
  lift<R>(operator?: Operator<T, R>): Observable<R> {
    const observable = new Observable<R>();
    observable.source = this;
    observable.operator = operator;
    return observable;
  }

  subscribe(observerOrNext?: Partial<Observer<T>> | ((value: T) => void)): Subscription;
  /** @deprecated Instead of passing separate callback arguments, use an observer argument. Signatures taking separate callback arguments will be removed in v8. Details: https://rxjs.dev/deprecations/subscribe-arguments */
  subscribe(next?: ((value: T) => void) | null, error?: ((error: any) => void) | null, complete?: (() => void) | null): Subscription;
  /**
   * Invokes an execution of an Observable and registers Observer handlers for notifications it will emit.
   *
   * Use it when you have all these Observables, but still nothing is happening.
   *
   * `subscribe` is not a regular operator, but a method that calls Observable's internal `subscribe` function. It
   * might be for example a function that you passed to Observable's constructor, but most of the time it is
   * a library implementation, which defines what will be emitted by an Observable, and when it will be emitted. This means
   * that calling `subscribe` is actually the moment when Observable starts its work, not when it is created, as is
   * often thought.
   *
   * Apart from starting the execution of an Observable, this method allows you to listen for values
   * that an Observable emits, as well as for when it completes or errors. You can achieve this in the
   * following two ways.
   *
   * The first way is creating an object that implements the {@link Observer} interface. It should have methods
   * defined by that interface, but note that it should be just a regular JavaScript object, which you can create
   * yourself in any way you want (ES6 class, classic function constructor, object literal, etc.).
   * In particular, do
   * not attempt to use any RxJS implementation details to create Observers - you don't need them. Remember also
   * that your object does not have to implement all methods. If you find yourself creating a method that doesn't
   * do anything, you can simply omit it. Note, however, that if the `error` method is not provided and an error happens,
   * it will be thrown asynchronously. Errors thrown asynchronously cannot be caught using `try`/`catch`. Instead,
   * use the {@link onUnhandledError} configuration option or use a runtime handler (like `window.onerror` or
   * `process.on('error')`) to be notified of unhandled errors. Because of this, it's recommended that you provide
   * an `error` method to avoid missing thrown errors.
   *
   * The second way is to give up on the Observer object altogether and simply provide callback functions in place of its methods.
   * This means you can provide three functions as arguments to `subscribe`, where the first function is the equivalent
   * of a `next` method, the second of an `error` method, and the third of a `complete` method. Just as in the case of an Observer,
   * if you do not need to listen for something, you can omit a function by passing `undefined` or `null`,
   * since `subscribe` recognizes these functions by where they were placed in the function call. When it comes
   * to the `error` function, as with an Observer, if not provided, errors emitted by an Observable will be thrown asynchronously.
   *
   * You can, however, subscribe with no parameters at all. This may be the case where you're not interested in terminal events
   * and you also handled emissions internally by using operators (e.g. using `tap`).
   *
   * Whichever style of calling `subscribe` you use, in both cases it returns a Subscription object.
   * This object allows you to call `unsubscribe` on it, which in turn will stop the work that an Observable does and will clean
   * up all resources that an Observable used.
   * Note that cancelling a subscription will not call the `complete` callback
   * provided to the `subscribe` function, which is reserved for a regular completion signal that comes from an Observable.
   *
   * Remember that callbacks provided to `subscribe` are not guaranteed to be called asynchronously.
   * It is an Observable itself that decides when these functions will be called. For example, {@link of}
   * by default emits all its values synchronously. Always check the documentation for how a given Observable
   * will behave when subscribed and if its default behavior can be modified with a `scheduler`.
   *
   * #### Examples
   *
   * Subscribe with an {@link guide/observer Observer}
   *
   * ```ts
   * import { of } from 'rxjs';
   *
   * const sumObserver = {
   *   sum: 0,
   *   next(value) {
   *     console.log('Adding: ' + value);
   *     this.sum = this.sum + value;
   *   },
   *   error() {
   *     // We actually could just remove this method,
   *     // since we do not really care about errors right now.
   *   },
   *   complete() {
   *     console.log('Sum equals: ' + this.sum);
   *   }
   * };
   *
   * of(1, 2, 3) // Synchronously emits 1, 2, 3 and then completes.
   *   .subscribe(sumObserver);
   *
   * // Logs:
   * // 'Adding: 1'
   * // 'Adding: 2'
   * // 'Adding: 3'
   * // 'Sum equals: 6'
   * ```
   *
   * Subscribe with functions ({@link deprecations/subscribe-arguments deprecated})
   *
   * ```ts
   * import { of } from 'rxjs';
   *
   * let sum = 0;
   *
   * of(1, 2, 3).subscribe(
   *   value => {
   *     console.log('Adding: ' + value);
   *     sum = sum + value;
   *   },
   *   undefined,
   *   () => console.log('Sum equals: ' + sum)
   * );
   *
   * // Logs:
   * // 'Adding: 1'
   * // 'Adding: 2'
   * // 'Adding: 3'
   * // 'Sum equals: 6'
   * ```
   *
   * Cancel a subscription
   *
   * ```ts
   * import { interval } from 'rxjs';
   *
   * const subscription = interval(1000).subscribe({
   *   next(num) {
   *     console.log(num)
   *   },
   *   complete() {
   *     // Will not be called, even when cancelling subscription.
   *     console.log('completed!');
   *   }
   * });
   *
   * setTimeout(() => {
   *   subscription.unsubscribe();
   *   console.log('unsubscribed!');
   * }, 2500);
   *
   * // Logs:
   * // 0 after 1s
   * // 1 after 2s
   * // 'unsubscribed!' after 2.5s
   * ```
   *
   * @param {Observer|Function} observerOrNext (optional) Either an observer with methods to be called,
   * or the first of three possible handlers, which is the handler for each value emitted from the subscribed
   * Observable.
   * @param {Function} error (optional) A handler for a terminal event resulting from an error. If no error handler is provided,
   * the error will be thrown asynchronously as unhandled.
   * @param {Function} complete (optional) A handler for a terminal event resulting from successful completion.
   * @return {Subscription} a subscription reference to the registered handlers
   * @method subscribe
   */
  subscribe(
    observerOrNext?: Partial<Observer<T>> | ((value: T) => void) | null,
    error?: ((error: any) => void) | null,
    complete?: (() => void) | null
  ): Subscription {
    const subscriber = isSubscriber(observerOrNext) ? observerOrNext : new SafeSubscriber(observerOrNext, error, complete);

    errorContext(() => {
      const { operator, source } = this;
      subscriber.add(
        operator
          ? // We're dealing with a subscription in the
            // operator chain to one of our lifted operators.
            operator.call(subscriber, source)
          : source
          ? // If `source` has a value, but `operator` does not, something that
            // had intimate knowledge of our API, like our `Subject`, must have
            // set it.
We're going to just call `_subscribe` directly.\n this._subscribe(subscriber)\n : // In all other cases, we're likely wrapping a user-provided initializer\n // function, so we need to catch errors and handle them appropriately.\n this._trySubscribe(subscriber)\n );\n });\n\n return subscriber;\n }\n\n /** @internal */\n protected _trySubscribe(sink: Subscriber<T>): TeardownLogic {\n try {\n return this._subscribe(sink);\n } catch (err) {\n // We don't need to return anything in this case,\n // because it's just going to try to `add()` to a subscription\n // above.\n sink.error(err);\n }\n }\n\n /**\n * Used as a NON-CANCELLABLE means of subscribing to an observable, for use with\n * APIs that expect promises, like `async/await`. You cannot unsubscribe from this.\n *\n * **WARNING**: Only use this with observables you *know* will complete. If the source\n * observable does not complete, you will end up with a promise that is hung up, and\n * potentially all of the state of an async function hanging out in memory. 
To avoid\n * this situation, look into adding something like {@link timeout}, {@link take},\n * {@link takeWhile}, or {@link takeUntil} amongst others.\n *\n * #### Example\n *\n * ```ts\n * import { interval, take } from 'rxjs';\n *\n * const source$ = interval(1000).pipe(take(4));\n *\n * async function getTotal() {\n * let total = 0;\n *\n * await source$.forEach(value => {\n * total += value;\n * console.log('observable -> ' + value);\n * });\n *\n * return total;\n * }\n *\n * getTotal().then(\n * total => console.log('Total: ' + total)\n * );\n *\n * // Expected:\n * // 'observable -> 0'\n * // 'observable -> 1'\n * // 'observable -> 2'\n * // 'observable -> 3'\n * // 'Total: 6'\n * ```\n *\n * @param next a handler for each value emitted by the observable\n * @return a promise that either resolves on observable completion or\n * rejects with the handled error\n */\n forEach(next: (value: T) => void): Promise<void>;\n\n /**\n * @param next a handler for each value emitted by the observable\n * @param promiseCtor a constructor function used to instantiate the Promise\n * @return a promise that either resolves on observable completion or\n * rejects with the handled error\n * @deprecated Passing a Promise constructor will no longer be available\n * in upcoming versions of RxJS. This is because it adds weight to the library, for very\n * little benefit. If you need this functionality, it is recommended that you either\n * polyfill Promise, or you create an adapter to convert the returned native promise\n * to whatever promise implementation you wanted. 
Will be removed in v8.\n */\n forEach(next: (value: T) => void, promiseCtor: PromiseConstructorLike): Promise<void>;\n\n forEach(next: (value: T) => void, promiseCtor?: PromiseConstructorLike): Promise<void> {\n promiseCtor = getPromiseCtor(promiseCtor);\n\n return new promiseCtor<void>((resolve, reject) => {\n const subscriber = new SafeSubscriber<T>({\n next: (value) => {\n try {\n next(value);\n } catch (err) {\n reject(err);\n subscriber.unsubscribe();\n }\n },\n error: reject,\n complete: resolve,\n });\n this.subscribe(subscriber);\n }) as Promise<void>;\n }\n\n /** @internal */\n protected _subscribe(subscriber: Subscriber<T>): TeardownLogic {\n return this.source?.subscribe(subscriber);\n }\n\n /**\n * An interop point defined by the es7-observable spec https://github.com/zenparsing/es-observable\n * @method Symbol.observable\n * @return {Observable} this instance of the observable\n */\n [Symbol_observable]() {\n return this;\n }\n\n /* tslint:disable:max-line-length */\n pipe(): Observable<T>;\n pipe<A>(op1: OperatorFunction<T, A>): Observable<A>;\n pipe<A, B>(op1: OperatorFunction<T, A>, op2: OperatorFunction<A, B>): Observable<B>;\n pipe<A, B, C>(op1: OperatorFunction<T, A>, op2: OperatorFunction<A, B>, op3: OperatorFunction<B, C>): Observable<C>;\n pipe<A, B, C, D>(\n op1: OperatorFunction<T, A>,\n op2: OperatorFunction<A, B>,\n op3: OperatorFunction<B, C>,\n op4: OperatorFunction<C, D>\n ): Observable<D>;\n pipe<A, B, C, D, E>(\n op1: OperatorFunction<T, A>,\n op2: OperatorFunction<A, B>,\n op3: OperatorFunction<B, C>,\n op4: OperatorFunction<C, D>,\n op5: OperatorFunction<D, E>\n ): Observable<E>;\n pipe<A, B, C, D, E, F>(\n op1: OperatorFunction<T, A>,\n op2: OperatorFunction<A, B>,\n op3: OperatorFunction<B, C>,\n op4: OperatorFunction<C, D>,\n op5: OperatorFunction<D, E>,\n op6: OperatorFunction<E, F>\n ): Observable<F>;\n pipe<A, B, C, D, E, F, G>(\n op1: OperatorFunction<T, A>,\n op2: OperatorFunction<A, B>,\n op3: OperatorFunction<B, C>,\n op4: OperatorFunction<C, D>,\n op5: OperatorFunction<D, E>,\n op6: OperatorFunction<E, F>,\n op7: OperatorFunction<F, G>\n ): Observable<G>;\n pipe<A, B, C, D, E, F, G, H>(\n op1: OperatorFunction<T, A>,\n op2: OperatorFunction<A, B>,\n op3: OperatorFunction<B, C>,\n op4: OperatorFunction<C, D>,\n op5: OperatorFunction<D, E>,\n op6: OperatorFunction<E, F>,\n op7: OperatorFunction<F, G>,\n op8: OperatorFunction<G, H>\n ): Observable<H>;\n pipe<A, B, C, D, E, F, G, H, I>(\n op1: OperatorFunction<T, A>,\n op2: OperatorFunction<A, B>,\n op3: OperatorFunction<B, C>,\n op4: OperatorFunction<C, D>,\n op5: OperatorFunction<D, E>,\n op6: OperatorFunction<E, F>,\n op7: OperatorFunction<F, G>,\n op8: OperatorFunction<G, H>,\n op9: OperatorFunction<H, I>\n ): Observable<I>;\n pipe<A, B, C, D, E, F, G, H, I>(\n op1: OperatorFunction<T, A>,\n op2: OperatorFunction<A, B>,\n op3: OperatorFunction<B, C>,\n op4: OperatorFunction<C, D>,\n op5: OperatorFunction<D, E>,\n op6: OperatorFunction<E, F>,\n op7: OperatorFunction<F, G>,\n op8: OperatorFunction<G, H>,\n op9: OperatorFunction<H, I>,\n ...operations: OperatorFunction<any, any>[]\n ): Observable<unknown>;\n /* tslint:enable:max-line-length */\n\n /**\n * Used to stitch together functional operators into a chain.\n * @method pipe\n * @return {Observable} the Observable result of all of the operators having\n * been called in the order they were passed in.\n *\n * ## Example\n *\n * ```ts\n * import { interval, filter, map, scan } from 'rxjs';\n *\n * interval(1000)\n * .pipe(\n * filter(x => x % 2 === 0),\n * map(x => x + x),\n * scan((acc, x) => acc + x)\n * )\n * .subscribe(x => console.log(x));\n * ```\n */\n pipe(...operations: OperatorFunction<any, any>[]): Observable<any> {\n return pipeFromArray(operations)(this);\n }\n\n /* tslint:disable:max-line-length */\n /** @deprecated Replaced with {@link firstValueFrom} and {@link lastValueFrom}. Will be removed in v8. Details: https://rxjs.dev/deprecations/to-promise */\n toPromise(): Promise<T | undefined>;\n /** @deprecated Replaced with {@link firstValueFrom} and {@link lastValueFrom}. Will be removed in v8. Details: https://rxjs.dev/deprecations/to-promise */\n toPromise(PromiseCtor: typeof Promise): Promise<T | undefined>;\n /** @deprecated Replaced with {@link firstValueFrom} and {@link lastValueFrom}. Will be removed in v8. 
Details: https://rxjs.dev/deprecations/to-promise */\n toPromise(PromiseCtor: PromiseConstructorLike): Promise<T | undefined>;\n /* tslint:enable:max-line-length */\n\n /**\n * Subscribe to this Observable and get a Promise resolving on\n * `complete` with the last emission (if any).\n *\n * **WARNING**: Only use this with observables you *know* will complete. If the source\n * observable does not complete, you will end up with a promise that is hung up, and\n * potentially all of the state of an async function hanging out in memory. To avoid\n * this situation, look into adding something like {@link timeout}, {@link take},\n * {@link takeWhile}, or {@link takeUntil} amongst others.\n *\n * @method toPromise\n * @param [promiseCtor] a constructor function used to instantiate\n * the Promise\n * @return A Promise that resolves with the last value emitted, or\n * rejects on an error. If there were no emissions, the Promise\n * resolves with undefined.\n * @deprecated Replaced with {@link firstValueFrom} and {@link lastValueFrom}. Will be removed in v8. Details: https://rxjs.dev/deprecations/to-promise\n */\n toPromise(promiseCtor?: PromiseConstructorLike): Promise<T | undefined> {\n promiseCtor = getPromiseCtor(promiseCtor);\n\n return new promiseCtor((resolve, reject) => {\n let value: T | undefined;\n this.subscribe(\n (x: T) => (value = x),\n (err: any) => reject(err),\n () => resolve(value)\n );\n }) as Promise<T | undefined>;\n }\n}\n\n/**\n * Decides between a passed promise constructor from consuming code,\n * a default configured promise constructor, and the native promise\n * constructor, and returns it. If nothing can be found, it will throw\n * an error.\n * @param promiseCtor The optional promise constructor passed in by consuming code\n */\nfunction getPromiseCtor(promiseCtor: PromiseConstructorLike | undefined) {\n return promiseCtor ?? config.Promise ?? 
Promise;\n}\n\nfunction isObserver<T>(value: any): value is Observer<T> {\n return value && isFunction(value.next) && isFunction(value.error) && isFunction(value.complete);\n}\n\nfunction isSubscriber<T>(value: any): value is Subscriber<T> {\n return (value && value instanceof Subscriber) || (isObserver(value) && isSubscription(value));\n}\n", "import { Observable } from '../Observable';\nimport { Subscriber } from '../Subscriber';\nimport { OperatorFunction } from '../types';\nimport { isFunction } from './isFunction';\n\n/**\n * Used to determine if an object is an Observable with a lift function.\n */\nexport function hasLift(source: any): source is { lift: InstanceType<typeof Observable>['lift'] } {\n return isFunction(source?.lift);\n}\n\n/**\n * Creates an `OperatorFunction`. Used to define operators throughout the library in a concise way.\n * @param init The logic to connect the liftedSource to the subscriber at the moment of subscription.\n */\nexport function operate<T, R>(\n init: (liftedSource: Observable<T>, subscriber: Subscriber<R>) => (() => void) | void\n): OperatorFunction<T, R> {\n return (source: Observable<T>) => {\n if (hasLift(source)) {\n return source.lift(function (this: Subscriber<R>, liftedSource: Observable<T>) {\n try {\n return init(liftedSource, this);\n } catch (err) {\n this.error(err);\n }\n });\n }\n throw new TypeError('Unable to lift unknown Observable type');\n };\n}\n", "import { Subscriber } from '../Subscriber';\n\n/**\n * Creates an instance of an `OperatorSubscriber`.\n * @param destination The downstream subscriber.\n * @param onNext Handles next values, only called if this subscriber is not stopped or closed. Any\n * error that occurs in this function is caught and sent to the `error` method of this subscriber.\n * @param onError Handles errors from the subscription; any errors that occur in this handler are caught\n * and sent to the `destination` error handler.\n * @param onComplete Handles completion notification from the subscription. 
Any errors that occur in\n * this handler are sent to the `destination` error handler.\n * @param onFinalize Additional teardown logic here. This will only be called on teardown if the\n * subscriber itself is not already closed. This is called after all other teardown logic is executed.\n */\nexport function createOperatorSubscriber<T>(\n destination: Subscriber<any>,\n onNext?: (value: T) => void,\n onComplete?: () => void,\n onError?: (err: any) => void,\n onFinalize?: () => void\n): Subscriber<T> {\n return new OperatorSubscriber(destination, onNext, onComplete, onError, onFinalize);\n}\n\n/**\n * A generic helper for allowing operators to be created with a Subscriber and\n * use closures to capture necessary state from the operator function itself.\n */\nexport class OperatorSubscriber<T> extends Subscriber<T> {\n /**\n * Creates an instance of an `OperatorSubscriber`.\n * @param destination The downstream subscriber.\n * @param onNext Handles next values, only called if this subscriber is not stopped or closed. Any\n * error that occurs in this function is caught and sent to the `error` method of this subscriber.\n * @param onError Handles errors from the subscription; any errors that occur in this handler are caught\n * and sent to the `destination` error handler.\n * @param onComplete Handles completion notification from the subscription. Any errors that occur in\n * this handler are sent to the `destination` error handler.\n * @param onFinalize Additional finalization logic here. This will only be called on finalization if the\n * subscriber itself is not already closed. 
This is called after all other finalization logic is executed.\n * @param shouldUnsubscribe An optional check to see if an unsubscribe call should truly unsubscribe.\n * NOTE: This currently **ONLY** exists to support the strange behavior of {@link groupBy}, where unsubscription\n * to the resulting observable does not actually disconnect from the source if there are active subscriptions\n * to any grouped observable. (DO NOT EXPOSE OR USE EXTERNALLY!!!)\n */\n constructor(\n destination: Subscriber,\n onNext?: (value: T) => void,\n onComplete?: () => void,\n onError?: (err: any) => void,\n private onFinalize?: () => void,\n private shouldUnsubscribe?: () => boolean\n ) {\n // It's important - for performance reasons - that all of this class's\n // members are initialized and that they are always initialized in the same\n // order. This will ensure that all OperatorSubscriber instances have the\n // same hidden class in V8. This, in turn, will help keep the number of\n // hidden classes involved in property accesses within the base class as\n // low as possible. If the number of hidden classes involved exceeds four,\n // the property accesses will become megamorphic and performance penalties\n // will be incurred - i.e. inline caches won't be used.\n //\n // The reasons for ensuring all instances have the same hidden class are\n // further discussed in this blog post from Benedikt Meurer:\n // https://benediktmeurer.de/2018/03/23/impact-of-polymorphism-on-component-based-frameworks-like-react/\n super(destination);\n this._next = onNext\n ? function (this: OperatorSubscriber, value: T) {\n try {\n onNext(value);\n } catch (err) {\n destination.error(err);\n }\n }\n : super._next;\n this._error = onError\n ? 
function (this: OperatorSubscriber, err: any) {\n try {\n onError(err);\n } catch (err) {\n // Send any errors that occur down stream.\n destination.error(err);\n } finally {\n // Ensure finalization.\n this.unsubscribe();\n }\n }\n : super._error;\n this._complete = onComplete\n ? function (this: OperatorSubscriber) {\n try {\n onComplete();\n } catch (err) {\n // Send any errors that occur down stream.\n destination.error(err);\n } finally {\n // Ensure finalization.\n this.unsubscribe();\n }\n }\n : super._complete;\n }\n\n unsubscribe() {\n if (!this.shouldUnsubscribe || this.shouldUnsubscribe()) {\n const { closed } = this;\n super.unsubscribe();\n // Execute additional teardown if we have any and we didn't already do so.\n !closed && this.onFinalize?.();\n }\n }\n}\n", "import { Subscription } from '../Subscription';\n\ninterface AnimationFrameProvider {\n schedule(callback: FrameRequestCallback): Subscription;\n requestAnimationFrame: typeof requestAnimationFrame;\n cancelAnimationFrame: typeof cancelAnimationFrame;\n delegate:\n | {\n requestAnimationFrame: typeof requestAnimationFrame;\n cancelAnimationFrame: typeof cancelAnimationFrame;\n }\n | undefined;\n}\n\nexport const animationFrameProvider: AnimationFrameProvider = {\n // When accessing the delegate, use the variable rather than `this` so that\n // the functions can be called without being bound to the provider.\n schedule(callback) {\n let request = requestAnimationFrame;\n let cancel: typeof cancelAnimationFrame | undefined = cancelAnimationFrame;\n const { delegate } = animationFrameProvider;\n if (delegate) {\n request = delegate.requestAnimationFrame;\n cancel = delegate.cancelAnimationFrame;\n }\n const handle = request((timestamp) => {\n // Clear the cancel function. 
The request has been fulfilled, so\n // attempting to cancel the request upon unsubscription would be\n // pointless.\n cancel = undefined;\n callback(timestamp);\n });\n return new Subscription(() => cancel?.(handle));\n },\n requestAnimationFrame(...args) {\n const { delegate } = animationFrameProvider;\n return (delegate?.requestAnimationFrame || requestAnimationFrame)(...args);\n },\n cancelAnimationFrame(...args) {\n const { delegate } = animationFrameProvider;\n return (delegate?.cancelAnimationFrame || cancelAnimationFrame)(...args);\n },\n delegate: undefined,\n};\n", "import { createErrorClass } from './createErrorClass';\n\nexport interface ObjectUnsubscribedError extends Error {}\n\nexport interface ObjectUnsubscribedErrorCtor {\n /**\n * @deprecated Internal implementation detail. Do not construct error instances.\n * Cannot be tagged as internal: https://github.com/ReactiveX/rxjs/issues/6269\n */\n new (): ObjectUnsubscribedError;\n}\n\n/**\n * An error thrown when an action is invalid because the object has been\n * unsubscribed.\n *\n * @see {@link Subject}\n * @see {@link BehaviorSubject}\n *\n * @class ObjectUnsubscribedError\n */\nexport const ObjectUnsubscribedError: ObjectUnsubscribedErrorCtor = createErrorClass(\n (_super) =>\n function ObjectUnsubscribedErrorImpl(this: any) {\n _super(this);\n this.name = 'ObjectUnsubscribedError';\n this.message = 'object unsubscribed';\n }\n);\n", "import { Operator } from './Operator';\nimport { Observable } from './Observable';\nimport { Subscriber } from './Subscriber';\nimport { Subscription, EMPTY_SUBSCRIPTION } from './Subscription';\nimport { Observer, SubscriptionLike, TeardownLogic } from './types';\nimport { ObjectUnsubscribedError } from './util/ObjectUnsubscribedError';\nimport { arrRemove } from './util/arrRemove';\nimport { errorContext } from './util/errorContext';\n\n/**\n * A Subject is a special type of Observable that allows values to be\n * multicasted to many Observers. 
Subjects are like EventEmitters.\n *\n * Every Subject is an Observable and an Observer. You can subscribe to a\n * Subject, and you can call `next` to feed values, as well as `error` and `complete`.\n */\nexport class Subject<T> extends Observable<T> implements SubscriptionLike {\n closed = false;\n\n private currentObservers: Observer<T>[] | null = null;\n\n /** @deprecated Internal implementation detail, do not use directly. Will be made internal in v8. */\n observers: Observer<T>[] = [];\n /** @deprecated Internal implementation detail, do not use directly. Will be made internal in v8. */\n isStopped = false;\n /** @deprecated Internal implementation detail, do not use directly. Will be made internal in v8. */\n hasError = false;\n /** @deprecated Internal implementation detail, do not use directly. Will be made internal in v8. */\n thrownError: any = null;\n\n /**\n * Creates a \"subject\" by basically gluing an observer to an observable.\n *\n * @nocollapse\n * @deprecated Recommended you do not use. Will be removed at some point in the future. Plans for replacement still under discussion.\n */\n static create: (...args: any[]) => any = <T>(destination: Observer<T>, source: Observable<T>): AnonymousSubject<T> => {\n return new AnonymousSubject<T>(destination, source);\n };\n\n constructor() {\n // NOTE: This must be here to obscure Observable's constructor.\n super();\n }\n\n /** @deprecated Internal implementation detail, do not use directly. Will be made internal in v8. 
*/\n lift<R>(operator: Operator<T, R>): Observable<R> {\n const subject = new AnonymousSubject(this, this);\n subject.operator = operator as any;\n return subject as any;\n }\n\n /** @internal */\n protected _throwIfClosed() {\n if (this.closed) {\n throw new ObjectUnsubscribedError();\n }\n }\n\n next(value: T) {\n errorContext(() => {\n this._throwIfClosed();\n if (!this.isStopped) {\n if (!this.currentObservers) {\n this.currentObservers = Array.from(this.observers);\n }\n for (const observer of this.currentObservers) {\n observer.next(value);\n }\n }\n });\n }\n\n error(err: any) {\n errorContext(() => {\n this._throwIfClosed();\n if (!this.isStopped) {\n this.hasError = this.isStopped = true;\n this.thrownError = err;\n const { observers } = this;\n while (observers.length) {\n observers.shift()!.error(err);\n }\n }\n });\n }\n\n complete() {\n errorContext(() => {\n this._throwIfClosed();\n if (!this.isStopped) {\n this.isStopped = true;\n const { observers } = this;\n while (observers.length) {\n observers.shift()!.complete();\n }\n }\n });\n }\n\n unsubscribe() {\n this.isStopped = this.closed = true;\n this.observers = this.currentObservers = null!;\n }\n\n get observed() {\n return this.observers?.length > 0;\n }\n\n /** @internal */\n protected _trySubscribe(subscriber: Subscriber<T>): TeardownLogic {\n this._throwIfClosed();\n return super._trySubscribe(subscriber);\n }\n\n /** @internal */\n protected _subscribe(subscriber: Subscriber<T>): Subscription {\n this._throwIfClosed();\n this._checkFinalizedStatuses(subscriber);\n return this._innerSubscribe(subscriber);\n }\n\n /** @internal */\n protected _innerSubscribe(subscriber: Subscriber<T>) {\n const { hasError, isStopped, observers } = this;\n if (hasError || isStopped) {\n return EMPTY_SUBSCRIPTION;\n }\n this.currentObservers = null;\n observers.push(subscriber);\n return new Subscription(() => {\n this.currentObservers = null;\n arrRemove(observers, subscriber);\n });\n }\n\n /** @internal */\n protected 
_checkFinalizedStatuses(subscriber: Subscriber<T>) {\n const { hasError, thrownError, isStopped } = this;\n if (hasError) {\n subscriber.error(thrownError);\n } else if (isStopped) {\n subscriber.complete();\n }\n }\n\n /**\n * Creates a new Observable with this Subject as the source. You can do this\n * to create custom Observer-side logic of the Subject and conceal it from\n * code that uses the Observable.\n * @return {Observable} Observable that the Subject casts to\n */\n asObservable(): Observable<T> {\n const observable: any = new Observable<T>();\n observable.source = this;\n return observable;\n }\n}\n\n/**\n * @class AnonymousSubject\n */\nexport class AnonymousSubject<T> extends Subject<T> {\n constructor(\n /** @deprecated Internal implementation detail, do not use directly. Will be made internal in v8. */\n public destination?: Observer<T>,\n source?: Observable<T>\n ) {\n super();\n this.source = source;\n }\n\n next(value: T) {\n this.destination?.next?.(value);\n }\n\n error(err: any) {\n this.destination?.error?.(err);\n }\n\n complete() {\n this.destination?.complete?.();\n }\n\n /** @internal */\n protected _subscribe(subscriber: Subscriber<T>): Subscription {\n return this.source?.subscribe(subscriber) ?? 
EMPTY_SUBSCRIPTION;\n }\n}\n", "import { TimestampProvider } from '../types';\n\ninterface DateTimestampProvider extends TimestampProvider {\n delegate: TimestampProvider | undefined;\n}\n\nexport const dateTimestampProvider: DateTimestampProvider = {\n now() {\n // Use the variable rather than `this` so that the function can be called\n // without being bound to the provider.\n return (dateTimestampProvider.delegate || Date).now();\n },\n delegate: undefined,\n};\n", "import { Subject } from './Subject';\nimport { TimestampProvider } from './types';\nimport { Subscriber } from './Subscriber';\nimport { Subscription } from './Subscription';\nimport { dateTimestampProvider } from './scheduler/dateTimestampProvider';\n\n/**\n * A variant of {@link Subject} that \"replays\" old values to new subscribers by emitting them when they first subscribe.\n *\n * `ReplaySubject` has an internal buffer that will store a specified number of values that it has observed. Like `Subject`,\n * `ReplaySubject` \"observes\" values by having them passed to its `next` method. When it observes a value, it will store that\n * value for a time determined by the configuration of the `ReplaySubject`, as passed to its constructor.\n *\n * When a new subscriber subscribes to the `ReplaySubject` instance, it will synchronously emit all values in its buffer in\n * a First-In-First-Out (FIFO) manner. The `ReplaySubject` will also complete, if it has observed completion; and it will\n * error if it has observed an error.\n *\n * There are two main configuration items to be concerned with:\n *\n * 1. `bufferSize` - This will determine how many items are stored in the buffer, defaults to infinite.\n * 2. `windowTime` - The amount of time to hold a value in the buffer before removing it from the buffer.\n *\n * Both configurations may exist simultaneously. 
So if you would like to buffer a maximum of 3 values, as long as the values\n * are less than 2 seconds old, you could do so with a `new ReplaySubject(3, 2000)`.\n *\n * ### Differences with BehaviorSubject\n *\n * `BehaviorSubject` is similar to `new ReplaySubject(1)`, with a couple of exceptions:\n *\n * 1. `BehaviorSubject` comes \"primed\" with a single value upon construction.\n * 2. `ReplaySubject` will replay values, even after observing an error, where `BehaviorSubject` will not.\n *\n * @see {@link Subject}\n * @see {@link BehaviorSubject}\n * @see {@link shareReplay}\n */\nexport class ReplaySubject<T> extends Subject<T> {\n private _buffer: (T | number)[] = [];\n private _infiniteTimeWindow = true;\n\n /**\n * @param bufferSize The size of the buffer to replay on subscription\n * @param windowTime The amount of time the buffered items will stay buffered\n * @param timestampProvider An object with a `now()` method that provides the current timestamp. This is used to\n * calculate the amount of time something has been buffered.\n */\n constructor(\n private _bufferSize = Infinity,\n private _windowTime = Infinity,\n private _timestampProvider: TimestampProvider = dateTimestampProvider\n ) {\n super();\n this._infiniteTimeWindow = _windowTime === Infinity;\n this._bufferSize = Math.max(1, _bufferSize);\n this._windowTime = Math.max(1, _windowTime);\n }\n\n next(value: T): void {\n const { isStopped, _buffer, _infiniteTimeWindow, _timestampProvider, _windowTime } = this;\n if (!isStopped) {\n _buffer.push(value);\n !_infiniteTimeWindow && _buffer.push(_timestampProvider.now() + _windowTime);\n }\n this._trimBuffer();\n super.next(value);\n }\n\n /** @internal */\n protected _subscribe(subscriber: Subscriber<T>): Subscription {\n this._throwIfClosed();\n this._trimBuffer();\n\n const subscription = this._innerSubscribe(subscriber);\n\n const { _infiniteTimeWindow, _buffer } = this;\n // We use a copy here, so reentrant code does not mutate our array while we're\n // 
emitting it to a new subscriber.\n const copy = _buffer.slice();\n for (let i = 0; i < copy.length && !subscriber.closed; i += _infiniteTimeWindow ? 1 : 2) {\n subscriber.next(copy[i] as T);\n }\n\n this._checkFinalizedStatuses(subscriber);\n\n return subscription;\n }\n\n private _trimBuffer() {\n const { _bufferSize, _timestampProvider, _buffer, _infiniteTimeWindow } = this;\n // If we don't have an infinite buffer size, and we're over the length,\n // use splice to truncate the old buffer values off. Note that we have to\n // double the size for instances where we're not using an infinite time window\n // because we're storing the values and the timestamps in the same array.\n const adjustedBufferSize = (_infiniteTimeWindow ? 1 : 2) * _bufferSize;\n _bufferSize < Infinity && adjustedBufferSize < _buffer.length && _buffer.splice(0, _buffer.length - adjustedBufferSize);\n\n // Now, if we're not in an infinite time window, remove all values where the time is\n // older than what is allowed.\n if (!_infiniteTimeWindow) {\n const now = _timestampProvider.now();\n let last = 0;\n // Search the array for the first timestamp that isn't expired and\n // truncate the buffer up to that point.\n for (let i = 1; i < _buffer.length && (_buffer[i] as number) <= now; i += 2) {\n last = i;\n }\n last && _buffer.splice(0, last + 1);\n }\n }\n}\n", "import { Scheduler } from '../Scheduler';\nimport { Subscription } from '../Subscription';\nimport { SchedulerAction } from '../types';\n\n/**\n * A unit of work to be executed in a `scheduler`. 
An action is typically\n * created from within a {@link SchedulerLike} and an RxJS user does not need to concern\n * themselves with creating and manipulating an Action.\n *\n * ```ts\n * class Action<T> extends Subscription {\n * new (scheduler: Scheduler, work: (state?: T) => void);\n * schedule(state?: T, delay: number = 0): Subscription;\n * }\n * ```\n *\n * @class Action\n */\nexport class Action<T> extends Subscription {\n constructor(scheduler: Scheduler, work: (this: SchedulerAction<T>, state?: T) => void) {\n super();\n }\n /**\n * Schedules this action on its parent {@link SchedulerLike} for execution. May be passed\n * some context object, `state`. May happen at some point in the future,\n * according to the `delay` parameter, if specified.\n * @param {T} [state] Some contextual data that the `work` function uses when\n * called by the Scheduler.\n * @param {number} [delay] Time to wait before executing the work, where the\n * time unit is implicit and defined by the Scheduler.\n * @return {Subscription} A subscription for the scheduled work.\n */\n public schedule(state?: T, delay: number = 0): Subscription {\n return this;\n }\n}\n", "import type { TimerHandle } from './timerHandle';\ntype SetIntervalFunction = (handler: () => void, timeout?: number, ...args: any[]) => TimerHandle;\ntype ClearIntervalFunction = (handle: TimerHandle) => void;\n\ninterface IntervalProvider {\n setInterval: SetIntervalFunction;\n clearInterval: ClearIntervalFunction;\n delegate:\n | {\n setInterval: SetIntervalFunction;\n clearInterval: ClearIntervalFunction;\n }\n | undefined;\n}\n\nexport const intervalProvider: IntervalProvider = {\n // When accessing the delegate, use the variable rather than `this` so that\n // the functions can be called without being bound to the provider.\n setInterval(handler: () => void, timeout?: number, ...args) {\n const { delegate } = intervalProvider;\n if (delegate?.setInterval) {\n return delegate.setInterval(handler, timeout, ...args);\n }\n return setInterval(handler, timeout, ...args);\n 
},\n clearInterval(handle) {\n const { delegate } = intervalProvider;\n return (delegate?.clearInterval || clearInterval)(handle as any);\n },\n delegate: undefined,\n};\n", "import { Action } from './Action';\nimport { SchedulerAction } from '../types';\nimport { Subscription } from '../Subscription';\nimport { AsyncScheduler } from './AsyncScheduler';\nimport { intervalProvider } from './intervalProvider';\nimport { arrRemove } from '../util/arrRemove';\nimport { TimerHandle } from './timerHandle';\n\nexport class AsyncAction<T> extends Action<T> {\n public id: TimerHandle | undefined;\n public state?: T;\n // @ts-ignore: Property has no initializer and is not definitely assigned\n public delay: number;\n protected pending: boolean = false;\n\n constructor(protected scheduler: AsyncScheduler, protected work: (this: SchedulerAction<T>, state?: T) => void) {\n super(scheduler, work);\n }\n\n public schedule(state?: T, delay: number = 0): Subscription {\n if (this.closed) {\n return this;\n }\n\n // Always replace the current state with the new state.\n this.state = state;\n\n const id = this.id;\n const scheduler = this.scheduler;\n\n //\n // Important implementation note:\n //\n // Actions only execute once by default, unless rescheduled from within the\n // scheduled callback. This allows us to implement single and repeat\n // actions via the same code path, without adding API surface area, as well\n // as mimic traditional recursion but across asynchronous boundaries.\n //\n // However, JS runtimes and timers distinguish between intervals achieved by\n // serial `setTimeout` calls vs. a single `setInterval` call. An interval of\n // serial `setTimeout` calls can be individually delayed, which delays\n // scheduling the next `setTimeout`, and so on. 
`setInterval` attempts to\n // guarantee the interval callback will be invoked more precisely to the\n // interval period, regardless of load.\n //\n // Therefore, we use `setInterval` to schedule single and repeat actions.\n // If the action reschedules itself with the same delay, the interval is not\n // canceled. If the action doesn't reschedule, or reschedules with a\n // different delay, the interval will be canceled after scheduled callback\n // execution.\n //\n if (id != null) {\n this.id = this.recycleAsyncId(scheduler, id, delay);\n }\n\n // Set the pending flag indicating that this action has been scheduled, or\n // has recursively rescheduled itself.\n this.pending = true;\n\n this.delay = delay;\n // If this action has already an async Id, don't request a new one.\n this.id = this.id ?? this.requestAsyncId(scheduler, this.id, delay);\n\n return this;\n }\n\n protected requestAsyncId(scheduler: AsyncScheduler, _id?: TimerHandle, delay: number = 0): TimerHandle {\n return intervalProvider.setInterval(scheduler.flush.bind(scheduler, this), delay);\n }\n\n protected recycleAsyncId(_scheduler: AsyncScheduler, id?: TimerHandle, delay: number | null = 0): TimerHandle | undefined {\n // If this action is rescheduled with the same delay time, don't clear the interval id.\n if (delay != null && this.delay === delay && this.pending === false) {\n return id;\n }\n // Otherwise, if the action's delay time is different from the current delay,\n // or the action has been rescheduled before it's executed, clear the interval id\n if (id != null) {\n intervalProvider.clearInterval(id);\n }\n\n return undefined;\n }\n\n /**\n * Immediately executes this action and the `work` it contains.\n * @return {any}\n */\n public execute(state: T, delay: number): any {\n if (this.closed) {\n return new Error('executing a cancelled action');\n }\n\n this.pending = false;\n const error = this._execute(state, delay);\n if (error) {\n return error;\n } else if (this.pending === false 
&& this.id != null) {\n // Dequeue if the action didn't reschedule itself. Don't call\n // unsubscribe(), because the action could reschedule later.\n // For example:\n // ```\n // scheduler.schedule(function doWork(counter) {\n // /* ... I'm a busy worker bee ... */\n // var originalAction = this;\n // /* wait 100ms before rescheduling the action */\n // setTimeout(function () {\n // originalAction.schedule(counter + 1);\n // }, 100);\n // }, 1000);\n // ```\n this.id = this.recycleAsyncId(this.scheduler, this.id, null);\n }\n }\n\n protected _execute(state: T, _delay: number): any {\n let errored: boolean = false;\n let errorValue: any;\n try {\n this.work(state);\n } catch (e) {\n errored = true;\n // HACK: Since code elsewhere is relying on the \"truthiness\" of the\n // return here, we can't have it return \"\" or 0 or false.\n // TODO: Clean this up when we refactor schedulers mid-version-8 or so.\n errorValue = e ? e : new Error('Scheduled action threw falsy error');\n }\n if (errored) {\n this.unsubscribe();\n return errorValue;\n }\n }\n\n unsubscribe() {\n if (!this.closed) {\n const { id, scheduler } = this;\n const { actions } = scheduler;\n\n this.work = this.state = this.scheduler = null!;\n this.pending = false;\n\n arrRemove(actions, this);\n if (id != null) {\n this.id = this.recycleAsyncId(scheduler, id, null);\n }\n\n this.delay = null!;\n super.unsubscribe();\n }\n }\n}\n", "import { Action } from './scheduler/Action';\nimport { Subscription } from './Subscription';\nimport { SchedulerLike, SchedulerAction } from './types';\nimport { dateTimestampProvider } from './scheduler/dateTimestampProvider';\n\n/**\n * An execution context and a data structure to order tasks and schedule their\n * execution. 
Provides a notion of (potentially virtual) time, through the\n * `now()` getter method.\n *\n * Each unit of work in a Scheduler is called an `Action`.\n *\n * ```ts\n * class Scheduler {\n * now(): number;\n * schedule(work, delay?, state?): Subscription;\n * }\n * ```\n *\n * @class Scheduler\n * @deprecated Scheduler is an internal implementation detail of RxJS, and\n * should not be used directly. Rather, create your own class and implement\n * {@link SchedulerLike}. Will be made internal in v8.\n */\nexport class Scheduler implements SchedulerLike {\n public static now: () => number = dateTimestampProvider.now;\n\n constructor(private schedulerActionCtor: typeof Action, now: () => number = Scheduler.now) {\n this.now = now;\n }\n\n /**\n * A getter method that returns a number representing the current time\n * (at the time this function was called) according to the scheduler's own\n * internal clock.\n * @return {number} A number that represents the current time. May or may not\n * have a relation to wall-clock time. May or may not refer to a time unit\n * (e.g. milliseconds).\n */\n public now: () => number;\n\n /**\n * Schedules a function, `work`, for execution. May happen at some point in\n * the future, according to the `delay` parameter, if specified. 
May be passed\n * some context object, `state`, which will be passed to the `work` function.\n *\n * The given arguments will be processed an stored as an Action object in a\n * queue of actions.\n *\n * @param {function(state: ?T): ?Subscription} work A function representing a\n * task, or some unit of work to be executed by the Scheduler.\n * @param {number} [delay] Time to wait before executing the work, where the\n * time unit is implicit and defined by the Scheduler itself.\n * @param {T} [state] Some contextual data that the `work` function uses when\n * called by the Scheduler.\n * @return {Subscription} A subscription in order to be able to unsubscribe\n * the scheduled work.\n */\n public schedule(work: (this: SchedulerAction, state?: T) => void, delay: number = 0, state?: T): Subscription {\n return new this.schedulerActionCtor(this, work).schedule(state, delay);\n }\n}\n", "import { Scheduler } from '../Scheduler';\nimport { Action } from './Action';\nimport { AsyncAction } from './AsyncAction';\nimport { TimerHandle } from './timerHandle';\n\nexport class AsyncScheduler extends Scheduler {\n public actions: Array> = [];\n /**\n * A flag to indicate whether the Scheduler is currently executing a batch of\n * queued actions.\n * @type {boolean}\n * @internal\n */\n public _active: boolean = false;\n /**\n * An internal ID used to track the latest asynchronous task such as those\n * coming from `setTimeout`, `setInterval`, `requestAnimationFrame`, and\n * others.\n * @type {any}\n * @internal\n */\n public _scheduled: TimerHandle | undefined;\n\n constructor(SchedulerAction: typeof Action, now: () => number = Scheduler.now) {\n super(SchedulerAction, now);\n }\n\n public flush(action: AsyncAction): void {\n const { actions } = this;\n\n if (this._active) {\n actions.push(action);\n return;\n }\n\n let error: any;\n this._active = true;\n\n do {\n if ((error = action.execute(action.state, action.delay))) {\n break;\n }\n } while ((action = 
actions.shift()!)); // exhaust the scheduler queue\n\n this._active = false;\n\n if (error) {\n while ((action = actions.shift()!)) {\n action.unsubscribe();\n }\n throw error;\n }\n }\n}\n", "import { AsyncAction } from './AsyncAction';\nimport { AsyncScheduler } from './AsyncScheduler';\n\n/**\n *\n * Async Scheduler\n *\n * Schedule task as if you used setTimeout(task, duration)\n *\n * `async` scheduler schedules tasks asynchronously, by putting them on the JavaScript\n * event loop queue. It is best used to delay tasks in time or to schedule tasks repeating\n * in intervals.\n *\n * If you just want to \"defer\" task, that is to perform it right after currently\n * executing synchronous code ends (commonly achieved by `setTimeout(deferredTask, 0)`),\n * better choice will be the {@link asapScheduler} scheduler.\n *\n * ## Examples\n * Use async scheduler to delay task\n * ```ts\n * import { asyncScheduler } from 'rxjs';\n *\n * const task = () => console.log('it works!');\n *\n * asyncScheduler.schedule(task, 2000);\n *\n * // After 2 seconds logs:\n * // \"it works!\"\n * ```\n *\n * Use async scheduler to repeat task in intervals\n * ```ts\n * import { asyncScheduler } from 'rxjs';\n *\n * function task(state) {\n * console.log(state);\n * this.schedule(state + 1, 1000); // `this` references currently executing Action,\n * // which we reschedule with new state and delay\n * }\n *\n * asyncScheduler.schedule(task, 3000, 0);\n *\n * // Logs:\n * // 0 after 3s\n * // 1 after 4s\n * // 2 after 5s\n * // 3 after 6s\n * ```\n */\n\nexport const asyncScheduler = new AsyncScheduler(AsyncAction);\n\n/**\n * @deprecated Renamed to {@link asyncScheduler}. 
Will be removed in v8.\n */\nexport const async = asyncScheduler;\n", "import { AsyncAction } from './AsyncAction';\nimport { AnimationFrameScheduler } from './AnimationFrameScheduler';\nimport { SchedulerAction } from '../types';\nimport { animationFrameProvider } from './animationFrameProvider';\nimport { TimerHandle } from './timerHandle';\n\nexport class AnimationFrameAction extends AsyncAction {\n constructor(protected scheduler: AnimationFrameScheduler, protected work: (this: SchedulerAction, state?: T) => void) {\n super(scheduler, work);\n }\n\n protected requestAsyncId(scheduler: AnimationFrameScheduler, id?: TimerHandle, delay: number = 0): TimerHandle {\n // If delay is greater than 0, request as an async action.\n if (delay !== null && delay > 0) {\n return super.requestAsyncId(scheduler, id, delay);\n }\n // Push the action to the end of the scheduler queue.\n scheduler.actions.push(this);\n // If an animation frame has already been requested, don't request another\n // one. If an animation frame hasn't been requested yet, request one. Return\n // the current animation frame request id.\n return scheduler._scheduled || (scheduler._scheduled = animationFrameProvider.requestAnimationFrame(() => scheduler.flush(undefined)));\n }\n\n protected recycleAsyncId(scheduler: AnimationFrameScheduler, id?: TimerHandle, delay: number = 0): TimerHandle | undefined {\n // If delay exists and is greater than 0, or if the delay is null (the\n // action wasn't rescheduled) but was originally scheduled as an async\n // action, then recycle as an async action.\n if (delay != null ? 
delay > 0 : this.delay > 0) {\n return super.recycleAsyncId(scheduler, id, delay);\n }\n // If the scheduler queue has no remaining actions with the same async id,\n // cancel the requested animation frame and set the scheduled flag to\n // undefined so the next AnimationFrameAction will request its own.\n const { actions } = scheduler;\n if (id != null && actions[actions.length - 1]?.id !== id) {\n animationFrameProvider.cancelAnimationFrame(id as number);\n scheduler._scheduled = undefined;\n }\n // Return undefined so the action knows to request a new async id if it's rescheduled.\n return undefined;\n }\n}\n", "import { AsyncAction } from './AsyncAction';\nimport { AsyncScheduler } from './AsyncScheduler';\n\nexport class AnimationFrameScheduler extends AsyncScheduler {\n public flush(action?: AsyncAction): void {\n this._active = true;\n // The async id that effects a call to flush is stored in _scheduled.\n // Before executing an action, it's necessary to check the action's async\n // id to determine whether it's supposed to be executed in the current\n // flush.\n // Previous implementations of this method used a count to determine this,\n // but that was unsound, as actions that are unsubscribed - i.e. 
cancelled -\n // are removed from the actions array and that can shift actions that are\n // scheduled to be executed in a subsequent flush into positions at which\n // they are executed within the current flush.\n const flushId = this._scheduled;\n this._scheduled = undefined;\n\n const { actions } = this;\n let error: any;\n action = action || actions.shift()!;\n\n do {\n if ((error = action.execute(action.state, action.delay))) {\n break;\n }\n } while ((action = actions[0]) && action.id === flushId && actions.shift());\n\n this._active = false;\n\n if (error) {\n while ((action = actions[0]) && action.id === flushId && actions.shift()) {\n action.unsubscribe();\n }\n throw error;\n }\n }\n}\n", "import { AnimationFrameAction } from './AnimationFrameAction';\nimport { AnimationFrameScheduler } from './AnimationFrameScheduler';\n\n/**\n *\n * Animation Frame Scheduler\n *\n * Perform task when `window.requestAnimationFrame` would fire\n *\n * When `animationFrame` scheduler is used with delay, it will fall back to {@link asyncScheduler} scheduler\n * behaviour.\n *\n * Without delay, `animationFrame` scheduler can be used to create smooth browser animations.\n * It makes sure scheduled task will happen just before next browser content repaint,\n * thus performing animations as efficiently as possible.\n *\n * ## Example\n * Schedule div height animation\n * ```ts\n * // html:
\n * import { animationFrameScheduler } from 'rxjs';\n *\n * const div = document.querySelector('div');\n *\n * animationFrameScheduler.schedule(function(height) {\n * div.style.height = height + \"px\";\n *\n * this.schedule(height + 1); // `this` references currently executing Action,\n * // which we reschedule with new state\n * }, 0, 0);\n *\n * // You will see a div element growing in height\n * ```\n */\n\nexport const animationFrameScheduler = new AnimationFrameScheduler(AnimationFrameAction);\n\n/**\n * @deprecated Renamed to {@link animationFrameScheduler}. Will be removed in v8.\n */\nexport const animationFrame = animationFrameScheduler;\n", "import { Observable } from '../Observable';\nimport { SchedulerLike } from '../types';\n\n/**\n * A simple Observable that emits no items to the Observer and immediately\n * emits a complete notification.\n *\n * Just emits 'complete', and nothing else.\n *\n * ![](empty.png)\n *\n * A simple Observable that only emits the complete notification. It can be used\n * for composing with other Observables, such as in a {@link mergeMap}.\n *\n * ## Examples\n *\n * Log complete notification\n *\n * ```ts\n * import { EMPTY } from 'rxjs';\n *\n * EMPTY.subscribe({\n * next: () => console.log('Next'),\n * complete: () => console.log('Complete!')\n * });\n *\n * // Outputs\n * // Complete!\n * ```\n *\n * Emit the number 7, then complete\n *\n * ```ts\n * import { EMPTY, startWith } from 'rxjs';\n *\n * const result = EMPTY.pipe(startWith(7));\n * result.subscribe(x => console.log(x));\n *\n * // Outputs\n * // 7\n * ```\n *\n * Map and flatten only odd numbers to the sequence `'a'`, `'b'`, `'c'`\n *\n * ```ts\n * import { interval, mergeMap, of, EMPTY } from 'rxjs';\n *\n * const interval$ = interval(1000);\n * const result = interval$.pipe(\n * mergeMap(x => x % 2 === 1 ? 
of('a', 'b', 'c') : EMPTY),\n * );\n * result.subscribe(x => console.log(x));\n *\n * // Results in the following to the console:\n * // x is equal to the count on the interval, e.g. (0, 1, 2, 3, ...)\n * // x will occur every 1000ms\n * // if x % 2 is equal to 1, print a, b, c (each on its own)\n * // if x % 2 is not equal to 1, nothing will be output\n * ```\n *\n * @see {@link Observable}\n * @see {@link NEVER}\n * @see {@link of}\n * @see {@link throwError}\n */\nexport const EMPTY = new Observable((subscriber) => subscriber.complete());\n\n/**\n * @param scheduler A {@link SchedulerLike} to use for scheduling\n * the emission of the complete notification.\n * @deprecated Replaced with the {@link EMPTY} constant or {@link scheduled} (e.g. `scheduled([], scheduler)`). Will be removed in v8.\n */\nexport function empty(scheduler?: SchedulerLike) {\n return scheduler ? emptyScheduled(scheduler) : EMPTY;\n}\n\nfunction emptyScheduled(scheduler: SchedulerLike) {\n return new Observable((subscriber) => scheduler.schedule(() => subscriber.complete()));\n}\n", "import { SchedulerLike } from '../types';\nimport { isFunction } from './isFunction';\n\nexport function isScheduler(value: any): value is SchedulerLike {\n return value && isFunction(value.schedule);\n}\n", "import { SchedulerLike } from '../types';\nimport { isFunction } from './isFunction';\nimport { isScheduler } from './isScheduler';\n\nfunction last(arr: T[]): T | undefined {\n return arr[arr.length - 1];\n}\n\nexport function popResultSelector(args: any[]): ((...args: unknown[]) => unknown) | undefined {\n return isFunction(last(args)) ? args.pop() : undefined;\n}\n\nexport function popScheduler(args: any[]): SchedulerLike | undefined {\n return isScheduler(last(args)) ? args.pop() : undefined;\n}\n\nexport function popNumber(args: any[], defaultValue: number): number {\n return typeof last(args) === 'number' ? args.pop()! 
: defaultValue;\n}\n", "export const isArrayLike = ((x: any): x is ArrayLike => x && typeof x.length === 'number' && typeof x !== 'function');", "import { isFunction } from \"./isFunction\";\n\n/**\n * Tests to see if the object is \"thennable\".\n * @param value the object to test\n */\nexport function isPromise(value: any): value is PromiseLike {\n return isFunction(value?.then);\n}\n", "import { InteropObservable } from '../types';\nimport { observable as Symbol_observable } from '../symbol/observable';\nimport { isFunction } from './isFunction';\n\n/** Identifies an input as being Observable (but not necessary an Rx Observable) */\nexport function isInteropObservable(input: any): input is InteropObservable {\n return isFunction(input[Symbol_observable]);\n}\n", "import { isFunction } from './isFunction';\n\nexport function isAsyncIterable(obj: any): obj is AsyncIterable {\n return Symbol.asyncIterator && isFunction(obj?.[Symbol.asyncIterator]);\n}\n", "/**\n * Creates the TypeError to throw if an invalid object is passed to `from` or `scheduled`.\n * @param input The object that was passed.\n */\nexport function createInvalidObservableTypeError(input: any) {\n // TODO: We should create error codes that can be looked up, so this can be less verbose.\n return new TypeError(\n `You provided ${\n input !== null && typeof input === 'object' ? 'an invalid object' : `'${input}'`\n } where a stream was expected. 
You can provide an Observable, Promise, ReadableStream, Array, AsyncIterable, or Iterable.`\n );\n}\n", "export function getSymbolIterator(): symbol {\n if (typeof Symbol !== 'function' || !Symbol.iterator) {\n return '@@iterator' as any;\n }\n\n return Symbol.iterator;\n}\n\nexport const iterator = getSymbolIterator();\n", "import { iterator as Symbol_iterator } from '../symbol/iterator';\nimport { isFunction } from './isFunction';\n\n/** Identifies an input as being an Iterable */\nexport function isIterable(input: any): input is Iterable {\n return isFunction(input?.[Symbol_iterator]);\n}\n", "import { ReadableStreamLike } from '../types';\nimport { isFunction } from './isFunction';\n\nexport async function* readableStreamLikeToAsyncGenerator(readableStream: ReadableStreamLike): AsyncGenerator {\n const reader = readableStream.getReader();\n try {\n while (true) {\n const { value, done } = await reader.read();\n if (done) {\n return;\n }\n yield value!;\n }\n } finally {\n reader.releaseLock();\n }\n}\n\nexport function isReadableStreamLike(obj: any): obj is ReadableStreamLike {\n // We don't want to use instanceof checks because they would return\n // false for instances from another Realm, like an

+ Community - Substrait: Cross-Language Serialization for Relational Algebra      

Community

Substrait is developed as a consensus-driven open source product under the Apache 2.0 license. Development is done in the open leveraging GitHub issues and PRs.

Get In Touch

Mailing List/Google Group
We use the mailing list to discuss questions, formulate plans and collaborate asynchronously.
Slack Channel
The developers of Substrait frequent the Slack channel. You can get an invite to the channel by following this link.
GitHub Issues
Substrait is developed via GitHub issues and pull requests. If you see a problem or want to enhance the product, we suggest you file a GitHub issue for developers to review.
Twitter
The @substrait_io account on Twitter is our official account. Follow it to keep up to date on what is happening with Substrait!
Docs
Our website is maintained entirely in our source repository. If there is something you think can be improved, feel free to fork our repository and post a pull request.
Meetings
Our community meets every other week on Wednesday.

Talks

Want to learn more about Substrait? Try the following presentations and slide decks.

  • Substrait: A Common Representation for Data Compute Plans (Jacques Nadeau, April 2022) [slides]

Citation

If you use Substrait in your research, please cite it using the following BibTeX entry:

@misc{substrait,
   author = {substrait-io},
   title = {Substrait: Cross-Language Serialization for Relational Algebra},
   year = {2021},
@@ -28,4 +28,4 @@
   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
   FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
   IN THE SOFTWARE.
--->   
\ No newline at end of file +-->
\ No newline at end of file diff --git a/community/powered_by/index.html b/community/powered_by/index.html index 16ad5558..b648db5c 100644 --- a/community/powered_by/index.html +++ b/community/powered_by/index.html @@ -1,4 +1,4 @@ - Powered by Substrait - Substrait: Cross-Language Serialization for Relational Algebra

Powered by Substrait

In addition to the work maintained in repositories within the substrait-io GitHub organization, a growing list of other open source projects have adopted Substrait.

Acero
Acero is a query execution engine implemented as a part of the Apache Arrow C++ library. Acero provides a Substrait consumer interface.
ADBC
ADBC (Arrow Database Connectivity) is an API specification for Apache Arrow-based database access. ADBC allows applications to pass queries either as SQL strings or Substrait plans.
Arrow Flight SQL
Arrow Flight SQL is a client-server protocol for interacting with databases and query engines using the Apache Arrow in-memory columnar format and the Arrow Flight RPC framework. Arrow Flight SQL allows clients to send queries as SQL strings or Substrait plans.
DataFusion
DataFusion is an extensible query planning, optimization, and execution framework, written in Rust, that uses Apache Arrow as its in-memory format. DataFusion provides a Substrait producer and consumer that can convert DataFusion logical plans to and from Substrait plans. It can be used through the DataFusion Python bindings.
DuckDB
DuckDB is an in-process SQL OLAP database management system. DuckDB provides a Substrait extension that allows users to produce and consume Substrait plans through DuckDB’s SQL, Python, and R APIs.
Gluten
Gluten is a plugin for Apache Spark that allows computation to be offloaded to engines that have better performance or efficiency than Spark’s built-in JVM-based engine. Gluten converts Spark physical plans to Substrait plans.
Ibis
Ibis is a Python library that provides a lightweight, universal interface for data wrangling. It includes a dataframe API for Python with support for more than 10 query execution engines, plus a Substrait producer to enable support for Substrait-consuming execution engines.
Substrait R Interface
The Substrait R interface package allows users to construct Substrait plans from R for evaluation by Substrait-consuming execution engines. The package provides a dplyr backend as well as lower-level interfaces for creating Substrait plans and integrations with Acero and DuckDB.
Velox
Velox is a unified execution engine aimed at accelerating data management systems and streamlining their development. Velox provides a Substrait consumer interface.

To add your project to this list, please open a pull request.

\ No newline at end of file +-->
\ No newline at end of file diff --git a/expressions/aggregate_functions/index.html b/expressions/aggregate_functions/index.html index ddc360cf..27bc9a73 100644 --- a/expressions/aggregate_functions/index.html +++ b/expressions/aggregate_functions/index.html @@ -1,4 +1,4 @@ - Aggregate Functions - Substrait: Cross-Language Serialization for Relational Algebra

Aggregate Functions

Aggregate functions are functions that define an operation which consumes values from multiple records to produce a single output. Aggregate functions in SQL are typically used with GROUP BY clauses. Aggregate functions are similar to scalar functions, but their function signatures include a small set of additional properties.

Aggregate function signatures contain all the properties defined for scalar functions. Additionally, they contain the properties below:

Property | Description | Required
Inherits | All properties defined for scalar function. | N/A
Ordered | Whether the result of this function is sensitive to sort order. | Optional, defaults to false
Maximum set size | Maximum allowed set size as an unsigned integer. | Optional, defaults to unlimited
Decomposable | Whether the function can be executed in one or more intermediate steps. Valid options are: NONE, ONE, MANY, describing how intermediate steps can be taken. | Optional, defaults to NONE
Intermediate Output Type | If the function is decomposable, represents the intermediate output type that is used, if the function is defined as either ONE or MANY decomposable. Will be a struct in many cases. | Required for ONE and MANY.
Invocation | Whether the function uses all or only distinct values in the aggregation calculation. Valid options are: ALL, DISTINCT. | Optional, defaults to ALL

Aggregate Binding

When binding an aggregate function, the binding must include the following additional properties beyond the standard scalar binding properties:

Property | Description
Phase | Describes the input type of the data: [INITIAL_TO_INTERMEDIATE, INTERMEDIATE_TO_INTERMEDIATE, INITIAL_TO_RESULT, INTERMEDIATE_TO_RESULT] describing what portion of the operation is required. For functions that are NOT decomposable, the only valid option will be INITIAL_TO_RESULT.
Ordering | Zero or more ordering keys along with key order (ASC|DESC|NULL FIRST, etc.), declared similar to the sort keys in an ORDER BY relational operation. If no sorts are specified, the records are not sorted prior to being passed to the aggregate function.
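
The decomposability and phase properties above can be made concrete with a small sketch. The following is illustrative Python, not Substrait library code: it shows how a MANY-decomposable average could map onto the named phases, using a (sum, count) pair as the intermediate state, consistent with the note that the intermediate output type will often be a struct.

```python
# Illustrative sketch, not Substrait library code: how a MANY-decomposable
# average maps onto the aggregate phases named above. The (sum, count)
# intermediate is an assumption for illustration.

def initial_to_intermediate(values):
    # Partial aggregation over one partition of the input records.
    return (sum(values), len(values))

def intermediate_to_intermediate(states):
    # Merge any number of partial states; legal because the function is
    # declared MANY-decomposable.
    return (sum(s for s, _ in states), sum(c for _, c in states))

def intermediate_to_result(state):
    # Final phase: produce the aggregate's declared output type.
    total, count = state
    return total / count if count else None

# Two partitions aggregated independently, then merged.
partials = [initial_to_intermediate(p) for p in ([1, 2, 3], [4, 5])]
merged = intermediate_to_intermediate(partials)
print(intermediate_to_result(merged))  # -> 3.0, same as one INITIAL_TO_RESULT pass
```

A non-decomposable function, by contrast, would only ever be invoked with the INITIAL_TO_RESULT phase, seeing all input records in a single step.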
\ No newline at end of file +-->
\ No newline at end of file diff --git a/expressions/embedded_functions/index.html b/expressions/embedded_functions/index.html index 2f0a5327..f63de0b4 100644 --- a/expressions/embedded_functions/index.html +++ b/expressions/embedded_functions/index.html @@ -1,4 +1,4 @@ - Embedded Functions - Substrait: Cross-Language Serialization for Relational Algebra

+ Embedded Functions - Substrait: Cross-Language Serialization for Relational Algebra      

Embedded Functions

Embedded functions are a special kind of function where the implementation is embedded within the actual plan. They are commonly used in tools where a user intersperses business logic within a data pipeline. This is more common in data science workflows than traditional SQL workflows.

Embedded functions are not pre-registered. They require that data be consumed and produced through a standard API, may require memory allocation, and must have well-defined error reporting behavior. They may also have specific runtime dependencies. For example, a Python pickle function may depend on pyarrow 5.0 and pynessie 1.0.
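
As a hedged sketch of how a Python pickle function could be carried inside a plan: the dict layout here is illustrative only, not the Substrait binary format; just the three properties listed below (function type, function properties, output type) come from the spec.

```python
# Illustrative only: a dict standing in for an embedded function record.
# The layout is an assumption; the property names mirror the docs.
import pickle

def add_tax(amount):
    # User-supplied business logic embedded in the pipeline.
    return round(amount * 1.08, 2)

embedded = {
    "function_type": "python_pickle",              # Function Type property
    "function_properties": pickle.dumps(add_tax),  # implementation bytes
    "output_type": "fp64",                         # Output Type property
}

# A consumer whose runtime satisfies the function's dependencies can
# restore the implementation and invoke it directly.
restored = pickle.loads(embedded["function_properties"])
print(restored(100.0))  # -> 108.0
```

Note that the pickled payload only round-trips in an environment providing the same dependencies, which is exactly why the runtime-dependency caveat above matters.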

Properties for an embedded function include:

Property | Description | Required
Function Type | The type of embedded function presented. | Required
Function Properties | Function properties, one of those items defined below. | Required
Output Type | The fully resolved output type for this embedded function. | Required

The binary representation of an embedded function is:

message EmbeddedFunction {
   repeated Expression arguments = 1;
   Type output_type = 2;
   oneof kind {
@@ -36,4 +36,4 @@
   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
   FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
   IN THE SOFTWARE.
--->   
\ No newline at end of file +-->
\ No newline at end of file diff --git a/expressions/extended_expression/index.html b/expressions/extended_expression/index.html index 9158120a..f3a0623f 100644 --- a/expressions/extended_expression/index.html +++ b/expressions/extended_expression/index.html @@ -1,4 +1,4 @@ - Extended Expression - Substrait: Cross-Language Serialization for Relational Algebra


Extended Expression

Extended Expression messages are provided for expression-level protocols as an alternative to using a Plan. They mainly target expression-only evaluations, such as those computed in Filter/Project/Aggregation rels. Unlike the original Expression defined in the Substrait protocol, Extended Expression messages require more information to completely describe the computation context, including the input data schema, referenced function signatures, and output schema.

Since Extended Expression will be used separately from the Plan rel representation, it needs to include basic fields like Version.

message ExtendedExpression {
  // Substrait version of the expression. Optional up to 0.17.0, required for later
  // versions.
  Version version = 7;
  // ...
}

Field References

In Substrait, all fields are dealt with on a positional basis. Field names are only used at the edge of a plan, for the purposes of naming fields for the outside world. Each operation returns a simple or compound data type. Additional operations can refer to data within that initial operation using field references. To reference a field, you use a reference based on the type of field position you want to reference.

Reference Type | Properties | Type Applicability | Return Type
Struct Field | Ordinal position. Zero-based. Only legal within the range of possible fields within a struct. Selecting an ordinal outside the applicable field range results in an invalid plan. | struct | Type of field referenced
Array Value | Array offset. Zero-based. Negative numbers can be used to describe an offset relative to the end of the array. For example, -1 means the last element in an array. Negative and positive overflows return null values (no wrapping). | list | Type of list element
Array Slice | Array offset and element count. Zero-based. Negative numbers can be used to describe an offset relative to the end of the array. For example, -1 means the last element in an array. Position does not wrap, nor does length. | list | Same type as original list
Map Key | A map value that is matched exactly against available map keys and returned. | map | Value type of map
Map KeyExpression | A wildcard string that is matched against a simplified form of regular expressions. Requires the key type of the map to be a character type. [Format detail needed, intention to include basic regex concepts such as greedy/non-greedy.] | map | List of map value type
Masked Complex Expression | An expression that provides a mask over a schema declaring which portions of the schema should be presented. This allows a user to select a portion of a complex object but mask certain subsections of that same object. | any | any
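The array-value semantics above (zero-based offsets, negative offsets counting from the end, null on overflow with no wrapping) can be sketched in Python. This is an illustrative model, not Substrait code; `array_value` is a hypothetical helper and `None` stands in for null.

```python
def array_value(lst, offset):
    """Array Value reference: zero-based, negative counts from the end,
    out-of-range offsets in either direction yield null (no wrapping)."""
    if offset < -len(lst) or offset >= len(lst):
        return None
    return lst[offset]

values = [10, 20, 30]
print(array_value(values, 0))   # 10
print(array_value(values, -1))  # 30 (last element)
print(array_value(values, 5))   # None (positive overflow)
```

Note how this differs from plain Python indexing, which raises `IndexError` on overflow instead of producing null.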

Compound References

References are typically constructed as a sequence. For example: [struct position 0, struct position 1, array offset 2, array slice 1..3].
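Sequential application of reference steps can be sketched as follows. This is an illustrative model only; tuples stand in for structs, and the step encoding is hypothetical.

```python
def resolve(value, steps):
    """Apply a compound reference left to right, each step narrowing
    the value produced by the previous one."""
    for kind, arg in steps:
        if kind == "struct":      # ordinal position within a struct
            value = value[arg]
        elif kind == "array":     # zero-based array offset
            value = value[arg]
        elif kind == "slice":     # (offset, count) slice of an array
            start, count = arg
            value = value[start:start + count]
    return value

# struct position 0 -> struct position 1 -> array offset 2
row = (("a", [1, 2, 3]), "unused")
print(resolve(row, [("struct", 0), ("struct", 1), ("array", 2)]))  # 3
```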

Field references are in the same order they are defined in their schema. For example, let’s consider the following schema:

column a:
  struct<
    b: list<
      struct< ... >
    >
  >

Scalar Functions

A function is a scalar function if that function takes in values from a single record and produces an output value. To clearly specify the definition of functions, Substrait declares an extensible specification plus binding approach to function resolution. A scalar function signature includes the following properties:

Property | Description | Required
Name | One or more user-friendly UTF-8 strings that are used to reference this function. | At least one value is required.
List of arguments | Argument properties are defined below. Arguments can be fully defined or calculated with a type expression. See further details below. | Optional, defaults to niladic.
Deterministic | Whether this function is expected to reproduce the same output when it is invoked multiple times with the same input. This informs a plan consumer on whether it can constant-reduce the defined function. An example would be a random() function, which is typically expected to be evaluated repeatedly despite having the same set of inputs. | Optional, defaults to true.
Session Dependent | Whether this function is influenced by the session context it is invoked within. For example, a function may be influenced by the user who is invoking it, the time zone of the session, or some other non-obvious parameter. This can inform caching systems on whether a particular function is cacheable. | Optional, defaults to false.
Variadic Behavior | Whether the last argument of the function is variadic or a single argument. If variadic, the argument can optionally have a lower bound (minimum number of instances) and an upper bound (maximum number of instances). | Optional, defaults to single value.
Nullability Handling | Describes how nullability of input arguments maps to nullability of output arguments. The three options are MIRROR, DECLARED_OUTPUT, and DISCRETE. More details about nullability handling are listed below. | Optional, defaults to MIRROR.
Description | Additional description of the function for implementers or users. Should be written in human-readable form to allow exposure to end users. Presented as a map of language => description mappings, e.g. { "en": "This adds two numbers together.", "fr": "cela ajoute deux nombres" }. | Optional
Return Value | The output type of the expression. Return types can be expressed as a fully-defined type or a type expression. See below for more on type expressions. | Required
Implementation Map | A map of implementation locations for one or more implementations of the given function. Each key is a function implementation type. Implementation types include examples such as AthenaArrowLambda, TrinoV361Jar, ArrowCppKernelEnum, GandivaEnum, LinkedIn Transport Jar, etc. [Definition TBD]. Each implementation type has one or more properties associated with retrieval of that implementation. | Optional

Argument Types

There are three main types of arguments: value arguments, type arguments, and enumerations. Every defined argument must be specified in every invocation of the function. When specified, the position of these arguments in the function invocation must match the position of the arguments as defined in the YAML function definition.

  • Value arguments: arguments that refer to a data value. These could be constants (literal expressions defined in the plan) or variables (a reference expression that references data being processed by the plan). This is the most common type of argument. The value of a value argument is not available in output derivation, but its type is. Value arguments can be declared in one of two ways: concrete or parameterized. Concrete types are either simple types or compound types with all parameters fully defined (without referencing any type arguments). Examples include i32, fp32, VARCHAR<20>, List<fp32>, etc. Parameterized types are discussed further below.
  • Type arguments: arguments that are used only to inform the evaluation and/or type derivation of the function. For example, you might have a function which is truncate(<type> DECIMAL<P0,S0>, <value> DECIMAL<P1, S1>, <value> i32). This function declares two value arguments and a type argument. The difference between them is that the type argument has no value at runtime, while the value arguments do.
  • Enumeration: arguments that support a fixed set of declared values as constant arguments. These arguments must be specified as part of an expression. While these could also have been implemented as constant string value arguments, they are formally included to improve validation/contextual help/etc. for frontend processors and IDEs. An example might be extract([DAY|YEAR|MONTH], <date value>). In this example, a producer must specify a type of date part to extract. Note, the value of a required enumeration cannot be used in type derivation.

Value Argument Properties

Property | Description | Required
Name | A human-readable name for this argument to help clarify use. | Optional, defaults to a name based on position (e.g. arg0)
Type | A fully defined type or a type expression. | Required
Constant | Whether this argument is required to be a constant for invocation. For example, in some systems a regular expression pattern would only be accepted as a literal and not a column value reference. | Optional, defaults to false

Type Argument Properties

Property | Description | Required
Type | A partially or completely parameterized type, e.g. List<K> or K. | Required
Name | A human-readable name for this argument to help clarify use. | Optional, defaults to a name based on position (e.g. arg0)

Required Enumeration Properties

Property | Description | Required
Options | List of valid string options for this argument. | Required
Name | A human-readable name for this argument to help clarify use. | Optional, defaults to a name based on position (e.g. arg0)

Options

In addition to arguments, each call may specify zero or more options. These are similar to a required enumeration but are focused on supporting alternative behaviors. Options can be left unspecified, in which case the consumer is free to choose which implementation to use. An example use case might be OVERFLOW_BEHAVIOR: [OVERFLOW, SATURATE, ERROR]. If unspecified, an engine is free to use any of the three choices or even some alternative behavior (e.g. setting the value to null on overflow). If specified, the engine is expected to behave as specified or fail. Note that the value of an optional enumeration cannot be used in type derivation.

Option Preference

A producer may specify multiple values for an option. If the producer does so then the consumer must deliver the first behavior in the list of values that the consumer is capable of delivering. For example, considering overflow as defined above, if a producer specified [ERROR, SATURATE] then the consumer must deliver ERROR if it is capable of doing so. If it is not then it may deliver SATURATE. If the consumer cannot deliver either behavior then it is an error and the consumer must reject the plan.
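The option-preference rule can be sketched as a small resolution routine. This is illustrative only; `choose_behavior` is a hypothetical helper, not part of any Substrait library.

```python
def choose_behavior(requested, supported):
    """Return the first requested behavior the consumer supports;
    if none is supported, the plan must be rejected."""
    for behavior in requested:
        if behavior in supported:
            return behavior
    raise ValueError("plan rejected: no requested option is supported")

# Consumer supports SATURATE but not ERROR: SATURATE is delivered.
print(choose_behavior(["ERROR", "SATURATE"], {"SATURATE", "OVERFLOW"}))  # SATURATE
# Consumer supports both: the first preference, ERROR, wins.
print(choose_behavior(["ERROR", "SATURATE"], {"ERROR", "SATURATE"}))     # ERROR
```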

Optional Properties

Property | Description | Required
Values | A list of valid strings for this option. | Required
Name | A human-readable name for this option. | Required

Nullability Handling

Mode | Description
MIRROR | If at least one of the input arguments is nullable, the return type is also nullable. If all arguments are non-nullable, the return type is non-nullable. An example might be the + function.
DECLARED_OUTPUT | Input arguments are accepted with any mix of nullability. The nullability of the output is whatever the return type expression states. An example might be the function is_null(), where the output is always boolean, independent of the nullability of the input.
DISCRETE | The inputs and arguments all declare concrete nullability and can only be bound to types with exactly that nullability. For example, if a type input is declared i64? and one has an i64 literal, the literal must be explicitly cast to i64? to allow the operation to bind.
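The MIRROR and DECLARED_OUTPUT rules can be sketched as a small derivation function. This is an illustrative model, not Substrait code; the mode strings mirror the table above and `output_nullable` is a hypothetical helper.

```python
def output_nullable(mode, input_nullabilities, declared_nullable=False):
    """Derive output nullability from input nullabilities per mode."""
    if mode == "MIRROR":
        return any(input_nullabilities)   # nullable iff any input is nullable
    if mode == "DECLARED_OUTPUT":
        return declared_nullable          # whatever the return type states
    raise ValueError(f"unsupported mode: {mode}")

print(output_nullable("MIRROR", [False, True]))           # True  (e.g. i32 + i32?)
print(output_nullable("MIRROR", [False, False]))          # False
print(output_nullable("DECLARED_OUTPUT", [True], False))  # False (e.g. is_null())
```

DISCRETE is omitted here because it is a binding constraint rather than a derivation: it simply refuses to bind when input nullability does not match exactly.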

Parameterized Types

Types are parameterized by two kinds of values: inner types (e.g. List<K>) and numeric values (e.g. DECIMAL<P,S>). Parameter names are simple strings (frequently a single character). There are two types of parameters: integer parameters and type parameters.

When the same parameter name is used multiple times in a function definition, the function can only bind if the exact same value is used for all parameters of that name. For example, if one had a function with a signature of fn(VARCHAR<N>, VARCHAR<N>), the function would only be usable if both VARCHAR types had the same length value N. This necessitates that all instances of the same parameter name be of the same parameter type (all instances are a type parameter or all instances are an integer parameter).
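The consistency requirement can be sketched as a unification check. This is illustrative only; `bind_parameters` is a hypothetical helper, and lists of parameter names/values stand in for real type expressions.

```python
def bind_parameters(signature_params, argument_values):
    """Bind each named parameter; fail (return None) if the same name
    would be bound to two different values."""
    bound = {}
    for name, value in zip(signature_params, argument_values):
        if name in bound and bound[name] != value:
            return None  # conflicting values for the same parameter: no bind
        bound[name] = value
    return bound

# fn(VARCHAR<N>, VARCHAR<N>): both lengths must agree on N.
print(bind_parameters(["N", "N"], [20, 20]))  # {'N': 20}
print(bind_parameters(["N", "N"], [20, 30]))  # None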

Type Parameter Resolution in Variadic Functions

When the last argument of a function is variadic and declares a type parameter, e.g. fn(A, B, C...), the C parameter can be marked as either consistent or inconsistent. If marked as consistent, the function can only be bound to arguments where all the C types are the same concrete type. If marked as inconsistent, each unique C can be bound to a different type within the constraints of what C allows.

Output Type Derivation

Concrete Return Types

A concrete return type is one that is fully known at function definition time. Examples of simple concrete return types include i32 and fp32. For compound types, a concrete return type must be fully declared; examples of fully defined compound types are VARCHAR<20> and DECIMAL<25,5>.

Return Type Expressions

Any function can declare a return type expression. A return type expression uses a simplified set of expressions to describe how the return type is derived. For example, a return expression could be as simple as returning a parameter declared in the arguments, e.g. f(List<K>) => K, or it can be a simple mathematical or conditional expression such as add(decimal<a,b>, decimal<c,d>) => decimal<a+c, b+d>. The simple expression language supports a very narrow set of types:

  • Integer: 64-bit signed integer (can be a literal or a parameter value)
  • Boolean: True and False
  • Type: A Substrait type (with possibly additional embedded expressions)

These types are evaluated using a small set of operations to support common scenarios. List of valid operations:

  • Math: +, -, *, /, min, max
  • Boolean: &&, ||, !, <, >, ==
  • Parameters: type, integer
  • Literals: type, integer
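Evaluating such a derivation can be sketched in Python for the add(decimal<a,b>, decimal<c,d>) => decimal<a+c, b+d> example from the text. Real derivations are written in Substrait's own derivation language, not Python; `derive_add_decimal` is a hypothetical illustration.

```python
def derive_add_decimal(a, b, c, d):
    """Evaluate the return type expression decimal<a+c, b+d> for the
    parameters bound from the two decimal argument types."""
    return f"decimal<{a + c},{b + d}>"

# add(decimal<10,2>, decimal<12,3>) derives decimal<22,5>.
print(derive_add_decimal(10, 2, 12, 3))  # decimal<22,5>
```

The same 64-bit integer arithmetic plus min/max and boolean comparisons are all the machinery the derivation language needs for common cases.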

Specialized Record Expressions

While all types of operations could be reduced to functions, in some cases this would be overly simplistic. Instead, it is helpful to define several other expression constructs.

These constructs should be focused on genuinely different expression types, as opposed to syntactic sugar. CAST and EXTRACT, for example, are SQL operations presented using specialized syntax, yet they can easily be modeled using a function paradigm with minimal complexity.

Literal Expressions

For each data type, it is possible to create a literal value for that data type. The representation depends on the serialization format. Literal expressions include both a type literal and a possibly null value.

Nested Type Constructor Expressions

These expressions allow structs, lists, and maps to be constructed from a set of expressions. For example, they allow a struct expression like (field 0 - field 1, field 0 + field 1) to be represented.

Cast Expression

To convert a value from one type to another, Substrait defines a cast expression. Cast expressions declare an expected type, an input argument, and an enumeration specifying failure behavior, indicating whether the cast should return null on failure or throw an exception.

Note that Substrait always requires a cast expression whenever the current type is not exactly equal to (one of) the expected types. For example, it is illegal to directly pass a value of type i8[0] to a function that only supports an i8?[0] argument.
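The failure-behavior enumeration can be sketched as follows. This is an illustrative model, not Substrait code; the behavior names and `cast_to_int` helper are hypothetical, with `None` standing in for null.

```python
def cast_to_int(value, on_failure="THROW"):
    """Cast to integer; on failure either return null (None) or
    propagate the error, per the declared failure behavior."""
    try:
        return int(value)
    except (TypeError, ValueError):
        if on_failure == "RETURN_NULL":
            return None
        raise

print(cast_to_int("42"))                    # 42
print(cast_to_int("hello", "RETURN_NULL"))  # None
```

The same input and target type thus yield different observable behavior depending solely on the declared enumeration, which is why the plan must carry it explicitly.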

If Expression

An if value expression is an expression composed of one if clause, zero or more else if clauses and an else clause. In pseudocode, they are envisioned as:

if <boolean expression> then <result expression 1>
 else if <boolean expression> then <result expression 2> (zero or more times)
 else <result expression 3>
 

When an if expression is declared, all return expressions must be of the same type.

Shortcut Behavior

An if expression is expected to logically short-circuit on a positive outcome, meaning that a skipped else/else-if expression cannot cause an error. For example, the following should not throw an error even though the cast operation would fail if it were evaluated:

if 'value' = 'value' then 0
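The short-circuit rule can be sketched by evaluating branch bodies lazily, so a skipped branch never runs. A minimal illustration (not engine code):

```python
# Hypothetical sketch: branches are (condition_thunk, result_thunk) pairs,
# evaluated lazily so that a skipped branch cannot raise an error.

def if_expr(*clauses, otherwise):
    """Return the result of the first clause whose condition is true."""
    for cond, result in clauses:
        if cond():
            return result()  # short-circuit: later clauses never evaluated
    return otherwise()

# The skipped else branch would fail if evaluated, but is never reached:
value = if_expr(
    (lambda: 'value' == 'value', lambda: 0),
    otherwise=lambda: int('hello'),  # would raise ValueError if evaluated
)
```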


Subqueries

Subqueries are scalar expressions consisting of another query.

Forms

Scalar

Scalar subqueries are subqueries that return one row and one column.

Property Description Required
Input Input relation Yes

IN predicate

An IN subquery predicate checks that the left expression is contained in the right subquery.

Examples

SELECT *
 FROM t1
 WHERE x IN (SELECT * FROM t2)
 
SELECT *


User-Defined Functions

Substrait supports the creation of custom functions via simple extensions, using the facilities described in scalar functions. The functions defined by Substrait use the same mechanism. The extension files for standard functions can be found here.

Here’s an example function that doubles its input:

Implementation Note

This implementation is only defined on 32-bit floats and integers but could be defined on all numbers (and even lists and strings). The user of the implementation can specify what happens when the resulting value falls outside of the valid range for a 32-bit float (either return NAN or raise an error).

%YAML 1.2
 ---
 scalar_functions:
   -
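The YAML snippet above is cut off. As an illustration only, a simple-extension file defining a doubling function could plausibly take the shape below; the name, argument types, and layout are assumptions for this sketch, not the original file's contents:

```yaml
%YAML 1.2
---
scalar_functions:
  -
    name: "double"                     # hypothetical function name
    description: "Double the input value."
    impls:
      - args:
          - name: x
            value: i32
        return: i32
      - args:
          - name: x
            value: fp32
        return: fp32
```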

Window Functions

Window functions are functions which consume values from multiple records to produce a single output. They are similar to aggregate functions, but additionally operate over a focused window of analysis within their partition. To an end user, window functions behave like scalar functions, producing a single value for each input record; however, producing each of those values may require visibility into many records.

Window function signatures contain all the properties defined for aggregate functions. Additionally, they contain the properties below:

Property Description Required
Inherits All properties defined for aggregate functions. N/A
Window Type STREAMING or PARTITION. Describes whether the function needs to see all data for the specific partition operation simultaneously. Operations like SUM can produce values in a streaming manner with no complete visibility of the partition. NTILE requires visibility of the entire partition before it can start producing values. Optional, defaults to PARTITION

When binding a window function, the binding must include the following additional properties beyond the standard scalar binding properties:

Property Description Required
Partition A list of partitioning expressions. False, defaults to a single partition for the entire dataset
Lower Bound Bound Following(int64), Bound Trailing(int64) or CurrentRow. False, defaults to start of partition
Upper Bound Bound Following(int64), Bound Trailing(int64) or CurrentRow. False, defaults to end of partition

Aggregate Functions as Window Functions

Aggregate functions can be treated as window functions with Window Type set to STREAMING.

AVG, COUNT, MAX, MIN and SUM are examples of aggregate functions that are commonly allowed in window contexts.
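To make the streaming case concrete, here is an illustrative sketch (not engine code) of SUM applied as a STREAMING window function. It assumes a frame from the start of the partition to the current row, so each output can be emitted without seeing the whole partition:

```python
# Hypothetical sketch: SUM as a streaming window function. Only a running
# total per partition key is kept as state; one output is produced per row.

from collections import defaultdict

def windowed_running_sum(rows):
    """rows: iterable of (partition_key, value); returns one running total per row."""
    totals = defaultdict(int)
    out = []
    for key, value in rows:
        totals[key] += value       # streaming update of per-partition state
        out.append(totals[key])
    return out
```

For input `[("a", 1), ("b", 10), ("a", 2), ("b", 5)]` this yields one running total per input record, partitioned by the key.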


functions_aggregate_approx.yaml

This document file is generated for functions_aggregate_approx.yaml

Aggregate Functions

approx_count_distinct

Implementations:
approx_count_distinct(x): -> return_type
0. approx_count_distinct(any): -> i64

Calculates the approximate number of rows that contain distinct values of the expression argument using HyperLogLog. This function provides an alternative to the COUNT (DISTINCT expression) function, which returns the exact number of rows that contain distinct values of an expression. APPROX_COUNT_DISTINCT processes large amounts of data significantly faster than COUNT, with negligible deviation from the exact result.
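To illustrate the idea behind HyperLogLog (this is a simplified sketch, not Substrait's or any engine's actual implementation): each value is hashed, the low bits of the hash select a register, and each register tracks the maximum "rank" (trailing-zero count plus one) observed. The harmonic mean of the registers then estimates the distinct count:

```python
# Simplified HyperLogLog sketch for illustration; constants follow the
# standard formulation, but this is not a production implementation.

import hashlib
import math

class HyperLogLog:
    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p                 # number of registers (1024 here)
        self.registers = [0] * self.m

    def add(self, value):
        h = int.from_bytes(hashlib.md5(str(value).encode()).digest(), "big")
        idx = h & (self.m - 1)          # low p bits pick a register
        w = h >> self.p                 # remaining bits feed the rank
        rank = (w & -w).bit_length() if w else 128 - self.p
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:          # small-range correction
            return self.m * math.log(self.m / zeros)
        return raw
```

With 1024 registers the typical relative error is around 3%, while the memory used stays fixed regardless of how many distinct values are seen.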


functions_aggregate_generic.yaml

This document file is generated for functions_aggregate_generic.yaml

Aggregate Functions

count

Implementations:
count(x, option:overflow): -> return_type
0. count(any, option:overflow): -> i64

Count a set of values

Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
count

Implementations:

Count a set of records (not field referenced)

any_value

Implementations:
any_value(x): -> return_type
0. any_value(any): -> any?

Selects an arbitrary value from a group of values. If the input is empty, the function returns null.


    functions_arithmetic.yaml

    This document file is generated for functions_arithmetic.yaml

    Scalar Functions

    add

    Implementations:
    add(x, y, option:overflow): -> return_type
    0. add(i8, i8, option:overflow): -> i8
    1. add(i16, i16, option:overflow): -> i16
    2. add(i32, i32, option:overflow): -> i32
    3. add(i64, i64, option:overflow): -> i64
    4. add(fp32, fp32, option:rounding): -> fp32
    5. add(fp64, fp64, option:rounding): -> fp64

    Add two values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    subtract

    Implementations:
    subtract(x, y, option:overflow): -> return_type
    0. subtract(i8, i8, option:overflow): -> i8
    1. subtract(i16, i16, option:overflow): -> i16
    2. subtract(i32, i32, option:overflow): -> i32
    3. subtract(i64, i64, option:overflow): -> i64
    4. subtract(fp32, fp32, option:rounding): -> fp32
    5. subtract(fp64, fp64, option:rounding): -> fp64

    Subtract one value from another.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    multiply

    Implementations:
    multiply(x, y, option:overflow): -> return_type
    0. multiply(i8, i8, option:overflow): -> i8
    1. multiply(i16, i16, option:overflow): -> i16
    2. multiply(i32, i32, option:overflow): -> i32
    3. multiply(i64, i64, option:overflow): -> i64
    4. multiply(fp32, fp32, option:rounding): -> fp32
    5. multiply(fp64, fp64, option:rounding): -> fp64

    Multiply two values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    divide

    Implementations:
    divide(x, y, option:overflow): -> return_type
    0. divide(i8, i8, option:overflow): -> i8
    1. divide(i16, i16, option:overflow): -> i16
    2. divide(i32, i32, option:overflow): -> i32
    3. divide(i64, i64, option:overflow): -> i64
    4. divide(fp32, fp32, option:rounding, option:on_domain_error, option:on_division_by_zero): -> fp32
    5. divide(fp64, fp64, option:rounding, option:on_domain_error, option:on_division_by_zero): -> fp64

    Divide x by y. In the case of integer division, partial values are truncated (i.e. rounded towards 0). The on_division_by_zero option governs behavior in cases where y is 0 and x is not 0. LIMIT means positive or negative infinity (depending on the sign of x and y). If x and y are both 0 or both ±infinity, behavior will be governed by on_domain_error.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
  • on_division_by_zero ['LIMIT', 'NAN', 'ERROR']
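The interaction of these options can be sketched as follows. This is an illustrative model of the described semantics, not an engine implementation:

```python
# Hypothetical sketch of the fp divide options: on_division_by_zero governs
# y == 0 with x != 0; on_domain_error governs 0/0 and ±inf/±inf.

import math

def fp_divide(x, y, on_division_by_zero="LIMIT", on_domain_error="NAN"):
    if y == 0 and x != 0:
        if on_division_by_zero == "LIMIT":
            # Signed infinity, depending on the signs of x and y.
            return math.copysign(math.inf, x) * math.copysign(1.0, y)
        if on_division_by_zero == "NAN":
            return math.nan
        raise ZeroDivisionError("division by zero")
    if (x == 0 and y == 0) or (math.isinf(x) and math.isinf(y)):
        if on_domain_error == "NAN":
            return math.nan
        raise ValueError("domain error in divide")
    return x / y
```

For example, `fp_divide(1.0, 0.0)` yields positive infinity under LIMIT, while `fp_divide(0.0, 0.0)` falls through to on_domain_error.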
    negate

    Implementations:
    negate(x, option:overflow): -> return_type
    0. negate(i8, option:overflow): -> i8
    1. negate(i16, option:overflow): -> i16
    2. negate(i32, option:overflow): -> i32
    3. negate(i64, option:overflow): -> i64
    4. negate(fp32): -> fp32
    5. negate(fp64): -> fp64

    Negation of the value

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    modulus

    Implementations:
    modulus(x, y, option:division_type, option:overflow, option:on_domain_error): -> return_type
    0. modulus(i8, i8, option:division_type, option:overflow, option:on_domain_error): -> i8
    1. modulus(i16, i16, option:division_type, option:overflow, option:on_domain_error): -> i16
    2. modulus(i32, i32, option:division_type, option:overflow, option:on_domain_error): -> i32
    3. modulus(i64, i64, option:division_type, option:overflow, option:on_domain_error): -> i64

    Calculate the remainder (r) when dividing dividend (x) by divisor (y). In mathematics, many conventions for the modulus (mod) operation exist. The result of a mod operation depends on the software implementation and underlying hardware. Substrait is a format for describing compute operations on structured data and is designed for interoperability. Therefore the user is responsible for determining a definition of division as defined by the quotient (q). The following basic conditions of division are satisfied: (1) q ∈ ℤ (the quotient is an integer), (2) x = y * q + r (division rule), (3) abs(r) < abs(y), where q is the quotient. The division_type option determines the mathematical definition of quotient to use in the above definition of division. When division_type=TRUNCATE, q = trunc(x/y). When division_type=FLOOR, q = floor(x/y). In the cases of TRUNCATE and FLOOR division, the remainder is r = x - round_func(x/y) * y. The on_domain_error option governs behavior in cases where y is 0, y is ±inf, or x is ±inf. In these cases the mod is undefined. The overflow option governs behavior when integer overflow occurs. If x and y are both 0 or both ±infinity, behavior will be governed by on_domain_error.

    Options:
  • division_type ['TRUNCATE', 'FLOOR']
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • on_domain_error ['NULL', 'ERROR']
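The two division_type conventions can be compared with a small sketch (illustrative only; it uses float division, so it is not exact for very large integers):

```python
# Hypothetical sketch: r = x - q*y, where the quotient q is either
# truncated (rounded toward zero) or floored (rounded toward -inf).

import math

def modulus(x, y, division_type="TRUNCATE"):
    q = math.trunc(x / y) if division_type == "TRUNCATE" else math.floor(x / y)
    return x - q * y
```

For example, with x = 7 and y = -3 the truncated quotient is -2 (remainder 1) while the floored quotient is -3 (remainder -2), so the two conventions disagree whenever the operands have opposite signs.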
    power

    Implementations:
    power(x, y, option:overflow): -> return_type
    0. power(i64, i64, option:overflow): -> i64
    1. power(fp32, fp32): -> fp32
    2. power(fp64, fp64): -> fp64

    Take the power with x as the base and y as exponent.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    sqrt

    Implementations:
    sqrt(x, option:rounding, option:on_domain_error): -> return_type
    0. sqrt(i64, option:rounding, option:on_domain_error): -> fp64
    1. sqrt(fp32, option:rounding, option:on_domain_error): -> fp32
    2. sqrt(fp64, option:rounding, option:on_domain_error): -> fp64

    Square root of the value

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    exp

    Implementations:
    exp(x, option:rounding): -> return_type
    0. exp(fp32, option:rounding): -> fp32
    1. exp(fp64, option:rounding): -> fp64

    The mathematical constant e, raised to the power of the value.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    cos

    Implementations:
    cos(x, option:rounding): -> return_type
    0. cos(fp32, option:rounding): -> fp64
    1. cos(fp64, option:rounding): -> fp64

    Get the cosine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    sin

    Implementations:
    sin(x, option:rounding): -> return_type
    0. sin(fp32, option:rounding): -> fp64
    1. sin(fp64, option:rounding): -> fp64

    Get the sine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    tan

    Implementations:
    tan(x, option:rounding): -> return_type
    0. tan(fp32, option:rounding): -> fp64
    1. tan(fp64, option:rounding): -> fp64

    Get the tangent of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    cosh

    Implementations:
    cosh(x, option:rounding): -> return_type
    0. cosh(fp32, option:rounding): -> fp32
    1. cosh(fp64, option:rounding): -> fp64

    Get the hyperbolic cosine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    sinh

    Implementations:
    sinh(x, option:rounding): -> return_type
    0. sinh(fp32, option:rounding): -> fp32
    1. sinh(fp64, option:rounding): -> fp64

    Get the hyperbolic sine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    tanh

    Implementations:
    tanh(x, option:rounding): -> return_type
    0. tanh(fp32, option:rounding): -> fp32
    1. tanh(fp64, option:rounding): -> fp64

    Get the hyperbolic tangent of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    acos

    Implementations:
    acos(x, option:rounding, option:on_domain_error): -> return_type
    0. acos(fp32, option:rounding, option:on_domain_error): -> fp64
    1. acos(fp64, option:rounding, option:on_domain_error): -> fp64

    Get the arccosine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    asin

    Implementations:
    asin(x, option:rounding, option:on_domain_error): -> return_type
    0. asin(fp32, option:rounding, option:on_domain_error): -> fp64
    1. asin(fp64, option:rounding, option:on_domain_error): -> fp64

    Get the arcsine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    atan

    Implementations:
    atan(x, option:rounding): -> return_type
    0. atan(fp32, option:rounding): -> fp64
    1. atan(fp64, option:rounding): -> fp64

    Get the arctangent of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    acosh

    Implementations:
    acosh(x, option:rounding, option:on_domain_error): -> return_type
    0. acosh(fp32, option:rounding, option:on_domain_error): -> fp32
    1. acosh(fp64, option:rounding, option:on_domain_error): -> fp64

    Get the hyperbolic arccosine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    asinh

    Implementations:
    asinh(x, option:rounding): -> return_type
    0. asinh(fp32, option:rounding): -> fp32
    1. asinh(fp64, option:rounding): -> fp64

    Get the hyperbolic arcsine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    atanh

    Implementations:
    atanh(x, option:rounding, option:on_domain_error): -> return_type
    0. atanh(fp32, option:rounding, option:on_domain_error): -> fp32
    1. atanh(fp64, option:rounding, option:on_domain_error): -> fp64

    Get the hyperbolic arctangent of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    atan2

    Implementations:
    atan2(x, y, option:rounding, option:on_domain_error): -> return_type
    0. atan2(fp32, fp32, option:rounding, option:on_domain_error): -> fp64
    1. atan2(fp64, fp64, option:rounding, option:on_domain_error): -> fp64

    Get the arctangent of values given as x/y pairs.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    radians

    Implementations:
    radians(x, option:rounding): -> return_type
    0. radians(fp32, option:rounding): -> fp32
    1. radians(fp64, option:rounding): -> fp64

    Converts angle x in degrees to radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    degrees

    Implementations:
    degrees(x, option:rounding): -> return_type
    0. degrees(fp32, option:rounding): -> fp32
    1. degrees(fp64, option:rounding): -> fp64

    Converts angle x in radians to degrees.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    abs

    Implementations:
    abs(x, option:overflow): -> return_type
    0. abs(i8, option:overflow): -> i8
    1. abs(i16, option:overflow): -> i16
    2. abs(i32, option:overflow): -> i32
    3. abs(i64, option:overflow): -> i64
    4. abs(fp32): -> fp32
    5. abs(fp64): -> fp64

    Calculate the absolute value of the argument. Integer values allow the specification of overflow behavior to handle the asymmetry of the two's complement range, e.g. the Int8 range [-128 : 127].

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
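The two's complement asymmetry matters exactly at the minimum value: abs(-128) does not fit in i8. A sketch of how the overflow option could resolve this (illustrative only; SILENT is modeled here as wraparound, which is one possible unchecked behavior):

```python
# Hypothetical sketch: abs on i8 with overflow handling for abs(-128),
# which exceeds the i8 maximum of 127.

I8_MIN, I8_MAX = -128, 127

def abs_i8(x, overflow="SILENT"):
    r = abs(x)
    if r > I8_MAX:
        if overflow == "SATURATE":
            return I8_MAX                      # clamp to the representable max
        if overflow == "ERROR":
            raise OverflowError("abs(-128) does not fit in i8")
        r = (r + 128) % 256 - 128              # SILENT modeled as wraparound
    return r
```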
    sign

    Implementations:
    sign(x): -> return_type
    0. sign(i8): -> i8
    1. sign(i16): -> i16
    2. sign(i32): -> i32
    3. sign(i64): -> i64
    4. sign(fp32): -> fp32
    5. sign(fp64): -> fp64

    Return the signedness of the argument. Integer values return signedness with the same type as the input; possible return values are [-1, 0, 1]. Floating point values return signedness with the same type as the input; possible return values are [-1.0, -0.0, 0.0, 1.0, NaN].

    factorial

    Implementations:
    factorial(n, option:overflow): -> return_type
    0. factorial(i32, option:overflow): -> i32
    1. factorial(i64, option:overflow): -> i64

    Return the factorial of a given integer input. By convention, 0! is 1. Negative inputs will raise an error.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    bitwise_not

    Implementations:
    bitwise_not(x): -> return_type
    0. bitwise_not(i8): -> i8
    1. bitwise_not(i16): -> i16
    2. bitwise_not(i32): -> i32
    3. bitwise_not(i64): -> i64

    Return the bitwise NOT result for one integer input.

    bitwise_and

    Implementations:
    bitwise_and(x, y): -> return_type
    0. bitwise_and(i8, i8): -> i8
    1. bitwise_and(i16, i16): -> i16
    2. bitwise_and(i32, i32): -> i32
    3. bitwise_and(i64, i64): -> i64

    Return the bitwise AND result for two integer inputs.

    bitwise_or

    Implementations:
    bitwise_or(x, y): -> return_type
    0. bitwise_or(i8, i8): -> i8
    1. bitwise_or(i16, i16): -> i16
    2. bitwise_or(i32, i32): -> i32
    3. bitwise_or(i64, i64): -> i64

    Return the bitwise OR result for two given integer inputs.

    bitwise_xor

    Implementations:
    bitwise_xor(x, y): -> return_type
    0. bitwise_xor(i8, i8): -> i8
    1. bitwise_xor(i16, i16): -> i16
    2. bitwise_xor(i32, i32): -> i32
    3. bitwise_xor(i64, i64): -> i64

    Return the bitwise XOR result for two integer inputs.

    Aggregate Functions

    sum

    Implementations:
    sum(x, option:overflow): -> return_type
    0. sum(i8, option:overflow): -> i64?
    1. sum(i16, option:overflow): -> i64?
    2. sum(i32, option:overflow): -> i64?
    3. sum(i64, option:overflow): -> i64?
    4. sum(fp32, option:overflow): -> fp64?
    5. sum(fp64, option:overflow): -> fp64?

    Sum a set of values. The sum of zero elements yields null.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    sum0

    Implementations:
    sum0(x, option:overflow): -> return_type
    0. sum0(i8, option:overflow): -> i64
    1. sum0(i16, option:overflow): -> i64
    2. sum0(i32, option:overflow): -> i64
    3. sum0(i64, option:overflow): -> i64
    4. sum0(fp32, option:overflow): -> fp64
    5. sum0(fp64, option:overflow): -> fp64

    Sum a set of values. The sum of zero elements yields zero. Null values are ignored.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
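The difference between sum and sum0 is only in the empty case, which can be sketched directly (illustrative model, with None standing in for null; null handling for sum is assumed to mirror sum0):

```python
# Hypothetical sketch: sum of zero (non-null) elements yields null,
# while sum0 yields zero; nulls (None) are skipped in both.

def sql_sum(values):
    vs = [v for v in values if v is not None]
    return sum(vs) if vs else None   # empty input -> null

def sql_sum0(values):
    return sum(v for v in values if v is not None)  # empty input -> 0
```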
    avg

    Implementations:
    avg(x, option:overflow): -> return_type
    0. avg(i8, option:overflow): -> i8?
    1. avg(i16, option:overflow): -> i16?
    2. avg(i32, option:overflow): -> i32?
    3. avg(i64, option:overflow): -> i64?
    4. avg(fp32, option:overflow): -> fp32?
    5. avg(fp64, option:overflow): -> fp64?

    Average a set of values. For integral types, this truncates partial values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    min

    Implementations:
    min(x): -> return_type
    0. min(i8): -> i8?
    1. min(i16): -> i16?
    2. min(i32): -> i32?
    3. min(i64): -> i64?
    4. min(fp32): -> fp32?
    5. min(fp64): -> fp64?
    6. min(timestamp): -> timestamp?
    7. min(timestamp_tz): -> timestamp_tz?

    Min a set of values.

    max

    Implementations:
    max(x): -> return_type
    0. max(i8): -> i8?
    1. max(i16): -> i16?
    2. max(i32): -> i32?
    3. max(i64): -> i64?
    4. max(fp32): -> fp32?
    5. max(fp64): -> fp64?
    6. max(timestamp): -> timestamp?
    7. max(timestamp_tz): -> timestamp_tz?

    Max a set of values.

    product

    Implementations:
    product(x, option:overflow): -> return_type
    0. product(i8, option:overflow): -> i8
    1. product(i16, option:overflow): -> i16
    2. product(i32, option:overflow): -> i32
    3. product(i64, option:overflow): -> i64
    4. product(fp32, option:rounding): -> fp32
    5. product(fp64, option:rounding): -> fp64

    Product of a set of values. Returns 1 for empty input.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    std_dev

    Implementations:
    std_dev(x, option:rounding, option:distribution): -> return_type
    0. std_dev(fp32, option:rounding, option:distribution): -> fp32?
    1. std_dev(fp64, option:rounding, option:distribution): -> fp64?

    Calculates standard-deviation for a set of values.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • distribution ['SAMPLE', 'POPULATION']
    variance

    Implementations:
    variance(x, option:rounding, option:distribution): -> return_type
    0. variance(fp32, option:rounding, option:distribution): -> fp32?
    1. variance(fp64, option:rounding, option:distribution): -> fp64?

    Calculates variance for a set of values.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • distribution ['SAMPLE', 'POPULATION']
    corr

    Implementations:
    corr(x, y, option:rounding): -> return_type
    0. corr(fp32, fp32, option:rounding): -> fp32?
    1. corr(fp64, fp64, option:rounding): -> fp64?

    Calculates the value of Pearson’s correlation coefficient between x and y. If there is no input, null is returned.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    mode

    Implementations:
    mode(x): -> return_type
    0. mode(i8): -> i8?
    1. mode(i16): -> i16?
    2. mode(i32): -> i32?
    3. mode(i64): -> i64?
    4. mode(fp32): -> fp32?
    5. mode(fp64): -> fp64?

    Calculates mode for a set of values. If there is no input, null is returned.

    median

    Implementations:
    median(precision, x, option:rounding): -> return_type
    0. median(precision, i8, option:rounding): -> i8?
    1. median(precision, i16, option:rounding): -> i16?
    2. median(precision, i32, option:rounding): -> i32?
    3. median(precision, i64, option:rounding): -> i64?
    4. median(precision, fp32, option:rounding): -> fp32?
    5. median(precision, fp64, option:rounding): -> fp64?

    Calculate the median for a set of values. Returns null if applied to zero records. For the integer implementations, the rounding option determines how the median should be rounded if it ends up midway between two values. For the floating point implementations, it specifies the usual floating point rounding mode.

    Options:
  • precision ['EXACT', 'APPROXIMATE']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    quantile

    Implementations:
    quantile(boundaries, precision, n, distribution, option:rounding): -> return_type

  • n: A positive integer which defines the number of quantile partitions.
  • distribution: The data for which the quantiles should be computed.
    0. quantile(boundaries, precision, i64, any, option:rounding): -> LIST?<any>

    *Calculates quantiles for a set of values. This function will divide the aggregated values (passed via the distribution argument) over N equally-sized bins, where N is passed via a constant argument. It will then return the values at the boundaries of these bins in list form. If the input is appropriately sorted, this computes the quantiles of the distribution. The function can optionally return the first and/or last element of the input, as specified by the boundaries argument. If the input is appropriately sorted, this will thus be the minimum and/or maximum values of the distribution. When the boundaries do not lie exactly on elements of the incoming distribution, the function will interpolate between the two nearby elements. If the interpolated value cannot be represented exactly, the rounding option controls how the value should be selected or computed. The function fails and returns null in the following cases: - n is null or less than one; - any value in distribution is null.

    The function returns an empty list if n equals 1 and boundaries is set to NEITHER. *

    Options:
  • boundaries ['NEITHER', 'MINIMUM', 'MAXIMUM', 'BOTH']
  • precision ['EXACT', 'APPROXIMATE']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • Window Functions

    row_number

    Implementations:
    0. row_number(): -> i64?

    the number of the current row within its partition.

    rank

    Implementations:
    0. rank(): -> i64?

    the rank of the current row, with gaps.

    dense_rank

    Implementations:
    0. dense_rank(): -> i64?

    the rank of the current row, without gaps.

    percent_rank

    Implementations:
    0. percent_rank(): -> fp64?

    the relative rank of the current row.

    cume_dist

    Implementations:
    0. cume_dist(): -> fp64?

    the cumulative distribution.

    ntile

    Implementations:
    ntile(x): -> return_type
    0. ntile(i32): -> i32?
    1. ntile(i64): -> i64?

    Return an integer ranging from 1 to the argument value,dividing the partition as equally as possible.

    first_value

    Implementations:
    first_value(expression): -> return_type
    0. first_value(any1): -> any1

    *Returns the first value in the window. *

    last_value

    Implementations:
    last_value(expression): -> return_type
    0. last_value(any1): -> any1

    *Returns the last value in the window. *

    nth_value

    Implementations:
    nth_value(expression, window_offset, option:on_domain_error): -> return_type
    0. nth_value(any1, i32, option:on_domain_error): -> any1?

    *Returns a value from the nth row based on the window_offset. window_offset should be a positive integer. If the value of the window_offset is outside the range of the window, null is returned. The on_domain_error option governs behavior in cases where window_offset is not a positive integer or null. *

    Options:
  • on_domain_error ['NAN', 'ERROR']
  • lead

    Implementations:
    lead(expression): -> return_type
    0. lead(any1): -> any1?
    1. lead(any1, i32): -> any1?
    2. lead(any1, i32, any1): -> any1?

    *Return a value from a following row based on a specified physical offset. This allows you to compare a value in the current row against a following row. The expression is evaluated against a row that comes after the current row based on the row_offset. The row_offset should be a positive integer and is set to 1 if not specified explicitly. If the row_offset is negative, the expression will be evaluated against a row coming before the current row, similar to the lag function. A row_offset of null will return null. The function returns the default input value if row_offset goes beyond the scope of the window. If a default value is not specified, it is set to null. Example comparing the sales of the current year to the following year. row_offset of 1. | year | sales | next_year_sales | | 2019 | 20.50 | 30.00 | | 2020 | 30.00 | 45.99 | | 2021 | 45.99 | null | *

    lag

    Implementations:
    lag(expression): -> return_type
    0. lag(any1): -> any1?
    1. lag(any1, i32): -> any1?
    2. lag(any1, i32, any1): -> any1?

    *Return a column value from a previous row based on a specified physical offset. This allows you to compare a value in the current row against a previous row. The expression is evaluated against a row that comes before the current row based on the row_offset. The expression can be a column, expression or subquery that evaluates to a single value. The row_offset should be a positive integer and is set to 1 if not specified explicitly. If the row_offset is negative, the expression will be evaluated against a row coming after the current row, similar to the lead function. A row_offset of null will return null. The function returns the default input value if row_offset goes beyond the scope of the partition. If a default value is not specified, it is set to null. Example comparing the sales of the current year to the previous year. row_offset of 1. | year | sales | previous_year_sales | | 2019 | 20.50 | null | | 2020 | 30.00 | 20.50 | | 2021 | 45.99 | 30.00 | *

    GitHub

    functions_arithmetic.yaml

    This document file is generated for functions_arithmetic.yaml

    Scalar Functions

    add

    Implementations:
    add(x, y, option:overflow): -> return_type
    0. add(i8, i8, option:overflow): -> i8
    1. add(i16, i16, option:overflow): -> i16
    2. add(i32, i32, option:overflow): -> i32
    3. add(i64, i64, option:overflow): -> i64
    4. add(fp32, fp32, option:rounding): -> fp32
    5. add(fp64, fp64, option:rounding): -> fp64

    Add two values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    subtract

    Implementations:
    subtract(x, y, option:overflow): -> return_type
    0. subtract(i8, i8, option:overflow): -> i8
    1. subtract(i16, i16, option:overflow): -> i16
    2. subtract(i32, i32, option:overflow): -> i32
    3. subtract(i64, i64, option:overflow): -> i64
    4. subtract(fp32, fp32, option:rounding): -> fp32
    5. subtract(fp64, fp64, option:rounding): -> fp64

    Subtract one value from another.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    multiply

    Implementations:
    multiply(x, y, option:overflow): -> return_type
    0. multiply(i8, i8, option:overflow): -> i8
    1. multiply(i16, i16, option:overflow): -> i16
    2. multiply(i32, i32, option:overflow): -> i32
    3. multiply(i64, i64, option:overflow): -> i64
    4. multiply(fp32, fp32, option:rounding): -> fp32
    5. multiply(fp64, fp64, option:rounding): -> fp64

    Multiply two values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    divide

    Implementations:
    divide(x, y, option:overflow, option:on_domain_error, option:on_division_by_zero): -> return_type
    0. divide(i8, i8, option:overflow, option:on_domain_error, option:on_division_by_zero): -> i8
    1. divide(i16, i16, option:overflow, option:on_domain_error, option:on_division_by_zero): -> i16
    2. divide(i32, i32, option:overflow, option:on_domain_error, option:on_division_by_zero): -> i32
    3. divide(i64, i64, option:overflow, option:on_domain_error, option:on_division_by_zero): -> i64
    4. divide(fp32, fp32, option:rounding, option:on_domain_error, option:on_division_by_zero): -> fp32
    5. divide(fp64, fp64, option:rounding, option:on_domain_error, option:on_division_by_zero): -> fp64

    *Divide x by y. In the case of integer division, partial values are truncated (i.e. rounded towards 0). The on_division_by_zero option governs behavior in cases where y is 0. If the option is IEEE then the IEEE754 standard is followed: all values except ±infinity return NaN and ±infinity are unchanged. If the option is LIMIT then the result is ±infinity in all cases. If either x or y are NaN then behavior will be governed by on_domain_error. If x and y are both ±infinity, behavior will be governed by on_domain_error. *

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • on_domain_error ['NULL', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • on_division_by_zero ['IEEE', 'LIMIT', 'NULL', 'ERROR']
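    The division-by-zero behavior described above can be sketched in Python. This is an illustrative reading of the description, not the Substrait implementation; in particular, taking the sign of the LIMIT result from the numerator is an assumption of this sketch.

```python
import math

def divide_fp(x: float, y: float, on_division_by_zero: str = "IEEE") -> float:
    """Illustrative sketch of the floating-point divide option semantics."""
    if y == 0.0:
        if on_division_by_zero == "IEEE":
            # As described above: +/-infinity passes through unchanged,
            # every other numerator yields NaN.
            return x if math.isinf(x) else math.nan
        if on_division_by_zero == "LIMIT":
            # As described above: the result is +/-infinity in all cases
            # (the sign is taken from x, an assumption of this sketch).
            return math.copysign(math.inf, x)
        raise ValueError(on_division_by_zero)
    return x / y
```
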
    negate

    Implementations:
    negate(x, option:overflow): -> return_type
    0. negate(i8, option:overflow): -> i8
    1. negate(i16, option:overflow): -> i16
    2. negate(i32, option:overflow): -> i32
    3. negate(i64, option:overflow): -> i64
    4. negate(fp32): -> fp32
    5. negate(fp64): -> fp64

    Negation of the value

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    modulus

    Implementations:
    modulus(x, y, option:division_type, option:overflow, option:on_domain_error): -> return_type
    0. modulus(i8, i8, option:division_type, option:overflow, option:on_domain_error): -> i8
    1. modulus(i16, i16, option:division_type, option:overflow, option:on_domain_error): -> i16
    2. modulus(i32, i32, option:division_type, option:overflow, option:on_domain_error): -> i32
    3. modulus(i64, i64, option:division_type, option:overflow, option:on_domain_error): -> i64

    *Calculate the remainder (r) when dividing dividend (x) by divisor (y). In mathematics, many conventions for the modulus (mod) operation exist. The result of a mod operation depends on the software implementation and underlying hardware. Substrait is a format for describing compute operations on structured data and is designed for interoperability. Therefore the user is responsible for determining a definition of division as defined by the quotient (q). The following basic conditions of division are satisfied: (1) q ∈ ℤ (the quotient is an integer), (2) x = y * q + r (division rule), (3) abs(r) < abs(y), where q is the quotient. The division_type option determines the mathematical definition of quotient to use in the above definition of division. When division_type=TRUNCATE, q = trunc(x/y). When division_type=FLOOR, q = floor(x/y). In the cases of TRUNCATE and FLOOR division: remainder r = x - round_func(x/y). The on_domain_error option governs behavior in cases where y is 0, y is ±inf, or x is ±inf. In these cases the mod is undefined. The overflow option governs behavior when integer overflow occurs. If x and y are both 0 or both ±infinity, behavior will be governed by on_domain_error. *

    Options:
  • division_type ['TRUNCATE', 'FLOOR']
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • on_domain_error ['NULL', 'ERROR']
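    The TRUNCATE and FLOOR conventions above give different remainders when the operands have mixed signs. A small illustrative sketch of the definition r = x - y * q (not the normative implementation):

```python
import math

def modulus(x: int, y: int, division_type: str = "TRUNCATE") -> int:
    """Remainder r = x - y * q, where division_type picks the quotient q."""
    if y == 0:
        raise ValueError("mod undefined for y = 0 (see on_domain_error)")
    # q = trunc(x/y) for TRUNCATE, q = floor(x/y) for FLOOR
    q = math.trunc(x / y) if division_type == "TRUNCATE" else math.floor(x / y)
    return x - y * q
```

    With x = -7 and y = 2: TRUNCATE gives q = -3 and r = -1, while FLOOR gives q = -4 and r = 1; both satisfy x = y * q + r.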
    power

    Implementations:
    power(x, y, option:overflow): -> return_type
    0. power(i64, i64, option:overflow): -> i64
    1. power(fp32, fp32): -> fp32
    2. power(fp64, fp64): -> fp64

    Take the power with x as the base and y as exponent.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    sqrt

    Implementations:
    sqrt(x, option:rounding, option:on_domain_error): -> return_type
    0. sqrt(i64, option:rounding, option:on_domain_error): -> fp64
    1. sqrt(fp32, option:rounding, option:on_domain_error): -> fp32
    2. sqrt(fp64, option:rounding, option:on_domain_error): -> fp64

    Square root of the value

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    exp

    Implementations:
    exp(x, option:rounding): -> return_type
    0. exp(fp32, option:rounding): -> fp32
    1. exp(fp64, option:rounding): -> fp64

    The mathematical constant e, raised to the power of the value.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    cos

    Implementations:
    cos(x, option:rounding): -> return_type
    0. cos(fp32, option:rounding): -> fp64
    1. cos(fp64, option:rounding): -> fp64

    Get the cosine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    sin

    Implementations:
    sin(x, option:rounding): -> return_type
    0. sin(fp32, option:rounding): -> fp64
    1. sin(fp64, option:rounding): -> fp64

    Get the sine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    tan

    Implementations:
    tan(x, option:rounding): -> return_type
    0. tan(fp32, option:rounding): -> fp64
    1. tan(fp64, option:rounding): -> fp64

    Get the tangent of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    cosh

    Implementations:
    cosh(x, option:rounding): -> return_type
    0. cosh(fp32, option:rounding): -> fp32
    1. cosh(fp64, option:rounding): -> fp64

    Get the hyperbolic cosine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    sinh

    Implementations:
    sinh(x, option:rounding): -> return_type
    0. sinh(fp32, option:rounding): -> fp32
    1. sinh(fp64, option:rounding): -> fp64

    Get the hyperbolic sine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    tanh

    Implementations:
    tanh(x, option:rounding): -> return_type
    0. tanh(fp32, option:rounding): -> fp32
    1. tanh(fp64, option:rounding): -> fp64

    Get the hyperbolic tangent of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    acos

    Implementations:
    acos(x, option:rounding, option:on_domain_error): -> return_type
    0. acos(fp32, option:rounding, option:on_domain_error): -> fp64
    1. acos(fp64, option:rounding, option:on_domain_error): -> fp64

    Get the arccosine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    asin

    Implementations:
    asin(x, option:rounding, option:on_domain_error): -> return_type
    0. asin(fp32, option:rounding, option:on_domain_error): -> fp64
    1. asin(fp64, option:rounding, option:on_domain_error): -> fp64

    Get the arcsine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    atan

    Implementations:
    atan(x, option:rounding): -> return_type
    0. atan(fp32, option:rounding): -> fp64
    1. atan(fp64, option:rounding): -> fp64

    Get the arctangent of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    acosh

    Implementations:
    acosh(x, option:rounding, option:on_domain_error): -> return_type
    0. acosh(fp32, option:rounding, option:on_domain_error): -> fp32
    1. acosh(fp64, option:rounding, option:on_domain_error): -> fp64

    Get the hyperbolic arccosine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    asinh

    Implementations:
    asinh(x, option:rounding): -> return_type
    0. asinh(fp32, option:rounding): -> fp32
    1. asinh(fp64, option:rounding): -> fp64

    Get the hyperbolic arcsine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    atanh

    Implementations:
    atanh(x, option:rounding, option:on_domain_error): -> return_type
    0. atanh(fp32, option:rounding, option:on_domain_error): -> fp32
    1. atanh(fp64, option:rounding, option:on_domain_error): -> fp64

    Get the hyperbolic arctangent of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    atan2

    Implementations:
    atan2(x, y, option:rounding, option:on_domain_error): -> return_type
    0. atan2(fp32, fp32, option:rounding, option:on_domain_error): -> fp64
    1. atan2(fp64, fp64, option:rounding, option:on_domain_error): -> fp64

    Get the arctangent of values given as x/y pairs.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    radians

    Implementations:
    radians(x, option:rounding): -> return_type
    0. radians(fp32, option:rounding): -> fp32
    1. radians(fp64, option:rounding): -> fp64

    *Converts angle x in degrees to radians. *

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    degrees

    Implementations:
    degrees(x, option:rounding): -> return_type
    0. degrees(fp32, option:rounding): -> fp32
    1. degrees(fp64, option:rounding): -> fp64

    *Converts angle x in radians to degrees. *

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    abs

    Implementations:
    abs(x, option:overflow): -> return_type
    0. abs(i8, option:overflow): -> i8
    1. abs(i16, option:overflow): -> i16
    2. abs(i32, option:overflow): -> i32
    3. abs(i64, option:overflow): -> i64
    4. abs(fp32): -> fp32
    5. abs(fp64): -> fp64

    *Calculate the absolute value of the argument. Integer values allow the specification of overflow behavior to handle the unevenness of two's complement, e.g. the i8 range [-128 : 127]. *

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
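    Because the i8 range [-128, 127] is asymmetric, abs(-128) cannot be represented as an i8. A sketch of how the overflow option might resolve this; SILENT is shown as two's-complement wraparound, which is an assumption of this illustration:

```python
INT8_MIN, INT8_MAX = -128, 127

def abs_i8(x: int, overflow: str = "ERROR") -> int:
    """Illustrative abs for i8 with explicit overflow handling."""
    r = abs(x)
    if r > INT8_MAX:
        if overflow == "SATURATE":
            return INT8_MAX                     # clamp to the largest i8
        if overflow == "ERROR":
            raise OverflowError("abs(-128) does not fit in i8")
        # SILENT: result is unspecified; wrap like two's complement here.
        return (r + 128) % 256 - 128
    return r
```
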
    sign

    Implementations:
    sign(x): -> return_type
    0. sign(i8): -> i8
    1. sign(i16): -> i16
    2. sign(i32): -> i32
    3. sign(i64): -> i64
    4. sign(fp32): -> fp32
    5. sign(fp64): -> fp64

    *Return the signedness of the argument. Integer values return signedness with the same type as the input. Possible return values are [-1, 0, 1]. Floating point values return signedness with the same type as the input. Possible return values are [-1.0, -0.0, 0.0, 1.0, NaN]. *

    factorial

    Implementations:
    factorial(n, option:overflow): -> return_type
    0. factorial(i32, option:overflow): -> i32
    1. factorial(i64, option:overflow): -> i64

    *Return the factorial of a given integer input. 0! is 1 by convention. Negative inputs will raise an error. *

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    bitwise_not

    Implementations:
    bitwise_not(x): -> return_type
    0. bitwise_not(i8): -> i8
    1. bitwise_not(i16): -> i16
    2. bitwise_not(i32): -> i32
    3. bitwise_not(i64): -> i64

    *Return the bitwise NOT result for one integer input. *

    bitwise_and

    Implementations:
    bitwise_and(x, y): -> return_type
    0. bitwise_and(i8, i8): -> i8
    1. bitwise_and(i16, i16): -> i16
    2. bitwise_and(i32, i32): -> i32
    3. bitwise_and(i64, i64): -> i64

    *Return the bitwise AND result for two integer inputs. *

    bitwise_or

    Implementations:
    bitwise_or(x, y): -> return_type
    0. bitwise_or(i8, i8): -> i8
    1. bitwise_or(i16, i16): -> i16
    2. bitwise_or(i32, i32): -> i32
    3. bitwise_or(i64, i64): -> i64

    *Return the bitwise OR result for two given integer inputs. *

    bitwise_xor

    Implementations:
    bitwise_xor(x, y): -> return_type
    0. bitwise_xor(i8, i8): -> i8
    1. bitwise_xor(i16, i16): -> i16
    2. bitwise_xor(i32, i32): -> i32
    3. bitwise_xor(i64, i64): -> i64

    *Return the bitwise XOR result for two integer inputs. *

    Aggregate Functions

    sum

    Implementations:
    sum(x, option:overflow): -> return_type
    0. sum(i8, option:overflow): -> i64?
    1. sum(i16, option:overflow): -> i64?
    2. sum(i32, option:overflow): -> i64?
    3. sum(i64, option:overflow): -> i64?
    4. sum(fp32, option:overflow): -> fp64?
    5. sum(fp64, option:overflow): -> fp64?

    Sum a set of values. The sum of zero elements yields null.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    sum0

    Implementations:
    sum0(x, option:overflow): -> return_type
    0. sum0(i8, option:overflow): -> i64
    1. sum0(i16, option:overflow): -> i64
    2. sum0(i32, option:overflow): -> i64
    3. sum0(i64, option:overflow): -> i64
    4. sum0(fp32, option:overflow): -> fp64
    5. sum0(fp64, option:overflow): -> fp64

    *Sum a set of values. The sum of zero elements yields zero. Null values are ignored. *

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
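    The only difference between sum and sum0 is their result on empty input. A sketch with Python's None standing in for null; null-skipping for sum is assumed from standard SQL aggregate semantics:

```python
def agg_sum(values):
    """sum: null (None) for zero input values; nulls assumed ignored."""
    vals = [v for v in values if v is not None]
    return sum(vals) if vals else None

def agg_sum0(values):
    """sum0: zero for zero input values; null values are ignored."""
    return sum(v for v in values if v is not None)
```
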
    avg

    Implementations:
    avg(x, option:overflow): -> return_type
    0. avg(i8, option:overflow): -> i8?
    1. avg(i16, option:overflow): -> i16?
    2. avg(i32, option:overflow): -> i32?
    3. avg(i64, option:overflow): -> i64?
    4. avg(fp32, option:overflow): -> fp32?
    5. avg(fp64, option:overflow): -> fp64?

    Average a set of values. For integral types, this truncates partial values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    min

    Implementations:
    min(x): -> return_type
    0. min(i8): -> i8?
    1. min(i16): -> i16?
    2. min(i32): -> i32?
    3. min(i64): -> i64?
    4. min(fp32): -> fp32?
    5. min(fp64): -> fp64?
    6. min(timestamp): -> timestamp?
    7. min(timestamp_tz): -> timestamp_tz?

    Min a set of values.

    max

    Implementations:
    max(x): -> return_type
    0. max(i8): -> i8?
    1. max(i16): -> i16?
    2. max(i32): -> i32?
    3. max(i64): -> i64?
    4. max(fp32): -> fp32?
    5. max(fp64): -> fp64?
    6. max(timestamp): -> timestamp?
    7. max(timestamp_tz): -> timestamp_tz?

    Max a set of values.

    product

    Implementations:
    product(x, option:overflow): -> return_type
    0. product(i8, option:overflow): -> i8
    1. product(i16, option:overflow): -> i16
    2. product(i32, option:overflow): -> i32
    3. product(i64, option:overflow): -> i64
    4. product(fp32, option:rounding): -> fp32
    5. product(fp64, option:rounding): -> fp64

    Product of a set of values. Returns 1 for empty input.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    std_dev

    Implementations:
    std_dev(x, option:rounding, option:distribution): -> return_type
    0. std_dev(fp32, option:rounding, option:distribution): -> fp32?
    1. std_dev(fp64, option:rounding, option:distribution): -> fp64?

    Calculates the standard deviation for a set of values.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • distribution ['SAMPLE', 'POPULATION']
    variance

    Implementations:
    variance(x, option:rounding, option:distribution): -> return_type
    0. variance(fp32, option:rounding, option:distribution): -> fp32?
    1. variance(fp64, option:rounding, option:distribution): -> fp64?

    Calculates variance for a set of values.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • distribution ['SAMPLE', 'POPULATION']
    corr

    Implementations:
    corr(x, y, option:rounding): -> return_type
    0. corr(fp32, fp32, option:rounding): -> fp32?
    1. corr(fp64, fp64, option:rounding): -> fp64?

    *Calculates the value of Pearson’s correlation coefficient between x and y. If there is no input, null is returned. *

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    mode

    Implementations:
    mode(x): -> return_type
    0. mode(i8): -> i8?
    1. mode(i16): -> i16?
    2. mode(i32): -> i32?
    3. mode(i64): -> i64?
    4. mode(fp32): -> fp32?
    5. mode(fp64): -> fp64?

    *Calculates mode for a set of values. If there is no input, null is returned. *

    median

    Implementations:
    median(precision, x, option:rounding): -> return_type
    0. median(precision, i8, option:rounding): -> i8?
    1. median(precision, i16, option:rounding): -> i16?
    2. median(precision, i32, option:rounding): -> i32?
    3. median(precision, i64, option:rounding): -> i64?
    4. median(precision, fp32, option:rounding): -> fp32?
    5. median(precision, fp64, option:rounding): -> fp64?

    *Calculate the median for a set of values. Returns null if applied to zero records. For the integer implementations, the rounding option determines how the median should be rounded if it ends up midway between two values. For the floating point implementations, the rounding option specifies the usual floating point rounding mode. *

    Options:
  • precision ['EXACT', 'APPROXIMATE']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
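    For the integer implementations, the rounding option only matters when the count is even and the midpoint falls between two values. An illustrative sketch of a few of the modes (TRUNCATE and TIE_AWAY_FROM_ZERO omitted; not the normative algorithm):

```python
def median_int(values, rounding="TIE_TO_EVEN"):
    """Illustrative integer median with explicit midpoint rounding."""
    if not values:
        return None                       # applied to zero records -> null
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:
        return s[n // 2]                  # odd count: exact middle element
    mid2 = s[n // 2 - 1] + s[n // 2]      # twice the midpoint, exact in ints
    if mid2 % 2 == 0:
        return mid2 // 2                  # midpoint is already an integer
    if rounding == "FLOOR":
        return mid2 // 2                  # Python // floors for negatives too
    if rounding == "CEILING":
        return mid2 // 2 + 1
    if rounding == "TIE_TO_EVEN":
        lo = mid2 // 2
        return lo if lo % 2 == 0 else lo + 1
    raise NotImplementedError(rounding)
```
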
    quantile

    Implementations:
    quantile(boundaries, precision, n, distribution, option:rounding): -> return_type

  • n: A positive integer which defines the number of quantile partitions.
  • distribution: The data for which the quantiles should be computed.
    0. quantile(boundaries, precision, i64, any, option:rounding): -> LIST?<any>

    *Calculates quantiles for a set of values. This function will divide the aggregated values (passed via the distribution argument) over N equally-sized bins, where N is passed via a constant argument. It will then return the values at the boundaries of these bins in list form. If the input is appropriately sorted, this computes the quantiles of the distribution. The function can optionally return the first and/or last element of the input, as specified by the boundaries argument. If the input is appropriately sorted, this will thus be the minimum and/or maximum values of the distribution.

    When the boundaries do not lie exactly on elements of the incoming distribution, the function will interpolate between the two nearby elements. If the interpolated value cannot be represented exactly, the rounding option controls how the value should be selected or computed.

    The function fails and returns null in the following cases:
  • n is null or less than one;
  • any value in distribution is null.

    The function returns an empty list if n equals 1 and boundaries is set to NEITHER. *

    Options:
  • boundaries ['NEITHER', 'MINIMUM', 'MAXIMUM', 'BOTH']
  • precision ['EXACT', 'APPROXIMATE']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
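    The bin-boundary behavior described above can be sketched as follows. Linear interpolation between neighboring elements is assumed, and the rounding and precision options are ignored in this illustration:

```python
def quantile(values, n, boundaries="NEITHER"):
    """Illustrative quantile: boundary values of n equally-sized bins."""
    if n is None or n < 1 or any(v is None for v in values):
        return None                        # failure cases from the description
    s = sorted(values)

    def at(p):                             # value at fractional position p
        i = int(p)
        frac = p - i
        return s[i] if frac == 0 else s[i] + (s[i + 1] - s[i]) * frac

    # n - 1 interior boundaries; empty list when n == 1 and NEITHER.
    cuts = [at(k * (len(s) - 1) / n) for k in range(1, n)]
    if boundaries in ("MINIMUM", "BOTH"):
        cuts.insert(0, s[0])
    if boundaries in ("MAXIMUM", "BOTH"):
        cuts.append(s[-1])
    return cuts
```
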
    Window Functions

    row_number

    Implementations:
    0. row_number(): -> i64?

    The number of the current row within its partition.

    rank

    Implementations:
    0. rank(): -> i64?

    The rank of the current row, with gaps.

    dense_rank

    Implementations:
    0. dense_rank(): -> i64?

    The rank of the current row, without gaps.

    percent_rank

    Implementations:
    0. percent_rank(): -> fp64?

    The relative rank of the current row.

    cume_dist

    Implementations:
    0. cume_dist(): -> fp64?

    The cumulative distribution.

    ntile

    Implementations:
    ntile(x): -> return_type
    0. ntile(i32): -> i32?
    1. ntile(i64): -> i64?

    Return an integer ranging from 1 to the argument value, dividing the partition as equally as possible.
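    For example, splitting a 10-row partition into 3 buckets gives sizes 4, 3, 3. A sketch of the assignment (illustrative only; the spec above defines only the signature and result range):

```python
def ntile_assign(num_rows, buckets):
    """Bucket number (1..buckets) for each row, as equal as possible."""
    base, extra = divmod(num_rows, buckets)
    out = []
    for b in range(1, buckets + 1):
        # The first `extra` buckets each take one additional row.
        out.extend([b] * (base + (1 if b <= extra else 0)))
    return out
```
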

    first_value

    Implementations:
    first_value(expression): -> return_type
    0. first_value(any1): -> any1

    *Returns the first value in the window. *

    last_value

    Implementations:
    last_value(expression): -> return_type
    0. last_value(any1): -> any1

    *Returns the last value in the window. *

    nth_value

    Implementations:
    nth_value(expression, window_offset, option:on_domain_error): -> return_type
    0. nth_value(any1, i32, option:on_domain_error): -> any1?

    *Returns a value from the nth row based on the window_offset. window_offset should be a positive integer. If the value of the window_offset is outside the range of the window, null is returned. The on_domain_error option governs behavior in cases where window_offset is not a positive integer or null. *

    Options:
  • on_domain_error ['NAN', 'ERROR']
    lead

    Implementations:
    lead(expression): -> return_type
    0. lead(any1): -> any1?
    1. lead(any1, i32): -> any1?
    2. lead(any1, i32, any1): -> any1?

    *Return a value from a following row based on a specified physical offset. This allows you to compare a value in the current row against a following row. The expression is evaluated against a row that comes after the current row based on the row_offset. The row_offset should be a positive integer and is set to 1 if not specified explicitly. If the row_offset is negative, the expression will be evaluated against a row coming before the current row, similar to the lag function. A row_offset of null will return null. The function returns the default input value if row_offset goes beyond the scope of the window. If a default value is not specified, it is set to null.

    Example comparing the sales of the current year to the following year, with a row_offset of 1:

    | year | sales | next_year_sales |
    | 2019 | 20.50 | 30.00 |
    | 2020 | 30.00 | 45.99 |
    | 2021 | 45.99 | null | *
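    Over a single ordered partition, the lead semantics above (including the negative-offset and default-value behavior) can be sketched with None standing in for null:

```python
def lead(column, row_offset=1, default=None):
    """Value row_offset rows ahead of each row, or default outside the window."""
    n = len(column)
    return [column[i + row_offset] if 0 <= i + row_offset < n else default
            for i in range(n)]

sales = [20.50, 30.00, 45.99]        # the 2019-2021 sales example above
next_year_sales = lead(sales)        # [30.00, 45.99, None]
```

    A negative offset looks backwards, mirroring lag: lead([1, 2, 3], -1) yields [None, 1, 2].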

    lag

    Implementations:
    lag(expression): -> return_type
    0. lag(any1): -> any1?
    1. lag(any1, i32): -> any1?
    2. lag(any1, i32, any1): -> any1?

    *Return a column value from a previous row based on a specified physical offset. This allows you to compare a value in the current row against a previous row. The expression is evaluated against a row that comes before the current row based on the row_offset. The expression can be a column, expression or subquery that evaluates to a single value. The row_offset should be a positive integer and is set to 1 if not specified explicitly. If the row_offset is negative, the expression will be evaluated against a row coming after the current row, similar to the lead function. A row_offset of null will return null. The function returns the default input value if row_offset goes beyond the scope of the partition. If a default value is not specified, it is set to null.

    Example comparing the sales of the current year to the previous year, with a row_offset of 1:

    | year | sales | previous_year_sales |
    | 2019 | 20.50 | null |
    | 2020 | 30.00 | 20.50 |
    | 2021 | 45.99 | 30.00 | *


    functions_arithmetic_decimal.yaml

    This document file is generated for functions_arithmetic_decimal.yaml

    Scalar Functions

    add

    Implementations:
    add(x, y, option:overflow): -> return_type
    0. add(decimal<P1,S1>, decimal<P2,S2>, option:overflow): ->

    init_scale = max(S1,S2)
     init_prec = init_scale + max(P1 - S1, P2 - S2) + 1
     min_scale = min(init_scale, 6)
     delta = init_prec - 38
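    Worked numerically for a hypothetical pair of operand types (decimal<38,10> and decimal<20,4>, chosen only for illustration), the derivation above gives:

```python
P1, S1 = 38, 10    # hypothetical left operand:  decimal<38,10>
P2, S2 = 20, 4     # hypothetical right operand: decimal<20,4>

init_scale = max(S1, S2)                             # 10
init_prec = init_scale + max(P1 - S1, P2 - S2) + 1   # 10 + 28 + 1 = 39
min_scale = min(init_scale, 6)                       # 6
delta = init_prec - 38                               # 1 digit over the cap
```

    A positive delta indicates that init_prec exceeds the 38-digit maximum, which the remaining (elided) steps of the derivation use to reduce the result scale.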
    @@ -58,4 +58,4 @@
       LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
       FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
       IN THE SOFTWARE.
    --->   
    \ No newline at end of file +-->
    \ No newline at end of file diff --git a/extensions/functions_boolean/index.html b/extensions/functions_boolean/index.html index 211adfac..d206a7c7 100644 --- a/extensions/functions_boolean/index.html +++ b/extensions/functions_boolean/index.html @@ -1,4 +1,4 @@ - functions_boolean.yaml - Substrait: Cross-Language Serialization for Relational Algebra

    functions_boolean.yaml

    This document file is generated for functions_boolean.yaml

    Scalar Functions

    or

    Implementations:
    or(a): -> return_type
    0. or(boolean?): -> boolean?

    *The boolean or using Kleene logic. This function behaves as follows with nulls:

    true or null = true
     
     null or true = true
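A minimal sketch of Kleene-logic OR, with Python's `None` standing in for null:

```python
# Three-valued (Kleene) OR: true dominates null; null dominates false.
def kleene_or(a, b):
    if a is True or b is True:
        return True      # true or anything = true
    if a is None or b is None:
        return None      # null or false = null, null or null = null
    return False         # false or false = false
```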
     

    functions_comparison.yaml

    This document file is generated for functions_comparison.yaml

    Scalar Functions

    not_equal

    Implementations:
    not_equal(x, y): -> return_type
    0. not_equal(any1, any1): -> boolean

    *Whether two values are not equal. not_equal(x, y) := (x != y). If either/both of x and y are null, null is returned.*

    equal

    Implementations:
    equal(x, y): -> return_type
    0. equal(any1, any1): -> boolean

    *Whether two values are equal. equal(x, y) := (x == y) If either/both of x and y are null, null is returned. *

    is_not_distinct_from

    Implementations:
    is_not_distinct_from(x, y): -> return_type
    0. is_not_distinct_from(any1, any1): -> boolean

    *Whether two values are equal. This function treats null values as comparable, so is_not_distinct_from(null, null) == True This is in contrast to equal, in which null values do not compare. *

    lt

    Implementations:
    lt(x, y): -> return_type
    0. lt(any1, any1): -> boolean

    *Less than. lt(x, y) := (x < y) If either/both of x and y are null, null is returned. *

    gt

    Implementations:
    gt(x, y): -> return_type
    0. gt(any1, any1): -> boolean

    *Greater than. gt(x, y) := (x > y) If either/both of x and y are null, null is returned. *

    lte

    Implementations:
    lte(x, y): -> return_type
    0. lte(any1, any1): -> boolean

    *Less than or equal to. lte(x, y) := (x <= y) If either/both of x and y are null, null is returned. *

    gte

    Implementations:
    gte(x, y): -> return_type
    0. gte(any1, any1): -> boolean

    *Greater than or equal to. gte(x, y) := (x >= y) If either/both of x and y are null, null is returned. *
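The null-propagation rule shared by equal, not_equal, lt, gt, lte, and gte can be modeled as follows (an illustrative sketch, not the normative definition):

```python
import operator

# If either operand is null (None), the comparison itself is null;
# otherwise apply the ordinary operator.
def compare(op, x, y):
    if x is None or y is None:
        return None
    return op(x, y)

# compare(operator.lt, 3, None) is None; compare(operator.le, 3, 3) is True
```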

    between

    Implementations:
    between(expression, low, high): -> return_type

  • expression: The expression to test for in the range defined by `low` and `high`.
  • low: The value to check if greater than or equal to.
  • high: The value to check if less than or equal to.
    0. between(any1, any1, any1): -> boolean

    Whether the expression is greater than or equal to low and less than or equal to high: expression BETWEEN low AND high. If low, high, or expression is null, null is returned.
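A sketch of these semantics, with `None` as null:

```python
# expression BETWEEN low AND high, returning null if any input is null.
def between(expression, low, high):
    if expression is None or low is None or high is None:
        return None
    return low <= expression <= high
```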

    is_null

    Implementations:
    is_null(x): -> return_type
    0. is_null(any1): -> boolean

    Whether a value is null. NaN is not null.

    is_not_null

    Implementations:
    is_not_null(x): -> return_type
    0. is_not_null(any1): -> boolean

    Whether a value is not null. NaN is not null.

    is_nan

    Implementations:
    is_nan(x): -> return_type
    0. is_nan(fp32): -> boolean
    1. is_nan(fp64): -> boolean

    *Whether a value is not a number. If x is null, null is returned. *

    is_finite

    Implementations:
    is_finite(x): -> return_type
    0. is_finite(fp32): -> boolean
    1. is_finite(fp64): -> boolean

    *Whether a value is finite (neither infinite nor NaN). If x is null, null is returned. *

    is_infinite

    Implementations:
    is_infinite(x): -> return_type
    0. is_infinite(fp32): -> boolean
    1. is_infinite(fp64): -> boolean

    *Whether a value is infinite. If x is null, null is returned. *

    nullif

    Implementations:
    nullif(x, y): -> return_type
    0. nullif(any1, any1): -> any1

    If two values are equal, return null. Otherwise, return the first value.

    coalesce

    Implementations:
    0. coalesce(any1, any1): -> any1

    Evaluate arguments from left to right and return the first argument that is not null. Once a non-null argument is found, the remaining arguments are not evaluated. If all arguments are null, return null.
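The lazy, left-to-right evaluation can be modeled by passing each argument as a deferred thunk (an illustrative sketch, not how an engine represents expressions):

```python
# coalesce: evaluate arguments left to right, stop at the first non-null.
def coalesce(*thunks):
    for thunk in thunks:
        value = thunk()          # each argument is a deferred expression
        if value is not None:
            return value         # later arguments are never evaluated
    return None

# coalesce(lambda: None, lambda: 7, lambda: 1 / 0) == 7 (third never runs)
```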

    least

    Implementations:
    0. least(T, T): -> T

    Evaluates each argument and returns the smallest one. The function will return null if any argument evaluates to null.

    least_skip_null

    Implementations:
    0. least_skip_null(T, T): -> T

    Evaluates each argument and returns the smallest one. The function will return null only if all arguments evaluate to null.

    greatest

    Implementations:
    0. greatest(T, T): -> T

    Evaluates each argument and returns the largest one. The function will return null if any argument evaluates to null.

    greatest_skip_null

    Implementations:
    0. greatest_skip_null(T, T): -> T

    Evaluates each argument and returns the largest one. The function will return null only if all arguments evaluate to null.
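The four variants above differ only in how nulls are treated; a compact sketch with `None` as null:

```python
# least/greatest: null if ANY argument is null.
def least(*args):
    return None if None in args else min(args)

def greatest(*args):
    return None if None in args else max(args)

# *_skip_null variants: ignore nulls; null only if ALL arguments are null.
def least_skip_null(*args):
    present = [a for a in args if a is not None]
    return min(present) if present else None

def greatest_skip_null(*args):
    present = [a for a in args if a is not None]
    return max(present) if present else None
```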


    functions_datetime.yaml

    This document file is generated for functions_datetime.yaml

    Scalar Functions

    extract

    Implementations:
    extract(component, x, timezone): -> return_type

  • timezone: Timezone string from IANA tzdb.
    0. extract(component, timestamp_tz, string): -> i64
    1. extract(component, precision_timestamp_tz<P1>, string): -> i64
    2. extract(component, timestamp): -> i64
    3. extract(component, precision_timestamp<P1>): -> i64
    4. extract(component, date): -> i64
    5. extract(component, time): -> i64
    6. extract(component, indexing, timestamp_tz, string): -> i64
    7. extract(component, indexing, precision_timestamp_tz<P1>, string): -> i64
    8. extract(component, indexing, timestamp): -> i64
    9. extract(component, indexing, precision_timestamp<P1>): -> i64
    10. extract(component, indexing, date): -> i64

    Extract a portion of a date/time value.

  • YEAR Return the year.
  • ISO_YEAR Return the ISO 8601 week-numbering year. The first week of an ISO year has the majority (4 or more) of its days in January.
  • US_YEAR Return the US epidemiological year. The first week of a US epidemiological year has the majority (4 or more) of its days in January. The last week of a US epidemiological year has the year’s last Wednesday in it. A US epidemiological week starts on Sunday.
  • QUARTER Return the number of the quarter within the year. January 1 through March 31 map to the first quarter, April 1 through June 30 map to the second quarter, etc.
  • MONTH Return the number of the month within the year.
  • DAY Return the number of the day within the month.
  • DAY_OF_YEAR Return the number of the day within the year. January 1 maps to the first day, February 1 maps to the thirty-second day, etc.
  • MONDAY_DAY_OF_WEEK Return the number of the day within the week, from Monday (first day) to Sunday (seventh day).
  • SUNDAY_DAY_OF_WEEK Return the number of the day within the week, from Sunday (first day) to Saturday (seventh day).
  • MONDAY_WEEK Return the number of the week within the year. The first week starts on the first Monday of January.
  • SUNDAY_WEEK Return the number of the week within the year. The first week starts on the first Sunday of January.
  • ISO_WEEK Return the number of the ISO week within the ISO year. The first ISO week has the majority (4 or more) of its days in January. An ISO week starts on Monday.
  • US_WEEK Return the number of the US week within the US year. The first US week has the majority (4 or more) of its days in January. A US week starts on Sunday.
  • HOUR Return the hour (0-23).
  • MINUTE Return the minute (0-59).
  • SECOND Return the second (0-59).
  • MILLISECOND Return the number of milliseconds since the last full second.
  • MICROSECOND Return the number of microseconds since the last full millisecond.
  • NANOSECOND Return the number of nanoseconds since the last full microsecond.
  • SUBSECOND Return the number of microseconds since the last full second of the given timestamp.
  • UNIX_TIME Return the number of seconds that have elapsed since 1970-01-01 00:00:00 UTC, ignoring leap seconds.
  • TIMEZONE_OFFSET Return the number of seconds of timezone offset to UTC.

    The range of values returned for QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK, SUNDAY_DAY_OF_WEEK, MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, and US_WEEK depends on whether counting starts at 1 or 0. This is governed by the indexing option. When indexing is ONE:

  • QUARTER returns values in range 1-4
  • MONTH returns values in range 1-12
  • DAY returns values in range 1-31
  • DAY_OF_YEAR returns values in range 1-366
  • MONDAY_DAY_OF_WEEK and SUNDAY_DAY_OF_WEEK return values in range 1-7
  • MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, and US_WEEK return values in range 1-53

    When indexing is ZERO:

  • QUARTER returns values in range 0-3
  • MONTH returns values in range 0-11
  • DAY returns values in range 0-30
  • DAY_OF_YEAR returns values in range 0-365
  • MONDAY_DAY_OF_WEEK and SUNDAY_DAY_OF_WEEK return values in range 0-6
  • MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, and US_WEEK return values in range 0-52

    The indexing option must be specified when the component is QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK, SUNDAY_DAY_OF_WEEK, MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, or US_WEEK. The indexing option cannot be specified when the component is YEAR, ISO_YEAR, US_YEAR, HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND, SUBSECOND, UNIX_TIME, or TIMEZONE_OFFSET. Timezone strings must be as defined by the IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If the timezone is invalid, an error is thrown.

    Options:
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'SUBSECOND', 'UNIX_TIME', 'TIMEZONE_OFFSET']
  • indexing ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'NANOSECOND', 'SUBSECOND', 'UNIX_TIME', 'TIMEZONE_OFFSET']
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'SUBSECOND', 'UNIX_TIME']
  • indexing ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'NANOSECOND', 'SUBSECOND', 'UNIX_TIME']
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'UNIX_TIME']
  • indexing ['HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'SUBSECOND']
  • component ['QUARTER', 'MONTH', 'DAY', 'DAY_OF_YEAR', 'MONDAY_DAY_OF_WEEK', 'SUNDAY_DAY_OF_WEEK', 'MONDAY_WEEK', 'SUNDAY_WEEK', 'ISO_WEEK', 'US_WEEK']
  • indexing ['ONE', 'ZERO']
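A few of the 1-indexed components and the indexing option can be sketched with the standard library (illustrative only; component coverage here is partial):

```python
from datetime import date

# A handful of extract() components; indexing="ZERO" shifts the 1-based
# components listed above down by one.
def extract(component, value, indexing="ONE"):
    one_based = {
        "QUARTER": (value.month - 1) // 3 + 1,
        "MONTH": value.month,
        "DAY": value.day,
        "DAY_OF_YEAR": value.timetuple().tm_yday,
        "MONDAY_DAY_OF_WEEK": value.isoweekday(),  # Monday=1 .. Sunday=7
        "ISO_WEEK": value.isocalendar()[1],
    }
    result = one_based[component]
    return result - 1 if indexing == "ZERO" else result

# extract("QUARTER", date(2021, 2, 1)) == 1; with indexing="ZERO" it is 0
```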

    extract_boolean

    Implementations:
    extract_boolean(component, x): -> return_type
    0. extract_boolean(component, timestamp): -> boolean
    1. extract_boolean(component, timestamp_tz, string): -> boolean
    2. extract_boolean(component, date): -> boolean

    *Extract boolean values of a date/time value.

  • IS_LEAP_YEAR Return true if the year of the given value is a leap year and false otherwise.
  • IS_DST Return true if DST (Daylight Saving Time) is observed at the given value in the given timezone.

    Timezone strings must be as defined by the IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If the timezone is invalid, an error is thrown.*

    Options:
  • component ['IS_LEAP_YEAR']
  • component ['IS_LEAP_YEAR', 'IS_DST']
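IS_LEAP_YEAR can be sketched directly with the standard library; IS_DST would additionally need timezone data, so it is left out:

```python
import calendar
from datetime import date

# IS_LEAP_YEAR via the Gregorian leap-year rule.
def extract_boolean(component, value):
    if component == "IS_LEAP_YEAR":
        return calendar.isleap(value.year)
    raise NotImplementedError(component)  # IS_DST needs a tz database

# extract_boolean("IS_LEAP_YEAR", date(2020, 6, 1)) is True
```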

    add

    Implementations:
    add(x, y): -> return_type
    0. add(timestamp, interval_year): -> timestamp
    1. add(timestamp_tz, interval_year, string): -> timestamp_tz
    2. add(date, interval_year): -> timestamp
    3. add(timestamp, interval_day): -> timestamp
    4. add(timestamp_tz, interval_day): -> timestamp_tz
    5. add(date, interval_day): -> timestamp

    Add an interval to a date/time type. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.
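As a rough model, `interval_day` behaves like a fixed-length duration; the sketch below shows add(date, interval_day) -> timestamp with `timedelta` as a stand-in (calendar-aware interval_year arithmetic is not modeled here):

```python
from datetime import date, datetime, time, timedelta

# add(date, interval_day) -> timestamp: promote the date to a timestamp
# at midnight, then shift by the duration.
def add_interval_day(d, interval):
    return datetime.combine(d, time.min) + interval

# add_interval_day(date(2021, 12, 31), timedelta(days=1))
# == datetime(2022, 1, 1, 0, 0)
```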

    multiply

    Implementations:
    multiply(x, y): -> return_type
    0. multiply(i8, interval_day): -> interval_day
    1. multiply(i16, interval_day): -> interval_day
    2. multiply(i32, interval_day): -> interval_day
    3. multiply(i64, interval_day): -> interval_day
    4. multiply(i8, interval_year): -> interval_year
    5. multiply(i16, interval_year): -> interval_year
    6. multiply(i32, interval_year): -> interval_year
    7. multiply(i64, interval_year): -> interval_year

    Multiply an interval by an integral number.

    add_intervals

    Implementations:
    add_intervals(x, y): -> return_type
    0. add_intervals(interval_day, interval_day): -> interval_day
    1. add_intervals(interval_year, interval_year): -> interval_year

    Add two intervals together.

    subtract

    Implementations:
    subtract(x, y): -> return_type
    0. subtract(timestamp, interval_year): -> timestamp
    1. subtract(timestamp_tz, interval_year): -> timestamp_tz
    2. subtract(timestamp_tz, interval_year, string): -> timestamp_tz
    3. subtract(date, interval_year): -> date
    4. subtract(timestamp, interval_day): -> timestamp
    5. subtract(timestamp_tz, interval_day): -> timestamp_tz
    6. subtract(date, interval_day): -> date

    Subtract an interval from a date/time type. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.

    lte

    Implementations:
    lte(x, y): -> return_type
    0. lte(timestamp, timestamp): -> boolean
    1. lte(timestamp_tz, timestamp_tz): -> boolean
    2. lte(date, date): -> boolean
    3. lte(interval_day, interval_day): -> boolean
    4. lte(interval_year, interval_year): -> boolean

    Less than or equal to.

    lt

    Implementations:
    lt(x, y): -> return_type
    0. lt(timestamp, timestamp): -> boolean
    1. lt(timestamp_tz, timestamp_tz): -> boolean
    2. lt(date, date): -> boolean
    3. lt(interval_day, interval_day): -> boolean
    4. lt(interval_year, interval_year): -> boolean

    Less than.

    gte

    Implementations:
    gte(x, y): -> return_type
    0. gte(timestamp, timestamp): -> boolean
    1. gte(timestamp_tz, timestamp_tz): -> boolean
    2. gte(date, date): -> boolean
    3. gte(interval_day, interval_day): -> boolean
    4. gte(interval_year, interval_year): -> boolean

    Greater than or equal to.

    gt

    Implementations:
    gt(x, y): -> return_type
    0. gt(timestamp, timestamp): -> boolean
    1. gt(timestamp_tz, timestamp_tz): -> boolean
    2. gt(date, date): -> boolean
    3. gt(interval_day, interval_day): -> boolean
    4. gt(interval_year, interval_year): -> boolean

    Greater than.

    assume_timezone

    Implementations:
    assume_timezone(x, timezone): -> return_type

  • timezone: Timezone string from IANA tzdb.
    0. assume_timezone(timestamp, string): -> timestamp_tz
    1. assume_timezone(date, string): -> timestamp_tz

    Convert local timestamp to UTC-relative timestamp_tz using given local time’s timezone. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.

    local_timestamp

    Implementations:
    local_timestamp(x, timezone): -> return_type

  • timezone: Timezone string from IANA tzdb.
    0. local_timestamp(timestamp_tz, string): -> timestamp

    Convert UTC-relative timestamp_tz to local timestamp using given local time’s timezone. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.
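Both directions can be sketched with `zoneinfo` (assuming an available IANA tz database): `local_timestamp` drops the zone after conversion, and `assume_timezone` attaches it and normalizes to UTC.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# timestamp_tz (UTC-relative) -> naive local timestamp in tz_name.
def local_timestamp(ts_tz, tz_name):
    return ts_tz.astimezone(ZoneInfo(tz_name)).replace(tzinfo=None)

# naive local timestamp interpreted in tz_name -> UTC-relative timestamp_tz.
def assume_timezone(naive_ts, tz_name):
    return naive_ts.replace(tzinfo=ZoneInfo(tz_name)).astimezone(timezone.utc)

# Note: "Etc/GMT+1" is UTC-1 (the IANA Etc zone signs are inverted).
```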

    strptime_time

    Implementations:
    strptime_time(time_string, format): -> return_type
    0. strptime_time(string, string): -> time

    Parse string into time using provided format, see https://man7.org/linux/man-pages/man3/strptime.3.html for reference.

    strptime_date

    Implementations:
    strptime_date(date_string, format): -> return_type
    0. strptime_date(string, string): -> date

    Parse string into date using provided format, see https://man7.org/linux/man-pages/man3/strptime.3.html for reference.
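Python's `datetime.strptime` uses the same C-library format directives, so the behavior can be sketched as:

```python
from datetime import date, datetime

# strptime_date: parse a string into a date with strptime(3) directives.
def strptime_date(date_string, fmt):
    return datetime.strptime(date_string, fmt).date()

# strptime_date("2021-03-15", "%Y-%m-%d") == date(2021, 3, 15)
```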

    strptime_timestamp

    Implementations:
    strptime_timestamp(timestamp_string, format, timezone): -> return_type

  • timezone: Timezone string from IANA tzdb.
    0. strptime_timestamp(string, string, string): -> timestamp_tz
    1. strptime_timestamp(string, string): -> timestamp_tz

    Parse string into timestamp using provided format, see https://man7.org/linux/man-pages/man3/strptime.3.html for reference. If timezone is present in timestamp and provided as parameter an error is thrown. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is supplied as parameter and present in the parsed string the parsed timezone is used. If parameter supplied timezone is invalid an error is thrown.

    strftime

    Implementations:
    strftime(x, format): -> return_type
    0. strftime(timestamp, string): -> string
    1. strftime(timestamp_tz, string, string): -> string
    2. strftime(date, string): -> string
    3. strftime(time, string): -> string

    Convert timestamp/date/time to string using provided format, see https://man7.org/linux/man-pages/man3/strftime.3.html for reference. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.
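The formatting direction is symmetric; a minimal sketch with the standard library's `strftime`:

```python
from datetime import datetime

# strftime: render a timestamp with strftime(3) directives.
def strftime(value, fmt):
    return value.strftime(fmt)

# strftime(datetime(2021, 3, 15, 9, 30), "%Y-%m-%d %H:%M") == "2021-03-15 09:30"
```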

    round_temporal

    Implementations:
    round_temporal(x, rounding, unit, multiple, origin): -> return_type
    0. round_temporal(timestamp, rounding, unit, i64, timestamp): -> timestamp
    1. round_temporal(timestamp_tz, rounding, unit, i64, string, timestamp_tz): -> timestamp_tz
    2. round_temporal(date, rounding, unit, i64, date): -> date
    3. round_temporal(time, rounding, unit, i64, time): -> time

    Round a given timestamp/date/time to a multiple of a time unit. If the given timestamp is not already an exact multiple from the origin in the given timezone, the resulting point is chosen as one of the two nearest multiples. Which of these is chosen is governed by rounding: FLOOR means to use the earlier one, CEIL means to use the later one, ROUND_TIE_DOWN means to choose the nearest and tie to the earlier one if equidistant, ROUND_TIE_UP means to choose the nearest and tie to the later one if equidistant. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.

    Options:
  • rounding ['FLOOR', 'CEIL', 'ROUND_TIE_DOWN', 'ROUND_TIE_UP']
  • unit ['YEAR', 'MONTH', 'WEEK', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
  • rounding ['YEAR', 'MONTH', 'WEEK', 'DAY']
  • unit ['HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
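The rounding options can be sketched for timestamps, expressing the unit in seconds (an illustrative model; calendar units like MONTH require calendar arithmetic instead):

```python
from datetime import datetime, timedelta

# Round ts to a multiple of (unit_seconds * multiple) counted from origin.
def round_temporal(ts, rounding, unit_seconds, multiple, origin):
    step = unit_seconds * multiple
    offset = (ts - origin).total_seconds()
    lower = offset // step * step                       # earlier multiple
    upper = lower if offset == lower else lower + step  # later multiple
    if rounding == "FLOOR":
        chosen = lower
    elif rounding == "CEIL":
        chosen = upper
    elif rounding == "ROUND_TIE_DOWN":
        chosen = lower if offset - lower <= upper - offset else upper
    else:  # ROUND_TIE_UP
        chosen = upper if upper - offset <= offset - lower else lower
    return origin + timedelta(seconds=chosen)
```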

    round_calendar

    Implementations:
    round_calendar(x, rounding, unit, origin, multiple): -> return_type
    0. round_calendar(timestamp, rounding, unit, origin, i64): -> timestamp
    1. round_calendar(timestamp_tz, rounding, unit, origin, i64, string): -> timestamp_tz
    2. round_calendar(date, rounding, unit, origin, i64, date): -> date
    3. round_calendar(time, rounding, unit, origin, i64, time): -> time

    Round a given timestamp/date/time to a multiple of a time unit. If the given timestamp is not already an exact multiple from the last origin unit in the given timezone, the resulting point is chosen as one of the two nearest multiples. Which of these is chosen is governed by rounding: FLOOR means to use the earlier one, CEIL means to use the later one, ROUND_TIE_DOWN means to choose the nearest and tie to the earlier one if equidistant, ROUND_TIE_UP means to choose the nearest and tie to the later one if equidistant. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.

    Options:
  • rounding ['FLOOR', 'CEIL', 'ROUND_TIE_DOWN', 'ROUND_TIE_UP']
  • unit ['YEAR', 'MONTH', 'WEEK', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
  • origin ['YEAR', 'MONTH', 'MONDAY_WEEK', 'SUNDAY_WEEK', 'ISO_WEEK', 'US_WEEK', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND']
  • rounding ['YEAR', 'MONTH', 'WEEK', 'DAY']
  • unit ['YEAR', 'MONTH', 'MONDAY_WEEK', 'SUNDAY_WEEK', 'ISO_WEEK', 'US_WEEK', 'DAY']
  • origin ['DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
  • rounding ['DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND']

    Aggregate Functions

    min

    Implementations:
    min(x): -> return_type
    0. min(date): -> date?
    1. min(time): -> time?
    2. min(timestamp): -> timestamp?
    3. min(timestamp_tz): -> timestamp_tz?
    4. min(interval_day): -> interval_day?
    5. min(interval_year): -> interval_year?

    Return the minimum of a set of values.

    max

    Implementations:
    max(x): -> return_type
    0. max(date): -> date?
    1. max(time): -> time?
    2. max(timestamp): -> timestamp?
    3. max(timestamp_tz): -> timestamp_tz?
    4. max(interval_day): -> interval_day?
    5. max(interval_year): -> interval_year?

    Return the maximum of a set of values.

    GitHub

    functions_datetime.yaml

    This document file is generated for functions_datetime.yaml

    Scalar Functions

    extract

    Implementations:
    extract(component, x, timezone): -> return_type

  • x: Timezone string from IANA tzdb.
  • 0. extract(component, timestamp_tz, string): -> i64
    1. extract(component, precision_timestamp_tz<P1>, string): -> i64
    2. extract(component, timestamp): -> i64
    3. extract(component, precision_timestamp<P1>): -> i64
    4. extract(component, date): -> i64
    5. extract(component, time): -> i64
    6. extract(component, indexing, timestamp_tz, string): -> i64
    7. extract(component, indexing, precision_timestamp_tz<P1>, string): -> i64
    8. extract(component, indexing, timestamp): -> i64
    9. extract(component, indexing, precision_timestamp<P1>): -> i64
    10. extract(component, indexing, date): -> i64

    Extract portion of a date/time value. * YEAR Return the year. * ISO_YEAR Return the ISO 8601 week-numbering year. First week of an ISO year has the majority (4 or more) of its days in January. * US_YEAR Return the US epidemiological year. First week of US epidemiological year has the majority (4 or more) of its days in January. Last week of US epidemiological year has the year’s last Wednesday in it. US epidemiological week starts on Sunday. * QUARTER Return the number of the quarter within the year. January 1 through March 31 map to the first quarter, April 1 through June 30 map to the second quarter, etc. * MONTH Return the number of the month within the year. * DAY Return the number of the day within the month. * DAY_OF_YEAR Return the number of the day within the year. January 1 maps to the first day, February 1 maps to the thirty-second day, etc. * MONDAY_DAY_OF_WEEK Return the number of the day within the week, from Monday (first day) to Sunday (seventh day). * SUNDAY_DAY_OF_WEEK Return the number of the day within the week, from Sunday (first day) to Saturday (seventh day). * MONDAY_WEEK Return the number of the week within the year. First week starts on first Monday of January. * SUNDAY_WEEK Return the number of the week within the year. First week starts on first Sunday of January. * ISO_WEEK Return the number of the ISO week within the ISO year. First ISO week has the majority (4 or more) of its days in January. ISO week starts on Monday. * US_WEEK Return the number of the US week within the US year. First US week has the majority (4 or more) of its days in January. US week starts on Sunday. * HOUR Return the hour (0-23). * MINUTE Return the minute (0-59). * SECOND Return the second (0-59). * MILLISECOND Return number of milliseconds since the last full second. * MICROSECOND Return number of microseconds since the last full millisecond. * NANOSECOND Return number of nanoseconds since the last full microsecond. 
* SUBSECOND Return number of microseconds since the last full second of the given timestamp. * UNIX_TIME Return number of seconds that have elapsed since 1970-01-01 00:00:00 UTC, ignoring leap seconds. * TIMEZONE_OFFSET Return number of seconds of timezone offset to UTC. The range of values returned for QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK, SUNDAY_DAY_OF_WEEK, MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, and US_WEEK depends on whether counting starts at 1 or 0. This is governed by the indexing option. When indexing is ONE: * QUARTER returns values in range 1-4 * MONTH returns values in range 1-12 * DAY returns values in range 1-31 * DAY_OF_YEAR returns values in range 1-366 * MONDAY_DAY_OF_WEEK and SUNDAY_DAY_OF_WEEK return values in range 1-7 * MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, and US_WEEK return values in range 1-53 When indexing is ZERO: * QUARTER returns values in range 0-3 * MONTH returns values in range 0-11 * DAY returns values in range 0-30 * DAY_OF_YEAR returns values in range 0-365 * MONDAY_DAY_OF_WEEK and SUNDAY_DAY_OF_WEEK return values in range 0-6 * MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, and US_WEEK return values in range 0-52 The indexing option must be specified when the component is QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK, SUNDAY_DAY_OF_WEEK, MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, or US_WEEK. The indexing option cannot be specified when the component is YEAR, ISO_YEAR, US_YEAR, HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND, SUBSECOND, UNIX_TIME, or TIMEZONE_OFFSET. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.

    Options:
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'SUBSECOND', 'UNIX_TIME', 'TIMEZONE_OFFSET']
  • indexing ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'NANOSECOND', 'SUBSECOND', 'UNIX_TIME', 'TIMEZONE_OFFSET']
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'SUBSECOND', 'UNIX_TIME']
  • indexing ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'NANOSECOND', 'SUBSECOND', 'UNIX_TIME']
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'UNIX_TIME']
  • indexing ['HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'SUBSECOND']
  • component ['QUARTER', 'MONTH', 'DAY', 'DAY_OF_YEAR', 'MONDAY_DAY_OF_WEEK', 'SUNDAY_DAY_OF_WEEK', 'MONDAY_WEEK', 'SUNDAY_WEEK', 'ISO_WEEK', 'US_WEEK']
  • indexing ['ONE', 'ZERO']
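The ONE/ZERO indexing semantics above can be sketched in Python. This is a hypothetical helper for illustration (not the Substrait implementation), covering only a few components:

```python
from datetime import datetime

def extract(component: str, ts: datetime, indexing: str = "ONE") -> int:
    # One-based component values, following the ranges listed above.
    one_based = {
        "QUARTER": (ts.month - 1) // 3 + 1,
        "MONTH": ts.month,
        "DAY": ts.day,
        "DAY_OF_YEAR": ts.timetuple().tm_yday,
        "MONDAY_DAY_OF_WEEK": ts.isoweekday(),          # Monday=1 .. Sunday=7
        "SUNDAY_DAY_OF_WEEK": ts.isoweekday() % 7 + 1,  # Sunday=1 .. Saturday=7
    }
    value = one_based[component]
    # ZERO indexing shifts every range down by one.
    return value - 1 if indexing == "ZERO" else value

ts = datetime(2023, 7, 14)  # a Friday
print(extract("QUARTER", ts))             # 3
print(extract("QUARTER", ts, "ZERO"))     # 2
print(extract("MONDAY_DAY_OF_WEEK", ts))  # 5
```

The same shift-by-one rule applies uniformly to all components that accept the indexing option.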
  • extract_boolean

    Implementations:
    extract_boolean(component, x): -> return_type
    0. extract_boolean(component, timestamp): -> boolean
    1. extract_boolean(component, timestamp_tz, string): -> boolean
    2. extract_boolean(component, date): -> boolean

    Extract boolean values of a date/time value.
    • IS_LEAP_YEAR Return true if year of the given value is a leap year and false otherwise.
    • IS_DST Return true if DST (Daylight Savings Time) is observed at the given value in the given timezone.

    Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.

    Options:
  • component ['IS_LEAP_YEAR']
  • component ['IS_LEAP_YEAR', 'IS_DST']
  • add

    Implementations:
    add(x, y): -> return_type
    0. add(timestamp, interval_year): -> timestamp
    1. add(timestamp_tz, interval_year, string): -> timestamp_tz
    2. add(date, interval_year): -> timestamp
    3. add(timestamp, interval_day): -> timestamp
    4. add(timestamp_tz, interval_day): -> timestamp_tz
    5. add(date, interval_day): -> timestamp

    Add an interval to a date/time type. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.
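The interval arithmetic can be sketched with Python's datetime. This is a hypothetical helper, assuming interval_year carries (years, months) and interval_day carries (days, seconds); day-of-month overflow (e.g. Jan 31 + 1 month) is not handled here:

```python
from datetime import datetime, timedelta

def add_interval_year(ts: datetime, years: int, months: int) -> datetime:
    # Month arithmetic in whole months; assumes the resulting day-of-month is valid.
    total = ts.month - 1 + months + 12 * years
    return ts.replace(year=ts.year + total // 12, month=total % 12 + 1)

def add_interval_day(ts: datetime, days: int, seconds: int) -> datetime:
    # Day/second intervals map directly onto a timedelta.
    return ts + timedelta(days=days, seconds=seconds)

print(add_interval_year(datetime(2020, 11, 15), 1, 3))  # 2022-02-15 00:00:00
print(add_interval_day(datetime(2020, 11, 15), 20, 0))  # 2020-12-05 00:00:00
```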

    multiply

    Implementations:
    multiply(x, y): -> return_type
    0. multiply(i8, interval_day): -> interval_day
    1. multiply(i16, interval_day): -> interval_day
    2. multiply(i32, interval_day): -> interval_day
    3. multiply(i64, interval_day): -> interval_day
    4. multiply(i8, interval_year): -> interval_year
    5. multiply(i16, interval_year): -> interval_year
    6. multiply(i32, interval_year): -> interval_year
    7. multiply(i64, interval_year): -> interval_year

    Multiply an interval by an integral number.

    add_intervals

    Implementations:
    add_intervals(x, y): -> return_type
    0. add_intervals(interval_day, interval_day): -> interval_day
    1. add_intervals(interval_year, interval_year): -> interval_year

    Add two intervals together.

    subtract

    Implementations:
    subtract(x, y): -> return_type
    0. subtract(timestamp, interval_year): -> timestamp
    1. subtract(timestamp_tz, interval_year): -> timestamp_tz
    2. subtract(timestamp_tz, interval_year, string): -> timestamp_tz
    3. subtract(date, interval_year): -> date
    4. subtract(timestamp, interval_day): -> timestamp
    5. subtract(timestamp_tz, interval_day): -> timestamp_tz
    6. subtract(date, interval_day): -> date

    Subtract an interval from a date/time type. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.

    lte

    Implementations:
    lte(x, y): -> return_type
    0. lte(timestamp, timestamp): -> boolean
    1. lte(timestamp_tz, timestamp_tz): -> boolean
    2. lte(date, date): -> boolean
    3. lte(interval_day, interval_day): -> boolean
    4. lte(interval_year, interval_year): -> boolean

    Less than or equal to.

    lt

    Implementations:
    lt(x, y): -> return_type
    0. lt(timestamp, timestamp): -> boolean
    1. lt(timestamp_tz, timestamp_tz): -> boolean
    2. lt(date, date): -> boolean
    3. lt(interval_day, interval_day): -> boolean
    4. lt(interval_year, interval_year): -> boolean

    Less than.

    gte

    Implementations:
    gte(x, y): -> return_type
    0. gte(timestamp, timestamp): -> boolean
    1. gte(timestamp_tz, timestamp_tz): -> boolean
    2. gte(date, date): -> boolean
    3. gte(interval_day, interval_day): -> boolean
    4. gte(interval_year, interval_year): -> boolean

    Greater than or equal to.

    gt

    Implementations:
    gt(x, y): -> return_type
    0. gt(timestamp, timestamp): -> boolean
    1. gt(timestamp_tz, timestamp_tz): -> boolean
    2. gt(date, date): -> boolean
    3. gt(interval_day, interval_day): -> boolean
    4. gt(interval_year, interval_year): -> boolean

    Greater than.

    assume_timezone

    Implementations:
    assume_timezone(x, timezone): -> return_type

  • timezone: Timezone string from IANA tzdb.
  • 0. assume_timezone(timestamp, string): -> timestamp_tz
    1. assume_timezone(date, string): -> timestamp_tz

    Convert local timestamp to UTC-relative timestamp_tz using given local time’s timezone. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.
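A minimal sketch of this conversion, using Python's zoneinfo for IANA timezone lookup (an unknown name raises an exception, mirroring the invalid-timezone error):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def assume_timezone(local_ts: datetime, tz_name: str) -> datetime:
    # Interpret the naive timestamp in tz_name, then express it relative to UTC.
    return local_ts.replace(tzinfo=ZoneInfo(tz_name)).astimezone(timezone.utc)

# Etc/GMT+1 is UTC-01:00 (POSIX sign convention), so local noon is 13:00 UTC.
print(assume_timezone(datetime(2021, 6, 1, 12, 0), "Etc/GMT+1").isoformat())
```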

    local_timestamp

    Implementations:
    local_timestamp(x, timezone): -> return_type

  • timezone: Timezone string from IANA tzdb.
  • 0. local_timestamp(timestamp_tz, string): -> timestamp

    Convert UTC-relative timestamp_tz to local timestamp using given local time’s timezone. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.

    strptime_time

    Implementations:
    strptime_time(time_string, format): -> return_type
    0. strptime_time(string, string): -> time

    Parse string into time using provided format, see https://man7.org/linux/man-pages/man3/strptime.3.html for reference.

    strptime_date

    Implementations:
    strptime_date(date_string, format): -> return_type
    0. strptime_date(string, string): -> date

    Parse string into date using provided format, see https://man7.org/linux/man-pages/man3/strptime.3.html for reference.

    strptime_timestamp

    Implementations:
    strptime_timestamp(timestamp_string, format, timezone): -> return_type

  • timezone: Timezone string from IANA tzdb.
  • 0. strptime_timestamp(string, string, string): -> timestamp_tz
    1. strptime_timestamp(string, string): -> timestamp_tz

    Parse string into timestamp using the provided format; see https://man7.org/linux/man-pages/man3/strptime.3.html for reference. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If a timezone is supplied as a parameter and a timezone is also present in the parsed string, the parsed timezone is used. If the parameter-supplied timezone is invalid an error is thrown.
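For illustration, Python's strptime follows the same format directives as the referenced man page; here a timezone offset embedded in the string (an assumed input value) is parsed via %z:

```python
from datetime import datetime

# Parse a timestamp whose string carries its own UTC offset.
ts = datetime.strptime("2022-03-05 09:30:00+0200", "%Y-%m-%d %H:%M:%S%z")
print(ts.hour)         # 9
print(ts.utcoffset())  # 2:00:00
```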

    strftime

    Implementations:
    strftime(x, format): -> return_type
    0. strftime(timestamp, string): -> string
    1. strftime(timestamp_tz, string, string): -> string
    2. strftime(date, string): -> string
    3. strftime(time, string): -> string

    Convert timestamp/date/time to string using provided format, see https://man7.org/linux/man-pages/man3/strftime.3.html for reference. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.

    round_temporal

    Implementations:
    round_temporal(x, rounding, unit, multiple, origin): -> return_type
    0. round_temporal(timestamp, rounding, unit, i64, timestamp): -> timestamp
    1. round_temporal(timestamp_tz, rounding, unit, i64, string, timestamp_tz): -> timestamp_tz
    2. round_temporal(date, rounding, unit, i64, date): -> date
    3. round_temporal(time, rounding, unit, i64, time): -> time

    Round a given timestamp/date/time to a multiple of a time unit. If the given timestamp is not already an exact multiple from the origin in the given timezone, the resulting point is chosen as one of the two nearest multiples. Which of these is chosen is governed by rounding:
    • FLOOR means to use the earlier one.
    • CEIL means to use the later one.
    • ROUND_TIE_DOWN means to choose the nearest and tie to the earlier one if equidistant.
    • ROUND_TIE_UP means to choose the nearest and tie to the later one if equidistant.

    Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.

    Options:
  • rounding ['FLOOR', 'CEIL', 'ROUND_TIE_DOWN', 'ROUND_TIE_UP']
  • unit ['YEAR', 'MONTH', 'WEEK', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
  • rounding ['YEAR', 'MONTH', 'WEEK', 'DAY']
  • unit ['HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
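The four rounding modes can be sketched for second-based units. This is a hypothetical helper (timezone handling and calendar units omitted):

```python
from datetime import datetime, timedelta

def round_temporal(ts, rounding, multiple_s, origin):
    # Distance from origin in seconds, bracketed by the two nearest multiples.
    delta = (ts - origin).total_seconds()
    lower = (delta // multiple_s) * multiple_s
    if delta == lower:          # already an exact multiple: unchanged
        return ts
    upper = lower + multiple_s
    mid = (lower + upper) / 2
    if rounding == "FLOOR":
        chosen = lower
    elif rounding == "CEIL":
        chosen = upper
    elif rounding == "ROUND_TIE_DOWN":
        chosen = lower if delta <= mid else upper
    else:  # ROUND_TIE_UP
        chosen = lower if delta < mid else upper
    return origin + timedelta(seconds=chosen)

origin = datetime(2021, 1, 1)
print(round_temporal(datetime(2021, 1, 1, 0, 7, 30), "FLOOR", 600, origin))  # 00:00
print(round_temporal(datetime(2021, 1, 1, 0, 5, 0), "ROUND_TIE_UP", 600, origin))  # 00:10
```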
  • round_calendar

    Implementations:
    round_calendar(x, rounding, unit, origin, multiple): -> return_type
    0. round_calendar(timestamp, rounding, unit, origin, i64): -> timestamp
    1. round_calendar(timestamp_tz, rounding, unit, origin, i64, string): -> timestamp_tz
    2. round_calendar(date, rounding, unit, origin, i64, date): -> date
    3. round_calendar(time, rounding, unit, origin, i64, time): -> time

    Round a given timestamp/date/time to a multiple of a time unit. If the given timestamp is not already an exact multiple from the last origin unit in the given timezone, the resulting point is chosen as one of the two nearest multiples. Which of these is chosen is governed by rounding:
    • FLOOR means to use the earlier one.
    • CEIL means to use the later one.
    • ROUND_TIE_DOWN means to choose the nearest and tie to the earlier one if equidistant.
    • ROUND_TIE_UP means to choose the nearest and tie to the later one if equidistant.

    Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.

    Options:
  • rounding ['FLOOR', 'CEIL', 'ROUND_TIE_DOWN', 'ROUND_TIE_UP']
  • unit ['YEAR', 'MONTH', 'WEEK', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
  • origin ['YEAR', 'MONTH', 'MONDAY_WEEK', 'SUNDAY_WEEK', 'ISO_WEEK', 'US_WEEK', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND']
  • rounding ['YEAR', 'MONTH', 'WEEK', 'DAY']
  • unit ['YEAR', 'MONTH', 'MONDAY_WEEK', 'SUNDAY_WEEK', 'ISO_WEEK', 'US_WEEK', 'DAY']
  • origin ['DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
  • rounding ['DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND']
  • Aggregate Functions

    min

    Implementations:
    min(x): -> return_type
    0. min(date): -> date?
    1. min(time): -> time?
    2. min(timestamp): -> timestamp?
    3. min(timestamp_tz): -> timestamp_tz?
    4. min(interval_day): -> interval_day?
    5. min(interval_year): -> interval_year?

    Return the minimum of a set of values.

    max

    Implementations:
    max(x): -> return_type
    0. max(date): -> date?
    1. max(time): -> time?
    2. max(timestamp): -> timestamp?
    3. max(timestamp_tz): -> timestamp_tz?
    4. max(interval_day): -> interval_day?
    5. max(interval_year): -> interval_year?

    Return the maximum of a set of values.


    functions_geometry.yaml

    This document file is generated for functions_geometry.yaml

    Data Types

    name: geometry
    structure: BINARY

    Scalar Functions

    point

    Implementations:
    point(x, y): -> return_type
    0. point(fp64, fp64): -> u!geometry

    Returns a 2D point with the given x and y coordinate values.

    make_line

    Implementations:
    make_line(geom1, geom2): -> return_type
    0. make_line(u!geometry, u!geometry): -> u!geometry

    Returns a linestring connecting the end point of geometry geom1 to the start point of geometry geom2. Repeated points at the beginning of input geometries are collapsed to a single point. A linestring can be closed or simple. A closed linestring starts and ends on the same point. A simple linestring does not cross or touch itself.

    x_coordinate

    Implementations:
    x_coordinate(point): -> return_type
    0. x_coordinate(u!geometry): -> fp64

    Return the x coordinate of the point. Return null if not available.

    y_coordinate

    Implementations:
    y_coordinate(point): -> return_type
    0. y_coordinate(u!geometry): -> fp64

    Return the y coordinate of the point. Return null if not available.

    num_points

    Implementations:
    num_points(geom): -> return_type
    0. num_points(u!geometry): -> i64

    Return the number of points in the geometry. The geometry should be a linestring or circularstring.

    is_empty

    Implementations:
    is_empty(geom): -> return_type
    0. is_empty(u!geometry): -> boolean

    Return true if the geometry is an empty geometry.

    is_closed

    Implementations:
    is_closed(geom): -> return_type
    0. is_closed(u!geometry): -> boolean

    Return true if the geometry’s start and end points are the same.

    is_simple

    Implementations:
    is_simple(geom): -> return_type
    0. is_simple(u!geometry): -> boolean

    Return true if the geometry does not self-intersect.

    is_ring

    Implementations:
    is_ring(geom): -> return_type
    0. is_ring(u!geometry): -> boolean

    Return true if the geometry’s start and end points are the same and it does not self-intersect.

    geometry_type

    Implementations:
    geometry_type(geom): -> return_type
    0. geometry_type(u!geometry): -> string

    Return the type of geometry as a string.

    envelope

    Implementations:
    envelope(geom): -> return_type
    0. envelope(u!geometry): -> u!geometry

    Return the minimum bounding box for the input geometry as a geometry. The returned geometry is defined by the corner points of the bounding box. If the input geometry is a point or a line, the returned geometry can also be a point or line.
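A minimal sketch of the bounding-box computation on bare (x, y) coordinate lists (the geometry encoding itself is assumed away):

```python
def envelope(points):
    # Corner points of the minimum bounding box, counter-clockwise from min corner.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [(min(xs), min(ys)), (max(xs), min(ys)),
            (max(xs), max(ys)), (min(xs), max(ys))]

print(envelope([(1, 1), (4, 2), (2, 5)]))  # [(1, 1), (4, 1), (4, 5), (1, 5)]
```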

    dimension

    Implementations:
    dimension(geom): -> return_type
    0. dimension(u!geometry): -> i8

    Return the dimension of the input geometry. If the input is a collection of geometries, return the largest dimension from the collection. Dimensionality is determined by the complexity of the input and not the coordinate system being used. Type dimensions: POINT - 0, LINE - 1, POLYGON - 2.

    is_valid

    Implementations:
    is_valid(geom): -> return_type
    0. is_valid(u!geometry): -> boolean

    Return true if the input geometry is a valid 2D geometry. For 3- and 4-dimensional geometries, validity is still only tested in 2 dimensions.

    collection_extract

    Implementations:
    collection_extract(geom_collection): -> return_type
    0. collection_extract(u!geometry): -> u!geometry
    1. collection_extract(u!geometry, i8): -> u!geometry

    Given the input geometry collection, return a homogeneous multi-geometry. All geometries in the multi-geometry will have the same dimension. If type is not specified, the multi-geometry will only contain geometries of the highest dimension. If type is specified, the multi-geometry will only contain geometries of that type. If there are no geometries of the specified type, an empty geometry is returned. Only points, linestrings, and polygons are supported. Type numbers: POINT - 0, LINE - 1, POLYGON - 2.
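The highest-dimension filtering can be sketched on a toy collection of tagged geometries (the tag names and tuple encoding here are assumptions for illustration):

```python
DIMENSION = {"POINT": 0, "LINE": 1, "POLYGON": 2}

def collection_extract(collection, type_dim=None):
    # With no type given, keep only geometries of the highest dimension present.
    if type_dim is None:
        type_dim = max(DIMENSION[kind] for kind, _ in collection)
    return [g for g in collection if DIMENSION[g[0]] == type_dim]

coll = [("POINT", (0, 0)), ("LINE", [(0, 0), (1, 1)]), ("POINT", (2, 2))]
print(collection_extract(coll))     # only the LINE (highest dimension is 1)
print(collection_extract(coll, 0))  # only the two points
```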

    flip_coordinates

    Implementations:
    flip_coordinates(geom_collection): -> return_type
    0. flip_coordinates(u!geometry): -> u!geometry

    Return a version of the input geometry with the X and Y axes flipped. This operation can be performed on geometries with more than 2 dimensions; however, only the X and Y axes will be flipped.

    remove_repeated_points

    Implementations:
    remove_repeated_points(geom): -> return_type
    0. remove_repeated_points(u!geometry): -> u!geometry
    1. remove_repeated_points(u!geometry, fp64): -> u!geometry

    Return a version of the input geometry with duplicate consecutive points removed. If the tolerance argument is provided, consecutive points within the tolerance distance of one another are considered to be duplicates.
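A sketch of the deduplication rule on bare (x, y) point lists, assuming a nonempty input (not the Substrait implementation):

```python
import math

def remove_repeated_points(points, tolerance=0.0):
    # Keep a point only if it is farther than `tolerance` from the last kept
    # point; tolerance 0.0 removes exact consecutive duplicates only.
    kept = [points[0]]
    for p in points[1:]:
        if math.dist(p, kept[-1]) > tolerance:
            kept.append(p)
    return kept

pts = [(0, 0), (0, 0), (1, 0), (1.05, 0), (2, 0)]
print(remove_repeated_points(pts))       # drops only the exact duplicate
print(remove_repeated_points(pts, 0.1))  # also drops (1.05, 0)
```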

    buffer

    Implementations:
    buffer(geom, buffer_radius): -> return_type
    0. buffer(u!geometry, fp64): -> u!geometry

    Compute and return an expanded version of the input geometry. All the points of the returned geometry are at a distance of buffer_radius away from the points of the input geometry. If a negative buffer_radius is provided, the geometry will shrink instead of expand. A negative buffer_radius may shrink the geometry completely, in which case an empty geometry is returned. For input geometries of points or lines, a negative buffer_radius will always return an empty geometry.

    centroid

    Implementations:
    centroid(geom): -> return_type
    0. centroid(u!geometry): -> u!geometry

    Return a point which is the geometric center of mass of the input geometry.

    minimum_bounding_circle

    Implementations:
    minimum_bounding_circle(geom): -> return_type
    0. minimum_bounding_circle(u!geometry): -> u!geometry

    Return the smallest circle polygon that contains the input geometry.


    functions_logarithmic.yaml

    This document file is generated for functions_logarithmic.yaml

    Scalar Functions

    ln

    Implementations:
    ln(x, option:rounding, option:on_domain_error, option:on_log_zero): -> return_type
    0. ln(fp32, option:rounding, option:on_domain_error, option:on_log_zero): -> fp32
    1. ln(fp64, option:rounding, option:on_domain_error, option:on_log_zero): -> fp64

    Natural logarithm of the value

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
  • log10

    Implementations:
    log10(x, option:rounding, option:on_domain_error, option:on_log_zero): -> return_type
    0. log10(fp32, option:rounding, option:on_domain_error, option:on_log_zero): -> fp32
    1. log10(fp64, option:rounding, option:on_domain_error, option:on_log_zero): -> fp64

    Logarithm to base 10 of the value

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
  • log2

    Implementations:
    log2(x, option:rounding, option:on_domain_error, option:on_log_zero): -> return_type
    0. log2(fp32, option:rounding, option:on_domain_error, option:on_log_zero): -> fp32
    1. log2(fp64, option:rounding, option:on_domain_error, option:on_log_zero): -> fp64

    Logarithm to base 2 of the value

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
  • logb

    Implementations:
    logb(x, base, option:rounding, option:on_domain_error, option:on_log_zero): -> return_type

  • x: The number `x` to compute the logarithm of
  • base: The logarithm base `b` to use
  • 0. logb(fp32, fp32, option:rounding, option:on_domain_error, option:on_log_zero): -> fp32
    1. logb(fp64, fp64, option:rounding, option:on_domain_error, option:on_log_zero): -> fp64

    Logarithm of the value with the given base: logb(x, b) => log_b(x).

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
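The logb identity above can be checked against Python's math module (the rounding and error-handling options are engine behaviors, not shown here):

```python
import math

def logb(x: float, base: float) -> float:
    # Change-of-base identity: log_base(x) = ln(x) / ln(base).
    return math.log(x) / math.log(base)

# Floating-point division makes exact equality unreliable; compare with isclose.
print(math.isclose(logb(8, 2), 3.0))      # True
print(math.isclose(logb(1000, 10), 3.0))  # True
```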
  • log1p

    Implementations:
    log1p(x, option:rounding, option:on_domain_error, option:on_log_zero): -> return_type
    0. log1p(fp32, option:rounding, option:on_domain_error, option:on_log_zero): -> fp32
    1. log1p(fp64, option:rounding, option:on_domain_error, option:on_log_zero): -> fp64

    Natural logarithm (base e) of 1 + x: log1p(x) => log(1 + x).

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']

    functions_rounding.yaml

    This document file is generated for functions_rounding.yaml

    Scalar Functions

    ceil

    Implementations:
    ceil(x): -> return_type
    0. ceil(fp32): -> fp32
    1. ceil(fp64): -> fp64

    Rounding to the ceiling of the value x.

    floor

    Implementations:
    floor(x): -> return_type
    0. floor(fp32): -> fp32
    1. floor(fp64): -> fp64

    Rounding to the floor of the value x.

    round

    Implementations:
    round(x, s, option:rounding): -> return_type

  • x: Numerical expression to be rounded.
  • s: Number of decimal places to be rounded to. For integer inputs, a positive `s` has no effect since `x` has no fractional digits. When `s` is a negative number, the rounding is performed to the nearest multiple of `10^(-s)`.
  • 0. round(i8, i32, option:rounding): -> i8?
    1. round(i16, i32, option:rounding): -> i16?
    2. round(i32, i32, option:rounding): -> i32?
    3. round(i64, i32, option:rounding): -> i64?
    4. round(fp32, i32, option:rounding): -> fp32?
    5. round(fp64, i32, option:rounding): -> fp64?

    Rounding the value x to s decimal places.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR', 'AWAY_FROM_ZERO', 'TIE_DOWN', 'TIE_UP', 'TIE_TOWARDS_ZERO', 'TIE_TO_ODD']
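Two of the rounding options can be illustrated with Python's decimal module, whose ROUND_HALF_EVEN and ROUND_HALF_UP correspond to TIE_TO_EVEN and TIE_AWAY_FROM_ZERO respectively (a sketch, not the engine's numeric path):

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_s(x: str, s: int, mode: str) -> Decimal:
    # Quantize to exponent -s; a negative s rounds to a multiple of 10^(-s).
    return Decimal(x).quantize(Decimal(1).scaleb(-s), rounding=mode)

print(round_s("2.5", 0, ROUND_HALF_EVEN))  # 2  (tie goes to the even neighbor)
print(round_s("2.5", 0, ROUND_HALF_UP))    # 3  (tie goes away from zero)
```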

    functions_set.yaml

    This document file is generated for functions_set.yaml

    Scalar Functions

    index_in

    Implementations:
    index_in(x, y, option:nan_equality): -> return_type
    0. index_in(T, List<T>, option:nan_equality): -> int64?

    Checks the membership of a value in a list of values. Returns the first 0-based index of the input T if T is equal to any element in List<T>. Returns NULL if not found. If T is NULL, returns NULL. If T is NaN, returns the 0-based index of NaN in List<T> by default, or NULL if NAN_IS_NOT_NAN is specified.

    Options:
  • nan_equality ['NAN_IS_NAN', 'NAN_IS_NOT_NAN']
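The NULL and NaN cases above can be sketched in Python, modeling NULL as `None` (an illustrative sketch, not a spec implementation):

```python
import math

def index_in(x, values, nan_equality="NAN_IS_NAN"):
    """First 0-based index of x in values; None models a NULL result."""
    if x is None:
        return None
    if isinstance(x, float) and math.isnan(x):
        if nan_equality == "NAN_IS_NOT_NAN":
            return None
        # NAN_IS_NAN (default): NaN is considered equal to NaN.
        for i, v in enumerate(values):
            if isinstance(v, float) and math.isnan(v):
                return i
        return None
    for i, v in enumerate(values):
        if v == x:
            return i
    return None

index_in(2.0, [1.0, 2.0, 3.0])                                 # -> 1
index_in(float("nan"), [1.0, float("nan")])                    # -> 1
index_in(float("nan"), [1.0, float("nan")], "NAN_IS_NOT_NAN")  # -> None
```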

    functions_string.yaml

    This document file is generated for functions_string.yaml

    Scalar Functions

    concat

    Implementations:
    concat(input, option:null_handling): -> return_type
    0. concat(varchar<L1>, option:null_handling): -> varchar<L1>
    1. concat(string, option:null_handling): -> string

    Concatenate strings. The null_handling option determines whether or not null values will be recognized by the function. If null_handling is set to IGNORE_NULLS, null value arguments will be ignored when strings are concatenated. If set to ACCEPT_NULLS, the result will be null if any argument passed to the concat function is null.

    Options:
  • null_handling ['IGNORE_NULLS', 'ACCEPT_NULLS']
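The two null_handling behaviors can be sketched as follows, with `None` modeling a NULL argument (illustrative only):

```python
def concat(*args, null_handling="ACCEPT_NULLS"):
    """Concatenate strings; None models a NULL argument."""
    if null_handling == "IGNORE_NULLS":
        # NULL arguments are simply skipped.
        return "".join(a for a in args if a is not None)
    # ACCEPT_NULLS: any NULL argument makes the whole result NULL.
    if any(a is None for a in args):
        return None
    return "".join(args)

concat("a", None, "c", null_handling="IGNORE_NULLS")  # -> "ac"
concat("a", None, "c")                                # -> None
```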

    like

    Implementations:
    like(input, match, option:case_sensitivity): -> return_type

  • input: The input string.
  • match: The string to match against the input string.
  • 0. like(varchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean
    1. like(string, string, option:case_sensitivity): -> boolean

    Whether the input string matches the match pattern, as in the SQL LIKE predicate. The case_sensitivity option applies to the match argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']

    substring

    Implementations:
    substring(input, start, length, option:negative_start): -> return_type
    0. substring(varchar<L1>, i32, i32, option:negative_start): -> varchar<L1>
    1. substring(string, i32, i32, option:negative_start): -> string
    2. substring(fixedchar<l1>, i32, i32, option:negative_start): -> string
    3. substring(varchar<L1>, i32, option:negative_start): -> varchar<L1>
    4. substring(string, i32, option:negative_start): -> string
    5. substring(fixedchar<l1>, i32, option:negative_start): -> string

    Extract a substring of a specified length starting from position start. A start value of 1 refers to the first character of the string. When length is not specified the function will extract a substring starting from position start and ending at the end of the string. The negative_start option applies to the start parameter. WRAP_FROM_END means the index will start from the end of the input and move backwards. The last character has an index of -1, the second to last character has an index of -2, and so on. LEFT_OF_BEGINNING means the returned substring will start from the left of the first character. A start of -1 will begin 2 characters left of the input, while a start of 0 begins 1 character left of the input.

    Options:
  • negative_start ['WRAP_FROM_END', 'LEFT_OF_BEGINNING', 'ERROR']
  • negative_start ['WRAP_FROM_END', 'LEFT_OF_BEGINNING']
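The negative_start behaviors can be sketched with 0-based Python slicing (an illustrative sketch; behavior for a start more negative than the wrapped range is left undefined here, as in the spec):

```python
def substring(s, start, length=None, negative_start="WRAP_FROM_END"):
    """1-based substring honoring the negative_start option described above."""
    if start >= 1:
        idx = start - 1                    # 1-based -> 0-based offset
    elif negative_start == "WRAP_FROM_END":
        idx = len(s) + start               # -1 addresses the last character
    elif negative_start == "LEFT_OF_BEGINNING":
        idx = start - 1                    # start 0 is one position left of s
    else:                                  # "ERROR"
        raise ValueError("negative start is an error under this option")
    end = len(s) if length is None else idx + length
    # Positions left of the first character are empty; clamp them away.
    return s[max(idx, 0):max(end, 0)]

substring("hello", 2, 3)                        # -> "ell"
substring("hello", -2, 2)                       # -> "lo" (WRAP_FROM_END)
substring("hello", -1, 3, "LEFT_OF_BEGINNING")  # -> "h"
```

In the LEFT_OF_BEGINNING case, the phantom positions left of the string consume part of the requested length, so a start of -1 with length 3 yields only the first character.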

    regexp_match_substring

    Implementations:
    regexp_match_substring(input, pattern, position, occurrence, group, option:case_sensitivity, option:multiline, option:dotall): -> return_type
    0. regexp_match_substring(varchar<L1>, varchar<L2>, i64, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> varchar<L1>
    1. regexp_match_substring(string, string, i64, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> string

    Extract a substring that matches the given regular expression pattern. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The occurrence of the pattern to be extracted is specified using the occurrence argument. Specifying 1 means the first occurrence will be extracted, 2 means the second occurrence, and so on. The occurrence argument should be a positive non-zero integer. The position in the string at which to begin searching for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. The regular expression capture group can be specified using the group argument. Specifying 0 will return the substring matching the full regular expression. Specifying 1 will return the substring matching only the first capture group, and so on. The group argument should be a non-negative integer. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the occurrence value is out of range, the position value is out of range, or the group value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
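As a sketch of the occurrence/group/position semantics using Python's `re` module (note the spec mandates ICU regex syntax, which differs from Python's in places; this is illustrative only):

```python
import re

def regexp_match_substring(inp, pattern, position=1, occurrence=1, group=0,
                           case_sensitivity="CASE_SENSITIVE",
                           multiline="MULTILINE_DISABLED",
                           dotall="DOTALL_DISABLED"):
    """Extract the group-th capture of the occurrence-th match."""
    flags = 0
    if case_sensitivity != "CASE_SENSITIVE":
        flags |= re.IGNORECASE
    if multiline == "MULTILINE_ENABLED":
        flags |= re.MULTILINE
    if dotall == "DOTALL_ENABLED":
        flags |= re.DOTALL
    # Begin searching at the 1-based position; behavior is undefined if
    # occurrence or position is out of range, so no bounds checks here.
    matches = list(re.finditer(pattern, inp[position - 1:], flags))
    return matches[occurrence - 1].group(group)

regexp_match_substring("ab12cd34", r"([a-z]+)(\d+)",
                       occurrence=2, group=2)   # -> "34"
```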

    regexp_match_substring_all

    Implementations:
    regexp_match_substring_all(input, pattern, position, group, option:case_sensitivity, option:multiline, option:dotall): -> return_type
    0. regexp_match_substring_all(varchar<L1>, varchar<L2>, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> List<varchar<L1>>
    1. regexp_match_substring_all(string, string, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> List<string>

    Extract all substrings that match the given regular expression pattern. This will return a list of extracted strings with one value for each occurrence of a match. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The position in the string at which to begin searching for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. The regular expression capture group can be specified using the group argument. Specifying 0 will return substrings matching the full regular expression. Specifying 1 will return substrings matching only the first capture group, and so on. The group argument should be a non-negative integer. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the position value is out of range, or the group value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']

    starts_with

    Implementations:
    starts_with(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to search for.
  • 0. starts_with(varchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean
    1. starts_with(varchar<L1>, string, option:case_sensitivity): -> boolean
    2. starts_with(varchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean
    3. starts_with(string, string, option:case_sensitivity): -> boolean
    4. starts_with(string, varchar<L1>, option:case_sensitivity): -> boolean
    5. starts_with(string, fixedchar<L1>, option:case_sensitivity): -> boolean
    6. starts_with(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean
    7. starts_with(fixedchar<L1>, string, option:case_sensitivity): -> boolean
    8. starts_with(fixedchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean

    Whether the input string starts with the substring. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']

    ends_with

    Implementations:
    ends_with(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to search for.
  • 0. ends_with(varchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean
    1. ends_with(varchar<L1>, string, option:case_sensitivity): -> boolean
    2. ends_with(varchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean
    3. ends_with(string, string, option:case_sensitivity): -> boolean
    4. ends_with(string, varchar<L1>, option:case_sensitivity): -> boolean
    5. ends_with(string, fixedchar<L1>, option:case_sensitivity): -> boolean
    6. ends_with(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean
    7. ends_with(fixedchar<L1>, string, option:case_sensitivity): -> boolean
    8. ends_with(fixedchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean

    Whether the input string ends with the substring. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']

    contains

    Implementations:
    contains(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to search for.
  • 0. contains(varchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean
    1. contains(varchar<L1>, string, option:case_sensitivity): -> boolean
    2. contains(varchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean
    3. contains(string, string, option:case_sensitivity): -> boolean
    4. contains(string, varchar<L1>, option:case_sensitivity): -> boolean
    5. contains(string, fixedchar<L1>, option:case_sensitivity): -> boolean
    6. contains(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean
    7. contains(fixedchar<L1>, string, option:case_sensitivity): -> boolean
    8. contains(fixedchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean

    Whether the input string contains the substring. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']

    strpos

    Implementations:
    strpos(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to search for.
  • 0. strpos(string, string, option:case_sensitivity): -> i64
    1. strpos(varchar<L1>, varchar<L1>, option:case_sensitivity): -> i64
    2. strpos(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> i64

    Return the position of the first occurrence of a string in another string. The first character of the string is at position 1. If no occurrence is found, 0 is returned. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
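The 1-based/0-if-absent convention lines up neatly with Python's `str.find`, which returns -1 when the substring is absent (an illustrative sketch; the case-insensitive branch uses simple lowercasing rather than full Unicode case folding):

```python
def strpos(inp, substring, case_sensitivity="CASE_SENSITIVE"):
    """1-based position of the first occurrence; 0 if not found."""
    if case_sensitivity != "CASE_SENSITIVE":
        inp, substring = inp.lower(), substring.lower()
    return inp.find(substring) + 1  # str.find yields -1 when absent -> 0

strpos("hello world", "world")            # -> 7
strpos("Hello", "h", "CASE_INSENSITIVE")  # -> 1
strpos("hello", "xyz")                    # -> 0
```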

    regexp_strpos

    Implementations:
    regexp_strpos(input, pattern, position, occurrence, option:case_sensitivity, option:multiline, option:dotall): -> return_type
    0. regexp_strpos(varchar<L1>, varchar<L2>, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64
    1. regexp_strpos(string, string, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64

    Return the position of an occurrence of the given regular expression pattern in a string. The first character of the string is at position 1. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The position in the string at which to begin searching for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. Which occurrence to return the position of is specified using the occurrence argument. Specifying 1 means the position of the first occurrence will be returned, 2 means the position of the second occurrence, and so on. The occurrence argument should be a positive non-zero integer. If no occurrence is found, 0 is returned. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the occurrence value is out of range, or the position value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']

    count_substring

    Implementations:
    count_substring(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to count.
  • 0. count_substring(string, string, option:case_sensitivity): -> i64
    1. count_substring(varchar<L1>, varchar<L2>, option:case_sensitivity): -> i64
    2. count_substring(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> i64

    Return the number of non-overlapping occurrences of a substring in an input string. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
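Python's `str.count` already counts non-overlapping occurrences, so the semantics can be sketched directly (illustrative; case-insensitive matching is approximated with lowercasing):

```python
def count_substring(inp, substring, case_sensitivity="CASE_SENSITIVE"):
    """Non-overlapping occurrence count of substring in inp."""
    if case_sensitivity != "CASE_SENSITIVE":
        inp, substring = inp.lower(), substring.lower()
    return inp.count(substring)

count_substring("aaaa", "aa")  # -> 2, not 3: occurrences do not overlap
```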

    regexp_count_substring

    Implementations:
    regexp_count_substring(input, pattern, position, option:case_sensitivity, option:multiline, option:dotall): -> return_type
    0. regexp_count_substring(string, string, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64
    1. regexp_count_substring(varchar<L1>, varchar<L2>, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64
    2. regexp_count_substring(fixedchar<L1>, fixedchar<L2>, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64

    Return the number of non-overlapping occurrences of a regular expression pattern in an input string. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The position in the string at which to begin searching for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile or the position value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']

    replace

    Implementations:
    replace(input, substring, replacement, option:case_sensitivity): -> return_type

  • input: Input string.
  • substring: The substring to replace.
  • replacement: The replacement string.
  • 0. replace(string, string, string, option:case_sensitivity): -> string
    1. replace(varchar<L1>, varchar<L2>, varchar<L3>, option:case_sensitivity): -> varchar<L1>

    Replace all occurrences of the substring with the replacement string. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']

    concat_ws

    Implementations:
    concat_ws(separator, string_arguments): -> return_type

  • separator: Character to separate strings by.
  • string_arguments: Strings to be concatenated.
  • 0. concat_ws(string, string): -> string
    1. concat_ws(varchar<L2>, varchar<L1>): -> varchar<L1>

    Concatenate strings together separated by a separator.

    repeat

    Implementations:
    repeat(input, count): -> return_type
    0. repeat(string, i64): -> string
    1. repeat(varchar<L1>, i64): -> varchar<L1>

    Repeat a string count number of times.

    reverse

    Implementations:
    reverse(input): -> return_type
    0. reverse(string): -> string
    1. reverse(varchar<L1>): -> varchar<L1>
    2. reverse(fixedchar<L1>): -> fixedchar<L1>

    Returns the string in reverse order.

    replace_slice

    Implementations:
    replace_slice(input, start, length, replacement): -> return_type

  • input: Input string.
  • start: The position in the string to start deleting/inserting characters.
  • length: The number of characters to delete from the input string.
  • replacement: The new string to insert at the start position.
  • 0. replace_slice(string, i64, i64, string): -> string
    1. replace_slice(varchar<L1>, i64, i64, varchar<L2>): -> varchar<L1>

    Replace a slice of the input string. A specified ‘length’ of characters will be deleted from the input string beginning at the ‘start’ position and will be replaced by a new string. A start value of 1 indicates the first character of the input string. If start is negative or zero, or greater than the length of the input string, a null string is returned. If ‘length’ is negative, a null string is returned. If ‘length’ is zero, inserting of the new string occurs at the specified ‘start’ position and no characters are deleted. If ‘length’ is greater than the input string, deletion will occur up to the last character of the input string.
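The null-returning edge cases make this one worth sketching; here `None` models the null string (an illustrative sketch of the rules above):

```python
def replace_slice(inp, start, length, replacement):
    """Delete `length` characters at 1-based `start` and insert `replacement`.

    Returns None (the null string) when start is out of range or length
    is negative, per the rules above.
    """
    if start <= 0 or start > len(inp) or length < 0:
        return None
    i = start - 1                          # 1-based -> 0-based
    # A length past the end of the string simply deletes to the end.
    return inp[:i] + replacement + inp[i + length:]

replace_slice("hello", 2, 3, "XY")  # -> "hXYo"
replace_slice("hello", 2, 0, "XY")  # -> "hXYello" (pure insertion)
replace_slice("hello", 0, 3, "XY")  # -> None
```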

    lower

    Implementations:
    lower(input, option:char_set): -> return_type
    0. lower(string, option:char_set): -> string
    1. lower(varchar<L1>, option:char_set): -> varchar<L1>
    2. lower(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Transform the string to lower case characters. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']

    upper

    Implementations:
    upper(input, option:char_set): -> return_type
    0. upper(string, option:char_set): -> string
    1. upper(varchar<L1>, option:char_set): -> varchar<L1>
    2. upper(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Transform the string to upper case characters. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']

    swapcase

    Implementations:
    swapcase(input, option:char_set): -> return_type
    0. swapcase(string, option:char_set): -> string
    1. swapcase(varchar<L1>, option:char_set): -> varchar<L1>
    2. swapcase(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Transform the string’s lowercase characters to uppercase and uppercase characters to lowercase. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']

    capitalize

    Implementations:
    capitalize(input, option:char_set): -> return_type
    0. capitalize(string, option:char_set): -> string
    1. capitalize(varchar<L1>, option:char_set): -> varchar<L1>
    2. capitalize(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Capitalize the first character of the input string. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']

    title

    Implementations:
    title(input, option:char_set): -> return_type
    0. title(string, option:char_set): -> string
    1. title(varchar<L1>, option:char_set): -> varchar<L1>
    2. title(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Converts the input string into titlecase. Capitalize the first character of each word in the input string except for articles (a, an, the). Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']

    char_length

    Implementations:
    char_length(input): -> return_type
    0. char_length(string): -> i64
    1. char_length(varchar<L1>): -> i64
    2. char_length(fixedchar<L1>): -> i64

    Return the number of characters in the input string. The length includes trailing spaces.

    bit_length

    Implementations:
    bit_length(input): -> return_type
    0. bit_length(string): -> i64
    1. bit_length(varchar<L1>): -> i64
    2. bit_length(fixedchar<L1>): -> i64

    Return the number of bits in the input string.

    octet_length

    Implementations:
    octet_length(input): -> return_type
    0. octet_length(string): -> i64
    1. octet_length(varchar<L1>): -> i64
    2. octet_length(fixedchar<L1>): -> i64

    Return the number of bytes in the input string.

    regexp_replace

    Implementations:
    regexp_replace(input, pattern, replacement, position, occurrence, option:case_sensitivity, option:multiline, option:dotall): -> return_type

  • input: The input string.
  • pattern: The regular expression to search for within the input string.
  • replacement: The replacement string.
  • position: The position to start the search.
  • occurrence: Which occurrence of the match to replace.
  • 0. regexp_replace(string, string, string, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> string
    1. regexp_replace(varchar<L1>, varchar<L2>, varchar<L3>, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> varchar<L1>

    Search a string for a substring that matches a given regular expression pattern and replace it with a replacement string. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The occurrence of the pattern to be replaced is specified using the occurrence argument. Specifying 1 means only the first occurrence will be replaced, 2 means the second occurrence, and so on. Specifying 0 means all occurrences will be replaced. The position in the string at which to begin searching for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. The replacement string can capture groups using numbered backreferences. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the replacement contains an illegal back-reference, the occurrence value is out of range, or the position value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']

    ltrim

    Implementations:
    ltrim(input, characters): -> return_type

  • input: The string to remove characters from.
  • characters: The set of characters to remove.
  • 0. ltrim(varchar<L1>, varchar<L2>): -> varchar<L1>
    1. ltrim(string, string): -> string

    Remove any occurrence of the characters from the left side of the string. If no characters are specified, spaces are removed.

    rtrim

    Implementations:
    rtrim(input, characters): -> return_type

  • input: The string to remove characters from.
  • characters: The set of characters to remove.
  • 0. rtrim(varchar<L1>, varchar<L2>): -> varchar<L1>
    1. rtrim(string, string): -> string

    Remove any occurrence of the characters from the right side of the string. If no characters are specified, spaces are removed.

    trim

    Implementations:
    trim(input, characters): -> return_type

  • input: The string to remove characters from.
  • characters: The set of characters to remove.
  • 0. trim(varchar<L1>, varchar<L2>): -> varchar<L1>
    1. trim(string, string): -> string

    Remove any occurrence of the characters from the left and right sides of the string. If no characters are specified, spaces are removed.

    lpad

    Implementations:
    lpad(input, length, characters): -> return_type

  • input: The string to pad.
  • length: The length of the output string.
  • characters: The string of characters to use for padding.
  • 0. lpad(varchar<L1>, i32, varchar<L2>): -> varchar<L1>
    1. lpad(string, i32, string): -> string

    Left-pad the input string with the string of ‘characters’ until the specified length of the string has been reached. If the input string is longer than ‘length’, remove characters from the right-side to shorten it to ‘length’ characters. If the string of ‘characters’ is longer than the remaining ‘length’ needed to be filled, only pad until ‘length’ has been reached. If ‘characters’ is not specified, the default value is a single space.
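The truncation and partial-pad rules can be sketched as follows (illustrative; the default pad string is a single space, as stated above):

```python
def lpad(inp, length, characters=" "):
    """Left-pad with a repeating pad string up to `length` characters.

    An input longer than `length` is shortened by dropping characters
    from the right side; the pad string is cut off once `length` is reached.
    """
    if len(inp) >= length:
        return inp[:length]
    pad = characters * length          # more than enough pad characters
    return pad[: length - len(inp)] + inp

lpad("42", 5, "0")  # -> "00042"
lpad("abcdef", 4)   # -> "abcd" (truncated from the right)
lpad("7", 6, "ab")  # -> "ababa7" (pad string cut off mid-repeat)
```

rpad is the mirror image: pad on the right, and truncate an over-long input from the left.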

    rpad

    Implementations:
    rpad(input, length, characters): -> return_type

  • input: The string to pad.
  • length: The length of the output string.
  • characters: The string of characters to use for padding.
  • 0. rpad(varchar<L1>, i32, varchar<L2>): -> varchar<L1>
    1. rpad(string, i32, string): -> string

    Right-pad the input string with the string of ‘characters’ until the specified length of the string has been reached. If the input string is longer than ‘length’, remove characters from the left-side to shorten it to ‘length’ characters. If the string of ‘characters’ is longer than the remaining ‘length’ needed to be filled, only pad until ‘length’ has been reached. If ‘characters’ is not specified, the default value is a single space.

    center

    Implementations:
    center(input, length, character, option:padding): -> return_type

  • input: The string to pad.
  • length: The length of the output string.
  • character: The character to use for padding.
  • 0. center(varchar<L1>, i32, varchar<L1>, option:padding): -> varchar<L1>
    1. center(string, i32, string, option:padding): -> string

    Center the input string by padding the sides with a single character until the specified length of the string has been reached. By default, if the length will be reached with an uneven number of padding, the extra padding will be applied to the right side. The side with extra padding can be controlled with the padding option. Behavior is undefined if the number of characters passed to the character argument is not 1.

    Options:
  • padding ['RIGHT', 'LEFT']
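A sketch of the padding-side rule (illustrative; as noted above, behavior is undefined when the pad argument is not exactly one character, so no check is made here):

```python
def center(inp, length, character=" ", padding="RIGHT"):
    """Pad both sides with a single character up to `length`.

    When the total padding is odd, the extra character goes to the side
    named by the padding option (RIGHT by default).
    """
    total = max(length - len(inp), 0)
    small, big = total // 2, total - total // 2
    left, right = (small, big) if padding == "RIGHT" else (big, small)
    return character * left + inp + character * right

center("ab", 5, "*")          # -> "*ab**" (extra pad on the right)
center("ab", 5, "*", "LEFT")  # -> "**ab*"
```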

    left

    Implementations:
    left(input, count): -> return_type
    0. left(varchar<L1>, i32): -> varchar<L1>
    1. left(string, i32): -> string

    Extract count characters starting from the left of the string.

    right

    Implementations:
    right(input, count): -> return_type
    0. right(varchar<L1>, i32): -> varchar<L1>
    1. right(string, i32): -> string

    Extract count characters starting from the right of the string.

    string_split

    Implementations:
    string_split(input, separator): -> return_type

  • input: The input string.
  • separator: A character used for splitting the string.
  • 0. string_split(varchar<L1>, varchar<L2>): -> List<varchar<L1>>
    1. string_split(string, string): -> List<string>

    Split a string into a list of strings, based on a specified separator character.

    regexp_string_split

    Implementations:
    regexp_string_split(input, pattern, option:case_sensitivity, option:multiline, option:dotall): -> return_type

  • input: The input string.
  • pattern: The regular expression to search for within the input string.
  • 0. regexp_string_split(varchar<L1>, varchar<L2>, option:case_sensitivity, option:multiline, option:dotall): -> List<varchar<L1>>
    1. regexp_string_split(string, string, option:case_sensitivity, option:multiline, option:dotall): -> List<string>

    Split a string into a list of strings, based on a regular expression pattern. The substrings matched by the pattern will be used as the separators to split the input string and will not be included in the resulting list. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • Aggregate Functions

    string_agg

    Implementations:
    string_agg(input, separator): -> return_type

  • input: Column of string values.
  • separator: Separator for concatenated strings
  • 0. string_agg(string, string): -> string

    Concatenates a column of string values with a separator.

    GitHub

    functions_string.yaml

    This document file is generated for functions_string.yaml

    Scalar Functions

    concat

    Implementations:
    concat(input, option:null_handling): -> return_type
    0. concat(varchar<L1>, option:null_handling): -> varchar<L1>
    1. concat(string, option:null_handling): -> string

    Concatenate strings. The null_handling option determines whether or not null values will be recognized by the function. If null_handling is set to IGNORE_NULLS, null value arguments will be ignored when strings are concatenated. If set to ACCEPT_NULLS, the result will be null if any argument passed to the concat function is null.

    Options:
  • null_handling ['IGNORE_NULLS', 'ACCEPT_NULLS']
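As a sketch of the two null_handling modes (hypothetical Python with None standing in for SQL NULL; not the Substrait implementation):

```python
# Hypothetical sketch of concat's null_handling semantics; None stands in
# for SQL NULL. Not the Substrait implementation.
def concat(*args, null_handling="ACCEPT_NULLS"):
    if null_handling == "ACCEPT_NULLS" and any(a is None for a in args):
        return None  # any null argument makes the whole result null
    return "".join(a for a in args if a is not None)
```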

like

    Implementations:
    like(input, match, option:case_sensitivity): -> return_type

  • input: The input string.
  • match: The string to match against the input string.

0. like(varchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean
    1. like(string, string, option:case_sensitivity): -> boolean

Whether two strings are like each other. The case_sensitivity option applies to the match argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']

substring

    Implementations:
    substring(input, start, length, option:negative_start): -> return_type
    0. substring(varchar<L1>, i32, i32, option:negative_start): -> varchar<L1>
    1. substring(string, i32, i32, option:negative_start): -> string
2. substring(fixedchar<L1>, i32, i32, option:negative_start): -> string
    3. substring(varchar<L1>, i32, option:negative_start): -> varchar<L1>
    4. substring(string, i32, option:negative_start): -> string
5. substring(fixedchar<L1>, i32, option:negative_start): -> string

Extract a substring of a specified length starting from position start. A start value of 1 refers to the first character of the string. When length is not specified the function will extract a substring starting from position start and ending at the end of the string. The negative_start option applies to the start parameter. WRAP_FROM_END means the index will start from the end of the input and move backwards. The last character has an index of -1, the second to last character has an index of -2, and so on. LEFT_OF_BEGINNING means the returned substring will start from the left of the first character. A start of -1 will begin 2 characters left of the input, while a start of 0 begins 1 character left of the input.

    Options:
  • negative_start ['WRAP_FROM_END', 'LEFT_OF_BEGINNING', 'ERROR']
  • negative_start ['WRAP_FROM_END', 'LEFT_OF_BEGINNING']
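A minimal Python sketch of the start/length/negative_start semantics described above (hypothetical; None stands in for an omitted length, and out-of-range positions simply clamp here rather than being undefined):

```python
def substring(s, start, length=None, negative_start="WRAP_FROM_END"):
    # Convert the 1-based start to a 0-based offset. Under LEFT_OF_BEGINNING,
    # start 0 is one character left of the input, -1 is two left, and so on.
    if start >= 1 or negative_start == "LEFT_OF_BEGINNING":
        begin = start - 1
    else:  # WRAP_FROM_END: -1 is the last character, -2 the second to last
        begin = len(s) + start
    end = len(s) if length is None else begin + length
    return s[max(begin, 0):max(end, 0)]
```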

regexp_match_substring

    Implementations:
    regexp_match_substring(input, pattern, position, occurrence, group, option:case_sensitivity, option:multiline, option:dotall): -> return_type
    0. regexp_match_substring(varchar<L1>, varchar<L2>, i64, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> varchar<L1>
    1. regexp_match_substring(string, string, i64, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> string

    Extract a substring that matches the given regular expression pattern. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The occurrence of the pattern to be extracted is specified using the occurrence argument. Specifying 1 means the first occurrence will be extracted, 2 means the second occurrence, and so on. The occurrence argument should be a positive non-zero integer. The number of characters from the beginning of the string to begin starting to search for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. The regular expression capture group can be specified using the group argument. Specifying 0 will return the substring matching the full regular expression. Specifying 1 will return the substring matching only the first capture group, and so on. The group argument should be a non-negative integer. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the occurrence value is out of range, the position value is out of range, or the group value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
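The position/occurrence/group interplay can be sketched as follows (hypothetical; Python's re module stands in for the ICU engine the spec references, and corner-case semantics differ):

```python
import re

# Sketch of position/occurrence/group handling; re stands in for ICU.
def regexp_match_substring(s, pattern, position=1, occurrence=1, group=0):
    matches = list(re.finditer(pattern, s[position - 1:]))
    if occurrence > len(matches):
        return None  # the spec leaves out-of-range values undefined
    return matches[occurrence - 1].group(group)
```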

regexp_match_substring_all

    Implementations:
    regexp_match_substring_all(input, pattern, position, group, option:case_sensitivity, option:multiline, option:dotall): -> return_type
    0. regexp_match_substring_all(varchar<L1>, varchar<L2>, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> List<varchar<L1>>
    1. regexp_match_substring_all(string, string, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> List<string>

    Extract all substrings that match the given regular expression pattern. This will return a list of extracted strings with one value for each occurrence of a match. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The number of characters from the beginning of the string to begin starting to search for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. The regular expression capture group can be specified using the group argument. Specifying 0 will return substrings matching the full regular expression. Specifying 1 will return substrings matching only the first capture group, and so on. The group argument should be a non-negative integer. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the position value is out of range, or the group value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']

starts_with

    Implementations:
    starts_with(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to search for.

0. starts_with(varchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean
    1. starts_with(varchar<L1>, string, option:case_sensitivity): -> boolean
    2. starts_with(varchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean
    3. starts_with(string, string, option:case_sensitivity): -> boolean
    4. starts_with(string, varchar<L1>, option:case_sensitivity): -> boolean
    5. starts_with(string, fixedchar<L1>, option:case_sensitivity): -> boolean
    6. starts_with(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean
    7. starts_with(fixedchar<L1>, string, option:case_sensitivity): -> boolean
    8. starts_with(fixedchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean

    Whether the input string starts with the substring. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
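A hypothetical sketch of the case_sensitivity handling (ends_with and contains below follow the same shape; the ASCII-only folding variant is omitted):

```python
# Case-insensitive matching modeled by folding both sides before comparing.
def starts_with(s, sub, case_sensitivity="CASE_SENSITIVE"):
    if case_sensitivity != "CASE_SENSITIVE":
        s, sub = s.lower(), sub.lower()
    return s.startswith(sub)
```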

ends_with

    Implementations:
    ends_with(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to search for.

0. ends_with(varchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean
    1. ends_with(varchar<L1>, string, option:case_sensitivity): -> boolean
    2. ends_with(varchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean
    3. ends_with(string, string, option:case_sensitivity): -> boolean
    4. ends_with(string, varchar<L1>, option:case_sensitivity): -> boolean
    5. ends_with(string, fixedchar<L1>, option:case_sensitivity): -> boolean
    6. ends_with(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean
    7. ends_with(fixedchar<L1>, string, option:case_sensitivity): -> boolean
    8. ends_with(fixedchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean

    Whether input string ends with the substring. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']

contains

    Implementations:
    contains(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to search for.

0. contains(varchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean
    1. contains(varchar<L1>, string, option:case_sensitivity): -> boolean
    2. contains(varchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean
    3. contains(string, string, option:case_sensitivity): -> boolean
    4. contains(string, varchar<L1>, option:case_sensitivity): -> boolean
    5. contains(string, fixedchar<L1>, option:case_sensitivity): -> boolean
    6. contains(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean
    7. contains(fixedchar<L1>, string, option:case_sensitivity): -> boolean
    8. contains(fixedchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean

    Whether the input string contains the substring. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']

strpos

    Implementations:
    strpos(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to search for.

0. strpos(string, string, option:case_sensitivity): -> i64
    1. strpos(varchar<L1>, varchar<L1>, option:case_sensitivity): -> i64
    2. strpos(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> i64

    Return the position of the first occurrence of a string in another string. The first character of the string is at position 1. If no occurrence is found, 0 is returned. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
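The 1-based position and the 0-for-no-match convention can be sketched as (hypothetical Python, not the Substrait implementation):

```python
def strpos(s, sub, case_sensitivity="CASE_SENSITIVE"):
    if case_sensitivity != "CASE_SENSITIVE":
        s, sub = s.lower(), sub.lower()
    # str.find is 0-based and returns -1 when absent, so adding 1
    # yields the 1-based position, with 0 meaning "not found".
    return s.find(sub) + 1
```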

regexp_strpos

    Implementations:
    regexp_strpos(input, pattern, position, occurrence, option:case_sensitivity, option:multiline, option:dotall): -> return_type
    0. regexp_strpos(varchar<L1>, varchar<L2>, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64
    1. regexp_strpos(string, string, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64

Return the position of an occurrence of the given regular expression pattern in a string. The first character of the string is at position 1. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The number of characters from the beginning of the string to begin starting to search for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. Which occurrence to return the position of is specified using the occurrence argument. Specifying 1 means the position of the first occurrence will be returned, 2 means the position of the second occurrence, and so on. The occurrence argument should be a positive non-zero integer. If no occurrence is found, 0 is returned. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the occurrence value is out of range, or the position value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']

count_substring

    Implementations:
    count_substring(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to count.

0. count_substring(string, string, option:case_sensitivity): -> i64
    1. count_substring(varchar<L1>, varchar<L2>, option:case_sensitivity): -> i64
    2. count_substring(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> i64

    Return the number of non-overlapping occurrences of a substring in an input string. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
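The non-overlapping counting rule can be sketched as (hypothetical Python; str.count already counts non-overlapping occurrences):

```python
def count_substring(s, sub, case_sensitivity="CASE_SENSITIVE"):
    if case_sensitivity != "CASE_SENSITIVE":
        s, sub = s.lower(), sub.lower()
    return s.count(sub)  # str.count is non-overlapping by definition
```

Non-overlapping means "aaaa" contains two occurrences of "aa", not three.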

regexp_count_substring

    Implementations:
    regexp_count_substring(input, pattern, position, option:case_sensitivity, option:multiline, option:dotall): -> return_type
    0. regexp_count_substring(string, string, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64
    1. regexp_count_substring(varchar<L1>, varchar<L2>, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64
    2. regexp_count_substring(fixedchar<L1>, fixedchar<L2>, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64

    Return the number of non-overlapping occurrences of a regular expression pattern in an input string. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The number of characters from the beginning of the string to begin starting to search for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile or the position value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']

replace

    Implementations:
    replace(input, substring, replacement, option:case_sensitivity): -> return_type

  • input: Input string.
  • substring: The substring to replace.
  • replacement: The replacement string.

0. replace(string, string, string, option:case_sensitivity): -> string
    1. replace(varchar<L1>, varchar<L2>, varchar<L3>, option:case_sensitivity): -> varchar<L1>

    Replace all occurrences of the substring with the replacement string. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']

concat_ws

    Implementations:
    concat_ws(separator, string_arguments): -> return_type

  • separator: Character to separate strings by.
  • string_arguments: Strings to be concatenated.

0. concat_ws(string, string): -> string
    1. concat_ws(varchar<L2>, varchar<L1>): -> varchar<L1>

    Concatenate strings together separated by a separator.

    repeat

    Implementations:
    repeat(input, count): -> return_type
    0. repeat(string, i64): -> string
1. repeat(varchar<L1>, i64): -> varchar<L1>

    Repeat a string count number of times.

    reverse

    Implementations:
    reverse(input): -> return_type
    0. reverse(string): -> string
    1. reverse(varchar<L1>): -> varchar<L1>
    2. reverse(fixedchar<L1>): -> fixedchar<L1>

    Returns the string in reverse order.

    replace_slice

    Implementations:
    replace_slice(input, start, length, replacement): -> return_type

  • input: Input string.
  • start: The position in the string to start deleting/inserting characters.
  • length: The number of characters to delete from the input string.
  • replacement: The new string to insert at the start position.

0. replace_slice(string, i64, i64, string): -> string
    1. replace_slice(varchar<L1>, i64, i64, varchar<L2>): -> varchar<L1>

    Replace a slice of the input string. A specified ‘length’ of characters will be deleted from the input string beginning at the ‘start’ position and will be replaced by a new string. A start value of 1 indicates the first character of the input string. If start is negative or zero, or greater than the length of the input string, a null string is returned. If ‘length’ is negative, a null string is returned. If ‘length’ is zero, inserting of the new string occurs at the specified ‘start’ position and no characters are deleted. If ‘length’ is greater than the input string, deletion will occur up to the last character of the input string.
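The delete-then-insert behavior and the null cases above can be sketched as (hypothetical Python, with None standing in for a null string):

```python
def replace_slice(s, start, length, replacement):
    # Null result for the invalid start/length cases described above.
    if start <= 0 or start > len(s) or length < 0:
        return None
    begin = start - 1
    # Slicing past the end naturally deletes only up to the last character.
    return s[:begin] + replacement + s[begin + length:]
```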

    lower

    Implementations:
    lower(input, option:char_set): -> return_type
    0. lower(string, option:char_set): -> string
    1. lower(varchar<L1>, option:char_set): -> varchar<L1>
    2. lower(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Transform the string to lower case characters. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']

upper

    Implementations:
    upper(input, option:char_set): -> return_type
    0. upper(string, option:char_set): -> string
    1. upper(varchar<L1>, option:char_set): -> varchar<L1>
    2. upper(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Transform the string to upper case characters. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']

swapcase

    Implementations:
    swapcase(input, option:char_set): -> return_type
    0. swapcase(string, option:char_set): -> string
    1. swapcase(varchar<L1>, option:char_set): -> varchar<L1>
    2. swapcase(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Transform the string’s lowercase characters to uppercase and uppercase characters to lowercase. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']

capitalize

    Implementations:
    capitalize(input, option:char_set): -> return_type
    0. capitalize(string, option:char_set): -> string
    1. capitalize(varchar<L1>, option:char_set): -> varchar<L1>
    2. capitalize(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Capitalize the first character of the input string. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']

title

    Implementations:
    title(input, option:char_set): -> return_type
    0. title(string, option:char_set): -> string
    1. title(varchar<L1>, option:char_set): -> varchar<L1>
    2. title(fixedchar<L1>, option:char_set): -> fixedchar<L1>

Convert the input string into titlecase, capitalizing the first character of each word in the input string except for articles (a, an, the). Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']

char_length

    Implementations:
    char_length(input): -> return_type
    0. char_length(string): -> i64
    1. char_length(varchar<L1>): -> i64
    2. char_length(fixedchar<L1>): -> i64

    Return the number of characters in the input string. The length includes trailing spaces.

    bit_length

    Implementations:
    bit_length(input): -> return_type
    0. bit_length(string): -> i64
    1. bit_length(varchar<L1>): -> i64
    2. bit_length(fixedchar<L1>): -> i64

    Return the number of bits in the input string.

    octet_length

    Implementations:
    octet_length(input): -> return_type
    0. octet_length(string): -> i64
    1. octet_length(varchar<L1>): -> i64
    2. octet_length(fixedchar<L1>): -> i64

    Return the number of bytes in the input string.
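The three length functions differ only in the unit counted; a hypothetical sketch, assuming the string is stored as UTF-8:

```python
def char_length(s):
    return len(s)  # characters, including trailing spaces

def octet_length(s):
    return len(s.encode("utf-8"))  # bytes, assuming UTF-8 storage

def bit_length(s):
    return 8 * octet_length(s)  # eight bits per byte
```

For multi-byte characters the three counts diverge: "héllo" has 5 characters but 6 bytes, since é occupies two bytes in UTF-8.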

    regexp_replace

    Implementations:
    regexp_replace(input, pattern, replacement, position, occurrence, option:case_sensitivity, option:multiline, option:dotall): -> return_type

  • input: The input string.
  • pattern: The regular expression to search for within the input string.
  • replacement: The replacement string.
  • position: The position to start the search.
  • occurrence: Which occurrence of the match to replace.

0. regexp_replace(string, string, string, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> string
    1. regexp_replace(varchar<L1>, varchar<L2>, varchar<L3>, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> varchar<L1>

Search a string for a substring that matches a given regular expression pattern and replace it with a replacement string. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The occurrence of the pattern to be replaced is specified using the occurrence argument. Specifying 1 means only the first occurrence will be replaced, 2 means the second occurrence, and so on. Specifying 0 means all occurrences will be replaced. The number of characters from the beginning of the string to begin starting to search for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. The replacement string can capture groups using numbered backreferences. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the replacement contains an illegal back-reference, the occurrence value is out of range, or the position value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
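The position/occurrence interplay can be sketched as follows (hypothetical; Python's re stands in for ICU, and note re uses \1-style backreferences rather than ICU's $1):

```python
import re

# Sketch: occurrence 0 replaces every match; otherwise only the nth match
# at or after the 1-based position is replaced.
def regexp_replace(s, pattern, replacement, position=1, occurrence=0):
    head, tail = s[:position - 1], s[position - 1:]
    if occurrence == 0:
        return head + re.sub(pattern, replacement, tail)
    matches = list(re.finditer(pattern, tail))
    if occurrence > len(matches):
        return s  # the spec leaves out-of-range occurrences undefined
    m = matches[occurrence - 1]
    # Match.expand applies backreferences in the replacement template.
    return head + tail[:m.start()] + m.expand(replacement) + tail[m.end():]
```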

ltrim

    Implementations:
    ltrim(input, characters): -> return_type

  • input: The string to remove characters from.
  • characters: The set of characters to remove.

0. ltrim(varchar<L1>, varchar<L2>): -> varchar<L1>
    1. ltrim(string, string): -> string

    Remove any occurrence of the characters from the left side of the string. If no characters are specified, spaces are removed.

    rtrim

    Implementations:
    rtrim(input, characters): -> return_type

  • input: The string to remove characters from.
  • characters: The set of characters to remove.

0. rtrim(varchar<L1>, varchar<L2>): -> varchar<L1>
    1. rtrim(string, string): -> string

    Remove any occurrence of the characters from the right side of the string. If no characters are specified, spaces are removed.

    trim

    Implementations:
    trim(input, characters): -> return_type

  • input: The string to remove characters from.
  • characters: The set of characters to remove.

0. trim(varchar<L1>, varchar<L2>): -> varchar<L1>
    1. trim(string, string): -> string

    Remove any occurrence of the characters from the left and right sides of the string. If no characters are specified, spaces are removed.
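The trim family maps directly onto set-based stripping; a hypothetical sketch (Python's strip family already treats the argument as a set of characters):

```python
def ltrim(s, characters=" "):
    return s.lstrip(characters)  # characters is a set, not a substring

def rtrim(s, characters=" "):
    return s.rstrip(characters)

def trim(s, characters=" "):
    return s.strip(characters)
```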

    lpad

    Implementations:
    lpad(input, length, characters): -> return_type

  • input: The string to pad.
  • length: The length of the output string.
  • characters: The string of characters to use for padding.

0. lpad(varchar<L1>, i32, varchar<L2>): -> varchar<L1>
    1. lpad(string, i32, string): -> string

    Left-pad the input string with the string of ‘characters’ until the specified length of the string has been reached. If the input string is longer than ‘length’, remove characters from the right-side to shorten it to ‘length’ characters. If the string of ‘characters’ is longer than the remaining ‘length’ needed to be filled, only pad until ‘length’ has been reached. If ‘characters’ is not specified, the default value is a single space.
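These padding-and-trimming rules can be sketched as (hypothetical Python; rpad mirrors this with the fill appended and over-long input trimmed from the left):

```python
def lpad(s, length, characters=" "):
    if len(s) >= length:
        return s[:length]  # over-long input is trimmed from the right
    # Repeat the pad string, then cut it to exactly the remaining width.
    fill = (characters * length)[:length - len(s)]
    return fill + s
```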

    rpad

    Implementations:
    rpad(input, length, characters): -> return_type

  • input: The string to pad.
  • length: The length of the output string.
  • characters: The string of characters to use for padding.

0. rpad(varchar<L1>, i32, varchar<L2>): -> varchar<L1>
    1. rpad(string, i32, string): -> string

    Right-pad the input string with the string of ‘characters’ until the specified length of the string has been reached. If the input string is longer than ‘length’, remove characters from the left-side to shorten it to ‘length’ characters. If the string of ‘characters’ is longer than the remaining ‘length’ needed to be filled, only pad until ‘length’ has been reached. If ‘characters’ is not specified, the default value is a single space.

    center

    Implementations:
    center(input, length, character, option:padding): -> return_type

  • input: The string to pad.
  • length: The length of the output string.
  • character: The character to use for padding.

0. center(varchar<L1>, i32, varchar<L1>, option:padding): -> varchar<L1>
    1. center(string, i32, string, option:padding): -> string

    Center the input string by padding the sides with a single character until the specified length of the string has been reached. By default, if the length will be reached with an uneven number of padding, the extra padding will be applied to the right side. The side with extra padding can be controlled with the padding option. Behavior is undefined if the number of characters passed to the character argument is not 1.

    Options:
  • padding ['RIGHT', 'LEFT']
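The uneven-padding rule can be sketched as (hypothetical Python; over-long input is returned unchanged here, which the spec does not pin down):

```python
def center(s, length, character, padding="RIGHT"):
    total = length - len(s)
    if total <= 0:
        return s
    left = total // 2
    right = total - left  # uneven padding defaults to the right side
    if padding == "LEFT":
        left, right = right, left
    return character * left + s + character * right
```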

left

    Implementations:
    left(input, count): -> return_type
    0. left(varchar<L1>, i32): -> varchar<L1>
    1. left(string, i32): -> string

    Extract count characters starting from the left of the string.

right

Implementations:
    right(input, count): -> return_type
    0. right(varchar<L1>, i32): -> varchar<L1>
    1. right(string, i32): -> string

    Extract count characters starting from the right of the string.

    string_split

    Implementations:
    string_split(input, separator): -> return_type

  • input: The input string.
  • separator: A character used for splitting the string.

0. string_split(varchar<L1>, varchar<L2>): -> List<varchar<L1>>
    1. string_split(string, string): -> List<string>

    Split a string into a list of strings, based on a specified separator character.

    regexp_string_split

    Implementations:
    regexp_string_split(input, pattern, option:case_sensitivity, option:multiline, option:dotall): -> return_type

  • input: The input string.
  • pattern: The regular expression to search for within the input string.

0. regexp_string_split(varchar<L1>, varchar<L2>, option:case_sensitivity, option:multiline, option:dotall): -> List<varchar<L1>>
    1. regexp_string_split(string, string, option:case_sensitivity, option:multiline, option:dotall): -> List<string>

    Split a string into a list of strings, based on a regular expression pattern. The substrings matched by the pattern will be used as the separators to split the input string and will not be included in the resulting list. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
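The option flags map naturally onto regex engine flags; a hypothetical sketch (Python's re stands in for the ICU engine the spec references):

```python
import re

# Sketch: each Substrait option toggles the corresponding re flag.
def regexp_string_split(s, pattern, case_sensitivity="CASE_SENSITIVE",
                        multiline=False, dotall=False):
    flags = 0
    if case_sensitivity != "CASE_SENSITIVE":
        flags |= re.IGNORECASE
    if multiline:
        flags |= re.MULTILINE
    if dotall:
        flags |= re.DOTALL
    # re.split drops the matched separators, as the description requires.
    return re.split(pattern, s, flags=flags)
```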

Aggregate Functions

    string_agg

    Implementations:
    string_agg(input, separator): -> return_type

  • input: Column of string values.
• separator: Separator for concatenated strings.

0. string_agg(string, string): -> string

    Concatenates a column of string values with a separator.
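    A minimal sketch of the aggregate's semantics (hypothetical illustration; the actual aggregate is defined by the extension):

```python
# string_agg concatenates a column of string values with a separator.
def string_agg(column: list[str], separator: str) -> str:
    return separator.join(column)

print(string_agg(["red", "green", "blue"], ", "))  # → red, green, blue
```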


    Extensions

    In many cases, the existing objects in Substrait will be sufficient to accomplish a particular use case. However, it is sometimes helpful to create a new data type, scalar function signature or some other custom representation within a system. For that, Substrait provides a number of extension points.

    Simple Extensions

    Some kinds of primitives are so frequently extended that Substrait defines a standard YAML format that describes how the extended functionality can be interpreted. This allows different projects/systems to use the YAML definition as a specification so that interoperability isn’t constrained to the base Substrait specification. The main types of extensions that are defined in this manner include the following:

    • Data types
    • Type variations
    • Scalar Functions
    • Aggregate Functions
    • Window Functions
    • Table Functions

    To extend these items, developers can create one or more YAML files at a defined URI that describe the properties of each of these extensions. The YAML file is constructed according to the YAML Schema. Each definition in the file corresponds to the YAML-based serialization of the relevant data structure. A user who only wants to extend one of these types of objects (e.g. types) does not have to provide definitions for the other extension points.

    A Substrait plan can reference one or more YAML files via URI for extension. In the places where these entities are referenced, they will be referenced using a URI + name reference. The name scheme per type works as follows:

    Category Naming scheme
    Type The name as defined on the type object.
    Type Variation The name as defined on the type variation object.
    Function Signature A function signature compound name as described below.

    A YAML file can also reference types and type variations defined in another YAML file. To do this, it must declare the YAML file it depends on using a key-value pair in the dependencies key, where the value is the URI to the YAML file, and the key is a valid identifier that can then be used as an identifier-safe alias for the URI. This alias can then be used as a .-separated namespace prefix wherever a type class or type variation name is expected.

    For example, if the YAML file at file:///extension_types.yaml defines a type called point, a different YAML file can use the type in a function declaration as follows:

    dependencies:
       ext: file:///extension_types.yaml
     scalar_functions:
     - name: distance
    \ No newline at end of file diff --git a/faq/index.html b/faq/index.html index 3b31c9a4..318db2d2 100644 --- a/faq/index.html +++ b/faq/index.html @@ -1,4 +1,4 @@ - FAQ - Substrait: Cross-Language Serialization for Relational Algebra

    Frequently Asked Questions

    What is the purpose of the post-join filter field on Join relations?

    The post-join filter on the various Join relations is not always equivalent to an explicit Filter relation AFTER the Join.

    See the example here that highlights how the post-join filter behaves differently than a Filter relation in the case of a left join.


    Substrait Project Governance

    The Substrait project is run by volunteers in a collaborative and open way. Its governance is inspired by the Apache Software Foundation. In most cases, people familiar with the ASF model can work with Substrait in the same way. The biggest differences between the models are:

    • Substrait does not have a separate infrastructure governing body that gatekeeps the adoption of new developer tools and technologies.
    • Substrait Management Committee (SMC) members are responsible for recognizing the corporate relationship of its members and ensuring diverse representation and corporate independence.
    • Substrait does not condone private mailing lists. All project business should be discussed in public. The only exceptions to this are security escalations (security@substrait.io) and harassment (harassment@substrait.io).
    • Substrait has an automated continuous release process with no formal voting process per release.

    More details about concrete things Substrait looks to avoid can be found below.

    The Substrait Project

    The Substrait project consists of the code and repositories that reside in the substrait-io GitHub organization, the Substrait.io website, the Substrait mailing list, Microsoft Teams-hosted community calls, and the Substrait Slack workspace. (All are open to everyone, and recordings/transcripts are made where technology supports it.)

    Substrait Volunteers

    We recognize four groups of individuals related to the project.

    User

    A user is someone who uses Substrait. They may contribute to Substrait by providing feedback to developers in the form of bug reports and feature suggestions. Users participate in the Substrait community by helping other users on mailing lists and user support forums.

    Contributors

    A contributor is a user who contributes to the project in the form of code or documentation. They take extra steps to participate in the project (loosely defined as the set of repositories under the GitHub substrait-io organization), are active on the developer mailing list, participate in discussions, and provide patches, documentation, suggestions, and criticism.

    Committer

    A committer is a developer who has write access to the code repositories and has a signed Contributor License Agreement (CLA) on file. Because they do not need to depend on other people to apply patches to the code or documentation, committers effectively make short-term decisions for the project. The SMC can (even tacitly) agree and approve the changes into permanency, or it can reject them. Remember that the SMC makes the decisions, not the individual committers.

    SMC Member

    An SMC member is a committer who was elected due to merit for the evolution of the project. They have write access to the code repository, the right to cast binding votes on all proposals on community-related decisions, the right to propose other active contributors for committership, and the right to invite active committers to the SMC. The SMC as a whole is the entity that controls the project, nobody else. They are responsible for the continued shaping of this governance model.

    Substrait Management and Collaboration

    The Substrait project is managed using a collaborative, consensus-based process. We do not have a hierarchical structure; rather, different groups of contributors have different rights and responsibilities in the organization.

    Communication

    Communication must be done via mailing lists, Slack, and/or GitHub. Communication is always done publicly. There are no private lists, and all decisions related to the project are made in public. Communication is frequently done asynchronously, since members of the community are distributed across many time zones.

    Substrait Management Committee

    The Substrait Management Committee is responsible for the active management of Substrait. The main role of the SMC is to further the long-term development and health of the community as a whole, and to ensure that balanced and wide-scale peer review and collaboration takes place. As part of this, the SMC is the primary approver of specification changes, ensuring that proposed changes represent a balanced and thorough examination of possibilities. This doesn't mean that the SMC has to be involved in the minutiae of a particular specification change, but it should always shepherd a healthy process around specification changes.

    Substrait Voting Process

    Because one of the fundamental aspects of accomplishing things is doing so by consensus, we need a way to tell whether we have reached consensus. We do this by voting. There are several different types of voting. In all cases, it is recommended that all community members vote. The number of binding votes required to move forward and the community members who have “binding” votes differs depending on the type of proposal made. In all cases, a veto of a binding voter results in an inability to move forward.

    The rules require that a community member registering a negative vote must include an alternative proposal or a detailed explanation of the reasons for the negative vote. The community then tries to gather consensus on an alternative proposal that can resolve the issue. In the great majority of cases, the concerns leading to the negative vote can be addressed. This process is called “consensus gathering” and we consider it a very important indication of a healthy community.

    Action | +1 votes required | Binding voters | Voting location
    Process/Governance modifications & actions (including promoting new contributors to committer or SMC) | 3 | SMC | Mailing list
    Format/Specification modifications (including breaking extension changes) | 2 | SMC | GitHub PR
    Documentation updates (formatting, moves) | 1 | SMC | GitHub PR
    Typos | 1 | Committers | GitHub PR
    Non-breaking function introductions | 1 (not including proposer) | Committers | GitHub PR
    Non-breaking extension additions & non-format code modifications | 1 (not including proposer) | Committers | GitHub PR
    Changes (non-breaking or breaking) to a Substrait library (e.g. substrait-java, substrait-validator) | 1 (not including proposer) | Committers | GitHub PR

    Review-Then-Commit

    Substrait follows a review-then-commit policy. This requires that all changes receive consensus approval before being committed to the code base. The specific vote requirements follow the table above.

    Expressing Votes

    The voting process may seem more than a little weird if you’ve never encountered it before. Votes are represented as numbers between -1 and +1, with ‘-1’ meaning ‘no’ and ‘+1’ meaning ‘yes.’

    The in-between values indicate how strongly the voting individual feels. Here are some examples of fractional votes and what the voter might be communicating with them:

    • +0: ‘I don’t feel strongly about it, but I’m okay with this.’
    • -0: ‘I won’t get in the way, but I’d rather we didn’t do this.’
    • -0.5: ‘I don’t like this idea, but I can’t find any rational justification for my feelings.’
    • ++1: ‘Wow! I like this! Let’s do it!’
    • -0.9: ‘I really don’t like this, but I’m not going to stand in the way if everyone else wants to go ahead with it.’
    • +0.9: ‘This is a cool idea and I like it, but I don’t have time/the skills necessary to help out.’

    Votes on Code Modification

    For code-modification votes, +1 votes (review approvals in Github are considered equivalent to a +1) are in favor of the proposal, but -1 votes are vetoes and kill the proposal dead until all vetoers withdraw their -1 votes.

    Vetoes

    A -1 (or an unaddressed PR request for changes) vote by a qualified voter stops a code-modification proposal in its tracks. This constitutes a veto, and it cannot be overruled nor overridden by anyone. Vetoes stand until and unless the individual withdraws their veto.

    To prevent vetoes from being used capriciously, the voter must provide with the veto a technical or community justification showing why the change is bad.

    Why do we vote?

    Votes help us to openly resolve conflicts. Without a process, people tend to avoid conflict and thrash around. Votes help to make sure we do the hard work of resolving the conflict.

    Substrait is non-commercial but commercially-aware

    Substrait’s mission is to produce software for the public good. All Substrait software is always available for free, and solely under the Apache License.

    We’re happy to have third parties, including for-profit corporations, take our software and use it for their own purposes. However it is important in these cases to ensure that the third party does not misuse the brand and reputation of the Substrait project for its own purposes. It is important for the longevity and community health of Substrait that the community gets the appropriate credit for producing freely available software.

    The SMC actively tracks the corporate allegiances of community members and strives to ensure that influence around any particular aspect of the project isn't overly skewed towards a single corporate entity.

    Substrait Trademark

    The SMC is responsible for protecting the Substrait name and brand. TBD what action is taken to support this.

    Project Roster

    Substrait Management Committee (SMC)

    Name Association
    Phillip Cloud Voltron Data
    Weston Pace LanceDB
    Jacques Nadeau Sundeck
    Victor Barua Datadog
    David Sisson Voltron Data

    Substrait Committers

    Name Association
    Jeroen van Straten Qblox
    Carlo Curino Microsoft
    James Taylor Sundeck
    Sutou Kouhei Clearcode
    Micah Kornfeld Google
    Jinfeng Ni Sundeck
    Andy Grove Nvidia
    Jesus Camacho Rodriguez Microsoft
    Rich Tia Voltron Data
    Vibhatha Abeykoon Voltron Data
    Nic Crane Recast
    Gil Forsyth Voltron Data
    ChaoJun Zhang Intel
    Matthijs Brobbel Voltron Data
    Matt Topol Voltron Data

    Additional detail about differences from ASF

    Corporate Awareness: The ASF takes a blind-eye approach that has proven to be too slow to correct corporate influence, which has substantially undermined many OSS projects. In contrast, Substrait SMC members are responsible for identifying corporate risks and over-representation and adjusting inclusion in the project based on that (limiting committership, SMC membership, etc.). Each member of the SMC shares responsibility to expand the community and seek out corporate diversity.

    Infrastructure: The ASF shows its age with respect to infrastructure, having been originally built on SVN. Examples of requirements that exist in the ASF but that Substrait eschews include custom Git infrastructure, a manual release process, and project-external gatekeeping around the use of new tools/technologies.


    Substrait: Cross-Language Serialization for Relational Algebra

    What is Substrait?

    Substrait is a format for describing compute operations on structured data. It is designed for interoperability across different languages and systems.

    How does it work?

    Substrait provides a well-defined, cross-language specification for data compute operations. This includes a consistent declaration of common operations, custom operations, and one or more serialized representations of this specification. The spec focuses on the semantics of each operation. In addition to the specification, the Substrait ecosystem also includes a number of libraries and useful tools.

    We highly recommend the tutorial to learn how a Substrait plan is constructed.

    Benefits

    • Avoids every system needing to create a communication method between every other system – each system merely supports ingesting and producing Substrait and it instantly becomes a part of the greater ecosystem.
    • Makes every part of the system upgradable. There’s a new query engine that’s ten times faster? Just plug it in!
    • Enables heterogeneous environments – run on a cluster of an unknown set of execution engines!
    • The text version of the Substrait plan allows you to quickly see how a plan functions without needing a visualizer (although there are Substrait visualizers as well!).

    Example Use Cases

    • Communicate a compute plan between a SQL parser and an execution engine (e.g. Calcite SQL parsing to Arrow C++ compute kernel)
    • Serialize a plan that represents a SQL view for consistent use in multiple systems (e.g. Iceberg views in Spark and Trino)
    • Submit a plan to different execution engines (e.g. DataFusion and Postgres) and get a consistent interpretation of the semantics.
    • Create an alternative plan generation implementation that can connect an existing end-user compute expression system to an existing end-user processing engine (e.g. Pandas operations executed inside SingleStore)
    • Build a pluggable plan visualization tool (e.g. D3 based plan visualizer)
    GitHub

    Substrait: Cross-Language Serialization for Relational Algebra

    What is Substrait?

    Substrait is a format for describing compute operations on structured data. It is designed for interoperability across different languages and systems.

    How does it work?

    Substrait provides a well-defined, cross-language specification for data compute operations. This includes a consistent declaration of common operations, custom operations and one or more serialized representations of this specification. The spec focuses on the semantics of each operation. In addition to the specification the Substrait ecosystem also includes a number of libraries and useful tools.

    We highly recommend the tutorial to learn how a Substrait plan is constructed.

    Benefits

    • Avoids every system needing to create a communication method between every other system – each system merely supports ingesting and producing Substrait and it instantly becomes a part of the greater ecosystem.
    • Makes every part of the system upgradable. There’s a new query engine that’s ten times faster? Just plug it in!
    • Enables heterogeneous environments – run on a cluster of an unknown set of execution engines!
    • The text version of the Substrait plan allows you to quickly see how a plan functions without needing a visualizer (although there are Substrait visualizers as well!).

    Example Use Cases

    • Communicate a compute plan between a SQL parser and an execution engine (e.g. Calcite SQL parsing to Arrow C++ compute kernel)
    • Serialize a plan that represents a SQL view for consistent use in multiple systems (e.g. Iceberg views in Spark and Trino)
    • Submit a plan to different execution engines (e.g. Datafusion and Postgres) and get a consistent interpretation of the semantics.
    • Create an alternative plan generation implementation that can connect an existing end-user compute expression system to an existing end-user processing engine (e.g. Pandas operations executed inside SingleStore)
    • Build a pluggable plan visualization tool (e.g. D3 based plan visualizer)

    Basics

    Substrait is designed to allow a user to construct an arbitrarily complex data transformation plan. The plan is composed of one or more relational operations. Relational operations are well-defined transformation operations that work by taking zero or more input datasets and transforming them into zero or more output datasets. Substrait defines a core set of transformations, but users are also able to extend the operations with their own specialized operations.

    Each relational operation is composed of several properties. Common properties for relational operations include the following:

    Property Description Type
    Emit The set of columns output from this operation and the order of those columns. Logical & Physical
    Hints A set of optionally provided, optionally consumed information about an operation that better informs execution. These might include estimated number of input and output records, estimated record size, likely filter reduction, estimated dictionary size, etc. These can also include implementation specific pieces of execution information. Physical
    Constraint A set of runtime constraints around the operation, limiting its consumption based on real-world resources (CPU, memory) as well as virtual resources like number of records produced, the largest record size, etc. Physical

    Relational Signatures

    In functions, function signatures are declared externally to the use of those signatures (function bindings). In the case of relational operations, signatures are declared directly in the specification. This is due to the speed of change and number of total operations. Relational operations in the specification are expected to be <100 for several years with additions being infrequent. On the other hand, there is an expectation of both a much larger number of functions (1,000s) and a much higher velocity of additions.

    Each relational operation must declare the following:

    • Transformation logic around properties of the data. For example, does a relational operation maintain sortedness of a field? Does an operation change the distribution of data?
    • How many input relations does an operation require?
    • Does the operator produce an output? (By specification, relational operations are currently limited to a single output.)
    • What is the schema and field ordering of an output (see emit below)?

    Emit: Output Ordering

    A relational operation uses field references to access specific fields of the input stream. Field references are always ordinal based on the order of the incoming streams. Each relational operation must declare the order of its output data. To simplify things, each relational operation can be in one of two modes:

    1. Direct output: The order of outputs is based on the definition declared by the relational operation.
    2. Remap: A listed ordering of the direct outputs. This remapping can also be used to drop columns no longer needed (such as a filter field or join keys after a join). Note that remapping/exclusion can only be done at the output's root struct. Filtering of compound values or extracting subsets must be done through other operation types (e.g. projection).
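The two output modes above can be sketched as a simple column selection by ordinal. This is an illustrative sketch, not the Substrait wire format; the function name and row representation are invented for the example.

```python
# Illustrative sketch of direct vs. remap emit modes: with no remap, the
# direct output passes through unchanged; with a remap, columns are picked
# (and possibly dropped) by ordinal.

def apply_emit(direct_output, remap):
    if remap is None:                          # "direct" mode
        return list(direct_output)
    return [direct_output[i] for i in remap]   # "remap" mode

# A join's direct output might be [left.a, left.key, right.key, right.b];
# emitting [0, 3] drops the (now redundant) join key columns.
row = ["a_val", 7, 7, "b_val"]
print(apply_emit(row, [0, 3]))  # ['a_val', 'b_val']
```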

    Relation Properties

    There are a number of predefined properties that exist in Substrait relations. These include the following.

    Distribution

    When data is partitioned across multiple sibling sets, distribution describes the set of properties that apply to any one partition. This is based on a set of distribution expression properties. A distribution is declared as a set of one or more fields and a distribution type across all fields.

    Property Description Required
    Distribution Fields List of field references that describe distribution (e.g. [0,2:4,5:0:0]). The order of these references does not impact results. Required for partitioned distribution type. Disallowed for singleton distribution type.
    Distribution Type PARTITIONED: For a discrete tuple of values for the declared distribution fields, all records with that tuple are located in the same partition. SINGLETON: there will only be a single partition for this operation. Required

    Orderedness

    A guarantee that data output from this operation is provided with a sort order. The sort order will be declared based on a set of sort field definitions based on the emitted output of this operation.

    Property Description Required
    Sort Fields A list of fields that the data are ordered by. The list is in order of the sort. If we sort by [0,1] then this means we only consider the data for field 1 to be ordered within each discrete value of field 0. At least one required.
    Per - Sort Field A field reference that the data is sorted by. Required
    Per - Sort Direction The direction of the data. See direction options below. Required

    Ordering Directions

    Direction Descriptions Nulls Position
    Ascending Returns data in ascending order based on the quality function associated with the type. Nulls are included before any values. First
    Descending Returns data in descending order based on the quality function associated with the type. Nulls are included before any values. First
    Ascending Returns data in ascending order based on the quality function associated with the type. Nulls are included after any values. Last
    Descending Returns data in descending order based on the quality function associated with the type. Nulls are included after any values. Last
    Custom function identifier Returns data using a custom function that returns -1, 0, or 1 depending on the order of the data. Per Function
    Clustered Ensures that all equal values are coalesced (but no ordering between values is defined). E.g. for values 1,2,3,1,2,3, output could be any of the following: 1,1,2,2,3,3 or 1,1,3,3,2,2 or 2,2,1,1,3,3 or 2,2,3,3,1,1 or 3,3,1,1,2,2 or 3,3,2,2,1,1. N/A, may appear anywhere but will be coalesced.
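The sort-field semantics above (field 1 is only ordered within each discrete value of field 0) combined with an ascending, nulls-first direction can be sketched as follows. This is an illustrative sketch using invented helper names, not part of the specification.

```python
# Illustrative sketch: sorting by fields [0, 1], each ascending with nulls
# first. Wrapping each value in a tag tuple makes None sort before any
# non-null value without comparing None to other types.

def asc_nulls_first_key(value):
    # (0, None) sorts before any (1, value), placing nulls first.
    return (0, None) if value is None else (1, value)

def sort_rows(rows, field_ordinals):
    return sorted(
        rows,
        key=lambda row: tuple(asc_nulls_first_key(row[i]) for i in field_ordinals),
    )

rows = [(2, "x"), (1, None), (1, "b"), (2, "a")]
print(sort_rows(rows, [0, 1]))
# [(1, None), (1, 'b'), (2, 'a'), (2, 'x')]
```

Note how field 1 is ordered only within each value of field 0, matching the [0,1] example in the Sort Fields description.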
    Discussion Points
    • Should read definition types be more extensible in the same way that function signatures are? Are extensible read definition types necessary if we have custom relational operators?
    • How are decomposed reads expressed? For example, the Iceberg type above is for early logical planning. Once we do some operations, it may produce a list of Iceberg file reads. This is likely a secondary type of object.

    Embedded Relations

    Pending.

    Embedded relations allow a Substrait producer to define a set operation that will be embedded in the plan.

    TODO: define lots of details about what interfaces, languages, formats, etc. Should reasonably be an extension of embedded user defined table functions.


    Logical Relations

    Read Operator

    The read operator is an operator that produces one output. A simple example would be the reading of a Parquet file. It is expected that many types of reads will be added over time.

    Signature Value
    Inputs 0
    Outputs 1
    Property Maintenance N/A (no inputs)
    Direct Output Order Defaults to the schema of the data read after the optional projection (masked complex expression) is applied.

    Read Properties

    Property Description Required
    Definition The contents of the read property definition. Required
    Direct Schema Defines the schema of the output of the read (before any projection or emit remapping/hiding). Required
    Filter A boolean Substrait expression that describes a filter that must be applied to the data. The filter should be interpreted against the direct schema. Optional, defaults to none.
    Best Effort Filter A boolean Substrait expression that describes a filter that may be applied to the data. The filter should be interpreted against the direct schema. Optional, defaults to none.
    Projection A masked complex expression describing the portions of the content that should be read Optional, defaults to all of schema
    Output Properties Declaration of orderedness and/or distribution properties this read produces. Optional, defaults to no properties.
    Properties A list of name/value pairs associated with the read. Optional, defaults to empty

    Read Filtering

    The read relation has two different filter properties: a filter, which must be satisfied by the operator, and a best effort filter, which does not have to be satisfied. This reflects the way that consumers are often implemented. A consumer is often only able to fully apply a limited set of operations in the scan. There can then be an extended set of operations which a consumer can apply in a best effort fashion. A producer, when setting these two fields, should take care to only use expressions that the consumer is capable of handling.

    As an example, a consumer may only be able to fully apply (in the read relation) <, =, and > on integral types. The consumer may be able to apply <, =, and > in a best effort fashion on decimal and string types. Consider the filter expression my_int < 10 && my_string < "x" && upper(my_string) > "B". In this case the filter should be set to my_int < 10 and the best_effort_filter should be set to my_string < "x" and the remaining portion (upper(my_string) > "B") should be put into a filter relation.

    A filter expression must be interpreted against the direct schema before the projection expression has been applied. As a result, fields may be referenced by the filter expression which are not included in the relation’s output.
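The producer-side split described above can be sketched as classifying the conjuncts of a filter against the consumer's capabilities. The capability sets and tuple encoding here are assumptions invented for illustration.

```python
# Illustrative sketch: split a conjunctive filter into what the consumer can
# fully apply in the read (`filter`), what it can apply best-effort
# (`best_effort_filter`), and what must go into a separate filter relation.
# Capability sets below are hypothetical, per the example in the text.

FULLY_SUPPORTED = {("lt", "int"), ("eq", "int"), ("gt", "int")}
BEST_EFFORT = {("lt", "string"), ("eq", "string"), ("gt", "string")}

def split_conjuncts(conjuncts):
    filter_, best_effort, remainder = [], [], []
    for op, operand_type, text in conjuncts:
        if (op, operand_type) in FULLY_SUPPORTED:
            filter_.append(text)
        elif (op, operand_type) in BEST_EFFORT:
            best_effort.append(text)
        else:
            remainder.append(text)  # belongs in a downstream filter relation
    return filter_, best_effort, remainder

conjuncts = [
    ("lt", "int", 'my_int < 10'),
    ("lt", "string", 'my_string < "x"'),
    ("gt", "call", 'upper(my_string) > "B"'),  # function call: not scannable
]
print(split_conjuncts(conjuncts))
```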

    Read Definition Types

    Adding new Read Definition Types

    If you have a read definition that’s not covered here, see the process for adding new read definition types.

    Read definition types (like the rest of the features in Substrait) are built by the community and added to the specification.

    Virtual Table

    A virtual table is a table whose contents are embedded in the plan itself. The table data is encoded as records consisting of literal values.

    Property Description Required
    Data Required Required

    Named Table

    A named table is a reference to data defined elsewhere. For example, there may be a catalog of tables with unique names that both the producer and consumer agree on. This catalog would provide the consumer with more information on how to retrieve the data.

    Property Description Required
    Names A list of namespaced strings that, together, form the table name Required (at least one)

    Files Type

    Property Description Required
    Items An array of Items (path or path glob) associated with the read. Required
    Format per item Enumeration of available formats. Only current option is PARQUET. Required
    Slicing parameters per item Information to use when reading a slice of a file. Optional

    Slicing Files

    A read operation is allowed to only read part of a file. This is convenient, for example, when distributing a read operation across several nodes. The slicing parameters are specified as byte offsets into the file.

    Many file formats consist of indivisible “chunks” of data (e.g. Parquet row groups). If this happens the consumer can determine which slice a particular chunk belongs to. For example, one possible approach is that a chunk should only be read if the midpoint of the chunk (dividing by 2 and rounding down) is contained within the asked-for byte range.
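The midpoint rule described above can be sketched directly. The function name and the (offset, length) chunk representation are invented for illustration.

```python
# Illustrative sketch of the midpoint rule: a chunk is read by a slice only
# if the chunk's midpoint (rounded down) falls inside the slice's
# [start, end) byte range, so every chunk is read by exactly one slice.

def chunks_for_slice(chunks, start, end):
    """chunks: list of (offset, length) pairs; returns chunk indices to read."""
    picked = []
    for i, (offset, length) in enumerate(chunks):
        midpoint = offset + length // 2
        if start <= midpoint < end:
            picked.append(i)
    return picked

# Three 100-byte chunks (midpoints 50, 150, 250) split into two slices.
chunks = [(0, 100), (100, 100), (200, 100)]
print(chunks_for_slice(chunks, 0, 150))    # [0]
print(chunks_for_slice(chunks, 150, 300))  # [1, 2]
```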

    message ReadRel {
       RelCommon common = 1;
       NamedStruct base_schema = 2;
       Expression filter = 3;

    Physical Relations

    There is no true distinction between logical and physical operations in Substrait. By convention, certain operations are classified as physical, but all operations can potentially be used in any kind of plan. A particular set of transformations or target operators may (by convention) be considered the “physical plan,” but this is a characteristic of the system consuming Substrait as opposed to a definition within Substrait.

    Hash Equijoin Operator

    The hash equijoin operator will build a hash table out of the right input based on a set of join keys. It will then probe that hash table with records from the left input, finding matches.
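The build/probe flow can be sketched for the inner-join case as follows. This is a minimal illustration with invented names, not an implementation prescription; note that the left (probe-side) row order is preserved, consistent with the orderedness property below.

```python
# Minimal sketch of a hash equijoin (inner join only): build a hash table on
# the right input's key, then probe it with each left row in order.

from collections import defaultdict

def hash_equijoin(left, right, left_key, right_key):
    table = defaultdict(list)
    for row in right:                      # build side: right input
        table[row[right_key]].append(row)
    out = []
    for row in left:                       # probe side: left input
        for match in table.get(row[left_key], []):
            out.append(row + match)        # direct output: left then right fields
    return out

left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (3, "z")]
print(hash_equijoin(left, right, 0, 0))
# [(2, 'b', 2, 'x'), (3, 'c', 3, 'y'), (3, 'c', 3, 'z')]
```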

    Signature Value
    Inputs 2
    Outputs 1
    Property Maintenance Distribution is maintained. Orderedness of the left set is maintained in INNER join cases, otherwise it is eliminated.
    Direct Output Order Same as the Join operator.

    Hash Equijoin Properties

    Property Description Required
    Left Input A relational input (probe side). Required
    Right Input A relational input (build side). Required
    Left Keys References to the fields to join on in the left input. Required
    Right Keys References to the fields to join on in the right input. Required
    Post Join Predicate An additional expression that can be used to reduce the output of the join operation post the equality condition. Minimizes the overhead of secondary join conditions that cannot be evaluated using the equijoin keys. Optional, defaults to true.
    Join Type One of the join types defined in the Join operator. Required

    NLJ (Nested Loop Join) Operator

    The nested loop join operator does a join by holding the entire right input and then iterating over it using the left input, evaluating the join expression on the Cartesian product of all rows, only outputting rows where the expression is true. Will also include non-matching rows in the OUTER, LEFT and RIGHT operations per the join type requirements.
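The Cartesian-product evaluation described above can be sketched for the inner-join case. This is a minimal illustration with invented names; the OUTER/LEFT/RIGHT variants would additionally emit non-matching rows padded with nulls.

```python
# Minimal sketch of a nested loop join (inner case): hold the entire right
# input and, for each left row, evaluate the join expression over every
# right row, emitting only rows where it is true.

def nested_loop_join(left, right, predicate=lambda l, r: True):
    out = []
    for l in left:               # outer loop over the left input
        for r in right:          # inner loop over the buffered right input
            if predicate(l, r):  # defaults to true, i.e. a Cartesian join
                out.append(l + r)
    return out

left = [(1,), (2,)]
right = [(10,), (20,)]
print(nested_loop_join(left, right))  # Cartesian product: 4 rows
print(nested_loop_join(left, right, lambda l, r: l[0] * 10 == r[0]))
# [(1, 10), (2, 20)]
```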

    Signature Value
    Inputs 2
    Outputs 1
    Property Maintenance Distribution is maintained. Orderedness is eliminated.
    Direct Output Order Same as the Join operator.

    NLJ Properties

    Property Description Required
    Left Input A relational input. Required
    Right Input A relational input. Required
    Join Expression A boolean condition that describes whether each record from the left set matches a record from the right set. Optional. Defaults to true (a Cartesian join).
    Join Type One of the join types defined in the Join operator. Required

    Merge Equijoin Operator

    The merge equijoin does a join by taking advantage of two sets that are sorted on the join keys. This allows the join operation to be done in a streaming fashion.
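The streaming property described above comes from both inputs already being sorted on the join keys, so matches are found in a single forward pass with only one key group buffered at a time. This is a minimal inner-join sketch with invented names.

```python
# Minimal sketch of a merge equijoin (inner case): both inputs are assumed
# sorted ascending on their join keys; advance the cursor with the smaller
# key, and on a match pair every left row in the key group with every right
# row in the same group.

def merge_equijoin(left, right, lkey, rkey):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][lkey], right[j][rkey]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Gather the full group of equal keys on the right, then pair it
            # with every left row carrying the same key.
            j_end = j
            while j_end < len(right) and right[j_end][rkey] == lk:
                j_end += 1
            while i < len(left) and left[i][lkey] == lk:
                out.extend(left[i] + r for r in right[j:j_end])
                i += 1
            j = j_end
    return out

left = [(1, "a"), (2, "b"), (2, "c")]
right = [(2, "x"), (3, "y")]
print(merge_equijoin(left, right, 0, 0))
# [(2, 'b', 2, 'x'), (2, 'c', 2, 'x')]
```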

    Signature Value
    Inputs 2
    Outputs 1
    Property Maintenance Distribution is maintained. Orderedness is eliminated.
    Direct Output Order Same as the Join operator.

    Merge Join Properties

    Property Description Required
    Left Input A relational input. Required
    Right Input A relational input. Required
    Left Keys References to the fields to join on in the left input. Required
    Right Keys References to the fields to join on in the right input. Required
    Post Join Predicate An additional expression that can be used to reduce the output of the join operation post the equality condition. Minimizes the overhead of secondary join conditions that cannot be evaluated using the equijoin keys. Optional, defaults to true.
    Join Type One of the join types defined in the Join operator. Required

    Exchange Operator

    The exchange operator will redistribute data based on an exchange type definition. Applying this operation will lead to an output that presents the desired distribution.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Orderedness is maintained. Distribution is overwritten based on configuration.
    Direct Output Order Order of the input.

    Exchange Types

    Type Description
    Scatter Distribute data using a system defined hashing function that considers one or more fields. For the same type of fields and same ordering of values, the same partition target should be identified for different ExchangeRels
    Single Bucket Define an expression that provides a single i32 bucket number. Optionally define whether the expression will only return values within the valid number of partition counts. If not, the system should modulo the return value to determine a target partition.
    Multi Bucket Define an expression that provides a List<i32> of bucket numbers. Optionally define whether the expression will only return values within the valid number of partition counts. If not, the system should modulo the return value to determine a target partition. The records should be sent to all bucket numbers provided by the expression.
    Broadcast Send all records to all partitions.
    Round Robin Send records to each target in sequence. Can follow either exact or approximate behavior. Approximate will attempt to balance the number of records sent to each destination but may not exactly distribute evenly and may send batches of records to each target before moving to the next.
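Two of the exchange types above can be sketched as partition-routing functions. This is illustrative only: the hash here is a stand-in for the system-defined hash function, and all names are invented.

```python
# Illustrative sketch of exchange routing. "Scatter" hashes one or more
# fields to pick a partition; "single bucket" evaluates an expression and,
# unless it is declared to stay within the partition count, applies a modulo.

def scatter_target(row, field_ordinals, partition_count):
    key = tuple(row[i] for i in field_ordinals)
    # Stand-in for a system-defined hash; real systems must use the same
    # function across ExchangeRels for consistent targets.
    return hash(key) % partition_count

def single_bucket_target(row, bucket_expr, partition_count, bounded=False):
    bucket = bucket_expr(row)
    return bucket if bounded else bucket % partition_count

row = (42, "uk", 3)
target = scatter_target(row, [1], 4)   # some partition in [0, 4)
# An expression that may overflow the partition count gets the modulo:
print(single_bucket_target(row, lambda r: r[0], 4))  # 42 % 4 == 2
```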

    Exchange Properties

    Property Description Required
    Input The relational input. Required.
    Distribution Type One of the distribution types defined above. Required.
    Partition Count The number of partitions targeted for output. Optional. If not defined, implementation system should decide the number of partitions. Note that when not defined, single or multi bucket expressions should not be constrained to count.
    Expression Mapping Describes a relationship between each partition ID and the destination that partition should be sent to. Optional. A partition may be sent to 0..N locations. Value can either be a URI or arbitrary value.

    Merging Capture

    A receiving operation that will merge multiple ordered streams to maintain orderedness.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Orderedness and distribution are maintained.
    Direct Output Order Order of the input.

    Merging Capture Properties

    Property Description Required
    Blocking Whether the merging should block incoming data. Blocking should be used carefully, based on whether a deadlock can be produced. Optional, defaults to false

    Simple Capture

    A receiving operation that will merge multiple streams in an arbitrary order.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Orderness is empty after this operation. Distribution are maintained.
    Direct Output Order Order of the input.

    Naive Capture Properties

    Property Description Required
    Input The relational input. Required

    Top-N Operation

    The top-N operator reorders a dataset based on one or more identified sort fields as well as a sorting function. Rather than sort the entire dataset, the top-N will only maintain the total number of records required to ensure a limited output. A top-n is a combination of a logical sort and logical fetch operations.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Will update orderedness property to the output of the sort operation. Distribution property only remapped based on emit.
    Direct Output Order The field order of the input.

    Top-N Properties

    Property Description Required
    Input The relational input. Required
    Sort Fields List of one or more fields to sort by. Uses the same properties as the orderedness property. One sort field required
    Offset A positive integer. Declares the offset for retrieval of records. Optional, defaults to 0.
    Count A positive integer. Declares the number of records that should be returned. Required

    Hash Aggregate Operation

    The hash aggregate operation maintains a hash table for each grouping set to coalesce equivalent tuples.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Maintains distribution if all distribution fields are contained in every grouping set. No orderness guaranteed.
    Direct Output Order Same as defined by Aggregate operation.

    Hash Aggregate Properties

    Property Description Required
    Input The relational input. Required
    Grouping Sets One or more grouping sets. Optional, required if no measures.
    Per Grouping Set A list of expression grouping that the aggregation measured should be calculated for. Optional, defaults to 0.
    Measures A list of one or more aggregate expressions. Implementations may or may not support aggregate ordering expressions. Optional, required if no grouping sets.

    Streaming Aggregate Operation

    The streaming aggregate operation leverages data ordered by the grouping expressions to calculate data each grouping set tuple-by-tuple in streaming fashion. All grouping sets and orderings requested on each aggregate must be compatible to allow multiple grouping sets or aggregate orderings.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Maintains distribution if all distribution fields are contained in every grouping set. Maintains input ordering.
    Direct Output Order Same as defined by Aggregate operation.

    Streaming Aggregate Properties

    Property Description Required
    Input The relational input. Required
    Grouping Sets One or more grouping sets. If multiple grouping sets are declared, sets must all be compatible with the input sortedness. Optional, required if no measures.
    Per Grouping Set A list of expression grouping that the aggregation measured should be calculated for. Optional, defaults to 0.
    Measures A list of one or more aggregate expressions. Aggregate expressions ordering requirements must be compatible with expected ordering. Optional, required if no grouping sets.

    Consistent Partition Window Operation

    A consistent partition window operation is a special type of project operation where every function is a window function and all of the window functions share the same sorting and partitioning. This allows for the sort and partition to be calculated once and shared between the various function evaluations.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Maintains distribution and ordering.
    Direct Output Order Same as Project operator (input followed by each window expression).

    Window Properties

    Property Description Required
    Input The relational input. Required
    Window Functions One or more window functions. At least one required.

    Expand Operation

    The expand operation creates duplicates of input records based on the Expand Fields. Each Expand Field can be a Switching Field or an expression. Switching Fields are described below. If an Expand Field is an expression then its value is consistent across all duplicate rows.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Distribution is maintained if all the distribution fields are consistent fields with direct references. Ordering can only be maintained down to the level of consistent fields that are kept.
    Direct Output Order The expand fields followed by an i32 column describing the index of the duplicate that the row is derived from.

    Expand Properties

    Property Description Required
    Input The relational input. Required
    Direct Fields Expressions describing the output fields. These refer to the schema of the input. Each Direct Field must be an expression or a Switching Field Required

    Switching Field Properties

    A switching field is a field whose value is different in each duplicated row. All switching fields in an Expand Operation must have the same number of duplicates.

    Property Description Required
    Duplicates List of one or more expressions. The output will contain a row for each expression. Required

    Hashing Window Operation

    A window aggregate operation that will build hash tables for each distinct partition expression.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Maintains distribution. Eliminates ordering.
    Direct Output Order Same as Project operator (input followed by each window expression).

    Hashing Window Properties

    Property Description Required
    Input The relational input. Required
    Window Expressions One or more window expressions. At least one required.

    Streaming Window Operation

    A window aggregate operation that relies on a partition/ordering sorted input.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Maintains distribution. Eliminates ordering.
    Direct Output Order Same as Project operator (input followed by each window expression).

    Streaming Window Properties

    Property Description Required
    Input The relational input. Required
    Window Expressions One or more window expressions. Must be supported by the sortedness of the input. At least one required.
    GitHub

    Physical Relations

There is no true distinction between logical and physical operations in Substrait. By convention, certain operations are classified as physical, but all operations can potentially be used in any kind of plan. A particular set of transformations or target operators may (by convention) be considered the “physical plan”, but this is a characteristic of the system consuming Substrait rather than a definition within Substrait.

    Hash Equijoin Operator

The hash equijoin operator builds a hash table from the right input based on a set of join keys, then probes that hash table with each incoming left-side row to find matches.

    Signature Value
    Inputs 2
    Outputs 1
    Property Maintenance Distribution is maintained. Orderedness of the left set is maintained in INNER join cases, otherwise it is eliminated.
    Direct Output Order Same as the Join operator.

    Hash Equijoin Properties

    Property Description Required
    Left Input A relational input.(Probe-side) Required
    Right Input A relational input.(Build-side) Required
    Left Keys References to the fields to join on in the left input. Required
    Right Keys References to the fields to join on in the right input. Required
Post Join Predicate An additional expression that can be used to reduce the output of the join operation after the equality condition is applied. Minimizes the overhead of secondary join conditions that cannot be evaluated using the equijoin keys. Optional, defaults to true.
    Join Type One of the join types defined in the Join operator. Required
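
The build/probe flow described above can be sketched in Python. This is an illustrative toy for an INNER join, not a Substrait API; `hash_equijoin`, its tuple rows, and column-index keys are assumptions for the example:

```python
def hash_equijoin(left, right, left_key, right_key):
    # Build phase: hash the right (build-side) input on its join key.
    table = {}
    for row in right:
        table.setdefault(row[right_key], []).append(row)
    # Probe phase: stream the left (probe-side) input, emitting matches.
    # Note the left input's order is preserved, matching the spec's
    # orderedness guarantee for INNER joins.
    out = []
    for row in left:
        for match in table.get(row[left_key], []):
            out.append(row + match)
    return out

left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (3, "z")]
print(hash_equijoin(left, right, 0, 0))
# [(2, 'b', 2, 'x'), (3, 'c', 3, 'y'), (3, 'c', 3, 'z')]
```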

    NLJ (Nested Loop Join) Operator

    The nested loop join operator does a join by holding the entire right input and then iterating over it using the left input, evaluating the join expression on the Cartesian product of all rows, only outputting rows where the expression is true. Will also include non-matching rows in the OUTER, LEFT and RIGHT operations per the join type requirements.

    Signature Value
    Inputs 2
    Outputs 1
    Property Maintenance Distribution is maintained. Orderedness is eliminated.
    Direct Output Order Same as the Join operator.

    NLJ Properties

    Property Description Required
    Left Input A relational input. Required
    Right Input A relational input. Required
Join Expression A boolean condition that describes whether each record from the left set matches a record from the right set. Optional. Defaults to true (a Cartesian join).
    Join Type One of the join types defined in the Join operator. Required
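
The iteration over the Cartesian product can be sketched as follows (illustrative only; `nested_loop_join` is a hypothetical helper, shown for the inner-match case without the OUTER/LEFT/RIGHT null-padding):

```python
def nested_loop_join(left, right, predicate):
    # Hold the entire right input, iterate it for each left row, and keep
    # only the pairs where the join expression evaluates to true.
    right = list(right)
    return [l + r for l in left for r in right if predicate(l, r)]

# A non-equi condition: something a hash equijoin cannot evaluate.
rows = nested_loop_join([(1,), (2,)], [(2,), (3,)],
                        lambda l, r: l[0] < r[0])
print(rows)  # [(1, 2), (1, 3), (2, 3)]
```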

    Merge Equijoin Operator

    The merge equijoin does a join by taking advantage of two sets that are sorted on the join keys. This allows the join operation to be done in a streaming fashion.

    Signature Value
    Inputs 2
    Outputs 1
    Property Maintenance Distribution is maintained. Orderedness is eliminated.
    Direct Output Order Same as the Join operator.

    Merge Join Properties

    Property Description Required
    Left Input A relational input. Required
    Right Input A relational input. Required
    Left Keys References to the fields to join on in the left input. Required
Right Keys References to the fields to join on in the right input. Required
Post Join Predicate An additional expression that can be used to reduce the output of the join operation after the equality condition is applied. Minimizes the overhead of secondary join conditions that cannot be evaluated using the equijoin keys. Optional, defaults to true.
    Join Type One of the join types defined in the Join operator. Required
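
Assuming both inputs arrive pre-sorted ascending on the join key, the streaming merge can be sketched with two cursors (`merge_equijoin` is a hypothetical helper for the inner-match case, not a Substrait API):

```python
def merge_equijoin(left, right, key=0):
    # Both inputs must already be sorted ascending on the join key.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Collect the run of equal keys on the right, then emit the
            # cross product with each matching left row.
            j2 = j
            while j2 < len(right) and right[j2][key] == lk:
                j2 += 1
            while i < len(left) and left[i][key] == lk:
                out.extend(left[i] + r for r in right[j:j2])
                i += 1
            j = j2
    return out

print(merge_equijoin([(1, "a"), (2, "b")], [(2, "x"), (2, "y")]))
# [(2, 'b', 2, 'x'), (2, 'b', 2, 'y')]
```

Because each cursor only moves forward, neither side needs to be fully materialized, which is what makes the operation streaming.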

    Exchange Operator

    The exchange operator will redistribute data based on an exchange type definition. Applying this operation will lead to an output that presents the desired distribution.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Orderedness is maintained. Distribution is overwritten based on configuration.
    Direct Output Order Order of the input.

    Exchange Types

    Type Description
Scatter Distribute data using a system defined hashing function that considers one or more fields. For the same type of fields and same ordering of values, the same partition target should be identified for different ExchangeRels.
    Single Bucket Define an expression that provides a single i32 bucket number. Optionally define whether the expression will only return values within the valid number of partition counts. If not, the system should modulo the return value to determine a target partition.
    Multi Bucket Define an expression that provides a List<i32> of bucket numbers. Optionally define whether the expression will only return values within the valid number of partition counts. If not, the system should modulo the return value to determine a target partition. The records should be sent to all bucket numbers provided by the expression.
    Broadcast Send all records to all partitions.
    Round Robin Send records to each target in sequence. Can follow either exact or approximate behavior. Approximate will attempt to balance the number of records sent to each destination but may not exactly distribute evenly and may send batches of records to each target before moving to the next.

    Exchange Properties

    Property Description Required
    Input The relational input. Required.
    Distribution Type One of the distribution types defined above. Required.
    Partition Count The number of partitions targeted for output. Optional. If not defined, implementation system should decide the number of partitions. Note that when not defined, single or multi bucket expressions should not be constrained to count.
    Expression Mapping Describes a relationship between each partition ID and the destination that partition should be sent to. Optional. A partition may be sent to 0..N locations. Value can either be a URI or arbitrary value.
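
The modulo behavior for Single Bucket and Multi Bucket exchanges can be sketched as below (hypothetical helper names, illustrating the rule rather than any Substrait API):

```python
def single_bucket_target(bucket, partition_count, within_count=False):
    # If the bucket expression is not declared to stay within the valid
    # partition count, the system applies modulo to pick the target.
    return bucket if within_count else bucket % partition_count

def multi_bucket_targets(buckets, partition_count, within_count=False):
    # Multi Bucket: the record is sent to every listed bucket number.
    return [single_bucket_target(b, partition_count, within_count)
            for b in buckets]

print(single_bucket_target(7, 4))       # 3
print(multi_bucket_targets([1, 6], 4))  # [1, 2]
```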

    Merging Capture

    A receiving operation that will merge multiple ordered streams to maintain orderedness.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Orderedness and distribution are maintained.
    Direct Output Order Order of the input.

    Merging Capture Properties

    Property Description Required
    Blocking Whether the merging should block incoming data. Blocking should be used carefully, based on whether a deadlock can be produced. Optional, defaults to false
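
The behavior can be sketched with the standard-library `heapq.merge`, which combines already-sorted streams while preserving their order (a minimal stand-in for a merging capture, not a Substrait API):

```python
import heapq

# Each incoming stream is already sorted; merging must keep that order.
streams = [[1, 4, 7], [2, 5], [3, 6, 8]]
merged = list(heapq.merge(*streams))
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 8]
```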

    Simple Capture

    A receiving operation that will merge multiple streams in an arbitrary order.

    Signature Value
    Inputs 1
    Outputs 1
Property Maintenance Orderedness is empty after this operation. Distribution is maintained.
    Direct Output Order Order of the input.

Simple Capture Properties

    Property Description Required
    Input The relational input. Required

    Top-N Operation

The top-N operator reorders a dataset based on one or more identified sort fields as well as a sorting function. Rather than sort the entire dataset, the top-N will only maintain the total number of records required to ensure a limited output. A top-N is a combination of the logical sort and logical fetch operations.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Will update orderedness property to the output of the sort operation. Distribution property only remapped based on emit.
    Direct Output Order The field order of the input.

    Top-N Properties

    Property Description Required
    Input The relational input. Required
    Sort Fields List of one or more fields to sort by. Uses the same properties as the orderedness property. One sort field required
Offset A non-negative integer. Declares the offset for retrieval of records. Optional, defaults to 0.
    Count A positive integer. Declares the number of records that should be returned. Required
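
The "sort + fetch, but retaining only what is needed" idea can be sketched with the standard-library `heapq.nsmallest` (`top_n` is a hypothetical helper, not a Substrait API):

```python
import heapq

def top_n(rows, sort_key, count, offset=0):
    # Equivalent to a full sort followed by a fetch, but only the
    # offset + count smallest records ever need to be retained.
    kept = heapq.nsmallest(offset + count, rows, key=sort_key)
    return kept[offset:]

rows = [(5, "e"), (1, "a"), (3, "c"), (2, "b"), (4, "d")]
print(top_n(rows, sort_key=lambda r: r[0], count=2, offset=1))
# [(2, 'b'), (3, 'c')]
```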

    Hash Aggregate Operation

    The hash aggregate operation maintains a hash table for each grouping set to coalesce equivalent tuples.

    Signature Value
    Inputs 1
    Outputs 1
Property Maintenance Maintains distribution if all distribution fields are contained in every grouping set. No orderedness guaranteed.
    Direct Output Order Same as defined by Aggregate operation.

    Hash Aggregate Properties

    Property Description Required
    Input The relational input. Required
    Grouping Sets One or more grouping sets. Optional, required if no measures.
Per Grouping Set A list of expression groupings that the aggregation measures should be calculated for. Optional, defaults to 0.
    Measures A list of one or more aggregate expressions. Implementations may or may not support aggregate ordering expressions. Optional, required if no grouping sets.
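
The hash-table coalescing for a single grouping set can be sketched as below (`hash_aggregate` and its callback parameters are illustrative assumptions, not Substrait APIs):

```python
def hash_aggregate(rows, group_key, init, update):
    # One hash-table entry per distinct grouping key; equivalent tuples
    # are coalesced as they stream in, in no guaranteed order.
    groups = {}
    for row in rows:
        k = group_key(row)
        groups[k] = update(groups.get(k, init), row)
    return groups

rows = [("a", 1), ("b", 2), ("a", 3)]
totals = hash_aggregate(rows, group_key=lambda r: r[0],
                        init=0, update=lambda acc, r: acc + r[1])
print(totals)  # {'a': 4, 'b': 2}
```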

    Streaming Aggregate Operation

The streaming aggregate operation leverages data ordered by the grouping expressions to calculate each grouping set tuple-by-tuple in a streaming fashion. All grouping sets and orderings requested on each aggregate must be compatible to allow multiple grouping sets or aggregate orderings.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Maintains distribution if all distribution fields are contained in every grouping set. Maintains input ordering.
    Direct Output Order Same as defined by Aggregate operation.

    Streaming Aggregate Properties

    Property Description Required
    Input The relational input. Required
    Grouping Sets One or more grouping sets. If multiple grouping sets are declared, sets must all be compatible with the input sortedness. Optional, required if no measures.
Per Grouping Set A list of expression groupings that the aggregation measures should be calculated for. Optional, defaults to 0.
    Measures A list of one or more aggregate expressions. Aggregate expressions ordering requirements must be compatible with expected ordering. Optional, required if no grouping sets.
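
Because the input is already sorted on the grouping key, each group can be finalized as soon as the key changes, with no hash table. The standard-library `itertools.groupby` captures this (an illustrative sketch, not a Substrait API):

```python
import itertools

# Input must already be sorted on the grouping column.
rows = [("a", 1), ("a", 3), ("b", 2)]
sums = [(k, sum(v for _, v in grp))
        for k, grp in itertools.groupby(rows, key=lambda r: r[0])]
print(sums)  # [('a', 4), ('b', 2)]
```

Note the output preserves the input ordering, matching this operation's property maintenance.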

    Consistent Partition Window Operation

    A consistent partition window operation is a special type of project operation where every function is a window function and all of the window functions share the same sorting and partitioning. This allows for the sort and partition to be calculated once and shared between the various function evaluations.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Maintains distribution and ordering.
    Direct Output Order Same as Project operator (input followed by each window expression).

    Window Properties

    Property Description Required
    Input The relational input. Required
    Window Functions One or more window functions. At least one required.
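
The "sort once, evaluate many window functions" idea can be sketched as below, computing a row number and a running sum over one shared sort; the variable names and the two chosen functions are illustrative assumptions:

```python
# Sort once on (partition key, order key) and evaluate several window
# functions in a single pass over that shared sort.
rows = [("a", 3), ("b", 1), ("a", 1)]
rows.sort(key=lambda r: (r[0], r[1]))

out, row_num, running, prev_part = [], 0, 0, None
for part, val in rows:
    if part != prev_part:            # partition boundary: reset state
        row_num, running, prev_part = 0, 0, part
    row_num += 1                     # like ROW_NUMBER()
    running += val                   # like a running SUM(val)
    out.append((part, val, row_num, running))
print(out)
# [('a', 1, 1, 1), ('a', 3, 2, 4), ('b', 1, 1, 1)]
```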

    Expand Operation

    The expand operation creates duplicates of input records based on the Expand Fields. Each Expand Field can be a Switching Field or an expression. Switching Fields are described below. If an Expand Field is an expression then its value is consistent across all duplicate rows.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Distribution is maintained if all the distribution fields are consistent fields with direct references. Ordering can only be maintained down to the level of consistent fields that are kept.
    Direct Output Order The expand fields followed by an i32 column describing the index of the duplicate that the row is derived from.

    Expand Properties

    Property Description Required
    Input The relational input. Required
Direct Fields Expressions describing the output fields. These refer to the schema of the input. Each Direct Field must be an expression or a Switching Field. Required

    Switching Field Properties

    A switching field is a field whose value is different in each duplicated row. All switching fields in an Expand Operation must have the same number of duplicates.

    Property Description Required
    Duplicates List of one or more expressions. The output will contain a row for each expression. Required
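
The duplication behavior, including the trailing duplicate-index column, can be sketched as below (`expand` is a hypothetical helper; switching fields are modeled as lists of expressions, consistent fields as single callables):

```python
def expand(rows, fields, n_duplicates):
    # Each field is either a plain expression (same value in every
    # duplicate) or a "switching" list of n_duplicates expressions.
    out = []
    for row in rows:
        for dup in range(n_duplicates):
            rec = []
            for f in fields:
                expr = f[dup] if isinstance(f, list) else f
                rec.append(expr(row))
            # Trailing i32 column: index of the duplicate this row is
            # derived from.
            out.append(tuple(rec) + (dup,))
    return out

fields = [lambda r: r[0],                     # consistent field
          [lambda r: r[1], lambda r: None]]   # switching field
print(expand([(10, 20)], fields, 2))
# [(10, 20, 0), (10, None, 1)]
```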

    Hashing Window Operation

    A window aggregate operation that will build hash tables for each distinct partition expression.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Maintains distribution. Eliminates ordering.
    Direct Output Order Same as Project operator (input followed by each window expression).

    Hashing Window Properties

    Property Description Required
    Input The relational input. Required
    Window Expressions One or more window expressions. At least one required.

    Streaming Window Operation

A window aggregate operation that relies on input already sorted by its partitioning and ordering expressions.

    Signature Value
    Inputs 1
    Outputs 1
    Property Maintenance Maintains distribution. Eliminates ordering.
    Direct Output Order Same as Project operator (input followed by each window expression).

    Streaming Window Properties

    Property Description Required
    Input The relational input. Required
    Window Expressions One or more window expressions. Must be supported by the sortedness of the input. At least one required.

    Substrait is a format for describing compute operations on structured data. It is designed for interoperability across different languages and systems.

    "},{"location":"#how-does-it-work","title":"How does it work?","text":"

Substrait provides a well-defined, cross-language specification for data compute operations. This includes a consistent declaration of common operations, custom operations, and one or more serialized representations of this specification. The spec focuses on the semantics of each operation. In addition to the specification, the Substrait ecosystem also includes a number of libraries and useful tools.

    We highly recommend the tutorial to learn how a Substrait plan is constructed.

    "},{"location":"#benefits","title":"Benefits","text":"
    • Avoids every system needing to create a communication method between every other system \u2013 each system merely supports ingesting and producing Substrait and it instantly becomes a part of the greater ecosystem.
    • Makes every part of the system upgradable. There\u2019s a new query engine that\u2019s ten times faster? Just plug it in!
    • Enables heterogeneous environments \u2013 run on a cluster of an unknown set of execution engines!
    • The text version of the Substrait plan allows you to quickly see how a plan functions without needing a visualizer (although there are Substrait visualizers as well!).
    "},{"location":"#example-use-cases","title":"Example Use Cases","text":"
    • Communicate a compute plan between a SQL parser and an execution engine (e.g. Calcite SQL parsing to Arrow C++ compute kernel)
    • Serialize a plan that represents a SQL view for consistent use in multiple systems (e.g. Iceberg views in Spark and Trino)
    • Submit a plan to different execution engines (e.g. Datafusion and Postgres) and get a consistent interpretation of the semantics.
    • Create an alternative plan generation implementation that can connect an existing end-user compute expression system to an existing end-user processing engine (e.g. Pandas operations executed inside SingleStore)
    • Build a pluggable plan visualization tool (e.g. D3 based plan visualizer)
    "},{"location":"about/","title":"Substrait: Cross-Language Serialization for Relational Algebra","text":""},{"location":"about/#project-vision","title":"Project Vision","text":"

    The Substrait project aims to create a well-defined, cross-language specification for data compute operations. The specification declares a set of common operations, defines their semantics, and describes their behavior unambiguously. The project also defines extension points and serialized representations of the specification.

    In many ways, the goal of this project is similar to that of the Apache Arrow project. Arrow is focused on a standardized memory representation of columnar data. Substrait is focused on what should be done to data.

    "},{"location":"about/#why-not-use-sql","title":"Why not use SQL?","text":"

    SQL is a well known language for describing queries against relational data. It is designed to be simple and allow reading and writing by humans. Substrait is not intended as a replacement for SQL and works alongside SQL to provide capabilities that SQL lacks. SQL is not a great fit for systems that actually satisfy the query because it does not provide sufficient detail and is not represented in a format that is easy for processing. Because of this, most modern systems will first translate the SQL query into a query plan, sometimes called the execution plan. There can be multiple levels of a query plan (e.g. physical and logical), a query plan may be split up and distributed across multiple systems, and a query plan often undergoes simplifying or optimizing transformations. The SQL standard does not define the format of the query or execution plan and there is no open format that is supported by a broad set of systems. Substrait was created to provide a standard and open format for these query plans.

    "},{"location":"about/#why-not-just-do-this-within-an-existing-oss-project","title":"Why not just do this within an existing OSS project?","text":"

A key goal of the Substrait project is to not be coupled to any single existing technology. Trying to get people involved in something can be difficult when it seems to be primarily driven by the opinions and habits of a single community. In many ways, this situation is similar to the early situation with Arrow. The precursor to Arrow was the Apache Drill ValueVectors concepts. As part of creating Arrow, Wes and Jacques recognized the need to create a new community to build a fresh consensus (beyond just what the Apache Drill community wanted). This separation and new independent community was a key ingredient to Arrow’s current success. The needs here are much the same: many separate communities could benefit from Substrait, but each has its own pain points, type systems, development processes and timelines. To help resolve these tensions, one of the approaches proposed in Substrait is to set a bar that at least two of the top four OSS data technologies (Arrow, Spark, Iceberg, Trino) support something before incorporating it directly into the Substrait specification. (Another goal is to support strong extension points at key locations to avoid this bar being a limiter to broad adoption.)

    "},{"location":"about/#related-technologies","title":"Related Technologies","text":"
    • Apache Calcite: Many ideas in Substrait are inspired by the Calcite project. Calcite is a great JVM-based SQL query parsing and optimization framework. A key goal of the Substrait project is to expose Calcite capabilities more easily to non-JVM technologies as well as expose query planning operations as microservices.
    • Apache Arrow: The Arrow format for data is what the Substrait specification attempts to be for compute expressions. A key goal of Substrait is to enable Substrait producers to execute work within the Arrow Rust and C++ compute kernels.
    "},{"location":"about/#why-the-name-substrait","title":"Why the name Substrait?","text":"

    A strait is a narrow connector of water between two other pieces of water. In analytics, data is often thought of as water. Substrait is focused on instructions related to the data. In other words, what defines or supports the movement of water between one or more larger systems. Thus, the underlayment for the strait connecting different pools of water => sub-strait.

    "},{"location":"faq/","title":"Frequently Asked Question","text":""},{"location":"faq/#what-is-the-purpose-of-the-post-join-filter-field-on-join-relations","title":"What is the purpose of the post-join filter field on Join relations?","text":"

    The post-join filter on the various Join relations is not always equivalent to an explicit Filter relation AFTER the Join.

    See the example here that highlights how the post-join filter behaves differently than a Filter relation in the case of a left join.

    "},{"location":"governance/","title":"Substrait Project Governance","text":"

    The Substrait project is run by volunteers in a collaborative and open way. Its governance is inspired by the Apache Software Foundation. In most cases, people familiar with the ASF model can work with Substrait in the same way. The biggest differences between the models are:

    • Substrait does not have a separate infrastructure governing body that gatekeeps the adoption of new developer tools and technologies.
    • Substrait Management Committee (SMC) members are responsible for recognizing the corporate relationship of its members and ensuring diverse representation and corporate independence.
• Substrait does not condone private mailing lists. All project business should be discussed in public. The only exceptions to this are security escalations (security@substrait.io) and harassment (harassment@substrait.io).
    • Substrait has an automated continuous release process with no formal voting process per release.

    More details about concrete things Substrait looks to avoid can be found below.

    "},{"location":"governance/#the-substrait-project","title":"The Substrait Project","text":"

    The Substrait project consists of the code and repositories that reside in the substrait-io GitHub organization, the Substrait.io website, the Substrait mailing list, MS-hosted teams community calls and the Substrait Slack workspace. (All are open to everyone and recordings/transcripts are made where technology supports it.)

    "},{"location":"governance/#substrait-volunteers","title":"Substrait Volunteers","text":"

    We recognize four groups of individuals related to the project.

    "},{"location":"governance/#user","title":"User","text":"

    A user is someone who uses Substrait. They may contribute to Substrait by providing feedback to developers in the form of bug reports and feature suggestions. Users participate in the Substrait community by helping other users on mailing lists and user support forums.

    "},{"location":"governance/#contributors","title":"Contributors","text":"

A contributor is a user who contributes to the project in the form of code or documentation. They take extra steps to participate in the project (loosely defined as the set of repositories under the GitHub substrait-io organization), are active on the developer mailing list, participate in discussions, and provide patches, documentation, suggestions, and criticism.

    "},{"location":"governance/#committer","title":"Committer","text":"

    A committer is a developer who has write access to the code repositories and has a signed Contributor License Agreement (CLA) on file. Not needing to depend on other people to make patches to the code or documentation, they are actually making short-term decisions for the project. The SMC can (even tacitly) agree and approve the changes into permanency, or they can reject them. Remember that the SMC makes the decisions, not the individual committers.

    "},{"location":"governance/#smc-member","title":"SMC Member","text":"

An SMC member is a committer who was elected due to merit for the evolution of the project. They have write access to the code repository, the right to cast binding votes on all proposals on community-related decisions, the right to propose other active contributors for committership, and the right to invite active committers to the SMC. The SMC as a whole is the entity that controls the project, nobody else. They are responsible for the continued shaping of this governance model.

    "},{"location":"governance/#substrait-management-and-collaboration","title":"Substrait Management and Collaboration","text":"

    The Substrait project is managed using a collaborative, consensus-based process. We do not have a hierarchical structure; rather, different groups of contributors have different rights and responsibilities in the organization.

    "},{"location":"governance/#communication","title":"Communication","text":"

    Communication must be done via mailing lists, Slack, and/or Github. Communication is always done publicly. There are no private lists and all decisions related to the project are made in public. Communication is frequently done asynchronously since members of the community are distributed across many time zones.

    "},{"location":"governance/#substrait-management-committee","title":"Substrait Management Committee","text":"

    The Substrait Management Committee is responsible for the active management of Substrait. The main role of the SMC is to further the long-term development and health of the community as a whole, and to ensure that balanced and wide scale peer review and collaboration takes place. As part of this, the SMC is the primary approver of specification changes, ensuring that proposed changes represent a balanced and thorough examination of possibilities. This doesn\u2019t mean that the SMC has to be involved in the minutiae of a particular specification change but should always shepherd a healthy process around specification changes.

    "},{"location":"governance/#substrait-voting-process","title":"Substrait Voting Process","text":"

    Because one of the fundamental aspects of accomplishing things is doing so by consensus, we need a way to tell whether we have reached consensus. We do this by voting. There are several different types of voting. In all cases, it is recommended that all community members vote. The number of binding votes required to move forward and the community members who have \u201cbinding\u201d votes differs depending on the type of proposal made. In all cases, a veto of a binding voter results in an inability to move forward.

    The rules require that a community member registering a negative vote must include an alternative proposal or a detailed explanation of the reasons for the negative vote. The community then tries to gather consensus on an alternative proposal that can resolve the issue. In the great majority of cases, the concerns leading to the negative vote can be addressed. This process is called \u201cconsensus gathering\u201d and we consider it a very important indication of a healthy community.

    +1 votes required Binding voters Voting Location Process/Governance modifications & actions. This includes promoting new contributors to committer or SMC. 3 SMC Mailing List Format/Specification Modifications (including breaking extension changes) 2 SMC Github PR Documentation Updates (formatting, moves) 1 SMC Github PR Typos 1 Committers Github PR Non-breaking function introductions 1 (not including proposer) Committers Github PR Non-breaking extension additions & non-format code modifications 1 (not including proposer) Committers Github PR Changes (non-breaking or breaking) to a Substrait library (i.e. substrait-java, substrait-validator) 1 (not including proposer) Committers Github PR"},{"location":"governance/#review-then-commit","title":"Review-Then-Commit","text":"

    Substrait follows a review-then-commit policy. This requires that all changes receive consensus approval before being committed to the code base. The specific vote requirements follow the table above.

    "},{"location":"governance/#expressing-votes","title":"Expressing Votes","text":"

    The voting process may seem more than a little weird if you\u2019ve never encountered it before. Votes are represented as numbers between -1 and +1, with \u2018-1\u2019 meaning \u2018no\u2019 and \u2018+1\u2019 meaning \u2018yes.\u2019

    The in-between values indicate how strongly the voting individual feels. Here are some examples of fractional votes and what the voter might be communicating with them:

    • +0: \u2018I don\u2019t feel strongly about it, but I\u2019m okay with this.\u2019
    • -0: \u2018I won\u2019t get in the way, but I\u2019d rather we didn\u2019t do this.\u2019
    • -0.5: \u2018I don\u2019t like this idea, but I can\u2019t find any rational justification for my feelings.\u2019
    • ++1: \u2018Wow! I like this! Let\u2019s do it!\u2019
    • -0.9: \u2018I really don\u2019t like this, but I\u2019m not going to stand in the way if everyone else wants to go ahead with it.\u2019
    • +0.9: \u2018This is a cool idea and I like it, but I don\u2019t have time/the skills necessary to help out.\u2019
    "},{"location":"governance/#votes-on-code-modification","title":"Votes on Code Modification","text":"

    For code-modification votes, +1 votes (review approvals in Github are considered equivalent to a +1) are in favor of the proposal, but -1 votes are vetoes and kill the proposal dead until all vetoers withdraw their -1 votes.

    "},{"location":"governance/#vetoes","title":"Vetoes","text":"

    A -1 (or an unaddressed PR request for changes) vote by a qualified voter stops a code-modification proposal in its tracks. This constitutes a veto, and it cannot be overruled nor overridden by anyone. Vetoes stand until and unless the individual withdraws their veto.

    To prevent vetoes from being used capriciously, the voter must provide with the veto a technical or community justification showing why the change is bad.

    "},{"location":"governance/#why-do-we-vote","title":"Why do we vote?","text":"

    Votes help us to openly resolve conflicts. Without a process, people tend to avoid conflict and thrash around. Votes help to make sure we do the hard work of resolving the conflict.

    "},{"location":"governance/#substrait-is-non-commercial-but-commercially-aware","title":"Substrait is non-commercial but commercially-aware","text":"

    Substrait\u2019s mission is to produce software for the public good. All Substrait software is always available for free, and solely under the Apache License.

    We\u2019re happy to have third parties, including for-profit corporations, take our software and use it for their own purposes. However it is important in these cases to ensure that the third party does not misuse the brand and reputation of the Substrait project for its own purposes. It is important for the longevity and community health of Substrait that the community gets the appropriate credit for producing freely available software.

    The SMC actively tracks the corporate allegiances of community members and strives to ensure that influence around any particular aspect of the project isn\u2019t overly skewed towards a single corporate entity.

    "},{"location":"governance/#substrait-trademark","title":"Substrait Trademark","text":"

    The SMC is responsible for protecting the Substrait name and brand. TBD what action is taken to support this.

    "},{"location":"governance/#project-roster","title":"Project Roster","text":""},{"location":"governance/#substrait-management-committee-smc","title":"Substrait Management Committee (SMC)","text":"Name Association Phillip Cloud Voltron Data Weston Pace LanceDB Jacques Nadeau Sundeck Victor Barua Datadog David Sisson Voltron Data"},{"location":"governance/#substrait-committers","title":"Substrait Committers","text":"Name Association Jeroen van Straten Qblox Carlo Curino Microsoft James Taylor Sundeck Sutou Kouhei Clearcode Micah Kornfeld Google Jinfeng Ni Sundeck Andy Grove Nvidia Jesus Camacho Rodriguez Microsoft Rich Tia Voltron Data Vibhatha Abeykoon Voltron Data Nic Crane Recast Gil Forsyth Voltron Data ChaoJun Zhang Intel Matthijs Brobbel Voltron Data Matt Topol Voltron Data"},{"location":"governance/#additional-detail-about-differences-from-asf","title":"Additional detail about differences from ASF","text":"

    Corporate Awareness: The ASF takes a blind-eye approach that has proven to be too slow to correct corporate influence which has substantially undermined many OSS projects. In contrast, Substrait SMC members are responsible for identifying corporate risks and over-representation and adjusting inclusion in the project based on that (limiting committership, SMC membership, etc). Each member of the SMC shares responsibility to expand the community and seek out corporate diversity.

    Infrastructure: The ASF shows its age with respect to infrastructure, having been originally built on SVN. Examples of ASF requirements that Substrait eschews include custom git infrastructure, a manual release process, and project-external gatekeeping around the use of new tools/technologies.

    "},{"location":"community/","title":"Community","text":"

    Substrait is developed as a consensus-driven open source product under the Apache 2.0 license. Development is done in the open leveraging GitHub issues and PRs.

    "},{"location":"community/#get-in-touch","title":"Get In Touch","text":"Mailing List/Google Group We use the mailing list to discuss questions, formulate plans and collaborate asynchronously. Slack Channel The developers of Substrait frequent the Slack channel. You can get an invite to the channel by following this link. GitHub Issues Substrait is developed via GitHub issues and pull requests. If you see a problem or want to enhance the product, we suggest you file a GitHub issue for developers to review. Twitter The @substrait_io account on Twitter is our official account. Follow-up to keep to date on what is happening with Substrait! Docs Our website is all maintained in our source repository. If there is something you think can be improved, feel free to fork our repository and post a pull request. Meetings Our community meets every other week on Wednesday."},{"location":"community/#talks","title":"Talks","text":"

    Want to learn more about Substrait? Try the following presentations and slide decks.

    • Substrait: A Common Representation for Data Compute Plans (Jacques Nadeau, April 2022) [slides]
    "},{"location":"community/#citation","title":"Citation","text":"

    If you use Substrait in your research, please cite it using the following BibTeX entry:

    @misc{substrait,\n  author = {substrait-io},\n  title = {Substrait: Cross-Language Serialization for Relational Algebra},\n  year = {2021},\n  month = {8},\n  day = {31},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/substrait-io/substrait}}\n}\n
    "},{"location":"community/#contribution","title":"Contribution","text":"

    All contributors are welcome to Substrait. If you want to join the project, open a PR or get in touch with us as above.

    "},{"location":"community/#principles","title":"Principles","text":"
    • Be inclusive and open to all.
    • Ensure a diverse set of contributors that come from multiple data backgrounds to maximize general utility.
    • Build a specification based on open consensus.
    • Avoid over-reliance/coupling to any single technology.
    • Make the specification and all tools freely available under a permissive license (Apache v2).
    "},{"location":"community/powered_by/","title":"Powered by Substrait","text":"

    In addition to the work maintained in repositories within the substrait-io GitHub organization, a growing list of other open source projects have adopted Substrait.

    Acero Acero is a query execution engine implemented as a part of the Apache Arrow C++ library. Acero provides a Substrait consumer interface. ADBC ADBC (Arrow Database Connectivity) is an API specification for Apache Arrow-based database access. ADBC allows applications to pass queries either as SQL strings or Substrait plans. Arrow Flight SQL Arrow Flight SQL is a client-server protocol for interacting with databases and query engines using the Apache Arrow in-memory columnar format and the Arrow Flight RPC framework. Arrow Flight SQL allows clients to send queries as SQL strings or Substrait plans. DataFusion DataFusion is an extensible query planning, optimization, and execution framework, written in Rust, that uses Apache Arrow as its in-memory format. DataFusion provides a Substrait producer and consumer that can convert DataFusion logical plans to and from Substrait plans. It can be used through the DataFusion Python bindings. DuckDB DuckDB is an in-process SQL OLAP database management system. DuckDB provides a Substrait extension that allows users to produce and consume Substrait plans through DuckDB\u2019s SQL, Python, and R APIs. Gluten Gluten is a plugin for Apache Spark that allows computation to be offloaded to engines that have better performance or efficiency than Spark\u2019s built-in JVM-based engine. Gluten converts Spark physical plans to Substrait plans. Ibis Ibis is a Python library that provides a lightweight, universal interface for data wrangling. It includes a dataframe API for Python with support for more than 10 query execution engines, plus a Substrait producer to enable support for Substrait-consuming execution engines. Substrait R Interface The Substrait R interface package allows users to construct Substrait plans from R for evaluation by Substrait-consuming execution engines. The package provides a dplyr backend as well as lower-level interfaces for creating Substrait plans and integrations with Acero and DuckDB. 
Velox Velox is a unified execution engine aimed at accelerating data management systems and streamlining their development. Velox provides a Substrait consumer interface.

    To add your project to this list, please open a pull request.

    "},{"location":"expressions/aggregate_functions/","title":"Aggregate Functions","text":"

    Aggregate functions are functions that define an operation which consumes values from multiple records to produce a single output. Aggregate functions in SQL are typically used with GROUP BY clauses. Aggregate functions are similar to scalar functions, with function signatures that have a small set of additional properties.

    Aggregate function signatures contain all the properties defined for scalar functions. Additionally, they contain the properties below:

    Property Description Required Inherits All properties defined for scalar function. N/A Ordered Whether the result of this function is sensitive to sort order. Optional, defaults to false Maximum set size Maximum allowed set size as an unsigned integer. Optional, defaults to unlimited Decomposable Whether the function can be executed in one or more intermediate steps. Valid options are: NONE, ONE, MANY, describing how intermediate steps can be taken. Optional, defaults to NONE Intermediate Output Type If the function is decomposable, represents the intermediate output type that is used, if the function is defined as either ONE or MANY decomposable. Will be a struct in many cases. Required for ONE and MANY. Invocation Whether the function uses all or only distinct values in the aggregation calculation. Valid options are: ALL, DISTINCT. Optional, defaults to ALL"},{"location":"expressions/aggregate_functions/#aggregate-binding","title":"Aggregate Binding","text":"

    When binding an aggregate function, the binding must include the following additional properties beyond the standard scalar binding properties:

    Property Description Phase Describes the input type of the data: [INITIAL_TO_INTERMEDIATE, INTERMEDIATE_TO_INTERMEDIATE, INITIAL_TO_RESULT, INTERMEDIATE_TO_RESULT] describing what portion of the operation is required. For functions that are NOT decomposable, the only valid option will be INITIAL_TO_RESULT. Ordering Zero or more ordering keys along with key order (ASC|DESC|NULL FIRST, etc.), declared similar to the sort keys in an ORDER BY relational operation. If no sorts are specified, the records are not sorted prior to being passed to the aggregate function."},{"location":"expressions/embedded_functions/","title":"Embedded Functions","text":"

    Embedded functions are a special kind of function where the implementation is embedded within the actual plan. They are commonly used in tools where a user intersperses business logic within a data pipeline. This is more common in data science workflows than traditional SQL workflows.

    Embedded functions are not pre-registered. Embedded functions require that data be consumed and produced with a standard API, may require memory allocation, and have determinate error-reporting behavior. They may also have specific runtime dependencies. For example, a Python pickle function may depend on pyarrow 5.0 and pynessie 1.0.

    Properties for an embedded function include:

    Property Description Required Function Type The type of embedded function presented. Required Function Properties Function properties, one of those items defined below. Required Output Type The fully resolved output type for this embedded function. Required

    The binary representation of an embedded function is:

    Binary RepresentationHuman Readable Representation
    message EmbeddedFunction {\n  repeated Expression arguments = 1;\n  Type output_type = 2;\n  oneof kind {\n    PythonPickleFunction python_pickle_function = 3;\n    WebAssemblyFunction web_assembly_function = 4;\n  }\n\n  message PythonPickleFunction {\n    bytes function = 1;\n    repeated string prerequisite = 2;\n  }\n\n  message WebAssemblyFunction {\n    bytes script = 1;\n    repeated string prerequisite = 2;\n  }\n}\n

    As the bytes are opaque to Substrait there is no equivalent human readable form.
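    As a non-normative illustration, a producer could generate the opaque bytes for the function field of a PythonPickleFunction with Python\u2019s standard pickle module. The function body and package pins below are purely hypothetical, since the spec leaves the argument-access API as [TBD]:

    ```python
    import pickle

    # Hypothetical user-defined function to embed in a plan. The [TBD] API
    # for accessing arguments is not yet specified; this is illustrative only.
    def add_one(x):
        return x + 1

    # Opaque bytes suitable for the `function` field of PythonPickleFunction.
    function_bytes = pickle.dumps(add_one)

    # A structured requirements list (example pins taken from the text above)
    # that could populate the `prerequisite` field.
    prerequisites = ["pyarrow==5.0", "pynessie==1.0"]

    # A consumer with a compatible Python environment reconstructs and calls it.
    restored = pickle.loads(function_bytes)
    ```

    Note that the standard pickle module serializes module-level functions by reference (module plus qualified name), so the consumer needs an environment where the same function is importable; in practice a by-value serializer such as cloudpickle is often used so the function body travels with the plan.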

    "},{"location":"expressions/embedded_functions/#function-details","title":"Function Details","text":"

    There are many types of possible stored functions. For each, Substrait works to expose the function in as descriptive a way as possible to support the largest number of consumers.

    "},{"location":"expressions/embedded_functions/#python-pickle-function-type","title":"Python Pickle Function Type","text":"Property Description Required Pickle Body binary pickle encoded function using [TBD] API representation to access arguments. True Prereqs A list of specific Python conda packages that are prerequisites for access (a structured version of a requirements.txt file). Optional, defaults to none"},{"location":"expressions/embedded_functions/#webassembly-function-type","title":"WebAssembly Function Type","text":"Property Description Required Script WebAssembly function True Prereqs A list of AssemblyScript prerequisites required to compile the assemblyscript function using NPM coordinates. Optional, defaults to none Discussion Points
    • What are the common embedded function formats?
    • How do we expose the data for a function?
    • How do we express batching capabilities?
    • How do we ensure/declare containerization?
    "},{"location":"expressions/extended_expression/","title":"Extended Expression","text":"

    Extended Expression messages are provided for expression-level protocols as an alternative to using a Plan. They mainly target expression-only evaluations, such as those computed in Filter/Project/Aggregation rels. Unlike the original Expression defined in the substrait protocol, Extended Expression messages require more information to completely describe the computation context including: input data schema, referred function signatures, and output schema.

    Since an Extended Expression will be used separately from the Plan rel representation, it will need to include basic fields like Version.

    ExtendedExpression Message
    message ExtendedExpression {\n  // Substrait version of the expression. Optional up to 0.17.0, required for later\n  // versions.\n  Version version = 7;\n\n  // a list of yaml specifications this expression may depend on\n  repeated substrait.extensions.SimpleExtensionURI extension_uris = 1;\n\n  // a list of extensions this expression may depend on\n  repeated substrait.extensions.SimpleExtensionDeclaration extensions = 2;\n\n  // one or more expression trees with same order in plan rel\n  repeated ExpressionReference referred_expr = 3;\n\n  NamedStruct base_schema = 4;\n  // additional extensions associated with this expression.\n  substrait.extensions.AdvancedExtension advanced_extensions = 5;\n\n  // A list of com.google.Any entities that this plan may use. Can be used to\n  // warn if some embedded message types are unknown. Note that this list may\n  // include message types that are ignorable (optimizations) or that are\n  // unused. In many cases, a consumer may be able to work with a plan even if\n  // one or more message types defined here are unknown.\n  repeated string expected_type_urls = 6;\n\n}\n
    "},{"location":"expressions/extended_expression/#input-and-output-data-schema","title":"Input and output data schema","text":"

    Similar to base_schema defined in ReadRel, the input data schema describes the name/type/nullability and layout info of the input data for the target expression evaluation. It also has a field name to define the name of the output data.

    "},{"location":"expressions/extended_expression/#referred-expression","title":"Referred expression","text":"

    An Extended Expression will have one or more referred expressions, which can be either Expression or AggregateFunction. Additional types of expressions may be added in the future.

    For a message with multiple expressions, producers may emit each expression in the same order as they occur in the original Plan rel. However, the consumer does NOT have to handle them in this order. A consumer only needs to ensure that the columns in the final output are organized in the same order as defined in the message.

    "},{"location":"expressions/extended_expression/#function-extensions","title":"Function extensions","text":"

    Function extensions work the same for both Extended Expression and the original Expression defined in the Substrait protocol.

    "},{"location":"expressions/field_references/","title":"Field References","text":"

    In Substrait, all fields are dealt with on a positional basis. Field names are only used at the edge of a plan, for the purposes of naming fields for the outside world. Each operation returns a simple or compound data type. Additional operations can refer to data within that initial operation using field references. To reference a field, you use a reference based on the type of field position you want to reference.

    Reference Type Properties Type Applicability Type return Struct Field Ordinal position. Zero-based. Only legal within the range of possible fields within a struct. Selecting an ordinal outside the applicable field range results in an invalid plan. struct Type of field referenced Array Value Array offset. Zero-based. Negative numbers can be used to describe an offset relative to the end of the array. For example, -1 means the last element in an array. Negative and positive overflows return null values (no wrapping). list type of list Array Slice Array offset and element count. Zero-based. Negative numbers can be used to describe an offset relative to the end of the array. For example, -1 means the last element in an array. Position does not wrap, nor does length. list Same type as original list Map Key A map value that is matched exactly against available map keys and returned. map Value type of map Map KeyExpression A wildcard string that is matched against a simplified form of regular expressions. Requires the key type of the map to be a character type. [Format detail needed, intention to include basic regex concepts such as greedy/non-greedy.] map List of map value type Masked Complex Expression An expression that provides a mask over a schema declaring which portions of the schema should be presented. This allows a user to select a portion of a complex object but mask certain subsections of that same object. any any"},{"location":"expressions/field_references/#compound-references","title":"Compound References","text":"

    References are typically constructed as a sequence. For example: [struct position 0, struct position 1, array offset 2, array slice 1..3].

    Field references are in the same order they are defined in their schema. For example, let\u2019s consider the following schema:

    column a:\n  struct<\n    b: list<\n      struct<\n        c: map<string, \n          struct<\n            x: i32>>>>>\n

    If we want to represent the SQL expression:

    a.b[2].c['my_map_key'].x\n

    We will need to declare the nested field such that:

    Struct field reference a\nStruct field b\nList offset 2\nStruct field c\nMap key my_map_key\nStruct field x\n

    Or more formally in Protobuf Text, we get:

    selection {\n  direct_reference {\n    struct_field {\n      field: 0 # .a\n      child {\n        struct_field {\n          field: 0 # .b\n          child {\n            list_element {\n              offset: 2\n              child {\n                struct_field {\n                  field: 0 # .c\n                  child {\n                    map_key {\n                      map_key {\n                        string: \"my_map_key\" # ['my_map_key']\n                      }\n                      child {\n                        struct_field {\n                          field: 0 # .x\n                        }\n                      }\n                    }\n                  }\n                }\n              }\n            }\n          }\n        }\n      }\n    }\n  }\n  root_reference { }\n}\n
    "},{"location":"expressions/field_references/#validation","title":"Validation","text":"

    References must validate against the schema of the record being referenced. If not, an error is expected.
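    The resolution semantics above can be sketched non-normatively in Python, treating structs as positional lists of field values (Substrait is positional) and applying the null-on-overflow rule for array offsets from the reference-type table. The step encoding here is invented purely for illustration:

    ```python
    def resolve(value, steps):
        """Apply reference steps like ("struct", 0), ("list", 2), ("map", key)."""
        for kind, arg in steps:
            if value is None:
                return None  # nulls propagate through remaining steps
            if kind == "struct":
                # An out-of-range struct ordinal would make the plan invalid,
                # so no bounds handling is modeled here.
                value = value[arg]
            elif kind == "list":
                # Negative offsets count from the end; positive or negative
                # overflow yields null (no wrapping).
                value = value[arg] if -len(value) <= arg < len(value) else None
            elif kind == "map":
                value = value.get(arg)  # missing keys yield null
        return value

    # a.b[2].c['my_map_key'].x over the example schema, with x = 7:
    record = [[[         # root struct -> a -> b (a list of structs)
        [{}], [{}],
        [{"my_map_key": [7]}],   # b[2]: struct with field c, a map to struct<x>
    ]]]
    result = resolve(record, [
        ("struct", 0),           # .a
        ("struct", 0),           # .b
        ("list", 2),             # [2]
        ("struct", 0),           # .c
        ("map", "my_map_key"),   # ['my_map_key']
        ("struct", 0),           # .x  -> 7
    ])
    ```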

    "},{"location":"expressions/field_references/#masked-complex-expression","title":"Masked Complex Expression","text":"

    A masked complex expression is used to do a subselection of a portion of a complex record. It allows a user to specify the portion of the complex object to consume. Imagine you have the following schema (note that structs are lists of fields here, as they are in general in Substrait, since field names are not used internally):

    struct:\n  - struct:\n    - integer\n    - list:\n      struct:\n        - i32\n        - string\n        - string\n     - i32\n  - i16\n  - i32\n  - i64\n

    Given this schema, you could declare a mask of fields to include in pseudocode, such as:

    0:[0,1:[..5:[0,2]]],2,3\n\nOR\n\n0:\n  - 0\n  - 1:\n    ..5:\n      -0\n      -2\n2\n3\n

    This mask states that we would like to include fields 0, 2, and 3 at the top level. Within field 0, we want to include subfields 0 and 1. For subfield 0.1, we want to include only up to the first 5 elements of the array, and within the struct inside that array, only fields 0 and 2. The resulting schema would be:

    struct:\n  - struct:\n    - integer\n    - list:\n      struct: \n        - i32\n        - string\n  - i32\n  - i64\n
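    A rough Python sketch of applying such a mask to positional data (the mask encoding below is invented for illustration; the actual serialization of masks is defined separately by the spec):

    ```python
    def apply_struct_mask(struct_value, mask):
        """Keep only the listed field ordinals; True keeps the whole field,
        otherwise a callable transforms the kept field."""
        out = []
        for ordinal, sub in mask:
            field = struct_value[ordinal]
            out.append(field if sub is True else sub(field))
        return out

    # The mask 0:[0,1:[..5:[0,2]]],2,3 from the example above:
    def mask_field_0(inner):
        return apply_struct_mask(inner, [
            (0, True),
            # ..5: keep at most the first 5 list elements, and within each
            # struct element keep only fields 0 and 2.
            (1, lambda lst: [apply_struct_mask(e, [(0, True), (2, True)])
                             for e in lst[:5]]),
        ])

    def apply_example_mask(record):
        return apply_struct_mask(record, [
            (0, mask_field_0), (2, True), (3, True),
        ])

    # Data shaped like the example schema: struct<struct<integer, list<...>, i32>, i16, i32, i64>
    record = [[10, [[1, "a", "b"], [2, "c", "d"]], 99], 5, 6, 7]
    masked = apply_example_mask(record)
    ```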
    "},{"location":"expressions/field_references/#unwrapping-behavior","title":"Unwrapping Behavior","text":"

    By default, when only a single field is selected from a struct, that struct is removed. When only a single element is removed from a list, the list is removed. A user can also configure the mask to avoid unwrapping in these cases. [TBD how we express this in the serialization formats.]

    Discussion Points
    • Should we support column reordering/positioning using a masked complex expression? (Right now, you can only mask things out.)
    "},{"location":"expressions/scalar_functions/","title":"Scalar Functions","text":"

    A function is a scalar function if that function takes in values from a single record and produces an output value. To clearly specify the definition of functions, Substrait declares an extensible specification plus binding approach to function resolution. A scalar function signature includes the following properties:

    Property Description Required Name One or more user-friendly UTF-8 strings that are used to reference this function. At least one value is required. List of arguments Argument properties are defined below. Arguments can be fully defined or calculated with a type expression. See further details below. Optional, defaults to niladic. Deterministic Whether this function is expected to reproduce the same output when it is invoked multiple times with the same input. This informs a plan consumer on whether it can constant-reduce the defined function. An example would be a random() function, which is typically expected to be evaluated repeatedly despite having the same set of inputs. Optional, defaults to true. Session Dependent Whether this function is influenced by the session context it is invoked within. For example, a function may be influenced by a user who is invoking the function, the time zone of a session, or some other non-obvious parameter. This can inform caching systems on whether a particular function is cacheable. Optional, defaults to false. Variadic Behavior Whether the last argument of the function is variadic or a single argument. If variadic, the argument can optionally have a lower bound (minimum number of instances) and an upper bound (maximum number of instances). Optional, defaults to single value. Nullability Handling Describes how nullability of input arguments maps to nullability of output arguments. Three options are: MIRROR, DECLARED_OUTPUT and DISCRETE. More details about nullability handling are listed below. Optional, defaults to MIRROR Description Additional description of function for implementers or users. Should be written human-readable to allow exposure to end users. Presented as a map with language => description mappings. E.g. { \"en\": \"This adds two numbers together.\", \"fr\": \"cela ajoute deux nombres\"}. Optional Return Value The output type of the expression. 
Return types can be expressed as a fully-defined type or a type expression. See below for more on type expressions. Required Implementation Map A map of implementation locations for one or more implementations of the given function. Each key is a function implementation type. Implementation types include examples such as: AthenaArrowLambda, TrinoV361Jar, ArrowCppKernelEnum, GandivaEnum, LinkedIn Transport Jar, etc. [Definition TBD]. Implementation type has one or more properties associated with retrieval of that implementation. Optional"},{"location":"expressions/scalar_functions/#argument-types","title":"Argument Types","text":"

    There are three main types of arguments: value arguments, type arguments, and enumerations. Every defined argument must be specified in every invocation of the function. When specified, the position of these arguments in the function invocation must match the position of the arguments as defined in the YAML function definition.

    • Value arguments: arguments that refer to a data value. These could be constants (literal expressions defined in the plan) or variables (a reference expression that references data being processed by the plan). This is the most common type of argument. The value of a value argument is not available in output derivation, but its type is. Value arguments can be declared in one of two ways: concrete or parameterized. Concrete types are either simple types or compound types with all parameters fully defined (without referencing any type arguments). Examples include i32, fp32, VARCHAR<20>, List<fp32>, etc. Parameterized types are discussed further below.
    • Type arguments: arguments that are used only to inform the evaluation and/or type derivation of the function. For example, you might have a function which is truncate(<type> DECIMAL<P0,S0>, <value> DECIMAL<P1, S1>, <value> i32). This function declares two value arguments and a type argument. The difference between them is that the type argument has no value at runtime, while the value arguments do.
    • Enumeration: arguments that support a fixed set of declared values as constant arguments. These arguments must be specified as part of an expression. While these could also have been implemented as constant string value arguments, they are formally included to improve validation/contextual help/etc. for frontend processors and IDEs. An example might be extract([DAY|YEAR|MONTH], <date value>). In this example, a producer must specify a type of date part to extract. Note, the value of a required enumeration cannot be used in type derivation.
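    The enumeration argument above can be illustrated with a small, non-normative Python sketch of an extract-style function (the names and validation logic are invented for illustration):

    ```python
    import datetime

    # The declared options for the required enumeration argument.
    EXTRACT_OPTIONS = {"DAY", "MONTH", "YEAR"}

    def extract(part, value):
        # A required enumeration must be one of the declared constant values;
        # anything else is an invalid invocation.
        if part not in EXTRACT_OPTIONS:
            raise ValueError(f"invalid enumeration value: {part}")
        # The enum selects behavior at evaluation time; note it plays no
        # role in type derivation (the output is always an integer here).
        return getattr(value, part.lower())

    d = datetime.date(2021, 8, 31)
    ```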
    "},{"location":"expressions/scalar_functions/#value-argument-properties","title":"Value Argument Properties","text":"Property Description Required Name A human-readable name for this argument to help clarify use. Optional, defaults to a name based on position (e.g. arg0) Type A fully defined type or a type expression. Required Constant Whether this argument is required to be a constant for invocation. For example, in some system a regular expression pattern would only be accepted as a literal and not a column value reference. Optional, defaults to false"},{"location":"expressions/scalar_functions/#type-argument-properties","title":"Type Argument Properties","text":"Property Description Required Type A partially or completely parameterized type. E.g. List<K> or K Required Name A human-readable name for this argument to help clarify use. Optional, defaults to a name based on position (e.g. arg0)"},{"location":"expressions/scalar_functions/#required-enumeration-properties","title":"Required Enumeration Properties","text":"Property Description Required Options List of valid string options for this argument Required Name A human-readable name for this argument to help clarify use. Optional, defaults to a name based on position (e.g. arg0)"},{"location":"expressions/scalar_functions/#options","title":"Options","text":"

    In addition to arguments, each call may specify zero or more options. These are similar to a required enumeration but are more focused on supporting alternative behaviors. Options can be left unspecified, in which case the consumer is free to choose which implementation to use. An example use case might be OVERFLOW_BEHAVIOR:[OVERFLOW, SATURATE, ERROR]. If unspecified, an engine is free to use any of the three choices or even some alternative behavior (e.g. setting the value to null on overflow). If specified, the engine would be expected to behave as specified or fail. Note, the value of an optional enumeration cannot be used in type derivation.

    "},{"location":"expressions/scalar_functions/#option-preference","title":"Option Preference","text":"

    A producer may specify multiple values for an option. If the producer does so then the consumer must deliver the first behavior in the list of values that the consumer is capable of delivering. For example, considering overflow as defined above, if a producer specified [ERROR, SATURATE] then the consumer must deliver ERROR if it is capable of doing so. If it is not then it may deliver SATURATE. If the consumer cannot deliver either behavior then it is an error and the consumer must reject the plan.
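    The first-deliverable rule can be sketched in Python; the helper name and the plan-rejection behavior here are illustrative, not part of the specification:

```python
def resolve_option(producer_values, consumer_supported):
    """Pick the first producer-preferred option value the consumer can deliver.

    producer_values: option values in the producer's preference order.
    consumer_supported: the set of behaviors this consumer implements.
    """
    for value in producer_values:
        if value in consumer_supported:
            return value
    # Neither behavior can be delivered: the consumer must reject the plan.
    raise ValueError("plan rejected: no supported option value")

# A consumer that cannot ERROR may fall back to SATURATE...
print(resolve_option(["ERROR", "SATURATE"], {"SATURATE"}))            # SATURATE
# ...but one that can ERROR must deliver ERROR, the first preference.
print(resolve_option(["ERROR", "SATURATE"], {"ERROR", "SATURATE"}))   # ERROR
```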

    "},{"location":"expressions/scalar_functions/#optional-properties","title":"Optional Properties","text":"Property Description Required Values A list of valid strings for this option. Required Name A human-readable name for this option. Required"},{"location":"expressions/scalar_functions/#nullability-handling","title":"Nullability Handling","text":"Mode Description MIRROR This means that the function has the behavior that if at least one of the input arguments are nullable, the return type is also nullable. If all arguments are non-nullable, the return type will be non-nullable. An example might be the + function. DECLARED_OUTPUT Input arguments are accepted of any mix of nullability. The nullability of the output function is whatever the return type expression states. Example use might be the function is_null() where the output is always boolean independent of the nullability of the input. DISCRETE The input and arguments all define concrete nullability and can only be bound to the types that have those nullability. For example, if a type input is declared i64? and one has an i64 literal, the i64 literal must be specifically cast to i64? to allow the operation to bind."},{"location":"expressions/scalar_functions/#parameterized-types","title":"Parameterized Types","text":"

    Types are parameterized by two kinds of values: inner types (e.g. List<K>) and numeric values (e.g. DECIMAL<P,S>). Parameter names are simple strings (frequently a single character). There are two types of parameters: integer parameters and type parameters.

    When the same parameter name is used multiple times in a function definition, the function can only bind if the exact same value is used for all parameters of that name. For example, if one had a function with a signature of fn(VARCHAR<N>, VARCHAR<N>), the function would only be usable if both VARCHAR types had the same length value N. This necessitates that all instances of the same parameter name be of the same parameter type (all instances are type parameters or all instances are integer parameters).
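    A minimal sketch of this same-name binding rule, assuming a hypothetical `bind_parameters` helper that sees parameter occurrences as a flat list:

```python
def bind_parameters(signature_params, concrete_values):
    """Bind parameter names to concrete values, enforcing that every
    occurrence of the same name receives the exact same value.

    Returns the binding dict, or None when the function cannot bind.
    """
    bound = {}
    for name, value in zip(signature_params, concrete_values):
        if name in bound and bound[name] != value:
            return None  # same parameter name bound to different values
        bound[name] = value
    return bound

# fn(VARCHAR<N>, VARCHAR<N>) binds only when both lengths agree.
print(bind_parameters(["N", "N"], [20, 20]))  # {'N': 20}
print(bind_parameters(["N", "N"], [20, 30]))  # None
```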

    "},{"location":"expressions/scalar_functions/#type-parameter-resolution-in-variadic-functions","title":"Type Parameter Resolution in Variadic Functions","text":"

    When the last argument of a function is variadic and declares a type parameter, e.g. fn(A, B, C...), the C parameter can be marked as either consistent or inconsistent. If marked as consistent, the function can only be bound to arguments where all the C types are the same concrete type. If marked as inconsistent, each unique C can be bound to a different type within the constraints of what C allows.
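    A sketch of the two variadic modes; the helper name is illustrative:

```python
def bind_variadic(tail_types, consistent):
    """Check whether the variadic tail of fn(A, B, C...) can bind.

    consistent=True  : every C argument must be the same concrete type.
    consistent=False : each C argument may be a different type.
    """
    if consistent:
        return len(set(tail_types)) <= 1  # all tail types must match
    return True

print(bind_variadic(["i32", "i32", "i32"], consistent=True))  # True
print(bind_variadic(["i32", "i64"], consistent=True))         # False
print(bind_variadic(["i32", "i64"], consistent=False))        # True
```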

    "},{"location":"expressions/scalar_functions/#output-type-derivation","title":"Output Type Derivation","text":""},{"location":"expressions/scalar_functions/#concrete-return-types","title":"Concrete Return Types","text":"

    A concrete return type is one that is fully known at function definition time. Examples of simple concrete return types include i32 and fp32. For compound types, a concrete return type must be fully declared. Examples of fully defined compound types: VARCHAR<20>, DECIMAL<25,5>

    "},{"location":"expressions/scalar_functions/#return-type-expressions","title":"Return Type Expressions","text":"

    Any function can declare a return type expression. A return type expression uses a simplified set of expressions to describe how the return type should be derived. A return expression can be as simple as returning a parameter declared in the arguments, e.g. f(List<K>) => K, or it can be a simple mathematical or conditional expression such as add(decimal<a,b>, decimal<c,d>) => decimal<a+c, b+d>. The simple expression language supports a very narrow set of types:

    • Integer: 64-bit signed integer (can be a literal or a parameter value)
    • Boolean: True and False
    • Type: A Substrait type (with possibly additional embedded expressions)

    These types are evaluated using a small set of operations to support common scenarios. List of valid operations:

    Math: +, -, *, /, min, max\nBoolean: &&, ||, !, <, >, ==\nParameters: type, integer\nLiterals: type, integer\n

    Fully defined with argument types:

    • type_parameter(string name) => type
    • integer_parameter(string name) => integer
    • not(boolean x) => boolean
    • and(boolean a, boolean b) => boolean
    • or(boolean a, boolean b) => boolean
    • multiply(integer a, integer b) => integer
    • divide(integer a, integer b) => integer
    • add(integer a, integer b) => integer
    • subtract(integer a, integer b) => integer
    • min(integer a, integer b) => integer
    • max(integer a, integer b) => integer
    • equal(integer a, integer b) => boolean
    • greater_than(integer a, integer b) => boolean
    • less_than(integer a, integer b) => boolean
    • covers(Type a, Type b) => boolean Covers means that type B matches type A for as much as type B is defined. For example, if type A is VARCHAR<20> and type B is VARCHAR<N>, type B would be considered covering. Similarly, if type A was List<Struct<a:f32, b:f32>> and type B was List<Struct<>>, it would be considered covering. Note that this is directional, as in \u201cB covers A\u201d or \u201cB can be further enhanced to match the definition of A\u201d.
    • if(boolean a) then (integer) else (integer)
    • if(boolean a) then (type) else (type)
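    The integer side of this derivation machinery can be sketched as a tiny interpreter over the operations above. The tuple-based AST encoding is an assumption for illustration only, not a Substrait serialization format:

```python
def evaluate(expr, params):
    """Evaluate a return-type integer expression.

    expr is an int literal, a str parameter name, or a tuple
    (op, arg0, arg1) using the integer operations listed above.
    """
    if isinstance(expr, int):
        return expr               # integer literal
    if isinstance(expr, str):
        return params[expr]       # integer parameter lookup
    op, *args = expr
    values = [evaluate(a, params) for a in args]
    ops = {
        "add": lambda a, b: a + b,
        "subtract": lambda a, b: a - b,
        "multiply": lambda a, b: a * b,
        "min": min,
        "max": max,
    }
    return ops[op](*values)

# Derive the decimal-division scale MAX(6, S1 + P2 + 1) for S1=2, P2=10.
scale = evaluate(("max", 6, ("add", "S1", ("add", "P2", 1))), {"S1": 2, "P2": 10})
print(scale)  # 13
```

The same pattern extends to boolean operations and type parameters; only the integer subset is shown here.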
    "},{"location":"expressions/scalar_functions/#example-type-expressions","title":"Example Type Expressions","text":"

    For reference, here are some common output type derivations and how they can be expressed with a return type expression:

    Operation Definition Add item to list add(List<T>, T) => List<T> Decimal Division divide(Decimal<P1,S1>, Decimal<P2,S2>) => Decimal<P1 -S1 + S2 + MAX(6, S1 + P2 + 1), MAX(6, S1 + P2 + 1)> Select a subset of map keys based on a regular expression (requires stringlike keys) extract_values(regex:string, map:Map<K,V>) => List<V> WHERE K IN [STRING, VARCHAR<N>, FIXEDCHAR<N>] Concatenate two fixed sized character strings concat(FIXEDCHAR<A>, FIXEDCHAR<B>) => FIXEDCHAR<A+B> Make a struct of a set of fields and a struct definition. make_struct(<type> T, K...) => T"},{"location":"expressions/specialized_record_expressions/","title":"Specialized Record Expressions","text":"

    While all types of operations could be reduced to functions, in some cases this would be overly simplistic. Instead, it is helpful to construct some other expression constructs.

    These constructs focus on distinct expression types rather than direct syntactic sugar. For example, CAST and EXTRACT are SQL operations that are presented using specialized syntax; however, they can easily be modeled using a function paradigm with minimal complexity.

    "},{"location":"expressions/specialized_record_expressions/#literal-expressions","title":"Literal Expressions","text":"

    For each data type, it is possible to create a literal value for that data type. The representation depends on the serialization format. Literal expressions include both a type literal and a possibly null value.

    "},{"location":"expressions/specialized_record_expressions/#nested-type-constructor-expressions","title":"Nested Type Constructor Expressions","text":"

    These expressions allow structs, lists, and maps to be constructed from a set of expressions. For example, they allow a struct expression like (field 0 - field 1, field 0 + field 1) to be represented.

    "},{"location":"expressions/specialized_record_expressions/#cast-expression","title":"Cast Expression","text":"

    To convert a value from one type to another, Substrait defines a cast expression. Cast expressions declare an expected type, an input argument, and an enumeration specifying failure behavior, indicating whether the cast should return null or throw an exception on failure.

    Note that Substrait always requires a cast expression whenever the current type is not exactly equal to (one of) the expected types. For example, it is illegal to directly pass a value of type i8[0] to a function that only supports an i8?[0] argument.

    "},{"location":"expressions/specialized_record_expressions/#if-expression","title":"If Expression","text":"

    An if value expression is an expression composed of one if clause, zero or more else if clauses and an else clause. In pseudocode, they are envisioned as:

    if <boolean expression> then <result expression 1>\nelse if <boolean expression> then <result expression 2> (zero or more times)\nelse <result expression 3>\n

    When an if expression is declared, all return expressions must be of the same type.

    "},{"location":"expressions/specialized_record_expressions/#shortcut-behavior","title":"Shortcut Behavior","text":"

    An if expression is expected to logically short-circuit on a positive outcome. This means that a skipped else/elseif expression cannot cause an error. For example, the following should not throw an error despite the fact that the cast operation would fail:

    if 'value' = 'value' then 0\nelse cast('hello' as integer) \n
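    One way to honor this guarantee is to defer branch evaluation, e.g. by wrapping each branch in a thunk so that untaken branches are never executed. This Python sketch is illustrative, not the Substrait evaluation model:

```python
def if_expr(branches, default):
    """Evaluate an if/else-if/else chain lazily.

    branches: list of (condition_thunk, result_thunk) pairs.
    default: thunk for the final else branch.
    Only the first matching branch's result is ever evaluated.
    """
    for condition, result in branches:
        if condition():
            return result()
    return default()

def failing_cast():
    # Stands in for cast('hello' as integer), which would fail.
    raise ValueError("cast failure")

# 'value' = 'value' is true, so the failing cast is never evaluated.
out = if_expr([(lambda: "value" == "value", lambda: 0)], failing_cast)
print(out)  # 0
```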
    "},{"location":"expressions/specialized_record_expressions/#switch-expression","title":"Switch Expression","text":"

    Switch expressions allow the selection of alternate branches based on the value of a given expression. They are an optimized form of a generic if expression in which all conditions test equality against the same value. In pseudocode:

    switch(value)\n<value 1> => <return 1> (1 or more times)\n<else> => <return default>\n

    Return values for a switch expression must all be of identical type.

    "},{"location":"expressions/specialized_record_expressions/#shortcut-behavior_1","title":"Shortcut Behavior","text":"

    As in if expressions, switch expression evaluation should not be interrupted by \u201croads not taken\u201d.

    "},{"location":"expressions/specialized_record_expressions/#or-list-equality-expression","title":"Or List Equality Expression","text":"

    A specialized structure that is often used is a large list of possible values. In SQL, these are typically large IN lists. They can be composed from one or more fields. There are two common patterns, single value and multi value. In pseudocode they are represented as:

    Single Value:\nexpression, [<value1>, <value2>, ... <valueN>]\n\nMulti Value:\n[expressionA, expressionB], [[value1a, value1b], [value2a, value2b].. [valueNa, valueNb]]\n

    For single value expressions, these are a compact equivalent of expression = value1 OR expression = value2 OR .. OR expression = valueN. When using an expression of this type, two things are required: the test expression and all value expressions must be of the same type, and a function signature for equality must be available for that type.
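    The single-value equivalence can be sketched as follows, with a pluggable equality function standing in for the required equality signature (helper names are hypothetical):

```python
def or_list_single(expression_value, values, equals=lambda a, b: a == b):
    """Compact equivalent of chained equality:
    expression = value1 OR expression = value2 OR ... OR expression = valueN.
    """
    return any(equals(expression_value, v) for v in values)

print(or_list_single(3, [1, 2, 3]))  # True
print(or_list_single(9, [1, 2, 3]))  # False
```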

    "},{"location":"expressions/subqueries/","title":"Subqueries","text":"

    Subqueries are scalar expressions composed of another query.

    "},{"location":"expressions/subqueries/#forms","title":"Forms","text":""},{"location":"expressions/subqueries/#scalar","title":"Scalar","text":"

    Scalar subqueries are subqueries that return one row and one column.

    Property Description Required Input Input relation Yes"},{"location":"expressions/subqueries/#in-predicate","title":"IN predicate","text":"

    An IN subquery predicate checks that the left expression is contained in the right subquery.

    "},{"location":"expressions/subqueries/#examples","title":"Examples","text":"
    SELECT *\nFROM t1\nWHERE x IN (SELECT * FROM t2)\n
    SELECT *\nFROM t1\nWHERE (x, y) IN (SELECT a, b FROM t2)\n
    Property Description Required Needles Expressions whose existence will be checked Yes Haystack Subquery to check Yes"},{"location":"expressions/subqueries/#set-predicates","title":"Set predicates","text":"

    A set predicate is a predicate over a set of rows in the form of a subquery.

    EXISTS and UNIQUE are common SQL spellings of these kinds of predicates.

    Property Description Required Operation The operation to perform over the set Yes Tuples Set of tuples to check using the operation Yes"},{"location":"expressions/subqueries/#set-comparisons","title":"Set comparisons","text":"

    A set comparison subquery is a subquery comparison using ANY or ALL operations.

    "},{"location":"expressions/subqueries/#examples_1","title":"Examples","text":"
    SELECT *\nFROM t1\nWHERE x < ANY(SELECT y from t2)\n
    Property Description Required Reduction operation The kind of reduction to use over the subquery Yes Comparison operation The kind of comparison operation to use Yes Expression Left-hand side expression to check Yes Subquery Subquery to check Yes Protobuf Representation
    message Subquery {\n  oneof subquery_type {\n    // Scalar subquery\n    Scalar scalar = 1;\n    // x IN y predicate\n    InPredicate in_predicate = 2;\n    // EXISTS/UNIQUE predicate\n    SetPredicate set_predicate = 3;\n    // ANY/ALL predicate\n    SetComparison set_comparison = 4;\n  }\n\n  // A subquery with one row and one column. This is often an aggregate\n  // though not required to be.\n  message Scalar {\n    Rel input = 1;\n  }\n\n  // Predicate checking that the left expression is contained in the right\n  // subquery\n  //\n  // Examples:\n  //\n  // x IN (SELECT * FROM t)\n  // (x, y) IN (SELECT a, b FROM t)\n  message InPredicate {\n    repeated Expression needles = 1;\n    Rel haystack = 2;\n  }\n\n  // A predicate over a set of rows in the form of a subquery\n  // EXISTS and UNIQUE are common SQL forms of this operation.\n  message SetPredicate {\n    enum PredicateOp {\n      PREDICATE_OP_UNSPECIFIED = 0;\n      PREDICATE_OP_EXISTS = 1;\n      PREDICATE_OP_UNIQUE = 2;\n    }\n    // TODO: should allow expressions\n    PredicateOp predicate_op = 1;\n    Rel tuples = 2;\n  }\n\n  // A subquery comparison using ANY or ALL.\n  // Examples:\n  //\n  // SELECT *\n  // FROM t1\n  // WHERE x < ANY(SELECT y from t2)\n  message SetComparison {\n    enum ComparisonOp {\n      COMPARISON_OP_UNSPECIFIED = 0;\n      COMPARISON_OP_EQ = 1;\n      COMPARISON_OP_NE = 2;\n      COMPARISON_OP_LT = 3;\n      COMPARISON_OP_GT = 4;\n      COMPARISON_OP_LE = 5;\n      COMPARISON_OP_GE = 6;\n    }\n\n    enum ReductionOp {\n      REDUCTION_OP_UNSPECIFIED = 0;\n      REDUCTION_OP_ANY = 1;\n      REDUCTION_OP_ALL = 2;\n    }\n\n    // ANY or ALL\n    ReductionOp reduction_op = 1;\n    // A comparison operator\n    ComparisonOp comparison_op = 2;\n    // left side of the expression\n    Expression left = 3;\n    // right side of the expression\n    Rel right = 4;\n  }\n}\n
    "},{"location":"expressions/table_functions/","title":"Table Functions","text":"

    Table functions produce zero or more records for each input record. Table functions use a signature similar to scalar functions. However, they are not allowed in the same contexts.

    to be completed\u2026

    "},{"location":"expressions/user_defined_functions/","title":"User-Defined Functions","text":"

    Substrait supports the creation of custom functions via simple extensions, using the facilities described in scalar functions. The functions defined by Substrait use the same mechanism. The extension files for standard functions can be found here.

    Here\u2019s an example function that doubles its input:

    Implementation Note

    This implementation is only defined on 32-bit floats and integers but could be defined on all numbers (and even lists and strings). The user of the implementation can specify what happens when the resulting value falls outside of the valid range for a 32-bit float (either return NAN or raise an error).

    %YAML 1.2\n---\nscalar_functions:\n  -\n    name: \"double\"\n    description: \"Double the value\"\n    impls:\n      - args:\n          - name: x\n            value: fp32\n        options:\n          on_domain_error:\n            values: [ NAN, ERROR ]\n        return: fp32\n      - args:\n          - name: x\n            value: i32\n        options:\n          on_domain_error:\n            values: [ NAN, ERROR ]\n        return: i32\n
    "},{"location":"expressions/window_functions/","title":"Window Functions","text":"

    Window functions are functions which consume values from multiple records to produce a single output. They are similar to aggregate functions, but they also have a focused window of analysis within their partition window. To an end user, window functions behave like scalar values, producing a single value for each input record. However, producing each single output record may require visibility into many input records.

    Window function signatures contain all the properties defined for aggregate functions. Additionally, they contain the properties below:

    Property Description Required Inherits All properties defined for aggregate functions. N/A Window Type STREAMING or PARTITION. Describes whether the function needs to see all data for the specific partition operation simultaneously. Operations like SUM can produce values in a streaming manner with no complete visibility of the partition. NTILE requires visibility of the entire partition before it can start producing values. Optional, defaults to PARTITION

    When binding a window function, the binding must include the following additional properties beyond the standard scalar binding properties:

    Property Description Required Partition A list of partitioning expressions. False, defaults to a single partition for the entire dataset Lower Bound Bound Preceding(int64), Bound Following(int64) or CurrentRow. False, defaults to start of partition Upper Bound Bound Preceding(int64), Bound Following(int64) or CurrentRow. False, defaults to end of partition"},{"location":"expressions/window_functions/#aggregate-functions-as-window-functions","title":"Aggregate Functions as Window Functions","text":"

    Aggregate functions can be treated as window functions with Window Type set to STREAMING.

    AVG, COUNT, MAX, MIN and SUM are examples of aggregate functions that are commonly allowed in window contexts.

    "},{"location":"extensions/","title":"Extensions","text":"

    In many cases, the existing objects in Substrait will be sufficient to accomplish a particular use case. However, it is sometimes helpful to create a new data type, scalar function signature or some other custom representation within a system. For that, Substrait provides a number of extension points.

    "},{"location":"extensions/#simple-extensions","title":"Simple Extensions","text":"

    Some kinds of primitives are so frequently extended that Substrait defines a standard YAML format that describes how the extended functionality can be interpreted. This allows different projects/systems to use the YAML definition as a specification so that interoperability isn\u2019t constrained to the base Substrait specification. The main types of extensions that are defined in this manner include the following:

    • Data types
    • Type variations
    • Scalar Functions
    • Aggregate Functions
    • Window Functions
    • Table Functions

    To extend these items, developers can create one or more YAML files at a defined URI that describes the properties of each of these extensions. The YAML file is constructed according to the YAML Schema. Each definition in the file corresponds to the YAML-based serialization of the relevant data structure. If a user only wants to extend one of these types of objects (e.g. types), a developer does not have to provide definitions for the other extension points.

    A Substrait plan can reference one or more YAML files via URI for extension. In the places where these entities are referenced, they will be referenced using a URI + name reference. The name scheme per type works as follows:

    Category Naming scheme Type The name as defined on the type object. Type Variation The name as defined on the type variation object. Function Signature A function signature compound name as described below.

    A YAML file can also reference types and type variations defined in another YAML file. To do this, it must declare the YAML file it depends on using a key-value pair in the dependencies key, where the value is the URI to the YAML file, and the key is a valid identifier that can then be used as an identifier-safe alias for the URI. This alias can then be used as a .-separated namespace prefix wherever a type class or type variation name is expected.

    For example, if the YAML file at file:///extension_types.yaml defines a type called point, a different YAML file can use the type in a function declaration as follows:

    dependencies:\n  ext: file:///extension_types.yaml\nscalar_functions:\n- name: distance\n  description: The distance between two points.\n  impls:\n  - args:\n    - name: a\n      value: ext.point\n    - name: b\n      value: ext.point\n    return: f64\n

    Here, the choice for the name ext is arbitrary, as long as it does not conflict with anything else in the YAML file.

    "},{"location":"extensions/#function-signature-compound-names","title":"Function Signature Compound Names","text":"

    A YAML file may contain one or more functions by the same name. The key used in the function extension declaration to reference a function is a combination of the name of the function along with a list of the required input argument types. The format is as follows:

    <function name>:<short_arg_type0>_<short_arg_type1>_..._<short_arg_typeN>\n

    Rather than using a full data type representation, the input argument types (short_arg_type) are mapped to single-level short names. The mappings are listed in the table below.

    Note

    Every compound function signature must be unique. If two function implementations in a YAML file would generate the same compound function signature, then the YAML file is invalid and behavior is undefined.
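    The compound-name construction can be sketched as follows; the helper and the partial short-name mapping shown here are illustrative (drawn from the mapping table in this section), and optional enumerations are simply omitted from the name:

```python
# A subset of the short-name mappings; the full table appears below.
SHORT_NAMES = {
    "required enumeration": "req", "i8": "i8", "i16": "i16",
    "i32": "i32", "i64": "i64", "fp32": "fp32", "fp64": "fp64",
    "string": "str", "boolean": "bool", "timestamp": "ts", "any1": "any",
}

def compound_name(function_name, arg_types):
    """Build <name>:<short0>_<short1>_..._<shortN> for required arguments."""
    return function_name + ":" + "_".join(SHORT_NAMES[t] for t in arg_types)

print(compound_name("add", ["i8", "i8"]))                               # add:i8_i8
print(compound_name("extract", ["required enumeration", "timestamp"]))  # extract:req_ts
print(compound_name("sum", ["any1"]))                                   # sum:any
```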

    Argument Type Signature Name Required Enumeration req i8 i8 i16 i16 i32 i32 i64 i64 fp32 fp32 fp64 fp64 string str binary vbin boolean bool timestamp ts timestamp_tz tstz date date time time interval_year iyear interval_day iday uuid uuid fixedchar<N> fchar varchar<N> vchar fixedbinary<N> fbin decimal<P,S> dec precision_timestamp<P> pts precision_timestamp_tz<P> ptstz struct<T1,T2,\u2026,TN> struct list<T> list map<K,V> map any[\\d]? any user defined type u!name"},{"location":"extensions/#examples","title":"Examples","text":"Function Signature Function Name add(optional enumeration, i8, i8) => i8 add:i8_i8 avg(fp32) => fp32 avg:fp32 extract(required enumeration, timestamp) => i64 extract:req_ts sum(any1) => any1 sum:any"},{"location":"extensions/#advanced-extensions","title":"Advanced Extensions","text":"

    Less common extensions can be extended using customization support at the serialization level. This includes the following kinds of extensions:

    Extension Type Description Relation Modification (semantic) Extensions to an existing relation that will alter the semantics of that relation. These kinds of extensions require that any plan consumer understand the extension to be able to manipulate or execute that operator. Ignoring these extensions will result in an incorrect interpretation of the plan. An example extension might be creating a customized version of Aggregate that can optionally apply a filter before aggregating the data. Note: Semantic-changing extensions shouldn\u2019t change the core characteristics of the underlying relation. For example, they should not change the default direct output field ordering, change the number of fields output or change the behavior of physical property characteristics. If one needs to change one of these behaviors, one should define a new relation as described below. Relation Modification (optimization) Extensions to an existing relation that can improve the efficiency of a plan consumer but don\u2019t fundamentally change the behavior of the operation. An example might be an estimated amount of memory the relation is expected to use or a particular algorithmic pattern that is perceived to be optimal. New Relations Creates an entirely new kind of relation. It is the most flexible way to extend Substrait but also makes the Substrait plan the least interoperable. In most cases it is better to use a semantic-changing relation as opposed to a new relation, as it means existing code patterns can easily be extended to work with the additional properties. New Read Types Defines a new subcategory of read that can be used in a ReadRel. One goal of Substrait is to provide a fairly extensive set of read patterns within the project as opposed to requiring people to define new types externally. As such, we suggest that you first talk with the Substrait community to determine whether your read type can be incorporated directly in the core specification. 
New Write Types Similar to a read type but for writes. As with reads, the community recommends that interested extenders first discuss with the community about developing new write types in the community before using the extension mechanisms. Plan Extensions Semantic and/or optimization based additions at the plan level.

    Because extension mechanisms are different for each serialization format, please refer to the corresponding serialization sections to understand how these extensions are defined in more detail.

    "},{"location":"extensions/functions_aggregate_approx/","title":"functions_aggregate_approx.yaml","text":"

    This document file is generated for functions_aggregate_approx.yaml

    "},{"location":"extensions/functions_aggregate_approx/#aggregate-functions","title":"Aggregate Functions","text":""},{"location":"extensions/functions_aggregate_approx/#approx_count_distinct","title":"approx_count_distinct","text":"

    Implementations: approx_count_distinct(x): -> return_type 0. approx_count_distinct(any): -> i64

    Calculates the approximate number of rows that contain distinct values of the expression argument using HyperLogLog. This function provides an alternative to the COUNT (DISTINCT expression) function, which returns the exact number of rows that contain distinct values of an expression. APPROX_COUNT_DISTINCT processes large amounts of data significantly faster than COUNT, with negligible deviation from the exact result.

    "},{"location":"extensions/functions_aggregate_generic/","title":"functions_aggregate_generic.yaml","text":"

    This document file is generated for functions_aggregate_generic.yaml

    "},{"location":"extensions/functions_aggregate_generic/#aggregate-functions","title":"Aggregate Functions","text":""},{"location":"extensions/functions_aggregate_generic/#count","title":"count","text":"

    Implementations: count(x, option:overflow): -> return_type 0. count(any, option:overflow): -> i64

    Count a set of values

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_aggregate_generic/#count_1","title":"count","text":"

    Implementations:

    Count a set of records (not field referenced)

    "},{"location":"extensions/functions_aggregate_generic/#any_value","title":"any_value","text":"

    Implementations: any_value(x): -> return_type 0. any_value(any): -> any?

    Selects an arbitrary value from a group of values. If the input is empty, the function returns null.

    "},{"location":"extensions/functions_arithmetic/","title":"functions_arithmetic.yaml","text":"

    This document file is generated for functions_arithmetic.yaml

    "},{"location":"extensions/functions_arithmetic/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_arithmetic/#add","title":"add","text":"

    Implementations: add(x, y, option:overflow): -> return_type 0. add(i8, i8, option:overflow): -> i8 1. add(i16, i16, option:overflow): -> i16 2. add(i32, i32, option:overflow): -> i32 3. add(i64, i64, option:overflow): -> i64 4. add(fp32, fp32, option:rounding): -> fp32 5. add(fp64, fp64, option:rounding): -> fp64

    Add two values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#subtract","title":"subtract","text":"

    Implementations: subtract(x, y, option:overflow): -> return_type 0. subtract(i8, i8, option:overflow): -> i8 1. subtract(i16, i16, option:overflow): -> i16 2. subtract(i32, i32, option:overflow): -> i32 3. subtract(i64, i64, option:overflow): -> i64 4. subtract(fp32, fp32, option:rounding): -> fp32 5. subtract(fp64, fp64, option:rounding): -> fp64

    Subtract one value from another.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#multiply","title":"multiply","text":"

    Implementations: multiply(x, y, option:overflow): -> return_type 0. multiply(i8, i8, option:overflow): -> i8 1. multiply(i16, i16, option:overflow): -> i16 2. multiply(i32, i32, option:overflow): -> i32 3. multiply(i64, i64, option:overflow): -> i64 4. multiply(fp32, fp32, option:rounding): -> fp32 5. multiply(fp64, fp64, option:rounding): -> fp64

    Multiply two values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#divide","title":"divide","text":"

    Implementations: divide(x, y, option:overflow): -> return_type 0. divide(i8, i8, option:overflow): -> i8 1. divide(i16, i16, option:overflow): -> i16 2. divide(i32, i32, option:overflow): -> i32 3. divide(i64, i64, option:overflow): -> i64 4. divide(fp32, fp32, option:rounding, option:on_domain_error, option:on_division_by_zero): -> fp32 5. divide(fp64, fp64, option:rounding, option:on_domain_error, option:on_division_by_zero): -> fp64

    Divide x by y. In the case of integer division, partial values are truncated (i.e. rounded towards 0). The on_division_by_zero option governs behavior in cases where y is 0 and x is not 0. LIMIT means positive or negative infinity (depending on the sign of x and y). If x and y are both 0 or both \u00b1infinity, behavior will be governed by on_domain_error.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
  • on_division_by_zero ['LIMIT', 'NAN', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic/#negate","title":"negate","text":"

    Implementations: negate(x, option:overflow): -> return_type 0. negate(i8, option:overflow): -> i8 1. negate(i16, option:overflow): -> i16 2. negate(i32, option:overflow): -> i32 3. negate(i64, option:overflow): -> i64 4. negate(fp32): -> fp32 5. negate(fp64): -> fp64

    Negation of the value

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic/#modulus","title":"modulus","text":"

    Implementations: modulus(x, y, option:division_type, option:overflow, option:on_domain_error): -> return_type 0. modulus(i8, i8, option:division_type, option:overflow, option:on_domain_error): -> i8 1. modulus(i16, i16, option:division_type, option:overflow, option:on_domain_error): -> i16 2. modulus(i32, i32, option:division_type, option:overflow, option:on_domain_error): -> i32 3. modulus(i64, i64, option:division_type, option:overflow, option:on_domain_error): -> i64

    *Calculate the remainder (r) when dividing dividend (x) by divisor (y). In mathematics, many conventions for the modulus (mod) operation exist. The result of a mod operation depends on the software implementation and underlying hardware. Substrait is a format for describing compute operations on structured data and designed for interoperability. Therefore the user is responsible for determining a definition of division as defined by the quotient (q). The following basic conditions of division are satisfied: (1) q \u2208 \u2124 (the quotient is an integer) (2) x = y * q + r (division rule) (3) abs(r) < abs(y) where q is the quotient. The division_type option determines the mathematical definition of quotient to use in the above definition of division. When division_type=TRUNCATE, q = trunc(x/y). When division_type=FLOOR, q = floor(x/y). In the cases of TRUNCATE and FLOOR division: remainder r = x - round_func(x/y) * y The on_domain_error option governs behavior in cases where y is 0, y is \u00b1inf, or x is \u00b1inf. In these cases the mod is undefined. The overflow option governs behavior when integer overflow occurs. If x and y are both 0 or both \u00b1infinity, behavior will be governed by on_domain_error. *

    Options:
  • division_type ['TRUNCATE', 'FLOOR']
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • on_domain_error ['NULL', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic/#power","title":"power","text":"

    Implementations: power(x, y, option:overflow): -> return_type 0. power(i64, i64, option:overflow): -> i64 1. power(fp32, fp32): -> fp32 2. power(fp64, fp64): -> fp64

    Take the power with x as the base and y as exponent.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic/#sqrt","title":"sqrt","text":"

    Implementations: sqrt(x, option:rounding, option:on_domain_error): -> return_type 0. sqrt(i64, option:rounding, option:on_domain_error): -> fp64 1. sqrt(fp32, option:rounding, option:on_domain_error): -> fp32 2. sqrt(fp64, option:rounding, option:on_domain_error): -> fp64

    Square root of the value

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic/#exp","title":"exp","text":"

    Implementations: exp(x, option:rounding): -> return_type 0. exp(fp32, option:rounding): -> fp32 1. exp(fp64, option:rounding): -> fp64

    The mathematical constant e, raised to the power of the value.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#cos","title":"cos","text":"

    Implementations: cos(x, option:rounding): -> return_type 0. cos(fp32, option:rounding): -> fp64 1. cos(fp64, option:rounding): -> fp64

    Get the cosine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#sin","title":"sin","text":"

    Implementations: sin(x, option:rounding): -> return_type 0. sin(fp32, option:rounding): -> fp64 1. sin(fp64, option:rounding): -> fp64

    Get the sine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#tan","title":"tan","text":"

    Implementations: tan(x, option:rounding): -> return_type 0. tan(fp32, option:rounding): -> fp64 1. tan(fp64, option:rounding): -> fp64

    Get the tangent of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#cosh","title":"cosh","text":"

    Implementations: cosh(x, option:rounding): -> return_type 0. cosh(fp32, option:rounding): -> fp32 1. cosh(fp64, option:rounding): -> fp64

    Get the hyperbolic cosine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#sinh","title":"sinh","text":"

    Implementations: sinh(x, option:rounding): -> return_type 0. sinh(fp32, option:rounding): -> fp32 1. sinh(fp64, option:rounding): -> fp64

    Get the hyperbolic sine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#tanh","title":"tanh","text":"

    Implementations: tanh(x, option:rounding): -> return_type 0. tanh(fp32, option:rounding): -> fp32 1. tanh(fp64, option:rounding): -> fp64

    Get the hyperbolic tangent of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#acos","title":"acos","text":"

    Implementations: acos(x, option:rounding, option:on_domain_error): -> return_type 0. acos(fp32, option:rounding, option:on_domain_error): -> fp64 1. acos(fp64, option:rounding, option:on_domain_error): -> fp64

    Get the arccosine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic/#asin","title":"asin","text":"

    Implementations: asin(x, option:rounding, option:on_domain_error): -> return_type 0. asin(fp32, option:rounding, option:on_domain_error): -> fp64 1. asin(fp64, option:rounding, option:on_domain_error): -> fp64

    Get the arcsine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic/#atan","title":"atan","text":"

    Implementations: atan(x, option:rounding): -> return_type 0. atan(fp32, option:rounding): -> fp64 1. atan(fp64, option:rounding): -> fp64

    Get the arctangent of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#acosh","title":"acosh","text":"

    Implementations: acosh(x, option:rounding, option:on_domain_error): -> return_type 0. acosh(fp32, option:rounding, option:on_domain_error): -> fp32 1. acosh(fp64, option:rounding, option:on_domain_error): -> fp64

    Get the hyperbolic arccosine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic/#asinh","title":"asinh","text":"

    Implementations: asinh(x, option:rounding): -> return_type 0. asinh(fp32, option:rounding): -> fp32 1. asinh(fp64, option:rounding): -> fp64

    Get the hyperbolic arcsine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#atanh","title":"atanh","text":"

    Implementations: atanh(x, option:rounding, option:on_domain_error): -> return_type 0. atanh(fp32, option:rounding, option:on_domain_error): -> fp32 1. atanh(fp64, option:rounding, option:on_domain_error): -> fp64

    Get the hyperbolic arctangent of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic/#atan2","title":"atan2","text":"

    Implementations: atan2(x, y, option:rounding, option:on_domain_error): -> return_type 0. atan2(fp32, fp32, option:rounding, option:on_domain_error): -> fp64 1. atan2(fp64, fp64, option:rounding, option:on_domain_error): -> fp64

    Get the arctangent of values given as x/y pairs.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic/#radians","title":"radians","text":"

    Implementations: radians(x, option:rounding): -> return_type 0. radians(fp32, option:rounding): -> fp32 1. radians(fp64, option:rounding): -> fp64

    *Converts angle x in degrees to radians. *

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#degrees","title":"degrees","text":"

    Implementations: degrees(x, option:rounding): -> return_type 0. degrees(fp32, option:rounding): -> fp32 1. degrees(fp64, option:rounding): -> fp64

    *Converts angle x in radians to degrees. *

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#abs","title":"abs","text":"

    Implementations: abs(x, option:overflow): -> return_type 0. abs(i8, option:overflow): -> i8 1. abs(i16, option:overflow): -> i16 2. abs(i32, option:overflow): -> i32 3. abs(i64, option:overflow): -> i64 4. abs(fp32): -> fp32 5. abs(fp64): -> fp64

    *Calculate the absolute value of the argument. Integer values allow the specification of overflow behavior to handle the unevenness of the two's complement, e.g. Int8 range [-128 : 127]. *

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic/#sign","title":"sign","text":"

    Implementations: sign(x): -> return_type 0. sign(i8): -> i8 1. sign(i16): -> i16 2. sign(i32): -> i32 3. sign(i64): -> i64 4. sign(fp32): -> fp32 5. sign(fp64): -> fp64

    *Return the signedness of the argument. Integer values return signedness with the same type as the input. Possible return values are [-1, 0, 1]. Floating point values return signedness with the same type as the input. Possible return values are [-1.0, -0.0, 0.0, 1.0, NaN]. *

    "},{"location":"extensions/functions_arithmetic/#factorial","title":"factorial","text":"

    Implementations: factorial(n, option:overflow): -> return_type 0. factorial(i32, option:overflow): -> i32 1. factorial(i64, option:overflow): -> i64

    *Return the factorial of a given integer input. The factorial of 0 is 1 by convention. Negative inputs will raise an error. *

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic/#bitwise_not","title":"bitwise_not","text":"

    Implementations: bitwise_not(x): -> return_type 0. bitwise_not(i8): -> i8 1. bitwise_not(i16): -> i16 2. bitwise_not(i32): -> i32 3. bitwise_not(i64): -> i64

    *Return the bitwise NOT result for one integer input. *

    "},{"location":"extensions/functions_arithmetic/#bitwise_and","title":"bitwise_and","text":"

    Implementations: bitwise_and(x, y): -> return_type 0. bitwise_and(i8, i8): -> i8 1. bitwise_and(i16, i16): -> i16 2. bitwise_and(i32, i32): -> i32 3. bitwise_and(i64, i64): -> i64

    *Return the bitwise AND result for two integer inputs. *

    "},{"location":"extensions/functions_arithmetic/#bitwise_or","title":"bitwise_or","text":"

    Implementations: bitwise_or(x, y): -> return_type 0. bitwise_or(i8, i8): -> i8 1. bitwise_or(i16, i16): -> i16 2. bitwise_or(i32, i32): -> i32 3. bitwise_or(i64, i64): -> i64

    *Return the bitwise OR result for two given integer inputs. *

    "},{"location":"extensions/functions_arithmetic/#bitwise_xor","title":"bitwise_xor","text":"

    Implementations: bitwise_xor(x, y): -> return_type 0. bitwise_xor(i8, i8): -> i8 1. bitwise_xor(i16, i16): -> i16 2. bitwise_xor(i32, i32): -> i32 3. bitwise_xor(i64, i64): -> i64

    *Return the bitwise XOR result for two integer inputs. *

    "},{"location":"extensions/functions_arithmetic/#aggregate-functions","title":"Aggregate Functions","text":""},{"location":"extensions/functions_arithmetic/#sum","title":"sum","text":"

    Implementations: sum(x, option:overflow): -> return_type 0. sum(i8, option:overflow): -> i64? 1. sum(i16, option:overflow): -> i64? 2. sum(i32, option:overflow): -> i64? 3. sum(i64, option:overflow): -> i64? 4. sum(fp32, option:overflow): -> fp64? 5. sum(fp64, option:overflow): -> fp64?

    Sum a set of values. The sum of zero elements yields null.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic/#sum0","title":"sum0","text":"

    Implementations: sum0(x, option:overflow): -> return_type 0. sum0(i8, option:overflow): -> i64 1. sum0(i16, option:overflow): -> i64 2. sum0(i32, option:overflow): -> i64 3. sum0(i64, option:overflow): -> i64 4. sum0(fp32, option:overflow): -> fp64 5. sum0(fp64, option:overflow): -> fp64

    *Sum a set of values. The sum of zero elements yields zero. Null values are ignored. *

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic/#avg","title":"avg","text":"

    Implementations: avg(x, option:overflow): -> return_type 0. avg(i8, option:overflow): -> i8? 1. avg(i16, option:overflow): -> i16? 2. avg(i32, option:overflow): -> i32? 3. avg(i64, option:overflow): -> i64? 4. avg(fp32, option:overflow): -> fp32? 5. avg(fp64, option:overflow): -> fp64?

    Average a set of values. For integral types, this truncates partial values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic/#min","title":"min","text":"

    Implementations: min(x): -> return_type 0. min(i8): -> i8? 1. min(i16): -> i16? 2. min(i32): -> i32? 3. min(i64): -> i64? 4. min(fp32): -> fp32? 5. min(fp64): -> fp64? 6. min(timestamp): -> timestamp? 7. min(timestamp_tz): -> timestamp_tz?

    Min a set of values.

    "},{"location":"extensions/functions_arithmetic/#max","title":"max","text":"

    Implementations: max(x): -> return_type 0. max(i8): -> i8? 1. max(i16): -> i16? 2. max(i32): -> i32? 3. max(i64): -> i64? 4. max(fp32): -> fp32? 5. max(fp64): -> fp64? 6. max(timestamp): -> timestamp? 7. max(timestamp_tz): -> timestamp_tz?

    Max a set of values.

    "},{"location":"extensions/functions_arithmetic/#product","title":"product","text":"

    Implementations: product(x, option:overflow): -> return_type 0. product(i8, option:overflow): -> i8 1. product(i16, option:overflow): -> i16 2. product(i32, option:overflow): -> i32 3. product(i64, option:overflow): -> i64 4. product(fp32, option:rounding): -> fp32 5. product(fp64, option:rounding): -> fp64

    Product of a set of values. Returns 1 for empty input.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#std_dev","title":"std_dev","text":"

    Implementations: std_dev(x, option:rounding, option:distribution): -> return_type 0. std_dev(fp32, option:rounding, option:distribution): -> fp32? 1. std_dev(fp64, option:rounding, option:distribution): -> fp64?

    Calculates standard-deviation for a set of values.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • distribution ['SAMPLE', 'POPULATION']
  • "},{"location":"extensions/functions_arithmetic/#variance","title":"variance","text":"

    Implementations: variance(x, option:rounding, option:distribution): -> return_type 0. variance(fp32, option:rounding, option:distribution): -> fp32? 1. variance(fp64, option:rounding, option:distribution): -> fp64?

    Calculates variance for a set of values.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • distribution ['SAMPLE', 'POPULATION']
  • "},{"location":"extensions/functions_arithmetic/#corr","title":"corr","text":"

    Implementations: corr(x, y, option:rounding): -> return_type 0. corr(fp32, fp32, option:rounding): -> fp32? 1. corr(fp64, fp64, option:rounding): -> fp64?

    *Calculates the value of Pearson\u2019s correlation coefficient between x and y. If there is no input, null is returned. *

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#mode","title":"mode","text":"

    Implementations: mode(x): -> return_type 0. mode(i8): -> i8? 1. mode(i16): -> i16? 2. mode(i32): -> i32? 3. mode(i64): -> i64? 4. mode(fp32): -> fp32? 5. mode(fp64): -> fp64?

    *Calculates mode for a set of values. If there is no input, null is returned. *

    "},{"location":"extensions/functions_arithmetic/#median","title":"median","text":"

    Implementations: median(precision, x, option:rounding): -> return_type 0. median(precision, i8, option:rounding): -> i8? 1. median(precision, i16, option:rounding): -> i16? 2. median(precision, i32, option:rounding): -> i32? 3. median(precision, i64, option:rounding): -> i64? 4. median(precision, fp32, option:rounding): -> fp32? 5. median(precision, fp64, option:rounding): -> fp64?

    *Calculate the median for a set of values. Returns null if applied to zero records. For the integer implementations, the rounding option determines how the median should be rounded if it ends up midway between two values. For the floating point implementations, it specifies the usual floating point rounding mode. *

    Options:
  • precision ['EXACT', 'APPROXIMATE']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#quantile","title":"quantile","text":"

    Implementations: quantile(boundaries, precision, n, distribution, option:rounding): -> return_type

  • n: A positive integer which defines the number of quantile partitions.
  • distribution: The data for which the quantiles should be computed.
  • 0. quantile(boundaries, precision, i64, any, option:rounding): -> LIST?<any>

    *Calculates quantiles for a set of values. This function will divide the aggregated values (passed via the distribution argument) over N equally-sized bins, where N is passed via a constant argument. It will then return the values at the boundaries of these bins in list form. If the input is appropriately sorted, this computes the quantiles of the distribution. The function can optionally return the first and/or last element of the input, as specified by the boundaries argument. If the input is appropriately sorted, this will thus be the minimum and/or maximum values of the distribution. When the boundaries do not lie exactly on elements of the incoming distribution, the function will interpolate between the two nearby elements. If the interpolated value cannot be represented exactly, the rounding option controls how the value should be selected or computed. The function fails and returns null in the following cases: - n is null or less than one; - any value in distribution is null.

    The function returns an empty list if n equals 1 and boundaries is set to NEITHER. *

    Options:
  • boundaries ['NEITHER', 'MINIMUM', 'MAXIMUM', 'BOTH']
  • precision ['EXACT', 'APPROXIMATE']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#window-functions","title":"Window Functions","text":""},{"location":"extensions/functions_arithmetic/#row_number","title":"row_number","text":"

    Implementations: 0. row_number(): -> i64?

    the number of the current row within its partition.

    "},{"location":"extensions/functions_arithmetic/#rank","title":"rank","text":"

    Implementations: 0. rank(): -> i64?

    the rank of the current row, with gaps.

    "},{"location":"extensions/functions_arithmetic/#dense_rank","title":"dense_rank","text":"

    Implementations: 0. dense_rank(): -> i64?

    the rank of the current row, without gaps.

    "},{"location":"extensions/functions_arithmetic/#percent_rank","title":"percent_rank","text":"

    Implementations: 0. percent_rank(): -> fp64?

    the relative rank of the current row.

    "},{"location":"extensions/functions_arithmetic/#cume_dist","title":"cume_dist","text":"

    Implementations: 0. cume_dist(): -> fp64?

    the cumulative distribution.

    "},{"location":"extensions/functions_arithmetic/#ntile","title":"ntile","text":"

    Implementations: ntile(x): -> return_type 0. ntile(i32): -> i32? 1. ntile(i64): -> i64?

    Return an integer ranging from 1 to the argument value, dividing the partition as equally as possible.

    "},{"location":"extensions/functions_arithmetic/#first_value","title":"first_value","text":"

    Implementations: first_value(expression): -> return_type 0. first_value(any1): -> any1

    *Returns the first value in the window. *

    "},{"location":"extensions/functions_arithmetic/#last_value","title":"last_value","text":"

    Implementations: last_value(expression): -> return_type 0. last_value(any1): -> any1

    *Returns the last value in the window. *

    "},{"location":"extensions/functions_arithmetic/#nth_value","title":"nth_value","text":"

    Implementations: nth_value(expression, window_offset, option:on_domain_error): -> return_type 0. nth_value(any1, i32, option:on_domain_error): -> any1?

    *Returns a value from the nth row based on the window_offset. window_offset should be a positive integer. If the value of the window_offset is outside the range of the window, null is returned. The on_domain_error option governs behavior in cases where window_offset is not a positive integer or null. *

    Options:
  • on_domain_error ['NAN', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic/#lead","title":"lead","text":"

    Implementations: lead(expression): -> return_type 0. lead(any1): -> any1? 1. lead(any1, i32): -> any1? 2. lead(any1, i32, any1): -> any1?

    *Return a value from a following row based on a specified physical offset. This allows you to compare a value in the current row against a following row. The expression is evaluated against a row that comes after the current row based on the row_offset. The row_offset should be a positive integer and is set to 1 if not specified explicitly. If the row_offset is negative, the expression will be evaluated against a row coming before the current row, similar to the lag function. A row_offset of null will return null. The function returns the default input value if row_offset goes beyond the scope of the window. If a default value is not specified, it is set to null. Example comparing the sales of the current year to the following year. row_offset of 1. | year | sales | next_year_sales | | 2019 | 20.50 | 30.00 | | 2020 | 30.00 | 45.99 | | 2021 | 45.99 | null | *

    "},{"location":"extensions/functions_arithmetic/#lag","title":"lag","text":"

    Implementations: lag(expression): -> return_type 0. lag(any1): -> any1? 1. lag(any1, i32): -> any1? 2. lag(any1, i32, any1): -> any1?

    *Return a column value from a previous row based on a specified physical offset. This allows you to compare a value in the current row against a previous row. The expression is evaluated against a row that comes before the current row based on the row_offset. The expression can be a column, expression or subquery that evaluates to a single value. The row_offset should be a positive integer and is set to 1 if not specified explicitly. If the row_offset is negative, the expression will be evaluated against a row coming after the current row, similar to the lead function. A row_offset of null will return null. The function returns the default input value if row_offset goes beyond the scope of the partition. If a default value is not specified, it is set to null. Example comparing the sales of the current year to the previous year. row_offset of 1. | year | sales | previous_year_sales | | 2019 | 20.50 | null | | 2020 | 30.00 | 20.50 | | 2021 | 45.99 | 30.00 | *

    "},{"location":"extensions/functions_arithmetic_decimal/","title":"functions_arithmetic_decimal.yaml","text":"

    This document file is generated for functions_arithmetic_decimal.yaml

    "},{"location":"extensions/functions_arithmetic_decimal/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_arithmetic_decimal/#add","title":"add","text":"

    Implementations: add(x, y, option:overflow): -> return_type 0. add(decimal<P1,S1>, decimal<P2,S2>, option:overflow): ->

    init_scale = max(S1,S2)\ninit_prec = init_scale + max(P1 - S1, P2 - S2) + 1\nmin_scale = min(init_scale, 6)\ndelta = init_prec - 38\nprec = min(init_prec, 38)\nscale_after_borrow = max(init_scale - delta, min_scale)\nscale = init_prec > 38 ? scale_after_borrow : init_scale\nDECIMAL<prec, scale>  \n

    Add two decimal values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic_decimal/#subtract","title":"subtract","text":"

    Implementations: subtract(x, y, option:overflow): -> return_type 0. subtract(decimal<P1,S1>, decimal<P2,S2>, option:overflow): ->

    init_scale = max(S1,S2)\ninit_prec = init_scale + max(P1 - S1, P2 - S2) + 1\nmin_scale = min(init_scale, 6)\ndelta = init_prec - 38\nprec = min(init_prec, 38)\nscale_after_borrow = max(init_scale - delta, min_scale)\nscale = init_prec > 38 ? scale_after_borrow : init_scale\nDECIMAL<prec, scale>  \n

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic_decimal/#multiply","title":"multiply","text":"

    Implementations: multiply(x, y, option:overflow): -> return_type 0. multiply(decimal<P1,S1>, decimal<P2,S2>, option:overflow): ->

    init_scale = S1 + S2\ninit_prec = P1 + P2 + 1\nmin_scale = min(init_scale, 6)\ndelta = init_prec - 38\nprec = min(init_prec, 38)\nscale_after_borrow = max(init_scale - delta, min_scale)\nscale = init_prec > 38 ? scale_after_borrow : init_scale\nDECIMAL<prec, scale>  \n

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic_decimal/#divide","title":"divide","text":"

    Implementations: divide(x, y, option:overflow): -> return_type 0. divide(decimal<P1,S1>, decimal<P2,S2>, option:overflow): ->

    init_scale = max(6, S1 + P2 + 1)\ninit_prec = P1 - S1 + P2 + init_scale\nmin_scale = min(init_scale, 6)\ndelta = init_prec - 38\nprec = min(init_prec, 38)\nscale_after_borrow = max(init_scale - delta, min_scale)\nscale = init_prec > 38 ? scale_after_borrow : init_scale\nDECIMAL<prec, scale>  \n

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic_decimal/#modulus","title":"modulus","text":"

    Implementations: modulus(x, y, option:overflow): -> return_type 0. modulus(decimal<P1,S1>, decimal<P2,S2>, option:overflow): ->

    init_scale = max(S1,S2)\ninit_prec = min(P1 - S1, P2 - S2) + init_scale\nmin_scale = min(init_scale, 6)\ndelta = init_prec - 38\nprec = min(init_prec, 38)\nscale_after_borrow = max(init_scale - delta, min_scale)\nscale = init_prec > 38 ? scale_after_borrow : init_scale\nDECIMAL<prec, scale>  \n

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic_decimal/#aggregate-functions","title":"Aggregate Functions","text":""},{"location":"extensions/functions_arithmetic_decimal/#sum","title":"sum","text":"

    Implementations: sum(x, option:overflow): -> return_type 0. sum(DECIMAL<P, S>, option:overflow): -> DECIMAL?<38,S>

    Sum a set of values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic_decimal/#avg","title":"avg","text":"

    Implementations: avg(x, option:overflow): -> return_type 0. avg(DECIMAL<P,S>, option:overflow): -> DECIMAL<38,S>

    Average a set of values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic_decimal/#min","title":"min","text":"

    Implementations: min(x): -> return_type 0. min(DECIMAL<P, S>): -> DECIMAL?<P, S>

    Min a set of values.

    "},{"location":"extensions/functions_arithmetic_decimal/#max","title":"max","text":"

    Implementations: max(x): -> return_type 0. max(DECIMAL<P,S>): -> DECIMAL?<P, S>

    Max a set of values.

    "},{"location":"extensions/functions_arithmetic_decimal/#sum0","title":"sum0","text":"

    Implementations: sum0(x, option:overflow): -> return_type 0. sum0(DECIMAL<P, S>, option:overflow): -> DECIMAL<38,S>

    *Sum a set of values. The sum of zero elements yields zero. Null values are ignored. *

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_boolean/","title":"functions_boolean.yaml","text":"

    This document file is generated for functions_boolean.yaml

    "},{"location":"extensions/functions_boolean/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_boolean/#or","title":"or","text":"

    Implementations: or(a): -> return_type 0. or(boolean?): -> boolean?

    *The boolean or using Kleene logic. This function behaves as follows with nulls:

    true or null = true\n\nnull or true = true\n\nfalse or null = null\n\nnull or false = null\n\nnull or null = null\n

    In other words, in this context a null value really means \u201cunknown\u201d, and an unknown value or true is always true. Behavior for 0 or 1 inputs is as follows: or() -> false or(x) -> x *

    "},{"location":"extensions/functions_boolean/#and","title":"and","text":"

    Implementations: and(a): -> return_type 0. and(boolean?): -> boolean?

    *The boolean and using Kleene logic. This function behaves as follows with nulls:

    true and null = null\n\nnull and true = null\n\nfalse and null = false\n\nnull and false = false\n\nnull and null = null\n

    In other words, in this context a null value really means \u201cunknown\u201d, and an unknown value and false is always false. Behavior for 0 or 1 inputs is as follows: and() -> true and(x) -> x *

    "},{"location":"extensions/functions_boolean/#and_not","title":"and_not","text":"

    Implementations: and_not(a, b): -> return_type 0. and_not(boolean?, boolean?): -> boolean?

    *The boolean and of one value and the negation of the other using Kleene logic. This function behaves as follows with nulls:

    true and not null = null\n\nnull and not false = null\n\nfalse and not null = false\n\nnull and not true = false\n\nnull and not null = null\n

    In other words, in this context a null value really means \u201cunknown\u201d, and an unknown value and not true is always false, as is false and not an unknown value. *

    "},{"location":"extensions/functions_boolean/#xor","title":"xor","text":"

    Implementations: xor(a, b): -> return_type 0. xor(boolean?, boolean?): -> boolean?

    *The boolean xor of two values using Kleene logic. When a null is encountered in either input, a null is output. *

    "},{"location":"extensions/functions_boolean/#not","title":"not","text":"

    Implementations: not(a): -> return_type 0. not(boolean?): -> boolean?

    *The not of a boolean value. When a null is input, a null is output. *

    "},{"location":"extensions/functions_boolean/#aggregate-functions","title":"Aggregate Functions","text":""},{"location":"extensions/functions_boolean/#bool_and","title":"bool_and","text":"

    Implementations: bool_and(a): -> return_type 0. bool_and(boolean): -> boolean?

    *If any value in the input is false, false is returned. If the input is empty or only contains nulls, null is returned. Otherwise, true is returned. *

    "},{"location":"extensions/functions_boolean/#bool_or","title":"bool_or","text":"

    Implementations: bool_or(a): -> return_type 0. bool_or(boolean): -> boolean?

    *If any value in the input is true, true is returned. If the input is empty or only contains nulls, null is returned. Otherwise, false is returned. *
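A minimal sketch of the `bool_and` / `bool_or` aggregate semantics, again using `None` for null; the nulls are skipped, and an empty or all-null input yields null:

```python
def bool_and(values):
    # nulls are ignored; empty-or-all-null input returns null (None)
    vals = [v for v in values if v is not None]
    if not vals:
        return None
    return all(vals)  # False if any value is false, else True

def bool_or(values):
    vals = [v for v in values if v is not None]
    if not vals:
        return None
    return any(vals)  # True if any value is true, else False
```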

    "},{"location":"extensions/functions_comparison/","title":"functions_comparison.yaml","text":"

    This document file is generated for functions_comparison.yaml

    "},{"location":"extensions/functions_comparison/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_comparison/#not_equal","title":"not_equal","text":"

    Implementations: not_equal(x, y): -> return_type 0. not_equal(any1, any1): -> boolean

    *Whether two values are not_equal. not_equal(x, y) := (x != y) If either/both of x and y are null, null is returned. *

    "},{"location":"extensions/functions_comparison/#equal","title":"equal","text":"

    Implementations: equal(x, y): -> return_type 0. equal(any1, any1): -> boolean

    *Whether two values are equal. equal(x, y) := (x == y) If either/both of x and y are null, null is returned. *

    "},{"location":"extensions/functions_comparison/#is_not_distinct_from","title":"is_not_distinct_from","text":"

    Implementations: is_not_distinct_from(x, y): -> return_type 0. is_not_distinct_from(any1, any1): -> boolean

    *Whether two values are equal. This function treats null values as comparable, so is_not_distinct_from(null, null) == True This is in contrast to equal, in which null values do not compare. *
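The difference from `equal` is only in how nulls compare; a sketch with `None` as null:

```python
def is_not_distinct_from(x, y):
    # null-safe equality: two nulls ARE considered equal here
    if x is None and y is None:
        return True
    # one null, one non-null: distinct
    if x is None or y is None:
        return False
    return x == y
```

By contrast, `equal(null, anything)` returns null rather than a boolean.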

    "},{"location":"extensions/functions_comparison/#lt","title":"lt","text":"

    Implementations: lt(x, y): -> return_type 0. lt(any1, any1): -> boolean

    *Less than. lt(x, y) := (x < y) If either/both of x and y are null, null is returned. *

    "},{"location":"extensions/functions_comparison/#gt","title":"gt","text":"

    Implementations: gt(x, y): -> return_type 0. gt(any1, any1): -> boolean

    *Greater than. gt(x, y) := (x > y) If either/both of x and y are null, null is returned. *

    "},{"location":"extensions/functions_comparison/#lte","title":"lte","text":"

    Implementations: lte(x, y): -> return_type 0. lte(any1, any1): -> boolean

    *Less than or equal to. lte(x, y) := (x <= y) If either/both of x and y are null, null is returned. *

    "},{"location":"extensions/functions_comparison/#gte","title":"gte","text":"

    Implementations: gte(x, y): -> return_type 0. gte(any1, any1): -> boolean

    *Greater than or equal to. gte(x, y) := (x >= y) If either/both of x and y are null, null is returned. *

    "},{"location":"extensions/functions_comparison/#between","title":"between","text":"

    Implementations: between(expression, low, high): -> return_type

  • expression: The expression to test for in the range defined by `low` and `high`.
  • low: The value to check if greater than or equal to.
  • high: The value to check if less than or equal to.
  • 0. between(any1, any1, any1): -> boolean

    Whether the expression is greater than or equal to low and less than or equal to high. expression BETWEEN low AND high If low, high, or expression are null, null is returned.
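The null-propagation rule means `between` is a three-valued predicate; a sketch with `None` as null:

```python
def between(expression, low, high):
    # any null input makes the result null
    if expression is None or low is None or high is None:
        return None
    # inclusive on both ends: low <= expression <= high
    return low <= expression <= high
```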

    "},{"location":"extensions/functions_comparison/#is_null","title":"is_null","text":"

    Implementations: is_null(x): -> return_type 0. is_null(any1): -> boolean

    Whether a value is null. NaN is not null.

    "},{"location":"extensions/functions_comparison/#is_not_null","title":"is_not_null","text":"

    Implementations: is_not_null(x): -> return_type 0. is_not_null(any1): -> boolean

    Whether a value is not null. NaN is not null.

    "},{"location":"extensions/functions_comparison/#is_nan","title":"is_nan","text":"

    Implementations: is_nan(x): -> return_type 0. is_nan(fp32): -> boolean 1. is_nan(fp64): -> boolean

    *Whether a value is not a number. If x is null, null is returned. *

    "},{"location":"extensions/functions_comparison/#is_finite","title":"is_finite","text":"

    Implementations: is_finite(x): -> return_type 0. is_finite(fp32): -> boolean 1. is_finite(fp64): -> boolean

    *Whether a value is finite (neither infinite nor NaN). If x is null, null is returned. *

    "},{"location":"extensions/functions_comparison/#is_infinite","title":"is_infinite","text":"

    Implementations: is_infinite(x): -> return_type 0. is_infinite(fp32): -> boolean 1. is_infinite(fp64): -> boolean

    *Whether a value is infinite. If x is null, null is returned. *

    "},{"location":"extensions/functions_comparison/#nullif","title":"nullif","text":"

    Implementations: nullif(x, y): -> return_type 0. nullif(any1, any1): -> any1

    If two values are equal, return null. Otherwise, return the first value.

    "},{"location":"extensions/functions_comparison/#coalesce","title":"coalesce","text":"

    Implementations: 0. coalesce(any1, any1): -> any1

    Evaluate arguments from left to right and return the first argument that is not null. Once a non-null argument is found, the remaining arguments are not evaluated. If all arguments are null, return null.
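The left-to-right, short-circuiting behavior can be sketched as follows (in an engine, later arguments would not even be evaluated once a non-null is found; plain Python arguments are evaluated eagerly, so this only models the selection, not the laziness):

```python
def coalesce(*args):
    # return the first non-null argument, or null if none exists
    for a in args:
        if a is not None:
            return a
    return None
```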

    "},{"location":"extensions/functions_comparison/#least","title":"least","text":"

    Implementations: 0. least(T, T): -> T

    Evaluates each argument and returns the smallest one. The function will return null if any argument evaluates to null.

    "},{"location":"extensions/functions_comparison/#least_skip_null","title":"least_skip_null","text":"

    Implementations: 0. least_skip_null(T, T): -> T

    Evaluates each argument and returns the smallest one. The function will return null only if all arguments evaluate to null.

    "},{"location":"extensions/functions_comparison/#greatest","title":"greatest","text":"

    Implementations: 0. greatest(T, T): -> T

    Evaluates each argument and returns the largest one. The function will return null if any argument evaluates to null.

    "},{"location":"extensions/functions_comparison/#greatest_skip_null","title":"greatest_skip_null","text":"

    Implementations: 0. greatest_skip_null(T, T): -> T

    Evaluates each argument and returns the largest one. The function will return null only if all arguments evaluate to null.
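The four variants differ only in how nulls are handled; a sketch with `None` as null (the spec signatures are binary `(T, T)`, but the variadic form below shows the same rule):

```python
def least(*args):
    # strict: any null poisons the result
    if any(a is None for a in args):
        return None
    return min(args)

def least_skip_null(*args):
    # lenient: nulls are ignored; all-null input returns null
    vals = [a for a in args if a is not None]
    return min(vals) if vals else None

def greatest(*args):
    if any(a is None for a in args):
        return None
    return max(args)

def greatest_skip_null(*args):
    vals = [a for a in args if a is not None]
    return max(vals) if vals else None
```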

    "},{"location":"extensions/functions_datetime/","title":"functions_datetime.yaml","text":"

    This document file is generated for functions_datetime.yaml

    "},{"location":"extensions/functions_datetime/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_datetime/#extract","title":"extract","text":"

    Implementations: extract(component, x, timezone): -> return_type

  • timezone: Timezone string from IANA tzdb.
  • 0. extract(component, timestamp_tz, string): -> i64 1. extract(component, precision_timestamp_tz<P1>, string): -> i64 2. extract(component, timestamp): -> i64 3. extract(component, precision_timestamp<P1>): -> i64 4. extract(component, date): -> i64 5. extract(component, time): -> i64 6. extract(component, indexing, timestamp_tz, string): -> i64 7. extract(component, indexing, precision_timestamp_tz<P1>, string): -> i64 8. extract(component, indexing, timestamp): -> i64 9. extract(component, indexing, precision_timestamp<P1>): -> i64 10. extract(component, indexing, date): -> i64

    Extract portion of a date/time value. * YEAR Return the year. * ISO_YEAR Return the ISO 8601 week-numbering year. First week of an ISO year has the majority (4 or more) of its days in January. * US_YEAR Return the US epidemiological year. First week of US epidemiological year has the majority (4 or more) of its days in January. Last week of US epidemiological year has the year\u2019s last Wednesday in it. US epidemiological week starts on Sunday. * QUARTER Return the number of the quarter within the year. January 1 through March 31 map to the first quarter, April 1 through June 30 map to the second quarter, etc. * MONTH Return the number of the month within the year. * DAY Return the number of the day within the month. * DAY_OF_YEAR Return the number of the day within the year. January 1 maps to the first day, February 1 maps to the thirty-second day, etc. * MONDAY_DAY_OF_WEEK Return the number of the day within the week, from Monday (first day) to Sunday (seventh day). * SUNDAY_DAY_OF_WEEK Return the number of the day within the week, from Sunday (first day) to Saturday (seventh day). * MONDAY_WEEK Return the number of the week within the year. First week starts on first Monday of January. * SUNDAY_WEEK Return the number of the week within the year. First week starts on first Sunday of January. * ISO_WEEK Return the number of the ISO week within the ISO year. First ISO week has the majority (4 or more) of its days in January. ISO week starts on Monday. * US_WEEK Return the number of the US week within the US year. First US week has the majority (4 or more) of its days in January. US week starts on Sunday. * HOUR Return the hour (0-23). * MINUTE Return the minute (0-59). * SECOND Return the second (0-59). * MILLISECOND Return number of milliseconds since the last full second. * MICROSECOND Return number of microseconds since the last full millisecond. * NANOSECOND Return number of nanoseconds since the last full microsecond. 
* SUBSECOND Return number of microseconds since the last full second of the given timestamp. * UNIX_TIME Return number of seconds that have elapsed since 1970-01-01 00:00:00 UTC, ignoring leap seconds. * TIMEZONE_OFFSET Return number of seconds of timezone offset to UTC. The range of values returned for QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK, SUNDAY_DAY_OF_WEEK, MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, and US_WEEK depends on whether counting starts at 1 or 0. This is governed by the indexing option. When indexing is ONE: * QUARTER returns values in range 1-4 * MONTH returns values in range 1-12 * DAY returns values in range 1-31 * DAY_OF_YEAR returns values in range 1-366 * MONDAY_DAY_OF_WEEK and SUNDAY_DAY_OF_WEEK return values in range 1-7 * MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, and US_WEEK return values in range 1-53 When indexing is ZERO: * QUARTER returns values in range 0-3 * MONTH returns values in range 0-11 * DAY returns values in range 0-30 * DAY_OF_YEAR returns values in range 0-365 * MONDAY_DAY_OF_WEEK and SUNDAY_DAY_OF_WEEK return values in range 0-6 * MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, and US_WEEK return values in range 0-52 The indexing option must be specified when the component is QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK, SUNDAY_DAY_OF_WEEK, MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, or US_WEEK. The indexing option cannot be specified when the component is YEAR, ISO_YEAR, US_YEAR, HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND, SUBSECOND, UNIX_TIME, or TIMEZONE_OFFSET. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: \u201cPacific/Marquesas\u201d, \u201cEtc/GMT+1\u201d. If timezone is invalid an error is thrown.
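A few of the components and the effect of the indexing option can be sketched with Python's `datetime` (only a handful of components are shown; the `extract` name mirrors the function above but this is not the spec's implementation):

```python
from datetime import date

def extract(component, value, indexing="ONE"):
    # indexing governs whether counting starts at 1 (ONE) or 0 (ZERO)
    offset = 0 if indexing == "ONE" else -1
    if component == "MONTH":
        return value.month + offset            # 1-12 or 0-11
    if component == "DAY":
        return value.day + offset              # 1-31 or 0-30
    if component == "QUARTER":
        return (value.month - 1) // 3 + 1 + offset  # 1-4 or 0-3
    if component == "DAY_OF_YEAR":
        return value.timetuple().tm_yday + offset   # 1-366 or 0-365
    if component == "YEAR":
        return value.year                      # indexing not applicable
    raise ValueError(f"component not sketched: {component}")
```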

    Options:
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'SUBSECOND', 'UNIX_TIME', 'TIMEZONE_OFFSET']
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'NANOSECOND', 'SUBSECOND', 'UNIX_TIME', 'TIMEZONE_OFFSET']
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'SUBSECOND', 'UNIX_TIME']
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'NANOSECOND', 'SUBSECOND', 'UNIX_TIME']
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'UNIX_TIME']
  • component ['HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'SUBSECOND']
  • component ['QUARTER', 'MONTH', 'DAY', 'DAY_OF_YEAR', 'MONDAY_DAY_OF_WEEK', 'SUNDAY_DAY_OF_WEEK', 'MONDAY_WEEK', 'SUNDAY_WEEK', 'ISO_WEEK', 'US_WEEK']
  • indexing ['ONE', 'ZERO']
  • "},{"location":"extensions/functions_datetime/#extract_boolean","title":"extract_boolean","text":"

    Implementations: extract_boolean(component, x): -> return_type 0. extract_boolean(component, timestamp): -> boolean 1. extract_boolean(component, timestamp_tz, string): -> boolean 2. extract_boolean(component, date): -> boolean

    *Extract boolean values of a date/time value. * IS_LEAP_YEAR Return true if year of the given value is a leap year and false otherwise. * IS_DST Return true if DST (Daylight Savings Time) is observed at the given value in the given timezone.

    Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: \u201cPacific/Marquesas\u201d, \u201cEtc/GMT+1\u201d. If timezone is invalid an error is thrown.*

    Options:
  • component ['IS_LEAP_YEAR']
  • component ['IS_LEAP_YEAR', 'IS_DST']
  • "},{"location":"extensions/functions_datetime/#add","title":"add","text":"

    Implementations: add(x, y): -> return_type 0. add(timestamp, interval_year): -> timestamp 1. add(timestamp_tz, interval_year, string): -> timestamp_tz 2. add(date, interval_year): -> timestamp 3. add(timestamp, interval_day): -> timestamp 4. add(timestamp_tz, interval_day): -> timestamp_tz 5. add(date, interval_day): -> timestamp

    Add an interval to a date/time type. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: \u201cPacific/Marquesas\u201d, \u201cEtc/GMT+1\u201d. If timezone is invalid an error is thrown.

    "},{"location":"extensions/functions_datetime/#multiply","title":"multiply","text":"

    Implementations: multiply(x, y): -> return_type 0. multiply(i8, interval_day): -> interval_day 1. multiply(i16, interval_day): -> interval_day 2. multiply(i32, interval_day): -> interval_day 3. multiply(i64, interval_day): -> interval_day 4. multiply(i8, interval_year): -> interval_year 5. multiply(i16, interval_year): -> interval_year 6. multiply(i32, interval_year): -> interval_year 7. multiply(i64, interval_year): -> interval_year

    Multiply an interval by an integral number.

    "},{"location":"extensions/functions_datetime/#add_intervals","title":"add_intervals","text":"

    Implementations: add_intervals(x, y): -> return_type 0. add_intervals(interval_day, interval_day): -> interval_day 1. add_intervals(interval_year, interval_year): -> interval_year

    Add two intervals together.

    "},{"location":"extensions/functions_datetime/#subtract","title":"subtract","text":"

    Implementations: subtract(x, y): -> return_type 0. subtract(timestamp, interval_year): -> timestamp 1. subtract(timestamp_tz, interval_year): -> timestamp_tz 2. subtract(timestamp_tz, interval_year, string): -> timestamp_tz 3. subtract(date, interval_year): -> date 4. subtract(timestamp, interval_day): -> timestamp 5. subtract(timestamp_tz, interval_day): -> timestamp_tz 6. subtract(date, interval_day): -> date

    Subtract an interval from a date/time type. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: \u201cPacific/Marquesas\u201d, \u201cEtc/GMT+1\u201d. If timezone is invalid an error is thrown.

    "},{"location":"extensions/functions_datetime/#lte","title":"lte","text":"

    Implementations: lte(x, y): -> return_type 0. lte(timestamp, timestamp): -> boolean 1. lte(timestamp_tz, timestamp_tz): -> boolean 2. lte(date, date): -> boolean 3. lte(interval_day, interval_day): -> boolean 4. lte(interval_year, interval_year): -> boolean

    Less than or equal to.

    "},{"location":"extensions/functions_datetime/#lt","title":"lt","text":"

    Implementations: lt(x, y): -> return_type 0. lt(timestamp, timestamp): -> boolean 1. lt(timestamp_tz, timestamp_tz): -> boolean 2. lt(date, date): -> boolean 3. lt(interval_day, interval_day): -> boolean 4. lt(interval_year, interval_year): -> boolean

    Less than.

    "},{"location":"extensions/functions_datetime/#gte","title":"gte","text":"

    Implementations: gte(x, y): -> return_type 0. gte(timestamp, timestamp): -> boolean 1. gte(timestamp_tz, timestamp_tz): -> boolean 2. gte(date, date): -> boolean 3. gte(interval_day, interval_day): -> boolean 4. gte(interval_year, interval_year): -> boolean

    Greater than or equal to.

    "},{"location":"extensions/functions_datetime/#gt","title":"gt","text":"

    Implementations: gt(x, y): -> return_type 0. gt(timestamp, timestamp): -> boolean 1. gt(timestamp_tz, timestamp_tz): -> boolean 2. gt(date, date): -> boolean 3. gt(interval_day, interval_day): -> boolean 4. gt(interval_year, interval_year): -> boolean

    Greater than.

    "},{"location":"extensions/functions_datetime/#assume_timezone","title":"assume_timezone","text":"

    Implementations: assume_timezone(x, timezone): -> return_type

  • timezone: Timezone string from IANA tzdb.
  • 0. assume_timezone(timestamp, string): -> timestamp_tz 1. assume_timezone(date, string): -> timestamp_tz

    Convert local timestamp to UTC-relative timestamp_tz using given local time\u2019s timezone. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: \u201cPacific/Marquesas\u201d, \u201cEtc/GMT+1\u201d. If timezone is invalid an error is thrown.
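The conversion can be sketched with Python's `zoneinfo` (which reads the IANA tzdb; assumes tz data is installed on the host). A naive local timestamp is interpreted in the given zone, then normalized to a UTC-relative instant:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

def assume_timezone(local_ts: datetime, tz_name: str) -> datetime:
    # invalid IANA timezone names raise an error, matching the spec text
    try:
        tz = ZoneInfo(tz_name)
    except ZoneInfoNotFoundError:
        raise ValueError(f"invalid timezone: {tz_name}")
    # attach the zone to the naive local time, then convert to UTC
    return local_ts.replace(tzinfo=tz).astimezone(timezone.utc)
```

Note that `local_timestamp` (below) is the inverse direction: UTC-relative instant back to naive local wall-clock time. In IANA naming, "Etc/GMT+1" is UTC-1 (the sign is inverted for POSIX compatibility), so noon there is 13:00 UTC.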

    "},{"location":"extensions/functions_datetime/#local_timestamp","title":"local_timestamp","text":"

    Implementations: local_timestamp(x, timezone): -> return_type

  • timezone: Timezone string from IANA tzdb.
  • 0. local_timestamp(timestamp_tz, string): -> timestamp

    Convert UTC-relative timestamp_tz to local timestamp using given local time\u2019s timezone. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: \u201cPacific/Marquesas\u201d, \u201cEtc/GMT+1\u201d. If timezone is invalid an error is thrown.

    "},{"location":"extensions/functions_datetime/#strptime_time","title":"strptime_time","text":"

    Implementations: strptime_time(time_string, format): -> return_type 0. strptime_time(string, string): -> time

    Parse string into time using provided format, see https://man7.org/linux/man-pages/man3/strptime.3.html for reference.

    "},{"location":"extensions/functions_datetime/#strptime_date","title":"strptime_date","text":"

    Implementations: strptime_date(date_string, format): -> return_type 0. strptime_date(string, string): -> date

    Parse string into date using provided format, see https://man7.org/linux/man-pages/man3/strptime.3.html for reference.
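Python's `datetime.strptime` accepts the same C-library format codes the man page describes, so the behavior can be sketched directly (`strptime_time` is analogous, calling `.time()` instead of `.date()`):

```python
from datetime import datetime

def strptime_date(date_string: str, fmt: str):
    # parse with strptime(3)-style format codes, keep only the date part
    return datetime.strptime(date_string, fmt).date()
```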

    "},{"location":"extensions/functions_datetime/#strptime_timestamp","title":"strptime_timestamp","text":"

    Implementations: strptime_timestamp(timestamp_string, format, timezone): -> return_type

  • timezone: Timezone string from IANA tzdb.
  • 0. strptime_timestamp(string, string, string): -> timestamp_tz 1. strptime_timestamp(string, string): -> timestamp_tz

    Parse string into timestamp using provided format, see https://man7.org/linux/man-pages/man3/strptime.3.html for reference. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: \u201cPacific/Marquesas\u201d, \u201cEtc/GMT+1\u201d. If a timezone is both present in the parsed string and supplied as a parameter, the timezone parsed from the string is used. If the timezone supplied as a parameter is invalid, an error is thrown.

    "},{"location":"extensions/functions_datetime/#strftime","title":"strftime","text":"

    Implementations: strftime(x, format): -> return_type 0. strftime(timestamp, string): -> string 1. strftime(timestamp_tz, string, string): -> string 2. strftime(date, string): -> string 3. strftime(time, string): -> string

    Convert timestamp/date/time to string using provided format, see https://man7.org/linux/man-pages/man3/strftime.3.html for reference. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: \u201cPacific/Marquesas\u201d, \u201cEtc/GMT+1\u201d. If timezone is invalid an error is thrown.

    "},{"location":"extensions/functions_datetime/#round_temporal","title":"round_temporal","text":"

    Implementations: round_temporal(x, rounding, unit, multiple, origin): -> return_type 0. round_temporal(timestamp, rounding, unit, i64, timestamp): -> timestamp 1. round_temporal(timestamp_tz, rounding, unit, i64, string, timestamp_tz): -> timestamp_tz 2. round_temporal(date, rounding, unit, i64, date): -> date 3. round_temporal(time, rounding, unit, i64, time): -> time

    Round a given timestamp/date/time to a multiple of a time unit. If the given timestamp is not already an exact multiple from the origin in the given timezone, the resulting point is chosen as one of the two nearest multiples. Which of these is chosen is governed by rounding: FLOOR means to use the earlier one, CEIL means to use the later one, ROUND_TIE_DOWN means to choose the nearest and tie to the earlier one if equidistant, ROUND_TIE_UP means to choose the nearest and tie to the later one if equidistant. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: \u201cPacific/Marquesas\u201d, \u201cEtc/GMT+1\u201d. If timezone is invalid an error is thrown.
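The four rounding modes can be sketched on epoch seconds (the unit and multiple are folded into a single `multiple_s` in seconds for brevity; real implementations must also account for the timezone):

```python
def round_temporal(value_s, rounding, multiple_s, origin_s=0):
    # distance from origin, split into whole multiples and remainder
    q, r = divmod(value_s - origin_s, multiple_s)
    if r == 0:
        return value_s  # already an exact multiple
    lower = origin_s + q * multiple_s   # earlier multiple
    upper = lower + multiple_s          # later multiple
    if rounding == "FLOOR":
        return lower
    if rounding == "CEIL":
        return upper
    # ROUND_TIE_DOWN / ROUND_TIE_UP: nearest multiple, tie broken down/up
    if r * 2 < multiple_s:
        return lower
    if r * 2 > multiple_s:
        return upper
    return lower if rounding == "ROUND_TIE_DOWN" else upper
```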

    Options:
  • rounding ['FLOOR', 'CEIL', 'ROUND_TIE_DOWN', 'ROUND_TIE_UP']
  • unit ['YEAR', 'MONTH', 'WEEK', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
  • unit ['YEAR', 'MONTH', 'WEEK', 'DAY']
  • unit ['HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
  • "},{"location":"extensions/functions_datetime/#round_calendar","title":"round_calendar","text":"

    Implementations: round_calendar(x, rounding, unit, origin, multiple): -> return_type 0. round_calendar(timestamp, rounding, unit, origin, i64): -> timestamp 1. round_calendar(timestamp_tz, rounding, unit, origin, i64, string): -> timestamp_tz 2. round_calendar(date, rounding, unit, origin, i64, date): -> date 3. round_calendar(time, rounding, unit, origin, i64, time): -> time

    Round a given timestamp/date/time to a multiple of a time unit. If the given timestamp is not already an exact multiple from the last origin unit in the given timezone, the resulting point is chosen as one of the two nearest multiples. Which of these is chosen is governed by rounding: FLOOR means to use the earlier one, CEIL means to use the later one, ROUND_TIE_DOWN means to choose the nearest and tie to the earlier one if equidistant, ROUND_TIE_UP means to choose the nearest and tie to the later one if equidistant. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: \u201cPacific/Marquesas\u201d, \u201cEtc/GMT+1\u201d. If timezone is invalid an error is thrown.

    Options:
  • rounding ['FLOOR', 'CEIL', 'ROUND_TIE_DOWN', 'ROUND_TIE_UP']
  • unit ['YEAR', 'MONTH', 'WEEK', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
  • origin ['YEAR', 'MONTH', 'MONDAY_WEEK', 'SUNDAY_WEEK', 'ISO_WEEK', 'US_WEEK', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND']
  • unit ['YEAR', 'MONTH', 'WEEK', 'DAY']
  • origin ['YEAR', 'MONTH', 'MONDAY_WEEK', 'SUNDAY_WEEK', 'ISO_WEEK', 'US_WEEK', 'DAY']
  • unit ['DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
  • origin ['DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND']
  • "},{"location":"extensions/functions_datetime/#aggregate-functions","title":"Aggregate Functions","text":""},{"location":"extensions/functions_datetime/#min","title":"min","text":"

    Implementations: min(x): -> return_type 0. min(date): -> date? 1. min(time): -> time? 2. min(timestamp): -> timestamp? 3. min(timestamp_tz): -> timestamp_tz? 4. min(interval_day): -> interval_day? 5. min(interval_year): -> interval_year?

    Return the minimum of a set of values.

    "},{"location":"extensions/functions_datetime/#max","title":"max","text":"

    Implementations: max(x): -> return_type 0. max(date): -> date? 1. max(time): -> time? 2. max(timestamp): -> timestamp? 3. max(timestamp_tz): -> timestamp_tz? 4. max(interval_day): -> interval_day? 5. max(interval_year): -> interval_year?

    Return the maximum of a set of values.

    "},{"location":"extensions/functions_geometry/","title":"functions_geometry.yaml","text":"

    This document file is generated for functions_geometry.yaml

    "},{"location":"extensions/functions_geometry/#data-types","title":"Data Types","text":"

    name: geometry structure: BINARY

    "},{"location":"extensions/functions_geometry/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_geometry/#point","title":"point","text":"

    Implementations: point(x, y): -> return_type 0. point(fp64, fp64): -> u!geometry

    *Returns a 2D point with the given x and y coordinate values. *

    "},{"location":"extensions/functions_geometry/#make_line","title":"make_line","text":"

    Implementations: make_line(geom1, geom2): -> return_type 0. make_line(u!geometry, u!geometry): -> u!geometry

    *Returns a linestring connecting the end point of geometry geom1 to the start point of geometry geom2. Repeated points at the beginning of input geometries are collapsed to a single point. A linestring can be closed or simple. A closed linestring starts and ends on the same point. A simple linestring does not cross or touch itself. *

    "},{"location":"extensions/functions_geometry/#x_coordinate","title":"x_coordinate","text":"

    Implementations: x_coordinate(point): -> return_type 0. x_coordinate(u!geometry): -> fp64

    *Return the x coordinate of the point. Return null if not available. *

    "},{"location":"extensions/functions_geometry/#y_coordinate","title":"y_coordinate","text":"

    Implementations: y_coordinate(point): -> return_type 0. y_coordinate(u!geometry): -> fp64

    *Return the y coordinate of the point. Return null if not available. *

    "},{"location":"extensions/functions_geometry/#num_points","title":"num_points","text":"

    Implementations: num_points(geom): -> return_type 0. num_points(u!geometry): -> i64

    *Return the number of points in the geometry. The geometry should be a linestring or circularstring. *

    "},{"location":"extensions/functions_geometry/#is_empty","title":"is_empty","text":"

    Implementations: is_empty(geom): -> return_type 0. is_empty(u!geometry): -> boolean

    *Return true if the geometry is an empty geometry. *

    "},{"location":"extensions/functions_geometry/#is_closed","title":"is_closed","text":"

    Implementations: is_closed(geom): -> return_type 0. is_closed(geometry): -> boolean

    *Return true if the geometry\u2019s start and end points are the same. *

    "},{"location":"extensions/functions_geometry/#is_simple","title":"is_simple","text":"

    Implementations: is_simple(geom): -> return_type 0. is_simple(u!geometry): -> boolean

    *Return true if the geometry does not self intersect. *

    "},{"location":"extensions/functions_geometry/#is_ring","title":"is_ring","text":"

    Implementations: is_ring(geom): -> return_type 0. is_ring(u!geometry): -> boolean

    *Return true if the geometry\u2019s start and end points are the same and it does not self intersect. *

    "},{"location":"extensions/functions_geometry/#geometry_type","title":"geometry_type","text":"

    Implementations: geometry_type(geom): -> return_type 0. geometry_type(u!geometry): -> string

    *Return the type of geometry as a string. *

    "},{"location":"extensions/functions_geometry/#envelope","title":"envelope","text":"

    Implementations: envelope(geom): -> return_type 0. envelope(u!geometry): -> u!geometry

    *Return the minimum bounding box for the input geometry as a geometry. The returned geometry is defined by the corner points of the bounding box. If the input geometry is a point or a line, the returned geometry can also be a point or line. *
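For a point set, the envelope reduces to a min/max over each axis; a sketch operating on `(x, y)` tuples rather than the spec's opaque geometry type:

```python
def envelope(points):
    # corner points of the axis-aligned bounding box,
    # counter-clockwise from the lower-left corner
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [(min(xs), min(ys)), (max(xs), min(ys)),
            (max(xs), max(ys)), (min(xs), max(ys))]
```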

    "},{"location":"extensions/functions_geometry/#dimension","title":"dimension","text":"

    Implementations: dimension(geom): -> return_type 0. dimension(u!geometry): -> i8

    *Return the dimension of the input geometry. If the input is a collection of geometries, return the largest dimension from the collection. Dimensionality is determined by the complexity of the input and not the coordinate system being used. Type dimensions: POINT - 0 LINE - 1 POLYGON - 2 *

    "},{"location":"extensions/functions_geometry/#is_valid","title":"is_valid","text":"

    Implementations: is_valid(geom): -> return_type 0. is_valid(u!geometry): -> boolean

    *Return true if the input geometry is a valid 2D geometry. For 3 dimensional and 4 dimensional geometries, the validity is still only tested in 2 dimensions. *

    "},{"location":"extensions/functions_geometry/#collection_extract","title":"collection_extract","text":"

    Implementations: collection_extract(geom_collection): -> return_type 0. collection_extract(u!geometry): -> u!geometry 1. collection_extract(u!geometry, i8): -> u!geometry

    *Given the input geometry collection, return a homogeneous multi-geometry. All geometries in the multi-geometry will have the same dimension. If type is not specified, the multi-geometry will only contain geometries of the highest dimension. If type is specified, the multi-geometry will only contain geometries of that type. If there are no geometries of the specified type, an empty geometry is returned. Only points, linestrings, and polygons are supported. Type numbers: POINT - 0 LINE - 1 POLYGON - 2 *

    "},{"location":"extensions/functions_geometry/#flip_coordinates","title":"flip_coordinates","text":"

    Implementations: flip_coordinates(geom_collection): -> return_type 0. flip_coordinates(u!geometry): -> u!geometry

    *Return a version of the input geometry with the X and Y axis flipped. This operation can be performed on geometries with more than 2 dimensions. However, only X and Y axis will be flipped. *

    "},{"location":"extensions/functions_geometry/#remove_repeated_points","title":"remove_repeated_points","text":"

    Implementations: remove_repeated_points(geom): -> return_type 0. remove_repeated_points(u!geometry): -> u!geometry 1. remove_repeated_points(u!geometry, fp64): -> u!geometry

    *Return a version of the input geometry with duplicate consecutive points removed. If the tolerance argument is provided, consecutive points within the tolerance distance of one another are considered to be duplicates. *
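The tolerance-based deduplication can be sketched on a list of `(x, y)` tuples; with the default tolerance of zero, only exact consecutive duplicates are dropped:

```python
import math

def remove_repeated_points(points, tolerance=0.0):
    # drop each point whose distance to the last kept point is within tolerance
    out = []
    for p in points:
        if out and math.dist(out[-1], p) <= tolerance:
            continue
        out.append(p)
    return out
```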

    "},{"location":"extensions/functions_geometry/#buffer","title":"buffer","text":"

    Implementations: buffer(geom, buffer_radius): -> return_type 0. buffer(u!geometry, fp64): -> u!geometry

    *Compute and return an expanded version of the input geometry. All the points of the returned geometry are at a distance of buffer_radius away from the points of the input geometry. If a negative buffer_radius is provided, the geometry will shrink instead of expand. A negative buffer_radius may shrink the geometry completely, in which case an empty geometry is returned. For point or line input geometries, a negative buffer_radius will always return an empty geometry. *

    "},{"location":"extensions/functions_geometry/#centroid","title":"centroid","text":"

    Implementations: centroid(geom): -> return_type 0. centroid(u!geometry): -> u!geometry

    *Return a point which is the geometric center of mass of the input geometry. *

    "},{"location":"extensions/functions_geometry/#minimum_bounding_circle","title":"minimum_bounding_circle","text":"

    Implementations: minimum_bounding_circle(geom): -> return_type 0. minimum_bounding_circle(u!geometry): -> u!geometry

    *Return the smallest circle polygon that contains the input geometry. *

    "},{"location":"extensions/functions_logarithmic/","title":"functions_logarithmic.yaml","text":"

    This document file is generated for functions_logarithmic.yaml

    "},{"location":"extensions/functions_logarithmic/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_logarithmic/#ln","title":"ln","text":"

    Implementations: ln(x, option:rounding, option:on_domain_error, option:on_log_zero): -> return_type 0. ln(fp32, option:rounding, option:on_domain_error, option:on_log_zero): -> fp32 1. ln(fp64, option:rounding, option:on_domain_error, option:on_log_zero): -> fp64

    Natural logarithm of the value

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
  • "},{"location":"extensions/functions_logarithmic/#log10","title":"log10","text":"

    Implementations: log10(x, option:rounding, option:on_domain_error, option:on_log_zero): -> return_type 0. log10(fp32, option:rounding, option:on_domain_error, option:on_log_zero): -> fp32 1. log10(fp64, option:rounding, option:on_domain_error, option:on_log_zero): -> fp64

    Logarithm to base 10 of the value

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
  • "},{"location":"extensions/functions_logarithmic/#log2","title":"log2","text":"

    Implementations: log2(x, option:rounding, option:on_domain_error, option:on_log_zero): -> return_type 0. log2(fp32, option:rounding, option:on_domain_error, option:on_log_zero): -> fp32 1. log2(fp64, option:rounding, option:on_domain_error, option:on_log_zero): -> fp64

    Logarithm to base 2 of the value

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
  • "},{"location":"extensions/functions_logarithmic/#logb","title":"logb","text":"

    Implementations: logb(x, base, option:rounding, option:on_domain_error, option:on_log_zero): -> return_type

  • x: The number `x` to compute the logarithm of
  • base: The logarithm base `b` to use
  • 0. logb(fp32, fp32, option:rounding, option:on_domain_error, option:on_log_zero): -> fp32 1. logb(fp64, fp64, option:rounding, option:on_domain_error, option:on_log_zero): -> fp64

    *Logarithm of the value with the given base logb(x, b) => log_{b} (x) *
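The definition above is the change-of-base identity, which can be evaluated with two natural logarithms. A minimal Python illustration (not tied to any engine; the helper name is an assumption):

```python
import math

def logb(x, base):
    # logb(x, b) = log_{b}(x), computed via the change-of-base
    # identity: log_b(x) = ln(x) / ln(b).
    return math.log(x) / math.log(base)
```

For example, logb(8, 2) evaluates to 3 up to floating-point rounding.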

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
  • "},{"location":"extensions/functions_logarithmic/#log1p","title":"log1p","text":"

    Implementations: log1p(x, option:rounding, option:on_domain_error, option:on_log_zero): -> return_type 0. log1p(fp32, option:rounding, option:on_domain_error, option:on_log_zero): -> fp32 1. log1p(fp64, option:rounding, option:on_domain_error, option:on_log_zero): -> fp64

    *Natural logarithm (base e) of 1 + x log1p(x) => log(1+x) *
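The reason log1p exists as a separate function is numerical: for tiny x, evaluating log(1 + x) directly loses precision, because forming 1 + x in floating point rounds away most of x. A small Python illustration:

```python
import math

x = 1e-10
naive = math.log(1 + x)    # 1 + x rounds in float, losing digits of x
accurate = math.log1p(x)   # evaluates log(1 + x) without forming 1 + x
# For small x the true value is approximately x - x**2/2, so
# `accurate` lands much closer to x than `naive` does.
```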

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
  • "},{"location":"extensions/functions_rounding/","title":"functions_rounding.yaml","text":"

    This document file is generated for functions_rounding.yaml

    "},{"location":"extensions/functions_rounding/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_rounding/#ceil","title":"ceil","text":"

    Implementations: ceil(x): -> return_type 0. ceil(fp32): -> fp32 1. ceil(fp64): -> fp64

    *Rounding to the ceiling of the value x. *

    "},{"location":"extensions/functions_rounding/#floor","title":"floor","text":"

    Implementations: floor(x): -> return_type 0. floor(fp32): -> fp32 1. floor(fp64): -> fp64

    *Rounding to the floor of the value x. *

    "},{"location":"extensions/functions_rounding/#round","title":"round","text":"

    Implementations: round(x, s, option:rounding): -> return_type

  • x: Numerical expression to be rounded.
  • s: Number of decimal places to be rounded to. When `x` is an integer value and `s` is a positive number, nothing will happen. When `s` is a negative number, the rounding is performed to the nearest multiple of `10^(-s)`.
  • 0. round(i8, i32, option:rounding): -> i8? 1. round(i16, i32, option:rounding): -> i16? 2. round(i32, i32, option:rounding): -> i32? 3. round(i64, i32, option:rounding): -> i64? 4. round(fp32, i32, option:rounding): -> fp32? 5. round(fp64, i32, option:rounding): -> fp64?

    *Rounding the value x to s decimal places. *
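Python's built-in round happens to implement the TIE_TO_EVEN option (banker's rounding), including the negative-places case, so it can illustrate the semantics. This is only an illustration; an engine may be configured with any of the rounding options listed below.

```python
# TIE_TO_EVEN behavior, as implemented by Python's built-in round():
assert round(3.14159, 2) == 3.14   # ordinary rounding to 2 places
assert round(2.5) == 2             # tie goes to the even neighbor
assert round(1250, -2) == 1200     # negative s: nearest multiple of 10^2,
                                   # tie 1250 resolves to the even 1200
```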

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR', 'AWAY_FROM_ZERO', 'TIE_DOWN', 'TIE_UP', 'TIE_TOWARDS_ZERO', 'TIE_TO_ODD']
  • "},{"location":"extensions/functions_set/","title":"functions_set.yaml","text":"

    This document file is generated for functions_set.yaml

    "},{"location":"extensions/functions_set/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_set/#index_in","title":"index_in","text":"

    Implementations: index_in(x, y, option:nan_equality): -> return_type 0. index_in(T, List<T>, option:nan_equality): -> int64?

    *Checks the membership of a value in a list of values. Returns the first 0-based index value of some input T if T is equal to any element in List<T>. Returns NULL if not found. If T is NULL, returns NULL. If T is NaN: - Returns 0-based index of NaN in List<T> (default) - Returns NULL (if NAN_IS_NOT_NAN is specified) *
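The NULL and NaN rules can be modeled in a short Python sketch, using None for NULL. This is a hypothetical illustration of the semantics, not the Substrait implementation; the `nan_is_nan` flag stands in for the nan_equality option.

```python
import math

def index_in(x, values, nan_is_nan=True):
    # None models SQL NULL: a NULL input yields NULL.
    if x is None:
        return None
    if isinstance(x, float) and math.isnan(x):
        # NAN_IS_NOT_NAN: NaN never matches anything.
        if not nan_is_nan:
            return None
        # NAN_IS_NAN (default): NaN matches the first NaN element.
        for i, v in enumerate(values):
            if isinstance(v, float) and math.isnan(v):
                return i
        return None
    return values.index(x) if x in values else None
```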

    Options:
  • nan_equality ['NAN_IS_NAN', 'NAN_IS_NOT_NAN']
  • "},{"location":"extensions/functions_string/","title":"functions_string.yaml","text":"

    This document file is generated for functions_string.yaml

    "},{"location":"extensions/functions_string/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_string/#concat","title":"concat","text":"

    Implementations: concat(input, option:null_handling): -> return_type 0. concat(varchar<L1>, option:null_handling): -> varchar<L1> 1. concat(string, option:null_handling): -> string

    Concatenate strings. The null_handling option determines whether or not null values will be recognized by the function. If null_handling is set to IGNORE_NULLS, null value arguments will be ignored when strings are concatenated. If set to ACCEPT_NULLS, the result will be null if any argument passed to the concat function is null.

    Options:
  • null_handling ['IGNORE_NULLS', 'ACCEPT_NULLS']
  • "},{"location":"extensions/functions_string/#like","title":"like","text":"

    Implementations: like(input, match, option:case_sensitivity): -> return_type

  • input: The input string.
  • match: The string to match against the input string.
  • 0. like(varchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean 1. like(string, string, option:case_sensitivity): -> boolean

    Are two strings like each other. The case_sensitivity option applies to the match argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • "},{"location":"extensions/functions_string/#substring","title":"substring","text":"

    Implementations: substring(input, start, length, option:negative_start): -> return_type 0. substring(varchar<L1>, i32, i32, option:negative_start): -> varchar<L1> 1. substring(string, i32, i32, option:negative_start): -> string 2. substring(fixedchar<l1>, i32, i32, option:negative_start): -> string 3. substring(varchar<L1>, i32, option:negative_start): -> varchar<L1> 4. substring(string, i32, option:negative_start): -> string 5. substring(fixedchar<l1>, i32, option:negative_start): -> string

    Extract a substring of a specified length starting from position start. A start value of 1 refers to the first character of the string. When length is not specified the function will extract a substring starting from position start and ending at the end of the string. The negative_start option applies to the start parameter. WRAP_FROM_END means the index will start from the end of the input and move backwards. The last character has an index of -1, the second to last character has an index of -2, and so on. LEFT_OF_BEGINNING means the returned substring will start from the left of the first character. A start of -1 will begin 2 characters left of the input, while a start of 0 begins 1 character left of the input.
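A simplified Python model of the two negative_start modes may help; this is a hypothetical sketch under the reading above (positions left of the string hold no characters), and the ERROR mode is omitted.

```python
def substring(s, start, length, negative_start="WRAP_FROM_END"):
    # 1-based substring. Sketch of the negative_start modes only.
    if start >= 1 or negative_start == "LEFT_OF_BEGINNING":
        # start 1 is index 0; start 0 is one position left of the
        # string, start -1 two positions left, and so on.
        begin = start - 1
    else:
        # WRAP_FROM_END: -1 is the last character, -2 the second to last.
        begin = len(s) + start
    end = begin + length
    # Positions outside the string contribute nothing.
    return s[max(begin, 0):max(end, 0)]
```

For example, a start of 0 with length 3 under LEFT_OF_BEGINNING spends one position left of the string and returns the first two characters.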

    Options:
  • negative_start ['WRAP_FROM_END', 'LEFT_OF_BEGINNING', 'ERROR']
  • negative_start ['WRAP_FROM_END', 'LEFT_OF_BEGINNING']
  • "},{"location":"extensions/functions_string/#regexp_match_substring","title":"regexp_match_substring","text":"

    Implementations: regexp_match_substring(input, pattern, position, occurrence, group, option:case_sensitivity, option:multiline, option:dotall): -> return_type 0. regexp_match_substring(varchar<L1>, varchar<L2>, i64, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> varchar<L1> 1. regexp_match_substring(string, string, i64, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> string

    Extract a substring that matches the given regular expression pattern. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The occurrence of the pattern to be extracted is specified using the occurrence argument. Specifying 1 means the first occurrence will be extracted, 2 means the second occurrence, and so on. The occurrence argument should be a positive non-zero integer. The character position at which to start searching for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. The regular expression capture group can be specified using the group argument. Specifying 0 will return the substring matching the full regular expression. Specifying 1 will return the substring matching only the first capture group, and so on. The group argument should be a non-negative integer. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the occurrence value is out of range, the position value is out of range, or the group value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • "},{"location":"extensions/functions_string/#regexp_match_substring_all","title":"regexp_match_substring_all","text":"

    Implementations: regexp_match_substring_all(input, pattern, position, group, option:case_sensitivity, option:multiline, option:dotall): -> return_type 0. regexp_match_substring_all(varchar<L1>, varchar<L2>, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> List<varchar<L1>> 1. regexp_match_substring_all(string, string, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> List<string>

    Extract all substrings that match the given regular expression pattern. This will return a list of extracted strings with one value for each occurrence of a match. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The character position at which to start searching for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. The regular expression capture group can be specified using the group argument. Specifying 0 will return substrings matching the full regular expression. Specifying 1 will return substrings matching only the first capture group, and so on. The group argument should be a non-negative integer. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the position value is out of range, or the group value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • "},{"location":"extensions/functions_string/#starts_with","title":"starts_with","text":"

    Implementations: starts_with(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to search for.
  • 0. starts_with(varchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean 1. starts_with(varchar<L1>, string, option:case_sensitivity): -> boolean 2. starts_with(varchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean 3. starts_with(string, string, option:case_sensitivity): -> boolean 4. starts_with(string, varchar<L1>, option:case_sensitivity): -> boolean 5. starts_with(string, fixedchar<L1>, option:case_sensitivity): -> boolean 6. starts_with(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean 7. starts_with(fixedchar<L1>, string, option:case_sensitivity): -> boolean 8. starts_with(fixedchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean

    Whether the input string starts with the substring. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • "},{"location":"extensions/functions_string/#ends_with","title":"ends_with","text":"

    Implementations: ends_with(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to search for.
  • 0. ends_with(varchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean 1. ends_with(varchar<L1>, string, option:case_sensitivity): -> boolean 2. ends_with(varchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean 3. ends_with(string, string, option:case_sensitivity): -> boolean 4. ends_with(string, varchar<L1>, option:case_sensitivity): -> boolean 5. ends_with(string, fixedchar<L1>, option:case_sensitivity): -> boolean 6. ends_with(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean 7. ends_with(fixedchar<L1>, string, option:case_sensitivity): -> boolean 8. ends_with(fixedchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean

    Whether input string ends with the substring. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • "},{"location":"extensions/functions_string/#contains","title":"contains","text":"

    Implementations: contains(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to search for.
  • 0. contains(varchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean 1. contains(varchar<L1>, string, option:case_sensitivity): -> boolean 2. contains(varchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean 3. contains(string, string, option:case_sensitivity): -> boolean 4. contains(string, varchar<L1>, option:case_sensitivity): -> boolean 5. contains(string, fixedchar<L1>, option:case_sensitivity): -> boolean 6. contains(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean 7. contains(fixedchar<L1>, string, option:case_sensitivity): -> boolean 8. contains(fixedchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean

    Whether the input string contains the substring. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • "},{"location":"extensions/functions_string/#strpos","title":"strpos","text":"

    Implementations: strpos(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to search for.
  • 0. strpos(string, string, option:case_sensitivity): -> i64 1. strpos(varchar<L1>, varchar<L1>, option:case_sensitivity): -> i64 2. strpos(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> i64

    Return the position of the first occurrence of a string in another string. The first character of the string is at position 1. If no occurrence is found, 0 is returned. The case_sensitivity option applies to the substring argument.
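The 1-based convention maps directly onto 0-based find-style APIs: since Python's str.find returns -1 when the substring is absent, adding 1 yields exactly the position-or-zero contract described above. A minimal, case-sensitive sketch:

```python
def strpos(haystack, needle):
    # str.find is 0-based and returns -1 when absent,
    # so adding 1 gives the 1-based position, or 0 if not found.
    return haystack.find(needle) + 1
```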

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • "},{"location":"extensions/functions_string/#regexp_strpos","title":"regexp_strpos","text":"

    Implementations: regexp_strpos(input, pattern, position, occurrence, option:case_sensitivity, option:multiline, option:dotall): -> return_type 0. regexp_strpos(varchar<L1>, varchar<L2>, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64 1. regexp_strpos(string, string, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64

    Return the position of an occurrence of the given regular expression pattern in a string. The first character of the string is at position 1. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The character position at which to start searching for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. Which occurrence to return the position of is specified using the occurrence argument. Specifying 1 means the position of the first occurrence will be returned, 2 means the position of the second occurrence, and so on. The occurrence argument should be a positive non-zero integer. If no occurrence is found, 0 is returned. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the occurrence value is out of range, or the position value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • "},{"location":"extensions/functions_string/#count_substring","title":"count_substring","text":"

    Implementations: count_substring(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to count.
  • 0. count_substring(string, string, option:case_sensitivity): -> i64 1. count_substring(varchar<L1>, varchar<L2>, option:case_sensitivity): -> i64 2. count_substring(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> i64

    Return the number of non-overlapping occurrences of a substring in an input string. The case_sensitivity option applies to the substring argument.
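Non-overlapping counting matches the behavior of Python's str.count, which can serve as a quick reference for the semantics (case-sensitive only):

```python
# str.count scans left to right and resumes after each match,
# so occurrences never overlap.
assert "ababab".count("abab") == 1   # the second "abab" overlaps the first
assert "banana".count("an") == 2
```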

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • "},{"location":"extensions/functions_string/#regexp_count_substring","title":"regexp_count_substring","text":"

    Implementations: regexp_count_substring(input, pattern, position, option:case_sensitivity, option:multiline, option:dotall): -> return_type 0. regexp_count_substring(string, string, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64 1. regexp_count_substring(varchar<L1>, varchar<L2>, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64 2. regexp_count_substring(fixedchar<L1>, fixedchar<L2>, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64

    Return the number of non-overlapping occurrences of a regular expression pattern in an input string. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The character position at which to start searching for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile or the position value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • "},{"location":"extensions/functions_string/#replace","title":"replace","text":"

    Implementations: replace(input, substring, replacement, option:case_sensitivity): -> return_type

  • input: Input string.
  • substring: The substring to replace.
  • replacement: The replacement string.
  • 0. replace(string, string, string, option:case_sensitivity): -> string 1. replace(varchar<L1>, varchar<L2>, varchar<L3>, option:case_sensitivity): -> varchar<L1>

    Replace all occurrences of the substring with the replacement string. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • "},{"location":"extensions/functions_string/#concat_ws","title":"concat_ws","text":"

    Implementations: concat_ws(separator, string_arguments): -> return_type

  • separator: Character to separate strings by.
  • string_arguments: Strings to be concatenated.
  • 0. concat_ws(string, string): -> string 1. concat_ws(varchar<L2>, varchar<L1>): -> varchar<L1>

    Concatenate strings together separated by a separator.

    "},{"location":"extensions/functions_string/#repeat","title":"repeat","text":"

    Implementations: repeat(input, count): -> return_type 0. repeat(string, i64): -> string 1. repeat(varchar<L1>, i64, i64): -> varchar<L1>

    Repeat a string count number of times.

    "},{"location":"extensions/functions_string/#reverse","title":"reverse","text":"

    Implementations: reverse(input): -> return_type 0. reverse(string): -> string 1. reverse(varchar<L1>): -> varchar<L1> 2. reverse(fixedchar<L1>): -> fixedchar<L1>

    Returns the string in reverse order.

    "},{"location":"extensions/functions_string/#replace_slice","title":"replace_slice","text":"

    Implementations: replace_slice(input, start, length, replacement): -> return_type

  • input: Input string.
  • start: The position in the string to start deleting/inserting characters.
  • length: The number of characters to delete from the input string.
  • replacement: The new string to insert at the start position.
  • 0. replace_slice(string, i64, i64, string): -> string 1. replace_slice(varchar<L1>, i64, i64, varchar<L2>): -> varchar<L1>

    Replace a slice of the input string. A specified \u2018length\u2019 of characters will be deleted from the input string beginning at the \u2018start\u2019 position and will be replaced by a new string. A start value of 1 indicates the first character of the input string. If start is negative or zero, or greater than the length of the input string, a null string is returned. If \u2018length\u2019 is negative, a null string is returned. If \u2018length\u2019 is zero, inserting of the new string occurs at the specified \u2018start\u2019 position and no characters are deleted. If \u2018length\u2019 is greater than the input string, deletion will occur up to the last character of the input string.
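The delete-then-insert behavior, including the null-returning edge cases, can be modeled with string slicing. A hypothetical Python sketch using None for the null string:

```python
def replace_slice(s, start, length, replacement):
    # 1-based start. Out-of-range start or negative length yields
    # the null string (None here), per the description above.
    if start <= 0 or start > len(s) or length < 0:
        return None
    begin = start - 1
    # Slicing past the end naturally deletes only up to the last
    # character; length 0 inserts without deleting.
    return s[:begin] + replacement + s[begin + length:]
```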

    "},{"location":"extensions/functions_string/#lower","title":"lower","text":"

    Implementations: lower(input, option:char_set): -> return_type 0. lower(string, option:char_set): -> string 1. lower(varchar<L1>, option:char_set): -> varchar<L1> 2. lower(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Transform the string to lower case characters. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']
  • "},{"location":"extensions/functions_string/#upper","title":"upper","text":"

    Implementations: upper(input, option:char_set): -> return_type 0. upper(string, option:char_set): -> string 1. upper(varchar<L1>, option:char_set): -> varchar<L1> 2. upper(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Transform the string to upper case characters. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']
  • "},{"location":"extensions/functions_string/#swapcase","title":"swapcase","text":"

    Implementations: swapcase(input, option:char_set): -> return_type 0. swapcase(string, option:char_set): -> string 1. swapcase(varchar<L1>, option:char_set): -> varchar<L1> 2. swapcase(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Transform the string\u2019s lowercase characters to uppercase and uppercase characters to lowercase. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']
  • "},{"location":"extensions/functions_string/#capitalize","title":"capitalize","text":"

    Implementations: capitalize(input, option:char_set): -> return_type 0. capitalize(string, option:char_set): -> string 1. capitalize(varchar<L1>, option:char_set): -> varchar<L1> 2. capitalize(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Capitalize the first character of the input string. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']
  • "},{"location":"extensions/functions_string/#title","title":"title","text":"

    Implementations: title(input, option:char_set): -> return_type 0. title(string, option:char_set): -> string 1. title(varchar<L1>, option:char_set): -> varchar<L1> 2. title(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Converts the input string into titlecase. Capitalizes the first character of each word in the input string except for articles (a, an, the). Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']
  • "},{"location":"extensions/functions_string/#char_length","title":"char_length","text":"

    Implementations: char_length(input): -> return_type 0. char_length(string): -> i64 1. char_length(varchar<L1>): -> i64 2. char_length(fixedchar<L1>): -> i64

    Return the number of characters in the input string. The length includes trailing spaces.

    "},{"location":"extensions/functions_string/#bit_length","title":"bit_length","text":"

    Implementations: bit_length(input): -> return_type 0. bit_length(string): -> i64 1. bit_length(varchar<L1>): -> i64 2. bit_length(fixedchar<L1>): -> i64

    Return the number of bits in the input string.

    "},{"location":"extensions/functions_string/#octet_length","title":"octet_length","text":"

    Implementations: octet_length(input): -> return_type 0. octet_length(string): -> i64 1. octet_length(varchar<L1>): -> i64 2. octet_length(fixedchar<L1>): -> i64

    Return the number of bytes in the input string.

    "},{"location":"extensions/functions_string/#regexp_replace","title":"regexp_replace","text":"

    Implementations: regexp_replace(input, pattern, replacement, position, occurrence, option:case_sensitivity, option:multiline, option:dotall): -> return_type

  • input: The input string.
  • pattern: The regular expression to search for within the input string.
  • replacement: The replacement string.
  • position: The position to start the search.
  • occurrence: Which occurrence of the match to replace.
  • 0. regexp_replace(string, string, string, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> string 1. regexp_replace(varchar<L1>, varchar<L2>, varchar<L3>, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> varchar<L1>

    Search a string for a substring that matches a given regular expression pattern and replace it with a replacement string. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The occurrence of the pattern to be replaced is specified using the occurrence argument. Specifying 1 means only the first occurrence will be replaced, 2 means the second occurrence, and so on. Specifying 0 means all occurrences will be replaced. The character position at which to start searching for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. The replacement string can capture groups using numbered backreferences. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the replacement contains an illegal back-reference, the occurrence value is out of range, or the position value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • "},{"location":"extensions/functions_string/#ltrim","title":"ltrim","text":"

    Implementations: ltrim(input, characters): -> return_type

  • input: The string to remove characters from.
  • characters: The set of characters to remove.
  • 0. ltrim(varchar<L1>, varchar<L2>): -> varchar<L1> 1. ltrim(string, string): -> string

    Remove any occurrence of the characters from the left side of the string. If no characters are specified, spaces are removed.

    "},{"location":"extensions/functions_string/#rtrim","title":"rtrim","text":"

    Implementations: rtrim(input, characters): -> return_type

  • input: The string to remove characters from.
  • characters: The set of characters to remove.
  • 0. rtrim(varchar<L1>, varchar<L2>): -> varchar<L1> 1. rtrim(string, string): -> string

    Remove any occurrence of the characters from the right side of the string. If no characters are specified, spaces are removed.

    "},{"location":"extensions/functions_string/#trim","title":"trim","text":"

    Implementations: trim(input, characters): -> return_type

  • input: The string to remove characters from.
  • characters: The set of characters to remove.
  • 0. trim(varchar<L1>, varchar<L2>): -> varchar<L1> 1. trim(string, string): -> string

    Remove any occurrence of the characters from the left and right sides of the string. If no characters are specified, spaces are removed.
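The trim family maps closely onto Python's strip functions, which likewise treat the second argument as a set of characters rather than a substring; a minimal sketch:

```python
def ltrim(input, characters=" "):
    # remove any of `characters` from the left side; default is spaces
    return input.lstrip(characters)

def rtrim(input, characters=" "):
    # remove any of `characters` from the right side
    return input.rstrip(characters)

def trim(input, characters=" "):
    # remove any of `characters` from both sides
    return input.strip(characters)
```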

    "},{"location":"extensions/functions_string/#lpad","title":"lpad","text":"

    Implementations: lpad(input, length, characters): -> return_type

  • input: The string to pad.
  • length: The length of the output string.
  • characters: The string of characters to use for padding.
  • 0. lpad(varchar<L1>, i32, varchar<L2>): -> varchar<L1> 1. lpad(string, i32, string): -> string

    Left-pad the input string with the string of 'characters' until the specified length of the string has been reached. If the input string is longer than 'length', remove characters from the right side to shorten it to 'length' characters. If the string of 'characters' is longer than the remaining 'length' needed to be filled, only pad until 'length' has been reached. If 'characters' is not specified, the default value is a single space.

    "},{"location":"extensions/functions_string/#rpad","title":"rpad","text":"

    Implementations: rpad(input, length, characters): -> return_type

  • input: The string to pad.
  • length: The length of the output string.
  • characters: The string of characters to use for padding.
  • 0. rpad(varchar<L1>, i32, varchar<L2>): -> varchar<L1> 1. rpad(string, i32, string): -> string

    Right-pad the input string with the string of 'characters' until the specified length of the string has been reached. If the input string is longer than 'length', remove characters from the left side to shorten it to 'length' characters. If the string of 'characters' is longer than the remaining 'length' needed to be filled, only pad until 'length' has been reached. If 'characters' is not specified, the default value is a single space.
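Note the truncation sides: lpad truncates from the right, rpad from the left. A sketch of both semantics:

```python
def lpad(s, length, characters=" "):
    # left-pad; if s is already longer than length, truncate from the right
    if len(s) >= length:
        return s[:length]
    pad = characters * length          # repeat, then cut to the needed width
    return pad[:length - len(s)] + s

def rpad(s, length, characters=" "):
    # right-pad; if s is already longer than length, truncate from the left
    if len(s) >= length:
        return s[len(s) - length:]
    pad = characters * length
    return s + pad[:length - len(s)]
```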

    "},{"location":"extensions/functions_string/#center","title":"center","text":"

    Implementations: center(input, length, character, option:padding): -> return_type

  • input: The string to pad.
  • length: The length of the output string.
  • character: The character to use for padding.
  • 0. center(varchar<L1>, i32, varchar<L1>, option:padding): -> varchar<L1> 1. center(string, i32, string, option:padding): -> string

    Center the input string by padding the sides with a single character until the specified length of the string has been reached. By default, if the length will be reached with an uneven amount of padding, the extra padding will be applied to the right side. The side that receives the extra padding can be controlled with the padding option. Behavior is undefined if the number of characters passed to the character argument is not 1.
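A sketch of the centering rule, including the padding option for the side that receives the extra character:

```python
def center(input, length, character=" ", padding="RIGHT"):
    # behavior is undefined if len(character) != 1
    total = length - len(input)
    if total <= 0:
        return input
    # with an uneven amount of padding, the extra character goes to the
    # side named by the padding option (default RIGHT)
    left = total // 2 if padding == "RIGHT" else (total + 1) // 2
    return character * left + input + character * (total - left)
```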

    Options:
  • padding ['RIGHT', 'LEFT']
  • "},{"location":"extensions/functions_string/#left","title":"left","text":"

    Implementations: left(input, count): -> return_type 0. left(varchar<L1>, i32): -> varchar<L1> 1. left(string, i32): -> string

    Extract count characters starting from the left of the string.

    "},{"location":"extensions/functions_string/#right","title":"right","text":"

    Implementations: right(input, count): -> return_type 0. right(varchar<L1>, i32): -> varchar<L1> 1. right(string, i32): -> string

    Extract count characters starting from the right of the string.
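Both extractions amount to simple slicing; a sketch (note the guard for count = 0, since Python's `s[-0:]` would return the whole string):

```python
def left(input, count):
    # first `count` characters
    return input[:count]

def right(input, count):
    # last `count` characters
    return input[len(input) - count:] if count > 0 else ""
```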

    "},{"location":"extensions/functions_string/#string_split","title":"string_split","text":"

    Implementations: string_split(input, separator): -> return_type

  • input: The input string.
  • separator: A character used for splitting the string.
  • 0. string_split(varchar<L1>, varchar<L2>): -> List<varchar<L1>> 1. string_split(string, string): -> List<string>

    Split a string into a list of strings, based on a specified separator character.

    "},{"location":"extensions/functions_string/#regexp_string_split","title":"regexp_string_split","text":"

    Implementations: regexp_string_split(input, pattern, option:case_sensitivity, option:multiline, option:dotall): -> return_type

  • input: The input string.
  • pattern: The regular expression to search for within the input string.
  • 0. regexp_string_split(varchar<L1>, varchar<L2>, option:case_sensitivity, option:multiline, option:dotall): -> List<varchar<L1>> 1. regexp_string_split(string, string, option:case_sensitivity, option:multiline, option:dotall): -> List<string>

    Split a string into a list of strings, based on a regular expression pattern. The substrings matched by the pattern will be used as the separators to split the input string and will not be included in the resulting list. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string.
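The separator-dropping behavior matches Python's re.split (again only an approximation of the mandated ICU regular expressions); a sketch covering the case_sensitivity option:

```python
import re

def regexp_string_split(input, pattern, case_sensitivity="CASE_SENSITIVE"):
    # matches of `pattern` act as separators and are dropped from the result
    flags = re.IGNORECASE if case_sensitivity == "CASE_INSENSITIVE" else 0
    return re.split(pattern, input, flags=flags)
```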

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • "},{"location":"extensions/functions_string/#aggregate-functions","title":"Aggregate Functions","text":""},{"location":"extensions/functions_string/#string_agg","title":"string_agg","text":"

    Implementations: string_agg(input, separator): -> return_type

  • input: Column of string values.
  • separator: Separator for concatenated strings
  • 0. string_agg(string, string): -> string

    Concatenates a column of string values with a separator.
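Treating the column as a sequence of values, the aggregation reduces to a join with the separator; a minimal sketch:

```python
def string_agg(input, separator):
    # fold a column of strings into one value with the separator between entries
    return separator.join(input)
```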

    "},{"location":"relations/basics/","title":"Basics","text":"

    Substrait is designed to allow a user to construct an arbitrarily complex data transformation plan. The plan is composed of one or more relational operations. Relational operations are well-defined transformation operations that work by taking zero or more input datasets and transforming them into zero or more output datasets. Substrait defines a core set of transformations, but users are also able to extend the operations with their own specialized operations.

    Each relational operation is composed of several properties. Common properties for relational operations include the following:

    Property | Description | Type
    Emit | The set of columns output from this operation and the order of those columns. | Logical & Physical
    Hints | A set of optionally provided, optionally consumed information about an operation that better informs execution. These might include estimated number of input and output records, estimated record size, likely filter reduction, estimated dictionary size, etc. These can also include implementation-specific pieces of execution information. | Physical
    Constraint | A set of runtime constraints around the operation, limiting its consumption based on real-world resources (CPU, memory) as well as virtual resources like number of records produced, the largest record size, etc. | Physical

    Relational Signatures

    In functions, function signatures are declared externally to the use of those signatures (function bindings). In the case of relational operations, signatures are declared directly in the specification. This is due to the speed of change and number of total operations. Relational operations in the specification are expected to be <100 for several years with additions being infrequent. On the other hand, there is an expectation of both a much larger number of functions (1,000s) and a much higher velocity of additions.

    Each relational operation must declare the following:

    • Transformation logic around properties of the data. For example, does a relational operation maintain sortedness of a field? Does an operation change the distribution of data?
    • How many input relations does an operation require?
    • Does the operator produce an output (by specification, we limit relational operations to a single output at this time)?
    • What is the schema and field ordering of an output (see emit below)?
    "},{"location":"relations/basics/#emit-output-ordering","title":"Emit: Output Ordering","text":"

    A relational operation uses field references to access specific fields of the input stream. Field references are always ordinal based on the order of the incoming streams. Each relational operation must declare the order of its output data. To simplify things, each relational operation can be in one of two modes:

    1. Direct output: The order of outputs is based on the definition declared by the relational operation.
    2. Remap: A listed ordering of the direct outputs. This remapping can also be used to drop columns no longer used (such as a filter field or join keys after a join). Note that remapping/exclusion can only be done at the output's root struct. Filtering of compound values or extracting subsets must be done through other operation types (e.g. projection).
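The two modes above can be sketched as applying an optional ordinal mapping to each output record (apply_emit is a hypothetical helper name, not part of the spec):

```python
def apply_emit(record, emit=None):
    # direct output: pass the record through unchanged
    if emit is None:
        return record
    # remap: reorder and/or drop top-level columns by ordinal;
    # field references are always ordinal-based
    return [record[i] for i in emit]
```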
    "},{"location":"relations/basics/#relation-properties","title":"Relation Properties","text":"

    There are a number of predefined properties that exist in Substrait relations. These include the following.

    "},{"location":"relations/basics/#distribution","title":"Distribution","text":"

    When data is partitioned across multiple sibling sets, distribution describes the set of properties that apply to any one partition. This is based on a set of distribution expression properties. A distribution is declared as a set of one or more fields and a distribution type across all fields.

    Property | Description | Required
    Distribution Fields | List of field references that describe the distribution (e.g. [0,2:4,5:0:0]). The order of these references does not impact results. | Required for the PARTITIONED distribution type; disallowed for the SINGLETON distribution type.
    Distribution Type | PARTITIONED: for a discrete tuple of values of the declared distribution fields, all records with that tuple are located in the same partition. SINGLETON: there will only be a single partition for this operation. | Required

    Orderedness

    A guarantee that data output from this operation is provided with a sort order. The sort order is declared with a set of sort field definitions that reference the emitted output of this operation.

    Property | Description | Required
    Sort Fields | A list of fields that the data are ordered by. The list is in order of the sort. If we sort by [0,1], this means we only consider the data for field 1 to be ordered within each discrete value of field 0. | At least one required
    Per Sort Field | A field reference that the data is sorted by. | Required
    Per Sort Direction | The direction of the data. See the direction options below. | Required

    Ordering Directions

    Direction | Description | Nulls Position
    Ascending | Returns data in ascending order based on the quality function associated with the type. Nulls are included before any values. | First
    Descending | Returns data in descending order based on the quality function associated with the type. Nulls are included before any values. | First
    Ascending | Returns data in ascending order based on the quality function associated with the type. Nulls are included after any values. | Last
    Descending | Returns data in descending order based on the quality function associated with the type. Nulls are included after any values. | Last
    Custom function identifier | Returns data using a custom function that returns -1, 0, or 1 depending on the order of the data. | Per function
    Clustered | Ensures that all equal values are coalesced (but no ordering between values is defined). E.g. for values 1,2,3,1,2,3, output could be any of the following: 1,1,2,2,3,3 or 1,1,3,3,2,2 or 2,2,1,1,3,3 or 2,2,3,3,1,1 or 3,3,1,1,2,2 or 3,3,2,2,1,1. | N/A; values may appear anywhere but will be coalesced.

    Discussion Points
    • Should read definition types be more extensible in the same way that function signatures are? Are extensible read definition types necessary if we have custom relational operators?
    • How are decomposed reads expressed? For example, the Iceberg type above is for early logical planning. Once we do some operations, it may produce a list of Iceberg file reads. This is likely a secondary type of object.
    "},{"location":"relations/embedded_relations/","title":"Embedded Relations","text":"

    Pending.

    Embedded relations allow a Substrait producer to define a set operation that will be embedded in the plan.

    TODO: define lots of details about what interfaces, languages, formats, etc. Should reasonably be an extension of embedded user defined table functions.

    "},{"location":"relations/logical_relations/","title":"Logical Relations","text":""},{"location":"relations/logical_relations/#read-operator","title":"Read Operator","text":"

    The read operator is an operator that produces one output. A simple example would be the reading of a Parquet file. It is expected that many types of reads will be added over time.

    Signature | Value
    Inputs | 0
    Outputs | 1
    Property Maintenance | N/A (no inputs)
    Direct Output Order | Defaults to the schema of the data read after the optional projection (masked complex expression) is applied.

    Read Properties

    Property | Description | Required
    Definition | The contents of the read property definition. | Required
    Direct Schema | Defines the schema of the output of the read (before any projection or emit remapping/hiding). | Required
    Filter | A boolean Substrait expression that describes a filter that must be applied to the data. The filter should be interpreted against the direct schema. | Optional, defaults to none.
    Best Effort Filter | A boolean Substrait expression that describes a filter that may be applied to the data. The filter should be interpreted against the direct schema. | Optional, defaults to none.
    Projection | A masked complex expression describing the portions of the content that should be read. | Optional, defaults to all of the schema.
    Output Properties | Declaration of orderedness and/or distribution properties this read produces. | Optional, defaults to no properties.
    Properties | A list of name/value pairs associated with the read. | Optional, defaults to empty.

    Read Filtering

    The read relation has two different filter properties: a filter, which must be satisfied by the operator, and a best effort filter, which does not have to be satisfied. This reflects the way that consumers are often implemented. A consumer is often only able to fully apply a limited set of operations in the scan. There can then be an extended set of operations which a consumer can apply in a best effort fashion. A producer, when setting these two fields, should take care to only use expressions that the consumer is capable of handling.

    As an example, a consumer may only be able to fully apply (in the read relation) <, =, and > on integral types. The consumer may be able to apply <, =, and > in a best effort fashion on decimal and string types. Consider the filter expression my_int < 10 && my_string < \"x\" && upper(my_string) > \"B\". In this case the filter should be set to my_int < 10, the best_effort_filter should be set to my_string < \"x\", and the remaining portion (upper(my_string) > \"B\") should be put into a filter relation.
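A producer-side sketch of that classification; the capability sets below are hypothetical stand-ins for whatever the producer knows about a real consumer's abilities:

```python
# hypothetical capability sets for the example consumer above
FULLY_SUPPORTED = {'my_int < 10'}
BEST_EFFORT_SUPPORTED = {'my_string < "x"'}

def split_conjuncts(conjuncts):
    """Assign each AND-ed condition to the read relation's filter,
    its best_effort_filter, or a separate downstream filter relation."""
    filter_, best_effort, filter_rel = [], [], []
    for cond in conjuncts:
        if cond in FULLY_SUPPORTED:
            filter_.append(cond)
        elif cond in BEST_EFFORT_SUPPORTED:
            best_effort.append(cond)
        else:
            filter_rel.append(cond)
    return filter_, best_effort, filter_rel
```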

    A filter expression must be interpreted against the direct schema before the projection expression has been applied. As a result, fields may be referenced by the filter expression which are not included in the relation's output.

    "},{"location":"relations/logical_relations/#read-definition-types","title":"Read Definition Types","text":"Adding new Read Definition Types

    If you have a read definition that's not covered here, see the process for adding new read definition types.

    Read definition types (like the rest of the features in Substrait) are built by the community and added to the specification.

    "},{"location":"relations/logical_relations/#virtual-table","title":"Virtual Table","text":"

    A virtual table is a table whose contents are embedded in the plan itself. The table data is encoded as records consisting of literal values.

    Property Description Required Data Required Required

    Named Table

    A named table is a reference to data defined elsewhere. For example, there may be a catalog of tables with unique names that both the producer and consumer agree on. This catalog would provide the consumer with more information on how to retrieve the data.

    Property | Description | Required
    Names | A list of namespaced strings that, together, form the table name. | Required (at least one)

    Files Type

    Property | Description | Required
    Items | An array of Items (path or path glob) associated with the read. | Required
    Format per item | Enumeration of available formats. The only current option is PARQUET. | Required
    Slicing parameters per item | Information to use when reading a slice of a file. | Optional

    Slicing Files

    A read operation is allowed to only read part of a file. This is convenient, for example, when distributing a read operation across several nodes. The slicing parameters are specified as byte offsets into the file.

    Many file formats consist of indivisible "chunks" of data (e.g. Parquet row groups). When this happens, the consumer can determine which slice a particular chunk belongs to. For example, one possible approach is that a chunk should only be read if the midpoint of the chunk (dividing by 2 and rounding down) is contained within the asked-for byte range.
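The midpoint rule described above can be sketched as follows (one possible approach, not a mandated one):

```python
def chunk_in_slice(chunk_start, chunk_length, slice_start, slice_length):
    # read the chunk iff its midpoint (dividing by 2, rounding down) falls
    # inside the half-open byte range [slice_start, slice_start + slice_length)
    midpoint = chunk_start + chunk_length // 2
    return slice_start <= midpoint < slice_start + slice_length
```

Because each chunk has exactly one midpoint, every chunk is assigned to exactly one slice and no chunk is read twice.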

    ReadRel Message
    message ReadRel {\n  RelCommon common = 1;\n  NamedStruct base_schema = 2;\n  Expression filter = 3;\n  Expression best_effort_filter = 11;\n  Expression.MaskExpression projection = 4;\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n  // Definition of which type of scan operation is to be performed\n  oneof read_type {\n    VirtualTable virtual_table = 5;\n    LocalFiles local_files = 6;\n    NamedTable named_table = 7;\n    ExtensionTable extension_table = 8;\n  }\n\n  // A base table. The list of string is used to represent namespacing (e.g., mydb.mytable).\n  // This assumes shared catalog between systems exchanging a message.\n  message NamedTable {\n    repeated string names = 1;\n    substrait.extensions.AdvancedExtension advanced_extension = 10;\n  }\n\n  // A table composed of literals.\n  message VirtualTable {\n    repeated Expression.Literal.Struct values = 1;\n  }\n\n  // A stub type that can be used to extend/introduce new table types outside\n  // the specification.\n  message ExtensionTable {\n    google.protobuf.Any detail = 1;\n  }\n\n  // Represents a list of files in input of a scan operation\n  message LocalFiles {\n    repeated FileOrFiles items = 1;\n    substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n    // Many files consist of indivisible chunks (e.g. parquet row groups\n    // or CSV rows).  If a slice partially selects an indivisible chunk\n    // then the consumer should employ some rule to decide which slice to\n    // include the chunk in (e.g. include it in the slice that contains\n    // the midpoint of the chunk)\n    message FileOrFiles {\n      oneof path_type {\n        // A URI that can refer to either a single folder or a single file\n        string uri_path = 1;\n        // A URI where the path portion is a glob expression that can\n        // identify zero or more paths.\n        // Consumers should support the POSIX syntax.  
The recursive\n        // globstar (**) may not be supported.\n        string uri_path_glob = 2;\n        // A URI that refers to a single file\n        string uri_file = 3;\n        // A URI that refers to a single folder\n        string uri_folder = 4;\n      }\n\n      // Original file format enum, superseded by the file_format oneof.\n      reserved 5;\n      reserved \"format\";\n\n      // The index of the partition this item belongs to\n      uint64 partition_index = 6;\n\n      // The start position in byte to read from this item\n      uint64 start = 7;\n\n      // The length in byte to read from this item\n      uint64 length = 8;\n\n      message ParquetReadOptions {}\n      message ArrowReadOptions {}\n      message OrcReadOptions {}\n      message DwrfReadOptions {}\n\n      // The format of the files.\n      oneof file_format {\n        ParquetReadOptions parquet = 9;\n        ArrowReadOptions arrow = 10;\n        OrcReadOptions orc = 11;\n        google.protobuf.Any extension = 12;\n        DwrfReadOptions dwrf = 13;\n      }\n    }\n  }\n\n}\n
    "},{"location":"relations/logical_relations/#filter-operation","title":"Filter Operation","text":"

    The filter operator eliminates one or more records from the input data based on a boolean filter expression.

    Signature | Value
    Inputs | 1
    Outputs | 1
    Property Maintenance | Orderedness, Distribution, remapped by emit
    Direct Output Order | The same field order as the input.

    Filter Properties

    Property | Description | Required
    Input | The relational input. | Required
    Expression | A boolean expression which describes which records are included/excluded. | Required

    FilterRel Message
    message FilterRel {\n  RelCommon common = 1;\n  Rel input = 2;\n  Expression condition = 3;\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n}\n
    "},{"location":"relations/logical_relations/#sort-operation","title":"Sort Operation","text":"

    The sort operator reorders a dataset based on one or more identified sort fields and a sorting function for each.

    Signature | Value
    Inputs | 1
    Outputs | 1
    Property Maintenance | Will update the orderedness property to the output of the sort operation. Distribution property only remapped based on emit.
    Direct Output Order | The field order of the input.

    Sort Properties

    Property | Description | Required
    Input | The relational input. | Required
    Sort Fields | List of one or more fields to sort by. Uses the same properties as the orderedness property. | At least one sort field required

    SortRel Message
    message SortRel {\n  RelCommon common = 1;\n  Rel input = 2;\n  repeated SortField sorts = 3;\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n}\n
    "},{"location":"relations/logical_relations/#project-operation","title":"Project Operation","text":"

    The project operation will produce one or more additional expressions based on the inputs of the dataset.

    Signature | Value
    Inputs | 1
    Outputs | 1
    Property Maintenance | Distribution maintained, mapped by emit. Orderedness: maintained if there are no window operations, extended to include projection fields if those fields are direct references; if window operations are present, no orderedness is maintained.
    Direct Output Order | The field order of the input plus the list of new expressions in the order they are declared in the expressions list.

    Project Properties

    Property | Description | Required
    Input | The relational input. | Required
    Expressions | List of one or more expressions to add to the input. | At least one expression required

    ProjectRel Message
    message ProjectRel {\n  RelCommon common = 1;\n  Rel input = 2;\n  repeated Expression expressions = 3;\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n}\n
    "},{"location":"relations/logical_relations/#cross-product-operation","title":"Cross Product Operation","text":"

    The cross product operation will combine two separate inputs into a single output. It pairs every record from the left input with every record of the right input.

    Signature | Value
    Inputs | 2
    Outputs | 1
    Property Maintenance | Distribution is maintained. Orderedness is empty post operation.
    Direct Output Order | The emit order of the left input followed by the emit order of the right input.

    Cross Product Properties

    Property | Description | Required
    Left Input | A relational input. | Required
    Right Input | A relational input. | Required

    CrossRel Message
    message CrossRel {\n  RelCommon common = 1;\n  Rel left = 2;\n  Rel right = 3;\n\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n}\n
    "},{"location":"relations/logical_relations/#join-operation","title":"Join Operation","text":"

    The join operation will combine two separate inputs into a single output, based on a join expression. A common subtype of joins is an equality join where the join expression is constrained to a list of equality (or equality + null equality) conditions between the two inputs of the join.

    Signature | Value
    Inputs | 2
    Outputs | 1
    Property Maintenance | Distribution is maintained. Orderedness is empty post operation. Physical relations may provide better property maintenance.
    Direct Output Order | The emit order of the left input followed by the emit order of the right input.

    Join Properties

    Property | Description | Required
    Left Input | A relational input. | Required
    Right Input | A relational input. | Required
    Join Expression | A boolean condition that describes whether each record from the left set matches a record from the right set. Field references correspond to the direct output order of the data. | Required. Can be the literal True.
    Post-Join Filter | A boolean condition to be applied to each result record after the inputs have been joined, yielding only the records that satisfy the condition. | Optional
    Join Type | One of the join types defined below. | Required

    Join Types

    Type | Description
    Inner | Return records from the left side only if they match the right side. Return records from the right side only when they match the left side. For each cross-input match, return a record including the data from both sides. Non-matching records are ignored.
    Outer | Return all records from both the left and right inputs. For each cross-input match, return a record including the data from both sides. For any remaining non-matching records, return the record from the corresponding input along with nulls for the opposite input.
    Left | Return all records from the left input. For each cross-input match, return a record including the data from both sides. For any remaining non-matching records from the left input, return the left record along with nulls for the right input.
    Right | Return all records from the right input. For each cross-input match, return a record including the data from both sides. For any remaining non-matching records from the right input, return the right record along with nulls for the left input.
    Semi | Return records from the left input. These are returned only if the records have a join partner on the right side.
    Anti | Return records from the left input. These are returned only if the records do not have a join partner on the right side.
    Single | Return one join partner per entry on the left input. If more than one join partner exists, there are two valid semantics: 1) only the first match is returned, or 2) the system throws an error. If there is no match between the left and right inputs, NULL is returned.

    JoinRel Message
    message JoinRel {\n  RelCommon common = 1;\n  Rel left = 2;\n  Rel right = 3;\n  Expression expression = 4;\n  Expression post_join_filter = 5;\n\n  JoinType type = 6;\n\n  enum JoinType {\n    JOIN_TYPE_UNSPECIFIED = 0;\n    JOIN_TYPE_INNER = 1;\n    JOIN_TYPE_OUTER = 2;\n    JOIN_TYPE_LEFT = 3;\n    JOIN_TYPE_RIGHT = 4;\n    JOIN_TYPE_SEMI = 5;\n    JOIN_TYPE_ANTI = 6;\n    // This join is useful for nested sub-queries where we need exactly one record in output (or throw exception)\n    // See Section 3.2 of https://15721.courses.cs.cmu.edu/spring2018/papers/16-optimizer2/hyperjoins-btw2017.pdf\n    JOIN_TYPE_SINGLE = 7;\n  }\n\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n}\n
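The semi, anti, and single join types can be sketched over Python lists. This illustrates the declared semantics only (naive nested loops, not an execution strategy); single_join here picks the error-raising semantic for multiple partners:

```python
def semi_join(left, right, match):
    # left records that have at least one partner on the right
    return [l for l in left if any(match(l, r) for r in right)]

def anti_join(left, right, match):
    # left records that have no partner on the right
    return [l for l in left if not any(match(l, r) for r in right)]

def single_join(left, right, match):
    # exactly one partner per left record; None stands in for NULL
    out = []
    for l in left:
        partners = [r for r in right if match(l, r)]
        if len(partners) > 1:
            raise ValueError("more than one join partner")
        out.append((l, partners[0] if partners else None))
    return out
```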
    "},{"location":"relations/logical_relations/#set-operation","title":"Set Operation","text":"

    The set operation encompasses several set-level operations that support combining datasets, possibly excluding records based on various types of record level matching.

    Signature | Value
    Inputs | 2 or more
    Outputs | 1
    Property Maintenance | Maintains distribution if all inputs have the same ordinal distribution. Orderedness is not maintained.
    Direct Output Order | The field order of the inputs. All inputs must have identical fields.

    Set Properties

    Property | Description | Required
    Primary Input | The primary input of the dataset. | Required
    Secondary Inputs | One or more relational inputs. | At least one required
    Set Operation Type | From the list below. | Required

    Set Operation Types

    Property | Description
    Minus (Primary) | Returns the primary input excluding any matching records from secondary inputs.
    Minus (Multiset) | Returns the primary input minus any records that are included in all sets.
    Intersection (Primary) | Returns all primary rows that intersect at least one secondary input.
    Intersection (Multiset) | Returns all rows that intersect at least one record from each secondary input.
    Union Distinct | Returns all the records from each set, removing any rows that are duplicated (within or across sets).
    Union All | Returns all records from each set, allowing duplicates.

    SetRel Message
    message SetRel {\n  RelCommon common = 1;\n  // The first input is the primary input, the remaining are secondary\n  // inputs.  There must be at least two inputs.\n  repeated Rel inputs = 2;\n  SetOp op = 3;\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n  enum SetOp {\n    SET_OP_UNSPECIFIED = 0;\n    SET_OP_MINUS_PRIMARY = 1;\n    SET_OP_MINUS_MULTISET = 2;\n    SET_OP_INTERSECTION_PRIMARY = 3;\n    SET_OP_INTERSECTION_MULTISET = 4;\n    SET_OP_UNION_DISTINCT = 5;\n    SET_OP_UNION_ALL = 6;\n  }\n\n}\n
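Two of the operation types can be sketched over hashable records (illustrative semantics only; Minus (Primary) keeps the primary input's order):

```python
def minus_primary(primary, *secondaries):
    # primary input minus any record matching some secondary input
    excluded = set()
    for s in secondaries:
        excluded |= set(s)
    return [r for r in primary if r not in excluded]

def union_distinct(*inputs):
    # all records, de-duplicated within and across inputs
    seen, out = set(), []
    for rel in inputs:
        for r in rel:
            if r not in seen:
                seen.add(r)
                out.append(r)
    return out
```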
    "},{"location":"relations/logical_relations/#fetch-operation","title":"Fetch Operation","text":"

    The fetch operation eliminates records outside a desired window. It typically corresponds to a SQL FETCH/OFFSET clause and returns only the records between the start offset and the end offset.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Maintains distribution and orderedness. Direct Output Order Unchanged from input."},{"location":"relations/logical_relations/#fetch-properties","title":"Fetch Properties","text":"Property Description Required Input A relational input, typically with a desired orderedness property. Required Offset A positive integer. Declares the offset for retrieval of records. Optional, defaults to 0. Count A positive integer. Declares the number of records that should be returned. Required FetchRel Message
    message FetchRel {\n  RelCommon common = 1;\n  Rel input = 2;\n  // the offset expressed in number of records\n  int64 offset = 3;\n  // the amount of records to return\n  int64 count = 4;\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n}\n
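As a sketch of the windowing semantics (illustrative Python, not part of the specification), the offset and count simply bound the records returned:

```python
def fetch(records, offset, count):
    # Return only the records in the window [offset, offset + count).
    return records[offset:offset + count]
```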
    "},{"location":"relations/logical_relations/#aggregate-operation","title":"Aggregate Operation","text":"

    The aggregate operation groups input data on one or more sets of grouping keys, calculating each measure for each combination of grouping key.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Maintains distribution if all distribution fields are contained in every grouping set. No orderedness guaranteed. Direct Output Order The list of distinct columns from each grouping set (ordered by their first appearance) followed by the list of measures in declaration order, followed by an i32 describing the particular grouping set the value is derived from (if applicable).

    In its simplest form, an aggregation has only measures. In this case, all records are folded into one, and a column is returned for each aggregate expression in the measures list.

    Grouping sets can be used for finer-grained control over which records are folded. Within a grouping set, two records will be folded together if and only if each expression in the grouping set yields the same value for both records. The values returned by the grouping sets will be returned as columns to the left of the columns for the aggregate expressions. If a grouping set contains no grouping expressions, all rows will be folded for that grouping set.

    It\u2019s possible to specify multiple grouping sets in a single aggregate operation. The grouping sets behave more or less independently, with each returned record belonging to one of the grouping sets. The values for the grouping expression columns that are not part of the grouping set for a particular record will be set to null. Two grouping expressions will be returned using the same column if the protobuf messages describing the expressions are equal. The columns for grouping expressions that do not appear in all grouping sets will be nullable (regardless of the nullability of the type returned by the grouping expression) to accommodate the null insertion.

    To further disambiguate which record belongs to which grouping set, an aggregate relation with more than one grouping set receives an extra i32 column on the right-hand side. The value of this field will be the zero-based index of the grouping set that yielded the record.

    If at least one grouping expression is present, the aggregation is allowed to not have any aggregate expressions. An aggregate relation is invalid if it would yield zero columns.

    "},{"location":"relations/logical_relations/#aggregate-properties","title":"Aggregate Properties","text":"Property Description Required Input The relational input. Required Grouping Sets One or more grouping sets. Optional, required if no measures. Per Grouping Set A list of grouping expressions that the aggregation measures should be calculated for. Optional. Measures A list of one or more aggregate expressions along with an optional filter. Optional, required if no grouping sets. AggregateRel Message
    message AggregateRel {\n  RelCommon common = 1;\n\n  // Input of the aggregation\n  Rel input = 2;\n\n  // A list of one or more grouping expression sets that the aggregation measures should be calculated for.\n  // Required if there are no measures.\n  repeated Grouping groupings = 3;\n\n  // A list of one or more aggregate expressions along with an optional filter.\n  // Required if there are no groupings.\n  repeated Measure measures = 4;\n\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n  message Grouping {\n    repeated Expression grouping_expressions = 1;\n  }\n\n  message Measure {\n    AggregateFunction measure = 1;\n\n    // An optional boolean expression that acts to filter which records are\n    // included in the measure. True means include this record for calculation\n    // within the measure.\n    // Helps to support SUM(<c>) FILTER(WHERE...) syntax without masking opportunities for optimization\n    Expression filter = 2;\n  }\n\n}\n
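The null padding and grouping-set index behavior described above can be sketched as follows (illustrative Python; the `aggregate` helper and dict-based rows are hypothetical, not spec constructs):

```python
# Illustrative sketch of multi-grouping-set aggregation with null padding
# and a trailing zero-based grouping-set index column.
from collections import defaultdict

def aggregate(rows, grouping_sets, measure):
    # rows: list of dicts; grouping_sets: list of lists of column names;
    # measure: function folding a list of rows into one value.
    all_keys = []
    for gs in grouping_sets:            # distinct grouping columns, by first appearance
        for k in gs:
            if k not in all_keys:
                all_keys.append(k)
    out = []
    for idx, gs in enumerate(grouping_sets):
        groups = defaultdict(list)
        for r in rows:
            groups[tuple(r[k] for k in gs)].append(r)
        for key, grp in groups.items():
            rec = {k: None for k in all_keys}    # null for columns not in this set
            rec.update(dict(zip(gs, key)))
            rec["measure"] = measure(grp)
            rec["grouping_set"] = idx            # zero-based grouping set index
            out.append(rec)
    return out
```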
    "},{"location":"relations/logical_relations/#reference-operator","title":"Reference Operator","text":"

    The reference operator is used to construct DAGs of operations. In a Plan we can have multiple Rel representing various computations with potentially multiple outputs. The ReferenceRel is used to express the fact that multiple Rel might be sharing subtrees of computation. This can be used to express arbitrary DAGs as well as represent multi-query optimizations.

    As a concrete example, consider the two queries SELECT * FROM A JOIN B JOIN C and SELECT * FROM A JOIN B JOIN D. We could use the ReferenceRel to share the common A JOIN B between the two queries by creating a plan with 3 Rels: one expressing A JOIN B (in position 0 in the plan), one expressing ReferenceRel(0) JOIN C, and a third expressing ReferenceRel(0) JOIN D. This avoids the redundancy of computing A JOIN B twice.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Maintains all properties of the input Direct Output Order Maintains order"},{"location":"relations/logical_relations/#reference-properties","title":"Reference Properties","text":"Property Description Required Referred Rel A zero-indexed positional reference to a Rel defined within the same Plan. Required ReferenceRel Message
    message ReferenceRel {\n  int32 subtree_ordinal = 1;\n\n}\n
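The shared-subtree example above can be sketched with a hypothetical list-based plan representation (illustrative Python, not the actual protobuf encoding):

```python
# Hypothetical sketch: plan relations are stored in a list, and a
# {"reference": 0} entry points back at position 0 ("A JOIN B").

plan = [
    {"op": "join", "left": "A", "right": "B"},               # position 0: A JOIN B
    {"op": "join", "left": {"reference": 0}, "right": "C"},  # ReferenceRel(0) JOIN C
    {"op": "join", "left": {"reference": 0}, "right": "D"},  # ReferenceRel(0) JOIN D
]

def resolve(rel, plan):
    # Follow reference indirections so each tree can be expanded on demand.
    if isinstance(rel, dict) and "reference" in rel:
        return resolve(plan[rel["reference"]], plan)
    return rel
```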
    "},{"location":"relations/logical_relations/#write-operator","title":"Write Operator","text":"

    The write operator is an operator that consumes one input and writes it to storage. This can range from writing to a Parquet file to performing INSERT/DELETE/UPDATE operations in a database.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Output depends on OutputMode (none, or modified records) Direct Output Order Unchanged from input"},{"location":"relations/logical_relations/#write-properties","title":"Write Properties","text":"Property Description Required Write Type Definition of which object we are operating on (e.g., a fully-qualified table name). Required CTAS Schema The names of all the columns and their type for a CREATE TABLE AS. Required only for CTAS Write Operator Which type of operation we are performing (INSERT/DELETE/UPDATE/CTAS). Required Rel Input The Rel representing which records we will be operating on (e.g., VALUES for an INSERT, or which records to DELETE, or records and after-image of their values for UPDATE). Required Output Mode For views that modify a DB it is important to control which records to \u201creturn\u201d. Common default is NO_OUTPUT where we return nothing. Alternatively, we can return MODIFIED_RECORDS, which can be further manipulated by layering more rels on top of this WriteRel (e.g., to \u201ccount how many records were updated\u201d). This also allows returning the after-image of the change. To return the before-image (or both), one can use the reference mechanisms and have multiple return values. Required for VIEW CREATE/CREATE_OR_REPLACE/ALTER"},{"location":"relations/logical_relations/#write-definition-types","title":"Write Definition Types","text":"Adding new Write Definition Types

    If you have a write definition that\u2019s not covered here, see the process for adding new write definition types.

    Write definition types are built by the community and added to the specification.

    WriteRel Message
    message WriteRel {\n  // Definition of which TABLE we are operating on\n  oneof write_type {\n    NamedObjectWrite named_table = 1;\n    ExtensionObject extension_table = 2;\n  }\n\n  // The schema of the table (must align with Rel input (e.g., number of leaf fields must match))\n  NamedStruct table_schema = 3;\n\n  // The type of operation to perform\n  WriteOp op = 4;\n\n  // The relation that determines the records to add/remove/modify\n  // the schema must match with table_schema. Default values must be explicitly stated\n  // in a ProjectRel at the top of the input. The match must also\n  // occur in case of DELETE to ensure multi-engine plans are unequivocal.\n  Rel input = 5;\n\n  // Output mode determines what is the output of executing this rel\n  OutputMode output = 6;\n  RelCommon common = 7;\n\n  enum WriteOp {\n    WRITE_OP_UNSPECIFIED = 0;\n    // The insert of new records in a table\n    WRITE_OP_INSERT = 1;\n    // The removal of records from a table\n    WRITE_OP_DELETE = 2;\n    // The modification of existing records within a table\n    WRITE_OP_UPDATE = 3;\n    // The creation of a new table, and the insert of new records in the table\n    WRITE_OP_CTAS = 4;\n  }\n\n  enum OutputMode {\n    OUTPUT_MODE_UNSPECIFIED = 0;\n    // return no records at all\n    OUTPUT_MODE_NO_OUTPUT = 1;\n    // this mode makes the operator return all the records INSERTED/DELETED/UPDATED by the operator.\n    // The operator returns the AFTER-image of any change. This can be further manipulated by operators upstream\n    // (e.g., returning the typical \"count of modified records\").\n    // For scenarios in which the BEFORE image is required, the user must implement a spool (via references to\n    // subplans in the body of the Rel input) and return those with another PlanRel.relations.\n    OUTPUT_MODE_MODIFIED_RECORDS = 2;\n  }\n\n}\n
    "},{"location":"relations/logical_relations/#virtual-table_1","title":"Virtual Table","text":"Property Description Required Name The in-memory name to give the dataset. Required Pin Whether it is okay to remove this dataset from memory or it should be kept in memory. Optional, defaults to false."},{"location":"relations/logical_relations/#files-type_1","title":"Files Type","text":"Property Description Required Path A URI to write the data to. Supports the inclusion of field references that are listed as available in properties as a \u201crotation description field\u201d. Required Format Enumeration of available formats. Only current option is PARQUET. Required"},{"location":"relations/logical_relations/#ddl-data-definition-language-operator","title":"DDL (Data Definition Language) Operator","text":"

    The operator that defines modifications of a database schema (CREATE/DROP/ALTER for TABLE and VIEWS).

    Signature Value Inputs 1 Outputs 0 Property Maintenance N/A (no output) Direct Output Order N/A"},{"location":"relations/logical_relations/#ddl-properties","title":"DDL Properties","text":"Property Description Required Write Type Definition of which type of object we are operating on. Required Table Schema The names of all the columns and their type. Required (except for DROP operations) Table Defaults The set of default values for this table. Required (except for DROP operations) DDL Object Which type of object we are operating on (e.g., TABLE or VIEW). Required DDL Operator The operation to be performed (e.g., CREATE/ALTER/DROP). Required View Definition A Rel representing the \u201cbody\u201d of a VIEW. Required for VIEW CREATE/CREATE_OR_REPLACE/ALTER DdlRel Message
    message DdlRel {\n  // Definition of which type of object we are operating on\n  oneof write_type {\n    NamedObjectWrite named_object = 1;\n    ExtensionObject extension_object = 2;\n  }\n\n  // The columns that will be modified (representing after-image of a schema change)\n  NamedStruct table_schema = 3;\n  // The default values for the columns (representing after-image of a schema change)\n  // E.g., in case of an ALTER TABLE that changes some of the column default values, we expect\n  // the table_defaults Struct to report a full list of default values reflecting the result of applying\n  // the ALTER TABLE operator successfully\n  Expression.Literal.Struct table_defaults = 4;\n\n  // Which type of object we operate on\n  DdlObject object = 5;\n\n  // The type of operation to perform\n  DdlOp op = 6;\n\n  // The body of the CREATE VIEW\n  Rel view_definition = 7;\n  RelCommon common = 8;\n\n  enum DdlObject {\n    DDL_OBJECT_UNSPECIFIED = 0;\n    // A Table object in the system\n    DDL_OBJECT_TABLE = 1;\n    // A View object in the system\n    DDL_OBJECT_VIEW = 2;\n  }\n\n  enum DdlOp {\n    DDL_OP_UNSPECIFIED = 0;\n    // A create operation (for any object)\n    DDL_OP_CREATE = 1;\n    // A create operation if the object does not exist, or replaces it (equivalent to a DROP + CREATE) if the object already exists\n    DDL_OP_CREATE_OR_REPLACE = 2;\n    // An operation that modifies the schema (e.g., column names, types, default values) for the target object\n    DDL_OP_ALTER = 3;\n    // An operation that removes an object from the system\n    DDL_OP_DROP = 4;\n    // An operation that removes an object from the system (without throwing an exception if the object did not exist)\n    DDL_OP_DROP_IF_EXIST = 5;\n  }\n  //TODO add PK/constraints/indexes/etc..?\n\n}\n
    Discussion Points
    • How should correlated operations be handled?
    "},{"location":"relations/physical_relations/","title":"Physical Relations","text":"

    There is no true distinction between logical and physical operations in Substrait. By convention, certain operations are classified as physical, but all operations can be potentially used in any kind of plan. A particular set of transformations or target operators may (by convention) be considered the \u201cphysical plan\u201d but this is a characteristic of the system consuming Substrait as opposed to a definition within Substrait.

    "},{"location":"relations/physical_relations/#hash-equijoin-operator","title":"Hash Equijoin Operator","text":"

    The hash equijoin operator will build a hash table out of the right input based on a set of join keys. It will then probe that hash table with records from the left input, finding matches.

    Signature Value Inputs 2 Outputs 1 Property Maintenance Distribution is maintained. Orderedness of the left set is maintained in INNER join cases, otherwise it is eliminated. Direct Output Order Same as the Join operator."},{"location":"relations/physical_relations/#hash-equijoin-properties","title":"Hash Equijoin Properties","text":"Property Description Required Left Input A relational input. (Probe-side) Required Right Input A relational input. (Build-side) Required Left Keys References to the fields to join on in the left input. Required Right Keys References to the fields to join on in the right input. Required Post Join Predicate An additional expression that can be used to reduce the output of the join operation post the equality condition. Minimizes the overhead of secondary join conditions that cannot be evaluated using the equijoin keys. Optional, defaults to true. Join Type One of the join types defined in the Join operator. Required"},{"location":"relations/physical_relations/#nlj-nested-loop-join-operator","title":"NLJ (Nested Loop Join) Operator","text":"
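A minimal sketch of the build/probe phases for an inner join (illustrative Python; real implementations handle join types, null semantics, and spilling):

```python
# Inner hash equijoin sketch: build a hash table on the right input,
# probe it with records from the left input.
from collections import defaultdict

def hash_join(left, right, left_key, right_key):
    table = defaultdict(list)
    for r in right:                       # build phase
        table[r[right_key]].append(r)
    out = []
    for l in left:                        # probe phase
        for r in table.get(l[left_key], []):
            out.append({**l, **r})
    return out
```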

    The nested loop join operator does a join by holding the entire right input and then iterating over it using the left input, evaluating the join expression on the Cartesian product of all rows, only outputting rows where the expression is true. It will also include non-matching rows for the OUTER, LEFT, and RIGHT join types per the join type requirements.

    Signature Value Inputs 2 Outputs 1 Property Maintenance Distribution is maintained. Orderedness is eliminated. Direct Output Order Same as the Join operator."},{"location":"relations/physical_relations/#nlj-properties","title":"NLJ Properties","text":"Property Description Required Left Input A relational input. Required Right Input A relational input. Required Join Expression A boolean condition that describes whether each record from the left set \u201cmatches\u201d a record from the right set. Optional. Defaults to true (a Cartesian join). Join Type One of the join types defined in the Join operator. Required"},{"location":"relations/physical_relations/#merge-equijoin-operator","title":"Merge Equijoin Operator","text":"

    The merge equijoin does a join by taking advantage of two sets that are sorted on the join keys. This allows the join operation to be done in a streaming fashion.

    Signature Value Inputs 2 Outputs 1 Property Maintenance Distribution is maintained. Orderedness is eliminated. Direct Output Order Same as the Join operator."},{"location":"relations/physical_relations/#merge-join-properties","title":"Merge Join Properties","text":"Property Description Required Left Input A relational input. Required Right Input A relational input. Required Left Keys References to the fields to join on in the left input. Required Right Keys References to the fields to join on in the right input. Required Post Join Predicate An additional expression that can be used to reduce the output of the join operation post the equality condition. Minimizes the overhead of secondary join conditions that cannot be evaluated using the equijoin keys. Optional, defaults to true. Join Type One of the join types defined in the Join operator. Required"},{"location":"relations/physical_relations/#exchange-operator","title":"Exchange Operator","text":"
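A sketch of the streaming merge (illustrative Python, assuming both inputs are sorted on the join key and keys are unique within each input, a simplification of real merge joins):

```python
# Streaming merge equijoin sketch: advance two cursors through inputs
# that are both sorted on the join key.

def merge_join(left, right, key):
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk == rk:
            out.append({**left[i], **right[j]})
            i += 1
            j += 1
        elif lk < rk:
            i += 1          # left key too small; advance left cursor
        else:
            j += 1          # right key too small; advance right cursor
    return out
```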

    The exchange operator will redistribute data based on an exchange type definition. Applying this operation will lead to an output that presents the desired distribution.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Orderedness is maintained. Distribution is overwritten based on configuration. Direct Output Order Order of the input."},{"location":"relations/physical_relations/#exchange-types","title":"Exchange Types","text":"Type Description Scatter Distribute data using a system defined hashing function that considers one or more fields. For the same type of fields and same ordering of values, the same partition target should be identified for different ExchangeRels Single Bucket Define an expression that provides a single i32 bucket number. Optionally define whether the expression will only return values within the valid number of partition counts. If not, the system should modulo the return value to determine a target partition. Multi Bucket Define an expression that provides a List<i32> of bucket numbers. Optionally define whether the expression will only return values within the valid number of partition counts. If not, the system should modulo the return value to determine a target partition. The records should be sent to all bucket numbers provided by the expression. Broadcast Send all records to all partitions. Round Robin Send records to each target in sequence. Can follow either exact or approximate behavior. Approximate will attempt to balance the number of records sent to each destination but may not exactly distribute evenly and may send batches of records to each target before moving to the next."},{"location":"relations/physical_relations/#exchange-properties","title":"Exchange Properties","text":"Property Description Required Input The relational input. Required. Distribution Type One of the distribution types defined above. Required. Partition Count The number of partitions targeted for output. Optional. If not defined, implementation system should decide the number of partitions. Note that when not defined, single or multi bucket expressions should not be constrained to count. 
Expression Mapping Describes a relationship between each partition ID and the destination that partition should be sent to. Optional. A partition may be sent to 0..N locations. Value can either be a URI or arbitrary value."},{"location":"relations/physical_relations/#merging-capture","title":"Merging Capture","text":"
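The single-bucket modulo behavior can be sketched as follows (illustrative Python; `bucket_expr` stands in for the user-provided bucket expression):

```python
# Sketch of single-bucket exchange routing: when the bucket expression may
# return values outside [0, partition_count), the system applies a modulo.

def route(record, bucket_expr, partition_count, bounded=False):
    bucket = bucket_expr(record)
    return bucket if bounded else bucket % partition_count
```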

    A receiving operation that will merge multiple ordered streams to maintain orderedness.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Orderedness and distribution are maintained. Direct Output Order Order of the input."},{"location":"relations/physical_relations/#merging-capture-properties","title":"Merging Capture Properties","text":"Property Description Required Blocking Whether the merging should block incoming data. Blocking should be used carefully, based on whether a deadlock can be produced. Optional, defaults to false"},{"location":"relations/physical_relations/#simple-capture","title":"Simple Capture","text":"

    A receiving operation that will merge multiple streams in an arbitrary order.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Orderedness is eliminated after this operation. Distribution is maintained. Direct Output Order Order of the input."},{"location":"relations/physical_relations/#naive-capture-properties","title":"Naive Capture Properties","text":"Property Description Required Input The relational input. Required"},{"location":"relations/physical_relations/#top-n-operation","title":"Top-N Operation","text":"

    The top-N operator reorders a dataset based on one or more identified sort fields as well as a sorting function. Rather than sort the entire dataset, the top-N will only maintain the total number of records required to ensure a limited output. A top-N is a combination of the logical sort and logical fetch operations.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Will update orderedness property to the output of the sort operation. Distribution property only remapped based on emit. Direct Output Order The field order of the input."},{"location":"relations/physical_relations/#top-n-properties","title":"Top-N Properties","text":"Property Description Required Input The relational input. Required Sort Fields List of one or more fields to sort by. Uses the same properties as the orderedness property. One sort field required Offset A positive integer. Declares the offset for retrieval of records. Optional, defaults to 0. Count A positive integer. Declares the number of records that should be returned. Required"},{"location":"relations/physical_relations/#hash-aggregate-operation","title":"Hash Aggregate Operation","text":"
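A sketch of the bounded-memory behavior using a heap (illustrative Python; `heapq.nsmallest` retains at most offset + count records internally rather than sorting the whole input):

```python
# Top-N sketch: keep only the offset + count best records, then drop the offset.
import heapq

def top_n(records, sort_key, offset=0, count=1):
    kept = heapq.nsmallest(offset + count, records, key=sort_key)
    return kept[offset:offset + count]
```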

    The hash aggregate operation maintains a hash table for each grouping set to coalesce equivalent tuples.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Maintains distribution if all distribution fields are contained in every grouping set. No orderedness guaranteed. Direct Output Order Same as defined by Aggregate operation."},{"location":"relations/physical_relations/#hash-aggregate-properties","title":"Hash Aggregate Properties","text":"Property Description Required Input The relational input. Required Grouping Sets One or more grouping sets. Optional, required if no measures. Per Grouping Set A list of grouping expressions that the aggregation measures should be calculated for. Optional. Measures A list of one or more aggregate expressions. Implementations may or may not support aggregate ordering expressions. Optional, required if no grouping sets."},{"location":"relations/physical_relations/#streaming-aggregate-operation","title":"Streaming Aggregate Operation","text":"

    The streaming aggregate operation leverages data ordered by the grouping expressions to calculate each grouping set tuple-by-tuple in a streaming fashion. All grouping sets and orderings requested on each aggregate must be compatible to allow multiple grouping sets or aggregate orderings.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Maintains distribution if all distribution fields are contained in every grouping set. Maintains input ordering. Direct Output Order Same as defined by Aggregate operation."},{"location":"relations/physical_relations/#streaming-aggregate-properties","title":"Streaming Aggregate Properties","text":"Property Description Required Input The relational input. Required Grouping Sets One or more grouping sets. If multiple grouping sets are declared, sets must all be compatible with the input sortedness. Optional, required if no measures. Per Grouping Set A list of grouping expressions that the aggregation measures should be calculated for. Optional. Measures A list of one or more aggregate expressions. Aggregate expression ordering requirements must be compatible with the expected ordering. Optional, required if no grouping sets."},{"location":"relations/physical_relations/#consistent-partition-window-operation","title":"Consistent Partition Window Operation","text":"
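Because equal grouping keys are adjacent in sorted input, each group can be folded as it streams past, without a hash table. A sketch (illustrative Python, single grouping set):

```python
# Streaming aggregate sketch: fold each run of equal keys as it arrives.
from itertools import groupby

def streaming_aggregate(sorted_rows, group_key, measure):
    # groupby only groups adjacent equal keys, which is exactly the
    # property the sorted input guarantees.
    return [(key, measure(list(grp)))
            for key, grp in groupby(sorted_rows, key=group_key)]
```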

    A consistent partition window operation is a special type of project operation where every function is a window function and all of the window functions share the same sorting and partitioning. This allows for the sort and partition to be calculated once and shared between the various function evaluations.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Maintains distribution and ordering. Direct Output Order Same as Project operator (input followed by each window expression)."},{"location":"relations/physical_relations/#window-properties","title":"Window Properties","text":"Property Description Required Input The relational input. Required Window Functions One or more window functions. At least one required."},{"location":"relations/physical_relations/#expand-operation","title":"Expand Operation","text":"

    The expand operation creates duplicates of input records based on the Expand Fields. Each Expand Field can be a Switching Field or an expression. Switching Fields are described below. If an Expand Field is an expression then its value is consistent across all duplicate rows.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Distribution is maintained if all the distribution fields are consistent fields with direct references. Ordering can only be maintained down to the level of consistent fields that are kept. Direct Output Order The expand fields followed by an i32 column describing the index of the duplicate that the row is derived from."},{"location":"relations/physical_relations/#expand-properties","title":"Expand Properties","text":"Property Description Required Input The relational input. Required Direct Fields Expressions describing the output fields. These refer to the schema of the input. Each Direct Field must be an expression or a Switching Field. Required"},{"location":"relations/physical_relations/#switching-field-properties","title":"Switching Field Properties","text":"

    A switching field is a field whose value is different in each duplicated row. All switching fields in an Expand Operation must have the same number of duplicates.

    Property Description Required Duplicates List of one or more expressions. The output will contain a row for each expression. Required"},{"location":"relations/physical_relations/#hashing-window-operation","title":"Hashing Window Operation","text":"
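The interaction of consistent expressions, switching fields, and the duplicate index column can be sketched as follows (illustrative Python; the tuple-based field encoding is hypothetical):

```python
# Expand sketch: switching fields cycle through their duplicate expressions,
# consistent expressions repeat, and a trailing i32 records the duplicate index.

def expand(rows, fields):
    # fields: list of ("expr", fn) or ("switch", [fn, fn, ...]);
    # all switching fields must declare the same number of duplicates.
    n = max((len(fns) for kind, fns in fields if kind == "switch"), default=1)
    out = []
    for row in rows:
        for i in range(n):
            rec = []
            for kind, f in fields:
                rec.append(f(row) if kind == "expr" else f[i](row))
            rec.append(i)        # duplicate index column
            out.append(rec)
    return out
```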

    A window aggregate operation that will build hash tables for each distinct partition expression.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Maintains distribution. Eliminates ordering. Direct Output Order Same as Project operator (input followed by each window expression)."},{"location":"relations/physical_relations/#hashing-window-properties","title":"Hashing Window Properties","text":"Property Description Required Input The relational input. Required Window Expressions One or more window expressions. At least one required."},{"location":"relations/physical_relations/#streaming-window-operation","title":"Streaming Window Operation","text":"

    A window aggregate operation that relies on a partition/ordering sorted input.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Maintains distribution. Eliminates ordering. Direct Output Order Same as Project operator (input followed by each window expression)."},{"location":"relations/physical_relations/#streaming-window-properties","title":"Streaming Window Properties","text":"Property Description Required Input The relational input. Required Window Expressions One or more window expressions. Must be supported by the sortedness of the input. At least one required."},{"location":"relations/user_defined_relations/","title":"User Defined Relations","text":"

    Pending

    "},{"location":"serialization/binary_serialization/","title":"Binary Serialization","text":"

    Substrait can be serialized into a protobuf-based binary representation. The proto schema/IDL files can be found on GitHub. Proto files are placed in the io.substrait namespace for C++/Java and the Substrait.Protobuf namespace for C#.

    "},{"location":"serialization/binary_serialization/#plan","title":"Plan","text":"

    The main top-level object used to communicate a Substrait plan using protobuf is a Plan message (see ExtendedExpression for an alternative top-level object). The plan message is composed of a set of data structures that minimize repetition in the serialization along with one (or more) relation trees.

    Plan Message
    message Plan {\n  // Substrait version of the plan. Optional up to 0.17.0, required for later\n  // versions.\n  Version version = 6;\n\n  // a list of yaml specifications this plan may depend on\n  repeated substrait.extensions.SimpleExtensionURI extension_uris = 1;\n\n  // a list of extensions this plan may depend on\n  repeated substrait.extensions.SimpleExtensionDeclaration extensions = 2;\n\n  // one or more relation trees that are associated with this plan.\n  repeated PlanRel relations = 3;\n\n  // additional extensions associated with this plan.\n  substrait.extensions.AdvancedExtension advanced_extensions = 4;\n\n  // A list of com.google.Any entities that this plan may use. Can be used to\n  // warn if some embedded message types are unknown. Note that this list may\n  // include message types that are ignorable (optimizations) or that are\n  // unused. In many cases, a consumer may be able to work with a plan even if\n  // one or more message types defined here are unknown.\n  repeated string expected_type_urls = 5;\n\n}\n
    "},{"location":"serialization/binary_serialization/#extensions","title":"Extensions","text":"

    Protobuf supports both simple and advanced extensions. Simple extensions are declared at the plan level and advanced extensions are declared at multiple levels of messages within the plan.

    "},{"location":"serialization/binary_serialization/#simple-extensions","title":"Simple Extensions","text":"

    For simple extensions, a plan references the URIs associated with the simple extensions to provide additional plan capabilities. These URIs will list additional relevant information for the plan.

    Simple extensions within a plan are split into three components: an extension URI, an extension declaration and a number of references.

    • Extension URI: A unique identifier for the extension pointing to a YAML document specifying one or more specific extensions. Declares an anchor that can be used in extension declarations.
    • Extension Declaration: A specific extension within a single YAML document. The declaration combines a reference to the associated Extension URI along with a unique key identifying the specific item within that YAML document (see Function Signature Compound Names). It also defines a declaration anchor. The anchor is a plan-specific unique value that the producer creates as a key to be referenced elsewhere.
    • Extension Reference: A specific instance or use of an extension declaration within the plan body.

    Extension URIs and declarations are encapsulated in the top level of the plan. Extension declarations are then referenced throughout the body of the plan itself. The exact structure of these references will depend on the extension point being used, but they will always include the extension\u2019s anchor (or key). For example, all scalar function expressions contain references to an extension declaration which defines the semantics of the function.

    Simple Extension URI
    message SimpleExtensionURI {\n  // A surrogate key used in the context of a single plan used to reference the\n  // URI associated with an extension.\n  uint32 extension_uri_anchor = 1;\n\n  // The URI where this extension YAML can be retrieved. This is the \"namespace\"\n  // of this extension.\n  string uri = 2;\n\n}\n

    Once the YAML file URI anchor is defined, the anchor can be referenced by zero or more SimpleExtensionDeclarations. Each simple extension declaration defines an anchor for that specific extension entity. This anchor is then referenced within lower-level primitives (functions, etc.) to identify that specific extension. Message properties are named *_anchor where the anchor is defined and *_reference where the anchor is referenced, for example function_anchor and function_reference.
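    As a sketch of how this indirection works, the following illustrative Python snippet resolves a function_reference back to its URI and compound name. The dict shapes mirror the protobuf messages in JSON form, but the helper, URI, and anchor values are all made up for illustration; this is not a Substrait API.

    ```python
    # Sketch of the anchor/reference indirection (illustrative only; the URI
    # and anchor values are hypothetical).
    plan = {
        "extension_uris": [
            {"extension_uri_anchor": 1,
             "uri": "https://example.com/extensions/functions_arithmetic.yaml"}
        ],
        "extensions": [
            {"extension_function": {
                "extension_uri_reference": 1,   # points at extension_uri_anchor 1
                "function_anchor": 7,           # plan-local surrogate key
                "name": "add:i32_i32"}}         # compound name within the YAML
        ],
    }

    def resolve_function(plan, function_reference):
        """Map a function_reference in the plan body to (uri, compound name)."""
        uris = {u["extension_uri_anchor"]: u["uri"] for u in plan["extension_uris"]}
        for declaration in plan["extensions"]:
            fn = declaration.get("extension_function")
            if fn and fn["function_anchor"] == function_reference:
                return uris[fn["extension_uri_reference"]], fn["name"]
        raise KeyError(f"no declaration with function_anchor {function_reference}")
    ```

    A scalar function expression carrying functionReference 7 would thus resolve, via the declaration, to the add:i32_i32 entry in the referenced YAML document.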

    Simple Extension Declaration
    message SimpleExtensionDeclaration {\n  oneof mapping_type {\n    ExtensionType extension_type = 1;\n    ExtensionTypeVariation extension_type_variation = 2;\n    ExtensionFunction extension_function = 3;\n  }\n\n  // Describes a Type\n  message ExtensionType {\n    // references the extension_uri_anchor defined for a specific extension URI.\n    uint32 extension_uri_reference = 1;\n\n    // A surrogate key used in the context of a single plan to reference a\n    // specific extension type\n    uint32 type_anchor = 2;\n\n    // the name of the type in the defined extension YAML.\n    string name = 3;\n  }\n\n  message ExtensionTypeVariation {\n    // references the extension_uri_anchor defined for a specific extension URI.\n    uint32 extension_uri_reference = 1;\n\n    // A surrogate key used in the context of a single plan to reference a\n    // specific type variation\n    uint32 type_variation_anchor = 2;\n\n    // the name of the type in the defined extension YAML.\n    string name = 3;\n  }\n\n  message ExtensionFunction {\n    // references the extension_uri_anchor defined for a specific extension URI.\n    uint32 extension_uri_reference = 1;\n\n    // A surrogate key used in the context of a single plan to reference a\n    // specific function\n    uint32 function_anchor = 2;\n\n    // A function signature compound name\n    string name = 3;\n  }\n\n}\n

    Note

    Anchors only have meaning within a single plan and exist simply to reduce plan size. They are not some form of global identifier. Different plans may use different anchors for the same specific functions, types, type variations, etc.

    Note

    It is valid for a plan to include SimpleExtensionURIs and/or SimpleExtensionDeclarations that are not referenced directly.

    "},{"location":"serialization/binary_serialization/#advanced-extensions","title":"Advanced Extensions","text":"

    The Substrait protobuf representation exposes a special object at multiple levels to provide extension capabilities. Extensions are separated into two main concepts:

    • Optimization: A change to the plan that may help some consumers work more efficiently with the plan. These properties should be propagated through plan pipelines where possible but do not impact the meaning of the plan. A consumer can safely ignore these properties.
    • Enhancement: A change to the plan that functionally changes the behavior of the plan. Use these sparingly as they will impact plan interoperability.

    Advanced Extension Protobuf
    message AdvancedExtension {\n  // An optimization is helpful information that don't influence semantics. May\n  // be ignored by a consumer.\n  google.protobuf.Any optimization = 1;\n\n  // An enhancement alter semantics. Cannot be ignored by a consumer.\n  google.protobuf.Any enhancement = 2;\n\n}\n
    "},{"location":"serialization/binary_serialization/#capabilities","title":"Capabilities","text":"

    When two systems exchanging Substrait plans want to understand each other\u2019s capabilities, they may exchange a Capabilities message. The capabilities message provides information on the set of simple and advanced extensions that the system supports.

    Capabilities Message
    message Capabilities {\n  // List of Substrait versions this system supports\n  repeated string substrait_versions = 1;\n\n  // list of com.google.Any message types this system supports for advanced\n  // extensions.\n  repeated string advanced_extension_type_urls = 2;\n\n  // list of simple extensions this system supports.\n  repeated SimpleExtension simple_extensions = 3;\n\n  message SimpleExtension {\n    string uri = 1;\n    repeated string function_keys = 2;\n    repeated string type_keys = 3;\n    repeated string type_variation_keys = 4;\n  }\n\n}\n
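    As an illustration of how two systems might compare capabilities, the following hypothetical Python sketch intersects the function keys both sides advertise. The dict shapes mirror the Capabilities message in JSON form; common_functions and the sample URI are illustrative, not part of Substrait.

    ```python
    # Hypothetical sketch: intersecting the simple-extension functions that
    # two systems both advertise in their Capabilities messages.
    def common_functions(mine, theirs):
        """Return {uri: shared function keys} supported by both systems."""
        theirs_by_uri = {e["uri"]: set(e.get("function_keys", []))
                         for e in theirs.get("simple_extensions", [])}
        shared = {}
        for e in mine.get("simple_extensions", []):
            keys = set(e.get("function_keys", [])) & theirs_by_uri.get(e["uri"], set())
            if keys:
                shared[e["uri"]] = keys
        return shared

    producer = {"simple_extensions": [
        {"uri": "functions_arithmetic.yaml",   # example URI, not normative
         "function_keys": ["add:i32_i32", "subtract:i32_i32"]}]}
    consumer = {"simple_extensions": [
        {"uri": "functions_arithmetic.yaml", "function_keys": ["add:i32_i32"]}]}
    ```

    Here the two systems would agree that plans using add:i32_i32 from that extension can be exchanged safely.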
    "},{"location":"serialization/binary_serialization/#protobuf-rationale","title":"Protobuf Rationale","text":"

    The binary format of Substrait is designed to be easy to work with in many languages. A key requirement is that someone can take the binary format IDL and use standard tools to build a set of primitives that are easy to work with in any of a number of languages. This allows communities to build and use Substrait using only a binary IDL and the specification (and allows the Substrait project to avoid being required to build libraries for each language to work with the specification).

    There are several binary IDLs that exist today. The key requirements for Substrait are the following:

    • Strongly typed IDL schema language
    • High-quality, well-supported, and idiomatic bindings/compilers for key languages (Python, JavaScript, C++, Go, Rust, Java)
    • Compact serial representation

    The primary formats that roughly qualify under these requirements are Protobuf, Thrift, FlatBuffers, Avro, and Cap\u2019n Proto. Protobuf was chosen due to its clean typing system and its large number of high-quality language bindings.

    The binary serialization IDLs can be found on GitHub and are sampled throughout the documentation.

    "},{"location":"serialization/text_serialization/","title":"Text Serialization","text":"

    To maximize the new user experience, it is important for Substrait to have a text representation of plans. This allows people to experiment with basic tooling. Building simple CLI tools that do things like SQL > Plan and Plan > SQL or REPL plan construction can all be done relatively straightforwardly with a text representation.

    The recommended text serialization format is JSON. Since the text format is not designed for performance, the format can be produced to maximize readability. This also allows nice symmetry between the construction of plans and the configuration of various extensions such as function signatures and user defined types.

    To ensure the JSON is valid, the object will be defined using the OpenAPI 3.1 specification. This not only allows strong validation, it also enables code generators to be easily used to produce plans in many languages.

    While JSON will be used for much of the plan serialization, Substrait uses a simple custom grammar for record-level expressions. While one can construct an equation such as (10 + 5)/2 using a tree of function and literal objects, a plan is much more human-readable when the information is written similarly to the way one typically writes scalar expressions. This grammar will be maintained as an ANTLR grammar (targetable to multiple programming languages) and is also planned to be supported via a JSON schema format tag so that the grammar can be validated as part of schema validation.
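    To see why the text grammar improves readability, here is a hedged sketch of what (10 + 5)/2 looks like as a tree of function and literal objects, with made-up function anchors (1 for add, 2 for divide) and a toy evaluator. Real plans would reference extension declarations instead of the hypothetical TOY_FUNCTIONS table.

    ```python
    # The equation (10 + 5)/2 as a nested tree of scalar-function and literal
    # objects. Function anchors 1 (add) and 2 (divide) are hypothetical.
    expr = {
        "scalarFunction": {
            "functionReference": 2,  # divide (hypothetical anchor)
            "arguments": [
                {"value": {"scalarFunction": {
                    "functionReference": 1,  # add (hypothetical anchor)
                    "arguments": [
                        {"value": {"literal": {"i32": 10}}},
                        {"value": {"literal": {"i32": 5}}},
                    ]}}},
                {"value": {"literal": {"i32": 2}}},
            ],
        }
    }

    # Toy function table standing in for the real extension machinery.
    TOY_FUNCTIONS = {1: lambda a, b: a + b, 2: lambda a, b: a // b}

    def evaluate(node):
        """Recursively evaluate the nested expression tree."""
        if "literal" in node:
            return node["literal"]["i32"]
        call = node["scalarFunction"]
        args = [evaluate(arg["value"]) for arg in call["arguments"]]
        return TOY_FUNCTIONS[call["functionReference"]](*args)

    # evaluate(expr) == 7, but the text form "(10 + 5)/2" is far easier to read
    ```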

    "},{"location":"spec/extending/","title":"Extending","text":"

    Substrait is a community project and requires consensus about new additions to the specification in order to maintain consistency. The best way to get consensus is to discuss ideas. The main ways to communicate are:

    • Substrait Mailing List
    • Substrait Slack
    • Community Meeting
    "},{"location":"spec/extending/#minor-changes","title":"Minor changes","text":"

    Simple changes like typos and bug fixes do not require as much effort. File an issue or send a PR and we can discuss it there.

    "},{"location":"spec/extending/#complex-changes","title":"Complex changes","text":"

    For complex features it is useful to discuss the change first. It will be useful to gather some background information to help get everyone on the same page.

    "},{"location":"spec/extending/#outline-the-issue","title":"Outline the issue","text":""},{"location":"spec/extending/#language","title":"Language","text":"

    Every engine has its own terminology. Every Spark user probably knows what an \u201cattribute\u201d is. Velox users will know what a \u201cRowVector\u201d means. Etc. However, Substrait is used by people that come from a variety of backgrounds and you should generally assume that its users do not know anything about your own implementation. As a result, all PRs and discussion should endeavor to use Substrait terminology wherever possible.

    "},{"location":"spec/extending/#motivation","title":"Motivation","text":"

    What problems does this relation solve? If it is a more logical relation then how does it allow users to express new capabilities? If it is more of an internal relation then how does it map to existing logical relations? How is it different than other existing relations? Why do we need this?

    "},{"location":"spec/extending/#examples","title":"Examples","text":"

    Provide example input and output for the relation. Show example plans. Try to motivate your examples, as best as possible, with something that looks like a real-world problem. These will go a long way toward helping others understand the purpose of a relation.

    "},{"location":"spec/extending/#alternatives","title":"Alternatives","text":"

    Discuss what alternatives are out there. Are there other ways to achieve similar results? Do some systems handle this problem differently?

    "},{"location":"spec/extending/#survey-existing-implementation","title":"Survey existing implementation","text":"

    It\u2019s unlikely that this is the first time that this has been done. Figuring out how other systems have implemented (or deliberately avoided) the feature will help ground the discussion.

    "},{"location":"spec/extending/#prototype-the-feature","title":"Prototype the feature","text":"

    Novel approaches should be implemented as an extension first.

    "},{"location":"spec/extending/#substrait-design-principles","title":"Substrait design principles","text":"

    Substrait is designed around interoperability, so a feature only used by a single system may not be accepted. But don\u2019t despair! Substrait has a highly developed extension system for this express purpose.

    "},{"location":"spec/extending/#you-dont-have-to-do-it-alone","title":"You don\u2019t have to do it alone","text":"

    If you are hoping to add a feature and these criteria seem intimidating then feel free to start a mailing list discussion before you have all the information and ask for help. Investigating other implementations, in particular, is something that can be quite difficult to do on your own.

    "},{"location":"spec/specification/","title":"Specification","text":""},{"location":"spec/specification/#status","title":"Status","text":"

    The specification has passed the initial design phase and is now in the final stages of being fleshed out. The community is encouraged to identify (and address) any perceived gaps in functionality using GitHub issues and PRs. Once all of the planned implementations have been completed, all deprecated fields will be eliminated and version 1.0 will be released.

    "},{"location":"spec/specification/#components-complete","title":"Components (Complete)","text":"Section Description Simple Types A way to describe the set of basic types that will be operated on within a plan. Only includes simple types such as integers and doubles (nothing configurable or compound). Compound Types Expression of types that go beyond simple scalar values. Key concepts here include: configurable types such as fixed length and numeric types as well as compound types such as structs, maps, lists, etc. Type Variations Physical variations to base types. User Defined Types Extensions that can be defined for specific IR producers/consumers. Field References Expressions to identify which portions of a record should be operated on. Scalar Functions Description of how functions are specified. Concepts include arguments, variadic functions, output type derivation, etc. Scalar Function List A list of well-known canonical functions in YAML format. Specialized Record Expressions Specialized expression types that are more naturally expressed outside the function paradigm. Examples include items such as if/then/else and switch statements. Aggregate Functions Functions that are expressed in aggregation operations. Examples include things such as SUM, COUNT, etc. Operations take many records and collapse them into a single (possibly compound) value. Window Functions Functions that relate a record to a set of encompassing records. Examples in SQL include RANK, NTILE, etc. User Defined Functions Reusable named functions that are built beyond the core specification. Implementations are typically registered through external means (drop a file in a directory, send a special command with implementation, etc.) Embedded Functions Function implementations embedded directly within the plan. Frequently used in data science workflows where business logic is interspersed with standard operations. 
Relation Basics Basic concepts around relational algebra, record emit and properties. Logical Relations Common relational operations used in compute plans including project, join, aggregation, etc. Text Serialization A human producible & consumable representation of the plan specification. Binary Serialization A high performance & compact binary representation of the plan specification."},{"location":"spec/specification/#components-designed-but-not-implemented","title":"Components (Designed but not Implemented)","text":"Section Description Table Functions Functions that convert one or more values from an input record into 0..N output records. Examples include operations such as explode, pos-explode, etc. User Defined Relations Installed and reusable relational operations customized to a particular platform. Embedded Relations Relational operations where plans contain the \u201cmachine code\u201d to directly execute the necessary operations. Physical Relations Specific execution sub-variations of common relational operations that have multiple unique physical variants associated with a single logical operation. Examples include hash join, merge join, nested loop join, etc."},{"location":"spec/technology_principles/","title":"Technology Principles","text":"
    • Provide a good suite of well-specified common functionality in databases and data science applications.
    • Make it easy for users to privately or publicly extend the representation to support specialized/custom operations.
    • Produce something that is language agnostic and requires minimal work to start developing against in a new language.
    • Drive towards a common format that avoids specialization for single favorite producer or consumer.
    • Establish clear delineation between specifications that MUST be respected and those that can be optionally ignored.
    • Establish a forgiving compatibility approach and versioning scheme that supports cross-version compatibility in the maximum number of cases.
    • Minimize the need for consumer intelligence by excluding concepts like overloading, type coercion, implicit casting, field name handling, etc. (Note: this is weak and should be better stated.)
    • Decomposability/severability: A particular producer or consumer should be able to produce or consume only a subset of the specification and interact well with any other Substrait system as long as the specific operations requested fit within the subset of the specification supported by the counterpart system.
    "},{"location":"spec/versioning/","title":"Versioning","text":"

    As an interface specification, the goal of Substrait is to reach a point where (breaking) changes will never need to happen again, or at least be few and far between. By analogy, Apache Arrow\u2019s in-memory format specification has stayed functionally constant, despite many major library versions being released. However, we\u2019re not there yet. When we believe that we\u2019ve reached this point, we will signal this by releasing version 1.0.0. Until then, we will remain in the 0.x.x version regime.

    Despite this, we strive to maintain backward compatibility for both the binary representation and the text representation by means of deprecation. When a breaking change cannot be reasonably avoided, we may remove previously deprecated fields. All deprecated fields will be removed for the 1.0.0 release.

    Substrait uses semantic versioning for its version numbers, with the addition that, during 0.x.y, we increment the x digit for breaking changes and new features, and the y digit for fixes and other nonfunctional changes. The release process is currently automated and makes a new release every week, provided something has changed on the main branch since the previous release. This release cadence will likely be slowed down as stability increases over time. Conventional commits are used to distinguish between breaking changes, new features, and fixes, and GitHub actions are used to verify that there are indeed no breaking protobuf changes in a commit, unless the commit message states this.
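    The 0.x.y rule described above can be illustrated with a toy helper (hypothetical, not project tooling): breaking changes and new features bump x, while fixes bump y.

    ```python
    # Toy illustration of the pre-1.0 versioning rule: during 0.x.y,
    # breaking changes and new features increment x; fixes increment y.
    def next_version(current, change):
        """change is one of "breaking", "feature", or "fix"."""
        major, x, y = (int(part) for part in current.split("."))
        assert major == 0, "this rule applies only during the 0.x.y regime"
        if change in ("breaking", "feature"):
            return f"0.{x + 1}.0"
        return f"0.{x}.{y + 1}"

    # next_version("0.17.3", "feature") == "0.18.0"
    # next_version("0.17.3", "fix") == "0.17.4"
    ```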

    "},{"location":"tools/producer_tools/","title":"Producer Tools","text":""},{"location":"tools/producer_tools/#isthmus","title":"Isthmus","text":"

    Isthmus is an application that serializes SQL to Substrait Protobuf via the Calcite SQL compiler.

    "},{"location":"tools/substrait_validator/","title":"Substrait Validator","text":"

    The Substrait Validator is a tool used to validate Substrait plans as well as print diagnostic information regarding plan validity.

    "},{"location":"tools/third_party_tools/","title":"Third Party Tools","text":""},{"location":"tools/third_party_tools/#substrait-tools","title":"Substrait-tools","text":"

    The substrait-tools python package provides a command line interface for producing/consuming Substrait plans by leveraging the APIs from different producers and consumers.

    "},{"location":"tools/third_party_tools/#substrait-fiddle","title":"Substrait Fiddle","text":"

    Substrait Fiddle is an online tool to share, debug, and prototype Substrait plans.

    The Substrait Fiddle Source is available allowing it to be run in any environment.

    "},{"location":"tutorial/sql_to_substrait/","title":"SQL to Substrait tutorial","text":"

    This is an introductory tutorial to learn the basics of Substrait for readers already familiar with SQL. We will look at how to construct a Substrait plan from an example query.

    We\u2019ll present the Substrait in JSON form to make it relatively readable to newcomers. Typically Substrait is exchanged as a protobuf message, but for debugging purposes it is often helpful to look at a serialized form. Plus, it\u2019s not uncommon for unit tests to represent plans as JSON strings. So if you are developing with Substrait, it\u2019s useful to have experience reading them.

    Note

    Substrait is currently only defined with Protobuf. The JSON provided here is the Protobuf JSON output, but it is not the official Substrait text format. Eventually, Substrait will define its own human-readable text format, but for now this tutorial will make do with what Protobuf provides.

    Substrait is designed to communicate plans (mostly logical plans). Those plans contain types, schemas, expressions, extensions, and relations. We\u2019ll look at them in that order, going from simplest to most complex until we can construct full plans.

    This tutorial won\u2019t cover all the details of each piece, but it will give you an idea of how they connect together. For a detailed reference of each individual field, the best place to look is the protobuf definitions themselves. They represent the source of truth of the spec and are well-commented to address ambiguities.

    "},{"location":"tutorial/sql_to_substrait/#problem-set-up","title":"Problem Set up","text":"

    To learn Substrait, we\u2019ll build up to a specific query. We\u2019ll be using the tables:

    CREATE TABLE orders (\n  product_id: i64 NOT NULL,\n  quantity: i32 NOT NULL,\n  order_date: date NOT NULL,\n  price: decimal(10, 2)\n);\n
    CREATE TABLE products (\n  product_id: i64 NOT NULL,\n  categories: list<string NOT NULL> NOT NULL,\n  details: struct<manufacturer: string, year_created: int32>,\n  product_name: string\n);\n

    This orders table represents events where products were sold, recording how many (quantity) and at what price (price). The products table provides details for each product, with product_id as the primary key.

    And we\u2019ll try to create the query:

    SELECT\n  product_name,\n  product_id,\n  sum(quantity * price) as sales\nFROM\n  orders\nINNER JOIN\n  products\nON\n  orders.product_id = products.product_id\nWHERE\n  -- categories does not contain \"Computers\"\n  INDEX_IN(\"Computers\", categories) IS NULL\nGROUP BY\n  product_name,\n  product_id\n

    The query asks the question: For products that aren\u2019t in the \"Computers\" category, how much has each product generated in sales?

    However, Substrait doesn\u2019t correspond to SQL as much as it does to logical plans. So to be less ambiguous, the plan we are aiming for looks like:

    |-+ Aggregate({sales = sum(quantity_price)}, group_by=(product_name, product_id))\n  |-+ InnerJoin(on=orders.product_id = products.product_id)\n    |- ReadTable(orders)\n    |-+ Filter(INDEX_IN(\"Computers\", categories) IS NULL)\n      |- ReadTable(products)\n
    "},{"location":"tutorial/sql_to_substrait/#types-and-schemas","title":"Types and Schemas","text":"

    As part of the Substrait plan, we\u2019ll need to embed the data types of the input tables. In Substrait, each type is a distinct message, which at a minimum contains a field for nullability. For example, a string field looks like:

    {\n  \"string\": {\n    \"nullability\": \"NULLABILITY_NULLABLE\"\n  }\n}\n

    Nullability is an enum not a boolean, since Substrait allows NULLABILITY_UNSPECIFIED as an option, in addition to NULLABILITY_NULLABLE (nullable) and NULLABILITY_REQUIRED (not nullable).

    Other types such as VarChar and Decimal have other parameters. For example, our orders.price column will be represented as:

    {\n  \"decimal\": {\n    \"precision\": 10,\n    \"scale\": 2,\n    \"nullability\": \"NULLABILITY_NULLABLE\"\n  }\n}\n

    Finally, there are nested compound types such as structs and list types that take other types as parameters. For example, the products.categories column is a list of strings, so it can be represented as:

    {\n  \"list\": {\n    \"type\": {\n      \"string\": {\n        \"nullability\": \"NULLABILITY_REQUIRED\"\n      }\n    },\n    \"nullability\": \"NULLABILITY_REQUIRED\"\n  }\n}\n

    To know what parameters each type can take, refer to the Protobuf definitions in type.proto.

    Schemas of tables can be represented with a NamedStruct message, which is the combination of a struct type containing all the columns and a list of column names. For the orders table, this will look like:

    {\n  \"names\": [\n    \"product_id\",\n    \"quantity\",\n    \"order_date\",\n    \"price\"\n  ],\n  \"struct\": {\n    \"types\": [\n      {\n        \"i64\": {\n          \"nullability\": \"NULLABILITY_REQUIRED\"\n        }\n      },\n      {\n        \"i32\": {\n          \"nullability\": \"NULLABILITY_REQUIRED\"\n        }\n      },\n      {\n        \"date\": {\n          \"nullability\": \"NULLABILITY_REQUIRED\"\n        }\n      },\n      {\n        \"decimal\": {\n          \"precision\": 10,\n          \"scale\": 2,\n          \"nullability\": \"NULLABILITY_NULLABLE\"\n        }\n      }\n    ],\n    \"nullability\": \"NULLABILITY_REQUIRED\"\n  }\n}\n

    Here, names is the names of all fields. In nested schemas, this includes the names of subfields in depth-first order. So for the products table, the details struct field will be included as well as the two subfields (manufacturer and year_created) right after. And because it\u2019s depth first, these subfields appear before product_name. The full schema looks like:

    {\n  \"names\": [\n    \"product_id\",\n    \"categories\",\n    \"details\",\n    \"manufacturer\",\n    \"year_created\",\n    \"product_name\"\n  ],\n  \"struct\": {\n    \"types\": [\n      {\n        \"i64\": {\n          \"nullability\": \"NULLABILITY_REQUIRED\"\n        }\n      },\n      {\n        \"list\": {\n          \"type\": {\n            \"string\": {\n              \"nullability\": \"NULLABILITY_REQUIRED\"\n            }\n          },\n          \"nullability\": \"NULLABILITY_REQUIRED\"\n        }\n      },\n      {\n        \"struct\": {\n          \"types\": [\n            {\n              \"string\": {\n                \"nullability\": \"NULLABILITY_NULLABLE\"\n              }\n            },\n            {\n              \"i32\": {\n                \"nullability\": \"NULLABILITY_NULLABLE\"\n              }\n            }\n          ],\n          \"nullability\": \"NULLABILITY_NULLABLE\"\n        }\n      },\n      {\n        \"string\": {\n          \"nullability\": \"NULLABILITY_NULLABLE\"\n        }\n      }\n    ],\n    \"nullability\": \"NULLABILITY_REQUIRED\"\n  }\n}\n
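    The depth-first flattening of names can be sketched with a small helper (illustrative only, not a Substrait API; the (name, subfields) pair shape is an assumption made for the example):

    ```python
    # Reproduce the depth-first ordering of NamedStruct.names for the
    # products table. Each field is a (name, subfields) pair; subfields has
    # the same shape and is empty for scalar columns.
    def flatten_names(fields):
        names = []
        for name, subfields in fields:
            names.append(name)
            names.extend(flatten_names(subfields))  # subfields come right after
        return names

    products_fields = [
        ("product_id", []),
        ("categories", []),
        ("details", [("manufacturer", []), ("year_created", [])]),
        ("product_name", []),
    ]
    # flatten_names(products_fields) == ["product_id", "categories", "details",
    #                                    "manufacturer", "year_created", "product_name"]
    ```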
    "},{"location":"tutorial/sql_to_substrait/#expressions","title":"Expressions","text":"

    The next basic building block we will need is expressions. Expressions can be one of several things, including:

    • Field references
    • Literal values
    • Functions
    • Subqueries
    • Window Functions

    Since some expressions such as functions can contain other expressions, expressions can be represented as a tree. Literal values and field references typically are the leaf nodes.

    For the expression INDEX_IN(\"Computers\", categories) IS NULL, we have a field reference categories, a literal string \"Computers\", and two functions: INDEX_IN and IS NULL.

    The field reference for categories is represented by:

    {\n  \"selection\": {\n    \"directReference\": {\n      \"structField\": {\n        \"field\": 1\n      }\n    },\n    \"rootReference\": {}\n  }\n}\n

    Whereas SQL references fields by name, Substrait always references fields numerically. This means that a Substrait expression only makes sense relative to a certain schema. As we\u2019ll see later when we discuss relations, for a filter relation this is relative to the input schema, so the 1 here refers to the second field of products.

    Note

    Protobuf may not serialize fields with integer type and value 0, since 0 is the default. So if you instead see \"structField\": {}, know that it is equivalent to \"structField\": { \"field\": 0 }.
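    Both points, numeric references and the omitted-zero default, can be sketched together. This is an illustrative helper, not a Substrait API, and it resolves only against the top-level column names of products.

    ```python
    # Resolve a numeric structField reference against the top-level column
    # names of the products schema (illustrative only).
    products_columns = ["product_id", "categories", "details", "product_name"]

    def referenced_column(selection, columns):
        # .get with a default of 0 mirrors protobuf omitting zero-valued
        # integer fields from its JSON output.
        index = selection["directReference"]["structField"].get("field", 0)
        return columns[index]

    categories_ref = {"directReference": {"structField": {"field": 1}}, "rootReference": {}}
    default_ref = {"directReference": {"structField": {}}, "rootReference": {}}
    # referenced_column(categories_ref, products_columns) == "categories"
    # referenced_column(default_ref, products_columns) == "product_id"
    ```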

    \"Computers\" will be translated to a literal expression:

    {\n  \"literal\": {\n    \"string\": \"Computers\"\n  }\n}\n

    Both IS NULL and INDEX_IN will be scalar function expressions. Available functions in Substrait are defined in extension YAML files contained in https://github.com/substrait-io/substrait/tree/main/extensions. Additional extensions may be created elsewhere. IS NULL is defined as the is_null function in functions_comparison.yaml and INDEX_IN is defined as the index_in function in functions_set.yaml.

    First, the expression for INDEX_IN(\"Computers\", categories) is:

    {\n  \"scalarFunction\": {\n    \"functionReference\": 1,\n    \"outputType\": {\n      \"i64\": {\n        \"nullability\": \"NULLABILITY_NULLABLE\"\n      }\n    },\n    \"arguments\": [\n      {\n        \"value\": {\n          \"literal\": {\n            \"string\": \"Computers\"\n          }\n        }\n      },\n      {\n        \"value\": {\n          \"selection\": {\n            \"directReference\": {\n              \"structField\": {\n                \"field\": 1\n              }\n            },\n            \"rootReference\": {}\n          }\n        }\n      }\n    ]\n  }\n}\n

    functionReference will be explained later in the plans section. For now, understand that it\u2019s an ID that corresponds to an entry in a list of function definitions that we will create later.

    outputType defines the type the function outputs. We know this is a nullable i64 type since that is what the function definition declares in the YAML file.

    arguments defines the arguments being passed into the function; they are matched positionally against the function definition in the YAML file. The two arguments will be familiar as the literal and the field reference we constructed earlier.

    To create the final expression, we just need to wrap this in another scalar function expression for IS NULL.

    {\n  \"scalarFunction\": {\n    \"functionReference\": 2,\n    \"outputType\": {\n      \"bool\": {\n        \"nullability\": \"NULLABILITY_REQUIRED\"\n      }\n    },\n    \"arguments\": [\n      {\n        \"value\": {\n          \"scalarFunction\": {\n            \"functionReference\": 1,\n            \"outputType\": {\n              \"i64\": {\n                \"nullability\": \"NULLABILITY_NULLABLE\"\n              }\n            },\n            \"arguments\": [\n              {\n                \"value\": {\n                  \"literal\": {\n                    \"string\": \"Computers\"\n                  }\n                }\n              },\n              {\n                \"value\": {\n                  \"selection\": {\n                    \"directReference\": {\n                      \"structField\": {\n                        \"field\": 1\n                      }\n                    },\n                    \"rootReference\": {}\n                  }\n                }\n              }\n            ]\n          }\n        }\n      }\n    ]\n  }\n}\n

    To see what other types of expressions are available and what fields they take, see the Expression proto definition in algebra.proto.

    "},{"location":"tutorial/sql_to_substrait/#relations","title":"Relations","text":"

    In most SQL engines, a logical or physical plan is represented as a tree of nodes, such as filter, project, scan, or join, in which data flows from the leaves up toward the root. In Substrait, each of these nodes is a Relation.

    A relation that takes another relation as input will contain (or refer to) that relation. This is usually a field called input, but sometimes different names are used in relations that take multiple inputs. For example, join relations take two inputs, with field names left and right. In JSON, the rough layout for the relations in our plan will look like:

    {\n    \"aggregate\": {\n        \"input\": {\n            \"join\": {\n                \"left\": {\n                    \"filter\": {\n                        \"input\": {\n                            \"read\": {\n                                ...\n                            }\n                        },\n                        ...\n                    }\n                },\n                \"right\": {\n                    \"read\": {\n                        ...\n                    }\n                },\n                ...\n            }\n        },\n        ...\n    }\n}\n
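    A consumer walking this nested structure might do something like the following sketch (illustrative Python, not a Substrait API; it assumes each relation object has exactly one key naming the relation type and uses the input/left/right field convention described above):

    ```python
    # Walk the nested relation structure root-to-leaf, collecting relation
    # type names in depth-first order.
    INPUT_FIELDS = ("input", "left", "right")

    def list_relations(rel):
        (kind, body), = rel.items()  # each relation dict has a single key
        found = [kind]
        for field in INPUT_FIELDS:
            if field in body:
                found.extend(list_relations(body[field]))
        return found

    outline = {"aggregate": {"input": {"join": {
        "left": {"filter": {"input": {"read": {}}}},
        "right": {"read": {}},
    }}}}
    # list_relations(outline) == ["aggregate", "join", "filter", "read", "read"]
    ```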

    For our plan, we need to define the read relations for each table, a filter relation to exclude the \"Computer\" category from the products table, a join relation to perform the inner join, and finally an aggregate relation to compute the total sales.

    The read relations are composed of a baseSchema field and a namedTable field. The type of read here is a named table, so the namedTable field is present, with names containing the list of name segments (e.g. my_database.my_table). Other types of reads include virtual tables (a table of literal values embedded in the plan) and a list of files. See Read Definition Types for more details. The baseSchema is the schema we defined earlier, and namedTable is just the name of the table. So for reading the orders table, the relation looks like:

    {\n  \"read\": {\n    \"namedTable\": {\n      \"names\": [\n        \"orders\"\n      ]\n    },\n    \"baseSchema\": {\n      \"names\": [\n        \"product_id\",\n        \"quantity\",\n        \"order_date\",\n        \"price\"\n      ],\n      \"struct\": {\n        \"types\": [\n          {\n            \"i64\": {\n              \"nullability\": \"NULLABILITY_REQUIRED\"\n            }\n          },\n          {\n            \"i32\": {\n              \"nullability\": \"NULLABILITY_REQUIRED\"\n            }\n          },\n          {\n            \"date\": {\n              \"nullability\": \"NULLABILITY_REQUIRED\"\n            }\n          },\n          {\n            \"decimal\": {\n              \"precision\": 10,\n              \"scale\": 2,\n              \"nullability\": \"NULLABILITY_NULLABLE\"\n            }\n          }\n        ],\n        \"nullability\": \"NULLABILITY_REQUIRED\"\n      }\n    }\n  }\n}\n
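    As a sketch of how the name segments work (illustrated in Python): a qualified table name contributes one entry per segment to namedTable.names, so the tutorial's unqualified orders table yields a single-element list, while a hypothetical my_database.my_table yields two.

```python
# Hypothetical qualified name, for illustration; the tutorial's table is just "orders".
qualified = "my_database.my_table"
segments = qualified.split(".")  # each segment becomes one entry in namedTable.names

single = "orders".split(".")  # unqualified names produce a one-element list
```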

    Read relations are leaf nodes. Leaf nodes don\u2019t depend on any other node for data and usually represent a source of data in our plan. Leaf nodes are then typically used as input for other nodes that manipulate the data. For example, our filter node will take the products read relation as an input.

    The filter node will also take a condition field, which will just be the expression we constructed earlier.

    {\n  \"filter\": {\n    \"input\": {\n      \"read\": { ... }\n    },\n    \"condition\": {\n      \"scalarFunction\": {\n        \"functionReference\": 2,\n        \"outputType\": {\n          \"bool\": {\n            \"nullability\": \"NULLABILITY_REQUIRED\"\n          }\n        },\n        \"arguments\": [\n          {\n            \"value\": {\n              \"scalarFunction\": {\n                \"functionReference\": 1,\n                \"outputType\": {\n                  \"i64\": {\n                    \"nullability\": \"NULLABILITY_NULLABLE\"\n                  }\n                },\n                \"arguments\": [\n                  {\n                    \"value\": {\n                      \"literal\": {\n                        \"string\": \"Computers\"\n                      }\n                    }\n                  },\n                  {\n                    \"value\": {\n                      \"selection\": {\n                        \"directReference\": {\n                          \"structField\": {\n                            \"field\": 1\n                          }\n                        },\n                        \"rootReference\": {}\n                      }\n                    }\n                  }\n                ]\n              }\n            }\n          }\n        ]\n      }\n    }\n  }\n}\n

    The join relation will take two inputs. The left field will hold the read relation for orders and the right field will hold the filter relation (from products). The type field is an enum that allows us to specify that we want an inner join. Finally, the expression field contains the expression to use in the join. Since we haven\u2019t used the equal() function yet, we use the reference number 3 here. (Again, we\u2019ll see at the end with plans how these functions are resolved.) The arguments refer to fields 0 and 4, which are indices into the combined schema formed from the left and right inputs. We\u2019ll discuss later in Field Indices where these come from.

    {\n  \"join\": {\n    \"left\": { ... },\n    \"right\": { ... },\n    \"type\": \"JOIN_TYPE_INNER\",\n    \"expression\": {\n      \"scalarFunction\": {\n        \"functionReference\": 3,\n        \"outputType\": {\n          \"bool\": {\n            \"nullability\": \"NULLABILITY_NULLABLE\"\n          }\n        },\n        \"arguments\": [\n          {\n            \"value\": {\n              \"selection\": {\n                \"directReference\": {\n                  \"structField\": {\n                    \"field\": 0\n                  }\n                },\n                \"rootReference\": {}\n              }\n            }\n          },\n          {\n            \"value\": {\n              \"selection\": {\n                \"directReference\": {\n                  \"structField\": {\n                    \"field\": 4\n                  }\n                },\n                \"rootReference\": {}\n              }\n            }\n          }\n        ]\n      }\n    }\n  }\n}\n

    The final aggregate relation requires two things beyond its input. First are the groupings. We\u2019ll use a single grouping expression containing references to the fields product_name and product_id. (Multiple grouping expressions can be used to do cube aggregations.)

    For measures, we\u2019ll need to define sum(quantity * price) as sales. Substrait is stricter about data types: quantity is an integer while price is a decimal. So we\u2019ll first need to cast quantity to a decimal, making the Substrait expression more like sum(multiply(cast(decimal(10, 2), quantity), price)). Both sum() and multiply() are functions, defined in functions_arithmetic_decimal.yaml. However, cast() is a special expression type in Substrait rather than a function.

    Finally, the naming with as sales will be handled at the end as part of the plan, so that\u2019s not part of the relation. Since we are always using field indices to refer to fields, Substrait doesn\u2019t record any intermediate field names.
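    The strictly-typed sum(multiply(cast(...), price)) computation can be sketched in Python with the decimal module. The rows and values here are invented for illustration, not data from the tutorial's tables:

```python
from decimal import Decimal

# Illustrative (quantity, price) rows; quantity is an integer, price a decimal.
rows = [(2, Decimal("19.99")), (1, Decimal("5.00"))]

# Mirror cast(quantity AS decimal(10, 2)) before multiplying, then sum.
sales = sum(Decimal(q).quantize(Decimal("0.01")) * price for q, price in rows)
```

Like Substrait, Python's Decimal type makes the integer-to-decimal conversion an explicit step rather than an implicit coercion.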

    {\n  \"aggregate\": {\n    \"input\": { ... },\n    \"groupings\": [\n      {\n        \"groupingExpressions\": [\n          {\n            \"value\": {\n              \"selection\": {\n                \"directReference\": {\n                  \"structField\": {\n                    \"field\": 0\n                  }\n                },\n                \"rootReference\": {}\n              }\n            }\n          },\n          {\n            \"value\": {\n              \"selection\": {\n                \"directReference\": {\n                  \"structField\": {\n                    \"field\": 7\n                  }\n                },\n                \"rootReference\": {}\n              }\n            }\n          }\n        ]\n      }\n    ],\n    \"measures\": [\n      {\n        \"measure\": {\n          \"functionReference\": 4,\n          \"outputType\": {\n            \"decimal\": {\n              \"precision\": 38,\n              \"scale\": 2,\n              \"nullability\": \"NULLABILITY_NULLABLE\"\n            }\n          },\n          \"arguments\": [\n            {\n              \"value\": {\n                \"scalarFunction\": {\n                  \"functionReference\": 5,\n                  \"outputType\": {\n                    \"decimal\": {\n                      \"precision\": 38,\n                      \"scale\": 2,\n                      \"nullability\": \"NULLABILITY_NULLABLE\"\n                    }\n                  },\n                  \"arguments\": [\n                    {\n                      \"value\": {\n                        \"cast\": {\n                          \"type\": {\n                            \"decimal\": {\n                              \"precision\": 10,\n                              \"scale\": 2,\n                              \"nullability\": \"NULLABILITY_REQUIRED\"\n                            }\n                          },\n                          \"input\": {\n                            
\"selection\": {\n                              \"directReference\": {\n                                \"structField\": {\n                                  \"field\": 1\n                                }\n                              },\n                              \"rootReference\": {}\n                            }\n                          }\n                        }\n                      }\n                    },\n                    {\n                      \"value\": {\n                        \"selection\": {\n                          \"directReference\": {\n                            \"structField\": {\n                              \"field\": 3\n                            }\n                          },\n                          \"rootReference\": {}\n                        }\n                      }\n                    }\n                  ]\n                }\n              }\n            }\n          ]\n        }\n      }\n    ]\n  }\n}\n
    "},{"location":"tutorial/sql_to_substrait/#field-indices","title":"Field indices","text":"

    So far, we have glossed over the field indices. Now that we\u2019ve built up each of the relations, it will be a bit easier to explain them.

    Throughout the plan, data always has some implicit schema, which is modified by each relation. Often, the schema can change within a relation; we\u2019ll discuss an example in the next section. Each relation has its own rules for how schemas are modified, called the output order or emit order. For the purposes of our query, the relevant rules are:

    • For Read relations, their output schema is the schema of the table.
    • For Filter relations, the output schema is the same as the input schema.
    • For Join relations, the input schema is the concatenation of the left and then the right schemas. The output schema is the same.
    • For Aggregate relations, the output schema is the group by fields followed by the measures.
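    These rules can be sketched as a small helper. This is a hypothetical illustration; the products schema shown is assumed for brevity and is not the tutorial's full table:

```python
def output_schema(rel, inputs=(), **kw):
    # Apply the emit-order rules listed above for each relation type.
    if rel == "read":
        return list(kw["base_schema"])     # output is the table schema
    if rel == "filter":
        return list(inputs[0])             # output equals input
    if rel == "join":
        return list(inputs[0]) + list(inputs[1])  # left then right, concatenated
    if rel == "aggregate":
        return list(kw["groupings"]) + list(kw["measures"])
    raise ValueError(rel)

orders = output_schema("read", base_schema=["product_id", "quantity", "order_date", "price"])
# Assumed products schema, for illustration only.
products = output_schema("read", base_schema=["product_id", "product_name"])
joined = output_schema("join", [orders, output_schema("filter", [products])])
# joined[0] is orders.product_id; joined[4] is the first products column.
```

This shows why the join condition can reference field 4: orders contributes indices 0 through 3, so the right input's columns start at index 4.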

    Note

    Sometimes it can be hard to tell what the implicit schema is. For help determining that, consider using the substrait-validator tool, described in Next Steps.

    The diagram below shows the mapping of field indices within each relation and how each field reference shows up in each relation\u2019s properties.

    "},{"location":"tutorial/sql_to_substrait/#column-selection-and-emit","title":"Column selection and emit","text":"

    As written, the aggregate output schema will be:

    0: product_id: i64\n1: product_name: string\n2: sales: decimal(38, 2)\n

    But we want product_name to come before product_id in our output. How do we reorder those columns?

    You might be tempted to add a Project relation at the end. However, the project relation only adds columns; it is not responsible for subsetting or reordering columns.

    Instead, any relation can reorder or subset columns through the emit property. By default, it is set to direct, which outputs all columns \u201cas is\u201d. But it can also be specified as a sequence of field indices.

    For simplicity, we will add this to the final aggregate relation. We could also add it to all relations, only selecting the fields we strictly need in later relations. Indeed, a good optimizer would probably do that to our plan. And for some engines, the emit property is only valid within a project relation, so in those cases we would need to add that relation in combination with emit. But to keep things simple, we\u2019ll limit the columns at the end within the aggregation relation.

    For our final column selection, we\u2019ll modify the top-level relation to be:

    {\n  \"aggregate\": {\n    \"input\": { ... },\n    \"groupings\": [ ... ],\n    \"measures\": [ ... ],\n    \"common\": {\n      \"emit\": {\n        \"outputMapping\": [1, 0, 2]\n      }\n    }\n  }\n}\n
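    Applying an outputMapping is just index selection over the relation's direct output. A minimal sketch:

```python
def apply_emit(direct_columns, output_mapping):
    # emit selects and reorders the direct output columns by field index.
    return [direct_columns[i] for i in output_mapping]

# The aggregate's direct output, reordered by outputMapping [1, 0, 2].
final = apply_emit(["product_id", "product_name", "sales"], [1, 0, 2])
```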
    "},{"location":"tutorial/sql_to_substrait/#plans","title":"Plans","text":"

    Now that we\u2019ve constructed our relations, we can put it all together into a plan. Substrait plans are the only messages that can be sent and received on their own. Recall that earlier, we created function references that point to functions defined in extension YAML files, but so far there\u2019s been no place to tell a consumer what those function reference IDs mean or which extensions we are using. That information belongs at the plan level.

    The overall layout for a plan is

    {\n  \"extensionUris\": [ ... ],\n  \"extensions\": [ ... ],\n  \"relations\": [\n    {\n      \"root\": {\n        \"names\": [\n          \"product_name\",\n          \"product_id\",\n          \"sales\"\n        ],\n        \"input\": { ... }\n      }\n    }\n  ]\n}\n

    The relations field is a list of Root relations. Most queries only have one root relation, but the spec allows for multiple so a common plan could be referenced by other plans, sort of like a CTE (Common Table Expression) from SQL. The root relation provides the final column names for our query. The input to this relation is our aggregate relation (which contains all the other relations as children).

    For extensions, we need to provide extensionUris with the locations of the YAML files we used and extensions with the list of functions we used and which extension they come from.

    In our query, we used:

    • index_in (1), from functions_set.yaml,
    • is_null (2), from functions_comparison.yaml,
    • equal (3), from functions_comparison.yaml,
    • sum (4), from functions_arithmetic_decimal.yaml,
    • multiply (5), from functions_arithmetic_decimal.yaml.

    So first we can create the three extension URIs:

    [\n  {\n    \"extensionUriAnchor\": 1,\n    \"uri\": \"https://github.com/substrait-io/substrait/blob/main/extensions/functions_set.yaml\"\n  },\n  {\n    \"extensionUriAnchor\": 2,\n    \"uri\": \"https://github.com/substrait-io/substrait/blob/main/extensions/functions_comparison.yaml\"\n  },\n  {\n    \"extensionUriAnchor\": 3,\n    \"uri\": \"https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic_decimal.yaml\"\n  }\n]\n

    Then we can create the extensions:

    [\n  {\n    \"extensionFunction\": {\n      \"extensionUriReference\": 1,\n      \"functionAnchor\": 1,\n      \"name\": \"index_in\"\n    }\n  },\n  {\n    \"extensionFunction\": {\n      \"extensionUriReference\": 2,\n      \"functionAnchor\": 2,\n      \"name\": \"is_null\"\n    }\n  },\n  {\n    \"extensionFunction\": {\n      \"extensionUriReference\": 2,\n      \"functionAnchor\": 3,\n      \"name\": \"equal\"\n    }\n  },\n  {\n    \"extensionFunction\": {\n      \"extensionUriReference\": 3,\n      \"functionAnchor\": 4,\n      \"name\": \"sum\"\n    }\n  },\n  {\n    \"extensionFunction\": {\n      \"extensionUriReference\": 3,\n      \"functionAnchor\": 5,\n      \"name\": \"multiply\"\n    }\n  }\n]\n
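    A consumer resolves each functionReference in the plan by looking up its anchor in these declarations. A minimal sketch of that resolution, with the URIs abbreviated to file names for readability:

```python
# Anchor tables transcribed from the extension declarations above.
extension_uris = {
    1: "functions_set.yaml",
    2: "functions_comparison.yaml",
    3: "functions_arithmetic_decimal.yaml",
}
extension_functions = {
    1: (1, "index_in"),
    2: (2, "is_null"),
    3: (2, "equal"),
    4: (3, "sum"),
    5: (3, "multiply"),
}

def resolve(function_reference):
    # Map a functionReference anchor back to (extension file, function name).
    uri_anchor, name = extension_functions[function_reference]
    return extension_uris[uri_anchor], name
```

For example, the join condition's functionReference of 3 resolves to the equal function from the comparison extension.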

    Once we\u2019ve added our extensions, the plan is complete. Our plan, written out in full, is: final_plan.json.

    "},{"location":"tutorial/sql_to_substrait/#next-steps","title":"Next steps","text":"

    Validate and introspect plans using substrait-validator. Amongst other things, this tool can show what the current schema and column indices are at each point in the plan. Try downloading the final plan JSON above and generating an HTML report on the plan with:

    substrait-validator final_plan.json --out-file output.html\n
    "},{"location":"types/type_classes/","title":"Type Classes","text":"

    In Substrait, the \u201cclass\u201d of a type, not to be confused with the concept from object-oriented programming, defines the set of non-null values that instances of a type may assume.

    Implementations of a Substrait type must support at least this set of values, but may include more; for example, an i8 could be represented using the same in-memory format as an i32, as long as functions operating on i8 values within [-128..127] behave as specified (in this case, this means 8-bit overflow must work as expected). Operating on values outside the specified range is unspecified behavior.
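    The 8-bit overflow behavior mentioned above can be sketched as a wraparound into the i8 value range (a hypothetical helper, not part of any Substrait library):

```python
def wrap_i8(value):
    # Reduce an arbitrary integer into [-128..127] with two's-complement wraparound.
    return ((value + 128) % 256) - 128
```

An implementation backing i8 with a wider in-memory integer would need to apply this reduction so that, for example, 127 + 1 wraps to -128 as specified.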

    "},{"location":"types/type_classes/#simple-types","title":"Simple Types","text":"

    Simple type classes are those that don\u2019t support any form of configuration. For simplicity, any generic type that has only a small number of discrete implementations is declared directly, as opposed to via configuration.

    Type Name Description Protobuf representation for literals boolean A value that is either True or False. bool i8 A signed integer within [-128..127], typically represented as an 8-bit two\u2019s complement number. int32 i16 A signed integer within [-32,768..32,767], typically represented as a 16-bit two\u2019s complement number. int32 i32 A signed integer within [-2,147,483,648..2,147,483,647], typically represented as a 32-bit two\u2019s complement number. int32 i64 A signed integer within [\u22129,223,372,036,854,775,808..9,223,372,036,854,775,807], typically represented as a 64-bit two\u2019s complement number. int64 fp32 A 4-byte single-precision floating point number with the same range and precision as defined for the IEEE 754 32-bit floating-point format. float fp64 An 8-byte double-precision floating point number with the same range and precision as defined for the IEEE 754 64-bit floating-point format. double string A unicode string of text, [0..2,147,483,647] UTF-8 bytes in length. string binary A binary value, [0..2,147,483,647] bytes in length. binary timestamp A naive timestamp with microsecond precision. Does not include timezone information and can thus not be unambiguously mapped to a moment on the timeline without context. Similar to naive datetime in Python. int64 microseconds since 1970-01-01 00:00:00.000000 (in an unspecified timezone) timestamp_tz A timezone-aware timestamp with microsecond precision. Similar to aware datetime in Python. int64 microseconds since 1970-01-01 00:00:00.000000 UTC date A date within [1000-01-01..9999-12-31]. int32 days since 1970-01-01 time A time since the beginning of any day. Range of [0..86,399,999,999] microseconds; leap seconds need not be supported. int64 microseconds past midnight interval_year Interval year to month. Supports a range of [-10,000..10,000] years with month precision (= [-120,000..120,000] months). 
Usually stored as separate integers for years and months, but only the total number of months is significant, i.e. 1y 0m is considered equal to 0y 12m or 1001y -12000m. int32 years and int32 months, with the added constraint that each component can never independently specify more than 10,000 years, even if the components have opposite signs (e.g. -10000y 200000m is not allowed) interval_day Interval day to second. Supports a range of [-3,650,000..3,650,000] days with microsecond precision (= [-315,360,000,000,000,000..315,360,000,000,000,000] microseconds). Usually stored as separate integers for various components, but only the total number of microseconds is significant, i.e. 1d 0s is considered equal to 0d 86400s. int32 days, int32 seconds, and int32 microseconds, with the added constraint that each component can never independently specify more than 10,000 years, even if the components have opposite signs (e.g. 3650001d -86400s 0us is not allowed) uuid A universally-unique identifier composed of 128 bits. Typically presented to users in the following hexadecimal format: c48ffa9e-64f4-44cb-ae47-152b4e60e77b. Any 128-bit value is allowed, without specific adherence to RFC4122. 16-byte binary"},{"location":"types/type_classes/#compound-types","title":"Compound Types","text":"

    Compound type classes are type classes that need to be configured by means of a parameter pack.

    Type Name Description Protobuf representation for literals FIXEDCHAR<L> A fixed-length unicode string of L characters. L must be within [1..2,147,483,647]. L-character string VARCHAR<L> A unicode string of at most L characters. L must be within [1..2,147,483,647]. string with at most L characters FIXEDBINARY<L> A binary string of L bytes. When casting, values shorter than L are padded with zeros, and values longer than L are right-trimmed. L-byte bytes DECIMAL<P, S> A fixed-precision decimal value having precision (P, number of digits) <= 38 and scale (S, number of fractional digits) 0 <= S <= P. 16-byte bytes representing a little-endian 128-bit integer, to be divided by 10^S to get the decimal value STRUCT<T1,\u2026,Tn> A list of types in a defined order. repeated Literal, types matching T1..Tn NSTRUCT<N:T1,\u2026,N:Tn> Pseudo-type: A struct that maps unique names to value types. Each name is a UTF-8-encoded string. Each value can have a distinct type. Note that NSTRUCT is actually a pseudo-type, because Substrait\u2019s core type system is based entirely on ordinal positions, not named fields. Nonetheless, when working with systems outside Substrait, names are important. n/a LIST<T> A list of values of type T. The list can be between [0..2,147,483,647] values in length. repeated Literal, all types matching T MAP<K, V> An unordered list of type K keys with type V values. Keys may be repeated. While the key type could be nullable, keys may not be null. repeated KeyValue (in turn two Literals), all key types matching K and all value types matching V PRECISIONTIMESTAMP<P> A timestamp with fractional second precision (P, number of digits) 0 <= P <= 9. Does not include timezone information and can thus not be unambiguously mapped to a moment on the timeline without context. Similar to naive datetime in Python. 
uint64 microseconds or nanoseconds since 1970-01-01 00:00:00.000000000 (in an unspecified timezone) PRECISIONTIMESTAMPTZ<P> A timezone-aware timestamp, with fractional second precision (P, number of digits) 0 <= P <= 9. Similar to aware datetime in Python. uint64 microseconds or nanoseconds since 1970-01-01 00:00:00.000000000 UTC"},{"location":"types/type_classes/#user-defined-types","title":"User-Defined Types","text":"

    User-defined type classes can be created using a combination of pre-defined types. User-defined types are defined as part of simple extensions. An extension can declare an arbitrary number of user-defined extension types. Once a type has been declared, it can be used in function declarations.

    A YAML example of an extension type is below:

    name: point\nstructure:\n  longitude: i32\n  latitude: i32\n

    This declares a new type (namespaced to the associated YAML file) called \u201cpoint\u201d. This type is composed of two i32 values named longitude and latitude.

    "},{"location":"types/type_classes/#structure-and-opaque-types","title":"Structure and opaque types","text":"

    The name-type object notation used above is syntactic sugar for NSTRUCT<longitude: i32, latitude: i32>. The following means the same thing:

    name: point\nstructure: \"NSTRUCT<longitude: i32, latitude: i32>\"\n

    The structure field of a type is only intended to inform systems that don\u2019t have built-in support for the type how they can transfer the data type from one point to another without unnecessary serialization/deserialization and without loss of type safety. Note that it is currently not possible to \u201cunpack\u201d a user-defined type class into its structure type or components thereof using FieldReferences or any other specialized record expression; if support for this is desired for a particular type, this can be accomplished with an extension function.

    The structure field is optional. If it is not specified, the type class is considered to be fully opaque. This implies that a system without built-in support for the type cannot manipulate values in any way, including moving and cloning them. This may be useful for exotic, context-sensitive types, such as raw pointers or identifiers that cannot be cloned.

    Note however that the vast majority of types can be trivially moved and copied, even if they cannot be precisely represented using Substrait\u2019s built-in types. In this case, it is recommended to use binary or FIXEDBINARY<n> (where n is the size of the type) as the structure type. For example, an unsigned 32-bit integer type could be defined as follows:

    name: u32\nstructure: \"FIXEDBINARY<4>\"\n

    In this case, i32 might also be used.
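    A sketch of moving such a u32 value through its FIXEDBINARY<4> structure type. The little-endian byte order here is an assumption for illustration; the structure type only fixes the width, not the endianness:

```python
import struct

def u32_to_fixedbinary4(value: int) -> bytes:
    # Pack an unsigned 32-bit integer into the 4-byte structure type
    # (byte order chosen arbitrarily for this sketch).
    return struct.pack("<I", value)

def u32_from_fixedbinary4(raw: bytes) -> int:
    # Recover the unsigned 32-bit integer from its 4-byte transport form.
    return struct.unpack("<I", raw)[0]
```

A system without built-in u32 support can shuttle these 4 bytes around losslessly, which is exactly what the structure field is meant to enable.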

    "},{"location":"types/type_classes/#literals","title":"Literals","text":"

    Literals for user-defined types are represented using protobuf Any messages.

    "},{"location":"types/type_classes/#compound-user-defined-types","title":"Compound User-Defined Types","text":"

    User-defined types may be turned into compound types by requiring parameters to be passed to them. The supported \u201cmeta-types\u201d for parameters are data types (like those used in LIST, MAP, and STRUCT), booleans, integers, enumerations, and strings. Using parameters, we could redefine \u201cpoint\u201d with different types of coordinates. For example:

    name: point\nparameters:\n  - name: T\n    description: |\n      The type used for the longitude and latitude\n      components of the point.\n    type: dataType\n

    or:

    name: point\nparameters:\n  - name: coordinate_type\n    type: enumeration\n    options:\n      - integer\n      - double\n

    or:

    name: point\nparameters:\n  - name: LONG\n    type: dataType\n  - name: LAT\n    type: dataType\n

    We can\u2019t specify the internal structure in this case, because there is currently no support for derived types in the structure.

    The allowed range can be limited for integer parameters. For example:

    name: vector\nparameters:\n  - name: T\n    type: dataType\n  - name: dimensions\n    type: integer\n    min: 2\n    max: 3\n

    This specifies a vector that can be either 2- or 3-dimensional. Note however that it\u2019s not currently possible to put constraints on data type, string, or (technically) boolean parameters.

    Similar to function arguments, the last parameter may be specified to be variadic, allowing it to be specified one or more times instead of only once. For example:

    name: union\nparameters:\n  - name: T\n    type: dataType\nvariadic: true\n

    This defines a type that can be parameterized with one or more other data types, for example union<i32, i64> but also union<bool>. Zero or more is also possible, by making the last parameter optional:

    name: tuple\nparameters:\n  - name: T\n    type: dataType\n    optional: true\nvariadic: true\n

    This would also allow for tuple<>, to define a zero-tuple.

    "},{"location":"types/type_parsing/","title":"Type Syntax Parsing","text":"

    In many places, it is useful to have a human-readable string representation of data types. Substrait has a custom syntax for type declaration. The basic structure of a type declaration is:

    name?[variation]<param0,...,paramN>\n

    The components of this expression are:

    Component Description Required Name Each type has a name. A type is expressed by providing a name. This name can be expressed in arbitrary case (e.g. varchar and vArChAr are equivalent) although lowercase is preferred. Nullability indicator A type is either non-nullable or nullable. To express nullability, a question mark is added after the type name (before any parameters). Optional, defaults to non-nullable Variation When expressing a type, a user can define the type based on a type variation. Some systems use type variations to describe different underlying representations of the same data type. This is expressed as a bracketed integer such as [2]. Optional, defaults to [0] Parameters Compound types may have one or more configurable properties. The two main types of properties are integer and type properties. The parameters for each type correspond to a list of known properties associated with a type as declared in the order defined in the type specification. For compound types (types that contain types), the data type syntax will include nested type declarations. The one exception is structs, which are further outlined below. Required where parameters are defined"},{"location":"types/type_parsing/#grammars","title":"Grammars","text":"

    It is relatively easy in most languages to produce simple parsers & emitters for the type syntax. To make that easier, Substrait also includes an ANTLR grammar to ease consumption and production of types. (The grammar also supports an entire language for representing plans as text.)
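    A minimal hand-rolled parser for the flat (non-nested) form of this syntax might look like the following sketch. It does not handle nested parameters such as list<struct<...>>; for anything real, prefer the ANTLR grammar:

```python
import re

# name, optional "?" nullability, optional "[n]" variation, optional "<...>" parameters
TYPE_RE = re.compile(r"^(?P<name>\w+)(?P<null>\?)?(?:\[(?P<var>\d+)\])?(?:<(?P<params>.*)>)?$")

def parse_type(text):
    # Parse a flat type expression like "varchar?[2]<10>".
    m = TYPE_RE.match(text.strip())
    if not m:
        raise ValueError(f"unparseable type: {text!r}")
    params = m.group("params")
    return {
        "name": m.group("name").lower(),       # names are case-insensitive
        "nullable": m.group("null") is not None,
        "variation": int(m.group("var") or 0), # defaults to [0]
        "parameters": [p.strip() for p in params.split(",")] if params else [],
    }
```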

    "},{"location":"types/type_parsing/#structs-named-structs","title":"Structs & Named Structs","text":"

    Structs are unique from other types because they have an arbitrary number of parameters. The parameters are recursive and may include their own subproperties. Struct parsing is declared in the following two ways:

    YAMLText Format Examples
    # Struct\nstruct?[variation]<type0, type1,..., typeN>\n\n# Named Struct\nnstruct?[variation]<name0:type0, name1:type1,..., nameN:typeN>\n
    // Struct\nstruct?<string, i8, i32?, timestamp_tz>\n\n// Named structs are not yet supported in the text format.\n

    In the normal (non-named) form, struct declares a set of types that are fields within that struct. In the named struct form, the parameters are formed by tuples of names + types, delineated by a colon. Names that are composed only of numbers and letters can be left unquoted. For other characters, names should be quoted with double quotes and use backslash for double-quote escaping.

    Note that in the core Substrait algebra, fields are unnamed and references are always based on zero-indexed ordinal positions. However, data inputs must declare name-to-ordinal mappings and outputs must declare ordinal-to-name mappings. As such, Substrait also provides a named struct, which is a pseudo-type that is useful for human consumption. Outside these places, most structs in a Substrait plan are plain structs, not named structs. The two cannot be used interchangeably.

    "},{"location":"types/type_parsing/#other-complex-types","title":"Other Complex Types","text":"

    Similar to structs, maps and lists can also have types as their parameters. Type references may be recursive. The key for a map is typically a simple type, but this is not required.

    YAMLText Format Examples
    list?<type>\nmap<type0, type1>\n
    list?<list<string>>\nlist<struct<string, i32>>\nmap<i32?, list<map<i32, string?>>>\n
    "},{"location":"types/type_system/","title":"Type System","text":"

    Substrait tries to cover the most common types used in data manipulation. Types beyond this common core may be represented using simple extensions.

    Substrait types fundamentally consist of four components:

    Component Condition Examples Description Class Always i8, string, STRUCT, extensions Together with the parameter pack, describes the set of non-null values supported by the type. Subdivided into simple and compound type classes. Nullability Always Either NULLABLE (? suffix) or REQUIRED (no suffix) Describes whether values of this type can be null. Note that null is considered to be a special value of a nullable type, rather than the only value of a special null type. Variation Always No suffix or explicitly [0] (system-preferred), or an extension Allows different variations of the same type class to exist in a system at a time, usually distinguished by in-memory format. Parameters Compound types only <10, 2> (for DECIMAL), <i32, string> (for STRUCT) Some combination of zero or more data types or integers. The expected set of parameters and the significance of each parameter depends on the type class.

    Refer to Type Parsing for a description of the syntax used to describe types.

    Note

    Substrait employs a strict type system without any coercion rules. All changes in types must be made explicit via cast expressions.

    "},{"location":"types/type_variations/","title":"Type Variations","text":"

    Type variations may be used to represent differences in representation between different consumers. For example, an engine might support dictionary encoding for a string, or could be using either a row-wise or columnar representation of a struct. All variations of a type are expected to have the same semantics when operated on by functions or other expressions.

    All variations except the \u201csystem-preferred\u201d variation (a.k.a. [0], see Type Parsing) must be defined using simple extensions. The key properties of these variations are:

    Property Description Base Type Class The type class that this variation belongs to. Name The name used to reference this type. Should be unique within type variations for this parent type within a simple extension. Description A human description of the purpose of this type variation. Function Behavior INHERITS or SEPARATE: whether functions that support the system-preferred variation implicitly also support this variation, or whether functions should be resolved independently. For example, if one has the function add(i8,i8) defined and then defines an i8 variation, this determines whether the i8 variation can be bound to the base add operation (inherits) or whether a specialized version of add needs to be defined specifically for this variation (separate). Defaults to inherits."}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Substrait: Cross-Language Serialization for Relational Algebra","text":""},{"location":"#what-is-substrait","title":"What is Substrait?","text":"
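The INHERITS/SEPARATE function behavior can be sketched as a toy resolver (illustrative only; the `#dict` variation tag and the registry layout are invented for this example):

```python
# Hypothetical registry keyed by function name + argument type classes.
registry = {("add", ("i8", "i8")): "base_add"}

def resolve(name, arg_types, variation_behavior="INHERITS"):
    """Resolve a call whose arguments use a non-system-preferred variation."""
    # Strip the (invented) "#variation" tag to recover the base type class.
    base_key = (name, tuple(t.split("#")[0] for t in arg_types))
    if variation_behavior == "INHERITS":
        # The variation implicitly reuses the function defined for the base class.
        return registry.get(base_key)
    # SEPARATE: only an exact, variation-specific registration matches.
    return registry.get((name, tuple(arg_types)))

print(resolve("add", ["i8#dict", "i8#dict"], "INHERITS"))  # base_add
print(resolve("add", ["i8#dict", "i8#dict"], "SEPARATE"))  # None
```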

    Substrait is a format for describing compute operations on structured data. It is designed for interoperability across different languages and systems.

    "},{"location":"#how-does-it-work","title":"How does it work?","text":"

    Substrait provides a well-defined, cross-language specification for data compute operations. This includes a consistent declaration of common operations, custom operations, and one or more serialized representations of this specification. The spec focuses on the semantics of each operation. In addition to the specification, the Substrait ecosystem also includes a number of libraries and useful tools.

    We highly recommend the tutorial to learn how a Substrait plan is constructed.

    "},{"location":"#benefits","title":"Benefits","text":"
    • Avoids every system needing to create a communication method between every other system \u2013 each system merely supports ingesting and producing Substrait and it instantly becomes a part of the greater ecosystem.
    • Makes every part of the system upgradable. There\u2019s a new query engine that\u2019s ten times faster? Just plug it in!
    • Enables heterogeneous environments \u2013 run on a cluster of an unknown set of execution engines!
    • The text version of the Substrait plan allows you to quickly see how a plan functions without needing a visualizer (although there are Substrait visualizers as well!).
    "},{"location":"#example-use-cases","title":"Example Use Cases","text":"
    • Communicate a compute plan between a SQL parser and an execution engine (e.g. Calcite SQL parsing to Arrow C++ compute kernel)
    • Serialize a plan that represents a SQL view for consistent use in multiple systems (e.g. Iceberg views in Spark and Trino)
    • Submit a plan to different execution engines (e.g. Datafusion and Postgres) and get a consistent interpretation of the semantics.
    • Create an alternative plan generation implementation that can connect an existing end-user compute expression system to an existing end-user processing engine (e.g. Pandas operations executed inside SingleStore)
    • Build a pluggable plan visualization tool (e.g. D3 based plan visualizer)
    "},{"location":"about/","title":"Substrait: Cross-Language Serialization for Relational Algebra","text":""},{"location":"about/#project-vision","title":"Project Vision","text":"

    The Substrait project aims to create a well-defined, cross-language specification for data compute operations. The specification declares a set of common operations, defines their semantics, and describes their behavior unambiguously. The project also defines extension points and serialized representations of the specification.

    In many ways, the goal of this project is similar to that of the Apache Arrow project. Arrow is focused on a standardized memory representation of columnar data. Substrait is focused on what should be done to data.

    "},{"location":"about/#why-not-use-sql","title":"Why not use SQL?","text":"

    SQL is a well known language for describing queries against relational data. It is designed to be simple and allow reading and writing by humans. Substrait is not intended as a replacement for SQL and works alongside SQL to provide capabilities that SQL lacks. SQL is not a great fit for systems that actually satisfy the query because it does not provide sufficient detail and is not represented in a format that is easy for processing. Because of this, most modern systems will first translate the SQL query into a query plan, sometimes called the execution plan. There can be multiple levels of a query plan (e.g. physical and logical), a query plan may be split up and distributed across multiple systems, and a query plan often undergoes simplifying or optimizing transformations. The SQL standard does not define the format of the query or execution plan and there is no open format that is supported by a broad set of systems. Substrait was created to provide a standard and open format for these query plans.

    "},{"location":"about/#why-not-just-do-this-within-an-existing-oss-project","title":"Why not just do this within an existing OSS project?","text":"

    A key goal of the Substrait project is to not be coupled to any single existing technology. Trying to get people involved in something can be difficult when it seems to be primarily driven by the opinions and habits of a single community. In many ways, this situation is similar to the early situation with Arrow. The precursor to Arrow was the Apache Drill ValueVectors concepts. As part of creating Arrow, Wes and Jacques recognized the need to create a new community to build a fresh consensus (beyond just what the Apache Drill community wanted). This separation and new independent community was a key ingredient to Arrow\u2019s current success. The needs here are much the same: many separate communities could benefit from Substrait, but each has its own pain points, type systems, development processes and timelines. To help resolve these tensions, one of the approaches proposed in Substrait is to set a bar that at least two of the top four OSS data technologies (Arrow, Spark, Iceberg, Trino) support something before incorporating it directly into the Substrait specification. (Another goal is to support strong extension points at key locations to avoid this bar being a limiter to broad adoption.)

    "},{"location":"about/#related-technologies","title":"Related Technologies","text":"
    • Apache Calcite: Many ideas in Substrait are inspired by the Calcite project. Calcite is a great JVM-based SQL query parsing and optimization framework. A key goal of the Substrait project is to expose Calcite capabilities more easily to non-JVM technologies as well as expose query planning operations as microservices.
    • Apache Arrow: The Arrow format for data is what the Substrait specification attempts to be for compute expressions. A key goal of Substrait is to enable Substrait producers to execute work within the Arrow Rust and C++ compute kernels.
    "},{"location":"about/#why-the-name-substrait","title":"Why the name Substrait?","text":"

    A strait is a narrow connector of water between two other pieces of water. In analytics, data is often thought of as water. Substrait is focused on instructions related to the data. In other words, what defines or supports the movement of water between one or more larger systems. Thus, the underlayment for the strait connecting different pools of water => sub-strait.

    "},{"location":"faq/","title":"Frequently Asked Questions","text":""},{"location":"faq/#what-is-the-purpose-of-the-post-join-filter-field-on-join-relations","title":"What is the purpose of the post-join filter field on Join relations?","text":"

    The post-join filter on the various Join relations is not always equivalent to an explicit Filter relation AFTER the Join.

    See the example here that highlights how the post-join filter behaves differently than a Filter relation in the case of a left join.

    "},{"location":"governance/","title":"Substrait Project Governance","text":"

    The Substrait project is run by volunteers in a collaborative and open way. Its governance is inspired by the Apache Software Foundation. In most cases, people familiar with the ASF model can work with Substrait in the same way. The biggest differences between the models are:

    • Substrait does not have a separate infrastructure governing body that gatekeeps the adoption of new developer tools and technologies.
    • Substrait Management Committee (SMC) members are responsible for recognizing the corporate relationship of its members and ensuring diverse representation and corporate independence.
    • Substrait does not condone private mailing lists. All project business should be discussed in public. The only exceptions to this are security escalations (security@substrait.io) and harassment (harassment@substrait.io).
    • Substrait has an automated continuous release process with no formal voting process per release.

    More details about concrete things Substrait looks to avoid can be found below.

    "},{"location":"governance/#the-substrait-project","title":"The Substrait Project","text":"

    The Substrait project consists of the code and repositories that reside in the substrait-io GitHub organization, the Substrait.io website, the Substrait mailing list, MS-hosted Teams community calls and the Substrait Slack workspace. (All are open to everyone and recordings/transcripts are made where technology supports it.)

    "},{"location":"governance/#substrait-volunteers","title":"Substrait Volunteers","text":"

    We recognize four groups of individuals related to the project.

    "},{"location":"governance/#user","title":"User","text":"

    A user is someone who uses Substrait. They may contribute to Substrait by providing feedback to developers in the form of bug reports and feature suggestions. Users participate in the Substrait community by helping other users on mailing lists and user support forums.

    "},{"location":"governance/#contributors","title":"Contributors","text":"

    A contributor is a user who contributes to the project in the form of code or documentation. They take extra steps to participate in the project (loosely defined as the set of repositories under the GitHub substrait-io organization), are active on the developer mailing list, participate in discussions, and provide patches, documentation, suggestions, and criticism.

    "},{"location":"governance/#committer","title":"Committer","text":"

    A committer is a developer who has write access to the code repositories and has a signed Contributor License Agreement (CLA) on file. Because they do not need to depend on other people to apply patches to the code or documentation, they effectively make short-term decisions for the project. The SMC can (even tacitly) agree and approve the changes into permanency, or they can reject them. Remember that the SMC makes the decisions, not the individual committers.

    "},{"location":"governance/#smc-member","title":"SMC Member","text":"

    An SMC member is a committer who was elected on merit for their contributions to the evolution of the project. They have write access to the code repository, the right to cast binding votes on all proposals on community-related decisions, the right to propose other active contributors for committership, and the right to invite active committers to the SMC. The SMC as a whole is the entity that controls the project, nobody else. They are responsible for the continued shaping of this governance model.

    "},{"location":"governance/#substrait-management-and-collaboration","title":"Substrait Management and Collaboration","text":"

    The Substrait project is managed using a collaborative, consensus-based process. We do not have a hierarchical structure; rather, different groups of contributors have different rights and responsibilities in the organization.

    "},{"location":"governance/#communication","title":"Communication","text":"

    Communication must be done via mailing lists, Slack, and/or Github. Communication is always done publicly. There are no private lists and all decisions related to the project are made in public. Communication is frequently done asynchronously since members of the community are distributed across many time zones.

    "},{"location":"governance/#substrait-management-committee","title":"Substrait Management Committee","text":"

    The Substrait Management Committee is responsible for the active management of Substrait. The main role of the SMC is to further the long-term development and health of the community as a whole, and to ensure that balanced and wide scale peer review and collaboration takes place. As part of this, the SMC is the primary approver of specification changes, ensuring that proposed changes represent a balanced and thorough examination of possibilities. This doesn\u2019t mean that the SMC has to be involved in the minutiae of a particular specification change but should always shepherd a healthy process around specification changes.

    "},{"location":"governance/#substrait-voting-process","title":"Substrait Voting Process","text":"

    Because one of the fundamental aspects of accomplishing things is doing so by consensus, we need a way to tell whether we have reached consensus. We do this by voting. There are several different types of voting. In all cases, it is recommended that all community members vote. The number of binding votes required to move forward and the community members who have \u201cbinding\u201d votes differs depending on the type of proposal made. In all cases, a veto by a binding voter results in an inability to move forward.

    The rules require that a community member registering a negative vote must include an alternative proposal or a detailed explanation of the reasons for the negative vote. The community then tries to gather consensus on an alternative proposal that can resolve the issue. In the great majority of cases, the concerns leading to the negative vote can be addressed. This process is called \u201cconsensus gathering\u201d and we consider it a very important indication of a healthy community.

    +1 votes required Binding voters Voting Location Process/Governance modifications & actions. This includes promoting new contributors to committer or SMC. 3 SMC Mailing List Format/Specification Modifications (including breaking extension changes) 2 SMC Github PR Documentation Updates (formatting, moves) 1 SMC Github PR Typos 1 Committers Github PR Non-breaking function introductions 1 (not including proposer) Committers Github PR Non-breaking extension additions & non-format code modifications 1 (not including proposer) Committers Github PR Changes (non-breaking or breaking) to a Substrait library (i.e. substrait-java, substrait-validator) 1 (not including proposer) Committers Github PR"},{"location":"governance/#review-then-commit","title":"Review-Then-Commit","text":"

    Substrait follows a review-then-commit policy. This requires that all changes receive consensus approval before being committed to the code base. The specific vote requirements follow the table above.

    "},{"location":"governance/#expressing-votes","title":"Expressing Votes","text":"

    The voting process may seem more than a little weird if you\u2019ve never encountered it before. Votes are represented as numbers between -1 and +1, with \u2018-1\u2019 meaning \u2018no\u2019 and \u2018+1\u2019 meaning \u2018yes.\u2019

    The in-between values indicate how strongly the voting individual feels. Here are some examples of fractional votes and what the voter might be communicating with them:

    • +0: \u2018I don\u2019t feel strongly about it, but I\u2019m okay with this.\u2019
    • -0: \u2018I won\u2019t get in the way, but I\u2019d rather we didn\u2019t do this.\u2019
    • -0.5: \u2018I don\u2019t like this idea, but I can\u2019t find any rational justification for my feelings.\u2019
    • ++1: \u2018Wow! I like this! Let\u2019s do it!\u2019
    • -0.9: \u2018I really don\u2019t like this, but I\u2019m not going to stand in the way if everyone else wants to go ahead with it.\u2019
    • +0.9: \u2018This is a cool idea and I like it, but I don\u2019t have time/the skills necessary to help out.\u2019
    "},{"location":"governance/#votes-on-code-modification","title":"Votes on Code Modification","text":"

    For code-modification votes, +1 votes (review approvals in Github are considered equivalent to a +1) are in favor of the proposal, but -1 votes are vetoes and kill the proposal dead until all vetoers withdraw their -1 votes.

    "},{"location":"governance/#vetoes","title":"Vetoes","text":"

    A -1 (or an unaddressed PR request for changes) vote by a qualified voter stops a code-modification proposal in its tracks. This constitutes a veto, and it cannot be overruled nor overridden by anyone. Vetoes stand until and unless the individual withdraws their veto.

    To prevent vetoes from being used capriciously, the voter must provide with the veto a technical or community justification showing why the change is bad.

    "},{"location":"governance/#why-do-we-vote","title":"Why do we vote?","text":"

    Votes help us to openly resolve conflicts. Without a process, people tend to avoid conflict and thrash around. Votes help to make sure we do the hard work of resolving the conflict.

    "},{"location":"governance/#substrait-is-non-commercial-but-commercially-aware","title":"Substrait is non-commercial but commercially-aware","text":"

    Substrait\u2019s mission is to produce software for the public good. All Substrait software is always available for free, and solely under the Apache License.

    We\u2019re happy to have third parties, including for-profit corporations, take our software and use it for their own purposes. However it is important in these cases to ensure that the third party does not misuse the brand and reputation of the Substrait project for its own purposes. It is important for the longevity and community health of Substrait that the community gets the appropriate credit for producing freely available software.

    The SMC actively tracks the corporate allegiances of community members and strives to ensure influence around any particular aspect of the project isn\u2019t overly skewed towards a single corporate entity.

    "},{"location":"governance/#substrait-trademark","title":"Substrait Trademark","text":"

    The SMC is responsible for protecting the Substrait name and brand. TBD what action is taken to support this.

    "},{"location":"governance/#project-roster","title":"Project Roster","text":""},{"location":"governance/#substrait-management-committee-smc","title":"Substrait Management Committee (SMC)","text":"Name Association Phillip Cloud Voltron Data Weston Pace LanceDB Jacques Nadeau Sundeck Victor Barua Datadog David Sisson Voltron Data"},{"location":"governance/#substrait-committers","title":"Substrait Committers","text":"Name Association Jeroen van Straten Qblox Carlo Curino Microsoft James Taylor Sundeck Sutou Kouhei Clearcode Micah Kornfeld Google Jinfeng Ni Sundeck Andy Grove Nvidia Jesus Camacho Rodriguez Microsoft Rich Tia Voltron Data Vibhatha Abeykoon Voltron Data Nic Crane Recast Gil Forsyth Voltron Data ChaoJun Zhang Intel Matthijs Brobbel Voltron Data Matt Topol Voltron Data"},{"location":"governance/#additional-detail-about-differences-from-asf","title":"Additional detail about differences from ASF","text":"

    Corporate Awareness: The ASF takes a blind-eye approach that has proven to be too slow to correct corporate influence which has substantially undermined many OSS projects. In contrast, Substrait SMC members are responsible for identifying corporate risks and over-representation and adjusting inclusion in the project based on that (limiting committership, SMC membership, etc). Each member of the SMC shares responsibility to expand the community and seek out corporate diversity.

    Infrastructure: The ASF shows its age with respect to infrastructure, having been originally built on SVN. Some examples of ASF requirements that Substrait eschews include: custom git infrastructure, a manual release process, and project-external gatekeeping around the use of new tools/technologies.

    "},{"location":"community/","title":"Community","text":"

    Substrait is developed as a consensus-driven open source product under the Apache 2.0 license. Development is done in the open leveraging GitHub issues and PRs.

    "},{"location":"community/#get-in-touch","title":"Get In Touch","text":"Mailing List/Google Group We use the mailing list to discuss questions, formulate plans and collaborate asynchronously. Slack Channel The developers of Substrait frequent the Slack channel. You can get an invite to the channel by following this link. GitHub Issues Substrait is developed via GitHub issues and pull requests. If you see a problem or want to enhance the product, we suggest you file a GitHub issue for developers to review. Twitter The @substrait_io account on Twitter is our official account. Follow it to keep up to date on what is happening with Substrait! Docs Our website is all maintained in our source repository. If there is something you think can be improved, feel free to fork our repository and post a pull request. Meetings Our community meets every other week on Wednesday."},{"location":"community/#talks","title":"Talks","text":"

    Want to learn more about Substrait? Try the following presentations and slide decks.

    • Substrait: A Common Representation for Data Compute Plans (Jacques Nadeau, April 2022) [slides]
    "},{"location":"community/#citation","title":"Citation","text":"

    If you use Substrait in your research, please cite it using the following BibTeX entry:

    @misc{substrait,\n  author = {substrait-io},\n  title = {Substrait: Cross-Language Serialization for Relational Algebra},\n  year = {2021},\n  month = {8},\n  day = {31},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/substrait-io/substrait}}\n}\n
    "},{"location":"community/#contribution","title":"Contribution","text":"

    All contributors are welcome to Substrait. If you want to join the project, open a PR or get in touch with us as above.

    "},{"location":"community/#principles","title":"Principles","text":"
    • Be inclusive and open to all.
    • Ensure a diverse set of contributors who come from multiple data backgrounds to maximize general utility.
    • Build a specification based on open consensus.
    • Avoid over-reliance/coupling to any single technology.
    • Make the specification and all tools freely available under a permissive license (ApacheV2).
    "},{"location":"community/powered_by/","title":"Powered by Substrait","text":"

    In addition to the work maintained in repositories within the substrait-io GitHub organization, a growing list of other open source projects have adopted Substrait.

    Acero Acero is a query execution engine implemented as a part of the Apache Arrow C++ library. Acero provides a Substrait consumer interface. ADBC ADBC (Arrow Database Connectivity) is an API specification for Apache Arrow-based database access. ADBC allows applications to pass queries either as SQL strings or Substrait plans. Arrow Flight SQL Arrow Flight SQL is a client-server protocol for interacting with databases and query engines using the Apache Arrow in-memory columnar format and the Arrow Flight RPC framework. Arrow Flight SQL allows clients to send queries as SQL strings or Substrait plans. DataFusion DataFusion is an extensible query planning, optimization, and execution framework, written in Rust, that uses Apache Arrow as its in-memory format. DataFusion provides a Substrait producer and consumer that can convert DataFusion logical plans to and from Substrait plans. It can be used through the DataFusion Python bindings. DuckDB DuckDB is an in-process SQL OLAP database management system. DuckDB provides a Substrait extension that allows users to produce and consume Substrait plans through DuckDB\u2019s SQL, Python, and R APIs. Gluten Gluten is a plugin for Apache Spark that allows computation to be offloaded to engines that have better performance or efficiency than Spark\u2019s built-in JVM-based engine. Gluten converts Spark physical plans to Substrait plans. Ibis Ibis is a Python library that provides a lightweight, universal interface for data wrangling. It includes a dataframe API for Python with support for more than 10 query execution engines, plus a Substrait producer to enable support for Substrait-consuming execution engines. Substrait R Interface The Substrait R interface package allows users to construct Substrait plans from R for evaluation by Substrait-consuming execution engines. The package provides a dplyr backend as well as lower-level interfaces for creating Substrait plans and integrations with Acero and DuckDB. 
Velox Velox is a unified execution engine aimed at accelerating data management systems and streamlining their development. Velox provides a Substrait consumer interface.

    To add your project to this list, please open a pull request.

    "},{"location":"expressions/aggregate_functions/","title":"Aggregate Functions","text":"

    Aggregate functions are functions that define an operation which consumes values from multiple records to produce a single output. Aggregate functions in SQL are typically used with GROUP BY clauses. Aggregate functions are similar to scalar functions, but their function signatures carry a small set of additional properties.

    Aggregate function signatures contain all the properties defined for scalar functions. Additionally, they contain the properties below:

    Property Description Required Inherits All properties defined for scalar function. N/A Ordered Whether the result of this function is sensitive to sort order. Optional, defaults to false Maximum set size Maximum allowed set size as an unsigned integer. Optional, defaults to unlimited Decomposable Whether the function can be executed in one or more intermediate steps. Valid options are: NONE, ONE, MANY, describing how intermediate steps can be taken. Optional, defaults to NONE Intermediate Output Type If the function is decomposable, represents the intermediate output type that is used, if the function is defined as either ONE or MANY decomposable. Will be a struct in many cases. Required for ONE and MANY. Invocation Whether the function uses all or only distinct values in the aggregation calculation. Valid options are: ALL, DISTINCT. Optional, defaults to ALL"},{"location":"expressions/aggregate_functions/#aggregate-binding","title":"Aggregate Binding","text":"

    When binding an aggregate function, the binding must include the following additional properties beyond the standard scalar binding properties:

    Property Description Phase Describes the input type of the data: [INITIAL_TO_INTERMEDIATE, INTERMEDIATE_TO_INTERMEDIATE, INITIAL_TO_RESULT, INTERMEDIATE_TO_RESULT] describing what portion of the operation is required. For functions that are NOT decomposable, the only valid option will be INITIAL_TO_RESULT. Ordering Zero or more ordering keys along with key order (ASC|DESC|NULL FIRST, etc.), declared similar to the sort keys in an ORDER BY relational operation. If no sorts are specified, the records are not sorted prior to being passed to the aggregate function."},{"location":"expressions/embedded_functions/","title":"Embedded Functions","text":"
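The decomposition phases can be illustrated with a MANY-decomposable average (a sketch; the function names here simply mirror the phase names in the table above, and the intermediate type is the struct (sum, count)):

```python
# Sketch of a MANY-decomposable "avg" aggregate. Each function corresponds
# to one binding phase from the table above.

def initial_to_intermediate(values):
    # Partial aggregation over one batch of raw input values.
    return (sum(values), len(values))

def intermediate_to_intermediate(a, b):
    # Merge two partial states (e.g. produced by different workers).
    return (a[0] + b[0], a[1] + b[1])

def intermediate_to_result(state):
    # Final step: produce the aggregate result from the merged state.
    s, n = state
    return s / n

left = initial_to_intermediate([1, 2, 3])
right = initial_to_intermediate([4, 5])
merged = intermediate_to_intermediate(left, right)
print(intermediate_to_result(merged))  # 3.0
```

A non-decomposable (NONE) aggregate would instead expose only the single INITIAL_TO_RESULT phase, consuming all raw input in one step.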

    Embedded functions are a special kind of function where the implementation is embedded within the actual plan. They are commonly used in tools where a user intersperses business logic within a data pipeline. This is more common in data science workflows than traditional SQL workflows.

    Embedded functions are not pre-registered. Embedded functions require that data be consumed and produced with a standard API, may require memory allocation and have determinate error reporting behavior. They may also have specific runtime dependencies. For example, a Python pickle function may depend on pyarrow 5.0 and pynessie 1.0.

    Properties for an embedded function include:

    Property Description Required Function Type The type of embedded function presented. Required Function Properties Function properties, one of those items defined below. Required Output Type The fully resolved output type for this embedded function. Required

    The binary representation of an embedded function is:

    Binary RepresentationHuman Readable Representation
    message EmbeddedFunction {\n  repeated Expression arguments = 1;\n  Type output_type = 2;\n  oneof kind {\n    PythonPickleFunction python_pickle_function = 3;\n    WebAssemblyFunction web_assembly_function = 4;\n  }\n\n  message PythonPickleFunction {\n    bytes function = 1;\n    repeated string prerequisite = 2;\n  }\n\n  message WebAssemblyFunction {\n    bytes script = 1;\n    repeated string prerequisite = 2;\n  }\n}\n

    As the bytes are opaque to Substrait, there is no equivalent human readable form.
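As an illustration of the Python pickle kind, here is a minimal in-process round trip (a sketch only: a producer serializes a callable into the opaque bytes a plan would carry in the `function` field, and a consumer with a compatible environment deserializes and invokes it; real plans also carry prerequisite package coordinates):

```python
import pickle

# Hypothetical business logic a user might embed in a plan.
def business_logic(x):
    return x * 2 + 1

payload = pickle.dumps(business_logic)   # the opaque 'function' bytes
restored = pickle.loads(payload)         # consumer side: rehydrate
print(restored(20))  # 41
```

Note that this only works when the consumer can resolve the same function definition, which is exactly why the message pairs the bytes with a prerequisite list.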

    "},{"location":"expressions/embedded_functions/#function-details","title":"Function Details","text":"

    There are many types of possible stored functions. For each, Substrait works to expose the function in as descriptive a way as possible to support the largest number of consumers.

    "},{"location":"expressions/embedded_functions/#python-pickle-function-type","title":"Python Pickle Function Type","text":"Property Description Required Pickle Body binary pickle encoded function using [TBD] API representation to access arguments. True Prereqs A list of specific Python conda packages that are prerequisites for access (a structured version of a requirements.txt file). Optional, defaults to none"},{"location":"expressions/embedded_functions/#webassembly-function-type","title":"WebAssembly Function Type","text":"Property Description Required Script WebAssembly function True Prereqs A list of AssemblyScript prerequisites required to compile the assemblyscript function using NPM coordinates. Optional, defaults to none Discussion Points
    • What are the common embedded function formats?
    • How do we expose the data for a function?
    • How do we express batching capabilities?
    • How do we ensure/declare containerization?
    "},{"location":"expressions/extended_expression/","title":"Extended Expression","text":"

    Extended Expression messages are provided for expression-level protocols as an alternative to using a Plan. They mainly target expression-only evaluations, such as those computed in Filter/Project/Aggregation rels. Unlike the original Expression defined in the substrait protocol, Extended Expression messages require more information to completely describe the computation context including: input data schema, referred function signatures, and output schema.

    Since Extended Expression will be used separately from the Plan rel representation, it will need to include basic fields like Version.

    ExtendedExpression Message
    message ExtendedExpression {\n  // Substrait version of the expression. Optional up to 0.17.0, required for later\n  // versions.\n  Version version = 7;\n\n  // a list of yaml specifications this expression may depend on\n  repeated substrait.extensions.SimpleExtensionURI extension_uris = 1;\n\n  // a list of extensions this expression may depend on\n  repeated substrait.extensions.SimpleExtensionDeclaration extensions = 2;\n\n  // one or more expression trees with same order in plan rel\n  repeated ExpressionReference referred_expr = 3;\n\n  NamedStruct base_schema = 4;\n  // additional extensions associated with this expression.\n  substrait.extensions.AdvancedExtension advanced_extensions = 5;\n\n  // A list of com.google.Any entities that this plan may use. Can be used to\n  // warn if some embedded message types are unknown. Note that this list may\n  // include message types that are ignorable (optimizations) or that are\n  // unused. In many cases, a consumer may be able to work with a plan even if\n  // one or more message types defined here are unknown.\n  repeated string expected_type_urls = 6;\n\n}\n
    "},{"location":"expressions/extended_expression/#input-and-output-data-schema","title":"Input and output data schema","text":"

    Similar to base_schema defined in ReadRel, the input data schema describes the name/type/nullability and layout info of the input data for the target expression evaluation. It also has a name field to define the name of the output data.
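    As a purely illustrative sketch (the field names follow the Substrait protobuf definitions, but the exact shape is an assumption, not an excerpt from the spec), a base_schema for a two-column input with a nullable i32 named a and a required string named b might be written in Protobuf Text format as:

    ```
    base_schema {
      names: "a"
      names: "b"
      struct {
        types { i32 { nullability: NULLABILITY_NULLABLE } }
        types { string { nullability: NULLABILITY_REQUIRED } }
        nullability: NULLABILITY_REQUIRED
      }
    }
    ```
    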

    "},{"location":"expressions/extended_expression/#referred-expression","title":"Referred expression","text":"

    An Extended Expression will have one or more referred expressions, which can be either Expression or AggregateFunction. Additional types of expressions may be added in the future.

    For a message with multiple expressions, users may produce each Extended Expression in the same order as they occur in the original Plan rel. However, the consumer does NOT have to handle them in this order. A consumer need only ensure that the columns in the final output are organized in the same order as defined in the message.

    "},{"location":"expressions/extended_expression/#function-extensions","title":"Function extensions","text":"

    Function extensions work the same for both Extended Expression and the original Expression defined in the Substrait protocol.

    "},{"location":"expressions/field_references/","title":"Field References","text":"

    In Substrait, all fields are dealt with on a positional basis. Field names are only used at the edge of a plan, for the purposes of naming fields for the outside world. Each operation returns a simple or compound data type. Additional operations can refer to data within that initial operation using field references. To reference a field, you use a reference based on the type of field position you want to reference.

    Reference Type Properties Type Applicability Type return Struct Field Ordinal position. Zero-based. Only legal within the range of possible fields within a struct. Selecting an ordinal outside the applicable field range results in an invalid plan. struct Type of field referenced Array Value Array offset. Zero-based. Negative numbers can be used to describe an offset relative to the end of the array. For example, -1 means the last element in an array. Negative and positive overflows return null values (no wrapping). list type of list Array Slice Array offset and element count. Zero-based. Negative numbers can be used to describe an offset relative to the end of the array. For example, -1 means the last element in an array. Position does not wrap, nor does length. list Same type as original list Map Key A map value that is matched exactly against available map keys and returned. map Value type of map Map KeyExpression A wildcard string that is matched against a simplified form of regular expressions. Requires the key type of the map to be a character type. [Format detail needed, intention to include basic regex concepts such as greedy/non-greedy.] map List of map value type Masked Complex Expression An expression that provides a mask over a schema declaring which portions of the schema should be presented. This allows a user to select a portion of a complex object but mask certain subsections of that same object. any any"},{"location":"expressions/field_references/#compound-references","title":"Compound References","text":"

    References are typically constructed as a sequence. For example: [struct position 0, struct position 1, array offset 2, array slice 1..3].

    Field references are in the same order they are defined in their schema. For example, let\u2019s consider the following schema:

    column a:\n  struct<\n    b: list<\n      struct<\n        c: map<string, \n          struct<\n            x: i32>>>>>\n

    If we want to represent the SQL expression:

    a.b[2].c['my_map_key'].x\n

    We will need to declare the nested field such that:

    Struct field reference a\nStruct field b\nList offset 2\nStruct field c\nMap key my_map_key\nStruct field x\n

    Or more formally in Protobuf Text, we get:

    selection {\n  direct_reference {\n    struct_field {\n      field: 0 # .a\n      child {\n        struct_field {\n          field: 0 # .b\n          child {\n            list_element {\n              offset: 2\n              child {\n                struct_field {\n                  field: 0 # .c\n                  child {\n                    map_key {\n                      map_key {\n                        string: \"my_map_key\" # ['my_map_key']\n                      }\n                      child {\n                        struct_field {\n                          field: 0 # .x\n                        }\n                      }\n                    }\n                  }\n                }\n              }\n            }\n          }\n        }\n      }\n    }\n  }\n  root_reference { }\n}\n
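    The positional walk encoded above can be illustrated with a toy model (this is not the Substrait API; structs are modeled as tuples, lists as lists, and maps as dicts, purely for illustration):

    ```python
    # Toy model (not the Substrait API): resolve a compound reference by
    # walking a nested value positionally, mirroring the protobuf text above.
    def resolve(value, path):
        for kind, key in path:
            if kind == "struct_field":    # zero-based ordinal within a struct
                value = value[key]
            elif kind == "list_element":  # zero-based list offset
                value = value[key]
            elif kind == "map_key":       # exact key match
                value = value[key]
            else:
                raise ValueError(f"unknown reference kind: {kind}")
        return value

    # a.b[2].c['my_map_key'].x, with structs modeled as tuples:
    row = ((
        [  # b: list<struct<c: map<string, struct<x: i32>>>>
            ({"other": (0,)},),
            ({"other": (0,)},),
            ({"my_map_key": (42,)},),  # b[2]
        ],
    ),)
    path = [
        ("struct_field", 0),          # .a
        ("struct_field", 0),          # .b
        ("list_element", 2),          # [2]
        ("struct_field", 0),          # .c
        ("map_key", "my_map_key"),    # ['my_map_key']
        ("struct_field", 0),          # .x
    ]
    # resolve(row, path) == 42
    ```
    
    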
    "},{"location":"expressions/field_references/#validation","title":"Validation","text":"

    References must validate against the schema of the record being referenced. If not, an error is expected.

    "},{"location":"expressions/field_references/#masked-complex-expression","title":"Masked Complex Expression","text":"

    A masked complex expression is used to do a subselection of a portion of a complex record. It allows a user to specify the portion of the complex object to consume. Imagine you have the following schema (note that structs are lists of fields here, as they generally are in Substrait, since field names are not used internally):

    struct:\n  - struct:\n    - integer\n    - list:\n      struct:\n        - i32\n        - string\n        - string\n     - i32\n  - i16\n  - i32\n  - i64\n

    Given this schema, you could declare a mask of fields to include in pseudocode, such as:

    0:[0,1:[..5:[0,2]]],2,3\n\nOR\n\n0:\n  - 0\n  - 1:\n    ..5:\n      -0\n      -2\n2\n3\n

    This mask states that we would like to include fields 0, 2, and 3 at the top level. Within field 0, we want to include subfields 0 and 1. For subfield 0.1, we want to include only the first 5 records in the array, and within the struct inside that array, only fields 0 and 2. The resulting schema would be:

    struct:\n  - struct:\n    - integer\n    - list:\n      struct: \n        - i32\n        - string\n  - i32\n  - i64\n
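    The mask from this example can be sketched as a small Python model (a hypothetical helper, not part of Substrait; structs are tuples, lists are lists):

    ```python
    # Hypothetical sketch: apply the example mask to a nested Python value.
    # The mask keeps fields 0, 2, 3 at the top level; subfields 0 and 1 of
    # field 0; the first 5 list elements of field 0.1; and fields 0 and 2 of
    # the struct inside that list.
    def mask_struct(struct, field_masks):
        # field_masks: list of (index, submask) pairs; submask None keeps all
        return tuple(
            submask(struct[idx]) if submask else struct[idx]
            for idx, submask in field_masks
        )

    def mask_list(lst, limit, elem_mask):
        return [elem_mask(e) for e in lst[:limit]]

    row = (
        (1, [(10, "a", "b"), (20, "c", "d")], 99),  # struct: integer, list<struct>, i32
        7,   # i16
        8,   # i32
        9,   # i64
    )
    inner = lambda s: mask_struct(s, [(0, None), (2, None)])   # fields 0 and 2
    field1 = lambda l: mask_list(l, 5, inner)                  # first 5 elements
    top0 = lambda s: mask_struct(s, [(0, None), (1, field1)])  # subfields 0, 1
    masked = mask_struct(row, [(0, top0), (2, None), (3, None)])
    # masked == ((1, [(10, "b"), (20, "d")]), 8, 9)
    ```
    
    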
    "},{"location":"expressions/field_references/#unwrapping-behavior","title":"Unwrapping Behavior","text":"

    By default, when only a single field is selected from a struct, that struct is removed. When only a single element is removed from a list, the list is removed. A user can also configure the mask to avoid unwrapping in these cases. [TBD how we express this in the serialization formats.]

    Discussion Points
    • Should we support column reordering/positioning using a masked complex expression? (Right now, you can only mask things out.)
    "},{"location":"expressions/scalar_functions/","title":"Scalar Functions","text":"

    A function is a scalar function if that function takes in values from a single record and produces an output value. To clearly specify the definition of functions, Substrait declares an extensible specification plus binding approach to function resolution. A scalar function signature includes the following properties:

    Property Description Required Name One or more user-friendly UTF-8 strings that are used to reference this function. At least one value is required. List of arguments Argument properties are defined below. Arguments can be fully defined or calculated with a type expression. See further details below. Optional, defaults to niladic. Deterministic Whether this function is expected to reproduce the same output when it is invoked multiple times with the same input. This informs a plan consumer on whether it can constant-reduce the defined function. An example would be a random() function, which is typically expected to be evaluated repeatedly despite having the same set of inputs. Optional, defaults to true. Session Dependent Whether this function is influenced by the session context it is invoked within. For example, a function may be influenced by a user who is invoking the function, the time zone of a session, or some other non-obvious parameter. This can inform caching systems on whether a particular function is cacheable. Optional, defaults to false. Variadic Behavior Whether the last argument of the function is variadic or a single argument. If variadic, the argument can optionally have a lower bound (minimum number of instances) and an upper bound (maximum number of instances). Optional, defaults to single value. Nullability Handling Describes how nullability of input arguments maps to nullability of output arguments. Three options are: MIRROR, DECLARED_OUTPUT and DISCRETE. More details about nullability handling are listed below. Optional, defaults to MIRROR Description Additional description of function for implementers or users. Should be written human-readable to allow exposure to end users. Presented as a map with language => description mappings. E.g. { \"en\": \"This adds two numbers together.\", \"fr\": \"cela ajoute deux nombres\"}. Optional Return Value The output type of the expression. 
Return types can be expressed as a fully-defined type or a type expression. See below for more on type expressions. Required Implementation Map A map of implementation locations for one or more implementations of the given function. Each key is a function implementation type. Implementation types include examples such as: AthenaArrowLambda, TrinoV361Jar, ArrowCppKernelEnum, GandivaEnum, LinkedIn Transport Jar, etc. [Definition TBD]. Implementation type has one or more properties associated with retrieval of that implementation. Optional"},{"location":"expressions/scalar_functions/#argument-types","title":"Argument Types","text":"

    There are three main types of arguments: value arguments, type arguments, and enumerations. Every defined argument must be specified in every invocation of the function. When specified, the position of these arguments in the function invocation must match the position of the arguments as defined in the YAML function definition.

    • Value arguments: arguments that refer to a data value. These could be constants (literal expressions defined in the plan) or variables (a reference expression that references data being processed by the plan). This is the most common type of argument. The value of a value argument is not available in output derivation, but its type is. Value arguments can be declared in one of two ways: concrete or parameterized. Concrete types are either simple types or compound types with all parameters fully defined (without referencing any type arguments). Examples include i32, fp32, VARCHAR<20>, List<fp32>, etc. Parameterized types are discussed further below.
    • Type arguments: arguments that are used only to inform the evaluation and/or type derivation of the function. For example, you might have a function which is truncate(<type> DECIMAL<P0,S0>, <value> DECIMAL<P1, S1>, <value> i32). This function declares two value arguments and a type argument. The difference between them is that the type argument has no value at runtime, while the value arguments do.
    • Enumeration: arguments that support a fixed set of declared values as constant arguments. These arguments must be specified as part of an expression. While these could also have been implemented as constant string value arguments, they are formally included to improve validation/contextual help/etc. for frontend processors and IDEs. An example might be extract([DAY|YEAR|MONTH], <date value>). In this example, a producer must specify a type of date part to extract. Note, the value of a required enumeration cannot be used in type derivation.
    "},{"location":"expressions/scalar_functions/#value-argument-properties","title":"Value Argument Properties","text":"Property Description Required Name A human-readable name for this argument to help clarify use. Optional, defaults to a name based on position (e.g. arg0) Type A fully defined type or a type expression. Required Constant Whether this argument is required to be a constant for invocation. For example, in some system a regular expression pattern would only be accepted as a literal and not a column value reference. Optional, defaults to false"},{"location":"expressions/scalar_functions/#type-argument-properties","title":"Type Argument Properties","text":"Property Description Required Type A partially or completely parameterized type. E.g. List<K> or K Required Name A human-readable name for this argument to help clarify use. Optional, defaults to a name based on position (e.g. arg0)"},{"location":"expressions/scalar_functions/#required-enumeration-properties","title":"Required Enumeration Properties","text":"Property Description Required Options List of valid string options for this argument Required Name A human-readable name for this argument to help clarify use. Optional, defaults to a name based on position (e.g. arg0)"},{"location":"expressions/scalar_functions/#options","title":"Options","text":"

    In addition to arguments, each call may specify zero or more options. These are similar to a required enumeration but are more focused on supporting alternative behaviors. Options can be left unspecified, in which case the consumer is free to choose which implementation to use. An example use case might be OVERFLOW_BEHAVIOR:[OVERFLOW, SATURATE, ERROR]. If unspecified, an engine is free to use any of the three choices or even some alternative behavior (e.g. setting the value to null on overflow). If specified, the engine would be expected to behave as specified or fail. Note, the value of an optional enumeration cannot be used in type derivation.

    "},{"location":"expressions/scalar_functions/#option-preference","title":"Option Preference","text":"

    A producer may specify multiple values for an option. If the producer does so then the consumer must deliver the first behavior in the list of values that the consumer is capable of delivering. For example, considering overflow as defined above, if a producer specified [ERROR, SATURATE] then the consumer must deliver ERROR if it is capable of doing so. If it is not then it may deliver SATURATE. If the consumer cannot deliver either behavior then it is an error and the consumer must reject the plan.
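    This resolution rule can be sketched as follows (a hypothetical helper, not part of any Substrait library):

    ```python
    # Sketch: a consumer picks the first producer-listed option value that it
    # is capable of delivering; if none are supported, the plan is rejected.
    def resolve_option(requested, supported):
        for value in requested:
            if value in supported:
                return value
        raise ValueError("plan rejected: no supported option value")

    # A consumer that only supports SATURATE delivers SATURATE:
    assert resolve_option(["ERROR", "SATURATE"], {"SATURATE"}) == "SATURATE"
    # A consumer that supports both must deliver the first choice, ERROR:
    assert resolve_option(["ERROR", "SATURATE"], {"ERROR", "SATURATE"}) == "ERROR"
    ```
    
    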

    "},{"location":"expressions/scalar_functions/#optional-properties","title":"Optional Properties","text":"Property Description Required Values A list of valid strings for this option. Required Name A human-readable name for this option. Required"},{"location":"expressions/scalar_functions/#nullability-handling","title":"Nullability Handling","text":"Mode Description MIRROR This means that the function has the behavior that if at least one of the input arguments are nullable, the return type is also nullable. If all arguments are non-nullable, the return type will be non-nullable. An example might be the + function. DECLARED_OUTPUT Input arguments are accepted of any mix of nullability. The nullability of the output function is whatever the return type expression states. Example use might be the function is_null() where the output is always boolean independent of the nullability of the input. DISCRETE The input and arguments all define concrete nullability and can only be bound to the types that have those nullability. For example, if a type input is declared i64? and one has an i64 literal, the i64 literal must be specifically cast to i64? to allow the operation to bind."},{"location":"expressions/scalar_functions/#parameterized-types","title":"Parameterized Types","text":"

    Types are parameterized by two types of values: by inner types (e.g. List<K>) and numeric values (e.g. DECIMAL<P,S>). Parameter names are simple strings (frequently a single character). There are two types of parameters: integer parameters and type parameters.

    When the same parameter name is used multiple times in a function definition, the function can only bind if the exact same value is used for all parameters of that name. For example, if one had a function with a signature of fn(VARCHAR<N>, VARCHAR<N>), the function would only be usable if both VARCHAR types had the same length value N. This necessitates that all instances of the same parameter name be of the same parameter type (all instances are a type parameter or all instances are an integer parameter).
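    A toy binder illustrates the rule (this is not Substrait's actual binding algorithm, just a sketch of the consistency constraint):

    ```python
    # Toy binder: every occurrence of a parameter name must unify to one value.
    def bind_params(declared, actual):
        # declared: e.g. [("VARCHAR", "N"), ("VARCHAR", "N")]
        # actual:   e.g. [("VARCHAR", 20), ("VARCHAR", 20)]
        bindings = {}
        for (kind, param), (akind, avalue) in zip(declared, actual):
            if kind != akind:
                return None  # type constructors don't match
            if param in bindings and bindings[param] != avalue:
                return None  # same name bound to two different values
            bindings[param] = avalue
        return bindings

    sig = [("VARCHAR", "N"), ("VARCHAR", "N")]
    assert bind_params(sig, [("VARCHAR", 20), ("VARCHAR", 20)]) == {"N": 20}
    assert bind_params(sig, [("VARCHAR", 20), ("VARCHAR", 30)]) is None
    ```
    
    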

    "},{"location":"expressions/scalar_functions/#type-parameter-resolution-in-variadic-functions","title":"Type Parameter Resolution in Variadic Functions","text":"

    When the last argument of a function is variadic and declares a type parameter, e.g. fn(A, B, C...), the C parameter can be marked as either consistent or inconsistent. If marked as consistent, the function can only be bound to arguments where all of the C types are the same concrete type. If marked as inconsistent, each instance of C can be bound to a different type within the constraints of what C allows.
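    The distinction can be sketched as a one-line check (hypothetical helper, not a Substrait API):

    ```python
    # Sketch: variadic parameter consistency. If the trailing type parameter
    # is marked consistent, every variadic argument must bind to one type.
    def check_variadic(arg_types, consistent):
        return len(set(arg_types)) <= 1 if consistent else True

    assert check_variadic(["i32", "i32"], consistent=True)
    assert not check_variadic(["i32", "i64"], consistent=True)
    assert check_variadic(["i32", "i64"], consistent=False)
    ```
    
    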

    "},{"location":"expressions/scalar_functions/#output-type-derivation","title":"Output Type Derivation","text":""},{"location":"expressions/scalar_functions/#concrete-return-types","title":"Concrete Return Types","text":"

    A concrete return type is one that is fully known at function definition time. Examples of simple concrete return types would be things such as i32, fp32. For compound types, a concrete return type must be fully declared. Example of fully defined compound types: VARCHAR<20>, DECIMAL<25,5>

    "},{"location":"expressions/scalar_functions/#return-type-expressions","title":"Return Type Expressions","text":"

    Any function can declare a return type expression. A return type expression uses a simplified set of expressions to describe how the return type should be derived. A return expression can be as simple as returning one of the parameters declared in the arguments, e.g. f(List<K>) => K, or it can be a simple mathematical or conditional expression such as add(decimal<a,b>, decimal<c,d>) => decimal<a+c, b+d>. The simple expression language supports a very narrow set of types:

    • Integer: 64-bit signed integer (can be a literal or a parameter value)
    • Boolean: True and False
    • Type: A Substrait type (with possibly additional embedded expressions)

    These types are evaluated using a small set of operations to support common scenarios. List of valid operations:

    Math: +, -, *, /, min, max\nBoolean: &&, ||, !, <, >, ==\nParameters: type, integer\nLiterals: type, integer\n

    Fully defined with argument types:

    • type_parameter(string name) => type
    • integer_parameter(string name) => integer
    • not(boolean x) => boolean
    • and(boolean a, boolean b) => boolean
    • or(boolean a, boolean b) => boolean
    • multiply(integer a, integer b) => integer
    • divide(integer a, integer b) => integer
    • add(integer a, integer b) => integer
    • subtract(integer a, integer b) => integer
    • min(integer a, integer b) => integer
    • max(integer a, integer b) => integer
    • equal(integer a, integer b) => boolean
    • greater_than(integer a, integer b) => boolean
    • less_than(integer a, integer b) => boolean
    • covers(Type a, Type b) => boolean Covers means that type B matches type A for as much as type B is defined. For example, if type A is VARCHAR<20> and type B is VARCHAR<N>, type B would be considered covering. Similarly, if type A was List<Struct<a:f32, b:f32>> and type B was List<Struct<>>, it would be considered covering. Note that this is directional, as in “B covers A” or “B can be further enhanced to match the definition of A”.
    • if(boolean a) then (integer) else (integer)
    • if(boolean a) then (type) else (type)
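    As an illustration (hypothetical helper, not the spec's evaluator), the derivation add(decimal<a,b>, decimal<c,d>) => decimal<a+c, b+d> can be computed from the integer parameters bound from the argument types:

    ```python
    # Hypothetical mini-evaluator for a decimal-addition return type
    # expression: precision = a + c, scale = b + d.
    def derive_decimal_add(params):
        precision = params["a"] + params["c"]
        scale = params["b"] + params["d"]
        return ("decimal", precision, scale)

    # Binding decimal<10,2> and decimal<8,3> yields decimal<18,5>:
    assert derive_decimal_add({"a": 10, "b": 2, "c": 8, "d": 3}) == ("decimal", 18, 5)
    ```
    
    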
    "},{"location":"expressions/scalar_functions/#example-type-expressions","title":"Example Type Expressions","text":"

    For reference, here are some common output type derivations and how they can be expressed with a return type expression:

    Operation Definition Add item to list add(List<T>, T) => List<T> Decimal Division divide(Decimal<P1,S1>, Decimal<P2,S2>) => Decimal<P1 -S1 + S2 + MAX(6, S1 + P2 + 1), MAX(6, S1 + P2 + 1)> Select a subset of map keys based on a regular expression (requires stringlike keys) extract_values(regex:string, map:Map<K,V>) => List<V> WHERE K IN [STRING, VARCHAR<N>, FIXEDCHAR<N>] Concatenate two fixed sized character strings concat(FIXEDCHAR<A>, FIXEDCHAR<B>) => FIXEDCHAR<A+B> Make a struct of a set of fields and a struct definition. make_struct(<type> T, K...) => T"},{"location":"expressions/specialized_record_expressions/","title":"Specialized Record Expressions","text":"

    While all types of operations could be reduced to functions, in some cases this would be overly simplistic. Instead, it is helpful to construct some other expression constructs.

    These constructs should be focused on distinct expression types, as opposed to things that are merely syntactic sugar. For example, CAST and EXTRACT are SQL operations that are presented using specialized syntax; however, they can easily be modeled using a function paradigm with minimal complexity.

    "},{"location":"expressions/specialized_record_expressions/#literal-expressions","title":"Literal Expressions","text":"

    For each data type, it is possible to create a literal value for that data type. The representation depends on the serialization format. Literal expressions include both a type literal and a possibly null value.

    "},{"location":"expressions/specialized_record_expressions/#nested-type-constructor-expressions","title":"Nested Type Constructor Expressions","text":"

    These expressions allow structs, lists, and maps to be constructed from a set of expressions. For example, they allow a struct expression like (field 0 - field 1, field 0 + field 1) to be represented.

    "},{"location":"expressions/specialized_record_expressions/#cast-expression","title":"Cast Expression","text":"

    To convert a value from one type to another, Substrait defines a cast expression. Cast expressions declare an expected type, an input argument, and an enumeration specifying failure behavior: whether the cast should return null on failure or throw an exception.

    Note that Substrait always requires a cast expression whenever the current type is not exactly equal to (one of) the expected types. For example, it is illegal to directly pass a value of type i8[0] to a function that only supports an i8?[0] argument.

    "},{"location":"expressions/specialized_record_expressions/#if-expression","title":"If Expression","text":"

    An if value expression is an expression composed of one if clause, zero or more else if clauses and an else clause. In pseudocode, they are envisioned as:

    if <boolean expression> then <result expression 1>\nelse if <boolean expression> then <result expression 2> (zero or more times)\nelse <result expression 3>\n

    When an if expression is declared, all return expressions must be of the same type.

    "},{"location":"expressions/specialized_record_expressions/#shortcut-behavior","title":"Shortcut Behavior","text":"

    An if expression is expected to logically short-circuit on a positive outcome. This means that a skipped else/elseif expression cannot cause an error. For example, this should not actually throw an error despite the fact that the cast operation should fail.

    if 'value' = 'value' then 0\nelse cast('hello' as integer) \n
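    The short-circuit guarantee can be modeled by deferring branch evaluation behind thunks (a sketch, not a Substrait implementation):

    ```python
    # Sketch: short-circuit via thunks. Branches not taken are never
    # evaluated, so the failing cast below can never raise.
    def if_expr(branches, otherwise):
        # branches: list of (condition_thunk, result_thunk)
        for cond, result in branches:
            if cond():
                return result()
        return otherwise()

    def failing_cast():
        raise ValueError("cannot cast 'hello' to integer")

    # The condition is true, so the else branch (the failing cast) is skipped:
    assert if_expr([(lambda: "value" == "value", lambda: 0)], failing_cast) == 0
    ```
    
    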
    "},{"location":"expressions/specialized_record_expressions/#switch-expression","title":"Switch Expression","text":"

    Switch expressions allow selection among alternate branches based on the value of a given expression. They are an optimized form of a generic if expression in which every condition is an equality comparison against the same value. In pseudocode:

    switch(value)\n<value 1> => <return 1> (1 or more times)\n<else> => <return default>\n

    Return values for a switch expression must all be of identical type.

    "},{"location":"expressions/specialized_record_expressions/#shortcut-behavior_1","title":"Shortcut Behavior","text":"

    As in if expressions, switch expression evaluation should not be interrupted by \u201croads not taken\u201d.

    "},{"location":"expressions/specialized_record_expressions/#or-list-equality-expression","title":"Or List Equality Expression","text":"

    A specialized structure that is often used is a large list of possible values. In SQL, these are typically large IN lists. They can be composed from one or more fields. There are two common patterns, single value and multi value. In pseudocode they are represented as:

    Single Value:\nexpression, [<value1>, <value2>, ... <valueN>]\n\nMulti Value:\n[expressionA, expressionB], [[value1a, value1b], [value2a, value2b].. [valueNa, valueNb]]\n

    For single value expressions, these are a compact equivalent of expression = value1 OR expression = value2 OR ... OR expression = valueN. When using an expression of this type, two things are required: the test expression and all related value expressions must be of the same type, and a function signature for equality must be available for that type.
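    The single-value equivalence can be sketched directly (Python's == stands in for the bound equality function here):

    ```python
    # Sketch: the single-value OR-list is equivalent to a chain of equality
    # tests, expression = value1 OR ... OR expression = valueN.
    def or_list(expr_value, values):
        return any(expr_value == v for v in values)

    assert or_list(3, [1, 2, 3])      # 3 = 1 OR 3 = 2 OR 3 = 3
    assert not or_list(4, [1, 2, 3])  # no value matches
    ```
    
    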

    "},{"location":"expressions/subqueries/","title":"Subqueries","text":"

    Subqueries are scalar expressions composed of another query.

    "},{"location":"expressions/subqueries/#forms","title":"Forms","text":""},{"location":"expressions/subqueries/#scalar","title":"Scalar","text":"

    Scalar subqueries are subqueries that return one row and one column.

    Property Description Required Input Input relation Yes"},{"location":"expressions/subqueries/#in-predicate","title":"IN predicate","text":"

    An IN subquery predicate checks that the left expression is contained in the right subquery.

    "},{"location":"expressions/subqueries/#examples","title":"Examples","text":"
    SELECT *\nFROM t1\nWHERE x IN (SELECT * FROM t2)\n
    SELECT *\nFROM t1\nWHERE (x, y) IN (SELECT a, b FROM t2)\n
    Property Description Required Needles Expressions whose existence will be checked Yes Haystack Subquery to check Yes"},{"location":"expressions/subqueries/#set-predicates","title":"Set predicates","text":"

    A set predicate is a predicate over a set of rows in the form of a subquery.

    EXISTS and UNIQUE are common SQL spellings of these kinds of predicates.

    Property Description Required Operation The operation to perform over the set Yes Tuples Set of tuples to check using the operation Yes"},{"location":"expressions/subqueries/#set-comparisons","title":"Set comparisons","text":"

    A set comparison subquery is a subquery comparison using ANY or ALL operations.

    "},{"location":"expressions/subqueries/#examples_1","title":"Examples","text":"
    SELECT *\nFROM t1\nWHERE x < ANY(SELECT y from t2)\n
    Property Description Required Reduction operation The kind of reduction to use over the subquery Yes Comparison operation The kind of comparison operation to use Yes Expression Left-hand side expression to check Yes Subquery Subquery to check Yes Protobuf Representation
    message Subquery {\n  oneof subquery_type {\n    // Scalar subquery\n    Scalar scalar = 1;\n    // x IN y predicate\n    InPredicate in_predicate = 2;\n    // EXISTS/UNIQUE predicate\n    SetPredicate set_predicate = 3;\n    // ANY/ALL predicate\n    SetComparison set_comparison = 4;\n  }\n\n  // A subquery with one row and one column. This is often an aggregate\n  // though not required to be.\n  message Scalar {\n    Rel input = 1;\n  }\n\n  // Predicate checking that the left expression is contained in the right\n  // subquery\n  //\n  // Examples:\n  //\n  // x IN (SELECT * FROM t)\n  // (x, y) IN (SELECT a, b FROM t)\n  message InPredicate {\n    repeated Expression needles = 1;\n    Rel haystack = 2;\n  }\n\n  // A predicate over a set of rows in the form of a subquery\n  // EXISTS and UNIQUE are common SQL forms of this operation.\n  message SetPredicate {\n    enum PredicateOp {\n      PREDICATE_OP_UNSPECIFIED = 0;\n      PREDICATE_OP_EXISTS = 1;\n      PREDICATE_OP_UNIQUE = 2;\n    }\n    // TODO: should allow expressions\n    PredicateOp predicate_op = 1;\n    Rel tuples = 2;\n  }\n\n  // A subquery comparison using ANY or ALL.\n  // Examples:\n  //\n  // SELECT *\n  // FROM t1\n  // WHERE x < ANY(SELECT y from t2)\n  message SetComparison {\n    enum ComparisonOp {\n      COMPARISON_OP_UNSPECIFIED = 0;\n      COMPARISON_OP_EQ = 1;\n      COMPARISON_OP_NE = 2;\n      COMPARISON_OP_LT = 3;\n      COMPARISON_OP_GT = 4;\n      COMPARISON_OP_LE = 5;\n      COMPARISON_OP_GE = 6;\n    }\n\n    enum ReductionOp {\n      REDUCTION_OP_UNSPECIFIED = 0;\n      REDUCTION_OP_ANY = 1;\n      REDUCTION_OP_ALL = 2;\n    }\n\n    // ANY or ALL\n    ReductionOp reduction_op = 1;\n    // A comparison operator\n    ComparisonOp comparison_op = 2;\n    // left side of the expression\n    Expression left = 3;\n    // right side of the expression\n    Rel right = 4;\n  }\n}\n
    "},{"location":"expressions/table_functions/","title":"Table Functions","text":"

    Table functions produce zero or more records for each input record. Table functions use a signature similar to scalar functions. However, they are not allowed in the same contexts.

    to be completed\u2026

    "},{"location":"expressions/user_defined_functions/","title":"User-Defined Functions","text":"

    Substrait supports the creation of custom functions using simple extensions, using the facilities described in scalar functions. The functions defined by Substrait use the same mechanism. The extension files for standard functions can be found here.

    Here\u2019s an example function that doubles its input:

    Implementation Note

    This implementation is only defined on 32-bit floats and integers but could be defined on all numbers (and even lists and strings). The user of the implementation can specify what happens when the resulting value falls outside of the valid range for a 32-bit float (either return NAN or raise an error).

    %YAML 1.2\n---\nscalar_functions:\n  -\n    name: \"double\"\n    description: \"Double the value\"\n    impls:\n      - args:\n          - name: x\n            value: fp32\n        options:\n          on_domain_error:\n            values: [ NAN, ERROR ]\n        return: fp32\n      - args:\n          - name: x\n            value: i32\n        options:\n          on_domain_error:\n            values: [ NAN, ERROR ]\n        return: i32\n
    "},{"location":"expressions/window_functions/","title":"Window Functions","text":"

    Window functions are functions that consume values from multiple records to produce a single output. They are similar to aggregate functions, but also have a focused window of analysis within their partition. To an end user, window functions resemble scalar functions, producing a single value for each input record; however, producing each value may require visibility into many records.

    Window function signatures contain all the properties defined for aggregate functions. Additionally, they contain the properties below:

    Property Description Required Inherits All properties defined for aggregate functions. N/A Window Type STREAMING or PARTITION. Describes whether the function needs to see all data for the specific partition operation simultaneously. Operations like SUM can produce values in a streaming manner with no complete visibility of the partition. NTILE requires visibility of the entire partition before it can start producing values. Optional, defaults to PARTITION

    When binding a window function, the binding must include the following additional properties beyond the standard scalar binding properties:

    Property Description Required Partition A list of partitioning expressions. False, defaults to a single partition for the entire dataset Lower Bound Bound Following(int64), Bound Trailing(int64) or CurrentRow. False, defaults to start of partition Upper Bound Bound Following(int64), Bound Trailing(int64) or CurrentRow. False, defaults to end of partition"},{"location":"expressions/window_functions/#aggregate-functions-as-window-functions","title":"Aggregate Functions as Window Functions","text":"

    Aggregate functions can be treated as window functions with Window Type set to STREAMING.

    AVG, COUNT, MAX, MIN and SUM are examples of aggregate functions that are commonly allowed in window contexts.
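
    For illustration, a window function implementation declared in the simple-extension YAML format can state its window type explicitly. The function below is hypothetical, and the exact schema key (`window_type`) is shown here as a sketch of the YAML extension format rather than a normative example:

    ```yaml
    window_functions:
      - name: "my_rank"
        description: "Hypothetical ranking function that needs full partition visibility."
        impls:
          - args: []
            window_type: PARTITION
            return: i64
    ```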

    "},{"location":"extensions/","title":"Extensions","text":"

    In many cases, the existing objects in Substrait will be sufficient to accomplish a particular use case. However, it is sometimes helpful to create a new data type, scalar function signature or some other custom representation within a system. For that, Substrait provides a number of extension points.

    "},{"location":"extensions/#simple-extensions","title":"Simple Extensions","text":"

    Some kinds of primitives are so frequently extended that Substrait defines a standard YAML format that describes how the extended functionality can be interpreted. This allows different projects/systems to use the YAML definition as a specification so that interoperability isn't constrained to the base Substrait specification. The main types of extensions that are defined in this manner include the following:

    • Data types
    • Type variations
    • Scalar Functions
    • Aggregate Functions
    • Window Functions
    • Table Functions

    To extend these items, developers can create one or more YAML files at a defined URI that describe the properties of each of these extensions. The YAML file is constructed according to the YAML Schema. Each definition in the file corresponds to the YAML-based serialization of the relevant data structure. If a user only wants to extend one of these types of objects (e.g. types), they do not have to provide definitions for the other extension points.
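
    For example, a YAML file that only extends types might look like the sketch below. The `point` type and `u8` variation are illustrative, not part of the specification:

    ```yaml
    types:
      - name: point
        structure:
          latitude: i32
          longitude: i32

    type_variations:
      - name: u8
        parent: i8
        description: "An unsigned variation of i8."
        functions: SEPARATE
    ```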

    A Substrait plan can reference one or more YAML files via URI for extension. In the places where these entities are referenced, they will be referenced using a URI + name reference. The name scheme per type works as follows:

    | Category | Naming scheme |
    | --- | --- |
    | Type | The name as defined on the type object. |
    | Type Variation | The name as defined on the type variation object. |
    | Function Signature | A function signature compound name as described below. |

    A YAML file can also reference types and type variations defined in another YAML file. To do this, it must declare the YAML file it depends on using a key-value pair in the dependencies key, where the value is the URI to the YAML file, and the key is a valid identifier that can then be used as an identifier-safe alias for the URI. This alias can then be used as a .-separated namespace prefix wherever a type class or type variation name is expected.

    For example, if the YAML file at file:///extension_types.yaml defines a type called point, a different YAML file can use the type in a function declaration as follows:

    dependencies:\n  ext: file:///extension_types.yaml\nscalar_functions:\n- name: distance\n  description: The distance between two points.\n  impls:\n  - args:\n    - name: a\n      value: ext.point\n    - name: b\n      value: ext.point\n    return: f64\n

    Here, the choice for the name ext is arbitrary, as long as it does not conflict with anything else in the YAML file.

    "},{"location":"extensions/#function-signature-compound-names","title":"Function Signature Compound Names","text":"

    A YAML file may contain one or more functions by the same name. The key used in the function extension declaration to reference a function is a combination of the name of the function along with a list of the required input argument types. The format is as follows:

    <function name>:<short_arg_type0>_<short_arg_type1>_..._<short_arg_typeN>\n

    Rather than using a full data type representation, the input argument types (short_arg_type) are mapped to single-level short names. The mappings are listed in the table below.

    Note

    Every compound function signature must be unique. If two function implementations in a YAML file would generate the same compound function signature, then the YAML file is invalid and behavior is undefined.
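
    Using the short-name mappings listed below, a compound signature can be derived mechanically. A sketch in Python (the helper and its mapping subset are illustrative, not part of the specification; note that optional enumerations are omitted from the compound name):

    ```python
    # Derive a Substrait compound function signature:
    # <function name>:<short_arg_type0>_<short_arg_type1>_..._<short_arg_typeN>
    # Only a subset of the short-name mapping table is included here.
    SHORT_NAMES = {
        "i8": "i8", "i16": "i16", "i32": "i32", "i64": "i64",
        "fp32": "fp32", "fp64": "fp64", "string": "str",
        "timestamp": "ts", "required enumeration": "req",
    }

    def compound_name(name, arg_types):
        # Optional enumerations do not appear in the compound name at all.
        required = [t for t in arg_types if t != "optional enumeration"]
        return name + ":" + "_".join(SHORT_NAMES[t] for t in required)

    print(compound_name("add", ["optional enumeration", "i8", "i8"]))       # add:i8_i8
    print(compound_name("extract", ["required enumeration", "timestamp"]))  # extract:req_ts
    ```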

    | Argument Type | Signature Name |
    | --- | --- |
    | Required Enumeration | req |
    | i8 | i8 |
    | i16 | i16 |
    | i32 | i32 |
    | i64 | i64 |
    | fp32 | fp32 |
    | fp64 | fp64 |
    | string | str |
    | binary | vbin |
    | boolean | bool |
    | timestamp | ts |
    | timestamp_tz | tstz |
    | date | date |
    | time | time |
    | interval_year | iyear |
    | interval_day | iday |
    | uuid | uuid |
    | fixedchar<N> | fchar |
    | varchar<N> | vchar |
    | fixedbinary<N> | fbin |
    | decimal<P,S> | dec |
    | precision_timestamp<P> | pts |
    | precision_timestamp_tz<P> | ptstz |
    | struct<T1,T2,…,TN> | struct |
    | list<T> | list |
    | map<K,V> | map |
    | any[\d]? | any |
    | user defined type | u!name |
    "},{"location":"extensions/#examples","title":"Examples","text":"
    | Function Signature | Function Name |
    | --- | --- |
    | add(optional enumeration, i8, i8) => i8 | add:i8_i8 |
    | avg(fp32) => fp32 | avg:fp32 |
    | extract(required enumeration, timestamp) => i64 | extract:req_ts |
    | sum(any1) => any1 | sum:any |
    "},{"location":"extensions/#advanced-extensions","title":"Advanced Extensions","text":"

    Less common extension needs can be addressed using customization support at the serialization level. This includes the following kinds of extensions:

    | Extension Type | Description |
    | --- | --- |
    | Relation Modification (semantic) | Extensions to an existing relation that alter the semantics of that relation. These kinds of extensions require that any plan consumer understand the extension to be able to manipulate or execute that operator. Ignoring these extensions will result in an incorrect interpretation of the plan. An example extension might be creating a customized version of Aggregate that can optionally apply a filter before aggregating the data. Note: semantic-changing extensions shouldn't change the core characteristics of the underlying relation. For example, they should not change the default direct output field ordering, change the number of fields output, or change the behavior of physical property characteristics. If one needs to change one of these behaviors, one should define a new relation as described below. |
    | Relation Modification (optimization) | Extensions to an existing relation that can improve the efficiency of a plan consumer but don't fundamentally change the behavior of the operation. An example might be an estimated amount of memory the relation is expected to use, or a particular algorithmic pattern that is perceived to be optimal. |
    | New Relations | Creates an entirely new kind of relation. This is the most flexible way to extend Substrait, but it also makes the Substrait plan the least interoperable. In most cases it is better to use a semantic-changing relation extension as opposed to a new relation, as it means existing code patterns can easily be extended to work with the additional properties. |
    | New Read Types | Defines a new subcategory of read that can be used in a ReadRel. One of Substrait's goals is to provide a fairly extensive set of read patterns within the project, as opposed to requiring people to define new types externally. As such, we suggest that you first talk with the Substrait community to determine whether your read type can be incorporated directly in the core specification. |
    | New Write Types | Similar to a read type, but for writes. As with reads, the community recommends that interested extenders first discuss new write types with the community before using the extension mechanisms. |
    | Plan Extensions | Semantic and/or optimization based additions at the plan level. |

    Because extension mechanisms are different for each serialization format, please refer to the corresponding serialization sections to understand how these extensions are defined in more detail.

    "},{"location":"extensions/functions_aggregate_approx/","title":"functions_aggregate_approx.yaml","text":"

    This document is generated from functions_aggregate_approx.yaml

    "},{"location":"extensions/functions_aggregate_approx/#aggregate-functions","title":"Aggregate Functions","text":""},{"location":"extensions/functions_aggregate_approx/#approx_count_distinct","title":"approx_count_distinct","text":"

    Implementations: approx_count_distinct(x): -> return_type 0. approx_count_distinct(any): -> i64

    Calculates the approximate number of rows that contain distinct values of the expression argument using HyperLogLog. This function provides an alternative to the COUNT (DISTINCT expression) function, which returns the exact number of rows that contain distinct values of an expression. APPROX_COUNT_DISTINCT processes large amounts of data significantly faster than COUNT, with negligible deviation from the exact result.

    "},{"location":"extensions/functions_aggregate_generic/","title":"functions_aggregate_generic.yaml","text":"

    This document is generated from functions_aggregate_generic.yaml

    "},{"location":"extensions/functions_aggregate_generic/#aggregate-functions","title":"Aggregate Functions","text":""},{"location":"extensions/functions_aggregate_generic/#count","title":"count","text":"

    Implementations: count(x, option:overflow): -> return_type 0. count(any, option:overflow): -> i64

    Count a set of values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    "},{"location":"extensions/functions_aggregate_generic/#count_1","title":"count","text":"

    Implementations:

    Count a set of records (not field referenced).

    "},{"location":"extensions/functions_aggregate_generic/#any_value","title":"any_value","text":"

    Implementations: any_value(x): -> return_type 0. any_value(any): -> any?

    Selects an arbitrary value from a group of values. If the input is empty, the function returns null.

    "},{"location":"extensions/functions_arithmetic/","title":"functions_arithmetic.yaml","text":"

    This document is generated from functions_arithmetic.yaml

    "},{"location":"extensions/functions_arithmetic/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_arithmetic/#add","title":"add","text":"

    Implementations: add(x, y, option:overflow): -> return_type 0. add(i8, i8, option:overflow): -> i8 1. add(i16, i16, option:overflow): -> i16 2. add(i32, i32, option:overflow): -> i32 3. add(i64, i64, option:overflow): -> i64 4. add(fp32, fp32, option:rounding): -> fp32 5. add(fp64, fp64, option:rounding): -> fp64

    Add two values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    "},{"location":"extensions/functions_arithmetic/#subtract","title":"subtract","text":"

    Implementations: subtract(x, y, option:overflow): -> return_type 0. subtract(i8, i8, option:overflow): -> i8 1. subtract(i16, i16, option:overflow): -> i16 2. subtract(i32, i32, option:overflow): -> i32 3. subtract(i64, i64, option:overflow): -> i64 4. subtract(fp32, fp32, option:rounding): -> fp32 5. subtract(fp64, fp64, option:rounding): -> fp64

    Subtract one value from another.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    "},{"location":"extensions/functions_arithmetic/#multiply","title":"multiply","text":"

    Implementations: multiply(x, y, option:overflow): -> return_type 0. multiply(i8, i8, option:overflow): -> i8 1. multiply(i16, i16, option:overflow): -> i16 2. multiply(i32, i32, option:overflow): -> i32 3. multiply(i64, i64, option:overflow): -> i64 4. multiply(fp32, fp32, option:rounding): -> fp32 5. multiply(fp64, fp64, option:rounding): -> fp64

    Multiply two values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    "},{"location":"extensions/functions_arithmetic/#divide","title":"divide","text":"

    Implementations: divide(x, y, option:overflow, option:on_domain_error, option:on_division_by_zero): -> return_type 0. divide(i8, i8, option:overflow, option:on_domain_error, option:on_division_by_zero): -> i8 1. divide(i16, i16, option:overflow, option:on_domain_error, option:on_division_by_zero): -> i16 2. divide(i32, i32, option:overflow, option:on_domain_error, option:on_division_by_zero): -> i32 3. divide(i64, i64, option:overflow, option:on_domain_error, option:on_division_by_zero): -> i64 4. divide(fp32, fp32, option:rounding, option:on_domain_error, option:on_division_by_zero): -> fp32 5. divide(fp64, fp64, option:rounding, option:on_domain_error, option:on_division_by_zero): -> fp64

    Divide x by y. In the case of integer division, partial values are truncated (i.e. rounded towards 0). The on_division_by_zero option governs behavior in cases where y is 0. If the option is IEEE then the IEEE 754 standard is followed: all values except ±infinity return NaN and ±infinity are unchanged. If the option is LIMIT then the result is ±infinity in all cases. If either x or y is NaN then behavior will be governed by on_domain_error. If x and y are both ±infinity, behavior will be governed by on_domain_error.
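
    The on_division_by_zero behavior described above can be sketched as follows (illustrative helper, not a Substrait API; only the IEEE and LIMIT options are sketched, and the sign chosen for the LIMIT case is assumed to follow the sign of x):

    ```python
    import math

    def divide_fp(x: float, y: float, on_division_by_zero: str = "IEEE") -> float:
        """Sketch of floating-point division-by-zero handling per the description above."""
        if y == 0.0:
            if on_division_by_zero == "IEEE":
                # All values except +/-infinity return NaN; +/-infinity is unchanged.
                return x if math.isinf(x) else math.nan
            if on_division_by_zero == "LIMIT":
                # The result is +/-infinity in all cases (sign of x assumed here).
                return math.copysign(math.inf, x)
        return x / y

    print(divide_fp(1.0, 0.0))           # nan
    print(divide_fp(1.0, 0.0, "LIMIT"))  # inf
    ```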

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • on_domain_error ['NULL', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • on_division_by_zero ['IEEE', 'LIMIT', 'NULL', 'ERROR']
    "},{"location":"extensions/functions_arithmetic/#negate","title":"negate","text":"

    Implementations: negate(x, option:overflow): -> return_type 0. negate(i8, option:overflow): -> i8 1. negate(i16, option:overflow): -> i16 2. negate(i32, option:overflow): -> i32 3. negate(i64, option:overflow): -> i64 4. negate(fp32): -> fp32 5. negate(fp64): -> fp64

    Negation of the value.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    "},{"location":"extensions/functions_arithmetic/#modulus","title":"modulus","text":"

    Implementations: modulus(x, y, option:division_type, option:overflow, option:on_domain_error): -> return_type 0. modulus(i8, i8, option:division_type, option:overflow, option:on_domain_error): -> i8 1. modulus(i16, i16, option:division_type, option:overflow, option:on_domain_error): -> i16 2. modulus(i32, i32, option:division_type, option:overflow, option:on_domain_error): -> i32 3. modulus(i64, i64, option:division_type, option:overflow, option:on_domain_error): -> i64

    Calculate the remainder (r) when dividing dividend (x) by divisor (y). In mathematics, many conventions for the modulus (mod) operation exist. The result of a mod operation depends on the software implementation and underlying hardware. Substrait is a format for describing compute operations on structured data and is designed for interoperability. Therefore the user is responsible for determining a definition of division as defined by the quotient (q). The following basic conditions of division are satisfied: (1) q ∈ ℤ (the quotient is an integer), (2) x = y * q + r (division rule), (3) abs(r) < abs(y). The division_type option determines the mathematical definition of quotient to use in the above definition of division. When division_type=TRUNCATE, q = trunc(x/y). When division_type=FLOOR, q = floor(x/y). In the cases of TRUNCATE and FLOOR division, the remainder is r = x - y * round_func(x/y). The on_domain_error option governs behavior in cases where y is 0, y is ±inf, or x is ±inf. In these cases the mod is undefined. The overflow option governs behavior when integer overflow occurs. If x and y are both 0 or both ±infinity, behavior will be governed by on_domain_error.
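
    The effect of the division_type option can be seen with a worked example (the helper is illustrative, not a Substrait API):

    ```python
    import math

    def modulus(x: int, y: int, division_type: str = "TRUNCATE") -> int:
        """Sketch: remainder r = x - y * q, where q depends on division_type."""
        q = math.trunc(x / y) if division_type == "TRUNCATE" else math.floor(x / y)
        return x - y * q  # division rule: x = y * q + r

    print(modulus(-7, 3, "TRUNCATE"))  # -1  (q = trunc(-7/3) = -2)
    print(modulus(-7, 3, "FLOOR"))     # 2   (q = floor(-7/3) = -3)
    ```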

    Options:
  • division_type ['TRUNCATE', 'FLOOR']
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • on_domain_error ['NULL', 'ERROR']
    "},{"location":"extensions/functions_arithmetic/#power","title":"power","text":"

    Implementations: power(x, y, option:overflow): -> return_type 0. power(i64, i64, option:overflow): -> i64 1. power(fp32, fp32): -> fp32 2. power(fp64, fp64): -> fp64

    Take the power with x as the base and y as exponent.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    "},{"location":"extensions/functions_arithmetic/#sqrt","title":"sqrt","text":"

    Implementations: sqrt(x, option:rounding, option:on_domain_error): -> return_type 0. sqrt(i64, option:rounding, option:on_domain_error): -> fp64 1. sqrt(fp32, option:rounding, option:on_domain_error): -> fp32 2. sqrt(fp64, option:rounding, option:on_domain_error): -> fp64

    Square root of the value.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    "},{"location":"extensions/functions_arithmetic/#exp","title":"exp","text":"

    Implementations: exp(x, option:rounding): -> return_type 0. exp(fp32, option:rounding): -> fp32 1. exp(fp64, option:rounding): -> fp64

    The mathematical constant e, raised to the power of the value.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    "},{"location":"extensions/functions_arithmetic/#cos","title":"cos","text":"

    Implementations: cos(x, option:rounding): -> return_type 0. cos(fp32, option:rounding): -> fp64 1. cos(fp64, option:rounding): -> fp64

    Get the cosine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    "},{"location":"extensions/functions_arithmetic/#sin","title":"sin","text":"

    Implementations: sin(x, option:rounding): -> return_type 0. sin(fp32, option:rounding): -> fp64 1. sin(fp64, option:rounding): -> fp64

    Get the sine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    "},{"location":"extensions/functions_arithmetic/#tan","title":"tan","text":"

    Implementations: tan(x, option:rounding): -> return_type 0. tan(fp32, option:rounding): -> fp64 1. tan(fp64, option:rounding): -> fp64

    Get the tangent of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    "},{"location":"extensions/functions_arithmetic/#cosh","title":"cosh","text":"

    Implementations: cosh(x, option:rounding): -> return_type 0. cosh(fp32, option:rounding): -> fp32 1. cosh(fp64, option:rounding): -> fp64

    Get the hyperbolic cosine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    "},{"location":"extensions/functions_arithmetic/#sinh","title":"sinh","text":"

    Implementations: sinh(x, option:rounding): -> return_type 0. sinh(fp32, option:rounding): -> fp32 1. sinh(fp64, option:rounding): -> fp64

    Get the hyperbolic sine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    "},{"location":"extensions/functions_arithmetic/#tanh","title":"tanh","text":"

    Implementations: tanh(x, option:rounding): -> return_type 0. tanh(fp32, option:rounding): -> fp32 1. tanh(fp64, option:rounding): -> fp64

    Get the hyperbolic tangent of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    "},{"location":"extensions/functions_arithmetic/#acos","title":"acos","text":"

    Implementations: acos(x, option:rounding, option:on_domain_error): -> return_type 0. acos(fp32, option:rounding, option:on_domain_error): -> fp64 1. acos(fp64, option:rounding, option:on_domain_error): -> fp64

    Get the arccosine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    "},{"location":"extensions/functions_arithmetic/#asin","title":"asin","text":"

    Implementations: asin(x, option:rounding, option:on_domain_error): -> return_type 0. asin(fp32, option:rounding, option:on_domain_error): -> fp64 1. asin(fp64, option:rounding, option:on_domain_error): -> fp64

    Get the arcsine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    "},{"location":"extensions/functions_arithmetic/#atan","title":"atan","text":"

    Implementations: atan(x, option:rounding): -> return_type 0. atan(fp32, option:rounding): -> fp64 1. atan(fp64, option:rounding): -> fp64

    Get the arctangent of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    "},{"location":"extensions/functions_arithmetic/#acosh","title":"acosh","text":"

    Implementations: acosh(x, option:rounding, option:on_domain_error): -> return_type 0. acosh(fp32, option:rounding, option:on_domain_error): -> fp32 1. acosh(fp64, option:rounding, option:on_domain_error): -> fp64

    Get the hyperbolic arccosine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    "},{"location":"extensions/functions_arithmetic/#asinh","title":"asinh","text":"

    Implementations: asinh(x, option:rounding): -> return_type 0. asinh(fp32, option:rounding): -> fp32 1. asinh(fp64, option:rounding): -> fp64

    Get the hyperbolic arcsine of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    "},{"location":"extensions/functions_arithmetic/#atanh","title":"atanh","text":"

    Implementations: atanh(x, option:rounding, option:on_domain_error): -> return_type 0. atanh(fp32, option:rounding, option:on_domain_error): -> fp32 1. atanh(fp64, option:rounding, option:on_domain_error): -> fp64

    Get the hyperbolic arctangent of a value in radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    "},{"location":"extensions/functions_arithmetic/#atan2","title":"atan2","text":"

    Implementations: atan2(x, y, option:rounding, option:on_domain_error): -> return_type 0. atan2(fp32, fp32, option:rounding, option:on_domain_error): -> fp64 1. atan2(fp64, fp64, option:rounding, option:on_domain_error): -> fp64

    Get the arctangent of values given as x/y pairs.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'ERROR']
    "},{"location":"extensions/functions_arithmetic/#radians","title":"radians","text":"

    Implementations: radians(x, option:rounding): -> return_type 0. radians(fp32, option:rounding): -> fp32 1. radians(fp64, option:rounding): -> fp64

    Converts angle x in degrees to radians.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    "},{"location":"extensions/functions_arithmetic/#degrees","title":"degrees","text":"

    Implementations: degrees(x, option:rounding): -> return_type 0. degrees(fp32, option:rounding): -> fp32 1. degrees(fp64, option:rounding): -> fp64

    Converts angle x in radians to degrees.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    "},{"location":"extensions/functions_arithmetic/#abs","title":"abs","text":"

    Implementations: abs(x, option:overflow): -> return_type 0. abs(i8, option:overflow): -> i8 1. abs(i16, option:overflow): -> i16 2. abs(i32, option:overflow): -> i32 3. abs(i64, option:overflow): -> i64 4. abs(fp32): -> fp32 5. abs(fp64): -> fp64

    Calculate the absolute value of the argument. Integer values allow the specification of overflow behavior to handle the unevenness of two's complement ranges, e.g. the i8 range [-128 : 127].

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    "},{"location":"extensions/functions_arithmetic/#sign","title":"sign","text":"

    Implementations: sign(x): -> return_type 0. sign(i8): -> i8 1. sign(i16): -> i16 2. sign(i32): -> i32 3. sign(i64): -> i64 4. sign(fp32): -> fp32 5. sign(fp64): -> fp64

    Return the signedness of the argument. Integer values return signedness with the same type as the input; possible return values are [-1, 0, 1]. Floating point values return signedness with the same type as the input; possible return values are [-1.0, -0.0, 0.0, 1.0, NaN].

    "},{"location":"extensions/functions_arithmetic/#factorial","title":"factorial","text":"

    Implementations: factorial(n, option:overflow): -> return_type 0. factorial(i32, option:overflow): -> i32 1. factorial(i64, option:overflow): -> i64

    Return the factorial of a given integer input. The factorial of 0 is 1 by convention. Negative inputs will raise an error.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    "},{"location":"extensions/functions_arithmetic/#bitwise_not","title":"bitwise_not","text":"

    Implementations: bitwise_not(x): -> return_type 0. bitwise_not(i8): -> i8 1. bitwise_not(i16): -> i16 2. bitwise_not(i32): -> i32 3. bitwise_not(i64): -> i64

    Return the bitwise NOT result for one integer input.

    "},{"location":"extensions/functions_arithmetic/#bitwise_and","title":"bitwise_and","text":"

    Implementations: bitwise_and(x, y): -> return_type 0. bitwise_and(i8, i8): -> i8 1. bitwise_and(i16, i16): -> i16 2. bitwise_and(i32, i32): -> i32 3. bitwise_and(i64, i64): -> i64

    Return the bitwise AND result for two integer inputs.

    "},{"location":"extensions/functions_arithmetic/#bitwise_or","title":"bitwise_or","text":"

    Implementations: bitwise_or(x, y): -> return_type 0. bitwise_or(i8, i8): -> i8 1. bitwise_or(i16, i16): -> i16 2. bitwise_or(i32, i32): -> i32 3. bitwise_or(i64, i64): -> i64

    Return the bitwise OR result for two given integer inputs.

    "},{"location":"extensions/functions_arithmetic/#bitwise_xor","title":"bitwise_xor","text":"

    Implementations: bitwise_xor(x, y): -> return_type 0. bitwise_xor(i8, i8): -> i8 1. bitwise_xor(i16, i16): -> i16 2. bitwise_xor(i32, i32): -> i32 3. bitwise_xor(i64, i64): -> i64

    Return the bitwise XOR result for two integer inputs.

    "},{"location":"extensions/functions_arithmetic/#aggregate-functions","title":"Aggregate Functions","text":""},{"location":"extensions/functions_arithmetic/#sum","title":"sum","text":"

    Implementations: sum(x, option:overflow): -> return_type 0. sum(i8, option:overflow): -> i64? 1. sum(i16, option:overflow): -> i64? 2. sum(i32, option:overflow): -> i64? 3. sum(i64, option:overflow): -> i64? 4. sum(fp32, option:overflow): -> fp64? 5. sum(fp64, option:overflow): -> fp64?

    Sum a set of values. The sum of zero elements yields null.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    "},{"location":"extensions/functions_arithmetic/#sum0","title":"sum0","text":"

    Implementations: sum0(x, option:overflow): -> return_type 0. sum0(i8, option:overflow): -> i64 1. sum0(i16, option:overflow): -> i64 2. sum0(i32, option:overflow): -> i64 3. sum0(i64, option:overflow): -> i64 4. sum0(fp32, option:overflow): -> fp64 5. sum0(fp64, option:overflow): -> fp64

    Sum a set of values. The sum of zero elements yields zero. Null values are ignored.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    "},{"location":"extensions/functions_arithmetic/#avg","title":"avg","text":"

    Implementations: avg(x, option:overflow): -> return_type 0. avg(i8, option:overflow): -> i8? 1. avg(i16, option:overflow): -> i16? 2. avg(i32, option:overflow): -> i32? 3. avg(i64, option:overflow): -> i64? 4. avg(fp32, option:overflow): -> fp32? 5. avg(fp64, option:overflow): -> fp64?

    Average a set of values. For integral types, this truncates partial values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
    "},{"location":"extensions/functions_arithmetic/#min","title":"min","text":"

    Implementations: min(x): -> return_type 0. min(i8): -> i8? 1. min(i16): -> i16? 2. min(i32): -> i32? 3. min(i64): -> i64? 4. min(fp32): -> fp32? 5. min(fp64): -> fp64? 6. min(timestamp): -> timestamp? 7. min(timestamp_tz): -> timestamp_tz?

    Min a set of values.

    "},{"location":"extensions/functions_arithmetic/#max","title":"max","text":"

    Implementations: max(x): -> return_type 0. max(i8): -> i8? 1. max(i16): -> i16? 2. max(i32): -> i32? 3. max(i64): -> i64? 4. max(fp32): -> fp32? 5. max(fp64): -> fp64? 6. max(timestamp): -> timestamp? 7. max(timestamp_tz): -> timestamp_tz?

    Max a set of values.

    "},{"location":"extensions/functions_arithmetic/#product","title":"product","text":"

    Implementations: product(x, option:overflow): -> return_type 0. product(i8, option:overflow): -> i8 1. product(i16, option:overflow): -> i16 2. product(i32, option:overflow): -> i32 3. product(i64, option:overflow): -> i64 4. product(fp32, option:rounding): -> fp32 5. product(fp64, option:rounding): -> fp64

    Product of a set of values. Returns 1 for empty input.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    "},{"location":"extensions/functions_arithmetic/#std_dev","title":"std_dev","text":"

    Implementations: std_dev(x, option:rounding, option:distribution): -> return_type 0. std_dev(fp32, option:rounding, option:distribution): -> fp32? 1. std_dev(fp64, option:rounding, option:distribution): -> fp64?

    Calculates standard-deviation for a set of values.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • distribution ['SAMPLE', 'POPULATION']
    "},{"location":"extensions/functions_arithmetic/#variance","title":"variance","text":"

    Implementations: variance(x, option:rounding, option:distribution): -> return_type 0. variance(fp32, option:rounding, option:distribution): -> fp32? 1. variance(fp64, option:rounding, option:distribution): -> fp64?

    Calculates variance for a set of values.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • distribution ['SAMPLE', 'POPULATION']
    "},{"location":"extensions/functions_arithmetic/#corr","title":"corr","text":"

    Implementations: corr(x, y, option:rounding): -> return_type 0. corr(fp32, fp32, option:rounding): -> fp32? 1. corr(fp64, fp64, option:rounding): -> fp64?

    Calculates the value of Pearson's correlation coefficient between x and y. If there is no input, null is returned.

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    "},{"location":"extensions/functions_arithmetic/#mode","title":"mode","text":"

    Implementations: mode(x): -> return_type 0. mode(i8): -> i8? 1. mode(i16): -> i16? 2. mode(i32): -> i32? 3. mode(i64): -> i64? 4. mode(fp32): -> fp32? 5. mode(fp64): -> fp64?

    Calculates mode for a set of values. If there is no input, null is returned.

    "},{"location":"extensions/functions_arithmetic/#median","title":"median","text":"

    Implementations: median(precision, x, option:rounding): -> return_type 0. median(precision, i8, option:rounding): -> i8? 1. median(precision, i16, option:rounding): -> i16? 2. median(precision, i32, option:rounding): -> i32? 3. median(precision, i64, option:rounding): -> i64? 4. median(precision, fp32, option:rounding): -> fp32? 5. median(precision, fp64, option:rounding): -> fp64?

    Calculate the median for a set of values. Returns null if applied to zero records. For the integer implementations, the rounding option determines how the median should be rounded if it ends up midway between two values. For the floating point implementations, it specifies the usual floating point rounding mode.

    Options:
  • precision ['EXACT', 'APPROXIMATE']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
    "},{"location":"extensions/functions_arithmetic/#quantile","title":"quantile","text":"

    Implementations: quantile(boundaries, precision, n, distribution, option:rounding): -> return_type

  • n: A positive integer which defines the number of quantile partitions.
  • distribution: The data for which the quantiles should be computed.
  • 0. quantile(boundaries, precision, i64, any, option:rounding): -> LIST?<any>

    *Calculates quantiles for a set of values. This function will divide the aggregated values (passed via the distribution argument) over N equally-sized bins, where N is passed via a constant argument. It will then return the values at the boundaries of these bins in list form. If the input is appropriately sorted, this computes the quantiles of the distribution. The function can optionally return the first and/or last element of the input, as specified by the boundaries argument. If the input is appropriately sorted, this will thus be the minimum and/or maximum values of the distribution. When the boundaries do not lie exactly on elements of the incoming distribution, the function will interpolate between the two nearby elements. If the interpolated value cannot be represented exactly, the rounding option controls how the value should be selected or computed. The function fails and returns null in the following cases: - n is null or less than one; - any value in distribution is null.

    The function returns an empty list if n equals 1 and boundaries is set to NEITHER. *
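    The bin-boundary logic above can be sketched as follows over an already-sorted distribution; the linear interpolation between neighboring elements is an assumption (the rounding option governs how an unrepresentable interpolated value is actually selected or computed):

```python
def quantile(distribution, n, boundaries="NEITHER"):
    """Boundary values of n equally-sized bins over a sorted distribution.

    Returns None (null) if n is null or less than one, or if any
    value in the distribution is null.
    """
    if n is None or n < 1 or any(v is None for v in distribution):
        return None
    cuts = []
    if boundaries in ("MINIMUM", "BOTH"):
        cuts.append(0.0)
    cuts += [k / n for k in range(1, n)]          # interior bin boundaries
    if boundaries in ("MAXIMUM", "BOTH"):
        cuts.append(1.0)
    m = len(distribution)
    out = []
    for q in cuts:
        pos = q * (m - 1)
        i, frac = int(pos), pos - int(pos)
        if frac == 0:
            out.append(distribution[i])
        else:                                     # linear interpolation (assumed)
            out.append(distribution[i] + frac * (distribution[i + 1] - distribution[i]))
    return out
```

    With n = 1 and boundaries NEITHER there are no cut points, so the empty list falls out naturally.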

    Options:
  • boundaries ['NEITHER', 'MINIMUM', 'MAXIMUM', 'BOTH']
  • precision ['EXACT', 'APPROXIMATE']
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • "},{"location":"extensions/functions_arithmetic/#window-functions","title":"Window Functions","text":""},{"location":"extensions/functions_arithmetic/#row_number","title":"row_number","text":"

    Implementations: 0. row_number(): -> i64?

    the number of the current row within its partition.

    "},{"location":"extensions/functions_arithmetic/#rank","title":"rank","text":"

    Implementations: 0. rank(): -> i64?

    the rank of the current row, with gaps.

    "},{"location":"extensions/functions_arithmetic/#dense_rank","title":"dense_rank","text":"

    Implementations: 0. dense_rank(): -> i64?

    the rank of the current row, without gaps.
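    The "with gaps" versus "without gaps" distinction between rank and dense_rank can be sketched over an ordered list of sort keys (the `ranks` helper is illustrative, not part of the spec):

```python
def ranks(sorted_keys):
    """(rank-with-gaps, dense-rank) for an already-ordered list of sort keys."""
    with_gaps, dense = [], []
    for i, key in enumerate(sorted_keys):
        if i and key == sorted_keys[i - 1]:
            # peer row: repeats the previous rank in both schemes
            with_gaps.append(with_gaps[-1])
            dense.append(dense[-1])
        else:
            with_gaps.append(i + 1)               # skips over peers -> gaps
            dense.append((dense[-1] if dense else 0) + 1)
    return with_gaps, dense
```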

    "},{"location":"extensions/functions_arithmetic/#percent_rank","title":"percent_rank","text":"

    Implementations: 0. percent_rank(): -> fp64?

    the relative rank of the current row.

    "},{"location":"extensions/functions_arithmetic/#cume_dist","title":"cume_dist","text":"

    Implementations: 0. cume_dist(): -> fp64?

    the cumulative distribution.

    "},{"location":"extensions/functions_arithmetic/#ntile","title":"ntile","text":"

    Implementations: ntile(x): -> return_type 0. ntile(i32): -> i32? 1. ntile(i64): -> i64?

    Return an integer ranging from 1 to the argument value, dividing the partition as equally as possible.
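    "As equally as possible" means the first `num_rows mod n` buckets get one extra row; a small illustrative sketch:

```python
def ntile(num_rows, buckets):
    """Bucket number (1..buckets) per row, with sizes as equal as possible."""
    base, extra = divmod(num_rows, buckets)
    out = []
    for b in range(1, buckets + 1):
        # earlier buckets absorb the remainder rows
        out.extend([b] * (base + (1 if b <= extra else 0)))
    return out
```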

    "},{"location":"extensions/functions_arithmetic/#first_value","title":"first_value","text":"

    Implementations: first_value(expression): -> return_type 0. first_value(any1): -> any1

    *Returns the first value in the window. *

    "},{"location":"extensions/functions_arithmetic/#last_value","title":"last_value","text":"

    Implementations: last_value(expression): -> return_type 0. last_value(any1): -> any1

    *Returns the last value in the window. *

    "},{"location":"extensions/functions_arithmetic/#nth_value","title":"nth_value","text":"

    Implementations: nth_value(expression, window_offset, option:on_domain_error): -> return_type 0. nth_value(any1, i32, option:on_domain_error): -> any1?

    *Returns a value from the nth row based on the window_offset. window_offset should be a positive integer. If the value of the window_offset is outside the range of the window, null is returned. The on_domain_error option governs behavior in cases where window_offset is not a positive integer or null. *
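    A minimal sketch of the in-range behavior, with `None` modeling null; how a non-positive offset is handled is left to the on_domain_error option, which the sketch only notes in a comment:

```python
def nth_value(window_rows, window_offset):
    """Value of the 1-based window_offset-th row in the window.

    Out-of-range offsets yield None (null); non-positive or null offsets
    are the domain-error case governed by the on_domain_error option.
    """
    if window_offset is None or window_offset < 1:
        return None  # placeholder for on_domain_error handling
    if window_offset > len(window_rows):
        return None  # outside the window -> null
    return window_rows[window_offset - 1]
```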

    Options:
  • on_domain_error ['NAN', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic/#lead","title":"lead","text":"

    Implementations: lead(expression): -> return_type 0. lead(any1): -> any1? 1. lead(any1, i32): -> any1? 2. lead(any1, i32, any1): -> any1?

    *Return a value from a following row based on a specified physical offset. This allows you to compare a value in the current row against a following row. The expression is evaluated against a row that comes after the current row based on the row_offset. The row_offset should be a positive integer and is set to 1 if not specified explicitly. If the row_offset is negative, the expression will be evaluated against a row coming before the current row, similar to the lag function. A row_offset of null will return null. The function returns the default input value if row_offset goes beyond the scope of the window. If a default value is not specified, it is set to null. Example comparing the sales of the current year to the following year. row_offset of 1. | year | sales | next_year_sales | | 2019 | 20.50 | 30.00 | | 2020 | 30.00 | 45.99 | | 2021 | 45.99 | null | *

    "},{"location":"extensions/functions_arithmetic/#lag","title":"lag","text":"

    Implementations: lag(expression): -> return_type 0. lag(any1): -> any1? 1. lag(any1, i32): -> any1? 2. lag(any1, i32, any1): -> any1?

    *Return a column value from a previous row based on a specified physical offset. This allows you to compare a value in the current row against a previous row. The expression is evaluated against a row that comes before the current row based on the row_offset. The expression can be a column, expression or subquery that evaluates to a single value. The row_offset should be a positive integer and is set to 1 if not specified explicitly. If the row_offset is negative, the expression will be evaluated against a row coming after the current row, similar to the lead function. A row_offset of null will return null. The function returns the default input value if row_offset goes beyond the scope of the partition. If a default value is not specified, it is set to null. Example comparing the sales of the current year to the previous year. row_offset of 1. | year | sales | previous_year_sales | | 2019 | 20.50 | null | | 2020 | 30.00 | 20.50 | | 2021 | 45.99 | 30.00 | *
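    The offset arithmetic for lag (and, via a negative offset, lead) can be sketched over a partition held as a list; `rows` and `i` (the current row index) are illustrative names:

```python
def lag(rows, i, row_offset=1, default=None):
    """Value from the row `row_offset` positions before rows[i].

    A negative offset looks ahead (lead-like). Falling outside the
    partition yields `default`, which is null (None) when unspecified.
    """
    if row_offset is None:
        return None
    j = i - row_offset
    return rows[j] if 0 <= j < len(rows) else default
```

    Replaying the sales example: for 2019 the previous-year value is null, for 2020 it is 20.50, and for 2021 it is 30.00.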

    "},{"location":"extensions/functions_arithmetic_decimal/","title":"functions_arithmetic_decimal.yaml","text":"

    This document file is generated for functions_arithmetic_decimal.yaml

    "},{"location":"extensions/functions_arithmetic_decimal/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_arithmetic_decimal/#add","title":"add","text":"

    Implementations: add(x, y, option:overflow): -> return_type 0. add(decimal<P1,S1>, decimal<P2,S2>, option:overflow): ->

    init_scale = max(S1,S2)\ninit_prec = init_scale + max(P1 - S1, P2 - S2) + 1\nmin_scale = min(init_scale, 6)\ndelta = init_prec - 38\nprec = min(init_prec, 38)\nscale_after_borrow = max(init_scale - delta, min_scale)\nscale = init_prec > 38 ? scale_after_borrow : init_scale\nDECIMAL<prec, scale>  \n

    Add two decimal values.
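    The precision/scale derivation above transcribes directly into code; this sketch computes only the result type DECIMAL<prec, scale>, not the addition itself:

```python
def add_result_type(p1, s1, p2, s2):
    """DECIMAL<prec, scale> for add(decimal<P1,S1>, decimal<P2,S2>),
    transcribed from the derivation above (38 is the max precision)."""
    init_scale = max(s1, s2)
    init_prec = init_scale + max(p1 - s1, p2 - s2) + 1
    min_scale = min(init_scale, 6)
    delta = init_prec - 38
    prec = min(init_prec, 38)
    scale_after_borrow = max(init_scale - delta, min_scale)
    scale = scale_after_borrow if init_prec > 38 else init_scale
    return prec, scale
```

    For example, adding two DECIMAL<10,2> values yields DECIMAL<11,2>, while adding two DECIMAL<38,10> values overflows the 38-digit cap and borrows from the scale, yielding DECIMAL<38,9>.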

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic_decimal/#subtract","title":"subtract","text":"

    Implementations: subtract(x, y, option:overflow): -> return_type 0. subtract(decimal<P1,S1>, decimal<P2,S2>, option:overflow): ->

    init_scale = max(S1,S2)\ninit_prec = init_scale + max(P1 - S1, P2 - S2) + 1\nmin_scale = min(init_scale, 6)\ndelta = init_prec - 38\nprec = min(init_prec, 38)\nscale_after_borrow = max(init_scale - delta, min_scale)\nscale = init_prec > 38 ? scale_after_borrow : init_scale\nDECIMAL<prec, scale>  \n

    Subtract one decimal value from another.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic_decimal/#multiply","title":"multiply","text":"

    Implementations: multiply(x, y, option:overflow): -> return_type 0. multiply(decimal<P1,S1>, decimal<P2,S2>, option:overflow): ->

    init_scale = S1 + S2\ninit_prec = P1 + P2 + 1\nmin_scale = min(init_scale, 6)\ndelta = init_prec - 38\nprec = min(init_prec, 38)\nscale_after_borrow = max(init_scale - delta, min_scale)\nscale = init_prec > 38 ? scale_after_borrow : init_scale\nDECIMAL<prec, scale>  \n

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic_decimal/#divide","title":"divide","text":"

    Implementations: divide(x, y, option:overflow): -> return_type 0. divide(decimal<P1,S1>, decimal<P2,S2>, option:overflow): ->

    init_scale = max(6, S1 + P2 + 1)\ninit_prec = P1 - S1 + P2 + init_scale\nmin_scale = min(init_scale, 6)\ndelta = init_prec - 38\nprec = min(init_prec, 38)\nscale_after_borrow = max(init_scale - delta, min_scale)\nscale = init_prec > 38 ? scale_after_borrow : init_scale\nDECIMAL<prec, scale>  \n

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic_decimal/#modulus","title":"modulus","text":"

    Implementations: modulus(x, y, option:overflow): -> return_type 0. modulus(decimal<P1,S1>, decimal<P2,S2>, option:overflow): ->

    init_scale = max(S1,S2)\ninit_prec = min(P1 - S1, P2 - S2) + init_scale\nmin_scale = min(init_scale, 6)\ndelta = init_prec - 38\nprec = min(init_prec, 38)\nscale_after_borrow = max(init_scale - delta, min_scale)\nscale = init_prec > 38 ? scale_after_borrow : init_scale\nDECIMAL<prec, scale>  \n

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic_decimal/#aggregate-functions","title":"Aggregate Functions","text":""},{"location":"extensions/functions_arithmetic_decimal/#sum","title":"sum","text":"

    Implementations: sum(x, option:overflow): -> return_type 0. sum(DECIMAL<P, S>, option:overflow): -> DECIMAL?<38,S>

    Sum a set of values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic_decimal/#avg","title":"avg","text":"

    Implementations: avg(x, option:overflow): -> return_type 0. avg(DECIMAL<P,S>, option:overflow): -> DECIMAL<38,S>

    Average a set of values.

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_arithmetic_decimal/#min","title":"min","text":"

    Implementations: min(x): -> return_type 0. min(DECIMAL<P, S>): -> DECIMAL?<P, S>

    Min a set of values.

    "},{"location":"extensions/functions_arithmetic_decimal/#max","title":"max","text":"

    Implementations: max(x): -> return_type 0. max(DECIMAL<P,S>): -> DECIMAL?<P, S>

    Max a set of values.

    "},{"location":"extensions/functions_arithmetic_decimal/#sum0","title":"sum0","text":"

    Implementations: sum0(x, option:overflow): -> return_type 0. sum0(DECIMAL<P, S>, option:overflow): -> DECIMAL<38,S>

    *Sum a set of values. The sum of zero elements yields zero. Null values are ignored. *

    Options:
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • "},{"location":"extensions/functions_boolean/","title":"functions_boolean.yaml","text":"

    This document file is generated for functions_boolean.yaml

    "},{"location":"extensions/functions_boolean/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_boolean/#or","title":"or","text":"

    Implementations: or(a): -> return_type 0. or(boolean?): -> boolean?

    *The boolean or using Kleene logic. This function behaves as follows with nulls:

    true or null = true\n\nnull or true = true\n\nfalse or null = null\n\nnull or false = null\n\nnull or null = null\n

    In other words, in this context a null value really means “unknown”, and an unknown value or true is always true. Behavior for 0 or 1 inputs is as follows: or() -> false or(x) -> x *
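    The truth table above, including the 0- and 1-input cases, can be sketched with `None` standing in for null:

```python
def kleene_or(*args):
    """Boolean OR under Kleene logic; None models null ("unknown")."""
    if any(a is True for a in args):
        return True          # a known true dominates any unknown
    if any(a is None for a in args):
        return None          # no true seen, but an unknown could be true
    return False             # all inputs false; also covers or() -> false
```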

    "},{"location":"extensions/functions_boolean/#and","title":"and","text":"

    Implementations: and(a): -> return_type 0. and(boolean?): -> boolean?

    *The boolean and using Kleene logic. This function behaves as follows with nulls:

    true and null = null\n\nnull and true = null\n\nfalse and null = false\n\nnull and false = false\n\nnull and null = null\n

    In other words, in this context a null value really means “unknown”, and an unknown value and false is always false. Behavior for 0 or 1 inputs is as follows: and() -> true and(x) -> x *

    "},{"location":"extensions/functions_boolean/#and_not","title":"and_not","text":"

    Implementations: and_not(a, b): -> return_type 0. and_not(boolean?, boolean?): -> boolean?

    *The boolean and of one value and the negation of the other using Kleene logic. This function behaves as follows with nulls:

    true and not null = null\n\nnull and not false = null\n\nfalse and not null = false\n\nnull and not true = false\n\nnull and not null = null\n

    In other words, in this context a null value really means “unknown”, and an unknown value and not true is always false, as is false and not an unknown value. *
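    The and_not table is the least obvious of the Kleene combinators; a sketch that negates b first and then applies Kleene AND reproduces it exactly (`None` models null):

```python
def kleene_and_not(a, b):
    """a AND (NOT b) under Kleene logic; None models null ("unknown")."""
    not_b = None if b is None else not b
    if a is False or not_b is False:
        return False   # a known false dominates any unknown
    if a is None or not_b is None:
        return None
    return True
```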

    "},{"location":"extensions/functions_boolean/#xor","title":"xor","text":"

    Implementations: xor(a, b): -> return_type 0. xor(boolean?, boolean?): -> boolean?

    *The boolean xor of two values using Kleene logic. When a null is encountered in either input, a null is output. *

    "},{"location":"extensions/functions_boolean/#not","title":"not","text":"

    Implementations: not(a): -> return_type 0. not(boolean?): -> boolean?

    *The not of a boolean value. When a null is input, a null is output. *

    "},{"location":"extensions/functions_boolean/#aggregate-functions","title":"Aggregate Functions","text":""},{"location":"extensions/functions_boolean/#bool_and","title":"bool_and","text":"

    Implementations: bool_and(a): -> return_type 0. bool_and(boolean): -> boolean?

    *If any value in the input is false, false is returned. If the input is empty or only contains nulls, null is returned. Otherwise, true is returned. *

    "},{"location":"extensions/functions_boolean/#bool_or","title":"bool_or","text":"

    Implementations: bool_or(a): -> return_type 0. bool_or(boolean): -> boolean?

    *If any value in the input is true, true is returned. If the input is empty or only contains nulls, null is returned. Otherwise, false is returned. *

    "},{"location":"extensions/functions_comparison/","title":"functions_comparison.yaml","text":"

    This document file is generated for functions_comparison.yaml

    "},{"location":"extensions/functions_comparison/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_comparison/#not_equal","title":"not_equal","text":"

    Implementations: not_equal(x, y): -> return_type 0. not_equal(any1, any1): -> boolean

    *Whether two values are not_equal. not_equal(x, y) := (x != y) If either/both of x and y are null, null is returned. *

    "},{"location":"extensions/functions_comparison/#equal","title":"equal","text":"

    Implementations: equal(x, y): -> return_type 0. equal(any1, any1): -> boolean

    *Whether two values are equal. equal(x, y) := (x == y) If either/both of x and y are null, null is returned. *

    "},{"location":"extensions/functions_comparison/#is_not_distinct_from","title":"is_not_distinct_from","text":"

    Implementations: is_not_distinct_from(x, y): -> return_type 0. is_not_distinct_from(any1, any1): -> boolean

    *Whether two values are equal. This function treats null values as comparable, so is_not_distinct_from(null, null) == True This is in contrast to equal, in which null values do not compare. *

    "},{"location":"extensions/functions_comparison/#lt","title":"lt","text":"

    Implementations: lt(x, y): -> return_type 0. lt(any1, any1): -> boolean

    *Less than. lt(x, y) := (x < y) If either/both of x and y are null, null is returned. *

    "},{"location":"extensions/functions_comparison/#gt","title":"gt","text":"

    Implementations: gt(x, y): -> return_type 0. gt(any1, any1): -> boolean

    *Greater than. gt(x, y) := (x > y) If either/both of x and y are null, null is returned. *

    "},{"location":"extensions/functions_comparison/#lte","title":"lte","text":"

    Implementations: lte(x, y): -> return_type 0. lte(any1, any1): -> boolean

    *Less than or equal to. lte(x, y) := (x <= y) If either/both of x and y are null, null is returned. *

    "},{"location":"extensions/functions_comparison/#gte","title":"gte","text":"

    Implementations: gte(x, y): -> return_type 0. gte(any1, any1): -> boolean

    *Greater than or equal to. gte(x, y) := (x >= y) If either/both of x and y are null, null is returned. *

    "},{"location":"extensions/functions_comparison/#between","title":"between","text":"

    Implementations: between(expression, low, high): -> return_type

  • expression: The expression to test for in the range defined by `low` and `high`.
  • low: The value to check if greater than or equal to.
  • high: The value to check if less than or equal to.
  • 0. between(any1, any1, any1): -> boolean

    Whether the expression is greater than or equal to low and less than or equal to high. expression BETWEEN low AND high If low, high, or expression are null, null is returned.
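    A one-liner sketch of the null-propagating range check, with `None` as null:

```python
def between(expression, low, high):
    """expression BETWEEN low AND high; null if any operand is null."""
    if expression is None or low is None or high is None:
        return None
    return low <= expression <= high
```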

    "},{"location":"extensions/functions_comparison/#is_null","title":"is_null","text":"

    Implementations: is_null(x): -> return_type 0. is_null(any1): -> boolean

    Whether a value is null. NaN is not null.

    "},{"location":"extensions/functions_comparison/#is_not_null","title":"is_not_null","text":"

    Implementations: is_not_null(x): -> return_type 0. is_not_null(any1): -> boolean

    Whether a value is not null. NaN is not null.

    "},{"location":"extensions/functions_comparison/#is_nan","title":"is_nan","text":"

    Implementations: is_nan(x): -> return_type 0. is_nan(fp32): -> boolean 1. is_nan(fp64): -> boolean

    *Whether a value is not a number. If x is null, null is returned. *

    "},{"location":"extensions/functions_comparison/#is_finite","title":"is_finite","text":"

    Implementations: is_finite(x): -> return_type 0. is_finite(fp32): -> boolean 1. is_finite(fp64): -> boolean

    *Whether a value is finite (neither infinite nor NaN). If x is null, null is returned. *

    "},{"location":"extensions/functions_comparison/#is_infinite","title":"is_infinite","text":"

    Implementations: is_infinite(x): -> return_type 0. is_infinite(fp32): -> boolean 1. is_infinite(fp64): -> boolean

    *Whether a value is infinite. If x is null, null is returned. *

    "},{"location":"extensions/functions_comparison/#nullif","title":"nullif","text":"

    Implementations: nullif(x, y): -> return_type 0. nullif(any1, any1): -> any1

    If two values are equal, return null. Otherwise, return the first value.

    "},{"location":"extensions/functions_comparison/#coalesce","title":"coalesce","text":"

    Implementations: 0. coalesce(any1, any1): -> any1

    Evaluate arguments from left to right and return the first argument that is not null. Once a non-null argument is found, the remaining arguments are not evaluated. If all arguments are null, return null.
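    Because remaining arguments must not be evaluated after the first non-null one, this sketch takes zero-argument callables so the short-circuit is visible (`None` models null; the thunk interface is illustrative only):

```python
def coalesce(*thunks):
    """First non-null result; later arguments are never evaluated.

    Each argument is a zero-arg callable so the required
    short-circuit evaluation can be demonstrated.
    """
    for thunk in thunks:
        value = thunk()
        if value is not None:
            return value
    return None
```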

    "},{"location":"extensions/functions_comparison/#least","title":"least","text":"

    Implementations: 0. least(T, T): -> T

    Evaluates each argument and returns the smallest one. The function will return null if any argument evaluates to null.

    "},{"location":"extensions/functions_comparison/#least_skip_null","title":"least_skip_null","text":"

    Implementations: 0. least_skip_null(T, T): -> T

    Evaluates each argument and returns the smallest one. The function will return null only if all arguments evaluate to null.

    "},{"location":"extensions/functions_comparison/#greatest","title":"greatest","text":"

    Implementations: 0. greatest(T, T): -> T

    Evaluates each argument and returns the largest one. The function will return null if any argument evaluates to null.

    "},{"location":"extensions/functions_comparison/#greatest_skip_null","title":"greatest_skip_null","text":"

    Implementations: 0. greatest_skip_null(T, T): -> T

    Evaluates each argument and returns the largest one. The function will return null only if all arguments evaluate to null.

    "},{"location":"extensions/functions_datetime/","title":"functions_datetime.yaml","text":"

    This document file is generated for functions_datetime.yaml

    "},{"location":"extensions/functions_datetime/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_datetime/#extract","title":"extract","text":"

    Implementations: extract(component, x, timezone): -> return_type

  • timezone: Timezone string from IANA tzdb.
  • 0. extract(component, timestamp_tz, string): -> i64 1. extract(component, precision_timestamp_tz<P1>, string): -> i64 2. extract(component, timestamp): -> i64 3. extract(component, precision_timestamp<P1>): -> i64 4. extract(component, date): -> i64 5. extract(component, time): -> i64 6. extract(component, indexing, timestamp_tz, string): -> i64 7. extract(component, indexing, precision_timestamp_tz<P1>, string): -> i64 8. extract(component, indexing, timestamp): -> i64 9. extract(component, indexing, precision_timestamp<P1>): -> i64 10. extract(component, indexing, date): -> i64

    Extract portion of a date/time value. * YEAR Return the year. * ISO_YEAR Return the ISO 8601 week-numbering year. First week of an ISO year has the majority (4 or more) of its days in January. * US_YEAR Return the US epidemiological year. First week of US epidemiological year has the majority (4 or more) of its days in January. Last week of US epidemiological year has the year’s last Wednesday in it. US epidemiological week starts on Sunday. * QUARTER Return the number of the quarter within the year. January 1 through March 31 map to the first quarter, April 1 through June 30 map to the second quarter, etc. * MONTH Return the number of the month within the year. * DAY Return the number of the day within the month. * DAY_OF_YEAR Return the number of the day within the year. January 1 maps to the first day, February 1 maps to the thirty-second day, etc. * MONDAY_DAY_OF_WEEK Return the number of the day within the week, from Monday (first day) to Sunday (seventh day). * SUNDAY_DAY_OF_WEEK Return the number of the day within the week, from Sunday (first day) to Saturday (seventh day). * MONDAY_WEEK Return the number of the week within the year. First week starts on first Monday of January. * SUNDAY_WEEK Return the number of the week within the year. First week starts on first Sunday of January. * ISO_WEEK Return the number of the ISO week within the ISO year. First ISO week has the majority (4 or more) of its days in January. ISO week starts on Monday. * US_WEEK Return the number of the US week within the US year. First US week has the majority (4 or more) of its days in January. US week starts on Sunday. * HOUR Return the hour (0-23). * MINUTE Return the minute (0-59). * SECOND Return the second (0-59). * MILLISECOND Return number of milliseconds since the last full second. * MICROSECOND Return number of microseconds since the last full millisecond. * NANOSECOND Return number of nanoseconds since the last full microsecond. * SUBSECOND Return number of microseconds since the last full second of the given timestamp. * UNIX_TIME Return number of seconds that have elapsed since 1970-01-01 00:00:00 UTC, ignoring leap seconds. * TIMEZONE_OFFSET Return number of seconds of timezone offset to UTC.

    The range of values returned for QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK, SUNDAY_DAY_OF_WEEK, MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, and US_WEEK depends on whether counting starts at 1 or 0. This is governed by the indexing option. When indexing is ONE: * QUARTER returns values in range 1-4 * MONTH returns values in range 1-12 * DAY returns values in range 1-31 * DAY_OF_YEAR returns values in range 1-366 * MONDAY_DAY_OF_WEEK and SUNDAY_DAY_OF_WEEK return values in range 1-7 * MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, and US_WEEK return values in range 1-53 When indexing is ZERO: * QUARTER returns values in range 0-3 * MONTH returns values in range 0-11 * DAY returns values in range 0-30 * DAY_OF_YEAR returns values in range 0-365 * MONDAY_DAY_OF_WEEK and SUNDAY_DAY_OF_WEEK return values in range 0-6 * MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, and US_WEEK return values in range 0-52

    The indexing option must be specified when the component is QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK, SUNDAY_DAY_OF_WEEK, MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, or US_WEEK. The indexing option cannot be specified when the component is YEAR, ISO_YEAR, US_YEAR, HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND, SUBSECOND, UNIX_TIME, or TIMEZONE_OFFSET. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.
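    The ONE/ZERO indexing behavior for a few of the indexable components can be sketched over a Python `date`; only the components shown are implemented here, and the mapping is illustrative rather than normative:

```python
from datetime import date

def extract(component, d, indexing="ONE"):
    """A few of the indexable `extract` components over a Python date."""
    one_based = {
        "MONTH": d.month,
        "DAY": d.day,
        "QUARTER": (d.month - 1) // 3 + 1,
        "DAY_OF_YEAR": d.timetuple().tm_yday,
    }[component]
    # ZERO indexing simply shifts every 1-based component down by one.
    return one_based if indexing == "ONE" else one_based - 1
```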

    Options:
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'SUBSECOND', 'UNIX_TIME', 'TIMEZONE_OFFSET']
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'NANOSECOND', 'SUBSECOND', 'UNIX_TIME', 'TIMEZONE_OFFSET']
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'SUBSECOND', 'UNIX_TIME']
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'NANOSECOND', 'SUBSECOND', 'UNIX_TIME']
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'UNIX_TIME']
  • component ['HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'SUBSECOND']
  • component ['QUARTER', 'MONTH', 'DAY', 'DAY_OF_YEAR', 'MONDAY_DAY_OF_WEEK', 'SUNDAY_DAY_OF_WEEK', 'MONDAY_WEEK', 'SUNDAY_WEEK', 'ISO_WEEK', 'US_WEEK']
  • indexing ['ONE', 'ZERO']
  • "},{"location":"extensions/functions_datetime/#extract_boolean","title":"extract_boolean","text":"

    Implementations: extract_boolean(component, x): -> return_type 0. extract_boolean(component, timestamp): -> boolean 1. extract_boolean(component, timestamp_tz, string): -> boolean 2. extract_boolean(component, date): -> boolean

    *Extract boolean values of a date/time value. * IS_LEAP_YEAR Return true if year of the given value is a leap year and false otherwise. * IS_DST Return true if DST (Daylight Savings Time) is observed at the given value in the given timezone.

    Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.*

    Options:
  • component ['IS_LEAP_YEAR']
  • component ['IS_LEAP_YEAR', 'IS_DST']
  • "},{"location":"extensions/functions_datetime/#add","title":"add","text":"

    Implementations: add(x, y): -> return_type 0. add(timestamp, interval_year): -> timestamp 1. add(timestamp_tz, interval_year, string): -> timestamp_tz 2. add(date, interval_year): -> timestamp 3. add(timestamp, interval_day): -> timestamp 4. add(timestamp_tz, interval_day): -> timestamp_tz 5. add(date, interval_day): -> timestamp

    Add an interval to a date/time type. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.

    "},{"location":"extensions/functions_datetime/#multiply","title":"multiply","text":"

    Implementations: multiply(x, y): -> return_type 0. multiply(i8, interval_day): -> interval_day 1. multiply(i16, interval_day): -> interval_day 2. multiply(i32, interval_day): -> interval_day 3. multiply(i64, interval_day): -> interval_day 4. multiply(i8, interval_year): -> interval_year 5. multiply(i16, interval_year): -> interval_year 6. multiply(i32, interval_year): -> interval_year 7. multiply(i64, interval_year): -> interval_year

    Multiply an interval by an integral number.

    "},{"location":"extensions/functions_datetime/#add_intervals","title":"add_intervals","text":"

    Implementations: add_intervals(x, y): -> return_type 0. add_intervals(interval_day, interval_day): -> interval_day 1. add_intervals(interval_year, interval_year): -> interval_year

    Add two intervals together.

    "},{"location":"extensions/functions_datetime/#subtract","title":"subtract","text":"

    Implementations: subtract(x, y): -> return_type 0. subtract(timestamp, interval_year): -> timestamp 1. subtract(timestamp_tz, interval_year): -> timestamp_tz 2. subtract(timestamp_tz, interval_year, string): -> timestamp_tz 3. subtract(date, interval_year): -> date 4. subtract(timestamp, interval_day): -> timestamp 5. subtract(timestamp_tz, interval_day): -> timestamp_tz 6. subtract(date, interval_day): -> date

    Subtract an interval from a date/time type. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.

    "},{"location":"extensions/functions_datetime/#lte","title":"lte","text":"

    Implementations: lte(x, y): -> return_type 0. lte(timestamp, timestamp): -> boolean 1. lte(timestamp_tz, timestamp_tz): -> boolean 2. lte(date, date): -> boolean 3. lte(interval_day, interval_day): -> boolean 4. lte(interval_year, interval_year): -> boolean

    less than or equal to

    "},{"location":"extensions/functions_datetime/#lt","title":"lt","text":"

    Implementations: lt(x, y): -> return_type 0. lt(timestamp, timestamp): -> boolean 1. lt(timestamp_tz, timestamp_tz): -> boolean 2. lt(date, date): -> boolean 3. lt(interval_day, interval_day): -> boolean 4. lt(interval_year, interval_year): -> boolean

    less than

    "},{"location":"extensions/functions_datetime/#gte","title":"gte","text":"

    Implementations: gte(x, y): -> return_type 0. gte(timestamp, timestamp): -> boolean 1. gte(timestamp_tz, timestamp_tz): -> boolean 2. gte(date, date): -> boolean 3. gte(interval_day, interval_day): -> boolean 4. gte(interval_year, interval_year): -> boolean

    greater than or equal to

    "},{"location":"extensions/functions_datetime/#gt","title":"gt","text":"

    Implementations: gt(x, y): -> return_type 0. gt(timestamp, timestamp): -> boolean 1. gt(timestamp_tz, timestamp_tz): -> boolean 2. gt(date, date): -> boolean 3. gt(interval_day, interval_day): -> boolean 4. gt(interval_year, interval_year): -> boolean

    greater than

    "},{"location":"extensions/functions_datetime/#assume_timezone","title":"assume_timezone","text":"

    Implementations: assume_timezone(x, timezone): -> return_type

  • timezone: Timezone string from IANA tzdb.
  • 0. assume_timezone(timestamp, string): -> timestamp_tz 1. assume_timezone(date, string): -> timestamp_tz

    Convert local timestamp to UTC-relative timestamp_tz using given local time’s timezone. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.

    "},{"location":"extensions/functions_datetime/#local_timestamp","title":"local_timestamp","text":"

    Implementations: local_timestamp(x, timezone): -> return_type

  • timezone: Timezone string from IANA tzdb.
  • 0. local_timestamp(timestamp_tz, string): -> timestamp

    Convert UTC-relative timestamp_tz to local timestamp using given local time’s timezone. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: “Pacific/Marquesas”, “Etc/GMT+1”. If timezone is invalid an error is thrown.

    "},{"location":"extensions/functions_datetime/#strptime_time","title":"strptime_time","text":"

    Implementations: strptime_time(time_string, format): -> return_type 0. strptime_time(string, string): -> time

    Parse string into time using provided format, see https://man7.org/linux/man-pages/man3/strptime.3.html for reference.

    "},{"location":"extensions/functions_datetime/#strptime_date","title":"strptime_date","text":"

    Implementations: strptime_date(date_string, format): -> return_type 0. strptime_date(string, string): -> date

    Parse string into date using provided format, see https://man7.org/linux/man-pages/man3/strptime.3.html for reference.

    "},{"location":"extensions/functions_datetime/#strptime_timestamp","title":"strptime_timestamp","text":"

    Implementations: strptime_timestamp(timestamp_string, format, timezone): -> return_type

  • timezone: Timezone string from IANA tzdb.
  • 0. strptime_timestamp(string, string, string): -> timestamp_tz 1. strptime_timestamp(string, string): -> timestamp_tz

    Parse string into timestamp using provided format, see https://man7.org/linux/man-pages/man3/strptime.3.html for reference. If a timezone is both present in the parsed timestamp and supplied as a parameter, an error is thrown. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: \u201cPacific/Marquesas\u201d, \u201cEtc/GMT+1\u201d. If the timezone supplied as a parameter is invalid, an error is thrown.

    "},{"location":"extensions/functions_datetime/#strftime","title":"strftime","text":"

    Implementations: strftime(x, format): -> return_type 0. strftime(timestamp, string): -> string 1. strftime(timestamp_tz, string, string): -> string 2. strftime(date, string): -> string 3. strftime(time, string): -> string

    Convert timestamp/date/time to string using provided format, see https://man7.org/linux/man-pages/man3/strftime.3.html for reference. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: \u201cPacific/Marquesas\u201d, \u201cEtc/GMT+1\u201d. If timezone is invalid an error is thrown.
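Python's `datetime.strftime` uses the same strftime(3) directives, so the formatting direction can be sketched directly (the wrapper is hypothetical):

```python
from datetime import datetime

def strftime_ts(ts: datetime, fmt: str) -> str:
    """Format a timestamp to a string using strftime(3) directives."""
    return ts.strftime(fmt)

s = strftime_ts(datetime(2023, 3, 1, 13, 45), "%Y-%m-%d %H:%M")
```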

    "},{"location":"extensions/functions_datetime/#round_temporal","title":"round_temporal","text":"

    Implementations: round_temporal(x, rounding, unit, multiple, origin): -> return_type 0. round_temporal(timestamp, rounding, unit, i64, timestamp): -> timestamp 1. round_temporal(timestamp_tz, rounding, unit, i64, string, timestamp_tz): -> timestamp_tz 2. round_temporal(date, rounding, unit, i64, date): -> date 3. round_temporal(time, rounding, unit, i64, time): -> time

    Round a given timestamp/date/time to a multiple of a time unit. If the given timestamp is not already an exact multiple from the origin in the given timezone, the resulting point is chosen as one of the two nearest multiples. Which of these is chosen is governed by rounding: FLOOR means to use the earlier one, CEIL means to use the later one, ROUND_TIE_DOWN means to choose the nearest and tie to the earlier one if equidistant, ROUND_TIE_UP means to choose the nearest and tie to the later one if equidistant. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: \u201cPacific/Marquesas\u201d, \u201cEtc/GMT+1\u201d. If timezone is invalid an error is thrown.

    Options:
  • rounding ['FLOOR', 'CEIL', 'ROUND_TIE_DOWN', 'ROUND_TIE_UP']
  • unit ['YEAR', 'MONTH', 'WEEK', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
  • unit ['YEAR', 'MONTH', 'WEEK', 'DAY']
  • unit ['HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
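The rounding rules above can be modeled for fixed-size units (DAY and smaller, expressed in seconds); this is an illustrative sketch, not the spec's implementation, and it does not handle variable-length units like MONTH or YEAR:

```python
from datetime import datetime, timedelta

def round_temporal(ts, rounding, unit_seconds, multiple, origin):
    """Round ts to a multiple of (multiple * unit_seconds) from origin,
    choosing between the two nearest multiples per the rounding option."""
    step = unit_seconds * multiple
    offset = (ts - origin).total_seconds()
    lower = (offset // step) * step
    if offset == lower:
        return ts                       # already an exact multiple
    upper = lower + step
    if rounding == "FLOOR":
        chosen = lower
    elif rounding == "CEIL":
        chosen = upper
    elif offset - lower < upper - offset:
        chosen = lower                  # strictly nearer the earlier multiple
    elif upper - offset < offset - lower:
        chosen = upper                  # strictly nearer the later multiple
    else:                               # equidistant: tie-breaking applies
        chosen = lower if rounding == "ROUND_TIE_DOWN" else upper
    return origin + timedelta(seconds=chosen)

origin = datetime(2023, 3, 1)
ts = datetime(2023, 3, 1, 13, 45, 40)
floored = round_temporal(ts, "FLOOR", 60, 15, origin)  # 15-minute multiples
ceiled = round_temporal(ts, "CEIL", 60, 15, origin)
```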
  • "},{"location":"extensions/functions_datetime/#round_calendar","title":"round_calendar","text":"

    Implementations: round_calendar(x, rounding, unit, origin, multiple): -> return_type 0. round_calendar(timestamp, rounding, unit, origin, i64): -> timestamp 1. round_calendar(timestamp_tz, rounding, unit, origin, i64, string): -> timestamp_tz 2. round_calendar(date, rounding, unit, origin, i64, date): -> date 3. round_calendar(time, rounding, unit, origin, i64, time): -> time

    Round a given timestamp/date/time to a multiple of a time unit. If the given timestamp is not already an exact multiple from the last origin unit in the given timezone, the resulting point is chosen as one of the two nearest multiples. Which of these is chosen is governed by rounding: FLOOR means to use the earlier one, CEIL means to use the later one, ROUND_TIE_DOWN means to choose the nearest and tie to the earlier one if equidistant, ROUND_TIE_UP means to choose the nearest and tie to the later one if equidistant. Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: \u201cPacific/Marquesas\u201d, \u201cEtc/GMT+1\u201d. If timezone is invalid an error is thrown.

    Options:
  • rounding ['FLOOR', 'CEIL', 'ROUND_TIE_DOWN', 'ROUND_TIE_UP']
  • unit ['YEAR', 'MONTH', 'WEEK', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
  • origin ['YEAR', 'MONTH', 'MONDAY_WEEK', 'SUNDAY_WEEK', 'ISO_WEEK', 'US_WEEK', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND']
  • unit ['YEAR', 'MONTH', 'WEEK', 'DAY']
  • unit ['YEAR', 'MONTH', 'MONDAY_WEEK', 'SUNDAY_WEEK', 'ISO_WEEK', 'US_WEEK', 'DAY']
  • origin ['DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
  • rounding ['DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND']
  • "},{"location":"extensions/functions_datetime/#aggregate-functions","title":"Aggregate Functions","text":""},{"location":"extensions/functions_datetime/#min","title":"min","text":"

    Implementations: min(x): -> return_type 0. min(date): -> date? 1. min(time): -> time? 2. min(timestamp): -> timestamp? 3. min(timestamp_tz): -> timestamp_tz? 4. min(interval_day): -> interval_day? 5. min(interval_year): -> interval_year?

    Min a set of values.

    "},{"location":"extensions/functions_datetime/#max","title":"max","text":"

    Implementations: max(x): -> return_type 0. max(date): -> date? 1. max(time): -> time? 2. max(timestamp): -> timestamp? 3. max(timestamp_tz): -> timestamp_tz? 4. max(interval_day): -> interval_day? 5. max(interval_year): -> interval_year?

    Max a set of values.

    "},{"location":"extensions/functions_geometry/","title":"functions_geometry.yaml","text":"

    This document file is generated for functions_geometry.yaml

    "},{"location":"extensions/functions_geometry/#data-types","title":"Data Types","text":"

    name: geometry structure: BINARY

    "},{"location":"extensions/functions_geometry/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_geometry/#point","title":"point","text":"

    Implementations: point(x, y): -> return_type 0. point(fp64, fp64): -> u!geometry

    *Returns a 2D point with the given x and y coordinate values. *

    "},{"location":"extensions/functions_geometry/#make_line","title":"make_line","text":"

    Implementations: make_line(geom1, geom2): -> return_type 0. make_line(u!geometry, u!geometry): -> u!geometry

    *Returns a linestring connecting the endpoint of geometry geom1 to the begin point of geometry geom2. Repeated points at the beginning of input geometries are collapsed to a single point. A linestring can be closed or simple. A closed linestring starts and ends on the same point. A simple linestring does not cross or touch itself. *

    "},{"location":"extensions/functions_geometry/#x_coordinate","title":"x_coordinate","text":"

    Implementations: x_coordinate(point): -> return_type 0. x_coordinate(u!geometry): -> fp64

    *Return the x coordinate of the point. Return null if not available. *

    "},{"location":"extensions/functions_geometry/#y_coordinate","title":"y_coordinate","text":"

    Implementations: y_coordinate(point): -> return_type 0. y_coordinate(u!geometry): -> fp64

    *Return the y coordinate of the point. Return null if not available. *

    "},{"location":"extensions/functions_geometry/#num_points","title":"num_points","text":"

    Implementations: num_points(geom): -> return_type 0. num_points(u!geometry): -> i64

    *Return the number of points in the geometry. The geometry should be a linestring or circularstring. *

    "},{"location":"extensions/functions_geometry/#is_empty","title":"is_empty","text":"

    Implementations: is_empty(geom): -> return_type 0. is_empty(u!geometry): -> boolean

    *Return true if the geometry is an empty geometry. *

    "},{"location":"extensions/functions_geometry/#is_closed","title":"is_closed","text":"

    Implementations: is_closed(geom): -> return_type 0. is_closed(geometry): -> boolean

    *Return true if the geometry\u2019s start and end points are the same. *

    "},{"location":"extensions/functions_geometry/#is_simple","title":"is_simple","text":"

    Implementations: is_simple(geom): -> return_type 0. is_simple(u!geometry): -> boolean

    *Return true if the geometry does not self intersect. *

    "},{"location":"extensions/functions_geometry/#is_ring","title":"is_ring","text":"

    Implementations: is_ring(geom): -> return_type 0. is_ring(u!geometry): -> boolean

    *Return true if the geometry\u2019s start and end points are the same and it does not self intersect. *

    "},{"location":"extensions/functions_geometry/#geometry_type","title":"geometry_type","text":"

    Implementations: geometry_type(geom): -> return_type 0. geometry_type(u!geometry): -> string

    *Return the type of geometry as a string. *

    "},{"location":"extensions/functions_geometry/#envelope","title":"envelope","text":"

    Implementations: envelope(geom): -> return_type 0. envelope(u!geometry): -> u!geometry

    *Return the minimum bounding box for the input geometry as a geometry. The returned geometry is defined by the corner points of the bounding box. If the input geometry is a point or a line, the returned geometry can also be a point or line. *
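In the spec the geometry type is opaque binary; purely for illustration, the bounding-box computation can be sketched over (x, y) coordinate tuples (the helper and representation are hypothetical):

```python
def envelope(points):
    """Return the corner points of the minimum bounding box of a
    point set, in counter-clockwise order from the lower-left corner."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [(min(xs), min(ys)), (max(xs), min(ys)),
            (max(xs), max(ys)), (min(xs), max(ys))]

box = envelope([(0, 0), (2, 1), (1, 3)])
```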

    "},{"location":"extensions/functions_geometry/#dimension","title":"dimension","text":"

    Implementations: dimension(geom): -> return_type 0. dimension(u!geometry): -> i8

    *Return the dimension of the input geometry. If the input is a collection of geometries, return the largest dimension from the collection. Dimensionality is determined by the complexity of the input and not the coordinate system being used. Type dimensions: POINT - 0 LINE - 1 POLYGON - 2 *

    "},{"location":"extensions/functions_geometry/#is_valid","title":"is_valid","text":"

    Implementations: is_valid(geom): -> return_type 0. is_valid(u!geometry): -> boolean

    *Return true if the input geometry is a valid 2D geometry. For 3 dimensional and 4 dimensional geometries, the validity is still only tested in 2 dimensions. *

    "},{"location":"extensions/functions_geometry/#collection_extract","title":"collection_extract","text":"

    Implementations: collection_extract(geom_collection): -> return_type 0. collection_extract(u!geometry): -> u!geometry 1. collection_extract(u!geometry, i8): -> u!geometry

    *Given the input geometry collection, return a homogeneous multi-geometry. All geometries in the multi-geometry will have the same dimension. If type is not specified, the multi-geometry will only contain geometries of the highest dimension. If type is specified, the multi-geometry will only contain geometries of that type. If there are no geometries of the specified type, an empty geometry is returned. Only points, linestrings, and polygons are supported. Type numbers: POINT - 0 LINE - 1 POLYGON - 2 *

    "},{"location":"extensions/functions_geometry/#flip_coordinates","title":"flip_coordinates","text":"

    Implementations: flip_coordinates(geom_collection): -> return_type 0. flip_coordinates(u!geometry): -> u!geometry

    *Return a version of the input geometry with the X and Y axis flipped. This operation can be performed on geometries with more than 2 dimensions. However, only X and Y axis will be flipped. *

    "},{"location":"extensions/functions_geometry/#remove_repeated_points","title":"remove_repeated_points","text":"

    Implementations: remove_repeated_points(geom): -> return_type 0. remove_repeated_points(u!geometry): -> u!geometry 1. remove_repeated_points(u!geometry, fp64): -> u!geometry

    *Return a version of the input geometry with duplicate consecutive points removed. If the tolerance argument is provided, consecutive points within the tolerance distance of one another are considered to be duplicates. *

    "},{"location":"extensions/functions_geometry/#buffer","title":"buffer","text":"

    Implementations: buffer(geom, buffer_radius): -> return_type 0. buffer(u!geometry, fp64): -> u!geometry

    *Compute and return an expanded version of the input geometry. All the points of the returned geometry are at a distance of buffer_radius away from the points of the input geometry. If a negative buffer_radius is provided, the geometry will shrink instead of expand. A negative buffer_radius may shrink the geometry completely, in which case an empty geometry is returned. For input geometries of points or lines, a negative buffer_radius will always return an empty geometry. *

    "},{"location":"extensions/functions_geometry/#centroid","title":"centroid","text":"

    Implementations: centroid(geom): -> return_type 0. centroid(u!geometry): -> u!geometry

    *Return a point which is the geometric center of mass of the input geometry. *

    "},{"location":"extensions/functions_geometry/#minimum_bounding_circle","title":"minimum_bounding_circle","text":"

    Implementations: minimum_bounding_circle(geom): -> return_type 0. minimum_bounding_circle(u!geometry): -> u!geometry

    *Return the smallest circle polygon that contains the input geometry. *

    "},{"location":"extensions/functions_logarithmic/","title":"functions_logarithmic.yaml","text":"

    This document file is generated for functions_logarithmic.yaml

    "},{"location":"extensions/functions_logarithmic/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_logarithmic/#ln","title":"ln","text":"

    Implementations: ln(x, option:rounding, option:on_domain_error, option:on_log_zero): -> return_type 0. ln(fp32, option:rounding, option:on_domain_error, option:on_log_zero): -> fp32 1. ln(fp64, option:rounding, option:on_domain_error, option:on_log_zero): -> fp64

    Natural logarithm of the value

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
  • "},{"location":"extensions/functions_logarithmic/#log10","title":"log10","text":"

    Implementations: log10(x, option:rounding, option:on_domain_error, option:on_log_zero): -> return_type 0. log10(fp32, option:rounding, option:on_domain_error, option:on_log_zero): -> fp32 1. log10(fp64, option:rounding, option:on_domain_error, option:on_log_zero): -> fp64

    Logarithm to base 10 of the value

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
  • "},{"location":"extensions/functions_logarithmic/#log2","title":"log2","text":"

    Implementations: log2(x, option:rounding, option:on_domain_error, option:on_log_zero): -> return_type 0. log2(fp32, option:rounding, option:on_domain_error, option:on_log_zero): -> fp32 1. log2(fp64, option:rounding, option:on_domain_error, option:on_log_zero): -> fp64

    Logarithm to base 2 of the value

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
  • "},{"location":"extensions/functions_logarithmic/#logb","title":"logb","text":"

    Implementations: logb(x, base, option:rounding, option:on_domain_error, option:on_log_zero): -> return_type

  • x: The number `x` to compute the logarithm of
  • base: The logarithm base `b` to use
  • 0. logb(fp32, fp32, option:rounding, option:on_domain_error, option:on_log_zero): -> fp32 1. logb(fp64, fp64, option:rounding, option:on_domain_error, option:on_log_zero): -> fp64

    *Logarithm of the value with the given base logb(x, b) => log_{b} (x) *

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
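The option semantics can be sketched in Python with `math.log`; this is an illustrative model of the spec's options, not a reference implementation:

```python
import math

def logb(x: float, base: float,
         on_domain_error: str = "NAN", on_log_zero: str = "MINUS_INFINITY"):
    """logb(x, b) => log_b(x), honoring the on_domain_error and
    on_log_zero option values for negative and zero inputs."""
    if x < 0:                            # domain error
        if on_domain_error == "ERROR":
            raise ValueError("logb: negative input")
        return None if on_domain_error == "NULL" else math.nan
    if x == 0:                           # log of zero
        if on_log_zero == "ERROR":
            raise ValueError("logb: log of zero")
        return math.nan if on_log_zero == "NAN" else -math.inf
    return math.log(x, base)
```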
  • "},{"location":"extensions/functions_logarithmic/#log1p","title":"log1p","text":"

    Implementations: log1p(x, option:rounding, option:on_domain_error, option:on_log_zero): -> return_type 0. log1p(fp32, option:rounding, option:on_domain_error, option:on_log_zero): -> fp32 1. log1p(fp64, option:rounding, option:on_domain_error, option:on_log_zero): -> fp64

    *Natural logarithm (base e) of 1 + x log1p(x) => log(1+x) *

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
  • "},{"location":"extensions/functions_rounding/","title":"functions_rounding.yaml","text":"

    This document file is generated for functions_rounding.yaml

    "},{"location":"extensions/functions_rounding/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_rounding/#ceil","title":"ceil","text":"

    Implementations: ceil(x): -> return_type 0. ceil(fp32): -> fp32 1. ceil(fp64): -> fp64

    *Rounding to the ceiling of the value x. *

    "},{"location":"extensions/functions_rounding/#floor","title":"floor","text":"

    Implementations: floor(x): -> return_type 0. floor(fp32): -> fp32 1. floor(fp64): -> fp64

    *Rounding to the floor of the value x. *

    "},{"location":"extensions/functions_rounding/#round","title":"round","text":"

    Implementations: round(x, s, option:rounding): -> return_type

  • x: Numerical expression to be rounded.
  • s: Number of decimal places to be rounded to. When `s` is a positive number, nothing will happen since `x` is an integer value. When `s` is a negative number, the rounding is performed to the nearest multiple of `10^(-s)`.
  • 0. round(i8, i32, option:rounding): -> i8? 1. round(i16, i32, option:rounding): -> i16? 2. round(i32, i32, option:rounding): -> i32? 3. round(i64, i32, option:rounding): -> i64? 4. round(fp32, i32, option:rounding): -> fp32? 5. round(fp64, i32, option:rounding): -> fp64?

    *Rounding the value x to s decimal places. *

    Options:
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR', 'AWAY_FROM_ZERO', 'TIE_DOWN', 'TIE_UP', 'TIE_TOWARDS_ZERO', 'TIE_TO_ODD']
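Two of the rounding modes can be illustrated with Python's `decimal` module, where TIE_TO_EVEN corresponds to ROUND_HALF_EVEN (banker's rounding) and TIE_AWAY_FROM_ZERO to ROUND_HALF_UP; the other modes are omitted from this sketch:

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_s(x: float, s: int, rounding: str = "TIE_TO_EVEN") -> float:
    """Round x to s decimal places; negative s rounds to a multiple
    of 10^(-s), as described for integer inputs above."""
    mode = ROUND_HALF_EVEN if rounding == "TIE_TO_EVEN" else ROUND_HALF_UP
    exp = Decimal(1).scaleb(-s)          # e.g. s=2 -> 0.01, s=-2 -> 100
    return float(Decimal(str(x)).quantize(exp, rounding=mode))
```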
  • "},{"location":"extensions/functions_set/","title":"functions_set.yaml","text":"

    This document file is generated for functions_set.yaml

    "},{"location":"extensions/functions_set/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_set/#index_in","title":"index_in","text":"

    Implementations: index_in(x, y, option:nan_equality): -> return_type 0. index_in(T, List<T>, option:nan_equality): -> int64?

    *Checks the membership of a value in a list of values. Returns the first 0-based index value of some input T if T is equal to any element in List<T>. Returns NULL if not found. If T is NULL, returns NULL. If T is NaN: - Returns 0-based index of NaN in List<T> (default) - Returns NULL (if NAN_IS_NOT_NAN is specified) *

    Options:
  • nan_equality ['NAN_IS_NAN', 'NAN_IS_NOT_NAN']
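The NULL and NaN cases above can be modeled directly, using Python `None` for NULL (the helper is an illustrative sketch):

```python
import math

def index_in(x, values, nan_equality="NAN_IS_NAN"):
    """Return the 0-based index of x in values, or None (NULL) if
    x is NULL or not found."""
    if x is None:
        return None
    x_is_nan = isinstance(x, float) and math.isnan(x)
    if x_is_nan and nan_equality == "NAN_IS_NOT_NAN":
        return None
    for i, v in enumerate(values):
        if x_is_nan:
            if isinstance(v, float) and math.isnan(v):
                return i
        elif v == x:
            return i
    return None
```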
  • "},{"location":"extensions/functions_string/","title":"functions_string.yaml","text":"

    This document file is generated for functions_string.yaml

    "},{"location":"extensions/functions_string/#scalar-functions","title":"Scalar Functions","text":""},{"location":"extensions/functions_string/#concat","title":"concat","text":"

    Implementations: concat(input, option:null_handling): -> return_type 0. concat(varchar<L1>, option:null_handling): -> varchar<L1> 1. concat(string, option:null_handling): -> string

    Concatenate strings. The null_handling option determines whether or not null values will be recognized by the function. If null_handling is set to IGNORE_NULLS, null value arguments will be ignored when strings are concatenated. If set to ACCEPT_NULLS, the result will be null if any argument passed to the concat function is null.

    Options:
  • null_handling ['IGNORE_NULLS', 'ACCEPT_NULLS']
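The two null_handling behaviors can be sketched as follows, with `None` standing in for NULL (the helper is hypothetical):

```python
def concat(*args, null_handling="ACCEPT_NULLS"):
    """ACCEPT_NULLS: any NULL argument makes the result NULL.
    IGNORE_NULLS: NULL arguments are skipped."""
    if null_handling == "ACCEPT_NULLS" and any(a is None for a in args):
        return None
    return "".join(a for a in args if a is not None)
```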
  • "},{"location":"extensions/functions_string/#like","title":"like","text":"

    Implementations: like(input, match, option:case_sensitivity): -> return_type

  • input: The input string.
  • match: The string to match against the input string.
  • 0. like(varchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean 1. like(string, string, option:case_sensitivity): -> boolean

    Are two strings like each other. The case_sensitivity option applies to the match argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
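Assuming the standard SQL LIKE wildcards (% for any run of characters, _ for exactly one), the matching can be sketched by translating the pattern to a regular expression; this sketch models only CASE_SENSITIVE and CASE_INSENSITIVE:

```python
import re

def like(input_s, match, case_sensitivity="CASE_SENSITIVE"):
    """SQL LIKE semantics: % matches any run of characters,
    _ matches exactly one character."""
    parts = []
    for ch in match:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))   # literal character
    flags = 0 if case_sensitivity == "CASE_SENSITIVE" else re.IGNORECASE
    return re.fullmatch("".join(parts), input_s, flags) is not None
```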
  • "},{"location":"extensions/functions_string/#substring","title":"substring","text":"

    Implementations: substring(input, start, length, option:negative_start): -> return_type 0. substring(varchar<L1>, i32, i32, option:negative_start): -> varchar<L1> 1. substring(string, i32, i32, option:negative_start): -> string 2. substring(fixedchar<l1>, i32, i32, option:negative_start): -> string 3. substring(varchar<L1>, i32, option:negative_start): -> varchar<L1> 4. substring(string, i32, option:negative_start): -> string 5. substring(fixedchar<l1>, i32, option:negative_start): -> string

    Extract a substring of a specified length starting from position start. A start value of 1 refers to the first character of the string. When length is not specified the function will extract a substring starting from position start and ending at the end of the string. The negative_start option applies to the start parameter. WRAP_FROM_END means the index will start from the end of the input and move backwards. The last character has an index of -1, the second to last character has an index of -2, and so on. LEFT_OF_BEGINNING means the returned substring will start from the left of the first character. A start of -1 will begin 2 characters left of the input, while a start of 0 begins 1 character left of the input.

    Options:
  • negative_start ['WRAP_FROM_END', 'LEFT_OF_BEGINNING', 'ERROR']
  • negative_start ['WRAP_FROM_END', 'LEFT_OF_BEGINNING']
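The 1-based indexing and the two negative_start behaviors can be sketched over Python strings (an illustrative model, not the spec's implementation):

```python
def substring(s, start, length=None, negative_start="WRAP_FROM_END"):
    """Extract a substring using 1-based start positions."""
    if start > 0:
        begin = start - 1                # position 1 is the first character
    elif negative_start == "WRAP_FROM_END":
        begin = len(s) + start           # -1 is the last character
    else:                                # LEFT_OF_BEGINNING
        begin = start - 1                # 0 is one character left of the input
    end = len(s) if length is None else begin + length
    return s[max(begin, 0):max(end, 0)]
```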
  • "},{"location":"extensions/functions_string/#regexp_match_substring","title":"regexp_match_substring","text":"

    Implementations: regexp_match_substring(input, pattern, position, occurrence, group, option:case_sensitivity, option:multiline, option:dotall): -> return_type 0. regexp_match_substring(varchar<L1>, varchar<L2>, i64, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> varchar<L1> 1. regexp_match_substring(string, string, i64, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> string

    Extract a substring that matches the given regular expression pattern. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The occurrence of the pattern to be extracted is specified using the occurrence argument. Specifying 1 means the first occurrence will be extracted, 2 means the second occurrence, and so on. The occurrence argument should be a positive non-zero integer. The character position at which to start searching for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. The regular expression capture group can be specified using the group argument. Specifying 0 will return the substring matching the full regular expression. Specifying 1 will return the substring matching only the first capture group, and so on. The group argument should be a non-negative integer. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the occurrence value is out of range, the position value is out of range, or the group value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
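Python's `re` module differs from ICU regex in some corner cases, but the occurrence/position/group mechanics can be sketched with it (out-of-range values are left unchecked here, matching the spec's "behavior is undefined"):

```python
import re

def regexp_match_substring(input_s, pattern, position=1, occurrence=1,
                           group=0, case_sensitivity="CASE_SENSITIVE"):
    """Return the substring for the n-th match of pattern, starting the
    search at 1-based character `position`."""
    flags = 0 if case_sensitivity == "CASE_SENSITIVE" else re.IGNORECASE
    matches = list(re.finditer(pattern, input_s[position - 1:], flags))
    return matches[occurrence - 1].group(group)
```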
  • "},{"location":"extensions/functions_string/#regexp_match_substring_all","title":"regexp_match_substring_all","text":"

    Implementations: regexp_match_substring_all(input, pattern, position, group, option:case_sensitivity, option:multiline, option:dotall): -> return_type 0. regexp_match_substring_all(varchar<L1>, varchar<L2>, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> List<varchar<L1>> 1. regexp_match_substring_all(string, string, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> List<string>

    Extract all substrings that match the given regular expression pattern. This will return a list of extracted strings with one value for each occurrence of a match. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The character position at which to start searching for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. The regular expression capture group can be specified using the group argument. Specifying 0 will return substrings matching the full regular expression. Specifying 1 will return substrings matching only the first capture group, and so on. The group argument should be a non-negative integer. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the position value is out of range, or the group value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • "},{"location":"extensions/functions_string/#starts_with","title":"starts_with","text":"

    Implementations: starts_with(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to search for.
  • 0. starts_with(varchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean 1. starts_with(varchar<L1>, string, option:case_sensitivity): -> boolean 2. starts_with(varchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean 3. starts_with(string, string, option:case_sensitivity): -> boolean 4. starts_with(string, varchar<L1>, option:case_sensitivity): -> boolean 5. starts_with(string, fixedchar<L1>, option:case_sensitivity): -> boolean 6. starts_with(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean 7. starts_with(fixedchar<L1>, string, option:case_sensitivity): -> boolean 8. starts_with(fixedchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean

    Whether the input string starts with the substring. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • "},{"location":"extensions/functions_string/#ends_with","title":"ends_with","text":"

    Implementations: ends_with(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to search for.
  • 0. ends_with(varchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean 1. ends_with(varchar<L1>, string, option:case_sensitivity): -> boolean 2. ends_with(varchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean 3. ends_with(string, string, option:case_sensitivity): -> boolean 4. ends_with(string, varchar<L1>, option:case_sensitivity): -> boolean 5. ends_with(string, fixedchar<L1>, option:case_sensitivity): -> boolean 6. ends_with(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean 7. ends_with(fixedchar<L1>, string, option:case_sensitivity): -> boolean 8. ends_with(fixedchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean

    Whether input string ends with the substring. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • "},{"location":"extensions/functions_string/#contains","title":"contains","text":"

    Implementations: contains(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to search for.
  • 0. contains(varchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean 1. contains(varchar<L1>, string, option:case_sensitivity): -> boolean 2. contains(varchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean 3. contains(string, string, option:case_sensitivity): -> boolean 4. contains(string, varchar<L1>, option:case_sensitivity): -> boolean 5. contains(string, fixedchar<L1>, option:case_sensitivity): -> boolean 6. contains(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> boolean 7. contains(fixedchar<L1>, string, option:case_sensitivity): -> boolean 8. contains(fixedchar<L1>, varchar<L2>, option:case_sensitivity): -> boolean

    Whether the input string contains the substring. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • "},{"location":"extensions/functions_string/#strpos","title":"strpos","text":"

    Implementations: strpos(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to search for.
  • 0. strpos(string, string, option:case_sensitivity): -> i64 1. strpos(varchar<L1>, varchar<L1>, option:case_sensitivity): -> i64 2. strpos(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> i64

    Return the position of the first occurrence of a string in another string. The first character of the string is at position 1. If no occurrence is found, 0 is returned. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
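The 1-based-position and 0-for-not-found conventions can be sketched with Python's `str.find`, which is 0-based and returns -1 when absent:

```python
def strpos(input_s, substring, case_sensitivity="CASE_SENSITIVE"):
    """1-based position of the first occurrence; 0 if not found."""
    if case_sensitivity != "CASE_SENSITIVE":
        input_s, substring = input_s.lower(), substring.lower()
    return input_s.find(substring) + 1   # -1 (absent) maps to 0
```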
  • "},{"location":"extensions/functions_string/#regexp_strpos","title":"regexp_strpos","text":"

    Implementations: regexp_strpos(input, pattern, position, occurrence, option:case_sensitivity, option:multiline, option:dotall): -> return_type 0. regexp_strpos(varchar<L1>, varchar<L2>, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64 1. regexp_strpos(string, string, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64

    Return the position of an occurrence of the given regular expression pattern in a string. The first character of the string is at position 1. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The character position at which to start searching for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. Which occurrence to return the position of is specified using the occurrence argument. Specifying 1 means the position of the first occurrence will be returned, 2 means the position of the second occurrence, and so on. The occurrence argument should be a positive non-zero integer. If no occurrence is found, 0 is returned. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the occurrence value is out of range, or the position value is out of range.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • "},{"location":"extensions/functions_string/#count_substring","title":"count_substring","text":"

    Implementations: count_substring(input, substring, option:case_sensitivity): -> return_type

  • input: The input string.
  • substring: The substring to count.
  • 0. count_substring(string, string, option:case_sensitivity): -> i64 1. count_substring(varchar<L1>, varchar<L2>, option:case_sensitivity): -> i64 2. count_substring(fixedchar<L1>, fixedchar<L2>, option:case_sensitivity): -> i64

    Return the number of non-overlapping occurrences of a substring in an input string. The case_sensitivity option applies to the substring argument.
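    A minimal Python sketch of the non-overlapping counting rule. The case-insensitive branch uses simple lower-casing, which is an assumption; full Unicode case folding may differ.

```python
def count_substring(input_str, substring, case_sensitivity="CASE_SENSITIVE"):
    # Sketch: lower-casing stands in for proper case-insensitive matching.
    if case_sensitivity != "CASE_SENSITIVE":
        input_str, substring = input_str.lower(), substring.lower()
    # str.count counts non-overlapping occurrences, matching the spec text.
    return input_str.count(substring)

count_substring("aaaa", "aa")  # counts "aa" twice, never overlapping
```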

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • "},{"location":"extensions/functions_string/#regexp_count_substring","title":"regexp_count_substring","text":"

    Implementations: regexp_count_substring(input, pattern, position, option:case_sensitivity, option:multiline, option:dotall): -> return_type 0. regexp_count_substring(string, string, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64 1. regexp_count_substring(varchar<L1>, varchar<L2>, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64 2. regexp_count_substring(fixedchar<L1>, fixedchar<L2>, i64, option:case_sensitivity, option:multiline, option:dotall): -> i64

    Return the number of non-overlapping occurrences of a regular expression pattern in an input string. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The character position at which to begin searching for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile or the position value is out of range.
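    A sketch of the counting behavior, again using Python's re rather than ICU, with the case_sensitivity and multiline options omitted:

```python
import re

def regexp_count_substring(input_str, pattern, position=1,
                           dotall="DOTALL_DISABLED"):
    # Sketch with Python re (the spec mandates ICU regex semantics).
    flags = re.DOTALL if dotall == "DOTALL_ENABLED" else 0
    # Skip position-1 characters, then count non-overlapping matches.
    return sum(1 for _ in re.finditer(pattern, input_str[position - 1:], flags))

regexp_count_substring("a1b22c333", r"\d+")  # three runs of digits
```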

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • "},{"location":"extensions/functions_string/#replace","title":"replace","text":"

    Implementations: replace(input, substring, replacement, option:case_sensitivity): -> return_type

  • input: Input string.
  • substring: The substring to replace.
  • replacement: The replacement string.
  • 0. replace(string, string, string, option:case_sensitivity): -> string 1. replace(varchar<L1>, varchar<L2>, varchar<L3>, option:case_sensitivity): -> varchar<L1>

    Replace all occurrences of the substring with the replacement string. The case_sensitivity option applies to the substring argument.

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • "},{"location":"extensions/functions_string/#concat_ws","title":"concat_ws","text":"

    Implementations: concat_ws(separator, string_arguments): -> return_type

  • separator: Character to separate strings by.
  • string_arguments: Strings to be concatenated.
  • 0. concat_ws(string, string): -> string 1. concat_ws(varchar<L2>, varchar<L1>): -> varchar<L1>

    Concatenate strings together separated by a separator.
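    The behavior maps directly onto a separator join; a one-line Python sketch:

```python
def concat_ws(separator, *strings):
    # Variadic string arguments joined by the separator.
    return separator.join(strings)

concat_ws(", ", "a", "b", "c")  # -> "a, b, c"
```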

    "},{"location":"extensions/functions_string/#repeat","title":"repeat","text":"

    Implementations: repeat(input, count): -> return_type 0. repeat(string, i64): -> string 1. repeat(varchar<L1>, i64): -> varchar<L1>

    Repeat a string count number of times.

    "},{"location":"extensions/functions_string/#reverse","title":"reverse","text":"

    Implementations: reverse(input): -> return_type 0. reverse(string): -> string 1. reverse(varchar<L1>): -> varchar<L1> 2. reverse(fixedchar<L1>): -> fixedchar<L1>

    Returns the string in reverse order.

    "},{"location":"extensions/functions_string/#replace_slice","title":"replace_slice","text":"

    Implementations: replace_slice(input, start, length, replacement): -> return_type

  • input: Input string.
  • start: The position in the string to start deleting/inserting characters.
  • length: The number of characters to delete from the input string.
  • replacement: The new string to insert at the start position.
  • 0. replace_slice(string, i64, i64, string): -> string 1. replace_slice(varchar<L1>, i64, i64, varchar<L2>): -> varchar<L1>

    Replace a slice of the input string. A specified \u2018length\u2019 of characters will be deleted from the input string beginning at the \u2018start\u2019 position and will be replaced by a new string. A start value of 1 indicates the first character of the input string. If start is negative or zero, or greater than the length of the input string, a null string is returned. If \u2018length\u2019 is negative, a null string is returned. If \u2018length\u2019 is zero, insertion of the new string occurs at the specified \u2018start\u2019 position and no characters are deleted. If \u2018length\u2019 is greater than the number of characters remaining in the input string, deletion will occur up to the last character of the input string.
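    The edge cases above (1-based start, zero or oversized length, null results) can be captured in a short Python sketch, with None standing in for a null string:

```python
def replace_slice(input_str, start, length, replacement):
    # Sketch of the rules above; None stands in for a null string.
    if start <= 0 or start > len(input_str) or length < 0:
        return None
    i = start - 1  # convert the 1-based start to a 0-based index
    # Slicing past the end is safe: deletion stops at the last character.
    return input_str[:i] + replacement + input_str[i + length:]

replace_slice("hello", 2, 3, "XY")  # delete "ell", insert "XY" -> "hXYo"
```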

    "},{"location":"extensions/functions_string/#lower","title":"lower","text":"

    Implementations: lower(input, option:char_set): -> return_type 0. lower(string, option:char_set): -> string 1. lower(varchar<L1>, option:char_set): -> varchar<L1> 2. lower(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Transform the string to lower case characters. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']
  • "},{"location":"extensions/functions_string/#upper","title":"upper","text":"

    Implementations: upper(input, option:char_set): -> return_type 0. upper(string, option:char_set): -> string 1. upper(varchar<L1>, option:char_set): -> varchar<L1> 2. upper(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Transform the string to upper case characters. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']
  • "},{"location":"extensions/functions_string/#swapcase","title":"swapcase","text":"

    Implementations: swapcase(input, option:char_set): -> return_type 0. swapcase(string, option:char_set): -> string 1. swapcase(varchar<L1>, option:char_set): -> varchar<L1> 2. swapcase(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Transform the string\u2019s lowercase characters to uppercase and uppercase characters to lowercase. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']
  • "},{"location":"extensions/functions_string/#capitalize","title":"capitalize","text":"

    Implementations: capitalize(input, option:char_set): -> return_type 0. capitalize(string, option:char_set): -> string 1. capitalize(varchar<L1>, option:char_set): -> varchar<L1> 2. capitalize(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Capitalize the first character of the input string. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']
  • "},{"location":"extensions/functions_string/#title","title":"title","text":"

    Implementations: title(input, option:char_set): -> return_type 0. title(string, option:char_set): -> string 1. title(varchar<L1>, option:char_set): -> varchar<L1> 2. title(fixedchar<L1>, option:char_set): -> fixedchar<L1>

    Converts the input string into titlecase. Capitalize the first character of each word in the input string except for articles (a, an, the). Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Options:
  • char_set ['UTF8', 'ASCII_ONLY']
  • "},{"location":"extensions/functions_string/#char_length","title":"char_length","text":"

    Implementations: char_length(input): -> return_type 0. char_length(string): -> i64 1. char_length(varchar<L1>): -> i64 2. char_length(fixedchar<L1>): -> i64

    Return the number of characters in the input string. The length includes trailing spaces.

    "},{"location":"extensions/functions_string/#bit_length","title":"bit_length","text":"

    Implementations: bit_length(input): -> return_type 0. bit_length(string): -> i64 1. bit_length(varchar<L1>): -> i64 2. bit_length(fixedchar<L1>): -> i64

    Return the number of bits in the input string.

    "},{"location":"extensions/functions_string/#octet_length","title":"octet_length","text":"

    Implementations: octet_length(input): -> return_type 0. octet_length(string): -> i64 1. octet_length(varchar<L1>): -> i64 2. octet_length(fixedchar<L1>): -> i64

    Return the number of bytes in the input string.
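    The three length functions above differ only in the unit counted. Assuming the string is UTF-8 encoded, the relationship can be shown in Python:

```python
# char_length counts characters, octet_length counts bytes, and
# bit_length is octet_length * 8 (shown here for a UTF-8 string).
s = "a\u00e9"                          # 'a' plus 'é' (U+00E9)
char_length = len(s)                   # 2 characters
octet_length = len(s.encode("utf-8"))  # 3 bytes: 'é' takes 2 bytes in UTF-8
bit_length = octet_length * 8          # 24 bits
```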

    "},{"location":"extensions/functions_string/#regexp_replace","title":"regexp_replace","text":"

    Implementations: regexp_replace(input, pattern, replacement, position, occurrence, option:case_sensitivity, option:multiline, option:dotall): -> return_type

  • input: The input string.
  • pattern: The regular expression to search for within the input string.
  • replacement: The replacement string.
  • position: The position to start the search.
  • occurrence: Which occurrence of the match to replace.
  • 0. regexp_replace(string, string, string, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> string 1. regexp_replace(varchar<L1>, varchar<L2>, varchar<L3>, i64, i64, option:case_sensitivity, option:multiline, option:dotall): -> varchar<L1>

    Search a string for a substring that matches a given regular expression pattern and replace it with a replacement string. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The occurrence of the pattern to be replaced is specified using the occurrence argument. Specifying 1 means only the first occurrence will be replaced, 2 means the second occurrence, and so on. Specifying 0 means all occurrences will be replaced. The character position at which to begin searching for pattern matches can be specified using the position argument. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive non-zero integer. The replacement string can capture groups using numbered backreferences. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the replacement contains an illegal back-reference, the occurrence value is out of range, or the position value is out of range.
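    The occurrence, position, and backreference behavior can be sketched in Python. As before this uses Python's re instead of ICU, omits the case_sensitivity/multiline/dotall options, and the treatment of a missing occurrence is an assumption:

```python
import re

def regexp_replace(input_str, pattern, replacement, position=1, occurrence=0):
    # Sketch with Python re (the spec mandates ICU regex syntax).
    head, tail = input_str[:position - 1], input_str[position - 1:]
    if occurrence == 0:
        return head + re.sub(pattern, replacement, tail)  # replace all
    matches = list(re.finditer(pattern, tail))
    if len(matches) < occurrence:
        return input_str  # assumption: no such occurrence leaves input as-is
    m = matches[occurrence - 1]
    # Match.expand resolves numbered backreferences in the replacement.
    return head + tail[:m.start()] + m.expand(replacement) + tail[m.end():]

regexp_replace("a1b2c3", r"\d", "#")        # occurrence 0 replaces all
regexp_replace("a1b2c3", r"\d", "#", 1, 2)  # only the second digit
```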

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • "},{"location":"extensions/functions_string/#ltrim","title":"ltrim","text":"

    Implementations: ltrim(input, characters): -> return_type

  • input: The string to remove characters from.
  • characters: The set of characters to remove.
  • 0. ltrim(varchar<L1>, varchar<L2>): -> varchar<L1> 1. ltrim(string, string): -> string

    Remove any occurrence of the characters from the left side of the string. If no characters are specified, spaces are removed.

    "},{"location":"extensions/functions_string/#rtrim","title":"rtrim","text":"

    Implementations: rtrim(input, characters): -> return_type

  • input: The string to remove characters from.
  • characters: The set of characters to remove.
  • 0. rtrim(varchar<L1>, varchar<L2>): -> varchar<L1> 1. rtrim(string, string): -> string

    Remove any occurrence of the characters from the right side of the string. If no characters are specified, spaces are removed.

    "},{"location":"extensions/functions_string/#trim","title":"trim","text":"

    Implementations: trim(input, characters): -> return_type

  • input: The string to remove characters from.
  • characters: The set of characters to remove.
  • 0. trim(varchar<L1>, varchar<L2>): -> varchar<L1> 1. trim(string, string): -> string

    Remove any occurrence of the characters from the left and right sides of the string. If no characters are specified, spaces are removed.
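    Note that the characters argument names a set of characters rather than a substring; Python's strip family has the same semantics and serves as a sketch of all three trim functions:

```python
def ltrim(s, characters=" "):
    return s.lstrip(characters)   # strip any of the characters on the left

def rtrim(s, characters=" "):
    return s.rstrip(characters)   # ... on the right

def trim(s, characters=" "):
    return s.strip(characters)    # ... on both sides

trim("xyhixy", "xy")  # 'x' and 'y' are removed in any order -> "hi"
```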

    "},{"location":"extensions/functions_string/#lpad","title":"lpad","text":"

    Implementations: lpad(input, length, characters): -> return_type

  • input: The string to pad.
  • length: The length of the output string.
  • characters: The string of characters to use for padding.
  • 0. lpad(varchar<L1>, i32, varchar<L2>): -> varchar<L1> 1. lpad(string, i32, string): -> string

    Left-pad the input string with the string of \u2018characters\u2019 until the specified length of the string has been reached. If the input string is longer than \u2018length\u2019, remove characters from the right side to shorten it to \u2018length\u2019 characters. If the string of \u2018characters\u2019 is longer than the remaining \u2018length\u2019 needed to be filled, only pad until \u2018length\u2019 has been reached. If \u2018characters\u2019 is not specified, the default value is a single space.

    "},{"location":"extensions/functions_string/#rpad","title":"rpad","text":"

    Implementations: rpad(input, length, characters): -> return_type

  • input: The string to pad.
  • length: The length of the output string.
  • characters: The string of characters to use for padding.
  • 0. rpad(varchar<L1>, i32, varchar<L2>): -> varchar<L1> 1. rpad(string, i32, string): -> string

    Right-pad the input string with the string of \u2018characters\u2019 until the specified length of the string has been reached. If the input string is longer than \u2018length\u2019, remove characters from the left side to shorten it to \u2018length\u2019 characters. If the string of \u2018characters\u2019 is longer than the remaining \u2018length\u2019 needed to be filled, only pad until \u2018length\u2019 has been reached. If \u2018characters\u2019 is not specified, the default value is a single space.
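    A Python sketch of both pads, including the truncation rules (lpad trims from the right, rpad trims from the left):

```python
def lpad(s, length, characters=" "):
    if len(s) >= length:
        return s[:length]                  # too long: trim from the right
    return (characters * length)[: length - len(s)] + s

def rpad(s, length, characters=" "):
    if len(s) >= length:
        return s[len(s) - length:]         # too long: trim from the left
    return s + (characters * length)[: length - len(s)]

lpad("7", 3, "0")   # -> "007"
rpad("abcd", 2)     # -> "cd"
```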

    "},{"location":"extensions/functions_string/#center","title":"center","text":"

    Implementations: center(input, length, character, option:padding): -> return_type

  • input: The string to pad.
  • length: The length of the output string.
  • character: The character to use for padding.
  • 0. center(varchar<L1>, i32, varchar<L1>, option:padding): -> varchar<L1> 1. center(string, i32, string, option:padding): -> string

    Center the input string by padding the sides with a single character until the specified length of the string has been reached. By default, if the length will be reached with an uneven number of padding characters, the extra padding will be applied to the right side. The side with extra padding can be controlled with the padding option. Behavior is undefined if the number of characters passed to the character argument is not 1.
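    The padding option only matters when the total amount of padding is odd; a Python sketch using the spec's RIGHT/LEFT option values:

```python
def center(s, length, character=" ", padding="RIGHT"):
    # Sketch: with an odd amount of padding, the extra character goes on
    # the side named by the padding option (RIGHT is the default).
    total = max(length - len(s), 0)
    small, big = total // 2, total - total // 2
    left, right = (small, big) if padding == "RIGHT" else (big, small)
    return character * left + s + character * right

center("ab", 5, "*")           # -> "*ab**" (extra '*' on the right)
center("ab", 5, "*", "LEFT")   # -> "**ab*"
```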

    Options:
  • padding ['RIGHT', 'LEFT']
  • "},{"location":"extensions/functions_string/#left","title":"left","text":"

    Implementations: left(input, count): -> return_type 0. left(varchar<L1>, i32): -> varchar<L1> 1. left(string, i32): -> string

    Extract count characters starting from the left of the string.

    "},{"location":"extensions/functions_string/#right","title":"right","text":"

    Implementations: right(input, count): -> return_type 0. right(varchar<L1>, i32): -> varchar<L1> 1. right(string, i32): -> string

    Extract count characters starting from the right of the string.

    "},{"location":"extensions/functions_string/#string_split","title":"string_split","text":"

    Implementations: string_split(input, separator): -> return_type

  • input: The input string.
  • separator: A character used for splitting the string.
  • 0. string_split(varchar<L1>, varchar<L2>): -> List<varchar<L1>> 1. string_split(string, string): -> List<string>

    Split a string into a list of strings, based on a specified separator character.

    "},{"location":"extensions/functions_string/#regexp_string_split","title":"regexp_string_split","text":"

    Implementations: regexp_string_split(input, pattern, option:case_sensitivity, option:multiline, option:dotall): -> return_type

  • input: The input string.
  • pattern: The regular expression to search for within the input string.
  • 0. regexp_string_split(varchar<L1>, varchar<L2>, option:case_sensitivity, option:multiline, option:dotall): -> List<varchar<L1>> 1. regexp_string_split(string, string, option:case_sensitivity, option:multiline, option:dotall): -> List<string>

    Split a string into a list of strings, based on a regular expression pattern. The substrings matched by the pattern will be used as the separators to split the input string and will not be included in the resulting list. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string.
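    Python's re.split has the same separator-dropping behavior and can serve as a sketch (again Python re rather than ICU, with the multiline and dotall options omitted):

```python
import re

def regexp_string_split(input_str, pattern, case_sensitivity="CASE_SENSITIVE"):
    # Matched separators are consumed and do not appear in the result.
    flags = 0 if case_sensitivity == "CASE_SENSITIVE" else re.IGNORECASE
    return re.split(pattern, input_str, flags=flags)

regexp_string_split("one1two22three", r"\d+")  # digit runs are the separators
```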

    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • "},{"location":"extensions/functions_string/#aggregate-functions","title":"Aggregate Functions","text":""},{"location":"extensions/functions_string/#string_agg","title":"string_agg","text":"

    Implementations: string_agg(input, separator): -> return_type

  • input: Column of string values.
  • separator: Separator for concatenated strings.
  • 0. string_agg(string, string): -> string

    Concatenates a column of string values with a separator.

    "},{"location":"relations/basics/","title":"Basics","text":"

    Substrait is designed to allow a user to construct an arbitrarily complex data transformation plan. The plan is composed of one or more relational operations. Relational operations are well-defined transformation operations that work by taking zero or more input datasets and transforming them into zero or more output datasets. Substrait defines a core set of transformations, but users are also able to extend the operations with their own specialized operations.

    Each relational operation is composed of several properties. Common properties for relational operations include the following:

    Property Description Type Emit The set of columns output from this operation and the order of those columns. Logical & Physical Hints A set of optionally provided, optionally consumed information about an operation that better informs execution. These might include estimated number of input and output records, estimated record size, likely filter reduction, estimated dictionary size, etc. These can also include implementation specific pieces of execution information. Physical Constraint A set of runtime constraints around the operation, limiting its consumption based on real-world resources (CPU, memory) as well as virtual resources like number of records produced, the largest record size, etc. Physical"},{"location":"relations/basics/#relational-signatures","title":"Relational Signatures","text":"

    In functions, function signatures are declared externally to the use of those signatures (function bindings). In the case of relational operations, signatures are declared directly in the specification. This is due to the speed of change and number of total operations. Relational operations in the specification are expected to be <100 for several years with additions being infrequent. On the other hand, there is an expectation of both a much larger number of functions (1,000s) and a much higher velocity of additions.

    Each relational operation must declare the following:

    • Transformation logic around properties of the data. For example, does a relational operation maintain sortedness of a field? Does an operation change the distribution of data?
    • How many input relations does an operation require?
    • Does the operator produce an output (by specification, we limit relational operations to a single output at this time)?
    • What is the schema and field ordering of an output (see emit below)?
    "},{"location":"relations/basics/#emit-output-ordering","title":"Emit: Output Ordering","text":"

    A relational operation uses field references to access specific fields of the input stream. Field references are always ordinal based on the order of the incoming streams. Each relational operation must declare the order of its output data. To simplify things, each relational operation can be in one of two modes:

    1. Direct output: The order of outputs is based on the definition declared by the relational operation.
    2. Remap: A listed ordering of the direct outputs. This remapping can also be used to drop columns no longer used (such as a filter field or join keys after a join). Note that remapping/exclusion can only be done at the output's root struct. Filtering of compound values or extracting subsets must be done through other operation types (e.g. projection).
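    A remap is just a list of ordinal indices into the direct output's root struct. A small Python sketch of applying one to a record (the field names are hypothetical):

```python
def apply_emit(record, output_mapping):
    # Select and reorder top-level fields by ordinal; omitted ordinals
    # are dropped (e.g. join keys that are no longer needed).
    return [record[i] for i in output_mapping]

direct_output = ["order_id", "cust_key", "join_key", "amount"]
apply_emit(direct_output, [3, 0])  # reorder and drop the two key columns
```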
    "},{"location":"relations/basics/#relation-properties","title":"Relation Properties","text":"

    There are a number of predefined properties that exist in Substrait relations. These include the following.

    "},{"location":"relations/basics/#distribution","title":"Distribution","text":"

    When data is partitioned across multiple sibling sets, distribution describes the set of properties that apply to any one partition. This is based on a set of distribution expression properties. A distribution is declared as a set of one or more fields and a distribution type across all fields.

    Property Description Required Distribution Fields List of field references that describe distribution (e.g. [0,2:4,5:0:0]). The order of these references does not impact results. Required for partitioned distribution type. Disallowed for singleton distribution type. Distribution Type PARTITIONED: For a discrete tuple of values for the declared distribution fields, all records with that tuple are located in the same partition. SINGLETON: there will only be a single partition for this operation. Required"},{"location":"relations/basics/#orderedness","title":"Orderedness","text":"

    A guarantee that data output from this operation is provided with a sort order. The sort order will be declared based on a set of sort field definitions based on the emitted output of this operation.

    Property Description Required Sort Fields A list of fields that the data are ordered by. The list is in order of the sort. If we sort by [0,1] then this means we only consider the data for field 1 to be ordered within each discrete value of field 0. At least one required. Per - Sort Field A field reference that the data is sorted by. Required Per - Sort Direction The direction of the data. See direction options below. Required"},{"location":"relations/basics/#ordering-directions","title":"Ordering Directions","text":"Direction Descriptions Nulls Position Ascending Returns data in ascending order based on the quality function associated with the type. Nulls are included before any values. First Descending Returns data in descending order based on the quality function associated with the type. Nulls are included before any values. First Ascending Returns data in ascending order based on the quality function associated with the type. Nulls are included after any values. Last Descending Returns data in descending order based on the quality function associated with the type. Nulls are included after any values. Last Custom function identifier Returns data using a custom function that returns -1, 0, or 1 depending on the order of the data. Per Function Clustered Ensures that all equal values are coalesced (but no ordering between values is defined). E.g. for values 1,2,3,1,2,3, output could be any of the following: 1,1,2,2,3,3 or 1,1,3,3,2,2 or 2,2,1,1,3,3 or 2,2,3,3,1,1 or 3,3,1,1,2,2 or 3,3,2,2,1,1. N/A, may appear anywhere but will be coalesced. Discussion Points
    • Should read definition types be more extensible in the same way that function signatures are? Are extensible read definition types necessary if we have custom relational operators?
    • How are decomposed reads expressed? For example, the Iceberg type above is for early logical planning. Once we do some operations, it may produce a list of Iceberg file reads. This is likely a secondary type of object.
    "},{"location":"relations/embedded_relations/","title":"Embedded Relations","text":"

    Pending.

    Embedded relations allow a Substrait producer to define a set operation that will be embedded in the plan.

    TODO: define lots of details about what interfaces, languages, formats, etc. Should reasonably be an extension of embedded user defined table functions.

    "},{"location":"relations/logical_relations/","title":"Logical Relations","text":""},{"location":"relations/logical_relations/#read-operator","title":"Read Operator","text":"

    The read operator is an operator that produces one output. A simple example would be the reading of a Parquet file. It is expected that many types of reads will be added over time.

    Signature Value Inputs 0 Outputs 1 Property Maintenance N/A (no inputs) Direct Output Order Defaults to the schema of the data read after the optional projection (masked complex expression) is applied."},{"location":"relations/logical_relations/#read-properties","title":"Read Properties","text":"Property Description Required Definition The contents of the read property definition. Required Direct Schema Defines the schema of the output of the read (before any projection or emit remapping/hiding). Required Filter A boolean Substrait expression that describes a filter that must be applied to the data. The filter should be interpreted against the direct schema. Optional, defaults to none. Best Effort Filter A boolean Substrait expression that describes a filter that may be applied to the data. The filter should be interpreted against the direct schema. Optional, defaults to none. Projection A masked complex expression describing the portions of the content that should be read Optional, defaults to all of schema Output Properties Declaration of orderedness and/or distribution properties this read produces. Optional, defaults to no properties. Properties A list of name/value pairs associated with the read. Optional, defaults to empty"},{"location":"relations/logical_relations/#read-filtering","title":"Read Filtering","text":"

    The read relation has two different filter properties: a filter, which must be satisfied by the operator, and a best effort filter, which does not have to be satisfied. This reflects the way that consumers are often implemented. A consumer is often only able to fully apply a limited set of operations in the scan. There can then be an extended set of operations which a consumer can apply in a best effort fashion. A producer, when setting these two fields, should take care to only use expressions that the consumer is capable of handling.

    As an example, a consumer may only be able to fully apply (in the read relation) <, =, and > on integral types. The consumer may be able to apply <, =, and > in a best effort fashion on decimal and string types. Consider the filter expression my_int < 10 && my_string < \"x\" && upper(my_string) > \"B\". In this case the filter should be set to my_int < 10, the best_effort_filter should be set to my_string < \"x\", and the remaining portion (upper(my_string) > \"B\") should be put into a filter relation.
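    The decision process in the example can be sketched as a partition of the predicate's conjuncts. The helper below and its capability sets are hypothetical; a real producer would inspect expression trees rather than strings:

```python
def split_conjuncts(conjuncts, fully_supported, best_effort_supported):
    # Partition conjuncts into the read relation's filter, its
    # best_effort_filter, and a remainder for a separate filter relation.
    filter_, best_effort, remainder = [], [], []
    for c in conjuncts:
        if c in fully_supported:
            filter_.append(c)
        elif c in best_effort_supported:
            best_effort.append(c)
        else:
            remainder.append(c)
    return filter_, best_effort, remainder

conjuncts = ['my_int < 10', 'my_string < "x"', 'upper(my_string) > "B"']
split_conjuncts(conjuncts, {'my_int < 10'}, {'my_string < "x"'})
```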

    A filter expression must be interpreted against the direct schema before the projection expression has been applied. As a result, fields may be referenced by the filter expression which are not included in the relation\u2019s output.

    "},{"location":"relations/logical_relations/#read-definition-types","title":"Read Definition Types","text":"Adding new Read Definition Types

    If you have a read definition that\u2019s not covered here, see the process for adding new read definition types.

    Read definition types (like the rest of the features in Substrait) are built by the community and added to the specification.

    "},{"location":"relations/logical_relations/#virtual-table","title":"Virtual Table","text":"

    A virtual table is a table whose contents are embedded in the plan itself. The table data is encoded as records consisting of literal values.

    Property Description Required Data Required Required"},{"location":"relations/logical_relations/#named-table","title":"Named Table","text":"

    A named table is a reference to data defined elsewhere. For example, there may be a catalog of tables with unique names that both the producer and consumer agree on. This catalog would provide the consumer with more information on how to retrieve the data.

    Property Description Required Names A list of namespaced strings that, together, form the table name Required (at least one)"},{"location":"relations/logical_relations/#files-type","title":"Files Type","text":"Property Description Required Items An array of Items (path or path glob) associated with the read. Required Format per item Enumeration of available formats. Only current option is PARQUET. Required Slicing parameters per item Information to use when reading a slice of a file. Optional"},{"location":"relations/logical_relations/#slicing-files","title":"Slicing Files","text":"

    A read operation is allowed to only read part of a file. This is convenient, for example, when distributing a read operation across several nodes. The slicing parameters are specified as byte offsets into the file.

    Many file formats consist of indivisible \u201cchunks\u201d of data (e.g. Parquet row groups). If this happens, the consumer should employ some rule to decide which slice a particular chunk belongs to. For example, one possible approach is that a chunk should only be read if the midpoint of the chunk (dividing by 2 and rounding down) is contained within the asked-for byte range.
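    The midpoint rule mentioned above can be written as a predicate over byte offsets. This is a sketch of one possible approach, not a normative rule:

```python
def chunk_in_slice(chunk_start, chunk_length, slice_start, slice_length):
    # A chunk belongs to the slice whose [start, start + length) byte
    # range contains the chunk's midpoint (integer division rounds down).
    midpoint = chunk_start + chunk_length // 2
    return slice_start <= midpoint < slice_start + slice_length

# A 100-byte chunk at offset 950 straddles two 1000-byte slices; its
# midpoint (byte 1000) assigns it to the second slice.
chunk_in_slice(950, 100, 0, 1000)     # -> False
chunk_in_slice(950, 100, 1000, 1000)  # -> True
```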

    ReadRel Message
    message ReadRel {\n  RelCommon common = 1;\n  NamedStruct base_schema = 2;\n  Expression filter = 3;\n  Expression best_effort_filter = 11;\n  Expression.MaskExpression projection = 4;\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n  // Definition of which type of scan operation is to be performed\n  oneof read_type {\n    VirtualTable virtual_table = 5;\n    LocalFiles local_files = 6;\n    NamedTable named_table = 7;\n    ExtensionTable extension_table = 8;\n  }\n\n  // A base table. The list of string is used to represent namespacing (e.g., mydb.mytable).\n  // This assumes shared catalog between systems exchanging a message.\n  message NamedTable {\n    repeated string names = 1;\n    substrait.extensions.AdvancedExtension advanced_extension = 10;\n  }\n\n  // A table composed of literals.\n  message VirtualTable {\n    repeated Expression.Literal.Struct values = 1;\n  }\n\n  // A stub type that can be used to extend/introduce new table types outside\n  // the specification.\n  message ExtensionTable {\n    google.protobuf.Any detail = 1;\n  }\n\n  // Represents a list of files in input of a scan operation\n  message LocalFiles {\n    repeated FileOrFiles items = 1;\n    substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n    // Many files consist of indivisible chunks (e.g. parquet row groups\n    // or CSV rows).  If a slice partially selects an indivisible chunk\n    // then the consumer should employ some rule to decide which slice to\n    // include the chunk in (e.g. include it in the slice that contains\n    // the midpoint of the chunk)\n    message FileOrFiles {\n      oneof path_type {\n        // A URI that can refer to either a single folder or a single file\n        string uri_path = 1;\n        // A URI where the path portion is a glob expression that can\n        // identify zero or more paths.\n        // Consumers should support the POSIX syntax.  
The recursive\n        // globstar (**) may not be supported.\n        string uri_path_glob = 2;\n        // A URI that refers to a single file\n        string uri_file = 3;\n        // A URI that refers to a single folder\n        string uri_folder = 4;\n      }\n\n      // Original file format enum, superseded by the file_format oneof.\n      reserved 5;\n      reserved \"format\";\n\n      // The index of the partition this item belongs to\n      uint64 partition_index = 6;\n\n      // The start position in byte to read from this item\n      uint64 start = 7;\n\n      // The length in byte to read from this item\n      uint64 length = 8;\n\n      message ParquetReadOptions {}\n      message ArrowReadOptions {}\n      message OrcReadOptions {}\n      message DwrfReadOptions {}\n\n      // The format of the files.\n      oneof file_format {\n        ParquetReadOptions parquet = 9;\n        ArrowReadOptions arrow = 10;\n        OrcReadOptions orc = 11;\n        google.protobuf.Any extension = 12;\n        DwrfReadOptions dwrf = 13;\n      }\n    }\n  }\n\n}\n
    "},{"location":"relations/logical_relations/#filter-operation","title":"Filter Operation","text":"

    The filter operator eliminates one or more records from the input data based on a boolean filter expression.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Orderedness, Distribution, remapped by emit Direct Output Order The field order as the input."},{"location":"relations/logical_relations/#filter-properties","title":"Filter Properties","text":"Property Description Required Input The relational input. Required Expression A boolean expression which describes which records are included/excluded. Required FilterRel Message
    message FilterRel {\n  RelCommon common = 1;\n  Rel input = 2;\n  Expression condition = 3;\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n}\n
    "},{"location":"relations/logical_relations/#sort-operation","title":"Sort Operation","text":"

    The sort operator reorders a dataset based on one or more identified sort fields and a sorting function for each.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Will update orderedness property to the output of the sort operation. Distribution property only remapped based on emit. Direct Output Order The field order of the input."},{"location":"relations/logical_relations/#sort-properties","title":"Sort Properties","text":"Property Description Required Input The relational input. Required Sort Fields List of one or more fields to sort by. Uses the same properties as the orderedness property. One sort field required SortRel Message
    message SortRel {\n  RelCommon common = 1;\n  Rel input = 2;\n  repeated SortField sorts = 3;\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n}\n
    "},{"location":"relations/logical_relations/#project-operation","title":"Project Operation","text":"

    The project operation will produce one or more additional expressions based on the inputs of the dataset.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Distribution maintained, mapped by emit. Orderedness: Maintained if no window operations. Extended to include projection fields if fields are direct references. If window operations are present, no orderedness is maintained. Direct Output Order The field order of the input + the list of new expressions in the order they are declared in the expressions list."},{"location":"relations/logical_relations/#project-properties","title":"Project Properties","text":"Property Description Required Input The relational input. Required Expressions List of one or more expressions to add to the input. At least one expression required ProjectRel Message
    message ProjectRel {\n  RelCommon common = 1;\n  Rel input = 2;\n  repeated Expression expressions = 3;\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n}\n
    "},{"location":"relations/logical_relations/#cross-product-operation","title":"Cross Product Operation","text":"

    The cross product operation will combine two separate inputs into a single output. It pairs every record from the left input with every record of the right input.

    Signature Value Inputs 2 Outputs 1 Property Maintenance Distribution is maintained. Orderedness is empty post operation. Direct Output Order The emit order of the left input followed by the emit order of the right input."},{"location":"relations/logical_relations/#cross-product-properties","title":"Cross Product Properties","text":"Property Description Required Left Input A relational input. Required Right Input A relational input. Required CrossRel Message
    message CrossRel {\n  RelCommon common = 1;\n  Rel left = 2;\n  Rel right = 3;\n\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n}\n
    "},{"location":"relations/logical_relations/#join-operation","title":"Join Operation","text":"

    The join operation will combine two separate inputs into a single output, based on a join expression. A common subtype of joins is an equality join where the join expression is constrained to a list of equality (or equality + null equality) conditions between the two inputs of the join.

    Signature Value Inputs 2 Outputs 1 Property Maintenance Distribution is maintained. Orderedness is empty post operation. Physical relations may provide better property maintenance. Direct Output Order The emit order of the left input followed by the emit order of the right input."},{"location":"relations/logical_relations/#join-properties","title":"Join Properties","text":"Property Description Required Left Input A relational input. Required Right Input A relational input. Required Join Expression A boolean condition that describes whether each record from the left set \u201cmatches\u201d the record from the right set. Field references correspond to the direct output order of the data. Required. Can be the literal True. Post-Join Filter A boolean condition to be applied to each result record after the inputs have been joined, yielding only the records that satisfied the condition. Optional Join Type One of the join types defined below. Required"},{"location":"relations/logical_relations/#join-types","title":"Join Types","text":"Type Description Inner Return records from the left side only if they match the right side. Return records from the right side only when they match the left side. For each cross input match, return a record including the data from both sides. Non-matching records are ignored. Outer Return all records from both the left and right inputs. For each cross input match, return a record including the data from both sides. For any remaining non-matching records, return the record from the corresponding input along with nulls for the opposite input. Left Return all records from the left input. For each cross input match, return a record including the data from both sides. For any remaining non-matching records from the left input, return the left record along with nulls for the right input. Right Return all records from the right input. For each cross input match, return a record including the data from both sides. 
For any remaining non-matching records from the right input, return the right record along with nulls for the left input. Semi Returns records from the left input. These are returned only if the records have a join partner on the right side. Anti Return records from the left input. These are returned only if the records do not have a join partner on the right side. Single Returns one join partner per entry on the left input. If more than one join partner exists, there are two valid semantics. 1) Only the first match is returned. 2) The system throws an error. If there is no match between the left and right inputs, NULL is returned. JoinRel Message
    message JoinRel {\n  RelCommon common = 1;\n  Rel left = 2;\n  Rel right = 3;\n  Expression expression = 4;\n  Expression post_join_filter = 5;\n\n  JoinType type = 6;\n\n  enum JoinType {\n    JOIN_TYPE_UNSPECIFIED = 0;\n    JOIN_TYPE_INNER = 1;\n    JOIN_TYPE_OUTER = 2;\n    JOIN_TYPE_LEFT = 3;\n    JOIN_TYPE_RIGHT = 4;\n    JOIN_TYPE_SEMI = 5;\n    JOIN_TYPE_ANTI = 6;\n    // This join is useful for nested sub-queries where we need exactly one record in output (or throw exception)\n    // See Section 3.2 of https://15721.courses.cs.cmu.edu/spring2018/papers/16-optimizer2/hyperjoins-btw2017.pdf\n    JOIN_TYPE_SINGLE = 7;\n  }\n\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n}\n
    "},{"location":"relations/logical_relations/#set-operation","title":"Set Operation","text":"

    The set operation encompasses several set-level operations that support combining datasets, possibly excluding records based on various types of record level matching.
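    As a minimal sketch of the record-level matching involved, two of the set operation types defined below could be implemented like this (function names are illustrative, and records are modeled as hashable values):

```python
def minus_primary(primary, *secondaries):
    """Minus (Primary): the primary input, excluding any record that
    appears in at least one secondary input."""
    excluded = set().union(*secondaries)
    return [r for r in primary if r not in excluded]

def union_distinct(*inputs):
    """Union Distinct: all records from every input, with duplicates
    (within or across inputs) removed, preserving first appearance."""
    seen, out = set(), []
    for rel in inputs:
        for r in rel:
            if r not in seen:
                seen.add(r)
                out.append(r)
    return out

assert minus_primary([1, 2, 3], [2], [3]) == [1]
assert union_distinct([1, 2], [2, 3]) == [1, 2, 3]
```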

    Signature Value Inputs 2 or more Outputs 1 Property Maintenance Maintains distribution if all inputs have the same ordinal distribution. Orderedness is not maintained. Direct Output Order The field order of the inputs. All inputs must have identical fields."},{"location":"relations/logical_relations/#set-properties","title":"Set Properties","text":"Property Description Required Primary Input The primary input of the dataset. Required Secondary Inputs One or more relational inputs. At least one required Set Operation Type From list below. Required"},{"location":"relations/logical_relations/#set-operation-types","title":"Set Operation Types","text":"Property Description Minus (Primary) Returns the primary input excluding any matching records from secondary inputs. Minus (Multiset) Returns the primary input minus any records that are included in all sets. Intersection (Primary) Returns all primary rows that intersect at least one secondary input. Intersection (Multiset) Returns all rows that intersect at least one record from each secondary input. Union Distinct Returns all the records from each set, removing any rows that are duplicated (within or across sets). Union All Returns all records from each set, allowing duplicates. SetRel Message
    message SetRel {\n  RelCommon common = 1;\n  // The first input is the primary input, the remaining are secondary\n  // inputs.  There must be at least two inputs.\n  repeated Rel inputs = 2;\n  SetOp op = 3;\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n  enum SetOp {\n    SET_OP_UNSPECIFIED = 0;\n    SET_OP_MINUS_PRIMARY = 1;\n    SET_OP_MINUS_MULTISET = 2;\n    SET_OP_INTERSECTION_PRIMARY = 3;\n    SET_OP_INTERSECTION_MULTISET = 4;\n    SET_OP_UNION_DISTINCT = 5;\n    SET_OP_UNION_ALL = 6;\n  }\n\n}\n
    "},{"location":"relations/logical_relations/#fetch-operation","title":"Fetch Operation","text":"

    The fetch operation eliminates records outside a desired window. It typically corresponds to a SQL FETCH/OFFSET clause and returns only the records between the start offset and the end offset.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Maintains distribution and orderedness. Direct Output Order Unchanged from input."},{"location":"relations/logical_relations/#fetch-properties","title":"Fetch Properties","text":"Property Description Required Input A relational input, typically with a desired orderedness property. Required Offset A positive integer. Declares the offset for retrieval of records. Optional, defaults to 0. Count A positive integer. Declares the number of records that should be returned. Required FetchRel Message
    message FetchRel {\n  RelCommon common = 1;\n  Rel input = 2;\n  // the offset expressed in number of records\n  int64 offset = 3;\n  // the amount of records to return\n  int64 count = 4;\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n}\n
    "},{"location":"relations/logical_relations/#aggregate-operation","title":"Aggregate Operation","text":"

    The aggregate operation groups input data on one or more sets of grouping keys, calculating each measure for each combination of grouping keys.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Maintains distribution if all distribution fields are contained in every grouping set. No orderedness guaranteed. Direct Output Order The list of distinct columns from each grouping set (ordered by their first appearance) followed by the list of measures in declaration order, followed by an i32 identifying the grouping set the value is derived from (if applicable).

    In its simplest form, an aggregation has only measures. In this case, all records are folded into one, and a column is returned for each aggregate expression in the measures list.

    Grouping sets can be used for finer-grained control over which records are folded. Within a grouping set, two records will be folded together if and only if each expression in the grouping set yields the same value for both. The values returned by the grouping sets will be returned as columns to the left of the columns for the aggregate expressions. If a grouping set contains no grouping expressions, all rows will be folded for that grouping set.

    It\u2019s possible to specify multiple grouping sets in a single aggregate operation. The grouping sets behave more or less independently, with each returned record belonging to one of the grouping sets. The values for the grouping expression columns that are not part of the grouping set for a particular record will be set to null. Two grouping expressions will be returned using the same column if the protobuf messages describing them are equal. The columns for grouping expressions that do not appear in all grouping sets will be nullable (regardless of the nullability of the type returned by the grouping expression) to accommodate the null insertion.

    To further disambiguate which record belongs to which grouping set, an aggregate relation with more than one grouping set receives an extra i32 column on the right-hand side. The value of this field will be the zero-based index of the grouping set that yielded the record.

    If at least one grouping expression is present, the aggregation is allowed to not have any aggregate expressions. An aggregate relation is invalid if it would yield zero columns.
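    The grouping-set semantics above (null-padding of absent grouping columns, plus the trailing grouping-set index) can be sketched as follows; the function and its signature are illustrative only, with a measure modeled as a fold over the grouped records:

```python
from collections import defaultdict

def aggregate(rows, grouping_sets, measure):
    """Sketch of an aggregate with multiple grouping sets. Output columns:
    the distinct grouping expressions across all sets (None where an
    expression is absent from a record's set), then the measure, then the
    zero-based grouping-set index."""
    all_keys = []
    for gs in grouping_sets:
        for k in gs:
            if k not in all_keys:
                all_keys.append(k)
    out = []
    for i, gs in enumerate(grouping_sets):
        groups = defaultdict(list)
        for row in rows:
            groups[tuple(row[k] for k in gs)].append(row)
        for key, members in groups.items():
            vals = dict(zip(gs, key))
            out.append(tuple(vals.get(k) for k in all_keys) + (measure(members), i))
    return out

rows = [{"city": "a", "n": 1}, {"city": "a", "n": 2}, {"city": "b", "n": 3}]
# Grouping sets [["city"], []]: per-city sums plus a grand-total row.
result = aggregate(rows, [["city"], []], lambda ms: sum(m["n"] for m in ms))
# result == [("a", 3, 0), ("b", 3, 0), (None, 6, 1)]
```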

    "},{"location":"relations/logical_relations/#aggregate-properties","title":"Aggregate Properties","text":"Property Description Required Input The relational input. Required Grouping Sets One or more grouping sets. Optional, required if no measures. Per Grouping Set A list of grouping expressions that the aggregate measures should be calculated for. Optional. Measures A list of one or more aggregate expressions along with an optional filter. Optional, required if no grouping sets. AggregateRel Message
    message AggregateRel {\n  RelCommon common = 1;\n\n  // Input of the aggregation\n  Rel input = 2;\n\n  // A list of one or more grouping expression sets that the aggregation measures should be calculated for.\n  // Required if there are no measures.\n  repeated Grouping groupings = 3;\n\n  // A list of one or more aggregate expressions along with an optional filter.\n  // Required if there are no groupings.\n  repeated Measure measures = 4;\n\n  substrait.extensions.AdvancedExtension advanced_extension = 10;\n\n  message Grouping {\n    repeated Expression grouping_expressions = 1;\n  }\n\n  message Measure {\n    AggregateFunction measure = 1;\n\n    // An optional boolean expression that acts to filter which records are\n    // included in the measure. True means include this record for calculation\n    // within the measure.\n    // Helps to support SUM(<c>) FILTER(WHERE...) syntax without masking opportunities for optimization\n    Expression filter = 2;\n  }\n\n}\n
    "},{"location":"relations/logical_relations/#reference-operator","title":"Reference Operator","text":"

    The reference operator is used to construct DAGs of operations. In a Plan we can have multiple Rel representing various computations with potentially multiple outputs. The ReferenceRel is used to express the fact that multiple Rel might be sharing subtrees of computation. This can be used to express arbitrary DAGs as well as represent multi-query optimizations.

    As a concrete example, consider two queries: SELECT * FROM A JOIN B JOIN C and SELECT * FROM A JOIN B JOIN D. We could use the ReferenceRel to highlight the shared A JOIN B between the two queries by creating a plan with 3 Rel: one expressing A JOIN B (in position 0 in the plan), one using a reference as ReferenceRel(0) JOIN C, and a third doing ReferenceRel(0) JOIN D. This avoids the redundancy of computing A JOIN B twice.
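    The shape of such a plan can be sketched as plain data (an illustrative structure only, not the actual protobuf schema; the dict keys are hypothetical):

```python
# Three relations; relations 1 and 2 reuse relation 0 via a zero-indexed
# positional reference, turning the plan tree into a DAG.
plan = {
    "relations": [
        {"join": {"left": "A", "right": "B"}},                   # position 0: A JOIN B
        {"join": {"left": {"reference_rel": 0}, "right": "C"}},  # ReferenceRel(0) JOIN C
        {"join": {"left": {"reference_rel": 0}, "right": "D"}},  # ReferenceRel(0) JOIN D
    ]
}

# Both downstream joins point at the same shared subtree.
assert plan["relations"][1]["join"]["left"] == plan["relations"][2]["join"]["left"]
```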

    Signature Value Inputs 1 Outputs 1 Property Maintenance Maintains all properties of the input Direct Output Order Maintains order"},{"location":"relations/logical_relations/#reference-properties","title":"Reference Properties","text":"Property Description Required Referred Rel A zero-indexed positional reference to a Rel defined within the same Plan. Required ReferenceRel Message
    message ReferenceRel {\n  int32 subtree_ordinal = 1;\n\n}\n
    "},{"location":"relations/logical_relations/#write-operator","title":"Write Operator","text":"

    The write operator consumes one input and writes it to storage. This can range from writing to a Parquet file, to INSERT/DELETE/UPDATE in a database.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Output depends on OutputMode (none, or modified records) Direct Output Order Unchanged from input"},{"location":"relations/logical_relations/#write-properties","title":"Write Properties","text":"Property Description Required Write Type Definition of which object we are operating on (e.g., a fully-qualified table name). Required CTAS Schema The names of all the columns and their type for a CREATE TABLE AS. Required only for CTAS Write Operator Which type of operation we are performing (INSERT/DELETE/UPDATE/CTAS). Required Rel Input The Rel representing which records we will be operating on (e.g., VALUES for an INSERT, or which records to DELETE, or records and after-image of their values for UPDATE). Required Output Mode For views that modify a DB it is important to control which records to \u201creturn\u201d. Common default is NO_OUTPUT where we return nothing. Alternatively, we can return MODIFIED_RECORDS, which can be further manipulated by layering more rels on top of this WriteRel (e.g., to \u201ccount how many records were updated\u201d). This also allows returning the after-image of the change. To return the before-image (or both), one can use the reference mechanisms and have multiple return values. Required for VIEW CREATE/CREATE_OR_REPLACE/ALTER"},{"location":"relations/logical_relations/#write-definition-types","title":"Write Definition Types","text":"Adding new Write Definition Types

    If you have a write definition that\u2019s not covered here, see the process for adding new write definition types.

    Write definition types are built by the community and added to the specification.

    WriteRel Message
    message WriteRel {\n  // Definition of which TABLE we are operating on\n  oneof write_type {\n    NamedObjectWrite named_table = 1;\n    ExtensionObject extension_table = 2;\n  }\n\n  // The schema of the table (must align with Rel input (e.g., number of leaf fields must match))\n  NamedStruct table_schema = 3;\n\n  // The type of operation to perform\n  WriteOp op = 4;\n\n  // The relation that determines the records to add/remove/modify\n  // the schema must match with table_schema. Default values must be explicitly stated\n  // in a ProjectRel at the top of the input. The match must also\n  // occur in case of DELETE to ensure multi-engine plans are unequivocal.\n  Rel input = 5;\n\n  // Output mode determines what is the output of executing this rel\n  OutputMode output = 6;\n  RelCommon common = 7;\n\n  enum WriteOp {\n    WRITE_OP_UNSPECIFIED = 0;\n    // The insert of new records in a table\n    WRITE_OP_INSERT = 1;\n    // The removal of records from a table\n    WRITE_OP_DELETE = 2;\n    // The modification of existing records within a table\n    WRITE_OP_UPDATE = 3;\n    // The creation of a new table, and the insert of new records in the table\n    WRITE_OP_CTAS = 4;\n  }\n\n  enum OutputMode {\n    OUTPUT_MODE_UNSPECIFIED = 0;\n    // return no records at all\n    OUTPUT_MODE_NO_OUTPUT = 1;\n    // this mode makes the operator return all the records INSERTED/DELETED/UPDATED by the operator.\n    // The operator returns the AFTER-image of any change. This can be further manipulated by operators upstream\n    // (e.g., returning the typical \"count of modified records\").\n    // For scenarios in which the BEFORE image is required, the user must implement a spool (via references to\n    // subplans in the body of the Rel input) and return those with another PlanRel.relations.\n    OUTPUT_MODE_MODIFIED_RECORDS = 2;\n  }\n\n}\n
    "},{"location":"relations/logical_relations/#virtual-table_1","title":"Virtual Table","text":"Property Description Required Name The in-memory name to give the dataset. Required Pin Whether it is okay to remove this dataset from memory or it should be kept in memory. Optional, defaults to false."},{"location":"relations/logical_relations/#files-type_1","title":"Files Type","text":"Property Description Required Path A URI to write the data to. Supports the inclusion of field references that are listed as available in properties as a \u201crotation description field\u201d. Required Format Enumeration of available formats. Only current option is PARQUET. Required"},{"location":"relations/logical_relations/#ddl-data-definition-language-operator","title":"DDL (Data Definition Language) Operator","text":"

    The operator that defines modifications of a database schema (CREATE/DROP/ALTER for TABLE and VIEWS).

    Signature Value Inputs 1 Outputs 0 Property Maintenance N/A (no output) Direct Output Order N/A"},{"location":"relations/logical_relations/#ddl-properties","title":"DDL Properties","text":"Property Description Required Write Type Definition of which type of object we are operating on. Required Table Schema The names of all the columns and their type. Required (except for DROP operations) Table Defaults The set of default values for this table. Required (except for DROP operations) DDL Object Which type of object we are operating on (e.g., TABLE or VIEW). Required DDL Operator The operation to be performed (e.g., CREATE/ALTER/DROP). Required View Definition A Rel representing the \u201cbody\u201d of a VIEW. Required for VIEW CREATE/CREATE_OR_REPLACE/ALTER DdlRel Message
    message DdlRel {\n  // Definition of which type of object we are operating on\n  oneof write_type {\n    NamedObjectWrite named_object = 1;\n    ExtensionObject extension_object = 2;\n  }\n\n  // The columns that will be modified (representing after-image of a schema change)\n  NamedStruct table_schema = 3;\n  // The default values for the columns (representing after-image of a schema change)\n  // E.g., in case of an ALTER TABLE that changes some of the column default values, we expect\n  // the table_defaults Struct to report a full list of default values reflecting the result of applying\n  // the ALTER TABLE operator successfully\n  Expression.Literal.Struct table_defaults = 4;\n\n  // Which type of object we operate on\n  DdlObject object = 5;\n\n  // The type of operation to perform\n  DdlOp op = 6;\n\n  // The body of the CREATE VIEW\n  Rel view_definition = 7;\n  RelCommon common = 8;\n\n  enum DdlObject {\n    DDL_OBJECT_UNSPECIFIED = 0;\n    // A Table object in the system\n    DDL_OBJECT_TABLE = 1;\n    // A View object in the system\n    DDL_OBJECT_VIEW = 2;\n  }\n\n  enum DdlOp {\n    DDL_OP_UNSPECIFIED = 0;\n    // A create operation (for any object)\n    DDL_OP_CREATE = 1;\n    // A create operation if the object does not exist, or replaces it (equivalent to a DROP + CREATE) if the object already exists\n    DDL_OP_CREATE_OR_REPLACE = 2;\n    // An operation that modifies the schema (e.g., column names, types, default values) for the target object\n    DDL_OP_ALTER = 3;\n    // An operation that removes an object from the system\n    DDL_OP_DROP = 4;\n    // An operation that removes an object from the system (without throwing an exception if the object did not exist)\n    DDL_OP_DROP_IF_EXIST = 5;\n  }\n  //TODO add PK/constraints/indexes/etc..?\n\n}\n
    Discussion Points
    • How should correlated operations be handled?
    "},{"location":"relations/physical_relations/","title":"Physical Relations","text":"

    There is no true distinction between logical and physical operations in Substrait. By convention, certain operations are classified as physical, but all operations can potentially be used in any kind of plan. A particular set of transformations or target operators may (by convention) be considered the \u201cphysical plan\u201d, but this is a characteristic of the system consuming Substrait as opposed to a definition within Substrait.

    "},{"location":"relations/physical_relations/#hash-equijoin-operator","title":"Hash Equijoin Operator","text":"

    The hash equijoin operator builds a hash table out of the right input based on a set of join keys. It then probes that hash table with records from the left input, finding matches.
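    The build/probe structure can be sketched as follows (a minimal inner-join illustration with single string keys; the real operator supports lists of key references and all join types):

```python
from collections import defaultdict

def hash_equijoin(left, right, left_key, right_key):
    """Inner hash equijoin: build a hash table on the right (build side),
    then probe it with each left (probe side) record."""
    table = defaultdict(list)
    for r in right:
        table[r[right_key]].append(r)          # build phase
    out = []
    for l in left:
        for r in table.get(l[left_key], []):   # probe phase
            out.append({**l, **r})
    return out

left = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
right = [{"id": 1, "y": "p"}]
assert hash_equijoin(left, right, "id", "id") == [{"id": 1, "x": "a", "y": "p"}]
```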

    Signature Value Inputs 2 Outputs 1 Property Maintenance Distribution is maintained. Orderedness of the left set is maintained in INNER join cases, otherwise it is eliminated. Direct Output Order Same as the Join operator."},{"location":"relations/physical_relations/#hash-equijoin-properties","title":"Hash Equijoin Properties","text":"Property Description Required Left Input A relational input (probe side). Required Right Input A relational input (build side). Required Left Keys References to the fields to join on in the left input. Required Right Keys References to the fields to join on in the right input. Required Post Join Predicate An additional expression that can be used to reduce the output of the join operation post the equality condition. Minimizes the overhead of secondary join conditions that cannot be evaluated using the equijoin keys. Optional, defaults to true. Join Type One of the join types defined in the Join operator. Required"},{"location":"relations/physical_relations/#nlj-nested-loop-join-operator","title":"NLJ (Nested Loop Join) Operator","text":"

    The nested loop join operator does a join by holding the entire right input and then iterating over it using the left input, evaluating the join expression on the Cartesian product of all rows, only outputting rows where the expression is true. It will also include non-matching rows for the OUTER, LEFT, and RIGHT operations per the join type requirements.
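    For the inner-join case, this is equivalent to filtering the Cartesian product (a minimal sketch; the function name is illustrative):

```python
def nested_loop_join(left, right, predicate):
    """Inner nested loop join: evaluate the join expression over the
    Cartesian product and keep only the pairs where it is true."""
    return [(l, r) for l in left for r in right if predicate(l, r)]

assert nested_loop_join([1, 2], [2, 3], lambda l, r: l == r) == [(2, 2)]
```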

    Signature Value Inputs 2 Outputs 1 Property Maintenance Distribution is maintained. Orderedness is eliminated. Direct Output Order Same as the Join operator."},{"location":"relations/physical_relations/#nlj-properties","title":"NLJ Properties","text":"Property Description Required Left Input A relational input. Required Right Input A relational input. Required Join Expression A boolean condition that describes whether each record from the left set \u201cmatches\u201d the record from the right set. Optional. Defaults to true (a Cartesian join). Join Type One of the join types defined in the Join operator. Required"},{"location":"relations/physical_relations/#merge-equijoin-operator","title":"Merge Equijoin Operator","text":"

    The merge equijoin does a join by taking advantage of two sets that are sorted on the join keys. This allows the join operation to be done in a streaming fashion.
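    The streaming two-cursor technique can be sketched as follows (an inner-join illustration over inputs already sorted on a single key; function name and signature are hypothetical):

```python
def merge_equijoin(left, right, key):
    """Inner merge equijoin over inputs sorted on the join key. Advances
    two cursors and emits the cross product of each run of equal keys."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, pair each left row
            # with every row in that run.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                for jj in range(j, j_end):
                    out.append({**left[i], **right[jj]})
                i += 1
            j = j_end
    return out

assert merge_equijoin([{"k": 1}, {"k": 2}], [{"k": 2, "v": 9}], "k") == [{"k": 2, "v": 9}]
```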

    Signature Value Inputs 2 Outputs 1 Property Maintenance Distribution is maintained. Orderedness is eliminated. Direct Output Order Same as the Join operator."},{"location":"relations/physical_relations/#merge-join-properties","title":"Merge Join Properties","text":"Property Description Required Left Input A relational input. Required Right Input A relational input. Required Left Keys References to the fields to join on in the left input. Required Right Keys References to the fields to join on in the right input. Required Post Join Predicate An additional expression that can be used to reduce the output of the join operation post the equality condition. Minimizes the overhead of secondary join conditions that cannot be evaluated using the equijoin keys. Optional, defaults to true. Join Type One of the join types defined in the Join operator. Required"},{"location":"relations/physical_relations/#exchange-operator","title":"Exchange Operator","text":"

    The exchange operator will redistribute data based on an exchange type definition. Applying this operation will lead to an output that presents the desired distribution.
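    Two of the exchange types defined below can be sketched minimally (function names are illustrative; `hash()` stands in for the system-defined hashing function a real implementation would use):

```python
def scatter_target(fields: tuple, partition_count: int) -> int:
    """Scatter: hash the distribution fields and reduce to a partition id,
    so equal field values always land in the same partition."""
    return hash(fields) % partition_count

def single_bucket_target(bucket: int, partition_count: int) -> int:
    """Single Bucket: if the bucket expression may return values outside
    the valid partition count, the system applies a modulo."""
    return bucket % partition_count

assert single_bucket_target(7, 4) == 3
assert 0 <= scatter_target((1, 2), 4) < 4
```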

    Signature Value Inputs 1 Outputs 1 Property Maintenance Orderedness is maintained. Distribution is overwritten based on configuration. Direct Output Order Order of the input."},{"location":"relations/physical_relations/#exchange-types","title":"Exchange Types","text":"Type Description Scatter Distribute data using a system defined hashing function that considers one or more fields. For the same type of fields and same ordering of values, the same partition target should be identified for different ExchangeRels Single Bucket Define an expression that provides a single i32 bucket number. Optionally define whether the expression will only return values within the valid number of partition counts. If not, the system should modulo the return value to determine a target partition. Multi Bucket Define an expression that provides a List<i32> of bucket numbers. Optionally define whether the expression will only return values within the valid number of partition counts. If not, the system should modulo the return value to determine a target partition. The records should be sent to all bucket numbers provided by the expression. Broadcast Send all records to all partitions. Round Robin Send records to each target in sequence. Can follow either exact or approximate behavior. Approximate will attempt to balance the number of records sent to each destination but may not exactly distribute evenly and may send batches of records to each target before moving to the next."},{"location":"relations/physical_relations/#exchange-properties","title":"Exchange Properties","text":"Property Description Required Input The relational input. Required. Distribution Type One of the distribution types defined above. Required. Partition Count The number of partitions targeted for output. Optional. If not defined, implementation system should decide the number of partitions. Note that when not defined, single or multi bucket expressions should not be constrained to count. 
Expression Mapping Describes a relationship between each partition ID and the destination that partition should be sent to. Optional. A partition may be sent to 0..N locations. Value can either be a URI or arbitrary value."},{"location":"relations/physical_relations/#merging-capture","title":"Merging Capture","text":"

    A receiving operation that will merge multiple ordered streams to maintain orderedness.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Orderedness and distribution are maintained. Direct Output Order Order of the input."},{"location":"relations/physical_relations/#merging-capture-properties","title":"Merging Capture Properties","text":"Property Description Required Blocking Whether the merging should block incoming data. Blocking should be used carefully, based on whether a deadlock can be produced. Optional, defaults to false"},{"location":"relations/physical_relations/#simple-capture","title":"Simple Capture","text":"

    A receiving operation that will merge multiple streams in an arbitrary order.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Orderedness is empty after this operation. Distribution is maintained. Direct Output Order Order of the input."},{"location":"relations/physical_relations/#naive-capture-properties","title":"Naive Capture Properties","text":"Property Description Required Input The relational input. Required"},{"location":"relations/physical_relations/#top-n-operation","title":"Top-N Operation","text":"

    The top-N operator reorders a dataset based on one or more identified sort fields as well as a sorting function. Rather than sort the entire dataset, the top-N will only maintain the total number of records required to ensure a limited output. A top-N is a combination of the logical sort and logical fetch operations.
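The bounded-memory behavior described above can be sketched in a few lines. This is an illustrative sketch (names invented, not the spec's API): only `offset + count` records are ever retained, rather than sorting the full input.

```python
import heapq

def top_n(records, key, count, offset=0):
    """Keep only offset + count records in a bounded heap, then apply
    the offset before emitting -- the top-N = sort + fetch combination."""
    kept = heapq.nsmallest(offset + count, records, key=key)
    return kept[offset:offset + count]

rows = [{"qty": q} for q in [5, 1, 9, 3, 7]]
assert [r["qty"] for r in top_n(rows, key=lambda r: r["qty"], count=2)] == [1, 3]
# With an offset of 1, the smallest record is skipped.
assert [r["qty"] for r in top_n(rows, key=lambda r: r["qty"], count=2, offset=1)] == [3, 5]
```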

    Signature Value Inputs 1 Outputs 1 Property Maintenance Will update orderedness property to the output of the sort operation. Distribution property only remapped based on emit. Direct Output Order The field order of the input."},{"location":"relations/physical_relations/#top-n-properties","title":"Top-N Properties","text":"Property Description Required Input The relational input. Required Sort Fields List of one or more fields to sort by. Uses the same properties as the orderedness property. One sort field required Offset A positive integer. Declares the offset for retrieval of records. Optional, defaults to 0. Count A positive integer. Declares the number of records that should be returned. Required"},{"location":"relations/physical_relations/#hash-aggregate-operation","title":"Hash Aggregate Operation","text":"

    The hash aggregate operation maintains a hash table for each grouping set to coalesce equivalent tuples.
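A minimal sketch of that behavior, with invented field names: one hash table is built per grouping set, and tuples with equal grouping keys are coalesced into a single measure value.

```python
from collections import defaultdict

def hash_aggregate(rows, grouping_sets, measure):
    out = []
    for keys in grouping_sets:          # one hash table per grouping set
        table = defaultdict(float)
        for row in rows:
            table[tuple(row[k] for k in keys)] += measure(row)
        out.append(dict(table))
    return out

rows = [
    {"product_id": 1, "quantity": 2, "price": 3.0},
    {"product_id": 1, "quantity": 1, "price": 3.0},
    {"product_id": 2, "quantity": 4, "price": 1.5},
]
sales = hash_aggregate(rows, [("product_id",)], lambda r: r["quantity"] * r["price"])
assert sales == [{(1,): 9.0, (2,): 6.0}]
```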

    Signature Value Inputs 1 Outputs 1 Property Maintenance Maintains distribution if all distribution fields are contained in every grouping set. No orderedness guaranteed. Direct Output Order Same as defined by Aggregate operation."},{"location":"relations/physical_relations/#hash-aggregate-properties","title":"Hash Aggregate Properties","text":"Property Description Required Input The relational input. Required Grouping Sets One or more grouping sets. Optional, required if no measures. Per Grouping Set A list of expression groupings that the aggregate measures should be calculated for. Optional, defaults to 0. Measures A list of one or more aggregate expressions. Implementations may or may not support aggregate ordering expressions. Optional, required if no grouping sets."},{"location":"relations/physical_relations/#streaming-aggregate-operation","title":"Streaming Aggregate Operation","text":"

    The streaming aggregate operation leverages data ordered by the grouping expressions to calculate each grouping set tuple-by-tuple in a streaming fashion. All grouping sets and orderings requested on each aggregate must be compatible to allow multiple grouping sets or aggregate orderings.
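The contrast with the hash aggregate can be sketched as follows (illustrative names, not spec text): because the input is already sorted by the grouping expressions, each group can be finalized and emitted as soon as the key changes, with no hash table.

```python
from itertools import groupby

def streaming_aggregate(sorted_rows, key, measure):
    # groupby requires the input to already be sorted by key, which is
    # exactly the precondition the streaming aggregate relies on.
    for k, group in groupby(sorted_rows, key=key):
        yield k, sum(measure(r) for r in group)

rows = [  # already sorted by product_id, as the operator requires
    {"product_id": 1, "quantity": 2},
    {"product_id": 1, "quantity": 1},
    {"product_id": 2, "quantity": 4},
]
result = list(streaming_aggregate(rows, key=lambda r: r["product_id"],
                                  measure=lambda r: r["quantity"]))
assert result == [(1, 3), (2, 4)]
```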

    Signature Value Inputs 1 Outputs 1 Property Maintenance Maintains distribution if all distribution fields are contained in every grouping set. Maintains input ordering. Direct Output Order Same as defined by Aggregate operation."},{"location":"relations/physical_relations/#streaming-aggregate-properties","title":"Streaming Aggregate Properties","text":"Property Description Required Input The relational input. Required Grouping Sets One or more grouping sets. If multiple grouping sets are declared, sets must all be compatible with the input sortedness. Optional, required if no measures. Per Grouping Set A list of expression groupings that the aggregate measures should be calculated for. Optional, defaults to 0. Measures A list of one or more aggregate expressions. Aggregate expression ordering requirements must be compatible with the expected ordering. Optional, required if no grouping sets."},{"location":"relations/physical_relations/#consistent-partition-window-operation","title":"Consistent Partition Window Operation","text":"

    A consistent partition window operation is a special type of project operation where every function is a window function and all of the window functions share the same sorting and partitioning. This allows for the sort and partition to be calculated once and shared between the various function evaluations.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Maintains distribution and ordering. Direct Output Order Same as Project operator (input followed by each window expression)."},{"location":"relations/physical_relations/#window-properties","title":"Window Properties","text":"Property Description Required Input The relational input. Required Window Functions One or more window functions. At least one required."},{"location":"relations/physical_relations/#expand-operation","title":"Expand Operation","text":"

    The expand operation creates duplicates of input records based on the Expand Fields. Each Expand Field can be a Switching Field or an expression. Switching Fields are described below. If an Expand Field is an expression then its value is consistent across all duplicate rows.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Distribution is maintained if all the distribution fields are consistent fields with direct references. Ordering can only be maintained down to the level of consistent fields that are kept. Direct Output Order The expand fields followed by an i32 column describing the index of the duplicate that the row is derived from."},{"location":"relations/physical_relations/#expand-properties","title":"Expand Properties","text":"Property Description Required Input The relational input. Required Direct Fields Expressions describing the output fields. These refer to the schema of the input. Each Direct Field must be an expression or a Switching Field Required"},{"location":"relations/physical_relations/#switching-field-properties","title":"Switching Field Properties","text":"

    A switching field is a field whose value is different in each duplicated row. All switching fields in an Expand Operation must have the same number of duplicates.
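The expand behavior with switching fields can be sketched as below. This is an illustrative sketch with invented names: each input row is duplicated once per switching-field expression, consistent fields keep the same value in every duplicate, and an extra i32 column records which duplicate the row came from.

```python
def expand(rows, consistent_fields, switching_exprs):
    """switching_exprs: one expression (callable) per duplicate."""
    out = []
    for row in rows:
        for dup_index, expr in enumerate(switching_exprs):
            rec = {f: row[f] for f in consistent_fields}  # same in every duplicate
            rec["switched"] = expr(row)   # differs per duplicate
            rec["dup"] = dup_index        # index of the duplicate this row derives from
            out.append(rec)
    return out

rows = [{"a": 1, "b": 2}]
result = expand(rows, ["a"], [lambda r: r["b"], lambda r: None])
assert result == [
    {"a": 1, "switched": 2, "dup": 0},
    {"a": 1, "switched": None, "dup": 1},
]
```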

    Property Description Required Duplicates List of one or more expressions. The output will contain a row for each expression. Required"},{"location":"relations/physical_relations/#hashing-window-operation","title":"Hashing Window Operation","text":"

    A window aggregate operation that will build hash tables for each distinct partition expression.
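A rough sketch of this, using a row-number window function and invented names: rows are bucketed into a hash table keyed by the partition expression, and the window function is then evaluated per partition. Note that output order is not preserved, matching the "eliminates ordering" property.

```python
from collections import defaultdict

def hashing_window_row_number(rows, partition_key):
    buckets = defaultdict(list)
    for row in rows:                       # hash table per distinct partition value
        buckets[partition_key(row)].append(row)
    out = []
    for bucket in buckets.values():        # evaluate the window fn per partition
        for i, row in enumerate(bucket, start=1):
            out.append({**row, "row_number": i})
    return out

rows = [{"p": "a"}, {"p": "b"}, {"p": "a"}]
result = hashing_window_row_number(rows, lambda r: r["p"])
assert sorted((r["p"], r["row_number"]) for r in result) == [
    ("a", 1), ("a", 2), ("b", 1)]
```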

    Signature Value Inputs 1 Outputs 1 Property Maintenance Maintains distribution. Eliminates ordering. Direct Output Order Same as Project operator (input followed by each window expression)."},{"location":"relations/physical_relations/#hashing-window-properties","title":"Hashing Window Properties","text":"Property Description Required Input The relational input. Required Window Expressions One or more window expressions. At least one required."},{"location":"relations/physical_relations/#streaming-window-operation","title":"Streaming Window Operation","text":"

    A window aggregate operation that relies on a partition/ordering sorted input.

    Signature Value Inputs 1 Outputs 1 Property Maintenance Maintains distribution. Eliminates ordering. Direct Output Order Same as Project operator (input followed by each window expression)."},{"location":"relations/physical_relations/#streaming-window-properties","title":"Streaming Window Properties","text":"Property Description Required Input The relational input. Required Window Expressions One or more window expressions. Must be supported by the sortedness of the input. At least one required."},{"location":"relations/user_defined_relations/","title":"User Defined Relations","text":"

    Pending

    "},{"location":"serialization/binary_serialization/","title":"Binary Serialization","text":"

    Substrait can be serialized into a protobuf-based binary representation. The proto schema/IDL files can be found on GitHub. Proto files are placed in the io.substrait namespace for C++/Java and the Substrait.Protobuf namespace for C#.

    "},{"location":"serialization/binary_serialization/#plan","title":"Plan","text":"

    The main top-level object used to communicate a Substrait plan using protobuf is a Plan message (see the ExtendedExpression for an alternative other top-level object). The plan message is composed of a set of data structures that minimize repetition in the serialization along with one (or more) Relation trees.

    Plan Message
    message Plan {\n  // Substrait version of the plan. Optional up to 0.17.0, required for later\n  // versions.\n  Version version = 6;\n\n  // a list of yaml specifications this plan may depend on\n  repeated substrait.extensions.SimpleExtensionURI extension_uris = 1;\n\n  // a list of extensions this plan may depend on\n  repeated substrait.extensions.SimpleExtensionDeclaration extensions = 2;\n\n  // one or more relation trees that are associated with this plan.\n  repeated PlanRel relations = 3;\n\n  // additional extensions associated with this plan.\n  substrait.extensions.AdvancedExtension advanced_extensions = 4;\n\n  // A list of com.google.Any entities that this plan may use. Can be used to\n  // warn if some embedded message types are unknown. Note that this list may\n  // include message types that are ignorable (optimizations) or that are\n  // unused. In many cases, a consumer may be able to work with a plan even if\n  // one or more message types defined here are unknown.\n  repeated string expected_type_urls = 5;\n\n}\n
    "},{"location":"serialization/binary_serialization/#extensions","title":"Extensions","text":"

    Protobuf supports both simple and advanced extensions. Simple extensions are declared at the plan level and advanced extensions are declared at multiple levels of messages within the plan.

    "},{"location":"serialization/binary_serialization/#simple-extensions","title":"Simple Extensions","text":"

    For simple extensions, a plan references the URIs associated with the simple extensions to provide additional plan capabilities. These URIs will list additional relevant information for the plan.

    Simple extensions within a plan are split into three components: an extension URI, an extension declaration and a number of references.

    • Extension URI: A unique identifier for the extension pointing to a YAML document specifying one or more specific extensions. Declares an anchor that can be used in extension declarations.
    • Extension Declaration: A specific extension within a single YAML document. The declaration combines a reference to the associated Extension URI along with a unique key identifying the specific item within that YAML document (see Function Signature Compound Names). It also defines a declaration anchor. The anchor is a plan-specific unique value that the producer creates as a key to be referenced elsewhere.
    • Extension Reference: A specific instance or use of an extension declaration within the plan body.

    Extension URIs and declarations are encapsulated in the top level of the plan. Extension declarations are then referenced throughout the body of the plan itself. The exact structure of these references will depend on the extension point being used, but they will always include the extension\u2019s anchor (or key). For example, all scalar function expressions contain references to an extension declaration which defines the semantics of the function.

    Simple Extension URI
    message SimpleExtensionURI {\n  // A surrogate key used in the context of a single plan used to reference the\n  // URI associated with an extension.\n  uint32 extension_uri_anchor = 1;\n\n  // The URI where this extension YAML can be retrieved. This is the \"namespace\"\n  // of this extension.\n  string uri = 2;\n\n}\n

    Once the YAML file URI anchor is defined, the anchor will be referenced by zero or more SimpleExtensionDeclarations. For each simple extension declaration, an anchor is defined for that specific extension entity. This anchor is then referenced within lower-level primitives (functions, etc.) to identify that specific extension. Message properties are named *_anchor where the anchor is defined and *_reference when referencing the anchor, for example function_anchor and function_reference.
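The anchor/reference pattern above can be illustrated with plain dicts in protobuf-JSON style. This is a hedged sketch: the anchor values 1 and 10 are arbitrary plan-local surrogate keys invented by a producer, and the URI/function name are examples rather than required values.

```python
plan = {
    "extensionUris": [
        {"extensionUriAnchor": 1,   # plan-local key for this YAML document
         "uri": "https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic.yaml"}
    ],
    "extensions": [
        {"extensionFunction": {
            "extensionUriReference": 1,   # points at the URI anchor above
            "functionAnchor": 10,         # plan-local key for this function
            "name": "add:i32_i32"}}       # compound name within the YAML
    ],
}

# A scalar function expression elsewhere in the plan body would then
# refer to the declaration via its anchor:
call = {"scalarFunction": {"functionReference": 10}}

decl = plan["extensions"][0]["extensionFunction"]
assert decl["extensionUriReference"] == plan["extensionUris"][0]["extensionUriAnchor"]
assert call["scalarFunction"]["functionReference"] == decl["functionAnchor"]
```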

    Simple Extension Declaration
    message SimpleExtensionDeclaration {\n  oneof mapping_type {\n    ExtensionType extension_type = 1;\n    ExtensionTypeVariation extension_type_variation = 2;\n    ExtensionFunction extension_function = 3;\n  }\n\n  // Describes a Type\n  message ExtensionType {\n    // references the extension_uri_anchor defined for a specific extension URI.\n    uint32 extension_uri_reference = 1;\n\n    // A surrogate key used in the context of a single plan to reference a\n    // specific extension type\n    uint32 type_anchor = 2;\n\n    // the name of the type in the defined extension YAML.\n    string name = 3;\n  }\n\n  message ExtensionTypeVariation {\n    // references the extension_uri_anchor defined for a specific extension URI.\n    uint32 extension_uri_reference = 1;\n\n    // A surrogate key used in the context of a single plan to reference a\n    // specific type variation\n    uint32 type_variation_anchor = 2;\n\n    // the name of the type in the defined extension YAML.\n    string name = 3;\n  }\n\n  message ExtensionFunction {\n    // references the extension_uri_anchor defined for a specific extension URI.\n    uint32 extension_uri_reference = 1;\n\n    // A surrogate key used in the context of a single plan to reference a\n    // specific function\n    uint32 function_anchor = 2;\n\n    // A function signature compound name\n    string name = 3;\n  }\n\n}\n

    Note

    Anchors only have meaning within a single plan and exist simply to reduce plan size. They are not some form of global identifier. Different plans may use different anchors for the same specific functions, types, type variations, etc.

    Note

    It is valid for a plan to include SimpleExtensionURIs and/or SimpleExtensionDeclarations that are not referenced directly.

    "},{"location":"serialization/binary_serialization/#advanced-extensions","title":"Advanced Extensions","text":"

    The Substrait protobuf representation exposes a special object in multiple places to provide extension capabilities. Advanced extensions are done via this object and are separated into two main concepts:

    Advanced Extension Type Description Optimization A change to the plan that may help some consumers work more efficiently with the plan. These properties should be propagated through plan pipelines where possible but do not impact the meaning of the plan. A consumer can safely ignore these properties. Enhancement A change to the plan that functionally changes the behavior of the plan. Use these sparingly as they will impact plan interoperability. Advanced Extension Protobuf
    message AdvancedExtension {\n  // An optimization is helpful information that doesn't influence semantics.\n  // May be ignored by a consumer.\n  google.protobuf.Any optimization = 1;\n\n  // An enhancement alters semantics. Cannot be ignored by a consumer.\n  google.protobuf.Any enhancement = 2;\n\n}\n
    "},{"location":"serialization/binary_serialization/#capabilities","title":"Capabilities","text":"

    When two systems exchanging Substrait plans want to understand each other\u2019s capabilities, they may exchange a Capabilities message. The capabilities message provides information on the set of simple and advanced extensions that the system supports.

    Capabilities Message
    message Capabilities {\n  // List of Substrait versions this system supports\n  repeated string substrait_versions = 1;\n\n  // list of com.google.Any message types this system supports for advanced\n  // extensions.\n  repeated string advanced_extension_type_urls = 2;\n\n  // list of simple extensions this system supports.\n  repeated SimpleExtension simple_extensions = 3;\n\n  message SimpleExtension {\n    string uri = 1;\n    repeated string function_keys = 2;\n    repeated string type_keys = 3;\n    repeated string type_variation_keys = 4;\n  }\n\n}\n
    "},{"location":"serialization/binary_serialization/#protobuf-rationale","title":"Protobuf Rationale","text":"

    The binary format of Substrait is designed to be easy to work with in many languages. A key requirement is that someone can take the binary format IDL and use standard tools to build a set of primitives that are easy to work with in any of a number of languages. This allows communities to build and use Substrait using only a binary IDL and the specification (and allows the Substrait project to avoid being required to build libraries for each language to work with the specification).

    There are several binary IDLs that exist today. The key requirements for Substrait are the following:

    • Strongly typed IDL schema language
    • High-quality, well-supported, and idiomatic bindings/compilers for key languages (Python, JavaScript, C++, Go, Rust, Java)
    • Compact serial representation

    The primary formats that exist that roughly qualify under these requirements include: Protobuf, Thrift, Flatbuf, Avro, Cap\u2019N\u2019Proto. Protobuf was chosen due to its clean typing system and large number of high quality language bindings.

    The binary serialization IDLs can be found on GitHub and are sampled throughout the documentation.

    "},{"location":"serialization/text_serialization/","title":"Text Serialization","text":"

    To maximize the new user experience, it is important for Substrait to have a text representation of plans. This allows people to experiment with basic tooling. Building simple CLI tools that do things like SQL > Plan and Plan > SQL or REPL plan construction can all be done relatively straightforwardly with a text representation.

    The recommended text serialization format is JSON. Since the text format is not designed for performance, the format can be produced to maximize readability. This also allows nice symmetry between the construction of plans and the configuration of various extensions such as function signatures and user defined types.

    To ensure the JSON is valid, the object will be defined using the OpenAPI 3.1 specification. This not only allows strong validation; the OpenAPI specification also enables code generators to be easily used to produce plans in many languages.

    While JSON will be used for much of the plan serialization, Substrait uses a simple custom grammar for record-level expressions. While one can construct an expression such as (10 + 5)/2 using a tree of function and literal objects, a plan is much easier for a human to read when the information is written similarly to the way one typically writes scalar expressions. This grammar will be maintained as an ANTLR grammar (targetable to multiple programming languages) and is also planned to be supported via a JSON schema definition format tag so that the grammar can be validated as part of the schema validation.
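The trade-off between the two representations can be illustrated with plain Python objects. This is an invented sketch, not the Substrait grammar or object model: it contrasts the expression (10 + 5)/2 as a tree of function/literal nodes against its compact text form.

```python
# (10 + 5) / 2 as a tree of function and literal objects:
tree = {"fn": "divide",
        "args": [{"fn": "add",
                  "args": [{"literal": 10}, {"literal": 5}]},
                 {"literal": 2}]}

def evaluate(node):
    """Tiny recursive evaluator for the illustrative node shape above."""
    if "literal" in node:
        return node["literal"]
    ops = {"add": lambda a, b: a + b, "divide": lambda a, b: a / b}
    return ops[node["fn"]](*(evaluate(a) for a in node["args"]))

# The tree and the human-readable text form denote the same value.
assert evaluate(tree) == (10 + 5) / 2 == 7.5
```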

    "},{"location":"spec/extending/","title":"Extending","text":"

    Substrait is a community project and requires consensus about new additions to the specification in order to maintain consistency. The best way to get consensus is to discuss ideas. The main ways to communicate are:

    • Substrait Mailing List
    • Substrait Slack
    • Community Meeting
    "},{"location":"spec/extending/#minor-changes","title":"Minor changes","text":"

    Simple changes like typos and bug fixes do not require as much effort. File an issue or send a PR and we can discuss it there.

    "},{"location":"spec/extending/#complex-changes","title":"Complex changes","text":"

    For complex features it is best to discuss the change first. Gathering some background information ahead of time will help get everyone on the same page.

    "},{"location":"spec/extending/#outline-the-issue","title":"Outline the issue","text":""},{"location":"spec/extending/#language","title":"Language","text":"

    Every engine has its own terminology. Every Spark user probably knows what an \u201cattribute\u201d is. Velox users will know what a \u201cRowVector\u201d means. Etc. However, Substrait is used by people that come from a variety of backgrounds and you should generally assume that its users do not know anything about your own implementation. As a result, all PRs and discussion should endeavor to use Substrait terminology wherever possible.

    "},{"location":"spec/extending/#motivation","title":"Motivation","text":"

    What problems does this relation solve? If it is a more logical relation then how does it allow users to express new capabilities? If it is more of an internal relation then how does it map to existing logical relations? How is it different than other existing relations? Why do we need this?

    "},{"location":"spec/extending/#examples","title":"Examples","text":"

    Provide example input and output for the relation. Show example plans. Try to motivate your examples, as best as possible, with something that looks like a real-world problem. These will go a long way towards helping others understand the purpose of a relation.

    "},{"location":"spec/extending/#alternatives","title":"Alternatives","text":"

    Discuss what alternatives are out there. Are there other ways to achieve similar results? Do some systems handle this problem differently?

    "},{"location":"spec/extending/#survey-existing-implementation","title":"Survey existing implementation","text":"

    It\u2019s unlikely that this is the first time that this has been done. Figuring out how existing systems have approached the problem will help ground the discussion.

    "},{"location":"spec/extending/#prototype-the-feature","title":"Prototype the feature","text":"

    Novel approaches should be implemented as an extension first.

    "},{"location":"spec/extending/#substrait-design-principles","title":"Substrait design principles","text":"

    Substrait is designed around interoperability, so a feature only used by a single system may not be accepted. But don\u2019t despair! Substrait has a highly developed extension system for this express purpose.

    "},{"location":"spec/extending/#you-dont-have-to-do-it-alone","title":"You don\u2019t have to do it alone","text":"

    If you are hoping to add a feature and these criteria seem intimidating then feel free to start a mailing list discussion before you have all the information and ask for help. Investigating other implementations, in particular, is something that can be quite difficult to do on your own.

    "},{"location":"spec/specification/","title":"Specification","text":""},{"location":"spec/specification/#status","title":"Status","text":"

    The specification has passed the initial design phase and is now in the final stages of being fleshed out. The community is encouraged to identify (and address) any perceived gaps in functionality using GitHub issues and PRs. Once all of the planned implementations have been completed all deprecated fields will be eliminated and version 1.0 will be released.

    "},{"location":"spec/specification/#components-complete","title":"Components (Complete)","text":"Section Description Simple Types A way to describe the set of basic types that will be operated on within a plan. Only includes simple types such as integers and doubles (nothing configurable or compound). Compound Types Expression of types that go beyond simple scalar values. Key concepts here include: configurable types such as fixed length and numeric types as well as compound types such as structs, maps, lists, etc. Type Variations Physical variations to base types. User Defined Types Extensions that can be defined for specific IR producers/consumers. Field References Expressions to identify which portions of a record should be operated on. Scalar Functions Description of how functions are specified. Concepts include arguments, variadic functions, output type derivation, etc. Scalar Function List A list of well-known canonical functions in YAML format. Specialized Record Expressions Specialized expression types that are more naturally expressed outside the function paradigm. Examples include items such as if/then/else and switch statements. Aggregate Functions Functions that are expressed in aggregation operations. Examples include things such as SUM, COUNT, etc. Operations take many records and collapse them into a single (possibly compound) value. Window Functions Functions that relate a record to a set of encompassing records. Examples in SQL include RANK, NTILE, etc. User Defined Functions Reusable named functions that are built beyond the core specification. Implementations are typically registered through external means (drop a file in a directory, send a special command with implementation, etc.) Embedded Functions Function implementations embedded directly within the plan. Frequently used in data science workflows where business logic is interspersed with standard operations. 
Relation Basics Basic concepts around relational algebra, record emit and properties. Logical Relations Common relational operations used in compute plans including project, join, aggregation, etc. Text Serialization A human producible & consumable representation of the plan specification. Binary Serialization A high performance & compact binary representation of the plan specification."},{"location":"spec/specification/#components-designed-but-not-implemented","title":"Components (Designed but not Implemented)","text":"Section Description Table Functions Functions that convert one or more values from an input record into 0..N output records. Examples include operations such as explode, pos-explode, etc. User Defined Relations Installed and reusable relational operations customized to a particular platform. Embedded Relations Relational operations where plans contain the \u201cmachine code\u201d to directly execute the necessary operations. Physical Relations Specific execution sub-variations of common relational operations that have multiple unique physical variants associated with a single logical operation. Examples include hash join, merge join, nested loop join, etc."},{"location":"spec/technology_principles/","title":"Technology Principles","text":"
    • Provide a good suite of well-specified common functionality in databases and data science applications.
    • Make it easy for users to privately or publicly extend the representation to support specialized/custom operations.
    • Produce something that is language agnostic and requires minimal work to start developing against in a new language.
    • Drive towards a common format that avoids specialization for single favorite producer or consumer.
    • Establish clear delineation between specifications that MUST be respected and those that can be optionally ignored.
    • Establish a forgiving compatibility approach and versioning scheme that supports cross-version compatibility in the maximum number of cases.
    • Minimize the need for consumer intelligence by excluding concepts like overloading, type coercion, implicit casting, field name handling, etc. (Note: this is weak and should be better stated.)
    • Decomposability/severability: A particular producer or consumer should be able to produce or consume only a subset of the specification and interact well with any other Substrait system, as long as the specific operations requested fit within the subset of the specification supported by the counterpart system.
    "},{"location":"spec/versioning/","title":"Versioning","text":"

    As an interface specification, the goal of Substrait is to reach a point where (breaking) changes will never need to happen again, or at least be few and far between. By analogy, Apache Arrow\u2019s in-memory format specification has stayed functionally constant, despite many major library versions being released. However, we\u2019re not there yet. When we believe that we\u2019ve reached this point, we will signal this by releasing version 1.0.0. Until then, we will remain in the 0.x.x version regime.

    Despite this, we strive to maintain backward compatibility for both the binary representation and the text representation by means of deprecation. When a breaking change cannot be reasonably avoided, we may remove previously deprecated fields. All deprecated fields will be removed for the 1.0.0 release.

    Substrait uses semantic versioning for its version numbers, with the addition that, during 0.x.y, we increment the x digit for breaking changes and new features, and the y digit for fixes and other nonfunctional changes. The release process is currently automated and makes a new release every week, provided something has changed on the main branch since the previous release. This release cadence will likely be slowed down as stability increases over time. Conventional commits are used to distinguish between breaking changes, new features, and fixes, and GitHub actions are used to verify that there are indeed no breaking protobuf changes in a commit, unless the commit message states this.

    "},{"location":"tools/producer_tools/","title":"Producer Tools","text":""},{"location":"tools/producer_tools/#isthmus","title":"Isthmus","text":"

    Isthmus is an application that serializes SQL to Substrait Protobuf via the Calcite SQL compiler.

    "},{"location":"tools/substrait_validator/","title":"Substrait Validator","text":"

    The Substrait Validator is a tool used to validate Substrait plans and print diagnostic information regarding plan validity.

    "},{"location":"tools/third_party_tools/","title":"Third Party Tools","text":""},{"location":"tools/third_party_tools/#substrait-tools","title":"Substrait-tools","text":"

    The substrait-tools Python package provides a command-line interface for producing/consuming Substrait plans by leveraging the APIs from different producers and consumers.

    "},{"location":"tools/third_party_tools/#substrait-fiddle","title":"Substrait Fiddle","text":"

    Substrait Fiddle is an online tool to share, debug, and prototype Substrait plans.

    The Substrait Fiddle Source is available allowing it to be run in any environment.

    "},{"location":"tutorial/sql_to_substrait/","title":"SQL to Substrait tutorial","text":"

    This is an introductory tutorial to learn the basics of Substrait for readers already familiar with SQL. We will look at how to construct a Substrait plan from an example query.

    We\u2019ll present the Substrait plan in JSON form to make it relatively readable to newcomers. Typically Substrait is exchanged as a protobuf message, but for debugging purposes it is often helpful to look at a serialized form. Plus, it\u2019s not uncommon for unit tests to represent plans as JSON strings. So if you are developing with Substrait, it\u2019s useful to have experience reading them.

    Note

    Substrait is currently only defined with Protobuf. The JSON provided here is the Protobuf JSON output, but it is not the official Substrait text format. Eventually, Substrait will define its own human-readable text format, but for now this tutorial will make do with what Protobuf provides.

    Substrait is designed to communicate plans (mostly logical plans). Those plans contain types, schemas, expressions, extensions, and relations. We\u2019ll look at them in that order, going from simplest to most complex until we can construct full plans.

    This tutorial won\u2019t cover all the details of each piece, but it will give you an idea of how they connect together. For a detailed reference of each individual field, the best place to look is reading the protobuf definitions. They represent the source-of-truth of the spec and are well-commented to address ambiguities.

    "},{"location":"tutorial/sql_to_substrait/#problem-set-up","title":"Problem Set up","text":"

    To learn Substrait, we\u2019ll build up to a specific query. We\u2019ll be using the tables:

    CREATE TABLE orders (\n  product_id: i64 NOT NULL,\n  quantity: i32 NOT NULL,\n  order_date: date NOT NULL,\n  price: decimal(10, 2)\n);\n
    CREATE TABLE products (\n  product_id: i64 NOT NULL,\n  categories: list<string NOT NULL> NOT NULL,\n  details: struct<manufacturer: string, year_created: int32>,\n  product_name: string\n);\n

    This orders table represents events where products were sold, recording how many (quantity) and at what price (price). The products table provides details for each product, with product_id as the primary key.

    And we\u2019ll try to create the query:

    SELECT\n  product_name,\n  product_id,\n  sum(quantity * price) as sales\nFROM\n  orders\nINNER JOIN\n  products\nON\n  orders.product_id = products.product_id\nWHERE\n  -- categories does not contain \"Computers\"\n  INDEX_IN(\"Computers\", categories) IS NULL\nGROUP BY\n  product_name,\n  product_id\n

    The query asks the question: For products that aren\u2019t in the \"Computers\" category, how much has each product generated in sales?

    However, Substrait doesn\u2019t correspond to SQL as much as it does to logical plans. So to be less ambiguous, the plan we are aiming for looks like:

    |-+ Aggregate({sales = sum(quantity * price)}, group_by=(product_name, product_id))\n  |-+ InnerJoin(on=orders.product_id = products.product_id)\n    |- ReadTable(orders)\n    |-+ Filter(INDEX_IN(\"Computers\", categories) IS NULL)\n      |- ReadTable(products)\n
    "},{"location":"tutorial/sql_to_substrait/#types-and-schemas","title":"Types and Schemas","text":"

    As part of the Substrait plan, we\u2019ll need to embed the data types of the input tables. In Substrait, each type is a distinct message, which at a minimum contains a field for nullability. For example, a string field looks like:

    {\n  \"string\": {\n    \"nullability\": \"NULLABILITY_NULLABLE\"\n  }\n}\n

    Nullability is an enum not a boolean, since Substrait allows NULLABILITY_UNSPECIFIED as an option, in addition to NULLABILITY_NULLABLE (nullable) and NULLABILITY_REQUIRED (not nullable).

    Other types such as VarChar and Decimal have other parameters. For example, our orders.price column will be represented as:

    {\n  \"decimal\": {\n    \"precision\": 10,\n    \"scale\": 2,\n    \"nullability\": \"NULLABILITY_NULLABLE\"\n  }\n}\n

    Finally, there are nested compound types such as structs and list types that have other types as parameters. For example, the products.categories column is a list of strings, so it can be represented as:

    {\n  \"list\": {\n    \"type\": {\n      \"string\": {\n        \"nullability\": \"NULLABILITY_REQUIRED\"\n      }\n    },\n    \"nullability\": \"NULLABILITY_REQUIRED\"\n  }\n}\n

    To know what parameters each type can take, refer to the Protobuf definitions in type.proto.

    Schemas of tables can be represented with a NamedStruct message, which is the combination of a struct type containing all the columns and a list of column names. For the orders table, this will look like:

    {\n  \"names\": [\n    \"product_id\",\n    \"quantity\",\n    \"order_date\",\n    \"price\"\n  ],\n  \"struct\": {\n    \"types\": [\n      {\n        \"i64\": {\n          \"nullability\": \"NULLABILITY_REQUIRED\"\n        }\n      },\n      {\n        \"i32\": {\n          \"nullability\": \"NULLABILITY_REQUIRED\"\n        }\n      },\n      {\n        \"date\": {\n          \"nullability\": \"NULLABILITY_REQUIRED\"\n        }\n      },\n      {\n        \"decimal\": {\n          \"precision\": 10,\n          \"scale\": 2,\n          \"nullability\": \"NULLABILITY_NULLABLE\"\n        }\n      }\n    ],\n    \"nullability\": \"NULLABILITY_REQUIRED\"\n  }\n}\n

    Here, names is the names of all fields. In nested schemas, this includes the names of subfields in depth-first order. So for the products table, the details struct field will be included as well as the two subfields (manufacturer and year_created) right after. And because it\u2019s depth first, these subfields appear before product_name. The full schema looks like:

    {\n  \"names\": [\n    \"product_id\",\n    \"categories\",\n    \"details\",\n    \"manufacturer\",\n    \"year_created\",\n    \"product_name\"\n  ],\n  \"struct\": {\n    \"types\": [\n      {\n        \"i64\": {\n          \"nullability\": \"NULLABILITY_REQUIRED\"\n        }\n      },\n      {\n        \"list\": {\n          \"type\": {\n            \"string\": {\n              \"nullability\": \"NULLABILITY_REQUIRED\"\n            }\n          },\n          \"nullability\": \"NULLABILITY_REQUIRED\"\n        }\n      },\n      {\n        \"struct\": {\n          \"types\": [\n            {\n              \"string\": {\n                \"nullability\": \"NULLABILITY_NULLABLE\"\n              },\n              \"i32\": {\n                \"nullability\": \"NULLABILITY_NULLABLE\"\n              }\n            }\n          ],\n          \"nullability\": \"NULLABILITY_NULLABLE\"\n        }\n      },\n      {\n        \"string\": {\n          \"nullability\": \"NULLABILITY_NULLABLE\"\n        }\n      }\n    ],\n    \"nullability\": \"NULLABILITY_REQUIRED\"\n  }\n}\n
    "},{"location":"tutorial/sql_to_substrait/#expressions","title":"Expressions","text":"

    The next basic building block we will need is expressions. Expressions can be one of several things, including:

    • Field references
    • Literal values
    • Functions
    • Subqueries
    • Window Functions

    Since some expressions such as functions can contain other expressions, expressions can be represented as a tree. Literal values and field references typically are the leaf nodes.

    For the expression INDEX_IN(\"Computers\", categories) IS NULL, we have a field reference categories, a literal string \"Computers\", and two functions: INDEX_IN and IS NULL.

    The field reference for categories is represented by:

    {\n  \"selection\": {\n    \"directReference\": {\n      \"structField\": {\n        \"field\": 1\n      }\n    },\n    \"rootReference\": {}\n  }\n}\n

    Whereas SQL references fields by name, Substrait always references fields numerically. This means that a Substrait expression only makes sense relative to a certain schema. As we\u2019ll see later when we discuss relations, for a filter relation this will be relative to the input schema, so the 1 here is referring to the second field of products.
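
    To make the positional nature concrete, here is an illustrative lookup (resolve_field is an invented helper) that resolves a struct-field index against the top-level columns of products:

    ```python
    # A struct-field index is positional, so it only has meaning against a
    # concrete input schema (here, the top-level columns of products).
    products_columns = ["product_id", "categories", "details", "product_name"]

    def resolve_field(selection, columns):
        # Protobuf JSON may omit a zero-valued "field", hence the default of 0.
        index = selection["selection"]["directReference"]["structField"].get("field", 0)
        return columns[index]

    ref = {"selection": {"directReference": {"structField": {"field": 1}},
                         "rootReference": {}}}
    ```

    Resolving ref against products_columns gives back categories, the second field.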

    Note

    Protobuf may not serialize fields with integer type and value 0, since 0 is the default. So if you instead see \"structField\": {}, know that it is equivalent to \"structField\": { \"field\": 0 }.

    \"Computers\" will be translated to a literal expression:

    {\n  \"literal\": {\n    \"string\": \"Computers\"\n  }\n}\n

    Both IS NULL and INDEX_IN will be scalar function expressions. Available functions in Substrait are defined in extension YAML files contained in https://github.com/substrait-io/substrait/tree/main/extensions. Additional extensions may be created elsewhere. IS NULL is defined as the is_null function in functions_comparison.yaml and INDEX_IN is defined as the index_in function in functions_set.yaml.

    First, the expression for INDEX_IN(\"Computers\", categories) is:

    {\n  \"scalarFunction\": {\n    \"functionReference\": 1,\n    \"outputType\": {\n      \"i64\": {\n        \"nullability\": \"NULLABILITY_NULLABLE\"\n      }\n    },\n    \"arguments\": [\n      {\n        \"value\": {\n          \"literal\": {\n            \"string\": \"Computers\"\n          }\n        }\n      },\n      {\n        \"value\": {\n          \"selection\": {\n            \"directReference\": {\n              \"structField\": {\n                \"field\": 1\n              }\n            },\n            \"rootReference\": {}\n          }\n        }\n      }\n    ]\n  }\n}\n

    functionReference will be explained later in the plans section. For now, understand that it\u2019s an ID that corresponds to an entry in a list of function definitions that we will create later.

    outputType defines the type the function outputs. We know this is a nullable i64 type since that is what the function definition declares in the YAML file.

    arguments defines the arguments being passed into the function, which are all done positionally based on the function definition in the YAML file. The two arguments will be familiar as the literal and the field reference we constructed earlier.

    To create the final expression, we just need to wrap this in another scalar function expression for IS NULL.

    {\n  \"scalarFunction\": {\n    \"functionReference\": 2,\n    \"outputType\": {\n      \"bool\": {\n        \"nullability\": \"NULLABILITY_REQUIRED\"\n      }\n    },\n    \"arguments\": [\n      {\n        \"value\": {\n          \"scalarFunction\": {\n            \"functionReference\": 1,\n            \"outputType\": {\n              \"i64\": {\n                \"nullability\": \"NULLABILITY_NULLABLE\"\n              }\n            },\n            \"arguments\": [\n              {\n                \"value\": {\n                  \"literal\": {\n                    \"string\": \"Computers\"\n                  }\n                }\n              },\n              {\n                \"value\": {\n                  \"selection\": {\n                    \"directReference\": {\n                      \"structField\": {\n                        \"field\": 1\n                      }\n                    },\n                    \"rootReference\": {}\n                  }\n                }\n              }\n            ]\n          }\n        }\n      }\n    ]\n  }\n}\n

    To see what other types of expressions are available and what fields they take, see the Expression proto definition in algebra.proto.

    "},{"location":"tutorial/sql_to_substrait/#relations","title":"Relations","text":"

    In most SQL engines, a logical or physical plan is represented as a tree of nodes, such as filter, project, scan, or join. The left diagram below may be a familiar representation of our plan, where nodes feed data into each other moving from left to right. In Substrait, each of these nodes is a Relation.

    A relation that takes another relation as input will contain (or refer to) that relation. This is usually a field called input, but sometimes different names are used in relations that take multiple inputs. For example, join relations take two inputs, with field names left and right. In JSON, the rough layout for the relations in our plan will look like:

    {\n    \"aggregate\": {\n        \"input\": {\n            \"join\": {\n                \"left\": {\n                    \"filter\": {\n                        \"input\": {\n                            \"read\": {\n                                ...\n                            }\n                        },\n                        ...\n                    }\n                },\n                \"right\": {\n                    \"read\": {\n                        ...\n                    }\n                },\n                ...\n            }\n        },\n        ...\n    }\n}\n

    For our plan, we need to define the read relations for each table, a filter relation to exclude the \"Computer\" category from the products table, a join relation to perform the inner join, and finally an aggregate relation to compute the total sales.

    The read relations are composed of a baseSchema and a namedTable field. Since this read is of a named table, the namedTable field is present, with names containing the list of name segments (such as my_database.my_table). Other types of reads include virtual tables (a table of literal values embedded in the plan) and lists of files. See Read Definition Types for more details. The baseSchema is the schema we defined earlier, and namedTable holds the name of the table. So for reading the orders table, the relation looks like:

    {\n  \"read\": {\n    \"namedTable\": {\n      \"names\": [\n        \"orders\"\n      ]\n    },\n    \"baseSchema\": {\n      \"names\": [\n        \"product_id\",\n        \"quantity\",\n        \"order_date\",\n        \"price\"\n      ],\n      \"struct\": {\n        \"types\": [\n          {\n            \"i64\": {\n              \"nullability\": \"NULLABILITY_REQUIRED\"\n            }\n          },\n          {\n            \"i32\": {\n              \"nullability\": \"NULLABILITY_REQUIRED\"\n            }\n          },\n          {\n            \"date\": {\n              \"nullability\": \"NULLABILITY_REQUIRED\"\n            }\n          },\n          {\n            \"decimal\": {\n              \"scale\": 10,\n              \"precision\": 2,\n              \"nullability\": \"NULLABILITY_NULLABLE\"\n            }\n          }\n        ],\n        \"nullability\": \"NULLABILITY_REQUIRED\"\n      }\n    }\n  }\n}\n

    Read relations are leaf nodes. Leaf nodes don\u2019t depend on any other node for data and usually represent a source of data in our plan. Leaf nodes are then typically used as input for other nodes that manipulate the data. For example, our filter node will take the products read relation as an input.

    The filter node will also take a condition field, which will just be the expression we constructed earlier.

    {\n  \"filter\": {\n    \"input\": {\n      \"read\": { ... }\n    },\n    \"condition\": {\n      \"scalarFunction\": {\n        \"functionReference\": 2,\n        \"outputType\": {\n          \"bool\": {\n            \"nullability\": \"NULLABILITY_REQUIRED\"\n          }\n        },\n        \"arguments\": [\n          {\n            \"value\": {\n              \"scalarFunction\": {\n                \"functionReference\": 1,\n                \"outputType\": {\n                  \"i64\": {\n                    \"nullability\": \"NULLABILITY_NULLABLE\"\n                  }\n                },\n                \"arguments\": [\n                  {\n                    \"value\": {\n                      \"literal\": {\n                        \"string\": \"Computers\"\n                      }\n                    }\n                  },\n                  {\n                    \"value\": {\n                      \"selection\": {\n                        \"directReference\": {\n                          \"structField\": {\n                            \"field\": 1\n                          }\n                        },\n                        \"rootReference\": {}\n                      }\n                    }\n                  }\n                ]\n              }\n            }\n          }\n        ]\n      }\n    }\n  }\n}\n

    The join relation will take two inputs. In the left field will be the read relation for orders and in the right field will be the filter relation (from products). The type field is an enum that allows us to specify we want an inner join. Finally, the expression field contains the expression to use in the join. Since we haven\u2019t used the equal() function yet, we use the reference number 3 here. (Again, we\u2019ll see at the end with plans how these functions are resolved.) The arguments refer to fields 0 and 4, which are indices into the combined schema formed from the left and right inputs. We\u2019ll discuss later in Field Indices where these come from.

    {\n  \"join\": {\n    \"left\": { ... },\n    \"right\": { ... },\n    \"type\": \"JOIN_TYPE_INNER\",\n    \"expression\": {\n      \"scalarFunction\": {\n        \"functionReference\": 3,\n        \"outputType\": {\n          \"bool\": {\n            \"nullability\": \"NULLABILITY_NULLABLE\"\n          }\n        },\n        \"arguments\": [\n          {\n            \"value\": {\n              \"selection\": {\n                \"directReference\": {\n                  \"structField\": {\n                    \"field\": 0\n                  }\n                },\n                \"rootReference\": {}\n              }\n            }\n          },\n          {\n            \"value\": {\n              \"selection\": {\n                \"directReference\": {\n                  \"structField\": {\n                    \"field\": 4\n                  }\n                },\n                \"rootReference\": {}\n              }\n            }\n          }\n        ]\n      }\n    }\n  }\n}\n

    The final aggregation requires two things other than the input. First is the groupings. We\u2019ll use a single grouping containing two grouping expressions: references to the fields product_id and product_name. (Multiple groupings can be used to compute cube aggregations.)

    For measures, we\u2019ll need to define sum(quantity * price) as sales. Substrait is stricter about data types, and quantity is an integer while price is a decimal. So we\u2019ll first need to cast quantity to a decimal, making the Substrait expression more like sum(multiply(cast(decimal(10, 2), quantity), price)). Both sum() and multiply() are functions, defined in functions_arithmetic_decimal.yaml. However, cast() is a special expression type in Substrait, rather than a function.
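
    A plain-Python illustration of why the cast is needed (using the stdlib decimal module, not Substrait itself): the integer quantity is widened to a decimal before the multiplication is well-typed.

    ```python
    # cast(quantity as decimal) * price, sketched with Python's decimal module.
    from decimal import Decimal

    quantity = 3                       # integer column value
    price = Decimal("19.99")           # decimal(10, 2) column value
    sales = Decimal(quantity) * price  # widen first, then multiply
    ```

    Substrait makes this widening explicit in the plan rather than leaving it to the consumer's implicit coercion rules.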

    Finally, the naming with as sales will be handled at the end as part of the plan, so that\u2019s not part of the relation. Since we are always using field indices to refer to fields, Substrait doesn\u2019t record any intermediate field names.

    {\n  \"aggregate\": {\n    \"input\": { ... },\n    \"groupings\": [\n      {\n        \"groupingExpressions\": [\n          {\n            \"value\": {\n              \"selection\": {\n                \"directReference\": {\n                  \"structField\": {\n                    \"field\": 0\n                  }\n                },\n                \"rootReference\": {}\n              }\n            }\n          },\n          {\n            \"value\": {\n              \"selection\": {\n                \"directReference\": {\n                  \"structField\": {\n                    \"field\": 7\n                  }\n                },\n                \"rootReference\": {}\n              }\n            }\n          },\n        ]\n      }\n    ],\n    \"measures\": [\n      {\n        \"measure\": {\n          \"functionReference\": 4,\n          \"outputType\": {\n            \"decimal\": {\n              \"precision\": 38,\n              \"scale\": 2,\n              \"nullability\": \"NULLABILITY_NULLABLE\"\n            }\n          },\n          \"arguments\": [\n            {\n              \"value\": {\n                \"scalarFunction\": {\n                  \"functionReference\": 5,\n                  \"outputType\": {\n                    \"decimal\": {\n                      \"precision\": 38,\n                      \"scale\": 2,\n                      \"nullability\": \"NULLABILITY_NULLABLE\"\n                    }\n                  },\n                  \"arguments\": [\n                    {\n                      \"value\": {\n                        \"cast\": {\n                          \"type\": {\n                            \"decimal\": {\n                              \"precision\": 10,\n                              \"scale\": 2,\n                              \"nullability\": \"NULLABILITY_REQUIRED\"\n                            }\n                          },\n                          \"input\": {\n                            
\"selection\": {\n                              \"directReference\": {\n                                \"structField\": {\n                                  \"field\": 1\n                                }\n                              },\n                              \"rootReference\": {}\n                            }\n                          }\n                        }\n                      }\n                    },\n                    {\n                      \"value\": {\n                        \"selection\": {\n                          \"directReference\": {\n                            \"structField\": {\n                              \"field\": 3\n                            }\n                          },\n                          \"rootReference\": {}\n                        }\n                      }\n                    }\n                  ]\n                }\n              }\n            }\n          ]\n        }\n      }\n    ]\n  }\n}\n
    "},{"location":"tutorial/sql_to_substrait/#field-indices","title":"Field indices","text":"

    So far, we have glossed over the field indices. Now that we\u2019ve built up each of the relations, it will be a bit easier to explain them.

    Throughout the plan, data always has some implicit schema, which is modified by each relation. Often, the schema can change within a relation\u2013we\u2019ll discuss an example in the next section. Each relation has its own rules for how schemas are modified, called the output order or emit order. For the purposes of our query, the relevant rules are:

    • For Read relations, their output schema is the schema of the table.
    • For Filter relations, the output schema is the same as the input schema.
    • For Join relations, the input schema is the concatenation of the left and then the right schemas. The output schema is the same.
    • For Aggregate relations, the output schema is the group by fields followed by the measures.
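
    Those rules can be applied step by step to our plan, tracking just the column names:

    ```python
    # Implicit schemas through the plan, names only (types omitted for brevity).
    orders = ["product_id", "quantity", "order_date", "price"]
    products = ["product_id", "categories", "details", "product_name"]

    filtered = list(products)   # Filter: same as its input
    joined = orders + filtered  # Join: left schema, then right schema
    # Aggregate: group-by fields (0 and 7 of the join output), then measures
    aggregated = [joined[0], joined[7], "sales"]
    ```

    This is how the join condition's field 4 and the grouping's field 7 get their meaning.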

    Note

    Sometimes it can be hard to tell what the implicit schema is. For help determining that, consider using the substrait-validator tool, described in Next Steps.

    The diagram below shows the mapping of field indices within each relation and how each field reference shows up in each relation\u2019s properties.

    "},{"location":"tutorial/sql_to_substrait/#column-selection-and-emit","title":"Column selection and emit","text":"

    As written, the aggregate output schema will be:

    0: product_id: i64\n1: product_name: string\n2: sales: decimal(38, 2)\n

    But we want product_name to come before product_id in our output. How do we reorder those columns?

    You might be tempted to add a Project relation at the end. However, the project relation only adds columns; it is not responsible for subsetting or reordering columns.

    Instead, any relation can reorder or subset columns through the emit property. By default, it is set to direct, which outputs all columns \u201cas is\u201d. But it can also be specified as a sequence of field indices.

    For simplicity, we will add this to the final aggregate relation. We could also add it to all relations, only selecting the fields we strictly need in later relations. Indeed, a good optimizer would probably do that to our plan. And for some engines, the emit property is only valid within a project relation, so in those cases we would need to add that relation in combination with emit. But to keep things simple, we\u2019ll limit the columns at the end within the aggregation relation.

    For our final column selection, we\u2019ll modify the top-level relation to be:

    {\n  \"aggregate\": {\n    \"input\": { ... },\n    \"groupings\": [ ... ],\n    \"measures\": [ ... ],\n    \"common\": {\n      \"emit\": {\n        \"outputMapping\": [1, 0, 2]\n      }\n    }\n}\n
    "},{"location":"tutorial/sql_to_substrait/#plans","title":"Plans","text":"

    Now that we\u2019ve constructed our relations, we can put it all into a plan. Substrait plans are the only messages that can be sent and received on their own. Recall that earlier, we had function references to those YAML files, but so far there\u2019s been no place to tell a consumer what those function reference IDs mean or which extensions we are using. That information belongs at the plan level.

    The overall layout for a plan is

    {\n  \"extensionUris\": [ ... ],\n  \"extensions\": [ ... ],\n  \"relations\": [\n    {\n      \"root\": {\n        \"names\": [\n          \"product_name\",\n          \"product_id\",\n          \"sales\"\n        ],\n        \"input\": { ... }\n      }\n    }\n  ]\n}\n

    The relations field is a list of Root relations. Most queries only have one root relation, but the spec allows for multiple, so a common plan could be referenced by other plans, sort of like a CTE (Common Table Expression) in SQL. The root relation provides the final column names for our query. The input to this relation is our aggregate relation (which contains all the other relations as children).

    For extensions, we need to provide extensionUris with the locations of the YAML files we used and extensions with the list of functions we used and which extension they come from.

    In our query, we used:

    • index_in (1), from functions_set.yaml,
    • is_null (2), from functions_comparison.yaml,
    • equal (3), from functions_comparison.yaml,
    • sum (4), from functions_arithmetic_decimal.yaml,
    • multiply (5), from functions_arithmetic_decimal.yaml.

    So first we can create the three extension URIs:

    [\n  {\n    \"extensionUriAnchor\": 1,\n    \"uri\": \"https://github.com/substrait-io/substrait/blob/main/extensions/functions_set.yaml\"\n  },\n  {\n    \"extensionUriAnchor\": 2,\n    \"uri\": \"https://github.com/substrait-io/substrait/blob/main/extensions/functions_comparison.yaml\"\n  },\n  {\n    \"extensionUriAnchor\": 3,\n    \"uri\": \"https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic_decimal.yaml\"\n  }\n]\n

    Then we can create the extensions:

    [\n  {\n    \"extensionFunction\": {\n      \"extensionUriReference\": 1,\n      \"functionAnchor\": 1,\n      \"name\": \"index_in\"\n    }\n  },\n  {\n    \"extensionFunction\": {\n      \"extensionUriReference\": 2,\n      \"functionAnchor\": 2,\n      \"name\": \"is_null\"\n    }\n  },\n  {\n    \"extensionFunction\": {\n      \"extensionUriReference\": 2,\n      \"functionAnchor\": 3,\n      \"name\": \"equal\"\n    }\n  },\n  {\n    \"extensionFunction\": {\n      \"extensionUriReference\": 3,\n      \"functionAnchor\": 4,\n      \"name\": \"sum\"\n    }\n  },\n  {\n    \"extensionFunction\": {\n      \"extensionUriReference\": 3,\n      \"functionAnchor\": 5,\n      \"name\": \"multiply\"\n    }\n  }\n]\n

    Once we\u2019ve added our extensions, the plan is complete. Our plan, written out in full, is: final_plan.json.

    "},{"location":"tutorial/sql_to_substrait/#next-steps","title":"Next steps","text":"

    Validate and introspect plans using substrait-validator. Amongst other things, this tool can show what the current schema and column indices are at each point in the plan. Try downloading the final plan JSON above and generating an HTML report on the plan with:

    substrait-validator final_plan.json --out-file output.html\n
    "},{"location":"types/type_classes/","title":"Type Classes","text":"

    In Substrait, the \u201cclass\u201d of a type, not to be confused with the concept from object-oriented programming, defines the set of non-null values that instances of a type may assume.

    Implementations of a Substrait type must support at least this set of values, but may include more; for example, an i8 could be represented using the same in-memory format as an i32, as long as functions operating on i8 values within [-128..127] behave as specified (in this case, this means 8-bit overflow must work as expected). Operating on values outside the specified range is unspecified behavior.
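
    The i8-in-an-i32 example can be illustrated with plain Python (wrap_i8 is an invented helper, not part of any Substrait implementation):

    ```python
    # An i8 held in wider storage must still wrap as an 8-bit two's complement
    # value, so 127 + 1 overflows to -128.
    def wrap_i8(x):
        return ((x + 128) % 256) - 128

    overflowed = wrap_i8(127 + 1)
    ```

    Values already inside [-128..127] pass through unchanged; only out-of-range results wrap.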

    "},{"location":"types/type_classes/#simple-types","title":"Simple Types","text":"

    Simple type classes are those that don\u2019t support any form of configuration. For simplicity, any generic type that has only a small number of discrete implementations is declared directly, as opposed to via configuration.

    Type Name Description Protobuf representation for literals boolean A value that is either True or False. bool i8 A signed integer within [-128..127], typically represented as an 8-bit two\u2019s complement number. int32 i16 A signed integer within [-32,768..32,767], typically represented as a 16-bit two\u2019s complement number. int32 i32 A signed integer within [-2,147,483,648..2,147,483,647], typically represented as a 32-bit two\u2019s complement number. int32 i64 A signed integer within [-9,223,372,036,854,775,808..9,223,372,036,854,775,807], typically represented as a 64-bit two\u2019s complement number. int64 fp32 A 4-byte single-precision floating point number with the same range and precision as defined for the IEEE 754 32-bit floating-point format. float fp64 An 8-byte double-precision floating point number with the same range and precision as defined for the IEEE 754 64-bit floating-point format. double string A unicode string of text, [0..2,147,483,647] UTF-8 bytes in length. string binary A binary value, [0..2,147,483,647] bytes in length. binary timestamp A naive timestamp with microsecond precision. Does not include timezone information and can thus not be unambiguously mapped to a moment on the timeline without context. Similar to naive datetime in Python. int64 microseconds since 1970-01-01 00:00:00.000000 (in an unspecified timezone) timestamp_tz A timezone-aware timestamp with microsecond precision. Similar to aware datetime in Python. int64 microseconds since 1970-01-01 00:00:00.000000 UTC date A date within [1000-01-01..9999-12-31]. int32 days since 1970-01-01 time A time since the beginning of any day. Range of [0..86,399,999,999] microseconds; leap seconds need not be supported. int64 microseconds past midnight interval_year Interval year to month. Supports a range of [-10,000..10,000] years with month precision (= [-120,000..120,000] months). 
Usually stored as separate integers for years and months, but only the total number of months is significant, i.e. 1y 0m is considered equal to 0y 12m or 1001y -12000m. int32 years and int32 months, with the added constraint that each component can never independently specify more than 10,000 years, even if the components have opposite signs (e.g. -10000y 200000m is not allowed) interval_day Interval day to second. Supports a range of [-3,650,000..3,650,000] days with microsecond precision (= [-315,360,000,000,000,000..315,360,000,000,000,000] microseconds). Usually stored as separate integers for various components, but only the total number of microseconds is significant, i.e. 1d 0s is considered equal to 0d 86400s. int32 days, int32 seconds, and int32 microseconds, with the added constraint that each component can never independently specify more than 10,000 years, even if the components have opposite signs (e.g. 3650001d -86400s 0us is not allowed) uuid A universally-unique identifier composed of 128 bits. Typically presented to users in the following hexadecimal format: c48ffa9e-64f4-44cb-ae47-152b4e60e77b. Any 128-bit value is allowed, without specific adherence to RFC4122. 16-byte binary"},{"location":"types/type_classes/#compound-types","title":"Compound Types","text":"

    Compound type classes are type classes that need to be configured by means of a parameter pack.

    Type Name Description Protobuf representation for literals FIXEDCHAR<L> A fixed-length unicode string of L characters. L must be within [1..2,147,483,647]. L-character string VARCHAR<L> A unicode string of at most L characters. L must be within [1..2,147,483,647]. string with at most L characters FIXEDBINARY<L> A binary string of L bytes. When casting, values shorter than L are padded with zeros, and values longer than L are right-trimmed. L-byte bytes DECIMAL<P, S> A fixed-precision decimal value having precision (P, number of digits) <= 38 and scale (S, number of fractional digits) 0 <= S <= P. 16-byte bytes representing a little-endian 128-bit integer, to be divided by 10^S to get the decimal value STRUCT<T1,\u2026,Tn> A list of types in a defined order. repeated Literal, types matching T1..Tn NSTRUCT<N:T1,\u2026,N:Tn> Pseudo-type: A struct that maps unique names to value types. Each name is a UTF-8-encoded string. Each value can have a distinct type. Note that NSTRUCT is actually a pseudo-type, because Substrait\u2019s core type system is based entirely on ordinal positions, not named fields. Nonetheless, when working with systems outside Substrait, names are important. n/a LIST<T> A list of values of type T. The list can be between [0..2,147,483,647] values in length. repeated Literal, all types matching T MAP<K, V> An unordered list of type K keys with type V values. Keys may be repeated. While the key type could be nullable, keys may not be null. repeated KeyValue (in turn two Literals), all key types matching K and all value types matching V PRECISIONTIMESTAMP<P> A timestamp with fractional second precision (P, number of digits) 0 <= P <= 9. Does not include timezone information and can thus not be unambiguously mapped to a moment on the timeline without context. Similar to naive datetime in Python. 
uint64 microseconds or nanoseconds since 1970-01-01 00:00:00.000000000 (in an unspecified timezone) PRECISIONTIMESTAMPTZ<P> A timezone-aware timestamp, with fractional second precision (P, number of digits) 0 <= P <= 9. Similar to aware datetime in Python. uint64 microseconds or nanoseconds since 1970-01-01 00:00:00.000000000 UTC"},{"location":"types/type_classes/#user-defined-types","title":"User-Defined Types","text":"
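For instance, the DECIMAL<P, S> literal encoding described above (a 16-byte little-endian integer scaled by 10^S) can be exercised with a short sketch; the helper names are invented for illustration:

```python
from decimal import Decimal

def encode_decimal_literal(value: Decimal, scale: int) -> bytes:
    """Encode a decimal as the 16-byte little-endian two's-complement
    integer that the spec uses for DECIMAL<P, S> literals."""
    unscaled = int(value.scaleb(scale))  # value * 10^S as an integer
    return unscaled.to_bytes(16, byteorder="little", signed=True)

def decode_decimal_literal(raw: bytes, scale: int) -> Decimal:
    unscaled = int.from_bytes(raw, byteorder="little", signed=True)
    return Decimal(unscaled).scaleb(-scale)

raw = encode_decimal_literal(Decimal("123.45"), scale=2)
assert len(raw) == 16
assert decode_decimal_literal(raw, scale=2) == Decimal("123.45")
```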

User-Defined Types
User-defined type classes can be created using a combination of pre-defined types. User-defined types are defined as part of simple extensions. An extension can declare an arbitrary number of user-defined extension types. Once a type has been declared, it can be used in function declarations.

    A YAML example of an extension type is below:

name: point
structure:
  longitude: i32
  latitude: i32

This declares a new type (namespaced to the associated YAML file) called "point". This type is composed of two i32 values named longitude and latitude.

    "},{"location":"types/type_classes/#structure-and-opaque-types","title":"Structure and opaque types","text":"

    The name-type object notation used above is syntactic sugar for NSTRUCT<longitude: i32, latitude: i32>. The following means the same thing:

name: point
structure: "NSTRUCT<longitude: i32, latitude: i32>"

The structure field of a type is only intended to inform systems that don't have built-in support for the type how they can transfer the data type from one point to another without unnecessary serialization/deserialization and without loss of type safety. Note that it is currently not possible to "unpack" a user-defined type class into its structure type or components thereof using FieldReferences or any other specialized record expression; if support for this is desired for a particular type, this can be accomplished with an extension function.

The structure field is optional. If not specified, the type class is considered to be fully opaque. This implies that a system without built-in support for the type cannot manipulate values in any way, including moving and cloning. This may be useful for exotic, context-sensitive types, such as raw pointers or identifiers that cannot be cloned.

Note however that the vast majority of types can be trivially moved and copied, even if they cannot be precisely represented using Substrait's built-in types. In this case, it is recommended to use binary or FIXEDBINARY<n> (where n is the size of the type) as the structure type. For example, an unsigned 32-bit integer type could be defined as follows:

name: u32
structure: "FIXEDBINARY<4>"

    In this case, i32 might also be used.
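A consumer without native u32 support could then shuttle such values around as opaque 4-byte strings. This sketch assumes the extension pins down little-endian byte order, which the YAML above does not specify:

```python
import struct

def u32_to_structure(value: int) -> bytes:
    """Pack an unsigned 32-bit integer into the FIXEDBINARY<4> structure type.
    Little-endian here is an illustrative assumption."""
    return struct.pack("<I", value)

def u32_from_structure(raw: bytes) -> int:
    return struct.unpack("<I", raw)[0]

raw = u32_to_structure(4_000_000_000)  # beyond i32 range, fine for u32
assert len(raw) == 4
assert u32_from_structure(raw) == 4_000_000_000
```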

    "},{"location":"types/type_classes/#literals","title":"Literals","text":"

    Literals for user-defined types are represented using protobuf Any messages.

    "},{"location":"types/type_classes/#compound-user-defined-types","title":"Compound User-Defined Types","text":"

User-defined types may be turned into compound types by requiring parameters to be passed to them. The supported "meta-types" for parameters are data types (like those used in LIST, MAP, and STRUCT), booleans, integers, enumerations, and strings. Using parameters, we could redefine "point" with different types of coordinates. For example:

name: point
parameters:
  - name: T
    description: |
      The type used for the longitude and latitude
      components of the point.
    type: dataType

    or:

name: point
parameters:
  - name: coordinate_type
    type: enumeration
    options:
      - integer
      - double

    or:

name: point
parameters:
  - name: LONG
    type: dataType
  - name: LAT
    type: dataType

We can't specify the internal structure in this case, because there is currently no support for derived types in the structure.

    The allowed range can be limited for integer parameters. For example:

name: vector
parameters:
  - name: T
    type: dataType
  - name: dimensions
    type: integer
    min: 2
    max: 3

This specifies a vector that can be either 2- or 3-dimensional. Note however that it's not currently possible to put constraints on data type, string, or (technically) boolean parameters.

    Similar to function arguments, the last parameter may be specified to be variadic, allowing it to be specified one or more times instead of only once. For example:

name: union
parameters:
  - name: T
    type: dataType
variadic: true

    This defines a type that can be parameterized with one or more other data types, for example union<i32, i64> but also union<bool>. Zero or more is also possible, by making the last argument optional:

name: tuple
parameters:
  - name: T
    type: dataType
    optional: true
variadic: true

    This would also allow for tuple<>, to define a zero-tuple.
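Taken together, the arity and constraint rules above amount to a simple check. This validator is an illustrative sketch against YAML-shaped declarations, not a normative algorithm:

```python
def check_parameters(decl: dict, args: list) -> bool:
    """Check an argument list against a type-parameter declaration shaped
    like the YAML above: {"parameters": [...], "variadic": bool}."""
    params = decl.get("parameters", [])
    variadic = decl.get("variadic", False)
    if not params:
        return len(args) == 0
    minimum = len(params)
    if variadic and params[-1].get("optional", False):
        minimum -= 1  # a variadic optional last parameter allows zero occurrences
    if len(args) < minimum:
        return False
    if not variadic and len(args) > len(params):
        return False
    for i, arg in enumerate(args):
        p = params[min(i, len(params) - 1)]  # the variadic last parameter absorbs extras
        if p["type"] == "integer":
            if "min" in p and arg < p["min"]:
                return False
            if "max" in p and arg > p["max"]:
                return False
    return True

vector = {"parameters": [{"name": "T", "type": "dataType"},
                         {"name": "dimensions", "type": "integer", "min": 2, "max": 3}]}
union = {"parameters": [{"name": "T", "type": "dataType"}], "variadic": True}
tup = {"parameters": [{"name": "T", "type": "dataType", "optional": True}], "variadic": True}

assert check_parameters(vector, ["f32", 3])
assert not check_parameters(vector, ["f32", 4])   # dimensions out of range
assert check_parameters(union, ["i32", "i64"])
assert not check_parameters(union, [])            # variadic but not optional
assert check_parameters(tup, [])                  # tuple<> is allowed
```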

    "},{"location":"types/type_parsing/","title":"Type Syntax Parsing","text":"

    In many places, it is useful to have a human-readable string representation of data types. Substrait has a custom syntax for type declaration. The basic structure of a type declaration is:

name?[variation]<param0,...,paramN>

    The components of this expression are:

Name: Each type has a name; a type is expressed by providing that name. The name can be expressed in arbitrary case (e.g. varchar and vArChAr are equivalent), although lowercase is preferred. Required.

Nullability indicator: A type is either non-nullable or nullable. To express nullability, a question mark is added after the type name (before any parameters). Optional, defaults to non-nullable.

Variation: When expressing a type, a user can define the type based on a type variation. Some systems use type variations to describe different underlying representations of the same data type. This is expressed as a bracketed integer such as [2]. Optional, defaults to [0].

Parameters: Compound types may have one or more configurable properties. The two main kinds of properties are integer and type properties. The parameters for each type correspond to a list of known properties associated with the type, in the order defined in the type specification. For compound types (types that contain types), the data type syntax will include nested type declarations. The one exception is structs, which are further outlined below. Required where parameters are defined.
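As a sketch of how little machinery the basic (non-nested) syntax needs, a regex-based parser might look like this; production implementations should prefer the official ANTLR grammar:

```python
import re

TYPE_RE = re.compile(
    r"^(?P<name>[A-Za-z_][A-Za-z0-9_]*)"   # type class name, case-insensitive
    r"(?P<null>\?)?"                        # optional nullability indicator
    r"(?:\[(?P<variation>\d+)\])?"          # optional variation, defaults to [0]
    r"(?:<(?P<params>.*)>)?$"               # optional parameter list
)

def parse_type(text: str) -> dict:
    m = TYPE_RE.match(text.strip())
    if not m:
        raise ValueError(f"not a type expression: {text!r}")
    params = m.group("params")
    return {
        "name": m.group("name").lower(),     # varchar and vArChAr are equivalent
        "nullable": m.group("null") is not None,
        "variation": int(m.group("variation") or 0),
        "parameters": params.split(",") if params else [],
    }

assert parse_type("vArChAr?[2]<38>") == {
    "name": "varchar", "nullable": True, "variation": 2, "parameters": ["38"]}
assert parse_type("i8") == {
    "name": "i8", "nullable": False, "variation": 0, "parameters": []}
```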

Grammars
It is relatively easy in most languages to produce simple parsers and emitters for the type syntax. To make that easier, Substrait also includes an ANTLR grammar to ease consumption and production of types. (The grammar also supports an entire language for representing plans as text.)

    "},{"location":"types/type_parsing/#structs-named-structs","title":"Structs & Named Structs","text":"

    Structs are unique from other types because they have an arbitrary number of parameters. The parameters are recursive and may include their own subproperties. Struct parsing is declared in the following two ways:

# Struct
struct?[variation]<type0, type1,..., typeN>

# Named Struct
nstruct?[variation]<name0:type0, name1:type1,..., nameN:typeN>

Text format examples:

// Struct
struct?<string, i8, i32?, timestamp_tz>

// Named structs are not yet supported in the text format.

    In the normal (non-named) form, struct declares a set of types that are fields within that struct. In the named struct form, the parameters are formed by tuples of names + types, delineated by a colon. Names that are composed only of numbers and letters can be left unquoted. For other characters, names should be quoted with double quotes and use backslash for double-quote escaping.

    Note, in core Substrait algebra, fields are unnamed and references are always based on zero-index ordinal positions. However, data inputs must declare name-to-ordinal mappings and outputs must declare ordinal-to-name mappings. As such, Substrait also provides a named struct which is a pseudo-type that is useful for human consumption. Outside these places, most structs in a Substrait plan are structs, not named-structs. The two cannot be used interchangeably.
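The name-to-ordinal contract described above can be illustrated with a small sketch; the field names here are invented for the example:

```python
# A named struct at the plan boundary declares names; inside the plan,
# references are purely ordinal.
input_names = ["order_id", "customer", "amount"]

# Inbound: a consumer resolves each declared name to the ordinal that
# field references inside the plan will use.
name_to_ordinal = {name: i for i, name in enumerate(input_names)}
assert name_to_ordinal["amount"] == 2

# Outbound: ordinals are mapped back to names for human consumption.
output_ordinals = [2, 0]
assert [input_names[i] for i in output_ordinals] == ["amount", "order_id"]
```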

    "},{"location":"types/type_parsing/#other-complex-types","title":"Other Complex Types","text":"

    Similar to structs, maps and lists can also have a type as one of their parameters. Type references may be recursive. The key for a map is typically a simple type but it is not required.

list?<type>
map<type0, type1>

Examples:

list?<list<string>>
list<struct<string, i32>>
map<i32?, list<map<i32, string?>>>
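One practical consequence of recursive parameters: splitting a parameter list on commas must respect angle-bracket nesting. A sketch:

```python
def split_params(params: str) -> list:
    """Split 'i32?, list<map<i32, string?>>' into top-level parameters,
    ignoring commas nested inside <...>."""
    parts, depth, current = [], 0, []
    for ch in params:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current).strip())
    return parts

assert split_params("i32?, list<map<i32, string?>>") == \
    ["i32?", "list<map<i32, string?>>"]
assert split_params("string, i8, i32?, timestamp_tz") == \
    ["string", "i8", "i32?", "timestamp_tz"]
```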
    "},{"location":"types/type_system/","title":"Type System","text":"

    Substrait tries to cover the most common types used in data manipulation. Types beyond this common core may be represented using simple extensions.

    Substrait types fundamentally consist of four components:

Class (always present; e.g. i8, string, STRUCT, extensions): Together with the parameter pack, describes the set of non-null values supported by the type. Subdivided into simple and compound type classes.

Nullability (always present; either NULLABLE, written with a ? suffix, or REQUIRED, written with no suffix): Describes whether values of this type can be null. Note that null is considered to be a special value of a nullable type, rather than the only value of a special null type.

Variation (always present; no suffix or an explicit [0] for the system-preferred variation, or an extension): Allows different variations of the same type class to exist in a system at a time, usually distinguished by in-memory format.

Parameters (compound types only; e.g. <10, 2> for DECIMAL, <i32, string> for STRUCT): Some combination of zero or more data types or integers. The expected set of parameters and the significance of each parameter depend on the type class.
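These four components map onto a small data model; the class and rendering below are an illustrative sketch (names invented here), following the syntax described in Type Parsing:

```python
from dataclasses import dataclass, field

@dataclass
class SubstraitType:
    type_class: str                 # e.g. "varchar"
    nullable: bool = False          # REQUIRED unless marked nullable
    variation: int = 0              # [0] is the system-preferred variation
    parameters: list = field(default_factory=list)

    def render(self) -> str:
        out = self.type_class
        if self.nullable:
            out += "?"
        if self.variation != 0:
            out += f"[{self.variation}]"
        if self.parameters:
            out += "<" + ",".join(str(p) for p in self.parameters) + ">"
        return out

assert SubstraitType("decimal", nullable=True, parameters=[10, 2]).render() == "decimal?<10,2>"
assert SubstraitType("i8").render() == "i8"
```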

    Refer to Type Parsing for a description of the syntax used to describe types.

    Note

    Substrait employs a strict type system without any coercion rules. All changes in types must be made explicit via cast expressions.

    "},{"location":"types/type_variations/","title":"Type Variations","text":"

    Type variations may be used to represent differences in representation between different consumers. For example, an engine might support dictionary encoding for a string, or could be using either a row-wise or columnar representation of a struct. All variations of a type are expected to have the same semantics when operated on by functions or other expressions.

All variations except the "system-preferred" variation (a.k.a. [0], see Type Parsing) must be defined using simple extensions. The key properties of these variations are:

Base Type Class: The type class that this variation belongs to.

Name: The name used to reference this type. Should be unique within type variations for this parent type within a simple extension.

Description: A human description of the purpose of this type variation.

Function Behavior: INHERITS or SEPARATE: whether functions that support the system-preferred variation implicitly also support this variation, or whether functions should be resolved independently. For example, if one has the function add(i8,i8) defined and then defines an i8 variation, this determines whether the i8 variation can be bound to the base add operation (inherits) or whether a specialized version of add needs to be defined specifically for this variation (separate). Defaults to inherits.
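As an illustration (not the normative binding algorithm), the INHERITS/SEPARATE behavior amounts to a rule like the following; the variation names are invented for the example:

```python
def resolves(function_defined_for: str, argument_variation: str,
             behavior: str) -> bool:
    """Can a function defined for the system-preferred variation bind to an
    argument of the given variation? behavior is 'INHERITS' or 'SEPARATE'."""
    if argument_variation == function_defined_for:
        return True
    if function_defined_for == "system-preferred" and behavior == "INHERITS":
        return True
    return False

# An i8 variation declared with INHERITS binds to the base add(i8, i8)...
assert resolves("system-preferred", "my_i8_variation", "INHERITS")
# ...while SEPARATE requires a specialized add for that variation.
assert not resolves("system-preferred", "my_i8_variation", "SEPARATE")
```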

    Binary Serialization

Substrait can be serialized into a protobuf-based binary representation. The proto schema/IDL files can be found on GitHub. Proto files are placed in the io.substrait namespace for C++/Java and the Substrait.Protobuf namespace for C#.

    Plan

The main top-level object used to communicate a Substrait plan using protobuf is a Plan message (see ExtendedExpression for an alternative top-level object). The Plan message is composed of a set of data structures that minimize repetition in the serialization, along with one (or more) Relation trees.

    message Plan {
       // Substrait version of the plan. Optional up to 0.17.0, required for later
       // versions.
       Version version = 6;
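For orientation, a minimal plan in protobuf JSON form might look roughly like the following. The field names come from substrait.proto as of recent versions, but treat this as an illustrative sketch that elides most fields, not a canonical plan:

```json
{
  "version": { "minorNumber": 42, "producer": "example-producer" },
  "relations": [
    {
      "root": {
        "input": { "read": { "namedTable": { "names": ["example_table"] } } },
        "names": ["column_a"]
      }
    }
  ]
}
```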
    \ No newline at end of file diff --git a/serialization/text_serialization/index.html b/serialization/text_serialization/index.html index 77046c9f..ae7f537e 100644 --- a/serialization/text_serialization/index.html +++ b/serialization/text_serialization/index.html @@ -1,4 +1,4 @@ - Text Serialization - Substrait: Cross-Language Serialization for Relational Algebra

    Text Serialization

To maximize the new user experience, it is important for Substrait to have a text representation of plans. This allows people to experiment with basic tooling. Simple CLI tools for tasks like SQL > Plan and Plan > SQL, or REPL-based plan construction, can all be built relatively straightforwardly with a text representation.

    The recommended text serialization format is JSON. Since the text format is not designed for performance, the format can be produced to maximize readability. This also allows nice symmetry between the construction of plans and the configuration of various extensions such as function signatures and user defined types.

To ensure the JSON is valid, the object will be defined using the OpenAPI 3.1 specification. This not only allows strong validation; the OpenAPI specification also enables code generators to be easily used to produce plans in many languages.

While JSON will be used for much of the plan serialization, Substrait uses a simple custom grammar for record-level expressions. While one can construct an equation such as (10 + 5)/2 using a tree of function and literal objects, a plan is much easier for humans to consume when the information is written similarly to the way one typically reads scalar expressions. This grammar will be maintained as an ANTLR grammar (targetable to multiple programming languages) and is also planned to be supported via a JSON schema definition format tag so that the grammar can be validated as part of the schema validation.
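To make the readability point concrete, here is (10 + 5)/2 as a simplified function tree next to its text form; the tree shape is illustrative, not the exact protobuf layout:

```python
# Text form, as a human would write it:
text_form = "(10 + 5)/2"

# The equivalent tree of function and literal objects (simplified):
tree_form = {
    "function": "divide",
    "arguments": [
        {"function": "add", "arguments": [{"literal": 10}, {"literal": 5}]},
        {"literal": 2},
    ],
}

def evaluate(node):
    """Evaluate the simplified tree bottom-up."""
    if "literal" in node:
        return node["literal"]
    args = [evaluate(a) for a in node["arguments"]]
    return {"add": args[0] + args[1], "divide": args[0] / args[1]}[node["function"]]

assert evaluate(tree_form) == 7.5
```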


diff --git a/sitemap.xml.gz index 10cb22b7..48610199 100644 Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ diff --git a/spec/extending/index.html index d67337dd..9f41b1d0 100644 --- a/spec/extending/index.html +++ b/spec/extending/index.html @@ -1,4 +1,4 @@ - Extending - Substrait: Cross-Language Serialization for Relational Algebra

    Extending

    Substrait is a community project and requires consensus about new additions to the specification in order to maintain consistency. The best way to get consensus is to discuss ideas. The main ways to communicate are:

    • Substrait Mailing List
    • Substrait Slack
    • Community Meeting

    Minor changes

    Simple changes like typos and bug fixes do not require as much effort. File an issue or send a PR and we can discuss it there.

    Complex changes

For complex features it is best to discuss the change first. It also helps to gather some background information up front to get everyone on the same page.

    Outline the issue

    Language

Every engine has its own terminology. Every Spark user probably knows what an “attribute” is, and Velox users will know what a “RowVector” means. However, Substrait is used by people who come from a variety of backgrounds, and you should generally assume that its users do not know anything about your own implementation. As a result, all PRs and discussion should endeavor to use Substrait terminology wherever possible.

    Motivation

What problems does this relation solve? If it is a more logical relation, how does it allow users to express new capabilities? If it is more of an internal relation, how does it map to existing logical relations? How is it different from other existing relations? Why do we need this?

    Examples

Provide example input and output for the relation. Show example plans. Try to motivate your examples, as best as possible, with something that looks like a real-world problem. These will go a long way towards helping others understand the purpose of a relation.

    Alternatives

    Discuss what alternatives are out there. Are there other ways to achieve similar results? Do some systems handle this problem differently?

    Survey existing implementation

It’s unlikely that this is the first time that this has been done. Figuring out how existing systems solve the problem will help put the proposal in context.

    Prototype the feature

    Novel approaches should be implemented as an extension first.

    Substrait design principles

Substrait is designed around interoperability, so a feature only used by a single system may not be accepted. But don’t despair! Substrait has a highly developed extension system for this express purpose.

    You don’t have to do it alone

    If you are hoping to add a feature and these criteria seem intimidating then feel free to start a mailing list discussion before you have all the information and ask for help. Investigating other implementations, in particular, is something that can be quite difficult to do on your own.

    \ No newline at end of file diff --git a/spec/specification/index.html b/spec/specification/index.html index 029a7c99..8071782c 100644 --- a/spec/specification/index.html +++ b/spec/specification/index.html @@ -1,4 +1,4 @@ - Specification - Substrait: Cross-Language Serialization for Relational Algebra

    Specification

    Status

The specification has passed the initial design phase and is now in the final stages of being fleshed out. The community is encouraged to identify (and address) any perceived gaps in functionality using GitHub issues and PRs. Once all of the planned implementations have been completed, all deprecated fields will be eliminated and version 1.0 will be released.

    Components (Complete)

    Section Description
    Simple Types A way to describe the set of basic types that will be operated on within a plan. Only includes simple types such as integers and doubles (nothing configurable or compound).
    Compound Types Expression of types that go beyond simple scalar values. Key concepts here include: configurable types such as fixed length and numeric types as well as compound types such as structs, maps, lists, etc.
    Type Variations Physical variations to base types.
    User Defined Types Extensions that can be defined for specific IR producers/consumers.
    Field References Expressions to identify which portions of a record should be operated on.
    Scalar Functions Description of how functions are specified. Concepts include arguments, variadic functions, output type derivation, etc.
    Scalar Function List A list of well-known canonical functions in YAML format.
    Specialized Record Expressions Specialized expression types that are more naturally expressed outside the function paradigm. Examples include items such as if/then/else and switch statements.
    Aggregate Functions Functions that are expressed in aggregation operations. Examples include things such as SUM, COUNT, etc. Operations take many records and collapse them into a single (possibly compound) value.
    Window Functions Functions that relate a record to a set of encompassing records. Examples in SQL include RANK, NTILE, etc.
User Defined Functions Reusable named functions that are built beyond the core specification. Implementations are typically registered through external means (drop a file in a directory, send a special command with implementation, etc.)
Embedded Functions Function implementations embedded directly within the plan. Frequently used in data science workflows where business logic is interspersed with standard operations.
    Relation Basics Basic concepts around relational algebra, record emit and properties.
    Logical Relations Common relational operations used in compute plans including project, join, aggregation, etc.
    Text Serialization A human producible & consumable representation of the plan specification.
    Binary Serialization A high performance & compact binary representation of the plan specification.

    Components (Designed but not Implemented)

    Section Description
Table Functions Functions that convert one or more values from an input record into 0..N output records. Examples include operations such as explode, pos-explode, etc.
    User Defined Relations Installed and reusable relational operations customized to a particular platform.
    Embedded Relations Relational operations where plans contain the “machine code” to directly execute the necessary operations.
Physical Relations Specific execution sub-variations of common relational operations that have multiple unique physical variants associated with a single logical operation. Examples include hash join, merge join, nested loop join, etc.
    GitHub

    Specification

    Status

    The specification has passed the initial design phase and is now in the final stages of being fleshed out. The community is encouraged to identify (and address) any perceived gaps in functionality using GitHub issues and PRs. Once all of the planned implementations have been completed all deprecated fields will be eliminated and version 1.0 will be released.

    Components (Complete)

    Section Description
    Simple Types A way to describe the set of basic types that will be operated on within a plan. Only includes simple types such as integers and doubles (nothing configurable or compound).
    Compound Types Expression of types that go beyond simple scalar values. Key concepts here include: configurable types such as fixed length and numeric types as well as compound types such as structs, maps, lists, etc.
    Type Variations Physical variations to base types.
    User Defined Types Extensions that can be defined for specific IR producers/consumers.
    Field References Expressions to identify which portions of a record should be operated on.
    Scalar Functions Description of how functions are specified. Concepts include arguments, variadic functions, output type derivation, etc.
    Scalar Function List A list of well-known canonical functions in YAML format.
    Specialized Record Expressions Specialized expression types that are more naturally expressed outside the function paradigm. Examples include items such as if/then/else and switch statements.
    Aggregate Functions Functions that are expressed in aggregation operations. Examples include things such as SUM, COUNT, etc. Operations take many records and collapse them into a single (possibly compound) value.
    Window Functions Functions that relate a record to a set of encompassing records. Examples in SQL include RANK, NTILE, etc.
    User Defined Functions Reusable named functions that are built beyond the core specification. Implementations are typically registered through external means (drop a file in a directory, send a special command with implementation, etc.)
    Embedded Functions Function implementations embedded directly within the plan. Frequently used in data science workflows where business logic is interspersed with standard operations.
    Relation Basics Basic concepts around relational algebra, record emit and properties.
    Logical Relations Common relational operations used in compute plans including project, join, aggregation, etc.
    Text Serialization A human producible & consumable representation of the plan specification.
    Binary Serialization A high performance & compact binary representation of the plan specification.

    Components (Designed but not Implemented)

    Section Description
    Table Functions Functions that convert one or more values from an input record into 0..N output records. Examples include operations such as explode, pos-explode, etc.
    User Defined Relations Installed and reusable relational operations customized to a particular platform.
    Embedded Relations Relational operations where plans contain the “machine code” to directly execute the necessary operations.
    Physical Relations Specific execution sub-variations of common relational operations, where multiple unique physical variants are associated with a single logical operation. Examples include hash join, merge join, nested loop join, etc.

    Technology Principles

    • Provide a good suite of well-specified common functionality in databases and data science applications.
    • Make it easy for users to privately or publicly extend the representation to support specialized/custom operations.
    • Produce something that is language agnostic and requires minimal work to start developing against in a new language.
    • Drive towards a common format that avoids specialization for a single favorite producer or consumer.
    • Establish a clear delineation between specifications that MUST be respected and those that can be optionally ignored.
    • Establish a forgiving compatibility approach and versioning scheme that supports cross-version compatibility in the maximum number of cases.
    • Minimize the need for consumer intelligence by excluding concepts like overloading, type coercion, implicit casting, field name handling, etc. (Note: this is weak and should be better stated.)
    • Decomposability/severability: A particular producer or consumer should be able to produce or consume only a subset of the specification and interact well with any other Substrait system, as long as the specific operations requested fit within the subset of the specification supported by the counterpart system.

    Versioning

    As an interface specification, the goal of Substrait is to reach a point where (breaking) changes will never need to happen again, or at least be few and far between. By analogy, Apache Arrow’s in-memory format specification has stayed functionally constant, despite many major library versions being released. However, we’re not there yet. When we believe that we’ve reached this point, we will signal this by releasing version 1.0.0. Until then, we will remain in the 0.x.x version regime.

    Despite this, we strive to maintain backward compatibility for both the binary representation and the text representation by means of deprecation. When a breaking change cannot be reasonably avoided, we may remove previously deprecated fields. All deprecated fields will be removed for the 1.0.0 release.

    Substrait uses semantic versioning for its version numbers, with the addition that, during 0.x.y, we increment the x digit for breaking changes and new features, and the y digit for fixes and other nonfunctional changes. The release process is currently automated and makes a new release every week, provided something has changed on the main branch since the previous release. This release cadence will likely slow down as stability increases over time. Conventional commits are used to distinguish between breaking changes, new features, and fixes, and GitHub Actions are used to verify that there are indeed no breaking protobuf changes in a commit unless the commit message declares one.
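
    The 0.x.y rule above can be sketched as a small compatibility check. This is an illustrative helper, not an official Substrait API; it assumes plain "x.y.z" version strings.

```python
def compatible(producer: str, consumer: str) -> bool:
    """Sketch of the versioning rule described above: during 0.x.y the
    middle digit is bumped for breaking changes, so two systems are only
    expected to interoperate when major and minor both match while the
    major version is 0. (Illustrative, not an official Substrait API.)"""
    p_major, p_minor, _ = (int(n) for n in producer.split("."))
    c_major, c_minor, _ = (int(n) for n in consumer.split("."))
    if p_major != c_major:
        return False
    if p_major == 0:
        # During 0.x.y, x plays the role of the major version.
        return p_minor == c_minor
    return True

print(compatible("0.19.0", "0.19.4"))  # fixes only: compatible
print(compatible("0.19.0", "0.20.0"))  # breaking change: incompatible
```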


    Third Party Tools

    Substrait-tools

    The substrait-tools python package provides a command line interface for producing/consuming Substrait plans by leveraging the APIs from different producers and consumers.

    Substrait Fiddle

    Substrait Fiddle is an online tool to share, debug, and prototype Substrait plans.

    The Substrait Fiddle source is available, allowing it to be run in any environment.


    SQL to Substrait tutorial

    This is an introductory tutorial to learn the basics of Substrait for readers already familiar with SQL. We will look at how to construct a Substrait plan from an example query.

    We’ll present the Substrait plan in JSON form to make it relatively readable to newcomers. Typically Substrait is exchanged as a protobuf message, but for debugging purposes it is often helpful to look at a serialized form. Plus, it’s not uncommon for unit tests to represent plans as JSON strings, so if you are developing with Substrait, it’s useful to have experience reading them.

    Note

    Substrait is currently only defined with Protobuf. The JSON provided here is the Protobuf JSON output, but it is not the official Substrait text format. Eventually, Substrait will define its own human-readable text format, but for now this tutorial will make do with what Protobuf provides.

    Substrait is designed to communicate plans (mostly logical plans). Those plans contain types, schemas, expressions, extensions, and relations. We’ll look at them in that order, going from simplest to most complex until we can construct full plans.

    This tutorial won’t cover all the details of each piece, but it will give you an idea of how they connect together. For a detailed reference of each individual field, the best place to look is reading the protobuf definitions. They represent the source-of-truth of the spec and are well-commented to address ambiguities.
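
    As a taste of the Protobuf JSON dialect used throughout, here is a minimal, hand-written fragment being read back with Python’s standard json module. The plan is illustrative and deliberately incomplete; the relations/root/names field names follow the Substrait protobuf definitions.

```python
import json

# A tiny, hand-written fragment in the Protobuf JSON dialect. A real
# plan would also carry extensions and a relation tree; this sketch
# only shows the outer shape.
plan_json = """
{
  "relations": [
    {
      "root": {
        "names": ["product_id", "quantity"]
      }
    }
  ]
}
"""

plan = json.loads(plan_json)
root = plan["relations"][0]["root"]
print(root["names"])  # the output column names declared by the root
```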

    Problem Setup

    To learn Substrait, we’ll build up to a specific query. We’ll be using the tables:

    CREATE TABLE orders (
       product_id: i64 NOT NULL,
       quantity: i32 NOT NULL,
       order_date: date NOT NULL,

    Type Classes

    In Substrait, the “class” of a type, not to be confused with the concept from object-oriented programming, defines the set of non-null values that instances of a type may assume.

    Implementations of a Substrait type must support at least this set of values, but may include more; for example, an i8 could be represented using the same in-memory format as an i32, as long as functions operating on i8 values within [-128..127] behave as specified (in this case, this means 8-bit overflow must work as expected). Operating on values outside the specified range is unspecified behavior.
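
    The overflow requirement can be made concrete with a small sketch: even when i8 values are stored in a wider slot, arithmetic must wrap at 8 bits. The helper below is illustrative, not a Substrait API.

```python
def wrap_i8(value: int) -> int:
    """Reduce an arbitrary integer to the i8 range [-128..127] using
    8-bit two's complement wraparound, mirroring the overflow behavior
    the paragraph above requires of wider in-memory representations."""
    return (value + 128) % 256 - 128

# An i8 stored in a 32-bit slot must still overflow like an 8-bit value:
print(wrap_i8(127 + 1))   # wraps around to -128
print(wrap_i8(-128 - 1))  # wraps around to 127
```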

    Simple Types

    Simple type classes are those that don’t support any form of configuration. For simplicity, any generic type that has only a small number of discrete implementations is declared directly, as opposed to via configuration.

    Type Name Description Protobuf representation for literals
    boolean A value that is either True or False. bool
    i8 A signed integer within [-128..127], typically represented as an 8-bit two’s complement number. int32
    i16 A signed integer within [-32,768..32,767], typically represented as a 16-bit two’s complement number. int32
    i32 A signed integer within [-2,147,483,648..2,147,483,647], typically represented as a 32-bit two’s complement number. int32
    i64 A signed integer within [−9,223,372,036,854,775,808..9,223,372,036,854,775,807], typically represented as a 64-bit two’s complement number. int64
    fp32 A 4-byte single-precision floating point number with the same range and precision as defined for the IEEE 754 32-bit floating-point format. float
    fp64 An 8-byte double-precision floating point number with the same range and precision as defined for the IEEE 754 64-bit floating-point format. double
    string A unicode string of text, [0..2,147,483,647] UTF-8 bytes in length. string
    binary A binary value, [0..2,147,483,647] bytes in length. binary
    timestamp A naive timestamp with microsecond precision. Does not include timezone information and can thus not be unambiguously mapped to a moment on the timeline without context. Similar to naive datetime in Python. int64 microseconds since 1970-01-01 00:00:00.000000 (in an unspecified timezone)
    timestamp_tz A timezone-aware timestamp with microsecond precision. Similar to aware datetime in Python. int64 microseconds since 1970-01-01 00:00:00.000000 UTC
    date A date within [1000-01-01..9999-12-31]. int32 days since 1970-01-01
    time A time since the beginning of any day. Range of [0..86,399,999,999] microseconds; leap seconds need not be supported. int64 microseconds past midnight
    interval_year Interval year to month. Supports a range of [-10,000..10,000] years with month precision (= [-120,000..120,000] months). Usually stored as separate integers for years and months, but only the total number of months is significant, i.e. 1y 0m is considered equal to 0y 12m or 1001y -12000m. int32 years and int32 months, with the added constraint that each component can never independently specify more than 10,000 years, even if the components have opposite signs (e.g. -10000y 200000m is not allowed)
    interval_day Interval day to second. Supports a range of [-3,650,000..3,650,000] days with microsecond precision (= [-315,360,000,000,000,000..315,360,000,000,000,000] microseconds). Usually stored as separate integers for various components, but only the total number of microseconds is significant, i.e. 1d 0s is considered equal to 0d 86400s. int32 days, int32 seconds, and int32 microseconds, with the added constraint that each component can never independently specify more than 10,000 years, even if the components have opposite signs (e.g. 3650001d -86400s 0us is not allowed)
    uuid A universally-unique identifier composed of 128 bits. Typically presented to users in the following hexadecimal format: c48ffa9e-64f4-44cb-ae47-152b4e60e77b. Any 128-bit value is allowed, without specific adherence to RFC4122. 16-byte binary
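
    The literal encodings in the table can be exercised directly. The sketch below, with illustrative helper names, encodes a timestamp_tz literal as microseconds since the UTC epoch and normalizes an interval_year to its total month count (so 1y 0m equals 0y 12m, as described above).

```python
from datetime import datetime, timezone

def timestamp_tz_literal(dt: datetime) -> int:
    """Encode an aware datetime as a timestamp_tz literal: int64
    microseconds since 1970-01-01 00:00:00 UTC, per the table above."""
    return int(dt.timestamp() * 1_000_000)

def interval_year_total_months(years: int, months: int) -> int:
    """Only the total month count of an interval_year is significant,
    so different component splits normalize to the same value."""
    return years * 12 + months

one_second_past_epoch = datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
print(timestamp_tz_literal(one_second_past_epoch))  # one second -> 1,000,000 us
print(interval_year_total_months(1, 0) == interval_year_total_months(0, 12))
```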

    Compound Types

    Compound type classes are type classes that need to be configured by means of a parameter pack.

    Type Name Description Protobuf representation for literals
    FIXEDCHAR<L> A fixed-length unicode string of L characters. L must be within [1..2,147,483,647]. L-character string
    VARCHAR<L> A unicode string of at most L characters. L must be within [1..2,147,483,647]. string with at most L characters
    FIXEDBINARY<L> A binary string of L bytes. When casting, values shorter than L are padded with zeros, and values longer than L are right-trimmed. L-byte bytes
    DECIMAL<P, S> A fixed-precision decimal value having precision (P, number of digits) <= 38 and scale (S, number of fractional digits) 0 <= S <= P. 16-byte bytes representing a little-endian 128-bit integer, to be divided by 10^S to get the decimal value
    STRUCT<T1,…,Tn> A list of types in a defined order. repeated Literal, types matching T1..Tn
    NSTRUCT<N:T1,…,N:Tn> Pseudo-type: A struct that maps unique names to value types. Each name is a UTF-8-encoded string. Each value can have a distinct type. Note that NSTRUCT is actually a pseudo-type, because Substrait’s core type system is based entirely on ordinal positions, not named fields. Nonetheless, when working with systems outside Substrait, names are important. n/a
    LIST<T> A list of values of type T. The list can be between [0..2,147,483,647] values in length. repeated Literal, all types matching T
    MAP<K, V> An unordered list of type K keys with type V values. Keys may be repeated. While the key type could be nullable, keys may not be null. repeated KeyValue (in turn two Literals), all key types matching K and all value types matching V
    PRECISIONTIMESTAMP<P> A timestamp with fractional second precision (P, number of digits) 0 <= P <= 9. Does not include timezone information and can thus not be unambiguously mapped to a moment on the timeline without context. Similar to naive datetime in Python. uint64 microseconds or nanoseconds since 1970-01-01 00:00:00.000000000 (in an unspecified timezone)
    PRECISIONTIMESTAMPTZ<P> A timezone-aware timestamp, with fractional second precision (P, number of digits) 0 <= P <= 9. Similar to aware datetime in Python. uint64 microseconds or nanoseconds since 1970-01-01 00:00:00.000000000 UTC
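
    The DECIMAL<P, S> literal encoding above (a 16-byte little-endian signed 128-bit integer, scaled by 10^S) can be decoded in a few lines; the helper name is illustrative.

```python
from decimal import Decimal

def decode_decimal_literal(raw: bytes, scale: int) -> Decimal:
    """Decode a DECIMAL<P, S> literal: 16 bytes holding a little-endian
    signed 128-bit integer, divided by 10^S, per the table above."""
    unscaled = int.from_bytes(raw, byteorder="little", signed=True)
    return Decimal(unscaled).scaleb(-scale)

# 12345 with scale 2 represents the decimal value 123.45:
raw = (12345).to_bytes(16, byteorder="little", signed=True)
print(decode_decimal_literal(raw, 2))  # 123.45
```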

    User-Defined Types

    User-defined type classes can be created using a combination of pre-defined types. User-defined types are defined as part of simple extensions. An extension can declare an arbitrary number of user defined extension types. Once a type has been declared, it can be used in function declarations.

    A YAML example of an extension type is below:

    name: point
     structure:
       longitude: i32
       latitude: i32

    Type Syntax Parsing

    In many places, it is useful to have a human-readable string representation of data types. Substrait has a custom syntax for type declaration. The basic structure of a type declaration is:

    name?[variation]<param0,...,paramN>
     

    The components of this expression are:

    Component Description Required
    Name Each type has a name, expressed in arbitrary case (e.g. varchar and vArChAr are equivalent), although lowercase is preferred. Required
    Nullability indicator A type is either non-nullable or nullable. To express nullability, a question mark is added after the type name (before any parameters). Optional, defaults to non-nullable
    Variation When expressing a type, a user can define the type based on a type variation. Some systems use type variations to describe different underlying representations of the same data type. This is expressed as a bracketed integer such as [2]. Optional, defaults to [0]
    Parameters Compound types may have one or more configurable properties. The two main types of properties are integer and type properties. The parameters for each type correspond to a list of known properties associated with a type as declared in the order defined in the type specification. For compound types (types that contain types), the data type syntax will include nested type declarations. The one exception is structs, which are further outlined below. Required where parameters are defined

    Grammars

    It is relatively easy in most languages to produce simple parsers & emitters for the type syntax. To make that easier, Substrait also includes an ANTLR grammar to ease consumption and production of types. (The grammar also supports an entire language for representing plans as text.)
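
    As a sketch of how small such a parser can be, here is a regex-based reader for the declaration syntax above. It only splits top-level components and does not handle nested parameters; a real implementation would use the ANTLR grammar.

```python
import re

# name?[variation]<param0,...,paramN> split into its four components.
TYPE_PATTERN = re.compile(
    r"^(?P<name>\w+)"               # type class name (any case)
    r"(?P<nullable>\?)?"            # optional nullability indicator
    r"(?:\[(?P<variation>\d+)\])?"  # optional bracketed variation
    r"(?:<(?P<params>.*)>)?$"       # optional parameter pack
)

def parse_type(decl: str) -> dict:
    m = TYPE_PATTERN.match(decl.strip())
    if m is None:
        raise ValueError(f"not a type declaration: {decl!r}")
    return {
        "name": m["name"].lower(),       # lowercase is preferred
        "nullable": m["nullable"] is not None,
        "variation": int(m["variation"] or 0),
        "params": [p.strip() for p in m["params"].split(",")]
        if m["params"] else [],
    }

print(parse_type("VARCHAR?[2]<10>"))
```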

    Structs & Named Structs

    Structs are unique from other types because they have an arbitrary number of parameters. The parameters are recursive and may include their own subproperties. Struct parsing is declared in the following two ways:

    # Struct
     struct?[variation]<type0, type1,..., typeN>

     # Named struct
     nstruct?[variation]<name0:type0, name1:type1,..., nameN:typeN>

    Type System

    Substrait tries to cover the most common types used in data manipulation. Types beyond this common core may be represented using simple extensions.

    Substrait types fundamentally consist of four components:

    Component | Condition | Examples | Description
    Class | Always | i8, string, STRUCT, extensions | Together with the parameter pack, describes the set of non-null values supported by the type. Subdivided into simple and compound type classes.
    Nullability | Always | Either NULLABLE (? suffix) or REQUIRED (no suffix) | Describes whether values of this type can be null. Note that null is considered to be a special value of a nullable type, rather than the only value of a special null type.
    Variation | Always | No suffix or explicitly [0] (system-preferred), or an extension | Allows different variations of the same type class to exist in a system at a time, usually distinguished by in-memory format.
    Parameters | Compound types only | <10, 2> (for DECIMAL), <i32, string> (for STRUCT) | Some combination of zero or more data types or integers. The expected set of parameters and the significance of each parameter depend on the type class.

    Refer to Type Parsing for a description of the syntax used to describe types.

    Note

    Substrait employs a strict type system without any coercion rules. All changes in types must be made explicit via cast expressions.
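    As a sketch, the four components in the table above can be modeled as a small value type that renders itself in the syntax described under Type Parsing. The class and field names here are illustrative, not part of the Substrait specification:

```python
from dataclasses import dataclass, field

@dataclass
class SubstraitType:
    type_class: str                  # e.g. "i8", "string", "decimal", "struct"
    nullable: bool = False           # NULLABLE ("?" suffix) vs. REQUIRED
    variation: int = 0               # 0 = system-preferred variation
    parameters: list = field(default_factory=list)  # ints or nested types

    def __str__(self):
        s = self.type_class
        if self.nullable:
            s += "?"                               # nullability indicator
        if self.variation != 0:
            s += f"[{self.variation}]"             # non-default variation
        if self.parameters:                        # compound types only
            s += "<" + ", ".join(str(p) for p in self.parameters) + ">"
        return s
```

    For instance, `str(SubstraitType("decimal", nullable=True, parameters=[10, 2]))` renders as `decimal?<10, 2>`, and nesting a `SubstraitType` in `parameters` renders a compound type such as `struct<i32>`.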


    Type Variations

    Type variations may be used to represent differences in representation between different consumers. For example, an engine might support dictionary encoding for a string, or could be using either a row-wise or columnar representation of a struct. All variations of a type are expected to have the same semantics when operated on by functions or other expressions.

    All variations except the “system-preferred” variation (a.k.a. [0], see Type Parsing) must be defined using simple extensions. The key properties of these variations are:

    Property | Description
    Base Type Class | The type class that this variation belongs to.
    Name | The name used to reference this type variation. Should be unique among the variations of this parent type within a simple extension.
    Description | A human-readable description of the purpose of this type variation.
    Function Behavior | INHERITS or SEPARATE: whether functions that support the system-preferred variation implicitly also support this variation, or whether functions should be resolved independently. For example, if one has the function add(i8,i8) defined and then defines an i8 variation, this determines whether the i8 variation can be bound to the base add operation (INHERITS) or whether a specialized version of add needs to be defined specifically for this variation (SEPARATE). Defaults to INHERITS.
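    The INHERITS/SEPARATE distinction can be sketched as a lookup rule. The registry layout and function names below are hypothetical, for illustration only:

```python
# Behavior of each non-system-preferred variation: (class, variation) -> behavior.
VARIATION_BEHAVIOR = {
    ("i8", 1): "INHERITS",   # shares implementations with the [0] variation
    ("i8", 2): "SEPARATE",   # must be bound to its own implementations
}

# Registered implementations, keyed by name and argument signature,
# where each argument is a (type class, variation) pair.
FUNCTIONS = {
    ("add", (("i8", 0), ("i8", 0))),  # base add(i8, i8) on system-preferred i8
    ("add", (("i8", 2), ("i8", 2))),  # specialized add for variation [2]
}

def resolve(name, args):
    """Return the signature a call binds to, or None if it cannot be bound."""
    if (name, args) in FUNCTIONS:
        return (name, args)  # exact match: base type or a SEPARATE variation
    # INHERITS variations fall back to the system-preferred ([0]) signature.
    if all(VARIATION_BEHAVIOR.get(arg, "INHERITS") == "INHERITS" for arg in args):
        base = tuple((cls, 0) for cls, _ in args)
        if (name, base) in FUNCTIONS:
            return (name, base)
    return None
```

    Under this sketch, a call on the `[1]` variation binds to the base add(i8, i8), while the `[2]` variation binds only to its own specialized definition.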

    Talks