Export of internal ZetaSQL changes.

-- Change by ZetaSQL Team <[email protected]>: Updated the instructions for running execute_query with docker on MacOS with M1/M2 chips.
-- Change by ZetaSQL Team <[email protected]>: Add a note about MacOS users seeing the error `execute_query_macos cannot be opened because the developer cannot be verified.`
-- Change by ZetaSQL Team <[email protected]>: Refactoring in preparation for UPDATE constructor.
-- Change by ZetaSQL Team <[email protected]>: Change the ZetaSQL Dockerfile to support different build modes.
-- Change by Jeff Shute <[email protected]>: Add tests that check that a sql file runs successfully in execute_query.
-- Change by ZetaSQL Team <[email protected]>: add a new TO_JSON signature that supports arg `unsupported_fields`.
-- Change by Jeff Shute <[email protected]>: Add some more example queries in examples/pipe_queries.
-- Change by Brandon Dolphin <[email protected]>: Begin adding Measure type to TypeProto.
-- Change by ZetaSQL Team <[email protected]>: Add per column OPTIONS and WITH COLUMN OPTIONS to analyzer.
-- Change by ZetaSQL Team <[email protected]>: Handle lambda functions directly in the BuiltinFunctionRegistry scalar function APIs.
-- Change by ZetaSQL Team <[email protected]>: Add per column OPTIONS and WITH COLUMN OPTIONS to analyzer.
-- Change by ZetaSQL Team <[email protected]>: Add optional_ref library.
-- Change by John Fremlin <[email protected]>: Add a testcase for deeply nested structs and arrays in JSON
-- Change by ZetaSQL Team <[email protected]>: Update the ZetaSQL documentation:
-- Change by John Fremlin <[email protected]>: Truncate output for deeply nested array expressions in unparser
-- Change by ZetaSQL Team <[email protected]>: Update pipe syntax docs with TW peer review edits
-- Change by Jeff Shute <[email protected]>: Fix execute_query command line help.
-- Change by ZetaSQL Team <[email protected]>: add a new named arg `unsupported_fields` for the TO_JSON function.
-- Change by John Fremlin <[email protected]>: Truncate output for deeply nested CASE expressions in unparser
-- Change by ZetaSQL Team <[email protected]>: Add MAP_REPLACE signatures, and reference implementation for KV pairs version
-- Change by ZetaSQL Team <[email protected]>: Remove unnecessarily explicit function registrations from reference_impl/function.cc
-- Change by Jeff Shute <[email protected]>: Adjust text area size so results are more visible.
-- Change by ZetaSQL Team <[email protected]>: Disable formatting of SQL inside non-multiline string literals.
-- Change by ZetaSQL Team <[email protected]>: Fixed issue with formatting SQL inside string literals when input string contains \r\n line endings.
-- Change by ZetaSQL Team <[email protected]>: Format textproto inside annotated string literal.
-- Change by ZetaSQL Team <[email protected]>: Disambiguate between open and close brackets annotations for braced constructor syntax.
-- Change by Jeff Shute <[email protected]>: Improve multi-statement output in execute_query web.
-- Change by ZetaSQL Team <[email protected]>: add a new named arg `unsupported_fields` for the TO_JSON function.
-- Change by ZetaSQL Team <[email protected]>: add a new built-in enum `UnsupportedFields` to be used by TO_JSON.
-- Change by ZetaSQL Team <[email protected]>: Record parse location for OrderByItem iff record type is not PARSE_LOCATION_RECORD_NONE.
-- Change by ZetaSQL Team <[email protected]>: Unify Lambda and non-lambda AlgebrizeFunctionCall codepaths
-- Change by ZetaSQL Team <[email protected]>: small formatting updates for named arguments
-- Change by ZetaSQL Team <[email protected]>: Fix the example Docker image name in the ZetaSQL doc.
-- Change by ZetaSQL Team <[email protected]>: Refactor the parse AST and the grammar to use postfix table operators (e.g. TABLESAMPLE) on ASTTableExpression.
-- Change by ZetaSQL Team <[email protected]>: Fix ZetaSQL documentation.

GitOrigin-RevId: a68e25b308dadf3e78c4d22ec41adf72f8b08e5b
Change-Id: I586b6974dbdb4e2bb4c99ba641ef96916ec33ba6
google · Aug 21, 2024 · 194cd32
1 parent f30c319
commit 194cd32
Showing 220 changed files with 5,902 additions and 6,180 deletions.
diff --git a/Dockerfile b/Dockerfile
@@ -52,17 +52,17 @@ USER zetasql
 
 ENV BAZEL_ARGS="--config=g++"
 
-# Pre-build the binary for execute_query so that users can try out zetasql
-# directly. Users can modify the target in the docker file or enter the
-# container and build other targets as needed.
-RUN cd zetasql                                                              && \
-    CC=/usr/bin/gcc CXX=/usr/bin/g++                                           \
-    bazel build ${BAZEL_ARGS} -c opt //zetasql/tools/execute_query:execute_query
-
-# Create a shortcut for execute_query.
 ENV HOME=/home/zetasql
 RUN mkdir -p $HOME/bin
-RUN ln -s /zetasql/bazel-bin/zetasql/tools/execute_query/execute_query $HOME/bin/execute_query
+
+# Supported MODE:
+# - `build` (default): Builds all ZetaSQL targets.
+# - `execute_query`: Installs the `execute_query` tool only. Erases all other
+#                    build artifacts.
+ARG MODE=build
+
+RUN cd zetasql && ./docker_build.sh $MODE
+
 ENV PATH=$PATH:$HOME/bin
 
 WORKDIR /zetasql
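
The `MODE` build argument added to the Dockerfile above can presumably be selected with Docker's standard `--build-arg` flag. The following is a minimal sketch under that assumption: `zetasql_build_cmd` is a hypothetical helper (not part of ZetaSQL), `my-zetasql-image` is the placeholder image name used in the README, and the function only prints the command instead of running it.

```shell
#!/bin/bash
# Sketch only: `zetasql_build_cmd` is a hypothetical helper, not part of ZetaSQL.
# It prints (does not run) a `docker build` command for the requested MODE.
zetasql_build_cmd() {
  local mode="${1:-build}"   # the Dockerfile declares `ARG MODE=build`
  case "$mode" in
    build|execute_query) ;;  # the two modes listed in the Dockerfile comment
    *)
      echo "Unknown mode: $mode" >&2
      return 1
      ;;
  esac
  # `--build-arg` is the standard Docker flag for setting an ARG value.
  echo "sudo docker build . -t my-zetasql-image -f Dockerfile --build-arg MODE=$mode"
}

zetasql_build_cmd execute_query
```

Validating the mode before building mirrors the check in `docker_build.sh`, so an unsupported value fails early rather than partway through the image build.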
diff --git a/README.md b/README.md
@@ -11,21 +11,24 @@ giving errors for unuspported features.
 ZetaSQL's compliance test suite can be used to validate query engine
 implementations are correct and consistent.
 
-ZetaSQL implements the ZetaSQL language, which is used across several of
+ZetaSQL implements the GoogleSQL language, which is used across several of
 Google's SQL products, both publicly and internally, including BigQuery,
 Spanner, F1, BigTable, Dremel, Procella, and others.
 
-ZetaSQL and ZetaSQL have been described in these publications:
+GoogleSQL and ZetaSQL have been described in these publications:
 
-* (CDMS 2022) [ZetaSQL: A SQL Language as a Component](https://cdmsworkshop.github.io/2022/Slides/Fri_C2.5_DavidWilhite.pptx) (Slides)
+* (CDMS 2022) [GoogleSQL: A SQL Language as a Component](https://cdmsworkshop.github.io/2022/Slides/Fri_C2.5_DavidWilhite.pptx) (Slides)
 * (SIGMOD 2017) [Spanner: Becoming a SQL System](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46103.pdf) -- See section 6.
-* (VLDB 2024) [SQL Has Problems. We Can Fix Them: Pipe Syntax in SQL](https://research.google/pubs/pub1005959/) -- Describes ZetaSQL's new pipe query syntax.
+* (VLDB 2024) [SQL Has Problems. We Can Fix Them: Pipe Syntax in SQL](https://research.google/pubs/pub1005959/) -- Describes GoogleSQL's new pipe query syntax.
 
 Some other documentation:
 
 * [ZetaSQL Language Reference](docs/README.md)
 * [ZetaSQL Resolved AST](docs/resolved_ast.md), documenting the intermediate representation produced by the ZetaSQL analyzer.
 * [ZetaSQL Toolkit](https://github.com/GoogleCloudPlatform/zetasql-toolkit), a project using ZetaSQL to analyze and understand queries against BigQuery, and other ZetaSQL engines.
+* Pipe query syntax
+    * See the [reference documentation](https://github.com/google/zetasql/blob/master/docs/pipe-syntax.md) and [research paper](https://research.google/pubs/pub1005959/).
+    * See some [example scripts](zetasql/examples/pipe_queries) and [TPC-H queries](zetasql/examples/tpch).
 
 ## Project Overview
 
@@ -62,7 +65,8 @@ You can run it using binaries from
 instructions below.
 
 There are some runnable example queries in
-[tpch examples](../zetasql/examples/tpch/README.md).
+[`zetasql/examples/tpch`](zetasql/examples/tpch) and
+[`zetasql/examples/pipe_queries`](zetasql/examples/pipe_queries).
 
 ### Getting and Running `execute_query`
 #### Pre-built Binaries
@@ -72,14 +76,20 @@ the [Releases](https://github.com/google/zetasql/releases) page. You can run
 the downloaded binary like:
 
 ```bash
+chmod +x execute_query_linux
 ./execute_query_linux --web
 ```
 
+MacOS users may see the error `execute_query_macos cannot be opened because the developer cannot be verified.`.
+You can right click the `execute_query_macos` file, click "open", and then you
+should be able to run the binary.
+
 Note the prebuilt binaries require GCC-9+ and tzdata. If you run into dependency
-issues, you can try running `execute_query` with Docker. See the
-[Run with Docker](#run-with-docker) section.
+issues or if the binary is incompatible with your platform, you can try running
+`execute_query` with Docker. See the [Run with Docker](#run-with-docker)
+section.
 
-#### Running from a bazel build
+#### Running from a Bazel Build
 
 You can build `execute_query` with Bazel from source and run it by:
 
@@ -89,14 +99,29 @@ bazel run zetasql/tools/execute_query:execute_query -- --web
 
 #### Run with Docker
 
-You can run `execute_query` using Docker. First download the pre-built Docker
-image `zetasql` or build your own from Dockerfile. See the instructions in the
-[Build With Docker](#build-with-docker) section.
+You can run `execute_query` using Docker. Download the pre-built Docker image
+file `zetasql_docker.tar.gz` from the
+[Releases](https://github.com/google/zetasql/releases) page, and load the image
+using:
+
+```bash
+sudo docker load -i /path/to/the/downloaded/zetasql_docker.tar.gz
+```
+
+The Docker image name is `zetasql`. (You can also build a Docker image locally
+using the instructions in the [Build with Docker](#build-with-docker) section.)
 
-Assuming your Docker image name is MyZetaSQLImage, run:
+You can then run `execute_query` using:
 
 ```bash
-sudo docker run --init -it -h=$(hostname) -p 8080:8080 MyZetasqlImage execute_query --web
+sudo docker run --init -it -h=$(hostname) -p 8080:8080 zetasql execute_query --web
+```
+
+If you are using MacOS with an Apple M1/M2 chip, add the additional argument
+`--platform=linux/amd64`:
+
+```bash
+sudo docker run --init -it -h=$(hostname) -p 8080:8080 --platform linux/amd64 zetasql execute_query --web
 ```
 
 Argument descriptions:
@@ -106,6 +131,7 @@ Argument descriptions:
 * `-h=$(hostname)`: Makes the hostname of the container the same as that of the
                     host.
 * `-p 8080:8080`: Sets up port forwarding.
+* `zetasql`: The docker image name.
 
 `-h=$(hostname)` and `-p 8080:8080` together make the URL address of the
 web server accessible from the host machine.
@@ -114,7 +140,7 @@ Alternatively, you can run this to start a bash shell, and then run
 `execute_query` inside:
 
 ```bash
-sudo docker run --init -it -h=$(hostname) -p 8080:8080 MyZetasqlImage
+sudo docker run --init -it -h=$(hostname) -p 8080:8080 my-zetasql-image
 
 # Inside the container bash shell
 execute_query --web
@@ -149,7 +175,7 @@ bazel build ...
 bazel run //zetasql/tools/execute_query:execute_query -- --web
 
 # The built binary can be found under bazel-bin and run directly.
-bazel-bin/tools/execute_query:execute_query --web
+bazel-bin/zetasql/tools/execute_query/execute_query --web
 
 # Build and run a test.
 bazel test //zetasql/parser:parser_set_test
@@ -165,28 +191,30 @@ version can be found in the `zetasql_deps_step_2.bzl` file.
 ZetaSQL also provides a `Dockerfile` which configures all the dependencies so
 that users can build ZetaSQL more easily across different platforms.
 
-To build the Docker image locally (called MyZetaSQLImage here), run:
+To build the Docker image locally (called `my-zetasql-image` here), run:
 
 ```bash
-sudo docker build . -t MyZetaSQLImage -f Dockerfile
+sudo docker build . -t my-zetasql-image -f Dockerfile
 ```
 
 Alternatively, ZetaSQL provides pre-built Docker images named `zetasql`. See the
 [Releases](https://github.com/google/zetasql/releases) page. You can load the
 downloaded image by:
 
 ```bash
-sudo docker load -i /path/to/the/downloaded/zetasql_docker.tar
+sudo docker load -i /path/to/the/downloaded/zetasql_docker.tar.gz
 ```
 
 To run builds or other commands inside the Docker environment, run this command
 to open a bash shell inside the container:
 
 ```bash
 # Start a bash shell running inside the Docker container.
-sudo docker run -it MyZetaSQLImage
+sudo docker run -it my-zetasql-image
 ```
 
+Replace `my-zetasql-image` with `zetasql` if you use the pre-built Docker image.
+
 Then you can run the commands from the [Build with Bazel](#build-with-bazel)
 section above.
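
The "Run with Docker" instructions in this README change add `--platform linux/amd64` for Macs with Apple M1/M2 chips. One way to pick the right command automatically is sketched below. It assumes that `uname -m` reports `arm64` on those machines, and `zetasql_run_cmd` is a hypothetical helper (not part of ZetaSQL) that only prints the command.

```shell
#!/bin/bash
# Sketch only: `zetasql_run_cmd` is a hypothetical helper, not part of ZetaSQL.
# It prints (does not run) the `docker run` command from the README.
zetasql_run_cmd() {
  # Architecture to target; defaults to the current machine's.
  local arch="${1:-$(uname -m)}"
  local platform_args=""
  # Assumption: Apple-silicon Macs report `arm64` from `uname -m`.
  if [ "$arch" = "arm64" ]; then
    platform_args="--platform linux/amd64 "
  fi
  # `\$(hostname)` is escaped so the printed command stays copy-pasteable.
  echo "sudo docker run --init -it -h=\$(hostname) -p 8080:8080 ${platform_args}zetasql execute_query --web"
}

zetasql_run_cmd
```

Passing the architecture as an optional argument keeps the helper easy to check on any machine, for example `zetasql_run_cmd arm64`.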
 

diff --git a/docker_build.sh b/docker_build.sh
@@ -0,0 +1,40 @@
+#!/bin/bash
+#
+# Copyright 2024 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+set -e
+set -x
+
+MODE=$1
+
+export CC=/usr/bin/gcc
+export CXX=/usr/bin/g++
+
+if [ "$MODE" = "build" ]; then
+  # Build everything.
+  bazel build ${BAZEL_ARGS} -c opt ...
+elif [ "$MODE" = "execute_query" ]; then
+  # Install the execute_query tool.
+  bazel build ${BAZEL_ARGS} -c opt --dynamic_mode=off //zetasql/tools/execute_query:execute_query
+  # Move the generated binary to the home directory so that users can run it
+  # directly.
+  cp /zetasql/bazel-bin/zetasql/tools/execute_query/execute_query $HOME/bin/execute_query
+  # Remove the downloaded and generated artifacts to keep the image small.
+  bazel clean --expunge
+else
+  echo "Unknown mode: $MODE"
+  echo "Supported modes are: build, execute_query"
+  exit 1
+fi
diff --git a/docs/README.md b/docs/README.md
@@ -7,14 +7,15 @@
 The topics in this section provide the reference information you need to work
 with ZetaSQL:
 
-* [Lexical Structure and Syntax](https://github.com/google/zetasql/blob/master/docs/lexical.md)
-* [Expressions, Functions, and Operators](https://github.com/google/zetasql/blob/master/docs/functions-and-operators.md)
-* [Data Types](https://github.com/google/zetasql/blob/master/docs/data-types.md)
-* [Query Syntax](https://github.com/google/zetasql/blob/master/docs/query-syntax.md)
-* [Data Manipulation Language Reference](https://github.com/google/zetasql/blob/master/docs/data-manipulation-language.md)
-* [Data Model](https://github.com/google/zetasql/blob/master/docs/data-model.md)
-* [Data Definition Language Reference](https://github.com/google/zetasql/blob/master/docs/data-definition-language.md)
-* [Modules](https://github.com/google/zetasql/blob/master/docs/modules.md)
+* [Lexical Structure and Syntax](lexical.md)
+* [Expressions, Functions, and Operators](functions-and-operators.md)
+* [Data Types](data-types.md)
+* [Query Syntax](query-syntax.md)
+* [Pipe Query Syntax](pipe-syntax.md)
+* [Data Manipulation Language Reference](data-manipulation-language.md)
+* [Data Model](data-model.md)
+* [Data Definition Language Reference](data-definition-language.md)
+* [Modules](modules.md)
 
 ## License
 

diff --git a/docs/aggregate-dp-functions.md b/docs/aggregate-dp-functions.md
@@ -186,7 +186,7 @@ determine the optimal privacy parameters for your dataset and organization.
 WITH DIFFERENTIAL_PRIVACY ...
   AVG(
     expression,
-    [contribution_bounds_per_group => (lower_bound, upper_bound)]
+    [ contribution_bounds_per_group => (lower_bound, upper_bound) ]
   )
 ```
 
@@ -201,9 +201,9 @@ and can support the following arguments:
 
 + `expression`: The input expression. This can be any numeric input type,
   such as `INT64`.
-+ `contribution_bounds_per_group`: The
-  [contribution bounds named argument][dp-clamped-named].
-  Perform clamping per each group separately before performing intermediate
++ `contribution_bounds_per_group`: A named argument with a
+  [contribution bound][dp-clamped-named].
+  Performs clamping for each group separately before performing intermediate
   grouping on the privacy unit column.
 
 **Return type**
@@ -330,7 +330,7 @@ noise, see [Remove noise][dp-noise].
 WITH DIFFERENTIAL_PRIVACY ...
   COUNT(
     *,
-    [contribution_bounds_per_group => (lower_bound, upper_bound)]
+    [ contribution_bounds_per_group => (lower_bound, upper_bound) ]
   )
 ```
 
@@ -343,9 +343,9 @@ is an aggregation across a privacy unit column.
 This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax]
 and can support the following argument:
 
-+ `contribution_bounds_per_group`: The
-  [contribution bounds named argument][dp-clamped-named].
-  Perform clamping per each group separately before performing intermediate
++ `contribution_bounds_per_group`: A named argument with a
+  [contribution bound][dp-clamped-named].
+  Performs clamping for each group separately before performing intermediate
   grouping on the privacy unit column.
 
 **Return type**
@@ -468,9 +468,9 @@ and can support these arguments:
 
 + `expression`: The input expression. This expression can be any
   numeric input type, such as `INT64`.
-+ `contribution_bounds_per_group`: The
-  [contribution bounds named argument][dp-clamped-named].
-  Perform clamping per each group separately before performing intermediate
++ `contribution_bounds_per_group`: A named argument with a
+  [contribution bound][dp-clamped-named].
+  Performs clamping for each group separately before performing intermediate
   grouping on the privacy unit column.
 
 **Return type**
@@ -609,9 +609,9 @@ and can support these arguments:
   such as `INT64`. `NULL` values are always ignored.
 + `percentile`: The percentile to compute. The percentile must be a literal in
   the range `[0, 1]`.
-+ `contribution_bounds_per_row`: The
-  [contribution bounds named argument][dp-clamped-named].
-  Perform clamping per each row separately before performing intermediate
++ `contribution_bounds_per_row`: A named argument with a
+  [contribution bound][dp-clamped-named].
+  Performs clamping for each row separately before performing intermediate
   grouping on the privacy unit column.
 
 `NUMERIC` and `BIGNUMERIC` arguments are not allowed.
@@ -689,7 +689,7 @@ GROUP BY item;
 WITH DIFFERENTIAL_PRIVACY ...
   SUM(
     expression,
-    [contribution_bounds_per_group => (lower_bound, upper_bound)]
+    [ contribution_bounds_per_group => (lower_bound, upper_bound) ]
   )
 ```
 
@@ -703,10 +703,9 @@ and can support these arguments:
 
 + `expression`: The input expression. This can be any numeric input type,
   such as `INT64`. `NULL` values are always ignored.
-+ `contribution_bounds_per_group`: The
-  [contribution bounds named argument][dp-clamped-named].
-  Perform clamping per each group separately before performing intermediate
-  grouping on the privacy unit column.
++ `contribution_bounds_per_group`: A named argument with a
+  [contribution bound][dp-clamped-named]. Performs clamping for each group
+  separately before performing intermediate grouping on the privacy unit column.
 
 **Return type**
 
@@ -830,7 +829,7 @@ noise, see [Use differential privacy][dp-noise].
 WITH DIFFERENTIAL_PRIVACY ...
   VAR_POP(
     expression,
-    [contribution_bounds_per_row => (lower_bound, upper_bound)]
+    [ contribution_bounds_per_row => (lower_bound, upper_bound) ]
   )
 ```
 
@@ -847,9 +846,9 @@ can support these arguments:
 
 + `expression`: The input expression. This can be any numeric input type,
   such as `INT64`. `NULL`s are always ignored.
-+ `contribution_bounds_per_row`: The
-  [contribution bounds named argument][dp-clamped-named].
-  Perform clamping per each row separately before performing intermediate
++ `contribution_bounds_per_row`: A named argument with a
+  [contribution bound][dp-clamped-named].
+  Performs clamping for each row separately before performing intermediate
   grouping on individual user values.
 
 `NUMERIC` and `BIGNUMERIC` arguments are not allowed.

diff --git a/docs/aggregate-function-calls.md b/docs/aggregate-function-calls.md
@@ -4,10 +4,10 @@
 
 # Aggregate function calls
 
-An aggregate function is a function that summarizes the rows of a group into a
-single value. When an aggregate function is used with the `OVER` clause, it
-becomes a window function, which computes values over a group of rows and then
-returns a single result for each row.
+An aggregate function summarizes the rows of a group into a single value. When
+an aggregate function is used with the `OVER` clause, it becomes a window
+function, which computes values over a group of rows and then returns a single
+result for each row.
 
 ## Aggregate function call syntax