Export of internal ZetaSQL changes.
--
Change by ZetaSQL Team <[email protected]>:
Updated the instructions for running execute_query with docker on MacOS with M1/M2 chips.
--
Change by ZetaSQL Team <[email protected]>:
Add a note about MacOS users seeing the error `execute_query_macos cannot be opened because the developer cannot be verified.`
--
Change by ZetaSQL Team <[email protected]>:
Refactoring in preparation for UPDATE constructor.
--
Change by ZetaSQL Team <[email protected]>:
Change the ZetaSQL Dockerfile to support different build modes.
--
Change by Jeff Shute <[email protected]>:
Add tests that check that a sql file runs successfully in execute_query.
--
Change by ZetaSQL Team <[email protected]>:
Add a new TO_JSON signature that supports arg `unsupported_fields`.
--
Change by Jeff Shute <[email protected]>:
Add some more example queries in examples/pipe_queries.
--
Change by Brandon Dolphin <[email protected]>:
Begin adding Measure type to TypeProto.
--
Change by ZetaSQL Team <[email protected]>:
Add per column OPTIONS and WITH COLUMN OPTIONS to analyzer.
--
Change by ZetaSQL Team <[email protected]>:
Handle lambda functions directly in the BuiltinFunctionRegistry scalar function APIs.
--
Change by ZetaSQL Team <[email protected]>:
Add per column OPTIONS and WITH COLUMN OPTIONS to analyzer.
--
Change by ZetaSQL Team <[email protected]>:
Add optional_ref library.
--
Change by John Fremlin <[email protected]>:
Add a testcase for deeply nested structs and arrays in JSON
--
Change by ZetaSQL Team <[email protected]>:
Update the ZetaSQL documentation:
--
Change by John Fremlin <[email protected]>:
Truncate output for deeply nested array expressions in unparser
--
Change by ZetaSQL Team <[email protected]>:
Update pipe syntax docs with TW peer review edits
--
Change by Jeff Shute <[email protected]>:
Fix execute_query command line help.
--
Change by ZetaSQL Team <[email protected]>:
Add a new named arg `unsupported_fields` for the TO_JSON function.
--
Change by John Fremlin <[email protected]>:
Truncate output for deeply nested CASE expressions in unparser
--
Change by ZetaSQL Team <[email protected]>:
Add MAP_REPLACE signatures, and reference implementation for KV pairs version
--
Change by ZetaSQL Team <[email protected]>:
Remove unnecessarily explicit function registrations from reference_impl/function.cc
--
Change by Jeff Shute <[email protected]>:
Adjust text area size so results are more visible.
--
Change by ZetaSQL Team <[email protected]>:
Disable formatting of SQL inside non-multiline string literals.
--
Change by ZetaSQL Team <[email protected]>:
Fixed issue with formatting SQL inside string literals when input string contains \r\n line endings.
--
Change by ZetaSQL Team <[email protected]>:
Format textproto inside annotated string literal.
--
Change by ZetaSQL Team <[email protected]>:
Disambiguate between open and close brackets annotations for braced constructor syntax.
--
Change by Jeff Shute <[email protected]>:
Improve multi-statement output in execute_query web.
--
Change by ZetaSQL Team <[email protected]>:
Add a new named arg `unsupported_fields` for the TO_JSON function.
--
Change by ZetaSQL Team <[email protected]>:
add a new built-in enum `UnsupportedFields` to be used by TO_JSON.
--
Change by ZetaSQL Team <[email protected]>:
Record parse location for OrderByItem iff record type is not PARSE_LOCATION_RECORD_NONE.
--
Change by ZetaSQL Team <[email protected]>:
Unify Lambda and non-lambda AlgebrizeFunctionCall codepaths
--
Change by ZetaSQL Team <[email protected]>:
small formatting updates for named arguments
--
Change by ZetaSQL Team <[email protected]>:
Fix the example Docker image name in the ZetaSQL doc.
--
Change by ZetaSQL Team <[email protected]>:
Refactor the parse AST and the grammar to use postfix table operators (e.g. TABLESAMPLE) on ASTTableExpression.
--
Change by ZetaSQL Team <[email protected]>:
Fix ZetaSQL documentation.

GitOrigin-RevId: a68e25b308dadf3e78c4d22ec41adf72f8b08e5b
Change-Id: I586b6974dbdb4e2bb4c99ba641ef96916ec33ba6
ZetaSQL Team authored and KimiWaRokkuWoKikanai committed Aug 21, 2024
1 parent f30c319 commit 194cd32
Showing 220 changed files with 5,902 additions and 6,180 deletions.
18 changes: 9 additions & 9 deletions Dockerfile
@@ -52,17 +52,17 @@ USER zetasql

ENV BAZEL_ARGS="--config=g++"

# Pre-build the binary for execute_query so that users can try out zetasql
# directly. Users can modify the target in the docker file or enter the
# container and build other targets as needed.
RUN cd zetasql && \
CC=/usr/bin/gcc CXX=/usr/bin/g++ \
bazel build ${BAZEL_ARGS} -c opt //zetasql/tools/execute_query:execute_query

# Create a shortcut for execute_query.
ENV HOME=/home/zetasql
RUN mkdir -p $HOME/bin
RUN ln -s /zetasql/bazel-bin/zetasql/tools/execute_query/execute_query $HOME/bin/execute_query

# Supported MODE:
# - `build` (default): Builds all ZetaSQL targets.
# - `execute_query`: Installs the `execute_query` tool only. Erases all other
# build artifacts.
ARG MODE=build

RUN cd zetasql && ./docker_build.sh $MODE

ENV PATH=$PATH:$HOME/bin

WORKDIR /zetasql
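
The `MODE` build argument added in this hunk would typically be selected with Docker's `--build-arg` flag. The following is a minimal sketch, assuming the image tag `my-zetasql-image` from the README; the helper name `zetasql_build_cmd` is hypothetical and not part of the repository. It only prints the command, so it can be inspected before being run.

```shell
# Hypothetical helper (not part of the repository): prints the docker build
# command for a given MODE without executing it.
zetasql_build_cmd() {
  mode="${1:-build}"  # Same default as `ARG MODE=build` in the Dockerfile.
  case "$mode" in
    build|execute_query) ;;
    *)
      echo "Unknown mode: $mode" >&2
      return 1
      ;;
  esac
  echo "sudo docker build . --build-arg MODE=$mode -t my-zetasql-image -f Dockerfile"
}

# Print the command for the smaller, execute_query-only image.
zetasql_build_cmd execute_query
```

Rejecting unknown modes before invoking Docker mirrors the check in `docker_build.sh`, so a typo fails immediately rather than partway through an image build.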
66 changes: 47 additions & 19 deletions README.md
@@ -11,21 +11,24 @@ giving errors for unuspported features.
ZetaSQL's compliance test suite can be used to validate query engine
implementations are correct and consistent.

ZetaSQL implements the ZetaSQL language, which is used across several of
ZetaSQL implements the GoogleSQL language, which is used across several of
Google's SQL products, both publicly and internally, including BigQuery,
Spanner, F1, BigTable, Dremel, Procella, and others.

ZetaSQL and ZetaSQL have been described in these publications:
GoogleSQL and ZetaSQL have been described in these publications:

* (CDMS 2022) [ZetaSQL: A SQL Language as a Component](https://cdmsworkshop.github.io/2022/Slides/Fri_C2.5_DavidWilhite.pptx) (Slides)
* (CDMS 2022) [GoogleSQL: A SQL Language as a Component](https://cdmsworkshop.github.io/2022/Slides/Fri_C2.5_DavidWilhite.pptx) (Slides)
* (SIGMOD 2017) [Spanner: Becoming a SQL System](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46103.pdf) -- See section 6.
* (VLDB 2024) [SQL Has Problems. We Can Fix Them: Pipe Syntax in SQL](https://research.google/pubs/pub1005959/) -- Describes ZetaSQL's new pipe query syntax.
* (VLDB 2024) [SQL Has Problems. We Can Fix Them: Pipe Syntax in SQL](https://research.google/pubs/pub1005959/) -- Describes GoogleSQL's new pipe query syntax.

Some other documentation:

* [ZetaSQL Language Reference](docs/README.md)
* [ZetaSQL Resolved AST](docs/resolved_ast.md), documenting the intermediate representation produced by the ZetaSQL analyzer.
* [ZetaSQL Toolkit](https://github.com/GoogleCloudPlatform/zetasql-toolkit), a project using ZetaSQL to analyze and understand queries against BigQuery, and other ZetaSQL engines.
* Pipe query syntax
* See the [reference documentation](https://github.com/google/zetasql/blob/master/docs/pipe-syntax.md) and [research paper](https://research.google/pubs/pub1005959/).
* See some [example scripts](zetasql/examples/pipe_queries) and [TPC-H queries](zetasql/examples/tpch).

## Project Overview

@@ -62,7 +65,8 @@ You can run it using binaries from
instructions below.

There are some runnable example queries in
[tpch examples](../zetasql/examples/tpch/README.md).
[`zetasql/examples/tpch`](zetasql/examples/tpch) and
[`zetasql/examples/pipe_queries`](zetasql/examples/pipe_queries).

### Getting and Running `execute_query`
#### Pre-built Binaries
@@ -72,14 +76,20 @@ the [Releases](https://github.com/google/zetasql/releases) page. You can run
the downloaded binary like:

```bash
chmod +x execute_query_linux
./execute_query_linux --web
```

MacOS users may see the error `execute_query_macos cannot be opened because the developer cannot be verified.`.
You can right click the `execute_query_macos` file, click "open", and then you
should be able to run the binary.

Note the prebuilt binaries require GCC-9+ and tzdata. If you run into dependency
issues, you can try running `execute_query` with Docker. See the
[Run with Docker](#run-with-docker) section.
issues or if the binary is incompatible with your platform, you can try running
`execute_query` with Docker. See the [Run with Docker](#run-with-docker)
section.

#### Running from a bazel build
#### Running from a Bazel Build

You can build `execute_query` with Bazel from source and run it by:

@@ -89,14 +99,29 @@ bazel run zetasql/tools/execute_query:execute_query -- --web

#### Run with Docker

You can run `execute_query` using Docker. First download the pre-built Docker
image `zetasql` or build your own from Dockerfile. See the instructions in the
[Build With Docker](#build-with-docker) section.
You can run `execute_query` using Docker. Download the pre-built Docker image
file `zetasql_docker.tar.gz` from the
[Releases](https://github.com/google/zetasql/releases) page, and load the image
using:

```bash
sudo docker load -i /path/to/the/downloaded/zetasql_docker.tar.gz
```

The Docker image name is `zetasql`. (You can also build a Docker image locally
using the instructions in the [Build with Docker](#build-with-docker) section.)

Assuming your Docker image name is MyZetaSQLImage, run:
You can then run `execute_query` using:

```bash
sudo docker run --init -it -h=$(hostname) -p 8080:8080 MyZetasqlImage execute_query --web
sudo docker run --init -it -h=$(hostname) -p 8080:8080 zetasql execute_query --web
```

If you are using MacOS with an Apple M1/M2 chip, add the additional argument
`--platform=linux/amd64`:

```bash
sudo docker run --init -it -h=$(hostname) -p 8080:8080 --platform linux/amd64 zetasql execute_query --web
```
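
The platform choice above could also be made by a small wrapper. This is a sketch only, assuming that `uname -m` reports `arm64` on Apple M1/M2 machines; the helper name `platform_arg` is hypothetical. It prints the resulting command instead of running it.

```shell
# Hypothetical helper: prints the extra docker argument for a machine type.
# $1 is the machine name as reported by `uname -m` (defaults to this host).
platform_arg() {
  machine="${1:-$(uname -m)}"
  case "$machine" in
    arm64) echo "--platform linux/amd64" ;;  # Apple M1/M2 (assumption)
    *) echo "" ;;
  esac
}

# Print the resulting command; \$(hostname) is left for the shell that runs it.
echo "sudo docker run --init -it -h=\$(hostname) -p 8080:8080 $(platform_arg) zetasql execute_query --web"
```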

Argument descriptions:
@@ -106,6 +131,7 @@ Argument descriptions:
* `-h=$(hostname)`: Makes the hostname of the container the same as that of the
host.
* `-p 8080:8080`: Sets up port forwarding.
* `zetasql`: The docker image name.

`-h=$(hostname)` and `-p 8080:8080` together make the URL address of the
web server accessible from the host machine.
@@ -114,7 +140,7 @@ Alternatively, you can run this to start a bash shell, and then run
`execute_query` inside:

```bash
sudo docker run --init -it -h=$(hostname) -p 8080:8080 MyZetasqlImage
sudo docker run --init -it -h=$(hostname) -p 8080:8080 my-zetasql-image

# Inside the container bash shell
execute_query --web
@@ -149,7 +175,7 @@ bazel build ...
bazel run //zetasql/tools/execute_query:execute_query -- --web

# The built binary can be found under bazel-bin and run directly.
bazel-bin/tools/execute_query:execute_query --web
bazel-bin/zetasql/tools/execute_query/execute_query --web

# Build and run a test.
bazel test //zetasql/parser:parser_set_test
@@ -165,28 +191,30 @@ version can be found in the `zetasql_deps_step_2.bzl` file.
ZetaSQL also provides a `Dockerfile` which configures all the dependencies so
that users can build ZetaSQL more easily across different platforms.

To build the Docker image locally (called MyZetaSQLImage here), run:
To build the Docker image locally (called `my-zetasql-image` here), run:

```bash
sudo docker build . -t MyZetaSQLImage -f Dockerfile
sudo docker build . -t my-zetasql-image -f Dockerfile
```

Alternatively, ZetaSQL provides pre-built Docker images named `zetasql`. See the
[Releases](https://github.com/google/zetasql/releases) page. You can load the
downloaded image by:

```bash
sudo docker load -i /path/to/the/downloaded/zetasql_docker.tar
sudo docker load -i /path/to/the/downloaded/zetasql_docker.tar.gz
```

To run builds or other commands inside the Docker environment, run this command
to open a bash shell inside the container:

```bash
# Start a bash shell running inside the Docker container.
sudo docker run -it MyZetaSQLImage
sudo docker run -it my-zetasql-image
```

Replace `my-zetasql-image` with `zetasql` if you use the pre-built Docker image.

Then you can run the commands from the [Build with Bazel](#build-with-bazel)
section above.

40 changes: 40 additions & 0 deletions docker_build.sh
@@ -0,0 +1,40 @@
#!/bin/bash
#
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -e
set -x

MODE=$1

CC=/usr/bin/gcc
CXX=/usr/bin/g++

if [ "$MODE" = "build" ]; then
# Build everything.
bazel build ${BAZEL_ARGS} -c opt ...
elif [ "$MODE" = "execute_query" ]; then
# Install the execute_query tool.
bazel build ${BAZEL_ARGS} -c opt --dynamic_mode=off //zetasql/tools/execute_query:execute_query
# Move the generated binary to the home directory so that users can run it
# directly.
cp /zetasql/bazel-bin/zetasql/tools/execute_query/execute_query $HOME/bin/execute_query
# Remove the downloaded and generated artifacts to keep the image small.
bazel clean --expunge
else
echo "Unknown mode: $MODE"
echo "Supported modes are: build, execute_query"
exit 1
fi
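
One detail that may be worth checking in the script above: `CC` and `CXX` are assigned without `export`. A plain shell assignment is visible to child processes such as `bazel` only if the variable is exported or was already in the environment, which this diff does not show. The sketch below illustrates the general shell behavior with a stand-in variable, `DEMO_CC`, a hypothetical name chosen so that it cannot clash with a real setting.

```shell
# DEMO_CC is a hypothetical stand-in for CC/CXX.
unset DEMO_CC
DEMO_CC=/usr/bin/gcc

# Not exported yet: a child shell does not inherit the assignment.
before=$(sh -c 'echo "${DEMO_CC:-unset}"')

export DEMO_CC

# Exported: a child shell now sees the value.
after=$(sh -c 'echo "${DEMO_CC:-unset}"')

echo "before export: $before"
echo "after export: $after"
```

If the compiler variables are not otherwise present in the image's environment, writing `export CC=/usr/bin/gcc` and `export CXX=/usr/bin/g++` in the script would make them reach `bazel`.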
17 changes: 9 additions & 8 deletions docs/README.md
@@ -7,14 +7,15 @@
The topics in this section provide the reference information you need to work
with ZetaSQL:

* [Lexical Structure and Syntax](https://github.com/google/zetasql/blob/master/docs/lexical.md)
* [Expressions, Functions, and Operators](https://github.com/google/zetasql/blob/master/docs/functions-and-operators.md)
* [Data Types](https://github.com/google/zetasql/blob/master/docs/data-types.md)
* [Query Syntax](https://github.com/google/zetasql/blob/master/docs/query-syntax.md)
* [Data Manipulation Language Reference](https://github.com/google/zetasql/blob/master/docs/data-manipulation-language.md)
* [Data Model](https://github.com/google/zetasql/blob/master/docs/data-model.md)
* [Data Definition Language Reference](https://github.com/google/zetasql/blob/master/docs/data-definition-language.md)
* [Modules](https://github.com/google/zetasql/blob/master/docs/modules.md)
* [Lexical Structure and Syntax](lexical.md)
* [Expressions, Functions, and Operators](functions-and-operators.md)
* [Data Types](data-types.md)
* [Query Syntax](query-syntax.md)
* [Pipe Query Syntax](pipe-syntax.md)
* [Data Manipulation Language Reference](data-manipulation-language.md)
* [Data Model](data-model.md)
* [Data Definition Language Reference](data-definition-language.md)
* [Modules](modules.md)

## License

45 changes: 22 additions & 23 deletions docs/aggregate-dp-functions.md
@@ -186,7 +186,7 @@ determine the optimal privacy parameters for your dataset and organization.
WITH DIFFERENTIAL_PRIVACY ...
AVG(
expression,
[contribution_bounds_per_group => (lower_bound, upper_bound)]
[ contribution_bounds_per_group => (lower_bound, upper_bound) ]
)
```

@@ -201,9 +201,9 @@ and can support the following arguments:

+ `expression`: The input expression. This can be any numeric input type,
such as `INT64`.
+ `contribution_bounds_per_group`: The
[contribution bounds named argument][dp-clamped-named].
Perform clamping per each group separately before performing intermediate
+ `contribution_bounds_per_group`: A named argument with a
[contribution bound][dp-clamped-named].
Performs clamping for each group separately before performing intermediate
grouping on the privacy unit column.

**Return type**
@@ -330,7 +330,7 @@ noise, see [Remove noise][dp-noise].
WITH DIFFERENTIAL_PRIVACY ...
COUNT(
*,
[contribution_bounds_per_group => (lower_bound, upper_bound)]
[ contribution_bounds_per_group => (lower_bound, upper_bound) ]
)
```

@@ -343,9 +343,9 @@ is an aggregation across a privacy unit column.
This function must be used with the [`DIFFERENTIAL_PRIVACY` clause][dp-syntax]
and can support the following argument:

+ `contribution_bounds_per_group`: The
[contribution bounds named argument][dp-clamped-named].
Perform clamping per each group separately before performing intermediate
+ `contribution_bounds_per_group`: A named argument with a
[contribution bound][dp-clamped-named].
Performs clamping for each group separately before performing intermediate
grouping on the privacy unit column.

**Return type**
@@ -468,9 +468,9 @@ and can support these arguments:

+ `expression`: The input expression. This expression can be any
numeric input type, such as `INT64`.
+ `contribution_bounds_per_group`: The
[contribution bounds named argument][dp-clamped-named].
Perform clamping per each group separately before performing intermediate
+ `contribution_bounds_per_group`: A named argument with a
[contribution bound][dp-clamped-named].
Performs clamping for each group separately before performing intermediate
grouping on the privacy unit column.

**Return type**
@@ -609,9 +609,9 @@ and can support these arguments:
such as `INT64`. `NULL` values are always ignored.
+ `percentile`: The percentile to compute. The percentile must be a literal in
the range `[0, 1]`.
+ `contribution_bounds_per_row`: The
[contribution bounds named argument][dp-clamped-named].
Perform clamping per each row separately before performing intermediate
+ `contribution_bounds_per_row`: A named argument with a
[contribution bounds][dp-clamped-named].
Performs clamping for each row separately before performing intermediate
grouping on the privacy unit column.

`NUMERIC` and `BIGNUMERIC` arguments are not allowed.
@@ -689,7 +689,7 @@ GROUP BY item;
WITH DIFFERENTIAL_PRIVACY ...
SUM(
expression,
[contribution_bounds_per_group => (lower_bound, upper_bound)]
[ contribution_bounds_per_group => (lower_bound, upper_bound) ]
)
```

@@ -703,10 +703,9 @@ and can support these arguments:

+ `expression`: The input expression. This can be any numeric input type,
such as `INT64`. `NULL` values are always ignored.
+ `contribution_bounds_per_group`: The
[contribution bounds named argument][dp-clamped-named].
Perform clamping per each group separately before performing intermediate
grouping on the privacy unit column.
+ `contribution_bounds_per_group`: A named argument with a
[contribution bound][dp-clamped-named]. Performs clamping for each group
separately before performing intermediate grouping on the privacy unit column.

**Return type**

@@ -830,7 +829,7 @@ noise, see [Use differential privacy][dp-noise].
WITH DIFFERENTIAL_PRIVACY ...
VAR_POP(
expression,
[contribution_bounds_per_row => (lower_bound, upper_bound)]
[ contribution_bounds_per_row => (lower_bound, upper_bound) ]
)
```

@@ -847,9 +846,9 @@ can support these arguments:

+ `expression`: The input expression. This can be any numeric input type,
such as `INT64`. `NULL`s are always ignored.
+ `contribution_bounds_per_row`: The
[contribution bounds named argument][dp-clamped-named].
Perform clamping per each row separately before performing intermediate
+ `contribution_bounds_per_row`: A named argument with a
[contribution bound][dp-clamped-named].
Performs clamping for each row separately before performing intermediate
grouping on individual user values.

`NUMERIC` and `BIGNUMERIC` arguments are not allowed.
8 changes: 4 additions & 4 deletions docs/aggregate-function-calls.md
@@ -4,10 +4,10 @@

# Aggregate function calls

An aggregate function is a function that summarizes the rows of a group into a
single value. When an aggregate function is used with the `OVER` clause, it
becomes a window function, which computes values over a group of rows and then
returns a single result for each row.
An aggregate function summarizes the rows of a group into a single value. When
an aggregate function is used with the `OVER` clause, it becomes a window
function, which computes values over a group of rows and then returns a single
result for each row.

## Aggregate function call syntax
