(#4462) Postgres compatibility tests using sqllogictest #4834

Merged
merged 21 commits into from
Jan 21, 2023
Changes from 20 commits
17 changes: 17 additions & 0 deletions .github/workflows/rust.yml
@@ -228,6 +228,23 @@ jobs:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres

  sqllogictest-postgres:
    name: "Run sqllogictest with Postgres runner"
    needs: [linux-build-lib]
    runs-on: ubuntu-latest
    steps:
      - name: Check docker
        run: docker ps
      - uses: actions/checkout@v3
        with:
          submodules: true
      - name: Setup toolchain
        run: |
          rustup toolchain install stable
          rustup default stable
      - name: Run sqllogictest
        run: PG_COMPAT=true cargo test -p datafusion --test sqllogictests

  windows:
    name: cargo test (win64)
    runs-on: windows-latest
7 changes: 6 additions & 1 deletion datafusion/core/Cargo.toml
@@ -104,17 +104,22 @@ xz2 = { version = "0.1", optional = true }
[dev-dependencies]
arrow = { version = "31.0.0", features = ["prettyprint", "dyn_cmp_dict"] }
async-trait = "0.1.53"
bigdecimal = "0.3.0"
criterion = "0.4"
csv = "1.1.6"
ctor = "0.1.22"
doc-comment = "0.3"
env_logger = "0.10"
half = "2.2.1"
parquet-test-utils = { path = "../../parquet-test-utils" }
postgres-types = { version = "0.2.4", features = ["derive", "with-chrono-0_4"] }
rstest = "0.16.0"
rust_decimal = { version = "1.27.0", features = ["tokio-pg"] }
sqllogictest = "0.10.0"
test-utils = { path = "../../test-utils" }
testcontainers = "0.14.0"
thiserror = "1.0.37"

tokio-postgres = "0.7.7"
Contributor Author

Here are the reasons for these dependencies:

  • half - provides the f16 type used by DataFusion
  • testcontainers - creates a fresh Docker container with Postgres for each sqllogictest file
  • postgres-types and tokio-postgres - required for writing a Postgres client
  • rust_decimal - converts the Postgres "numeric" type to a Rust type
  • bigdecimal - provides a common type for rounding floating-point numbers. Unfortunately, rust_decimal does not handle numbers of arbitrary precision. For example, it could not parse 26156334342021890000000000000000000000, which currently appears in one of the .slt tests in DataFusion.
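
The size limit behind the bigdecimal choice can be checked with the standard library alone. The sketch below assumes that rust_decimal's `Decimal` stores a 96-bit integer mantissa, and compares the literal from the .slt test against that bound as a digit string. `exceeds_96_bit_mantissa` is a hypothetical helper written for this illustration, not part of either crate.

```rust
use std::cmp::Ordering;

/// Returns true when an unsigned decimal integer literal needs more than 96 bits.
/// Assumption: rust_decimal's `Decimal` keeps a 96-bit integer mantissa.
fn exceeds_96_bit_mantissa(digits: &str) -> bool {
    // Largest value a 96-bit mantissa can hold, rendered as decimal digits.
    let limit = ((1u128 << 96) - 1).to_string();
    let digits = digits.trim_start_matches('0');
    // Compare as digit strings so the input may be wider than u128.
    match digits.len().cmp(&limit.len()) {
        Ordering::Greater => true,
        Ordering::Less => false,
        Ordering::Equal => digits > limit.as_str(),
    }
}

fn main() {
    let from_slt = "26156334342021890000000000000000000000";
    println!("{} exceeds 96 bits: {}", from_slt, exceeds_96_bit_mantissa(from_slt));
    println!("12345 exceeds 96 bits: {}", exceeds_96_bit_mantissa("12345"));
}
```

A type with an arbitrary-size mantissa avoids this bound entirely, which is consistent with the reason given for bigdecimal above.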

[target.'cfg(not(target_os = "windows"))'.dev-dependencies]
nix = "0.26.1"

5 changes: 4 additions & 1 deletion datafusion/core/src/test_util.rs
@@ -156,7 +156,10 @@ pub fn parquet_test_data() -> String {
/// Returns either:
/// The path referred to in `udf_env` if that variable is set and refers to a directory
/// The submodule_data directory relative to CARGO_MANIFEST_PATH
-fn get_data_dir(udf_env: &str, submodule_data: &str) -> Result<PathBuf, Box<dyn Error>> {
+pub fn get_data_dir(
+    udf_env: &str,
+    submodule_data: &str,
+) -> Result<PathBuf, Box<dyn Error>> {
    // Try user defined env.
    if let Ok(dir) = env::var(udf_env) {
        let trimmed = dir.trim().to_string();
13 changes: 13 additions & 0 deletions datafusion/core/tests/sqllogictests/README.md
@@ -44,6 +44,19 @@ Run only the tests in `information_schema.slt`:
cargo test -p datafusion --test sqllogictests -- information
```

#### Running tests: Postgres compatibility

Test files whose names start with the prefix `pg_compat_` verify compatibility with Postgres.
DataFusion also runs these test files during normal sqllogictest runs.

To run sqllogictests against Postgres, execute:

```shell
PG_COMPAT=true cargo test -p datafusion --test sqllogictests
```

This command requires a docker binary. Check that docker is properly installed with `which docker`.
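
The selection rule described above can be sketched as follows. This is only an illustration under stated assumptions: that `PG_COMPAT` counts as enabled only when set to `true`, and that files without the `pg_compat_` prefix are skipped in Postgres mode. `Runner`, `select_runner`, and the file name `pg_compat_simple.slt` are hypothetical and are not taken from the test harness.

```rust
/// Which engine a test file would run against in this sketch.
#[derive(Debug, PartialEq)]
enum Runner {
    DataFusion,
    Postgres,
    Skip,
}

/// Hypothetical rule: in Postgres mode only `pg_compat_` files run, and they
/// run against Postgres; otherwise every file runs against DataFusion.
fn select_runner(file_name: &str, pg_compat: bool) -> Runner {
    match (pg_compat, file_name.starts_with("pg_compat_")) {
        (true, true) => Runner::Postgres,
        (true, false) => Runner::Skip,
        (false, _) => Runner::DataFusion,
    }
}

fn main() {
    // Assumption: any value other than "true" leaves Postgres mode off.
    let pg_compat = std::env::var("PG_COMPAT").map_or(false, |v| v == "true");
    for file in ["pg_compat_simple.slt", "information_schema.slt"] {
        println!("{}: {:?}", file, select_runner(file, pg_compat));
    }
}
```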

#### Updating tests: Completion Mode

In test script completion mode, `sqllogictests` reads a prototype script and runs the statements and queries against the database engine. The output is a full script that is a copy of the prototype script with the results inserted.
@@ -0,0 +1,21 @@
CREATE TABLE aggregate_test_100_by_sql
(
    c1  character varying NOT NULL,
    c2  smallint NOT NULL,
    c3  smallint NOT NULL,
    c4  smallint,
    c5  integer,
    c6  bigint NOT NULL,
    c7  smallint NOT NULL,
    c8  integer NOT NULL,
    c9  bigint NOT NULL,
    c10 character varying NOT NULL,
    c11 real NOT NULL,
    c12 double precision NOT NULL,
    c13 character varying NOT NULL
);

COPY aggregate_test_100_by_sql
    FROM '/opt/data/csv/aggregate_test_100.csv'
    DELIMITER ','
    CSV HEADER;
90 changes: 90 additions & 0 deletions datafusion/core/tests/sqllogictests/src/engines/conversion.rs
@@ -0,0 +1,90 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

use bigdecimal::BigDecimal;
use half::f16;
use rust_decimal::prelude::*;
use rust_decimal::Decimal;

pub const NULL_STR: &str = "NULL";

pub fn bool_to_str(value: bool) -> String {
    if value {
        "true".to_string()
    } else {
        "false".to_string()
    }
}

pub fn varchar_to_str(value: &str) -> String {
    if value.is_empty() {
        "(empty)".to_string()
    } else {
        value.to_string()
    }
}

pub fn f16_to_str(value: f16) -> String {
    if value.is_nan() {
        "NaN".to_string()
    } else if value == f16::INFINITY {
        "Infinity".to_string()
    } else if value == f16::NEG_INFINITY {
        "-Infinity".to_string()
    } else {
        big_decimal_to_str(BigDecimal::from_str(&value.to_string()).unwrap())
    }
}

pub fn f32_to_str(value: f32) -> String {
    if value.is_nan() {
        "NaN".to_string()
    } else if value == f32::INFINITY {
        "Infinity".to_string()
    } else if value == f32::NEG_INFINITY {
        "-Infinity".to_string()
    } else {
        big_decimal_to_str(BigDecimal::from_str(&value.to_string()).unwrap())
    }
}

pub fn f64_to_str(value: f64) -> String {
    if value.is_nan() {
        "NaN".to_string()
    } else if value == f64::INFINITY {
        "Infinity".to_string()
    } else if value == f64::NEG_INFINITY {
        "-Infinity".to_string()
    } else {
        big_decimal_to_str(BigDecimal::from_str(&value.to_string()).unwrap())
    }
}

pub fn i128_to_str(value: i128, scale: u32) -> String {
    big_decimal_to_str(
        BigDecimal::from_str(&Decimal::from_i128_with_scale(value, scale).to_string())
            .unwrap(),
    )
}

pub fn decimal_to_str(value: Decimal) -> String {
    big_decimal_to_str(BigDecimal::from_str(&value.to_string()).unwrap())
}

pub fn big_decimal_to_str(value: BigDecimal) -> String {
    value.round(12).normalized().to_string()
Contributor Author

All numbers are rounded to 12 decimal digits. Without explicit types, Postgres and DataFusion can choose different underlying types. For example, Postgres could use numeric where DataFusion uses int. To make the results comparable, all floating-point types are converted to the same number of decimal places.

12 was chosen because it passes the existing set of tests. I think it could still produce errors, for example when rounding f16 to 12 digits. I would probably use 3 (or 4) decimal digits if high precision is not required for Postgres compatibility tests. According to IEEE 754, a 16-bit binary float carries about 3 to 4 significant decimal digits, so rounding to the precision of the smallest data type should be safer.

}
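
The rounding trade-off discussed in the comment above can be illustrated with the standard library alone. This is a sketch, not the PR's implementation: `round_to_str` is a hypothetical stand-in that uses `f64` formatting instead of `BigDecimal::round(..).normalized()`, and the widening of an `f32` to `f64` is an assumed scenario chosen to show how a narrower type can expose noise at 12 decimal places.

```rust
/// Formats `value` with `places` decimal digits, then drops trailing zeros.
/// Hypothetical std-only stand-in for rounding and normalizing a decimal.
fn round_to_str(value: f64, places: usize) -> String {
    let fixed = format!("{:.*}", places, value);
    if fixed.contains('.') {
        // Strip the zero padding, and the dot if nothing is left after it.
        fixed.trim_end_matches('0').trim_end_matches('.').to_string()
    } else {
        fixed
    }
}

fn main() {
    // Two results that differ only by binary rounding noise compare equal.
    println!("{}", round_to_str(0.1 + 0.2, 12));
    println!("{}", round_to_str(0.3, 12));

    // An f32 widened to f64 keeps its representation error at 12 places,
    // while rounding to 4 places hides it.
    let widened = 0.1f32 as f64;
    println!("{}", round_to_str(widened, 12));
    println!("{}", round_to_str(widened, 4));
}
```

In this sketch the widened `f32` no longer prints as `0.1` at 12 places but does at 4, which matches the concern that the rounding precision should follow the narrowest type involved.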
@@ -17,8 +17,8 @@

mod util;

-use crate::error::Result;
-use crate::insert::util::LogicTestContextProvider;
+use self::util::LogicTestContextProvider;
+use super::error::Result;
use arrow::record_batch::RecordBatch;
use datafusion::datasource::MemTable;
use datafusion::prelude::SessionContext;
100 changes: 100 additions & 0 deletions datafusion/core/tests/sqllogictests/src/engines/datafusion/mod.rs
@@ -0,0 +1,100 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

use std::time::Duration;

use sqllogictest::DBOutput;

use self::error::{DFSqlLogicTestError, Result};
use async_trait::async_trait;
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::prelude::SessionContext;
use datafusion_sql::parser::{DFParser, Statement};
use insert::insert;
use sqlparser::ast::Statement as SQLStatement;

mod error;
mod insert;
mod normalize;

pub struct DataFusion {
    ctx: SessionContext,
    file_name: String,
    is_pg_compatibility_test: bool,
}

impl DataFusion {
    pub fn new(
        ctx: SessionContext,
        file_name: String,
        postgres_compatible: bool,
    ) -> Self {
        Self {
            ctx,
            file_name,
            is_pg_compatibility_test: postgres_compatible,
        }
    }
}

#[async_trait]
impl sqllogictest::AsyncDB for DataFusion {
    type Error = DFSqlLogicTestError;

    async fn run(&mut self, sql: &str) -> Result<DBOutput> {
        println!("[{}] Running query: \"{}\"", self.file_name, sql);
        let result = run_query(&self.ctx, sql, self.is_pg_compatibility_test).await?;
        Ok(result)
    }

    /// Engine name of current database.
    fn engine_name(&self) -> &str {
        "DataFusion"
    }

    /// [`Runner`] calls this function to perform sleep.
    ///
    /// The default implementation is `std::thread::sleep`, which is universal to any async runtime
    /// but would block the current thread. If you are running in a tokio runtime, you should override
    /// this with `tokio::time::sleep`.
    async fn sleep(dur: Duration) {
        tokio::time::sleep(dur).await;
    }
}

async fn run_query(
    ctx: &SessionContext,
    sql: impl Into<String>,
    is_pg_compatibility_test: bool,
) -> Result<DBOutput> {
    let sql = sql.into();
    // Check if the sql is `insert`
    if let Ok(mut statements) = DFParser::parse_sql(&sql) {
        let statement0 = statements.pop_front().expect("at least one SQL statement");
        if let Statement::Statement(statement) = statement0 {
            let statement = *statement;
            if matches!(&statement, SQLStatement::Insert { .. }) {
                return insert(ctx, statement).await;
            }
        }
    }
    let df = ctx.sql(sql.as_str()).await?;
    let results: Vec<RecordBatch> = df.collect().await?;
    let formatted_batches =
        normalize::convert_batches(results, is_pg_compatibility_test)?;
    Ok(formatted_batches)
}
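
The routing in `run_query` sends `INSERT` statements to a dedicated handler and everything else through the regular query path. The sketch below shows only that routing idea in a simplified form: it inspects the first keyword instead of using `DFParser`, and `Route` and `route` are hypothetical names introduced for illustration.

```rust
/// Where a statement would be sent in this sketch.
#[derive(Debug, PartialEq)]
enum Route {
    Insert,
    Query,
}

/// Simplified stand-in for the parser-based check: looks only at the first
/// keyword, so it is far less precise than parsing the statement.
fn route(sql: &str) -> Route {
    let first = sql.split_whitespace().next().unwrap_or("");
    if first.eq_ignore_ascii_case("insert") {
        Route::Insert
    } else {
        Route::Query
    }
}

fn main() {
    for sql in ["INSERT INTO t VALUES (1)", "select 1", ""] {
        println!("{:?} -> {:?}", sql, route(sql));
    }
}
```

Unlike the `expect("at least one SQL statement")` call in the function above, this sketch treats empty input as a plain query rather than panicking; whether that is desirable in the test harness is a separate design question.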