
Integration test broken #492

Closed · kevinjqliu opened this issue Mar 2, 2024 · 5 comments · Fixed by #494

Comments

@kevinjqliu (Contributor)

Apache Iceberg version

main (development)

Please describe the bug 🐞

As of the current main commit (March 2, 2024), the integration tests fail.

Error log:

>           assert spark_partition_for_justification == expected_partition_record
E           assert Record[timestamp_field_hour=464611] == Record[timestamp_field_hour=464603]
E             Full diff:
E             - Record[timestamp_field_hour=464603]
E             ?                                 ^^
E             + Record[timestamp_field_hour=464611]
E             ?                                 ^^

tests/integration/test_partitioning_key.py:771: AssertionError

Reproduce:

git checkout main
git pull
make test-integration

Possibly related to #453

@HonahX (Contributor) commented Mar 3, 2024

I also hit this today. I think this is because the Spark session uses the machine's local timezone when inserting timestamp values into the table. We can configure the Spark session to always use the UTC timezone in conftest.py:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("PyIceberg integration test")
    .config("spark.sql.session.timeZone", "UTC")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.integration", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.integration.catalog-impl", "org.apache.iceberg.rest.RESTCatalog")
    ...

GitHub CI passes without this change because the GitHub Actions VM uses the UTC timezone by default.
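
As a quick sanity check (a hypothetical snippet, not part of the fix), a test can assert that the fixture's session actually picked up the setting:

    # Hypothetical check: confirm the test session is pinned to UTC.
    assert spark.conf.get("spark.sql.session.timeZone") == "UTC"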

@kevinjqliu (Contributor, Author)

Makes sense. I'm in PT (UTC-8), which matches the 8-hour difference between 464603 and 464611 (sketched at the end of this comment).

Just ran make test-integration with

        .config("spark.sql.session.timeZone", "UTC")

added, and everything passed.
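
Here is a minimal sketch of that arithmetic (an editorial illustration, not code from this thread). It assumes Iceberg's hour transform counts whole hours since the Unix epoch; the timestamp 2023-01-01 11:00 is chosen only because it reproduces the exact values from the error log:

    from datetime import datetime, timedelta, timezone

    EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

    def hour_transform(ts: datetime) -> int:
        # Iceberg's hour partition transform: whole hours since the Unix epoch.
        return int((ts - EPOCH).total_seconds() // 3600)

    # The same wall-clock value, interpreted as UTC vs. US Pacific (UTC-8):
    ts_utc = datetime(2023, 1, 1, 11, 0, tzinfo=timezone.utc)
    ts_pt = datetime(2023, 1, 1, 11, 0, tzinfo=timezone(timedelta(hours=-8)))

    print(hour_transform(ts_utc))  # 464603 -- the expected partition value
    print(hour_transform(ts_pt))   # 464611 -- what a PT-localized session writes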

@kevinjqliu (Contributor, Author)

Do you already have a PR out for the change? Or should I push one up?

@HonahX (Contributor) commented Mar 3, 2024

Could you please open one up? I can quickly merge that 😄. Thanks!

@kevinjqliu (Contributor, Author)

@HonahX opened #494
