[Bug]: GC causing "org.apache.iceberg.exceptions.NotFoundException: File does not exist" #9749

Open
dorsegal opened this issue Oct 13, 2024 · 8 comments

Comments

@dorsegal

What happened

After running Nessie GC, I started getting "File does not exist" errors (org.apache.iceberg.exceptions.NotFoundException).
It looks like a file was deleted from storage but is still referenced by the table metadata.
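
One way to confirm this suspicion is to compare the file paths the table metadata references against what actually exists in storage. The following is only a rough sketch: `find_dangling_files` is a hypothetical helper, the `exists` callback stands in for whatever storage check is available (S3 client, Hadoop FileSystem, and so on), and the catalog, table, and bucket names in the comments are assumptions, not taken from this setup.

```python
from typing import Callable, Iterable, List


def find_dangling_files(
    referenced_paths: Iterable[str],
    exists: Callable[[str], bool],
) -> List[str]:
    """Return the referenced paths for which `exists` reports False.

    The result is de-duplicated and sorted so it is stable across runs.
    """
    return sorted({path for path in referenced_paths if not exists(path)})


if __name__ == "__main__":
    # Stand-in data. In a real Spark session the referenced paths might be read
    # from the Iceberg `files` metadata table (assumed catalog/table names):
    #   [r.file_path for r in
    #    spark.sql("SELECT file_path FROM nessie.db.table.files").collect()]
    referenced = [
        "s3://bucket/table/data/a.parquet",
        "s3://bucket/table/data/b.parquet",
    ]
    # Pretend only a.parquet survived GC.
    present = {"s3://bucket/table/data/a.parquet"}
    print(find_dangling_files(referenced, present.__contains__))
```

Any path this reports is one the metadata still points to even though the object is gone, which would match the NotFoundException above.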

apache/iceberg#8338

How to reproduce it

  1. Create table
  2. Add data
  3. Run GC with GC command
  4. Try to read all data from table again

Nessie server type (docker/uber-jar/built from source) and version

kubernetes 0.99.0

Client type (Ex: UI/Spark/pynessie ...) and version

Spark

Additional information

No response

@dorsegal
Author

This may be related as well: #8263

@snazy
Member

snazy commented Oct 14, 2024

@dorsegal can you provide more details, or better a reproducer?

@dorsegal
Author

My setup uses Kafka Connect with the Iceberg sink.
I create a table and use the Kafka Iceberg sink to write data into it.
After several commits, I run GC and then try to rewrite data files using Spark:

spark.sql( "CALL nessie.system.rewrite_data_files(table => 'table', options => map('partial-progress.enabled','true', 'max-concurrent-file-group-rewrites', '30'))" ).show()

I can provide more logs if needed; I just don't know which ones are relevant. From the GC logs I can see that it deleted some files.

@snazy
Member

snazy commented Oct 14, 2024

What I meant is a full reproducer that lists every step, starting from scratch, so that someone can reproduce the same behavior in a "clean"/empty environment.

@yunlou11

java -jar nessie-gc-0.99.0.jar gc

failed:

Caused by: java.lang.RuntimeException: Failed to get manifest files for ICEBERG_TABLE robot_dev.robot_data, content-ID fc122060-bf21-44d3-b776-fbecb2d23715 at commit 47a35e9867b0408c65feb09d7140a29d198354edf3a1aa0dc7cc09d192b07c27 via s3://ice-lake/robot_dev/robot_data/metadata/00000-6b65b94d-6370-4c43-9baa-b40ed0770c5d.metadata.json

This happened after expiring snapshots in Spark SQL:

CALL nessie.system.expire_snapshots('nessie.robot_dev.robot_data', TIMESTAMP '2024-10-15 00:00:00.000', 1)

The number of snapshots was reduced and manifest files were deleted, but the Nessie metadata may not have picked up the snapshot changes.

@snazy

@snazy
Member

snazy commented Nov 4, 2024

@yunlou11 thanks for the information. But what are all the necessary steps to get to that error message? Aka, what was all done before running GC?

@yunlou11

yunlou11 commented Nov 7, 2024

@yunlou11 thanks for the information. But what are all the necessary steps to get to that error message? Aka, what was all done before running GC?

Sorry, this may be an Iceberg error, not a Nessie one:

spark.sql("call demo.system.remove_orphan_files(table => 'xxx')").show()

iceberg issue: apache/iceberg#7914

I patched the Iceberg remove_orphan_files S3 exception by following the suggestion "Use SupportsPrefixOperations for Remove OrphanFile Procedure". After that I got the nessie-gc failure, but it is a bug in my patched Iceberg code, not in Nessie.
