correct sequence for running maintenance steps on an iceberg table #11804

Open
salimpadela opened this issue Dec 17, 2024 · 0 comments
Labels
question Further information is requested

Comments

salimpadela commented Dec 17, 2024

Query engine

Spark, AWS Glue

Question

What is the correct sequence of maintenance steps to run on an Iceberg table? Our tables are write-once-read-many, so I am not sure whether I need to run rewrite_position_delete_files.
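
One way to check whether the table has any delete files at all is to count the rows in its `delete_files` metadata table. The snippet below is a minimal sketch that only builds the query string; the catalog name `my_catalog` and table name `db.events` are hypothetical placeholders, and the resulting string would be passed to `spark.sql(...)` in a session where the Iceberg catalog is configured.

```python
def delete_file_count_query(catalog: str, table: str) -> str:
    """Build a query that counts delete files via the Iceberg `delete_files` metadata table."""
    return (
        f"SELECT COUNT(*) AS delete_file_count "
        f"FROM {catalog}.{table}.delete_files"
    )


# `my_catalog` and `db.events` are hypothetical names used for illustration only.
query = delete_file_count_query("my_catalog", "db.events")
print(query)
```

If the count comes back as zero, there would presumably be nothing for rewrite_position_delete_files to compact.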

I am not very experienced with Iceberg, so I wanted to check whether I am running the steps in the correct order or missing something.

Right now my sequence looks like this:

  1. rewrite_data_files
  2. rewrite_manifests
  3. expire_snapshots
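
For reference, the sequence above can be written out as Spark SQL `CALL` statements. This is a sketch of the order described in this issue, not a confirmed recommendation; the catalog name `my_catalog` and table name `db.events` are hypothetical, and the helper only builds the strings so it can be run without a Spark session.

```python
from typing import List


def maintenance_calls(catalog: str, table: str) -> List[str]:
    """Return the three maintenance CALL statements in the order listed in this issue."""
    return [
        # 1. Compact small data files into larger ones.
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        # 2. Rewrite manifests.
        f"CALL {catalog}.system.rewrite_manifests(table => '{table}')",
        # 3. Expire old snapshots.
        f"CALL {catalog}.system.expire_snapshots(table => '{table}')",
    ]


# `my_catalog` and `db.events` are hypothetical names used for illustration only.
statements = maintenance_calls("my_catalog", "db.events")
for statement in statements:
    print(statement)
```

In a Spark or Glue job, each string would be executed in turn with `spark.sql(statement)`.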

I read on a blog that I don't have to run rewrite_manifests before or after rewrite_data_files, because rewrite_data_files already rewrites manifests, but the results I am seeing seem to contradict that.

With the sequence above, these are the results of the rewrite_data_files and rewrite_manifests calls.

Result of rewrite_data_files

| rewritten_data_files_count | added_data_files_count | rewritten_bytes_count | failed_data_files_count |
| --- | --- | --- | --- |
| 13507 | 2371 | 4307611669 | 0 |

Result of rewrite_manifests

| rewritten_manifests_count | added_manifests_count |
| --- | --- |
| 29 | 25 |
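
To make the two results easier to compare, the reported counts can be reduced to a few ratios. This is plain arithmetic on the numbers above and does not say anything about what the procedures do internally.

```python
# Values copied from the two procedure results reported above.
rewritten_data_files = 13507
added_data_files = 2371
rewritten_bytes = 4307611669
rewritten_manifests = 29
added_manifests = 25

# Average number of input files merged into each output file.
files_per_output = rewritten_data_files / added_data_files

# Average size of a rewritten (input) data file, in MiB.
avg_input_mib = rewritten_bytes / rewritten_data_files / 2**20

# Net change in manifest count from the rewrite_manifests call.
net_manifest_change = added_manifests - rewritten_manifests

print(f"input files per output file: {files_per_output:.2f}")
print(f"average input file size: {avg_input_mib:.2f} MiB")
print(f"net manifest change: {net_manifest_change}")
```

The point relevant to the question is that rewrite_manifests still replaced 29 manifests with 25 after rewrite_data_files had already run.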

P.S. My intention is to have only one snapshot at the end of the process, so I am passing older_than to expire_snapshots.
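
A sketch of how such an expire_snapshots call might be built, using the procedure's `older_than` and `retain_last` arguments. The catalog name `my_catalog`, the table name `db.events`, and the cutoff timestamp are hypothetical, and the exact call used in this issue is not shown in the post. The timestamp literal is interpreted by Spark in the session time zone, which is an assumption to verify in your environment.

```python
from datetime import datetime


def expire_snapshots_call(
    catalog: str, table: str, older_than: datetime, retain_last: int = 1
) -> str:
    """Build a CALL statement that expires snapshots older than the given cutoff."""
    # Format the cutoff as a Spark SQL TIMESTAMP literal.
    cutoff = older_than.strftime("%Y-%m-%d %H:%M:%S.000")
    return (
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', "
        f"older_than => TIMESTAMP '{cutoff}', "
        f"retain_last => {retain_last})"
    )


# Hypothetical names and cutoff, for illustration only.
call = expire_snapshots_call("my_catalog", "db.events", datetime(2024, 12, 17, 0, 0, 0))
print(call)
```

In practice the cutoff would typically be taken after the rewrite steps have committed, for example `datetime.now()`, so that every earlier snapshot falls before it.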
