From 479f468c5f389bd7a30938114f8e79445c48f179 Mon Sep 17 00:00:00 2001
From: Ryan Blue
Date: Sun, 4 Aug 2024 14:32:52 -0700
Subject: [PATCH] Spec: Deprecate the file system table scheme (#10833)

---
 format/spec.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/format/spec.md b/format/spec.md
index 5a90f6fd978d..daef7538e730 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -779,7 +779,9 @@ When two commits happen at the same time and are based on the same version, only
 
 #### File System Tables
 
-An atomic swap can be implemented using atomic rename in file systems that support it, like HDFS or most local file systems [1].
+_Note: This file system based scheme to commit a metadata file is **deprecated** and will be removed in version 4 of this spec. The scheme is **unsafe** in object stores and local file systems._
+
+An atomic swap can be implemented using atomic rename in file systems that support it, like HDFS [1].
 
 Each version of table metadata is stored in a metadata folder under the table’s base location using a file naming scheme that includes a version number, `V`: `v<V>.metadata.json`. To commit a new metadata version, `V+1`, the writer performs the following steps:
 
@@ -1393,4 +1395,4 @@ This section covers topics not required by the specification but recommendations
 
 Iceberg supports two types of histories for tables. A history of previous "current snapshots" stored in ["snapshot-log" table metadata](#table-metadata-fields) and [parent-child lineage stored in "snapshots"](#table-metadata-fields). These two histories might indicate different snapshot IDs for a specific timestamp. The discrepancies can be caused by a variety of table operations (e.g. updating the `current-snapshot-id` can be used to set the snapshot of a table to any arbitrary snapshot, which might have a lineage derived from a table branch or no lineage at all). 
 
-When processing point in time queries implementations should use "snapshot-log" metadata to lookup the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot logs and use the metadata from that snapshot to perform the scan of the table. If no snapshot exists prior to the timestamp given or "snapshot-log" is not populated (it is an optional field), then systems should raise an informative error message about the missing metadata.
\ No newline at end of file
+When processing point in time queries implementations should use "snapshot-log" metadata to lookup the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot logs and use the metadata from that snapshot to perform the scan of the table. If no snapshot exists prior to the timestamp given or "snapshot-log" is not populated (it is an optional field), then systems should raise an informative error message about the missing metadata.
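
Reviewer note on the point-in-time paragraph touched by the last hunk: the lookup it describes can be sketched roughly as follows. This is a minimal illustration, not part of the patch or of any Iceberg library. It assumes the table metadata JSON has already been parsed into a Python dict whose optional `snapshot-log` list holds entries with `timestamp-ms` and `snapshot-id` fields. The names `snapshot_id_as_of` and `MissingSnapshotLogError` are hypothetical, and treating an entry stamped exactly at the requested time as a match is an assumption.

```python
from datetime import datetime


class MissingSnapshotLogError(LookupError):
    """Hypothetical error: the snapshot log cannot answer the time-travel query."""


def snapshot_id_as_of(table_metadata: dict, as_of: datetime) -> int:
    """Return the snapshot ID that was current at `as_of`, based on "snapshot-log"."""
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")

    # "snapshot-log" is optional, so a missing or empty log is an error case.
    log = table_metadata.get("snapshot-log")
    if not log:
        raise MissingSnapshotLogError(
            "table metadata has no snapshot-log entries; cannot resolve time travel"
        )

    as_of_ms = int(as_of.timestamp() * 1000)

    # Keep the latest entry at or before the requested time (assumed inclusive).
    # On equal timestamps, the entry that appears later in the log wins.
    current = None
    for entry in log:
        if entry["timestamp-ms"] > as_of_ms:
            continue
        if current is None or entry["timestamp-ms"] >= current["timestamp-ms"]:
            current = entry

    if current is None:
        raise MissingSnapshotLogError(
            f"no snapshot-log entry at or before {as_of.isoformat()}"
        )
    return current["snapshot-id"]
```

A caller would pass the parsed metadata and a timezone-aware timestamp such as `datetime(1986, 10, 26, 1, 21, tzinfo=timezone.utc)`, then scan the table using the snapshot with the returned ID. Both failure cases from the paragraph (no earlier snapshot, or no populated "snapshot-log") surface as the same informative error type.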