Skip to content

Commit

Permalink
Fix README for PATCH
Browse files Browse the repository at this point in the history
  • Loading branch information
vitalif committed Nov 22, 2023
1 parent 92ddeb8 commit 4b9d8eb
Showing 1 changed file with 30 additions and 22 deletions.
52 changes: 30 additions & 22 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -93,18 +93,19 @@ See also [Common Issues](#common-issues).

\* xattrs without extra RTT only work with Yandex S3 (--list-type=ext-v1).

\* Partial object updates only work with Yandex S3.
\* Partial object updates (PATCH) only work with Yandex S3.

## Partial object updates
## Partial object updates (PATCH)

With Yandex S3 it is possible to do partial object updates (data only) without server-side copy or reupload.
Currently the feature can be enabled by the flag `--enable-patch` and will be enabled by default for YC S3 in the future.
With Yandex S3 it is possible to do partial object updates (data only) without server-side copy or reupload.

Currently the feature can be enabled with `--enable-patch` and will be enabled by default for YC S3 in the future.

Enabling patch uploads has the following benefits:
- Fast [fsync](#fsync): since nothing needs to be copied, fsync is now much cheaper
- Support for [concurrent updates](#concurrent-updates)
- Better memory utilization: less intermediate state needs to be cached, so more memory can be used for (meta)data cache
- Better performace for big files
- Better performance for big files

Note: new files, metadata changes and renames are still flushed to S3 as multipart uploads.

Expand Down Expand Up @@ -231,23 +232,6 @@ fio -name=test -ioengine=libaio -direct=1 -bs=4M -iodepth=1 -fallocate=none \

## Concurrent Updates

### Yandex S3

When using Yandex S3, it is possible to concurrently update a single object/file from multiple hosts
using PATCH method (`--enable-patch`). However, concurrent changes are not reported back to the clients,
so in order to see the actual object contents you need to stop all writes and refresh the inode cache (see below).

It is strongly advised that clients from different hosts write data by non-overlapping offsets and
the writes are aligned with object parts borders to avoid conflicts. If it impossible to avoid conflicts entirely,
the conflicts are resolved by the LWW strategy. In case the conflict can't be resolved,
you can choose to drop the cached update (`--drop-patch-conflicts`), otherwise the write will be retried later.

The conflicts are reported in the log as following:

```
main.WARNING Failed to patch range %d-%d of file %s (inode %d) due to concurrent updates
```

### Other clouds

GeeseFS doesn't support concurrent updates of the same file from multiple hosts. If you try to
Expand All @@ -261,6 +245,30 @@ you may encounter lost updates (conflicts) which are reported in the log in the
main.WARNING File xxx/yyy is deleted or resized remotely, discarding local changes
```

### PATCH

When using Yandex S3, it is possible to concurrently update a single object/file from multiple hosts
using PATCH method (`--enable-patch`). However, concurrent changes are not reported back to the clients,
so you have to stop all writes and refresh the inode cache to see the changes made by other hosts.

It is strongly advised to write data in non-overlapping ranges to avoid conflicts.
If you do overlapping writes then you have to guarantee that your hosts coordinate
and serialize their updates by doing an `fsync()` before the next overlapping write.

Also note that even with PATCH, best performance is achieved when writes are aligned
with object part boundaries (i.e. with 5 MB chunks for the first 5 GB by default),
because the server may do internal read-modify-write for patched object parts, and
it also may return an error if too many parallel requests try to modify the same
object part. Such "PATCH failures" are reported in the log in the following way:

```
main.WARNING Failed to patch range %d-%d of file %s (inode %d) due to concurrent updates
```

Normally, GeeseFS retries requests failed in this way, but if you don't want it
to retry them, you can also choose to drop the cached update by enabling
`--drop-patch-conflicts`.

## Asynchronous Write Errors

GeeseFS buffers updates in memory (or disk cache, if enabled) and flushes them asynchronously,
Expand Down

0 comments on commit 4b9d8eb

Please sign in to comment.