#Riak S2 (Riak CS) 2.1.0 Release Notes
Released October 13, 2015.
This is a backwards-compatible* release that introduces a new metrics system, garbage collection refinements, and several other new features. Riak S2 2.1 is designed to work with both Riak KV 2.0.5+ and 2.1.1+.
Note: This release is backwards compatible only with the Riak S2 2.x series.
###Riak KV 2.1.1 Usage Note
Riak KV 2.1.1 includes a copy of `riak_cs_kv_multi_backend`, so there is no need to add lines specifying special `multi_backend` and `add_paths` configurations in advanced.config.
Instead, you can set the following in riak.conf:

    storage_backend = prefix_multi
    cs_version = 20100

If you need storage calculation, you will still require the `add_paths` config to load MapReduce code into Riak KV.
##New Features
###Metrics
New metrics have been added that enable you to determine the health of your Riak S2 system, as well as get reports on your storage utilization per bucket or user. The following stats items are available:
- All calls, latencies, and counters in the S3 API
- All calls, latencies, and counters in Stanchion
- All Riak Erlang client operations, latencies, and counters
- Information about the counts (active, idle, and overflow) for the process pool and connection pool
- System information, versions, port count, and process count
- Memory information about the riak-cs virtual machine
- HTTP listener information: active sockets and waiting acceptors
Note: stats item names from prior to 2.0.x are not preserved; they have been renamed or removed. No backward consistency is maintained. Please see the documentation for more information.
Additional storage usage metrics are also available. These metrics are gathered during storage calculation. Gathering these metrics is off by default, but you can turn it on by setting `detailed_storage_calc` to `true` in advanced.config. When you enable this option, you have access to information about how many manifests are `writing`, `pending_delete`, `scheduled_delete` and `active`, which is not visible via the API.
Note: Metrics do not always correctly reflect actual disk usage. For instance, `writing` may indicate more space than is actually used. Or, for example, if an upload was cancelled in the middle, the calculation does not know how much actual storage space is consumed. In the same way, `scheduled_delete` also may not reflect the exact amount of disk usage because blocks might already be partially deleted by garbage collection.
- [PR 1120]
###riak-cs-admin
The following administration CLIs have been replaced by the `riak-cs-admin` command:
- `riak-cs-storage`
- `riak-cs-gc`
- `riak-cs-access`
- `riak-cs-stanchion`
The commands listed above are deprecated and will be removed in future releases.
- [PR 1175]
###Garbage Collection Refinements
Several new options have been added to the `riak-cs-admin gc` command:
- `active_delete_threshold` is an option to avoid delegating manifests and block deletion to the garbage collector. This option relieves the garbage collector from having to delete small objects. This can optimise performance in cases where the garbage collector cannot keep up with DELETE Object API calls and its elapsed time is dominated by small objects. [PR 1174]
- `--start` and `--end` options have been added to the `riak-cs-admin gc batch` command to specify start and end in manual batch execution. Note that the `--start` flag on the command line will overwrite the `epoch_start` option in advanced.config. [PR 1147]
- `--leeway` has been added to create a temporary leeway period whose values are used only once and not repeated at the next run, and `--max-workers` has been added to allow you to override the concurrency value temporarily for a single run of the garbage collector. [PR 1147]
- Riak S2 2.0 (and older) has a race condition where fullsync replication and garbage collection may resurrect deleted blocks without any way to delete them again. When real-time replication and replication of a garbage collection bucket entry object being dropped from the real-time queue are combined, blocks may remain on the sink side without being collected. Riak S2 2.1 introduces deterministic garbage collection to avoid fullsync replication. Additionally, garbage collection and fullsync replication run concurrently, and work on the same blocks and manifests. You can now specify the range of time using the `--start` and `--end` flags with `riak-cs-admin gc batch` for the garbage collector in order to collect deleted objects synchronously on both sink and source sides. [PR 1147]
- `riak-cs-admin gc earliest-keys` is available so you can find the oldest entry after `epoch_start` in garbage collection. With this option, you can stay informed of garbage collection progress. [PR 1160]
More information on garbage collection can be found in the documentation.
##Additions
###Open Source
- A MapReduce optimisation in fetching Riak objects was introduced in Riak 2.1. Now, Riak CS 2.1 introduces an option to use that optimisation in storage calculation. It is off by default, but it can be used by setting `use_2i_for_storage_calc` to `true` in advanced.config. This reduced I/O in LevelDB by 50%. [PR 1089]
- Erlang/OTP 17 support is now included. [PR 1245 and PR 1040]
- A module-level hook point for limiting user access and quota usage is now available with very preliminary, simple, node-wide limiting example modules. Operators can make, plug in, or combine different modules as quota-limiting, rate-limiting or bandwidth-limiting depending on their unique requirements. [PR 1118]
- An orphaned block scanner is now available. [PR 1133]
- `riak-cs-admin audit-bucket-ownership` is a new tool that checks the integrity between users and buckets. For example, it can be used in cases where a bucket is visible when listing buckets but not accessible, or a bucket is visible and exists but could not be deleted. [PR 1202]
- The following log rotation items have been added to cuttlefish:
  - `log.console.size`
  - `log.console.rotation`
  - `log.console.rotation.keep`
  - `log.error.rotation`
  - `log.error.rotation.keep`
  - `log.error.size`
- `riak_cs_wm_common` now has a default callback of `multiple_choices`, which prevents `code_server` from becoming a bottleneck. [PR 1181]
- An option has been added to replace the `PR=all user GET` option with `PR=one` just before authentication. This option improves latency, especially in the presence of slow (or actually-failing) nodes blocking the whole request flow because of PR=all. When enabled, a user's owned-bucket list is never pruned after a bucket is deleted; instead, it is just marked as deleted. [PR 1191]
- An info log has been added when starting a storage calculation batch. [PR 1238]
- `GET Bucket` requests now have clearer responses. A 501 stub for Bucket lifecycle and a simple stub for Bucket requestPayment have been added. [PR 1223]
- Several user-friendly features have been added to `riak-cs-debug`: fine-grained information gathering options, user-defined filtering for configuration files, and verbose output for failed commands. [PR 1236]
###Enterprise
- MDC has `proxy_get`, which makes block objects propagate to site clusters when they are requested. Now, multibag configuration with MDC supports `proxy_get`. [PR 1171 and PR 25]
- Multibag is now renamed to "Supercluster". A bag has been a set of replicated underlying Riak clusters, which is now a member of a supercluster. The `riak-cs-multibag` command has been renamed to `riak-cs-supercluster` as well. [PR 1257], [PR 1260], [PR 106], [PR 107] and [PR 31].
- Several internal operation tools have been added to help diagnose or address issues. [PR 1145, PR 1134, and PR 1133]
- Added a generic function for manual operations to resolve siblings of manifests and blocks, which will assist Basho Client Service Engineers with troubleshooting and solving issues. [PR 1188]
##Changes
- Dependency versions have been updated in Riak S2 and Stanchion as follows: cuttlefish 2.0.4, node_package 2.0.3, riak-erlang-client 2.1.1, lager 2.2.0, lager_syslog 2.1.1, eper 0.92 (Basho patched), cluster_info 2.0.3, riak_repl_pb_api 2.1.1, and riak_cs_multibag 2.1.0. [PR 1190, PR 1197, PR 27, PR 1245, and PR 104].
- Riak CS has moved from Folsom to Exometer. [PR 1165 and PR 1180]
- Improvements have been made to error tracing for retrieving blocks from client GET requests. The logic that resolves blocks when a client requests a GET is complex. First, Riak CS tries to retrieve a block with `n_val=1`. If it fails, a retry will be done using `n_val=3`. If the block cannot be resolved locally, `proxy_get` is enabled, and the system is configured with datacenter replication, then Riak CS will try to perform a proxied GET to the remote site. The fallback and retry logic is complex and hard to trace, especially in a faulty or unstable situation. This improvement adds error tracing for the whole sequence described above, which will help diagnose issues. Specifically, for each block, the block server stacks all errors returned from the Riak client and reports the reason for every error as well as the type of call in which the error occurred. [PR 1177]
- Using the `GET Bucket` API with a specified prefix to list objects in a bucket needed optimization. It had been specifying end keys for folding objects in Riak too loosely. With this change, a tighter end key is specified for folding objects in Riak, which omits unnecessary folds in vnodes. [PR 1233]
- A limit on the maximum length of keys has been introduced. The limit is 1024 by default, meaning no keys longer than 1024 bytes can be PUT, GET or DELETED unless `max_key_length` is explicitly specified as more than '1024' in riak-cs.conf. If you want to preserve the old key length behaviour, you may specify `max_key_length` as 'unlimited'. [PR 1233]
- If a faulty cluster had several nodes down, the block server misunderstood that a block was already deleted and issued a false-notfound. This could lead to block leak. The PR default has been set to 'quorum' in an attempt to avoid this problem. Updates have also been made to make sure at least a single replica of a block is written in one of the primary nodes by setting the PW default to '1'. Additionally, measures are in place to prevent the block server from crashing when "not found" errors are returned due to a particular block of an object not being found in the cluster. Instead, unreachable blocks are skipped and the remaining blocks and manifests are collected. Since the PR and PW values are increased for blocks, the availability of PUTs and the throughput of garbage collection may decrease. A few Riak nodes being unreachable may prevent PUT requests from returning successfully and may prevent garbage collection from collecting all blocks until the unreachable nodes come back. [PR 1242]
- The infinity timeout option has been set so that several functions make synchronous `gen_fsm` calls indefinitely, which prevents unnecessary timeouts. [PR 1249]
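The block retrieval sequence described in the error-tracing item above (try `n_val=1`, retry with `n_val=3`, then optionally a proxied GET, stacking every error along the way) can be modelled with a short Python sketch. This is an illustration only, not Riak CS code: `fetch_block`, the fetcher callables, and the `(call_type, reason)` error layout are hypothetical names introduced for the example.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical fetcher type: returns the block bytes or raises on failure.
Fetcher = Callable[[], bytes]


def fetch_block(
    local_n1: Fetcher,
    local_n3: Fetcher,
    proxy_get: Optional[Fetcher] = None,
) -> Tuple[Optional[bytes], List[Tuple[str, str]]]:
    """Try each strategy in order and stack every error that occurs.

    Returns (block, errors), where errors is a list of (call_type, reason)
    pairs covering the whole sequence, so a failure can be traced later.
    """
    attempts = [("get n_val=1", local_n1), ("get n_val=3", local_n3)]
    if proxy_get is not None:  # only when proxy_get and replication are configured
        attempts.append(("proxy_get", proxy_get))

    errors: List[Tuple[str, str]] = []
    for call_type, fetcher in attempts:
        try:
            return fetcher(), errors
        except Exception as exc:  # record the reason and fall through
            errors.append((call_type, str(exc)))
    return None, errors


def failing(reason: str) -> Fetcher:
    """Build a fetcher that always fails with the given reason (demo only)."""
    def fetcher() -> bytes:
        raise RuntimeError(reason)
    return fetcher


block, errors = fetch_block(failing("notfound"), failing("timeout"), lambda: b"data")
print(block, errors)
```

Keeping the errors in call order is the point of the sketch: when the final attempt also fails, the caller can still see which call type failed and why at each step.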
##Bugs Fixed
- [Issue 1097/PR 1212] When `x-amz-metadata-directive=COPY` was specified, Riak CS did not actually COPY the metadata of the original resource. Instead, it would treat it as a `REPLACE`. When directed to `x-amz-metadata-directive=REPLACE` `Content-Type`, Riak CS would `REPLACE` it. Correct handling for the `x-amz-metadata-directive` has been added to the PUT Object Copy API.
- [Issue 1099/PR 1096] There was an unnecessary NextMarker in Get Bucket's response if `CommonPrefixes` contained the last key. Fixed handling of uploaded parts that should be deleted after Multipart Complete Request.
- [Issue 939/PR 1200] Copy requests without Content-Length request headers failed with 5xx errors. Such requests are now allowed without a Content-Length header in Copy API calls. Additionally, Copy API calls with a Content-Length greater than zero now return explicit errors.
- [Issue 1143/PR 1144] Manual batch start caused the last batch time to appear to be in the future. All temporal shifts have been fixed.
- [Issue PR 1162/PR 1163] Fix a configuration system bug where Riak CS could not start if `log.syslog=on` was set.
- [Issue 1169/PR 1200] The error response of the PUT Copy API call showed the target resource path rather than the source path when the source was not found or not accessible by the request user. It now shows the source path appropriately.
- [PR 1178] Multiple IP address descriptions under a single condition statement of a bucket policy were not being properly parsed as lists.
- [PR 1185] If `proxy_get_active` was defined in riak-cs.conf as anything other than enabled or disabled, there would be excessive log output. Now, `proxy_get_active` also accepts non-boolean definitions.
- [PR 1184] `put_gckey_timeout` was used instead of `put_manifest_timeout` when a delete process tried to update the status of manifests.
- [Issue 1201/PR 1230] A single slow or silently failing node caused intermittent user fetch failure. A grace period has been added so `riakc_pb_socket` can attempt to reconnect.
- [PR 1232] Warning logs were being produced for unsatisfied primary reads. Since users are objects in Riak CS and CS tries to retrieve these objects for authentication for almost every request, the retrieval option (PR=all) would fail if even one primary vnode was stopped or unresponsive and a log would be created. Given that Riak is set up to be highly available, these logs were quite noisy. Now, the "No WM route" log from prior to Riak CS 2.1 has been revived. Also, the log severity has been downgraded to debug, since it indicates a client error in all but the development phase.
- [PR 1237] The `riak-cs-admin` status command exit code was non-zero, even in successful execution. It will now return zero.
- [Issue 1097/PR 1212 and PR 4] Riak S2 did not copy the metadata of an original resource when the `x-amz-metadata-directive=COPY` command was used, nor when `x-amz-metadata-directive` was specified. Handling of the `x-amz-metadata-directive` command in the PUT Object Copy API has been added.
- [Issue 1097/PR 1212 and PR 4] Riak CS did not store `Content-Type` in COPY requests when the `x-amz-metadata-directive=REPLACE` command was used. Handling of the `x-amz-metadata-directive` command in the PUT Object Copy API has been added.
- [Issue 1097/PR 1212 and PR 4] Fixed the handling of uploaded parts that should be deleted after Multipart Complete Request.
- [Issue 1214/PR 1246] Prior to Riak S2 2.1.0, a PUT Copy API command with identical source and destination changed user metadata (`x-amz-meta-*` headers) but failed to update Content-Type. Content-Type is now correctly updated by the API call.
- [Issue PR 1261, PR 1263] Fix `riak-cs-debug` to include `app.config` when no generated files are found when `riak-cs.conf` is not used.
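The intended effect of `x-amz-metadata-directive` in the PUT Object Copy fixes above can be summarised in a small Python model. It assumes the usual S3 semantics, in which `COPY` keeps the source object's metadata and `REPLACE` takes the metadata sent with the copy request; the function name and the dictionary representation are hypothetical and do not reflect the Riak CS implementation.

```python
from typing import Dict


def resolve_copy_metadata(
    directive: str,
    source_meta: Dict[str, str],
    request_meta: Dict[str, str],
) -> Dict[str, str]:
    """Return the metadata (including Content-Type) for the copied object."""
    if directive == "COPY":
        # Metadata supplied with the copy request is ignored.
        return dict(source_meta)
    if directive == "REPLACE":
        # Metadata of the source object is discarded.
        return dict(request_meta)
    raise ValueError(f"unknown x-amz-metadata-directive: {directive!r}")


source = {"Content-Type": "text/plain", "x-amz-meta-owner": "alice"}
request = {"Content-Type": "application/json"}
print(resolve_copy_metadata("COPY", source, request))
print(resolve_copy_metadata("REPLACE", source, request))
```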
This is a bugfix release.
- Fix config item `gc.interval` not working when `infinity` is set (#1125/PR#1126).
- Add `log.access` switch to disable access logging (#1109/PR#1115).
- Add missing riak-cs.conf items: `max_buckets_per_user` and `gc.batch_size` (#1109/PR#1115).
- Fix bugs around subsequent space characters for Delete Multiple Objects API and user administration API with XML content (#1129/PR#1135).
- Fix URL path resource and query parameters to work in AWS v4 header authentication. Previously, `+` was being input instead of `%20` for blank spaces. (PR#1141)
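The difference between `+` and `%20` in the last fix above is easy to reproduce with Python's standard library. The sketch below is client-side illustration only and says nothing about how Riak CS encodes paths internally; `encode_v4_path` is a hypothetical helper name.

```python
from urllib.parse import quote, quote_plus


def encode_v4_path(path: str) -> str:
    """Percent-encode a resource path, keeping '/' and encoding spaces as %20."""
    return quote(path, safe="/")


path = "/bucket/my file.txt"
print(encode_v4_path(path))        # space is encoded as %20
print(quote_plus(path, safe="/"))  # form-style encoding: space becomes +
```

Form-style encoding (`quote_plus`) is meant for `application/x-www-form-urlencoded` data, which is why it yields `+` for a space, while a signed request path needs `%20`.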
- This release updates Riak CS to work with Riak 2.0.5.
- We have simplified the configuration system.
- All official patches for older versions of Riak and Riak CS have been included in these releases. There is no need to apply any patches released for Riak CS 1.4.x or 1.5.x to the Riak CS 2.0.x series. Patches released for Riak CS 1.4.x or 1.5.x cannot be directly applied to Riak CS 2.0.x because the version of Erlang/OTP shipped with Riak CS has been updated in version 2.0.0.
- Please review the complete Release Notes before upgrading.
- Access log can't be disabled.
- `advanced.config` should be used to customize the values of `max_buckets_per_user` and `gc_batch_size`.
- Changed the name of `gc_max_workers` to `gc.max_workers`, and lowered the default value from 5 to 2 (#1110) to reduce the workload on the cs cluster.
- Partial support of GET Location API (#1057)
- Add very preliminary AWS v4 header authentication - without query string authentication, object chunking and payload checksum (#1064). There is still a lot of work to reliably use v4 authentication.
- Put Enterprise deps into dependency graph (#1065)
- Introduce Cuttlefish (#1020, #1068, #1076, #1086, #1090) (Stanchion #88, #90, #91)
- Yessir Riak client to measure performance (#1072, #1083)
- Inspector improvement with usage change (#1084)
- Check signed date in S3 authentication (#1067)
- Update `cluster_info` and various dependent libraries (#1087, #1088) (Stanchion #85, #87, #93)
- Storage calculation optimization (#1089). With Riak >= 2.1, enabling the `use_2i_for_storage_calc` flag might relieve the disk reads of storage calculation.
- Fix wrong webmachine log handler name (#1075)
- Fix lager crash (#1038)
- Fix hardcoded crashdump path (#1052)
- Suppress unnecessary warnings (#1053)
- Multibag simpler state transition (Multibag #21)
- GC block deletion failure after transition to multibag environment (Multibag #19)
- Connection closing caused errors for objects stored before the transition, after transition from single bag to multibag configuration (Multibag #18).
- Multi-Datacenter Replication using v2 replication support has been deprecated.
- Old list objects which required `fold_objects_for_list_keys` as `false` have been deprecated and will be removed in the next major version.
- Non-paginated GC in cases where `gc_paginated_indexes` is `false` has been deprecated and will be removed in the next major version.
Upgrading a Riak CS system involves upgrading the underlying Riak, Riak CS and Stanchion installations. The upgrade process can be non-trivial depending on your existing system configurations and the combination of sub-system versions. This document contains general instructions and notices on upgrading the whole system to Riak CS 2.0.0.
Riak 2.0.0 introduced a new configuration system (`riak.conf`), and as of Riak CS 2.0.0, Riak CS now supports the new configuration style. Both Riak and Riak CS still support the older style configurations through `app.config` and `vm.args`.
Basho recommends moving to the new unified configuration system, using the files `riak.conf`, `riak-cs.conf` and `stanchion.conf`.
If you choose to use the legacy `app.config` files for Riak CS and/or Stanchion, some parameters have changed names and must be updated.
In particular, for the Riak CS `app.config`:
- `cs_ip` and `cs_port` have been combined into `listener`.
- `riak_ip` and `riak_pb_port` have been combined into `riak_host`.
- `stanchion_ip` and `stanchion_port` have been combined into `stanchion_host`.
- `admin_ip` and `admin_port` have been combined into `admin_listener`.
- `webmachine_log_handler` has become `webmachine_access_log_handler`.
For the Stanchion `app.config`:
- `stanchion_ip` and `stanchion_port` have been combined into `listener`.
- `riak_ip` and `riak_port` have been combined into `riak_host`.
Each of the above pairs follows a similar form. Where the old form used a separate IP and Port parameter, the new form combines those as `{new_option, { "IP", Port}}`. For example, if your legacy `app.config` configuration was previously:

    {riak_cs, [
        {cs_ip, "127.0.0.1"},
        {cs_port, 8080 },
        . . .
    ]},

It should now read:

    {riak_cs, [
        {listener, {"127.0.0.1", 8080}},
        . . .
    ]},

and so on.
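The IP/port merge described above for the Riak CS `app.config` is mechanical, so it can be sketched as a small Python helper that works on a dictionary view of the options. This is an illustrative aid, not a supported migration tool: the dictionary representation and the function names `merge_ip_port` and `to_erlang_term` are assumptions made for the example, and the Stanchion `app.config` would need its own mapping.

```python
from typing import Any, Dict, Tuple

# Old (ip_key, port_key) pairs and the combined option that replaces them,
# taken from the Riak CS app.config list above.
RIAK_CS_MERGES: Dict[Tuple[str, str], str] = {
    ("cs_ip", "cs_port"): "listener",
    ("riak_ip", "riak_pb_port"): "riak_host",
    ("stanchion_ip", "stanchion_port"): "stanchion_host",
    ("admin_ip", "admin_port"): "admin_listener",
}


def merge_ip_port(options: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of options with each complete IP/port pair combined."""
    merged = dict(options)
    for (ip_key, port_key), new_key in RIAK_CS_MERGES.items():
        if ip_key in merged and port_key in merged:
            merged[new_key] = (merged.pop(ip_key), merged.pop(port_key))
    return merged


def to_erlang_term(name: str, value: Tuple[str, int]) -> str:
    """Render a combined option in the {name, {"IP", Port}} form."""
    ip, port = value
    return f'{{{name}, {{"{ip}", {port}}}}}'


legacy = {"cs_ip": "127.0.0.1", "cs_port": 8080}
for name, value in merge_ip_port(legacy).items():
    print(to_erlang_term(name, value))
```

Pairs are only merged when both halves are present, so a partially specified legacy configuration is left untouched for manual review.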
Some key objects changed names after the upgrade. Applications may need to change their behaviour due to this bugfix.
A per-user limit on the number of buckets was introduced in 1.5.1. Users who have more than 100 buckets cannot create any bucket after the upgrade unless the limit is extended in the system configuration.
An operational procedure to clean up incomplete multipart uploads under deleted buckets is needed. Otherwise new buckets with names that used to exist can't be created. The operation will fail with 409 Conflict.
Leeway seconds and disk space should also be carefully watched during the upgrade, because timestamp management of garbage collection was changed in the 1.5.0 release. Consult the "Leeway seconds and disk space" section of the 1.5 release notes for a more detailed description.
Basho supports upgrading from the two previous major versions to the latest release. Thus, this document will only cover upgrading from Riak CS versions 1.4.x and 1.5.x.
To upgrade to Riak CS 2.0.0 from versions prior to Riak CS 1.4.0, operators will need to first upgrade their system to Riak CS version 1.4.5 or 1.5.4. Upgrading to Riak CS 1.5.4 is recommended. The underlying Riak installation must also be upgraded to the Riak 1.4.x series, preferably version 1.4.12.
We recommend updating Stanchion before all other subsystems. Be careful not to have multiple live Stanchion nodes accessible from Riak CS nodes at the same time.
Repeat these steps on each node running Stanchion:
- Stop Stanchion
- Back up all Stanchion configuration files
- Uninstall the current Stanchion package
- Install the new Stanchion 2.0.0 package
- Migrate the Stanchion configuration (See below)
- Start Stanchion
Repeat these steps on every host:
- Stop Riak CS
- Stop Riak
- Back up all Riak and Riak CS configuration files and remove all patches
- Uninstall the current Riak CS package
- Uninstall the current Riak package
- Install the new Riak package
- Install the new Riak CS 2.0.0 package
- Migrate the Riak configuration (See below)
- Migrate the Riak CS configuration (See below)
- Start Riak
- Start Riak CS
When Riak CS is not installed on the same host as Riak, Riak CS can be upgraded at any time while the corresponding remote Riak node is alive.
Repeat these steps on every host:
- Stop Riak CS
- Back up all configuration files and remove all patches
- Uninstall the current Riak CS package
- Install the new Riak CS 2.0.0 package
- Migrate the Riak CS configuration (See below)
- Start Riak CS
When upgrading to Stanchion 2.0.0, the files `app.config` and `vm.args` are migrated to the single file `stanchion.conf`. Configuration files are still stored in the same location as before.
Using the Configuration Mapping Tables below, edit your `stanchion.conf` to preserve your configuration preferences between Riak CS 1.5.x and 2.0.0.
The tables show the old and new configuration format and default values.
| 1.5.4 (`app.config`) | 2.0.0 (`stanchion.conf`) |
|---|---|
| `{stanchion_ip, "127.0.0.1"}` | `listener = 127.0.0.1:8080` |
| `{stanchion_port, 8085}` | |
| `{riak_ip, "127.0.0.1"}` | `riak_host = 127.0.0.1:8087` |
| `{riak_pb_port, 8087}` | |
| `{admin_key, "admin-key"}` | `admin.key = admin-key` |
| `{admin_secret, "admin-secret"}` | `admin.secret = admin-secret` |
Riak's Lager configuration can be copied directly to the `advanced.config` file.
| 1.5.4 (`app.config`) | 2.0.0 (`stanchion.conf`) |
|---|---|
| `{ssl, [` | |
| `{certfile, "./etc/cert.pem"}` | `ssl.certfile` |
| `{keyfile, "./etc/key.pem"}` | `ssl.keyfile` |
As Riak CS 2.0.0 only works on top of Riak 2.0.5 -- and does not work on top of the Riak 1.x.x series -- the underlying Riak installation must be upgraded to Riak 2.0.5. This document only covers upgrading from Riak 1.4.x. For more general information on upgrading Riak, please see the Riak upgrading to 2.0 guide.
Below are specific configuration changes required for a Riak 2.0.5 cluster supporting Riak CS.
In older versions of Riak, default bucket properties were configured in `app.config` as follows:

    {riak_core, [
       ...
       {default_bucket_props, [{allow_mult, true}]},
       ...
    ]}.

With Riak 2.0.5, this becomes the following in `riak.conf`:

    buckets.default.allow_mult = true

There are two ways to configure Riak 2.0.5 behind Riak CS 2.0.0:
Option 1: Reuse the existing `app.config` file from Riak 1.4.x
In this case, `add_paths` should be changed to target the new Riak CS binaries installed by the Riak CS 2.0.0 package. These will be changed from `"/usr/lib/riak-cs/lib/riak_cs-1.5.4/ebin"` to `"/usr/lib/riak-cs/lib/riak_cs-2.0.0/ebin"`.
Option 2: Use Riak 2.0.0's new `advanced.config`
You will need to copy all riak_kv configuration items from `app.config` into `advanced.config`, and update `add_paths` to target the new Riak CS binaries installed by the Riak CS 2.0.0 package. In `advanced.config`, this will become:

    {riak_kv, [
      {add_paths, ["/usr/lib/riak-cs/lib/riak_cs-2.0.0/ebin"]}
    ]}.

The `app.config` file must be removed when `advanced.config` is used.
See Setting up the Proper Riak Backend for additional details.
Since the default configuration of the LevelDB memory size has changed, you will need to review your memory size settings. The memory use of Riak CS is primarily influenced by the Bitcask keydir and LevelDB block cache. Additionally, to improve IO performance, some extra memory for the kernel disk cache should be planned for. The equations below might help when specifying the memory size:
- Memory for backends = (Memory for Bitcask) + (Memory for LevelDB)
- Memory for storage = (Memory for backends) + (Memory for kernel cache)
The configuration setting relating to the memory size of LevelDB has changed from `max_open_files` to `total_leveldb_mem_percent` in 2.0. This specifies the total amount of memory consumed by LevelDB. Note that the default memory limit has changed from being proportional to the number of `max_open_files` to being a percentage of the system's physical memory size.
Configuring `total_leveldb_mem_percent` is strongly recommended as its default value of 70% might be too aggressive for a multi-backend configuration that also uses Bitcask. Bitcask keeps its keydir in memory, which could be fairly large depending on the use case.
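To see why the 70% default can be too aggressive, the memory equations above can be turned into a rough budget check. The Python sketch below is a simplification under stated assumptions: it ignores the Erlang VM and any other processes on the host, and the function names and the example figures (64 GiB of RAM, a 16 GiB Bitcask keydir) are hypothetical.

```python
GIB = 1024 ** 3


def leveldb_limit_bytes(physical_mem_bytes: int, total_leveldb_mem_percent: int) -> int:
    """Memory LevelDB may use, as a percentage of physical memory."""
    return physical_mem_bytes * total_leveldb_mem_percent // 100


def kernel_cache_headroom_bytes(
    physical_mem_bytes: int,
    total_leveldb_mem_percent: int,
    bitcask_keydir_bytes: int,
) -> int:
    """Physical memory left for the kernel cache after both backends."""
    backends = bitcask_keydir_bytes + leveldb_limit_bytes(
        physical_mem_bytes, total_leveldb_mem_percent
    )
    return physical_mem_bytes - backends


# Hypothetical node: 64 GiB of RAM and an estimated 16 GiB Bitcask keydir.
for percent in (70, 30):
    headroom = kernel_cache_headroom_bytes(64 * GIB, percent, 16 * GIB)
    print(f"total_leveldb_mem_percent={percent}: {headroom / GIB:.1f} GiB for kernel cache")
```

A small or negative headroom suggests lowering `total_leveldb_mem_percent` or revisiting the keydir estimate.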
Bitcask stores all of its keys in memory as well as on disk. Correctly estimating the total number of keys and their average size in Bitcask is very important for estimating Bitcask memory usage. The total number of keys `N(b)` in Bitcask across the whole cluster will be:

    N(b) = N(o, size <= 1MB) + N(o, size > 1MB) * avg(o, size > 1MB) / 1MB

where `N(o, size <= 1MB)` is the number of objects with a size of 1MB or less, while `N(o, size > 1MB)` is the number of objects with a size greater than 1MB. `avg(o, size > 1MB)` is the average size of objects greater than 1MB in size.
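As a worked example of the key-count formula above, the following Python sketch evaluates `N(b)` for a given object mix. The function name and the sample numbers are hypothetical, and 1MB is taken to be 1024 * 1024 bytes, which is an assumption of this sketch.

```python
MB = 1024 * 1024  # assumed size of 1MB in bytes


def estimate_bitcask_keys(
    small_objects: int,
    large_objects: int,
    avg_large_size_bytes: float,
) -> float:
    """N(b) = N(o, size <= 1MB) + N(o, size > 1MB) * avg(o, size > 1MB) / 1MB."""
    return small_objects + large_objects * avg_large_size_bytes / MB


# Hypothetical cluster: 1,000,000 small objects and 50,000 objects averaging 20MB.
print(estimate_bitcask_keys(1_000_000, 50_000, 20 * MB))
```

In this example the large objects contribute as many keys as the small ones, even though there are twenty times fewer of them.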
The number of keys in Riak CS is related to the amount of data stored in MBs.
If the average lifetime of objects is significantly smaller than the leeway
period, treat objects waiting for garbage collections as live objects on disk.
Actual numbers of key count per vnode are included in the output of `riak-admin vnode-status`. There is an item named `Status` in each vnode section, which includes the `key_count` in the `be_blocks` section.
Once the number of keys is known, estimate the amount of memory used by Bitcask keydir as per the Bitcask Capacity Planning documentation.
The bucket name size is always 19 bytes (see `riak_cs_utils:to_bucket_name/2`) and the key size is always 20 bytes (see `riak_cs_lfs_utils:block_name/3`). The average value size is close to 1MB if large objects are dominant, otherwise it should be estimated according to the specific use case. The number of writes is 3.
The upgrade of Riak from the 1.4.x series to the 2.0.x series is described in
Upgrading Your Configuration System. The
following are several major configuration items which are essential for Riak CS.
`erlang.distribution_buffer_size` is commented out by default.
| Riak 1.4 | Riak 2.0 |
|---|---|
| `+zdbbl` | `erlang.distribution_buffer_size = 1MB` |
| `-name [email protected]` | `nodename = [email protected]` |
| `-setcookie riak` | `distributed_cookie = riak` |
If storage statistics are desired on your system, several more configuration options are required. Please see the storage statistics documentation for additional details.
The underlying Bitcask storage format has been changed in Riak 2.0.x to fix several important issues. The first start of Riak after an upgrade involves an implicit data format upgrade conversion, which means that all data files are read, and written out to new files. This might lead to higher than normal disk load. The duration of the upgrade will depend on the amount of data stored in Bitcask and the IO performance of the underlying disk.
The data conversion will start with logs like this:

    2015-03-17 02:43:20.813 [info] <0.609.0>@riak_kv_bitcask_backend:maybe_start_upgrade_if_bitcask_files:720 Starting upgrade to version 1.7.0 in /mnt/data/bitcask/1096126227998177188652763624537212264741949407232
    2015-03-17 02:43:21.344 [info] <0.610.0>@riak_kv_bitcask_backend:maybe_start_upgrade_if_bitcask_files:720 Starting upgrade to version 1.7.0 in /mnt/data/bitcask/1278813932664540053428224228626747642198940975104

The end of the data conversion can be observed as info log entries in Riak logs like this:

    2015-03-17 07:18:49.754 [info] <0.609.0>@riak_kv_bitcask_backend:callback:446 Finished upgrading to Bitcask 1.7.0 in /mnt/data/bitcask/1096126227998177188652763624537212264741949407232
    2015-03-17 07:23:07.181 [info] <0.610.0>@riak_kv_bitcask_backend:callback:446 Finished upgrading to Bitcask 1.7.0 in /mnt/data/bitcask/1278813932664540053428224228626747642198940975104

Multibag configurations must be moved to `advanced.config` for both Riak CS and Stanchion.
When upgrading to Riak CS 2.0.0, the files `app.config` and `vm.args` are migrated to the single file `riak-cs.conf`. Configuration files are still stored in the same location as before.
Note: `app.config` should be removed once you’ve completed the upgrade and `riak-cs.conf` is being used.
Some important configuration changes occurred between 1.5.x and 2.0.0, and not all items were translated one-to-one.
Using the Configuration Mapping Tables below, edit your `riak-cs.conf` to preserve your configuration preferences between Riak CS 1.5.x and 2.0.0.
The tables show the old and new configuration format and default values.
Note: `stats.storage.schedule.$time` does not have any default value, but an example is added.
| 1.5.4 (`app.config`) | 2.x (`riak-cs.conf`) | Note |
|---|---|---|
| `{cs_ip, "127.0.0.1"}` | `listener = 127.0.0.1:8080` | |
| `{cs_port, 8080}` | | |
| `{riak_ip, "127.0.0.1"}` | `riak_host = 127.0.0.1:8087` | |
| `{riak_pb_port, 8087}` | | |
| `{stanchion_ip, "127.0.0.1"}` | `stanchion_host = 127.0.0.1:8085` | |
| `{stanchion_port, 8085 }` | | |
| `{stanchion_ssl, false }` | `stanchion_ssl = off` | |
| `{anonymous_user_creation, false}` | `anonymous_user_creation = off` | |
| `{admin_key, "admin-key"}` | `admin.key = admin-key` | |
| `{admin_secret, "admin-secret"}` | `admin.secret = admin-secret` | |
| `{cs_root_host, "s3.amazonaws.com"}` | `root_host = s3.amazonaws.com` | |
| `{connection_pools,[` | | |
| `{request_pool, {128, 0} },` | `pool.request.size = 128` | |
| | `pool.request.overflow = 0` | |
| `{bucket_list_pool, {5, 0} }` | `pool.list.size = 5` | |
| | `pool.list.overflow = 0` | |
| `{max_buckets_per_user, 100}` | `max_buckets_per_user = 100` | from 2.0.1 |
| `{trust_x_forwarded_for, false}` | `trust_x_forwarded_for = off` | |
| `{leeway_seconds, 86400}` | `gc.leeway_period = 24h` | |
| `{gc_interval, 900}` | `gc.interval = 15m` | |
| `{gc_retry_interval, 21600}` | `gc.retry_interval = 6h` | |
| `{gc_batch_size, 1000}` | `gc.batch_size = 1000` | from 2.0.1 |
| `{access_log_flush_factor, 1}` | `stats.access.flush_factor = 1` | |
| `{access_log_flush_size, 1000000}` | `stats.access.flush_size = 1000000` | |
| `{access_archive_period, 3600}` | `stats.access.archive_period = 1h` | |
| `{access_archiver_max_backlog, 2}` | `stats.access.archiver.max_backlog = 2` | |
| (no explicit default) | `stats.access.archiver.max_workers = 2` | |
| `{storage_schedule, []}` | `stats.storage.schedule.$time = 0600` | |
| `{storage_archive_period, 86400}` | `stats.storage.archive_period = 1d` | |
| `{usage_request_limit, 744}` | `riak_cs.usage_request_limit = 31d` | |
| `{cs_version, 10300 }` | `cs_version = 10300` | |
| `{dtrace_support, false}` | `dtrace = off` | |
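Several rows in the table above replace a raw number with a duration such as `24h` or `15m`. When translating your own values, a quick check like the Python sketch below can confirm that the old and new settings describe the same period. It handles only single-unit values with the suffixes `s`, `m`, `h` and `d`, which is a simplification for this example rather than a description of everything the configuration system accepts; `duration_to_seconds` is a hypothetical helper name.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_to_seconds(text: str) -> int:
    """Convert a single-unit duration such as '24h' or '15m' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


# Old app.config values (in seconds) next to their riak-cs.conf equivalents.
pairs = [
    ("leeway_seconds", 86400, "24h"),
    ("gc_interval", 900, "15m"),
    ("gc_retry_interval", 21600, "6h"),
    ("access_archive_period", 3600, "1h"),
    ("storage_archive_period", 86400, "1d"),
]
for name, old_seconds, new_value in pairs:
    print(name, old_seconds == duration_to_seconds(new_value))
```

Note that `usage_request_limit` is the odd one out: the old value of 744 matches `31d` only when read as hours (31 days is 744 hours), so it should not be compared in seconds.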
| 1.5.4 (`app.config`) | 2.0.0 (`riak-cs.conf`) | Note |
|---|---|---|
| `{server_name, "Riak CS"}` | `server_name = Riak CS` | |
| `{log_handlers, ....}` | `log.access = true` | from 2.0.1 |
| | `log.access.dir = /var/log/riak-cs` | |
Due to a WebMachine change, if `log_handlers` are defined in `app.config` or `advanced.config`, the log handler's name should be changed as follows:

    {log_handlers, [
        {webmachine_access_log_handler, ["/var/log/riak-cs"]},
        {riak_cs_access_log_handler, []}
        ]},

This does not have to be changed if `log_handlers` is not defined in `app.config` or `advanced.config`.
Riak's Lager configuration can be copied directly to the `advanced.config` file.
All commented out items are undefined and disabled, except modules.
`rewrite_module` and `auth_module` are commented out, but the default value does not change from Riak CS 1.5.4. This section is for showing operators how to change these settings to the OOS API.
| 1.5.4 (`app.config`) | 2.0.0 (`riak-cs.conf`) |
|---|---|
| `{rewrite_module, riak_cs_s3_rewrite }` | `rewrite_module` |
| `{auth_module, riak_cs_s3_auth },` | `auth_module` |
| `{admin_ip, "127.0.0.1"}` | `admin.listener = 127.0.0.1:8000` |
| `{admin_port, 8000 }` | |
| `{ssl, [` | |
| `{certfile, "./etc/cert.pem"}` | `ssl.certfile` |
| `{keyfile, "./etc/key.pem"}` | `ssl.keyfile` |
The following configurations do not have corresponding items in `riak-cs.conf`:
- `fold_objects_for_list_keys`
- `n_val_1_get_requests`
- `gc_paginated_indexes`
If these values are still set as `false`, they should be omitted from your Riak CS 2.0.0 configuration.
If the old behavior is preferred, they must be included in the `riak_cs` section of `advanced.config`.
To downgrade from Riak CS 2.0.0 to Riak CS 1.5.x and Stanchion 2.0.0 to Stanchion 1.5.0, repeat the following instructions for each node:
- Stop Riak CS
- Stop Riak
- Uninstall the Riak CS 2.0.0 package
- Uninstall the Riak 2.0.5 package
- Run the Bitcask downgrade script for all Bitcask directories*
- Install the desired Riak package
- Install the desired Riak CS package
- Restore configuration files
- Start Riak
- Start Riak CS
Finally, on any nodes running Stanchion:
- Stop Stanchion
- Uninstall the Stanchion 2.0.0 package
- Install the desired Stanchion package
- Restore Stanchion configuration files
- Start Stanchion
*The Bitcask file format has changed between Riak 1.4.x and 2.0.0. While the implicit upgrade of Bitcask data files is supported, automatic downgrade of Bitcask data files is not. For this reason, downgrading requires a script to translate data files. See also the 2.0 downgrade notes.