Skip to content

Commit

Permalink
Added configuration file parameters for repository metadata lock timings
Browse files Browse the repository at this point in the history
  • Loading branch information
Madeeks committed May 5, 2023
1 parent b56ef57 commit c64b7f6
Show file tree
Hide file tree
Showing 9 changed files with 96 additions and 14 deletions.
7 changes: 4 additions & 3 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
- Added the `sarus hooks` command to list the hooks configured for the engine
- Added the `--annotation` option to `sarus run` for setting custom annotations in the OCI bundle. More details [here](https://sarus.readthedocs.io/en/stable/user/user_guide.html#setting-oci-annotations)
- Added the `--mpi-type` option to `sarus run` for selecting an MPI hook among those configured by the system administrator
- Added a warning message when acquisition of a lock file on the local repository metadata file is taking an unusually long time.
The message is displayed at a [configurable interval](https://sarus.readthedocs.io/en/stable/config/configuration_reference.html#repositorymetadatalocktimings-object-optional) (default 10 seconds), until the lock acquisition timeout is reached.
- Added support for the optional `defaultMPIType` parameter in the `sarus.json` configuration file. More details [here](https://sarus.readthedocs.io/en/stable/config/configuration_reference.html#defaultmpitype-string-optional).
- Added a warning message when acquisition of a lockfile on the local repository metadata file is taking an unusually long time.
The message is displayed every 1 second, until the lock acquisition timeout is reached.
- Added support for the optional `repositoryMetadataLockTimings` parameter in the `sarus.json` configuration file. More details [here](https://sarus.readthedocs.io/en/stable/config/configuration_reference.html#repositorymetadatalocktimings-object-optional).
- Added a new OCI hook to perform arbitrary sequences of bind mounts and device mounts into containers.
The hook is meant to streamline the implementation and usage of advanced features which can be enabled through sets of related mounts.
More details [here](https://sarus.readthedocs.io/en/stable/config/mount-hook.html).
Expand All @@ -22,7 +23,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

### Changed

- Sarus will now exit with an error if an operation requiring a lockfile on the local repository metadata cannot acquire a lock within 10 seconds.
- Sarus will now exit with an error if an operation requiring a lock file on the local repository metadata cannot acquire a lock within the [configured timeout duration](https://sarus.readthedocs.io/en/stable/config/configuration_reference.html#repositorymetadatalocktimings-object-optional) (default 60 seconds).
Previously, Sarus would keep attempting to acquire a lock indefinitely.
- When printing error traces, entries related to standard C++ exceptions now provide clearer information
- Updated recommended runc version to 1.1.6
Expand Down
30 changes: 29 additions & 1 deletion doc/config/configuration_reference.rst
Original file line number Diff line number Diff line change
Expand Up @@ -402,6 +402,30 @@ This could fit the case of a system where only a single MPI hook is configured.
Refer to :ref:`this section <mpi-hook-config-annotations-cli>` for more details
about the interaction between command line options, annotations, and hook selection.

.. _config-reference-repositoryMetadataLockTimings:

repositoryMetadataLockTimings (object, OPTIONAL)
------------------------------------------------
To prevent race conditions when accessing metadata files of local image
repositories, Sarus implements a mechanism based on a lock file.

The ``repositoryMetadataLockTimings`` JSON object allows to control the
behavior of Sarus while it is waiting to acquire the lock on the metadata file.
This object can have the following optional fields:

* ``timeoutMs`` (integer): Maximum amount of time (in milliseconds) for which
Sarus attempts to acquire the lock. If Sarus is still waiting after this time
has elapsed, the program times out and exits with an error.
The value must be a positive integer. If the parameter is not defined,
defaults to 60000 ms (60 seconds).
* ``warningMs``: While waiting to acquire the lock, Sarus prints regular
warning messages to inform of its status (i.e. access to the metadata file
is taking longer but the program is not stalled for another reason) until
the timeout is reached or acquisition of the lock is successful.
This parameter determines the time interval (in milliseconds) between such
warning messages. The value must be a positive integer. If the parameter is not
defined, defaults to 10000 ms (10 seconds).

.. _config-reference-PMIxv3:

enablePMIxv3Support (bool, OPTIONAL) (experimental)
Expand Down Expand Up @@ -482,5 +506,9 @@ Example configuration file
"enforce": false
},
"containersRegistries.dPath": "/opt/sarus/1.5.2/etc/registries.d"
"defaultMPIType": "mpich"
"defaultMPIType": "mpich",
"repositoryMetadataLockTimings": {
"timeoutMs": 120000,
"warningMs": 15000
}
}
13 changes: 13 additions & 0 deletions etc/sarus.schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -122,6 +122,19 @@
},
"enablePMIxv3Support": {
"type": "boolean"
},
"repositoryMetadataLockTimings": {
"type": "object",
"properties": {
"timeoutMs": {
"type": "integer",
"minimum": 1
},
"warningMs": {
"type": "integer",
"minimum": 1
}
}
}
},
"required": [
Expand Down
4 changes: 2 additions & 2 deletions src/common/Lockfile.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ Lockfile::Lockfile()
: logger(&common::Logger::getInstance())
{}

Lockfile::Lockfile(const boost::filesystem::path& file, unsigned int timeoutMs)
Lockfile::Lockfile(const boost::filesystem::path& file, unsigned int timeoutMs, unsigned int warningMs)
: logger(&common::Logger::getInstance())
, lockfile{convertToLockfile(file)}
{
Expand All @@ -45,7 +45,7 @@ Lockfile::Lockfile(const boost::filesystem::path& file, unsigned int timeoutMs)
const int backoffTimeMs = 100;
std::this_thread::sleep_for(std::chrono::milliseconds(backoffTimeMs));
elapsedTimeMs += backoffTimeMs;
if(elapsedTimeMs % 1000 == 0) {
if(elapsedTimeMs % warningMs == 0) {
message = boost::format("Still attempting to acquire lock on file %s after %d ms (will timeout after %d milliseconds)...")
% *lockfile % elapsedTimeMs % timeoutMs;
logger->log(message.str(), loggerSubsystemName, common::LogLevel::WARN);
Expand Down
2 changes: 1 addition & 1 deletion src/common/Lockfile.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ class Lockfile {

public:
Lockfile();
Lockfile(const boost::filesystem::path& file, unsigned int timeoutMs=noTimeout);
Lockfile(const boost::filesystem::path& file, unsigned int timeoutMs=noTimeout, unsigned int warningMs=1000);
Lockfile(const Lockfile&) = delete;
Lockfile(Lockfile&&);
~Lockfile();
Expand Down
7 changes: 6 additions & 1 deletion src/common/test/json/valid.json
Original file line number Diff line number Diff line change
Expand Up @@ -52,5 +52,10 @@
"path": "/opt/sarus/etc/policy.json",
"enforce": false
},
"containersRegistries.dPath": "/opt/sarus/etc/registries.d"
"containersRegistries.dPath": "/opt/sarus/etc/registries.d",
"defaultMPIType": "mpich",
"repositoryMetadataLockTimings": {
"timeoutMs": 120000,
"warningMs": 15000
}
}
18 changes: 18 additions & 0 deletions src/common/test/test_JSON.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,18 @@ TEST(JSONTestGroup, validFile) {
CHECK_EQUAL(config->json["seccompProfile"].GetString(), std::string("/opt/sarus/etc/seccomp/default.json"));
CHECK_EQUAL(config->json["apparmorProfile"].GetString(), std::string("sarus-default"));
CHECK_EQUAL(config->json["selinuxLabel"].GetString(), std::string("system_u:system_r:svirt_sarus_t:s0:c124,c675"));
CHECK_EQUAL(config->json["selinuxMountLabel"].GetString(), std::string("system_u:object_r:svirt_sarus_file_t:s0:c715,c811"));

const rapidjson::Value& containersPolicy = config->json["containersPolicy"];
CHECK_EQUAL(containersPolicy["path"].GetString(), std::string("/opt/sarus/etc/policy.json"));
CHECK_EQUAL(containersPolicy["enforce"].GetBool(), false);

CHECK_EQUAL(config->json["containersRegistries.dPath"].GetString(), std::string("/opt/sarus/etc/registries.d"));
CHECK_EQUAL(config->json["defaultMPIType"].GetString(), std::string("mpich"));

const rapidjson::Value& lockTimings = config->json["repositoryMetadataLockTimings"];
CHECK_EQUAL(lockTimings["timeoutMs"].GetInt(), 120000);
CHECK_EQUAL(lockTimings["warningMs"].GetInt(), 15000);
}

TEST(JSONTestGroup, minimumRequirementsFile) {
Expand All @@ -96,4 +108,10 @@ TEST(JSONTestGroup, siteMountWithoutType) {
CHECK_THROWS(sarus::common::Error, sarus::common::Config(jsonFile, jsonSchemaFile));
}

TEST(JSONTestGroup, invalidLockTiming) {
boost::filesystem::path jsonFile(testSourceDir / "json/invlid_lock_timing.json");
boost::filesystem::path jsonSchemaFile(projectRootDir / "etc/sarus.schema.json");
CHECK_THROWS(sarus::common::Error, sarus::common::Config(jsonFile, jsonSchemaFile));
}

SARUS_UNITTEST_MAIN_FUNCTION();
26 changes: 21 additions & 5 deletions src/image_manager/ImageStore.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,23 @@ namespace image_manager {
ImageStore::ImageStore(std::shared_ptr<const common::Config> config)
: imagesDirectory{config->directories.images}
, metadataFile{config->directories.repository / "metadata.json"}
{}
{
auto rjPtr = rj::Pointer("/repositoryMetadataLockTimings/timeoutMs");
if (const rj::Value* configLockTimeoutMs = rjPtr.Get(config->json)) {
lockTimeoutMs = configLockTimeoutMs->GetInt();
}
else {
lockTimeoutMs = 60000;
}

rjPtr = rj::Pointer("/repositoryMetadataLockTimings/warningMs");
if (const rj::Value* configLockWarningMs = rjPtr.Get(config->json)) {
lockWarningMs = configLockWarningMs->GetInt();
}
else {
lockWarningMs = 10000;
}
}

/**
* Add the container image into repository (or update existing object)
Expand All @@ -45,7 +61,7 @@ namespace image_manager {
common::LogLevel::INFO);

try {
common::Lockfile lock{metadataFile, lockTimeoutMs};
common::Lockfile lock{metadataFile, lockTimeoutMs, lockWarningMs};
auto metadata = readRepositoryMetadata();

// remove previous entries with the same image reference (if any)
Expand Down Expand Up @@ -82,7 +98,7 @@ namespace image_manager {
common::LogLevel::INFO);

try {
common::Lockfile lock{metadataFile, lockTimeoutMs};
common::Lockfile lock{metadataFile, lockTimeoutMs, lockWarningMs};
auto repositoryMetadata = readRepositoryMetadata();
auto imageMetadata = findImageMetadata(imageReference, repositoryMetadata);

Expand Down Expand Up @@ -114,7 +130,7 @@ namespace image_manager {
auto images = std::vector<common::SarusImage>{};

try {
common::Lockfile lock{metadataFile, lockTimeoutMs};
common::Lockfile lock{metadataFile, lockTimeoutMs, lockWarningMs};
auto repositoryMetadata = readRepositoryMetadata();
for (const auto& imageMetadata : repositoryMetadata["images"].GetArray()) {
// If backing files are present, all image data is available: add the image to list to be visualized.
Expand Down Expand Up @@ -143,7 +159,7 @@ namespace image_manager {
boost::optional<common::SarusImage> image;

try {
common::Lockfile lock{metadataFile, lockTimeoutMs};
common::Lockfile lock{metadataFile, lockTimeoutMs, lockWarningMs};
auto repositoryMetadata = readRepositoryMetadata();
auto imageMetadata = findImageMetadata(reference, repositoryMetadata);
if (imageMetadata) {
Expand Down
3 changes: 2 additions & 1 deletion src/image_manager/ImageStore.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,8 @@ class ImageStore {
const std::string sysname = "ImageStore"; // system name for logger
boost::filesystem::path imagesDirectory;
boost::filesystem::path metadataFile;
unsigned int lockTimeoutMs = 10000;
unsigned int lockWarningMs;
unsigned int lockTimeoutMs;
};

} // namespace
Expand Down

0 comments on commit c64b7f6

Please sign in to comment.