Skip to content

Commit

Permalink
Fix recover ssd cache data by preserving file id map (facebookincubat…
Browse files Browse the repository at this point in the history
…or#10409)

Summary:
In Meta interactive Prestissimo workload, we found there is not much read can hit in SSD cache
after cluster upgrade or worker reboot if we rerun the same set of queries. Sometime this is
because of the non-deterministic  execution order of multiple filter conditions in table scan. If the
filter rate is high, then the last few filter conditions might not be executed and the decision is
local to a split which means that filter condition execution order varies across splits. This might
cause the rerun query to read from remote storage instead of local SSD cache. But this only
cause a small portion of reads go to remote storage.

The main problem is due to the recover cache entries from checkpoint. Even though we guarantee
that the same source file name maps to the same file id but we can't guarantee that the newly
assigned file id will be mapped to the same SSD file shard. This will cause the cache space polution.
Even if we recovered all the SSD cache entries from checkpoint file, the cache lookup can't find them
if the newly assigned file id is mapped to a different SSD file shard.

To fix this problem, we add a new constructor in StringIdMap which takes both id and string to recover
the previous string id map. The caller guarantees there is no conflict for the passed id/string pairs, and
StringIdMap will throw if breaks. Correspondingly, when SSD file recovers from the checkpoint file, it
will use the persisted file id and name to rebuild the string id map.

Verified on Meta interactive cluster with SSD and a second run after cluster reboot read all the data from
local SSD. The end-to-end query latency has been reduced by half compared with the first run.

Pull Request resolved: facebookincubator#10409

Reviewed By: oerling

Differential Revision: D59418142

Pulled By: xiaoxmeng

fbshipit-source-id: e88a1aaed303bd1913b5eeb67389da4df32faf47
  • Loading branch information
xiaoxmeng authored and facebook-github-bot committed Jul 6, 2024
1 parent 259a4e8 commit 770f2ad
Show file tree
Hide file tree
Showing 4 changed files with 117 additions and 16 deletions.
3 changes: 1 addition & 2 deletions velox/common/caching/SsdFile.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -998,8 +998,7 @@ void SsdFile::readCheckpoint(std::ifstream& state) {
std::string name;
name.resize(readNumber<int32_t>(state));
state.read(name.data(), name.size());
auto lease = StringIdLease(fileIds(), name);
idMap[id] = std::move(lease);
idMap[id] = StringIdLease(fileIds(), id, name);
}

const auto logSize = ::lseek(evictLogFd_, 0, SEEK_END);
Expand Down
33 changes: 32 additions & 1 deletion velox/common/caching/StringIdMap.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -77,10 +77,41 @@ uint64_t StringIdMap::makeId(std::string_view string) {
} while (idToEntry_.find(entry.id) != idToEntry_.end());
entry.numInUse = 1;
pinnedSize_ += string.size();
auto id = entry.id;
const auto id = entry.id;
idToEntry_[id] = std::move(entry);
stringToId_[string] = id;
return lastId_;
}

uint64_t StringIdMap::recoverId(uint64_t id, std::string_view string) {
std::lock_guard<std::mutex> l(mutex_);
auto it = stringToId_.find(string);
if (it != stringToId_.end()) {
VELOX_CHECK_EQ(
id, it->second, "Multiple recover ids assigned to {}", string);
auto entry = idToEntry_.find(it->second);
VELOX_CHECK(entry != idToEntry_.end());
if (++entry->second.numInUse == 1) {
pinnedSize_ += entry->second.string.size();
}
return id;
}

VELOX_CHECK_EQ(
idToEntry_.count(id),
0,
"Reused recover id {} assigned to {}",
id,
string);

Entry entry;
entry.string = string;
entry.id = id;
lastId_ = std::max(lastId_, id);
entry.numInUse = 1;
pinnedSize_ += string.size();
idToEntry_[id] = std::move(entry);
stringToId_[string] = id;
return id;
}
} // namespace facebook::velox
47 changes: 34 additions & 13 deletions velox/common/caching/StringIdMap.h
Original file line number Diff line number Diff line change
Expand Up @@ -35,35 +35,48 @@ class StringIdMap {
void operator=(const StringIdMap& other) = delete;
void operator=(StringIdMap&& other) = delete;

// Returns the id of 'string' or kNoId if the string is not known.
/// Returns the id of 'string' or kNoId if the string is not known.
uint64_t id(std::string_view string);

// Returns the total length of strings involved in currently referenced
// mappings.
/// Returns the total length of strings involved in currently referenced
/// mappings.
int64_t pinnedSize() const {
return pinnedSize_;
}

// Returns the id for 'string' and increments its use count. Assigns a
// new id if none exists. Must be released with release() when no longer used.
/// Returns the id for 'string' and increments its use count. Assigns a
/// new id if none exists. Must be released with release() when no longer
/// used.
uint64_t makeId(std::string_view string);

/// Returns the id for 'string' and increments its use count. Assigns a new id
/// if none exists. Must be released with release() when no longer used.
/// Recovers string id map by assigning 'id' to 'string' and increments its
/// use count. The function returns the recovered 'id'. It throws if 'id' has
/// already been assigned to other string or 'string' has already assigned a
/// different id. Must be released with release() when no longer used.
///
/// NOTE: this is used by SSD cache to recover the file name to file id
/// mapping. This is to ensure the same file to be mapped to the same SSD file
/// shard after recover.
uint64_t recoverId(uint64_t id, std::string_view string);

// Decrements the use count of id and may free the associated memory if no
// uses remain.
void release(uint64_t id);

// Increments the use count of 'id'.
/// Increments the use count of 'id'.
void addReference(uint64_t id);

// Returns a copy of the string associated with id or empty string if id has
// no string.
/// Returns a copy of the string associated with id or empty string if id has
/// no string.
std::string string(uint64_t id) {
std::lock_guard<std::mutex> l(mutex_);
auto it = idToEntry_.find(id);
return it == idToEntry_.end() ? "" : it->second.string;
}

// Resets StringIdMap.
/// Resets StringIdMap.
void testingReset() {
std::lock_guard<std::mutex> l(mutex_);
stringToId_.clear();
Expand All @@ -72,30 +85,38 @@ class StringIdMap {
pinnedSize_ = 0;
}

uint64_t testingLastId() const {
std::lock_guard<std::mutex> l(mutex_);
return lastId_;
}

private:
struct Entry {
std::string string;
uint64_t id;
uint32_t numInUse{};
};

std::mutex mutex_;
mutable std::mutex mutex_;
folly::F14FastMap<std::string, uint64_t> stringToId_;
folly::F14FastMap<uint64_t, Entry> idToEntry_;
uint64_t lastId_{0};
uint64_t pinnedSize_{0};
};

// Keeps a string-id association live for the duration of this.
/// Keeps a string-id association live for the duration of this.
class StringIdLease {
public:
StringIdLease() = default;

// Makes a lease for 'string' and makes sure it has an id.
/// Makes a lease for 'string' and makes sure it has an id.
StringIdLease(StringIdMap& ids, std::string_view string)
: ids_(&ids), id_(ids_->makeId(string)) {}

// Makes a new lease for an id that already references a string.
StringIdLease(StringIdMap& ids, uint64_t id, std::string_view string)
: ids_(&ids), id_(ids_->recoverId(id, string)) {}

/// Makes a new lease for an id that already references a string.
StringIdLease(StringIdMap& ids, uint64_t id) : ids_(&ids), id_(id) {
ids_->addReference(id_);
}
Expand Down
50 changes: 50 additions & 0 deletions velox/common/caching/tests/StringIdMapTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
#include "velox/common/caching/StringIdMap.h"

#include "gtest/gtest.h"
#include "velox/common/base/tests/GTestUtils.h"

using namespace facebook::velox;

Expand Down Expand Up @@ -53,3 +54,52 @@ TEST(StringIdMapTest, rehash) {
EXPECT_EQ(ids[i].id(), StringIdLease(map, name).id());
}
}

TEST(StringIdMapTest, recover) {
constexpr const char* kRecoverFile1 = "file_1";
constexpr const char* kRecoverFile2 = "file_2";
constexpr const char* kRecoverFile3 = "file_3";
StringIdMap map;
const uint64_t recoverId1{10};
const uint64_t recoverId2{20};
{
StringIdLease lease(map, recoverId1, kRecoverFile1);
ASSERT_TRUE(lease.hasValue());
ASSERT_EQ(map.pinnedSize(), ::strlen(kRecoverFile1));
ASSERT_EQ(map.testingLastId(), recoverId1);
VELOX_ASSERT_THROW(
std::make_unique<StringIdLease>(map, recoverId1, kRecoverFile2),
"(1 vs. 0) Reused recover id 10 assigned to file_2");
VELOX_ASSERT_THROW(
std::make_unique<StringIdLease>(map, recoverId2, kRecoverFile1),
"(20 vs. 10) Multiple recover ids assigned to file_1");
}
ASSERT_EQ(map.pinnedSize(), 0);

StringIdLease lease1(map, kRecoverFile1);
ASSERT_EQ(map.pinnedSize(), ::strlen(kRecoverFile1));
ASSERT_EQ(map.testingLastId(), recoverId1 + 1);

{
StringIdLease lease(map, recoverId2, kRecoverFile2);
ASSERT_TRUE(lease.hasValue());
ASSERT_EQ(lease.id(), recoverId2);
ASSERT_EQ(
map.pinnedSize(), ::strlen(kRecoverFile1) + ::strlen(kRecoverFile2));
ASSERT_EQ(map.testingLastId(), recoverId2);
VELOX_ASSERT_THROW(
std::make_unique<StringIdLease>(map, recoverId2, kRecoverFile3),
"(1 vs. 0) Reused recover id 20 assigned to file_3");
VELOX_ASSERT_THROW(
std::make_unique<StringIdLease>(map, recoverId2, kRecoverFile1),
"(20 vs. 11) Multiple recover ids assigned to file_1");
StringIdLease dupLease(map, recoverId2, kRecoverFile2);
ASSERT_TRUE(lease.hasValue());
ASSERT_EQ(lease.id(), recoverId2);
ASSERT_EQ(
map.pinnedSize(), ::strlen(kRecoverFile1) + ::strlen(kRecoverFile2));
}

ASSERT_EQ(map.testingLastId(), recoverId2);
ASSERT_EQ(map.pinnedSize(), ::strlen(kRecoverFile1));
}

0 comments on commit 770f2ad

Please sign in to comment.