ql:contains-word now can show the score of the word match in the respective text #1397

Merged: 56 commits merged Dec 16, 2024
Changes from 2 commits
Commits
ea9d39c
ql:contains-word now can show the respective word-score.
Flixtastic Jul 12, 2024
30736ef
Fixed tests and formatted files.
Flixtastic Jul 12, 2024
e752db8
New formatting for Word Score Variables. Changed where necessary and …
Flixtastic Jul 27, 2024
4ef4d93
Merge branch 'ad-freiburg:master' into master
Flixtastic Jul 27, 2024
d52063f
Merge branch 'ad-freiburg:master' into master
Flixtastic Jul 29, 2024
c6fe0c6
Merge branch 'master' of github.com:Flixtastic/qlever.
Flixtastic Jul 29, 2024
d0b9ee8
Added getWordSCoreVariable for std::string_view
Flixtastic Jul 29, 2024
2eade97
Merge branch 'ad-freiburg:master' into master
Flixtastic Sep 23, 2024
595cb57
Merge branch 'ad-freiburg:master' into master
Flixtastic Oct 4, 2024
b4c8c3b
Merge branch 'ad-freiburg:master' into master
Flixtastic Oct 26, 2024
72e5d64
Merge branch 'ad-freiburg:master' into master
Flixtastic Nov 12, 2024
d8f9df4
Merge branch 'ad-freiburg:master' into master
Flixtastic Nov 15, 2024
29511c6
Made it possible to construct query execution contexts with text inde…
Flixtastic Nov 15, 2024
3855978
Merge branch 'ad-freiburg:master' into master
Flixtastic Nov 17, 2024
6021401
Reduced usage of column copying in TextIndexScanForWord.cpp
Flixtastic Nov 17, 2024
d9701ae
Merge branch 'ad-freiburg:master' into master
Flixtastic Nov 19, 2024
5f0ce01
Merge branch 'ad-freiburg:master' into master
Flixtastic Dec 3, 2024
e2c47cf
Merge branch 'ad-freiburg:master' into master
Flixtastic Dec 3, 2024
e6a0cf7
Merge branch 'ad-freiburg:master' into master
Flixtastic Dec 4, 2024
ed9fbda
Changed the counting of nofNonLiterals to nofLiterals. Some methods a…
Flixtastic Dec 4, 2024
5ad3d8f
Merge branch 'ad-freiburg:master' into master
Flixtastic Dec 4, 2024
af6bd64
Merge branch 'ad-freiburg:master' into master
Flixtastic Dec 5, 2024
56ea531
Cleaned up the filtering in TextIndexScanForWord::computeResult and c…
Flixtastic Dec 5, 2024
e1e12e9
renamed nofLiterals to nofLiteralsInTextIndex
Flixtastic Dec 5, 2024
017588c
Removed redundant method getWordScoreVariable
Flixtastic Dec 5, 2024
46666d0
added method appendEscapedWord to escape special chars in Variables
Flixtastic Dec 5, 2024
f36f189
Added two function in the TextIndexScanTestHelpers.h to add content t…
Flixtastic Dec 5, 2024
c62a7e6
Added tests for Scores. Also commented tests and refined them
Flixtastic Dec 5, 2024
89f0b27
Changed the getQec function and the respective makeTestIndex to take …
Flixtastic Dec 5, 2024
058e8ed
Merge branch 'ad-freiburg:master' into master
Flixtastic Dec 6, 2024
e8bf56e
Fix the multiple definition error.
joka921 Dec 12, 2024
5173aeb
Merge branch 'master' into flixtastic-master
joka921 Dec 12, 2024
4a15994
Make query planning of index scans fast again (#1674)
joka921 Dec 12, 2024
70964d6
Allow operations to not store their result in the cache (#1665)
joka921 Dec 12, 2024
4237e0d
For C++17, use `range-v3` instead of `std::ranges` (#1667)
joka921 Dec 12, 2024
1adcecb
Reverting the nofLiterals being saved in the TextMetaData and instead…
Flixtastic Dec 12, 2024
f5eefab
Revert to first sync and then reapply "Reverting the nofLiterals bein…
Flixtastic Dec 12, 2024
583a67a
ql:contains-word now can show the respective word-score.
Flixtastic Dec 12, 2024
e4cb2ed
Fixed tests and formatted files.
Flixtastic Jul 12, 2024
3ce304d
New formatting for Word Score Variables. Changed where necessary and …
Flixtastic Jul 27, 2024
eb8e83a
Added getWordSCoreVariable for std::string_view
Flixtastic Jul 29, 2024
cd4789a
Made it possible to construct query execution contexts with text inde…
Flixtastic Dec 12, 2024
fdba417
Changed the counting of nofNonLiterals to nofLiterals. Some methods a…
Flixtastic Dec 4, 2024
6686325
renamed nofLiterals to nofLiteralsInTextIndex
Flixtastic Dec 5, 2024
0faf3d0
Removed redundant method getWordScoreVariable
Flixtastic Dec 5, 2024
eafd594
added method appendEscapedWord to escape special chars in Variables
Flixtastic Dec 5, 2024
fd01a97
Added two function in the TextIndexScanTestHelpers.h to add content t…
Flixtastic Dec 5, 2024
65842f4
Added tests for Scores. Also commented tests and refined them
Flixtastic Dec 5, 2024
baa10cf
Changed the getQec function and the respective makeTestIndex to take …
Flixtastic Dec 5, 2024
6bb80d3
Fix the multiple definition error.
joka921 Dec 12, 2024
d093d85
Reverting the nofLiterals being saved in the TextMetaData and instead…
Flixtastic Dec 12, 2024
716e828
Revert to first sync and then reapply "Reverting the nofLiterals bein…
Flixtastic Dec 12, 2024
613a2c4
Merge remote-tracking branch 'origin/master'
Flixtastic Dec 12, 2024
2e32bd3
Reverting the nofLiterals being saved in the TextMetaData and instead…
Flixtastic Dec 12, 2024
e93f944
Changed some naming to better describe functions
Flixtastic Dec 12, 2024
deb1e37
Changed the ambiguous naming of nofNonLiterals to nofNonLiteralsInTex…
Flixtastic Dec 13, 2024
6 changes: 4 additions & 2 deletions src/engine/TextIndexScanForWord.cpp
@@ -20,9 +20,10 @@ Result TextIndexScanForWord::computeResult(

   if (!isPrefix_) {
     IdTable smallIdTable{getExecutionContext()->getAllocator()};
-    smallIdTable.setNumColumns(1);
+    smallIdTable.setNumColumns(2);
     smallIdTable.resize(idTable.numRows());
     std::ranges::copy(idTable.getColumn(0), smallIdTable.getColumn(0).begin());
+    std::ranges::copy(idTable.getColumn(2), smallIdTable.getColumn(1).begin());

     return {std::move(smallIdTable), resultSortedOn(), LocalVocab{}};
   }
@@ -46,12 +47,13 @@ VariableToColumnMap TextIndexScanForWord::computeVariableToColumnMap() const {
     addDefinedVar(textRecordVar_.getMatchingWordVariable(
         std::string_view(word_).substr(0, word_.size() - 1)));
   }
+  addDefinedVar(textRecordVar_.getScoreVariable(word_));
   return vcmap;
 }

 // _____________________________________________________________________________
 size_t TextIndexScanForWord::getResultWidth() const {
-  return 1 + (isPrefix_ ? 1 : 0);
+  return 2 + (isPrefix_ ? 1 : 0);
 }

 // _____________________________________________________________________________
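The two hunks above widen the `TextIndexScanForWord` result by one score column. For a prefix term such as `test*` the columns are text record, matching word and score (the updated `WordScanPrefix` test maps them to columns 0, 1 and 2); for a plain word, the non-prefix branch drops the word column and keeps context and score. A minimal sketch of that layout, assuming a simplified column-major table of integers instead of `IdTable`; `Columns`, `resultWidth` and `projectContextAndScore` are hypothetical names, not QLever API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for IdTable: one std::vector per column.
using Column = std::vector<std::int64_t>;
using Columns = std::vector<Column>;

// Mirrors getResultWidth after this PR: text record + score, plus the
// matching word for prefix terms.
std::size_t resultWidth(bool isPrefix) { return 2 + (isPrefix ? 1 : 0); }

// Mirrors the non-prefix branch of computeResult: keep column 0 (context)
// and column 2 (score), drop column 1 (word).
Columns projectContextAndScore(const Columns& postings) {
  // Assumption: the postings have the 3-column layout read by readWordCl.
  assert(postings.size() == 3);
  return Columns{postings[0], postings[2]};
}
```

For postings `{{7, 9}, {100, 100}, {3, 1}}` the projection yields the context column `{7, 9}` followed by the score column `{3, 1}`, so the score ends up at column 1, matching the order in which `computeVariableToColumnMap` registers the score variable for a non-prefix word.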
7 changes: 5 additions & 2 deletions src/index/FTSAlgorithms.cpp
@@ -10,19 +10,21 @@
 // _____________________________________________________________________________
 IdTable FTSAlgorithms::filterByRange(const IdRange<WordVocabIndex>& idRange,
                                      const IdTable& idTablePreFilter) {
-  AD_CONTRACT_CHECK(idTablePreFilter.numColumns() == 2);
+  AD_CONTRACT_CHECK(idTablePreFilter.numColumns() == 3);
   LOG(DEBUG) << "Filtering " << idTablePreFilter.getColumn(0).size()
              << " elements by ID range...\n";

   IdTable idTableResult{idTablePreFilter.getAllocator()};
-  idTableResult.setNumColumns(2);
+  idTableResult.setNumColumns(3);
   idTableResult.resize(idTablePreFilter.getColumn(0).size());

   decltype(auto) resultCidColumn = idTableResult.getColumn(0);
   decltype(auto) resultWidColumn = idTableResult.getColumn(1);
+  decltype(auto) resultSidColumn = idTableResult.getColumn(2);
   size_t nofResultElements = 0;
   decltype(auto) preFilterCidColumn = idTablePreFilter.getColumn(0);
   decltype(auto) preFilterWidColumn = idTablePreFilter.getColumn(1);
+  decltype(auto) preFilterSidColumn = idTablePreFilter.getColumn(2);
   // TODO<C++23> Use views::zip.
   for (size_t i = 0; i < preFilterWidColumn.size(); ++i) {
     // TODO<joka921> proper Ids for the text stuff.
@@ -36,6 +38,7 @@ IdTable FTSAlgorithms::filterByRange(const IdRange<WordVocabIndex>& idRange,
         preFilterWidColumn[i].getWordVocabIndex() <= idRange.last()) {
       resultCidColumn[nofResultElements] = preFilterCidColumn[i];
       resultWidColumn[nofResultElements] = preFilterWidColumn[i];
+      resultSidColumn[nofResultElements] = preFilterSidColumn[i];
       nofResultElements++;
     }
   }
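With the score column in place, `filterByRange` has to copy the score of every row it keeps; otherwise scores would no longer line up with their contexts and words. A simplified sketch, assuming three parallel `std::vector<std::int64_t>` columns instead of an `IdTable` and plain integers instead of `WordVocabIndex`; `Postings` and `filterByWordRange` are hypothetical. The original writes into preallocated columns and counts `nofResultElements`; the sketch uses `push_back` for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified posting list: three parallel columns, mirroring the
// 3-column IdTable that filterByRange now expects.
struct Postings {
  std::vector<std::int64_t> contexts;  // column 0: text record (context) ids
  std::vector<std::int64_t> words;     // column 1: word vocabulary indices
  std::vector<std::int64_t> scores;    // column 2: scores
};

// Keep only rows whose word index lies in [first, last]. Each kept row copies
// all three columns, so every score stays aligned with its context and word.
Postings filterByWordRange(const Postings& in, std::int64_t first,
                           std::int64_t last) {
  assert(in.contexts.size() == in.words.size() &&
         in.words.size() == in.scores.size());
  Postings out;
  for (std::size_t i = 0; i < in.words.size(); ++i) {
    if (in.words[i] >= first && in.words[i] <= last) {
      out.contexts.push_back(in.contexts[i]);
      out.words.push_back(in.words[i]);
      out.scores.push_back(in.scores[i]);
    }
  }
  return out;
}
```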
9 changes: 7 additions & 2 deletions src/index/IndexImpl.Text.cpp
@@ -719,7 +719,7 @@ std::string_view IndexImpl::wordIdToString(WordIndex wordIndex) const {
 IdTable IndexImpl::readWordCl(
     const TextBlockMetaData& tbmd,
     const ad_utility::AllocatorWithLimit<Id>& allocator) const {
-  IdTable idTable{2, allocator};
+  IdTable idTable{3, allocator};
   vector<TextRecordIndex> cids = readGapComprList<TextRecordIndex>(
       tbmd._cl._nofElements, tbmd._cl._startContextlist,
       static_cast<size_t>(tbmd._cl._startWordlist - tbmd._cl._startContextlist),
@@ -735,6 +735,11 @@ IdTable IndexImpl::readWordCl(
       idTable.getColumn(1).begin(), [](WordIndex id) {
         return Id::makeFromWordVocabIndex(WordVocabIndex::make(id));
       });
+  std::ranges::transform(
+      readFreqComprList<Score>(tbmd._cl._nofElements, tbmd._cl._startScorelist,
+                               static_cast<size_t>(tbmd._cl._lastByte + 1 -
+                                                   tbmd._cl._startScorelist)),
+      idTable.getColumn(2).begin(), &Id::makeFromInt);
   return idTable;
 }

@@ -773,7 +778,7 @@ IdTable IndexImpl::getWordPostingsForTerm(
     const ad_utility::AllocatorWithLimit<Id>& allocator) const {
   LOG(DEBUG) << "Getting word postings for term: " << term << '\n';
   IdTable idTable{allocator};
-  idTable.setNumColumns(term.ends_with('*') ? 2 : 1);
+  idTable.setNumColumns(term.ends_with('*') ? 3 : 2);
   auto optionalTbmd = getTextBlockMetadataForWordOrPrefix(term);
   if (!optionalTbmd.has_value()) {
     return idTable;
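`readWordCl` now decodes a third list. Judging from the offsets in the diff, each list occupies a byte range of the block: the context list runs from `_startContextlist` up to (excluding) `_startWordlist`, and the score list runs from `_startScorelist` through the inclusive `_lastByte`, which is why its size is `_lastByte + 1 - _startScorelist`. The sketch below only reproduces that offset arithmetic. `ContextListOffsets`, `ByteRange` and the three helper functions are hypothetical, and the word-list range `[_startWordlist, _startScorelist)` is an assumption, because that read is not visible in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the offsets readWordCl uses via tbmd._cl. Values are
// byte positions in the text index file; lastByte is inclusive.
struct ContextListOffsets {
  std::uint64_t startContextlist;
  std::uint64_t startWordlist;
  std::uint64_t startScorelist;
  std::uint64_t lastByte;
};

struct ByteRange {
  std::uint64_t start;
  std::size_t size;
};

// Context list: [startContextlist, startWordlist), as in the diff.
ByteRange contextRange(const ContextListOffsets& o) {
  return {o.startContextlist,
          static_cast<std::size_t>(o.startWordlist - o.startContextlist)};
}

// Word list: [startWordlist, startScorelist). ASSUMPTION: this read is not
// shown in the diff; the range is inferred from the neighboring lists.
ByteRange wordRange(const ContextListOffsets& o) {
  return {o.startWordlist,
          static_cast<std::size_t>(o.startScorelist - o.startWordlist)};
}

// Score list: [startScorelist, lastByte], hence the "+ 1" in the size.
ByteRange scoreRange(const ContextListOffsets& o) {
  assert(o.lastByte + 1 >= o.startScorelist);
  return {o.startScorelist,
          static_cast<std::size_t>(o.lastByte + 1 - o.startScorelist)};
}
```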
1 change: 1 addition & 0 deletions src/parser/sparqlParser/SparqlQleverVisitor.cpp
@@ -1279,6 +1279,7 @@ void Visitor::setMatchingWordAndScoreVisibleIfPresent(
   }
   for (std::string_view s : std::vector<std::string>(
            absl::StrSplit(name.substr(1, name.size() - 2), ' '))) {
+    addVisibleVariable(var->getScoreVariable(std::string(s)));
     if (!s.ends_with('*')) {
       continue;
     }
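The visitor now registers one score variable per word of the `ql:contains-word` literal; prefix words additionally get a matching-word variable. A sketch of the naming, under loud assumptions: the scheme `?ql_score_<textvar>_fixedEntity_<escaped word>` and the escaping of non-alphanumeric characters as `_<decimal code>_` are inferred only from the expected name `?ql_score_text2_fixedEntity_test_42_` in the updated test, not from the actual `getScoreVariable` or `appendEscapedWord` code. The splitting mirrors `name.substr(1, name.size() - 2)` followed by a split on single spaces (presumably stripping the quotes of the literal), with `std::string_view::find` standing in for `absl::StrSplit`. All function names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical escaping: keep ASCII letters and digits, replace any other
// character by "_<decimal code>_" (so '*' becomes "_42_").
std::string escapeWord(std::string_view word) {
  std::string out;
  for (char c : word) {
    if (std::isalnum(static_cast<unsigned char>(c))) {
      out += c;
    } else {
      out += '_';
      out += std::to_string(static_cast<int>(static_cast<unsigned char>(c)));
      out += '_';
    }
  }
  return out;
}

// Hypothetical naming scheme inferred from the expected test variable.
std::string scoreVariableName(std::string_view textVar, std::string_view word) {
  return "?ql_score_" + std::string(textVar.substr(1)) + "_fixedEntity_" +
         escapeWord(word);
}

// Strip the first and last character of the literal, split on single spaces
// (consecutive spaces yield empty words, as absl::StrSplit would), and build
// one score variable name per word.
std::vector<std::string> scoreVariablesForLiteral(std::string_view textVar,
                                                  std::string_view literal) {
  assert(literal.size() >= 2);
  std::vector<std::string> result;
  std::string_view words = literal.substr(1, literal.size() - 2);
  std::size_t pos = 0;
  while (true) {
    std::size_t next = words.find(' ', pos);
    result.push_back(scoreVariableName(textVar, words.substr(pos, next - pos)));
    if (next == std::string_view::npos) break;
    pos = next + 1;
  }
  return result;
}
```

Under these assumptions, the literal `"he test*"` with text variable `?t` yields `?ql_score_t_fixedEntity_he` and `?ql_score_t_fixedEntity_test_42_`.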
2 changes: 1 addition & 1 deletion test/QueryPlannerTestHelpers.h
@@ -104,7 +104,7 @@ constexpr auto TextIndexScanForWord = [](Variable textRecordVar,
                                          string word) -> QetMatcher {
   return RootOperation<::TextIndexScanForWord>(AllOf(
       AD_PROPERTY(::TextIndexScanForWord, getResultWidth,
-                  Eq(1 + word.ends_with('*'))),
+                  Eq(2 + word.ends_with('*'))),
       AD_PROPERTY(::TextIndexScanForWord, textRecordVar, Eq(textRecordVar)),
       AD_PROPERTY(::TextIndexScanForWord, word, word)));
 };
16 changes: 9 additions & 7 deletions test/engine/TextIndexScanForWordTest.cpp
@@ -5,6 +5,7 @@
 #include <gmock/gmock.h>
 #include <gtest/gtest.h>

+#include "../printers/VariablePrinters.h"
 #include "../util/GTestHelpers.h"
 #include "../util/IdTableHelpers.h"
 #include "../util/IndexTestHelpers.h"
@@ -29,17 +30,18 @@ TEST(TextIndexScanForWord, WordScanPrefix) {
   TextIndexScanForWord s1{qec, Variable{"?text1"}, "test*"};
   TextIndexScanForWord s2{qec, Variable{"?text2"}, "test*"};

-  ASSERT_EQ(s1.getResultWidth(), 2);
+  ASSERT_EQ(s1.getResultWidth(), 3);

   auto result = s1.computeResultOnlyForTesting();
-  ASSERT_EQ(result.idTable().numColumns(), 2);
+  ASSERT_EQ(result.idTable().numColumns(), 3);
   ASSERT_EQ(result.idTable().size(), 3);
   s2.getExternallyVisibleVariableColumns();

   using enum ColumnIndexAndTypeInfo::UndefStatus;
   VariableToColumnMap expectedVariables{
       {Variable{"?text2"}, {0, AlwaysDefined}},
-      {Variable{"?ql_matchingword_text2_test"}, {1, AlwaysDefined}}};
+      {Variable{"?ql_matchingword_text2_test"}, {1, AlwaysDefined}},
+      {Variable{"?ql_score_text2_fixedEntity_test_42_"}, {2, AlwaysDefined}}};
   EXPECT_THAT(s2.getExternallyVisibleVariableColumns(),
               ::testing::UnorderedElementsAreArray(expectedVariables));

@@ -60,10 +62,10 @@ TEST(TextIndexScanForWord, WordScanBasic) {

   TextIndexScanForWord s1{qec, Variable{"?text1"}, "test"};

-  ASSERT_EQ(s1.getResultWidth(), 1);
+  ASSERT_EQ(s1.getResultWidth(), 2);

   auto result = s1.computeResultOnlyForTesting();
-  ASSERT_EQ(result.idTable().numColumns(), 1);
+  ASSERT_EQ(result.idTable().numColumns(), 2);
   ASSERT_EQ(result.idTable().size(), 2);

   ASSERT_EQ("\"he failed the test\"",
@@ -73,10 +75,10 @@ TEST(TextIndexScanForWord, WordScanBasic) {

   TextIndexScanForWord s2{qec, Variable{"?text1"}, "testing"};

-  ASSERT_EQ(s2.getResultWidth(), 1);
+  ASSERT_EQ(s2.getResultWidth(), 2);

   result = s2.computeResultOnlyForTesting();
-  ASSERT_EQ(result.idTable().numColumns(), 1);
+  ASSERT_EQ(result.idTable().numColumns(), 2);
   ASSERT_EQ(result.idTable().size(), 1);

   ASSERT_EQ("\"testing can help\"",