Pattern acceleration #87
src/main/java/com/teragrep/pth_06/planner/BloomFilterTempTable.java
src/main/java/com/teragrep/pth_06/planner/walker/conditions/ElementCondition.java
src/test/java/com/teragrep/pth_06/planner/walker/conditions/ElementConditionTest.java
… and IndexStatementConditionTest and tests for isBloomSearchCondition method
}

public ConditionConfig(DSLContext ctx, boolean streamQuery, boolean bloomEnabled, boolean withoutFilters) {
Is this removing the feature that allows searching files without a filter?
The option was not used in the GitHub repo. Added the option with an implementation and tests.
 * Inserts a filter for each filtertype inside parent table records. Filter is filled with search term token set,
 * filter size is selected using parent table filtertype expected and fpp values.
 */
private void insertFilters() {
Please elaborate on what happens here. From the description it sounds like "select all from a specific filtertype into a temp table".
The method was poorly named. After refactoring, `TableFilters` is responsible for the category table's filters and for inserting all filter types. The class `TableFilterTypesFromMetadata` is used to fetch the table's filter types from metadata.
}
}

private void insertFilterFromRecord(
Please elaborate on what happens here. It sounds like "select filter from record into xyz".
Refactored into the method `filterBytesFromRecord` in class `TableFilters`, which creates a bloom filter with the correct size and tokens.
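As a rough illustration of what "correct size" can mean for a bloom filter: the bit count and hash-function count are conventionally derived from the expected item count and the target false-positive probability (the `expected` and `fpp` values mentioned in the Javadoc above). The sketch below uses the standard textbook formulas. The class and method names are hypothetical, and this is not the `TableFilters` implementation from the PR.

```java
// Hypothetical helper, not part of pth_06: standard bloom filter sizing formulas.
final class BloomSizingSketch {

    // Optimal number of bits: m = -n * ln(p) / (ln 2)^2, rounded up.
    static long optimalBits(final long expectedItems, final double fpp) {
        if (expectedItems <= 0 || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("expectedItems must be > 0 and fpp in (0, 1)");
        }
        final double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedItems * Math.log(fpp) / (ln2 * ln2));
    }

    // Optimal number of hash functions: k = (m / n) * ln 2, at least 1.
    static int optimalHashFunctions(final long expectedItems, final long bits) {
        final double k = (double) bits / expectedItems * Math.log(2);
        return Math.max(1, (int) Math.round(k));
    }

    public static void main(final String[] args) {
        final long bits = optimalBits(1000, 0.01);
        final int hashes = optimalHashFunctions(1000, bits);
        System.out.println(bits + " bits, " + hashes + " hash functions");
    }
}
```

The query-side filter has to be built with the same size parameters as the stored filter it is compared against, which is presumably why the size is looked up from the filtertype record rather than chosen freely.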
}

/**
 * Generates a condition that returns true if this temp tables search term tokens might be contained in the parent
what is a parent table in this context?
The parent table was the bloom filter table that the temp table was created to match. Renamed to `originTable` in the new class `CategoryTableImpl`.
import static com.teragrep.pth_06.jooq.generated.bloomdb.Bloomdb.BLOOMDB;

public class PatternMatch {
what is the responsibility of objects in this class?
The responsibility was to get the bloom filter tables that had a record with a filtertype pattern matching any of the search term tokens. Refactored into two classes: `PatternMatchTables`, which gets the tables from metadata, and `PatternMatchCondition`, which generates the condition used to do the pattern matching.
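To make the "pattern matches any search term token" idea concrete, here is a minimal in-memory sketch. It assumes each bloom filter table is associated with one regex and that a table qualifies when at least one token fully matches it. The class name, the map-based input, and the UUID regex are illustrative assumptions; the PR itself resolves tables from database metadata, and its matching semantics may differ.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical stand-in for the table selection step; not the PR's PatternMatchTables.
final class PatternMatchSketch {

    // Assumed example pattern: canonical 8-4-4-4-12 hexadecimal UUID.
    static final String UUID_REGEX =
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}";

    // Returns the names of tables whose pattern fully matches at least one token.
    static Set<String> matchingTables(final Map<String, String> tablePatterns, final Set<String> tokens) {
        final Set<String> result = new TreeSet<>();
        for (final Map.Entry<String, String> entry : tablePatterns.entrySet()) {
            final Pattern pattern = Pattern.compile(entry.getValue());
            for (final String token : tokens) {
                if (pattern.matcher(token).matches()) {
                    result.add(entry.getKey());
                    break; // one matching token is enough for this table
                }
            }
        }
        return result;
    }
}
```

With this shape, a search term that contains no UUID-like token selects no tables, so no bloom filter condition would be generated for it.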
src/main/java/com/teragrep/pth_06/planner/walker/conditions/ElementCondition.java
…y and ensure single responsibility for objects
…ss to get pattern match tables
Refactoring
New classes:
Other changes:
/**
 * Decorator that creates category table
 */
The purpose of this object is not super clear to me, why not just call create() on the origin categoryTable?
Removed the decorators in refactoring; now just using the `CategoryTable` interface methods `create()` and `insertFilters()`.
src/test/java/com/teragrep/pth_06/planner/CategoryTableImplTest.java
src/test/java/com/teragrep/pth_06/planner/PatternMatchTablesTest.java
src/test/java/com/teragrep/pth_06/planner/TableFilterTypesFromMetadataResultTest.java
src/test/java/com/teragrep/pth_06/planner/walker/conditions/CategoryTableConditionTest.java
@@ -97,9 +97,9 @@ public Condition condition() {

for (final Table<?> table : tableSet) {
    // create category temp table for this pattern match table
    final CategoryTable categoryTable = new Created(
            new WithFilterTypes(new CategoryTableImpl(config, table, tokenizedValue))
Now that I think about it, perhaps the older way was better to reduce temporal coupling?
@kortemik what do you think?
yes, just with better naming.
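For readers unfamiliar with the term, temporal coupling here means that `insertFilters()` only works if `create()` was called first, and nothing in the types enforces that order. A decorator can encode the order in object composition instead. The sketch below is a simplified illustration with hypothetical names (`SketchTable`, `InMemoryTable`, `CreatedFirst`); it does not reproduce the PR's `Created` or `WithFilterTypes` classes, whose behavior is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-step table interface, loosely modeled on create() and insertFilters().
interface SketchTable {
    void create();
    void insertFilters();
}

// Plain implementation: the caller must remember to call create() before insertFilters().
final class InMemoryTable implements SketchTable {
    final List<String> log = new ArrayList<>();
    private boolean created = false;

    @Override
    public void create() {
        created = true;
        log.add("create");
    }

    @Override
    public void insertFilters() {
        if (!created) {
            throw new IllegalStateException("create() must be called before insertFilters()");
        }
        log.add("insertFilters");
    }
}

// Decorator: wrapping a table guarantees it is created before filters are inserted.
final class CreatedFirst implements SketchTable {
    private final SketchTable origin;

    CreatedFirst(final SketchTable origin) {
        this.origin = origin;
    }

    @Override
    public void create() {
        origin.create();
    }

    @Override
    public void insertFilters() {
        origin.create();
        origin.insertFilters();
    }
}
```

The trade-off raised in the thread still applies: the decorator removes the ordering burden from callers, but its name has to make clear what it guarantees.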
…ategory table building and remove interfaces from tests
…ns using pattern from each filter_type_id
}
newCondition = config.withoutFilters() ? combinedNullFilterCondition : combinedTableCondition
As much as I like the ternary operator, I don't think it is as clear as the if-statement.
Changed to if-statement
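The resulting shape is roughly the following. This is a generic sketch: the type parameter stands in for the jOOQ `Condition` type so the example has no dependencies, and the class and method names are hypothetical.

```java
// Hypothetical illustration of replacing the ternary with an explicit if-statement.
final class ConditionSelectionSketch {

    // T stands in for the condition type; both candidates are computed by the caller.
    static <T> T select(final boolean withoutFilters, final T nullFilterCondition, final T tableCondition) {
        final T newCondition;
        if (withoutFilters) {
            // search files that have no filter
            newCondition = nullFilterCondition;
        } else {
            // search using the bloom filter tables
            newCondition = tableCondition;
        }
        return newCondition;
    }
}
```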
…inserted to category table, some code cleanup
Enabled token entanglement in tokenizer
Pattern acceleration feature that activates bloom filtering only for a configured regex pattern. The goal is to limit bloom filtering to certain patterns, such as UUIDs.
Flow:
Notes: The tokenizer max token count is set to 0 to get only major tokens, since that is what dpf_03 currently uses to tokenize the bloom filter tables.