Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement MatchingStats for SQL mode #3278

Closed
wants to merge 8 commits into from

Conversation

jnsrnhld
Copy link
Collaborator

  • Extracted an Interface for MatchingStats, copied the existing implementation to WorkerMatchingStats and added new SqlMatchingStats
  • Added a dedicated SqlUpdateMatchingStatsJob which walks the concept tree and collects the respective stats
  • Added the possibility to define a primary column per Table
For a single TreeConcept, we do:
  • a count(*) on each connector, union these tables and finally sum() the count per connector to obtain the concepts event count
  • select the PIDs for each connector, union these tables and do a countDistinct(pid) to obtain the concepts entity count
  • for each connector and each of it's validity dates, we select the start and end, union all these tables and select the min(start) and max(end) to obtain the concepts date span

@jnsrnhld
Copy link
Collaborator Author

@awildturtok Jetzt mit Parallelisierung: Pro ConceptTreeElement wird ein Runnable erstellt, was dann die 3 Queries für die MatchingStats ausführt und Werte setzt.

@awildturtok
Copy link
Collaborator

Das generierte SQL im postgres modus ist lieder falsch:

postgres=# select count(distinct null) as "primary_column"
from (
  select "table"."id"
  from "table
  where "table"."code" similar to '1234%'
) as "entities"
;
 primary_column
----------------
              0
(1 row)


postgres=# select count(distinct id) as "primary_column"
from (
  select "table"."id"
  from "table
  where "table"."code" similar to '1234%'
) as "entities"
;

 primary_column
----------------
           1507

oberes ist das von uns erzeugte.

Zusätzlich ist mir aufgefallen, dass wir die Verundung mit den Elternknoten hier noch nicht umgesetzt haben.

@awildturtok
Copy link
Collaborator

Performance Metriken zu bekommen ist relativ schwierig, die Query Ausführungszeiten variieren stark nach load und sind somit nicht besonders aussagekräftig. (habe hier was 8s vs 800ms braucht je nach Load)

@jnsrnhld
Copy link
Collaborator Author

jnsrnhld commented Mar 6, 2024

Das generierte SQL im postgres modus ist lieder falsch:

postgres=# select count(distinct null) as "primary_column"
from (
  select "table"."id"
  from "table
  where "table"."code" similar to '1234%'
) as "entities"
;
 primary_column
----------------
              0
(1 row)


postgres=# select count(distinct id) as "primary_column"
from (
  select "table"."id"
  from "table
  where "table"."code" similar to '1234%'
) as "entities"
;

 primary_column
----------------
           1507

oberes ist das von uns erzeugte.

Zusätzlich ist mir aufgefallen, dass wir die Verundung mit den Elternknoten hier noch nicht umgesetzt haben.

Hey, das liegt daran dass weder an der Table, noch in der SQL Config die primary column gesetzt ist. Dadurch kennt der Job die PID nicht.

@awildturtok
Copy link
Collaborator

@jnsrnhld die Primary Column ist gesetzt in der config.json

@jnsrnhld
Copy link
Collaborator Author

jnsrnhld commented Mar 6, 2024

Ok, das ist strange. Ich schau mir nochmal an woran es liegen könnte.

@jnsrnhld jnsrnhld force-pushed the sql/feature/matching-stats branch from b1d759c to 88f3b99 Compare May 29, 2024 11:58
@jnsrnhld jnsrnhld changed the base branch from develop to sql/refactoring/rework-select-and-filter-conversion May 29, 2024 11:58
Base automatically changed from sql/refactoring/rework-select-and-filter-conversion to develop June 13, 2024 07:34
# Conflicts:
#	backend/src/main/java/com/bakdata/conquery/apiv1/query/concept/filter/FilterValue.java
#	backend/src/main/java/com/bakdata/conquery/models/config/SqlConnectorConfig.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/filters/Filter.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/filters/specific/BigMultiSelectFilter.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/filters/specific/CountFilter.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/filters/specific/CountQuartersFilter.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/filters/specific/DateDistanceFilter.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/filters/specific/FlagFilter.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/filters/specific/MultiSelectFilter.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/filters/specific/NumberFilter.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/filters/specific/SingleSelectFilter.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/filters/specific/SumFilter.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/select/Select.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/select/concept/specific/EventDateUnionSelect.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/select/concept/specific/EventDurationSumSelect.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/select/concept/specific/ExistsSelect.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/select/connector/FirstValueSelect.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/select/connector/LastValueSelect.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/select/connector/RandomValueSelect.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/select/connector/specific/CountQuartersSelect.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/select/connector/specific/CountSelect.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/select/connector/specific/DateDistanceSelect.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/select/connector/specific/FlagSelect.java
#	backend/src/main/java/com/bakdata/conquery/models/datasets/concepts/select/connector/specific/SumSelect.java
#	backend/src/main/java/com/bakdata/conquery/sql/conversion/cqelement/concept/CQConceptConverter.java
#	backend/src/main/java/com/bakdata/conquery/sql/conversion/cqelement/concept/TablePath.java
#	backend/src/main/java/com/bakdata/conquery/sql/conversion/model/aggregator/CommonAggregationSelect.java
#	backend/src/main/java/com/bakdata/conquery/sql/conversion/model/aggregator/CountQuartersSqlAggregator.java
#	backend/src/main/java/com/bakdata/conquery/sql/conversion/model/aggregator/CountSqlAggregator.java
#	backend/src/main/java/com/bakdata/conquery/sql/conversion/model/aggregator/FlagSqlAggregator.java
#	backend/src/main/java/com/bakdata/conquery/sql/conversion/model/aggregator/SumSqlAggregator.java
#	backend/src/main/java/com/bakdata/conquery/sql/conversion/model/filter/MultiSelectFilterConverter.java
@jnsrnhld
Copy link
Collaborator Author

@awildturtok Spricht aus deiner Sicht was dagegen den PR hier zu mergen, auch wenn das Performance-Thema noch nicht angegangen wurde? Für Demos usw. ist es ganz nice, wenn die MatchingStats funktionieren, auch wenn noch nicht performant.

@awildturtok
Copy link
Collaborator

Ah sorry ich wusste nicht, dass es noch nicht gemerged ist. Können wir da am Dienstag nochmal drüber sprechen. Ich hatte eine Idee wie man das evtl umsetzen könnte mit case-when

@jnsrnhld jnsrnhld closed this Oct 16, 2024
@jnsrnhld jnsrnhld deleted the sql/feature/matching-stats branch October 16, 2024 11:03
@awildturtok
Copy link
Collaborator

Bevor du hier zu tief in eine Implementierung gehst, wollen wir uns nochmal dran setzen Ansätze zu besprechen?

@jnsrnhld
Copy link
Collaborator Author

@awildturtok Ich muss aktuell das überhaupt erstmal wieder zum Laufen bekommen. Der große PR mit der ConceptId usw. hat den aktuellen Weg kaputt gemacht, ich muss das nochmal anpassen, dass ich über den Storage auch die Konzepte update.

@awildturtok
Copy link
Collaborator

awildturtok commented Oct 16, 2024

@thoniTUB kannst du dich mit Jonas per Mail kurzschließen, was du für das matching stats persistenz geplant hast.

Tut mir Leid, die Änderung war riesig und ist nervig aber auf lange sicht wollen wir so viel magic wie möglich entfernen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants