-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Union and Minus
shiyuan edited this page Jun 18, 2017
·
2 revisions
For your convenience we implemented union and minus queries. We also implemented several strategies for how the minus takes place , you can choose strategy with hints.
To use Union & Minus your first query field names should be the same as the second query field names
To achieve this you can use aliases if necessary
Query first table for the limit on first query (default is 200) Query the second table for the limit on second query (default is 200) combine the results and send them to the client
There are three implementations for minus queries:
- basic implementation
on basic implementation we run the first query on elasticsearch and add it to a set (we remove duplicates) after that we run the second query on elasticsearch. for each result on second query , we check if it exists on first result set and if it does we remove it from the set. And than we finally return the set as hits. - scrolling
We use scrolling in order for better performance when second data set is large In order to use scrolling you need to add this hint:
/*! MINUS_SCROLL_FETCH_AND_RESULT_LIMITS(maxFetchOnFirstTable,maxFetchOnSecondTable,docsFetchFromShardOnEachScroll) */
How it work?
- scroll on first table till fetch all / reach table limit.
- remove duplicates and put it on set S
- scroll on second table , for each scroll:
- get result and try to remove items from the set S
- return the set S as hits
- Scrolling and Terms optimization
it is only available when only one field on minus We use this optimization when both tables are large and when we want to keep our heap low
In order to use it you need to add 2 hints , the scroll hint and a new one:
/*! MINUS_SCROLL_FETCH_AND_RESULT_LIMITS(maxFetchOnFirstTable,maxFetchOnSecondTable,docsFetchFromShardOnEachScroll) */
/*! MINUS_USE_TERMS_OPTIMIZATION(true)*/
How it work?
- create set S
- scroll on first table , for each scroll:
* remove duplicates and put result on set S'
* scroll on second table while adding terms filter on all results from S', for each scroll:
- get result and try to remove items from the set S' * put results from S' to S
- return set S as hits
- union example (with alias):
SELECT firstname FROM myIndex/account WHERE firstname = 'Amber'
union all
SELECT dog_name as firstname FROM myIndex/dog WHERE dog_name = 'rex'
- minus examples:
- simple
SELECT pk FROM myIndex/systems WHERE system_name = 'A'
minus
SELECT pk FROM myIndex/systems WHERE system_name = 'B'
- two fields and aliases
SELECT pk , letter FROM myIndex/systems WHERE system_name = 'C'
minus
SELECT myId as pk , myLetter as letter FROM myIndex/systems WHERE system_name = 'E'
- two fields and aliases and scrolling hint
SELECT /*! MINUS_SCROLL_FETCH_AND_RESULT_LIMITS(100000,10000000,5000) */
pk , letter FROM myIndex/systems WHERE system_name = 'C'
minus
SELECT myId as pk , myLetter as letter FROM myIndex/systems WHERE system_name = 'E'
- terms optimization
SELECT /*! MINUS_SCROLL_FETCH_AND_RESULT_LIMITS(100000,10000000,5000) */
/*! MINUS_USE_TERMS_OPTIMIZATION(true)*/
pk FROM myIndex/systems WHERE system_name = 'A'
minus
SELECT pk FROM myIndex/systems WHERE system_name = 'B'