Replies: 7 comments
-
New workers will execute table scans from existing queries. However, they won't participate in (final) aggregations or joins.
-
Do you know which version that was added in? New workers are not being assigned for me in query type 2. I think the TaskOutputOperator is taking the most time.
-
This has been part of Presto for a long time.
-
I think this has been in since the first public release of Presto (like 7 years ago). This feature was a side effect of doing streaming discovery of splits, which we added because we couldn't load all splits into memory for large tables. As splits are "discovered", we assign them to the currently available machines. Should we close this issue?
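To make the streaming behavior concrete, here is a minimal Python sketch of assigning splits as they are discovered. It is a toy model, not the engine's scheduler: `assign_splits`, the `joins` map, the worker names, and the "fewest splits so far" rule are all assumptions made up for illustration.

```python
from collections import defaultdict


def assign_splits(split_source, initial_workers, joins):
    """Assign each split, as it is discovered, to a currently known worker.

    joins maps a split index to the name of a worker that becomes available
    just before that split is discovered (hypothetical input for this sketch).
    """
    workers = list(initial_workers)
    assignments = defaultdict(list)
    for index, split in enumerate(split_source):
        if index in joins:
            workers.append(joins[index])
        # Toy placement rule: pick the worker holding the fewest splits so far.
        target = min(workers, key=lambda w: len(assignments[w]))
        # Once a split is assigned it is never moved to another worker.
        assignments[target].append(split)
    return dict(assignments)


# Eight splits are discovered lazily; worker "w3" joins before split 4.
many = assign_splits((f"split-{i}" for i in range(8)), ["w1", "w2"], {4: "w3"})

# Only four splits exist, so all of them are assigned before "w3" shows up.
few = assign_splits((f"split-{i}" for i in range(4)), ["w1", "w2"], {4: "w3"})

print(many)
print(few)
```

In the first run the late worker only ever receives splits discovered after it joined. In the second run the table scan produces so few splits that they are all placed before the new worker arrives, so it gets nothing.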
-
It doesn't work, though. The query JSON shows that 5 workers did not participate.
-
It depends on how many splits are produced for the table scan. If a split is assigned to a worker, the split must execute there, even if that worker is busy and a new worker shows up later. You can try decreasing
-
With fault-tolerant execution (FTE), this is closer to reality.
-
scenario:
In my case I am talking about simple queries that return GBs of data like:
query type 1:
select * from table;
or
query type 2:
select * from table where col = 'x';
I see these operators being used in query type 2:
ExchangeOperator
TaskOutputOperator
ScanFilterAndProjectOperator
current limited behavior:
"New workers will only be utilized for table scans (and only if the splits are not already assigned). The machines used for hash partitions, i.e. aggregation and joins, are fixed when the query starts."
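As a rough illustration of why the machines for hash partitions are fixed at query start, here is a small Python sketch. It is only a toy model: `plan_hash_partitions`, `worker_for_key`, the worker names, and the use of CRC32 as the hash are assumptions for this example, not the engine's actual partitioning function.

```python
import zlib


def plan_hash_partitions(workers_at_start):
    """Freeze the worker list used for hash partitions when the query starts."""
    return tuple(workers_at_start)


def worker_for_key(key, partition_workers):
    """Route a key to a worker by hashing it into the frozen partition list."""
    bucket = zlib.crc32(key.encode("utf-8")) % len(partition_workers)
    return partition_workers[bucket]


cluster = ["w1", "w2"]
partitions = plan_hash_partitions(cluster)

keys = [f"key-{i}" for i in range(20)]
before = {k: worker_for_key(k, partitions) for k in keys}

# A new worker joins the cluster while the query is running.
cluster.append("w3")
after = {k: worker_for_key(k, partitions) for k in keys}

# Routing is unchanged, so every row for a given key still reaches the
# same worker, and "w3" receives no hash-partitioned work.
print(before == after)
print(sorted(set(after.values())))
```

If the partition list were allowed to grow mid-query, the modulus would change and rows sharing a key could land on different workers, which would break an aggregation or join on that key. That is the intuition behind keeping the list fixed in this sketch.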