
[Bug]: Running out of space when sorting result of a query #6310

Open
wrobell opened this issue Nov 13, 2023 · 6 comments

wrobell commented Nov 13, 2023

What type of bug is this?

Other

What subsystems and features are affected?

Compression, Query executor

What happened?

There are no issues when running this SQL query against the compressed table `message`:

select
    time_bucket('1 day', time) as period,
    count(*) / 1e6 as count
from message
where time > '2023-06-01'
group by period

Adding an `order by` clause makes my system run out of space (with the 2022-09-01 date; the 2023-06-01 date makes it run low on space):

select
    time_bucket('1 day', time) as period,
    count(*) / 1e6 as count
from message
where time > '2023-06-01'
group by period
order by period

Once the query fails, the space is reclaimed.

Attaching files with `explain (analyze, buffers)` output for both queries.

Definition of the table

sq1090=# \d message
                       Table "public.message"
 Column |           Type           | Collation | Nullable | Default
--------+--------------------------+-----------+----------+---------
 time   | timestamp with time zone |           | not null |
 data   | bytea                    |           | not null |
Indexes:
    "message_pkey" PRIMARY KEY, btree ("time")
Triggers:
    ts_insert_blocker BEFORE INSERT ON message FOR EACH ROW EXECUTE FUNCTION _timescaledb_functions.insert_blocker()
Number of child tables: 1693 (Use \d+ to list them.)

TimescaleDB version affected

2.12.2

PostgreSQL version used

15.4

What operating system did you use?

ArchLinux

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

1. Create a compressed hypertable.
2. Populate table with data for multiple chunks (i.e. 1GB each, 100 chunks).
3. Compress chunks of the table.
4. Run query without `order by` clause, observe the storage space changes - should be negligible.
5. Run query with `order by` clause, and observe significant storage space changes, which might cause a system to run out of space.
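
The repro steps above can be sketched roughly as follows. This is a hypothetical sketch, not the reporter's actual setup: the chunk interval, row volume, and use of `generate_series` as filler data are illustrative assumptions.

```sql
-- 1. Create a hypertable (chunk interval is an assumption).
create table message (
    time timestamptz not null,
    data bytea not null
);
select create_hypertable('message', 'time',
                         chunk_time_interval => interval '1 day');

-- 2. Populate with enough rows to produce many sizable chunks
--    (generate_series is a stand-in for real ingest).
insert into message
select t, '\x00'::bytea
from generate_series('2023-01-01'::timestamptz,
                     '2023-06-30'::timestamptz,
                     interval '1 second') as t;

-- 3. Enable compression and compress every chunk.
alter table message set (timescaledb.compress,
                         timescaledb.compress_orderby = 'time');
select compress_chunk(c) from show_chunks('message') as c;

-- 4./5. Run the query with and without `order by` and watch
--       temp-file usage (e.g. under $PGDATA/base/pgsql_tmp).
```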

akuzm commented Nov 13, 2023

In the EXPLAIN we can see that it uses an external merge sort that takes about 700 MB per chunk; that's probably the reason for the filesystem usage you're observing. I'm not sure why it chose such a plan. Maybe you'll be able to nudge it into using hash aggregation with `set enable_sort to off;`?

Regardless of the weird plan choice by Postgres, there might also be a missed optimization on our side. I think `time_bucket('1 day', time)` is a monotonic function of `time`, and `time` is a compression orderby column, so we could be smarter here and sort on `time` instead. We have a special optimized query plan for this kind of sorting. You can try to force it like this:

with sorted as materialized(select * from message order by time)
select
    time_bucket('1 day', time) as period,
    count(*) / 1e6 as count
from sorted
where time > '2023-06-01'
group by period
order by period

Not sure it will work; also, don't forget to re-enable sorting with `set enable_sort to on;`.


wrobell commented Nov 13, 2023

Disabling the sort does not help, still consuming storage space.

explain-sort-off.txt


wrobell commented Nov 13, 2023

@akuzm I have tried your query just in case, but I restarted the server once I had less than 5 GB of storage space left.


akuzm commented Nov 13, 2023

> Disabling the sort does not help, still consuming storage space.
>
> explain-sort-off.txt

The Sort nodes have zero cost for some reason; this looks like a bug, we'll have to investigate. GroupAggregate can't be disabled in Postgres, so I'm not sure how to force the HashAggregate plan. Maybe with another materialized CTE:

with grouped as materialized (
select
    time_bucket('1 day', time) as period,
    count(*) / 1e6 as count
from message
where time > '2023-06-01'
group by period
)
select * from grouped
order by period


wrobell commented Nov 13, 2023

@akuzm Yep, this will work, it is similar to what I have in #6309. I have plenty of workarounds, this is more to report the problem. Thanks again.


akuzm commented Nov 14, 2023

Notes to self: I managed to reproduce the exact plan with per-chunk external sorts on a table from tsbench benchmarks:

set timescaledb.enable_chunkwise_aggregation to off; set enable_hashagg to off; explain (analyze, verbose) select count(*), time_bucket('1 day', time) bucket from ht_metrics_compressed group by bucket order by bucket;

This is with `set enable_hashagg to off`, though; without it, it still prefers hash aggregation.

Here's what we can improve:

  • The zero costs of Sort nodes seem to be just a cosmetic problem, because we create these Sorts only at the Plan stage and cut some corners there. They don't exist at the Path stage. See decompress_chunk_plan_create. Fixed.
  • Why compressed batch sorted merge is not used: it has a cost that is quadratic in the total number of compressed batches, so it normally doesn't get used without a LIMIT. A more correct approach would be to make it quadratic in the number of unique segmentby values, since that is the number of batches it's actually going to operate on. This is a possible improvement. See cost_decompress_sorted_merge_append(). Fixed in Make batch sorted merge cost depend on number of unique segmentby values #6323
  • Why no per-chunk aggregates for explain select count(*), time_bucket('1 day', time) bucket from ht_metrics_compressed group by bucket order by bucket; Fixed in Support chunkwise aggregation with projection #7049
  • Why sorting by time_bucket is not transformed to sorting by time: we do have this optimization, but it has a small scope, it only tries to create index scan on chunk with transformed pathkeys. See ts_sort_transform_optimization(). Example query: set timescaledb.enable_chunkwise_aggregation to off; set enable_hashagg to off; explain select count(*), time_bucket('1 day', time) bucket from ht_metrics_compressed group by bucket order by bucket; Can be improved by this PR: Apply sort transform optimizations to compressed chunks #7528
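
One general guard worth noting here (my addition, not part of the notes above): PostgreSQL's `temp_file_limit` setting caps per-session temporary file usage, so a runaway external sort fails with an error instead of exhausting the disk while the planner issues are being fixed. The value below is an illustrative assumption.

```sql
-- Cap per-session temp file usage; a spilling sort that exceeds
-- the limit errors out instead of filling the filesystem.
-- The 5GB value is an assumption; pick one suited to your disk.
set temp_file_limit = '5GB';
```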
