
Multicolumn aggregation very slow after ANALYZE when persistent hashes are enabled #6596

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Labels: bug, normal, SQL

Date: 2018-05-02 17:48:58 +0200
From: @swingbit
To: SQL devs <>
Version: 11.27.13 (Jul2017-SP4)

Last updated: 2018-08-31 13:23:23 +0200

Comment 26417

Date: 2018-05-02 17:48:58 +0200
From: @swingbit

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36
Build Identifier:

With the attached table, I get the following timings for the same query:

  • Before ANALYZE, default pipe: 122ms
  • Before ANALYZE, sequential pipe: 40ms
  • After ANALYZE, default pipe: 11s
  • After ANALYZE, sequential pipe: 7m45s

This is only 1M tuples for 18MB data, so I expect the problem is about data distribution, not data size (indeed, the same query on different data works just fine).

The question is why it becomes so much slower after ANALYZE.

sql>select count(distinct id) from pim_0_obj_type;
+---------+
| L3      |
+=========+
| 1072284 |
+---------+
1 tuple (13.760ms)
sql>select count(distinct type) from pim_0_obj_type;
+------+
| L3   |
+======+
|   10 |
+------+
1 tuple (15.949ms)
sql>select count(distinct typestr) from pim_0_obj_type;
+------+
| L3   |
+======+
|   10 |
+------+
1 tuple (9.249ms)

-- Before ANALYZE

sql>set optimizer='default_pipe';
operation successful (0.608ms)
sql>select count(*) from (select id as a1,type as a2,typestr as a3,max(prob) from pim_0_obj_type group by a1,a2,a3) as x;
+---------+
| L11     |
+=========+
| 1072446 |
+---------+
1 tuple (122.836ms)

sql>set optimizer='sequential_pipe';
operation successful (0.717ms)
sql>select count(*) from (select id as a1,type as a2,typestr as a3,max(prob) from pim_0_obj_type group by a1,a2,a3) as x;
+---------+
| L11     |
+=========+
| 1072446 |
+---------+
1 tuple (40.347ms)

sql>select column,type,count,columnsize,heapsize,hashes,sorted from sys.storage('sys','pim_0_obj_type');
+---------+--------+---------+------------+----------+--------+--------+
| column  | type   | count   | columnsize | heapsize | hashes | sorted |
+=========+========+=========+============+==========+========+========+
| id      | int    | 1072446 |    4289784 |        0 |      0 | null   |
| type    | int    | 1072446 |    4289784 |        0 |      0 | null   |
| typestr | clob   | 1072446 |    2144892 |     8585 |      0 | null   |
| prob    | double | 1072446 |    8579568 |        0 |      0 | null   |
+---------+--------+---------+------------+----------+--------+--------+
4 tuples (1.335ms)

sql>analyze sys.pim_0_obj_type;

-- After ANALYZE

sql>select column,type,count,columnsize,heapsize,hashes,sorted from sys.storage('sys','pim_0_obj_type');
+---------+--------+---------+------------+----------+---------+--------+
| column  | type   | count   | columnsize | heapsize | hashes  | sorted |
+=========+========+=========+============+==========+=========+========+
| id      | int    | 1072446 |    4289784 |        0 |       0 | true   |
| type    | int    | 1072446 |    4289784 |        0 | 4390952 | false  |
| typestr | clob   | 1072446 |    2144892 |     8585 | 4522024 | false  |
| prob    | double | 1072446 |    8579568 |        0 |       0 | true   |
+---------+--------+---------+------------+----------+---------+--------+
4 tuples (2.588ms)

sql>set optimizer='default_pipe';
operation successful (0.977ms)

sql>select count(*) from (select id as a1,type as a2,typestr as a3,max(prob) from pim_0_obj_type group by a1,a2,a3) as x;
+---------+
| L11     |
+=========+
| 1072446 |
+---------+
1 tuple (11.3s)

sql>set optimizer='sequential_pipe';
operation successful (0.760ms)
sql>select count(*) from (select id as a1,type as a2,typestr as a3,max(prob) from pim_0_obj_type group by a1,a2,a3) as x;
+---------+
| L11     |
+=========+
| 1072446 |
+---------+
1 tuple (7m 45s)

Reproducible: Always

$ mserver5 --version
MonetDB 5 server v11.27.14 (64-bit, 128-bit integers)
This is an unreleased version
Copyright (c) 1993 - July 2008 CWI
Copyright (c) August 2008 - 2018 MonetDB B.V., all rights reserved
Visit https://www.monetdb.org/ for further information
Found 15.5GiB available memory, 8 available cpu cores
Libraries:
libpcre: 8.41 2017-07-05 (compiled with 8.41)
openssl: OpenSSL 1.1.0h 27 Mar 2018 (compiled with OpenSSL 1.1.0h-fips 27 Mar 2018)
libxml2: 2.9.7 (compiled with 2.9.7)
Compiled by: roberto@photon.hq.spinque.com (x86_64-pc-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wundef -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wpacked-bitfield-compat -Wsync-nand -Wjump-misses-init -Wmissing-include-dirs -Wlogical-op -Wduplicated-cond -Wduplicated-branches -Wrestrict -Wnested-externs -Wunreachable-code -D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64

Comment 26418

Date: 2018-05-02 17:51:02 +0200
From: @swingbit

The data file is too large to be attached; here it is:
https://drive.google.com/open?id=1YcuWJpbQ8_-ozE-1t9TDaFtajDutErGp

Comment 26420

Date: 2018-05-07 11:13:07 +0200
From: @swingbit

UPDATE: This only happens with PERSISTENTHASH and PARENT_HASH enabled.

More details of what then happens:

The problematic grouping (8 minutes) is on the string attribute, which normally takes a few milliseconds (the first column of the trace is the execution time in µs; the 482097975 µs in group.subgroup is the 8 minutes):

| 48 | X_11=<tmp_2750>[1072446]:bat[:int] := algebra.projection(C_5=<tmp_2766>[1072446]:bat[:oid], X_8=<tmp_2746>[1072446]:bat[:int]); |
| 9688 | (X_18=<tmp_2766>[1072446]:bat[:oid], C_19=<tmp_2770>[1072284]:bat[:oid], X_20=<tmp_2721>[1072284]:bat[:lng]) := group.group(X_11=<tmp_2750>[1072446]:bat[:int]); |
| 482097975 | (X_21=<tmp_2721>[1072446]:bat[:oid], C_22=<tmp_2770>[1072446]:bat[:oid], X_23=<tmp_1656>[1072446]:bat[:lng]) := group.subgroup(X_17=<tmp_516>[1072446]:bat[:str], X_18=<tmp_2766>[1072446]:bat[:oid]); |
| 806 | (X_24=<tmp_2766>[1072446]:bat[:oid], C_25=<tmp_516>[1072446]:bat[:oid], X_26=<tmp_1656>[1072446]:bat[:lng]) := group.subgroupdone(X_14=<tmp_2767>[1072446]:bat[:int], X_21=<tmp_2721>[1072446]:bat[:oid]); |

Relevant part of --algorithms output:

BATproject(l=tmp_31641072446-sorted-key,r=tmp_27451072446[int])
BATproject(l=tmp_3164,r=tmp_2745)=tmp_31751072446
BATproject(l=tmp_31641072446-sorted-key,r=tmp_27441072446[str])
BATproject(l=tmp_3164,r=tmp_2744)=tmp_32301072446
BATproject(l=tmp_31641072446-sorted-key,r=tmp_27461072446[int]-sorted)
BATproject(l=tmp_3164,r=tmp_2746)=tmp_31731072446-sorted
BATgroup(b=tmp_31731072446[int],s=NULL0,g=NULL0,e=NULL0,h=NULL0,subsorted=0): compare consecutive values
BATordered: fixed nosorted(51) for tmp_32301072446 (8 usec)
BATordered_rev: fixed norevsorted(1) for tmp_31641072446 (1 usec)
BATcheckhash: reusing persisted hash tmp_2744
BATgroup(b=tmp_32301072446[str],s=NULL0,g=tmp_31641072446,e=NULL0,h=NULL0,subsorted=0): use existing hash table
...
(stopped manually)

-- algorithms output without persistent hashes:

BATproject(l=tmp_14501072446-sorted-key,r=tmp_4721072446[int])
BATproject(l=tmp_1450,r=tmp_472)=tmp_14611072446
BATproject(l=tmp_14501072446-sorted-key,r=tmp_4111072446[int])
BATproject(l=tmp_1450,r=tmp_411)=tmp_15141072446
BATproject(l=tmp_14501072446-sorted-key,r=tmp_4711072446[str])
BATproject(l=tmp_1450,r=tmp_471)=tmp_14571072446
BATordered: fixed norevsorted(47) for tmp_14571072446
BATordered: fixed nosorted(51) for tmp_14571072446 (20 usec)
BATordered: fixed norevsorted(47) for tmp_15141072446
BATordered: fixed nosorted(51) for tmp_15141072446 (19 usec)
BATordered: fixed norevsorted(47) for tmp_14501072446
BATordered: fixed nosorted(51) for tmp_14501072446 (8 usec)
BATgroup(b=tmp_15141072446[int],s=NULL0,g=tmp_14501072446,e=NULL0,h=NULL0,subsorted=0): create partial hash table
HASHnew: create hash(size 1072446, mask 1048576, width 4, nil 4294967295, total 8484088 bytes);
BATordered: fixed norevsorted(1) for tmp_14611072446
BATordered: fixed sorted for tmp_14611072446 (1422 usec)
BATordered: fixed norevsorted(47) for tmp_14571072446
BATordered: fixed nosorted(51) for tmp_14571072446 (4 usec)
BATgroup(b=tmp_14611072446[int],s=NULL0,g=tmp_14571072446,e=NULL0,h=NULL0,subsorted=0): subscan old groups
virtualize(bn=tmp_15141072446,seq=0)
BATproject(l=tmp_15141072446-sorted-key,r=tmp_14611072446[int]-sorted)
BATproject(l=tmp_1514,r=tmp_1461)=tmp_14571072446-sorted

Comment 26470

Date: 2018-05-31 13:23:31 +0200
From: @swingbit

Any luck in reproducing this?

Comment 26473

Date: 2018-06-01 11:27:12 +0200
From: @swingbit

Checking with gdb, I see that all the time is spent in GRP_use_existing_hash_table_tpe(sht) (sht because strings are dictionary-compressed). I guess this is the culprit?

Using persistent hashes and running ANALYZE is the only case when this macro is used.

This also surprises me: shouldn't the hash be persisted after it is created by the first query, and then found on the second call, without the need for ANALYZE?
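
To make the cost concrete, here is a toy sketch in plain C (my illustration, not MonetDB's actual GRP_use_existing_hash_table_tpe code; the array-based chains are an assumption). A hash built over the whole column links every tuple with the same value into one chain, and subgrouping has to walk that chain per tuple, looking for an earlier tuple that matches on both value and parent group. With 10 distinct strings over ~1M tuples each chain holds ~100K entries, and since almost every parent group is distinct here, the walk almost never terminates early:

/* Toy model, not MonetDB source: cost of subgrouping through a hash
 * whose chains link ALL tuples sharing a value. */
#include <stdio.h>
#include <stdlib.h>

#define N      1000000          /* tuples, as in the table from this report */
#define NVALS  10               /* distinct values -> chains of N/NVALS entries */
#define SAMPLE 1000             /* tuples actually probed below */

int main(void) {
    int *val  = malloc(N * sizeof(int));
    int *grp  = malloc(N * sizeof(int));
    int *head = malloc(NVALS * sizeof(int));  /* head[v]: chain of tuples with value v */
    int *next = malloc(N * sizeof(int));
    long long probes = 0;

    for (int i = 0; i < N; i++) { val[i] = i % NVALS; grp[i] = i; } /* all groups distinct */
    for (int v = 0; v < NVALS; v++) head[v] = -1;
    for (int i = 0; i < N; i++) { next[i] = head[val[i]]; head[val[i]] = i; }

    /* subgroup a sample of tuples: walk the chain of equal values, looking
     * for an earlier tuple in the same parent group */
    for (int i = 0; i < SAMPLE; i++)
        for (int j = head[val[i]]; j != -1; j = next[j]) {
            probes++;
            if (j < i && grp[j] == grp[i]) break;   /* never true: groups all distinct */
        }

    printf("%lld probes for %d tuples -> ~%lld for all %d tuples\n",
           probes, SAMPLE, probes * (N / SAMPLE), N);
    free(val); free(grp); free(head); free(next);
    return 0;
}

That extrapolates to roughly 10^11 chain probes, i.e. minutes of work, which is consistent with the 482097975 µs spent in group.subgroup above.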

Comment 26500

Date: 2018-06-26 13:08:34 +0200
From: @sjoerdmullender

I can reproduce this, and I think the reason for the slowness is the large number of duplicate items in the data together with mitosis and the enabling of parent hashes. The latter is not a supported configuration for a reason, and this is it.

I'm not sure I want to spend a lot of time on this problem right now, although I have some ideas.

Comment 26501

Date: 2018-06-26 13:10:58 +0200
From: @swingbit

One comment on that: I doubt it is about mitosis. The unreasonable slowdown actually happens with the sequential pipe (which does not use mitosis), not with the default pipe.

Comment 26503

Date: 2018-06-26 14:59:56 +0200
From: @sjoerdmullender

(In reply to Roberto Cornacchia from comment 4)

> This also surprises me: shouldn't the hash be persisted after it is created
> by the first query, and then found on the second call, without the need for
> ANALYZE?

When there is no hash table, BATgroup creates a temporary hash table that is not compatible with a full hash table (duplicates are not stored). Therefore this hash is not persisted.
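
A minimal sketch of that distinction, using simplified array-based tables (my illustration; MonetDB's actual Hash structure differs): the grouping code only needs one representative tuple per distinct value, so duplicates are never inserted, whereas a full, persistable hash chains every tuple so that later lookups can enumerate all matches:

/* Illustration only (not MonetDB's Hash layout): a grouping hash keeps one
 * representative per distinct value; a full hash chains every tuple. */
#include <stdio.h>

#define N       8
#define BUCKETS 16

int main(void) {
    int val[N] = {3, 7, 3, 3, 7, 1, 3, 1};

    /* grouping hash: bucket -> first tuple seen; duplicates are NOT inserted */
    int rep[BUCKETS];
    for (int b = 0; b < BUCKETS; b++) rep[b] = -1;
    for (int i = 0; i < N; i++) {
        int b = val[i] % BUCKETS;
        if (rep[b] == -1) rep[b] = i;   /* duplicates just copy rep[b]'s group id */
    }

    /* full hash: bucket -> chain over all tuples; this is what gets persisted */
    int head[BUCKETS], next[N];
    for (int b = 0; b < BUCKETS; b++) head[b] = -1;
    for (int i = 0; i < N; i++) {
        int b = val[i] % BUCKETS;
        next[i] = head[b];
        head[b] = i;
    }

    /* only the full hash can enumerate every position holding value 3 */
    printf("full hash, value 3: positions");
    for (int j = head[3 % BUCKETS]; j != -1; j = next[j])
        printf(" %d", j);
    printf("\ngrouping hash, value 3: representative position %d only\n",
           rep[3 % BUCKETS]);
    return 0;
}

Because the grouping variant cannot answer "which positions hold value v", persisting it would be useless for the point selects and joins that persistent hashes are meant to speed up.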

Comment 26504

Date: 2018-06-26 15:12:20 +0200
From: @swingbit

Thanks Sjoerd. So the full hash table is created (and persisted, if enabled) only by calling ANALYZE?

Comment 26506

Date: 2018-06-26 15:20:21 +0200
From: @swingbit

In general: I do understand if you say it is not a priority for you.

Just to cast my vote: persistent hashes are quite essential for us. The performance difference for joins is such that doing without them is simply not an option. We are talking about 5-6ms vs. 200ms for a single join, while we need to deliver much more complex queries that stay below 100ms.

For now, I have left persistent hashes enabled, but commented out their usage for aggregations. Of course it would be awesome if they could work without commenting out any code, but for now this will do.

I do hope persistent hashes will keep being supported, even though disabled by default.

Comment 26507

Date: 2018-06-26 15:22:28 +0200
From: @sjoerdmullender

(In reply to Roberto Cornacchia from comment 8)

> Thanks Sjoerd. So the full hash table is created (and persisted, if
> enabled) only by calling ANALYZE?

My comment was for BATgroup (and BATunique). In other places where a hash is created it will be persisted (e.g. point select or join).

Comment 26508

Date: 2018-06-26 15:23:46 +0200
From: @sjoerdmullender

(In reply to Roberto Cornacchia from comment 9)

> In general: I do understand if you say it is not a priority for you.

I actually already have a solution. I'm just running tests before committing it.

The solution is to not use the preexisting hash at all when doing subgrouping.

Comment 26509

Date: 2018-06-26 15:58:38 +0200
From: MonetDB Mercurial Repository <>

Changeset a8c356ffefb4 made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.

For complete details, see https://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=a8c356ffefb4

Changeset description:

When subgrouping, don't use a preexisting hash table.
This fixes bug #6596.

The rationale is that we need to go through the hash chain to find a
preexisting group for the value we're dealing with.  If there are many
distinct groups, but lots of duplicates in the column being
subgrouped (is that a word?) this becomes very expensive.  We're
better off using the code that builds a new temporary hash table that
is based on the value being subgrouped *and* its group id.  This does
cost extra memory during subgrouping.
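
As a sketch of that rationale (my reading of the changeset, with a hypothetical mix function; the real code is in changeset a8c356ffefb4): keying the temporary hash on the pair of value and parent group id spreads the heavily duplicated values over many buckets, so chains stay short even when a single value occurs ~100K times:

/* Sketch of the idea behind the fix, not the actual patch: hash on
 * (value, parent group id) so duplicated values no longer share one chain. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define N       1000000
#define NVALS   10
#define NBUCKET (1 << 20)        /* power-of-two table, cf. "mask 1048576" above */

static uint32_t mix(uint32_t value, uint32_t grp) {
    /* cheap multiplicative mixing; any decent pairwise hash would do */
    return (value * 2654435761u ^ grp * 2246822519u) & (NBUCKET - 1);
}

int main(void) {
    int *val  = malloc(N * sizeof(int));
    int *grp  = malloc(N * sizeof(int));
    int *head = malloc(NBUCKET * sizeof(int));
    int *next = malloc(N * sizeof(int));
    long long probes = 0;

    for (int i = 0; i < N; i++) { val[i] = i % NVALS; grp[i] = i; }
    for (int b = 0; b < NBUCKET; b++) head[b] = -1;

    for (int i = 0; i < N; i++) {
        uint32_t b = mix(val[i], grp[i]);
        int found = 0;
        /* probe the chain; collisions are resolved on both value and group */
        for (int j = head[b]; j != -1; j = next[j]) {
            probes++;
            if (val[j] == val[i] && grp[j] == grp[i]) { found = 1; break; }
        }
        if (!found) { next[i] = head[b]; head[b] = i; }  /* new subgroup */
    }
    printf("%lld probes for %d tuples\n", probes, N);
    free(val); free(grp); free(head); free(next);
    return 0;
}

With ~1M tuples this finishes with on the order of 10^5-10^6 probes instead of ~10^11, at the price of the throwaway table's memory, comparable to the 8 MB HASHnew allocation visible in the earlier --algorithms output.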

Comment 26510

Date: 2018-06-26 17:02:50 +0200
From: @sjoerdmullender

I am assuming my patch adequately fixes the problem.

Comment 26516

Date: 2018-06-27 16:34:53 +0200
From: @swingbit

It does, thanks!

Comment 26521

Date: 2018-06-28 14:08:51 +0200
From: @swingbit

One issue remains (I'm not sure whether it is generic or related to this bug).

With the patch applied, and after ANALYZE:

Default pipeline has not improved much (5s).
Sequential pipeline has improved (50ms).

However, both perform very badly if called twice in short succession (less than 1s apart):

  • Default, call 1: 5s
  • Default, call 2: 11s
  • Sequential, call 1: 51ms
  • Sequential, call 2: 4s

If I leave a few seconds between the two calls, they all perform well (except the default pipe).

Can this still be related to this bug?

Comment 26523

Date: 2018-06-29 09:53:53 +0200
From: @sjoerdmullender

I'm wondering whether this is due to persisting the hash. I still have to check, but perhaps what's happening is that while the hash is being persisted (which can take a while), the hash lock is taken for the BAT, which means that the next query must wait when it is checking whether a hash exists.

If this is the case, then query 3 and later should all be quick.

Comment 26524

Date: 2018-06-29 10:10:55 +0200
From: @sjoerdmullender

(In reply to Sjoerd Mullender from comment 16)

> I'm wondering whether this is due to persisting the hash. I still have to
> check, but perhaps what's happening is that while the hash is being
> persisted (which can take a while), the hash lock is taken for the BAT,
> which means that the next query must wait when it is checking whether a hash
> exists.
>
> If this is the case, then query 3 and later should all be quick.

Ignore this comment. The ANALYZE should already have persisted any hashes, and the subgrouping that originally made the query slow no longer uses those hashes anyway.

Comment 26525

Date: 2018-06-29 10:17:17 +0200
From: @swingbit

Actually, ignore my comment, too.

What I wrote definitely happened, but I no longer see it after dropping and recreating the table (with or without analyze).

If I see it again and can frame it better I'll open a new report.

Comment 26526

Date: 2018-06-29 10:23:04 +0200
From: @sjoerdmullender

I haven't been able to reproduce this dramatic slowdown.
Here are the timings I got on my machine (debug build) using your dataset (which I assume is a lot smaller than the one you're testing with):

sql:0.000 opt:3.717 run:276.375 clk:282.063 ms
sql:0.000 opt:3.743 run:265.037 clk:270.372 ms
sql:0.000 opt:3.790 run:265.595 clk:271.028 ms

analyze sys.pim_0_obj_type;

sql:0.000 opt:3.861 run:255.614 clk:260.748 ms
sql:0.000 opt:4.043 run:253.165 clk:258.431 ms
sql:0.000 opt:3.826 run:251.523 clk:256.568 ms
sql:0.000 opt:3.900 run:250.855 clk:256.025 ms
sql:0.000 opt:3.870 run:249.873 clk:255.019 ms

set optimizer='sequential_pipe';

sql:0.000 opt:0.805 run:66.705 clk:70.104 ms
sql:0.000 opt:0.812 run:77.599 clk:79.572 ms
sql:0.000 opt:0.797 run:92.253 clk:94.234 ms
sql:0.000 opt:0.814 run:90.597 clk:92.676 ms
sql:0.000 opt:0.786 run:93.923 clk:95.948 ms
sql:0.000 opt:0.796 run:66.223 clk:68.747 ms
sql:0.000 opt:0.782 run:66.024 clk:68.015 ms
sql:0.000 opt:1.084 run:91.666 clk:93.979 ms
sql:0.000 opt:0.771 run:91.746 clk:93.780 ms
sql:0.000 opt:0.786 run:65.728 clk:67.739 ms

In this last sequence the faster times come after waits, so slowdowns do occur, but nothing nearly as extreme as in your case.

As you can infer, in this test I did not try sequential_pipe on unanalyzed data.

Comment 26527

Date: 2018-06-29 10:23:47 +0200
From: @sjoerdmullender

(In reply to Roberto Cornacchia from comment 18)

> Actually, ignore my comment, too.
>
> What I wrote definitely happened, but I no longer see it after dropping and
> recreating the table (with or without analyze).
>
> If I see it again and can frame it better I'll open a new report.

Ok, this report stays resolved.

@monetdb-team added the bug, normal, and SQL labels on Nov 30, 2020
@sjoerdmullender added this to the Ancient Release milestone on Feb 7, 2022