Date: 2018-05-02 17:48:58 +0200
From: @swingbit
To: SQL devs <>
Version: 11.27.13 (Jul2017-SP4)
Last updated: 2018-08-31 13:23:23 +0200
Comment 26417
Date: 2018-05-02 17:48:58 +0200
From: @swingbit
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36
Build Identifier:
With the table in attachment, I get the following timings on the same query:
Before ANALYZE, default pipe: 122ms
Before ANALYZE, sequential pipe: 40ms
After ANALYZE, default pipe: 11s
After ANALYZE, sequential pipe: 7m45s
This is only 1M tuples for 18MB data, so I expect the problem is about data distribution, not data size (indeed, the same query on different data works just fine).
The question is why it becomes so much slower after ANALYZE.
sql>select count(distinct id) from pim_0_obj_type;
+---------+
| L3 |
+=========+
| 1072284 |
+---------+
1 tuple (13.760ms)
sql>select count(distinct type) from pim_0_obj_type;
+------+
| L3 |
+======+
| 10 |
+------+
1 tuple (15.949ms)
sql>select count(distinct typestr) from pim_0_obj_type;
+------+
| L3 |
+======+
| 10 |
+------+
1 tuple (9.249ms)
-- Before ANALYZE
sql>set optimizer='default_pipe';
operation successful (0.608ms)
sql>select count(*) from (select id as a1,type as a2,typestr as a3,max(prob) from pim_0_obj_type group by a1,a2,a3) as x;
+---------+
| L11 |
+=========+
| 1072446 |
+---------+
1 tuple (122.836ms)
sql>set optimizer='sequential_pipe';
operation successful (0.717ms)
sql>select count(*) from (select id as a1,type as a2,typestr as a3,max(prob) from pim_0_obj_type group by a1,a2,a3) as x;
+---------+
| L11 |
+=========+
| 1072446 |
+---------+
1 tuple (40.347ms)
sql>select column,type,count,columnsize,heapsize,hashes,sorted from sys.storage('sys','pim_0_obj_type');
+---------+--------+---------+------------+----------+--------+--------+
| column | type | count | columnsize | heapsize | hashes | sorted |
+=========+========+=========+============+==========+========+========+
| id | int | 1072446 | 4289784 | 0 | 0 | null |
| type | int | 1072446 | 4289784 | 0 | 0 | null |
| typestr | clob | 1072446 | 2144892 | 8585 | 0 | null |
| prob | double | 1072446 | 8579568 | 0 | 0 | null |
+---------+--------+---------+------------+----------+--------+--------+
4 tuples (1.335ms)
sql>analyze sys.pim_0_obj_type;
-- After ANALYZE
sql>select column,type,count,columnsize,heapsize,hashes,sorted from sys.storage('sys','pim_0_obj_type');
+---------+--------+---------+------------+----------+---------+--------+
| column | type | count | columnsize | heapsize | hashes | sorted |
+=========+========+=========+============+==========+=========+========+
| id | int | 1072446 | 4289784 | 0 | 0 | true |
| type | int | 1072446 | 4289784 | 0 | 4390952 | false |
| typestr | clob | 1072446 | 2144892 | 8585 | 4522024 | false |
| prob | double | 1072446 | 8579568 | 0 | 0 | true |
+---------+--------+---------+------------+----------+---------+--------+
4 tuples (2.588ms)
sql>set optimizer='default_pipe';
operation successful (0.977ms)
sql>select count(*) from (select id as a1,type as a2,typestr as a3,max(prob) from pim_0_obj_type group by a1,a2,a3) as x;
+---------+
| L11 |
+=========+
| 1072446 |
+---------+
1 tuple (11.3s)
sql>set optimizer='sequential_pipe';
operation successful (0.760ms)
sql>select count(*) from (select id as a1,type as a2,typestr as a3,max(prob) from pim_0_obj_type group by a1,a2,a3) as x;
+---------+
| L11 |
+=========+
| 1072446 |
+---------+
1 tuple (7m 45s)
Reproducible: Always
$ mserver5 --version
MonetDB 5 server v11.27.14 (64-bit, 128-bit integers)
This is an unreleased version
Copyright (c) 1993 - July 2008 CWI
Copyright (c) August 2008 - 2018 MonetDB B.V., all rights reserved
Visit https://www.monetdb.org/ for further information
Found 15.5GiB available memory, 8 available cpu cores
Libraries:
libpcre: 8.41 2017-07-05 (compiled with 8.41)
openssl: OpenSSL 1.1.0h 27 Mar 2018 (compiled with OpenSSL 1.1.0h-fips 27 Mar 2018)
libxml2: 2.9.7 (compiled with 2.9.7)
Compiled by: roberto@photon.hq.spinque.com (x86_64-pc-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wundef -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wpacked-bitfield-compat -Wsync-nand -Wjump-misses-init -Wmissing-include-dirs -Wlogical-op -Wduplicated-cond -Wduplicated-branches -Wrestrict -Wnested-externs -Wunreachable-code -D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64
Comment 26418
Date: 2018-05-02 17:51:02 +0200
From: @swingbit
The data file is too large to be attached, here it is:
https://drive.google.com/open?id=1YcuWJpbQ8_-ozE-1t9TDaFtajDutErGp
Comment 26420
Date: 2018-05-07 11:13:07 +0200
From: @swingbit
UPDATE: This only happens with PERSISTENTHASH and PARENT_HASH enabled.
More details of what then happens:
The problematic grouping (8 minutes) is on the string attribute, which normally takes a few milliseconds:
| 48 | X_11=<tmp_2750>[1072446]:bat[:int] := algebra.projection(C_5=<tmp_2766>[1072446]:bat[:oid], X_8=<tmp_2746>[1072446]:bat[:int]); |
| 9688 | (X_18=<tmp_2766>[1072446]:bat[:oid], C_19=<tmp_2770>[1072284]:bat[:oid], X_20=<tmp_2721>[1072284]:bat[:lng]) := group.group(X_11=<tmp_2750>[1072446]:bat[:int]); |
| 482097975 | (X_21=<tmp_2721>[1072446]:bat[:oid], C_22=<tmp_2770>[1072446]:bat[:oid], X_23=<tmp_1656>[1072446]:bat[:lng]) := group.subgroup(X_17=<tmp_516>[1072446]:bat[:str], X_18=<tmp_2766>[1072446]:bat[:oid]); |
| 806 | (X_24=<tmp_2766>[1072446]:bat[:oid], C_25=<tmp_516>[1072446]:bat[:oid], X_26=<tmp_1656>[1072446]:bat[:lng]) := group.subgroupdone(X_14=<tmp_2767>[1072446]:bat[:int], X_21=<tmp_2721>[1072446]:bat[:oid]); |
Relevant part of --algorithms output:
BATproject(l=tmp_31641072446-sorted-key,r=tmp_27451072446[int])
BATproject(l=tmp_3164,r=tmp_2745)=tmp_31751072446
BATproject(l=tmp_31641072446-sorted-key,r=tmp_27441072446[str])
BATproject(l=tmp_3164,r=tmp_2744)=tmp_32301072446
BATproject(l=tmp_31641072446-sorted-key,r=tmp_27461072446[int]-sorted)
BATproject(l=tmp_3164,r=tmp_2746)=tmp_31731072446-sorted
BATgroup(b=tmp_31731072446[int],s=NULL0,g=NULL0,e=NULL0,h=NULL0,subsorted=0): compare consecutive values
BATordered: fixed nosorted(51) for tmp_32301072446 (8 usec)
BATordered_rev: fixed norevsorted(1) for tmp_31641072446 (1 usec)
BATcheckhash: reusing persisted hash tmp_2744
BATgroup(b=tmp_32301072446[str],s=NULL0,g=tmp_31641072446,e=NULL0,h=NULL0,subsorted=0): use existing hash table
...
(stopped manually)
-- algorithms output without persistent hashes:
BATproject(l=tmp_14501072446-sorted-key,r=tmp_4721072446[int])
BATproject(l=tmp_1450,r=tmp_472)=tmp_14611072446
BATproject(l=tmp_14501072446-sorted-key,r=tmp_4111072446[int])
BATproject(l=tmp_1450,r=tmp_411)=tmp_15141072446
BATproject(l=tmp_14501072446-sorted-key,r=tmp_4711072446[str])
BATproject(l=tmp_1450,r=tmp_471)=tmp_14571072446
BATordered: fixed norevsorted(47) for tmp_14571072446
BATordered: fixed nosorted(51) for tmp_14571072446 (20 usec)
BATordered: fixed norevsorted(47) for tmp_15141072446
BATordered: fixed nosorted(51) for tmp_15141072446 (19 usec)
BATordered: fixed norevsorted(47) for tmp_14501072446
BATordered: fixed nosorted(51) for tmp_14501072446 (8 usec)
BATgroup(b=tmp_15141072446[int],s=NULL0,g=tmp_14501072446,e=NULL0,h=NULL0,subsorted=0): create partial hash table
HASHnew: create hash(size 1072446, mask 1048576, width 4, nil 4294967295, total 8484088 bytes);
BATordered: fixed norevsorted(1) for tmp_14611072446
BATordered: fixed sorted for tmp_14611072446 (1422 usec)
BATordered: fixed norevsorted(47) for tmp_14571072446
BATordered: fixed nosorted(51) for tmp_14571072446 (4 usec)
BATgroup(b=tmp_14611072446[int],s=NULL0,g=tmp_14571072446,e=NULL0,h=NULL0,subsorted=0): subscan old groups
virtualize(bn=tmp_15141072446,seq=0)
BATproject(l=tmp_15141072446-sorted-key,r=tmp_14611072446[int]-sorted)
BATproject(l=tmp_1514,r=tmp_1461)=tmp_14571072446-sorted
Comment 26470
Date: 2018-05-31 13:23:31 +0200
From: @swingbit
Any luck in reproducing this?
Comment 26473
Date: 2018-06-01 11:27:12 +0200
From: @swingbit
Checking with gdb, I see that all the time is spent in GRP_use_existing_hash_table_tpe(sht) (sht because strings are dictionary-compressed). I guess this is the culprit?
Using persistent hashes and running ANALYZE is the only case when this macro is used.
Which to me is also surprising, should the hash not be persisted after it is created upon the first query call, and then found on the second call, without the need for ANALYZE?
Comment 26500
Date: 2018-06-26 13:08:34 +0200
From: @sjoerdmullender
I can reproduce this, and I think the reason for the slowness is the large number of duplicate items in the data together with mitosis and the enabling of parent hashes. The latter is not a supported configuration for a reason, and this is it.
I'm not sure I want to spend a lot of time on this problem right now, although I have some ideas.
Comment 26501
Date: 2018-06-26 13:10:58 +0200
From: @swingbit
One comment on that: I doubt it is about mitosis. It is actually so unreasonably slow with the sequential pipe, not with the default pipe.
Comment 26503
Date: 2018-06-26 14:59:56 +0200
From: @sjoerdmullender
(In reply to Roberto Cornacchia from comment 4)
When there is no hash table, BATgroup creates a temporary hash table that is not compatible with a full hash table (duplicates are not stored). Therefore this hash is not persisted.
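The distinction can be sketched in a few lines of Python (an illustrative model only, not MonetDB internals): grouping only needs a value-to-group-id mapping that drops duplicate positions, while a hash table worth persisting must chain every position so later lookups can find all matching tuples.

```python
# Illustrative model (not MonetDB code): why the temporary hash built
# for grouping cannot double as a persistable, general-purpose hash.

def group_with_temp_hash(values):
    # Grouping only needs value -> group id; positions of duplicate
    # values are never recorded, so this is useless for later lookups.
    first_seen = {}
    group_ids = []
    for v in values:
        if v not in first_seen:
            first_seen[v] = len(first_seen)
        group_ids.append(first_seen[v])
    return group_ids

def full_chained_hash(values):
    # A persistable hash must chain *every* position per value so that
    # point selects and joins can find all matching tuples.
    buckets = {}
    for pos, v in enumerate(values):
        buckets.setdefault(v, []).append(pos)
    return buckets
```

The first structure answers only "which group does this value belong to"; the second answers "where are all tuples with this value", which is what a persisted hash has to support.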
Comment 26504
Date: 2018-06-26 15:12:20 +0200
From: @swingbit
Thanks Sjoerd. So only by calling ANALYZE is the full hash table created (and persisted, if enabled)?
Comment 26506
Date: 2018-06-26 15:20:21 +0200
From: @swingbit
In general: I do understand if you say it is not a priority for you.
Just to cast my vote: persistent hashes are quite essential for us. The performance difference for joins is such that doing without is simply not an option. We talk about 5-6ms vs. 200ms for a single join, when we need to deliver much more complex queries that stay below 100ms.
For now, I have left persistent hashes enabled, but commented out their usage for aggregations. Of course it would be awesome if they could work without commenting out any code, but for now this will do.
I do hope persistent hashes will keep being supported, even though disabled by default.
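For context, the join speed-up that persistent hashes buy comes from replacing a per-tuple scan of the other side with a single hash probe. A toy sketch of the two access patterns (Python, purely illustrative; not MonetDB code):

```python
# Toy illustration of why a prebuilt (possibly persisted) hash makes
# a join cheap compared to having no index at all.

def hash_join(left, right):
    # With a hash on `left`, each right tuple costs one probe:
    # O(|left| + |right|) overall.
    index = {}
    for i, v in enumerate(left):
        index.setdefault(v, []).append(i)
    return [(i, j) for j, v in enumerate(right) for i in index.get(v, [])]

def nested_loop_join(left, right):
    # Without an index, every pair is compared: O(|left| * |right|).
    return [(i, j)
            for j, w in enumerate(right)
            for i, v in enumerate(left)
            if v == w]
```

When the hash already exists on disk, the build phase disappears from the query entirely, which is where millisecond-versus-hundreds-of-milliseconds differences like the ones quoted above come from.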
Comment 26507
Date: 2018-06-26 15:22:28 +0200
From: @sjoerdmullender
(In reply to Roberto Cornacchia from comment 8)
My comment was for BATgroup (and BATunique). In other places where a hash is created it will be persisted (e.g. point select or join).
Comment 26508
Date: 2018-06-26 15:23:46 +0200
From: @sjoerdmullender
(In reply to Roberto Cornacchia from comment 9)
I actually already have a solution. I'm just running tests before committing it.
The solution is to not use the preexisting hash at all when doing subgrouping.
Comment 26509
Date: 2018-06-26 15:58:38 +0200
From: MonetDB Mercurial Repository <>
Changeset a8c356ffefb4 made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.
For complete details, see https://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=a8c356ffefb4
Changeset description:
When subgrouping, don't use a preexisting hash table.
This fixes bug #6596.
The rationale is that we need to go through the hash chain to find a preexisting group for the value we're dealing with. If there are many distinct groups, but lots of duplicates in the column being subgrouped (is that a word?) this becomes very expensive. We're better off using the code that builds a new temporary hash table that is based on the value being subgrouped *and* its group id. This does cost extra memory during subgrouping.
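The rationale of the fix (build a fresh temporary hash keyed on the value *and* its group id, instead of walking the duplicate chain of a preexisting hash) can be modeled as follows. This is an illustrative Python sketch with hypothetical names, not the actual C code:

```python
# Illustrative model of the two subgrouping strategies.

def subgroup_reusing_hash(values, parent):
    # Reuse a full value -> positions hash: for each row, walk the chain
    # of all duplicates looking for an earlier row in the same parent
    # group. With many duplicates the chain walk degenerates toward
    # quadratic work.
    buckets = {}
    for pos, v in enumerate(values):
        buckets.setdefault(v, []).append(pos)
    out, assigned, next_id, chain_steps = [], {}, 0, 0
    for pos, v in enumerate(values):
        gid = None
        for cand in buckets[v]:
            if cand >= pos:
                break
            chain_steps += 1
            if parent[cand] == parent[pos]:
                gid = assigned[cand]
                break
        if gid is None:
            gid, next_id = next_id, next_id + 1
        assigned[pos] = gid
        out.append(gid)
    return out, chain_steps

def subgroup_temp_composite_hash(values, parent):
    # The fix: hash on (parent group, value) together -- one probe per
    # row, linear overall, at the cost of a fresh temporary hash.
    out, seen = [], {}
    for pos, v in enumerate(values):
        key = (parent[pos], v)
        if key not in seen:
            seen[key] = len(seen)
        out.append(seen[key])
    return out, len(values)
```

Both produce the same subgrouping, but with few distinct values and many parent groups (as in this bug's data) the chain-walking variant does vastly more work, which is consistent with the time observed in `GRP_use_existing_hash_table_tpe`.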
Comment 26510
Date: 2018-06-26 17:02:50 +0200
From: @sjoerdmullender
I am assuming my patch adequately fixes the problem.
Comment 26516
Date: 2018-06-27 16:34:53 +0200
From: @swingbit
It does, thanks!
Comment 26521
Date: 2018-06-28 14:08:51 +0200
From: @swingbit
One issue remains (I'm not sure whether it is generic or related to this bug).
With the patch applied, and after ANALYZE:
Default pipeline has not improved much (5s).
Sequential pipeline has improved (50ms).
However, they both perform very badly if called twice in short succession (less than 1s):
Default, call 1: 5ms
Default, call 2: 11s
Sequential, call 1: 51ms
Sequential, call 2: 4s
If I leave a few seconds between the two calls, they all perform well (except the default pipe).
Can this still be related to this bug?
Comment 26523
Date: 2018-06-29 09:53:53 +0200
From: @sjoerdmullender
I'm wondering whether this is due to persisting the hash. I still have to check, but perhaps what's happening is that while the hash is being persisted (which can take a while), the hash lock is taken for the BAT, which means that the next query must wait when it is checking whether a hash exists.
If this is the case, then query 3 and later should all be quick.
Comment 26524
Date: 2018-06-29 10:10:55 +0200
From: @sjoerdmullender
(In reply to Sjoerd Mullender from comment 16)
Ignore this comment. The analyze should persist any hashes and the subgrouping which made the query slow originally doesn't use those hashes anymore anyway.
Comment 26525
Date: 2018-06-29 10:17:17 +0200
From: @swingbit
Actually, ignore my comment, too.
What I wrote definitely happened, but I no longer see it after dropping and recreating the table (with or without analyze).
If I see it again and can frame it better I'll open a new report.
Comment 26526
Date: 2018-06-29 10:23:04 +0200
From: @sjoerdmullender
I haven't been able to reproduce this dramatic slowdown.
Here are the timings I got on my machine (debug build) using your dataset (which I assume is a lot smaller than the one you're testing with):
sql:0.000 opt:3.717 run:276.375 clk:282.063 ms
sql:0.000 opt:3.743 run:265.037 clk:270.372 ms
sql:0.000 opt:3.790 run:265.595 clk:271.028 ms
analyze sys.pim_0_obj_type;
sql:0.000 opt:3.861 run:255.614 clk:260.748 ms
sql:0.000 opt:4.043 run:253.165 clk:258.431 ms
sql:0.000 opt:3.826 run:251.523 clk:256.568 ms
sql:0.000 opt:3.900 run:250.855 clk:256.025 ms
sql:0.000 opt:3.870 run:249.873 clk:255.019 ms
set optimizer='sequential_pipe';
sql:0.000 opt:0.805 run:66.705 clk:70.104 ms
sql:0.000 opt:0.812 run:77.599 clk:79.572 ms
sql:0.000 opt:0.797 run:92.253 clk:94.234 ms
sql:0.000 opt:0.814 run:90.597 clk:92.676 ms
sql:0.000 opt:0.786 run:93.923 clk:95.948 ms
sql:0.000 opt:0.796 run:66.223 clk:68.747 ms
sql:0.000 opt:0.782 run:66.024 clk:68.015 ms
sql:0.000 opt:1.084 run:91.666 clk:93.979 ms
sql:0.000 opt:0.771 run:91.746 clk:93.780 ms
sql:0.000 opt:0.786 run:65.728 clk:67.739 ms
In this last sequence the faster times come after waits, so slowdowns do occur, but nothing nearly as extreme as in your case.
As you can infer, in this test I did not try sequential_pipe on unanalyzed data.
Comment 26527
Date: 2018-06-29 10:23:47 +0200
From: @sjoerdmullender
(In reply to Roberto Cornacchia from comment 18)
Ok, this report stays resolved.