You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
SELECT "a", COUNT(DISTINCT "b") FROM "test"."table1" GROUP BY "a"
is about 50 times slower then running multiple queries with the group by replaced with a where clause. When not using distinct the group by and multiple queries perform equally good.
Reproducible: Always
Comment 19269
Date: 2013-10-11 19:00:25 +0200
From: Christian Braun <>
Hi,
I have also encountered this problem. The easiest workaround is rewriting the query like this:
For the query (1)
SELECT "a", COUNT(DISTINCT "b") FROM "test"."table1" GROUP BY "a"
Write
(2)
SELECT "a",count("b") from (SELECT DISTINCT "a", "b" FROM "test"."table1" ) as d GROUP BY "a"
In the first query MonetDB uses only one CPU core. In the second uses all cores.
Comment 19311
Date: 2013-10-26 14:22:56 +0200
From: Christian Braun <>
Thanks for the tip. But unfortunately it does not work as a workaround:
[1]
SELECT "a", COUNT("b") FROM (SELECT DISTINCT "a", "b" FROM "test"."table1") AS t GROUP BY "a";
48 seconds
[2]
SELECT "a", COUNT(1) FROM (SELECT DISTINCT "a", "b" FROM "test"."table1") AS t GROUP BY "a";
3187 seconds
[3]
SELECT "a", COUNT(DISTINCT("b")) FROM "test"."table1" GROUP BY "a";
78 seconds
[4]
SELECT 0, COUNT(DISTINCT "b") FROM "test"."table1" WHERE "a" = 0
UNION SELECT 1, COUNT(DISTINCT "b") FROM "test"."table1" WHERE "a" = 1
UNION SELECT 2, COUNT(DISTINCT "b") FROM "test"."table1" WHERE "a" = 2
UNION SELECT 3, COUNT(DISTINCT "b") FROM "test"."table1" WHERE "a" = 3
UNION SELECT 4, COUNT(DISTINCT "b") FROM "test"."table1" WHERE "a" = 4
UNION SELECT 5, COUNT(DISTINCT "b") FROM "test"."table1" WHERE "a" = 5
;
2 seconds
[1], [2] and [3] run on a single cpu core most of the time.
[2] seems to be different issue. I would have expected [1] and [2] have the same execution time.
The above test is with a small dataset. When i run queries like [3] on larger datasets they do not finish after 10 hours.
Comment 19312
Date: 2013-10-26 14:24:06 +0200
From: Christian Braun <>
Created attachment 247
log file with timing for distinct in sub query
Attached file: bug_h3.log (application/octet-stream, 2225 bytes)
Description: log file with timing for distinct in sub query
That's odd! After your comment I wanted reproduce my performance results and I couldn't do that on fresh table. But on another table (created some time ago) Monet used all cores and performance was really OK.
Then I realized I had quite big performance improvement after database restart.
So try to restart your database if you can and check it again. I'v checked that on 3 new tables created before restart.
I hope there is a better solution to force monet to use all CPU.
My another observation is that changing order of columns in sub query
has influence on duration.
Try
SELECT "a", COUNT("b") FROM (SELECT DISTINCT "b","a" FROM "test"."table1") AS t GROUP BY "a";
vs
SELECT "a", COUNT("b") FROM (SELECT DISTINCT "a","b" FROM "test"."table1") AS t GROUP BY "a";
I hope that helps you. I'm using Feb2013 branch
MonetDB 5 server v11.15.18 (64-bit, 64-bit oids)
This is an unreleased version
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2013 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 252.2GiB available memory, 32 available cpu cores
Libraries:
libpcre: 7.8 2008-09-05 (compiled with 7.8)
openssl: OpenSSL 1.0.0 29 Mar 2010 (compiled with OpenSSL 1.0.0-fips 29 Mar 2010)
libxml2: 2.7.6 (compiled with 2.7.6)
Compiled by: (x86_64-unknown-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -Wp,-D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64
Comment 19314
Date: 2013-10-27 10:13:31 +0100
From: Christian Braun <>
Thank you for following up. Restarting the database did not make a difference. But switching the column order helped:
[5]
SELECT "a", COUNT("b") FROM (SELECT DISTINCT "a", "b" FROM "test"."table1") AS t GROUP BY "a";
50 seconds
[6]
SELECT "a", COUNT("b") FROM (SELECT DISTINCT "b", "a" FROM "test"."table1") AS t GROUP BY "a";
3 seconds
The relational optimizer now rewrites a, count(distinct) group by a into
a, count(b) from ( select a, b from .. group by a, b) as t group by a;
Also the optimizer reorders the group by expressions (a,b) if it has enough information (ie if the types differ or if one is ordered).
Date: 2013-10-11 18:59:12 +0200
From: Christian Braun <>
To: SQL devs <>
Version: 11.15.19 (Feb2013-SP6)
CC: klaudiusz223, @njnes
Last updated: 2014-02-20 15:02:27 +0100
Comment 19268
Date: 2013-10-11 18:59:12 +0200
From: Christian Braun <>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0
Build Identifier:
SELECT "a", COUNT(DISTINCT "b") FROM "test"."table1" GROUP BY "a"
is about 50 times slower then running multiple queries with the group by replaced with a where clause. When not using distinct the group by and multiple queries perform equally good.
Reproducible: Always
Comment 19269
Date: 2013-10-11 19:00:25 +0200
From: Christian Braun <>
Created attachment 241
sql script
Comment 19270
Date: 2013-10-11 19:01:03 +0200
From: Christian Braun <>
Created attachment 242
log file showing bug
Comment 19271
Date: 2013-10-11 19:01:34 +0200
From: Christian Braun <>
Created attachment 243
explain of select statements
Comment 19290
Date: 2013-10-20 17:34:44 +0200
From: @njnes
Could you add the ddl statements and the output of plan followed by the distinct
query.
Comment 19291
Date: 2013-10-21 10:48:01 +0200
From: Christian Braun <>
Created attachment 244
DDL statement
Comment 19292
Date: 2013-10-21 10:51:57 +0200
From: Christian Braun <>
Created attachment 245
explain of "select count distinct group by"
Comment 19293
Date: 2013-10-21 10:54:55 +0200
From: Christian Braun <>
Thanks for looking into this. I attached the files. I had to replace column and table names with generic identifies.
Comment 19306
Date: 2013-10-25 18:37:35 +0200
From: Klaudiusz <>
Hi,
I have also encountered this problem. The easiest workaround is rewriting the query like this:
For the query (1)
SELECT "a", COUNT(DISTINCT "b") FROM "test"."table1" GROUP BY "a"
Write
(2)
SELECT "a",count("b") from (SELECT DISTINCT "a", "b" FROM "test"."table1" ) as d GROUP BY "a"
In the first query MonetDB uses only one CPU core. In the second uses all cores.
Comment 19311
Date: 2013-10-26 14:22:56 +0200
From: Christian Braun <>
Thanks for the tip. But unfortunately it does not work as a workaround:
[1]
SELECT "a", COUNT("b") FROM (SELECT DISTINCT "a", "b" FROM "test"."table1") AS t GROUP BY "a";
48 seconds
[2]
SELECT "a", COUNT(1) FROM (SELECT DISTINCT "a", "b" FROM "test"."table1") AS t GROUP BY "a";
3187 seconds
[3]
SELECT "a", COUNT(DISTINCT("b")) FROM "test"."table1" GROUP BY "a";
78 seconds
[4]
SELECT 0, COUNT(DISTINCT "b") FROM "test"."table1" WHERE "a" = 0
UNION SELECT 1, COUNT(DISTINCT "b") FROM "test"."table1" WHERE "a" = 1
UNION SELECT 2, COUNT(DISTINCT "b") FROM "test"."table1" WHERE "a" = 2
UNION SELECT 3, COUNT(DISTINCT "b") FROM "test"."table1" WHERE "a" = 3
UNION SELECT 4, COUNT(DISTINCT "b") FROM "test"."table1" WHERE "a" = 4
UNION SELECT 5, COUNT(DISTINCT "b") FROM "test"."table1" WHERE "a" = 5
;
2 seconds
[1], [2] and [3] run on a single cpu core most of the time.
[2] seems to be different issue. I would have expected [1] and [2] have the same execution time.
The above test is with a small dataset. When i run queries like [3] on larger datasets they do not finish after 10 hours.
Comment 19312
Date: 2013-10-26 14:24:06 +0200
From: Christian Braun <>
Created attachment 247
log file with timing for distinct in sub query
Comment 19313
Date: 2013-10-26 23:58:01 +0200
From: Klaudiusz <>
That's odd! After your comment I wanted reproduce my performance results and I couldn't do that on fresh table. But on another table (created some time ago) Monet used all cores and performance was really OK.
Then I realized I had quite big performance improvement after database restart.
So try to restart your database if you can and check it again. I'v checked that on 3 new tables created before restart.
I hope there is a better solution to force monet to use all CPU.
My another observation is that changing order of columns in sub query
has influence on duration.
Try
SELECT "a", COUNT("b") FROM (SELECT DISTINCT "b","a" FROM "test"."table1") AS t GROUP BY "a";
vs
SELECT "a", COUNT("b") FROM (SELECT DISTINCT "a","b" FROM "test"."table1") AS t GROUP BY "a";
I hope that helps you. I'm using Feb2013 branch
MonetDB 5 server v11.15.18 (64-bit, 64-bit oids)
This is an unreleased version
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2013 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 252.2GiB available memory, 32 available cpu cores
Libraries:
libpcre: 7.8 2008-09-05 (compiled with 7.8)
openssl: OpenSSL 1.0.0 29 Mar 2010 (compiled with OpenSSL 1.0.0-fips 29 Mar 2010)
libxml2: 2.7.6 (compiled with 2.7.6)
Compiled by: (x86_64-unknown-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -Wp,-D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64
Comment 19314
Date: 2013-10-27 10:13:31 +0100
From: Christian Braun <>
Thank you for following up. Restarting the database did not make a difference. But switching the column order helped:
[5]
SELECT "a", COUNT("b") FROM (SELECT DISTINCT "a", "b" FROM "test"."table1") AS t GROUP BY "a";
50 seconds
[6]
SELECT "a", COUNT("b") FROM (SELECT DISTINCT "b", "a" FROM "test"."table1") AS t GROUP BY "a";
3 seconds
Comment 19414
Date: 2013-12-11 15:08:31 +0100
From: @njnes
The relational optimizer now rewrites a, count(distinct) group by a into
a, count(b) from ( select a, b from .. group by a, b) as t group by a;
Also the optimizer reorders the group by expressions (a,b) if it has enough information (ie if the types differ or if one is ordered).
Comment 19428
Date: 2013-12-17 13:26:43 +0100
From: Klaudiusz <>
Will this patch (http://dev.monetdb.org/hg/MonetDB/rev/78eca92f6d68) be merged into Feb2013 branch?
Comment 19595
Date: 2014-02-20 15:02:27 +0100
From: @sjoerdmullender
Jan2014 has been released.
The text was updated successfully, but these errors were encountered: