Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two-column integer aggregation extremely slow #6317

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Closed

Two-column integer aggregation extremely slow #6317

monetdb-team opened this issue Nov 30, 2020 · 0 comments
Labels
bug Something isn't working GDK Kernel normal

Comments

@monetdb-team
Copy link

Date: 2017-05-15 21:29:41 +0200
From: @swingbit
To: GDK devs <>
Version: 11.25.21 (Dec2016-SP4)

Last updated: 2017-08-15 16:10:23 +0200

Comment 25328

Date: 2017-05-15 21:29:41 +0200
From: @swingbit

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Build Identifier:

I have a 280M tuples, two-column table of integers, with the following details:

sql>\d bigaggr
CREATE TABLE "sys"."bigaggr" (
"a1" INTEGER,
"a2" INTEGER
);
sql>select count(*) from bigaggr;
+-----------+
| L3 |
+===========+
| 279401374 |
+-----------+
1 tuple (30.333ms)
sql>select count(distinct a1) from bigaggr;
+--------+
| L3 |
+========+
| 153033 |
+--------+
1 tuple (804.289ms)
sql>select count(distinct a2) from bigaggr;
+-------+
| L3 |
+=======+
| 41365 |
+-------+
1 tuple (762.654ms)

On this table, I perform the following aggregation:

CREATE TABLE x1 AS
SELECT a1, a2
FROM bigaggr
GROUP BY a1,a2
WITH DATA;

This runs for hours at 100% CPU, about 5-6 GB resident memory (13GB virtual), until it eats up about 80GB of disk space and finally crashes.

Stethoscope shows that group.subgroup() correctly delivers 153033 groups in just a few seconds.

Then, groupsubgroupdone() runs forever.

Swapping the two columns in the GROUP BY clause does reflect in the MAL plan, but shows the same behaviour: the secondary aggregation fails.

Notice that the same query where GROUP BY is replaced with ORDER BY succeeds in just 50 seconds.

The dump of the table is available here (672MB compressed):
https://drive.google.com/open?id=0BwkuZ5tb4WEnOEdKRk85eXpXMXc

N.B.: sometimes, with many small groups, mitosis can hurt aggregations badly, but it is not to blame in this case. I have tried with no_mitosis_pipe, with the same results.

Reproducible: Always

$ mserver5 --version
MonetDB 5 server v11.25.22 (64-bit, 128-bit integers)
This is an unreleased version
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2017 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 15.6GiB available memory, 8 available cpu cores
Libraries:
libpcre: 8.40 2017-01-11 (compiled with 8.40)
openssl: OpenSSL 1.0.2j 26 Sep 2016 (compiled with OpenSSL 1.0.2j-fips 26 Sep 2016)
libxml2: 2.9.3 (compiled with 2.9.3)
Compiled by: roberto@photon.hq.spinque.com (x86_64-unknown-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -DDISABLE_SPINQUE -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wundef -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wpacked-bitfield-compat -Wsync-nand -Wjump-misses-init -Wmissing-include-dirs -Wlogical-op -Wunreachable-code -D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64

Comment 25330

Date: 2017-05-16 00:32:28 +0200
From: @swingbit

A correction: swapping the two columns in the GROUP BY clause, thus executing

CREATE TABLE x1 AS
SELECT a1, a2
FROM bigaggr
GROUP BY a2,a1
WITH DATA;

does complete, in 85 minutes!

Comment 25337

Date: 2017-05-16 13:37:44 +0200
From: @sjoerdmullender

I can see that the second call to subgroup does indeed take (just about) forever, but I don't see a growth in the database. Growth is anyway unexpected since there is no join involved which might explode the size of the datebase.

As to the size of the server, that is to be expected. The inputs of the second subgroup are two 280M row columns of 4 and 8 bytes wide respectively (the column being grouped (int) and the result of the first subgroup (oid)). The result of the call is a new group column (280M rows of oid) plus an extent and histogram (both 8 byte wide and with one row per group). There is also a temporary hash structure with is roughly 280M * 2 * 4 bytes.

With about 260M groups in the result, this gives about 12G of memory being involved in this call.

A fix for the speed issue is on its way.

Comment 25338

Date: 2017-05-16 13:39:55 +0200
From: MonetDB Mercurial Repository <>

Changeset f5629e7c3ee7 made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.

For complete details, see http//devmonetdborg/hg/MonetDB?cmd=changeset;node=f5629e7c3ee7

Changeset description:

By doing what we say, subgrouping can be a lot faster.
This should fix bug #6317.

Comment 25339

Date: 2017-05-16 14:16:16 +0200
From: @swingbit

Yep, it now completes in 22 seconds. Thanks!

I'll try to have a look at why I saw that database growth. But this is fixed.

Comment 25340

Date: 2017-05-16 14:49:55 +0200
From: @swingbit

For completeness, the title of this bug report is perhaps misleading.
I did experience a crash due to database growth, but it was caused by something else (which I think I've just found and am going to report soon).

So this was just extremely slow.

Comment 25341

Date: 2017-05-16 14:52:27 +0200
From: @sjoerdmullender

Can't you edit the title? Right next to the title there should be an "edit" link.

Comment 25579

Date: 2017-08-15 16:10:23 +0200
From: MonetDB Mercurial Repository <>

Changeset 90a3b702ff32 made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.

For complete details, see https//devmonetdborg/hg/MonetDB?cmd=changeset;node=90a3b702ff32

Changeset description:

Use a clever trick to reduce the overlap of value and group ID in subgroup.
See the comment.
This fix keeps the performance of query 16 (see changeset
cdf01e261bc6) and brings back the performance for bug #6317.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working GDK Kernel normal
Projects
None yet
Development

No branches or pull requests

2 participants