Two-column aggregation on join result extremely slow. #6321

monetdb-team · 2020-11-30T15:48:31Z

Date: 2017-05-16 16:16:32 +0200
From: @swingbit
To: GDK devs <>
Version: 11.25.21 (Dec2016-SP4)

Last updated: 2017-10-26 14:01:31 +0200

Comment 25342

Date: 2017-05-16 16:16:32 +0200
From: @swingbit

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36
Build Identifier:

This is very similar to Bug #6317 (fixed in the meantime).

It is again about a slow aggregation on a two-column table, but it extends it with the join that generates the table to aggregate upon, and one more column.

Data to bootstrap the example can be found here:
https://drive.google.com/open?id=0BwkuZ5tb4WEnaHdYR2dyOWtpdGM

With that data:

-- Generate the table to aggregate upon (result is 280M tuples):
sql>CREATE TABLE tmp as SELECT ap.a1 as a1, bp.a1 AS a3, ap.prob * bp.prob as prob FROM ap, bp WHERE ap.a2 = bp.a2 WITH DATA;
operation successful (26.3s)

-- Aggregate, summing the prob column:
sql>CREATE TABLE t1 AS SELECT a1, a3, sum(prob) FROM tmp GROUP BY a1, a3 WITH DATA;
operation successful (35.0s)

No problem here.

If I rewrite the same query with the join as a subselect:

sql>CREATE TABLE t1 AS SELECT a1, a3, sum(prob) FROM (SELECT ap.a1 as a1, bp.a1 AS a3, ap.prob * bp.prob as prob FROM ap, bp WHERE ap.a2 = bp.a2) AS tmp GROUP BY a1, a3 WITH DATA;

this takes again forever (that is, I needed to stop it).

The interesting fact is that it is again (as in Bug #6317) stuck in group.subgroupdone().

Also interesting, the same query with only grouping but without the actual aggregation sum(prob) succeeds with no problem:

sql>CREATE TABLE t1 AS SELECT a1, a3 FROM (SELECT ap.a1 as a1, bp.a1 AS a3, ap.prob * bp.prob as prob FROM ap, bp WHERE ap.a2 = bp.a2) AS tmp GROUP BY a1, a3 WITH DATA;
operation successful (47.1s)

Reproducible: Always

$ mserver5 --version
MonetDB 5 server v11.25.22 (64-bit, 128-bit integers)
This is an unreleased version
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2017 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 15.6GiB available memory, 8 available cpu cores
Libraries:
libpcre: 8.40 2017-01-11 (compiled with 8.40)
openssl: OpenSSL 1.0.2j 26 Sep 2016 (compiled with OpenSSL 1.0.2j-fips 26 Sep 2016)
libxml2: 2.9.3 (compiled with 2.9.3)
Compiled by: roberto@photon.hq.spinque.com (x86_64-unknown-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -DDISABLE_SPINQUE -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wundef -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wpacked-bitfield-compat -Wsync-nand -Wjump-misses-init -Wmissing-include-dirs -Wlogical-op -Wunreachable-code -D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64

Comment 25353

Date: 2017-05-29 16:51:17 +0200
From: @sjoerdmullender

On my desktop with 16 GB of memory I can reproduce the problem. On my laptop with 32 GB of memory, both queries run reasonably fast.
It seems the problem is that for some reason the query with the sum requires more memory than the one without the sum. I don't understand why, though. The plans of both queries are identical up to and including the call to group.subgroupdone, so you'd expect basically the same behavior for both up to this point. But that's not what we're getting. The two plans are executed differently, possibly because of what follows later in the plan.
The query without sum stays within 13 GB or so in ther VIRT column of top, whereas the query with sum grows to 18 GB. The sum query therefore trashes the disk and thus becomes very slow.

Comment 25612

Date: 2017-09-05 11:32:22 +0200
From: @sjoerdmullender

With the current Jul2017 code, the problem seems to have been fixed. All three variations run in about a minute:

sql>CREATE TABLE t1 AS SELECT a1, a3, sum(prob) FROM tmp GROUP BY a1, a3 WITH DATA;
operation successful (1m 8s)
sql>drop table t1;
operation successful (19.492ms)
sql>CREATE TABLE t1 AS SELECT a1, a3, sum(prob) FROM (SELECT ap.a1 as a1, bp.a1 AS a3, ap.prob * bp.prob as prob FROM ap, bp WHERE ap.a2 = bp.a2) AS tmp GROUP BY a1, a3 WITH DATA;
operation successful (1m 14s)
sql>drop table t1;
operation successful (17.256ms)
sql>CREATE TABLE t1 AS SELECT a1, a3 FROM (SELECT ap.a1 as a1, bp.a1 AS a3, ap.prob * bp.prob as prob FROM ap, bp WHERE ap.a2 = bp.a2) AS tmp GROUP BY a1, a3 WITH DATA;
operation successful (53.0s)

This was probably fixed by changeset 90a3b702ff32 and the tweaks to the BATgroup code that followed.

Roberto, can you confirm?

Comment 25616

Date: 2017-09-06 09:28:48 +0200
From: @swingbit

It seems fixed indeed. Thanks.

monetdb-team added bug Something isn't working GDK Kernel normal labels Nov 30, 2020

monetdb-team closed this as completed Nov 30, 2020

sjoerdmullender added this to the Ancient Release milestone Feb 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Two-column aggregation on join result extremely slow. #6321

Two-column aggregation on join result extremely slow. #6321

monetdb-team commented Nov 30, 2020

Two-column aggregation on join result extremely slow. #6321

Two-column aggregation on join result extremely slow. #6321

Comments

monetdb-team commented Nov 30, 2020

Comment 25342

Comment 25353

Comment 25612

Comment 25616