Incorrect profiling #6239

monetdb-team · 2020-11-30T15:33:33Z

Date: 2017-03-02 18:45:03 +0100
From: @swingbit
To: MonetDB5 devs <>
Version: 11.25.5 (Dec2016-SP1)

Last updated: 2017-10-26 14:01:32 +0200

Comment 25092

Date: 2017-03-02 18:45:03 +0100
From: @swingbit

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
Build Identifier:

Simple test which (to me) gives unexplainable profiling output.

-- create 500M integers
create table seq as select value from sys.generate_series(cast(1 as int),cast(500000000 as int)) with data;

-- get 10 of them into a second table
create table x as select * from seq sample 10 with data;

-- make sure we get a linear and easy to read plan
alter table seq set read only;
alter table x set read only;
set optimizer='sequential_pipe';

-- measure the time needed to obtain a MAL plan for a simple join (1-2 ms on my machine)
explain select seq.value from seq, x where seq.value=x.value;

-- trace the same query
trace select seq.value from seq, x where seq.value=x.value;

Here's the problem, I cannot explain the timings.

Dec2016, tip of hg, optimized compilation, 16GB ram, dbfarm on SSD.
Wall-clock timing is 2.4 seconds (this is correctly reported, it does take this time).
It takes the same time without trace.
MAL plan generation was 1-2 ms. Negligible
The single instructions take almost no time.
Execution is linear on one thread
The join is expected to be the only meaty operation, but it is reported to take 219usec. I find it hard to believe.
Still, 2.4 seconds for this join seem too long (500M x 10, integers, with the large table sorted)

Please find in attachment the outputs for EXPLAIN, TRACE, and STETHOSCOPE.

Reproducible: Always

$ mserver5 --version
MonetDB 5 server v11.25.10 (64-bit, 128-bit integers)
This is an unreleased version
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2017 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 15.6GiB available memory, 8 available cpu cores
Libraries:
libpcre: 8.40 2017-01-11 (compiled with 8.40)
openssl: OpenSSL 1.0.2j 26 Sep 2016 (compiled with OpenSSL 1.0.2j-fips 26 Sep 2016)
libxml2: 2.9.3 (compiled with 2.9.3)
Compiled by: roberto@photon.hq.spinque.com (x86_64-unknown-linux-gnu)
Compilation: gcc -g -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wundef -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wpacked-bitfield-compat -Wsync-nand -Wjump-misses-init -Wmissing-include-dirs -Wlogical-op -Wunreachable-code
Linking : /usr/bin/ld -m elf_x86_64

Comment 25093

Date: 2017-03-02 18:45:33 +0100
From: @swingbit

Created attachment 541
explain output

Attached file: 6239_explain.txt (text/plain, 6146 bytes)
Description: explain output

Comment 25094

Date: 2017-03-02 18:45:52 +0100
From: @swingbit

Created attachment 542
trace output

Attached file: 6239_trace.txt (text/plain, 5368 bytes)
Description: trace output

Comment 25095

Date: 2017-03-02 18:46:15 +0100
From: @swingbit

Created attachment 543
stethoscope output

Attached file: 6239_stethoscope.txt (text/plain, 12500 bytes)
Description: stethoscope output

Comment 25096

Date: 2017-03-02 19:00:05 +0100
From: @swingbit

Correction:

Both tables are sorted, and I verified that a mergejoin is used.

219usec seem too short for this join, but 2.4 seconds are definitly too long.

Comment 25097

Date: 2017-03-02 19:33:00 +0100
From: @swingbit

Second correction: compilation was not optimized.
I tried with optimized compilation, nothing changes.

Comment 25098

Date: 2017-03-02 20:48:58 +0100
From: @sjoerdmullender

You could run the server with the option -d2097152 ( -d$((1<<21)) ) to get algorithm debug output on the server console (standard error). It will print, among other things, the exact time the join operation took. It is calculated at the lowest level, so it won't be any more accurate than that.

When I try your scenario on my machine, I get 491usec in the trace output, and at the same time 475usec on the server console. It also tells me it does a mergejoin_int.

Comment 25099

Date: 2017-03-02 20:55:50 +0100
From: @swingbit

That is good news. The join IS very fast on this large-ish but easy join.

Do you also get more than 2 seconds wall-clock for the entire query though? If so, the question is where the time goes.

Comment 25101

Date: 2017-03-02 21:05:52 +0100
From: @sjoerdmullender

Are you using a debug build? In particular, do you have assertions enabled?
If so, try running the server with -d33554432 ( -d$((1<<25)) ) to disable cleaning of memory.
I profiled the query using callgrind and I saw that a huge amount of time was spent freeing (and writing debug patterns to the to-be-freed memory).

Without callgrind, I got about 2.5 seconds wall clock time for the query. Without callgrind and with -d33554432 it took a staggering 3.6 milliseconds.

Comment 25102

Date: 2017-03-02 21:13:16 +0100
From: @sjoerdmullender

I didn't do my test correctly. The fast time was the time for the explain.
The select was in fact about 22ms.

Comment 25103

Date: 2017-03-02 21:21:42 +0100
From: @sjoerdmullender

It's too late for this.

I now put the SQL statements in a script and run it with
mclient -ims -d$DB script.sql

If I run mserver without options (debug build, with assertions, not optimized), the select (without TRACE keyword, but after the EXPLAIN), takes 2.6 seconds.
If I run mserver with -d33554432 the select takes 3.4 milliseconds.

Comment 25135

Date: 2017-03-03 12:03:34 +0100
From: @swingbit

Sjoerd, I can confirm your timings.

Only compiling with --disable-debug --disable-developer --enable-optimize does not change anything, still 2.3 seconds.

Using debug mask 33554432 makes the whole select execute on my machine in 1.2ms and 1.9ms (88usec for the subjoin) without and with trace, respectively.

Now, these are the timings I'd love to have in production :D

Comment 25136

Date: 2017-03-03 12:07:23 +0100
From: @swingbit

What are the risks of not cleaning the freed memory?

Comment 25137

Date: 2017-03-03 12:47:19 +0100
From: @swingbit

I checked the code of GDKfree().

It clarifies that zeroing the to-be-freed memory is actually a debug measure to simplify detection of double-frees. For production builds, this is not needed.

What perhaps is not very clear is that --disable-developer DOES NOT imply --disable-assert. I got <1ms wall-clock time by using these configure switches, and no debug bitmask:

--disable-debug --disable-developer --disable-assert --enable-optimize

Would't you agree that --disable-developer should imply --disable-assert ?

On a different note, is the profiling information on a developer build to be considered wrong or correct? It sure isn't very informative.

Comment 25613

Date: 2017-09-05 12:37:14 +0200
From: @sjoerdmullender

I've changed the description of the --enable-developer option. It's orthogonal to the other options, but it should be clearer now what it's for.

Can this report be closed now?

Comment 25615

Date: 2017-09-06 09:24:11 +0200
From: @swingbit

It can be closed, for me. Thanks.

Comment 25617

Date: 2017-09-06 09:35:14 +0200
From: @sjoerdmullender

"Fixed" by clarifying the configure option that apparently caused some confusion.

monetdb-team added bug Something isn't working MAL/M5 normal labels Nov 30, 2020

monetdb-team closed this as completed Nov 30, 2020

sjoerdmullender added this to the Ancient Release milestone Feb 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect profiling #6239

Incorrect profiling #6239

monetdb-team commented Nov 30, 2020

Incorrect profiling #6239

Incorrect profiling #6239

Comments

monetdb-team commented Nov 30, 2020

Comment 25092

Comment 25093

Comment 25094

Comment 25095

Comment 25096

Comment 25097

Comment 25098

Comment 25099

Comment 25101

Comment 25102

Comment 25103

Comment 25135

Comment 25136

Comment 25137

Comment 25613

Comment 25615

Comment 25617