You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
You could run the server with the option -d2097152 ( -d$((1<<21)) ) to get algorithm debug output on the server console (standard error). It will print, among other things, the exact time the join operation took. It is calculated at the lowest level, so it won't be any more accurate than that.
When I try your scenario on my machine, I get 491usec in the trace output, and at the same time 475usec on the server console. It also tells me it does a mergejoin_int.
Are you using a debug build? In particular, do you have assertions enabled?
If so, try running the server with -d33554432 ( -d$((1<<25)) ) to disable cleaning of memory.
I profiled the query using callgrind and I saw that a huge amount of time was spent freeing (and writing debug patterns to the to-be-freed memory).
Without callgrind, I got about 2.5 seconds wall clock time for the query. Without callgrind and with -d33554432 it took a staggering 3.6 milliseconds.
I now put the SQL statements in a script and run it with
mclient -ims -d$DB script.sql
If I run mserver without options (debug build, with assertions, not optimized), the select (without TRACE keyword, but after the EXPLAIN), takes 2.6 seconds.
If I run mserver with -d33554432 the select takes 3.4 milliseconds.
Only compiling with --disable-debug --disable-developer --enable-optimize does not change anything, still 2.3 seconds.
Using debug mask 33554432 makes the whole select execute on my machine in 1.2ms and 1.9ms (88usec for the subjoin) without and with trace, respectively.
Now, these are the timings I'd love to have in production :D
It clarifies that zeroing the to-be-freed memory is actually a debug measure to simplify detection of double-frees. For production builds, this is not needed.
What perhaps is not very clear is that --disable-developer DOES NOT imply --disable-assert. I got <1ms wall-clock time by using these configure switches, and no debug bitmask:
Date: 2017-03-02 18:45:03 +0100
From: @swingbit
To: MonetDB5 devs <>
Version: 11.25.5 (Dec2016-SP1)
Last updated: 2017-10-26 14:01:32 +0200
Comment 25092
Date: 2017-03-02 18:45:03 +0100
From: @swingbit
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
Build Identifier:
Simple test which (to me) gives unexplainable profiling output.
-- create 500M integers
create table seq as select value from sys.generate_series(cast(1 as int),cast(500000000 as int)) with data;
-- get 10 of them into a second table
create table x as select * from seq sample 10 with data;
-- make sure we get a linear and easy to read plan
alter table seq set read only;
alter table x set read only;
set optimizer='sequential_pipe';
-- measure the time needed to obtain a MAL plan for a simple join (1-2 ms on my machine)
explain select seq.value from seq, x where seq.value=x.value;
-- trace the same query
trace select seq.value from seq, x where seq.value=x.value;
Here's the problem, I cannot explain the timings.
Please find in attachment the outputs for EXPLAIN, TRACE, and STETHOSCOPE.
Reproducible: Always
$ mserver5 --version
MonetDB 5 server v11.25.10 (64-bit, 128-bit integers)
This is an unreleased version
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2017 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 15.6GiB available memory, 8 available cpu cores
Libraries:
libpcre: 8.40 2017-01-11 (compiled with 8.40)
openssl: OpenSSL 1.0.2j 26 Sep 2016 (compiled with OpenSSL 1.0.2j-fips 26 Sep 2016)
libxml2: 2.9.3 (compiled with 2.9.3)
Compiled by: roberto@photon.hq.spinque.com (x86_64-unknown-linux-gnu)
Compilation: gcc -g -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wundef -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wpacked-bitfield-compat -Wsync-nand -Wjump-misses-init -Wmissing-include-dirs -Wlogical-op -Wunreachable-code
Linking : /usr/bin/ld -m elf_x86_64
Comment 25093
Date: 2017-03-02 18:45:33 +0100
From: @swingbit
Created attachment 541
explain output
Comment 25094
Date: 2017-03-02 18:45:52 +0100
From: @swingbit
Created attachment 542
trace output
Comment 25095
Date: 2017-03-02 18:46:15 +0100
From: @swingbit
Created attachment 543
stethoscope output
Comment 25096
Date: 2017-03-02 19:00:05 +0100
From: @swingbit
Correction:
Both tables are sorted, and I verified that a mergejoin is used.
219usec seem too short for this join, but 2.4 seconds are definitly too long.
Comment 25097
Date: 2017-03-02 19:33:00 +0100
From: @swingbit
Second correction: compilation was not optimized.
I tried with optimized compilation, nothing changes.
Comment 25098
Date: 2017-03-02 20:48:58 +0100
From: @sjoerdmullender
You could run the server with the option -d2097152 ( -d$((1<<21)) ) to get algorithm debug output on the server console (standard error). It will print, among other things, the exact time the join operation took. It is calculated at the lowest level, so it won't be any more accurate than that.
When I try your scenario on my machine, I get 491usec in the trace output, and at the same time 475usec on the server console. It also tells me it does a mergejoin_int.
Comment 25099
Date: 2017-03-02 20:55:50 +0100
From: @swingbit
That is good news. The join IS very fast on this large-ish but easy join.
Do you also get more than 2 seconds wall-clock for the entire query though? If so, the question is where the time goes.
Comment 25101
Date: 2017-03-02 21:05:52 +0100
From: @sjoerdmullender
Are you using a debug build? In particular, do you have assertions enabled?
If so, try running the server with -d33554432 ( -d$((1<<25)) ) to disable cleaning of memory.
I profiled the query using callgrind and I saw that a huge amount of time was spent freeing (and writing debug patterns to the to-be-freed memory).
Without callgrind, I got about 2.5 seconds wall clock time for the query. Without callgrind and with -d33554432 it took a staggering 3.6 milliseconds.
Comment 25102
Date: 2017-03-02 21:13:16 +0100
From: @sjoerdmullender
I didn't do my test correctly. The fast time was the time for the explain.
The select was in fact about 22ms.
Comment 25103
Date: 2017-03-02 21:21:42 +0100
From: @sjoerdmullender
It's too late for this.
I now put the SQL statements in a script and run it with
mclient -ims -d$DB script.sql
If I run mserver without options (debug build, with assertions, not optimized), the select (without TRACE keyword, but after the EXPLAIN), takes 2.6 seconds.
If I run mserver with -d33554432 the select takes 3.4 milliseconds.
Comment 25135
Date: 2017-03-03 12:03:34 +0100
From: @swingbit
Sjoerd, I can confirm your timings.
Only compiling with --disable-debug --disable-developer --enable-optimize does not change anything, still 2.3 seconds.
Using debug mask 33554432 makes the whole select execute on my machine in 1.2ms and 1.9ms (88usec for the subjoin) without and with trace, respectively.
Now, these are the timings I'd love to have in production :D
Comment 25136
Date: 2017-03-03 12:07:23 +0100
From: @swingbit
What are the risks of not cleaning the freed memory?
Comment 25137
Date: 2017-03-03 12:47:19 +0100
From: @swingbit
I checked the code of GDKfree().
It clarifies that zeroing the to-be-freed memory is actually a debug measure to simplify detection of double-frees. For production builds, this is not needed.
What perhaps is not very clear is that --disable-developer DOES NOT imply --disable-assert. I got <1ms wall-clock time by using these configure switches, and no debug bitmask:
--disable-debug --disable-developer --disable-assert --enable-optimize
Would't you agree that --disable-developer should imply --disable-assert ?
On a different note, is the profiling information on a developer build to be considered wrong or correct? It sure isn't very informative.
Comment 25613
Date: 2017-09-05 12:37:14 +0200
From: @sjoerdmullender
I've changed the description of the --enable-developer option. It's orthogonal to the other options, but it should be clearer now what it's for.
Can this report be closed now?
Comment 25615
Date: 2017-09-06 09:24:11 +0200
From: @swingbit
It can be closed, for me. Thanks.
Comment 25617
Date: 2017-09-06 09:35:14 +0200
From: @sjoerdmullender
"Fixed" by clarifying the configure option that apparently caused some confusion.
The text was updated successfully, but these errors were encountered: