We can't exploit hash links being in reverse order for existing hash tables.
We tried to exploit the supposed fact that links in the collision
lists of our hash tables were in reverse order of the BUN number. It
seems this assumption was not correct, at least not in all cases.
This should fix bug #3237.
Date: 2013-02-21 13:25:24 +0100
From: @swingbit
To: GDK devs <>
Version: 11.15.1 (Feb2013)
CC: @drstmane
Last updated: 2013-03-07 12:41:23 +0100
Comment 18534
Date: 2013-02-21 13:25:24 +0100
From: @swingbit
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17
Build Identifier:
Unfortunately it is not easy to make this issue reproducible.
I got the issue on a query that looks like this:
CREATE TABLE "_attributesString" (
"subject" INTEGER,
"attribute" CHARACTER LARGE OBJECT,
"value" CHARACTER LARGE OBJECT,
"prob" DOUBLE DEFAULT 1.0
);
SELECT subject, attribute, value, MAX(prob) as prob FROM "_attributesString" GROUP BY subject, attribute, value;
However, the failure is data-dependent and the data that triggers the failure is produced inside an iterative process.
I got two types of error, depending on the data at hand:
An assertion: gdk/gdk_group.c:482: BATgroup_internal: Assertion `hs->link[hb] == ((BUN) 9223372036854775807LL) || hs->link[hb] < hb' failed.
And a SEGFAULT (I am relatively sure this happens when evaluating the same query):
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f9fbc3ff700 (LWP 17502)]
0x00007f9fc74cf8ca in runMALsequence (cntxt=0xf86630, mb=0x7f9f941c24e0, startpc=1, stoppc=49, stk=0x7f9f94e5dc50, env=0x7f9f94e198f0, pcicaller=0x7f9f94debfa0)
at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/mal/mal_interpreter.c:801
801 if (isaBatType(getArgType(mb, pci, i))) {
(gdb) bt
0 0x00007f9fc74cf8ca in runMALsequence (cntxt=0xf86630, mb=0x7f9f941c24e0, startpc=1, stoppc=49, stk=0x7f9f94e5dc50, env=0x7f9f94e198f0, pcicaller=0x7f9f94debfa0)
at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/mal/mal_interpreter.c:801
1 0x00007f9fc74cf2b3 in runMALsequence (cntxt=0xf86630, mb=0x7f9f941b4860, startpc=1, stoppc=0, stk=0x7f9f94e198f0, env=0x0, pcicaller=0x0)
at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/mal/mal_interpreter.c:720
2 0x00007f9fc74ce335 in callMAL (cntxt=0xf86630, mb=0x7f9f941b4860, env=0x7f9fbc3feba0, argv=0x7f9f94c96ee0, debug=0 '\000')
at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/mal/mal_interpreter.c:469
3 0x00007f9fbf282f56 in SQLexecutePrepared (c=0xf86630, be=0x7f9f94164740, q=0x7f9f9415c600) at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/sql/backends/monet5/sql_scenario.c:1840
4 0x00007f9fbf283345 in SQLengineIntern (c=0xf86630, be=0x7f9f94164740) at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/sql/backends/monet5/sql_scenario.c:1907
5 0x00007f9fbf2838ba in SQLengine (c=0xf86630) at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/sql/backends/monet5/sql_scenario.c:2008
6 0x00007f9fc74fba95 in runPhase (c=0xf86630, phase=4) at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/mal/mal_scenario.c:522
7 0x00007f9fc74fbc82 in runScenarioBody (c=0xf86630) at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/mal/mal_scenario.c:567
8 0x00007f9fc74fbdb4 in runScenario (c=0xf86630) at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/mal/mal_scenario.c:586
9 0x00007f9fc74fce4a in MSserveClient (dummy=0xf86630) at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/mal/mal_session.c:431
10 0x0000003599007761 in start_thread () from /lib64/libpthread.so.0
11 0x0000003598ce098d in clone () from /lib64/libc.so.6
Reproducible: Sometimes
MonetDB 5 server v11.15.2 (64-bit, 64-bit oids)
This is an unreleased version
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2013 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 35.5GiB available memory, 8 available cpu cores
Libraries:
libpcre: 7.8 2008-09-05 (compiled with 7.8)
openssl: OpenSSL 1.0.0d 8 Feb 2011 (compiled with OpenSSL 1.0.0d-fips 8 Feb 2011)
libxml2: 2.7.7 (compiled with 2.7.7)
Compiled by: roberto@spinque01.ins.cwi.nl (x86_64-unknown-linux-gnu)
Compilation: gcc -g -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wpacked-bitfield-compat -Wsync-nand -Wmissing-include-dirs
Linking : /usr/bin/ld -m elf_x86_64
Comment 18535
Date: 2013-02-21 14:13:17 +0100
From: @sjoerdmullender
This is the assertion that our hashes link backwards. Apparently they don't always.
Roberto, when this happens in the debugger, can you print both hb and hs->link[hb]?
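
To make the assumption behind this assertion concrete, here is a minimal sketch of a chained hash whose collision lists are built by prepending. It is not the GDK code: `pos_t`, `POS_NONE`, `chain_hash` and the helper functions are hypothetical stand-ins. When positions are inserted in increasing order, every link points to a lower position; inserting out of order breaks that property, which is the kind of state the assertion in gdk_group.c rejects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the real GDK types or macros. */
typedef size_t pos_t;
#define POS_NONE ((pos_t) SIZE_MAX)   /* end-of-chain marker */
#define NBUCKETS 4
#define NVALUES  8

typedef struct {
    pos_t bucket[NBUCKETS];   /* head position of each collision chain */
    pos_t link[NVALUES];      /* link[p]: next position in p's chain */
} chain_hash;

/* Mark all buckets and links as empty. */
void chain_hash_init(chain_hash *h)
{
    for (size_t i = 0; i < NBUCKETS; i++)
        h->bucket[i] = POS_NONE;
    for (size_t i = 0; i < NVALUES; i++)
        h->link[i] = POS_NONE;
}

/* Prepend position p to the chain of the bucket its value hashes to. */
void chain_hash_insert(chain_hash *h, const unsigned *vals, pos_t p)
{
    size_t b = vals[p] % NBUCKETS;
    h->link[p] = h->bucket[b];
    h->bucket[b] = p;
}

/* Return 1 if every link points to a strictly lower position, else 0.
 * This mirrors the shape of the failing assertion:
 *   link[p] == NONE || link[p] < p */
int chain_hash_links_backwards(const chain_hash *h)
{
    for (pos_t p = 0; p < NVALUES; p++)
        if (h->link[p] != POS_NONE && h->link[p] >= p)
            return 0;
    return 1;
}
```

Calling `chain_hash_insert` for positions 0 through 7 in order leaves the property intact; inserting position 4 before position 2 in the same bucket makes `link[2] == 4`, and the check returns 0.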
Comment 18536
Date: 2013-02-21 14:37:54 +0100
From: @swingbit
(gdb) p hb
$1 = 140
(gdb) p hs->link[hb]
$2 = 282
(gdb)
A bit more context:
0 0x0000003598c328f5 in raise () from /lib64/libc.so.6
1 0x0000003598c340d5 in abort () from /lib64/libc.so.6
2 0x0000003598c2b8b5 in __assert_fail () from /lib64/libc.so.6
3 0x00007f1bb3421368 in BATgroup_internal (groups=0x7f1ba8481448, extents=0x7f1ba8481440, histo=0x7f1ba8481438, b=0x7f1b5c3c7610, g=0x7f1b5cac82b0, e=0x0, h=0x0, subsorted=0)
at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/gdk/gdk_group.c:481
4 0x00007f1bb3427762 in BATgroup (groups=0x7f1ba8481448, extents=0x7f1ba8481440, histo=0x7f1ba8481438, b=0x7f1b5c3c7610, g=0x7f1b5cac82b0, e=0x0, h=0x0)
at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/gdk/gdk_group.c:734
5 0x00007f1bb3baad52 in GRPsubgroup4 (ngid=0x7f1b5c5945f0, next=0x7f1b5c594600, nhis=0x7f1b5c594610, bid=0x7f1b5c5944f0, gid=0x7f1b5c5945c0, eid=0x0, hid=0x0)
at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/modules/kernel/group.mx:565
6 0x00007f1bb3baae93 in GRPsubgroup2 (ngid=0x7f1b5c5945f0, next=0x7f1b5c594600, nhis=0x7f1b5c594610, bid=0x7f1b5c5944f0, gid=0x7f1b5c5945c0)
at /opt/spinque/MonetDBServer/MonetDB.Spinque_Feb2013/src/monetdb5/modules/kernel/group.mx:586
Comment 18537
Date: 2013-02-21 14:56:05 +0100
From: @sjoerdmullender
It would be interesting to know how the bat (the first argument to the MAL function) got its hash table. Do you know?
Is it possible that the hash table was updated after it was created?
The simple solution would be to not depend on the supposed fact that the linked list in the hash table always points towards lower BUNs. But it would be interesting to know where in the code that supposition is broken.
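
As an illustration of what "not depending on link order" could look like, the sketch below walks a collision chain from the bucket head and filters by position explicitly, so it still works when a link points forward (as with hb = 140 and hs->link[hb] = 282 reported above). This is only a sketch under assumptions, not the actual change made in gdk_group.c: `pos_t`, `POS_NONE`, `chain_hash` and `find_earlier_equal` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the real GDK types or macros. */
typedef size_t pos_t;
#define POS_NONE ((pos_t) SIZE_MAX)   /* end-of-chain marker */
#define NBUCKETS 4
#define NVALUES  8

typedef struct {
    pos_t bucket[NBUCKETS];   /* head position of each collision chain */
    pos_t link[NVALUES];      /* link[p]: next position in p's chain */
} chain_hash;

/* Mark all buckets and links as empty. */
void chain_hash_init(chain_hash *h)
{
    for (size_t i = 0; i < NBUCKETS; i++)
        h->bucket[i] = POS_NONE;
    for (size_t i = 0; i < NVALUES; i++)
        h->link[i] = POS_NONE;
}

/* Prepend position p to the chain of the bucket its value hashes to. */
void chain_hash_insert(chain_hash *h, const unsigned *vals, pos_t p)
{
    size_t b = vals[p] % NBUCKETS;
    h->link[p] = h->bucket[b];
    h->bucket[b] = p;
}

/* Find the lowest position q < p holding the same value as p.
 * The whole chain is scanned from the bucket head and each entry is
 * compared against p, so no ordering of the links is assumed.
 * Returns POS_NONE if p is the first occurrence of its value. */
pos_t find_earlier_equal(const chain_hash *h, const unsigned *vals, pos_t p)
{
    pos_t best = POS_NONE;
    for (pos_t q = h->bucket[vals[p] % NBUCKETS]; q != POS_NONE; q = h->link[q]) {
        if (q < p && vals[q] == vals[p] && (best == POS_NONE || q < best))
            best = q;
    }
    return best;
}
```

The trade-off is that the full chain is always visited, whereas a walk that trusts backward links can start at `link[p]` and skip the explicit `q < p` test.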
Comment 18538
Date: 2013-02-21 15:06:02 +0100
From: @swingbit
Unfortunately I have difficulties getting the MAL plan of the failing transaction.
Comment 18541
Date: 2013-02-22 11:06:56 +0100
From: @sjoerdmullender
Changeset 88acc1bec9c7 made by Sjoerd Mullender <sjoerd@acm.org> in the MonetDB repo, refers to this bug.
For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=88acc1bec9c7
Changeset description:
Comment 18542
Date: 2013-02-22 11:07:20 +0100
From: @sjoerdmullender
Roberto, can you test please.
Comment 18543
Date: 2013-02-22 13:35:41 +0100
From: @swingbit
The problem seems indeed solved, thanks!
Comment 18545
Date: 2013-02-22 15:31:59 +0100
From: @swingbit
The SEGFAULT mentioned in the initial bug report is apparently not (directly) related to the bug fix, as it still occurs (changing the bug title accordingly).
Comment 18546
Date: 2013-02-22 15:49:43 +0100
From: @sjoerdmullender
Can you then submit a fresh bug report for the segfault?
Comment 18549
Date: 2013-02-23 00:06:32 +0100
From: @drstmane
Does that segfault also occur with assertions enabled?
Comment 18550
Date: 2013-02-25 10:07:44 +0100
From: @swingbit
Stefan, I will try that.
What I found so far is that the SEGFAULT does not seem to be related to grouping at all.
I have now reduced it to a simple selection on a view with a user function that does string processing (pcre). But again, it seems data-dependent, so I will file a bug report as soon as I can make it somewhat reproducible (unfortunately I am not allowed to share the whole data as they are).
Comment 18551
Date: 2013-02-25 10:17:53 +0100
From: @swingbit
Stefan,
Now that I think about it, I am compiling as developer, so assertions are enabled:
--enable-strict --enable-assert --enable-debug --disable-optimize
Comment 18593
Date: 2013-03-07 12:41:23 +0100
From: @sjoerdmullender
Feb2013-SP1 has been released.