Date: 2017-05-22 14:52:50 +0200
From: Richard Hughes <<richard.monetdb>>
To: GDK devs <>
Version: 11.23.13 (Jun2016-SP2)
Last updated: 2017-07-17 16:07:45 +0200
Comment 25345
Date: 2017-05-22 14:52:50 +0200
From: Richard Hughes <<richard.monetdb>>
Build is Jun2016 7a344a54d712 (but I believe the issue still exists in the default branch).
I've got an mserver5 instance stuck here:
(gdb) bt
0 0x00007f59ce6b0893 in select () at ../sysdeps/unix/syscall-template.S:81
1 0x00007f59cfd1f5f9 in MT_sleep_ms (ms=ms@entry=4) at gdk_posix.c:1175
2 0x00007f59cfc809d9 in incref (lock=1, logical=0, i=13630)
at gdk_bbp.c:2451
3 BBPincref (i=i@entry=13630, logical=logical@entry=0) at gdk_bbp.c:2536
4 0x00007f59d02c06d3 in BATdescriptor (i=13630) at ../../../gdk/gdk.h:2586
5 CMDbbp (ID=0xf937950, NS=0xf937970, TT=0xf937990, CNT=0xf9379b0,
REFCNT=0xf9379d0, LREFCNT=0xf9379f0, LOCATION=0xf937a10, HEAT=0xf937a30,
DIRTY=0xf937a50, STATUS=0xf937a70, KIND=0xf937a90) at bbp.c:437
6 0x00007f59d022c6e4 in malCommandCall (stk=,
pci=) at mal_interpreter.c:165
7 0x00007f59d022d7ab in runMALsequence (cntxt=0x0, mb=0x1812fe30,
startpc=0, stoppc=-1, stk=0xf9378a0, env=0x0, pcicaller=0x0)
at mal_interpreter.c:670
8 0x00007f59d022ec2b in callMAL (cntxt=0x0, cntxt@entry=0x7f59c940d4d0,
mb=0x0, mb@entry=0x1812fe30, env=0xf937930, argv=0xa0, debug=-128 '\200')
at mal_interpreter.c:436
9 0x00007f59c8ca300c in SQLexecutePrepared (c=0x7f59c940d4d0,
be=be@entry=0x6c46410, q=0x5e163c0) at sql_execute.c:370
10 0x00007f59c8ca34ea in SQLengineIntern (c=0x7f59c940d4d0, be=0x6c46410)
at sql_execute.c:435
11 0x00007f59d0248217 in runPhase (phase=4, c=0x7f59c940d4d0)
at mal_scenario.c:531
12 runScenarioBody (c=c@entry=0x7f59c940d4d0) at mal_scenario.c:575
13 0x00007f59d0248d9d in runScenario (c=c@entry=0x7f59c940d4d0)
at mal_scenario.c:595
14 0x00007f59d02492e0 in MSserveClient (dummy=dummy@entry=0x7f59c940d4d0)
at mal_session.c:457
15 0x00007f59d0249946 in MSscheduleClient (
command=command@entry=0x1fe95db0 "",
challenge=challenge@entry=0x7f535dc1ce80 "BjJUfUrNT", fin=0x8fed910,
fout=fout@entry=0x7f59b81db8c0) at mal_session.c:342
16 0x00007f59d02cab96 in doChallenge (data=) at mal_mapi.c:205
17 0x00007f59cfd1e0af in thread_starter (arg=)
at gdk_system.c:485
18 0x00007f59ce982064 in start_thread (arg=0x7f535dc1d700)
at pthread_create.c:309
19 0x00007f59ce6b762d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) p BBP[0][13630]
$1 = {cache = {0x0, 0x0}, logical = {0x6a59990 "tmp_32476",
0x16a6cc70 "tmpr_32476"}, bak = {0x6a59990 "tmp_32476", 0x0}, next = {
59751, 0}, desc = 0x0, physical = 0x5ece700 "03/24/32476", options = 0x0,
refs = 1, lrefs = 0, lastused = 272413352, status = 2048}
This is the first (and so far only) time this has happened. I don't have logging of what queries led up to the incident.
The only place I can find to set BBPrec::status=2048 is BBPinsert(), so my hypothesis is:
BATnewstorage() calls BATcreatedesc(), which calls BBPinsert().
Then either HEAPalloc() or ATOMheap() fails, so BATnewstorage() returns before calling BBPcacheit().
Similar bailouts seem possible in BATcreatedesc() and VIEWcreate_() (and possibly some other places that I haven't found).
This bug, therefore, is a request that you check your error handling in these locations to ensure that the BBP is left in a stable state upon errors.
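To make the hypothesis concrete, here is a minimal C sketch of the failure pattern I have in mind. It is not MonetDB code: struct slot, SLOT_BUSY, slot_insert(), heap_alloc(), slot_create_buggy(), slot_create_fixed() and slot_would_block() are all hypothetical names made up for illustration, and the value 2048 only mirrors the status field in the dump above. The assumption is that the insert step marks the slot as transient, that a later step clears the mark, and that anyone taking a reference sleeps while the mark is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical status bit: "slot is in a transient state, wait for it". */
#define SLOT_BUSY 2048u

struct slot {
    unsigned status;  /* SLOT_BUSY while the slot is being set up */
    void *heap;       /* storage attached once setup succeeds */
};

/* Reserve the slot and mark it busy (the assumed role of the insert step). */
static void slot_insert(struct slot *s)
{
    assert(s != NULL);
    s->status = SLOT_BUSY;
    s->heap = NULL;
}

/* Stand-in allocator with an injectable failure. */
static void *heap_alloc(size_t n, bool fail)
{
    return fail ? NULL : malloc(n);
}

/* Buggy variant: on allocation failure it returns with SLOT_BUSY still set. */
static bool slot_create_buggy(struct slot *s, bool fail)
{
    slot_insert(s);
    s->heap = heap_alloc(64, fail);
    if (s->heap == NULL)
        return false;          /* early return: slot stays busy forever */
    s->status &= ~SLOT_BUSY;   /* the step that publishes the slot */
    return true;
}

/* Fixed variant: every exit path leaves the slot in a stable state. */
static bool slot_create_fixed(struct slot *s, bool fail)
{
    slot_insert(s);
    s->heap = heap_alloc(64, fail);
    if (s->heap == NULL) {
        s->status = 0;         /* release the reservation before bailing out */
        return false;
    }
    s->status &= ~SLOT_BUSY;
    return true;
}

/* A caller that sleeps and retries while this is true never makes
 * progress if nobody is left to clear the bit. */
static bool slot_would_block(const struct slot *s)
{
    return (s->status & SLOT_BUSY) != 0;
}
```

With an injected allocation failure, the buggy variant leaves slot_would_block() true with no thread left to clear it, which is the kind of state that would keep a sleep-and-retry loop like the one in frame 2 waiting indefinitely. The fixed variant resets the status on the error path.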
There is nothing in merovingian.log (other than normal client connection logging) anywhere near the time the problem started, so I'm not entirely happy with the above hypothesis; if you can come up with a better theory (or find a likely error path which logs nothing) then I'd be grateful. If you'd like any more information out of my core dump then let me know.
Comment 25351
Date: 2017-05-29 10:33:55 +0200
From: @sjoerdmullender
If you still have this in a debugger, can you share the output of
thread apply all bt
Comment 25352
Date: 2017-05-29 10:57:10 +0200
From: @sjoerdmullender
(In reply to Sjoerd Mullender from comment 1)
Belay this request. After studying the situation (and your theory) a bit more, I think you're on to something.
Comment 25354
Date: 2017-05-29 17:12:32 +0200
From: MonetDB Mercurial Repository <>
Changeset e0f18665e346, made by Sjoerd Mullender <sjoerd@acm.org> in the MonetDB repo, refers to this bug.
For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=e0f18665e346
Changeset description: