Deadlock calling sys.bbp() #6323

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Labels
bug Something isn't working GDK Kernel normal

Comments

@monetdb-team

Date: 2017-05-22 14:52:50 +0200
From: Richard Hughes <<richard.monetdb>>
To: GDK devs <>
Version: 11.23.13 (Jun2016-SP2)

Last updated: 2017-07-17 16:07:45 +0200

Comment 25345

Date: 2017-05-22 14:52:50 +0200
From: Richard Hughes <<richard.monetdb>>

Build is Jun2016 7a344a54d712 (but I believe the issue still exists in the default branch).

I've got an mserver5 instance stuck here:

(gdb) bt
0 0x00007f59ce6b0893 in select () at ../sysdeps/unix/syscall-template.S:81
1 0x00007f59cfd1f5f9 in MT_sleep_ms (ms=ms@entry=4) at gdk_posix.c:1175
2 0x00007f59cfc809d9 in incref (lock=1, logical=0, i=13630)
at gdk_bbp.c:2451
3 BBPincref (i=i@entry=13630, logical=logical@entry=0) at gdk_bbp.c:2536
4 0x00007f59d02c06d3 in BATdescriptor (i=13630) at ../../../gdk/gdk.h:2586
5 CMDbbp (ID=0xf937950, NS=0xf937970, TT=0xf937990, CNT=0xf9379b0,
REFCNT=0xf9379d0, LREFCNT=0xf9379f0, LOCATION=0xf937a10, HEAT=0xf937a30,
DIRTY=0xf937a50, STATUS=0xf937a70, KIND=0xf937a90) at bbp.c:437
6 0x00007f59d022c6e4 in malCommandCall (stk=<optimized out>,
pci=<optimized out>) at mal_interpreter.c:165
7 0x00007f59d022d7ab in runMALsequence (cntxt=0x0, mb=0x1812fe30,
startpc=0, stoppc=-1, stk=0xf9378a0, env=0x0, pcicaller=0x0)
at mal_interpreter.c:670
8 0x00007f59d022ec2b in callMAL (cntxt=0x0, cntxt@entry=0x7f59c940d4d0,
mb=0x0, mb@entry=0x1812fe30, env=0xf937930, argv=0xa0, debug=-128 '\200')
at mal_interpreter.c:436
9 0x00007f59c8ca300c in SQLexecutePrepared (c=0x7f59c940d4d0,
be=be@entry=0x6c46410, q=0x5e163c0) at sql_execute.c:370
10 0x00007f59c8ca34ea in SQLengineIntern (c=0x7f59c940d4d0, be=0x6c46410)
at sql_execute.c:435
11 0x00007f59d0248217 in runPhase (phase=4, c=0x7f59c940d4d0)
at mal_scenario.c:531
12 runScenarioBody (c=c@entry=0x7f59c940d4d0) at mal_scenario.c:575
13 0x00007f59d0248d9d in runScenario (c=c@entry=0x7f59c940d4d0)
at mal_scenario.c:595
14 0x00007f59d02492e0 in MSserveClient (dummy=dummy@entry=0x7f59c940d4d0)
at mal_session.c:457
15 0x00007f59d0249946 in MSscheduleClient (
command=command@entry=0x1fe95db0 "",
challenge=challenge@entry=0x7f535dc1ce80 "BjJUfUrNT", fin=0x8fed910,
fout=fout@entry=0x7f59b81db8c0) at mal_session.c:342
16 0x00007f59d02cab96 in doChallenge (data=<optimized out>) at mal_mapi.c:205
17 0x00007f59cfd1e0af in thread_starter (arg=<optimized out>)
at gdk_system.c:485
18 0x00007f59ce982064 in start_thread (arg=0x7f535dc1d700)
at pthread_create.c:309
19 0x00007f59ce6b762d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) p BBP[0][13630]
$1 = {cache = {0x0, 0x0}, logical = {0x6a59990 "tmp_32476",
0x16a6cc70 "tmpr_32476"}, bak = {0x6a59990 "tmp_32476", 0x0}, next = {
59751, 0}, desc = 0x0, physical = 0x5ece700 "03/24/32476", options = 0x0,
refs = 1, lrefs = 0, lastused = 272413352, status = 2048}

This is the first (and so far only) time this has happened. I don't have logging of what queries led up to the incident.

The only place I can find to set BBPrec::status=2048 is BBPinsert(), so my hypothesis is:

BATnewstorage()
calls BATcreatedesc()
calls BBPinsert()
either HEAPalloc() or ATOMheap() fails, so BATnewstorage() returns before calling BBPcacheit().
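The hypothesised sequence can be modelled with a small, self-contained sketch. Everything in it is a hypothetical stand-in rather than GDK code: `struct slot`, `slot_insert()`, `create_buggy()` and `slot_incref()` are invented names, and the assumption that the waiter spins while the 2048 bit is set is taken from the hypothesis above, not from the GDK sources. The wait is bounded only so the example terminates; the backtrace suggests the real `incref()` keeps retrying with 4 ms sleeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status flags; 2048 matches the value seen in the gdb dump. */
enum { SLOT_FREE = 0, SLOT_READY = 1, SLOT_UNSTABLE = 2048 };

/* Toy model of one BBP record (assumption: not the real BBPrec layout). */
struct slot {
    unsigned status; /* transient or ready flag */
    int refs;        /* reference count */
    void *desc;      /* descriptor, NULL until creation finishes */
};

/* Reserve a slot and mark it as not yet usable. */
static void slot_insert(struct slot *s)
{
    s->status = SLOT_UNSTABLE;
    s->refs = 1;
    s->desc = NULL;
}

/* Buggy creation path: an allocation failure returns early and
 * leaves the slot marked unstable. */
static bool create_buggy(struct slot *s, bool alloc_ok)
{
    static int dummy_desc; /* placeholder for a real descriptor */

    slot_insert(s);
    if (!alloc_ok)
        return false;      /* slot is never reset */
    s->desc = &dummy_desc;
    s->status = SLOT_READY;
    return true;
}

/* Take a reference, waiting while the slot is unstable. Gives up after
 * max_spins checks; an unbounded loop here would hang forever. */
static bool slot_incref(struct slot *s, int max_spins)
{
    assert(max_spins >= 0);
    for (int spin = 0; (s->status & SLOT_UNSTABLE) != 0; spin++) {
        if (spin >= max_spins)
            return false;  /* nobody will ever clear the flag */
    }
    s->refs++;
    return true;
}
```

After a failed `create_buggy()` the toy slot has `desc` NULL, `refs` 1 and `status` 2048, the same field values as the dump of `BBP[0][13630]`, and `slot_incref()` can never succeed on it.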

Similar bailouts seem possible in BATcreatedesc() and VIEWcreate_() (and possibly some other places that I haven't found).

This bug, therefore, is a request that you check your error handling in these locations to ensure that the BBP is left in a stable state upon errors.

There is nothing in merovingian.log (other than normal client connection logging) anywhere near the time the problem started, so I'm not entirely happy with the above hypothesis; if you can come up with a better theory (or find a likely error path which logs nothing) then I'd be grateful. If you'd like any more information out of my core dump then let me know.

Comment 25351

Date: 2017-05-29 10:33:55 +0200
From: @sjoerdmullender

If you still have this in a debugger, can you share the output of

thread apply all bt

Comment 25352

Date: 2017-05-29 10:57:10 +0200
From: @sjoerdmullender

(In reply to Sjoerd Mullender from comment 1)

> If you still have this in a debugger, can you share the output of
>
> thread apply all bt

Belay this request. After studying the situation (and your theory) a bit more, I think you're on to something.

Comment 25354

Date: 2017-05-29 17:12:32 +0200
From: MonetDB Mercurial Repository <>

Changeset e0f18665e346 made by Sjoerd Mullender <sjoerd@acm.org> in the MonetDB repo, refers to this bug.

For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=e0f18665e346

Changeset description:

When failing BAT creation after a successful BBPinsert(), we need to BBPclear().
This fixes bug #6323.
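
The shape of that fix, undoing the registration on every failure path that follows a successful insert, can be sketched as below. This is an illustration under stated assumptions and not the changeset itself: `struct entry`, `entry_insert()`, `entry_clear()`, `entry_create()` and `failing_alloc()` are hypothetical names, with `entry_clear()` standing in for the role the changeset description gives to BBPclear().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status flags for a toy registry entry. */
enum { ENTRY_FREE = 0, ENTRY_READY = 1, ENTRY_UNSTABLE = 2048 };

struct entry {
    unsigned status; /* ENTRY_* flag */
    int refs;        /* reference count */
    void *heap;      /* storage, NULL until allocated */
};

/* Register the entry and mark it as under construction. */
static void entry_insert(struct entry *e)
{
    e->status = ENTRY_UNSTABLE;
    e->refs = 1;
    e->heap = NULL;
}

/* Return the entry to a clean, reusable state. */
static void entry_clear(struct entry *e)
{
    e->status = ENTRY_FREE;
    e->refs = 0;
    e->heap = NULL;
}

/* Create an entry; if allocation fails after the insert, clear the
 * entry before returning so no waiter sees a permanently unstable slot. */
static bool entry_create(struct entry *e, size_t nbytes,
                         void *(*alloc)(size_t))
{
    assert(e != NULL && alloc != NULL);
    entry_insert(e);
    void *heap = alloc(nbytes);
    if (heap == NULL) {
        entry_clear(e); /* the step the buggy path skipped */
        return false;
    }
    e->heap = heap;
    e->status = ENTRY_READY;
    return true;
}

/* Allocator that always fails, for exercising the error path. */
static void *failing_alloc(size_t nbytes)
{
    (void)nbytes;
    return NULL;
}
```

Passing `failing_alloc` to `entry_create()` exercises the error path: the call returns false and the entry ends up free again instead of stuck in the unstable state.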
@monetdb-team monetdb-team added bug Something isn't working GDK Kernel normal labels Nov 30, 2020
@sjoerdmullender sjoerdmullender added this to the Ancient Release milestone Feb 7, 2022