User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:15.0) Gecko/20100101 Firefox/15.0.1
Build Identifier:
When a large number of clients perform SELECT operations concurrently, mserver5 SIGSEGVs in the MAL namespace allocation/counter function putName() in mal_namespace.c. The for(l= nme[0]; l && namespace.nme[l]; l= namespace.link[l]) {} loop needs thread isolation, but the return clause has to be brought out of the loop.
Reproducible: Always
Steps to Reproduce:
create a db named crashdb, release it and start it
using the following python script, load 5,000,000 rows into the database
import random
import sys
import time
import string

def main():
    # Python 2 script: writes pipe-delimited rows for COPY INTO to stdout
    C = string.letters + string.digits
    random.seed(42)
    for i in range(long(sys.argv[1])):
        row = [
            str(random.randint(0,2**31)),
            str(random.randint(0,2**31)),
            str(random.randint(0,2**31)),
            '"' + "".join([random.choice(C) for _ in range(random.randint(1,100))]) + '"',
            '"' + "".join([random.choice(C) for _ in range(random.randint(1,128))]) + '"',
            '"' + "".join([random.choice(C) for _ in range(random.randint(1,8000))]) + '"',
            time.strftime("%Y-%m-%d", time.localtime(random.randint(946710000, 1325401200))),
            time.strftime("%Y-%m-%d", time.localtime(random.randint(946710000, 1325401200))),
            time.strftime("%Y-%m-%d", time.localtime(random.randint(946710000, 1325401200)))
        ]
        print "|".join(row)

if __name__ == "__main__":
    main()
and load them into the database with
python gen_bigdata 5000000 | mclient crashdb -s "COPY INTO big_data FROM STDIN USING DELIMITERS '|','\n','\"' NULL AS ''" -
create a file containing 10,000 random queries with the following python script:
import random
import sys
import monetdb.sql

def main():
    # Python 2 script: samples (B, E) pairs and prints one SELECT per line
    random.seed(42)
    connection = monetdb.sql.connect(username="analysis", password="analysis", hostname="localhost", database="crashdb")
    cursor = connection.cursor()
    cursor.arraysize = 10000
    cursor.execute('SELECT B, E FROM big_data LIMIT ' + sys.argv[1])
    data = []
    for row in cursor.fetchall():
        data.append(row)
    for i in range(1, long(sys.argv[1])):
        r = data[random.randint(0,len(data)-1)]
        print "SELECT A, D FROM big_data WHERE B=" + str(r[0]) + " AND E ='" + str(r[1]) + "';"

if __name__ == "__main__":
    main()
Save the output as big_data_queries.sql, then run it from 40 concurrent clients:
for i in $(seq 1 40); do cat big_data_queries.sql | mclient -d crashdb > /dev/null & done
[need username and password in ~/.monetdb]
After a few seconds the server will crash, every time.
This test needs a server with at least 64 GB of RAM and a large number of cores.
Actual Results:
crash as reported in merovingian.log
2012-10-13 08:47:42 MSG merovingian[59959]: database 'crashdb' (59996) was killed by signal SIGSEGV
crash as reported in syslog
Oct 13 08:47:40 tut kernel: [11059659.466492] mserver5[32149]: segfault at 53281 ip 00007fe0567d75e0 sp 00007fe03b7fa828 error 4 in libc-2.13.so[7fe056753000+197000]
Expected Results:
server should not be crashing
The trouble arises in putName(). If you run mserver5 under gdb, after compiling with --enable-debug, you will see the server crash at line 235 of mal_namespace.c:
for(l= nme[0]; l && namespace.nme[l]; l= namespace.link[l]){
Several concurrent threads are trying to define a new namespace and stomp on each others' toes, while doing that.
The for(l= nme[0]; l && namespace.nme[l]; l= namespace.link[l]){} loop needs thread isolation, but the early termination return clause has to be taken out of the for loop, to avoid guaranteed deadlocks. The following patch does that and solves the problem, under any magnitude of concurrency:
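The locking shape described above can be sketched as follows. This is only an illustration under assumptions, not the attached patch: it uses a plain pthread mutex and a simplified table, and the names intern_name, ns, and ns_lock are hypothetical. The point is that the lookup loop uses break instead of return, so the function has exactly one unlock on every path.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define BUCKETS   256
#define MAX_NAMES 4096

/* Hypothetical stand-in for the MAL namespace: names chained per bucket.
 * Slot 0 is reserved and means "end of chain". */
static struct {
    char  *nme[MAX_NAMES];
    size_t len[MAX_NAMES];
    int    link[MAX_NAMES];
    int    head[BUCKETS];
    int    used;
} ns = { .used = 1 };

static pthread_mutex_t ns_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return the shared copy of nme[0..len), adding it if it is not yet known.
 * Returns NULL on bad input, a full table, or allocation failure. */
const char *intern_name(const char *nme, size_t len)
{
    const char *result = NULL;
    int l, b;

    if (nme == NULL || len == 0)
        return NULL;                  /* lock not taken yet, safe to return */
    b = (unsigned char) nme[0];

    pthread_mutex_lock(&ns_lock);
    for (l = ns.head[b]; l; l = ns.link[l]) {
        if (ns.len[l] == len && strncmp(ns.nme[l], nme, len) == 0) {
            result = ns.nme[l];
            break;                    /* leave the loop; do NOT return while locked */
        }
    }
    if (result == NULL && ns.used < MAX_NAMES) {
        char *copy = malloc(len + 1);
        if (copy != NULL) {
            memcpy(copy, nme, len);
            copy[len] = '\0';
            l = ns.used++;
            ns.nme[l] = copy;
            ns.len[l] = len;
            ns.link[l] = ns.head[b];
            ns.head[b] = l;
            result = copy;
        }
    }
    pthread_mutex_unlock(&ns_lock);   /* the single exit releases the lock */
    return result;
}
```

A return inside the locked loop would leave ns_lock held, and the next caller would block forever, which is the deadlock mentioned above.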
Forgot to say, after creating the database, create a user called 'analysis', with password 'analysis'.
CREATE USER "analysis" WITH PASSWORD 'analysis' NAME 'Analysis Explorer' SCHEMA "sys";
CREATE SCHEMA "analysis" AUTHORIZATION "analysis";
ALTER USER "analysis" SET SCHEMA "analysis";
After that, logged in as 'analysis', create the following table, before loading the data:
====================================
-- crash test
CREATE TABLE "big_data" (
A INT,
B INT,
C INT,
D VARCHAR(100),
E VARCHAR(128),
F VARCHAR(8000),
G DATE,
H DATE,
I DATE
);
The patch for 11.13.3 in my previous comment (and also the original patch for 11.11.11 in the original bug report) pays a significant performance price. The following patch (for 11.13.3) achieves the same result (i.e. preventing the SIGSEGV) without any noticeable performance impact.
Martin, did your recent changes include a fix for this problem? We cannot build Oct2012-SP1 if this isn't fixed/committed.
Comment 18133
Date: 2012-11-27 15:50:41 +0100
From: @mlkersten
The concurrency conflict has been addressed in the namespace in Oct branch.
Running on my desktop and a small version of the database (5000),
which is enough to create the load and 100 concurrent users
(of which 64 are accepted) does not crash the server.
However, if you run the script with a naively large sequence (eg. 1000)
you will encounter bash/OS fork/resource limitations.
Comment 18136
Date: 2012-11-27 15:54:18 +0100
From: @mlkersten
Changeset 24c408dcf765 made by Martin Kersten mk@cwi.nl in the MonetDB repo, refers to this bug.
Concurrency on namespace
This patch addresses the bug #3163
The concurrency conflict has been addressed in the namespace in Oct branch.
Running on my desktop and a small version of the database (5000),
which is enough to create the load and 100 concurrent users
(of which 64 are accepted) does not crash the server.
However, if you run the script with a naively large sequence (eg. 1000)
you will encounter bash/OS fork/resource limitations.
Comment 18138
Date: 2012-11-27 15:58:10 +0100
From: @mlkersten
Downscale severity until more counterproofs of instability are reported.
I think it crashes only with -O3 or -O4 in CFLAGS. I had to modify the configure script to allow -g and -O4 together, and this is the backtrace:
0 __strncmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:1112
1 0x00007f0a3d837f55 in putName (nme=0x7f0a3dda023a "sunique", len=7) at mal_namespace.c:239
2 0x00007f0a3dcc3bb6 in ESevaluate (empty=0x7f08a0a0dd70 "", mb=0x7f08a0965470, cntxt=) at opt_emptySet.c:55
3 OPTemptySetImplementation (cntxt=0x7f0a3800b4b8, mb=0x7f08a0965470, stk=, p=) at opt_emptySet.c:264
4 0x00007f0a3dce530f in OPTwrapper (cntxt=0x7f0a3800b4b8, mb=0x7f08a0965470, stk=0x0, p=) at opt_wrapper.c:171
5 0x00007f0a3dce08bb in optimizeMALBlock (cntxt=0x7f0a3800b4b8, mb=0x7f08a0965470) at opt_support.c:290
6 0x00007f0a366cfcf8 in addQueryToCache (c=) at sql_optimizer.c:521
7 0x00007f0a366cf446 in backend_dumpproc (be=0x7f0a2c6eb180, c=0x7f0a3800b4b8, cq=0x7f08a09997e0, s=0x7f08a0a1c800) at sql_gencode.c:2355
8 0x00007f0a366c7e8e in SQLparser (c=0x7f0a3800b4b8) at sql_scenario.c:1601
9 0x00007f0a3d84d1e4 in runPhase (phase=1, c=0x7f0a3800b4b8) at mal_scenario.c:522
10 runScenarioBody (c=0x7f0a3800b4b8) at mal_scenario.c:564
11 0x00007f0a3d84e36f in runScenario (c=0x7f0a3800b4b8) at mal_scenario.c:601
12 0x00007f0a3d84e410 in MSserveClient (dummy=0x7f0a3800b4b8) at mal_session.c:430
13 0x00007f0a3cdb8efc in start_thread (arg=0x7f09f64ba700) at pthread_create.c:304
14 0x00007f0a3caf359d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
15 0x0000000000000000 in ?? ()
[Thread debugging using libthread_db enabled]
Core was generated by `/usr/local/pkg/MonetDB-11.13.5-debug/bin/mserver5 --set gdk_dbfarm /data1/monet'.
Program terminated with signal 11, Segmentation fault.
0 __strncmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:214
214 ../sysdeps/x86_64/multiarch/../strcmp.S: No such file or directory.
in ../sysdeps/x86_64/multiarch/../strcmp.S
(gdb) bt
0 __strncmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:214
1 0x00007fe9a1c152f8 in putName (nme=0x7fe99a938c9b "stdout", len=6) at mal_namespace.c:239
2 0x00007fe9a1bf6ee6 in newStmt (mb=0x7fe851753e00, module=0x7fe99a93a58a "io", name=0x7fe99a938c9b "stdout") at mal_builder.c:59
3 0x00007fe99a850a0e in _dumpstmt (sql=, mb=0x7fe851753e00, s=0x7fe8517848c0) at sql_gencode.c:2016
4 0x00007fe99a851892 in _dumpstmt (s=, mb=, sql=) at sql_gencode.c:707
5 backend_dumpstmt (be=0x7fe988d89f60, mb=0x7fe851753e00, s=0x7fe8517848c0) at sql_gencode.c:2206
6 0x00007fe99a8521b6 in backend_dumpproc (be=0x7fe988d89f60, c=0x7fe99c18ed78, cq=0x7fe851751a90, s=0x7fe8517848c0) at sql_gencode.c:2330
7 0x00007fe99a84adf8 in SQLparser (c=0x7fe99c18ed78) at sql_scenario.c:1601
8 0x00007fe9a1c2c1e6 in runPhase (phase=1, c=0x7fe99a938a63) at mal_scenario.c:522
9 runScenarioBody (c=0x7fe99a938a63) at mal_scenario.c:564
10 0x00007fe9a1c2d325 in runScenario (c=0x7fe99c18ed78) at mal_scenario.c:601
11 0x00007fe9a1c2d3e0 in MSserveClient (dummy=0x7fe99c18ed78) at mal_session.c:430
12 0x00007fe99f69cefc in start_thread (arg=0x7fe959676700) at pthread_create.c:304
13 0x00007fe99f3d759d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
14 0x0000000000000000 in ?? ()
This is compiled with -O -g using gcc 4.7.2. As the thread listing below shows, two threads were inside __strncmp_sse2() at the same time: threads 31 and 1. I think this is what causes the SIGSEGV.
(gdb) info threads
Id Target Id Frame
44 Thread 0x7f93cb20c700 (LWP 45084) 0x00007f93d096c613 in select () at ../sysdeps/unix/syscall-template.S:82
43 Thread 0x7f93cdb85700 (LWP 45083) 0x00007f93d096c613 in select () at ../sysdeps/unix/syscall-template.S:82
42 Thread 0x7f93c91fc700 (LWP 45181) __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
41 Thread 0x7f93c93fd700 (LWP 45179) 0x00007f93d2c29028 in BATsample (b=0x7f92357de0f8, n=128) at gdk_sample.c:83
40 Thread 0x7f93c95fe700 (LWP 45168) 0x00007f93d3136540 in putName (nme=0x1620f90 "str", len=3) at mal_namespace.c:234
39 Thread 0x7f93c97ff700 (LWP 45158) 0x00007f93d2c29030 in BATsample (b=0x7f924de07478, n=128) at gdk_sample.c:79
38 Thread 0x7f93cb00b700 (LWP 45085) 0x00007f93d096c613 in select () at ../sysdeps/unix/syscall-template.S:82
37 Thread 0x7f93d3b26740 (LWP 45082) 0x00007f93d096c613 in select () at ../sysdeps/unix/syscall-template.S:82
36 Thread 0x7f93c9e02700 (LWP 45144) BATkdiff (l=0x807ee0, r=0x15c7300) at gdk_setop.mx:860
35 Thread 0x7f93c9c01700 (LWP 45152) 0x00007f93d2c28fe1 in BATsample (b=0x7f9252ba5688, n=128) at gdk_sample.c:83
34 Thread 0x7f93ca003700 (LWP 45141) 0x00007f93d2c29028 in BATsample (b=0x7f92357a8108, n=128) at gdk_sample.c:83
33 Thread 0x7f93ca204700 (LWP 45138) GDKfree (blk=0x7f92357dd640) at gdk_utils.c:887
32 Thread 0x7f93ca405700 (LWP 45135) 0x00007f93d2c29000 in BATsample (b=0x7f9252b99aa8, n=128) at gdk_sample.c:83
31 Thread 0x7f93ca606700 (LWP 45129) __strncmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:215
30 Thread 0x7f93ca807700 (LWP 45127) 0x00007f93d3136544 in putName (nme=0x1620f90 "str", len=3) at mal_namespace.c:234
29 Thread 0x7f93caa08700 (LWP 45125) 0x00007f93d2c29028 in BATsample (b=0x7f9239a6a9b8, n=128) at gdk_sample.c:83
28 Thread 0x7f93cac09700 (LWP 45120) 0x00007f93d31245d8 in setLifespan (mb=0x7f923dc39d40) at mal_function.c:704
27 Thread 0x7f93cae0a700 (LWP 45115) BATsample (b=0x7f924a0571c8, n=128) at gdk_sample.c:79
26 Thread 0x7f9388cad700 (LWP 45214) 0x00007f93d2c29028 in BATsample (b=0x7f925ba26cd8, n=128) at gdk_sample.c:83
25 Thread 0x7f9388eae700 (LWP 45213) 0x00007f93d3136540 in putName (nme=0x1620f90 "str", len=3) at mal_namespace.c:234
24 Thread 0x7f93bbdfe700 (LWP 45198) __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
23 Thread 0x7f93890af700 (LWP 45212) 0x00007f93d2c29028 in BATsample (b=0x7f924a057ec8, n=128) at gdk_sample.c:83
22 Thread 0x7f93892b0700 (LWP 45211) 0x00007f93d2c29030 in BATsample (b=0x7f9239a78958, n=128) at gdk_sample.c:79
21 Thread 0x7f93894b1700 (LWP 45210) 0x00007f93d2c29028 in BATsample (b=0x7f925ec105a8, n=128) at gdk_sample.c:83
20 Thread 0x7f93896b2700 (LWP 45209) 0x00007f93d2c2900a in BATsample (b=0x7f9231752df8, n=128) at gdk_sample.c:83
19 Thread 0x7f939eb13700 (LWP 45208) BATsample (b=0x7f925ba235a8, n=128) at gdk_sample.c:79
18 Thread 0x7f93babf5700 (LWP 45207) 0x00007f93d2c29028 in BATsample (b=0x7f922972e7f8, n=128) at gdk_sample.c:83
17 Thread 0x7f93baff7700 (LWP 45205) 0x00007f93d313652e in putName (nme=0x1620f90 "str", len=3) at mal_namespace.c:234
16 Thread 0x7f93badf6700 (LWP 45206) BATsample (b=0x7f923dc97618, n=128) at gdk_sample.c:80
15 Thread 0x7f93bb1f8700 (LWP 45204) 0x00007f93d2c29028 in BATsample (b=0x7f924a04bf18, n=128) at gdk_sample.c:83
14 Thread 0x7f93bb3f9700 (LWP 45203) __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
13 Thread 0x7f93bb9fc700 (LWP 45200) __lll_unlock_wake () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:373
12 Thread 0x7f93bb5fa700 (LWP 45202) 0x00007f93d3136544 in putName (nme=0x1620f90 "str", len=3) at mal_namespace.c:234
11 Thread 0x7f93bb7fb700 (LWP 45201) 0x00007f93d2c29028 in BATsample (b=0x7f9239a70518, n=128) at gdk_sample.c:83
10 Thread 0x7f93bbbfd700 (LWP 45199) 0x00007f93d2c29030 in BATsample (b=0x7f923dc6f508, n=128) at gdk_sample.c:79
9 Thread 0x7f93bbfff700 (LWP 45197) 0x00007f93d2c29028 in BATsample (b=0x7f9229756508, n=128) at gdk_sample.c:83
8 Thread 0x7f93c83f5700 (LWP 45196) 0x00007f93d2c2900a in BATsample (b=0x7f925ec08ee8, n=128) at gdk_sample.c:83
7 Thread 0x7f93c85f6700 (LWP 45195) BATsample (b=0x7f923db22e48, n=128) at gdk_sample.c:80
6 Thread 0x7f93c87f7700 (LWP 45194) 0x00007f93d2c29028 in BATsample (b=0x7f925b994d88, n=128) at gdk_sample.c:83
5 Thread 0x7f93c89f8700 (LWP 45193) BATsample (b=0x7f9245e03078, n=128) at gdk_sample.c:79
4 Thread 0x7f93c8bf9700 (LWP 45192) BATsample (b=0x7f92568811c8, n=128) at gdk_sample.c:79
3 Thread 0x7f93c9a00700 (LWP 45155) exp_create (sa=, type=1) at rel_exp.c:39
2 Thread 0x7f93c8dfa700 (LWP 45190) __lll_unlock_wake () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:373
1 Thread 0x7f93c8ffb700 (LWP 45183) __strncmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:214
(gdb) bt
0 __strncmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:214
1 0x00007f93d313650d in putName (nme=0x7f924a0fe700 "s884_16", len=7) at mal_namespace.c:239
2 0x00007f93cbe0d041 in backend_dumpproc (be=0x7f93c0bc1230, c=0x7f93cd7291a0, cq=0x7f924a056640, s=0x7f924a0f5120) at sql_gencode.c:2290
3 0x00007f93cbe0658d in SQLparser (c=0x7f93cd7291a0) at sql_scenario.c:1601
4 0x00007f93d31488e7 in runPhase (c=, phase=) at mal_scenario.c:522
5 0x00007f93d3148a30 in runScenarioBody (c=0x7f924a0fe4c8) at mal_scenario.c:564
6 0x00007f93d31495bd in runScenario (c=0x7f93cd7291a0) at mal_scenario.c:601
7 0x00007f93d3149705 in MSserveClient (dummy=0x7f93cd7291a0) at mal_session.c:430
8 0x00007f93d0c38efc in start_thread (arg=0x7f93c8ffb700) at pthread_create.c:304
9 0x00007f93d097359d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
10 0x0000000000000000 in ?? ()
Comment 18275
Date: 2012-12-19 11:23:11 +0100
From: @mlkersten
Thank you for the detailed analysis.
Attempting to reproduce the error on my desktop machine and Feb2013 code.
I built the complete 5M row database as prescribed.
I restarted the server and attached gdb.
I ran the sequence of 40 concurrent users as prescribed, watching it using 'top'
It all seems to run smoothly so far (still running).
One of the effects of this test is that the namespace becomes polluted by a large number of query names, e.g. s884_16, which are never garbage collected.
This leads to a call to expandNamespace, which does not re-alloc, but performs a
malloc+copy+free. This could explain the SIGSEGV.
Resolutions:
be more conservative in name generation in SQL
use proper re-alloc code.
Comment 18276
Date: 2012-12-19 11:35:56 +0100
From: @mlkersten
The test run finished without causing a segfault.
The code will be patched to avoid the possible conflict
during expandNamespace.
Could you compile MonetDB such that it does not use the SSE2 version of strncmp, e.g., by not using --march=opteron, and see whether the problem (segfault) persists?
Martin,
if done correctly (incl. checking for success and gracefully bailing out otherwise), malloc, copy, free (instead of realloc) by themselves should not cause any segfaults.
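As an illustration of a malloc, copy, free expansion that checks for success and bails out gracefully, here is a minimal sketch. It is not MonetDB's expandNamespace: struct name_array, grow_locked, append_name, name_at, and arr_lock are hypothetical names, and the sketch assumes that every reader takes the same mutex as the writer, so nobody can still hold the old array when it is freed.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of name pointers. */
struct name_array {
    char  **nme;
    size_t  size;   /* allocated slots */
    size_t  used;   /* slots in use */
};

static pthread_mutex_t arr_lock = PTHREAD_MUTEX_INITIALIZER;

/* Grow by malloc + copy + free. The caller must hold arr_lock.
 * Returns 0 on success; on failure returns -1 and leaves the array intact. */
static int grow_locked(struct name_array *a, size_t extra)
{
    size_t newsize = a->size + extra;
    char **fresh = malloc(newsize * sizeof(*fresh));
    char **old;

    if (fresh == NULL)
        return -1;                    /* bail out; old array is still valid */
    if (a->used > 0)
        memcpy(fresh, a->nme, a->used * sizeof(*fresh));
    memset(fresh + a->used, 0, (newsize - a->used) * sizeof(*fresh));

    old = a->nme;
    a->nme = fresh;                   /* switch to the new array first ... */
    a->size = newsize;
    free(old);                        /* ... then release the old one */
    return 0;
}

/* Append a name pointer, growing the array when it is full. */
int append_name(struct name_array *a, char *name)
{
    int rc = 0;

    pthread_mutex_lock(&arr_lock);
    if (a->used == a->size)
        rc = grow_locked(a, 64);
    if (rc == 0)
        a->nme[a->used++] = name;
    pthread_mutex_unlock(&arr_lock);
    return rc;
}

/* Read a slot under the same lock, so it never sees a freed array. */
char *name_at(struct name_array *a, size_t i)
{
    char *r;

    pthread_mutex_lock(&arr_lock);
    r = i < a->used ? a->nme[i] : NULL;
    pthread_mutex_unlock(&arr_lock);
    return r;
}
```

If readers walk the array without the lock, either variant (malloc, copy, free or realloc) can move the storage underneath them, which would fit the symptoms in this report.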
Re: crashing in __strncmp_sse2: in my opinion that is just an epiphenomenon, not the real cause. It is merely where two or more threads happen to meet, due to stochastic, non-deterministic behavior (execution timing, CPU load, disk speed varying over time, etc.).
As proof of that, at times (I would say roughly 1 time in 10) the problem manifests not as a crash but as clients complaining of undefined namespaces; this shows that threads may meet elsewhere and "trash" namespace definitions.
When there are undefined namespaces, this is what I get on the console of the clients
[...]
TypeException:user.s989_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s990_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s991_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s992_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s993_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s994_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s995_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s996_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s997_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s998_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s999_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1000_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1001_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1002_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1003_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1004_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1005_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1006_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1007_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1008_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1009_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1010_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1011_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1012_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1013_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1014_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1015_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1016_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s947_15[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s948_15[85]:'io.stdout' undefined in: _114:any := io.stdout()
[...]
It still crashes in __strncmp_sse2. I guess it is an optimization in libc: it probably checks whether the CPU supports SSE2 instructions and, if so, uses __strncmp_sse2().
=================
[Thread debugging using libthread_db enabled]
Core was generated by `/usr/local/pkg/MonetDB-11.13.5-debug/bin/mserver5 --set gdk_dbfarm /data1/monet'.
Program terminated with signal 11, Segmentation fault.
0 __strncmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:214
214 ../sysdeps/x86_64/multiarch/../strcmp.S: No such file or directory.
in ../sysdeps/x86_64/multiarch/../strcmp.S
(gdb) info threads
Id Target Id Frame
44 Thread 0x7f6bf60a4700 (LWP 46533) 0x00007f6bfa729613 in select () at ../sysdeps/unix/syscall-template.S:82
43 Thread 0x7f6be3fff700 (LWP 46660) BATsample (b=0x7f697ada3418, n=128) at gdk_sample.c:79
42 Thread 0x7f6bf02ef700 (LWP 46659) 0x00007f6bfb44f115 in putName (nme=0x1618000 "str", len=3) at mal_namespace.c:238
41 Thread 0x7f6bf04f0700 (LWP 46658) BATsample (b=0x7f696086c188, n=128) at gdk_sample.c:79
40 Thread 0x7f6bf06f1700 (LWP 46657) 0x00007f6bfaf2701a in BATsample (b=0x7f69871daa38, n=128) at gdk_sample.c:83
39 Thread 0x7f6bf08f2700 (LWP 46655) 0x00007f6bfaf27023 in BATsample (b=0x7f69608a6c78, n=128) at gdk_sample.c:83
38 Thread 0x7f6bf0af3700 (LWP 46654) 0x00007f6bfaf2702b in BATsample (b=0x7f697eba7298, n=128) at gdk_sample.c:79
37 Thread 0x7f6bf0cf4700 (LWP 46637) 0x00007f6bfaf27023 in BATsample (b=0x7f698716f128, n=128) at gdk_sample.c:83
36 Thread 0x7f6bf0ef5700 (LWP 46636) BATsample (b=0x7f6976920498, n=128) at gdk_sample.c:79
35 Thread 0x7f6bf10f6700 (LWP 46633) 0x00007f6bfb44f115 in putName (nme=0x1618000 "str", len=3) at mal_namespace.c:238
34 Thread 0x7f6bf12f7700 (LWP 46632) 0x00007f6bfaf2702b in BATsample (b=0x7f696e296f28, n=128) at gdk_sample.c:79
33 Thread 0x7f6bf14f8700 (LWP 46628) 0x00007f6bfaf27023 in BATsample (b=0x7f69768ca8d8, n=128) at gdk_sample.c:83
32 Thread 0x7f6bf3709700 (LWP 46534) 0x00007f6bfa729613 in select () at ../sysdeps/unix/syscall-template.S:82
31 Thread 0x7f6bf16f9700 (LWP 46625) BATsample (b=0x7f696a451628, n=128) at gdk_sample.c:82
30 Thread 0x7f6bf3508700 (LWP 46535) 0x00007f6bfa729613 in select () at ../sysdeps/unix/syscall-template.S:82
29 Thread 0x7f6bf18fa700 (LWP 46622) 0x00007f6bfaf27023 in BATsample (b=0x7f696086afb8, n=128) at gdk_sample.c:83
28 Thread 0x7f6bf1afb700 (LWP 46618) 0x00007f6bfaf27023 in BATsample (b=0x7f698fd898d8, n=128) at gdk_sample.c:83
27 Thread 0x7f6bf1cfc700 (LWP 46616) 0x00007f6bfb44f104 in putName (nme=0x1618000 "str", len=3) at mal_namespace.c:234
26 Thread 0x7f6bf1efd700 (LWP 46607) 0x00007f6bfaf27023 in BATsample (b=0x7f698be217c8, n=128) at gdk_sample.c:83
25 Thread 0x7f6bf20fe700 (LWP 46603) 0x00007f6bfb44f104 in putName (nme=0x1618000 "str", len=3) at mal_namespace.c:234
24 Thread 0x7f6bf22ff700 (LWP 46602) BATsample (b=0x7f6983c53e18, n=128) at gdk_sample.c:79
23 Thread 0x7f6bf2500700 (LWP 46599) 0x00007f6bfaf27023 in BATsample (b=0x7f696a40efa8, n=128) at gdk_sample.c:83
22 Thread 0x7f6bf2701700 (LWP 46595) BATsample (b=0x7f69643709f8, n=128) at gdk_sample.c:83
21 Thread 0x7f6bf2902700 (LWP 46591) 0x00007f6bfaf27023 in BATsample (b=0x3711bda8, n=128) at gdk_sample.c:83
20 Thread 0x7f6bf2b03700 (LWP 46589) 0x00007f6bfaf2702b in BATsample (b=0x7f697ebab318, n=128) at gdk_sample.c:79
19 Thread 0x7f6bf2d04700 (LWP 46585) BATsample (b=0x7f695842ab48, n=128) at gdk_sample.c:79
18 Thread 0x7f6bf2f05700 (LWP 46583) 0x00007f6bfaf2702b in BATsample (b=0x7f6960872cd8, n=128) at gdk_sample.c:79
17 Thread 0x7f6bf3106700 (LWP 46575) 0x00007f6bfaf27023 in BATsample (b=0x7f69642f2ed8, n=128) at gdk_sample.c:83
16 Thread 0x7f6bf3307700 (LWP 46573) 0x00007f6bfaf27023 in BATsample (b=0x7f696a40df28, n=128) at gdk_sample.c:83
15 Thread 0x7f6bfbeb2740 (LWP 46532) 0x00007f6bfa729613 in select () at ../sysdeps/unix/syscall-template.S:82
14 Thread 0x7f6be23f1700 (LWP 46674) 0x00007f6bfb44f0f0 in putName (nme=0x1618000 "str", len=3) at mal_namespace.c:234
13 Thread 0x7f6be25f2700 (LWP 46673) 0x00007f6bfb44f104 in putName (nme=0x1618000 "str", len=3) at mal_namespace.c:234
12 Thread 0x7f6be27f3700 (LWP 46672) BATsample (b=0x7f69871437f8, n=128) at gdk_sample.c:82
11 Thread 0x7f6be29f4700 (LWP 46671) 0x00007f6bfaf27023 in BATsample (b=0x7f696a412c88, n=128) at gdk_sample.c:83
10 Thread 0x7f6be2bf5700 (LWP 46670) BATsample (b=0x7f6972ad88b8, n=128) at gdk_sample.c:80
9 Thread 0x7f6be2df6700 (LWP 46669) 0x00007f6bfb44f104 in putName (nme=0x7f6bfb9bc26c "sortReverse", len=11) at mal_namespace.c:234
8 Thread 0x7f6be2ff7700 (LWP 46668) 0x00007f6bfaf27023 in BATsample (b=0x7f69642e6a58, n=128) at gdk_sample.c:83
7 Thread 0x7f6be31f8700 (LWP 46667) 0x00007f6bfaf27023 in BATsample (b=0x7f696e1ffcf8, n=128) at gdk_sample.c:83
6 Thread 0x7f6be33f9700 (LWP 46666) 0x00007f6bfaf27023 in BATsample (b=0x7f697ad9f4a8, n=128) at gdk_sample.c:83
5 Thread 0x7f6be35fa700 (LWP 46665) 0x00007f6bfaf27023 in BATsample (b=0x3711ddb8, n=128) at gdk_sample.c:83
4 Thread 0x7f6be37fb700 (LWP 46664) 0x00007f6bfb44f104 in putName (nme=0x7f6bf43ed597 "stdout", len=6) at mal_namespace.c:234
3 Thread 0x7f6be39fc700 (LWP 46663) BATsample (b=0x7f69583af298, n=128) at gdk_sample.c:79
2 Thread 0x7f6be3bfd700 (LWP 46662) BATsample (b=0x7f6964370018, n=128) at gdk_sample.c:83
1 Thread 0x7f6be3dfe700 (LWP 46661) __strncmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:214
(gdb) bt
0 __strncmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:214
1 0x00007f6bfb44f125 in putName (nme=0x7f6bf43ed597 "stdout", len=6) at mal_namespace.c:239
2 0x00007f6bfb430da7 in newStmt (mb=0x7f698be1bec0, module=0x7f6bf43eeeba "io", name=0x7f6bf43ed597 "stdout") at mal_builder.c:59
3 0x00007f6bf430cd37 in _dumpstmt (sql=, mb=0x7f698be1bec0, s=0x7f698be33820) at sql_gencode.c:2016
4 0x00007f6bf430d9f2 in _dumpstmt (s=, mb=, sql=) at sql_gencode.c:707
5 backend_dumpstmt (be=0x7f6be8a3c460, mb=0x7f698be1bec0, s=0x7f698be33820) at sql_gencode.c:2206
6 0x00007f6bf430e55c in backend_dumpproc (be=0x7f6be8a3c460, c=0x7f6bf5c4a3a8, cq=0x7f698be1b090, s=0x7f698be33820) at sql_gencode.c:2330
7 0x00007f6bf4306f36 in SQLparser (c=0x7f6bf5c4a3a8) at sql_scenario.c:1601
8 0x00007f6bfb463bbc in runPhase (phase=1, c=0x7f6bf5c4a3a8) at mal_scenario.c:522
9 runScenarioBody (c=0x7f6bf5c4a3a8) at mal_scenario.c:564
10 0x00007f6bfb464d0f in runScenario (c=0x7f6bf5c4a3a8) at mal_scenario.c:601
11 0x00007f6bfb464db8 in MSserveClient (dummy=0x7f6bf5c4a3a8) at mal_session.c:430
12 0x00007f6bfa9f5efc in start_thread (arg=0x7f6be3dfe700) at pthread_create.c:304
13 0x00007f6bfa73059d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
14 0x0000000000000000 in ?? ()
Since we cannot yet reproduce the segfault, could you try whether you can reproduce it also with a non-optimized build (from scratch), preferably configured without setting CFLAGS and using options --disable-optimize --enable-debug --enable-assert ?
If that does not trigger a segfault, also an optimized build (from scratch) with CFLAGS="-g -O4 -mno-sse2" would be interesting ...
The libc is compiled with SSE support, so it seems unlikely compilation settings for mserver5 will make any difference in it (libc) using the sse-optimised strcmp.
Compiled with --disable-optimize --enable-debug --enable-assert from a pristine source tarball, as requested. This crashes as:
(gdb) info threads
Id Target Id Frame
44 Thread 0x7f227e215700 (LWP 17952) 0x00007f22854b9613 in select () at ../sysdeps/unix/syscall-template.S:82
43 Thread 0x7f228745d740 (LWP 17949) 0x00007f22854b9613 in select () at ../sysdeps/unix/syscall-template.S:82
42 Thread 0x7f22759ec700 (LWP 18089) 0x00007f22854bcac7 in mprotect () at ../sysdeps/unix/syscall-template.S:82
41 Thread 0x7f2280e2e700 (LWP 17950) 0x00007f22854b9613 in select () at ../sysdeps/unix/syscall-template.S:82
40 Thread 0x7f227ca09700 (LWP 18039) BATsample (b=0x7f22636d4508, n=128) at gdk_sample.c:79
39 Thread 0x7f227cc0a700 (LWP 18036) 0x00007f2285d414aa in BATsample (b=0x7f21fc114b08, n=128) at gdk_sample.c:83
38 Thread 0x7f227ce0b700 (LWP 18035) 0x00007f2285d414ba in BATsample (b=0x7f21fc0c68c8, n=128) at gdk_sample.c:79
37 Thread 0x7f227d00c700 (LWP 18023) 0x00007f2285d4147e in BATsample (b=0x7f21f85941a8, n=128) at gdk_sample.c:83
36 Thread 0x7f227d20d700 (LWP 18021) BATsample (b=0x7f21f8548158, n=128) at gdk_sample.c:80
35 Thread 0x7f227e416700 (LWP 17951) 0x00007f22854b9613 in select () at ../sysdeps/unix/syscall-template.S:82
34 Thread 0x7f227d40e700 (LWP 18019) 0x00007f2285d414aa in BATsample (b=0x7f21fc13c2f8, n=128) at gdk_sample.c:83
33 Thread 0x7f227d60f700 (LWP 18015) 0x00007f2285d41451 in BATsample (b=0x7f22672b64d8, n=128) at gdk_sample.c:83
32 Thread 0x7f227d810700 (LWP 18008) 0x00007f2285d414c5 in BATsample (b=0x7339c98, n=128) at gdk_sample.c:79
31 Thread 0x7f227da11700 (LWP 18007) 0x00007f2285d4149e in BATsample (b=0x7f225f2b4898, n=128) at gdk_sample.c:83
30 Thread 0x7f227dc12700 (LWP 18002) 0x00007f2285d41451 in BATsample (b=0x7f225f3992f8, n=128) at gdk_sample.c:83
29 Thread 0x7f227de13700 (LWP 17999) 0x00007f2285d414c7 in BATsample (b=0x7f2257890a68, n=128) at gdk_sample.c:79
28 Thread 0x7f227e014700 (LWP 17994) 0x00007f2285d41472 in BATsample (b=0x7f2257866748, n=128) at gdk_sample.c:83
27 Thread 0x7f22751e8700 (LWP 18093) 0x00007f2285d41483 in BATsample (b=0x7f2200b2ee28, n=128) at gdk_sample.c:83
26 Thread 0x7f22753e9700 (LWP 18092) 0x00007f2285d414aa in BATsample (b=0x7f225783d4d8, n=128) at gdk_sample.c:83
25 Thread 0x7f22755ea700 (LWP 18091) 0x00007f2285d414aa in BATsample (b=0x7f224f7925f8, n=128) at gdk_sample.c:83
24 Thread 0x7f22757eb700 (LWP 18090) 0x00007f2285d4149e in BATsample (b=0x7f22672baa78, n=128) at gdk_sample.c:83
23 Thread 0x7f2275bed700 (LWP 18088) 0x00007f2285d414ba in BATsample (b=0x7f225774ca48, n=128) at gdk_sample.c:79
22 Thread 0x7f2275dee700 (LWP 18087) 0x00007f2285d41483 in BATsample (b=0x7f2213d2e9c8, n=128) at gdk_sample.c:83
21 Thread 0x7f2275fef700 (LWP 18086) 0x00007f2285d41472 in BATsample (b=0x7f226aed1ee8, n=128) at gdk_sample.c:83
20 Thread 0x7f22761f0700 (LWP 18085) 0x00007f2285d4149a in BATsample (b=0x7f2267301c08, n=128) at gdk_sample.c:83
19 Thread 0x7f22763f1700 (LWP 18084) BATsample (b=0x7f22637c83a8, n=128) at gdk_sample.c:79
18 Thread 0x7f22765f2700 (LWP 18083) 0x00007f2285d4147e in BATsample (b=0x7f226aef7858, n=128) at gdk_sample.c:83
17 Thread 0x7f22767f3700 (LWP 18082) 0x00007f2285d4149e in BATsample (b=0x7f22637cf848, n=128) at gdk_sample.c:83
16 Thread 0x7f22769f4700 (LWP 18081) 0x00007f2285d4147e in BATsample (b=0x7f224f792da8, n=128) at gdk_sample.c:83
15 Thread 0x7f2276bf5700 (LWP 18080) 0x00007f2285d414ba in BATsample (b=0x7f226fe67428, n=128) at gdk_sample.c:79
14 Thread 0x7f2276df6700 (LWP 18079) 0x00007f2285d414ba in BATsample (b=0x7f2200a6fc28, n=128) at gdk_sample.c:79
13 Thread 0x7f2276ff7700 (LWP 18078) 0x00007f2285d414c7 in BATsample (b=0x7f2213cdf6f8, n=128) at gdk_sample.c:79
12 Thread 0x7f22771f8700 (LWP 18075) 0x00007f2285d4149e in BATsample (b=0x7f226fe93278, n=128) at gdk_sample.c:83
11 Thread 0x7f22773f9700 (LWP 18073) 0x00007f2285d41451 in BATsample (b=0x7f2213ce6ce8, n=128) at gdk_sample.c:83
10 Thread 0x7f22775fa700 (LWP 18072) 0x00007f2285d414aa in BATsample (b=0x7f224f688ec8, n=128) at gdk_sample.c:83
9 Thread 0x7f22777fb700 (LWP 18068) 0x00007f2285d4149e in BATsample (b=0x7f2263664388, n=128) at gdk_sample.c:83
8 Thread 0x7f22779fc700 (LWP 18064) 0x00007f2285d414c2 in BATsample (b=0x7f226fe3fe28, n=128) at gdk_sample.c:79
7 Thread 0x7f2277bfd700 (LWP 18058) 0x00007f2285d4149a in BATsample (b=0x7f224f7b3f78, n=128) at gdk_sample.c:83
6 Thread 0x7f2277dfe700 (LWP 18054) 0x00007f2285d414ba in BATsample (b=0x7f22672dc928, n=128) at gdk_sample.c:79
5 Thread 0x7f2277fff700 (LWP 18051) BATsample (b=0x7f221544b358, n=128) at gdk_sample.c:80
4 Thread 0x7f227c205700 (LWP 18049) 0x00007f2285d414c7 in BATsample (b=0x73615c8, n=128) at gdk_sample.c:79
3 Thread 0x7f227c406700 (LWP 18046) 0x00007f2285d4149a in BATsample (b=0x7f2263750628, n=128) at gdk_sample.c:83
2 Thread 0x7f227c607700 (LWP 18044) 0x00007f2285d4149e in BATsample (b=0x7f21fc0eb918, n=128) at gdk_sample.c:83
1 Thread 0x7f227c808700 (LWP 18042) 0x00007f22864366e7 in putName (nme=0x2cd5fc0 "str", len=3) at mal_namespace.c:234
(gdb) bt
0 0x00007f22864366e7 in putName (nme=0x2cd5fc0 "str", len=3) at mal_namespace.c:234
1 0x00007f228640a5d2 in newStmt1 (mb=0x7f22154734a0, module=0x1f32f10 "calc", name=0x2cd5fc0 "str") at mal_builder.c:71
2 0x00007f227f022578 in _dumpstmt (sql=0x7f22709be8e0, mb=0x7f22154734a0, s=0x7f22154a6700) at sql_gencode.c:1765
3 0x00007f227f024720 in backend_dumpstmt (be=0x7f22709be8e0, mb=0x7f22154734a0, s=0x7f221549eb20) at sql_gencode.c:2206
4 0x00007f227f02503a in backend_dumpproc (be=0x7f22709be8e0, c=0x7f22809d1858, cq=0x7f22153d3170, s=0x7f221549eb20) at sql_gencode.c:2330
5 0x00007f227f018bcc in SQLparser (c=0x7f22809d1858) at sql_scenario.c:1601
6 0x00007f2286451549 in runPhase (c=0x7f22809d1858, phase=1) at mal_scenario.c:522
7 0x00007f2286451681 in runScenarioBody (c=0x7f22809d1858) at mal_scenario.c:564
8 0x00007f22864518fa in runScenario (c=0x7f22809d1858) at mal_scenario.c:601
9 0x00007f228645282e in MSserveClient (dummy=0x7f22809d1858) at mal_session.c:430
10 0x00007f2285785efc in start_thread (arg=0x7f227c808700) at pthread_create.c:304
11 0x00007f22854c059d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
12 0x0000000000000000 in ?? ()
This crash is solved by the patch I previously posted:
Make namespace more resilient
A new namespace manager has been introduced, which allows for concurrent reads without locks.
Writes in the structure are protected with locks.
It significantly improves the running time of the test mentioned in bug #3163 .
The server startup seems slightly longer (20ms), because now we use separate malloced structures.
The patch does not address the current SQL limitation to produce unique persistent names for all queries once cached.
Comment 18292
Date: 2012-12-20 21:58:14 +0100
From: @mlkersten
A new namespace manager has been introduced, which allows for concurrent reads without locks. Writes in the structure are protected with locks. It significantly improves the running time of this test case.
The startup cost is slightly longer, because now we use separate malloced structures.
The patch does not address the current SQL limitation to produce unique persistent names for all queries once cached.
Please confirm effectiveness of this patch.
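The "reads without locks, writes under a lock" idea described above can be modelled in a few lines of Python. This is only a simplified sketch under stated assumptions, not the code of the actual patch: the class `ReadMostlyNamespace`, the fixed bucket count and the `put_name` method are hypothetical names chosen for illustration. Readers walk a chain of nodes that are never modified after publication; a writer takes the lock, re-checks, and publishes a fully built node with a single reference assignment.

```python
import threading


class Node:
    """One published name; 'next' is never changed after publication."""
    __slots__ = ("name", "next")

    def __init__(self, name, nxt):
        self.name = name
        self.next = nxt


class ReadMostlyNamespace:
    """Hypothetical model: lock-free lookups, locked inserts."""

    def __init__(self, buckets=256):
        self._heads = [None] * buckets      # bucket count is fixed (assumption)
        self._write_lock = threading.Lock()

    def _bucket(self, name):
        return ord(name[0]) % len(self._heads)

    @staticmethod
    def _find(head, name):
        node = head
        while node is not None:
            if node.name == name:
                return node.name
            node = node.next
        return None

    def put_name(self, name):
        b = self._bucket(name)
        found = self._find(self._heads[b], name)         # read path: no lock
        if found is not None:
            return found
        with self._write_lock:                           # write path: locked
            found = self._find(self._heads[b], name)     # re-check under the lock
            if found is None:
                # Build the node completely, then publish it in one assignment.
                self._heads[b] = Node(name, self._heads[b])
                found = name
        return found
```

In Python the interpreter makes the reference assignment atomic; a C version would presumably also need to ensure that the node contents are visible to other threads before the pointer is published.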
Comment 18293
Date: 2012-12-20 22:03:12 +0100
From: @mlkersten
triple run of the experiment with the new namespace manager does not lead to SEGFAULTs on my desktop machine.
the first tests are very good. I ran the usual 40 concurrent clients 4 times. Only once I had a crash:
2012-12-20 15:32:01 ERR crashdb[35830]: mserver5: opt_pipes.c:520: compileOptimizer: Assertion `c != ((void *)0)' failed.
2012-12-20 15:32:01 ERR crashdb[35830]: mserver5: opt_pipes.c:520: compileOptimizer: Assertion `c != ((void *)0)' failed.
2012-12-20 15:32:01 ERR crashdb[35830]: mserver5: opt_pipes.c:520: compileOptimizer: Assertion `c != ((void *)0)' failed.
2012-12-20 15:32:01 ERR crashdb[35830]: mserver5: opt_pipes.c:520: compileOptimizer: Assertion `c != ((void *)0)' failed.
2012-12-20 15:32:01 ERR crashdb[35830]: mserver5: opt_pipes.c:520: compileOptimizer: Assertion `c != ((void *)0)' failed.
2012-12-20 15:32:01 ERR crashdb[35830]: mserver5: opt_pipes.c:520: compileOptimizer: Assertion `c != ((void *)0)' failed.
2012-12-20 15:32:01 MSG merovingian[33044]: database 'crashdb' (35830) was killed by signal SIGABRT
The other three times it worked very well, without a glitch.
Comment 18298
Date: 2012-12-21 08:30:06 +0100
From: @mlkersten
Indeed. Internally, a client record was taken from the pool for compilation. Under the stress test in question, there may be no client slot left by the time that point is reached. A patch is in testing.
Comment 18299
Date: 2012-12-21 09:17:29 +0100
From: @mlkersten
Patch committed. It uses a static client record instead now.
The (single) test run passes.
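To make the failure mode described above concrete, here is a small Python model. It is a sketch under assumptions, not the MonetDB code: `ClientPool`, `compile_with_pool` and `compile_with_static_record` are hypothetical names, and serialising access to the static record with a lock is my own assumption about how such a record could be shared safely.

```python
import threading


class ClientPool:
    """Fixed-size pool of client slots (hypothetical model)."""

    def __init__(self, size):
        self._free = list(range(size))
        self._lock = threading.Lock()

    def acquire(self):
        # Returns None when every slot is taken by a connected client.
        with self._lock:
            return self._free.pop() if self._free else None

    def release(self, slot):
        with self._lock:
            self._free.append(slot)


def compile_with_pool(pool):
    """Compilation that borrows a client slot, as before the fix."""
    client = pool.acquire()
    if client is None:
        # Presumably the point where the assertion in the log above fired.
        raise RuntimeError("no free client slot for compilation")
    try:
        return "compiled with pool slot %d" % client
    finally:
        pool.release(client)


# One dedicated record that never competes with user connections.
_STATIC_LOCK = threading.Lock()


def compile_with_static_record():
    """Compilation that uses a dedicated record instead of the pool."""
    with _STATIC_LOCK:      # assumption: callers take turns on the record
        return "compiled with static record"
```

With a pool of size 1 whose only slot is held by a user session, `compile_with_pool` fails while `compile_with_static_record` still succeeds.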
Date: 2012-10-13 17:18:36 +0200
From: Valerio Aimale <>
To: MonetDB5 devs <>
Version: 11.13.5 (Oct2012-SP1)
CC: @mlkersten, @drstmane, valerio
Last updated: 2013-01-22 09:29:07 +0100
Comment 17801
Date: 2012-10-13 17:18:36 +0200
From: Valerio Aimale <>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:15.0) Gecko/20100101 Firefox/15.0.1
Build Identifier:
When a large number of clients perform select operations concurrently, mserver5 SIGSEGVs in the MAL namespace allocation/counter function putName() in mal_namespace.c. The for(l= nme[0]; l && namespace.nme[l]; l= namespace.link[l]) {} loop needs thread isolation, but the return clause has to be brought out of the loop.
Reproducible: Always
Steps to Reproduce:
create a db named crashdb, release it and start it
using the following python script, load 5,000,000 rows into the database
#!/usr/bin/env python
# by Valerio G. Aimale valerio@aimale.com
# Python 2 script: prints N pipe-delimited rows for the big_data table.
import random
import sys
import time
import string

def main():
    C = string.letters + string.digits
    random.seed(42)
    for i in range(long(sys.argv[1])):
        row = [
            str(random.randint(0,2**31)),
            str(random.randint(0,2**31)),
            str(random.randint(0,2**31)),
            '"' + "".join([random.choice(C) for i in range(random.randint(1,100))]) + '"' ,
            '"' + "".join([random.choice(C) for i in range(random.randint(1,128))]) + '"',
            '"' + "".join([random.choice(C) for i in range(random.randint(1,8000))]) + '"',
            time.strftime("%Y-%m-%d", time.localtime(random.randint(946710000, 1325401200))),
            time.strftime("%Y-%m-%d", time.localtime(random.randint(946710000, 1325401200))),
            time.strftime("%Y-%m-%d", time.localtime(random.randint(946710000, 1325401200)))
        ]
        print "|".join(row)

if __name__ == "__main__":
    main()
and load them into the database with
python gen_bigdata 5000000 | mclient crashdb -s "COPY INTO big_data FROM STDIN USING DELIMITERS '|','\n','\"' NULL AS ''" -
create a file containing 10,000 random queries with the following python script:
#!/usr/bin/env python
# by Valerio G. Aimale valerio@aimale.com
# Python 2 script: samples (B, E) pairs from big_data and prints one SELECT per sample.
import monetdb.sql
import random
import sys

def main():
    random.seed(42)
    connection = monetdb.sql.connect(username="analysis", password="analysis", hostname="localhost", database="crashdb")
    cursor = connection.cursor()
    cursor.arraysize = 10000
    cursor.execute('SELECT B, E FROM big_data LIMIT ' + sys.argv[1] )
    data = []
    for row in cursor.fetchall():
        data.append(row)
    for i in range(1, long(sys.argv[1])):
        r = data[random.randint(0,len(data)-1)]
        print "SELECT A, D from big_Data where B=" + str(r[0]) + " AND E ='" + str(r[1]) + "';"

if __name__ == "__main__":
    main()
run as
python gen_bigdata_queries 10000 > big_data_queries.sql
then execute the queries concurrently as
for i in `seq 1 40`; do cat big_data_queries.sql | mclient -d crashdb > /dev/null & done
[need username and password in ~/.monetdb]
The server, after a few seconds will crash, every time.
This test needs a server with at least 64 GB of RAM and a large number of cores.
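The shell loop in the steps above can also be driven from Python, which makes it easier to collect the exit status of every client. The following is a minimal Python 3 sketch; the helper name `run_concurrent` and its parameters are my own and not part of the original report.

```python
import subprocess
from contextlib import ExitStack


def run_concurrent(command, input_path, clients=40):
    """Start `clients` copies of `command`, feed each the file on stdin,
    discard their stdout, and return the list of exit codes."""
    with ExitStack() as stack:
        procs = []
        for _ in range(clients):
            stdin = stack.enter_context(open(input_path, "rb"))
            procs.append(
                subprocess.Popen(command, stdin=stdin, stdout=subprocess.DEVNULL)
            )
        # Wait for all clients; the input files are closed when the block exits.
        return [p.wait() for p in procs]
```

For the test in this report the call would be `run_concurrent(["mclient", "-d", "crashdb"], "big_data_queries.sql")`, with the credentials in ~/.monetdb as noted above.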
Actual Results:
crash as reported in merovingian.log
2012-10-13 08:47:42 MSG merovingian[59959]: database 'crashdb' (59996) was killed by signal SIGSEGV
crash as reported in syslog
Oct 13 08:47:40 tut kernel: [11059659.466492] mserver5[32149]: segfault at 53281 ip 00007fe0567d75e0 sp 00007fe03b7fa828 error 4 in libc-2.13.so[7fe056753000+197000]
Expected Results:
server should not be crashing
The trouble arises in putName(). If you run mserver5 under gdb, after compiling with --enable-debug, you will see the server crash in line 235 of mal_namespace.c:
for(l= nme[0]; l && namespace.nme[l]; l= namespace.link[l]){
Several concurrent threads are trying to define a new namespace and stomp on each others' toes, while doing that.
The for(l= nme[0]; l && namespace.nme[l]; l= namespace.link[l]){} loop needs thread isolation, but the early termination return clause has to be taken out of the for loop, to avoid guaranteed deadlocks. The following patch does that and solves the problem, under any magnitude of concurrency:
==========================
--- mal_namespace.c.orig 2012-10-12 16:06:51.970821824 -0600
+++ mal_namespace.c 2012-10-12 15:44:14.078131058 -0600
@@ -228,9 +228,11 @@
{
size_t l,top;
char buf[MAXIDENTLEN];
#ifdef BACKUP
chkName(l);
@@ -264,9 +266,13 @@
l=k;
}
*/
====================
cd /path/to/MonetDB-11.11.11/monetdb5/mal
patch < mal_namespace.c.patch
====================================================================
root@tut:~/MonetDB-11.11.11# /usr/local/pkg/MonetDB-11.11.11/bin/mserver5 --version
MonetDB 5 server v11.11.11 "Jul2012-SP2" (64-bit, 64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2012 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 126.2GiB available memory, 48 available cpu cores
Libraries:
libpcre: 8.12 2011-01-15 (compiled with 8.12)
openssl: OpenSSL 1.0.0e 6 Sep 2011 (compiled with OpenSSL 1.0.0e 6 Sep 2011)
libxml2: 2.7.8 (compiled with 2.7.8)
Compiled by: root@tut (x86_64-unknown-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -O3 -march=opteron -Wp,-D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64
Comment 17802
Date: 2012-10-13 17:20:09 +0200
From: Valerio Aimale <>
Created attachment 149
generate data required for the test
Comment 17803
Date: 2012-10-13 17:20:31 +0200
From: Valerio Aimale <>
Created attachment 150
generate queries required for the test
Comment 17804
Date: 2012-10-13 17:20:58 +0200
From: Valerio Aimale <>
Created attachment 151
patch for mal_namespace.c
Comment 17805
Date: 2012-10-13 17:27:26 +0200
From: @mlkersten
Thank you for the detailed analysis and providing a solution.
We will review it and merge it into the respective bug/feature releases.
regards, Martin Kersten
Comment 17807
Date: 2012-10-13 17:33:55 +0200
From: Valerio Aimale <>
Created attachment 152
Schema for the big_data table
Comment 17808
Date: 2012-10-13 17:37:36 +0200
From: Valerio Aimale <>
Forgot to say, after creating the database, create a user called 'analysis', with password 'analysis'.
CREATE USER "analysis" WITH PASSWORD 'analysis' NAME 'Analysis Explorer' SCHEMA "sys";
CREATE SCHEMA "analysis" AUTHORIZATION "analysis";
ALTER USER "analysis" SET SCHEMA "analysis";
After that, logged in as 'analysis', create the following table, before loading the data:
====================================
-- crash test
CREATE TABLE "big_data" (
A INT,
B INT,
C INT,
D VARCHAR(100),
E VARCHAR(128),
F VARCHAR(8000),
G DATE,
H DATE,
I DATE
);
===================================
your ~/.monetdb should look like
user=analysis
password=analysis
Comment 17852
Date: 2012-10-28 16:26:18 +0100
From: Valerio Aimale <>
Version 11.13.3 still crashes with the same test.
This is the patch for version 11.13.3:
===========================================================
--- mal_namespace.c.orig 2012-10-28 09:24:48.555393313 -0600
+++ mal_namespace.c 2012-10-28 09:16:16.892918629 -0600
@@ -228,9 +228,11 @@
{
size_t l,top;
char buf[MAXIDENTLEN];
#ifdef BACKUP
chkName(l);
@@ -264,9 +266,13 @@
l=k;
}
*/
======================================
Comment 17853
Date: 2012-10-28 16:37:21 +0100
From: @grobian
Martin, can you please take a look at this, thanks.
Comment 17854
Date: 2012-10-28 19:09:12 +0100
From: Valerio Aimale <>
The patch for 11.13.3 in my previous comment (and also the original patch for 11.11.11 in the original bug report) pays a significant performance price. The following patch (for 11.13.3) achieves the same result (i.e. preventing the SIGSEGV) without any noticeable performance impact:
====================================================
--- mal_namespace.c.orig 2012-10-28 09:24:48.555393313 -0600
+++ mal_namespace.c 2012-10-28 11:53:40.792026089 -0600
@@ -231,7 +231,8 @@
#ifdef BACKUP
chkName(l);
#endif
@@ -266,6 +267,9 @@
*/
return namespace.nme[l];
}
======================================================
it replaces the for loop with a while loop, protecting the atomic operation
l = namespace.link[l];
which is the only operation needing thread-isolation.
Valerio
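As an illustration of the locking discipline discussed in these patches, here is a toy Python model of a chained name table. It is a sketch under assumptions, not a port of mal_namespace.c: `LockedNamespace`, `heads` and `put_name` are hypothetical names. Unlike the second patch, which protects only the link-following step, this model holds the lock for the whole walk and insert, in the spirit of the first patch. The value is returned only after the critical section has been left, mirroring the point that the return must not sit inside the locked loop.

```python
import threading


class LockedNamespace:
    """Toy chained name table; slot 0 marks the end of a chain."""

    def __init__(self):
        self._lock = threading.Lock()
        self.names = [None]     # names[slot] holds the interned string
        self.link = [0]         # link[slot] is the next slot in the chain
        self.heads = {}         # first character -> first slot of its chain

    def put_name(self, name):
        with self._lock:
            slot = self.heads.get(name[0], 0)
            last = 0
            # Walk the chain until the name is found or the chain ends.
            while slot and self.names[slot] != name:
                last = slot
                slot = self.link[slot]
            if not slot:
                # Not found: append a new slot and hook it onto the chain.
                self.names.append(name)
                self.link.append(0)
                slot = len(self.names) - 1
                if last:
                    self.link[last] = slot
                else:
                    self.heads[name[0]] = slot
            found = self.names[slot]
        # Single exit point, reached only after the lock has been released.
        return found
```

Holding one lock across the whole lookup is the simplest correct choice but serialises all callers, which matches the performance cost reported for the first patch.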
Comment 17889
Date: 2012-11-07 18:46:03 +0100
From: @grobian
Martin, did your recent changes include a fix for this problem? We cannot build Oct2012-SP1 if this isn't fixed/committed.
Comment 18133
Date: 2012-11-27 15:50:41 +0100
From: @mlkersten
The concurrency conflict has been addressed in the namespace in Oct branch.
Running on my desktop with a small version of the database (5000),
which is enough to create the load, and 100 concurrent users
(of which 64 are accepted) does not crash the server.
However, if you run the script with a naively large sequence (e.g. 1000)
you will encounter bash/OS fork/resource limitations.
Comment 18136
Date: 2012-11-27 15:54:18 +0100
From: @mlkersten
Changeset 24c408dcf765 made by Martin Kersten mk@cwi.nl in the MonetDB repo, refers to this bug.
For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=24c408dcf765
Changeset description:
Comment 18138
Date: 2012-11-27 15:58:10 +0100
From: @mlkersten
Downscale severity until more counterproofs of instability are reported.
Comment 18242
Date: 2012-12-08 22:36:53 +0100
From: @mlkersten
Considered resolved.
Comment 18267
Date: 2012-12-18 21:50:53 +0100
From: Valerio Aimale <>
Martin,
I'm sorry to report that with version 11.13.5, the crash still happens:
valerio@tut:~$ ps fax
[...]
625 ? Ssl 1:04 /usr/local/pkg/MonetDB-11.13.5/bin/monetdbd start /data1/monetdb/dbfarm/
49170 ? Ssl 126:09 \_ /usr/local/pkg/MonetDB-11.13.5/bin/mserver5 --set gdk_dbfarm /data1/monetdb/dbfarm
[...]
valerio@tut:~$ for i in `seq 1 40`; do cat big_data_queries | mclient -d crashdb >/dev/null & done
[1] 51568
[2] 51570
[3] 51572
[4] 51574
[5] 51576
[6] 51578
[7] 51580
[8] 51582
[9] 51585
[10] 51587
[11] 51589
[12] 51592
[13] 51594
[14] 51596
[15] 51600
[16] 51602
[17] 51605
[18] 51607
[19] 51611
[20] 51613
[21] 51617
[22] 51620
[23] 51623
[24] 51625
[25] 51627
[26] 51630
[27] 51633
[28] 51635
[29] 51638
[30] 51640
[31] 51642
[32] 51644
[33] 51646
[34] 51649
[35] 51652
[36] 51655
[37] 51658
[38] 51661
[39] 51663
[40] 51666
valerio@tut:~$ Connection terminated
Connection terminated
Connection terminated
Connection terminatedConnection terminated
Connection terminatedConnection terminated
Connection terminated
Connection terminated
Connection terminatedConnection terminated
Connection terminated
Connection terminatedConnection terminated
Connection terminated
Connection terminated
Connection terminatedConnection terminated
Connection terminated
Connection terminatedConnection terminated
Connection terminated
Connection terminated
Connection terminated
Connection terminated
Connection terminated
Connection terminated
Connection terminated
Connection terminatedConnection terminated
Connection terminated
Connection terminated
Connection terminated
Connection terminatedConnection terminated
Connection terminated
Connection terminated
Connection terminated
Connection terminated
Connection terminated
valerio@tut:~$
from the merovingian.log:
[...]
2012-12-18 13:48:44 MSG merovingian[625]: database 'crashdb' (51556) was killed by signal SIGSEGV
[...]
Comment 18268
Date: 2012-12-18 22:10:13 +0100
From: @mlkersten
That is a pity to hear. Would you happen to be able to get the stack trace of
the running threads?
Comment 18269
Date: 2012-12-18 23:56:02 +0100
From: Valerio Aimale <>
I think it crashes only with -O3 or -O4 in CFLAGS. I had to tweak the configure script to allow -g and -O4 together, and this is the backtrace:
0 __strncmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:1112
1 0x00007f0a3d837f55 in putName (nme=0x7f0a3dda023a "sunique", len=7) at mal_namespace.c:239
2 0x00007f0a3dcc3bb6 in ESevaluate (empty=0x7f08a0a0dd70 "", mb=0x7f08a0965470, cntxt=) at opt_emptySet.c:55
3 OPTemptySetImplementation (cntxt=0x7f0a3800b4b8, mb=0x7f08a0965470, stk=, p=) at opt_emptySet.c:264
4 0x00007f0a3dce530f in OPTwrapper (cntxt=0x7f0a3800b4b8, mb=0x7f08a0965470, stk=0x0, p=) at opt_wrapper.c:171
5 0x00007f0a3dce08bb in optimizeMALBlock (cntxt=0x7f0a3800b4b8, mb=0x7f08a0965470) at opt_support.c:290
6 0x00007f0a366cfcf8 in addQueryToCache (c=) at sql_optimizer.c:521
7 0x00007f0a366cf446 in backend_dumpproc (be=0x7f0a2c6eb180, c=0x7f0a3800b4b8, cq=0x7f08a09997e0, s=0x7f08a0a1c800) at sql_gencode.c:2355
8 0x00007f0a366c7e8e in SQLparser (c=0x7f0a3800b4b8) at sql_scenario.c:1601
9 0x00007f0a3d84d1e4 in runPhase (phase=1, c=0x7f0a3800b4b8) at mal_scenario.c:522
10 runScenarioBody (c=0x7f0a3800b4b8) at mal_scenario.c:564
11 0x00007f0a3d84e36f in runScenario (c=0x7f0a3800b4b8) at mal_scenario.c:601
12 0x00007f0a3d84e410 in MSserveClient (dummy=0x7f0a3800b4b8) at mal_session.c:430
13 0x00007f0a3cdb8efc in start_thread (arg=0x7f09f64ba700) at pthread_create.c:304
14 0x00007f0a3caf359d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
15 0x0000000000000000 in ?? ()
Comment 18270
Date: 2012-12-18 23:56:51 +0100
From: Valerio Aimale <>
this is how I compiled it
CFLAGS="-g -O4 -march=opteron" CXXFLAGS="-g -O4 -march=opteron" ./configure --prefix=/usr/local/pkg/MonetDB-11.13.5-debug --with-readline=/usr --enable-odbc --with-pthread=/usr --enable-debug --enable-optimize
Comment 18271
Date: 2012-12-19 00:11:12 +0100
From: Valerio Aimale <>
Same when compiled with -O4 -g
==================================
[Thread debugging using libthread_db enabled]
Core was generated by `/usr/local/pkg/MonetDB-11.13.5-debug/bin/mserver5 --set gdk_dbfarm /data1/monet'.
Program terminated with signal 11, Segmentation fault.
0 __strncmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:214
214 ../sysdeps/x86_64/multiarch/../strcmp.S: No such file or directory.
in ../sysdeps/x86_64/multiarch/../strcmp.S
(gdb) bt
0 __strncmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:214
1 0x00007fe9a1c152f8 in putName (nme=0x7fe99a938c9b "stdout", len=6) at mal_namespace.c:239
2 0x00007fe9a1bf6ee6 in newStmt (mb=0x7fe851753e00, module=0x7fe99a93a58a "io", name=0x7fe99a938c9b "stdout") at mal_builder.c:59
3 0x00007fe99a850a0e in _dumpstmt (sql=, mb=0x7fe851753e00, s=0x7fe8517848c0) at sql_gencode.c:2016
4 0x00007fe99a851892 in _dumpstmt (s=, mb=, sql=) at sql_gencode.c:707
5 backend_dumpstmt (be=0x7fe988d89f60, mb=0x7fe851753e00, s=0x7fe8517848c0) at sql_gencode.c:2206
6 0x00007fe99a8521b6 in backend_dumpproc (be=0x7fe988d89f60, c=0x7fe99c18ed78, cq=0x7fe851751a90, s=0x7fe8517848c0) at sql_gencode.c:2330
7 0x00007fe99a84adf8 in SQLparser (c=0x7fe99c18ed78) at sql_scenario.c:1601
8 0x00007fe9a1c2c1e6 in runPhase (phase=1, c=0x7fe99a938a63) at mal_scenario.c:522
9 runScenarioBody (c=0x7fe99a938a63) at mal_scenario.c:564
10 0x00007fe9a1c2d325 in runScenario (c=0x7fe99c18ed78) at mal_scenario.c:601
11 0x00007fe9a1c2d3e0 in MSserveClient (dummy=0x7fe99c18ed78) at mal_session.c:430
12 0x00007fe99f69cefc in start_thread (arg=0x7fe959676700) at pthread_create.c:304
13 0x00007fe99f3d759d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
14 0x0000000000000000 in ?? ()
Comment 18272
Date: 2012-12-19 00:11:55 +0100
From: Valerio Aimale <>
I meant to say "same when compiled with gcc 4.7.2 and -O4 -g"
Comment 18273
Date: 2012-12-19 00:28:32 +0100
From: Valerio Aimale <>
Martin,
this is compiled with -O -g with gcc 4.7.2. As you can see, there were two threads inside the __strncmp_sse2 (): threads 31 and 1. This causes, I think, the SIGSEGV
(gdb) info threads
Id Target Id Frame
44 Thread 0x7f93cb20c700 (LWP 45084) 0x00007f93d096c613 in select () at ../sysdeps/unix/syscall-template.S:82
43 Thread 0x7f93cdb85700 (LWP 45083) 0x00007f93d096c613 in select () at ../sysdeps/unix/syscall-template.S:82
42 Thread 0x7f93c91fc700 (LWP 45181) __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
41 Thread 0x7f93c93fd700 (LWP 45179) 0x00007f93d2c29028 in BATsample (b=0x7f92357de0f8, n=128) at gdk_sample.c:83
40 Thread 0x7f93c95fe700 (LWP 45168) 0x00007f93d3136540 in putName (nme=0x1620f90 "str", len=3) at mal_namespace.c:234
39 Thread 0x7f93c97ff700 (LWP 45158) 0x00007f93d2c29030 in BATsample (b=0x7f924de07478, n=128) at gdk_sample.c:79
38 Thread 0x7f93cb00b700 (LWP 45085) 0x00007f93d096c613 in select () at ../sysdeps/unix/syscall-template.S:82
37 Thread 0x7f93d3b26740 (LWP 45082) 0x00007f93d096c613 in select () at ../sysdeps/unix/syscall-template.S:82
36 Thread 0x7f93c9e02700 (LWP 45144) BATkdiff (l=0x807ee0, r=0x15c7300) at gdk_setop.mx:860
35 Thread 0x7f93c9c01700 (LWP 45152) 0x00007f93d2c28fe1 in BATsample (b=0x7f9252ba5688, n=128) at gdk_sample.c:83
34 Thread 0x7f93ca003700 (LWP 45141) 0x00007f93d2c29028 in BATsample (b=0x7f92357a8108, n=128) at gdk_sample.c:83
33 Thread 0x7f93ca204700 (LWP 45138) GDKfree (blk=0x7f92357dd640) at gdk_utils.c:887
32 Thread 0x7f93ca405700 (LWP 45135) 0x00007f93d2c29000 in BATsample (b=0x7f9252b99aa8, n=128) at gdk_sample.c:83
31 Thread 0x7f93ca606700 (LWP 45129) __strncmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:215
30 Thread 0x7f93ca807700 (LWP 45127) 0x00007f93d3136544 in putName (nme=0x1620f90 "str", len=3) at mal_namespace.c:234
29 Thread 0x7f93caa08700 (LWP 45125) 0x00007f93d2c29028 in BATsample (b=0x7f9239a6a9b8, n=128) at gdk_sample.c:83
28 Thread 0x7f93cac09700 (LWP 45120) 0x00007f93d31245d8 in setLifespan (mb=0x7f923dc39d40) at mal_function.c:704
27 Thread 0x7f93cae0a700 (LWP 45115) BATsample (b=0x7f924a0571c8, n=128) at gdk_sample.c:79
26 Thread 0x7f9388cad700 (LWP 45214) 0x00007f93d2c29028 in BATsample (b=0x7f925ba26cd8, n=128) at gdk_sample.c:83
25 Thread 0x7f9388eae700 (LWP 45213) 0x00007f93d3136540 in putName (nme=0x1620f90 "str", len=3) at mal_namespace.c:234
24 Thread 0x7f93bbdfe700 (LWP 45198) __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
23 Thread 0x7f93890af700 (LWP 45212) 0x00007f93d2c29028 in BATsample (b=0x7f924a057ec8, n=128) at gdk_sample.c:83
22 Thread 0x7f93892b0700 (LWP 45211) 0x00007f93d2c29030 in BATsample (b=0x7f9239a78958, n=128) at gdk_sample.c:79
21 Thread 0x7f93894b1700 (LWP 45210) 0x00007f93d2c29028 in BATsample (b=0x7f925ec105a8, n=128) at gdk_sample.c:83
20 Thread 0x7f93896b2700 (LWP 45209) 0x00007f93d2c2900a in BATsample (b=0x7f9231752df8, n=128) at gdk_sample.c:83
19 Thread 0x7f939eb13700 (LWP 45208) BATsample (b=0x7f925ba235a8, n=128) at gdk_sample.c:79
18 Thread 0x7f93babf5700 (LWP 45207) 0x00007f93d2c29028 in BATsample (b=0x7f922972e7f8, n=128) at gdk_sample.c:83
17 Thread 0x7f93baff7700 (LWP 45205) 0x00007f93d313652e in putName (nme=0x1620f90 "str", len=3) at mal_namespace.c:234
16 Thread 0x7f93badf6700 (LWP 45206) BATsample (b=0x7f923dc97618, n=128) at gdk_sample.c:80
15 Thread 0x7f93bb1f8700 (LWP 45204) 0x00007f93d2c29028 in BATsample (b=0x7f924a04bf18, n=128) at gdk_sample.c:83
14 Thread 0x7f93bb3f9700 (LWP 45203) __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
13 Thread 0x7f93bb9fc700 (LWP 45200) __lll_unlock_wake () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:373
12 Thread 0x7f93bb5fa700 (LWP 45202) 0x00007f93d3136544 in putName (nme=0x1620f90 "str", len=3) at mal_namespace.c:234
11 Thread 0x7f93bb7fb700 (LWP 45201) 0x00007f93d2c29028 in BATsample (b=0x7f9239a70518, n=128) at gdk_sample.c:83
10 Thread 0x7f93bbbfd700 (LWP 45199) 0x00007f93d2c29030 in BATsample (b=0x7f923dc6f508, n=128) at gdk_sample.c:79
9 Thread 0x7f93bbfff700 (LWP 45197) 0x00007f93d2c29028 in BATsample (b=0x7f9229756508, n=128) at gdk_sample.c:83
8 Thread 0x7f93c83f5700 (LWP 45196) 0x00007f93d2c2900a in BATsample (b=0x7f925ec08ee8, n=128) at gdk_sample.c:83
7 Thread 0x7f93c85f6700 (LWP 45195) BATsample (b=0x7f923db22e48, n=128) at gdk_sample.c:80
6 Thread 0x7f93c87f7700 (LWP 45194) 0x00007f93d2c29028 in BATsample (b=0x7f925b994d88, n=128) at gdk_sample.c:83
5 Thread 0x7f93c89f8700 (LWP 45193) BATsample (b=0x7f9245e03078, n=128) at gdk_sample.c:79
4 Thread 0x7f93c8bf9700 (LWP 45192) BATsample (b=0x7f92568811c8, n=128) at gdk_sample.c:79
3 Thread 0x7f93c9a00700 (LWP 45155) exp_create (sa=, type=1) at rel_exp.c:39
2 Thread 0x7f93c8dfa700 (LWP 45190) __lll_unlock_wake () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:373
(gdb) bt
0 __strncmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:214
1 0x00007f93d313650d in putName (nme=0x7f924a0fe700 "s884_16", len=7) at mal_namespace.c:239
2 0x00007f93cbe0d041 in backend_dumpproc (be=0x7f93c0bc1230, c=0x7f93cd7291a0, cq=0x7f924a056640, s=0x7f924a0f5120) at sql_gencode.c:2290
3 0x00007f93cbe0658d in SQLparser (c=0x7f93cd7291a0) at sql_scenario.c:1601
4 0x00007f93d31488e7 in runPhase (c=, phase=) at mal_scenario.c:522
5 0x00007f93d3148a30 in runScenarioBody (c=0x7f924a0fe4c8) at mal_scenario.c:564
6 0x00007f93d31495bd in runScenario (c=0x7f93cd7291a0) at mal_scenario.c:601
7 0x00007f93d3149705 in MSserveClient (dummy=0x7f93cd7291a0) at mal_session.c:430
8 0x00007f93d0c38efc in start_thread (arg=0x7f93c8ffb700) at pthread_create.c:304
9 0x00007f93d097359d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
10 0x0000000000000000 in ?? ()
Comment 18275
Date: 2012-12-19 11:23:11 +0100
From: @mlkersten
Thank you for the detailed analysis.
Attempting to reproduce the error on my desktop machine and Feb2013 code.
I built the complete 5M row database as prescribed.
I restarted the server and attached gdb.
I ran the sequence of 40 concurrent users as prescribed, watching it using 'top'
It all seems to run smoothly so far (still running).
One of the effects of this test is that the namespace becomes polluted by a large number of query names, e.g. s884_16, which are never garbage collected.
This leads to a call to expandNamespace, which does not re-alloc, but performs a
malloc+copy+free. This could explain the SIGSEGV.
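The suspected hazard can be illustrated with a deterministic Python model. This is only a sketch of the hypothesis stated above, not a confirmed diagnosis: `NameTable` and `reader_during_expand` are hypothetical names, and clearing the old list stands in for the free() of the old array. In C, the stale access would be a read of freed memory rather than an exception.

```python
class NameTable:
    """Toy stand-in for a growable name array (not MonetDB code)."""

    def __init__(self, names):
        self.names = list(names)

    def expand_copy_free(self):
        old = self.names
        new = old + [None] * len(old)   # "malloc" a larger array and "copy"
        self.names = new                # switch the table to the new array
        old.clear()                     # stands in for free() of the old array


def reader_during_expand(table):
    """Return what a reader observes if the table grows in mid-lookup."""
    snapshot = table.names              # reader has loaded the array pointer
    table.expand_copy_free()            # meanwhile another thread expands
    try:
        return snapshot[0]              # reader dereferences the stale pointer
    except IndexError:
        return "stale pointer: entry is gone"
```

Calling `reader_during_expand(NameTable(["str", "io"]))` returns the stale-pointer message, while the table itself still holds both names in its new array.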
Resolutions:
Comment 18276
Date: 2012-12-19 11:35:56 +0100
From: @mlkersten
The test run finished without causing a segfault.
during expandNamespace.
Comment 18277
Date: 2012-12-19 11:40:48 +0100
From: @drstmane
Valerio,
could you compile MonetDB such that it does not use the SSE2 version of strncmp, e.g., by not using -march=opteron, and see whether the problem (segfault) persists?
Martin,
if done correctly (including checking for success and gracefully bailing out otherwise), malloc, copy, free (instead of realloc) by themselves should not cause any segfaults.
Comment 18278
Date: 2012-12-19 11:43:32 +0100
From: @drstmane
Valerio,
would you have an option to upgrade to Oct2012-SP1 (http://dev.monetdb.org/downloads/sources/Oct2012-SP1/) or even the upcoming Oct2012-SP2 (http://dev.monetdb.org/downloads/testing/sources/Oct2012-SP2/), and check whether the problem still persists (with -march=opteron, i.e., with __strncmp_sse2())?
Comment 18279
Date: 2012-12-19 11:50:56 +0100
From: @drstmane
Oops, I just saw that Valerio already tested Oct2012-SP1.
Comment 18280
Date: 2012-12-19 17:15:08 +0100
From: Valerio Aimale <>
Re: crashing in __strncmp_sse2, my opinion is that it is just an epiphenomenon, not the real cause. That is simply where two or more threads happen to meet, due to stochastic, non-deterministic behavior (execution timing, CPU load, disk speed varying over time, etc.).
As evidence of that, at times (I would say in 1 out of 10 crashes) the problem manifests not as a crash but as clients complaining of undefined namespaces; this shows that threads might meet elsewhere and "trash" namespace definitions.
When there are undefined namespaces, this is what I get on the console of the clients
[...]
TypeException:user.s989_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s990_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s991_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s992_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s993_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s994_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s995_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s996_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s997_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s998_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s999_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1000_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1001_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1002_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1003_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1004_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1005_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1006_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1007_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1008_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1009_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1010_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1011_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1012_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1013_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1014_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1015_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s1016_24[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s947_15[85]:'io.stdout' undefined in: _114:any := io.stdout()
program contains errors
TypeException:user.s948_15[85]:'io.stdout' undefined in: _114:any := io.stdout()
[...]
I will try without march=opteron definition
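One general mechanism by which unsynchronised writers can make a previously registered name disappear is a lost update. The sketch below replays such an interleaving deterministically in Python. It illustrates the class of race only and is not a diagnosis of this bug: `insert_steps`, `demo_lost_update` and the table layout are hypothetical.

```python
def insert_steps(table, name):
    """Unsynchronised insert, split where a thread could be preempted."""
    slot = table["top"]            # read the next free slot
    yield                          # another thread may run here
    table["names"][slot] = name    # store the name in that slot
    table["top"] = slot + 1        # advance the top of the table


def demo_lost_update():
    """Interleave two writers so that the second overwrites the first."""
    table = {"top": 0, "names": {}}
    a = insert_steps(table, "stdout")
    b = insert_steps(table, "s989_24")
    next(a)                        # writer A reads top == 0
    next(b)                        # writer B reads top == 0 as well
    for writer in (a, b):          # both now write slot 0; B overwrites A
        try:
            next(writer)
        except StopIteration:
            pass
    return table
```

After `demo_lost_update()`, the table contains only "s989_24" and the entry for "stdout" is gone, even though both inserts appeared to succeed.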
Comment 18281
Date: 2012-12-19 18:15:43 +0100
From: Valerio Aimale <>
Compiled with CFLAGS="-g -O4".
It still crashes in __strncmp_sse2. I guess it is an optimization in libc: it probably checks whether the CPU supports SSE2 instructions and, if so, uses __strncmp_sse2().
=================
[Thread debugging using libthread_db enabled]
Core was generated by `/usr/local/pkg/MonetDB-11.13.5-debug/bin/mserver5 --set gdk_dbfarm /data1/monet'.
Program terminated with signal 11, Segmentation fault.
0 __strncmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:214
214 ../sysdeps/x86_64/multiarch/../strcmp.S: No such file or directory.
in ../sysdeps/x86_64/multiarch/../strcmp.S
(gdb) info threads
Id Target Id Frame
44 Thread 0x7f6bf60a4700 (LWP 46533) 0x00007f6bfa729613 in select () at ../sysdeps/unix/syscall-template.S:82
43 Thread 0x7f6be3fff700 (LWP 46660) BATsample (b=0x7f697ada3418, n=128) at gdk_sample.c:79
42 Thread 0x7f6bf02ef700 (LWP 46659) 0x00007f6bfb44f115 in putName (nme=0x1618000 "str", len=3) at mal_namespace.c:238
41 Thread 0x7f6bf04f0700 (LWP 46658) BATsample (b=0x7f696086c188, n=128) at gdk_sample.c:79
40 Thread 0x7f6bf06f1700 (LWP 46657) 0x00007f6bfaf2701a in BATsample (b=0x7f69871daa38, n=128) at gdk_sample.c:83
39 Thread 0x7f6bf08f2700 (LWP 46655) 0x00007f6bfaf27023 in BATsample (b=0x7f69608a6c78, n=128) at gdk_sample.c:83
38 Thread 0x7f6bf0af3700 (LWP 46654) 0x00007f6bfaf2702b in BATsample (b=0x7f697eba7298, n=128) at gdk_sample.c:79
37 Thread 0x7f6bf0cf4700 (LWP 46637) 0x00007f6bfaf27023 in BATsample (b=0x7f698716f128, n=128) at gdk_sample.c:83
36 Thread 0x7f6bf0ef5700 (LWP 46636) BATsample (b=0x7f6976920498, n=128) at gdk_sample.c:79
35 Thread 0x7f6bf10f6700 (LWP 46633) 0x00007f6bfb44f115 in putName (nme=0x1618000 "str", len=3) at mal_namespace.c:238
34 Thread 0x7f6bf12f7700 (LWP 46632) 0x00007f6bfaf2702b in BATsample (b=0x7f696e296f28, n=128) at gdk_sample.c:79
33 Thread 0x7f6bf14f8700 (LWP 46628) 0x00007f6bfaf27023 in BATsample (b=0x7f69768ca8d8, n=128) at gdk_sample.c:83
32 Thread 0x7f6bf3709700 (LWP 46534) 0x00007f6bfa729613 in select () at ../sysdeps/unix/syscall-template.S:82
31 Thread 0x7f6bf16f9700 (LWP 46625) BATsample (b=0x7f696a451628, n=128) at gdk_sample.c:82
30 Thread 0x7f6bf3508700 (LWP 46535) 0x00007f6bfa729613 in select () at ../sysdeps/unix/syscall-template.S:82
29 Thread 0x7f6bf18fa700 (LWP 46622) 0x00007f6bfaf27023 in BATsample (b=0x7f696086afb8, n=128) at gdk_sample.c:83
28 Thread 0x7f6bf1afb700 (LWP 46618) 0x00007f6bfaf27023 in BATsample (b=0x7f698fd898d8, n=128) at gdk_sample.c:83
27 Thread 0x7f6bf1cfc700 (LWP 46616) 0x00007f6bfb44f104 in putName (nme=0x1618000 "str", len=3) at mal_namespace.c:234
26 Thread 0x7f6bf1efd700 (LWP 46607) 0x00007f6bfaf27023 in BATsample (b=0x7f698be217c8, n=128) at gdk_sample.c:83
25 Thread 0x7f6bf20fe700 (LWP 46603) 0x00007f6bfb44f104 in putName (nme=0x1618000 "str", len=3) at mal_namespace.c:234
24 Thread 0x7f6bf22ff700 (LWP 46602) BATsample (b=0x7f6983c53e18, n=128) at gdk_sample.c:79
23 Thread 0x7f6bf2500700 (LWP 46599) 0x00007f6bfaf27023 in BATsample (b=0x7f696a40efa8, n=128) at gdk_sample.c:83
22 Thread 0x7f6bf2701700 (LWP 46595) BATsample (b=0x7f69643709f8, n=128) at gdk_sample.c:83
21 Thread 0x7f6bf2902700 (LWP 46591) 0x00007f6bfaf27023 in BATsample (b=0x3711bda8, n=128) at gdk_sample.c:83
20 Thread 0x7f6bf2b03700 (LWP 46589) 0x00007f6bfaf2702b in BATsample (b=0x7f697ebab318, n=128) at gdk_sample.c:79
19 Thread 0x7f6bf2d04700 (LWP 46585) BATsample (b=0x7f695842ab48, n=128) at gdk_sample.c:79
18 Thread 0x7f6bf2f05700 (LWP 46583) 0x00007f6bfaf2702b in BATsample (b=0x7f6960872cd8, n=128) at gdk_sample.c:79
17 Thread 0x7f6bf3106700 (LWP 46575) 0x00007f6bfaf27023 in BATsample (b=0x7f69642f2ed8, n=128) at gdk_sample.c:83
16 Thread 0x7f6bf3307700 (LWP 46573) 0x00007f6bfaf27023 in BATsample (b=0x7f696a40df28, n=128) at gdk_sample.c:83
15 Thread 0x7f6bfbeb2740 (LWP 46532) 0x00007f6bfa729613 in select () at ../sysdeps/unix/syscall-template.S:82
14 Thread 0x7f6be23f1700 (LWP 46674) 0x00007f6bfb44f0f0 in putName (nme=0x1618000 "str", len=3) at mal_namespace.c:234
13 Thread 0x7f6be25f2700 (LWP 46673) 0x00007f6bfb44f104 in putName (nme=0x1618000 "str", len=3) at mal_namespace.c:234
12 Thread 0x7f6be27f3700 (LWP 46672) BATsample (b=0x7f69871437f8, n=128) at gdk_sample.c:82
11 Thread 0x7f6be29f4700 (LWP 46671) 0x00007f6bfaf27023 in BATsample (b=0x7f696a412c88, n=128) at gdk_sample.c:83
10 Thread 0x7f6be2bf5700 (LWP 46670) BATsample (b=0x7f6972ad88b8, n=128) at gdk_sample.c:80
9 Thread 0x7f6be2df6700 (LWP 46669) 0x00007f6bfb44f104 in putName (nme=0x7f6bfb9bc26c "sortReverse", len=11) at mal_namespace.c:234
8 Thread 0x7f6be2ff7700 (LWP 46668) 0x00007f6bfaf27023 in BATsample (b=0x7f69642e6a58, n=128) at gdk_sample.c:83
7 Thread 0x7f6be31f8700 (LWP 46667) 0x00007f6bfaf27023 in BATsample (b=0x7f696e1ffcf8, n=128) at gdk_sample.c:83
6 Thread 0x7f6be33f9700 (LWP 46666) 0x00007f6bfaf27023 in BATsample (b=0x7f697ad9f4a8, n=128) at gdk_sample.c:83
5 Thread 0x7f6be35fa700 (LWP 46665) 0x00007f6bfaf27023 in BATsample (b=0x3711ddb8, n=128) at gdk_sample.c:83
4 Thread 0x7f6be37fb700 (LWP 46664) 0x00007f6bfb44f104 in putName (nme=0x7f6bf43ed597 "stdout", len=6) at mal_namespace.c:234
3 Thread 0x7f6be39fc700 (LWP 46663) BATsample (b=0x7f69583af298, n=128) at gdk_sample.c:79
2 Thread 0x7f6be3bfd700 (LWP 46662) BATsample (b=0x7f6964370018, n=128) at gdk_sample.c:83
(gdb) bt
0 __strncmp_sse2 () at ../sysdeps/x86_64/multiarch/../strcmp.S:214
1 0x00007f6bfb44f125 in putName (nme=0x7f6bf43ed597 "stdout", len=6) at mal_namespace.c:239
2 0x00007f6bfb430da7 in newStmt (mb=0x7f698be1bec0, module=0x7f6bf43eeeba "io", name=0x7f6bf43ed597 "stdout") at mal_builder.c:59
3 0x00007f6bf430cd37 in _dumpstmt (sql=, mb=0x7f698be1bec0, s=0x7f698be33820) at sql_gencode.c:2016
4 0x00007f6bf430d9f2 in _dumpstmt (s=, mb=, sql=) at sql_gencode.c:707
5 backend_dumpstmt (be=0x7f6be8a3c460, mb=0x7f698be1bec0, s=0x7f698be33820) at sql_gencode.c:2206
6 0x00007f6bf430e55c in backend_dumpproc (be=0x7f6be8a3c460, c=0x7f6bf5c4a3a8, cq=0x7f698be1b090, s=0x7f698be33820) at sql_gencode.c:2330
7 0x00007f6bf4306f36 in SQLparser (c=0x7f6bf5c4a3a8) at sql_scenario.c:1601
8 0x00007f6bfb463bbc in runPhase (phase=1, c=0x7f6bf5c4a3a8) at mal_scenario.c:522
9 runScenarioBody (c=0x7f6bf5c4a3a8) at mal_scenario.c:564
10 0x00007f6bfb464d0f in runScenario (c=0x7f6bf5c4a3a8) at mal_scenario.c:601
11 0x00007f6bfb464db8 in MSserveClient (dummy=0x7f6bf5c4a3a8) at mal_session.c:430
12 0x00007f6bfa9f5efc in start_thread (arg=0x7f6be3dfe700) at pthread_create.c:304
13 0x00007f6bfa73059d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
14 0x0000000000000000 in ?? ()
Comment 18282
Date: 2012-12-19 18:46:18 +0100
From: @drstmane
Thanks!
Since we cannot yet reproduce the segfault, could you try whether you can reproduce it also with a non-optimized build (from scratch), preferably configured without setting CFLAGS and using options --disable-optimize --enable-debug --enable-assert ?
If that does not trigger a segfault, also an optimized build (from scratch) with CFLAGS="-g -O4 -mno-sse2" would be interesting ...
Comment 18283
Date: 2012-12-19 18:51:34 +0100
From: @grobian
The libc is compiled with SSE support, so it seems unlikely compilation settings for mserver5 will make any difference in it (libc) using the sse-optimised strcmp.
Comment 18284
Date: 2012-12-19 18:54:28 +0100
From: Valerio Aimale <>
I agree. The sse2 optimization is in libc.
Comment 18285
Date: 2012-12-19 19:02:07 +0100
From: @drstmane
good point. my fault. thanks.
Comment 18286
Date: 2012-12-19 19:42:24 +0100
From: Valerio Aimale <>
I want you guys to have the full log of clients' stderr when the crash manifests as undefined namespaces. You can download it from
http://www.aimale.com/log.xz
It is 10 MB compressed and 0.5 GB uncompressed.
I'm not sure it is that informative.
Comment 18287
Date: 2012-12-19 19:46:57 +0100
From: Valerio Aimale <>
Compiled with --disable-optimize --enable-debug --enable-assert from a pristine source tarball, as requested. This crashes as:
(gdb) info threads
Id Target Id Frame
44 Thread 0x7f227e215700 (LWP 17952) 0x00007f22854b9613 in select () at ../sysdeps/unix/syscall-template.S:82
43 Thread 0x7f228745d740 (LWP 17949) 0x00007f22854b9613 in select () at ../sysdeps/unix/syscall-template.S:82
42 Thread 0x7f22759ec700 (LWP 18089) 0x00007f22854bcac7 in mprotect () at ../sysdeps/unix/syscall-template.S:82
41 Thread 0x7f2280e2e700 (LWP 17950) 0x00007f22854b9613 in select () at ../sysdeps/unix/syscall-template.S:82
40 Thread 0x7f227ca09700 (LWP 18039) BATsample (b=0x7f22636d4508, n=128) at gdk_sample.c:79
39 Thread 0x7f227cc0a700 (LWP 18036) 0x00007f2285d414aa in BATsample (b=0x7f21fc114b08, n=128) at gdk_sample.c:83
38 Thread 0x7f227ce0b700 (LWP 18035) 0x00007f2285d414ba in BATsample (b=0x7f21fc0c68c8, n=128) at gdk_sample.c:79
37 Thread 0x7f227d00c700 (LWP 18023) 0x00007f2285d4147e in BATsample (b=0x7f21f85941a8, n=128) at gdk_sample.c:83
36 Thread 0x7f227d20d700 (LWP 18021) BATsample (b=0x7f21f8548158, n=128) at gdk_sample.c:80
35 Thread 0x7f227e416700 (LWP 17951) 0x00007f22854b9613 in select () at ../sysdeps/unix/syscall-template.S:82
34 Thread 0x7f227d40e700 (LWP 18019) 0x00007f2285d414aa in BATsample (b=0x7f21fc13c2f8, n=128) at gdk_sample.c:83
33 Thread 0x7f227d60f700 (LWP 18015) 0x00007f2285d41451 in BATsample (b=0x7f22672b64d8, n=128) at gdk_sample.c:83
32 Thread 0x7f227d810700 (LWP 18008) 0x00007f2285d414c5 in BATsample (b=0x7339c98, n=128) at gdk_sample.c:79
31 Thread 0x7f227da11700 (LWP 18007) 0x00007f2285d4149e in BATsample (b=0x7f225f2b4898, n=128) at gdk_sample.c:83
30 Thread 0x7f227dc12700 (LWP 18002) 0x00007f2285d41451 in BATsample (b=0x7f225f3992f8, n=128) at gdk_sample.c:83
29 Thread 0x7f227de13700 (LWP 17999) 0x00007f2285d414c7 in BATsample (b=0x7f2257890a68, n=128) at gdk_sample.c:79
28 Thread 0x7f227e014700 (LWP 17994) 0x00007f2285d41472 in BATsample (b=0x7f2257866748, n=128) at gdk_sample.c:83
27 Thread 0x7f22751e8700 (LWP 18093) 0x00007f2285d41483 in BATsample (b=0x7f2200b2ee28, n=128) at gdk_sample.c:83
26 Thread 0x7f22753e9700 (LWP 18092) 0x00007f2285d414aa in BATsample (b=0x7f225783d4d8, n=128) at gdk_sample.c:83
25 Thread 0x7f22755ea700 (LWP 18091) 0x00007f2285d414aa in BATsample (b=0x7f224f7925f8, n=128) at gdk_sample.c:83
24 Thread 0x7f22757eb700 (LWP 18090) 0x00007f2285d4149e in BATsample (b=0x7f22672baa78, n=128) at gdk_sample.c:83
23 Thread 0x7f2275bed700 (LWP 18088) 0x00007f2285d414ba in BATsample (b=0x7f225774ca48, n=128) at gdk_sample.c:79
22 Thread 0x7f2275dee700 (LWP 18087) 0x00007f2285d41483 in BATsample (b=0x7f2213d2e9c8, n=128) at gdk_sample.c:83
21 Thread 0x7f2275fef700 (LWP 18086) 0x00007f2285d41472 in BATsample (b=0x7f226aed1ee8, n=128) at gdk_sample.c:83
20 Thread 0x7f22761f0700 (LWP 18085) 0x00007f2285d4149a in BATsample (b=0x7f2267301c08, n=128) at gdk_sample.c:83
19 Thread 0x7f22763f1700 (LWP 18084) BATsample (b=0x7f22637c83a8, n=128) at gdk_sample.c:79
18 Thread 0x7f22765f2700 (LWP 18083) 0x00007f2285d4147e in BATsample (b=0x7f226aef7858, n=128) at gdk_sample.c:83
17 Thread 0x7f22767f3700 (LWP 18082) 0x00007f2285d4149e in BATsample (b=0x7f22637cf848, n=128) at gdk_sample.c:83
16 Thread 0x7f22769f4700 (LWP 18081) 0x00007f2285d4147e in BATsample (b=0x7f224f792da8, n=128) at gdk_sample.c:83
15 Thread 0x7f2276bf5700 (LWP 18080) 0x00007f2285d414ba in BATsample (b=0x7f226fe67428, n=128) at gdk_sample.c:79
14 Thread 0x7f2276df6700 (LWP 18079) 0x00007f2285d414ba in BATsample (b=0x7f2200a6fc28, n=128) at gdk_sample.c:79
13 Thread 0x7f2276ff7700 (LWP 18078) 0x00007f2285d414c7 in BATsample (b=0x7f2213cdf6f8, n=128) at gdk_sample.c:79
12 Thread 0x7f22771f8700 (LWP 18075) 0x00007f2285d4149e in BATsample (b=0x7f226fe93278, n=128) at gdk_sample.c:83
11 Thread 0x7f22773f9700 (LWP 18073) 0x00007f2285d41451 in BATsample (b=0x7f2213ce6ce8, n=128) at gdk_sample.c:83
10 Thread 0x7f22775fa700 (LWP 18072) 0x00007f2285d414aa in BATsample (b=0x7f224f688ec8, n=128) at gdk_sample.c:83
9 Thread 0x7f22777fb700 (LWP 18068) 0x00007f2285d4149e in BATsample (b=0x7f2263664388, n=128) at gdk_sample.c:83
8 Thread 0x7f22779fc700 (LWP 18064) 0x00007f2285d414c2 in BATsample (b=0x7f226fe3fe28, n=128) at gdk_sample.c:79
7 Thread 0x7f2277bfd700 (LWP 18058) 0x00007f2285d4149a in BATsample (b=0x7f224f7b3f78, n=128) at gdk_sample.c:83
6 Thread 0x7f2277dfe700 (LWP 18054) 0x00007f2285d414ba in BATsample (b=0x7f22672dc928, n=128) at gdk_sample.c:79
5 Thread 0x7f2277fff700 (LWP 18051) BATsample (b=0x7f221544b358, n=128) at gdk_sample.c:80
4 Thread 0x7f227c205700 (LWP 18049) 0x00007f2285d414c7 in BATsample (b=0x73615c8, n=128) at gdk_sample.c:79
3 Thread 0x7f227c406700 (LWP 18046) 0x00007f2285d4149a in BATsample (b=0x7f2263750628, n=128) at gdk_sample.c:83
2 Thread 0x7f227c607700 (LWP 18044) 0x00007f2285d4149e in BATsample (b=0x7f21fc0eb918, n=128) at gdk_sample.c:83
(gdb) bt
0 0x00007f22864366e7 in putName (nme=0x2cd5fc0 "str", len=3) at mal_namespace.c:234
1 0x00007f228640a5d2 in newStmt1 (mb=0x7f22154734a0, module=0x1f32f10 "calc", name=0x2cd5fc0 "str") at mal_builder.c:71
2 0x00007f227f022578 in _dumpstmt (sql=0x7f22709be8e0, mb=0x7f22154734a0, s=0x7f22154a6700) at sql_gencode.c:1765
3 0x00007f227f024720 in backend_dumpstmt (be=0x7f22709be8e0, mb=0x7f22154734a0, s=0x7f221549eb20) at sql_gencode.c:2206
4 0x00007f227f02503a in backend_dumpproc (be=0x7f22709be8e0, c=0x7f22809d1858, cq=0x7f22153d3170, s=0x7f221549eb20) at sql_gencode.c:2330
5 0x00007f227f018bcc in SQLparser (c=0x7f22809d1858) at sql_scenario.c:1601
6 0x00007f2286451549 in runPhase (c=0x7f22809d1858, phase=1) at mal_scenario.c:522
7 0x00007f2286451681 in runScenarioBody (c=0x7f22809d1858) at mal_scenario.c:564
8 0x00007f22864518fa in runScenario (c=0x7f22809d1858) at mal_scenario.c:601
9 0x00007f228645282e in MSserveClient (dummy=0x7f22809d1858) at mal_session.c:430
10 0x00007f2285785efc in start_thread (arg=0x7f227c808700) at pthread_create.c:304
11 0x00007f22854c059d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
12 0x0000000000000000 in ?? ()
This crash is solved by the patch I previously posted:
====================================================
--- mal_namespace.c.orig 2012-10-28 09:24:48.555393313 -0600
+++ mal_namespace.c 2012-10-28 11:53:40.792026089 -0600
@@ -231,7 +231,8 @@
#ifdef BACKUP
chkName(l);
#endif
@@ -266,6 +267,9 @@
*/
return namespace.nme[l];
}
======================================================
Comment 18288
Date: 2012-12-19 20:01:38 +0100
From: @drstmane
As far as I know, a similar patch is in the upcoming Oct2012-SP2 release
cf., http://dev.monetdb.org/hg/MonetDB/rev/24c408dcf765
Could you try that one?
cf., http://dev.monetdb.org/downloads/testing/sources/Oct2012-SP2/
Comment 18289
Date: 2012-12-19 20:25:13 +0100
From: Valerio Aimale <>
Stefan,
I think I've tried something similar. If you look at Comment 7 above, that was my first attempt at a fix. However, bracketing the whole loop
for(l= nme[0]; l && namespace.nme[l]; l= namespace.link[l]){
with thread isolation pays a significant performance price: all threads pile up at the loop entrance.
Instead, by bracketing only
l = namespace.link[l];
performance is virtually unmodified.
Seeing the crashes we saw with -O4,
strncmp(nme,namespace.nme[l],len) == 0
might have to be bracketed with thread isolation too.
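For reference, the coarse bracketing discussed above can be sketched as follows. This is a minimal illustration, not the MonetDB code: the table layout, the names `ns_name`, `ns_link`, `ns_head`, `put_name_coarse` and the capacity constants are all assumptions, and a plain pthread mutex stands in for MonetDB's own lock primitives. The lock covers the whole chain walk and the insert, and the result is returned only after leaving the loop and unlocking.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define NS_BUCKETS  256     /* one chain head per first byte (assumption) */
#define NS_MAXNAMES 1024    /* hypothetical capacity */
#define NS_MAXLEN   64      /* hypothetical maximum name length + 1 */

/* Hypothetical stand-in for the shared namespace table.
 * Index 0 is reserved and means "end of chain". */
static char ns_name[NS_MAXNAMES][NS_MAXLEN];
static int  ns_link[NS_MAXNAMES];
static int  ns_head[NS_BUCKETS];
static int  ns_used = 1;
static pthread_mutex_t ns_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return the interned copy of nme[0..len), inserting it if absent.
 * Coarse variant: one lock covers the chain walk and the insert. */
const char *put_name_coarse(const char *nme, size_t len)
{
    const char *res = NULL;
    int h, l;

    assert(nme != NULL && len > 0 && len < NS_MAXLEN);
    h = (unsigned char) nme[0];

    pthread_mutex_lock(&ns_lock);
    for (l = ns_head[h]; l != 0; l = ns_link[l]) {
        if (strlen(ns_name[l]) == len && strncmp(nme, ns_name[l], len) == 0) {
            res = ns_name[l];
            break;          /* leave the loop; return only after unlocking */
        }
    }
    if (res == NULL && ns_used < NS_MAXNAMES) {
        l = ns_used++;
        memcpy(ns_name[l], nme, len);
        ns_name[l][len] = '\0';
        ns_link[l] = ns_head[h];    /* prepend to the chain */
        ns_head[h] = l;
        res = ns_name[l];
    }
    pthread_mutex_unlock(&ns_lock);
    return res;             /* NULL only if the table is full */
}
```

Every caller serialises on `ns_lock`, which is the pile-up described above. Whether bracketing only the `link` read is sufficient depends on how inserts publish new entries, which this sketch does not settle.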
Comment 18290
Date: 2012-12-19 20:46:47 +0100
From: @mlkersten
I am preparing a different solution to the namespace implementation to tackle
the two problems noted before.
It requires a different approach to be safe under such stress situations.
regards, Martin
Comment 18291
Date: 2012-12-20 21:57:49 +0100
From: @mlkersten
Changeset bd3853eda3ee made by Martin Kersten mk@cwi.nl in the MonetDB repo, refers to this bug.
For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=bd3853eda3ee
Changeset description:
Comment 18292
Date: 2012-12-20 21:58:14 +0100
From: @mlkersten
A new namespace manager has been introduced, which allows for concurrent reads without locks. Writes in the structure are protected with locks. It significantly improves the running time of this test case.
Startup takes slightly longer, because we now use separate malloc'ed structures.
The patch does not address the current SQL limitation to produce unique persistent names for all queries once cached.
Please confirm effectiveness of this patch.
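The general technique described here (readers take no lock, writers serialise on one) can be sketched as below. This is an illustration under stated assumptions, not the code from the changeset: the node type `name_node`, the functions `find_name` and `intern_name`, the bucket count, and the use of C11 atomics with a pthread mutex are all hypothetical choices. It relies on nodes being individually malloc'ed, fully initialised before publication, and never modified or freed afterwards.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define BUCKETS 256     /* one chain per first byte (assumption) */

/* Hypothetical node: immutable once it has been published. */
typedef struct name_node {
    struct name_node *next;   /* set before publication, then read-only */
    size_t len;
    char text[];              /* NUL-terminated copy of the name */
} name_node;

static _Atomic(name_node *) bucket[BUCKETS];
static pthread_mutex_t write_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lock-free lookup: the acquire load pairs with the release store below,
 * so a reader sees only fully initialised nodes. */
static const char *find_name(const char *nme, size_t len)
{
    name_node *n = atomic_load_explicit(&bucket[(unsigned char) nme[0]],
                                        memory_order_acquire);
    for (; n != NULL; n = n->next)
        if (n->len == len && memcmp(n->text, nme, len) == 0)
            return n->text;
    return NULL;
}

/* Return the interned copy of nme[0..len), inserting it if absent. */
const char *intern_name(const char *nme, size_t len)
{
    const char *hit;
    name_node *n;
    int h;

    assert(nme != NULL && len > 0);
    h = (unsigned char) nme[0];

    if ((hit = find_name(nme, len)) != NULL)
        return hit;                     /* fast path: no lock taken */

    pthread_mutex_lock(&write_lock);
    hit = find_name(nme, len);          /* recheck: another writer may have won */
    if (hit == NULL && (n = malloc(sizeof *n + len + 1)) != NULL) {
        memcpy(n->text, nme, len);
        n->text[len] = '\0';
        n->len = len;
        n->next = atomic_load_explicit(&bucket[h], memory_order_relaxed);
        /* Publish the finished node as the new chain head. */
        atomic_store_explicit(&bucket[h], n, memory_order_release);
        hit = n->text;
    }
    pthread_mutex_unlock(&write_lock);
    return hit;                         /* NULL only if malloc failed */
}
```

Because new nodes are prepended and existing ones never change, a reader that started walking a chain before an insert simply misses the new node and falls through to the locked recheck.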
Comment 18293
Date: 2012-12-20 22:03:12 +0100
From: @mlkersten
A triple run of the experiment with the new namespace manager does not lead to segfaults on my desktop machine.
Comment 18294
Date: 2012-12-20 22:04:15 +0100
From: Valerio Aimale <>
Thanks, Martin. I will test and report
Comment 18295
Date: 2012-12-20 22:49:33 +0100
From: Valerio Aimale <>
Martin, I have plugged in this file
http://dev.monetdb.org/hg/MonetDB/raw-file/bd3853eda3ee/monetdb5/mal/mal_namespace.c
into a pristine source tree of MonetDB 11.13.5
The variable mal_namespaceLock is used but not defined in the new mal_namespace.c , preventing compilation.
Comment 18296
Date: 2012-12-20 22:56:37 +0100
From: @mlkersten
I had patched the Feb2013 branch. This includes the following code.
mal.c:MT_Lock mal_namespaceLock;
mal.c: MT_lock_init( &mal_namespaceLock, "mal_namespaceLock");
mal.h:mal_export MT_Lock mal_namespaceLock;
mal_namespace.c: MT_lock_set(&mal_namespaceLock, "finishNamespace");
mal_namespace.c: MT_lock_unset(&mal_namespaceLock, "finishNamespace");
mal_namespace.c: MT_lock_set(&mal_namespaceLock, "putName");
mal_namespace.c: MT_lock_unset(&mal_namespaceLock, "putName");
Comment 18297
Date: 2012-12-21 01:06:44 +0100
From: Valerio Aimale <>
Martin,
the first tests are very good. I ran the usual 40 concurrent clients 4 times. Only once I had a crash:
2012-12-20 15:32:01 ERR crashdb[35830]: mserver5: opt_pipes.c:520: compileOptimizer: Assertion `c != ((void *)0)' failed.
2012-12-20 15:32:01 ERR crashdb[35830]: mserver5: opt_pipes.c:520: compileOptimizer: Assertion `c != ((void *)0)' failed.
2012-12-20 15:32:01 ERR crashdb[35830]: mserver5: opt_pipes.c:520: compileOptimizer: Assertion `c != ((void *)0)' failed.
2012-12-20 15:32:01 ERR crashdb[35830]: mserver5: opt_pipes.c:520: compileOptimizer: Assertion `c != ((void *)0)' failed.
2012-12-20 15:32:01 ERR crashdb[35830]: mserver5: opt_pipes.c:520: compileOptimizer: Assertion `c != ((void *)0)' failed.
2012-12-20 15:32:01 ERR crashdb[35830]: mserver5: opt_pipes.c:520: compileOptimizer: Assertion `c != ((void *)0)' failed.
2012-12-20 15:32:01 MSG merovingian[33044]: database 'crashdb' (35830) was killed by signal SIGABRT
The other three times it worked very well, without a glitch.
Comment 18298
Date: 2012-12-21 08:30:06 +0100
From: @mlkersten
Indeed. Internally, a client record was taken from the pool for compilation. Under the stress test in question, there may be no client slot left by the time you reach that point. A patch is in testing.
Comment 18299
Date: 2012-12-21 09:17:29 +0100
From: @mlkersten
Patch committed. It uses a static client record instead now.
The (single) test run passes.
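The idea of replacing a pooled record with a static one can be sketched as below. This is only an illustration of the pattern, not the committed patch: `client_rec`, `pool_acquire`, `with_compile_client`, `record_id` and the pool size are hypothetical, and the serialising mutex is an assumption about how a single shared record would be kept safe.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_CLIENTS 4       /* hypothetical pool size */

typedef struct {            /* hypothetical client record */
    int in_use;
    int id;
} client_rec;

static client_rec pool[MAX_CLIENTS];
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take a free slot from the shared pool, or NULL when all are busy.
 * Under heavy load this is the call that can come back empty. */
client_rec *pool_acquire(void)
{
    client_rec *c = NULL;

    pthread_mutex_lock(&pool_lock);
    for (int i = 0; i < MAX_CLIENTS && c == NULL; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            pool[i].id = i;
            c = &pool[i];
        }
    }
    pthread_mutex_unlock(&pool_lock);
    return c;
}

/* Dedicated record for the internal compile step: it never competes
 * with client sessions for a pool slot. */
static client_rec compile_client = { 1, -1 };
static pthread_mutex_t compile_lock = PTHREAD_MUTEX_INITIALIZER;

/* Run fn with the static record; calls are serialised so the single
 * record is never used by two threads at once. */
int with_compile_client(int (*fn)(client_rec *))
{
    int rc;

    assert(fn != NULL);
    pthread_mutex_lock(&compile_lock);
    rc = fn(&compile_client);
    pthread_mutex_unlock(&compile_lock);
    return rc;
}

/* Example callback: report which record was handed in. */
int record_id(client_rec *c)
{
    return c->id;
}
```

With all pool slots taken, `pool_acquire()` returns NULL, while `with_compile_client(record_id)` still succeeds because it does not depend on the pool.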
Comment 18364
Date: 2013-01-22 09:29:07 +0100
From: @sjoerdmullender
Oct2012-SP3 has been released.