Database does not start after upgrade #3420

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Labels
bug (Something isn't working), normal, SQL

Comments

@monetdb-team

Date: 2014-01-16 11:37:10 +0100
From: Christian Braun <>
To: SQL devs <>
Version: 11.17.9 (Jan2014)
CC: @mlkersten, @njnes

Last updated: 2014-02-20 15:02:48 +0100

Comment 19456

Date: 2014-01-16 11:37:10 +0100
From: Christian Braun <>

User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0
Build Identifier:

After upgrading from 'MonetDB 5 server v11.15.19 "Feb2013-SP6"' to the release candidate, the database does not start. I can create and start a new database, but my existing database does not start. In the log I have:

2014-01-15 17:36:45 MSG merovingian[19494]: starting database 'db', up min/avg/max: 0s/3d/1w, crash average: 0.00 0.00 0.20 (77-67=10)
2014-01-15 17:36:45 MSG db[20059]: arguments: /usr/bin/mserver5 --dbpath=/var/monetdb/dbfarm/db --set merovingian_uri=mapi:monetdb://etna:50000/db --set mapi_open=false --set mapi_port=0 --set mapi_usock=/var/monetdb/dbfarm/db/.mapi.sock --set monet_vault_key=/var/monetdb/dbfarm/db/.vaultkey --set gdk_nr_threads=8 --set max_clients=64 --set sql_optimizer=default_pipe --set monet_daemon=yes

The mserver just hangs and uses 100% CPU on one core with no disk activity. Startup had not completed after more than 10 hours. I then terminated the MonetDB processes and reinstalled Feb2013-SP6. With the previous version the database starts fine.

Is there anything else I can try to start my database?

Reproducible: Always

Comment 19475

Date: 2014-01-21 09:31:42 +0100
From: @sjoerdmullender

We have little to go on here, and I haven't been able to reproduce this problem. Can you provide a stack trace (e.g. run the program ptrace on the server, or attach gdb and execute "thread apply all bt")?
Also, can you attach strace (a few hundred system calls is probably enough) for a bit so that we can get an idea what the server is doing?
Please attach the outputs to this bug report.
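
A minimal sketch of how the requested traces might be collected, assuming gdb and strace are installed and the caller is allowed to attach to the server process. The helper names (`run`, `backtrace_all_threads`, `trace_syscalls`) and the `DRY_RUN` switch are illustrative and not part of MonetDB; the PID has to be looked up first, for example with `pgrep mserver5`.

```shell
#!/bin/sh
# Hypothetical helpers; the names and the DRY_RUN switch are illustrative only.

# Run a command, or just print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Print a backtrace of every thread of the process with PID $1.
# gdb attaches, runs the single command in batch mode, then detaches.
backtrace_all_threads() {
    run gdb -p "$1" -batch -ex "thread apply all bt"
}

# Record timestamped system calls of PID $1 into file $2 until interrupted.
trace_syscalls() {
    run strace -tt -p "$1" -o "$2"
}
```

For example, `backtrace_all_threads 20487 > gdb.log` followed by `trace_syscalls 20487 strace.log`, interrupted with Ctrl-C after a few seconds, should yield the two files asked for here.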

Comment 19478

Date: 2014-01-21 14:52:12 +0100
From: Christian Braun <>

Created attachment 254
(gdb) thread apply all bt

Attached file: gdb.log (text/x-log, 8479 bytes)
Description: (gdb) thread apply all bt

Comment 19479

Date: 2014-01-21 14:53:18 +0100
From: Christian Braun <>

Created attachment 255
strace -tt -p 20487 -o strace.log

Attached file: strace.log.gz (application/x-gzip, 126017 bytes)
Description: strace -tt -p 20487 -o strace.log

Comment 19480

Date: 2014-01-21 14:56:53 +0100
From: Christian Braun <>

Thank you, Sjoerd. Please let me know if I can help with anything else.

Comment 19483

Date: 2014-01-22 12:44:32 +0100
From: Christian Braun <>

Created attachment 256
20 hour startup

Attached file: merovingian.log (text/x-log, 24953 bytes)
Description: 20 hour startup

Comment 19484

Date: 2014-01-22 12:50:57 +0100
From: Christian Braun <>

After 20 hours the startup of MonetDB completed.

My database has 5k tables with 100k columns and a size of 300GB, though 90% of the tables/columns are empty. Maybe it is slow because of the number of columns.

Comment 19485

Date: 2014-01-22 13:03:00 +0100
From: @njnes

5K tables, each with (on average) 20 columns?

Comment 19487

Date: 2014-01-22 13:14:44 +0100
From: Christian Braun <>

yes, on average 20 columns.
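
For scale, the figures reported so far can be turned into rough per-table and per-column numbers. This is a back-of-the-envelope sketch: the function names are illustrative, and the second result is only an average over the 20 hour startup, not a measured per-column cost.

```shell
#!/bin/sh
# Average number of columns per table. Arguments: columns, tables.
columns_per_table() {
    echo $(( $1 / $2 ))
}

# Average startup time per column in milliseconds. Arguments: hours, columns.
ms_per_column() {
    echo $(( $1 * 3600 * 1000 / $2 ))
}

columns_per_table 100000 5000   # prints 20
ms_per_column 20 100000         # prints 720
```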

Comment 19490

Date: 2014-01-22 17:07:05 +0100
From: @sjoerdmullender

Since the server does start up, I'd say we can drop the "critical" designation. The bug seems to be about slow startup, not about no startup.

Comment 19491

Date: 2014-01-22 17:32:23 +0100
From: Christian Braun <>

A 20 hour startup time is still critical to me. It is not much better than not starting at all. It makes the new release unusable for me.

Comment 19492

Date: 2014-01-22 17:39:10 +0100
From: @sjoerdmullender

The question is, does the 20 hours only happen for the first restart or for all subsequent restarts?

Comment 19493

Date: 2014-01-22 20:05:47 +0100
From: Christian Braun <>

I did a restart 8 hours ago and it is still not finished. So unfortunately not faster on restart.

Comment 19494

Date: 2014-01-22 20:14:52 +0100
From: @mlkersten

Could you take a sample with strace?
It may show what the system is doing.

Comment 19495

Date: 2014-01-22 20:26:54 +0100
From: Christian Braun <>

Created attachment 257
strace -p 6851 -tt -o strace2.log

Attached file: strace2.log.gz (application/x-gzip, 964568 bytes)
Description: strace -p 6851 -tt -o strace2.log

Comment 19496

Date: 2014-01-22 20:27:29 +0100
From: Christian Braun <>

Comment on attachment 257
strace -p 6851 -tt -o strace2.log

strace -p 6851 -tt -o strace2.log

Comment 19497

Date: 2014-01-22 20:42:25 +0100
From: @mlkersten

Thank you, that trace is very informative and may help to chase the bug quickly.

It looks like during recovery files are opened/closed way too many times.
As if individual updates lead to fsync requests.

To be analysed by the experts.
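
One way to check this impression is to tally the system calls in a trace. A minimal sketch, assuming the `strace -tt` line layout (`HH:MM:SS.ffffff name(args) = result`) and a hypothetical helper name `count_syscalls`:

```shell
#!/bin/sh
# Hypothetical helper: tally syscall names from `strace -tt` output on stdin,
# most frequent first.
count_syscalls() {
    # Field 1 is the timestamp; field 2 starts with "name(". Lines that do
    # not look like a call (signal notices, exit messages) are skipped.
    awk '$2 ~ /^[a-z_][a-z0-9_]*\(/ { sub(/\(.*/, "", $2); print $2 }' |
        sort | uniq -c | sort -rn
}
```

Something like `gzip -dc strace2.log.gz | count_syscalls | head` would then list the most frequent calls, which makes a pattern of repeated opens, closes, or fsyncs easy to spot.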

Comment 19498

Date: 2014-01-24 09:14:38 +0100
From: @sjoerdmullender

Can you build your own version of the server from the sources? Or alternatively, can you tell me the distribution (and version) you're using so that I can build a set of packages for you?

We can't get a handle on what the server is doing, so we've extended the debug flags a little, and we would like you to start the server with those flags set. You would need to look in the merovingian.log file to see the exact command used to start the server, and copy that to a shell command line. Then add the flag
--set sql_debug=3
to the command line and save the error output in a file (2> file).

If you can build yourself, do so with configure flags --enable-debug --disable-optimize --enable-assert so that the debugger can make sense of the binary.

Comment 19500

Date: 2014-01-24 13:55:38 +0100
From: Christian Braun <>

I am using Debian wheezy and the packages from:
deb http://dev.monetdb.org/downloads/testing/deb/ wheezy monetdb

I downloaded your source package, changed the 3 options in the debian/rules file and rebuilt with dpkg-buildpackage. But I am not getting any extra log messages on the console or in merovingian.log.

$ /usr/bin/mserver5 --dbpath=/var/monetdb/dbfarm/db --set merovingian_uri=mapi:monetdb://etna:50000/db --set mapi_open=false --set mapi_port=0 --set mapi_usock=/var/monetdb/dbfarm/db/.mapi.sock --set monet_vault_key=/var/monetdb/dbfarm/db/.vaultkey --set gdk_nr_threads=8 --set max_clients=64 --set sql_optimizer=default_pipe --set monet_daemon=yes --set sql_debug=3
MonetDB 5 server v11.17.1 "Jan2014"
Serving database 'db', using 8 threads
Compiled for x86_64-pc-linux-gnu/64bit with 64bit OIDs dynamically linked
Found 47.263 GiB available main-memory.
Copyright (c) 1993-July 2008 CWI.
Copyright (c) August 2008-2014 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Listening for UNIX domain connection requests on mapi:monetdb:///var/monetdb/dbfarm/db/.mapi.sock

$ mserver5 --version --dbname=db
MonetDB 5 server v11.17.1 "Jan2014" (64-bit, 64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2014 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 47.3GiB available memory, 8 available cpu cores
Libraries:
libpcre: 8.30 2012-02-04 (compiled with 8.30)
openssl: OpenSSL 1.0.1e 11 Feb 2013 (compiled with )
libxml2: 2.8.0 (compiled with 2.8.0)
Compiled by: root@wheezy (x86_64-pc-linux-gnu)
Compilation: gcc -g
Linking : /usr/bin/ld -m elf_x86_64

Comment 19501

Date: 2014-01-24 14:01:16 +0100
From: @sjoerdmullender

The source package doesn't have the debug flags yet. You need to get the sources from our Mercurial repository. Use
hg clone -u Jan2014 http://dev.monetdb.org/hg/MonetDB/
Then compile using
./bootstrap
./configure --enable-debug --enable-assert --disable-optimize --prefix=...
make
make install

Or else I can build a package for you.

Comment 19502

Date: 2014-01-24 14:02:44 +0100
From: Christian Braun <>

Created attachment 258
strace -p 3039 -tt -o strace3.log

Attached file: strace3.log.gz (application/x-gzip, 310521 bytes)
Description: strace -p 3039 -tt -o strace3.log

Comment 19503

Date: 2014-01-24 15:21:05 +0100
From: Christian Braun <>

Created attachment 259
mserver5 --set sql_debug=3

Attached file: debug1.txt.bz2 (application/x-redhat-package-manager, 889852 bytes)
Description: mserver5 --set sql_debug=3

Comment 19504

Date: 2014-01-24 15:25:15 +0100
From: Christian Braun <>

I followed your instructions. Attached is what I got from mserver5. I added a timestamp to the output. I had to replace table and column names with placeholders. The process is still running and loading a column every ~2 seconds.
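
A sketch of how such a timestamped debug log could be produced. This is an assumption about the approach, not necessarily the filter used for the attachment, and the helper name `add_timestamps` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prefix every line read from stdin with the current time.
add_timestamps() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Usage (server options trimmed to the ones relevant here):
#   mserver5 --dbpath=/var/monetdb/dbfarm/db --set sql_debug=3 2>&1 | add_timestamps > debug1.txt
```

Stamping each line on arrival makes the gap between consecutive column loads directly readable from the log.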

Comment 19505

Date: 2014-01-24 15:57:19 +0100
From: @sjoerdmullender

Thanks to the debug output we now have a very plausible explanation of why things are so slow, and also why the traces look like they do (the traces show a BAT being created with an OID head column that is quickly destroyed again).
The cause is that you have probably at some time during the life of the database dropped a column or table. We should deal better with this situation.

Hopefully we'll now be able to fix this problem, and then I will create a new release candidate next week.

Comment 19506

Date: 2014-01-25 12:05:34 +0100
From: MonetDB Mercurial Repository <>

Changeset fa8c4a05a9f6 made by Niels Nes <niels@cwi.nl> in the MonetDB repo, refers to this bug.

For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=fa8c4a05a9f6

Changeset description:

cache bat access during catalog loading, solves bug #3420

Comment 19507

Date: 2014-01-28 10:06:24 +0100
From: Christian Braun <>

Thank you. With v11.17.3 "Jan2014" my database starts up in 60 seconds.

Comment 19508

Date: 2014-01-28 10:14:34 +0100
From: @sjoerdmullender

Thanks for the feedback, and also thanks for the help nailing this problem.
I think we can now consider this bug fixed.

Comment 19607

Date: 2014-02-20 15:02:48 +0100
From: @sjoerdmullender

Jan2014 has been released.

@monetdb-team added the bug, normal, and SQL labels Nov 30, 2020
@sjoerdmullender added this to the Ancient Release milestone Feb 7, 2022