You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Date: 2014-08-18 15:27:16 +0200
From: Richard Hughes <<richard.monetdb>>
To: SQL devs <>
Version: 11.17.21 (Jan2014-SP3)
CC: @njnes
Last updated: 2014-10-31 14:14:56 +0100
Comment 20050
Date: 2014-08-18 15:27:16 +0200
From: Richard Hughes <<richard.monetdb>>
I know that vacuum/reuse/shrink are not officially supported, but I'm hoping that either a quick fix or an alternate way of accomplishing what we want is possible. We're not using foreign keys at all. A rationale for why I'm using unsupported functions is at the bottom of this bug report.
Test script:
START TRANSACTION;
CREATE TABLE "sys"."test_201306" (
"id" BIGINT,
"s1" VARCHAR(127),
"s2" VARCHAR(127),
"s3" VARCHAR(127),
"i1" BIGINT,
"s4" VARCHAR(127),
"s5" VARCHAR(127)
);
COPY 5 RECORDS INTO "sys"."test_201306" FROM stdin USING DELIMITERS '\t','\n','"';
4906 "1-4520.10.1.4.82" "Et8oB.Kf0vmo40-8yTpHAdf" "10.1.4.82" 78911881069979992 "" ""
4907 "1-4564.10.1.4.82" "vn5nB.Kf0vmo40-8yTpHAdf" "10.1.4.82" 78911881069979991 "" ""
4908 "1-4576.10.1.4.82" "sp3pB.Kf0vmo40-8yTpHAdf" "10.1.4.82" 78911881069979993 "" ""
4909 "1-2560.10.1.4.82" "p5dsV5B_3vmo40-8yTpHAdf" "10.1.4.82" 78911885093920348 "" ""
4910 "1-1596.10.1.4.82" "8Q1Z7RnE4vmo40-8yTpHAdf" "10.1.4.82" 78911885778309629 "" ""
COMMIT;
create table x as select * from test_201306 with data;
select id from x;
call sys.reuse('sys','x');
select id from x;
call sys.reuse('sys','x');
select * from x;
The conditions under which it goes wrong are extremely random. If you can't reproduce it then try adding or removing some columns or calling reuse() some more times. It always seems to be the last row of the table that is corrupted.
Happens with both Jan2014SP3 release and current Jan2014 HEAD.
Our rationale for using reuse() is twofold: we are replicating from a standard row-store into MonetDB for the obvious analytics performance reasons. Our database is mostly-insert but very occasionally a bunch of rows will need to be updated. We used to handle this with UPDATE statements but they were taking on the order of 100ms per row and were causing our replication to get embarrassingly far behind reality; the fix for this was to switch to DELETE+INSERT.
Having made that switch we noticed that performance would dive for queries on any table which had ever had a DELETE on it; the performance dive seemed to be associated with an inability to keep an index on that table.
Unrelated to the deletions, we also observed that some of our columns were taking up far more disk space than we expected. Notably, varchar columns which have a large number of distinct values (but not all-distinct) appear to switch into a mode whereby they would treat all values as distinct, and thereby consume the full amount of disk space required for that scenario. A call to sys.reuse() would rehash the values of these columns and hence reclaim several hundred GB of disk space.
Before jumping for the undocumented features we tried using "create table bar as select * from foo with data" to repack the table but, in addition to being considerably slower than reuse(), this did nothing to rehash varchar columns.
I sincerely apologize for my evilness, but hope that you can forgive me enough to look in to the problem in any case.
Comment 20054
Date: 2014-08-18 17:39:06 +0200
From: Richard Hughes <<richard.monetdb>>
I just crashed mserver5 (Jan2014 HEAD). I'm not sure whether it's related to this or not.
(gdb) bt
0 DELTAproject (result=0x7fe3933d6190, sub=,
col=, uid=, uval=0x7fe3933d6170,
ins=) at sql.c:2175
1 0x00007fe87c8a247b in runMALsequence (cntxt=0x9cc0, mb=0x7fe392f43b40,
startpc=0, stoppc=0, stk=0x7fe3933d48c0, env=0x7fe3d4096630,
pcicaller=0x0) at mal_interpreter.c:654
2 0x00007fe87c8a42c5 in DFLOWworker (T=0x9cc0) at mal_dataflow.c:358
3 0x00007fe87b2360a4 in start_thread (arg=0x7fe3d309f700)
at pthread_create.c:309
4 0x00007fe87af6b04d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) p res
$1 = (BAT *) 0x0
I was getting "BATproject: does not match always" from queries for a few minutes before this finally happened. The interesting thing was that I was getting that from queries on one table where sys.reuse() had been called recently and from queries on one table where sys.reuse() had never been called.
Comment 20055
Date: 2014-08-19 14:47:44 +0200
From: Richard Hughes <<richard.monetdb>>
Date: 2014-08-18 15:27:16 +0200
From: Richard Hughes <<richard.monetdb>>
To: SQL devs <>
Version: 11.17.21 (Jan2014-SP3)
CC: @njnes
Last updated: 2014-10-31 14:14:56 +0100
Comment 20050
Date: 2014-08-18 15:27:16 +0200
From: Richard Hughes <<richard.monetdb>>
I know that vacuum/reuse/shrink are not officially supported, but I'm hoping that either a quick fix or an alternate way of accomplishing what we want is possible. We're not using foreign keys at all. A rationale for why I'm using unsupported functions is at the bottom of this bug report.
Test script:
START TRANSACTION;
CREATE TABLE "sys"."test_201306" (
"id" BIGINT,
"s1" VARCHAR(127),
"s2" VARCHAR(127),
"s3" VARCHAR(127),
"i1" BIGINT,
"s4" VARCHAR(127),
"s5" VARCHAR(127)
);
COPY 5 RECORDS INTO "sys"."test_201306" FROM stdin USING DELIMITERS '\t','\n','"';
4906 "1-4520.10.1.4.82" "Et8oB.Kf0vmo40-8yTpHAdf" "10.1.4.82" 78911881069979992 "" ""
4907 "1-4564.10.1.4.82" "vn5nB.Kf0vmo40-8yTpHAdf" "10.1.4.82" 78911881069979991 "" ""
4908 "1-4576.10.1.4.82" "sp3pB.Kf0vmo40-8yTpHAdf" "10.1.4.82" 78911881069979993 "" ""
4909 "1-2560.10.1.4.82" "p5dsV5B_3vmo40-8yTpHAdf" "10.1.4.82" 78911885093920348 "" ""
4910 "1-1596.10.1.4.82" "8Q1Z7RnE4vmo40-8yTpHAdf" "10.1.4.82" 78911885778309629 "" ""
COMMIT;
create table x as select * from test_201306 with data;
select id from x;
call sys.reuse('sys','x');
select id from x;
call sys.reuse('sys','x');
select * from x;
Expected results:
+------+
| id |
+======+
| 4906 |
| 4907 |
| 4908 |
| 4909 |
| 4910 |
+------+
5 tuples
+------+
| id |
+======+
| 4906 |
| 4907 |
| 4908 |
| 4909 |
| 4910 |
+------+
5 tuples
+------+--------------+---------------+-----------+-------------------+------+
| id | s1 | s2 | s3 | i1 | s4 |>
+======+==============+===============+===========+===================+======+
| 4906 | 1-4520.10.1. | Et8oB.Kf0vmo4 | 10.1.4.82 | 78911881069979992 | |
: : 4.82 : 0-8yTpHAdf : : : :
| 4907 | 1-4564.10.1. | vn5nB.Kf0vmo4 | 10.1.4.82 | 78911881069979991 | |
: : 4.82 : 0-8yTpHAdf : : : :
| 4908 | 1-4576.10.1. | sp3pB.Kf0vmo4 | 10.1.4.82 | 78911881069979993 | |
: : 4.82 : 0-8yTpHAdf : : : :
| 4909 | 1-2560.10.1. | p5dsV5B_3vmo4 | 10.1.4.82 | 78911885093920348 | |
: : 4.82 : 0-8yTpHAdf : : : :
| 4910 | 1-1596.10.1. | 8Q1Z7RnE4vmo4 | 10.1.4.82 | 78911885778309629 | |
: : 4.82 : 0-8yTpHAdf : : : :
+------+--------------+---------------+-----------+-------------------+------+
5 tuples !1 column dropped!
Actual results:
+------+
| id |
+======+
| 4906 |
| 4907 |
| 4908 |
| 4909 |
| 4910 |
+------+
5 tuples
+----------------------+
| id |
+======================+
| 4910 |
| 4907 |
| 4908 |
| 4909 |
| -2604246222170760229 |
+----------------------+
5 tuples
+----------------------+------------------+----------------------+-----------+
| id | s1 | s2 | s3 |>
+======================+==================+======================+===========+
| 4910 | 1-1596.10.1.4.82 | 8Q1Z7RnE4vmo40-8yTpH | 10.1.4.82 |
: : : Adf : :
| 4907 | 1-4564.10.1.4.82 | vn5nB.Kf0vmo40-8yTpH | 10.1.4.82 |
: : : Adf : :
| 4908 | 1-4576.10.1.4.82 | sp3pB.Kf0vmo40-8yTpH | 10.1.4.82 |
: : : Adf : :
| 4909 | 1-2560.10.1.4.82 | p5dsV5B_3vmo40-8yTpH | 10.1.4.82 |
: : : Adf : :
| -2604246222170760229 | | | |
+----------------------+------------------+----------------------+-----------+
5 tuples !3 columns dropped!
The conditions under which it goes wrong are extremely random. If you can't reproduce it then try adding or removing some columns or calling reuse() some more times. It always seems to be the last row of the table that is corrupted.
Happens with both Jan2014SP3 release and current Jan2014 HEAD.
Our rationale for using reuse() is twofold: we are replicating from a standard row-store into MonetDB for the obvious analytics performance reasons. Our database is mostly-insert but very occasionally a bunch of rows will need to be updated. We used to handle this with UPDATE statements but they were taking on the order of 100ms per row and were causing our replication to get embarrassingly far behind reality; the fix for this was to switch to DELETE+INSERT.
Having made that switch we noticed that performance would dive for queries on any table which had ever had a DELETE on it; the performance dive seemed to be associated with an inability to keep an index on that table.
Unrelated to the deletions, we also observed that some of our columns were taking up far more disk space than we expected. Notably, varchar columns which have a large number of distinct values (but not all-distinct) appear to switch into a mode whereby they would treat all values as distinct, and thereby consume the full amount of disk space required for that scenario. A call to sys.reuse() would rehash the values of these columns and hence reclaim several hundred GB of disk space.
Before jumping for the undocumented features we tried using "create table bar as select * from foo with data" to repack the table but, in addition to being considerably slower than reuse(), this did nothing to rehash varchar columns.
I sincerely apologize for my evilness, but hope that you can forgive me enough to look in to the problem in any case.
Comment 20054
Date: 2014-08-18 17:39:06 +0200
From: Richard Hughes <<richard.monetdb>>
I just crashed mserver5 (Jan2014 HEAD). I'm not sure whether it's related to this or not.
(gdb) bt
0 DELTAproject (result=0x7fe3933d6190, sub=,
col=, uid=, uval=0x7fe3933d6170,
ins=) at sql.c:2175
1 0x00007fe87c8a247b in runMALsequence (cntxt=0x9cc0, mb=0x7fe392f43b40,
startpc=0, stoppc=0, stk=0x7fe3933d48c0, env=0x7fe3d4096630,
pcicaller=0x0) at mal_interpreter.c:654
2 0x00007fe87c8a42c5 in DFLOWworker (T=0x9cc0) at mal_dataflow.c:358
3 0x00007fe87b2360a4 in start_thread (arg=0x7fe3d309f700)
at pthread_create.c:309
4 0x00007fe87af6b04d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) p res
$1 = (BAT *) 0x0
I was getting "BATproject: does not match always" from queries for a few minutes before this finally happened. The interesting thing was that I was getting that from queries on one table where sys.reuse() had been called recently and from queries on one table where sys.reuse() had never been called.
Comment 20055
Date: 2014-08-19 14:47:44 +0200
From: Richard Hughes <<richard.monetdb>>
Created attachment 295
Proposed fix
How's this for a fix?
Comment 20094
Date: 2014-08-26 08:31:46 +0200
From: @njnes
patch checked into oct2014 which solves this bug.
Comment 20397
Date: 2014-10-31 14:14:56 +0100
From: @sjoerdmullender
Oct2014 has been released.
The text was updated successfully, but these errors were encountered: