Date: 2015-04-22 21:41:14 +0200
From: @mlkersten
To: SQL devs <>
Version: 11.17.13 (Jan2014-SP1)
CC: @njnes
Last updated: 2015-06-22 22:32:19 +0200
Comment 20831
Date: 2015-04-22 21:41:14 +0200
From: @mlkersten
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:37.0) Gecko/20100101 Firefox/37.0
Build Identifier:
Consider the TPCH sf1 script below, where sf1.load contains the relevant COPYs.
Consistently, we need to wait for 9 seconds on a trivial load:
COPY 5 RECORDS INTO region from '/export/scratch2/mk/tpch/SF-1/data/region.tbl' USING DELIMITERS '|', '|\n' LOCKED;
5 affected rows (9.5s)
compared to the rest of the load:
25 affected rows (695.376ms)
10000 affected rows (1.0s)
150000 affected rows (2.1s)
800000 affected rows (5.1s)
1500000 affected rows (5.4s)
6001215 affected rows (23.4s)
The subsequent alter script runs fine, including its first instruction:
operation successful (6.341ms)
If I place a sleep of 15 seconds after tpch_schema.sql, I get the same results.
If I explicitly stop the server and let it be restarted
by the load script, I don't see the delay.
It seems that the tpch_schema.sql transaction does not propagate the catalog updates immediately, causing this unexpected delay. (Who triggers the wait?)
Reproducible: Always
Steps to Reproduce:
monetdb stop sf1
monetdb destroy sf1 -f
monetdb create sf1
monetdb release sf1
time mclient -d sf1 -ei tpch_schema.sql
time mclient -d sf1 -ei sf1.load
time mclient -d sf1 -ei tpch_alter.sql
Comment 20832
Date: 2015-04-22 22:17:38 +0200
From: @sjoerdmullender
I can reproduce the problem.
However, if I do the load UNLOCKED (i.e. without LOCKED), the problem disappears.
From looking at strace output I get the distinct impression that it has something to do with committing data. When using LOCKED, after the schema initialization I see that the first data file is loaded, but before that query returns, there are a lot of BATs that get written and fdatasync()ed. All in all that takes a long time. Without LOCKED I don't see the long list of BATs getting synchronized to disk.
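The cost pattern observed in the strace output (many small files each followed by its own fdatasync()) can be illustrated with a small sketch. This is not MonetDB code; the file names and counts are made up, and it only shows why one durability barrier per file is far more expensive than deferring synchronization:

```python
# Sketch: many per-file fdatasync() calls vs. no per-file sync.
# Illustrative only -- names and sizes are not taken from MonetDB.
import os
import tempfile
import time

def write_files(dirname, n, sync_each):
    """Write n small files; optionally fdatasync() each one."""
    t0 = time.monotonic()
    for i in range(n):
        path = os.path.join(dirname, f"bat_{i}")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            os.write(fd, b"x" * 4096)
            if sync_each:
                # One disk barrier per file, like the pattern seen in strace.
                os.fdatasync(fd)
        finally:
            os.close(fd)
    return time.monotonic() - t0

with tempfile.TemporaryDirectory() as d:
    synced = write_files(d, 50, sync_each=True)
with tempfile.TemporaryDirectory() as d:
    unsynced = write_files(d, 50, sync_each=False)

print(f"with per-file fdatasync: {synced:.3f}s; without: {unsynced:.3f}s")
```

On a rotating disk the difference is typically dramatic; on tmpfs or a fast SSD it shrinks, which is why the delay shows up mainly on real storage. (os.fdatasync is POSIX-only.)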
Here are the timings of my CREATE script immediately followed by loading with LOCKED:
create schema
operation successful (20.326ms)
operation successful (16.295ms)
operation successful (15.564ms)
operation successful (16.778ms)
operation successful (15.954ms)
operation successful (24.331ms)
operation successful (16.029ms)
operation successful (24.783ms)
load data
5 affected rows (13195.189ms)
25 affected rows (1165.221ms)
10000 affected rows (1542.335ms)
150000 affected rows (2803.931ms)
200000 affected rows (2359.778ms)
800000 affected rows (4972.089ms)
1500000 affected rows (5651.299ms)
6001215 affected rows (19125.609ms)
And here without LOCKED:
create schema
operation successful (19.183ms)
operation successful (16.115ms)
operation successful (15.683ms)
operation successful (16.322ms)
operation successful (16.175ms)
operation successful (16.586ms)
operation successful (16.048ms)
operation successful (24.764ms)
load data
5 affected rows (163.523ms)
25 affected rows (191.679ms)
10000 affected rows (421.709ms)
150000 affected rows (1959.259ms)
200000 affected rows (1585.701ms)
800000 affected rows (4225.754ms)
1500000 affected rows (5308.068ms)
6001215 affected rows (16584.857ms)
Comment 20844
Date: 2015-04-24 13:41:11 +0200
From: @njnes
Changes to the logger caused the extra delay. BUNfnd(.., neg-cand list) may
help. A vacuum of the catalog table is also needed.
Comment 20845
Date: 2015-04-24 15:21:34 +0200
From: @njnes
We now use a hash (again) instead of a nested loop (the deleted list is still inefficient).
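The change described here, replacing a nested-loop scan of the deleted list with a hash lookup, can be sketched as follows. The names are hypothetical; the real code is C inside MonetDB, using BUNfnd on BATs rather than Python containers:

```python
# Sketch: membership test against a "deleted" list, two ways.
# Names are illustrative; MonetDB's actual lookup uses BUNfnd on BATs.
deleted = list(range(0, 100000, 3))  # pretend oids of deleted rows

def is_deleted_nested_loop(oid):
    # Old behaviour: linear scan over the list, O(n) per lookup.
    return any(d == oid for d in deleted)

deleted_hash = set(deleted)  # hash structure built once

def is_deleted_hash(oid):
    # New behaviour: hash probe, O(1) expected per lookup.
    return oid in deleted_hash

# Both strategies must agree on every oid.
assert is_deleted_nested_loop(42) == is_deleted_hash(42)
assert is_deleted_nested_loop(43) == is_deleted_hash(43)
```

With one such lookup per loaded row, the O(n) scan turns the load into quadratic work, which matches the disproportionate slowdown on the small region table relative to the large ones.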