MonetDB stores intermediate results in the bat directory and its subdirectories.
Every time a query is executed, a new .tail file is created, even if the same query is run again.
These .tail files are only deleted after restarting the server (dbfarm). Closing the connection, result set, and statement does not remove the files.
Eventually this leads to insufficient disk space and the server crashes.
I cannot share the data I was using, but I believe this behavior can be reproduced by generating a random dataset. The data consists of two tables: Table A stores 10 million rows (10 columns) and Table B stores 1 million rows (3 columns). Column 1 of Table A references Column 1 of Table B.
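For anyone who wants to try this without the original data, the following is a minimal sketch of a random data generator with the shape described above. It is not the reporter's data set. The column layout is an assumption: INT_ID, STRING_VALUE, and VALUE_DECIMAL are taken from the query in this report, but which table holds which column, the 100 distinct group labels, and the filler columns are hypothetical.

```python
import csv
import random


def write_tables(path_a, path_b, rows_a=10_000_000, rows_b=1_000_000, seed=0):
    """Write two CSV files shaped like the tables described in the report."""
    rng = random.Random(seed)
    # Table B: 3 columns (INT_ID, STRING_VALUE, filler). INT_ID is the key
    # that table A refers to. The 100 group labels are an assumption.
    with open(path_b, "w", newline="") as fb:
        out = csv.writer(fb)
        for int_id in range(rows_b):
            out.writerow([int_id, f"group_{int_id % 100}", rng.randint(0, 9)])
    # Table A: 10 columns. Column 1 references TABLE_B.INT_ID, column 2 plays
    # the role of VALUE_DECIMAL, and the remaining 8 are hypothetical filler.
    with open(path_a, "w", newline="") as fa:
        out = csv.writer(fa)
        for _ in range(rows_a):
            row = [rng.randrange(rows_b), round(rng.uniform(0, 1000), 2)]
            row += [rng.randint(0, 1_000_000) for _ in range(8)]
            out.writerow(row)
```

Calling `write_tables("table_a.csv", "table_b.csv")` produces files at the sizes mentioned above; smaller values for `rows_a` and `rows_b` are useful for a quick dry run.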
The two tables are joined on Column 1 and some statistical values are calculated (QUANTILE, AVG, SUM using GROUP BY).
I tried to execute the same SQL query 1000 times sequentially; after the 60th iteration MonetDB crashed because of insufficient disk space.
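The repeated execution could be scripted roughly as follows. This is a sketch only: it assumes a Python DB-API 2.0 cursor (for example one obtained from a MonetDB client library), which is not necessarily the client used for this report, and `measure` is a hypothetical callable that returns whatever is being tracked, such as the bytes used on the dbfarm volume.

```python
def run_repeatedly(cursor, sql, iterations, measure):
    """Execute sql `iterations` times and record measure() after each run."""
    readings = []
    for i in range(1, iterations + 1):
        cursor.execute(sql)
        cursor.fetchall()  # drain the result set so the query fully completes
        readings.append((i, measure()))
    return readings
```

If the leak is present, the recorded values should grow with every iteration instead of levelling off.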
I sent a request to the user mailing list and got the response that other users faced the same issue and are forced to restart the server in a nightly maintenance window.
I think this is a bug and the intermediate files should be deleted after the query has been executed.
Reproducible: Always
Steps to Reproduce:
Ingest a medium-sized data set (e.g. 10 million rows and 1 million rows) with at least two tables
Create a SQL statement like this pseudocode:
SELECT
    COUNT(*) AS TOTAL, SUM(VALUE_DECIMAL) AS VALUE, STRING_VALUE,
    MIN(VALUE_DECIMAL) AS min,
    QUANTILE(VALUE_DECIMAL, 0.25) AS Q25,
    QUANTILE(VALUE_DECIMAL, 0.5) AS Q50,
    QUANTILE(VALUE_DECIMAL, 0.75) AS Q75,
    MAX(VALUE_DECIMAL) AS max
FROM TABLE_A AS a
JOIN TABLE_B AS b ON (b.INT_ID = a.INT_ID)
GROUP BY STRING_VALUE
Execute the SQL statement several times
Monitor the disk usage after the execution
You should see that the used disk space increases with each iteration. Every iteration creates a new .tail file.
Restart the dbfarm
Check the disk usage
Now you should see that the intermediate files were deleted
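The disk usage checks in the steps above can be automated with a small helper like the one below. It is a sketch under one assumption: `bat_dir` is the path to the bat directory of the database inside the dbfarm, and the exact location depends on the installation.

```python
import os


def tail_usage(bat_dir):
    """Return (file_count, total_bytes) for all .tail files below bat_dir."""
    count = 0
    total = 0
    for root, _dirs, files in os.walk(bat_dir):
        for name in files:
            if name.endswith(".tail"):
                try:
                    total += os.path.getsize(os.path.join(root, name))
                except OSError:
                    continue  # file vanished between listing and stat
                count += 1
    return count, total
```

Calling it before the first query, after each iteration, and again after the restart gives the three readings the steps ask for.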
Actual Results:
A new .tail file is created with each iteration, increasing disk usage until the server crashes
Expected Results:
.tail files are deleted after the query has been executed
Date: 2015-10-17 01:54:13 +0200
From: Alex <<abraun_75>>
To: SQL devs <>
Version: 11.21.5 (Jul2015)
CC: @njnes
Last updated: 2015-11-03 10:18:35 +0100
Comment 21347
Date: 2015-10-17 01:54:13 +0200
From: Alex <<abraun_75>>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0
Build Identifier:
Comment 21349
Date: 2015-10-17 11:16:32 +0200
From: MonetDB Mercurial Repository <>
Changeset a2d0aed144f5 made by Niels Nes niels@cwi.nl in the MonetDB repo, refers to this bug.
For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=a2d0aed144f5
Changeset description:
Comment 21350
Date: 2015-10-17 11:22:58 +0200
From: @njnes
the quantile function leaked bats.
Comment 21448
Date: 2015-11-03 10:18:35 +0100
From: @sjoerdmullender
Jul2015 SP1 has been released.