Date: 2013-07-17 11:14:56 +0200
From: Alistair Sutherland <<alistair.sutherland>>
To: GDK devs <>
Version: 11.15.15 (Feb2013-SP4)
CC: @hannesmuehleisen, @mlkersten
Last updated: 2013-10-22 14:42:11 +0200
Comment 18917
Date: 2013-07-17 11:14:56 +0200
From: Alistair Sutherland <<alistair.sutherland>>
User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)
Build Identifier:
I've recently been conducting some performance testing of MonetDB with a variety of servers (from 4 core-8GB RAM to 20 core-64GB RAM) with various data sizes, in an attempt to gain a better understanding of how MonetDB scales.
During the performance tests it became obvious that much of the processing was I/O bound due to:
1. Columns being unmapped from memory overly aggressively (even when there was plenty of memory still available).
2. The constant mapping/unmapping of memory-mapped BAT files for intermediate results.
I've attached a patch which attempts to address both issues. The first patch (gdk_utils.c) updates the memory limit at which GDKvmtrim kicks in to 80% memory usage.
The second patch (gdk_heap.c) limits the number of mmap/munmap calls via the existing 'heap caching' mechanism, which was not working at all! When the heap cache reached its capacity (which it does quickly after startup), the HEAPcacheFind() function would never find any mmapped files due to an off-by-one error. Even when HEAPcacheFind() was entered, the logic to select a cached heap of the correct size was incorrect, and the moving of the cached memory file to the bat directory was also broken.
In addition to fixing up the caching code, I've also wired in the heap cache into the case where extending a malloced heap results in a swap over to memory mapped storage.
Reproducible: Always
Steps to Reproduce:
1. Run MonetDB and execute queries.
2. Observe that the heapcache is not used (check the dbfarm 'HC' directory and observe that cached files are never used).
3. Observe that performance suffers from unnecessary hard page faults.
Actual Results:
Columns trimmed from the BBP cache even though memory usage was low.
Heapcache not used.
High numbers of hard page faults.
Expected Results:
Columns remain cached in memory where possible.
Heapcache used where possible.
Low numbers of hard page faults when memory available.
Comment 18918
Date: 2013-07-17 11:15:45 +0200
From: Alistair Sutherland <<alistair.sutherland>>
Created attachment 212
Heapcache fixes
Comment 18919
Date: 2013-07-17 11:51:01 +0200
From: @sjoerdmullender
Thanks for the patch. We will study it.
For completeness I repeat the remarks I made on the mailing list.
The heapcache isn't without problems. One serious issue is that the heaps that end up there were for temporary bats in which we are not interested anymore. However, the kernel doesn't know we're not interested in the contents, so it will write the data to disk. Maybe not immediately, but eventually. This is unnecessary I/O.
The other issue I want to raise is that I have made a change to the Feb2013 branch after the latest release (SP3) which addresses some mmap inefficiencies. On Linux there is a mremap system call which can be used to grow (and possibly move) a memory map. If that system call is available, we now use it. On systems where the system call is not available, we now try to mmap extra space after the existing mmap, and if that fails, we munmap the whole area and mmap it larger somewhere else. The upshot of this is that we no longer write the data to disk if we need to grow a memory map. I am curious how this change performs in your tests.
Comment 18929
Date: 2013-07-19 16:45:37 +0200
From: @sjoerdmullender
The patch seems to do a number of more-or-less unrelated things. In addition, there seem to be gratuitous code changes (indentation, putting "{" on a new line, using spaces for indentation instead of tabs) which make it harder to assess the merits of the patch.
I'll try to go through some of the things I saw.
The HEAP_CACHE_SIZE constant was replaced by a variable that is never changed. This seems to me to be a gratuitous change.
The old code used a variable cursz to find the best cached heap. The patched code uses variables curdiffsz and diffsz. Unfortunately, the new code is wrong. The new variables are of type "int" which may cause overflow in their calculation. The old code was carefully constructed to not cause overflow. It is unclear to me whether the new code would work any better than the old (if we ignore the overflow issue). Please explain why you think the new code is better.
Other changes I see are:
If the cached heap is much larger than what is needed (more than 30%), the cached file is not used. Is this a smart thing to do? Wouldn't you end up with lots of cached heaps that are too large to be of use?
A call to GDKextend was deleted. Probably because that already happens in GDKload. Probably a good idea.
When going from a malloced heap to a memory mapped heap, the patched code now considers using the heap cache. If it turns out that the heap cache does work well, this is probably a good idea.
The change in gdk_utils.c is also unrelated. The old code was working on the idea that if the memory usage of the server grows (memdiff >= 0) but the resident set size of the process shrinks somewhat significantly, the O/S is taking away real memory that we actually wanted to use. Therefore that was a good time to reduce our memory footprint by calling BBPtrim. The problem this was trying to solve was to dynamically adapt to the amount of memory the process gets from the O/S. It seems to work reasonably well when, e.g., two servers are started on the same system and both compete for all available memory. Having said this, what would be really great is if the server could better determine which bats are still needed or will be needed in the near future. This is, unfortunately, a very difficult thing to do in GDKvmtrim. That code doesn't know (and, because of modularization, can't know) what the server is up to.
Please see the commit messages for changesets 1b007b309a89 and 4f72f353b22b
Comment 18930
Date: 2013-07-19 18:49:59 +0200
From: Alistair Sutherland <<alistair.sutherland>>
Hi, thanks for reviewing the patch; I'll attempt to answer your queries.
> The HEAP_CACHE_SIZE constant was replaced by a variable that is never changed. This seems to me to be a gratuitous change.
Ah, I was planning on wiring HEAP_CACHE_SIZE into the "GDKgetenv" framework to allow users to configure the size of the cache (and effectively disable it by setting it to 0), but I must have never got round to it...
> The old code used a variable cursz to find the best cached heap. The patched code uses variables curdiffsz and diffsz. Unfortunately, the new code is wrong. [...] Please explain why you think the new code is better.
Ok, the int overflow was an oversight. The problem with the old code is that it doesn't actually "prefer smallest larger than or equal to requested, otherwise largest smaller than requested". Imagine the scenario with 3 heap caches of size 5MB, 15MB, 10MB and a cache request of 10MB. The first time around the loop "e == NULL", so we set cursz to 5MB. For the next two iterations cursz is not less than 15MB or 10MB, so we end up sticking with 5MB...
This is over and above the other bug where hc->used is never less than hc->sz (after the cache reaches capacity), which resulted in the caching code never being used.
> If the cached heap is much larger than what is needed (more than 30%), the cached file is not used. Is this a smart thing to do? Wouldn't you end up with lots of cached heaps that are too large to be of use?
If we don't do this we can end up with the dbfarm taking up a lot more disk space than is necessary. If every 1MB cache request (which isn't for a temporary) takes a 1.5GB heap cache entry, then all of a sudden the dbfarm is huge.
> A call to GDKextend was deleted. Probably because that already happens in GDKload. Probably a good idea.
It's more than that. The GDKload function requires that the file path is relative to the "bat" directory. This wasn't happening, so all attempts to resize files were failing, again resulting in the cache not being used.
> This is, unfortunately, a very difficult thing to do in GDKvmtrim. That code doesn't know (and, because of modularization, can't know) what the server is up to.
I can see what you are trying to do, but even on a 12GB RAM machine we should be able to keep all columns in memory (for a table with 100M rows and 5 columns). After the vmtrim change (and heap cache fixes) I get no disk IO; before the changes the disk was thrashing due to hard page faults.
Comment 18931
Date: 2013-07-19 19:36:21 +0200
From: @mlkersten
The MonetDB philosophy is not to introduce low level steering parameters.
If that is crucial in a rare situation, then there should be enough expertise to patch the code and compile.
Comment 18932
Date: 2013-07-19 19:38:22 +0200
From: @mlkersten
To be more precise, we don't want steering parameters to become visible as user tuneable knobs.
For the remainder, thanks for looking so deeply into the code base.
Martin
Comment 18933
Date: 2013-07-22 11:40:00 +0200
From: @sjoerdmullender
(In reply to comment 4)
I think a better fix (avoiding the possibility of overflow) is to add an extra condition to the test. In the bit starting with e == NULL I added "|| cursz < *maxsz". This ensures that the test also succeeds when looking at a heap cache entry that is at least as large as the requested size while the current best size is smaller than the requested size.
Comment 18934
Date: 2013-07-22 11:43:28 +0200
From: MonetDB Mercurial Repository <>
Changeset 610572053204 made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.
For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=610572053204
Changeset description:
Fix for one of the problems in bug #3323.
When going through the heap cache and the first heap is smaller than requested, we want to match any heap that is at least as large as requested.
Comment 18935
Date: 2013-07-22 11:49:07 +0200
From: @sjoerdmullender
(In reply to comment 4)
The fix for this is to compare hc->used with 0 in HEAPcacheFind. There needs to be at least one used entry for it to make sense to look for one. It doesn't matter how hc->used compares with hc->sz.
Comment 18936
Date: 2013-07-22 11:52:12 +0200
From: MonetDB Mercurial Repository <>
Changeset dee4d23f51ab made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.
For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=dee4d23f51ab
Changeset description:
Fix for one of the problems in bug #3323.
Once the heap cache was full, it was never used anymore. When we want to find a cache file for use, it only matters whether there are any in the cache, not whether the cache is full.
Comment 18937
Date: 2013-07-22 12:56:45 +0200
From: @sjoerdmullender
(In reply to comment 4)
I fixed this differently. I now use the recently introduced function MT_mremap to extend the memory map (which is what was being attempted here).
Comment 18938
Date: 2013-07-22 12:58:52 +0200
From: MonetDB Mercurial Repository <>
Changeset 93b8c8c0dee0 made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.
For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=93b8c8c0dee0
Changeset description:
Fix for one of the problems in bug #3323.
When the code decided to use a pre-existing memory map, and the map needed to be extended, the remapping of the extended file failed due to incorrect use of the file name parameters. We now use MT_mremap to do the extending and remapping.
Comment 18940
Date: 2013-07-22 18:20:22 +0200
From: MonetDB Mercurial Repository <>
Changeset 3b88af94c574 made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.
For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=3b88af94c574
Changeset description:
Comment 18941
Date: 2013-07-22 18:22:06 +0200
From: @sjoerdmullender
(In reply to comment 3)
I implemented this slightly differently.
But thanks a lot for the patches. They were very inspiring.
Comment 18969
Date: 2013-08-01 15:28:30 +0200
From: @sjoerdmullender
I have now addressed most of the problems you have described in the report. The one thing I did not address yet is the change to gdk_utils.c.
I have done a bunch of experiments to compare a number of configurations. The experiment was to load the TPC-H data, apply the constraints, and then run the 22 queries 10 times in succession. All without restarting the server. I did this mostly on my 8 GiB laptop, and I tried both scale factor 5 and 10 (i.e., roughly 5 GiB of data and 10 GiB of data, so one fits, and one doesn't).
I tried the following configurations:
1. "as is"
2. disabled heap cache (setting HEAP_CACHE_SIZE to 0)
3. disabled the vmtrim thread (running the server with the option --set gdk_vmtrim=no)
4. the combination of the two above
Note that the vmtrim thread is the one that checks the resident set size, i.e. the change you made in gdk_utils.c.
Before starting each experiment, I cleared the operating system buffer cache by doing "echo 3 > /proc/sys/vm/drop_caches".
I measured the following:
1. time from start to finish
2. amount of data read from disk
3. amount of data written to disk
The latter two measures were done using systemtap, tapping into the submit_bio() function in the Linux kernel.
As a side note, when running the queries in scale factor 5, I saw that basically nothing was being read after the first round of queries. In other words, everything needed was kept in memory.
On Linux (which is what my laptop runs), it was clear that using the heap cache was a loser in all three measures. It took longer, and more data was read and written.
Having the vmtrim thread was sometimes better, sometimes worse, and usually when it was better for one measure in scale factor 5, it was worse in 10, and vice versa. In any case, the difference was never big.
The conclusion: despite the fixes that went into getting the heap cache to work, it may be better to remove it altogether. And the vmtrim thread we can leave or remove (remove is better from a maintenance point of view).
However, Linux is not the whole world. There is also Windows to consider.
On Windows, initial experiments show that the heap cache is beneficial. Probably because creating and deleting large files is very expensive. I am now experimenting with sparse files on Windows in the hope that creating those is faster than when not using sparse files. This experiment is unlikely to be finished before I leave for a two week vacation tomorrow.
Comment 19227
Date: 2013-09-30 17:06:17 +0200
From: MonetDB Mercurial Repository <>
Changeset 49cc8f100cb4 made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.
For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=49cc8f100cb4
Changeset description:
Get rid of the heap cache.
This fixes bug #3376 and relates to bug #3323.
In bug #3323, comment 15, I already remarked that using the heap cache is actually slower on Linux than not using the heap cache.
In addition to this, using the heap cache can be very detrimental as witnessed by bug #3376. What happens there is that a new string bat is created and then appended to. The appended string bat is big enough so that the offset heap needs to be memory mapped, so the heap cache is used to find a usable heap. In this case, there is only one file there, so it is used. BATappend then needs to widen the offset heap, so it calls GDKupgradevarheap. That function doubles (or in this case, quadruples) the size of the heap it was given since it doesn't know the required BAT capacity. Then the BAT is freed and the heap returned to the cache. This cycle repeats a few times, and this repetition causes the heap to grow exponentially, eventually leading to the errors seen in the bug report.
Comment 19254
Date: 2013-10-08 11:25:38 +0200
From: @hannesmuehleisen
The heap cache is gone.