Date: 2013-03-18 07:51:43 +0100
From: @mlkersten
To: MonetDB5 devs <>
Version: 11.15.15 (Feb2013-SP4)
Last updated: 2013-12-03 13:59:32 +0100
Comment 18624
Date: 2013-03-18 07:51:43 +0100
From: @mlkersten
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:19.0) Gecko/20100101 Firefox/19.0
Build Identifier:
When a big query is running, all worker threads are busy.
Trying to connect with another mclient is then hindered, because
all instructions are merged into a global queue. This most likely
means you cannot easily stop the big query.
Solutions: either have a worker set per client connection (preferred) or
change the scheduler to better balance instructions.
Reproducible: Always
Steps to Reproduce:
run sf100
separately start an mclient to perform a sys.pause(x)
Comment 19003
Date: 2013-08-15 23:57:29 +0200
From: MonetDB Mercurial Repository <>
Changeset 3669ddd28bf0 made by Martin Kersten mk@cwi.nl in the MonetDB repo, refers to this bug.
For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=3669ddd28bf0
Changeset description:
New thread group per dataflow block
The centralized worker thread group could lead to an unacceptable
situation: if a user is heavily processing complex queries, no other
user could even log into the system, because their MAL statements
ended up at the end of the shared queues.
The problem has been resolved by introducing a thread group per dataflow
block. This may lead to a large number of processes, whose resources
are managed by the OS.
It solves bug #3258
Comment 19162
Date: 2013-09-18 18:01:45 +0200
From: MonetDB Mercurial Repository <>
Changeset 8a77f78a4fe4 made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.
For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=8a77f78a4fe4
Changeset description:
Revert creating dataflow pools per client and fix the problem differently.
When a MAL function calls language.dataflow(), the thread executing
the call waits until the whole dataflow block has been executed by the
other threads in the dataflow pool. If this is done recursively, we go
through all available threads and all threads end up waiting for their
dataflow block to finish, which doesn't happen since there are no
worker threads available anymore. The solution that was tried before
was to create N threads whenever language.dataflow() was called, and
those threads never exited. This can very quickly cause very many
threads to be created (I have seen over 1300 threads on a system with
many cores). The current solution instead creates a single extra
thread whenever a thread is blocked waiting for its dataflow block to
be finished, and when the block is finished, it stops a single thread
(possibly a different one, but who cares: the result is the same).
This may also fix bug #3258 in a different way than the original fix.
Comment 19163
Date: 2013-09-18 19:33:16 +0200
From: @mlkersten
This 'new' solution does not handle the main problem addressed with the thread
pool per client!
The point is that once a complex/expensive query is started from a fixed pool,
it effectively blocks (administrative) SQL access to the server for querying.
Such server access is needed to inspect the query queue and, e.g., kill the rogue query. => there should be a worker pool per client session!
Also, parallel execution of concurrent queries is left to the OS thread scheduling.
If you want to reduce the pool, then garbage collect the threads set aside per client when his session terminates.
Recursion will now also still create a large number of threads (equal to the
recursion depth), thereby not addressing the real problem. The MAL interpreter
has a high-water mark to stop too deeply recursive function calls (against
stack overflow). A similar approach should be considered if you want to
control the number of processes. => The stack depth can be used to assess whether dataflow parallelism should be used. Otherwise, simply continue in serial mode.
(I thought it already did so.)
I propose to revert this patch in the light of these considerations.
Comment 19164
Date: 2013-09-18 19:51:19 +0200
From: @mlkersten
DFLOWinitialize... here we stop parallel interpretation and continue serially:
    MT_lock_unset(&mal_contextLock, "DFLOWinitialize");
    if (grp > THREADS) {
        // continue non-parallel
        return -1;
    }
runMALdataflow... here we stop parallel processing if there are too many pools:
    /* too many threads turns dataflow processing off */
    if (cntxt->idx > MAXQ) {
        *ret = TRUE;
        return MAL_SUCCEED;
    }
Recursive calls can 'steal' a worker pool, because the recursive function may
have to be run in parallel as well.
Also, if you cannot create a worker thread, we continue with serial execution.
On a large multicore system you may end up with MAXQ * THREADS processes.
Comment 19165
Date: 2013-09-18 20:42:27 +0200
From: @mlkersten
A test like this could also be used in dataflow to turn off parallel processing:
    if ((unsigned)stk->stkdepth > THREAD_STACK_SIZE / sizeof(mb->var[0]) / 4 && THRhighwater())
        /* we are running low on stack space */
Comment 19323
Date: 2013-11-05 14:05:56 +0100
From: MonetDB Mercurial Repository <>
Changeset 21892a4f04a1 made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.
For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=21892a4f04a1
Changeset description:
Fix for bug #3258.
We now maintain a pool of N-1 generic worker threads which is extended
by one client-specific worker thread for each client that enters the
dataflow code.
Comment 19324
Date: 2013-11-05 15:20:20 +0100
From: @sjoerdmullender
Unless proven otherwise, this is now fixed.
Comment 19376
Date: 2013-12-03 13:59:32 +0100
From: @sjoerdmullender
Feb2013-SP6 has been released.