You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Date: 2017-05-25 16:39:54 +0200
From: Richard Hughes <<richard.monetdb>>
To: SQL devs <>
Version: 11.23.13 (Jun2016-SP2)
CC: @njnes
Last updated: 2019-04-30 12:36:02 +0200
Comment 25349
Date: 2017-05-25 16:39:54 +0200
From: Richard Hughes <<richard.monetdb>>
Created attachment 552
Patch v1 (Jun2016)
The attached patch adds a new relational optimizer pass which removes any unioned selects which cannot contribute any rows to the result set. It applies to Jun2016 7e686221367d; if there's interest from other people then I'll rebase it to something newer.
This is similar to, but a superset of, the merge table functionality implemented in rel_merge_table_rewrite(). I looked at using merge tables in my application, but the following problems exist:
I'd have to change quite a few bits of my application (not a show-stopper, but it seemed more productive to spend the time in Monet instead)
There's no way to make it work with aligned partitions
That last point needs some explanation. My application partitions tables by date, but has multiple associated tables in each date window, call them 'a' and 'b'. It is much more efficient to execute "(a1 join b1) union (a2 join b2) union (a3 join b3)" than "(a1 union a2 union a3) join (b1 union b2 union b3)" because the row count of the intermediate tables is much smaller. Merge tables, however, can only be used to implement the latter. The attached patch works for either.
A minimal example of this patch at work is:
sql>create table sub1 (i int);
sql>create table sub2 (i int);
sql>insert into sub1 values (10),(20);
sql>insert into sub2 values (30),(40);
sql>alter table sub1 set read only;
sql>alter table sub2 set read only;
sql>analyze sys.sub1;
sql>analyze sys.sub2;
sql>plan select i from sub1 where i between 25 and 50 union all select i from sub2 where i between 25 and 50;
+----------------------------------------+
| rel |
+========================================+
| project ( |
| | select ( |
| | | table(sys.sub2) [ sub2.i ] COUNT |
| | ) [ int "25" <= sub2.i <= int "50" ] |
| ) [ sub2.i ] |
+----------------------------------------+
There clearly need to be a whole bunch of tests written for this change, but I'm putting it out there now as an RFC. This is the first time I've messed around in this area of the MonetDB code base, so I'd appreciate it if somebody could tell me if I've done it right.
Date: 2017-05-26 14:24:52 +0200
From: Richard Hughes <<richard.monetdb>>
I should mention why partition elimination is important to me. There are two reasons:
Combining indexes. We have a number of queries like "select ... from unionview where t between timestamp '2017-5-14' and timestamp '2017-5-21' and x=42;". In the absence of partition elimination the quickest way to execute this is to filter on 't' first because that gets the row count down fastest and has the best locality in association with other queries; for this reason we actually write the above query as "... and x in (42)" to ensure that the optimizer sees that clause as less selective than the time. With partition elimination we can effectively use indexes on both t and x: out-of-range 't' tables are initially eliminated then the 'x' filter becomes easily the most selective without compromising locality.
MAL interpreter performance. I did this to myself due to Bug mitosis splits an unhelpful table when applied to unions #3548: most of our plans are around the 20,000-30,000 instruction mark, and 80,000 instructions is not unusual. Benchmarking suggests that the MAL interpreter can dispatch on the order of 20,000-40,000 instructions per second on our hardware (assuming those instructions have no actual work to do) which means that our queries take upwards of a second to return even if they're easy. Eliminating partitions much earlier saves both instruction execution time and MAL generation/optimization time.
The combination of both of these effects allows us to achieve sub-100ms response times in some common cases and generally saves a second or so on every single query.
Date: 2017-05-25 16:39:54 +0200
From: Richard Hughes <<richard.monetdb>>
To: SQL devs <>
Version: 11.23.13 (Jun2016-SP2)
CC: @njnes
Last updated: 2019-04-30 12:36:02 +0200
Comment 25349
Date: 2017-05-25 16:39:54 +0200
From: Richard Hughes <<richard.monetdb>>
Created attachment 552
Patch v1 (Jun2016)
The attached patch adds a new relational optimizer pass which removes any unioned selects which cannot contribute any rows to the result set. It applies to Jun2016 7e686221367d; if there's interest from other people then I'll rebase it to something newer.
This is similar to, but a superset of, the merge table functionality implemented in rel_merge_table_rewrite(). I looked at using merge tables in my application, but the following problems exist:
That last point needs some explanation. My application partitions tables by date, but has multiple associated tables in each date window, call them 'a' and 'b'. It is much more efficient to execute "(a1 join b1) union (a2 join b2) union (a3 join b3)" than "(a1 union a2 union a3) join (b1 union b2 union b3)" because the row count of the intermediate tables is much smaller. Merge tables, however, can only be used to implement the latter. The attached patch works for either.
A minimal example of this patch at work is:
sql>create table sub1 (i int);
sql>create table sub2 (i int);
sql>insert into sub1 values (10),(20);
sql>insert into sub2 values (30),(40);
sql>alter table sub1 set read only;
sql>alter table sub2 set read only;
sql>analyze sys.sub1;
sql>analyze sys.sub2;
sql>plan select i from sub1 where i between 25 and 50 union all select i from sub2 where i between 25 and 50;
+----------------------------------------+
| rel |
+========================================+
| project ( |
| | select ( |
| | | table(sys.sub2) [ sub2.i ] COUNT |
| | ) [ int "25" <= sub2.i <= int "50" ] |
| ) [ sub2.i ] |
+----------------------------------------+
There clearly need to be a whole bunch of tests written for this change, but I'm putting it out there now as an RFC. This is the first time I've messed around in this area of the MonetDB code base, so I'd appreciate it if somebody could tell me if I've done it right.
Comment 25350
Date: 2017-05-26 14:24:52 +0200
From: Richard Hughes <<richard.monetdb>>
I should mention why partition elimination is important to me. There are two reasons:
Combining indexes. We have a number of queries like "select ... from unionview where t between timestamp '2017-5-14' and timestamp '2017-5-21' and x=42;". In the absence of partition elimination the quickest way to execute this is to filter on 't' first because that gets the row count down fastest and has the best locality in association with other queries; for this reason we actually write the above query as "... and x in (42)" to ensure that the optimizer sees that clause as less selective than the time. With partition elimination we can effectively use indexes on both t and x: out-of-range 't' tables are initially eliminated then the 'x' filter becomes easily the most selective without compromising locality.
MAL interpreter performance. I did this to myself due to Bug mitosis splits an unhelpful table when applied to unions #3548: most of our plans are around the 20,000-30,000 instruction mark, and 80,000 instructions is not unusual. Benchmarking suggests that the MAL interpreter can dispatch on the order of 20,000-40,000 instructions per second on our hardware (assuming those instructions have no actual work to do) which means that our queries take upwards of a second to return even if they're easy. Eliminating partitions much earlier saves both instruction execution time and MAL generation/optimization time.
The combination of both of these effects allows us to achieve sub-100ms response times in some common cases and generally saves a second or so on every single query.
Comment 26861
Date: 2019-01-30 12:31:29 +0100
From: MonetDB Mercurial Repository <>
Changeset 2c50b80d80c7 made by Niels Nes niels@cwi.nl in the MonetDB repo, refers to this bug.
For complete details, see https//devmonetdborg/hg/MonetDB?cmd=changeset;node=2c50b80d80c7
Changeset description:
The text was updated successfully, but these errors were encountered: