I created a table with a CTAS statement, analyzed the statistics, and set the primary key.
I try a "count distinct" type query (on the whole table, no filters) on the PK field. It is quite slow, with heavy disk activity.
Reproducible: Always
Steps to Reproduce:
1. Create a table and populate it. Mine had something like 200 million rows.
2. Analyze the statistics and set the primary key on a field.
3. Run a count distinct query on the PK field over the whole table.
Actual Results:
Heavy disk activity and several minutes before getting the result.
Expected Results:
Something like a very quick result: not only is it the PK, but the statistics are computed...
If I do a distinct operation on the PK, I expect the result to be at least as fast as the plain count... and that's not the case at all.
And if it's already in the statistics, I expect it to be immediate.
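The expectation rests on a simple invariant: on a duplicate-free column such as a PK, a distinct count always equals the plain row count, so no deduplication pass should be needed at all. A minimal sketch of that invariant (plain Python with made-up data, not MonetDB code):

```python
# On a unique (PK-like) column, counting distinct values is the same
# as counting rows, so the "expensive" dedup step is pure overhead.
pk_column = list(range(200))  # stand-in for a PK column: all values unique

count_rows = len(pk_column)
count_distinct = len(set(pk_column))  # the deduplicating way

assert count_distinct == count_rows  # holds for any duplicate-free column
print(count_rows, count_distinct)
```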
Date: 2019-02-26 14:17:00 +0100
From: Simon AUBERT <<simon.aubert>>
To: SQL devs <>
Version: 11.31.13 (Aug2018-SP2)
CC: @mlkersten, @njnes
Last updated: 2020-06-03 16:58:52 +0200
Comment 26904
Date: 2019-02-26 14:17:00 +0100
From: Simon AUBERT <<simon.aubert>>
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0
Build Identifier:
here the explain :
sql>explain select count(distinct salesorderdetailid) from formation.salesorderdetails;
+-----------------------------------------------------------------------------+
| mal |
+=============================================================================+
| function user.s10_1():void; |
| X_1:void := querylog.define("explain select count(distinct salesorderde |
: tailid) from formation.salesorderdetails;":str, "default_pipe":str, 14:int) :
: ; :
| barrier X_133:bit := language.dataflow(); |
| X_4:int := sql.mvc(); |
| C_70:bat[:oid] := sql.tid(X_4:int, "formation":str, "salesorderdetails" |
: :str, 0:int, 4:int); :
| X_77:bat[:int] := sql.bind(X_4:int, "formation":str, "salesorderdetails |
: ":str, "salesorderdetailid":str, 0:int, 0:int, 4:int); :
| X_90:bat[:int] := algebra.projection(C_70:bat[:oid], X_77:bat[:int]); |
| (X_97:bat[:oid], C_98:bat[:oid], X_99:bat[:lng]) := group.groupdone(X_9 |
: 0:bat[:int]); :
| X_100:bat[:int] := algebra.projection(C_98:bat[:oid], X_90:bat[:int]); |
| C_72:bat[:oid] := sql.tid(X_4:int, "formation":str, "salesorderdetails" |
: :str, 1:int, 4:int); :
| X_78:bat[:int] := sql.bind(X_4:int, "formation":str, "salesorderdetails |
: ":str, "salesorderdetailid":str, 0:int, 1:int, 4:int); :
| X_91:bat[:int] := algebra.projection(C_72:bat[:oid], X_78:bat[:int]); |
| (X_101:bat[:oid], C_102:bat[:oid], X_103:bat[:lng]) := group.groupdone( |
: X_91:bat[:int]); :
| X_104:bat[:int] := algebra.projection(C_102:bat[:oid], X_91:bat[:int]); |
| C_74:bat[:oid] := sql.tid(X_4:int, "formation":str, "salesorderdetails" |
: :str, 2:int, 4:int); :
| X_79:bat[:int] := sql.bind(X_4:int, "formation":str, "salesorderdetails |
: ":str, "salesorderdetailid":str, 0:int, 2:int, 4:int); :
| X_92:bat[:int] := algebra.projection(C_74:bat[:oid], X_79:bat[:int]); |
| (X_105:bat[:oid], C_106:bat[:oid], X_107:bat[:lng]) := group.groupdone( |
: X_92:bat[:int]); :
| X_108:bat[:int] := algebra.projection(C_106:bat[:oid], X_92:bat[:int]); |
| C_76:bat[:oid] := sql.tid(X_4:int, "formation":str, "salesorderdetails" |
: :str, 3:int, 4:int); :
| X_80:bat[:int] := sql.bind(X_4:int, "formation":str, "salesorderdetails |
: ":str, "salesorderdetailid":str, 0:int, 3:int, 4:int); :
| X_93:bat[:int] := algebra.projection(C_76:bat[:oid], X_80:bat[:int]); |
| (X_109:bat[:oid], C_110:bat[:oid], X_111:bat[:lng]) := group.groupdone( |
: X_93:bat[:int]); :
| X_112:bat[:int] := algebra.projection(C_110:bat[:oid], X_93:bat[:int]); |
| X_127:bat[:int] := mat.packIncrement(X_100:bat[:int], 4:int); |
| X_129:bat[:int] := mat.packIncrement(X_127:bat[:int], X_104:bat[:int]); |
| X_130:bat[:int] := mat.packIncrement(X_129:bat[:int], X_108:bat[:int]); |
| X_17:bat[:int] := mat.packIncrement(X_130:bat[:int], X_112:bat[:int]); |
| (X_18:bat[:oid], C_19:bat[:oid], X_113:bat[:lng]) := group.groupdone(X_ |
: 17:bat[:int]); :
| X_21:bat[:int] := algebra.projection(C_19:bat[:oid], X_17:bat[:int]); |
| X_22:lng := aggr.count(X_21:bat[:int], true:bit); |
| language.pass(X_90:bat[:int]); |
| language.pass(X_91:bat[:int]); |
| language.pass(X_92:bat[:int]); |
| language.pass(X_93:bat[:int]); |
| language.pass(X_17:bat[:int]); |
| exit X_133:bit; |
| sql.resultSet("formation.L3":str, "L3":str, "bigint":str, 64:int, 0:int |
: , 7:int, X_22:lng); :
| end user.s10_1; |
| inline actions= 0 time=1 usec |
| remap actions= 0 time=2 usec |
| costmodel actions= 1 time=1 usec |
| coercion actions= 0 time=160 usec |
| evaluate actions= 0 time=14 usec |
| emptybind actions= 1 time=57 usec |
| pushselect actions= 0 time=4 usec |
| aliases actions= 1 time=5 usec |
| mitosis actions=1 time=2237 usec |
| mergetable actions= 2 time=845 usec |
| deadcode actions=11 time=13 usec |
| aliases actions= 0 time=0 usec |
| constants actions= 3 time=174 usec |
| commonTerms actions= 5 time=26 usec |
| projectionpath actions= 0 time=14 usec |
| deadcode actions= 5 time=6 usec |
| reorder actions= 1 time=86 usec |
| matpack actions= 1 time=636 usec |
| dataflow actions= 1 time=37 usec |
| multiplex actions= 0 time=1 usec |
| profiler actions=1 time=1 usec |
| candidates actions=1 time=0 usec |
| deadcode actions= 0 time=32 usec |
| wlc actions= 0 time=2 usec |
| garbagecollector actions= 1 time=112 usec |
| total actions=28 time=6334 usec |
+-----------------------------------------------------------------------------+
Comment 26985
Date: 2019-05-01 13:14:51 +0200
From: @njnes
Statistics are hints to the optimizer, not pre-computed results. The problem here is that the distinct is inefficiently computed because of a multi-partition approach.
Comment 26988
Date: 2019-05-01 21:58:44 +0200
From: Simon AUBERT <<simon.aubert>>
Hello Niels.
Do you mean that MonetDB has a multi-partition approach? Because I didn't use partitions at all here.
Comment 27023
Date: 2019-06-04 12:25:14 +0200
From: @sjoerdmullender
MonetDB parallelizes query plans all by itself. You can see this happened in your query from the number of calls to mat.packIncrement.
The query requires a call to group.groupdone on the entire column. Often it is faster to do that in parallel first on parts of the input column and then again on the combination of the results of those parallel calls. But this is not efficient if the input column has few duplicate values: in that case, the combined group.groupdone works over (almost) the entire input column, in effect doing all the work again. If there are many duplicates, doing it twice is more efficient, since the final group.groupdone only works over a small column, namely the unique values from each of the partitions, and the partitions are done in parallel.
So, the problem is that sometimes MonetDB should partition the input, and sometimes not. And here we chose wrong.
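The tradeoff Sjoerd describes can be sketched outside MonetDB. The following Python simulation (my own illustration, not MonetDB code) mimics the two-phase groupdone: deduplicate each partition, concatenate the per-partition uniques, then deduplicate the combination. On a unique (PK-like) column the second phase sees the full input again; on a heavily duplicated column it sees only a tiny one.

```python
def two_phase_distinct(column, parts=4):
    """Mimic MonetDB's partitioned grouping: dedupe each slice (phase 1),
    then dedupe the concatenation of the per-slice uniques (phase 2).
    Returns (distinct count, size of the phase-2 input)."""
    step = (len(column) + parts - 1) // parts
    slices = [column[i:i + step] for i in range(0, len(column), step)]
    partial_uniques = [list(dict.fromkeys(s)) for s in slices]  # per-partition groupdone
    combined = [v for part in partial_uniques for v in part]    # mat.packIncrement analogue
    final_uniques = dict.fromkeys(combined)                     # combined groupdone
    return len(final_uniques), len(combined)

# PK-like column: every value unique -> phase 2 re-scans the whole input.
print(two_phase_distinct(list(range(1000))))          # (1000, 1000)

# Heavily duplicated column -> phase 2 input is tiny.
print(two_phase_distinct([i % 10 for i in range(1000)]))  # (10, 40)
```

With unique values the final pass redoes all the work (1000 of 1000 values regrouped), while with duplicates it touches only 40; this is exactly the case the partitioning decision gets wrong for a PK column.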
Comment 27031
Date: 2019-06-06 23:42:26 +0200
From: Simon AUBERT <<simon.aubert>>
Hello Sjoerd,
I'm not sure but shouldn't the decision take into consideration statistics (or pk..)?
Best regards,
Simon
Comment 27181
Date: 2019-07-27 23:01:09 +0200
From: @mlkersten
The underlying routine is the MAL mitosis optimizer. It only looks at the largest table, without considering how it is being used. Avoiding such a split beyond a trivial case like this one would require quite a bit of symbolic reasoning over the complete algebraic plan.
One could say that calling for a COUNT DISTINCT on a PK seems like a misunderstanding of the PK concept.
I propose to drop this from further consideration.
Comment 27187
Date: 2019-07-29 07:13:50 +0200
From: Simon AUBERT <<simon.aubert>>
Hello Martin,
"One could say that calling for a COUNT DISTINCT on a PK seems like a misunderstanding of the PK concept."
Probably. However, I cannot control every query sent to the database when it is used with self-service BI tools such as Tableau or Spotfire. The end user doesn't know the field is a PK and often doesn't even know SQL. I expect a database to be robust/smart enough to take that into account, especially since the query itself isn't wrong.
Comment 27188
Date: 2019-07-29 07:17:32 +0200
From: Simon AUBERT <<simon.aubert>>
Moreover, it would also be deadly slow without the PK.
Comment 27595
Date: 2020-03-15 11:05:32 +0100
From: @mlkersten
The current code base avoids the copying of the columns. This removes the overhead and returns within 4 ms. Extra (complex) optimizations for this pattern will marginally improve the run time at the cost of more optimization steps.
create table cds(i integer primary key);
insert into cds select * from generate_series(0,1000000, 1);
sql>select count(distinct i) from cds;
+---------+
| %1 |
+=========+
| 1000000 |
+---------+
1 tuple
sql:0.000 opt:1.493 run:1.024 clk:4.042 ms
sql>explain select count(distinct i) from cds;
+-------------------------------------------------------------------------------------------------------+
| mal |
+=======================================================================================================+
| function user.s26_0():void; |
| X_1:void := querylog.define("select count(distinct i) from cds;":str, "default_pipe":str, 9:int); |
| barrier X_93:bit := language.dataflow(); |
| X_4:int := sql.mvc(); |
| C_62:bat[:oid] := sql.tid(X_4:int, "sys":str, "cds":str, 0:int, 4:int); |
| X_79:lng := aggr.count(C_62:bat[:oid], true:bit); |
| C_64:bat[:oid] := sql.tid(X_4:int, "sys":str, "cds":str, 1:int, 4:int); |
| X_80:lng := aggr.count(C_64:bat[:oid], true:bit); |
| C_66:bat[:oid] := sql.tid(X_4:int, "sys":str, "cds":str, 2:int, 4:int); |
| X_81:lng := aggr.count(C_66:bat[:oid], true:bit); |
| C_68:bat[:oid] := sql.tid(X_4:int, "sys":str, "cds":str, 3:int, 4:int); |
| X_82:lng := aggr.count(C_68:bat[:oid], true:bit); |
| X_78:bat[:lng] := mat.pack(X_79:lng, X_80:lng, X_81:lng, X_82:lng); |
| X_83:bat[:lng] := algebra.selectNotNil(X_78:bat[:lng]); |
| X_12:lng := aggr.sum(X_83:bat[:lng]); |
| exit X_93:bit; |
| sql.resultSet(".%1":str, "%1":str, "bigint":str, 64:int, 0:int, 7:int, X_12:lng); |
| end user.s26_0; |
+-------------------------------------------------------------------------------------------------------+
46 tuples
sql:0.000 opt:0.000 run:0.000 clk:3.775 ms
Comment 27646
Date: 2020-04-01 14:45:56 +0200
From: Simon AUBERT <<simon.aubert>>
Hello,
I see the status has been updated to "RESOLVED NEXTRELEASE". Thanks a lot. :)
Martin Kersten: thanks for the explanations and the comments.