Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unexpectedly slow execution of SELECT length(fieldname) FROM tablename LIMIT 1 queries #6298

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Labels

Comments

@monetdb-team
Copy link

Date: 2017-05-02 18:46:22 +0200
From: Anthony Bucci <>
To: SQL devs <>
Version: 11.25.21 (Dec2016-SP4)
CC: anthony, imus, kesil, @PedroTadim, shery40, @yzchang

Last updated: 2020-06-03 16:58:53 +0200

Comment 25283

Date: 2017-05-02 18:46:22 +0200
From: Anthony Bucci <>

User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0
Build Identifier:

A query of the form SELECT length(fieldname) FROM tablename LIMIT 1 takes an unexpectedly long time to execute on a table with several million rows. It's taking approximately 7 seconds on my hardware, whereas the corresponding SELECT fieldname FROM tablename LIMIT 1 runs in milliseconds. SELECT max(length(fieldname)) FROM tablename (maximizing the length value over the entire table) runs in 7 seconds as well. I tried converting the CLOB to a VARCHAR but this change does not appear to affect the behavior.

I am not adept at reading monetdb's EXPLAIN output, but it appears to me that the execution plan first computes length of the plaintext field over the entire table, and then returns the first row of that result set.

Reproducible: Always

Steps to Reproduce:

  1. CREATE TABLE "sys"."testtable" (
    "field1" VARCHAR(100),
    "field2" VARCHAR(100),
    "raw_text" CHARACTER LARGE OBJECT,
    "simple_text" CHARACTER LARGE OBJECT,
    "plaintext" CHARACTER LARGE OBJECT,
    "id" UUID
    );
  2. populate with data (approximately 6 million rows)
  3. select length(plaintext) from testtable limit 1;

Actual Results:

Observe that the SELECT in step 3 takes a considerable amount of time to execute, roughly the same amount of time as maximizing length over the entire table (select max(length(plaintext)) from testtable;)

Expected Results:

The SELECT in step 3 should execute as quickly as a simple SELECT/LIMIT of the field in question, e.g. as quickly as the query select plaintext from testtable limit 1;

I am not adept at reading monetdb's EXPLAIN output, but it appears to me that the execution plan first computes length of the plaintext field over the entire table, and then returns the first row of that result set. I would expect monetdb to first create a result set with a single row and only compute length on the plaintext field of that set, then return it, which should result in an execution that takes negligibly longer than simply selecting the field.

user@machine:~$ mserver5 --version
MonetDB 5 server v11.25.21 "Dec2016-SP4" (64-bit, 128-bit integers)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2017 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 7.7GiB available memory, 2 available cpu cores
Libraries:
libpcre: 8.38 2015-11-23 (compiled with 8.38)
openssl: OpenSSL 1.0.2g 1 Mar 2016 (compiled with OpenSSL 1.0.2g-fips 1 Mar 2016)
libxml2: 2.9.3 (compiled with 2.9.3)
Compiled by: pbuilder@dev.monetdb.org (x86_64-pc-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -g -D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64

user@machine:~$ mclient --version
mclient, the MonetDB interactive terminal (Dec2016-SP4)
support for command-line editing compiled-in
character encoding: UTF-8

user@machine:~$ mclient -u monetdb -d uspto-esa
password:
Welcome to mclient, the MonetDB/SQL interactive terminal (Dec2016-SP4)
Database: MonetDB v11.25.21 (Dec2016-SP4), 'mapi:monetdb://machine:50000/uspto-esa'
Type \q to quit, ? for a list of available commands
auto commit mode: on
sql>\d testtable
CREATE TABLE "sys"."testtable" (
"field1" VARCHAR(100),
"field2" VARCHAR(100),
"raw_text" CHARACTER LARGE OBJECT,
"simple_text" CHARACTER LARGE OBJECT,
"plaintext" CHARACTER LARGE OBJECT,
"id" UUID
);
sql>select plaintext from testtable limit 1;
[RESULT]
1 tuple (38.735ms)
sql>select length(plaintext) from testtable limit 1;
[RESULT]
1 tuple (7.6s)
sql>select max(length(plaintext)) from testtable;
[RESULT]
1 tuple (7.1s)
sql>plan select length(plaintext) from testtable limit 1;
+-------------------------------------------------------+
| rel |
+=======================================================+
| top N ( |
| | project ( |
| | | table(sys.testtable) [ testtable.plaintext ] COUNT |
| | ) [ sys.length(testtable.plaintext) as L2.L2 ] |
| ) [ bigint "1" ] |
+-------------------------------------------------------+
5 tuples (1.047ms)
sql>explain select length(plaintext) from testtable limit 1;
+-----------------------------------------------------------------------------------------------------------+
| mal |
+===========================================================================================================+
| function user.s4_1():void; |
| X_0:void := querylog.define("select length(plaintext) from testtable limit 1;","default_pipe",32:int); |
| barrier X_96 := language.dataflow(); |
| X_29 := bat.new(nil:str); |
| X_35 := bat.new(nil:int); |
| X_33 := bat.new(nil:int); |
| X_32 := bat.new(nil:str); |
| X_31 := bat.new(nil:str); |
| X_3 := sql.mvc(); |
| C_61:bat[:oid] := sql.tid(X_3,"sys","testtable",0:int,2:int); |
| X_64:bat[:str] := sql.bind(X_3,"sys","testtable","plaintext",0:int,0:int,2:int); |
| X_67 := algebra.projection(C_61,X_64); |
| X_69 := batstr.length(X_67); |
| X_71 := algebra.subslice(X_69,0:lng,0:lng); |
| C_63:bat[:oid] := sql.tid(X_3,"sys","testtable",1:int,2:int); |
| X_65:bat[:str] := sql.bind(X_3,"sys","testtable","plaintext",0:int,1:int,2:int); |
| X_68 := algebra.projection(C_63,X_65); |
| X_70 := batstr.length(X_68); |
| X_72 := algebra.subslice(X_70,0:lng,0:lng); |
| X_86 := mat.packIncrement(X_71,2:int); |
| X_73 := mat.packIncrement(X_86,X_72); |
| X_26 := algebra.subslice(X_73,0:lng,0:lng); |
| X_75 := algebra.projection(X_71,X_69); |
| X_76 := algebra.projection(X_72,X_70); |
| X_89 := mat.packIncrement(X_75,2:int); |
| X_74 := mat.packIncrement(X_89,X_76); |
| X_27 := algebra.projection(X_26,X_74); |
| language.pass(X_71); |
| language.pass(X_69); |
| language.pass(X_72); |
| language.pass(X_70); |
| exit X_96; |
| X_36 := bat.append(X_29,"sys.L2"); |
| X_38 := bat.append(X_31,"L2"); |
| X_40 := bat.append(X_32,"int"); |
| X_42 := bat.append(X_33,32:int); |
| X_44 := bat.append(X_35,0:int); |
| sql.resultSet(X_36,X_38,X_40,X_42,X_44,X_27); |
| end user.s4_1; |
| inline actions= 0 time=5 usec |
| remap actions= 1 time=27 usec |
| costmodel actions= 1 time=4 usec |
| coercion actions= 2 time=14 usec |
| evaluate actions= 2 time=64 usec |
| emptybind actions= 3 time=45 usec |
| pushselect actions= 0 time=21 usec |
| aliases actions= 3 time=15 usec |
| mitosis actions= 1 time=60 usec |
| mergetable actions= 4 time=118 usec |
| deadcode actions= 5 time=24 usec |
| aliases actions= 0 time=10 usec |
| constants actions= 7 time=19 usec |
| commonTerms actions= 0 time=19 usec |
| projectionpath actions= 0 time=14 usec |
| deadcode actions= 0 time=19 usec |
| reorder actions= 1 time=72 usec |
| reduce actions=42 time=32 usec |
| matpack actions= 2 time=27 usec |
| dataflow actions=34 time=173 usec |
| multiplex actions= 0 time=17 usec |
| profiler actions= 1 time=20 usec |
| candidates actions= 1 time=6 usec |
| garbagecollector actions= 1 time=89 usec |
| total actions= 1 time=1680 usec |
+-----------------------------------------------------------------------------------------------------------+
64 tuples (5.183ms)
sql>trace select length(plaintext) from testtable limit 1;
[RESULT]
1 tuple (8.7s)
+---------+------------------------------------------------------------------------------------------------------------+
| usec | statement |
+=========+============================================================================================================+
| 127 | X_0="select length(plaintext) from testtable limit 1;":void := querylog.define("select length(plaintext) fr |
: : om testtable limit 1;":str,"default_pipe":str,32:int); :
| 124 | X_35=<tmp_4713>[0]:bat[:int] := bat.new(nil:int); |
| 92 | X_33=<tmp_5120>[0]:bat[:int] := bat.new(nil:int); |
| 112 | X_32=<tmp_5222>[0]:bat[:str] := bat.new(nil:str); |
| 95 | X_31=<tmp_5322>[0]:bat[:str] := bat.new(nil:str); |
| 89 | X_3=0:int := sql.mvc(); |
| 264 | X_29=<tmp_4612>[0]:bat[:str] := bat.new(nil:str); |
| 159 | C_61=<tmp_5422>[3243727]:bat[:oid] := sql.tid(X_3=0:int,"sys":str,"testtable":str,0:int,2:int); |
| 148 | C_63=<tmp_4105>[3243727]:bat[:oid] := sql.tid(X_3=0:int,"sys":str,"testtable":str,1:int,2:int); |
| 244 | X_64=<tmp_5015>[3243727]:bat[:str] := sql.bind(X_3=0:int,"sys":str,"testtable":str,"plaintext":str,0:int,0: |
: : int,2:int); :
| 284 | X_67=<tmp_4005>[3243727]:bat[:str] := algebra.projection(C_61=<tmp_5422>[3243727]:bat[:oid],X_64=<tmp_5015 |
: : >[3243727]:bat[:str]); :
| 4245 | X_65=<tmp_5522>[3243727]:bat[:str] := sql.bind(X_3=0:int,"sys":str,"testtable":str,"plaintext":str,0:int,1: |
: : int,2:int); :
| 117 | X_68=<tmp_5625>[3243727]:bat[:str] := algebra.projection(C_63=<tmp_4105>[3243727]:bat[:oid],X_65=<tmp_5522 |
: : >[3243727]:bat[:str]); :
| 6421959 | X_70=<tmp_5522>[3243727]:bat[:int] := batstr.length(X_68=<tmp_5625>[3243727]:bat[:str]); |
| 110 | X_72=<tmp_5625>[1]:bat[:oid] := algebra.subslice(X_70=<tmp_5522>[3243727]:bat[:int],0:lng,0:lng); |
| 99 | X_76=<tmp_4105>[1]:bat[:int] := algebra.projection(X_72=<tmp_5625>[1]:bat[:oid],X_70=<tmp_5522>[3243727]:b |
: : at[:int]); :
| 74 | language.pass(X_70=<tmp_5522>[3243727]:bat[:int]); |
| 7415885 | X_69=<tmp_5015>[3243727]:bat[:int] := batstr.length(X_67=<tmp_4005>[3243727]:bat[:str]); |
| 109 | X_71=<tmp_4005>[1]:bat[:oid] := algebra.subslice(X_69=<tmp_5015>[3243727]:bat[:int],0:lng,0:lng); |
| 102 | X_86=<tmp_5422>[1]:bat[:oid] := mat.packIncrement(X_71=<tmp_4005>[1]:bat[:oid],2:int); |
| 233 | X_75=<tmp_5522>[1]:bat[:int] := algebra.projection(X_71=<tmp_4005>[1]:bat[:oid],X_69=<tmp_5015>[3243727]:b |
: : at[:int]); :
| 77 | language.pass(X_71=<tmp_4005>[1]:bat[:oid]); |
| 95 | X_89=<tmp_4005>[1]:bat[:int] := mat.packIncrement(X_75=<tmp_5522>[1]:bat[:int],2:int); |
| 89 | X_74=<tmp_4005>[2]:bat[:int] := mat.packIncrement(X_89=<tmp_4005>[2]:bat[:int],X_76=<tmp_4105>[1]:bat[:int |
: : ]); :
| 70 | language.pass(X_69=<tmp_5015>[3243727]:bat[:int]); |
| 97 | X_73=<tmp_5422>[2]:bat[:oid] := mat.packIncrement(X_86=<tmp_5422>[2]:bat[:oid],X_72=<tmp_5625>[1]:bat[:oid |
: : ]); :
| 72 | language.pass(X_72=<tmp_5625>[1]:bat[:oid]); |
| 98 | X_26=<tmp_5625>[1]:bat[:oid] := algebra.subslice(X_73=<tmp_5422>[2]:bat[:oid],0:lng,0:lng); |
| 94 | X_27=<tmp_5422>[1]:bat[:int] := algebra.projection(X_26=<tmp_5625>[1]:bat[:oid],X_74=<tmp_4005>[2]:bat[:in |
: : t]); :
| 7446593 | barrier X_96=false:bit := language.dataflow(); |
| 106 | X_36=<tmp_4612>[1]:bat[:str] := bat.append(X_29=<tmp_4612>[1]:bat[:str],"sys.L2":str); |
| 84 | X_38=<tmp_5322>[1]:bat[:str] := bat.append(X_31=<tmp_5322>[1]:bat[:str],"L2":str); |
| 84 | X_40=<tmp_5222>[1]:bat[:str] := bat.append(X_32=<tmp_5222>[1]:bat[:str],"int":str); |
| 94 | X_42=<tmp_5120>[1]:bat[:int] := bat.append(X_33=<tmp_5120>[1]:bat[:int],32:int); |
| 82 | X_44=<tmp_4713>[1]:bat[:int] := bat.append(X_35=<tmp_4713>[1]:bat[:int],0:int); |
| 168 | X_28=5:int := sql.resultSet(X_36=<tmp_4612>[1]:bat[:str],X_38=<tmp_5322>[1]:bat[:str],X_40=<tmp_5222>[1]:b |
: : at[:str],X_42=<tmp_5120>[1]:bat[:int],X_44=<tmp_4713>[1]:bat[:int],X_27=<tmp_5422>[1]:bat[:int]); :
+---------+------------------------------------------------------------------------------------------------------------+
36 tuples (8.7s)

Comment 25284

Date: 2017-05-03 13:30:14 +0200
From: @yzchang

The situation is indeed caused by that LIMIT is not pushed down (shown clearly in the TRACE output). It's been on our TODO list for quite sometime to push down some simple LIMIT cases... Can you please mark this ticket as an "enhancement" with higher importance?

If possible, a quicker workaround is of course to first select the 1 "fieldname", before computing its length().

It's unrelated to CLOB or VARCHAR. Computing the length of a CLOB or a VARCHAR should have the same cost. So changing the data type won't help.

Comment 25285

Date: 2017-05-03 15:59:17 +0200
From: Anthony Bucci <>

Thank you, I've changed the ticket to enhancement and raised the importance to High. I'll experiment with the workaround you suggest.

(In reply to Ying Zhang from comment 1)

The situation is indeed caused by that LIMIT is not pushed down (shown
clearly in the TRACE output). It's been on our TODO list for quite sometime
to push down some simple LIMIT cases... Can you please mark this ticket as
an "enhancement" with higher importance?

If possible, a quicker workaround is of course to first select the 1
"fieldname", before computing its length().

It's unrelated to CLOB or VARCHAR. Computing the length of a CLOB or a
VARCHAR should have the same cost. So changing the data type won't help.

Comment 27673

Date: 2020-04-17 14:11:05 +0200
From: @PedroTadim

A new optimization was added today that pushes down limit and sample relations under projections. It will be available in the next feature release in June.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants