You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
Build Identifier:
start transaction;
CREATE AGGREGATE pycount(val STRING)
RETURNS INTEGER
LANGUAGE PYTHON {
unique = numpy.unique(aggr_group)
x = numpy.zeros(shape=(unique.size))
for i in range(0, unique.size):
x[i] = val[aggr_group==unique[i]].size
return(x)
};
sql>select count(term) from (values (1,'apple'), (2,'pear')) as x(id, term);
+------+
| L5 |
+======+
| 2 |
+------+
1 tuple
sql>select pycount(term) from (values (1,'apple'), (2,'pear')) as x(id, term);
Python exception
def pyfun(val,_columns,_column_types,_conn):
unique = numpy.unique(aggr_group)
x = numpy.zeros(shape=(unique.size))
for i in range(0, unique.size):
global name 'aggr_group' is not defined
One could say that this is correct, as there are no groups for this aggregation.
In that case, probably the python aggregation should check whether aggr_group exists, and aggregate over all rows in case it doesn't.
Is that the correct interpretation? If so, maybe this should be better explained somewhere.
The original post (https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb) does not mention this case.
Also, to simplify things, maybe aggr_group should also be passed, but with a null value. That would be a cleaner test to perform in the python code.
The other interpretation could be that there is 1 group, which includes all rows.
In that case, the absence of aggr_group is a bug.
So instead of:
select pycount(term) from (values (1,'apple'), (2,'pear')) as x(id, term);
use:
select pycount(term) from (values (1,'apple'), (2,'pear')) as x(id, term) group by id;
To make it clear: I do agree that the problem was not in the code, but in the documentation.
However, the solution is not to use the GROUP BY. That would yield a different result (the original query was an aggregation with no grouping).
The correct solution is to check for the existence of aggr_group in the python function:
try:
aggr_group
except NameError:
aggr_group doesn't exist
no groups, aggregate on all data
else:
aggr_group exists
aggregate on groups
Note that this should be done on every python aggregate function defined, no exception.
If you want to make the docs even clearer, perhaps you could include something like that.
Comment 27443
Date: 2019-12-05 15:07:14 +0100
From: Martin van Dinther <<martin.van.dinther>>
I have updated the documentation. It now uses:
try:
unique = numpy.unique(aggr_group)
x = numpy.zeros(shape=(unique.size))
for i in range(0, unique.size):
x[i] = numpy.sum(val[aggr_group==unique[i]])
except NameError:
aggr_group doesn't exist. no groups, aggregate on all data
x = numpy.sum(val)
return(x)
This way it behaves similar to SQL aggregate functions.
I tested it with SQL queries:
SELECT groupnr, python_aggregate(value) FROM grouped_ints GROUP BY groupnr;
SELECT value, python_aggregate(groupnr) FROM grouped_ints GROUP BY value;
SELECT python_aggregate(groupnr) FROM grouped_ints;
SELECT python_aggregate(value) FROM grouped_ints;
and for the extended table (I called it grouped_ints2):
SELECT groupnr, groupnr2, python_aggregate(value) FROM grouped_ints2 GROUP BY groupnr, groupnr2;
SELECT python_aggregate(groupnr) FROM grouped_ints2;
SELECT python_aggregate(groupnr2) FROM grouped_ints2;
SELECT python_aggregate(value) FROM grouped_ints2;
These queries all work now and with the correct output.
Add test for bug #6726
Note that query: SELECT value, python_aggregate(groupnr) FROM grouped_ints GROUP BY value;
does not give the correct answer. It only returned two rows instead of 5.
Comment 27447
Date: 2019-12-05 16:20:54 +0100
From: Martin van Dinther <<martin.van.dinther>>
Reopened as test reveals a new problem in one case where the grouping is done on a column with unique values. See query:
SELECT value, python_aggregate(groupnr) FROM grouped_ints GROUP BY value;
Fix for bug #6726 failing test. Look for TYPE_void bats in pyapi and generate dense sequences from them. (I have to check for 32-bit architectures yet)
The text was updated successfully, but these errors were encountered:
Date: 2019-07-05 16:20:24 +0200
From: @swingbit
To: SQL devs <>
Version: 11.31.13 (Aug2018-SP2)
CC: martin.van.dinther, @PedroTadim
Last updated: 2019-12-20 15:36:33 +0100
Comment 27113
Date: 2019-07-05 16:20:24 +0200
From: @swingbit
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
Build Identifier:
start transaction;
CREATE AGGREGATE pycount(val STRING)
RETURNS INTEGER
LANGUAGE PYTHON {
unique = numpy.unique(aggr_group)
x = numpy.zeros(shape=(unique.size))
for i in range(0, unique.size):
x[i] = val[aggr_group==unique[i]].size
return(x)
};
sql>select count(term) from (values (1,'apple'), (2,'pear')) as x(id, term);
+------+
| L5 |
+======+
| 2 |
+------+
1 tuple
sql>select pycount(term) from (values (1,'apple'), (2,'pear')) as x(id, term);
Python exception
global name 'aggr_group' is not defined
One could say that this is correct, as there are no groups for this aggregation.
In that case, probably the python aggregation should check whether aggr_group exists, and aggregate over all rows in case it doesn't.
Is that the correct interpretation? If so, maybe this should be better explained somewhere.
The original post (https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb) does not mention this case.
Also, to simplify things, maybe aggr_group should also be passed, but with a null value. That would be a cleaner test to perform in the python code.
The other interpretation could be that there is 1 group, which includes all rows.
In that case, the absence of aggr_group is a bug.
Reproducible: Always
Comment 27124
Date: 2019-07-11 10:52:33 +0200
From: @sjoerdmullender
I suppose the documentation could be better, but your function could also catch the problem:
CREATE AGGREGATE pycount(val STRING)
RETURNS INTEGER
LANGUAGE PYTHON {
try:
unique = numpy.unique(aggr_group)
except NameError:
unique = numpy.unique([0])
x = numpy.zeros(shape=(unique.size))
for i in range(0, unique.size):
x[i] = val[aggr_group==unique[i]].size
return(x)
};
Comment 27125
Date: 2019-07-11 10:54:09 +0200
From: @swingbit
Indeed, I just was not sure whether this is how it was intended to be.
Comment 27146
Date: 2019-07-17 17:14:47 +0200
From: Martin van Dinther <<martin.van.dinther>>
FYI: https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb has been updated with info that parameter aggr_group is only created when GROUP BY is used.
So instead of:
select pycount(term) from (values (1,'apple'), (2,'pear')) as x(id, term);
use:
select pycount(term) from (values (1,'apple'), (2,'pear')) as x(id, term) group by id;
Comment 27147
Date: 2019-07-17 17:43:55 +0200
From: @swingbit
Thanks for adding to the documentation.
To make it clear: I do agree that the problem was not in the code, but in the documentation.
However, the solution is not to use the GROUP BY. That would yield a different result (the original query was an aggregation with no grouping).
The correct solution is to check for the existence of aggr_group in the python function:
try:
aggr_group
except NameError:
aggr_group doesn't exist
no groups, aggregate on all data
else:
aggr_group exists
aggregate on groups
Note that this should be done on every python aggregate function defined, no exception.
If you want to make the docs even clearer, perhaps you could include something like that.
Comment 27443
Date: 2019-12-05 15:07:14 +0100
From: Martin van Dinther <<martin.van.dinther>>
I have updated the documentation. It now uses:
try:
unique = numpy.unique(aggr_group)
x = numpy.zeros(shape=(unique.size))
for i in range(0, unique.size):
x[i] = numpy.sum(val[aggr_group==unique[i]])
except NameError:
aggr_group doesn't exist. no groups, aggregate on all data
x = numpy.sum(val)
return(x)
This way it behaves similar to SQL aggregate functions.
I tested it with SQL queries:
SELECT groupnr, python_aggregate(value) FROM grouped_ints GROUP BY groupnr;
SELECT value, python_aggregate(groupnr) FROM grouped_ints GROUP BY value;
SELECT python_aggregate(groupnr) FROM grouped_ints;
SELECT python_aggregate(value) FROM grouped_ints;
and for the extended table (I called it grouped_ints2):
SELECT groupnr, groupnr2, python_aggregate(value) FROM grouped_ints2 GROUP BY groupnr, groupnr2;
SELECT python_aggregate(groupnr) FROM grouped_ints2;
SELECT python_aggregate(groupnr2) FROM grouped_ints2;
SELECT python_aggregate(value) FROM grouped_ints2;
These queries all work now and with the correct output.
Comment 27446
Date: 2019-12-05 16:16:07 +0100
From: MonetDB Mercurial Repository <>
Changeset ab919d359405 made by Martin van Dinther martin.van.dinther@monetdbsolutions.com in the MonetDB repo, refers to this bug.
For complete details, see https//devmonetdborg/hg/MonetDB?cmd=changeset;node=ab919d359405
Changeset description:
Comment 27447
Date: 2019-12-05 16:20:54 +0100
From: Martin van Dinther <<martin.van.dinther>>
Reopened as test reveals a new problem in one case where the grouping is done on a column with unique values. See query:
SELECT value, python_aggregate(groupnr) FROM grouped_ints GROUP BY value;
Comment 27460
Date: 2019-12-06 15:32:19 +0100
From: MonetDB Mercurial Repository <>
Changeset 2d8848e66425 made by Pedro Ferreira pedro.ferreira@monetdbsolutions.com in the MonetDB repo, refers to this bug.
For complete details, see https//devmonetdborg/hg/MonetDB?cmd=changeset;node=2d8848e66425
Changeset description:
The text was updated successfully, but these errors were encountered: