Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monetdb fails on remote tables #6227

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Closed

Monetdb fails on remote tables #6227

monetdb-team opened this issue Nov 30, 2020 · 0 comments
Labels
bug Something isn't working major SQL

Comments

@monetdb-team
Copy link

Date: 2017-02-23 17:12:12 +0100
From: Manuel <>
To: SQL devs <>
Version: 11.23.13 (Jun2016-SP2)
CC: @njnes

Last updated: 2017-03-31 13:28:34 +0200

Comment 25055

Date: 2017-02-23 17:12:12 +0100
From: Manuel <>

User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
Build Identifier:

Good afternoon,

I have been trying using distributed query processing monetdb.

This is my setup:

  • two hosts 'A' and 'B' running monetdb. 'A' and 'B' have the same amount of resources (cpus, memory, storage ....)

On host 'A' I have a setup:
- table 'T1' containing 10 million records.
On host 'B' I have setup:
- table 'T2' containing another set of 10 million records
- remote table 'T1' mapped to T1 on host 'A'
- table 'T3' a table containing the records of 'T1' and 'T2' (as a local table: this contains 20 million records),
- merge table 'T3_remote' as the merge table of 'T2' (local table) and 'T1' (remote table).
'T3' and 'T3_remote' will effectively contain the same 20 million records, one with all records saved locally, the other as a merge table of a local table and a remote table.

All tables have the same schema:

(
"customer_gender" CHAR(1),
"customer_region" VARCHAR(15) NOT NULL,
"customer_country" VARCHAR(40) NOT NULL,
"customer_state" VARCHAR(40),
"customer_city" VARCHAR(50) NOT NULL,
"customer_date_of_birth" DATE,
"customer_id" INTEGER NOT NULL,
"product_family" VARCHAR(20) NOT NULL,
"product_category" VARCHAR(30) NOT NULL,
"product_subcategory" VARCHAR(30) NOT NULL,
"product_name" VARCHAR(50) NOT NULL,
"product_id" INTEGER NOT NULL,
"store_id" INTEGER NOT NULL,
"store_longitude" DECIMAL(12,8) NOT NULL,
"store_latitude" DECIMAL(12,8) NOT NULL,
"store_name" VARCHAR(50) NOT NULL,
"store_manager" VARCHAR(50) NOT NULL,
"store_phone_number" VARCHAR(20) NOT NULL,
"store_region" VARCHAR(15) NOT NULL,
"store_country" VARCHAR(40) NOT NULL,
"store_state" VARCHAR(40),
"store_city" VARCHAR(50) NOT NULL,
"order_date" DATE NOT NULL,
"year_begin_date" DATE,
"quarter_begin_date" DATE,
"month_begin_date" DATE,
"yyyymm" INTEGER,
"yyyymmdd" INTEGER,
"ddmonyyyy" VARCHAR(9),
"line_price" DECIMAL(12,2) NOT NULL,
"line_cost" DECIMAL(12,2) NOT NULL,
"line_margin" DECIMAL(13,2) NOT NULL,
"line_margin_percent" DECIMAL(12,8) NOT NULL
);

All queries will be performed against the Monetdb instance running on host 'B'.

If I try to execute a simple query like :

select * from  "T1" where product_id > 1757;

The query execution fails with the error (of course the query runs fine on the local table):

	TypeException:user.l4[6]:'algebra.thetasubselect' undefined in:     algebra.thetasubselect(X_40:bat[:int],X_41:any,">":str);
	SQLState:  22000
	ErrorCode: 0
	Error: (mapi:monetdb://monetdb@192.168.1.46/visokio) 'algebra.thetasubselect' undefined in:     algebra.thetasubselect(X_46:bat[:int],X_47:any,">":str);
	SQLState:  22000
	ErrorCode: 0

I receive similar errors on every query containing a where clause.

Queries with where conditions also fail on T3_remote.

If I try queries without a where condition they are able to run on the remote table, but it seems that same query executed distributedly runs much slower than the query executed on the local table.
For example the query:

select count(*), avg("line_cost"), "product_family" from "T3_remote" group by "product_family"

takes from 17 seconds up to 28 seconds to execute, while the same query executed against the local table (containing the same records)

select count(*), avg("line_cost"), "product_family" from "T3" group by "product_family"

takes 0.2 up to 4 seconds.

Am I doing something wrong? Is there a reason why I cannot execute a query containing a where clause ?

Also, why is the query involving the remote table so much slower than the same query executed on the local table ? Isn't the query executed in a distribute manner supposed to leverage the processing power of both machines ?

Thank you for your help, let me know if you need some additional information, log files, the actual data of the tables ....

Best regards,

Manuel

Reproducible: Always

Actual Results:

Being able to perform a query on a remote table with a where clause

Comment 25153

Date: 2017-03-16 08:41:53 +0100
From: @njnes

fixed in the Dec2016 branch

@monetdb-team monetdb-team added bug Something isn't working major SQL labels Nov 30, 2020
@sjoerdmullender sjoerdmullender added this to the Ancient Release milestone Feb 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working major SQL
Projects
None yet
Development

No branches or pull requests

2 participants