Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ANSI Client clobbered data ::SQLTables #2336

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Closed

ANSI Client clobbered data ::SQLTables #2336

monetdb-team opened this issue Nov 30, 2020 · 0 comments
Labels
bug Something isn't working GDK Kernel normal

Comments

@monetdb-team
Copy link

Date: 2010-03-14 03:40:27 +0100
From: Farid Z <>
To: GDK devs <>
Version: -- development

Last updated: 2011-09-16 15:04:41 +0200

Comment 13108

Date: 2010-03-14 03:40:27 +0100
From: Farid Z <>

I am using an ANSI ODBC client (not Unicode) with latest version of MonetDB ODBC driver 1.36.01.01 2/24/2010 and I am getting clobbed resultset comparing demo database loaded with VOC dataset to itself by using Zidsoft CompareData app.

Does the ODBC driver not support ANSI clients?

Comment 13109

Date: 2010-03-15 18:03:09 +0100
From: @sjoerdmullender

The MonetDB ODBC driver only support Unicode and UTF-8. That is to say, the wide character calls are expected to use UCS-16, and the narrow character calls are expected to use UTF-8. The driver does not support any other character sets (well, ASCII since it is a proper subset of UTF-8).

Comment 13110

Date: 2010-03-15 18:28:28 +0100
From: Farid Z <>

My app is ANSI so I am using narrow calls. So that's not the cause of this issue. I see the cause of the problem: the driver is returning incorrect information for ::SQLTables character resultset columns:

     rc = SQLColAttribute(hstmt,
                          nCol,
                          SQL_DESC_OCTET_LENGTH,
                          NULL,
                          0,
                          NULL,
                          &pRow->buffer_length);

		rc = SQLColAttribute(hstmt,
							 nCol,
							 SQL_DESC_LENGTH,
							 NULL,
							 0,
							 NULL,
							 (SQLLEN*) &pRow->column_size);

The driver is returning zero for SQL_DESC_OCTET_LENGTH and SQL_DESC_LENGTH for ::SQLTables character resultsetset columns instead of actual values.
Same thing for ::SQLPrimaryKeys, ::SQLStatistics and ::SQLForeignKeys resultsets character columns

Comment 13111

Date: 2010-04-15 21:55:07 +0200
From: @sjoerdmullender

If by ANSI you mean one of the ISO-8859 versions (e.g. ISO-8859-1 aka "latin 1") then that is definitely a problem. Those are not compatible with UTF-8.

What are the types of pRow->buffer_length and pRow->column_size?

Is your application 32 bit or 64 bit?

The MonetDB ODBC driver may not be completely compliant with the 64-bit definition of the ODBC interface. I'm hoping that can be fixed in the next feature release (not in a bug fix release since it's an API change).

Comment 13112

Date: 2010-04-15 22:42:37 +0200
From: Farid Z <>

I am running the 32-bit version of my app on Windows XP SP3 32-bit and using Windows characters set 1252 (US-EN).

I am using ODBC types for column_size and buffer_length which in 32-bit version of my app are SQUINTEGER and SQLINTEGER.

SQLULEN column_size;
SQLLEN buffer_length;

My app is tested successfully with 30-40 different DBMS ODBC drivers without encountering this issue http://www.zidsoft.com/dbmsnotes.html

The driver is returning zero for SQL_DESC_OCTET_LENGTH and SQL_DESC_LENGTH
for ::SQLTables character resultsetset columns instead of actual values.
Same thing for ::SQLPrimaryKeys, ::SQLStatistics and ::SQLForeignKeys
resultsets character columns

Comment 13113

Date: 2010-04-16 10:07:58 +0200
From: @sjoerdmullender

I have been able to also get the value 0 from the call to SQLColAttribute on column 1 after a call to SQLTables.
The reason for the value 0 is that (in my case, anyway) all values in the column are NULL.
If I look at column 2 where I do get a non-NULL value, the value for SQL_DESC_OCTET_LENGTH and SQL_DESC_LENGTH is indeed greater than 0.
The ODBC speciification just says that the types of the columns are "varchar" without any size indication. So if the columns are completely filled with NULLs, the server just returns a length indication of 0.
I suppose I could arbitrarily change the length to 1 when it would otherwise have been 0.

Comment 13114

Date: 2010-04-16 14:43:07 +0200
From: Farid Z <>

The driver needs to return valid values (not 0 and certainly usually greater than 1) for SQL_DESC_LENGTH and SQL_DESC_OCTET_LENGTH for the resultset columns as per ODBC descriptions for these.
http://msdn.microsoft.com/en-us/library/ms713558(VS.85).aspx

Quote:
SQL_DESC_LENGTH: A numeric value that is either the maximum or actual character length of a character string or binary data type. It is the maximum character length for a fixed-length data type, or the actual character length for a variable-length data type. Its value always excludes the null-termination byte that ends the character string

SQL_DESC_OCTET_LENGTH :
The length, in bytes, of a character string or binary data type. For fixed-length character or binary types, this is the actual length in bytes. For variable-length character or binary types, this is the maximum length in bytes. This value does not include the null terminator.

So for SQLTables resultset table names these would be for example, 128, 128 assuming that the maximum length for a table name in MonetDB is 128 character. Similar info should be returned by the driver for SQLStatistics, SQLPrimaryKeys and SQLForeignKesy, SQLGetTypeInfo and all other ODBC catalog functions for these attributes.

Comment 13115

Date: 2010-04-16 15:46:14 +0200
From: @sjoerdmullender

I read that too, but I'm not sure I can come to the same conclusions. I'm struggling to understand what the specification means.

The columns SQLTables returns are all varchar, without mention how large they should be. The first column is the catalog name. The values here can be NULL if the data source does not support catalogs. Since MonetDB indeed doesn't support catalogs, the value returned is indeed NULL.

The description in SQLTables doesn't say what size varchar column to return, but presumably one that is large enough to hold the data. In this case, varchar(1) is plenty long enough (for this particular column).

For SQLColAttribute, the description of SQL_DESC_LENGTH says that, for variable-length data (which varchar is), the size returned is the "actual character length" (we can ignore the fixed-length case). What is the actual character length of a NULL value? I can argue 0.

The problem for SQL_DESC_OCTET_LENGTH is that, for variable-length character types, it returns the maximum length in bytes. Again, what is the maximum length of a column that is always going to be NULL? The query that SQLTables does under the hood uses cast(null as varchar(1)) for the catalog column.

There is, however, a problem in MonetDB. The server returns a single column length value, and for varchar columns that value is the actual length of the longest value in the column (0 if all values are NULL or if there are no rows). (The value returned is the declared length for fixed-length columns, i.e. 50 for char(50).) In other words, the client does not know the maximum allowed length of the column if the longest value is shorter. So if a column is declared varchar(50) but only contains NULLs, the value returned by the server is 0. This means that determining SQL_DESC_LENGTH is relatively easy, but determining SQL_DESC_OCTET_LENGTH is not currently possible for varchar columns.

Another problem is that SQL_DESC_LENGTH counts characters and SQL_DESC_OCTET_LENGTH counts bytes. These are not interchangeable. MonetDB uses the UTF-8 encoding, which is a variable-length encoding. A character can take from 1 to 6 bytes. Should we then multiply the value by 6? If a column is declared as e.g. varchar(10), that means that it can store up to 10 characters in MonetDB.

The problem we're having here doesn't really have anything to do with SQLTables as such. The implementation does a SQLExecDirect call under the hood, and SQLColAttribute doesn't know (or care) how the result set was created. The problem has to do with getting metainformation about the result set. In other words, your suggestion of using 128 characters (or is that bytes?) doesn't help. That information is not passed on to SQLColAttribute, and anyway, the same problems would exist for hand-created SQL queries.

I do agree that all metainformation-returning queries should return compatible information.

Comment 13116

Date: 2010-04-16 16:16:28 +0200
From: Farid Z <>

The columns SQLTables returns are all varchar, without mention how large
they should be
The returned info would represent how the column is stored at the dbms. That's why the ODBC spec does not specify exact character counts. For DB2 this a table name can be maximum of 128 characters so their drivers returns 128 for SQL_DESC_LENGTH and 128 for SQL_DESC_OCTET_LENGTH for ANS or 128*2 for UTF16.

You can ignore the catalog column of the resultset since MonetDB does not support catalogs and if the driver correctly reports that catalog are not supported by the DBMS.

Another problem is that SQL_DESC_LENGTH counts characters and
SQL_DESC_OCTET_LENGTH counts bytes. These are not interchangeable.
MonetDB uses the UTF-8 encoding

Example column declared as varchar(30) the driver would return
SQL_DESC_LENGTH -> 30
SQL_DESC_OCTET_LENGTH -> 30 for ANSI version of the driver, 30*2 for Unicode version of the driver. For Windows the driver must translate from dbms characterset (UTF8) to client characterset: ANSI (single byte characterset) or Windows UTF16 (implemented by ODBC spec as wchar_t two bytes).

your suggestion of using 128 characters (or is that bytes?) doesn't help
That was an example, not a suggestion. For MontDB the driver needs to return the actual info for the underlying catalog "views" where the information for SQLTables is stored. MonetDB must be storing the table names somewhere in its catalogs.

There is, however, a problem in MonetDB. The server returns a single column length value, and for varchar >columns that value is the actual length of the longest value in the column (0 if all values are NULL or if
there are no rows)
That would be a problem. If I create a column as varchar(30) then I should get (as explained above) whether or not there is ANY data in the table:
SQL_DESC_LENGTH -> 30
SQL_DESC_OCTET_LENGTH -> 30 for ANSI version of the driver, 30*2 for Unicode version of the driver

SQL_DESC_LENGTH and SQL_DESC_OCTET_LENGTH are used by the ODBC client to create and allocate buffers for the resultset data and without correct information for these attributes, the ODBC client can not correctly retrieve data from the database

Comment 13117

Date: 2010-04-16 18:17:26 +0200
From: @sjoerdmullender

What does "the actual character length for a variable-length data type" mean? I thought it meant the length of the (longest) actual value in the column. Since this is from SQLColAttribute, there are actual values in the result set to talk about. From your comment I understand you have a different interpretation (namely the maximum allowed length for the column). Is there anywhere a clearer definition of these attributes?

The definition of SQL_DESC_OCTET_LENGTH is perhaps a little clearer, although I could interpret that either way as well: "For variable-length character or binary types, this is the maximum length in bytes". Which maximum? The maximum according to the column definition or the maximum in the current result set?

In addition to SQL_DESC_LENGTH and SQL_DESC_OCTET_LENGTH, there is also SQL_DESC_DISPLAY_SIZE which has a similar issue. Does that value depend on the actual content of the column (as returned by the query) or does that depend on the definition of the column?

I can see that the width of the columns returned by SQLTables could depend on the limitations of the server for the type of information (schema name length, etc.) Unfortunately, that doesn't help currently since, as mentioned, the information is not passed on.

As to ANSI vs. Unicode. We only support UTF-8 in the narrow calls and UTF-16 in the wide calls. As I understand it, the term ANSI refers to a Windows code page which is close to ISO-8859-1, but in any case not compatible with UTF-8.

I suppose I could change the code so that the narrow interface is CP-1252 and if you want to use characters that don't occur in that code page you're forced to use the wide interface. I could also differentiate between the calls with or without A at the end (e.g. SQLTables vs. SQLTablesA) where the calls with A use the "ANSI" code page and the calls without use UTF-8. But I don't really like this idea.

(My weekend is starting now, so I probably won't respond before Monday.)

Comment 13118

Date: 2010-04-16 18:52:39 +0200
From: Farid Z <>

What does "the actual character length for a variable-length data type" mean
It means the length of the column as defined by the DDL. Example, varchar(30) -> 30
I think we need to differentiate between describing a resultset columns (resultset description as returned by SQLColAttribute and other calls) and actual data values that are only known when the resultset is actually retrieved (stored in location pointed to by StrLen_or_Ind used in SQLBindCol).

When I am getting resultset description from the driver no data is retrieved (I don't want to retrieve any data yet, the table could have 4 million rows or no rows, but client only wants to know the resultset column descriptions). This is usually done by preparing the select statement "select * from

"

Remember I don't want the driver to retrieve the resultset I only want to know what is resultset description is. So the database can have 1000 tables and all the tables are empty, but the client should still be able to determine the resultset description for each table by preparing "select * from

" for example (or actually executing the select). So for example,
create table test (col1 varchar(30))

I should be able to determine that col1 resultset column length is 30 and octet length is 30 (ANSI) even though the table is empty by calling SQLColAttribute and SQL_DESC_LENGTH, SQL_DESC_OCTET_LENGTH

Whether the table has data or not is not-relevant to the resultset description.

I suppose I could change the code so that the narrow interface is CP-1252 and if you want to use >characters that don't occur in that code page you're forced to use the wide interface
No. The client can be using Windows codepage 1252 or some other codepage (1256 Arabic, for example), depending on the client locale, etc. You can determine the code page the client is using (ANSI) by calling Windows API GetACP(VOID)

Since the server uses UTF8 then when transferring data to an ANSI client (narrow calls) the driver needs to convert from UTF8 to the codepage returned by GetACP() call. The conversion of course could fail if the client codepage has no equivalent to the UTF8 character and in this case the driver would report a conversion error for only that piece of data.

Similarly for wide clients, the driver would convert from UTF8 to UTF16 (UCS16, actually) and report any conversion errors (less likely).

Comment 13119

Date: 2010-05-04 09:32:09 +0200
From: Pseudo user for Sourceforge import <>

This bug was previously known as tracker item 2970087 at http://sourceforge.net/support/tracker.php?aid=2970087

Comment 16138

Date: 2011-08-22 20:21:27 +0200
From: @sjoerdmullender

A number of recent changes in the Aug2011 branch have hopefully solved the problems described here. The server now returns the size of the underlying datatype to the client (the ODBC driver) which passes it on to the application. Also, the various lengths that can be requested from SQLColAttribute (SQL_DESC_LENGTH, SQL_DESC_OCTET_LENGTH, SQL_DESC_DISPLAY_SIZE) now result in the values as described in the ODBC documentation. In particular, the octet length for character types (char, varchar, clob) is 6 times the value of the SQL_DESC_LENGTH value since the latter is number of characters, and UTF-8 encoding can take up to 6 bytes for a Unicode code point (character).

I will call this bug resolved, although you can always reopen the bug if you don't agree with the resolution.

Comment 16256

Date: 2011-09-16 15:04:41 +0200
From: @sjoerdmullender

The Aug2011 version has been released.

@monetdb-team monetdb-team added bug Something isn't working GDK Kernel normal labels Nov 30, 2020
@sjoerdmullender sjoerdmullender added this to the Ancient Release milestone Nov 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working GDK Kernel normal
Projects
None yet
Development

No branches or pull requests

2 participants