Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ODBC driver fails to return all documents, but is able to provide partial results #2660

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Labels
bug Something isn't working Client interfaces invalid This doesn't seem right normal

Comments

@monetdb-team
Copy link

Date: 2010-08-24 13:26:04 +0200
From: @skinkie
To: clients devs <>
Version: 1.40.1 (Oct2010) [obsolete]
CC: @sjoerdmullender, @drstmane

Last updated: 2010-09-27 18:29:13 +0200

Comment 14754

Date: 2010-08-24 13:26:04 +0200
From: @skinkie

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.36 Safari/534.3
Build Identifier:

Rather bizarre situation. I'm trying to index my new database. And before I was having the assumpption that the indexer was actually broken. Today I noticed that it was actually segfaulting inside the MonetDB driver.

Situation that works:
elect id, ljn, gepubliceerd, instantie, datum, publicatie, zaaknummers, uitspraak, conclusie, zittingsplaats, rechtsgebied, sector, soort, indicatie, kop from uitspraken limit 100000;

Situation that doesn't work:
elect id, ljn, gepubliceerd, instantie, datum, publicatie, zaaknummers, uitspraak, conclusie, zittingsplaats, rechtsgebied, sector, soort, indicatie, kop from uitspraken;
(Count is about 175k)

The backtrace I get from the indexer:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5fc604a in ODBCFetch (stmt=0x78c6b0, col=1, type=1, ptr=0x79fd50, buflen=11, lenp=0x79f978, nullp=0xf978, precision=0, scale=0,
datetime_interval_precision=0, offset=0, row=0) at ../../../../src/odbc/driver/ODBCConvert.c:1037
1037 *nullp = SQL_NULL_DATA;
(gdb) bt
0 0x00007ffff5fc604a in ODBCFetch (stmt=0x78c6b0, col=1, type=1, ptr=0x79fd50, buflen=11, lenp=0x79f978, nullp=0xf978, precision=0, scale=0,
datetime_interval_precision=0, offset=0, row=0) at ../../../../src/odbc/driver/ODBCConvert.c:1037
1 0x00007ffff5fdc9cf in SQLFetch_ (stmt=0x78c6b0) at ../../../../src/odbc/driver/SQLFetch.c:134
2 0x00007ffff5fdcc03 in SQLFetch (hStmt=0x78c6b0) at ../../../../src/odbc/driver/SQLFetch.c:194
3 0x00007ffff7b936bb in SQLFetch () from /usr/lib/libodbc.so.1
4 0x0000000000414710 in CSphSource_ODBC::SqlFetchRow (this=0x76b920) at sphinx.cpp:22267
5 0x0000000000441c70 in CSphSource_SQL::NextDocument (this=0x76b920, sError=...) at sphinx.cpp:19435
6 0x00000000004364e2 in CSphSource_Document::IterateHitsNext (this=0x78bda0, sError=...) at sphinx.cpp:18522
7 0x0000000000463b8e in CSphIndex_VLN::Build (this=0x76cd90, dSources=, iMemoryLimit=,
iWriteBuffer=) at sphinx.cpp:9559
8 0x000000000040ba73 in DoIndex (hIndex=, sIndexName=, hSources=,
bVerbose=, fpDumpRows=) at indexer.cpp:1138
9 0x000000000040cd8d in main (argc=, argv=) at indexer.cpp:1609

Serveral things come into me mind.

  1. Is it possible that the size is bigger than the 'client' cache, also observed when coding libmapi and attempting random access.
  2. Reviewing the code (and analysing the backtrace shows the nullp is 0xf978, that doesn't really sound like a valid adres (and even stack wise looks way to familiar to the lenp. Reviewing the code one level higher in SQLFetch.c:100 does indicate the suspicion that something mapi_* related is going on, since that fills the desc.

Maybe some interesting poking:

print *stmt
$3 = {Type = 5461, Error = 0x0, RetrievedErrors = 0, Dbc = 0x78a780, next = 0x0, State = FETCHED, hdl = 0x78c770, rowcount = 175504, currentRow = 115,
startRow = 114, rowSetSize = 1, currentCol = 0, retrieved = 0, queryid = -1, nparams = 0, querytype = 1, cursorType = 0, cursorScrollable = 0,
retrieveData = 1, noScan = 0, ApplRowDescr = 0x78c7f0, ApplParamDescr = 0x78c860, ImplRowDescr = 0x78c8d0, ImplParamDescr = 0x78c940,
AutoApplRowDescr = 0x78c7f0, AutoApplParamDescr = 0x78c860}

print i
$4 = 1

print *rec
{sql_desc_auto_unique_value = 0, sql_desc_base_column_name = 0x0, sql_desc_base_table_name = 0x0, sql_desc_case_sensitive = 0,
sql_desc_catalog_name = 0x0, sql_desc_concise_type = 1, sql_desc_data_ptr = 0x79fd50, sql_desc_datetime_interval_code = 0,
sql_desc_datetime_interval_precision = 0, sql_desc_display_size = 0, sql_desc_fixed_prec_scale = 0, sql_desc_indicator_ptr = 0xf978,
sql_desc_label = 0x0, sql_desc_length = 1, sql_desc_literal_prefix = 0x0, sql_desc_literal_suffix = 0x0, sql_desc_local_type_name = 0x0,
sql_desc_name = 0x0, sql_desc_nullable = 0, sql_desc_num_prec_radix = 0, sql_desc_octet_length = 11, sql_desc_octet_length_ptr = 0x79f978,
sql_desc_parameter_type = 0, sql_desc_precision = 0, sql_desc_rowver = 0, sql_desc_scale = 0, sql_desc_schema_name = 0x0, sql_desc_searchable = 0,
sql_desc_table_name = 0x0, sql_desc_type = 1, sql_desc_type_name = 0x0, sql_desc_unnamed = 0, sql_desc_unsigned = 0, sql_desc_updatable = 0}

Reproducible: Always

MonetDB server v5.21.0 (64-bit), based on kernel v1.39.0 (64-bit oids)
Not released
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2010 MonetDB B.V., all rights reserved
Visit http://monetdb.cwi.nl/ for further information
Found 3.9GiB available memory, 2 available cpu cores
Configured for prefix: /opt/monetdb-testing
Libraries:
libpcre: 8.02 2010-03-19 (compiled with 8.02)
openssl: OpenSSL 1.0.0a 1 Jun 2010 (compiled with OpenSSL 1.0.0a 1 Jun 2010)
libxml2: 2.7.7 (compiled with 2.7.7)
Compiled by: skinkie@openkvk (x86_64-unknown-linux-gnu)
Compilation: gcc -Wall -Wextra -fno-strict-aliasing -g -Werror-implicit-function-declaration -Werror -Wpointer-arith -Wdeclaration-after-statement -Wundef -Wp,-D_FORTIFY_SOURCE=2
Linking : /usr/x86_64-pc-linux-gnu/bin/ld -IPA -m elf_x86_64

Comment 14755

Date: 2010-08-24 14:26:22 +0200
From: @sjoerdmullender

The line your program crashed at is
*nullp = SQL_NULL_DATA;
So the crash is probably due to nullp having a bad value. As luck would have it, nullp is a parameter to the function, so we can see its value is 0xf978. This is indeed a somewhat strange value for a pointer.
The parameter comes from the call inside SQLFetch_ which passed the value of rec->sql_desc_indicator_ptr, which you also show (same value).
This value is initialized in SQLBindCol from the pnLengthOrIndicator parameter (the last parameter).

So the question is, how did you call SQLBindCol for this particular column? What are the values of the parameters to that call?

Also, note that the value comes from your application and is not generated within the driver.

By the way, you can set the environment variable ODBCDEBUG and assign it the full path name of a file which will then be used to log information about the ODBC calls that are going on. This file might have useful information. The environment variable has to be set for the application using the ODBC driver.

Comment 14756

Date: 2010-08-24 14:35:50 +0200
From: @skinkie

(In reply to comment 1)

So the question is, how did you call SQLBindCol for this particular column?
What are the values of the parameters to that call?

I still have an open debug, session. But the only place where I can find SQLBindCol within the upstream code base is:
http://code.google.com/p/sphinxsearch/source/browse/trunk/src/sphinx.cpp22193

By the way, you can set the environment variable ODBCDEBUG and assign it the
full path name of a file which will then be used to log information about the
ODBC calls that are going on. This file might have useful information. The
environment variable has to be set for the application using the ODBC driver.

Going to generate it, will attach when done.

Comment 14757

Date: 2010-08-24 14:43:05 +0200
From: @skinkie

Created attachment 31
ODBC Trace of sphinx indexer

Do you agree it is quite strange that when a limit was applied, lets say 100k, that it actually doesn't matter where the column is bound to?

Attached file: odbctrace.txt (text/plain, 14321 bytes)
Description: ODBC Trace of sphinx indexer

Comment 14759

Date: 2010-08-24 16:26:24 +0200
From: @sjoerdmullender

Looking at the value of rec->sql_desc_indicator_ptr again, I notice that its lower 4 bytes are quite possibly correct, but the upper 4 bytes of the pointer are 0 and should have been 0x79 (i.e. the value should have been 0x79f978 instead of 0xf978).
The question is, how did those upper bytes get squashed?
Since you compiled this yourself, are you absolutely sure that everything was recompiled after the last update?
Some types were changed to support 64 bit architectures better (i.e. according to the latest Microsoft documentation), so it is important that everything uses the same definitions.

In particular, the definition of SQLBindCol changed from
SQLINTEGER nTargetValueMax,
SQLINTEGER *pnLengthOrIndicator)
to
SQLLEN nTargetValueMax,
SQLLEN *pnLengthOrIndicator)
(last two parameters, the last of which is used to initialize the value mentioned above).

So the question is also, what does the ODBC driver manager (unixODBC in your case?) do with this function?

Perhaps you can set a breakpoint on SQLBindCol (the MonetDB ODBC driver version, not the ODBC driver manager version) and look at the values of the parameters. Especially interesting is when nCol==1. Perhaps you can set a watch point on the sql_desc_indicator_ptr value.

Comment 14760

Date: 2010-08-24 16:55:05 +0200
From: @skinkie

(In reply to comment 4)

Since you compiled this yourself, are you absolutely sure that everything was
recompiled after the last update?

For every directory I run rm -rf Linux, and then bootstrap again.

In particular, the definition of SQLBindCol changed from
SQLINTEGER nTargetValueMax,
SQLINTEGER *pnLengthOrIndicator)
to
SQLLEN nTargetValueMax,
SQLLEN *pnLengthOrIndicator)
(last two parameters, the last of which is used to initialize the value
mentioned above).

Yes, actually changed that myself, before that was committed already. But 175k is way below the actual SQLLEN limit anyway.

So the question is also, what does the ODBC driver manager (unixODBC in your
case?) do with this function?

Yes, unixODBC-2.3.0.

Perhaps you can set a breakpoint on SQLBindCol (the MonetDB ODBC driver
version, not the ODBC driver manager version) and look at the values of the
parameters. Especially interesting is when nCol==1. Perhaps you can set a
watch point on the sql_desc_indicator_ptr value.

Breakpoint 1, SQLBindCol (hStmt=0x78c6b0, nCol=1, nTargetType=1, pTargetValue=0x79fd50, nTargetValueMax=11, pnLengthOrIndicator=0x79f978)
at ../../../../src/odbc/driver/SQLBindCol.c:51
51 ODBCStmt *stmt = (ODBCStmt *) hStmt;

print *stmt
$2 = {Type = 5461, Error = 0x0, RetrievedErrors = 0, Dbc = 0x78a780, next = 0x0, State = EXECUTED1, hdl = 0x78c770, rowcount = 175504, currentRow = 0, startRow = 0,
rowSetSize = 0, currentCol = 0, retrieved = 0, queryid = -1, nparams = 0, querytype = 1, cursorType = 0, cursorScrollable = 0, retrieveData = 1, noScan = 0,
ApplRowDescr = 0x78c7f0, ApplParamDescr = 0x78c860, ImplRowDescr = 0x78c8d0, ImplParamDescr = 0x78c940, AutoApplRowDescr = 0x78c7f0, AutoApplParamDescr = 0x78c860}

Reading the code:
rec->sql_desc_indicator_ptr = pnLengthOrIndicator;

That should then become?
(gdb) print desc->descRec[1]
$5 = {sql_desc_auto_unique_value = 0, sql_desc_base_column_name = 0x0, sql_desc_base_table_name = 0x0, sql_desc_case_sensitive = 0, sql_desc_catalog_name = 0x0,
sql_desc_concise_type = 1, sql_desc_data_ptr = 0x79fd50, sql_desc_datetime_interval_code = 0, sql_desc_datetime_interval_precision = 0, sql_desc_display_size = 0,
sql_desc_fixed_prec_scale = 0, sql_desc_indicator_ptr = 0x79f978, sql_desc_label = 0x0, sql_desc_length = 1, sql_desc_literal_prefix = 0x0,
sql_desc_literal_suffix = 0x0, sql_desc_local_type_name = 0x0, sql_desc_name = 0x0, sql_desc_nullable = 0, sql_desc_num_prec_radix = 0, sql_desc_octet_length = 11,
sql_desc_octet_length_ptr = 0x79f978, sql_desc_parameter_type = 0, sql_desc_precision = 0, sql_desc_rowver = 0, sql_desc_scale = 0, sql_desc_schema_name = 0x0,
sql_desc_searchable = 0, sql_desc_table_name = 0x0, sql_desc_type = 1, sql_desc_type_name = 0x0, sql_desc_unnamed = 0, sql_desc_unsigned = 0, sql_desc_updatable = 0}
(gdb) print desc->descRec[0]
$6 = {sql_desc_auto_unique_value = 0, sql_desc_base_column_name = 0x0, sql_desc_base_table_name = 0x0, sql_desc_case_sensitive = 0, sql_desc_catalog_name = 0x0,
sql_desc_concise_type = 0, sql_desc_data_ptr = 0x0, sql_desc_datetime_interval_code = 0, sql_desc_datetime_interval_precision = 0, sql_desc_display_size = 0,
sql_desc_fixed_prec_scale = 0, sql_desc_indicator_ptr = 0x0, sql_desc_label = 0x0, sql_desc_length = 0, sql_desc_literal_prefix = 0x0, sql_desc_literal_suffix = 0x0,
sql_desc_local_type_name = 0x0, sql_desc_name = 0x0, sql_desc_nullable = 0, sql_desc_num_prec_radix = 0, sql_desc_octet_length = 0, sql_desc_octet_length_ptr = 0x0,
sql_desc_parameter_type = 0, sql_desc_precision = 0, sql_desc_rowver = 0, sql_desc_scale = 0, sql_desc_schema_name = 0x0, sql_desc_searchable = 0,
sql_desc_table_name = 0x0, sql_desc_type = 0, sql_desc_type_name = 0x0, sql_desc_unnamed = 0, sql_desc_unsigned = 0, sql_desc_updatable = 0}

Comment 14761

Date: 2010-08-24 17:28:38 +0200
From: @sjoerdmullender

Do this, filling in the value from the first command into the second:

print rec
watch ((ODBCDescRec*)0x...)->sql_desc_indicator_ptr
continue

It looks like the value is passed correctly to SQLBindCol, but then somehow corrupted. You should be able to find where if you get the watch statement right.

Comment 14762

Date: 2010-08-24 17:38:59 +0200
From: @skinkie

(In reply to comment 6)

Do this, filling in the value from the first command into the second:

print rec
watch ((ODBCDescRec*)0x...)->sql_desc_indicator_ptr
continue

It looks like the value is passed correctly to SQLBindCol, but then somehow
corrupted. You should be able to find where if you get the watch statement
right.

Old value = (SQLLEN *) 0x0
New value = (SQLLEN *) 0x79f978
SQLBindCol (hStmt=0x78c6b0, nCol=1, nTargetType=1, pTargetValue=0x79fd50, nTargetValueMax=11, pnLengthOrIndicator=0x79f978) at ../../../../src/odbc/driver/SQLBindCol.c:163
163 rec->sql_desc_octet_length_ptr = pnLengthOrIndicator;

...
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5fc604a in ODBCFetch (stmt=0x78c6b0, col=1, type=1, ptr=0x79fd50, buflen=11, lenp=0x79f978, nullp=0xf978, precision=0, scale=0, datetime_interval_precision=0,
offset=0, row=0) at ../../../../src/odbc/driver/ODBCConvert.c:1037
1037 *nullp = SQL_NULL_DATA;

So not any bit of information more...

Comment 14763

Date: 2010-08-24 18:57:25 +0200
From: @skinkie

After the segfault the hardware value is still correct:

print ((ODBCDescRec *) 0x79feb0)->sql_desc_indicator_ptr

Comment 14764

Date: 2010-08-24 18:57:46 +0200
From: @skinkie

$32 = (SQLLEN *) 0x79f978

Comment 14765

Date: 2010-08-24 19:21:50 +0200
From: @skinkie

I'm deeply puzzled.

(gdb) print ((ODBCDescRec *) 0x79feb0)->sql_desc_indicator_ptr
$33 = (SQLLEN *) 0x79f978
(gdb) up
1 0x00007ffff5fdc9cf in SQLFetch_ (stmt=0x78c6b0) at ../../../../src/odbc/driver/SQLFetch.c:134
134 if (ODBCFetch(stmt, i,
(gdb) print rec->sql_desc_indicator_ptr
$35 = (SQLLEN *) 0xf978

Comment 14766

Date: 2010-08-24 20:05:34 +0200
From: @drstmane

Well, what does print rec say?
In other words is rec == 0x79feb0 ?

Comment 14767

Date: 2010-08-24 20:24:18 +0200
From: @skinkie

(In reply to comment 11)

Well, what does print rec say?
In other words is rec == 0x79feb0 ?

Negative. If I 'up'.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5fc604a in ODBCFetch (stmt=0x78c6b0, col=1, type=1, ptr=0x79fd50,
buflen=11, lenp=0x79f978, nullp=0xf978, precision=0, scale=0,
datetime_interval_precision=0, offset=0, row=0)
at ../../../../src/odbc/driver/ODBCConvert.c:1037
1037 *nullp = SQL_NULL_DATA;
(gdb) print rec
$5 = 0
(gdb) print ((ODBCDescRec *) 0x79feb0)->sql_desc_indicator_ptr
$6 = (SQLLEN *) 0x79f978
(gdb) up
1 0x00007ffff5fdc9cf in SQLFetch_ (stmt=0x78c6b0)
at ../../../../src/odbc/driver/SQLFetch.c:134
134 if (ODBCFetch(stmt, i,
(gdb) print rec
$7 = (ODBCDescRec *) 0x7a61f0

Comment 14768

Date: 2010-08-24 20:46:28 +0200
From: @skinkie

If I watch this new value I see:

Breakpoint 1, SQLBindCol (hStmt=0x78c6b0, nCol=14, nTargetType=1,
pTargetValue=0x7a20f0, nTargetValueMax=7881, pnLengthOrIndicator=0x79fcb8)
at ../../../../src/odbc/driver/SQLBindCol.c:51
51 ODBCStmt *stmt = (ODBCStmt *) hStmt;
(gdb)
Continuing.
Hardware watchpoint 3: ((ODBCDescRec *) 0x7a61f0)->sql_desc_indicator_ptr

Old value = (SQLLEN *) 0x0
New value = (SQLLEN *) 0x79f978
0x00007ffff688fb9b in memcpy () from /lib/libc.so.6
(gdb) bt
0 0x00007ffff688fb9b in memcpy () from /lib/libc.so.6
1 0x00007ffff688790b in ?? () from /lib/libc.so.6
2 0x00007ffff6887c80 in realloc () from /lib/libc.so.6
3 0x00007ffff5fcfe08 in setODBCDescRecCount (desc=0x78c7f0, count=14)
at ../../../../src/odbc/driver/ODBCDesc.c:180
4 0x00007ffff5fd28f2 in SQLBindCol (hStmt=0x78c6b0, nCol=14, nTargetType=1,
pTargetValue=0x7a20f0, nTargetValueMax=7881, pnLengthOrIndicator=0x79fcb8)
at ../../../../src/odbc/driver/SQLBindCol.c:155
5 0x00007ffff7b85c1e in SQLBindCol () from /usr/lib/libodbc.so.1
6 0x000000000041e727 in CSphSource_ODBC::SqlQuery (this=0x76b920,
sQuery=) at sphinx.cpp:22193
7 0x0000000000442432 in CSphSource_SQL::IterateHitsStart (this=0x76b920,
sError=) at sphinx.cpp:19208
8 0x0000000000461c2a in CSphIndex_VLN::Build (this=0x76cd90,
dSources=, iMemoryLimit=,
iWriteBuffer=) at sphinx.cpp:9306
9 0x000000000040ba73 in DoIndex (hIndex=,
sIndexName=, hSources=,
bVerbose=, fpDumpRows=)
at indexer.cpp:1138
10 0x000000000040cd8d in main (argc=,
argv=) at indexer.cpp:1609


Hardware watchpoint 3: ((ODBCDescRec *) 0x7a61f0)->sql_desc_indicator_ptr

Old value = (SQLLEN *) 0x79f978
New value = (SQLLEN *) 0xf978
CSphSource_ODBC::SqlFetchRow (this=0x76b920) at sphinx.cpp:22274
22274 ARRAY_FOREACH ( i, m_dColumns )
(gdb) bt
0 CSphSource_ODBC::SqlFetchRow (this=0x76b920) at sphinx.cpp:22274
1 0x0000000000441c70 in CSphSource_SQL::NextDocument (this=0x76b920,
sError=...) at sphinx.cpp:19435
2 0x00000000004364e2 in CSphSource_Document::IterateHitsNext (this=0x200,
sError=...) at sphinx.cpp:18522
3 0x0000000000463b8e in CSphIndex_VLN::Build (this=0x76cd90,
dSources=, iMemoryLimit=,
iWriteBuffer=) at sphinx.cpp:9559
4 0x000000000040ba73 in DoIndex (hIndex=,
sIndexName=, hSources=,
bVerbose=, fpDumpRows=)
at indexer.cpp:1138
5 0x000000000040cd8d in main (argc=,
argv=) at indexer.cpp:1609

The ARRAY_FOREACH is defined:

define ARRAY_FOREACH(_index,_array)
for ( int _index=0; _index<_array.GetLength(); _index++ )

Comment 14769

Date: 2010-08-24 20:52:12 +0200
From: @drstmane

...

and at that very point, what does

list 22274

say, i.e., what does the complete code around line 22274 in sphinx.cpp look like?

...

Comment 14770

Date: 2010-08-24 21:11:22 +0200
From: @skinkie

(In reply to comment 14)

and at that very point, what does

list 22274

say, i.e., what does the complete code around line 22274 in sphinx.cpp look
like?

http://code.google.com/p/sphinxsearch/source/browse/trunk/src/sphinx.cpp22274

     ARRAY_FOREACH ( i, m_dColumns )
     {
             QueryColumn_t & tCol = m_dColumns[i];
             switch ( tCol.m_iInd )
             {
                     case SQL_NO_DATA:
                     case SQL_NULL_DATA:
                             tCol.m_dContents[0] = '\0';
                             tCol.m_dContents[0] = '\0';
                             break;

                     default:

if USE_WINDOWS // FIXME! support UCS-2 columns on Unix too
if ( tCol.m_bUnicode )
{
int iConv = WideCharToMultiByte ( CP_UTF8, 0, LPCWSTR ( &tCol.m_dRaw[0] ), tCol.m_iInd/sizeof(WCHAR),
LPSTR ( &tCol.m_dContents[0] ), tCol.m_iBufferSize-1, NULL, NULL );

                                     if ( iConv==0 )
                                             if ( GetLastError()==ERROR_INSUFFICIENT_BUFFER )
                                                     iConv = tCol.m_iBufferSize-1;

                                     tCol.m_dContents[iConv] = '\0';

                             } else

endif
{
tCol.m_dContents[tCol.m_iInd] = '\0';
}
break;
}
}

Comment 14771

Date: 2010-08-24 21:16:43 +0200
From: @sjoerdmullender

What is the m_iInd field in the QueryColumn_t struct/class defined as?
From the name, I fear INT when it should be SQLLEN.

I'm assuming you're running this on a 64 bit architecture. Is that correct?

Comment 14772

Date: 2010-08-24 21:22:41 +0200
From: @skinkie

(In reply to comment 16)

What is the m_iInd field in the QueryColumn_t struct/class defined as?
From the name, I fear INT when it should be SQLLEN.

             SQLLEN                          m_iInd;

So that is correct, (you do know that SQLLEN itself is defined as SQLINTEGER, and therefore int?)

I'm assuming you're running this on a 64 bit architecture. Is that correct?

Yes x86_64, Intel E2220.

Comment 14773

Date: 2010-08-25 10:17:03 +0200
From: @sjoerdmullender

The line of the second time the watch point hit (which is the interesting one) does not seem to match the code. The ARRAY_FOREACH line doesn't make any significant changes.
Is your compilation up-to-date? (I have to ask--it may also be that the compiler was confused.)
Also, when did you last update your sphinxsearch code? The line numbers on the page you refer to don't match with your line numbers.

I suspect that somewhere in the sphinx code the value is overwritten, but it is hard to pinpoint if the line numbers don't match.

Comment 14774

Date: 2010-08-25 12:41:20 +0200
From: @skinkie

(In reply to comment 18)

The line of the second time the watch point hit (which is the interesting one)
does not seem to match the code. The ARRAY_FOREACH line doesn't make any
significant changes.
Is your compilation up-to-date? (I have to ask--it may also be that the
compiler was confused.)

I compiled Sphinx yesterday again from SVN before I started to test if the indexer was fixed already. From that point on I noticed that their 'automatic backtrace' showed the MonetDB ODBC driver on the top of the list.

Also, when did you last update your sphinxsearch code? The line numbers on the
page you refer to don't match with your line numbers.

The Google URL page is actually updated, but I can do a SVN update again (now at r2464).

I suspect that somewhere in the sphinx code the value is overwritten, but it is
hard to pinpoint if the line numbers don't match.

Which value should I watch? The original rec, or the one at nCol==14?

Comment 14775

Date: 2010-08-25 13:33:10 +0200
From: @skinkie

Recompiled MonetDB and Sphinx;

Hardware watchpoint 1: ((ODBCDescRec *) 0x7a61f0)->sql_desc_indicator_ptr

Old value = (SQLLEN *) 0x0
New value = (SQLLEN *) 0x79f978
0x00007ffff688fb9b in memcpy () from /lib/libc.so.6
(gdb) bt
0 0x00007ffff688fb9b in memcpy () from /lib/libc.so.6
1 0x00007ffff688790b in ?? () from /lib/libc.so.6
2 0x00007ffff6887c80 in realloc () from /lib/libc.so.6
3 0x00007ffff5fcfe08 in setODBCDescRecCount (desc=0x78c7f0, count=14)
at ../../../../src/odbc/driver/ODBCDesc.c:180
4 0x00007ffff5fd28f2 in SQLBindCol (hStmt=0x78c6b0, nCol=14, nTargetType=1,
pTargetValue=0x7a20f0, nTargetValueMax=7881, pnLengthOrIndicator=0x79fcb8)
at ../../../../src/odbc/driver/SQLBindCol.c:155
5 0x00007ffff7b85c1e in SQLBindCol () from /usr/lib/libodbc.so.1
6 0x000000000041e887 in CSphSource_ODBC::SqlQuery (this=0x76b920,
sQuery=) at sphinx.cpp:22196
7 0x0000000000441d42 in CSphSource_SQL::IterateHitsStart (this=0x76b920,
sError=) at sphinx.cpp:19211
8 0x0000000000461c58 in CSphIndex_VLN::Build (this=0x76cd90,
dSources=, iMemoryLimit=,
iWriteBuffer=) at sphinx.cpp:9306
9 0x000000000040ba73 in DoIndex (hIndex=,
sIndexName=, hSources=,
bVerbose=, fpDumpRows=)
at indexer.cpp:1138
10 0x000000000040cd8d in main (argc=,
argv=) at indexer.cpp:1609
(gdb) c
Continuing.
Hardware watchpoint 1: ((ODBCDescRec *) 0x7a61f0)->sql_desc_indicator_ptr

Old value = (SQLLEN *) 0x79f978
New value = (SQLLEN *) 0xf978
CSphSource_ODBC::SqlFetchRow (this=0x76b920) at sphinx.cpp:22277
22277 ARRAY_FOREACH ( i, m_dColumns )
(gdb) bt
0 CSphSource_ODBC::SqlFetchRow (this=0x76b920) at sphinx.cpp:22277
1 0x0000000000435810 in CSphSource_SQL::NextDocument (this=0x76b920,
sError=...) at sphinx.cpp:19438
2 0x0000000000434032 in CSphSource_Document::IterateHitsNext (this=0x200,
sError=...) at sphinx.cpp:18525
3 0x0000000000463bde in CSphIndex_VLN::Build (this=0x76cd90,
dSources=, iMemoryLimit=,
iWriteBuffer=) at sphinx.cpp:9559
4 0x000000000040ba73 in DoIndex (hIndex=,
sIndexName=, hSources=,
bVerbose=, fpDumpRows=)
at indexer.cpp:1138
5 0x000000000040cd8d in main (argc=,
argv=) at indexer.cpp:1609

Comment 14776

Date: 2010-08-25 13:49:05 +0200
From: @sjoerdmullender

Your crash happened on the statement
*nullp = SQL_NULL_DATA;
As we had established, this was because the value of nullp was corrupt.
nullp is initialized from the value of rec->sql_desc_indicator_ptr in
SQLFetch_, and we've seen that that value is indeed also corrupt. We've seen
that this field is originally assigned the correct value, so what we need to
find out is, where did the value get corrupted. That's what we need to "watch"
in gdb.

From the stack trace of the crash, we can see that the crash happens for column
1 (col=1 parameter to ODBCFetch). So the interesting value to watch is
stmt->ApplRowDescr->descRec[1].sql_desc_indicator_ptr
where stmt is the value passed to SQLFetch by the ODBC driver manager (which
might be the value passed by sphinxsearch to SQLFetch in the
CSphSource_ODBC::SqlFetchRow method, but equally, it might not--I don't know
what unixODBC does with the pointer).
Note that both stmt->ApplRowDescr and stmt->ApplRowDescr->descRec are pointers
to allocated memory, and hence can change value (and
stmt->ApplRowDescr->descRec does for sure, witness the call to realloc in one
of your traces).

I don't know how many watch points you can have simultaneously (I seem to be
able to only have one), but what you could do is first determine the value of
stmt in any of the calls to the MonetDB ODBC driver and note that value down.
For instance, you could set a breakpoint in SQLBindCol as you did before.
Then check whether stmt->ApplRowDescr is NULL. If it is, set a watchpoint on
it and continue until it gets assinged a value. If it isn't NULL, or when it
gets assigned a value, delete this watchpoint and set a watchpoint on
stmt->ApplRowDescr->descRec, and if it is not NULL, also on
stmt->ApplRowDescr->descRec[1].sql_desc_indicator_ptr. If it is NULL, continue
and set the latter watchpoint once the value is not NULL.

When stmt->ApplRowDescr->descRec gets realloced, the watchpoint on stmt->ApplRowDescr->descRec[1].sql_desc_indicator_ptr may have to be reset.

In the above, you may have to replace stmt with ((ODBCStmt *) 0x...) where you
fill in the value of stmt you had found.

What I really want to know is in which piece of code does the value of
sql_desc_indicator_ptr (the value you're watching) change from 0x79f978 to
0xf978 (or whatever other value it gets in the new run).

Comment 14777

Date: 2010-08-25 14:27:43 +0200
From: @skinkie

This happens at a specific document; the 114th document in the system. And I'm currently trying to find a way to get the output trace from 113->114.

Comment 14778

Date: 2010-08-25 15:02:18 +0200
From: @skinkie

My current hunch;
tCol.m_dContents[0] = '\0';

The m_dContents here is set (also notice the two lines with [0]), it is basically the only place where a variable is set inside the loop. If a char is applied on that memory location. It could inhibbit the behavior that only the first byte are set, instead of all 4 bytes for SQLLEN, generating a 'not null' value.

Trying to figure that out.

Comment 14938

Date: 2010-09-26 16:48:24 +0200
From: @skinkie

A nice day of debugging showed;

Sphinx was limiting the buffer size of a full document to 1MB. I have a string in my table that exceeds that value (in bytes), which made the buffer size go back to the default 1024bytes, instead of giving an error for example. After I had fixed this bug another issue came up. I'll try to elaborate...

SQLDescribeCol.c:88

     /* also see SQLDescribeParam */
     if (pnColSize)
             *pnColSize = ODBCDisplaySize(rec);

This value is in 'characters on the screen'. Corresponding with LENGTH [1].

The last parameter of SQLBindCol returns the actual StrLen or Ind [2].

Now the million dollar question: if a column description is fetched by DescribeCol, why isn't it possible to get the maximum size in bytes, as would occur in SQLBindCol/SQLFetch? [I consider: "It's not in the standard" not an acceptable reply for an UTF-8 database.] It is obvious that programs that allocate a buffer to store the column, based on the maximum (display) size of the Column Description will fail if the actual size is bigger due to encoding in UTF-8 for example.

With respect to legacy software no program will fail if the maximum size in bytes is used. Opposed to the only 'true' solution in this case: multiplication of the output buffer by 4 because the database administrator knows MonetDB talks UTF-8. I would find it interesting if the ODBC driver could be compiled using --with-columnsize=bytes vs --with-columnsize=displaychars.

[1] http://msdn.microsoft.com/en-us/library/ms711786(v=VS.85).aspx
[2] http://msdn.microsoft.com/en-us/library/ms711010(VS.85).aspx

Comment 14939

Date: 2010-09-27 15:56:27 +0200
From: @sjoerdmullender

I understand from your comment that the bug that triggered this report was actually in Sphinx and not in MonetDB, so this report can be closed.
Furthermore, the bug tracker is not the place to ask question, and if you want to suggest a change to MonetDB, then please submit a new bug and mark it as "enhancement".
Having said that, it is not likely that any change will be made that is incompatible with the relevant standard (ODBC in this case). Apart from that, the driver does not have the information you want. The server sends the precision of the type (i.e. 10 when the type is VARCHAR(10)), and that size is a "logical" size, i.e. the number of Unicode code points that can be stored in the value.
If you know or suspect that a value contains non-ASCII characters, you can always multiply the size that you get from SQLDescribeCol with 6 (the maximum number of bytes needed for representing a Unicode code point using the UTF-8 encoding), or you can do two calls to first a call to SQLFetch to get the needed size, then a call to SQLBindCol with an bigger buffer, and finally another call to SQLFetch to get the complete data.

Comment 14941

Date: 2010-09-27 18:29:13 +0200
From: @skinkie

Thanks for your reply Sjoerd. I've filed a new issue that is close to a PostgreSQL bug reported in the last months.

http://bugs.monetdb.org/show_bug.cgi?id=2677

@monetdb-team monetdb-team added bug Something isn't working Client interfaces invalid This doesn't seem right normal labels Nov 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Client interfaces invalid This doesn't seem right normal
Projects
None yet
Development

No branches or pull requests

1 participant