You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
Build Identifier:
I stumbled upon a couple of calls of pcre_exec that, according to the manual, should be fixed.
The manual says, about the size of the output vector (ovector):
"The first two-thirds of the vector is used to pass back captured substrings, each substring using a pair of integers. The remaining third of the vector is used as workspace by pcre_exec() while matching capturing subpatterns, and is not available for passing back information. The number passed in ovecsize should always be a multiple of three. If it is not, it is rounded down."
Then I suppose the following call will always be rounded down to 9. Better use 9 then.
The following one seems more dangerous.
The third element ("used as workspace by pcre_exec") is not allocated. Also, ovectsize is 2, which becomes 0 when rounded down:
You're right about the multiple of 3. I'll fix that.
How do you use PCREindex and what do you expect that it does?
There is no SQL function pcre_index in the code we distribute, so how did you define that?
True, I had forgotten that I had defined that myself (it is actually an alias for SQL function "patindex"):
CREATE FUNCTION pcre_index(pat string, s string) RETURNS integer EXTERNAL NAME pcre."patindex";
I would expect this to return 4 in both my previous calls. It does when the pattern is a literal '1', but not when it is a '\d' or a '[0-9]' (even after fixing the size of the output vector).
I'm not actually using this function, I just found this by chance.
Speaking of this, I wonder why the most useful pcre functions are available in mal but not mapped in SQL.
These are the extra functions I define:
CREATE OR REPLACE FUNCTION pcre_match(s string, pattern string) RETURNS boolean EXTERNAL NAME pcre."match";
CREATE OR REPLACE FUNCTION pcre_imatch(s string, pattern string) RETURNS boolean EXTERNAL NAME pcre."imatch";
CREATE OR REPLACE FUNCTION pcre_index(pat string, s string) RETURNS integer EXTERNAL NAME pcre."patindex";
CREATE OR REPLACE FUNCTION pcre_replace(s string, pattern string, repl string, flags string) RETURNS string EXTERNAL NAME pcre."replace";
CREATE OR REPLACE FUNCTION pcre_replacefirst(s string, pattern string, repl string, flags string) RETURNS string EXTERNAL NAME pcre."replace_first";
The problem with pcre.patindex is that it quotes all special characters in the pattern, so it is basically just an expensive way to do an ordinary substring search. This should probably be changed.
But it means that the result with pattern \d or [0-9] is totally expected: those strings don't occur in the to-be-matched string, and the "nomatch" return value is 0.
I fixed the bug with the multiple-of-three. The rest of the report is, I think, I misunderstandment of what the pcre_patindex function does. So I'm closing this.
The text was updated successfully, but these errors were encountered:
Date: 2019-07-01 16:24:39 +0200
From: @swingbit
To: SQL devs <>
Version: 11.33.3 (Apr2019)
Last updated: 2019-09-02 16:05:26 +0200
Comment 27095
Date: 2019-07-01 16:24:39 +0200
From: @swingbit
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
Build Identifier:
I stumbled upon a couple of calls of pcre_exec that, according to the manual, should be fixed.
The manual says, about the size of the output vector (ovector):
"The first two-thirds of the vector is used to pass back captured substrings, each substring using a pair of integers. The remaining third of the vector is used as workspace by pcre_exec() while matching capturing subpatterns, and is not available for passing back information. The number passed in ovecsize should always be a multiple of three. If it is not, it is rounded down."
Then I suppose the following call will always be rounded down to 9. Better use 9 then.
define BODY (pcre_exec(re, pe, v, (int) strlen(v), 0, 0, ovector, 10) >= 0)
The following one seems more dangerous.
The third element ("used as workspace by pcre_exec") is not allocated. Also, ovectsize is 2, which becomes 0 when rounded down:
PCREindex(int *res, const pcre *pattern, const str *s)
{
ifdef HAVE_LIBPCRE
int v[2];
}
The function PCREindex doesn't seem to work correctly indeed:
sql>select pcre_index('1','abc1def');
+------+
| L2 |
+======+
| 4 |
+------+
sql>select pcre_index('\d','abc1def');
+------+
| L2 |
+======+
| 0 |
+------+
Reproducible: Always
Comment 27096
Date: 2019-07-01 19:44:53 +0200
From: @sjoerdmullender
You're right about the multiple of 3. I'll fix that.
How do you use PCREindex and what do you expect that it does?
There is no SQL function pcre_index in the code we distribute, so how did you define that?
Comment 27097
Date: 2019-07-02 11:12:27 +0200
From: @swingbit
True, I had forgotten that I had defined that myself (it is actually an alias for SQL function "patindex"):
CREATE FUNCTION pcre_index(pat string, s string) RETURNS integer EXTERNAL NAME pcre."patindex";
I would expect this to return 4 in both my previous calls. It does when the pattern is a literal '1', but not when it is a '\d' or a '[0-9]' (even after fixing the size of the output vector).
I'm not actually using this function, I just found this by chance.
Speaking of this, I wonder why the most useful pcre functions are available in mal but not mapped in SQL.
These are the extra functions I define:
CREATE OR REPLACE FUNCTION pcre_match(s string, pattern string) RETURNS boolean EXTERNAL NAME pcre."match";
CREATE OR REPLACE FUNCTION pcre_imatch(s string, pattern string) RETURNS boolean EXTERNAL NAME pcre."imatch";
CREATE OR REPLACE FUNCTION pcre_index(pat string, s string) RETURNS integer EXTERNAL NAME pcre."patindex";
CREATE OR REPLACE FUNCTION pcre_replace(s string, pattern string, repl string, flags string) RETURNS string EXTERNAL NAME pcre."replace";
CREATE OR REPLACE FUNCTION pcre_replacefirst(s string, pattern string, repl string, flags string) RETURNS string EXTERNAL NAME pcre."replace_first";
Comment 27098
Date: 2019-07-02 11:48:10 +0200
From: @sjoerdmullender
The problem with pcre.patindex is that it quotes all special characters in the pattern, so it is basically just an expensive way to do an ordinary substring search. This should probably be changed.
But it means that the result with pattern \d or [0-9] is totally expected: those strings don't occur in the to-be-matched string, and the "nomatch" return value is 0.
Comment 27099
Date: 2019-07-02 11:53:00 +0200
From: @swingbit
Ah, I see now. It maps to PCREpatindex, which is meant to accept SQL-like patterns, not pcre patterns.
Comment 27105
Date: 2019-07-03 16:14:17 +0200
From: @sjoerdmullender
I fixed the bug with the multiple-of-three. The rest of the report is, I think, I misunderstandment of what the pcre_patindex function does. So I'm closing this.
The text was updated successfully, but these errors were encountered: