Escaping an ampersand in SQL Server Full-Text Search query using CONTAINSTABLE - sql-server

I have a very peculiar case. My ASP.NET page calls a stored procedure of ours that performs a Full-Text Search query on our database. Some of the commonly searched strings include an ampersand because a few brands of our products (well-known brands, too) have an & in their name.
It turns out that in a certain case I get no results unless I escape the ampersand (\&), and in a certain other case I get no results only if I escape the ampersand.
I don't know if this is relevant, but (without giving out the brand names) one ends in &b and the other one in &c.
Is it possible that these strings (&b or &c) have some special meaning of their own? And that by escaping them I'm actually passing a special string to T-SQL?
EDIT
Additional info: after further testing, I proved that the error is in the stored procedure itself. Calling it with & or \& yields different results.
I'll try to post selected parts of the stored procedures. I won't post it all, because most of it isn't really relevant.
The @vParamBuca parameter is the one that causes the trouble. Values could be 'word&letter' or 'word\&letter'.
SET @ricercaA = '''FORMSOF(INFLECTIONAL,"' +
REPLACE(LTRIM(RTRIM(@vParamBuca)),' ', '") AND FORMSOF(INFLECTIONAL,"') + '")'''
The variable @ricercaA is then used to create the query string:
[...]
FROM Products AS FT_TBL
LEFT OUTER JOIN CONTAINSTABLE (Products, Sign1, '+ @ricercaA + ') AS ColSign1_0 ON FT_TBL.ID = ColSign1_0.[KEY]
LEFT OUTER JOIN CONTAINSTABLE (Products, ManufacturerAdditionalText, '+ @ricercaA + ') AS ColManufacturerAdditionalText_0 ON FT_TBL.ID = ColManufacturerAdditionalText_0.[KEY]
LEFT OUTER JOIN CONTAINSTABLE (Products, ManufacturerForSearch, '+ @ricercaA + ') AS ColManufacturer_0 ON FT_TBL.ID = ColManufacturer_0.[KEY]
LEFT OUTER JOIN CONTAINSTABLE (Products, TuttaLaRiga, '+ @ricercaA + ') AS ColTuttaLaRiga_0 ON FT_TBL.ID = ColTuttaLaRiga_0.[KEY]
[...]
EDIT 2
Many thanks to @srutzky for pointing me in the right direction! In the meantime, I also found a data inconsistency: one of the brands with the & in its name was modified not to have the &, and the other one wasn't modified (bottom line, my current problem is caused by that: a partial fix that was made by someone in the past).
Anyway, back on track. Now I understand that the & character in the CONTAINSTABLE function is treated as a logical AND (non bitwise).
I still need a solution for that. This answer gives a solution that doesn't work for me (the conditions are not the same as mine). How could I perform a CONTAINSTABLE search for a string with an ampersand in it? Preferably without having to transform the ampersand to another safe character?

The odd behavior you are seeing is most likely due to the CONTAINS and CONTAINSTABLE functions (both used with SQL Server's Full Text Search feature) using the ampersand ( & ) character as equivalent to the AND operator. The following statement is taken from the documentation for CONTAINS:
The ampersand symbol (&) may be used instead of the AND keyword to represent the AND operator.
There is no mention of there being any escape character for it (and a back-slash isn't typically an escape character in SQL anyway).
UPDATE
Based on the information now provided in "Edit 2" of the Question, and additional research, I would say that you do not need to escape anything. It seems that putting the search phrases in double-quotes (as a result of using FORMSOF) treats the & as either a literal or a word-breaker, depending on the values on both sides of the &. Try the following examples:
DECLARE @Term NVARCHAR(100);
SET @Term = N'bob&sally'; -- 48 rows
--SET @Term = N'bob\&sally'; -- 48 rows
--SET @Term = N'r&f'; -- 4 rows
--SET @Term = N'r\&f'; -- 24 rows
SET @Term = N'FORMSOF(INFLECTIONAL,"' + @Term + '")';
SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 1);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 1);
The results for bob&sally and bob\&sally are the same, and in both cases bob and sally are separated and never combined into a single exact-match string.
The results for r&f and r\&f, however, are not the same. r&f is only ever treated as a single, exact-match string because r and f alone are not known words. On the other hand, adding the back-slash separates the two letters since \ is a word-breaker, in which case you get both r and f.
Given that you stated in the Update that you have "data inconsistency, where one of the brands with the "&" in its name was modified not to have the "&", and the other one wasn't", I suspect that when you do not add in the \ character you get the brand that was not modified (since it is an exact match for the full term). But when you do add in the \ character, then you get the brand that was modified to have the & removed, since you are now searching on both pieces, each one matching part of that brand name.
I would fix the data to be consistent: update the brand names that had the & removed to put the ampersands back in. Then when people search using & without the extra \ added, it will be an exact match. This behavior will be consistent across the data, and will not require you to add code that circumvents the natural operation of FTS, which seems to be an error-prone approach.
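To see why the & ends up inside double quotes, here is a minimal Python sketch of the string that the REPLACE expression in the question builds (without the outer single quotes the procedure adds for its dynamic SQL). The function name build_search_condition is made up for illustration; it only mirrors the string concatenation and does not talk to SQL Server.

```python
def build_search_condition(term: str) -> str:
    """Mirror the T-SQL: trim the term, then turn each single space into
    '") AND FORMSOF(INFLECTIONAL,"' so every word gets its own FORMSOF."""
    words = term.strip().split(" ")
    return " AND ".join(f'FORMSOF(INFLECTIONAL,"{word}")' for word in words)


# The ampersand stays inside the double-quoted term; it is not an operator
# between two terms, so no escaping is added here.
print(build_search_condition(" r&f shoes "))
# FORMSOF(INFLECTIONAL,"r&f") AND FORMSOF(INFLECTIONAL,"shoes")
```

How the quoted r&f is then split (or not) is decided by the word breaker, which is what sys.dm_fts_parser shows above.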

Related

ABAP SQL preserve OR pad trailing spaces

I am trying to find a way to preserve a space within SQL concatenation.
For context: I am selecting from a table with a single concatenated key column. Concatenated keys respect spaces.
Example: BUKRS(4) = 'XYZ ', WERKS(4) = 'ABCD' is represented as key XYZ ABCD.
I am trying to form the same value in SQL, but it seems like ABAP SQL auto-trims all trailing spaces.
Select concat( rpad( tvko~bukrs, 4, (' ') ), t001w~werks ) as key, datab, datbi
from t001w
inner join tvko on tvko~vkorg = t001w~vkorg
left join ztab on ztab~key = concat( rpad( tvko~bukrs, 4, (' ') ), t001w~werks ) "This is why I need the concat
rpad( tvko~bukrs, 4, ' ' ) in this example returns XYZ, instead of XYZ , which leads to concatenated value being XYZABCD, rather than XYZ ABCD.
lpad seems to work just fine (returning XYZ), which leads me to believe I'm doing something wrong.
SQL functions don't accept string literals or variables (which preserve spaces in the same circumstances in ABAP) as they are non-elementary types.
Is there any way to pad/preserve the spaces in ABAP SQL (without pulling data and doing it in application server)?
Update: I solved my problem by splitting key selection from data selection and building the key in ABAP AS. It's a workaround that avoids the problem instead of solving it, so I'll keep the question open in case an actual solution appears.
EDIT: this post doesn't answer the question of inserting a number of characters which vary based on values in some table columns e.g. LENGTH function is forbidden in RPAD( tvko~bukrs, LENGTH( ... ), (' ') ). It's only starting from ABAP 7.55 that you can indicate SQL expressions instead of fixed numbers. You can't do it in ABAP before that. Possible workarounds are to mix ABAP SQL and ABAP (e.g. LIKE 'part1%part2' and then filtering out using ABAP) or to use native SQL directly (ADBC, AMDP, etc.)
Concerning how the trailing spaces are managed in OpenSQL/ABAP SQL, they seem to be ignored, the same way as they are ignored with ABAP fixed-length character variables.
Demonstration: I simplified your example to extract the line Walldorf plant:
These ones don't work (no line returned):
SELECT * FROM t001w
WHERE concat( 'Walldorf ' , 'plant' ) = t001w~name1
INTO TABLE @DATA(itab_1).
SELECT * FROM t001w
WHERE concat( rpad( 'Walldorf', 1, ' ' ), 'plant' ) = t001w~name1
INTO TABLE @DATA(itab_2).
These two work, one with leading space(s), one using concat_with_space:
SELECT * FROM t001w
WHERE concat( 'Walldorf', ' plant' ) = t001w~name1
INTO TABLE @DATA(itab_3).
SELECT * FROM t001w
WHERE concat_with_space( 'Walldorf', 'plant', 1 ) = t001w~name1
INTO TABLE @DATA(itab_4).
General information: ABAP documentation - SQL string functions
EDIT: working example added, using leading space(s).

How to replace escape character in Netezza column

I am trying to replace an escape character in a Netezza column, but it is not being replaced properly.
Could someone please help me with this?
select replace('replaces\tring','\','\\\\');
I need the output to be replaces\\\\tring. Below is the error message I am getting:
ERROR [42S02] ERROR: Function 'REPLACE(UNKNOWN, UNKNOWN, UNKNOWN)'
does not exist Unable to identify a function that satisfies the given
argument types You may need to add explicit typecasts
Thanks in advance.
This is because the REPLACE function needs to be installed (it is not available by default). Another function, TRANSLATE, can be used in a limited way instead of REPLACE, but unfortunately it won't fit your situation.
You can use the below query instead:
SELECT SUBSTRING(x, 1, INSTR(x, '\') - 1) || '\\\\' || SUBSTRING(x, INSTR(x, '\') + LENGTH('\')) FROM
(SELECT 'replaces\tring' AS x) t
\ passed to INSTR and LENGTH is the string to be replaced. Note that it occurs in three positions.
\\\\ in the middle is the replacement string.
replaces\tring is the string to search in.
Check the example below, which replaces love with like in I love Netezza:
SELECT SUBSTRING(x, 1, INSTR(x, 'love') - 1) || 'like' || SUBSTRING(x, INSTR(x, 'love') + LENGTH('love')) FROM
(SELECT 'I love Netezza' AS x) t
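Note that this SUBSTRING/INSTR approach only replaces the first occurrence of the search string. The same idea, sketched in Python for clarity (replace_first is a made-up helper name, and Python's find is 0-based whereas INSTR is 1-based):

```python
def replace_first(text: str, target: str, replacement: str) -> str:
    """Replace only the first occurrence of target, the way the
    SUBSTRING(...) || replacement || SUBSTRING(...) query does."""
    pos = text.find(target)          # like INSTR(x, target), but 0-based
    if pos == -1:
        return text                  # the SQL above assumes the target exists
    # part before the match + replacement + part after the match
    return text[:pos] + replacement + text[pos + len(target):]


print(replace_first("I love Netezza", "love", "like"))  # I like Netezza
```

If the column can contain the character more than once, this pattern would have to be applied repeatedly, which is where an installed REPLACE function becomes the more practical option.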
That particular function is part of the "SQL Extensions toolkit", and on our system it is placed in the ADMIN schema of the SQLEXT database. All users have been granted execute access to that schema. Furthermore, the database.schema has been placed in the path (the DBAs did it globally, but you can issue a "set PATH=..." in your session if need be).
our path is:
select current_path;
CURRENT_PATH
---------------------------------------------------------------------------------------
SQLEXT.ADMIN,INZA.INZA,NZA.INZA,NZM.INZA,NZMSG.INZA,NZR.INZA,NZRC.INZA,SYNCHDB.ADMIN
and as you can see the SQLEXT is at the beginning...

SQL Server String extract based on pattern

I have string data in the following format:
MODELNUMBER=Z12345&HELLOWORLD=WY554&GADTYPE=PLA&ID=Z-12345
/DTYPE=PLA&ID=S-10758&UN_JTT_REDIRECT=UN_JTT_IOSV
and need to extract IDs based on two conditions:
Start right after the pattern &ID=
End at the last character of the string, or
stop at the next & if one is found first.
So in the above example I'm using the following code:
SUBSTRING(MyCol,(PATINDEX('%&id=%',[MyCol])+4),(LEN(MyCol) - PATINDEX('%&id%',[MyCol])))
Essentially, it looks for the pattern &id= and extracts the string after it up to the end of the line. Could anyone advise on how to handle the latter part of the logic?
My current results are
Z-12345
Z-12345&UN_JTT_REDIRECT=UN_JTT_IOSV
What I need is
Z-12345
Z-12345
Try this
SUBSTRING(MyCol, (PATINDEX('%[A-Z]-[0-9][0-9][0-9][0-9][0-9]%',[MyCol])),7)
if you run into performance issues add the where clause below
-- from Mytable
WHERE [MyCol] like '%[A-Z]-[0-9][0-9][0-9][0-9][0-9]%'
maybe not the most elegant solution but it works for me.
Correct syntax of PATINDEX
Here's one example how to do it:
select
substring(d.data, s.s, isnull(nullif(e.e,0),2000)-s.s) as ID,
d.data
from data d
cross apply (
select charindex('&ID=', d.data)+4 as s
) s
cross apply (
select charindex('&', d.data, s) as e
) e
where s.s > 4
This assumes the data column is varchar(2000), and the where clause leaves out any rows that don't have &ID=
The first cross apply searches for the start position, the second one for the end. The isnull + nullif in the actual select handles the case where & is not found and replaces it with 2000 to make sure the whole string is returned.
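The start/end logic of the two cross apply steps can be sketched in Python as follows. This is only an illustration of the technique, not a replacement for the query; extract_id is a made-up name, and unlike charindex under a typical case-insensitive collation, Python's find is case-sensitive.

```python
from typing import Optional


def extract_id(data: str, marker: str = "&ID=") -> Optional[str]:
    """Return the text after the marker, up to the next '&' or end of string."""
    start = data.find(marker)
    if start == -1:
        return None                  # same effect as the WHERE s.s > 4 filter
    start += len(marker)             # first character of the value
    end = data.find("&", start)      # next '&' after the value begins
    if end == -1:
        end = len(data)              # no '&' found: take the rest of the string
    return data[start:end]


rows = [
    "MODELNUMBER=Z12345&HELLOWORLD=WY554&GADTYPE=PLA&ID=Z-12345",
    "/DTYPE=PLA&ID=S-10758&UN_JTT_REDIRECT=UN_JTT_IOSV",
]
for row in rows:
    print(extract_id(row))
# Z-12345
# S-10758
```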

Building dynamic query for Sql Server 2008 when table name contains " ' "

I need to fetch a table's TOP_PK, IDENT_CURRENT, IDENT_INCR and IDENT_SEED, for which I am building a dynamic query as below:
sGetSchemaCommand = String.Format("SELECT (SELECT TOP 1 [{0}] FROM [{1}]) AS TOP_PK, IDENT_CURRENT('[{1}]') AS CURRENT_IDENT, IDENT_INCR('[{1}]') AS IDENT_ICREMENT, IDENT_SEED('[{1}]') AS IDENT_SEED", pPrimaryKey, pTableName)
Here pPrimaryKey is the name of the table's primary key column and pTableName is the name of the table.
Now, I am facing a problem when the table name contains a ' character (for example, KIL'1).
When I use the above logic, the resulting query is:
SELECT (SELECT TOP 1 [ID] FROM [KIL'1]) AS TOP_PK, IDENT_CURRENT('[KIL'1]') AS CURRENT_IDENT, IDENT_INCR('[KIL'1]') AS IDENT_ICREMENT, IDENT_SEED('[KIL'1]') AS IDENT_SEED
Executing the above query gives the following error:
Incorrect syntax near '1'.
Unclosed quotation mark after the character string ') AS IDENT_SEED'.
So, can anyone please show me the best way to solve this problem?
Escape a single quote by doubling it: KIL'1 becomes KIL''1.
If a string already has adjacent single quotes, two becomes four, or four becomes eight... it can get a little hard to read, but it works :)
Using string methods from .NET, your statement could be:
sGetSchemaCommand = String.Format("SELECT (SELECT TOP 1 [{0}] FROM [{1}]) AS TOP_PK, IDENT_CURRENT('[{2}]') AS CURRENT_IDENT, IDENT_INCR('[{2}]') AS IDENT_ICREMENT, IDENT_SEED('[{2}]') AS IDENT_SEED", pPrimaryKey, pTableName, pTableName.Replace("'","''"))
EDIT:
Note that the string replace is now only on a new, third substitution string. (I've taken out the string replace for pPrimaryKey, and for the first occurrence of pTableName.) So now, single quotes are only doubled, when they will be within other single quotes.
You need to replace every single quote with two single quotes: http://beyondrelational.com/modules/2/blogs/70/posts/10827/understanding-single-quotes.aspx
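The doubling rule itself is language-independent. A minimal Python sketch (sql_string_literal is a made-up helper name) shows what the third substitution string in the String.Format call above ends up producing inside the single quotes:

```python
def sql_string_literal(value: str) -> str:
    """Wrap value in single quotes, doubling any single quote inside it."""
    return "'" + value.replace("'", "''") + "'"


table_name = "KIL'1"
# Only the occurrence that sits inside a string literal needs the doubling;
# the bracketed identifier in the FROM clause is left as it is.
print("IDENT_CURRENT(" + sql_string_literal("[" + table_name + "]") + ")")
# IDENT_CURRENT('[KIL''1]')
```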

Intra-SELECT variables?

Would it be possible to alias an expression returned by a SELECT statement in order to refer to it in other parts of the same SELECT, as if it were a column like any other?
A kind of "temporary variable" whose scope would be limited to the SELECT statement, a little bit like the WITH clause before a SELECT, which lets you use a temporary named recordset.
A naive sample of what I'd like to achieve :
SELECT
FIRSTNAME + ' ' + NAME AS FULLNAME,
CASE WHEN LEN(FULLNAME)>3 THEN 1 ELSE 0 END AS ISCORRECT
FROM USERS
where FULLNAME could be used to determine the subsequent output field ISCORRECT, though not being a real column of the table USERS... instead of this laboured error-prone (but working) copy/paste :
SELECT
FIRSTNAME + ' ' + NAME AS FULLNAME,
CASE WHEN LEN(FIRSTNAME + ' ' + NAME)>3 THEN 1 ELSE 0 END AS ISCORRECT
FROM USERS
This sample well describes what I want, but I can easily imagine similar needs where FULLNAME might also be used in other parts of the SELECT statement : in a JOIN, in the WHERE, in a GROUP BY, ORDER BY, etc.
PS : I use SQL Server 2005 but would be also interested in any 2008-specific answer.
Thanks a lot ! :-)
Edit :
In spite of my high respect for those of you proposing to use a side- or inner-query, I don't feel at ease with such possibilities. My sample really is a naive one. The true queries have around 30 output fields including complex expressions (including calls to CLR functions), 15 inner/left outer joins, and 20 additional where criteria. I would rather not multiply the indirections towards co-queries if I can avoid it.
I believe you would have to put it in an inner query, and then be able to refer to it outside of the query.
Simplest example based on yours:
select a.fullname, case when len(a.fullname) > 3 then 1
else 0 end as iscorrect
from (select firstname + ' ' + name as fullname
from users) a
Example with a CTE
;with names (FULLNAME) as (
SELECT FIRSTNAME + ' ' + NAME
FROM USERS
) select
FULLNAME,
CASE WHEN LEN(FULLNAME) > 3 THEN 1 ELSE 0 END AS ISCORRECT
FROM names
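For anyone who wants to try the CTE pattern without a SQL Server instance, here is a small runnable sketch using Python's built-in sqlite3 module. It is SQLite, not SQL Server, so || and LENGTH stand in for + and LEN, and the sample rows are made up; the point is only that the alias is defined once in the CTE and reused in the outer SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (firstname TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("Ada", "Lovelace"), ("", "A")],   # made-up sample data
)

rows = conn.execute(
    """
    WITH names AS (
        SELECT firstname || ' ' || name AS fullname
        FROM users
    )
    SELECT fullname,
           CASE WHEN LENGTH(fullname) > 3 THEN 1 ELSE 0 END AS iscorrect
    FROM names
    ORDER BY fullname
    """
).fetchall()

print(rows)  # [(' A', 0), ('Ada Lovelace', 1)]
conn.close()
```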
You can use cross apply to concatenate strings, do calculations, etc. that involve just the current row.
select T.fullname,
case when len(T.fullname) > 3
then 1
else 0
end iscorrect
from users as U
cross apply
(select U.firstname+' '+U.name) as T(fullname)
order by T.fullname
Though not very satisfied with it, I chose (temporarily?) a third option: I avoid co-queries and copy/pasting my complex, hard-to-read expression (here symbolized by the simple one aliased as FULLNAME) by embedding it in a scalar function, which is therefore called several times in different parts of my SELECT.
SELECT
dbo.GetFULLNAME(FIRSTNAME,NAME) AS FULLNAME,
CASE WHEN LEN(dbo.GetFULLNAME(FIRSTNAME,NAME))>3 THEN 1 ELSE 0 END AS ISCORRECT
FROM USERS
What do you think of it?
(To be precise: though more complex and unreadable than in my original post, the real expression remains a "simple" matter of string manipulation using several input fields, and doesn't involve any sub-querying or anything like that.)

Resources