SQL Server 2008 select statement that ignores non alphanumeric characters - sql-server

I have an interesting SQL Server search requirement.
Say I have a table with Part Numbers as follows:
PARTNO DESCRIPTION
------ -----------
ABC-123 First part
D/12a92 Second Part
How can I create a search that will return results if I search, say, for 'D12A'?
I currently have a full text search set up for the description column, but I am looking to find parts that match the part no even when users don't include the / or - etc.
I'd rather do this in a single SQL statement rather than creating functions if possible as we only have read access to the DB.

You could do something like:
SELECT * FROM PART_TABLE
WHERE REPLACE(REPLACE(PARTNO,'/', ''),'-','') LIKE '%D12A%'
This would work for the 2 characters you specified and could be extended for more character like so:
SELECT * FROM PART_TABLE
WHERE REPLACE(REPLACE(REPLACE(PARTNO,'/', ''),'-',''),*,'') LIKE '%D12A%'
Probably not the most elegant of solutions unless your special characters are limited. Otherwise I'd suggest writing a Function to strip out non-alphanumeric characters.
Here is an example of such a function:
CREATE FUNCTION dbo.udf_AlphaNumericChars
(
#String VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE #RemovingCharIndex INT
SET #RemovingCharIndex = PATINDEX('%[^0-9A-Za-z]%',#String)
WHILE #RemovingCharIndex > 0
BEGIN
SET #String = STUFF(#String,#RemovingCharIndex,1,'')
#RemovingCharIndex = PATINDEX('%[^0-9A-Za-z]%',#String)
END
RETURN #String
END
------- Query Sample (untested)---------
SELECT *
FROM PART_TABLE
WHERE DBO.udf_AlphaNumericChars(PARTNO) LIKE '%D12A%'
Taken From: http://sqlserver20.blogspot.co.uk/2012/06/find-alphanumeric-characters-only-from.html

Related

smart replace (re-phrase) text in SQL

I have a string in SQL with the following structure:
#number <logical> #number <logical> ....
ex:
#SQL='((#1 or #2) and (#10 or #21))'
I would like to update the string to be:
#SQL='((con_1=1 or con_2=1) and (con_10=1 or con_21=1))'
meaning remove the '#' and replace it with 'con_', leave the number (1 digit or more) as is and add '=1' after the digit.
Any idea how to do it?
Please avoid composing function or procedure for that.
you may use patindex, stuff or any other built in function for that.
First split the string into records on the '#' char that starts the value needing processed.
STUFF can be used to insert the '=1' after the last digit of the value in each record. I think searching for the last number is harder than searching for the first. The reverse allows searching for the first instead of the last. Because the string is reversed, insert the reverse '1=' at the start of the reverse number. If there is no number, return the revered value as is.
Concatenate it all back with 'con_' instead of '#'.
declare #SQL varchar(200) = ' ((#1 or #2) and (#10 or #21)) #33 #55 ';
with split as (
select value, REVERSE(value) as [rev], PATINDEX('%[0-9]%', REVERSE(value)) as [pat]
FROM STRING_SPLIT(#SQL, '#')
), fixed as (
select value, rev, pat, CASE WHEN pat= 0 THEN REVERSE(rev) ELSE REVERSE(STUFF(rev, pat, 0, '1=')) END as [fix]
FROM split
) select STRING_AGG(fix, 'con_') from fixed
Result: ((con_1=1 or con_2=1) and (con_10=1 or con_21=1)) con_33=1 con_55=1
It would be nice if Regex support was included in SQL server. I imagine to do so would require a careful implementation to avoid using all the server resources. A regex search on a millions of records or varchar(max) columns....
Found this solution which uses SQL# (SQLsharp).
SQL# is a .NET / CLR library that resides in a SQL Server 2005 (or newer) database and provides a suite of User-Defined Functions, Stored Procedures, User-Defined Aggregates, and User-Defined Types.
My solution:
declare #SQL varchar(200) = '#1 OR(#2 AND #3 AND (#4 or #5 or #6) AND (#7 or #8))'
SELECT SQL#.RegEx_Replace4k(#SQL, N'(#)+(\d*)', N'CON_$2=1', -1, 1, N'')

Use result of sql query to replace text on multiple xml files

i have a table on sql like this:
CD_MATERIAL | CD_IDENTIFICACAO
1 | 002323
2 | 00322234
... | ...
AND SO ON (5000+ lines)
I need to use that info to search and replace multiple external xml files on a folder (all the tags on those XML had numbers like the CD_IDENTIFICACAO from sql query, i need to replace with corresponding cd_material from sql query "ex.: 002323 becomes 1)
I used this query to extract all the cd_identificacao to use on Notepad++:
declare #result varchar(max)
select #result = COALESCE(#result + '', '') + CONCAT('(',CD_IDENTIFICACAO,')|') from TBL_MATERIAIS WHERE CD_IDENTIFICACAO <> '' ORDER BY CD_MATERIAL
select #result
That would bring me ex.:
(1TEC45D025)|(1TEC800039)|(999999999)|(542251)|(2TEC58426)|(234852)
and changed the parameters to get the replace ex.:
(? 2000)|(? 2001)|(? 2002)|(? 2003)|(? 2004)|(? 2005)
but i don't know how to add a number (increment) on front of "?" so notepad++ would understand it (search and replace would have 5000+ results, so it's not pratical to manually add the increment).
I was able to get a workaround for this. I've used this query to get all the the terms for find and replace i needed (1 per line)
select concat('<cProd>',cd_identificacao,'</cProd>'), concat('<cProd>',cd_material,'</cProd>') from tbl_materiais where cd_identificacao <> '' order by cd_material
That would result in:
<cProd>1TEC460054</cProd> <cProd>1</cProd>
<cProd>1TEC240035</cProd> <cProd>2</cProd>
(i added the tag too to make sure no other information could be replaced as there were many number combinations that could lead to incorrect replacement)
then pasted it on a txt and i used the notepad++ to replace the space between column 1 and 2 for /r/n wich would result in:
<cProd>1TEC460054</cProd>
<cProd>1</cProd>
<cProd>1TEC240035</cProd>
<cProd>2</cProd>
then i used "Ecobyte Replace Text" Tool, pasted my result file as new selection in bottom frame, loaded all my files on a new replace group on top frame (on properties of the group, u can change directory and options), then executed the replacement, it worked perfectly.
Thx.

Attempting to run a while loop in my select statement under cases in SQL Server 2012

The Data
Let us say I have a field in SQL that consists of multi-line Information, each of which consists of i topics, each topic consisting of m points of information. Topics are prefaced with 'i.' and information with a dash. It looks something like:
________________________________________________
|Number | Information
|===============================================
|1 | 1. Topic 1.1
| | -Info 1.1.1
| | - ... [more info]
| | 2. Topic 1.2
| | -Info 1.2.1
| | - ...[more info]
| | ... [more topics]
|_______|_____________________________
|2 | 1. Topic 2.1
|....and so on
The Current System
What I am doing with this information is to parse out each topic into it's own column, then unpivoting those columns and searching for Topics that contain a given keyword #keyword.
Currently the code reads something like:
Select
Number
,Case When Information LIKE '%1. %2. %'
Then substring (Information, charindex('1.',Information),
charindex('2.', Information) -(charindex('1.',Information)+2) )
Else Information
End as [Topic1]
,Case When Information LIKE '%2. %3. %'
Then substring (Information, charindex('2.',Information),
charindex('3.', Information) -(charindex('2.',Information)+2) )
Else 'N/A'
End as [Topic2]
...repeat 2nd case for each set of numbers up to '%20. %21. %'
The only reason the first one is different is because if it doesn't match the pattern then I want to grab the whole field so that I don't miss anything. I then unpivot the Topic fields that I just created into a general [Topic] field, and then utilize a WHERE [Topic] LIKE '%' +#keyword+'%' to pull out any particular topics and their associated case number to output as my final table. The cases can have anywhere from 1 to 40+ topics attached, with 1-7 attached info fields per topic.
The Desired Modification
Notice: To make the code easier to read, I will not be writing my substring code in proper syntax, instead opting to write substring(Information,ci(#Iter), ci(#Iter+1)-ci(#Iter)) to denote the substring running from the position given by '(iter).' to the position given by '(iter+1).'
What I would like to do is to perform the following:
Declare #Iter smallint
Declare #Result varchar(max)
Select
Number
, Set #Iter=1
Set #Result = ' '
Case When Information LIKE '%'+#keyword+'%' --keyword chosen at front end
Then While #Iter < #n --#n set by the user from front end
Begin
Case When Information LIKE '%' + cast(#Iter as varchar(5))
+ '. %'+cast((#Iter+1) as varchar(5))+'. %'
and substring(Information,ci(#Iter), ci(#Iter+1)-ci(#Iter) )
LIKE '%'+#keyword+'%'
Then Set #Result = #Result +substring(Information,ci(#Iter),
ci(#Iter+1)-ci(#Iter) )
Else Set #Result = #Result end
Set #Iter = #Iter +1
End
Else ' ' end [Result]
The Explanation
In case what I want isn't clear, I'll run through what I'm trying to accomplish
I want to output a list of case numbers that include Topics that include the keyword.
For each case in the list I want to output only those topics that include the keyword.
I want to allow the end user of the report to choose how many Topics in each case they'll search.
I don't want to have to create a table with a column for each Topic when I can't know how many the user will want to create.
Due to these considerations it feels like a loop would be the best option, but there are problems in trying to accomplish that.
The Problem
SQL server won't allow me to utilize a loop in my Select statement--Incorrect syntax near 'While'.
The place where the information comes from prohibits normalization of the information in the table I'm searching
Even if it didn't I am barred from creating my own permanent tables at work, so I can't normalize the data for all incoming data
I am also not allowed to write my own stored procedures.
If there is any way (for example through a cte) to implement these changes, I'm open to hearing them! I'm mostly looking at ways to make the code less daunting looking (20 cases to produce 20 fields in my current cte looks scary, which then needs 3 ctes just to unpack properly [unpivot, removal of certain cases meeting certain conditions, combination into a workable output table])
Thanks in advance for reading this and helping!
I think you're working too hard.
If all you need are topic names and numbers, isn't it easier to split the Information column by newlines, and then collect all lines that start with a number and not a "dash" by then, you will have a list of strings that look like:
Topic 1.1
Topic 2.1
And then it's easy to just match the lines against the keyword?
Something like this untested SQL:
select SUBSTRING(s.value,1, PATINDEX('% %', s.Value) - 1) AS topicId
, SUBSTRING(s.Value, PATINDEX('% %', s.Value), LENGTH(s.Value)) AS topicText
from [table that would make Codd cry] t
cross apply STRING_SPLIT(t.Information, CHAR(13)) s
where s.Value LIKE '[0-9]%' -- Starts with a number
AND s.Value LIKE #keywords --matches keywords
Not sure if you can create functions or you have STRING_SPLIT available in your SQL Server version, but if you don't, there are some string splitting CTEs you can find on the net to do the job for you

LIKE pattern T-SQL

I have wrote a phraser which is phrasing a quite long string into small pieces, after the phrasing of one item is completed, it removes him from #input and continue, until it wont be able to find any items to phrase.
I am selecting items based on LIKE pattern.
In some cases, there are however it is selecting some other parts of the message, and then it's end in infinitive loop.
The pattern I am looking for to be selected using LIKE clause is in format of :
(Any number from 1 to 9) + (variable length A-Z only) + '/' +
(variable length A-Z only)+space of Cr or Lf or CrLf.
--This is what I do have:
DECLARE #match NVarChar(100)
SET #match = '%[1-9][a-z]%'
DECLARE #input1 varchar(max),#input2 varchar(max)
SET #input1 ='1ABCD/EFGH *W/17001588 *RHELLO SMVML1C'
DECLARE #position Int
SET #position = PATINDEX(#match, #input1);
SELECT #position;
--after the loop- it is also 'catching' the 1C at the end of the string:
SET #input2 = '*W/17001588 *RHELLO SMVML1C'
SET #position = PATINDEX(#match, #input2);
SELECT #position
---In order to eliminate this, I have tried to change #match:
SET #match = '%[1-9][a-z][/][a-z]%'
SET #position = PATINDEX(#match, #input1);
SELECT #position --postion is 0, so the first item, that should have been selected, wasn't selected
SET #position = PATINDEX(#match, #input2);
SELECT #position --postion is 0
Many thanks for help!
Try changing your match variable/criteria to this:
SET #match = '%[1-9][a-z]%[/][a-z]%'
This will get you the desired result. Loosely translated it is saying "Get me the starting position of the first match where the pattern is [anything]-[number from 1-9]-[single letter from a-z]-[anything]-[slash]-[single letter from a-z]-[anything].
Hope this helps!
I agree with the comments above; this is a regex problem that needs to be solved with regex tools. I would recommend the assembly created by SimpleTalk. You can get their code and read their very thorough article.
Unfortunately this solution requires serious admin rights to the database and server, so it won't be portable as a script. But I think these functions are worth any effort it takes to get the installed if you develop on the same database on a regular basis.
Just be forwarned, regex is a real processing hog and undermines much of the indexing efficiency in SQL Server. Use it only when you can't use like.
I don't know if this solves your entire problem , but if you can prefix your input with a space, you can modify the pattern to avoid matching a number without a preceding space.
set #input = ' '+#input;
set #match = '% [1-9][a-z]%';
If you need your pattern to account for other whitespace like Cr and Lf, your pattern could look like this:
set #match = '%[ '+char(13)+char(10)+'][1-9][a-z]%';

Strange Behaviour on MSSQL Stored Procedure using Conditional WHERE with CONTAINS (Full Text Index)

I need some help from a MS SQL Master...
Short version:
When I execute a Conditional Where followed by a Contains, my query delays 1 minute (In its normal execution, it takes 200 milliseconds).
With this query, everything works fine:
Where
Contains(table.product_name, #search_word)
But with a Conditional Where, it takes 1 minute to execute:
Where
(#ExecuteWhereStatement = 0 Or (Contains(table.product_name, #search_word))
Long Version:
I'm using a stored procedure that receives some parameters. This Stored Procedure query a really large table, but everything is indexed properly and the query goes very well so far.
The main query is a little big, so I want to make the WHERE clause more smart possible, to avoid repeat multiple times the same statement.
The whole idea of the DataBase, is a history of purchases made by the State. So this query involves 3 tables:
Table 1 (table_purchase) - The purchase itself
id_purchase int (PK)
date_purchase datetime
buyer_code int (Nullable)
Table 2 (table_purchase_product) - The Items of a Purchase
id_product int (PK)
id_purchase int (FK of table_purchase)
product_quantity int (Nullable)
product_name varchar(255) (Nullable) (Full-Text-Indexed)
product_description varchar(2000) (Nullable) (Full-Text-Indexed)
id_product_bid_winner int (FK of table_product_bid)
Table 3 (table_product_bids) - The Bids for Each product of a Purchase
id_product_bid int (PK)
id_product int (FK of table_purchase_product)
product_brand varchar(255) (Nullable) (Full-Text-Indexed)
bid_value decimal (20,6)
So basicly, We have a "Purchase", that has several "Products (or Items)", and each "Product" has some "Bids (or Prices)"
And there is the Bad Girl (The SQL Stored Procedure):
ALTER PROCEDURE [dbo].[procPesquisaFullText]
#search_date datetime,
#search_word varchar(8000),
#search_brand varchar(255),
#only_one_bid bit = 0,
#search_buyer_code int = 0,
#quantityFrom decimal(20,6) = 0,
#quantityTo decimal(20,6) = 0
AS
BEGIN
SET NOCOUNT ON;
Declare #ExecuteWordSearch AS bit;
if (#uasg != 0 And #search_word = '')
begin
Set #ExecuteWordSearch = 0;
Set #search_word = 'nothing';
end
else
begin
Set #ExecuteWordSearch = 1;
end
Declare #ExecuteBrandSearch AS bit;
if (#search_brand = '')
begin
Set #ExecuteBrandSearch = 0;
Set #search_brand = 'nothing';
end
else
begin
Set #ExecuteMarcaSearch = 1;
end
begin
SELECT
pp.id_product,
pp.id_purchase,
pp.description
FROM
table_purchase_product pp
inner join table_purchase p on p.id_purchase = pp.id_purchase
WHERE
(p.date_purchase >= #search_date)
and (#search_buyer_code = 0 or (l.buyer_code = #search_buyer_code))
and (#quantityFrom = 0 or (li.product_quantity >= #QuantityFrom))
and (#quantityTo = 0 or (li.product_quantity <= #QuantityTo))
and (contains(pp.product_description, #search_word) or contains(pp.product_name, #search_word))
and (#only_one_bid = 0
or ((Select COUNT(*) From table_product_bid Where table_product_bid.id_product = pp.id_product) = 1))
and (#ExecuteBrandSearch = 0 Or (exists(
select 1
from table_product_bid ppb
where ppb.id_product_bid = pp.id_product_bid_winner
and contains(ppb.product_brand, #search_brand)
)
))
ORDER BY p.date_purchase DESC
end
END
So far, so good...
In the beginning I set two variables, used inside the query.
The first, verify if the user specified a "Buyer Code" AND didn't specify a "Search Word" (So, not the Product's description nor the Product's name is verified)
The second, verify if the user specified a "Specific Brand". If so, then the Winning Bid's BRAND is verified to match the users one.
Observation: You'll notice that when the "Search Words" is empty, I set them to "nothing". I do it because if the search term in the Contains is empty, it throws me a exception, even when it's not executed (I tested it in another query, absolutely isolated too)
As You can see, my user is able to search for:
- "Products" of Some Distinct Buyer "Purchase" (passing the #search_buyer_code parameter)
- A "Product" that contains a distinct word in its name or description
- A "Product" that has the Winner Bid of a specific Brand
- A "Product" that has only 1 bid at all
- A "Product" with a maximum and minimum quantity
And You'll notice that I used a lot of Conditions INSIDE the Where, producing a very dynamic Where, instead of using a "BIG If Else" statement, and repeating a lot of code. (I guess some "Googlers" will land here looking for Conditionally Wheres, and If so, I'm glad to help!)
Ok, so everything works veeery great at all. The query executes flawless. But here is the strange, damn, tricky issue:
If I want the user to be able to specify only a "Buyer Code" for Purchase, but No Word to Search of the Product using the code above (which is the first piece of code in the stored procedure does):
Changing from:
and (contains(pp.product_description, #search_word) or contains(pp.product_name, #search_word))
To:
and (#ExecuteWordSearch = 0 Or (contains(pp.product_description, #search_word) or contains(pp.product_name, #search_word)))
The query delays near 1 minute! (the execution is about 200 milliseconds for the query above).
But WHY??? I Use the same Logic of in all "Conditionally Wheres". I also use the same logic of having a flag/variable to indicate when execute the Where clause in the Word Search and the Brand Search, but the Brand Search works PERFECTLY! So Why, WHY only when I use the condition followed by a Contains my query delays 1 minute????
And this issue is not related with the amount of data, because I tried removing the entire Contains condition, allowing a lot of data to return, and it takes 1 second maximum...
Ow, It's a Microsoft SQL Server 2008 R2.
Thanks already for You read so far!
I cannot find the documentation I had around a very similar issue, but it sounded so familiar, I at least wanted to share what I remembered. Part of the issue is that for Sql Server, the full-text search engine is separate from the regular query execution engine, and so when you mix the two, in some cases, performance can tank. This is particularly true when the condition is an 'OR' rather than and 'AND'. (I remember hitting this exact situation). Conditional ANDs worked fine. But for OR, it's as if each condition gets evaluated repeatedly row by row.
Among the workarounds, one is, as already suggested, create your sql dynamically before execution.
Another would be to break the full-text and non-full text conditions into two search functions (literally UDF's) and then do whatever is needed (INTERSECT, EXCEPT, etc) with the two resultsets.
Try changing your WHERE clause to use a CASE statement, e.g.:
WHERE
CASE
WHEN #ExecuteWhereStatement = 0 THEN 1
WHEN #ExecuteWhereStatement = 1 THEN
CASE
WHEN CONTAINS([table].product_name, #search_word) THEN 1
ELSE 0
END
END = 1;

Resources