Compare two SUBSTRINGS in SQL Server - sql-server

I have the following test code:
DECLARE
@Str1 VARCHAR(MAX) = 'Hello World'
,@Str2 VARCHAR(MAX) = 'World Hello'
SELECT CHARINDEX(@Str1, @Str2)
The SELECT statement returns zero because it takes the whole of @Str1 and tries to find it within @Str2.
How can I make the search compare substrings?
In other words, I want the search to check whether a substring of @Str1 can be found as a substring of @Str2.

If you only need to split on spaces, split the first string into words and then look up the position of each word in the second string with CHARINDEX.
Here's a quick example:
DECLARE
@Str1 VARCHAR(MAX) = 'Hello World'
,@Str2 VARCHAR(MAX) = 'World Hello'
DECLARE @substring VARCHAR(MAX)
-- Split @Str1 on spaces by turning it into XML nodes, one word per node
DECLARE c CURSOR FOR
SELECT Item = y.i.value('(./text())[1]', 'nvarchar(4000)')
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(@Str1, ' ', '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
OPEN c
FETCH NEXT FROM c INTO @substring
WHILE @@FETCH_STATUS = 0
BEGIN
-- Position of the current word inside @Str2 (0 when it is not found)
SELECT CHARINDEX(@substring, @Str2)
FETCH NEXT FROM c INTO @substring
END
CLOSE c
DEALLOCATE c
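The cursor is not strictly needed here. As a minimal sketch, assuming the same XML-based split and the same two sample strings, the lookup can be written as a single set-based query. The column aliases Word and PositionInStr2 are made up for illustration, and the XML conversion will fail if @Str1 contains characters such as & or <.

```sql
DECLARE @Str1 VARCHAR(MAX) = 'Hello World',
        @Str2 VARCHAR(MAX) = 'World Hello';

-- One row per space-separated word of @Str1, with its position in @Str2
-- (CHARINDEX returns 0 when the word does not occur in @Str2).
SELECT  Word           = y.i.value('(./text())[1]', 'nvarchar(4000)'),
        PositionInStr2 = CHARINDEX(y.i.value('(./text())[1]', 'nvarchar(4000)'), @Str2)
FROM   (SELECT x = CONVERT(XML, '<i>' + REPLACE(@Str1, ' ', '</i><i>') + '</i>')) AS a
CROSS APPLY a.x.nodes('i') AS y(i);
```

For the sample data this should report position 7 for Hello and position 1 for World; the row order is not guaranteed without an ORDER BY.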

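You can use instr (which measures positions in characters, as defined by the input character set) to find out whether one string exists inside the other. Note that instr is an Oracle/MySQL function; SQL Server has no instr, and the closest T-SQL equivalent is CHARINDEX(substring, string).
select * from tablename
where instr(upper(str1),upper(str2)) > 0
--true when str2 occurs somewhere within str1
or upper(str1) = upper(str2);--true when both strings are the same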

I don't have the answer but cannot comment :-(
Question: what type of substring are you looking for? A substring could be the entire word "Hello", but also just the letters "llo" or even "l". I would assume you want to see whether any of the words in @Str1 are contained in @Str2.
You would then use a split function, such as the one found here, to first split @Str1 into a list and then loop over that table, using CHARINDEX to find each substring.
But all of this depends on your definition of "substring".
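If "substring" means "whole word", the split-then-search idea can be sketched without a custom split function, assuming SQL Server 2016 or later (database compatibility level 130+), where the built-in STRING_SPLIT is available. The column aliases are illustrative only.

```sql
DECLARE @Str1 VARCHAR(MAX) = 'Hello World',
        @Str2 VARCHAR(MAX) = 'World Hello';

-- STRING_SPLIT returns one row per word in a column named value.
-- Empty tokens (produced by consecutive spaces) are filtered out.
SELECT  s.value                   AS Word,
        CHARINDEX(s.value, @Str2) AS PositionInStr2  -- 0 means "not found"
FROM    STRING_SPLIT(@Str1, ' ') AS s
WHERE   s.value <> '';
```

Adding a filter such as CHARINDEX(s.value, @Str2) > 0 would keep only the words of @Str1 that actually occur in @Str2.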

Related

How to get substring in TSQL

I want to extract a particular word from a sentence in SQL Server.
For example, I want to extract NO-13 from 'Street3 NO-13 Floor 4th'.
The following is my first attempt, but I can't find the end index of the word:
SELECT PATINDEX('%Y[^][0-9]%', 'Street3 NO-13 Floor 4th')
Fortunately I found a solution. I use the following code:
DECLARE @txt NVARCHAR(255)
SET @txt = 'Street3 NO- 13 Floor 4th'
DECLARE @startIndex INT
SELECT @startIndex = PATINDEX('% No%[0-9]%',@txt)
DECLARE @FirstLetters AS NVARCHAR(50)
DECLARE @remainingString NVARCHAR(MAX)
-- +1 so that the last character of @txt is not dropped
SELECT @remainingString = SUBSTRING(@txt, @startIndex, LEN(@txt) - @startIndex + 1)
SELECT @FirstLetters=SUBSTRING(@remainingString, 0, PATINDEX('%[0-9]%',@remainingString))
SELECT @remainingString=REPLACE(@remainingString,@FirstLetters,'')
SELECT @FirstLetters +LEFT(
SubString(@remainingString, PatIndex('%[0-9.-]%', @remainingString), 8000),
PatIndex('%[^0-9.-]%', SubString(@remainingString, PatIndex('%[0-9.-]%',
@remainingString), 8000) + 'X')-1) AS BuildingNo
The length of the pattern is fixed, so you always know where the pattern ends. However, it seems that what you really want is to find the end of a word that starts with a given pattern. If you only care about spaces as word separators, this is quite simple - charindex takes an optional starting location index, so you can just find the first space after the patindex result. Then you can use substring to get the string between the two indices.
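A minimal sketch of that approach, assuming the word is preceded by a space, starts with NO- followed by a digit, and is terminated by a space or the end of the string (the variable names @start and @end are made up for illustration):

```sql
DECLARE @txt NVARCHAR(255) = N'Street3 NO-13 Floor 4th';

-- PATINDEX returns the position of the space in front of the word,
-- so add 1 to land on the first letter of the word itself.
DECLARE @start INT = PATINDEX('% NO-[0-9]%', @txt) + 1;

-- CHARINDEX's third argument is the position to start searching from.
-- A space is appended so a word at the very end is still terminated.
DECLARE @end INT = CHARINDEX(' ', @txt + ' ', @start);

SELECT SUBSTRING(@txt, @start, @end - @start) AS BuildingNo;  -- NO-13
```

When the pattern is not found, PATINDEX returns 0 and this sketch would return the first word of the string instead, so a real query should check for that case first.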

TSQL: How insert separator between each character in a string

I have a string like this:
Apple
I want to include a separator after each character so the end result will turn out like this:
A,p,p,l,e
In C#, we have a one-line method to achieve the above: Regex.Replace("Apple", ".{1}", "$0,");
I can only think of looping over each character with charindex to append the separator, but that seems a little complicated. Is there a simpler, more elegant way to achieve this?
Thanks HABO for the suggestions. I was able to generate the result I want using that code, but it took me a while to really understand how it works.
After some searching, I managed to find a useful article on inserting spaces between each character, which is easier for me to understand.
I modified the code a little so that the separator can be defined instead of being fixed to a space:
DECLARE @pos INT = 2 -- location where we want the first separator
DECLARE @result VARCHAR(100) = 'Apple'
DECLARE @separator nvarchar(5) = ','
WHILE @pos < LEN(@result)+1
BEGIN
SET @result = STUFF(@result, @pos, 0, @separator);
SET @pos = @pos+2; -- skip the inserted separator and the next character
END
select @result; -- Output: A,p,p,l,e
Reference
In the following SQL scripts, I get each character using the SUBSTRING() function together with a numbers table (I used master..spt_values here for simplicity) and then concatenate the characters using one of two methods; you can choose either.
If you are using SQL Server 2017, there is a new string aggregation function, string_agg.
The first script uses the string_agg function:
declare @str nvarchar(max) = 'Apple'
SELECT
string_agg( substring(@str,number,1) , ',') Within Group (Order By number)
FROM master..spt_values n
WHERE
Type = 'P' and
Number between 1 and len(@str)
If you are working with an earlier version, you can concatenate the characters using FOR XML PATH and the STUFF function as follows:
declare @str nvarchar(max) = 'Apple'
; with cte as (
SELECT
number,
substring(@str,number,1) as L
FROM master..spt_values n
WHERE
Type = 'P' and
Number between 1 and len(@str)
)
SELECT
STUFF(
(
SELECT
',' + L
FROM cte
order by number
FOR XML PATH('')
), 1, 1, ''
)
Both solutions yield the same result. I hope it helps.
If you have SQL Server 2017 and a copy of ngrams8k, it's ultra simple:
declare @word varchar(100) = 'apple';
select newString = string_agg(token, ',') within group (order by position)
from dbo.ngrams8k(@word,1);
For pre-2017 systems it's almost as simple:
declare @word varchar(100) = 'apple';
select newstring =
( select token + case len(@word)+1-position when 1 then '' else ',' end
from dbo.ngrams8k(@word,1)
order by position
for xml path(''))
One ugly way to do it is to split the string into characters, ideally using a numbers table, and reassemble it with the desired separator.
A less efficient implementation uses recursion in a CTE to split the characters and insert the separator between pairs of characters as it goes:
declare @Sample as VarChar(20) = 'Apple';
declare @Separator as Char = ',';
with Characters as (
select 1 as Position, Substring( @Sample, 1, 1 ) as Character
union all
select Position + 1,
case when Position & 1 = 1 then @Separator else Substring( @Sample, Position / 2 + 1, 1 ) end
from Characters
where Position < 2 * Len( @Sample ) - 1 )
select Stuff( ( select Character + '' from Characters order by Position for XML Path( '' ) ), 1, 0, '' ) as Result;
You can replace the select Stuff... line with select * from Characters; to see what's going on.
Try this
declare @var varchar(50) ='Apple'
;WITH CTE
AS
(
SELECT
SeqNo = 1,
MyStr = @var,
OpStr = CAST('' AS VARCHAR(50))
UNION ALL
SELECT
SeqNo = SeqNo+1,
MyStr = MyStr,
OpStr = CAST(ISNULL(OpStr,'')+SUBSTRING(MyStr,SeqNo,1)+',' AS VARCHAR(50))
FROM CTE
WHERE SeqNo <= LEN(@var)
)
SELECT
OpStr = LEFT(OpStr,LEN(OpStr)-1)
FROM CTE
WHERE SeqNo = LEN(@var)+1

How can I complete this Excel function in SQL Server?

I have approximately 30,000 records where I need to split the Description field and so far I can only seem to achieve this in Excel. An example Description would be:
1USBCP 2RJ45C6 1DVI 1DP 3MD 3MLP HANDS
Below is my Excel function:
=TRIM(MID(SUBSTITUTE($G309," ",REPT(" ",LEN($G309))),((COLUMNS($G309:G309)-1)*LEN($G309))+1,LEN($G309)))
This is then dragged across ten Excel columns, and splits the description field at each space.
I have seen many questions asked about splitting a string in SQL but they only seem to cover one space, not multiple spaces.
There is no easy built-in function in SQL Server to split strings, or at least I don't know of one. I usually use a trick that I found somewhere on the Internet some time ago. I modified it for your example.
The trick is that we first figure out how many columns we need. We can do that by counting the spaces in the string. The easiest way is the length of the string minus the length of the string with the spaces removed.
After that, for each string we find the start and end position of each word. Finally we simply cut the string by start and end position and assign the pieces to columns. The details are in the query. Have fun!
CREATE TABLE test(id int, data varchar(100))
INSERT INTO test VALUES (1,'1USBCP 2RJ45C6 1DVI 1DP 3MD 3MLP HANDS')
INSERT INTO test VALUES (2,'Shorter one')
DECLARE @pivot varchar(8000)
DECLARE @select varchar(8000)
-- Build the column list [col1],[col2],... from the maximum number of spaces in any row
SELECT
@pivot=coalesce(@pivot+',','')+'[col'+cast(number+1 as varchar(10))+']'
FROM
master..spt_values where type='p' and
number<=(SELECT max(len(data)-len(replace(data,' ',''))) FROM test)
SELECT
@select='
select p.*
from (
select
id,substring(data, start+2, endPos-Start-2) as token,
''col''+cast(row_number() over(partition by id order by start) as varchar(10)) as n
from (
select
id, data, n as start, charindex('' '',data,n+2) endPos
from (select number as n from master..spt_values where type=''p'') num
cross join
(
select
id, '' '' + data +'' '' as data
from
test
) m
where n < len(data)-1
and substring(data,n+1,1) = '' '') as data
) pvt
Pivot ( max(token)for n in ('+@pivot+'))p'
EXEC(@select)
Here you can find an example in SQL Fiddle.
I didn't notice that you also want to get rid of multiple blank spaces.
To do that, create a function that prepares your data:
CREATE FUNCTION dbo.[fnRemoveExtraSpaces] (@Number AS varchar(1000))
Returns Varchar(1000)
As
Begin
Declare @n int -- Current position in the string
Declare @old char(1) -- Previous character that was kept
Set @n = 1
--Begin Loop of field value
While @n <=Len (@Number)
BEGIN
If Substring(@Number, @n, 1) = ' ' AND @old = ' '
BEGIN
-- Second space in a row: delete it and stay at the same position
Select @Number = Stuff( @Number , @n , 1 , '' )
END
Else
BEGIN
SET @old = Substring(@Number, @n, 1)
Set @n = @n + 1
END
END
Return @Number
END
After that use the new version that removes extra spaces.
DECLARE @pivot varchar(8000)
DECLARE @select varchar(8000)
SELECT
@pivot=coalesce(@pivot+',','')+'[col'+cast(number+1 as varchar(10))+']'
FROM
master..spt_values where type='p' and
number<=(SELECT max(len(dbo.fnRemoveExtraSpaces(data))-len(replace(dbo.fnRemoveExtraSpaces(data),' ',''))) FROM test)
SELECT
@select='
select p.*
from (
select
id,substring(data, start+2, endPos-Start-2) as token,
''col''+cast(row_number() over(partition by id order by start) as varchar(10)) as n
from (
select
id, data, n as start, charindex('' '',data,n+2) endPos
from (select number as n from master..spt_values where type=''p'') num
cross join
(
select
id, '' '' + dbo.fnRemoveExtraSpaces(data) +'' '' as data
from
test
) m
where n < len(data)-1
and substring(data,n+1,1) = '' '') as data
) pvt
Pivot ( max(token)for n in ('+@pivot+'))p'
EXEC(@select)
I may be misunderstanding your question, but everything you are doing in that formula can be done almost exactly the same way in SQL. I see someone has already answered, but to my mind it shouldn't be necessary to do all that when you can do this. I might be wrong, but here goes.
declare @test as varchar(100)
set @test='abcd1234567'
select right(@test,2)
, left(@test,2)
, len(@test)
, case when len(@test)%2>0
then left(right(@test,round(len(@test)/2,0)+1),1)
else left(right(@test,round(len(@test)/2,0)+1),2) end
Results
67 ab 11 2
So right, left, length and mid can all be achieved.
If spaces are the "substring" dividers, then something like the loop below should work. I don't remember the exact syntax for a do-while inside a SQL select, nor have I actually done it myself, but I don't see why it should not be possible. If it doesn't work you need a temporary table, and if that doesn't work you need a cursor. The cursor would be an outer loop around this one that fetches and processes a single string at a time. Or you can do something more clever. I am just a novice.
declare @x varchar(1)
declare @n integer
declare @i integer
declare @str varchar(100) -- this is your description. Fetch it and assign it. If in a cursor, just use the column name
set @x = '' -- not null, otherwise the inner loop condition is never true
set @n = 0
set @i = 0
while @n < len(@str)
begin
while NOT @x = ' '
begin
set @x = left(right(@str,@n),1)
set @n = @n+1
end
--insert into or update #temptable blablabla here.
Use @i and @n to locate the substring and then left(right()) it out. Or you can SELECT it, but that gets messy if there are many substrings. Continue with:
set @i = @n
set @str = right(@str, @i) -- this includes the ' '. left() it out at will.
end
Now, a final comment: there should perhaps be a third loop that checks whether you are at the last "substring", because I see now that this code will throw an error when it gets to the end. Alternatively, append a space to the end of @str; that will also work. But my time is up. This is a suggestion at least.
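For reference, here is a runnable sketch of the same loop idea. It is not the code above: it scans from the left with CHARINDEX instead of using left(right()), it appends the trailing space mentioned in the final comment, and it collects the words in a table variable named @parts, which is made up for illustration.

```sql
DECLARE @str   VARCHAR(100) = '1USBCP 2RJ45C6 1DVI 1DP 3MD 3MLP HANDS';
DECLARE @n     INT;
DECLARE @parts TABLE (pos INT IDENTITY(1,1), token VARCHAR(100));

-- Trim both ends and append one space so the last word is terminated too.
SET @str = LTRIM(RTRIM(@str)) + ' ';

WHILE LEN(@str) > 0
BEGIN
    -- The first space marks the end of the current word.
    SET @n = CHARINDEX(' ', @str);
    INSERT INTO @parts (token) VALUES (LEFT(@str, @n - 1));

    -- Drop the word just read; LTRIM also swallows runs of extra spaces.
    SET @str = LTRIM(SUBSTRING(@str, @n + 1, 100));
END

SELECT pos, token FROM @parts ORDER BY pos;
```

For the sample description this yields seven rows, one per word. To process a whole table, the loop would sit inside a cursor over the Description column, as suggested above.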

T-SQL Find length of word within a string

With PATINDEX I can find the first occurrence of a pattern in a string, say a number. The string contains several matches for my pattern.
My question is: how can I find the end position of the first occurrence of that pattern in the string?
DECLARE @txt VARCHAR(255)
SET @txt = 'this is a string 30486240 and the string is still going 30485 and this is the end'
PRINT SUBSTRING(@txt,PATINDEX('%[0-9]%',@txt),8)
My problem is that I don't want to put in the 8 manually; I want to find the length of the first number.
Using SQL Server 2012
Try this, it should return the first number from your text:
DECLARE @txt VARCHAR(255)
SET @txt = 'this is a string 30486240 and the string is still going 30485 and this is the end'
DECLARE @startIndex INTEGER
SELECT @startIndex = PATINDEX('%[0-9]%',@txt)
DECLARE @remainingString NVARCHAR(MAX)
-- +1 so that the last character of @txt is not dropped
SELECT @remainingString = substring(@txt, @startIndex, LEN(@txt) - @startIndex + 1)
DECLARE @endingIndex INTEGER
SELECT @endingIndex = PATINDEX('%[a-zA-Z]%', @remainingString) - 1
SELECT RTRIM(SUBSTRING(@txt, @startIndex, @endingIndex))
This query will work as long as you don't have letters "embedded" in your numbers, like 30486a24b0
Here is one solution for when you don't know the length of the substring:
SELECT Left(
SubString(@Data, PatIndex('%[0-9.-]%', @Data), 8000),
PatIndex('%[^0-9.-]%', SubString(@Data, PatIndex('%[0-9.-]%', @Data), 8000) + 'X')-1)
Source: http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/extracting-numbers-with-sql-server/
I had to run through the exercise multiple times and kept thinking the blog post was wrong, before noticing the caret in the second PATINDEX.
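To make the role of the caret concrete, here is the same technique broken into steps and applied to the sample string from the question. This is a sketch rather than the blog's code: the pattern is narrowed to digits only, and the names @tail and @len are made up for illustration.

```sql
DECLARE @txt VARCHAR(255) =
    'this is a string 30486240 and the string is still going 30485 and this is the end';

-- Everything from the first digit onward.
DECLARE @tail VARCHAR(255) = SUBSTRING(@txt, PATINDEX('%[0-9]%', @txt), 255);

-- [^0-9] matches the first character that is NOT a digit. The appended 'X'
-- guarantees such a character exists even when the number ends the string.
DECLARE @len INT = PATINDEX('%[^0-9]%', @tail + 'X') - 1;

SELECT LEFT(@tail, @len) AS FirstNumber,   -- 30486240
       @len              AS NumberLength;  -- 8
```

The first PATINDEX finds where the number starts; the second, with the negated character class, finds where it stops, so the length no longer has to be typed in by hand.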

sql to check string contains with where clause

In SQL Server, I have the following string:
DECLARE @str nvarchar(max);
set @str = 'Hello how are you doing today,Its Monday and 5 waiting days';
DECLARE @srch nvarchar(max);
set @srch = ' how,doing,monday,waiting';
Now I want to check whether @str contains any of the strings in @srch (a comma-separated string).
I want it in SQL Server only.
Is it possible to write a query with an IN clause,
like
select from @str where _____ in (select * from CommaSplit(@srch))
where the CommaSplit function returns the comma-separated values of @srch as rows?
I don't want to use a cursor or any loop, as the @srch value can be very long.
Thanks
You can use the same function to get the first string as rows:
select string from CommaSplit(@srch,'') where string in (select * from CommaSplit(@srch))
You can use the following common table expression to split your string into parts. The cte will contain one record per phrase in @srch. In my example below, I show where in @str each search phrase is located. It returns 0 if it cannot locate a search phrase.
Note 1: it won't show the location twice if your search phrase is duplicated; you would need another CTE for that.
Note 2: I have to add a comma at the end of @srch to make my CTE work. You can do that inside the CTE if you prefer not to change the search string.
DECLARE @str nvarchar(max);
set @str = 'Hello how are you doing today,Its Monday and 5 waiting days';
DECLARE @srch nvarchar(max);
set @srch = 'how,doing,monday,waiting';
set @srch = @srch + ','
-- first split the search text into one phrase per row
;with cte
as
(
select substring(@srch, 1, CHARINDEX(',', @srch, 1) - 1) as Phrase, CHARINDEX(',', @srch, 1) as Idx
union all
select substring(@srch, cte.Idx + 1, CHARINDEX(',', @srch, cte.Idx + 1) - cte.Idx - 1) as Phrase, CHARINDEX(',', @srch, cte.Idx + 1) as Idx
from cte
where cte.Idx < CHARINDEX(',', @srch, cte.Idx + 1)
)
select charindex(cte.Phrase, @str, 1) from cte
I don't think the IN clause is what you need. Instead, you can use LIKE as follows:
if (select count(*) from CommaSplit(@srch) where @str like '%' + val + '%') > 0
select 'true'
else
select 'false'
In this case you will receive 'true' when at least one result of the CommaSplit function exists in the @str text. But you will also receive 'true' when a result of the CommaSplit function is only part of a word in the @str string.
If you need a more accurate solution, it can be achieved as follows: split @str into words (replacing punctuation with spaces beforehand). After that, the intersection of CommaSplit(@srch) and SpaceSplit(@str) is the answer to the question. This way you can also check which words match between the two strings.
The overhead of this method is that you have to create a SpaceSplit function, which is a copy of CommaSplit with a different separator. Alternatively, CommaSplit can be modified to take the separator as a parameter.
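A sketch of that intersection, assuming SQL Server 2016 or later so that the built-in STRING_SPLIT can stand in for both CommaSplit and SpaceSplit (neither of which is shown in the question). Only the comma is treated as punctuation here; other punctuation marks would need their own REPLACE calls.

```sql
DECLARE @str  NVARCHAR(MAX) = N'Hello how are you doing today,Its Monday and 5 waiting days';
DECLARE @srch NVARCHAR(MAX) = N'how,doing,monday,waiting';

-- Search phrases, one per row, trimmed of stray spaces.
SELECT LTRIM(RTRIM(value)) AS MatchedWord
FROM   STRING_SPLIT(@srch, ',')
INTERSECT
-- Words of @str: punctuation replaced by spaces, then split on spaces.
SELECT value
FROM   STRING_SPLIT(REPLACE(@str, ',', ' '), ' ')
WHERE  value <> '';
```

Whether monday matches Monday depends on the collation: with a case-insensitive default collation all four search words are returned, and an empty result means nothing matched. Wrapping the query in IF EXISTS (...) gives the true/false answer.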
