Below is the function I am trying to create, with two parameters and a single output: the number of matching words. I am using @searchentry and @bestmatch as my parameters. My question is where the parameters should go in the function so that, once it is created, I can simply call Select dbo.FunMatch('enamel cleaner', 'cleaner') and have it execute and return the number of matching words between the two arguments, which would be 1 here.
Create Function dbo.FunMatch(
@searchentry varchar,
@bestmatch varchar
)
Returns INT
As
Begin
Declare @output INT
Set @output = (select
@searchentry,
@bestmatch,
cast(count(isMatch) as float) as matchingWords
from
(
select
s.value as word_from_search_entry_txt,
b.value as word_from_best_match,
case
when s.value = b.value or s.value+'s'=b.value or s.value=b.value+'s' then 'match'
else null
end as isMatch,
t.*
from (
SELECT
@searchentry,@bestmatch
FROM #tmp_parts
) t
cross apply
string_split(@searchentry, ' ') s
cross apply
string_split(@bestmatch, ' ') b
) a
group by
@searchentry,
@bestmatch)
Return @output
I am writing a function to return the matching words between two strings. Example data is below:
CREATE TABLE #tmp_parts
(
search_entry_txt VARCHAR(30),
best_match VARCHAR(30),
);
INSERT INTO #tmp_parts
VALUES ('rotating conveyors', 'conveyor'),
('rivet tool', 'rivet nut tool'),
('enamel cleaner', 'cleaner'),
('farm ring', 'ring'),
('tire gauge', 'gauge'),
('ice cream','ice cream');
You can see the expected output here, which is the matchingWords column:
select
search_entry_txt,
best_match,
cast(count(isMatch) as float) as matchingWords
from
(
select
s.value as word_from_search_entry_txt,
b.value as word_from_best_match,
case
when s.value = b.value or s.value+'s'=b.value or s.value=b.value+'s' then 'match'
else null
end as isMatch,
t.*
from (
SELECT
search_entry_txt,best_match
FROM #tmp_parts
) t
cross apply
string_split(search_entry_txt, ' ') s
cross apply
string_split(best_match, ' ') b
) a
group by
search_entry_txt,
best_match
There are some issues with your function script.
The parameters @searchentry and @bestmatch need an explicit length; without one, varchar is declared with a length of 1.
You are missing the END at the end of the function.
Your code does not need the #tmp_parts temp table; just use the parameters @searchentry and @bestmatch.
Some of the script is more verbose than it needs to be: the GROUP BY and the subquery can be replaced by a single aggregate function.
I have rewritten your script; you can try this.
Create Function dbo.FunMatch(
@searchentry varchar(max),
@bestmatch varchar(max)
)
Returns INT
As
Begin
Declare @output INT
set @output =(select
COUNT(case
when s.value = b.value or s.value+'s'=b.value or s.value=b.value+'s' then 'match'
else null
end)
from
string_split(@searchentry, ' ') s
cross apply
string_split(@bestmatch, ' ') b)
Return @output
END
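As a quick check, the function can be called directly or applied to every row of the sample data. This is only a sketch: it assumes dbo.FunMatch has been created as above, that the #tmp_parts sample table from the question is loaded in the current session, and that the database compatibility level is 130 or higher so that STRING_SPLIT is available.

```sql
-- Single call: 'cleaner' is the only word the two arguments share.
SELECT dbo.FunMatch('enamel cleaner', 'cleaner') AS matchingWords;  -- 1

-- Apply the function to each row of the question's sample table.
SELECT
    search_entry_txt,
    best_match,
    dbo.FunMatch(search_entry_txt, best_match) AS matchingWords
FROM #tmp_parts;
```

The function itself never touches #tmp_parts; the table only supplies the arguments, which is why the temp table could be dropped from the function body.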
Related
In t-sql my dilemma is that I have to parse a potentially long string (up to 500 characters) for any of over 230 possible values and remove them from the string for reporting purposes. These values are a column in another table and they're all upper case and 4 characters long with the exception of two that are 5 characters long.
Examples of these values are:
USFRI
PROME
AZCH
TXJS
NYDS
XVIV. . . . .
Example of string before:
"Offered to XVIV and USFRI as back ups. No response as of yet."
Example of string after:
"Offered to and as back ups. No response as of yet."
Pretty sure it will have to be a UDF but I'm unable to come up with anything other than stripping ALL the upper case characters out of the string with PATINDEX which is not the objective.
This is unavoidably kludgy, but one way is to split your string into rows; once you have a set of words the rest is easy. Simply re-aggregate while ignoring the matching values*:
with t as (
select 'Offered to XVIV and USFRI as back ups. No response as of yet.' s
union select 'Another row AZCH and TXJS words.'
), v as (
select * from (values('USFRI'),('PROME'),('AZCH'),('TXJS'),('NYDS'),('XVIV'))v(v)
)
select t.s OriginalString, s.Removed
from t
cross apply (
select String_Agg(j.[value], ' ') within group(order by Convert(tinyint,j.[key])) Removed
from OpenJson(Concat('["',replace(s, ' ', '","'),'"]')) j
where not exists (select * from v where v.v = j.[value])
)s;
* Requires a fully-supported version of SQL Server.
Build a function that cleans one sentence, then call that function from your query, something like SELECT Col1, dbo.fn_ReplaceValue(Col1) AS cleanValue, * FROM MySentencesTable. Your fn_ReplaceValue would be something like the code below. You could also create the table variable outside the function and pass it in as a parameter to speed up the process, but this way it is all self-contained.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION fn_ReplaceValue(@sentence VARCHAR(500))
RETURNS VARCHAR(500)
AS
BEGIN
DECLARE @ResultVar VARCHAR(500)
DECLARE @allValues TABLE (rowID int, sValues VARCHAR(15))
DECLARE @id INT = 0
DECLARE @ReplaceVal VARCHAR(10)
DECLARE @numberOfValues INT = (SELECT COUNT(*) FROM MyValuesTable)
--Populate table variable with all values
INSERT @allValues
SELECT ROW_NUMBER() OVER(ORDER BY MyValuesCol) AS rowID, MyValuesCol
FROM MyValuesTable
SET @ResultVar = @sentence
--Stop after the last row: a NULL @ReplaceVal would make REPLACE return NULL
WHILE (@id < @numberOfValues)
BEGIN
SET @id = @id + 1
SET @ReplaceVal = (SELECT sValues FROM @allValues WHERE rowID = @id)
SET @ResultVar = REPLACE(@ResultVar, @ReplaceVal, SPACE(0))
END
RETURN @ResultVar
END
GO
I suggest creating a table (either temporary or permanent), and loading these 230 string values into this table. Then use it in the following delete:
DELETE
FROM yourTable
WHERE col IN (SELECT col FROM tempTable);
If you just want to view your data sans these values, then use:
SELECT *
FROM yourTable
WHERE col NOT IN (SELECT col FROM tempTable);
I am working on a query that I need to modify so that a string is passed to in(). The view is used by another view and ultimately by a stored procedure. The string values must be wrapped in single quotes.
select region, county, name
from vw_main
where state = 'MD'
and building_id in ('101', '102') -- pass the string into in()
The values for the building_id will be entered at the stored procedure level upon its execution.
The scripts below should give you the answer.
Way 1: Split CSV value using XML and directly use select query in where condition
DECLARE @StrBuildingIDs VARCHAR(1000)
SET @StrBuildingIDs = '101,102'
SELECT
vm.region,
vm.county,
vm.name
FROM vw_main vm
WHERE vm.state = 'MD'
AND vm.building_id IN
(
SELECT
l.value('.','VARCHAR(20)') AS Building_Id
FROM
(
SELECT CAST('<a>' + REPLACE(@StrBuildingIDs,',','</a><a>') + '</a>' AS XML) AS BuildIDXML
) x
CROSS APPLY x.BuildIDXML.nodes('a') Split(l)
)
Way 2: Split CSV value using XML, Create Variable Table and use that in where condition
DECLARE @StrBuildingIDs VARCHAR(1000)
SET @StrBuildingIDs = '101,102'
DECLARE @TblBuildingID TABLE(BuildingId INT)
INSERT INTO @TblBuildingID(BuildingId)
SELECT
l.value('.','VARCHAR(20)') AS Building_Id
FROM
(
SELECT CAST('<a>' + REPLACE(@StrBuildingIDs,',','</a><a>') + '</a>' AS XML) AS BuildIDXML
) x
CROSS APPLY x.BuildIDXML.nodes('a') Split(l)
SELECT
vm.region,
vm.county,
vm.name
FROM vw_main AS vm
WHERE vm.state = 'MD'
AND vm.building_id IN
(
SELECT
BuildingId
FROM @TblBuildingID
)
Way 3: Split CSV value using XML, Create Variable Table and use that in INNER JOIN
Assuming the input string is not end-user input (that is, it is derived or pulled from another table or some other controlled source), you can do this.
DECLARE @instr nvarchar(some length) = N'''a'',''b'',''c'''
declare @stmt nvarchar(4000) = N'
select region, county, name
from vw_main
where state = ''MD''
and building_id in ({instr})'
set @stmt = replace(@stmt, N'{instr}', @instr)
exec sp_executesql @stmt=@stmt;
If the input is from an end-user, this is safer:
declare @ table (a int, b char)
insert into @(a, b) values (1,'A'), (2, 'B')
declare @str varchar(50) = 'A,B'
select t.* from @ t
join (select * from string_split(@str, ',')) s(b)
on t.b = s.b
You may like it better anyway, since there's no dynamic sql involved. However you must be running SQL Server 2016 or higher.
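Applied to the query from the question, the STRING_SPLIT approach might look like the sketch below. The variable name @building_ids is hypothetical and stands in for whatever parameter the stored procedure would receive; the sketch also assumes building_id is a character column, as the quoted literals in the question suggest.

```sql
-- Hypothetical variable; in practice this would be a stored procedure parameter.
DECLARE @building_ids varchar(1000) = '101,102';

SELECT vm.region, vm.county, vm.name
FROM vw_main AS vm
WHERE vm.state = 'MD'
  AND vm.building_id IN (
        -- Trim each item so that '101, 102' works as well as '101,102'.
        SELECT LTRIM(RTRIM(s.value))
        FROM STRING_SPLIT(@building_ids, ',') AS s
      );
```

Because the list is only ever treated as data, nothing in it is executed as SQL, which is what makes this form safer than building the statement dynamically.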
I have two types of strings in a column.
DECLARE @t table(parameter varchar(100))
INSERT @t values
('It contains eact01' ),
('It contains preact01')
I'm trying to get the strings that contain the word 'eact01'.
My problem is that with the following SELECT I also get the rows that contain 'preact01', because it contains 'eact01'.
SELECT * FROM @t WHERE parameter LIKE '%eact01%'
How could I get only the row containing 'eact01'?
This should find all combinations; any character that is not a letter or a digit is treated as a split character, that is, the start of a new word.
SELECT *
FROM @t
WHERE
parameter like '%[^0-9a-z]eact01'
or parameter like '%[^0-9a-z]eact01[^0-9a-z]%'
or parameter like 'eact01[^0-9a-z]%'
or parameter = 'eact01'
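The four patterns can also be collapsed into one by padding the column with a space on each side, so the word always has a non-alphanumeric neighbour. A minimal sketch, assuming a case-insensitive collation and reusing the sample data from the question:

```sql
DECLARE @t TABLE (parameter varchar(100));
INSERT @t VALUES
    ('It contains eact01'),
    ('It contains preact01');

-- The added spaces act as word boundaries at the start and end of the string,
-- so a single pattern covers the start, middle, end, and whole-string cases.
SELECT *
FROM @t
WHERE ' ' + parameter + ' ' LIKE '%[^0-9a-z]eact01[^0-9a-z]%';
-- Returns only 'It contains eact01'.
```

Wrapping the column in an expression like this prevents an index seek on it, but the leading wildcard in the original patterns already has the same effect.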
Try this:
select *
from @t
where
parameter='eact01'
OR parameter like '%[^0-9a-z]eact01%'
OR parameter like 'eact01[^0-9a-z]%'
OR parameter like '%[^0-9a-z]eact01[^0-9a-z]%'
The easiest way is to just add a space:
SELECT * FROM @t WHERE parameter LIKE '% eact01%' or parameter LIKE 'eact01%'
You need a string splitter for this. Here is one taken from Aaron Bertrand's article:
CREATE FUNCTION dbo.SplitStrings_XML
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT Item = y.i.value('(./text())[1]', 'nvarchar(4000)')
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(@List, @Delimiter, '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
);
Then, you can use EXISTS:
SELECT *
FROM @t
WHERE EXISTS(
SELECT 1
FROM dbo.SplitStrings_XML(parameter, ' ') s
WHERE s.Item = 'eact01'
)
I have two variables that contain comma-separated strings:
@v1 = 'hello, world, one, two'
@v2 = 'jump, down, yes, one'
I need a function that will return TRUE if there is at least one match. So in the above example, it would return TRUE since the value 'one' is in both strings.
Is this possible in SQL?
Use a split function. There are many examples; before SQL Server 2016, CLR was your best option in most cases, but now you should use STRING_SPLIT().
Once you have a split function, the rest is quite easy. The model would be something like this:
DECLARE @v1 VARCHAR(MAX) = 'hello, world, one, two',
@v2 VARCHAR(MAX) = 'jump, down, yes, one';
SELECT CASE WHEN EXISTS
(
SELECT 1
FROM dbo.Split(@v1) AS a
INNER JOIN dbo.Split(@v2) AS b
ON a.Item = b.Item
)
THEN 1 ELSE 0 END;
You can even reduce this to only call the function once:
SELECT CASE WHEN EXISTS
(
SELECT 1 FROM dbo.Split(@v1)
WHERE ', ' + LTRIM(@v2) + ','
LIKE '%, ' + LTRIM(Item) + ',%'
) THEN 1 ELSE 0 END;
On 2016+:
SELECT CASE WHEN EXISTS
(
SELECT 1 FROM STRING_SPLIT(@v1, ',')
WHERE ', ' + LTRIM(@v2) + ','
LIKE '%, ' + LTRIM([Value]) + ',%'
) THEN 1 ELSE 0 END;
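Another way to express the same test on SQL Server 2016 or later is to split both lists and intersect them. This is a sketch of that alternative rather than part of the answer above; the column alias has_match is made up for illustration.

```sql
DECLARE @v1 varchar(max) = 'hello, world, one, two',
        @v2 varchar(max) = 'jump, down, yes, one';

-- INTERSECT keeps only the items present in both lists;
-- LTRIM removes the space that follows each comma.
SELECT CASE WHEN EXISTS
(
    SELECT LTRIM(value) FROM STRING_SPLIT(@v1, ',')
    INTERSECT
    SELECT LTRIM(value) FROM STRING_SPLIT(@v2, ',')
) THEN 1 ELSE 0 END AS has_match;  -- 1, because 'one' is in both lists
```

Dropping the CASE and EXISTS wrapper and running the INTERSECT on its own lists the shared items instead of just flagging that one exists.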
You can use CTEs to split your string into xml nodes, then insert the words into table variables. Joining the table variables will reveal any matches
DECLARE @v1 VARCHAR(200) = 'hello, world, one, two'
DECLARE @v2 VARCHAR(200) = 'jump, down, yes, one'
DECLARE @v1Words TABLE (word VARCHAR(100))
DECLARE @v2Words TABLE (word VARCHAR(100))
;WITH cteSplitV1 AS(
SELECT CAST('<word>' + REPLACE(@v1,', ','</word><word>') + '</word>' AS XML) AS words)
INSERT INTO @v1Words(word)
SELECT word.x.value('.','VARCHAR(100)') AS [word]
FROM cteSplitV1
CROSS APPLY words.nodes('/word') AS word(x)
;WITH cteSplitV2 AS(
SELECT CAST('<word>' + REPLACE(@v2,', ','</word><word>') + '</word>' AS XML) AS words)
INSERT INTO @v2Words(word)
SELECT word.x.value('.','VARCHAR(100)') AS [word]
FROM cteSplitV2
CROSS APPLY words.nodes('/word') AS word(x)
SELECT *
FROM @v1Words v1
JOIN @v2Words v2
ON v1.word = v2.word
I have created a user defined function to gain performance with queries containing 'WHERE col IN (...)' like this case:
SELECT myCol1, myCol2
FROM myTable
WHERE myCol3 IN (100, 200, 300, ..., 4900, 5000);
The queries are generated by a web application and are in some cases much more complex.
The function definition looks like this:
CREATE FUNCTION [dbo].[udf_CSVtoIntTable]
(
@CSV VARCHAR(MAX),
@Delimiter CHAR(1) = ','
)
RETURNS
@Result TABLE
(
[Value] INT
)
AS
BEGIN
DECLARE @CurrStartPos SMALLINT;
SET @CurrStartPos = 1;
DECLARE @CurrEndPos SMALLINT;
SET @CurrEndPos = 1;
DECLARE @TotalLength SMALLINT;
-- Remove space, tab, linefeed, carriage return
SET @CSV = REPLACE(@CSV, ' ', '');
SET @CSV = REPLACE(@CSV, CHAR(9), '');
SET @CSV = REPLACE(@CSV, CHAR(10), '');
SET @CSV = REPLACE(@CSV, CHAR(13), '');
-- Add extra delimiter if needed
IF NOT RIGHT(@CSV, 1) = @Delimiter
SET @CSV = @CSV + @Delimiter;
-- Get total string length
SET @TotalLength = LEN(@CSV);
WHILE @CurrStartPos < @TotalLength
BEGIN
SET @CurrEndPos = CHARINDEX(@Delimiter, @CSV, @CurrStartPos);
INSERT INTO @Result
VALUES (CAST(SUBSTRING(@CSV, @CurrStartPos, @CurrEndPos - @CurrStartPos) AS INT));
SET @CurrStartPos = @CurrEndPos + 1;
END
RETURN
END
The function is intended to be used like this (or as an INNER JOIN):
SELECT myCol1, myCol2
FROM myTable
WHERE myCol3 IN (
SELECT [Value]
FROM dbo.udf_CSVtoIntTable('100, 200, 300, ..., 4900, 5000', ','));
Does anyone have ideas for optimizing my function, or other ways to improve performance in my case?
Are there any drawbacks that I have missed?
I am using MS SQL Server 2005 Std and .NET 2.0 framework.
I'm not sure of the performance increase, but I would use it as an inner join and get away from the inner select statement.
Using a UDF in a WHERE clause or (worse) a subquery is asking for trouble. The optimizer sometimes gets it right, but often gets it wrong and evaluates the function once for every row in your query, which you don't want.
If your parameters are static (they appear to be) and you can issue a multistatement batch, I'd load the results of your UDF into a table variable, then use a join against the table variable to do your filtering. This should work more reliably.
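A minimal sketch of that suggestion, using the function and table names from the question. The table variable @ids is a made-up name, and the short literal list is only a placeholder for the real values.

```sql
-- Materialise the parsed list once so the UDF is evaluated a single time.
DECLARE @ids TABLE ([Value] int PRIMARY KEY);

INSERT INTO @ids ([Value])
SELECT DISTINCT [Value]              -- DISTINCT guards the primary key
FROM dbo.udf_CSVtoIntTable('100, 200, 300', ',');

-- Filter with a join against the table variable instead of IN (SELECT ...).
SELECT t.myCol1, t.myCol2
FROM myTable AS t
INNER JOIN @ids AS i
    ON i.[Value] = t.myCol3;
```

The primary key gives the optimizer a unique index to join on, and the syntax used here is available on SQL Server 2005.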
That loop will kill performance!
Create a table like this:
CREATE TABLE Numbers
(
Number int not null primary key
)
Fill it with rows containing the values 1 to 8000 or so, and use this function:
CREATE FUNCTION [dbo].[FN_ListAllToNumTable]
(
@SplitOn char(1) --REQUIRED, the character to split the @List string on
,@List varchar(8000) --REQUIRED, the list to split apart
)
RETURNS
@ParsedList table
(
RowNumber int
,ListValue varchar(500)
)
AS
BEGIN
/*
DESCRIPTION: Takes the given @List string and splits it apart based on the given @SplitOn character.
A table is returned, one row per split item, with columns named "RowNumber" and "ListValue".
This function works for fixed or variable length items.
Empty and null items will be included in the results set.
PARAMETERS:
@List varchar(8000) --REQUIRED, the list to split apart
@SplitOn char(1) --REQUIRED, the character to split the @List string on
RETURN VALUES:
a table, one row per item in the list, with columns named "RowNumber" and "ListValue"
TEST WITH:
----------
SELECT * FROM dbo.FN_ListAllToNumTable(',','1,12,123,1234,54321,6,A,*,|||,,,,B')
DECLARE @InputList varchar(200)
SET @InputList='17;184;75;495'
SELECT
'well formed list',LEFT(@InputList,40) AS InputList,h.Name
FROM Employee h
INNER JOIN dbo.FN_ListAllToNumTable(';',@InputList) dt ON h.EmployeeID=dt.ListValue
WHERE dt.ListValue IS NOT NULL
SET @InputList='17;;;184;75;495;;;'
SELECT
'poorly formed list join',LEFT(@InputList,40) AS InputList,h.Name
FROM Employee h
INNER JOIN dbo.FN_ListAllToNumTable(';',@InputList) dt ON h.EmployeeID=dt.ListValue
SELECT
'poorly formed list',LEFT(@InputList,40) AS InputList, ListValue
FROM dbo.FN_ListAllToNumTable(';',@InputList)
**/
/*this will return empty rows, and row numbers*/
INSERT INTO @ParsedList
(RowNumber,ListValue)
SELECT
ROW_NUMBER() OVER(ORDER BY number) AS RowNumber
,LTRIM(RTRIM(SUBSTRING(ListValue, number+1, CHARINDEX(@SplitOn, ListValue, number+1)-number - 1))) AS ListValue
FROM (
SELECT @SplitOn + @List + @SplitOn AS ListValue
) AS InnerQuery
INNER JOIN Numbers n ON n.Number < LEN(InnerQuery.ListValue)
WHERE SUBSTRING(ListValue, number, 1) = @SplitOn
RETURN
END /*Function FN_ListAllToNumTable*/
I have other versions that do not return empty or null rows, ones that return just the item and not the row number, etc. Look in the header comment to see how to use this as part of a JOIN, which is much faster than in a where clause.
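The answer does not show how the Numbers table gets its rows. One possible way to fill it, as a sketch only: it assumes the Numbers table has been created as defined above and is still empty, and it uses the system catalog view sys.all_objects purely as a convenient source of rows.

```sql
-- Generate 1..8000 by numbering the rows of a cross join of a catalog view.
INSERT INTO Numbers (Number)
SELECT n
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY a.object_id, b.object_id) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b   -- far more than 8000 rows to draw from
) AS x
WHERE n <= 8000;
```

Filtering on n rather than using TOP means the stored values are 1 through 8000 no matter in what order the rows are produced.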
The CLR solution did not give me good performance, so I will use a recursive query. Here is the definition of the function I will use (mostly based on Erland Sommarskog's examples):
CREATE FUNCTION [dbo].[priudf_CSVtoIntTable]
(
@CSV VARCHAR(MAX),
@Delimiter CHAR(1) = ','
)
RETURNS
@Result TABLE
(
[Value] INT
)
AS
BEGIN
-- Remove space, tab, linefeed, carriage return
SET @CSV = REPLACE(@CSV, ' ', '');
SET @CSV = REPLACE(@CSV, CHAR(9), '');
SET @CSV = REPLACE(@CSV, CHAR(10), '');
SET @CSV = REPLACE(@CSV, CHAR(13), '');
WITH csvtbl(start, stop) AS
(
SELECT start = CONVERT(BIGINT, 1),
stop = CHARINDEX(@Delimiter, @CSV + @Delimiter)
UNION ALL
SELECT start = stop + 1,
stop = CHARINDEX(@Delimiter, @CSV + @Delimiter, stop + 1)
FROM csvtbl
WHERE stop > 0
)
INSERT INTO @Result
SELECT CAST(SUBSTRING(@CSV, start, CASE WHEN stop > 0 THEN stop - start ELSE 0 END) AS INT) AS [Value]
FROM csvtbl
WHERE stop > 0
OPTION (MAXRECURSION 1000)
RETURN
END
Thanks for the input. I have to admit that I did some poor research before I started my work. I found that Erland Sommarskog has written a lot about this problem on his webpage; after your responses and after reading his page, I decided that I will try to write a CLR function to solve this.
I tried a recursive query, which resulted in good performance, but I will try a CLR function anyway.