Get the highest conformity between two strings


Is the Levenshtein function the correct/best function to find the highest conformity between two strings?
For example:
string1 = CCC14E0APJ
string2 = CCC14E0APJ123
My end result should say that CCC14E0APJ is the master product of CCC14E0APJ123.
I cannot do an exact match because some products look like this:
CCC14E0AP
CCC14E0APJ
CCC14E0APK
and these are all totally different products.
The master is always the longest string in the master table that matches the start of the product string 100%.
For product abcde123, if there is an abcde in my master table, that's the master. If there is only abc, that's the master.

You do not need a fancy how-close-is-the-string function. Instead, compare the beginning of each string with all other strings: if one string starts with another, the shorter one is a parent of the longer one, and the longest such match is the direct parent.
With the following query you get the ParentID, even in a hierarchical system:
DECLARE @dummy TABLE(YourID VARCHAR(100),ParentID VARCHAR(100));
INSERT INTO @dummy(YourID) VALUES
('CCC14E0AP')
,('CCC14E0APJ')
,('CCC14E0APK')
,('CCC14E0APK_1')
,('CCC14E');
WITH DependingIDs AS
(
SELECT d.ParentID
,d.YourID
,d2.YourID AS dependingID
-- rank 1 = the longest other ID that d.YourID starts with
,RANK() OVER(PARTITION BY d.YourID ORDER BY LEN(d2.YourID) DESC) AS NextLength
FROM @dummy AS d
INNER JOIN @dummy AS d2 ON d.YourID LIKE d2.YourID + '%' AND d.YourID<>d2.YourID
)
UPDATE DependingIDs SET ParentID=dependingID
WHERE NextLength=1;
SELECT * FROM @dummy;
This is the result:
YourID        ParentID
CCC14E0AP     CCC14E
CCC14E0APJ    CCC14E0AP
CCC14E0APK    CCC14E0AP
CCC14E0APK_1  CCC14E0APK
CCC14E        NULL

For each row, you can find the longest matching prefix with the APPLY operator:
DECLARE @t TABLE ( p VARCHAR(MAX) );
INSERT INTO @t
VALUES ( 'A' ),
( 'AAAA' ),
( 'AA' ),
( 'BBB' ),
( 'BBBB' ),
( 'BBBBB' ),
( 'BBBBB' ),
( 'C' )
SELECT *
FROM @t t
-- for each row, pick the longest other value that the row's value starts with
OUTER APPLY ( SELECT TOP 1 p
FROM @t
WHERE t.p <> p AND t.p LIKE p + '%'
ORDER BY LEN(p) DESC
) ca
Output:
A NULL
AAAA AA
AA A
BBB NULL
BBBB BBB
BBBBB BBBB
BBBBB BBBB
C NULL
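
Both queries above build the LIKE pattern from the stored IDs, so an ID that contains _, % or [ (for example CCC14E0APK_1) is interpreted as a wildcard pattern. A minimal sketch of the same longest-prefix lookup using LEFT instead of LIKE follows; the table variable @ids and the extra row CCC14E0APKX1 are hypothetical, added only to show the difference. Note that LEN ignores trailing spaces, so this assumes the IDs have none.

```sql
-- Hypothetical sample data; CCC14E0APKX1 would match the LIKE pattern 'CCC14E0APK_1%'.
DECLARE @ids TABLE (YourID VARCHAR(100));
INSERT INTO @ids (YourID) VALUES
 ('CCC14E'), ('CCC14E0APK'), ('CCC14E0APK_1'), ('CCC14E0APKX1');

SELECT child.YourID, parent.ParentID
FROM @ids AS child
OUTER APPLY (
    -- longest other ID that is a literal prefix of the child ID
    SELECT TOP (1) p.YourID AS ParentID
    FROM @ids AS p
    WHERE p.YourID <> child.YourID
      AND LEFT(child.YourID, LEN(p.YourID)) = p.YourID
    ORDER BY LEN(p.YourID) DESC
) AS parent;
```

With the literal comparison, CCC14E0APKX1 should get CCC14E0APK as its parent rather than CCC14E0APK_1.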

Related

Split a string in SQL by hyphen in 2012 version

I have multiple strings in a column, and from each one I need the segment just before the last hyphen.
Below are four examples. The number of hyphens in a string varies, but the desired result is always the segment before the last hyphen.
1. abc-def-Opto
2. abc-def-ijk-5C-hello-Opto
3. abc-def-ijk-4C-hi-Build
4. abc-def-ijk-4C-123-suppymanagement
The desired result set is:
def
hello
hi
123
How can I write a SQL query that returns this result set? I am using MSSQL 2012.
I need a generic query that works for any number of hyphens.

There are many ways to split/parse a string. ParseName() would fail because you may have more than 4 positions.
One option (just for fun) is to use a little XML:
1. Reverse the string
2. Convert it into XML
3. Grab the second node
4. Reverse the desired value for the final presentation
Example:
Declare @YourTable Table ([SomeCol] varchar(50))
Insert Into @YourTable Values
('abc-def-Opto')
,('abc-def-ijk-5C-hello-Opto')
,('abc-def-ijk-4C-hi-Build')
,('abc-def-ijk-4C-123-suppymanagement')
Select *
,Value = reverse(convert(xml,'<x>'+replace(reverse(SomeCol),'-','</x><x>')+'</x>').value('x[2]','varchar(150)'))
from @YourTable
Returns:
SomeCol                             Value
abc-def-Opto                        def
abc-def-ijk-5C-hello-Opto           hello
abc-def-ijk-4C-hi-Build             hi
abc-def-ijk-4C-123-suppymanagement  123

Without getting into XML, you can do this with SQL Server's string functions alone:
Declare @YourTable Table ([SomeCol] varchar(50))
Insert Into @YourTable Values
('abc-def-Opto')
,('abc-def-ijk-5C-hello-Opto')
,('abc-def-ijk-4C-hi-Build')
,('abc-def-ijk-4C-123-suppymanagement');
-- reverse the string, skip past the first hyphen, then cut at the next hyphen
SELECT *
,RTRIM(LTRIM(REVERSE(
SUBSTRING(
SUBSTRING(REVERSE([SomeCol]) , CHARINDEX('-', REVERSE([SomeCol])) +1 , LEN([SomeCol]) )
, 1 , CHARINDEX('-', SUBSTRING(REVERSE([SomeCol]) , CHARINDEX('-', REVERSE([SomeCol])) +1 , LEN([SomeCol]) ) ) -1
)
)))
FROM @YourTable
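
The string-function approach above assumes every value has at least two hyphens; with fewer, the inner CHARINDEX returns 0 and SUBSTRING receives a negative length. A minimal sketch that computes the two hyphen positions once with CROSS APPLY and guards those cases follows. The rows 'one-hyphen' and 'nohyphen' are hypothetical, and returning NULL for a value without any hyphen is an assumption about the desired behaviour.

```sql
DECLARE @YourTable TABLE (SomeCol VARCHAR(50));
INSERT INTO @YourTable VALUES
 ('abc-def-Opto'), ('abc-def-ijk-5C-hello-Opto'),
 ('one-hyphen'), ('nohyphen');          -- hypothetical edge cases

SELECT t.SomeCol,
       Value = CASE
                 WHEN p1.pos = 0 THEN NULL   -- no hyphen at all (assumed behaviour)
                 WHEN p2.pos = 0             -- one hyphen: take everything before it
                   THEN REVERSE(SUBSTRING(r.rev, p1.pos + 1, LEN(r.rev)))
                 ELSE REVERSE(SUBSTRING(r.rev, p1.pos + 1, p2.pos - p1.pos - 1))
               END
FROM @YourTable AS t
CROSS APPLY (SELECT REVERSE(t.SomeCol) AS rev) AS r
-- position of the last hyphen, counted in the reversed string
CROSS APPLY (SELECT CHARINDEX('-', r.rev) AS pos) AS p1
-- position of the second-to-last hyphen, or 0 if there is none
CROSS APPLY (SELECT CASE WHEN p1.pos > 0
                         THEN CHARINDEX('-', r.rev, p1.pos + 1)
                         ELSE 0 END AS pos) AS p2;
```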

I am not sure this script is exactly what you need, but it should give you an idea of how to split the data:
IF OBJECT_ID('tempdb..#Temp')IS NOT NULL
DROP TABLE #Temp
;WITH CTE(Id,data)
AS
(
SELECT 1,'abc-def-Opto' UNION ALL
SELECT 2,'abc-def-ijk-5C-hello-Opto' UNION ALL
SELECT 3,'abc-def-ijk-4C-hi-Build' UNION ALL
SELECT 4,'abc-def-ijk-4C-123-suppymanagement'
)
,Cte2
AS
(
SELECT Id, CASE WHEN Id=1 AND Setdata=1 THEN data
WHEN Id=2 AND Setdata=2 THEN data
WHEN Id=3 AND Setdata=3 THEN data
WHEN Id=4 AND Setdata=4 THEN data
ELSE NULL
END AS Data
FROM
(
SELECT Id,
Split.a.value('.','nvarchar(1000)') AS Data,
ROW_NUMBER()OVER(PARTITION BY id ORDER BY id) AS Setdata
FROM(
SELECT Id,
CAST('<S>'+REPLACE(data ,'-','</S><S>')+'</S>' AS XML) AS data
FROM CTE
) AS A
CROSS APPLY data.nodes('S') AS Split(a)
)dt
)
SELECT * INTO #Temp FROM Cte2
SELECT STUFF((SELECT DISTINCT ', '+ 'Set_'+CAST(Id AS VARCHAR(10))+':'+Data
FROM #Temp WHERE ISNULL(Data,'')<>'' FOR XML PATH ('')),1,1,'')
Result
Set_1:abc, Set_2:def, Set_3:ijk, Set_4:4C

You can do it like this:
WITH CTE AS
(
SELECT 1 ID,'abc-def-Opto' Str
UNION
SELECT 2, 'abc-def-ijk-5C-hello-Opto'
UNION
SELECT 3, 'abc-def-ijk-4C-hi-Build'
UNION
SELECT 4, 'abc-def-ijk-4C-123-suppymanagement'
)
SELECT ID,
REVERSE(LEFT(REPLACE(P2, P1, ''), CHARINDEX('-', REPLACE(P2, P1, ''))-1)) Result
FROM (
SELECT LEFT(REVERSE(Str), CHARINDEX('-', REVERSE(Str))) P1,
REVERSE(Str) P2,
ID
FROM CTE
) T;
Returns:
+----+--------+
| ID | Result |
+----+--------+
| 1 | def |
| 2 | hello |
| 3 | hi |
| 4 | 123 |
+----+--------+

TSQL, change value on a comma delimited column

I have a column called empl_type_multi which holds a comma-delimited list; each value in the list is a link to another table called custom_captions.
For instance, I might have the following as a value in empl_type_multi:
123, RHN, 458
Then in the custom_captions table these would be individual values:
123 = Dog
RHN = Cat
458 = Rabbit
All of these fields are NTEXT.
What I am trying to do is convert the empl_type_multi column so that each value is changed to its respective name in the custom_captions table. In the example above:
123, RHN, 458
would become
Dog, Cat, Rabbit
Any help on this would be much appreciated.
----- EDIT ------------------------------------------------------------------
OK, so I've managed to convert the values to the corresponding captions and put it all into a temporary table. The following is the output from a CTE query on that table:
ID1 ID2 fName lName Caption_name Row_Number
10007 22841 fname1 lname1 DENTAL ASSISTANT 1
10007 22841 fname1 lname1 2
10007 22841 fname1 lname1 3
10008 23079 fname2 lname2 OPS WARD 1
10008 23079 fname2 lname2 DENTAL 2
10008 23079 fname2 lname2 3
How can I update this so that every value under Caption_name is appended, separated by a comma, to the Caption_name of the row with Row_Number 1?
If I can do that, all I need to do is delete all records where Row_Number != 1.
------ EDIT --------------------------------------------------
The solution to the first edit was:
WITH CTE AS
(
SELECT
p.ID1
, p.ID2
, p.fname
, p.lname
, p.caption_name
, ROW_NUMBER() OVER (PARTITION BY p.id1 ORDER BY caption_name DESC) AS RN
FROM tmp_cs p
)
UPDATE tblPerson SET empType = empType + ', ' + c.Data
FROM CTE c WHERE [DB1].dbo.tblPerson.personID = c.personID AND RN = 2
And then I just incremented the value in RN = 2 until I got 0 rows affected.
This was after I ran:
DELETE FROM CTE WHERE RN != 1 AND Caption_name = ''

select ID1, ID2, fname, lname, left(captions, len(captions) - 1) as captions
from (
select distinct ID1, ID2, cast(fname as nvarchar) as fname, cast(lname as nvarchar) as lname, (
select cast(t1.caption_name as nvarchar) + ','
from #temp as t1
where t1.ID1 = t2.ID1
and t1.ID2 = t2.ID2
and cast(caption_name as nvarchar) != ''
order by t1.[row_number]
for xml path ('')) captions
from #temp as t2
) yay_concatenated_rows
This will give you what you want. You'll see casts from ntext to nvarchar; they are necessary because many comparison operations cannot be performed on ntext. The value can be implicitly converted back the other way, so no worries there. Note that I did not specify a length when casting; CAST then defaults to 30 characters, so use nvarchar(length) as needed to avoid truncation. I also assumed that ID1 and ID2 together form a composite key (it appears so). Adjust the join as needed for your relationship.

You have shared only part of your problem, not the exact problem.
Try this:
DECLARE @T TABLE(ID1 VARCHAR(50),ID2 VARCHAR(50),fName VARCHAR(50),LName VARCHAR(50),Caption_name VARCHAR(50),Row_Number INT)
INSERT INTO @T VALUES
(10007,22841,'fname1','lname1','DENTAL ASSISTANT', 1)
,(10007,22841,'fname1','lname1', NULL, 2)
,(10007,22841,'fname1','lname1', NULL, 3)
,(10008,23079,'fname2','lname2','OPS WARD', 1)
,(10008,23079,'fname2','lname2','DENTAL', 2)
,(10008,23079,'fname2','lname2', NULL, 3)
-- concatenate all captions of the same ID1; STUFF removes the leading comma
SELECT *
,STUFF((SELECT ','+Caption_name
FROM @T T1 WHERE T.ID1=T1.ID1 FOR XML PATH('')
),1,1,'')
FROM @T T

You can build the Caption_name string with a WHILE loop:
declare @i int = 2, @Caption_name varchar(100) = (select Caption_name from
#temp where Row_Number = 1)
while @i <= (select count(*) from #temp)
begin
select @Caption_name = @Caption_name + Caption_name from #temp where Row_Number = @i
set @i = @i+1
end
update #temp set Caption_name = @Caption_name where Row_Number = 1
Use a CASE expression to skip NULL or empty values and to add the comma separator:
case when isnull(Caption_name ,'') = '' then
'' else ',' + Caption_name end

T-SQL select value where value contains less than 3 of the declared characters

I'm trying to write a SELECT statement which returns a value only if it does not contain at least 3 of the declared characters, but I can't think of how to get it working. Can someone point me in the right direction?
One thing to consider: I am not allowed to create a temporary table for this exercise.
I haven't really got any SQL so far, as I can't think of a way to do it without a temp table.
The declared characters are the alphabetic characters a to z. So if the value in the db is '1873', it would be returned because it does not contain at least 3 of the declared characters, but if the value is 'abcdefg', it would not be returned because it does contain at least 3 of them.
Is anyone able to give me a starting point for this?

The query below finds all sys.objects whose name contains an x or a z.
Some explanations, as this is an exercise and you want to learn something:
You can split a delimited string by transforming it into XML. x,z comes out as <x>x</x><x>z</x>. You can use this to create a derived table.
I use a CTE to avoid a created or declared table.
You can use CROSS APPLY for row-wise actions. Here I use CHARINDEX to find the position(s) of the characters you are looking for.
If none of them is found, their SUM is zero. I use GROUP BY and HAVING to check this.
Hope this is clear :-)
DECLARE @chars VARCHAR(100)='x,z';
WITH Splitted AS
(
SELECT A.B.value('.','char') AS TheChar
FROM
(
SELECT CAST('<x>' + REPLACE(@chars,',','</x><x>')+ '</x>' AS XML) AS AsXml
) AS tbl
CROSS APPLY AsXml.nodes('/x') AS A(B)
)
SELECT name
FROM sys.objects
CROSS APPLY (SELECT CHARINDEX(TheChar,name) AS Found FROM Splitted) AS Found
GROUP BY name,Found
HAVING SUM(Found)>0

With
SrcTab As (
Select *
From (values ('Contains x y z')
, ('Contains x and y')
, ('Contains y only')) v (SrcField)),
CharList As ( --< CTE instead of temporary table
Select *
From (values ('x')
, ('y')
, ('z')) v (c))
Select SrcField
From SrcTab, CharList
Group By SrcField
Having SUM(SIGN(CharIndex(C, SrcField))) < 3 --< Count hits
;
If collapsing duplicate values is not desirable and we only need to check the count for each row:
With
SrcTab As ( --< Sample Data CTE
Select *
From (values ('Contains x y z')
, ('Contains x and y')
, ('Contains y only')
, ('Contains y only')) v (SrcField))
Select SrcField
From SrcTab
Where (
Select Count(*) --< Count hits
From (Values ('x'), ('y'), ('z')) v (c)
Where CharIndex(C, SrcField) > 0
) < 3
;

Using a numbers table and joins. I used only 3 declared characters for demo purposes.
Input:
12345
abcdef
ab
Declared characters table (only 3 characters for the demo):
a
b
c
Output:
12345
ab
Demo:
---Table population Scripts
Create table #t
(
val varchar(20)
)
insert into #t
select '12345'
union all
select 'abcdef'
union all
select 'ab'
create table #declarecharacters
(
dc char(1)
)
insert into #declarecharacters
select 'a'
union all
select 'b'
union all
select 'c'
Query used (it assumes an existing numbers table with an integer column n):
;with cte
as
(
select * from #t
cross apply
(
select substring(val,n,1) as strr from numbers where n<=len(val))b(outputt)
)
select val from
cte c
left join
#declarecharacters dc1
on
dc1.dc=c.outputt
group by val
having
sum(case when dc is null then 0 else 1 end ) <3
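
If the requirement is read as "fewer than three letter occurrences anywhere in the value" (an assumption; the answers above count distinct declared characters instead), a single LIKE pattern may be enough and needs no helper table at all. The sample values in this sketch are hypothetical apart from '1873' and 'abcdefg', which come from the question. The [a-z] range depends on the column's collation: under a case-insensitive collation it also matches uppercase letters.

```sql
-- A value has at least three letters exactly when it matches
-- '%[a-z]%[a-z]%[a-z]%', so NOT LIKE keeps values with fewer than three.
SELECT v.val
FROM (VALUES ('1873'), ('abcdefg'), ('ab12'), ('a1b2c3')) AS v (val)
WHERE v.val NOT LIKE '%[a-z]%[a-z]%[a-z]%';
```

With these rows, '1873' and 'ab12' should be returned, while 'abcdefg' and 'a1b2c3' should be filtered out.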

SQL Server Script Quick Replace all found strings with incrementing integer

I have a large INSERT SQL script that I want to modify with Quick Replace, replacing each found string with an integer, where every next integer is the previous integer + 1.
Before:
INSERT Compartment (CompartmentID) VALUES ('A')
INSERT Compartment (CompartmentID) VALUES ('B')
After:
INSERT Compartment (CompartmentID) VALUES (1)
INSERT Compartment (CompartmentID) VALUES (2)
I know how to find the specific strings, but I can't find any syntax or way to replace them with incrementing integers.

You can replace all your character CompartmentID values with ordered numbers like this:
declare @Compartment table(CompartmentID varchar(10), name varchar(10), intID int)
INSERT INTO @Compartment(CompartmentID, name) values
('a', 'a')
, ('b', 'b')
, ('c', 'c')
, ('d', 'd')
, ('e', 'e')
-- number the rows by CompartmentID and write the number back
UPDATE c SET CompartmentID = o.ID
FROM @Compartment c
INNER JOIN (
SELECT CompartmentID, ID = ROW_NUMBER() over(ORDER BY CompartmentID)
FROM @Compartment
) o ON c.CompartmentID = o.CompartmentID
SELECT * FROM @Compartment
Output:
CompartmentID name
1 a
2 b
3 c
4 d
5 e
It would be better to create a new column of type int, or to change the type of CompartmentID once the update is finished.
You should also use an identity column if you want the numbers to be incremented automatically.

I am not sure how you want to handle empty strings. You could select only the rows where CompartmentID contains a non-numeric character (see the commented-out WHERE clause) and update that result set like this:
DECLARE @Compartment table(CompartmentID varchar(20))
INSERT @Compartment(CompartmentID) VALUES ('A'),('A'),('B'),('1'),('A1')
-- EDIT: Changed answer
;WITH CTE as
(
SELECT CompartmentID, DENSE_RANK() over (ORDER BY CompartmentID) rn
FROM @Compartment
--WHERE CompartmentID LIKE '%[^0-9]%' OR CompartmentID = ''
)
UPDATE CTE
SET CompartmentID = rn
FROM CTE
Result:
CompartmentID
2
2
4
1
3
Note: with the WHERE clause commented out, every CompartmentID is changed (including the ones that were already numeric), and identical old CompartmentID values get identical numeric values.
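
If the goal is to regenerate the INSERT script itself rather than to update rows that are already in a table, one possible approach is to let a query emit the statements. This is only a sketch: the derived table v and its column OldID are hypothetical stand-ins for the list of old string values, and the statement text is copied from the question.

```sql
-- Emits one INSERT statement per old value, numbered in OldID order.
SELECT 'INSERT Compartment (CompartmentID) VALUES ('
       + CAST(ROW_NUMBER() OVER (ORDER BY v.OldID) AS VARCHAR(10))
       + ')' AS InsertStatement
FROM (VALUES ('A'), ('B')) AS v (OldID)
ORDER BY v.OldID;
```

Use DENSE_RANK instead of ROW_NUMBER if repeated old values should map to the same integer.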

TSQL matching the first instances of multiple values in a resultset

Say I have part of a large query, as below, that returns a result set with multiple rows of the same key information (PolNum) with different value information (PolPremium) in a random order.
Would it be possible to select the first matching row for each PolNum and sum up the PolPremium values? In this case I know that 2 PolNum values are used, so given the screenshot of the result set (yes, I know it starts at row 14; that is for illustration purposes), I want to return the first value for each and sum the result.
First match for PolNum 000035789547
(ROW 14) PolPremium - 32.00
First match for PolNum 000035789547
(ROW 16) PolPremium - 706043.00
Total summed should be 32.00 + 706043.00 = 706075.00
Query
OUTER APPLY
(
SELECT PolNum, PolPremium
FROM PN20
WHERE PolNum IN(SELECT PolNum FROM SvcPlanPolicyView
WHERE SvcPlanPolicyView.ControlNum IN (SELECT val AS ServedCoverages FROM ufn_SplitMax(
(SELECT TOP 1 ServicedCoverages FROM SV91 WHERE SV91.AccountKey = 3113413), ';')))
ORDER BY PN20.PolEffDate DESC
)
Resultset

Suppose that the picture shows the final result your query produces. Then you can do something like:
DECLARE @t TABLE
(
PolNum VARCHAR(20) ,
PolPremium MONEY
)
INSERT INTO @t
VALUES ( '000035789547', 32 ),
( '000035789547', 76 ),
( '000071709897', 706043.00 ),
( '000071709897', 1706043.00 )
-- keep the first row per PolNum, then add a grand-total row via GROUPING SETS
SELECT t.PolNum ,
SUM(PolPremium) AS PolPremium
FROM ( SELECT * ,
ROW_NUMBER() OVER ( PARTITION BY PolNum ORDER BY PolPremium ) AS rn
FROM @t
) t
WHERE rn = 1
GROUP BY GROUPING SETS(t.PolNum, ( ))
Output:
PolNum PolPremium
000035789547 32.00
000071709897 706043.00
NULL 706075.00
Just replace the table variable with your query. I also assume that the row with the minimum premium is the first one. You could probably filter for the top row inside the OUTER APPLY part, but it is really not clear to me what is going on there without some sample data.
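
The question's OUTER APPLY orders by PolEffDate DESC, so "first" may mean the most recent row per PolNum rather than the smallest premium. A minimal sketch under that assumption follows; the PolEffDate values are made up for illustration, and only the PolNum and PolPremium values come from the answer above.

```sql
DECLARE @t TABLE (PolNum VARCHAR(20), PolPremium MONEY, PolEffDate DATE);
INSERT INTO @t VALUES
 ('000035789547', 32,      '2015-06-01'),   -- dates are hypothetical
 ('000035789547', 76,      '2014-06-01'),
 ('000071709897', 706043,  '2015-03-01'),
 ('000071709897', 1706043, '2013-03-01');

-- Keep the most recent row per PolNum, then sum those premiums.
SELECT SUM(x.PolPremium) AS TotalPremium
FROM (
    SELECT PolPremium,
           ROW_NUMBER() OVER (PARTITION BY PolNum
                              ORDER BY PolEffDate DESC) AS rn
    FROM @t
) AS x
WHERE x.rn = 1;
```

If two rows of one PolNum share the same PolEffDate, add a tie-breaker column to the ORDER BY so the chosen row is deterministic.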
