SQL Server Full Text Search to match contact name to prevent duplicates - sql-server

Using SQL Server Azure or 2017 with Full Text Search, I need to return possible matches on names.
Here's the simple scenario: an administrator is entering contact information for a new employee, first name, last name, address, etc. I want to be able to search the Employee table for a possible match on the name(s) to see if this employee has already been entered in the database.
This might happen as an autosuggest type of feature, or simply display some similar results, like here in Stackoverflow, while the admin is entering the data.
I need to prevent duplicates!
If the admin enters "Bob", "Johnson", I want to be able to match on:
Bob Johnson
Rob Johnson
Robert Johnson
This will give the administrator the option of seeing if this person has already been entered into the database and choose one of those choices.
Is it possible to do this type of match on words like "Bob" and include "Robert" in the results? If so, what is necessary to accomplish this?

Try this.
You need to change the #per parameter value to your requirement. It indicates how many letters out of the length of the first name should match for the result to return. I just set it to 50% for testing purposes.
The dynamic SQL piece inside the loop adds all the CHARINDEX result per letter of the first name in question, to all existing first names.
Caveats:
Repeating letters will of course be misleading, like Bob will count 3 matches in Rob because there's 2 Bs in Bob.
I didn't consider 2 first names, like Bob Robert Johnson, etc so this will fail. You can improve on that however, but you get the idea.
The final SQL query gets the LetterMatch that is greater than or equal to the set value in #per.
DECLARE #name varchar(MAX) = 'Bobby Johnson' --sample name
DECLARE #first varchar(50) = SUBSTRING(#name, 0, CHARINDEX(' ', #name)) --get the first part of the name before space
DECLARE #last varchar(50) = SUBSTRING(#name, CHARINDEX(' ', #name) + 1, LEN(#name) - LEN(#first) - 1) --get the last part of the name after space
DECLARE #walker int = 1 --for looping
DECLARE #per float = LEN(#first) * 0.50 --declare percentage of how many letters out of the length of the first name should match. I just used 50% for testing
DECLARE #char char --for looping
DECLARE #sql varchar(MAX) --for dynamic SQL use
DECLARE #matcher varchar(MAX) = '' --for dynamic SQL use
WHILE #walker <> LEN(#first) + 1 BEGIN --loop through all the letters of the first name saved in #first variable
SET #char = SUBSTRING(#first, #walker, 1) --save the current letter in the iteration
SET #matcher = #matcher + IIF(#matcher = '', '', ' + ') + 'IIF(CHARINDEX(''' + #char + ''', FirstName) > 0, 1, 0)' --build the additional column to be added to the dynamic SQL
SET #walker = #walker + 1 --move the loop
END
SET #sql = 'SELECT * FROM (SELECT FirstName, LastName, ' + #matcher + ' AS LetterMatch
FROM TestName
WHERE LastName LIKE ' + '''%' + #last + '%''' + ') AS src
WHERE CAST(LetterMatch AS int) >= ROUND(' + CAST(#per AS varchar(50)) + ', 0)'
SELECT #sql
EXEC(#sql)

SELECT * FROM tbl_Names
WHERE Name LIKE '% user defined text %';
using a text in between % % will search those text on any position in the data.

Related

Execution time for a SQL Query seems excessive

I have a data set of about 33 million rows and 20 columns. One of the columns is a raw data tab I'm using to extract relevant data from, inlcuding ID's and account numbers.
I extracted a column for User ID's into a temporary table to trim the User ID's of spaces. I'm now trying to add the trimmed User ID column back into the original data set using this code:
SELECT *
FROM [dbo].[DATA] AS A
INNER JOIN #TempTable AS B ON A. [RawColumn] = B. [RawColumn]
Extracting the User ID's and trimming the spaces took about a minute for each query. However, running this last query I'm at the 2 hour mark and I'm only 2% of the way through the dataset.
Is there a better way to run the query?
I'm running the query in SQL Server 2014 Management Studio
Thanks
Update:
I continued to let it run through the night. When I got back into work, only 6 million rows had been completed of the 33 million rows. I cancelled the execution and I'm trying to add a smaller primary key (The only other key I could see on the table was the [RawColumn], which was a very long string of text) using:
ALTER TABLE [dbo].[DATA]
ADD ID INT IDENTITY(1,1)
Right now I'm an hour into the execution.
Next, I'm planning to make it the primary key using
ALTER TABLE dbo.[DATA]
ADD CONSTRAINT PK_[DATA] PRIMARY KEY(ID)
I'm not familiar with using Indexes.. I've tried looking up on Stack Overflow how to create one, but from what I'm reading it sounds like it would take just as long to create an index as it would to run this query. Am I wrong about that?
For context on the RawColumn data, it looks something like this:
FirstName: John LastName: Smith UserID: JohnS Account#: 000-000-0000
Update #2:
I'm now learning that using "ALTER TABLE" is a bad idea. I should have done a little bit more research into how to add a primary key to a table.
Update #3
Here's the code I used to extract the "UserID" code out of the "RawColumn" data.
DROP #TEMPTABLE1
GO
SELECT [RAWColumn],
SUBSTRING([RAWColumn], CHARINDEX('USERID:', [RAWColumn])+LEN('USERID:'), CHARINDEX('Account#:', [RAWColumn])-Charindex('Username:', [RAWColumn]) - LEN('Account#:') - LEN('USERID:')) AS 'USERID_NEW'
INTO #TempTable1
FROM [dbo].[DATA]
Next I trimmed the data from the temporary tables
DROP #TEMPTABLE2
GO
SELECT [RawColumn],
LTRIM([USERID_NEW]) AS 'USERID_NEW'
INTO #TempTable2
FROM #TempTable1
So now I'm trying to get the data from #TEMPTABLE2 back into my original [DATA] table. Hopefully this is more clear now.
So I think your parsing code is a little bit wrong. Here's an approach that doesn't assume that the values appear in any particular order. It does assume that the header/tag name has a space after the colon character and it assumes that the value end at the subsequent space character. Here's a snippet that manipulates a single value.
declare #dat varchar(128) = 'FirstName: John LastName: Smith UserID: JohnS Account#: 000-000-0000';
declare #tag varchar(16) = 'UserID: ';
/* datalength() counts the trailing space character unlike len() */
declare #idx int = charindex(#tag, #dat) + datalength(#tag);
select substring(#dat, #idx, charindex(' ', #dat + ' ', #idx + 1) - #idx) as UserID
To use it in a single query without the temporary variable, the most straightforward approach is to just replace each instance of "#idx" with the original expression:
declare #tag varchar(16) = 'UserID: ';
select RawColumn,
substring(
RawColumn,
charindex(#tag, RawColumn) + datalength(#tag),
charindex(
' ', RawColumn + ' ',
charindex(#tag, RawColumn) + datalength(#tag) + 1
) - charindex(#tag, RawColumn) + datalength(#tag)
) as UserID
from dbo.DATA;
As an update it looks something like this:
declare #tag varchar(16) = 'UserID: ';
update dbo.DATA
set UserID =
substring(
RawColumn,
charindex(#tag, RawColumn) + datalength(#tag),
charindex(
' ', RawColumn + ' ',
charindex(#tag, RawColumn) + datalength(#tag) + 1
) - charindex(#tag, RawColumn) + datalength(#tag)
) as UserID;
You also appear to be ignoring upper/lower case in your string matches. It's not clear to me whether you need to consider that more carefully.

MS SQL - CONTAINS full-text search w/ variable number of values, not using dynamic sql

I've created a full-text indexed column on a table.
I have a stored procedure to which I may pass the value of a variable "search this text". I want to search for "search", "this" and "text" within the full-text column. The number of words to search would be variable.
I could use something like
WHERE column LIKE '%search%' OR column LIST '%this%' OR column LIKE '%text%'
But that would require me to use dynamic SQL, which I'm trying to avoid.
How can I use my full-text search to find each of the words, presumably using CONTAINS, and without converting the whole stored procedure to dynamic SQL?
If you say you definitely have SQL Table Full Text Search Enabled, Then you can use query like below.
select * from table where contains(columnname,'"text1" or "text2" or "text3"' )
See link below for details
Full-Text Indexing Workbench
So I think I came up with a solution. I created the following scalar function:
CREATE FUNCTION [dbo].[fn_Util_CONTAINS_SearchString]
(
#searchString NVARCHAR(MAX),
#delimiter NVARCHAR(1) = ' ',
#ANDOR NVARCHAR(3) = 'AND'
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
IF #searchString IS NULL OR LTRIM(RTRIM(#searchString)) = '' RETURN NULL
-- trim leading/trailing spaces
SET #searchString = LTRIM(RTRIM(#searchString))
-- remove double spaces (prevents empty search terms)
WHILE CHARINDEX(' ', #searchString) > 0
BEGIN
SET #searchString = REPLACE(#searchString,' ',' ')
END
-- reformat
SET #searchString = REPLACE(#searchString,' ','" ' + #ANDOR + ' "') -- replace spaces with " AND " (quote) AND (quote)
SET #searchString = ' "' + #searchString + '" ' -- surround string with quotes
RETURN #searchString
END
I can get my results:
DECLARE #ftName NVARCHAR (1024) = dbo.fn_Util_CONTAINS_SearchString('value1 value2',default,default)
SELECT * FROM Table WHERE CONTAINS(name,#ftName)
I would appreciate any comments/suggestions.
For your consideration.
I understand your Senior wants to avoid dynamic SQL, but it is my firm belief that Dynamic SQL is NOT evil.
In the example below, you can see that with a few parameters (or even defaults), and a 3 lines of code, you can:
1) Dynamically search any source
2) Return desired or all elements
3) Rank the Hit rate
The SQL
Declare #SearchFor varchar(max) ='Daily,Production,default' -- any comma delim string
Declare #SearchFrom varchar(150) ='OD' -- table or even a join statment
Declare #SearchExpr varchar(150) ='[OD-Title]+[OD-Class]' -- Any field or even expression
Declare #ReturnCols varchar(150) ='[OD-Nr],[OD-Title]' -- Any field(s) even with alias
Set #SearchFor = 'Sign(CharIndex('''+Replace(Replace(Replace(#SearchFor,' , ',','),', ',''),',',''','+#SearchExpr+'))+Sign(CharIndex(''')+''','+#SearchExpr+'))'
Declare #SQL varchar(Max) = 'Select * from (Select Distinct'+#ReturnCols+',Hits='+#SearchFor+' From '+#SearchFrom + ') A Where Hits>0 Order by Hits Desc'
Exec(#SQL)
Returns
OD-Nr OD-Title Hits
3 Daily Production Summary 2
6 Default Settings 1
I should add that my search string is comma delimited, but you can change to space.
Another note CharIndex can be substanitally faster that LIKE. Take a peek at
http://cc.davelozinski.com/sql/like-vs-substring-vs-leftright-vs-charindex

Generate column name dynamically in sql server

Please look at the below query..
select name as [Employee Name] from table name.
I want to generate [Employee Name] dynamically based on other column value.
Here is the sample table
s_dt dt01 dt02 dt03
2015-10-26
I want dt01 value to display as column name 26 and dt02 column value will be 26+1=27
I'm not sure if I understood you correctly. If I'am going into the wrong direction, please add comments to your question to make it more precise.
If you really want to create columns per sql you could try a variation of this script:
DECLARE #name NVARCHAR(MAX) = 'somename'
DECLARE #sql NVARCHAR(MAX) = 'ALTER TABLE aps.tbl_Fabrikkalender ADD '+#name+' nvarchar(10) NULL'
EXEC sys.sp_executesql #sql;
To retrieve the column name from another query insert the following between the above declares and fill the placeholders as needed:
SELECT #name = <some colum> FROM <some table> WHERE <some condition>
You would need to dynamically build the SQL as a string then execute it. Something like this...
DECLARE #s_dt INT
DECLARE #query NVARCHAR(MAX)
SET #s_dt = (SELECT DATEPART(dd, s_dt) FROM TableName WHERE 1 = 1)
SET #query = 'SELECT s_dt'
+ ', NULL as dt' + RIGHT('0' + CAST(#s_dt as VARCHAR), 2)
+ ', NULL as dt' + RIGHT('0' + CAST((#s_dt + 1) as VARCHAR), 2)
+ ', NULL as dt' + RIGHT('0' + CAST((#s_dt + 2) as VARCHAR), 2)
+ ', NULL as dt' + RIGHT('0' + CAST((#s_dt + 3) as VARCHAR), 2)
+ ' FROM TableName WHERE 1 = 1)
EXECUTE(#query)
You will need to replace WHERE 1 = 1 in two places above to select your data, also change TableName to the name of your table and it currently puts NULL as the dynamic column data, you probably want something else there.
To explain what it is doing:
SET #s_dt is selecting the date value from your table and returning only the day part as an INT.
SET #query is dynamically building your SELECT statement based on the day part (#s_dt).
Each line is taking #s_dt, adding 0, 1, 2, 3 etc, casting as VARCHAR, adding '0' to the left (so that it is at least 2 chars in length) then taking the right two chars (the '0' and RIGHT operation just ensure anything under 10 have a leading '0').
It is possible to do this using dynamic SQL, however I would also consider looking at the pivot operators to see if they can achieve what you are after a lot more efficiently.
https://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx

SQL select to get a string between two spaces

I have a field with names in the format DOE JOHN HOWARD or DOE JOHN H.
I need a query to get the string between the two spaces (JOHN in this case).
The SO answer here shows how to do that when the desired substring is between two different strings, but I don't see how to apply something similar when the desired substring is between two instances of the same string (a space in this case).
How can I do that?
There is a somewhat sneaky way you could do this using PARSENAME.
It's intended purpose is to get particular parts of an object/namespace, however, in this case you could use it by replacing the strings with periods first.
E.g.,
SELECT PARSENAME(REPLACE('DOE JOHN HOWARD',' ','.'),2)
One way:
select
left(substring(fld,
charindex(' ', fld) + 1, len(fld)),
charindex(' ', substring(fld, charindex(' ', fld) + 2, len(fld))))
After doing some searches to resolve my problem, realised there is no generic answer, so below is piece of code to find string between some other strings. I think it may be useful for somebody in future.
DECLARE #STRING NVARCHAR(MAX) = 'Something here then stringBefore Searching stringAfter something there.'
DECLARE #FIRST NVARCHAR(20) = 'stringBefore'
DECLARE #SECOND NVARCHAR(20) = 'stringAfter'
DECLARE #SEARCHING NVARCHAR (20)
SET #SEARCHING = (SELECT SUBSTRING(#STRING, CHARINDEX(#FIRST, #STRING) + LEN(#FIRST), CHARINDEX(#SECOND, #STRING) - CHARINDEX(#FIRST, #STRING) - LEN(#FIRST)))
-- if you want to remove empty spaces
SET #SEARCHING = REPLACE(#SEARCHING, ' ', '')
SELECT #SEARCHING
Then output as below:
(No column name)
Searching

Search Entire Database To find extended ascii codes in sql

We have issues with extended ascii codes getting in our database (128-155)
Is there anyway to search the entire database and display the results of any of these characters that may be in there and where they are located within the tables and columns.
Hope that makes sense.
I have the script to search entire DB, but having trouble with opening line.
DECLARE #SearchStr nvarchar(100)
SET #SearchStr != between char(32) and char(127)
I have this originally that works, but I need to extend the range I'm looking for.
SET #SearchStr = '|' + char(9) + '|' + char(10) + '|' + char(13)
Thanks
It's very unclear what your data looks like, but this might help you to get started:
declare #TestData table (String nvarchar(100))
insert into #TestData select N'abc'
insert into #TestData select N'def'
insert into #TestData select char(128)
insert into #TestData select char(155)
declare #SearchPattern nvarchar(max) = N'%['
declare #i int = 128
while #i <= 155
begin
set #SearchPattern += char(#i)
set #i += 1
end
set #SearchPattern += N']%'
select #SearchPattern
select String
from #TestData
where String like #SearchPattern
Of course you'll need to add some code to loop over every table and column that you want to query (see this question), and it's possible that this code will behave differently on different collations.
... where dodgyColumn is your column with questionable data ....
WHERE(patindex('%[' + char(127) + '-' + char(255) + ']%', dodgyColumn COLLATE Latin1_General_BIN2) > 0)
This works for us, to identify extended ASCII characters in our otherwise normal ASCII data (characters, numbers, punctuation, dollar and percent signs, etc.)

Resources