SQL Server Full Text Searching - sql-server

I'm currently working on an application where we have a SQL-Server database and I need to get a full text search working that allows us to search people's names.
Currently the user can enter a name into a field that searches three varchar columns: first, last, and middle name.
So say I have 3 rows with the following info.
1 - Phillip - J - Fry
2 - Amy - NULL - Wong
3 - Leo - NULL - Wong
If the user enters a name such as 'Fry' it will return row 1. However, if they enter Phillip Fry, or Fr, or Phil, they get nothing, and I don't understand why it's doing this. If they search for Wong they get rows 2 and 3; if they search for Amy Wong they again get nothing.
Currently the query is using CONTAINSTABLE, but I have switched that with FREETEXTTABLE, CONTAINS, and FREETEXT without any noticeable differences in the results. The table methods are preferred because they return the same results but with ranking.
Here is the query.
....
@Name nvarchar(100),
....
-- Double quotes added to prevent a crash when searching on more than one word.
DECLARE @SearchString varchar(100)
SET @SearchString = '"'+@Name+'"'
SELECT Per.Lastname, Per.Firstname, Per.MiddleName
FROM Person as Per
INNER JOIN CONTAINSTABLE(Person, (LastName, Firstname, MiddleName), @SearchString)
AS KEYTBL
ON Per.Person_ID = KEYTBL.[KEY]
WHERE KEYTBL.RANK > 2
ORDER BY KEYTBL.RANK DESC;
....
Any ideas why this full text search is not working correctly?

If you're just searching people's names, it might be in your best interest to not even use the full text index. Full text index makes sense when you have large text fields, but if you're mostly dealing with one word per field, I'm not sure how much extra you would get out of full text indexes. Waiting for the full text index to reindex itself before you can search for new records can be one of the many problems.
You could just make a query such as the following. Split your searchstring on spaces, and create a list of the search terms.
Select FirstName,MiddleName,LastName
From person
WHERE
Firstname like @searchterm1 + '%'
or MiddleName like @searchterm1 + '%'
or LastName like @searchterm1 + '%'
or Firstname like @searchterm2 + '%'
etc....
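The split-and-LIKE idea above might be sketched in application code roughly as follows. This is only an illustration under stated assumptions: `build_name_query` is a hypothetical helper, the in-memory SQLite table stands in for the real Person table, and a SQL Server driver may use a different parameter placeholder style.

```python
import sqlite3

def build_name_query(search: str):
    """Return (sql, params) matching rows where any name column starts with any term."""
    columns = ("FirstName", "MiddleName", "LastName")
    clauses, params = [], []
    for term in search.split():          # split the search string on whitespace
        for col in columns:
            clauses.append(f"{col} LIKE ?")
            params.append(term + "%")    # prefix match; % or _ typed by the user is not escaped here
    where = " OR ".join(clauses) if clauses else "1 = 0"  # no terms: match nothing
    sql = f"SELECT FirstName, MiddleName, LastName FROM Person WHERE {where}"
    return sql, params

# Stand-in data taken from the question (SQLite instead of SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (FirstName TEXT, MiddleName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [("Phillip", "J", "Fry"), ("Amy", None, "Wong"), ("Leo", None, "Wong")],
)

sql, params = build_name_query("Phil Fr")
rows = conn.execute(sql, params).fetchall()
print(rows)  # [('Phillip', 'J', 'Fry')]
```

Only the number of LIKE clauses depends on the input; the terms themselves travel as parameters, so the query text never contains user-typed text.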

FreeTextTable should work.
INNER JOIN FREETEXTTABLE(Person, (LastName, Firstname, MiddleName), @SearchString)
@SearchString should contain a value like 'Phillip Fry' (one long string containing all of the lookup strings separated by spaces).
If you would like to search for Fr or Phil, use a prefix term with an asterisk: "Phil*" and "Fr*". Prefix terms are CONTAINS/CONTAINSTABLE syntax and need the double quotes; FREETEXTTABLE does not treat the asterisk as a wildcard.
'Phil' looks for exactly the word 'Phil'. '"Phil*"' looks for every word that starts with 'Phil'.

Thanks for the responses, guys. I finally got it to work using parts of both Biri's and Kibbee's answers. I needed to add * to the string and break it up on spaces for it to work. So in the end I got:
....
@Name nvarchar(100),
....
-- Double quotes added to prevent a crash when searching on more than one word.
DECLARE @SearchString varchar(100)
--Added this line
SET @SearchString = REPLACE(@Name, ' ', '*" OR "*')
SET @SearchString = '"*'+@SearchString+'*"'
SELECT Per.Lastname, Per.Firstname, Per.MiddleName
FROM Person as Per
INNER JOIN CONTAINSTABLE(Person, (LastName, Firstname, MiddleName), @SearchString)
AS KEYTBL
ON Per.Person_ID = KEYTBL.[KEY]
WHERE KEYTBL.RANK > 2
ORDER BY KEYTBL.RANK DESC;
....
There are more fields being searched; I just simplified it for the question. Sorry about that, I didn't think it would affect the answer. It actually also searches a column that holds a CSV of nicknames and a notes column.
Thanks for the help.
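If the search string is assembled in application code rather than in T-SQL, the same transformation could look something like the sketch below. `build_prefix_condition` is a hypothetical name, and the result is meant to be passed as the @SearchString parameter. Only a trailing asterisk is emitted here, on the understanding that a prefix term matches from the start of a word.

```python
def build_prefix_condition(name: str, operator: str = "OR") -> str:
    """Turn 'Phillip Fry' into '"Phillip*" OR "Fry*"' for CONTAINS/CONTAINSTABLE."""
    # Replace double quotes with spaces so user input cannot close a quoted term early.
    terms = name.replace('"', " ").split()
    # Each word becomes a quoted prefix term; the terms are joined by the operator.
    return f" {operator} ".join(f'"{term}*"' for term in terms)

print(build_prefix_condition("Phillip Fry"))  # "Phillip*" OR "Fry*"
print(build_prefix_condition("Fr"))           # "Fr*"
```

An empty or whitespace-only input yields an empty string, so the caller would need to skip the full-text query in that case.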

Another approach could be to abstract the searching away from the individual fields.
In other words, create a view on your data that turns the split fields, such as firstname and lastname, into a concatenated field such as full_name.
Then search on the view. This would likely make the search query simpler.

You might want to check out Lucene.net as an alternative to Full Text.

Related

SQL How to compare if a variable is the same as a value in a database column?

The problem I have is the following:
I work with a ticket system that uses plugins to realize workflows.
In this case I use SQL to presort incoming emails.
The SQL query looks like this:
SELECT Count(Case when {MSG_CC_002_DeviceReg_MailBody} LIKE '%You have received an invoice from%' Then 1 END);
What I want to do now is instead of using LIKE and then a certain phrase like above, I want to compare this to a column in a database table that contains all necessary phrases.
The table has only two columns, phraseID and phrase.
{MSG_CC_002_DeviceReg_MailBody} is the variable that needs to be compared against the values of the column.
So if the variable matches with an entry in the column it should just return 1.
[Edit:]
This is just one of the things I want to use this for, I also have a variable {MSG_CC_002_DeviceReg_MailSender} that will provide the email address that I want to compare to a similar table that contains email addresses.
Is this possible?
If so - how?
You can use a join or a subquery:
select count(*)
from t
where exists (select 1
from othertable ot
where {MSG_CC_002_DeviceReg_MailBody} LIKE '%' + ot.phrase + '%'
) ;
This will be dog-slow if you have a lot of phrases or email addresses, but it'll give you what you want.
SELECT COUNT(*) AS RetValue
FROM PhraseTable
WHERE {MSG_CC_002_DeviceReg_MailBody} LIKE '%' + phrase + '%';
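To see the reversed LIKE in action outside the ticket system, here is a small self-contained sketch. It makes several assumptions: SQLite stands in for the real database (so strings are concatenated with || where T-SQL uses +), the second phrase is made-up sample data, and `body_matches_any_phrase` is a hypothetical helper. It wraps the lookup in EXISTS so the result is 1 or 0 rather than a count of matching phrases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PhraseTable (phraseID INTEGER PRIMARY KEY, phrase TEXT)")
conn.executemany(
    "INSERT INTO PhraseTable (phrase) VALUES (?)",
    [
        ("You have received an invoice from",),
        ("Your device has been registered",),  # invented example phrase
    ],
)

def body_matches_any_phrase(mail_body: str) -> int:
    """Return 1 if any stored phrase occurs inside mail_body, otherwise 0."""
    row = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM PhraseTable "
        "WHERE ? LIKE '%' || phrase || '%')",
        (mail_body,),  # the mail body is bound as a parameter, not pasted into the SQL
    ).fetchone()
    return row[0]

print(body_matches_any_phrase("Hello. You have received an invoice from ACME."))  # 1
print(body_matches_any_phrase("Meeting moved to noon"))                           # 0
```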
Yes, it is possible; you can achieve this using a dynamic query. Basically, you construct your query as a string and then execute it.
You can find examples and more information about dynamic queries at the following link:
https://www.mssqltips.com/sqlservertip/1160/execute-dynamic-sql-commands-in-sql-server/

T-SQL Search on 'combined' column name

I would like to know if it's possible to search on two combined columns. For instance, in my application I have an input field for 'Full Name', but in the SQL database I have the columns Name and Surname.
Is it possible to construct a query to do a search like Name + Surname = %Full Name% ?
You can do this:
select *
from Production.Product
where (Name + ' ' + ProductNumber) like 'Bearing Ball BA-8327'
However, if you want to take advantage of indexing, you had better split your input parameter first and then compare each field directly.
Yes, you can do this, but it will be quite slow, because the concatenated expression cannot use the indexes that Name and Surname may have.
Expanding on the suggestions from previous answers, try this.
SELECT * FROM People
WHERE firstname = substring(@fullname, 1, charindex(' ', @fullname) - 1)
AND surname = substring(@fullname, charindex(' ', @fullname) + 1, len(@fullname))
Hope this helps.
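One thing to watch with the query above: when the input contains no space, charindex returns 0 and the first substring call is given a length of -1, which SQL Server rejects. If the split is done in application code before the query runs, that case can be handled explicitly. A minimal sketch, with `split_full_name` as a hypothetical helper that splits at the first space just like the T-SQL does:

```python
def split_full_name(full_name: str):
    """Split 'First Last' at the first space; the surname is '' when there is no space."""
    first, _, surname = full_name.strip().partition(" ")
    return first, surname.strip()

print(split_full_name("Jane Doe"))        # ('Jane', 'Doe')
print(split_full_name("Cher"))            # ('Cher', '')
print(split_full_name("Mary Ann Smith"))  # ('Mary', 'Ann Smith')
```

The two parts can then be passed as separate parameters and compared directly against firstname and surname, which keeps any indexes on those columns usable.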
If disk space is not a concern, you can add a persisted computed column holding the full name. That way you keep the two separate columns while being able to index a full-name column that is updated automatically whenever the first or last name changes.
The caveats are the extra space needed and somewhat slower inserts. If either is a concern, you can use a computed column without persistence to get a transparent full-name column and cleaner queries.
http://msdn.microsoft.com/en-us/library/ms189292.aspx

How would I determine if a varchar field in SQL contains any numeric characters?

I'm working on a project where we have to figure out if a given field is potentially a company name versus an address.
In taking a very broad swipe at it, we are going under the assumption that if this field contains no numbers, odds are it is a name vs. a street address (we're aiming for the 80% case, knowing some will have to be done manually).
So now to the question at hand. Given a table with, for the sake of simplicity, a single varchar(100) column, how could I find the records that have no numeric characters at any position within the field?
For example:
"Main Street, Suite 10A" --Do not return this.
"A++ Billing" --Should be returned
"XYZ Corporation" --Should be returned
"100 First Ave, Apt 20" --Should not be returned
Thanks in advance!
SQL Server allows a regex-like syntax in the LIKE operator: a range such as [0-9] or a set such as [0123456789], which can be combined with the any-string wildcard (%). For example:
select * from Address where StreetAddress not like '%[0-9]%';
The wildcard % at the start of the like will obviously hurt performance (Scans are likely), but in your case this seems inevitable.
Another MSDN Reference.
select * from table where column not like '%[0-9]%'
This query returns you all rows from table where column does not contain any of the digits from 0 to 9.
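For checking the rule against sample values before running it on the table, the same test can be mirrored in application code. This is a sketch only: `has_no_digits` is a hypothetical helper, and the regular expression stands in for the NOT LIKE '%[0-9]%' pattern. The sample values are the ones from the question.

```python
import re

def has_no_digits(value: str) -> bool:
    """True when the value contains none of the characters 0-9."""
    return re.search(r"[0-9]", value) is None

samples = [
    "Main Street, Suite 10A",
    "A++ Billing",
    "XYZ Corporation",
    "100 First Ave, Apt 20",
]

likely_names = [s for s in samples if has_no_digits(s)]
print(likely_names)  # ['A++ Billing', 'XYZ Corporation']
```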
I like the simple regex approach, but for the sake of discussion will mention this alternative which uses PATINDEX.
SELECT InvoiceNumber from Invoices WHERE PATINDEX('%[0-9]%', InvoiceNumber) = 0
This worked for me.
select total_employee_count from company_table where total_employee_count like '%[^0-9]%'
This returned all rows that contain at least one non-numeric character, including values such as 2-3.
This query lists the tables whose names contain numeric characters:
select * from SYSOBJECTS where xtype='u' and name like '%[0-9]%'

Full-text search across concatenated columns?

I'm new to free-text search, so pardon the newbie question. Suppose I have the following full-text index:
Create FullText Index on Contacts(
FirstName,
LastName,
Organization
)
Key Index PK_Contacts_ContactID
Go
I want to do a freetext search against all three columns concatenated
FirstName + ' ' + LastName + ' ' + Organization
So that for example
Searching for jim smith returns all contacts named Jim Smith
Searching for smith ibm returns all contacts named Smith who work at IBM
This seems like it would be a fairly common scenario. I would have expected this to work:
Select c.FirstName, c.LastName, c.Organization, ft.Rank
from FreeTextTable(Contacts, *, 'smith ibm') ft
Left Join Contacts c on ft.[Key]=c.ContactID
Order by ft.Rank Desc
but this is apparently doing smith OR ibm; it returns a lot of Smiths who don't work at IBM and vice versa. Surprisingly, searching for smith AND ibm yields identical results.
This does what I want...
Select c.FirstName, c.LastName, c.Organization
from Contacts c
where Contains(*, 'smith') and Contains(*, 'ibm')
...but then I can't parameterize queries coming from the user -- I would have to break up the search string into words myself and assemble the SQL on the fly, which is ugly and unsafe.
The usual approach I take is to create a search view or calculated column (using a trigger) that puts all of those values into a single field.
The other thing I do is use a dedicated full-text search engine, such as Lucene/Solr.
Boolean operators are only supported for CONTAINS, not FREETEXT.
Try your AND query with CONTAINSTABLE.
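On the worry that assembling the per-word CONTAINS query is unsafe: if only the number of predicates depends on the input, and each word is sent as a parameter, no user text ends up in the SQL itself. A rough sketch of that idea follows, modelled on the question's working query. `build_contains_query` is a hypothetical helper and the ? placeholder style is an assumption about the database driver in use.

```python
def build_contains_query(user_input: str):
    """Return (sql, params) with one CONTAINS predicate per word in user_input."""
    # Replace double quotes with spaces so each word can be wrapped as a quoted term.
    words = user_input.replace('"', " ").split()
    if not words:
        raise ValueError("search text must contain at least one word")
    # The SQL text varies only in how many placeholders it has.
    predicates = " AND ".join("CONTAINS(*, ?)" for _ in words)
    sql = (
        "SELECT c.FirstName, c.LastName, c.Organization "
        "FROM Contacts c WHERE " + predicates
    )
    params = [f'"{word}"' for word in words]
    return sql, params

sql, params = build_contains_query("smith ibm")
print(sql)
print(params)  # ['"smith"', '"ibm"']
```

The pair can then be handed to the driver's execute call, for example `cursor.execute(sql, params)`, assuming the driver accepts positional parameters in that form.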

MS SQL FTI - searching on "n*" returns numbers

This seems like odd behaviour from SQL's full-text-index.
FTI stores numbers in its index with an "NN" prefix, so "123" is saved as "NN123".
Now when a user searches for words beginning with N (i.e. contains "n*" ) they also get all numbers.
So:
select [TextField]
from [MyTable]
where contains([TextField], '"n*"')
Returns:
MyTable.TextField
--------------------------------------------------
This text contains the word navigator
This text is nice
This text only has 123, and shouldn't be returned
Is there a good way to exclude that last row? Is there a consistent workaround for this?
Those extra "" are needed to make the wildcard token work:
select [TextField] from [MyTable] where contains([TextField], 'n*')
Would search for literal n* - and there aren't any.
--return rows with the word text
select [TextField] from [MyTable] where contains([TextField], 'text')
--return rows with the word tex*
select [TextField] from [MyTable] where contains([TextField], 'tex*')
--return rows with words that begin tex...
select [TextField] from [MyTable] where contains([TextField], '"tex*"')
There are a couple of ways to handle this, though neither is really all that great.
First, add a column to your table that says that TextField is really a number. If you could do that and filter, you would have the most performant version.
If that's not an option, then you will need to add a further filter. While I haven't extensively tested it, you could add the filter AND TextField NOT LIKE 'NN%[0-9]%'
The downside is that this would filter out 'NN12NOO' but that may be an edge case not represented by your data.

Resources