How to handle single character search terms in MS-SQL FreeText searching? - sql-server

I am having a problem with a FreeText search, currently running on a SQL 2000 server.
In a table of approximately 1.3 million rows containing company names, I am attempting to use a FreeText query. However, since SQL Server strips out special characters and single characters when building its index, our code does the same when submitting the query.
For example, a search like 'Texas A & M' ends up querying only for 'Texas', which returns a large number of irrelevant records.
What is the best practice for handling these sorts of search queries? Would upgrading to a newer version of SQL Server fix this problem?
At this point a third-party indexing engine like Lucene is not an option, even if it would fix the problem, which I am not sure of.

You can try using a single character wildcard '_' similar to:
WHERE myColumn like 'Texas_A_&_M'
or
WHERE myColumn like 'Texas%A_&_M'
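To see how the two patterns above differ, here is a small sketch in Python. It uses an in-memory SQLite database as a stand-in for SQL Server (an assumption made only so the example runs anywhere; `_` and `%` have the same meaning in both). The table name, the sample rows and the helper `names_like` are made up for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server here; both treat _ as "any one character"
# and % as "any run of characters" in LIKE patterns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT)")
conn.executemany(
    "INSERT INTO companies (name) VALUES (?)",
    [
        ("Texas A & M",),
        ("Texas A & M University",),
        ("Texas State A & M",),
        ("Texas Instruments",),
    ],
)

def names_like(pattern):
    """Return the company names matching a LIKE pattern, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM companies WHERE name LIKE ? ORDER BY name", (pattern,)
    )
    return [name for (name,) in rows]

# No % at either end: the whole value must be exactly 11 characters long.
exact = names_like("Texas_A_&_M")
# % in the middle allows extra text between 'Texas' and 'A & M'.
loose = names_like("Texas%A_&_M")
# A trailing % also allows text after the 'M'.
with_suffix = names_like("Texas%A_&_M%")

print(exact)
print(loose)
print(with_suffix)
```

Note that neither pattern from the answer matches 'Texas A & M University', because both require the value to end at the 'M'. A trailing % would be needed for that.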

You may want to check whether the improvements in SQL Server 2005 solve your problem:
SQL Server 2005 Full-Text Search: Internals and Enhancements, in particular the discussion of noise words under New Features for the Developer.

If you are searching company names and not long passages of text, why not just use LIKE?
...
WHERE
CompanyName LIKE '%Texas A%&%M%'

Related

Is there any way to search for numbers with leading zeros in SQL Server Full-text indexes

We have a table with a Body (NVARCHAR(MAX)) column that contains text from emails and files. The column is full-text indexed.
Some of the documents contain reference numbers such as 00123. However, the full-text engine seems to strip leading zeros, so when we search using CONTAINS(Body, '00123') it also returns false positives containing just 123.
Is there any way to fix this? Ideally there would be a way to address this in the query, but we would also consider other options such as alternative word breakers.
We are using SQL Server 2008 R2 and later.
According to SS 2012's Behavior Changes to Full-Text Search page, the previous version of the word breakers, when given the term 022, produced 022 and nn022, but the new version produces 022 and nn22. So SQL Server 2008 R2 will produce the desired result when searching for numbers with leading zeros but SQL Server 2012 will not. (This assumes the columns to be full-text indexed are using English as their language for word breaking).
There are a couple of ways to achieve the desired outcome on SQL Server 2012. You can either revert to the previous word breakers or, if you have a limited number of terms that you are looking for, consider using a custom dictionary.
Custom dictionaries are described in Creating Custom Dictionaries for special terms to be indexed 'as-is' in SQL Server 2008 Full-Text Indexes and Customize the Behavior of Word Breakers with a Custom Dictionary.
Note: The first article says that the language hex code for English is 1033, but 1033 is the LCID for English. The language hex code for English is 0009. So for an English dictionary the filename should be Custom0009.lex.

SQL server search

I'm going to perform a search in my SQL Server DB (ASP.NET, VS2010, C#). The user types a phrase and I should search for this phrase in several fields. How is this possible? Do we have functions such as CONTAINS() in SQL Server? Can I perform my search using normal queries, or should I build my queries using C# functions?
For instance, I have 3 fields in my table which can contain the user's search phrase. Is it OK to write the following SQL command? (Say the user's search phrase is GAME.)
select * from myTable where columnA='GAME' or columnB='GAME' or columnC='GAME'
I have used AND between different conditions, but can I use OR? How can I search inside my table fields? If one of my fields contains the phrase GAME, how can I find it? columnA='GAME' finds only those fields that are exactly 'GAME', is that right?
I'm a bit confused about my search approach. Please help me, thanks guys.
OR works fine if you want at least one of the conditions to be true.
If you want to search inside your text strings you can use LIKE
select * from myTable where columnA like '%GAME%' or columnB like '%GAME%' or columnC like '%GAME%'
Note that % is the wildcard.
If you want to find everything that begins with 'GAME' you type LIKE 'GAME%', if you allow 'GAME' to be in the middle you need % in both ends.
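As a rough illustration of the difference between =, LIKE 'GAME%' and LIKE '%GAME%', here is a sketch in Python. It runs against an in-memory SQLite database rather than SQL Server (an assumption made so the example is self-contained), and the sample rows and the helper `count_rows` are invented for the example. The search value is passed as a bound parameter, which is also how user input should reach the query from application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (columnA TEXT, columnB TEXT, columnC TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        ("GAME", "x", "y"),             # exact match in columnA
        ("x", "VIDEO GAME SHOP", "y"),  # phrase in the middle of columnB
        ("x", "y", "GAMES"),            # phrase at the start of columnC
        ("x", "y", "z"),                # no match anywhere
    ],
)

def count_rows(operator, value):
    """Count rows where any of the three columns satisfies `operator value`."""
    # The operator is fixed by our own code; only the value is user input,
    # and it is passed as a bound parameter rather than concatenated in.
    sql = (
        "SELECT COUNT(*) FROM myTable "
        f"WHERE columnA {operator} ? OR columnB {operator} ? OR columnC {operator} ?"
    )
    return conn.execute(sql, (value, value, value)).fetchone()[0]

equals_count = count_rows("=", "GAME")         # exact equality only
prefix_count = count_rows("LIKE", "GAME%")     # values that begin with GAME
anywhere_count = count_rows("LIKE", "%GAME%")  # GAME anywhere in the value

print(equals_count, prefix_count, anywhere_count)
```

One caveat: whether LIKE is case-sensitive depends on the column collation in SQL Server, so results for mixed-case data may differ from what SQLite does.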
You can use LIKE instead of equals and then it can contain wildcard characters, so your example could be:
select * from myTable where columnA LIKE '%GAME%' or columnB LIKE '%GAME%' or columnC LIKE '%GAME%'
Further information may be found in MSDN
This is going to do some pretty heavy lifting in terms of what the database has to do though - I would suggest you consider something like full text search as I think it would more likely be suited to your scenario and provide faster results (of course, if you never have many records to search LIKE would probably do fine). Information on this is also in MSDN
Don't use LIKE with a leading wildcard, as suggested by other answers. A pattern such as '%GAME%' cannot use an index seek, so every row has to be scanned, which is slow to return and expensive to run. Instead, you have two options:
Option 1: Full-Text Indexes
do we have functions such as CONTAINS() in SQL server?
Yes! You can use the CONTAINS() function in SQL Server. You just have to set up a full-text index on the table that includes each of the columns you need to search.
Option 2: Lucene.Net
Lucene.Net is a popular .NET library for indexing and searching text data. It maintains its own index outside SQL Server, so you would need to keep that index in sync with your tables, but it can make implementing your search a little easier.

SQL Server fulltext search does not return all the results

I tried to use full-text search for a table called "Business" in SQL Server 2008. Here is the statement (the search term is in Chinese).
select * from Business biz where CONTAINS(biz.*,'家具')
Then I used a LIKE statement to do the same:
select * from Business where Name like '%家具%'
The full-text search returns 8 results and the like search returns 9 results which is what I expected. Does anyone know what might cause this?
I don't know the Chinese language, so I can't say for sure, but here's my best guess.
SQL Server's fulltext searching is word based, while LIKE is looking for character patterns within a string. As an example in English, a CONTAINS search for "warn" would not find the word "forewarned", but a LIKE for '%warn%' would.
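The distinction can be sketched in a few lines of Python. This is only a toy model: `word_match` splits text on non-word characters, which is far cruder than SQL Server's real word breakers, and both function names are made up for the example.

```python
import re

def word_match(text, term):
    """Crude stand-in for word-based matching: compare whole words only."""
    words = re.findall(r"\w+", text.lower())
    return term.lower() in words

def substring_match(text, term):
    """Equivalent of LIKE '%term%': the term may sit inside a longer word."""
    return term.lower() in text.lower()

sentence = "You were forewarned about the storm"

print(word_match(sentence, "warn"))       # 'warn' is not a word on its own
print(substring_match(sentence, "warn"))  # but it is a substring of 'forewarned'
print(word_match(sentence, "storm"))      # 'storm' is a whole word
```

For a language written without spaces between words, where the engine decides to place word boundaries matters even more, which may explain a row found by LIKE but missed by CONTAINS.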

SQL Server; index on TEXT column

I have a database table with several columns; most of them are VARCHAR(x) type columns, and some of these columns have an index on them so that I can search quickly for data inside it.
However, one of the columns is a TEXT column, because it contains a very large amount of data (around 23 KB of plain ASCII text). I want to be able to search in that column (... WHERE col1 LIKE '%search string%'... ), but currently the query takes forever. I know the query is slow because of this column search: when I remove that criterion from the WHERE clause, the query completes in what I would consider an instant.
I can't add an index on this column because that option is grayed out for that column in the index builder / wizard in SQL Server Management Studio.
What are my options here, to speed up the query search in that column?
Thanks for your time...
Update
Ok, so I looked into the full text search and did all that stuff, and now I would like to run queries. However, when using "contains", it only accepts one word; what if I need an exact phrase? ... WHERE CONTAINS (col1, 'search phrase') ... throws an error.
Sorry, I'm new to SQL Server
Update 2
Sorry, just figured it out: use multiple "contains" clauses instead of one clause with multiple words. Actually, this still doesn't give me what I want (the exact phrase); it only makes sure that all words in the phrase are present.
Searching TEXT fields is always pretty slow. Give Full Text Search a try and see if that works better for you.
If your queries use LIKE '%string%' (i.e. you search for a string inside a TEXT field), then you'll need a full-text index.
If you search for a substring in the beginning of the field (LIKE 'string%') and use SQL Server 2005 or higher, then you can convert your TEXT into a VARCHAR(MAX), create a computed column and index this column.
See this article in my blog for performance details:
Indexing VARCHAR(MAX)
You should be looking at using Full Text Indexing on the column.
You can do complex boolean querying in FTS; like
contains(yourcol,'"My first string" or "my second string" and "my third string"')
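The double quotes inside the search condition are what make each group of words an exact phrase, which is also what the question's update was missing when CONTAINS(col1, 'search phrase') threw an error. Below is a minimal Python sketch of building such a condition from user input. The helper names and the table name are hypothetical, the `?` placeholder assumes a driver that uses question-mark parameters, and dropping embedded double quotes instead of escaping them is a simplification.

```python
def phrase_condition(phrase):
    """Build a CONTAINS search condition that matches `phrase` as an exact phrase."""
    # Simplification (assumption): drop embedded double quotes instead of
    # escaping them, and collapse repeated whitespace.
    cleaned = " ".join(phrase.replace('"', " ").split())
    if not cleaned:
        raise ValueError("search phrase is empty")
    return f'"{cleaned}"'

def any_phrase_condition(phrases):
    """Combine several phrases with OR, e.g. '"a b" OR "c d"'."""
    return " OR ".join(phrase_condition(p) for p in phrases)

# Hypothetical table name; col1 is the column from the question. The search
# condition is bound as a parameter instead of being pasted into the SQL text.
sql = "SELECT * FROM myTable WHERE CONTAINS(col1, ?)"

single = phrase_condition("search phrase")
combined = any_phrase_condition(["my first string", "my second string"])

print(sql, single)
print(sql, combined)
```

With a real connection the value of `single` or `combined` would be passed as the parameter for `?`, so the quoting of the phrase and the quoting of the SQL string stay separate concerns.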
Depending on your query, CONTAINSTABLE or FREETEXTTABLE might give better results.
If you are connecting through .Net you might want to look at A google full text search
And since nobody has already said it (maybe because it's obvious) querying LIKE '%string%' bypasses your existing indexes - so it'll run slow.
Hence - why you need to use full text indexing. (which is what Quassnoi said).
Correction - I'm sure I learnt this, and always believed it - but after some investigating it (using wildcard at the start) seems OK? My old regex queries run better with likes!

Best way to literal phrase search a text column in SQL Server

I thought full-text search would let me do exact phrase searching in a more optimized way than a LIKE predicate, but I'm reading that it doesn't do that exactly.
Is "LIKE" the most efficient way to search thousands of rows of TEXT fields in a table for a literal string?
It's got to be exact matching...
LIKE 'string%' will be faster if you have a proper index on the column and you are looking for "string" only at the beginning of the value. You have to use LIKE '%string%' if "string" might be in the middle of your value; in that case the query falls back to a table scan, which is slow (usually slower than full-text search).
You can use the CONTAINS() function of full-text search for exact match.
Apparently, CONTAINS is faster than a LIKE query...
http://www.docstoc.com/docs/2280727/Microsoft-SQL-Server-70-Full-Text-Search-What-is-full-text-search
(Profiling can be found on Page 19 of that presentation)
What version of SQL Server are you on? I would recommend replacing TEXT with VARCHAR(MAX), if you ever can (SQL Server 2005 and up).
What makes you say that full text won't work? How have you set up fulltext, and what do your fulltext queries look like?
Marc
