SQL Server Full Text Search results differ from LIKE statement - sql-server

I tried these two methods on AdventureWorks and got different results:
select * from Person.[Address] where AddressLine1 like '%99%'
select * from Person.[Address] where contains(Address.AddressLine1,'"*99*"')
Any idea?

Full text search and LIKE are two completely different things:
LIKE works on strings of characters and matches the pattern character by character, so '%99%' finds "99" anywhere in the string.
CONTAINS works on words and is somewhat fuzzy (how the strings are broken up into words depends on the language and can be customized even further if needed).
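A rough Python model of the two behaviours may make the difference concrete. It is a sketch under simplifying assumptions: the word breaker below just splits on non-word characters (SQL Server's real word breakers are language-specific), and, following the explanation further down that a leading * in a CONTAINS term is effectively stripped, '"*99*"' is treated as the prefix term '"99*"'. The addresses are made up, not AdventureWorks rows.

```python
import re

def like_match(text, fragment):
    """Model of LIKE '%fragment%': plain substring test on the characters."""
    return fragment.lower() in text.lower()

def words(text):
    """Simplified word breaker (assumption): split on non-word characters."""
    return [w for w in re.split(r"\W+", text.lower()) if w]

def contains_prefix(text, term):
    """Model of CONTAINS with a wildcard term such as '"*99*"'.

    Surrounding double quotes and '*' characters are stripped, so the term
    acts as a prefix term: some WORD must start with it.
    """
    prefix = term.strip('"').strip("*").lower()
    return any(w.startswith(prefix) for w in words(text))

# Made-up sample rows, not real AdventureWorks data.
addresses = ["99 Elm Street", "1299 Oak Avenue", "Box 993, Main Road"]
term = '"*99*"'
for address in addresses:
    print(address, "| LIKE:", like_match(address, "99"),
          "| CONTAINS:", contains_prefix(address, term))
```

The row "1299 Oak Avenue" is where the two diverge: LIKE sees the characters 99 inside 1299, while the word-based search only sees the word 1299, which does not start with 99.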

Related

Customize Normalization in SQL Server Full Text Search by replacing characters

I want to customize SQL Server FTS to handle language-specific features better.
In many languages, like Persian and Arabic, there are similar characters that proper search behavior should treat as identical, like these groups:
['آ' , 'ا' , 'ء' , 'ا']
['ي' , 'ی' , 'ئ']
Currently my best solution is to store duplicate data in a new column in which these characters are replaced by a representative member, normalize the search term the same way, and perform the search on the duplicated column.
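The duplicate-column workaround can be sketched as follows. This is a minimal Python illustration with sqlite3 standing in for SQL Server; the table name names, the column text_norm, and the choice of alef and Persian yeh as the representative characters are all hypothetical. In SQL Server the normalized column would more likely be maintained with nested REPLACE calls or a computed column.

```python
import sqlite3

# Representative character -> variants folded into it. The groups come from
# the question; choosing alef and Persian yeh as representatives is an assumption.
REPRESENTATIVES = {
    "\u0627": "\u0622\u0621",   # alef <- alef with madda above, hamza
    "\u06CC": "\u064A\u0626",   # Persian yeh <- Arabic yeh, yeh with hamza above
}

# str.translate table: code point of each variant -> its representative.
FOLD = {ord(v): rep for rep, variants in REPRESENTATIVES.items() for v in variants}

def normalize(text):
    """Replace every variant with its group's representative."""
    return text.translate(FOLD)

# sqlite3 stands in for SQL Server; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER, text TEXT, text_norm TEXT)")
rows = [(1, "\u0639\u0644\u064A"),   # written with Arabic yeh
        (2, "\u0622\u0628")]          # written with alef with madda
for row_id, text in rows:
    # Store the original text plus its normalized duplicate.
    conn.execute("INSERT INTO names VALUES (?, ?, ?)",
                 (row_id, text, normalize(text)))

def search(term):
    """Normalize the search term the same way and compare against text_norm."""
    cur = conn.execute("SELECT id FROM names WHERE text_norm = ?", (normalize(term),))
    return [r[0] for r in cur]

print(search("\u0639\u0644\u06CC"))   # typed with Persian yeh, finds row 1
print(search("\u0627\u0628"))         # typed with plain alef, finds row 2
```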
Is there any way to tell SQL Server to treat all members of these groups as an identical character?
As far as I understand, this would be used for suggestion purposes, so being exact is not important.
In Farsi, none of the characters in the list above actually share the same meaning, but we can say they share a short form in some writing cases ('آ' != 'اِ', but both can be written as 'ا').
SCENARIO 1: THE INPUT TEXT IS IN COMPLETE FORM
Imagine "محمّد" is a record in a table named 'table', defined as (id int, text nvarchar(12)).
After removing the special character (here the shadda, U+0651, i.e. NCHAR(1617)), we can use the following command:
select * from [db].[dbo].[table] where REPLACE(text, NCHAR(1617), N'') = REPLACE(N'محمد', NCHAR(1617), N'');
The result would be
SCENARIO 2: THE INPUT IS IN SHORT FORMAT
Imagine "محمد" is a record in a table named 'table', defined as (id int, text nvarchar(12)).
In this scenario we need to do some logical operations on the text before we query the database.
For example, if "محمد" is the input, and we know and have a list of these special characters, it can easily be searched with a query like:
select * from [db].[dbo].[table] where REPLACE(text, NCHAR(1617), N'') = N'محمد';
Note:
This is not really the best solution, because the input should not have to be modified on the client side; it would be better if SQL Server could be configured to handle this.
For people who don't understand Farsi: he simply wants to tell SQL Server that 'a' = ['b', 'c'], i.e. these characters have the same value, so
when the word "dad" is searched, any existing words "dbd" or "dcd" should be returned too.
Addendum:
Some sets of characters have the same meaning and some do not (['ي','أ'] are the same, but ['آ','اِ'] are not), so for the first scenario we get:
select * from [db].[dbo].[table] where text like N'%هی[أي]ت' and text like N'هی[أي]ت%';
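The character-class idea in the last query can be generated automatically from a list of equivalence groups. Below is a hypothetical Python helper (not an existing API) that turns a search term into a T-SQL LIKE pattern, plus a tiny LIKE-to-regex converter used only to check the pattern locally. It assumes group members are plain characters (no ], ^ or -); the a/b/c group mirrors the dad/dbd/dcd example.

```python
import re

def like_pattern(term, groups, anywhere=True):
    """Build a T-SQL LIKE pattern where each grouped character becomes a
    [...] class of its whole group. Hypothetical helper, not a built-in."""
    lookup = {}
    for group in groups:
        cls = "[" + "".join(group) + "]"
        for ch in group:
            lookup[ch] = cls
    parts = []
    for ch in term:
        if ch in lookup:
            parts.append(lookup[ch])
        elif ch in "%_[":
            parts.append("[" + ch + "]")  # escape LIKE wildcards in the term
        else:
            parts.append(ch)
    body = "".join(parts)
    return "%" + body + "%" if anywhere else body

def like_to_regex(pattern):
    """Tiny LIKE -> regex converter for local checks only: handles %, _ and
    simple [...] classes; no ESCAPE clause, no [^...] or ranges."""
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        elif ch == "[":
            j = pattern.index("]", i + 1)
            out.append("[" + re.escape(pattern[i + 1:j]) + "]")
            i = j
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out) + r"\Z", re.DOTALL)

groups = [["a", "b", "c"]]
pattern = like_pattern("dad", groups)
print(pattern)
print([w for w in ["dad", "dbd", "dcd", "ddd"] if like_to_regex(pattern).match(w)])
```

The generated string would then be passed as a parameter to a query such as `... where text like @pattern`.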

Identify all strings in SQL Server code (red color - like in SSMS)

I was not able to solve this myself, so I hope I didn't miss a similar post here and am not wasting your time.
What I want is to identify (get a list of) all strings used in SQL Server code.
Example:
select 'WordToCatch1' as 'Column1'
from Table1
where Column2 = 'WordToCatch2'
If you put the above code into SSMS, all three words in apostrophes will be red, but only 'WordToCatch1' and 'WordToCatch2' are "real" strings used in the code.
My goal is to find all those "real" strings in any code.
For example, if I have a stored procedure that is 10k lines long, it would be impossible to search it manually, so I want something that will find all those "real" strings for me and return a list of them.
Thanks in advance!
The trouble is, 'Column1' is nothing particularly different from 'WordToCatch1' and 'WordToCatch2', not unless you parse the SQL yourself. You could modify your query to remove the quotes from Column1, and it will then show up colored black.
I guess a simple regex could find all identifiers after an AS keyword, which would be easier than fully parsing SQL, provided all the unwanted strings are like that and it's not just an example.
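Going one step beyond a plain regex, a small tokenizer can skip comments and quoted identifiers and treat a literal right after AS as a column alias. This is a heuristic Python sketch under that assumption, not a T-SQL parser: it does not handle the 'alias' = expression form, nested block comments, or dynamic SQL assembled from string pieces. For a full parse, a real parser such as Microsoft's TransactSql.ScriptDom library is the more robust route.

```python
import re
from typing import List

# One regex alternative per token kind. Order matters: comments and string
# literals are tried before plain words and single characters.
TOKEN_RE = re.compile(
    r"--[^\n]*"              # line comment
    r"|/\*.*?\*/"            # block comment (nesting is not handled)
    r"|[Nn]?'(?:[^']|'')*'"  # string literal; '' is an escaped quote
    r"|\[[^\]]*\]"           # [bracketed identifier]
    r'|"[^"]*"'              # "quoted identifier"
    r"|\w+"                  # keyword or identifier
    r"|\S",                  # any other single character
    re.DOTALL,
)

def real_string_literals(sql: str) -> List[str]:
    """Return string literals, skipping ones that directly follow AS
    (i.e. quoted column aliases such as AS 'Column1')."""
    found = []
    prev = None  # previous significant token, upper-cased
    for tok in TOKEN_RE.findall(sql):
        if tok.startswith("--") or tok.startswith("/*"):
            continue  # comments never count and do not change the context
        if len(tok) >= 2 and tok.endswith("'"):
            if prev != "AS":
                body = tok[tok.index("'") + 1:-1].replace("''", "'")
                found.append(body)
        prev = tok.upper()
    return found

code = """select 'WordToCatch1' as 'Column1'
from Table1
where Column2 = 'WordToCatch2'"""
print(real_string_literals(code))
```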

SQL Contains exact phrase

I am trying to implement a search mechanism with CONTAINS() on SQL Server 2014.
I've read here https://technet.microsoft.com/en-us/library/ms142538%28v=sql.105%29.aspx and in the book "Pro Full-Text Search in SQL Server 2008" that I need to use double quotes to search for an exact phrase.
But, e.g., if I use CONTAINS(*, '"test"'), I also receive results containing words like "numerictest". If I try CONTAINS(*, '" test "'), it is the same. I've noticed that there are fewer results than when I search with CONTAINS(*, '*test*') for a prefix/suffix search, so there is definitely a difference between the searches.
I didn't expect "numerictest" in the results of the first statement. Is there an explanation for this behaviour?
I have been racking my brain over a very similar problem, and I recently found the solution.
In my case I was searching full-text fields for #username, but CONTAINS(body, '#username') also returned rows containing just "username". I wanted it to strictly match with the # sign.
I could use LIKE '%#username%', but the query took over a minute, which was unacceptable, so I kept looking.
Some people in a chat room suggested using both CONTAINS and LIKE:
SELECT TOP 25 * FROM [table] WHERE
CONTAINS(body, '#username') AND body LIKE '%#username%';
This worked perfectly for me, because CONTAINS pulls both username and #username records, and then LIKE keeps only the ones with the # sign. Queries take 2-3 seconds now.
I know this is an old question, but I came across it while searching, so now that I have the answer I thought I would post it. I hope this helps.
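The CONTAINS-then-LIKE combination works as a two-stage filter: a cheap index lookup narrows the rows, and the exact substring check runs only on that small candidate set. Here is a Python model of that idea; the inverted index and the word breaker (which, as assumed here, treats # as a separator) are simplifications, and the rows are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Simplified word breaker (assumption): '#' is treated as a separator,
    so '#username' is indexed as the word 'username'."""
    return re.findall(r"\w+", text.lower())

def build_index(rows):
    """Inverted index word -> row ids, standing in for the full-text index."""
    index = defaultdict(set)
    for row_id, body in rows.items():
        for word in tokenize(body):
            index[word].add(row_id)
    return index

def search(rows, index, needle):
    # Stage 1, like CONTAINS: index lookup; every word of the needle must occur.
    terms = tokenize(needle)
    if not terms:
        return []
    candidates = set.intersection(*(index.get(t, set()) for t in terms))
    # Stage 2, like LIKE '%needle%': exact substring check on the candidates only.
    return sorted(r for r in candidates if needle.lower() in rows[r].lower())

# Hypothetical rows.
rows = {1: "ping @username", 2: "thanks #username!", 3: "username changed", 4: "unrelated"}
index = build_index(rows)
print(search(rows, index, "#username"))
```

Rows 1, 2 and 3 all survive stage 1 because the index only knows the word "username"; stage 2 then keeps row 2 alone.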
CONTAINS(*, '"test"') will only match the full word "test", as you expect.
CONTAINS(*, '" test "') behaves the same as above.
CONTAINS(*, '"*test*"') will actually do a PREFIX-ONLY search: it basically strips out any special characters at the start of the word and only uses the second *.
You cannot do POSTFIX (suffix) searches using full-text search.
My concern lies with the CONTAINS(*) part: this searches every full-text-indexed column in the row. Without seeing the data it is hard to tell, but my guess is that another column in the row you think is wrong is actually matching "test" somewhere.
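To check the "another column matched" theory, the row-level CONTAINS(*) hit can be broken down per column. In T-SQL that would mean running CONTAINS(SomeColumn, '"test"') once per indexed column; the Python model below shows the same logic, with hypothetical column names and a simplified word breaker.

```python
import re

def has_word(text, word):
    """Whole-word test with a simplified word breaker (assumption)."""
    return word.lower() in re.findall(r"\w+", text.lower())

def contains_star(row, indexed_columns, word):
    """Model of CONTAINS(*, '"word"'): true if ANY indexed column has the word."""
    return any(has_word(row[c], word) for c in indexed_columns)

def matching_columns(row, indexed_columns, word):
    """Per-column check to see which column actually produced the hit."""
    return [c for c in indexed_columns if has_word(row[c], word)]

# Hypothetical row and column names.
row = {"Title": "numerictest results", "Notes": "see the test plan"}
print(contains_star(row, ["Title", "Notes"], "test"))
print(matching_columns(row, ["Title", "Notes"], "test"))
```

The row is returned even though its Title only contains "numerictest", because the Notes column contains the whole word "test".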

Dynamic Search multiple terms in linqtosql

I'm trying to do the following: if a user enters the term "IP Address Text" into my search box, I want the following SQL to be generated:
SELECT *
FROM tblComments
WHERE tblComments.Text LIKE '%IP%' OR tblComments.Text LIKE '%Address%' OR tblComments.Text LIKE '%Text%'
Obviously the number of words entered is going to be different each time.
I have tried a foreach loop in LinqToSql adding multiple Where clauses, but this combines them with AND instead of OR.
Any idea how to accomplish this?
You may want to read up on full-text searching as an alternative to what you're trying to accomplish here. Searching for '%word%' will never perform well, because the leading wildcard prevents the query from using an index seek.
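For the SQL shape asked for in the question, here is a sketch of building the OR'ed LIKE clauses with one parameter per word. It is plain Python rather than LinqToSql code, the ? placeholder style is an assumption about the database driver, and % _ [ in user input are escaped with T-SQL bracket syntax.

```python
def escape_like(term):
    """Escape T-SQL LIKE wildcards by wrapping them in brackets."""
    return "".join("[" + ch + "]" if ch in "%_[" else ch for ch in term)

def build_or_like_query(search_text):
    """Return (sql, params) with one OR'ed LIKE per word of search_text.

    '?' placeholders are an assumption (pyodbc-style); adjust for your driver.
    """
    terms = search_text.split()
    if not terms:
        # No words: no filter at all (a design choice for this sketch).
        return "SELECT * FROM tblComments", []
    where = " OR ".join("tblComments.Text LIKE ?" for _ in terms)
    params = ["%" + escape_like(t) + "%" for t in terms]
    return "SELECT * FROM tblComments WHERE " + where, params

sql, params = build_or_like_query("IP Address Text")
print(sql)
print(params)
```

Passing the words as parameters rather than concatenating them into the SQL string keeps user input from being interpreted as SQL.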

Full text catalog/index search for %book%

I'm trying to wrap my head around how to search for something that appears in the middle of a word or expression, something like LIKE '%book%', but with a SQL Server (2005) full-text catalog.
How can I do that? It almost appears as if both CONTAINS and FREETEXT really don't support a wildcard at the beginning of a search expression; can that really be?
I would have imagined that FREETEXT(*, 'book') would find anything with "book" inside, including "rebooked" or something like that.
Unfortunately, CONTAINS only supports prefix wildcards:
CONTAINS(*, '"book*"')
SQL Server Full-Text Search is based on tokenizing text into words. There is no unit smaller than a word, so the smallest thing you can look for is a word.
You can use prefix searches to look for words that start with certain characters. This is possible because word lists are kept in alphabetical order, so all the server has to do is scan through the list to find the matches.
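The prefix-search argument can be illustrated with a sorted word list: a prefix lets you binary-search to the first candidate and stop at the first non-match, while an infix fragment offers no such shortcut. This is a Python illustration of the principle only, not of how SQL Server stores its full-text index internally.

```python
import bisect

def prefix_matches(sorted_words, prefix):
    """Binary-search to the first word >= prefix, then walk forward while
    words still start with prefix. Cost: O(log n + number of matches)."""
    i = bisect.bisect_left(sorted_words, prefix)
    out = []
    while i < len(sorted_words) and sorted_words[i].startswith(prefix):
        out.append(sorted_words[i])
        i += 1
    return out

def infix_matches(sorted_words, fragment):
    """'%fragment%' gets no help from the ordering: every word is checked."""
    return [w for w in sorted_words if fragment in w]

vocabulary = sorted(["rebooked", "book", "cook", "booking", "ebook", "booked"])
print(prefix_matches(vocabulary, "book"))
print(infix_matches(vocabulary, "book"))
```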
To do what you want, a query with a LIKE '%book%' clause would probably be just as fast (or slow).
If you want to do some serious full-text searching, I would use (and have used) Lucene.Net. MS SQL full-text search never seems to work that well for anything beyond the basics.
Here's a workaround for that wildcard limitation: create a computed column that contains the same content as the column(s) you are searching, but reversed.
If, for example, you are searching a column named 'ProductTitle', create a column named 'ProductsRev', then set that column's 'Computed Column Specification' value to:
(reverse([ProductTitle]))
Include the 'ProductsRev' column in your full-text search and search it with the reversed term as a prefix (for example '"koob*"' to find words ending in "book"); this gives you the effect of a wildcard at the beginning of the word. Good luck!!
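Reversing turns "ends with" into "starts with", which a prefix term can handle. The Python sketch below models the reversed computed column and a prefix search for the reversed ending; the titles are hypothetical. Note that this covers word endings only: a fragment in the middle of a word (such as book in Rebooked) is still not found.

```python
import re

def reversed_column(text):
    """Model of the computed column: (reverse([ProductTitle]))."""
    return text[::-1]

def word_ending_search(titles, ending):
    """Find titles with a word ending in `ending`, via a prefix search for
    the reversed ending over the words of the reversed column."""
    rev = ending[::-1].lower()
    hits = []
    for title in titles:
        rev_words = re.findall(r"\w+", reversed_column(title).lower())
        if any(w.startswith(rev) for w in rev_words):
            hits.append(title)
    return hits

titles = ["Facebook Guide", "Booking Tips", "My Notebook", "Rebooked Flights"]
print(word_ending_search(titles, "book"))
```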
Full-text search has a table that lists all the words the engine has found. It should have orders of magnitude fewer rows than your full-text-indexed table. You could select from that table "where field like '%book%'" to get all the words that contain 'book', then use that list to write a full-text query. It's cumbersome, but it would work, and it would be OK speed-wise. HOWEVER, ultimately you are using full-text search wrong when you do this. It might actually be better to educate the source of these feature requests about what full-text search is doing. You want them to understand what it is designed to do, so they can get high value from it: for example, only use wildcards at the end of a word, which means thinking of the words as an ordered list.
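The two steps described here (scan the keyword list, then build a full-text query from the hits) could look like this in Python. In SQL Server 2008 and later, the indexed keywords can be listed with sys.dm_fts_index_keywords; here the vocabulary is a hypothetical in-memory list, and the output is the kind of OR'ed search condition you would pass to CONTAINS.

```python
def words_containing(vocabulary, fragment):
    """Step 1: scan the (small) keyword list for words containing fragment."""
    fragment = fragment.lower()
    return sorted(w for w in vocabulary if fragment in w.lower())

def contains_condition(words):
    """Step 2: turn the words into an OR'ed CONTAINS search condition.
    Assumes the words contain no double quotes."""
    return " OR ".join('"' + w + '"' for w in words)

# Hypothetical keyword list.
vocabulary = ["book", "rebooked", "bookcase", "cook", "notebooks"]
matches = words_containing(vocabulary, "book")
print(matches)
print(contains_condition(matches))
```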
Why not write an assembly in C# to compute all the non-repeated suffixes? For example, if you have the text "eat the red meat", you can store "eat at t the he e red ed d meat" in a field (note that it is not necessary to add "eat", "at" and "t" again) and then use full-text search on this field. A function for doing that can easily be written in C#.
x) I know it seems odd... it's a workaround.
x) I know I'm adding overhead on insert/update... it's only justified if this overhead is insignificant compared to the improvement in the search function.
x) I know there is also an overhead in the size of the stored data.
But I'm pretty confident that it will be quite fast.
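The suffix idea is easy to check. The answer proposes a C# assembly; the sketch below is the same algorithm in Python (the function name is hypothetical). It reproduces the "eat the red meat" example, and a prefix search on the generated suffixes then behaves like an infix search on the original words.

```python
def unique_suffixes(text):
    """Return every suffix of every word, skipping suffixes already emitted."""
    seen, out = set(), []
    for word in text.split():
        for i in range(len(word)):
            suffix = word[i:]
            if suffix not in seen:
                seen.add(suffix)
                out.append(suffix)
    return " ".join(out)

print(unique_suffixes("eat the red meat"))

# A prefix search ("book*") over the suffix field now finds "rebooked".
print(any(s.startswith("book") for s in unique_suffixes("rebooked").split()))
```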

Resources