Full text search vs LIKE - sql-server

My question is about using fulltext. As far as I know, LIKE queries whose pattern begins with % never use an index:
SELECT * FROM customer WHERE name LIKE '%username%'
If I use fulltext for this query, will I get better performance? Can SQL Server take advantage of a fulltext index for queries like '%username%'?

Short answer
There is no efficient way to perform infix searches in SQL Server, either using LIKE on an indexed column or with a fulltext index.
Long answer
In the general case, there is no fulltext equivalent to the LIKE operator. While LIKE works on a string of characters and can perform arbitrary wildcard matches against anything inside the target, by design fulltext operates upon whole words/terms only. (This is a slight simplification but it will do for the purpose of this answer.)
SQL Server fulltext does support a subset of LIKE with the prefix term operator. From the docs (http://msdn.microsoft.com/en-us/library/ms187787.aspx):
SELECT Name
FROM Production.Product
WHERE CONTAINS(Name, ' "Chain*" ');
would return products named chainsaw, chainmail, etc. Functionally, this doesn't gain you anything over the standard LIKE operator (LIKE 'Chain%'), and as long as the column is indexed, using LIKE for a prefixed search should give acceptable performance.
The LIKE operator allows you to put the wildcard anywhere, for instance LIKE '%chain', and as you mentioned this prevents an index from being used. But with fulltext, the asterisk can only appear at the end of a query term, so this is of no help to you.
Using LIKE, it is possible to perform efficient postfix searches by creating a new column, setting its value to the reverse of your target column, and indexing it. You can then query as follows:
SELECT Name
FROM Production.Product
WHERE Name_Reversed LIKE 'niahc%'; /* "chain" backwards */
which returns products with their names ending with "chain".
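One way to set up the reversed column is sketched below. It is only an illustration under a few assumptions: it uses a persisted computed column instead of a manually maintained one, it assumes Name_Reversed does not exist yet, and the index name IX_Product_Name_Reversed is made up for the example.

```sql
-- Assumption: Production.Product has no Name_Reversed column yet.
-- A persisted computed column keeps the reversed value in sync with Name.
ALTER TABLE Production.Product
    ADD Name_Reversed AS REVERSE(Name) PERSISTED;
GO

-- Hypothetical index name; any unused name works.
CREATE INDEX IX_Product_Name_Reversed
    ON Production.Product (Name_Reversed);
GO

-- The pattern has no leading wildcard, so the index can be used for a seek.
SELECT Name
FROM Production.Product
WHERE Name_Reversed LIKE REVERSE('chain') + '%';
```

Building the pattern with REVERSE('chain') avoids having to spell the search term backwards by hand.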
I suppose you could then combine the prefix and reversed postfix hack:
SELECT Name
FROM Production.Product
WHERE Name LIKE 'chain%'
AND Name_Reversed LIKE 'niahc%';
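This can potentially use both indexes, but it is not particularly pretty, and it is not a true infix search: it only matches names that both begin and end with "chain", so it is not equivalent to LIKE '%chain%'. I have also never tested whether the query optimizer would even use both indexes in its plan.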

You have to understand how an index works. An index is much like the printed (dead-tree) edition of an encyclopedia: you can look entries up quickly only by how they begin.
If you use:
SELECT * FROM customer WHERE name LIKE 'username%'
the index, fulltext or not, should work. But
SELECT * FROM customer WHERE name LIKE '%username%'
will never be able to use the index, and it will be a time-consuming query.

From what I know about fulltext indexes, I'll make the following extrapolations:
Upon indexing, it parses the text, searching for words (some RDBMSs, like MySQL, only consider words longer than 3 characters), and places the words in the index.
When you search in the fulltext index, you search for words, which then link to the rows.
If I'm right about the first two (for MSSQL), then it will only work if you search for WORDS with lengths of 4 or more characters. It won't find 'armchair' if you look for 'chair'.
Assuming all that is correct, I'll go ahead and make the following statement: the fulltext index is in fact an index, which makes searching faster. It is large and offers fewer search possibilities than LIKE, but it is much faster.
More info:
http://www.developer.com/db/article.php/3446891
http://en.wikipedia.org/wiki/Full_text_search

LIKE and CONTAINS are very different.
Take the following data values:
'john smith'
'sam smith'
'john fuller'
like 's%' returns:
'sam smith'
like '%s%' returns:
'john smith'
'sam smith'
contains 's' returns no rows.
contains 'john' returns:
'john smith'
'john fuller'
contains 's*' returns:
'john smith'
'sam smith'
contains '*s*' returns the same as contains 's*': the initial asterisk is ignored, which is a bit of a pain, but then the index is of words, not characters.

You can use:
SELECT * from customer where CONTAINS(name, 'username')
OR
SELECT * from customer where FREETEXT(name, 'username')

@mike-chamberlain (https://stackoverflow.com/users/289319/mike-chamberlain), you are quite right to be doubtful. If you want to find 'chain' anywhere in the name, WHERE Name LIKE 'chain%' AND Name_Reversed LIKE 'niahc%' is not enough: it is not equivalent to LIKE '%chain%', because it only matches names that both start and end with 'chain'.
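The difference can be checked with a small self-contained sketch. The table variable and the sample names below are invented for illustration; only the two WHERE clauses come from the discussion above.

```sql
-- Hypothetical sample data; Name_Reversed is a computed column for brevity.
DECLARE @Product TABLE (
    Name          varchar(50) NOT NULL,
    Name_Reversed AS REVERSE(Name)
);

INSERT INTO @Product (Name)
VALUES ('chain'), ('chainsaw'), ('bike chain'),
       ('keychain holder'), ('chain link chain');

-- Prefix AND reversed prefix: only names that start and end with 'chain'.
-- Matches 'chain' and 'chain link chain'.
SELECT Name
FROM @Product
WHERE Name LIKE 'chain%'
  AND Name_Reversed LIKE 'niahc%';

-- True infix search: matches all five sample rows.
SELECT Name
FROM @Product
WHERE Name LIKE '%chain%';
```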

Related

Sybase like clause for matching a pattern between the string

I want to build a Sybase ASE query to match lastname, firstname for a person. There are a few different formats for the name. It can be "lastname, firstname" or it can be "lastname,firstname" (no space between the comma and firstname). I have tried using name like 'lastname[,][ ]firstname' but it does not work. I cannot use lastname,%firstname because % would match any characters before firstname. The only valid separator after the comma is a single space or nothing. Any suggestions?
Unfortunately SAP/Sybase ASE does not provide support for regex patterns (eg, 'zero or more spaces'), so you're left with a few basic options ...
union (all) two queries:
select *
from names_table
where name like 'lastname, firstname'
union all
select *
from names_table
where name like 'lastname,firstname'
NOTE: Both queries should use an index on the name column assuming statistics show an index access plan is the best option.
or two where clauses:
select *
from names_table
where (name like 'lastname, firstname' or name like 'lastname,firstname')
NOTE: Whether or not this uses an index on the name column will depend on the statistics for the index and column and/or the complexity of the actual query.
Strip out spaces and match what's left:
select *
from names_table
where str_replace(name,' ',null) like 'lastname,firstname'
NOTE: In most cases this will disable the use of an index on the name column.
From an indexing perspective ...
If you need to run this type of query often, and the performance of said query is less than acceptable, you could look at a couple additional indexing options:
(materialized) computed column + index on said computed column
function-based index (ASE basically creates a 'system' computed column under the covers and then creates the index on said column)

Can SQL Server index a text string by delimiter?

I need to store content keyed by strings, so a database table of key/value pairs, essentially. The keys, however, will be of a hierarchical format, like this:
foo.bar.baz
They'll have multiple categories, delimited by dots. The above value is in a category called "baz" which is in a parent category called "bar" which is in a parent category called "foo."
How can I index this in such a way that it's rapidly searchable for different permutations of the key/dot combo? For example, I want to be able to very quickly find everything that starts with
foo
Or
foo.bar
Yes, I could do a LIKE query, but I never need to find anything like:
fo
So that seems like a waste to me.
Is there any way that SQL Server would index all permutations of a string delimited by the dots? So, in the above case we have:
foo
foo.bar
foo.bar.baz
Is there any type of index that would facilitate searching like that?
Edit
I will never need to search backwards or from the middle. My searches will always begin from the front of the string:
foo.bar
Never:
bar.baz
SQL Server can't really index substrings, no. If you only ever want to search on the first string, this will work fine, and will perform an index seek (depending on other query semantics of course):
WHERE col LIKE 'foo.%';
-- or
WHERE col LIKE 'foo.bar.%';
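As a minimal sketch of the prefix case, the batch below uses a made-up table variable (@Content, ContentKey, ContentValue are all hypothetical names) with the key as primary key. Note that LIKE 'foo.%' does not match the bare key 'foo', so the sketch adds an equality test for it; it also shows that keys such as 'fo' or 'foobar' are not returned.

```sql
-- Hypothetical key/value table; the primary key gives an index on ContentKey.
DECLARE @Content TABLE (
    ContentKey   varchar(200)  NOT NULL PRIMARY KEY,
    ContentValue nvarchar(100) NULL
);

INSERT INTO @Content (ContentKey, ContentValue)
VALUES ('foo', N'a'), ('foo.bar', N'b'), ('foo.bar.baz', N'c'),
       ('fo', N'd'), ('foobar', N'e');

-- Everything in category foo, including the category key itself.
-- Matches 'foo', 'foo.bar' and 'foo.bar.baz' only.
SELECT ContentKey, ContentValue
FROM @Content
WHERE ContentKey = 'foo'
   OR ContentKey LIKE 'foo.%';
```

Including the trailing dot in the pattern is what keeps 'foobar' out of the result, since the dot is a literal character in LIKE.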
However when you start needing to search for bar or baz following any leading string, you will need to search on the substring:
WHERE col LIKE '%.bar.%';
-- or
WHERE PATINDEX('%.bar.%', col) > 0;
This won't work well with regular B-tree indexes, and I don't think Full-Text Search will be much help either, because of the special characters (periods) - but you should try it out if this is a requirement.
In general, storing data this way smells wrong to me. It seems to me that you should either have separate columns instead of jamming all the data into one column, or use a more relational EAV design.
This appears to be a job for a CTE!
create table TableA(
id int identity,
parentid int null,
name varchar(50)
)
For a (fixed) two-level hierarchy it's easy:
select t2.name, t1.name
from tableA t1
join tableA t2 on t2.id = t1.parentid
where t2.name = 'father'
To find that kind of hierarchical value in the general case, you will need some kind of recursion over the self-joined table, using a CTE.
http://msdn.microsoft.com/pt-br/library/ms175972.aspx
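A recursive CTE over such a table could look roughly like the sketch below. It uses a table variable with explicit ids instead of the identity column so that the batch is self-contained; the sample rows and the CTE name Descendants are invented for illustration.

```sql
-- Hypothetical sample hierarchy: father > son > grandson, plus an unrelated row.
DECLARE @TableA TABLE (
    id       int         NOT NULL PRIMARY KEY,
    parentid int         NULL,
    name     varchar(50) NOT NULL
);

INSERT INTO @TableA (id, parentid, name)
VALUES (1, NULL, 'father'), (2, 1, 'son'),
       (3, 2, 'grandson'), (4, NULL, 'stranger');

WITH Descendants AS (
    -- Anchor member: the starting node.
    SELECT id, parentid, name, 0 AS depth
    FROM @TableA
    WHERE name = 'father'

    UNION ALL

    -- Recursive member: children of the rows found so far.
    SELECT c.id, c.parentid, c.name, d.depth + 1
    FROM @TableA AS c
    JOIN Descendants AS d ON c.parentid = d.id
)
SELECT id, parentid, name, depth
FROM Descendants
ORDER BY depth;
-- Returns father (depth 0), son (depth 1), grandson (depth 2).
```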

SQL Server 2012 full text search, can you use NEAR to match 'dave or david smith'?

I have this query:
select * from catnames where contains(name, 'NEAR((david, smith), MAX, TRUE)')
Which matches David Smith where the terms appear in order.
I'm wondering if it's possible to check david OR dave smith in the same query. The documentation for CONTAINS and NEAR is a little confusing. I've played around with a few attempts, mostly trying to add 'OR', but no dice.
Is it possible?
(Edit: Obviously, I mean within a single CONTAINS rather than chaining both CONTAINS)
Use an OR operator in your full text search.
select * from catnames where contains(name, 'NEAR((david, smith), MAX, TRUE) OR NEAR((dave, smith), MAX, TRUE)')
You could also use the full-text-search thesaurus to indicate that "David" and "Dave" are synonyms, and therefore would not need the OR clause. The thesaurus allows you to declare replacement terms and expansion terms. For your case, you would use an expansion term. The entry in the XML thesaurus file (there is one per language) would look like this:
<expansion>
<sub>David</sub>
<sub>Dave</sub>
</expansion>
See the SQL Server 2012 documentation on thesaurus files for more information.
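After editing a thesaurus file it has to be reloaded, and CONTAINS only consults it when asked to with FORMSOF(THESAURUS, ...). The sketch below assumes the expansion set above was added to the English thesaurus file (LCID 1033) and that catnames.name has a full-text index using that language. It shows the thesaurus lookup with a plain AND rather than inside NEAR, so treat it as a starting point to test, not a drop-in replacement for the NEAR query.

```sql
-- Reload the thesaurus file for English (LCID 1033) after editing it.
EXEC sys.sp_fulltext_load_thesaurus_file 1033;

-- FORMSOF(THESAURUS, ...) expands david to its thesaurus entries (for example dave).
SELECT *
FROM catnames
WHERE CONTAINS(name, 'FORMSOF(THESAURUS, david) AND smith');
```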

Query performance

I have a table that has the following schema:
ID,firstName,MiddleName,LastName,FML,[some other columns]
The FML column is created by concatenating firstName, a space character, MiddleName, a space character and LastName. I want to search for a person when the FML is known. Therefore my query is
SELECT * FROM tbl WHERE FML LIKE @Param
But I want to optimize this query. I'm thinking of splitting the input string into firstName, MiddleName and LastName strings and writing the query like this:
SELECT * FROM tbl WHERE firstName LIKE @FN AND MiddleName LIKE @MN AND LastName LIKE @LN
Also, will the query
SELECT smth FROM tbl WHERE Val = 'test'
be better in terms of performance than
SELECT smth FROM tbl WHERE Val LIKE 'test'
Thank you.
If you mean =, then use =. If you mean like, then use like. But once you add wildcards to like, the performance will decrease.
By separating and filtering on separate fields, you lose flexibility, but increase the ability to be more specific in your search. So it's not optimising, per se, as the functionality is different.
Imagine you have two records, Jack Roberts, and Robert Jack
Your first query allows you to find them both if your query is '%Robert%', whereas the second allows you to find them with separate queries.
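Yes, the '=' operator gives the best performance. Note that LIKE 'test' without wildcards only matches the value test itself; it searches for every Val that has test somewhere in its value only when written as LIKE '%test%'.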
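The difference between the predicates can be seen with a small self-contained sketch; the table variable and its sample values are invented for illustration.

```sql
-- Hypothetical sample data.
DECLARE @tbl TABLE (Val varchar(20) NOT NULL);

INSERT INTO @tbl (Val)
VALUES ('test'), ('testing'), ('my test'), ('other');

-- Exact match: returns 'test'.
SELECT Val FROM @tbl WHERE Val = 'test';

-- LIKE without wildcards: also returns only 'test'.
SELECT Val FROM @tbl WHERE Val LIKE 'test';

-- Prefix search: returns 'test' and 'testing'.
SELECT Val FROM @tbl WHERE Val LIKE 'test%';

-- Infix search: returns 'test', 'testing' and 'my test'.
SELECT Val FROM @tbl WHERE Val LIKE '%test%';
```

Of these, only the last form has a leading wildcard, which is the case that cannot use an index seek on Val.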

How to construct a full-text search predicate for SQL Server 2008

I want to search a full-text column for the following two terms:
2011
J Vineyards
where I construct the predicate as
"2011*" and "j vineyards*"
no rows are returned.
A record which should match is
2011 j vineyards viognier alexander valley united states
After experimentation, it seems to be related to the single "j" character.
EDIT:
Here is the select statement for the full-text column BeverageSearchData.
DECLARE @test nvarchar(100);
SET @test = '"2011*" and "j vineyards*"';
SELECT * FROM bv_beverage WHERE CONTAINS(BeverageSearchData, @test)
There are a couple of possibilities going on here. The full query and not just the predicate would help you get a better answer.
One thing that is probably going on is that SQL Full-text search eliminates single characters (As in J) when building its index.
If using CONTAINS, you may need to change your noise file and restart the SQL Server FullText Search service.
If using LIKE, you may be able to try adding a single-character wildcard. Play with it and see if it works without the 2011, and then add it back in.
WHERE myColumn like 'j_vineyards%'
WHERE myColumn like 'j%vineyards%'
An additional thing to note is that CONTAINS does not support leading wildcards.
You're looking for %, not *.
Try this instead:
"%2011%" and "%j vineyards%"
What language did you choose when creating the index? SQL Server associates the system full-text stoplist by default when creating an index which is probably what is happening in your case.
Try building the index with STOPLIST OFF like so -
CREATE FULLTEXT INDEX ON [table]([column]) KEY INDEX [unique_index] ON [catalog] WITH STOPLIST = OFF;
Alternatively, you can modify the stoplists to exclude certain words such as 'j' in the example shown above.
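Both options can also be applied to an existing full-text index. The sketch below assumes the index is on bv_beverage, that the index language is English (LCID 1033), and that 'j' is actually present in the system stoplist for that language; the stoplist name BeverageStoplist is made up for the example. The two options are alternatives, so pick one.

```sql
-- Option 1: turn the stoplist off for the existing full-text index.
ALTER FULLTEXT INDEX ON bv_beverage SET STOPLIST = OFF;

-- Option 2: copy the system stoplist and remove 'j' from the copy.
-- The DROP fails if 'j' is not a stopword for the given language.
CREATE FULLTEXT STOPLIST BeverageStoplist FROM SYSTEM STOPLIST;
ALTER FULLTEXT STOPLIST BeverageStoplist DROP 'j' LANGUAGE 1033;
ALTER FULLTEXT INDEX ON bv_beverage SET STOPLIST = BeverageStoplist;
```

Changing the stoplist normally triggers a repopulation of the index, so on a large table it may take a while before the query results change.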
