LIKE vs CONTAINS on SQL Server

Which one of the following queries is faster (LIKE vs CONTAINS)?
SELECT * FROM table WHERE Column LIKE '%test%';
or
SELECT * FROM table WHERE Contains(Column, "test");

The second (assuming you mean CONTAINS, and actually put it in a valid query) should be faster, because it can use some form of index (in this case, a full-text index). Of course, this form of query is only available if the column is covered by a full-text index; if it isn't, then only the first form is available.
The first query, using LIKE, cannot seek an index because the pattern starts with a wildcard, so it will always require a full scan.
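For contrast, a pattern without a leading wildcard can seek an ordinary index, assuming one exists on the column (a sketch; the index name is hypothetical):
CREATE INDEX IX_table_Column ON [table] ([Column]);
SELECT * FROM [table] WHERE [Column] LIKE 'test%'; -- index seek on the prefix
SELECT * FROM [table] WHERE [Column] LIKE '%test%'; -- still a full scan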
The CONTAINS query should be:
SELECT * FROM table WHERE CONTAINS(Column, 'test');
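For completeness, a minimal sketch of the full-text setup that CONTAINS depends on (the catalog and key-index names are hypothetical):
CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT;
CREATE FULLTEXT INDEX ON dbo.[table]([Column])
    KEY INDEX PK_table -- a unique, non-nullable key index on the table
    WITH STOPLIST = SYSTEM;
SELECT * FROM dbo.[table] WHERE CONTAINS([Column], 'test');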

Having run both queries on a SQL Server 2012 instance, I can confirm the first query was faster in my case.
The query with the LIKE keyword showed a clustered index scan.
The CONTAINS query also showed a clustered index scan, with additional operators for the full-text match and a merge join.

I think that CONTAINS took longer and used a merge join because you had a dash ("-") in your query: adventure-works.com.
The dash is a word breaker, so CONTAINS searched the full-text index for "adventure", then searched for "works.com", and merged the results.

Also try changing from this:
SELECT * FROM table WHERE CONTAINS(Column, 'test');
To this:
SELECT * FROM table WHERE CONTAINS(Column, '"*test*"');
The former will find records with values like "this is a test" and "a test-case is the plan".
The latter will also find records with values like "i am testing this" and "this is the greatest".

I actually didn't understand what is going on with the CONTAINS keyword. I set up a full-text index on a column and ran some queries on the table.
LIKE returns 450,518 rows, but CONTAINS does not, and LIKE's result is the correct one:
SELECT COL FROM TBL WHERE COL LIKE '%41%' -- 450,518 rows
SELECT COL FROM TBL WHERE CONTAINS(COL, N'41') -- 40 rows
SELECT COL FROM TBL WHERE CONTAINS(COL, N'"*41*"') -- 220,364 rows

Related

Fulltext Contains does not return rows

I have a row in a table that contains "DS012345" in a column called Description.
When I use this query:
Select * from Tablename where Contains(Description, ' "*012345*" ')
This query returns no result.
I have created the unique index and full-text catalog, and I have turned off stopwords using Object Explorer. I still do not know why it does not return that row.
Any suggestion or cause for this?
Thanks.
Why not just use LIKE instead to do the search?
Select * from Tablename where Description LIKE '%012345%'
That just searches for 012345 anywhere within the Description column.
Stopwords are common words that the full-text engine skips when indexing and querying, so they never match.
Full-text search should be used to match whole words (or word prefixes); if you just want a part of a word, you should use LIKE '%...%'.
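As a diagnostic, you can inspect the word-breaker output directly. "DS012345" is indexed as a single token, and CONTAINS ignores a leading "*", so '"*012345*"' is effectively the prefix term "012345*", which matches no token that starts with "DS". A sketch using sys.dm_fts_parser (1033 = English LCID, stoplist id 0, accent-insensitive):
-- Show how the full-text word breaker tokenizes the value
SELECT display_term, source_term
FROM sys.dm_fts_parser(N'"DS012345"', 1033, 0, 0);
-- Typically returns the single token "ds012345", which the prefix term "012345*" cannot match.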

How to force reasonable execution plan for query with LIKE statement?

When creating ad-hoc queries to look for information in a table I have run into this issue over and over.
Let's say I have a table with a million records with fields id - int, createddatetime - timestamp, category - varchar(50) and content - varchar(max). I want to find all records in the last day that have a certain string in the content field. If I create a query like this...
select *
from table
where createddatetime > '2018-1-31'
and content like '%something%'
it may complete in a second, because the last day may hold only 100 records, so the LIKE clause operates on only a small number of rows.
However if I add one more item to the where clause...
select *
from table
where createddatetime > '2018-1-31'
and content like '%something%'
and category = 'testing'
then it could take many minutes to complete while locking up the table.
It appears to change from evaluating the straightforward WHERE clause items first and then the LIKE on the reduced set of records, to evaluating the LIKE clause first. There are even times when there are multiple LIKE conditions and adding one more sends the query from a split second to minutes.
The only solutions I've found are to generate an intermediate table (maybe temp tables would work), insert records based on the basic WHERE clause items, and then run a separate query to filter by one or more LIKE conditions. I've tried various JOIN and CTE approaches, usually with no improvement. Alternatively, CHARINDEX also appears to work, though it is awkward to use when converting the logic of multiple LIKE conditions.
Is there any hint or anything that can be placed in the query to tell SQL Server to wait until records are filtered by the basic WHERE clause items before filtering by the LIKE?
I actually just tried this approach and it had the same issue...
select *
from (
select *, charindex('something', content) as found
from bounce
where createddatetime > '2018-1-31'
) t
where found > 0
while the subquery independently returns in a couple of seconds, the overall query just never returns. Why is this so bad?
Not fancy, but I've had better luck with temp tables than with nested select statements. A temp table isolates the first data set, and then you can select just from that. If you're looking for quick and dirty, which usually serves my purposes for ad hoc work, this may help. If this is a permanent stored procedure, the indexing suggestions may serve you better in the long run.
select *
into #like
from table
where createddatetime > '2018-1-31'
and content like '%something%'
select *
from #like
where category = 'testing'
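Another trick sometimes used here (a sketch, not a guarantee): put a TOP in a derived table. The optimizer cannot push the outer LIKE below the TOP without changing the query's meaning, so the cheap filters run first:
select *
from (
    select top (9223372036854775807) * -- max bigint: returns all rows but acts as an optimization barrier
    from [table]
    where createddatetime > '2018-01-31'
      and category = 'testing'
) t
where content like '%something%';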

How to handle table updates/inserts when there is an index on the table

Hi, I am a bit confused about the handling of indexes in Postgres. I use version 9.6. From my understanding after reading the Postgres docs and answers on Stack Overflow, I want to verify the following:
Postgres does not support clustered indexes in the classic sense
all indexes in Postgres are non-clustered indexes
indexes do not allocate any new space but apply a sort on the table
that's why a CLUSTER command shall follow CREATE INDEX
the docs state that after updates/inserts on the table, the index is updated automatically
So I created a table with col1, col2, col3, col4 and then an index based on (col2, col3). Selects involving col2 and col3 became 15 times faster.
When I execute select * from table, the results are displayed sorted first by col2 and then by col3.
When I add a new row to the table (with a col2 value, test_value, that already existed), the row went to the end of the table (checked with select * from table).
1) Did the index get updated with this new entry automatically, even though the select showed the row at the end?
2) If I execute a query for all the rows that have test_value in col2, what will happen? Will I get all the results through the index?
There are some wrong assumptions here.
The most important: the order of the rows in a select is indeterminate unless you include ORDER BY. You can get rows in whatever order the database engine decides is the fastest way to return the data. So if select * from table returns the last inserted element at the end, that tells you nothing about the index.
How rows are stored and the index are separate things.
1) Yes, the index was updated after the insert.
2) Yes, because the index was already updated.
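To make the distinction concrete, a sketch (table and index names are hypothetical). CLUSTER physically reorders the table once, in index order, but later inserts are not kept in that order, and only ORDER BY guarantees output order:
CREATE INDEX idx_t_col2_col3 ON t (col2, col3);
CLUSTER t USING idx_t_col2_col3; -- one-time physical reorder; not maintained afterwards
-- The index itself is maintained automatically on every insert/update.
-- Deterministic output order still requires ORDER BY:
SELECT * FROM t WHERE col2 = 'test_value' ORDER BY col2, col3;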

Postgresql not using my index

I can't understand why my query is doing a sequential scan.
select columns
from table
where (column_1 ilike '%whatever%' or column_2 ilike '%whatever%')
I have an index on both column_1 and column_2.
The cardinality on both columns is very high.
My table is roughly 25 million rows.
What do you think I might be doing wrong? No matter what I do, it always does a sequential scan.
Edit #1:
My index looks like this:
Create index xxx on table (column_1, column_2);
Edit #2:
Changing my sql query to
select columns
from table
where (column_1 ilike 'whatever%' and column_2 ilike 'whatever%')
still made my query use a sequential scan. I got the same result when I just used like instead of ilike. But this query:
select columns
from table
where (column_1 = 'whatever' and column_2 = 'whatever')
made my query use an index scan and my query went much faster :)
Two reasons:
Your query has an OR condition, which a single compound index cannot serve.
You are doing an ILIKE on '%xyz%'. A pattern with a leading wildcard can't get any help from sorted (i.e. indexed) data.
--
Edit: See if you can use like on 'xyz%'. Then an index can be used if you put a separate condition on each column (and a separate index on each).
Edit 2: From the query, what you are trying to do looks like full-text search. For that you would need search indexing techniques (see Elasticsearch, Sphinx, Solr).
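As an aside, PostgreSQL can index arbitrary substring matches with the pg_trgm extension; a sketch, with hypothetical table and index names:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_tbl_col1_trgm ON tbl USING gin (column_1 gin_trgm_ops);
CREATE INDEX idx_tbl_col2_trgm ON tbl USING gin (column_2 gin_trgm_ops);
-- With these in place, the planner can answer ILIKE '%whatever%' (even with OR) via bitmap index scans.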

Best index(es) to use for an OR Statement in SQL Server

I have a table which has a bunch of columns but the two relevant ones are:
Due_Amount MONEY
Bounced_Due_Amount MONEY
I have a SQL query like the following
SELECT * FROM table WHERE (Due_Amount > 0 OR Bounced_Due_Amount > 0)
Would the best index to put on this table for SQL Server 2008 be an index which includes both columns in the index, or should I put an separate index on each column?
An index can't be used on an OR like that. Try this:
SELECT * FROM table WHERE Due_Amount > 0
UNION ALL
SELECT * FROM table WHERE Bounced_Due_Amount > 0
--use "UNION" instead if Due_Amount and Bounced_Due_Amount could both be > 0 in the same row (UNION removes the duplicates)
and have an index on Due_Amount and another on Bounced_Due_Amount.
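For example (the index names are hypothetical):
CREATE INDEX IX_table_Due_Amount ON [table] (Due_Amount);
CREATE INDEX IX_table_Bounced_Due_Amount ON [table] (Bounced_Due_Amount);
Each branch of the UNION can then seek its own index.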
It might be better to redesign your table. Without knowing your business logic or table, I'm going to guess that you could have a "Bounced" Y/N or 1/0 char/bit column and just a "Due_Amount" column. Add an index on that "Due_Amount" and the query would just be:
SELECT * FROM table WHERE Due_Amount > 0
You could still tell bounced rows from non-bounced ones. This will not work if you need to have both a bounced and a non-bounced due amount at the same time.
My guess is that you would be better off with an index on each individual column. Having it on both won't help any more than having it on just the first column unless you have other queries that would use the compound index.
Your best bet is to try the query with an index on one column, an index on the other column, and two indexes - one on each column. Do some tests with each (on real data, not test data) and see which works best. Take a look at the query plans to understand why.
Depending on the specific data (both size and cardinality) SQL Server may end up using one, both, or possibly even neither index. The only way to know for sure is to test them each.
Technically, you can have an index on a persisted computed column and use the computed column instead of the OR condition in the query; see Creating Indexes on Computed Columns:
alter table [table] add Max_Due_Amount as
    case
        when Due_Amount > Bounced_Due_Amount then Due_Amount
        else Bounced_Due_Amount
    end
persisted;
go
create index idxTableMaxDueAmount on [table] (Max_Due_Amount);
go
SELECT * FROM [table] WHERE Max_Due_Amount > 0;
But in general I'd recommend using the UNION approach like KM suggested.
Specifically for this query, it would be best to create an index on both columns in the order they are used in the where clause. Otherwise the index might not be used.
