How to speed up SQL like %pattern% searches?

How to speed up SQL like %pattern% searches? - sql-server

Is there any way to speed up queries like this below ? I am looking for option which would require minimal change to application code.
SELECT *
FROM my_table
WHERE some_column like '%my string%'
ORDER BY some_column
The table which causes most of the slowdown has 2,5 million records and query takes 10 seconds to execute.
Execution plan tells that 99% of the cost is index scan (NonClustered), which is understandable because of LIKE and pattern with "%" on both sides.
If there is "%" just at the end, then index seek is used and query executes in a moment.
So I am looking for something like:
to add some kind of aditional index on the table, probably not
possible ?
a way to put this table and/or index into RAM sa the seek would be
faster
anything else ?
I can use either MS SQL 2012 or 2014, both standard edition.
Bonus question
Is it possible that this very same queries would execute instanteniously on DB2 database ? App was using db2 initially but was migrated over to MS SQL.

There may not be an answer, but there is a reason. When you use a search string with a leading wildcard, such as '%string', you're forcing the optimizer to do a table scan.
You might want to revisit some of the suggestions in this thread.
Good luck!

You can try changing the column collation to some binary form:
in query:
SELECT *
FROM my_table
WHERE some_column COLLATE Latin1_General_BIN like '%my string%'
ORDER BY some_column
or change it in the table design permanently if you can.
Caveat: it's cAsE sEnSiTiVe.
Edit: you can get around case sensitivity by converting both the column and the search string to upper case for example:
SELECT *
FROM my_table
WHERE UPPER(some_column) COLLATE Latin1_General_BIN like '%MY STRING%'
ORDER BY some_column
Edit 2: backup the database before doing any perpanent collation changes, I'm not sure how exactly it compares but I think in query it should be ok.
Explation article.

I'm not sure this solution is an option for you as it stores more data in database. it also may increase the time for update/insert, but its an idea anyway. Too long to put in to comment, so don't blame me!
Add a persisted computed column for the some_column with this formula: REVERSE(some_column) to store reverse of the string
Add index on that column
In your query use some_column like 'my string%' or rev_some_column like REVERSE('%my string'). You'd better to replace REVERSE('%my string') with a variable initiated before query.
I think in this case, both likes will use index.

Related

T-SQL -- faster way to do match for NOT LIKE '%[^0-9]%' used in CASE/WHEN statement

I am looking for a potentially faster way to do this check:
NOT LIKE '%[^0-9]%'
This checks to ensure all characters are numbers (see description of T-SQL pattern)
Is there a faster way to do this in Microsoft SQL Server (T-SQL)?
The full context is as part of a CASE/WHEN statement in the select part of a vary large query:
Select DATEADD(dd, CAST(CASE WHEN a.dateDuration NOT LIKE '%[^0-9]%' THEN a.Duration ELSE 1 END AS INT), a.StartDate) AS 'ourEndDate'
In the above, a is a table alias. The column a.dateDuration is a nullable varchar column. (The real names of entities have been replaced for proprietary reasons).
Indeed, variants of this are repeated in various "UNION ALL" operators, so if it could be made faster it could speed the query considerably.
The NOT LIKE operator is presumably relatively slow.
The version of the underlying database is SQL Server 2012.

In this context performance of LIKE / NOT LIKE operator is almost for sure not a problem. If your query is slow consider first how many rows you are returning and if you are doing full scans on tables for looking interesing rows.
Here it looks like you are only trying to format/adjust your final result - if you consider SQL is too slow here you can do this processing on application server side as this is not a part of fetching data from disk.
If this is subquery please show entire query.

Oracle 11g: Index not used in "select distinct"-query

My question concerns Oracle 11g and the use of indexes in SQL queries.
In my database, there is a table that is structured as followed:
Table tab (
rowid NUMBER(11),
unique_id_string VARCHAR2(2000),
year NUMBER(4),
dynamic_col_1 NUMBER(11),
dynamic_col_1_text NVARCHAR2(2000)
) TABLESPACE tabspace_data;
I have created two indexes:
CREATE INDEX Index_dyn_col1 ON tab (dynamic_col_1, dynamic_col_1_text) TABLESPACE tabspace_index;
CREATE INDEX Index_unique_id_year ON tab (unique_id_string, year) TABLESPACE tabspace_index;
The table contains around 1 to 2 million records. I extract the data from it by executing the following SQL command:
SELECT distinct
"sub_select"."dynamic_col_1" "AS_dynamic_col_1","sub_select"."dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM
(
SELECT "tab".* FROM "tab"
where "tab".year = 2011
) "sub_select"
Unfortunately, the query needs around 1 hour to execute, although I created the both indexes described above.
The explain plan shows that Oracle uses a "Table Full Access", i.e. a full table scan. Why is the index not used?
As an experiment, I tested the following SQL command:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
Even in this case, the index is not used and a full table scan is performed.
In my real database, the table contains more indexed columns like "dynamic_col_1" and "dynamic_col_1_text".
The whole index file has a size of about 50 GB.
A few more informations:
The database is Oracle 11g installed on my local computer.
I use Windows 7 Enterprise 64bit.
The whole index is split over 3 dbf files with about 50GB size.
I would really be glad, if someone could tell me how to make Oracle use the index in the first query.
Because the first query is used by another program to extract the data from the database, it can hardly be changed. So it would be good to tweak the table instead.
Thanks in advance.
[01.10.2011: UPDATE]
I think I've found the solution for the problem. Both columns dynamic_col_1 and dynamic_col_1_text are nullable. After altering the table to prohibit "NULL"-values in both columns and adding a new index solely for the column year, Oracle performs a Fast Index Scan.
The advantage is that the query takes now about 5 seconds to execute and not 1 hour as before.

Are you sure that an index access would be faster than a full table scan? As a very rough estimate, full table scans are 20 times faster than reading an index. If tab has more than 5% of the data in 2011 it's not surprising that Oracle would use a full table scan. And as #Dan and #Ollie mentioned, with year as the second column this will make the index even slower.
If the index really is faster, than the issue is probably bad statistics. There are hundreds of ways the statistics could be bad. Very briefly, here's what I'd look at first:
Run an explain plan with and without and index hint. Are the cardinalities off by 10x or more? Are the times off by 10x or more?
If the cardinality is off, make sure there are up to date stats on the table and index and you're using a reasonable ESTIMATE_PERCENT (DBMS_STATS.AUTO_SAMPLE_SIZE is almost always the best for 11g).
If the time is off, check your workload statistics.
Are you using parallelism? Oracle always assumes a near linear improvement for parallelism, but on a desktop with one hard drive you probably won't see any improvement at all.
Also, this isn't really relevant to your problem, but you may want to avoid using quoted identifiers. Once you use them you have to use them everywhere, and it generally makes your tables and queries painful to work with.

Your index should be:
CREATE INDEX Index_year
ON tab (year)
TABLESPACE tabspace_index;
Also, your query could just be:
SELECT DISTINCT
dynamic_col_1 "AS_dynamic_col_1",
dynamic_col_1_text "AS_dynamic_col_1_text"
FROM tab
WHERE year = 2011;
If your index was created solely for this query though, you could create it including the two fetched columns as well, then the optimiser would not have to go to the table for the query data, it could retrieve it directly from the index making your query more efficient again.
Hope it helps...

I don't have an Oracle instance on hand so this is somewhat guesswork, but my inclination is to say it's because you have the compound index in the wrong order. If you had year as the first column in the index it might use it.

Your second test query:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
would not use the index because you have no WHERE clause, so you're asking Oracle to read every row in the table. In that situation the full table scan is the faster access method.
Also, as other posters have mentioned, your index on YEAR has it in the second column. Oracle can use this index by performing a skip scan, but there is a performance hit for doing so, and depending on the size of your table Oracle may just decide to use the FTS again.

I don't know if it's relevant, but I tested the following query:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
WHERE "dynamic_col_1" = 123 AND "dynamic_col_1_text" = 'abc'
The explain plan for that query show that Oracle uses an index scan in this scenario.
The columns dynamic_col_1 and dynamic_col_1_text are nullable. Does this have an effect on the usage of the index?
01.10.2011: UPDATE]
I think I've found the solution for the problem. Both columns dynamic_col_1 and dynamic_col_1_text are nullable. After altering the table to prohibit "NULL"-values in both columns and adding a new index solely for the column year, Oracle performs a Fast Index Scan. The advantage is that the query takes now about 5 seconds to execute and not 1 hour as before.

Try this:
1) Create an index on year field (see Ollie answer).
2) And then use this query:
SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab
WHERE ID (SELECT ID FROM tab WHERE year=2011)
or
SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab
WHERE ID (SELECT ID FROM tab WHERE year=2011)
GROUP BY dynamic_col_1, dynamic_col_1_text
Maybe it will help you.

Update with "not in" on huge table in SQL Server 2005

I have a table with around 115k rows. Something like this:
Table: People
Column: ID PRIMARY KEY INT IDENTITY NOT NULL
Column: SpecialCode NVARCHAR(255) NULL
Column: IsActive BIT NOT NULL
Initially, I had an index defined like so:
PK_IDX (clustered) -- clustered index on primary key
IDX_SpecialCode (non clustered, non-unique) -- index on the SpecialCode column
And I'm doing an update like so:
Update People set IsActive = 0
Where SpecialCode not in ('...enormous list of special codes....')
This enormous list is essentially 99% of the users in the table.
This update takes forever on my server. As a test I trimmed the list of special codes in the "not in" clause to something like 1% of the users in the table, and my execution plan ends up using an INDEX SCAN on the PK_IDX index instead of the IDX_SpecialCode index that I thought it'd use.
So, I thought that maybe I needed to modify the IDX_SpecialCode so that it included the column "IsActive" in it. I did so and I still see the execution plan defaulting to the PK_IDX index scan and my query still takes a very long time to run.
So - what is the more correct way to do an update of this nature? I have the list of user's I want to exclude from the update, but was trying to avoid loading all employees special codes from the database, filtering out those not in my list on my application side, and then running my query with an in clause, which will be a much much smaller list in my actual usage.
Thanks

If you have the employees you want to exclude, why not just populate an indexed table with those PK_IDs and do a:
Update People
set IsActive = 0
Where NOT EXISTS (SELECT NULL
FROM lookuptable l
WHERE l.PK = People.PK)
You are getting index scans because SQL Server is not stupid, and realizes that it makes more sense to just look at the whole table instead of checking for 100 different criteria one at a time. If your stats are up to date the optimizer knows about how much of the table is covered by your IN statement and will do a table or clustered index scan if it thinks it will be faster.

With SQL-Server indexes are ignored when you use the NOT clause. That is why you are seeing the execution plan ignoring your index. <- Ref: page 6. MCTS Exam 70-433 Database Development SQL 2008 (I'm reading it at the moment)
It might be worth taking a look at Full text indexes although I don't know whether the same will happen with that (I haven't got access to a box with it set up to test at the moment)
hth

Is there any way you could use the IDs of the users you wish to exclude instead of their code - even on indexed values comparing ids may be faster than strings.

I think that the problem is your SpecialCode NVARCHAR(255). Strings comparison in Sql Server are very slow. Consider change your query to work with the IDs. And also, try to avoid the NVarchar. if dont care about Unicode, use Varchar instead.
Also, check your database collation to see if it matches the instance collation. Make sure you are not having hard disk performance issues.

Stop SQL Server Evaluating Useless UPPER/LOWER In WHERE Clause?

it seems that despite the fact that SQL Server does not match on case in a WHERE clause it still honours UPPER/LOWER in a WHERE clause which seems to be quite expensive. Is it possible to instruct SQL Server to disregard UPPER/LOWER in a WHERE clause?
This might seem like a pointless question but it's very nice to be able to write a single query for both Oracle and SQL Server.
Thanks, Jamie

The short answer to your question is no - you can't have SQL server magically ignore function calls in the WHERE clause.
As others have said, the performance issue is caused because, on SQL Server, using a function in the WHERE clause prevents the use of an index and forces a table scan.
To get best performance, you need to maintain two queries, one for each RDBMS platform (either in your application or in database objects like stored procedures or views). Given that so many other areas of functionality differ between Oracle and SQL Server, you're likely to end up doing it anyway, for something else if not for this.

So you mean something like:
WHERE YourColumn = #YourValue collate Latin1_General_BIN
But if you want it to work without the collate keyword, you could just set the collation of the column to something which is case insensitive.
Bear in mind that an index on YourColumn will be using a particular collation, so if you specify the collation in the WHERE clause (rather than on the column itself), an index will be less useful. I liken this to the fact that when I flew in Sweden a few years ago, I couldn't find Vasteras on the map, because the letters I thought were a actually had accents on them and were located at the end of the alphabet. The index in the back of the map wasn't so good when I was trying to use the wrong collation.

What fields should be indexed on a given table?

I've a table with a lot of registers (more than 2 million). It's a transaction table but I need a report with a lot of joins. Whats the best practice to index that table because it's consuming too much time.
I'm paging the table using the storedprocedure paging method but I need an index because when I want to export the report I need to get the entire query without pagination and to get the total records I need a select all.
Any help?

The SQL Server 2008 Management Studio query tool, if you turn on "Include Actual Execution Plan", will tell you what indexes a given query needs to run fast. (Assuming there's an obvious missing index that is making the query run unusually slow, that is.)
SQL Server 2008 Management Studio Query Screenshot http://img208.imageshack.us/img208/4108/image4sy8.png
We use this all the time on Stack Overflow.. one of the best features of SQL 2008. It works against older SQL instances as well, just install the SQL 2008 tools and point them at a SQL 2005 instance. Not sure if it works on anything earlier, though.
As others have noted, you can also do this manually, but it takes a bit of trial and error. You'll want indexes on fields that are used in ORDER BY and WHERE clauses.

key fields have to be everithing in
the where clause ???
No, that would be overkill. Indexing a field really only works if a) your WHERE clause is selective enough (that is: only selects out about 1-2% of the values; an index on a "Gender" field which can be only one of two or three possible values is pointless), and b) your WHERE clause doesn't involve function calls or other magic.
In your case, TBL.Status might be a candidate - how many possible values are there? You select the '1' and '2' value - if there are hundreds of possible values, then it's a good choice.
On a side note:
this clause here: (TBL.Login IS NULL AND TBL.Login <> 'dev' ) is pretty pointless - if the value of TBL.login IS NULL, then it's DEFINITELY not 'dev' ..... so just the "IS NULL" will be more than sufficient......
The other field you might want to consider putting an index on is the TBL.Date, since you seem to select a range of dates here - that might be a good choice.
Also, on a general note: whenever possible, DO NOT use a SELECT * FROM ...... to select your fields. This causes a lot of overhead for SQL Server. SPECIFY your columns - and ONLY select those that you REALLY NEED - not just all of them for the heck of it.....

Check your queries, and find which fields are used to match them. Those are usually the best candidates!

SQL Server has a 'Database Engine Tuning Advisor' that could help you. This does not exist for SQL Server Express, but does for all other versions of SQL Server.
Load your query in a query window.
On the menu, click Query -> Analyze Query in Database Engine
Tuning Advisor
The tuning advisor will identify indexes that could be added to your table(s) to improve performance. In my experience, the tuning advisor doesn't always help, but most of the time it does. It's where I suggest you start.

ok this is the query in doing
SELECT
TBL.*
FROM
FOREINGDATABASE..TABLENAME TBL
LEFT JOIN Status S
ON TBL.Status = S.Number
WHERE
(TBL.ID = CASE #Reference WHEN 0 THEN TBL.ID ELSE #Reference END) AND
TBL.Date >= #FechaInicial AND
TBL.Date <= #FechaFinal AND
(TBL.Channel = CASE #Canal WHEN '' THEN TBL.Channel ELSE #Canal END)AND
(TBL.DocType = CASE #TipoDocumento WHEN '' THEN TBL.DocType ELSE #TipoDocumento END)AND
(TBL.Document = CASE #NumDocumento WHEN '' THEN TBL.Document ELSE #NumDocumento END)AND
(TBL.Login = CASE #Login WHEN '' THEN TBL.Login ELSE #Login END)AND
(TBL.Login IS NULL AND TBL.Login <> 'dev' ) AND
TBL.Status IN ('1','2')
key fields have to be everithing in the where clause ???

If I am not mistaken, please correct me if I am, I think you should create non-clustered Index on the fields of the conditions of the where clause. (Maybe this can be useful as a starting point to get some candidates for the indexes).
Good Luck

if an Index Scan instead of a seek is performed, the cause might be that the fields are not in the correct order in the index.

put indexes on all columns that you're joining and filtering on.
the use of indexes is also determined by the selectivity of the indexed column.
the best way would be to show us your query so we can try to improve it.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

How to speed up SQL like %pattern% searches? - sql-server

There may not be an answer, but there is a reason. When you use a search string with a leading wildcard, such as '%string', you're forcing the optimizer to do a table scan. You might want to revisit some of the suggestions in this thread. Good luck!

Related

T-SQL -- faster way to do match for NOT LIKE '%[^0-9]%' used in CASE/WHEN statement

Oracle 11g: Index not used in "select distinct"-query

Update with "not in" on huge table in SQL Server 2005

Stop SQL Server Evaluating Useless UPPER/LOWER In WHERE Clause?

What fields should be indexed on a given table?

Categories

Resources