How can I improve this stored procedure performance? - sql-server

I am trying to speed up the execution time of my stored procedure. One inner join in particular is taking around 5 seconds to execute. I looked at the execution plan and it seemed the bottle neck was on an inner join.
I tried creating a few non clustered indexes as there was a 65% cost for an index seek (nonclustered).
Forgive me if I did not provide enough information as I am not that accustomed to using indexes in sql.
Here is the query that takes ~5 seconds to execute as the tables contain a lot of data:
INSERT INTO TBL_1(TBL2.COLA, TBL4.COLA, TBL4.COLB, TBL4.COLC, TBL3.COLA)
SELECT TBL2.COLA, TBL4.COLA, TBL4.COLB, TBL4.COLC, TBL2.COLB
FROM TBL_2 TBL2 with(index(idx_tbl2IDX))
INNER JOIN TBL_3 TBL3 with(index(idx_tbl3IDX))
ON TBL2.COLB = TBL3.COLB
INNER JOIN TBL_4 TBL4 with(index(idx_tbl4IDX))
ON TBL3.COLA = TBL4.COLD
AND TBL4.COLA % 1000 = TBL3.COLC
AND TBL4.COLE = 0
WHERE TBL2.COLC = 1
And here are my indexes (i originally just created one for TBL_4 since that is where the biggest cost in the execution plan was but i ended up creating one for each table to see if it made any difference, which it didn't
CREATE NONCLUSTERED INDEX [idx_tbl4IDX]
ON [dbo].TBL_4(COLD, COLA, COLE)
INCLUDE (COLB, COLC);
CREATE NONCLUSTERED INDEX [idx_tbl3IDX]
ON [dbo].TBL_3 (COLB, COLA, COLC)
CREATE NONCLUSTERED INDEX [idx_tbl2IDX]
ON [dbo].TBL_2(COLB, COLC)
INCLUDE (COLA);
I realize this may be a bit confusing as I renamed all the columns and tables, if it makes no sense please let me know and I will try and use better naming conventions.

Perhaps post the actual execution plan, but it's likely that this
AND TBL4.COLA % 1000 = TBL3.COLC
is causing the slowness. The order of the columns in the index also might play into this, depending on how big your dataset is. Try ordering them from Most to Least selective. For instance, if TBL4.COLE is a 1/0 value and there are very few 0's, then perhaps make that the first column in your index.

Without knowing number of rows, selectivity etc. it is really hard to say anything. I would suggest
Remove all those with(index... (and never return them back)
Update statistics for all tables (e.g. UPDATE STATISTICS TBL_2 WITH FULLSCAN)
Add all possible indexes. There are 6 for tables TBL_3 and TBL_4 and two for TBL_2.
Run the query, see which indexes are used and what the time is.
If the time is ok, You can just delete indexes You do not need. If it is not, You would probably need to do something with the % 1000. You can make calculated persisted column and index that instead.

Related

Is it possible to use join hint with a cross join in T-SQL?

Is it possible to use join hint with a cross join in T-SQL? If so what is the syntax?
select *
from tableA
cross ? join tableB
Based on your comments
I am trying to fix my execution plan, my estimated rows are very off
in the nested loop join. I have changed a cursor to a cross join...
The code is faster now with the cross join, but I want to make it even
faster. So I just want to experiment with a join hint...
I have 900 out af 2000000 as actual and estimated for the nested loop
join..And I think it is the step where the cross join is happening...
it is a table from ETL so a lot of new data every day ..
I have a few suggestions
Don't go straight for a cross join. If it's doing a nested loop join because of really bad cardinality estimation, try using a hash join hint instead
It definitely can help to have statistics up-to-date (research the 'Ascending Key Problem' for info). However, you may want to check if your statistics are set to auto-update and whether they get triggered (e.g., after the ETL, view the properties of the statistics to see when they were last updated etc)
Try to fix the bad cardinality estimate. One way is to split the bigger tasks into smaller tasks (e.g., into temporary tables).
On the chance you're using table variables (e.g., DECLARE #temptable TABLE) rather than temporary tables (e.g., CREATE TABLE #TempTable) then stop it. Variables (including table variables) don't have statistics. Older versions often assume 1 row in table variables. SQL Server 2019 (as long as you're in the latest compatibility mode) has some changes to this, but still has some big issues.
When you get it down to the one operation that has the bad cardinality estimate, you can also do things like adding indexes/etc to help with that estimate (remember - you can put indexes and primary keys on temporary tables - they can speed up processing too if the table is accessed multiple times).

is index still effective after data has been selected?

I have two tables that I want to join, they both have index on the column I am trying to join.
QUERY 1
SELECT * FROM [A] INNER JOIN [B] ON [A].F = [B].F;
QUERY 2
SELECT * FROM (SELECT * FROM [A]) [A1] INNER JOIN (SELECT * FROM B) [B1] ON [A1].F=[B1].F
the first query clearly will utilize the index, what about the second one?
after the two select statements in the brackets are executed, then join would occur, but my guess is the index wouldn't help to speed up the query because it is pretty much a new table..
The query isn't executed quite so literally as you suggest, where the inner queries are executed first and then their results are combined with the outer query. The optimizer will take your query and will look at many possible ways to get your data through various join orders, index usages, etc. etc. and come up with a plan that it feels is optimal enough.
If you execute both queries and look at their respective execution plans, I think you will find that they use the exact same one.
Here's a simple example of the same concept. I created my schema as so:
CREATE TABLE A (id int, value int)
CREATE TABLE B (id int, value int)
INSERT INTO A (id, value)
VALUES (1,900),(2,800),(3,700),(4,600)
INSERT INTO B (id, value)
VALUES (2,800),(3,700),(4,600),(5,500)
CREATE CLUSTERED INDEX IX_A ON A (id)
CREATE CLUSTERED INDEX IX_B ON B (id)
And ran queries like the ones you provided.
SELECT * FROM A INNER JOIN B ON A.id = B.id
SELECT * FROM (SELECT * FROM A) A1 INNER JOIN (SELECT * FROM B) B1 ON A1.id = B1.id
The plans that were generated looked like this:
Which, as you can see, both utilize the index.
Chances are high that the SQL Server Query Optimizer will be able to detect that Query 2 is in fact the same as Query 1 and use the same indexed approach.
Whether this happens depends on a lot of factors: your table design, your table statistics, the complexity of your query, etc. If you want to know for certain, let SQL Server Query Analyzer show you the execution plan. Here are some links to help you get started:
Displaying Graphical Execution Plans
Examining Query Execution Plans
SQL Server uses predicate pushing (a.k.a. predicate pushdown) to move query conditions as far toward the source tables as possible. It doesn't slavishly do things in the order you parenthesize them. The optimizer uses complex rules--what is essentially a kind of geometry--to determine the meaning of your query, and restructure its access to the data as it pleases in order to gain the most performance while still returning the same final set of data that your query logic demands.
When queries become more and more complicated, there is a point where the optimizer cannot exhaustively search all possible execution plans and may end up with something that is suboptimal. However, you can pretty much assume that a simple case like you have presented is going to always be "seen through" and optimized away.
So the answer is that you should get just as good performance as if the two queries were combined. Now, if the values you are joining on are composite, that is they are the result of a computation or concatenation, then you are almost certainly not going to get the predicate push you want that will make the index useful, because the server won't or can't do a seek based on a partial string or after performing reverse arithmetic or something.
May I suggest that in the future, before asking questions like this here, you simply examine the execution plan for yourself to validate that it is using the index? You could have answered your own question with a little experimentation. If you still have questions, then come post, but in the meantime try to do some of your own research as a sign of respect for the people who are helping you.
To see execution plans, in SQL Server Management Studio (2005 and up) or SQL Query Analyzer (SQL 2000) you can just click the "Show Execution Plan" button on the menu bar, run your query, and switch to the tab at the bottom that displays a graphical version of the execution plan. Some little poking around and hovering your mouse over various pieces will quickly show you which indexes are being used on which tables.
However, if things aren't as you expect, don't automatically think that the server is making a mistake. It may decide that scanning your main table without using the index costs less--and it will almost always be right. There are many reasons that scanning can be less expensive, one of which is a very small table, another of which is that the number of rows the server statistically guesses it will have to return exceeds a significant portion of the table.
These both queries are same. The second query will be transformed just same as first one during transformation.
However, if you have specific requirement I would suggest that you put the whole code.Then It would be much easier to answer your question.

SQL Server last 25 records query optimization

I have 4million records in one of my tables. I need to get the last 25 records that have been added in the last 1 week.
This is how my current query looks
SELECT TOP(25) [t].[EId],
[t].[DateCreated],
[t].[Message]
FROM [dbo].[tblEvent] AS [t]
WHERE ( [t].[DateCreated] >= Dateadd(DAY, Datediff(DAY, 0, Getdate()) - 7, 0)
AND [t].[EId] = 1 )
ORDER BY [t].[DateCreated] DESC
Now I do not have any indexes running for this table and do not intend to have one. This query takes about 10-15 seconds to run and my apps times-out, now is there a way to better it?
You should create an index on EId, DateCreated or at least DateCreated
Without this only way of optimising this that I can think of would be to maintain the last 25 in a separate table via an insert trigger (and possibly update and delete triggers as well).
If you have an ID in the table that is autoincrement (not the Eid but a separate PK) you can order by ID desc instead of DateCreated, that might make your order by faster.
otherwise you do need an index (but your question says you do not want that).
If the table has no indexes to support the query you are going to be forced to perform a table scan.
You are going to struggle to get around the table scan aspect of that - and as the table grows, the response time will get slower.
You are going to have to endevour to educate your client as to the problems going forward they face, and that they should consider an index. They may be saying no, you need to show the evidence to support the reasoning, show them times with / without, and make sure the impact to the record insertion is also shown, it's a relatively simple cost / benefit / detriment for the adding of the index / not adding of it. If they insist on no index, then you have no choice but to extend your timeouts.
You should also try query hint:
http://msdn.microsoft.com/en-us/library/ms181714.aspx
With option FAST n -- number of rows.

Sql serve Full Text Search with Containstable is very slow when Used in JOIN!

I am using sql 2008 full text search and I am having serious issues with performance depending on how I use Contains or ContainsTable.
Here are sample: (table one has about 5000 records and there is a covered index on table1 which has all the fields in the where clause. I tried to simplify the statements so forgive me if there is syntax issues.)
Scenario 1:
select * from table1 as t1
where t1.field1=90
and t1.field2='something'
and Exists(select top 1 * from containstable(table1,*, 'something') as t2
where t2.[key]=t1.id)
results: 10 second (very slow)
Scenario 2:
select * from table1 as t1
join containstable(table1,*, 'something') as t2 on t2.[key] = t1.id
where t1.field1=90
and t1.field2='something'
results: 10 second (very slow)
Scenario 3:
Declare #tbl Table(id uniqueidentifier primary key)
insert into #tbl select {key] from containstable(table1,*, 'something')
select * from table1 as t1
where t1.field1=90
and t1.field2='something'
and Exists(select id from #tbl as tbl where id=req1.id)
results: fraction of a second (super fast)
Bottom line, it seems if I use Containstable in any kind of join or where clause condition of a select statement that also has other conditions, the performance is really bad. In addition if you look at profiler, the number of reads from the database goes to the roof. But if I first do the full text search and put results in a table variable and use that variable everything goes super fast. The number of reads are also much lower. It seems in "bad" scenarios, somehow it gets stuck in a loop which causes it to read many times from teh database but of course I don't understant why.
Now the question is first of all whyis that happening? and question two is that how scalable table variables are? what if it results to 10s of thousands of records? is it still going to be fast.
Any ideas?
Thanks
I spent quite sometime on this issue, and based on running many scenarios, this is what I figured out:
if you have Contains or ContainsTable anywhere in your query, that is the part that gets executed first and rather independently. Meaning that even if the rest of the conditions limit your search to only one record, neither Contains nor containstable care about that. So this is like a parallel execution.
Now since fulltext search only returns a Key field, it immediately looks for the Key as the first field of other indexes chosen for the query. So for the example above, it looks for the index with [key], field1, field2. The problem is that it chooses an index for the rest of query based on the fields in the where clause. so for the example above it picks the covered index that I have which is something like field1, field2, Id. (Id of the table is the same as the [Key] returned from the full text search). So summary is:
executes containstable
executes the rest of the query and pick an index based on where clause of the query
It tries to merge these two. Therefore, if the index that it picked for the rest of the query starts with the [key] field, it is fine. However, if the index doesn't have the [key] field as the first key, it starts doing loops. It does not even do a table scan, otherwise going through 5000 records would not be that slow. The way it does the loop is that it runs the loop for the total number of results from FTS multiplied by total number of results from the rest of the query. So if the FTS is returning 2000 records and the rest of the query returns 3000, it loops 2000*3000= 6,000,000. I donot understand why.
So in my case it does the full text search, then it does he rest of the query but picks the covered index that I have which is based on field1, field2,id (which is wrong) and as the result it screws up. If I change my covered index to Id, field1, field2 everything would be very fast.
My expection was that FTS returns bunch of [key], the rest of the query return bunch of [Id] and then the Id should be matched against [key].
Of course, I tried to simplify my query here, but the actual query is much more complicated and I cannot just change the index. I also do have scenarios where the text passed in full text is blank and in those scenarios I donot even want to join with containstable.
In those cases changing my covered index to have the id field as the first field, will generate disaster.
Anyways, for now I chose the temp table solution since it is working for me. I am also limiting the result to a few thousand which helps with the potential performance issues of table variables when the number of records go too high.
thanks
Normally it works very fast:
select t1.*, t2.Rank
from containstable(table1, field2, 'something') as t2
join table1 as t1 ON t1.id = t2.Key AND t1.field1=90
order by t2.Rank desc
There is a big difference where you put your search criteria: in JOIN or in WHERE.
I'm going to take a guess here that your issue is the same as on the other thread I linked to. Are you finding the issue arises with multiple word search terms?
If so my answer from that thread will apply.
From http://technet.microsoft.com/en-us/library/cc721269.aspx#_Toc202506240
The most important thing is that the
correct join type is picked for
full-text query. Cardinality
estimation on the FulltextMatch STVF
is very important for the right plan.
So the first thing to check is the
FulltextMatch cardinality estimation.
This is the estimated number of hits
in the index for the full-text search
string. For example, in the query in
Figure 3 this should be close to the
number of documents containing the
term ‘word’. In most cases it should
be very accurate but if the estimate
was off by a long way, you could
generate bad plans. The estimation for
single terms is normally very good,
but estimating multiple terms such as
phrases or AND queries is more complex
since it is not possible to know what
the intersection of terms in the index
will be based on the frequency of the
terms in the index. If the cardinality
estimation is good, a bad plan
probably is caused by the query
optimizer cost model. The only way to
fix the plan issue is to use a query
hint to force a certain kind of join
or OPTIMIZE FOR.
So it simply cannot know from the information it stores whether the 2 search terms together are likely to be quite independent or commonly found together. Maybe you should have 2 separate procedures one for single word queries that you let the optimiser do its stuff on and one for multi word search terms that you force a "good enough" plan on (sys.dm_fts_index_keywords might help if you want to do a rough estimate of cardinality yourself).
If you are getting the issue with single word queries this passage from the linked article might apply.
In SQL Server 2008 full-text search we have the ability to alter the plan that is
generated based on a cardinality estimation of the search term used. If the query plan is fixed (as it is in a parameterized query inside a stored procedure), this step does
not take place. Therefore, the compiled plan always serves this query, even if this plan is not ideal for a given search term.
So you might need to use the RECOMPILE option.

Why is doing a top(1) on an indexed column in SQL Server slow?

I'm puzzled by the following. I have a DB with around 10 million rows, and (among other indices) on 1 column (campaignid_int) is an index.
Now I have 700k rows where the campaignid is indeed 3835
For all these rows, the connectionid is the same.
I just want to find out this connectionid.
use messaging_db;
SELECT TOP (1) connectionid
FROM outgoing_messages WITH (NOLOCK)
WHERE (campaignid_int = 3835)
Now this query takes approx 30 seconds to perform!
I (with my small db knowledge) would expect that it would take any of the rows, and return me that connectionid
If I test this same query for a campaign which only has 1 entry, it goes really fast. So the index works.
How would I tackle this and why does this not work?
edit:
estimated execution plan:
select (0%) - top (0%) - clustered index scan (100%)
Due to the statistics, you should explicitly ask the optimizer to use the index you've created instead of the clustered one.
SELECT TOP (1) connectionid
FROM outgoing_messages WITH (NOLOCK, index(idx_connectionid))
WHERE (campaignid_int = 3835)
I hope it will solve the issue.
Regards,
Enrique
I recently had the same issue and it's really quite simple to solve (at least in some cases).
If you add an ORDER BY-clause on any or some of the columns that's indexed it should be solved. That solved it for me at least.
You aren't specifying an ORDER BY clause in your query, so the optimiser is not being instructed as to the sort order it should be selecting the top 1 from. SQL Server won't just take a random row, it will order the rows by something and take the top 1, and it may be choosing to order by something that is sub-optimal. I would suggest that you add an ORDER BY x clause, where x being the clustered key on that table will probably be the fastest.
This may not solve your problem -- in fact I'm not sure I expect it to from the statistics you've given -- but (a) it won't hurt, and (b) you'll be able to rule this out as a contributing factor.
If the campaignid_int column is not indexed, add an index to it. That should speed up the query. Right now I presume that you need to do a full table scan to find the matches for campaignid_int = 3835 before the top(1) row is returned (filtering occurs before results are returned).
EDIT: An index is already in place, but since SQL Server does a clustered index scan, the optimizer has ignored the index. This is probably due to (many) duplicate rows with the same campaignid_int value. You should consider indexing differently or query on a different column to get the connectionid you want.
The index may be useless for 2 reasons:
700k in 10 million may be not selective enough
and /or
connectionid needs included so the entire query can used only an index
Otherwise, the optimiser decides it may as well use the PK/clustered index to both filter on campaignid_int and get connectionid, to avoid a bookmark lookup on 700k rows from the current index.
So, I suggest this...
CREATE NONCLUSTERED INDEX IX_Foo ON MyTable (campaignid_int) INCLUDE (connectionid)
This doesn't answer your question, but try using:
SET ROWCOUNT 1
SELECT connectionid
FROM outgoing_messages WITH (NOLOCK)
WHERE (campaignid_int = 3835)
I've seen top(x) perform very badly in certain situations as well. I'm sure it's doing a full table scan. Perhaps your index on that particular column needs to be rebuilt? The above is worth a try, however.
Your query does not work as you expect, because Sql Server keeps statistics about your index and in this particular case knows that there are a lot of duplicate rows with the identifier 3835, hence it figures that it would make more sense to just do a full index (or table) scan. When you test for an ID which resolves to only one row, it uses the index as expected, i.e. performs an index seek (the execution plan should verify this guess).
Possible solutions ? Make the index composite, if you have anything to compose it with, that is, e.g. compose it with the date the message was sent (if I understand your case correctly) and then select the top 1 entry from the list with the specified id ordered by the date. Though I'm not sure whether this would be better (for one, a composite index takes up more space) - just a guess.
EDIT: I just tried out the suggestion of making the index composite by adding a date column. If you do that and specify order by date in your query, an index seek is performed as expected.
but since I'm specifying 'top(1)' it
means: give me any row. Why would it
first crawl through the 700k rows just
to return one? – reinier 30 mins ago
Sorry, can't comment yet but the answer here is that SQL server is not going to understand the human equivalent of "Bring me the first one you find" when it hears "Top 1". Instead of the expected "Give me any row" SQL Server goes and fetches the first of all found rows.
Only time it knows that is after fetching all rows first, then discarding the rest. Very thorough but in your case not really fast.
Main issue as other said are your statistics and selectivity of your index. If you have another unique field in your table (like an identity column) then try an combined index on campaignid_int first, unique column second. As you only query on campaignid_int it has to be the first part of the key.
Sounds worth a try as this index should have a higher selectivity thus the optimizer can use this better than doing an index crawl.

Resources