I have 4 million records in one of my tables. I need to get the last 25 records that have been added in the last week.
This is how my current query looks:
SELECT TOP(25) [t].[EId],
[t].[DateCreated],
[t].[Message]
FROM [dbo].[tblEvent] AS [t]
WHERE ( [t].[DateCreated] >= Dateadd(DAY, Datediff(DAY, 0, Getdate()) - 7, 0)
AND [t].[EId] = 1 )
ORDER BY [t].[DateCreated] DESC
Now, I do not have any indexes on this table and do not intend to add one. This query takes about 10-15 seconds to run and my app times out. Is there a way to improve it?
You should create an index on (EId, DateCreated), or at least on DateCreated.
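If the client ever agrees, the index could look something like this (the index name is just a placeholder; DateCreated DESC matches the ORDER BY, and including Message makes the index cover the query):
CREATE NONCLUSTERED INDEX IX_tblEvent_EId_DateCreated
    ON dbo.tblEvent (EId, DateCreated DESC)
    INCLUDE ([Message]);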
Without this, the only way of optimising this that I can think of would be to maintain the last 25 in a separate table via an insert trigger (and possibly update and delete triggers as well).
If you have an ID column in the table that is auto-increment (not the EId, but a separate PK) you can ORDER BY ID DESC instead of DateCreated; that might make your ORDER BY faster.
Otherwise you do need an index (but your question says you do not want one).
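A rough sketch of that variant, assuming a hypothetical identity column named Id on tblEvent:
SELECT TOP (25) t.EId, t.DateCreated, t.[Message]
FROM dbo.tblEvent AS t
WHERE t.DateCreated >= DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) - 7, 0)
      AND t.EId = 1
ORDER BY t.Id DESC;   -- Id is hypothetical; higher values mean later inserts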
If the table has no indexes to support the query you are going to be forced to perform a table scan.
You are going to struggle to get around the table scan aspect of that - and as the table grows, the response time will get slower.
You are going to have to endeavour to educate your client about the problems they face going forward, and that they should consider an index. If they are saying no, you need to show the evidence to support the reasoning: show them timings with and without the index, and make sure the impact on record insertion is also shown. It's a relatively simple cost/benefit comparison for adding the index or not. If they insist on no index, then you have no choice but to extend your timeouts.
You should also try a query hint:
http://msdn.microsoft.com/en-us/library/ms181714.aspx
with OPTION (FAST n), where n is a number of rows.
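Applied to the query from the question, it would look roughly like this (FAST 25 only asks the optimizer to favour returning the first 25 rows quickly; it does not remove the underlying scan):
SELECT TOP (25) t.EId, t.DateCreated, t.[Message]
FROM dbo.tblEvent AS t
WHERE t.DateCreated >= DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) - 7, 0)
      AND t.EId = 1
ORDER BY t.DateCreated DESC
OPTION (FAST 25);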
Related
I am trying to speed up the execution time of my stored procedure. One inner join in particular is taking around 5 seconds to execute. I looked at the execution plan and it seemed the bottleneck was an inner join.
I tried creating a few non-clustered indexes, as there was a 65% cost for an index seek (nonclustered).
Forgive me if I did not provide enough information, as I am not that accustomed to using indexes in SQL.
Here is the query that takes ~5 seconds to execute as the tables contain a lot of data:
INSERT INTO TBL_1(TBL2.COLA, TBL4.COLA, TBL4.COLB, TBL4.COLC, TBL3.COLA)
SELECT TBL2.COLA, TBL4.COLA, TBL4.COLB, TBL4.COLC, TBL2.COLB
FROM TBL_2 TBL2 with(index(idx_tbl2IDX))
INNER JOIN TBL_3 TBL3 with(index(idx_tbl3IDX))
ON TBL2.COLB = TBL3.COLB
INNER JOIN TBL_4 TBL4 with(index(idx_tbl4IDX))
ON TBL3.COLA = TBL4.COLD
AND TBL4.COLA % 1000 = TBL3.COLC
AND TBL4.COLE = 0
WHERE TBL2.COLC = 1
And here are my indexes. (I originally just created one for TBL_4, since that is where the biggest cost in the execution plan was, but I ended up creating one for each table to see if it made any difference, which it didn't.)
CREATE NONCLUSTERED INDEX [idx_tbl4IDX]
ON [dbo].TBL_4(COLD, COLA, COLE)
INCLUDE (COLB, COLC);
CREATE NONCLUSTERED INDEX [idx_tbl3IDX]
ON [dbo].TBL_3 (COLB, COLA, COLC);
CREATE NONCLUSTERED INDEX [idx_tbl2IDX]
ON [dbo].TBL_2(COLB, COLC)
INCLUDE (COLA);
I realize this may be a bit confusing as I renamed all the columns and tables; if it makes no sense, please let me know and I will try to use better naming conventions.
Perhaps post the actual execution plan, but it's likely that this
AND TBL4.COLA % 1000 = TBL3.COLC
is causing the slowness. The order of the columns in the index also might play into this, depending on how big your dataset is. Try ordering them from Most to Least selective. For instance, if TBL4.COLE is a 1/0 value and there are very few 0's, then perhaps make that the first column in your index.
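For example, if COLE really is that selective, a reordered version of the TBL_4 index might look like the sketch below (only worth trying if the data backs up the selectivity assumption; the index name is made up):
CREATE NONCLUSTERED INDEX idx_tbl4IDX_v2
    ON [dbo].TBL_4 (COLE, COLD, COLA)
    INCLUDE (COLB, COLC);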
Without knowing the number of rows, selectivity, etc. it is really hard to say anything. I would suggest:
Remove all those with(index... hints (and never bring them back).
Update statistics for all tables (e.g. UPDATE STATISTICS TBL_2 WITH FULLSCAN).
Add all possible indexes. There are 6 for tables TBL_3 and TBL_4 and two for TBL_2.
Run the query, see which indexes are used and what the time is.
If the time is OK, you can just delete the indexes you do not need. If it is not, you would probably need to do something with the % 1000: you can make a persisted computed column and index that instead, as sketched below.
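A rough sketch of the persisted-column idea (the column and index names are made up):
ALTER TABLE dbo.TBL_4 ADD COLA_MOD1000 AS (COLA % 1000) PERSISTED;

CREATE NONCLUSTERED INDEX idx_tbl4_mod1000
    ON dbo.TBL_4 (COLD, COLA_MOD1000, COLE)
    INCLUDE (COLA, COLB, COLC);
The join predicate then becomes TBL4.COLA_MOD1000 = TBL3.COLC, which the optimizer can match with an index seek instead of evaluating % 1000 per row.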
I've been tasked with improving the performance (and this is my first real-world performance tuning task) of a reporting stored procedure which is called by an SSRS front-end. The stored procedure currently takes about 30 seconds to run on the largest amount of data (based on filters set from the report front-end).
This stored procedure breaks down into 19 queries, most of which transform the data from an initial (legacy) format in the base tables into a meaningful dataset to be displayed to the business side.
I've created a query based on a few DMVs in order to find out which are the most resource-consuming queries in the stored procedure (small snippet below), and I have found one query which takes about 10 seconds, on average, to complete.
select
object_name(st.objectid) [Procedure Name]
, dense_rank() over (partition by st.objectid order by qs.last_elapsed_time desc) [rank-execution time]
, dense_rank() over (partition by st.objectid order by qs.last_logical_reads desc) [rank-logical reads]
, dense_rank() over (partition by st.objectid order by qs.last_worker_time desc) [rank-worker (CPU) time]
, dense_rank() over (partition by st.objectid order by qs.last_logical_writes desc) [rank-logical write]
...
from sys.dm_exec_query_stats as qs
cross apply sys.dm_exec_sql_text (qs.sql_handle) as st
cross apply sys.dm_exec_text_query_plan (qs.plan_handle, qs.statement_start_offset, qs.statement_end_offset) as qp
where st.objectid in ( object_id('SuperDooperReportingProcedure') )
order by [rank-execution time]
, [rank-logical reads]
, [rank-worker (CPU) time]
, [rank-logical write] desc
Now, this query is a bit strange in the sense that the execution plan shows that the bulk of the work (~80%) is done when inserting the data into the local temporary table and not when interrogating the other tables from which the source data is taken and then manipulated. (screenshot below is from SQL Sentry Plan Explorer)
Also, the execution plan's row estimates are way off: there are only 4,218 rows inserted into the local temporary table, as opposed to the ~248k rows that the execution plan thinks it is moving into it. So, because of this, I'm thinking "statistics", but do those even matter if ~80% of the work is the actual insert into the table?
One of my first recommendations was to rewrite the entire process and the stored procedure so that the moving and transforming of the data is not done inside the reporting stored procedure, but instead happens nightly into some persisted tables (real-time data is not required, only data up to the end of the previous day). But the business side does not want to invest time and resources into redesigning this and instead "suggests" I do performance tuning, in the sense of finding where and what indexes I can add to speed this up.
I don't believe that adding indexes to the base tables will improve the performance of the report, since most of the time needed for running the query is spent saving the data into a temporary table (which, from my knowledge, will hit tempdb, meaning the rows are written to disk, so the time increases due to I/O latency).
But, even so, as I've mentioned this is my first performance tuning task. I've tried to read as much as possible related to this in the last couple of days, and these are my conclusions so far, but I'd like to ask a broader audience for advice and hopefully get a few more insights into what I can do to improve this procedure.
A few specific questions I'd appreciate having answered:
Is there anything incorrect in what I have said above (in my understanding of the db or my assumptions) ?
Is it true that adding an index to a temporary table will actually increase the time of execution, since the table (and its associated index(es)) is/are rebuilt on each execution?
Could anything else be done in this scenario without having to rewrite the procedure / queries, i.e. only via indexes or other tuning methods? (I've read a few article headlines saying that you can also "tune tempdb", but I didn't get into the details of those yet.)
Any help is very much appreciated and if you need more details I'll be happy to post.
Update (2 Aug 2016):
The query in question is (partially) below. What is missing are a few more aggregate columns and their corresponding lines in the GROUP BY section:
select
b.ProgramName
,b.Region
,case when b.AM IS null and b.ProgramName IS not null
then 'Unassigned'
else b.AM
end as AM
,rtrim(ltrim(b.Store)) Store
,trd.Store_ID
,b.appliesToPeriod
,isnull(trd.countLeadActual,0) as Actual
,isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date then b.budgetValue else 0 end),0) as Budget
,isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date and (trd.considerMe = -1 or b.StoreID < 0) then b.budgetValue else 0 end),0) as CleanBudget
...
into #SalvesVsBudgets
from #StoresBudgets b
left join #temp_report_data trd on trd.store_ID = b.StoreID and trd.newSourceID = b.ProgramID
where (b.StoreDivision is not null or (b.StoreDivision is null and b.ProgramName = 'NewProgram'))
group by
b.ProgramName
,b.Region
,case when b.AM IS null and b.ProgramName IS not null
then 'Unassigned'
else b.AM
end
,rtrim(ltrim(b.Store))
,trd.Store_ID
,b.appliesToPeriod
,isnull(trd.countLeadActual,0)
I'm not sure if this is actually helpful, but since @kcung requested it, I added the information.
Also, to answer some of his questions:
the temporary tables have no indexes on them
RAM size: 32 GB
Update (3 Aug 2016):
I have tried @kcung's suggestion of moving the CASE statements out of the aggregate-generating query and, unfortunately, overall the procedure time has not improved noticeably, as it still fluctuates in the range of ±0.25 to ±1.0 seconds (yes, both lower and higher than the original version of the stored procedure - but I'm guessing this is due to variable workload on my machine).
The execution plan for the same query, but modified to remove the CASE conditions, leaving only the SUM aggregates, is now:
Adding indexes to the temporary table will definitely improve read calls but will slow down write calls to the temporary table.
Here, as you mentioned, there are 19 queries executing in the procedure, so analyzing only one query's execution plan will not help much.
If possible, execute this query on its own and check how much time it takes (and how many rows are affected).
Another approach you may try (not sure if it is possible in your case) is using a table variable instead of a temporary table. Using a table variable over a temporary table has additional advantages: the procedure is pre-compiled, no transaction logs are maintained, and you don't need to write a DROP TABLE.
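A minimal sketch of the pattern, with a made-up, simplified column list (note that SELECT ... INTO cannot target a table variable, so the columns have to be declared up front and filled with INSERT ... SELECT):
DECLARE @SalesVsBudgets TABLE
(
    ProgramName varchar(100),
    Region      varchar(50),
    Store_ID    int,
    Budget      decimal(18, 2)
);

INSERT INTO @SalesVsBudgets (ProgramName, Region, Store_ID, Budget)
SELECT b.ProgramName,
       b.Region,
       trd.Store_ID,
       SUM(b.budgetValue)
FROM #StoresBudgets b
LEFT JOIN #temp_report_data trd
       ON trd.store_ID = b.StoreID
      AND trd.newSourceID = b.ProgramID
GROUP BY b.ProgramName, b.Region, trd.Store_ID;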
Any chance I can see the query? And the indexes on both tables?
How big is your RAM? How big is a row in each table (roughly)?
Can you update statistics for both tables and re-send the query plan?
To answer your questions:
You're mostly right, except for the part about adding indexes. Adding indexes will help the query do lookups. It will also give the query planner a chance to consider a nested loop join plan instead of the hash join plan. Unfortunately, I can't answer more until my questions are answered.
You shouldn't need to add an index to the temp table. Adding an index to this temp table (or any insert-destination table) will increase write time, because the insert will need to update that index. Just imagine an index as a copy of your table with less information that sits on top of your table and needs to be kept in sync with it. Every write (insert, update, delete) needs to update this index.
Looking at both tables' total rows, this query should run way faster than 10s, unless you have a lemon of a PC, in which case it's a different story.
EDIT:
Just want to point out, for point 2, that I didn't realise your source table is a temp table as well. A temporary table is destroyed after each session/connection ends. Adding an index to a temporary table means you will add extra time to create that index every time you create the temporary table.
EDIT:
Sorry, I'm using my phone now, so I'm going to be short.
So essentially 2 things:
Add a primary key on the temp table at creation time so you do it in one go. Don't bother adding a nonclustered index or any covering index; you will end up spending more time creating those.
Look at your query: all of the CASE WHEN statements, instead of doing them in this query, why don't you add them as another column in the table? Essentially you want to avoid calculation on the fly when doing the GROUP BY. You can leave the SUM() in the query as it's an aggregate query, but try to reduce run-time calculation as much as possible.
Sample :
case when b.AM IS null and b.ProgramName IS not null
then 'Unassigned'
else b.AM
end as AM
You can create a column named AM when creating table b.
Also those RTRIM and LTRIM calls: please remove those and do the trimming at table creation time. :)
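A rough sketch of both ideas combined, with hypothetical column types and a hypothetical source query (the real #StoresBudgets load from the legacy tables will look different):
CREATE TABLE #StoresBudgets
(
    RowID       int IDENTITY(1, 1) PRIMARY KEY,   -- PK declared at creation time
    ProgramID   int,
    ProgramName varchar(100),
    Region      varchar(50),
    AM          varchar(100),                     -- 'Unassigned' already applied at load time
    Store       varchar(100),                     -- already trimmed at load time
    StoreID     int,
    budgetType  tinyint,
    budgetMonth date,
    budgetValue decimal(18, 2)
);

INSERT INTO #StoresBudgets
    (ProgramID, ProgramName, Region, AM, Store, StoreID, budgetType, budgetMonth, budgetValue)
SELECT s.ProgramID,
       s.ProgramName,
       s.Region,
       CASE WHEN s.AM IS NULL AND s.ProgramName IS NOT NULL THEN 'Unassigned' ELSE s.AM END,
       RTRIM(LTRIM(s.Store)),
       s.StoreID,
       s.budgetType,
       s.budgetMonth,
       s.budgetValue
FROM dbo.SourceBudgets AS s;                      -- hypothetical source table
The aggregate query can then group on b.AM and b.Store directly, with no CASE or trimming at aggregation time.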
One suggestion is to increase the command timeout for the stored procedure call:
cmd.CommandTimeout = 200; // in seconds
You can also generate a report link and email it to the user once the report has been generated.
Other than that, use CTEs rather than temp tables wherever possible, as temp tables are more expensive.
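A minimal sketch of the CTE pattern, with hypothetical table and column names; the intermediate rowset exists only for the duration of the statement instead of being materialised in a #temp table:
;WITH RecentSales AS
(
    SELECT StoreID, SaleAmount
    FROM dbo.Sales                                -- hypothetical source table
    WHERE SaleDate >= DATEADD(DAY, -7, GETDATE())
)
SELECT StoreID, SUM(SaleAmount) AS TotalSales
FROM RecentSales
GROUP BY StoreID;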
I have a DB with a little bit more than 2m rows. It has startIpNum and endIpNum columns (the ranges don't overlap). I am running some queries against that table:
table:
Id | startIpNum (numeric(18,0)) | endIpNum (numeric(18,0)) | locId
Query 1:
select locId from Blocks
where startIpNum <= 1550084098 and endIpNum >= 1550084098
Query 2 (added this query hoping for better results):
select top 1 locId from Blocks
where endIpNum >= 1550084098
These queries take a reasonable time on their own, no problems. But I need to get around 100 different rows each time I open a web page, and that takes around 15 seconds, which is possibly expected, but not desired.
I believe that by working with indexes I can improve that performance, so I've added 2 indexes, one on start (asc) and one on end (desc), but performance is the same.
What else can I do to achieve better query performance?
Update
I have run the create index query you guys have proposed. No changes for now.
As requested, I am including the SQL query execution plans below (since I am not familiar with execution plans, I am only snipping screenshots from SSMS; go ahead and ask if something else is required to answer my case):
Execution plan of Query1:
Execution plan of Query2:
As mentioned, without an execution plan to look at this is going slightly blind, but the basic points are:
1) If indexes are in place to support this query, there is no point adding two. Only one of the indexes can be used, therefore you need one index that contains both columns.
2) Bringing back "*" means that a key lookup will be inevitable: having used the index to get the rows it needs, it will have to fetch the data not included in the index from the clustered index. Key lookups can get very expensive if you are bringing back large numbers of rows. If you can limit the columns you bring back, then you can use an INCLUDE to avoid the key lookup. You don't need to include the primary key in this list as it is part of the index anyway.
Having said this your best option will be something like:
CREATE INDEX ix_range ON dbo.yourTable (start, end) INCLUDE (<list_of_columns_in_your_select>)
Looking at your query plan it is also clear that a CONVERT_IMPLICIT is being performed on your parameters @1 and @2. These should be avoided so do the following:
DECLARE @1numeric numeric(18, 0),
        @2numeric numeric(18, 0)
SELECT @1numeric = CAST(@1 AS numeric(18, 0)),
       @2numeric = CAST(@2 AS numeric(18, 0))
SELECT locId FROM Blocks
WHERE startIpNum <= @1numeric and endIpNum >= @2numeric
Try explicitly casting the compared value to match the column type.
select * from that_table
where CAST(123123123 as Numeric(18,0)) between start and end
I assume SQL Server is losing the index seek due to the implicit cast.
I have a very large table (150m+ rows) in SQL Server 2012 (web edition) that has no clustered index and one non-clustered index.
When I run this delete statement:
DELETE TOP(500000)
FROM pick
WHERE tournament_id < 157
(column name is in the non-clustered index), the execution plan produced by SQL Server looks like this:
The sort step looks problematic - it takes up 45% of the cost, and it is causing an alert saying "operator used tempdb to spill data during execution." The query is taking several minutes to run, and I feel like it should be quicker.
Two questions:
Why is there a sort step in the plan?
Any ideas how to overcome the spill? The server has 64 GB of RAM and tempdb is sized at 8 x 4 GB data files.
I can definitely revisit the indexing strategy on this table if that might help.
Hope this all makes sense - thanks in advance for any tips.
I agree that there seems to be no good reason for a sort here.
I don't think it is needed for Halloween protection as it doesn't show up in the = 157 version of the plan.
Also, the sort operation is sorting in order of Key ASC, Bmk ASC (presumably to get them ordered sequentially in index order), but this is the order in which the forward index seek on the very same index returns the rows anyway.
One way of removing it would be to obfuscate the TOP to get a narrow (per row) rather than a wide (per index) plan.
DECLARE @N INT = 500000
DELETE TOP(@N)
FROM pick
WHERE tournament_id < 157
OPTION (OPTIMIZE FOR (@N=1))
You'd need to test to see if this actually improved things or not.
I would try smaller chunks and a more selective WHERE clause, as well as a way to force SQL Server to pick the TOP rows in an order you specify:
;WITH x AS
(
SELECT TOP (10000) tournament_id
FROM dbo.pick
WHERE tournament_id < 157 -- AND some other where clause perhaps?
ORDER BY tournament_id -- , AND some other ordering column
)
DELETE x;
More selective could also mean deleting tournament_id < 20, then tournament_id < 40, etc. etc. instead of picking 500000 random rows from 1-157. Typically it's better for your system overall (both in terms of blocking impact, lock escalations etc., as well as impact to the log) to perform a series of small transactions rather than one large one. I blogged about this here: http://www.sqlperformance.com/2013/03/io-subsystem/chunk-deletes
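A hedged sketch of the chunked approach (the batch size is arbitrary and would need tuning; add a CHECKPOINT or log backup between batches depending on the recovery model):
DECLARE @batch int = 10000,
        @rows  int = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (@batch)
    FROM dbo.pick
    WHERE tournament_id < 157;

    SET @rows = @@ROWCOUNT;
END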
The sort may still be present in these cases (particularly if it is for Hallowe'en protection or something to do with the RID), but it may be far less problematic at a smaller scale (please don't go just based on that estimated cost % number, because often those numbers are garbage). So first I would really consider adding a clustered index. Without more requirements I don't have an explicit suggestion for you, but it could be as simple as a clustered index only on tournament_id (depending on how many potential rows you have per id) or adding an IDENTITY column which you could potentially use to help determine rows to delete in the future.
I'd try the following steps:
Create a clustered index on the column tournament_id (see the sketch after this list).
Update statistics for your database.
Run your query again.
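Something like this, assuming tournament_id is an acceptable (non-unique) clustering key for this table:
CREATE CLUSTERED INDEX CIX_pick_tournament_id
    ON dbo.pick (tournament_id);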
From my experience this should save you some seconds.
In addition, I'd use a more selective query against your table, if possible.
Version 1 (with date format dd/mm/yyyy):
;WITH To_Delete AS
(
    SELECT tournament_id
    FROM dbo.pick
    WHERE tournament_id < 157
      AND (date LIKE '01/%/2013'   -- if available, needs to be customized
        OR date LIKE '03/%/2013')  -- if available, needs to be customized
)
DELETE FROM To_Delete;
Version 2 (with the MONTH function, no matter which format your date has):
;WITH To_Delete AS
(
    SELECT tournament_id
    FROM dbo.pick
    WHERE tournament_id < 157
      AND MONTH(date) >= 1
      AND MONTH(date) < 3
)
DELETE FROM To_Delete;
I'm puzzled by the following. I have a DB with around 10 million rows, and (among other indices) there is an index on one column (campaignid_int).
Now I have 700k rows where the campaignid is indeed 3835.
For all these rows, the connectionid is the same.
I just want to find out this connectionid.
use messaging_db;
SELECT TOP (1) connectionid
FROM outgoing_messages WITH (NOLOCK)
WHERE (campaignid_int = 3835)
Now this query takes approx. 30 seconds to run!
I (with my limited DB knowledge) would expect that it would just take any of the rows and return me that connectionid.
If I test this same query for a campaign which only has 1 entry, it goes really fast. So the index works.
How would I tackle this and why does this not work?
edit:
estimated execution plan:
select (0%) - top (0%) - clustered index scan (100%)
Due to the statistics, you should explicitly ask the optimizer to use the index you've created instead of the clustered one.
SELECT TOP (1) connectionid
FROM outgoing_messages WITH (NOLOCK, index(idx_connectionid))
WHERE (campaignid_int = 3835)
I hope it will solve the issue.
I recently had the same issue and it's really quite simple to solve (at least in some cases).
If you add an ORDER BY clause on any or some of the columns that are indexed, it should be solved. That solved it for me, at least.
You aren't specifying an ORDER BY clause in your query, so the optimiser is not being instructed as to the sort order it should select the top 1 from. SQL Server won't just take a random row; it will order the rows by something and take the top 1, and it may be choosing to order by something that is sub-optimal. I would suggest adding an ORDER BY x clause, where x being the clustered key on that table will probably be the fastest.
This may not solve your problem -- in fact I'm not sure I expect it to from the statistics you've given -- but (a) it won't hurt, and (b) you'll be able to rule this out as a contributing factor.
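For example (the id column here is a stand-in for whatever the clustered key actually is):
SELECT TOP (1) connectionid
FROM outgoing_messages WITH (NOLOCK)
WHERE campaignid_int = 3835
ORDER BY id   -- replace with the table's clustered key column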
If the campaignid_int column is not indexed, add an index to it. That should speed up the query. Right now I presume that you need to do a full table scan to find the matches for campaignid_int = 3835 before the top(1) row is returned (filtering occurs before results are returned).
EDIT: An index is already in place, but since SQL Server does a clustered index scan, the optimizer has ignored the index. This is probably due to (many) duplicate rows with the same campaignid_int value. You should consider indexing differently or querying on a different column to get the connectionid you want.
The index may be useless for 2 reasons:
700k out of 10 million rows may not be selective enough
and/or
connectionid needs to be included so the entire query can be answered from the index alone.
Otherwise, the optimiser decides it may as well use the PK/clustered index to both filter on campaignid_int and get connectionid, to avoid a bookmark lookup on 700k rows from the current index.
So, I suggest this...
CREATE NONCLUSTERED INDEX IX_Foo ON MyTable (campaignid_int) INCLUDE (connectionid)
This doesn't answer your question, but try using:
SET ROWCOUNT 1
SELECT connectionid
FROM outgoing_messages WITH (NOLOCK)
WHERE (campaignid_int = 3835)
I've seen top(x) perform very badly in certain situations as well. I'm sure it's doing a full table scan. Perhaps your index on that particular column needs to be rebuilt? The above is worth a try, however.
Your query does not work as you expect because SQL Server keeps statistics about your index and, in this particular case, knows that there are a lot of duplicate rows with the identifier 3835, hence it figures that it would make more sense to just do a full index (or table) scan. When you test with an ID which resolves to only one row, it uses the index as expected, i.e. performs an index seek (the execution plan should verify this guess).
Possible solutions? Make the index composite, if you have anything to compose it with, e.g. the date the message was sent (if I understand your case correctly), and then select the top 1 entry from the list with the specified id, ordered by the date. Though I'm not sure whether this would be better (for one, a composite index takes up more space) - just a guess.
EDIT: I just tried out the suggestion of making the index composite by adding a date column. If you do that and specify order by date in your query, an index seek is performed as expected.
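Roughly along these lines, where sent_date is a hypothetical name for the date column:
CREATE NONCLUSTERED INDEX IX_outgoing_messages_campaign_date
    ON dbo.outgoing_messages (campaignid_int, sent_date);

SELECT TOP (1) connectionid
FROM outgoing_messages
WHERE campaignid_int = 3835
ORDER BY sent_date DESC;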
"but since I'm specifying 'top(1)' it means: give me any row. Why would it first crawl through the 700k rows just to return one?" – reinier
Sorry, I can't comment yet, but the answer here is that SQL Server is not going to understand the human equivalent of "bring me the first one you find" when it hears "TOP 1". Instead of the expected "give me any row", SQL Server goes and fetches the first of all found rows.
The only time it knows which row that is, is after fetching all matching rows first and then discarding the rest. Very thorough, but in your case not really fast.
The main issue, as others said, is your statistics and the selectivity of your index. If you have another unique field in your table (like an identity column) then try a combined index on campaignid_int first, unique column second. As you only query on campaignid_int, it has to be the first part of the key.
It sounds worth a try, as this index should have higher selectivity, so the optimizer can use it better than doing an index crawl.
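A sketch of that combined index, assuming a hypothetical identity column called id:
CREATE NONCLUSTERED INDEX IX_campaignid_id
    ON dbo.outgoing_messages (campaignid_int, id);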