How can I speed up this SQL Server query? - sql-server

-- Holds the last 30 valuation dates
create table #valdates(
date int
)
insert into #valdates
select distinct top (30) valuation_date
from tbsm.tbl_key_rates_summary
where valuation_date <= 20150529
order by valuation_date desc
select
sum(fv_change), sc_group, valuation_date
from
(select *
from tbsm.tbl_security_scorecards_summary
where valuation_date in (select date from #valdates)) as fact
join
(select *
from tbsm.tbl_security_classification
where sc_book = 'UC' ) as dim on fact.classification_id = dim.classification_id
group by
valuation_date, sc_group
drop table #valdates
This query takes around 40 seconds to return because the fact table has almost 13 million rows. Can I do anything about this?

Since there is no index that properly supports this query, adding one is probably the easiest (or only) way to really improve the performance. Most likely an index like this would improve the situation a lot:
create index idx_security_scorecards_summary_1 on
tbl_security_scorecards_summary (valuation_date, classification_id)
include (fv_change)
Everything depends, of course, on how good the selectivity of the valuation_date and classification_id fields is (that is, how big a portion of the table has to be fetched), and the index might work better with the fields in the opposite order. The fv_change field is in the include section so that it is stored in the index structure and there is no need to fetch it from the base table.
Included fields help when the query has to fetch a lot of rows from the table; if the number of rows touched is small, they might not help at all. As always with indexing, this slows down inserts and updates, and the index is optimized for this query only, so you should also look at the bigger picture.
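If classification_id turns out to be the more selective of the two, the opposite column order mentioned above would look like this (the index name is just a placeholder):
create index idx_security_scorecards_summary_2 on
tbl_security_scorecards_summary (classification_id, valuation_date)
include (fv_change)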
The select is written in a slightly unusual way. I'm not sure that makes any difference, but you could also try writing it the normal way:
select
sum(fact.fv_change), dim.sc_group, fact.valuation_date
from
tbsm.tbl_security_scorecards_summary fact
join tbsm.tbl_security_classification dim
on fact.classification_id = dim.classification_id
where
fact.valuation_date in (select date from #valdates) and
dim.sc_book = 'UC'
group by
fact.valuation_date,
dim.sc_group
Looking at "statistics io" output should give you a good idea which table is causing the slowness, and looking at query plan to see if there's any strange operators might help to understand the situation better.

Related

Optimize SQL in MS SQL Server that returns more than 90% of records in the table

I have the SQL below:
SELECT Cast(Format(Sum(COALESCE(InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet,
BP.BoundProjectId AS ProjectId
FROM BoundProducts BP
WHERE ( BP.IsDeleted IS NULL
OR BP.IsDeleted = 0 )
GROUP BY BP.BoundProjectId
I already have an index on the BoundProducts table with this column order: (BoundProjectId, IsDeleted).
Currently this query takes around 2-3 seconds to return the result. I am trying to reduce it to zero seconds.
This query returns 25077 rows as of now.
Please give me any ideas to improve the query.
Looking at this from a slightly different point of view: I think your OR condition is hurting the query, so why not rewrite it like this?
SELECT CAST(FORMAT(SUM(COALESCE(BP.InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet
, BP.BoundProjectId AS ProjectId
FROM (
SELECT BP.BoundProjectId, BP.InstalledSubtotal
FROM dbo.BoundProducts AS BP
WHERE BP.IsDeleted IS NULL
UNION ALL
SELECT BP.BoundProjectId, BP.InstalledSubtotal
FROM dbo.BoundProducts AS BP
WHERE BP.IsDeleted = 0
) AS BP
GROUP BY BP.BoundProjectId;
I've had better experience with UNION ALL rather than OR.
I think it should behave exactly the same. On top of that, I'd create this index:
CREATE NONCLUSTERED INDEX idx_BoundProducts_IsDeleted_BoundProjectId_iInstalledSubTotal
ON dbo.BoundProducts (IsDeleted, BoundProjectId)
INCLUDE (InstalledSubTotal);
It should satisfy your query conditions and allow an index seek quite nicely. I know it's usually not considered a good idea to index bit fields, but it's worth trying here.
P.S. Why not default your IsDeleted column to 0 and make it NOT NULL? Then a simple WHERE IsDeleted = 0 check would be enough, and that would help the query too.
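For illustration, that change could look roughly like this (assuming IsDeleted is a bit column; the constraint name is a placeholder, and you'd want to test it on a copy first):
UPDATE dbo.BoundProducts SET IsDeleted = 0 WHERE IsDeleted IS NULL;

ALTER TABLE dbo.BoundProducts ALTER COLUMN IsDeleted bit NOT NULL;

ALTER TABLE dbo.BoundProducts
ADD CONSTRAINT DF_BoundProducts_IsDeleted DEFAULT (0) FOR IsDeleted;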
If you really want to try an index seek, it should be possible with the FORCESEEK query hint, but I don't think it's going to make it any faster.
The options I suggested last time are still valid: remove FORMAT and/or create an indexed view.
You should also test whether the problem is the query itself or just displaying the results afterwards, for example by trying it with SELECT ... INTO #tmp. If that's fast, then the problem is not the query.
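For example, a throwaway version of that test using the query from the question:
SELECT CAST(FORMAT(SUM(COALESCE(BP.InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet,
BP.BoundProjectId AS ProjectId
INTO #tmp
FROM dbo.BoundProducts AS BP
WHERE BP.IsDeleted IS NULL OR BP.IsDeleted = 0
GROUP BY BP.BoundProjectId;

DROP TABLE #tmp;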
The index name in the screenshot is not the same as in the CREATE statement, but I assume that's just a name you changed for the question. If the scan is happening on another index, then you should include that one too.

SQL Server Performance With Large Query

Hi everyone. I have a couple of queries for some reports in which each query pulls data from 35+ tables. Each table has almost 100K records. All the queries use UNION ALL, for example:
;With CTE
AS
(
Select col1, col2, col3 FROM Table1 WHERE Some_Condition
UNION ALL
Select col1, col2, col3 FROM Table2 WHERE Some_Condition
UNION ALL
Select col1, col2, col3 FROM Table3 WHERE Some_Condition
UNION ALL
Select col1, col2, col3 FROM Table4 WHERE Some_Condition
.
.
. And so on
)
SELECT col1, col2, col3 FROM CTE
ORDER BY col3 DESC
So far I have only tested this query on the dev server, and I can see it takes its time to return the results. These 35+ tables are not related to each other, and this is the only way I can think of to get all the desired data into one result set.
Is there a better way to do this kind of query ??
If this is the only way to go for this kind of query, how can I improve its performance by making whatever changes are possible?
My Opinion
I don't mind having a few dirty reads in this report. I was thinking of using the NOLOCK query hint or setting the transaction isolation level to READ UNCOMMITTED.
Will any of this help ???
Edit
Every table has 5-10 bit columns, each with a corresponding date column, and my condition for each SELECT statement is something like
WHERE BitColumn = 1 AND DateColumn IS NULL
Suggestion By Peers
Filtered Index
CREATE NONCLUSTERED INDEX IX_Table_Column
ON TableName(BitColumn)
WHERE BitColumn = 1
Filtered Index with Included Column
CREATE NONCLUSTERED INDEX fIX_IX_Table_Column
ON TableName(BitColumn)
INCLUDE (DateColumn)
WHERE DateColumn IS NULL
Is this the best way to go, or do you have any other suggestions?
There are lots of things that can be done to make it faster.
Assuming you do need these UNIONs, you can speed up the query by:
Caching the results. For example:
Can you create an indexed view from the whole statement? Or are there lots of different WHERE conditions, so there would be lots of indexed views? Be aware that this will slow down modifications (INSERT, etc.) on those tables.
Can you cache it in some other way, maybe in the middle layer?
Can it be recalculated in advance?
Make a covering index. The leading columns are the columns from the WHERE clause, and all other columns from the query go in as included columns.
Note that a covering index can also be filtered, but a filtered index isn't used if the WHERE clause in the query contains variables / parameters that could potentially take values not covered by the filter (i.e., the rows wouldn't be covered).
The ORDER BY will cause a sort.
If you can cache the result, then it's fine - no sort will be needed (it's cached already sorted).
Otherwise the sort is CPU bound (and I/O bound if it doesn't fit in memory). To speed it up, do you use a fast collation? The performance difference between the slowest and fastest collations can be as much as 3x. For example, SQL_EBCDIC280_CP1_CS_AS, SQL_Latin1_General_CP1251_CS_AS and SQL_Latin1_General_CP1_CI_AS are among the fastest collations. However, it's hard to make recommendations without knowing the collation characteristics you need.
Network
The 'network packet size' for the connection that does the SELECT should be the maximum possible value - 32,767 bytes - if the result set (number of rows) will be big. This can be set on the client side, e.g. in the connection string if you use .NET and SqlConnection. It minimizes CPU overhead when sending data from SQL Server and improves performance on both sides, client and server. It can boost performance by tens of percent if the network is the bottleneck.
Use the shared memory endpoint if the client is on the same machine as SQL Server; otherwise use TCP/IP for the best performance.
General things
As you said, using the READ UNCOMMITTED isolation level will improve the performance (a minimal example is sketched below).
...
You probably can't make changes beyond rewriting the query, etc., but just in case: adding more memory if it isn't sufficient now, or using the SQL Server 2014 in-memory features :-), ... would surely help.
There are way too many things that could be tuned but it's hard to point out the key ones if the question isn't very specific.
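To make the read-uncommitted point above concrete, a minimal sketch (Table1, the col1-col3 list and Some_Condition are the placeholders from the question):
-- either set it for the whole reporting session/batch:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- or hint individual tables (same effect per table; dirty reads are possible):
SELECT col1, col2, col3 FROM Table1 WITH (NOLOCK) WHERE Some_Condition;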
Hope this helps a bit
Well, you haven't given any statistics or sample run times for any execution, so it's not possible to guess what is slow, or whether it really is slow. How much data is in the result set? It might just be that returning 100K rows in the result simply takes time. If a result set of 10,000 rows takes 5 minutes, then yes, something can definitely be looked at. So if you have a sample query, the number of rows in the result and how much time a couple of executions took with different WHERE conditions, post that; it will help us to compare results.
BTW, do not use a CTE, just use regular inner and outer SELECTs. Make sure tempdb is configured properly; its MDF and LDF should not be left at the default 10% autogrowth. By some trial and error you will learn how much the log and tempdb grow for a variety of range queries, and based on that you should set the initial size and growth increment of tempdb's MDF and LDF. For the covered filtered index, the include columns should be col1, col2 and col3, not the Date column, unless Date is also in the select list.
How frequently does the data in the original 35 tables get updated? If it's at most once per day, or they all get updated at almost the same time, then indexed views can be a possible solution. But if the original tables are updated more than once a day, or they can be updated at any time and not on the same schedule, then don't even think about indexed views.
If disk space is not an issue, as a last resort you could try and test performance using a trigger on each of the 35 tables. Create a new table to hold the final results you expect from this SELECT query, then create insert/update/delete triggers on each of the 35 tables that check the conditions inside the trigger and, only if they match, copy the same insert/update/delete into the new table. You will need a column in the new table that identifies which table the data came from. Because Date is a nullable column, you do not get the full advantage of an index on that column, as you are mostly looking for WHERE Date IS NULL.
If the only query you ever run against the new table is WHERE Date IS NULL, then don't even bother to create that column; just create the bit columns and the other columns (col1, col2, col3, etc.). If you give a real example of your query and explain the actual tables, the other details can be worked out later.
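For illustration, one such trigger could look roughly like this (purely a sketch: the ReportResults table, the Id key and the int column types are assumptions, and a matching trigger would be needed on each of the 35 source tables):
CREATE TABLE dbo.ReportResults (
    SourceTable sysname NOT NULL,  -- which of the 35 tables the row came from
    SourceKey int NOT NULL,        -- assumed primary key of the source row
    col1 int NULL,
    col2 int NULL,
    col3 int NULL
);
GO
CREATE TRIGGER trg_Table1_ReportResults ON dbo.Table1
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- remove rows whose source row was deleted or changed
    DELETE r
    FROM dbo.ReportResults AS r
    JOIN deleted AS d ON r.SourceKey = d.Id
    WHERE r.SourceTable = 'Table1';

    -- (re)insert rows that currently satisfy the report condition
    INSERT INTO dbo.ReportResults (SourceTable, SourceKey, col1, col2, col3)
    SELECT 'Table1', i.Id, i.col1, i.col2, i.col3
    FROM inserted AS i
    WHERE i.BitColumn = 1 AND i.DateColumn IS NULL;
END
GO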
The query hints or the isolation level are only going to help you if blocking occurs.
If you don't mind dirty reads and there are locks during execution, it could be a good idea.
The key question is how much data matches the WHERE clause you need to use (WHERE BitColumn = 1 AND DateColumn IS NULL).
If the subset filtered by that is small compared with the total number of rows, then use an index on both columns, BitColumn and DateColumn, including the columns from the select clause to avoid lookup operations in your query plan.
CREATE NONCLUSTERED INDEX IX_[Choose an IndexName]
ON TableName(BitColumn, DateColumn)
INCLUDE (col1, col2, col3)
WHERE BitColumn = 1 AND DateColumn IS NULL
Of course the space needed for that covered-filtered index depends on the datatype of the fields involved and the number of rows that satisfy WHERE BitColumn = 1 AND DateColumn IS NULL.
After that I recommend using a view instead of a CTE:
CREATE VIEW [Choose a ViewName]
AS
(
Select col1, col2, col3 FROM Table1 WHERE Some_Condition
UNION ALL
Select col1, col2, col3 FROM Table2 WHERE Some_Condition
.
.
.
)
By doing that, your query plan should look like 35 small index scans. However, if most of the data satisfies the WHERE clause of your index, the performance is going to be similar to scanning the 35 source tables and the solution won't be worth it.
But you say "Every table has 5-10 bit columns and a corresponding date column..", so I don't think it's a good idea to make one index per bit column.
If you need to filter using different bit columns and different date columns, use a computed column in your table:
ALTER TABLE Table1 ADD ComputedFilterFlag AS
CAST(
CASE WHEN BitColumn1 = 1 AND DateColumn1 IS NULL THEN 1 ELSE 0 END +
CASE WHEN BitColumn2 = 1 AND DateColumn2 IS NULL THEN 2 ELSE 0 END +
CASE WHEN BitColumn3 = 1 AND DateColumn3 IS NULL THEN 4 ELSE 0 END
AS tinyint)
I recommend using the value 2^(X-1) for condition X (BitColumnX = 1 AND DateColumnX IS NULL). That allows you to filter on any combination of those criteria.
By using the value 3 you can locate all rows that satisfy both the Bit1/Date1 and the Bit2/Date2 conditions. Every combination of conditions has its corresponding ComputedFilterFlag value, because ComputedFilterFlag acts as a bitmap of conditions.
If you have fewer than 8 different filters, use tinyint to save space in the index and reduce the I/O needed.
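For example, with the three conditions above, the rows where condition 2 holds (BitColumn2 = 1 AND DateColumn2 IS NULL) are exactly those whose flag has bit 2 set, i.e. the values 2, 3, 6 and 7 (a sketch reusing the placeholder names):
SELECT col1, col2, col3
FROM Table1
WHERE ComputedFilterFlag IN (2, 3, 6, 7)
-- equivalent to ComputedFilterFlag & 2 = 2, but the IN list lets the index seek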
Then create an index on the ComputedFilterFlag column:
CREATE NONCLUSTERED INDEX IX_[Choose an IndexName]
ON TableName(ComputedFilterFlag)
INCLUDE (col1, col2, col3)
And create the view:
CREATE VIEW [Choose a ViewName]
AS
(
Select col1, col2, col3 FROM Table1 WHERE ComputedFilterFlag IN [Choose the Target Filter Value set]--(1, 3, 5, 7)
UNION ALL
Select col1, col2, col3 FROM Table2 WHERE ComputedFilterFlag IN [Choose the Target Filter Value set]--(1, 3, 5, 7)
.
.
.
)
By doing that, your index covers all the conditions and your query plan should look like 35 small index seeks.
But this is a tricky solution; maybe refactoring your table schema could produce simpler and faster results.
You'll never get real-time results from a UNION ALL query over this many tables, but I can tell you how I got a little speed out of a similar situation. Hopefully this will help you out.
You can actually run all of them at once with a little bit of coding and ingenuity.
Create a global temporary table instead of a common table expression, and don't put any keys on the global temporary table; they would just slow things down. Then start all the individual queries, each inserting into the global temporary table. I've done this a hundred or so times manually, and it's faster than one big UNION query because you get a query running on each CPU core. The tricky part is the mechanism for determining when the individual queries have finished; you're on your own for that piece, hence I do these manually.
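A rough sketch of that approach, reusing the placeholder names from the question (how you detect that all the inserts have finished is up to you, as noted above):
-- global temp table, deliberately without keys or indexes
CREATE TABLE ##ReportRows (col1 int, col2 int, col3 int);

-- run each INSERT from its own connection so they execute concurrently:
INSERT INTO ##ReportRows (col1, col2, col3)
SELECT col1, col2, col3 FROM Table1 WHERE Some_Condition;

INSERT INTO ##ReportRows (col1, col2, col3)
SELECT col1, col2, col3 FROM Table2 WHERE Some_Condition;

-- ... one INSERT per source table ...

-- once every insert has finished:
SELECT col1, col2, col3 FROM ##ReportRows ORDER BY col3 DESC;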

SQL Server 2005 SELECT TOP 1 from VIEW returns LAST row

I have a view that may contain more than one row, looking like this:
[rate] | [vendorID]
8374 1234
6523 4321
5234 9374
In a SPROC, I need to set a param equal to the value of the first column from the first row of the view, something like this:
DECLARE #rate int;
SET #rate = (select top 1 rate from vendor_view where vendorID = 123)
SELECT #rate
But this ALWAYS returns the LAST row of the view.
In fact, if I simply run the subselect by itself, I only get the last row.
With 3 rows in the view, TOP 2 returns the FIRST and THIRD rows in order. With 4 rows, it's returning the top 3 in order. Yet still top 1 is returning the last.
DERP?!?
This works..
DECLARE #rate int;
CREATE TABLE #temp (vRate int)
INSERT INTO #temp (vRate) (select rate from vendor_view where vendorID = 123)
SET #rate = (select top 1 vRate from #temp)
SELECT #rate
DROP TABLE #temp
.. but can someone tell me why the first behaves so fudgely and how to do what I want? As explained in the comments, there is no meaningful column by which I can do an order by. Can I force the order in which rows are inserted to be the order in which they are returned?
[EDIT] I've also noticed that: select top 1 rate from ([view definition select]) also returns the correct values time and again.[/EDIT]
That is by design.
If you don't specify how the query should be sorted, the database is free to return the records in any order that is convenient. There is no natural order for a table that could serve as a default sort order.
What the order will actually be depends on how the query is planned, so you can't even rely on the same query giving a consistent result over time, as the database will gather statistics about the data and may change how the query is planned based on that.
To get the record that you expect, you simply have to specify how you want them sorted, for example:
select top 1 rate
from vendor_view
where vendorID = 123
order by rate
I ran into this problem on a query that had worked for years. We upgraded SQL Server and all of a sudden, an unordered select top 1 was not returning the final record in a table. We simply added an order by to the select.
My understanding is that, when no ORDER BY is provided, SQL Server will generally return results based on the clustered index, or on whatever index the engine happens to pick. But this is not a guarantee of any particular order.
If you don't have anything suitable to order by, you need to add something: either a date-inserted column defaulted to GETDATE(), or an identity column. It won't help you historically, but it addresses the issue going forward.
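A minimal sketch of both options (the underlying table behind vendor_view isn't named in the question, so vendor_rates here is a placeholder, and the view would need to expose the new column to order by it):
ALTER TABLE dbo.vendor_rates ADD InsertedAt datetime NOT NULL
CONSTRAINT DF_vendor_rates_InsertedAt DEFAULT (GETDATE());
-- or:
ALTER TABLE dbo.vendor_rates ADD RowId int IDENTITY(1,1) NOT NULL;

-- then, for example:
-- SELECT TOP 1 rate FROM vendor_view WHERE vendorID = 123 ORDER BY InsertedAt DESC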
While it doesn't necessarily make sense that the results of the query should be consistent, in this particular instance they are, so we decided to leave it 'as is'. Ultimately it would be best to add a column, but this was not an option. The application this belongs to is slated to be discontinued sometime soon and the database server will not be upgraded from SQL 2005. I don't necessarily like this outcome, but it is what it is: until it breaks it shall not be fixed. :-x

MAX keyword taking a lot of time to select a value from a column

Well, I have a table with 40,000,000+ records, but when I try to execute a simple query, it takes ~3 minutes to finish. Since I am using the same query in my C# solution, where it needs to execute 100+ times, the overall performance of the solution takes a serious hit.
This is the query that I am using in a proc
DECLARE #Id bigint
SELECT #Id = MAX(ExecutionID) from ExecutionLog where TestID=50881
select #Id
Any help to improve the performance would be great. Thanks.
What indexes do you have on the table? It sounds like you don't have anything even close to useful for this particular query, so I'd suggest trying to do:
CREATE INDEX IX_ExecutionLog_TestID ON ExecutionLog (TestID, ExecutionID)
...at the very least. Your query filters by TestID, so that needs to be the leading column of the composite index: if you have no index on TestID, then SQL Server will resort to scanning the entire table in order to find rows where TestID = 50881.
It may help to think of indexes on SQL tables like the hierarchical, multi-level index in the back of a big book. If you were looking for something, you'd look under 'T' for TestID, and then there'd be a sub-heading under TestID for ExecutionID. Without an index entry for TestID, you'd have to read through the entire book looking for TestID, and then see if ExecutionID is mentioned with it. This is effectively what SQL Server has to do.
If you don't have any indexes at all, then you should review all the queries that hit the table and make sure that one of the indexes you add is a clustered index (rather than non-clustered).
Try to re-work everything into something that works in a set-based manner.
So, for instance, you could write a select statement like this:
;With OrderedLogs as (
Select ExecutionID,TestID,
ROW_NUMBER() OVER (PARTITION BY TestID ORDER By ExecutionID desc) as rn
from ExecutionLog
)
select * from OrderedLogs where rn = 1 and TestID in (50881, 50882, 50883)
This would then find the maximum ExecutionID for 3 different tests simultaneously.
You might need to store that result in a table variable or temp table, but hopefully you can instead continue building up a single, larger query that processes all of the results in one set-based pass.
This is the sort of processing that SQL is meant to be good at - don't cripple the system by iterating through the TestIDs in your code.
If you need to pass many test IDs into a stored procedure for this sort of query, look at Table Valued Parameters.
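A minimal sketch of that approach (SQL Server 2008 or later; the type and procedure names are made up, while ExecutionLog, TestID and ExecutionID come from the question):
CREATE TYPE dbo.TestIdList AS TABLE (TestID int PRIMARY KEY);
GO

CREATE PROCEDURE dbo.GetLatestExecutionIds
    @TestIDs dbo.TestIdList READONLY
AS
BEGIN
    SET NOCOUNT ON;

    ;WITH OrderedLogs AS (
        SELECT ExecutionID, TestID,
               ROW_NUMBER() OVER (PARTITION BY TestID ORDER BY ExecutionID DESC) AS rn
        FROM ExecutionLog
        WHERE TestID IN (SELECT TestID FROM @TestIDs)
    )
    SELECT TestID, ExecutionID
    FROM OrderedLogs
    WHERE rn = 1;
END
GO

-- usage:
DECLARE @ids dbo.TestIdList;
INSERT INTO @ids (TestID) VALUES (50881), (50882), (50883);
EXEC dbo.GetLatestExecutionIds @TestIDs = @ids;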

Tune Slow SQL Query

I got an app running on my SQL Server that is starting to slow down on a specific task. I ran SQL Profiler and noticed that the
following query is taking an enormous (1-2 minutes) amount of time. I don't have access to the code to change the query.
Is there anything I can tune/change in the database? The PC10000 table in the statement below has approx. 119000 records. I also have the execution plan attached.
SELECT TOP 25
zProjectID, zTaskID, zTransactionNumber, zTransactionDate, zUserID,
zCostCategoryDDL, zCostCategoryString, zSubCostCategory, zSubCostCategoryString,
zDepartmentID, zJournalEntry, zPostingDate, zSalesPostingDate, zPeriodNumber,
zTransactionDescription, zBillingDescriptionLine1, zBillingDescriptionLine2,
zBillingDescriptionLine3, zBillingDescriptionLine4, zSalesAccountIndex,
zSalesAccountString, zDistDocumentTypeDDL, zDistDocumentNumber, zDistSequenceNumber,
zSalesDocumentTypeDDL, zSalesDocumentNumber, zSalesLineNumber, zDistHistoryYear,
zSeriesDDL, zSourceDoc, zWebSource, zOrigDocumentNumber, zOrigDocumentDate,
zOrigID, zOrigName, zExpenseStatusDDL, zApprovalUserIDCost, zAccountIndex,
zAccountNumberString, zBillingStatusDDL, zApprovalUserIDBilling, zBillingWorkQty,
zBillingWorkAmt, zQty, zQtyBilled, zUnitCost,
zUnitPrice, zRevenueAmt, zOriginatingRevenueAmt, zCostAmtEntered, zCostAmt,
zOriginatingCostAmt, zPayGroupID, zPayrollStatusDDL, zTotalTimeStatusDDL,
zEmployeeID, zHoursEntered, zHoursPaid, zPayRecord, zItemID, zItemDescription,
zUofM, zItemQty, zBurdenStatusDDL, zUserDefinedDate, zUserDefinedDate2,
zUserDefinedString, zUserDefinedString2, zUserDefinedCurrency,
zUserDefinedCurrency2, zNoteIndex, zImportType, DEX_ROW_ID
FROM
DBServer.dbo.pc10000
WHERE
(zDistDocumentNumber in
(select cast(JRNENTRY as varchar(20))
from DBServer..GL10001
where BACHNUMB = 'PMCHK00004283')
or zSalesDocumentNumber in
(select cast(JRNENTRY as varchar(20))
from DBServer..GL10001
where BACHNUMB = 'PMCHK00004283'))
ORDER BY
zProjectID ASC ,zTaskID ASC ,zTransactionNumber ASC
The biggest problem you have looks to be a lack of suitable indexes.
You can see that because of the presence of Table Scans within the execution plan.
Table Scans hit performance as they mean the whole table is being scanned for data that matches the given clauses in the query.
I'd recommend you add an index on BACHNUMB in GL10001
You may also want to try indexes on zDistDocumentNumber and zSalesDocumentNumber in PC10000, but I think the GL10001 index is the main one.
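For example, hedged sketches of those indexes (the names are placeholders):
CREATE INDEX IX_GL10001_BACHNUMB
ON DBServer..GL10001 (BACHNUMB) INCLUDE (JRNENTRY);

-- possibly also, to support the two IN probes against the large table:
CREATE INDEX IX_pc10000_zDistDocumentNumber ON DBServer.dbo.pc10000 (zDistDocumentNumber);
CREATE INDEX IX_pc10000_zSalesDocumentNumber ON DBServer.dbo.pc10000 (zSalesDocumentNumber);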
"IN" clauses are typically quite expensive compared to other techniques, but as you can't change the query itself then there's nothing you can do about that.
Without a doubt, you need to add suitable indexes
The query is doing 2 table scans on the GL10001 table. From a quick look at the query (which is a bit hard to read) I would see if you have an index on the BACHNUMB column.
The execution plan shows pretty clearly that actually locating the rows is what's taking all the time (no cumbersome bookmark lookups or aggregation/rearrange tasks), so it's almost certainly a question of indexing. Hover over the table scans in the execution plan and check 'Object' in the tooltip to see which columns are being used, and see to it that they're indexed.
You might also want to run a trace to sample some live data and feed that to the Database Engine Tuning Advisor.
You could rewrite those sub-selects as a join, and add an index to GP01..GL10001 on BACHNUMB and JRNENTRY
Since you can't change the query, the best thing you could do is make sure you have indexes on the columns that you're using for your joins (and subqueries). If you can think of a better query plan, you could provide that to SQL Server instead of letting it calculate its own (this is a very rare case).
Replace the OR with a UNION ALL of two queries; this should get shot of those spools.
I.e., run the query once for each condition, something like this:
SELECT ....
FROM DBServer.dbo.pc10000
WHERE zDistDocumentNumber in
(select cast(JRNENTRY as varchar(20))
from DBServer..GL10001
where BACHNUMB = 'PMCHK00004283')
UNION ALL
SELECT ....
FROM DBServer.dbo.pc10000
WHERE zSalesDocumentNumber in
(select cast(JRNENTRY as varchar(20))
from DBServer..GL10001
where BACHNUMB = 'PMCHK00004283')
In addition to adding indexes, you can also convert the IN statements to EXISTS... something along these lines:
SELECT TOP 25 ....
FROM GP01.dbo.pc10000 parent
WHERE EXISTS
(
SELECT child.*
FROM GP01..GL10001 child
WHERE BACHNUMB = 'PMCHK00004283'
and parent.zDistDocumentNumber = child.JRNENTRY
)
OR EXISTS
(
SELECT child2.*
FROM GP01..GL10001 child2
WHERE BACHNUMB = 'PMCHK00004283'
and parent.zSalesDocumentNumber = child2.JRNENTRY
)
ORDER BY zProjectID ASC ,zTaskID ASC ,zTransactionNumber ASC
