Need to reduce Query Time - sql-server

My problem is that the query below takes 38 seconds to complete, and I need to reduce this time as much as I can.
When I look at the execution plan, 54% of the cost is spent on an index scan of Dim_Customers.
Any suggestions would be appreciated. Thanks.
DECLARE @SalesPersonCode NVARCHAR(4)
DECLARE @StartDate DATETIME
DECLARE @EndDate DATETIME
SET @SalesPersonCode = 'AC';
SET @StartDate = '03/01/2012';
SET @EndDate = '03/31/2012';
SELECT AA_FactSalesOrderDetails.Salesperson
, Dim_SalesOrganisation.[Salesperson name]
, AA_FactSalesOrderDetails.[Order Date]
, Dim_Customers.[Customer number]
, Dim_Customers.[Customer name]
, Dim_Customers.[Area/state]
, Dim_Customers.country
, Dim_Customers.[Customer stop] AS [Customer Block]
, AA_FactSalesOrderDetails.[Customer order stop] AS [Co Stop]
, AA_FactSalesOrderDetails.[First delivery date Header]
, AA_FactSalesOrderDetails.[Last delivery date Header]
, Dim_Customers.[User-defined field 6 - customer]
, Dim_Customers.[Customer group name]
, AA_FactSalesOrderDetails.[Contact Method]
, AA_FactSalesOrderDetails.[Customer order number]
, AA_FactSalesOrderDetails.[Price Level]
, AA_FactSalesOrderDetails.[Item number]
, Dim_Items.[Product group description] AS [Item name]
, AA_FactSalesOrderDetails.[Ordered quantity - basic U/M] AS [Quantity Ordered]
, AA_FactSalesOrderDetails.[Ordered quantity - basic U/M] * AA_FactSalesOrderDetails.[Net price] AS [Order Line Total ]
FROM AA_FactSalesOrderDetails
LEFT JOIN Dim_SalesOrganisation
    ON AA_FactSalesOrderDetails.Salesperson = Dim_SalesOrganisation.Salesperson
LEFT JOIN Dim_Customers
    ON AA_FactSalesOrderDetails.Dim_Customers_dKey = Dim_Customers.Dim_Customers_dKey
LEFT JOIN Dim_Items
    ON AA_FactSalesOrderDetails.[Item number] = Dim_Items.[Item number]
LEFT JOIN Dim_CustomerOrderTypes
    ON AA_FactSalesOrderDetails.[Customer order type] = Dim_CustomerOrderTypes.[Customer order type]
WHERE AA_FactSalesOrderDetails.[Order Date]
    BETWEEN dbo.fnc_M3_sql_datetime_to_M3_date(@StartDate) /* !!!Procedural Approach!!! */
        AND dbo.fnc_M3_sql_datetime_to_M3_date(@EndDate) /* !!!Procedural Approach!!! */
AND AA_FactSalesOrderDetails.Salesperson = @SalesPersonCode

Since fnc_M3_sql_datetime_to_M3_date takes a value that is constant throughout the execution of the query, move those two calls (the one with @StartDate and the one with @EndDate) to the top of your query and assign the returned values to declared variables. Then reference those variables in the WHERE clause instead of calling the function there. That may help: scalar functions in the WHERE clause sometimes inhibit the formulation of a good query plan.
This talks a little about it
Why do SQL Server Scalar-valued functions get slower?
and this too
http://strictlysql.blogspot.com/2010/06/scalar-functions-on-where-clause.html
DECLARE @m3StartDate NUMERIC(8,0)
SET @m3StartDate = dbo.fnc_M3_sql_datetime_to_M3_date(@StartDate)
DECLARE @m3EndDate NUMERIC(8,0)
SET @m3EndDate = dbo.fnc_M3_sql_datetime_to_M3_date(@EndDate)
...
WHERE AA_FactSalesOrderDetails.[Order Date] BETWEEN @m3StartDate AND @m3EndDate
AND AA_FactSalesOrderDetails.Salesperson = @SalesPersonCode
The type of the two @m3... variables should be exactly the same as AA_FactSalesOrderDetails.[Order Date].
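If you're not sure of the column's declared type, a quick catalog query will tell you (a minimal sketch; it assumes the table lives in the dbo schema):
SELECT c.name, t.name AS type_name, c.precision, c.scale
FROM sys.columns AS c
JOIN sys.types AS t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(N'dbo.AA_FactSalesOrderDetails')
  AND c.name = N'Order Date';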
I would also examine the definition of the key on Dim_Customers that is getting the scan instead of a seek, and ensure Dim_Customers is indexed in a way that helps you, if it isn't already.
http://blog.sqlauthority.com/2009/08/24/sql-server-index-seek-vs-index-scan-diffefence-and-usage-a-simple-note/
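A quick way to list the existing indexes on Dim_Customers (assuming the default dbo schema):
EXEC sp_helpindex N'dbo.Dim_Customers';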

Although @hatchet is right about avoiding functions in the WHERE clause, I suspect that is not the problem in this case, because the function is applied to scalar variables rather than to a column (one could only be sure with the actual query plan).
You can definitely remove the reference to the table Dim_CustomerOrderTypes, which neither filters nor returns any data. And I believe this query's performance should improve with the following indexes:
-- to seek on [Salesperson] and scan on [Order Date]
CREATE CLUSTERED INDEX IDXC ON AA_FactSalesOrderDetails([Salesperson], [Order Date]);
-- to seek on key
CREATE CLUSTERED INDEX IDXC ON Dim_Customers([Dim_Customers_dKey]);
-- to seek only this index instead of reading from table
CREATE INDEX IDX0 ON Dim_SalesOrganisation([Salesperson], [Salesperson name]);
-- to seek only this index instead of reading from table
CREATE INDEX IDX0 ON Dim_Items ([Item number], [Product group description]);
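One caveat: a table can have at most one clustered index, so the CREATE CLUSTERED INDEX statements above will fail if the table already has one. A quick check (assumes the dbo schema):
SELECT i.name, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.Dim_Customers')
  AND i.type = 1; -- 1 = CLUSTERED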
I hope these suggestions help you.

I am willing to bet money that this version runs faster than 38 seconds.
Now, there still may be other optimizations possible (such as creating or improving indexes, which we can't know without seeing the plan), but I think I've cleaned up several issues in your query that should assist performance.
EDIT: a few edits, since apparently the user is running against SQL Server 2000 even though the question is tagged 2008...
-- make sure you don't have an implicit conversion between varchar and nvarchar
DECLARE
    @SalesPersonCode NVARCHAR(4),
    @StartDate DATETIME,
    @EndDate DATETIME;
SELECT
    @SalesPersonCode = N'AC', -- nvarchar needs N prefix!
    -- get rid of the function call, I am guessing it just removes time
    -- in which case, use the DATE data type instead.
    @StartDate = '20120301',
    @EndDate = '20120331';
-- since a salesperson can only have one code, and you are only pulling the name into the
-- SELECT list (it will be the same for every row), use a constant and eliminate the join.
DECLARE @SalesPersonName NVARCHAR(255);
SELECT @SalesPersonName = [Salesperson name]
FROM dbo.Dim_SalesOrganisation
WHERE Salesperson = @SalesPersonCode;
-- I've added table aliases which make the query MUCH, MUCH easier to read
SELECT f.Salesperson
, [Salesperson name] = @SalesPersonName
, f.[Order Date]
, c.[Customer number]
, c.[Customer name]
, c.[Area/state]
, c.country
, c.[Customer stop] AS [Customer Block]
, f.[Customer order stop] AS [Co Stop]
, f.[First delivery date Header]
, f.[Last delivery date Header]
, c.[User-defined field 6 - customer]
, c.[Customer group name]
, f.[Contact Method]
, f.[Customer order number]
, f.[Price Level]
, f.[Item number]
, i.[Product group description] AS [Item name]
, f.[Ordered quantity - basic U/M] AS [Quantity Ordered]
, f.[Ordered quantity - basic U/M] * f.[Net price] AS [Order Line Total ]
-- I've also added schema prefix. See below *
FROM
dbo.AA_FactSalesOrderDetails AS f
-- I've removed the join to Dim_SalesOrganisation as per above
LEFT OUTER JOIN dbo.Dim_Customers AS c
ON f.Dim_Customers_dKey = c.Dim_Customers_dKey
LEFT OUTER JOIN dbo.Dim_Items AS i
ON f.[Item number] = i.[Item number]
-- I've removed the join to Dim_CustomerOrderTypes since it is never used
WHERE
-- in case [Order Date] is DATETIME and includes time information. See below **
f.[Order Date] >= @StartDate
AND f.[Order Date] < DATEADD(DAY, 1, @EndDate)
-- still need to restrict it to the stated salesperson
AND f.Salesperson = @SalesPersonCode;
* Bad habits to kick : avoiding the schema prefix
** Bad habits to kick : mis-handling date / range queries

Related

SQL Server contiguous dates - summarizing multiple rows into contiguous start and end date rows without CTEs, loops, etc.

Is it possible to write a SQL query that will summarize rows with start and end dates into rows that have contiguous start and end dates?
The constraint is that it has to be plain SQL, i.e. no CTEs, loops, and the like, because a third-party tool is used that only allows a SQL statement to start with SELECT.
e.g.:
ID StartDate EndDate
1001, Jan-1-2018, Jan-04-2018
1002, Jan-5-2018, Jan-13-2018
1003, Jan-14-2018, Jan-18-2018
1004, Jan-25-2018, Feb-05-2018
The required output needs to be:
Jan-1-2018, Jan-18-2018
Jan-25-2018, Feb-05-2018
Thank you
You can take advantage of window functions and a concept called gaps-and-islands. In your case, the contiguous date ranges are the islands, and the gaps are self-explanatory.
I wrote the answer below in a verbose way to help make it clear what the query is doing, but it could most likely be written more concisely. Please see my comments in the answer explaining what each step (sub-query) does.
--Determine final output
select min(c.StartDate) as StartDate
     , max(c.EndDate) as EndDate
from (
    --Assign a number to each group of contiguous records
    select b.ID
         , b.StartDate
         , b.EndDate
         , b.EndDatePrev
         , b.IslandBegin
         , sum(b.IslandBegin) over (order by b.ID asc) as IslandNbr
    from (
        --Determine if it is contiguous (IslandBegin = 1 means not contiguous with the previous record)
        select a.ID
             , a.StartDate
             , a.EndDate
             , a.EndDatePrev
             , case when a.EndDatePrev is NULL then 1
                    when datediff(d, a.EndDatePrev, a.StartDate) > 1 then 1
                    else 0
               end as IslandBegin
        from (
            --Determine previous end date
            select tt.ID
                 , tt.StartDate
                 , tt.EndDate
                 , lag(tt.EndDate, 1, NULL) over (order by tt.ID asc) as EndDatePrev
            from dbo.Table_Name as tt
        ) as a
    ) as b
) as c
group by c.IslandNbr
order by c.IslandNbr
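For comparison, the same gaps-and-islands logic can be folded into fewer levels. This is only a sketch against the same hypothetical dbo.Table_Name and should produce the same islands (LAG and the running SUM both require SQL Server 2012 or later):
select min(StartDate) as StartDate, max(EndDate) as EndDate
from (
    select StartDate, EndDate,
           -- start a new island when there is no previous row or a gap of more than one day
           sum(case when EndDatePrev is null
                     or datediff(day, EndDatePrev, StartDate) > 1
                    then 1 else 0 end)
               over (order by ID) as IslandNbr
    from (
        select ID, StartDate, EndDate,
               lag(EndDate) over (order by ID) as EndDatePrev
        from dbo.Table_Name
    ) as x
) as g
group by IslandNbr
order by min(StartDate);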
I hope the following SQL query can help you identify the gaps and the covered dates for the given case.
I did not use a CTE expression, a dates table function, etc.
Instead, I used a numbers table based on master..spt_values to generate the dates table that drives the query.
You can create your own numbers table or dates table if this one does not fit your requirements.
In the query, to catch changes at the borders I used the SQL LAG() function, which enables me to compare with the previous value of a column in a sorted list.
select max(startdate) as startdate,
       max(enddate) as enddate
from (
    select date,
           case when exist = 1 then date else null end as startdate,
           case when exist = 0 then dateadd(d, -1, date) else null end as enddate,
           (row_number() over (order by date) + 1) / 2 as rn
    from (
        select date, exist,
               case when exist <> (lag(exist, 1, '') over (order by date)) then 1 else 0 end as changed
        from (
            select d.date,
                   case when exists (select * from Periods where d.date between startdate and enddate) then 1 else 0 end as exist
            from (
                SELECT dateadd(dd, number, '20180101') date
                FROM master..spt_values
                WHERE Type = 'P' and dateadd(dd, number, '20180101') <= '20180228'
            ) d
        ) cte
    ) tbl
    where changed = 1
) dates
group by rn
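To try the query, you can first stage the question's sample rows in the Periods table it references (hypothetical DDL; the column names match the EXISTS subquery above):
CREATE TABLE Periods (ID int, StartDate date, EndDate date);
INSERT INTO Periods (ID, StartDate, EndDate) VALUES
    (1001, '20180101', '20180104'),
    (1002, '20180105', '20180113'),
    (1003, '20180114', '20180118'),
    (1004, '20180125', '20180205');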
The query returns one row per covered date range, matching the required output in the question.

Optimize short part of a SQL Server query

I have this subquery that takes a little too long. Does anyone have any idea how I can modify it to be faster?
ISNULL(ISNULL(
(select top 1 CONVERT(VARCHAR(11), qq.startdate , 111)
from (select a.startdate, a.ownerid
from wfassignment a
where a.ownertable='PM' /*order by a.startdate*/)qq
where qq.ownerid=pm.pmuid ),
(select min(w.reportdate)
from workorder w where w.pmnum=pm.pmnum
and w.siteid=pm.siteid
and w.orgid= pm.orgid)
),CONVERT(DATETIME,'01-02-2015 00:00:00'))
In Oracle it's much faster than in SQL Server. I also want to know for sure whether TOP 1 is equivalent to ROWNUM = 1 in Oracle.
Thanks :)
I'm assuming you need the minimum startdate in your first subquery, so I worked out this:
select top 1 [sq].[pm_date]
from
(
select convert(tinyint, 1) as [priority]
, min(a.startdate) as [pm_date]
from wfassignment a
where a.ownertable = 'PM'
and a.ownerid = pm.pmuid
union all
select 2
, min(w.reportdate)
from workorder w
where w.pmnum = pm.pmnum
and w.siteid = pm.siteid
and w.orgid = pm.orgid
union all
select 3
, convert(datetime, '2015-02-01 00:00:00')
/* use yyyymmdd format to avoid confusion when casting datetime values */
) as sq
where [sq].[pm_date] is not null
order by [sq].[priority] asc
You need to add the outer reference to the [pm] alias, though; that part wasn't given in your question, so I've just worked it out like this.
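If the outer row source is a table aliased pm (the table name dbo.pm below is an assumption, since the question never shows it), one way to wire in that reference is OUTER APPLY:
SELECT pm.pmuid, pm.pmnum, d.pm_date
FROM dbo.pm AS pm
OUTER APPLY (
    SELECT TOP (1) sq.pm_date
    FROM (
        SELECT 1 AS [priority], MIN(a.startdate) AS pm_date
        FROM dbo.wfassignment AS a
        WHERE a.ownertable = 'PM' AND a.ownerid = pm.pmuid
        UNION ALL
        SELECT 2, MIN(w.reportdate)
        FROM dbo.workorder AS w
        WHERE w.pmnum = pm.pmnum AND w.siteid = pm.siteid AND w.orgid = pm.orgid
        UNION ALL
        SELECT 3, CONVERT(datetime, '20150201') -- yyyymmdd, unambiguous
    ) AS sq
    WHERE sq.pm_date IS NOT NULL
    ORDER BY sq.[priority]
) AS d;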

SQL Server : optimize the efficiency with many joins relationship

I have SQL Server code which takes a long time to run. In the past, it took 15 minutes, but recently, possibly as a result of accumulated sales data, it has taken 2 hours to get the result!
Therefore, I would like some advice on how to optimize the code.
The code structure is simple: it just gets the sales sum for different regions and different time periods for each SKU. (I have deleted some code here that finds the different SKUs for each material without size.)
Many thanks in advance for your help.
The main code structure is as below; since the remaining blocks are almost the same, I just give the first two as an example:
SELECT MATINFO.SKU from [MATINFO]
-- Global Sales History Qty - All the years
LEFT JOIN
(
    SELECT SKU, SUM([SALES Qty]) as [Global Sales History Qty - All the years]
    from dbo.[SALES]
    where [PO] IS NOT NULL
    group by SKU
) histORy
    on MATINFO.[SKU] = histORy.[SKU]
-- Global Sales History Qty - Past 2 years
LEFT JOIN
(
    SELECT SKU, SUM([SALES Qty]) as [Global Sales History Qty - Past 2 years]
    from dbo.[SALES]
    where [PO] IS NOT NULL
    /* date range */
    and ([ORDER DATE] = '2015.11' OR [ORDER DATE] = '2015.12' or [ORDER DATE] like '%2015%' OR [ORDER DATE] like '%2016%')
    group by SKU
) histORy2
    on MATINFO.[SKU] = histORy2.[SKU]
--Global Sales History Qty - Past 1 years
......SIMILAR TO THE CODE STRUCTURE AS ABOVE
The most likely cause of the poor performance is using strings for dates, and possibly the lack of adequate indexes.
like '%2015%'
Using double-ended wildcards with LIKE results in full table scans, so the subqueries scan the whole table each time you search a different date range. Using temp tables will not solve the underlying issue.
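If [ORDER DATE] really is stored as a string such as '2015.11', one hedged fix (the column and index names below are made up) is to add a real date column once, backfill it, and index it, so range predicates can seek (TRY_CONVERT needs SQL Server 2012+):
ALTER TABLE dbo.[SALES] ADD OrderDateD date NULL;

-- backfill from 'yyyy.MM' style strings, e.g. '2015.11' -> 2015-11-01
UPDATE dbo.[SALES]
SET OrderDateD = TRY_CONVERT(date, REPLACE([ORDER DATE], '.', '') + '01', 112)
WHERE [ORDER DATE] LIKE '[12][0-9][0-9][0-9].[0-9][0-9]';

CREATE INDEX IX_SALES_OrderDateD
    ON dbo.[SALES] (OrderDateD)
    INCLUDE (SKU, [SALES Qty], [PO]);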
[added later]
Another facet of your original query structure might reduce the number of scans you need over the data: using "conditional aggregates".
e.g. here is a condensed version of your original query
SELECT
MATINFO.SKU
FROM [MATINFO]
-- Global Sales History Qty - All the years
LEFT JOIN (SELECT
SKU
, SUM([SALES Qty]) AS [Global Sales History Qty - All the years]
FROM dbo.[SALES]
WHERE [PO] IS NOT NULL
GROUP BY
SKU) histORy ON MATINFO.[SKU] = histORy.[SKU]
-- Global Sales History Qty - Past 2 years
LEFT JOIN (SELECT
SKU
, SUM([SALES Qty]) AS [Global Sales History Qty - Past 2 years]
FROM dbo.[SALES]
WHERE [PO] IS NOT NULL
/* date range */
AND [ORDER DATE] >= '20151101' AND [ORDER DATE] < '20161101'
GROUP BY
SKU) histORy2 ON MATINFO.[SKU] = histORy2.[SKU]
That requires two complete passes of the data in dbo.[SALES], but if you use a CASE expression inside the SUM() function you need only one pass (in this example):
SELECT
SKU
, SUM([SALES Qty]) AS [Qty_all_years]
, SUM(CASE
WHEN [ORDER DATE] >= '20151101' AND [ORDER DATE] < '20161101'
THEN [SALES Qty]
END) AS [Qty_past_2_years]
FROM dbo.[SALES]
WHERE [PO] IS NOT NULL
GROUP BY
SKU
I suspect you could apply this logic to most of the columns and substantially improve efficiency of the query when coupled with date columns and appropriate indexing.
Expansion on my comment. Note it is just a suggestion; there is no guarantee it will run faster.
Take the following derived table histORy:
SELECT SKU,SUM([SALES Qty]) AS [Global Sales History Qty - All the years]
FROM dbo.[SALES]
WHERE [PO] IS NOT NULL
GROUP BY SKU
Before you run your query, materialize the derived table in a temporary table:
SELECT SKU,SUM([SALES Qty]) AS [Global Sales History Qty - All the years]
INTO #histORy
FROM dbo.[SALES]
WHERE [PO] IS NOT NULL
GROUP BY SKU
Then use the temporary table in the query:
LEFT JOIN #histORy AS h ON MATINFO.[SKU]=h.[SKU]
In this case you may want an index on the SKU field, so you could create the temporary table yourself, put an index on it, and populate it with INSERT INTO #histORy ... SELECT ... etc.
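A sketch of that variant (the index name and the SKU type are assumptions):
CREATE TABLE #histORy (
    SKU nvarchar(50) NOT NULL, -- assumed type; match dbo.[SALES].SKU
    [Global Sales History Qty - All the years] decimal(18, 2) NULL
);

CREATE CLUSTERED INDEX IX_history_SKU ON #histORy (SKU);

INSERT INTO #histORy (SKU, [Global Sales History Qty - All the years])
SELECT SKU, SUM([SALES Qty])
FROM dbo.[SALES]
WHERE [PO] IS NOT NULL
GROUP BY SKU;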

Can I get row_number to continue the sequence on a later query

I've got the following query to give me a ROW_NUMBER that acts as a 'stage' in a case being progressed. This query runs under an INSERT INTO to populate a table.
At the minute we have to rebuild the table in full every month to capture every change of stage with a sequential row number (one row per stage of the case as the process moves on). The table will get too big for this at some point, so I'd like to be able to append instead; the problem is I can't think of a way to get the row numbers to continue from where they left off. I tried it in a test table, running the query twice to do November 2015 in two halves: for any case whose stages spanned both halves, the row numbers in the second run started again from 1 instead of carrying on.
The column ModifiedRecordID is what identifies the cases and, as you can see, is what the PARTITION BY is grouped on.
Is there any way to do what I'm thinking of?
SELECT agc.ObjectTypeCode
, ev.Name AS EntityName
, agc.AttributeId
, ea.AttributeName
, ind.createdbyname AS CaseCreatedBy
, agc.CallingUserName
, agc.CallingUserId
, LEFT(CAST(agc.OldFieldValue AS NVARCHAR(MAX)), ISNULL(NULLIF(CHARINDEX(',', CAST(agc.OldFieldValue AS NVARCHAR(MAX))) -1, -1), '')) AS ChangedField
, UPPER(RIGHT(CAST(agc.OldFieldValue AS NVARCHAR(MAX)), LEN(CAST(agc.OldFieldValue AS NVARCHAR(MAX))) - CHARINDEX(',', CAST(agc.OldFieldValue AS NVARCHAR(MAX))))) AS PreviousGuidValue
, wkt.ptl_name AS WorkType
, ModifiedRecordId
, ind.ticketnumber AS CaseRef
, agc.ActionId
, ind.createdon AS MatterCreated
, agc.LogDateTime AS AuditedDate
, agc.AuditId
, ROW_NUMBER() OVER(PARTITION BY ModifiedRecordId ORDER BY agc.LogDateTime ASC) Stage
FROM AuditGuidChange agc LEFT JOIN EntityView ev
ON agc.ObjectTypeCode = ev.ObjectTypeCode
LEFT JOIN EntityAttribute ea
ON agc.AttributeId = ea.ColumnNumber
AND agc.ObjectTypeCode = ea.MappedObjectCode
LEFT JOIN Peppermint_Data.dbo.incident ind
ON agc.modifiedrecordid = ind.incidentid
LEFT JOIN Filteredptl_worktype wkt
ON agc.PreviousGuidValue = wkt.ptl_worktypeid
WHERE LEFT(CAST(agc.OldFieldValue AS NVARCHAR(MAX)), ISNULL(NULLIF(CHARINDEX(',', CAST(agc.OldFieldValue AS NVARCHAR(MAX))) -1, -1), '')) = 'ptl_worktype'
AND CAST(agc.LogDateTime AS DATE) BETWEEN @sDate AND @eDate;
You can seed ROW_NUMBER() with a fixed offset:
declare @offset int = 12;
select id, ROW_NUMBER() over (order by id) + @offset as rn
from [table];
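To make the numbering genuinely continue across monthly loads, the offset has to come from what is already stored, per case, rather than a constant. A hedged sketch (dbo.CaseStages and #NewStageRows are hypothetical names for the populated table and the month's new rows):
SELECT n.ModifiedRecordId,
       ROW_NUMBER() OVER (PARTITION BY n.ModifiedRecordId ORDER BY n.AuditedDate)
           + COALESCE(prev.MaxStage, 0) AS Stage
FROM #NewStageRows AS n            -- the month's rows from the main query above
LEFT JOIN (
    SELECT ModifiedRecordId, MAX(Stage) AS MaxStage
    FROM dbo.CaseStages            -- hypothetical name for the populated table
    GROUP BY ModifiedRecordId
) AS prev
    ON prev.ModifiedRecordId = n.ModifiedRecordId;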

Use DISTINCT with ORDER BY clause

I want to alter the following stored procedure:
ALTER Procedure [dbo].[spEGCRedemptionReportForMHR]
@DateTo datetime,
@DateFrom datetime,
@MerchantID varchar(11),
@Pan varchar(16)
As
Set @DateTo = @DateTo + 1
Select Distinct
Convert(varchar(50),pt.TransactionDate,103) 'TransactionDate',
m.MerchantName1 MerchantName,
m.MerchantAddress Location,
m.MerchantID,
pt.TerminalID,
pt.batchnumber 'Batch #',
pt.SequenceNumber 'Receipt #',
pt.PAN 'Card Number',
c.EmbossName 'Card Holder Name',
Convert(Decimal(10,2),Case when pt.TransactionTypeID=2 then (pt.TotalAmount) end) As 'Points Redeemed',
Convert(Decimal(10,2),Case when pt.TransactionTypeID=2 then (((pt.TotalAmount)/(cc.usdconversionrate))/2) end) as 'Total Payment Amount (AED)', --/cc.USDConversionRate end) As 'Total Amount in AED',
Convert(Decimal(10,2),Case when pt.TransactionTypeID=2 then (((pt.TotalAmount)/(cc.usdconversionrate))/2) -15 end) as 'Total loaded Amount (AED)',
3.00 as 'Procco Share',
Convert(Decimal(10,2),Case when pt.TransactionTypeID=2 then (((pt.TotalAmount)/(cc.usdconversionrate))/2) - 3 end) as 'Settlement Amount'
from POS_Transactions pt
inner join Terminal t on t.TerminalID=pt.TerminalID
inner join Merchant m on m.MerchantID=t.MerchantID
inner join Card c on c.EmbossLine=pt.PAN
inner join Share s on s.MerchantID=m.MerchantID,Currency cc
where IsEmaar =1 and
cc.CurrencyCode='AED'
--and m.isemaarmerchant = 1
and (m.MerchantID=@MerchantID or @MerchantID='-999')
and (pt.TransactionDate>=@datefrom and pt.TransactionDate<=@dateto)
and (pt.PAN=@Pan or @Pan ='-999')
order by pt.TransactionDate
But it throws an error every time I try to execute it:
ORDER BY items must appear in the select list if SELECT DISTINCT is specified.
I have already used pt.TransactionDate in my SELECT, but it still asks me to include it since it is in my ORDER BY clause. What is possibly wrong with my query?
Try ordering by the alias you defined:
ORDER BY [TransactionDate]
The error is not that helpful, but the comment by ughai is correct: the engine is particular about how the column is modified in the SELECT statement, and the exact form must be included in the ORDER BY.
I have run into this when concatenating columns for output reasons, and it gets ugly fast.
You must
ORDER BY Convert(varchar(50), pt.TransactionDate, 103)
(Note that this sorts the dd/mm/yyyy strings alphabetically, which is not chronological order across months and years.)
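The rule is easy to reproduce in isolation, independent of the tables above (sys.objects is used only because every database has it):
-- Fails with "ORDER BY items must appear in the select list if SELECT DISTINCT is specified.":
SELECT DISTINCT CONVERT(varchar(50), o.create_date, 103) AS TransactionDate
FROM sys.objects AS o
ORDER BY o.create_date;

-- Works: order by the exact expression (or its alias) that is selected
SELECT DISTINCT CONVERT(varchar(50), o.create_date, 103) AS TransactionDate
FROM sys.objects AS o
ORDER BY CONVERT(varchar(50), o.create_date, 103);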
If you use GROUP BY instead of DISTINCT to eliminate duplicates, then you can order by grouped fields whether they're in the select list or not. The downside is that you'll have to list all the columns used, but this will do it:
GROUP BY
pt.TransactionDate,
m.MerchantName1,
m.MerchantAddress ,
m.MerchantID,
pt.TerminalID,
pt.batchnumber,
pt.SequenceNumber,
pt.PAN ,
c.EmbossName,
pt.TransactionTypeID,
pt.TotalAmount,
cc.usdconversionrate
ORDER BY pt.TransactionDate
Yes, the column pt.TransactionDate appears in the SELECT, but inside the CONVERT function, as below:
CONVERT(VARCHAR(50), pt.TransactionDate, 103) 'TransactionDate',
Add pt.TransactionDate by itself as an extra column, as follows:
...
SELECT DISTINCT
CONVERT(VARCHAR(50), pt.TransactionDate, 103) 'TransactionDate',
pt.TransactionDate, -- <-- here
m.MerchantName1 MerchantName,
m.MerchantAddress Location,
m.MerchantID,
pt.TerminalID,
....
