SQL Server 2008 Stored Procedure Performance issue

Hi, I have a stored procedure:
ALTER PROCEDURE [dbo].[usp_EP_GetTherapeuticalALternates]
(
@NDCNumber CHAR(11) ,
@patientid INT ,
@pbmid INT
)
AS
BEGIN
TRUNCATE TABLE TempTherapeuticAlt
INSERT INTO TempTherapeuticAlt
SELECT --PR.ProductID AS MedicationID ,
NULL AS MedicationID ,
PR.ePrescribingName AS MedicationName ,
U.Strength AS MedicationStrength ,
FRM.FormName AS MedicationForm ,
PR.DEAClassificationID AS DEASchedule ,
NULL AS NDCNumber
--INTO #myTemp
FROM DatabaseTwo.dbo.Product PR
JOIN ( SELECT MP.MarketedProductID
FROM DatabaseTwo.dbo.Therapeutic_Concept_Tree_Specific_Product TCTSP
JOIN DatabaseTwo.dbo.Marketed_Product MP ON MP.SpecificProductID = TCTSP.SpecificProductID
JOIN ( SELECT TCTSP.TherapeuticConceptTreeID
FROM DatabaseTwo.dbo.Marketed_Product MP
JOIN DatabaseTwo.dbo.Therapeutic_Concept_Tree_Specific_Product TCTSP ON MP.SpecificProductID = TCTSP.SpecificProductID
JOIN ( SELECT
PR.MarketedProductID
FROM
DatabaseTwo.dbo.Package PA
JOIN DatabaseTwo.dbo.Product PR ON PA.ProductID = PR.ProductID
WHERE
PA.NDC11 = @NDCNumber
) PAPA ON MP.MarketedProductID = PAPA.MarketedProductID
) xxx ON TCTSP.TherapeuticConceptTreeID = xxx.TherapeuticConceptTreeID
) MPI ON PR.MarketedProductID = MPI.MarketedProductID
JOIN ( SELECT P.ProductID ,
O.Strength ,
O.Unit
FROM DatabaseTwo.dbo.Product AS P
INNER JOIN DatabaseTwo.dbo.Marketed_Product
AS M ON P.MarketedProductID = M.MarketedProductID
INNER JOIN DatabaseTwo.dbo.Specific_Product
AS S ON M.SpecificProductID = S.SpecificProductID
LEFT OUTER JOIN DatabaseTwo.dbo.OrderableName_Combined
AS O ON S.SpecificProductID = O.SpecificProductID
GROUP BY P.ProductID ,
O.Strength ,
O.Unit
) U ON PR.ProductID = U.ProductID
JOIN ( SELECT PA.ProductID ,
S.ScriptFormID ,
F.Code AS NCPDPScriptFormCode ,
S.FormName
FROM DatabaseTwo.dbo.Package AS PA
INNER JOIN DatabaseTwo.dbo.Script_Form
AS S ON PA.NCPDPScriptFormCode = S.NCPDPScriptFormCode
INNER JOIN DatabaseTwo.dbo.FormCode AS F ON S.FormName = F.FormName
GROUP BY PA.ProductID ,
S.ScriptFormID ,
F.Code ,
S.FormName
) FRM ON PR.ProductID = FRM.ProductID
WHERE
( PR.OffMarketDate IS NULL )
OR ( PR.OffMarketDate = '' )
OR (PR.OffMarketDate = '1899-12-30 00:00:00.000')
OR ( PR.OffMarketDate <> '1899-12-30 00:00:00.000'
AND DATEDIFF(dd, GETDATE(),PR.OffMarketDate) > 0
)
GROUP BY PR.ePrescribingName ,
U.Strength ,
FRM.FormName ,
PR.DEAClassificationID
-- ORDER BY pr.ePrescribingName
SELECT LL.ProductID AS MedicationID ,
temp.MedicationName ,
temp.MedicationStrength ,
temp.MedicationForm ,
temp.DEASchedule ,
temp.NDCNumber ,
fs.[ReturnFormulary] AS FormularyStatus ,
copay.CopaTier ,
copay.FirstCopayTerm ,
copay.FlatCopayAmount ,
copay.PercentageCopay ,
copay.PharmacyType,
dbo.udf_EP_GetBrandGeneric(LL.ProductID) AS BrandGeneric
FROM TempTherapeuticAlt temp
OUTER APPLY ( SELECT TOP 1
ProductID
FROM DatabaseTwo.dbo.Product
WHERE ePrescribingName = temp.MedicationName
) AS LL
OUTER APPLY [dbo].[udf_EP_tbfGetFormularyStatus](@patientid,
LL.ProductID,
@pbmid) AS fs
OUTER APPLY ( SELECT TOP 1
*
FROM udf_EP_CopayDetails(LL.ProductID,
@pbmid,
fs.ReturnFormulary)
) copay
--ORDER BY LL.ProductID
TRUNCATE TABLE TempTherapeuticAlt
END
On my dev server each table has about 63k rows, and this procedure takes about 30 seconds to return a result.
On my production server it times out or takes more than a minute.
My production tables hold about 1,400 million records. Could this be the reason?
If so, what can be done? I have all the required indexes on the tables.
Any help would be greatly appreciated.
Thanks
Execution Plan
http://www.sendspace.com/file/hk8fao
The major bottleneck in the plan is this call:
OUTER APPLY [dbo].[udf_EP_tbfGetFormularyStatus](@patientid,
LL.ProductID,
@pbmid) AS fs
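A per-row function hurts at 1,400 million rows because its cost scales with the number of outer rows. The sketch below shows this by counting invocations. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so it is not T-SQL. The formulary_status function and the formulary table are hypothetical.

If udf_EP_tbfGetFormularyStatus is a multi-statement table-valued function, SQL Server generally runs it once per outer row under APPLY. A join against a table instead lets the optimizer process the rows as a set.

```python
import sqlite3

# Counter incremented every time SQLite evaluates the function.
calls = {"n": 0}

def formulary_status(product_id):
    """Hypothetical stand-in for udf_EP_tbfGetFormularyStatus."""
    calls["n"] += 1
    return "F" if product_id % 2 == 0 else "NF"

conn = sqlite3.connect(":memory:")
conn.create_function("formulary_status", 1, formulary_status)
conn.execute("CREATE TABLE temp_alt (product_id INTEGER)")
conn.executemany("INSERT INTO temp_alt VALUES (?)", [(i,) for i in range(1, 1001)])

# Row-by-row: the function is evaluated once for every outer row.
per_row = conn.execute(
    "SELECT product_id, formulary_status(product_id) FROM temp_alt"
).fetchall()
per_row_calls = calls["n"]

# Set-based: the same statuses stored in a (hypothetical) table and joined.
conn.execute("CREATE TABLE formulary (product_id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO formulary VALUES (?, ?)",
    [(i, "F" if i % 2 == 0 else "NF") for i in range(1, 1001)],
)
set_based = conn.execute(
    "SELECT t.product_id, f.status FROM temp_alt t "
    "LEFT JOIN formulary f ON f.product_id = t.product_id"
).fetchall()

print(per_row_calls)                          # one call per outer row
print(sorted(per_row) == sorted(set_based))   # same answer either way
```

The join never calls the function, so the counter stays at the number of rows scanned by the first query.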

Some strategies that may help:
Remove the first ORDER BY clause; sorts are killers on complex queries, and it shouldn't be necessary.
Use CTEs to break the query into smaller pieces that can be individually addressed.
Reduce the nesting in the first set of JOINs
Extract the second and third sets of joins (the GROUPED ones) and insert their results into an indexed temporary table before joining and grouping everything.
You did not include the definitions of the user-defined functions (udf_EP_tbfGetFormularyStatus, udf_EP_CopayDetails, udf_EP_GetBrandGeneric); custom functions are often where performance issues hide.
Without seeing the execution plan, it's difficult to see where the particular problems may be.
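The suggestion to move the GROUPED joins into an indexed temporary table can be sketched as follows. It uses Python's sqlite3 as a stand-in for SQL Server, with simplified, hypothetical versions of Product and Package. In T-SQL the staging step would look more like SELECT ... INTO #FRM followed by CREATE INDEX on #FRM.

The staged version returns the same rows as the inline derived table. The difference is that the optimizer gets a small, indexed input to join against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE package (product_id INTEGER, form_name TEXT);
""")
conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
conn.executemany(
    "INSERT INTO package VALUES (?, ?)",
    [(1, "Tablet"), (1, "Tablet"), (2, "Capsule"), (3, "Tablet"), (3, "Solution")],
)

# Inline: the grouped derived table lives inside the big query.
inline = conn.execute("""
SELECT p.name, frm.form_name
FROM product p
JOIN (SELECT product_id, form_name
      FROM package
      GROUP BY product_id, form_name) frm ON frm.product_id = p.product_id
ORDER BY p.name, frm.form_name
""").fetchall()

# Staged: materialize the grouped rows once, index them, then join.
conn.executescript("""
CREATE TEMP TABLE frm AS
    SELECT product_id, form_name FROM package GROUP BY product_id, form_name;
CREATE INDEX temp.ix_frm_product ON frm (product_id);
""")
staged = conn.execute("""
SELECT p.name, f.form_name
FROM product p
JOIN temp.frm f ON f.product_id = p.product_id
ORDER BY p.name, f.form_name
""").fetchall()

print(inline == staged)
```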

You have a query that selects data from 4 or 5 tables, some of them multiple times. It's really hard to say how to improve it without a deep analysis of what you are trying to achieve and what the table structure actually is.
Data size is definitely an issue; I think it's quite obvious that the more data has to be processed, the longer the query will take. Some general advice: run the query directly and check the execution plan, since it may reveal bottlenecks. Then check whether the statistics are up to date. Also review your tables; partitioning may help a lot in some cases. In addition, you can try altering the tables to create the clustered index not on the PK (where it goes by default unless otherwise specified) but on other column(s), so that your query benefits from a particular physical order of the records. Note: do this only if you are absolutely sure what you are doing.
Finally, try refactoring your query. I have a feeling that there is a better way to get the desired results. Sorry, without understanding the table structure and the expected results I cannot give an exact solution, but multiple joins of the same tables and a bunch of derived tables don't look good to me.
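Checking a plan before and after adding an index looks roughly like this. The sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in. In SQL Server you would inspect the actual execution plan and refresh statistics with UPDATE STATISTICS. The package table and ix_package_ndc11 index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE package (package_id INTEGER PRIMARY KEY, ndc11 TEXT, product_id INTEGER)"
)
conn.executemany(
    "INSERT INTO package (ndc11, product_id) VALUES (?, ?)",
    [(f"{i:011d}", i % 100) for i in range(5000)],
)

QUERY = "SELECT product_id FROM package WHERE ndc11 = ?"

def plan(sql, params):
    # Each EXPLAIN QUERY PLAN row has four columns; the last is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

before = plan(QUERY, ("00000000042",))   # no index yet: full table scan
conn.execute("CREATE INDEX ix_package_ndc11 ON package (ndc11)")
conn.execute("ANALYZE")                  # refresh SQLite's optimizer statistics
after = plan(QUERY, ("00000000042",))    # now an index search

print(before)
print(after)
```

Before the index exists, the plan reports a SCAN of the whole table. Afterwards it reports a SEARCH using ix_package_ndc11, which is the kind of change to look for in a real plan.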

Related

Improving SQL query efficiency

I have a query that calculates a distance. It works as expected; however, it can take a long time to run with lots of data points.
I have tried adding DISTRIBUTION = HASH, and it has had some impact, but not a significant one.
CREATE TABLE #POI_PINGS_ALL
WITH
(
DISTRIBUTION=HASH([UUID])
)
AS
SELECT
l.[PoiId]
, v.[UUID]
, [CreatedOn]
, v.[Lat]
, v.[Lon]
FROM #LOCATION_PINGS_COUNCIL_DISTRIBUTED v
INNER JOIN #POI_LOOKUP l
ON l.[CouncilId] = v.[CouncilId]
INNER JOIN #POI p
ON p.[PoiId] = l.[PoiId]
WHERE
dbo.fn_GetDist(v.[Lat], v.[Lon], p.[Lat], p.[Lon]) <= p.[Radius]
Is there a more efficient way to write this query ?

SQL Query to get data from various databases

I wrote the query below to pull data from different databases. I created two temp tables to pull the data from two different databases, and a final SELECT statement in the original database joins all the tables. The query executes but returns no data (the report is blank). When I run the two temp-table queries separately, they return the correct data, but when I execute the whole query the result is blank. The query is below. Please help.
"set fmtonly off
use GODSDB
IF object_id('tempdb..#CISIS_Call_Log') IS NOT NULL DROP TABLE #CISIS_Call_Log
select *
into #CISIS_Call_Log
from OPENQUERY (CSISDB,
'select
ccl.ContractOID,
ccl.db_insertdate,
ccl.ContractCallLogStatusIdentifier,
ccl.db_UpdateDate,
ccp.ContractCallLogPurposeOID,
ccp.ContractCallLogPurposeIdentifier,
ccp.Description
from csisdb.dbo.ContractCallLog CCL
inner join csisdb.dbo.ContractCallLogPurpose CCP on ccl.ContractCallLogPurposeIdentifier = ccp.ContractCallLogPurposeIdentifier
where JurisdictionShortIdentifier = ''ON''
AND ContractCallLogStatusIdentifier IN (''DNR'', ''NR'')
')
IF object_id('tempdb..#CMS_Campaign') IS NOT NULL DROP TABLE #CMS_Campaign
select *
into #CMS_Campaign
from OPENQUERY (BA_GBASSTOCMS, '
Select
SystemSourceIdentifier,
ContractOID,
OfferSentDate,
CampaignOfferTypeIdentifier,
CampaignContractStatusIdentifier,
CampaignContractStatusUpdateDate,
DeclineDate,
CampaignOfferOID,
CampaignOID,
CampaignStartDate,
CampaignEndDate,
Jurisdiction,
CampaignDescription
from CMS.dbo.vw_CampaignInfo
where Jurisdiction = ''ON''
and CampaignOfferTypeIdentifier = ''REN''
')
select mp.CommodityTypeIdentifier as Commodity
,c.RtlrContractIdentifier as ContractID
,cs.ContractStatusIdentifier as ContractStatus
,c.SigningDate
,cf.StartDate as FlowStartDate
,cf.EndDate as FlowEndDate
,datediff(day, getdate(), c.RenewalDate) as RemainingDays
,c.RenewalDate
,l.ContractCallLogStatusIdentifier as CallLogType
,Substring (l.Description, 1, 20) as CallPurpose
,l.db_insertDate as CallLogDate
,cms.CampaignOfferOID as OfferID
,cms.CampaignContractStatusIdentifier as OfferStatus
,cms.CampaignContractStatusUpdateDate as StatusChangeDate
,cms.DeclineDate
from Contract c
inner join contractstate cs on cs.contractoid = c.ContractOID
and cs.ContractStatusIdentifier in ('ERA', 'FLW')
and datediff(day, getdate(), c.RenewalDate) > 60
inner join SiteIdentification si on si.SiteOID = c.SiteOID
inner join MarketParticipant mp on mp.MarketParticipantOID = si.MarketParticipantOID
inner join Market m on m.MarketOID = mp.MarketOID
inner join Jurisdiction j on j.JurisdictionOID = m.JurisdictionOID
and j.CountryCode = 'CA'
and j.ProvinceOrStateCode = 'ON'
inner join ContractFlow cf on cf.ContractOID = c.ContractOID
inner join #CISIS_Call_Log l on convert(varchar(15), l.ContractOID) = c.RtlrContractIdentifier
inner join #CMS_Campaign cms on convert(varchar(15), cms.ContractOID) = c.RtlrContractIdentifier
set fmtonly on"
If the data in each temp table is verified, then:
Try a smaller, less complex query to test your temp tables with. Also try it using LEFT joins, e.g.:
select
c.RtlrContractIdentifier as ContractID
, c.SigningDate
, datediff(day, getdate(), c.RenewalDate) as RemainingDays
, c.RenewalDate
, l.ContractCallLogStatusIdentifier as CallLogType
, Substring (l.Description, 1, 20) as CallPurpose
, l.db_insertDate as CallLogDate
, cms.CampaignOfferOID as OfferID
, cms.CampaignContractStatusIdentifier as OfferStatus
, cms.CampaignContractStatusUpdateDate as StatusChangeDate
, cms.DeclineDate
from Contract c
LEFT join #CISIS_Call_Log l on convert(varchar(15), l.ContractOID) = c.RtlrContractIdentifier
LEFT join #CMS_Campaign cms on convert(varchar(15), cms.ContractOID) = c.RtlrContractIdentifier
Does this return data? Does it return data from both joined tables?
If neither temp table returns data, then those join conditions need to be changed.
If both temp tables do return data from that query, then try INNER joins. If that still works, add back more joins (one at a time) until you identify the join that causes the overall fault.
Without data for every table it just isn't possible for us to pinpoint the exact reason for an empty result. Only you can, so you need to troubleshoot the problem one step at a time.
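A small simulation shows why one unmatched INNER join blanks the whole report, and how LEFT joins expose the culprit. It uses Python's sqlite3 as a stand-in for SQL Server, and the table and column names are simplified. The data is made up: no campaign row matches any contract.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contract (rtlr_contract_identifier TEXT);
CREATE TABLE call_log (contract_oid INTEGER, status TEXT);
CREATE TABLE campaign (contract_oid INTEGER, offer_id INTEGER);
INSERT INTO contract VALUES ('1001'), ('1002');
INSERT INTO call_log VALUES (1001, 'DNR');
-- No campaign row matches any contract in this made-up data.
INSERT INTO campaign VALUES (2002, 77);
""")

# Same shape as the test query above; {kind} is swapped between INNER and LEFT.
JOIN_SQL = """
SELECT c.rtlr_contract_identifier, l.status, cms.offer_id
FROM contract c
{kind} JOIN call_log l ON CAST(l.contract_oid AS TEXT) = c.rtlr_contract_identifier
{kind} JOIN campaign cms ON CAST(cms.contract_oid AS TEXT) = c.rtlr_contract_identifier
ORDER BY c.rtlr_contract_identifier
"""

inner_rows = conn.execute(JOIN_SQL.format(kind="INNER")).fetchall()
left_rows = conn.execute(JOIN_SQL.format(kind="LEFT")).fetchall()

# One INNER join with no matches empties the whole result...
print(inner_rows)
# ...while LEFT joins keep every contract and show None where a join failed.
for row in left_rows:
    print(row)
```

The offer_id column is None for every contract, which points at the campaign join (or its converted key) as the one to investigate.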

RIGHT/LEFT Join does not provide null values without condition

I have two tables: one is the lookup table and the other is the data table. The lookup table has the columns cycleid and cycle; the data table has SID, cycleid, and cycle. Below is the structure of the tables.
If you check the data table, a given SID may or may not have all the cycles. I want to output each SID's completed as well as missed cycles.
I right-joined the lookup table to retrieve the missing as well as the completed cycles. Below is the query I used:
SELECT TOP 1000 [SID]
,s4.[CYCLE]
,s4.[CYCLEID]
FROM [dbo].[data] s3 RIGHT JOIN
[dbo].[lookup_data] s4 ON s3.CYCLEID = s4.CYCLEID
The query does not show me the missed values when I query for all the SIDs. When I query for a specific SID with the query below, I get the correct result, including the missed ones:
SELECT TOP 1000 [SID]
,s4.[CYCLE]
,s4.[CYCLEID]
FROM [dbo].[data] s3 RIGHT JOIN [dbo].[lookup_data] s4
ON s3.CYCLEID = s4.CYCLEID
AND s3.SID = 101002
ORDER BY [SID], s4.[CYCLEID]
As I am feeding this query into Tableau, I cannot provide the SID value in the query. I want to return all the SIDs and do the rest in Tableau.
The expected output I need is shown below.
I wrote a cross join query like the one below to achieve my expected output:
SELECT DISTINCT
tab.CYCLEID
,tab.SID
,d.CYCLE
FROM ( SELECT d.SID
,d.[CYCLE]
,e.CYCLEID
FROM ( SELECT e.sid
,e.CYCLE
FROM [db_temp].[dbo].[Sheet3$] e
) d
CROSS JOIN [db_temp].[dbo].[Sheet4$] e
) tab
LEFT OUTER JOIN [db_temp].[dbo].[Sheet3$] d
ON d.CYCLEID = tab.CYCLEID
AND d.SID = tab.SID
ORDER BY tab.SID
,tab.CYCLEID;
However, I am not able to use this query in more scenarios, as my data set has roughly 20 to 40 columns and I run into issues when I use the query above.
Is there a simpler way to do this with only a LEFT or RIGHT join? I want the query to return all the missing and completed values for all the SIDs instead of supplying a single SID in the query.
You can create a master table first (combining every SID with every CYCLEID), then left join the data table to it:
;with ctxMaster as (
select distinct d.SID, l.CYCLE, l.CYCLEID
from lookup_data l
cross join data d
)
select d.SID, m.CYCLE, m.CYCLEID
from ctxMaster m
left join data d on m.SID = d.SID and m.CYCLEID = d.CYCLEID
order by m.SID, m.CYCLEID
Or, if you don't want to use a common table expression, here is a subquery version:
select d.SID, m.CYCLE, m.CYCLEID
from (select distinct d.SID, l.CYCLE, l.CYCLEID
from lookup_data l
cross join data d) m
left join data d on m.SID = d.SID and m.CYCLEID = d.CYCLEID
order by m.SID, m.CYCLEID
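The CTE version can be tried in SQLite through Python's sqlite3. The leading semicolon is a T-SQL habit and isn't needed there. The sample rows are made up.

One small change from the answer: the sketch selects m.SID, which is always populated, plus a hypothetical status column. In the answer's version, d.SID is NULL for every missed cycle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lookup_data (CYCLEID INTEGER, CYCLE TEXT);
CREATE TABLE data (SID INTEGER, CYCLEID INTEGER, CYCLE TEXT);
INSERT INTO lookup_data VALUES (1, 'C1'), (2, 'C2'), (3, 'C3');
INSERT INTO data VALUES (101002, 1, 'C1'), (101002, 3, 'C3'), (101003, 2, 'C2');
""")

rows = conn.execute("""
WITH ctxMaster AS (
    -- every SID paired with every cycle from the lookup table
    SELECT DISTINCT d.SID, l.CYCLE, l.CYCLEID
    FROM lookup_data l
    CROSS JOIN data d
)
SELECT m.SID, m.CYCLE, m.CYCLEID,
       CASE WHEN d.SID IS NULL THEN 'missed' ELSE 'completed' END AS status
FROM ctxMaster m
LEFT JOIN data d ON m.SID = d.SID AND m.CYCLEID = d.CYCLEID
ORDER BY m.SID, m.CYCLEID
""").fetchall()

for row in rows:
    print(row)
```

Each SID gets one row per cycle, so Tableau can filter or colour on the status column without a SID being supplied in the query.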

Sql Server right side restrictions on left join

Please read it slowly. This isn't a dup.
Tables:
CREATE TABLE [dbo].[TEST] (
[TEST_ID] [integer] IDENTITY (1, 1) NOT NULL ,
....
[TEST_TYPE_ID] [char](1) NULL ,
....
)
CREATE TABLE [dbo].[TEST_A] (
[TEST_ID] [integer] NOT NULL ,
....
)
CREATE TABLE [dbo].[TEST_B] (
[TEST_ID] [integer] NOT NULL ,
....
)
Normally you would write:
select *
from dbo.TEST as t
left join dbo.TEST_A as ta on ta.TEST_ID = t.TEST_ID
left join dbo.TEST_B as tb on tb.TEST_ID = t.TEST_ID
...
However, SQL Server could save a lot of work IF it knows that only some of table TEST's rows can potentially join to TEST_A:
select *
from dbo.TEST as t
left join dbo.TEST_A as ta on t.TEST_TYPE_ID = 'A'
and ta.TEST_ID = t.TEST_ID
left join dbo.TEST_B as tb on t.TEST_TYPE_ID = 'B'
and tb.TEST_ID = t.TEST_ID
...
These queries return the exact same result. Adding TEST_TYPE_ID = X does not change the result.
Note: You CAN'T put the restriction on TEST_TYPE_ID in the WHERE clause. That would change the number of rows returned.
My question is: in a left join, if you place a restriction on the right side, will SQL Server use this information first? Order of operations is very important here, especially when TEST and TEST_A are large but only a few records join.
I have tested this, and the execution plan seems to indicate: no. It appears SQL Server first does a normal left join, trying to join all the records in TEST to TEST_A, and then applies a "filter". However, I'm not certain I'm reading the execution plan correctly. If TEST_TYPE_ID = X is applied second, it is effectively a no-op. If TEST_TYPE_ID = X is applied first, it will limit the left join to only the rows that will actually join.
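The row-count claims can be checked on a toy data set: both forms return the same rows, while moving the restriction to WHERE does not. This sketch uses SQLite through Python's sqlite3, so it says nothing about how SQL Server orders the work in its plan.

It also assumes, as the question implies, that TEST_A holds only IDs of type 'A' tests and TEST_B only IDs of type 'B' tests. The other columns elided in the question are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TEST (TEST_ID INTEGER PRIMARY KEY, TEST_TYPE_ID CHAR(1));
CREATE TABLE TEST_A (TEST_ID INTEGER NOT NULL);
CREATE TABLE TEST_B (TEST_ID INTEGER NOT NULL);
INSERT INTO TEST VALUES (1, 'A'), (2, 'B'), (3, 'A'), (4, NULL);
INSERT INTO TEST_A VALUES (1), (3);
INSERT INTO TEST_B VALUES (2);
""")

plain = conn.execute("""
SELECT t.TEST_ID, ta.TEST_ID, tb.TEST_ID
FROM TEST t
LEFT JOIN TEST_A ta ON ta.TEST_ID = t.TEST_ID
LEFT JOIN TEST_B tb ON tb.TEST_ID = t.TEST_ID
ORDER BY t.TEST_ID
""").fetchall()

# Restriction in the ON clause: only type 'A'/'B' rows may find a partner.
restricted_on = conn.execute("""
SELECT t.TEST_ID, ta.TEST_ID, tb.TEST_ID
FROM TEST t
LEFT JOIN TEST_A ta ON t.TEST_TYPE_ID = 'A' AND ta.TEST_ID = t.TEST_ID
LEFT JOIN TEST_B tb ON t.TEST_TYPE_ID = 'B' AND tb.TEST_ID = t.TEST_ID
ORDER BY t.TEST_ID
""").fetchall()

# Restriction in WHERE: non-'A' rows of TEST are dropped entirely.
restricted_where = conn.execute("""
SELECT t.TEST_ID, ta.TEST_ID
FROM TEST t
LEFT JOIN TEST_A ta ON ta.TEST_ID = t.TEST_ID
WHERE t.TEST_TYPE_ID = 'A'
ORDER BY t.TEST_ID
""").fetchall()

print(plain == restricted_on, len(plain), len(restricted_where))
```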
Note: My actual case looks very different. I have distilled the question down to this bare bones example to demonstrate the issue.

SQL query distinct count using inner join

Need help ensuring the below query doesn't return inaccurate results.
select @billed = count(a.[counter]) from [dbo].cxitems a with (nolock)
inner join [dbo].cxitemhist b with (nolock) on a.[counter] = b.cxlink
where b.[eventtype] in ('BILLED','REBILLED')
and b.[datetime] between @begdate and @enddate
The query is "mostly" accurate as is; however, there is a slight possibility that the cxitemhist table could contain more than one "billed" record for a given date range. I only need to count an item as "Billed" once during the given date range.
You can join on a subquery that limits you to one row for each combination of fields used for the join:
select @billed = count(a.[counter])
from [dbo].cxitems a
inner join (
select distinct cxlink
from [dbo].cxitemhist
where [eventtype] in ('BILLED','REBILLED')
and [datetime] between @begdate and @enddate
) b on a.[counter] = b.cxlink
You can also use the APPLY operator instead of a join here, but you'll have to check against your data to see which gives better performance.
If you only need to count records in the cxitems table that have any corresponding records in the cxitemhist table, you can use an EXISTS clause with a subquery:
select @billed = count(a.[counter]) from [dbo].cxitems a
where exists(select * from [dbo].cxitemhist b
where a.[counter] = b.cxlink
and b.[eventtype] in ('BILLED','REBILLED')
and b.[datetime] between @begdate and @enddate)
I can't really say how this will affect performance without specific data, but it should be comparably fast to your code.
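A few made-up rows show the difference between the original join and the two answers. This sketch uses SQLite through Python's sqlite3 rather than SQL Server. The T-SQL variables @begdate and @enddate become named parameters, and the NOLOCK hints are dropped because SQLite has no equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cxitems ("counter" INTEGER PRIMARY KEY);
CREATE TABLE cxitemhist (cxlink INTEGER, eventtype TEXT, "datetime" TEXT);
INSERT INTO cxitems VALUES (1), (2), (3);
INSERT INTO cxitemhist VALUES
    (1, 'BILLED',   '2024-01-05'),
    (1, 'REBILLED', '2024-01-20'),  -- item 1 billed twice in the range
    (2, 'BILLED',   '2024-01-10'),
    (3, 'BILLED',   '2023-12-31');  -- item 3 billed outside the range
""")
params = {"begdate": "2024-01-01", "enddate": "2024-01-31"}

# Original query: one output row per matching history row, so item 1 counts twice.
naive = conn.execute("""
SELECT COUNT(a."counter") FROM cxitems a
JOIN cxitemhist b ON a."counter" = b.cxlink
WHERE b.eventtype IN ('BILLED', 'REBILLED')
  AND b."datetime" BETWEEN :begdate AND :enddate
""", params).fetchone()[0]

# Answer 1: join to a DISTINCT subquery, one row per cxlink.
distinct_join = conn.execute("""
SELECT COUNT(a."counter") FROM cxitems a
JOIN (SELECT DISTINCT cxlink FROM cxitemhist
      WHERE eventtype IN ('BILLED', 'REBILLED')
        AND "datetime" BETWEEN :begdate AND :enddate) b
  ON a."counter" = b.cxlink
""", params).fetchone()[0]

# Answer 2: EXISTS only asks whether at least one matching history row exists.
exists_count = conn.execute("""
SELECT COUNT(a."counter") FROM cxitems a
WHERE EXISTS (SELECT * FROM cxitemhist b
              WHERE a."counter" = b.cxlink
                AND b.eventtype IN ('BILLED', 'REBILLED')
                AND b."datetime" BETWEEN :begdate AND :enddate)
""", params).fetchone()[0]

print(naive, distinct_join, exists_count)
```

Both rewrites count each item once, while the original join counts item 1 for each of its two in-range events.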
