Right Index scan when used XML

Right Index scan when used XML - sql-server

below is T-SQL generated by application
DECLARE #xml_0 XML
SET #xml_0 = N'<Val>8e4cd3e3-de98-4f55-9c55-57881157a0f0</Val>
<Val>2f483275-7333-4786-aca8-454e5bf4823f</Val>
<Val>ce1ce763-1f68-48ec-bedf-f4641e40d8f8</Val>
<Val>6b471d5e-fd5c-4db8-aa31-abb910651e18</Val>
<Val>89064e42-0592-4845-b21e-38f788ab0d2e</Val>
<Val>d54793f0-cbfb-428e-ba08-db70cab1af07</Val>
<Val>8027e6bd-09e5-4a5b-aae7-54aff4a0e6c0</Val>
<Val>53f1a5e3-b2a8-49c3-935b-a5ac7fe0c1d8</Val>
<Val>faceabad-1d0c-4f3f-8d94-674bbf1c3428</Val>
<Val>f8e0a43d-cff7-45aa-b73f-6858b1d17cd1</Val>
<Val>94e9bc76-5bb3-4cf9-9b59-fc3163c904d7</Val>
<Val>e4be8c69-5166-40cc-b49a-18adec78e356</Val>
<Val>5c564b82-64e1-46c5-a41d-bc30104f14a5</Val>
<Val>dc246c2c-7edd-407a-b378-747789bd5a75</Val>
<Val>411ac1e9-3d4f-447c-808a-b82d388816dd</Val>'
SELECT COUNT(*) FROM
(
SELECT [t0].[ID]
FROM [dbo].[HM_Rows] AS [t0], [dbo].[HM_Cells] AS [t1]
WHERE
(
[t1].[Value] IN
(
SELECT node.value('.', 'NVARCHAR(200)') FROM #xml_0.nodes('/Val') xml_0(node)
)
)
) AS [r2487772634]
here is execution plan of T-SQL above
so it scans index
it scans correct indexes
missing_index_FOR_Value_INC_RowID - on HM_Cells table
and
PK_HM_Rows - on HM_Rows table
any idea?
P.S tables are large
Row counts
HM_Rows - 17'736'181
HM_Cells - 1'048'693'775
AND YES i have rebuilded indexes and updated statistics
HM_Cells.Value is NVarChar(200)
also without XML and HM_Rows table it working fine
e.g
SELECT ID FROM HM_Cells WHERE Value IN (.........)
works excellent
Thanks a lot :)

Try using a JOIN instead of an IN, because this way you can force a loop strategy, which will probably use a seek instead of a scan:
SELECT COUNT(*) FROM
(
SELECT [t0].[ID]
FROM (
SELECT DISTINCT node.value('.', 'NVARCHAR(200)') AS Val
FROM #xml_0.nodes('/Val') xml_0(node)
) q1 INNER LOOP JOIN [dbo].[HM_Cells] AS [t1] ON q1.Val=t1.Value
CROSS JOIN [dbo].[HM_Rows] AS [t0]
) AS [r2487772634]

Related

Multiple Substring Charindex

I am trying to extract strings from a column to show which are the FROM Tables and JOIN Tables. the complete string consists of one FROM Table and multiple JOIN Tables. Here is a sample of a string:
FROM [TABLEOWNER] .[load_XT_Customer_test] load_XT_Customer_test
INNER JOIN [TABLEOWNER].[load_xt_orders_test] load_xt_orders_test ON load_xt_customer_test.customer_id=load_xt_orders_test.customer_id
INNER JOIN [TABLEOWNER].[load_XT_Orders] load_XT_Orders ON
load_xt_customer_test.customer_id=load_xt_orders.customer_id
INNER JOIN [TABLEOWNER].[load_xt_order_details_test] load_xt_order_details_test
ON load_xt_customer_test.customer_id=load_xt_order_details_test.customer_id
My problem here is that there is no unique character to separate the strings, how can I dynamically extract the single join tables?
I tested it with:
SELECT st_where
,SUBSTRING(st_where
,CHARINDEX('] ',st_where,1)+1
,ABS(CHARINDEX(' ',st_where,CHARINDEX('] ',st_where,1)+1)
-CHARINDEX(']',st_where,1)-1))
AS FromTable
,SUBSTRING(st_where
,CHARINDEX('JOIN [TABLEOWNER].[',st_where,1)+19
,ABS(CHARINDEX(' ',st_where,CHARINDEX('JOIN [TABLEOWNER].[',st_where,1)+19)
-CHARINDEX('JOIN [TABLEOWNER].[',st_where,1)-20))
AS JoinTable
,SUBSTRING(st_where
,CHARINDEX('=',st_where,1)+59
,ABS(CHARINDEX(' ',st_where,CHARINDEX('=',st_where,1)+59)
-CHARINDEX('=',st_where,1)-60))
AS JoinTable2

This should get you started:
DECLARE #st_where nvarchar(max) =
N'
FROM [TABLEOWNER] .[load_XT_Customer_test] load_XT_Customer_test
INNER JOIN [TABLEOWNER].[load_xt_orders_test] load_xt_orders_test ON load_xt_customer_test.customer_id=load_xt_orders_test.customer_id
INNER JOIN [TABLEOWNER].[load_XT_Orders] load_XT_Orders ON
load_xt_customer_test.customer_id=load_xt_orders.customer_id
INNER JOIN [TABLEOWNER].[load_xt_order_details_test] load_xt_order_details_test
ON load_xt_customer_test.customer_id=load_xt_order_details_test.customer_id
';
SELECT
TableName =
SUBSTRING
(
SS.[value],
CharStart.pos + 1,
CharEnd.pos - CharStart.pos + 1
)
FROM STRING_SPLIT
(
-- Replace delimiter phrase with a single character
-- so we can use STRING_SPLIT
-- or use a more general string splitter
REPLACE(#st_where, N'INNER JOIN', NCHAR(256)),
NCHAR(256)
) AS SS
CROSS APPLY
(
VALUES
(
-- Find the start of the table name
CHARINDEX(N'.[', SS.[value], 1)
)
) AS CharStart (pos)
CROSS APPLY
(
VALUES
(
-- Find the end of the table name
CHARINDEX(N']', SS.[value], CharStart.pos)
)
) AS CharEnd (pos);
db<>fiddle demo
The general idea is to split the input string on the phrase "INNER JOIN" then locate the table names using ".[" and "]" as delimiters.

sql tvp vs xml speed in stored procedure are the same??? Is my testing or logic flawed?

I'm an accidental DBA charged with speeding up all our sql servers. I've got a highly used query with a horrible average worker time. I noticed it uses XML to pass data to a stored procedure. Query plan tells me it spends most of its time converting XML. Everything I've read says XML is about 33% slower than TVP. I rewrote the SP using TVP and compared times using the method:
SELECT #StartTime=GETDATE()
exec GetTVPData3 #tvp --or XML method
SELECT #EndTime=GETDATE()
SELECT DATEDIFF(ms,#StartTime,#EndTime) AS [Duration in millisecs]
After many runs and averaging out the times.... TVP vs XML has XML winning by 5ms. ????? (550ms vs 545ms) Is my testing or logic flawed?
Both XML and TVP are populated before I get StartTime. I've run this on 2 different SQL test servers with similar results.
The particular code is used in a cross apply. The only difference in the SPs are:
**TVP**
CROSS APPLY (SELECT id AS ProductID, sortorder AS SortOrder FROM #Insert_tvp) Items
**XML**
CROSS APPLY (
SELECT f.id.value('#id', 'int') AS ProductID, f.id.value('#sortorder', 'int') AS SortOrder
FROM #ProductIDs.nodes('list/p')
AS f(id)
) Items
Everything in my head tells me we need to switch to using TVP and get rid of XML. But I can't convince coders without better results.
EDIT: Adding the whole XML SP:
ALTER PROCEDURE [dbo].[ExtendedDataXML]
#HostedSiteID INT,
#ProductIDs XML = NULL,
#ImageType VARCHAR(20) = NULL
AS
BEGIN
SET NOCOUNT ON;
SELECT
Products.ID AS ItemID,
0 AS ItemType,
Products.SKU,
Products.Title,
HSP.Slug,
Products.Rank,
Products.Rank AS SalesRank,
Products.Status,
Products.LaunchDate,
Products.IsOnline,
Products.IsAutoOffline,
Products.IsSalableOnline,
Products.IsMarketableOnline,
Products.LeadIn, Products.LeadOut,
COALESCE(Products.CaseQuantity, 1) AS CaseQuantity,
COALESCE(Products.MinimumOrderQuantity, 1) AS MinimumOrderQuantity,
Products.QuantityOnHand,
Image.Filename, Image.Width, Image.Height, Image.Alt, Image.Title,
Pricing.Price, Pricing.SalePrice,
Products.TruckShipment,
HSP.NDescription
FROM Products
JOIN HostedSites_Products HSP ON Products.ID = HSP.ProductID
CROSS APPLY (
SELECT f.id.value('#id', 'int') AS ProductID, f.id.value('#sortorder', 'int') AS SortOrder
FROM #ProductIDs.nodes('list/p')
AS f(id)
) Items
OUTER APPLY (
SELECT TOP(1) Filename, Width, Height, Alt, Title
FROM Items_Images
JOIN Images ON Items_Images.ImageID = Images.ID
WHERE Items_Images.ItemID = Products.ID
AND Items_Images.ItemType = 0
AND Images.Type = COALESCE(#ImageType, '.4b')
) Image
OUTER APPLY (
SELECT TOP(1) Price, SalePrice, CurrentPrice
FROM ProductPrices
WHERE ProductPrices.ProductID = Products.ID
ORDER BY LoRange ASC
) Pricing
WHERE Products.ID = Items.ProductID
AND HSP.HostedSiteID = #HostedSiteID
AND HSP.Validated = 1
AND Products.IsMarketableOnline = 1
ORDER BY Items.SortOrder
END

CROSS APPLY means row wise execution! You are parsing your XML over and over...
Your ID-List is - as far as I understand - meant as a filter
Besides the fact, that this was much better done within an inlined TVF (syntax without BEGIN...END you might try this like this:
ALTER PROCEDURE [dbo].[ExtendedDataXML]
#HostedSiteID INT,
#ProductIDs XML = NULL,
#ImageType VARCHAR(20) = NULL
AS
BEGIN
SET NOCOUNT ON;
WITH IDList AS
(
SELECT f.id.value('#id', 'int') AS ProductID, f.id.value('#sortorder', 'int') AS SortOrder
FROM #ProductIDs.nodes('list/p') AS f(id)
)
SELECT
Products.ID AS ItemID,
0 AS ItemType,
Products.SKU,
Products.Title,
HSP.Slug,
Products.Rank,
Products.Rank AS SalesRank,
Products.Status,
Products.LaunchDate,
Products.IsOnline,
Products.IsAutoOffline,
Products.IsSalableOnline,
Products.IsMarketableOnline,
Products.LeadIn, Products.LeadOut,
COALESCE(Products.CaseQuantity, 1) AS CaseQuantity,
COALESCE(Products.MinimumOrderQuantity, 1) AS MinimumOrderQuantity,
Products.QuantityOnHand,
Image.Filename, Image.Width, Image.Height, Image.Alt, Image.Title,
Pricing.Price, Pricing.SalePrice,
Products.TruckShipment,
HSP.NDescription
FROM Products
JOIN HostedSites_Products HSP ON HSP.HostedSiteID = #HostedSiteID AND HSP.Validated = 1 AND Products.ID = HSP.ProductID
INNER JOIN IDList AS Items ON Items.ProductID=Products.ProductID
OUTER APPLY (
SELECT TOP(1) Filename, Width, Height, Alt, Title
FROM Items_Images
JOIN Images ON Items_Images.ImageID = Images.ID
WHERE Items_Images.ItemID = Products.ID
AND Items_Images.ItemType = 0
AND Images.Type = COALESCE(#ImageType, '.4b')
) Image
OUTER APPLY (
SELECT TOP(1) Price, SalePrice, CurrentPrice
FROM ProductPrices
WHERE ProductPrices.ProductID = Products.ID
ORDER BY LoRange ASC
) Pricing
WHERE Products.IsMarketableOnline = 1
ORDER BY Items.SortOrder
END

Optimize select query for non-indexed view

I want to select rows from non-indexed view by their IDs.
Here is view definition:
CREATE VIEW [dbo].[_V_V3]
AS
SELECT (CONVERT(varchar,T1.PK_ID)+','+CONVERT(varchar,T2.PK_ID)) as ID,
T1.[Id] as [T1_Id],
T1.[V1] as [T1_V1],
T2.[Id] as [T2_Id],
T2.[V1] as [T2_V1]
FROM [T1] INNER JOIN [T2] ON (T2.V1=T1.V1)
where T2.V1, T1.V1 - nvarchar.
My select query:
SELECT * FROM [dbo].[_V_V3] WHERE ID IN ('1,1', '2,3', ....)
It works very slowly. With 1000 IDs this query could perform several minutes.
Is there any way to optimize this select?

Can you just do this:
View
CREATE VIEW [dbo].[_V_V3]
AS
T1.PK_ID as PK_ID1,
T2.PK_ID as PK_ID2,
T1.[Id] as [T1_Id],
T1.[V1] as [T1_V1],
T2.[Id] as [T2_Id],
T2.[V1] as [T2_V1]
FROM [T1] INNER JOIN [T2] ON (T2.V1=T1.V1)
Query
SELECT * FROM [dbo].[_V_V3]
WHERE
(
PK_ID1=1 AND PK_ID2=1
)
OR
(
PK_ID1=2 AND PK_ID2=3
)
I would think that an integer compare is fast then varchar compare

Conditional JOIN Statement SQL Server

Is it possible to do the following:
IF [a] = 1234 THEN JOIN ON TableA
ELSE JOIN ON TableB
If so, what is the correct syntax?

I think what you are asking for will work by joining the Initial table to both Option_A and Option_B using LEFT JOIN, which will produce something like this:
Initial LEFT JOIN Option_A LEFT JOIN NULL
OR
Initial LEFT JOIN NULL LEFT JOIN Option_B
Example code:
SELECT i.*, COALESCE(a.id, b.id) as Option_Id, COALESCE(a.name, b.name) as Option_Name
FROM Initial_Table i
LEFT JOIN Option_A_Table a ON a.initial_id = i.id AND i.special_value = 1234
LEFT JOIN Option_B_Table b ON b.initial_id = i.id AND i.special_value <> 1234
Once you have done this, you 'ignore' the set of NULLS. The additional trick here is in the SELECT line, where you need to decide what to do with the NULL fields. If the Option_A and Option_B tables are similar, then you can use the COALESCE function to return the first NON NULL value (as per the example).
The other option is that you will simply have to list the Option_A fields and the Option_B fields, and let whatever is using the ResultSet to handle determining which fields to use.

This is just to add the point that query can be constructed dynamically based on conditions.
An example is given below.
DECLARE #a INT = 1235
DECLARE #sql VARCHAR(MAX) = 'SELECT * FROM [sourceTable] S JOIN ' + IIF(#a = 1234,'[TableA] A ON A.col = S.col','[TableB] B ON B.col = S.col')
EXEC(#sql)
--Query will be
/*
SELECT * FROM [sourceTable] S JOIN [TableB] B ON B.col = S.col
*/

You can solve this with union
select a, b
from tablea
join tableb on tablea.a = tableb.a
where b = 1234
union
select a, b
from tablea
join tablec on tablec.a = tableb.a
where b <> 1234

I disagree with the solution suggesting 2 left joins. I think a table-valued function is more appropriate so you don't have all the coalescing and additional joins for each condition you would have.
CREATE FUNCTION f_GetData (
#Logic VARCHAR(50)
) RETURNS #Results TABLE (
Content VARCHAR(100)
) AS
BEGIN
IF #Logic = '1234'
INSERT #Results
SELECT Content
FROM Table_1
ELSE
INSERT #Results
SELECT Content
FROM Table_2
RETURN
END
GO
SELECT *
FROM InputTable
CROSS APPLY f_GetData(InputTable.Logic) T

I think it will be better to think about your query in a different way and treat them more like sets.
I do believe if you make two separate queries then join them using UNION, It will be much better in performance and more readable.

SQL Server 2008 Stored Procedure Performance issue

Hi I have a Stored Procedure
ALTER PROCEDURE [dbo].[usp_EP_GetTherapeuticalALternates]
(
#NDCNumber CHAR(11) ,
#patientid INT ,
#pbmid INT
)
AS
BEGIN
TRUNCATE TABLE TempTherapeuticAlt
INSERT INTO TempTherapeuticAlt
SELECT --PR.ProductID AS MedicationID ,
NULL AS MedicationID ,
PR.ePrescribingName AS MedicationName ,
U.Strength AS MedicationStrength ,
FRM.FormName AS MedicationForm ,
PR.DEAClassificationID AS DEASchedule ,
NULL AS NDCNumber
--INTO #myTemp
FROM DatabaseTwo.dbo.Product PR
JOIN ( SELECT MP.MarketedProductID
FROM DatabaseTwo.dbo.Therapeutic_Concept_Tree_Specific_Product TCTSP
JOIN DatabaseTwo.dbo.Marketed_Product MP ON MP.SpecificProductID = TCTSP.SpecificProductID
JOIN ( SELECT TCTSP.TherapeuticConceptTreeID
FROM DatabaseTwo.dbo.Marketed_Product MP
JOIN DatabaseTwo.dbo.Therapeutic_Concept_Tree_Specific_Product TCTSP ON MP.SpecificProductID = TCTSP.SpecificProductID
JOIN ( SELECT
PR.MarketedProductID
FROM
DatabaseTwo.dbo.Package PA
JOIN DatabaseTwo.dbo.Product PR ON PA.ProductID = PR.ProductID
WHERE
PA.NDC11 = #NDCNumber
) PAPA ON MP.MarketedProductID = PAPA.MarketedProductID
) xxx ON TCTSP.TherapeuticConceptTreeID = xxx.TherapeuticConceptTreeID
) MPI ON PR.MarketedProductID = MPI.MarketedProductID
JOIN ( SELECT P.ProductID ,
O.Strength ,
O.Unit
FROM DatabaseTwo.dbo.Product AS P
INNER JOIN DatabaseTwo.dbo.Marketed_Product
AS M ON P.MarketedProductID = M.MarketedProductID
INNER JOIN DatabaseTwo.dbo.Specific_Product
AS S ON M.SpecificProductID = S.SpecificProductID
LEFT OUTER JOIN DatabaseTwo.dbo.OrderableName_Combined
AS O ON S.SpecificProductID = O.SpecificProductID
GROUP BY P.ProductID ,
O.Strength ,
O.Unit
) U ON PR.ProductID = U.ProductID
JOIN ( SELECT PA.ProductID ,
S.ScriptFormID ,
F.Code AS NCPDPScriptFormCode ,
S.FormName
FROM DatabaseTwo.dbo.Package AS PA
INNER JOIN DatabaseTwo.dbo.Script_Form
AS S ON PA.NCPDPScriptFormCode = S.NCPDPScriptFormCode
INNER JOIN DatabaseTwo.dbo.FormCode AS F ON S.FormName = F.FormName
GROUP BY PA.ProductID ,
S.ScriptFormID ,
F.Code ,
S.FormName
) FRM ON PR.ProductID = FRM.ProductID
WHERE
( PR.OffMarketDate IS NULL )
OR ( PR.OffMarketDate = '' )
OR (PR.OffMarketDate = '1899-12-30 00:00:00.000')
OR ( PR.OffMarketDate <> '1899-12-30 00:00:00.000'
AND DATEDIFF(dd, GETDATE(),PR.OffMarketDate) > 0
)
GROUP BY PR.ePrescribingName ,
U.Strength ,
FRM.FormName ,
PR.DEAClassificationID
-- ORDER BY pr.ePrescribingName
SELECT LL.ProductID AS MedicationID ,
temp.MedicationName ,
temp.MedicationStrength ,
temp.MedicationForm ,
temp.DEASchedule ,
temp.NDCNumber ,
fs.[ReturnFormulary] AS FormularyStatus ,
copay.CopaTier ,
copay.FirstCopayTerm ,
copay.FlatCopayAmount ,
copay.PercentageCopay ,
copay.PharmacyType,
dbo.udf_EP_GetBrandGeneric(LL.ProductID) AS BrandGeneric
FROM TempTherapeuticAlt temp
OUTER APPLY ( SELECT TOP 1
ProductID
FROM DatabaseTwo.dbo.Product
WHERE ePrescribingName = temp.MedicationName
) AS LL
OUTER APPLY [dbo].[udf_EP_tbfGetFormularyStatus](#patientid,
LL.ProductID,
#pbmid) AS fs
OUTER APPLY ( SELECT TOP 1
*
FROM udf_EP_CopayDetails(LL.ProductID,
#PBMID,
fs.ReturnFormulary)
) copay
--ORDER BY LL.ProductID
TRUNCATE TABLE TempTherapeuticAlt
END
On my dev server I have data of 63k in each table
so this procedure took about 30 seconds to return result.
On my Production server, it is timing out, or taking >1 minute.
I am wondering my production server tables are full with 1400 millions of records,
can this be a reason.
if so what can be done, I have all required indexes on tables.
any help would be greatly appreciated.
thanks
Execution Plan
http://www.sendspace.com/file/hk8fao
Major Leakage
OUTER APPLY [dbo].[udf_EP_tbfGetFormularyStatus](#patientid,
LL.ProductID,
#pbmid) AS fs

Some strategies that may help:
Remove the first ORDER BY statement, those are killer on complex queries shouldn't be necessary.
Use CTEs to break the query into smaller pieces that can be individually addressed.
Reduce the nesting in the first set of JOINs
Extract the second and third set of joins (the GROUPED ones) and insert those into a temporary indexed table before joining and grouping everything.
You did not include the definition for function1 or function2 -- custom functions are often a place where performance issues can hide.
Without seeing the execution plan, it's difficult to see where the particular problems may be.

You have a query that selects data from 4 or 5 tables , some of them multiple times. It's really hard to say how to improve without deep analysis of what you are trying to achieve and what table structure actually is.
Data size is definitely an issue; I think it's quite obvious that the more data has to be processed, the longer query will take. Some general advices... Run the query directly and check execution plan. It may reveal bottlenecks. Then check if statistics is up to date. Also, review your tables, partitioning may help a lot in some cases. In addition, you can try altering tables and create clustered index not on PK (as it's done by default unless otherwise specified), but on other column[s] so your query will benefit from certain physical order of records. Note : do it only if you are absolutely sure what you are doing.
Finally, try refactoring your query. I have a feeling that there is a better way to get desired results (sorry, without understanding of table structure and expected results I cannot tell exact solution, but multiple joins of the same tables and bunch of derived tables don't look good to me)