I've got a question in terms of processing and making a query more efficient whilst maintaining its accuracy. Before I display the query I'd like to point out some basics of it.
I've got a CASE expression that manipulates the WHERE clause to get all children of the parent. Basically I've got two types of data that I need to display: a red and a green type. The red type has a column (TRK_TrackerGroup_LKID2) set to NULL by default, whereas the green data has a value in said column (ranging from 5 to 7).
My problem is that I need to extract both types of data to accurately get a count of outstanding issues in a view, but when I do so (by adding the CASE), the execution time goes from under 1 second to well over 15 seconds.
This is the query (with the mentioned case):
SELECT TS.id AS TrackerStartDateID,
       TSM.mappingtypeid,
       TSM.maptoid,
       TFLK.trk_trackergroup_lkid,
       Count(TF.id) AS Cnt
FROM   [dbo].[trk_startdate] TS
       INNER JOIN [dbo].[trk_startdatemap] TSM
               ON TS.id = TSM.trk_startdateid
                  AND TSM.deletedflag = 0
       INNER JOIN [dbo].[trk_trackerfeatures] TF
               ON TF.trk_startdateid = TS.id
                  AND TF.deletedflag = 0
       INNER JOIN [dbo].[trk_trackerfeatures_lk] TFLK
               ON TFLK.id = TF.trk_feature_lkid
WHERE  TS.deletedflag = 0
       AND TF.applicabletoproject = 1
       AND TF.readyforwork = CASE -- HERE IS THE PROBLEM
                               WHEN TF.trk_trackerstatus_lkid2 IS NULL THEN 0
                               ELSE 1
                             END
       AND TF.datestamp = (SELECT Max(TF2.datestamp)
                           FROM   [dbo].[trk_trackerfeatures] TF2
                                  INNER JOIN [dbo].[trk_trackerfeatures_lk] TFLK2
                                          ON TFLK2.id = TF2.trk_feature_lkid
                           WHERE  TF.trk_startdateid = TF2.trk_startdateid
                                  AND TFLK2.trk_trackergroup_lkid = TFLK.trk_trackergroup_lkid)
GROUP  BY TS.id,
          TSM.mappingtypeid,
          TSM.maptoid,
          TFLK.trk_trackergroup_lkid,
          TF.datestamp
It functions as a 'parent' in the sense that it grabs the latest inserted data set (using DateStamp) from every single child group. This is necessary to produce a parent report in SSRS at a later time, but at the moment my problem (as mentioned above) is the execution time.
I'd like to hear if there are any suggestions on how to decrease the execution time whilst maintaining the accuracy of the query.
Expected output:
Without the case I get this:
Your problem is that this condition can't use an index:
AND TF.readyforwork = CASE -- HERE IS THE PROBLEM
                        WHEN TF.trk_trackerstatus_lkid2 IS NULL THEN 0
                        ELSE 1
                      END
Try changing it to:
AND (   (TF.readyforwork = 0 AND TF.trk_trackerstatus_lkid2 IS NULL)
     OR (TF.readyforwork = 1 AND TF.trk_trackerstatus_lkid2 IS NOT NULL)
    )
But again, you should check the execution plan (EXPLAIN ANALYZE on engines that support it, or the actual execution plan in SQL Server) to test whether your query is using an index or not.
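Before swapping predicates, it can help to confirm that the rewrite returns the same rows as the CASE form. The following is a minimal sketch of such a check, assuming an in-memory SQLite table named tf as a hypothetical stand-in for trk_trackerfeatures (only the two relevant columns are modelled). It checks logical equivalence only; it says nothing about how SQL Server will plan either form.

```python
import sqlite3

# Hypothetical miniature of trk_trackerfeatures: only the two columns
# that the predicate touches are modelled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tf (id INTEGER PRIMARY KEY, readyforwork INTEGER, "
    "trk_trackerstatus_lkid2 INTEGER)"
)
conn.executemany(
    "INSERT INTO tf (readyforwork, trk_trackerstatus_lkid2) VALUES (?, ?)",
    [(0, None), (1, None), (0, 5), (1, 6), (1, 7), (None, None), (None, 5)],
)

# Original predicate: compare the flag against a CASE expression.
case_form = """
    SELECT id FROM tf
    WHERE readyforwork = CASE
              WHEN trk_trackerstatus_lkid2 IS NULL THEN 0
              ELSE 1
          END
    ORDER BY id
"""

# Suggested rewrite: two plain AND branches joined by OR.
or_form = """
    SELECT id FROM tf
    WHERE (readyforwork = 0 AND trk_trackerstatus_lkid2 IS NULL)
       OR (readyforwork = 1 AND trk_trackerstatus_lkid2 IS NOT NULL)
    ORDER BY id
"""

case_ids = [row[0] for row in conn.execute(case_form)]
or_ids = [row[0] for row in conn.execute(or_form)]
print(case_ids, or_ids)
```

Both forms pick the same rows, including the cases where readyforwork itself is NULL, so the rewrite should not change the counts.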
The most problematic bit of your query seems to be the correlated subquery, because it must be evaluated for every candidate row.
You should optimize this first. To do so, you can add indexes that the engine can use to quickly calculate that value for each row.
Based on your query, I would add these two composite (multi-column) indexes:
On table trk_trackerfeatures, index fields: trk_startdateid, datestamp
On table trk_trackerfeatures_lk, index fields: id, trk_trackergroup_lkid
We've set up a stream on a table that is continuously loaded via Snowpipe.
We're consuming this data with a task that runs every minute, where we merge into another table. There is a possibility of duplicate keys, so we use a ROW_NUMBER() window function, ordered by the file created timestamp descending, and keep row_num = 1. This way we always get the latest insert.
Initially we used a standard task with the merge statement, but we noticed that in some instances, since Snowpipe does not guarantee loading in the order in which the files were staged, we were updating rows with older data. As such, we added a condition to the WHEN MATCHED section so that the row is only updated when the file created timestamp is newer than the existing one.
However, since we did that, reconciliation checks show that some new inserts are missing. I don't know for sure why changing the matched clause would interfere with the not matched clause.
My theory was that the extra clause added a bit of time to the task run, so that some runs were skipped or the next run happened almost immediately after the last one completed. The idea being that the missing rows were caught in the middle and the offset changed before they could be consumed.
As such, we changed the task to call a stored procedure which uses an explicit transaction. We did this because the docs seem to suggest that using a transaction will lock the stream. However, even with this we can see that new inserts are still missing. We're talking very small numbers, e.g. 8 out of 100,000s.
Any ideas what might be happening?
Example task code below (not the sp version)
WAREHOUSE = TASK_WH
SCHEDULE = '1 minute'
WHEN SYSTEM$stream_has_data('my_stream')
AS
MERGE INTO processed_data pd USING (
select
ms.*,
CASE WHEN ms.status IS NULL THEN 1/mv.count ELSE NULL END as pending_count,
CASE WHEN ms.status='COMPLETE' THEN 1/mv.count ELSE NULL END as completed_count
from my_stream ms
JOIN my_view mv ON mv.id = ms.id
qualify
row_number() over (
partition by
id
order by
file_created DESC
) = 1
) ms ON ms.id = pd.id
WHEN NOT MATCHED THEN INSERT (col1, col2, col3,... )
VALUES (ms.col1, ms.col2, ms.col3,...)
WHEN MATCHED AND ms.file_created >= pd.file_created THEN UPDATE SET pd.col1 = ms.col1, pd.col2 = ms.col2, pd.col3 = ms.col3, ....
;
I am not fully sure what is going wrong here, but Snowflake gives a recommendation about the file created time somewhere. It suggests that the file created timestamp is calculated in the cloud services layer and may be a bit different from what you expect. There is another recommendation related to Snowpipe and data ingestion: the queue service takes about a minute to consume the data from the pipe, and if a lot of data flows in within a minute, you may end up with this issue. Look at your implementation, simulate whether pushing data at 1-minute intervals solves the issue, and don't rely on the file created time.
The condition "AND ms.file_created >= pd.file_created" seems to have been added as a mechanism to avoid updating the same row multiple times.
An alternative approach could be to use IS DISTINCT FROM to compare the source columns against the target columns (except id):
MERGE INTO processed_data pd USING (
select
ms.*,
CASE WHEN ms.status IS NULL THEN 1/mv.count ELSE NULL END as pending_count,
CASE WHEN ms.status='COMPLETE' THEN 1/mv.count ELSE NULL END as completed_count
from my_stream ms
JOIN my_view mv ON mv.id = ms.id
qualify
row_number() over (
partition by
id
order by
file_created DESC
) = 1
) ms ON ms.id = pd.id
WHEN NOT MATCHED THEN INSERT (col1, col2, col3,... )
VALUES (ms.col1, ms.col2, ms.col3,...)
WHEN MATCHED
AND (pd.col1, pd.col2,..., pd.coln) IS DISTINCT FROM (ms.col1, ms.col2,..., ms.coln)
THEN UPDATE SET pd.col1 = ms.col1, pd.col2 = ms.col2, pd.col3 = ms.col3, ....;
This approach will also prevent updating a row when nothing has changed.
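To reason about what the merge in this thread is supposed to do with out-of-order files, it can help to replay the logic on a tiny dataset. The sketch below is not Snowflake code: it assumes an in-memory SQLite table with a single payload column, a hypothetical helper merge_batch, and integer file_created values. The dictionary stands in for the QUALIFY ROW_NUMBER() ... = 1 step, and SQLite's ON CONFLICT ... DO UPDATE ... WHERE stands in for the guarded WHEN MATCHED branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_data "
    "(id INTEGER PRIMARY KEY, col1 TEXT, file_created INTEGER)"
)


def merge_batch(conn, batch):
    """Keep the newest row per id in the batch, then upsert it.

    New ids are always inserted; existing ids are only updated when the
    incoming file_created is not older than the stored one.
    """
    latest = {}
    for row_id, col1, file_created in batch:
        if row_id not in latest or file_created > latest[row_id][2]:
            latest[row_id] = (row_id, col1, file_created)
    conn.executemany(
        """
        INSERT INTO processed_data (id, col1, file_created)
        VALUES (?, ?, ?)
        ON CONFLICT (id) DO UPDATE
        SET col1 = excluded.col1, file_created = excluded.file_created
        WHERE excluded.file_created >= processed_data.file_created
        """,
        list(latest.values()),
    )


# First batch contains a duplicate key; only the newer copy should survive.
merge_batch(conn, [(1, "a-new", 20), (1, "a-old", 10), (2, "b", 15)])
# Second batch arrives late: stale data for id 1 plus a brand-new id 3.
merge_batch(conn, [(1, "a-stale", 5), (3, "c", 5)])

rows = conn.execute(
    "SELECT id, col1, file_created FROM processed_data ORDER BY id"
).fetchall()
print(rows)
```

In this model the guard on the update branch does not affect the insert branch: id 3 is inserted even though the same batch carries stale data for id 1. If reconciliation still shows missing inserts, that points to something outside the matched/not matched logic itself.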
My stored procedure was taking around 10 seconds, but suddenly (for unknown reasons) it became so slow (taking 9 minutes).
I did not do any changes at all that may cause the delay.
I wonder if someone can tell why it is so slow.
Here is my query
SELECT
P.PtsID, P.PtsCode, P.PtsName,
FORMAT(P.DOB, 'dd/MM/yyyy') AS DOB, P.Gender, V.VisitID, V.VisitType,
FORMAT(V.VisitDate, 'dd/MM/yyyy') AS VisitDate,
FORMAT(V.DischargeDate, 'dd/MM/yyyy') AS DischargeDate,
R.RepID, R.RepDate, R.RepType, R.RepDesc
FROM
Patients P
INNER JOIN
Visits V ON P.PtsID = V.PtsID
INNER JOIN
Reps R ON R.PtsID = P.PtsID AND R.VisitID = V.VisitID
WHERE
(P.Deleted = 0 AND V.Deleted = 0 AND R.Deleted = 0)
AND (P.PtsName LIKE '%'+TRIM(@PtsName)+'%' OR TRIM(@PtsName) = '')
AND (P.PtsCode LIKE '%'+TRIM(@PtsNo)+'%' OR TRIM(@PtsNo) = '')
AND (R.RepText LIKE '%'+TRIM(@RepText)+'%' OR TRIM(@RepText) = '')
AND (TRIM(@RepCode) = '' OR R.RepID IN (SELECT RepID
FROM tags
WHERE tag = 'XXX'
AND Deleted = 0
AND code IN (SELECT value
FROM string_split(@RepCode,','))))
and this is the execution plan
When I execute the script as an ad-hoc query, not as stored procedure, it is very fast.
Edit
Here is my actual execution plan:
https://www.brentozar.com/pastetheplan/?id=ByPLltcZD
Thanks
So, purely based on the execution plan:
Statistics
In your execution plan you can see that the actual numbers and the estimates are far apart.
This can have multiple causes.
Out-of-date statistics and indexes. Try rebuilding indexes and updating statistics.
Updating stats: EXEC sp_updatestats (this might take some time and resources, so if it's a production server, do it outside office hours or in a batch job window.)
Parameter sniffing can also cause wrong estimates. You can experiment with OPTION(RECOMPILE) or OPTION(OPTIMIZE FOR UNKNOWN) to test whether this is the case.
201 Bucket Problem
Query optimization
When you write IN (subquery) and you have a key lookup, as you do here, you are going to have a bad time. For every row returned from the index (NonClusteredIndex-code), the engine needs to access the clustered index to retrieve the RepID one by one, which is painfully slow when there are many rows (in your case: 430.474.500 rows).
You can change this by using EXISTS, in your example:
SELECT
P.PtsID, P.PtsCode, P.PtsName,
FORMAT(P.DOB, 'dd/MM/yyyy') AS DOB, P.Gender, V.VisitID, V.VisitType,
FORMAT(V.VisitDate, 'dd/MM/yyyy') AS VisitDate,
FORMAT(V.DischargeDate, 'dd/MM/yyyy') AS DischargeDate,
R.RepID, R.RepDate, R.RepType, R.RepDesc
FROM
Patients P
INNER JOIN
Visits V ON P.PtsID = V.PtsID
INNER JOIN
Reps R ON R.PtsID = P.PtsID AND R.VisitID = V.VisitID
WHERE
(P.Deleted = 0 AND V.Deleted = 0 AND R.Deleted = 0)
AND (P.PtsName LIKE '%'+TRIM(@PtsName)+'%' OR TRIM(@PtsName) = '')
AND (P.PtsCode LIKE '%'+TRIM(@PtsNo)+'%' OR TRIM(@PtsNo) = '')
AND (R.RepText LIKE '%'+TRIM(@RepText)+'%' OR TRIM(@RepText) = '')
AND (TRIM(@RepCode) = '' OR EXISTS (SELECT 1
FROM tags
WHERE tag = 'XXX'
AND Deleted = 0
AND tags.RepId = R.RepId
AND code IN (SELECT value
FROM string_split(@RepCode,','))))
Index optimizations
If you are still suffering from the key lookup, you may want to change the index so that it also has RepId as a key column or as an included column.
You still have 2 other key lookups. You could also solve this with an INCLUDE on the indexes but only if it makes sense. (Can they be used for other queries, is the current query executing frequently, ...)
Doing the trims and concatenation and storing them in a separate variable may give you some minor improvements.
Wait stats
The following wait stats were also included in the execution plan.
<WaitStats>
<Wait WaitType="RESERVED_MEMORY_ALLOCATION_EXT" WaitTimeMs="1018" WaitCount="2694600"/>
<Wait WaitType="SOS_SCHEDULER_YIELD" WaitTimeMs="514" WaitCount="159327"/>
<Wait WaitType="ASYNC_NETWORK_IO" WaitTimeMs="63" WaitCount="5"/>
<Wait WaitType="MEMORY_ALLOCATION_EXT" WaitTimeMs="25" WaitCount="14639"/>
</WaitStats>
RESERVED_MEMORY_ALLOCATION_EXT and MEMORY_ALLOCATION_EXT show up in the list, but there is no issue with the wait stats.
As mentioned in another post: SQL Server: Query fast, but slow from procedure
You can use the following workaround
Slow: SET ANSI_NULLS OFF
Fast: SET ANSI_NULLS ON
Recently I ran into an issue where multiple concurrent client requests cause performance problems in the database. I tried a test scenario and, as it turned out, when I run the same SELECT query 6 to 7 times (it gets worse with more), performance degrades and execution takes a lot of time. This is the query I tried:
SELECT TOP (100) COUNT(DISTINCT([Doc_Number])) AS "Expression"
FROM (
SELECT *
FROM "dbo"."Dummy_Table" "table_alias"
WHERE ((CAST("table_alias"."ID" AS NVARCHAR)) NOT IN
(
SELECT "PrimaryKey" AS ExceptionKey
FROM dbo.exceptions inner_exceptionStatus
LEFT JOIN dbo.Workflow inner_workflowStates ON
(inner_exceptionStatus."Status"= inner_workflowStates."UUID" AND
inner_exceptionStatus."UUID"= 'CA1662D6-73A2-4692-A765-E7E3EDB66062')
WHERE ("inner_workflowStates"."RemoveFromRecordSet" = 1 AND
"inner_workflowStates"."IsDeleted" = 0) AND
("inner_exceptionStatus"."IsArchived" IS NULL OR
"inner_exceptionStatus"."IsArchived" = 0)))) wrapperQuery
When the query runs alone, it takes around 1 second. But if we run it in parallel, each query takes a weird amount of time or leads to a timeout.
The only thing that bothers me here is that a SELECT query should be non-blocking, and even with shared locks the queries should get along easily.
I am not sure whether there is anything wrong in the query that adds to the problem.
Any help is deeply appreciated!
Try this way
SELECT Count(DISTINCT( [Doc_Number] )) AS Expression
FROM dbo.Dummy_Table table_alias
WHERE NOT EXISTS (SELECT 1
FROM dbo.exceptions inner_exceptionStatus
INNER JOIN dbo.Workflow inner_workflowStates
ON ( inner_exceptionStatus.Status = inner_workflowStates.UUID
AND inner_exceptionStatus.UUID = 'CA1662D6-73A2-4692-A765-E7E3EDB66062' )
WHERE inner_workflowStates.RemoveFromRecordSet = 1
AND inner_workflowStates.IsDeleted = 0
AND ( inner_exceptionStatus.IsArchived IS NULL
OR inner_exceptionStatus.IsArchived = 0 )
AND table_alias.ID = PrimaryKey)
I made a couple of changes:
Changed NOT IN to NOT EXISTS.
Removed the conversion on "table_alias"."ID" because it prevents the use of any index on the "table_alias"."ID" column. If the conversion is really required, add it back.
Removed TOP (100): since there is no GROUP BY, the query returns a single row as its result.
If the query is still running slowly, you need to post the execution plan and make sure the statistics are up to date.
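One caveat worth checking when changing NOT IN to NOT EXISTS: the two are not equivalent if the subquery column can be NULL. The sketch below illustrates this on an in-memory SQLite database; the table and column names (docs, exceptions, primary_key) are simplified, hypothetical stand-ins for the tables in the question, and whether PrimaryKey is actually nullable there is not known from the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docs (id INTEGER PRIMARY KEY, doc_number TEXT);
    CREATE TABLE exceptions (primary_key INTEGER);  -- nullable on purpose
    INSERT INTO docs VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO exceptions VALUES (2), (NULL);
""")

# NOT IN: a single NULL in the subquery makes the predicate unknown
# for every row that is not an exact match, so nothing qualifies.
not_in = conn.execute(
    "SELECT COUNT(DISTINCT doc_number) FROM docs "
    "WHERE id NOT IN (SELECT primary_key FROM exceptions)"
).fetchone()[0]

# NOT EXISTS: the NULL row simply never matches, so docs 1 and 3 qualify.
not_exists = conn.execute(
    "SELECT COUNT(DISTINCT doc_number) FROM docs d "
    "WHERE NOT EXISTS (SELECT 1 FROM exceptions e WHERE e.primary_key = d.id)"
).fetchone()[0]

print(not_in, not_exists)
```

If the column is declared NOT NULL, both forms return the same result and the change is purely about performance.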
You can simplify your query like this:
SELECT COUNT(DISTINCT(Doc_Number)) AS Expression
FROM dbo.Dummy_Table dmy
WHERE not exists
(
SELECT *
FROM dbo.exceptions ies
INNER JOIN dbo.Workflow iws ON ies.Status= iws.UUID AND ies.UUID= 'CA1662D6-73A2-4692-A765-E7E3EDB66062'
WHERE iws.RemoveFromRecordSet = 1 AND iws.IsDeleted = 0 AND (ies.IsArchived IS NULL OR ies.IsArchived = 0)
and dmy.ID=PrimaryKey
)
As prdp says:
Changed NOT IN to NOT EXISTS.
Removed the conversion on "table_alias"."ID" because it prevents the use of any index on the "table_alias"."ID" column. If the conversion is really required, add it back.
Removed TOP (100): since there is no GROUP BY, the query returns a single row as its result.
I would add:
Removed your derived table wrapperQuery.
You can use INNER JOIN, because the WHERE clause tests RemoveFromRecordSet = 1, which already removes the NULL rows a LEFT JOIN would produce.
Removed the unneeded quotes, brackets and parentheses in the WHERE clause.
I'm creating a stock market database and am stumped that the following works correctly EXCEPT for the last select that returns results (after which the select does not change on subsequent loops). I've tried to simplify the code as follows, thanks in advance for feedback (I'm still noob):
Three tables:
BuyOrders
SellOrders
MatchedOrders
Stored procedure to process a NewBuyOrder:
Insert NewBuyOrder to BuyOrders;
While (NewBuyOrder.SharesRemaining > 0 )
    SELECT TOP 1
    FROM SellOrders
    WHERE SellOrders.Price <= NewBuyOrder.Price
    ORDER BY SellOrders.Price, SellOrders.TimePlaced;
    IF NewBuyOrder.SharesRemaining < SellOrders.SharesAvailable
        UPDATE SellOrders.SharesAvailable = [difference];
        UPDATE BuyOrders = 0;
        INSERT INTO MatchedOrders;
        SET NewBuyOrder.SharesRemaining = 0;
        BREAK;
    ELSE
        UPDATE SellOrders = 0;
        UPDATE BuyOrders = [difference];
        INSERT INTO MatchedOrders;
        SET NewBuyOrder.SharesRemaining = [difference];
        CONTINUE;
In hope it might help someone else, I found the issue . . . I'm using local variables to store the matched SellOrderID. As such if the Select returns no match on a second pass through then the local variables were not getting updated (and hence erroneously reused in subsequent while loops until the If kicked in).
So I put a SET SellOrders.ID = 0 into the WHILE loop before the Select then below the Select added a IF SellOrders.ID = 0 and inside that a SET NewBuyOrder.SharesRemaining = 0 and BREAK (then made the first IF above into an ELSE IF).
I need to revisit the process to see if I can make it more elegant but would sincerely welcome thoughts on better ways to accomplish a process for matching the best available counteroffers in sequence. I've read but don't know much about cursors, plus think it transactionally superior not to SELECT a prioritized table of all matches rather than using my iterative loop -- but also have read suggestions not to use loops in SQL. Comments?
In addition I note the following: By itself a Select with no results returns a null set. Thus my original plan was to Select into my SP local variables and then use an IF EXISTS. I assume the local variable exists upon instantiation (even with no value) but am surprised that after a Select into the local variable with no results also did not fail an IF NULL test (i.e. presumably NULL cannot be inserted into a variable). What then is the value of an instantiated local variable with no value -- Blank?
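The fix described above (reset the matched order before every lookup, and stop when the lookup finds nothing) can be sketched outside SQL to make the control flow explicit. This is a minimal in-memory model, not the stored procedure itself: SellOrder, match_buy_order and the extra shares_available > 0 filter are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class SellOrder:
    order_id: int
    price: float
    time_placed: int
    shares_available: int


def match_buy_order(buy_price, shares_wanted, sell_orders):
    """Match one buy order against the cheapest, oldest sell orders.

    Returns (matches, shares_still_unfilled), where matches is a list of
    (sell_order_id, shares_traded) tuples.
    """
    matches = []
    remaining = shares_wanted
    while remaining > 0:
        # Reset the lookup result on every pass so that a "no rows" lookup
        # can never silently reuse the previous iteration's order.
        best = None
        candidates = [
            s for s in sell_orders
            if s.price <= buy_price and s.shares_available > 0
        ]
        if candidates:
            best = min(candidates, key=lambda s: (s.price, s.time_placed))
        if best is None:
            break  # nothing left to match; the rest of the buy order stays open
        traded = min(remaining, best.shares_available)
        best.shares_available -= traded
        remaining -= traded
        matches.append((best.order_id, traded))
    return matches, remaining


book = [
    SellOrder(1, 10.0, 1, 50),
    SellOrder(2, 9.5, 2, 30),
    SellOrder(3, 11.0, 3, 100),
]
matches, unfilled = match_buy_order(10.0, 100, book)
print(matches, unfilled)
```

Here the buy order takes 30 shares from order 2 and 50 from order 1, then stops with 20 shares unfilled because order 3 is priced above the limit.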
I need some help from a MS SQL Master...
Short version:
When I execute a Conditional Where followed by a Contains, my query takes 1 minute (in its normal execution, it takes 200 milliseconds).
With this query, everything works fine:
Where
Contains(table.product_name, @search_word)
But with a Conditional Where, it takes 1 minute to execute:
Where
(@ExecuteWhereStatement = 0 Or (Contains(table.product_name, @search_word)))
Long Version:
I'm using a stored procedure that receives some parameters. This stored procedure queries a really large table, but everything is indexed properly and the query has performed very well so far.
The main query is a little big, so I want to make the WHERE clause as smart as possible, to avoid repeating the same statement multiple times.
The whole idea of the DataBase, is a history of purchases made by the State. So this query involves 3 tables:
Table 1 (table_purchase) - The purchase itself
id_purchase int (PK)
date_purchase datetime
buyer_code int (Nullable)
Table 2 (table_purchase_product) - The Items of a Purchase
id_product int (PK)
id_purchase int (FK of table_purchase)
product_quantity int (Nullable)
product_name varchar(255) (Nullable) (Full-Text-Indexed)
product_description varchar(2000) (Nullable) (Full-Text-Indexed)
id_product_bid_winner int (FK of table_product_bid)
Table 3 (table_product_bids) - The Bids for Each product of a Purchase
id_product_bid int (PK)
id_product int (FK of table_purchase_product)
product_brand varchar(255) (Nullable) (Full-Text-Indexed)
bid_value decimal (20,6)
So basically, we have a "Purchase" that has several "Products (or Items)", and each "Product" has some "Bids (or Prices)".
And here is the Bad Girl (the SQL stored procedure):
ALTER PROCEDURE [dbo].[procPesquisaFullText]
    @search_date datetime,
    @search_word varchar(8000),
    @search_brand varchar(255),
    @only_one_bid bit = 0,
    @search_buyer_code int = 0,
    @quantityFrom decimal(20,6) = 0,
    @quantityTo decimal(20,6) = 0
AS
BEGIN
    SET NOCOUNT ON;

    -- Skip the word search only when a buyer code is given and no word is.
    -- CONTAINS rejects an empty search term, so a dummy word is substituted.
    Declare @ExecuteWordSearch AS bit;
    if (@search_buyer_code != 0 And @search_word = '')
    begin
        Set @ExecuteWordSearch = 0;
        Set @search_word = 'nothing';
    end
    else
    begin
        Set @ExecuteWordSearch = 1;
    end

    -- Skip the brand search when no brand is given.
    Declare @ExecuteBrandSearch AS bit;
    if (@search_brand = '')
    begin
        Set @ExecuteBrandSearch = 0;
        Set @search_brand = 'nothing';
    end
    else
    begin
        Set @ExecuteBrandSearch = 1;
    end

    begin
        SELECT
            pp.id_product,
            pp.id_purchase,
            pp.product_description
        FROM
            table_purchase_product pp
            inner join table_purchase p on p.id_purchase = pp.id_purchase
        WHERE
            (p.date_purchase >= @search_date)
            and (@search_buyer_code = 0 or (p.buyer_code = @search_buyer_code))
            and (@quantityFrom = 0 or (pp.product_quantity >= @quantityFrom))
            and (@quantityTo = 0 or (pp.product_quantity <= @quantityTo))
            and (contains(pp.product_description, @search_word) or contains(pp.product_name, @search_word))
            and (@only_one_bid = 0
                or ((Select COUNT(*) From table_product_bid Where table_product_bid.id_product = pp.id_product) = 1))
            and (@ExecuteBrandSearch = 0 Or (exists(
                    select 1
                    from table_product_bid ppb
                    where ppb.id_product_bid = pp.id_product_bid_winner
                    and contains(ppb.product_brand, @search_brand)
                )
            ))
        ORDER BY p.date_purchase DESC
    end
END
So far, so good...
In the beginning I set two variables that are used inside the query.
The first verifies whether the user specified a "Buyer Code" AND didn't specify a "Search Word" (so neither the Product's description nor the Product's name is checked).
The second verifies whether the user specified a "Specific Brand". If so, the Winning Bid's BRAND is checked against the user's one.
Observation: You'll notice that when the "Search Word" is empty, I set it to "nothing". I do this because if the search term in the Contains is empty, it throws an exception, even when it's not executed (I tested this in another, absolutely isolated query too).
As You can see, my user is able to search for:
- "Products" of Some Distinct Buyer "Purchase" (passing the @search_buyer_code parameter)
- A "Product" that contains a distinct word in its name or description
- A "Product" that has the Winner Bid of a specific Brand
- A "Product" that has only 1 bid at all
- A "Product" with a maximum and minimum quantity
And You'll notice that I used a lot of conditions INSIDE the Where, producing a very dynamic Where, instead of using a "BIG If Else" statement and repeating a lot of code. (I guess some "Googlers" will land here looking for Conditional Wheres, and if so, I'm glad to help!)
Ok, so everything works great. The query executes flawlessly. But here is the strange, damn, tricky issue:
Suppose I want the user to be able to specify only a "Buyer Code" for the Purchase, but no word to search for in the Product (which is what the first piece of code in the stored procedure handles).
Changing from:
and (contains(pp.product_description, @search_word) or contains(pp.product_name, @search_word))
To:
and (@ExecuteWordSearch = 0 Or (contains(pp.product_description, @search_word) or contains(pp.product_name, @search_word)))
The query takes nearly 1 minute! (Execution is about 200 milliseconds for the first version above.)
But WHY??? I use the same logic in all the "Conditional Wheres". I also use the same logic of having a flag/variable to indicate when to execute the Where clause in the Word Search and the Brand Search, but the Brand Search works PERFECTLY! So why, WHY does my query take 1 minute only when I use the condition followed by a Contains????
And this issue is not related to the amount of data, because I tried removing the entire Contains condition, allowing a lot of data to be returned, and it takes 1 second at most...
Oh, and it's Microsoft SQL Server 2008 R2.
Thanks already for reading this far!
I cannot find the documentation I had on a very similar issue, but it sounded so familiar that I at least wanted to share what I remembered. Part of the issue is that in SQL Server the full-text search engine is separate from the regular query execution engine, so when you mix the two, performance can tank in some cases. This is particularly true when the condition is an 'OR' rather than an 'AND'. (I remember hitting this exact situation.) Conditional ANDs worked fine. But for OR, it's as if each condition gets evaluated repeatedly, row by row.
Among the workarounds, one is, as already suggested, create your sql dynamically before execution.
Another would be to break the full-text and non-full text conditions into two search functions (literally UDF's) and then do whatever is needed (INTERSECT, EXCEPT, etc) with the two resultsets.
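The "build the SQL dynamically" workaround can be sketched as follows: only the predicates that actually apply are appended, so the optimizer never sees a flag OR-ed with a full-text predicate. This is an illustration on SQLite, not the original procedure: build_search is a hypothetical helper, the tables are cut down to a few columns, and LIKE is used purely as a stand-in for CONTAINS, which SQLite does not have. Values are always passed as bound parameters, never concatenated into the string.

```python
import sqlite3


def build_search(search_date, search_word="", buyer_code=0):
    """Return (sql, params) containing only the predicates that apply."""
    clauses = ["p.date_purchase >= ?"]
    params = [search_date]
    if buyer_code != 0:
        clauses.append("p.buyer_code = ?")
        params.append(buyer_code)
    if search_word != "":
        # LIKE is only a stand-in for CONTAINS in this SQLite sketch.
        clauses.append(
            "(pp.product_name LIKE ? OR pp.product_description LIKE ?)"
        )
        params.extend([f"%{search_word}%"] * 2)
    sql = (
        "SELECT pp.id_product FROM table_purchase_product pp "
        "JOIN table_purchase p ON p.id_purchase = pp.id_purchase "
        "WHERE " + " AND ".join(clauses) + " ORDER BY pp.id_product"
    )
    return sql, params


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_purchase (
        id_purchase INTEGER PRIMARY KEY, date_purchase TEXT, buyer_code INTEGER);
    CREATE TABLE table_purchase_product (
        id_product INTEGER PRIMARY KEY, id_purchase INTEGER,
        product_name TEXT, product_description TEXT);
    INSERT INTO table_purchase VALUES (1, '2012-01-01', 100), (2, '2012-06-01', 200);
    INSERT INTO table_purchase_product VALUES
        (1, 1, 'steel chair', 'office chair'),
        (2, 1, 'desk', 'wooden desk'),
        (3, 2, 'chair mat', 'floor mat');
""")


def run(**kwargs):
    sql, params = build_search("2011-01-01", **kwargs)
    return [row[0] for row in conn.execute(sql, params)]


buyer_only_sql, _ = build_search("2011-01-01", buyer_code=100)
by_buyer = run(buyer_code=100)
by_word = run(search_word="chair")
by_both = run(search_word="chair", buyer_code=100)
print(by_buyer, by_word, by_both)
```

When only a buyer code is supplied, the generated statement contains no text-search predicate at all, which is the point of the workaround. In T-SQL the same idea is usually implemented with sp_executesql and parameters.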
Try changing your WHERE clause to use a CASE expression, e.g.:
WHERE
    CASE
        WHEN @ExecuteWhereStatement = 0 THEN 1
        WHEN @ExecuteWhereStatement = 1 THEN
            CASE
                WHEN CONTAINS([table].product_name, @search_word) THEN 1
                ELSE 0
            END
    END = 1;