SQL Server: While loop with nested IF

I'm creating a stock market database and am stumped: the following works correctly EXCEPT for the last SELECT that returns results (after which the SELECT result does not change on subsequent loop iterations). I've tried to simplify the code below. Thanks in advance for any feedback (I'm still a noob):
Three tables:
BuyOrders
SellOrders
MatchedOrders
Stored procedure to process a NewBuyOrder:
INSERT NewBuyOrder INTO BuyOrders;
WHILE (NewBuyOrder.SharesRemaining > 0)
BEGIN
    SELECT TOP 1
    FROM SellOrders
    WHERE SellOrders.Price <= NewBuyOrder.Price
    ORDER BY SellOrders.Price, SellOrders.TimePlaced;
    IF NewBuyOrder.SharesRemaining < SellOrders.SharesAvailable
    BEGIN
        UPDATE SellOrders.SharesAvailable = [difference];
        UPDATE BuyOrders = 0;
        INSERT INTO MatchedOrders;
        SET NewBuyOrder.SharesRemaining = 0;
        BREAK;
    END
    ELSE
    BEGIN
        UPDATE SellOrders = 0;
        UPDATE BuyOrders = [difference];
        INSERT INTO MatchedOrders;
        SET NewBuyOrder.SharesRemaining = [difference];
        CONTINUE;
    END
END

In the hope it helps someone else: I found the issue. I'm using local variables to store the matched SellOrderID, so when the SELECT returns no match on a later pass, the local variables are not updated. Their values from the previous pass were then erroneously reused in subsequent loop iterations until the IF kicked in.
So inside the WHILE loop I added SET SellOrders.ID = 0 before the SELECT. After the SELECT I added IF SellOrders.ID = 0, containing SET NewBuyOrder.SharesRemaining = 0 and BREAK, and I turned the original IF into an ELSE IF.
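The reset-then-check fix above can be sketched outside SQL Server. This is a minimal sketch in Python with the standard-library sqlite3 module, not T-SQL; the table sell_orders and its columns (id, price, time_placed, shares_available) are hypothetical stand-ins for SellOrders. In T-SQL, SELECT @var = ... that returns no rows leaves @var unchanged, which is why the reset matters there. In Python, fetchone() already returns None when nothing matches, so the explicit reset here only mirrors the T-SQL pattern.

```python
import sqlite3


def make_demo_db():
    """Build a hypothetical sell_orders table standing in for SellOrders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sell_orders (id INTEGER PRIMARY KEY, price REAL, "
        "time_placed INTEGER, shares_available INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sell_orders VALUES (?, ?, ?, ?)",
        [(1, 10.0, 1, 50), (2, 9.5, 2, 30), (3, 10.0, 0, 40), (4, 11.0, 0, 100)],
    )
    return conn


def match_buy_order_loop(conn, buy_price, shares_wanted):
    """Fill a buy order one sell order per pass, best price first, then oldest.

    Returns (fills, unfilled), where fills is a list of (sell_id, shares).
    """
    fills = []
    remaining = shares_wanted
    while remaining > 0:
        # Reset the match before every lookup (the SET SellOrders.ID = 0 step),
        # so a lookup that finds nothing cannot reuse the previous pass's order.
        sell_id, available = None, 0
        row = conn.execute(
            "SELECT id, shares_available FROM sell_orders "
            "WHERE price <= ? AND shares_available > 0 "
            "ORDER BY price, time_placed, id LIMIT 1",
            (buy_price,),
        ).fetchone()
        if row is not None:
            sell_id, available = row
        if sell_id is None:
            break  # no acceptable counteroffer left: stop instead of looping on stale data
        take = min(remaining, available)
        conn.execute(
            "UPDATE sell_orders SET shares_available = shares_available - ? "
            "WHERE id = ?",
            (take, sell_id),
        )
        fills.append((sell_id, take))
        remaining -= take
    return fills, remaining


if __name__ == "__main__":
    print(match_buy_order_loop(make_demo_db(), 10.0, 150))
    # ([(2, 30), (3, 40), (1, 50)], 30)
```

With 150 shares wanted, only 120 are available at or below 10.0, so the loop stops on the empty lookup and reports 30 shares unfilled rather than re-matching the last sell order.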
I need to revisit the process to see if I can make it more elegant, but I would sincerely welcome thoughts on better ways to match the best available counteroffers in sequence. I've read about cursors but don't know much about them. I'm also unsure whether SELECTing a prioritized table of all matches at once would be transactionally superior to my iterative loop, and I've read suggestions not to use loops in SQL. Comments?
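On the set-based question: one common approach is a running total, so every eligible sell order's fill is computed in a single statement instead of one loop pass per order. The sketch below uses Python's sqlite3 module (SQLite 3.25+ is assumed for window functions) rather than SQL Server; in T-SQL the same SUM(...) OVER (ORDER BY ... ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) framing is available from SQL Server 2012. The sell_orders table and its columns are hypothetical, and the sketch only computes the allocation; applying the updates and writing MatchedOrders rows inside one transaction is left out.

```python
import sqlite3

# For each eligible sell order (best price first, then oldest), filled_before is
# the running total of shares ahead of it; the order receives whatever is still
# unfilled, capped at its own size. Orders past the point of a full fill are dropped.
ALLOCATION_SQL = """
WITH eligible AS (
    SELECT id, price, time_placed, shares_available,
           COALESCE(SUM(shares_available) OVER (
               ORDER BY price, time_placed, id
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ), 0) AS filled_before
    FROM sell_orders
    WHERE price <= :buy_price AND shares_available > 0
)
SELECT id, MIN(shares_available, :wanted - filled_before) AS fill
FROM eligible
WHERE filled_before < :wanted
ORDER BY price, time_placed, id
"""


def make_demo_db():
    """Build a hypothetical sell_orders table standing in for SellOrders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sell_orders (id INTEGER PRIMARY KEY, price REAL, "
        "time_placed INTEGER, shares_available INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sell_orders VALUES (?, ?, ?, ?)",
        [(1, 10.0, 1, 50), (2, 9.5, 2, 30), (3, 10.0, 0, 40), (4, 11.0, 0, 100)],
    )
    return conn


def allocate_buy_order(conn, buy_price, shares_wanted):
    """Return [(sell_id, shares)] for a buy order, computed in one query."""
    params = {"buy_price": buy_price, "wanted": shares_wanted}
    return conn.execute(ALLOCATION_SQL, params).fetchall()


if __name__ == "__main__":
    print(allocate_buy_order(make_demo_db(), 10.0, 100))  # [(2, 30), (3, 40), (1, 30)]
```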
In addition, I note the following: by itself, a SELECT with no results returns an empty result set. So my original plan was to SELECT into my SP's local variables and then use IF EXISTS. I assume the local variable exists once it is declared (even with no value), but I was surprised that after a SELECT into the local variable returned no results, the variable was also not caught by an IF NULL test (i.e. presumably NULL cannot be inserted into a variable). What, then, is the value of a declared local variable with no value assigned: blank?
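For reference: in SQL Server, a local variable declared without a value holds NULL, and a SELECT @var = ... that returns no rows leaves it unchanged (so still NULL). The catch is usually the test itself: with ANSI_NULLS ON (the usual connection setting), @var = NULL evaluates to UNKNOWN, never true, so only @var IS NULL detects it. The sketch below shows the same three-valued logic with SQLite from Python, not SQL Server; the function name is hypothetical.

```python
import sqlite3


def null_checks(conn):
    """Evaluate '= NULL' and 'IS NULL' against a value that was never assigned."""
    unassigned = None  # plays the role of a declared-but-unset variable
    equals_null, is_null = conn.execute(
        "SELECT ? = NULL, ? IS NULL", (unassigned, unassigned)
    ).fetchone()
    return equals_null, is_null


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # '= NULL' yields NULL (unknown, shown as None); 'IS NULL' yields 1 (true)
    print(null_checks(conn))  # (None, 1)
```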

Related

Streams + tasks missing inserts?

We've set up a stream on a table that is continuously loaded via Snowpipe.
We consume this data with a task that runs every minute and merges it into another table. Because duplicate keys are possible, we use a ROW_NUMBER() window function ordered by the file-created timestamp descending and keep only the rows where row_num = 1, so we always get the latest insert.
Initially we used a standard task with the MERGE statement, but we noticed that in some instances, because Snowpipe does not guarantee that files are loaded in the order they were staged, we were updating rows with older data. So we added a condition to the WHEN MATCHED clause so that a row is updated only when the incoming file-created timestamp is at least as recent as the existing one.
However, since we made that change, reconciliation checks show that some new inserts are missing. I don't see why changing the WHEN MATCHED clause would interfere with the WHEN NOT MATCHED clause.
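As far as I know, an extra AND condition on WHEN MATCHED only decides whether an already-matched row is updated; a matched row that fails the condition is simply left alone and does not fall through to WHEN NOT MATCHED, so the condition on its own should not drop new keys. The sketch below models the same dedupe-then-merge with SQLite's UPSERT (INSERT ... ON CONFLICT DO UPDATE ... WHERE) from Python, not Snowflake's MERGE; the tables staged and processed and their columns are hypothetical, and SQLite 3.25+ is assumed.

```python
import sqlite3

# Keep the newest staged row per id, insert unseen ids, and update existing
# rows only when the staged file is at least as recent as the stored one.
MERGE_SQL = """
INSERT INTO processed (id, val, file_created)
SELECT id, val, file_created
FROM (
    SELECT id, val, file_created,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY file_created DESC) AS rn
    FROM staged
)
WHERE rn = 1
ON CONFLICT(id) DO UPDATE SET
    val = excluded.val,
    file_created = excluded.file_created
WHERE excluded.file_created >= processed.file_created
"""


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed (id INTEGER PRIMARY KEY, val TEXT, file_created INTEGER)")
    conn.execute("CREATE TABLE staged (id INTEGER, val TEXT, file_created INTEGER)")
    conn.executemany("INSERT INTO processed VALUES (?, ?, ?)", [(1, "old", 100), (2, "keep", 300)])
    conn.executemany(
        "INSERT INTO staged VALUES (?, ?, ?)",
        [(1, "a", 150), (1, "b", 200), (2, "stale", 250), (3, "new", 50)],
    )
    conn.execute(MERGE_SQL)
    return conn.execute("SELECT id, val, file_created FROM processed ORDER BY id").fetchall()


if __name__ == "__main__":
    print(run_demo())  # [(1, 'b', 200), (2, 'keep', 300), (3, 'new', 50)]
```

Id 2's staged row is older than the stored one, so it is skipped, while id 3 is still inserted even though its timestamp is the oldest in the batch.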
My theory was that the extra clause added enough time to each task run that some runs were skipped, or that the next run started almost immediately after the previous one completed. The idea was that the missing rows were caught in between and the stream offset advanced before they could be consumed.
So we changed the task to call a stored procedure that uses an explicit transaction, because the docs seem to suggest that a transaction locks the stream. However, even with this change, new inserts are still missing. The numbers are very small, e.g. 8 out of hundreds of thousands.
Any ideas what might be happening?
Example task code (not the stored-procedure version):
WAREHOUSE = TASK_WH
SCHEDULE = '1 minute'
WHEN SYSTEM$stream_has_data('my_stream')
AS
MERGE INTO processed_data pd USING (
select
ms.*,
CASE WHEN ms.status IS NULL THEN 1/mv.count ELSE NULL END as pending_count,
CASE WHEN ms.status='COMPLETE' THEN 1/mv.count ELSE NULL END as completed_count
from my_stream ms
JOIN my_view mv ON mv.id = ms.id
qualify
row_number() over (
partition by
id
order by
file_created DESC
) = 1
) ms ON ms.id = pd.id
WHEN NOT MATCHED THEN INSERT (col1, col2, col3,... )
VALUES (ms.col1, ms.col2, ms.col3,...)
WHEN MATCHED AND ms.file_created >= pd.file_created THEN UPDATE SET pd.col1 = ms.col1, pd.col2 = ms.col2, pd.col3 = ms.col3, ....
;
I'm not entirely sure what is going wrong here, but Snowflake gives a recommendation somewhere about the file-created time: the file-created timestamp is calculated in the cloud services layer and may be slightly different from what you expect. There is another recommendation related to Snowpipe data ingestion: the queue service takes about a minute to consume data from the pipe, so if a lot of data flows in within a minute, you may end up with this issue. Review your implementation, test whether pushing data at 1-minute intervals solves the issue, and don't rely on the file-created time.
The condition "AND ms.file_created >= pd.file_created" seems to be added as a mechanism to avoid updating the same row multiple times.
An alternative approach is to use IS DISTINCT FROM to compare the source columns against the target columns (except id):
MERGE INTO processed_data pd USING (
select
ms.*,
CASE WHEN ms.status IS NULL THEN 1/mv.count ELSE NULL END as pending_count,
CASE WHEN ms.status='COMPLETE' THEN 1/mv.count ELSE NULL END as completed_count
from my_stream ms
JOIN my_view mv ON mv.id = ms.id
qualify
row_number() over (
partition by
id
order by
file_created DESC
) = 1
) ms ON ms.id = pd.id
WHEN NOT MATCHED THEN INSERT (col1, col2, col3,... )
VALUES (ms.col1, ms.col2, ms.col3,...)
WHEN MATCHED
AND (pd.col1, pd.col2,..., pd.coln) IS DISTINCT FROM (ms.col1, ms.col2,..., ms.coln)
THEN UPDATE SET pd.col1 = ms.col1, pd.col2 = ms.col2, pd.col3 = ms.col3, ....;
This approach also prevents updating a row when nothing has changed.
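The reason to prefer IS DISTINCT FROM over <> here is NULL handling: when either side is NULL, <> yields NULL (unknown), so a column that changed to or from NULL would never satisfy the update condition. SQLite spells the same NULL-safe test IS NOT; the sketch below uses it from Python to compare the two operators (SQLite, not Snowflake; the function name is hypothetical).

```python
import sqlite3


def compare(conn, old, new):
    """Return (old <> new, old IS NOT new) exactly as SQLite evaluates them."""
    return conn.execute("SELECT ? <> ?, ? IS NOT ?", (old, new, old, new)).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for old, new in [("a", "b"), ("a", "a"), (None, "b"), (None, None)]:
        # A None in the first slot means "unknown": <> cannot tell whether the value changed.
        print(old, new, compare(conn, old, new))
```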

Decrease execution time of SQL query

I have a question about making a query run faster while keeping it accurate. Before I show the query, I'd like to explain some basics of it.
I've got a CASE that manipulates the WHERE clause to get all children of the parent. Basically, I have two types of data to display: a red type and a green type. The red type has a column (TRK_TrackerGroup_LKID2) set to NULL by default, whereas the green type has a value in that column (ranging from 5 to 7).
My problem is that I need to extract both types of data to get an accurate count of outstanding issues in a view, but when I do so (by adding the CASE), the execution time goes from under 1 second to well over 15 seconds.
This is the query (with the mentioned case):
SELECT TS.id AS TrackerStartDateID,
TSM.mappingtypeid,
TSM.maptoid,
TFLK.trk_trackergroup_lkid,
Count(TF.id) AS Cnt
FROM [dbo].[trk_startdate] TS
INNER JOIN [dbo].[trk_startdatemap] TSM
ON TS.id = TSM.trk_startdateid
AND TSM.deletedflag = 0
INNER JOIN [dbo].[trk_trackerfeatures] TF
ON TF.trk_startdateid = TS.id
AND TF.deletedflag = 0
INNER JOIN [dbo].[trk_trackerfeatures_lk] TFLK
ON TFLK.id = TF.trk_feature_lkid
WHERE TS.deletedflag = 0
AND TF.applicabletoproject = 1
AND TF.readyforwork = CASE -- HERE IS THE PROBLEM
WHEN TF.trk_trackerstatus_lkid2 IS NULL THEN 0
ELSE 1
END
AND TF.datestamp = (SELECT Max(TF2.datestamp)
FROM [dbo].[trk_trackerfeatures] TF2
INNER JOIN [dbo].[trk_trackerfeatures_lk] TFLK2
ON TFLK2.id = TF2.trk_feature_lkid
WHERE TF.trk_startdateid = TF2.trk_startdateid
AND TFLK2.trk_trackergroup_lkid = TFLK.trk_trackergroup_lkid)
GROUP BY TS.id,
TSM.mappingtypeid,
TSM.maptoid,
TFLK.trk_trackergroup_lkid,
TF.datestamp
It functions as a 'parent' in the sense that it grabs the most recently inserted data set (using DateStamp) from every child group. This is needed to produce a parent report in SSRS later, but at the moment my problem (as mentioned above) is the execution time.
I'd like to hear if there are any suggestions on how to decrease the execution time whilst maintaining the accuracy of the query.
Expected output: [image not preserved]
Without the CASE I get this: [image not preserved]
Your problem is that this condition can't use an index:
AND TF.readyforwork = CASE -- HERE IS THE PROBLEM
WHEN TF.trk_trackerstatus_lkid2 IS NULL THEN 0
ELSE 1
END
Try changing it to:
AND ( (TF.readyforwork = 0 AND TF.trk_trackerstatus_lkid2 IS NULL)
OR (TF.readyforwork = 1 AND TF.trk_trackerstatus_lkid2 IS NOT NULL)
)
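But again, check the actual execution plan to see whether your query uses the index. (EXPLAIN ANALYZE is the PostgreSQL/MySQL command; in SQL Server, use "Include Actual Execution Plan" in SSMS or SET STATISTICS XML ON.)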
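Before relying on the rewrite, it's worth confirming that it selects exactly the same rows as the CASE version, including rows where readyforwork is NULL. A minimal check, assuming a hypothetical table tf that holds only the two relevant columns, using SQLite from Python's standard library (SQLite's three-valued logic behaves like SQL Server's for these predicates):

```python
import sqlite3

CASE_FILTER = """
SELECT id FROM tf
WHERE readyforwork = CASE WHEN trk_trackerstatus_lkid2 IS NULL THEN 0 ELSE 1 END
ORDER BY id
"""

OR_FILTER = """
SELECT id FROM tf
WHERE (readyforwork = 0 AND trk_trackerstatus_lkid2 IS NULL)
   OR (readyforwork = 1 AND trk_trackerstatus_lkid2 IS NOT NULL)
ORDER BY id
"""


def both_filters():
    """Run both predicates over every combination of the two columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tf (id INTEGER PRIMARY KEY, readyforwork INTEGER, "
        "trk_trackerstatus_lkid2 INTEGER)"
    )
    combos = [(r, s) for r in (0, 1, None) for s in (None, 5)]
    conn.executemany(
        "INSERT INTO tf VALUES (?, ?, ?)",
        [(i, r, s) for i, (r, s) in enumerate(combos, start=1)],
    )
    case_ids = [row[0] for row in conn.execute(CASE_FILTER)]
    or_ids = [row[0] for row in conn.execute(OR_FILTER)]
    return case_ids, or_ids


if __name__ == "__main__":
    print(both_filters())  # ([1, 4], [1, 4])
```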
The most problematic part of your query seems to be the correlated subquery, because it has to be evaluated for every candidate row.
Optimize that first: add indexes the engine can use to compute that value quickly for each row.
Based on your query, I would add these two composite (multi-column) indexes:
On table trk_trackerfeatures, index fields: trk_startdateid, datestamp
On Table trk_trackerfeatures_lk, index fields : id, trk_trackergroup_lkid
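To see why the first index helps: with trk_startdateid as the leading column and datestamp second, the MAX(datestamp) for one start date can be read with a single index seek instead of a scan. The sketch below checks this with SQLite's EXPLAIN QUERY PLAN from Python; SQLite's planner is not SQL Server's, the index name is hypothetical, and the table is cut down to the columns the subquery touches. In SQL Server you would create the equivalent index on dbo.trk_trackerfeatures and confirm it in the actual execution plan.

```python
import sqlite3


def max_datestamp_plan():
    """Return SQLite's plan for the per-start-date MAX(datestamp) lookup."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trk_trackerfeatures (id INTEGER PRIMARY KEY, "
        "trk_startdateid INTEGER, trk_feature_lkid INTEGER, datestamp TEXT)"
    )
    # Composite index: the equality column first, then the column being maximised.
    conn.execute(
        "CREATE INDEX ix_tf_startdate_datestamp "
        "ON trk_trackerfeatures (trk_startdateid, datestamp)"
    )
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT MAX(datestamp) FROM trk_trackerfeatures "
        "WHERE trk_startdateid = ?",
        (1,),
    ).fetchall()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(str(row[-1]) for row in plan)


if __name__ == "__main__":
    print(max_datestamp_plan())
```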

Strange Behaviour on MSSQL Stored Procedure using Conditional WHERE with CONTAINS (Full Text Index)

I need some help from a MS SQL Master...
Short version:
When I use a conditional WHERE followed by CONTAINS, my query takes about 1 minute (normally it runs in 200 milliseconds).
With this query, everything works fine:
Where
Contains(table.product_name, @search_word)
But with a Conditional Where, it takes 1 minute to execute:
Where
(@ExecuteWhereStatement = 0 Or Contains(table.product_name, @search_word))
Long Version:
I'm using a stored procedure that receives some parameters. This stored procedure queries a really large table, but everything is indexed properly and the query performs very well so far.
The main query is fairly large, so I want to make the WHERE clause as smart as possible, to avoid repeating the same statement multiple times.
The database holds a history of purchases made by the State, and this query involves 3 tables:
Table 1 (table_purchase) - The purchase itself
id_purchase int (PK)
date_purchase datetime
buyer_code int (Nullable)
Table 2 (table_purchase_product) - The Items of a Purchase
id_product int (PK)
id_purchase int (FK of table_purchase)
product_quantity int (Nullable)
product_name varchar(255) (Nullable) (Full-Text-Indexed)
product_description varchar(2000) (Nullable) (Full-Text-Indexed)
id_product_bid_winner int (FK of table_product_bid)
Table 3 (table_product_bids) - The Bids for Each product of a Purchase
id_product_bid int (PK)
id_product int (FK of table_purchase_product)
product_brand varchar(255) (Nullable) (Full-Text-Indexed)
bid_value decimal (20,6)
So basically, we have a "Purchase" that has several "Products" (or Items), and each "Product" has some "Bids" (or Prices).
And here is the Bad Girl (the stored procedure):
ALTER PROCEDURE [dbo].[procPesquisaFullText]
    @search_date datetime,
    @search_word varchar(8000),
    @search_brand varchar(255),
    @only_one_bid bit = 0,
    @search_buyer_code int = 0,
    @quantityFrom decimal(20,6) = 0,
    @quantityTo decimal(20,6) = 0
AS
BEGIN
    SET NOCOUNT ON;
    -- Skip the word search when only a buyer code was supplied
    Declare @ExecuteWordSearch AS bit;
    if (@search_buyer_code != 0 And @search_word = '')
    begin
        Set @ExecuteWordSearch = 0;
        Set @search_word = 'nothing'; -- CONTAINS throws on an empty search term
    end
    else
    begin
        Set @ExecuteWordSearch = 1;
    end
    -- Skip the brand search when no brand was supplied
    Declare @ExecuteBrandSearch AS bit;
    if (@search_brand = '')
    begin
        Set @ExecuteBrandSearch = 0;
        Set @search_brand = 'nothing';
    end
    else
    begin
        Set @ExecuteBrandSearch = 1;
    end
    begin
        SELECT
            pp.id_product,
            pp.id_purchase,
            pp.product_description
        FROM
            table_purchase_product pp
            inner join table_purchase p on p.id_purchase = pp.id_purchase
        WHERE
            (p.date_purchase >= @search_date)
            and (@search_buyer_code = 0 or (p.buyer_code = @search_buyer_code))
            and (@quantityFrom = 0 or (pp.product_quantity >= @quantityFrom))
            and (@quantityTo = 0 or (pp.product_quantity <= @quantityTo))
            and (contains(pp.product_description, @search_word) or contains(pp.product_name, @search_word))
            and (@only_one_bid = 0
                or ((Select COUNT(*) From table_product_bid Where table_product_bid.id_product = pp.id_product) = 1))
            and (@ExecuteBrandSearch = 0 Or (exists(
                select 1
                from table_product_bid ppb
                where ppb.id_product_bid = pp.id_product_bid_winner
                and contains(ppb.product_brand, @search_brand)
                )
            ))
        ORDER BY p.date_purchase DESC
    end
END
So far, so good...
In the beginning I set two variables, used inside the query.
The first checks whether the user specified a "Buyer Code" AND did not specify a "Search Word" (in which case neither the product's description nor its name is checked).
The second checks whether the user specified a "Specific Brand". If so, the winning bid's BRAND is checked against the one the user gave.
Note: when the search word is empty, I set it to "nothing". I do this because an empty search term in CONTAINS throws an exception, even when that condition is not executed (I also tested this in another, completely isolated query).
As you can see, the user can search for:
- "Products" of a specific buyer's "Purchase" (passing the @search_buyer_code parameter)
- A "Product" that contains a specific word in its name or description
- A "Product" whose winning bid is of a specific brand
- A "Product" that has only 1 bid in total
- A "Product" within a minimum and maximum quantity
You'll also notice that I put a lot of conditions INSIDE the WHERE, producing a very dynamic WHERE, instead of using a big IF/ELSE statement and repeating a lot of code. (I guess some Googlers will land here looking for conditional WHEREs; if so, I'm glad to help!)
OK, so everything works very well, and the query executes flawlessly. But here is the strange, damn, tricky issue:
Suppose I want the user to be able to specify only a "Buyer Code" for the purchase, but no search word for the product (the case that the first piece of code in the stored procedure handles).
Changing from:
and (contains(pp.product_description, @search_word) or contains(pp.product_name, @search_word))
To:
and (@ExecuteWordSearch = 0 Or (contains(pp.product_description, @search_word) or contains(pp.product_name, @search_word)))
The query takes nearly 1 minute! (The version above runs in about 200 milliseconds.)
But WHY? I use the same logic in all the conditional WHEREs. I also use the same flag variable to decide when to apply both the word search and the brand search, yet the brand search works PERFECTLY! So why does my query take 1 minute only when the condition is followed by a CONTAINS?
And this issue is not related to the amount of data: I tried removing the entire CONTAINS condition, letting a lot of data come back, and it takes 1 second at most.
Oh, it's Microsoft SQL Server 2008 R2.
Thanks for reading this far!
I can't find the documentation I had on a very similar issue, but it sounded so familiar that I wanted to at least share what I remember. Part of the issue is that in SQL Server the full-text search engine is separate from the regular query execution engine, so when you mix the two, performance can tank in some cases. This is particularly true when the condition is an OR rather than an AND (I remember hitting this exact situation). Conditional ANDs worked fine, but with OR it's as if each condition gets evaluated repeatedly, row by row.
One workaround, as already suggested, is to build your SQL dynamically before executing it.
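A minimal sketch of "build the SQL dynamically": only the filters the user actually supplied are appended to the WHERE clause, so the optimizer never has to plan a flag = 0 OR CONTAINS(...) branch. It's written in Python against SQLite, with LIKE standing in for CONTAINS (SQLite has no CONTAINS); the function names and the simplified table_purchase_product columns are assumptions. In SQL Server the same idea is usually implemented by building an nvarchar string inside the procedure and running it with sp_executesql, passing values as parameters rather than concatenating them.

```python
import sqlite3


def make_demo_db():
    """Hypothetical, simplified table_purchase_product for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_purchase_product (id_product INTEGER PRIMARY KEY, "
        "product_name TEXT, product_description TEXT, product_quantity INTEGER)"
    )
    conn.executemany(
        "INSERT INTO table_purchase_product VALUES (?, ?, ?, ?)",
        [(1, "steel pipe", "", 10), (2, "paper", "white pipe", 5), (3, "desk", "", 1)],
    )
    return conn


def build_product_search(search_word="", min_qty=0, max_qty=0):
    """Return (sql, params) containing only the filters the caller supplied."""
    clauses, params = [], []
    if search_word:
        # LIKE stands in for CONTAINS; values are bound as parameters, never concatenated.
        clauses.append("(product_name LIKE ? OR product_description LIKE ?)")
        params += [f"%{search_word}%", f"%{search_word}%"]
    if min_qty:
        clauses.append("product_quantity >= ?")
        params.append(min_qty)
    if max_qty:
        clauses.append("product_quantity <= ?")
        params.append(max_qty)
    sql = "SELECT id_product FROM table_purchase_product"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY id_product", params


def search(conn, **filters):
    sql, params = build_product_search(**filters)
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = make_demo_db()
    print(search(conn, search_word="pipe"))             # [1, 2]
    print(search(conn, search_word="pipe", min_qty=6))  # [1]
    print(search(conn))                                 # [1, 2, 3]
```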
Another would be to split the full-text and non-full-text conditions into two search functions (literally UDFs) and then combine the two result sets as needed (INTERSECT, EXCEPT, etc.).
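The split-and-combine idea can be sketched as two independent queries that each return only product ids, joined with INTERSECT. Here the two "functions" are plain query strings run through SQLite from Python, with LIKE standing in for CONTAINS; in SQL Server they could be inline table-valued functions. All names and the simplified table are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Hypothetical, simplified table_purchase_product for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_purchase_product (id_product INTEGER PRIMARY KEY, "
        "product_name TEXT, product_description TEXT, product_quantity INTEGER)"
    )
    conn.executemany(
        "INSERT INTO table_purchase_product VALUES (?, ?, ?, ?)",
        [(1, "steel pipe", "", 10), (2, "paper", "white pipe", 5), (3, "desk", "", 1)],
    )
    return conn


# Text search and ordinary filter are kept in separate queries.
TEXT_SEARCH = (
    "SELECT id_product FROM table_purchase_product "
    "WHERE product_name LIKE ? OR product_description LIKE ?"
)
FILTER_SEARCH = "SELECT id_product FROM table_purchase_product WHERE product_quantity >= ?"


def search_both(conn, word, min_qty):
    """Products matching the text search AND the quantity filter."""
    sql = f"{TEXT_SEARCH} INTERSECT {FILTER_SEARCH} ORDER BY id_product"
    pattern = f"%{word}%"
    return [row[0] for row in conn.execute(sql, (pattern, pattern, min_qty))]


if __name__ == "__main__":
    print(search_both(make_demo_db(), "pipe", 6))  # [1]
```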
Try changing your WHERE clause to use a CASE expression, e.g.:
WHERE
CASE
WHEN @ExecuteWhereStatement = 0 THEN 1
WHEN @ExecuteWhereStatement = 1 THEN
CASE
WHEN CONTAINS([table].product_name, @search_word) THEN 1
ELSE 0
END
END = 1;

How to move a row from a table to another table if a Column's value changes in SQL?

I have two tables, Hosts and UnusedHosts. Hosts has 17 columns, and UnusedHosts has 14 columns: the first 12 are the same as in Hosts, the 13th is the UserName of whoever moved a host to UnusedHosts, and the 14th is the date when they did it. Hosts has a column Unused, which is False. I want to do the following: if I change this value to True in Hosts, the row should automatically be moved to UnusedHosts.
How can I do this? Could someone provide an example?
P.S.: My SQL knowledge is very limited; I can only write very simple SELECT, UPDATE, INSERT, and DELETE statements.
Thanks!
There are two main types of DML trigger in SQL Server: AFTER and INSTEAD OF. They work much as they sound: an AFTER trigger runs your original statement and then runs the trigger, while an INSTEAD OF trigger runs in place of the original statement. You can use either in this case, though in different ways.
AFTER:
create trigger hosts_unused
on Hosts
after UPDATE
as
insert into UnusedHosts
select h.<<your_columns>>...
from Hosts h
where h.unused = 1 --Or however else you may be denoting True
delete from Hosts
where unused = 1 --Or however else you may be denoting True
GO
INSTEAD OF:
create trigger hosts_unused
on Hosts
instead of UPDATE
as
insert into UnusedHosts
select i.<<your_columns>>...
from inserted i
where i.unused = 1 --Or however else you may be denoting True
delete h
from inserted i inner join
Hosts h on i.host_id = h.host_id
where i.unused = 1 --Or however else you may be denoting True
update h
set hosts_column_1 = i.hosts_column_1,
hosts_column_2 = i.hosts_column_2,
etc
from inserted i inner join
Hosts h on i.host_id = h.host_id
where i.unused = 0 --Or however else you may be denoting False
GO
It's always important to think about performance when applying triggers. If you have a lot of updates on the Hosts table, but only a few of them set the Unused column, then the AFTER trigger is probably going to give you better performance. The AFTER trigger also has the benefit that you can simply add ", INSERT" after the "after UPDATE" part, and it will work for inserts too.
Check out Books Online on the subject.
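To experiment with the AFTER UPDATE idea without SQL Server, here's a small sketch in SQLite via Python. Note the differences: SQLite triggers fire once per row and read NEW/OLD, whereas SQL Server triggers fire once per statement and read the inserted pseudo-table. The table and column names (hosts, unused_hosts, moved_by, moved_at) and the literal user name are hypothetical; in SQL Server you might record the user with a function such as SUSER_SNAME() instead.

```python
import sqlite3

SCHEMA = """
CREATE TABLE hosts (host_id INTEGER PRIMARY KEY, name TEXT, unused INTEGER NOT NULL DEFAULT 0);
CREATE TABLE unused_hosts (host_id INTEGER PRIMARY KEY, name TEXT, moved_by TEXT, moved_at TEXT);

-- Fires only when the Unused flag flips from false to true.
CREATE TRIGGER hosts_unused AFTER UPDATE OF unused ON hosts
WHEN NEW.unused = 1 AND OLD.unused = 0
BEGIN
    INSERT INTO unused_hosts (host_id, name, moved_by, moved_at)
    VALUES (NEW.host_id, NEW.name, 'app_user', CURRENT_TIMESTAMP);
    DELETE FROM hosts WHERE host_id = NEW.host_id;
END;
"""


def demo():
    """Mark one host unused and report which hosts remain and which were moved."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO hosts (host_id, name) VALUES (?, ?)", [(1, "web01"), (2, "db01")])
    conn.execute("UPDATE hosts SET unused = 1 WHERE host_id = 2")
    remaining = [row[0] for row in conn.execute("SELECT host_id FROM hosts ORDER BY host_id")]
    moved = list(conn.execute("SELECT host_id, name, moved_by FROM unused_hosts"))
    return remaining, moved


if __name__ == "__main__":
    print(demo())  # ([1], [(2, 'db01', 'app_user')])
```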

SDAC -RecordCount and FetchAll

I am using SDAC components to query a SQL Server 2008 database. Like all datasets, it has a RecordCount property, and it also has the FetchAll property (which I think corresponds to PacketRecords on ClientDataSets). That said, I have a few questions:
1 - If I set FetchAll = True, RecordCount returns the correct value. But when the database is large and my query returns a lot of rows, memory usage sometimes grows a lot (because, of course, it fetches all the data to get the record count).
2 - If I set FetchAll = False, RecordCount returns -1 and memory does not grow. But I really need the record count. I also want to create a generic function for this, so I don't have to change all my existing queries.
What can I do to get a working RecordCount while keeping the application's memory usage low?
Please don't reply that I don't need RecordCount (or that I should use EOF and BOF): I really do, and that is not the question.
I thought about using a separate query to determine the record count, but then my query is executed twice (once for the count, once for the data).
EDIT
@Johan pointed out a good solution, and it seems to work. Can anybody confirm it? I use one TMSConnection per TMSQuery (because I'm using threads), so I don't think this will be a problem, will it?
MSQuery1.FetchAll := False;
MSQuery1.FetchRows := 10;
MSQuery1.SQL.Text := 'select * from cidade';
MSQuery1.Open;
ShowMessage(IntToStr(MSQuery1.RecordCount)); //returns 10
MSQuery1.Close;
MSQuery2.SQL.Text := 'SELECT @@rowcount AS num_of_rows';
MSQuery2.Open;
ShowMessage(MSQuery2.FieldByName('num_of_rows').AsString); //returns 289
EDIT 2
MSQuery1 must be closed, or MSQuery2 will not return the num_of_rows. Why is that?
MSQuery1.FetchAll := False;
MSQuery1.FetchRows := 10;
MSQuery1.SQL.Text := 'select * from cidade';
MSQuery1.Open;
ShowMessage(IntToStr(MSQuery1.RecordCount)); //returns 10
//MSQuery1.Close; <<commented
MSQuery2.SQL.Text := 'SELECT @@rowcount AS num_of_rows';
MSQuery2.Open;
ShowMessage(MSQuery2.FieldByName('num_of_rows').AsString); //returns 0
Run your query as normal, then close the query:
MSQuery1.SQL.Text := 'select * from cidade';
MSQuery1.Open;
MSQuery1.Close;
You need the Close; otherwise SQL Server has not closed the cursor yet and will not register the query as 'completed'.
Then run the following query right afterwards:
SELECT @@rowcount AS num_of_rows
This gives the number of rows your last SELECT returned.
It also gives the number of rows affected by your last UPDATE, DELETE, or INSERT statement.
See: http://technet.microsoft.com/en-us/library/ms187316.aspx
Note that this variable is per connection, so queries in other connections do not affect you.
I use ODAC and I believe SDAC inherits from the same base classes and works the same way as ODAC. In ODAC, there is an option called QueryRecCount under Options in your query component. Look for TCustomDADataSet.Options.QueryRecCount in your help file.
Setting QueryRecCount = True and FetchAll = False will reduce your memory usage and give you the record count. But SDAC will run a second query in the background to get the record count so it does add a little bit of extra time to your query.
Take a look at the Devart forum entry at http://www.devart.com/forums/viewtopic.php?t=8143.
