One-to-one parent-child relationship in SQL Server

I have a table with the columns TransactionID, Amount and ParentTransactionID.
A transaction can be cancelled by posting a new entry with the opposite amount and with ParentTransactionID set to the TransactionID of the cancelled transaction.
Say a transaction exists:
1   100   NULL
If I cancel the above entry, it will look like:
2   -100   1
If I then cancel that cancellation, it should look like:
3   100   2
When I fetch, I should get only record 3, since IDs 1 and 2 cancel each other out. The result should be:
3   100   2
If I cancel the 3rd entry as well, no records should be returned.
SELECT * FROM [Transaction] t
WHERE NOT EXISTS (SELECT TOP 1 NULL FROM [Transaction] pt
                  WHERE (pt.ParentTransactionID = t.TransactionID OR t.ParentTransactionID = pt.TransactionID)
                    AND ABS(t.Amount) = ABS(pt.Amount))
This works only if a single level of cancellation has been made.

If all transactions are cancelled by a new transaction setting ParentTransactionId to the transaction it cancels, it can be done using a simple LEFT JOIN:
SELECT t1.* FROM Transactions t1
LEFT JOIN Transactions t2
ON t1.TransactionId = t2.ParentTransactionId
WHERE t2.TransactionId IS NULL;
t1 is the transaction we're currently looking at and t2 the possibly cancelling transaction. If there is no cancelling transaction (i.e. no t2 row exists with that ParentTransactionId), the row is returned.
I'm not sure about your last statement though: "If I cancelled the 3rd entry no records should return." How would you cancel #3 without adding a new transaction to the table? You may have some other condition for a cancel that you're not telling us about?
Simple SQLfiddle demo.
EDIT: Since you don't want cancelled transactions (or rather, transactions with an odd number of cancellations), you need a considerably more complicated recursive query to figure out whether to show the last transaction or not:
WITH ChangeLog (TransactionID, Amount, ParentTransactionID, IsCancel, OriginalTransactionID) AS
(
    SELECT TransactionID, Amount, ParentTransactionID, 0, TransactionID
    FROM Transactions
    WHERE ParentTransactionID IS NULL

    UNION ALL

    SELECT t.TransactionID, t.Amount, t.ParentTransactionID,
           1 - c.IsCancel, c.OriginalTransactionID
    FROM Transactions t
    JOIN ChangeLog c ON c.TransactionID = t.ParentTransactionID
)
SELECT c1.TransactionID, c1.Amount, c1.ParentTransactionID
FROM ChangeLog c1
LEFT JOIN ChangeLog c2
  ON c1.TransactionID < c2.TransactionID
 AND c1.OriginalTransactionID = c2.OriginalTransactionID
WHERE c2.TransactionID IS NULL
  AND c1.IsCancel = 0
This will, in your example with 3 transactions, show the last row, but if the last row is cancelled, it won't return anything.
Since SQLfiddle is up again, here is a fiddle to test with.
A short explanation of the query may be in order, even if it's a bit hard to do simply: it defines a recursive "view", ChangeLog, that tracks cancellations and the original transaction ID from the first to the last transaction in a series (a series being all transactions with the same OriginalTransactionID). After that, it joins ChangeLog with itself to find the last entry in each series (i.e. the transaction that has no later transaction in its series). If the last entry found in a series is not a cancellation (IsCancel = 0), it will show up.
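To make the behaviour concrete, here is a minimal setup the recursive query can be run against (a sketch; the rows are the ones from the question, and the table name Transactions matches the queries above):

CREATE TABLE Transactions (
    TransactionID int PRIMARY KEY,
    Amount int,
    ParentTransactionID int NULL
);

INSERT INTO Transactions VALUES (1,  100, NULL); -- original transaction
INSERT INTO Transactions VALUES (2, -100, 1);    -- cancels #1
INSERT INTO Transactions VALUES (3,  100, 2);    -- cancels the cancellation

With these three rows the query returns row 3; add a fourth row (4, -100, 3) and it returns nothing, as the question requires.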

Related

"Subquery returned more than 1 value" when deleting records, not not if I change the number fetched

I am trying to delete millions of records from 4 databases and am running into an unexpected error. I made a temp table that holds a list of all the IDs I wish to delete:
CREATE TABLE #CaseList (case_id int)
INSERT INTO #CaseList
SELECT DISTINCT id
FROM my_table
WHERE <my criteria for choosing cases>
I have already deleted all the associated records (related via a foreign key on case_id):
DELETE FROM image WHERE case_id in (SELECT case_id from #CaseList)
Then I delete records from my_table in batches (so as not to blow up the transaction log, which, despite my database being in simple recovery mode, still grows when making changes like deletions):
DELETE FROM my_table
WHERE id IN (SELECT case_id
             FROM #CaseList
             ORDER BY case_id
             OFFSET 0 ROWS FETCH NEXT 10000 ROWS ONLY)
This will work fine for one or three or five rounds (so I've deleted 10k-50k records), then it fails with this error message:
Msg 512, Level 16, State 1, Procedure trgd_image, Line 188
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
Which is really weird because, as I said, I already deleted all the associated records from the image table. Then it gets weirder: if I select smaller batches, the deletion works without error.
I generally cut the FETCH NEXT n in half (5000), then in half again (2500), then in half again (1200), etc., until it works:
DELETE FROM my_table
WHERE id IN (SELECT case_id
             FROM #CaseList
             ORDER BY case_id
             OFFSET 50000 ROWS FETCH NEXT 1200 ROWS ONLY)
Then I repeat with that amount until I get past where it failed, then turn it back up to 10000, and it will work again for a batch or three...
DELETE FROM my_table
WHERE id IN (SELECT case_id
             FROM #CaseList
             ORDER BY case_id
             OFFSET 60000 ROWS FETCH NEXT 10000 ROWS ONLY)
then it fails again with the same error... rinse, wash, and repeat.
What can cause that subquery error when there are NOT related records in the image table? And why would selecting the cases in smaller batches work around it and then allow larger batches again?
I would really love a solution to this so I can make a WHILE loop and run this deletion through the millions of rows that way, instead of having to manage it manually, which is going to take me weeks with millions of rows needing to be deleted out of 4 databases.
The query you're showing cannot produce the error you're seeing. If you're sure it does, you have a bug report to file. My guess is that at trgd_image, line 188 (or somewhere nearby) you'll find a scalar comparison, =, being used where IN is needed.
I also have some advice for you, free for the asking. I've written lots of queries like yours and never used anything like OFFSET 60000 ROWS FETCH NEXT 10000 ROWS ONLY. You don't need to either, and your SQL will be easier to write if you don't.
First, unless your machine is seriously undersized for 2018 for the scale of data you're using, I think you'll find 100,000-row transactions are just fine. If not, at least try to understand why not. A machine managing many millions of rows ought to be able to deal with 1% of them without breaking a sweat.
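For instance, here is a sketch of a batched delete that needs no OFFSET bookkeeping at all (table names are from the question; the 100,000 batch size is my assumption):

WHILE 1 = 1
BEGIN
    -- each DELETE is its own transaction; TOP avoids the OFFSET arithmetic
    DELETE TOP (100000) FROM my_table
    WHERE id IN (SELECT case_id FROM #CaseList);

    IF @@ROWCOUNT = 0 BREAK; -- nothing left to delete
END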
When you populate #CaseList, trap @@ROWCOUNT. Then you can print/record that, and compute the number of "chunks" in your work.
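Something like this sketch, reusing the question's INSERT (CONCAT needs SQL Server 2012 or later):

DECLARE @total int;

INSERT INTO #CaseList
SELECT DISTINCT id
FROM my_table
WHERE <my criteria for choosing cases>

SELECT @total = @@ROWCOUNT; -- capture immediately, before anything resets it
PRINT CONCAT(@total, ' cases to delete, in ', (@total + 99999) / 100000, ' chunks of 100000');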
Ideally, though, there's no temporary table. Instead, those cases probably have some logical grouping you can operate on. They might have regions or owners or dates, whatever was used to select them in the first place. Iterate over that, e.g.
delete from T where id in (select id from S where [user] = 1)
Once you do that, you can write a loop:
declare @user int, @u int

select @user = min([user]) from S where ...
while @user is not null begin
    print 'deleting cases for user ' + cast(@user as varchar(12))
    delete from T where id in (select id from S where [user] = @user)
    select @u = @user
    select @user = min([user]) from S where ... and [user] > @u
end
That way, if the process blows up partway through, for any reason, you have a logical grouping of deletions and a clean break: you know all the cases for users (or whatever) less than @user are deleted, and you can look into what's wrong with the "current" one. Quite often you'll discover that the problem isn't unique, and by solving it you'll prevent future problems with others.

TSQL Large Update in Batches - Is Join Costing Me More Because it is Performed Each Time in a Loop

I'm trying to archive many records in batches rather than in one shot.
Will T-SQL join the two tables, TeamRoster and @teamIdsToDelete, on every iteration of the loop? My concern is that if my temporary table is huge and I don't remove records from it as I go, the JOIN might be unnecessarily expensive. On the other hand, how expensive is it to delete from the temporary table as I go? Is it made up for by the (real? hypothetical?) smaller joins I'll have to do in each batch?
(I can provide more details/thoughts, but will do so only if helpful.)
DECLARE @teamIdsToDelete TABLE
(
    RosterID int PRIMARY KEY
)

-- collect the list of active teamIds. we will rely on the modified date to age them out.
INSERT INTO @teamIdsToDelete
SELECT DISTINCT tr.RosterID
FROM rosterload.TeamRoster tr WITH (NOLOCK)
WHERE tr.IsArchive = 0 AND tr.Loaded = 1
-- age out remaining rosters (no cap; proved we can update more than 50k by modifying the test case)
WHILE (1 = 1)
BEGIN
    BEGIN TRANSACTION

    UPDATE TOP (1000) r
    SET [Status] = 'Delete', IsArchive = 1, ModifiedDate = GETDATE(), ModifiedBy = 'abc'
    FROM rosterload.TeamRoster r WITH (ROWLOCK)
    JOIN @teamIdsToDelete ttd ON ttd.RosterID = r.RosterID
    WHERE r.[Status] != 'Delete' AND r.IsArchive != 1 AND r.ModifiedBy != 'abc' -- predicate for filtering;

    IF @@ROWCOUNT = 0 -- terminating condition;
    BEGIN
        COMMIT TRANSACTION
        BREAK
    END

    COMMIT TRANSACTION
END
As I understand it, the goal of this query is to archive a huge number of rows without blocking other queries at the same time. The temp table helps you narrow down the subset of records to archive. Since it has one column which is a clustered primary key, the join to the other PK will be blazingly fast. You would spend more effort calculating and deleting already-updated records from the temp table than you would save on the joins.
Also, there is no reason to use a transaction and do batches; you could just do one big update instead. The result is the same: the table will be locked once the first ~5,000 row locks are acquired (roughly after the first five batches are updated) and held until the COMMIT statement, since the ROWLOCK hint does not prevent lock escalation. On the other hand, running without a transaction would give other queries the opportunity to continue after each 1000-row batch. If you need to make sure that all records are archived in one go, add some retry logic to your query or your application code for errors like deadlocks or process interruption. And do you really need the NOLOCK hint?
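A minimal sketch of the same loop with the per-batch transaction removed, so each UPDATE auto-commits and other sessions can get in between batches (names from the question):

WHILE (1 = 1)
BEGIN
    UPDATE TOP (1000) r
    SET [Status] = 'Delete', IsArchive = 1, ModifiedDate = GETDATE(), ModifiedBy = 'abc'
    FROM rosterload.TeamRoster r
    JOIN @teamIdsToDelete ttd ON ttd.RosterID = r.RosterID
    WHERE r.[Status] != 'Delete' AND r.IsArchive != 1 AND r.ModifiedBy != 'abc'

    IF @@ROWCOUNT = 0 BREAK -- terminating condition
END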

target table `RECEIPT` of the DML statement cannot have any enabled triggers if the statement contains an OUTPUT clause without INTO clause

I have a table called INVOICE which stores billing information about an order or orders. One of the columns in this table is paid, of type bit; as its name indicates, it shows whether the specific order's bill has been paid.
I have another table named RECEIPT, which stores information about payments made against a specific invoice.
So every time a user pays an amount toward the specified invoice, a new receipt record is created.
What I'm trying to do now is create a trigger that updates the paid column in the INVOICE table and sets it to 1. This update should fire when the sum of the receipts belonging to the invoice equals the amount_due in the INVOICE table.
In other words, if the invoice due amount = $100 and the user pays $50, then later pays the other $50, the paid column in the INVOICE table should be set to 1, since the total payments now equal the invoice due amount.
This is the trigger I've created to achieve the above:
CREATE TRIGGER tg_invoice_payment ON RECEIPT
AFTER INSERT
AS
BEGIN
    UPDATE INVOICE
    SET paid = 1
    WHERE INVOICE.invoice_id = (SELECT inserted.invoice_id FROM inserted)
      AND (SELECT SUM(RECEIPT.amount_paid)
           FROM RECEIPT
           JOIN inserted ON RECEIPT.receipt_id = inserted.receipt_id
           WHERE RECEIPT.invoice_id = inserted.invoice_id)
        = (SELECT INVOICE.amount_due
           FROM INVOICE
           JOIN inserted ON INVOICE.invoice_id = inserted.invoice_id
           WHERE INVOICE.invoice_id = inserted.invoice_id)
END;
It compiled successfully, but at run time I get the error below:
The target table 'RECEIPT' of the DML statement cannot have any enabled triggers if the statement contains an OUTPUT clause without INTO clause
Personally, I think you should update the paid status outside the scope of triggers. If you perform an INSERT into RECEIPT, you can execute the UPDATE INVOICE ... statement right after it (inside a TRANSACTION, of course). That is a lot cleaner and more predictable.
As for the error you are getting, it's hard to say what is causing it based on the information you gave us. Perhaps the trigger is firing other triggers that produce the error? The statement you provided simply doesn't have an OUTPUT clause.
In any case, the statement you provided is not written correctly (as Damien pointed out), because the inserted table can have multiple rows. This is a rewrite that corrects at least that part:
CREATE TRIGGER tg_invoice_payment ON RECEIPT
AFTER INSERT
AS
BEGIN
    UPDATE inv
    SET paid = 1
    FROM inserted AS ins
    INNER JOIN INVOICE AS inv ON inv.invoice_id = ins.invoice_id
    WHERE inv.amount_due = (
        SELECT SUM(r.amount_paid)
        FROM RECEIPT AS r
        WHERE r.invoice_id = ins.invoice_id
    );
END;
But as I mentioned earlier, you should probably not be doing this from a trigger at all. Execute the statement from your program right after any INSERT/UPDATE. Alternatively, write a stored procedure that inserts into RECEIPT and executes the UPDATE statement right after the INSERT.
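A minimal sketch of that stored-procedure alternative (column names are from the question; the procedure name and the assumption that receipt_id is an IDENTITY column are mine):

CREATE PROCEDURE usp_AddReceipt
    @invoice_id int,
    @amount_paid money
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    -- record the payment (receipt_id assumed to be an IDENTITY column)
    INSERT INTO RECEIPT (invoice_id, amount_paid)
    VALUES (@invoice_id, @amount_paid);

    -- flag the invoice as paid once the receipts cover the amount due
    UPDATE INVOICE
    SET paid = 1
    WHERE invoice_id = @invoice_id
      AND amount_due = (SELECT SUM(amount_paid)
                        FROM RECEIPT
                        WHERE invoice_id = @invoice_id);

    COMMIT TRANSACTION;
END;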

Return unlocked rows in a "select top n" query

I have an MS SQL database table and 8 identical processes accessing it in parallel: each does a SELECT TOP n, processes those n rows, and updates a column of those rows. The problem is that I need each row to be selected and processed exactly once. This means that if one process has reached the database and selected the top n rows, when the second process comes it should find those rows locked and select rows n to 2*n instead, and so on...
Is it possible to place a lock on rows when you select them, so that when someone requests the top n rows and some are locked, the next rows are returned instead of waiting for the locked ones? Seems like a long shot, but...
Another thing I was thinking of, maybe not so elegant but simple and safe, is to keep a counter in the database of the instances that have selected from the table. The first instance to arrive increments the counter and selects the top n, the next one increments the counter and selects rows n*(i-1) to n*i, and so on...
Does this sound like a good idea? Do you have any better suggestions? Any thought is highly appreciated!
Thanks for your time.
Here's a sample I blogged about a while ago:
The READPAST hint is what ensures multiple processes don't block each other when polling for records to process. In addition, this example uses a bit field to physically "lock" a record; it could be a datetime if needed.
DECLARE @NextId INTEGER

BEGIN TRANSACTION

-- Find the next available item
SELECT TOP 1 @NextId = ID
FROM QueueTable WITH (UPDLOCK, READPAST)
WHERE IsBeingProcessed = 0
ORDER BY ID ASC

-- If found, flag it to prevent it being picked up again
IF (@NextId IS NOT NULL)
BEGIN
    UPDATE QueueTable
    SET IsBeingProcessed = 1
    WHERE ID = @NextId
END

COMMIT TRANSACTION

-- Now return the queue item, if we have one
IF (@NextId IS NOT NULL)
    SELECT * FROM QueueTable WHERE ID = @NextId
The simplest method is to use row locking:
BEGIN TRAN
SELECT *
FROM authors
WITH (HOLDLOCK, ROWLOCK)
WHERE au_id = '274-80-9391'
/* Do all your stuff here while the record is locked */
COMMIT TRAN
But if you are accessing your data and then closing the connection, you won't be able to use this method.
How long will you need to lock the rows for? The best way might actually be, as you say, to place a counter on the rows you select (best done using the OUTPUT clause within an UPDATE).
The best idea, if you want to select records in this manner, would be to use a counter in a separate table.
You really don't want to be locking rows on a production database exclusively for any great period of time, so I would recommend using a counter instead. That way only one of your processes can grab a counter number at a time (since the counter row locks while it is being updated), which gives you the concurrency you need.
If you need a hand writing the tables and procedures that will do this (simply and safely as you put it!) just ask.
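For illustration, a sketch of what that counter grab could look like (ProcessCounter and WorkTable are hypothetical names, not from the question):

DECLARE @i int, @n int = 100; -- @n = batch size per process

-- atomically read and increment the shared one-row counter; the row lock
-- taken by the UPDATE guarantees no two processes get the same value
UPDATE ProcessCounter
SET @i = Counter = Counter + 1;

-- this process's window: rows n*(i-1)+1 .. n*i
SELECT *
FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY ID) AS rn
      FROM WorkTable) AS w
WHERE w.rn > (@i - 1) * @n AND w.rn <= @i * @n;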
EDIT: ahh, never mind, you're working in a disconnected style. How about this:
UPDATE TOP (@n) QueueTable
SET Locked = 1
OUTPUT INSERTED.Col1, INSERTED.Col2 INTO #this
WHERE Locked = 0
<do your stuff>
Perhaps you are looking for the READPAST hint?
<begin or save transaction>
INSERT INTO #this (Col1, Col2)
SELECT TOP (@n) Col1, Col2
FROM Table1 WITH (ROWLOCK, HOLDLOCK, READPAST)
<do your stuff>
<commit or rollback>

Get "next" row from SQL Server database and flag it in single transaction

I have a SQL Server table that I'm using as a queue, and it's being processed by a multi-threaded (and soon to be multi-server) application. I'd like a way for a process to claim the next row from the queue, flagging it as "in-process", without the possibility that multiple threads (or multiple servers) will claim the same row at the same time.
Is there a way to update a flag in a row and retrieve that row at the same time? I want something like this pseudocode, but ideally without blocking the whole table:
Block the table to prevent others from reading
Grab the next ID in the queue
Update the row of that item with a "claimed" flag (or whatever)
Release the lock and let other threads repeat the process
What's the best way to use T-SQL to accomplish this? I remember once seeing a statement that would DELETE rows and, at the same time, deposit the deleted rows into a temp table so you could do something else with them, but I can't for the life of me find it now.
You can use the OUTPUT clause:
UPDATE myTable
SET flag = 1
OUTPUT DELETED.id
WHERE id = 1
  AND flag <> 1
The main thing is to use a combination of table hints, as shown below, within a transaction.
DECLARE @NextId INTEGER

BEGIN TRANSACTION

SELECT TOP 1 @NextId = ID
FROM QueueTable WITH (UPDLOCK, ROWLOCK, READPAST)
WHERE BeingProcessed = 0
ORDER BY ID ASC

IF (@NextId IS NOT NULL)
BEGIN
    UPDATE QueueTable
    SET BeingProcessed = 1
    WHERE ID = @NextId
END

COMMIT TRANSACTION

IF (@NextId IS NOT NULL)
    SELECT * FROM QueueTable WHERE ID = @NextId
UPDLOCK will lock the next available row it finds, preventing other processes from grabbing it.
ROWLOCK will ensure only the individual row is locked (I've never found it to be a problem to leave this out, as I believe it will only take a row lock anyway, but it's safest to use it).
READPAST will prevent a process from being blocked waiting for another to finish.
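For completeness, the flag-and-fetch can be collapsed into a single atomic statement by combining these hints with the OUTPUT clause from the first answer; a sketch using the same QueueTable names:

WITH next_item AS
(
    SELECT TOP (1) *
    FROM QueueTable WITH (UPDLOCK, ROWLOCK, READPAST)
    WHERE BeingProcessed = 0
    ORDER BY ID ASC
)
UPDATE next_item
SET BeingProcessed = 1
OUTPUT INSERTED.*; -- returns the claimed row to the caller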
