Creating CTE vs selecting criteria in SQL - sql-server

I am working on a school database project that requires a trigger-based solution for some of the optional restrictions my database has. My database model represents an online video viewing service where users have access to a large number of videos (a similar principle to YouTube). The table History stores up to 100 viewed videos for every user. What my trigger is supposed to do is:
Delete previous entries for the same video and user (or change the date of viewing to current time)
Insert new entry (or update an existing one, i.e. possible step 1)
Delete any entries older than the 100th one
Here is the code I wrote:
CREATE TRIGGER [History_Update] ON dbo.History INSTEAD OF INSERT AS
BEGIN
    DECLARE @user_id bigint, @video_id bigint, @history_count smallint
    SELECT @user_id = user_id, @video_id = video_id FROM inserted
    DELETE FROM History WHERE user_id = @user_id AND video_id = @video_id
    INSERT INTO History(user_id, video_id) VALUES (@user_id, @video_id)
    SET @history_count = (SELECT COUNT(*) FROM History WHERE user_id = @user_id AND video_id = @video_id)
    IF (@history_count >= 100)
    BEGIN
        WITH temp AS (SELECT TOP 1 * FROM History WHERE user_id = @user_id AND video_id = @video_id ORDER BY viewtime ASC)
        DELETE FROM temp
    END
END
Now, I have a few questions regarding this:
Is it better to use a CTE as written above, or something like this:
SET @viewtime = (SELECT TOP 1 viewtime FROM History WHERE user_id = @user_id AND video_id = @video_id ORDER BY viewtime ASC)
DELETE FROM History WHERE user_id = @user_id AND video_id = @video_id AND viewtime = @viewtime
Also, would it be better to check whether a specific user-video entry exists in History and then update the viewtime attribute? And since I am using an INSTEAD OF trigger, would this violate any rule regarding the use of this kind of trigger? I am not sure I understood it well; from what I read online, INSTEAD OF triggers must perform the specified action within the body of the trigger.
Thanks!

Given the choice between your CTE and the SET, I would choose the CTE. I find using SET with a subquery somewhat icky. After all, the following does the same thing:
SELECT TOP 1 @viewtime = viewtime
FROM History
WHERE user_id = @user_id AND video_id = @video_id
ORDER BY viewtime ASC;
In other words, the set is redundant.
In addition, separating the set from the delete introduces an opportunity for a race condition. Perhaps another query might insert a row or delete the one you are trying to delete.
As for the CTE itself, it is okay. You need the CTE (or subquery) if you are going to delete rows in a particular order.
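One further caveat worth hedging: the trigger as posted reads inserted into scalar variables, so it silently handles only one row per INSERT statement. A set-based sketch of the cleanup step (column names taken from the question; this is an illustration, not tested code) would run inside the trigger and keep the 100 most recent rows per affected user, per the stated requirement:

WITH ranked AS (
    SELECT viewtime,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY viewtime DESC) AS rn
    FROM History
    WHERE user_id IN (SELECT user_id FROM inserted)
)
DELETE FROM ranked
WHERE rn > 100;

Deleting through the CTE removes the underlying History rows, and because it is one statement it also avoids the race condition mentioned above.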

Related

Trigger reverses the changes made - SQL Server

I'm working on an E-commerce system where I have an order table that stores all the information regarding an order. The orders go through different stages: Open, Verified, In Process, etc. And I'm keeping counts of these orders at different stages e.g. Open Orders 95, Verified 5, In Process 3, etc.
When a new order is inserted in the table, I have a trigger that increments the Open Orders by 1. Similarly, I have a trigger for updates which checks the order's previous stage and the next to decrement and increment accordingly.
The INSERT trigger is working fine as described above. But the UPDATE trigger has a weird behavior: it makes the desired changes to the counts but then reverses them for some reason.
For instance, upon changing the status of an order from Open to Verified, the ideal behavior would be to decrement Open Orders by 1 and increment Verified Orders by 1. The trigger currently performs the desired action but then, for some reason, restores the previous value.
Here's a snippet of my trigger where I check if the order previously belonged to the Open status and is now being updated to Verified status:
IF @@ROWCOUNT = 0 RETURN
DECLARE @orderID VARCHAR(MAX) -- orderID of the order that is being updated
DECLARE @storeID VARCHAR(MAX) -- storeID of the store the order belongs to
SELECT TOP 1
    @orderID = i.id,
    @storeID = i.storeID
FROM
    inserted AS i
    INNER JOIN deleted AS d
        ON i.id = d.id
-- IF from Open Order
IF EXISTS (
    SELECT *
    FROM
        deleted
    WHERE
        orderStatus = 'Open' AND
        id = @orderID
)
BEGIN
    -- IF to Verified Order
    IF EXISTS (
        SELECT *
        FROM
            inserted
        WHERE
            orderStatus = 'Verified' AND
            id = @orderID
    )
    BEGIN
        UPDATE order_counts
        SET
            open_orders = open_orders - @@ROWCOUNT,
            verified_orders = verified_orders + @@ROWCOUNT
        WHERE storeID = @storeID
    END
END
EDIT:
Here's some extra information which will be helpful in light of the first comment on the question:
I have a lot of records in the table, so using COUNT() again and again has a big impact on overall performance. This is why I'm keeping counts in a separate table. Also, I've written the trigger so that it handles both single-record and multi-record changes. I only check one row because I know that in the case of multiple records they will all be going through the same change of status; hence the decrement/increment by @@ROWCOUNT.
If you can tolerate a slightly different representation of the order counts, I'd strongly suggest using an indexed view instead [1]:
create table dbo.Orders (
ID int not null,
OrderStatus varchar(20) not null,
constraint PK_Orders PRIMARY KEY (ID)
)
go
create view dbo.OrderCounts
with schemabinding
as
select
OrderStatus,
COUNT_BIG(*) as Cnt
from
dbo.Orders
group by OrderStatus
go
create unique clustered index IX_OrderCounts on dbo.OrderCounts (OrderStatus)
go
insert into dbo.Orders (ID,OrderStatus) values
(1,'Open'),
(2,'Open'),
(3,'Verified')
go
update dbo.Orders set OrderStatus = 'Verified' where ID = 2
go
select * from dbo.OrderCounts
Results:
OrderStatus Cnt
-------------------- --------------------
Open 1
Verified 2
This has the advantage that, whilst behind the scenes SQL Server is doing something very similar to running triggers, this code has been debugged thoroughly and is correct.
In your current attempted trigger, one further reason the trigger is broken is that @@ROWCOUNT isn't "sticky": it doesn't remember the number of rows that were affected by the original UPDATE once you run other statements inside your trigger, because those statements also set @@ROWCOUNT.
[1] You can always stack a non-indexed view atop this view and perform a PIVOT if you really want the counts in a single row and in multiple columns.
The reason for this behavior is the use of @@ROWCOUNT multiple times: in reality, once @@ROWCOUNT has been read, its value is reset by the very next statement. Instead, capture the value into a variable and use that variable throughout the trigger. The scenario below demonstrates this.
CREATE DATABASE Test
USE Test
CREATE TABLE One
(
ID INT IDENTITY(1,1)
,Name NVARCHAR(MAX)
)
GO
CREATE TRIGGER TR_One ON One FOR INSERT,UPDATE
AS
BEGIN
PRINT @@ROWCOUNT
SELECT @@ROWCOUNT
END
UPDATE One
SET Name = 'Name4'
WHERE ID = 3
RESULTS :-
The PRINT @@ROWCOUNT statement gives a value of 1, whereas the SELECT @@ROWCOUNT gives a value of 0, because the PRINT statement itself has already reset the count.
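Following that advice, a minimal sketch of the fix (the trigger name TR_One_Fixed is made up; the table is the same One from above): capture @@ROWCOUNT into a variable in the very first statement of the trigger body, before anything else can reset it.

CREATE TRIGGER TR_One_Fixed ON One FOR INSERT, UPDATE
AS
BEGIN
    DECLARE @rc int = @@ROWCOUNT  -- must be the first statement in the body
    IF @rc = 0 RETURN
    PRINT @rc   -- still the original row count
    SELECT @rc  -- still the original row count
END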

Select and Delete in the same transaction using TOP clause

I have a table to which data is continuously being added at a rapid pace.
I need to fetch records from this table and immediately remove them so that I cannot process the same record a second time. And since the data is being added at a fast rate, I need to use the TOP clause so that only a small number of records go to the business logic for processing at a time.
I am using the query below:
BEGIN TRAN readrowdata
SELECT
top 5 [RawDataId],
[RawData]
FROM
[TABLE] with(HOLDLOCK)
WITH q AS
(
SELECT
top 5 [RawDataId],
[RawData]
FROM
[TABLE] with(HOLDLOCK)
)
DELETE from q
COMMIT TRANSACTION readrowdata
I am using HOLDLOCK here so that new data cannot be inserted into the table while I am performing the SELECT and DELETE operations. I use it because, supposing there are only 3 records in the table, the SELECT statement would get 3 records, a new record could be inserted in the meantime, and the DELETE statement would then delete 4 records. So I would lose 1 record.
Is the query OK in performance terms? If I can improve it, please give me your suggestions.
Thank you
Personally, I'd use a different approach. One with less locking, but also extra information signifying that certain records are currently being processed...
DECLARE @rowsBeingProcessed TABLE (
    id INT
);
WITH rows AS (
    SELECT TOP 5 [RawDataId], processing_start FROM yourTable WHERE processing_start IS NULL
)
UPDATE rows
SET processing_start = GETDATE()
OUTPUT INSERTED.RawDataId INTO @rowsBeingProcessed
WHERE processing_start IS NULL;
-- Business Logic Here
DELETE yourTable WHERE RawDataId IN (SELECT id FROM @rowsBeingProcessed);
Then you can also add checks like "if a record has been 'beingProcessed' for more than 10 minutes, assume that the business logic failed", etc, etc.
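That staleness check can be sketched as follows (the 10-minute threshold and the processing_start column are assumptions carried over from the example above): rows whose processor presumably died are simply requeued by clearing the flag.

UPDATE yourTable
SET processing_start = NULL  -- requeue: next reader will pick these up again
WHERE processing_start IS NOT NULL
  AND processing_start < DATEADD(MINUTE, -10, GETDATE());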
By locking the table in this way, you force other processes to wait for your transaction to complete. This can have very rapid consequences on scalability and performance - and it tends to be hard to predict, because there's often a chain of components all relying on your database.
If you have multiple clients each running this query, and multiple clients adding new rows to the table, the overall system performance is likely to deteriorate at some times, as each "read" client is waiting for a lock, the number of "write" clients waiting to insert data grows, and they in turn may tie up other components (whatever is generating the data you want to insert).
Diego's answer is on the money - put the data into a variable, and delete matching rows. Don't use locks in SQL Server if you can possibly avoid it!
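As a sketch of that advice (the table variable name @batch is made up; the column names come from the question), the capture and the delete can even be a single atomic statement, which closes the SELECT-then-DELETE window entirely:

DECLARE @batch TABLE (RawDataId int, RawData varchar(max));

DELETE TOP (5) FROM [TABLE]
OUTPUT DELETED.RawDataId, DELETED.RawData INTO @batch;

-- process the rows in @batch with the business logic

Because the delete and the capture happen in one statement, no second client can see or delete the same rows, and no explicit table lock is needed.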
You can do it very easily with triggers. Below is the kind of setup that means you don't need to hold up other users who are trying to insert data simultaneously:
Data Definition language
CREATE TABLE SampleTable
(
id int
)
Sample Record
insert into SampleTable(id)Values(1)
Sample Trigger
CREATE TRIGGER SampleTableTrigger
ON SampleTable AFTER INSERT
AS
IF EXISTS (SELECT id FROM inserted)
BEGIN
    SET NOCOUNT ON
    SET XACT_ABORT ON
    BEGIN TRY
        BEGIN TRAN
        SELECT ID FROM inserted
        DELETE FROM yourTable WHERE ID IN (SELECT id FROM inserted);
        COMMIT TRAN
    END TRY
    BEGIN CATCH
        ROLLBACK TRAN
    END CATCH
END
Hope this is very simple and helpful
If I understand you correctly, you are worried that between your SELECT and your DELETE, more records could be inserted, and the first TOP 5 would be different from the second TOP 5?
If so, why don't you load your first SELECT into a temp table or table variable (or at least the PKs), do whatever you have to do with your data, and then do your delete based on that table?
I know that it's an old question, but I found a solution here: https://www.simple-talk.com/sql/learn-sql-server/the-delete-statement-in-sql-server/
DECLARE @Output TABLE
(
StaffID INT,
FirstName NVARCHAR(50),
LastName NVARCHAR(50),
CountryRegion NVARCHAR(50)
);
DELETE SalesStaff
OUTPUT DELETED.* INTO #Output
FROM Sales.vSalesPerson sp
INNER JOIN dbo.SalesStaff ss
ON sp.BusinessEntityID = ss.StaffID
WHERE sp.SalesLastYear = 0;
SELECT * FROM @Output;
Maybe it will be helpful for you.

SQL Server unique auto-increment column in the context of another column

Suppose the table with two columns:
ParentEntityId int foreign key
Number int
ParentEntityId is a foreign key to another table.
Number is a local identity, i.e. it is unique within single ParentEntityId.
Uniqueness is easily achieved via unique key over these two columns.
How to make Number be automatically incremented in the context of the ParentEntityId on insert?
Addendum 1
To clarify the problem, here is an abstract.
ParentEntity has multiple ChildEntity, and each ChildEntity should have a unique incremental Number in the context of its ParentEntity.
Addendum 2
Treat ParentEntity as a Customer.
Treat ChildEntity as an Order.
So, orders for every customer should be numbered 1, 2, 3 and so on.
Well, there's no native support for this type of column, but you could implement it using a trigger:
CREATE TRIGGER tr_MyTable_Number
ON MyTable
INSTEAD OF INSERT
AS
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN;
WITH MaxNumbers_CTE AS
(
    SELECT ParentEntityID, MAX(Number) AS Number
    FROM MyTable
    WHERE ParentEntityID IN (SELECT ParentEntityID FROM inserted)
    GROUP BY ParentEntityID
)
INSERT MyTable (ParentEntityID, Number)
SELECT
i.ParentEntityID,
ROW_NUMBER() OVER
(
PARTITION BY i.ParentEntityID
ORDER BY (SELECT 1)
) + ISNULL(m.Number, 0) AS Number
FROM inserted i
LEFT JOIN MaxNumbers_CTE m
ON m.ParentEntityID = i.ParentEntityID
COMMIT
Not tested but I'm pretty sure it'll work. If you have a primary key, you could also implement this as an AFTER trigger (I dislike using INSTEAD OF triggers, they're harder to understand when you need to modify them 6 months later).
Just to explain what's going on here:
SERIALIZABLE is the strictest isolation mode; it guarantees that only one database transaction at a time can execute these statements, which we need in order to guarantee the integrity of this "sequence." Note that this irreversibly promotes the entire transaction, so you won't want to use this inside of a long-running transaction.
The CTE picks up the highest number already used for each parent ID;
ROW_NUMBER generates a unique sequence for each parent ID (PARTITION BY) starting from the number 1; we add this to the previous maximum if there is one to get the new sequence.
I probably should also mention that if you only ever need to insert one new child entity at a time, you're better off just funneling those operations through a stored procedure instead of using a trigger - you'll definitely get better performance out of it. This is how it's currently done with hierarchyid columns in SQL '08.
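A hedged sketch of that stored-procedure route (the procedure name is made up; the table and columns are the ones from this answer): hold an update lock plus a range lock on the parent's rows while computing the next number, so two concurrent calls for the same parent serialize instead of producing duplicates.

CREATE PROCEDURE dbo.InsertChildEntity
    @ParentEntityID int
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRAN;
    INSERT MyTable (ParentEntityID, Number)
    SELECT @ParentEntityID,
           ISNULL(MAX(Number), 0) + 1
    FROM MyTable WITH (UPDLOCK, HOLDLOCK)  -- serialize per-parent numbering
    WHERE ParentEntityID = @ParentEntityID;
    COMMIT;
END

Unlike the SERIALIZABLE trigger, the locking here is scoped to one parent's key range rather than the whole statement.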
You need to add an OUTPUT clause to the trigger for LINQ to SQL compatibility.
For example:
INSERT MyTable (ParentEntityID, Number)
OUTPUT inserted.*
SELECT
i.ParentEntityID,
ROW_NUMBER() OVER
(
PARTITION BY i.ParentEntityID
ORDER BY (SELECT 1)
) + ISNULL(m.Number, 0) AS Number
FROM inserted i
LEFT JOIN MaxNumbers_CTE m
ON m.ParentEntityID = i.ParentEntityID
This solves the question as I understand it :-)
DECLARE @foreignKey int
SET @foreignKey = 1 -- or however you get this
INSERT Tbl (ParentEntityId, Number)
VALUES (@foreignKey, ISNULL((SELECT MAX(Number) FROM Tbl WHERE ParentEntityId = @foreignKey), 0) + 1)

is there anyway to cache data that can be used in a SQL server db trigger

I have an orders table that has a userID column.
I have a user table that has id, name.
I would like to have a database trigger that shows the insert, update or delete by name.
So I wind up having to do this join between these two tables in every single DB trigger. I would think it would be better if I could run one query upfront to map users to IDs and then reuse that "lookup" in my triggers. Is this possible?
DECLARE @oldId int
DECLARE @newId int
DECLARE @oldName VARCHAR(100)
DECLARE @newName VARCHAR(100)
SELECT @oldId = (SELECT user_id FROM Deleted)
SELECT @newId = (SELECT user_id FROM Inserted)
SELECT @oldName = (SELECT name FROM users WHERE id = @oldId)
SELECT @newName = (SELECT name FROM users WHERE id = @newId)
INSERT INTO History(id, . . .
Good news: you are already using a cache! Your SELECT name FROM users WHERE id = @id is going to fetch the name from buffer-pool-cached pages. Believe you me, you won't be able to construct a better tuned, higher scale and faster cache than that.
Result caching may make sense in the client, where one can avoid the roundtrip to the database altogether. Or it may be valuable to cache some complex and long running query result. But inside a stored proc/trigger there is absolutely no value in caching a simple index lookup result.
How about you turn on Change Data Capture, and then get rid of all this code?
Edited to add the rest:
Actually, if you're considering the possibility of a scalar function to fetch the username, then don't. That's really bad because of the problems of scalar functions being procedural. You'd be better off with something like:
INSERT dbo.History (id, ...)
SELECT i.id, ...
FROM inserted i
JOIN deleted d ON d.id = i.id
JOIN dbo.users u ON u.user_id = i.user_id;
As user_id is unique, and you have a FK whenever it's used, it shouldn't be a major problem. But yes, you need to repeat this logic in every trigger. If you don't want to repeat the logic, then use Change Data Capture in SQL 2008.

Update SQL with consecutive numbering

I want to update a table with consecutive numbering starting with 1. The update has a where clause so only results that meet the clause will be renumbered. Can I accomplish this efficiently without using a temp table?
This probably depends on your database, but here is a solution for MySQL 5 that involves using a variable:
SET @a:=0;
UPDATE table SET field=@a:=@a+1 WHERE whatever='whatever' ORDER BY field2,field3
You should probably edit your question and indicate which database you're using however.
Edit: I found a solution utilizing T-SQL for SQL Server. It's very similar to the MySQL method:
DECLARE @myVar int
SET @myVar = 0
UPDATE
    myTable
SET
    @myVar = myField = @myVar + 1
For Microsoft SQL Server 2005/2008: the ROW_NUMBER() function was added in 2005.
;WITH T AS (
    SELECT ROW_NUMBER() OVER (ORDER BY ColumnToOrderBy) AS RN,
           ColumnToHoldConsecutiveNumber
    FROM TableToUpdate
    WHERE ...
)
UPDATE T
SET ColumnToHoldConsecutiveNumber = RN
EDIT: For SQL Server 2000:
DECLARE @RN int
SET @RN = 0
UPDATE T
SET ColumnToHoldConsecutiveNumber = @RN
, @RN = @RN + 1
WHERE ...
NOTE: When I tested this, the increment of @RN appeared to happen prior to setting the column to @RN, so the above gives numbers starting at 1.
EDIT: I just noticed that it appears you want to create multiple sequential numbers within the table. Depending on the requirements, you may be able to do this in a single pass with SQL Server 2005/2008 by adding PARTITION BY to the OVER clause:
;WITH T AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY Client, City ORDER BY ColumnToOrderBy) AS RN,
           ColumnToHoldConsecutiveNumber
    FROM TableToUpdate
)
UPDATE T
SET ColumnToHoldConsecutiveNumber = RN
If you want to create a new PrimaryKey column, use just this:
ALTER TABLE accounts ADD id INT IDENTITY(1,1)
As well as using a CTE, it is also possible to use an UPDATE with a self-join to the same table:
UPDATE a
SET a.columnToBeSet = b.sequence
FROM tableXxx a
INNER JOIN
(
SELECT ROW_NUMBER() OVER ( ORDER BY columnX ) AS sequence, columnY, columnZ
FROM tableXxx
WHERE columnY = @groupId AND columnZ = @lang2
) b ON b.columnY = a.columnY AND b.columnZ = a.columnZ
The derived table, alias b, is used to generate the sequence via the ROW_NUMBER() function, together with some other columns which form a virtual primary key.
Typically, each row will require a unique sequence value.
The WHERE clause is optional and limits the update to those rows that satisfy the specified conditions.
The derived table is then joined to the same table, alias a, joining on the virtual primary key columns with the column to be updated set to the generated sequence.
In Oracle this works:
update myTable set rowColum = rownum
where something = something else
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/pseudocolumns009.htm#i1006297
To get the example by Shannon fully working I had to edit his answer:
; WITH CTE AS (
SELECT ROW_NUMBER() OVER (ORDER BY [NameOfField]) as RowNumber, t1.ID
FROM [ActualTableName] t1
)
UPDATE [ActualTableName]
SET Name = 'Depersonalised Name ' + CONVERT(varchar(255), RowNumber)
FROM CTE
WHERE CTE.Id = [ActualTableName].ID
as his answer was trying to update T, which in his case was the name of the Common Table Expression, and it throws an error.
UPDATE TableName
SET TableName.id = TableName.New_Id
FROM (
SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS New_Id
FROM TableName
) TableName
I've used this technique for years to populate ordinals and sequentially numbered columns. However I recently discovered an issue with it when running on SQL Server 2012. It would appear that internally the query engine is applying the update using multiple threads and the predicate portion of the UPDATE is not being handled in a thread-safe manner. To make it work again I had to reconfigure SQL Server's max degree of parallelism down to 1 core.
EXEC sp_configure 'show advanced options', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sp_configure 'max degree of parallelism', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
DECLARE @id int
SET @id = -1
UPDATE dbo.mytable
SET @id = Ordinal = @id + 1
Without this you'll find that most sequential numbers are duplicated throughout the table.
One more way to achieve the desired result
1. Create a sequence object - (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-sequence-transact-sql?view=sql-server-ver16)
CREATE SEQUENCE dbo.mySeq
AS BIGINT
START WITH 1 -- up to you from what number you want to start cycling
INCREMENT BY 1 -- up to you how it will increment
MINVALUE 1
CYCLE
CACHE 100;
2. Update your records
UPDATE TableName
SET Col2 = NEXT VALUE FOR dbo.mySeq
WHERE ....some condition...
EDIT: To reset sequence to start from the 1 for the next time you use it
ALTER SEQUENCE dbo.mySeq RESTART WITH 1 -- or start with any value you need
Join to a Numbers table? It involves an extra table, but it wouldn't be temporary -- you'd keep the numbers table around as a utility.
See http://web.archive.org/web/20150411042510/http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html
or
http://www.sqlservercentral.com/articles/Advanced+Querying/2547/
(the latter requires a free registration, but I find it to be a very good source of tips & techniques for MS SQL Server, and a lot is applicable to any SQL implementation).
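For reference, one common way to build and fill such a utility table (the table name, the 10,000 row count, and the sys.all_objects cross-join trick are all illustrative choices, not from the question):

CREATE TABLE dbo.Numbers (n int NOT NULL PRIMARY KEY);

-- populate once with 1..10000 using a row-generating cross join
;WITH seq AS (
    SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects a CROSS JOIN sys.all_objects b
)
INSERT dbo.Numbers (n)
SELECT n FROM seq;

Once it exists, the table can be joined against for any numbering or gap-filling task without temp tables.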
It is possible, but only via some very complicated queries - basically you need a subquery that counts the number of records selected so far, and uses that as the sequence ID. I wrote something similar at one point - it worked, but it was a lot of pain.
To be honest, you'd be better off with a temporary table with an autoincrement field.
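The counting-subquery approach described above can be sketched like this (table and column names are made up for illustration; it assumes id is unique and repeats the outer WHERE filter inside the count):

UPDATE t
SET seq = (SELECT COUNT(*)
           FROM myTable t2
           WHERE t2.id <= t.id
             AND t2.someColumn = 'x')  -- same filter as the outer WHERE
FROM myTable t
WHERE t.someColumn = 'x';

Each row's sequence is the count of qualifying rows at or before it in id order, which yields 1, 2, 3, ... but at O(n squared) cost, which is the pain referred to above.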
