Options for indexing a view with a CTE - sql-server

I have a view that I want to turn into an indexed view. After a lot of effort I was able to put the SQL query for the view in place, and it looks like this -
ALTER VIEW [dbo].[FriendBalances] WITH SCHEMABINDING AS
WITH
trans (Amount, PaidBy, PaidFor, Id) AS
(SELECT Amount, UserId AS PaidBy, PaidForUsers_FbUserId AS PaidFor, Id FROM dbo.Transactions
FULL JOIN dbo.TransactionUser ON dbo.Transactions.Id = dbo.TransactionUser.TransactionsPaidFor_Id),
bal (PaidBy, PaidFor, Balance) AS
(SELECT PaidBy, PaidFor, SUM(Amount / transactionCounts.[_count]) AS Balance FROM trans
JOIN (SELECT Id, COUNT(*) AS _count FROM trans GROUP BY Id) AS transactionCounts ON trans.Id = transactionCounts.Id AND trans.PaidBy <> trans.PaidFor
GROUP BY trans.PaidBy, trans.PaidFor)
SELECT ISNULL(bal.PaidBy, bal2.PaidFor) AS PaidBy, ISNULL(bal.PaidFor, bal2.PaidBy) AS PaidFor,
ISNULL(bal.Balance, 0) - ISNULL(bal2.Balance, 0) AS Balance
FROM bal
LEFT JOIN bal AS bal2 ON bal.PaidBy = bal2.PaidFor AND bal.PaidFor = bal2.PaidBy
WHERE ISNULL(bal.Balance, 0) > ISNULL(bal2.Balance, 0)
Sample Data for FriendBalances View -
PaidBy PaidFor Balance
------ ------- -------
9990 9991 1000
9990 9992 2000
9990 9993 1000
9991 9993 1000
9991 9994 1000
It is mainly a join of 2 tables.
Transactions -
CREATE TABLE [dbo].[Transactions](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Date] [datetime] NOT NULL,
[Amount] [float] NOT NULL,
[UserId] [bigint] NOT NULL,
[Remarks] [nvarchar](255) NULL,
[GroupFbGroupId] [bigint] NULL,
CONSTRAINT [PK_Transactions] PRIMARY KEY CLUSTERED ([Id])
)
Sample data in Transactions Table -
Id Date Amount UserId Remarks GroupFbGroupId
-- ----------------------- ------ ------ -------------- --------------
1 2001-01-01 00:00:00.000 3000 9990 this is a test NULL
2 2001-01-01 00:00:00.000 3000 9990 this is a test NULL
3 2001-01-01 00:00:00.000 3000 9991 this is a test NULL
TransactionUser -
CREATE TABLE [dbo].[TransactionUser](
[TransactionsPaidFor_Id] [bigint] NOT NULL,
[PaidForUsers_FbUserId] [bigint] NOT NULL
) ON [PRIMARY]
Sample Data in TransactionUser Table -
TransactionsPaidFor_Id PaidForUsers_FbUserId
---------------------- ---------------------
1 9991
1 9992
1 9993
2 9990
2 9991
2 9992
3 9990
3 9993
3 9994
Now I am not able to index the view because my query contains CTEs. What options do I have now?
If the CTEs have to be removed, what should they be replaced with so that the view can still be indexed?
Here is the error message -
Msg 10137, Level 16, State 1, Line 1 Cannot create index on view "ShareBill.Test.Database.dbo.FriendBalances" because it references common table expression "trans". Views referencing common table expressions cannot be indexed. Consider not indexing the view, or removing the common table expression from the view definition.
The concept:
Transaction mainly consists of:
an Amount that was paid
UserId of the User who paid that amount
and some more information which is not important for now.
The TransactionUser table is a mapping between the Transaction and User tables. Essentially a transaction can be shared between multiple people, so we store that in this table.
So we have transactions where one person pays and others share the amount. If A pays $100 for B, then B owes $100 to A. If B then pays $90 for A, B owes only $10 to A. Now if A pays $300 for A, B and C, each share is $100, so B would owe $110 and C would owe $100 to A.
So in this particular view we aggregate the effective amount that has been paid (if any) between two users, and thus know how much one person owes another.
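Worked through with the sample data above: transaction 1 (3000 paid by 9990, shared by 9991, 9992 and 9993) gives each of the three a 1000 share; transaction 2 (3000 paid by 9990, shared by 9990, 9991 and 9992) adds another 1000 each for 9991 and 9992, the payer's own share being excluded by the PaidBy <> PaidFor condition; transaction 3 (3000 paid by 9991, shared by 9990, 9993 and 9994) gives 9990, 9993 and 9994 a 1000 share each towards 9991. Netting 9991's 2000 debt to 9990 against 9990's 1000 debt to 9991 yields the first row of the view output above.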

Okay, this gives you an indexed view (which needs an additional view on top of it to sort out the who-owes-who detail), but it still may not satisfy your requirements.
/* Transactions table, as before, but with handy unique constraint for FK Target */
CREATE TABLE [dbo].[Transactions](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Date] [datetime] NOT NULL,
[Amount] [float] NOT NULL,
[UserId] [bigint] NOT NULL,
[Remarks] [nvarchar](255) NULL,
[GroupFbGroupId] [bigint] NULL,
CONSTRAINT [PK_Transactions] PRIMARY KEY CLUSTERED (Id),
constraint UQ_Transactions_XRef UNIQUE (Id,Amount,UserId)
)
Nothing surprising so far, I hope
/* Much expanded TransactionUser table, we'll hide it away and most of the maintenance is automatic */
CREATE TABLE [dbo]._TransactionUser(
[TransactionsPaidFor_Id] int NOT NULL,
[PaidForUsers_FbUserId] [bigint] NOT NULL,
Amount float not null,
PaidByUserId bigint not null,
UserCount int not null,
LowUserID as CASE WHEN [PaidForUsers_FbUserId] < PaidByUserId THEN [PaidForUsers_FbUserId] ELSE PaidByUserId END,
HighUserID as CASE WHEN [PaidForUsers_FbUserId] < PaidByUserId THEN PaidByUserId ELSE [PaidForUsers_FbUserId] END,
PerUserDelta as (Amount/UserCount) * CASE WHEN [PaidForUsers_FbUserId] < PaidByUserId THEN -1 ELSE 1 END,
constraint PK__TransactionUser PRIMARY KEY ([TransactionsPaidFor_Id],[PaidForUsers_FbUserId]),
constraint FK__TransactionUser_Transactions FOREIGN KEY ([TransactionsPaidFor_Id]) references dbo.Transactions,
constraint FK__TransactionUser_Transaction_XRef FOREIGN KEY ([TransactionsPaidFor_Id],Amount,PaidByUserID)
references dbo.Transactions (Id,Amount,UserId) ON UPDATE CASCADE
)
This table now maintains enough information to allow the view to be constructed. The rest of the work we do is to construct/maintain the data in the table. Note that, with the foreign key constraint, we've already ensured that if, say, an amount is changed in the transactions table, everything gets recalculated.
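For instance (a sketch, assuming the sample data loaded later in this answer):
-- Changing an amount cascades into dbo._TransactionUser through
-- FK__TransactionUser_Transaction_XRef (ON UPDATE CASCADE)...
UPDATE dbo.Transactions SET Amount = 4500 WHERE Id = 1;
-- ...and the computed PerUserDelta recomputes to 4500 / 3 = 1500 per user.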
/* View that mimics the original TransactionUser table -
in fact it has the same name so existing code doesn't need to change */
CREATE VIEW dbo.TransactionUser
with schemabinding
as
select
[TransactionsPaidFor_Id],
[PaidForUsers_FbUserId]
from
dbo._TransactionUser
GO
/* Effectively the PK on the original table */
CREATE UNIQUE CLUSTERED INDEX PK_TransactionUser on dbo.TransactionUser ([TransactionsPaidFor_Id],[PaidForUsers_FbUserId])
Anything that's already written to work against TransactionUser will now work against this view, and be none the wiser. Except, they can't insert/update/delete the rows without some help:
/* Now we write the trigger that maintains the underlying table */
CREATE TRIGGER dbo.T_TransactionUser_IUD
ON dbo.TransactionUser
INSTEAD OF INSERT, UPDATE, DELETE
AS
SET NOCOUNT ON;
/* Every delete affects *every* row for the same transaction
We need to drop the counts on every remaining row, as well as removing the actual rows we're interested in */
WITH DropCounts as (
select TransactionsPaidFor_Id,COUNT(*) as Cnt from deleted group by TransactionsPaidFor_Id
), KeptRows as (
select tu.TransactionsPaidFor_Id,tu.PaidForUsers_FbUserId,UserCount - dc.Cnt as NewCount
from dbo._TransactionUser tu left join deleted d
on tu.TransactionsPaidFor_Id = d.TransactionsPaidFor_Id and
tu.PaidForUsers_FbUserId = d.PaidForUsers_FbUserId
inner join DropCounts dc
on
tu.TransactionsPaidFor_Id = dc.TransactionsPaidFor_Id
where
d.PaidForUsers_FbUserId is null
), ChangeSet as (
select TransactionsPaidFor_Id,PaidForUsers_FbUserId,NewCount,1 as Keep
from KeptRows
union all
select TransactionsPaidFor_Id,PaidForUsers_FbUserId,null,0
from deleted
)
merge into dbo._TransactionUser tu
using ChangeSet cs on tu.TransactionsPaidFor_Id = cs.TransactionsPaidFor_Id and tu.PaidForUsers_FbUserId = cs.PaidForUsers_FbUserId
when matched and cs.Keep = 1 then update set UserCount = cs.NewCount
when matched then delete;
/* Every insert affects *every* row for the same transaction
This is why the indexed view couldn't be generated */
WITH TU as (
select TransactionsPaidFor_Id,PaidForUsers_FbUserId,Amount,PaidByUserId from dbo._TransactionUser
where TransactionsPaidFor_Id in (select TransactionsPaidFor_Id from inserted)
union all
select TransactionsPaidFor_Id,PaidForUsers_FbUserId,Amount,UserId
from inserted i inner join dbo.Transactions t on i.TransactionsPaidFor_Id = t.Id
), CountedTU as (
select TransactionsPaidFor_Id,PaidForUsers_FbUserId,Amount,PaidByUserId,
COUNT(*) OVER (PARTITION BY TransactionsPaidFor_Id) as Cnt
from TU
)
merge into dbo._TransactionUser tu
using CountedTU new on tu.TransactionsPaidFor_Id = new.TransactionsPaidFor_Id and tu.PaidForUsers_FbUserId = new.PaidForUsers_FbUserId
when matched then update set Amount = new.Amount,PaidByUserId = new.PaidByUserId,UserCount = new.Cnt
when not matched then insert
([TransactionsPaidFor_Id],[PaidForUsers_FbUserId],Amount,PaidByUserId,UserCount)
values (new.TransactionsPaidFor_Id,new.PaidForUsers_FbUserId,new.Amount,new.PaidByUserId,new.Cnt);
Now that the underlying table is being maintained, we can finally write the indexed view you wanted in the first place... almost. The issue is that the totals we create may be positive or negative, because we've normalized the transactions so that we can easily sum them:
CREATE VIEW [dbo]._FriendBalances
WITH SCHEMABINDING
as
SELECT
LowUserID,
HighUserID,
SUM(PerUserDelta) as Balance,
COUNT_BIG(*) as Cnt
FROM dbo._TransactionUser
WHERE LowUserID != HighUserID
GROUP BY
LowUserID,
HighUserID
GO
create unique clustered index IX__FriendBalances on dbo._FriendBalances (LowUserID, HighUserID)
Finally we create a view, built on the indexed view above, that flips the person owed and the person owing around whenever the balance is negative. It uses the index on the view above, which does most of the work we were seeking to save by having the indexed view:
create view dbo.FriendBalances
as
select
CASE WHEN Balance >= 0 THEN LowUserID ELSE HighUserID END as PaidBy,
CASE WHEN Balance >= 0 THEN HighUserID ELSE LowUserID END as PaidFor,
ABS(Balance) as Balance
from
dbo._FriendBalances WITH (NOEXPAND)
Now, finally, we insert your sample data:
set identity_insert dbo.Transactions on --Ensure we get IDs we know
GO
insert into dbo.Transactions (Id,[Date] , Amount , UserId , Remarks ,GroupFbGroupId)
select 1 ,'2001-01-01T00:00:00.000', 3000, 9990 ,'this is a test', NULL union all
select 2 ,'2001-01-01T00:00:00.000', 3000, 9990 ,'this is a test', NULL union all
select 3 ,'2001-01-01T00:00:00.000', 3000, 9991 ,'this is a test', NULL
GO
set identity_insert dbo.Transactions off
GO
insert into dbo.TransactionUser (TransactionsPaidFor_Id, PaidForUsers_FbUserId)
select 1, 9991 union all
select 1, 9992 union all
select 1, 9993 union all
select 2, 9990 union all
select 2, 9991 union all
select 2, 9992 union all
select 3, 9990 union all
select 3, 9993 union all
select 3, 9994
And query the final view:
select * from dbo.FriendBalances
PaidBy PaidFor Balance
------ ------- -------
9990 9991 1000
9990 9992 2000
9990 9993 1000
9991 9993 1000
9991 9994 1000
Now, there is additional work we could do if we were concerned that someone might find a way to dodge the triggers and make direct changes to the base tables. The first would be yet another indexed view, ensuring that every row for the same transaction has the same UserCount value. Finally, with a few additional columns, check constraints, FK constraints and more work in the triggers, I think we can ensure that UserCount is correct - but it may add more overhead than you want.
I can add scripts for these aspects if you want me to - it depends on how restrictive you want/need the database to be.
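For what it's worth, that first enforcement view could be sketched roughly like this (untested, and the names are mine):
CREATE VIEW dbo._TransactionUserCountCheck
WITH SCHEMABINDING
AS
SELECT TransactionsPaidFor_Id, UserCount, COUNT_BIG(*) as Cnt
FROM dbo._TransactionUser
GROUP BY TransactionsPaidFor_Id, UserCount
GO
/* Unique on the transaction id alone: a second, different UserCount for
   the same transaction cannot be materialized, so direct DML that
   desynchronizes the counts fails at this index. */
CREATE UNIQUE CLUSTERED INDEX IX__TransactionUserCountCheck
ON dbo._TransactionUserCountCheck (TransactionsPaidFor_Id)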

Related

Creating INSERT trigger that sets values to 0

I have two tables: Invoice and Invoice_item, in a 1-to-many relationship.
The Invoice_item table has columns Number_sold and Item_price, and the Invoice table has Number_sold_total and Item_price_total columns that will store total values of columns Number_sold and Item_price from the Invoice_item table with the same Invoice_ID key.
CREATE TABLE [Invoice] (
[Invoice_ID] [int] NOT NULL,
[Number_sold_total] [int] NOT NULL,
[Item_price_total] [decimal] NOT NULL,
PRIMARY KEY ([Invoice_ID]));
CREATE TABLE [Invoice_item] (
[Invoice_item_ID] [int] NOT NULL,
[Invoice_ID] [int] NOT NULL,
[Number_sold] [int] NOT NULL,
[Item_price] [decimal] NOT NULL,
PRIMARY KEY ([Invoice_item_ID],[Invoice_ID]),
FOREIGN KEY ([Invoice_ID]) REFERENCES [Invoice]([Invoice_ID]));
So, if there are three rows in Invoice_item with the same Invoice_ID, the row with that Invoice_ID in Invoice table will have SUM values of corresponding columns in Invoice_item table.
Let's say I have three rows in the Invoice_item table with Item_price values of 100, 200 and 300, all with Invoice_ID = 3. The Item_price_total column in Invoice will then have the value 600 where Invoice_ID = 3.
QUESTION -
My task is to create an insert trigger on the Invoice table that will set the values of Number_sold_total and Item_price_total to 0 (zero) if there is no Invoice_item with a corresponding Invoice_ID -> IF NOT EXISTS (Invoice.Invoice_ID = Invoice_item.Invoice_ID)...
I am using SQL Server 2017.
Ideally you would not implement this using triggers.
Instead you should use a view. If you are worried about querying performance, you can index it, at the cost of insert and delete performance.
CREATE VIEW dbo.Invoice_Totals
WITH SCHEMABINDING
AS
SELECT
i.Invoice_ID,
Number_sold = SUM(i.Number_sold),
Item_price = SUM(i.Item_price),
ItemCount = COUNT_BIG(*) -- an indexed view with GROUP BY must include COUNT_BIG(*)
FROM dbo.Invoice_item AS i
GROUP BY i.Invoice_ID;
And then index it
CREATE UNIQUE CLUSTERED INDEX CX_Invoice_Totals ON dbo.Invoice_Totals
(Invoice_ID);
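Querying the view then gives per-invoice totals. Note that on editions other than Enterprise/Developer you need the NOEXPAND hint for the view's index to actually be used:
SELECT Invoice_ID, Number_sold, Item_price
FROM dbo.Invoice_Totals WITH (NOEXPAND)
WHERE Invoice_ID = 3;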
If you really, really want to do this using triggers, you can use the following
CREATE OR ALTER TRIGGER TR_Invoice_Total
ON dbo.Invoice_item
AFTER INSERT, UPDATE, DELETE
AS
SET NOCOUNT ON; -- prevent spurious resultsets
IF (NOT EXISTS (SELECT 1 FROM inserted) AND NOT EXISTS (SELECT 1 FROM deleted))
RETURN; -- early bail-out if no rows
UPDATE i
SET Number_sold_total += totals.Number_sold_total,
Item_price_total += totals.Item_price_total
FROM Invoice i
JOIN (
SELECT
Invoice_ID = ISNULL(i.Invoice_ID, d.Invoice_ID),
Number_sold_total = SUM(ISNULL(i.Number_sold, 0) - ISNULL(d.Number_sold, 0)),
Item_price_total = SUM(ISNULL(i.Item_price, 0) - ISNULL(d.Item_price, 0))
FROM inserted i
FULL JOIN deleted d ON d.Invoice_ID = i.Invoice_ID
GROUP BY
ISNULL(i.Invoice_ID, d.Invoice_ID)
) totals
ON totals.Invoice_Id = i.Invoice_ID;
The steps of the trigger are as follows:
Bail out early if the modification affected 0 rows.
Join the inserted and deleted tables together on the primary key. This needs to be a full-join, because in an INSERT there are no deleted and in a DELETE there are no inserted rows.
Group up the changed rows by Invoice_ID, taking the sum of the differences.
Join back to the Invoice table
Update the Invoice table adding the total difference to each column.
This effectively recreates what the indexed view would do for you automatically.
You cannot just select the first row from inserted and deleted into variables, as there may be multiple rows affected. You must join and group them.
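As a quick sanity check of the trigger (a sketch against the tables defined above, assuming you chose the trigger route, and reusing the 600 example from the question):
INSERT INTO dbo.Invoice (Invoice_ID, Number_sold_total, Item_price_total)
VALUES (3, 0, 0);
INSERT INTO dbo.Invoice_item (Invoice_item_ID, Invoice_ID, Number_sold, Item_price)
VALUES (1, 3, 2, 100), (2, 3, 1, 200), (3, 3, 5, 300);
-- The trigger has applied the deltas: expect Number_sold_total = 8
-- and Item_price_total = 600 for Invoice_ID = 3.
SELECT Invoice_ID, Number_sold_total, Item_price_total
FROM dbo.Invoice
WHERE Invoice_ID = 3;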

At what point in the query processing lifecycle are runtime constant functions evaluated?

I have a table that holds data about events in my application and I want to process these events in order, one at a time. Rows are created (inserted into the table) from a trigger on a different table. Rows are picked for processing using an UPDATE TOP 1...ORDER BY Id style query. Common sense says that a row must be created before it can be picked, but during load testing very occasionally the datetime recorded for the picking is BEFORE the datetime recorded for the create.
After Googling for a while, my best guess as to what is going on (based mainly on a blog post by Conor Cunningham linked from Using function in where clause: how many times is the function evaluated?) is that the execution of the create and pick queries overlaps, and sysutcdatetime() is evaluated at the start of query execution, before waits cause the queries to finish in the opposite order to the one in which they started. Something roughly like this (time moving downwards):
---------------------------------------------------
|Create Query |Pick Query |
===================================================
| |query start |
---------------------------------------------------
| |evaluate sysutcdatetime |
---------------------------------------------------
|query start |wait/block |
---------------------------------------------------
|evaluate sysutcdatetime |wait/block |
---------------------------------------------------
|insert rows using |wait/block |
|sysutcdatetime value | |
|as Create timestamp | |
---------------------------------------------------
|transaction commits |wait/block |
---------------------------------------------------
| |update top 1 using |
| |sysutcdatetime value as |
| |Pick timestamp |
---------------------------------------------------
Can anyone confirm when runtime constant functions are evaluated? Or provide an alternative explanation for how the datetime recorded for the picking could be BEFORE the datetime recorded for the create?
Just to be clear, I'm looking to understand the behaviour I'm seeing, not for ways to change my schema/code to make the problem go away. My fix for now is to remove the (PickedAt >= CreatedAt) check constraint.
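A quick check consistent with that guess - sysutcdatetime() behaves as a per-statement runtime constant, so every row in a single statement sees the same value:
-- If sysutcdatetime() were evaluated per row, this could return more
-- than one distinct value over a large catalog view; it returns one.
select distinct sysutcdatetime() as ts
from sys.all_objects;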
For completeness, the relevant parts of my event table are;
create table dbo.JobInstanceEvent (
Id int identity not null constraint PK_JobInstanceEvent primary key,
JobInstanceId int not null constraint FK_JobInstanceEvent_JobInstance foreign key references dbo.JobInstance (Id),
JobInstanceStateCodeOld char(4) not null constraint FK_JobInstanceEvent_JobInstanceState1 foreign key references ref.JobInstanceState (Code),
JobInstanceStateCodeNew char(4) not null constraint FK_JobInstanceEvent_JobInstanceState2 foreign key references ref.JobInstanceState (Code),
JobInstanceEventStateCode char(4) not null constraint FK_JobInstance_JobInstanceEventState foreign key references ref.JobInstanceEventState (Code),
CreatedAt datetime2 not null,
PickedAt datetime2 null,
FinishedAt datetime2 null,
constraint CK_JobInstanceEvent_PickedAt check (PickedAt >= CreatedAt),
constraint CK_JobInstanceEvent_FinishedAt check (FinishedAt >= PickedAt),
constraint CK_JobInstanceEvent_PickedAt_FinishedAt check (PickedAt is null and FinishedAt is null or
PickedAt is not null) -- this covers the allowable combinations of PickedAt/FinishedAt
)
The SQL statement that creates the new rows is;
insert dbo.JobInstanceEvent (JobInstanceId, JobInstanceStateCodeOld, JobInstanceStateCodeNew, JobInstanceEventStateCode, CreatedAt)
select
i.Id as JobInstanceId,
d.JobInstanceStateCode as JobInstanceStateCodeOld,
i.JobInstanceStateCode as JobInstanceStateCodeNew,
'CRTD' as JobInstanceEventStateCode,
sysutcdatetime() as CreatedAt
from
inserted i
inner join deleted d on d.Id = i.Id
where
i.JobInstanceStateCode <> d.JobInstanceStateCode and -- the state has changed and
i.JobInstanceStateCode in ('SUCC', 'FAIL') -- the new state is either success or failure.
The SQL statement that picks a row is;
; with cte as (
select top 1
jie.Id,
jie.JobInstanceId,
jie.JobInstanceStateCodeOld,
jie.JobInstanceStateCodeNew,
jie.JobInstanceEventStateCode,
jie.PickedAt
from
dbo.JobInstanceEvent jie
where
jie.JobInstanceEventStateCode = 'CRTD'
order by
jie.Id
)
update cte set
JobInstanceEventStateCode = 'PICK',
PickedAt = sysutcdatetime()
output
inserted.Id,
inserted.JobInstanceId,
inserted.JobInstanceStateCodeOld,
inserted.JobInstanceStateCodeNew
into
#PickedJobInstanceEvent
I'm using SQL Server 2016 but I don't think this is a version specific issue.
explanation for how the datetime recorded for the picking could be
BEFORE the datetime recorded for the create?
Another contributing factor is the timer accuracy of Windows. In a highly concurrent system, blocking and waits will definitely occur, and picked dates could be the same as, or even a few milliseconds before, the creation dates (if pickup queries have to wait for the creation of new rows).
You could simulate the behavior of the create/pick query diagram by using the following (two SSMS windows, one for the create query and one for the pickup query):
create table dbo.atest
(
id int identity primary key clustered,
colA char(500) default('a'),
createddate datetime2(4) default(sysdatetime()),
pickeddate datetime2(4)
)
go
--rows already picked up
insert into dbo.atest(colA, createddate,pickeddate)
values
('a', '20200405 12:00', '20200406 10:00'),
('b', '20200405 12:00', '20200406 10:10'),
('c', '20200405 12:00', '20200406 10:20'),
('d', '20200405 12:00', '20200406 10:30');
--create a new row..to be picked up
begin transaction -- ...
update dbo.atest --..query start | wait block
set colA = colA
waitfor delay '00:00:40'
--during the waitfor delay, in another window(SSMS)
/*
--this will wait(blocking) for the delay and the insert and commit...
update a
set pickeddate = sysdatetime()
from
(
select top (1) *
from dbo.atest
where pickeddate is null
order by id
) as a;
--insertion happened after the update was fired, picked<created
select *
from dbo.atest
where pickeddate < createddate;
*/
--create new row
insert into dbo.atest(colA) values('e')
commit transaction
go
--drop table dbo.atest
You could prevent PickedAt < CreatedAt by incorporating a condition into the select/pickup query:
from
dbo.JobInstanceEvent jie
where
jie.JobInstanceEventStateCode = 'CRTD'
and jie.CreatedAt < /*= ?*/ sysutcdatetime()
order by
jie.Id

Formatting data in SQL

I have a few tables and I'm working on Telerik reports. The structure and the sample data I have are given below:
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('Leave'))
BEGIN;
DROP TABLE [Leave];
END;
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('Addition'))
BEGIN;
DROP TABLE [Addition];
END;
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('Deduction'))
BEGIN;
DROP TABLE [Deduction];
END;
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('EmployeeInfo'))
BEGIN;
DROP TABLE [EmployeeInfo];
END;
GO
CREATE TABLE [EmployeeInfo] (
[EmpID] INT NOT NULL PRIMARY KEY,
[EmployeeName] VARCHAR(255)
);
CREATE TABLE [Addition] (
[AdditionID] INT NOT NULL PRIMARY KEY,
[AdditionType] VARCHAR(255),
[Amount] VARCHAR(255),
[EmpID] INT FOREIGN KEY REFERENCES EmployeeInfo(EmpID)
);
CREATE TABLE [Deduction] (
[DeductionID] INT NOT NULL PRIMARY KEY,
[DeductionType] VARCHAR(255),
[Amount] VARCHAR(255),
[EmpID] INT FOREIGN KEY REFERENCES EmployeeInfo(EmpID)
);
CREATE TABLE [Leave] (
[LeaveID] INT NOT NULL PRIMARY KEY,
[LeaveType] VARCHAR(255) NULL,
[DateFrom] VARCHAR(255),
[DateTo] VARCHAR(255),
[Approved] Binary,
[EmpID] INT FOREIGN KEY REFERENCES EmployeeInfo(EmpID)
);
GO
INSERT INTO EmployeeInfo([EmpID], [EmployeeName]) VALUES
(1, 'Marcia'),
(2, 'Lacey'),
(3, 'Fay'),
(4, 'Mohammad'),
(5, 'Mike')
INSERT INTO Addition([AdditionID], [AdditionType], [Amount], [EmpID]) VALUES
(1, 'Bonus', '2000', 2),
(2, 'Increment', '5000', 5)
INSERT INTO Deduction([DeductionID], [DeductionType], [Amount], [EmpID]) VALUES
(1, 'Late Deductions', '2000', 4),
(2, 'Delayed Project Completion', '5000', 1)
INSERT INTO Leave([LeaveID],[LeaveType],[DateFrom],[DateTo], [Approved], [EmpID]) VALUES
(1, 'Annual Leave','2018-01-08 04:52:03','2018-01-10 20:30:53', 1, 1),
(2, 'Sick Leave','2018-02-10 03:34:41','2018-02-14 04:52:14', 1, 2),
(3, 'Casual Leave','2018-01-04 11:06:18','2018-01-05 04:11:00', 1, 3),
(4, 'Annual Leave','2018-01-17 17:09:34','2018-01-21 14:30:44', 1, 4),
(5, 'Casual Leave','2018-01-09 23:31:16','2018-01-12 15:11:17', 1, 3),
(6, 'Annual Leave','2018-02-16 18:01:03','2018-02-19 17:16:04', 1, 2)
The query I am using to get the output is something like this:
SELECT Info.EmployeeName, Addition.AdditionType, Addition.Amount, Deduction.DeductionType, Deduction.Amount,
Leave.LeaveType,
SUM(DATEDIFF(Day, Leave.DateFrom, Leave.DateTo)) [#OfLeaves],
DatePart(MONTH, Leave.DateFrom)
FROM EmployeeInfo Info
LEFT JOIN Leave
ON Info.EmpID = Leave.EmpID
LEFT JOIN Addition
ON Info.EmpID = Addition.EmpID
LEFT JOIN Deduction
ON Info.EmpID = Deduction.EmpID
WHERE Approved = 1
GROUP BY Info.EmployeeName, Addition.AdditionType, Addition.Amount, Deduction.DeductionType, Deduction.Amount,
Leave.LeaveType,
DatePart(MONTH, Leave.DateFrom)
I want output I can show on the report, but because I'm using joins the data is repeated across multiple rows for the same user, and that's why it also appears multiple times on the report.
The output I am getting is something like this
Fay NULL NULL NULL NULL Casual Leave 4 1
Lacey Bonus 2000 NULL NULL Annual Leave 3 2
Lacey Bonus 2000 NULL NULL Sick Leave 4 2
Marcia NULL NULL Delayed Project Completion 5000 Annual Leave 2 1
Mohammad NULL NULL Late Deductions 2000 Annual Leave 4 1
What I want looks something like this:
Fay NULL NULL NULL NULL Casual Leave 4 1
Lacey Bonus 2000 NULL NULL Annual Leave 3 2
Lacey NULL NULL NULL NULL Sick Leave 4 2
Marcia NULL NULL Delayed Project Completion 5000 Annual Leave 2 1
Mohammad NULL NULL Late Deductions 2000 Annual Leave 4 1
As there was only one bonus and it was not allocated multiple times, it should appear only once. I am stuck on the table layout in the report, so I'm hoping a hint on formatting the output in the query will save me from having to do it there.
My recommendation in this case is to replace the left joins with a single join to a derived table, in the following way:
select
    info.EmployeeName, AdditionType, AdditionAmount, DeductionType, DeductionAmount, LeaveType, [#OfLeaves], LeaveMth
from EmployeeInfo info
join
(
    select
        Leave.EmpID, null as AdditionType, null as AdditionAmount, null as DeductionType, null as DeductionAmount,
        Leave.LeaveType, DATEDIFF(DAY, Leave.DateFrom, Leave.DateTo) as [#OfLeaves], DATEPART(MONTH, DateFrom) as LeaveMth
    from Leave
    where Approved = 1
    union all
    select Addition.EmpID, AdditionType, Amount, null, null, null, null, null
    from Addition
    union all
    select EmpID, null, null, DeductionType, Amount, null, null, null
    from Deduction
) payadj on payadj.EmpID = info.EmpID
This approach separates the different pay adjustments into different columns and also ensures that you don't get the doubling-up that occurs when the joins match multiple rows per employee ID.
You might need to explicitly name all the null columns for each union - I haven't tested it, but I believe you only need to name the columns in the first select of a union all.
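(A quick illustration of that last point - result column names in a union all come from the first select:)
select 1 as a, 2 as b
union all
select 3, 4;  -- the result columns are named a and b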
The output of the main query comes in the format below (abridged to a few columns):
employeename bonus leavetype
Lacey 2000 null
Lacey null Sick Leave
Lacey null Annual Leave
Rather than type out the full result set, here is a link to SQL Fiddle:
http://sqlfiddle.com/#!18/935e9/5/0
The problem you're facing comes from how you are joining the tables together. It's not that the syntax is necessarily wrong, but how we look at the data and how we understand the relationships between the tables.
When doing the LEFT JOINs, your query finds EmpIDs in each table, is happy with that, and grabs the records (or returns NULL if there are no records matching the EmpID). That isn't really what you're looking for, since it can join too much together. So let's see why this is happening. If we take out the join to the Addition table, your results would look like this:
Fay NULL NULL Casual Leave 4 1
Lacey NULL NULL Annual Leave 3 2
Lacey NULL NULL Sick Leave 4 2
Marcia Delayed Project Completion 5000 Annual Leave 2 1
Mohammad Late Deductions 2000 Annual Leave 4 1
You are still left with two rows for Lacey. The reason for these two rows is because of the join to the Leave table. Lacey has taken two leaves of absence. One for Sick Leave and the other for Annual Leave. Both of those records share the same EmpID of 2. So when you join to the Addition table (and/or to the rest of the tables) on EmpID the join looks for all matching records to complete that join. There's a single Addition record that matches two Leave records joined on EmpID. Thus, you end up with two Bonus results--the same Addition record for the two Leave records. Try running this query and check the results, it should also illustrate the problem:
SELECT l.LeaveType, l.EmpID, a.AdditionType, a.Amount
FROM Leave l
LEFT JOIN Addition a ON a.EmpID = l.EmpID
The results using your provided data would be:
Annual Leave 1 NULL NULL
Sick Leave 2 Bonus 2000
Casual Leave 3 NULL NULL
Annual Leave 4 NULL NULL
Casual Leave 3 NULL NULL
Annual Leave 2 Bonus 2000
So the data itself isn't wrong. It's just that when joining on EmpID in this way the relationships may be confusing.
So the problem is the relationship between the Leave table and the others. It doesn't make sense to join Leave to the Addition or Deduction tables directly on EmpID because it may look as though Lacey received a bonus for each leave of absence for example. This is what you are experiencing here.
I would suggest three separate queries (and potentially three reports). One to return the leave of absence data and the others for the Addition and Deduction data. Something like:
--Return each employee's leaves of absence
SELECT e.EmployeeName
, l.LeaveType
, SUM(DATEDIFF(Day, l.DateFrom, l.DateTo)) [#OfLeaves]
, DatePart(MONTH, l.DateFrom)
FROM EmployeeInfo e
LEFT JOIN Leave l ON e.EmpID = l.EmpID
WHERE l.Approved = 1
--Return each employee's Additions
SELECT e.EmployeeName
, a.AdditionType
, a.Amount
FROM EmployeeInfo e
LEFT JOIN Addition a ON e.EmpID = a.EmpID
--Return each employee's Deductions
SELECT e.EmployeeName
, d.DeductionType
, d.Amount
FROM EmployeeInfo e
LEFT JOIN Deduction d ON e.EmpID = d.EmpID
Having three queries should better represent the relationship the EmployeeInfo table has with each of the others and separate concerns. From there you can GROUP BY the different types of data and aggregate the values and get total counts and sums.
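For example, a sketch of aggregating the Addition amounts per employee (note that Amount is varchar in the schema above, hence the cast):
SELECT e.EmployeeName,
       SUM(CAST(a.Amount AS decimal(10, 2))) AS TotalAdditions
FROM EmployeeInfo e
LEFT JOIN Addition a ON e.EmpID = a.EmpID
GROUP BY e.EmployeeName;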
Here are some resources which may help if you hadn't found these already:
Explanation of SQL Joins: https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
SQL Join Examples: https://www.w3schools.com/sql/sql_join.asp
Telerik Reporting Documentation: https://docs.telerik.com/reporting/overview

different estimated rows on same index operation?

Introduction and Background
I had to optimize a simple query (example below). After rewriting it several times I noticed that the estimated row count on one and the same index operation differs depending on the way the query is written.
Originally the query did a clustered index scan; since the table in production contains a binary column, the table is quite large (about 100 GB) and a full table scan takes too long to execute.
Question
Why is the estimated row count different on the same index operation (the example below will show this)? What is the optimizer doing here?
The example database - I am using SQL Server 2008 R2.
I tried to create a very simplified version of my production tables that shows the behaviour.
-- CREATE THE SAMPLE TABLES
----------------------------
CREATE TABLE dbo.MasterTable(
MasterId smallint NOT NULL,
Name varchar(5) NOT NULL,
CONSTRAINT PK_MasterTable PRIMARY KEY CLUSTERED (MasterId ASC)
) ON [PRIMARY]
GO
CREATE TABLE dbo.DetailTable(
DetailId bigint IDENTITY(1,1) NOT NULL,
MasterId smallint NOT NULL,
Name nvarchar(50) NOT NULL,
CreateDate datetime NOT NULL,
CONSTRAINT PK_DetailTable PRIMARY KEY CLUSTERED (DetailId ASC)
) ON [PRIMARY]
GO
ALTER TABLE dbo.DetailTable
ADD CONSTRAINT FK1
FOREIGN KEY(MasterId) REFERENCES dbo.MasterTable (MasterId)
GO
CREATE NONCLUSTERED INDEX IX_DetailTable
ON dbo.DetailTable( MasterId ASC, Name ASC )
GO
-- INSERT SOME SAMPLE DATA
----------------------------
SET NOCOUNT ON
GO
-- These are some Codes. In our system we always use these codes to search for "types" of data.
INSERT INTO dbo.MasterTable (MasterId, Name)
VALUES (1, 'N1'), (2, 'N2'), (3, 'N3'), (4, 'N4'), (5, 'N5'), (6, 'N6'), (7, 'N7'), (8, 'N8')
GO
-- ADD ROWS TO THE DETAIL TABLE
-- Takes about 1 minute to run
-- Don't care about the logic, it's just to get a distribution similar to production system
----------------------------
declare #x int = 1
DECLARE #MasterID INT
while (#x <= 400000)
begin
SET #MasterID = ABS(CHECKSUM(NEWID())) % 8 + 1
INSERT INTO dbo.DetailTable(MasterId,Name,CreateDate)
VALUES(
CASE
WHEN #MasterID IN (1, 3, 4) AND #x % 20 != 0 THEN 2
WHEN #MasterID IN (5, 6) AND #x % 20 != 0 THEN 7
WHEN #MasterID = 8 AND #x % 100 != 0 THEN 7
ELSE #MasterID
END,
NEWID(),
DATEADD(DAY, - ABS(CHECKSUM(NEWID())) % 1000, GETDATE())
)
SET #x = #x + 1
end
go
-- DO THE INDEX AND STATISTIC MAINTENANCE
----------------------------
alter index all on dbo.DetailTable reorganize
alter index all on dbo.MasterTable reorganize
update statistics dbo.DetailTable WITH FULLSCAN
update statistics dbo.MasterTable WITH FULLSCAN
go
Preparation is done; let's start with the queries.
Let's have a look at the statistics first. Look at RANGE_HI_KEY = 8: there are 489 EQ_ROWS.
-- CHECK THE STATISTICS
----------------------------
dbcc show_statistics ('dbo.DetailTable', IX_DetailTable)
GO
Now we run the queries. The first one is the original query I had to optimize.
Please include the actual execution plan when executing.
Have a look at the operation "index seek (nonclustered) [DetailTable].[IX_DetailTable]".
-- ORIGINAL QUERY
----------------------------
SELECT d.DetailId
FROM dbo.DetailTable d
INNER JOIN dbo.MasterTable m ON d.MasterId = m.MasterId
WHERE m.Name = 'N8'
AND d.CreateDate > '20150312 11:00:00'
GO
-- FORCESEEK
----------------------------
SELECT d.DetailId
FROM dbo.DetailTable d WITH (FORCESEEK)
INNER JOIN dbo.MasterTable m ON d.MasterId = m.MasterId
WHERE m.Name = 'N8'
AND d.CreateDate > '20150312 11:00:00'
GO
-- Actual: 489, Estimated: 50,000
-- TABLE VARIABLE
----------------------------
DECLARE #MasterId AS TABLE( MasterId SMALLINT )
INSERT INTO #MasterId (MasterId)
SELECT MasterID FROM dbo.MasterTable WHERE Name = 'N8'
SELECT d.DetailId
FROM dbo.DetailTable d WITH (FORCESEEK)
INNER JOIN #MasterId m ON d.MasterId = m.MasterId
WHERE d.CreateDate > '20150312 11:00:00'
GO
-- Actual: 489, Estimated: 40,000
-- TEMP TABLE
----------------------------
CREATE TABLE #MasterId( MasterId SMALLINT )
INSERT INTO #MasterId (MasterId)
SELECT MasterID FROM dbo.MasterTable WHERE Name = 'N8'
SELECT d.DetailId
FROM dbo.DetailTable d --WITH (FORCESEEK)
INNER JOIN #MasterId m ON d.MasterId = m.MasterId
WHERE d.CreateDate > '20150312 11:00:00'
-- Actual: 489, Estimated: 489
DROP TABLE #MasterId
GO
Analysis and final question(s)
Please have a look at the operation "index seek (nonclustered) [DetailTable].[IX_DetailTable]"
The comments in the script above show you the values I got for estimated and actual row count.
In our production environment this table has 33 million rows; the estimated rows for the queries above range from 3 million to 16 million.
To summarize:
when a join between the DetailTable and the MasterTable is made, the estimated row count is 12.5% (there are 8 values in the master table, so it makes sense, kind of...)
when a join between the DetailTable and the table variable is made, the estimated rowcount is 10%
when a join between the DetailTable and the temp table is made, the estimated rowcount is exactly the same as the actual row count
The question is why do these values differ?
The statistics are up to date and making an estimation should really be easy.
I just would like to understand this.
Since nobody has answered, I'll try to give an answer.
First of all: please don't force the optimizer to follow you.
(1) Explanation of your original query:
SELECT d.DetailId
FROM dbo.DetailTable d
INNER JOIN dbo.MasterTable m ON d.MasterId = m.MasterId
WHERE m.Name = 'N8'
AND d.CreateDate > '20150312 11:00:00'
Why is this query slow?
This query is slow because your indexes do not cover it; both tables are read with an index scan and then joined with a hash join.
Why is the entire MasterTable scanned? Because the index on the master table is on the MasterId column, not on the Name column.
Why is the entire DetailTable scanned? Because here as well the indexes are (DetailId) clustered and (MasterId ASC, Name ASC) nonclustered - neither leads on the CreateDate column.
One nonclustered index on (CreateDate, MasterId) would help this particular query.
If your master table is huge as well, you can also create a nonclustered index on the (Name) column.
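A sketch of those suggested indexes (the names are mine):
-- Leads on CreateDate for this particular query; DetailId is carried
-- along automatically because it is the clustered key.
CREATE NONCLUSTERED INDEX IX_DetailTable_CreateDate
ON dbo.DetailTable (CreateDate, MasterId);
-- Only worthwhile if the master table is large.
CREATE NONCLUSTERED INDEX IX_MasterTable_Name
ON dbo.MasterTable (Name);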
(2) Explanation of FORCESEEK:
-- FORCESEEK
SELECT d.DetailId
FROM dbo.DetailTable d WITH (FORCESEEK)
INNER JOIN dbo.MasterTable m ON d.MasterId = m.MasterId
WHERE m.Name = 'N8'
AND d.CreateDate > '20150312 11:00:00'
GO
Why did the optimizer estimate 50,000 rows?
Here you are joining on d.MasterId = m.MasterId and you are FORCING the optimizer to choose a seek on the detail table, so the optimizer uses the index IX_DetailTable to join to your MasterTable with a loop join.
Since the optimizer chooses a loop join to join all rows (actually one) of the master table to the detail table, it takes one key from the master table, seeks into the index, and passes the matching rows on to the next iterator.
For the estimate it therefore uses the average number of rows per value:
8 unique values in the column and 400,000 rows in the table, so 400,000 / 8 = 50,000 estimated rows (fair enough).
(3) Explanation of the TABLE VARIABLE:
Here is your query:
DECLARE #MasterId AS TABLE( MasterId SMALLINT )
INSERT INTO #MasterId (MasterId)
SELECT MasterID FROM dbo.MasterTable WHERE Name = 'N8'
SELECT d.DetailId
FROM dbo.DetailTable d WITH (FORCESEEK)
INNER JOIN #MasterId m ON d.MasterId = m.MasterId
WHERE d.CreateDate > '20150312 11:00:00'
GO
Statistics are not maintained on table variables, so the optimizer has no idea how many rows it is going to deal with when producing a plan; it estimates 1 row for the table variable (and here the actual row count is indeed 1 as well).
But how did the optimizer estimate 40,000 rows? Personally I had never checked this; because of this question I ran several tests, but I have no idea how the optimizer calculates this estimate, so it would be great if someone could come along and enlighten us.
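For what it's worth, 40,000 is exactly 10% of the 400,000 rows in DetailTable, which matches the 10% figure observed in the question; that looks like one of the optimizer's fixed selectivity guesses for joins without usable statistics, but I can't confirm the exact rule.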
(4) Explanation of the TEMP TABLE:
Here is your query:
CREATE TABLE #MasterId( MasterId SMALLINT )
INSERT INTO #MasterId (MasterId)
SELECT MasterID FROM dbo.MasterTable WHERE Name = 'N8'
SELECT d.DetailId
FROM dbo.DetailTable d --WITH (FORCESEEK)
INNER JOIN #MasterId m ON d.MasterId = m.MasterId
WHERE d.CreateDate > '20150312 11:00:00'
-- Actual 489, Estimated 489
DROP TABLE #MasterId
Here the optimizer chooses the same query plan as it did for the table variable, but the difference is that statistics are maintained on temp tables, so the optimizer has a fair idea which rows it is actually going to join.
The 'N8' key corresponds to MasterId 8, and the estimated number of rows for value 8 in dbo.DetailTable is 489.

loop through values and update after each one completes

I have the following code that I need to run for 350 locations. It takes an hour to do 5 locations, so I run 5 at a time by using where location_code in ('0001', '0002', '0003', '0005', '0006'). I would like to create a temp table with 2 columns, one location_id and the other completed, loop through each value in the location_id column, and update the completed column with a date and time stamp as each one completes, committing after each. That way I can just let it run, and if I need to kill it I can see the last completed location_id and know where to restart the process from - or, better yet, have it check for a value in the completed column and, if one exists, go to the next.
--Collecting all records containing remnant cost. You will need to specify the location number(s). In the example below we're using locations 0001, 0002, 0003, 0005 and 0006
select sku_id, ib.location_id, price_status_id, inventory_status_id, sum(transaction_units) as units, sum(transaction_cost) as cost,
sum(transaction_valuation_retail) as val_retail, sum(transaction_selling_retail) as sell_retail
into #remnant_cost
from ib_inventory ib
inner join location l on l.location_id = ib.location_id
where location_code in ('0001', '0002', '0003', '0005', '0006')
group by sku_id, ib.location_id, price_status_id, inventory_status_id
having sum(transaction_units) = 0
and sum(transaction_cost) <> 0
--Verify the total remnant cost.
select location_id, sum(units) as units, sum(cost) as cost, sum(val_retail) as val_retail, sum(sell_retail) as sell_retail
from #remnant_cost
group by location_id
select *
from #remnant_cost
----------------------------------------------------Run above this line first and gather results--------------------------------
--inserting into a temp table the cost negation using transaction_type_code 500 (Actual shrink) before inserting into ib_inventory
--corrected query adding transaction date as column heading (Marshall)
select
sku_id, location_id, price_status_id, convert(smalldatetime,convert(varchar(50),getdate(),101)) as transaction_date, 500 as transaction_type_code, inventory_status_id, NULL as other_location_id,
NULL as transaction_reason_id, 999999 as document_number, 0 as transaction_units, cost * -1 as transaction_cost, 0 as transaction_valuation_retail,
0 as transaction_selling_retail,NULL as price_change_type, NULL as units_affected
into #rem_fix
from #remnant_cost
--Validating to make sure cost will have the exact opposite to negate.
select location_id, sum(transaction_units) as units, sum(transaction_cost) as cost, sum(transaction_valuation_retail) as val_retail,
sum(transaction_selling_retail) as sell_retail
from #rem_fix
group by location_id
BEGIN TRAN
EXEC inventory_update_$sp 'SELECT sku_id,location_id,price_status_id,transaction_date,transaction_type_code,inventory_status_id,other_location_id,
transaction_reason_id,document_number,transaction_units,transaction_cost,transaction_valuation_retail,transaction_selling_retail,price_change_type,
units_affected FROM #rem_fix'
COMMIT
Making some assumptions about your schema:
-- A working table to track progress that will stick around.
create table dbo.Location_Codes
( Location_Code VarChar(4), Started DateTime NULL, Completed DateTime NULL );
Then break up the work this way:
if not exists ( select 42 from dbo.Location_Codes where Completed is NULL )
begin
-- All of the locations have been processed (or this is the first time through).
delete from dbo.Location_Codes;
-- Get all of the location codes.
insert into dbo.Location_Codes
select Location_Code, NULL, NULL
from Location;
end
-- Temporary table to make things easier.
declare #Pass_Location_Codes as Table ( Location_Code VarChar(4) );
-- Loop until all locations have been processed.
while exists ( select 42 from dbo.Location_Codes where Completed is NULL )
begin
-- Get the next five locations for which processing has not completed.
delete from #Pass_Location_Codes;
insert into #Pass_Location_Codes
select top 5 Location_Code
from dbo.Location_Codes
where Completed is NULL
order by Location_Code;
-- Record the start date/time.
update dbo.Location_Codes
set Started = GetDate()
where Location_Code in ( select Location_Code from #Pass_Location_Codes );
-- Run the big query.
select ...
where Location_Code in ( select Location_Code from #Pass_Location_Codes )
...
-- Record the completion date/time.
update dbo.Location_Codes
set Completed = GetDate()
where Location_Code in ( select Location_Code from #Pass_Location_Codes );
end
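Because Completed is only set after the big query finishes, a killed run can be restarted by simply re-running the script. To see where it stopped, something like this works (a convenience query against the same working table):
-- Locations not yet finished, in processing order.
select top 5 Location_Code, Started, Completed
from dbo.Location_Codes
where Completed is NULL
order by Location_Code;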

Resources