Matching in 2 Waves. Second "Wave" is based on First "Wave" - sql-server
I have a "matching" scenario where I need to match records from a table.
I've altered my situation to use the Northwind database .. for illustration purposes.
Given a "set" of data (put in my #holder table below), I need to find matches based on the following criteria.
If both lastname and firstname match, match on TWO or more of the following: (city-state-together OR zip), phone, extension
If only one of lastname OR firstname matches, match on THREE or more of the following: (city-state-together OR zip), phone, extension
Note that "city-state-together OR zip" means that I need to match on the combination of city-and-state, or on zip. If all three match (city, state and zip), that should still only count as "1" in the "(ColumnCityStateZipEnum + ColumnHomePhoneEnum + ColumnExtensionEnum)" calculation.
I've come up with the below. But I have 7 left joins.
Is there another way to do this kind of problem in SQL?
Use Northwind /* Or NorthwindPartial */
GO
create table #holder ( holderidentitykey int identity (1,1), lastname varchar(32) , firstname varchar(48) , city varchar(32) , stateabbr varchar(32) , zip varchar(5) , homephone varchar(16) , extension varchar(8) )
insert into #holder ( lastname , firstname , city , stateabbr, zip, homephone , extension )
select null , null, null, null, null , null, null
union all select 'Davolio' , 'Nancy', null, null, '98122' , '(206) 555-9857', null /* should 'match'. lastname, firstname and TWO of the other data-elements */
union all select 'Davolio' , null, null, null, null , null, null
union all select 'Fuller' , 'Andrew', 'Tacoma', 'WA', null , null, null
union all select 'Peacock' , 'MaggyNotAMatchNoPhone', 'Redmond', 'WA', '98052' , null, null
union all select 'Peacock' , 'MaggyNotAMatchWithPhoneAndExtension', 'Redmond', 'WA', '98052' , '(206) 555-8122', '5176' /* should 'match'. lastname and THREE of the other data-elements */
/*
If both lastname and firstname match, match on TWO or more of the following : (city-state-together OR zip) , phone, extension
If one of either lastname OR firstname match, match on THREE or more of the following : (city-state-together OR zip) , phone, extension
*/
select distinct * from
(
select
holderidentitykey,
ColumnLastNameFirstNameEnum =
case
when h.lastname = eLastName.LastName and h.firstname = eFirstName.FirstName then 2
when h.lastname = eLastName.LastName then 1
when h.firstname = eFirstName.FirstName then 1
else 0
end
,
ColumnCityStateZipEnum =
case
when h.zip = eZip.PostalCode then 1
when h.city = eCity.City and h.stateabbr = eState.Region then 1
else 0
end
,
ColumnHomePhoneEnum =
case
when h.homephone = eHomePhone.HomePhone then 1
else 0
end
,
ColumnExtensionEnum =
case
when h.extension = eExtension.Extension then 1
else 0
end
, eLastName.LastName , eFirstName.FirstName, eZip.PostalCode, eCity.City, eState.Region, eHomePhone.HomePhone, eExtension.Extension
from
#holder h
left join dbo.Employees eLastName on h.lastname = eLastName.LastName
left join dbo.Employees eFirstName on h.firstname = eFirstName.FirstName
left join dbo.Employees eZip on h.zip = eZip.PostalCode
left join dbo.Employees eCity on h.city = eCity.City
left join dbo.Employees eState on h.stateabbr = eState.Region
left join dbo.Employees eHomePhone on h.homephone = eHomePhone.HomePhone
left join dbo.Employees eExtension on h.extension = eExtension.Extension
) as derived1
where
derived1.ColumnLastNameFirstNameEnum >= 2 and (ColumnCityStateZipEnum + ColumnHomePhoneEnum + ColumnExtensionEnum) >= 2
OR
derived1.ColumnLastNameFirstNameEnum >= 1 and (ColumnCityStateZipEnum + ColumnHomePhoneEnum + ColumnExtensionEnum) >= 3
-- select * from dbo.Employees e
Here is a "partial" Northwind creation if you don't have one handy.
SET NOCOUNT ON
GO
USE master
GO
if exists (select * from sysdatabases where name='NorthwindPartial')
drop database NorthwindPartial
go
DECLARE @device_directory NVARCHAR(520)
SELECT @device_directory = SUBSTRING(filename, 1, CHARINDEX(N'master.mdf', LOWER(filename)) - 1)
FROM master.dbo.sysaltfiles WHERE dbid = 1 AND fileid = 1
EXECUTE (N'CREATE DATABASE NorthwindPartial
ON PRIMARY (NAME = N''NorthwindPartial'', FILENAME = N''' + @device_directory + N'northwndPartial.mdf'')
LOG ON (NAME = N''NorthwindPartial_log'', FILENAME = N''' + @device_directory + N'northwndPartial.ldf'')')
go
GO
set quoted_identifier on
GO
/* Set DATEFORMAT so that the date strings are interpreted correctly regardless of
the default DATEFORMAT on the server.
*/
SET DATEFORMAT mdy
GO
use "NorthwindPartial"
go
if exists (select * from sysobjects where id = object_id('dbo.Employees') and sysstat & 0xf = 3)
drop table "dbo"."Employees"
GO
CREATE TABLE "Employees" (
"EmployeeID" "int" IDENTITY (1, 1) NOT NULL ,
"LastName" nvarchar (20) NOT NULL ,
"FirstName" nvarchar (10) NOT NULL ,
"Title" nvarchar (30) NULL ,
"TitleOfCourtesy" nvarchar (25) NULL ,
"BirthDate" "datetime" NULL ,
"HireDate" "datetime" NULL ,
"Address" nvarchar (60) NULL ,
"City" nvarchar (15) NULL ,
"Region" nvarchar (15) NULL ,
"PostalCode" nvarchar (10) NULL ,
"Country" nvarchar (15) NULL ,
"HomePhone" nvarchar (24) NULL ,
"Extension" nvarchar (4) NULL ,
"Photo" "image" NULL ,
"Notes" "ntext" NULL ,
"ReportsTo" "int" NULL ,
"PhotoPath" nvarchar (255) NULL ,
CONSTRAINT "PK_Employees" PRIMARY KEY CLUSTERED
(
"EmployeeID"
),
CONSTRAINT "FK_Employees_Employees" FOREIGN KEY
(
"ReportsTo"
) REFERENCES "dbo"."Employees" (
"EmployeeID"
),
CONSTRAINT "CK_Birthdate" CHECK (BirthDate < getdate())
)
GO
CREATE INDEX "LastName" ON "dbo"."Employees"("LastName")
GO
CREATE INDEX "PostalCode" ON "dbo"."Employees"("PostalCode")
GO
set quoted_identifier on
go
set identity_insert "Employees" on
go
ALTER TABLE "Employees" NOCHECK CONSTRAINT ALL
go
INSERT "Employees"("EmployeeID","LastName","FirstName","Title","TitleOfCourtesy","BirthDate","HireDate","Address","City","Region","PostalCode","Country","HomePhone","Extension","Photo","Notes","ReportsTo","PhotoPath") VALUES(1,'Davolio','Nancy','Sales Representative','Ms.','12/08/1948','05/01/1992','507 - 20th Ave. E.
Apt. 2A','Seattle','WA','98122','USA','(206) 555-9857','5467',null,'Education includes a BA in psychology from Colorado State University in 1970. She also completed "The Art of the Cold Call." Nancy is a member of Toastmasters International.',2,'http://accweb/emmployees/davolio.bmp')
GO
INSERT "Employees"("EmployeeID","LastName","FirstName","Title","TitleOfCourtesy","BirthDate","HireDate","Address","City","Region","PostalCode","Country","HomePhone","Extension","Photo","Notes","ReportsTo","PhotoPath") VALUES(2,'Fuller','Andrew','Vice President, Sales','Dr.','02/19/1952','08/14/1992','908 W. Capital Way','Tacoma','WA','98401','USA','(206) 555-9482','3457',null,'Andrew received his BTS commercial in 1974 and a Ph.D. in international marketing from the University of Dallas in 1981. He is fluent in French and Italian and reads German. He joined the company as a sales representative, was promoted to sales manager in January 1992 and to vice president of sales in March 1993. Andrew is a member of the Sales Management Roundtable, the Seattle Chamber of Commerce, and the Pacific Rim Importers Association.',NULL,'http://accweb/emmployees/fuller.bmp')
GO
INSERT "Employees"("EmployeeID","LastName","FirstName","Title","TitleOfCourtesy","BirthDate","HireDate","Address","City","Region","PostalCode","Country","HomePhone","Extension","Photo","Notes","ReportsTo","PhotoPath") VALUES(3,'Leverling','Janet','Sales Representative','Ms.','08/30/1963','04/01/1992','722 Moss Bay Blvd.','Kirkland','WA','98033','USA','(206) 555-3412','3355',null,'Janet has a BS degree in chemistry from Boston College (1984). She has also completed a certificate program in food retailing management. Janet was hired as a sales associate in 1991 and promoted to sales representative in February 1992.',2,'http://accweb/emmployees/leverling.bmp')
GO
INSERT "Employees"("EmployeeID","LastName","FirstName","Title","TitleOfCourtesy","BirthDate","HireDate","Address","City","Region","PostalCode","Country","HomePhone","Extension","Photo","Notes","ReportsTo","PhotoPath") VALUES(4,'Peacock','Margaret','Sales Representative','Mrs.','09/19/1937','05/03/1993','4110 Old Redmond Rd.','Redmond','WA','98052','USA','(206) 555-8122','5176',null,'Margaret holds a BA in English literature from Concordia College (1958) and an MA from the American Institute of Culinary Arts (1966). She was assigned to the London office temporarily from July through November 1992.',2,'http://accweb/emmployees/peacock.bmp')
GO
INSERT "Employees"("EmployeeID","LastName","FirstName","Title","TitleOfCourtesy","BirthDate","HireDate","Address","City","Region","PostalCode","Country","HomePhone","Extension","Photo","Notes","ReportsTo","PhotoPath") VALUES(5,'Buchanan','Steven','Sales Manager','Mr.','03/04/1955','10/17/1993','14 Garrett Hill','London',NULL,'SW1 8JR','UK','(71) 555-4848','3453',null,'Steven Buchanan graduated from St. Andrews University, Scotland, with a BSC degree in 1976. Upon joining the company as a sales representative in 1992, he spent 6 months in an orientation program at the Seattle office and then returned to his permanent post in London. He was promoted to sales manager in March 1993. Mr. Buchanan has completed the courses "Successful Telemarketing" and "International Sales Management." He is fluent in French.',2,'http://accweb/emmployees/buchanan.bmp')
GO
INSERT "Employees"("EmployeeID","LastName","FirstName","Title","TitleOfCourtesy","BirthDate","HireDate","Address","City","Region","PostalCode","Country","HomePhone","Extension","Photo","Notes","ReportsTo","PhotoPath") VALUES(6,'Suyama','Michael','Sales Representative','Mr.','07/02/1963','10/17/1993','Coventry House
Miner Rd.','London',NULL,'EC2 7JR','UK','(71) 555-7773','428',null,'Michael is a graduate of Sussex University (MA, economics, 1983) and the University of California at Los Angeles (MBA, marketing, 1986). He has also taken the courses "Multi-Cultural Selling" and "Time Management for the Sales Professional." He is fluent in Japanese and can read and write French, Portuguese, and Spanish.',5,'http://accweb/emmployees/davolio.bmp')
GO
INSERT "Employees"("EmployeeID","LastName","FirstName","Title","TitleOfCourtesy","BirthDate","HireDate","Address","City","Region","PostalCode","Country","HomePhone","Extension","Photo","Notes","ReportsTo","PhotoPath") VALUES(7,'King','Robert','Sales Representative','Mr.','05/29/1960','01/02/1994','Edgeham Hollow
Winchester Way','London',NULL,'RG1 9SP','UK','(71) 555-5598','465',null,'Robert King served in the Peace Corps and traveled extensively before completing his degree in English at the University of Michigan in 1992, the year he joined the company. After completing a course entitled "Selling in Europe," he was transferred to the London office in March 1993.',5,'http://accweb/emmployees/davolio.bmp')
GO
INSERT "Employees"("EmployeeID","LastName","FirstName","Title","TitleOfCourtesy","BirthDate","HireDate","Address","City","Region","PostalCode","Country","HomePhone","Extension","Photo","Notes","ReportsTo","PhotoPath") VALUES(8,'Callahan','Laura','Inside Sales Coordinator','Ms.','01/09/1958','03/05/1994','4726 - 11th Ave. N.E.','Seattle','WA','98105','USA','(206) 555-1189','2344',null,'Laura received a BA in psychology from the University of Washington. She has also completed a course in business French. She reads and writes French.',2,'http://accweb/emmployees/davolio.bmp')
GO
INSERT "Employees"("EmployeeID","LastName","FirstName","Title","TitleOfCourtesy","BirthDate","HireDate","Address","City","Region","PostalCode","Country","HomePhone","Extension","Photo","Notes","ReportsTo","PhotoPath") VALUES(9,'Dodsworth','Anne','Sales Representative','Ms.','01/27/1966','11/15/1994','7 Houndstooth Rd.','London',NULL,'WG2 7LT','UK','(71) 555-4444','452',null,'Anne has a BA degree in English from St. Lawrence College. She is fluent in French and German.',5,'http://accweb/emmployees/davolio.bmp')
go
set identity_insert "Employees" off
go
ALTER TABLE "Employees" CHECK CONSTRAINT ALL
go
set quoted_identifier on
go
Select * from "Employees"
It's probably helpful to analyse your match rule a little. If we break it down, we can see that the non-negotiable condition for a match is that either the FirstName OR the LastName matches. So let's build a query where we join only those rows from the Employees table:
...
FROM #holder As h
JOIN Employees As e
ON h.FirstName = e.FirstName
OR h.LastName = e.LastName
...
Now that we're only looking at rows which meet the minimum criteria, we can assess the others. Basically your rule says that if either FirstName or LastName match, then we need a minimum of three of the following (let's assume that we matched FirstName):
Match LastName
Match City AND State, OR PostalCode
Match HomePhone
Match Extension
You present different rules depending on whether both FirstName and LastName match, but as long as at least one of those two matches, the rules are mathematically equivalent to a single threshold: the number of name matches plus the number of other matches must be at least four.
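The equivalence can be checked by brute force over every combination of counts. The following is a minimal sketch, not part of the original answer: the column names NameMatches, OtherMatches, OriginalRule and SumRule are made up for illustration, and the VALUES table constructor assumes SQL Server 2008 or later.

```sql
-- NameMatches: 1 or 2 (the join guarantees at least one name matches).
-- OtherMatches: 0..3 for (city+state OR zip), phone, extension.
SELECT n.NameMatches,
       k.OtherMatches,
       -- The two-part rule as stated in the question
       CASE WHEN (n.NameMatches >= 2 AND k.OtherMatches >= 2)
              OR (n.NameMatches >= 1 AND k.OtherMatches >= 3)
            THEN 1 ELSE 0 END AS OriginalRule,
       -- The single threshold on the total number of matches
       CASE WHEN n.NameMatches + k.OtherMatches >= 4
            THEN 1 ELSE 0 END AS SumRule
FROM (VALUES (1), (2)) AS n (NameMatches)
CROSS JOIN (VALUES (0), (1), (2), (3)) AS k (OtherMatches)
ORDER BY n.NameMatches, k.OtherMatches;
```

OriginalRule and SumRule agree on all eight rows; both are 1 only for (1, 3), (2, 2) and (2, 3).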
So we can take our potential match rows and just count how many of those matching attributes there are, and filter out rows where there aren't enough.
Select h.holderidentitykey, e.*
From #holder As h
Join Employees As e
On h.FirstName = e.FirstName
Or h.lastname = e.LastName
Where iif(h.firstname = e.firstname, 1, 0) +
iif(h.lastname = e.LastName, 1, 0) +
iif((h.city = e.City AND h.stateabbr = e.Region) OR h.zip = e.PostalCode, 1, 0) +
iif(h.homephone = e.HomePhone, 1, 0) +
iif(h.extension = e.Extension, 1, 0) >= 4;
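Please note that this approach may not scale well if you have large tables (1M+ rows) and want to match often, partly because an OR in the join condition can make it hard for the optimizer to use indexes efficiently. If and when those situations occur, you could look at refactoring.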
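When debugging which rows pass or fail, it can help to expose the score as a column rather than burying it in the WHERE clause. This is a minimal sketch, assuming the #holder table and dbo.Employees from the question. The alias s and the column name MatchScore are illustrative only, and CASE is used in place of IIF so that it should also run on versions before SQL Server 2012.

```sql
SELECT h.holderidentitykey,
       e.EmployeeID,
       s.MatchScore
FROM #holder AS h
JOIN dbo.Employees AS e
  ON h.firstname = e.FirstName
  OR h.lastname = e.LastName
CROSS APPLY (
    -- A comparison against NULL is never true, so it falls through to ELSE 0.
    SELECT CASE WHEN h.firstname = e.FirstName THEN 1 ELSE 0 END
         + CASE WHEN h.lastname = e.LastName THEN 1 ELSE 0 END
         -- City+state or zip counts once, even when all three match.
         + CASE WHEN (h.city = e.City AND h.stateabbr = e.Region)
                  OR h.zip = e.PostalCode THEN 1 ELSE 0 END
         + CASE WHEN h.homephone = e.HomePhone THEN 1 ELSE 0 END
         + CASE WHEN h.extension = e.Extension THEN 1 ELSE 0 END AS MatchScore
) AS s
WHERE s.MatchScore >= 4;
```

Dropping the WHERE clause temporarily shows the score of every candidate pair, which makes near misses visible.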
Related
oracle database, insert data
I'm using Oracle 11g.
Table:
create or replace type address as object (
street varchar2(20),
city varchar2(10),
p_code varchar2(8)
) not final;
/
create or replace type name as object (
title varchar2(5),
firstName varchar2(8),
surname varchar2(8)
) not final;
/
create or replace type phone as object (
homePhone int,
mobile1 int,
mobile2 int
) not final;
/
create or replace type person as object (
pname name,
pAddress address,
Pphone phone
) not final;
/
create or replace type employee under person (
empId varchar2(5),
position varchar2(16),
salary int,
joinDate date,
supervisor ref employee);
/
create table tb_employee of employee (
primary key(empID)
)
/
Data I insert:
insert into tb_employee values (
person(name('mr','jone','smith'),address('street','city','post code'),phone('11111111111','22222222222','33333333333')),
position('head'),
salary(1111),
joinDate(12-Feb-1994),
empID('001')
)
insert into tb_employee values (
person(name('mr','jane','smith'),address('street','city','post code'),phone('11111111111','22222222222','33333333333')),
position('accountant'),
salary(2222,
joinDate(13-Feb-1995),
empID('002')
)
insert into tb_employee values (
person(name('miss','ross','smith'),address('street','city','post code'),phone('11111111111','22222222222','33333333333')),
position(manager),
salary(333),
joinDate(14-Feb-1996),
empID('003')
)
I would like to populate the supervisor attribute using a reference. For example, the head (Mr jone smith) is the supervisor of the manager (Miss ross smith), and the manager (Miss ross smith) is the supervisor of the accountant (Mr jane smith). Thanks!
You are inserting records of employee type: that applies to the whole record, so you need to write a VALUES clause which matches the projection of that type. To populate the REF attribute you need to select the reference of the pertinent object. Your first record doesn't have a supervisor, so we pass NULL in this case:
insert into tb_employee values (
employee(
name('mr','jone','smith')
, address('street','city','postcode')
, phone('11111111111','22222222222','33333333333')
, '001' -- emp id
, 'head' -- position
, 11111 -- salary
, to_date('12-Feb-1994','dd-mon-yyyy') -- joinDate
, null -- supervisor
))
/
For the other records we use the INSERT ... SELECT ... FROM syntax:
insert into tb_employee
select employee(
name('mr','jane','smith')
, address('street','city','postcode')
, phone('11111111111','22222222222','33333333333')
, '002' -- emp id
, 'accountant' -- position
, 2222 -- salary
, to_date('13-Feb-1995','dd-mon-yyyy') -- joinDate
, ref (m) -- supervisor
)
from tb_employee m
where m.empid = '001'
/
insert into tb_employee
select employee(
name('miss','ross','smith')
, address('street','city','postcode')
, phone('11111111111','22222222222','33333333333')
, '003' -- emp id
, 'manager' -- position
, 333 -- salary
, to_date('14-Feb-1996','dd-mon-yyyy') -- joinDate
, ref (m) -- supervisor
)
from tb_employee m
where m.empid = '002'
/
Here is an Oracle LiveSQL demo (free OTN account required). (It's a shame that Oracle's developer Cloud can't handle user-defined types nicely.)
Formatting data in sql
I have a few tables and am working on Telerik reports. The structure and the sample data I have are given below:
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('Leave'))
BEGIN;
DROP TABLE [Leave];
END;
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('Addition'))
BEGIN;
DROP TABLE [Addition];
END;
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('Deduction'))
BEGIN;
DROP TABLE [Deduction];
END;
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('EmployeeInfo'))
BEGIN;
DROP TABLE [EmployeeInfo];
END;
GO
CREATE TABLE [EmployeeInfo] (
[EmpID] INT NOT NULL PRIMARY KEY,
[EmployeeName] VARCHAR(255)
);
CREATE TABLE [Addition] (
[AdditionID] INT NOT NULL PRIMARY KEY,
[AdditionType] VARCHAR(255),
[Amount] VARCHAR(255),
[EmpID] INT FOREIGN KEY REFERENCES EmployeeInfo(EmpID)
);
CREATE TABLE [Deduction] (
[DeductionID] INT NOT NULL PRIMARY KEY,
[DeductionType] VARCHAR(255),
[Amount] VARCHAR(255),
[EmpID] INT FOREIGN KEY REFERENCES EmployeeInfo(EmpID)
);
CREATE TABLE [Leave] (
[LeaveID] INT NOT NULL PRIMARY KEY,
[LeaveType] VARCHAR(255) NULL,
[DateFrom] VARCHAR(255),
[DateTo] VARCHAR(255),
[Approved] Binary,
[EmpID] INT FOREIGN KEY REFERENCES EmployeeInfo(EmpID)
);
GO
INSERT INTO EmployeeInfo([EmpID], [EmployeeName]) VALUES
(1, 'Marcia'),
(2, 'Lacey'),
(3, 'Fay'),
(4, 'Mohammad'),
(5, 'Mike')
INSERT INTO Addition([AdditionID], [AdditionType], [Amount], [EmpID]) VALUES
(1, 'Bonus', '2000', 2),
(2, 'Increment', '5000', 5)
INSERT INTO Deduction([DeductionID], [DeductionType], [Amount], [EmpID]) VALUES
(1, 'Late Deductions', '2000', 4),
(2, 'Delayed Project Completion', '5000', 1)
INSERT INTO Leave([LeaveID],[LeaveType],[DateFrom],[DateTo], [Approved], [EmpID]) VALUES
(1, 'Annual Leave','2018-01-08 04:52:03','2018-01-10 20:30:53', 1, 1),
(2, 'Sick Leave','2018-02-10 03:34:41','2018-02-14 04:52:14', 1, 2),
(3, 'Casual Leave','2018-01-04 11:06:18','2018-01-05 04:11:00', 1, 3),
(4, 'Annual Leave','2018-01-17 17:09:34','2018-01-21 14:30:44', 1, 4),
(5, 'Casual Leave','2018-01-09 23:31:16','2018-01-12 15:11:17', 1, 3),
(6, 'Annual Leave','2018-02-16 18:01:03','2018-02-19 17:16:04', 1, 2)
The query I am using to get the output is something like this:
SELECT Info.EmployeeName, Addition.AdditionType, Addition.Amount, Deduction.DeductionType, Deduction.Amount,
Leave.LeaveType,
SUM(DATEDIFF(Day, Leave.DateFrom, Leave.DateTo)) [#OfLeaves],
DatePart(MONTH, Leave.DateFrom)
FROM EmployeeInfo Info
LEFT JOIN Leave ON Info.EmpID = Leave.EmpID
LEFT JOIN Addition ON Info.EmpID = Addition.EmpID
LEFT JOIN Deduction ON Info.EmpID = Deduction.EmpID
WHERE Approved = 1
GROUP BY Info.EmployeeName, Addition.AdditionType, Addition.Amount, Deduction.DeductionType, Deduction.Amount,
Leave.LeaveType,
DatePart(MONTH, Leave.DateFrom)
I want output that I can show on the report, but because I'm using joins the data repeats on multiple rows for the same user, so it also appears multiple times on the report. The output I am getting is something like this:
Fay NULL NULL NULL NULL Casual Leave 4 1
Lacey Bonus 2000 NULL NULL Annual Leave 3 2
Lacey Bonus 2000 NULL NULL Sick Leave 4 2
Marcia NULL NULL Delayed Project Completion 5000 Annual Leave 2 1
Mohammad NULL NULL Late Deductions 2000 Annual Leave 4 1
What I want looks something like this:
Fay NULL NULL NULL NULL Casual Leave 4 1
Lacey Bonus 2000 NULL NULL Annual Leave 3 2
Lacey NULL NULL NULL NULL Sick Leave 4 2
Marcia NULL NULL Delayed Project Completion 5000 Annual Leave 2 1
Mohammad NULL NULL Late Deductions 2000 Annual Leave 4 1
As there was only one bonus and it was not allocated multiple times, it should appear only once. I am stuck on formatting the table layout in the report, so I am hoping for a hint on shaping the output in the query so that I don't have to do it there.
My recommendation in this case is to replace the left joins with a single derived table, in the following way:
select info.employeename, additiontype, additionamount, deductiontype, deductionamount, leavetype, #ofleaves, leavemth
from Employeeinfo info
join (
Select Leave.empid, null as additiontype, null as additionamount, null as deductiontype, null as deductionamount, leave.leavetype, DATEDIFF(Day, Leave.DateFrom, Leave.DateTo) [#OfLeaves], DatePart(MONTH, DateFrom) leavemth
from leave
where approved = 1
Union all
Select Addition.empid, additiontype, amount, null, null, null, null, null
From addition
Union all
Select empid, null, null, deductiontype, amount, null, null, null
From deduction
) payadj on payadj.empid = info.empid
This approach separates the different pay adjustments into different columns and also ensures that you don't get the double-ups where the joins repeat rows for the same employee ID. You might need to explicitly name all the null columns for each Union. I haven't tested it, but I thought you only need to name the columns in a union all once. The output comes in the format below:
employeename bonus leavetype
Lacey 2000 null
Lacey null Sick Leave
Lacey null Annual Leave
Rather than type out the full result set, here is a link to sqlfiddle: http://sqlfiddle.com/#!18/935e9/5/0
The problem you're facing comes from how you are joining the tables together. The syntax isn't necessarily wrong; the issue is how we look at the data and understand the relationships between the tables. With the LEFT JOINs, your query finds EmpIDs in each table and grabs the matching records (or returns NULL if no records match the EmpID). That isn't really what you're looking for, since it can join too much together. So let's see why this is happening. If we take out the join to the Addition table, your results would look like this:
Fay NULL NULL Casual Leave 4 1
Lacey NULL NULL Annual Leave 3 2
Lacey NULL NULL Sick Leave 4 2
Marcia Delayed Project Completion 5000 Annual Leave 2 1
Mohammad Late Deductions 2000 Annual Leave 4 1
You are still left with two rows for Lacey. The reason for these two rows is the join to the Leave table. Lacey has taken two leaves of absence: one Sick Leave and one Annual Leave. Both of those records share the same EmpID of 2. So when you join to the Addition table (and/or to the rest of the tables) on EmpID, the join looks for all matching records to complete that join. There's a single Addition record that matches two Leave records joined on EmpID. Thus, you end up with two Bonus results: the same Addition record for each of the two Leave records. Try running this query and check the results; it should also illustrate the problem:
SELECT l.LeaveType, l.EmpID, a.AdditionType, a.Amount
FROM Leave l
LEFT JOIN Addition a ON a.EmpID = l.EmpID
The results using your provided data would be:
Annual Leave 1 NULL NULL
Sick Leave 2 Bonus 2000
Casual Leave 3 NULL NULL
Annual Leave 4 NULL NULL
Casual Leave 3 NULL NULL
Annual Leave 2 Bonus 2000
So the data itself isn't wrong. It's just that when joining on EmpID in this way the relationships may be confusing. The problem is the relationship between the Leave table and the others.
It doesn't make sense to join Leave to the Addition or Deduction tables directly on EmpID, because it may look as though Lacey received a bonus for each leave of absence, for example. This is what you are experiencing here. I would suggest three separate queries (and potentially three reports): one to return the leave of absence data and the others for the Addition and Deduction data. Something like:
--Return each employee's leaves of absence
SELECT e.EmployeeName
, l.LeaveType
, SUM(DATEDIFF(Day, l.DateFrom, l.DateTo)) [#OfLeaves]
, DatePart(MONTH, l.DateFrom)
FROM EmployeeInfo e
LEFT JOIN Leave l ON e.EmpID = l.EmpID
WHERE l.Approved = 1
GROUP BY e.EmployeeName, l.LeaveType, DatePart(MONTH, l.DateFrom) --needed because of the SUM
--Return each employee's Additions
SELECT e.EmployeeName
, a.AdditionType
, a.Amount
FROM EmployeeInfo e
LEFT JOIN Addition a ON e.EmpID = a.EmpID
--Return each employee's Deductions
SELECT e.EmployeeName
, d.DeductionType
, d.Amount
FROM EmployeeInfo e
LEFT JOIN Deduction d ON e.EmpID = d.EmpID
Having three queries should better represent the relationship the EmployeeInfo table has with each of the others and separate concerns. From there you can GROUP BY the different types of data and aggregate the values to get total counts and sums. Here are some resources which may help if you hadn't found these already:
Explanation of SQL Joins: https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
SQL Join Examples: https://www.w3schools.com/sql/sql_join.asp
Telerik Reporting Documentation: https://docs.telerik.com/reporting/overview
T-SQL prepare dynamic COALESCE
As attached in screenshot, there are two tables. Configuration: Detail Using Configuration and Detail table I would like to populate IdentificationType and IDerivedIdentification column in the Detail table. Following logic should be used, while deriving above columns Configuration table has order of preference, which user can change dynamically (i.e. if country is Austria then ID preference should be LEI then TIN (in case LEI is blanks) then CONCAT (if both blank then some other logic) In case of contract ID = 3, country is BG, so LEI should be checked first, since its NULL, CCPT = 456 will be picked. I could have used COALESCE and CASE statement, in case hardcoding is allowed. Can you please suggest any alternation approach please ? Regards Digant
Assuming that this is some horrendous data dump and you are trying to clean it up, here is some SQL to throw at it. :) I was able to capture your image text via Adobe Acrobat > Excel. (I also built the schema for you at: http://sqlfiddle.com/#!6/8f404/12)
Firstly, the correct thing to do is fix the glaring problem, and that's the table structure. Assuming you can't, here's a solution. It unpivots the columns LEI, NIND, CCPT and TIN from the Detail table, as well as FirstPref, SecondPref and ThirdPref from the Configuration table. Doing this helps to normalize the data, although it will cost you significant performance if the data structure is never fixed. After that you are simply joining Detail.ContactId to DerivedTypes.ContactId, then DerivedPrefs.ISOCountryCode to Detail.CountrylSOCountryCode and DerivedTypes.ldentificationType = DerivedPrefs.ldentificationType.
If you use an inner join rather than the left join you can remove the RANK() function, but it will not show all ContactIds, only those that have a value in their LEI, NIND, CCPT or TIN columns. I think that's a better solution anyway, because why would you want to see an error mixed into a report? Write a separate report for those with no values in those columns. Lastly, the TOP (1) with ties allows you to display one record per ContactId and allows the record with the error to still display. Hope this helps.
CREATE TABLE Configuration
(ISOCountryCode varchar(2), CountryName varchar(8), FirstPref varchar(6), SecondPref varchar(6), ThirdPref varchar(6))
;
INSERT INTO Configuration
(ISOCountryCode, CountryName, FirstPref, SecondPref, ThirdPref)
VALUES
('AT', 'Austria', 'LEI', 'TIN', 'CONCAT'),
('BE', 'Belgium', 'LEI', 'NIND', 'CONCAT'),
('BG', 'Bulgaria', 'LEI', 'CCPT', 'CONCAT'),
('CY', 'Cyprus', 'LEI', 'NIND', 'CONCAT')
;
CREATE TABLE Detail
(ContactId int, FirstName varchar(1), LastName varchar(3), BirthDate varchar(4), CountrylSOCountryCode varchar(2), Nationality varchar(2), LEI varchar(9), NIND varchar(9), CCPT varchar(9), TIN varchar(9))
;
INSERT INTO Detail
(ContactId, FirstName, LastName, BirthDate, CountrylSOCountryCode, Nationality, LEI, NIND, CCPT, TIN)
VALUES
(1, 'A', 'DES', NULL, 'AT', 'AT', '123', '4345', NULL, NULL),
(2, 'B', 'DEG', NULL, 'BE', 'BE', NULL, '890', NULL, NULL),
(3, 'C', 'DEH', NULL, 'BG', 'BG', NULL, '123', '456', NULL),
(4, 'D', 'DEi', NULL, 'BG', 'BG', NULL, NULL, NULL, NULL)
;
SELECT TOP (1) with ties Detail.ContactId,
FirstName,
LastName,
BirthDate,
CountrylSOCountryCode,
Nationality,
LEI,
NIND,
CCPT,
TIN,
ISNULL(DerivedPrefs.ldentificationType, 'ERROR') ldentificationType,
IDerivedIdentification,
RANK() OVER (PARTITION BY Detail.ContactId ORDER BY
CASE WHEN Pref = 'FirstPref' THEN 1
WHEN Pref = 'SecondPref' THEN 2
WHEN Pref = 'ThirdPref' THEN 3
ELSE 99 END) AS PrefRank
FROM
Detail
LEFT JOIN
(
SELECT
ContactId,
LEI,
NIND,
CCPT,
TIN
FROM Detail
) DetailUNPVT
UNPIVOT
(IDerivedIdentification FOR ldentificationType IN
(LEI, NIND, CCPT, TIN)
) AS DerivedTypes
ON DerivedTypes.ContactId = Detail.ContactId
LEFT JOIN
(
SELECT
ISOCountryCode,
CountryName,
FirstPref,
SecondPref,
ThirdPref
FROM
Configuration
) ConfigUNPIVOT
UNPIVOT
(ldentificationType FOR Pref IN
(FirstPref, SecondPref, ThirdPref)
) AS DerivedPrefs
ON DerivedPrefs.ISOCountryCode = Detail.CountrylSOCountryCode
and DerivedTypes.ldentificationType = DerivedPrefs.ldentificationType
ORDER BY RANK() OVER (PARTITION BY Detail.ContactId ORDER BY
CASE WHEN Pref = 'FirstPref' THEN 1
WHEN Pref = 'SecondPref' THEN 2
WHEN Pref = 'ThirdPref' THEN 3
ELSE 99 END)
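A possible alternative to the double UNPIVOT is to list the candidate identifiers per row with a VALUES constructor and pick the first non-NULL one in the configured order. This is only a sketch against the Configuration and Detail tables created above (including the column spelling CountrylSOCountryCode used there). The aliases pick and v and the output column names are illustrative, and the 'CONCAT' fallback is deliberately not handled because the question leaves that logic unspecified.

```sql
SELECT d.ContactId,
       d.CountrylSOCountryCode,
       pick.IdentificationType,
       pick.IdentificationValue
FROM Detail AS d
JOIN Configuration AS c
  ON c.ISOCountryCode = d.CountrylSOCountryCode
OUTER APPLY (
    -- One candidate row per identifier column of the current Detail row.
    SELECT TOP (1) v.IdentificationType, v.IdentificationValue
    FROM (VALUES ('LEI',  d.LEI),
                 ('NIND', d.NIND),
                 ('CCPT', d.CCPT),
                 ('TIN',  d.TIN)) AS v (IdentificationType, IdentificationValue)
    WHERE v.IdentificationValue IS NOT NULL
      AND v.IdentificationType IN (c.FirstPref, c.SecondPref, c.ThirdPref)
    -- Order by the position of the type in the country's preference list.
    ORDER BY CASE v.IdentificationType
                 WHEN c.FirstPref  THEN 1
                 WHEN c.SecondPref THEN 2
                 WHEN c.ThirdPref  THEN 3
             END
) AS pick;
```

Because the preference order is read from Configuration at query time, changing a row there changes the result without editing the query. Rows with no usable identifier come back with NULL in both pick columns, thanks to OUTER APPLY.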
Multiple rows in one
I know there are similar questions like mine but unfortunately I haven't found the corresponding solution to my problem. First off, here's a simplified overview of my tables:
Partner table: PartnerID, Name
Address table: PartnerID, Street, Postcode, City, ValidFrom
Contact table: PartnerID, TypID, TelNr, Email, ValidFrom
A partner can have one or more addresses as well as contact info. With the contact info a partner could have, let's say, 2 tel numbers, 1 mobile number and 1 email. In the table it would look like this:
PartnerID  TypID  TelNr                  Email                ValidFrom
1          1      0041 / 044 - 2002020                        01.01.2010
1          1      0041 / 044 - 2003030                        01.01.2011
1          2      0041 / 079 - 7003030                        01.04.2011
1          3                             myemail@hotmail.com  01.06.2011
What I need in the end, combining all tables for each partner, is like this:
PartnerID  Name     Street            Postcode      City            TelNr                               Email
1          SomeGuy  MostActualStreet  MostActualPC  MostActualCity  MostActual Nr (either type 1 or 2)  MostActual Email
Any help?
Here's some T-SQL that gets the answer I think you're looking for, if by "Most Actual" you mean "most recent":

    WITH LatestAddress (PartnerID, Street, PostCode, City) AS (
        SELECT PartnerID, Street, PostCode, City
        FROM [Address] a
        WHERE ValidFrom = (
            SELECT MAX(ValidFrom)
            FROM [Address] aMax
            WHERE aMax.PartnerID = a.PartnerID
        )
    )
    SELECT p.PartnerID, p.Name, la.Street, la.PostCode, la.City
        ,(SELECT TOP 1 TelNr FROM Contact c WHERE c.PartnerID = p.PartnerID AND TelNr IS NOT NULL ORDER BY ValidFrom DESC) AS MostRecentTelNr
        ,(SELECT TOP 1 Email FROM Contact c WHERE c.PartnerID = p.PartnerID AND Email IS NOT NULL ORDER BY ValidFrom DESC) AS MostRecentEmail
    FROM [Partner] p
    LEFT OUTER JOIN LatestAddress la ON p.PartnerID = la.PartnerID

Breaking it down, this example uses a common table expression (CTE) to get the latest address information for each partner:

    WITH LatestAddress (PartnerID, Street, PostCode, City) AS (
        SELECT PartnerID, Street, PostCode, City
        FROM [Address] a
        WHERE ValidFrom = (
            SELECT MAX(ValidFrom)
            FROM [Address] aMax
            WHERE aMax.PartnerID = a.PartnerID
        )
    )

I left-joined from the Partner table to the CTE, because I didn't want partners who don't have addresses to be left out of the results:

    FROM [Partner] p
    LEFT OUTER JOIN LatestAddress la ON p.PartnerID = la.PartnerID

In the SELECT statement, I selected columns from the Partner table and the CTE, and I wrote two subselects: one for the latest non-null telephone number for each partner, and one for the latest non-null email address for each partner. I was able to do this as a subselect because selecting the TOP 1 of a single column guarantees a scalar value:

    SELECT p.PartnerID, p.Name, la.Street, la.PostCode, la.City
        ,(SELECT TOP 1 TelNr FROM Contact c WHERE c.PartnerID = p.PartnerID AND TelNr IS NOT NULL ORDER BY ValidFrom DESC) AS MostRecentTelNr
        ,(SELECT TOP 1 Email FROM Contact c WHERE c.PartnerID = p.PartnerID AND Email IS NOT NULL ORDER BY ValidFrom DESC) AS MostRecentEmail

I would strongly recommend that you split your Contact table into separate telephone number and email tables, each with its own ValidFrom column, if you need to be able to keep old phone numbers and email addresses.
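To make the recommendation above concrete, here is a minimal sketch of what the split might look like. The table names PartnerPhone and PartnerEmail and all column types are assumptions for illustration only; the question does not give the real data types.

```sql
-- Hypothetical split of the Contact table; names and types are assumed.
-- One row per phone number, with its own validity date.
CREATE TABLE PartnerPhone (
    PartnerID int         NOT NULL,  -- would reference Partner.PartnerID
    TypID     int         NOT NULL,  -- e.g. 1 = landline, 2 = mobile, as in the question
    TelNr     varchar(32) NOT NULL,
    ValidFrom datetime    NOT NULL
);

-- One row per email address, with its own validity date.
CREATE TABLE PartnerEmail (
    PartnerID int          NOT NULL, -- would reference Partner.PartnerID
    Email     varchar(255) NOT NULL,
    ValidFrom datetime     NOT NULL
);
```

With a layout like this, neither table needs a nullable TelNr or Email column, so the IS NOT NULL filters in the subselects would no longer be necessary.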
Check out my answer on another post, which explains how to get the most recent information in a case like yours: "Aggregate SQL Function to grab only the first from each group".

PS: The DATE_FIELD would be ValidFrom in your case.
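The linked answer is not reproduced here. As a rough illustration of the general "first row from each group" idea, one common T-SQL approach (not necessarily the one used in that answer) is ROW_NUMBER() with a partition per partner. The table and column names come from the question; the CTE name RankedAddress and the alias rn are made up for this sketch.

```sql
-- Sketch: latest address per partner using ROW_NUMBER().
-- rn = 1 marks the row with the most recent ValidFrom for each PartnerID.
WITH RankedAddress AS (
    SELECT PartnerID, Street, Postcode, City,
           ROW_NUMBER() OVER (PARTITION BY PartnerID
                              ORDER BY ValidFrom DESC) AS rn
    FROM [Address]
)
SELECT p.PartnerID, p.Name, ra.Street, ra.Postcode, ra.City
FROM [Partner] p
LEFT OUTER JOIN RankedAddress ra
       ON ra.PartnerID = p.PartnerID
      AND ra.rn = 1;
```

Because rn is unique within each partner, this returns at most one address per partner even when two addresses share the same ValidFrom, although which of the tied rows wins is then arbitrary.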
Options for indexing a view with cte
I have a view that I want to turn into an indexed view. After a lot of effort I was able to put together the SQL query for the view, and it looks like this:

    ALTER VIEW [dbo].[FriendBalances]
    WITH SCHEMABINDING
    AS
    WITH trans (Amount, PaidBy, PaidFor, Id) AS
    (
        SELECT Amount, userid AS PaidBy, PaidForUsers_FbUserId AS PaidFor, Id
        FROM dbo.Transactions
        FULL JOIN dbo.TransactionUser
            ON dbo.Transactions.Id = dbo.TransactionUser.TransactionsPaidFor_Id
    ),
    bal (PaidBy, PaidFor, Balance) AS
    (
        SELECT PaidBy, PaidFor, SUM(Amount / transactionCounts.[_count]) AS Balance
        FROM trans
        JOIN (SELECT Id, COUNT(*) AS _count FROM trans GROUP BY Id) AS transactionCounts
            ON trans.Id = transactionCounts.Id
           AND trans.PaidBy <> trans.PaidFor
        GROUP BY trans.PaidBy, trans.PaidFor
    )
    SELECT ISNULL(bal.PaidBy, bal2.PaidFor) AS PaidBy,
           ISNULL(bal.PaidFor, bal2.PaidBy) AS PaidFor,
           ISNULL(bal.Balance, 0) - ISNULL(bal2.Balance, 0) AS Balance
    FROM bal
    LEFT JOIN bal AS bal2
        ON bal.PaidBy = bal2.PaidFor
       AND bal.PaidFor = bal2.PaidBy
    WHERE ISNULL(bal.Balance, 0) > ISNULL(bal2.Balance, 0)

Sample data for the FriendBalances view -

    PaidBy PaidFor Balance
    ------ ------- -------
    9990   9991    1000
    9990   9992    2000
    9990   9993    1000
    9991   9993    1000
    9991   9994    1000

It is mainly a join of 2 tables.
Transactions -

    CREATE TABLE [dbo].[Transactions](
        [Id] [int] IDENTITY(1,1) NOT NULL,
        [Date] [datetime] NOT NULL,
        [Amount] [float] NOT NULL,
        [UserId] [bigint] NOT NULL,
        [Remarks] [nvarchar](255) NULL,
        [GroupFbGroupId] [bigint] NULL,
        CONSTRAINT [PK_Transactions] PRIMARY KEY CLUSTERED ([Id])
    )

Sample data in the Transactions table -

    Id Date                    Amount UserId Remarks        GroupFbGroupId
    -- ----------------------- ------ ------ -------------- --------------
    1  2001-01-01 00:00:00.000 3000   9990   this is a test NULL
    2  2001-01-01 00:00:00.000 3000   9990   this is a test NULL
    3  2001-01-01 00:00:00.000 3000   9991   this is a test NULL

TransactionUser -

    CREATE TABLE [dbo].[TransactionUser](
        [TransactionsPaidFor_Id] [bigint] NOT NULL,
        [PaidForUsers_FbUserId] [bigint] NOT NULL
    ) ON [PRIMARY]

Sample data in the TransactionUser table -

    TransactionsPaidFor_Id PaidForUsers_FbUserId
    ---------------------- ---------------------
    1                      9991
    1                      9992
    1                      9993
    2                      9990
    2                      9991
    2                      9992
    3                      9990
    3                      9993
    3                      9994

Now I am not able to create an index on the view because my query contains CTEs. What options do I have? If the CTEs can be removed, what should I use instead so that the view can be indexed?

Here is the error message -

    Msg 10137, Level 16, State 1, Line 1
    Cannot create index on view "ShareBill.Test.Database.dbo.FriendBalances" because it references common table expression "trans". Views referencing common table expressions cannot be indexed. Consider not indexing the view, or removing the common table expression from the view definition.

The concept:

A Transaction mainly consists of:
- an Amount that was paid
- the UserId of the user who paid that amount
- and some more information which is not important for now.

The TransactionUser table is a mapping between the Transactions table and a User table. Essentially, a transaction can be shared between multiple persons, so we store that in this table.

So we have transactions where 1 person pays and others share the amount. If A pays $100 for B, then B would owe $100 to A.
Similarly, if B then pays $90 for A, B would owe only $10 to A. Now if A pays $300 for A, B and C, each share is $100, which means B would owe $110 and C would owe $100 to A.

So in this particular view we are aggregating the effective amount that has been paid (if any) between 2 users, and thus we know how much one person owes another.
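The arithmetic in this example can be checked with a small stand-alone script. This is only a sketch: the table variables @Payment and @Share, their columns, and the use of single letters for users are all invented for illustration and are not the real schema.

```sql
-- Hypothetical mini data set for the three payments described above.
DECLARE @Payment TABLE (PaymentId int, PaidBy char(1), Amount decimal(10,2));
DECLARE @Share   TABLE (PaymentId int, PaidFor char(1));

INSERT INTO @Payment VALUES (1, 'A', 100), (2, 'B', 90), (3, 'A', 300);
INSERT INTO @Share   VALUES (1, 'B'), (2, 'A'), (3, 'A'), (3, 'B'), (3, 'C');

-- Each participant's share is Amount divided by the number of participants.
-- The count is taken before the payer's own share is filtered out.
WITH Debt AS (
    SELECT s.PaidFor AS Debtor,
           p.PaidBy  AS Creditor,
           p.Amount / COUNT(*) OVER (PARTITION BY p.PaymentId) AS Owed
    FROM @Payment p
    JOIN @Share s ON s.PaymentId = p.PaymentId
)
SELECT Debtor, Creditor, SUM(Owed) AS GrossOwed
FROM Debt
WHERE Debtor <> Creditor          -- a payer's own share is not a debt
GROUP BY Debtor, Creditor;
```

This gives gross amounts per direction: B owes A 200 (100 from the first payment plus a 300 / 3 share of the third), A owes B 90, and C owes A 100. Netting the two directions for A and B gives the 110 that B owes A; that netting step is what the final SELECT of the FriendBalances view performs.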
Okay, this gives you an indexed view (which needs an additional view on top of it to sort out the who-owes-whom detail), but it still may not satisfy your requirements.

    /* Transactions table, as before, but with handy unique constraint for FK Target */
    CREATE TABLE [dbo].[Transactions](
        [Id] [int] IDENTITY(1,1) NOT NULL,
        [Date] [datetime] NOT NULL,
        [Amount] [float] NOT NULL,
        [UserId] [bigint] NOT NULL,
        [Remarks] [nvarchar](255) NULL,
        [GroupFbGroupId] [bigint] NULL,
        CONSTRAINT [PK_Transactions] PRIMARY KEY CLUSTERED (Id),
        constraint UQ_Transactions_XRef UNIQUE (Id,Amount,UserId)
    )

Nothing surprising so far, I hope.

    /* Much expanded TransactionUser table, we'll hide it away and most of the maintenance is automatic */
    CREATE TABLE [dbo]._TransactionUser(
        [TransactionsPaidFor_Id] int NOT NULL,
        [PaidForUsers_FbUserId] [bigint] NOT NULL,
        Amount float not null,
        PaidByUserId bigint not null,
        UserCount int not null,
        LowUserID as CASE WHEN [PaidForUsers_FbUserId] < PaidByUserId THEN [PaidForUsers_FbUserId] ELSE PaidByUserId END,
        HighUserID as CASE WHEN [PaidForUsers_FbUserId] < PaidByUserId THEN PaidByUserId ELSE [PaidForUsers_FbUserId] END,
        PerUserDelta as (Amount/UserCount) * CASE WHEN [PaidForUsers_FbUserId] < PaidByUserId THEN -1 ELSE 1 END,
        constraint PK__TransactionUser PRIMARY KEY ([TransactionsPaidFor_Id],[PaidForUsers_FbUserId]),
        constraint FK__TransactionUser_Transactions FOREIGN KEY ([TransactionsPaidFor_Id]) references dbo.Transactions,
        constraint FK__TransactionUser_Transaction_XRef FOREIGN KEY ([TransactionsPaidFor_Id],Amount,PaidByUserID)
            references dbo.Transactions (Id,Amount,UserId) ON UPDATE CASCADE
    )

This table now holds enough information to allow the view to be constructed. The rest of the work is to construct and maintain the data in the table. Note that the cascading foreign key constraint already ensures that if, say, an amount is changed in the Transactions table, the change is copied into this table and the computed columns are recalculated.
    /* View that mimics the original TransactionUser table -
    in fact it has the same name so existing code doesn't need to change */
    CREATE VIEW dbo.TransactionUser
    with schemabinding
    as
        select
            [TransactionsPaidFor_Id],
            [PaidForUsers_FbUserId]
        from
            dbo._TransactionUser
    GO
    /* Effectively the PK on the original table */
    CREATE UNIQUE CLUSTERED INDEX PK_TransactionUser on dbo.TransactionUser ([TransactionsPaidFor_Id],[PaidForUsers_FbUserId])

Anything that's already written to work against TransactionUser will now work against this view, and be none the wiser. Except, they can't insert/update/delete the rows without some help:

    /* Now we write the trigger that maintains the underlying table */
    CREATE TRIGGER dbo.T_TransactionUser_IUD
    ON dbo.TransactionUser
    INSTEAD OF INSERT, UPDATE, DELETE
    AS
        SET NOCOUNT ON;
        /* Every delete affects *every* row for the same transaction
        We need to drop the counts on every remaining row, as well as removing the actual rows we're interested in */
        WITH DropCounts as (
            select TransactionsPaidFor_Id,COUNT(*) as Cnt
            from deleted
            group by TransactionsPaidFor_Id
        ), KeptRows as (
            select tu.TransactionsPaidFor_Id,tu.PaidForUsers_FbUserId,UserCount - dc.Cnt as NewCount
            from dbo._TransactionUser tu
                left join deleted d
                    on tu.TransactionsPaidFor_Id = d.TransactionsPaidFor_Id
                    and tu.PaidForUsers_FbUserId = d.PaidForUsers_FbUserId
                inner join DropCounts dc
                    on tu.TransactionsPaidFor_Id = dc.TransactionsPaidFor_Id
            where d.PaidForUsers_FbUserId is null
        ), ChangeSet as (
            select TransactionsPaidFor_Id,PaidForUsers_FbUserId,NewCount,1 as Keep
            from KeptRows
            union all
            select TransactionsPaidFor_Id,PaidForUsers_FbUserId,null,0
            from deleted
        )
        merge into dbo._TransactionUser tu
        using ChangeSet cs
            on tu.TransactionsPaidFor_Id = cs.TransactionsPaidFor_Id
            and tu.PaidForUsers_FbUserId = cs.PaidForUsers_FbUserId
        when matched and cs.Keep = 1 then update set UserCount = cs.NewCount
        when matched then delete;

        /* Every insert affects *every* row for the same transaction
        This is why the indexed view couldn't be generated */
        WITH TU as (
            select TransactionsPaidFor_Id,PaidForUsers_FbUserId,Amount,PaidByUserId
            from dbo._TransactionUser
            where TransactionsPaidFor_Id in (select TransactionsPaidFor_Id from inserted)
            union all
            select TransactionsPaidFor_Id,PaidForUsers_FbUserId,Amount,UserId
            from inserted i
                inner join dbo.Transactions t
                    on i.TransactionsPaidFor_Id = t.Id
        ), CountedTU as (
            select TransactionsPaidFor_Id,PaidForUsers_FbUserId,Amount,PaidByUserId,
                COUNT(*) OVER (PARTITION BY TransactionsPaidFor_Id) as Cnt
            from TU
        )
        merge into dbo._TransactionUser tu
        using CountedTU new
            on tu.TransactionsPaidFor_Id = new.TransactionsPaidFor_Id
            and tu.PaidForUsers_FbUserId = new.PaidForUsers_FbUserId
        when matched then update set Amount = new.Amount,PaidByUserId = new.PaidByUserId,UserCount = new.Cnt
        when not matched then insert ([TransactionsPaidFor_Id],[PaidForUsers_FbUserId],Amount,PaidByUserId,UserCount)
            values (new.TransactionsPaidFor_Id,new.PaidForUsers_FbUserId,new.Amount,new.PaidByUserId,new.Cnt);

Now that the underlying table is being maintained, we can finally write the indexed view you wanted in the first place... almost. The issue is that the totals we create may be positive or negative, because we've normalized the transactions so that we can easily sum them:

    CREATE VIEW [dbo]._FriendBalances
    WITH SCHEMABINDING
    as
        SELECT
            LowUserID,
            HighUserID,
            SUM(PerUserDelta) as Balance,
            COUNT_BIG(*) as Cnt
        FROM dbo._TransactionUser
        WHERE LowUserID != HighUserID
        GROUP BY
            LowUserID,
            HighUserID
    GO
    create unique clustered index IX__FriendBalances on dbo._FriendBalances (LowUserID, HighUserID)

So we finally create a view, built on the indexed view above, that flips the person owed and the person owing around whenever the balance is negative.
But it will use the index on the above view, which is most of the work we were seeking to save by having the indexed view:

    create view dbo.FriendBalances
    as
        select
            CASE WHEN Balance >= 0 THEN LowUserID ELSE HighUserID END as PaidBy,
            CASE WHEN Balance >= 0 THEN HighUserID ELSE LowUserID END as PaidFor,
            ABS(Balance) as Balance
        from
            dbo._FriendBalances WITH (NOEXPAND)

Now, finally, we insert your sample data:

    set identity_insert dbo.Transactions on --Ensure we get IDs we know
    GO
    insert into dbo.Transactions (Id,[Date] , Amount , UserId , Remarks ,GroupFbGroupId)
    select 1 ,'2001-01-01T00:00:00.000', 3000, 9990 ,'this is a test', NULL union all
    select 2 ,'2001-01-01T00:00:00.000', 3000, 9990 ,'this is a test', NULL union all
    select 3 ,'2001-01-01T00:00:00.000', 3000, 9991 ,'this is a test', NULL
    GO
    set identity_insert dbo.Transactions off
    GO
    insert into dbo.TransactionUser (TransactionsPaidFor_Id, PaidForUsers_FbUserId)
    select 1, 9991 union all
    select 1, 9992 union all
    select 1, 9993 union all
    select 2, 9990 union all
    select 2, 9991 union all
    select 2, 9992 union all
    select 3, 9990 union all
    select 3, 9993 union all
    select 3, 9994

And query the final view:

    select * from dbo.FriendBalances

    PaidBy PaidFor Balance
    9990   9991    1000
    9990   9992    2000
    9990   9993    1000
    9991   9993    1000
    9991   9994    1000

Now, there is additional work we could do if we were concerned that someone may find a way to dodge the triggers and make direct changes to the base tables. The first step would be yet another indexed view, which would ensure that every row for the same transaction has the same UserCount value. Finally, with a few additional columns, check constraints, FK constraints and more work in the triggers, I think we can ensure that the UserCount is correct, but it may add more overhead than you want. I can add scripts for these aspects if you want me to; it depends on how restrictive you want or need the database to be.
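Short of the constraint-based enforcement described above (which is not shown here), a lighter-weight option is to audit the table from time to time. The following is only a sketch of such a check, not the indexed-view approach the answer proposes; it assumes the dbo._TransactionUser table exactly as defined earlier, and the alias actual is made up.

```sql
-- Ad hoc audit: list rows whose stored UserCount disagrees with the
-- real number of rows recorded for the same transaction.
SELECT tu.TransactionsPaidFor_Id,
       tu.PaidForUsers_FbUserId,
       tu.UserCount,
       actual.Cnt AS ActualCount
FROM dbo._TransactionUser tu
JOIN (SELECT TransactionsPaidFor_Id, COUNT(*) AS Cnt
      FROM dbo._TransactionUser
      GROUP BY TransactionsPaidFor_Id) actual
  ON actual.TransactionsPaidFor_Id = tu.TransactionsPaidFor_Id
WHERE tu.UserCount <> actual.Cnt;
```

If the triggers are the only path used to change the data, this query should return no rows. Any rows it does return point at transactions that were modified directly in the base table.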