A very complicated SQL query issue - sql-server

I have 2 tables ...
Customer
CustomerIdentification
Customer table has 2 fields
CustomerId varchar(20)
Customer_Id_Link varchar(50)
CustomerIdentification table has 3 fields
CustomerId varchar(20)
Identification_Number varchar(50)
Personal_ID_Type_Code int -- is a foreign key to another table but thats irrelevant
Basically, Customer is the customer master table (with CustomerID as primary key) and CustomerIdentification can have several pieces of identifications for a given customer. In other words, CustomerId in CustomerIdentification is a foriegn key to Customer table. A customer can have many pieces of identifications, each having a Identification_Number and Personal_ID_Type_Code (which is an integer that tells you whether the identification is a passport, sin, drivers license etc.).
Now, customer table has the following data: Customer_Id_Link is blank (empty string) at this point
CustomerId Customer_Id_Link
--------------------------------
'CU-1' <Blank>
'CU-2' <Blank>
'CU-3' <Blank>
'CU-4' <Blank>
'CU-5' <Blank>
and CustomerIdentification table has the following data:
CustomerId Identification_Number Personal_ID_Type_Code
------------------------------------------------------------
'CU-1' 'A' 1
'CU-1' 'A' 2
'CU-1' 'A' 3
'CU-2' 'A' 1
'CU-2' 'B' 3
'CU-2' 'C' 4
'CU-3' 'A' 1
'CU-3' 'B' 2
'CU-3' 'C' 4
'CU-4' 'A' 1
'CU-4' 'B' 2
'CU-4' 'B' 3
'CU-5' 'B' 3
Essentially, more than one customer can have same Identification_Number and Personal_ID_Type_Code in CustomerIdentification. When this happens, all Customer_Id_Link fields need to be updated with a common value (could be a GUID or whatever). But the processing for this is more complex.
Rules are these:
For matching Personal_ID_Type_Code and Identification_Number fields between Customer Records
- Compare the Identification_Number fields for all other common Personal_ID_Type_Code fields for all the Customer Records from the above match
- if true, then link the Customer Records
For example:
Match ID 1 A for CU-1, CU-2, CU-3, CU-4
Exception ID 2 mismatch (A on CU-1 vs B on CU-3)
No linkage done
Match ID 2 B for CU-3, CU-4
No ID mismatch
Link CU-3 and CU-4 (update Customer_Id_Link field with a common value in customer table for both)
Match ID 3 A for CU-1, CU-4
Exception ID 2 mismatch (A vs B)
No linkage done
Match ID 3 B for CU-2, CU-5
No ID mismatch
Link CU-2 and CU-5 (update Customer_Id_Link field with a common value in customer table for both) Match ID 4 C for CU-2, CU-3
CU-2 already linked, keep CU-5 to customer linking list
CU-3 already linked, keep CU-4 to customer linking list
Exception ID 3 mismatch (B on CU-2 vs A on CU-4)
No linkage done (previous linkage remains)
Any help will be appreciated. This has kept me awake for two days now, and I cant seem to be able to find the solution. Ideally, the solution will be a stored procedure that I can execute to do customer linking.
 - SQL Server 2008 R2 Standard 64 bit
UPDATE-------------------------------
I knew it was going to be tough to explain this problem, so I take the blame. But essentially, I want to be able to link all the customers that have same identificationNumbers, only, a customer can have more than 1 identificationNumber. Take example 1. 1 A (1 being Personal_id_type_code and A being identificationNumber exists for 4 different customers. CU-1, CU-2, CU-3, CU-4. So they could potentially be the same customer that exists 4 different times in customer table with different customer ID. We need to link them with 1 common value. However, CU-1 has 2 other identifications and if even 1 of them is different from the other 3 (CU-2, CU-3, CU-4) they are not the same customer. So ID 2 with Num A does not match with ID 2 for CU-3 (its B) and same for CU-4. Also, even though ID 2 num A does not exist in CU-2, CU-1's ID 3 and num A does not match with CU-2s ID 3 (its B). Therefore its not a match at all.
Next common Id's and num is 2-b which exists in CU-3 and CU-4. These two customers are in fact same cause both have ID 1 - A and ID 2 - B. ID 4 - C and ID 3 - A is irrelevant cause both IDs are different. Which essentially means this customer has 4 IDs I A, 2 B, 4 C and 3 A. So now we need to link this customer with a common unique value (guid) in customer table.
I hope I explained this very complicated issue now. It is tough to explain as this is a very unique problem.

I've changed your data model a bit to try and make it a bit more obvious what's going on..
CREATE TABLE [dbo].[Customer]
(
[CustomerName] VARCHAR(20) NOT NULL,
[CustomerLink] VARBINARY(20) NULL
)
CREATE TABLE [dbo].[CustomerIdentification]
(
[CustomerName] VARCHAR(20) NOT NULL,
[ID] VARCHAR(50) NOT NULL,
[IDType] VARCHAR(16) NOT NULL
)
And I've added some more test data..
INSERT [dbo].[Customer]
([CustomerName])
VALUES ('Fred'),
('Bob'),
('Vince'),
('Tom'),
('Alice'),
('Matt'),
('Dan')
INSERT [dbo].[CustomerIdentification]
VALUES
('Fred', 'A', 'Passport'),
('Fred', 'A', 'SIN'),
('Fred', 'A', 'Drivers Licence'),
('Bob', 'A', 'Passport'),
('Bob', 'B', 'Drivers Licence'),
('Bob', 'C', 'Credit Card'),
('Vince', 'A', 'Passport'),
('Vince', 'B', 'SIN'),
('Vince', 'C', 'Credit Card'),
('Tom', 'A', 'Passport'),
('Tom', 'B', 'SIN'),
('Tom', 'B', 'Drivers Licence'),
('Alice', 'B', 'Drivers Licence'),
('Matt', 'X', 'Drivers Licence'),
('Dan', 'X', 'Drivers Licence')
Is this what you're looking for:
;WITH [cteNonMatchingIDs] AS (
-- Pairs where the IDType is the same, but
-- name and ID don't match
SELECT ci3.[CustomerName] AS [CustomerName1],
ci4.[CustomerName] AS [CustomerName2]
FROM [dbo].[CustomerIdentification] ci3
INNER JOIN [dbo].[CustomerIdentification] ci4
ON ci3.[IDType] = ci4.[IDType]
WHERE ci3.[CustomerName] <> ci4.[CustomerName]
AND ci3.[ID] <> ci4.[ID]
),
[cteMatchedPairs] AS (
-- Pairs where the IDType and ID match, and
-- there aren't any non matching IDs for the
-- CustomerName
SELECT DISTINCT
ci1.[CustomerName] AS [CustomerName1],
ci2.[CustomerName] AS [CustomerName2]
FROM [dbo].[CustomerIdentification] ci1
LEFT JOIN [dbo].[CustomerIdentification] ci2
ON ci1.[CustomerName] <> ci2.[CustomerName]
AND ci1.[IDType] = ci2.[IDType]
WHERE ci1.[ID] = ISNULL(ci2.[ID], ci1.[ID])
AND NOT EXISTS (
SELECT 1
FROM [cteNonMatchingIDs]
WHERE ci1.[CustomerName] = [CustomerName1] -- correlated subquery
AND ci2.[CustomerName] = [CustomerName2]
)
AND ci1.[CustomerName] < ci2.[CustomerName]
),
[cteMatchedList] ([CustomerName], [CustomerNameList]) AS (
-- Turn the matched pairs into list of matching
-- CustomerNames
SELECT [CustomerName1],
[CustomerNameList]
FROM (
SELECT [CustomerName1],
CONVERT(VARCHAR(1000), '$'
+ [CustomerName1] + '$'
+ [CustomerName2]) AS [CustomerNameList]
FROM [cteMatchedPairs]
UNION ALL
SELECT [CustomerName2],
CONVERT(VARCHAR(1000), '$'
+ [CustomerName2]) AS [CustomerNameList]
FROM [cteMatchedPairs]
) [cteMatchedPairs]
UNION ALL
SELECT [cteMatchedList].[CustomerName],
CONVERT(VARCHAR(1000),[CustomerNameList] + '$'
+ [cteMatchedPairs].[CustomerName2])
FROM [cteMatchedList] -- recursive CTE
INNER JOIN [cteMatchedPairs]
ON RIGHT([cteMatchedList].[CustomerNameList],
LEN([cteMatchedPairs].[CustomerName1])
) = [cteMatchedPairs].[CustomerName1]
),
[cteSubstringLists] AS (
SELECT r1.[CustomerName],
r2.[CustomerNameList]
FROM [cteMatchedList] r1
INNER JOIN [cteMatchedList] r2
ON r2.[CustomerNameList] LIKE '%' + r1.[CustomerNameList] + '%'
),
[cteCustomerLink] AS (
SELECT DISTINCT
x1.[CustomerName],
HASHBYTES('SHA1', x2.[CustomerNameList]) AS [CustomerLink]
FROM (
SELECT [CustomerName],
MAX(LEN([CustomerNameList])) AS [MAX LEN CustomerList]
FROM [cteSubstringLists]
GROUP BY [CustomerName]
) x1
INNER JOIN (
SELECT [CustomerName],
LEN([CustomerNameList]) AS [LEN CustomerList],
[CustomerNameList]
FROM [cteSubstringLists]
) x2
ON x1.[MAX LEN CustomerList] = x2.[LEN CustomerList]
AND x1.[CustomerName] = x2.[CustomerName]
)
UPDATE c
SET [CustomerLink] = cl.[CustomerLink]
FROM [dbo].[Customer] c
INNER JOIN [cteCustomerLink] cl
ON cl.[CustomerName] = c.[CustomerName]
SELECT *
FROM [dbo].[Customer]

Related

Formatting data in sql

I have few tables and basically I'm working out on telerik reports. The structure and the sample data I have is given below:
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('Leave'))
BEGIN;
DROP TABLE [Leave];
END;
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('Addition'))
BEGIN;
DROP TABLE [Addition];
END;
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('Deduction'))
BEGIN;
DROP TABLE [Deduction];
END;
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('EmployeeInfo'))
BEGIN;
DROP TABLE [EmployeeInfo];
END;
GO
CREATE TABLE [EmployeeInfo] (
[EmpID] INT NOT NULL PRIMARY KEY,
[EmployeeName] VARCHAR(255)
);
CREATE TABLE [Addition] (
[AdditionID] INT NOT NULL PRIMARY KEY,
[AdditionType] VARCHAR(255),
[Amount] VARCHAR(255),
[EmpID] INT FOREIGN KEY REFERENCES EmployeeInfo(EmpID)
);
CREATE TABLE [Deduction] (
[DeductionID] INT NOT NULL PRIMARY KEY,
[DeductionType] VARCHAR(255),
[Amount] VARCHAR(255),
[EmpID] INT FOREIGN KEY REFERENCES EmployeeInfo(EmpID)
);
CREATE TABLE [Leave] (
[LeaveID] INT NOT NULL PRIMARY KEY,
[LeaveType] VARCHAR(255) NULL,
[DateFrom] VARCHAR(255),
[DateTo] VARCHAR(255),
[Approved] Binary,
[EmpID] INT FOREIGN KEY REFERENCES EmployeeInfo(EmpID)
);
GO
INSERT INTO EmployeeInfo([EmpID], [EmployeeName]) VALUES
(1, 'Marcia'),
(2, 'Lacey'),
(3, 'Fay'),
(4, 'Mohammad'),
(5, 'Mike')
INSERT INTO Addition([AdditionID], [AdditionType], [Amount], [EmpID]) VALUES
(1, 'Bonus', '2000', 2),
(2, 'Increment', '5000', 5)
INSERT INTO Deduction([DeductionID], [DeductionType], [Amount], [EmpID]) VALUES
(1, 'Late Deductions', '2000', 4),
(2, 'Delayed Project Completion', '5000', 1)
INSERT INTO Leave([LeaveID],[LeaveType],[DateFrom],[DateTo], [Approved], [EmpID]) VALUES
(1, 'Annual Leave','2018-01-08 04:52:03','2018-01-10 20:30:53', 1, 1),
(2, 'Sick Leave','2018-02-10 03:34:41','2018-02-14 04:52:14', 1, 2),
(3, 'Casual Leave','2018-01-04 11:06:18','2018-01-05 04:11:00', 1, 3),
(4, 'Annual Leave','2018-01-17 17:09:34','2018-01-21 14:30:44', 1, 4),
(5, 'Casual Leave','2018-01-09 23:31:16','2018-01-12 15:11:17', 1, 3),
(6, 'Annual Leave','2018-02-16 18:01:03','2018-02-19 17:16:04', 1, 2)
The query I am using to get the output is something like this:
SELECT Info.EmployeeName, Addition.AdditionType, Addition.Amount, Deduction.DeductionType, Deduction.Amount,
Leave.LeaveType,
SUM(DATEDIFF(Day, Leave.DateFrom, Leave.DateTo)) [#OfLeaves],
DatePart(MONTH, Leave.DateFrom)
FROM EmployeeInfo Info
LEFT JOIN Leave
ON Info.EmpID = Leave.EmpID
LEFT JOIN Addition
ON Info.EmpID = Addition.EmpID
LEFT JOIN Deduction
ON Info.EmpID = Deduction.EmpID
WHERE Approved = 1
GROUP BY Info.EmployeeName, Addition.AdditionType, Addition.Amount, Deduction.DeductionType, Deduction.Amount,
Leave.LeaveType,
DatePart(MONTH, Leave.DateFrom)
I actually want to get the output which I could be able to show on the report but somehow as I'm using joins the data is repeating on multiple rows for same user and that's why it's also appearing multiple times on the report.
The output I am getting is something like this
Fay NULL NULL NULL NULL Casual Leave 4 1
Lacey Bonus 2000 NULL NULL Annual Leave 3 2
Lacey Bonus 2000 NULL NULL Sick Leave 4 2
Marcia NULL NULL Delayed Project Completion 5000 Annual Leave 2 1
Mohammad NULL NULL Late Deductions 2000 Annual Leave 4 1
Although what I want it looks something like this:
Fay NULL NULL NULL NULL Casual Leave 4 1
Lacey Bonus 2000 NULL NULL Annual Leave 3 2
Lacey NULL NULL NULL NULL Sick Leave 4 2
Marcia NULL NULL Delayed Project Completion 5000 Annual Leave 2 1
Mohammad NULL NULL Late Deductions 2000 Annual Leave 4 1
As there was only one bonus and it was not allocated multiple times than it should appear one time. I am stuck in formatting the table layout so I think I might able to get a hint in formatting the output in query so I won't have to do there.
Best,
My own recommendation on this case is to change the left joins to a single table in the following way:
select
info.employeename, additiontype, additionamount, deductiontype, deductionamount, leavetype, #ofleaves, leavemth
from Employeeinfo info
join
(
Select
Leave.empid, null as additiontype, null as additionamount, null as deductiontype, null as deductionamount, leave.leavetype, DATEDIFF(Day, Leave.DateFrom, Leave.DateTo) [#OfLeaves], DatePart(MONTH, DateFrom) leavemth
from leave
where approved = 1
Union all
Select
Addition.empid, additiontype, amount, null, null, null, null, null
From addition
Union all
Select empid, null, null, deductiontype, amount, null, null, null
From deduction
) payadj on payadj.empid= info.empid
This approach separates the different pay adjustments into the different columns and also ensures that you don't get the double ups where this joins add multiple employee IDs.
You might need to explicitly name all the null columns for each Union - I haven't tested it, but I thought you only need to name the columns in a union all once.
The output comes in the format below;
employeename bonus leavetype
Lacey 2000 null
Lacey null Sick Leave
Lacey null Annual Leave
Rather than type out the full result set here is a link to sqlfiddle;
http://sqlfiddle.com/#!18/935e9/5/0
The problem you're facing is based on how you are joining the tables together. It's not syntax that's necessarily wrong but how we look at the data and how we understand the relationships between the tables. When doing the LEFT JOINs your query is able to find EmpIDs in each table and it is happy with that and grabs the records (or returns NULL if there are no records matching the EmpID). That isn't really what you're looking for since it can join too much together. So let's see why this is happening. If we take out the join to the Addition table your results would look like this:
Fay NULL NULL Casual Leave 4 1
Lacey NULL NULL Annual Leave 3 2
Lacey NULL NULL Sick Leave 4 2
Marcia Delayed Project Completion 5000 Annual Leave 2 1
Mohammad Late Deductions 2000 Annual Leave 4 1
You are still left with two rows for Lacey. The reason for these two rows is because of the join to the Leave table. Lacey has taken two leaves of absence. One for Sick Leave and the other for Annual Leave. Both of those records share the same EmpID of 2. So when you join to the Addition table (and/or to the rest of the tables) on EmpID the join looks for all matching records to complete that join. There's a single Addition record that matches two Leave records joined on EmpID. Thus, you end up with two Bonus results--the same Addition record for the two Leave records. Try running this query and check the results, it should also illustrate the problem:
SELECT l.LeaveType, l.EmpID, a.AdditionType, a.Amount
FROM Leave l
LEFT JOIN Addition a ON a.EmpID = l.EmpID
The results using your provided data would be:
Annual Leave 1 NULL NULL
Sick Leave 2 Bonus 2000
Casual Leave 3 NULL NULL
Annual Leave 4 NULL NULL
Casual Leave 3 NULL NULL
Annual Leave 2 Bonus 2000
So the data itself isn't wrong. It's just that when joining on EmpID in this way the relationships may be confusing.
So the problem is the relationship between the Leave table and the others. It doesn't make sense to join Leave to the Addition or Deduction tables directly on EmpID because it may look as though Lacey received a bonus for each leave of absence for example. This is what you are experiencing here.
I would suggest three separate queries (and potentially three reports). One to return the leave of absence data and the others for the Addition and Deduction data. Something like:
--Return each employee's leaves of absence
SELECT e.EmployeeName
, l.LeaveType
, SUM(DATEDIFF(Day, l.DateFrom, l.DateTo)) [#OfLeaves]
, DatePart(MONTH, l.DateFrom)
FROM EmployeeInfo e
LEFT JOIN Leave l ON e.EmpID = l.EmpID
WHERE l.Approved = 1
--Return each employee's Additions
SELECT e.EmployeeName
, a.AdditionType
, a.Amount
FROM EmployeeInfo e
LEFT JOIN Addition a ON e.EmpID = a.EmpID
--Return each employee's Deductions
SELECT e.EmployeeName
, d.DeductionType
, d.Amount
FROM EmployeeInfo e
LEFT JOIN Deduction d ON e.EmpID = d.EmpID
Having three queries should better represent the relationship the EmployeeInfo table has with each of the others and separate concerns. From there you can GROUP BY the different types of data and aggregate the values and get total counts and sums.
Here are some resources which may help if you hadn't found these already:
Explanation of SQL Joins: https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
SQL Join Examples: https://www.w3schools.com/sql/sql_join.asp
Telerik Reporting Documentation: https://docs.telerik.com/reporting/overview

SQL Function to return sequential id's

Consider this simple INSERT
INSERT INTO Assignment (CustomerId,UserId)
SELECT CustomerId,123 FROM Customers
That will obviously assign UserId=123 to all customers.
What I need to do is assign them to 3 userId's sequentially, so 3 users get one third of the accounts equally.
INSERT INTO Assignment (CustomerId,UserId)
SELECT CustomerId,fnGetNextId() FROM Customers
Could I create a function to return sequentially from a list of 3 ID's?, i.e. each time the function is called it returns the next one in the list?
Thanks
Could I create a function to return sequentially from a list of 3 ID's?,
If you create a SEQUENCE, then you can assign incremental numbers with the NEXT VALUE FOR (Transact-SQL) expression.
This is a strange requirement, but the modulus operator (%) should help you out without the need for functions, sequences, or altering your database structure. This assumes that the IDs are integers. If they're not, you can use ROW_NUMBER or a number of other tactics to get a distinct number value for each customer.
Obviously, you would replace the SELECT statement with an INSERT once you're satisfied with the code, but it's good practice to always select when developing before inserting.
SETUP WITH SAMPLE DATA:
DECLARE #Users TABLE (ID int, [Name] varchar(50))
DECLARE #Customers TABLE (ID int, [Name] varchar(50))
DECLARE #Assignment TABLE (CustomerID int, UserID int)
INSERT INTO #Customers
VALUES
(1, 'Joe'),
(2, 'Jane'),
(3, 'Jon'),
(4, 'Jake'),
(5, 'Jerry'),
(6, 'Jesus')
INSERT INTO #Users
VALUES
(1, 'Ted'),
(2, 'Ned'),
(3, 'Fred')
QUERY:
SELECT C.Name AS [CustomerName], U.Name AS [UserName]
FROM #Customers C
JOIN #Users U
ON
CASE WHEN C.ID % 3 = 0 THEN 1
WHEN C.ID % 3 = 1 THEN 2
WHEN C.ID % 3 = 2 THEN 3
END = U.ID
You would change the THEN 1 to whatever your first UserID is, THEN 2 with the second UserID, and THEN 3 with the third UserID. If you end up with another user and want to split the customers 4 ways, you would do replace the CASE statement with the following:
CASE WHEN C.ID % 4 = 0 THEN 1
WHEN C.ID % 4 = 1 THEN 2
WHEN C.ID % 4 = 2 THEN 3
WHEN C.ID % 4 = 3 THEN 4
END = U.ID
OUTPUT:
CustomerName UserName
-------------------------------------------------- --------------------------------------------------
Joe Ned
Jane Fred
Jon Ted
Jake Ned
Jerry Fred
Jesus Ted
(6 row(s) affected)
Lastly, you will want to select the IDs for your actual insert, but I selected the names so the results are easier to understand. Please let me know if this needs clarification.
Here's one way to produce Assignment as an automatically rebalancing view:
CREATE VIEW dbo.Assignment WITH SCHEMABINDING AS
WITH SeqUsers AS (
SELECT UserID, ROW_NUMBER() OVER (ORDER BY UserID) - 1 AS _ord
FROM dbo.Users
), SeqCustomers AS (
SELECT CustomerID, ROW_NUMBER() OVER (ORDER BY CustomerID) - 1 AS _ord
FROM dbo.Customers
)
-- INSERT Assignment(CustomerID, UserID)
SELECT SeqCustomers.CustomerID, SeqUsers.UserID
FROM SeqUsers
JOIN SeqCustomers ON SeqUsers._ord = SeqCustomers._ord % (SELECT COUNT(*) FROM SeqUsers)
;
This shifts assignments around if you insert a new user, which could be quite undesirable, and it's also not efficient if you had to JOIN on it. You can easily repurpose the query it contains for one-time inserts (the commented-out INSERT). The key technique there is joining on ROW_NUMBER()s.

SQL Server: results of the table valued function doesn't match column names

I have this function:
CREATE FUNCTION [dbo].[full_ads](#date SMALLDATETIME)
returns TABLE
AS
RETURN
SELECT *,
COALESCE((SELECT TOP 1 ptype
FROM special_ads
WHERE [adid] = a.id
AND #date BETWEEN starts AND ends), 1) AS ptype,
(SELECT TOP 1 name
FROM cities
WHERE id = a.cid) AS city,
(SELECT TOP 1 name
FROM provinces
WHERE id = (SELECT pid
FROM cities
WHERE id = a.cid)) AS province,
(SELECT TOP 1 name
FROM models
WHERE id = a.mid) AS model,
(SELECT TOP 1 name
FROM car_names
WHERE id = (SELECT car_id
FROM models
WHERE id = a.mid)) AS brand,
(SELECT TOP 1 pid
FROM cities
WHERE id = a.cid) pid,
(SELECT TOP 1 car_id
FROM models
WHERE id = a.mid) bid,
(SELECT TOP 1 name
FROM colors
WHERE id = a.color_id) AS color,
COALESCE((SELECT TOP 1 fileid
FROM carimgs
WHERE adid = a.id), 'nocarimage.png') AS [image]
FROM ads a
WHERE isdeleted <> 1
Sometimes it works correctly, but sometimes column names doesn't match values like (I have written a sample results with fewer columns just to show the problem):
ID Name City Color Image
----------------------------------------------
1 John New York Null Red
2 Ted Chicago Null Blue
As you see color and Image values are shifted one column and this continues to the last column.
Can anyone tell me where the problem is?
This arises from using *.
If the definition of ads changes (columns added or removed) this can mess up the metadata associated with the TVF.
You would need to run sp_refreshsqlmodule on it to refresh this metadata after such changes. It is best to avoid * in view definitions or inline TVFs for this reason.
An example of this
CREATE TABLE T
(
A CHAR(1) CONSTRAINT DF_A DEFAULT 'A',
B CHAR(1) CONSTRAINT DF_B DEFAULT 'B',
C CHAR(1) CONSTRAINT DF_C DEFAULT 'C',
D CHAR(1) CONSTRAINT DF_D DEFAULT 'D'
)
GO
INSERT INTO T DEFAULT VALUES
GO
CREATE FUNCTION F()
RETURNS TABLE
AS
RETURN
SELECT * FROM T
GO
SELECT * FROM F()
GO
ALTER TABLE T DROP CONSTRAINT DF_C, COLUMN C
ALTER TABLE T ADD E CHAR(1) DEFAULT 'E' WITH VALUES
GO
SELECT * FROM F()
Returns
+---+---+---+---+
| A | B | C | D |
+---+---+---+---+
| A | B | D | E |
+---+---+---+---+
Note that the D and E values are shown in the wrong columns. It still shows column C even though it has been dropped.

CTE Recursion to get tree hierarchy

I need to get an ordered hierarchy of a tree, in a specific way. The table in question looks a bit like this (all ID fields are uniqueidentifiers, I've simplified the data for sake of example):
EstimateItemID EstimateID ParentEstimateItemID ItemType
-------------- ---------- -------------------- --------
1 A NULL product
2 A 1 product
3 A 2 service
4 A NULL product
5 A 4 product
6 A 5 service
7 A 1 service
8 A 4 product
Graphical view of the tree structure (* denotes 'service'):
A
___/ \___
/ \
1 4
/ \ / \
2 7* 5 8
/ /
3* 6*
Using this query, I can get the hierarchy (just pretend 'A' is a uniqueidentifier, I know it isn't in real life):
DECLARE #EstimateID uniqueidentifier
SELECT #EstimateID = 'A'
;WITH temp as(
SELECT * FROM EstimateItem
WHERE EstimateID = #EstimateID
UNION ALL
SELECT ei.* FROM EstimateItem ei
INNER JOIN temp x ON ei.ParentEstimateItemID = x.EstimateItemID
)
SELECT * FROM temp
This gives me the children of EstimateID 'A', but in the order that it appears in the table. ie:
EstimateItemID
--------------
1
2
3
4
5
6
7
8
Unfortunately, what I need is an ordered hierarchy with a result set that follows the following constraints:
1. each branch must be grouped
2. records with ItemType 'product' and parent are the top node
3. records with ItemType 'product' and non-NULL parent grouped after top node
4. records with ItemType 'service' are bottom node of a branch
So, the order that I need the results, in this example, is:
EstimateItemID
--------------
1
2
3
7
4
5
8
6
What do I need to add to my query to accomplish this?
Try this:
;WITH items AS (
SELECT EstimateItemID, ItemType
, 0 AS Level
, CAST(EstimateItemID AS VARCHAR(255)) AS Path
FROM EstimateItem
WHERE ParentEstimateItemID IS NULL AND EstimateID = #EstimateID
UNION ALL
SELECT i.EstimateItemID, i.ItemType
, Level + 1
, CAST(Path + '.' + CAST(i.EstimateItemID AS VARCHAR(255)) AS VARCHAR(255))
FROM EstimateItem i
INNER JOIN items itms ON itms.EstimateItemID = i.ParentEstimateItemID
)
SELECT * FROM items ORDER BY Path
With Path - rows a sorted by parents nodes
If you want sort childnodes by ItemType for each level, than you can play with Level and SUBSTRING of Pathcolumn....
Here SQLFiddle with sample of data
This is an add-on to Fabio's great idea from above. Like I said in my reply to his original post. I have re-posted his idea using more common data, table name, and fields to make it easier for others to follow.
Thank you Fabio! Great name by the way.
First some data to work with:
CREATE TABLE tblLocations (ID INT IDENTITY(1,1), Code VARCHAR(1), ParentID INT, Name VARCHAR(20));
INSERT INTO tblLocations (Code, ParentID, Name) VALUES
('A', NULL, 'West'),
('A', 1, 'WA'),
('A', 2, 'Seattle'),
('A', NULL, 'East'),
('A', 4, 'NY'),
('A', 5, 'New York'),
('A', 1, 'NV'),
('A', 7, 'Las Vegas'),
('A', 2, 'Vancouver'),
('A', 4, 'FL'),
('A', 5, 'Buffalo'),
('A', 1, 'CA'),
('A', 10, 'Miami'),
('A', 12, 'Los Angeles'),
('A', 7, 'Reno'),
('A', 12, 'San Francisco'),
('A', 10, 'Orlando'),
('A', 12, 'Sacramento');
Now the recursive query:
-- Note: The 'Code' field isn't used, but you could add it to display more info.
;WITH MyCTE AS (
SELECT ID, Name, 0 AS TreeLevel, CAST(ID AS VARCHAR(255)) AS TreePath
FROM tblLocations T1
WHERE ParentID IS NULL
UNION ALL
SELECT T2.ID, T2.Name, TreeLevel + 1, CAST(TreePath + '.' + CAST(T2.ID AS VARCHAR(255)) AS VARCHAR(255)) AS TreePath
FROM tblLocations T2
INNER JOIN MyCTE itms ON itms.ID = T2.ParentID
)
-- Note: The 'replicate' function is not needed. Added it to give a visual of the results.
SELECT ID, Replicate('.', TreeLevel * 4)+Name 'Name', TreeLevel, TreePath
FROM MyCTE
ORDER BY TreePath;
I believe that you need to add the following to the results of your CTE...
BranchID = some kind of identifier that uniquely identifies the branch. Forgive me for not being more specific, but I'm not sure what identifies a branch for your needs. Your example shows a binary tree in which all branches flow back to the root.
ItemTypeID where (for example) 0 = Product and 1 = service.
Parent = identifies the parent.
If those exist in the output, I think you should be able to use the output from your query as either another CTE or as the FROM clause in a query. Order by BranchID, ItemTypeID, Parent.

Query to find the record with most matching columns, where the number of columns and names of columns is unknown?

I have two tables, X and Y, with identical schema but different records. Given a record from X, I need a query to find the closest matching record in Y that contains NULL values for non-matching columns. Identity columns should be excluded from the comparison. For example, if my record looked like this:
------------------------
id | col1 | col2 | col3
------------------------
0 |'abc' |'def' | 'ghi'
And table Y looked like this:
------------------------
id | col1 | col2 | col3
------------------------
6 |'abc' |'def' | 'zzz'
8 | NULL |'def' | NULL
Then the closest match would be record 8, since where the columns don't match, there are NULL values. 6 WOULD have been the closest match, but the 'zzz' disqualified it.
What's unique about this problem is that the schema of the tables is unknown besides the id column and the data types. There could be 4 columns, or there could be 7 columns. We just don't know - it's dynamic. All we know is that there is going to be an 'id' column and that the columns will be strings, either varchar or nvarchar.
What is the best query in this case to pick the closest matching record out of Y, given a record from X? I'm actually writing a function. The input is an integer (the id of a record in X) and the output is an integer (the id of a record in Y, or NULL). I'm an SQL novice, so a brief explanation of what's happening in your solution would help me greatly.
There could be 4 columns, or there could be 7 columns.... I'm actually writing a function.
This is an impossible task. Because functions are deterministic, so you cannot have a function that will work on an arbitrary table structure, using dynamic SQL. A stored procedure, sure, but not a function.
However, the below shows you a way using FOR XML and some decomposing of the XML to unpivot rows into column names and values which can then be compared. The technique used here and the queries can be incorporated into a stored procedure.
MS SQL Server 2008 Schema Setup:
-- this is the data table to match against
create table t1 (
id int,
col1 varchar(10),
col2 varchar(20),
col3 nvarchar(40));
insert t1
select 6, 'abc', 'def', 'zzz' union all
select 8, null , 'def', null;
-- this is the data with the row you want to match
create table t2 (
id int,
col1 varchar(10),
col2 varchar(20),
col3 nvarchar(40));
insert t2
select 0, 'abc', 'def', 'ghi';
GO
Query 1:
;with unpivoted1 as (
select n.n.value('local-name(.)','nvarchar(max)') colname,
n.n.value('.','nvarchar(max)') value
from (select (select * from t2 where id=0 for xml path(''), type)) x(xml)
cross apply x.xml.nodes('//*[local-name()!="id"]') n(n)
), unpivoted2 as (
select x.id,
n.n.value('local-name(.)','nvarchar(max)') colname,
n.n.value('.','nvarchar(max)') value
from (select id,(select * from t1 where id=outr.id for xml path(''), type) from t1 outr) x(id,xml)
cross apply x.xml.nodes('//*[local-name()!="id"]') n(n)
)
select TOP(1) WITH TIES
B.id,
sum(case when A.value=B.value then 1 else 0 end) matches
from unpivoted1 A
join unpivoted2 B on A.colname = B.colname
group by B.id
having max(case when A.value <> B.value then 1 end) is null
ORDER BY matches;
Results:
| ID | MATCHES |
----------------
| 8 | 1 |

Resources