SQL Server: alternatives to INTERSECT

I am attempting to implement name search functionality in a SQL Server stored procedure.
Three tables are involved with the following definitions:
Employee table (other columns removed for brevity)
CREATE TABLE Payroll.Employee
(
EmployeeID INT NOT NULL IDENTITY(1,1),
EmployeeName NVARCHAR(50) NOT NULL ,
CONSTRAINT PK_Employee PRIMARY KEY CLUSTERED (EmployeeID),
);
Names table (single names stored with a unique KeyNameID)
CREATE TABLE Payroll.KeyName
(
KeyNameID INT NOT NULL IDENTITY(1,1),
KeyName NVARCHAR(20) NOT NULL ,
RwVersion ROWVERSION NOT NULL,
CONSTRAINT PK_KeyName PRIMARY KEY CLUSTERED (KeyNameID),
CONSTRAINT UC_KeyName UNIQUE (KeyName)
);
EmployeeName (employee's name stored using the id for each single name)
CREATE TABLE Payroll.EmployeeName
(
EmployeeNameID INT NOT NULL IDENTITY(1,1),
KeyNameID INT NOT NULL,
EmployeeID INT NOT NULL,
CONSTRAINT PK_EmployeeName PRIMARY KEY CLUSTERED (EmployeeNameID),
CONSTRAINT UC_EmployeeName UNIQUE (KeyNameID,EmployeeID)
);
The Employee table has the following rows:
INSERT INTO Employee (EmployeeID, EmployeeName) VALUES (1, 'ayub kassim');
INSERT INTO Employee (EmployeeID, EmployeeName) VALUES (2, 'eric yuda');
INSERT INTO Employee (EmployeeID, EmployeeName) VALUES (3, 'james kassim');
Each of the above names is split and stored in the KeyName table as follows:
INSERT INTO KeyName (KeyNameID, KeyName) VALUES (1, 'ayub');
INSERT INTO KeyName (KeyNameID, KeyName) VALUES (2, 'eric');
INSERT INTO KeyName (KeyNameID, KeyName) VALUES (3, 'james');
INSERT INTO KeyName (KeyNameID, KeyName) VALUES (4, 'kassim');
INSERT INTO KeyName (KeyNameID, KeyName) VALUES (5, 'yuda');
The KeyNameIDs are then used to identify each employee's single name as follows (two records per employee):
INSERT INTO EmployeeName (EmployeeNameID, KeyNameID, EmployeeID) VALUES (1, 1, 1);
INSERT INTO EmployeeName (EmployeeNameID, KeyNameID, EmployeeID) VALUES (2, 4, 1);
INSERT INTO EmployeeName (EmployeeNameID, KeyNameID, EmployeeID) VALUES (3, 2, 2);
INSERT INTO EmployeeName (EmployeeNameID, KeyNameID, EmployeeID) VALUES (4, 5, 2);
INSERT INTO EmployeeName (EmployeeNameID, KeyNameID, EmployeeID) VALUES (5, 3, 3);
INSERT INTO EmployeeName (EmployeeNameID, KeyNameID, EmployeeID) VALUES (6, 4, 3);
My search code is as follows:
CREATE PROCEDURE dbo.uspEmployeeSearch
@SearchString1 NVARCHAR(20),
@SearchString2 NVARCHAR(20)
AS
DECLARE @StringLength INT = LEN(@SearchString2)
SELECT @SearchString1 = RTRIM(@SearchString1) + '%'
SELECT @SearchString2 = RTRIM(@SearchString2) + '%'
SET NOCOUNT ON
IF @StringLength = 0
SELECT
EmployeeName,
taxpin
FROM
Employee
JOIN
EmployeeName ON Employee.EmployeeId = EmployeeName.EmployeeId
JOIN
KeyName ON KeyName.KeyNameId = EmployeeName.KeyNameId
WHERE
KeyName.KeyName LIKE @SearchString1;
ELSE
SELECT
EmployeeName,
taxpin
FROM
Employee
JOIN
EmployeeName ON Employee.EmployeeId = EmployeeName.EmployeeId
JOIN
KeyName ON KeyName.KeyNameId = EmployeeName.KeyNameId
WHERE
KeyName.KeyName LIKE @SearchString1
INTERSECT
SELECT
EmployeeName,
taxpin
FROM
Employee
JOIN
EmployeeName ON Employee.EmployeeId = EmployeeName.EmployeeId
JOIN
KeyName ON KeyName.KeyNameId = EmployeeName.KeyNameId
WHERE
KeyName.KeyName LIKE @SearchString2
The procedure expects two parameters. Where only one name is being searched, the second parameter would be a zero-length string.
So far it works with around 31% of the cost on SORT. The nested loops use 'index seek' which is good for me.
In short, if I am searching for 'ayub kassim' I only want employee records with the full name 'ayub kassim'. But if I search for 'kassim' I want all employee records with 'kassim' in the name.
My question is: Is it possible to implement this stored procedure using JOINS only, without the proprietary INTERSECT clause?
For the record I do not want to use LIKE because I need a very fast procedure and my employee table could run into hundreds of thousands of records.
Thanks in advance for your help.
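One join-only shape for the two-term case, offered as an untested sketch and not a definitive answer: join EmployeeName and KeyName once per search term and require both matches on the same employee. It assumes the tables live in the Payroll schema as defined above and that both parameters already carry the trailing '%'. The aliases (e, en1, k1, en2, k2) are illustrative, and EmployeeID is returned in place of taxpin because taxpin is not in the posted table definition. DISTINCT stands in for the duplicate removal that INTERSECT performs, since a prefix pattern can match more than one key name for the same employee. As a side note, INTERSECT is part of standard SQL and is not specific to SQL Server.

```sql
-- Assumed inputs: both search terms already end with the '%' wildcard.
DECLARE @SearchString1 NVARCHAR(21) = N'ayub%';
DECLARE @SearchString2 NVARCHAR(21) = N'kassim%';

SELECT DISTINCT
    e.EmployeeID,
    e.EmployeeName
FROM Payroll.Employee AS e
-- First pass over the name tables: the employee has a key name matching term 1.
JOIN Payroll.EmployeeName AS en1 ON en1.EmployeeID = e.EmployeeID
JOIN Payroll.KeyName      AS k1  ON k1.KeyNameID   = en1.KeyNameID
-- Second pass: the same employee also has a key name matching term 2.
JOIN Payroll.EmployeeName AS en2 ON en2.EmployeeID = e.EmployeeID
JOIN Payroll.KeyName      AS k2  ON k2.KeyNameID   = en2.KeyNameID
WHERE k1.KeyName LIKE @SearchString1
  AND k2.KeyName LIKE @SearchString2;
```

Whether this beats the INTERSECT plan depends on the indexes and data distribution, so comparing the two execution plans on realistic data is the only reliable check.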

Related

SQL filter results based on user/role combination

Given the following tables, I need to filter the things data based on which user is making the call as well as additional (optional) filters that consist of a user_id/role combination.
At any time a user should only receive results for things that they are linked to unless they have full access. The additional filters are AND filters meaning that the results should satisfy all the filters.
The parameters @user_id, @has_full_access and user_id/role filters are passed through using Dapper.
CREATE TABLE users
(
id int NOT NULL
CONSTRAINT PK_users PRIMARY KEY CLUSTERED (id)
)
CREATE TABLE user_roles
(
user_id int NOT NULL,
role varchar(10) NOT NULL
CONSTRAINT FK_user_roles_users FOREIGN KEY(user_id) REFERENCES users(id)
)
CREATE TABLE things
(
id int NOT NULL
CONSTRAINT PK_things PRIMARY KEY CLUSTERED (id)
)
CREATE TABLE thing_permissions
(
thing_id int NOT NULL,
user_id int NOT NULL,
role varchar(10) NOT NULL
CONSTRAINT FK_thing_permissions_things FOREIGN KEY(thing_id) REFERENCES things(id),
CONSTRAINT FK_thing_permissions_users FOREIGN KEY(user_id) REFERENCES users(id)
)
INSERT INTO users VALUES (1)
INSERT INTO users VALUES (2)
INSERT INTO users VALUES (3)
INSERT INTO users VALUES (4)
INSERT INTO users VALUES (5)
INSERT INTO user_roles VALUES (1, 'Admin')
INSERT INTO user_roles VALUES (2, 'Creator')
INSERT INTO user_roles VALUES (2, 'Owner')
INSERT INTO user_roles VALUES (3, 'Creator')
INSERT INTO user_roles VALUES (3, 'Owner')
INSERT INTO user_roles VALUES (4, 'Creator')
INSERT INTO user_roles VALUES (5, 'Owner')
INSERT INTO things VALUES (1)
INSERT INTO things VALUES (2)
INSERT INTO things VALUES (3)
INSERT INTO things VALUES (4)
INSERT INTO things VALUES (5)
INSERT INTO thing_permissions VALUES (1, 2, 'Creator')
INSERT INTO thing_permissions VALUES (1, 3, 'Creator')
INSERT INTO thing_permissions VALUES (1, 2, 'Owner')
INSERT INTO thing_permissions VALUES (2, 2, 'Creator')
INSERT INTO thing_permissions VALUES (2, 5, 'Owner')
INSERT INTO thing_permissions VALUES (3, 4, 'Creator')
INSERT INTO thing_permissions VALUES (3, 3, 'Owner')
INSERT INTO thing_permissions VALUES (3, 5, 'Owner')
INSERT INTO thing_permissions VALUES (4, 3, 'Creator')
INSERT INTO thing_permissions VALUES (4, 5, 'Owner')
INSERT INTO thing_permissions VALUES (5, 2, 'Creator')
The following are some examples of various input combinations as well as the expected results.
--Scenario 1:
--Expected Results: 1, 2, 3, 4, 5
DECLARE @user_id int = 1
DECLARE @has_full_access bit = 1
DECLARE @filters TABLE (user_id int, [role] varchar(10))
--Scenario 2:
--Expected Results: 1, 2, 5
DECLARE @user_id int = 2
DECLARE @has_full_access bit = 0
DECLARE @filters TABLE (user_id int, [role] varchar(10))
--Scenario 3:
--Expected Results: 1
DECLARE @user_id int = 1
DECLARE @has_full_access bit = 1
DECLARE @filters TABLE (user_id int, [role] varchar(10))
INSERT INTO @filters VALUES (2, 'Creator')
INSERT INTO @filters VALUES (2, 'Owner')
--Scenario 4:
--Expected Results: 3
DECLARE @user_id int = 1
DECLARE @has_full_access bit = 1
DECLARE @filters TABLE (user_id int, [role] varchar(10))
INSERT INTO @filters VALUES (3, 'Owner')
INSERT INTO @filters VALUES (5, 'Owner')
--Scenario 5:
--Expected Results: 1
DECLARE @user_id int = 2
DECLARE @has_full_access bit = 0
DECLARE @filters TABLE (user_id int, [role] varchar(10))
INSERT INTO @filters VALUES (3, 'Creator')
--Scenario 6:
--Expected Results: no results
DECLARE @user_id int = 1
DECLARE @has_full_access bit = 1
DECLARE @filters TABLE (user_id int, [role] varchar(10))
INSERT INTO @filters VALUES (2, 'Creator')
INSERT INTO @filters VALUES (4, 'Creator')
Here is a SQL Fiddle with the setup.
At the moment, I have the following function that returns all the things as well as the role(s) in which the user is linked.
CREATE FUNCTION GetMyThings (@user_id INT, @has_full_access BIT)
RETURNS TABLE
AS
RETURN
(
SELECT t.id, 'Admin' AS role
FROM things t
WHERE @has_full_access = 1
UNION
SELECT t.id, tp.role
FROM things t
INNER JOIN thing_permissions tp ON tp.thing_id = t.id
WHERE tp.user_id = @user_id
)
I use this function to get a list of things that the calling user has access to as well as those for each user in the filters. Finally I return the results that are in both these data sets.
DECLARE @my_things TABLE (id INT)
INSERT INTO @my_things SELECT id FROM GetMyThings(@user_id, @has_full_access)
DECLARE @filtered_things TABLE (id INT)
INSERT INTO @filtered_things SELECT ft.id FROM @filters f CROSS APPLY (SELECT DISTINCT id, role FROM GetMyThings(f.user_id, 0)) ft WHERE ft.role = f.role GROUP BY ft.id HAVING COUNT(ft.id) >= (SELECT COUNT(user_id) FROM @filters)
DECLARE @has_filter BIT = (SELECT has_filter = CASE WHEN (COUNT(user_id) > 0) THEN 1 ELSE 0 END FROM @filters)
DECLARE @final_things TABLE (id INT)
INSERT INTO @final_things SELECT id FROM @my_things WHERE @has_filter = 0 OR id IN (SELECT id FROM @filtered_things)
SELECT * FROM @final_things
Is there a better way of doing this? My solution works but with bigger data sets it seems as if the function slows the query down when compared to selecting from the original data.
I've also tried using views, but because I need the @has_full_access parameter and separate SELECTs UNIONed together, I cannot add a WHERE to each SELECT.
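One way to express the same rules as a single statement, without the function or the intermediate table variables, is sketched below. It is a suggestion based only on the sample schema above and has not been measured against larger data sets. The idea is relational division: keep a thing when no filter row exists that lacks a matching thing_permissions row. The aliases (mine, f, tp) are illustrative names.

```sql
-- Inputs for scenario 5 from the question.
DECLARE @user_id int = 2;
DECLARE @has_full_access bit = 0;
DECLARE @filters TABLE (user_id int, [role] varchar(10));
INSERT INTO @filters VALUES (3, 'Creator');

SELECT t.id
FROM things AS t
WHERE
    -- The caller must have full access or be linked to the thing in any role.
    (
        @has_full_access = 1
        OR EXISTS (SELECT 1
                   FROM thing_permissions AS mine
                   WHERE mine.thing_id = t.id
                     AND mine.user_id = @user_id)
    )
    -- Every filter row must be matched: no filter may lack a permission row.
    -- With an empty @filters table this condition is trivially true.
    AND NOT EXISTS (SELECT 1
                    FROM @filters AS f
                    WHERE NOT EXISTS (SELECT 1
                                      FROM thing_permissions AS tp
                                      WHERE tp.thing_id = t.id
                                        AND tp.user_id = f.user_id
                                        AND tp.[role] = f.[role]))
ORDER BY t.id;
```

Traced by hand against the sample rows, this matches the expected results of all six scenarios; for scenario 5 as declared here it leaves only thing 1. An index on thing_permissions (thing_id, user_id, role) would likely help both EXISTS probes, though that is an assumption to verify with the actual execution plan.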

How to show only one row

I have this table structure and sample data. I want to get only one row per master record, but instead I get one row for each of its child records.
--DROP TABLE [Detail];
--DROP TABLE [Master];
--CREATE TABLE [Master]
--(
--ID INT NOT NULL PRIMARY KEY,
--Code VARCHAR(25)
--);
--INSERT INTO [Master] VALUES (1, 'CASH');
--INSERT INTO [Master] VALUES (2, 'CASH');
--CREATE TABLE [Detail]
--(
--ID INT NOT NULL PRIMARY KEY,
--MasterID INT,
--DrAmount Numeric,
--CrAmount Numeric,
--CONSTRAINT FK_MASTER FOREIGN KEY (MasterID)
--REFERENCES [Master](ID)
--);
--INSERT INTO [Detail] VALUES (1, 1, '2200', NULL);
--INSERT INTO [Detail] VALUES (2, 1, NULL, '3200');
--INSERT INTO [Detail] VALUES (3, 1, '1000', NULL);
--INSERT INTO [Detail] VALUES (4, 2, NULL, '3200');
--INSERT INTO [Detail] VALUES (5, 2, '3200', NULL);
Here is the query and result:
SELECT [MASTER].[Code], [DETAIL].[MasterID], [DETAIL].[CrAmount]
FROM [MASTER], [DETAIL]
WHERE [MASTER].[ID] = [DETAIL].[MasterID]
Looks like you need GROUP BY and, as @HoneyBadger suggests, it would be better to use the modern explicit join syntax, which is much clearer:
select m.code, d.masterid, sum(d.cramount) amount
from [master] m
join [detail] d on m.[id] = d.[masterid]
group by m.code, d.masterid
Result:
code masterid amount
CASH 1 3200
CASH 2 3200

SQL Server: Column name or number of supplied values does not match table definition

I created a table with the fields below in SQL Server 2016:
[ID] int primary key not null,[FirstName] nvarchar(50),[LastName] nvarchar(50),[Gender] nvarchar(10),[Salary] int,[DepartmentId] int
I forgot to make ID an identity column, so I dropped it and added a new column named new_id as identity(1,1).
Now when I insert some rows I get the following error:
Column name or number of supplied values does not match table definition
I select the database at the top and then run a query with these statements:
insert into Employees values ('Mark','Hastings','Male',60000,1)
insert into Employees values ('Steve','Pound','Male',45000,3)
insert into Employees values ('Ben','Hoskins','Male',70000,1)
insert into Employees values ('Philip','Hastings','Male',45000,2)
insert into Employees values ('Mary','Lambeth','female',30000,2)
insert into Employees values ('Valaries','Vikings','Female',35000,3)
insert into Employees values ('John','Stanmore','Male',80000,1)
Perhaps you forgot to define the ID column as an identity column.
Change your ID column to generate values automatically with the IDENTITY property:
[ID] int Identity primary key not null,[FirstName] nvarchar(50).........
Your table definition should look like this:
CREATE TABLE Employees (
[ID] INT IDENTITY PRIMARY KEY NOT NULL
,[FirstName] NVARCHAR(50)
,[LastName] NVARCHAR(50)
,[Gender] NVARCHAR(10)
,[Salary] INT
,[DepartmentId] INT
)
That way the ID column will generate its values automatically. It is also always advisable to list the column names when inserting.
Try this,
insert into Employees([FirstName],[LastName],[Gender],[Salary],[DepartmentId]) values ('Mark','Hastings','Male',60000,1)
You can specify the columns to insert on, which would avoid the ambiguity.
INSERT INTO Employees([FirstName],[LastName],[Gender],[Salary],[DepartmentId])
VALUES ('Mark','Hastings','Male',60000,1)
or, another way to insert
INSERT INTO Employees([FirstName],[LastName],[Gender],[Salary],[DepartmentId])
SELECT 'Mark','Hastings','Male',60000,1
UNION ALL SELECT 'Steve','Pound','Male',45000,3
UNION ALL SELECT 'Ben','Hoskins','Male',70000,1
UNION ALL SELECT 'Philip','Hastings','Male',45000,2
UNION ALL SELECT 'Mary','Lambeth','female',30000,2
UNION ALL SELECT 'Valaries','Vikings','Female',35000,3
UNION ALL SELECT 'John','Stanmore','Male',80000,1
You should include the field names you are assigning:
insert into Employees ([FirstName], [LastName], [Gender], [Salary] ,[DepartmentId]) values ('Mark','Hastings','Male',60000,1)
insert into Employees ([FirstName], [LastName], [Gender], [Salary] ,[DepartmentId]) values ('Steve','Pound','Male',45000,3)
insert into Employees ([FirstName], [LastName], [Gender], [Salary] ,[DepartmentId]) values ('Ben','Hoskins','Male',70000,1)
insert into Employees ([FirstName], [LastName], [Gender], [Salary] ,[DepartmentId]) values ('Philip','Hastings','Male',45000,2)
insert into Employees ([FirstName], [LastName], [Gender], [Salary] ,[DepartmentId]) values ('Mary','Lambeth','female',30000,2)
insert into Employees ([FirstName], [LastName], [Gender], [Salary] ,[DepartmentId]) values ('Valaries','Vikings','Female',35000,3)
insert into Employees ([FirstName], [LastName], [Gender], [Salary] ,[DepartmentId]) values ('John','Stanmore','Male',80000,1)

Foreign Key constraint failure and error message when inserting values

Hopefully someone can help. I have created two tables, Customer and Orders, as follows:
CREATE TABLE Customer
(
CustomerID int NOT NULL PRIMARY KEY,
CustomerName varchar(25)
);
The other columns in Customer table are not relevant to my question, so I will not include them here. My CustomerID numbers are from 1 through to 15, all unique.
The second table I created is Orders, as follows:
CREATE TABLE Orders
(
OrderID smallint NOT NULL PRIMARY KEY,
OrderDate date NOT NULL,
CustomerID int FOREIGN KEY REFERENCES Customer (CustomerID)
);
My INSERT statement is as follows:
INSERT INTO Orders (OrderID, OrderDate, CustomerID)
VALUES
(1001, '2008-10-21', 1),
(1002, '2008-10-21', 8),
(1003, '2008-10-22', 15),
(1004, '2008-10-22', 5),
(1005, '2008-10-24', 3),
(1006, '2008-10-24', 2),
(1007, '2008-10-27', 11),
(1008, '2008-10-30', 12),
(1009, '2008-11-05', 4),
(1010, '2008-11-05', 1);
When I try to insert my values into the Order table, I get the following error message....
Msg 547, Level 16, State 0, Line 1.....The INSERT statement conflicted
with the FOREIGN KEY constraint "FK__OrderT__Customer__2D27B809". The
conflict occurred table "dbo.Customer", column 'CustomerID'. The
statement has been terminated.
The numbers for CustomerID in my Order table, are (1; 1; 2; 3; 4; 5; 8; 11; 12 and 15). Therefore I have checked that all my CustomerID numbers in Order table are also in the Customer table.
So my questions are
1) Has the insert failed because the CustomerID column in the Customer table is NOT NULL and I mistakenly made the CustomerID column nullable in Order?
2) If the answer to the above question is yes, then is it possible for me to (a) drop the foreign key on the CustomerID column in Order (b) change the column to NOT NULL and (c) then add the foreign key constraint again to this column and then insert the values again?
It might be easier to drop and re-create the table Order. But I am curious if option 2 would work, re dropping and adding a foreign key on the same column.
Hopefully I am on the right track with why I think the error occurred; feel free to correct me if I am wrong.
Thanks everyone
Josie
1) It should be NOT NULL in both. However, the error occurs because you attempted to insert a CustomerID that is not in the Customer table.
2) You can simply alter the table and make the column NOT NULL (that was not the cause of the error).
Sample:
CREATE TABLE Customer
(
CustomerID INT NOT NULL
PRIMARY KEY ,
CustomerName VARCHAR(25)
);
CREATE TABLE Orders
(
OrderID INT NOT NULL
PRIMARY KEY ,
OrderDate DATE NOT NULL ,
CustomerID INT FOREIGN KEY REFERENCES Customer ( CustomerID )
);
INSERT [Customer] ( [CustomerID], [CustomerName] )
VALUES ( 1, 'Customer 1' ),
( 2, 'Customer 2' ),
( 3, 'Customer 3' ),
( 4, 'Customer 4' ),
( 5, 'Customer 5' ),
( 6, 'Customer 6' );
INSERT [Orders] ( [OrderID], [OrderDate], [CustomerID] )
VALUES
( 1, GETDATE(), 1 ),
( 2, GETDATE(), 2 ),
( 3, GETDATE(), 3 ),
( 4, GETDATE(), 4 ),
( 5, GETDATE(), 5 ),
( 6, GETDATE(), 6 );
INSERT [Orders] ( [OrderID], [OrderDate], [CustomerID] )
VALUES ( 7, GETDATE(), 7 );
The last insert would fail, because a Customer with CustomerID 7 doesn't exist.
Update: I later saw your sample insert. You can find the offending ID like this:
DECLARE @ids TABLE ( id INT );
INSERT @ids ( [id] )
VALUES ( 1 ),
( 8 ),
( 15 ),
( 5 ),
( 3 ),
( 2 ),
( 11 ),
( 12 ),
( 4 ),
( 1 );
SELECT *
FROM @ids AS [i]
WHERE id NOT IN ( SELECT CustomerID
FROM [Customer] AS [c] );
1) Has the insert values failed because my CustomerID column in
Customer table in NOT NULL and I in error made CustomerID column NULL
in Order.
No. The error is not related to allowing NULL in the Order table. A NULL value will be allowed and not checked for referential integrity.
The foreign key violation error means you are attempting to insert a non-NULL CustomerID value into the Order table that does not exist in Customer. If you are certain the CustomerID values exist, perhaps the column mapping is wrong. Try specifying an explicit column list on the INSERT statement.
This error happens when you try to insert a value into a foreign key column and that value does not exist in its parent table. For example, you are trying to insert value X into CustomerID in the Order table, but X does not exist in the Customer table. The constraint is there to enforce referential integrity. So all you need to do is check the new values you are about to insert and find any that break this rule.
However if you want to get an answer for your second question, you can try the below script:
create table t1
(
Id int primary key,
Name varchar(50) null
)
create table t2
(
Id int,
FK int null foreign key references t1(id)
)
go
alter table t2
alter column FK int not null
The identity column is set up once, when you create the table.
The ids assigned by an identity column start at the seed and are never reused.
So if you find that your customer ids start from 6, it means that in the past you added 5 customers and then removed them.
If, for any reason, you want to use fixed ids, don't use identity. In that case you take full responsibility for setting a unique value.
I suggest never relying on fixed ids. If you must add orders from a script, use the CustomerName (if unique) or any natural unique key.
You could use a script like this:
DECLARE @newOrders TABLE (OrderID INT, CustomerName VARCHAR(25), OrderDate DATE);
INSERT INTO @newOrders (OrderID, CustomerName, OrderDate) VALUES
(1001, 'some-username', '2008-10-21'),
(1002, 'another-username', '2008-10-21');
INSERT INTO Orders(OrderID, CustomerId, OrderDate)
SELECT
o.OrderID,
c.CustomerID,
o.OrderDate
FROM @newOrders o
JOIN Customer c
ON c.CustomerName = o.CustomerName;
This way you insert the correct CustomerID.
Note that in very rare cases (think twice before using it) you could insert values into identity columns using the SET IDENTITY_INSERT statement.
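For completeness, a minimal sketch of what SET IDENTITY_INSERT looks like. The table dbo.CustomerDemo is hypothetical and exists only for this illustration; the Customer table in the question is not declared with IDENTITY, so this applies only if yours is.

```sql
-- Hypothetical demo table; the IDENTITY property on CustomerID is an assumption.
CREATE TABLE dbo.CustomerDemo
(
    CustomerID   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CustomerName VARCHAR(25)
);

-- Allow explicit values for the identity column of this one table.
-- Only one table per session can have this setting ON at a time.
SET IDENTITY_INSERT dbo.CustomerDemo ON;

-- An explicit column list is required while IDENTITY_INSERT is ON.
INSERT INTO dbo.CustomerDemo (CustomerID, CustomerName)
VALUES (7, 'Customer 7');

-- Switch it back off so later inserts use generated values again.
SET IDENTITY_INSERT dbo.CustomerDemo OFF;
```

Leaving the setting ON is an easy mistake to make, which is one reason to reserve it for one-off data fixes and migrations.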

TSQL - Bringing Data Together from Different Sources ...refactoring PK and FKs

I have various offices and one central head office. Each office has its own SQL Server 2008 instance so each office has its own data set with its own set of IDs.
Each office has already imported data into the head office and stored the data on a set of STAGING_Tables that look like this.
DECLARE @STAGING_COUNTRY TABLE
(
Original_CountryID INT NOT NULL,
OfficeID VARCHAR(10) NOT NULL,
Data VARCHAR(200) NOT NULL
);
DECLARE @STAGING_CITY TABLE
(
Original_CityID INT NOT NULL,
Original_CountryID_FK INT NOT NULL,
OfficeID VARCHAR(10) NOT NULL,
OtherData VARCHAR(100) NOT NULL
);
STAGING_COUNTRY has the original ID of each row (which of course will be duplicated, since each office will have ID=1 for the first row of its Country table) and also an OfficeID value that, together with Original_CountryID, makes a unique value.
STAGING_CITY also has the original ID of each row, the OfficeID value that represents each office and, in this case, an FK to CountryID (at this point it references Original_CountryID, which can be resolved in conjunction with the OfficeID).
Let's add some dummy rows:
/* ADD DUMMY VALUES TO STAGING_COUNTRY */
INSERT INTO @STAGING_COUNTRY (Original_CountryID, OfficeID, Data) VALUES (1, 'Office1', 'USA')
INSERT INTO @STAGING_COUNTRY (Original_CountryID, OfficeID, Data) VALUES (2, 'Office1', 'Canada')
INSERT INTO @STAGING_COUNTRY (Original_CountryID, OfficeID, Data) VALUES (3, 'Office1', 'Japan')
INSERT INTO @STAGING_COUNTRY (Original_CountryID, OfficeID, Data) VALUES (1, 'Office2', 'USA')
INSERT INTO @STAGING_COUNTRY (Original_CountryID, OfficeID, Data) VALUES (1, 'Office2', 'Italy')
INSERT INTO @STAGING_COUNTRY (Original_CountryID, OfficeID, Data) VALUES (3, 'Office2', 'Canada')
INSERT INTO @STAGING_COUNTRY (Original_CountryID, OfficeID, Data) VALUES (3, 'Office3', 'Canada')
INSERT INTO @STAGING_COUNTRY (Original_CountryID, OfficeID, Data) VALUES (2, 'Office3', 'France')
INSERT INTO @STAGING_COUNTRY (Original_CountryID, OfficeID, Data) VALUES (3, 'Office3', 'USA')
/* ADD DUMMY VALUES TO STAGING_CITY */
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK, OfficeID, OtherData) VALUES (1, 1, 'Office1', 'New York')
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK, OfficeID, OtherData) VALUES (2, 1, 'Office1', 'Vancouver')
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK, OfficeID, OtherData) VALUES (3, 1, 'Office1', 'Tokia')
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK, OfficeID, OtherData) VALUES (1, 2, 'Office2', 'New York')
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK, OfficeID, OtherData) VALUES (2, 2, 'Office2', 'Rome')
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK, OfficeID, OtherData) VALUES (3, 2, 'Office2', 'Vancouver')
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK, OfficeID, OtherData) VALUES (1, 3, 'Office3', 'Vancouver')
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK, OfficeID, OtherData) VALUES (2, 3, 'Office3', 'Paris')
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK, OfficeID, OtherData) VALUES (3, 3, 'Office3', 'New York')
The central head office wants to run reports from a central database that contains a copy of all the data from all offices. To optimize this reporting DB, we need to reshuffle the STAGING_Tables a bit and reorganize the data into FINAL_Tables that look like this:
DECLARE @FINAL_COUNTRY TABLE
(
CountryID INT IDENTITY PRIMARY KEY,
Original_CountryID INT NOT NULL,
OfficeID VARCHAR(10) NOT NULL,
Data VARCHAR(200) NOT NULL
);
DECLARE @FINAL_CITY TABLE
(
CityID INT IDENTITY PRIMARY KEY,
Original_CityID INT NOT NULL,
CountryID_FK INT NOT NULL,
OfficeID VARCHAR(10) NOT NULL,
OtherData VARCHAR(100) NOT NULL
);
PROBLEM:
The FINAL_COUNTRY and FINAL_CITY tables should be as optimized as possible for reporting purposes. These reports will be written in T-SQL stored procedures.
QUESTION:
What is the best way to reorganize the FINAL_Tables so that each record has a TRUE PK identifier (like in the original Office_Tables) and each FK is updated to point to the right newly created PK ...at the server level?
NOTE:
Please note that both staging & final tables are inside the same DB, on the server.
Also we still need to keep the OriginalIDs on the FINAL_Tables for other purposes.
GOALS:
The main goal here is to reorganize into a set of tables that can be easily indexed for performance purposes.
Please ask for more info if needed.
Many thanks in advance.
This is probably just a partial answer. You may want to consider putting a generic IDENTITY id on each of your staging tables. Something like:
DECLARE @STAGING_COUNTRY TABLE
(
Stage_Country_id INT IDENTITY(1,1) NOT NULL,
Original_CountryID INT NOT NULL,
OfficeID VARCHAR(10) NOT NULL,
Data VARCHAR(200) NOT NULL
);
DECLARE @STAGING_CITY TABLE
(
Stage_City_id INT IDENTITY(1,1) NOT NULL,
Original_CityID INT NOT NULL,
Original_CountryID_FK INT NOT NULL,
OfficeID VARCHAR(10) NOT NULL,
OtherData VARCHAR(100) NOT NULL
);
Your final tables should not have the original_ids as you should only have 1 record per city / country in them.
Then I think you'd need some sort of cross reference tables to bridge your final tables to your stage tables. That would look like this:
DECLARE @COUNTRY_xref TABLE
(
country_xref_id INT IDENTITY(1,1) not null,
CountryID INT not null,
Stage_Country_id INT
);
DECLARE @CITY_xref TABLE
(
city_xref_id INT IDENTITY(1,1) not null,
CityID INT not null,
Stage_City_id INT not null
);
Are you also asking what the loading / conversion process would look like or was this more about the schema?
Your final tables would probably look like this:
DECLARE @FINAL_COUNTRY TABLE
(
CountryID INT IDENTITY PRIMARY KEY,
Data VARCHAR(200) NOT NULL
);
DECLARE @FINAL_CITY TABLE
(
CityID INT IDENTITY PRIMARY KEY,
CountryID_FK INT NOT NULL,
OtherData VARCHAR(100) NOT NULL
);
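If the original IDs do have to stay on the final tables, as the question requires, the remapping itself can be sketched in two set-based inserts. This uses the question's FINAL table definitions (the ones that keep Original_CountryID and OfficeID), not the slimmer ones just above, and it assumes that the pair (OfficeID, Original_CountryID) is unique in staging. The dummy rows as posted do not fully satisfy that assumption: (1, 'Office2') and (3, 'Office3') each appear twice, and the Office2 cities point at a country ID 2 that has no staging row, so the sample data would need correcting first. The aliases sc and fc are illustrative.

```sql
-- Table variables are batch-scoped, so this must run in the same batch
-- as the DECLARE statements for the staging and final tables.

-- Step 1: load countries; the IDENTITY column assigns the new CountryID.
INSERT INTO @FINAL_COUNTRY (Original_CountryID, OfficeID, Data)
SELECT Original_CountryID, OfficeID, Data
FROM @STAGING_COUNTRY;

-- Step 2: load cities, translating the office-local country ID
-- into the new CountryID through the (OfficeID, original ID) pair.
INSERT INTO @FINAL_CITY (Original_CityID, CountryID_FK, OfficeID, OtherData)
SELECT sc.Original_CityID,
       fc.CountryID,
       sc.OfficeID,
       sc.OtherData
FROM @STAGING_CITY AS sc
JOIN @FINAL_COUNTRY AS fc
  ON fc.OfficeID = sc.OfficeID
 AND fc.Original_CountryID = sc.Original_CountryID_FK;
```

With permanent tables, a unique index on the final country table over (OfficeID, Original_CountryID) would both enforce the assumption and support the lookup join.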
