Update or skip rows with a duplicate primary key during BULK INSERT (SQL Server)

I'm using the BULK INSERT method to insert rows from CSV files, but it fails on duplicate primary keys.
Here is my sample code:
Use People
Go
BULK
INSERT tblProfile
FROM 'F:\People.txt'
WITH
(
DATAFILETYPE='widechar',
CODEPAGE = 'ACP',
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n',
ERRORFILE = 'F:\ErrorRows.csv'
)
GO
I need to update fields on the rows whose primary key already exists.
For example, here is a sample of my table:
Code  Name  Family  City
-------------------------
45    Joe   Stone   USA
67    Sara  Stone   USA
68          Stone
If the CSV file contains a row with code 68, and that row has a name or city value (which is empty or NULL in my table), then the bulk insert should update and fill those fields; otherwise it should skip the duplicate primary key and insert the remaining rows.
Is something like this possible?

As DoctorMick stated here:
You could set the MAXERRORS property to quite a high value, which will allow the valid records to be inserted and the duplicates to be ignored. Unfortunately, this means that any other errors in the dataset won't cause the load to fail.
Alternatively, you could set the BATCHSIZE property, which loads the data in multiple transactions, so if there are duplicates only that batch will be rolled back.
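For reference, here is a sketch of how those two options look on the original statement (the values are illustrative only):
BULK INSERT tblProfile
FROM 'F:\People.txt'
WITH
(
    DATAFILETYPE = 'widechar',
    CODEPAGE = 'ACP',
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '\n',
    ERRORFILE = 'F:\ErrorRows.csv',
    MAXERRORS = 100000,  -- raise the tolerated error count, as suggested above
    BATCHSIZE = 10000    -- load in 10,000-row transactions so an error only rolls back that batch
)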
Or use a temp table to filter out the duplicates and apply the updates:
-- Stage the incoming rows in a temporary table (in practice, BULK INSERT into it)
INSERT INTO #tblProfile (Id, Col1)   -- temporary table
VALUES
    (3, 'S3'),
    (4, 'S4'),
    (5, 'S5');

-- Remove duplicate Ids within the staging table itself
WITH cte AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY Id ORDER BY (SELECT 0)) AS RN
    FROM #tblProfile
)
DELETE FROM cte
WHERE RN > 1;

-- Insert only the staged rows whose Id does not already exist in the target
INSERT INTO tblProfile
SELECT s.*
FROM #tblProfile s
WHERE NOT EXISTS (SELECT 1 FROM tblProfile t WHERE t.Id = s.Id);

-- For Ids that already exist, update the target from the staged values
UPDATE T
SET T.Col1 = ISNULL(T1.Col1, T.Col1)
FROM tblProfile T
JOIN #tblProfile T1 ON T.Id = T1.Id;
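If you're on SQL Server 2008 or later, the insert-or-update pair can also be expressed as a single MERGE from the staging table. This is only a sketch, and it assumes the staging table holds the same columns (Code, Name, Family, City) as the sample table in the question:
-- Sketch: upsert from the de-duplicated staging table. Existing rows only
-- have their empty/NULL fields filled in; new Codes are inserted.
MERGE tblProfile AS t
USING #tblProfile AS s
    ON t.Code = s.Code
WHEN MATCHED THEN
    UPDATE SET
        Name   = ISNULL(NULLIF(t.Name, ''), s.Name),
        Family = ISNULL(NULLIF(t.Family, ''), s.Family),
        City   = ISNULL(NULLIF(t.City, ''), s.City)
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Code, Name, Family, City)
    VALUES (s.Code, s.Name, s.Family, s.City);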

Related

How is this SQL sub-query correctly vectorising?

Sample data:
CREATE TABLE Departments (
Code INTEGER PRIMARY KEY,
Name varchar(255) NOT NULL ,
Budget decimal NOT NULL
);
CREATE TABLE Employees (
SSN INTEGER PRIMARY KEY,
Name varchar(255) NOT NULL ,
LastName varchar(255) NOT NULL ,
Department INTEGER NOT NULL ,
foreign key (department) references Departments(Code)
)
INSERT INTO Departments(Code,Name,Budget) VALUES(14,'IT',65000);
INSERT INTO Departments(Code,Name,Budget) VALUES(37,'Accounting',15000);
INSERT INTO Departments(Code,Name,Budget) VALUES(59,'Human Resources',240000);
INSERT INTO Departments(Code,Name,Budget) VALUES(77,'Research',55000);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('123234877','Michael','Rogers',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('152934485','Anand','Manikutty',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('222364883','Carol','Smith',37);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('326587417','Joe','Stevens',37);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('332154719','Mary-Anne','Foster',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('332569843','George','O''Donnell',77);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('546523478','John','Doe',59);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('631231482','David','Smith',77);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('654873219','Zacary','Efron',59);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('745685214','Eric','Goldsmith',59);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('845657245','Elizabeth','Doe',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('845657246','Kumar','Swamy',14);
Problem: "Select the names of departments with more than two employees."
Wikibooks solution:
/*With subquery*/
SELECT D.Name FROM Departments D
WHERE 2 <
(
SELECT COUNT(*)
FROM Employees
WHERE Department = D.Code
);
My question: How does this solution work? That is, how does MSSQL know which values in Departments are to be kept from the sub-query? I can't see any way that the condition WHERE Department = D.Code can return a result that is ordered in a useful way to the outer query. I don't think this is a fluke; I think I just don't understand how SQL is vectorised.
This is called a correlated subquery.
That is to say, the inner query is correlated to the outer one by use of an outer reference. In this case, that is D.Code. Therefore the subquery is being calculated for every row of D.
It's not a matter of ordering; in fact, this query can return results in any order. But the result from the subquery must be greater than 2, otherwise the WHERE predicate fails.
SELECT D.Name FROM Departments D -- Departments has been aliased as D
WHERE 2 <
(
SELECT COUNT(*)
FROM Employees
WHERE Department = D.Code -- Here the inner query is being limited by
-- the reference to the outer D table
);
I would probably use ... > 2 rather than 2 < ...
Side point: it's better to always use an explicit table reference in subqueries, e.g. e.Department = D.Code, because otherwise you could misspell a column and end up referring to an outer column instead of an inner column, and the correlation wouldn't work properly.
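For comparison, the same result can be written without a correlated subquery at all, using a join plus GROUP BY/HAVING (a sketch against the sample tables above):
-- Equivalent formulation: join, group, and keep departments with more than two employees.
-- Uses the fully qualified, aliased columns recommended in the side point.
SELECT D.Name
FROM Departments D
JOIN Employees E ON E.Department = D.Code
GROUP BY D.Code, D.Name
HAVING COUNT(*) > 2;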

I only want to insert data from table2 to table1 that doesn't exist already on MS SQL

I have a table named MarketData, and I am trying to pull certain columns from MarketData and insert them into another table called PayUps. But I only want to insert data that doesn't already exist in PayUps.
Keep in mind, the MarketData table includes more columns than I need in the PayUps table.
After I execute the query below, it keeps on adding records that are already there.
INSERT INTO [MortgagePipeline].[dbo].[PayUps]
(
RecordDate
,Investor
,Product
,MinCoupon
,[Payup(bcp)]
)
SELECT DISTINCT t1.ContractDate
,t1.SourceCode
,t1.ProductName
,t1.ContractInterestRate
,t1.ContractPrice
FROM [MortgagePipeline].[dbo].[MarketData] t1
WHERE NOT EXISTS
(
SELECT RecordDate
,Investor
,Product
,MinCoupon
,[Payup(bcp)]
FROM [MortgagePipeline].[dbo].[PayUps] t2
WHERE t2.RecordDate = t1.ContractDate
AND t2.Investor = t1.SourceCode
AND t2.Product = t1.ProductName
AND t2.MinCoupon = t1.ContractInterestRate
AND t2.[Payup(bcp)] = t1.ContractPrice
AND t2.MaxCoupon = NULL
AND t2.PayupName = NULL
)
RecordDate, Investor, Product, MinCoupon, [Payup(bcp)], MaxCoupon and PayupName are all the columns that exist in PayUps. All of those columns are populated from the MarketData table, except MaxCoupon and PayupName, which remain NULL.
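A likely culprit is the two = NULL comparisons: in T-SQL (with the default ANSI_NULLS setting), a comparison against NULL using = is never true, so the NOT EXISTS never finds a matching row and the same data keeps being inserted. A sketch of the same query with those two conditions rewritten as IS NULL:
-- Sketch only: identical to the query above except that the NULL checks
-- use IS NULL, so existing PayUps rows can actually be matched.
INSERT INTO [MortgagePipeline].[dbo].[PayUps]
(
    RecordDate, Investor, Product, MinCoupon, [Payup(bcp)]
)
SELECT DISTINCT t1.ContractDate, t1.SourceCode, t1.ProductName,
                t1.ContractInterestRate, t1.ContractPrice
FROM [MortgagePipeline].[dbo].[MarketData] t1
WHERE NOT EXISTS
(
    SELECT 1
    FROM [MortgagePipeline].[dbo].[PayUps] t2
    WHERE t2.RecordDate = t1.ContractDate
      AND t2.Investor = t1.SourceCode
      AND t2.Product = t1.ProductName
      AND t2.MinCoupon = t1.ContractInterestRate
      AND t2.[Payup(bcp)] = t1.ContractPrice
      AND t2.MaxCoupon IS NULL
      AND t2.PayupName IS NULL
)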

Show all and only rows in table 1 not in table 2 (using multiple columns)

I have one table (Table1) that has several columns used in combination: Name, TestName, DevName, Dept. When each of these four columns has a value, the record is inserted into Table2. I need to confirm that all of the records with existing values in each of these fields in Table1 were correctly copied into Table2.
I have created a query for it:
SELECT DISTINCT wr.Name, wr.TestName, wr.DEVName, wr.Dept
FROM table2 wr
WHERE NOT EXISTS (
    SELECT NULL
    FROM Table1 ym
    WHERE ym.Name = wr.Name
      AND ym.TestName = wr.TestName
      AND ym.DEVName = wr.DEVName
      AND ym.Dept = wr.Dept
)
My counts are not adding up, so I believe that this is incorrect. Can you advise me on the best way to write this query for my needs?
You can use the EXCEPT set operator for this one if the table definitions are identical.
SELECT DISTINCT ym.Name, ym.TestName, ym.DEVName, ym.Dept
FROM table1 ym
EXCEPT
SELECT DISTINCT wr.Name, wr.TestName, wr.DEVName, wr.Dept
FROM table2 wr
This returns distinct rows from the first table where there is not a match in the second table. Read more about EXCEPT and INTERSECT here: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-except-and-intersect-transact-sql?view=sql-server-2017
Your query should do the job. It checks for anything that is in Table1 but not in Table2:
SELECT ym.Name, ym.TestName, ym.DEVName, ym.Dept
FROM Table1 ym
WHERE NOT EXISTS (
SELECT 1
FROM table2
WHERE ym.Name = Name AND ym.TestName = TestName AND ym.DEVName = DEVName AND ym.Dept = Dept
)
If the structure of both tables are the same, EXCEPT is probably simpler.
IF OBJECT_ID(N'tempdb..#table1') IS NOT NULL drop table #table1
IF OBJECT_ID(N'tempdb..#table2') IS NOT NULL drop table #table2
create table #table1 (id int, value varchar(10))
create table #table2 (id int)
insert into #table1(id, value) VALUES (1,'value1'), (2,'value2'), (3,'value3')
--test here. Comment next line
insert into #table2(id) VALUES (1) --Comment/Uncomment
select * from #table1
select * from #table2
select #table1.*
from #table1
left join #table2
    on #table1.id = #table2.id
where #table2.id is null -- no matching row in #table2 (also returns everything when #table2 is empty)

SQL Server 2008: Unique constraint for a pair of values regardless of column order

I have a simple problem. How can I add a unique constraint for a table, without relating the values to their columns? For example, I have this table
ID_A ID_B
----------
1 2
... ...
In that example, I have the record (1,2). For me, (1,2) = (2,1), so I don't want to allow my database to store both values. I know I can accomplish this using triggers or checks and functions, but I was wondering if there is an instruction like
CREATE UNIQUE CONSTRAINT AS A SET_CONSTRAINT
You could write a view like this:
select 1 as Dummy
from T t1
join T t2 on t1.ID1 = t2.ID2 AND t1.ID2 = t2.ID1 --join to corresponding row
cross join TwoRows
And create a unique index on Dummy. TwoRows is a table that contains two rows with arbitrary contents. It is supposed to make the unique index fail if there ever is a row in it. Any row in this view indicates a uniqueness violation.
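Another option (not from either answer above, just a common pattern) is to add persisted computed columns that normalise the pair ordering and put a unique index on them; a sketch, with illustrative table and index names:
-- Sketch: PairLow/PairHigh always store the smaller value first, so (1,2)
-- and (2,1) produce the same normalised pair and the unique index rejects
-- the second insert. The table name here is hypothetical.
CREATE TABLE PairTable
(
    ID_A int NOT NULL,
    ID_B int NOT NULL,
    PairLow  AS (CASE WHEN ID_A < ID_B THEN ID_A ELSE ID_B END) PERSISTED,
    PairHigh AS (CASE WHEN ID_A < ID_B THEN ID_B ELSE ID_A END) PERSISTED
)
CREATE UNIQUE INDEX UX_PairTable_Pair ON PairTable (PairLow, PairHigh)
INSERT INTO PairTable (ID_A, ID_B) VALUES (1, 2)  -- succeeds
INSERT INTO PairTable (ID_A, ID_B) VALUES (2, 1)  -- fails with a duplicate key error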
You can do this using an INSTEAD OF INSERT trigger.
Demo
Table Schema
CREATE TABLE te(ID_A INT,ID_B INT)
INSERT te VALUES ( 1,2)
Trigger
Go
CREATE TRIGGER trg_name
ON te
instead OF INSERT
AS
BEGIN
IF EXISTS (SELECT 1
FROM inserted a
WHERE EXISTS (SELECT 1
FROM te b
WHERE ( ( a.id_a = b.id_b
AND a.id_b = b.id_a )
OR ( a.id_a = b.id_a
AND a.id_b = b.id_b ) )))
BEGIN
PRINT 'duplicate record'
ROLLBACK
END
ELSE
INSERT INTO te
SELECT Id_a,id_b
FROM inserted
END
SELECT * FROM te
Insert Script
INSERT INTO te VALUES (2,1) -- Duplicate
INSERT INTO te VALUES (1,2) --Duplicate
INSERT INTO te VALUES (3,2) --Will work

How to get records where a CSV column contains all the values of a filter

I have a table with departments tagged to users, like this:
User | Department
user1 | IT,HR,House Keeping
user2 | HR,House Keeping
user3 | IT,Finance,HR,Maintainance
user4 | Finance,HR,House Keeping
user5 | IT,HR,Finance
I have created an SP that takes a varchar(max) parameter as the filter (I merge it dynamically in C# code).
In the SP I create a temp table for the selected filters, e.g. if the user selects IT & Finance & HR,
I merge the string as IT##Finance##HR (in C#) and call the SP with that parameter.
In the SP I build a temp table like this:
FilterValue
IT
Finance
HR
Now the issue: how can I get the records that contain all the departments tagged to them
(users that are associated with all the values in the temp table), so that I get
User | Department
user3 | IT,Finance,HR,Maintainance
user5 | IT,HR,Finance
as output?
Please suggest an optimised way to achieve this filtering.
This design is beyond horrible -- you should really change this to use truly relational design with a dependent table.
That said, if you are not in a position to change the design, you can limp around the problem with XML, and it might give you OK performance.
Try something like this (replace '@test' with your table name as needed...). You won't even need to create your temp table -- this will jimmy your comma-delimited string around into XML, which you can then run XQuery against directly:
DECLARE @test TABLE (usr int, department varchar(1000))
insert into @test (usr, department)
values (1, 'IT,HR,House Keeping')
insert into @test (usr, department)
values (2, 'HR,House Keeping')
insert into @test (usr, department)
values (3, 'IT,Finance,HR,Maintainance')
insert into @test (usr, department)
values (4, 'Finance,HR,House Keeping')
insert into @test (usr, department)
values (5, 'IT,HR,Finance')
;WITH departments (usr, department, depts)
AS
(
SELECT usr, department, CAST(NULLIF('<department><dept>' + REPLACE(department, ',', '</dept><dept>') + '</dept></department>', '<department><dept></dept></department>') AS xml)
FROM @test
)
SELECT departments.usr, departments.department
FROM departments
WHERE departments.depts.exist('/department/dept[text()[1] eq "IT"]') = 1
AND departments.depts.exist('/department/dept[text()[1] eq "HR"]') = 1
AND departments.depts.exist('/department/dept[text()[1] eq "Finance"]') = 1
I agree with others that your design is, um, not ideal, and given the fact that it may change, as per your comment, one is not too motivated to find a really fascinating solution for the present situation.
Still, you can have one that at least works correctly (I think) and meets the situation. Here's what I've come up with:
;
WITH
UserDepartment ([User], Department) AS (
SELECT 'user1', 'IT,HR,House Keeping' UNION ALL
SELECT 'user2', 'HR,House Keeping' UNION ALL
SELECT 'user3', 'IT,Finance,HR,Maintainance' UNION ALL
SELECT 'user4', 'Finance,HR,House Keeping' UNION ALL
SELECT 'user5', 'IT,HR,Finance'
),
Filter (FilterValue) AS (
SELECT 'IT' UNION ALL
SELECT 'Finance' UNION ALL
SELECT 'HR'
),
CSVSplit AS (
SELECT
ud.*,
--x.node.value('.', 'varchar(max)')
x.Value AS aDepartment
FROM UserDepartment ud
CROSS APPLY (SELECT * FROM dbo.Split(',', ud.Department)) x
)
SELECT
c.[User],
c.Department
FROM CSVSplit c
INNER JOIN Filter f ON c.aDepartment = f.FilterValue
GROUP BY
c.[User],
c.Department
HAVING
COUNT(*) = (SELECT COUNT(*) FROM Filter)
The first two CTEs are just sample tables, the rest of the query is the solution proper.
The CSVSplit CTE uses a Split function that splits a comma-separated list into a set of items and returns them as a table. The entire CTE turns the row set of the form
----- ---------------------------
user1 department1,department2,...
... ...
into this:
----- -----------
user1 department1
user1 department2
... ...
The main SELECT joins the normalised row set with the filter table and keeps the rows where the number of matches exactly equals the number of items in the filter table. (Note: this assumes there are no duplicate department names within a single UserDepartment.Department value.)
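The query assumes a dbo.Split helper already exists. If you don't have one, here is a minimal sketch matching the signature the call site expects, Split(delimiter, string), returning a single Value column (note that the XML trick breaks if the list contains characters such as & or <):
-- Hypothetical inline table-valued function for illustration only.
CREATE FUNCTION dbo.Split (@Delimiter varchar(10), @List varchar(max))
RETURNS TABLE
AS
RETURN
(
    SELECT x.node.value('.', 'varchar(max)') AS Value
    FROM (SELECT CAST('<i>' + REPLACE(@List, @Delimiter, '</i><i>') + '</i>' AS xml) AS doc) AS src
    CROSS APPLY src.doc.nodes('/i') AS x(node)
)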
