T-SQL for updating TableA from TableB where no IDs exist - sql-server

I have the following table structure.
Table A: InvoiceNumber, InvoiceDate, Sku, SerialNumber
Table B: InvoiceNumber, InvoiceDate, Sku, SerialNumber
Table B has valid SerialNumbers, while Table A does NOT (it's blank). I would like to update Table A with Table B's serial numbers.
There can be multiple records with the same InvoiceNumber, InvoiceDate and Sku; only SerialNumber is unique.
If I do an update like this:
update tablea set serialNumber = tableb.serialNumber
from tablea
inner join tableb
on tablea.sku = tableb.sku
and tablea.invoicenumber = tableb.invoicenumber
and tablea.invoicedate = tableb.invoicedate
I end up getting duplicate serials in Table A.
Sample data
Table A
InvoiceNbr | InvoiceDate | Sku | Serial
1 | 10/01/2015 | ABC | (blank)
1 | 10/01/2015 | ABC | (blank)
Table B
InvoiceNbr | InvoiceDate | Sku | Serial
1 | 10/01/2015 | ABC | abc
1 | 10/01/2015 | ABC | xyz
No matter what I do, I always end up with duplicate serials in Table A. :|

Try this:
-- update through the alias a; naming tableA here would add a second, unjoined copy of tableA to the FROM
update a
set serialNumber = b.serialNumber
from (select *, row_number() over (partition by invoicenumber, invoicedate, sku order by serialnumber) rn from tableA) a
inner join
(select *, row_number() over (partition by invoicenumber, invoicedate, sku order by serialnumber) rn from tableB) b
on a.sku = b.sku and a.invoicenumber = b.invoicenumber and a.invoicedate = b.invoicedate and a.rn = b.rn
If I understand correctly, table B has rows in which every column except serialNumber is identical, and your current update logic fills every matching row in table A with one of those serialNumber values instead of doing a one-to-one update. The solution above uses row_number to number the rows within each (invoicenumber, invoicedate, sku) group in both tables, and then uses that number as an additional join criterion, so each row in A is paired with exactly one row in B.
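To check the row-pairing idea end to end, here is a minimal sketch against hypothetical temp tables #TableA and #TableB loaded with the question's sample data (assuming 10/01/2015 means 1 October 2015, and SQL Server 2008 or later for multi-row VALUES). It uses CTEs instead of derived tables, and numbers Table A's rows in arbitrary order because their serials are all blank.

```sql
-- Hypothetical temp tables mirroring Table A and Table B from the question.
CREATE TABLE #TableA (InvoiceNumber int, InvoiceDate date, Sku varchar(10), SerialNumber varchar(20));
CREATE TABLE #TableB (InvoiceNumber int, InvoiceDate date, Sku varchar(10), SerialNumber varchar(20));

-- Question's sample data: two blank rows in A, two serials in B.
INSERT INTO #TableA VALUES (1, '20151001', 'ABC', ''), (1, '20151001', 'ABC', '');
INSERT INTO #TableB VALUES (1, '20151001', 'ABC', 'abc'), (1, '20151001', 'ABC', 'xyz');

-- Number the rows inside each (InvoiceNumber, InvoiceDate, Sku) group on both sides,
-- then pair A's row 1 with B's row 1, row 2 with row 2, and so on.
WITH a AS (
    SELECT InvoiceNumber, InvoiceDate, Sku, SerialNumber,
           ROW_NUMBER() OVER (PARTITION BY InvoiceNumber, InvoiceDate, Sku
                              ORDER BY (SELECT NULL)) AS rn   -- A's serials are blank, so any order will do
    FROM #TableA
), b AS (
    SELECT InvoiceNumber, InvoiceDate, Sku, SerialNumber,
           ROW_NUMBER() OVER (PARTITION BY InvoiceNumber, InvoiceDate, Sku
                              ORDER BY SerialNumber) AS rn
    FROM #TableB
)
UPDATE a                        -- updates #TableA through the CTE
SET SerialNumber = b.SerialNumber
FROM a
INNER JOIN b
    ON  a.InvoiceNumber = b.InvoiceNumber
    AND a.InvoiceDate   = b.InvoiceDate
    AND a.Sku           = b.Sku
    AND a.rn            = b.rn;

SELECT * FROM #TableA;          -- one row now holds 'abc', the other 'xyz'
```

Which blank row receives which serial is arbitrary, but each serial from Table B is used exactly once.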

Related

SSIS audit step encountering errors

I have a step in my SSIS package where I'd like to update the latest row in my execution log (T1) with information from the latest row in another table (T2).
I get an error near the WHERE clause:
UPDATE T1
SET
[Survey_Size] = ssd.[FileName]
,Survey_Start_Date = ssd.[Start_Date]
,Survey_End_Date = ssd.[End_Date]
,[EndTime] = getdate()
,loaded = 1
FROM (SELECT max(log_sk) AS maxSk FROM T1) A
JOIN (SELECT max(PK) AS maxPK FROM T2) SS
JOIN (SELECT PK, [FileName], Start_Date, End_Date, Survey_Size FROM T2) ssd ON ss.maxPK = ssd.pk
WHERE log_sk = a.maxSk
Table 1 looks like this:
log_sk | FileName | Survey_Size | Start_Date | End_Date
and I'd like to update the information from Table 2 which looks like below, where FileName would be a joining key in both
PK | FileName | Start_Date | End_Date | Survey_Size
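The error comes from the join between A and SS, which has no ON clause, so SQL Server is still expecting one when it reaches WHERE. I've rewritten the query with CTEs, which I find much more readable: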
;With LastT1 as (
Select
log_sk as ID,
Survey_Size,
Survey_Start_Date,
Survey_End_Date,
EndTime,
loaded,
ROW_NUMBER() over (order by log_sk Desc) as Row_No
From T1
), LatestT2 as (
Select
PK as ID,
[FileName],
[Start_Date],
End_Date,
Survey_Size,
ROW_NUMBER() over (order by PK Desc) as Row_No
From T2
)
Update Source
Set
Source.[Survey_Size] = LatestT2.[FileName],
Source.Survey_Start_Date = LatestT2.[Start_Date],
Source.Survey_End_Date = LatestT2.[End_Date],
Source.[EndTime] = getdate(),
Source.loaded = 1
From LastT1 as Source
Inner Join LatestT2 on LatestT2.Row_No = 1 -- newest T2 row; log_sk and PK are unrelated keys, so don't join on them
Where Source.Row_No = 1
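To try the newest-row pairing without touching the real log tables, here is a sketch on hypothetical temp tables #T1 and #T2, trimmed to the date columns plus EndTime and loaded. Each CTE numbers its rows newest-first, and the update pairs row 1 of the log with row 1 of the file table, so no key relationship between log_sk and PK is assumed.

```sql
-- Hypothetical, trimmed-down stand-ins for T1 (execution log) and T2 (survey files).
CREATE TABLE #T1 (log_sk int, Survey_Start_Date date, Survey_End_Date date, EndTime datetime, loaded bit);
CREATE TABLE #T2 (PK int, [FileName] varchar(100), [Start_Date] date, End_Date date);

INSERT INTO #T1 (log_sk, loaded) VALUES (1, 1), (2, 0);          -- log_sk 2 is the latest run
INSERT INTO #T2 VALUES (10, 'old.csv', '20170101', '20170131'),
                       (11, 'new.csv', '20170201', '20170228');  -- PK 11 is the latest file

WITH LastT1 AS (
    SELECT Survey_Start_Date, Survey_End_Date, EndTime, loaded,
           ROW_NUMBER() OVER (ORDER BY log_sk DESC) AS Row_No   -- 1 = newest log row
    FROM #T1
), LatestT2 AS (
    SELECT [Start_Date], End_Date,
           ROW_NUMBER() OVER (ORDER BY PK DESC) AS Row_No       -- 1 = newest file row
    FROM #T2
)
UPDATE Source
SET Source.Survey_Start_Date = LatestT2.[Start_Date],
    Source.Survey_End_Date   = LatestT2.End_Date,
    Source.EndTime           = GETDATE(),
    Source.loaded            = 1
FROM LastT1 AS Source
INNER JOIN LatestT2
    ON LatestT2.Row_No = 1     -- pair by position, not by key
WHERE Source.Row_No = 1;       -- only the newest log row is touched

SELECT * FROM #T1;
```

Only log_sk 2 should pick up the February dates and an EndTime; log_sk 1 stays as it was.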

How to test against a list of items in an if statement

I have a large table (130 columns). It is a monthly dataset that is separated by month (Jan, Feb, Mar, ...). Every month I get a small set of duplicate rows. I would like to remove one of the rows; it does not matter which row is deleted.
This query seems to work OK when I only select the ID that I want to filter the duplicates on, but when I select everything (*) from the table I end up with all of the rows, duplicates included. My goal is to filter out the duplicates and insert the result set into a new table.
SELECT DISTINCT a.[ID]
FROM MonthlyLoan a
JOIN (SELECT COUNT(*) as Count, b.[ID]
FROM MonthlyLoan b
GROUP BY b.[ID])
AS b ON a.[ID] = b.[ID]
WHERE b.Count > 1
and effectiveDate = '01/31/2017'
Any help will be appreciated.
This will show you all duplicates per ID:
;WITH Duplicates AS
(
SELECT ID,
rn = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID)
FROM MonthlyLoan
)
SELECT ID,
rn
FROM Duplicates
WHERE rn > 1
Alternatively, filter on rn = 2 to return only the first duplicate of each ID.
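Since the stated goal is to keep one row per ID and copy the result into a new table, the same ROW_NUMBER idea can do both. This sketch uses a hypothetical two-column #MonthlyLoan stand-in and a hypothetical target table #MonthlyLoanDeduped; with the real table, `SELECT *` carries all 130 columns through, plus the helper rn column, which is dropped afterwards.

```sql
-- Hypothetical two-column stand-in for the 130-column MonthlyLoan table.
CREATE TABLE #MonthlyLoan (ID int, effectiveDate date);
INSERT INTO #MonthlyLoan VALUES
    (1, '20170131'),
    (2, '20170131'), (2, '20170131'),
    (3, '20170131'), (3, '20170131'), (3, '20170131');

-- Number the copies of each ID; which copy gets rn = 1 is arbitrary,
-- which the question says is acceptable.
WITH Numbered AS (
    SELECT *,
           rn = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL))
    FROM #MonthlyLoan
    WHERE effectiveDate = '20170131'
)
SELECT *                        -- keep one copy per ID in a new table
INTO #MonthlyLoanDeduped
FROM Numbered
WHERE rn = 1;

-- SELECT * INTO also copied the helper column; remove it.
ALTER TABLE #MonthlyLoanDeduped DROP COLUMN rn;

SELECT * FROM #MonthlyLoanDeduped;   -- IDs 1, 2 and 3, once each
```

To remove the extra copies in place instead, the same CTE can be followed by `DELETE FROM Numbered WHERE rn > 1;`.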
Since your ID is duplicated, all you need is a HAVING clause on your aggregate.
See the example below.
declare @tableA as table
(
ID int not null
)
insert into @tableA
values
(1),(2),(2),(3),(3),(3),(4),(5)
select ID, COUNT(*) as [Count]
from @tableA
group by ID
having COUNT(*) > 1
Result:
ID Count
----------- -----------
2 2
3 3
To insert the result into a temporary table (#temp):
select ID, COUNT(*) as [Count]
into #temp
from @tableA
group by ID
having COUNT(*) > 1
select * from #temp

Delete a row based on another row

I have the following:
stock | Customer
12345 | NULL
12345 | ABC
What I want to do is remove the first without affecting the second anytime there is a set of rows like this:
if exists (select stock from table WHERE stock='12345' AND Customer is not null )
BEGIN
DELETE FROM table WHERE stock= '12345' AND Customer is null
END
The query works, but how can I change it so that I don't have to specify a stock? I want to keep a row with a null customer if it is the only row for that stock.
You can use exists:
DELETE t0
FROM table t0
WHERE Customer IS NULL
AND EXISTS
(
SELECT 1
FROM table t1
WHERE t0.stock = t1.stock
AND t1.Customer IS NOT NULL
)
This only deletes rows where the customer is null and at least one row with the same stock has a non-null customer.
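A quick way to confirm that the EXISTS version keeps stocks whose only row has a NULL customer is to run it against the StockCustomer sample rows posted in this thread, loaded here into a hypothetical temp table #StockCustomer:

```sql
-- Hypothetical temp table holding the StockCustomer sample rows from this thread.
CREATE TABLE #StockCustomer (stock int, Customer varchar(10));
INSERT INTO #StockCustomer VALUES (12345, NULL), (12345, 'ABC'), (11111, 'XYZ'), (555555, NULL);

-- Delete a NULL-customer row only when the same stock also has a non-NULL customer.
DELETE t0
FROM #StockCustomer AS t0
WHERE t0.Customer IS NULL
  AND EXISTS (SELECT 1
              FROM #StockCustomer AS t1
              WHERE t1.stock = t0.stock
                AND t1.Customer IS NOT NULL);

SELECT * FROM #StockCustomer;   -- (12345, ABC), (11111, XYZ), (555555, NULL) remain
```

Stock 555555 survives because its only row is the NULL one, so the EXISTS subquery finds nothing for it.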
Please check the following SQL DELETE command, which uses a CTE.
I used the COUNT function with a PARTITION BY clause.
Because COUNT(Customer) ignores NULLs, cnt is the number of non-null customers for each stock, so a row with a NULL customer is removed only when its stock also has a non-null customer (cnt > 0).
;with cte as (
select
stock,
Customer,
cnt = Count(Customer) over (partition by stock)
from StockCustomer
)
delete from cte
where Customer is null and cnt > 0
You can test different situations with the following rows:
create table StockCustomer (stock int, Customer varchar(10))
insert into StockCustomer select 12345 , NULL
insert into StockCustomer select 12345 , 'ABC'
insert into StockCustomer select 11111 , 'XYZ'
insert into StockCustomer select 555555 , NULL
Use the following:
WITH CTE (stock, customer, DuplicateCount)
AS
(
SELECT stock, customer,
ROW_NUMBER() OVER(PARTITION BY Stock ORDER BY customer desc) AS DuplicateCount
FROM [Table]
)
DELETE
FROM CTE
WHERE DuplicateCount > 1 and customer is NULL
GO
You can use IN with a subquery as follows:
DELETE
FROM mytable
WHERE stock IN (
SELECT stock
FROM mytable
WHERE customer IS NOT NULL
)
AND customer is NULL
If you don't need to keep NULL-customer rows for stocks that have no other customer, just use delete from table where customer is null.

Delete row from table that is NULL and has the same DATE

I have a SQL question that I'm not able to handle.
I need to delete rows from a table that are NULL in the date_to column but have the same date_from value as another row with the same ID.
The row I want to delete is the one with the number 3.
It has the same value in date_from and is null in date_to.
This was my own approach:
select
date_from, id, count(*) number_of_rows_with_the_same_datefrom
from
test
group by
date_from, id
having
count(*) > 1
This returns all the rows that have the same from date with the same ID, from there I was lost:)
Another option would be to use a CTE and Row_Number()
;with cte as (
Select *
,RN=Row_Number() over (Partition By ID, date_from Order by date_to Desc)
From YourTable
)
Select * --<< Remove if Satisfied
--Delete --<< Remove comment if Satisfied
From cte
Where RN>1 and date_to is null
Returns (Records to be deleted)
ID name date_from date_to RN
2 Jim 2012-08-01 NULL 2
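The question's sample table isn't shown, so this sketch invents a small hypothetical #YourTable whose ID 2 / 2012-08-01 pair mirrors the result above. Because SQL Server sorts NULLs lowest, `ORDER BY date_to DESC` numbers the NULL row last in its group, so it gets RN > 1 whenever a non-NULL row shares its ID and date_from.

```sql
-- Hypothetical sample; the ID 2 / 2012-08-01 pair mirrors the result shown above.
CREATE TABLE #YourTable (ID int, name varchar(20), date_from date, date_to date);
INSERT INTO #YourTable VALUES
    (1, 'Ann', '20120101', '20121231'),
    (2, 'Jim', '20120801', '20130131'),
    (2, 'Jim', '20120801', NULL),       -- shares its start date with a closed row: should go
    (3, 'Sue', '20120901', NULL);       -- only row for its start date: should stay

-- NULLs sort lowest in SQL Server, so ORDER BY date_to DESC numbers the NULL row last.
WITH cte AS (
    SELECT *,
           RN = ROW_NUMBER() OVER (PARTITION BY ID, date_from ORDER BY date_to DESC)
    FROM #YourTable
)
DELETE FROM cte
WHERE RN > 1 AND date_to IS NULL;

SELECT * FROM #YourTable;   -- rows for Ann, Jim (2013-01-31) and Sue remain
```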
Using an EXISTS statement, this should be pretty straightforward.
DELETE A
FROM YourTable A
WHERE A.Date_To IS NULL
and exists (SELECT 1
FROM YourTable B
WHERE A.Id = B.ID
and A.Date_from = B.date_from
and A.Name = B.Name
and B.Date_To is not null)
This assumes ID, Name and date_from make some sort of key; I'd be more comfortable knowing what the PK is on this table (or a unique index, if one exists).
To verify it does what you want, run this first:
SELECT *
FROM YourTable A
WHERE A.Date_To IS NULL
and exists (SELECT 1
FROM YourTable B
WHERE A.Id = B.ID
and A.Date_from = B.date_from
and A.Name = B.Name
and B.Date_To is not null)

Joining on unique ID and date range - must return 1 row

In my calculated data layer, I am attempting to populate a customer's postcode at the time of the order. A sub-sample of the table being populated is as follows:
CustomerOrders
(
CustomerID varchar(20),
...
OrderDate date,
...
CustomerPostcodeAtTimeOfOrder varchar(10)
)
This table is a join of the Customers table, the Orders table and the CustomerAddress table, which looks like this:
CustomerAddress
(
CustomerID varchar(20),
AddressType varchar(10),
/*
AddressDetails
*/
StartDate date,
EndDate date,
AddressRank int
)
It is quite conceivable that a customer may have recorded addresses of various types for a single date so the intention when populating the CustomerOrders table is to join as below:
SELECT *
FROM Customers c
LEFT JOIN Orders o
ON o.CustomerID = c.CustomerID
OUTER APPLY
(
SELECT TOP 1 Postcode
FROM CustomerAddress ca
WHERE ca.CustomerID = c.CustomerID
AND o.OrderDate BETWEEN ca.StartDate AND ca.EndDate
ORDER BY AddressRank
) AS addr
However, the performance hit I am getting by adding this join to the query means that returning 1000 rows goes from taking 4 seconds to taking 106 seconds.
Note that I have also added a non-clustered index on the address table; its definition is below:
CREATE NONCLUSTERED INDEX IX_CustomerAddress
ON CustomerAddress (StartDate, EndDate)
INCLUDE (AddressRank, CustomerID, Postcode)
Any suggestions on the best way to tackle this issue would be appreciated.
I'm not completely sure if this will return results faster, but you can rewrite your query like this:
;WITH OrderAddress AS
(
SELECT o.*,
ca.Postcode,
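-- one row per order: assumes Orders has a unique OrderID column (hypothetical name);
-- ascending AddressRank matches the TOP 1 ... ORDER BY AddressRank in the question
RN = ROW_NUMBER() OVER(PARTITION BY o.OrderID ORDER BY ca.AddressRank)
FROM CustomerAddress ca
INNER JOIN Orders o
ON ca.CustomerID = o.CustomerID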
AND o.OrderDate BETWEEN ca.StartDate AND ca.EndDate
)
SELECT *
FROM Customers c
LEFT JOIN ( SELECT *
FROM OrderAddress
WHERE RN = 1) o
ON o.CustomerID = c.CustomerID;
You should also post the index definition on the Address table.
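One thing worth trying, offered as a sketch rather than a guaranteed fix: the index in the question leads on StartDate, while the correlated lookup filters first on CustomerID. An index that leads on the equality column and then the range column lets each order seek straight to its customer's addresses. The temp table #CustomerAddress and the index name below are hypothetical, and Postcode is assumed to be one of the elided AddressDetails columns.

```sql
-- Hypothetical stand-in for CustomerAddress; Postcode is assumed to be one of the
-- elided AddressDetails columns.
CREATE TABLE #CustomerAddress (
    CustomerID  varchar(20),
    AddressType varchar(10),
    Postcode    varchar(10),
    StartDate   date,
    EndDate     date,
    AddressRank int
);

-- Equality column first (the lookup correlates on CustomerID), range column second,
-- and the other referenced columns included so the lookup doesn't need the base table.
CREATE NONCLUSTERED INDEX IX_CustomerAddress_CustomerID_StartDate
    ON #CustomerAddress (CustomerID, StartDate)
    INCLUDE (EndDate, AddressRank, Postcode);
```

Comparing the actual execution plans before and after, looking for a seek on CustomerID instead of a scan, is the way to tell whether it helps with the real data.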
