Update a table but skip rows that meet a specific condition - sql-server

I have a table called body_scan that looks like this:
body_no tag
--------------------
1 noscan
2 noscan
3 missing
4 noscan
5 missing
I also have a list that I can load into a temp table, like so:
tag_no
------
aaa
bbb
ccc
What I need to do is update the body_scan table with the tag numbers from the temporary table.
You will notice that there are only 3 tags in the temp table but 5 rows in the body_scan table. I need to replace the "noscan" tag values with values from the temp table and leave the "missing" rows as they are.
The order of the tags in the temporary table is the same as the order of body_no in the body_scan table.
So yes, I did consider the ROW_NUMBER() function, but I'm not 100% sure how to define the join correctly.
How do I achieve this?
The desired result is:
body_no tag
-------------------
1 aaa
2 bbb
3 missing
4 ccc
5 missing

First, you need to preserve the input file order of the data by adding an identity column to temp_table. (Note that some ETL tools insert data in parallel, which scrambles the order, so you might even need to add this column to the file itself.)
Once you've done that, you need to generate a key in body_scan that you can join to. This is simply ROW_NUMBER() over the existing table, excluding the missing rows.
This returns each row and the number it should be matched to in temp_table:
SELECT
body_no,
ROW_NUMBER() OVER (ORDER BY body_no) RN
FROM body_scan
WHERE tag<> 'missing';
This joins in the temp table (assuming your ordinal column is called RowID):
SELECT T1.body_no, T1.tag, T1.RN, T2.tag_no
FROM
(
SELECT
body_no,tag,
ROW_NUMBER() OVER (ORDER BY body_no) RN
FROM body_scan
WHERE tag<> 'missing'
) T1
INNER JOIN
temp_table T2
ON T1.RN=T2.RowID;
This updates it back to the table:
UPDATE TGT
SET tag=SRC.tag_no
FROM body_scan TGT
INNER JOIN
(
SELECT T1.body_no, T2.tag_no
FROM
(
SELECT
body_no,tag,
ROW_NUMBER() OVER (ORDER BY body_no) RN
FROM body_scan
WHERE tag<> 'missing'
) T1
INNER JOIN
temp_table T2
ON T1.RN=T2.RowID
) SRC
ON SRC.body_no=TGT.body_no;
(There are half a dozen ways to write that final statement, but I prefer this one because you can see the dataset you're updating from in the subselect.)
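To check the row-number join end to end without a SQL Server instance, here is a minimal sketch of the same logic run against an in-memory SQLite database from Python. It assumes SQLite 3.25 or later (for ROW_NUMBER()) and uses an INTEGER PRIMARY KEY column named RowID as a stand-in for the identity column; build_demo and the staging table tag_map are hypothetical names. Instead of SQL Server's UPDATE ... FROM form, the pairing is materialized first and then applied, so the numbering cannot shift while rows are being updated.

```python
import sqlite3


def build_demo():
    """Hypothetical helper: load the sample body_scan and temp_table data from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE body_scan (body_no INTEGER PRIMARY KEY, tag TEXT)")
    conn.executemany(
        "INSERT INTO body_scan VALUES (?, ?)",
        [(1, "noscan"), (2, "noscan"), (3, "missing"), (4, "noscan"), (5, "missing")],
    )
    # RowID plays the role of the identity column that preserves file order.
    conn.execute("CREATE TABLE temp_table (RowID INTEGER PRIMARY KEY, tag_no TEXT)")
    conn.executemany("INSERT INTO temp_table (tag_no) VALUES (?)", [("aaa",), ("bbb",), ("ccc",)])
    return conn


def fill_noscan_tags(conn):
    """Give the n-th non-'missing' body_scan row (by body_no) the tag whose RowID is n."""
    # Step 1: number the non-missing rows, then pair each number with a tag.
    conn.execute("""
        CREATE TEMP TABLE tag_map AS
        SELECT n.body_no, t.tag_no
        FROM (SELECT body_no, ROW_NUMBER() OVER (ORDER BY body_no) AS rn
              FROM body_scan
              WHERE tag <> 'missing') AS n
        JOIN temp_table AS t ON t.RowID = n.rn
    """)
    # Step 2: apply the pairing; rows without a match (the 'missing' ones) are untouched.
    conn.execute("""
        UPDATE body_scan
        SET tag = (SELECT m.tag_no FROM tag_map AS m WHERE m.body_no = body_scan.body_no)
        WHERE body_no IN (SELECT body_no FROM tag_map)
    """)
    conn.execute("DROP TABLE tag_map")


conn = build_demo()
fill_noscan_tags(conn)
print(conn.execute("SELECT body_no, tag FROM body_scan ORDER BY body_no").fetchall())
# [(1, 'aaa'), (2, 'bbb'), (3, 'missing'), (4, 'ccc'), (5, 'missing')]
```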

I couldn't follow your explanation and the discussion of the commands, so I worked out (in SQL Server 2012) my own way to get your output table:
update a
set a.tag = t.tag_no
from (
select m.*, ROW_NUMBER() over(partition by m.tag order by m.rn) trn from(
select *, row_number() over(order by body_no) rn from body_scan --number the rows in the actual table's order
) m --number the rows within each tag, so the noscan rows get 1, 2, 3, ...
) a
join(
select *, ROW_NUMBER() over(order by (select null)) rn from #temp --(select null) does not guarantee load order; order by an identity column if you have one
) t
on a.trn = t.rn and a.tag <> 'missing' -- join the noscan rows to the tags by row number
OUTPUT:
body_no tag
--------------
1 aaa
2 bbb
3 missing
4 ccc
5 missing

Related

How to test against a list of items in an if statement

I have a large table (130 columns). It is a monthly dataset that is separated by month (jan, feb, mar, ...). Every month I get a small set of duplicate rows. I would like to remove one row of each duplicate; it does not matter which one is deleted.
This query seems to work OK when I only select the ID that I want to filter the dups on, but when I select everything ("*") from the table I end up with all of the rows, dups included. My goal is to filter out the dups and insert the result set into a new table.
SELECT DISTINCT a.[ID]
FROM MonthlyLoan a
JOIN (SELECT COUNT(*) as Count, b.[ID]
FROM MonthlyLoan b
GROUP BY b.[ID])
AS b ON a.[ID] = b.[ID]
WHERE b.Count > 1
and effectiveDate = '01/31/2017'
Any help will be appreciated.
This will show you all duplicates per ID:
;WITH Duplicates AS
(
SELECT ID,
rn = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID)
FROM MonthlyLoan
)
SELECT ID,
rn
FROM Duplicates
WHERE rn > 1
Alternatively, you can filter on rn = 2 to return just one duplicate per ID.
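The asker's actual goal is to keep one row per duplicate group and write the result to a new table; the same ROW_NUMBER() idea does that by keeping rn = 1 instead of rn > 1. Below is a minimal sketch in Python against SQLite 3.25 or later. The table is cut down to three hypothetical columns (ID, effectiveDate, amount), the date is stored in ISO format, and MonthlyLoan_dedup is a hypothetical target name; in SQL Server the same SELECT would typically feed a SELECT ... INTO.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical three-column stand-in for the 130-column MonthlyLoan table.
conn.execute("CREATE TABLE MonthlyLoan (ID INTEGER, effectiveDate TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO MonthlyLoan VALUES (?, ?, ?)",
    [
        (1, "2017-01-31", 100.0),
        (2, "2017-01-31", 250.0),
        (2, "2017-01-31", 250.0),  # duplicate of the previous row
        (3, "2017-01-31", 75.0),
    ],
)

# Number the rows inside each ID group and keep only the first one.
# Which duplicate survives is arbitrary, which the question says is acceptable.
conn.execute("""
    CREATE TABLE MonthlyLoan_dedup AS
    SELECT ID, effectiveDate, amount
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS rn
          FROM MonthlyLoan
          WHERE effectiveDate = '2017-01-31') AS numbered
    WHERE rn = 1
""")

rows = conn.execute("SELECT ID, amount FROM MonthlyLoan_dedup ORDER BY ID").fetchall()
print(rows)  # [(1, 100.0), (2, 250.0), (3, 75.0)]
```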
Since your ID is duplicated, all you need is a HAVING clause on your aggregate.
See the example below:
declare @tableA as table
(
ID int not null
)
insert into @tableA
values
(1),(2),(2),(3),(3),(3),(4),(5)
select ID, COUNT(*) as [Count]
from @tableA
group by ID
having COUNT(*) > 1
Result:
ID Count
----------- -----------
2 2
3 3
To insert the result into a temporary table:
select ID, COUNT(*) as [Count]
into #temp
from @tableA
group by ID
having COUNT(*) > 1
select * from #temp

Merge two rows into one and sum a column

THIS IS NOT ASKING HOW TO USE SUM() AND GROUP BY
I have two rows in tableA
ID VALUE
1 100
1 200
I want tableA to become:
ID VALUE
1 300
Note that I want to delete the original two records in tableA and replace them with the new record in tableA.
Is this related to the MERGE statement?
I only want to work on tableA; I don't want to create any new tables.
How about MERGE:
;with cte as (
select t.*,
row_number() over (partition by id order by id) as rn,
sum(value) over (partition by id) as total_value
from your_table t
)
merge into cte as t
using cte as t2
on (
t.id = t2.id
and t.rn = t2.rn
and t.rn = 1
)
when matched then update set t.value = t2.total_value
when not matched by source then delete;
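SQLite has no MERGE statement, so the sketch below reaches the same end state with an UPDATE followed by a DELETE inside one transaction: the first row of each ID group receives the group total, and the remaining rows are removed. It is a swapped-in technique, not the MERGE itself, and it assumes SQLite's implicit rowid can tell the otherwise identical rows apart (standing in for the rn = 1 test). The second ID is hypothetical and only shows that groups are handled independently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (ID INTEGER, VALUE INTEGER)")
conn.executemany("INSERT INTO tableA VALUES (?, ?)", [(1, 100), (1, 200), (2, 50)])

with conn:  # commit both statements together, or roll both back on error
    # Keep the first physical row per ID and set it to the group total.
    conn.execute("""
        UPDATE tableA
        SET VALUE = (SELECT SUM(s.VALUE) FROM tableA AS s WHERE s.ID = tableA.ID)
        WHERE rowid IN (SELECT MIN(rowid) FROM tableA GROUP BY ID)
    """)
    # Remove every other row in each group.
    conn.execute("""
        DELETE FROM tableA
        WHERE rowid NOT IN (SELECT MIN(rowid) FROM tableA GROUP BY ID)
    """)

print(conn.execute("SELECT ID, VALUE FROM tableA ORDER BY ID").fetchall())  # [(1, 300), (2, 50)]
```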

How to close sequence gaps in SQL Server?

Let's say I have a Turtle table. When I run
SELECT * FROM Turtle ORDER BY Sort
I get this:
Id | Name | Sort
2 Leo 1
3 Raph 2
4 Don 5
1 Mike 7
What is the easiest way to close the gaps between Raph and Don, and between Don and Mike, so that the table looks like this?
Id | Name | Sort
2 Leo 1
3 Raph 2
4 Don 3
1 Mike 4
This should work no matter how many turtles are in the table, and no matter how many gaps there are or how long each gap is.
You can do the update with just a CTE that computes row_number(), and then update the CTE directly:
;with CTE as (
select *, row_number () over (order by Sort) as RN
from Turtle
)
update CTE
set Sort = RN
Try this (note that it only returns the renumbered values; it does not update the table):
SELECT ID, Name, ROW_NUMBER() OVER(ORDER BY Sort) AS Sort FROM Turtle
Here's the answer I came up with:
UPDATE t
SET t.Sort = t2.Sort
FROM Turtle AS t,
(SELECT Id, Sort = ROW_NUMBER() OVER(ORDER BY Sort) FROM Turtle) as t2
WHERE t.Id = t2.Id
We can select the Turtle table as t2, ordering the turtles by the Sort column, but assigning the ROW_NUMBER() to the Sort column. We can then use the new value in t2.Sort to update each row in the Turtle table where the Ids match.
Edit (based on Juan Carlos Oropeza's feedback):
Here is the code using an explicit JOIN instead.
UPDATE t
SET t.Sort = t2.Sort
FROM Turtle AS t
JOIN (SELECT Id, ROW_NUMBER() OVER(ORDER BY Sort) AS Sort FROM Turtle) AS t2 ON t.Id = t2.Id
This is what they mean by using a CTE in an update:
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY Sort) NewSort
FROM Turtle
)
UPDATE cte SET Sort = NewSort
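To see the renumbering logic outside SQL Server, here is a sketch in Python with SQLite 3.25 or later. It computes the new Sort values with the same ROW_NUMBER() OVER (ORDER BY Sort) query, reads them into Python, and writes them back by Id. SQLite cannot update through a CTE, and snapshotting every new number before writing any of them avoids depending on whether a per-row subquery would see rows that were already renumbered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Turtle (Id INTEGER PRIMARY KEY, Name TEXT, Sort INTEGER)")
conn.executemany(
    "INSERT INTO Turtle VALUES (?, ?, ?)",
    [(2, "Leo", 1), (3, "Raph", 2), (4, "Don", 5), (1, "Mike", 7)],
)

# Snapshot the gap-free numbering first ...
new_sort = conn.execute(
    "SELECT Id, ROW_NUMBER() OVER (ORDER BY Sort) FROM Turtle"
).fetchall()
# ... then apply it, one row per Id.
conn.executemany("UPDATE Turtle SET Sort = ? WHERE Id = ?", [(rn, tid) for tid, rn in new_sort])
conn.commit()

print(conn.execute("SELECT Id, Name, Sort FROM Turtle ORDER BY Sort").fetchall())
# [(2, 'Leo', 1), (3, 'Raph', 2), (4, 'Don', 3), (1, 'Mike', 4)]
```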

Finding which COLUMN has the max value per row

MSSQL
The table looks like this:
ID 1 | 2 | 3 | 4 | 5
AA1 1 | 1 | 1 | 2 | 1
Any clues on how I could write a query that returns
ID | MaxNo
AA1 | 4
using the above table example? I know I could write a CASE ... WHEN statement, but I have a feeling there's a much simpler way of doing this.
You can use UNPIVOT to get these comparable items, correctly[1], into the same column, and then use ROW_NUMBER() to find the highest-valued row[2]:
declare @t table (ID char(3) not null,[1] int not null,[2] int not null,
[3] int not null,[4] int not null,[5] int not null)
insert into @t (ID,[1],[2],[3],[4],[5]) values
('AA1',1,1,1,2,1)
;With Unpivoted as (
select *,ROW_NUMBER() OVER (ORDER BY Value desc) rn
from @t t UNPIVOT (Value FOR Col in ([1],[2],[3],[4],[5])) u
)
select * from Unpivoted where rn = 1
Result:
ID Value Col rn
---- ----------- ------------------------- --------------------
AA1 2 4 1
[1] If you have data from the same "domain" appearing in multiple columns in the same table (such that it even makes sense to compare such values), it's usually a sign of attribute splitting, where part of your data has, incorrectly, been used to form part of a column name.
[2] In your question, you say "per row", and yet you've only given a one-row sample. If we assume that ID values are unique for each row, and you want to find the maximum separately for each ID, you'd write the ROW_NUMBER() as ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Value desc) rn, to get (I hope) the result you're looking for.
You can use a cross apply where you do max() over the columns for one row.
select T1.ID,
T2.Value
from YourTable as T1
cross apply
(
select max(T.Value) as Value
from (values (T1.[1]),
(T1.[2]),
(T1.[3]),
(T1.[4]),
(T1.[5])) as T(Value)
) as T2
If you are on SQL Server 2005, which does not support the values() row constructor, you can use union all in the derived table instead.
select T1.ID,
T2.Value
from YourTable as T1
cross apply
(
select max(T.Value) as Value
from (select T1.[1] union all
select T1.[2] union all
select T1.[3] union all
select T1.[4] union all
select T1.[5]) as T(Value)
) as T2
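For engines without UNPIVOT or CROSS APPLY, the same result can be had by unpivoting with UNION ALL and keeping the top-ranked value per ID, which also returns the column name the question asks for. A sketch in Python with SQLite 3.25 or later; the table name t and the second row BB2 are hypothetical, and ties are broken in favour of the lowest column number (an assumed tie rule the question does not specify).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (ID TEXT, "1" INT, "2" INT, "3" INT, "4" INT, "5" INT)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)",
    [("AA1", 1, 1, 1, 2, 1), ("BB2", 5, 3, 9, 1, 0)],
)

query = """
SELECT ID, Col AS MaxNo, Value
FROM (
    SELECT ID, Col, Value,
           -- rank the columns within each row; on ties the lowest column number wins
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Value DESC, Col) AS rn
    FROM (
        SELECT ID, 1 AS Col, "1" AS Value FROM t
        UNION ALL SELECT ID, 2, "2" FROM t
        UNION ALL SELECT ID, 3, "3" FROM t
        UNION ALL SELECT ID, 4, "4" FROM t
        UNION ALL SELECT ID, 5, "5" FROM t
    ) AS unpivoted
) AS ranked
WHERE rn = 1
ORDER BY ID
"""
print(conn.execute(query).fetchall())  # [('AA1', 4, 2), ('BB2', 3, 9)]
```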

SQL Update with row_number()

I want to update my column CODE_DEST with an incremental number. I have:
CODE_DEST RS_NOM
null qsdf
null sdfqsdfqsdf
null qsdfqsdf
I would like to update it to be:
CODE_DEST RS_NOM
1 qsdf
2 sdfqsdfqsdf
3 qsdfqsdf
I have tried this code:
UPDATE DESTINATAIRE_TEMP
SET CODE_DEST = TheId
FROM (SELECT Row_Number() OVER (ORDER BY [RS_NOM]) AS TheId FROM DESTINATAIRE_TEMP)
This does not work because of an error near the ).
I have also tried:
WITH DESTINATAIRE_TEMP AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY [RS_NOM] DESC) AS RN
FROM DESTINATAIRE_TEMP
)
UPDATE DESTINATAIRE_TEMP SET CODE_DEST=RN
But this also does not work; I get an error about UNION ALL.
How can I update a column using the ROW_NUMBER() function in SQL Server 2008 R2?
One more option:
UPDATE x
SET x.CODE_DEST = x.New_CODE_DEST
FROM (
SELECT CODE_DEST, ROW_NUMBER() OVER (ORDER BY [RS_NOM]) AS New_CODE_DEST
FROM DESTINATAIRE_TEMP
) x
DECLARE @id INT
SET @id = 0
UPDATE DESTINATAIRE_TEMP
SET @id = CODE_DEST = @id + 1
GO
Try this:
http://www.mssqltips.com/sqlservertip/1467/populate-a-sql-server-column-with-a-sequential-number-not-using-an-identity/
With UpdateData As
(
SELECT RS_NOM,
ROW_NUMBER() OVER (ORDER BY [RS_NOM] DESC) AS RN
FROM DESTINATAIRE_TEMP
)
UPDATE DESTINATAIRE_TEMP SET CODE_DEST = RN
FROM DESTINATAIRE_TEMP
INNER JOIN UpdateData ON DESTINATAIRE_TEMP.RS_NOM = UpdateData.RS_NOM
Your second attempt failed primarily because you gave the CTE the same name as the underlying table, which made it look like a recursive CTE, because it essentially referenced itself. A recursive CTE must have a specific structure, which requires the UNION ALL set operator.
Instead, you could just have given the CTE a different name as well as added the target column to it:
With SomeName As
(
SELECT
CODE_DEST,
ROW_NUMBER() OVER (ORDER BY [RS_NOM] DESC) AS RN
FROM DESTINATAIRE_TEMP
)
UPDATE SomeName SET CODE_DEST=RN
This is a modified version of Aleksandr Fedorenko's answer that adds a WHERE clause:
UPDATE x
SET x.CODE_DEST = x.New_CODE_DEST
FROM (
SELECT CODE_DEST, ROW_NUMBER() OVER (ORDER BY [RS_NOM]) AS New_CODE_DEST
FROM DESTINATAIRE_TEMP
) x
WHERE x.CODE_DEST <> x.New_CODE_DEST OR x.CODE_DEST IS NULL
By adding a WHERE clause, I found that performance improved massively for subsequent updates. SQL Server seems to update a row even if the value is unchanged, and that takes time, so adding the WHERE clause makes it skip rows whose value hasn't changed. I was astonished at how fast it could run my query.
Disclaimer: I'm no DB expert, and I'm using PARTITION BY for my clause so it may not be exactly the same results for this query. For me the column in question is a customer's paid order, so the value generally doesn't change once it is set.
Also make sure you have indexes, especially if you have a WHERE clause on the SELECT statement. A filtered index worked great for me as I was filtering based on payment statuses.
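The effect of that WHERE clause is easy to observe by counting affected rows. The sketch below, in Python with SQLite 3.25 or later, renumbers CODE_DEST by RS_NOM but only touches rows whose stored value differs from the new number or is still NULL. The NULL test matters because NULL <> 1 is not true, so a plain <> comparison would skip rows that were never numbered. The staging table new_codes and the function renumber_changed_only are hypothetical names; the snapshot is taken so the numbering is fixed before any row changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DESTINATAIRE_TEMP (CODE_DEST INTEGER, RS_NOM TEXT)")
conn.executemany(
    "INSERT INTO DESTINATAIRE_TEMP VALUES (NULL, ?)",
    [("qsdf",), ("sdfqsdfqsdf",), ("qsdfqsdf",)],
)


def renumber_changed_only(conn):
    """Set CODE_DEST to ROW_NUMBER() by RS_NOM; return how many rows actually changed."""
    conn.execute("DROP TABLE IF EXISTS temp.new_codes")
    # Snapshot the target numbering, keyed by SQLite's implicit rowid.
    conn.execute("""
        CREATE TEMP TABLE new_codes AS
        SELECT rowid AS rid, ROW_NUMBER() OVER (ORDER BY RS_NOM) AS new_code
        FROM DESTINATAIRE_TEMP
    """)
    cur = conn.execute("""
        UPDATE DESTINATAIRE_TEMP
        SET CODE_DEST = (SELECT new_code FROM new_codes WHERE rid = DESTINATAIRE_TEMP.rowid)
        WHERE rowid IN (
            SELECT n.rid
            FROM new_codes AS n
            JOIN DESTINATAIRE_TEMP AS d ON d.rowid = n.rid
            WHERE d.CODE_DEST <> n.new_code OR d.CODE_DEST IS NULL
        )
    """)
    return cur.rowcount


print(renumber_changed_only(conn))  # 3: every row was NULL
print(renumber_changed_only(conn))  # 0: nothing differs any more
conn.execute("UPDATE DESTINATAIRE_TEMP SET CODE_DEST = 99 WHERE RS_NOM = 'qsdf'")
print(renumber_changed_only(conn))  # 1: only the row that was broken
```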
My query, using PARTITION BY:
UPDATE UpdateTarget
SET PaidOrderIndex = New_PaidOrderIndex
FROM
(
SELECT PaidOrderIndex, SimpleMembershipUserName, ROW_NUMBER() OVER(PARTITION BY SimpleMembershipUserName ORDER BY OrderId) AS New_PaidOrderIndex
FROM [Order]
WHERE PaymentStatusTypeId in (2,3,6) and SimpleMembershipUserName is not null
) AS UpdateTarget
WHERE UpdateTarget.PaidOrderIndex <> UpdateTarget.New_PaidOrderIndex OR UpdateTarget.PaidOrderIndex IS NULL
-- test to 'break' some of the rows, and then run the UPDATE again
update [order] set PaidOrderIndex = 2 where PaidOrderIndex=3
The IS NULL check isn't required if the column isn't nullable. If it is nullable, keep it: NULL <> New_PaidOrderIndex never evaluates to true, so rows that have never been numbered would otherwise be skipped.
When I say the performance increase was massive I mean it was essentially instantaneous when updating a small number of rows. With the right indexes I was able to achieve an update that took the same amount of time as the 'inner' query does by itself:
SELECT PaidOrderIndex, SimpleMembershipUserName, ROW_NUMBER() OVER(PARTITION BY SimpleMembershipUserName ORDER BY OrderId) AS New_PaidOrderIndex
FROM [Order]
WHERE PaymentStatusTypeId in (2,3,6) and SimpleMembershipUserName is not null
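The PARTITION BY in that query restarts the numbering for each user, so every customer's paid orders are counted from 1. Here is a small sketch of just the inner query, in Python with SQLite 3.25 or later; the sample orders are hypothetical, and the table is named Orders (instead of [Order]) to avoid quoting the reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, SimpleMembershipUserName TEXT, PaymentStatusTypeId INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [
        (10, "alice", 2),
        (11, "bob", 3),
        (12, "alice", 1),  # status 1 is filtered out, so it gets no index
        (13, "alice", 6),
        (14, "bob", 2),
    ],
)

# Numbering restarts at 1 for each SimpleMembershipUserName.
rows = conn.execute("""
    SELECT OrderId, SimpleMembershipUserName,
           ROW_NUMBER() OVER (PARTITION BY SimpleMembershipUserName ORDER BY OrderId) AS New_PaidOrderIndex
    FROM Orders
    WHERE PaymentStatusTypeId IN (2, 3, 6) AND SimpleMembershipUserName IS NOT NULL
    ORDER BY OrderId
""").fetchall()
print(rows)
# [(10, 'alice', 1), (11, 'bob', 1), (13, 'alice', 2), (14, 'bob', 2)]
```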
I did this for my situation and it worked:
WITH myUpdate (id, myRowNumber )
AS
(
SELECT id, ROW_NUMBER() over (order by ID) As myRowNumber
FROM AspNetUsers
WHERE UserType='Customer'
)
update AspNetUsers set EmployeeCode = FORMAT(myRowNumber,'00000#')
FROM myUpdate
left join AspNetUsers u on u.Id=myUpdate.id
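A simple and easy way to update through a derived table (the alias Cursor and the placeholder table name Table must be bracketed because both are reserved words):
UPDATE [Cursor]
SET [Cursor].CODE = [Cursor].New_CODE
FROM (
SELECT CODE, ROW_NUMBER() OVER (ORDER BY [CODE]) AS New_CODE
FROM [Table] Where CODE BETWEEN 1000 AND 1999
) [Cursor]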
If the table has no relationships, just copy everything into a new table with a row number, drop the old table, and rename the new one to the old name.
Select RowNum = ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) , * INTO cdm.dbo.SALES2018 from
(
select * from SALE2018) as SalesSource
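The copy-and-rename approach can be sketched end to end in Python with SQLite 3.25 or later. Two assumptions differ from the snippet above: the numbering is ordered by SQLite's rowid (insertion order) rather than the arbitrary ORDER BY (SELECT NULL), and the copy is renamed with SQLite's ALTER TABLE ... RENAME TO, where SQL Server would use sp_rename. The staging name SALE2018_new and the columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALE2018 (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO SALE2018 VALUES (?, ?)", [("x", 10.0), ("y", 20.0), ("z", 5.0)])

with conn:
    # 1. Copy every row into a new table, prefixed with a sequential RowNum.
    conn.execute("""
        CREATE TABLE SALE2018_new AS
        SELECT ROW_NUMBER() OVER (ORDER BY rowid) AS RowNum, *
        FROM SALE2018
    """)
    # 2. Drop the original and give the copy its name.
    conn.execute("DROP TABLE SALE2018")
    conn.execute("ALTER TABLE SALE2018_new RENAME TO SALE2018")

print(conn.execute("SELECT RowNum, customer, amount FROM SALE2018 ORDER BY RowNum").fetchall())
# [(1, 'x', 10.0), (2, 'y', 20.0), (3, 'z', 5.0)]
```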
In my case, I added a new column and wanted to populate it with the equivalent record number for the whole table:
id name new_column (ORDER_NUM)
1 Ali null
2 Ahmad null
3 Mohammad null
4 Nour null
5 Hasan null
6 Omar null
I wrote this query to populate the new column with the row number:
UPDATE My_Table
SET My_Table.ORDER_NUM = SubQuery.rowNumber
FROM (
SELECT id ,ROW_NUMBER() OVER (ORDER BY [id]) AS rowNumber
FROM My_Table
) SubQuery
INNER JOIN My_Table ON
SubQuery.id = My_Table.id
After executing this query, my new column contained 1, 2, 3, ...
I update a temp table with the first occurrence of a part, where multiple parts can be associated with a sequence number. RowId = 1 returns the first occurrence, which I join to the temp table using the part and sequence number.
update #Tmp
set
#Tmp.Amount=@Amount
from
(SELECT Part, Row_Number() OVER (ORDER BY [Part]) AS RowId FROM #Tmp
where Sequence_Num=@Sequence_Num
)data
where data.Part=#Tmp.Part
and data.RowId=1
and #Tmp.Sequence_Num=@Sequence_Num
I don't have a running ID in order to do what "Basheer AL-MOMANI" suggested.
I did something like this (joining my table to itself, just to get the row number):
update T1 set inID = T2.RN
from (select *, ROW_NUMBER() over (order by ID) RN from MyTable) T1
inner join (select *, ROW_NUMBER() over (order by ID) RN from MyTable) T2 on T2.RN = T1.RN
