Replace Values in Oracle DB with set of test data

I am currently working on a project where I have a copy of a production Oracle database with some production values that I want to replace with a set of test values (with the assumption that I have more production records than test ones and I will have duplicates).
Here is a sample of what I am looking to do:
Any suggestions would be greatly appreciated.

One approach would be to number the rows by their rowid, then use a modulo operation to match the original data to your test data table, like this:
merge into customer c
using (
with cte_customer as (
select rowid xrowid, mod(row_number() over (order by rowid)-1,(select count(*) from test_data))+1 rownumber
from customer
order by rowid
), cte_testdata as (
select row_number() over (order by rowid) rownumber, first_name, last_name, email
from test_data
order by rowid
)
select c.xrowid, t.last_name, t.first_name, t.email
from cte_customer c
left outer join cte_testdata t on (t.rownumber = c.rownumber)
order by c.xrowid
) u
on (c.rowid = u.xrowid)
when matched then update set
c.last_name = u.last_name,
c.first_name = u.first_name,
c.email = u.email
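The modulo step is what lets a short test table stretch across a longer production table. Below is a minimal Python sketch of just that arithmetic, assuming rows are numbered from 1 as ROW_NUMBER() does. The function name `cycle_index` is hypothetical and is not part of the query above.

```python
def cycle_index(customer_rownum: int, test_row_count: int) -> int:
    """Map a 1-based customer row number onto a 1-based test row number.

    Mirrors mod(row_number() - 1, count(*)) + 1 from the MERGE statement.
    """
    if test_row_count <= 0:
        raise ValueError("test_data must contain at least one row")
    return (customer_rownum - 1) % test_row_count + 1


# Seven production rows matched against three test rows.
mapping = [cycle_index(n, 3) for n in range(1, 8)]
print(mapping)
```

Once the test rows run out the numbering wraps back to 1, which is where the expected duplicates come from.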

Related

Updating multiple row with random data from another table?

Combining some examples, I came up with the following query (field and table names have been anonymised, so I hope I didn't insert typos).
UPDATE destinationTable
SET destinationField = t2.value
FROM destinationTable t1
CROSS APPLY (
SELECT TOP 1 'SomeRequiredPrefix ' + sourceField as value
FROM #sourceTable
WHERE sourceField <> ''
ORDER BY NEWID()
) t2
Problem
Currently, all records get the same value in destinationField; the value needs to be random and different for each record. I'm probably missing something here.
Here's a possible solution. Using CTEs, assign row numbers to both tables based on a random order. Join the tables on that row number and update the rows accordingly.
;WITH
dt AS
(SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS RowNum
FROM dbo.destinationtable),
st AS
(SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS RowNum
FROM dbo.#sourcetable)
UPDATE dt
SET dt.destinationfield = 'SomeRequiredPrefix ' + st.sourcefield
FROM dt
JOIN st ON dt.RowNum = st.RowNum
UPDATED SOLUTION
I used CROSS JOIN to get all possible combinations, since you have fewer rows in the source table. Then I assign random row numbers and take only one row for each destination field.
;WITH cte
AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY destinationfield ORDER BY NEWID()) AS Rownum
FROM destinationtable
CROSS JOIN #sourcetable
WHERE sourcefield <> ''
)
UPDATE cte
SET cte.destinationfield = 'SomeRequiredPrefix ' + sourcefield
WHERE cte.Rownum = 1
SELECT * FROM dbo.destinationtable
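To make the intent of the random assignment concrete, here is a small Python sketch of the same idea outside the database. It is an illustration only: `assign_random_values`, its parameters, and the sample values are assumptions. It picks independently for every destination row, whereas the CTE above picks one row per destinationfield partition.

```python
import random


def assign_random_values(destination_ids, source_values, prefix, seed=None):
    """Return {destination_id: prefixed value}, choosing independently per row."""
    # Equivalent of WHERE sourcefield <> '': ignore empty source values.
    candidates = [value for value in source_values if value != ""]
    if not candidates:
        raise ValueError("no non-empty source values to choose from")
    rng = random.Random(seed)
    return {dest_id: prefix + rng.choice(candidates) for dest_id in destination_ids}


result = assign_random_values(
    range(1, 6), ["red", "", "blue"], "SomeRequiredPrefix ", seed=42
)
for dest_id, value in result.items():
    print(dest_id, value)
```

Because the source has fewer values than there are destination rows, repeats are unavoidable. The point is only that each row gets its own draw rather than all rows sharing a single one.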

How to concatenate 3 tables MSSQL Server

In a single database, I have three tables as below:
I want to concatenate them in some single table, any suggestion?
If I understand your question correctly, you want something like this:
CREATE TABLE COMBINED (/*Insert the set of fields here*/);
GO
INSERT INTO COMBINED
SELECT Table1.ProposalId
/*, all other Table1 fields*/
, Table2.ProposalId
/*, all other Table2 fields*/
, Table3.ProposalId
/*, all other Table3 fields*/
FROM Table1
FULL JOIN Table2
ON Table1.ProposalId = Table2.ProposalId
FULL JOIN Table3
ON Table1.ProposalId = Table3.ProposalId;
This will match the rows with the same proposal ID (while keeping each table's value as proposalid_1, proposalid_2, ...). For rows that do not match, it will bring in the full row, with NULL for every field from the other tables.
Thanks for your help; the issue has been solved.
SELECT mtlreq.proposalid, mtlreq.prp_mtlreq_taskgrp,mtlreq.prp_mtlreq_taskcode,mtlreq.prp_mtlreq_itemcode,mtlreq.prp_mtlreq_rateper,mtlreq.prp_mtlreq_qty,mtlreq.prp_mtlreq_Inter_MaterCost,mtlreq.prp_mtlreq_UOM,mtlreq.item_short_desc, resreq.proposalid,resreq.prp_resreq_taskcode ,resreq.prp_resreq_resource,resreq.prp_resreq_usage,resreq.prp_resreq_uom,resreq.prp_resreq_rate,resreq.prp_resreq_overhd_pers
FROM (
SELECT proposalid,prp_mtlreq_taskgrp,prp_mtlreq_taskcode,prp_mtlreq_itemcode,prp_mtlreq_rateper,prp_mtlreq_qty,prp_mtlreq_Inter_MaterCost,prp_mtlreq_UOM,item_short_desc,
ROW_NUMBER() OVER (ORDER BY proposalid) AS rn
FROM prjdet_prp_taskwork_mtlreq ) AS mtlreq
FULL OUTER JOIN (
SELECT proposalid,prp_resreq_taskcode ,prp_resreq_resource,prp_resreq_usage,prp_resreq_uom,prp_resreq_rate,prp_resreq_overhd_pers,
ROW_NUMBER() OVER (ORDER BY proposalid) AS rn
FROM prjdet_prp_taskwork_resreq) AS resreq
ON mtlreq.rn = resreq.rn
FULL OUTER JOIN (
SELECT proposalid,mtprp_delv_lineno,mtprp_delv_itemcode,mtprp_delv_cost,mtprp_delv_linelevelmrg,mtprp_delv_proposedamt,
ROW_NUMBER() OVER (ORDER BY proposalid) AS rn
FROM prjproposal_delidtl ) AS delidtl
on mtlreq.rn = delidtl.rn and resreq.rn=delidtl.rn

How to extract the last records based on entrydate sql server

I have many duplicate job IDs, but the entry date cannot be duplicated. I always need to fetch each unique job ID based on its last entry date. I have solved it with the query below, but I would like to know whether there is a better way to write the same SQL for best performance when the data is huge. Please guide me, thanks.
SELECT A.JID,A.EntryDate,RefundDate,Comments,Refund, ActionBy
FROM (
(
select JID, Max(EntryDate) AS EntryDate
from refundrequested
GROUP BY JID
) A
Inner JOIN
(
SELECT JID,ENTRYDATE,refundDate,Comments,refund,ActionBy
from refundrequested
) B
ON A.JID=B.JID AND A.EntryDate = B.EntryDate
)
Using the row_number() function is usually a bit faster:
select *
from (
select row_number() over (partition by jid
order by EntryDate desc) as rn
, *
from refundrequested
) as SubQueryAlias
where rn = 1
Query:
SELECT t1.JID,
t1.EntryDate,
t1.RefundDate,
t1.Comments,
t1.Refund,
t1.ActionBy
FROM refundrequested t1
LEFT JOIN refundrequested t2
ON t2.JID = t1.JID
AND t2.EntryDate > t1.EntryDate
WHERE t2.JID is null
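As a quick check that the approaches agree, the sketch below runs the GROUP BY / MAX join and the LEFT JOIN anti-join against a tiny in-memory table. It uses Python's sqlite3 module as a stand-in for SQL Server, and the sample rows and the reduced column list are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE refundrequested (JID INTEGER, EntryDate TEXT, Refund REAL)")
conn.executemany(
    "INSERT INTO refundrequested VALUES (?, ?, ?)",
    [
        (1, "2014-01-01", 10.0),
        (1, "2014-03-01", 12.5),
        (2, "2014-02-15", 7.0),
        (2, "2014-02-10", 9.0),
        (3, "2014-01-20", 3.0),
    ],
)

# Approach from the question: join each row to the max EntryDate of its JID.
via_max = conn.execute(
    """
    SELECT b.JID, b.EntryDate, b.Refund
    FROM (SELECT JID, MAX(EntryDate) AS EntryDate
          FROM refundrequested GROUP BY JID) a
    JOIN refundrequested b ON a.JID = b.JID AND a.EntryDate = b.EntryDate
    ORDER BY b.JID
    """
).fetchall()

# Anti-join: keep a row only when no later row exists for the same JID.
via_anti_join = conn.execute(
    """
    SELECT t1.JID, t1.EntryDate, t1.Refund
    FROM refundrequested t1
    LEFT JOIN refundrequested t2
      ON t2.JID = t1.JID AND t2.EntryDate > t1.EntryDate
    WHERE t2.JID IS NULL
    ORDER BY t1.JID
    """
).fetchall()

print(via_max)
print(via_anti_join)
```

Both queries return one row per JID. Which one is faster on a large table depends on the engine and the available indexes, so it is worth comparing execution plans on real data.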

SQL Update with row_number()

I want to update my column CODE_DEST with an incremental number. I have:
CODE_DEST RS_NOM
null qsdf
null sdfqsdfqsdf
null qsdfqsdf
I would like to update it to be:
CODE_DEST RS_NOM
1 qsdf
2 sdfqsdfqsdf
3 qsdfqsdf
I have tried this code:
UPDATE DESTINATAIRE_TEMP
SET CODE_DEST = TheId
FROM (SELECT Row_Number() OVER (ORDER BY [RS_NOM]) AS TheId FROM DESTINATAIRE_TEMP)
This does not work because of the )
I have also tried:
WITH DESTINATAIRE_TEMP AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY [RS_NOM] DESC) AS RN
FROM DESTINATAIRE_TEMP
)
UPDATE DESTINATAIRE_TEMP SET CODE_DEST=RN
But this also does not work because of union.
How can I update a column using the ROW_NUMBER() function in SQL Server 2008 R2?
One more option
UPDATE x
SET x.CODE_DEST = x.New_CODE_DEST
FROM (
SELECT CODE_DEST, ROW_NUMBER() OVER (ORDER BY [RS_NOM]) AS New_CODE_DEST
FROM DESTINATAIRE_TEMP
) x
DECLARE @id INT
SET @id = 0
UPDATE DESTINATAIRE_TEMP
SET @id = CODE_DEST = @id + 1
GO
try this
http://www.mssqltips.com/sqlservertip/1467/populate-a-sql-server-column-with-a-sequential-number-not-using-an-identity/
With UpdateData As
(
SELECT RS_NOM,
ROW_NUMBER() OVER (ORDER BY [RS_NOM] DESC) AS RN
FROM DESTINATAIRE_TEMP
)
UPDATE DESTINATAIRE_TEMP SET CODE_DEST = RN
FROM DESTINATAIRE_TEMP
INNER JOIN UpdateData ON DESTINATAIRE_TEMP.RS_NOM = UpdateData.RS_NOM
Your second attempt failed primarily because you gave the CTE the same name as the underlying table, which made it look like a recursive CTE because it essentially referenced itself. A recursive CTE must have a specific structure, which requires the use of the UNION ALL set operator.
Instead, you could simply have given the CTE a different name and added the target column to it:
With SomeName As
(
SELECT
CODE_DEST,
ROW_NUMBER() OVER (ORDER BY [RS_NOM] DESC) AS RN
FROM DESTINATAIRE_TEMP
)
UPDATE SomeName SET CODE_DEST=RN
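For readers who want to see the numbering logic run end to end, here is a hedged sketch using Python's sqlite3 module. SQLite cannot update through a CTE the way SQL Server can, so this version reads the rows in the desired order and numbers them client-side. It is an assumption-laden stand-in for the technique, not the T-SQL itself, and it orders ascending as in the question's first attempt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DESTINATAIRE_TEMP (CODE_DEST INTEGER, RS_NOM TEXT)")
conn.executemany(
    "INSERT INTO DESTINATAIRE_TEMP (RS_NOM) VALUES (?)",
    [("qsdf",), ("sdfqsdfqsdf",), ("qsdfqsdf",)],
)

# Read the rows in the desired order (rowid breaks ties between equal names).
ordered = conn.execute(
    "SELECT rowid FROM DESTINATAIRE_TEMP ORDER BY RS_NOM, rowid"
).fetchall()

# Number them starting at 1, as ROW_NUMBER() would, and write the values back.
conn.executemany(
    "UPDATE DESTINATAIRE_TEMP SET CODE_DEST = ? WHERE rowid = ?",
    [(number, rowid) for number, (rowid,) in enumerate(ordered, start=1)],
)

rows = conn.execute(
    "SELECT CODE_DEST, RS_NOM FROM DESTINATAIRE_TEMP ORDER BY CODE_DEST"
).fetchall()
print(rows)
```

The key point carries over from the answer above: the row number has to be computed in a separate, differently named result set and then matched back to the rows being updated.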
This is a modified version of @Aleksandr Fedorenko's answer, adding a WHERE clause:
UPDATE x
SET x.CODE_DEST = x.New_CODE_DEST
FROM (
SELECT CODE_DEST, ROW_NUMBER() OVER (ORDER BY [RS_NOM]) AS New_CODE_DEST
FROM DESTINATAIRE_TEMP
) x
WHERE x.CODE_DEST <> x.New_CODE_DEST AND x.CODE_DEST IS NOT NULL
By adding a WHERE clause I found that performance improved massively for subsequent updates. SQL Server seems to update a row even if it already holds the new value, and that takes time, so the WHERE clause makes it skip rows whose value hasn't changed. I have to say I was astonished at how fast it could run my query.
Disclaimer: I'm no DB expert, and I'm using PARTITION BY for my clause so it may not be exactly the same results for this query. For me the column in question is a customer's paid order, so the value generally doesn't change once it is set.
Also make sure you have indexes, especially if you have a WHERE clause on the SELECT statement. A filtered index worked great for me as I was filtering based on payment statuses.
My query using PARTITION BY:
UPDATE UpdateTarget
SET PaidOrderIndex = New_PaidOrderIndex
FROM
(
SELECT PaidOrderIndex, SimpleMembershipUserName, ROW_NUMBER() OVER(PARTITION BY SimpleMembershipUserName ORDER BY OrderId) AS New_PaidOrderIndex
FROM [Order]
WHERE PaymentStatusTypeId in (2,3,6) and SimpleMembershipUserName is not null
) AS UpdateTarget
WHERE UpdateTarget.PaidOrderIndex <> UpdateTarget.New_PaidOrderIndex AND UpdateTarget.PaidOrderIndex IS NOT NULL
-- test to 'break' some of the rows, and then run the UPDATE again
update [order] set PaidOrderIndex = 2 where PaidOrderIndex=3
The 'IS NOT NULL' part isn't required if the column isn't nullable.
When I say the performance increase was massive I mean it was essentially instantaneous when updating a small number of rows. With the right indexes I was able to achieve an update that took the same amount of time as the 'inner' query does by itself:
SELECT PaidOrderIndex, SimpleMembershipUserName, ROW_NUMBER() OVER(PARTITION BY SimpleMembershipUserName ORDER BY OrderId) AS New_PaidOrderIndex
FROM [Order]
WHERE PaymentStatusTypeId in (2,3,6) and SimpleMembershipUserName is not null
I did this for my situation and it worked:
WITH myUpdate (id, myRowNumber )
AS
(
SELECT id, ROW_NUMBER() over (order by ID) As myRowNumber
FROM AspNetUsers
WHERE UserType='Customer'
)
update AspNetUsers set EmployeeCode = FORMAT(myRowNumber,'00000#')
FROM myUpdate
left join AspNetUsers u on u.Id=myUpdate.id
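A simple and easy way to do the update (CURSOR and TABLE are reserved words in T-SQL, so the alias and the placeholder table name are bracketed):
UPDATE [Cursor]
SET [Cursor].CODE = [Cursor].New_CODE
FROM (
SELECT CODE, ROW_NUMBER() OVER (ORDER BY [CODE]) AS New_CODE
FROM [Table] WHERE CODE BETWEEN 1000 AND 1999
) [Cursor]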
If the table has no relationships, just copy everything into a new table with a row number, then remove the old table and rename the new one to the old name.
Select RowNum = ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) , * INTO cdm.dbo.SALES2018 from
(
select * from SALE2018) as SalesSource
In my case I added a new column and wanted to populate it with the equivalent record number for the whole table:
id name new_column (ORDER_NUM)
1 Ali null
2 Ahmad null
3 Mohammad null
4 Nour null
5 Hasan null
6 Omar null
I wrote this query to have the new column populated with the row number
UPDATE My_Table
SET My_Table.ORDER_NUM = SubQuery.rowNumber
FROM (
SELECT id ,ROW_NUMBER() OVER (ORDER BY [id]) AS rowNumber
FROM My_Table
) SubQuery
INNER JOIN My_Table ON
SubQuery.id = My_Table.id
After executing this query I had the numbers 1, 2, 3, ... in my new column.
I update a temp table with the first occurrence of a part, where multiple parts can be associated with a sequence number. RowId = 1 identifies the first occurrence, which I join back to the temp table on part and sequence number.
update #Tmp
set
#Tmp.Amount=@Amount
from
(SELECT Part, Row_Number() OVER (ORDER BY [Part]) AS RowId FROM #Tmp
where Sequence_Num=@Sequence_Num
)data
where data.Part=#Tmp.Part
and data.RowId=1
and #Tmp.Sequence_Num=@Sequence_Num
I don't have a running ID in order to do what "Basheer AL-MOMANI" suggested.
I did something like this: (joined my table on myself, just to get the Row Number)
update T1 set inID = T2.RN
from (select *, ROW_NUMBER() over (order by ID) RN from MyTable) T1
inner join (select *, ROW_NUMBER() over (order by ID) RN from MyTable) T2 on T2.RN = T1.RN

Finding Duplicate Data in Oracle

I have a table with 500,000+ records, and fields for ID, first name, last name, and email address. What I'm trying to do is find rows where the first name AND last name are both duplicates (as in the same person has two separate IDs, email addresses, or whatever, they're in the table more than once). I think I know how to find the duplicates using GROUP BY, this is what I have:
SELECT first_name, last_name, COUNT(*)
FROM person_table
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
The problem is that I need to then move the entire row with these duplicated names into a different table. Is there a way to find the duplicates and get the whole row? Or at least to get the IDs as well? I tried using a self-join, but got back more rows than were in the table to begin with. Would that be a better approach? Any help would be greatly appreciated.
The most effective way to remove duplicate rows is with a self-join:
DELETE FROM person_table a
WHERE a.rowid >
ANY (SELECT b.rowid
FROM person_table b
WHERE a.first_name = b.first_name
AND a.last_name = b.last_name);
This will remove all duplicates, even if a row is duplicated more than once.
There is more on removing duplicates and differing methods here: http://www.dba-oracle.com/t_delete_duplicate_table_rows.htm
Hope it helps...
EDIT: As per your comments, if you want to select all but one of the duplicates then
SELECT *
FROM person_table a
WHERE a.rowid >
ANY (SELECT b.rowid
FROM person_table b
WHERE a.first_name = b.first_name
AND a.last_name = b.last_name);
An index on (first_name, last_name) or on (last_name, first_name) would help:
SELECT t.*
FROM
person_table t
JOIN
( SELECT first_name, last_name
FROM person_table
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
) dup
ON dup.last_name = t.last_name
AND dup.first_name = t.first_name
or:
SELECT t.*
FROM person_table t
WHERE EXISTS
( SELECT *
FROM person_table dup
WHERE dup.last_name = t.last_name
AND dup.first_name = t.first_name
AND dup.ID <> t.ID
)
This will give you an ID you want to move/delete/etc. Note that it does not work if count(*) > 2, as you get only 1 ID (you could re-run your query for these cases).
SELECT max(ID), first_name, last_name, COUNT(*)
FROM person_table
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
Edit: You can use COLLECT to get all IDs at once (but be careful, as you only want to move/delete all but one)
To add another option, I usually use this one to remove duplicates:
delete from person_table
where rowid in (select rid
from (select rowid rid, row_number() over
(partition by first_name,last_name order by rowid) rn
from person_table
)
where rn <> 1 )
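The difference between "every row that has a duplicate" and "all but one row per duplicate group" is easy to mix up, so here is a small runnable comparison. It is a sketch using Python's sqlite3 module as a stand-in for Oracle, with made-up sample rows. The second query keeps the lowest ID per group, in the same spirit as the rowid comparisons above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person_table "
    "(ID INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO person_table VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "ann@example.com"),
        (2, "Bob", "Ray", "bob@example.com"),
        (3, "Ann", "Lee", "alee@example.com"),
        (4, "Ann", "Lee", "ann.lee@example.com"),
    ],
)

# Every row whose (first_name, last_name) pair occurs more than once.
all_dups = conn.execute(
    """
    SELECT t.ID FROM person_table t
    JOIN (SELECT first_name, last_name FROM person_table
          GROUP BY first_name, last_name HAVING COUNT(*) > 1) dup
      ON dup.first_name = t.first_name AND dup.last_name = t.last_name
    ORDER BY t.ID
    """
).fetchall()

# All but the lowest ID in each group: the rows to move or delete.
extras = conn.execute(
    """
    SELECT t.ID FROM person_table t
    WHERE EXISTS (SELECT 1 FROM person_table keep
                  WHERE keep.first_name = t.first_name
                    AND keep.last_name = t.last_name
                    AND keep.ID < t.ID)
    ORDER BY t.ID
    """
).fetchall()

print(all_dups, extras)
```

With three "Ann Lee" rows, the first query returns all three IDs, while the second returns only the two surplus ones. The second set is usually the one to move into the other table.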
