Clone row and maintain a reference to cloned row - sql-server

I want to clone some rows of a table. The rows must be selected using a specific condition, and I also need to save N pairs (sourceRowID, destinationRowID) in a temporary table.
I've tried using OUTPUT, but I can't reference fields of the source row:
INSERT INTO myTable(Value1, Value2)
OUTPUT myTable.ID, Inserted.ID
INTO #tempTable(sourceRowID, destinationRowID)
SELECT Value1, Value2
FROM myTable
WHERE Value2 > 10
Is there a solution to this? Unfortunately I am constrained to SQL Server 2008.
UPDATE:
For now this is my solution:
First of all, I write the inserted records to #tempTable using the OUTPUT clause, ensuring that the records are inserted ordered by ID.
At this point #tempTable contains, for example:
sourceRowID | destinationRowID
-------------------------------
NULL | 12
NULL | 15
NULL | 16
NULL | 23
Then I update #tempTable: I retrieve the same records again with the same WHERE condition and join them to #tempTable by row number, with the #tempTable records ordered by destinationRowID. That way the old and new records are in the same order, and the update leaves #tempTable with the correct sourceRowID for each destinationRowID.
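The two steps described above could be sketched roughly as follows. This is only a sketch: it assumes myTable.ID is an IDENTITY column and that #tempTable already exists and is empty. The NOT EXISTS filter is an addition not mentioned in the post: once the clones exist they satisfy the same WHERE condition, so they have to be left out when the source rows are numbered.

```sql
-- Sketch only. Assumes myTable.ID is an IDENTITY column and that
-- #tempTable(sourceRowID int NULL, destinationRowID int) exists and is empty.

-- Step 1: clone the rows, capturing only the new IDs.
INSERT INTO myTable (Value1, Value2)
OUTPUT Inserted.ID
INTO #tempTable (destinationRowID)
SELECT Value1, Value2
FROM myTable
WHERE Value2 > 10
ORDER BY ID;  -- identity values are generated in this order

-- Step 2: number both sides and pair them up by position.
WITH src AS (
    SELECT m.ID,
           ROW_NUMBER() OVER (ORDER BY m.ID) AS rn
    FROM myTable AS m
    WHERE m.Value2 > 10
      -- the clones also satisfy the condition, so leave them out
      AND NOT EXISTS (SELECT 1
                      FROM #tempTable AS x
                      WHERE x.destinationRowID = m.ID)
),
dst AS (
    SELECT destinationRowID,
           ROW_NUMBER() OVER (ORDER BY destinationRowID) AS rn
    FROM #tempTable
)
UPDATE t
SET t.sourceRowID = src.ID
FROM #tempTable AS t
JOIN dst ON dst.destinationRowID = t.destinationRowID
JOIN src ON src.rn = dst.rn;
```

As an alternative on SQL Server 2008 and later, the OUTPUT clause of a MERGE statement may reference columns of the source row, which would capture both IDs in a single statement.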

Related

How to add data to a single column

I have a question about adding data to a particular column of a table. I posted yesterday and a user guided me (thanks for that), saying an UPDATE was the way to go for what I need, but I still can't achieve my goal.
I have two tables: the table the information will be added from and the table the information will be added to. Here is an example:
source_table (has only a column called "name_expedient_reviser" that is nvarchar(50))
name_expedient_reviser
kim
randy
phil
cathy
josh
etc.
On the other hand I have the destination table. It has two columns: one with the IDs and one where the names will be inserted. The values in the name column are currently NULL, and some of the existing IDs are going to be used for the names.
This is what the other table looks like:
dbo_expedient_reviser has 2 columns: unique_reviser_code (numeric, PK, not auto-increment) and name_expedient_reviser (nvarchar(50)), which holds the users who check expedients. This is how the table looks now:
dbo_expedient_reviser
unique_reviser_code | name_expedient_reviser
1 | NULL
2 | NULL
3 | NULL
4 | NULL
5 | NULL
6 | NULL
What I need is for the information in source_table to be inserted into the column name_expedient_reviser, so the result should look like this:
dbo_expedient_reviser
unique_reviser_code | name_expedient_reviser
1 | kim
2 | randy
3 | phil
4 | cathy
5 | josh
6 | etc.
How can I pass the information into this table? What do I have to do?
EDIT
The query I saw that should have worked, but doesn't update anything, is this one:
UPDATE dbo_expedient_reviser
SET dbo_expedient_reviser.name_expedient_reviser = source_table.name_expedient_reviser
FROM source_table
JOIN dbo_expedient_reviser ON source_table.name_expedient_reviser = dbo_expedient_reviser.name_expedient_reviser
WHERE dbo_expedient_reviser.name_expedient_reviser IS NULL
The query was supposed to update the table with information from source_table wherever name_expedient_reviser is NULL, which it is, but it doesn't work.
Since the names do not have an ID associated with them, I would just use ROW_NUMBER and join on ROW_NUMBER = unique_reviser_code. The only problem is knowing which rows are NULL. From what I see, they all appear to be NULL. Is this the case in your data, or are there names sporadically in the table, like 5, 17, 29, etc.? If name_expedient_reviser is empty in dbo_expedient_reviser, you could also truncate the table and insert the values directly. Hopefully that unique_reviser_code isn't already linked to other things.
WITH CTE (name_expedient_reviser, unique_reviser_code)
AS
(
SELECT name_expedient_reviser
,ROW_NUMBER() OVER (ORDER BY name_expedient_reviser)
FROM source_table
)
UPDATE er
SET er.name_expedient_reviser = cte.name_expedient_reviser
FROM dbo_expedient_reviser er
JOIN CTE on cte.unique_reviser_code = er.unique_reviser_code
Or truncate:
TRUNCATE TABLE dbo_expedient_reviser
-- the column list order must match the SELECT list order
INSERT INTO dbo_expedient_reviser (unique_reviser_code, name_expedient_reviser)
SELECT
unique_reviser_code = ROW_NUMBER() OVER (ORDER BY name_expedient_reviser)
,name_expedient_reviser
FROM source_table
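If only some rows of dbo_expedient_reviser have a NULL name, the same ROW_NUMBER idea might be adapted by numbering just the empty rows. The following is a sketch under the assumption that the names should fill the empty codes in ascending order; the CTE names empty_slots and names are made up for illustration.

```sql
-- Sketch: pair the n-th empty row with the n-th source name.
WITH empty_slots AS (
    SELECT unique_reviser_code,
           ROW_NUMBER() OVER (ORDER BY unique_reviser_code) AS rn
    FROM dbo_expedient_reviser
    WHERE name_expedient_reviser IS NULL
),
names AS (
    SELECT name_expedient_reviser,
           ROW_NUMBER() OVER (ORDER BY name_expedient_reviser) AS rn
    FROM source_table
)
UPDATE er
SET er.name_expedient_reviser = n.name_expedient_reviser
FROM dbo_expedient_reviser AS er
JOIN empty_slots AS s ON s.unique_reviser_code = er.unique_reviser_code
JOIN names AS n ON n.rn = s.rn;
```

Rows that already have a name are never touched, because they do not appear in empty_slots. If there are more names than empty rows, the surplus names are simply not used.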
It is not possible to INSERT data into a single column of existing rows; an UPDATE that moves the data you want is the only way to go in that case.

Update strategy for table with sequence generated number as primary key in Informatica

I have a mapping that gets data from multiple SQL Server source tables and assigns a sequence-generated number as the ID for each row. In the target table, the ID field is set as the primary key.
Every time I run this mapping, it creates new rows and assigns a new ID for the records that are pre-existing in the target. Below is an example:
1st run:
ID SourceID Name State
1 123 ABC NY
2 456 DEF PA
2nd run:
ID SourceID Name State
1 123 ABC NY
2 456 DEF PA
3 123 ABC NY
4 456 DEF PA
Desired Output must:
1) create a new row and assign a new ID if a record gets updated in the source.
2) create a new row and assign a new ID if new rows are inserted in the source.
How can this be obtained in Informatica?
Thank you in advance!
I'll take a flyer and assume the ACTUAL question here is 'How can I tell if the incoming record is neither insert nor update so that I can ignore it'. You could
a) have some date field in your source data to identify when the record was updated and then restrict your source qualifier to only pick up records which were last updated after the last time this mapping ran... drawback is if fields you're not interested in were updated then you'll process a lot of redundant records
b) better suggestion!! Configure a dynamic lookup that stores the latest state of each record, matched by SourceID. Then you can use the NewLookupRow indicator port (0 = no change, 1 = insert, 2 = update) to tell whether the record is an insert, an update or unchanged, and filter out the unchanged records in a subsequent transformation
Give the ID field an IDENTITY PROPERTY...
Create Table SomeTable (ID int identity(1,1),
SourceID int,
[Name] varchar(64),
[State] varchar(64))
When you insert into it... you don't insert anything for ID. For example...
insert into SomeTable
select
SourceID,
[Name],
[State]
from
someOtherTable
The ID field will be an auto increment starting at 1 and increment by 1 each time a row is inserted. In regards to your question about adding rows each time one is updated or inserted into another table, this is what TRIGGERS are for.
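If the load were done in T-SQL rather than in the Informatica mapping, the rule from the question (add a row only for new or changed source records) could be sketched as below. It reuses the SomeTable and someOtherTable names from the example above and assumes that the row with the highest ID per SourceID is the current version; both are assumptions for illustration, not part of the original mapping.

```sql
-- Sketch: insert a source row only if it is new or differs from the
-- latest version already stored for the same SourceID.
INSERT INTO SomeTable (SourceID, [Name], [State])
SELECT s.SourceID, s.[Name], s.[State]
FROM someOtherTable AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM SomeTable AS t
    WHERE t.SourceID = s.SourceID
      AND t.[Name]   = s.[Name]    -- note: NULLs never compare equal,
      AND t.[State]  = s.[State]   -- so NULL values always count as changed
      AND t.ID = (SELECT MAX(t2.ID)
                  FROM SomeTable AS t2
                  WHERE t2.SourceID = s.SourceID)
);
```

Run against the example data a second time with an unchanged source, this statement would insert nothing, so IDs 3 and 4 from the "2nd run" would not be created.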

SQL unique PK for grouped data in SP

I am trying to build a temp table with grouped data from multiple tables (in an SP). I have succeeded in building the data set, but I have a requirement that each grouped row have a unique ID. I know there are ways to generate unique IDs for each row; the problem is that I need the ID for a given row to be the same on each run, regardless of the number of rows returned.
Example:
1st run:
ID Column A Column B
1 apple 15
2 orange 10
3 grape 11
2nd run:
ID Column A Column B
3 grape 11
The reason I want this is that I am sending this data up to SOLR, and when I do a delta I need to get the same ID back for the same row, since it is trying to re-index it.
Is there any way I can do this?
Not sure if this will help, not entirely confident of your wider picture, but ...
As your new data is assembled, log each [column a] value in a table of your own.
Give that table an IDENTITY column to do the numbering for you.
Now you can join any new data sets to your lookup table and you'll have a persistent number for each column A.
You just need to ensure that each time you query new data, you add new values to the lookup table.
create table dbo.myRef(
idx int identity(1,1)
,[A] nvarchar(100)
)
General draft as below ...
--- just simulating some input data here
with cte as (
select 'apple' as [A], 15 as [B]
UNION
select 'orange' as [A], 10 as [B]
UNION
select 'banana' as [A], 4 as [B]
)
select * into #temp from cte;
-- Put any new values into the lookup table
-- and they will be assigned a new index number by the identity column
insert into dbo.myRef([A])
select distinct [A]
from #temp where [A] not in (select [A] from dbo.myRef)
-- now pull your original data for output, joining to the lookup table to get a ref number.
select T.*,R.idx
from #temp T
inner join
dbo.myRef R
on T.[A] = R.[A]
Sorry for the late reply, I was stuck with something else. However, I solved my own issue.
I built 2 temp tables: one with all the data from the various tables (#master) and another (#final) to house all the grouped data, with an empty column for the ID.
Next I did a concat(column1, '-', column2, '-', column3) on 3 columns from #master and updated the #final table based on the type.
This gave me the same concatenated IDs on each run.
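The concatenation step might look something like the sketch below. The table layout, the total column and the sample values are placeholders, since the post does not show the real columns, and CONCAT requires SQL Server 2012 or later.

```sql
-- Hypothetical shape of the grouped result; all names are placeholders.
CREATE TABLE #final (
    ID      nvarchar(200) NULL,   -- filled in below
    column1 nvarchar(50),
    column2 nvarchar(50),
    column3 nvarchar(50),
    total   int
);

INSERT INTO #final (column1, column2, column3, total)
VALUES (N'fruit', N'apple', N'red',   15),
       (N'fruit', N'grape', N'green', 11);

-- The ID depends only on the row's own values, so the same row gets
-- the same ID on every run, however many other rows are returned.
UPDATE #final
SET ID = CONCAT(column1, '-', column2, '-', column3);

SELECT ID, total
FROM #final;
```

The separator matters: without it, values such as ('ab', 'c') and ('a', 'bc') would produce the same ID.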

SQL Server : recreate table in appropriate order

I've deleted some records (more precisely row 4) from a table in a SQL Server database. Now the first column goes like this (1,2,3,5) without row 4:
ID Name
------------
1 Luk
2 Sky
3 Philips
5 Andrey
How can I recreate this table and insert all data again in appropriate order?
Like this:
ID Name
--------
1 Luk
2 Sky
3 Philips
4 Andrey
EDIT:
But if i have another column (number) that is not a key, like this:
ID Number Name
------------
1 1 Luk
2 2 Sky
3 3 Philips
5 5 Andrey
Then can I renumber the Number column while keeping Name, like this:
ID Number Name
------------
1 1 Luk
2 2 Sky
3 3 Philips
5 4 Andrey
Can I do this, and if so, how?
I would make a pretty strong case for never storing this number, since it is calculated, instead you could just create a view:
CREATE VIEW dbo.YourView
AS
SELECT ID,
Number = ROW_NUMBER() OVER(ORDER BY ID),
Name
FROM dbo.YourTable;
GO
This way after you have deleted rows, your view will already be in sync without having to perform any updates.
If you need to store the value, then almost the same query applies, but just placed inside a common table expression, which is then updated:
WITH CTE AS
( SELECT ID,
Number,
NewNumber = ROW_NUMBER() OVER(ORDER BY ID)
FROM dbo.YourTable
)
UPDATE CTE
SET Number = NewNumber;
You can use the DBCC command:
DBCC CHECKIDENT('tableName', RESEED, 0)
This resets the identity seed to 0.
Note that the table has to be emptied first.
You can make the ID auto-increment. By default, the starting value for AUTO_INCREMENT is 1, and it increments by 1 for each new record.
E.g. MSSQL uses the IDENTITY keyword for auto-increment, whereas MySQL uses the AUTO_INCREMENT keyword.
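Put together, a full renumbering along these lines might look like the sketch below. It assumes dbo.YourTable has ID int IDENTITY(1,1) and a Name column, and that nothing else (such as a foreign key) references the old ID values; #keep is a made-up name for the holding table.

```sql
-- Sketch: copy the rows aside, empty the table, reseed, reinsert in order.
SELECT ID, Name
INTO #keep
FROM dbo.YourTable;

-- DELETE rather than TRUNCATE: after a DELETE, the next identity value
-- is the reseed value plus the increment, i.e. 0 + 1 = 1.
DELETE FROM dbo.YourTable;

DBCC CHECKIDENT('dbo.YourTable', RESEED, 0);

-- ORDER BY controls the order in which the new identity values are assigned.
INSERT INTO dbo.YourTable (Name)
SELECT Name
FROM #keep
ORDER BY ID;

DROP TABLE #keep;
```

With the example data, Andrey (old ID 5) would come back with ID 4.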
MSSQL
ID int IDENTITY(1,1) PRIMARY KEY
MySQL
ID int NOT NULL AUTO_INCREMENT

Delete duplicates from large dataset (>100Mio rows)

I know that this topic came up many times before here but none of the suggested solutions worked for my dataset because my laptop stopped calculating due to memory issues or full storage.
My table looks like the following and has 108 million rows:
Col1 |Col2 | Col3 |Col4 |SICComb | NameComb
Case New |3523 | Alexander |6799 |67993523| AlexanderCase New
Case New |3523 | Undisclosed |6799 |67993523| Case NewUndisclosed
Undisclosed|6799 | Case New |3523 |67993523| Case NewUndisclosed
Case New |3523 | Undisclosed |6799 |67993523| Case NewUndisclosed
SmartCard |3674 | NEC |7373 |73733674| NECSmartCard
SmartCard |3674 | Virtual NetComm|7373 |73733674| SmartCardVirtual NetComm
SmartCard |3674 | NEC |7373 |73733674| NECSmartCard
The unique columns are SICComb and NameComb. I tried to add a primary key with:
ALTER TABLE dbo.test ADD ID INT IDENTITY(1,1)
but the integers fill up more than 30 GB of my storage in just a few minutes.
Which would be the fastest and most efficient method to delete the duplicates from the table?
If you're using SQL Server, you can use delete from common table expression:
with cte as (
select row_number() over(partition by SICComb, NameComb order by Col1) as row_num
from Table1
)
delete
from cte
where row_num > 1
Here all rows are numbered, with a separate sequence for each unique combination of SICComb + NameComb. You can choose which rows to delete through the ORDER BY inside the OVER clause.
In general, the fastest way to delete duplicates from a table is to insert the records -- without duplicates -- into a temporary table, truncate the original table and insert them back in.
Here is the idea, using SQL Server syntax:
select distinct t.*
into #temptable
from t;
truncate table t;
insert into t
select tt.*
from #temptable tt;
Of course, this depends to a large extent on how fast the first step is. And, you need to have the space to store two copies of the same table.
Note that the syntax for creating the temporary table differs among databases. Some use the syntax of create table as rather than select into.
EDIT:
Your identity insert error is troublesome. I think you need to remove the identity from the list of columns for the distinct. Or do:
select min(<identity col>), <all other columns>
from t
group by <all other columns>
If you have an identity column, then there are no duplicates (by definition).
In the end, you will need to decide which id you want for the rows. If you can generate a new id for the rows, then just leave the identity column out of the column list for the insert:
insert into t(<all other columns>)
select <all other columns>;
If you need the old identity value (and the minimum will do), turn identity insert on (SET IDENTITY_INSERT t ON) and do:
insert into t(<all columns including identity>)
select <all columns including identity>;
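As a concrete sketch of the last variant: explicit identity values can only be inserted while IDENTITY_INSERT is ON for the table. The demo table dbo.t_demo, the holding table #keep and the sample rows are made up for illustration, with column names borrowed from the question.

```sql
-- Hypothetical demo table with an identity column.
CREATE TABLE dbo.t_demo (
    ID       int IDENTITY(1,1) PRIMARY KEY,
    SICComb  int,
    NameComb nvarchar(100)
);

INSERT INTO dbo.t_demo (SICComb, NameComb)
VALUES (67993523, N'Case NewUndisclosed'),
       (67993523, N'Case NewUndisclosed'),   -- duplicate of the row above
       (73733674, N'NECSmartCard');

-- Keep one row per combination, remembering its lowest identity value.
SELECT MIN(ID) AS ID, SICComb, NameComb
INTO #keep
FROM dbo.t_demo
GROUP BY SICComb, NameComb;

TRUNCATE TABLE dbo.t_demo;

-- Explicit ID values require IDENTITY_INSERT ON and a full column list.
SET IDENTITY_INSERT dbo.t_demo ON;

INSERT INTO dbo.t_demo (ID, SICComb, NameComb)
SELECT ID, SICComb, NameComb
FROM #keep;

SET IDENTITY_INSERT dbo.t_demo OFF;

DROP TABLE #keep;
```

With the sample rows, the table ends up with IDs 1 and 3; the duplicate with ID 2 is gone.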

Resources