update data when importing a duplicate record in SQL


I have a unique requirement: I have a data list in Excel format that I import into SQL Server 2008 R2 once a year, using SQL Server's import functionality. In the table "Patient_Info", I have a primary key set on the column "MemberID", and when I import data without any duplicates, all is well.
But sometimes the data I receive repeats a patient's info with an updated address, telephone number, etc., under the same MemberID. Since MemberID is the primary key, the repeated record is left out of the import, and so I don't end up with an updated record for that patient.
EDIT
I am not sure how to update the rows whose MemberIDs already exist, and any pointer is greatly appreciated.
examples below:
List 1:
List 2:

This is not a terribly unique requirement.
One acceptable pattern you can use to resolve this problem is to import your data into a "staging" table. The staging table would have the same structure as the target table to which you're importing, but it would be a heap: it would not have a primary key.
Once the data is imported, you would then use queries to consolidate newer data records with older data records by MemberID.
Once you've consolidated all same MemberID records, there will be no duplicate MemberID values, and you can then insert all the staging table records into the target table.
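The consolidation step could be sketched roughly as follows. This is only a sketch under assumptions: the staging table is called Patient_Info_Stage (the name used later in this answer), and ImportRowId is a hypothetical helper column, not part of the original schema, that is assumed to reflect load order so that the highest value per MemberID is the newest record. If your file carries a real "last updated" column, ordering by that would be more reliable.

```sql
-- Create an empty staging heap with the same columns as the target.
-- SELECT ... INTO copies the column definitions but not the primary key.
SELECT TOP (0) *
INTO Patient_Info_Stage
FROM Patient_Info;

-- Hypothetical helper column (an assumption for this sketch):
-- numbers the rows as they are loaded.
ALTER TABLE Patient_Info_Stage
    ADD ImportRowId INT IDENTITY(1, 1) NOT NULL;
GO

-- ... import the Excel data into Patient_Info_Stage here ...

-- Keep only the most recently loaded row for each MemberID.
WITH Ranked AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY MemberID
                              ORDER BY ImportRowId DESC) AS rn
    FROM Patient_Info_Stage
)
DELETE FROM Ranked
WHERE rn > 1;
```

After the DELETE, each MemberID appears at most once in the staging table, which is what the later steps rely on.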
EDIT
As @Panagiotis Kanavos suggests, you can use a SQL MERGE statement to both insert new records and update existing records from your staging table to the target table.
Assume that the Staging table is named Patient_Info_Stage, the target table is named Patient_Info, and that these tables have similar schemas. Also assume that field MemberId is the primary key of table Patient_Info.
The following MERGE statement will merge the staging table data into the target table:
BEGIN TRAN;
MERGE Patient_Info WITH (SERIALIZABLE) AS Target
USING Patient_Info_Stage AS Source
    ON Target.MemberID = Source.MemberID
WHEN MATCHED THEN UPDATE
    SET Target.FirstName = Source.FirstName
       ,Target.LastName = Source.LastName
       ,Target.Address = Source.Address
       ,Target.PhoneNumber = Source.PhoneNumber
WHEN NOT MATCHED THEN INSERT (
     MemberID
    ,FirstName
    ,LastName
    ,Address
    ,PhoneNumber
) VALUES (
     Source.MemberID
    ,Source.FirstName
    ,Source.LastName
    ,Source.Address
    ,Source.PhoneNumber
);
COMMIT TRAN;
*NOTE: MERGE runs as a single statement, but under the default isolation level it does not hold the key-range locks needed to stop a concurrent session from inserting the same key between the match check and the insert, so a race condition is possible. To ensure it works properly, do these things:
Ensure that your SQL Server is up to date with service packs and patches (the current revision of SQL Server 2008 R2 is SP3, version 10.50.6000.34).
Wrap your MERGE in a transaction (BEGIN TRAN; ... COMMIT TRAN;).
Use the SERIALIZABLE hint to help prevent a potential race condition with the T-SQL MERGE statement.
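If you would rather avoid MERGE altogether, the same upsert can be written as a separate UPDATE and INSERT inside one transaction. This is a swapped-in alternative, not the approach suggested above, and it is only a sketch: it assumes the staging table holds at most one row per MemberID and reuses the column names from the MERGE example.

```sql
BEGIN TRAN;

-- Update patients that already exist in the target.
UPDATE t
SET t.FirstName   = s.FirstName,
    t.LastName    = s.LastName,
    t.Address     = s.Address,
    t.PhoneNumber = s.PhoneNumber
FROM Patient_Info AS t WITH (UPDLOCK, SERIALIZABLE)
INNER JOIN Patient_Info_Stage AS s
    ON t.MemberID = s.MemberID;

-- Insert patients that are not in the target yet.
INSERT INTO Patient_Info (MemberID, FirstName, LastName, Address, PhoneNumber)
SELECT s.MemberID, s.FirstName, s.LastName, s.Address, s.PhoneNumber
FROM Patient_Info_Stage AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM Patient_Info AS t WITH (UPDLOCK, SERIALIZABLE)
    WHERE t.MemberID = s.MemberID
);

COMMIT TRAN;
```

For a once-a-year import with no concurrent writers, the locking hints are probably more caution than necessity, but they cost little.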

Related

Transforms vs. Table triggers in SymmetricDS

In the source database we have a table, let's call it TableA, with primary key PK_TableA. This table has a dependent table in the source database, let's call it TableB, via a FK; let's call it FK_TableA.
We synchronize TableA from the source database to the target database, with the same table names.
We do NOT synchronize TableB from the source database to the target database, but it exists in the target database with the same name and has the same dependency on TableA.
When a row is deleted from TableA in the source database, TableB is updated by modifying all the rows with the deleted FK, setting the FK_TableA column to null.
We intend to produce the same behaviour in the target database without having to synchronize TableB.
So, on delete of a row from TableA in the source database we want to:
1) update to null the column FK_TableA of TableB in the target database, for the corresponding rows
2) delete the row from TableA in the target database
Is this possible?
What is the best mechanism? Transforms or Table Triggers (maybe with a Sync On Delete Condition)?
Can you please try to explain the way to do it?
Thanks.

Either a load filter or a load transform would work. The load filter is probably simpler for this case. Use sym_load_filter to configure a "before write" BeanShell script that does this:
if (data.getDataEventType().name().equals("DELETE")) {
    context.findTransaction().execute("update tableb set fk_tablea = null " +
        "where fk_tablea = " + OLD_FK_TABLEA);
}
return true;
The script checks that it's a DELETE statement, then it will run the SQL you need. The values for the table columns on the current row are available as upper case variables. The script returns true so the original delete will also run.
See https://www.symmetricds.org/doc/3.10/html/user-guide.html#_load_filters for more details on how to use load filters.

How to do incremental load in SQL server

I have DB tables that have no identity column. We have client data fetched from DB2 into SQL Server, and unfortunately the DB2 design doesn't have identity columns.
Now some data has been inserted, updated and deleted in the source (DB2/SQL Server), and I want to load these changes to the destination (SQL Server) using some incremental load concept.
I tried SSIS lookups in a Data Flow task, but it takes a huge amount of time to insert even one new record. Please note that in the "Lookup Transformation Editor" I'm mapping all "available input columns" to the "available lookup columns", as there is no identity column. I think this is why it's taking so long. I have a few tables with around 20 million records.
Is there any faster method available to do this, especially when the table does not have an identity column? Would EXCEPT or SQL MERGE help?
I'm open to any approach other than SSIS.

Lookups in SSIS take some time, so you can use an Execute SQL Task instead and call a MERGE procedure.
What you can do is add a flag column (updatedrecord below) and let the MERGE mark the rows that already exist. MERGE can only modify its target, so the flag lives on the destination table. Something along these lines, where the column lists are placeholders:
MERGE destination AS d
USING source AS s
    ON s.primarykey = d.primarykey
WHEN MATCHED THEN
    UPDATE SET d.updatedrecord = 1
WHEN NOT MATCHED THEN
    INSERT (primarykey, col1, col2)
    VALUES (s.primarykey, s.col1, s.col2);
With this query, new records are inserted, and the updatedrecord column identifies the matched records so that you can then update them in your destination table.
See the following links for more on MERGE procedures.
https://www.sqlservercentral.com/Forums/Topic1042053-392-1.aspx
https://msdn.microsoft.com/en-us/library/bb510625.aspx
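The question also asks whether EXCEPT would help. When there is no key to join on, a whole-row comparison with EXCEPT is one set-based option. The sketch below uses hypothetical names (dbo.StagingTable for the freshly loaded DB2 extract, dbo.DestinationTable for the target, Column1..Column3 for the columns) and is an illustration under those assumptions, not a tested solution for 20 million rows.

```sql
-- Rows present in the new extract but not in the destination:
-- these are either brand-new rows or the new version of a changed row.
INSERT INTO dbo.DestinationTable (Column1, Column2, Column3)
SELECT Column1, Column2, Column3
FROM dbo.StagingTable
EXCEPT
SELECT Column1, Column2, Column3
FROM dbo.DestinationTable;

-- Rows in the destination that no longer appear in the extract:
-- deleted at the source, or the old version of a changed row.
SELECT Column1, Column2, Column3
FROM dbo.DestinationTable
EXCEPT
SELECT Column1, Column2, Column3
FROM dbo.StagingTable;
```

EXCEPT treats NULLs as equal and returns distinct rows, so exact duplicate rows collapse into one. Without a key there is no way to tell an update from a delete plus an insert, which is why the second query only lists candidates instead of deleting them.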

If your source is a SQL query (from DB2, for instance), try adding a new column to it: a checksum computed over the columns that you expect to change or want to monitor for changes.
SELECT
    BINARY_CHECKSUM(
         Column1
        ,Column2
        ,Column3) AS ChecksumValue
    ,Column1
    ,Column2
    ,Column3
FROM #TEMP
You would have to add this column to your existing table in SQL Server as well, to be able to start comparing.
With that in place, you can do the lookup on the checksum value rather than on all the columns, since integer lookups are a lot quicker than varchar comparisons over multiple columns. Since there is no key, I am guessing you would then have to split the data between checksum matches (which should be existing records with no change) and non-matches. The non-matches could be new rows or updates, but the set you have to work with should be smaller.
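A rough sketch of that comparison on the SQL Server side follows. The names dbo.StagingTable and dbo.DestinationTable, the index name, and Column1..Column3 are placeholders for illustration, not names from the question.

```sql
-- One-time setup: store the checksum on the destination and index it.
ALTER TABLE dbo.DestinationTable ADD ChecksumValue INT NULL;
GO

UPDATE dbo.DestinationTable
SET ChecksumValue = BINARY_CHECKSUM(Column1, Column2, Column3);
GO

CREATE INDEX IX_DestinationTable_ChecksumValue
    ON dbo.DestinationTable (ChecksumValue);
GO

-- Per load: staged rows whose checksum is not found in the destination
-- are the candidates for new or changed rows.
SELECT s.Column1, s.Column2, s.Column3
FROM dbo.StagingTable AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.DestinationTable AS d
    WHERE d.ChecksumValue = BINARY_CHECKSUM(s.Column1, s.Column2, s.Column3)
);
```

Keep in mind that a checksum is not unique: two different rows can produce the same value, so a changed row can occasionally be mistaken for an unchanged one. If that matters, compare the actual columns as well for the rows whose checksums match.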
Good luck. HTH

Trigger to log inserted/updated/deleted values SQL Server 2012

I'm using SQL Server 2012 Express, and since I'm really used to PL/SQL, it's a little hard to find answers to my T-SQL questions.
What I have: about 7 tables with distinct columns, and an additional one for logging inserted/updated/deleted values from the other 7.
Question: how can I create one trigger per table so that it stores the modified data in the log table, considering I can't use Change Data Capture because I'm using the SQL Server Express edition?
Additional info: there are only two columns in the log table that I need help filling, which should hold the altered data from all the columns merged together. Example below:
CREATE TABLE USER_DATA
(
    ID INT IDENTITY(1,1) NOT NULL,
    NAME NVARCHAR(25) NOT NULL,
    PROFILE INT NOT NULL,
    DATE_ADDED DATETIME2 NOT NULL
)
GO
CREATE TABLE AUDIT_LOG
(
    ID INT IDENTITY(1,1) NOT NULL,
    USER_ALTZ NVARCHAR(30) NOT NULL,
    MACHINE SYSNAME NOT NULL,
    DATE_ALTERERED DATETIME2 NOT NULL,
    DATA_INSERTED XML,
    DATA_DELETED XML
)
GO
The columns I need help filling are the last two (DATA_INSERTED and DATA_DELETED). I'm not even sure the data type should be XML, but when someone either
INSERTS or UPDATES (new values only), the data inserted/updated in all columns of USER_DATA should be merged somehow into DATA_INSERTED.
DELETES or UPDATES (old values only), the data deleted/updated in all columns of USER_DATA should be merged somehow into DATA_DELETED.
Is it possible?

Use the inserted and deleted Tables
DML trigger statements use two special tables: the deleted table and
the inserted tables. SQL Server automatically creates and manages
these tables. You can use these temporary, memory-resident tables to
test the effects of certain data modifications and to set conditions
for DML trigger actions. You cannot directly modify the data in the
tables or perform data definition language (DDL) operations on the
tables, such as CREATE INDEX. In DML triggers, the inserted and
deleted tables are primarily used to perform the following: Extend
referential integrity between tables. Insert or update data in base
tables underlying a view. Test for errors and take action based on the
error. Find the difference between the state of a table before and
after a data modification and take actions based on that difference.
And
OUTPUT Clause (Transact-SQL)
Returns information from, or expressions based on, each row affected
by an INSERT, UPDATE, DELETE, or MERGE statement. These results can be
returned to the processing application for use in such things as
confirmation messages, archiving, and other such application
requirements. The results can also be inserted into a table or table
variable. Additionally, you can capture the results of an OUTPUT
clause in a nested INSERT, UPDATE, DELETE, or MERGE statement, and
insert those results into a target table or view.
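Applied to the tables from the question, a trigger built on the inserted and deleted tables might look like the sketch below. The trigger name TR_USER_DATA_AUDIT is made up, and the choice to serialize the rows with FOR XML and to write one log row per statement (not one per affected row) is an assumption about what "merged" should mean here.

```sql
CREATE TRIGGER TR_USER_DATA_AUDIT
ON USER_DATA
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Nothing to log if the statement affected no rows.
    IF NOT EXISTS (SELECT 1 FROM inserted)
       AND NOT EXISTS (SELECT 1 FROM deleted)
        RETURN;

    INSERT INTO AUDIT_LOG
        (USER_ALTZ, MACHINE, DATE_ALTERERED, DATA_INSERTED, DATA_DELETED)
    SELECT
        LEFT(SUSER_SNAME(), 30),           -- login name, cut to fit NVARCHAR(30)
        ISNULL(HOST_NAME(), N'unknown'),   -- client machine name
        SYSDATETIME(),
        -- New values: populated for INSERT and UPDATE.
        (SELECT * FROM inserted FOR XML RAW('row'), ROOT('inserted'), TYPE),
        -- Old values: populated for UPDATE and DELETE.
        (SELECT * FROM deleted FOR XML RAW('row'), ROOT('deleted'), TYPE);
END;
```

Because the body only refers to inserted and deleted, the same pattern could be repeated for each of the other tables by changing the trigger name and the ON clause.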

Just posting because this is what solved my problem. In the comments to my post, user @SeanLange told me to use an "audit", which I didn't know existed.
Googling it led me to this Stack Overflow answer, where the first link is a procedure that creates triggers and "shadow" tables that do more or less what I needed (it doesn't merge all values into one column, but it does the job).

SQL: Export data to new table and update old data simultaneously

I want to export data from one table into a new one with a nightly job.
To prevent generating duplicates, I added a column named "ExportState" to the source table, which is 0 for not exported and 1 for exported.
My problem is that I want to export the data and then set the state to 1. But I cannot run an INSERT INTO ... SELECT followed by an UPDATE statement, because additional data could be inserted into the source table while the export routine runs. I would then end up setting ExportState to 1 on records that I never inserted into the destination table.
Do you have suggestions regarding the following solutions?
A. INSERT INTO ... SELECT and UPDATE ExportState row by row
B. Take a snapshot, then INSERT and UPDATE ExportState for the snapshotted data
Which makes more sense?
The second problem: the source and destination tables are on different SQL Servers and database instances. Any ideas?

I would create a stored procedure to perform the task.
Within the stored procedure create a table variable or temp table. Insert the data from the source table where ExportState = 0 into the temp table. (If you have a primary key on this table just store the primary key in your temp table.)
Perform your insert statement from source table to destination table.
Using your temp table, perform your update statement to set ExportState = 1 for each record in your temp table.
Wrap all of this within a transaction.
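Sample Code:
BEGIN TRAN;
-- Capture the keys of the rows to export, so rows added later are not touched.
DECLARE @Exported TABLE (PK INTEGER NOT NULL);
INSERT INTO @Exported (PK)
SELECT PK FROM SourceTable WHERE ExportState = 0;
-- Copy exactly the captured rows (FieldNames is a placeholder for the column list).
INSERT INTO DestinationTable (FieldNames)
SELECT FieldNames
FROM SourceTable s
INNER JOIN @Exported e
    ON s.PK = e.PK
WHERE s.ExportState = 0;
-- Mark only the captured rows as exported.
UPDATE s SET ExportState = 1
FROM SourceTable s
INNER JOIN @Exported e
    ON s.PK = e.PK;
COMMIT TRAN;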
Invoke the stored procedure from your nightly job.
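When both tables live on the same server, another option is to let a single UPDATE both flag the rows and copy them, using the OUTPUT clause. This is an alternative to the temp-table approach above, sketched with placeholder column names (Field1, Field2) that are not from the question.

```sql
-- Flag the unexported rows and copy exactly those rows in one statement.
UPDATE SourceTable
SET ExportState = 1
OUTPUT inserted.PK, inserted.Field1, inserted.Field2
INTO DestinationTable (PK, Field1, Field2)
WHERE ExportState = 0;
```

Since it is one statement, rows inserted into SourceTable while it runs are either picked up and flagged together or left alone for the next run. Note that the target of OUTPUT ... INTO cannot be a remote table and cannot have enabled triggers, foreign key relationships or CHECK constraints, so with the destination on a different server you would have to output into a local table first and transfer from there.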

To connect to databases on other SQL Servers, look into using Linked Servers. You should be able to configure one under the "Server Objects" folder in SSMS 2008. Here is a link to more info if you are interested: http://msdn.microsoft.com/en-us/library/ff772782.aspx

Best way to move data between tables and generate mapping of old to new identity values

I need to merge data from 2 tables into a third (all having the same schema) and generate a mapping of old identity values to new ones. The obvious approach is to loop through the source tables using a cursor, inserting the old and new identity values along the way. Is there a better (possibly set-oriented) way to do this?
UPDATE: One additional bit of info: the destination table already has data.

Create your mapping table with an IDENTITY column for the new ID. Insert from your source tables into this table, creating your mapping.
SET IDENTITY_INSERT ON for your target table.
Insert into the target table from your source tables joined to the mapping table, then SET IDENTITY_INSERT OFF.

I created a mapping table based on the OUTPUT clause of the MERGE statement. No IDENTITY_INSERT required.
In the example below, there is RecordImportQueue and RecordDataImportQueue, and RecordDataImportQueue.RecordID is a FK to RecordImportQueue.RecordID. The data in these staging tables needs to go to Record and RecordData, and FK must be preserved.
RecordImportQueue to Record is done using a MERGE statement, producing a mapping table from its OUTPUT, and RecordDataImportQueue goes to RecordData using an INSERT from a SELECT of the source table joined to the mapping table.
DECLARE @MappingTable table ([NewRecordID] [bigint],[OldRecordID] [bigint]);
MERGE [dbo].[Record] AS target
USING (SELECT [InstanceID]
             ,RecordID AS RecordID_Original
             ,[Status]
       FROM [RecordImportQueue]
      ) AS source
    ON (target.RecordID = NULL) -- can never match as RecordID is IDENTITY NOT NULL.
WHEN NOT MATCHED THEN
    INSERT ([InstanceID],[Status])
    VALUES (source.[InstanceID],source.[Status])
OUTPUT inserted.RecordID, source.RecordID_Original INTO @MappingTable;
After that, you can insert the records into a referencing table as follows:
INSERT INTO [dbo].[RecordData]
    ([InstanceID]
    ,[RecordID]
    ,[Status])
SELECT [InstanceID]
      ,mt.NewRecordID -- the new RecordID from the mapping table
      ,[Status]
FROM [dbo].[RecordDataImportQueue] AS rdiq
JOIN @MappingTable AS mt
    ON rdiq.RecordID = mt.OldRecordID;
Although long after the original post, I hope this can help other people, and I'm curious for any feedback.

I think I would temporarily add an extra column to the new table to hold the old ID. Once your inserts are complete, you can extract the mapping into another table and drop the column.
