How to do an incremental load in SQL Server - sql-server

I have database tables with no identity column. We have client data fetched from DB2 into SQL Server, and unfortunately the DB2 design doesn't include identity columns.
Data is now being inserted, updated and deleted in the source (DB2/SQL Server), and I want to load these changes into the destination (SQL Server) using some incremental-load approach.
I tried SSIS lookups in a Data Flow task, but it takes a huge amount of time just to insert one new record. Please note that in the "Lookup Transformation Editor" I'm mapping all "available input columns" to the "available lookup columns", as there is no identity column. I think this is why it's taking so long. A few of the tables have around 20 million records.
Is there any faster method or way to do this, especially when a table has no identity column? Would EXCEPT or SQL MERGE help?
I'm open to any approach other than SSIS.

Lookup in SSIS takes some time, so you can use an Execute SQL Task and call merge procedures instead.
What you can do is use a MERGE procedure: create an updatedrecord flag column and set it for the matched records, like this:
MERGE destination AS d
USING (SELECT /* source columns */ * FROM source) AS s
    ON s.primarykey = d.primarykey
WHEN MATCHED THEN
    UPDATE SET d.updatedrecord = 1
WHEN NOT MATCHED THEN
    INSERT (primarykey /* , other destination columns */)
    VALUES (s.primarykey /* , matching source columns */);
With the above query, new records are inserted, and the matched rows are flagged through the updatedrecord column so you can handle the corresponding updates in your destination table.
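Since the suggestion is to call this from an Execute SQL Task, here is a minimal sketch of how it might be packaged; the procedure, table and column names (dbo.usp_IncrementalLoad, dbo.destination, dbo.source_stage, col1, col2) are hypothetical placeholders, and this variant does a conventional upsert rather than setting a flag:
-- Sketch only: substitute the real tables and column lists.
CREATE PROCEDURE dbo.usp_IncrementalLoad
AS
BEGIN
    SET NOCOUNT ON;

    MERGE dbo.destination AS d
    USING dbo.source_stage AS s
        ON d.primarykey = s.primarykey
    WHEN MATCHED THEN
        UPDATE SET d.col1 = s.col1,
                   d.col2 = s.col2
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (primarykey, col1, col2)
        VALUES (s.primarykey, s.col1, s.col2);
END;
GO
-- The Execute SQL Task then only needs:
-- EXEC dbo.usp_IncrementalLoad;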
You can go to the following links for merge procedures:
https://www.sqlservercentral.com/Forums/Topic1042053-392-1.aspx
https://msdn.microsoft.com/en-us/library/bb510625.aspx

If your source is a SQL query from DB2, for instance, try adding a new column to it: a checksum value over the columns you expect to change or want to monitor for changes.
SELECT
BINARY_CHECKSUM(
Column1
,Column2
,Column3) AS ChecksumValue
,Column1
,Column2
,Column3
FROM #TEMP
You would have to add this checksum column to your existing SQL Server table as well to be able to start comparing.
If you have this, then you can do the lookup on the checksum value rather than on the columns, since numeric lookups are a lot quicker than varchar comparisons over multiple columns. I am guessing that, since there is no key, you would then have to split the data between checksum matches (which should be unchanged existing records) and non-matches. The non-matches could be new rows or just updates, but the set you have to work with should be smaller. A rough sketch of that split follows below.
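As a sketch only (the table names dbo.Source_Stage and dbo.Destination are assumptions, not from the original post; both are expected to carry the ChecksumValue column computed as above):
-- Checksum matches: treat as existing, unchanged rows.
SELECT s.*
FROM dbo.Source_Stage AS s
WHERE EXISTS (SELECT 1
              FROM dbo.Destination AS d
              WHERE d.ChecksumValue = s.ChecksumValue);

-- No checksum match: new rows or changed versions of existing rows; this smaller
-- set is what still needs insert/update handling.
SELECT s.*
FROM dbo.Source_Stage AS s
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.Destination AS d
                  WHERE d.ChecksumValue = s.ChecksumValue);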
Good luck. HTH

Related

Query optimization T-SQL

We have 14M rows in a source table and want to know all the possible ways to insert data from the source table into the destination table without dropping indexes. Indexes exist on the destination table but not on the source table. In an SSIS package we tried a Data Flow task, a Lookup, and an Execute SQL task, but the performance is still lacking. So kindly let me know the possible ways to speed up the insertion without dropping indexes. Thanks in advance.
let me know the possible ways to speed up the insertion without dropping indexes
It really depends on the complete setup:
What does the real query look like, and what is the table schema?
How many indexes are there?
Is it a one-time operation, or will the daily average be 14M rows?
General answer :
i) Run the insert operation during downtime or during a very low-traffic period.
ii) Use the TABLOCK hint:
INSERT INTO TAB1 WITH (TABLOCK)
SELECT COL1,COL2,COL3
FROM TAB2
iii) You can consider disabling and then rebuilding the indexes:
ALTER INDEX ALL ON sales.customers
DISABLE;
--Insert query
ALTER INDEX ALL ON sales.customers
REBUILD;
GO
iv) If the source server is different, you can first put the source data into a parking table on the destination server that has no indexes. Another job then moves the data from the parking table to the destination table, as sketched below.
A transfer within the same server will be relatively faster.
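A minimal sketch of point iv (the parking-table name and the column names/types are placeholders carried over from the TABLOCK example above):
-- Step 1: land the cross-server data in an index-free parking table.
CREATE TABLE dbo.Parking_TAB1
(
    COL1 INT,
    COL2 VARCHAR(100),
    COL3 DATETIME
);

-- Step 2: a separate job moves the parked rows into the indexed destination
-- on the same server, then clears the parking table.
INSERT INTO TAB1 WITH (TABLOCK) (COL1, COL2, COL3)
SELECT COL1, COL2, COL3
FROM dbo.Parking_TAB1;

TRUNCATE TABLE dbo.Parking_TAB1;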

Database insert script for SQL Server

I have a requirement where data from 150 tables with different columns should be copied to another table that has all of these columns. I need a script that will do this automatically instead of my inserting them manually one by one.
Any suggestions?
You can get the column names, along with their datatypes, from either sys.columns or information_schema.columns. Then it's just a simple matter of de-duping the columns (based on name) and sorting out any conflicts between differing datatypes to create your destination table.
Once you have that, you can create and execute all your insert statements.
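As a hedged sketch of that last step (it assumes SQL Server 2017+ for STRING_AGG, that the combined target table is called dbo.Combined, and that the 150 source tables are the remaining base tables in the database; adjust the filter as needed):
-- Build one INSERT per source table from the catalog views, then execute.
DECLARE @sql NVARCHAR(MAX) = N'';

SELECT @sql = @sql
    + N'INSERT INTO dbo.Combined (' + c.cols + N') '
    + N'SELECT ' + c.cols
    + N' FROM ' + QUOTENAME(t.TABLE_SCHEMA) + N'.' + QUOTENAME(t.TABLE_NAME) + N';' + NCHAR(13)
FROM INFORMATION_SCHEMA.TABLES AS t
CROSS APPLY (SELECT STRING_AGG(QUOTENAME(ic.COLUMN_NAME), N', ')
             FROM INFORMATION_SCHEMA.COLUMNS AS ic
             WHERE ic.TABLE_SCHEMA = t.TABLE_SCHEMA
               AND ic.TABLE_NAME   = t.TABLE_NAME) AS c(cols)
WHERE t.TABLE_TYPE = 'BASE TABLE'
  AND t.TABLE_NAME <> N'Combined';

PRINT @sql;                      -- review the generated statements first
-- EXEC sys.sp_executesql @sql;  -- then execute them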
Good luck.

update data when importing a duplicate record in SQL

I have a unique requirement: I have a data list in Excel format, and I import this data into SQL Server 2008 R2 once every year using SQL Server's import functionality. In the table "Patient_Info", I have a primary key set on the column "MemberID", and when I import the data without any duplicates, all is well.
But sometimes, when I get this data, some of the patients' info is repeated with an updated address/telephone, etc., under the same MemberID. Since I set MemberID as the primary key, that record is left out of the import, and thus I don't have an updated record for the patient.
EDIT
I am not sure how to update the rows that might have existing MemberIDs; any pointer on this is greatly appreciated.
Examples (List 1 and List 2) omitted.
This is not a terribly unique requirement.
One acceptable pattern you can use to resolve this problem is to import your data into a "staging" table. The staging table would have the same structure as the target table you're importing into, but it would be a heap: it would not have a primary key.
Once the data is imported, you would then use queries to consolidate newer data records with older data records by MemberID.
Once you've consolidated all records that share a MemberID, there will be no duplicate MemberID values, and you can then insert all the staging table records into the target table.
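One way to do that consolidation, as a sketch only (it assumes an IDENTITY column ImportRowId was added to the staging heap Patient_Info_Stage so that a later row wins for each MemberID; neither detail comes from the question):
-- Keep only the newest staging row per MemberID.
;WITH Ranked AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY MemberID
                              ORDER BY ImportRowId DESC) AS rn
    FROM dbo.Patient_Info_Stage
)
DELETE FROM Ranked
WHERE rn > 1;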
EDIT
As @Panagiotis Kanavos suggests, you can use a SQL MERGE statement to both insert new records and update existing records from your staging table into the target table.
Assume that the Staging table is named Patient_Info_Stage, the target table is named Patient_Info, and that these tables have similar schemas. Also assume that field MemberId is the primary key of table Patient_Info.
The following MERGE statement will merge the staging table data into the target table:
BEGIN TRAN;
MERGE Patient_Info WITH (SERIALIZABLE) AS Target
USING Patient_Info_Stage AS Source
ON Target.MemberId = Source.MemberId
WHEN MATCHED THEN UPDATE
SET Target.FirstName = Source.FirstName
,Target.LastName = Source.LastName
,Target.Address = Source.Address
,Target.PhoneNumber = Source.PhoneNumber
WHEN NOT MATCHED THEN INSERT (
MemberID
,FirstName
,LastName
,Address
,PhoneNumber
) Values (
Source.MemberId
,Source.FirstName
,Source.LastName
,Source.Address
,Source.PhoneNumber
);
COMMIT TRAN;
NOTE: The T-SQL MERGE statement can run into race conditions under concurrent access. To ensure it works properly, do these things:
Ensure that your SQL Server is up to date with service packs and patches (the current release of SQL Server 2008 R2 is SP3, version 10.50.6000.34).
Wrap your MERGE in a transaction (BEGIN TRAN; ... COMMIT TRAN;).
Use the SERIALIZABLE hint to help prevent a potential race condition with the T-SQL MERGE statement.

import sybase data to sql server when any change make in table

I have a Sybase database which I want to migrate to SQL Server 2008 R2. I have done this, but I got a new requirement: when any data is modified or newly inserted into a table in Sybase, only that data should be migrated from Sybase to SQL Server. One table holds around 11,000,135 rows, so migrating all the data from Sybase to SQL Server every time is not feasible. Is there any possible way to do this?
I don't see any straightforward solution here. I would use the following steps to deal with the issue:
Create a table keyed on the unique key of source_table:
CREATE TABLE mod_date
(
    [key] INT UNIQUE,      -- bracketed because KEY is a reserved word
    modified_date DATETIME
);
Create an insert/update trigger on source_table that inserts into or updates the mod_date table (see the sketch after this list).
When selecting data from source_table, join to mod_date and filter to only the dates greater than the last update.
Instead of creating a new table, you could also add a modified_date column to your source_table itself and use it during the select. GL!
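A rough sketch of the trigger and the filtered select in SQL Server-flavoured T-SQL (Sybase ASE trigger syntax differs slightly; the id column on source_table and the @last_update value are assumptions):
-- Assumes source_table has a unique column id that mod_date.[key] tracks.
CREATE TRIGGER trg_source_table_mod
ON source_table
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Refresh the timestamp for keys that are already tracked.
    UPDATE m
    SET m.modified_date = GETDATE()
    FROM mod_date AS m
    JOIN inserted AS i ON i.id = m.[key];

    -- Track keys seen for the first time.
    INSERT INTO mod_date ([key], modified_date)
    SELECT i.id, GETDATE()
    FROM inserted AS i
    WHERE NOT EXISTS (SELECT 1 FROM mod_date AS m WHERE m.[key] = i.id);
END;
GO

-- Pull only the rows changed since the last migration run.
DECLARE @last_update DATETIME = '20230101';   -- value saved from the previous run
SELECT s.*
FROM source_table AS s
JOIN mod_date AS m ON m.[key] = s.id
WHERE m.modified_date > @last_update;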

Permanently sorting a table in SQLServer based on pre-existing data

I have made a table in SQL Server based on pre-existing data:
SELECT pre_existing_data
INTO new_table
FROM existing_table
I am trying to get the output to permanently sort by a particular field once the table is created. I thought this would be as simple as adding an ORDER BY clause at the end of the chunk of code that makes the table, but the data still won't sort properly.
There is no way to permanently sort a table in SQL.
You can create an index on the table and queries which use the index (in an ORDER BY clause) will be returned quicker, but the order the data is stored on the disk is not controllable.
You can create an index-organized table by using a CLUSTERED INDEX, which stores the data on disk in an ordered fashion on the clustering key. Then if you ORDER BY in your query based on the clustering key, data should come out very fast. Note that you have to use the ORDER BY in your query no matter what.
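For example, a minimal sketch (the index name and sort_column are placeholders for whichever field you want the rows ordered by):
-- Store new_table in sort_column order on disk via a clustered index.
CREATE CLUSTERED INDEX IX_new_table_sort
    ON new_table (sort_column);

-- An explicit ORDER BY is still required for guaranteed ordering of results.
SELECT *
FROM new_table
ORDER BY sort_column;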
I have made a new table in SQL Server on a pre-existing schema:
INSERT INTO new_table
SELECT * FROM old_table
ORDER BY col ASC;   -- or DESC
After that, drop old_table and rename the new table to the old table's name:
DROP TABLE old_table_name;
EXEC sp_rename 'new_table_name', 'old_table_name';
Try this trick to sort the data in your table permanently.
