To replace the built-in SCD transformation in an SSIS data flow, I used column checksums and a Lookup transformation. The process is as follows.
I need to implement SCD Type 1 on Target_Fact_table.
Source query:
SELECT [key], a, b, CHECKSUM(a, b) AS New_value FROM Source_table
Lookup query:
SELECT [key], CHECKSUM(a, b) AS Old_value FROM Target_Fact_table
If no match is found, the record is inserted. If a match is found, New_value is compared with Old_value and, if they differ, the record is updated.
The first run has no issues. On the second run, however, when the source has many records to both update and insert, the target table gets locked because of the simultaneous bulk insert and update.
I tried clearing the table lock option on the OLE DB Destination, but the locking still occurs.
What can I do to avoid this locking? Or can I add a small delay to the update transformation?
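The routing just described (insert on no match, update on a changed checksum, otherwise skip) can be sketched in Python. This is a minimal illustration, not SSIS code: `row_checksum` is a hypothetical stand-in that uses zlib.crc32 in place of SQL Server's CHECKSUM, and the lookup result is assumed to be a dict from key to Old_value.

```python
import zlib


def row_checksum(*values):
    # Hypothetical stand-in for CHECKSUM(a, b): CRC32 over the column values.
    # SQL Server's CHECKSUM uses a different hash; only equality matters here.
    # None is treated as an empty string.
    joined = "\x1f".join("" if v is None else str(v) for v in values)
    return zlib.crc32(joined.encode("utf-8"))


def route_rows(source_rows, old_values):
    """Split source rows the way the Lookup + comparison does.

    source_rows: iterable of (key, a, b) from the source query
    old_values:  dict key -> Old_value, as returned by the lookup query
    Returns (inserts, updates, unchanged).
    """
    inserts, updates, unchanged = [], [], []
    for key, a, b in source_rows:
        new_value = row_checksum(a, b)
        if key not in old_values:            # no match: insert
            inserts.append((key, a, b))
        elif new_value != old_values[key]:   # match, checksum changed: update
            updates.append((key, a, b))
        else:                                # match, same checksum: skip
            unchanged.append((key, a, b))
    return inserts, updates, unchanged
```

Rows in `updates` are the ones that would go down the update path; rows in `unchanged` are simply discarded.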
The better solution: instead of using an OLE DB Command to update the matched rows one at a time, insert the matched (changed) rows into a staging table in the destination database, then run a single UPDATE ... JOIN statement between the fact table and the staging table to apply all the new values at once.
This avoids the lock contention between the inserts and the updates, increases throughput, and can provide a cleaner audit trail for changes.
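A minimal sketch of the staging approach, using Python's standard-library sqlite3 module as a stand-in for SQL Server. The staging table name Stage_Fact is hypothetical, and because SQLite's UPDATE syntax differs from T-SQL's UPDATE ... FROM ... JOIN, the single set-based update is written with a correlated subquery instead.

```python
import sqlite3


def apply_staged_updates(conn: sqlite3.Connection, changed_rows):
    """Apply all changed rows to the fact table with one set-based UPDATE.

    changed_rows: (key, a, b) tuples that the checksum comparison flagged.
    Returns the number of fact rows updated.
    """
    cur = conn.cursor()
    # Staging table in the destination database, recreated on every run.
    cur.execute("DROP TABLE IF EXISTS Stage_Fact")
    cur.execute('CREATE TABLE Stage_Fact ("key" INTEGER PRIMARY KEY, a TEXT, b INTEGER)')
    # Plain inserts into staging: no contention with the fact table.
    cur.executemany("INSERT INTO Stage_Fact VALUES (?, ?, ?)", changed_rows)
    # One UPDATE for the whole batch (SQLite stand-in for
    # UPDATE f SET ... FROM Target_Fact_table f JOIN Stage_Fact s ON s.[key] = f.[key]).
    cur.execute("""
        UPDATE Target_Fact_table
        SET (a, b) = (SELECT s.a, s.b FROM Stage_Fact AS s
                      WHERE s."key" = Target_Fact_table."key")
        WHERE "key" IN (SELECT "key" FROM Stage_Fact)
    """)
    updated = cur.rowcount
    conn.commit()
    return updated
```

The point of the design is that the fact table receives one UPDATE for the whole batch, issued after the data flow has finished loading staging, rather than one OLE DB Command call per row.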
We have 14M rows in a source table and want to know all the possible ways to insert data from the source table into the destination table without dropping indexes. The destination table has indexes; the source table does not. In the SSIS package we tried a Data Flow task, a Lookup, and an Execute SQL task, but none of them helped and performance is poor. Kindly let me know the possible ways to speed up the insertion without dropping indexes. Thanks in advance.
let me know the possible ways to speed up the insertion without dropping indexes
It really depends on the complete setup.
What does the real query look like, and what is the table schema?
How many indexes are there?
Is this a one-time operation, or will the daily average be 14M rows?
General answer:
i) Run the insert operation during downtime or during the lowest-traffic hours.
ii) Use the TABLOCK hint:
INSERT INTO TAB1 WITH (TABLOCK)
SELECT COL1,COL2,COL3
FROM TAB2
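iii) You can consider disabling the nonclustered indexes and rebuilding them after the load. Disable only the nonclustered indexes: disabling the clustered index makes the table's data inaccessible, so the insert would fail.
ALTER INDEX <nonclustered_index_name> ON sales.customers
DISABLE;
--repeat for each nonclustered index (the name above is a placeholder), then run the insert query
--REBUILD re-enables the disabled indexes
ALTER INDEX ALL ON sales.customers
REBUILD;
GO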
iv) If the source server is different, you can first load the source data into a parking table on the destination server that has no indexes. A second job then moves the data from the parking table into the destination table; a transfer within the same server will be relatively faster.
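The parking-table approach in iv) can be sketched with Python's sqlite3 module standing in for the destination server. The parking table name TAB1_parking is hypothetical; TAB1 and COL1 to COL3 follow the TABLOCK example in ii). On SQL Server, the second step is where a WITH (TABLOCK) hint on TAB1 would go.

```python
import sqlite3


def load_via_parking(conn: sqlite3.Connection, source_rows):
    """Two-step load: bulk-insert into an index-free parking table, then move
    everything into the indexed destination with one INSERT ... SELECT.
    Returns the number of rows moved into TAB1."""
    cur = conn.cursor()
    # Step 1: parking table with the destination's columns but no keys or indexes.
    cur.execute("CREATE TABLE IF NOT EXISTS TAB1_parking (COL1 INTEGER, COL2 TEXT, COL3 TEXT)")
    cur.executemany("INSERT INTO TAB1_parking VALUES (?, ?, ?)", source_rows)
    conn.commit()
    # Step 2 ("another job"): one set-based insert within the same server.
    cur.execute("INSERT INTO TAB1 (COL1, COL2, COL3) SELECT COL1, COL2, COL3 FROM TAB1_parking")
    moved = cur.rowcount
    cur.execute("DELETE FROM TAB1_parking")  # empty the parking table for the next run
    conn.commit()
    return moved
```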
In a stored procedure we are using a partition function. The job runs per period, e.g. '202001', '202002', etc. The SP has already been executed for previous periods, and now, due to some data issues, we are considering executing it again for those periods.
We load data into a work table and then use partition switching to move the data from the work table into the main table:
ALTER TABLE db_table_Work SWITCH
TO db_table PARTITION $PARTITION.db_table_PFPerPost(@PeriodKey);
If we execute the SP again now for a past period, will it insert the existing rows again, or will it insert the newly updated data?
SWITCH will fail with an error if the target partition is not empty. Furthermore, the non-partitioned source table must have a check constraint on the partitioning column that ensures all rows fall within the target partition's boundaries.
If the data must remain in the primary table partition during reprocessing, you'll need to ensure it is not changed while the previous period is being reprocessed; otherwise those changes will be lost when the partition is replaced. If that is not possible, use a MERGE to insert/update/delete rows instead of the more efficient SWITCH.
Consider partitioning the work table using the same partition scheme as the target table to avoid the need for a check constraint. This will also facilitate moving the partition into the work table and switching back after reprocessing. Below is an example of this technique, which assumes it is acceptable for the primary table partition to be empty during reprocessing of the period:
--switch primary table partition into work table for reprocessing
ALTER TABLE db_table
SWITCH PARTITION $PARTITION.db_table_PFPerPost(@PeriodKey)
TO db_table_Work PARTITION $PARTITION.db_table_PFPerPost(@PeriodKey);
--reprocess data here
--switch reprocessed data back into primary table
ALTER TABLE db_table_Work
SWITCH PARTITION $PARTITION.db_table_PFPerPost(@PeriodKey)
TO db_table PARTITION $PARTITION.db_table_PFPerPost(@PeriodKey);
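To make the SWITCH rules concrete, here is a toy Python model. It is not SQL Server code, and the class and function names are hypothetical. It models only two things: a switch is refused when the target partition already has rows, and the switch-out, reprocess, switch-back sequence from the example. The check-constraint requirement is not modeled.

```python
class PartitionedTable:
    """Toy model of a partitioned table: period key -> list of rows."""

    def __init__(self, name, partitions=None):
        self.name = name
        self.partitions = dict(partitions or {})

    def rows(self, period):
        return self.partitions.get(period, [])


def switch_partition(source, target, period):
    """Model of ALTER TABLE source SWITCH PARTITION p TO target PARTITION p.

    SQL Server only reassigns metadata; here the row list is simply moved.
    As in SQL Server, the switch is refused if the target partition has rows.
    """
    if target.rows(period):
        raise ValueError(f"partition {period} of {target.name} is not empty")
    target.partitions[period] = source.partitions.pop(period, [])


def reprocess_period(main, work, period, reprocess):
    switch_partition(main, work, period)                     # switch out to the work table
    work.partitions[period] = reprocess(work.rows(period))   # reprocess data here
    switch_partition(work, main, period)                     # switch reprocessed data back
```

Re-running the original SP's single switch for a period that still has data would hit the "not empty" error; the out-and-back sequence avoids that by emptying the main partition first.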
In the source database we have a table, let's call it TableA, with primary key PK_TableA. This table has a dependent table in the source database, let's call it TableB, via a FK, let's call it FK_TableA.
We synchronize TableA from the source database to the target database, with the same table names.
We do NOT synchronize TableB from the source database to the target database, but it exists in the target database with the same name and the same dependency on TableA.
When a row is deleted from TableA in the source database, TableB is updated by setting the FK_TableA column to null in every row that referenced the deleted row.
We intend to reproduce the same behaviour in the target database without having to synchronize TableB.
So, on delete of a row from TableA in the source database, we:
1) want to update column FK_TableA of TableB to null in the target database, for the corresponding rows
2) delete the row from TableA in the target database
Is this possible?
What is the best mechanism: transforms or table triggers (maybe with a Sync On Delete Condition)?
Can you please explain how to do it?
Thanks.
Either a load filter or a load transform would work. The load filter is probably simpler for this case. Use sym_load_filter to configure a "before write" BeanShell script that does this:
if (data.getDataEventType().name().equals("DELETE")) {
context.findTransaction().execute("update tableb set fk_tablea = null " +
"where fk_tablea = " + OLD_FK_TABLEA);
}
return true;
The script checks that it's a DELETE statement, then it will run the SQL you need. The values for the table columns on the current row are available as upper case variables. The script returns true so the original delete will also run.
See https://www.symmetricds.org/doc/3.10/html/user-guide.html#_load_filters for more details on how to use load filters.
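The ordering the load filter relies on (null out the TableB references first, then let the original DELETE run) can be illustrated with Python's sqlite3 module and foreign keys enabled. This is not SymmetricDS code; the column name pk_tablea is a hypothetical stand-in for TableA's primary key column.

```python
import sqlite3


def apply_tablea_delete(conn: sqlite3.Connection, pk_tablea):
    """Replay a TableA delete on the target the way the load filter does:
    first null out TableB references, then let the original DELETE run."""
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute("UPDATE tableb SET fk_tablea = NULL WHERE fk_tablea = ?", (pk_tablea,))
        conn.execute("DELETE FROM tablea WHERE pk_tablea = ?", (pk_tablea,))
```

Without the UPDATE, the DELETE fails with a foreign key violation while TableB rows still reference the parent. The sketch binds the key as a query parameter rather than concatenating it into the SQL string.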
I have a unique requirement: I have a data list in Excel format, and I import this data into SQL Server 2008 R2 once every year, using SQL Server's import functionality. In the table "Patient_Info", I have a primary key set on the column "MemberID", and when I import data without any duplicates, all is well.
But sometimes, when I get this data, some patients' info is repeated with an updated address, telephone, etc., under the same MemberID. Since MemberID is the primary key, that record is left out of the import, and thus I don't have an updated record for that patient.
EDIT
I am not sure how to update the rows that have existing MemberIDs; any pointers are greatly appreciated.
This is not a terribly unique requirement.
One acceptable pattern you can use to resolve this problem is to import your data into a "staging" table. The staging table would have the same structure as the target table you're importing into, but it would be a heap: it would not have a primary key.
Once the data is imported, you would then use queries to consolidate newer data records with older data records by MemberID.
Once you've consolidated all same MemberID records, there will be no duplicate MemberID values, and you can then insert all the staging table records into the target table.
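A minimal sketch of the consolidation step, using Python's sqlite3 module. It assumes the row imported last for a MemberID is the newest one and uses SQLite's implicit rowid as the import order. SQL Server has no equivalent implicit column, so there you would need an explicit load-order or date column; that is an assumption, not part of the original answer.

```python
import sqlite3


def consolidate_stage(conn: sqlite3.Connection):
    """Keep only the newest row per MemberID in the staging heap.

    Assumption: 'newest' means imported last, i.e. the highest SQLite rowid.
    """
    with conn:  # commit on success
        conn.execute("""
            DELETE FROM Patient_Info_Stage
            WHERE rowid NOT IN (
                SELECT MAX(rowid) FROM Patient_Info_Stage GROUP BY MemberID
            )
        """)
```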
EDIT
As @Panagiotis Kanavos suggests, you can use a SQL MERGE statement to both insert new records and update existing records from your staging table to the target table.
Assume that the Staging table is named Patient_Info_Stage, the target table is named Patient_Info, and that these tables have similar schemas. Also assume that field MemberId is the primary key of table Patient_Info.
The following MERGE statement will merge the staging table data into the target table:
BEGIN TRAN;
MERGE Patient_Info WITH (SERIALIZABLE) AS Target
USING Patient_Info_Stage AS Source
ON Target.MemberId = Source.MemberId
WHEN MATCHED THEN UPDATE
SET Target.FirstName = Source.FirstName
,Target.LastName = Source.LastName
,Target.Address = Source.Address
,Target.PhoneNumber = Source.PhoneNumber
WHEN NOT MATCHED THEN INSERT (
MemberID
,FirstName
,LastName
,Address
,PhoneNumber
) Values (
Source.MemberId
,Source.FirstName
,Source.LastName
,Source.Address
,Source.PhoneNumber
);
COMMIT TRAN;
*NOTE: A T-SQL MERGE is a single statement, but it is not safe against concurrent sessions by default: two sessions can race (for example, both find no match and both try to insert the same MemberID). To ensure it works properly, do these things:
Ensure that your SQL Server is up to date with service packs and patches (at the time of writing, the current release of SQL Server 2008 R2 is SP3, version 10.50.6000.34).
Wrap your MERGE in a transaction (BEGIN TRAN; and COMMIT TRAN;).
Use the SERIALIZABLE (equivalently, HOLDLOCK) hint to prevent the race condition with the T-SQL MERGE statement.
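SQLite has no MERGE statement, so this Python/sqlite3 sketch expresses the same insert-or-update logic as an UPDATE for matched MemberIDs followed by an INSERT ... WHERE NOT EXISTS for new ones, both inside one transaction. Like the MERGE above, it assumes the staging table has already been consolidated to one row per MemberID.

```python
import sqlite3


def merge_stage_into_target(conn: sqlite3.Connection):
    """Insert-or-update Patient_Info from Patient_Info_Stage in one transaction."""
    with conn:  # commit on success, roll back on error
        # WHEN MATCHED THEN UPDATE
        conn.execute("""
            UPDATE Patient_Info
            SET (FirstName, LastName, Address, PhoneNumber) =
                (SELECT s.FirstName, s.LastName, s.Address, s.PhoneNumber
                 FROM Patient_Info_Stage AS s
                 WHERE s.MemberID = Patient_Info.MemberID)
            WHERE MemberID IN (SELECT MemberID FROM Patient_Info_Stage)
        """)
        # WHEN NOT MATCHED THEN INSERT
        conn.execute("""
            INSERT INTO Patient_Info (MemberID, FirstName, LastName, Address, PhoneNumber)
            SELECT s.MemberID, s.FirstName, s.LastName, s.Address, s.PhoneNumber
            FROM Patient_Info_Stage AS s
            WHERE NOT EXISTS (SELECT 1 FROM Patient_Info AS t WHERE t.MemberID = s.MemberID)
        """)
```

Running the UPDATE first means the newly inserted rows are not touched a second time.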
I have DB tables that have no identity column. We fetch client data from DB2 into SQL Server, and unfortunately the DB2 design doesn't have identity columns.
Now some data has been inserted, updated, and deleted in the source (DB2/SQL Server), and I want to load these changes into the destination (SQL Server) using an incremental load.
I tried SSIS lookups in a Data Flow task, but it takes a huge amount of time just to insert one new record. Note that in the Lookup Transformation Editor I'm mapping all available input columns to the available lookup columns, since there is no identity column; I think this is why it's slow. A few tables have around 20 million records.
Is there a faster method, especially when a table has no identity column? Would EXCEPT or a SQL MERGE help?
I'm open to have any other approach other than SSIS.
Lookup in SSIS takes time, so you can use an Execute SQL Task to call MERGE procedures instead.
I think what you can do is use a MERGE procedure: add a flag column (updatedrecord) to your destination table and set it on the records the MERGE updates, like this:
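MERGE destination AS d
USING (SELECT primarykey, col1, col2 FROM source) AS s
ON s.primarykey = d.primarykey
--primarykey, col1, col2 and updatedrecord are placeholder names
--note: <> does not detect changes to or from NULL
WHEN MATCHED AND (s.col1 <> d.col1 OR s.col2 <> d.col2) THEN
UPDATE SET d.col1 = s.col1, d.col2 = s.col2, d.updatedrecord = 1
WHEN NOT MATCHED BY TARGET THEN
INSERT (primarykey, col1, col2, updatedrecord)
VALUES (s.primarykey, s.col1, s.col2, 0);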
With the query above, new records are inserted, and existing records that changed are updated in the destination table and flagged through the updatedrecord column.
You can see the following links for more on MERGE:
https://www.sqlservercentral.com/Forums/Topic1042053-392-1.aspx
https://msdn.microsoft.com/en-us/library/bb510625.aspx
If your source is a SQL query (from DB2, for instance), try adding a new column to it: a checksum value over the columns you expect to change or want to monitor for changes.
SELECT
BINARY_CHECKSUM(
Column1
,Column2
,Column3)AS ChecksumValue
,Column1
,Column2
,Column3
FROM #TEMP
You would have to add this column to your existing table in SQL Server as well, to be able to start comparing.
If you have this, you can do the lookup on the checksum value rather than on all the columns, since integer lookups are much quicker than varchar comparisons over multiple columns. Since there is no key, I am guessing you would then have to split the data into checksum matches (which should be unchanged existing records) and non-matches. The non-matches could be new rows or just updates, but that set should be smaller to work with. Keep in mind that BINARY_CHECKSUM can return the same value for different inputs, so a rare changed row could be misclassified as a match.
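The split into checksum matches and non-matches can be sketched in Python. The function names are hypothetical, and zlib.crc32 stands in for BINARY_CHECKSUM; like BINARY_CHECKSUM, a 32-bit hash can collide, so a match means "probably unchanged". The stored checksums play the role of the checksum column added to the destination table.

```python
import zlib


def binary_checksum_standin(row):
    # Hypothetical stand-in for BINARY_CHECKSUM(Column1, Column2, Column3).
    return zlib.crc32(repr(tuple(row)).encode("utf-8"))


def split_by_checksum(source_rows, stored_checksums):
    """Split keyless source rows into checksum matches and non-matches.

    stored_checksums: set of checksum values already stored in the destination.
    Matches should be unchanged existing rows; non-matches are new or
    updated rows, the smaller set left to process further.
    """
    matches, non_matches = [], []
    for row in source_rows:
        if binary_checksum_standin(row) in stored_checksums:
            matches.append(row)
        else:
            non_matches.append(row)
    return matches, non_matches
```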
Good luck. HTH