I am a beginner with SSIS, so if you have any guidance, please be specific. Thank you.
I am trying to sync two tables on two different servers:
Table A on Server A (60k rows) and Table B on Server B (60k rows).
There are mainly 3 things to do:
Add new records that are in Table A but not in Table B.
Update records that exist in both Table A and Table B.
Delete records that are not in Table A but are in Table B.
Neither table has a primary key, but two columns together can mostly identify a record (each table contains some duplicates, but not many).
http://www.rad.pasfu.com/index.php?/archives/150-Insert,-Update,-and-Delete-Destination-table-with-SSIS.html
I've tried to use this guy's method to figure it out, but failed. I tried the approach on two sample tables with a couple of rows of data, and it worked.
On the real tables, I set Column A and Column B to SortKeyPosition 1 and 2, since I have to use them together.
Full outer join at the Merge Join.
Conditional Split for new records as below:
(!ISNULL([S_Column A]) && !ISNULL([S_Column B])) && (ISNULL([D_Column A]) && ISNULL([D_Column B]))
Delete records as:
(ISNULL([S_Column A]) && ISNULL([S_Column B])) && (!ISNULL([D_Column A]) && !ISNULL([D_Column B]))
As a result, I got 34k rows flagged as new records and 0 rows flagged for deletion.
I've run the comparison in SQL to get the real numbers: 1000+ new records, 600+ to delete, and around 60k needing updates. I don't know what causes this discrepancy or how to fix it.
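For reference, the expected counts can be checked with queries along these lines (the table and column names here are placeholders, not the actual schema, and the two-column key is assumed to be the match condition):

```sql
-- Rows in A with no match in B on the two-column key (expected "new" rows)
SELECT COUNT(*)
FROM TableA AS a
WHERE NOT EXISTS (SELECT 1
                  FROM TableB AS b
                  WHERE b.ColA = a.ColA
                    AND b.ColB = a.ColB);

-- Rows in B with no match in A (expected "delete" rows)
SELECT COUNT(*)
FROM TableB AS b
WHERE NOT EXISTS (SELECT 1
                  FROM TableA AS a
                  WHERE a.ColA = b.ColA
                    AND a.ColB = b.ColB);
```

If these counts differ wildly from what the Merge Join produces, the data flow is not comparing what these queries compare.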
Updating: after the Conditional Split, I used an OLE DB Command with a SQL statement that replaces all the data (even though some of it doesn't need to be updated). I'm looking for a better solution for the update path too.
Hoping for some help! Thank you again, and an early happy holiday!
Try the following steps:
COPY table from ServerA to a working table on ServerB
Handle duplicates using a "theoretical" key:
Can these duplicates be distinguished by adding additional fields?
If so, add those fields to the key.
If not, assume that all of those rows on ServerB will end up looking like one row from ServerA.
DELETE rows from B that do not match rows from A
UPDATE rows in B where the keys match in A but data differs (it is worthwhile to compare the data and only update if the data is actually different).
INSERT the new rows from A
DROP or TRUNCATE the working table.
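The delete/update/insert steps above can be sketched roughly like this (WorkA, TableB, KeyCol1/KeyCol2 and DataCol are placeholders for your actual schema, and the two-column key is assumed to be unique after the de-duplication step):

```sql
-- Step 1 (copy Table A into working table WorkA on ServerB) is done via SSIS
-- or a linked server, so it is omitted here.

-- Delete rows from B that have no matching key in the working copy of A
DELETE b
FROM TableB AS b
WHERE NOT EXISTS (SELECT 1 FROM WorkA AS a
                  WHERE a.KeyCol1 = b.KeyCol1 AND a.KeyCol2 = b.KeyCol2);

-- Update rows in B where the keys match but the data differs
UPDATE b
SET b.DataCol = a.DataCol
FROM TableB AS b
JOIN WorkA AS a
  ON a.KeyCol1 = b.KeyCol1 AND a.KeyCol2 = b.KeyCol2
WHERE b.DataCol <> a.DataCol;   -- only touch rows that actually changed

-- Insert the rows from A that B does not have yet
INSERT INTO TableB (KeyCol1, KeyCol2, DataCol)
SELECT a.KeyCol1, a.KeyCol2, a.DataCol
FROM WorkA AS a
WHERE NOT EXISTS (SELECT 1 FROM TableB AS b
                  WHERE b.KeyCol1 = a.KeyCol1 AND b.KeyCol2 = a.KeyCol2);

-- Clean up
TRUNCATE TABLE WorkA;
```

Because everything after the initial copy runs on ServerB, the cross-server traffic is one bulk copy rather than row-by-row lookups.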
I tried asking this question before and it seems to have been swept under the rug.
First things first, here are two pictures showing the table structure and the current output I get in SSIS.
Table Diagram
Current Output
So in table three there is only one entry. That entry (name) applies to all of the other foreign keys, though. I want the final output to look like my current output, but with ones instead of the NULLs.
I was able to get this far on my own through researching and learning about the merge transformations, but I can't seem to find anything on manipulating the data in the way that I want.
I greatly appreciate any tips or advice you can offer.
EDIT: Since the images can't be seen apparently, I will try and describe them.
The table diagram has four tables, the top one in the waterfall has a primary key formed from the three foreign keys for the three different tables.
I'm trying to fill out this table in SSIS. My output has each foreign key id from the first two tables, but only one id from the third table; the rest of the third foreign key values are all NULLs. I believe this is because there is only one entry in that table for now, but since that entry applies to all of the foreign key ids, it should be repeating.
It should look like this:
ID1 ID2 ID3
1 1 1
2 2 1
3 3 1
But instead, I am only getting nulls in the ID3 field after the first record. How do I make the single id repeat in ID3?
EDIT 2: Some additional screenshots of my data flow and merge transformation as requested.
(Screenshot: SSIS data flow)
After working on this for a few weeks, and with a tip from a colleague, a solution to this question was found. Surprisingly, it was quite simple, and I'm slightly shocked that no one on here could provide the answer.
The solution was simply this: using a data source, set the data access mode to SQL Command and write the following SQL:
SELECT a.T1ID,
       b.T2ID,
       c.T3ID
FROM Table1 AS a
JOIN Table2 AS b
  ON a.T1ID = b.T2ID
CROSS JOIN Table3 AS c
ORDER BY a.T1ID ASC;
If Table3 will always have just a single row, the simplest solution would be to use an Execute SQL task to save the T3id to a variable (Control Flow), then use a Derived Column task (Data Flow) to add the variable as a new column.
If that won't work for you (or your data), you can take a look here to see how to fudge the Merge Join task to do what you want.
In our SQL Server DB we have about 800+ tables, of which 40-50 are business-critical. The MIS team needs to generate reports based on those 50 business tables.
Those 50 tables get updated frequently, and the MIS team requires the delta records (updated/inserted/deleted).
What would be the best solution?
We have a few approaches here:
1. Always On
2. Replication
3. Mirroring
4. Introducing a new column (LastModifiedDate, with an index) in those 50 tables and periodically pulling those records into the MIS environment.
The LastModifiedDate approach would mean a huge code change: those 50 tables are touched by a huge number of stored procedures containing INSERT/UPDATE statements, and each of those procedures would need to be modified to set LastModifiedDate.
What would be the best solution from the above approaches?
Please let us know if there is any other approach. Note: we are using SQL Server 2008 R2.
Regards, Karthik
One approach is to have INSERT, UPDATE and DELETE triggers on these tables, and for each table an archive table with exactly the same columns plus, e.g., username, modifieddatetime, and a bit flag to distinguish new and old values. The triggers then simply insert into the archive, selecting from inserted/deleted plus the current user, the current time, and 1 for inserted rows or 0 for deleted ones.
Then all your MIS team needs to concern themselves with is the archive tables, and you will not need to make a structural change to the existing tables.
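A minimal sketch of such a trigger for one table (MyTable, MyTable_Archive and the data columns are placeholder names; note that an UPDATE populates both the inserted and deleted pseudo-tables, so it archives both the new and the old row image):

```sql
CREATE TRIGGER trg_MyTable_Audit
ON MyTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- New images (INSERTs and the "after" side of UPDATEs)
    INSERT INTO MyTable_Archive (Col1, Col2, UserName, ModifiedDateTime, IsNew)
    SELECT Col1, Col2, SUSER_SNAME(), GETDATE(), 1
    FROM inserted;

    -- Old images (DELETEs and the "before" side of UPDATEs)
    INSERT INTO MyTable_Archive (Col1, Col2, UserName, ModifiedDateTime, IsNew)
    SELECT Col1, Col2, SUSER_SNAME(), GETDATE(), 0
    FROM deleted;
END;
```

The MIS extract then just reads MyTable_Archive filtered on ModifiedDateTime since the last pull.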
I've got table A and table B in a database. I'm trying to move a column from table A to table B without losing any data. I'm not sure whether it's best to move the column somehow or to create a whole new column in table B and copy the data over.
The problem is that the database is already in production, and I don't want the clients to lose the data currently stored in column X of table A. I thought of writing a migration that creates the same column X in table B and then somehow copying the data there, but I'm not sure how to do that, and I couldn't find a similar problem here.
If you have phpMyAdmin you can do this pretty easily. This command should work:
INSERT INTO tabletwo (columnb)
SELECT columna FROM tableone;
(Note that this appends new rows to tabletwo; if the matching rows already exist there, you'd want an UPDATE with a join instead.)
Always back up the DBs, load locally and try this first, never on live, lol. I'm sure you're aware. :)
Note: columna and columnb are placeholders for your actual column names.
I think you can create a migration to add the column to table B.
Afterwards, you can use Tinker ("php artisan tinker" in the terminal) to move the desired data from table A to table B.
Can someone help me get into the mindset of knowing how to fix data in SQL tables (by trying NOT to give me SQL routines I could just run)?
OK, this is the situation… Suppose I have a single table with a column called ColumnA which has lots of duplicate values, and I need to remove all the duplicate entries from the table. The question is: if I had to write pseudo-code as a plan, what SQL should be written?
Many thanks to anyone who can offer me any pointers.
Kind regards,
James
I think you've already articulated very basic pseudocode for the issue you describe in stating that you wish to delete duplicate values from Column A.
For this example, I would tend to:
Find all instances of duplicates.
Work out a method of determining which one to keep (Google "MAX N in Group" for ideas; there are good articles here on SO and on DBA Stack Exchange, as well as external articles with examples).
Write your delete to cater for the records you identified as unwanted duplicates in the previous step.
For me, when working through these types of issues in SQL Server, I tend to write a series of Common Table Expressions (CTE) to identify my target records and then delete based on that.
For example;
;WITH Duplicates AS (
-- Write your select query to identify which subset of your records are affected by having duplicate values in Column A
), TargetRows AS (
-- Further select from Duplicates some method of MAX N in Group to identify which of the rows are unwanted
) -- Then here DELETE from your table based upon your findings from above
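Since you asked for the thinking rather than a ready-made routine, here is only the shape of one common "MAX N in Group" technique, with invented table and column names (MyTable, SomeTieBreaker) — ROW_NUMBER() numbers the rows within each group of duplicates, and everything after the first is deleted:

```sql
;WITH Ranked AS (
    SELECT ColumnA,
           ROW_NUMBER() OVER (PARTITION BY ColumnA
                              ORDER BY SomeTieBreaker) AS rn
    FROM MyTable
)
DELETE FROM Ranked
WHERE rn > 1;   -- keep the first row per ColumnA value, remove the rest
```

The key decision is the ORDER BY inside the window: that is where you encode which duplicate deserves to survive.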
I need to create an index from two tables that are not related. But when I try to do so, the response says the number of records fetched equals the total of both tables' records, yet no index is created.
Please help.
Creating one index from two unrelated tables sounds very strange. Why can't you create two indexes?
In case you have no other choice, create a unique id field in the schema (see http://wiki.apache.org/solr/UniqueKey). Also check in the DB logs which queries are actually being run.