I have a SQL query that joins 3 tables. I need to synchronize the query result with a target table every 24 hours.
The idea is to run the query every 24 hours, compare the result with the target table, and then delete, insert, or update the target rows accordingly.
I'm asking for best practices for dealing with this kind of situation.
Thank you.
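A rough sketch of what that 24-hour pass could look like, assuming a SQL Server-style MERGE and hypothetical table and column names (target_table, table_a/table_b/table_c, id, col1..col3); the one statement covers all three operations:

    MERGE INTO target_table AS t
    USING (
        -- the 3-table query whose result should be mirrored
        SELECT a.id, a.col1, b.col2, c.col3
        FROM table_a a
        JOIN table_b b ON b.a_id = a.id
        JOIN table_c c ON c.a_id = a.id
    ) AS s
    ON t.id = s.id
    WHEN MATCHED THEN
        UPDATE SET col1 = s.col1, col2 = s.col2, col3 = s.col3
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (id, col1, col2, col3) VALUES (s.id, s.col1, s.col2, s.col3)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;

The statement itself can then be run on a 24-hour schedule by whatever job scheduler the database offers (SQL Server Agent, cron, etc.).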
Related
Once a day I have to synchronize a table between two databases.
Source: Microsoft SQL Server
Destination: PostgreSQL
Table contains up to 30 million rows.
The first time I will copy the whole table, but after that, for efficiency, my plan is to insert/update only the changed rows.
Done that way, though, if a row is deleted from the source database it will not be deleted from the destination database.
The problem is that I don’t know which rows were deleted from the source database.
My rough idea right now is a binary-search approach: compare a checksum/sum of the rows on each side and narrow down to the deleted rows that way.
I’m at a dead end - please share your thoughts on this...
In SQL Server you can enable Change Tracking to track which rows are Inserted, Updated, or Deleted since the last time you synchronized the tables.
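A minimal sketch of how that looks, assuming change tracking is enabled at the database level and the source table is a hypothetical dbo.Orders keyed on OrderID:

    -- One-time setup: enable change tracking on the database and the table.
    ALTER DATABASE SourceDb
        SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);
    ALTER TABLE dbo.Orders ENABLE CHANGE_TRACKING;

    -- Each sync run: fetch everything that changed since the last stored version.
    DECLARE @last_sync_version BIGINT = 0;  -- in practice, load the value saved after the previous run
    SELECT ct.OrderID,
           ct.SYS_CHANGE_OPERATION   -- 'I' = insert, 'U' = update, 'D' = delete
    FROM CHANGETABLE(CHANGES dbo.Orders, @last_sync_version) AS ct;

    -- Store this as the baseline for the next run.
    SELECT CHANGE_TRACKING_CURRENT_VERSION();

The 'D' rows give exactly the deleted keys, which is the part that is otherwise hard to detect on the destination side.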
With the TDS foreign data wrapper (tds_fdw), map the source table into PostgreSQL as a foreign table (or stage it into a temp table), and use a join to find/exclude the rows that you need.
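A hedged sketch of that, assuming tds_fdw is already configured, a foreign table orders_src mirrors the SQL Server table, and the local copy is orders keyed on id (the column names are illustrative):

    -- Rows missing from the source no longer match anything in the foreign table.
    DELETE FROM orders AS dst
    WHERE NOT EXISTS (
        SELECT 1 FROM orders_src AS src WHERE src.id = dst.id
    );

    -- New and changed rows can be upserted from the same foreign table.
    INSERT INTO orders (id, status, amount)
    SELECT id, status, amount FROM orders_src
    ON CONFLICT (id) DO UPDATE
        SET status = EXCLUDED.status,
            amount = EXCLUDED.amount;

With 30 million rows, the anti-join pulls a lot of data across the wire, so it may be worth restricting the foreign-table scan to the key column where possible.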
I am using the Delete step of Pentaho to delete about 200k rows (with 35 columns each) from an MS SQL Server Express table. The primary key is the condition I sort the rows on, and the commit size is 10k.
Server performance should not be the issue, because I am able to insert at a speed of over 1,000 rows per second.
Tried the same steps with a table that does not have a primary key constraint. Same issue.
Would appreciate any help!
How should the statement look if I don't want to type all 400k PK numbers into the WHERE clause?
Not sure what strategy Pentaho uses to run the deletes, but you might try loading the 400k IDs into a staging or temporary table and referencing that in the DELETE, e.g.
delete from maintable where id in (select id from maintable_ids_to_delete)
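A hedged T-SQL variant of the same idea, with a hypothetical staging table and batching so the transaction log on the Express edition stays small:

    -- Narrow staging table that the ETL job bulk-loads the IDs into.
    CREATE TABLE maintable_ids_to_delete (id INT PRIMARY KEY);

    -- Delete in chunks; repeat until nothing is left to delete.
    WHILE 1 = 1
    BEGIN
        DELETE TOP (10000) m
        FROM maintable AS m
        INNER JOIN maintable_ids_to_delete AS d ON d.id = m.id;

        IF @@ROWCOUNT = 0 BREAK;
    END;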
I have a source table in a Sybase database (ORDERS) and a source table in an MSSQL database (DOCUMENTS). I need to query the Sybase database and, for each row found in the ORDERS table, get the matching row(s) by order number from the DOCUMENTS table.
I originally wrote the SSIS package using a Lookup transformation, which was simple, except that it can be a one-to-many relationship: one order number exists in the ORDERS table, but more than one document could exist in the DOCUMENTS table, and the SSIS Lookup only returns the first match.
My second attempt will be to stage the rows from the ORDERS table into a staging table in MSSQL and then loop through the rows in that table using a FOR EACH LOOP CONTAINER, getting the matching rows from the DOCUMENTS table and inserting them into another staging table. After all rows from ORDERS have been processed, I will write a query to join the two staging tables to give me my result. A concern with this method is that I will be opening and closing the DOCUMENTS database connection many times, which will not be very efficient (although there will probably be fewer than 200 records).
Or could you let me know of any other way of doing this?
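If the ORDERS rows are staged first, the per-row loop can be replaced by one set-based join on the MSSQL side; a hedged sketch with hypothetical table and column names:

    -- stg_orders is filled by a single SSIS data flow from Sybase.
    SELECT o.order_number,
           o.order_date,
           d.document_id,
           d.document_name
    FROM stg_orders AS o
    INNER JOIN dbo.DOCUMENTS AS d
        ON d.order_number = o.order_number;   -- one-to-many: every matching document comes back

That keeps a single connection to each database open for the whole run instead of one per ORDERS row.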
I have 2 tables: one with 1 million rows (Table 1) and another with 99 million rows (Table 2). They are in separate schemas.
They have similar structures, so no problem there.
My question would be this:
I need a table containing both tables' data in the schema that contains Table 1.
Would it be faster to run a script to transfer all 99 million rows of Table 2 into Table 1,
OR
Would it be faster to run a script to transfer all 1 million rows into Table 2, and then alter Table 2 to move it into Table 1's schema?
OR
Would everything actually be instantaneous?
My understanding is that you want to insert all records from Table 2 into Table 1. If that is the case, I would suggest dropping the indexes on Table 1, running the insert, and then rebuilding them. Alternatively you could leave the indexes on, but that would slow things WAY DOWN. Another solution, which is my preferred one, is to create a Table 3, insert each of the two tables into it, build the index, then rename Table 1 and Table 2 to TableName_Backup and rename Table 3 to whatever you want. This last solution should give you optimal results while keeping both original tables in their original state while you validate the data. Once you feel good about the data, either move the two original tables to an archive location or drop them, depending on your company policy.
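A hedged sketch of that preferred "build a third table, then swap" approach, using PostgreSQL-style DDL and hypothetical schema, table, and column names (the exact copy/rename/index syntax varies by database):

    -- Empty copy of Table 1's structure.
    CREATE TABLE schema1.table3 (LIKE schema1.table1 INCLUDING DEFAULTS);

    -- Load both sources while table3 has no indexes yet.
    INSERT INTO schema1.table3 SELECT * FROM schema1.table1;
    INSERT INTO schema1.table3 SELECT * FROM schema2.table2;

    -- Build the index once, after the bulk load.
    CREATE INDEX ix_table3_id ON schema1.table3 (id);

    -- Swap names, keeping the original around for validation.
    ALTER TABLE schema1.table1 RENAME TO table1_backup;
    ALTER TABLE schema1.table3 RENAME TO table1;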
I need to update 2 million records across 120 tables. I have created an index on each table, since the same column is referred to in the WHERE clause and in the UPDATE statement. Each table has 100,000 records on average. I disabled the foreign key constraints before the update.
For this I have written a procedure that fetches the records and runs the update statements. From blog posts I learned that BULK COLLECT and FORALL are a good option in Oracle, but I can see only a small difference in the time it takes to run the update statements. Is there any other approach to increase performance and reduce the time needed to update the records in the tables?
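For reference, a hedged sketch of the BULK COLLECT / FORALL pattern for one table, with hypothetical names (staging_updates holding the key/new-value pairs, my_table as the target):

    DECLARE
        CURSOR c IS
            SELECT ref_col, new_val FROM staging_updates;
        TYPE t_ref IS TABLE OF staging_updates.ref_col%TYPE;
        TYPE t_val IS TABLE OF staging_updates.new_val%TYPE;
        l_ref t_ref;
        l_val t_val;
    BEGIN
        OPEN c;
        LOOP
            -- Fetch in bounded batches so PGA memory stays under control.
            FETCH c BULK COLLECT INTO l_ref, l_val LIMIT 10000;
            EXIT WHEN l_ref.COUNT = 0;

            -- One context switch per batch instead of one per row.
            FORALL i IN 1 .. l_ref.COUNT
                UPDATE my_table
                   SET target_col = l_val(i)
                 WHERE ref_col   = l_ref(i);

            COMMIT;  -- per-batch commit; adjust to the transactional requirements
        END LOOP;
        CLOSE c;
    END;
    /

If the update can be expressed as a join against the driving data, a single set-based UPDATE or MERGE per table is often faster still than any row-by-row PL/SQL.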