During the logical delete step of the OSB DB polling process, I want to update a timestamp column in addition to the MarkReadColumn column.
Is there any way to do that? Can I modify the SQL query used for the logical delete?
Best, Khalid
Environment: Oracle 12C
I have a table with about 10 columns, including a few CLOB and date columns. This is a very busy table for an ETL process, as described below:
Flat files are loaded into the table first, then updated and processed. The inserts and updates happen in batches. Millions of records are inserted and updated.
There is also a delete process that removes old data based on a date field. It runs as a PL/SQL procedure and deletes from the table in a loop, fetching only the first n records at a time based on the date field.
I do not want the delete process to interfere with the regular inserts/updates. What is the best practice for coding the delete so that it has minimal impact on the regular insert/update process?
I could also partition the table and delete in parallel, since each partition uses its own rollback segment, but I am looking for a simpler way to tune the delete process.
Any suggestions on using a special rollback segment or other tuning tips?
The first thing you should look at is decoupling the various ETL processes so that you do not need to run them all together or in a particular sequence, thereby removing the dependency between the INSERTs/UPDATEs and the DELETEs. While the insert/update could be handled in a single MERGE block in your ETL, you could defer the delete by simply marking the rows to be deleted later, i.e. doing a soft delete. You could do this with a flag column in your table, and use the same flag in your application and queries to filter those rows out.
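A minimal sketch of the soft-delete idea (table and column names here are made up for illustration):

    -- Add a flag column; 'N' = live row, 'Y' = marked for later physical delete
    ALTER TABLE etl_stage ADD (delete_flag CHAR(1) DEFAULT 'N' NOT NULL);

    -- The ETL marks old rows instead of deleting them inline
    UPDATE etl_stage
       SET delete_flag = 'Y'
     WHERE load_date < SYSDATE - 30;

    -- Application queries simply filter on the flag
    SELECT *
      FROM etl_stage
     WHERE delete_flag = 'N';

    -- An off-peak job later removes the flagged rows in bulk
    DELETE FROM etl_stage
     WHERE delete_flag = 'Y';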
By deferring the delete, the critical path of your ETL should shrink. Partitioning the data by date range should definitely help you maintain the data and also make the transactions efficient if they are date driven. Also, look for any row-by-row ("slow-by-slow") processing and convert it to bulk operations, and avoid context switching between SQL and PL/SQL as much as possible.
If you partition the table by date range, then you could look into DROP/TRUNCATE PARTITION, which discards the rows stored in a partition as a DDL statement. This cannot be rolled back. It executes quickly and uses few system resources (undo and redo). You can read more about it in the documentation.
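For example, a rough sketch with an interval-partitioned table (hypothetical names, one partition per month):

    -- Date-range (interval) partitioning, one partition per month
    CREATE TABLE etl_stage_part (
      id        NUMBER,
      payload   CLOB,
      load_date DATE NOT NULL
    )
    PARTITION BY RANGE (load_date)
    INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
    (PARTITION p0 VALUES LESS THAN (DATE '2018-01-01'));

    -- Ageing out a month of data is then a quick DDL, not millions of DELETEs
    ALTER TABLE etl_stage_part DROP PARTITION FOR (DATE '2018-01-15') UPDATE INDEXES;
    -- or, to keep the empty partition around:
    -- ALTER TABLE etl_stage_part TRUNCATE PARTITION FOR (DATE '2018-01-15');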
We would like to synchronize data (insert, update) from Oracle (11g) to PostgreSQL (10). Our approach was the following:
A trigger on the table in Oracle updates a column with nextval from a sequence before insert and update.
PostgreSQL knows the last sequence number processed and fetches the rows from Oracle where the sequence number > lastSequenceNumberFetched.
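The Oracle side of that approach presumably looks roughly like this (sequence, trigger and column names are illustrative):

    CREATE SEQUENCE sync_seq;

    CREATE OR REPLACE TRIGGER trg_orders_sync
    BEFORE INSERT OR UPDATE ON orders
    FOR EACH ROW
    BEGIN
      -- stamp every changed row with an increasing sync number (11g syntax)
      :NEW.sync_seq_no := sync_seq.NEXTVAL;
    END;
    /

    -- PostgreSQL side then pulls:
    --   SELECT ... FROM orders WHERE sync_seq_no > :lastSequenceNumberFetched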
We now have the following problem:
Session 1 in Oracle inserts a row, sequence number (let's say 45) is written but no COMMIT is done in Oracle.
Session 2 in Oracle inserts a row, sequence number is written (let's say 49 (because sequences in Oracle can have gaps)) and a COMMIT is done in Oracle.
Session in PostgreSQL fetches rows from Oracle with sequenceNumber > 44 (because the lastSequenceNumberFetched is 44) and gets the row with sequenceNumber 49. So this is the new lastSequenceNumberFetched.
Session 1 in Oracle makes a commit.
Session in PostgreSQL fetches rows from Oracle with sequenceNumber > 49. The problem is that the row with sequenceNumber 45 is never fetched.
Are there any better approaches for our use case avoiding our problem with missing data?
In case you don't have delete operations on your tables and the tables are not very big, I suggest using the Oracle System Change Number (SCN) at the row level, which is returned by the pseudocolumn ORA_ROWSCN (link). This is the commit time represented as a number. By default the SCN is tracked per data block, but you can enable tracking at the row level (keyword ROWDEPENDENCIES), so you have to recreate your table with this keyword. At the start of the sync procedure you get the current SCN with a call to dbms_flashback.get_system_change_number, then scan all tables where ora_rowscn is between _last_scn_value_ and _current_scn_value_. The disadvantage is that this pseudocolumn is not indexed, so you will get full table scans, which is slow for big tables.
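A rough sketch of that approach (table and column names are just examples):

    -- ROWDEPENDENCIES must be specified at creation time; it cannot be added later
    CREATE TABLE orders (
      id      NUMBER PRIMARY KEY,
      payload VARCHAR2(4000)
    ) ROWDEPENDENCIES;

    -- At the start of each sync run, remember the current SCN
    SELECT dbms_flashback.get_system_change_number AS current_scn FROM dual;

    -- Then pull everything committed since the previous run
    SELECT id, payload, ora_rowscn
      FROM orders
     WHERE ora_rowscn > :last_scn_value
       AND ora_rowscn <= :current_scn_value;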
If you use delete statements then you have to track the records that were deleted. For this purpose you can use one log table with the following columns: table_name, table_id_value, operation (insert/update/delete). The table is filled by triggers on the base tables. So, for your case, when session 1 commits data in the base table, you then have a record in the log table to process, and you don't see it until the session commits. So there are no issues with the sequence numbers you described.
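A sketch of such a log table and trigger (assuming a base table orders with a numeric id column; all names are illustrative):

    CREATE TABLE change_log (
      table_name     VARCHAR2(30),
      table_id_value NUMBER,
      operation      VARCHAR2(6),
      change_time    TIMESTAMP DEFAULT SYSTIMESTAMP
    );

    CREATE OR REPLACE TRIGGER trg_orders_log
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH ROW
    BEGIN
      INSERT INTO change_log (table_name, table_id_value, operation)
      VALUES ('ORDERS',
              NVL(:NEW.id, :OLD.id),
              CASE WHEN INSERTING THEN 'INSERT'
                   WHEN UPDATING  THEN 'UPDATE'
                   ELSE 'DELETE' END);
    END;
    /

The log rows only become visible to the sync job once the base transaction commits, which is what avoids the gap problem described above; the job can then delete (or mark) the log rows it has processed.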
Hope that helps.
Is this purely a data project or do you have some client here? If you do have a middle tier you could use an ORM to abstract some of this and do writes to both. Do you care whether the sequences are the same? It would be possible to do something like collect all the data to synchronize since a particular timestamp (every table would have to have a UTC timestamp) and then take a hash of all the data and compare with what is in Postgres.
It might be useful to have some more of your requirements for the synchronization of data and the reasoning behind it, e.g.
Do the keys need to be the same in both environments? Why?
Who views the data? Is the same consumer looking at both sources?
Why wouldn't you just use an ORM to target only one DB? Why do you need both Oracle and Postgres?
I have seen a similar setup: an application on Postgres, mostly for reporting and other secondary tasks, while the main app was on Oracle.
Some of the main app's tables are cached in Postgres for convenience, but this setup brings in the sync problem.
The compromise solution was a mix of incremental sequence-based sync during the daytime and a full table copy overnight.
Regarding other solutions proposed here:
Postgres FDW is slow for complex queries and it puts extra load on the foreign DB, especially when the WHERE clause refers to both local and foreign tables.
The same query will run much faster if the foreign table is cached in Postgres.
Incremental/differential sync using sequence numbers: I tried this and it works acceptably for small tables, but the nightmare starts with child relations; maybe an ORM can help here.
The ideal solution in my opinion would probably be to stream Oracle changes to Postgres, or to an intermediary process that replicates the changes to Postgres.
I have no clue how to do this; as I understand it, it requires Oracle GoldenGate (+ licence).
I have a database on Oracle 11g with a table that is updated by external users. I want to catch the inserts/updates/deletes on this table in order to bring these changes to a table in another database, and I'm trying different methods for research. So far I have tested polling (a job that checks every minute whether there has been an update, insert or delete on the table) and triggers (fired on update, insert or delete on the table). Are there alternative methods?
I found Oracle AQ (Advanced Queuing), DBMS_PIPE and the Oracle SNMP Agent Integrator polling activity, but I don't know if they are right for this case...
It depends.
Polling or triggers are often all you need depending on the volume of data involved, and the frequency of inserts/updates/deletes.
For example, the polling method might be as simple as adding a column which is set to 1 by default, and updated to NULL when the row is "consumed" by the replication code. A trigger on the table would set it back to 1 if a row is updated. An index on this column would be lightweight (the index would only include entries for rows where the column is 1) and therefore fast to query. You'd need another table to handle deletes, though.
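In Oracle, that could look roughly like this (names are made up; the replication job clears the flag with an UPDATE that sets only this one column):

    -- 1 = not yet replicated, NULL = already consumed by the replication code
    ALTER TABLE source_tab ADD (repl_pending NUMBER(1) DEFAULT 1);

    -- A single-column B-tree index stores no entry for NULL keys,
    -- so this index only contains the pending rows and stays small
    CREATE INDEX ix_source_tab_pending ON source_tab (repl_pending);

    CREATE OR REPLACE TRIGGER trg_source_tab_pending
    BEFORE UPDATE ON source_tab
    FOR EACH ROW
    BEGIN
      -- re-flag the row on normal application updates, but not when the
      -- replication job itself is clearing the flag
      IF NOT UPDATING('REPL_PENDING') THEN
        :NEW.repl_pending := 1;
      END IF;
    END;
    /

    -- Replication job:
    --   SELECT ... FROM source_tab WHERE repl_pending = 1;
    --   UPDATE source_tab SET repl_pending = NULL WHERE ...;  -- after copying

New inserts pick up the default of 1, so they are flagged automatically as well.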
The trigger method would merely write insert/update/delete rows into a log table of some sort, which would then get purged periodically by a job.
For heavier volumes solutions include Oracle GoldenGate and Oracle Streams: http://www.oracle.com/technetwork/database/focus-areas/data-integration/index.html
Scenario:
I have Database1 (PostgreSQL). For this:
i) When a record is deleted, the status column for that record is changed to inactive.
ii) When a record is updated, the current record is rendered INACTIVE and a new record is inserted.
iii) Insertion happens as usual.
There is a timestamp column for each record in all the tables in the database.
I have another database, Database2 (SQLite), which is synced with Database1 and follows the same properties as Database1.
Database1 gets changed regularly, and I would get CSV files for all the tables. The CSVs would include all the data, including new insertions and updates.
Requirement:
I need to make the data in Database1 consistent with the new CSV.
i) For the records that are not in the CSV but are in Database1 (DELETED RECORDS): I have to set their status to inactive.
ii) For the records that are in the CSV but not in Database1 (INSERTED RECORDS): I need these records to be inserted.
iii) For the records that are updated in the CSV: I need to set the existing record's status to inactive and insert a new record.
Kindly help me with the logical implementation of these!!!
Thanks
Jayakrishnan
I assume you're looking to build software to achieve what you want, not looking for an off-the-shelf solution.
What environments are you able to develop in? C? PHP? Java? C#?
Lots of options in many environments that can all read/write from CSV/SQLite/PostgreSQL.
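If plain SQL against PostgreSQL is an option, here is one possible outline. It assumes each table has a natural key column named id, a status column, and a staging table loaded from the CSV; all names and the single payload column are placeholders:

    -- Load the CSV into a staging table with the same natural-key layout
    CREATE TEMP TABLE staging_orders (id BIGINT, payload TEXT);
    COPY staging_orders FROM '/path/orders.csv' WITH (FORMAT csv, HEADER true);

    -- i) rows missing from the CSV: mark the live rows inactive
    UPDATE orders o
       SET status = 'INACTIVE'
     WHERE o.status = 'ACTIVE'
       AND NOT EXISTS (SELECT 1 FROM staging_orders s WHERE s.id = o.id);

    -- ii) rows only in the CSV: insert them
    INSERT INTO orders (id, payload, status, updated_at)
    SELECT s.id, s.payload, 'ACTIVE', now()
      FROM staging_orders s
     WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.id = s.id);

    -- iii) rows changed in the CSV: retire the old version ...
    UPDATE orders o
       SET status = 'INACTIVE'
      FROM staging_orders s
     WHERE s.id = o.id
       AND o.status = 'ACTIVE'
       AND s.payload IS DISTINCT FROM o.payload;

    -- ... and insert the new version for keys that no longer have an active row
    INSERT INTO orders (id, payload, status, updated_at)
    SELECT s.id, s.payload, 'ACTIVE', now()
      FROM staging_orders s
     WHERE EXISTS (SELECT 1 FROM orders o WHERE o.id = s.id)
       AND NOT EXISTS (SELECT 1 FROM orders o2
                        WHERE o2.id = s.id AND o2.status = 'ACTIVE');

The same logic can then be replayed against the SQLite copy.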
You could use an ON DELETE trigger to override the existing delete behavior.
This strikes me as dangerous, however. Someone is going to rely on this, and then when the trigger isn't there, actual deletions will occur... It's better to encapsulate this behind a view or something and put a trigger on that, or go through a stored procedure.
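A sketch of the "behind a view" variant in PostgreSQL (view, table and column names are illustrative):

    -- Expose only live rows through a view
    CREATE VIEW orders_v AS
      SELECT * FROM orders WHERE status = 'ACTIVE';

    -- Turn deletes against the view into soft deletes on the base table
    CREATE OR REPLACE FUNCTION orders_soft_delete() RETURNS trigger AS $$
    BEGIN
      UPDATE orders
         SET status = 'INACTIVE'
       WHERE id = OLD.id;
      RETURN OLD;  -- report the row as handled; nothing is physically deleted
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER trg_orders_soft_delete
    INSTEAD OF DELETE ON orders_v
    FOR EACH ROW EXECUTE PROCEDURE orders_soft_delete();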
I erroneously deleted all the rows from an MS SQL 2000 table that is used in merge replication (the table is on the publisher). I then compounded the issue by using a DTS operation to retrieve the rows from a backup database and repopulate the table.
This has created the following issue:
The delete operation marked the rows for deletion on the clients but the DTS operation bypasses the replication triggers so the imported rows are not marked for insertion on the subscribers. In effect the subscribers lose the data although it is on the publisher.
So I thought "no worries" I will just delete the rows again and then add them correctly via an insert statement and they will then be marked for insertion on the subscribers.
This is my problem:
I cannot delete the DTSed rows because I get a "Cannot insert duplicate key row in object 'MSmerge_tombstone' with unique index 'uc1MSmerge_tombstone'." error. What I would like to do is somehow delete the rows from the table bypassing the merge replication trigger. Is this possible? I don't want to remove and redo the replication because the subscribers are 50+ windows mobile devices.
Edit: I have tried the Truncate Table command. This gives the following error "Cannot truncate table xxxx because it is published for replication"
Have you tried truncating the table?
You may have to truncate the table and reset the ID field back to 0 if you need the inserted rows to have the same ID. If not, just truncate and it should be fine.
You also could look into temporarily dropping the unique index and adding it back when you're done.
Look into sp_mergedummyupdate
Would creating a second table be an option? You could create a second table, populate it with the needed data, add the constraints/indexes, then drop the first table and rename your second table. This should give you the data with the right keys...and it should all consist of SQL statements that are allowed to trickle down the replication. It just probably isn't the best for performance...and it definitely imposes some risk.
I haven't tried this first-hand in a replicated environment...but it may be at least worth trying out.
Thanks for the tips...I eventually found a solution:
I deleted the merge delete trigger from the table
Deleted the DTSed rows
Recreated the merge delete trigger
Added my rows correctly using an insert statement.
I was a little worried about fiddling with the merge triggers, but everything appears to be working correctly.