I have to write a script which will take a day's data from a single table and dump it into another table (where another process will consume that data). However, the table from which I want to export data is a "live" table with a high rate of inserts, so any kind of locking while the export runs is not acceptable. I know MySQL has a way of not holding a lock via the "--skip-lock-tables" flag. Does Oracle have something similar? If not, what is the best way to achieve this?
The Oracle version being used is Oracle SE One 12.1.0.1.v2.
At the point in time when your SELECT starts, an SCN is attached to it. Any changes to the data after that SCN do not affect your SELECT; it reads the old data from UNDO. If this is a high-transaction table and you expect your SELECT to be long running, ensure that you have sufficient UNDO space and a long enough UNDO_RETENTION.
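To illustrate, the one-day export can be done as a single read-consistent statement; a minimal sketch, assuming a source table `live_orders` with a DATE column `order_ts` and a target table `export_orders` (all names are hypothetical):

```sql
-- One statement = one read-consistent snapshot as of its start SCN.
-- No locks are taken on the live table; concurrent inserts continue.
INSERT /*+ APPEND */ INTO export_orders
SELECT *
  FROM live_orders
 WHERE order_ts >= DATE '2018-06-01'
   AND order_ts <  DATE '2018-06-02';

COMMIT;
```

Because the whole copy is one SELECT, every row it returns is consistent as of the statement's start SCN; rows inserted after that point are simply not seen.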
Also, good design prevents many potential issues. From this point of view, I would suggest implementing daily partitions on your source table. That makes it easy to back up one day's data and helps maintain the table in the future.
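Daily partitioning along those lines could be set up with interval partitioning (available in 12c); a sketch with hypothetical table and column names:

```sql
-- Oracle auto-creates a new daily partition as data arrives.
CREATE TABLE live_orders (
  order_id  NUMBER        NOT NULL,
  order_ts  DATE          NOT NULL,
  payload   VARCHAR2(200)
)
PARTITION BY RANGE (order_ts)
INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
(
  PARTITION p_initial VALUES LESS THAN (DATE '2018-01-01')
);
```

A day's data can then be read, or exchanged out, one partition at a time.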
Hope this helps.
I have an application which selects quite a large amount of data (pyodbc + sqlalchemy, db = SQL Server), does some operations on it (with pandas) and then inserts the results into another table.
My issue is that I would like to mark the originally selected rows at the end of my processing.
What is the best way to achieve this?
I currently prevent any new inserts into my first table with a PID lock (blocking the loader), but that of course is not a constraint on the DB side; I then bulk-update the rows in the first table which don't have any mark yet.
I could of course get a list of the IDs that were in my original data and update them in batches, but that is probably really slow since there could be millions upon millions of rows.
Another option would be to lock the table at the start of my process, but is that actually a good idea? (What if my script dies for whatever reason during processing, in a way that the "finally" block releasing the lock is never executed?)
Thankful for any ideas, thoughts etc!
We have a legacy system (MAS200, if you need to know) and an old VBS script which pulls data from MAS and populates two staging tables in our production SQL database. After some processing / cleanup that data goes into the actual tables.
Data flow : MAS200 --> Staging tables --> Production table
To simplify, consider there's an "Order" parent table and an "Items" child table. An Order can have multiple Items, and each Item record has an FK OrderId. So during import we first import the Order data, create an entry in the "Order" table, and then fetch the "Items" entries and import them.
Existing TRIGGER based approach -
At present we have two TRIGGERs, one on each staging table (Order & Items). Each new insert is captured, and after processing the data a new entry is inserted into the actual production table. My only concern is that the trigger executes once per Items row instead of as a BULK insert. It also seems less manageable.
SP based approach -
I could remove both TRIGGERs, import the data into the staging tables, and finally execute an SP which imports the Order data and then performs a BULK insert into the Items table. Would that be more efficient / faster?
It's not really a comparison, just a different design. I'd like to know which one seems better, or if there's a third, better approach to import from MAS into the production SQL db.
EDIT 1: Thanks. As many asked: the data volume is not big or too frequent. Let's say 10-12 Orders (with 20-30 Items) every hour. Also, with TRIGGERs, though we don't get a TRANSACTION, two simple TRIGGERs suffice. I believe more scripting is needed with an SP.
Goal : Need to keep it as simple, clean and efficient as possible.
Using Triggers:
Pros:
The data sync is real time. Since you create data by data entry, the volume of data should not be big, so a bulk insert doesn't gain much; performance using triggers is good enough.
Cons:
If the connection breaks between MAS200 and production partway through, you'll have a big problem. Also (as you mentioned) you cannot have a transaction, which is a big issue.
Using an SP on a schedule:
Cons:
Data sync is not real time.
I suggest you use an SP to transfer data on a time-interval basis (if you can tolerate the synchronization delay).
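A set-based SP along those lines might look like this sketch (SQL Server; all table, column, and procedure names are assumptions):

```sql
CREATE PROCEDURE dbo.ImportFromStaging
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    -- Parents first, so the Items FK is satisfied.
    INSERT INTO dbo.[Order] (OrderId, OrderDate)
    SELECT s.OrderId, s.OrderDate
    FROM dbo.StagingOrder AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.[Order] AS o
                      WHERE o.OrderId = s.OrderId);

    -- Then all children in one set-based insert.
    INSERT INTO dbo.Items (ItemId, OrderId, Quantity)
    SELECT s.ItemId, s.OrderId, s.Quantity
    FROM dbo.StagingItems AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Items AS i
                      WHERE i.ItemId = s.ItemId);

    COMMIT TRANSACTION;
END;
```

Unlike the per-row triggers, both inserts run inside one transaction, so a failed import leaves nothing half-done.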
If you really want a fast approach, you need:
1) to disable the FK on the ITEMS table for the duration of the load
2) then load the ORDERS and the ITEMS, and re-enable the FK
All of this should be done in an SP; the trigger approach is safe but very slow when it comes to large bulk loads.
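In SQL Server, that FK toggle looks like the following (constraint and table names are hypothetical):

```sql
-- Disable FK checking for the duration of the load.
ALTER TABLE dbo.Items NOCHECK CONSTRAINT FK_Items_Order;

-- ... bulk load ORDERS and ITEMS here ...

-- Re-enable and re-validate existing rows (WITH CHECK),
-- so the constraint is trusted again by the optimizer.
ALTER TABLE dbo.Items WITH CHECK CHECK CONSTRAINT FK_Items_Order;
```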
I hope you will find it useful
Thanks
Is there an automatic way of knowing which rows are the latest to have been added to an OpenEdge table? I am working with a client and have access to their database, but they are not saving ids nor timestamps for the data.
I was wondering if, hopefully, OpenEdge is somehow doing this out of the box. (I doubt it is but it won't hurt to check)
Edit: My Goal
My goal is to be able to import only the new data, i.e. the delta, of a specific table. Without knowing which rows are new, I am forced to import everything because I have no clue what was added.
1) Short answer is No - there's no "in the box" way for you to tell which records were added, or the order they were added.
The only way to tell the order of creation is by applying a sequence or by time-stamping the record. Since your application does neither, you're out of luck.
2) If you're looking for changes w/out applying schema changes, you can capture changes using session or db triggers to capture updates to the db, and saving that activity log somewhere.
3) If you're just looking for a "delta" - you can take a periodic backup of the database, and then use queries to compare the current db with the backup db and get the differences that way.
4) Maintain a db on the customer site with the contents of the last table dump. The next time you want to get deltas from the customer, compare that table's contents with the current table, dump the differences, then update the db table to match the current db's table.
5) Personally, I'd talk to the customer and see (a) whether they actually require this functionality, and (b) what they think about adding some fields and a bit of code to the system to get an activity log. Adding a few fields and some code to update them shouldn't be that big of a deal.
You could use database triggers to meet this need. In order to do so you will need to be able to write and deploy trigger procedures. And you need to keep in mind that the 4GL and SQL-92 engines do not recognize each other's triggers. So if updates are possible via SQL, 4GL triggers will be blind to those updates. And vice-versa. (If you do not use SQL none of this matters.)
You would probably want to use WRITE triggers to catch both insertions and updates to data. Do you care about deletes?
Simple-minded 4GL WRITE trigger:

TRIGGER PROCEDURE FOR WRITE OF Customer.
/* OLD BUFFER oldCustomer. */  /* OLD BUFFER is optional and not needed in this use case */

output to "customer.dat" append.
export customer.
output close.

return.
I'm writing an application which must log information pretty frequently, say, twice in a second. I wish to save the information to an sqlite database, however I don't mind to commit changes to the disk once every ten minutes.
Executing my queries against a file database takes too long and makes the computer lag.
An optional solution is to use an in-memory database (it will fit, no worries) and synchronize it to disk from time to time.
Is that possible? Is there a better way to achieve it (can you tell SQLite to commit to disk only after X queries)?
Can I solve this with Qt's SQL wrapper?
Let's assume you have an on-disk database called 'disk_logs' with a table called 'events'. You could attach an in-memory database to your existing database:
ATTACH DATABASE ':memory:' AS mem_logs;
Create a table in that database (which would be entirely in-memory) to receive the incoming log events:
CREATE TABLE mem_logs.events(a, b, c);
Then transfer the data from the in-memory table to the on-disk table during application downtime:
INSERT INTO disk_logs.events SELECT * FROM mem_logs.events;
And then delete the contents of the existing in-memory table. Repeat.
This is pretty complicated though... If your records span multiple tables and are linked together with foreign keys, it might be a pain to keep these in sync as you copy from the in-memory tables to the on-disk tables.
Before attempting something (uncomfortably over-engineered) like this, I'd also suggest trying to make SQLite go as fast as possible. SQLite should be able to easily handle > 50K record inserts per second. A few log entries twice a second should not cause significant slowdown.
If you're executing each insert within its own transaction, that could be a significant contributor to the slowdowns you're seeing. Perhaps you could:
Count the number of records inserted so far
Begin a transaction
Insert your record
Increment count
Commit/end transaction when N records have been inserted
Repeat
The downside is that if the system crashes during that window you risk losing the un-committed records (but if you were willing to use an in-memory database, then it sounds like you're OK with that risk).
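Using the `events` table from the earlier example, the batching idea is just the following sketch (the PRAGMA settings are an additional, optional durability trade-off):

```sql
-- Optional: WAL mode plus relaxed syncing cuts fsync cost;
-- on a crash you may lose the most recent commits.
PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;

BEGIN;
INSERT INTO events (a, b, c) VALUES (1, 2, 3);
INSERT INTO events (a, b, c) VALUES (4, 5, 6);
-- ... keep appending until N rows are buffered ...
COMMIT;  -- one disk sync for the whole batch
```

One COMMIT per N rows amortizes the per-transaction disk sync that makes row-at-a-time inserts slow.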
A brief search of the SQLite documentation turned up nothing useful (it wasn't likely and I didn't expect it).
Why not use a background thread that wakes up every 10 minutes, copies all of the log rows from the in-memory database to the external database, and deletes them from the in-memory database? When your program is ready to exit, wake the background thread one last time to save the last logs, then close all of the connections.
I have a requirement to take a "snapshot" of a current database and clone it into the same database, with new Primary Keys.
The schema in question consists of about 10 tables, but a few of the tables will potentially contain hundreds of thousands to 1 million records that need to be duplicated.
What are my options here?
I'm afraid that writing a SPROC will require locking the database rows in question (for concurrency) for the entire duration of the operation, which is quite annoying to other users. How long would such an operation take, assuming we optimize it to the full extent SQL Server allows? Is it going to be 30 seconds to 1 minute to perform this many inserts? I'm not able to lock the whole table(s) and do a bulk insert, because other users under other accounts are using the same tables independently.
Depending on performance expectations, an alternative would be to dump the current db into an xml file and then asynchronously clone the db from this xml file at leisure in the background. The obvious advantage of this is that the db is only locked for the time it takes to do the xml dump, and the inserts can run in the background.
If a good DBA can get the "clone" operation to execute start to finish in under 10 seconds, then it's probably not worth the complexity of the xmldump/webservice solution. But if it's a lost cause, and inserting potentially millions of rows is likely to balloon out in time, then I'd rather start out with the xml approach right away.
Or maybe there's an entirely better approach altogether??
Thanks a lot for any insights you can provide.
I would suggest backing up the database and then restoring it as a new DB on your server. You can use that new DB as your source.
I would definitely recommend against the XML dump idea.
Does it need to be in the exact same tables? You could make a set of "snapshots" tables where all these records go, you would only need a single insert + select, like
insert into snapshots_source1 (user,col1, col2, ..., colN)
select 'john', col1, col2, ..., colN from source1
and so on.
You can give the snapshots_* tables an IDENTITY column that will create the 'new PK', and they can also preserve the old one if you wish.
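A sketch of such a snapshot table (SQL Server syntax; all names are assumptions):

```sql
CREATE TABLE snapshots_source1 (
  snapshot_id INT IDENTITY(1,1) PRIMARY KEY,  -- the 'new PK'
  old_id      INT,                            -- preserves the original PK
  [user]      VARCHAR(50),
  col1        INT,
  col2        INT
);

INSERT INTO snapshots_source1 ([user], old_id, col1, col2)
SELECT 'john', id, col1, col2
FROM source1;
```

The IDENTITY column assigns the new keys automatically during the single INSERT...SELECT, while `old_id` keeps the mapping back to the source rows.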
This has (almost) no locking issues and looks a lot saner.
It does require a change in the code, but it shouldn't be too hard to make the app point to the snapshots tables when appropriate.
This also eases cleaning and maintenance.
---8<------8<------8<---outdated answer---8<---8<------8<------8<------8<---
Why don't you just take a live backup and do the data manipulation (key changing) on the destination clone?
Now, in general, this snapshot with new primary keys idea sounds suspect. If you want a replica, you have log shipping and cluster service, if you want a copy of the data to generate a 'new app instance' a backup/restore/manipulate process should be enough.
You don't say how large your DB is, but you can certainly back up 20 million rows (800 MB?) in about 10 seconds, depending on how fast your disk subsystem is...