I'm working with cursors at the moment and it's getting messy for me; I hope you can clear up some questions.
I have checked the Oracle documentation about cursors but I cannot find out:
When a cursor is opened, is a local copy of the result created in memory?
If yes: does that really make sense if I have a table with a lot of data? I don't think it would be very efficient, would it?
If no: is the whole data set locked against other processes?
If yes: what if I'm doing a truly heavy process for each row? The data would be unavailable for a long time...
If no: what would happen if another process modified the data I'm currently using with the cursor, or added new rows? Would the cursor see those changes?
Thanks so much.
You may want to read the section on Data Concurrency and Consistency in the Concepts Guide.
The answers to your specific questions:
When a cursor is opened, is a local copy of the result created in memory?
No. However, via Oracle's "Multiversion Read Consistency" (see the link above), the rows fetched by the cursor will all be consistent with the point in time at which the cursor was opened - i.e. each row, when fetched, is a row that existed when the cursor was opened and still has the values it had then (even though another session may have updated or even deleted it in the meantime).
Is the whole data set locked against other processes?
No.
What would happen if another process modified the data I'm currently using with the cursor, or added new rows? Would the cursor see those changes?
No - your cursor would not see those changes; it would continue to work with the rows as they existed when the cursor was opened.
The Concepts Guide explains this in detail, but the essence of the way it works is as follows:
Oracle maintains something called a System Change Number (SCN) that is continually incremented.
When your cursor opens it notes the current value of the SCN.
As the cursor fetches rows, it looks at the SCN stamped on them. If this SCN is the same as or lower than the cursor's starting SCN, the data is up to date and is used. However, if the row's SCN is higher than the cursor's, another session has changed the row (and committed the change). In this case Oracle looks in the rollback (undo) segments for the old version of the row and uses that instead. If the query runs for a long time, it is possible that the old version has been overwritten in the rollback segments; in that case the query fails with an ORA-01555 error.
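You can actually poke at this mechanism from plain SQL: flashback query rebuilds rows from the same undo data, given an SCN. A small illustration, assuming the standard EMP demo table and EXECUTE privilege on DBMS_FLASHBACK (the :scn bind is whatever value the first query returned):

-- note the current SCN
SELECT dbms_flashback.get_system_change_number AS scn FROM dual;

-- ... another session updates and commits some EMP rows ...

-- rebuild the rows as they looked at that SCN, using the undo data
SELECT * FROM emp AS OF SCN :scn;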
You can modify this default behaviour if needed. For example, if it is vital that no other session modifies the rows you are querying during the running of your cursor, then you can use the FOR UPDATE clause to lock the rows:
CURSOR c IS SELECT sal FROM emp FOR UPDATE OF sal;
Now any session that tries to modify a row used in your query while it is running is blocked until your query has finished and you commit or roll back.
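For completeness, a minimal PL/SQL sketch of how such a cursor is typically consumed; the 5% raise is just an illustration, not something from the question:

DECLARE
  CURSOR c IS SELECT sal FROM emp FOR UPDATE OF sal;
BEGIN
  FOR r IN c LOOP
    -- each fetched row is already locked, so no other session can change it here
    UPDATE emp SET sal = r.sal * 1.05 WHERE CURRENT OF c;
  END LOOP;
  COMMIT;  -- releases the row locks taken by FOR UPDATE
END;
/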
Related
I want to copy one big database table to another. This is my current approach:
OPEN CURSOR WITH HOLD lv_db_cursor FOR
SELECT * FROM zcustomers.
DO.
REFRESH gt_custom.
FETCH NEXT CURSOR lv_db_cursor
INTO TABLE gt_custom
PACKAGE SIZE lv_package_size.
IF sy-subrc NE 0.
CLOSE CURSOR lv_db_cursor.
EXIT.
ENDIF.
INSERT zcustomers1 FROM TABLE gt_custom.
* Write code here to modify your custom table from gt_custom.
ENDDO.
But the problem is that I get the error "ASE has run out of LOCKS" from the database.
I tried issuing a COMMIT after inserting a batch of records, but it closes the cursor.
I don't want to increase the maximum number of locks in the database configuration or make the copy at database level.
I want to understand how I can do this copy in ABAP with the best performance and low memory usage...
Thank you.
You can also "copy on database level" from within ABAP SQL using a combined INSERT and SELECT:
INSERT zcustomers1 FROM ( SELECT * FROM zcustomers ).
Unlike the other solution, this runs in one single transaction (no inconsistency on the database) and avoids moving the data between the database and the ABAP application server, so it should be faster by orders of magnitude. However, like the code in the question, it might still run into database limits because of the many locks opened during the insert (though it avoids other problems). That should be solved on the database side and is not a limitation of ABAP.
By using COMMIT CONNECTION instead of COMMIT WORK it is possible to commit only the transaction writing to zcustomers1, while keeping the transaction reading from zcustomers open.
Note that having multiple transactions (one reading, several writing) can create inconsistencies in the database if zcustomers or zcustomers1 are written to while this code runs. Also, while the copy is in progress, reading from zcustomers1 shows only a part of the entries from zcustomers.
DATA:
gt_custom TYPE TABLE OF ZCUSTOMERS,
lv_package_size TYPE i,
lv_db_cursor TYPE cursor.
lv_package_size = 10000.
OPEN CURSOR WITH HOLD lv_db_cursor FOR
SELECT * FROM zcustomers.
DO.
REFRESH gt_custom.
FETCH NEXT CURSOR lv_db_cursor
INTO TABLE gt_custom
PACKAGE SIZE lv_package_size.
IF sy-subrc NE 0.
CLOSE CURSOR lv_db_cursor.
EXIT.
ENDIF.
MODIFY zcustomers1 FROM TABLE gt_custom.
" regularly committing here releases a partial state to the database;
" through that, locks are released and running into ASE error SQL1204 is avoided
COMMIT CONNECTION default.
ENDDO.
If an UPDATE is running on table ABC and, at the same point in time, a SELECT starts running on the same table in PostgreSQL, which one takes effect first?
Okay, I think I know what you're asking.
When you update something, it's not visible to anything else until you commit that transaction. When you update something and it takes a while, it should not block any selects on that same stuff.
So if you start your 10-minute update, and then someone starts their 2-minute select, they will see the old data, and will not be blocked. The update will continue without interruption. After the update completes, if someone starts the 2-minute select again, they will see the new data.
Does that answer your question?
The reason for this, in case you're interested, is that postgresql uses MVCC. Essentially that means that when you update rows, new copies of those rows are written. On the new copies, postgres indicates what transactions can see the new data. The old data sticks around, and also has information about what transactions can see it.
When you select, postgres checks the table, and looks to see which of those rows are visible to you in your current transaction. It only returns rows you can see.
When there are no more transactions that can see a row, postgres knows it can be deleted, and will reclaim that space.
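A quick way to see this behaviour for yourself, assuming a hypothetical accounts table (the table and column names are only for illustration):

-- Session 1: start a long-running update, but do not commit yet
BEGIN;
UPDATE accounts SET balance = balance * 1.1 WHERE region = 'EU';

-- Session 2, at the same time: not blocked, still sees the old balances
SELECT balance FROM accounts WHERE region = 'EU';

-- Session 1: publish the change
COMMIT;

-- Session 2: a fresh SELECT now sees the new balances
SELECT balance FROM accounts WHERE region = 'EU';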
I have data to load where I only need to pull records added since the last time I pulled the data. There are no date fields in my destination table to save this information in, so I have to keep track of the maximum date I last pulled. The problem is I can't see how to save this value in SSIS for the next time the project runs.
I saw this:
Persist a variable value in SSIS package
but it doesn't work for me because there is another process that purges and reloads the data separately from my process. This means that I have to do more than just know the last time my process ran.
The only solution I can think of is to create a table, but it seems a bit much to create a table just to hold one field.
This is a very common thing to do. You create an execution table that stores the package name, the start time, the end time, and whether or not the package failed/succeeded. You are then able to pull the max start time of the last successfully ran execution.
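A minimal sketch of what that could look like (table and column names are assumptions, adjust to your own conventions):

CREATE TABLE package_execution_log (
    package_name VARCHAR(128) NOT NULL,
    start_time   DATETIME     NOT NULL,
    end_time     DATETIME     NULL,
    succeeded    BIT          NOT NULL
);

-- at the start of the package, read the watermark
SELECT MAX(start_time)
FROM package_execution_log
WHERE package_name = 'MyIncrementalLoad'
  AND succeeded = 1;

-- at the end of the package, insert (or update) the row for this run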
You can't persist anything in a package between executions.
What you're talking about is a form of differential replication and this has been done many many times.
For differential replication it is normal to store some kind of state in the subscriber (the system reading the data) or the publisher (the system providing the data) that remembers what state you're up to.
So I suggest you:
Read up on differential replication design patterns
Absolutely put your mind at rest about writing data to a table
If you end up having more than one source system or more than one source table, your storage table is not going to have just one record - have a think about that. I answered a question like this the other day; you'll find over time that you're going to add handy things like the last time the replication ran, how long it took, how many records were transferred, etc.
Is it viable to have a SQL table with only one row and one column?
TTeeple and Nick.McDermaid are absolutely correct, and you should follow their advice if humanly possible.
But if for some reason you don't have access to write to an execution table, you can always use a script task to read/write the last loaded date to a text file on whatever local file system you're running SSIS on.
Objective
To understand the mechanism/implementation when processing DMLs against a table. Does a database (I work on Oracle 11G R2) take snapshots (for each DML) of the table to apply the DMLs?
Background
I run a SQL to update the AID field of the target table containing old values with the new values from the source table.
UPDATE CASES1 t
SET t.AID = (
    SELECT DISTINCT NID
    FROM REF1
    WHERE oid = t.aid
)
WHERE EXISTS (
    SELECT 1
    FROM REF1
    WHERE oid = t.aid
);
I thought a row with 'OLD01' could be updated twice (OLD01 -> NEW01 -> SCREWED). However, that did not happen.
Question
For each DML, does a database take a snapshot of table X (call it X+1) for the first DML, and then keep taking a snapshot (call it X+2) of the result (X+1) for the next DML on the table, and so on for each DML that is successively executed? Is this also the mechanism used to implement Rollback/Commit?
Is this expected behaviour specified in a standard somewhere? If so, kindly suggest relevant references. I Googled but am not sure what the keywords should be to get the right result.
Thanks in advance for your help.
Update
I started reading Oracle Core (ISBN 9781430239543) by Jonathan Lewis and saw the relevant diagram. My current understanding now is that undo records are created in the undo tablespace for each update, and the original data is reconstructed from there - which is what I initially thought of as snapshots.
In Oracle, if you ran that update twice in a row in the same session, with the data as you've shown, I believe you should get the results that you expected. I think you must have gone off track somewhere. (For example, if you executed the update once, then without committing you opened a second session and executed the same update again, then your result would make sense.)
Conceptually, I think the answer to your question is yes (speaking specifically about Oracle, that is). A SQL statement effectively operates on a snapshot of the tables as of the point in time that the statement starts executing. The proper term for this in Oracle is read-consistency. The mechanism for it, however, does not involve taking a snapshot of the entire table before changes are made. It is more the reverse - records of the changes are kept in undo segments, and used to revert blocks of the table to the appropriate point in time as needed.
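To make that concrete, here is a minimal reconstruction of what the data in the question presumably looks like (the values are assumed from the OLD01 -> NEW01 -> SCREWED description):

CREATE TABLE cases1 (aid VARCHAR2(10));
CREATE TABLE ref1   (oid VARCHAR2(10), nid VARCHAR2(10));

INSERT INTO cases1 VALUES ('OLD01');
INSERT INTO ref1   VALUES ('OLD01', 'NEW01');
INSERT INTO ref1   VALUES ('NEW01', 'SCREWED');

UPDATE cases1 t
SET t.aid = (SELECT DISTINCT nid FROM ref1 WHERE oid = t.aid)
WHERE EXISTS (SELECT 1 FROM ref1 WHERE oid = t.aid);

-- each CASES1 row is visited once, and the subqueries read REF1 as of the
-- moment the statement started, so the row ends up as 'NEW01', not 'SCREWED'
SELECT aid FROM cases1;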
The documentation you ought to look at to understand this in some depth is in the Oracle Concepts manual: http://docs.oracle.com/cd/E11882_01/server.112/e40540/part_txn.htm#CHDJIGBH
Is it possible to effectively tail a database table such that when a new row is added an application is immediately notified with the new row? Any database can be used.
Use an ON INSERT trigger.
You will need to check the specifics of your database for how to call an external application with the values contained in the inserted record, or you can write your 'application' as a stored procedure and have it run inside the database.
It sounds like you will want to brush up on databases in general before you paint yourself into a corner with your command-line approaches.
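As a sketch of the simplest in-database variant mentioned above - a trigger that writes each new key to a queue table your application polls - in Oracle-style PL/SQL (the orders table, its id column, and the queue table are hypothetical, and the exact trigger syntax varies by DBMS):

CREATE TABLE new_row_queue (
  row_id     NUMBER    NOT NULL,
  created_at TIMESTAMP DEFAULT SYSTIMESTAMP
);

CREATE OR REPLACE TRIGGER trg_orders_after_insert
AFTER INSERT ON orders
FOR EACH ROW
BEGIN
  -- record the new row's key; the external application polls this queue table
  INSERT INTO new_row_queue (row_id) VALUES (:NEW.id);
END;
/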
Yes, if the database is a flat text file and appends are done at the end.
Yes, if the database supports this feature in some other way; check the relevant manual.
Otherwise, no. Databases tend to be binary files.
I am not sure, but this might work for primitive / flat-file databases. As far as I understand (and I could be wrong), modern database files are binary formats, hence reading a newly added row would not work with that command.
I would imagine most databases allow for write triggers, and you could have a script that triggers on write that tells you some of what happened. I don't know what information would be available, as it would depend on the individual database.
There are a few options here, some of which others have noted:
Periodically poll for new rows. With the way MVCC works though, it's possible to miss a row if there were two INSERTS in mid-transaction when you last queried.
Define a trigger function that will do some work for you on each insert. (In Postgres you can call a NOTIFY command that other processes can LISTEN to.) You could combine a trigger with writes to an unpublished_row_ids table to ensure that your tailing process doesn't miss anything; the tailing process would then delete IDs from the unpublished_row_ids table as it processed them. (A sketch of this approach follows below.)
Hook into the database's replication functionality, if it provides any. This should have a means of guaranteeing that rows aren't missed.
I've blogged in more detail about how to do all these options with Postgres at http://btubbs.com/streaming-updates-from-postgres.html.
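A rough sketch of the second option in PostgreSQL, assuming a hypothetical events table with an id column (EXECUTE FUNCTION needs PostgreSQL 11+; use EXECUTE PROCEDURE on older versions):

CREATE TABLE unpublished_row_ids (id bigint PRIMARY KEY);

CREATE OR REPLACE FUNCTION notify_new_event() RETURNS trigger AS $$
BEGIN
  -- remember the row so the tailing process cannot miss it
  INSERT INTO unpublished_row_ids (id) VALUES (NEW.id);
  -- wake up any process that is LISTENing on the channel
  PERFORM pg_notify('new_event', NEW.id::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_after_insert
AFTER INSERT ON events
FOR EACH ROW EXECUTE FUNCTION notify_new_event();

-- the tailing process runs LISTEN new_event; and deletes rows from
-- unpublished_row_ids as it finishes handling them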
tail on Linux appears to be using inotify to tell when a file changes - it probably uses similar filesystem notifications frameworks on other operating systems. Therefore it does detect file modifications.
That said, tail performs an fstat() call after each detected change and will not output anything unless the size of the file increases. Modern DB systems use random file access and reuse DB pages, so it's very possible that an inserted row will not cause the backing file size to change.
You're better off using inotify (or similar) directly, and even better off if you use DB triggers or whatever mechanism your DBMS offers to watch for DB updates, since not all file updates are necessarily row insertions.
I was just in the middle of posting the same exact response as glowcoder, plus another idea:
The low-tech way to do it is to have a timestamp field and have a program run a query every n minutes looking for records where the timestamp is greater than that of the last run. The same concept works if you store the last key seen (when you use a sequence), or even add a boolean "processed" field.
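For example, assuming a hypothetical orders table with a created_at column (the bind variable is whatever watermark you stored from the previous run):

SELECT *
FROM orders
WHERE created_at > :last_run_timestamp
ORDER BY created_at;

-- after processing, store MAX(created_at) as the watermark for the next run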
With Oracle you can select a pseudo-column called ROWID that gives a unique identifier for each row in the table, and rowids are ordinal... new rows get assigned rowids that are greater than any existing rowids.
So, first select max(rowid) from table_name
I assume that one cause for the raised question is that there are many, many rows in the table... so this first step will be taxing the db a little and take some time.
Then, select * from table_name where rowid > 'whatever_that_rowid_string_was'
You still have to run the query periodically, but it is now a quick and inexpensive query.
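Putting the two steps together (the rowid literal is just a placeholder for whatever value the first query returned):

-- one-off: remember the current high-water mark
SELECT MAX(ROWID) FROM table_name;

-- on each poll: fetch only rows beyond the remembered rowid
SELECT *
FROM table_name
WHERE ROWID > CHARTOROWID('AAAS3sAAEAAAACrAAA');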