Databases: Where does the data go before a commit?

We're trying to locate a memory leak in our code. As the software runs, I can see memory usage slowly grow. Each operation adds records to a database.
That got me wondering: where does the data from an INSERT command really go before we commit the changes? Is it written to the actual database file and flagged as "roll this back if requested", or is it only held in internal memory and written out when the commit is done?
If it helps, we're using Access for now.

As part of the "Begin Transaction", the data as such does not go anywhere or change in any way; instead, the engine records the list of commands issued to it. If you then cancel the transaction with a "Rollback", those instructions are dropped and no changes are made; if a "Commit" is issued, the saved instructions are executed in the correct order.
As the instructions are stored in a local table (as mentioned by Albert), this is why you see a memory increase: the local file is opened in its entirety into memory (which is also why we split front-end and back-end databases in Access, to avoid dumping a huge file into RAM).
It is also worth mentioning that any SQL statements you issue have their syntax checked before being saved, to ensure that if you are running multiple SQL statements the batch won't fail midway through and leave your data in a state you did not intend.
Apologies for "Instructions" I know this is the layman's term but I hope it makes sense.
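To make the commit/rollback behaviour concrete, here is a minimal generic SQL sketch (the table and column names are made up; note that in Access itself you would normally control the transaction from DAO/ADO code, e.g. BeginTrans/CommitTrans/Rollback, rather than with SQL statements):
BEGIN TRANSACTION;
INSERT INTO log_records (operation, created_at)   -- log_records is an illustrative table
VALUES ('import step', CURRENT_TIMESTAMP);        -- visible only to this session for now
-- nothing is permanent yet; either undo it...
ROLLBACK;
-- ...or make it durable instead:
-- COMMIT;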

Related

How to allow only one update/insert to a table if a row is outdated

I have a file stored on disk that can be accessed across multiple servers in a web farm. This file is updated as necessary based on data changes in the database. I have a database table that stores a row with a URI for this file and some hashes based on some database tables. If the hashes don't match their respective tables, then the file needs to be regenerated and a new row needs to be inserted.
How do I make it so that only 1 client regenerates this file and inserts a row?
The easiest but worst solution (because of locks) is to:
BEGIN TRANSACTION
SELECT ROW FROM TABLE (lock the table for the remainder of the transaction)
IF ROW IS OUT OF DATE:
REGENERATE FILE
INSERT ROW INTO TABLE
DO SOME STUFF WITH FILE (30s)
COMMIT TRANSACTION
However, if multiple clients execute this code, all of the subsequent clients sit for a long time while the "DO SOME STUFF WITH FILE" step runs.
Is there a better way to handle this? Maybe changing the way I process the file before the commit to make it faster? I've been stumped on this for a couple days.
It sounds like you need to do your file processing asynchronously, so the file work is spun off and the transaction completes in a timely manner. There are a few ways to do that, but the easiest might be to replace the "do stuff with file" step with an insert of a record into a table such as This_File_Needs_To_Be_Updated, and then run a job every few minutes that processes each record in that table. Or HERE is some code that generates a job on the fly. Or see THIS question on Stack Overflow.
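A rough T-SQL sketch of that queue-table idea (all table and column names, including This_File_Needs_To_Be_Updated, are illustrative, and the job logic is simplified):
-- 1) in the web request: keep the transaction short - only record that work is needed
BEGIN TRANSACTION;
DECLARE @current_hash VARBINARY(32) = HASHBYTES('SHA2_256', 'example data');  -- placeholder hash
IF NOT EXISTS (SELECT 1 FROM file_state WHERE table_hash = @current_hash)
    INSERT INTO This_File_Needs_To_Be_Updated (requested_at, table_hash)
    VALUES (GETDATE(), @current_hash);
COMMIT TRANSACTION;

-- 2) in a job that runs every few minutes: do the slow file work outside any user transaction
SELECT id, table_hash FROM This_File_Needs_To_Be_Updated WHERE processed = 0;
-- ...regenerate the file for each pending row, then mark it done:
UPDATE This_File_Needs_To_Be_Updated SET processed = 1 WHERE processed = 0;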
The answer depends on the details of file level processing.
If you just swap the database and file operations, you risk corruption of the file or busy waiting (depending on how exactly you open it, and what your code does when a concurrent open is rejected). Busy waiting would definitely be worse than waiting on a database lock from a throughput (or any other) perspective.
If your file processing really takes so long that requests frequently queue up behind it, the only solutions are to add more powerful hardware or to optimize the file-level processing.
For example, if the file only reflects the data in the database, you might get away with not updating it at all, and having a background process that periodically regenerates its content based on the data in the database. You might need to add versioning that makes sure that whoever reads the file is not receiving stale data. If the file pointed to by the URL has a new name every time, you might need an error handler that makes sure that GET requests are not habitually receiving a 404 response on new files.

Creating a cursor: is the data being copied?

I'm working with cursors at the moment and it's getting messy for me; I hope you can clear up some questions for me.
I have checked the Oracle documentation about cursors, but I cannot work out the following:
When a cursor is opened, is a local copy of the result created in memory?
- If yes: does that really make sense when the table holds a lot of data? It doesn't seem very efficient.
- If no: is the whole data set locked to other processes?
  - If yes: what if I'm doing a truly heavy process for each row? The data would be unavailable for a long time.
  - If no: what would happen if another process modified the data I'm currently using with the cursor, or added new rows? Would the cursor see the changes?
Thanks so much.
You may want to read the section on Data Concurrency and Consistency in the Concepts Guide.
The answers to your specific questions:
When a cursor is opened, is a local copy of the result created in memory?
No. However, via Oracle's "Multiversion Read Consistency" (see link above), the rows fetched by the cursor will all be consistent with the point in time at which the cursor was opened - i.e. each row, when fetched, will be a row that existed when the cursor was opened and will still have the same values (even though another session may have updated or even deleted it in the meantime).
If no: is the whole data set locked to other processes?
No
If no: what would happen if another process modified the data I'm currently using with the cursor, or added new rows? Would the cursor see the changes?
Your cursor would not see those changes, it would continue to work with the rows as they existed when the cursor was opened.
The Concepts Guide explains this in detail, but the essence of the way it works is as follows:
Oracle maintains something called a System Change Number (SCN) that is continually incremented.
When your cursor opens it notes the current value of the SCN.
As the cursor fetches rows, it looks at the SCN stamped on them. If this SCN is the same as or lower than the cursor's starting SCN, the data is up to date and is used. However, if the row's SCN is higher than the cursor's, this means that another session has changed the row (and committed the change). In that case Oracle looks in the rollback segments for the old version of the row and uses that instead. If the query runs for a long time, it is possible that the old version has been overwritten in the rollback segments; in that case the query fails with an ORA-01555 error.
You can modify this default behaviour if needed. For example, if it is vital that no other session modifies the rows you are querying during the running of your cursor, then you can use the FOR UPDATE clause to lock the rows:
CURSOR c IS SELECT sal FROM emp FOR UPDATE OF sal;
Now any session that tries to modify a row used in your query while it is running is blocked until your transaction has finished, i.e. until you commit or roll back.
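For completeness, a small PL/SQL sketch of how such a cursor is typically used (it assumes the classic emp table from the example above; the 10% raise is purely illustrative):
DECLARE
    CURSOR c IS SELECT sal FROM emp FOR UPDATE OF sal;
BEGIN
    FOR r IN c LOOP
        -- rows are locked as they are fetched and stay locked until COMMIT/ROLLBACK
        UPDATE emp SET sal = r.sal * 1.10 WHERE CURRENT OF c;
    END LOOP;
    COMMIT;  -- releases the locks; other sessions blocked on these rows can proceed
END;
/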

Performance problems using Squeryl and H2 database for a desktop application

I have a desktop application that persists its data in a local H2 database. I am using Squeryl to interface to the database.
The size of the database is very small (some 10kB). I'm experiencing severe performance problems and there is extensive disk IO going on. I am only reading the DB and thus I expected that the complete data could be cached; I even set the cache size to some value (way higher than total db size). Also I tried disabling locking with no result.
My program performs very many small queries on the database; basically I have a Swing TableModel that makes a query for every table entry (each column of each row). I'm wrapping each of those calls into a Squeryl transaction block.
I've made a profile using JVisualVM and I suspect the following call tree shows the problem. The topmost method is a read access from my code.
link to JVisualVM screen shot.
Question
How can I fix this or what am I doing wrong? Somehow I expect that I should be able to make many small calls to a DB that is small enough to be held in under 1MB of memory. Why is this disk IO going on and how can I avoid it?
Looking at the screenshot, it seems you are selecting from the DB inside the getValueAt() method of your TableModel (the method name getRowAt() at the top of the call stack is what suggests this to me).
If my assumption is correct, then this is your main problem: getValueAt() is called by the JTable's paint() method constantly (probably several times a second), so it should be as quick as possible.
You should get the data for your JTable in a single SQL query and then save the result in some data structure (e.g. an ArrayList or something like that).
I don't know Squeryl, but I doubt you really need to wrap every SELECT in a transaction. From the stack trace it appears that this causes massive writes in H2. Did you try running the SELECTs without explicitly opening (and closing) a transaction each time?
The solution was very simple in the end. I'll quote the FAQ.
Delayed Database Closing
Usually, a database is closed when the last connection to it is closed. In some situations this slows down the application, for example when it is not possible to keep at least one connection open. The automatic closing of a database can be delayed or disabled with the SQL statement SET DB_CLOSE_DELAY <seconds>. The parameter <seconds> specifies the number of seconds to keep a database open after the last connection to it was closed. The following statement will keep a database open for 10 seconds after the last connection was closed:
SET DB_CLOSE_DELAY 10
The value -1 means the database is not closed automatically. The value 0 is the default and means the database is closed when the last connection is closed. This setting is persistent and can be set by an administrator only. It is possible to set the value in the database URL: jdbc:h2:~/test;DB_CLOSE_DELAY=10.
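For a single-user desktop application like the one described, it is probably easiest to keep the database open for the whole lifetime of the process via the connection URL (the path here is only an example):
jdbc:h2:~/myapp/data;DB_CLOSE_DELAY=-1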

Why are rollbacks needed?

Why are rollbacks so important?
Is it to prevent data (like data in a SQL DB) from being in an inconsistent state?
If so, how come the data "store" (the SQL DB or whatever) made it possible to end up in a corrupt state in the first place?
Are there data storage mechanisms that don't have a need for "rollbacks"?
Rollbacks are important in case any kind of error appears during database operation. They can really save the day when the database server crashes or a critical exception is thrown in an application that modifies the contents of the DB. When a significant DB operation is performed (e.g. updates, inserts, etc.) and the process breaks off in the middle, it would be very hard to trace which operations succeeded, and using the DB afterwards would be very complicated.
The "store" itself does not generally have a built-in mechanism for consistency control - this is exactly why we use rollbacks and transactions. This can be perceived as a sort of 'live backup' mechanism.
There are cases when you need to insert/update data in many related tables - if you didn't have transactional logic, any error somewhere in the middle of the process could leave the data inconsistent.
A simple example: say you need to insert both order header data into an orders table and order lines into a lines table. You insert the order header, read the identity, and start inserting order lines - but this second insert fails for whatever reason. The only reliable way to recover from this situation is to roll back the first insert - either explicitly (when your connection to the db is still alive) or implicitly (when the link has gone down).
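A minimal T-SQL sketch of that scenario (the table and column names are invented for illustration):
BEGIN TRY
    BEGIN TRANSACTION;

    INSERT INTO orders (customer_id, order_date)
    VALUES (42, GETDATE());

    DECLARE @order_id INT = SCOPE_IDENTITY();

    -- if this insert fails, the order header must not survive on its own
    INSERT INTO lines (order_id, product_id, qty)
    VALUES (@order_id, 1001, 3);

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;   -- undoes the header insert as well
    THROW;
END CATCH;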

Confirm before delete/update in SQL Management Studio?

So for the second day in a row, someone has wiped out an entire table of data as opposed to the one row they were trying to delete because they didn't have the qualified where clause.
I've been all up and down the mgmt studio options, but can't find a confirm option. I know other tools for other databases have it.
I'd suggest that you always write the SELECT statement with the WHERE clause first and execute it to see exactly which rows your DELETE command will delete. Then just execute the DELETE with the same WHERE clause. The same applies to UPDATEs.
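In practice that looks like this (orders and order_id are placeholder names):
-- first, see exactly which rows would be affected
SELECT * FROM orders WHERE order_id = 123;
-- only then run the delete, reusing exactly the same WHERE clause
DELETE FROM orders WHERE order_id = 123;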
Under Tools>Options>Query Execution>SQL Server>ANSI, you can enable the Implicit Transactions option which means that you don't need to explicitly include the Begin Transaction command.
The obvious downside of this is that you might forget to add a Commit (or Rollback) at the end, or worse still, your colleagues will add Commit at the end of every script by default.
You can lead the horse to water...
You might suggest that they always take an ad-hoc backup before they do anything (depending on the size of your DB) just in case.
Try using a BEGIN TRANSACTION before you run your DELETE statement.
Then you can choose to COMMIT or ROLLBACK it.
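For example (the table name is made up):
BEGIN TRANSACTION;

DELETE FROM customers WHERE customer_id = 42;
-- check the rows-affected count (and/or rerun the matching SELECT); if it looks wrong:
-- ROLLBACK TRANSACTION;
-- if it looks right:
COMMIT TRANSACTION;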
In SSMS 2005, you can enable this option under Tools|Options|Query Execution|SQL Server|ANSI ... check SET IMPLICIT_TRANSACTIONS. That will require a commit to affect update/delete queries for future connections.
For the current query, go to Query|Query Options|Execution|ANSI and check the same box.
This page also has instructions for SSMS 2000, if that is what you're using.
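With that option on, a session behaves roughly like this (a sketch with a made-up table name):
SET IMPLICIT_TRANSACTIONS ON;

DELETE FROM some_table WHERE id = 42;   -- a transaction is opened automatically; nothing is permanent yet

ROLLBACK;   -- or COMMIT; either way, you must end the transaction explicitly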
As others have pointed out, this won't address the root cause: it's almost as easy to paste a COMMIT at the end of every new query you create as it is to fire off a query in the first place.
First, this is what audit tables are for. If you know who deleted all the records, you can either restrict their database privileges or deal with them from a performance perspective. The last person who did this at my office is currently on probation; if she does it again, she will be let go. You have responsibilities if you have access to production data, and ensuring that you cause no harm is one of them. This is a performance problem as much as a technical problem.
You will never find a way to prevent people from making dumb mistakes (the database has no way to know whether you meant to delete the whole of table a or only the rows where id = 100, and a confirm dialog will get clicked through automatically by most people). You can only try to reduce them by making sure the people who run this code are responsible and by putting policies in place to help them remember what to do. Employees who have a pattern of behaving irresponsibly with your business data (particularly after they have been given a warning) should be fired.
Others have suggested the kinds of things we do to prevent this from happening. I always embed a select in a delete that I'm running from a query window to make sure it will delete only the records I intend. All our code on production that changes, inserts or deletes data must be enclosed in a transaction. If it is being run manually, you don't run the rollback or commit until you see the number of records affected.
Example of delete with embedded select
delete a
--select a.*
from table1 a
join table2 b on a.id = b.id
where b.somefield = 'test'
But even these techniques can't prevent all human error. A developer who doesn't understand the data may run the select and still not understand that it is deleting too many records. Running in a transaction may mean you have other problems when people forget to commit or rollback and lock up the system. Or people may put it in a transaction and still hit commit without thinking just as they would hit confirm on a message box if there was one. The best prevention is to have a way to quickly recover from errors like these. Recovery from an audit log table tends to be faster than from backups. Plus you have the advantage of being able to tell who made the error and exactly which records were affected (maybe you didn't delete the whole table but your where clause was wrong and you deleted a few wrong records.)
For the most part, production data should not be changed on the fly. You should script the change and check it on dev first. Then on prod, all you have to do is run the script with no changes, rather than highlighting and running little pieces one at a time. In the real world this isn't always possible, as sometimes you are fixing something that is broken only on prod and needs to be fixed now (for instance, when none of your customers can log in because critical data got deleted). In a case like this you may not have the luxury of reproducing the problem on dev first and then writing the fix. When that happens, you may need to fix directly on prod, and the fix should be done only by DBAs, database analysts, configuration managers, or others who are normally responsible for data on prod - not by a developer. Developers in general should not have access to prod.
That is why I believe you should always:
1 Use stored procedures that are tested on a dev database before deploying to production
2 Select the data before deletion
3 Screen developers using an interview and performance evaluation process :)
4 Base performance evaluation on how many database tables they do/do not delete
5 Treat production data as if it were poisonous and be very afraid
So for the second day in a row, someone has wiped out an entire table of data as opposed to the one row they were trying to delete because they didn't have the qualified where clause
Probably the only solution will be to replace someone with someone else ;). Otherwise they will always find a workaround.
Eventually, restrict database access for that person, provide them with a stored procedure that takes the parameter used in the WHERE clause, and grant them access to execute only that stored procedure.
Put on your best Trogdor and Burninate until they learn to put in the WHERE clause.
The best advice is to get the muckety-mucks that are mucking around in the database to use transactions when testing. It goes a long way towards preventing "whoops" moments. The caveat is that now you have to tell them to COMMIT or ROLLBACK because for sure they're going to lock up your DB at least once.
Lock it down:
REVOKE delete rights on all your tables.
Put in an audit trigger and audit table.
Create parametrized delete SPs and only give rights to execute on an as needed basis.
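A rough T-SQL sketch of those three steps (all object names and the app_users role are illustrative):
-- 1) take away ad-hoc deletes
REVOKE DELETE ON dbo.orders FROM app_users;

-- 2) audit every delete into a side table
CREATE TABLE dbo.orders_audit (
    order_id    INT,
    deleted_by  SYSNAME  DEFAULT SUSER_SNAME(),
    deleted_at  DATETIME DEFAULT GETDATE()
);
GO
CREATE TRIGGER dbo.trg_orders_delete ON dbo.orders
AFTER DELETE AS
    INSERT INTO dbo.orders_audit (order_id)
    SELECT order_id FROM deleted;
GO

-- 3) a parameterized delete procedure as the only sanctioned path
-- (ownership chaining lets it delete even though direct DELETE rights were revoked)
CREATE PROCEDURE dbo.delete_order @order_id INT AS
    DELETE FROM dbo.orders WHERE order_id = @order_id;
GO
GRANT EXECUTE ON dbo.delete_order TO app_users;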
Isn't there a way to give users the results they need without providing raw access to SQL? If you at least had a separate entry box for "WHERE", you could default it to "WHERE 1 = 0" or something.
I think there must be a way to back these out of the transaction journaling, too. But probably not without rolling everything back, and then selectively reapplying whatever came after the fatal mistake.
Another ugly option is to create a trigger to write all DELETEs (maybe over some minimum number of records) to a log table.
