Is deleting all records in a table a bad practice in SQL Server? - sql-server

I am moving a system from a VB/Access app to SQL Server. One common thing in the Access database is the use of tables to hold data that is being calculated and then using that data for a report.
eg.
delete from treporttable
insert into treporttable (.... this thing and that thing)
Update treporttable set x = x * price where (...etc)
and then report runs from treporttable
I have heard that SQL Server does not like it when all records from a table are deleted, as it creates huge logs etc. I tried temporary SQL tables, but they don't persist long enough for the report, which runs in a different process, to report off them.
There are a number of places where this is done to different report tables in the application. The reports can be run many times a day and have a large number of records created in the report tables.
Can anyone tell me if there is a best practice for this, or if my information about the logs is incorrect and this code will be fine in SQL Server?

If you do not need to log the deletion activity you can use the truncate table command.
From books online:
TRUNCATE TABLE is functionally
identical to DELETE statement with no
WHERE clause: both remove all rows in
the table. But TRUNCATE TABLE is
faster and uses fewer system and
transaction log resources than DELETE.
http://msdn.microsoft.com/en-us/library/aa260621(SQL.80).aspx

delete from sometable
Is going to allow you to roll back the change. So if your table is very large, this can cost a lot of memory and time.
However, if you have no fear of failure then:
truncate table sometable
Will perform nearly instantly, and with minimal memory requirements. There is no rollback though.

To Nathan Feger:
You can rollback from TRUNCATE. See for yourself:
CREATE TABLE dbo.Test(i INT);
GO
INSERT dbo.Test(i) SELECT 1;
GO
BEGIN TRAN
TRUNCATE TABLE dbo.Test;
SELECT i FROM dbo.Test;
ROLLBACK
GO
SELECT i FROM dbo.Test;
GO
i
(0 row(s) affected)
i
1
(1 row(s) affected)

You could also DROP the table, and recreate it...if there are no relationships.
The [DROP table] statement is transactionally safe whereas [TRUNCATE] is not.
So it depends on your schema which direction you want to go!!
Also, use SQL Profiler to analyze your execution times. Test it out and see which is best!!

The answer depends on the recovery model of your database. If you are in full recovery mode, then you have transaction logs that could become very large when you delete a lot of data. However, if you're backing up transaction logs on a regular basis to free the space, this might not be a concern for you.
Generally speaking, if the transaction logging doesn't matter to you at all, you should TRUNCATE the table instead. Be mindful, though, of any key seeds, because TRUNCATE will reseed the table.
EDIT: Note that even if the recovery model is set to Simple, your transaction logs will grow during a mass delete. The transaction logs will just be cleared afterward (without releasing the space). The idea is that DELETE will create a transaction even temporarily.
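The reseeding caveat can be seen with a small sketch. The table dbo.SeedDemo and its columns are hypothetical and exist only for this illustration; the point is that DELETE keeps the identity counter while TRUNCATE TABLE resets it to the seed.

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE dbo.SeedDemo (id INT IDENTITY(1,1) PRIMARY KEY, val INT);
GO
INSERT dbo.SeedDemo (val) SELECT 10;
INSERT dbo.SeedDemo (val) SELECT 20;
INSERT dbo.SeedDemo (val) SELECT 30;    -- ids 1, 2, 3
GO
DELETE FROM dbo.SeedDemo;               -- rows removed, identity counter kept
INSERT dbo.SeedDemo (val) SELECT 40;    -- this row gets id 4
SELECT id, val FROM dbo.SeedDemo;
GO
TRUNCATE TABLE dbo.SeedDemo;            -- rows removed, identity reset to its seed
INSERT dbo.SeedDemo (val) SELECT 50;    -- this row gets id 1
SELECT id, val FROM dbo.SeedDemo;
GO
DROP TABLE dbo.SeedDemo;
```

If other tables or exports store these id values, the reset after TRUNCATE can produce collisions with ids handed out earlier.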

Consider using temporary tables. Their names start with # and they are dropped automatically when the session (or stored procedure) that created them ends. Example:
create table #myreport (
id int identity(1,1),
col1 int, -- column type is illustrative
...
)
Temporary tables are made to be thrown away, and that happens very efficiently.
Another option is using TRUNCATE TABLE instead of DELETE. TRUNCATE logs only the page deallocations rather than each deleted row, so it grows the log file far less.

I think your example has a possible concurrency issue. What if multiple processes are using the table at the same time? Adding a JOB_ID column or something similar will allow you to clear the relevant entries in this table without clobbering the data being used by another process.
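A minimal sketch of the JOB_ID idea follows. The question does not show the real columns of treporttable, so the schema here (job_id, item, x), the index name and the sample values are all assumptions made for illustration.

```sql
-- Hypothetical schema: the real treporttable columns are not shown in the question.
CREATE TABLE dbo.treporttable (
    job_id UNIQUEIDENTIFIER NOT NULL,
    item   VARCHAR(50)      NOT NULL,
    x      DECIMAL(18, 4)   NOT NULL
);
CREATE INDEX IX_treporttable_job_id ON dbo.treporttable (job_id);
GO
-- Each report run gets its own identifier.
DECLARE @job_id UNIQUEIDENTIFIER;
SET @job_id = NEWID();

INSERT dbo.treporttable (job_id, item, x) VALUES (@job_id, 'widget', 2);
UPDATE dbo.treporttable SET x = x * 9.99 WHERE job_id = @job_id;

-- The report process is handed @job_id and reads only its own rows.
SELECT item, x FROM dbo.treporttable WHERE job_id = @job_id;

-- When the report is done, remove just this run's rows.
DELETE FROM dbo.treporttable WHERE job_id = @job_id;
```

The trade-off is that cleanup becomes a filtered DELETE rather than a TRUNCATE, so each run logs the rows it removes, but concurrent runs no longer interfere with each other.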

Actually tables such as treporttable do not need to be recovered to a point of time. As such, they can live in a separate database with simple recovery mode. That eases the burden of logging.
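As a sketch of that setup, assuming a hypothetical database name ReportStaging and SQL Server 2005 or later for the sys.databases check:

```sql
-- ReportStaging is a hypothetical database name.
CREATE DATABASE ReportStaging;
GO
ALTER DATABASE ReportStaging SET RECOVERY SIMPLE;
GO
-- Confirm the recovery model that is now in effect.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'ReportStaging';
```

The report tables would then be created in this database and referenced with three-part names (for example ReportStaging.dbo.treporttable) from the main application.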

There are a number of ways to handle this. First, you can move the creation of the data into the running of the report itself. This, I feel, is the best way to handle it: you can then use temp tables to stage your data temporarily, and no one will have concurrency issues if multiple people try to run the report at the same time. Depending on how many reports we are talking about, it could take some time to do this, so you may need another short-term solution as well.
Second, you could move all your reporting tables to a different database that is set to simple recovery and truncate them before running your queries to populate them. This is closest to your current process, but it could be an issue if multiple users try to run the same report.
Third, you could set up a job to populate the tables (still in a separate database set to simple recovery) once a day, truncating at that time. Then anyone running a report that day will see the same data and there will be no concurrency issues. However, the data will not be up to the minute. You could also set up a reporting data warehouse, but that is probably overkill in your case.

Related

Dropping an unindexed table with over 1.7 Billion rows on live database (SQL Admin Nightmare)

A recent employee of our company had a stored procedure that has gone haywire, and caused mass inserts into a debug table of his. The table is unindexed, is now at close to 1.7 billion rows, and is taking up so much space that the backup no longer fits on the backup drive (Backups now reach close to 250GB).
I haven't really seen anything like this, so I'm seeking advice from the MSSQL Gurus out here.
I know I could nibble away at the table, but being unindexed, the DELETE FROM [TABLE] WHERE ID IN (SELECT TOP 10000 [ID] FROM [TABLE]) nearly locks up the server searching for them.
I also don't want my log file to get massive, it's currently sitting at 480GB on a 1TB drive. If I delete this table, will I be able to shrink it back down? (My recovery mode is simple)
We could index the id field on the table, though we only have around 9 hours downtime a day, and during business hours we can't be locking up the database.
Just looking for advice here, and a point in the right direction.
Thanks.
You may want to consider TRUNCATE
MSDN reference: http://technet.microsoft.com/en-us/library/aa260621(v=sql.80).aspx
Removes all rows from a table without logging the individual row deletes.
Syntax:
TRUNCATE TABLE [YOUR_TABLE]
As @Rahul suggests in the comments, you could also use DROP TABLE [YOUR_TABLE] if you no longer plan to use the table in question. The TRUNCATE option would simply empty the table but leave it in place if you wanted to continue to use it.
With regards to the space issue, both of these operations will be comparatively quick and the space will be reclaimed, but it won't happen instantly. When using TRUNCATE, the data still has to be deleted, but SQL Server will simply deallocate the data pages used by the table and use a background process to actually perform the clean up afterwards.
This post should provide some useful information.
One suggestion would be to take a backup of only that 1.7 billion row table (probably to a tape drive or somewhere with enough space) and then drop the table with drop table table_name.
That way, if the debug table data is ever needed in future, you have a copy and can restore it from the backup.
I would remove the logging for this table and launch a delete stored procedure that would commit every 1000 rows.

Delete 200G ( 465025579 records ) of data from sql server

I have a database in which one of the tables got overpopulated (465025579 records). What is the best way to delete the records and keep only the last 3 months, without making the server hang?
Delete them in batches based on date earliest first. Sure it'll take some time, but it's safer (as you are defining which to delete) and not so resource intensive. It also means you can shrink the database in batches too, instead of one big hit (which is quite resource intensive).
Yeah, it might fragment the database a little, but until you've got the actual data down to a manageable level, there isn't that much you can do.
To be fair, 200G of data isn't that much on a decent machine these days.
All this said, I'm presuming you want the database to remain 'online'
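A batched delete along these lines might look like the sketch below. The table dbo.BigTable, the CreatedAt column and the batch size of 10000 are assumptions; note also that DELETE TOP does not guarantee that the very oldest rows go first, only that every deleted row is older than the cutoff.

```sql
-- dbo.BigTable and CreatedAt are hypothetical names; tune the batch size by testing.
DECLARE @cutoff DATETIME;
DECLARE @rows INT;
SET @cutoff = DATEADD(MONTH, -3, GETDATE());
SET @rows = 1;

WHILE @rows > 0
BEGIN
    -- Each DELETE commits on its own, so the log can be reused between batches.
    DELETE TOP (10000) FROM dbo.BigTable
    WHERE CreatedAt < @cutoff;

    SET @rows = @@ROWCOUNT;

    -- Optional pause so other sessions get a turn.
    WAITFOR DELAY '00:00:01';
END
```

An index on the date column is likely to matter here; without one, every batch has to scan the table to find qualifying rows.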
If you don't need the database to be available whilst you're doing this, the easiest thing to do is usually to select the rows that you want to keep into a different table, run a TRUNCATE on this table, and then copy the saved rows back in.
From TRUNCATE:
TRUNCATE TABLE is similar to the DELETE statement with no WHERE clause; however, TRUNCATE TABLE is faster and uses fewer system and transaction log resources.
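The keep, truncate and copy-back approach could be sketched as follows. All object names are hypothetical, and the sketch assumes the table has no identity column and is not referenced by a foreign key (TRUNCATE TABLE is not allowed on a table that a foreign key references).

```sql
-- Hypothetical names. Assumes no identity column on dbo.BigTable and
-- no foreign keys referencing it.

-- 1. Save the rows to keep into a new table.
SELECT *
INTO dbo.BigTable_keep
FROM dbo.BigTable
WHERE CreatedAt >= DATEADD(MONTH, -3, GETDATE());

-- 2. Empty the original table.
TRUNCATE TABLE dbo.BigTable;

-- 3. Copy the saved rows back and drop the holding table.
INSERT INTO dbo.BigTable
SELECT * FROM dbo.BigTable_keep;

DROP TABLE dbo.BigTable_keep;
```

With an identity column, step 3 would need an explicit column list and SET IDENTITY_INSERT, so it is worth checking the table definition first.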

Truncate or Drop and Create Table

I have a table in a SQL Server 2008 R2 instance against which a scheduled process runs nightly. The table can have upwards of 500K records in it at any one time. After processing this table I need to remove all rows from it, so I am wondering which of the following methods would produce the least overhead (i.e. excessive transaction log entries):
Truncate Table
Drop and recreate the table
Deleting the contents of the table is out due to time and extra Transaction log entries it makes.
The consensus seems to be Truncation, Thanks everyone!
TRUNCATE TABLE is your best bet. From MSDN:
Removes all rows from a table without logging the individual row
deletes.
So that means it won't bloat your transaction log. Dropping and creating the table not only requires more complex SQL, but also additional permissions. Any settings attached to the table (triggers, GRANT or DENY, etc.) will also have to be re-built.
Truncating the table does not leave row-by-row entries in the transaction log - so neither solution will clutter up your logs too much. If it were me, I'd truncate over having to drop and create each time.
I would go for TRUNCATE TABLE. You can potentially have overheads when indexes, triggers, etc. get dropped. Plus you will lose permissions, which will have to be re-created along with any other objects required for that table.
Also, the DROP TABLE page in MSDN mentions a little gotcha if you execute DROP and CREATE TABLE in the same batch:
DROP TABLE and CREATE TABLE should not be executed on the same table
in the same batch. Otherwise an unexpected error may occur.
Dropping the table will destroy any associated objects (indexes, triggers) and may make procedures or views invalid. I would go with truncate, since it won't blow up your log and causes none of the possible issues a drop and create does.
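For comparison, a drop-and-recreate that respects the batch caveat quoted above might look like this sketch. The table dbo.NightlyStaging, its columns and the role name are hypothetical; GO is the batch separator used by tools such as sqlcmd and Management Studio.

```sql
-- dbo.NightlyStaging is a hypothetical table.
IF OBJECT_ID(N'dbo.NightlyStaging', N'U') IS NOT NULL
    DROP TABLE dbo.NightlyStaging;
GO
-- New batch: the CREATE no longer shares a batch with the DROP.
CREATE TABLE dbo.NightlyStaging (
    id      INT          NOT NULL,
    payload VARCHAR(100) NULL
);
GO
-- Permissions are lost with the drop and must be granted again.
GRANT SELECT ON dbo.NightlyStaging TO SomeReportingRole;  -- hypothetical role
```

The extra steps are the reason the answers above prefer a single TRUNCATE TABLE statement.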

Sql server table can be queried but not updated

I have a table which was always updatable before, but suddenly I can no longer update any of the columns in the table. I can still query the whole table and the results come back very fast, but the moment I try to update a column in the table, the update query simply stalls and does nothing.
I tried using
select req_transactionUOW
from master..syslockinfo
where req_spid = -2
to see if some orphaned transaction was locking the table, but it returns no results.
I can't seem to find signs of my table being locked, but I simply cannot update it. Any clues as to how to fix the table or whatever state it is in?
Could you please issue this query:
SELECT COUNT(*)
FROM mytable WITH (UPDLOCK, READPAST)
which will skip the locked records and make sure it returns the same number of records as
SELECT COUNT(*)
FROM mytable
You may need to repeat it with every index on the table forced, to make sure that no index resource is locked either.
When you say "times out", does it hit the client timeout? For example, the default .net command timeout is 30 seconds. I would suggest increasing this to a very large value or running the update in SQL tools (by default no timeout set).
Other than that, an update will finish at some point or error and rollback: are you leaving enough time?
There is also the blocking, last index rebuild, last statistics update, triggers, accidental cross join, MDF or LDF file growth, poor IO, OS paging... etc. And have you restarted the SQL instance or server to remove environmental issues and kill all other connections?
There simply isn't enough information to make a judgement right now sorry.
I'm guessing this isn't a permissions issue as you're not getting an error.
So the closest I have had to this before is when the indexes on the table have become corrupt. Have you tried dropping the indexes and recreating them? Try one by one at first.
If you suspect locking, one of the first things I would do would be to run sp_lock. It will give you a list of all of the current locks held. You can use the DB_NAME and OBJECT_NAME functions to get the names that correspond to the dbid and ObjId columns.
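On SQL Server 2005 and later, the dynamic management views give similar information to sp_lock. The following is a sketch under that version assumption, not something taken from the answers above.

```sql
-- Requests that are currently waiting on another session.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       r.command
FROM sys.dm_exec_requests AS r
WHERE r.blocking_session_id <> 0;

-- Object-level locks held or requested, with names resolved.
SELECT l.request_session_id,
       DB_NAME(l.resource_database_id) AS database_name,
       OBJECT_NAME(l.resource_associated_entity_id,
                   l.resource_database_id) AS object_name,
       l.request_mode,
       l.request_status
FROM sys.dm_tran_locks AS l
WHERE l.resource_type = 'OBJECT';
```

If the stalled update appears in the first result set, blocking_session_id identifies the session holding the conflicting lock.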
Have you got any triggers on the table?
If so it could be that the trigger is failing so preventing the update.
Can you update other tables? If not (or in any case), you could check whether the transaction log is full (if you use the full recovery model) or whether the partition your transaction log resides on is full. I think if SQL Server is unable to write to the transaction log you could experience this behaviour.
DBCC would be your friend: DBCC SQLPERF(LOGSPACE) shows you how much of your log is used, as a percentage. If it is at (or close to) 100%, this might be your issue.
Just my two pennies worth.

Long query prevents inserts

I have a query that runs each night on a table with a bunch of records (200,000+). This application simply iterates over the results (using a DbDataReader in a C# app if that's relevant) and processes each one. The processing is done outside of the database altogether. During the time that the application is iterating over the results I am unable to insert any records into the table that I am querying for. The insert statements just hang and eventually timeout. The inserts are done in completely separate applications.
Does SQL Server lock the table down while a query is being done? This seems like an overly aggressive locking policy. I could understand how there could be a conflict between the query and newly inserted records, but I would be perfectly ok if records inserted after the query started were simply not included in the results.
Any ways to avoid this?
Update:
The WITH (NOLOCK) definitely did the trick. As some of you pointed out, this isn't the cleanest approach. I can't really query everything into memory given the amount of records and some of the columns in this table are binary (some records are actually about 1MB of total data).
The other suggestion, was to query for batches of records at a time. This isn't a bad idea either, but it does bring up a new issue: database independent queries. Right now the application can work with a variety of different databases (Oracle, MySQL, Access, etc). Each database has their own way of limiting the rows returned in a query. But maybe this is better saved for another question?
Back on topic, the "WITH (NOLOCK)" clause is certainly SQL Server specific, is there any way to keep this out of my query (and thus preventing it from working with other databases)? Maybe I could somehow specify a parameter on the DbCommand object? Or can I specify the locking policy at the database level? That is, change some properties in SQL Server itself that will prevent the table from locking like this by default?
If you're using SQL Server 2005+, then how about giving the new MVCC snapshot isolation a try. I've had good results with it:
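-- [YourDatabase] is a placeholder; substitute the name of your database.
ALTER DATABASE [YourDatabase] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE [YourDatabase] SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE [YourDatabase] SET MULTI_USER;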
It will stop readers blocking writers and vice-versa. It eliminates many deadlocks, at very little cost.
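To confirm whether the option took effect, the current database's settings can be checked in sys.databases. This is a small sketch that assumes SQL Server 2005 or later.

```sql
-- 1 in is_read_committed_snapshot_on means READ COMMITTED now uses row versions.
SELECT name,
       is_read_committed_snapshot_on,
       snapshot_isolation_state_desc
FROM sys.databases
WHERE name = DB_NAME();
```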
It depends what Isolation Level you are using. You might try doing your selects using the With (NoLock) hint, that will prevent the read locks, but will also mean the data being read might change before the selecting transaction completes.
The first thing you could do is try to add the "WITH (NOLOCK)" to any tables you have in your query. This will "Tame down" the locking that SQL Server does. An example of using "NOLOCK" on a join is as follows...
SELECT COUNT(Users.UserID)
FROM Users WITH (NOLOCK)
JOIN UsersInUserGroups WITH (NOLOCK) ON
Users.UserID = UsersInUserGroups.UserID
Another option is to use a dataset instead of a datareader. A datareader is a "fire hose" technique that stays connected to the tables while your program is processing, basically handling the table row by row through the hose. A dataset uses a "disconnected" methodology where all the data is loaded into memory and then the connection is closed. Your program can then loop over the data in memory without having to worry about locking. However, if this is a really large amount of data, there may be memory issues.
Hope this helps.
If you add the WITH (NOLOCK) hint after a table name in the FROM clause it should make sure it doesn't lock, and it doesn't care about reading data that is locked. You might get "out of date" results if you are writing at the same time, but if you don't care about that then you should be fine.
I reckon your best way of avoiding this is to do it in SQL rather than in the application.
You can add a
WAITFOR DELAY '000:00:01'
at the end of each loop iteration to give other processes time to run. Just make sure that you haven't opened a TRANSACTION around the loop, or all other processes will be locked out anyway.
The query is performing a table lock, thus the inserts are failing.
It sounds to me like you're keeping a lock on the table while processing the results.
You should instead load them into an array or collection of some sort, and close the database connection.
Then process the array.
In addition, while you're doing your select use either:
WITH(NOLOCK) or WITH(READPAST)
I'm not a big fan of using lock hints as you could end up with dirty reads or other weirdness. A couple of other ideas:
Can you break the number of rows down so you don't grab 200k at a time? Is there a way to tell whether you've processed a row (a flag, a timestamp) that you could use in the query? Your query could be 'SELECT TOP 5000 ...', getting a different 5k each time. Shorter queries mean shorter-lived locks.
If you can use smaller sets of rows I like the DataSet vs. IDataReader idea. You will be loading data into memory and not consuming any SQL locks, but the amount of memory can cause other problems.
-Brian
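The batching idea might be sketched like this. The table dbo.WorkQueue and the columns Id, Payload and Processed are hypothetical, and @lastId stands in for a value the application would supply after handling a batch.

```sql
-- dbo.WorkQueue, Id, Payload and Processed are hypothetical names.
-- Fetch the next batch of unprocessed rows.
SELECT TOP (5000) Id, Payload
FROM dbo.WorkQueue
WHERE Processed = 0
ORDER BY Id;

-- After the application has handled the batch, mark it done so the next
-- SELECT returns a different set of rows.
DECLARE @lastId INT;
SET @lastId = 5000;   -- placeholder: highest Id in the batch just handled
UPDATE dbo.WorkQueue
SET Processed = 1
WHERE Processed = 0 AND Id <= @lastId;
```

Each SELECT finishes quickly and releases its locks before the application starts the slow per-row work.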
You should be able to set the isolation level at the .NET level so that you don't have to include the WITH (NOLOCK) hint.
If you want to go with the batching option, you should be able to specify the Rowcount setting from the .NET level which would tell the database to only return n number of records. By setting these settings at the .NET level they should become database independent and work across all the platforms.
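On the SQL Server side, these two settings correspond to session-level statements, shown here as a sketch reusing the Users query from earlier in the thread. The statements themselves are SQL Server syntax, so the portability would have to come from setting them through the data access layer rather than from the SQL text.

```sql
-- Session-wide equivalent of putting WITH (NOLOCK) on every table.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- Limit following statements to 5000 rows.
SET ROWCOUNT 5000;

SELECT Users.UserID
FROM Users
JOIN UsersInUserGroups ON Users.UserID = UsersInUserGroups.UserID;

-- Restore the defaults for the rest of the session.
SET ROWCOUNT 0;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```

READ UNCOMMITTED carries the same dirty-read risk as the NOLOCK hint discussed above.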
