I have a table that have 40million records.
What's best (faster)? Create a column directly in that table or create another table with identity column and insert data from first?
If I create an identity column in the table that have 40million records, is it possible estimate how long does it take to create it?
This kind of depends. Creating an identity column won't take that long (well ok this is relative to the size of the table), assuming you appended it to the end of the table. If you didn't, the server has to create a new table with the identity column at the desired position, export all the rows to the new table, and then change the table name. I am guessing that is what is taking so long.
I'm guessing it's blocked - did you use the GUI or a query window (do you know the SPID it's running under?)
Try these - let us know if they give results and you're not sure what to do:
USE master
SELECT * FROM sysprocesses WHERE blocked <> 0
SELECT * FROM sysprocesses WHERE status = 'runnable' AND spid <> ##SPID
If you used ALTER TABLE [...] ADD ... in a query window, it is pretty fast, in fact it would have finished long ago. If you used the Management Studio table designer it is copying the table into a new table, dropping the old one, then renaming the new table as the old one. It will take a while, specially if you did not pre-grow the database and the log to accommodate the extra space needed. Because is all one single transaction, it would take about another 16 hours to rollback if you stop it now.
Isn't it something you'll only have to do once, and therefore isn't really a problem how long it takes? (Assuming it doesn't take days...)
Can you not create a test copy of the database and create the column on that to see how long it takes?
I think a lot depends upon the hardware and which DBMS you are in. In my environment, creating a new table and copying the old data into it would take about 3 or 4 hours. I would expect the addition of an identity column to take around the same amount of time, just based on other experiences. I'm on Oracle with multiple servers on a SAN, so things can run faster than in a single server environment. You may just have to sit back and wait.
Related
I'm currently changing the Id field of table to be an IDENTITY field. This is simple: Create a temp-table, copy all the data to the temp-table, adjust all the references from and to the table to point from and to the new temp-table, drop the old table, rename the temp-table to the original name.
Now I've got the problem that the copy step is taking too long. Actually the table doesn't have too many entries (~7.5 million rows), but it still takes multiple hours to do this.
I'm currently moving the data with a query like this:
SET IDENTITY_INSERT MyTable_Temp ON
INSERT INTO MyTable_Temp ([Fields]) SELECT [Fields] FROM MyTable
SET IDENTITY_INSERT MyTable_Temp OFF
I've had a look at bcp in combination with cmdshell and a following BULK INSERT, but I don't like the solution of first writing the data to a temp-file and afterwards dumping it back into the new table.
Is there a more efficient way to copy or move the data from the old to the new table? And can this be done in "pure" T-SQL?
Keep in mind, the data is correct (no external sources involved) and no changes are being made to the data during transfer.
Your approach seems fair, but the transaction generated by the insert command is too large and that is why it takes so long.
My approach when dealing with this in the past, was to use a cursor and a batching mechanism.
Perform the operation for only 100000 rows at a time, and you will see major improvements.
After the copy is made you can rebuild your references and eventually remove the old table... and so on. Be careful to reseed your new table accordingly after the data is copied.
A recent employee of our company had a stored procedure that has gone haywire, and caused mass inserts into a debug table of his. The table is unindexed, is now at close to 1.7 billion rows, and is taking up so much space that the backup no longer fits on the backup drive (Backups now reach close to 250GB).
I haven't really seen anything like this, so I'm seeking advice from the MSSQL Gurus out here.
I know I could nibble away at the table, but being unindexed, the DELETE FROM [TABLE] WHERE ID IN (SELECT TOP 10000 [ID] FROM [TABLE]) nearly locks up the server searching for them.
I also don't want my log file to get massive, it's currently sitting at 480GB on a 1TB drive. If I delete this table, will I be able to shrink it back down? (My recovery mode is simple)
We could index the id field on the table, though we only have around 9 hours downtime a day, and during business hours we can't be locking up the database.
Just looking for advice here, and a point in the right direction.
Thanks.
You may want to consider TRUNCATE
MSDN reference: http://technet.microsoft.com/en-us/library/aa260621(v=sql.80).aspx
Removes all rows from a table without logging the individual row deletes.
Syntax:
TRUNCATE TABLE [YOUR_TABLE]
As #Rahul suggests in the comments, you could also use DROP TABLE [YOUR_TABLE] if you no longer plan to use the table in question. The TRUNCATE option would simply empty the table but leave it in place if you wanted to continue to use it.
With regards to the space issue, both of these operations will be comparatively quick and the space will be reclaimed, but it won't happen instantly. When using TRUNCATE, the data still has to be deleted, but SQL Server will simply deallocate the data pages used by the table and use a background process to actually perform the clean up afterwards.
This post should provide some useful information.
One suggestion would be ... take the back up of only that 1.7 billion rows table (probably in a tape drive/somewhere with good enough space) and then drop the table saying drop table table_name.
That way, if at all that debug table data is needed in future; you have a copy and can restore from backup.
I would remove the logging for this table and launch a delete stored procedure that would commit every 1000 rows.
My current project for a client requires me to work with Oracle databases (11g). Most of my previous database experience is with MSSQL Server, Access, and MySQL. I've recently run into an issue that seems incredibly strange to me and I was hoping someone could provide some clarity.
I was looking to do a statement like the following:
update MYTABLE set COLUMN_A = COLUMN_B;
MYTABLE has about 13 million rows.
The source column is indexed (COLUMN_B), but the destination column is not (COLUMN_A)
The primary key field is a GUID.
This seems to run for 4 hours but never seems to complete.
I spoke with a former developer that was more familiar with Oracle than I, and they told me you would normally create a procedure that breaks this down into chunks of data to be commited (roughly 1000 records or so). This procedure would iterate over the 13 million records and commit 1000 records, then commit the next 1000...normally breaking the data up based on the primary key.
This sounds somewhat silly to me coming from my experience with other database systems. I'm not joining another table, or linking to another database. I'm simply copying data from one column to another. I don't consider 13 million records to be large considering there are systems out there in the orders of billions of records. I can't imagine it takes a computer hours and hours (only to fail) at copying a simple column of data in a table that as a whole takes up less than 1 GB of storage.
In experimenting with alternative ways of accomplishing what I want, I tried the following:
create table MYTABLE_2 as (SELECT COLUMN_B, COLUMN_B as COLUMN_A from MYTABLE);
This took less than 2 minutes to accomplish the exact same end result (minus dropping the first table and renaming the new table).
Why does the UPDATE run for 4 hours and fail (which simply copies one column into another column), but the create table which copies the entire table takes less than 2 minutes?
And are there any best practices or common approaches used to do this sort of change? Thanks for your help!
It does seem strange to me. However, this comes to mind:
When you are updating the table, transaction logs must be created in case a rollback is needed. Creating a table, that isn't necessary.
I have situation where I need to change the order of the columns/adding new columns for existing Table in SQL Server 2008. It is not allowing me to do without drop and recreate. But that is in production system and having data in that table. I can take backup of the data, and drop the existing table and change the order/add new columns and recreate it, insert the backup data into new table.
Is there any best way to do this without dropping and recreating. I think SQL Server 2005 will allow this process without dropping and recreating while changing to existing table structure.
Thanks
You can't really change the column order in a SQL Server 2008 table - it's also largely irrelevant (at least it should be, in the relational model).
With the visual designer in SQL Server Management Studio, as soon as you make too big a change, the only reliable way to do this for SSMS is to re-create the table in the new format, copy the data over, and then drop the old table. There's really nothing you can do about this to change it.
What you can do at all times is add new columns to a table or drop existing columns from a table using SQL DDL statements:
ALTER TABLE dbo.YourTable
ADD NewColumn INT NOT NULL ........
ALTER TABLE dbo.YourTable
DROP COLUMN OldColumn
That'll work, but you won't be able to influence the column order. But again: for your normal operations, column order in a table is totally irrelevant - it's at best a cosmetic issue on your printouts or diagrams..... so why are you so fixated on a specific column order??
There is a way to do it by updating SQL server system table:
1) Connect to SQL server in DAC mode
2) Run queries that will update columns order:
update syscolumns
set colorder = 3
where name='column2'
But this way is not reccomended, because you can destroy something in DB.
One possibility would be to not bother about reordering the columns in the table and simply modify it by add the columns. Then, create a view which has the columns in the order you want -- assuming that the order is truly important. The view can be easily changed to reflect any ordering that you want. Since I can't imagine that the order would be important for programmatic applications, the view should suffice for those manual queries where it might be important.
As the other posters have said, there is no way without re-writing the table (but SSMS will generate scripts which do that for you).
If you are still in design/development, I certainly advise making the column order logical - nothing worse than having a newly added column become part of a multi-column primary key and having it no where near the other columns! But you'll have to re-create the table.
One time I used a 3rd party system which always sorted their columns in alphabetical order. This was great for finding columns in their system, but whenever they revved their software, our procedures and views became invalid. This was in an older version of SQL Server, though. I think since 2000, I haven't seen much problem with incorrect column order. When Access used to link to SQL tables, I believe it locked in the column definitions at time of table linking, which obviously has problems with almost any table definition changes.
I think the simplest way would be re-create the table the way you want it with a different name and then copy the data over from the existing table, drop it, and re-name the new table.
Would it perhaps be possible to script the table with all its data.
Do an edit on the script file in something like notepad++
Thus recreating the table with the new columns but the same.
Just a suggestion, but it might take a while to accomplish this.
Unless you write yourself a small little c# application that can work with the file and apply rules to it.
If only notepadd++ supported a find and move operation
I am moving a system from a VB/Access app to SQL server. One common thing in the access database is the use of tables to hold data that is being calculated and then using that data for a report.
eg.
delete from treporttable
insert into treporttable (.... this thing and that thing)
Update treportable set x = x * price where (...etc)
and then report runs from treporttable
I have heard that SQL server does not like it when all records from a table are deleted as it creates huge logs etc. I tried temp sql tables but they don't persists long enough for the report which is in a different process to run and report off of.
There are a number of places where this is done to different report tables in the application. The reports can be run many times a day and have a large number of records created in the report tables.
Can anyone tell me if there is a best practise for this or if my information about the logs is incorrect and this code will be fine in SQL server.
If you do not need to log the deletion activity you can use the truncate table command.
From books online:
TRUNCATE TABLE is functionally
identical to DELETE statement with no
WHERE clause: both remove all rows in
the table. But TRUNCATE TABLE is
faster and uses fewer system and
transaction log resources than DELETE.
http://msdn.microsoft.com/en-us/library/aa260621(SQL.80).aspx
delete from sometable
Is going to allow you to rollback the change. So if your table is very large, then this can cause a lot of memory useage and time.
However, if you have no fear of failure then:
truncate sometable
Will perform nearly instantly, and with minimal memory requirements. There is no rollback though.
To Nathan Feger:
You can rollback from TRUNCATE. See for yourself:
CREATE TABLE dbo.Test(i INT);
GO
INSERT dbo.Test(i) SELECT 1;
GO
BEGIN TRAN
TRUNCATE TABLE dbo.Test;
SELECT i FROM dbo.Test;
ROLLBACK
GO
SELECT i FROM dbo.Test;
GO
i
(0 row(s) affected)
i
1
(1 row(s) affected)
You could also DROP the table, and recreate it...if there are no relationships.
The [DROP table] statement is transactionally safe whereas [TRUNCATE] is not.
So it depends on your schema which direction you want to go!!
Also, use SQL Profiler to analyze your execution times. Test it out and see which is best!!
The answer depends on the recovery model of your database. If you are in full recovery mode, then you have transaction logs that could become very large when you delete a lot of data. However, if you're backing up transaction logs on a regular basis to free the space, this might not be a concern for you.
Generally speaking, if the transaction logging doesn't matter to you at all, you should TRUNCATE the table instead. Be mindful, though, of any key seeds, because TRUNCATE will reseed the table.
EDIT: Note that even if the recovery model is set to Simple, your transaction logs will grow during a mass delete. The transaction logs will just be cleared afterward (without releasing the space). The idea is that DELETE will create a transaction even temporarily.
Consider using temporary tables. Their names start with # and they are deleted when nobody refers to them. Example:
create table #myreport (
id identity,
col1,
...
)
Temporary tables are made to be thrown away, and that happens very efficiently.
Another option is using TRUNCATE TABLE instead of DELETE. The truncate will not grow the log file.
I think your example has a possible concurrency issue. What if multiple processes are using the table at the same time? If you add a JOB_ID column or something like that will allow you to clear the relevant entries in this table without clobbering the data being used by another process.
Actually tables such as treporttable do not need to be recovered to a point of time. As such, they can live in a separate database with simple recovery mode. That eases the burden of logging.
There are a number of ways to handle this. First you can move the creation of the data to running of the report itself. This I feel is the best way to handle, then you can use temp tables to temporarily stage your data and no one will have concurency issues if multiple people try to run the report at the same time. Depending on how many reports we are talking about, it could take some time to do this, so you may need another short term solutio n as well.
Second you could move all your reporting tables to a difffernt db that is set to simple mode and truncate them before running your queries to populate. This is closest to your current process, but if multiple users are trying to run the same report could be an issue.
Third you could set up a job to populate the tables (still in separate db set to simple recovery) once a day (truncating at that time). Then anyone running a report that day will see the same data and there will be no concurrency issues. However the data will not be up-to-the minute. You also could set up a reporting data awarehouse, but that is probably overkill in your case.