Is SQL Server Bulk Insert Transactional? - sql-server

If I run the following query in SQL Server 2000 Query Analyzer:
BULK INSERT OurTable
FROM 'c:\OurTable.txt'
WITH (CODEPAGE = 'RAW', DATAFILETYPE = 'char', FIELDTERMINATOR = '\t', ROWS_PER_BATCH = 10000, TABLOCK)
On a text file that conforms to OurTable's schema for 40 lines, but then changes format for the last 20 lines (let's say the last 20 lines have fewer fields), I receive an error. However, the first 40 lines are committed to the table. Is there something about the way I'm calling BULK INSERT that makes it non-transactional, or do I need to do something explicit to force it to roll back on failure?

BULK INSERT acts as a series of individual INSERT statements and thus, if the job fails, it doesn't roll back all of the committed inserts.
It can, however, be placed within a transaction so you could do something like this:
BEGIN TRANSACTION
BEGIN TRY
BULK INSERT OurTable
FROM 'c:\OurTable.txt'
WITH (CODEPAGE = 'RAW', DATAFILETYPE = 'char', FIELDTERMINATOR = '\t',
ROWS_PER_BATCH = 10000, TABLOCK)
COMMIT TRANSACTION
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION
END CATCH
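A variation on the same idea (just a sketch, not part of the original answer): with SET XACT_ABORT ON, a run-time error in the batch terminates the batch and rolls back the open transaction automatically, so a failed load should not leave partial rows behind.
-- Sketch only: XACT_ABORT makes run-time errors abort the batch and
-- roll back the open transaction.
SET XACT_ABORT ON

BEGIN TRANSACTION

BULK INSERT OurTable
FROM 'c:\OurTable.txt'
WITH (CODEPAGE = 'RAW', DATAFILETYPE = 'char', FIELDTERMINATOR = '\t',
ROWS_PER_BATCH = 10000, TABLOCK)

COMMIT TRANSACTION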

You can roll back the inserts. To do that, we first need to understand two things.
BATCHSIZE: the number of rows inserted per transaction. The default is the entire data file, so the whole file is treated as one transaction.
Say you have a text file with 10 rows, and rows 7 and 8 contain some invalid details. When you bulk insert the file, with or without specifying a batch size, 8 of the 10 rows get inserted into the table; the invalid rows (7 and 8) fail and are not inserted.
This happens because the default MAXERRORS count is 10.
As per MSDN, MAXERRORS:
Specifies the maximum number of syntax errors allowed in the data
before the bulk-import operation is canceled. Each row that cannot be
imported by the bulk-import operation is ignored and counted as one
error. If max_errors is not specified, the default is 10.
So, in order to fail the whole load even if only one row is invalid, MAXERRORS needs to be set to 0 (cancel on the first error) and BATCHSIZE left at its default, so that the entire file is a single batch. The BATCHSIZE value matters here:
if you specify a BATCHSIZE and the invalid row is inside a particular batch, only that batch is rolled back, not the entire data set.
So be careful when choosing this option.
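For reference, a minimal sketch of where those options go, reusing the statement from the original question (MAXERRORS = 0 cancels the load on the first bad row; combining it with the explicit transaction shown in the earlier answer is the safest way to guarantee a full rollback):
BULK INSERT OurTable
FROM 'c:\OurTable.txt'
WITH (CODEPAGE = 'RAW', DATAFILETYPE = 'char', FIELDTERMINATOR = '\t',
MAXERRORS = 0,   -- cancel on the first bad row instead of skipping up to 10
TABLOCK)         -- no BATCHSIZE: the whole file stays a single batch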
Hope this solves the issue.

As stated in the BATCHSIZE definition for BULK INSERT in the MSDN Library (http://msdn.microsoft.com/en-us/library/ms188365(v=sql.105).aspx):
"If this fails, SQL Server commits or rolls back the transaction for every batch..."
In conclusion, it is not necessary to add transactionality to BULK INSERT.

Try putting it inside a user-defined transaction and see what happens. Actually, it should roll back as you described.

Related

Bulk Insert does not insert data

I want to perform a Bulk Insert for data I get via a stream. Here I get survey data where each row contains the information and answers of one person. I consumed the stream via .NET and saved the data row by row, each row ending with a vbLf (I checked this in Word and could see that after each dataset there is a new line). The data are comma separated. First off, I created a table with 1000 columns, since I do not know yet how much data will come in, but for sure no dataset is longer than 500 fields yet, and even in the future it will definitely not get longer than 1000; if it does, I can extend the table. Here is the table I created:
The first two datasets look like this:
"4482359","12526","2014 Company","upload by","new upload","Anonymous","User","anonymous#company.org","","222.222.222.222","1449772662000","undefined","","951071","2015","","2","3","1","5","1","1","3","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","5","1","3","3","3","3","1","2","3","1","3","5","1","","Here ppl can type in some text.","1"
"4482360","12526","2014 Company","upload by","new upload","Anonymous","User","anonymous#company.org","","222.222.222.222","1449772662000","undefined","","951071","2015","","2","5","1","","2","2","3","4","3","1","4","4","4","4","3","3","","4","3","1","4","3","1","4","4","4","3","3","4","4","4","4","3","4","4","4","4","4","4","5","2","3","4","1","3","2","2","5","1","3","","2","","","2"
Now I want to do a Bulk Insert using this command:
USE MyDatabase
BULK INSERT insert_Table FROM 'C:\new.txt'
With (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')
The command runs through and does not throw an error, but I get the message "0 rows affected" and there is no data in the table. Does anyone have an idea what I am doing wrong here?

SSMS Bulk inserts = Error + Which line is it?

I'm trying to insert a lot of data with SQL Server Management Studio. This is what I do:
I open my file containing a lot of SQL inserts: data.sql
I execute it (F5)
I get a lot of these:
(1 row(s) affected)
and some of these:
Msg 8152, Level 16, State 13, Line 26
String or binary data would be truncated.
The statement has been terminated.
Question: How do I get the error line number? Line 26 doesn't seem to be the correct error line number...
This is something that has annoyed SQL Server developers for years. Finally, with SQL Server 2017 CU12 w/ trace flag 460 they give you a better error message, like:
Msg 2628, Level 16, State 6, Procedure ProcedureName, Line Linenumber
String or binary data would be truncated in table '%.*ls', column
'%.*ls'. Truncated value: '%.*ls'.
A method to get around this for now is to add a PRINT statement after each insert. Then, next to each "rows affected" message, you will see whatever you printed.
...
insert into table1
select...
print 'table1 insert complete'
insert into table2
select...
print 'table2 insert complete'
This isn't going to tell you which column, but it will narrow it down to the correct insert. You can also add SET NOCOUNT ON at the top of your script if you don't want the rows affected messages printed out.
As another addition: if you really are using BULK INSERT and weren't just using the term generally, you can specify an ERRORFILE. This will log the row(s) which caused the error(s) in your BULK INSERT command. It's important to know that by default, BULK INSERT will complete if there are 10 errors or fewer. You can override this by specifying MAXERRORS in your BULK INSERT command.
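A minimal sketch of what that looks like (the table name and file paths here are placeholders, not from the question):
BULK INSERT dbo.SomeTable
FROM 'C:\data\input.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n',
ERRORFILE = 'C:\data\input_errors.txt',  -- rejected rows are written here; the file must not already exist
MAXERRORS = 0)                           -- stop on the first error instead of tolerating the default 10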

Insert from select or update from select with commit every 1M records

I've already seen a dozen such questions, but most of them get answers that don't apply to my case.
First off - the database I am trying to get the data from is on a very slow network and is connected to over a VPN.
I am accessing it through a database link.
I have full write/read access on my schema tables, but I don't have DBA rights, so I can't create dumps and I don't have grants for creating new tables, etc.
I've been trying to get the database locally and all is well except for one table.
It has 6.5 million records and 16 columns.
There was no problem getting 14 of them, but the remaining two are CLOBs with huge XML in them.
The data transfer is so slow it is painful.
I tried
insert based on select
insert all 14 then update the other 2
create table as
insert based on select with conditions, so I get only so many records at a time and commit manually
The issue is mainly that the connection is lost before the transaction finishes (or there's a power loss, the VPN drops, a random error, etc.) and all the GBs that have been downloaded are discarded.
As I said, I tried adding conditions so I get only a few records at a time, but even this is a bit random and requires focus from me.
Something like :
Insert into TableA
Select * from TableA#DB_RemoteDB1
WHERE CREATION_DATE BETWEEN to_date('01-Jan-2016') AND to_date('31-DEC-2016')
Sometimes it works, sometimes it doesn't. Just after a few GBs Toad is stuck running, but when I look at its throughput it is 0 KB/s or a few bytes/s.
What I am looking for is a loop or a cursor that can be used to get maybe 100,000 or 1,000,000 records at a time, commit them, then go for the rest until it is done.
This is a one-time operation, as we need the data locally for testing - so I don't care if it is inefficient, as long as the data is brought in in chunks and a commit saves me from retrieving it again.
I can already count about 15 GB of failed downloads over the last 3 days, and my local table still has 0 records as all my attempts have failed.
Server: Oracle 11g
Local: Oracle 11g
Attempted Clients: Toad/Sql Dev/dbForge Studio
Thanks.
You could do something like:
begin
loop
insert into tablea
select * from tablea#DB_RemoteDB1 a_remote
where not exists (select null from tablea where id = a_remote.id)
and rownum <= 100000; -- or whatever number makes sense for you
exit when sql%rowcount = 0;
commit;
end loop;
end;
/
This assumes that there is a primary/unique key you can use to check if a row in the remote table already exists in the local one - in this example I've used a vague ID column, but replace that with your actual key column(s).
For each iteration of the loop it will identify rows in the remote table which do not exist in the local table - which may be slow, but you've said performance isn't a priority here - and then, via rownum, limit the number of rows being inserted to a manageable subset.
The loop then terminates when no rows are inserted, which means there are no rows left in the remote table that don't exist locally.
This should be restartable, due to the commit and the where not exists check. This isn't usually a good approach - it kind of breaks normal transaction handling - but as a one-off, and with your network issues/constraints, it may be necessary.
Toad is right, using bulk collect would be (probably significantly) faster in general as the query isn't repeated each time around the loop:
declare
cursor l_cur is
select * from tablea#dblink3 a_remote
where not exists (select null from tablea where id = a_remote.id);
type t_tab is table of l_cur%rowtype;
l_tab t_tab;
begin
open l_cur;
loop
fetch l_cur bulk collect into l_tab limit 100000;
forall i in 1..l_tab.count
insert into tablea values l_tab(i);
commit;
exit when l_cur%notfound;
end loop;
close l_cur;
end;
/
This time you would change the limit 100000 to whatever number you think sensible. There is a trade-off here though, as the PL/SQL table will consume memory, so you may need to experiment a bit to pick that value - you could get errors or affect other users if it's too high. Lower is less of a problem here, except the bulk inserts become slightly less efficient.
But because you have a CLOB column (holding your XML) this won't work for you, as @BobC pointed out; the insert ... select is supported over a DB link, but the collection version will get an error from the fetch:
ORA-22992: cannot use LOB locators selected from remote tables
ORA-06512: at line 10
22992. 00000 - "cannot use LOB locators selected from remote tables"
*Cause: A remote LOB column cannot be referenced.
*Action: Remove references to LOBs in remote tables.

DB2: Purge large number of records from table

I am using DB2 9.7 FP5 for LUW. I have a table with 2.5 million rows and I want to delete about 1 million rows; the delete operation is distributed across the table. I am deleting the data with 5 delete statements.
delete from tablename where tableky between range1 and range2
delete from tablename where tableky between range3 and range4
delete from tablename where tableky between range5 and range6
delete from tablename where tableky between range7 and range8
delete from tablename where tableky between range9 and range10
While doing this, the first 3 deletes work properly, but the 4th fails and DB2 hangs, doing nothing. Below is the process I followed; please help me with this:
1. Set the following profile registry parameters: DB2_SKIPINSERTED, DB2_USE_ALTERNATE_PAGE_CLEANING, DB2_EVALUNCOMMITTED, DB2_SKIPDELETED, DB2_PARALLEL_IO
2. Alter bufferpools for automatic storage.
3. Turn off logging for the tables (alter table tabname activate not logged initially) and delete the records
4. Execute the script with +c to make sure logging is off
What are the best practices for deleting such a large amount of data? Why is it failing when it is deleting data from the same table and of the same nature?
This is always a tricky task. The size of a transaction (e.g. for a safe rollback) is limited by the size of the transaction log. The transaction log is filled not only by your SQL commands but also by the commands of other users using the db at the same moment.
I would suggest using one of, or a combination of, the following methods:
1. Commits
Do commits often - in your case I would put one commit after each delete command.
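Using the ranges from the question (range1 ... range10 are the question's own placeholders), that would look like:
delete from tablename where tableky between range1 and range2;
commit;
delete from tablename where tableky between range3 and range4;
commit;
-- ...and so on for the remaining ranges, committing after each delete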
2. Increase the size of the transaction log
As I recall, the default DB2 transaction log is not very big. The size of the transaction log should be calculated/tuned for each db individually. Reference here and with more details here.
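For example, the log file size and the number of primary/secondary log files are database configuration parameters. A sketch (the database name and the values are purely illustrative, not a recommendation):
-- larger log files (LOGFILSIZ is in 4 KB pages) and more primary/secondary logs
db2 update db cfg for MYDB using LOGFILSIZ 8192
db2 update db cfg for MYDB using LOGPRIMARY 20
db2 update db cfg for MYDB using LOGSECOND 40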
3. Stored procedure
Write and call a stored procedure which does the deletes in blocks, e.g.:
-- USAGE - create: db2 -td# -vf del_blocks.sql
-- USAGE - call: db2 "call DEL_BLOCKS(4, ?)"
drop PROCEDURE DEL_BLOCKS#
CREATE PROCEDURE DEL_BLOCKS(IN PK_FROM INTEGER, IN PK_TO INTEGER)
LANGUAGE SQL
BEGIN
declare v_CNT_BLOCK bigint;
set v_CNT_BLOCK = 0;
FOR r_cur as c_cur cursor with hold for
select tableky from tablename
where tableky between pk_from and pk_to
for read only
DO
delete from tablename where tableky=r_cur.tableky;
set v_CNT_BLOCK=v_CNT_BLOCK+1;
if v_CNT_BLOCK >= 5000 then
set v_CNT_BLOCK = 0;
commit;
end if;
END FOR;
commit;
END#
4. Export + import with replace option
In some cases, when I needed to purge very big tables or keep just a small number of records (and had no FK constraints), I used export + import (replace). The replace import option is very destructive - it purges the whole table before the import of new records starts (see the reference for the db2 import command) - so be sure of what you're doing and make a backup first. For such sensitive operations I create 3 scripts and run each separately: backup, export, import. Here is the script for export:
echo '===================== export started ';
values current time;
export to tablename.del of del
select * from tablename where (tableky between 1 and 1000
or tableky between 2000 and 3000
or tableky between 5000 and 7000
) ;
echo '===================== export finished ';
values current time;
Here is the import script:
echo '===================== import started ';
values current time;
import from tablename.del of del allow write access commitcount 2000
-- !!!! this is IMPORTANT and VERY VERY destructive option
replace
into tablename ;
echo '===================== import finished ';
5. Truncate command
DB2 version 9.7 introduced the TRUNCATE statement, which:
deletes all of the rows from a table.
Basically:
TRUNCATE TABLE <tablename> IMMEDIATE
I have no experience with TRUNCATE in DB2, but in some other engines the command is very fast and does not use the transaction log (at least not in the usual manner). Please check all the details here or in the official documentation. Like solution 4, this method is very destructive - it purges the whole table - so be very careful before issuing the command. Make sure you can restore the previous state by taking a table/db backup first.
Note about when to do this
When there are no other users on the db, or ensure this by locking the table.
Note about rollback
In a transactional db (like DB2), a rollback can restore the db state to the state it was in when the transaction started. In methods 1, 3 and 4 this can't be achieved, so if you need the ability to restore to the original state, the only option which ensures this is method no. 2 - increasing the transaction log.
delete from ordpos where orderid in ((select orderid from ordpos where orderid not in (select id from ordhdr) fetch first 40000 rows only));
Hoping this will resolve your query :)
It's unlikely that DB2 is "hanging" – more likely it's in the process of doing a Rollback after the DELETE operation filled the transaction log.
Make sure that you are committing after each individual DELETE statement. If you are executing the script using the +c option for the DB2 CLP, then make sure you include an explicit COMMIT statement between each DELETE.
The best practice for deleting millions of rows is to use a commit in between the deletes. In your case you can use a commit after every delete statement.
What a commit does is free up the transaction log space and make it available for the remaining delete operations.
Alternatively, instead of 5 delete statements, use a loop that issues the deletes. After each iteration of the loop execute one commit; then the database will never hang, and your data will still get deleted.
Use something like this:
while (count < no_of_records)
    delete from (select * from tablename fetch first 50000 rows only);
    commit;
    count = count + 50000;
If a SELECT ... WHERE ... FETCH FIRST 10 ROWS ONLY can pull in a chunk of records (in chunks of 10, for example), then you can feed this as input into another script that will then delete those records. Rinse and repeat...
For the benefit of everyone, here is the link to my developerWorks article on the same problem. I tried different things, and the one I shared in this article worked perfectly for me.

Limit SQL Server log file growth rate

I have a big database of 2.3 billion rows with a size of 76 GB.
My problem is that I want to convert a column type to smalldatetime, but during this operation the .ldf file grows so big that it takes up my entire disk space (it got up to 350 GB) and then the query exits with an error.
Is there any way to keep the .ldf small?
I shrank my .ldf from the options.
The database recovery model is set to simple.
Add a new nullable column of type smalldatetime. Then slowly (that is, in batches of 10-100k rows, for instance) populate that column by setting its value based on the old column's value. Once all rows have a value in the new column, drop the old one and rename the new one to the old one's name.
That'll ensure no transaction becomes big enough to severely impact your log file.
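A rough sketch of the surrounding steps, using the table and column names that appear in the asker's code below (treat it as a sketch; the batched UPDATE in the middle is exactly what the asker's WHILE loop does):
USE ais
-- 1. add the new nullable column
ALTER TABLE [dbo].[imis position report] ADD [time2] smalldatetime NULL

-- 2. populate [time2] in small batches (see the WHILE loop below)

-- 3. once every row has a value, swap the columns
ALTER TABLE [dbo].[imis position report] DROP COLUMN [date]
EXEC sp_rename 'dbo.[imis position report].time2', 'date', 'COLUMN'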
Here is the final code. I'm running it now, so I will know tomorrow if it's 100% good, but it seems to work:
WHILE (2 > 1)
BEGIN
    BEGIN TRANSACTION
    UPDATE TOP (10000) [ais].[dbo].[imis position report]
    SET [time2] = convert(smalldatetime, left([date], 19))
    WHERE [time2] IS NULL   -- only touch rows not converted yet, so the loop can finish
    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRANSACTION
        BREAK
    END
    COMMIT TRANSACTION
    -- 1 second delay
    WAITFOR DELAY '00:00:01'
END -- WHILE
GO
