SQL Server 2008 R2 Bulk Insert Logging - sql-server

I have to build a very large table (hundreds of millions of records) from tab-delimited text files that I created with a parsing script and therefore know to be uniform and correct. I would love to find a way to do this without SQL Server using any processing power or disk space to log the transactions--if something goes wrong, I'm happy if the bulk insert just dies without trying to roll back the table to some earlier state. Is that possible?
I've searched extensively on this question and I find mentions of using a "Simple" Recovery Model, but it sounds like that might be only effective with the first bulk insert, when the table is empty? If it matters, I plan to index after all of the bulk inserts have completed.
Any advice would be greatly appreciated.

The bulk-logged recovery model is the right one for your scenario: it gives minimal logging for bulk operations such as BULK INSERT, index maintenance operations, and so on. Note that in SQL Server there is no operation that is completely unlogged; everything is logged, but the recovery model determines how much is logged and when the transaction log is cleared. See the documentation on the various recovery models for details.
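For example, a minimal sketch of such a load (database, table, and file names are placeholders; the TABLOCK hint is what allows minimal logging into a heap):

    -- Switch to BULK_LOGGED for the load window, then switch back afterwards.
    ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED;

    BULK INSERT dbo.BigTable
    FROM 'D:\Loads\chunk_001.txt'
    WITH (
        FIELDTERMINATOR = '\t',
        ROWTERMINATOR   = '\n',
        TABLOCK,               -- required for minimal logging into a heap
        BATCHSIZE = 1000000    -- commit in chunks so the log can be cleared
    );

    ALTER DATABASE MyDb SET RECOVERY FULL;  -- or SIMPLE, per your backup strategy

Since you plan to index only after all the bulk inserts complete, the table stays a heap during the load, which is exactly the case where minimal logging applies.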

Related

Disable transactions on SQL Server

I need some light here. I am working with SQL Server 2008.
I have a database for my application. Each table has a trigger that stores all changes in another database (on the same server), in a single table, 'tbSysMasterLog'. Yes, the application's log is stored in another database.
The problem is that before any insert/update/delete command on the application database, a transaction is started, and therefore the table in the log database stays locked until the transaction is committed or rolled back. So anyone else who tries to write to any other table of the application gets blocked.
So...is there any way possible to disable transactions on a particular database or on a particular table?
You cannot turn off the log; everything gets logged. You can set the recovery model to "Simple", which limits how much log data is kept once the records are committed.
" the table of the log database is locked": why that?
Normally you log changes by inserting records. The insert of records should not lock the complete table, normally there should not be any contention in insertion.
If you do more than inserts, perhaps you should consider changing that. Perhaps you should look at the indices defined on log, perhaps you can avoid some of them.
It sounds from the question that you have a BEGIN TRANSACTION at the start of your triggers, and that you are logging to the other database before the COMMIT TRANSACTION.
Normally you do not need to have explicit transactions in SQL server.
If you do need explicit transactions, you could put the data to be logged into variables, commit the transaction, and then insert the data into your log table.
Normally inserts are fast and can happen in parallel without locking. Certain things, like identity columns, require ordering, but that is a very lightweight structure, and it can be avoided by generating GUIDs so that inserts are non-blocking. For something like your log table, though, a primary key identity column gives you a clear sequence, which is probably helpful in working out the order of events.
Obviously if you log after the transaction, this may not be in the same order as the transactions occurred due to the different times that transactions take to commit.
We normally log into individual tables with a similar name to the master table e.g. FooHistory or AuditFoo
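For illustration, a minimal insert-only audit trigger along those lines might look like this (database, table, and column names here are hypothetical):

    -- Hypothetical objects: dbo.Foo in the application database and
    -- LogDb.dbo.AuditFoo in the log database. The trigger only inserts,
    -- so it should not create contention on other tables.
    CREATE TRIGGER trg_Foo_Audit
    ON dbo.Foo
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO LogDb.dbo.AuditFoo (FooId, ChangedAt, ChangedBy)
        SELECT i.FooId, SYSDATETIME(), SUSER_SNAME()
        FROM inserted AS i;
    END;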
There are other options. A very lightweight method is to use a trace: this is what is used for performance tuning, and it gives you a copy of every statement run on the database (including those fired by triggers), which you can log to a different database server. It is a good idea to log to a different server if you are tracing a heavily used server, since the volume of data is massive if you are tracing, say, 1,000 simultaneous sessions.
https://learn.microsoft.com/en-us/sql/tools/sql-server-profiler/save-trace-results-to-a-table-sql-server-profiler?view=sql-server-ver15
You can also trace to a file and then load it into a table (better performance), and script up starting, stopping, and loading the traces.
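Loading a saved trace file into a table is a one-liner; a sketch, with a hypothetical file path and target table:

    -- Import the trace file for analysis; DEFAULT reads all rollover files.
    SELECT *
    INTO dbo.TraceResults
    FROM sys.fn_trace_gettable(N'D:\Traces\AppTrace.trc', DEFAULT);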
The load on the server receiving the trace is minimal, and I have never had a locking problem on that server, so I am fairly sure that something in your setup is causing the locks.

Detect Table Changes In A Database Without Modifications

I have a database ("DatabaseA") that I cannot modify in any way, but I need to detect the addition of rows to a table in it and then add a log record to a table in a separate database ("DatabaseB") along with some info about the user who added the row to DatabaseA. (So it needs to be event-driven, not merely a periodic scan of the DatabaseA table.)
I know that normally, I could add a trigger to DatabaseA and run, say, a stored procedure to add log records to the DatabaseB table. But how can I do this without modifying DatabaseA?
I have free rein to do whatever I like in DatabaseB.
EDIT in response to questions/comments ...
Databases A and B are MS SQL 2008/R2 databases (as tagged), users are interacting with the DB via a proprietary Windows desktop application (not my own) and each user has a SQL login associated with their application session.
Any ideas?
Ok, so I have not put together a proof of concept, but this might work.
You can configure an Extended Events session on databaseB that watches for all the procedures on databaseA that can insert into the table, or for any SQL statements that run against the table on databaseA (using a LIKE '%your table name here%' filter).
This is a custom solution that writes the XE session to a table:
https://github.com/spaghettidba/XESmartTarget
You could probably mimic that functionality by writing the XE events table to a custom user table every minute or so using a SQL Server Agent job.
Your session would monitor databaseA and write the XE output to databaseB; you would then write a trigger so that, upon each XE output write, it compares the two tables and, if there are differences, writes them to your log table. This would be a nonstop running process, but it is still a kind of periodic scan: the XE session only writes when the event happens, yet the comparison still runs every couple of seconds.
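A rough sketch of such a session (session name and the database_id literal are placeholders; XE predicates only accept literals, so look up DB_ID('DatabaseA') first, and verify event, action, and target names against sys.dm_xe_objects on 2008 R2, since they differ slightly from later versions):

    CREATE EVENT SESSION WatchTableA ON SERVER
    ADD EVENT sqlserver.sql_statement_completed
    (
        ACTION (sqlserver.sql_text, sqlserver.username, sqlserver.client_app_name)
        WHERE (sqlserver.database_id = 7)   -- placeholder: DB_ID() of DatabaseA
    )
    ADD TARGET package0.ring_buffer
    WITH (STARTUP_STATE = ON);

    ALTER EVENT SESSION WatchTableA ON SERVER STATE = START;

    -- Read the captured events; filter the XML for your table name afterwards.
    SELECT CAST(t.target_data AS xml) AS xe_data
    FROM sys.dm_xe_sessions AS s
    JOIN sys.dm_xe_session_targets AS t
        ON t.event_session_address = s.address
    WHERE s.name = 'WatchTableA';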
I recommend you look at a data integration tool that can mine the transaction log for Change Data Capture events. We have recently been using StreamSets Data Collector for Oracle CDC, but it also supports SQL Server CDC. There are many other competing technologies, including Oracle GoldenGate and Informatica PowerExchange (not PowerCenter). We like StreamSets because it is open source and is designed to build real-time data pipelines between databases at the schema level. Until now we have used batch ETL tools like Informatica PowerCenter and Pentaho Data Integration. I can copy all the tables in a schema in near real time in one StreamSets pipeline, provided I have already deployed the DDL in the target. I use this approach between Oracle and Vertica. You can add additional columns to the target and populate them as part of the pipeline.
The only catch might be identifying which user made the change. I don't know whether that is in the SQL Server transaction log. Seems probable but I am not a SQL Server DBA.
I looked at both solutions provided by the time of writing this answer (refer to Dan Flippo and dfundaka) but found that the first, using Change Data Capture, required modification to the database, and the second, using Extended Events, wasn't really a complete answer, though it got me thinking of other options.
The option that seems cleanest, and doesn't require any database modification, is to use the SQL Server Dynamic Management Views. These views and functions, residing in the system database, expose server process history, in this case INSERTs and UPDATEs, through objects such as sys.dm_exec_sql_text and sys.dm_exec_query_stats, which contain records of recently executed statements (and are, in fact, what Extended Events seems to be based on).
Though it's quite an involved process initially to extract the required information, the queries can be tuned and generalized to a degree.
There are restrictions on transaction history retention, etc but for the purposes of this particular exercise, this wasn't an issue.
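As a sketch of the kind of query involved (the table name is a placeholder, and the plan cache is not a complete or permanent history; entries can be evicted at any time):

    SELECT TOP (50)
        qs.last_execution_time,
        qs.execution_count,
        st.text AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    WHERE st.text LIKE N'%YourTableName%'
      AND st.text LIKE N'%INSERT%'
    ORDER BY qs.last_execution_time DESC;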
I'm not going to select this answer as the correct one yet partly because it's a matter of preference as to how you approach the problem and also because I'm yet to provide a complete solution. Hopefully, I'll post back with that later. But if anyone cares to comment on this approach - good or bad - I'd be interested in your views.

SQL Server - Tempdb vs. Database Log usage

This may be a very basic question, but how can you determine beforehand whether a large operation will end up using database log or tempdb space?
For instance, one large insert/update operation I did used so much database log space that we needed to employ SSIS and bulk operations just so the space wouldn't run out, because all the changes in the script had to be deployed at one time.
So now I'm working with a massive delete operation that would fill the log ten times over. So I created a script that checks the space used by the database log file and deletes the rows in smaller batches, the idea being that once the log file grows large enough, the script aborts and then continues from that point the next day (allowing normal usage to continue until the next backup, without risk of the log running out of space).
Now, instead of filling the log, the latter query started filling up tempdb, the tempdb data file rather than its log file, to be specific. So I'm thinking there's a huge hole where my understanding of these two should be. :)
Thanks for any advice!
Edit:
To clarify, the question here is: why does the first example use the database log, while the latter uses the tempdb data file, to store the changes? And in general, by what logic are DML operations stored in either tempdb or the log? Normally the log should store all DB changes, while tempdb is only used to store processed data during an operation when explicitly requested (i.e., temp objects) or when the server runs out of RAM, right?
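For reference, the batched delete in question looks roughly like this (table name, filter, and batch size are simplified placeholders):

    DECLARE @BatchSize int = 50000;

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@BatchSize)
        FROM dbo.BigTable
        WHERE CreatedDate < '20100101';

        IF @@ROWCOUNT = 0
            BREAK;

        -- Under SIMPLE recovery a checkpoint lets log space be reused between
        -- batches; under FULL/BULK_LOGGED a log backup is needed instead.
        CHECKPOINT;
    END;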
There is actually quite a bit that goes on behind the scenes when deleting records from a table. There is an MSDN blog post that may help shed some light on why tempdb fills up when you try to delete. Either way, the delete will fill up the transaction log as well; it just sounds like tempdb is filling up before the transactions get logged.
I'm not entirely sure what your requirements are, but the following links could be somewhat enlightening on your transaction logging issues. These are all set for SQL Server 2008 R2, but you can switch to whatever version you are running.
Recovery Model Overview
Considerations for Switching from the Simple Recovery Model
Considerations for Switching from the Full or Bulk-Logged Recovery Model
You also have the option of truncating the table, but that depends on a few things. If you don't need the operation to be logged and you're deleting all the records from the table, you can truncate. If you are doing some sort of conditional delete, but you're deleting more than you're keeping, you could insert all of the records you want to keep into another "staging" table, truncate the original, and then re-insert the records from the staging table. However, that really only works when you have no foreign key relationships on that table.
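A rough sketch of that staging-and-truncate approach, with hypothetical table and column names:

    -- Copy the rows to keep into a staging table (SELECT INTO is minimally
    -- logged under the SIMPLE and BULK_LOGGED recovery models).
    SELECT *
    INTO dbo.BigTable_Keep
    FROM dbo.BigTable
    WHERE KeepFlag = 1;

    -- TRUNCATE deallocates whole pages and logs very little.
    TRUNCATE TABLE dbo.BigTable;

    -- Put the kept rows back; TABLOCK allows minimal logging into a heap.
    INSERT INTO dbo.BigTable WITH (TABLOCK)
    SELECT * FROM dbo.BigTable_Keep;

    DROP TABLE dbo.BigTable_Keep;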

Managing transaction log for datawarehouse

I have a SQL database that I dump data into every 15 minutes using SSIS. The transaction log gets huge, and I back it up and shrink it a few times a week. But I know I'm doing something wrong. What is the best practice for maintaining it? Should it stay around 1 GB and be backed up hourly? Since it's a data warehouse, should I be backing it up at all? Should I be doing something different in SSIS? The data warehouse's recovery model is Bulk Logged.
You should also make sure SSIS is set up correctly to do minimally logged operations, as you may not have it configured that way. Once that is working correctly, evaluate whether you really need the bulk-logged recovery model: http://msdn.microsoft.com/en-us/library/ms175987(v=sql.105).aspx is a good link on the subject. If you can quickly (your definition of quick) redo the data that is lost and don't need point-in-time recovery of non-bulk operations, move to simple recovery, IMHO.
You have a few options with logs, and they are related to how you are backing up your database and how much data loss you can stand. If you are doing full backups nightly, for example, change your database to simple mode. This lets the log truncate itself at checkpoints (once the logged operations are committed, the space is reused). There are a lot of articles on this topic; Google it, and if you need more help post back.
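A sketch of the two options, with a hypothetical database name and backup path:

    -- If point-in-time recovery of the warehouse is not required, switch to
    -- SIMPLE so the log truncates itself at checkpoints.
    ALTER DATABASE MyWarehouse SET RECOVERY SIMPLE;

    -- If you stay in BULK_LOGGED or FULL, schedule frequent log backups
    -- instead of shrinking the file.
    BACKUP LOG MyWarehouse TO DISK = N'D:\Backups\MyWarehouse_log.trn';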

Is there a congruent command to truncate for re-filling a table? [duplicate]

I have an INSERT statement that is eating a hell of a lot of log space, so much so that the hard drive is actually filling up before the statement completes.
The thing is, I really don't need this to be logged as it is only an intermediate data upload step.
For argument's sake, let's say I have:
Table A: Initial upload table (populated using bcp, so no logging problems)
Table B: Populated using INSERT INTO B from A
Is there a way that I can copy between A and B without anything being written to the log?
P.S. I'm using SQL Server 2008 with simple recovery model.
From Louis Davidson, Microsoft MVP:
There is no way to insert without logging at all. SELECT INTO is the best way to minimize logging in T-SQL; using SSIS you can do the same sort of light logging using Bulk Insert.
From your requirements, I would probably use SSIS: drop all constraints, especially unique and primary key ones, load the data in, and add the constraints back. I load about 100 GB in just over an hour like this, with fairly minimal overhead. I am using the BULK LOGGED recovery model, which just logs the existence of new extents during the load, and you can then remove them later.
The key is to start with barebones tables, and it just screams. Building the index once leaves you with no indexes to maintain, just the one index build per index.
If you don't want to use SSIS, the point still applies: drop all of your constraints and use the BULK_LOGGED recovery model. This greatly reduces the logging done by INSERT INTO statements and should therefore solve your issue.
http://msdn.microsoft.com/en-us/library/ms191244.aspx
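As a sketch of that approach, using the A and B tables from the question (and assuming B is a heap with its constraints dropped during the load):

    -- Under SIMPLE or BULK_LOGGED recovery, INSERT...SELECT into a heap with
    -- a TABLOCK hint can be minimally logged on SQL Server 2008.
    INSERT INTO dbo.B WITH (TABLOCK)
    SELECT *
    FROM dbo.A;

    -- SELECT INTO is also minimally logged, but it creates the target table:
    -- SELECT * INTO dbo.B FROM dbo.A;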
Upload the data into tempdb instead of your database, and do all the intermediate transformations in tempdb. Then copy only the final data into the destination database. Use batches to minimize individual transaction size. If you still have problems, look into deploying trace flag 610; see The Data Loading Performance Guide and Prerequisites for Minimal Logging in Bulk Import:
Trace Flag 610
SQL Server 2008 introduces trace flag 610, which controls minimally logged inserts into indexed tables.
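As a sketch, the flag can be enabled instance-wide for the duration of the load (it can also be set with the -T610 startup parameter):

    DBCC TRACEON (610, -1);

    -- ... run the bulk load here ...

    DBCC TRACEOFF (610, -1);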
