How to diagnose slow NHibernate inserts? - sql-server

I know NHibernate isn't meant to do batch inserts, because it's about 5x slower than SqlBulkCopy, but I decided to use it for code simplicity.
However, my code isn't 5x slower. It's 2400x slower. I'm inserting about 2500 records. I've turned off log4net logging. I'm running in release mode. I'm not using an id generator (I'm assigning ids in code via an integer counter). I'm using a stateless session. I've set a batch size of 100 (I could go higher, but it doesn't seem to help). I tried adding the generator back in, but setting its class to "assigned".
I'm not inserting any child elements. I've confirmed the batch inserts are occurring.
Is it still calling SELECT SCOPE_IDENTITY()? But even if it is, that's still a ridiculous amount of time.
I don't do too many batch operations, so I can continue to use SqlBulkCopy for this process, but I'm concerned that my entire application could be running faster.
I don't have a license for NHProf, but I'm wondering if now is the time to download the trial.
I'm using NHibernate 3.3 GA with Syscache2 -- but again, I'm using a stateless session.
Any HBMs, configuration, or code you want to see? Suggestions?
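For what it's worth, the insert path looks roughly like this (the entity and variable names are simplified placeholders; adonet.batch_size is set to 100 in the configuration and the id generator class is "assigned" in the mapping):

// Sketch only: MyRecord stands in for the real entity.
using (IStatelessSession session = sessionFactory.OpenStatelessSession())
using (ITransaction tx = session.BeginTransaction())
{
    int id = nextId;                          // ids assigned in code from an integer counter
    foreach (var source in recordsToInsert)
    {
        session.Insert(new MyRecord { Id = id++, Value = source.Value });
    }
    tx.Commit();                              // with adonet.batch_size set, the INSERTs go out in ADO.NET batches
}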
Thanks

because it's about 5x slower than SqlBulkCopy
You must be joking.
NHibernate does inserts. Using batched inserts (i.e. more than one insert statement in a command), handwritten - something I do not think NHibernate does - I got around 400 inserts per second in a specific project.
Using SqlBulkCopy I got 75,000 per second.
That is NOT a factor of 5, that is a factor of 187.
However, my code's not 5x slower. It's 2400x slower
Not an NHibernate specialist. Log the connection - I would assume NHibernate sends one insert statement per command, which means a LOT of round trips and is a LOT slower than the handwritten batching I described at the beginning of my answer.
Where the heck did you get the 5x factor from? That is a false premise to start with.
I don't do too many batch operations, so I can continue to use SqlBulkCopy for this process, but I'm concerned that my entire application could be running faster.
Here is a reality check for you: you do not use an ORM when you need extreme select or insert speed. ORMs are there for business-rule-heavy objects - business objects. When you end up doing bulk inserts or reads, you DO NOT USE A FULL ORM. Simple as that.
When you think SqlBulkCopy is fast, check this:
* Multiple SqlBulkCopy instances running on multiple threads...
* ...inserting into temporary tables, and then
* ...using one INSERT INTO ... SELECT statement to copy the data into the final table.
Why? Because SqlBulkCopy has some bad locking behavior across multiple threads. This is how I got it that high - a rough sketch of the pattern follows below.
AND: 2500 rows is low for SqlBulkCopy - the setup overhead is significant (it is incurred before the first row is even sent), so you will see less gain. I use 50k-row batches.
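Something like the following (staging table names and the per-thread data are placeholders; error handling omitted):

// using System.Data.SqlClient; using System.Threading.Tasks;
// Each thread bulk copies into its own staging table; one INSERT ... SELECT then
// moves everything into the final table on a single connection.
Parallel.For(0, threadCount, t =>
{
    using (var bcp = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
    {
        bcp.DestinationTableName = "Staging_" + t;   // one staging table per thread
        bcp.BatchSize = 50000;                       // big batches amortize the setup overhead
        bcp.WriteToServer(dataPerThread[t]);         // a DataTable prepared for this thread
    }
});
// Afterwards, on one connection:
//   INSERT INTO FinalTable (...) SELECT ... FROM Staging_0;
//   INSERT INTO FinalTable (...) SELECT ... FROM Staging_1; and so on.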
What is NHibernate doing at the wire level?
I've confirmed the batch inserts are occurring.
How? What do you consider a batch insert?
Are there triggers on the table?

Inserting 10,000 objects with a couple of long and string properties into a local MySQL database on my dev machine takes:
StatelessSession: 4.6 seconds
Session: 5.7 seconds

I have found that if you are doing many inserts, the first-level cache will clog up and soon slow things down to a crawl. You can try using a stateless session, or just open and close the session periodically, i.e. every 5 inserts. You will of course lose things like transactions by closing the session.
But ultimately, I tend to use SqlBulkCopy if I have more than about 100 rows to insert - it is many times quicker.
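A common variation on the same idea is to keep one session open but flush and clear it every so often, so the first-level cache never builds up; a minimal sketch (the interval of 100 is arbitrary):

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    int i = 0;
    foreach (var item in items)
    {
        session.Save(item);
        if (++i % 100 == 0)      // works best when this matches adonet.batch_size
        {
            session.Flush();     // push the pending INSERTs to the database
            session.Clear();     // evict everything from the first-level cache
        }
    }
    tx.Commit();
}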

Related

node.js DB bulk insert vs insert one-at-a-time

I'm a nodejs newbie and was wondering which way is better to insert a huge number of rows into a DB. On the surface, inserting things one-at-a-time looks more like the way to go because I can free the event loop quickly and serve other requests. But the code looks hard to understand that way. For bulk inserts, I'd have to prepare the data beforehand, which would mean using loops for sure. This would cause fewer requests to be served during that period, as the event loop is busy with the loop.
So, what's the preferred way ? Is my analysis correct ?
There's no right answer here. It depends on the details: why are you inserting a huge number of rows? How often? Is this just a one-time bootstrap or does your app do this every 10 seconds? It also matters what compute/IO resources are available. Is your app the only thing using the database or is blasting it with requests going to be a denial of service for other users?
Without the details, my rule of thumb would be bulk insert with a small concurrency limit, like fire off up to 10 inserts, and then wait until one of them finishes before sending another insert command to the database. This follows the model of async.eachLimit. This is how browsers handle concurrent requests to a given web site, and it has proven to be a reasonable default policy.
In general, loops on in-memory objects should be fast, very fast.
I know you're worried about blocking the CPU, but you should be considering the total amount of work to be done. Sending items one at a time carries a lot of overhead. Each query to the DB has its own sequence of inner for loops that probably make your "batching" for loop look pretty small.
If you need to dump 1000 things in the DB, the minimum amount of work you can do is to run this all at once. If you make it 10 batches of 100 "things", you have to do all of the same work + you have to generate and track all of these requests.
So how often are you doing these bulk inserts? If this is a regular occurrence, you probably want to minimize the total amount of work and bulk insert everything at once.
The trade-off here is logging and retries. It's usually not enough to just perform some type of bulk insert and forget about it. The bulk insert is eventually going to fail (fully or partially) and you will need some type of logic for retries or consolidation.
If that's a concern, you probably want to manage the size of the bulk insert so that you can retry blocks intelligently.

Need recommendations on pushing the envelope with SqlBulkCopy on SQL Server

I am designing an application, one aspect of which is that it is supposed to be able to receive massive amounts of data into a SQL database. I designed the database structure as a single table with a bigint identity, something like this:
CREATE TABLE MainTable
(
_id bigint IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
field1, field2, ...
)
I will omit how I am intending to perform queries, since that is irrelevant to the question I have.
I have written a prototype which inserts data into this table using SqlBulkCopy. It seemed to work very well in the lab. I was able to insert tens of millions of records at a rate of ~3K records/sec (the full record itself is rather large, ~4K). Since the only index on this table is an autoincrementing bigint, I did not see a slowdown even after a significant number of rows had been pushed.
Considering that the lab SQL Server was a virtual machine with a relatively weak configuration (4 GB RAM, disk subsystem shared with other VMs), I was expecting to get significantly better throughput on a physical machine, but it didn't happen, or let's say the performance increase was negligible. I could maybe get 25% faster inserts on the physical machine. Even after I configured a 3-drive RAID0, which performed 3 times faster than a single drive (measured by benchmarking software), I got no improvement. Basically: a faster drive subsystem, a dedicated physical CPU and double the RAM almost didn't translate into any performance gain.
I then repeated the test using the biggest instance on Azure (8 cores, 16 GB), and I got the same result. So, adding more cores did not change insert speed.
At this point I have played around with the following software parameters without any significant performance gain:
Modifying the SqlBulkCopy.BatchSize parameter
Inserting from multiple threads simultaneously, and adjusting the number of threads
Using the table lock option on SqlBulkCopy
Eliminating network latency by inserting from a local process using the shared memory driver
I am trying to increase performance at least 2-3 times, and my original idea was that throwing more hardware at it would get things done, but so far it doesn't.
So, can someone recommend me:
What resource could be suspected as the bottleneck here? How do I confirm it?
Is there a methodology I could try to get reliably scalable bulk insert improvement considering there is a single SQL server system?
UPDATE I am certain that the load app is not the problem. It creates records in a temporary queue in a separate thread, so when there is an insert it goes like this (simplified):
===> start logging time
// Split the queue into batches and bulk copy them in parallel.
int batchCount = (queue.Count - 1) / targetBatchSize + 1;
Enumerable.Range(0, batchCount).AsParallel()
    .WithDegreeOfParallelism(MAX_DEGREE_OF_PARALLELISM).ForAll(i =>
{
    var batch = queue.Skip(i * targetBatchSize).Take(targetBatchSize);
    var data = MYRECORDTYPE.MakeDataTable(batch);   // convert the batch to a DataTable
    var bcp = GetBulkCopy();                        // returns a configured SqlBulkCopy
    bcp.WriteToServer(data);
});
===> end logging time
Timings are logged, and the part that creates the queue never takes any significant chunk of time.
UPDATE2 I have implemented collecting how long each operation in that cycle takes, and the breakdown is as follows:
queue.Skip().Take() - negligible
MakeDataTable(batch) - 10%
GetBulkCopy() - negligible
WriteToServer(data) - 90%
UPDATE3 I am designing for the Standard edition of SQL Server, so I cannot rely on partitioning, since it's only available in the Enterprise edition. But I tried a variant of a partitioning scheme:
created 16 filegroups (G0 to G15),
made 16 tables for insertion only (T0 to T15), each bound to its own filegroup. The tables have no indexes at all, not even a clustered int identity.
threads that insert data cycle through all 16 tables. This makes it almost a guarantee that each bulk insert operation uses its own table.
That did yield a ~20% improvement in bulk insert speed. CPU cores, the LAN interface and drive I/O were not maxed out; they ran at around 25% of capacity.
UPDATE4 I think it is now as good as it gets. I was able to push inserts to reasonable speeds using the following techniques:
Each bulk insert goes into its own table, then the results are merged into the main one (see the sketch after this list)
Tables are recreated fresh for every bulk insert, and table locks are used
Used an IDataReader implementation from here instead of a DataTable.
Bulk inserts done from multiple clients
Each client accesses SQL using an individual gigabit VLAN
Side processes accessing the main table use the NOLOCK option
I examined sys.dm_os_wait_stats and sys.dm_os_latch_stats to track down contention
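A rough sketch of the first two points combined with the IDataReader change (all table, column and variable names here are made up; the real merge statement depends on the schema):

// using System.Data.SqlClient;
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // fresh, index-free staging table for this batch
    using (var create = new SqlCommand("SELECT TOP 0 field1, field2 INTO StagingBatch FROM MainTable", conn))
        create.ExecuteNonQuery();

    using (var bcp = new SqlBulkCopy(conn, SqlBulkCopyOptions.TableLock, null))
    {
        bcp.DestinationTableName = "StagingBatch";
        bcp.WriteToServer(recordReader);            // an IDataReader instead of a DataTable
    }

    // fold the batch into the main table, then throw the staging table away
    using (var merge = new SqlCommand(
        "INSERT INTO MainTable WITH (TABLOCK) (field1, field2) " +
        "SELECT field1, field2 FROM StagingBatch; DROP TABLE StagingBatch;", conn))
        merge.ExecuteNonQuery();
}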
I have a hard time deciding at this point who gets the credit for answering the question. To those of you who don't get the "answered" mark, I apologize; it was a really tough decision, and I thank you all.
UPDATE5: The following item could use some optimization:
Used IDataReader implementation from here instead of DataTable.
Unless you run your program on a machine with a massive CPU core count, it could use some refactoring. Since it uses reflection to generate the get/set methods, that becomes a major load on the CPU. If performance is key, you gain a lot by coding the IDataReader manually, so that the accessors are compiled instead of going through reflection.
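The gist of that refactoring, sketched with compiled delegates in place of per-row reflection (the helper and type names are made up):

// using System; using System.Linq; using System.Linq.Expressions;
// Build one compiled getter per column once, instead of calling PropertyInfo.GetValue
// for every cell; the IDataReader's GetValue(i) then just invokes getters[i](currentRow).
static Func<T, object>[] BuildGetters<T>(string[] propertyNames)
{
    return propertyNames.Select(name =>
    {
        var p = Expression.Parameter(typeof(T), "x");
        var body = Expression.Convert(Expression.Property(p, name), typeof(object));
        return Expression.Lambda<Func<T, object>>(body, p).Compile();  // compiled once, cheap per call
    }).ToArray();
}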
For recommendations on tuning SQL Server for bulk loads, see the Data Loading and Performance Guide paper from MS, and also Guidelines for Optimizing Bulk Import in Books Online. Although they focus on bulk loading from SQL Server, most of the advice applies to bulk loading using the client API. These papers apply to SQL Server 2008 - you don't say which SQL Server version you're targeting.
Both have quite a lot of information which is worth going through in detail. However, some highlights:
Minimally log the bulk operation. Use bulk-logged or simple recovery.
You may need to enable trace flag 610 (but see the caveats on doing this)
Tune the batch size
Consider partitioning the target table
Consider dropping indexes during bulk load
This is nicely summarised in the flow chart in the Data Loading and Performance Guide.
As others have said, you need to get some performance counters to establish the source of the bottleneck, since your experiments suggest that IO might not be the limitation.
The Data Loading and Performance Guide includes a list of SQL wait types and performance counters to monitor (there are no anchors in the document to link to, but this is about 75% of the way through the document, in the section "Optimizing Bulk Load").
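On the client side, the batch-size and minimal-logging recommendations above map onto SqlBulkCopy options roughly like this (the destination name and reader are placeholders; TableLock is one of the usual prerequisites for a minimally logged load, together with the recovery model):

// using System.Data.SqlClient;
using (var bcp = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
{
    bcp.DestinationTableName = "dbo.MainTable";
    bcp.BatchSize = 50000;           // tune this; each batch is sent as its own unit of work
    bcp.BulkCopyTimeout = 0;         // no timeout for a long-running load
    bcp.WriteToServer(sourceReader);
}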
UPDATE
It took me a while to find the link, but this SQLBits talk by Thomas Kejser is also well worth watching - the slides are available if you don't have time to watch the whole thing. It repeats some of the material linked here but also covers a couple of other suggestions for how to deal with high incidences of particular performance counters.
It seems you have done a lot already; however, I am not sure whether you have had a chance to study Alberto Ferrari's SqlBulkCopy Performance Analysis report, which describes several factors that affect SqlBulkCopy performance. I would say a lot of the things discussed in that paper are still worth trying first.
I am not sure why you are not getting 100% utilization on CPU, IO or memory. But if you simply want to improve your bulk load speeds, here is something to consider:
Partition your data file into different files. Or, if the data is coming from different sources, simply create different data files.
Then run multiple bulk inserts simultaneously.
Depending on your situation the above may not be feasible; but if you can do it, I am sure it should improve your load speeds.

DB insert operations are slower when using parallel.foreach than regular foreach

I have a function that loops through my list of objects, makes some checks on them, and saves them to a SQL Server DB.
The number of objects to save can be in the tens of thousands, so I decided to use the parallel framework to make it faster. I use Parallel.ForEach to do that.
This iteration runs in a BackgroundWorker thread.
It works, but I noticed it is slower than a 'normal' foreach. E.g. in my last test a list of my objects is processed in 2:41.35 minutes, while a regular foreach does it in 2:14.92. The difference is nearly half a minute.
Is this common behavior for DB insert operations with the parallel framework? Is it advisable to use Parallel.ForEach for DB insert operations?
Thx!
VS2010/.Net4/c#
If you have a single database connection, trying to save objects in parallel is probably just adding synchronization overhead on the client side (the connection object). See the potential pitfalls of parallelism. Without knowing anything about your particular code, I would guess that a better approach would be to try to parallelize the checking only, and then do the saving (using the single db connection) sequentially. Of course, if your objects are mutable you need to make sure they can't change between checking and saving, but that's still true in the current approach.
BTW, are you saving all of these objects to the db in a single transaction or do you just have autocommit on? It depends on what kind of integrity constraints apply to your insertions, of course, but 2-3 minutes seems like a long time even for tens of thousands of rows. Using transactions wisely, if you aren't currently, is likely to get you a significant increase in performance.
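A sketch of that split - run the checks in parallel, then save sequentially over one connection inside a single transaction (CheckItem, SaveItem and the item collection are made-up placeholders):

// using System.Linq; using System.Data.SqlClient;
var toSave = myItems.AsParallel()
                    .Where(item => CheckItem(item))   // CPU-bound checks run in parallel
                    .ToList();

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    {
        foreach (var item in toSave)
            SaveItem(conn, tx, item);                 // one INSERT per item on the same connection/transaction
        tx.Commit();                                  // one commit instead of one per row
    }
}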
Edited to add: Using SQLite on my desktop machine, I can insert 100 1-field rows in 5 seconds using autocommit. I can insert 10,000 1-field rows in 13 seconds if they're in a single transaction.

How many rows can be executed at one time without getting a timeout?

I work on SQL Server 2005.
I have almost 350,000 insert scripts. Each insert script has 10 columns to be inserted.
So how many rows should I select to be executed at one click - one "Execute" click?
Please tell me an average number for an average system configuration:
Win XP
Core 2 Duo
3.66 GB RAM
OK, let's get some things straight here:
Win XP, Core 2 Duo, 3.66 GB RAM
Not average, but outdated. On top of that, it completely omits the most important number for a DB, which is the speed and number of disks.
I work on SQL Server 2005. I have almost 350,000 insert scripts.
I seriously doubt you have 350,000 insert SCRIPTS. That would be 350,000 FILES containing insert commands. That is a lot of files.
Each insert script has 10 columns to be inserted.
I order a pizza. How much fuel does my car require per km? Same relation. 10 columns is nice, but you don't say how many insert commands your scripts contain.
So, in the end the only SENSIBLE interpretation is that you have to insert 350,000 rows and want to do it from a program (i.e. there are no scripts to start with), but that is pretty much NOT what you say.
So how many rows should I select to be executed at one click?
How many pizzas should I order with one telephone call? The click is irrelevant here. It would also not get any faster if you used a command-line program to do the inserts.
The question is how to get the inserts into the DB fastest.
For normal SQL:
Batch the inserts - say 50 or 100 into one statement (yes, you can put more than one insert into one command); a sketch follows below.
Submit them interleaved and async, preparing the next statement while the previous one executes.
This is very flexible, as you can use real SQL etc.
For real mass inserts:
Forget the idea of writing insert statements. Prepare the data properly as per the table structure and use SqlBulkCopy to mass insert it.
Less flexible - but a LOT faster.
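A minimal sketch of the first approach, batching many INSERT statements into one command (table and column names are made up; parameterize or escape real data properly):

// using System.Text; using System.Data.SqlClient;
var sql = new StringBuilder();
int n = 0;
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    foreach (var row in rows)
    {
        sql.AppendFormat("INSERT INTO MyTable (Col1, Col2) VALUES ({0}, '{1}');", row.Id, row.Name);
        if (++n % 100 == 0)                           // roughly 100 statements per round trip
        {
            using (var cmd = new SqlCommand(sql.ToString(), conn)) cmd.ExecuteNonQuery();
            sql.Clear();
        }
    }
    if (sql.Length > 0)
        using (var cmd = new SqlCommand(sql.ToString(), conn)) cmd.ExecuteNonQuery();
}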
The latter approach on my SMALL (!) database machine would handle this in about 3-5 seconds when the fields are small (a field can be a 2 GB binary data thing, you know). I handle about 80,000 row inserts per second without a lot of optimization, but I have small and slightly fewer fields. That is with 4 processor cores (irrelevant, they never get busy), 8 GB RAM (VERY small for a database server, also irrelevant in this context), and 6 VelociRaptors for the data in a RAID 10 (again, a small configuration for a database, but very relevant). I get peak inserts in the 150 MB per second range in Activity Monitor. There is still a lot of optimization I could do here, as I currently open/close a DB connection every 20,000 items... bad batching.
But then, you don't seem to have a database system at all, just a database installed on a low-end workstation, and this means your IO is going to be REALLY slow compared to database servers - and insert/update speed is IO bound. Desktop disks suck, and you have data AND logs on the same disks.
But... in the end you don't really tell us anything about your problem.
And... the timeout CAN be set programmatically, on the command object (SqlCommand.CommandTimeout).
I'm pretty sure the timeout can also be set by going to Server Properties -> Connections -> Remote query timeout. If you set this sufficiently high (or to 0, which should mean it never times out) then you can run as many scripts as you like.
Obviously this is only OK if the database is not yet live and you simply need to populate it. If the data is coming from another MS SQL Server, however, you might just want to take a full backup and restore it - this will be both simpler and quicker.
This may be of help.
The general rule of thumb is not to exceed 0.1 seconds per UI operation for excellent perceived performance. You are going to need to benchmark to find out how many rows fit within that budget.

SQL Server & update (or insert) parallelism

I've got a large conversion job - 299 GB of JPEG images, already in the database, to be converted into thumbnail equivalents for reporting and bandwidth purposes.
I've written a thread-safe SQLCLR function to do the business of re-sampling the images - lovely job.
The problem is, when I execute it in an UPDATE statement (from the PhotoData field to the ThumbData field), it executes linearly to prevent race conditions, using only one processor to resample the images.
So, how would I best utilise the 12 cores and phat RAID setup this database machine has? Is it to use a subquery in the FROM clause of the update statement? Is that all that is required to enable parallelism on this kind of operation?
Anyway, the operation is split into batches of around 4,000 images per batch (in a windowed query of about 391k images), and this machine has plenty of resources to burn.
Please check the Maximum Degree of Parallelism (MAXDOP) configuration setting on your SQL Server; you can change the value of MAXDOP if needed.
This link might be useful to you http://www.mssqltips.com/tip.asp?tip=1047
cheers
Could you not split the query into batches, and execute each batch separately on a separate connection? SQL Server only uses parallelism in a query when it feels like it, and although you can stop it, or even encourage it (a little) by changing the cost threshold for parallelism option to 0, I think it's pretty hit and miss.
One thing that's worth noting is that it will only decide whether or not to use parallelism at the time the query is compiled. Also, if the query is compiled at a time when the CPU load is high, SQL Server is less likely to consider parallelism.
I too recommend the "round-robin" methodology advocated by kragen2uk and onupdatecascade (I'm voting them up). I know I've read something irritating about CLR routines and SQL parallelism, but I forget what it was just now... but I think they don't play well together.
What I've done in the past on similar tasks is to set up a table listing each batch of work to be done. Each connection you fire up goes to this table, gets the next batch, marks it as being processed, processes it, marks it as done, and repeats. This allows you to gauge performance, manage scaling, allow stops and restarts without having to start over, and gives you something that shows how complete the task is (let alone shows that it's actually doing anything).
Find some criteria to break the set into distinct sub-sets of rows (1-100, 101-200, whatever) and then call your update statement from multiple connections at the same time, where each connection handles one subset of rows in the table. All the connections should run in parallel.
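A sketch of that approach - each connection runs the same UPDATE over its own range of rows, so the CLR resampling runs on several batches at once (table, column and function names below are made up):

// using System; using System.Data.SqlClient; using System.Threading.Tasks;
var ranges = new[] { Tuple.Create(1, 100000), Tuple.Create(100001, 200000) /* ... */ };

Parallel.ForEach(ranges, range =>
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var cmd = new SqlCommand(
            "UPDATE Photos SET ThumbData = dbo.ResampleImage(PhotoData) " +
            "WHERE PhotoId BETWEEN @lo AND @hi", conn))
        {
            cmd.CommandTimeout = 0;                        // these batches can run for a long time
            cmd.Parameters.AddWithValue("@lo", range.Item1);
            cmd.Parameters.AddWithValue("@hi", range.Item2);
            cmd.ExecuteNonQuery();
        }
    }
});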
