Speed of Insert - sql-server

This is a newbie question.
I have two tables in a SQL Server database. Both have simply a dozen columns of string, int, or date type, no indexes, no stored procedures. With a select * from statement I get ~30,000 rows per second, but with insert into ... I only get < 1,000 inserts per second.
Is this factor what I should expect? (I actually expected comparable speed on the insert side.)

Insert speed varies wildly with the method of inserting. There is a disk component and a CPU component. From the low insert speed I guess that you are inserting rows one by one, each in its own transaction. That pretty much maximizes CPU and disk usage: every insert becomes a write to disk.
Make yourself familiar with the efficient ways of inserting. There are plenty, with varying degrees of performance and of development time required to program them.
To get you started with something simple: enclose many (100+) inserts in one transaction and insert in batches, as in the sketch below.
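A minimal T-SQL sketch of that idea (table and column names are placeholders, not taken from the question):
BEGIN TRANSACTION;
-- many single-row inserts share one commit, so the log is flushed once per batch
INSERT INTO dbo.MyTable (Col1, Col2) VALUES (1, 'a');
INSERT INTO dbo.MyTable (Col1, Col2) VALUES (2, 'b');
-- ... a few hundred more ...
COMMIT TRANSACTION;
-- or, with even fewer round trips, one statement inserting many rows:
INSERT INTO dbo.MyTable (Col1, Col2)
VALUES (3, 'c'), (4, 'd'), (5, 'e');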

Related

Is the number of result sets limited in SQL Server?

Is the number of result sets that a stored procedure can return limited in SQL Server? Or is there any other component between the server and a .NET client using sqlncli11 that limits it? I'm thinking of really large numbers, like 100,000 result sets.
I couldn't find a specific answer to this in the Microsoft docs or here on SO.
My use case:
A stored procedure that iterates over a cursor and produces around 100 rows per iteration. I could collect all the rows in a temp table first, but since this is a long-running operation I want the client to start processing the results sooner. Also, the temp table can get quite large, and the execution plan shows 98% of the cost on the INSERT INTO part.
I'm thinking of really large numbers like 100000 result sets.
Ah, I hope you have a LOT of time.
100k result sets means 100k SELECT statements.
Just switching from one result set to the next will, all together, take a long time: at 1 ms per switch, that is 100 seconds.
Is the number of result sets that a stored procedure can return limited in SQL Server?
Not to my knowledge. Remember, the result sets are not part of any real metadata - there is a stream of data, an end marker, then the next stream. The number of result sets a procedure returns is not defined (that is, it can vary).
Also the temp table can get quite large
I have seen temp tables with hundreds of GB.
and the execution plan shows 98% cost on the INSERT INTO part.
That basically indicates that there is otherwise not a lot happening. Note that unless you are actually optimizing, the relative cost is not relevant; the absolute cost is.
Have you considered a middle ground? Collect the rows from, say, 100 iterations and return them as one result set (see the sketch below); that cuts the number of result sets by a factor of 100.
But yes, staging into a temp table has a lot of overhead. It also means you cannot start returning data BEFORE all processing is finished, which can be a bummer. Your approach allows processing to start while the SP is still working on more data.
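A hedged T-SQL sketch of that middle ground (cursor, table, and column names are made up for illustration):
DECLARE @batch TABLE (Id INT, Payload NVARCHAR(100));
DECLARE @i INT = 0, @Id INT, @Payload NVARCHAR(100);
DECLARE cur CURSOR FAST_FORWARD FOR SELECT Id, Payload FROM dbo.SourceTable;
OPEN cur;
FETCH NEXT FROM cur INTO @Id, @Payload;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- in the real procedure this would be the ~100 rows produced per iteration
    INSERT INTO @batch (Id, Payload) VALUES (@Id, @Payload);
    SET @i += 1;
    IF @i % 100 = 0
    BEGIN
        SELECT Id, Payload FROM @batch;   -- return one result set per 100 iterations
        DELETE FROM @batch;
    END;
    FETCH NEXT FROM cur INTO @Id, @Payload;
END;
CLOSE cur;
DEALLOCATE cur;
IF EXISTS (SELECT 1 FROM @batch) SELECT Id, Payload FROM @batch;   -- return the remainder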

What is the normal insert time in a medium size database on an indexed string column?

I have a sqlite database with only one table (around 50,000 rows) and I recurrently perform update-otherwise-insert operations on it using Java and sqlitejdbc (i.e. I try to update rows if they exist and insert new rows otherwise). My table is similar to a word frequency table with "word" and "frequency" columns, and without a primary key!
The problem is that I perform this update-otherwise-insert operation hundreds of thousands of times, and on average each insert or update takes more than 2 ms. Sometimes the insert operations even take around 20 ms. I should also mention that the table has an index on the column I use in the "where" clause of my insert operations (the "word" column), which naturally makes the insert operation more expensive.
Firstly, I want to make sure that 2 ms per insert on an indexed table with 50,000 rows is normal and that there isn't anything I've missed; after that, any suggestion to improve the performance is more than welcome. It struck me that dropping the index before performing large batches of insert operations and recreating it afterwards is good practice, but I can't do that here because I need to check whether a row with the same word already exists.
I know all the stuff about "it depends on the hardware" and "it depends on the rest of your code" etc., but I really think one CAN have an idea of how long an insert operation should take on an average PC.
I partially solved my problem. For anyone interested in an answer to this, this link will be helpful. In short, turning off journal mode in SQLite ("pragma journal_mode=OFF") improves insert performance significantly (almost four times the previous speed in my case), at the cost of making the code prone to data loss in case of an unexpected shutdown.
As for the normal insert speed, it is way faster than 2 ms per operation. It can reach hundreds of thousands of insert operations per second with the right pragma settings, proper use of transactions, and so on, as in the sketch below.
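A minimal SQLite sketch of those ideas, assuming a word/frequency table like the one described (word_freq is a made-up name; this trades durability for speed):
-- fast, but data may be lost or the file corrupted on power failure
PRAGMA journal_mode = OFF;      -- WAL is a safer compromise
PRAGMA synchronous = OFF;
BEGIN TRANSACTION;
-- update-otherwise-insert without a primary key:
UPDATE word_freq SET frequency = frequency + 1 WHERE word = 'example';
INSERT INTO word_freq (word, frequency)
SELECT 'example', 1
WHERE changes() = 0;            -- insert only if the UPDATE touched no row
-- ... repeat for many words inside the same transaction ...
COMMIT;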

SQLite vs serializing to disk

I'm doing a performance comparison to decide whether to serialize data or store it in a DB. The application receives a hell of a lot of data (x GB) that needs to be persisted at a minimum rate of 18 MB/s (for now).
Storing in a DB offers easier searching and later access to the data, snapshots, data migration and so on, but my tests so far show a huge difference in performance.
The test saves 1,000 objects (of about 700-something KB each), either to their respective columns in a table or to disk by serializing them as a generic List. (The SQLite version ends up with a bit more data.)
Saving to SQLite v3, total size 745 MB: 30.7 seconds (~24.3 MB/s)
Serializing to disk, total size 741 MB: 0.33 seconds (~2245 MB/s)
I haven't done any performance tweaks to SQLite, just used it out of the box with Fluent nHibernate and the SQLite.Data adapter (no transaction), but at first glance that is a huge difference.
Obviously I know that going through an ORM mapper and a DB to write to disk adds overhead compared to serializing, but that was a lot.
Another consideration is that I need to persist the data right away as I receive it: if there is a power failure, I need the last data received.
Any thoughts?
----- Updates (as I continue to investigate solutions) ------
Wrapping the 1,000 inserts in a transaction, the time is now ~14 s = 53 MB/s; however, if I throw an exception halfway through, I lose all my data.
Using an IStatelessSession seems to improve the time by 0.5-1 s.
I didn't see any performance gain from assigning the ID to the entity instead of having it automatically assigned in the table, thereby getting rid of (select row_generatedid()) for every insert SQL. -> Id(x => x.Id).GeneratedBy.Assigned();
The nosync() alternative in SQLite is not an option, as the DB might be corrupted in case of a power failure.
I had a similar problem once and I suggest you go the SQLite route.
As for your performance issues, I'm pretty sure you'll get a very significant boost if you:
execute all INSERTs in a single transaction - write queries must acquire (and release) a lock on the SQLite file, which is very expensive in terms of disk I/O, and you should notice a huge boost***
consider using multi-INSERTs (this probably won't work for you since you rely on an ORM); see the sketch below
as #user896756 mentioned you should also prepare your statements
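For illustration only, a multi-row INSERT (supported by SQLite since 3.7.11; plain SQL, outside the ORM, reusing the t1 table from the benchmark below) looks like this:
BEGIN TRANSACTION;
INSERT INTO t1 (a, b, c) VALUES
    (1, 13153, 'thirteen thousand one hundred fifty three'),
    (2, 75560, 'seventy five thousand five hundred sixty'),
    (3, 94142, 'ninety four thousand one hundred forty two');
COMMIT;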
Test 1: 1000 INSERTs
CREATE TABLE t1(a INTEGER, b INTEGER, c VARCHAR(100));
INSERT INTO t1 VALUES(1,13153,'thirteen thousand one hundred fifty three');
INSERT INTO t1 VALUES(2,75560,'seventy five thousand five hundred sixty');
... 995 lines omitted
INSERT INTO t1 VALUES(998,66289,'sixty six thousand two hundred eighty nine');
INSERT INTO t1 VALUES(999,24322,'twenty four thousand three hundred twenty two');
INSERT INTO t1 VALUES(1000,94142,'ninety four thousand one hundred forty two');
PostgreSQL: 4.373
MySQL: 0.114
SQLite 2.7.6: 13.061
SQLite 2.7.6 (nosync): 0.223
Test 2: 25000 INSERTs in a transaction
BEGIN;
CREATE TABLE t2(a INTEGER, b INTEGER, c VARCHAR(100));
INSERT INTO t2 VALUES(1,59672,'fifty nine thousand six hundred seventy two');
... 24997 lines omitted
INSERT INTO t2 VALUES(24999,89569,'eighty nine thousand five hundred sixty nine');
INSERT INTO t2 VALUES(25000,94666,'ninety four thousand six hundred sixty six');
COMMIT;
PostgreSQL: 4.900
MySQL: 2.184
SQLite 2.7.6: 0.914
SQLite 2.7.6 (nosync): 0.757
*** These benchmarks are for SQLite 2; SQLite 3 should be even faster.
You should consider using compiled (prepared) statements for SQLite.
Check this
On insert/update queries there is a huge performance boost; I managed to get 2x to 10x faster execution times using compiled statements, although going from 33 seconds to 0.3 seconds is a long way.
On the other hand, SQLite execution speed depends on the schema of the table you are using; for example, an index on large data will make inserts slow.
After investigating further, the answer lies in a bit of confusion over the initial results.
While testing with larger amounts of data I got different results.
The disk transfer rate is limited to 126 MB/s by the manufacturer, so how could I write 750 MB in a split second? I'm not sure why. But when I increased the amount of data, the transfer rate quickly dropped to ~136 MB/s.
As for the database, using a transaction I got speeds up to 90 MB/s with the IStatelessSession and large amounts of data (5-10 GB). This is good enough for our purposes, and I'm sure it can still be tweaked further with compiled SQL statements and other techniques if needed.

Sql Server 2008 R2 DC Inserts Performance Change

I have noticed an interesting performance change that happens at around 1.5 million inserted values. Can someone give me a good explanation why this is happening?
The table is very simple. It consists of (bigint, bigint, bigint, bit, varbinary(max)).
I have a clustered primary key on the first three bigints. I insert only the boolean "true" as the varbinary(max) data.
From that point on, performance seems pretty constant.
[Graph omitted. Legend: Y = time in ms, X = inserts x 10K]
I am also curious about the constant, relatively small (sometimes very large) spikes on the graph.
[Actual execution plan from before the spikes omitted.]
Legend (the table I am inserting into is TSMDataTable):
1. BigInt DataNodeID - fk
2. BigInt TS - main timestamp
3. BigInt CTS - modification timestamp
4. Bit ICT - keeps a record of the last inserted value (increases read performance)
5. VarBinary(max) Data - the data itself (currently just the boolean "true")
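For reference, a hedged reconstruction of the table from the legend above (exact types, nullability and constraint names are assumptions):
CREATE TABLE dbo.TSMDataTable
(
    DataNodeID BIGINT NOT NULL,          -- fk
    TS         BIGINT NOT NULL,          -- main timestamp
    CTS        BIGINT NOT NULL,          -- modification timestamp
    ICT        BIT    NOT NULL,          -- marks the last inserted value
    Data       VARBINARY(MAX) NULL,
    CONSTRAINT PK_TSMDataTable PRIMARY KEY CLUSTERED (DataNodeID, TS, CTS)
);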
Environment
It is local.
It is not sharing any resources.
It is a fixed-size database (large enough so it does not expand).
(Computer: 4 cores, 8 GB RAM, 7200 rpm disk, Win 7)
(SQL Server 2008 R2 DC, processor affinity to cores 1 and 2, 3 GB memory)
Have you checked the execution plan once the time goes up? The plan may change depending on statistics. Since your data grows fast, the statistics will change, and that may trigger a different execution plan.
Nested loops are good for small amounts of data, but as you can see, the time grows with volume. The SQL query optimizer then probably switches to a hash or merge plan which is consistent for large volumes of data.
To confirm this theory quickly, try disabling automatic statistics updates and run your test again. You should not see the "bump" then. A sketch of how to do that follows.
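A hedged sketch (the database name is a placeholder; the table name is taken from the question's legend):
-- turn automatic statistics updates off for the whole database ...
ALTER DATABASE MyDb SET AUTO_UPDATE_STATISTICS OFF;
-- ... or only for the table in question
EXEC sp_autostats 'dbo.TSMDataTable', 'OFF';
-- remember to turn it back on after the test
ALTER DATABASE MyDb SET AUTO_UPDATE_STATISTICS ON;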
EDIT: Since Falcon confirmed that performance changed due to statistics we can work out the next steps.
I guess you do one-by-one inserts, correct? In that case (if you cannot insert in bulk) you'll be much better off inserting into a heap work table and then, at regular intervals, moving the rows in bulk into the target table. This is because for each inserted row SQL Server has to check for key duplicates and foreign keys, run other checks, and sort and split pages all the time. If you can afford to postpone these checks until a little later, you'll get superb insert performance, I think.
I used this method for metrics logging. Logging goes into a plain heap table with no indexes, no foreign keys, no checks. Every ten minutes I create a new table of this kind, and then with two "sp_rename"s within a transaction (a swift swap) I make the full table available for processing while the new table takes over the logging. Then you have the comfort of doing all the checking, sorting and splitting only once, in bulk. A sketch of the swap follows.
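A hedged T-SQL sketch of that swap (the staging table name and its column list are made up; the real columns would have to match the target):
-- a fresh, empty heap to take over the logging
CREATE TABLE dbo.MetricsLog_New (DataNodeID BIGINT, TS BIGINT, CTS BIGINT, ICT BIT, Data VARBINARY(MAX));
BEGIN TRANSACTION;
    EXEC sp_rename 'dbo.MetricsLog', 'MetricsLog_Full';   -- the full table goes off for processing
    EXEC sp_rename 'dbo.MetricsLog_New', 'MetricsLog';    -- the empty heap starts receiving inserts
COMMIT TRANSACTION;
-- later, move the staged rows into the indexed target in one bulk statement
INSERT INTO dbo.TSMDataTable (DataNodeID, TS, CTS, ICT, Data)
SELECT DataNodeID, TS, CTS, ICT, Data
FROM dbo.MetricsLog_Full;
DROP TABLE dbo.MetricsLog_Full;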
Apart from this, I'm not sure how to improve your situation. You certainly need to update statistics regularly, as that is key to good performance in general.
You might try using a single-column identity clustered key and an additional unique index on those three columns, but I'm doubtful it would help much.
You might try padding the indexes if your inserted data is not sequential (see the sketch after these suggestions). That would eliminate excessive page splitting, shuffling and fragmentation. You'll need to maintain the padding regularly, which may require an off-time window.
You might try a hardware upgrade. You'll need to figure out which component is the bottleneck. It may be the CPU or the disk - my favourite in this case. Memory is not likely, imho, if you do one-by-one inserts. It should be easy then: if it's not the CPU (the line hanging at the top of the graph), then it's most likely your IO holding you back. Try a better controller, better caching and a faster disk...
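A minimal sketch of the padding idea (the fill factor value is just an example; the table name is from the legend above):
-- rebuild all indexes on the target, leaving free space on each page
ALTER INDEX ALL ON dbo.TSMDataTable
REBUILD WITH (FILLFACTOR = 80, PAD_INDEX = ON);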

When does MS-SQL maintain table indexes?

For argument's sake, let's say it's SQL Server 2005/2008. I understand that when you place indexes on a table to tune SELECT statements, these indexes need maintaining during INSERT / UPDATE / DELETE actions.
My main question is this:
When will SQL Server maintain a table's indexes?
I have many subsequent questions:
I naively assume that it will do so after a command has executed. Say you are inserting 20 rows: it will maintain the index after the 20 rows have been inserted and committed.
What happens in the situation where a script features multiple statements against a table, but they are otherwise distinct statements?
Does the server have the intelligence to maintain the index after all statements are executed, or does it do it per statement?
I've seen situations where indexes are dropped and recreated after large / many INSERT / UPDATE actions.
This presumably incurs rebuilding the entire table's indexes even if you only change a handful of rows?
Would there be a performance benefit in attempting to collate INSERT and UPDATE actions into a larger batch, say by collecting rows to insert in a temporary table, as opposed to doing many smaller inserts?
How would collating the rows above stack up against dropping an index versus taking the maintenance hit?
Sorry for the proliferation of questions - it's something I've always known to be mindful of, but when trying to tune a script to get a balance, I find I don't actually know when index maintenance occurs.
Edit: I understand that performance questions largely depend on the amount of data during the insert/update and on the number of indexes. Again, for argument's sake, I'd have two situations:
An index-heavy table tuned for selects.
An index-light table (PK only).
Both situations would have a large insert/update batch, say, 10k+ rows.
Edit 2: I'm aware of being able to profile a given script on a data set. However, profiling doesn't tell me why a given approach is faster than another. I am more interested in the theory behind the indexes and where performance issues stem, not a definitive "this is faster than that" answer.
Thanks.
When your statement (not even the transaction) is completed, all your indexes are up to date. When you commit, all the changes become permanent and all locks are released. Doing otherwise would not be "intelligence"; it would violate integrity and possibly cause errors.
Edit: by "integrity" I mean this: once committed, the data should be immediately available to anyone. If the indexes are not up-to-date at that moment, someone may get incorrect results.
As you increase the batch size, performance initially improves, then it slows down. You need to run your own benchmarks and find your optimal batch size. Similarly, you need to benchmark to determine whether it is faster to drop and recreate the indexes or not.
Edit: if you insert/update/delete batches of rows in one statement, your indexes are modified once per statement. The following script demonstrates that:
CREATE TABLE dbo.Num(n INT NOT NULL PRIMARY KEY);
GO
INSERT INTO dbo.Num(n)
SELECT 0
UNION ALL
SELECT 1;
GO
-- 0 updates to 1, 1 updates to 0
UPDATE dbo.Num SET n = 1-n;
GO
-- doing it row by row would fail no matter how you do it
UPDATE dbo.Num SET n = 1-n WHERE n=0;
UPDATE dbo.Num SET n = 1-n WHERE n=1;
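-- either order fails with a primary key violation: the first single-row UPDATE
-- produces a value that duplicates the key still held by the other row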
