I want to make almost 1000 inserts per second into a table, and once a day I want to query all of the inserted rows in a single pass. I also want to improve efficiency with multi-threading and connection pooling, but I'd like to know which level of concurrency control is most suitable for me. The list of options for SQL Server is on the MSDN site.
Thank you.
You should be OK with the default isolation level for inserts. Do you have a clustered index? If so, ensure that it doesn't fragment as you insert new rows; typically a GUID would be a bad candidate for a clustered index. Also, if you have Enterprise Edition and you can identify a partitioning column in your table (for example region or city), you might want to partition the table on that column and store the partitions on different filegroups. This way you might avoid IO contention.
If you select all the data once a day and you would like to maintain insert speed during the select without too much locking, you might consider creating a database snapshot (again Enterprise Edition) and selecting from it. If you can live with dirty reads, you might add a WITH (NOLOCK) hint to your select.
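A rough sketch of both approaches, assuming a database named SalesDb with a logical data file named SalesDb_Data and a table dbo.DailyInserts (all of these names are placeholders):

CREATE DATABASE SalesDb_Snapshot
ON (NAME = SalesDb_Data, FILENAME = 'D:\Snapshots\SalesDb_Snapshot.ss')
AS SNAPSHOT OF SalesDb;

-- Run the once-a-day read against the static snapshot instead of the live database
SELECT * FROM SalesDb_Snapshot.dbo.DailyInserts;

-- Or, if dirty reads are acceptable, read the live table with NOLOCK
SELECT * FROM SalesDb.dbo.DailyInserts WITH (NOLOCK);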
You might be barking up the wrong tree. Have a look into using row-versioning transaction isolation instead of supplying lock hints for individual statements.
A lot of people I talk to have had good results through the use of READ COMMITTED SNAPSHOT - which can be enabled at the database level and requires no code change.
I can say that SNAPSHOT has served me well in the past, but it does require code change.
And a word of warning, be sure that your tempdb throughput is good, as row-versioning increases the load on tempdb significantly.
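For reference, this is roughly what enabling each option looks like (the database name is a placeholder, and switching READ_COMMITTED_SNAPSHOT on needs a moment with no other active connections):

-- Database-level switch: existing READ COMMITTED code starts using row versions, no code change
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;

-- Full SNAPSHOT isolation: enable it, then each session/transaction must opt in
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;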
I have an Audit table in a SQL Server database with the following columns:
Sequence --- bigint (primary key)
TableName --- varchar(50) (nonclustered index)
ColumnName --- varchar(50) (nonclustered index)
Control --- char(10) (nonclustered index)
BeforeValue --- varchar(500) (nonclustered index)
AfterValue --- varchar(500)
DateChanged --- datetime
ChangedBy --- char(20)
CompanyCode --- char(5) (nonclustered index)
It has 5 billion+ rows of data. Around 200+ triggers insert data into this table, and around 50+ stored procedures insert into as well as query this table. Whenever a column is updated or deleted in any of the 200+ tables in the transactional database, a corresponding row is inserted into the Audit table.
I inherited this table recently. We have been experiencing performance issues lately and I am told to redesign this Audit table to address the associated performance problems.
I am looking for suggestions, next steps, and performance metrics ideas; any help will be appreciated.
Thanks in advance.
I think you do not need to change much, but just need to redesign your process as follows:
Create an archive table exactly the same as your current audit table.
Transfer all of your current audit table's rows into this new archive table, so that your current audit table is empty.
Schedule a daily (or weekly) job to move data from your audit table to the archive table (a sketch of such a job follows this list).
Based on your data retention policy, clean up your archive table.
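A hedged sketch of such a job, assuming the archive table is named dbo.Audit_Archive and the live table keeps 30 days of data (both the name and the retention window are placeholders):

-- Move rows older than the cutoff from the live audit table to the archive.
-- For a table this size, run the move in batches to keep the transaction log small.
DECLARE @Cutoff datetime = DATEADD(DAY, -30, GETDATE());

-- If Sequence is an IDENTITY column, use an explicit column list with SET IDENTITY_INSERT.
INSERT INTO dbo.Audit_Archive
SELECT * FROM dbo.Audit WHERE DateChanged < @Cutoff;

DELETE FROM dbo.Audit WHERE DateChanged < @Cutoff;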
As for the performance issue, you have to verify whether inserting into your audit table really is the culprit. If it is, then the approach above may relieve your pain; otherwise, it may not help.
I had to do something similar. If you must maintain the 5 billion records, then your best solution is to partition the table. You will need to do the following:
Create a partition function
Create a partition scheme
Create the filegroups/files that will hold the partitions
The partition function governs how the data is partitioned: typically either by row count (i.e., ranges of the Sequence value) or by date (monthly, quarterly, yearly, etc.). Note: table partitioning is only available in SQL Server 201X Enterprise.
https://learn.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes
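As a hedged sketch only (the function, scheme, and index names are made up, the boundary values are examples, and step 3 assumes the existing clustered primary key has already been dropped or redefined):

-- 1. Partition function: yearly ranges on DateChanged
CREATE PARTITION FUNCTION pfAuditByYear (datetime)
AS RANGE RIGHT FOR VALUES ('20150101', '20160101', '20170101');

-- 2. Partition scheme: map every partition to a filegroup (ALL TO [PRIMARY] keeps the example simple)
CREATE PARTITION SCHEME psAuditByYear
AS PARTITION pfAuditByYear ALL TO ([PRIMARY]);

-- 3. Rebuild the clustered index on the scheme so the data is physically partitioned
CREATE CLUSTERED INDEX CIX_Audit_DateChanged
ON dbo.Audit (DateChanged, Sequence)
ON psAuditByYear (DateChanged);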
I highly recommend that you first read Microsoft's white paper on this before doing anything. Also, once you implement it, this should be done over the weekend to allow for processing as I imagine it will take some time to complete.
https://technet.microsoft.com/en-us/library/dd578580(v=sql.100).aspx
For comparison's sake, a query would not complete after 10 minutes of run time in the pre-partitioned state; after partitioning the table, the same query completed within 10 seconds. Note: once the table is partitioned, you will need to tune your queries to include the partitioning column in the predicate. Otherwise, you probably won't notice much of a difference in response times.
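For example, assuming the table ends up partitioned on DateChanged as sketched above, keeping that column in the predicate lets the optimizer eliminate the partitions it does not need:

SELECT Sequence, TableName, ColumnName, BeforeValue, AfterValue
FROM dbo.Audit
WHERE DateChanged >= '20170101' AND DateChanged < '20170201'  -- partitioning column in the predicate
  AND TableName = 'Customer';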
You're not entirely clear on where the performance issues are: are they in reading the table, or writing to it? Typically, if the issue is in reading the table, you'll notice a particular area of the system that is slow because of those reads, while if the issue is with writing to the table, the impact is more subtle but much more widespread, and it tends to slow everything down (to varying degrees) instead of just certain hotspot areas.
Another unclear issue is whether this is primarily write-only or heavily read data. Audit tables are typically write-optimized, to be as fast as possible when adding new data, even if that makes them slow to read from when you do need to (typically we're writing far more often than reading: the opposite of "normal" transactional rows in an RDBMS).
Thus, the first step is to determine where the performance issues are, and gather some clues from there to determine whether you need to be optimizing the read portion or the write portion of the table.
If it is indeed a write performance issue (a bit harder to profile and recognize than read issues), I would look at dropping some of the indexes on the table. For a write-optimized table, there are a lot of indexes here, and each one adds overhead to every write operation (remember that each index must be maintained with every write to the table, so while indexes are great for reading, they take away write performance: especially nonclustered indexes).
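One hedged way to spot candidates for dropping is to compare how often each nonclustered index is read versus written using sys.dm_db_index_usage_stats (the table name here is a guess, and the DMV only reflects activity since the last service restart):

SELECT i.name AS IndexName,
       ISNULL(s.user_seeks + s.user_scans + s.user_lookups, 0) AS Reads,
       ISNULL(s.user_updates, 0) AS Writes
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON  s.object_id = i.object_id
       AND s.index_id = i.index_id
       AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.Audit')
  AND i.type_desc = 'NONCLUSTERED'
ORDER BY Reads;

An index with heavy writes and next to no reads is pure insert overhead and is a strong candidate to drop.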
If it is a read issue, I might still opt to write-optimize anyway, then devise some means of improving read performance as much as possible once the table is write-optimized. Other answers here address that in particular, but without being more familiar with the system requirements it's hard to say which direction is best.
I have a table of a little over 1 billion rows of time-series data with fantastic insert performance but (sometimes) awful select performance.
Table tblTrendDetails (PK is ordered as shown):
PK TrendTime datetime
PK CavityId int
PK TrendValueId int
TrendValue real
The table is continuously pulling in new data and purging old data, so insert and delete performance needs to remain snappy.
When executing a query such as the following, performance is poor (30 sec):
SELECT *
FROM tblTrendDetails
WHERE TrendTime BETWEEN @inMinTime AND @inMaxTime
  AND CavityId = @inCavityId
  AND TrendValueId = @inTrendId
If I execute the same query again (with similar times, but any @inCavityId or @inTrendId), performance is very good (1 sec). Performance counters show that disk access is the culprit the first time the query is run.
Any recommendations regarding how to improve performance without (significantly) adversely affecting the insert or delete performance? Any suggestions (including completely changing the underlying database) are welcome.
The fact that subsequent queries of the same or similar data run much faster is probably due to SQL Server caching your data. That said, is it possible to speed this initial query up?
Verify the query plan:
My guess is that your query should result in an Index Seek rather than an Index Scan (or worse, a Table Scan). Please verify this using SET SHOWPLAN_TEXT ON; or a similar feature. Using BETWEEN and = as your query does should really take advantage of the clustered index, though that's debatable.
Index Fragmentation:
It is possible that your clustered index (the primary key in this case) is quite fragmented after all of those inserts and deletes. I would probably check this with DBCC SHOWCONTIG (tblTrendDetails).
You can defrag the table's indexes with DBCC INDEXDEFRAG (MyDatabase, tblTrendDetails).
This may take some time, but will allow the table to remain accessible, and you can stop the operation without any nasty side-effects.
You might have to go further and use DBCC DBREINDEX (tblTrendDetails). This is an offline operation, though, so you should only do this when the table does not need to be accessed.
There are some differences described here: Microsoft SQL Server 2000 Index Defragmentation Best Practices.
Be aware that your transaction log can grow quite a bit from defragging a large table, and it can take a long time.
Partitioned Views:
If these do not remedy the situation (or fragmentation is not a problem), you may even wish to look at partitioned views, in which you create a number of underlying base tables for various ranges of records, then UNION ALL them in a view (replacing your original table).
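A minimal sketch of a partitioned view over the table from the question, assuming per-year base tables (the names and date ranges are illustrative; the CHECK constraints are what let the optimizer skip the irrelevant tables):

CREATE TABLE dbo.tblTrendDetails_2011 (
    TrendTime datetime NOT NULL CHECK (TrendTime >= '20110101' AND TrendTime < '20120101'),
    CavityId int NOT NULL, TrendValueId int NOT NULL, TrendValue real NULL,
    CONSTRAINT PK_tblTrendDetails_2011 PRIMARY KEY (TrendTime, CavityId, TrendValueId));

CREATE TABLE dbo.tblTrendDetails_2012 (
    TrendTime datetime NOT NULL CHECK (TrendTime >= '20120101' AND TrendTime < '20130101'),
    CavityId int NOT NULL, TrendValueId int NOT NULL, TrendValue real NULL,
    CONSTRAINT PK_tblTrendDetails_2012 PRIMARY KEY (TrendTime, CavityId, TrendValueId));
GO

-- The view replaces the original table name in queries
CREATE VIEW dbo.vwTrendDetails
AS
SELECT TrendTime, CavityId, TrendValueId, TrendValue FROM dbo.tblTrendDetails_2011
UNION ALL
SELECT TrendTime, CavityId, TrendValueId, TrendValue FROM dbo.tblTrendDetails_2012;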
Better Stuff:
If the performance of these selects is a real business need, you may be able to make the case for better hardware: faster drives, more memory, etc. If your drives are twice as fast, then this query will run in half the time, yeah? Also, this may not be workable for you, but I've simply found newer versions of SQL Server to truly be faster, with more options, and easier to maintain. I'm glad to have moved most of my company's data to 2008 R2. But I digress...
Question: I'm wondering if I can apply an index on a table without filling the transaction log up.
Detail: I have a table that is 800GB. A few months ago I went to apply an index on it and the transaction log filled up (for obvious reasons - was dumb of me to even attempt it). Instead I had to create the table over again, apply the index I wanted and then copy the records over.
Now we are going to be setting up partitions on this table. I'm wondering, if I remove the clustered index and apply the new clustered index on a partitioning layout, whether it will still fill up the transaction log (assume I have 10 million rows per partition). Or will the tlog not fill up because the indexes would also be partitioned, which would allow SQL Server to finish each index faster and be able to checkpoint?
Does anybody have any idea? Otherwise I can just re-create the table again, apply the partitioning to it and re-fill it, but obviously that is much more involved.
Thanks!
Applying the partition scheme will result in a huge log, yes.
First, I would recommend you see whether you can achieve a minimally logged operation. Offline CREATE INDEX is eligible for minimal logging, provided that the database is in the bulk-logged recovery model.
If that's not possible (e.g. full recovery is required), I would look at doing the move to the partitioned table online. Online operations work in small batches and commit frequently, so as long as you take frequent log backups, the log will not grow that large.
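A hedged sketch of both routes (the database, table, index, and scheme names are placeholders, and DROP_EXISTING assumes the clustered index already carries that name):

-- Option 1: offline rebuild, minimally logged under the BULK_LOGGED recovery model
ALTER DATABASE BigDb SET RECOVERY BULK_LOGGED;
CREATE CLUSTERED INDEX CIX_BigTable
ON dbo.BigTable (PartitionCol, Id)
WITH (DROP_EXISTING = ON)
ON psBigTable (PartitionCol);
ALTER DATABASE BigDb SET RECOVERY FULL;  -- take a full or log backup afterwards

-- Option 2: online rebuild onto the partition scheme; fully logged, so keep taking log backups
CREATE CLUSTERED INDEX CIX_BigTable
ON dbo.BigTable (PartitionCol, Id)
WITH (DROP_EXISTING = ON, ONLINE = ON)
ON psBigTable (PartitionCol);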
I'm running into a strange problem in Microsoft SQL Server 2008.
I have a large database (20 GB) with about 10 tables, and I'm attempting to make a point regarding how to correctly create indexes.
Here's my problem: on some nested queries I'm getting faster results without using indexes! It's close (one or two seconds), but in some cases using no indexes at all seems to make these queries run faster... I'm running a CHECKPOINT and DBCC DROPCLEANBUFFERS to reset the caches before running the scripts, so I'm kind of lost.
What could be causing this?
I know for a fact that the indexes are poorly constructed (think one index per relevant field); the whole point is to prove the importance of constructing them correctly. But it should never be slower than having no indexes at all, right?
EDIT: here's one of the guilty queries:
SET STATISTICS TIME ON
SET STATISTICS IO ON
USE DBX;
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DBCC FREEPROCCACHE;
GO
SELECT * FROM Identifier where CarId in (SELECT CarID from Car where ManufactId = 14) and DataTypeId = 1
Identifier table:
- IdentifierId int not null
- CarId int not null
- DataTypeId int not null
- Alias nvarchar(300)
Car table:
- CarId int not null
- ManufactId int not null
- (several other fields follow, all nvarchar(100))
Each of these bullet points has an index, along with some composite indexes that cover two of them at a time (e.g. CarId and DataTypeId).
Finally, the Identifier table has over a million entries, while the Car table has two or three million.
My guess would be that SQL Server is incorrectly deciding to use an index, which is then forcing a bookmark lookup*. Usually when this happens (the incorrect use of an index) it's because the statistics on the table are incorrect.
This can especially happen if you've just loaded large amounts of data into one or more of the tables. Or, it could be that SQL Server is just screwing up. It's pretty rare that this happens (I can count on one hand the times I've had to force index use over a 15 year career with SQL Server), but the optimizer is not perfect.
* A bookmark lookup is when SQL Server finds a row that it needs on an index, but then has to go to the actual data pages to retrieve additional columns that are not in the index. If your result set returns a lot of rows this can be costly and clustered index scans can result in better performance.
One way to get rid of bookmark lookups is to use covering indexes - an index which has the filtering columns first, but then also includes any other columns which you would need in the "covered" query. For example:
SELECT
    my_string1,
    my_string2
FROM
    My_Table
WHERE
    my_date > '2000-01-01'
A covering index for this query would be (my_date, my_string1, my_string2).
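On SQL Server 2005 and later you can also keep the key narrow and push the extra columns into the leaf level with INCLUDE; something along these lines (the index name is made up):

CREATE NONCLUSTERED INDEX IX_My_Table_my_date_covering
ON My_Table (my_date)
INCLUDE (my_string1, my_string2);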
Indexes don't really have any benefit until you have many records. I say "many" because I don't really know where that tipping point is... it depends on the specific application and circumstances.
It does take time for SQL Server to work with an index, and if that time exceeds the benefit, the index hurts rather than helps. This is especially true in subqueries, where a small difference gets multiplied.
If it works better without the index, leave out the index.
Try DBCC FREEPROCCACHE to clear the execution plan cache as well.
This is a pure guess. Maybe if you have a lot of indexes, SQL Server is spending time analyzing and picking one, and then rejecting all of them. If you had no indexes, the engine wouldn't have to waste its time on this vetting process.
How long this vetting process actually takes, I have no idea.
For some queries, it is faster to read directly from the table (clustered index scan), than it is to read the index and fetch records from the table (index scan + bookmark lookup).
Consider that a record lives alongside other records in a data page, and the data page is the basic unit of IO. If the table is read directly, you could get 10 records for the cost of 1 IO. If the index is read first, and records are then fetched from the table, you pay 1 IO per record.
Generally SQL server is very good at picking the best way to access a table (direct vs index). There may be something in your query that is blinding the optimizer. Query hints can instruct the optimizer to use an index when it is wrong to do so. Join hints can alter the order or method of access of a table. Table Variables are considered to have 0 records by the optimizer, so if you have a large Table Variable - the optimizer may choose a bad plan.
One more thing to look out for: varchar vs nvarchar. Make sure all parameters are of the same type as the target columns. There are cases where SQL Server will convert the whole index to the parameter's type when there is a type mismatch.
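A hedged illustration with made-up names, assuming AccountCode is declared as varchar:

-- The N'...' literal is nvarchar, so the varchar column side may be converted row by row and scanned
SELECT CustomerId FROM dbo.Customer WHERE AccountCode = N'ABC123';

-- Matching the literal (or parameter) type to the column allows a straight index seek
SELECT CustomerId FROM dbo.Customer WHERE AccountCode = 'ABC123';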
Normally SQL Server does a good job at deciding what index to use if any to retrieve the data in the fastest way. Quite often it will decide not to use any indexes as it can retrieve small amounts of data from small tables quicker without going away to the index (in some situations).
It sounds like in your case SQL may not be taking the most optimum route. Having lots of badly created indexes may be causing it to pick the wrong routes to get to the data.
I would suggest viewing the query plan in management studio to check what indexes its using, and where the time is being taken. This should give you a good idea where to start.
Another note is it maybe that these indexes have gotten fragmented over time and are now not performing to their best, it maybe worth checking this and rebuilding some of them if needed.
Check the execution plan to see if it is using one of these indexes that you "know" to be bad?
Generally, indexing slows down writing data and can help to speed up reading data.
So yes, I agree with you. It should never be slower than having no indexes at all.
SQL Server actually creates some indexes for you (e.g. on the primary key).
Indexes can become fragmented.
Too many indexes will always reduce performance (there are FAQs on why not to index every column in the db).
Also, there are some situations where indexes will always be slower.
run:
SET SHOWPLAN_ALL ON
and then run your query with and without the index; this will let you see which indexes, if any, are being used, where the "work" is going on, etc.
No. SQL Server analyzes both the indexes and the statistics before deciding whether to use an index to speed up a query. It is entirely possible that running a non-indexed version is faster than an indexed version.
A few things to try
ensure the indexes are created and rebuilt, and re-organized (defragmented).
ensure that auto create statistics is turned on (see the sketch after this list).
Try using SQL Profiler to capture a tuning trace and then use the Database Engine Tuning Advisor to create your indexes.
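A quick sketch of those checks against the DBX database and the two tables from the question (a plain REBUILD is an offline operation, so schedule it accordingly):

-- Is automatic statistics creation/updating on?
SELECT is_auto_create_stats_on, is_auto_update_stats_on
FROM sys.databases WHERE name = 'DBX';

ALTER DATABASE DBX SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE DBX SET AUTO_UPDATE_STATISTICS ON;

-- Rebuild or reorganize the indexes on the tables involved
ALTER INDEX ALL ON dbo.Identifier REBUILD;
ALTER INDEX ALL ON dbo.Car REORGANIZE;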
Surprisingly, the MS Press examination book for SQL administration explains indexes and statistics pretty well.
See Chapter 4 table of contents in this amazon reader preview of the book
Amazon Reader of Sql 2008 MCTS Exam Book
To me it sounds like your SQL is written very poorly and thus isn't utilizing the indexes that you are creating.
You can add indexes till you're blue in the face, but if your queries aren't written to use those indexes then you won't get any performance gain.
Give us a sample of the queries you're using.
alright...
Try this and see if you get any performance gains (with the PK indexes):
SELECT i.*
FROM Identifier i
INNER JOIN Car c
    ON i.CarId = c.CarId
WHERE c.ManufactId = 14
  AND i.DataTypeId = 1
I have an app which cycles through a huge number of records in a database table and performs a number of SQL and .NET operations on records within that database (currently I am using Castle.ActiveRecord on PostgreSQL).
I added some basic btree indexes on a couple of the fields and, as you would expect, the performance of the SQL operations increased substantially. Wanting to make the most of DBMS performance, I want to make some better-educated choices about what I should index on all my projects.
I understand that there is a detriment to performance when doing inserts (as the database needs to update the index as well as the data), but what suggestions and best practices should I consider when creating database indexes? How do I best select the fields/combinations of fields for a set of database indexes (rules of thumb)?
Also, how do I best select which index to use as a clustered index? And when it comes to the access method, under what conditions should I use a btree over a hash, a GiST, or a GIN index (and what are they, anyway)?
Some of my rules of thumb:
Index ALL primary keys (I think most RDBMS do this when the table is created).
Index ALL foreign key columns.
Create more indexes ONLY if:
Queries are slow.
You know the data volume is going to increase significantly.
Run statistics when populating a lot of data in tables.
If a query is slow, look at the execution plan and:
If the query for a table only uses a few columns, put all those columns into an index, then you can help the RDBMS to only use the index.
Don't waste resources indexing tiny tables (hundreds of records).
Index multiple columns in order from higher cardinality to lower: that is, put the columns with more distinct values first, followed by the columns with fewer distinct values (see the example after this list).
If a query needs to access more than 10% of the data, a full scan is normally better than an index.
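A hedged example of the ordering rule above, with made-up table and column names, where customer_id has far more distinct values than status:

-- High-cardinality column first, low-cardinality column second
CREATE INDEX ix_orders_customer_status ON orders (customer_id, status);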
Here's a slightly simplistic overview: it's certainly true that there is an overhead to data modifications due to the presence of indexes, but you ought to consider the relative number of reads and writes to the data. In general the number of reads is far higher than the number of writes, and you should take that into account when defining an indexing strategy.
When it comes to which columns to index, I've always felt that the designer ought to know the business well enough to take a very good first pass at which columns are likely to benefit. Beyond that it really comes down to feedback from the programmers, full-scale testing, and system monitoring (preferably with extensive internal metrics on performance to capture long-running operations).
As @David Aldridge mentioned, the majority of databases perform many more reads than writes, and in addition, appropriate indexes will often be utilised even when performing INSERTs (to determine the correct place to INSERT).
The critical indexes under an unknown production workload are often hard to guess/estimate, and a set of indexes should not be viewed as set once and forget. Indexes should be monitored and altered with changing workloads (that new killer report, for instance).
Nothing beats profiling; if you guess your indexes, you will often miss the really important ones.
As a general rule, if I have little idea how the database will be queried, I will create indexes on all Foreign Keys, profile under a workload (think UAT release), remove those that are not being used, and create any important missing indexes.
Also, make sure a scheduled index maintenance plan is created.