I know that indexes hurt insert/update performance, but I'm trying to troubleshoot and determine the right balance between query performance and insert/update performance.
We've created a number of views (about 20) for some very complicated queries. They're really slow when seeking by key (it can take 20 seconds to scan for 5 to 10 keys).
Indexing these views (with both clustered and non-clustered indexes on the various key columns) speeds up their performance in the area of 80x to 100x. It also hurts insert/update performance to the point that a script which inserts about 100 rows into various related tables takes about 45 seconds to run instead of being instantaneous.
I'd prefer not to go the OLAP route for these views (it would add a whole new layer of complexity, and the views are currently updatable, which would pose a reverse-synchronization problem), so I'm trying to figure out how to balance query performance with insert/update performance.
Can someone please suggest how to diagnose the specific problem indexes - and potential ways of reducing their impact on inserts/updates?
I've already tried using covering indexes, indexes with INCLUDEs and composite clustered indexes as alternatives to see if it makes a difference (it doesn't really).
Thanks.
For this scenario, prefer single-column or filtered indexes, and avoid composite indexes with more than two columns.
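To find which specific indexes are doing the most write work for the least read benefit, SQL Server's index usage DMV is a good starting point. A minimal sketch (run it in the affected database; the aliases are just for readability):

    SELECT OBJECT_NAME(s.object_id)                      AS table_name,
           i.name                                        AS index_name,
           s.user_seeks + s.user_scans + s.user_lookups  AS reads,
           s.user_updates                                AS writes
    FROM sys.dm_db_index_usage_stats AS s
    JOIN sys.indexes AS i
        ON i.object_id = s.object_id AND i.index_id = s.index_id
    WHERE s.database_id = DB_ID()        -- current database only
    ORDER BY s.user_updates DESC;        -- most write-heavy indexes first

Indexes that show lots of writes and few or no reads are the first candidates for dropping, narrowing, or filtering.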
I have some tables, two of which have around 1 million records each. I'm using these tables in a stored procedure, and it takes around 5-10 minutes to fetch around 25,000 rows.
I created some clustered and non-clustered indexes, and the execution plan shows everything as Clustered Index Seeks or Non-Clustered Index Seeks, but the procedure still takes more than 5 minutes to execute.
So I tried creating a columnstore index, but there was still no improvement.
Can anyone advise me on this? How should I create the indexes, and which is better: a columnstore index or an ordinary clustered/non-clustered index?
Whether a columnstore index is a good idea depends on what the table/database is used for. Columnstore is designed for large fact tables in data warehouses; it is not built for OLTP or any other operational database. If you're working with a data warehouse, a clustered columnstore is usually a good idea, although I think it's designed for rather more than a million rows. I would assume it still works OK, and you should also benefit from the improved compression.
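If the table really is used like a data-warehouse fact table, creating a clustered columnstore index is a one-liner (SQL Server 2014 or later; the table name below is made up):

    -- replaces the rowstore heap/clustered index as the table's storage format
    CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales
        ON dbo.FactSales;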
For OLTP or mixed use, you probably want to just focus on indexing. Look at the query plan and the STATISTICS IO output to see what's causing the slowness, and if you can't figure out what's wrong, either edit the post or ask a new one with details about your tables, indexes and the query plan.
Typical things to look for in the query plan are index scans, and sorts and key lookups over a large number of rows. Since you're working with a million rows, there could also be spools or spills into tempdb causing the slowness.
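To get those STATISTICS IO numbers for the procedure, something like this works (the procedure name is a placeholder for yours):

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    EXEC dbo.usp_FetchRows;   -- hypothetical procedure name

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;

If the plan shows key lookups on a large number of rows, a covering index is the usual fix; a sketch with made-up table and column names:

    CREATE NONCLUSTERED INDEX IX_BigTable_Status_CreatedDate
        ON dbo.BigTable (Status, CreatedDate)
        INCLUDE (CustomerId, Amount);   -- covers the SELECT list, removing the lookup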
How can you determine if the performance gained on a SELECT by indexing a column will outweigh the performance loss on an INSERT in the same table? Is there a "tipping-point" in the size of the table when the index does more harm than good?
I have a table in SQL Server 2008 with 2-3 million rows at any given time. Every time an insert is done on the table, a lookup is also done on the same table using two of its columns. I'm trying to determine whether it would be beneficial to add indexes to the two columns used in the lookup.
Like everything else SQL-related, it depends:
What kind of fields are they? Varchar? Int? Datetime?
Are there other indexes on the table?
Will you need to include additional fields?
What's the clustered index?
How many rows are inserted/deleted in a transaction?
The only real way to know is to benchmark it. Put the index(es) in place and do frequent monitoring, or run a trace.
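A rough way to run that benchmark for the scenario in the question, with placeholder table and column names (the two columns are the ones used by the lookup):

    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_Status
        ON dbo.Orders (CustomerId, Status);

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- run the lookup and a representative insert with and without the index,
    -- then compare logical reads and elapsed time
    SELECT OrderId FROM dbo.Orders WHERE CustomerId = 42 AND Status = 'OPEN';
    INSERT INTO dbo.Orders (CustomerId, Status) VALUES (42, 'OPEN');

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;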
This depends on your workload and your requirements. Sometimes data is loaded once and read millions of times, but sometimes not all loaded data is ever read.
Sometimes reads or writes must complete within a certain time.
Case 1: If the table is static and queried heavily (e.g. an item table in a shopping-cart application), then indexes on the appropriate fields are highly beneficial.
Case 2: If the table is highly dynamic and not queried much on a daily basis (e.g. log tables used for auditing purposes), then indexes will slow down the writes.
Treat those two cases as the boundaries: whether or not to build indexes on a table depends on which case the table in question comes closest to.
If you still can't decide, leave it to the judgement of the Database Engine Tuning Advisor. Good luck.
Scenario: I have a 10 million row table. I partition it into 10 partitions, which results in 1 million rows per partition, but I don't do anything else (like moving the partitions to different filegroups or spindles).
Will I see a performance increase? Is this in effect like creating 10 smaller tables? If I have queries that perform key lookups or scans, will the performance increase as if they were operating against a much smaller table?
I'm trying to understand how partitioning is different from just having a well indexed table, and where it can be used to improve performance.
Would a better scenario be to move the old data (using partition switching) out of the primary table to a read only archive table?
Is having a table with a 1 million row partition and a 9 million row partition analogous (performance-wise) to moving the 9 million rows to another table and leaving only 1 million rows in the original table?
Partitioning is not a performance feature; it is for maintenance operations like moving data between tables and dropping data really fast. Partitioning a 10M-row table into 10 partitions of 1M rows each not only won't increase the performance of most queries, it will likely make quite a few of them slower.
No query can operate against the smaller set of rows in a single partition unless it can be determined that the query only needs rows from that partition alone (partition elimination). But this can always be solved, and much better, by properly choosing the clustered index on the table, or at least a good covering non-clustered index.
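A minimal sketch of what partition elimination needs; every object name here is made up:

    -- range-partition a hypothetical Orders table by OrderDate
    CREATE PARTITION FUNCTION pf_OrderDate (datetime)
        AS RANGE RIGHT FOR VALUES ('2022-01-01', '2023-01-01');

    CREATE PARTITION SCHEME ps_OrderDate
        AS PARTITION pf_OrderDate ALL TO ([PRIMARY]);

    CREATE TABLE dbo.Orders (
        OrderId   int      NOT NULL,
        OrderDate datetime NOT NULL,
        Amount    money    NOT NULL
    ) ON ps_OrderDate (OrderDate);

    -- touches a single partition only because the predicate is on the partitioning column
    SELECT SUM(Amount) FROM dbo.Orders WHERE OrderDate >= '2023-01-01';

A clustered index on OrderDate on an unpartitioned table gives the same "read only the rows you need" behaviour for such range queries, which is the point being made above.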
Well, first of all, partitioning will only help you if you are using the Enterprise edition.
I believe it will improve your performance, although the actual benefit you'll get will depend on your specific workload (yeah, it always depends).
It is not exactly like creating 10 smaller tables, but if your queries line up with your partition ranges, only the data in the matching partition will be touched. In those cases I think the performance improvement will be noticeable. In cases where queries cut across the partition ranges, performance will be worse.
I think you'll have to try which solution fits your data best; in some cases partitioning will help you, and in others it will do the opposite. Another benefit of partitioning is that you won't have to worry about moving your data around.
This is a good article about partitioning - bol
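On the partition-switching idea raised in the question (moving old data out to an archive table): switching a partition out is a metadata-only operation, which is exactly the kind of maintenance work partitioning is good at. A sketch with hypothetical table names (the archive table must be empty, have an identical structure, and sit on the same filegroup as the partition being switched):

    ALTER TABLE dbo.Orders
        SWITCH PARTITION 1 TO dbo.OrdersArchive;   -- effectively instant, no data is copied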
I have an app, which cycles through a huge number of records in a database table and performs a number of SQL and .Net operations on records within that database (currently I am using Castle.ActiveRecord on PostgreSQL).
I added some basic btree indexes on a couple of the fields and, as you would expect, the performance of the SQL operations increased substantially. Wanting to make the most of DBMS performance, I want to make some better-educated choices about what I should index on all my projects.
I understand that there is a detriment to performance when doing inserts (as the database needs to update the index as well as the data), but what suggestions and best practices should I consider when creating database indexes? How do I best select the fields/combinations of fields for a set of database indexes (rules of thumb)?
Also, how do I best select which index to use as a clustered index? And when it comes to the access method, under what conditions should I use a btree over a hash or a gist or a gin (what are they anyway?).
Some of my rules of thumb:
Index ALL primary keys (I think most RDBMS do this when the table is created).
Index ALL foreign key columns.
Create more indexes ONLY if:
Queries are slow.
You know the data volume is going to increase significantly.
Run statistics when populating a lot of data in tables.
If a query is slow, look at the execution plan and:
If a query on a table only uses a few columns, put all of those columns into an index; that way the RDBMS can answer the query from the index alone (a covering index).
Don't waste resources indexing tiny tables (hundreds of records).
Index multiple columns in order from high cardinality to low. This means: index the columns with more distinct values first, followed by the columns with fewer distinct values (see the sketch after this list).
If a query needs to access more than 10% of the data, a full scan is normally better than an index.
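A few of the rules above, sketched in PostgreSQL syntax since that's what the question uses (all table and column names are made up):

    -- foreign key column: a plain btree (the default) is usually what you want
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);

    -- multicolumn index: put the more selective column first
    -- (assumed here to be customer_id)
    CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at);

    -- GIN suits "contains"-style queries on composite values (arrays, jsonb, full text)
    CREATE INDEX idx_orders_tags ON orders USING gin (tags);

    -- check whether the planner actually uses an index for a slow query
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;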
Here's a slightly simplistic overview: it's certainly true that there is an overhead to data modifications due to the presence of indexes, but you ought to consider the relative number of reads and writes to the data. In general the number of reads is far higher than the number of writes, and you should take that into account when defining an indexing strategy.
When it comes to which columns to index, I've always felt that the designer ought to know the business well enough to take a very good first pass at which columns are likely to benefit. Other than that, it really comes down to feedback from the programmers, full-scale testing, and system monitoring (preferably with extensive internal metrics on performance to capture long-running operations).
As @David Aldridge mentioned, the majority of databases perform many more reads than writes, and in addition, appropriate indexes will often be utilised even when performing INSERTs (to determine the correct place to insert).
The critical indexes under an unknown production workload are often hard to guess/estimate, and a set of indexes should not be viewed as set once and forget. Indexes should be monitored and altered with changing workloads (that new killer report, for instance).
Nothing beats profiling; if you guess your indexes, you will often miss the really important ones.
As a general rule, if I have little idea how the database will be queried, I will create indexes on all foreign keys, profile under a workload (think UAT release), and then remove the indexes that are not being used, as well as creating any important missing ones.
Also, make sure a scheduled index maintenance plan is created.
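On "remove those that are not being used": PostgreSQL keeps per-index usage counters, so a rough way to spot never-scanned indexes looks like this (the counters accumulate since statistics were last reset, so judge them over a representative period):

    SELECT schemaname, relname AS table_name, indexrelname AS index_name, idx_scan
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0              -- never used since stats were last reset
    ORDER BY schemaname, relname;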
I have a table with 158 columns in SQL Server 2005.
Are there any disadvantages to keeping so many columns?
I have to keep all of those columns, so how can I improve performance - for example with stored procedures or indexes?
Wide tables can be quite performant when you usually want all the fields for a particular row. Have you traced your users' usage patterns? If they're usually pulling just one or two fields from multiple rows then your performance will suffer. The main issue is when your total row size hits the 8k page mark. That means SQL has to hit the disk twice for every row (first page + overflow page), and that's not counting any index hits.
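If you want to see how close your rows come to that 8k limit, and whether you already have row-overflow pages, one option is the physical stats DMF in DETAILED mode (the table name is a placeholder):

    SELECT index_id, alloc_unit_type_desc,
           avg_record_size_in_bytes, max_record_size_in_bytes, page_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID('dbo.WideTable'), NULL, NULL, 'DETAILED');
    -- rows with alloc_unit_type_desc = 'ROW_OVERFLOW_DATA' mean some data
    -- already spills onto overflow pages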
The guys at Universal Data Models will have some good ideas for refactoring your table. And Red Gate's SQL Refactor makes splitting a table heaps easier.
There is nothing inherently wrong with wide tables. The main case for normalization is database size, where lots of null columns take up a lot of space.
The more columns you have, the slower your queries will be.
That's just a fact. That isn't to say you aren't justified in having many columns. The above does not give one carte blanche to split one entity's worth of table with many columns into multiple tables with fewer columns. The administrative overhead of such a solution would most probably outweigh any perceived performance gains.
My number one recommendation to you, based off of my experience with abnormally wide tables (denormalized schemas of bulk imported data) is to keep the columns as thin as possible. I had to work with a lot of crazy data and left most of the columns as VARCHAR(255). I recommend against this. Although convenient for development purposes, performance would spiral out of control, especially when working with Perl. Shrinking the columns to their bare minimum (VARCHAR(18) for instance) helped immensely.
Stored procedures are just batches of SQL commands; they don't have any direct effect on speed, other than that regular use of certain types of stored procedures will end up using cached query plans (which is a performance boost).
You can use indexes to speed up certain queries, but there's no hard and fast rule here. Good index design depends entirely on the type of queries you're running. Indexing will, by definition, make your writes slower; they exist only to make your reads faster.
The problem with having that many columns in a table is that finding rows using the clustered primary key can be expensive. If it were possible to change the schema, breaking this up into many normalized tables will be the best way to improve efficiency. I would strongly recommend this course.
If not, then you may be able to use indices to make some SELECT queries faster. If you have queries that only use a small number of the columns, adding indices on those columns could mean that the clustered index will not need to be scanned. Of course, there is always a price to pay with indices, in terms of storage space and INSERT, UPDATE and DELETE time, so this may not be a good idea for you.
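For the "queries that only use a small number of the columns" case, a covering non-clustered index is what lets the engine skip the wide clustered index entirely. A sketch with made-up names:

    -- the query filters on CustomerId and only needs Status and CreatedDate back
    CREATE NONCLUSTERED INDEX IX_WideTable_CustomerId
        ON dbo.WideTable (CustomerId)
        INCLUDE (Status, CreatedDate);   -- covers the select list, so no clustered index scan or key lookup

    SELECT Status, CreatedDate
    FROM dbo.WideTable
    WHERE CustomerId = 42;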