(My English is not great, so please bear with me.)
I am working on optimizing this query:
SELECT (SELECT Title FROM Groups WHERE GroupID = WWPMD.GroupID) AS GroupName
FROM dbo.Detail AS WWPMD
INNER JOIN dbo.News AS WWPMN ON WWPMD.DetailID = WWPMN.NewsID
WHERE
WWPMD.IsDeleted = 0 AND WWPMD.IsEnabled = 1 AND WWPMD.GroupID IN (
SELECT ModuleID FROM Page_Content_Item WHERE ContentID = 'a6780e80-4ead4e62-9b22-1190bb88ed90')
In this case, the tables have clustered indexes on their primary keys, which are GUIDs. In the execution plan, the arrows were a little thick, the clustered index seek on the Detail table cost 87%, and there was no key lookup cost.
Then I changed the indexes on the Detail table: I put the clustered index on a datetime column and created three nonclustered indexes on the PK and FKs. Now, in the execution plan, the index seek on Detail costs 4% and the key lookup costs 80%, with thin arrows.
I want to know which execution plan is better, and what I can do to improve this query.
UPDATE:
Thanks to all of you for your guidance. One more question: which is better, an 80% cost on a clustered index seek, or an 80% combined cost of a nonclustered index seek plus key lookup?
The IN construct is fine when you are selecting a small amount of data; if you are selecting more data, you should use an INNER JOIN, which performs better than IN on large data sets.
IN can be faster than a JOIN against a DISTINCT subquery.
EXISTS is often more efficient than IN, because EXISTS can stop after finding a single matching row.
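Applied to the query in the question, the IN filter could be rewritten with EXISTS like this (an untested sketch):

```sql
-- EXISTS version of the IN filter from the original query;
-- SQL Server can stop probing Page_Content_Item at the first match
SELECT (SELECT Title FROM Groups WHERE GroupID = WWPMD.GroupID) AS GroupName
FROM dbo.Detail AS WWPMD
INNER JOIN dbo.News AS WWPMN ON WWPMD.DetailID = WWPMN.NewsID
WHERE WWPMD.IsDeleted = 0
  AND WWPMD.IsEnabled = 1
  AND EXISTS (SELECT 1
              FROM Page_Content_Item pci
              WHERE pci.ContentID = 'a6780e80-4ead4e62-9b22-1190bb88ed90'
                AND pci.ModuleID = WWPMD.GroupID);
```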
A clustered index on a GUID column is not a good idea when your GUIDs are not sequential, since this will lead to performance loss on inserts. The records in your table are physically ordered based on the clustered index. The clustered index should be put on a column which has sequential values and doesn't change (often).
if you have a nonclustered index on groupid (table groups), then you could make 'title' an included column on that index. See msdn for included columns.
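For example, assuming there is (or you create) a nonclustered index on Groups(GroupID), including Title would look like this (index name is illustrative):

```sql
-- Title is stored at the leaf level of the index, so the scalar
-- subquery on Groups can be answered without a key lookup
CREATE NONCLUSTERED INDEX IX_Groups_GroupID
    ON dbo.Groups (GroupID)
    INCLUDE (Title);
```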
I suggest using the following:
INNER JOIN (SELECT DISTINCT ModuleID FROM Page_Content_Item WHERE ContentID = 'a6780e80-4ead4e62-9b22-1190bb88ed90') z ON WWPMD.GroupID = z.ModuleID
Instead of:
AND WWPMD.GroupID IN (
SELECT ModuleID FROM Page_Content_Item WHERE ContentID = 'a6780e80-4ead4e62-9b22-1190bb88ed90')
You should also examine the execution plan of this query. It seems that an index on Detail.DetailID filtered on (IsDeleted = 0 AND IsEnabled = 1) would be very useful.
Indexes will not be particularly useful unless the leftmost indexed columns are specified in JOIN and/or WHERE clauses. I suggest these indexes (unique where possible): dbo.Detail(GroupID), dbo.News(DetailID), dbo.Page_Content_Item(ContentID)
You can fine-tune these indexes with included columns and/or filters, but I suspect performance may be good enough without those measures.
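A sketch of the suggested indexes, mirroring the column list above (index names are illustrative):

```sql
-- leftmost key columns match the JOIN and WHERE predicates of the query
CREATE NONCLUSTERED INDEX IX_Detail_GroupID ON dbo.Detail (GroupID);
CREATE NONCLUSTERED INDEX IX_News_DetailID ON dbo.News (DetailID);
CREATE NONCLUSTERED INDEX IX_PageContentItem_ContentID
    ON dbo.Page_Content_Item (ContentID);
```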
Note that primary keys are important. Not only is that a design best practice, but a unique index will automatically be created on the key column(s), which can improve the performance of joins to the table on related columns. Consider reviewing your model to ensure you have proper primary keys and relationships.
Related
I have a fact table which is partitioned along the PeriodDate column.
CREATE TABLE MyFactTable
(
PeriodDate DATE,
OrderID INT
CONSTRAINT MyFk FOREIGN KEY (OrderID) REFERENCES DimOrder(ID)
)
I'd like to create a partition-aligned index on the OrderID column, and as I understood from BOL, I need to include the partitioning key (PeriodDate) in order to have the index aligned.
Like this:
CREATE NONCLUSTERED INDEX MyAlignedOrderIdIndex
ON MyFactTable (OrderID, PeriodDate);
My question is: in what order should I put the two columns in the index above?
ON MyFactTable (OrderID, PeriodDate);
or
ON MyFactTable (PeriodDate, OrderID);
As I also read on BOL, column order matters in composite indexes, and my queries will usually use OrderID to look up Dim table data.
Putting OrderID first seems the logical choice, but since I am not familiar with partitioning, I don't know how well it will behave once the table has millions of rows.
What do best practices dictate here?
My question is: in what order should I put the two columns in the index above?
(OrderID, PeriodDate). The index is there to enable retrieval of all the facts for a given set of OrderIDs, and if your partitions contain multiple PeriodDates, having PeriodDate first wouldn't be as helpful.
The general rule of thumb here is that you don't lead the index with the partitioning column. That way you get partition elimination and the index key order as fully independent access paths.
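Assuming the table is partitioned on a scheme built over PeriodDate (the scheme name PS_PeriodDate below is hypothetical), the aligned index would look like this:

```sql
-- OrderID leads for seeks; PeriodDate is present as the partitioning key,
-- and building the index on the table's partition scheme keeps it aligned
CREATE NONCLUSTERED INDEX MyAlignedOrderIdIndex
    ON MyFactTable (OrderID, PeriodDate)
    ON PS_PeriodDate (PeriodDate);
```

Note that if you omit the trailing ON clause, SQL Server builds the index on the same partition scheme as the table by default, which also keeps it aligned.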
My dimension will have a dozen or at most a hundred rows. The fact table will have millions of rows. Is it worth creating this index?
You'll have to test; it really depends on the queries. However, if your fact table is a clustered columnstore (which a fact table with millions of rows typically should be), you'll probably find that the index is not used much by the query workload. I.e., it may be used for queries for a single product, but not for queries that filter on non-key product attributes.
I have a SQL 2005 database I've inherited, with a table that has grown to about 17 million records over the course of about 15 years, and is now horribly slow.
The table layout looks about like this:
id_column = nvarchar(20), indexed, not unique
column2 = nvarchar(20), indexed, not unique
column3 = nvarchar(10), indexed, not unique
column4 = nvarchar(10), indexed, not unique
column5 = numeric(8,0), indexed, not unique
column6 = numeric(8,0), indexed, not unique
column7 = nvarchar(20), indexed, not unique
column8 = nvarchar(10), indexed, not unique
(and about 5 more columns that look pretty much the same, not indexed)
The 'id' field is a value entered in a front-end application by the end-user.
There are no defined primary keys, and no columns that can be combined to make a unique row (unless all columns are combined). The table actually is a 'details' table to another table, but there are no constraints ensuring referential integrity.
Every column is heavily used in 'where' clauses in queries, which is why I assume there's an index on every one, or perhaps a desperate attempt to speed things up by another DBA.
Having said all that, my question is: could adding a clustered index do me any good at this point?
If I did add a clustered index, I assume it would have to be a new column, ie., an identity column? Basically, is it worth the trouble?
Appreciate any advice.
I would say add the clustered index only if there is a reason for needing it. So ask these questions:
Does the order of the data make sense?
Is there sequential value to the way the data is inserted?
Do I need to use a feature that requires it have a clustered index, such as full text index?
If the answer to all of these questions is "no", then a clustered index might not be of any additional help over a good non-clustered index strategy. Instead, you might want to consider how and when you update statistics, when you rebuild the indexes, and whether filtered indexes (SQL Server 2008 and later) make sense in your situation. Looking at the example table it's tough to say, but maybe it makes sense to normalize the table further and use numeric keys instead of nvarchar.
http://www.mssqltips.com/sqlservertip/3041/when-sql-server-nonclustered-indexes-are-faster-than-clustered-indexes/
The article is a great example of when non-clustered indexes make more sense.
I would recommend adding a clustered index, even if it's on a new identity column, for three reasons:
Assuming that your existing queries have to go through the entire table every time, a clustered index scan is still faster than a table scan.
The table is a child of some other table. With some extra work, you can use the new child_id to join against the parent table. This enables a clustered index seek, which is a lot faster than a scan in some cases.
Depending on how they are set up, the existing indices may not do much good. I've come across terrible indices that cover one column each, or that don't include the appropriate columns, causing costly Key Lookup operations. Check your index usage stats to see if they are being used at all.
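A minimal sketch of the identity-column approach (table and index names are hypothetical):

```sql
-- add a surrogate key to the heap, then cluster on it;
-- sequential IDENTITY values avoid page splits on insert
ALTER TABLE dbo.DetailRecords
    ADD child_id INT IDENTITY(1, 1) NOT NULL;

CREATE UNIQUE CLUSTERED INDEX CIX_DetailRecords_child_id
    ON dbo.DetailRecords (child_id);
```

Be aware that building a clustered index on a 17-million-row heap rewrites the whole table, so schedule it for a maintenance window.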
We have a straightforward table such as this:
OrderID primary key / clustered index
CustomerID foreign key / single-column non-clustered index
[a bunch more columns]
Then, we have a query such as this:
SELECT [a bunch of columns]
FROM Orders
WHERE CustomerID = 1234
We're finding that sometimes SQL Server 2008 R2 does a seek on the non-clustered index, and then a bookmark lookup on the clustered index (we like this - it's plenty fast).
But on other seemingly random occasions, SQL Server instead does a scan on the clustered index (very slow - brings our app to a crawl - and it seems to do this at the busiest hours of our day).
I know that we could (a) use an index hint, or (b) enhance our non-clustered index so that it covers our large set of selected columns. But (a) ties the logical to the physical, and regarding (b), I've read that an index shouldn't cover too many columns.
I would first love to hear any ideas why SQL Server is doing what it's doing. Also, any recommendations would be most appreciated. Thanks!
The selectivity of CustomerID will play some part in the query optimiser's decision. If, on one hand, it was unique, then an equality operation will yield at most one result, so a SEEK/LOOKUP operation is almost guaranteed. If, on the other hand, potentially hundreds or thousands of records will match a value for CustomerID, then a clustered-index scan may seem more attractive.
You'd be surprised how selective a filter has to be to preclude a scan. I can't find the article I originally pulled this figure from, but if CustomerID 1234 will match as little as 4% of the records in the table, a scan on the clustered index may be more efficient, or at least appear that way to the optimiser (which doesn't get it right 100% of the time).
It sounds at least plausible that the statistics kept on the non-clustered index on CustomerID are causing the optimiser to toggle between seek and scan based on the selectivity criteria.
You may be able to coax the optimiser towards use of the index by introducing a JOIN or EXISTS operation:
-- Be aware: this approach is untested
select o.*
from Orders o
inner join Customers c on o.CustomerID = c.CustomerID
where c.CustomerID = 1234;
Or:
-- Be aware: this approach is untested
select o.*
from Orders o
where exists (select 1
from Customers c
where c.CustomerID = 1234 and
o.CustomerID = c.CustomerID);
Also be aware that with this EXISTS approach, if you don't have an index on the "join" predicate (in this case, the CustomerID field) in both tables then you will end up with a nested loop which is painfully slow. Using inner joins seems much safer, but the EXISTS approach has its place from time to time when it can exploit indexes.
These are just suggestions; I can't say if they will be effective or not. Just something to try, or for a resident expert to confirm or deny.
You should make your index a covering index so that the bookmark lookup is not required. The lookup is the potentially expensive operation which may be causing the query optimiser to ignore your index.
If you are using SQL Server 2005 or above, you can add them as included columns, otherwise you would have to add them as additional key columns.
A covering index generally performs better than a non-covering one, particularly for nonselective queries.
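A sketch of such a covering index; the INCLUDE list below (OrderDate, Amount) is only an assumption standing in for whatever "[a bunch of columns]" actually is:

```sql
-- key on the filter column; the selected columns ride along at the
-- leaf level, so the bookmark lookup into the clustered index disappears
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
    ON Orders (CustomerID)
    INCLUDE (OrderDate, Amount);
```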
I am no good at SQL.
I am looking for a way to speed up a simple join like this:
SELECT
E.expressionID,
A.attributeName,
A.attributeValue
FROM
attributes A
JOIN
expressions E
ON
E.attributeId = A.attributeId
I am running this tens of thousands of times, and it's taking longer and longer as the tables get bigger.
I am thinking indexes - If I was to speed up selects on the single tables I'd probably put nonclustered indexes on expressionID for the expressions table and another on (attributeName, attributeValue) for the attributes table - but I don't know how this could apply to the join.
EDIT: I already have a clustered index on (expressionId, attributeId) (the PK; attributeId is also an FK) on the expressions table, and another clustered index on attributeId (the PK) on the attributes table.
I've seen this question but I am asking for something more general and probably far simpler.
Any help appreciated!
You definitely want to have indexes on attributeID on both the attributes and expressions table. If you don't currently have those indexes in place, I think you'll see a big speedup.
In fact, because there are so few columns being returned, I would consider a covering index for this query, i.e., an index that includes all the fields in the query.
Some things you need to care about are indexes, the query plan and statistics.
Put indexes on attributeId in both tables. Or make sure indexes exist where attributeId is the first column in the key (SQL Server can still use an index where it's not the first column, but it's not as fast).
Highlight the query in Query Analyzer and hit ^L to see the plan. You can see how tables are joined together. Almost always, using indexes is better than not (there are fringe cases where if a table is small enough, indexes can slow you down -- but for now, just be aware that 99% of the time indexes are good).
Pay attention to the order in which tables are joined. SQL Server maintains statistics on table sizes and will determine which one is better to join first. Do some investigation on internal SQL Server procedures to update statistics -- it's been too long so I don't have that info handy.
That should get you started. Really, an entire chapter can be written on how a database can optimize even such a simple query.
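The statistics-update procedures alluded to above are `UPDATE STATISTICS` and `sp_updatestats`:

```sql
-- refresh optimizer statistics for one table ...
UPDATE STATISTICS attributes;

-- ... or for every user table in the current database
EXEC sp_updatestats;
```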
I bet your problem is the huge number of rows that are being inserted into that temp table. Is there any way you can add a WHERE clause before you SELECT every row in the database?
Another thing to do is add some indexes like this:
attributes.{attributeId, attributeName, attributeValue}
expressions.{attributeId, expressionID}
This is hacky! But useful as a last resort.
What this does is create a query plan that can be "entirely answered" by indexes. Usually, an index actually causes a double-I/O in your above query: one to hit the index (i.e. probe into the table), another to fetch the actual row referred to by the index (to pull attributeName, etc).
This is especially helpful if "attributes" or "expressions" is a wide table, that is, a table from which rows are expensive to fetch.
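The two index suggestions above could be created like this (index names are illustrative):

```sql
-- each index contains every column the query touches in that table,
-- so the plan can be "entirely answered" from the indexes
CREATE NONCLUSTERED INDEX IX_attributes_covering
    ON attributes (attributeId, attributeName, attributeValue);

CREATE NONCLUSTERED INDEX IX_expressions_covering
    ON expressions (attributeId, expressionID);
```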
Finally, the best way to speed your query is to add a WHERE clause!
If I'm understanding your schema correctly, you're stating that your tables kinda look like this:
Expressions: PK - ExpressionID, AttributeID
Attributes: PK - AttributeID
Assuming that each PK is a clustered index, that still means that an Index Scan is required on the Expressions table. You might want to consider creating an Index on the Expressions table such as: AttributeID, ExpressionID. This would help to stop the Index Scanning that currently occurs.
Tips,
If you want to speed up a query that uses a join:
For an "inner join/join",
don't put the filter in the WHERE condition; use it in the "ON" condition instead.
E.g.:
select a.id, a.name
from table1 a
join table2 b on a.name = b.name
where a.id = '123'
Try:
select a.id, a.name
from table1 a
join table2 b on a.name = b.name and a.id = '123'
For a "left/right join",
don't put the filter in the "ON" condition. An outer join returns all rows from the preserved table regardless of the ON condition, so filtering there has no effect on those rows. Put the filter in the "WHERE" clause instead.
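To illustrate the difference for outer joins (an untested sketch): a filter on the unpreserved table in ON keeps unmatched rows, while the same filter in WHERE removes them.

```sql
-- keeps every row of table1; b's columns are NULL where the condition fails
select a.id, a.name
from table1 a
left join table2 b on a.name = b.name and b.id = '123';

-- filters the joined result, so table1 rows without a matching b are dropped
select a.id, a.name
from table1 a
left join table2 b on a.name = b.name
where b.id = '123';
```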
I am working on a database that usually uses GUIDs as primary keys.
By default SQL Server places a clustered index on primary key columns. I understand that this is a silly idea for GUID columns, and that non-clustered indexes are better.
What do you think - should I get rid of all the clustered indexes and replace them with non-clustered indexes?
Why wouldn't SQL's performance tuner offer this as a recommendation?
A big reason for a clustered index is when you often want to retrieve rows for a range of values for a given column. Because the data is physically arranged in that order, the rows can be extracted very efficiently.
Something like a GUID, while excellent for a primary key, could be positively detrimental to performance, as there will be additional cost for inserts and no perceptible benefit on selects.
So yes, don't cluster an index on GUID.
As to why it's not offered as a recommendation, I'd suggest the tuner is aware of this fact.
You almost certainly want to establish a clustered index on every table in your database.
If a table does not have a clustered index it is what is referred to as a "Heap" and performance of most types of common queries is less for a heap than for a clustered index table.
Which fields the clustered index should be established on depends on the table itself and the expected usage patterns of queries against it. In almost every case you want the clustered index to be on a column, or a combination of columns, that is unique (an alternate key), because if it isn't, SQL Server will add a unique value to the end of whatever fields you select anyway. If your table has a column or columns that queries will frequently use to select or filter multiple records, those columns are candidates for the clustered index. For example: a sales transactions table where the application frequently requests transactions by product ID; or better, an invoice details table where in almost every case you retrieve all the detail records for a specific invoice; or an invoice table where you often retrieve all the invoices for a particular customer. This is true whether you select many records by a single value or by a range of values.
The order of the columns in the clustered index is critical: the first column defined in the index should be the column that expected queries will select or filter on first.
The reason for all this is based on the internal structure of a database index. These indices are balanced-tree (B-tree) indices. They are kinda like a binary tree, except that each node in the tree can have an arbitrary number of entries (and child nodes) instead of just two. What makes a clustered index different is that its leaf nodes are the actual physical disk data pages of the table itself, whereas the leaf nodes of a non-clustered index just "point" to the table's data pages.
When a table has a clustered index, therefore, the table's data pages are the leaf level of that index, and each one has a pointer to the previous page and the next page in index order (they form a doubly-linked list).
So if your query requests a range of rows that is in the same order as the clustered index... the processor only has to traverse the index once (or maybe twice), to find the start page of the data, and then follow the linked list pointers to get to the next page and the next page, until it has read all the data pages it needs.
For a non-clustered index, it has to traverse the index once for every row it retrieves...
NOTE / EDIT:
To address the sequential-insert issue for GUID key columns, be aware that SQL Server 2005 has NEWSEQUENTIALID(), which does in fact generate GUIDs sequentially. Or you can investigate Jimmy Nilsson's COMB GUID algorithm, which is implemented in client-side code:
COMB Guids
The problem with a clustered index on a GUID field is that GUIDs are random, so when a new record is inserted it lands in the middle of the index order, causing page splits and data movement on disk.
With integer-based clustered indexes, however, the integers are normally sequential (as with an IDENTITY spec), so new rows just get added to the end and no data needs to be moved around.
On the other hand, clustered indexes are not always bad on GUIDs... it all depends upon the needs of your application. If you need to be able to SELECT records quickly, then use a clustered index... the INSERT speed will suffer, but the SELECT speed will be improved.
While clustering on a GUID is normally a bad idea, be aware that GUIDs can under some circumstances cause fragmentation even in non-clustered indexes.
Note that if you're using SQL Server 2005, the newsequentialid() function produces sequential GUIDs. This helps to prevent the fragmentation problem.
I suggest using a SQL query like the following to measure fragmentation before making any decisions (excuse the non-ANSI syntax):
SELECT OBJECT_NAME (ips.[object_id]) AS 'Object Name',
si.name AS 'Index Name',
ROUND (ips.avg_fragmentation_in_percent, 2) AS 'Fragmentation',
ips.page_count AS 'Pages',
ROUND (ips.avg_page_space_used_in_percent, 2) AS 'Page Density'
FROM sys.dm_db_index_physical_stats
(DB_ID ('MyDatabase'), NULL, NULL, NULL, 'DETAILED') ips
CROSS APPLY sys.indexes si
WHERE si.object_id = ips.object_id
AND si.index_id = ips.index_id
AND ips.index_level = 0;
If you are using NEWID(), you could switch to NEWSEQUENTIALID(). That should help insert performance.
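NEWSEQUENTIALID() can only be used in a DEFAULT constraint on a uniqueidentifier column, so the switch looks like this (table and constraint names are hypothetical):

```sql
-- new rows get ever-increasing GUIDs, so inserts append
-- to the end of the clustered index instead of splitting pages
CREATE TABLE dbo.Example
(
    ID UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_Example_ID DEFAULT NEWSEQUENTIALID()
        PRIMARY KEY CLUSTERED,
    Payload NVARCHAR(100) NULL
);
```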
Yes, there's no point in having a clustered index on a random value.
You probably do want clustered indexes SOMEWHERE in your database. For example, if you have a "Author" table and a "Book" table with a foreign key to "Author", and if you have a query in your application that says, "select ... from Book where AuthorId = ..", then you would be reading a set of books. It will be faster if those book are physically next to each other on the disk, so that the disk head doesn't have to bounce around from sector to sector gathering all the books of that author.
So, you need to think about your application, the ways in which it queries the database.
Make the changes.
And then test, because you never know...
As most have mentioned, avoid using a random identifier in a clustered index; you will not gain the benefits of clustering and will actually experience increased delay. Getting rid of all of them is solid advice. Also keep in mind that NEWSEQUENTIALID() can be extremely problematic in a multi-master replication scenario: if databases A and B both invoke NEWSEQUENTIALID() prior to replication, you can end up with a conflict.
Yes you should remove the clustered index on GUID primary keys for the reasons Galwegian states above. We have done this on our applications.
It depends if you're doing a lot of inserts, or if you need very quick lookup by PK.