Non clustered indexes for special case - sql-server

TABLE SellerTransactions
string SellerId,
string ProductId,
DateTime CreateDate,
string BankNumber,
string Name(name+' '+surname+' '+alias),
string Comments,
decimal Amount
etc...
What is the best non-clustered index design for searching/filtering when we search by sellerID, ProductIds, CreateDate and sometimes Amount/BankNumber? Should the non-clustered index be only on the (sellerID, ProductIds, CreateDate) columns, or on all columns where a search might happen (a single big non-clustered index)?
The query will always contain (sellerID, ProductIds, CreateDate) and sometimes additionally BankNumber/Amount.
Say 90% of the time sellerID, ProductIds, CreateDate will be searched, and 10% of the time sellerID, ProductIds, CreateDate plus Amount or BankNumber.
I was thinking of having a non-clustered index on (sellerID, ProductIds, CreateDate) and separate ones for Amount and BankNumber.
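For concreteness, the two layouts being compared could be sketched roughly as follows. This only illustrates the options in the question and is not a recommendation; the index names are made up, and the sketch assumes the string columns are bounded-length types that are allowed as index keys.

```sql
-- Hypothetical index names; columns come from the SellerTransactions outline above.

-- Option A: one composite index for the columns present in every query,
-- plus separate single-column indexes for the occasional extra filters.
CREATE NONCLUSTERED INDEX IX_SellerTransactions_Seller_Product_Date
ON SellerTransactions (SellerId, ProductId, CreateDate);

CREATE NONCLUSTERED INDEX IX_SellerTransactions_Amount
ON SellerTransactions (Amount);

CREATE NONCLUSTERED INDEX IX_SellerTransactions_BankNumber
ON SellerTransactions (BankNumber);

-- Option B: the "single big index" on every searchable column.
CREATE NONCLUSTERED INDEX IX_SellerTransactions_All
ON SellerTransactions (SellerId, ProductId, CreateDate, Amount, BankNumber);
```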

I think a filtered index would improve the performance of your query.
What is a filtered index?
A filtered index indexes only a portion of the rows in a table.
That is, the index is defined with a filter predicate, so it is smaller than a full-table index, which can improve performance for queries that select from that subset.
For more info, see: https://learn.microsoft.com/en-us/sql/relational-databases/indexes/create-filtered-indexes?view=sql-server-2017
Syntax
CREATE NONCLUSTERED INDEX Non_ClustredIndexName
ON Table(ColumnName)
WHERE ColumnName = 'ColumnValue' -- the filter must compare against a constant, not a variable
Example as per your table:
1.
CREATE NONCLUSTERED INDEX FI_Employee_DOJ
ON tbl_SellerTransactions(ST_Name)
WHERE ST_Name IS NOT NULL
2.
CREATE NONCLUSTERED INDEX NonCluster_sellerID
ON tbl_SellerTransactions(sellerID)
WHERE sellerID >= '100' AND sellerID <= '500' -- BETWEEN is not accepted in a filtered index predicate
3.
CREATE NONCLUSTERED INDEX FI_Employee_DOJ_Covering -- renamed: index names must be unique within a table
ON tbl_SellerTransactions(ST_Name)
INCLUDE(SellerId,amt,ProductId,BankNumber) --Including remaining columns in the index
WHERE ST_Name IS NOT NULL
Notes
A filtered index cannot be created on a view, although the optimizer can still use a filtered index defined on a table that the view references.
Filtered indexes do not apply to XML indexes or full-text indexes.
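As a rough illustration of when the optimizer can consider a filtered index, compare a query whose WHERE clause falls inside the index filter with one that does not. The sketch assumes example 1 above has been created and reuses the answer's table and column names.

```sql
-- Assumes the filtered index from example 1 exists on tbl_SellerTransactions(ST_Name).

-- The predicate matches the index filter, so every requested row is in the
-- filtered index and the optimizer may choose it.
SELECT ST_Name
FROM tbl_SellerTransactions
WHERE ST_Name IS NOT NULL;

-- No predicate: rows with a NULL ST_Name are also requested, and those rows
-- are not in the filtered index, so it cannot answer this query.
SELECT ST_Name
FROM tbl_SellerTransactions;
```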

Related

Why index setting is able to affect query cost when scan is imperative

I'm reviewing performance tuning and practicing with AdventureWorks2012.
I built 4 copies of the Product table and set them up with the following indexes.
--tmpProduct1 nothing
CREATE CLUSTERED INDEX cIdx ON tmpProduct2 (ProductID ASC)
CREATE NONCLUSTERED INDEX ncIdx ON tmpProduct3 (ProductID ASC)
CREATE NONCLUSTERED INDEX ncIdx ON tmpProduct4 (ProductID ASC) INCLUDE (Name, ProductNumber)
Then I looked at the execution plans for the following queries.
SELECT ProductID FROM tmpProduct1
SELECT ProductID FROM tmpProduct2
SELECT ProductID FROM tmpProduct3
SELECT ProductID FROM tmpProduct4
I expected the performance to be the same for all four, since they all need a scan. Also, I select only the ProductID column and there is no WHERE condition.
However, the estimated costs turned out to differ (execution plan screenshot not included).
Why is the clustered index more expensive than the non-clustered index?
Why does the non-clustered index reduce the cost in this scenario?
Why do the included columns make query 4 cost more than query 3?
For query 1, without indexes, you are scanning the entire table.
For query 2, you have a clustered index, but you are still scanning the entire table. An index is useful only when you use it to eliminate rows, so this is the same as query 1.
The reason query 4 costs more than query 3 may be the index you have and the way indexes are stored. For now, it is enough to know that the upper (root and intermediate) levels hold only keys, while the leaf level holds the keys plus any included columns. For more info read this: https://www.sqlskills.com/blogs/kimberly/category/indexes/
For query 3, the index holds only the key, so fewer pages are needed to store the data and the scan reads fewer pages.
For query 4, the index holds a few more columns, so it needs more pages and the scan reads more of them.
The page counts (screenshot not included) were 18 for tmpProduct4 and 15 for tmpProduct3, so the extra cost may be the IO cost of reading the additional pages.
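One way to look up page counts like the ones quoted above is the `sys.dm_db_partition_stats` view joined to `sys.indexes`. This is a sketch that assumes the test tables live in the current database under the default schema; it is not necessarily how the original figures were obtained.

```sql
-- Page and row counts per index for the two test tables.
-- Querying sys.dm_db_partition_stats requires VIEW DATABASE STATE permission.
SELECT OBJECT_NAME(ps.object_id) AS table_name,
       i.name                    AS index_name,
       i.type_desc,
       ps.used_page_count,
       ps.row_count
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.object_id IN (OBJECT_ID('tmpProduct3'), OBJECT_ID('tmpProduct4'))
ORDER BY table_name, ps.index_id;
```

Each table should return one row for the heap and one for its non-clustered index, which makes the difference in index size directly comparable.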

Why QO choses clustered index-scan vs table-scan?

If I have a query like this:
SELECT * FROM tTable
where tTable does not contain any indexes, a table scan happens, as expected. If I add a clustered index on some column, the QO decides to use a clustered index scan for this query. Why? Why is a clustered index scan preferred over a table scan in this case?
"If I add a clustered index on some column then QO decides to use clustered index scan on this query"
Because when you create a clustered index on a table, the data in the table is rearranged in index order, so the table itself is the clustered index. This is also the reason why you can't have two clustered indexes on the same table.
To summarize, when you create a clustered index, there is only one structure, not two (clustered index and table).
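The "one structure" point can be checked in the catalog. The sketch below assumes tTable exists in the current database; `SomeColumn` and the index name are hypothetical.

```sql
-- Before: a table without a clustered index appears in sys.indexes
-- as index_id 0 with type_desc HEAP.
SELECT index_id, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('tTable');

-- SomeColumn is a placeholder for whichever column is chosen as the clustering key.
CREATE CLUSTERED INDEX cIdx_tTable ON tTable (SomeColumn);

-- After: the HEAP row is gone; index_id 1 with type_desc CLUSTERED replaces it.
SELECT index_id, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('tTable');
```

Because the heap entry is replaced rather than joined by a second entry, the only full scan the optimizer can choose afterwards is a scan of the clustered index.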
The query is "give me all rows and all columns", which means "read every row", which is a scan.
There is nothing to do an index seek on, because there is no WHERE clause.
Unlike this, which can seek on the clustered key:
SELECT * FROM tTable WHERE PrimaryClusteredKeyValue = 45
The next query may use a nonclustered seek followed by a clustered key lookup, or it may still scan the clustered index because you ask for all columns. It depends on how many rows match 'gbn':
SELECT * FROM tTable WHERE NonClusteredOtherColumnValue = 'gbn'

Log table with non unique columns; what indexes to create

I have a log table with two columns.
DocumentType (varchar(250), not unique, not null)
DateEntered (Date, not unique, not null)
The table will only have rows inserted, never updated or deleted.
Here is the stored procedure for the report:
SELECT DocumentType,
COUNT(DocumentType) AS "CountOfDocs"
FROM DocumentTypes
WHERE DateEntered >= @StartDate AND DateEntered <= @EndDate
GROUP BY DocumentType
ORDER BY DocumentType ASC;
In the future, users may also want to filter by document type in a different report. I currently have a non-clustered index containing both columns. Is this the proper index to create?
Clustered index on the date, for sure.
I think your NCI is fine. I would keep both as key columns, since I assume you will have the date in the WHERE clause of your queries. I don't think 1000 per day, as a worst case, will have a major impact on insert times when loading the data.
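Taken together, the two suggestions above might look like the sketch below. The index names are made up; the table and column names come from the question. On a two-column table it is worth checking the actual execution plan to see whether both indexes are needed.

```sql
-- Hypothetical index names; DocumentTypes and its columns are from the question.

-- Clustered index on the date, as suggested in the first answer.
CREATE CLUSTERED INDEX CIX_DocumentTypes_DateEntered
ON DocumentTypes (DateEntered);

-- Non-clustered index with both columns as key columns, date first because
-- the report filters on a date range.
CREATE NONCLUSTERED INDEX IX_DocumentTypes_Date_Type
ON DocumentTypes (DateEntered, DocumentType);
```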
Don't add any index. It'll be a heap table and can wait until your "future you" actually has the task of selecting something from this table :).
If you want an index:
With a heap: add an index on the column you will filter by, and if the second column appears only in the SELECT list (i.e. not in the WHERE clause), put it in as an included column. If you'll filter by both columns, put the index on both columns.
If you want to add a clustered index (for example on a new auto-increment primary key column), add only one index on the column you want to filter by, or try not adding an additional index and check the execution plan and its efficiency. In most cases a clustered index with seeks is enough.
Don't create a clustered index on non-unique columns (that is used only in very special cases).
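The options listed in this answer could be sketched as follows. The index, constraint, and `Id` column names are hypothetical, and the two options are alternatives rather than steps to run together.

```sql
-- Option 1 (heap): key on the filtered column, with the column that is only
-- selected carried as an included column.
CREATE NONCLUSTERED INDEX IX_DocumentTypes_DateEntered_Incl
ON DocumentTypes (DateEntered)
INCLUDE (DocumentType);

-- Option 2 (clustered): add a hypothetical auto-increment key and cluster on it,
-- then add an index on the filter column only if the execution plan shows a need.
ALTER TABLE DocumentTypes ADD Id INT IDENTITY(1,1) NOT NULL;
ALTER TABLE DocumentTypes ADD CONSTRAINT PK_DocumentTypes PRIMARY KEY CLUSTERED (Id);
```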

sql server multi column index queries

Suppose I have created a single index on two columns, [lastName] and [firstName], in that order. If I then run a query to find the number of people with the first name daniel:
SELECT count(*)
FROM people
WHERE firstName = N'daniel'
Will this search within each section of the first index column (lastName) and use the second column (firstName) to quickly search through each block of lastName entries?
This seems like an obvious thing to do and I assume that it is what happens but you know what they say about assumptions.
Yes, this query may, and probably will, use this index (with an Index Scan) if the query optimizer thinks that it is better to "quickly search through each of the blocks of LastName entries", as you say, than to do a full scan of the table.
An index on (firstName) would be more efficient for this particular query though, so if there is such an index, SQL Server will use that one (and do an Index Seek).
Tested in SQL-Server 2008 R2, Express edition:
CREATE TABLE Test.dbo.people
( lastName NVARCHAR(30) NOT NULL
, firstName NVARCHAR(30) NOT NULL
) ;
INSERT INTO people
VALUES
('Johnes', 'Alex'),
... --- about 300 rows
('Johnes', 'Bill'),
('Brown', 'Bill') ;
Query without any index, Table Scan:
SELECT count(*)
FROM people
WHERE firstName = N'Bill' ;
Query with index on (lastName, firstName), Index Scan:
CREATE INDEX last_first_idx
ON people (lastName, firstName) ;
SELECT ...
Query with index on (firstName), Index Seek:
CREATE INDEX first_idx
ON people (firstName) ;
SELECT ...
If you have an index on (lastname, firstname), in this order, then a query like
WHERE firstname = 'daniel'
won't use the index unless you also include the first column of the composite index (i.e. lastname) in the WHERE clause. To efficiently search for firstname only, you will need a separate index on that column.
If you frequently search on both columns, create two separate single-column indexes. But keep in mind that each index must be updated on every insert/update, which affects write performance.
Also, avoid composite indexes if they aren't covering indexes at the same time. For tips regarding composite indexes see the following article at sql-server-performance.com:
Tips on Optimizing SQL Server Composite Indexes
Update (to address downvoters):
In this specific case of SELECT Count(*) the index is a covering index (as pointed out by @ypercube in the comment), so the optimizer may choose it for execution. Using the index in this case means an Index Scan and not an Index Seek.
Doing an Index Scan means reading every row in the index. This will be faster if the index occupies fewer pages than the whole table, which is the case when it holds fewer columns. An unfiltered index always has as many rows as the table itself, so for a wide index there usually won't be a big difference between a Clustered Index Scan (iterates over the clustered index, typically the PK) and a Non-Clustered Index Scan (iterates over the index). A Table Scan (as seen in the screenshot of @ypercube's answer) means that there is no clustered index on the table (it is a heap), so the scan cannot take advantage of the ordered data layout that a clustered index provides.
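The seek-versus-scan distinction discussed in this thread can be summarized with a few predicates. The sketch assumes the people table and the last_first_idx index on (lastName, firstName) from the test above; the plan the optimizer actually picks still depends on statistics and on which other indexes exist.

```sql
-- Leading column is constrained: the composite index can be used for a seek.
SELECT COUNT(*) FROM people WHERE lastName = N'Brown';
SELECT COUNT(*) FROM people WHERE lastName = N'Brown' AND firstName = N'Bill';

-- Only the second column is constrained: the index still contains firstName,
-- so it can answer the query, but only by scanning all of its rows.
SELECT COUNT(*) FROM people WHERE firstName = N'Bill';
```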

What is the difference between composite non clustered index and covering index

SQL Server 2005 includes a "covering index" feature, which allows us to select more than one non-key column to be included in an existing non-clustered index.
For example, I have the following columns:
EmployeeID, DepartmentID, DesignationID, BranchID
Here are two scenarios:
1. EmployeeID is a primary key with clustered index and the remaining columns (DepartmentID, DesignationID, BranchID) are taken as non clustered index (composite index).
2. EmployeeID is a primary key with clustered index and DepartmentID is non clustered index with DesignationID, BranchID as "included columns" for the non clustered index.
What is the difference between the above two? If both are the same, what does the "Covering Index" concept add?
The difference is that if there are two rows with the same DepartmentID in the first index they will be sorted based on their values of DesignationID and BranchID. In the second case they will not be sorted relative to each other and could appear in any order in the index.
In terms of what this means to your application:
A query which can use an index on (DepartmentID, DesignationID) can be more efficient with the first index than the second.
Building the first index may take slightly longer because of the extra sorting required.
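The two scenarios from the question could be written as below. `Employee` is a hypothetical table name (the question only lists columns), and the index names and literal values are made up.

```sql
-- Scenario 1: composite key. Index rows are ordered by all three columns.
CREATE NONCLUSTERED INDEX IX_Employee_Dept_Desig_Branch
ON Employee (DepartmentID, DesignationID, BranchID);

-- Scenario 2: ordered by DepartmentID only. The other two columns are stored
-- at the leaf level but are not part of the sort order.
CREATE NONCLUSTERED INDEX IX_Employee_Dept_Incl
ON Employee (DepartmentID)
INCLUDE (DesignationID, BranchID);

-- Both indexes cover this query. The first can seek on both predicates;
-- the second can seek on DepartmentID only and must then filter the
-- matching rows on DesignationID.
SELECT EmployeeID, BranchID
FROM Employee
WHERE DepartmentID = 10 AND DesignationID = 3; -- hypothetical values
```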
A covering index is an index that contains every column a query needs; a nonclustered index with an INCLUDE clause is one way to build one.
