What fields should be indexed on a given table?

What fields should be indexed on a given table? - sql-server

I've a table with a lot of registers (more than 2 million). It's a transaction table but I need a report with a lot of joins. Whats the best practice to index that table because it's consuming too much time.
I'm paging the table using the storedprocedure paging method but I need an index because when I want to export the report I need to get the entire query without pagination and to get the total records I need a select all.
Any help?

The SQL Server 2008 Management Studio query tool, if you turn on "Include Actual Execution Plan", will tell you what indexes a given query needs to run fast. (Assuming there's an obvious missing index that is making the query run unusually slow, that is.)
SQL Server 2008 Management Studio Query Screenshot http://img208.imageshack.us/img208/4108/image4sy8.png
We use this all the time on Stack Overflow.. one of the best features of SQL 2008. It works against older SQL instances as well, just install the SQL 2008 tools and point them at a SQL 2005 instance. Not sure if it works on anything earlier, though.
As others have noted, you can also do this manually, but it takes a bit of trial and error. You'll want indexes on fields that are used in ORDER BY and WHERE clauses.

key fields have to be everithing in
the where clause ???
No, that would be overkill. Indexing a field really only works if a) your WHERE clause is selective enough (that is: only selects out about 1-2% of the values; an index on a "Gender" field which can be only one of two or three possible values is pointless), and b) your WHERE clause doesn't involve function calls or other magic.
In your case, TBL.Status might be a candidate - how many possible values are there? You select the '1' and '2' value - if there are hundreds of possible values, then it's a good choice.
On a side note:
this clause here: (TBL.Login IS NULL AND TBL.Login <> 'dev' ) is pretty pointless - if the value of TBL.login IS NULL, then it's DEFINITELY not 'dev' ..... so just the "IS NULL" will be more than sufficient......
The other field you might want to consider putting an index on is the TBL.Date, since you seem to select a range of dates here - that might be a good choice.
Also, on a general note: whenever possible, DO NOT use a SELECT * FROM ...... to select your fields. This causes a lot of overhead for SQL Server. SPECIFY your columns - and ONLY select those that you REALLY NEED - not just all of them for the heck of it.....

Check your queries, and find which fields are used to match them. Those are usually the best candidates!

SQL Server has a 'Database Engine Tuning Advisor' that could help you. This does not exist for SQL Server Express, but does for all other versions of SQL Server.
Load your query in a query window.
On the menu, click Query -> Analyze Query in Database Engine
Tuning Advisor
The tuning advisor will identify indexes that could be added to your table(s) to improve performance. In my experience, the tuning advisor doesn't always help, but most of the time it does. It's where I suggest you start.

ok this is the query in doing
SELECT
TBL.*
FROM
FOREINGDATABASE..TABLENAME TBL
LEFT JOIN Status S
ON TBL.Status = S.Number
WHERE
(TBL.ID = CASE #Reference WHEN 0 THEN TBL.ID ELSE #Reference END) AND
TBL.Date >= #FechaInicial AND
TBL.Date <= #FechaFinal AND
(TBL.Channel = CASE #Canal WHEN '' THEN TBL.Channel ELSE #Canal END)AND
(TBL.DocType = CASE #TipoDocumento WHEN '' THEN TBL.DocType ELSE #TipoDocumento END)AND
(TBL.Document = CASE #NumDocumento WHEN '' THEN TBL.Document ELSE #NumDocumento END)AND
(TBL.Login = CASE #Login WHEN '' THEN TBL.Login ELSE #Login END)AND
(TBL.Login IS NULL AND TBL.Login <> 'dev' ) AND
TBL.Status IN ('1','2')
key fields have to be everithing in the where clause ???

If I am not mistaken, please correct me if I am, I think you should create non-clustered Index on the fields of the conditions of the where clause. (Maybe this can be useful as a starting point to get some candidates for the indexes).
Good Luck

if an Index Scan instead of a seek is performed, the cause might be that the fields are not in the correct order in the index.

put indexes on all columns that you're joining and filtering on.
the use of indexes is also determined by the selectivity of the indexed column.
the best way would be to show us your query so we can try to improve it.

Related

SQL Server : view MUCH slower than same query by itself

From this SO answer a view should provide the same performance as using the same query directly.
Is querying over a view slower than executing SQL directly?
I have a view where this is not true.
This query targeting a view
SELECT
*
FROM
[Front].[vw_Details] k
WHERE
k.Id = 970435
Takes 10 seconds to complete. Copying the query from the view and adding WHERE k.Id = 970435 to it completes in less than 1 second. The view is nothing special, 4 LEFT JOINs, and a few CASE directives to clean up data.
How can I figure out what the problem is, or what do I need to complete this question with in order for this to be answerable?
Update 1:
SQL Server Version: 12.0.4436.0
Query plan for view: https://pastebin.com/RY40Ab0k
Query plan for select: https://pastebin.com/gwahhgpu

Your query plan is no longer visible, but if you look in the plan, you will most likely see a triangle complaining about the cardinality estimation and/or implicite declaration. What it means is that you are joining tables in a way where your keys are hard to guess for the SQL engine.
It is instant when you run from a query directly, probably because it doesn't need to guess the size of your key is
For example:
k.Id = 970435
SQLSERVER already knows that it is looking for 970435 a 6 digit number.
It can eliminate all the key that doesn't start by 9 and doesn't have 6 digits. :)
However, in a view, it has to build the plan in a way to account for unknown. Because it doesn't know what kind of key it may hold.
See the microsoft for various example and scenario that may help you.
https://learn.microsoft.com/en-us/sql/relational-databases/query-processing-architecture-guide?view=sql-server-ver15
If you are always looking for an int, one work around is to force the type with a cast or convert clause. It's may cause performance penalty depending on your data, but it is a trick in the toolbox to tell sql to not attempt the query a plan as varchar(max) or something along that line.
SELECT *
FROM [Front].[vw_Details] k
WHERE TRY_CONVERT(INT,k.Id) = 970435

use a stored procedure to return results. stored procedures use indexes, whereas, views often don't
or
use a table function and query the table function

How to speed up SQL like %pattern% searches?

Is there any way to speed up queries like this below ? I am looking for option which would require minimal change to application code.
SELECT *
FROM my_table
WHERE some_column like '%my string%'
ORDER BY some_column
The table which causes most of the slowdown has 2,5 million records and query takes 10 seconds to execute.
Execution plan tells that 99% of the cost is index scan (NonClustered), which is understandable because of LIKE and pattern with "%" on both sides.
If there is "%" just at the end, then index seek is used and query executes in a moment.
So I am looking for something like:
to add some kind of aditional index on the table, probably not
possible ?
a way to put this table and/or index into RAM sa the seek would be
faster
anything else ?
I can use either MS SQL 2012 or 2014, both standard edition.
Bonus question
Is it possible that this very same queries would execute instanteniously on DB2 database ? App was using db2 initially but was migrated over to MS SQL.

There may not be an answer, but there is a reason. When you use a search string with a leading wildcard, such as '%string', you're forcing the optimizer to do a table scan.
You might want to revisit some of the suggestions in this thread.
Good luck!

You can try changing the column collation to some binary form:
in query:
SELECT *
FROM my_table
WHERE some_column COLLATE Latin1_General_BIN like '%my string%'
ORDER BY some_column
or change it in the table design permanently if you can.
Caveat: it's cAsE sEnSiTiVe.
Edit: you can get around case sensitivity by converting both the column and the search string to upper case for example:
SELECT *
FROM my_table
WHERE UPPER(some_column) COLLATE Latin1_General_BIN like '%MY STRING%'
ORDER BY some_column
Edit 2: backup the database before doing any perpanent collation changes, I'm not sure how exactly it compares but I think in query it should be ok.
Explation article.

I'm not sure this solution is an option for you as it stores more data in database. it also may increase the time for update/insert, but its an idea anyway. Too long to put in to comment, so don't blame me!
Add a persisted computed column for the some_column with this formula: REVERSE(some_column) to store reverse of the string
Add index on that column
In your query use some_column like 'my string%' or rev_some_column like REVERSE('%my string'). You'd better to replace REVERSE('%my string') with a variable initiated before query.
I think in this case, both likes will use index.

MSSQL/Oracle Query Tuning 500,000+ records Coldfusion - does lower() reduce performance

I'm not trying to start a debate on which is better in general, I'm asking specifically to this question. :)
I need to write a query to pull back a list of userid (uid) from a database containing 500k+ records. I'm returning just the one field, uid. I can query either our Oracle box or our MSSQL 2000 box. The query looks like this (this has not been simplied)
select uid
from employeeRec
where uid = 'abc123'
Yes, it really is that simply of a query. Where I need the tuninig help is that the uid is indexed and some uid could be (not many but some) 'ABC123' or 'abc123'. MSSQL doesn't care of the case-sensitivity whereas Oracle does. So for Oracle, my query would look like this:
select uid
from employeeRec
where lower(uid) = 'abc123'
I've learned that if you use lower on an index field in MSSQL, you render the index useless (there are ways around it but that is beyond the scope of my question here - since if I choose MSSQL, I don't need to use lower at all). I wanted to know if I choose Oracle, and use the lower() function, will that also hurt performance of the query?
I'm looping over this query about 200 times in addition to some other queries that are being run and to process the entire loop takes 1 second per iteration and I've narrowed down the slowness to this particular query. For a web page, 200 seconds seems like eternity. For you CF readers, timeout value has been increased so the page doesn't error out and there are no page errors, I'm just trying to speed up this query.
Another item to note: This database is in a different city than the other queries being run so I do expect some lag time there.

As TomTom put, your index will simply not be used by Oracle. But, you can create a function based index, and this new index will be used when you issue your query.
create index my_new_ix on employeeRec(lower(uid));

Wrapping an indexed column in a function call would have the potential to cause performance problems in Oracle. Oracle couldn't use a plain index on UID to process your query. On the other hand, you could create a function-based index on lower(uid) that would be used by the query, i.e.
CREATE INDEX case_insensitive_idx
ON employeeRec( lower( uid ) );
Note that if you want to do case-insensitive queries in general, you may be better served setting NLS parameters to force case-insensitivity. You'd still need function-based indexes on the columns you're searching on, but it can simplify your queries a bit.

I wanted to know if I choose Oracle,
and use the lower() function, will
that also hurt performance of the
query?
Yes. The perforamnce reduction is because the index is on the original value and the collation i case sensitive, so all possible values must be run through the function to filter out the ones matching.

How do I filter one of the columns in a SQL Server SQL Query

I have a table (that relates to a number of other tables) where I would like to filter ONE of the columns (RequesterID) - that column will be a combobox where only people that are not sales people should be selectable.
Here is the "unfiltered" query, lets call it QUERY 1:
SELECT RequestsID, RequesterID, ProductsID
FROM dbo.Requests
If using a separate query, lets call it QUERY 2, to filter RequesterID (which is a People related column, connected to People.PeopleID), it would look like this:
SELECT People.PeopleID
FROM People INNER JOIN
Roles ON People.RolesID = Roles.RolesID INNER JOIN
Requests ON People.PeopleID = Requests.RequesterID
WHERE (Roles.Role <> N'SalesGuy')
ORDER BY Requests.RequestsID
Now, is there a way of "merging" the QUERY 2 into QUERY 1?
(dbo.Requests in QUERY 1 has RequesterID populated as a Foreign Key from dbo.People, so no problem there... The connections are all right, just not know how to write the SQL query!)
UPDATE
Trying to explain what I mean in a bit more... :
The result set should be a number of REQUESTS - and the number of REQUESTS should not be limited by QUERY 2. QUERY 2:s only function is to limit the selectable subset in column Requests.RequesterID - and no, it´s not that clear, but in the C# VS2008 implementation I use Requests.RequesterID to eventually populate a ComboBox with [Full name], which is another column in the People table - and in that column I don´t want SalesGuy to show up as possible to select; here I´m trying to clear it out EVEN MORE... (but with wrong syntax, of course)
SELECT RequestsID, (RequesterID WHERE RequesterID != 8), ProductsID
FROM dbo.Requests
Yes, RequesterID 8 happens to be the SalesGuy :-)

here is a very comprehensive article on how to handle this topic:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
it covers all the issues and methods of trying to write queries with multiple optional search conditions. This main thing you need to be concerned with is not the duplication of code, but the use of an index. If your query fails to use an index, it will preform poorly. There are several techniques that can be used, which may or may not allow an index to be used.
here is the table of contents:
Introduction
The Case Study: Searching Orders
The Northgale Database
Dynamic SQL
Introduction
Using sp_executesql
Using the CLR
Using EXEC()
When Caching Is Not Really What You Want
Static SQL
Introduction
x = #x OR #x IS NULL
Using IF statements
Umachandar's Bag of Tricks
Using Temp Tables
x = #x AND #x IS NOT NULL
Handling Complex Conditions
Hybrid Solutions – Using both Static and Dynamic SQL
Using Views
Using Inline Table Functions
Conclusion
Feedback and Acknowledgements
Revision History
if you are on the proper version of SQL Server 2008, there is an additional technique that can be used, see: Dynamic Search Conditions in T-SQL Version for SQL 2008 (SP1 CU5 and later)
If you are on that proper release of SQL Server 2008, you can just add OPTION (RECOMPILE) to the query and the local variable's value at run time is used for the optimizations.
Consider this, OPTION (RECOMPILE) will take this code (where no index can be used with this mess of ORs):
WHERE
(#search1 IS NULL or Column1=#Search1)
AND (#search2 IS NULL or Column2=#Search2)
AND (#search3 IS NULL or Column3=#Search3)
and optimize it at run time to be (provided that only #Search2 was passed in with a value):
WHERE
Column2=#Search2
and an index can be used (if you have one defined on Column2)

How about this? Since the query already joins on the requests table you can simply add the columns to the select-list like so :
SELECT Requests.RequestsID, Requests.RequesterID, Requests.ProductsID
FROM People INNER JOIN
Roles ON People.RolesID = Roles.RolesID INNER JOIN
Requests ON People.PeopleID = Requests.RequesterID
WHERE (Roles.Role <> N'SalesGuy')
ORDER BY Requests.RequestsID
You can in fact select any column from any of the joined tables (Roles, Requests, People, etc.)
It becomes clear if you just replace People.PeopleId with * and it will show you everything retrieved from the tables.

Select statement performance

I'm having a performance issue with a select statement I'm executing.
Here it is:
SELECT Material.*
FROM Material
INNER JOIN LineInfo ON Material.LineInfoCtr = LineInfo.ctr
INNER JOIN Order_Header ON LineInfo.Order_HeaderCtr = Order_Header.ctr
WHERE (Order_Header.jobNum = 'ttest')
AND (Order_Header.revision_number = 0)
AND (LineInfo.lineNum = 46)
The statement is taking 5-10 seconds to execute depending on server load.
Some table stats:
- Material has 2,030,xxx records.
- Lineinfo has 190,xxx records
- Order_Header has 2,5xx records.
My statement is returning a total of 18 rows containing about 20-25 fields of data. Returning a single field or all of them makes no difference. Is this performance typical? Is there something I could do to improve it?
I've tried using a sub select to retrieve the foreign key, the IN clause and I found one post where a fella said using a left outer join helped him. For me, they all yield the same 5 to 10 seconds of execution time.
This is MS SQL server 2005 accessed through MS SQL management studio. Times are the elapsed time in query analyzer.
Any ideas?

The first thing you should do is analyze the query plan, to see what indexes (if any) SQL Server is using.
You can probably benefit from some covering indexes in this query, since you only use columns in Lineinfo and Order_Header for the join and the query restriction (the WHERE clause).

I do not see anything special in your query so, if indexes are correct, it should perform much more faster than that,, the number of rows is not very high.
Do you have indexes on the table involved in the query and have you tried to use the "display execution plan" option of the Query Analyzer. Basically you need to run the query, loop at the execution plan and add indexes so that you do not see any full table scan operation.
If you run from SQL Management studio then you have the option to tune automatically the query adding indexes but I would suggest trying optimize on your own to better understand what you're doing.
Regards
Massimo

It won't affect performance, but don't write a query such as "SELECT * FROM X". Eschew the star notation and spell out the individual columns. The code that calls this will still work that way, even if the schema is changed by adding a column.
Indexes are key here, as others have already said.
The order of the WHERE clauses can help. Execute the one that eliminates the greatest number of rows from consideration first.

Taking all suggestions and rolling them together I was able to setup some indexes and now it's taking less than a second to execute. Honestly, it's almost immediate.
My problem was that by clicking on the table properties I saw that the primary key was indexed and I mistakenly thought that's what everyone had been talking about. I looked at the execution plan and ran the tuning assistant and putting the two together, I realized that you could index the foreign keys too. That is now done and things are exceptionally snappy.
Thanks for the help, and sorry for such a newb question.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight