SQL Server Index Dependency

I'm considering dropping an index from a table in a SQL Server 2005 instance. Is there a way that I can see which stored procedures might have statements that are dependent on that index?

First, check whether the indexes are being used at all. You can use the sys.dm_db_index_usage_stats DMV for that; check the user_seeks and user_scans columns.
See: Use the sys.dm_db_index_usage_stats DMV to check if indexes are being used
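A minimal sketch of that check, assuming a hypothetical table dbo.MyTable (substitute your own). Note the counters reset when the instance restarts, so low numbers may only mean a recent restart:

-- List usage counters for every index on the table; NULLs mean never touched.
SELECT i.name AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id = i.object_id
      AND s.index_id = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.MyTable');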

Nope. For one thing, index selection is dynamic - the indexes aren't selected until the query's plan is compiled and executed.
Barring index hints, but let's not go there.

As le dorfier says, this depends on the execution plan SQL Server determines at runtime. I'd suggest setting up perfmon to track table scans, or keeping SQL Profiler running after you drop the index, filtering on the column names you're indexing. Look for long-running queries.

Related

SQL Server Update Statistics

I have 2 questions about SQL Server statistics, please help me. I am using SQL Server 2016.
My table TBL1 has only one column, COL1. When I used COL1 in joins with other tables, statistics were automatically created on COL1.
Next I created a non-clustered index on COL1 of TBL1, and another set of statistics was created on COL1. Now I have 2 sets of statistics on COL1.
Of the above 2 sets of statistics, which does SQL Server use for further queries? I am assuming that the statistics created by the non-clustered index will be used; am I right?
If I run the UPDATE STATISTICS TBL1 command, all the statistics for TBL1 are updated. In the MSDN documentation, I see that updating statistics causes queries to recompile. What do they mean by recompiling of queries? The MSDN link is
https://learn.microsoft.com/en-us/sql/relational-databases/statistics/update-statistics?view=sql-server-ver15
Please explain.
If there's only 1 column in your table, there's no reason to have a non-clustered index, which creates a separate copy of that data. Just create the clustered index on that column.
Yes - since your table has only the one column and an index was created on that column, it's almost certain that SQL Server will use that index whenever joining to that table, and thus the statistics for that index will be used.
In this context, it means that the execution plan in the cache will be invalidated due to stale statistics, and the next time the query executes the optimizer will recreate an execution plan. In other words, it is assumed there may be a better set of steps to execute the query, and the optimizer will try to assemble a better set of steps (an execution plan).
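To see both sets of statistics side by side, here is a small sketch assuming the TBL1/COL1 names from the question (auto-created statistics get names starting with _WA_Sys_, while index statistics share the index name):

SELECT s.name AS stats_name, s.auto_created, c.name AS column_name
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
JOIN sys.columns AS c
    ON c.object_id = sc.object_id AND c.column_id = sc.column_id
WHERE s.object_id = OBJECT_ID('dbo.TBL1');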
Recommended Reading:
SQL Server Statistics
Understanding Execution Plans
Execution Plan Caching & Reuse

Slow query with cfqueryparam searching on indexed column containing hashes

I have the following query that runs in 16ms - 30ms.
<cfquery name="local.test1" datasource="imagecdn">
SELECT hash FROM jobs WHERE hash in(
'EBDA95630915EB80709C69089315399B',
'3617B8E6CF0C62ECBD3C48DDF8585466',
'D519A38F09FDA868A2FEF1C55C9FEE76',
'135F94C3774F7719CFF8FF3A275D2D05',
'D58FAE69C559273D8427673A08193789',
'2BD7276F209768F2FCA6635659D7922A',
'B1E3CFBFCCFF6F5B48A849A050E6D424',
'2288F5B8A797F5302E8CA24323617236',
'8951883E36B5D38A4643DFAA0396BF13',
'839210BD564E30BE1355D1A6D4EF7081',
'ED4A2CB0C28B608C29576819CF7BE19B',
'CB26925A4874945B810707D5FF0B91F2',
'33B2FC229F0CC797A02AD163CDBA0875',
'624986E7547DBAC0F47B3005CFDE0A16',
'6F692C289BD805CEE41EF59F83F16F4D',
'8551F0033C617BD9EADAAD6CEC4B3E9E',
'94C3C0A74C2DE085FF9F1BBF928821A4',
'28DC1A9D2A69C2EDF5E6C0E6368A0B3C'
)
</cfquery>
If I execute the same query but use cfqueryparam it runs in 500ms - 2000ms.
<cfset local.hashes = "[list of the same ids as above]">
<cfquery name="local.test2" datasource="imagecdn">
SELECT hash FROM jobs WHERE hash in(
<cfqueryparam cfsqltype="cf_sql_varchar" value="#local.hashes#" list="yes">
)
</cfquery>
The table has roughly 60,000 rows. The "hash" column is varchar(50) and has a unique non-clustered index, but is not the primary key. DB server is MSSQL 2008. The web server is running the latest version of CF9.
Any idea why cfqueryparam causes the performance to bomb out? It behaves this way every single time, no matter how many times I refresh the page. If I pare the list down to only 2 or 3 hashes, it still performs poorly, at around 150-200ms. When I eliminate cfqueryparam, the performance is as expected. In this situation there is the possibility of SQL injection, so using cfqueryparam would certainly be preferable, but it shouldn't take 100ms to find 2 records in an indexed column.
Edits:
We are using hashes generated by hash(), not UUIDs or GUIDs. Each hash is generated by hash(SerializeJSON({ struct })), where the struct contains the plan for a set of operations to execute on an image. The purpose of this is that it lets us know, before insert and before query, the exact unique id for that structure. These hashes act as an "index" of which structures have already been stored in the DB. In addition, with hashes the same structure always hashes to the same result, which is not true of UUIDs and GUIDs.
The query is being executed on 5 different CF9 servers and all of them exhibit the same behavior. To me this rules out the idea that CF9 is caching something. All servers connect to the exact same DB, so if caching were occurring it would have to be at the DB level.
Your issue may be related to VARCHAR vs NVARCHAR. These 2 links may help:
Querying MS SQL Server G/UUIDs from ColdFusion and
nvarchar vs. varchar in SQL Server, BEWARE
What might be happening: there is a setting in ColdFusion Administrator that controls whether cfqueryparam sends varchars as Unicode (nvarchar) or not. If that setting does not match the column's type (in your case, if that setting is enabled), SQL Server has to implicitly convert the varchar column to nvarchar to do the comparison, and it will not use that index.
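A hedged illustration of that mismatch, using the jobs.hash column from the question (the N'...' literal stands in for what the driver sends when the Unicode setting is enabled):

-- Parameter type matches the varchar(50) column: the unique index is seekable.
SELECT hash FROM jobs WHERE hash = 'EBDA95630915EB80709C69089315399B';

-- Parameter arrives as nvarchar: the varchar column must be implicitly
-- converted to nvarchar, which on this vintage of SQL Server prevents a seek.
SELECT hash FROM jobs WHERE hash = N'EBDA95630915EB80709C69089315399B';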
As Mark points out, it has probably got a bad execution plan in the cache. One of the advantages of cfqueryparam is that when you pass in different values, SQL Server can reuse the cached plan it has for that statement. This is why you see no improvement when you try it with a smaller list. When you do not use cfqueryparam, SQL Server has to work out the execution plan each time. This is normally a bad thing, unless there is a suboptimal plan in the cache. Try clearing the cache as explained here: http://www.devx.com/tips/Tip/14401 - hopefully the next time you run your statement with cfqueryparam it'll cache the better plan.
Make sense?
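For reference, a minimal sketch of clearing the plan cache; it is server-wide, so treat it as a troubleshooting step rather than something to run casually in production:

-- Flush all cached plans so the next execution compiles a fresh one.
DBCC FREEPROCCACHE;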
I don't think cfqueryparam is causing the issue. Given the big jump in execution time, the index may not be getting used for your query when you use cfqueryparam. I created the same scenario on my development computer, but I got the same execution time with and without cfqueryparam. There may be some overhead in using a list: in the first query you pass it directly as text, while in the second ColdFusion needs to create the query parameters from the provided list, but that should not cost that much. I suggest starting SQL Server Profiler and monitoring the queries executed on the server; this will give you a better idea of what is costing the extra 500 ms.

SQL Query Optimization

I am using SQL Server 2008 and I need to optimize my queries. For that purpose I am using the Database Engine Tuning Advisor.
My question is: can I check the performance of only one SQL query at a time, or more than one using a new session?
To analyze one query at a time, right-click it in the SSMS script window and choose the option "Analyze Query in DTA". For this workload, select the option "Keep all existing PDS" to avoid loads of drop recommendations for indexes not used by the query under examination.
To do more than one, first capture a trace file with a representative workload sample; then you can analyse that with the DTA.
There are simple steps that should be followed when writing a SQL query:
1. Name the columns in the SELECT list instead of using *.
2. Avoid subqueries.
3. Avoid the IN operator.
4. Use HAVING only as a filter on GROUP BY results.
5. Don't save images in the database; save the image path in the database instead. Saving images in the DB takes a lot of space, and each save or retrieval requires serialization.
6. Each table should have a primary key.
7. Each table should have a minimum of one clustered index.
8. Each table should have an appropriate number of non-clustered indexes; create them on columns based on the queries that actually run.
9. Follow this priority order when creating any index: a) WHERE clause, b) JOIN clause, c) ORDER BY clause, d) SELECT clause.
10. Do not use views; replace views with the original source tables.
11. Triggers should not be used if possible; incorporate the trigger's logic into a stored procedure.
12. Remove any ad-hoc queries and use stored procedures instead.
13. Check that at least 30% of the HDD is empty; it improves performance a bit.
14. If possible, move the logic of UDFs into stored procedures as well.
15. Remove any unnecessary joins.
16. If a cursor is used in a query, see if there is any other way to avoid it (either SELECT ... INTO or INSERT ... INTO, etc.); a sketch follows this list.
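As a hedged sketch of point 16, with hypothetical table and column names, a single set-based INSERT ... SELECT replaces a cursor that would otherwise copy rows one at a time:

-- One statement does what a cursor loop would do row by row.
INSERT INTO dbo.ArchiveOrders (OrderID, OrderDate)
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE OrderDate < '20080101';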

SQL Server STATISTICS

So for this one project, we have a bunch of queries that are executed on a regular basis (every minute or so). I used "Analyze Query in Database Engine Tuning Advisor" to check on them.
They are pretty simple:
select * from tablex where processed='0'
There is an index on processed, and each query should return fewer than 1,000 rows from a table with 1 million records.
The Analyzer recommended creating some STATISTICS for this... So my question is: what are those statistics? Do they really help performance? How costly are they for a table like the above?
Please bear in mind that by no means would I call myself an experienced SQL Server user... And this is the first time I've used this Analyzer.
Statistics are what SQL Server uses to determine the viability of how to get data.
Let's say, for instance, that you have a table that only has a clustered index on the primary key. When you execute SELECT * FROM tablename WHERE col1=value, SQL Server only has one option, to scan every row in the table to find the matching rows.
Now we add an index on col1, so you assume that SQL Server will use the index to find the matching rows, but that's not always true. Let's say the table has 200,000 rows and col1 only has 2 values: 1 and 0. When SQL Server uses an index to find data, the index contains pointers back to the clustered index position. Given there are only two values in the indexed column, SQL Server may decide it makes more sense to just scan the table, because using the index would be more work.
Now we'll add another 800,000 rows of data to the table, but this time the values in col1 are widely varied. Now it's a useful index because SQL Server can viably use the index to limit what it needs to pull out of the table. Will SQL Server use the index?
It depends. And what it depends on are the Statistics. At some point in time, with AUTO UPDATE STATISTICS set on, the server will update the statistics for the index and know it's a very good and valid index to use. Until that point, however, it will ignore the index as being irrelevant.
That's one use of statistics. But there is another use, and that one isn't related to indices. SQL Server keeps basic statistics about all of the columns in a table. If there's enough distinct data to make it worthwhile, SQL Server will actually create a temporary index on a column and use that to filter. While this takes more time than using an existing index, it takes less time than a full table scan.
Sometimes you will get recommendations to create specific statistics on columns that would be useful for that. These aren't indices, but they do keep track of a statistical sampling of the data in the column, so SQL Server can determine whether it makes sense to create a temporary index to return the data.
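As a sketch, such a recommendation might look like this, assuming the tablex/processed names from the question:

-- Column statistics only: no index structure is built, just a histogram the
-- optimizer can consult when estimating how many rows match processed='0'.
CREATE STATISTICS stat_tablex_processed ON dbo.tablex (processed);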
HTH
In SQL Server 2005, set AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS on. You won't have to worry about creating or maintaining statistics yourself, since the database handles this very well on its own.
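A minimal sketch, assuming a database named MyDb (substitute your own):

ALTER DATABASE MyDb SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE MyDb SET AUTO_UPDATE_STATISTICS ON;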

SQL Server query execution plan shows wrong "actual row count" on a used index and performance is terribly slow

Today I stumbled upon an interesting performance problem with a stored procedure running on SQL Server 2005 SP2 in a db running at compatibility level 80 (SQL 2000).
The proc runs about 8 minutes, and the execution plan shows the usage of an index with an actual row count of 1,339,241,423, which is about a factor of 1000 higher than the "real" actual row count of the table itself, which is 1,144,640, as shown correctly by the estimated row count. So the actual row count given by the query plan is definitely wrong!
Interestingly enough, when I copy the proc's parameter values inside the proc to local variables and then use the local variables in the actual query, everything works fine - the proc runs in 18 seconds and the execution plan shows the right actual row count.
EDIT: As suggested by TrickyNixon, this seems to be a sign of the parameter sniffing problem. But actually, I get exactly the same execution plan in both cases. The same indexes are being used in the same order. The only difference I see is the way too high actual row count on the PK_ED_Transitions index when directly using the parameter values.
I have done DBCC DBREINDEX and UPDATE STATISTICS already, without any success.
DBCC SHOW_STATISTICS shows good data for the index, too.
The proc is created WITH RECOMPILE, so every time it runs a new execution plan is compiled.
To be more specific - this one runs fast:
CREATE Proc [dbo].[myProc](
    @Param datetime
)
WITH RECOMPILE
as
set nocount on

declare @local datetime
set @local = @Param

select
    some columns
from
    table1
where
    column = @local
group by
    some other columns
And this version runs terribly slow, but produces exactly the same execution plan (besides the way too high actual row count on a used index):
CREATE Proc [dbo].[myProc](
    @Param datetime
)
WITH RECOMPILE
as
set nocount on

select
    some columns
from
    table1
where
    column = @Param
group by
    some other columns
Any ideas?
Anybody out there who knows where SQL Server gets the actual row count value from when calculating query plans?
Update: I tried the query on another server with compat mode set to 90 (SQL 2005). It's the same behavior. I think I will open an MS support call, because this looks to me like a bug.
OK, finally I got to it myself.
The two query plans differ in a small detail which I missed at first: the slow one uses a Nested Loops operator to join two subqueries together. That results in the high number for the actual row count on the index scan operator, which is simply the result of multiplying the number of rows of input A by the number of rows of input B.
I still don't know why the optimizer decides to use the nested loops instead of a hash match, which runs 1000 times faster in this case, but I could handle my problem by creating a new index, so that the engine does an index seek instead of an index scan under the nested loops.
When you're checking execution plans of the stored proc against the copy/paste query, are you using the estimated plans or the actual plans? Make sure to click Query > Include Actual Execution Plan, and then run each query. Compare those plans and see what the differences are.
It sounds like a case of Parameter Sniffing. Here's an excellent explanation along with possible solutions: I Smell a Parameter!
Here's another StackOverflow thread that addresses it: Parameter Sniffing (or Spoofing) in SQL Server
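Besides the local-variable trick the question already shows, here is a hedged sketch of two query-level workarounds that SQL Server 2005 supports (the date literal is a made-up representative value, and the table/column names are the question's placeholders):

-- Recompile just this statement with the actual parameter value each run:
select some columns from table1 where column = @Param
group by some other columns
OPTION (RECOMPILE);

-- Or pin the plan to a representative value instead of the sniffed one:
select some columns from table1 where column = @Param
group by some other columns
OPTION (OPTIMIZE FOR (@Param = '20080101'));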
To me it still sounds as if the statistics were incorrect. Rebuilding the indexes does not necessarily update them.
Have you already tried an explicit UPDATE STATISTICS for the affected tables?
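For example, a minimal sketch using the table1 placeholder from the question; FULLSCAN reads every row instead of a sample, so the histograms are as accurate as they can be:

UPDATE STATISTICS dbo.table1 WITH FULLSCAN;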
Have you run sp_spaceused to check if SQL Server's got the right summary for that table? I believe in SQL 2000 the engine used to use that sort of metadata when building execution plans. We used to have to run DBCC UPDATEUSAGE weekly to update the metadata on some of the rapidly changing tables, as SQL Server was choosing the wrong indexes due to the incorrect row count data.
You're running SQL 2005, and BOL says that in 2005 you shouldn't have to run UpdateUsage anymore, but since you're in 2000 compat mode you might find that it is still required.
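A sketch of that check, again using the table1 placeholder (the 0 means the current database):

EXEC sp_spaceused 'dbo.table1';
-- Correct the catalog's row and page counts if they have drifted:
DBCC UPDATEUSAGE (0, 'dbo.table1');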
