What is the purpose of automatically creating statistics for a non-indexed column? - sql-server

In SQL Server, creating an index automatically creates a statistics object for that index, and the optimizer uses it to decide on the best query execution plan.
A statistics object is also automatically created for columns used in the WHERE clause - for example:
SELECT *
FROM AWSales
WHERE ProductID = 898
The above query automatically creates a statistics object for ProductID. What purpose does this serve?
Since the non-indexed column is unsorted, and it is not stored in a B-tree structure, how do statistics help choose a better query plan than a table scan?
I thought the purpose of statistics was to let the engine choose between using an index or not, and between a seek and a scan. What knowledge am I missing?

It serves the same purpose as the statistics created for an index. The optimizer uses the statistics to estimate row counts and chooses the execution plan with the lowest estimated cost in terms of CPU time and I/Os.
When no index on the table covers the column in the WHERE clause (ProductID, in your example), SQL Server creates statistics on that column so it has a histogram from which to estimate how many rows match the value you supplied, unless it already has a cached plan.
In your execution plan you can see which statistics the engine used to pick the plan by viewing the properties of the SELECT operator (the left-most one) and expanding the OptimizerStatsUsage property.
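If you want to see these statistics objects yourself, a rough sketch is to query the catalog views and then inspect the histogram (the dbo.AWSales name is taken from the question; the auto-created statistics names will differ on your system):
-- List the statistics on the table and flag the auto-created ones
SELECT s.name AS stats_name,
       s.auto_created,
       c.name AS column_name
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
JOIN sys.columns AS c
    ON c.object_id = sc.object_id AND c.column_id = sc.column_id
WHERE s.object_id = OBJECT_ID('dbo.AWSales');

-- Then view the histogram the optimizer used, substituting a statistics
-- name returned above (auto-created statistics are named _WA_Sys_...):
-- DBCC SHOW_STATISTICS ('dbo.AWSales', '<statistics name>');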

Related

Is there any benefit to indexing the base tables of an indexed view?

After I created the indexed view, I tried disabling all the indexes on the base tables, including the indexes on the foreign key columns (the constraints are still there), and the query plan for the view stayed the same.
It seems like magic to me that the indexed view can speed up the query so much even without the base tables being indexed. Even without any additional index on the view, SQL Server is able to do an index scan on the primary key index of the indexed view and retrieve data something like 1000 times faster than from the base table.
Something like SELECT * FROM MyView WITH(NOEXPAND) WHERE NotIndexedColumn = 5 ORDER BY NotIndexedColumn
So the first two questions are:
Is there any benefit to indexing the base tables of an indexed view?
What is SQL Server doing when it performs an index scan on the PK while the filter is on a non-indexed column?
Then I noticed that if I combine full-text search with ORDER BY, I see a table spool (eager spool) in the query plan with a cost of around 95%.
Query looks like SELECT ID FROM View WITH(NOEXPAND) WHERE CONTAINS(IndexedColumn, '"SomeText*"') ORDER BY IndexedColumn
Question n° 3:
Is there any index I could add to get rid of that operation?
It's important to understand that an indexed view is a "materialized view": its result set is stored on disk.
So the speedup you are seeing comes from the fact that the result of the view's query is already persisted and only has to be read back.
To answer your questions:
1) Is there any benefit to indexing the base tables of an indexed view?
This is situational. If your view flattens out data or adds a lot of aggregate columns, then reading from the indexed view is better than hitting the table. If you are just using your indexed view like
SELECT * FROM foo WHERE createdDate > GETDATE()
then probably not. But if you are doing something like SELECT SUM(price), MIN(id) FROM x GROUP BY id, price, then the indexed view will probably pay off. Granted, in practice you would usually be doing a more complex query with joins and other operations.
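As a rough sketch of that aggregate case (the table and column names are made up; note that indexed views come with restrictions - they need SCHEMABINDING, a COUNT_BIG(*) when grouping, and cannot contain MIN/MAX - so this sketch uses SUM only):
-- Hypothetical base table
CREATE TABLE dbo.x
(
    id    int           NOT NULL,
    price decimal(10,2) NOT NULL
);
GO

-- The view must be schema-bound and reference the table by two-part name
CREATE VIEW dbo.PriceSummary
WITH SCHEMABINDING
AS
SELECT id,
       SUM(price)   AS TotalPrice,
       COUNT_BIG(*) AS RowCnt      -- required when the view uses GROUP BY
FROM dbo.x
GROUP BY id;
GO

-- The unique clustered index is what materializes the view's result to disk
CREATE UNIQUE CLUSTERED INDEX IX_PriceSummary ON dbo.PriceSummary (id);
GO
Once that clustered index exists, queries that reference the view with NOEXPAND read the persisted rows instead of recomputing the aggregate from the base table.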
2) What is SQL Server doing when it performs an index scan on the PK while the filter is on a non-indexed column?
First we need to understand how clustered indexes are stored. The index is a B-tree, so when you search on the clustered key, SQL Server walks the tree to find all values that match your criteria. How your indexes are set up - covering vs. non-covering, and how your non-clustered indexes are defined - determines what the pages and extents look like. Without more knowledge of the table structure I can't say exactly what the scan is doing, but the catalog query below shows which indexes exist and which columns they carry.
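For example, a generic catalog query (dbo.MyView is the name from your example; substitute your own table or view) that lists each index with its key and included columns:
SELECT i.name            AS index_name,
       i.type_desc,
       c.name            AS column_name,
       ic.key_ordinal,
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.MyView')   -- substitute your table or view name
ORDER BY i.name, ic.is_included_column, ic.key_ordinal;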
3) Is there any index I could add to get rid of that operation?
Just because something accounts for 95% of the cost in the plan doesn't make it a bad thing. The percentages always add up to 100%, so no matter what you do, something will take up a large share. What you actually need to check is the number of I/O reads and how long the query itself takes.
When measuring, keep in mind that SQL Server caches data pages (and compiled plans), so a query can be slow the first time and much quicker afterwards once the data is in memory. It all depends on how often the query runs and how your system is set up.
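A simple way to measure this is to turn on the session's I/O and timing statistics and rerun the query from the question (the commented-out DBCC call clears the buffer pool and should only ever be used on a test server, to simulate a cold cache):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Test servers only: clear cached data pages to measure a cold-cache run
-- DBCC DROPCLEANBUFFERS;

SELECT ID
FROM dbo.[View] WITH (NOEXPAND)
WHERE CONTAINS(IndexedColumn, '"SomeText*"')
ORDER BY IndexedColumn;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;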
For a more in-depth read, see the SQL Server documentation on indexed views.

Optimizing a table for the latest/last rows in Azure SQL Server

I have a table in an MS Azure SQL DB with 60,000 rows that is starting to take longer to query with a SELECT statement. The first column is the ID column, which is the primary key; as of right now there are no other indexes. The thing about this table is that the rows are recent news articles, so the last rows in the table will always be accessed more than the older ones.
If possible, how can I tell SQL Server to start querying at the end of the table working backwards when I do a SELECT operation?
Also, what can I do with indexes to make reading from the table faster with the last rows as the priority?
Typically, the SQL Server query optimizer chooses a data access strategy based on the available indexes, the data distribution statistics, and the query itself. For example, SQL Server can scan an index forward, backward, in physical (allocation) order, and so on; the choice depends on many variables.
In your example, if there is a date/time column in the table, you can index it and use it in your predicate(s). The optimizer will automatically use that index if it is the most selective access path.
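For example (the table and column names here are made up for illustration), an index keyed on the date column lets the optimizer seek straight to the newest rows:
CREATE NONCLUSTERED INDEX IX_Articles_PublishedDate
ON dbo.Articles (PublishedDate DESC);

-- Recent-articles queries then become a small range read of the index
SELECT TOP (50) ID, Title, PublishedDate
FROM dbo.Articles
WHERE PublishedDate >= DATEADD(DAY, -7, SYSUTCDATETIME())
ORDER BY PublishedDate DESC;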
Alternatively, you can partition the table on a column and access the most recent data through the partitioning key. This is a common use of partitioning with a rolling window. With this approach, the predicate in your queries specifies the partitioning column, which lets the optimizer pick only the relevant partitions to scan. This can dramatically reduce the amount of data that needs to be searched, since partition elimination happens before execution, depending on the query plan.
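A minimal sketch of that rolling-window idea (boundary dates and object names are illustrative; in Azure SQL Database all partitions live on the PRIMARY filegroup):
-- Monthly partitions on the date column
CREATE PARTITION FUNCTION pfArticlesByMonth (datetime2)
AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-02-01', '2024-03-01');
GO

CREATE PARTITION SCHEME psArticlesByMonth
AS PARTITION pfArticlesByMonth ALL TO ([PRIMARY]);
GO

-- The table (or its clustered index) would then be created/rebuilt
-- ON psArticlesByMonth (PublishedDate), and a query that filters on the
-- partitioning column touches only the relevant partitions:
SELECT ID, Title
FROM dbo.Articles
WHERE PublishedDate >= '2024-03-01';
Each period you would split in a new boundary (ALTER PARTITION FUNCTION ... SPLIT RANGE) and optionally switch the oldest partition out, which is the rolling-window part.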

Creating an Index on left(date,4), is this possible? Or is there a better way to handle large data?

I'm taking over a database with a table that is growing out of control. It has transaction records for 2011, 2012, 2013 and into the future.
The table is crucial to the company's operation, but it is getting out of hand at 730k records and growing, with transactions being added bi-weekly.
I do not wish to alter the existing structure of the table because many existing operations depend on it. So far it has an index on the transaction ID and the transaction date, but it is becoming very cumbersome to query.
Would it be wise, or even possible, to index just the year of the transaction dates by using left(date,4) as part of the index?
EDIT: the table is not normalized (and I don't see the point in normalizing it, since each row is unique to the claim number), and there are 168 fields in each record, with 5 different "memo" fields of varchar(255).
One option is to create a Filtered Index - an index that covers only the group of records matching specific criteria.
In your case you could create several such indexes, each filtering the records for a specific year.
For example:
CREATE NONCLUSTERED INDEX IndexFor2013Year
ON MyTable.BillOfMaterials (SomeDate)
WHERE SomeDate >= '2013-01-01' AND SomeDate < '2014-01-01';
GO
That said, creating many indexes on a table that frequently undergoes DML operations (UPDATE/INSERT/DELETE) can actually hurt performance.
You should perform some tests and compare the execution plans.
And please note that this is just an example - the right index depends on what exactly your query is. Sometimes, when I look at the execution plans of my queries, SQL Server Management Studio (2012) suggests exactly which indexes could lead to better performance.
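Those suggestions come from SQL Server's missing-index feature, which you can also query directly through the DMVs (treat the output as candidates to evaluate against your workload, not as indexes to create blindly):
SELECT d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
    ON s.group_handle = g.index_group_handle
ORDER BY s.avg_user_impact DESC;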

Sorting rows in DB (by id or another field?)

What is the best way and why?
sorting rows by id
sorting rows by another field (create time for example)
Upd 1. Of course, I can add an index on the other field
Upd 2. Best for speed & usability
Do you mean sorting or indexing? Regardless of which database technology you are using, you can typically create indexes on any column or on combinations of columns. This lets the database engine optimize query execution and build better execution plans. In a sense, indexing is "sorting" from the database engine's point of view.
Actual sorting (as in ORDER BY MyColumn ASC|DESC) is really only relevant when querying the database, and how you sort your query results typically depends on how you intend to use the data.
I assume that you are using a relational database, such as MySQL or PostgreSQL, and not one of the non-SQL databases.
This means that you interact with the database using the SQL language, and the way you get data out of it is the SQL SELECT statement. In general, your tables will have a key attribute, which means that each row has a unique value for that attribute, and the database will usually store the data pre-sorted by that key (often in a B-tree structure).
So, when you run a query such as "SELECT firstname, lastname FROM employees;", where "employees" is a table with attributes such as "employee_id", "firstname", "lastname", "home_address" and so forth, the rows will often come back in employee_id order (although without an ORDER BY that order is not guaranteed).
To get the data sorted differently, use the SQL ORDER BY clause in the SELECT statement. For example, to sort by "lastname" you might write "SELECT firstname, lastname FROM employees ORDER BY lastname;".
One way the database can implement this query is to retrieve all the data and then sort it before returning it to the user.
In addition, you can create indexes on the table, which allow the database to find rows with particular values, or value ranges, for an attribute or set of attributes. If you add a WHERE clause to the SELECT query that (dramatically) reduces the number of matching rows, the database may use an index to speed up processing by first filtering the rows and then (if necessary) sorting them. Note that query optimization is a complex topic in its own right: the engine weighs a wide range of factors to estimate which of the possible implementations of a query will be fastest.
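As a small illustration using the employees example above (portable SQL; the index name is arbitrary), an index on the column you both filter and sort on lets the engine locate the matching rows and return them already in order, avoiding a separate sort step:
-- Index on the column used in both WHERE and ORDER BY
CREATE INDEX idx_employees_lastname ON employees (lastname);

-- The engine can walk the index for the matching range and emit rows
-- already sorted by lastname, so no extra sort is needed
SELECT firstname, lastname
FROM employees
WHERE lastname >= 'M' AND lastname < 'N'
ORDER BY lastname;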

SQL Server Indexes Aren't Helping

I have a table (SQL 2000) with over 10,000,000 records. Records get added at a rate of approximately 80,000-100,000 per week. Once a week a few reports get generated from the data. The reports are typically fairly slow to run because there are few indexes (presumably to speed up the INSERTs). One new report could really benefit from an additional index on a particular "char(3)" column.
I've added the index using Enterprise Manager (Manage Indexes -> New -> select column, OK), and even rebuilt the indexes on the table, but the SELECT query has not sped up at all. Any ideas?
Update:
Table definition:
ID, int, PK
Source, char(3) <--- column I want indexed
...
About 20 different varchar fields
...
CreatedDate, datetime
Status, tinyint
ExternalID, uniqueidentifier
My test query is just:
select top 10000 [field list] from [table] where Source = 'abc'
You need to look at the query plan and see whether it is using the new index. If it isn't, there are a couple of possibilities. One: there may be a cached query plan, created before the new index existed, that has not been invalidated. If that is not the case, you can also try an index hint: WITH (INDEX (yourindexname)).
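For example, a sketch of the hint syntax, assuming the new index is called IX_MyTable_Source and the table is dbo.MyTable (both names are placeholders; the column list is drawn from the table definition above). If the hinted version is suddenly fast, the interesting question becomes why the optimizer didn't pick the index on its own:
-- Force the optimizer to use the new index, purely as a test
SELECT TOP 10000 ID, Source, CreatedDate
FROM dbo.MyTable WITH (INDEX (IX_MyTable_Source))
WHERE Source = 'abc';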
10,000,000 rows is not unheard of; it should be able to read through that pretty quickly.
Use the Show Execution Plan in SQL Query Analyzer to see if the index is used.
You could also try making it a clustered index if it isn't already.
For a table of that size your best bet is probably going to be partitioning your table and indexes.
How unique are your Source values? Indexes on columns that have very few distinct values are usually ignored by the SQL engine, and forcing their use can even make queries slower. If your Source column only has a handful of values, you might want to remove that index and see whether things are faster without it.
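A quick way to check the selectivity (dbo.MyTable is a placeholder for your table name):
-- If distinct_sources is tiny relative to total_rows, a plain
-- nonclustered index on Source alone is unlikely to be chosen
SELECT COUNT(DISTINCT Source) AS distinct_sources,
       COUNT(*)               AS total_rows
FROM dbo.MyTable;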
