I am a little bit confused about using indexed views in SQL Server 2016.
Here is my issue. If I have a fact table with a lot of columns and I create an indexed view named IV_Sales as
create view dbo.IV_Sales with schemabinding
as
select
year,
customer,
sum(sales) as total_sales,
count_big(*) as cb
from dbo.F_Sales
group by year, customer
this would aggregate all sales by year and customer.
After that, when a user runs a query against F_Sales like
select
year,
customer,
sum(sales)
from F_Sales
group by year, customer
will the optimizer (in SQL Server Enterprise Edition) automatically use the indexed view IV_Sales instead of doing a table scan of F_Sales?
I have Standard Edition, and when I run
select
year,
customer,
sum(sales)
from F_Sales WITH (NOEXPAND)
group by year, customer
I get an error, since F_Sales has no clustered index like the one I created on the indexed view. Is there a way to force the use of indexed views instead of the table in Standard Edition?
My real-world issue is that I have a Cognos Framework model pointing to the table F_Sales, and when a report is executed using year, customer, and the sum of sales, I want it, for performance reasons, to use the indexed view automatically instead of the table.
I hope I'm being clear about my issue. Many thanks in advance.
If you have a performance issue, indexed views are probably the last thing you want to try.
You should exhaust all other avenues, like standard indexes, first.
For example, if you know for sure that you are doing a table scan, the simple solution is to add a nonclustered index to satisfy the query, so it does an index scan or seek instead. If it still doesn't use the index, you need to continue your performance tuning and work out why it isn't being used (non-sargable expressions? stale statistics?).
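As a sketch, assuming the table and column names from the question, a covering nonclustered index for the aggregate query might look like this:
-- Narrow index covering the grouping columns, with the measure included,
-- so the query can be answered without touching the wide fact table.
CREATE NONCLUSTERED INDEX IX_F_Sales_Year_Customer
ON dbo.F_Sales ([year], customer)
INCLUDE (sales);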
Your indexed view will automatically be used (without explicit mention of the indexed view) in a very limited number of cases. You'll see it in the query plan.
If your query very closely matches the indexed view definition, the optimiser will use your indexed view.
Make a very small change to your SQL (like joining to another table) and it won't throw an error; it will just silently fall back to not using the indexed view.
Automatic SQL-writing tools like Cognos will very quickly make the SQL unrecognisable to the query planner, and therefore the indexed view won't be used.
This is all very easily verifiable if you just crack open SSMS and do some experiments.
So in short: start your optimisation with standard indexes, filtered indexes, and even columnstore indexes (which are particularly good for fact tables, or so I hear).
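That said, if you do decide an indexed view is warranted: in Standard Edition the optimiser won't match it automatically (that's an Enterprise Edition feature), but you can reference the view directly with the NOEXPAND hint. The hint belongs on the view, not on the base table, which is why the query in the question errors. A minimal sketch, assuming IV_Sales was created WITH SCHEMABINDING, includes COUNT_BIG(*), and aliases sum(sales) as total_sales:
-- Query the view itself; NOEXPAND makes SQL Server use the view's
-- clustered index rather than expanding the view definition.
SELECT [year], customer, total_sales
FROM dbo.IV_Sales WITH (NOEXPAND);
This won't help the Cognos case, though, unless the Framework model can be pointed at the view instead of F_Sales.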
Related
I have a large amount of data, around 5M rows, stored in a very flat table which has 12 columns. This table contains aggregated data and it does not have any relationship to other tables. I want to run dynamic queries on this data for reporting purposes. The table contains fields like District, City, Year, Category, SubCategory, SaleAmount, etc.
I want to view reports such as:
Sales between the years 2010 and 2013.
Sales of each product in various years, and compare them.
Sales by a specific salesman in a year.
Sales by category, subcategory, etc.
I am using SQL Server 2008, but I am not a DBA, so I do not know things like what type of indexes I should create, or which columns I should index in order to make my queries fast.
If the amount of data were small I would not have bothered with all these questions and just proceeded, but knowing which columns to index and what type of indexes to create is vital in this case.
Kindly let me know the best way to ensure fast execution of these queries.
Will it work if I create a clustered index on all my columns, or will that hurt me?
Keep in mind that this table will not be updated very frequently, maybe on a monthly basis.
Given your very clear and specific requirements, I would suggest, as a first step, that you create a non-clustered index for each field and leave it to the optimiser (i.e. you create 12 indexes). Place only a single field in each index. Don't index (or at least use caution with) any long text-type fields. Also don't index a field such as M/F that has only 2 values and a 50/50 split. I am assuming you have predicates on each field, but don't bother indexing any fields that are never used for selection purposes.
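A sketch of that first step, assuming a table name of dbo.SalesData (the question doesn't give one) and the columns it mentions:
-- One single-column nonclustered index per field used in predicates.
CREATE NONCLUSTERED INDEX IX_SalesData_District ON dbo.SalesData (District);
CREATE NONCLUSTERED INDEX IX_SalesData_City ON dbo.SalesData (City);
CREATE NONCLUSTERED INDEX IX_SalesData_Year ON dbo.SalesData ([Year]);
CREATE NONCLUSTERED INDEX IX_SalesData_Category ON dbo.SalesData (Category);
CREATE NONCLUSTERED INDEX IX_SalesData_SubCategory ON dbo.SalesData (SubCategory);
-- ...and so on for the remaining columns that appear in WHERE clauses.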
If you still have problems after this, look at the execution plans in SQL Server and use them to see how your queries are being processed.
Multi-column indexes are sometimes better, but if your queries mostly restrict to a small subset of the table then single-field indexes will be fine.
You might have residual performance issues with queries that use "order by", but let's just leave that as a heads-up at this stage.
My reasoning is based on:
You only have 12 columns, so we won't overload anything.
There are only 5M rows. This is quite easy for SQL Server to handle.
The growth in the data is small, so index updates shouldn't be too much of an issue.
The optimiser will love these queries combined with indexes.
We don't have typical query examples to specify multi-column indexes, and the question seems to imply highly variable queries.
I'm taking over a database with a table that is growing out of control. It has transaction records for 2011, 2012, 2013, and into the future.
The table is crucial to the company's operation, but it is growing out of control, with 730k records and more transactions being added bi-weekly.
I do not wish to alter the existing structure of the table because many existing operations depend on it; so far it has an index on the transaction ID and the transaction date, but it is becoming very cumbersome to query.
Would it be wise, or is it even possible, to index just the year of the transaction dates by using left(date,4) as part of the index?
EDIT: the table is not normalized (and I don't see the purpose of normalizing, since each row is unique to the claim number), and there are 168 fields in each record, with 5 different "memo" fields of varchar(255).
One of the options is to create a Filtered Index - it indexes only the group of records that match specific criteria.
In your case you could create several indexes, each filtering the records for a specific year.
For example:
CREATE NONCLUSTERED INDEX IndexFor2013Year
ON dbo.MyTable (SomeDate)
WHERE SomeDate >= '2013-01-01' AND SomeDate < '2014-01-01';
GO
Anyway, creating many indexes on a table that is often the target of DML operations (UPDATE/INSERT/DELETE) can actually hurt performance.
You should perform some tests and compare the execution plans.
And please note that I am giving just an example - you should create indexes based on what exactly your queries are. Sometimes, when watching the execution plans of my queries, SQL Server Management Studio (2012) suggests indexes that could lead to better performance.
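As for the left(date,4) idea in the question: SQL Server can't index an arbitrary expression directly, but it can index a computed column. A sketch, assuming the date column is a datetime, adding a column is acceptable, and using illustrative names:
-- Add a year column computed from the transaction date
-- (YEAR() on a datetime is deterministic, so it can be indexed).
ALTER TABLE dbo.Transactions
ADD TransactionYear AS YEAR(TransactionDate);

CREATE NONCLUSTERED INDEX IX_Transactions_Year
ON dbo.Transactions (TransactionYear);
Queries that filter on TransactionYear = 2013 could then seek on this index rather than scanning the table.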
I'm looking for some advice on a table structure in SQL.
Basically I will have a table with about 30 columns of strings, ints and decimals. A service will be writing to this table about 500 times a day. Each record in the table can either be 'inactive' or 'active'. This table will constantly grow and at any one time there will be about 100 'active' records that need to be returned.
While the table is small the performance to return the 'active' records is responsive. My concern comes 12-18 months down the line when the table is much larger or even later when there will be millions of records in the table.
Is it better, from a performance point of view, to maintain two tables - one for 'active' records and one for 'inactive' records - or will creating an index on the active column solve any potential performance issues?
It certainly will be more performant to have a small "active" table. The most obvious cost is that maintaining the records correctly is more troublesome than with one table. I would probably not do so immediately, but bear it in mind as a potential optimisation.
An index on the active column is going to massively improve matters. Even better would be a multi-column index (or indices) appropriate for the query (or queries) used most often. For example, if you would often ask for active rows created after a certain date, then an index on both date and active could satisfy that retrieval with a single index. Likewise, if you wanted all active rows ordered by id, then one on both id and active could be used.
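For instance, a sketch with placeholder names based on the description:
-- One index serving both the active filter and a date range/sort.
CREATE NONCLUSTERED INDEX IX_Records_Active_Created
ON dbo.Records (Active, CreatedDate);
-- A query like the following can then seek straight to the active rows:
-- SELECT * FROM dbo.Records WHERE Active = 1 AND CreatedDate >= '20130101';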
Testing with the Database Engine Tuning Advisor can be very informative here, though it is not as good at predicting the best approach for data you expect to change in the months to come, as you do here.
An indexed view may well be your best approach, as that way you can create the closest thing to a partial index that is available in SQL Server 2005 (which your tags suggest you are using). See http://technet.microsoft.com/en-us/library/cc917715.aspx#XSLTsection124121120120 This will create an index based on your general search/join/order criteria, but only on the relevant rows (ignoring the others entirely).
Better still, if you can use SQL Server 2008, then use a filtered index (what Microsoft have decided to call partial indices). See http://technet.microsoft.com/en-us/library/cc280372.aspx for more on them.
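A sketch of such a filtered index, again with placeholder names:
-- SQL Server 2008+: index only the ~100 active rows, so the index
-- stays tiny no matter how many inactive rows accumulate.
CREATE NONCLUSTERED INDEX IX_Records_ActiveOnly
ON dbo.Records (CreatedDate)
WHERE Active = 1;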
If you'd tagged with 2008 rather than 2005 I'd definitely be suggesting filtered indices; as it is, I'd probably go for the indexed view, but might just go for the multi-column index.
Index the active field and rebuild the index each weekend and you will be good for ages if it's really only 500 records a day.
365 days times 500 is 182,500, and you wrote
millions of records in the table
but at only 500 a day that would take about eleven years.
An index is probably the way to go for performance on a table like that.
You could consider using another table for data you are sure you won't use except on certain specific reports.
I have a db table with about 10 or so columns, two of which are month and year. The table has about 250k rows now, and we expect it to grow by about 100-150k records a month. A lot of queries involve the month and year columns (e.g., all records from March 2010), and so we frequently need to get the available month and year combinations (i.e., do we have records for April 2010?).
A coworker thinks that we should have a separate table from our main one that only contains the months and years we have data for. We only add records to our main table once a month, so it would just be a small update on the end of our scripts to add the new entry to this second table. This second table would be queried whenever we need to find the available month/year entries on the first table. This solution feels kludgy to me and a violation of DRY.
What do you think is the correct way of solving this problem? Is there a better way than having two tables?
Using a simple index on the columns required (Year and Month) should greatly improve either a DISTINCT or a GROUP BY query.
I would not go with a secondary table, as this adds extra overhead to maintain it (inserts/updates/deletes will require that you validate the secondary table).
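For example, a sketch (the table name is a placeholder):
-- A narrow composite index; the optimizer can answer both
-- GROUP BY and DISTINCT queries on these columns from it alone.
CREATE NONCLUSTERED INDEX IX_BigTable_Year_Month
ON dbo.YourBigTable ([Year], [Month]);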
EDIT:
You might even want to consider indexed views; see Improving Performance with SQL Server 2005 Indexed Views.
Make sure to have a clustered index on those columns,
and partition your table on these date columns and place the data files on different disk drives.
I believe keeping your index fragmentation low is your best shot.
I also believe having a physical (indexed) view with the desired select is not a good idea,
because it adds insert/update overhead.
At 100-150k records a month, there are on average about 3.5 inserts per minute,
or about 17 seconds between each insert (on average; please correct me if I'm wrong).
The question is: are you selecting more often than every 17 seconds?
That's the key thought.
Hope it helped.
Use a 'Materialized View', also called an 'Indexed View with Schema Binding', and then index this view. When you do this, SQL Server will essentially create and maintain the data in a secondary table behind the scenes and choose to use the index on this table when appropriate.
This is similar to what your co-worker suggested; the advantage is that you won't need to add logic to your query to take advantage of it, since SQL Server will use the view's index when it creates a query plan, and it will also automatically maintain the data in the indexed view.
Here is how you would accomplish this: create a view that returns the distinct [year], [month] values, and then index [year], [month] on the view. Again, SQL Server will use the tiny index on the view and avoid the table scan on the big table.
Because SQL Server will not let you index a view that uses the DISTINCT keyword, use GROUP BY [year],[month] and COUNT_BIG(*) in the SELECT instead. It will look something like this:
CREATE VIEW dbo.vwMonthYear WITH SCHEMABINDING
AS
SELECT
[year],
[month],
COUNT_BIG(*) [MonthCount]
FROM [dbo].[YourBigTable]
GROUP BY [year],[month]
GO
CREATE UNIQUE CLUSTERED INDEX ICU_vwMonthYear_Year_Month
ON [dbo].[vwMonthYear](Year,Month)
Now when you SELECT DISTINCT [Year],[Month] on the big table, the query optimizer will scan the tiny index on the view instead of scanning millions of records on the big table.
SELECT DISTINCT
[year],
[month]
FROM YourBigTable
This technique took me from 5 million reads with an estimated I/O of 10.9 to 36 reads with an estimated I/O of 0.003. The overhead on this will be that of maintaining an additional index, so each time the large table is updated the index on the view will also be updated.
If you find this index is substantially slowing down your load times, drop the index, perform your data load, and then recreate it.
Full working example:
CREATE TABLE YourBigTable(
YourBigTableID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_YourBigTable_YourBigTableID PRIMARY KEY,
[Year] INT,
[Month] INT)
GO
CREATE VIEW dbo.vwMonthYear WITH SCHEMABINDING
AS
SELECT
[year],
[month],
COUNT_BIG(*) [MonthCount]
FROM [dbo].[YourBigTable]
GROUP BY [year],[month]
GO
CREATE UNIQUE CLUSTERED INDEX ICU_vwMonthYear_Year_Month ON [dbo].[vwMonthYear](Year,Month)
SELECT DISTINCT
[year],
[month]
FROM YourBigTable
-- Actual execution plan shows SQL Server scanning ICU_vwMonthYear_Year_Month
Create a materialized (indexed) view of:
SELECT DISTINCT
MonthCol, YearCol
FROM YourTable
You will now get access to the pre-computed distinct values without going through the work every time. (As noted in the answer above, the view definition itself cannot use DISTINCT; in practice you would write it with GROUP BY MonthCol, YearCol and COUNT_BIG(*).)
Make the date the first column in the table's clustered index key. This is very typical for historic data, because most, if not all, queries are interested in specific ranges, and a clustered index on time can address this. All queries like 'month of May' need to be expressed as ranges, e.g.: WHERE DateColKey >= '2010-05-01' AND DateColKey < '2010-06-01'. Answering a question like 'are there any records in May' then involves a simple seek into the clustered index.
While this may seem complicated to a programmer's mind, it is the optimal way to approach a database design problem.
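A sketch of that design, with illustrative names:
-- Cluster the historic table on the date so that date-range queries
-- become seeks into contiguous pages.
CREATE CLUSTERED INDEX CIX_History_DateColKey
ON dbo.History (DateColKey);

-- 'Are there any records in May 2010?' is then a simple range seek:
SELECT TOP (1) 1
FROM dbo.History
WHERE DateColKey >= '20100501' AND DateColKey < '20100601';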
Microsoft, in its MSDN entry about altering SQL 2005 partitions, lists a few possible approaches:
Create a new partitioned table with the desired partition function, and then insert the data from the old table into the new table by using an INSERT INTO...SELECT FROM statement.
Create a partitioned clustered index on a heap
Drop and rebuild an existing partitioned index by using the Transact-SQL CREATE INDEX statement with the DROP EXISTING = ON clause.
Perform a sequence of ALTER PARTITION FUNCTION statements.
Any idea what will be the most efficient way for a large scale DB (millions of records) with partitions based on the dates of the records (something like monthly partitions), where data spreads over 1-2 years?
Also, if I mostly access (for reading) recent information, will it make sense to keep a partition for the last X days, and all the rest of the data will be another partition? Or is it better to partition the rest of the data too (for any random access based on date range)?
I'd recommend the first approach - creating a new partitioned table and inserting into it - because it gives you the luxury of comparing your old and new tables. You can test query plans against both styles of tables and see if your queries are indeed faster before cutting over to the new table design. You may find there's no improvement, or you may want to try several different partitioning functions/schemes before settling on your final result. You may want to partition on something other than date range - date isn't always effective.
I've done partitioning with 300-500m row tables with data spread over 6-7 years, and that table-insert approach was the one I found most useful.
You asked about how to partition - the best answer is to try to design your partitions so that your queries will hit a single partition. If you tend to concentrate queries on recent data, AND if you filter on that date field in your where clauses, then yes, have a separate partition for the most recent X days.
Be aware that you do have to specify the partitioned field in your where clause. If you aren't specifying that field, then the query is probably going to hit every partition to get the data, and at that point you won't have any performance gains.
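A sketch of monthly partitioning and a query that allows partition elimination (boundary values and all names are illustrative):
-- Each boundary date starts a new monthly partition (RANGE RIGHT).
CREATE PARTITION FUNCTION pfMonthly (DATETIME)
AS RANGE RIGHT FOR VALUES ('20130101', '20130201', '20130301');

CREATE PARTITION SCHEME psMonthly
AS PARTITION pfMonthly ALL TO ([PRIMARY]);

CREATE TABLE dbo.Transactions (
TransactionID INT IDENTITY(1,1) NOT NULL,
TransactionDate DATETIME NOT NULL
-- ...other columns...
) ON psMonthly (TransactionDate);

-- Filtering on the partitioning column means only one partition is read:
SELECT COUNT(*)
FROM dbo.Transactions
WHERE TransactionDate >= '20130201' AND TransactionDate < '20130301';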
Hope that helps! I've done a lot of partitioning, and if you want to post a few examples of table structures & queries, that'll help you get a better answer for your environment.