I am quite confused about the difference between an index on a table and an index on a view (an indexed view). Please clarify.
There really is none. An index on either a table or a view basically serves to speed up searches.
The main thing is: views normally do not have indices. When you add a clustered index to a view, you're basically "materializing" that view into a system-maintained, always automatically updated "pseudo-table" that exists on disk, uses disk space just like a table, and since it's really almost a table already, you can also add additional indices to an indexed view.
So really - between a table and an indexed view, there's little difference - and there's virtually no difference at all between indices on tables and an indexed view.
Indexes on views have some restrictions, because views can be based upon various combinations of tables and views.
In either case, they are similar, and as the underlying data changes, indexes may or may not need to be updated.
Indexes on tables are almost always used: typically you will have at least one unique index (the primary key) and may have designated one of the indexes as clustered.
Indexes on views are generally applied only as an optimization technique: as reads of a view become heavy, indexes on the view can improve the performance of queries that use it.
I've used indexed views to drastically improve the performance of queries where I want to group by a unique combination of fields and maybe calculate some aggregate SUM or count on them.
For example, consider a table that contains customer, truck, distance, date (plus about 30 other performance columns I don't want to query right now). I have hundreds of customers, they have hundreds of trucks each and each truck reports distance and other data 5 times a day. If I want to query a list of which trucks are reporting in which months, I create a view like this:
CREATE VIEW dbo.vw_DistinctUnitMonths
WITH SCHEMABINDING
AS
SELECT CustomerGroup,
       CustomerId,
       Vehicle,
       CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, Date), 0) AS DATE) AS Month, --Converts Date to First of the Month
       SUM(CASE WHEN Miles > 0 THEN Miles ELSE 0 END) AS Miles,
       COUNT_BIG(*) AS Count
FROM dbo.PerformanceData
GROUP BY CustomerGroup, CustomerId, Vehicle, CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, Date), 0) AS DATE)
GO
CREATE UNIQUE CLUSTERED INDEX IX_DistinctUnitMonths ON dbo.vw_DistinctUnitMonths (CustomerGroup, CustomerId, Vehicle, Month)
GO
Here's a slow query that doesn't use the view:
--Can Be Very Slow!
SELECT CustomerGroup,
       CustomerId,
       Vehicle,
       CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, Date), 0) AS DATE) AS Month
FROM dbo.PerformanceData
WHERE Date >= '2020-01-01' --Filter on the base column; the Month alias is not visible in WHERE
  AND Date < '2020-02-01'
GROUP BY CustomerGroup, CustomerId, Vehicle, CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, Date), 0) AS DATE)
And here is one that runs much faster, because of the indexed view:
--Much Faster
SELECT CustomerGroup,
       CustomerId,
       Vehicle,
       Month
FROM dbo.vw_DistinctUnitMonths WITH (NOEXPAND)
WHERE Month >= '2020-01-01'
  AND Month < '2020-04-01'
GROUP BY CustomerGroup, CustomerId, Vehicle, Month
Because the indexed view is creating an index on only the unique combinations of customer, group, vehicle and month, the disk space for the view is much smaller than if I were to index those columns on the source table. Queries to the view are faster because the data in the view is concentrated to some tens of megabytes instead of the hundreds of gigabytes the source table occupies.
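As a rough sketch of the point made earlier that an indexed view can carry additional indexes once the clustered one exists, the following adds a nonclustered index on the month column and then aggregates from the view. The index name IX_DistinctUnitMonths_Month and the output aliases are made up for illustration, and the statements assume the view and clustered index above have already been created.

```sql
-- Assumes dbo.vw_DistinctUnitMonths and its unique clustered index already exist.
-- IX_DistinctUnitMonths_Month is a hypothetical name chosen for this example.
CREATE NONCLUSTERED INDEX IX_DistinctUnitMonths_Month
    ON dbo.vw_DistinctUnitMonths ([Month])
    INCLUDE (Miles, [Count]);
GO

-- Monthly totals across all customers, read from the materialized view
-- instead of the much larger source table.
SELECT [Month],
       SUM(Miles)   AS TotalMiles,
       SUM([Count]) AS ReportCount
FROM dbo.vw_DistinctUnitMonths WITH (NOEXPAND)
WHERE [Month] >= '2020-01-01'
  AND [Month] < '2020-04-01'
GROUP BY [Month];
```

Whether the extra index pays off depends on how often the view is filtered by month alone; every additional index also has to be maintained when the source table changes.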
See also MSFT Docs: Create Indexed Views
Related
I have defined a cluster key on one of the columns, time_period. When I use a WHERE clause, the query operates on metadata, as I can see in the history profile of the query below:
select count(*) from table where time_period = 'Jan 2021'
But when I use GROUP BY to get the count for each month, it scans all the partitions.
select time_period , count(*) from table group by time_period
Why is the second query not a metadata operation?
select time_period , count(*) from table group by time_period;
is a full table scan.
select count(*) from table where time_period = 'Jan 2021'
is a full scan of only the partitions where time_period equals that one value, so the metadata is searched to find the matching partitions; hence the pruning.
If your table has values from 'Jan 2020' to 'Jan 2021', and assuming those are dates rather than strings (which would be very bad for performance), and assuming your data is clustered on time_period (or is naturally inserted in month order), then
select time_period, count(*)
from table
where time_period >= '2020-07-01'
group by 1 order by 1;
should only read ~50% of your partitions, because the assumed ordering of the data means only half of the partitions need to be read.
Answering the "metadata" vs "scanning" question: this is based on years of working with query optimization, and is "very well educated speculation".
There is a big difference between "COUNT()" and "COUNT() ... GROUP BY". The latter is much more complex and handles a much broader range of queries.
Optimizers evolve over time to handle special cases, but they start out focusing on more common types of queries.
The non-GROUP BY query against a non-keyed but well-clustered table will use a scan. It's a specialized, meaningful optimization for a special case.
But the same specialization is not present in the GROUP BY, which addresses a much broader class of queries, with GROUP BY and WHERE clauses for multiple non-cluster-key columns.
The COUNT() GROUP BY would need to add a special check for this particular query form; once anything else is added, the metadata would not be sufficient.
So there is no specialized optimization for this specific case in COUNT() ... GROUP BY.
I have many tables in my database and each one has one or two DATE fields. This is increasing my database size, so I am thinking of storing all DATE fields in one table and adding a relationship to all tables. Is it possible, and is it a good idea or not?
My database, example:
Old design
tblCustomer => CustomerID, Surname, Name, DateFirstVisit, DateStopped
tblOrder => OrderID, CustomerID, DateOrder, Order, DateShiped
tblPayment => PaymentID, CustomerID, DatePayment, Price, DateCheck
New design
tblCustomer => CustomerID, Surname, Name, DateInID, DateOutID
tblOrder => OrderID, CustomerID, DateInID, Order, DateOutID
tblPayment => PaymentID, CustomerID, DateInID, Price, DateOutID
tblDateIn => DateInID, DateIn
tblDateOut => DateOutID, DateOut
Can I combine tblDateIn and tblDateOut?
Thank you...
Technically, yes, you can further normalize your database this way. You could go so far as to have a Dates table that just has every date in it and use those dates by reference to a DateID, but this is over-normalization.
In addition to making simple queries more complicated because you will have to join to the dates table every time, I think you'll find that you don't save that much space and might possibly use more space. I don't know for certain what Access uses, but dates are typically stored internally as decimal values or an integer representing a count of seconds since a starting date. In any case, the space you would save in your tables by having an integer key versus Access' internal date value would be tiny and likely offset by having additional tables and indexes involved in foreign keys.
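To illustrate the extra joins, here is a sketch in generic SQL rather than Access syntax (Access wraps date literals in # characters and wants parentheses around nested joins). The table and column names come from the question; the aliases and the date value are made up for the example.

```sql
-- Old design: the dates live in the row, so a filter needs no join.
SELECT OrderID, DateOrder, DateShiped
FROM tblOrder
WHERE DateOrder >= '2015-01-01';

-- New design: every query that touches a date has to join the lookup tables.
SELECT o.OrderID, di.DateIn, dout.DateOut
FROM tblOrder AS o
INNER JOIN tblDateIn  AS di   ON o.DateInID  = di.DateInID
INNER JOIN tblDateOut AS dout ON o.DateOutID = dout.DateOutID
WHERE di.DateIn >= '2015-01-01';
```

The second form returns the same kind of result, but only after two lookups per row, and each lookup table needs its own key and index.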
I'm trying to create an indexed view containing only the data for the last 2 weeks.
This part works fine:
CREATE VIEW [dbo].[MainLogView]
WITH SCHEMABINDING
AS
SELECT Id, Date, System, [Function], StartTime, EndTime, Duration, ResponseIsSuccess, ResponseErrors
FROM dbo.MainLog
WHERE (Date >= DATEADD(day, - 14, GETDATE()))
But when I try to add an index:
CREATE UNIQUE CLUSTERED INDEX IDX_V1
ON MainLogView (Id);
I'm getting:
Cannot create index on view 'dbo.MainLogView'. The function 'getdate'
yields nondeterministic results. Use a deterministic system function,
or modify the user-defined function to return deterministic results.
I know why, but how can I restrict the data in a view to the last 2 weeks?
I need a small, quickly queryable portion of the data from my table.
You could (I think, but I have no real world experience with indexed views) create a one record table (an actual table, since a view is not allowed in an indexed view) which you fill with the current date - 14 days. This table you can keep up to date; either manually, with a trigger or some other clever mechanism. You can use that table to join, and in effect use as filter.
Of course, when you query the view, you have to be sure to update your 'currentDate' table first!
You'd get something like:
CREATE VIEW [dbo].[MainLogView]
WITH SCHEMABINDING
AS
SELECT Id, Date, System, [Function], StartTime, EndTime, Duration, ResponseIsSuccess, ResponseErrors
FROM dbo.MainLog ML
INNER JOIN dbo.CurrentDate CD
ON ML.Date >= CD.CurrentDateMin14Days
(Totally untested, might not work... This is basically a hack, I am not at all sure the indexed view will give you any performance increase. You might be better off with a regular view.)
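A minimal sketch of the helper table this answer describes, equally untested against the asker's schema. dbo.CurrentDate and CurrentDateMin14Days are the names used in the view above; the Id column, the constraint names, and the assumption that MainLog.Date is comparable to a DATE value are illustrative guesses.

```sql
-- One-row table holding the cutoff date. The CHECK constraint keeps it to a single row.
CREATE TABLE dbo.CurrentDate
(
    Id TINYINT NOT NULL
        CONSTRAINT PK_CurrentDate PRIMARY KEY
        CONSTRAINT CK_CurrentDate_SingleRow CHECK (Id = 1),
    CurrentDateMin14Days DATE NOT NULL
);
GO

INSERT INTO dbo.CurrentDate (Id, CurrentDateMin14Days)
VALUES (1, DATEADD(day, -14, CAST(GETDATE() AS DATE)));
GO

-- Run this on a schedule (for example once a day) to move the window forward.
-- GETDATE() is fine here because it is outside the view definition.
UPDATE dbo.CurrentDate
SET CurrentDateMin14Days = DATEADD(day, -14, CAST(GETDATE() AS DATE))
WHERE Id = 1;
```

Each update of that row makes SQL Server maintain the indexed view, removing the rows that fall out of the window, so the refresh has a cost that grows with the number of affected rows.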
I have 2 database tables called Spend, and VendorSpend. The columns used in the Spend table are called VendorID, VendorName, RecordDate, and Charges. The VendorSpend table contains VendorID and VendorName but with distinct data (one record for each unique VendorID). I need a simple way to add a column to the VendorSpend table called Aug2015, this column will contain the SUM of each Vendor's charges within that month time period. It will be calculated based on this query:
Select Sum(Charges)
from Spend
where RecordDate >= '2015-08-01' and RecordDate <= '2015-08-31'
Keep in mind this will need to be called whenever new data is inserted into the Spend table and the VendorSpend table will need to update based on the new data. This will happen every month so actually a new column will need to be added and the data be calculated every month.
Any assistance is greatly appreciated.
Create a user-defined function that you pass a VendorID and Date to and which does your SELECT:
Select Sum(Charges)
from Spend
where VendorID=@VendorID
AND DATEDIFF(month, RecordDate, @Date) = 0
Now personally, I would stop right there and use the function to select your data at query time, rather than adding a new column to your table.
But treating your question as academic, you can create a computed column called [Aug2015] in VendorSpend that passes [VendorID] and '08/01/2015' to this function and it will contain the desired result.
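A sketch of what such a function might look like, used at query time as recommended above. The function name dbo.fn_VendorMonthlySpend, the parameter types, and the DECIMAL(18, 2) return type are assumptions, since the question does not give the column types.

```sql
-- Hypothetical scalar function: total charges for one vendor in the month containing @Date.
CREATE FUNCTION dbo.fn_VendorMonthlySpend (@VendorID INT, @Date DATE)
RETURNS DECIMAL(18, 2)
AS
BEGIN
    RETURN
    (
        SELECT SUM(Charges)
        FROM dbo.Spend
        WHERE VendorID = @VendorID
          AND DATEDIFF(month, RecordDate, @Date) = 0
    );
END;
GO

-- Query-time usage: no new column is added to VendorSpend.
SELECT VendorID,
       VendorName,
       dbo.fn_VendorMonthlySpend(VendorID, '2015-08-01') AS Aug2015
FROM dbo.VendorSpend;
```

The function returns NULL for a vendor with no charges in that month; wrap the call in ISNULL(..., 0) if a zero is preferred.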
I am trying to create a simple indexed view on the query below. But when I try to create a unique clustered index on it, I get the following error:
Cannot create the clustered index '..' on view '..' because the select
list of the view contains an expression on result of aggregate
function or grouping column. Consider removing expression on result of
aggregate function or grouping column from select list.
The query I used is as follows:
SELECT
[Manufacturer]
,ISNULL(SUM([QAV]),0) as AvgQAV
,ISNULL(SUM([BackOrders$]),0)as AvgBackorder$
,DATEPART(year,[Date])as Year
,DATEPART(month,[Date])as Month
,[fixSBU]
,[DC Name]
FROM [dbo].[TABLE1]
Group By
[Manufacturer]
,DATEPART(year,[Date])
,DATEPART(month,[Date])
,[fixSBU]
,[DC Name]
Could anyone tell me the possible cause for this?
As you can see I am already using the ISNULL function.
Here is a link to all the restrictions on an indexed view: https://msdn.microsoft.com/en-us/library/ms191432.aspx#Restrictions
From the documentation these two items should stick out:
If GROUP BY is present, the VIEW definition must contain COUNT_BIG(*)
and must not contain HAVING. These GROUP BY restrictions are
applicable only to the indexed view definition. A query can use an
indexed view in its execution plan even if it does not satisfy these
GROUP BY restrictions.
If the view definition contains a GROUP BY
clause, the key of the unique clustered index can reference only the
columns specified in the GROUP BY clause.
Also, you need to change your ISNULL statements. Right now you have ISNULL(SUM([BackOrders$]),0) and it should be SUM(ISNULL([BackOrders$], 0)). You need to SUM the ISNULL, not the other way around.
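Putting those points together, the view might look roughly like the sketch below. The view name, the index name, and the RowCnt alias are invented for the example, and it assumes the columns in the question are the only ones needed.

```sql
-- Hypothetical view name; SCHEMABINDING and two-part table names are required for indexed views.
CREATE VIEW dbo.vw_Table1Monthly
WITH SCHEMABINDING
AS
SELECT
     [Manufacturer]
    ,SUM(ISNULL([QAV], 0))         AS AvgQAV          -- ISNULL inside SUM, not around it
    ,SUM(ISNULL([BackOrders$], 0)) AS [AvgBackorder$]
    ,DATEPART(year, [Date])        AS [Year]
    ,DATEPART(month, [Date])       AS [Month]
    ,[fixSBU]
    ,[DC Name]
    ,COUNT_BIG(*)                  AS RowCnt          -- required when GROUP BY is present
FROM [dbo].[TABLE1]
GROUP BY
     [Manufacturer]
    ,DATEPART(year, [Date])
    ,DATEPART(month, [Date])
    ,[fixSBU]
    ,[DC Name];
GO

-- The clustered index key references only the grouping columns.
CREATE UNIQUE CLUSTERED INDEX IX_vw_Table1Monthly
    ON dbo.vw_Table1Monthly ([Manufacturer], [Year], [Month], [fixSBU], [DC Name]);
```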
Doesn't make a whole lot of sense (at least not to me) but reference: https://msdn.microsoft.com/en-us/library/ms191432.aspx
Specifically:
If GROUP BY is present, the VIEW definition must contain COUNT_BIG(*) and must not contain HAVING. These GROUP BY restrictions are applicable only to the indexed view definition. A query can use an indexed view in its execution plan even if it does not satisfy these GROUP BY restrictions.
Try adding a COUNT_BIG(*) to your select list and give it a whirl.
I had a similar problem. One of my select fields looked like this:
sum(Pa * (CTRatio1a/CTRatio2a) * (VTRatio1/VTRatio2)* Polarity * [Percentage])/1000.0
By including the last division by 1000 in the bracket, it resolved the problem:
sum(Pa * (CTRatio1a/CTRatio2a) * (VTRatio1/VTRatio2)* Polarity * [Percentage]/1000.0)
Tip: It's better to have a real date field in the database and not just Year / Month. That way you can create a date index in addition to the clustered index.
However, if you have FullDate, Year and Month, you can get the same error message: "view contains an expression on result of aggregate function or grouping column".
That error can occur if you do this:
SELECT
[Manufacturer]
,[Date] as FullDate
,DATEPART(year,[Date]) as Year
,DATEPART(month,[Date]) as Month
,COUNT_BIG(*) as Count
,SUM(OrderValue) as TotalOrderValue
FROM [dbo].[TABLE1]
Group By
[Manufacturer]
,[Date]
,DATEPART(year,[Date])
,DATEPART(month,[Date])
It is not immediately obvious what's going on, but I assume this is because it looks at Date in the grouping columns and finds Date used in other columns (for the year and month). Logically, though, this should work, and you should be able to group like that.
I found a trick that got it working:
SELECT
[Manufacturer]
,DATEADD(day, 0, [Date]) as FullDate
,DATEPART(year,[Date])as Year
,DATEPART(month,[Date])as Month
,COUNT_BIG(*) as Count
,SUM(OrderValue) as TotalOrderValue
FROM [dbo].[TABLE1]
Group By
[Manufacturer]
,DATEADD(day, 0, [Date])
,DATEPART(year,[Date])
,DATEPART(month,[Date])
This tricked the parser into allowing it, and now I can create a separate index (after the clustered) to search by FullDate.
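For illustration, assuming the SELECT above has been wrapped in a schema-bound view (called dbo.vw_ManufacturerDaily here, a made-up name), the two indexes could be created along these lines. The index names are also hypothetical.

```sql
-- The unique clustered index must come first; its key uses only grouping columns.
CREATE UNIQUE CLUSTERED INDEX IX_vw_ManufacturerDaily
    ON dbo.vw_ManufacturerDaily ([Manufacturer], FullDate);
GO

-- A separate nonclustered index for searching by date alone.
CREATE NONCLUSTERED INDEX IX_vw_ManufacturerDaily_FullDate
    ON dbo.vw_ManufacturerDaily (FullDate);
```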
Bonus: The real reason I stumbled upon this was because I needed ISO_WEEK and ISO_YEAR, which are expensive to calculate. Here's my final full list of grouping clauses I'm using for that:
-- date
DATEADD(day, 0, [Order].OrderDateDt) as OrderDateDt, -- trick! add 0 days to date
-- day / month / year / quarter
DATEPART(day, [Order].OrderDateDt) as OrderDate_day,
DATEPART(month, [Order].OrderDateDt) as OrderDate_month,
DATEPART(year, [Order].OrderDateDt) as OrderDate_year,
DATEPART(quarter, [Order].OrderDateDt) as OrderDate_quarter,
-- iso week
DATEPART(iso_week, [Order].OrderDateDt) as OrderDate_isoweek,
YEAR(DATEADD(day, 26 - DATEPART(iso_week, [Order].OrderDateDt), [Order].OrderDateDt)) as OrderDate_isoyear
Make sure to include all these exactly the same in the SELECT and GROUP BY.