When are SQL Server Views updated? - sql-server

I have two tables that are around 500k rows each (and growing). Inserts/Updates happen to these constantly, sometimes 100's per minute. The system is having performance issues, namely timeouts, on basic inserts into these tables. We've tuned our indexes, and done the usual optimizations. But I'm wondering if the fact that these two tables are referenced in 5 views with heavy joining might be detrimental. I always thought, maybe mistakenly, that as underlying tables change, the views that reference them change too. So if the tables are changing that much, maybe our system is getting overwhelmed by having to constantly play catch-up updating views.

Unless they're indexed views (you haven't mention such in your question), they're not "updated" at all.
Normal views are similar to a macro in C - they're just a convenient shorthand to hide a part of a larger expression. They're expanded out into the parse tree of whatever statement references them, and the entire tree is then compiled and optimized - at the point of usage.
For indexed views, you would be largely correct - the views are maintained as part of the same transaction that performs changes in the base tables. However, the rules for indexed views have been designed so that this update activity shouldn't incur too large a penalty (they can be maintained without having to re-query the entire base table).

It depends:
1) If the view is not indexed, then view is expanded
-- View definition
CREATE VIEW Sales.v_SalesOrderDetail
AS
SELECT h.SalesOrderID, h.SalesOrderNumber, h.OrderDate,
d.SalesOrderDetailID, d.OrderQty, d.UnitPrice, d.LineTotal,
p.ProductID, p.Name AS ProductName, p.Color AS ProductColor
FROM Sales.SalesOrderHeader h
INNER JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
INNER JOIN Production.Product p ON d.ProductID = p.ProductID
GO
-- View usage
SELECT v.SalesOrderDetailID, v.OrderQty, v.UnitPrice, v.ProductName
FROM Sales.v_SalesOrderDetail v
WHERE v.ProductColor='Red';
GO
If we look at the execution plan (SSMS: Ctrl + M),
then we will see that the view (FROM Sales.v_SalesOrderDetail v) is expanded and the server queries just two tables: Sales.SalesOrderDetail d and Production.Product p. More, we can see how the join between Sales.SalesOrderHeader h and Sales.SalesOrderDetail d is removed because:
the SELECT clause (SELECT v.SalesOrderDetailID, v.OrderQty, v.UnitPrice, v.ProductName) doesn't includes columns from Sales.SalesOrderHeader table,
between this two table there is a foreign key constraint and
this FK constraint is trusted.
2) If the view is indexed (meaning that there is an UNIQUE CLUSTERED INDEX defined on the view) and the SQL Server edition is enterprise then the view could be expanded or not. If the edition <> enterprise then indexed view is expanded. We can force the server to not expands the [indexed] view by using NOEXPAND hint:
-- View definition
CREATE VIEW Sales.v_SalesOrderDetail2
WITH SCHEMABINDING
AS
SELECT h.SalesOrderID, h.SalesOrderNumber, h.OrderDate,
d.SalesOrderDetailID, d.OrderQty, d.UnitPrice, d.LineTotal,
p.ProductID, p.Name AS ProductName, p.Color AS ProductColor
FROM Sales.SalesOrderHeader h
INNER JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
INNER JOIN Production.Product p ON d.ProductID = p.ProductID
GO
-- Defining the UNIQUE CLUSTERED INDEX
CREATE UNIQUE CLUSTERED INDEX PK_v_SalesOrderDetail2
ON Sales.v_SalesOrderDetail2 (SalesOrderDetailID);
GO
-- View usage
SELECT v.SalesOrderDetailID, v.OrderQty, v.UnitPrice, v.ProductName
FROM Sales.v_SalesOrderDetail2 v WITH (NOEXPAND)
WHERE v.ProductColor='Red';
GO
In this case, we can see that the execution plan
includes the Clustered Index Scan PK_v_SalesOrderDetail2 operator. So, it uses the index defined on the second view.
Be aware: SQL Server bug indexed view + MERGE.

SQLServer views are not cached, so everytime you request a view the query is executed

Tuning indexes is good but it might be worth considering how often and when you update stats on these indexes. Also, using a buffer table to hold inserts that can then be inserted as bulk operation every 5 or 10 minutes or whatever is suitable for your system may help (putting this on a separate physical disk would be a good idea.) Doing this would make selects much faster 90% of the time and not much worse than current the other 10% of the time.

Related

SQL DELETE performance, T-SQL or ISO-compatible query

If I have two very large tables (TableA and TableB), both with an Id column, and I would like to remove all rows from TableA that have their Ids present in TableB. Which would be the fastest? Why?
--ISO-compatible
DELETE FROM TabelA
WHERE Id IN (SELECT Id FROM TableB)
or
-- T-SQL
DELETE A FROM TabelA AS A
INNER JOIN TableB AS B
ON A.Id = B.Id
If there are indexes on each Id, they should perform equally well.
If there are not indexes on each Id, exists() or in () may perform better.
In general I prefer exists() over in () because it allows you to easily add more than one comparison when needed.
delete a
from tableA as a
where exists (
select 1
from tableB as b
where a.Id = b.Id
)
Reference:
in vs inner join - Gail Shaw
exists() vs in - Gail Shaw
As long as your Id in TableB is unique, both queries should create the same execution plan. Just include the execution plan to each queries and verify it.
Take a look at this nice post: in-vs-join-vs-exists
There's an easy way to find out, using the execution plan (press ctrl + L on SSMS).
Since we don't know the data model behind your tables (the eventual indexes etc), we can't know for sure which query will be the fastest.
By experience, I can tell you that, for very large tables (>1mil rows), the delete clause is quite slow, because of all the logging. Depending on the operation you're doing, you will want SQL Server NOT TO log the delete.
You might want to check at this question :
How to delete large data of table in SQL without log?

Force joined view not to be optimized

I have a somewhat complex view which includes a join to another view. For some reason the generated query plan is highly inefficient. The query runs for many hours. However if I select the sub-view into a temporary table first and then join with this, the same query finished in a few minutes.
My question is: Is there some kind of query hint or other trick which will force the optimizer to execute the joined sub-view in isolation before performing the join, just as when using a temp table? Clearly the default strategy chosen by the optimizer is not optimal.
I cannot use the temporary table-trick since views does not allow temporary tables. I understand I could probably rewrite everything to a stored procedure, but that would break composeability of views, and it seems also like bad for maintenance to rewrite everything just to trick the optimizer to not use a bad optimization.
Adam Machanic explained one such way at a SQL Saturday I recently attended. The presentation was called Clash of the Row Goals. The method involves using a TOP X at the beginning of the sub-select. He explained that when doing a TOP X, the query optimizer assumes it is more efficient to grab the TOP X rows one at a time. As long as you set X as a sufficiently large number (limit of INT or BIGINT?), the query will always get the correct results.
So one example that Adam provided:
SELECT
x.EmployeeId,
y.totalWorkers
FROM HumanResources.Employee AS x
INNER JOIN
(
SELECT
y0.ManagerId,
COUNT(*) AS totalWorkers
FROM HumanResources.Employee AS y0
GROUP BY
y0.ManagerId
) AS y ON
y.ManagerId = x.ManagerId
becomes:
SELECT
x.EmployeeId,
y.totalWorkers
FROM HumanResources.Employee AS x
INNER JOIN
(
SELECT TOP(2147483647)
y0.ManagerId,
COUNT(*) AS totalWorkers
FROM HumanResources.Employee AS y0
GROUP BY
y0.ManagerId
) AS y ON
y.ManagerId = x.ManagerId
It is a super cool trick and very useful.
When things get messy the query optimize often resorts to loop joins
If materializing to a temp fixed it then most likely that is the problem
The optimizer often does not deal with views very well
I would rewrite you view to not uses views
Join Hints (Transact-SQL)
You may be able to use these hints on views
Try merge and hash
Try changing the order of join
Move condition into the join whenever possible
select *
from table1
join table2
on table1.FK = table2.Key
where table2.desc = 'cat1'
should be
select *
from table1
join table2
on table1.FK = table2.Key
and table2.desc = 'cat1'
Now the query optimizer will get that correct but as the query gets more complex the query optimize goes into what I call stupid mode and loop joins. But that is also done to protect the server and have as little in memory as possible.

How can I speed up this SQL view?

I'm a beginner at this so hope you can help. I'm working in SQL server 2008R2 and have a view that is comprised from four tables all joined together:
SELECT DISTINCT ad.award_id,
bl.funding_id,
bl.budget_line,
dd4.monthnumberofyear AS month,
dd4.yearcalendar AS year,
CASE
WHEN frb.full_value IS NULL THEN '0'
ELSE frb.full_value
END AS Expenditure_value,
bl.budget_id,
frb.accode,
'Actual' AS Type
FROM dw.dbo.dimdate5 AS dd4
LEFT OUTER JOIN dbo.award_data AS ad
ON dd4.fulldate BETWEEN ad.usethisstartdate AND
ad.usethisenddate
LEFT OUTER JOIN dbo.budget_line AS bl
ON bl.award_id = ad.award_id
LEFT OUTER JOIN dw.dbo.fctresearchbalances AS frb
ON frb.el3 = bl.award_id
AND frb.element4groupidnew = bl.budget_line
AND dd4.yearfiscal = frb.yr
AND dd4.monthnumberfiscal = frb.period
The view has 9 columns and 1.5 million rows and growing. A select * from this view was taking 20 minutes for all the rows. I added indexes on the fields in the tables that are joined on and that improved it to 10 minutes. My question is what else could I do to get the select to run faster?
Many thanks, Violet.
Try getting rid of the case statement.
If you have 1.5 million rows, if you're interesting in the aggregation of those rows rather than the whole set, you might want to sum the rows in fctResearchBalances first and then do the joins.
(It's a bit difficult to determine what else you might benefit from, without seeing the access plan.)
1- You can use stored procedure to have buffer cache.
2- you can use indexed view , this means creating index on schemabound views.
3- you can use query hints in join to order the query optimizer to use special kind of join.
4- you can use table partitioning .
SELECT DISTINCT --#1 - potential bottleneck
ad.award_id
, bl.funding_id
, bl.budget_line
, [month] = dd4.monthnumberofyear
, [year] = dd4.yearcalendar
, Expenditure_value = ISNULL(frb.full_value, '0')
, bl.budget_id
, frb.accode
, [type] = 'Actual'
FROM dbo.dimdate5 dd4
LEFT JOIN dbo.award_data ad ON dd4.fulldate BETWEEN ad.usethisstartdate AND ad.usethisenddate
LEFT JOIN dbo.budget_line bl ON bl.award_id = ad.award_id
LEFT JOIN dbo.fctresearchbalances frb ON frb.el3 = bl.award_id --#2 - join by multiple columns
AND frb.element4groupidnew = bl.budget_line
AND dd4.yearfiscal = frb.yr
AND dd4.monthnumberfiscal = frb.period
The CASE statement can be replace by
COALESCE(frb.full_value,'0') AS Expenditure_value
Without more info it's not possible to tell exactly what is wrong but just to give you some pointers.
When you have so many LEFT JOINS the order of the joins can make a difference.
Do you have standard indexes or covering indexes with included columns?
If you don't have covering indexes, then primary keys matter in the joins. Including all the primary key columns in the join condition will speed up the query.
Then look at your data - do you need all those LEFT JOINS base on the foreign keys between those tables? Depending on your keys a LEFT JOIN may be equivalent to an INNER JOIN.
And with all those LEFT JOINS is having a DISTINCT really useful?
How much RAM do you have? If you have 8GB+ then 1.5m rows is nothing for SQL Server. You need to optimise those joins.

WHERE clause better execute before IN and JOIN or after

I read this article:
Logical Processing Order of the SELECT statement
in end of article has been write ON and JOIN clause consider before WHERE.
Consider we have a master table that has 10 million records and a detail table (that has reference to master table(FK)) with 50 million record. We have a query that want just 100 record of detail table according a PK in master table.
In this situation ON and JOIN execute before WHERE?I mean that do we have 500 million record after JOIN and then WHERE apply to it?or first WHERE apply and then JOIN and ON Consider? If second answer is true do it has incoherence with top article?
thanks
In the case of an INNER JOIN or a table on the left in a LEFT JOIN, in many cases, the optimizer will find that it is better to perform any filtering first (highest selectivity) before actually performing whatever type of physical join - so there are obviously physical order of operations which are better.
To some extent you can sometimes control this (or interfere with this) with your SQL, for instance, with aggregates in subqueries.
The logical order of processing the constraints in the query can only be transformed according to known invariant transformations.
So:
SELECT *
FROM a
INNER JOIN b
ON a.id = b.id
WHERE a.something = something
AND b.something = something
is still logically equivalent to:
SELECT *
FROM a
INNER JOIN b
ON a.id = b.id
AND a.something = something
AND b.something = something
and they will generally have the same execution plan.
On the other hand:
SELECT *
FROM a
LEFT JOIN b
ON a.id = b.id
WHERE a.something = something
AND b.something = something
is NOT equivalent to:
SELECT *
FROM a
LEFT JOIN b
ON a.id = b.id
AND a.something = something
AND b.something = something
and so the optimizer isn't going to transform them into the same execution plan.
The optimizer is very smart and is able to move things around pretty successfully, including collapsing views and inline table-valued functions as well as even pushing things down through certain kinds of aggregates fairly successfully.
Typically, when you write SQL, it needs to be understandable, maintainable and correct. As far as efficiency in execution, if the optimizer is having difficulty turning the declarative SQL into an execution plan with acceptable performance, the code can sometimes be simplified or appropriate indexes or hints added or broken down into steps which should perform more quickly - all in successive orders of invasiveness.
It doesn't matter
Logical processing order is always honoured: regardless of actual processing order
INNER JOINs and WHERE conditions are effectively associative and commutative (hence the ANSI-89 "join in the where" syntax) so actual order doesn't matter
Logical order becomes important with outer joins and more complex queries: applying WHERE on an OUTER table changes the logic completely.
Again, it doesn't matter how the optimiser does it internally so long as the query semantics are maintained by following logical processing order.
And the key word here is "optimiser": it does exactly what it says
Just re-reading Paul White's excellent series on the Query Optimiser and remembered this question.
It is possible to use an undocumented command to disable specific transformation rules and get some insight into the transformations applied.
For (hopefully!) obvious reasons only try this on a development instance and remember to re-enable them and remove any suboptimal plans from the cache.
USE AdventureWorks2008;
/*Disable the rules*/
DBCC RULEOFF ('SELonJN');
DBCC RULEOFF ('BuildSpool');
SELECT P.ProductNumber,
P.ProductID,
I.Quantity
FROM Production.Product P
JOIN Production.ProductInventory I
ON I.ProductID = P.ProductID
WHERE I.ProductID < 3
OPTION (RECOMPILE)
You can see with those two rules disabled it does a cartesian join and filter after.
/*Re-enable them*/
DBCC RULEON ('SELonJN');
DBCC RULEON ('BuildSpool');
SELECT P.ProductNumber,
P.ProductID,
I.Quantity
FROM Production.Product P
JOIN Production.ProductInventory I
ON I.ProductID = P.ProductID
WHERE I.ProductID < 3
OPTION (RECOMPILE)
With them enabled the predicate is pushed right down into the index seek and so reduces the number of rows processed by the join operation.
There is no defined order. The SQL engine determines what order to perform the operations based on the execution strategy chosen by its optimizer.
I think you have misread ON as IN in the article.
However, the order it is showing in the article is correct (obviously it is msdn anyway). The ON and JOIN are executed before WHERE naturally because WHERE has to be applied as a filter on the temporary resultset obtained due to JOINS
The article just says it is logical order of execution and also at end of the paragraph it adds this line too ;)
"Note that the actual physical execution of the statement is determined by the query processor and the order may vary from this list."

SQL Server CTE referred in self joins slow

I have written a table-valued UDF that starts by a CTE to return a subset of the rows from a large table.
There are several joins in the CTE. A couple of inner and one left join to other tables, which don't contain a lot of rows.
The CTE has a where clause that returns the rows within a date range, in order to return only the rows needed.
I'm then referencing this CTE in 4 self left joins, in order to build subtotals using different criterias.
The query is quite complex but here is a simplified pseudo-version of it
WITH DataCTE as
(
SELECT [columns] FROM table
INNER JOIN table2
ON [...]
INNER JOIN table3
ON [...]
LEFT JOIN table3
ON [...]
)
SELECT [aggregates_columns of each subset] FROM DataCTE Main
LEFT JOIN DataCTE BananasSubset
ON [...]
AND Product = 'Bananas'
AND Quality = 100
LEFT JOIN DataCTE DamagedBananasSubset
ON [...]
AND Product = 'Bananas'
AND Quality < 20
LEFT JOIN DataCTE MangosSubset
ON [...]
GROUP BY [
I have the feeling that SQL Server gets confused and calls the CTE for each self join, which seems confirmed by looking at the execution plan, although I confess not being an expert at reading those.
I would have assumed SQL Server to be smart enough to only perform the data retrieval from the CTE only once, rather than do it several times.
I have tried the same approach but rather than using a CTE to get the subset of the data, I used the same select query as in the CTE, but made it output to a temp table instead.
The version referring the CTE version takes 40 seconds. The version referring the temp table takes between 1 and 2 seconds.
Why isn't SQL Server smart enough to keep the CTE results in memory?
I like CTEs, especially in this case as my UDF is a table-valued one, so it allowed me to keep everything in a single statement.
To use a temp table, I would need to write a multi-statement table valued UDF, which I find a slightly less elegant solution.
Did some of you had this kind of performance issues with CTE, and if so, how did you get them sorted?
Thanks,
Kharlos
I believe that CTE results are retrieved every time. With a temp table the results are stored until it is dropped. This would seem to explain the performance gains you saw when you switched to a temp table.
Another benefit is that you can create indexes on a temporary table which you can't do to a cte. Not sure if there would be a benefit in your situation but it's good to know.
Related reading:
Which are more performant, CTE or temporary tables?
SQL 2005 CTE vs TEMP table Performance when used in joins of other tables
http://msdn.microsoft.com/en-us/magazine/cc163346.aspx#S3
Quote from the last link:
The CTE's underlying query will be
called each time it is referenced in
the immediately following query.
I'd say go with the temp table. Unfortunately elegant isn't always the best solution.
UPDATE:
Hmmm that makes things more difficult. It's hard for me to say with out looking at your whole environment.
Some thoughts:
can you use a stored procedure instead of a UDF (instead, not from within)?
This may not be possible but if you can remove the left join from you CTE you could move that into an indexed view. If you are able to do this you may see performance gains over even the temp table.

Resources