Speed up view performance - sql-server

I have an old view that takes 4 minutes to run, and I have been asked to speed it up. The FROM clause looks like this:
FROM TableA
CROSS JOIN ViewA
INNER JOIN TableB on ViewA.Name = TableB.Name
AND TableA.Code = TableB.Code
AND TableA.Location = TableB.Location
WHERE (DATEDIFF(m, ViewA.SubmitDate, GETDATE()) = 1) -- Only pull last months rows
TableA has around 99k rows, ViewA has around 2,000 rows and TableB has around 101k rows. I think the problem is at the INNER JOIN, because if I remove it the query takes 1 second.
My first thought was to see if I could cut down the number of rows in ViewA by breaking the whole thing into CTEs, but this made zero impact. I am thinking I need to index TableB, because it is just a bunch of varchars being used in the joins. I am now changing it to temp tables so I can index them. I cannot change the underlying tables and views. Is indexing temp tables a good way to go, or is there a better solution?
Edit to add info regarding existing indexes. Only thing with an index on it right now is TableA.Id which is the PK and a clustered Index. TableB has an Id field but it is not the PK. ViewA is not indexed.
Edit again to correct some structure. SubmitDate is in the View, not the table.
Here is a very basic structure:
CREATE TABLE TableA
(
Id int NOT NULL PRIMARY KEY,
Section varchar(20) NULL,
Code varchar(20) NULL
)
CREATE TABLE TableB
(
Id int NOT NULL PRIMARY KEY,
Name varchar(20) NULL,
Code varchar(20) NULL,
Section varchar(20) NULL
)
CREATE TABLE TableC
(
Id int NOT NULL PRIMARY KEY,
Name varchar(20) NULL,
SubmitDate DateTime NOT NULL
)
CREATE TABLE TableD
(
Id int NOT NULL PRIMARY KEY,
Section varchar(20) NULL
)
CREATE VIEW ViewA
AS
SELECT d.Section, c.Name, c.SubmitDate
FROM TableC c
JOIN TableD d ON c.Id = d.Id

One improvement is to rewrite the WHERE clause as a sargable predicate, so that the date column is compared directly instead of being wrapped in DATEDIFF. Add an index on SubmitDate if there is none and change the query to:
FROM TableA
CROSS JOIN ViewA
INNER JOIN TableB on ViewA.Name = TableB.Name
AND TableA.Code = TableB.Code
AND TableA.Location = TableB.Location
WHERE
ViewA.SubmitDate >= DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()) - 1, 0) -- first day of last month
AND ViewA.SubmitDate < DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0) -- first day of this month
Also add nonclustered indexes on Name, Code and Location columns.
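To sanity-check the date range, the two boundaries can be computed on their own. This is only a minimal sketch: the variable names and the index name are made up for illustration, and the composite index assumes you are actually allowed to add indexes to TableB, which the question says may not be the case.

```sql
-- Hypothetical variable names, for illustration only.
DECLARE @RangeStart datetime = DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()) - 1, 0); -- first day of last month, 00:00
DECLARE @RangeEnd   datetime = DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0);     -- first day of this month, 00:00

-- Inspect the boundaries before using them in the view.
SELECT @RangeStart AS RangeStart, @RangeEnd AS RangeEnd;

-- A half-open range leaves the column itself untouched, so an index on it can be used for a seek:
--   WHERE SubmitDate >= @RangeStart AND SubmitDate < @RangeEnd

-- Assumption: one composite index covering the three join columns (index name is hypothetical).
CREATE NONCLUSTERED INDEX IX_TableB_Name_Code_Location
    ON TableB (Name, Code, Location);
```

Whether one composite index or three single-column indexes works better depends on the actual plan, so compare the execution plans before and after.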

Related

Creating INSERT trigger that sets values to 0

I have two tables : Invoice and Invoice_item, relationship 1 to many.
The Invoice_item table has columns Number_sold and Item_price, and the Invoice table has Number_sold_total and Item_price_total columns that will store total values of columns Number_sold and Item_price from the Invoice_item table with the same Invoice_ID key.
CREATE TABLE [Invoice] (
[Invoice_ID] [int] NOT NULL,
[Number_sold_total] [int] NOT NULL,
[Item_price_total] [decimal] NOT NULL,
PRIMARY KEY ([Invoice_ID]));
CREATE TABLE [Invoice_item] (
[Invoice_item_ID] [int] NOT NULL,
[Invoice_ID] [int] NOT NULL,
[Number_sold] [int] NOT NULL,
[Item_price] [decimal] NOT NULL,
PRIMARY KEY ([Invoice_item_ID],[Invoice_ID]),
FOREIGN KEY ([Invoice_ID]) REFERENCES [Invoice]([Invoice_ID]));
So, if there are three rows in Invoice_item with the same Invoice_ID, the row with that Invoice_ID in Invoice table will have SUM values of corresponding columns in Invoice_item table.
Let's say I have three rows in the Invoice_item table whose Item_price values are 100, 200 and 300, and all of them have Invoice_ID = 3. The Item_price_total column in Invoice will then have the value 600 for the row where Invoice_ID = 3.
QUESTION -
My task is to create an insert trigger on table Invoice that will set the values of Number_sold_total and Item_price_total to 0(ZERO) if there is no Invoice_item with corresponding Invoice_ID -> IF NOT EXISTS (Invoice.Invoice_ID = Invoice_item.Invoice_ID)...
I am using SQL Server 2017.
Ideally you would not implement this using triggers.
Instead you should use a view. If you are worried about querying performance, you can index it, at the cost of insert and delete performance.
CREATE VIEW dbo.Invoice_Totals
WITH SCHEMABINDING
AS
SELECT
i.Invoice_ID,
Number_sold = SUM(i.Number_sold),
Item_price = SUM(i.Item_price),
ItemCount = COUNT_BIG(*) -- must include count for indexed view
FROM dbo.Invoice_item i
GROUP BY i.Invoice_ID;
And then index it:
CREATE UNIQUE CLUSTERED INDEX CX_Invoice_Totals ON dbo.Invoice_Totals
(Invoice_ID);
If you really, really want to do this using triggers, you can use the following
CREATE OR ALTER TRIGGER TR_Invoice_Total
ON dbo.Invoice_item
AFTER INSERT, UPDATE, DELETE
AS
SET NOCOUNT ON; -- prevent spurious resultsets
IF (NOT EXISTS (SELECT 1 FROM inserted) AND NOT EXISTS (SELECT 1 FROM deleted))
RETURN; -- early bail-out if no rows
UPDATE i
SET Number_sold_total += totals.Number_sold_total,
Item_price_total += totals.Item_price_total
FROM Invoice i
JOIN (
SELECT
Invoice_ID = ISNULL(i.Invoice_ID, d.Invoice_ID),
Number_sold_total = SUM(ISNULL(i.Number_sold, 0) - ISNULL(d.Number_sold, 0)),
Item_price_total = SUM(ISNULL(i.Item_price, 0) - ISNULL(d.Item_price, 0))
FROM inserted i
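FULL JOIN deleted d ON d.Invoice_item_ID = i.Invoice_item_ID
AND d.Invoice_ID = i.Invoice_ID -- match on the full primary key so a multi-row change does not pair unrelated rows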
GROUP BY
ISNULL(i.Invoice_ID, d.Invoice_ID)
) totals
ON totals.Invoice_Id = i.Invoice_ID;
The steps of the trigger are as follows:
1. Bail out early if the modification affected 0 rows.
2. Join the inserted and deleted tables together on the primary key. This needs to be a full join, because in an INSERT there are no deleted rows and in a DELETE there are no inserted rows.
3. Group the changed rows by Invoice_ID, taking the sum of the differences.
4. Join back to the Invoice table.
5. Update the Invoice table, adding the total difference to each column.
This effectively recreates what the indexed view would do for you automatically.
You cannot just select the first row from inserted and deleted into variables, as there may be multiple rows affected. You must join and group them.
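A quick way to check the trigger by hand, assuming the Invoice and Invoice_item tables from the question exist and the trigger above has been created. The sample IDs and values are made up; the totals noted in the comments follow from the arithmetic of the deltas.

```sql
-- Start with an invoice whose totals are zero.
INSERT INTO Invoice (Invoice_ID, Number_sold_total, Item_price_total)
VALUES (3, 0, 0);

-- One multi-row statement: the trigger fires once and sees all three rows in "inserted".
INSERT INTO Invoice_item (Invoice_item_ID, Invoice_ID, Number_sold, Item_price)
VALUES (1, 3, 1, 100),
       (2, 3, 2, 200),
       (3, 3, 3, 300);

SELECT Number_sold_total, Item_price_total
FROM Invoice
WHERE Invoice_ID = 3;   -- totals are now 6 and 600

-- Deleting a row subtracts its values again via the "deleted" table.
DELETE FROM Invoice_item
WHERE Invoice_item_ID = 3 AND Invoice_ID = 3;

SELECT Number_sold_total, Item_price_total
FROM Invoice
WHERE Invoice_ID = 3;   -- totals are now 3 and 300
```

Note that the trigger only applies deltas, so it relies on the Invoice row already existing with correct starting totals.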

I am trying to fetch data from a sqlite3 database and am having this ambiguous column name problem. I don't see any issue, need an explanation

I am trying to fetch the list of movies in which two given people both star. Here is the table format:
CREATE TABLE people (
id INTEGER,
name TEXT NOT NULL,
birth NUMERIC,
PRIMARY KEY(id)
);
CREATE TABLE stars (
movie_id INTEGER NOT NULL,
person_id INTEGER NOT NULL,
FOREIGN KEY(movie_id) REFERENCES movies(id),
FOREIGN KEY(person_id) REFERENCES people(id)
);
CREATE TABLE movies (
id INTEGER,
title TEXT NOT NULL,
year NUMERIC,
PRIMARY KEY(id)
);
Query:
Running the query below gives me "ambiguous column name: movie_id". I don't understand what the issue is here:
select movie_id from (
(select movie_id,person_id from (
select id from people where name = "Johnny Depp") as x
inner join
stars on x.id = stars.person_id) as xx
inner join
(select movie_id,person_id from (
select id from people where name = "Helena Bonham Carter") as y
inner join
stars on y.id = stars.person_id) as yy
on xx.movie_id = yy.movie_id
);
It is easier to group by movie and keep only those groups that contain both of the names from the WHERE clause:
select s.movie_id
from people p
join stars s on p.id = s.person_id
where p.name in ('Johnny Depp', 'Helena Bonham Carter')
group by s.movie_id
having count(distinct p.name) = 2
The issue with your query is that when you select from two tables (here the derived tables xx and yy) that both have a column with the same name, you have to tell the database which one you want by qualifying the column with the table name or alias:
select xx.movie_id ...
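For reference, here is one way the original two-subquery approach could look with the column qualified. This is a sketch against the people and stars tables from the question; the aliases p and s are introduced here for readability. It also uses single quotes for the names, since in SQL single quotes delimit string literals while double quotes are meant for identifiers.

```sql
SELECT xx.movie_id            -- qualified, so it is no longer ambiguous
FROM (
    SELECT s.movie_id
    FROM stars s
    JOIN people p ON p.id = s.person_id
    WHERE p.name = 'Johnny Depp'
) AS xx
JOIN (
    SELECT s.movie_id
    FROM stars s
    JOIN people p ON p.id = s.person_id
    WHERE p.name = 'Helena Bonham Carter'
) AS yy
    ON xx.movie_id = yy.movie_id;
```

If more than one person shares a name, the same movie can come back more than once; adding DISTINCT to the outer SELECT would collapse those rows.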

Self Join on large tables slowness issue

I have two tables like...
table1 (cid, duedate, currency, value)
main_table1 (cid)
My query is below. I am trying to find the correlation between each pair of cids. table1 contains 3 million records (the cid and duedate columns are compositely unique) and main_table1 contains 1500 records, all unique.
SELECT
b.cid, c.cid,
(COUNT(*) * SUM(b.value * c.value) -
SUM(b.value) * SUM(c.value)) /
(SQRT(COUNT(*) * SUM(b.value * b.value) -
SUM(b.value) * SUM(b.value)) *
SQRT(COUNT(*) * SUM(c.value * c.value) -
SUM(c.value) * SUM(c.value))
) AS correl_ij
FROM
main_table1 a
JOIN
table1 AS b ON a.cid = b.cid
JOIN
table1 AS c ON b.cid < c.cid
AND b.duedate = c.duedate
AND b.currency = c.currency
GROUP BY
b.cid, c.cid
Please suggest how to optimize this query because it is running slow.
CREATE TABLE #table1(
id int identity,
cid int NOT NULL,
duedate date NOT NULL,
currency char(3) NOT NULL,
value float,
PRIMARY KEY(id,currency,cid,duedate)
);
CREATE TABLE #main_table1(
cid int NOT NULL PRIMARY KEY,
currency char(3)
);
--#main table contains 155000 cid records there is no duplicate values
insert into #main_table1
values(19498,'ABC'),(19500,'ABC'),(19534,'ABC')
INSERT INTO #table1(CID,DUEDATE,currency,value)
VALUES(19498,'2016-12-08','USD',-0.0279702098021799) ,
(19498,'2016-12-12','USD',0.0151285161000268),
(19498,'2016-12-15','USD',-0.00965080868337728),
(19498,'2016-12-19','USD',0.00808331709091531)
There are 3 million records in this table for different dates and cids, and most of the cids are present in #main_table1.
I am using b.cid < c.cid to remove duplicate relationships between cids, because I am deriving the correlation between each pair of cids.
So once the 19498 -->> 19500 correlation is calculated, I do not want 19500 --> 19498, because it would be the same value, just duplicated.
That PK is silly. Why would you include the identity column (id) in a composite PK, let alone in the first position? Drop id unless you have to have it for some misguided reason.
PRIMARY KEY(cid, currency, duedate)
Or the natural key, if different.
If you're commonly joining or sorting on the cid column, you probably want a clustered index on that column or a composite beginning with that column.
If cid, duedate is unique then you can consider removing the id altogether.
If you want to retain id for some reason, make it PRIMARY KEY NONCLUSTERED, and specify a clustered index on cid, duedate.
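As a sketch of that last suggestion, the temp table from the question could be declared like this. The index name is hypothetical, and the UNIQUE keyword relies on the question's statement that cid and duedate are compositely unique; drop it if that does not hold.

```sql
CREATE TABLE #table1 (
    id       int IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED, -- surrogate key kept, but not clustered
    cid      int     NOT NULL,
    duedate  date    NOT NULL,
    currency char(3) NOT NULL,
    value    float   NULL
);

-- Cluster on the columns used by the self join instead of on the identity column.
CREATE UNIQUE CLUSTERED INDEX CX_table1_cid_duedate
    ON #table1 (cid, duedate);
```

Because the self join matches rows on duedate and currency, it may also be worth testing a clustered key that leads with those columns; only the execution plan on real data will tell which order wins.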

Make use of index when JOIN'ing against multiple columns

Simplified, I have two tables, contacts and donotcall
CREATE TABLE contacts
(
id int PRIMARY KEY,
phone1 varchar(20) NULL,
phone2 varchar(20) NULL,
phone3 varchar(20) NULL,
phone4 varchar(20) NULL
);
CREATE TABLE donotcall
(
list_id int NOT NULL,
phone varchar(20) NOT NULL
);
CREATE NONCLUSTERED INDEX IX_donotcall_list_phone ON donotcall
(
list_id ASC,
phone ASC
);
I would like to see which contacts match a phone number in a specific DoNotCall list.
For faster lookup, I have indexed donotcall on list_id and phone.
When I make the following JOIN it takes a long time (eg. 9 seconds):
SELECT DISTINCT c.id
FROM contacts c
JOIN donotcall d
ON d.list_id = 1
AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4)
Execution plan on Pastebin
If I instead LEFT JOIN on each phone field separately, it runs a lot faster (e.g. 1.5 seconds):
SELECT c.id
FROM contacts c
LEFT JOIN donotcall d1
ON d1.list_id = 1
AND d1.phone = c.phone1
LEFT JOIN donotcall d2
ON d2.list_id = 1
AND d2.phone = c.phone2
LEFT JOIN donotcall d3
ON d3.list_id = 1
AND d3.phone = c.phone3
LEFT JOIN donotcall d4
ON d4.list_id = 1
AND d4.phone = c.phone4
WHERE
d1.phone IS NOT NULL
OR d2.phone IS NOT NULL
OR d3.phone IS NOT NULL
OR d4.phone IS NOT NULL
Execution plan on Pastebin
My assumption is that the first snippet runs slowly because it doesn't utilize the index on donotcall.
So, how to do a join towards multiple columns and still have it use the index?
SQL Server might think resolving IN (c.phone1, c.phone2, c.phone3, c.phone4) using an index is too expensive.
You can test if the index would be faster with a hint:
SELECT c.*
FROM contacts c
JOIN donotcall d with (index(IX_donotcall_list_phone))
ON d.list_id = 1
AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4)
From the query plans you posted, it shows the first plan is estimated to produce 40k rows, but it just returns 21 rows. The second plan estimates 1 row (and of course returns 21 too.)
Are your statistics up to date? Out-of-date statistics can explain the query analyzer making bad choices. Statistics should be updated automatically or in a weekly job. Check the age of your statistics with:
select object_name(ind.object_id) as TableName
, ind.name as IndexName
, stats_date(ind.object_id, ind.index_id) as StatisticsDate
from sys.indexes ind
order by
stats_date(ind.object_id, ind.index_id) desc
You can update them manually with:
EXEC sp_updatestats;
With this poor database structure, a UNION ALL query might be fastest.
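A sketch of what such a UNION ALL query could look like, using the contacts and donotcall tables from the question. Each branch is a plain equality join, which gives the optimizer a chance to seek on IX_donotcall_list_phone; the derived-table alias m is introduced here for illustration.

```sql
SELECT DISTINCT m.id          -- a contact can match on more than one phone column
FROM (
    SELECT c.id
    FROM contacts c
    JOIN donotcall d ON d.list_id = 1 AND d.phone = c.phone1
    UNION ALL
    SELECT c.id
    FROM contacts c
    JOIN donotcall d ON d.list_id = 1 AND d.phone = c.phone2
    UNION ALL
    SELECT c.id
    FROM contacts c
    JOIN donotcall d ON d.list_id = 1 AND d.phone = c.phone3
    UNION ALL
    SELECT c.id
    FROM contacts c
    JOIN donotcall d ON d.list_id = 1 AND d.phone = c.phone4
) AS m;
```

Whether this actually beats the four LEFT JOINs is something to measure against the real data rather than assume.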

SQL Server putting data into temp table first before heavy join

Is it a good idea to put data into temp table first before joining several other tables?
For instance, let's say I have the following:
tableA, 5 million rows
tableB, 5 million rows
tableC, 5 million rows
...
tableG
The Query I want to perform may look like:
SELECT 1 FROM tableA
INNER JOIN tableB WITH (NOLOCK) ON tableA.col1= tableB.col1
LEFT JOIN tableC WITH (NOLOCK) ON ...
...
LEFT JOIN tableG WITH (NOLOCK) ON ...
WHERE tableA.someCol= conditionA AND tableB.someCol= conditionB...
Assuming with the filter, only a small subset of tableA will be returned. Would it be a good idea to pull data from tableA first before joining other tables, so as to avoid blocking and may be increase performance?
I tried googling but couldn't find any satisfactory answer. Thanks in advance.
Here are the "typicals" that I try. I usually try them out and see what happens under load and under "big data" that represents production row counts, not dev row counts.
Going from memory.
1. If it is "one time" use, I try the derived table method.
2. If the data in the "holder" table can be reused, I start with a @variableTable if the number of rows will be small.
2.b. The only time I've seen a @variableTable screw you is if you do some aggregate results, where the "summary rows" are only a few, but to generate the summary rows you hit a large number of rows. Think something like "Select StateAbbreviation, count(*) from dbo.LargeTableOfData group by StateAbbreviation": there will only be 50 or so rows in the result table, BUT the aggregate data comes from a large table with lots of rows.
3. Then I go to a #TempTable. Most times without an index. Sometimes with an index.
4. Two or three times in my life the index on the #TempTable resulted in a significant improvement.
It is a "try it out" game. Sometimes you just don't know until you give it the ole college try.
Use Northwind
GO
/* Temp Table , No Index(es) */
IF OBJECT_ID('tempdb..#TempTableNoIndex') IS NOT NULL
begin
drop table #TempTableNoIndex
end
CREATE TABLE #TempTableNoIndex
(
OrderID int
)
Insert into #TempTableNoIndex (OrderID) select top 5 OrderID from dbo.Orders
Select * from dbo.[Order Details] od where exists (select null from #TempTableNoIndex innerHolder where innerHolder.OrderID = od.OrderID )
/* Temp Table , With Index(es) */
IF OBJECT_ID('tempdb..#TempTableWithIndex') IS NOT NULL
begin
drop table #TempTableWithIndex
end
CREATE TABLE #TempTableWithIndex
(
OrderID int
)
CREATE INDEX IX_TEMPTABLE_TempTableWithIndex_OrderID ON #TempTableWithIndex (OrderID)
Insert into #TempTableWithIndex (OrderID) select top 5 OrderID from dbo.Orders
Select * from dbo.[Order Details] od where exists (select null from #TempTableWithIndex innerHolder where innerHolder.OrderID = od.OrderID )
/* Variable Table */
Declare @HolderTable TABLE ( OrderID int )
Insert into @HolderTable (OrderID) select top 5 OrderID from dbo.Orders
Select * from dbo.[Order Details] od where exists (select null from @HolderTable innerHolder where innerHolder.OrderID = od.OrderID )
/* Derived Table */
Select * from dbo.[Order Details] od
join
( select top 5 OrderID from dbo.Orders ) as derived1
on od.OrderID = derived1.OrderID
/* Clean up */
IF OBJECT_ID('tempdb..#TempTableNoIndex') IS NOT NULL
begin
drop table #TempTableNoIndex
end
IF OBJECT_ID('tempdb..#TempTableWithIndex') IS NOT NULL
begin
drop table #TempTableWithIndex
end

Resources