How to speed up a denormalized table with indexes - sql-server

I have created a denormalized table with 35 columns and 360k records. The table is built from 8 other tables, and queries against it use only one inner join to another table.
My main problem is performance: queries against this table run very slowly. I also have a full-text catalog on the table, and full-text searches are slow as well.
I am also getting deadlocks; in Activity Monitor I see LCK_M_IX and LCK_M_IS wait types.
I have added indexes liberally and followed the query execution plans.
I know that a table scan is bad when there are too many records; I got past that long ago, but right now index scans account for about 80% of the cost.
In the query below, the column list inside the SELECT comes from another table as a string, and this query already runs about as fast as I can get it. I also build the WHERE clause dynamically along with other conditions.
https://imgur.com/a/ijOUYeY
SET @Sql=';WITH TempResult AS(Select '+(SELECT STUFF((SELECT ',' +DBFieldName FROM TableFields where TableFields.TableID=3 FOR XML PATH('')), 1, 1, ''))+',
0 as [DeviceChange],
0 as [DeviceReturn],
0 as [SNORepeat]
from DenormalizedTable
inner join Companies On Companies.CompanyID=DenormalizedTable.CompanyID
WHERE (@state is null or DenormalizedTable.StateID=@state)
AND ((@status = -1 AND DenormalizedTable.Status IN(0,1,10,11,4)) OR (@status=1 AND DenormalizedTable.Status IN(1,4,10,11)) OR
(@status=2 AND DenormalizedTable.Status IN (0))) AND (@CategoryID is null or DenormalizedTable.CategoryID=@CategoryID)
AND (DenormalizedTable.CompanyID=@companyID OR Companies.SubCompanyOf=@companyID)
'+ ( CASE WHEN @CustomSearchParam='' THEN '' ELSE (Select dbo.[perf_whereBuilder](@CustomSearchParam)) END)+'
AND (@technicianID is null or DenormalizedTable.JobOrderID IN (Select AttendedStaff.JobOrderID from AttendedStaff Where AttendedStaff.StaffID=@technicianID))
AND (@FilterStartDate is null or convert(varchar, JobOrder_PerfTable.StartDate, 20) between @FilterStartDate and @FilterEndDate)
), TotalCount AS (Select COUNT(*) as TotalCount from TempResult)
Select * from TempResult, TotalCount
order by 1 desc
OFFSET @skip ROWS
FETCH NEXT @take ROWS ONLY;';
When I run the stored procedure it takes almost 5 seconds to return, and if there is any search parameter it takes even longer.
I need to know what I should do to make this query run faster; sorry if the question is broad.
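For reference, dynamic SQL built this way is normally executed with sp_executesql, passing the variables through as real parameters instead of concatenating their values into the string. The sketch below is only illustrative: the parameter types are assumptions and only a few of the variables used above are listed.

-- Assumes @Sql is declared as nvarchar(max); the parameter types below are guesses.
EXEC sp_executesql
    @Sql,
    N'@state int, @status int, @CategoryID int, @companyID int, @skip int, @take int',
    @state = @state, @status = @status, @CategoryID = @CategoryID,
    @companyID = @companyID, @skip = @skip, @take = @take;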

Related

Why does selecting a STUFF value decrease the performance of a query in SQL Server

Recently I wanted to improve some of my queries and I noticed something that I don't really understand. Here is my query:
SELECT
S.VENTE_GUID, Top6QtySold.list
FROM
SALES s with(nolock)
OUTER APPLY
(SELECT
STUFF((SELECT TOP(6) ',' + t.CodeProd
FROM topSold t
WHERE t.VENTE_GUID = s.VENTE_GUID
ORDER BY t.nbOrdered DESC
FOR XML PATH('')), 1, 1, '') AS list
) AS Top6QtySold
When I execute this query, it takes around 6 seconds, but when I comment out the column Top6QtySold.list in the SELECT statement, it takes less than 1 second.
I know that the number of columns in the SELECT statement can reduce performance, but in this case the hard work has already been done by the OUTER APPLY, so displaying the value of Top6QtySold should be a formality, no?
(I have simplified the query, but imagine that there is a lot of data in the SELECT list and more joins.)
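One way to see whether the correlated STUFF subquery itself is the expensive part is to materialize the per-sale lists once into a temp table and then join to it, rather than evaluating the subquery for every SALES row. This is only a hedged sketch built from the names in the query above; #Top6QtySold is a made-up temp table name.

-- Build each sale's comma-separated list once, then join it back to SALES.
SELECT t.VENTE_GUID,
       STUFF((SELECT TOP(6) ',' + t2.CodeProd
              FROM topSold t2
              WHERE t2.VENTE_GUID = t.VENTE_GUID
              ORDER BY t2.nbOrdered DESC
              FOR XML PATH('')), 1, 1, '') AS list
INTO #Top6QtySold
FROM (SELECT DISTINCT VENTE_GUID FROM topSold) AS t;

SELECT s.VENTE_GUID, x.list
FROM SALES s
LEFT JOIN #Top6QtySold x ON x.VENTE_GUID = s.VENTE_GUID;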

Abysmal performance - Hash join takes >80% of query cost in SQL Server 2012

This part of the query makes it quite bad; unfortunately I can't see a way around it, only ways to optimize it.
update #db set contents = i.contents
from (select distinct
(select max(ac.contents) from ##dwv d
left join ##calendar c on 1=1
left join #db ac on d.id = ac.id
and c.ReportingPeriod = ac.DateValue and ac.Data_Type = 'ActivePeriod'
where d.ID = dd.id and month_number >= (cc.month_number-3)
and month_number <= cc.month_number) contents
,dd.id
,cc.ReportingPeriod
from #db dd
left join ##calendar cc on cc.ReportingPeriod = dd.DateValue
where dd.Data_Type = 'ActivePeriod'
)i
where i.id = #db.id and i.ReportingPeriod = #dashboard.DateValue
I tried a MERGE first, but that wasn't going anywhere fast, and the query above came to be.
The idea is to mark every customer as active in any given period (year and month, in the format 'YYYYMM') according to a specific algorithm, so for every customer that matches the report criteria I need a row telling me whether they were active (that is, bought something recently).
#db is a temp table where I gather all the data that is later used for the aggregates that produce the report - a large table of several million rows, depending on the timeframe:
Create table #db
(
C_P varchar(6)
,Data_Type varchar(20)
,id int
,contents int
,DateValue varchar(10)
)
##dwv is a temp table into which I dump the result of a select on a large view (which is itself very slow); it holds about 2.4 million rows.
##calendar is an ad-hoc table which stores every period the report encompasses, in the same 'YYYYMM' format:
select CONVERT(char(6), cast(@startdate as date), 112) "CP"
,CONVERT(char(6), cast(PKDate as date), 112) "RP"
,(ROW_NUMBER() over (order by (CONVERT(char(6), cast(PKDate as date), 112)) asc))-1
as month_number
into ##calendar
from [calendar].[dbo].[days]
where PKDate between @startdate and @enddate2
group by CONVERT(char(6), cast(PKDate as date), 112)
The query plan tells me that the predicate c.ReportingPeriod = ac.DateValue is the culprit - it accounts for 88% of the subquery cost, which in turn accounts for 87% of the cost of the whole query.
What am I not seeing here and how can I improve that?
Hash Joins usually mean that the columns used in the JOIN are not indexed.
Make sure you have covering indexes for these columns:
d.id = ac.id
and c.ReportingPeriod = ac.DateValue and ac.Data_Type
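For illustration, a minimal sketch of what such indexes might look like, based on the #db definition shown above and the join columns named here; the index names are made up and the key order would need validating against the actual plan.

-- Hypothetical covering index on the #db temp table for the join/filter columns.
CREATE NONCLUSTERED INDEX IX_db_id_DateValue_DataType
    ON #db (id, DateValue, Data_Type)
    INCLUDE (contents);

-- The ##dwv side of the join may benefit from an index on its id column too.
CREATE NONCLUSTERED INDEX IX_dwv_id
    ON ##dwv (id);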
In case someone stumbles on this, I'll explain what I did to trim the execution time down from 32 minutes to 15 seconds.
First, as suggested in the comments and in Tab Alleman's answer, I looked at the indexes on the tables where a HASH JOIN showed up in the execution plan. I also took a closer look at the ON clauses of the joins, refining them here and there, which left fewer rows in the results. To be more specific, the inline query that fetches the 'contents' value for the update now runs against the source table ##dwv joined to a preprocessed ##calendar table, instead of a cross join between the two tables followed by another join to the result. This reduced the final dataset to a few hundred thousand rows instead of the 17 billion reported in the query plan.
The effect is that the report is now lightning fast compared to the previous drafts, so much so that it can be run in a loop and still produce output in a more than reasonable time.
The bottom line is that you have to pay attention to what SQL Server complains about, but you should also look at the number of rows being crunched and try to lower it whenever possible. Indexing is good, but it's not "the miracle cure" for everything that ails your query.
Thanks to all who took the time to write here - when several people say similar things, it's always worth sitting down and thinking about it.

Update with limit and offset applied to joined table

I have an UPDATE with an INNER JOIN. My overall question is how (if it is possible at all) to apply a LIMIT and OFFSET to that joined table.
Example query without limit and offset:
UPDATE t2
SET t2.some_col = t1.some_col
FROM table_1 t1
INNER JOIN table_2 t2
ON t1.other_col = t2.other_col
How can I rebuild this query so it processes only the first 1,000,000 records from t2, then records 1,000,000 - 2,000,000, then 2,000,000 - 3,000,000, and so on?
Exact scenario:
My task is to rebuild very large tables, converting their hash (char(32)) keys to bigint keys. Example tables:
URLS:
  id char(32)
  other_columns
  intUrlId bigint (added and filled)

PAGE_VIEWS:
  urlId char(32)
  referrerUrlId char(32)
  other_columns
  intUrlId bigint (needs to be updated)
  intReferrerUrlId bigint (needs to be updated)
The first table has about 200 million records, the second over 1 billion. I update these tables in packs. The update job wouldn't be difficult if I could use WHERE urls.intUrlId BETWEEN ..., but I can't. Sometimes the JOIN returns, for example, 500,000 records for a single pack, but many times it returns 0, so it updates 0 records, yet a join on such big tables still costs quite a lot of time. So I need equal packs limited by the page_views table, not the urls table. The page_views table has no column I can base a WHERE clause on, so I need to limit it with TOP and ROW_NUMBER(), but I don't know how. (I'm quite new to MS SQL; I used to work with MySQL and PostgreSQL databases, which have LIMIT and OFFSET clauses.)
With any answer I would appreciate information about the cost of the solution; some people would be happy with any LIMIT - OFFSET solution, but not me. I already have a query that updates what I need, but it uses intUrlId from the urls table and it is slow. I need a faster solution. The server version is 2008.
BTW, don't ask me who the hell based the database on char keys :-) It has now become a problem, and a multi-TB database needs to be rebuilt.
You can try using a CTE with a RowNumber
WITH toUpdate AS
(
SELECT urlId, intUrlId, ROW_NUMBER() OVER (ORDER BY something) AS RowNumber
FROM [XXX].[ZZZ].[Urls]
)
UPDATE pv
SET pv.intUrlId = urls.intUrlId
FROM toUpdate urls
INNER JOIN [XXX].[YYY].[PageViews] pv WITH(NOLOCK) ON pv.urlId = urls.id and RowNumber between 10000 and 20000
To answer the question of how to apply LIMIT and OFFSET to the joined table: the tables in Jeremy's answer need to be switched. Here is the correct version for the example query I used in my question.
WITH toUpdate AS
(
SELECT some_col, other_col, ROW_NUMBER() OVER (ORDER BY any_column) AS RowNumber
FROM table_2
)
UPDATE toUpdate
SET toUpdate.some_col = t1.some_col
FROM table_1 t1
INNER JOIN toUpdate ON t1.other_col = toUpdate.other_col
AND RowNumber BETWEEN 1000000 AND 2000000
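If the goal is to walk through table_2 in packs of one million rows, the corrected CTE can be driven from a simple loop. This is only a hedged sketch: @batchStart, @batchSize and @maxRow are made-up variable names, and each iteration re-numbers table_2, which is not free on a table this size.

DECLARE @batchSize bigint = 1000000;
DECLARE @batchStart bigint = 1;
DECLARE @maxRow bigint = (SELECT COUNT_BIG(*) FROM table_2);

WHILE @batchStart <= @maxRow
BEGIN
    -- ROW_NUMBER() is recomputed per iteration; the numbering is only stable
    -- if the ORDER BY column is deterministic (ideally a unique key).
    ;WITH toUpdate AS
    (
        SELECT some_col, other_col,
               ROW_NUMBER() OVER (ORDER BY any_column) AS RowNumber
        FROM table_2
    )
    UPDATE toUpdate
    SET toUpdate.some_col = t1.some_col
    FROM table_1 t1
    INNER JOIN toUpdate ON t1.other_col = toUpdate.other_col
        AND RowNumber BETWEEN @batchStart AND @batchStart + @batchSize - 1;

    SET @batchStart = @batchStart + @batchSize;
END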

Performance Issues with Count(*) in SQL Server

I am having some performance issues with a query I am running in SQL Server 2008. I have the following query:
Query1:
SELECT GroupID, COUNT(*) AS TotalRows FROM Table1
INNER JOIN (
SELECT Column1 FROM Table2 WHERE GroupID = @GroupID
) AS Table2
ON Table2.Column1 = Table1.Column1
WHERE CONTAINS(Table1.*, @Word) GROUP BY GroupID
Table1 contains about 500,000 rows. Table2 contains about 50,000, but will eventually contain millions. Playing around with the query, I found that rewriting it as follows reduces the execution time to under 1 second.
Query 2:
SELECT GroupID FROM Table1
INNER JOIN (
SELECT Column1 FROM Table2 WHERE GroupID = @GroupID
) AS Table2 ON Table2.Column1 = Table1.Column1
WHERE CONTAINS(Table1.*, @Word)
What I do not understand is that this is a simple count query. If I execute the following query on Table1, it returns in under 1 second:
Query 3:
SELECT Count(*) FROM Table1
This query returns around 500,000 as the result.
However, the original query (Query 1) above returns a count of only 50,000 and takes 3 seconds to execute, even though simply removing the GROUP BY (Query 2) brings the execution time under 1 second.
I do not believe this is an indexing issue, as I already have indexes on the appropriate columns. Any help would be very much appreciated.
Performing a simple COUNT(*) FROM table can do a much more efficient scan of the clustered index, since it doesn't have to care about any filtering, joining, grouping, etc. The queries that include full-text search predicates and mysterious subqueries have to do a lot more work. The count is not the most expensive part there - I bet they're still relatively slow if you leave the count out but leave the group by in, e.g.:
SELECT GroupID FROM Table1
INNER JOIN (
SELECT Column1 FROM Table2 WHERE GroupID = @GroupID
) AS Table2 ON Table2.Column1 = Table1.Column1
WHERE CONTAINS(Table1.*, @Word)
GROUP BY GroupID;
Looking at the actual execution plan you provided in the free SQL Sentry Plan Explorer* (plan screenshots not reproduced here), I see a couple of issues which lead me to believe you should:
Update the statistics on both Inventory and A001_Store_Inventory so that the optimizer can get a better rowcount estimate (which could lead to a better plan shape).
Ensure that Inventory.ItemNumber and A001_Store_Inventory.ItemNumber are the same data type to avoid an implicit conversion.
(*) disclaimer: I work for SQL Sentry.
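A hedged sketch of those two suggestions, assuming the dbo schema and the table names mentioned above:

-- Refresh statistics so the optimizer gets better rowcount estimates.
UPDATE STATISTICS dbo.Inventory;
UPDATE STATISTICS dbo.A001_Store_Inventory;

-- Compare the declared types of the join columns to spot an implicit conversion.
SELECT OBJECT_NAME(c.object_id) AS table_name,
       c.name AS column_name,
       t.name AS data_type,
       c.max_length
FROM sys.columns AS c
JOIN sys.types AS t ON t.user_type_id = c.user_type_id
WHERE c.name = 'ItemNumber'
  AND c.object_id IN (OBJECT_ID('dbo.Inventory'), OBJECT_ID('dbo.A001_Store_Inventory'));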
You should have a look at the query plan to see what SQL Server is doing to retrieve the data you requested. Also, I think it would be better to rewrite your original query as follows:
SELECT
Table1.GroupID -- When you use JOINs, it's always better to specify Table (or Alias) names
,COUNT(Table1.GroupID) AS TotalRows
FROM
Table1
INNER JOIN
Table2 ON
(Table2.Column1 = Table1.Column1) AND
(Table2.GroupID = @GroupID)
WHERE
CONTAINS(Table1.*, @Word)
GROUP BY
Table1.GroupID
Also, keep in mind that a simple COUNT and a COUNT with a JOIN and a GROUP BY are not the same thing. In one case it's just a matter of scanning an index and counting; in the other, additional tables and grouping are involved, which can be time-consuming depending on several factors.

SQL Server: Order By DateDiff Performance issue

I'm having a problem getting the top 100 rows from a table with 2M rows in a reasonable time.
The problem is the ORDER BY part; it takes more than 50 minutes to get results for this query.
What would be the best solution for this problem?
select top 100 * from THETABLE TT
Inner join SecondTable ST on TT.TypeID = ST.TypeID
ORDER BY DATEDIFF(Day, TT.LastCheckDate, GETDATE()) * ST.SomeParam DESC
Many thanks,
Bentzy
Edit:
* TheTable is the one with 2M rows.
* SomeParam has 15 distinct values (more or less)
There are two things that come to mind to speed up this fetch:
If you need to run this query often, you should index the 'LastCheckDate' column. No matter which SQL database you are using, a well-defined index on the column will allow for faster selects, especially with an ORDER BY clause (a minimal sketch follows this list).
Perform the date math before doing the select query. You are getting the difference in days between the row's LastCheckDate and the current date, times some parameter. Does the multiplication affect the ordering of the rows? Could this simply be ordered by LastCheckDate DESC? Explore other sorting options that return the same result.
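A minimal sketch of the indexing suggestion; the index name is hypothetical, and whether the multiplication can really be dropped from the ORDER BY depends on the sign and spread of ST.SomeParam.

-- Hypothetical index to support filtering/ordering on LastCheckDate;
-- TypeID is included so the join column is covered as well.
CREATE NONCLUSTERED INDEX IX_THETABLE_LastCheckDate
    ON THETABLE (LastCheckDate)
    INCLUDE (TypeID);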
Two ideas come to mind:
a) If ST.SomeParam doesn't change often, perhaps you can cache the result of the multiplication somewhere. The numbers would be "off" after a day, but the relative values would be the same - i.e., the sort order wouldn't change.
b) Find a way to reduce the size of the input tables. There are probably some values of LastCheckDate &/or SomeParam that will never be in the top 100. For example,
Select *
into #tmp
from THETABLE
where LastCheckDate between '2012-06-01' and getdate()
select top 100 *
from #tmp join SecondTable ST on #tmp.TypeID = ST.TypeID
order by DateDiff(day, LastCheckDate, getdate()) * ST.SomeParam desc
It's a lot faster to search a small table than a big one.
DATEDIFF(Day, TT.LastCheckDate, GETDATE()) is the number of days since "last check".
If you just order by TT.LastCheckDate you get a similar order.
EDIT
Maybe you can work out which dates you don't expect to get back and filter on them. Of course you then also need an index on that LastCheckDate column. If everything works out, you can at least shorten the list of records to check from 2M to some manageable amount.
It is quite complicated. Do you seriously need all the columns in the query? There is one thing you could try here: first get just the top 100 rows' TypeID,
something like below.
select top 100 typeid
,TT.lastcheckdate,st.someparam --do not use these if the typeid is unique in both tables..
--or just the PK columns of both tables and typeid so that these can be joined on PK
into #temptable
from st inner join tt on st.typeid = tt.typeid
ORDER BY DATEDIFF(Day, TT.LastCheckDate, GETDATE()) * ST.SomeParam DESC
The query above will sort a very minimal amount of data and thus should be faster. Depending on how many columns your tables have and which indexes exist, this should be much faster than the original query: it helps most when both tables have many columns, because this query uses just a few of them, and those columns (st.typeid, st.someparam, tt.typeid and tt.lastcheckdate) may already be covered by indexes, so there is no need to read the underlying tables, which reduces IO as well. Then join this data back to both tables.
If that doesn't work the way you expect, you can create an indexed view from the select above, adding the ORDER BY expression as a column, and then use this indexed view to get the top 100 and join it to the main tables. This will surely reduce the amount of work and thus improve performance, but an indexed view has maintenance overhead, which will depend on how frequently the data in table TT changes.
To lessen the number of rows, you might retrieve the TOP (100) for each SecondTable record ordered by LastCheckDate, then UNION ALL them and finally select the TOP (100), by means of a temporary table or a dynamically generated query.
This solution uses a cursor to fetch the top 100 records for each value in SecondTable. With an index on (TypeID, LastCheckDate) on TheTable it runs instantaneously (tested on my system with a table of 700,000 records and 50 SecondTable entries).
declare @SomeParam varchar(3)
declare @TypeID int
declare @tbl table (TheTableID int, LastCheckDate datetime, SomeParam float)
declare rstX cursor local fast_forward for
select TypeID, SomeParam
from SecondTable
open rstX
while 1 = 1
begin
fetch next from rstX into @TypeID, @SomeParam
if @@fetch_status <> 0
break
insert into @tbl
select top 100 ID, LastCheckDate, @SomeParam
from TheTable
where TypeID = @TypeID
order by LastCheckDate
end
close rstX
deallocate rstX
select top 100 *
from @tbl
order by DATEDIFF(Day, LastCheckDate, GETDATE()) * SomeParam
Obviously this solution fetches IDs only. You might want to expand the temporary table with additional columns.
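To return full rows rather than just the IDs, the final SELECT can join the collected keys back to TheTable; a hedged sketch, assuming ID is TheTable's primary key as used in the insert above.

select top 100 t.*
from @tbl x
inner join TheTable t on t.ID = x.TheTableID
order by DATEDIFF(Day, x.LastCheckDate, GETDATE()) * x.SomeParam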
