SQL Server UNION ALL Merge Join (Concatenation) too slow

I have a SELECT query that uses UNION ALL on two tables with the same structure (same columns and primary key, but different non-clustered indexes). Together the tables contain 39 million rows: 1 million in one and 38 million in the other. Running the query against table1 alone (1 million rows) takes approximately 0.2 seconds; against table2 alone it takes from 0.5 to 1.2 seconds, depending on the load on the DB.
For display I need the union of the two tables, but the UNION ALL query takes a whopping 8 seconds to run. In the execution plan, the heaviest operation is Merge Join (Concatenation) at 91% of the cost. That concerns me, because the WHERE clause I'm running selects 51 rows from table1 and 0 rows from table2 (the larger table).
I can't get my head around it. I've been looking for a solution for the last two days, and everything I found involved either UNION being used instead of UNION ALL, or unnecessary clauses such as GROUP BY or LEFT/INNER JOINs. The query also does paging with ORDER BY [Id] DESC OFFSET 0 ROWS FETCH NEXT 25 ROWS ONLY; all tests (on the single tables and with UNION ALL) were run with the paging clauses (OFFSET and FETCH NEXT) in place.
If required, I can provide the table and query details. It is a simple SELECT query with 2 INNER JOINs and 2 LEFT JOINs; all of the joined tables contain small amounts of data (from 50 to 20K rows).
Here's the query:
SELECT *
FROM (SELECT tr.Id,
tr.Amount,
tr.TypeId,
t.Name AS [Type],
tr.Date,
tr.ExternalKey,
tr.ExternalDescription,
tr.GameId,
tr.GameProviderId,
gp.Name AS GameProvider,
u.Username,
u.Pincode,
gp.Name,
g.GameName,
u.OperatorId,
tr.BalanceBefore,
tr.BalanceAfter,
tr.UserId
FROM (
SELECT *
FROM dbo.ActiveTransactions at
WHERE ( 1 = 1 )
AND ( [Date] >= '2017-07-17 20:00:00' )
AND ( [TypeId] != 10 )
AND ( [UserId] = 29041 )
UNION ALL
SELECT *
FROM dbo.TransactionHistory th --WITH(INDEX(IX_TransactionHistory_DateType_UserId))
WHERE ( 1 = 1 )
AND ( [Date] >= '2017-07-17 20:00:00' )
AND ( [TypeId] != 10 )
AND ( [UserId] = 29041 )
) AS tr
INNER JOIN dbo.Users u ON tr.UserId = u.Id
LEFT JOIN dbo.GameProviders gp ON tr.GameProviderId = gp.Id
LEFT JOIN dbo.Games g ON tr.GameId = g.GameId AND tr.GameProviderId = g.ProviderId
INNER JOIN dbo.Types t ON tr.TypeId = t.Id ) AS t
ORDER BY [Id] DESC OFFSET 0 ROWS FETCH NEXT 25 ROWS ONLY;

This is an educated guess, made without knowing the indexes or what the execution plan shows the engine is actually getting caught up on.
Consider doing the union after the joins and filters instead of before. You have to repeat the joins, but you may regain some index efficiency that is lost when processing a unioned set.
Here I created two CTEs and then unioned them.
As I'm unable to test this, it may contain syntax errors.
WITH cte1 AS
(SELECT tr.Id,
tr.Amount,
tr.TypeId,
t.Name AS [Type],
tr.Date,
tr.ExternalKey,
tr.ExternalDescription,
tr.GameId,
tr.GameProviderId,
gp.Name AS GameProvider,
u.Username,
u.Pincode,
gp.Name,
g.GameName,
u.OperatorId,
tr.BalanceBefore,
tr.BalanceAfter,
tr.UserId
FROM (
SELECT *
FROM dbo.ActiveTransactions at
WHERE ( 1 = 1 )
AND ( [Date] >= '2017-07-17 20:00:00' )
AND ( [TypeId] != 10 )
AND ( [UserId] = 29041 )) AS tr
INNER JOIN dbo.Users u ON tr.UserId = u.Id
LEFT JOIN dbo.GameProviders gp ON tr.GameProviderId = gp.Id
LEFT JOIN dbo.Games g ON tr.GameId = g.GameId AND tr.GameProviderId = g.ProviderId
INNER JOIN dbo.Types t ON tr.TypeId = t.Id),
CTE2 as (
SELECT tr.Id,
tr.Amount,
tr.TypeId,
t.Name AS [Type],
tr.Date,
tr.ExternalKey,
tr.ExternalDescription,
tr.GameId,
tr.GameProviderId,
gp.Name AS GameProvider,
u.Username,
u.Pincode,
gp.Name,
g.GameName,
u.OperatorId,
tr.BalanceBefore,
tr.BalanceAfter,
tr.UserId
FROM (SELECT *
FROM dbo.TransactionHistory th --WITH(INDEX(IX_TransactionHistory_DateType_UserId))
WHERE ( 1 = 1 )
AND ( [Date] >= '2017-07-17 20:00:00' )
AND ( [TypeId] != 10 )
AND ( [UserId] = 29041 )) AS tr
INNER JOIN dbo.Users u ON tr.UserId = u.Id
LEFT JOIN dbo.GameProviders gp ON tr.GameProviderId = gp.Id
LEFT JOIN dbo.Games g ON tr.GameId = g.GameId AND tr.GameProviderId = g.ProviderId
INNER JOIN dbo.Types t ON tr.TypeId = t.Id
)
SELECT * from CTE1
UNION ALL
SELECT * from CTE2
ORDER BY [Id] DESC OFFSET 0 ROWS FETCH NEXT 25 ROWS ONLY;
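A further variation that may be worth testing. It is an assumption on my part and has not been checked against these tables: since only the first 25 rows by Id are needed, each branch can be limited to its own top 25 before the UNION ALL, so the concatenation never sees more than 50 rows. The sketch below trims the column list and leaves out the joins for brevity. For a later page, each branch would need TOP (offset + page size) rather than TOP (25).

```sql
-- Sketch only: table, column and filter values are copied from the question;
-- the per-branch TOP (25) is the assumption being tested.
SELECT tr.Id, tr.Amount, tr.TypeId, tr.[Date], tr.UserId
FROM (
    SELECT a.Id, a.Amount, a.TypeId, a.[Date], a.UserId
    FROM (
        -- newest 25 matching rows from the small table
        SELECT TOP (25) Id, Amount, TypeId, [Date], UserId
        FROM dbo.ActiveTransactions
        WHERE [Date] >= '2017-07-17 20:00:00'
          AND TypeId <> 10
          AND UserId = 29041
        ORDER BY Id DESC
    ) AS a
    UNION ALL
    SELECT h.Id, h.Amount, h.TypeId, h.[Date], h.UserId
    FROM (
        -- newest 25 matching rows from the large table
        SELECT TOP (25) Id, Amount, TypeId, [Date], UserId
        FROM dbo.TransactionHistory
        WHERE [Date] >= '2017-07-17 20:00:00'
          AND TypeId <> 10
          AND UserId = 29041
        ORDER BY Id DESC
    ) AS h
) AS tr
ORDER BY tr.Id DESC
OFFSET 0 ROWS FETCH NEXT 25 ROWS ONLY;
```

The joins to Users, GameProviders, Games and Types can then be applied to the 25 surviving rows. Whether this beats the CTE version depends on the indexes, so compare the execution plans.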

Related

Need suggestion on optimization query of multiple million of records with multiple joins

How can I optimize this query? The EXT tables contain about 1.5 million records each. There are other joins as well, but those tables have fewer than 50 records each.
both EXT tables have set identity on with default setting and is P
SELECT *
FROM (
SELECT
ROW_NUMBER() OVER(ORDER BY ID ASC) AS RowNumber
, *
FROM History
LEFT JOIN FlattenExt1
ON History.ID = FlattenExt1.ExtID
LEFT JOIN FlattenExt2
ON History.ID = FlattenExt2.ExtId
) as final
where final.RowNumber BETWEEN (@PageIndex -1) * @PageSize + 1
AND (((@PageIndex -1) * @PageSize + 1) + @PageSize) - 1
order by final.rownumber

From what is visible, I believe the optimizer's problem here is that it cannot know whether the left joins duplicate History.ID values, which would affect ROW_NUMBER.
If both left joins match 0 or 1 rows per History row, then compute ROW_NUMBER on History alone, get the IDs for the page, and only then join:
DECLARE @page INT = 150 , @rows INT = 10
;WITH
data AS (SELECT ID FROM History)
-- pages = total row count divided by the page size (@rows), rounded up
,rows (page, pages, rows) AS ( SELECT @page, CEILING(CAST(COUNT(*) AS float)/@rows), COUNT(*) FROM data )
SELECT * FROM history INNER JOIN
(SELECT TOP (@rows) rowNumber,page, pages, rows,ID
FROM ( SELECT row_number() OVER (ORDER BY ID ASC ) rowNumber, * FROM rows, data ) pagination
WHERE rowNumber > (@page-1) * @rows
order by rowNumber
)historypageids ON history.ID = historypageids.ID
LEFT JOIN FlattenExt1 ON History.ID = FlattenExt1.ExtID
LEFT JOIN FlattenExt2 ON History.ID = FlattenExt2.ExtId

THIS ANSWERS THE ORIGINAL VERSION OF THE QUESTION (generic SQL Server).
The following only works for SQL Server 2012+.
If you don't need the row_number() value, I would suggest:
SELECT . . .
FROM History h LEFT JOIN
FlattenExt1 f1
ON h.ID = f1.ExtID LEFT JOIN
FlattenExt2 f2
ON h.ID = f2.ExtId
ORDER BY h.ID
OFFSET (@PageIndex - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;
This should be able to take advantage of indexes on History(ID), FlattenExt1(ExtId), and FlattenExt2(ExtId).
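To make the paging arithmetic concrete: OFFSET counts the rows to skip, so page 1 skips 0 rows and page N skips (N - 1) * page size. A small self-contained sketch, using an inline list of ten IDs as stand-in data (the values are made up for illustration):

```sql
-- Hypothetical sample data: ten IDs standing in for History.ID.
DECLARE @PageIndex INT = 2, @PageSize INT = 3;

SELECT n.ID
FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) AS n (ID)
ORDER BY n.ID
OFFSET (@PageIndex - 1) * @PageSize ROWS   -- page 2 skips the first 3 rows
FETCH NEXT @PageSize ROWS ONLY;            -- returns IDs 4, 5, 6
```

This matches the ROW_NUMBER version in the question, where page 2 with a page size of 3 covers row numbers 4 to 6.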

MSSQL DISTINCT again

This query works perfectly on MySQL, but I have to rewrite it for MSSQL, and this version doesn't work:
SELECT DISTINCT TOP 20 [UF].[id], [UF].[created], [Company].[name]
FROM [user_functions] AS [UF]
LEFT JOIN [companies] AS [Company] ON ([Company].[code] = [UF].[company_code])
WHERE [UF].[user_id] = 8923 AND [UF].[state] != 500
ORDER BY [UF].[created] DESC
This query returns duplicated rows, even though I use DISTINCT.
But when I remove [Company].[name] from the SELECT list, it returns the correct rows.
I would like to use many fields from both the [Company] and [UF] tables.

You can try ROW_NUMBER and keep only the first row of each partition, as below:
select * from
(
SELECT DISTINCT TOP 20 [UF].[id], [UF].[created], [Company].[name],
RowNum = row_number() over(partition by [Company].[name] order by [UF].[ID])
FROM [user_functions] AS [UF]
LEFT JOIN [companies] AS [Company] ON ([Company].[code] = [UF].[company_code])
WHERE [UF].[user_id] = 8923 AND [UF].[state] != 500
) a where RowNum = 1
order by a.Created Desc

How to use multiple values in between clause

Hi all, is there any way I can use multiple ranges in a BETWEEN clause, something like
column_name between 0 and 100 or 200 and 300
Any help would be appreciated.
Here is my condition: (SELECT CASE WHEN ISNUMERIC(value_text) = 1 THEN CAST(value_text AS INT) ELSE -1 END) between 0 and 100
I just want to add more ranges to the BETWEEN clause.
This is the full query:
SELECT ROW_NUMBER() OVER
(
order by Vendor_PrimaryInfo.Vendor_ID asc
)AS RowNumber
, Unit_Table.Unit_title, Vendor_Base_Price.Base_Price, Vendor_Base_Price.showprice, Category_Table.Title, Vendor_Registration.Business_Name,
Vendor_PrimaryInfo.Street_Address, Vendor_PrimaryInfo.Locality, Vendor_PrimaryInfo.Nearest_Landmark, Vendor_PrimaryInfo.City, Vendor_PrimaryInfo.State,
Vendor_PrimaryInfo.Country, Vendor_PrimaryInfo.PostalCode, Vendor_PrimaryInfo.Latitude, Vendor_PrimaryInfo.Longitude, Vendor_PrimaryInfo.ImageUrl,
Vendor_PrimaryInfo.ContactNo, Vendor_PrimaryInfo.Email,Vendor_PrimaryInfo.Vendor_ID
FROM Unit_Table INNER JOIN
Vendor_Base_Price ON Unit_Table.Unit_ID = Vendor_Base_Price.Unit_ID INNER JOIN
Vendor_PrimaryInfo ON Vendor_Base_Price.Vendor_ID = Vendor_PrimaryInfo.Vendor_ID INNER JOIN
Vendor_Registration ON Vendor_Base_Price.Vendor_ID = Vendor_Registration.Vendor_ID AND
Vendor_PrimaryInfo.Vendor_ID = Vendor_Registration.Vendor_ID INNER JOIN
Category_Table ON Vendor_Registration.Category_ID = Category_Table.Category_ID
LEFT JOIN
Vendor_Value_Table ON Vendor_Registration.Vendor_ID = Vendor_Value_Table.Vendor_ID LEFT JOIN
Feature_Table ON Vendor_Value_Table.Feature_ID = Feature_Table.Feature_ID
where Vendor_Registration.Category_ID=5 and Vendor_PrimaryInfo.City='City'
AND(
value_text in('Dhol Wala$Shahnai Wala')
or
(SELECT CASE WHEN ISNUMERIC(value_text) = 1 THEN CAST(value_text AS INT) ELSE -1 END) between 0 and 100
)

You can do this using AND/OR logic:
value_text NOT LIKE '%[^0-9]%' and
(
value_text between 0 and 100
Or
value_text between 101 and 200
)
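One caveat: SQL Server does not promise to evaluate the NOT LIKE test before the BETWEEN comparisons, so a non-numeric value_text can still raise a conversion error. On SQL Server 2012 or later, TRY_CAST returns NULL instead of failing. A minimal sketch with inline sample values (the sample rows are made up for illustration, and the ranges are the ones from the question):

```sql
-- Hypothetical sample rows standing in for Vendor_Value_Table.value_text.
SELECT v.value_text
FROM (VALUES ('50'), ('150'), ('250'), ('Dhol Wala')) AS v (value_text)
WHERE TRY_CAST(v.value_text AS INT) BETWEEN 0 AND 100     -- NULL for non-numeric text
   OR TRY_CAST(v.value_text AS INT) BETWEEN 200 AND 300;
```

Only '50' and '250' qualify: '150' falls between the two ranges, and 'Dhol Wala' converts to NULL, which never satisfies BETWEEN.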
If you don't want to repeat the column name, put the ranges in a table value constructor and join it to your table:
SELECT Row_number()
OVER (
ORDER BY Vendor_PrimaryInfo.Vendor_ID ASC )AS RowNumber,
Unit_Table.Unit_title,
Vendor_Base_Price.Base_Price,
Vendor_Base_Price.showprice,
Category_Table.Title,
Vendor_Registration.Business_Name,
Vendor_PrimaryInfo.Street_Address,
Vendor_PrimaryInfo.Locality,
Vendor_PrimaryInfo.Nearest_Landmark,
Vendor_PrimaryInfo.City,
Vendor_PrimaryInfo.State,
Vendor_PrimaryInfo.Country,
Vendor_PrimaryInfo.PostalCode,
Vendor_PrimaryInfo.Latitude,
Vendor_PrimaryInfo.Longitude,
Vendor_PrimaryInfo.ImageUrl,
Vendor_PrimaryInfo.ContactNo,
Vendor_PrimaryInfo.Email,
Vendor_PrimaryInfo.Vendor_ID
FROM Unit_Table
INNER JOIN Vendor_Base_Price
ON Unit_Table.Unit_ID = Vendor_Base_Price.Unit_ID
INNER JOIN Vendor_PrimaryInfo
ON Vendor_Base_Price.Vendor_ID = Vendor_PrimaryInfo.Vendor_ID
INNER JOIN Vendor_Registration
ON Vendor_Base_Price.Vendor_ID = Vendor_Registration.Vendor_ID
AND Vendor_PrimaryInfo.Vendor_ID = Vendor_Registration.Vendor_ID
INNER JOIN Category_Table
ON Vendor_Registration.Category_ID = Category_Table.Category_ID
LEFT JOIN Vendor_Value_Table
ON Vendor_Registration.Vendor_ID = Vendor_Value_Table.Vendor_ID
LEFT JOIN Feature_Table
ON Vendor_Value_Table.Feature_ID = Feature_Table.Feature_ID
JOIN (VALUES (0, 100),
(101, 200),
(201, 300)) tc (st, ed)
ON Try_cast(value_text AS INT) BETWEEN st AND ed
OR Try_cast(value_text AS VARCHAR(100)) = 'Dhol Wala$Shahnai Wala'
WHERE Vendor_Registration.Category_ID = 5
AND Vendor_PrimaryInfo.City = 'City'
Note: You have stored two different kinds of information in a single column, which causes a lot of pain when you want to extract data like this. Consider changing your table structure.

SQL Query tuning - MS SQL Server -2012

I am new to sql tuning. I have the following SQL which takes around 15 to 20 seconds to produce the results.
SELECT D.DealerName,
Z.Zone,
C.Id ,
L.Id ,
A.Id ,
L.LeadDate,
LT.LeadType ,
EM.FirstName + ' ' + EM.LastName ,
LS.LeadSource ,
--C.*,
E.Id ,
E.StartDateTime,
0 ,
Chiefed = CASE A.AppointmentTypeId
WHEN 3 THEN 'True'
ELSE ''
END,
9 AS WorkflowPhase
FROM Customers C( NOLOCK )
INNER JOIN Dealers D
ON C.DEALERId = D.Id
INNER JOIN Leads L( NOLOCK )
ON L.CustomerId = C.Id
INNER JOIN Appointments A( NOLOCK )
ON A.LeadId = L.Id
AND ( NOT( A.AppointmentTypeId = 5
OR A.AppointmentTypeId = 6 ) )
JOIN CalendarEvents E( NOLOCK )
ON E.TableId = 1
AND E.TableRowId = A.Id
AND E.IsDeleted = 0
AND Dateadd(hh, @TZO, Getdate()) >= E.StartDateTime
LEFT OUTER JOIN AppointmentResults AR( NOLOCK )
ON AR.EventId = E.Id
LEFT OUTER JOIN LeadSources LS( NOLOCK )
ON LS.Id = L.LeadSourceId
LEFT OUTER JOIN LeadTypes LT( NOLOCK )
ON LT.Id = L.LeadTypeId
LEFT OUTER JOIN Users EM( NOLOCK )
ON EM.Id = E.EmployeeId
LEFT OUTER JOIN Zone Z( NOLOCK )
ON Z.Id = C.ZoneId
WHERE EXISTS(SELECT 1
FROM WorkflowStatus WS( NOLOCK )
WHERE TableId = 1
AND TableRowId = A.Id
AND WorkflowPhaseId = 9
AND IsCompleted = 0
AND IsDeleted = 0)
AND ( EXISTS (SELECT 1
FROM dbo.Uft_userpermissionzonesbyworkflow(@EmployeeId, 9)
WHERE ZoneId = C.zoneid) )
AND EXISTS (SELECT 1
FROM Uft_userenableddealers(@EmployeeId)
WHERE DealerId = C.DealerId)
ORDER BY C.LastName,
C.CompanyName,
C.CompanyContact
I have already tuned it as far as my knowledge allows, but I can still see some index scans. I tried to convert those index scans to index seeks, but it is not possible due to the number of records.
Please refer to the screenshot of the plan diagram and top operations.
Kindly provide any suggestions to improve this query.

DECLARE @p TABLE (DealerId INT PRIMARY KEY WITH (IGNORE_DUP_KEY=ON))
INSERT INTO @p
SELECT DealerId
FROM dbo.Uft_userenableddealers(@EmployeeId)
DECLARE @z TABLE (ZoneId INT PRIMARY KEY WITH (IGNORE_DUP_KEY=ON))
INSERT INTO @z
SELECT ZoneId
FROM dbo.Uft_userpermissionzonesbyworkflow(@EmployeeId, 9)
SELECT ...
FROM ...
WHERE EXISTS(SELECT 1
FROM WorkflowStatus WS( NOLOCK )
WHERE TableId = 1
AND TableRowId = A.Id
AND WorkflowPhaseId = 9
AND IsCompleted = 0
AND IsDeleted = 0)
AND C.zoneid IN (SELECT * FROM @z)
AND C.DealerId IN (SELECT * FROM @p)
ORDER BY C.LastName,
C.CompanyName,
C.CompanyContact
OPTION(RECOMPILE)

As discussed below Devart's answer, here is the example with CTEs instead of the declared table variables. I'd assume that the table variables are faster due to the key, but the CTE is ad hoc and, maybe, better integrated. Thanks for testing:
;WITH p AS
(
SELECT DealerId
FROM dbo.Uft_userenableddealers(@EmployeeId)
)
,z AS
(
SELECT ZoneId
FROM dbo.Uft_userpermissionzonesbyworkflow(@EmployeeId, 9)
)
SELECT ...
FROM ...
WHERE EXISTS(SELECT 1
FROM WorkflowStatus WS( NOLOCK )
WHERE TableId = 1
AND TableRowId = A.Id
AND WorkflowPhaseId = 9
AND IsCompleted = 0
AND IsDeleted = 0)
AND C.zoneid IN (SELECT ZoneId FROM z)
AND C.DealerId IN (SELECT DealerId FROM p)
ORDER BY C.LastName,
C.CompanyName,
C.CompanyContact
OPTION(RECOMPILE)

paging and ordering a MS Access query

I have the following MS Access query, and I would like it to return results ordered by name and "paged" by faking a row number:
select * from (SELECT *
FROM (SELECT
s.name as SHolderCategory,
c1.id,
c1.fmember,
c1.link,
m.name as category,
c1.name,
c1.address1,
c1.address2,
c1.city,
c1.state,
c1.zip,
(SELECT COUNT(c2.id) FROM orgs AS c2 WHERE c2.id <= c1.id) AS rownumber
FROM
((orgs AS c1 inner join membershipcls m on m.Id = c1.mClassID)
inner join SHolderscategories s on s.Id = c1.SHolderCategoryID
)
where c1.active = 1)
order by c1.name)
WHERE rownumber > 20 AND rownumber <=40
The problem here is that the ordering is done before the WHERE clause that enforces paging, so it ends up sorting one page at a time rather than sorting the whole result set and then paging it. The results are wrong: page 1 has names starting with a to g, then page 2 goes back to names starting with c, and so on.
When I try to move the ORDER BY clause out so that the query does the paging first, Access complains that the query is too complex.
Is there any workaround for this?

Also try this approach:
SELECT * FROM
(
SELECT TOP 20 *
FROM
(
SELECT TOP 40
s.name as SHolderCategory,
c1.id,
c1.fmember,
c1.link,
m.name as category,
c1.name,
c1.address1,
c1.address2,
c1.city,
c1.state,
c1.zip
FROM
(orgs AS c1
inner join membershipcls m on m.Id = c1.mClassID)
inner join SHolderscategories s on s.Id = c1.SHolderCategoryID
WHERE c1.active = 1
ORDER BY c1.name
) o
ORDER BY o.name DESC
) f ORDER BY f.name
