Missing Rows when running SELECT in SQL Server - sql-server

I have a simple select statement. It's basically 2 CTE's, one includes a ROW_NUMBER() OVER (PARTITION BY, then a join from these into 4 other tables. No functions or anything unusual.
WITH Safety_Check_CTE AS
(
SELECT
Fact_Unit_Safety_Checks_Wkey,
ROW_NUMBER() OVER (PARTITION BY [Dim_Unit_Wkey], [Dim_Safety_Check_Type_Wkey]
ORDER BY [Dim_Safety_Check_Date_Wkey] DESC) AS Check_No
FROM
[Pitches].[Fact_Unit_Safety_Checks]
), Last_Safety_Check_CTE AS
(
SELECT
Fact_Unit_Safety_Checks_Wkey
FROM
Safety_Check_CTE
WHERE
Check_No = 1
)
SELECT
COUNT(*)
FROM
Last_Safety_Check_CTE lc
JOIN
Pitches.Fact_Unit_Safety_Checks f ON lc.Fact_Unit_Safety_Checks_Wkey = f.Fact_Unit_Safety_Checks_Wkey
JOIN
DIM.Dim_Unit u ON f.Dim_Unit_Wkey = u.Dim_Unit_Wkey
JOIN
DIM.Dim_Safety_Check_Type t ON f.Dim_Safety_Check_Type_Wkey = t.Dim_Safety_Check_Type_Wkey
JOIN
DIM.Dim_Date d ON f.Dim_Safety_Check_Date_Wkey = d.Dim_Date_Wkey
WHERE
f.Safety_Check_Certificate_No IN ('GP/KB11007') --option (maxdop 1)
Sometimes it returns 0, 1 or 2 rows. The result should obviously be consistent.
I have ran a profile trace whilst replicating the issue and my session was the only one in the database.
I have compared the Actual execution plans and they are both the same, except the final hash match returns the differing number of rows.
I cannot replicate if I use MAXDOP 0.

In case you use my comment as the answer.
My guess is ORDER BY [Dim_Safety_Check_Date_Wkey] is not deterministic.

In the CTE's you are finding the [Fact_Unit_Safety_Checks_Wkey] that's associated with the most resent row for any given [Dim_Unit_Wkey], [Dim_Safety_Check_Type_Wkey] combination... With no regard for weather or not [Safety_Check_Certificate_No] is equal to 'GP/KB11007'.
Then, in the outer query, you are filtering results based on [Safety_Check_Certificate_No] = 'GP/KB11007'.
So, unless the most recent [Fact_Unit_Safety_Checks_Wkey] happens to have [Safety_Check_Certificate_No] = 'GP/KB11007', the data is going to be filtered out.

Related

Correlated SQL Server Subquery taking very long

I have this table: G_HIST with 700K rows and about 200 columns. Below is the
correlated query that is taking almost 6 minutes. Is there a better way to write it so
that it can take less than half minute.
If not, what indexes I need to have on this table? Currently it has only PK Unique
index on Primary Keys made up of 10 columns.
Here is the code below to select current version of cycle filtering
participant_identifier:
select distinct Curr.Cycle_Number, Curr.Process_Date,Curr.Group_Policy_Number,
Curr.Record_Type, Curr.Participant_Identifier,Curr.Person_Type,
Curr.Effective_Date
FROM G_HIST as Curr
WHERE Curr.Participant_Identifier not in (
select prev.Participant_Identifier
from G_HIST as Prev
where Prev.Cycle_Number = (
select max(b.Cycle_Number)-1
FROM G_HIST as b
WHERE b.Group_Policy_Number = Curr. Group_Policy_Number
)
)
AND Curr.[Cycle_Number] = (
select max(a.[Cycle_Number])
FROM G_HIST as a
WHERE a.[Group_Policy_Number] = Curr.[Group_Policy_Number]
)
You have aggregating -- MAX() -- correlated (dependent) subqueries. Those can be slow because they need to be re-evaluated for each row in the main query. Let's refactor them to ordinary subqueries. The ordinary subqueries need only be evaluated once.
You need a virtual table containing the largest Cycle_Number for each Group_Policy_Number. You get that with the following subquery.
SELECT MAX(Cycle_Number) Max_Cycle_Number,
Group_Policy_Number
FROM GHIST
GROUP BY Group_Policy_Number
This subquery will benefit, dramatically, from a multicolumn index on (Group_Policy_Number, Max_Cycle_Number).
And you have this pattern:
WHERE someColumn NOT IN (a correlated subquery)
That NOT IN can be refactored to use the LEFT JOIN ... IS NULL pattern (also known as the antijoin pattern) and an ordinary subquery. I guess your business rule says you start by finding the participant numbers in the previous cycle.
This query, using a Common Table Expression, should get you that list of participant numbers from the previous cycle for each Group_Policy_Number. You might want to inspect some results from this to ensure it gives you what you want.
WITH
Maxc AS (
SELECT MAX(Cycle_Number) Max_Cycle_Number,
Group_Policy_Number
FROM GHIST
GROUP BY Group_Policy_Number
),
PrevParticipant AS (
SELECT Participant_Identifier,
Group_Policy_Number
FROM GHIST
JOIN Maxc ON GHIST.Group_Policy_Number = Maxc.Group_Policy_Number
WHERE GHIST.Cycle_Number = Maxc.Cycle_Number - 1
)
SELECT * FROM PrevParticipant;
Then we can use the LEFT JOIN ... IS NULL pattern.
So here is the refactored query, not debugged, use at your own risk.
WITH
Maxc AS (
SELECT MAX(Cycle_Number) Max_Cycle_Number,
Group_Policy_Number
FROM G_HIST
GROUP BY Group_Policy_Number
),
PrevParticipant AS (
SELECT Participant_Identifier,
Group_Policy_Number
FROM G_HIST
JOIN Maxc ON G_HIST.Group_Policy_Number = Maxc.Group_Policy_Number
WHERE GHIST.Cycle_Number = Maxc.Cycle_Number - 1
)
SELECT DISTINCT Curr.whatever
FROM G_HIST Curr
JOIN Maxc
ON Curr.Group_Policy_Number = Maxc.Group_Policy_Number
LEFT JOIN PrevParticipant
ON Curr.Group_Policy_Number = PrevParticipant.Group_Policy_Number
AND Curr.Participant_Number <> PrevParticiant.Participant_Number
WHERE PrevParticipant.Group_Policy_Number IS NULL
AND Curr.Cycle_Number = Maxc.Cycle_Number;
If you have SQL Server 2016 or earlier you won't be able to use Common Table Expressions. Let me know in a comment and I'll show you how to write the query without them.
You can use SSMS's Actual Execution Plan to identify any other indexes you need to speed up the whole query.

Sort in query plan TSQL

Need to handle query by eliminating and improving performance by deleting sort operators which consumes the greatest amount of resources.
The temp table is around 20,000 rows and the physical table is around 60 million of rows.
I am using LAG function due to that I need to compare values in bigger table, Have You guys any idea to figure it out ?
I am posting query, but if you will need any further info then let me know.
;WITH CTE AS
(
SELECT
a.VIN_NUMBER,
B.CELL_VALUE, B.CELL_VALUE_NEGATIVE_VALUES,
ROW_NUMBER() OVER (PARTITION BY B.VIN_NUMBER, B.LOG_NUM, B.SEQUENCE_NUM_OF_CELL
ORDER BY B.VIN_NUMBER, B.DATE_OF_CELL_READ, B.LOG_NUM, B.SEQUENCE_NUM_OF_CELL) ROW_NUM,
B.CELL_VALUE - LAG(B.CELL_VALUE, 1) OVER (ORDER BY B.VIN_NUMBER, B.DATE_OF_CELL_READ, B.LOG_NUM, B.SEQUENCE_NUM_OF_CELL) CELL_VALUE_NEW
FROM
#TEMP_CHASSI_LAST_LOAD A
JOIN
DBO.LOGS_FROM_CARS B WITH (NOLOCK) ON B.ROW_CREATION_DATE BETWEEN A.MIN_ROW_CREATION_DATE
AND A.MAX_ROW_CREATION_DATE
AND A.VIN_NUMBER = B.VIN_NUMBER
)
SELECT
VIN_NUMBER,
IIF(CELL_VALUE_NEW < 0, 0, CELL_VALUE_NEW) AS CELL_VALUE_NEW,
IIF(CELL_VALUE_NEW < 0, CELL_VALUE_NEW, NULL) AS CELL_VALUE_NEGATIVE_VALUES
FROM
CTE
WHERE
ROW_NUM > 1
AND (CELL_VALUE_NEW <> CELL_VALUE OR CELL_VALUE IS NULL)
It's hard to be sure what you are doing without sample data and full execution plan, but I'd explore a few options.
First, I don't think your LAG() is correct. I think you should add PARTITION BY B.VIN_NUMBER. Pretty sure you do not want to compare values of different VIN's. This will let you get rid of your ROW_NUMBER() as LAG() will now have NULL for the first row. That means your CELL_VALUE_NEW <> CELL_VALUE will filter out, so can remove ROW_NUM > 1
Optimized Query
WITH CTE AS (
SELECT
A.VIN_NUMBER,
B.CELL_VALUE,
B.CELL_VALUE_NEGATIVE_VALUES,
B.CELL_VALUE - LAG(B.CELL_VALUE, 1) OVER (PARTITION BY B.VIN_NUMBER ORDER BY B.DATE_OF_CELL_READ, B.LOG_NUM, B.SEQUENCE_NUM_OF_CELL) CELL_VALUE_NEW
FROM #TEMP_CHASSI_LAST_LOAD AS A
INNER JOIN dbo.LOGS_FROM_CARS B WITH (NOLOCK)
ON B.ROW_CREATION_DATE BETWEEN A.MIN_ROW_CREATION_DATE AND A.MAX_ROW_CREATION_DATE
AND A.VIN_NUMBER = B.VIN_NUMBER
)
SELECT
VIN_NUMBER,
IIF(CELL_VALUE_NEW < 0, 0, CELL_VALUE_NEW) AS CELL_VALUE_NEW,
IIF(CELL_VALUE_NEW < 0, CELL_VALUE_NEW, NULL) AS CELL_VALUE_NEGATIVE_VALUES
FROM CTE
WHERE (CELL_VALUE_NEW <> CELL_VALUE OR CELL_VALUE IS NULL)
Things to Review:
Double check data types for your join conditions. Ex. make sure MIN_ROW_CREATION_DATE and MAX_ROW_CREATION_DATE are the same as ROW_CREATION_DATE. Makes sure it's not text vs date. Ideally VIN_NUMBER is using CHAR(17) (all car VIN's are 17 characters)
Create index on larger table (and maybe try one on the temp table. Query performance improvement might be worth the time to create the index on the temp table)
CREATE INDEX ix_test ON dbo.LOGS_FROM_CARS(VIN_NUMBER,ROW_CREATION_DATE)
INCLUDE (CELL_VALUE,CELL_VALUE_NEGATIVE_VALUES,DATE_OF_CELL_READ, LOG_NUM, SEQUENCE_NUM_OF_CELL)
Try FORCESEEK option on table join to LOGS_FROM_CARS. Be cautious using query hints as can lead to issues down the road, but might be worth it for this query
Are you sure you need CELL_VALUE_NEGATIVE_VALUES from LOGS_FROM_CARS? I don't see it used anywhere. Would remove that from the query if you don't need it

Count all max number value in difference tables sql

I got an error when I tried to solve this problem. First I need to count all values of 2 tables then I need in where condition get all max values.
My code:
Select *
FROM (
select Operator.OperatoriausPavadinimas,
(
select count(*)
from Plan
where Plan.operatoriausID= Operator.operatoriausID
) as NumberOFPlans
from Operator
)a
where a.NumberOFPlans= Max(a.NumberOFPlans)
I get this error
Msg 147, Level 15, State 1, Line 19
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.
I don't know how to solve this.
I need get this http://prntscr.com/p700w9
Update 1
Plan table contains of http://prntscr.com/p7055l values and
Operator table contains of http://prntscr.com/p705k0 values.
Are you looking for... an aggregate query that joins both tables and returns the record that has the maximum count?
I suspect that this might phrase as follows:
SELECT TOP(1) o.OperatoriausPavadinimas, COUNT(*)
FROM Operatorius o
INNER JOIN Planas p ON p.operatoriausID = o.operatoriausID
GROUP BY o.OperatoriausPavadinimas
ORDER BY COUNT(*) DESC
If you want to allow ties, you can use TOP(1) WITH TIES.
You can use top with ties. Your query is a bit hard to follow, but I think you want:
select top (1) with ties o.OperatoriausPavadinimas, count(*)
from plan p join
operator o
on p.operatoriausID = o.operatoriausID
group by o.OperatoriausPavadinimas
order by count(*) desc;

RowNumber() and Partition By performance help wanted

I've got a table of stock market moving average values, and I'm trying to compare two values within a day, and then compare that value to the same calculation of the prior day. My sql as it stands is below... when I comment out the last select statement that defines the result set, and run the last cte shown as the result set, I get my data back in about 15 minutes. Long, but manageable since it'll run as an insert sproc overnight. When I run it as shown, I'm at 40 minutes before any results even start to come in. Any ideas? It goes from somewhat slow, to blowing up, probably with the addition of ROW_NUMBER() OVER (PARTITION BY) BTW I'm still working through the logic, which is currently impossible with this performance issue. Thanks in advance..
Edit: I fixed my partition as suggested below.
with initialSmas as
(
select TradeDate, Symbol, Period, Value
from tblDailySMA
),
smaComparisonsByPer as
(
select i.TradeDate, i.Symbol, i.Period FastPer, i.Value FastVal,
i2.Period SlowPer, i2.Value SlowVal, (i.Value-i2.Value) FastMinusSlow
from initialSmas i join initialSmas as i2 on i.Symbol = i2.Symbol
and i.TradeDate = i2.TradeDate and i2.Period > i.Period
),
smaComparisonsByPerPartitioned as
(
select ROW_NUMBER() OVER (PARTITION BY sma.Symbol, sma.FastPer, sma.SlowPer
ORDER BY sma.TradeDate) as RowNum, sma.TradeDate, sma.Symbol, sma.FastPer,
sma.FastVal, sma.SlowPer, sma.SlowVal, sma.FastMinusSlow
from smaComparisonsByPer sma
)
select scp.TradeDate as LatestDate, scp.FastPer, scp.FastVal, scp.SlowPer, scp.SlowVal,
scp.FastMinusSlow, scp2.TradeDate as LatestDate, scp2.FastPer, scp2.FastVal, scp2.SlowPer,
scp2.SlowVal, scp2.FastMinusSlow, (scp.FastMinusSlow * scp2.FastMinusSlow) as Comparison
from smaComparisonsByPerPartitioned scp join smaComparisonsByPerPartitioned scp2
on scp.Symbol = scp2.Symbol and scp.RowNum = (scp2.RowNum - 1)
1) You have some fields both in the Partition By and the Order By clauses. That doesn't make sense since you will have one and only one value for each (sma.FastPer, sma.SlowPer). You can safely remove these fields from the Order By part of the window function.
2) Assuming that you already have indexes for adequate performance in "initialSmas i join initialSmas" and that you already have and index for (initialSmas.Symbol, initialSmas.Period, initialSmas.TradeDate) the best you can do is to copy smaComparisonsByPer into a temporary table where you can create an index on (sma.Symbol, sma.FastPer, sma.SlowPer, sma.TradeDate)

Efficient way to get max date before a given date

Suppose I have a table called Transaction and another table called Price. Price holds the prices for given funds at different dates. Each fund will have prices added at various dates, but they won't have prices at all possible dates. So for fund XYZ I may have prices for the 1 May, 7 May and 13 May and fund ABC may have prices at 3 May, 9 May and 11 May.
So now I'm looking at the price that was prevailing for a fund at the date of a transaction. The transaction was for fund XYZ on 10 May. What I want, is the latest known price on that day, which will be the price for 7 May.
Here's the code:
select d.TransactionID, d.FundCode, d.TransactionDate, v.OfferPrice
from Transaction d
inner join Price v
on v.FundCode = d.FundCode
and v.PriceDate = (
select max(PriceDate)
from Price
where FundCode = v.FundCode
/* */ and PriceDate < d.TransactionDate
)
It works, but it is very slow (several minutes in real world use). If I remove the line with the leading comment, the query is very quick (2 seconds or so) but it then uses the latest price per fund, which is wrong.
The bad part is that the price table is minuscule compared to some of the other tables we use, and it isn't clear to me why it is so slow. I suspect the offending line forces SQL Server to process a Cartesian product, but I don't know how to avoid it.
I keep hoping to find a more efficient way to do this, but it has so far escaped me. Any ideas?
You don't specify the version of SQL Server you're using, but if you are using a version with support for ranking functions and CTE queries I think you'll find this quite a bit more performant than using a correlated subquery within your join statement.
It should be very similar in performance to Andriy's queries. Depending on the exact index topography of your tables, one approach might be slightly faster than another.
I tend to like CTE-based approaches because the resulting code is quite a bit more readable (in my opinion). Hope this helps!
;WITH set_gen (TransactionID, OfferPrice, Match_val)
AS
(
SELECT d.TransactionID, v.OfferPrice, ROW_NUMBER() OVER(PARTITION BY d.TransactionID ORDER BY v.PriceDate ASC) AS Match_val
FROM Transaction d
INNER JOIN Price v
ON v.FundCode = d.FundCode
WHERE v.PriceDate <= d.TransactionDate
)
SELECT sg.TransactionID, d.FundCode, d.TransactionDate, sg.OfferPrice
FROM Transaction d
INNER JOIN set_gen sg ON d.TransactionID = sg.TransactionID
WHERE sg.Match_val = 1
There's a method for finding rows with maximum or minimum values, which involves LEFT JOIN to self, rather than more intuitive, but probably more costly as well, INNER JOIN to a self-derived aggregated list.
Basically, the method uses this pattern:
SELECT t.*
FROM t
LEFT JOIN t AS t2 ON t.key = t2.key
AND t2.Value > t.Value /* ">" is when getting maximums; "<" is for minimums */
WHERE t2.key IS NULL
or its NOT EXISTS counterpart:
SELECT *
FROM t
WHERE NOT EXISTS (
SELECT *
FROM t AS t2
WHERE t.key = t2.key
AND t2.Value > t.Value /* same as above applies to ">" here as well */
)
So, the result is all the rows for which there doesn't exist a row with the same key and the value greater than the given.
When there's just one table, application of the above method is pretty straightforward. However, it may not be that obvious how to apply it when there's another table, especially when, like in your case, the other table makes the actual query more complex not merely by its being there, but also by providing us with an additional filtering for the values we are looking for, namely with the upper limits for the dates.
So, here's what the resulting query might look like when applying the LEFT JOIN version of the method:
SELECT
d.TransactionID,
d.FundCode,
d.TransactionDate,
v.OfferPrice
FROM Transaction d
INNER JOIN Price v ON v.FundCode = d.FundCode
LEFT JOIN Price v2 ON v2.FundCode = v.FundCode /* this and */
AND v2.PriceDate > v.PriceDate /* this are where we are applying
the above method; */
AND v2.PriceDate < d.TransactionDate /* and this is where we are limiting
the maximum value */
WHERE v2.FundCode IS NULL
And here's a similar solution with NOT EXISTS:
SELECT
d.TransactionID,
d.FundCode,
d.TransactionDate,
v.OfferPrice
FROM Transaction d
INNER JOIN Price v ON v.FundCode = d.FundCode
WHERE NOT EXISTS (
SELECT *
FROM Price v2
WHERE v2.FundCode = v.FundCode /* this and */
AND v2.PriceDate > v.PriceDate /* this are where we are applying
the above method; */
AND v2.PriceDate < d.TransactionDate /* and this is where we are limiting
the maximum value */
)
Are both pricedate and transactiondate indexed? If not you are doing table scans which is likely the cause of the performance bottleneck.

Resources