SQL not doing the join correctly - sql-server

I have a SQL statement with several JOIN conditions. It works fine for all of them except the last one. The code is below:
SELECT
A.EMPL_CTG,
B.DESCR AS PrName,
SUM(A.CURRENT_COMPRATE) AS SALARY_COST_BUDGET,
SUM(A.BUDGET_AMT) AS BUDGET_AMT,
SUM(A.BUDGET_AMT)*100/SUM(A.CURRENT_COMPRATE) AS MERIT_GOAL,
SUM(C.FACTOR_XSALARY) AS X_Programp,
SUM(A.FACTOR_XSALARY) AS X_Program,
COUNT(A.EMPLID) AS EMPL_CNT,
COUNT(D.EMPLID),
SUM(CASE WHEN A.PROMOTION_SECTION = 'Y' THEN 1 ELSE 0 END) AS PRMCNT,
SUM(CASE WHEN A.EXCEPT_IND = 'Y' THEN 1 ELSE 0 END) AS EXPCNT,
(SUM(CASE WHEN A.PROMOTION_SECTION = 'Y' THEN 1 ELSE 0 END)+SUM(CASE WHEN A.EXCEPT_IND = 'Y' THEN 1 ELSE 0 END))*100/(COUNT(A.EMPLID)) AS PEpercent
FROM
EMP_DTL A INNER JOIN EMPL_CTG_L1 B ON A.EMPL_CTG = B.EMPL_CTG
INNER JOIN
ECM_PRYR_VW C ON A.EMPLID=C.EMPLID
INNER JOIN ECM_INELIG D on D.EMPL_CTG=A.EMPL_CTG and D.YEAR=YEAR(getdate())
WHERE
A.YEAR=YEAR(getdate())
AND B.EFF_STATUS='A'
GROUP BY
A.EMPL_CTG,
B.DESCR
ORDER BY B.DESCR
COUNT(D.EMPLID) returns the same value as COUNT(A.EMPLID), but I need the count of EMPLIDs from table D that match the join condition. Any help?

COUNT() (like the other aggregate functions used with GROUP BY) doesn't process only the rows from one of the tables.
Aggregates work on all the rows produced by the JOIN. If the JOIN without GROUP BY produces 42 rows, then COUNT(*) and COUNT(1) return 42, while COUNT(A.EMPLID) and COUNT(D.EMPLID) return the number of non-NULL values in those columns.
To get the number of rows that come from one of the tables, use COUNT(DISTINCT). It ignores NULL values and also the duplicates produced by the JOIN, so it matches the row count as long as the column is unique within that table.
Change COUNT(D.EMPLID) to COUNT(DISTINCT D.EMPLID).
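The effect is easy to reproduce on a toy dataset. The sketch below uses Python's built-in sqlite3 module rather than SQL Server, and the emp and inelig tables are hypothetical stand-ins for EMP_DTL and ECM_INELIG, so treat it as an illustration of the principle rather than the asker's actual schema:

```python
import sqlite3

# Hypothetical miniature tables; names and columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (emplid INTEGER, ctg TEXT);
    CREATE TABLE inelig (emplid INTEGER, ctg TEXT);
    INSERT INTO emp VALUES (1, 'X'), (2, 'X'), (3, 'X');
    INSERT INTO inelig VALUES (10, 'X'), (11, 'X');
""")

# Joining on the category alone pairs every emp row with every inelig row
# of that category, so the aggregates see 3 * 2 = 6 rows.
row = conn.execute("""
    SELECT COUNT(*),
           COUNT(a.emplid),
           COUNT(d.emplid),
           COUNT(DISTINCT a.emplid),
           COUNT(DISTINCT d.emplid)
    FROM emp a
    JOIN inelig d ON d.ctg = a.ctg
    GROUP BY a.ctg
""").fetchone()

print(row)
```

Both plain counts see every joined row, while the DISTINCT counts recover the per-table numbers. Note that the SUM() columns in a query like the one in the question are inflated by the same multiplication, which COUNT(DISTINCT) does nothing to fix.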

Related

Aggregate function returns incorrect value when joining more tables

I am getting the Total sale using the following query.
SELECT SUM([B].[TotalSale])
FROM [dbo].[BookingDetail] [BF] WITH (READPAST)
INNER JOIN [dbo].[Booking] [B] WITH (READPAST) ON [B].[BookingDetailID] = [BF].[ID]
WHERE [BF].[MarketID] = '2'
I want to add another column to get the Gross Sale.
For that I have to join another table called AirTraveler.
But once I add the new table to the query:
SELECT
SUM([B].[TotalSale]) ,
SUM(CASE WHEN [B].[TravelSectorID] = 3 AND [B].[BookingStatusID] IN (16, 20, 22, 23) THEN COALESCE([B].[TotalSale], 0.0)
WHEN ([B].[TravelSectorID] = 1 AND [B].[IsDomestic] = 1 AND CONVERT(varchar, [AT].[FareDetails].query('string(/AirFareInfo[1]/PT[1])')) = 'FlightAndHotel') THEN [AT].[TotalSale]
ELSE 0 END) AS [GrossSale]
FROM [dbo].[BookingDetail] [BF] WITH (READPAST)
INNER JOIN [dbo].[Booking] [B] WITH (READPAST) ON [B].[BookingDetailID] = [BF].[ID]
LEFT OUTER JOIN [dbo].[AirTraveler] [AT] WITH(READPAST) ON [B].[ID] = [AT].[BookingID]
WHERE [BF].[MarketID] = '2'
it gives an incorrect result for [TotalSale]. The aggregate functions return wrong values because there may be multiple AirTraveler rows per Booking ID, which is valid data. What can I do to solve this aggregate function problem?
I am actually stuck.
I am using SQL Server.
Thanks in advance.
Not tested or anything, but when you join to a lower-level table that causes a header table to double count, you can pre-aggregate the lower-level table before joining it.
This is probably missing some opening/closing brackets and aliases, but hopefully you can work it out:
SELECT
SUM([B].[TotalSale]) ,
SUM(CASE WHEN [B].[TravelSectorID] = 3
AND [B].[BookingStatusID] IN (16, 20, 22, 23)
THEN COALESCE([B].[TotalSale], 0.0)
WHEN ([B].[TravelSectorID] = 1 AND [B].[IsDomestic] = 1)
THEN [AT].[TotalSale]
ELSE 0 END) AS [GrossSale]
FROM [dbo].[BookingDetail] [BF] WITH (READPAST)
INNER JOIN [dbo].[Booking] [B] WITH (READPAST) ON [B].[BookingDetailID] = [BF].[ID]
LEFT OUTER JOIN
(
SELECT BookingID, SUM(CASE WHEN
CONVERT(varchar(50), [FareDetails].query('string(/AirFareInfo[1]/PT[1])'))
= 'FlightAndHotel' THEN [TotalSale] ELSE 0 END) TotalSale
FROM [dbo].[AirTraveler] [AT] WITH(READPAST)
GROUP BY BookingID
) AT
ON [B].[ID] = [AT].[BookingID]
WHERE [BF].[MarketID] = '2'
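Also, I gave your varchar cast a size. Without one, CAST and CONVERT default to varchar(30) (the default of 1 applies only to declarations), so your comparison would not have been truncated, but an explicit size makes the intent clear.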
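A minimal sketch of the pre-aggregation idea, run against SQLite through Python's sqlite3 module instead of SQL Server. The booking and traveler tables and their values are made up for illustration and leave out the XML filter and the READPAST hints:

```python
import sqlite3

# Hypothetical header/detail tables: one booking can have many travelers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE booking (id INTEGER PRIMARY KEY, total_sale REAL);
    CREATE TABLE traveler (booking_id INTEGER, total_sale REAL);
    INSERT INTO booking VALUES (1, 100.0), (2, 50.0);
    INSERT INTO traveler VALUES (1, 40.0), (1, 60.0);
""")

# Joining directly repeats booking 1 once per traveler, inflating the sum.
naive = conn.execute("""
    SELECT SUM(b.total_sale)
    FROM booking b
    LEFT JOIN traveler t ON t.booking_id = b.id
""").fetchone()[0]

# Collapsing travelers to one row per booking first keeps each booking unique.
fixed = conn.execute("""
    SELECT SUM(b.total_sale), SUM(t.total_sale)
    FROM booking b
    LEFT JOIN (
        SELECT booking_id, SUM(total_sale) AS total_sale
        FROM traveler
        GROUP BY booking_id
    ) t ON t.booking_id = b.id
""").fetchone()

print(naive, fixed)
```

The direct join counts booking 1 twice, while the derived table guarantees at most one joined row per booking, so the header-level sum stays correct and the traveler-level sum is still available.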

Although I specify one expression, SQL gives the "Only one expression can be specified" error

DS.SOURCE_TYPE,
(SELECT
I_MAX_VADE,
CASE
WHEN (PA.ACTOR_KIND = 5 AND PA.PROCESS_ID = PROCESS_ID)
THEN 0
ELSE I_MAX_VADE
END
FROM
KDS_INTER_FAKTORING_OMDM_PARAMS
WHERE
PA.ID = PROCESS_ACTOR_ID) AS TERER
FROM
dbo.PROCESS_ACTOR AS PA
JOIN
dbo.OMDM_RESULT AS O ON O.PROCESS_ID = PA.PROCESS_ID
JOIN
dbo.KDS_PROPOSAL_OMDM_PARAMS AS POP ON POP.PROCESS_ID = PA.PROCESS_ID
JOIN
dbo.PROPOSAL_SNAP AS PS ON PS.PROCESS_ID = PA.PROCESS_ID
JOIN
dbo.DEBTOR_SNAP AS DS ON DS.PROCESS_ID = PA.PROCESS_ID
This simple query throws the error specified below.
I think I have already selected only one column in my subquery. I would like I_MAX_VADE to keep its database value if (PA.ACTOR_KIND = 5 AND PA.PROCESS_ID = PROCESS_ID), and to be zero otherwise.
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
Your subquery select returns two columns: I_MAX_VADE and the result of the CASE expression. That is one too many.
Use OUTER APPLY. It is like a correlated subquery (sometimes) but it is in the FROM clause rather than the SELECT. And, you can have as many columns as you like:
SELECT . . .,
DS.SOURCE_TYPE,
params.*
FROM dbo.PROCESS_ACTOR PA JOIN
dbo.OMDM_RESULT O
ON O.PROCESS_ID = PA.PROCESS_ID JOIN
dbo.KDS_PROPOSAL_OMDM_PARAMS POP
ON POP.PROCESS_ID = PA.PROCESS_ID JOIN
dbo.PROPOSAL_SNAP PS
ON PS.PROCESS_ID = PA.PROCESS_ID JOIN
dbo.DEBTOR_SNAP DS
ON DS.PROCESS_ID = PA.PROCESS_ID OUTER APPLY
(SELECT I_MAX_VADE,
(CASE WHEN PA.ACTOR_KIND = 5 AND PA.PROCESS_ID = params.PROCESS_ID THEN 0
ELSE params.I_MAX_VADE
END) as TERER
FROM KDS_INTER_FAKTORING_OMDM_PARAMS params
WHERE PA.ID = params.PROCESS_ACTOR_ID
) params
Note: I had to guess at the source of a couple columns. You should always qualify column names -- and this is even more important when you are using correlated subqueries.
+1 to answer from HABO, but I'll add an example:
WRONG:
SELECT A, B, (SELECT X, Y FROM MyTable) AS C
FROM OtherTable
When you put a subquery into your select-list, the subquery must be a scalar subquery; i.e. it must be guaranteed to return one column and one row.
The example above is wrong because the subquery returns two columns, and it is not necessarily going to return a single row.
RIGHT:
SELECT A, B, (SELECT TOP 1 X FROM MyTable) AS C
FROM OtherTable
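The same rule can be tried out quickly in SQLite through Python's sqlite3 module. This is only a sketch: the tables are hypothetical, SQLite's error text differs from SQL Server's, and SQLite spells TOP 1 as LIMIT 1:

```python
import sqlite3

# Hypothetical tables mirroring the WRONG/RIGHT example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE other_table (a INTEGER, b INTEGER);
    CREATE TABLE my_table (x INTEGER, y INTEGER);
    INSERT INTO other_table VALUES (1, 2);
    INSERT INTO my_table VALUES (7, 8);
""")

# A two-column subquery in the select list is rejected.
try:
    conn.execute("SELECT a, b, (SELECT x, y FROM my_table) AS c FROM other_table")
    two_column_error = None
except sqlite3.OperationalError as exc:
    two_column_error = str(exc)

# A one-column, one-row subquery is accepted as a scalar value.
ok = conn.execute(
    "SELECT a, b, (SELECT x FROM my_table LIMIT 1) AS c FROM other_table"
).fetchall()

print(two_column_error, ok)
```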
I finally figured out how to accomplish my task. Here is my updated code.
I simply removed the I_MAX_VADE column from the subquery's select list and moved it into the THEN clause.
Before:
(SELECT
I_MAX_VADE,
CASE
WHEN (PA.ACTOR_KIND = 5 AND PA.PROCESS_ID = PROCESS_ID)
THEN 0
ELSE I_MAX_VADE
END
FROM
KDS_INTER_FAKTORING_OMDM_PARAMS
WHERE
PA.ID = PROCESS_ACTOR_ID) AS TERER
After:
(SELECT
(CASE
WHEN (PA.ACTOR_KIND=5)
THEN (SELECT I_MAX_VADE FROM KDS_INTER_FAKTORING_OMDM_PARAMS
WHERE PA.ID=PROCESS_ACTOR_ID )
ELSE 0 END) ) AS TERER

How can I convert T-SQL written with CASE WHEN to PIVOT or window functions

How can I convert the following T-SQL, which is written with CASE WHEN, to PIVOT or window functions?
SELECT
T.TaskID,
SUM(CASE WHEN T.LogDate<'2016-02-04' AND T.TaskStatusID=2 THEN ISNULL(DA_CHILD.Value,0)*(T.DoneScore/100) ELSE 0 END) PreAmount,
SUM(CASE WHEN T.LogDate>='2016-02-04' AND T.LogDate<='2017-02-04' AND T.TaskStatusID=2 THEN ISNULL(DA_CHILD.Value,0)*(T.DoneScore/100) ELSE 0 END) CurAmount,
SUM(CASE WHEN T.LogDate>'2017-02-04' AND T.TaskStatusID=2 THEN ISNULL(DA_CHILD.Value,0)*(T.DoneScore/100) ELSE 0 END) AfterAmount
FROM
NetTasks$ T
INNER JOIN NetDeviceActions DA ON DA.DeviceActionID=T.DeviceActionID
INNER JOIN NetActionParents AP ON AP.ParentID=DA.ActionID
INNER JOIN NetDeviceActions DA_CHILD ON DA_CHILD.ActionID=AP.ChildID AND
DA_CHILD.DeviceID=DA.DeviceID AND
DA_CHILD.ContractInfoID=DA.ContractInfoID
WHERE
T.ParentTaskID = 0 AND
T.FinishDate<='2017-01-07' AND
DA.ContractInfoID=15
GROUP BY
T.TaskID, T.DoneScore,T.FinishDate
This is my result:
TaskID  PreAmount  CurAmount  AfterAmount
686170  0          0          0
655768  NULL       0          0
734520  0          NULL       0
682661  0          NULL       0
Here is the first transformation that you have to put your data through.
Please test this for errors first:
SELECT
T.TaskID,
CASE
WHEN T.LogDate<'2016-02-04' AND T.TaskStatusID=2
THEN 'PreAmount'
WHEN T.LogDate>='2016-02-04' AND T.LogDate<='2017-02-04'
AND T.TaskStatusID=2 THEN 'CurAmount'
WHEN T.LogDate>'2017-02-04' AND T.TaskStatusID=2 THEN 'AfterAmount'
END As Type,
ISNULL(DA_CHILD.Value,0)*(T.DoneScore/100) Amount
FROM
NetTasks$ T
INNER JOIN NetDeviceActions DA ON DA.DeviceActionID=T.DeviceActionID
INNER JOIN NetActionParents AP ON AP.ParentID=DA.ActionID
INNER JOIN NetDeviceActions DA_CHILD ON DA_CHILD.ActionID=AP.ChildID
AND DA_CHILD.DeviceID=DA.DeviceID
AND DA_CHILD.ContractInfoID=DA.ContractInfoID
WHERE
T.ParentTaskID = 0 AND
T.FinishDate<='2017-01-07' AND
DA.ContractInfoID=15
Now take a look at this page:
https://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx
The query above takes the place of (<SELECT query that produces the data>).
When we follow those instructions and look at some examples, we come up with this:
SELECT
TaskID,
[PreAmount],[CurAmount],[AfterAmount]
FROM (
SELECT
T.TaskID,
CASE
WHEN T.LogDate<'2016-02-04' AND T.TaskStatusID=2
THEN 'PreAmount'
WHEN T.LogDate>='2016-02-04' AND T.LogDate<='2017-02-04'
AND T.TaskStatusID=2 THEN 'CurAmount'
WHEN T.LogDate>'2017-02-04' AND T.TaskStatusID=2 THEN 'AfterAmount'
END As Type,
ISNULL(DA_CHILD.Value,0)*(T.DoneScore/100) Amount
FROM
NetTasks$ T
INNER JOIN NetDeviceActions DA ON DA.DeviceActionID=T.DeviceActionID
INNER JOIN NetActionParents AP ON AP.ParentID=DA.ActionID
INNER JOIN NetDeviceActions DA_CHILD ON DA_CHILD.ActionID=AP.ChildID
AND DA_CHILD.DeviceID=DA.DeviceID
AND DA_CHILD.ContractInfoID=DA.ContractInfoID
WHERE
T.ParentTaskID = 0 AND
T.FinishDate<='2017-01-07' AND
DA.ContractInfoID=15
) As SRC
PIVOT
(
SUM(Amount)
FOR Type IN ([PreAmount],[CurAmount],[AfterAmount])
) As PivotTable
Here are all the problems with this:
You are not going to see any performance improvement here.
Your code is more complicated and less maintainable.
Problems with the original query:
It's full of hard-coded dates.
SUMing a rate usually gives you a nonsense number. Usually you sum the numerator and the denominator first, then divide. Maybe that's your real problem.
PS: a window function makes even less sense here. I can't even think of a sensible way to do that.
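To see why summing a rate is suspect, here is a small sketch with made-up numbers, run in SQLite through Python's sqlite3 module. The task table and its done and planned columns are hypothetical and only illustrate the arithmetic:

```python
import sqlite3

# Hypothetical data: two tasks with very different sizes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task (done REAL, planned REAL);
    INSERT INTO task VALUES (1.0, 2.0), (9.0, 10.0);
""")

# Sum of per-row rates versus the rate of the summed totals.
sum_of_rates, rate_of_sums = conn.execute("""
    SELECT SUM(done / planned),
           SUM(done) / SUM(planned)
    FROM task
""").fetchone()

print(sum_of_rates, rate_of_sums)
```

The first value adds 0.5 and 0.9 and ends up above 1, which is not a meaningful completion rate. The second divides 10 by 12 and stays within the range a rate should have.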

TSQL Partitioning to COUNT() only subset of rows

I have a query that I am working on that, for every month and year in a sales table, returns a SUM() of the total items ordered, a count of the distinct accounts ordering items, and a couple of breakdown SUM()s by product type. See the query below for an example:
SELECT
YearReported,
MonthReported,
COUNT(DISTINCT DiamondId) as Accounts,
SUM(Quantity) as TotalUnitsOrdered,
SUM(CASE P.ProductType WHEN 1 THEN Quantity ELSE 0 END) as MonthliesOrdered,
SUM(CASE P.ProductType WHEN 1 THEN 0 ELSE Quantity END) as TPBOrdered
FROM
RetailOrders R WITH (NOLOCK)
LEFT JOIN
Products P WITH (NOLOCK) ON R.ProductId = P.ProductId
GROUP BY
YearReported, MonthReported
The problem I am facing now is that I also need to get the count of distinct accounts broken out based on another field in the dataset. For example:
SELECT
YearReported,
MonthReported,
COUNT(DISTINCT DiamondId) as Accounts,
SUM(Quantity) as TotalUnitsOrdered,
SUM(CASE P.ProductType WHEN 1 THEN Quantity ELSE 0 END) as MonthliesOrdered,
SUM(CASE P.ProductType WHEN 1 THEN 0 ELSE Quantity END) as TPBOrdered,
SUM(CASE IsInitial WHEN 1 THEN Quantity ELSE 0 END) as InitialOrders,
SUM(CASE IsInitial WHEN 0 THEN Quantity ELSE 0 END) as Reorders,
COUNT(/*DISTINCT DiamondId WHERE IsInitial = 1 */) as InitialOrderAccounts
FROM
RetailOrders R WITH (NOLOCK)
LEFT JOIN
Products P WITH (NOLOCK) ON R.ProductId = P.ProductId
GROUP BY
YearReported, MonthReported
Obviously we would replace the commented-out section in the last aggregate with something that would not throw an error. I just added that for illustrative purposes.
I feel like this can be done using the partition methods in SQL, but I have to admit that I am just not very good with them and can't figure out how to do this. And the MS documentation online for partitions really just made my head hurt after reading it last night.
EDIT: I mistakenly had the last aggregate function as SUM, and I meant for it to be a COUNT().
So to help clarify, COUNT(DISTINCT DiamondId) will give me back a count of all unique DiamondId values in the set, but I also need a COUNT() of all unique DiamondId values in the set whose corresponding IsInitial flag is set to 1.
Just a matter of nulling the ones that don't qualify:
count(distinct
case
when IsInitial = 1 then DiamondId
/* else null */
end
)
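A quick way to convince yourself that this works is a toy run in SQLite via Python's sqlite3 module. The orders table below is a hypothetical stand-in for RetailOrders, with lowercase column names chosen for the sketch:

```python
import sqlite3

# Hypothetical orders: account 1 and account 3 have initial orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (diamond_id INTEGER, is_initial INTEGER, quantity INTEGER);
    INSERT INTO orders VALUES
        (1, 1, 5), (1, 1, 3),   -- account 1: two initial orders
        (2, 0, 4),              -- account 2: reorder only
        (3, 1, 2), (3, 0, 6);   -- account 3: one initial order, one reorder
""")

# The CASE yields NULL for non-initial rows, and COUNT(DISTINCT) skips NULLs.
accounts, initial_accounts = conn.execute("""
    SELECT COUNT(DISTINCT diamond_id),
           COUNT(DISTINCT CASE WHEN is_initial = 1 THEN diamond_id END)
    FROM orders
""").fetchone()

print(accounts, initial_accounts)
```

Account 1 is counted once even though it has two initial orders, and account 2 drops out because all of its rows map to NULL.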

Count unique IDs with Case

I have this query (using SQL Server 2008):
SELECT
ISNULL(tt.strType,'Random'),
COUNT(DISTINCT t.intTestID) as Qt_Tests,
COUNT(CASE WHEN sd.intResult=4 THEN 1 ELSE NULL END) as Positives
FROM TB_Test t
LEFT JOIN TB_Sample s ON t.intTestID=s.intTestID
LEFT JOIN TB_Sample_Drug sd ON s.intSampleID=sd.intSampleID
LEFT JOIN TB_Employee e ON t.intEmployeeID=e.intEmployeeID
LEFT JOIN TB_Test_Type tt ON t.intTestTypeID=tt.intTestTypeID
WHERE
s.dtmCollection BETWEEN '2013-06-01 00:00' AND '2013-08-31 23:59'
AND e.intCompanyID = 91
GROUP BY
tt.strType
The thing is, each sample has four records in TB_Sample_Drug, which represent the drugs tested on that sample. A sample can be positive for anywhere from one to four drugs at the same time.
What I need to show in the third column is just whether the sample was positive, no matter for how many drugs.
And I can't find a way to do that, because I need the CASE WHEN to count all the results = 4, but only once per unique intSampleID.
Thanks in advance!
If I understand correctly, the easiest way is to use max() instead of count():
SELECT ISNULL(tt.strType,'Random'),
COUNT(DISTINCT t.intTestID) as Qt_Tests,
max(CASE WHEN sd.intResult = 4 THEN 1 ELSE NULL END) as HasPositives
. . .
EDIT:
If you want the number of samples with a positive result, then use count(distinct) with a case statement.
SELECT ISNULL(tt.strType,'Random'),
COUNT(DISTINCT t.intTestID) as Qt_Tests,
count(distinct CASE WHEN sd.intResult = 4 THEN s.intSampleID end) as NumPositiveSamples
instead of this:
COUNT(CASE WHEN sd.intResult=4 THEN 1 ELSE NULL END) as Positives
try this:
SUM(CASE WHEN sd.intResult=4 THEN 1 ELSE 0 END) as Positives
This should work if I understood correctly that you want to show whether a row in TB_Sample_Drug is marked with intResult = 4 (1) or not (NULL) in the Positives column.
SELECT
ISNULL(tt.strType,'Random'),
COUNT(DISTINCT t.intTestID) as Qt_Tests,
MAX(CASE WHEN sd.Positives > 0 THEN 1 ELSE NULL END) as Positives
FROM TB_Test t
LEFT JOIN TB_Sample s ON t.intTestID=s.intTestID
LEFT JOIN TB_Employee e ON t.intEmployeeID=e.intEmployeeID
LEFT JOIN TB_Test_Type tt ON t.intTestTypeID=tt.intTestTypeID
CROSS APPLY (select count(*) as Positives from TB_Sample_Drug sd where s.intSampleID=sd.intSampleID and sd.intResult=4) sd
WHERE
s.dtmCollection BETWEEN '2013-06-01 00:00' AND '2013-08-31 23:59'
AND e.intCompanyID = 91
GROUP BY
tt.strType
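To compare the suggestions side by side, here is a small sketch in SQLite via Python's sqlite3 module. The sample and sample_drug tables are simplified, hypothetical versions of TB_Sample and TB_Sample_Drug with two samples of four drug rows each:

```python
import sqlite3

# Hypothetical data: sample 1 is positive for two drugs, sample 2 for none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (sample_id INTEGER PRIMARY KEY);
    CREATE TABLE sample_drug (sample_id INTEGER, result INTEGER);
    INSERT INTO sample VALUES (1), (2);
    INSERT INTO sample_drug VALUES (1, 4), (1, 4), (1, 0), (1, 0),
                                   (2, 0), (2, 0), (2, 0), (2, 0);
""")

# Three ways of counting positives over the same joined rows.
count_case, sum_case, distinct_case = conn.execute("""
    SELECT COUNT(CASE WHEN sd.result = 4 THEN 1 END),
           SUM(CASE WHEN sd.result = 4 THEN 1 ELSE 0 END),
           COUNT(DISTINCT CASE WHEN sd.result = 4 THEN s.sample_id END)
    FROM sample s
    LEFT JOIN sample_drug sd ON sd.sample_id = s.sample_id
""").fetchone()

print(count_case, sum_case, distinct_case)
```

On this data, COUNT(CASE ...) and SUM(CASE ...) both count positive drug rows, so they agree with each other and count sample 1 twice. Only the COUNT(DISTINCT CASE ...) form counts each positive sample once.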
