TSQL Partitioning to COUNT() only subset of rows - sql-server

I have a query that I am working on that, for every given month and year in a sales table, returns back a SUM() of the total items ordered as well as a count of the distinct number of accounts ordering items and a couple break down of SUm()s on various product types. See the query below for an example:
SELECT
YearReported,
MonthReported,
COUNT(DISTINCT DiamondId) as Accounts,
SUM(Quantity) as TotalUnitsOrdered,
SUM(CASE P.ProductType WHEN 1 THEN Quantity ELSE 0 END) as MonthliesOrdered,
SUM(CASE P.ProductType WHEN 1 THEN 0 ELSE Quantity END) as TPBOrdered
FROM
RetailOrders R WITH (NOLOCK)
LEFT JOIN
Products P WITH (NOLOCK) ON R.ProductId = P.ProductId
GROUP BY
YearReported, MonthReported
The problem I am facing now is that I also need to get the count of distinct accounts broken out based on another field in the dataset. For example:
SELECT
YearReported,
MonthReported,
COUNT(DISTINCT DiamondId) as Accounts,
SUM(Quantity) as TotalUnitsOrdered,
SUM(CASE P.ProductType WHEN 1 THEN Quantity ELSE 0 END) as MonthliesOrdered,
SUM(CASE P.ProductType WHEN 1 THEN 0 ELSE Quantity END) as TPBOrdered,
SUM(CASE IsInitial WHEN 1 THEN Quantity ELSE 0 END) as InitialOrders,
SUM(CASE IsInitial WHEN 0 THEN Quantity ELSE 0 END) as Reorders,
COUNT(/*DISTINCT DiamndId WHERE IsInitial = 1 */) as InitialOrderAccounts
FROM
RetailOrders R WITH (NOLOCK)
LEFT JOIN
Products P WITH (NOLOCK) ON R.ProductId = P.ProductId
GROUP BY
YearReported, MonthReported
Obviously we would replace the commented out section in the last SUM with something that would not throw an error. I just added that for illustrative purposes.
I feel like this can be done using the Partition methods in SQL, but I have to admit that I just am not very good with them and cant figure out how to do this. And the MS documentation online for Partitions really just makes my head hurt after reading it last night.
EDIT: I mistakenly has the last aggregate function as sum and I meant for it to be a COUNT().
So to help clarify, COUNT(DISTINCT DiamondId) will give me back a count of all unique DiamondId values in the set but I also need to get a COUNT() of all Unique Diamond Id values in the set whose corresponding IsInitial flag is set to 1

Just a matter of nulling the ones that don't qualify:
count(distinct
case
when IsInitial = 1 then DiamndId
/* else null */
end
)

Related

Group by two columns with case query in SQL Server

I'm trying to retrieve drivers data with Total Accepted and Total Ignored ride requests for the current date.
Based on the Drivers and DriverReceivedRequests tables, I get the total count but the twists is I have duplicate rows that reside on the DriverReceivedRequest table against the driverId and the rideId. So there has to be group by clause on both the driverId and RideId, having the driver getting multiple requests for the current date, but also receiving twice or thrice request for the same ride as well.
This is the table structure for DriverReceivedRequests:
Id DriverId RideId ReceivedStatusId DateTime
------------------------------------------------------------------------
0014d26b 93665f55 fef6fb96 NULL 04:55.6
00175c65 6e62a94e cb214a84 NULL 09:32.1
0017c22b ec9e1297 4b47dc8a 4211357D 10:28:5
0014d26b 6e62a94e fef6fb96 NULL 04:56.8
This is the query I have tried:
select
d.Id, d.FirstName, d.LastName,
Sum(case when drrs.Number = 1 then 1 else 0 end) as TotalAccepted,
Sum(case when drrs.Number = 2 or drr.ReceivedStatusId is null then 1 else 0 end) as TotalRejected
from
dbo.[DriverReceivedRequests] drr
inner join
dbo.[Drivers] d on drr.DriverId = d.Id
left join
dbo.[DriverReceivedRequestsStatus] drrs on drr.ReceivedStatusId = drrs.Id
where
Day(drr.Datetime) = Day(getdate())
and month(drr.DateTime) = Month(getdate())
group by
d.FirstName, d.LastName, d.Id
In the above query if I group by with RideId as well, it generates duplicate names of drivers as well with incorrect data. I've also applied partition by clause with DriverId but not the correct result
This also generates the same result
WITH cte AS
(
SELECT
d.Id,
d.FirstName, d.LastName,
TotalAccepted = SUM (CASE WHEN drrs.Number = 1 THEN 1 ELSE 0 END) OVER (PARTITION BY drr.DriverId),
TotalRejected = SUM (CASE WHEN drrs.Number = 2 OR drr.ReceivedStatusId IS NULL THEN 1 ELSE 0 END)
OVER (PARTITION BY drr.DriverId),
rn = ROW_NUMBER() OVER(PARTITION BY drr.DriverId
ORDER BY drr.DateTime DESC)
FROM
DriverReceivedRequests drr
INNER JOIN
dbo.[Drivers] d ON drr.DriverId = d.Id
LEFT JOIN
dbo.[DriverReceivedRequestsStatus] drrs ON drr.ReceivedStatusId = drrs.Id
WHERE
DAY (drr.Datetime) = DAY (GETDATE())
AND MONTH (drr.DateTime) = MONTH (GETDATE())
)
SELECT
Id,
FirstName, LastName,
TotalAccepted, TotalRejected
FROM
cte
WHERE
rn = 1
My question is how can I group by individual driver data in terms of incoming ride requests?
Note: the driver receives same ride request multiple times

select distinct with parent-child and return boolean for at least one child with a value?

I am currently searching for orders that have at least one orderline (product) with a certain boolean set:
- the product is a subscription product
- the product is a setup product
If one of the orderlines has this value set to 1, I want to return this in the query per DISTINCT order ID.
This does not seem to work for me:
SELECT DISTINCT [ORDER].[order_id]
,[ORDERLINE].[is_subscription] AS hasSubArticles
,[ORDERLINE].[is_setup] AS hasSetupArticles
FROM [ORDER]
LEFT JOIN [ORDERLINE]
ON [ORDER].[order_id] = [ORDERLINE].[f_order_id]
WHERE [G_ORDER].[status] = 1
ORDER BY [ORDER].[order_id]
,[ORDERLINE].[is_subscription] AS hasSubArticles
,[ORDERLINE].[is_setup] AS hasSetupArticles
When I check the returned records, I receive duplicate ORDER records:
order_id hasSubArticles hasSetupArticles
----------------------------------------
17804 NULL NULL
17804 1 0
I want to return only 1 record per order ID, thus this isn't working for me.
What am I doing wrong?
Distinct does not work for your requirement. MAX, Min functions are not allowed to use with bit type. You could use Group by and SUM like this
SELECT
[ORDER].[order_id]
,CASE WHEN SUM( CASE WHEN [ORDERLINE].[is_subscription] = 1 THEN 1 ELSE 0 END) > 0 THEN 1
ELSE 0
END AS hasSubArticles
,CASE WHEN SUM( CASE WHEN [ORDERLINE].[is_setup] = 1 THEN 1 ELSE 0 END) > 0
THEN 1
ELSE 0
END hasSetupArticles
FROM [ORDER]
LEFT JOIN [ORDERLINE]
ON [ORDER].[order_id] = [ORDERLINE].[f_order_id]
WHERE [G_ORDER].[status] = 1
GROUP BY [ORDER].[order_id]

Look at data in one date range and count from another date range

I'm stuck on a query where I am trying to get information on just customers that are newly acquired during a certain date range.
I had need to get a list of customers who placed their first order (of all time) in the first 6 months of the year. I then need to get total of their invoices, first invoice date, last invoice date, and count of orders for just the last 6 months.
I used a HAVING clause to ensure that I am just looking at customers that placed their first order in that 6 month period, but since we are past that period now, the total invoice info and order count information would include orders placed after this time. I considered including a restriction in the HAVING clause for the 'last invoice date', but then I am eliminating customers whose first order date was in the 6 month block, but also ordered after that. I'm not sure what to do next and am not having luck finding similar questions. Here is what I have so far:
SELECT c.customer, MAX(c.name) AS Name,
SUM(
CASE WHEN im.debit = 0
THEN im.amount * -1
ELSE im.amount
END
) AS TotalInvoiceAmount,
MIN(
im.date) AS FirstInvoiceDate,
MAX(
im.date) AS LastInvoiceDate,
COUNT(DISTINCT om.[order]) AS OrderCount
FROM invoicem im
INNER JOIN customer c ON im.customer = c.customer
FULL JOIN orderm om ON im.customer = om.customer
WHERE im.amount <> 0
GROUP BY c.customer
HAVING MIN(im.date) BETWEEN '01-01-2015' AND '06-30-2015'
ORDER BY c.customer
You can put the first 6 months qualification as a subquery. This would also work as a CTE
declare #startDate date = dateadd(month,-6,getdate())
SELECT c.customer, MAX(c.name) AS Name,
SUM(
CASE WHEN im.debit = 0
THEN im.amount * -1
ELSE im.amount
END
) AS TotalInvoiceAmount,
MIN(
im.date) AS FirstInvoiceDate,
MAX(
im.date) AS LastInvoiceDate,
COUNT(DISTINCT om.[order]) AS OrderCount
FROM invoice im
INNER JOIN (SELECT customer from invoice
GROUP BY customer
HAVING MIN(date) BETWEEN '01-01-2015'
AND '06-30-2015') im2
ON im.customer = im2.customer
INNER JOIN customer c ON im.customer = c.customer
FULL JOIN orderm om ON im.customer = om.customer
WHERE im.amount <> 0
AND im.date >= #startdate
GROUP BY c.customer
ORDER BY c.customer

SQL not doing the join correctly

I have a SQL statement with some JOIN condition it is working fine for all of them but not the last one the code is below:
SELECT
A.EMPL_CTG,
B.DESCR AS PrName,
SUM(A.CURRENT_COMPRATE) AS SALARY_COST_BUDGET,
SUM(A.BUDGET_AMT) AS BUDGET_AMT,
SUM(A.BUDGET_AMT)*100/SUM(A.CURRENT_COMPRATE) AS MERIT_GOAL,
SUM(C.FACTOR_XSALARY) AS X_Programp,
SUM(A.FACTOR_XSALARY) AS X_Program,
COUNT(A.EMPLID) AS EMPL_CNT,
COUNT(D.EMPLID),
SUM(CASE WHEN A.PROMOTION_SECTION = 'Y' THEN 1 ELSE 0 END) AS PRMCNT,
SUM(CASE WHEN A.EXCEPT_IND = 'Y' THEN 1 ELSE 0 END) AS EXPCNT,
(SUM(CASE WHEN A.PROMOTION_SECTION = 'Y' THEN 1 ELSE 0 END)+SUM(CASE WHEN A.EXCEPT_IND = 'Y' THEN 1 ELSE 0 END))*100/(COUNT(A.EMPLID)) AS PEpercent
FROM
EMP_DTL A INNER JOIN EMPL_CTG_L1 B ON A.EMPL_CTG = B.EMPL_CTG
INNER JOIN
ECM_PRYR_VW C ON A.EMPLID=C.EMPLID
INNER JOIN ECM_INELIG D on D.EMPL_CTG=A.EMPL_CTG and D.YEAR=YEAR(getdate())
WHERE
A.YEAR=YEAR(getdate())
AND B.EFF_STATUS='A'
GROUP BY
A.EMPL_CTG,
B.DESCR
ORDER BY B.DESCR
The COUNT(D.EMPLID) is returning the same value as COUNT(A.EMPLID) but I need the count of EMPLIDs for Table D in the join condition, any help?
COUNT() (and also the other GROUP BY aggregate functions) doesn't process only the rows from one of the tables.
They work on all the rows produced by the JOIN. If the JOIN without GROUP BY produces 42 rows then COUNT(*) and COUNT(1) returns 42 while COUNT(A.EMPLID) and COUNT(D.EMPLID) return the number of not-NULL values in those columns.
In order to get the number of rows extracted from one of the tables the you should use COUNT(DISTINCT). It ignores the NULL values and also the duplicates produced by the JOIN.
Change COUNT(D.EMPLID) to COUNT(DISTINCT D.EMPLID).

Count unique IDs with Case

I have this Query(using SQL Server 2008):
SELECT
ISNULL(tt.strType,'Random'),
COUNT(DISTINCT t.intTestID) as Qt_Tests,
COUNT(CASE WHEN sd.intResult=4 THEN 1 ELSE NULL END) as Positives
FROM TB_Test t
LEFT JOIN TB_Sample s ON t.intTestID=s.intTestID
LEFT JOIN TB_Sample_Drug sd ON s.intSampleID=sd.intSampleID
LEFT JOIN TB_Employee e ON t.intEmployeeID=e.intEmployeeID
LEFT JOIN TB_Test_Type tt ON t.intTestTypeID=tt.intTestTypeID
WHERE
s.dtmCollection BETWEEN '2013-06-01 00:00' AND '2013-08-31 23:59'
AND f.intCompanyID = 91
GROUP BY
tt.strType
The thing is each sample has four records on sample_drug, which represents the drugs tested on that sample. And the result of the sample can be positive from one to four drugs at the same time.
What I need to show in the third column is just if the sample was positive, no matter how many drugs.
And I can't find a way to do that, because i need the CASE WHEN to know that i want all the Results = 4 but just from unique intSampleIDs.
Thanks in advance!
If I understand correctly, the easiest way is to use max() instead of count():
SELECT ISNULL(tt.strType,'Random'),
COUNT(DISTINCT t.intTestID) as Qt_Tests,
max(CASE WHEN sd.intResult = 4 THEN 1 ELSE NULL END) as HasPositives
. . .
EDIT:
If you want the number of samples with a positive result, then use count(distinct) with a case statement.
SELECT ISNULL(tt.strType,'Random'),
COUNT(DISTINCT t.intTestID) as Qt_Tests,
count(distinct CASE WHEN sd.intResult = 4 THEN s.intSampleID end) as NumPositiveSamples
instead of this:
COUNT(CASE WHEN sd.intResult=4 THEN 1 ELSE NULL END) as Positives
try this:
SUM(CASE WHEN sd.intResult=4 THEN 1 ELSE 0 END) as Positives
Should work if i undestood correctly that you want show that a row in TB_Sample_Drug marked with intResult = 4 (1) or not (null) in the positives column.
SELECT
ISNULL(tt.strType,'Random'),
COUNT(DISTINCT t.intTestID) as Qt_Tests,
CASE WHEN sd.Positives > 0 THEN 1 ELSE NULL END as Positives
FROM TB_Test t
LEFT JOIN TB_Sample s ON t.intTestID=s.intTestID
LEFT JOIN TB_Employee e ON t.intEmployeeID=e.intEmployeeID
LEFT JOIN TB_Test_Type tt ON t.intTestTypeID=tt.intTestTypeID
CROSS APPLY (select count(*) as Positives from TB_Sample_Drug sd where s.intSampleID=sd.intSampleID and sd.intResult=4) sd
WHERE
s.dtmCollection BETWEEN '2013-06-01 00:00' AND '2013-08-31 23:59'
AND f.intCompanyID = 91
GROUP BY
tt.strType

Resources