Postgres COUNT number of column values with INNER JOIN - database

I am creating a report in Postgres 9.3. This is my SQL Fiddle.
Basically I have two tables, responses and questions, the structure is:
responses
->id
->question_id
->response
questions
->id
->question
->costperlead
for the column response there can only be 3 values, Yes/No/Possbily,
and my report should have the columns:
question_id
, # of Yes Responses
, # of No Responses
, # of Possbily Responses
, Revenue
Then:
# of Yes Responses - count of all Yes values in the response column
# of No Responses - count of all No values in the response column
# of Possbily Responses - count of all 'Possbily' values in the response column
Revenue is the costperlead * (Number of Yes Responses + Number of Possibly Responses).
I don't know how to construct the query, I'm new plus I came from MySQL so some things are different for postgres. In my SQL Fiddle sample most responses are Yes and Null, it's ok eventually, there will be Possibly and No.
So far I have only:
SELECT a.question_id
FROM responses a
INNER JOIN questions b ON a.question_id = b.id
WHERE a.created_at = '2015-07-17'
GROUP BY a.question_id;

You should try:
SELECT a.question_id,
SUM(CASE WHEN a.response = 'Yes' THEN 1 ELSE 0 END) AS NumsOfYes,
SUM(CASE WHEN a.response = 'No' THEN 1 ELSE 0 END) AS NumsOfNo,
SUM(CASE WHEN a.response = 'Possibly' THEN 1 ELSE 0 END) AS NumOfPossibly,
costperlead * SUM(CASE WHEN a.response = 'Yes' THEN 1 ELSE 0 END) + SUM(CASE WHEN a.response = 'Possibly' THEN 1 ELSE 0 END) AS revenue
FROM responses a
INNER JOIN questions b ON a.question_id = b.id
GROUP BY a.question_id, b.costperlead

Since the only predicate filters rows from table responses, it would be most efficient to aggregate responses first, then join to questions:
SELECT *, q.costperlead * (r.ct_yes + r.ct_maybe) AS revenue
FROM (
SELECT question_id
, count(*) FILTER (WHERE response = 'Yes') AS ct_yes
, count(*) FILTER (WHERE response = 'No') AS ct_no
, count(*) FILTER (WHERE response = 'Possibly') AS ct_maybe
FROM responses
WHERE created_at = '2015-07-17'
GROUP BY 1
) r
JOIN questions q ON q.id = r.question_id;
db<>fiddle here
This uses the aggregate FILTER clause (in Postgres 9.4 or later). See:
Aggregate columns with additional (distinct) filters
Aside: consider implementing response as boolean type with true/false/null.
For Postgres 9.3:
SELECT *, q.costperlead * (r.ct_yes + r.ct_maybe) AS revenue
FROM (
SELECT question_id
, count(response = 'Yes' OR NULL) AS ct_yes
, count(response = 'No' OR NULL) AS ct_no
, count(response = 'Possibly' OR NULL) AS ct_maybe
FROM responses
WHERE created_at = '2015-07-17'
GROUP BY 1
) r
JOIN questions q ON q.id = r.question_id;
Old sqlfiddle
Comprehensive comparison of techniques:
For absolute performance, is SUM faster or COUNT?

Related

Group by two columns with case query in SQL Server

I'm trying to retrieve drivers data with Total Accepted and Total Ignored ride requests for the current date.
Based on the Drivers and DriverReceivedRequests tables, I get the total count but the twists is I have duplicate rows that reside on the DriverReceivedRequest table against the driverId and the rideId. So there has to be group by clause on both the driverId and RideId, having the driver getting multiple requests for the current date, but also receiving twice or thrice request for the same ride as well.
This is the table structure for DriverReceivedRequests:
Id DriverId RideId ReceivedStatusId DateTime
------------------------------------------------------------------------
0014d26b 93665f55 fef6fb96 NULL 04:55.6
00175c65 6e62a94e cb214a84 NULL 09:32.1
0017c22b ec9e1297 4b47dc8a 4211357D 10:28:5
0014d26b 6e62a94e fef6fb96 NULL 04:56.8
This is the query I have tried:
select
d.Id, d.FirstName, d.LastName,
Sum(case when drrs.Number = 1 then 1 else 0 end) as TotalAccepted,
Sum(case when drrs.Number = 2 or drr.ReceivedStatusId is null then 1 else 0 end) as TotalRejected
from
dbo.[DriverReceivedRequests] drr
inner join
dbo.[Drivers] d on drr.DriverId = d.Id
left join
dbo.[DriverReceivedRequestsStatus] drrs on drr.ReceivedStatusId = drrs.Id
where
Day(drr.Datetime) = Day(getdate())
and month(drr.DateTime) = Month(getdate())
group by
d.FirstName, d.LastName, d.Id
In the above query if I group by with RideId as well, it generates duplicate names of drivers as well with incorrect data. I've also applied partition by clause with DriverId but not the correct result
This also generates the same result
WITH cte AS
(
SELECT
d.Id,
d.FirstName, d.LastName,
TotalAccepted = SUM (CASE WHEN drrs.Number = 1 THEN 1 ELSE 0 END) OVER (PARTITION BY drr.DriverId),
TotalRejected = SUM (CASE WHEN drrs.Number = 2 OR drr.ReceivedStatusId IS NULL THEN 1 ELSE 0 END)
OVER (PARTITION BY drr.DriverId),
rn = ROW_NUMBER() OVER(PARTITION BY drr.DriverId
ORDER BY drr.DateTime DESC)
FROM
DriverReceivedRequests drr
INNER JOIN
dbo.[Drivers] d ON drr.DriverId = d.Id
LEFT JOIN
dbo.[DriverReceivedRequestsStatus] drrs ON drr.ReceivedStatusId = drrs.Id
WHERE
DAY (drr.Datetime) = DAY (GETDATE())
AND MONTH (drr.DateTime) = MONTH (GETDATE())
)
SELECT
Id,
FirstName, LastName,
TotalAccepted, TotalRejected
FROM
cte
WHERE
rn = 1
My question is how can I group by individual driver data in terms of incoming ride requests?
Note: the driver receives same ride request multiple times

Select from same column under different conditons

I need to join these two tables. I need to select occurrences where:
ex_head of_family_active = 1 AND tax_year = 2017
and also:
ex_head of_family_active = 0 AND tax_year = 2016
The first time I tried to join these two tables I got the warehouse data
dbo.tb_master_ascend AND warehouse_data.dbo.tb_master_ascend in the from clause have the same exposed names. As the query now shown below, I get a syntax error on the "where". What am I doing wrong? Thank you
use [warehouse_data]
select
parcel_number as Account,
pact_code as type,
owner_name as Owner,
case
when ex_head_of_family_active >= 1
then 'X'
else ''
end 'Head_Of_Fam'
from
warehouse_data.dbo.tb_master_ascend
inner join
warehouse_data.dbo.tb_master_ascend on parcel_number = parcel_number
where
warehouse_data.dbo.tb_master_ascend.tax_year = '2016'
and ex_head_of_family_active = 0
where
warehouse_data.dbo.tb_master_ascend.t2.tax_year = '2017'
and ex_head_of_family_active >= 1
and (eff_from_date <= getdate())
and (eff_to_date is null or eff_to_date >= getdate())
#marc_s I changed the where statements and updated my code however the filter is not working now:
use [warehouse_data]
select
wh2.parcel_number as Account
,wh2.pact_code as Class_Type
,wh2.owner_name as Owner_Name
,case when wh2.ex_head_of_family_active >= 1 then 'X'
else ''
end 'Head_Of_Fam_2017'
from warehouse_data.dbo.tb_master_ascend as WH2
left join warehouse_data.dbo.tb_master_ascend as WH1 on ((WH2.parcel_number = wh1.parcel_number)
and (WH1.tax_year = '2016')
and (WH1.ex_head_of_family_active is null))
where WH2.tax_year = '2017'
and wh2.ex_head_of_family_active >= 1
and (wh2.eff_from_date <= getdate())
and (wh2.eff_to_date is null or wh2.eff_to_date >= getdate())
I would use a CTE to get all your parcels that meet your 2016 rules.
Then join that against your 2017 rules on parcel ID.
I'm summarizing:
with cte as
(
select parcelID
from
where [2016 rules]
group by parcelID --If this isn't unique you will cartisian your results
)
select columns
from table
join cte on table.parcelid=cte.parcelID
where [2017 rules]

select distinct with parent-child and return boolean for at least one child with a value?

I am currently searching for orders that have at least one orderline (product) with a certain boolean set:
- the product is a subscription product
- the product is a setup product
If one of the orderlines has this value set to 1, I want to return this in the query per DISTINCT order ID.
This does not seem to work for me:
SELECT DISTINCT [ORDER].[order_id]
,[ORDERLINE].[is_subscription] AS hasSubArticles
,[ORDERLINE].[is_setup] AS hasSetupArticles
FROM [ORDER]
LEFT JOIN [ORDERLINE]
ON [ORDER].[order_id] = [ORDERLINE].[f_order_id]
WHERE [G_ORDER].[status] = 1
ORDER BY [ORDER].[order_id]
,[ORDERLINE].[is_subscription] AS hasSubArticles
,[ORDERLINE].[is_setup] AS hasSetupArticles
When I check the returned records, I receive duplicate ORDER records:
order_id hasSubArticles hasSetupArticles
----------------------------------------
17804 NULL NULL
17804 1 0
I want to return only 1 record per order ID, thus this isn't working for me.
What am I doing wrong?
Distinct does not work for your requirement. MAX, Min functions are not allowed to use with bit type. You could use Group by and SUM like this
SELECT
[ORDER].[order_id]
,CASE WHEN SUM( CASE WHEN [ORDERLINE].[is_subscription] = 1 THEN 1 ELSE 0 END) > 0 THEN 1
ELSE 0
END AS hasSubArticles
,CASE WHEN SUM( CASE WHEN [ORDERLINE].[is_setup] = 1 THEN 1 ELSE 0 END) > 0
THEN 1
ELSE 0
END hasSetupArticles
FROM [ORDER]
LEFT JOIN [ORDERLINE]
ON [ORDER].[order_id] = [ORDERLINE].[f_order_id]
WHERE [G_ORDER].[status] = 1
GROUP BY [ORDER].[order_id]

TSQL Partitioning to COUNT() only subset of rows

I have a query that I am working on that, for every given month and year in a sales table, returns back a SUM() of the total items ordered as well as a count of the distinct number of accounts ordering items and a couple break down of SUm()s on various product types. See the query below for an example:
SELECT
YearReported,
MonthReported,
COUNT(DISTINCT DiamondId) as Accounts,
SUM(Quantity) as TotalUnitsOrdered,
SUM(CASE P.ProductType WHEN 1 THEN Quantity ELSE 0 END) as MonthliesOrdered,
SUM(CASE P.ProductType WHEN 1 THEN 0 ELSE Quantity END) as TPBOrdered
FROM
RetailOrders R WITH (NOLOCK)
LEFT JOIN
Products P WITH (NOLOCK) ON R.ProductId = P.ProductId
GROUP BY
YearReported, MonthReported
The problem I am facing now is that I also need to get the count of distinct accounts broken out based on another field in the dataset. For example:
SELECT
YearReported,
MonthReported,
COUNT(DISTINCT DiamondId) as Accounts,
SUM(Quantity) as TotalUnitsOrdered,
SUM(CASE P.ProductType WHEN 1 THEN Quantity ELSE 0 END) as MonthliesOrdered,
SUM(CASE P.ProductType WHEN 1 THEN 0 ELSE Quantity END) as TPBOrdered,
SUM(CASE IsInitial WHEN 1 THEN Quantity ELSE 0 END) as InitialOrders,
SUM(CASE IsInitial WHEN 0 THEN Quantity ELSE 0 END) as Reorders,
COUNT(/*DISTINCT DiamndId WHERE IsInitial = 1 */) as InitialOrderAccounts
FROM
RetailOrders R WITH (NOLOCK)
LEFT JOIN
Products P WITH (NOLOCK) ON R.ProductId = P.ProductId
GROUP BY
YearReported, MonthReported
Obviously we would replace the commented out section in the last SUM with something that would not throw an error. I just added that for illustrative purposes.
I feel like this can be done using the Partition methods in SQL, but I have to admit that I just am not very good with them and cant figure out how to do this. And the MS documentation online for Partitions really just makes my head hurt after reading it last night.
EDIT: I mistakenly has the last aggregate function as sum and I meant for it to be a COUNT().
So to help clarify, COUNT(DISTINCT DiamondId) will give me back a count of all unique DiamondId values in the set but I also need to get a COUNT() of all Unique Diamond Id values in the set whose corresponding IsInitial flag is set to 1
Just a matter of nulling the ones that don't qualify:
count(distinct
case
when IsInitial = 1 then DiamndId
/* else null */
end
)

SQL Percentage calculation

Is it possible in SQL to calculate the percentage of the 'StaffEntered' column's "Yes" values (case when calculated column) out of the grand total number of orders by that user (RequestedBy)? I'm basically doing this function now myself in Excel with a Pivot table, but thought it may be easier to build it into the query. Here is the existing sample SQL code:
Select
Distinct
RequestedBy = HStaff.Name,
AccountID = isnull(pv.AccountID, ''),
StaffEntered = Case When DictionaryItem2.Name like '%PLB%' Then 'Yes' Else 'No' end
FROM
[dbo].[HOrd] HOrd WITH ( NOLOCK )
left outer join HStaff HStaff with (nolock)
on HOrd.Requestedby_oid = HStaff.ObjectID
and HStaff.Active = 1
left outer join DictionaryItem DictionaryItem2 WITH (NOLOCK)
ON HSUser1.PreferenceGroup_oid = DictionaryItem2.ObjectID
AND DictionaryItem2.ItemType_oid = 98
Here is what I am doing in Excel currently with the query results, I have a pivot table and I am dividing the "Yes" values of the "StaffEntered" field out of the Grand Total number of entries for that specific "RequestedBy" user. Essentially Excel is doing the summarization and then I am doing a simple division calculation to obtain the percentage.
Thanks in advance!
You didn't provide a lot in the way of details but I think this should be pretty close to what you are looking for.
select HStaff.Name as RequestedBy
, isnull(pv.AccountID, '') as AccountID
, Case When DictionaryItem2.Name like '%PLB%' Then 'Yes' Else 'No' end as StaffEntered
, sum(Case When DictionaryItem2.Name like '%PLB%' Then 1 Else 0 end) / GrandTotal
From SomeTable
group by HStaff.Name
, isnull(pv.AccountID, '')
, GrandTotal
Giving the FROM part of your SQL Statement would allow us to create a more correct answer. This statement will get the totals of yes/no per HStaff name and add it to each detail record in your SQL statement:
WITH cte
AS ( SELECT HStaff.Name ,
SUM(CASE WHEN dictionaryItem2.Name LIKE '%PLB%' THEN 1
ELSE 0
END) AS YesCount ,
SUM(CASE WHEN dictionaryItem2.Name NOT LIKE '%PLB%'
THEN 1
ELSE 0
END) AS NotCount
FROM YourTable
GROUP BY HStaff.Name
)
SELECT HStaff.Name AS requestedBy ,
ISNULL(pv.AccountID, '') AS AccountID ,
CASE WHEN DictionaryItem2.Name LIKE '%PLB%' THEN 'Yes'
ELSE 'No'
END AS StaffEntered ,
cte.YesCount / ( cte.YesCount + cte.NotCount ) AS PLB_Percentage
FROM yourtable
INNER JOIN cte ON yourtable.Hstaff.Name = cte.NAME

Resources