SQL Count Distinct User IDs based on Column String Value - snowflake-cloud-data-platform

I am trying to create a new data table that has total counts of users for each week in two separate columns, a count for distinct users from Platform = 'Web' and a count of distinct users from Platform = 'iOS' or 'Android'
My attempt below did not work. Can someone advise?
DATA SOURCE:
DATE USER_ID Platform
1/1/2020 1223 Web
1/2/2020 2032 iOS
1/3/2020 2432 Android
1/4/2020 20311 iOS
1/4/2020 2443 Android
SQL ATTEMPT
SELECT DATE_TRUNC(week, d.DATE) as DATE,
COUNT(DISTINCT d.USER_ID WHERE d.PLATFORM = 'Web') AS USER_COUNT_WEB,
COUNT(DISTINCT d.USER_ID WHERE d.PLATFORM = 'Android' OR 'iOS') AS USER_COUNT_PHONE
FROM data d
GROUP BY 1;
Target table:
DATE User_Count_Web User_count_Phone
1/1/2020 12 230
1/8/2020 20 442
1/15/2020 24 533

Here is one way:
SELECT
DATE_TRUNC(week, d.DATE) as DATE,
COUNT(DISTINCT case when d.PLATFORM = 'Web' then d.USER_ID end) AS USER_COUNT_WEB,
COUNT(DISTINCT case when d.PLATFORM in ('Android','iOS') then d.USER_ID end) AS USER_COUNT_PHONE
FROM data d
GROUP BY 1;
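To see the COUNT(DISTINCT CASE ...) pattern run end to end, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Snowflake. Assumptions: SQLite has no DATE_TRUNC, so the week start (Monday) is computed with date(day, 'weekday 0', '-6 days'); the column is named day; dates are ISO strings; and the last two rows are hypothetical additions to show de-duplication and a second week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (day TEXT, user_id INTEGER, platform TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("2020-01-01", 1223, "Web"),
        ("2020-01-02", 2032, "iOS"),
        ("2020-01-03", 2432, "Android"),
        ("2020-01-04", 20311, "iOS"),
        ("2020-01-04", 2443, "Android"),
        ("2020-01-03", 2032, "iOS"),  # hypothetical repeat visit by user 2032
        ("2020-01-06", 1223, "Web"),  # hypothetical row falling in the next week
    ],
)

# CASE without ELSE yields NULL for non-matching rows, and COUNT ignores NULLs,
# so each column counts distinct users for its own platform group only.
rows = conn.execute(
    """
    SELECT date(day, 'weekday 0', '-6 days') AS week_start,
           COUNT(DISTINCT CASE WHEN platform = 'Web' THEN user_id END) AS user_count_web,
           COUNT(DISTINCT CASE WHEN platform IN ('Android', 'iOS') THEN user_id END) AS user_count_phone
    FROM data
    GROUP BY week_start
    ORDER BY week_start
    """
).fetchall()

for row in rows:
    print(row)
```

User 2032 appears twice in the first week but is counted once, which is the behaviour the question asks for.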

The sum/case pattern is also useful for this kind of "pivot query":
SELECT
WEEK(date),
SUM(CASE WHEN platform='Web' THEN 1 ELSE 0 END) AS count_web,
SUM(CASE WHEN platform IN ('iOS','Android') THEN 1 ELSE 0 END) AS count_phone
FROM data
GROUP BY WEEK(date)
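(WEEK() as used here follows MySQL, and this answer gives the week as an integer week number within the year rather than as an actual date, but the idea is the same. Note that SUM(CASE ...) counts rows, so it matches a distinct-user count only when each user appears at most once per week.)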
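The difference between the two patterns shows up as soon as a user has more than one row. A small sketch, again with Python's sqlite3 as a stand-in engine; the repeated row for user 2032 is a hypothetical addition to the question's sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (day TEXT, user_id INTEGER, platform TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("2020-01-01", 1223, "Web"),
        ("2020-01-02", 2032, "iOS"),
        ("2020-01-03", 2432, "Android"),
        ("2020-01-04", 20311, "iOS"),
        ("2020-01-04", 2443, "Android"),
        ("2020-01-03", 2032, "iOS"),  # hypothetical second row for user 2032
    ],
)

# SUM(CASE ... 1 ELSE 0) counts matching rows;
# COUNT(DISTINCT CASE ... user_id) counts matching users.
phone_rows, phone_users = conn.execute(
    """
    SELECT SUM(CASE WHEN platform IN ('iOS', 'Android') THEN 1 ELSE 0 END),
           COUNT(DISTINCT CASE WHEN platform IN ('iOS', 'Android') THEN user_id END)
    FROM data
    """
).fetchone()

print(phone_rows, phone_users)
```

The row count is higher than the user count because user 2032 contributes two rows.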

Related

subquery returned more than 1 value table join

I am writing a course project (something like a program for the hotel manager) and I need a little help. I have tables Reservations and Rooms and I need to calculate the amount of payment after the client leaves the room ((End_date - Start_date) * price_per_day), but I'm having trouble getting the price_per_day from the table Rooms.
My query works only if there is one record in the Reservations table. If there are two or more, I get the error "subquery returned more than 1 value" and I don't know how to fix it (the problem is in this part of the query: SELECT price_per_day FROM Rooms AS ro JOIN Reservations AS re ON ro.room_id = re.room_id).
I'm using visual studio 2019 + SQL Server Express LocalDB.
I will be grateful for any help or hint!
UPDATE Reservations
SET Amount_payable = (
DATEDIFF(day, CONVERT(datetime, Start_date, 104), CONVERT(datetime, End_date, 104) * (SELECT price_per_day FROM Rooms AS ro JOIN Reservations AS re ON ro.room_id = re.room_id))
)
WHERE Status = 'Archived'
Table Reservations
reservation_id customer_id room_id start_date end_date status Amount_payable
1 3 3 12.04.2020 05.06.2020 Archived 0
2 2 4 11.04.2020 30.05.2020 Active 0
Table Rooms
reservation_id room_id number_of_persons room_type price_per_day
0 1 3 Double 300
0 2 4 Triple 600
0 3 3 Studio 400
2 4 2 Single 444
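You need a slightly different approach to resolve the issue: join Rooms in the UPDATE itself, so that each reservation row is matched with the price of its own room.
Try the following: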
UPDATE re
SET
Amount_payable = (DATEDIFF(day, CONVERT(DATETIME, Start_date, 104), CONVERT(DATETIME, End_date, 104)) * price_per_day)
FROM Reservations re
JOIN Rooms AS ro ON ro.room_id = re.room_id
WHERE STATUS = 'Archived';
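The original error arises because the subquery was not tied to the row being updated, so it returned a price for every reservation. A different way to get one price per row is a correlated subquery. The sketch below shows that alternative (not the UPDATE ... FROM form above) using Python's sqlite3; it assumes ISO date strings instead of the dd.mm.yyyy format in the question and uses julianday() in place of DATEDIFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Rooms (room_id INTEGER PRIMARY KEY, price_per_day INTEGER);
    CREATE TABLE Reservations (
        reservation_id INTEGER PRIMARY KEY,
        room_id INTEGER,
        start_date TEXT,
        end_date TEXT,
        status TEXT,
        amount_payable INTEGER
    );
    INSERT INTO Rooms VALUES (1, 300), (2, 600), (3, 400), (4, 444);
    INSERT INTO Reservations VALUES
        (1, 3, '2020-04-12', '2020-06-05', 'Archived', 0),
        (2, 4, '2020-04-11', '2020-05-30', 'Active', 0);
    """
)

# The subquery references Reservations.room_id from the row being updated,
# so it returns exactly one price per reservation.
conn.execute(
    """
    UPDATE Reservations
    SET amount_payable =
        CAST(julianday(end_date) - julianday(start_date) AS INTEGER)
        * (SELECT price_per_day FROM Rooms WHERE Rooms.room_id = Reservations.room_id)
    WHERE status = 'Archived'
    """
)

result = conn.execute(
    "SELECT reservation_id, amount_payable FROM Reservations ORDER BY reservation_id"
).fetchall()
print(result)
```

Reservation 1 spans 54 days in room 3 (400 per day), so it is billed 21600; the active reservation is left untouched.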

Sql server - Using aggregate functions in where clause

I am working on a SQL query for a transport business. When executed, it should return information on drivers who received a 5-star (5*) rating on more than 20% of their trips in the last 30 days, subject to a minimum of 5 trips.
For example, if a driver completed 100 trips in the last 30 days and received 30 five-star (5*) ratings, then this driver and all of his 5* trips should be returned by the query, since more than 20% of his trips were rated 5 stars.
select tr.[TripId], tr.[DriverId], tr.[Rating], dr.[DriverName]
from tblTripInfo tr
left outer join tblDriver dr
on tr.[DriverId] = dr.[DriverId]
where tr.[Rating] = 5 and tr.[TripDate] >= GetDate() - 30
The query above returns all trips and drivers with 5* ratings in the last 30 days. I want only those drivers whose 5* trips make up at least 20% of their total trips, with a minimum of 5 trips.
Initially I only wanted the DriverIds that met the above condition, and the query below worked:
select DriverId,
count(case when Rating = 5 then DriverId end) as TotalStars,
100.0 * avg(case when Rating = 5 then 1.0 else 0 end) as Average5Stars
from tblTripInfo
where TripDate >= GetDate() - 30
group by DriverId
having
count(case when Rating = 5 then DriverId end) > 10
and
100.0 * avg(case when Rating = 5 then 1.0 else 0 end) > 25
But now I also want all the information, such as TripId, DriverName, and trip date, for those 5* trips.
You need something along the lines of this:
WITH TotalTrips as (
SELECT Count(*) as TotalTrips,
DriverId
FROM tblTripInfo
GROUP BY DriverId
)
SELECT t1.DriverId,
count(case when Rating = 5 then t1.DriverId end) as Total5StarTrips,
100.0 * avg(case when Rating = 5 then 1.0 else 0 end) as Average5Stars
FROM tblTripInfo t1
JOIN TotalTrips t2
ON t1.DriverId = t2.DriverId
AND t2.TotalTrips > 5 --more than 5 trips
where TripDate >= GetDate() - 30
group by t1.DriverId, t2.TotalTrips
HAVING 1.0 * COUNT(case when Rating = 5 then t1.DriverId end) / t2.TotalTrips > 0.2 --more than 20% 5-starred trips
There is no need for complicated logic if you can use a subquery (here, a CTE) for simplicity.
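The query above returns one row per driver. To get the individual 5* trips with driver names, one option is to use the qualifying drivers as a filter on the trip table. This is a minimal sketch in Python's sqlite3, not the answerer's method: the sample rows and driver names are hypothetical, the 30-day date filter is left out for brevity, and the thresholds (at least 5 trips, more than 20% rated 5) follow the question's wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tblDriver (DriverId INTEGER PRIMARY KEY, DriverName TEXT);
    CREATE TABLE tblTripInfo (TripId INTEGER PRIMARY KEY, DriverId INTEGER, Rating INTEGER);
    INSERT INTO tblDriver VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO tblTripInfo VALUES
        (1, 1, 5), (2, 1, 5), (3, 1, 5), (4, 1, 3), (5, 1, 4), (6, 1, 2),    -- 3 of 6 rated 5
        (7, 2, 5), (8, 2, 4), (9, 2, 3), (10, 2, 4), (11, 2, 2), (12, 2, 1), -- 1 of 6 rated 5
        (13, 3, 5), (14, 3, 5);                                              -- only 2 trips
    """
)

# Inner query: drivers meeting both thresholds.
# Outer query: the 5-star trips of exactly those drivers, with the driver name.
trips = conn.execute(
    """
    SELECT t.TripId, t.DriverId, d.DriverName
    FROM tblTripInfo t
    JOIN tblDriver d ON d.DriverId = t.DriverId
    WHERE t.Rating = 5
      AND t.DriverId IN (
          SELECT DriverId
          FROM tblTripInfo
          GROUP BY DriverId
          HAVING COUNT(*) >= 5
             AND AVG(CASE WHEN Rating = 5 THEN 1.0 ELSE 0 END) > 0.2
      )
    ORDER BY t.TripId
    """
).fetchall()
print(trips)
```

Only driver 1 qualifies: driver 2 is below 20% and driver 3 has too few trips.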

How to filter out from count distinct query

I am trying to calculate the number of customers who were active in the past 3 and 6 months.
SELECT COUNT (DISTINCT CustomerNo)
FROM SalesDetail
WHERE InvoiceDate > (GETDATE() - 180) AND InvoiceDate < (GETDATE() - 90)
SELECT COUNT (DISTINCT CustomerNo)
FROM SalesDetail
WHERE InvoiceDate > (GETDATE() - 90)
However, with the queries above, a customer who was active in both periods is counted in both results. For example:
Customer A bought once in the past 3 months
Customer A also bought once in the period 3 to 6 months ago
How do I filter the customers so that if customer A was active in both periods, he/she is counted only in the 'active in past 3 months' query and not in the 'active in past 6 months' query as well?
I solve this problem this way
Let us consider you have following table. You might have more columns but for the result you want, we only require customer_id and date they bought something on.
CREATE TABLE [dbo].[customer_invoice](
[id] [int] IDENTITY(1,1) NOT NULL,
[customer_id] [int] NULL,
[date] [date] NULL,
CONSTRAINT [PK_customer_invoice] PRIMARY KEY([id]));
I created this sample data on this table
INSERT INTO [dbo].[customer_invoice]
([customer_id]
,[date])
VALUES
(1,convert(date,'2019-12-01')),
(2,convert(date,'2019-11-05')),
(2,convert(date,'2019-8-01')),
(3,convert(date,'2019-7-01')),
(4,convert(date,'2019-4-01'));
Let's not jump straight to the final solution, but take one step at a time.
SELECT customer_id, MIN(DATEDIFF(DAY,date,GETDATE())) AS lastActiveDays
FROM customer_invoice GROUP BY customer_id;
The above query gives you the number of days since each customer was last active:
customer_id lastActiveDays
1 15
2 41
3 168
4 259
Now we will use this query as a subquery and add a new column, ActiveWithinCategory, so that in a later step we can group the data by that column.
SELECT customer_id, lastActiveDays,
CASE WHEN lastActiveDays<90 THEN 'active within 3 months'
WHEN lastActiveDays<180 THEN 'active within 6 months'
ELSE 'not active' END AS ActiveWithinCategory
FROM(
SELECT customer_id, MIN(DATEDIFF(DAY,date,GETDATE())) AS lastActiveDays
FROM customer_invoice GROUP BY customer_id
)AS temptable;
This query gives you the following result:
customer_id lastActiveDays ActiveWithinCategory
1 15 active within 3 months
2 41 active within 3 months
3 168 active within 6 months
4 259 not active
Now use the whole query above as a subquery and group the data by ActiveWithinCategory:
SELECT ActiveWithinCategory, COUNT(*) AS NumberofCustomers FROM (
SELECT customer_id, lastActiveDays,
CASE WHEN lastActiveDays<90 THEN 'active within 3 months'
WHEN lastActiveDays<180 THEN 'active within 6 months'
ELSE 'not active' END AS ActiveWithinCategory
FROM(
SELECT customer_id, MIN(DATEDIFF(DAY,date,GETDATE())) AS lastActiveDays
FROM customer_invoice GROUP BY customer_id
)AS temptable
) AS FinalResult GROUP BY ActiveWithinCategory;
And here is your final result:
ActiveWithinCategory NumberofCustomers
active within 3 months 2
active within 6 months 1
not active 1
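The three steps above can be checked end to end with a small sketch in Python's sqlite3. Assumptions: julianday() differences replace DATEDIFF, the date column is renamed invoice_date, dates are zero-padded ISO strings, and "today" is fixed at 2019-12-16, the date with which the sample lastActiveDays values (15, 41, 168, 259) are consistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_invoice (customer_id INTEGER, invoice_date TEXT)")
conn.executemany(
    "INSERT INTO customer_invoice VALUES (?, ?)",
    [
        (1, "2019-12-01"),
        (2, "2019-11-05"),
        (2, "2019-08-01"),
        (3, "2019-07-01"),
        (4, "2019-04-01"),
    ],
)

# Innermost query: days since each customer's most recent invoice.
# Middle query: map that number to exactly one category per customer.
# Outer query: count customers per category.
rows = conn.execute(
    """
    SELECT category, COUNT(*) AS number_of_customers
    FROM (
        SELECT customer_id,
               CASE WHEN last_active_days < 90 THEN 'active within 3 months'
                    WHEN last_active_days < 180 THEN 'active within 6 months'
                    ELSE 'not active' END AS category
        FROM (
            SELECT customer_id,
                   MIN(julianday(:today) - julianday(invoice_date)) AS last_active_days
            FROM customer_invoice
            GROUP BY customer_id
        )
    )
    GROUP BY category
    ORDER BY category
    """,
    {"today": "2019-12-16"},
).fetchall()

counts = dict(rows)
print(counts)
```

Because each customer is reduced to a single most-recent-activity value before bucketing, customer 2 is counted only once, in the 3-month bucket, even though they also bought earlier.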
If you want to achieve the same thing in a MySQL database,
here is the final query:
SELECT ActiveWithinCategory, count(*) NumberofCustomers FROM(
SELECT MIN(DATEDIFF(curdate(),date)) AS lastActiveBefore,
IF(MIN(DATEDIFF(curdate(),date))<90,
'active within 3 months',
IF(MIN(DATEDIFF(curdate(),date))<180,'active within 6 months','not active')
) ActiveWithinCategory
FROM customer_invoice GROUP BY customer_id
) AS FinalResult GROUP BY ActiveWithinCategory;
I suspect that you want to do conditional aggregation here:
SELECT
CustomerNo,
COUNT(CASE WHEN InvoiceDate > GETDATE() - 90 THEN 1 END) AS cnt_last_3,
COUNT(CASE WHEN InvoiceDate > GETDATE() - 180 AND InvoiceDate < GETDATE() - 90
THEN 1 END) AS cnt_first_3
FROM yourTable
GROUP BY
CustomerNo;
Here cnt_last_3 is the count over the immediate past 3 months, and cnt_first_3 is the count from the 3 month period starting 6 months ago and ending 3 months ago.
If you want distinct counts, you can add DISTINCT like this:
SELECT
COUNT(DISTINCT CASE WHEN dt BETWEEN GETDATE() - 90 AND GETDATE() THEN id ELSE NULL END) AS cnt_3_months,
COUNT(DISTINCT CASE WHEN dt BETWEEN GETDATE() - 180 AND GETDATE() - 90 THEN id ELSE NULL END) AS cnt_6_months
FROM a

Need query for counting for 2 columns at a time

I have a table named "Orders"
It has following fields:
OrderID, OrderDate, ..... ,City, StatusID.
I want this result as return:
City No. of Delivered Orders, No. of Pending (Not Delivered)
-------------------------------------------------------------------
London 3 4
Paris 5 6
New York 7 8
Since we have only one field to track the delivery status that is StatusID, so I am facing difficulty in order to count for two conditions at a time..
Thanx in Advance :)
select City,
sum(case when StatusID = 'delivered' then 1 else 0 end) as [No. of Delivered Orders],
sum(case when StatusID = 'not_delivered' then 1 else 0 end) as [No. of Pending]
from Orders
group by City

Why some dates give worse performance than other in MS SQL Server

I have a query in MS SQL Server asking for name and some date-related information, depending on two dates, a start- and an enddate.
The problem is that I'm not always getting the same performance. Whenever I request something between the dates
2010-07-01 00:00:00.000 and
2011-07-21 23:59:59.999
the performance is excellent; I get my result within milliseconds. When I request something between these dates, for example,
2011-07-01 00:00:00.000 and
2011-07-21 23:59:59.999
the performance is poor, taking 20-28 seconds per query. Note that the dates giving good performance span more than a year, while the latter span only about 20 days.
Is there any particular reason (maybe related to how DATETIME work) for this?
EDIT: The query,
SELECT ENAME,
SUM(CASE DATE WHEN 0 THEN 1 ELSE 0 END) AS U2,
SUM(CASE DATE WHEN 1 THEN 1 ELSE 0 END) AS B_2_4,
SUM(CASE DATE WHEN 2 THEN 1 ELSE 0 END) AS B_4_8,
SUM(CASE DATE WHEN 3 THEN 1 ELSE 0 END) AS B_8_16,
SUM(CASE DATE WHEN 4 THEN 1 ELSE 0 END) AS B_16_24,
SUM(CASE DATE WHEN 5 THEN 1 ELSE 0 END) AS B_24_48,
SUM(CASE DATE WHEN 6 THEN 1 ELSE 0 END) AS O_48,
SUM(CASE DATE WHEN 7 THEN 1 ELSE 0 END) AS status,
AVG(AVG) AS AVG,
SUM(DATE) AS TOTAL
FROM
(SELECT ENAME,
(CASE
WHEN status = 'Öppet' THEN 7
WHEN DATE < 48 THEN
(CASE WHEN DATE BETWEEN 0 AND 2 THEN 0
WHEN DATE BETWEEN 2 AND 4 THEN 1
WHEN DATE BETWEEN 4 AND 8 THEN 2
WHEN DATE BETWEEN 8 AND 16 THEN 3
WHEN DATE BETWEEN 16 AND 24 THEN 4
WHEN DATE BETWEEN 24 AND 48 THEN 5
ELSE - 1 END)
ELSE 6 END) AS DATE,
DATE AS AVG
FROM
(SELECT DATEDIFF(HOUR, cases.date, status.date) AS DATE,
extern.name AS ENAME,
status.status
FROM
cases INNER JOIN
status ON cases.id = status.caseid
AND status.date =
(SELECT MAX(date) AS Expr1
FROM status AS status_1
WHERE (caseid = cases.id)
GROUP BY caseid) INNER JOIN
extern ON cases.owner = extern.id
WHERE (cases.org = 'Expert')
AND (cases.date BETWEEN '2009-01-15 09:48:25.633'
AND '2011-07-21 09:48:25.633'))
AS derivedtbl_1)
AS derivedtbl_2
GROUP BY ENAME
ORDER BY ENAME
(parts of) The tables:
Extern
-ID (->cases.owner)
-name
Cases
-Owner (->Extern.id)
-id (->status.caseid)
-date (case created at this date)
Status
-caseid (->cases.id)
-Status
-Date (can be multiple, MAX(status.date) gives us date when
status was last changed)
I suspect a statistics issue.
When you select only the most recent dates, these may not yet be represented in the statistics, because the threshold that triggers an automatic statistics update has not yet been reached.
See this blog post for an example.

Resources