SQL Server - Calculate AVG() using Joins

I have a cab transport application.
Each driver has trips, and each trip can have multiple customers (cab pooling) giving their feedback.
Now I want to get the IDs of those drivers who received at least 10 five-star (5*) ratings and whose five-star ratings make up more than 20% of the total ratings received from their customers.
Let's say a driver got a total of 40 feedbacks in the last 30 days, of which 16 are 5-star ratings. This driver meets both criteria (at least 10 5* ratings and more than 20% 5* ratings), so this driver's ID should be fetched.
SELECT TR.[DriverId]
,100.0 * AVG(CASE
WHEN FE.[Rating] = 5
THEN 1.0
ELSE 0
END) AS Percentage
FROM tblFeedback FE
LEFT OUTER JOIN tblTrip TR ON FE.TripId = TR.TripId
WHERE FE.DATE >= GETDATE() - 30
AND FE.Rating = 5
GROUP BY DriverId
HAVING COUNT(CASE
WHEN FE.[Rating] = 5
THEN DriverId
END) >= 10
AND 100 * AVG(CASE
WHEN FE.[Rating] = 5
THEN 1.0
ELSE 0
END) > 20
The above query is showing the Percentage as 100.000 for all the Drivers whose Id's are fetched, even those drivers whose total percentage is 18% are also fetched and their percentage is shown as 100%.
This query has screwed my report completely

Try this. You need to include all the ratings in order to calculate the percentage:
SELECT r.[DriverId], 100.0*r.five_stars/r.total_ratings AS Percentage
FROM (
SELECT TR.[DriverId],
SUM(CASE WHEN FE.Rating = 5 THEN 1 ELSE 0 END) AS five_stars,
COUNT(*) AS total_ratings
FROM tblFeedback FE
INNER JOIN tblTrip TR ON FE.TripId = TR.TripId
WHERE FE.DATE >= GETDATE() - 30
GROUP BY TR.DriverId) r
WHERE r.five_stars>=10
AND 100.0*r.five_stars/r.total_ratings>20.0;

I think the issue is in your WHERE clause. This line in particular:
AND FE.Rating = 5
This is forcing the tblFeedback table to only return records that have a five-star rating, and therefore, only the five-star ratings are used in the calculation. Try taking that line out and see if the calculations are any closer to what you expect.
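To see the effect both answers describe, here is a minimal sketch that runs the two variants side by side. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, made-up sample data, and leaves out the 30-day date filter to keep the result deterministic; all of these are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblTrip (TripId INTEGER PRIMARY KEY, DriverId INTEGER);
CREATE TABLE tblFeedback (TripId INTEGER, Rating INTEGER);
INSERT INTO tblTrip VALUES (1, 1), (2, 2);
""")
# Hypothetical data: driver 1 has 10 five-star ratings out of 40 (25%),
# driver 2 has 11 five-star ratings out of 61 (about 18%).
feedback = [(1, 5)] * 10 + [(1, 3)] * 30 + [(2, 5)] * 11 + [(2, 4)] * 50
conn.executemany("INSERT INTO tblFeedback VALUES (?, ?)", feedback)

PCT = "100.0 * AVG(CASE WHEN FE.Rating = 5 THEN 1.0 ELSE 0 END)"
BASE = "FROM tblFeedback FE JOIN tblTrip TR ON FE.TripId = TR.TripId"

# Variant 1: WHERE keeps only 5-star rows, so the average is taken over 5s only.
filtered_rows = conn.execute(
    f"SELECT TR.DriverId, {PCT} {BASE} WHERE FE.Rating = 5 "
    "GROUP BY TR.DriverId ORDER BY TR.DriverId").fetchall()

# Variant 2: aggregate over all ratings and filter the groups in HAVING.
fixed_rows = conn.execute(
    f"SELECT TR.DriverId, {PCT} {BASE} GROUP BY TR.DriverId "
    f"HAVING SUM(CASE WHEN FE.Rating = 5 THEN 1 ELSE 0 END) >= 10 AND {PCT} > 20 "
    "ORDER BY TR.DriverId").fetchall()

print(filtered_rows)  # [(1, 100.0), (2, 100.0)]
print(fixed_rows)     # [(1, 25.0)]
```

With the row filter in WHERE, both drivers show 100%, including the one who is really at about 18%. Moving the condition into HAVING keeps the denominator intact.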

Related

SQL: How to count distinct for all hourly periods in a day

I have a table of hotel data like this:
Room_ID|Check_in_time      |Check_out_time
123    |2021-10-01 01:02:03|2021-10-01 02:03:04
I would like to count how many rooms were checked in during each hour throughout a day (even if the room was checked in for only 1 minute during the hour, it still counts), so the output would look like this:
Time period|Number of rooms
09:00-10:00|10
10:00-11:00|12
..         |..
There are a couple of other 'where' conditions but this is the crux of the problem. I have so far managed to write a query that can count unique room ID by specifying the hourly window:
select count (distinct room_id)
from data
where check_out_time > 9am and check_in_time < 10am
But how do I do this for each of the 24 hourly windows without repeating the same query 24 times? Hopefully something that can be later adapted into half hour intervals, or even minutes. I'm using Sigma in case that matters. Thanks in advance!
In Snowflake, I'd leverage a DATE_TRUNC function. If your dataset is very large, this will likely perform much better than any of the BETWEEN type of filtering that the OP and other answers are using.
select date_trunc('hour',check_out_time) as check_out_hour
, count (distinct room_id) as cnt
from data
group by 1;
If you needed to parse it out by day and time, you could add that, as well:
select date_trunc('day',check_out_time) as check_out_day
, date_trunc('hour',check_out_time) as check_out_hour
, count (distinct room_id) as cnt
from data
group by 1,2;
For reference:
https://docs.snowflake.com/en/sql-reference/functions/date_trunc.html
You may try the following:
A recursive CTE is used to generate the possible hours 0-23. (We could also have selected the distinct hours from your existing dataset, but I did not want to assume that every hour was booked, and generating the hours may be a less expensive operation in this case.) A left join is then used to determine the hours in which rooms were booked, before aggregating and counting the number of bookings in each hour.
WITH recursive hours(hr) as (
select 0 as hr
union all
select hr + 1 from hours where hr < 23
)
select
concat(h.hr,':00-',(h.hr+1),':00') as time_period,
COUNT(DISTINCT r.room_id) as no_rooms
from hours h
left join room_times r on (
CAST(r.check_in_time AS DATE) = CAST(r.check_out_time AS DATE) AND
h.hr BETWEEN DATE_PART(hour,r.Check_in_time) AND DATE_PART(hour,r.Check_out_time)
) OR
(
CAST(r.check_in_time AS DATE) < CAST(r.check_out_time AS DATE) AND
(
h.hr >= DATE_PART(hour, r.Check_in_time) OR
h.hr <= DATE_PART(hour,r.Check_out_time)
)
)
GROUP BY h.hr
order by h.hr
See working db fiddle (using sql server instead) with the same logic and additional data and outputs to assist verification here.
Sample Data:
INSERT INTO room_times
(Room_ID, Check_in_time, Check_out_time)
VALUES
('123', '2021-10-01 01:02:03', '2021-10-01 03:03:04'),
('124', '2021-10-01 15:02:03', '2021-10-02 01:03:04');
Outputs:
time_period|no_rooms
0:00-1:00  |1
1:00-2:00  |2
2:00-3:00  |1
3:00-4:00  |1
4:00-5:00  |0
5:00-6:00  |0
6:00-7:00  |0
7:00-8:00  |0
8:00-9:00  |0
9:00-10:00 |0
10:00-11:00|0
11:00-12:00|0
12:00-13:00|0
13:00-14:00|0
14:00-15:00|0
15:00-16:00|1
16:00-17:00|1
17:00-18:00|1
18:00-19:00|1
19:00-20:00|1
20:00-21:00|1
21:00-22:00|1
22:00-23:00|1
23:00-24:00|1
Let me know if this works for you.
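For comparison, here is a small sketch of the same "generate the hours, then left join" idea, run through Python's sqlite3 module rather than Snowflake. It differs from the query above in one respect that is my own assumption: it builds real start and end timestamps for each window of one chosen day (the `:day` parameter and the `windows` CTE are hypothetical names) and uses the asker's overlap test, so a stay that runs past midnight is only counted in the hours of the chosen day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE room_times (room_id TEXT, check_in_time TEXT, check_out_time TEXT);
INSERT INTO room_times VALUES
  ('123', '2021-10-01 01:02:03', '2021-10-01 03:03:04'),
  ('124', '2021-10-01 15:02:03', '2021-10-02 01:03:04');
""")

query = """
WITH RECURSIVE hours(hr) AS (
  SELECT 0 UNION ALL SELECT hr + 1 FROM hours WHERE hr < 23
),
windows AS (
  -- one row per hourly window of the chosen day
  SELECT hr,
         datetime(:day, '+' || hr || ' hours') AS win_start,
         datetime(:day, '+' || (hr + 1) || ' hours') AS win_end
  FROM hours
)
SELECT w.hr, COUNT(DISTINCT r.room_id) AS no_rooms
FROM windows w
LEFT JOIN room_times r
  -- a stay overlaps the window if it ends after the window starts
  -- and starts before the window ends
  ON r.check_out_time > w.win_start AND r.check_in_time < w.win_end
GROUP BY w.hr
ORDER BY w.hr
"""
rows = conn.execute(query, {"day": "2021-10-01"}).fetchall()
occupied_hours = [hr for hr, n in rows if n > 0]
print(occupied_hours)
```

Switching to half-hour windows would mainly mean generating 48 rows and adding minutes instead of hours in the `windows` step.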

Selecting same MSSQL table with different condition to get the difference

I have a table FinTrans As
Seq|Ledger|Debit_Credit|Amount
1 |130000|Debit |105
2 |120000|Debit |1456
3 |130000|Credit |500
4 |130000|Debit |9680
5 |130000|Credit |1432
6 |120000|Debit |1628
I want to find (sum of Debit Amount) - (sum of Credit Amount) for each ledger.
For example, in the above case for Ledger 130000:
the sum of Debit Amount = 105 + 9680 = 9785
the sum of Credit Amount = 500 + 1432 = 1932
Difference = 7853
How can I write a SQL query on the same table?
You can put a CASE expression inside an aggregate function. This is called conditional aggregation.
SELECT Ledger, SUM(Amount * CASE WHEN Debit_Credit = 'Credit' THEN -1 ELSE 1 END) As Difference
FROM FinTrans
-- WHERE Ledger = 130000 -- optional
GROUP BY Ledger
It works here because of the commutative property, which says you don't have to add up all the credits and debits separately to subtract one from the other; you can do all the additions and subtractions in any order and still end up with the same result.
If you really want to, you can do it this way:
SELECT Ledger,
SUM(CASE WHEN Debit_Credit = 'Debit' THEN Amount ELSE 0 END)
- SUM(CASE WHEN Debit_Credit = 'Credit' THEN Amount ELSE 0 END) As Difference
FROM FinTrans
-- WHERE Ledger = 130000 -- optional
GROUP BY Ledger
It more resembles the problem description. But it's more complicated and slower, and again, it's not needed.
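As a quick check that the two formulations agree, the sketch below loads the question's six rows into an in-memory SQLite database (used here only as a convenient stand-in for SQL Server) and runs both queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FinTrans (Seq INTEGER, Ledger INTEGER, Debit_Credit TEXT, Amount INTEGER)")
conn.executemany("INSERT INTO FinTrans VALUES (?, ?, ?, ?)", [
    (1, 130000, "Debit", 105),
    (2, 120000, "Debit", 1456),
    (3, 130000, "Credit", 500),
    (4, 130000, "Debit", 9680),
    (5, 130000, "Credit", 1432),
    (6, 120000, "Debit", 1628),
])

# Single SUM with a sign flip for credits.
signed = conn.execute("""
SELECT Ledger, SUM(Amount * CASE WHEN Debit_Credit = 'Credit' THEN -1 ELSE 1 END)
FROM FinTrans GROUP BY Ledger ORDER BY Ledger
""").fetchall()

# Two separate conditional SUMs, subtracted.
separate = conn.execute("""
SELECT Ledger,
       SUM(CASE WHEN Debit_Credit = 'Debit' THEN Amount ELSE 0 END)
     - SUM(CASE WHEN Debit_Credit = 'Credit' THEN Amount ELSE 0 END)
FROM FinTrans GROUP BY Ledger ORDER BY Ledger
""").fetchall()

print(signed)    # [(120000, 3084), (130000, 7853)]
print(separate)  # [(120000, 3084), (130000, 7853)]
```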

Sql server - Using aggregate functions in where clause

I am working on a SQL query for a transport business. When executed, the query should return information on drivers who received a five-star (5*) rating from their customers on more than 20% of their trips in the last 30 days. There should also be a minimum of 5 trips.
Let's say a driver completed 100 trips in the last 30 days and received 30 five-star (5*) ratings. This driver and all of his 5* trip information should then be retrieved by the query, because more than 20% of his trips are 5-star trips.
select tr.[TripId], tr.[DriverId], tr.[Rating], dr.[DriverName]
from tblTripInfo tr
left outer join tblDriver dr
on tr.[DriverId] = dr.[DriverId]
where tr.[Rating] = 5 and tr.[TripDate] >= GetDate() - 30
The above query gets all the trip and driver information for drivers who got 5* ratings in the last 30 days. I want to get only those drivers for whom at least 20% of total trips are 5* trips, with a minimum of 5 trips.
Initially I wanted to get only the DriverIds that met the above condition, and the query below worked:
select DriverId,
count(case when Rating = 5 then DriverId end) as TotalStars,
100.0 * avg(case when Rating = 5 then 1.0 else 0 end) as Average5Stars
from tblTripInfo
where TripDate >= GetDate() - 30
group by DriverId
having
count(case when Rating = 5 then DriverId end) > 10
and
100.0 * avg(case when Rating = 5 then 1.0 else 0 end) > 25
But now I want to get all the information, such as TripId, DriverName, and trip date, for those 5* trips as well.
You need something in the line of this:
WITH TotalTrips AS (
SELECT DriverId,
COUNT(*) AS TotalTrips
FROM tblTripInfo
GROUP BY DriverId
)
SELECT t1.DriverId,
COUNT(CASE WHEN t1.Rating = 5 THEN t1.DriverId END) AS Total5StarTrips,
100.0 * AVG(CASE WHEN t1.Rating = 5 THEN 1.0 ELSE 0 END) AS Average5Stars
FROM tblTripInfo t1
JOIN TotalTrips t2
ON t1.DriverId = t2.DriverId
AND t2.TotalTrips > 5 --more than 5 trips
WHERE t1.TripDate >= GETDATE() - 30
GROUP BY t1.DriverId, t2.TotalTrips
-- multiply by 1.0 to avoid integer division
HAVING 1.0 * COUNT(CASE WHEN t1.Rating = 5 THEN t1.DriverId END) / t2.TotalTrips > 0.2 --more than 20% 5-starred trips
No need of complicated logic if you can use some SubQuery for simplicity.
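One way to get from the qualifying DriverIds to the trip details the question asks for is to join the aggregate back to the trip and driver tables. The sketch below is only an illustration under several assumptions: it runs on SQLite through Python's sqlite3 module, uses invented sample data and a hypothetical CTE name `Qualified`, omits the 30-day filter, and reads "minimum of 5 trips" as at least 5 five-star trips.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblDriver (DriverId INTEGER PRIMARY KEY, DriverName TEXT);
CREATE TABLE tblTripInfo (TripId INTEGER PRIMARY KEY, DriverId INTEGER, Rating INTEGER);
INSERT INTO tblDriver VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Cleo');
""")
# Hypothetical trips: driver 1 has 6 of 10 five-star trips (60%),
# driver 2 has 5 of 30 (about 17%), driver 3 has 3 of 3 (too few trips).
ratings = [(1, 5)] * 6 + [(1, 3)] * 4 + [(2, 5)] * 5 + [(2, 4)] * 25 + [(3, 5)] * 3
conn.executemany(
    "INSERT INTO tblTripInfo (DriverId, Rating) VALUES (?, ?)", ratings)

rows = conn.execute("""
WITH Qualified AS (
  -- drivers passing both thresholds, computed over all of their trips
  SELECT DriverId
  FROM tblTripInfo
  GROUP BY DriverId
  HAVING SUM(CASE WHEN Rating = 5 THEN 1 ELSE 0 END) >= 5
     AND 100.0 * AVG(CASE WHEN Rating = 5 THEN 1.0 ELSE 0 END) > 20
)
SELECT tr.TripId, tr.DriverId, dr.DriverName
FROM tblTripInfo tr
JOIN Qualified q ON q.DriverId = tr.DriverId
JOIN tblDriver dr ON dr.DriverId = tr.DriverId
WHERE tr.Rating = 5          -- only now restrict to the 5* trips
ORDER BY tr.TripId
""").fetchall()
print(rows)
```

The row-level `Rating = 5` filter is applied only in the outer query, so it does not distort the percentage computed inside the CTE.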

Calculate Bounce Rate SQL Server 2008

I'm trying to calculate the Bounce Rate of pages in SQL Server in a table with Audit Data from Sharepoint.
ItemId UserId DocLocation Occurred
1 1 Home.aspx 2016-08-02 13:39:41
1 2 Home.aspx 2016-08-02 13:40:07
2 1 Other.aspx 2016-08-02 13:40:16
3 1 Items.aspx 2016-08-02 13:40:17
2 2 Other.aspx 2016-08-02 13:40:11
ItemId is the id of the page, DocLocation is the location of the page, and Occurred is when the user enters the page.
To calculate the bounce rate, we have to divide the number of bounces by the total number of visits.
A bounce happens when a user leaves the page in less than 5 seconds.
This should be the results for that table:
ItemId Bounces Visits BounceRate(Bounces/Visits)
1 1 2 0.5
2 1 2 0.5
3 0 1 0
I want to count a bounce by measuring how much time passes from when the user visits a page until the same user visits another page. If that time is less than 5 seconds, it is counted as a bounce.
I'm writing a stored procedure that executes the query to show the bounce rate of each page, but it doesn't work.
SELECT
SUM(CASE
WHEN (DATEDIFF(second, @Occurred,
(SELECT TOP 1 a.Occurred
FROM [AuditPages] a
WHERE a.UserId = @userId
AND a.Occurred > @occurred
ORDER BY a.Occurred ASC))) < 30
THEN 1.0
ELSE 0.0
END) / COUNT(@itemId)
Does anyone know how I can calculate this bounce rate?
Thanks for all the answers.
I like using row_number for this type of sequenced problem. The query below gives the desired result. I find performance with CTEs can sometimes be problematic with larger tables and you may need to convert to a temp table. You might consider using milliseconds if there is a chance you would want to use 4.5 seconds or such in the future.
declare @bounce_seconds int = 5;
with audit_cte as (
select *, ROW_NUMBER() over (partition by UserId order by Occurred) row_num
from AuditPages
--order by UserId,row_num
)
select a.ItemId, sum(a.bounce) Bounces, count(1) Visits, sum(a.bounce)/convert(float, count(1)) BounceRate
from (
select a1.ItemId, datediff(s,a1.Occurred, a2.Occurred) elapsed, case when datediff(s,a1.Occurred, a2.Occurred) < @bounce_seconds then 1 else 0 end bounce
from audit_cte a1
left join audit_cte a2
on a2.UserId = a1.UserId
and a2.row_num = a1.row_num + 1
--order by a1.UserId, a1.row_num
) a
group by a.ItemId
order by a.ItemId;
SELECT ItemId,COUNT(1) VISITS,SUM(BOUNCE_IND) BOUNCE, cast(SUM(BOUNCE_IND) as decimal(5,2))/cast(COUNT(1) as decimal(5,2)) BOUNCE_RATE
FROM (
Select
UserID,
ItemID,
DocLocation,
Occurred as Entry_time,
Lead(Occurred,1) Over (Partition by Userid order by Occurred) Exit_time,
CASE WHEN DATEDIFF(ss,Occurred,Lead(Occurred,1) Over (Partition by Userid order by Occurred)) <= 5 THEN 1 ELSE 0 END BOUNCE_IND
FROM Web_Data_Sample
) TBL GROUP BY ItemId
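The sketch below replays the question's five sample rows to check the expected table. It is not either answer's exact query: it runs on SQLite via Python's sqlite3 module, finds each visit's next visit with a correlated subquery (a swapped-in technique that avoids window functions), and uses `strftime('%s', ...)` where SQL Server would use DATEDIFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AuditPages (ItemId INTEGER, UserId INTEGER, DocLocation TEXT, Occurred TEXT)")
conn.executemany("INSERT INTO AuditPages VALUES (?, ?, ?, ?)", [
    (1, 1, "Home.aspx", "2016-08-02 13:39:41"),
    (1, 2, "Home.aspx", "2016-08-02 13:40:07"),
    (2, 1, "Other.aspx", "2016-08-02 13:40:16"),
    (3, 1, "Items.aspx", "2016-08-02 13:40:17"),
    (2, 2, "Other.aspx", "2016-08-02 13:40:11"),
])

rows = conn.execute("""
SELECT ItemId,
       SUM(bounce) AS Bounces,
       COUNT(*) AS Visits,
       1.0 * SUM(bounce) / COUNT(*) AS BounceRate
FROM (
  SELECT a.ItemId,
         -- seconds until the same user's next visit; NULL (no next visit)
         -- fails the comparison and falls through to ELSE 0
         CASE WHEN strftime('%s', (SELECT MIN(b.Occurred)
                                   FROM AuditPages b
                                   WHERE b.UserId = a.UserId
                                     AND b.Occurred > a.Occurred))
                   - strftime('%s', a.Occurred) < 5
              THEN 1 ELSE 0 END AS bounce
  FROM AuditPages a
) v
GROUP BY ItemId
ORDER BY ItemId
""").fetchall()
print(rows)  # [(1, 1, 2, 0.5), (2, 1, 2, 0.5), (3, 0, 1, 0.0)]
```

A user's last recorded visit has no following visit, so it is never counted as a bounce here; whether that matches the intended definition is a judgment call.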

sql cross join - what use has anyone found for it? [closed]

Today, for the first time in 10 years of development with sql server I used a cross join in a production query. I needed to pad a result set to a report and found that a cross join between two tables with a creative where clause was a good solution. I was wondering what use has anyone found in production code for the cross join?
Update: the code posted by Tony Andrews is very close to what I used the cross join for. Believe me, I understand the implications of using a cross join and would not do so lightly. I was excited to have finally used it (I'm such a nerd) - sort of like the time I first used a full outer join.
Thanks to everyone for the answers! Here's how I used the cross join:
SELECT CLASS, [Trans-Date] as Trans_Date,
SUM(CASE TRANS
WHEN 'SCR' THEN [Std-Labor-Value]
WHEN 'S+' THEN [Std-Labor-Value]
WHEN 'S-' THEN [Std-Labor-Value]
WHEN 'SAL' THEN [Std-Labor-Value]
WHEN 'OUT' THEN [Std-Labor-Value]
ELSE 0
END) AS [LABOR SCRAP],
SUM(CASE TRANS
WHEN 'SCR' THEN [Std-Material-Value]
WHEN 'S+' THEN [Std-Material-Value]
WHEN 'S-' THEN [Std-Material-Value]
WHEN 'SAL' THEN [Std-Material-Value]
ELSE 0
END) AS [MATERIAL SCRAP],
SUM(CASE TRANS WHEN 'RWK' THEN [Act-Labor-Value] ELSE 0 END) AS [LABOR REWORK],
SUM(CASE TRANS
WHEN 'PRD' THEN [Act-Labor-Value]
WHEN 'TRN' THEN [Act-Labor-Value]
WHEN 'RWK' THEN [Act-Labor-Value]
ELSE 0
END) AS [ACTUAL LABOR],
SUM(CASE TRANS
WHEN 'PRD' THEN [Std-Labor-Value]
WHEN 'TRN' THEN [Std-Labor-Value]
ELSE 0
END) AS [STANDARD LABOR],
SUM(CASE TRANS
WHEN 'PRD' THEN [Act-Labor-Value] - [Std-Labor-Value]
WHEN 'TRN' THEN [Act-Labor-Value] - [Std-Labor-Value]
--WHEN 'RWK' THEN [Act-Labor-Value]
ELSE 0 END) -- - SUM([Std-Labor-Value]) -- - SUM(CASE TRANS WHEN 'RWK' THEN [Act-Labor-Value] ELSE 0 END)
AS [LABOR VARIANCE]
FROM v_Labor_Dist_Detail
where [Trans-Date] between @startdate and @enddate
--and CLASS = (CASE @class WHEN '~ALL' THEN CLASS ELSE @class END)
GROUP BY [Trans-Date], CLASS
UNION --REL 2/6/09 Pad result set with any missing dates for each class.
select distinct [Description] as class, cast([Date] as datetime) as [Trans-Date], 0,0,0,0,0,0
FROM Calendar_To_Fiscal cross join PRMS.Product_Class
where cast([Date] as datetime) between @startdate and @enddate and
not exists (select class FROM v_Labor_Dist_Detail vl where [Trans-Date] between @startdate and @enddate
and vl.[Trans-Date] = cast(Calendar_To_Fiscal.[Date] as datetime)
and vl.class= PRMS.Product_Class.[Description]
GROUP BY [Trans-Date], CLASS)
order by [Trans-Date], CLASS
A typical legitimate use of a cross join would be a report that shows e.g. total sales by product and region. If no sales were made of product P in region R then we want to see a row with a zero, rather than just not showing a row.
select r.region_name, p.product_name, sum(s.sales_amount)
from regions r
cross join products p
left outer join sales s on s.region_id = r.region_id
and s.product_id = p.product_id
group by r.region_name, p.product_name
order by r.region_name, p.product_name;
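A small runnable sketch of this pattern follows, using SQLite through Python's sqlite3 module and invented sample rows. One detail it adds as an assumption about the intended output: a plain SUM over a combination with no matching sales yields NULL, so the sketch wraps it in COALESCE to actually display a zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE regions (region_id INTEGER, region_name TEXT);
CREATE TABLE products (product_id INTEGER, product_name TEXT);
CREATE TABLE sales (region_id INTEGER, product_id INTEGER, sales_amount INTEGER);
INSERT INTO regions VALUES (1, 'North'), (2, 'South');
INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
-- hypothetical sales: nothing for North/Gadget or South/Widget
INSERT INTO sales VALUES (1, 1, 100), (1, 1, 50), (2, 2, 70);
""")

rows = conn.execute("""
SELECT r.region_name, p.product_name, COALESCE(SUM(s.sales_amount), 0) AS total
FROM regions r
CROSS JOIN products p              -- every region/product combination
LEFT OUTER JOIN sales s
  ON s.region_id = r.region_id AND s.product_id = p.product_id
GROUP BY r.region_name, p.product_name
ORDER BY r.region_name, p.product_name
""").fetchall()
for row in rows:
    print(row)
```

Every region/product pair appears exactly once, with 0 for the pairs that had no sales.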
One use I've come across a lot is splitting records out into several records, mainly for reporting purposes.
Imagine a string where each character represents some event during the corresponding hour.
ID | Hourly Event Data
1 | -----X-------X-------X--
2 | ---X-----X------X-------
3 | -----X---X--X-----------
4 | ----------------X--X-X--
5 | ---X--------X-------X---
6 | -------X-------X-----X--
Now you want a report which shows how many events happened in each hour. Cross join the table with a table of IDs 1 to 24, then work your magic...
SELECT
[hours].id,
SUM(CASE WHEN SUBSTRING([data].string, [hour].id, 1) = 'X' THEN 1 ELSE 0 END)
FROM
[data]
CROSS JOIN
[hours]
GROUP BY
[hours].id
=>
1, 0
2, 0
3, 0
4, 2
5, 0
6, 2
7, 0
8, 1
9, 0
10, 2
11, 0
12, 0
13, 2
14, 1
15, 0
16, 1
17, 2
18, 0
19, 0
20, 1
21, 1
22, 3
23, 0
24, 0
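The listing above can be reproduced with a short sketch. It runs the same cross join on SQLite via Python's sqlite3 module, so `substr` stands in for SQL Server's SUBSTRING; the table and column names follow the query, and the rows are the six strings from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (id INTEGER, string TEXT);
CREATE TABLE hours (id INTEGER);
""")
conn.executemany("INSERT INTO data VALUES (?, ?)", [
    (1, "-----X-------X-------X--"),
    (2, "---X-----X------X-------"),
    (3, "-----X---X--X-----------"),
    (4, "----------------X--X-X--"),
    (5, "---X--------X-------X---"),
    (6, "-------X-------X-----X--"),
])
conn.executemany("INSERT INTO hours VALUES (?)", [(h,) for h in range(1, 25)])

rows = conn.execute("""
SELECT hours.id,
       SUM(CASE WHEN substr(data.string, hours.id, 1) = 'X' THEN 1 ELSE 0 END)
FROM data
CROSS JOIN hours                  -- 6 strings x 24 hours = 144 rows
GROUP BY hours.id
ORDER BY hours.id
""").fetchall()

# Keep only the hours that had at least one event, for a compact printout.
events_per_hour = {hour: n for hour, n in rows if n > 0}
print(events_per_hour)
```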
I have different reports that prefilter the recordset (by various lines of business within the firm), but there were calculations that required percentages of revenue firm-wide. The recordsource had to contain the firm total instead of relying on calculating the overall sum in the report itself.
Example: The recordset has balances for each client and the Line of Business the client's revenue comes from. The report may only show 'retail' clients. There is no way to get a sum of the balances for the entire firm, but the report shows the percentage of the firm's revenue.
Since there are different balance fields, I felt it was less complicated to have full join with the view that has several balances (I can also reuse this view of firm totals) instead of multiple fields made up sub queries.
Another one is an update statement where multiple records needed to be created (one record for each step in a preset workflow process).
Here's one, where the CROSS JOIN substitutes for an INNER JOIN. This is useful and legitimate when there are no identical values between two tables on which to join. For example, suppose you have a table that contains version 1, version 2 and version 3 of some statement or company document, all saved in a SQL Server table so that you can recreate a document that is associated with an order, on the fly, long after the order, and long after your document was rewritten into a new version. But only one of the two tables you need to join (the Documents table) has a VersionID column. Here is a way to do this:
SELECT DocumentText, VersionID =
(
SELECT d.VersionID
FROM Documents d
CROSS JOIN Orders o
WHERE o.DateOrdered BETWEEN d.EffectiveStart AND d.EffectiveEnd
)
FROM Documents
I've used a CROSS JOIN recently in a report that we use for sales forecasting. The report needs to break out the amount of sales that a salesperson has made in each General Ledger account.
So in the report I do something to this effect:
SELECT gla.AccountN, s.SalespersonN
FROM
GLAccounts gla
CROSS JOIN Salesperson s
WHERE (gla.SalesAnalysis = 1 OR gla.AccountN = 47500)
This gives me every GL account for every sales person like:
SalesPsn AccountN
1000 40100
1000 40200
1000 40300
1000 48150
1000 49980
1000 49990
1005 40100
1005 40200
1005 40300
1054 48150
1054 49980
1054 49990
1078 40100
1078 40200
1078 40300
1078 48150
1078 49980
1078 49990
1081 40100
1081 40200
1081 40300
1081 48150
1081 49980
1081 49990
1188 40100
1188 40200
1188 40300
1188 48150
1188 49980
1188 49990
For charting (reports) where every grouping must have a record even if it is zero.
(e.g. RadCharts)
I had combinations of an insolvency field from my source data.
There are 5 distinct types, but the data had combinations of 2 of these. So I created a lookup table of the 5 distinct values, then used a cross join in an insert statement to fill out the rest, like so:
insert into LK_Insolvency (code,value)
select a.code+b.code, a.value+' '+b.value
from LK_Insolvency a
cross join LK_Insolvency b
where a.code <> b.code -- this makes sure the cross product of a value with itself is not used, as this does not appear in the source data
I personally try to avoid Cartesian products in my queries. I suppose having a result set of every combination of your join could be useful, but usually if I end up with one, I know I have something wrong.
