Calculate Bounce Rate SQL Server 2008 - sql-server

I'm trying to calculate the bounce rate of pages in SQL Server from a table of audit data from SharePoint.
ItemId UserId DocLocation Occurred
1 1 Home.aspx 2016-08-02 13:39:41
1 2 Home.aspx 2016-08-02 13:40:07
2 1 Other.aspx 2016-08-02 13:40:16
3 1 Items.aspx 2016-08-02 13:40:17
2 2 Other.aspx 2016-08-02 13:40:11
ItemId is the id of the page, DocLocation is the location of the page, and Occurred is when the user opened the page.
To calculate the bounce rate, we divide the number of bounces by the total number of visits.
A bounce happens when a user leaves the page in less than 5 seconds.
This should be the results for that table:
ItemId Bounces Visits BounceRate(Bounces/Visits)
1 1 2 0.5
2 1 2 0.5
3 0 1 0
I want to detect a bounce by measuring how much time passes between the user's visit to a page and the same user's next visit to another page. If that time is less than 5 seconds, it counts as a bounce.
I'm writing a stored procedure that runs a query to show the bounce rate of each page, but it doesn't work.
SELECT
    SUM(CASE
            WHEN (DATEDIFF(second, @Occurred,
                    (SELECT TOP 1 a.Occurred
                     FROM [AuditPages] a
                     WHERE a.UserId = @userId
                       AND a.Occurred > @occurred
                     ORDER BY a.Occurred ASC))) < 30
            THEN 1.0
            ELSE 0.0
        END) / COUNT(@itemId)
Does anyone know how I can calculate this bounce rate?
Thanks for all the answers.

I like using row_number for this type of sequenced problem. The query below gives the desired result. I find performance with CTEs can sometimes be problematic with larger tables and you may need to convert to a temp table. You might consider using milliseconds if there is a chance you would want to use 4.5 seconds or such in the future.
declare @bounce_seconds int = 5;
with audit_cte as (
    select *, ROW_NUMBER() over (partition by UserId order by Occurred) row_num
    from AuditPages
    --order by UserId,row_num
)
select a.ItemId, sum(a.bounce) Bounces, count(1) Visits, sum(a.bounce)/convert(float, count(1)) BounceRate
from (
    select a1.ItemId, datediff(s,a1.Occurred, a2.Occurred) elapsed,
           case when datediff(s,a1.Occurred, a2.Occurred) < @bounce_seconds then 1 else 0 end bounce
    from audit_cte a1
    left join audit_cte a2
        on a2.UserId = a1.UserId
        and a2.row_num = a1.row_num + 1
    --order by a1.UserId, a1.row_num
) a
group by a.ItemId
order by a.ItemId;

-- Note: LEAD requires SQL Server 2012 or later; it is not available in SQL Server 2008.
SELECT ItemId, COUNT(1) VISITS, SUM(BOUNCE_IND) BOUNCE,
       cast(SUM(BOUNCE_IND) as decimal(5,2)) / cast(COUNT(1) as decimal(5,2)) BOUNCE_RATE
FROM (
    Select
        UserID,
        ItemID,
        DocLocation,
        Occurred as Entry_time,
        Lead(Occurred,1) Over (Partition by Userid order by Occurred) Exit_time,
        -- a bounce is a gap of less than 5 seconds before the same user's next visit
        CASE WHEN DATEDIFF(ss,Occurred,Lead(Occurred,1) Over (Partition by Userid order by Occurred)) < 5 THEN 1 ELSE 0 END BOUNCE_IND
    FROM Web_Data_Sample
) TBL GROUP BY ItemId
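To check the approach against the expected table in the question, here is a minimal sketch that runs the ROW_NUMBER self-join idea on the question's five sample rows. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the date arithmetic (strftime instead of DATEDIFF) is SQLite syntax, and the names bounce_rates and max_gap are made up for this illustration.

```python
import sqlite3

# Sample rows copied from the question: (ItemId, UserId, DocLocation, Occurred).
ROWS = [
    (1, 1, "Home.aspx", "2016-08-02 13:39:41"),
    (1, 2, "Home.aspx", "2016-08-02 13:40:07"),
    (2, 1, "Other.aspx", "2016-08-02 13:40:16"),
    (3, 1, "Items.aspx", "2016-08-02 13:40:17"),
    (2, 2, "Other.aspx", "2016-08-02 13:40:11"),
]

# Number each user's visits in time order, then join every visit to that
# user's next visit. A visit with no next visit never counts as a bounce.
QUERY = """
WITH visits AS (
    SELECT ItemId, UserId, Occurred,
           ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Occurred) AS rn
    FROM AuditPages
)
SELECT v.ItemId,
       SUM(CASE WHEN CAST(strftime('%s', n.Occurred) AS INTEGER)
                   - CAST(strftime('%s', v.Occurred) AS INTEGER) < :max_gap
                THEN 1 ELSE 0 END) AS Bounces,
       COUNT(*) AS Visits
FROM visits v
LEFT JOIN visits n ON n.UserId = v.UserId AND n.rn = v.rn + 1
GROUP BY v.ItemId
ORDER BY v.ItemId
"""


def bounce_rates(rows, max_gap=5):
    """Return (ItemId, Bounces, Visits, BounceRate) per page."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE AuditPages "
        "(ItemId INTEGER, UserId INTEGER, DocLocation TEXT, Occurred TEXT)"
    )
    conn.executemany("INSERT INTO AuditPages VALUES (?, ?, ?, ?)", rows)
    result = [
        (item, bounces, visits, bounces / visits)
        for item, bounces, visits in conn.execute(QUERY, {"max_gap": max_gap})
    ]
    conn.close()
    return result


if __name__ == "__main__":
    for row in bounce_rates(ROWS):
        print(row)
```

On the sample data this reproduces the table from the question: pages 1 and 2 each have one bounce in two visits, and page 3 has none.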

Related

Write Query That Consider Date Interval

I have a table that contains customer transactions.
I need to find the customers that had at least 2 transactions with Amount >= 20000 within three consecutive days in each month.
For example, if today is 2022/03/12, I should gather the transactions from 2022/02/13 to 2022/03/12, then check whether a customer had at least 2 transactions with Amount >= 20000 within three consecutive days.
For example, consider the table below:
Id  CustomerId  Transactiondate  Amount
1   1           2022-01-01       50000
2   2           2022-02-01       20000
3   3           2022-03-05       30000
4   3           2022-03-07       40000
5   2           2022-03-07       20000
6   4           2022-03-07       30000
7   4           2022-03-07       30000
The output should be: CustomerId = 3 and CustomerId = 4.
I wrote a query that finds the customers for a specific day, but I don't know how to find these customers over one month without using a loop.
The query for a specific day is:
With cte as (select customerid, amount, TransactionDate, Dateadd(day,-2,TransactionDate) as PrevDate
    From [Transaction]  -- Transaction is a reserved word, so it must be bracketed
    Where TransactionDate = '2022-03-12')
Select CustomerId, Count(*)
From Cte
Where
    TransactionDate >= Prevdate and TransactionDate <= TransactionDate
    And Amount >= 20000
Group By CustomerId
Having count(*) >= 2
Hi, there are many ways to achieve this.
I think the easiest (though maybe not the best for performance) is to use the LAG function:
WITH lagged_days AS (
    SELECT
        ISNULL(LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
               LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
        ,*
    FROM [Transaction]  -- reserved word, so it must be bracketed
), valid_cust_base as (
    SELECT
        *
    FROM lagged_days
    WHERE DATEPART(MONTH, lagged_dt) = DATEPART(MONTH, Transactiondate)
      -- ABS because lagged_dt may be earlier (LAG) or later (LEAD) than Transactiondate
      AND ABS(datediff(day, Transactiondate, lagged_dt)) <= 3
      AND Amount >= 20000
)
SELECT
    CustomerID
FROM valid_cust_base
GROUP BY CustomerID
HAVING COUNT(*) >= 2
First I created a lagged TransactionDate per customer (I assume that id is incremental). Then I selected only transactions within one month, with amount >= 20000, and where the date difference between transactions is less than 4 days. Then I select the customers who had more than 1 such transaction.
With LAG, the first value per customer is always missing, but you still need to be able to say that the 1st and 2nd transactions are within 3 days. That's why I replace the first NULL value with LEAD. It doesn't matter whether you use:
ISNULL(LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
OR
ISNULL(LEAD(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id),
LAG(Transactiondate) OVER(PARTITION BY CustomerID ORDER BY id)) lagged_dt
The main goal is to have, for each transaction, the closest TransactionDate of the same customer.
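As a cross-check on the expected output, here is a rough sketch of a different technique, a self-join instead of LAG: a customer qualifies if two distinct transactions of at least 20000 fall at most two days apart, i.e. within three consecutive days. It runs on Python's built-in sqlite3 as a stand-in, so julianday is SQLite syntax rather than T-SQL, and the table name Transactions, the function name, and the parameter names are assumptions made for the example.

```python
import sqlite3

# Sample rows from the question: (Id, CustomerId, TransactionDate, Amount).
ROWS = [
    (1, 1, "2022-01-01", 50000),
    (2, 2, "2022-02-01", 20000),
    (3, 3, "2022-03-05", 30000),
    (4, 3, "2022-03-07", 40000),
    (5, 2, "2022-03-07", 20000),
    (6, 4, "2022-03-07", 30000),
    (7, 4, "2022-03-07", 30000),
]

# Pair each qualifying transaction with a different qualifying transaction
# of the same customer that happens 0 to 2 days later.
QUERY = """
SELECT DISTINCT t1.CustomerId
FROM Transactions t1
JOIN Transactions t2
  ON t2.CustomerId = t1.CustomerId
 AND t2.Id <> t1.Id
 AND julianday(t2.TransactionDate) - julianday(t1.TransactionDate)
     BETWEEN 0 AND 2
WHERE t1.Amount >= :min_amount
  AND t2.Amount >= :min_amount
  AND t1.TransactionDate BETWEEN :first_day AND :last_day
  AND t2.TransactionDate BETWEEN :first_day AND :last_day
ORDER BY t1.CustomerId
"""


def qualifying_customers(rows, first_day, last_day, min_amount=20000):
    """Return the ids of customers with 2+ large transactions in 3 days."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Transactions "
        "(Id INTEGER, CustomerId INTEGER, TransactionDate TEXT, Amount INTEGER)"
    )
    conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?, ?)", rows)
    params = {
        "first_day": first_day,
        "last_day": last_day,
        "min_amount": min_amount,
    }
    result = [row[0] for row in conn.execute(QUERY, params)]
    conn.close()
    return result


if __name__ == "__main__":
    print(qualifying_customers(ROWS, "2022-02-13", "2022-03-12"))
```

For the window 2022-02-13 to 2022-03-12 this returns customers 3 and 4, matching the output asked for in the question. Unlike the LAG version, it compares every pair of transactions rather than only neighbours, which is simpler to reason about but may be slower on a large table.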

SQL: How to count distinct for all hourly periods in a day

I have a table of hotel data like this:
Room_ID  Check_in_time        Check_out_time
123      2021-10-01 01:02:03  2021-10-01 02:03:04
I would like to count how many rooms were checked in during each hour of a day (a room still counts even if it was checked in for only 1 minute during the hour), so the output would look like this:
Time period  Number of rooms
09:00-10:00  10
10:00-11:00  12
..           ..
There are a couple of other 'where' conditions but this is the crux of the problem. I have so far managed to write a query that can count unique room ID by specifying the hourly window:
select count (distinct room_id)
from data
where check_out_time > 9am and check_in_time < 10am
But how do I do this for each of the 24 hourly windows without repeating the same query 24 times? Hopefully something that can be later adapted into half hour intervals, or even minutes. I'm using Sigma in case that matters. Thanks in advance!
In Snowflake, I'd leverage a DATE_TRUNC function. If your dataset is very large, this will likely perform much better than any of the BETWEEN type of filtering that the OP and other answers are using.
select date_trunc('hour',check_out_time) as check_out_hour
, count (distinct room_id) as cnt
from data
group by 1;
If you needed to parse it out by day and time, you could add that, as well:
select date_trunc('day',check_out_time) as check_out_day
, date_trunc('hour',check_out_time) as check_out_hour
, count (distinct room_id) as cnt
from data
group by 1,2;
For reference:
https://docs.snowflake.com/en/sql-reference/functions/date_trunc.html
You may try the following:
A recursive CTE is used to generate the possible hours 0-23. (We could also have selected the distinct hours from your existing dataset, but I did not want to assume that every hour was booked, and generating the hours may be less expensive in this case.) A left join is then used to determine the hours in which rooms were booked, before aggregating and counting the number of bookings in each hour.
WITH recursive hours(hr) as (
select 0 as hr
union all
select hr + 1 from hours where hr < 23
)
select
concat(h.hr,':00-',(h.hr+1),':00') as time_period,
COUNT(DISTINCT r.room_id) as no_rooms
from hours h
left join room_times r on (
CAST(r.check_in_time AS DATE) = CAST(r.check_out_time AS DATE) AND
h.hr BETWEEN DATE_PART(hour,r.Check_in_time) AND DATE_PART(hour,r.Check_out_time)
) OR
(
CAST(r.check_in_time AS DATE) < CAST(r.check_out_time AS DATE) AND
(
h.hr >= DATE_PART(hour, r.Check_in_time) OR
h.hr <= DATE_PART(hour,r.Check_out_time)
)
)
GROUP BY h.hr
order by h.hr
See working db fiddle (using sql server instead) with the same logic and additional data and outputs to assist verification here.
Sample Data:
INSERT INTO room_times
(Room_ID, Check_in_time, Check_out_time)
VALUES
('123', '2021-10-01 01:02:03', '2021-10-01 03:03:04'),
('124', '2021-10-01 15:02:03', '2021-10-02 01:03:04');
Outputs:
time_period  no_rooms
0:00-1:00    1
1:00-2:00    2
2:00-3:00    1
3:00-4:00    1
4:00-5:00    0
5:00-6:00    0
6:00-7:00    0
7:00-8:00    0
8:00-9:00    0
9:00-10:00   0
10:00-11:00  0
11:00-12:00  0
12:00-13:00  0
13:00-14:00  0
14:00-15:00  0
15:00-16:00  1
16:00-17:00  1
17:00-18:00  1
18:00-19:00  1
19:00-20:00  1
20:00-21:00  1
21:00-22:00  1
22:00-23:00  1
23:00-24:00  1
Let me know if this works for you.
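For comparison, here is a hedged sketch of the same hour-generation idea using an interval-overlap test (check-in before the hour ends, check-out after the hour starts), restricted to one calendar day. It runs on Python's built-in sqlite3 rather than Snowflake, so the datetime modifiers are SQLite syntax, and rooms_per_hour and the :day parameter are names invented for this example. Because it looks at a single date, its counts differ from the output above, which folds all dates into one 24-hour clock.

```python
import sqlite3

# Sample rows from the answer above: (Room_ID, Check_in_time, Check_out_time).
ROWS = [
    ("123", "2021-10-01 01:02:03", "2021-10-01 03:03:04"),
    ("124", "2021-10-01 15:02:03", "2021-10-02 01:03:04"),
]

# Generate hours 0..23, then keep a room for an hour when its stay overlaps
# the window [day + hr hours, day + (hr + 1) hours).
QUERY = """
WITH RECURSIVE hours(hr) AS (
    SELECT 0
    UNION ALL
    SELECT hr + 1 FROM hours WHERE hr < 23
)
SELECT h.hr, COUNT(DISTINCT r.Room_ID) AS no_rooms
FROM hours h
LEFT JOIN room_times r
  ON r.Check_in_time  < datetime(:day, '+' || (h.hr + 1) || ' hours')
 AND r.Check_out_time > datetime(:day, '+' || h.hr || ' hours')
GROUP BY h.hr
ORDER BY h.hr
"""


def rooms_per_hour(rows, day):
    """Return 24 (hour, occupied_room_count) pairs for one calendar day."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE room_times "
        "(Room_ID TEXT, Check_in_time TEXT, Check_out_time TEXT)"
    )
    conn.executemany("INSERT INTO room_times VALUES (?, ?, ?)", rows)
    result = list(conn.execute(QUERY, {"day": day}))
    conn.close()
    return result


if __name__ == "__main__":
    for hr, n in rooms_per_hour(ROWS, "2021-10-01"):
        print(f"{hr}:00-{hr + 1}:00", n)
```

The overlap test also adapts to half-hour or minute windows: only the generated series and the two window bounds need to change.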

Cumulative Addition in SQL Server 2008

Sample data in tblData:
RowID SID Staken DateTaken
---------------------------------------------
1 1 1 2014-09-15 14:18:11.997
2 1 1 2014-09-16 14:18:11.997
3 1 1 2014-09-17 14:18:11.997
I would like to get the daywise count of SIDs and also a cumulative sum like
Date ThisDayCount TotalCount
-----------------------------------
2014-09-15 1 1
2014-09-16 10 11
2014-09-17 30 41
This is what I have now in my stored procedure with the start & end date parameters. Is there a more elegant way to do this?
;WITH TBL AS
(
    SELECT
        CONVERT(date, asu.DateTaken) AS Date,
        COUNT(*) AS 'ThisDayCount'
    FROM
        tblData asu
    WHERE
        asu.SID = 1
        AND asu.STaken = 1
        AND asu.DateTaken >= @StartDate
        AND asu.DateTaken <= @EndDate
    GROUP BY
        CONVERT(date, asu.DateTaken)
)
SELECT
    -- sum the counts of all days up to and including t1.Date
    t1.Date, t1.ThisDayCount, SUM(t2.ThisDayCount) AS 'TotalCount'
FROM
    TBL t1
INNER JOIN
    TBL t2 ON t1.date >= t2.date
GROUP BY
    t1.Date, t1.ThisDayCount
I am not aware of a more elegant way to do that, other than perhaps with a subquery for your running total. What you have is pretty elegant by T-SQL standards.
But, depending on how many records you have to process and what your indexes look like, this could be very slow. You don't say what the destination of this information is, but if it's any kind of report or web page, I'd consider doing the running total as part of the processing at the destination rather than in the database.
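To make the self-join running total concrete, here is a small sketch that runs the pattern on Python's built-in sqlite3 as a stand-in for SQL Server 2008 (date() replaces CONVERT(date, ...)). The first, second and fourth rows come from the question's sample; the other three rows are hypothetical, added only so that the daily counts differ. The function name daily_running_total is also made up.

```python
import sqlite3

# (RowID, SID, Staken, DateTaken). Rows 3, 5 and 6 are hypothetical extras.
ROWS = [
    (1, 1, 1, "2014-09-15 14:18:11.997"),
    (2, 1, 1, "2014-09-16 14:18:11.997"),
    (3, 1, 1, "2014-09-16 16:00:00.000"),
    (4, 1, 1, "2014-09-17 14:18:11.997"),
    (5, 1, 1, "2014-09-17 15:00:00.000"),
    (6, 1, 1, "2014-09-17 18:30:00.000"),
]

# Count rows per day, then join each day to itself and all earlier days
# and sum the joined (t2) counts to get the cumulative total.
QUERY = """
WITH daily AS (
    SELECT date(DateTaken) AS d, COUNT(*) AS c
    FROM tblData
    WHERE SID = 1 AND Staken = 1
    GROUP BY date(DateTaken)
)
SELECT t1.d, t1.c AS ThisDayCount, SUM(t2.c) AS TotalCount
FROM daily t1
JOIN daily t2 ON t2.d <= t1.d
GROUP BY t1.d, t1.c
ORDER BY t1.d
"""


def daily_running_total(rows):
    """Return (date, ThisDayCount, TotalCount) per day."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tblData "
        "(RowID INTEGER, SID INTEGER, Staken INTEGER, DateTaken TEXT)"
    )
    conn.executemany("INSERT INTO tblData VALUES (?, ?, ?, ?)", rows)
    result = list(conn.execute(QUERY))
    conn.close()
    return result


if __name__ == "__main__":
    for row in daily_running_total(ROWS):
        print(row)
```

The join compares every day with every earlier day, so the work grows roughly with the square of the number of days, which is the performance concern raised above.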

How to write this query without cursor in SQL Server 2008 R2?

I have this table ScoreDetails with 2 columns (there are more, but only 2 are needed for this query): ScoreDate and Score.
The structure is like
2012:03:27: 5:06:37:134 27
2012:03:27: 5:06:37:276 37
2012:03:28: 4:12:97:019 19
2012:03:29: 7:06:37:134 7
2012:03:29: 8:06:37:134 0
2012:04:03: 12:06:37:739 16
2012:04:04: 23:21:15:834 33
2012:04:04: 15:08:24:697 12
2012:04:06: 5:06:37:134 0
2012:04:09: 5:06:37:134 2
2012:04:13: 5:06:37:134 92
What I want is to write a select query, without using a temp table or cursor, that adds a column which starts from 1 and keeps increasing (2, 3 and so on) as long as the score is non-zero. As soon as a zero is encountered in the score column, the counter resets and starts again from 1 on the next row. Like this...
2012:03:27: 5:06:37:134 27 1
2012:03:27: 5:06:37:276 37 2
2012:03:28: 4:12:97:019 19 3
2012:03:29: 7:06:37:134 7 4
2012:03:29: 8:06:37:134 0 0
2012:04:03: 12:06:37:739 16 1
2012:04:04: 23:21:15:834 33 2
2012:04:04: 15:08:24:697 12 3
2012:04:06: 5:06:37:134 0 0
2012:04:09: 5:06:37:134 2 1
2012:04:13: 5:06:37:134 92 2
I am using SQL Server 2008 R2.
You can use common table expressions for that. I defined 2 anchor queries: one for records with 0 score and the other for the first record. Then you build up the result based on previous results until you find 0 score.
with cte
as
(
select ScoreDate, Score, ScoreRank, 0 as Value
from (select ScoreDate, Score, dense_rank() over (order by ScoreDate) ScoreRank
from ScoreDetails) X
where Score = 0
union all
select ScoreDate, Score, ScoreRank, 1 as Value
from (select ScoreDate, Score, dense_rank() over (order by ScoreDate) ScoreRank
from ScoreDetails) X
where Score <> 0 and ScoreRank = 1
union all
select X.ScoreDate, X.Score, X.ScoreRank, cte.Value + 1 as Value
from (select ScoreDate, Score, dense_rank() over (order by ScoreDate) ScoreRank
from ScoreDetails) X
inner join cte
on X.ScoreRank = cte.ScoreRank + 1
and X.Score <> 0
)
select ScoreDate, Score, Value, ScoreRank
from cte
order by ScoreDate
I won't spoil the fun of finding the solution yourself, but I will give you some hints on how to split the problem into smaller pieces:
1. Find all the records where the score is reset. Let's call this subquery the resetRecords.
2. Join the records of the original table to the resetRecords, such that every record has "its" reset record (i.e., the reset record that provides the base for its count).
3. Use ROW_NUMBER() OVER (PARTITION BY ... ) to assign the numbers.
Try to do this one step at a time. Beware: It won't be a simple query, so a solution with temp tables or cursors might be easier to understand and maintain.
Try something like this:
with x as (
select *, sum(case when Score=0 then 1 else 0 end) over(order by ScoreDate) as grp
from ScoreDetails
)
select ScoreDate, Score, row_number() over (partition by grp order by ScoreDate)
from x
order by ScoreDate
(as soon as a zero is encountered in score column, it resets to 1 and then start again, you said)
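Here is a small sketch of the grouping idea, adjusted so that a zero row shows 0 and the next row restarts at 1, as in the question's expected output. It runs on Python's built-in sqlite3 as a stand-in. Note that a running SUM ... OVER (ORDER BY ...) needs SQL Server 2012 or later, so on 2008 R2 the group number would have to be computed another way, for example with a correlated subquery. The Seq column is a hypothetical stand-in for ScoreDate, and reset_counter is a made-up name.

```python
import sqlite3

# Scores in the order shown in the question.
SCORES = [27, 37, 19, 7, 0, 16, 33, 12, 0, 2, 92]

# grp = number of zero scores seen so far (including the current row), so
# every zero starts a new group. Inside a group that starts with a zero,
# ROW_NUMBER is shifted down by one so the first non-zero row gets 1.
QUERY = """
WITH x AS (
    SELECT Seq, Score,
           SUM(CASE WHEN Score = 0 THEN 1 ELSE 0 END)
               OVER (ORDER BY Seq) AS grp
    FROM ScoreDetails
)
SELECT Score,
       CASE WHEN Score = 0 THEN 0
            ELSE ROW_NUMBER() OVER (PARTITION BY grp ORDER BY Seq)
                 - CASE WHEN grp > 0 THEN 1 ELSE 0 END
       END AS Value
FROM x
ORDER BY Seq
"""


def reset_counter(scores):
    """Return (Score, Value) pairs, where Value restarts after each zero."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ScoreDetails (Seq INTEGER, Score INTEGER)")
    conn.executemany(
        "INSERT INTO ScoreDetails VALUES (?, ?)",
        list(enumerate(scores, start=1)),
    )
    result = list(conn.execute(QUERY))
    conn.close()
    return result


if __name__ == "__main__":
    for score, value in reset_counter(SCORES):
        print(score, value)
```

On the question's scores this yields the counter column 1, 2, 3, 4, 0, 1, 2, 3, 0, 1, 2.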

Rolling a number from rows with a flag into the next row without the flag

I'm a bit stumped about how to solve this particular piece of a problem I'm working on. I started with a much bigger problem, but I managed to simplify it into this while keeping good performance intact.
Say I have the following result set. AggregateMe is something I'm deriving from SQL conditionals.
MinutesElapsed AggregateMe ID Type RowNumber
1480 1 1 A 1
1200 0 1 A 2
1300 0 1 B 3
1550 0 1 C 4
725 1 1 A 5
700 0 1 A 6
1900 1 2 A 7
3300 1 2 A 8
4900 0 2 A 9
If AggregateMe is 1 (true) or, if you prefer, if the underlying condition is true, I want the counts to be aggregated into the next row where AggregateMe (or the condition) does not evaluate to true.
Aggregate functions or Subqueries are fair game as is PARTITION BY.
For example, the above result set would become:
MinutesElapsed ID Type
2680 1 A
1300 1 B
1550 1 C
1425 1 A
10100 2 A
Is there a clean way to do this? If you want, I can share more about the original problem, but it is a bit more complicated.
Edited to add: SUM and GROUP BY alone won't work, because some sums would be rolled into the wrong row. My sample data did not reflect this case, so I added rows where this case can occur. In the updated sample data, using an aggregate function in the simplest way would cause the 2680 count and the 1425 count to be rolled together, which I do not want.
EDIT: And if you're wondering how I got here in the first place, here you go. I'm going to aggregate statistics about how long our program left something in a certain ActionType, and my first step was by creating this subquery. Please feel free to criticize:
select
ROW_NUMBER() over(order by claimid, insertdate asc) as RowNbr,
DateDiff(mi, ahCurrent.InsertDate, CASE WHEN ahNext.NextInsertDate is null THEN GetDate() ELSE ahNext.NextInsertDate END) as MinutesInActionType,
ahCurrent.InsertDate, ahNext.NextInsertDate,
ahCurrent.ClaimID, ahCurrent.ActionTypeID,
case when ahCurrent.ActionTypeID = ahNext.NextActionTypeID and ahCurrent.ClaimID = ahNext.NextClaimID then 1 else 0 end as aggregateme
FROM
(
select ROW_NUMBER () over(order by claimid, insertdate asc) as RowNum, ClaimID, InsertDate, ActionTypeID
From autostatushistory
--Where AHCurrent is not AHPast
) ahCurrent
LEFT JOIN
(
select ROW_NUMBER() over(order by claimid, insertdate asc) as RowNum, ClaimID as NextClaimID, InsertDate as NextInsertDate, ActionTypeID as NextActionTypeID
FROM autostatushistory
) ahNext
ON (ahCurrent.ClaimID = ahNext.NextClaimID AND ahCurrent.RowNum = ahNext.RowNum - 1 and ahCurrent.ActionTypeID = ahNext.NextActionTypeID)
Here is the query that you need to execute.
It's not clean, so you may want to optimize it:
WITH cte AS( /* Create a table containing row number */
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ROW,
MinutesElapsed,
AggregateMe,
ID,
TYPE
FROM rolling
)
SELECT MinutesElapsed + (CASE /* adding minutes from next valid records*/
WHEN cte.AggregateMe <> 1 /*if current record is 0 then */
THEN 0 /*skip it*/
ELSE
(SELECT SUM(MinutesElapsed) /* calculating sum of all -> */
FROM cte localTbl
WHERE
cte.ROW < localTbl.ROW /* next records -> */
AND
localTbl.ROW <= ( /* until we find aggregate = 0 */
SELECT MIN(ROW)
FROM cte sTbl
WHERE sTbl.AggregateMe = 0
AND
sTbl.ROW > cte.ROW
)
AND
(localTbl.AggregateMe = 0 OR /* just to be sure :) */
localTbl.AggregateMe = 1))
END) as MinutesElapsed,
AggregateMe,
ID,
TYPE
FROM cte
WHERE cte.ROW = 1 OR NOT( /* not showing records used that are used in sum, skipping 1 record*/
( /* records with agregate 0 after record with aggregate 1 */
cte.AggregateMe = 0
AND
(
SELECT AggregateMe
FROM cte tblLocal
WHERE cte.ROW = (tblLocal.ROW + 1)
)>0
)
OR
( /* record with aggregate 1 after record with aggregate 1 */
cte.AggregateMe = 1
AND
(
SELECT AggregateMe
FROM cte tblLocal
WHERE cte.ROW = (tblLocal.ROW + 1)
)= 1
)
);
I hope it helps with your problem.
Feel free to ask questions.
Looking at your result set, it seems the following would work:
SELECT ID,Type,SUM(MinutesElapsed)
FROM mytable
GROUP BY ID,Type
But cannot tell for sure without looking into original dataset.
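One possible way to keep the two Type A runs apart is to give each row a group number equal to the count of AggregateMe = 0 rows before it, then sum per group. The sketch below tries that on the question's sample data with Python's built-in sqlite3 as a stand-in. The ROWS BETWEEN ... PRECEDING frame is not available in SQL Server 2008, the table name rolling is borrowed from the earlier answer, and roll_up is a made-up name. It also assumes that ID and Type are constant within a run, which holds for the sample.

```python
import sqlite3

# Sample rows from the question:
# (MinutesElapsed, AggregateMe, ID, Type, RowNumber).
ROWS = [
    (1480, 1, 1, "A", 1),
    (1200, 0, 1, "A", 2),
    (1300, 0, 1, "B", 3),
    (1550, 0, 1, "C", 4),
    (725, 1, 1, "A", 5),
    (700, 0, 1, "A", 6),
    (1900, 1, 2, "A", 7),
    (3300, 1, 2, "A", 8),
    (4900, 0, 2, "A", 9),
]

# grp counts the AggregateMe = 0 rows strictly before the current row, so a
# run of flagged rows shares a group with the unflagged row that closes it.
QUERY = """
WITH g AS (
    SELECT MinutesElapsed, ID, Type,
           COALESCE(SUM(1 - AggregateMe) OVER (
               ORDER BY RowNumber
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS grp
    FROM rolling
)
SELECT SUM(MinutesElapsed) AS MinutesElapsed, ID, Type
FROM g
GROUP BY grp, ID, Type
ORDER BY grp
"""


def roll_up(rows):
    """Return (MinutesElapsed, ID, Type) with flagged rows rolled forward."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rolling (MinutesElapsed INTEGER, AggregateMe INTEGER, "
        "ID INTEGER, Type TEXT, RowNumber INTEGER)"
    )
    conn.executemany("INSERT INTO rolling VALUES (?, ?, ?, ?, ?)", rows)
    result = list(conn.execute(QUERY))
    conn.close()
    return result


if __name__ == "__main__":
    for row in roll_up(ROWS):
        print(row)
```

On the sample data this gives 2680, 1300, 1550, 1425 and 10100, so the 2680 and 1425 totals for ID 1, Type A stay separate, which a plain GROUP BY ID, Type would not do.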
