Average calculation through time scale (SQL Server) - preparation for charting

I have the following table in SQL Server Express edition:
Time Device Value
0:00 1 2
0:01 2 3
0:03 3 5
0:03 1 3
0:13 2 5
0:22 1 7
0:34 3 5
0:35 2 6
0:37 1 5
The table logs events from different devices, each of which reports its latest value. I'd like to prepare the data so that I can present the average value over time and eventually create a chart from it. I've worked through this example data in Excel as follows:
Time Average value
0:03 3,666666667
0:13 4,333333333
0:22 5,666666667
0:34 5,666666667
0:35 6
0:37 5,333333333
So at time 0:03 I need to take the latest value I have for each device and calculate the average. In this case it's (3+3+5)/3 = 3.67. At 0:13 the same steps are repeated, then again at 0:22, and so on.
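The rule described above (keep each device's most recent value, then average across devices at each logged time) can be checked outside the database with a small plain-Python sketch. It is not T-SQL; it assumes the set of devices is known in advance and that rows arrive sorted by time, and the function name `average_of_latest` is hypothetical.

```python
from itertools import groupby

# Sample rows from the question: (time, device, value), sorted by time.
EVENTS = [
    ("0:00", 1, 2), ("0:01", 2, 3), ("0:03", 3, 5), ("0:03", 1, 3),
    ("0:13", 2, 5), ("0:22", 1, 7), ("0:34", 3, 5), ("0:35", 2, 6),
    ("0:37", 1, 5),
]
DEVICES = {1, 2, 3}  # assumption: the device list is known up front


def average_of_latest(events, devices):
    """Return [(time, average)] using each device's latest value at that time.

    Times before every device has reported are skipped, which matches the
    Excel table starting at 0:03.
    """
    latest = {}  # device -> most recent value seen so far
    result = []
    for time, group in groupby(events, key=lambda e: e[0]):
        for _, device, value in group:
            latest[device] = value  # apply every event stamped with this time
        if devices.issubset(latest):
            result.append((time, sum(latest[d] for d in devices) / len(devices)))
    return result


for time, avg in average_of_latest(EVENTS, DEVICES):
    print(time, round(avg, 2))
```

This prints 3.67, 4.33, 5.67, 5.67, 6.0 and 5.33 for 0:03 through 0:37, matching the Excel table.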
As I'd like to keep everything within SQL Server (I don't want to create a C# service or similar that would grab the data and store it in some other table), I'd like to know the following:
Is this the right approach, or should I use some other concept for calculating the average when preparing chart data?
If yes, what's the best way to implement it: a view, a function within the database, or a stored procedure (called from the charting API)?
Any suggestions on how to implement this?
Thank you in advance.
Mark
Update 1
In the meantime I've had an idea for how to approach this problem. I'd appreciate your comments on it, and I still need some help getting the problem resolved.
So, the idea is to crosstab the table like this:
Time Device1Value Device2Value Device3Value
0:00 2 NULL NULL
0:01 NULL 3 NULL
0:03 3 NULL 5
0:13 NULL 5 NULL
0:22 7 NULL NULL
0:34 NULL NULL 5
0:35 NULL 6 NULL
0:37 5 NULL NULL
The query to produce this would be:
SELECT Time,
(SELECT Stock FROM dbo.Event WHERE Time = S.Time AND Device = 1) AS Device1Value,
(SELECT Stock FROM dbo.Event WHERE Time = S.Time AND Device = 2) AS Device2Value,
(SELECT Stock FROM dbo.Event WHERE Time = S.Time AND Device = 3) AS Device3Value
FROM dbo.Event S GROUP BY Time
What I still need to do is write a user-defined function, called within this query, that writes the last available value in place of a NULL and leaves NULL if no earlier value exists. With this function I'd get the following results:
Time Device1Value Device2Value Device3Value
0:00 2 NULL NULL
0:01 2 3 NULL
0:03 3 3 5
0:13 3 5 5
0:22 7 5 5
0:34 7 5 5
0:35 7 6 5
0:37 5 6 5
With these results I'd be able to calculate the average for each time simply by summing the three relevant columns and dividing by the count (3 in this case). For NULL I'd use 0.
Can anybody suggest how to create a user-defined function that replaces NULL values with the latest value?
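The fill asked for here (replace a NULL with the last non-NULL value above it in the same column, and leave NULL when there is none) is a forward fill. A plain-Python sketch of that step on the crosstab above, with `None` standing in for NULL; `forward_fill` is a hypothetical name, not a SQL Server function.

```python
# Crosstab from the question: (time, [device1, device2, device3]); None = NULL.
PIVOT = [
    ("0:00", [2, None, None]),
    ("0:01", [None, 3, None]),
    ("0:03", [3, None, 5]),
    ("0:13", [None, 5, None]),
    ("0:22", [7, None, None]),
    ("0:34", [None, None, 5]),
    ("0:35", [None, 6, None]),
    ("0:37", [5, None, None]),
]


def forward_fill(rows):
    """Replace each None with the last non-None value above it in its column.

    A column stays None until its first real value appears.
    """
    filled = []
    last = None
    for time, values in rows:
        if last is None:
            last = [None] * len(values)  # nothing seen yet for any device
        # keep the new value when present, otherwise carry the previous one down
        last = [prev if v is None else v for v, prev in zip(values, last)]
        filled.append((time, last))
    return filled


for time, values in forward_fill(PIVOT):
    print(time, values)
```

Row order matters: the fill only makes sense when rows are processed in time order, which is why a set-based SQL version also needs an explicit ordering.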
Update 2
Thanks Martin.
This query worked, but it took almost 21 minutes to go through the 13,576 rows, which is far too long.
The final query I used was:
SELECT Time,
(SELECT TOP 1 Stock FROM dbo.Event e WHERE e.Time <= S.Time AND Device = 1 ORDER BY e.Time DESC) AS Device1Value,
(SELECT TOP 1 Stock FROM dbo.Event e WHERE e.Time <= S.Time AND Device = 2 ORDER BY e.Time DESC) AS Device2Value,
(SELECT TOP 1 Stock FROM dbo.Event e WHERE e.Time <= S.Time AND Device = 3 ORDER BY e.Time DESC) AS Device3Value
FROM dbo.Event S GROUP BY Time
but I've extended it to 10 devices.
I agree that this is not the best way to do it. Is there any other way to prepare the data for the average calculation? This one simply takes too much processing.

Here's one way. It uses the "Quirky Update" approach to filling in the gaps. This relies on an undocumented behaviour so you may prefer to use a cursor for this.
DECLARE @SourceData TABLE([Time] TIME, Device INT, value FLOAT)
INSERT INTO @SourceData
SELECT '0:00',1,2 UNION ALL
SELECT '0:01',2,3 UNION ALL
SELECT '0:03',3,5 UNION ALL
SELECT '0:03',1,3 UNION ALL
SELECT '0:13',2,5 UNION ALL
SELECT '0:22',1,7 UNION ALL
SELECT '0:34',3,5 UNION ALL
SELECT '0:35',2,6 UNION ALL
SELECT '0:37',1,5
-- One row per time, one column per device; the primary key on [Time] is the clustered index
CREATE TABLE #tmpResults
(
[Time] Time primary key,
[1] FLOAT,
[2] FLOAT,
[3] FLOAT
)
INSERT INTO #tmpResults
SELECT [Time],[1],[2],[3]
FROM @SourceData
PIVOT ( MAX(value) FOR Device IN ([1],[2],[3])) AS pvt
ORDER BY [Time];
-- Quirky update: carry the last non-NULL value down each column.
-- This assumes rows are updated in clustered-index ([Time]) order, which is undocumented behaviour.
DECLARE @1 FLOAT, @2 FLOAT, @3 FLOAT
UPDATE #tmpResults
SET @1 = [1] = ISNULL([1],@1),
@2 = [2] = ISNULL([2],@2),
@3 = [3] = ISNULL([3],@3)
-- AVG ignores NULLs, so rows before every device has reported average only the devices seen so far
SELECT [Time],
(SELECT AVG(device)
FROM (SELECT [1] AS device
UNION ALL
SELECT [2]
UNION ALL
SELECT [3]) t) AS [Average value]
FROM #tmpResults
DROP TABLE #tmpResults

One of the possible solutions I found is far more efficient (less than a second for 14,574 rows). I haven't yet had time to review the results in detail, but at first glance it looks promising. This is the code for the three-device example:
-- Note: a device with no event at a given time gets 0 here, not its last reported value
SELECT Time,
SUM(CASE MAC WHEN '1' THEN Stock ELSE 0 END) Device1Value,
SUM(CASE MAC WHEN '2' THEN Stock ELSE 0 END) Device2Value,
SUM(CASE MAC WHEN '3' THEN Stock ELSE 0 END) Device3Value
FROM dbo.Event
GROUP BY Time
ORDER BY Time
In any case I'll test the code provided by Martin to see if it makes any difference to the results.

Related

Choose row that equal to the max value from a query

I want to know who has the most friends in the app I own, based on transactions: a friend is anyone the user either received money from or paid money to.
I can't get the query to show only those who have the maximum number of friends (there can be one or many, and the number can change, so I can't use a fixed limit).
;with relationships as
(
select
paid as 'auser',
Member_No as 'afriend'
from Payments$
union all
select
member_no as 'auser',
paid as 'afriend'
from Payments$
),
DistinctRelationships AS (
SELECT DISTINCT *
FROM relationships
)
select
afriend,
count(*) cnt
from DistinctRelationShips
GROUP BY
afriend
order by
count(*) desc
I just can't figure it out; I've tried COUNT, MAX(COUNT), WHERE = MAX, and nothing worked.
It's a two-column table, Member_No and Paid: Member_No is the member who pays the money, and Paid is the one who receives it.
Member_No Paid
14 18
17 1
12 20
12 11
20 8
6 3
2 4
9 20
8 10
5 20
14 16
5 2
12 1
14 10
It's from Excel, but I loaded it into SQL Server.
It's just a sample; there are 1,000 more rows.

It seems like you are massively over-complicating this. There is no need for self-joining.
Just unpivot each row so you have both sides of the relationship, then group by one side and count the distinct values of the other side.
SELECT
-- for just the first then SELECT TOP (1)
-- for all that tie for the top place use SELECT TOP (1) WITH TIES
v.Id,
Relationships = COUNT(DISTINCT v.Other),
TotalTransactions = COUNT(*)
FROM Payments$ p
CROSS APPLY (VALUES
(p.Member_No, p.Paid),
(p.Paid, p.Member_No)
) v(Id, Other)
GROUP BY
v.Id
ORDER BY
COUNT(DISTINCT v.Other) DESC;
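The same unpivot, count-distinct and keep-the-ties logic can be checked in plain Python against the sample pairs from the question. This is an illustrative sketch, not T-SQL; `most_connected` is a hypothetical name.

```python
from collections import defaultdict

# Sample (Member_No, Paid) pairs from the question.
PAYMENTS = [(14, 18), (17, 1), (12, 20), (12, 11), (20, 8), (6, 3), (2, 4),
            (9, 20), (8, 10), (5, 20), (14, 16), (5, 2), (12, 1), (14, 10)]


def most_connected(payments):
    """Return every (member, distinct_partners) pair tied for the maximum.

    Each payment is counted from both sides (the CROSS APPLY VALUES unpivot),
    partners are de-duplicated per member (COUNT(DISTINCT ...)), and all ties
    are kept (like TOP (1) WITH TIES).
    """
    partners = defaultdict(set)
    for payer, payee in payments:
        partners[payer].add(payee)
        partners[payee].add(payer)
    best = max(len(s) for s in partners.values())
    return sorted((m, len(s)) for m, s in partners.items() if len(s) == best)


print(most_connected(PAYMENTS))  # [(20, 4)]
```

On the sample, member 20 is the only one with four distinct partners (12, 8, 9 and 5).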

How to join a table to a changelog table?

I have two tables related to tickets, DimTickets and DimTicketChangelog, and want to calculate the time between CreatedDateTime and when the status was first changed to Done/Released. The DimTickets table is structured like so:
IssueKey | IssueType | Priority | Project | Status | CreatedDateTime
TEAM1-100 | Story | High | Team 1 | Approved for Release | 2020-04-02 16:09:45
TEAM1-101 | Task | Medium | Team 1 | Done | 2020-04-03 15:38:25
TEAM1-102 | Sub-task | Low | Team 1 | Done | 2020-04-08 09:03:43
TEAM1-103 | Bug | High | Team 1 | In Progress | 2020-04-13 12:18:56
TEAM1-104 | Task | Medium | Team 1 | Done | 2020-04-16 11:40:08
TEAM2-100 | Task | Medium | Team 2 | Done | 2020-04-17 09:06:17
TEAM2-101 | Story | Medium | Team 2 | Released | 2020-04-17 15:55:45
TEAM2-102 | Task | Low | Team 2 | Done | 2020-04-20 10:12:41
TEAM1-105 | Task | High | Team 1 | In Progress | 2020-04-20 15:24:56
and a DimTicketChangelog that's structured like this:
ChangeLogID | IssueKey | FromStatus | ToStatus | ChangeLogDateTime
1 | TEAM1-100 | 1 | 2 | 2019-06-14 15:56:03
2 | TEAM1-100 | 2 | 3 | 2019-06-15 12:58:29
3 | TEAM2-102 | 2 | 4 | 2019-06-16 17:58:48
4 | TEAM1-100 | 3 | 5 | 2019-06-16 20:01:43
5 | TEAM1-104 | 1 | 3 | 2019-06-18 10:02:39
6 | TEAM1-105 | 4 | 5 | 2019-06-21 18:03:19
7 | TEAM1-104 | 3 | 5 | 2019-06-24 22:05:28
8 | TEAM2-102 | 4 | 6 | 2019-07-02 08:06:50
9 | TEAM2-103 | 1 | 4 | 2019-07-04 11:06:50
Is there a way for me to join DimTicketChangelog on the first time a ticket changes from a status < 5 to status 5 or 6, so that I can create a field that is essentially ChangeLogDateTime - CreatedDateTime, i.e. the time it took between the creation of the ticket and when its status was changed to a resolved one?

Something like this should work
Explanation: the CTE finds all the instances where a ticket crossed from a status < 5 to 5 or above and ranks them by change log date. You then select the records where this ranking = 1, since that is the first such change for each ticket.
EDIT: I added a DATEDIFF to satisfy your second requirement, the time it took. Your sample data is a bit odd in terms of these dates, but with your real data you should be fine.
;WITH FindFirstChange AS (
select
t.IssueKey /*Add whatever other columns you need here*/
,t.createdDateTime
,tcl.ChangeLogDateTime
,DATEDIFF(day, t.createdDateTime ,tcl.ChangeLogDateTime) Diff
,ranking = ROW_NUMBER() OVER(PARTITION BY tcl.issuekey ORDER BY tcl.ChangeLogDateTime ASC)
FROM DimTickets t
INNER JOIN DimTicketchangelog tcl ON t.issuekey = tcl.issuekey
WHERE tcl.fromStatus <5
AND toStatus >= 5
)
SELECT *
FROM FindFirstChange
WHERE ranking = 1;

You have not provided any expected results, and your test data doesn't really cover the criteria you describe. Also, your log table dates are earlier than the CreatedDateTime of each ticket, which makes no sense. However, see if the following works for you:
with l as (
select issuekey,changelogdatetime, Row_Number() over(partition by issueKey order by ChangeLogID) rn
from DimTicketChangeLog
where tostatus in (5,6) and FromStatus<5
)
select t.*, DateDiff(day,t.createddatetime,l.ChangeLogDateTime) Duration
from DimTickets t
left join l on l.IssueKey=t.IssueKey
and l.rn=1 -- filter in the ON clause so tickets without a resolving change are kept (with a NULL Duration)
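To see which changelog rows qualify, here is a plain-Python sketch of the same rule: keep changes from a status below 5 into 5 or 6, take the earliest per ticket, and compute the difference in calendar days the way DATEDIFF(day, ...) does (it counts date boundaries, so only the date parts matter). The names are hypothetical. On this sample the differences come out negative because the changelog dates are earlier than the creation dates.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# DimTickets sample: IssueKey -> CreatedDateTime
CREATED = {
    "TEAM1-100": "2020-04-02 16:09:45",
    "TEAM1-101": "2020-04-03 15:38:25",
    "TEAM1-102": "2020-04-08 09:03:43",
    "TEAM1-103": "2020-04-13 12:18:56",
    "TEAM1-104": "2020-04-16 11:40:08",
    "TEAM2-100": "2020-04-17 09:06:17",
    "TEAM2-101": "2020-04-17 15:55:45",
    "TEAM2-102": "2020-04-20 10:12:41",
    "TEAM1-105": "2020-04-20 15:24:56",
}

# DimTicketChangelog sample: (ChangeLogID, IssueKey, FromStatus, ToStatus, ChangeLogDateTime)
CHANGELOG = [
    (1, "TEAM1-100", 1, 2, "2019-06-14 15:56:03"),
    (2, "TEAM1-100", 2, 3, "2019-06-15 12:58:29"),
    (3, "TEAM2-102", 2, 4, "2019-06-16 17:58:48"),
    (4, "TEAM1-100", 3, 5, "2019-06-16 20:01:43"),
    (5, "TEAM1-104", 1, 3, "2019-06-18 10:02:39"),
    (6, "TEAM1-105", 4, 5, "2019-06-21 18:03:19"),
    (7, "TEAM1-104", 3, 5, "2019-06-24 22:05:28"),
    (8, "TEAM2-102", 4, 6, "2019-07-02 08:06:50"),
    (9, "TEAM2-103", 1, 4, "2019-07-04 11:06:50"),
]


def first_resolution(created, changelog):
    """Map IssueKey -> (first change from <5 into 5/6, day difference)."""
    first = {}
    for _, key, from_status, to_status, stamp in changelog:
        if from_status < 5 and to_status in (5, 6):
            when = datetime.strptime(stamp, FMT)
            if key not in first or when < first[key]:
                first[key] = when  # earliest qualifying change wins
    result = {}
    for key, when in first.items():
        if key in created:  # inner-join behaviour: ignore unknown tickets
            start = datetime.strptime(created[key], FMT)
            # like DATEDIFF(day, ...): difference between the calendar dates
            result[key] = (when, (when.date() - start.date()).days)
    return result


for key, (when, days) in sorted(first_resolution(CREATED, CHANGELOG).items()):
    print(key, when, days)
```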

Using t-sql to select aggregate when date difference is not just equal but small

I have a table where I want to select the maximum of a column, but only when the difference between dates is zero or small (let's say 3 days). When two consecutive dates are very close, the data are likely spurious and I want to take the highest state when that happens.
My data looks similar to this
CREATE TABLE #TestingResults (
IDNumber varchar(100),
DateSeen date,
[state] int)
INSERT INTO #TestingResults VALUES
('A','2015-04-21',2),
('A','2015-05-08',2),
('A','2015-07-01',3),
('B','2014-06-18',100), -- this is the one I want
('B','2014-06-19',2),
('B','2014-07-31',2),
('B','2014-08-11',3),
('B','2014-09-24',3),
('B','2014-10-24',3),
('B','2014-11-24',3),
('B','2014-12-15',3),
('B','2015-01-12',3),
('B','2015-01-13',400), -- this is the one I want
('B','2015-04-06',10), -- either will do
('B','2015-04-07',10),
('B','2015-07-06',3), -- either will do
('B','2015-07-07',3),
('B','2015-10-12',3),
('C','2012-02-20',3),
('C','2012-03-12',3),
('C','2012-04-02',3),
('C','2012-11-21',3)
What I really want is something like the following, where I take the maximum state when the difference between dates is < 3 days (note that some of the data may have the same state even when the dates are close):
IDNumber DateSeen state
A 2015-04-21 2
A 2015-05-08 2
A 2015-07-01 3
-- if there are observations < 3 days apart, take MAX
B 2014-06-18 100
B 2014-07-31 2
B 2014-08-11 3
B 2014-09-24 3
B 2014-10-24 3
B 2014-11-24 3
B 2014-12-15 3
-- if there are observations < 3 days apart, take MAX
B 2015-01-13 400
-- if there are observations < 3 days apart, take MAX
B 2015-04-07 10
-- if there are observations < 3 days apart, take MAX
B 2015-07-07 3
B 2015-10-12 3
C 2012-02-20 3
C 2012-03-12 3
C 2012-04-02 3
C 2012-11-21 3
I guess I could create another table variable to hold the intermediate results and then query it, but there are a couple of problems. First, as you can see, IDNumber = 'B' has several of these clusters in its sequence of dates, so I think there should be a smarter way.
Thanks!

After your clarifying comments (thanks for that!), I would do this as follows:
SELECT ISNULL(high.IDNumber, results.IDNumber) AS IDNumber,
ISNULL(high.DateSeen, results.DateSeen) AS DateSeen,
ISNULL(high.[state], results.[state]) AS [state]
FROM #TestingResults results
OUTER APPLY
(
SELECT TOP 1 IDNumber, DateSeen, [state]
FROM #TestingResults highest
WHERE highest.DateSeen <= results.DateSeen -- include the current row so its own state is a candidate
AND highest.IDNumber = results.IDNumber
AND DATEDIFF(DAY,highest.DateSeen,results.DateSeen) <=3
ORDER BY [state] DESC, [DateSeen] DESC
) high
WHERE NOT EXISTS
(
SELECT 1
FROM #TestingResults nearFuture
WHERE nearFuture.DateSeen > results.DateSeen
AND nearFuture.IDNumber = results.IDNumber
AND DATEDIFF(DAY,results.DateSeen,nearFuture.DateSeen) <=3
)
This is almost certainly not the most elegant way to achieve this (I suspect it could be done more efficiently with window functions, a recursive CTE or similar), but I believe it gives you the behaviour and results you want.

This should do it using a recursive CTE:
WITH TestingResults AS (
SELECT
*
,ROW_NUMBER() OVER(ORDER BY IDNumber, DateSeen) AS RowNum
FROM #TestingResults
), Data AS (
SELECT
tmp1.IDNumber,
tmp1.DateSeen,
tmp1.state,
tmp1.RowNum,
tmp1.RowNum AS GroupID
FROM (
SELECT
*
,ABS(DATEDIFF(DAY, DateSeen, LAG(DateSeen, 1, NULL) OVER(PARTITION BY IDNumber ORDER BY DateSeen))) AS AbsPrev
FROM TestingResults
) AS tmp1
WHERE tmp1.AbsPrev IS NULL OR tmp1.AbsPrev >= 3 --the first date in a sequence
UNION ALL
SELECT
r.IDNumber,
r.DateSeen,
r.state,
r.RowNum,
d.GroupID
FROM Data d
INNER JOIN TestingResults r ON
r.IDNumber = d.IDNumber
AND DATEDIFF(DAY, d.DateSeen, r.DateSeen) < 3
AND d.RowNum+1 = r.RowNum
)
SELECT MIN(d.IDNumber) AS IDNumber, MAX(d.DateSeen) AS DateSeen, MAX(d.state) AS state
FROM Data d
GROUP BY d.GroupID
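Both answers solve a gaps-and-islands problem: consecutive rows for the same IDNumber fewer than 3 days apart form one group, and each group is reduced to its highest state. A plain-Python sketch of that idea, useful for checking results on the sample data, follows. It chains consecutive gaps (like the recursive CTE) and breaks state ties by taking the later date; `collapse_close_dates` and `max_gap_days` are hypothetical names.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question: (IDNumber, DateSeen, state)
ROWS = [
    ("A", "2015-04-21", 2), ("A", "2015-05-08", 2), ("A", "2015-07-01", 3),
    ("B", "2014-06-18", 100), ("B", "2014-06-19", 2), ("B", "2014-07-31", 2),
    ("B", "2014-08-11", 3), ("B", "2014-09-24", 3), ("B", "2014-10-24", 3),
    ("B", "2014-11-24", 3), ("B", "2014-12-15", 3), ("B", "2015-01-12", 3),
    ("B", "2015-01-13", 400), ("B", "2015-04-06", 10), ("B", "2015-04-07", 10),
    ("B", "2015-07-06", 3), ("B", "2015-07-07", 3), ("B", "2015-10-12", 3),
    ("C", "2012-02-20", 3), ("C", "2012-03-12", 3), ("C", "2012-04-02", 3),
    ("C", "2012-11-21", 3),
]


def _best(run):
    """Pick the row with the highest state; ties go to the later date."""
    return max(run, key=lambda r: (r[2], r[1]))


def collapse_close_dates(rows, max_gap_days=3):
    """Merge runs whose consecutive dates are < max_gap_days apart."""
    parsed = sorted((i, date.fromisoformat(d), s) for i, d, s in rows)
    out = []
    for _, group in groupby(parsed, key=lambda r: r[0]):
        run = []
        for row in group:
            if run and (row[1] - run[-1][1]).days >= max_gap_days:
                out.append(_best(run))  # gap too large: close the current run
                run = []
            run.append(row)
        out.append(_best(run))
    return out


for id_number, seen, state in collapse_close_dates(ROWS):
    print(id_number, seen, state)
```

On the sample this yields the 18 rows of the expected output, for example B 2014-06-18 100 and B 2015-01-13 400.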

Calculate Bounce Rate SQL Server 2008

I'm trying to calculate the bounce rate of pages in SQL Server from a table of SharePoint audit data:
ItemId UserId DocLocation Occurred
1 1 Home.aspx 2016-08-02 13:39:41
1 2 Home.aspx 2016-08-02 13:40:07
2 1 Other.aspx 2016-08-02 13:40:16
3 1 Items.aspx 2016-08-02 13:40:17
2 2 Other.aspx 2016-08-02 13:40:11
ItemId is the ID of the page, DocLocation is the location of the page, and Occurred is when the user visited the page.
To calculate the bounce rate, we divide the number of bounces by the total number of visits.
A bounce happens when a user leaves the page in less than 5 seconds.
This should be the results for that table:
ItemId Bounces Visits BounceRate(Bounces/Visits)
1 1 2 0.5
2 1 2 0.5
3 0 1 0
I want to count a bounce by calculating how much time passes between the user's visit to a page and the user's next visit to another page. If that time is less than 5 seconds, it counts as a bounce.
I'm writing a stored procedure that executes a query to show the bounce rate of each page, but this doesn't work:
SELECT
SUM(CASE
WHEN (DATEDIFF(second, @Occurred,
(SELECT TOP 1 a.Occurred
FROM [AuditPages] a
WHERE a.UserId = @userId
AND a.Occurred > @occurred
ORDER BY a.Occurred ASC))) < 30
THEN 1.0
ELSE 0.0
END) / COUNT(@itemId)
Does anyone know how I can calculate this bounce rate?
Thanks for all the answers.

I like using row_number for this type of sequenced problem. The query below gives the desired result. I find performance with CTEs can sometimes be problematic with larger tables and you may need to convert to a temp table. You might consider using milliseconds if there is a chance you would want to use 4.5 seconds or such in the future.
declare @bounce_seconds int = 5;
with audit_cte as (
select *, ROW_NUMBER() over (partition by UserId order by Occurred) row_num
from AuditPages
--order by UserId,row_num
)
select a.ItemId, sum(a.bounce) Bounces, count(1) Visits, sum(a.bounce)/convert(float, count(1)) BounceRate
from (
select a1.ItemId, datediff(s,a1.Occurred, a2.Occurred) elapsed, case when datediff(s,a1.Occurred, a2.Occurred) < @bounce_seconds then 1 else 0 end bounce
from audit_cte a1
left join audit_cte a2
on a2.UserId = a1.UserId
and a2.row_num = a1.row_num + 1
--order by a1.UserId, a1.row_num
) a
group by a.ItemId
order by a.ItemId;

SELECT ItemId,COUNT(1) VISITS,SUM(BOUNCE_IND) BOUNCE, SUM(BOUNCE_IND) * 1.0 / COUNT(1) BOUNCE_RATE -- * 1.0 avoids integer division without capping counts at decimal(5,2)
FROM (
Select
UserID,
ItemID,
DocLocation,
Occurred as Entry_time,
Lead(Occurred,1) Over (Partition by Userid order by Occurred) Exit_time,
CASE WHEN DATEDIFF(ss,Occurred,Lead(Occurred,1) Over (Partition by Userid order by Occurred)) < 5 THEN 1 ELSE 0 END BOUNCE_IND
FROM Web_Data_Sample
) TBL GROUP BY ItemId
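Both answers pair each visit with the same user's next visit (ROW_NUMBER with a self-join, or LEAD) and count a bounce when the gap is under the threshold. A plain-Python sketch of that pairing on the sample data follows; `bounce_rates` and `bounce_seconds` are hypothetical names. The last visit of each user never counts as a bounce because there is no next visit to compare with, which matches the NULL comparison in both queries.

```python
from collections import defaultdict
from datetime import datetime

# Sample AuditPages rows: (ItemId, UserId, DocLocation, Occurred)
VISITS = [
    (1, 1, "Home.aspx", "2016-08-02 13:39:41"),
    (1, 2, "Home.aspx", "2016-08-02 13:40:07"),
    (2, 1, "Other.aspx", "2016-08-02 13:40:16"),
    (3, 1, "Items.aspx", "2016-08-02 13:40:17"),
    (2, 2, "Other.aspx", "2016-08-02 13:40:11"),
]


def bounce_rates(visits, bounce_seconds=5):
    """Return {ItemId: (bounces, visits, bounce_rate)}."""
    by_user = defaultdict(list)
    for item, user, _, stamp in visits:
        by_user[user].append((datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), item))
    counts = defaultdict(lambda: [0, 0])  # ItemId -> [bounces, visits]
    for seq in by_user.values():
        seq.sort()  # order each user's visits by time
        # pair each visit with the same user's next visit (None for the last one)
        for (when, item), nxt in zip(seq, seq[1:] + [None]):
            counts[item][1] += 1
            if nxt is not None and (nxt[0] - when).total_seconds() < bounce_seconds:
                counts[item][0] += 1
    return {item: (b, v, b / v) for item, (b, v) in sorted(counts.items())}


print(bounce_rates(VISITS))
# {1: (1, 2, 0.5), 2: (1, 2, 0.5), 3: (0, 1, 0.0)}
```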

Creating a recursive CTE with no rootrecord

My apologies for the appalling title; I was trying to be descriptive but I'm not sure I got to the point. Hopefully the following will explain it.
I begin with a table that has the following information:
Party Id Party Name Party Code Parent Id
1 Acme 1 ACME1 1
2 Acme 2 ACME2 1
3 Acme 3 ACME3 3
4 Acme 4 ACME4 4
5 Acme 5 ACME5 4
6 Acme 6 ACME6 6
As you can see, this isn't ideal for a recursive CTE: rather than having NULL where there is no parent record, such a row is parented to itself (see rows 1, 3, 4 and 6). Others, however, are parented normally.
I have therefore tried to amend this table in one CTE and then refer to that CTE's output in my recursive query. This doesn't appear to run very well (no errors yet), so I wonder whether I have created an infinite loop or some other error that slows the query to a crawl rather than killing it.
My code is below; please pick it apart!
--This is my attempt to 'clean' the data and set records parented to themselves as the 'anchor'
--record
WITH Parties
AS
(Select CASE
WHEN Cur_Parent_Id = Party_Id THEN NULL
ELSE Cur_Parent_Id
END AS Act_Parent_Id
, Party_Id
, CUR_PARTY_CODE
, CUR_PARTY_NAME
FROM EDW..TBDIMD_PARTIES
WHERE CUR_FLG = 1),
--In this CTE I referred to my 'clean' records from above and then traverse through them
--looking at the actual parent record identified
linkedParties
AS
(
Select Act_Parent_Id, Party_Id, CUR_PARTY_CODE, CUR_PARTY_NAME, 0 AS LEVEL
FROM Parties
WHERE Act_Parent_Id IS NULL
UNION ALL
Select p.Act_Parent_Id, p.Party_Id, p.CUR_PARTY_CODE, p.CUR_PARTY_NAME, Level + 1
FROM Parties p
inner join
linkedParties t on p.Act_Parent_Id = t.Party_Id
)
Select *
FROM linkedParties
Order By Level
From the data I supplied earlier, the results I would expect are:
Party Id Party Name Party Code Parent Id Level
1 Acme 1 ACME1 1 0
3 Acme 3 ACME3 3 0
4 Acme 4 ACME4 4 0
6 Acme 6 ACME6 6 0
2 Acme 2 ACME2 1 1
5 Acme 5 ACME5 4 1
If everything seems to be OK, I'll assume it's just a processing issue and start investigating that, but I'm not entirely comfortable with CTEs, so I want to make sure the error is not mine before looking elsewhere.
Many Thanks

I think that you made it more complicated than it needs to be :).
IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp -- avoids an error on the first run
GO
select
*
into #temp
from (
select '1','Acme 1','ACME1','1' union all
select '2','Acme 2','ACME2','1' union all
select '3','Acme 3','ACME3','3' union all
select '4','Acme 4','ACME4','4' union all
select '5','Acme 5','ACME5','4' union all
select '6','Acme 6','ACME6','6'
) x ([Party Id],[Party Name],[Party Code],[Parent Id])
GO
;with cte as (
select
*,
[Level] = 0
from #temp
where 1=1
and [Party Id]=[Parent Id] --assuming these are root records
union all
select
t.*,
[Level] = c.[Level]+1
from #temp t
join cte c
on t.[Parent Id]=c.[Party Id]
where 1=1
and t.[Party Id]<>t.[Parent Id] --prevent matching root records with themselves creating infinite recursion
)
select
*
from cte
(* should of course be replaced with the actual column names.)
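The level assignment itself is a breadth-first walk from the self-parented roots. A plain-Python sketch using the (Party Id, Parent Id) pairs from the question follows; `levels` is a hypothetical name. Self-parented rows are treated as roots and never added as anyone's child, which is the same reason the answer excludes [Party Id] = [Parent Id] in the recursive member: the self-reference cannot feed back into the walk.

```python
from collections import defaultdict

# (Party Id, Parent Id) from the question; a row parented to itself is a root.
PARTIES = [(1, 1), (2, 1), (3, 3), (4, 4), (5, 4), (6, 6)]


def levels(parties):
    """Return {party_id: level}, with self-parented rows at level 0."""
    children = defaultdict(list)
    roots = []
    for party_id, parent_id in parties:
        if party_id == parent_id:
            roots.append(party_id)  # treat the self-reference as "no parent"
        else:
            children[parent_id].append(party_id)
    result = {}
    frontier, level = roots, 0
    while frontier:  # one pass per level, like each recursion step of the CTE
        next_frontier = []
        for party_id in frontier:
            result[party_id] = level
            next_frontier.extend(children[party_id])
        frontier, level = next_frontier, level + 1
    return result


print(levels(PARTIES))
# {1: 0, 3: 0, 4: 0, 6: 0, 2: 1, 5: 1}
```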
