Avoid multiple counting of overlapped times - sql-server

I have calculated the overlap times between start_time and end_time for groups of IDs for multiple user login sessions. ID is unique to a user. A user can have multiple sessions from different browsers, devices, etc.
Here's the dataset I have:
row id start_time end_time overlap_in_seconds
1 1 08:41:27 08:47:26 359
2 1 08:39:31 08:40:42 71
3 1 08:41:37 08:47:26 349
Notice that for rows 1 and 3, the overlapping time between 08:41:37 and 08:47:26 has been counted twice.
There are two options for how I would like to show the result:
Option 1:
row id start_time end_time overlap_in_seconds
1 1 08:41:27 08:47:26 10 (the extra overlap time from 08:41:27 to 08:41:37)
2 1 08:39:31 08:40:42 71
3 1 08:41:37 08:47:26 349
Option 2:
I use this table as a temp table, and in the outer query, when I do a sum on Overlap_in_seconds, I should get the total overlap time as 430 seconds (10 + 71 + 349), not 779 (349 + 359 + 71).
meeting_id correct_overlap
1 430
Below is my outer query.
select temp.id as meeting_id, sum(temp.Overlap_in_seconds) as correct_overlap
from temp
group by temp.id
Any ideas how to do this?

I'm not sure if this is the quickest way to do it, but what I would do is convert every session to an 86400-bit array, where each position corresponds to one second of the day; then simply combine all the arrays per group, and the number of set positions gives the seconds for each group.
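A minimal T-SQL sketch of that per-second idea, assuming the sessions sit in a temp table #sessions(id, start_time, end_time) (the temp table and column names are illustrative, not from the question): expand each session into the seconds it covers using a tally, then count each second at most once per id.
;WITH seconds AS (
    -- tally of 0..86399, one row per second of the day
    SELECT TOP (86400) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS sec
    FROM sys.all_objects a CROSS JOIN sys.all_objects b
)
SELECT s.id AS meeting_id,
       COUNT(DISTINCT sec.sec) AS correct_overlap   -- each covered second counted once
FROM #sessions s
JOIN seconds sec
  ON sec.sec >= DATEDIFF(SECOND, CAST('00:00:00' AS time), s.start_time)
 AND sec.sec <  DATEDIFF(SECOND, CAST('00:00:00' AS time), s.end_time)
GROUP BY s.id;
For the sample rows this returns 430 for id 1 (the union of 08:39:31-08:40:42 and 08:41:27-08:47:26). For long sessions or large tables, a gaps-and-islands merge of overlapping intervals avoids expanding every second into a row.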

Related

Retain last 5 visits by Person in SAS

I have the following that contains dates, the visit number, and a specific variable of interest. I would like to retain the last five visits that are available in SAS by person. I am familiar with retaining the first and last visits. The data for a single subject is listed below:
Person Date VisitNumber VariableOfInterest
001 10/10/2001 1 6
001 11/12/2001 3 8
001 01/05/2002 5 12
001 03/10/2002 6 5
001 05/03/2002 8 3
001 07/29/2002 10 11
Any insight would be appreciated.
A double DOW loop will let you measure the group in the first loop and select from the group based on your desired per-group criteria in the second loop. This is useful when the input data set have is large and pre-sorted, and you want to avoid additional sorting.
data want;
* measure the group size;
do _n_ = 1 by 1 until (last.person);
set have;
by person visitnumber; * visitnumber is in the BY statement only to enforce the expected ordering;
end;
_i_ = _n_;
* apply the criteria "last 5 rows in group";
do _n_ = 1 to _n_;
set have;
if _i_ - _n_ < 5 then output;
end;
run;
It is easier if you sort by descending VisitNumber, so that the problem becomes taking the first 5 observations per person. Then just generate a counter of which observation this is for the person and subset on that.
data want;
set have ;
by person descending visitnumber;
if first.person then rowno=0;
rowno+1;
if rowno <= 5;
run;

AVG giving a Count instead of Average

This is probably a silly mistake on my end but I can't quite figure it out on my own.
I'm trying to calculate average over a set of data pulled from a sub-query presented in the following way:
TotalPDMPs DefaultClinicID
13996 -1
134 23
432 29
123 26
39 27
13 21
40 24
46 30
1 25
Now the average of 'TotalPDMPs' calculated for each 'DefaultClinicID' comes back identical to the data above.
Here's my query for calculating the average:
select DefaultClinicID as ClinicID, AVG(TotalPDMPs)
from
(select count(p.PatientID) as TotalPDMPs, DefaultClinicID from PatientPrescriptionRegistry ppr, Patient p
where p.PatientID = ppr.PatientID
and p.NetworkID = 2
group by DefaultClinicID) p
group by DefaultClinicID
can someone tell me what I'm doing wrong here?
Thanks.
The GROUP BY column is the same in both queries: the inner query produces one count per DefaultClinicID, and the outer query then takes an average over that single row per DefaultClinicID.
Does that make sense? Any aggregation on a column while you group by that same column will return the value unchanged. So for clinic 23 the average calculation is: 134 / 1 = 134.
I think you just need to do the average in your inner query and you get what you want. Or maybe avg(distinct p.patientID) is what you are after?
In the inner sub-query you already grouped by DefaultClinicID,
so every unique DefaultClinicID already has only one row.
And the avg of x is x.
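If the goal is the overall average of per-clinic patient counts, one hedged sketch (reusing the question's table and column names, rewritten with an explicit JOIN) keeps the count per clinic in the inner query and averages across clinics in the outer query, i.e. drops the outer GROUP BY DefaultClinicID:
SELECT AVG(TotalPDMPs * 1.0) AS AvgPDMPsPerClinic   -- * 1.0 avoids integer-division truncation
FROM
    (SELECT COUNT(p.PatientID) AS TotalPDMPs, DefaultClinicID
     FROM PatientPrescriptionRegistry ppr
     INNER JOIN Patient p ON p.PatientID = ppr.PatientID
     WHERE p.NetworkID = 2
     GROUP BY DefaultClinicID) AS perClinic;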

Selecting the row before and after the value changes in one column and correlate with the change in value in next column

I have the following problem that I am trying to solve using SQL server 2008.
The table has 4 columns
1- Identifier such as (a,a,a,b,b,c,c)
2- Time (in seconds)
3- Value 1 (integer)
4- Value 2 (Float)
When sorted by time for each identifier, Value 1 is repeated for several rows. Once Value 1 changes, there is a corresponding change in Value 2 after several rows (ranging between 1 and 5+ seconds). I need to:
1- Detect when Value 1 changes and match it to when Value 2 changes
2- Get the difference in seconds between Value 1 changes.
I cannot use Partition by because there are recurring values for Value 1 and Value 2.
Identifier TimeStamp Value1 Value2
a 12:10:01 2 0.98
a 12:10:02 2 0.98
a 12:10:03 3 0.98
a 12:10:05 2 0.98
a 12:10:06 3 0.50
a 12:10:09 2 0.98
a 12:10:12 2 0.50
a 12:10:13 2 0.98
b 12:10:10 2 0.98
b 12:10:11 4 0.98
b 12:10:12 5 0.98
b 12:10:12 5 0.80
b 12:10:12 5 1.20
I have been trying the following query but it is taking too long to run. For every change in Value1 there is a corresponding change in Value2. The change in Value2 can happen at any time over a period of several seconds. I cannot figure out a way to correlate the two changes.
;WITH Numbered AS (
    -- number the rows per identifier so each row can be joined to the previous one
    SELECT
        ROW_NUMBER() OVER (PARTITION BY [Identifier] ORDER BY [TimeStamp]) AS [RNum]
        ,[Identifier]
        ,[TimeStamp]
        ,[Value1]
        ,[Value2]
    FROM [Table]
),
Value1Change AS (
    -- rows where Value1 differs from the previous row for the same identifier
    SELECT
        ROW_NUMBER() OVER (PARTITION BY T1.[Identifier] ORDER BY T1.[TimeStamp]) AS [RNum]
        ,T1.[Identifier]
        ,T1.[TimeStamp] AS [T1 TimeStamp]
        ,T2.[TimeStamp] AS [T2 TimeStamp]
        ,T1.[Value1] AS [T1_Value1]
        ,T2.[Value1] AS [T2_Value1]
    FROM Numbered T1
    INNER JOIN Numbered T2
        ON T1.[Identifier] = T2.[Identifier] AND T1.[RNum] = T2.[RNum] + 1
    WHERE T2.[Value1] <> T1.[Value1]
)
SELECT
    VC1.[Identifier]
    ,VC1.[T2 TimeStamp]
    ,VC2.[T2 TimeStamp]
    ,DATEDIFF(SECOND, VC1.[T2 TimeStamp], VC2.[T2 TimeStamp]) AS [Time Between Change]
    ,VC1.[T1_Value1]
    ,VC1.[T2_Value1]
FROM Value1Change VC1
INNER JOIN Value1Change VC2
    ON VC1.[Identifier] = VC2.[Identifier] AND VC1.[RNum] = VC2.[RNum] + 1
ORDER BY VC1.[Identifier], VC1.[RNum]
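Since every Value1 change is eventually followed by a Value2 change, one hedged sketch (SQL Server 2008 compatible, table and column names taken from the question) is to detect the two kinds of change separately and then pair each Value1 change with the first Value2 change at or after it for the same identifier:
;WITH Numbered AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY [Identifier] ORDER BY [TimeStamp]) AS RNum,
           [Identifier], [TimeStamp], [Value1], [Value2]
    FROM [Table]
),
V1Change AS (   -- rows where Value1 differs from the previous row
    SELECT cur.[Identifier], cur.[TimeStamp], cur.[Value1]
    FROM Numbered cur
    INNER JOIN Numbered prev
        ON prev.[Identifier] = cur.[Identifier] AND prev.RNum = cur.RNum - 1
    WHERE prev.[Value1] <> cur.[Value1]
),
V2Change AS (   -- rows where Value2 differs from the previous row
    SELECT cur.[Identifier], cur.[TimeStamp], cur.[Value2]
    FROM Numbered cur
    INNER JOIN Numbered prev
        ON prev.[Identifier] = cur.[Identifier] AND prev.RNum = cur.RNum - 1
    WHERE prev.[Value2] <> cur.[Value2]
)
SELECT v1.[Identifier],
       v1.[TimeStamp] AS Value1ChangeTime,
       v1.[Value1],
       v2.[TimeStamp] AS Value2ChangeTime,
       DATEDIFF(SECOND, v1.[TimeStamp], v2.[TimeStamp]) AS SecondsToValue2Change
FROM V1Change v1
OUTER APPLY (SELECT TOP (1) c.[TimeStamp]
             FROM V2Change c
             WHERE c.[Identifier] = v1.[Identifier] AND c.[TimeStamp] >= v1.[TimeStamp]
             ORDER BY c.[TimeStamp]) v2
ORDER BY v1.[Identifier], v1.[TimeStamp];
An index on (Identifier, TimeStamp) helps both the row numbering and the APPLY lookups.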

SSRS. How to group in a group?

I have an SSRS report like the one below, with a Boolean parameter to switch between a 12-hour view and a 24-hour view. To fit the report onto a single screen, the 24-hour view needs to be grouped into 2-hour buckets.
Hourly:
07:00 08:00 09:00 10:00 11:00 12:00 13:00 14:00 ...
Line 1 25 30 24 26 25 25 30 30 ...
Grouped by every 2 hours:
08:00 10:00 12:00 14:00 ...
Line 1 55 50 50 60 ...
The query for the dataset is:
SELECT LineID
,Hour
,HourValue
,Target
FROM vwData
ORDER BY LineID, CASE WHEN [Hour] > 6 THEN - 1 ELSE [Hour] END
How can I achieve this?
This declares your bit variable (which should be true when they want the 24-hour view, false when they want the 12-hour view):
DECLARE @24Hour bit = 0
SELECT CASE WHEN @24Hour = 0
            THEN Hour
            ELSE Hour + (Hour % 2)
       END AS [HourGroup]
      ,SUM(Target) AS [TargetTotal]
FROM vwData
GROUP BY CASE WHEN @24Hour = 0
              THEN Hour
              ELSE Hour + (Hour % 2)
         END
If they want the 24-hour view, we make hour = hour + hour % 2 (7 → 8, 8 → 8, 9 → 10, etc.). If you had a more complex query, I would suggest reading up on CROSS APPLY, but this is simple enough that I think it will suffice. The GROUP BY makes sure to aggregate the real 7 and real 8 hour records (which will both be returned as "8" in the 24-hour view). If you don't group your results, you will get two 8 o'clock records: one with the real 7-hour total and one with the real 8-hour total.
EDIT:
Since you didn't include the schema of your DB, I'm guessing that 'Target' is the value being summed, but it could just as easily be 'HourValue'. Furthermore, I have no idea why you would need LineID, so I omitted it from my answer. But you can easily modify that if it's inaccurate. In the future, you should provide some sample data and your database schema so that others aren't forced to make assumptions or guess.
You could add a calculated field with a value given by something like `Fields!Hour.Value + Fields!Hour.Value Mod 2` and then group on that field, using a parameter to choose the Group By field in the report (your new field or the actual hour value).

Running counts of records and sum of max() records within date range based on specified intervals in T-SQL

Sample data: (assume year_month_record is the first day of the month and is datetime data type)
location item year_month_record type visits1 visits2
ABC111 11JF445553 2014-01 sales 3 5
ABC111 11JF445553 2014-02 sales 3 6
ABC111 11JF445553 2014-03 sales 2 8
ABC111 11JF445553 2014-04 sales 2 4
ABC111 22WZ777814 2014-02 sales 3 5
ABC111 55RR342013 2014-01 nsales 1 2
For the given sample data, I need to count how many times records with the same location and item appear within specified intervals. In addition, I need to grab the maximum value for the specified interval / time frame and sum it up by location, item and type.
The output should look something like this:
location year_month_record length_months type count_unique_visits sum_max_visits1 sum_max_visits2
ABC111 2014-01 3 sales 4 6 13
ABC111 2014-02 3 sales 4 6 12
ABC111 2014-03 3 sales 2 4 12
ABC111 2014-04 3 sales 1 2 4
ABC111 2014-01 3 nsales 1 1 2
Notes for calculating visits1 / visits2 above:
Example output for record 1: max(item 11JF445553) = 3 + max(item 22WZ777814) = 3, so the sum is 6 (item 55RR342013 has a different type). Note 2: all records whose max is summed fall within the specified "length_months" of 3 months, i.e. 2014-01 through 2014-03.
A new "type" will cause a new grouping to start.
Additional notes:
count_unique_visits is the count for each record within date range
length_months is defined prior to execution and can be hardcoded
current year_month_record + length_months (i.e. 2014-01 year_month_record with length_months = 3) is 01/2014 through 03/2014
I've tried creating a recursive CTE to select the count and max, but I'm doing something wrong.
Basically, I need to be able to recursively grab a count and the max visits1/visits2 for a given interval.
Starting with 01/2014, it would need to look for the max(visits1/2) for the next three months (basically, 01/2014 - 04/2014) and return those. In 02/2014, it would use the range of 02/2014 through 05/2014 and return the max there as well. It would continue this throughout the recordset. The interval would be 3 months, but then I could copy the query and replace with 6 months and so on and so forth.
Closing this topic to ask a more targeted/specific question.
Any help would be appreciated.
You can use a combination of a grouping subquery followed by a cross apply subquery:
DECLARE @len int = 3
SELECT grp.*, SUM(ca.cuv) AS count_unique_visits, SUM(ca.visits1) AS sum_max_visits1, SUM(ca.visits2) AS sum_max_visits2
FROM
    (SELECT v.location, v.year_month_record, v.type
     FROM Visits v
     GROUP BY v.location, v.year_month_record, v.type) grp
CROSS APPLY
    (SELECT COUNT(*) AS cuv, MAX(visits1) AS visits1, MAX(visits2) AS visits2
     FROM Visits ca_v
     WHERE ca_v.location = grp.location AND ca_v.type = grp.type
       AND ca_v.year_month_record >= grp.year_month_record
       AND ca_v.year_month_record < DATEADD(month, @len, grp.year_month_record)
     GROUP BY ca_v.item
    ) ca
GROUP BY grp.location, grp.year_month_record, grp.type
ORDER BY grp.type DESC, grp.year_month_record
You can see the results in this SQLFiddle.
NOTE: As I wrote in the comment to the original question, I suspect you have a mistake in the requested output, if not, please explain...
