How to merge data to form a panel? - dataset

I have two data frames. Data frame "weather" looks like this:
weather<-data.frame(Date=c("2012-04-01","2012-04-02","2012-04-03","2012-04-04"),Day=c("Sunday","Monday","Tuesday","Wednesday"), Temperature=c(86,89,81,80))
Date Day Temperature
2012-04-01 Sunday 86
2012-04-02 Monday 89
2012-04-03 Tuesday 81
2012-04-04 Wednesday 80
And, data frame "Regularity", looks like this:
Regularity<-data.frame(Date=c("2012-04-02","2012-04-04","2012-04-03","2012-04-04"),EmployeeID=c(1,1,2,2),Attendance=c(1,1,1,1))
Date EmployeeID Attendance
2012-04-02 1 1
2012-04-04 1 1
2012-04-03 2 1
2012-04-04 2 1
I want to create a panel dataframe in R of the form:
Date Day Temperature EmployeeID Attendance
2012-04-01 Sunday 86 1 0
2012-04-02 Monday 89 1 1
2012-04-03 Tuesday 81 1 0
2012-04-04 Wednesday 80 1 1
2012-04-01 Sunday 86 2 0
2012-04-02 Monday 89 2 0
2012-04-03 Tuesday 81 2 1
2012-04-04 Wednesday 80 2 1
I have tried merge and the reshape2 package without success. I would be very grateful for any help. Thank you.

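Here is how. Suppose tb1 is the first table (weather) and tb2 is the second (Regularity). With the reshape2 package loaded, the following gives the desired result:
library(reshape2)
# Spread attendance into one column per employee, keyed by Date
tb2_tf<-dcast(tb2,Date~EmployeeID,value.var="Attendance")
# Outer-merge with the weather data on Date, then melt back to one row per Date and EmployeeID
tb<-melt(merge(tb1,tb2_tf,all=TRUE),id=1:3,variable.name="EmployeeID",value.name="Attendance")
tb$Attendance[is.na(tb$Attendance)] <- 0  # dates with no attendance record come out as NA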
tb
Date Day Temperature EmployeeID Attendance
1 2012-04-01 Sunday 86 1 0
2 2012-04-02 Monday 89 1 1
3 2012-04-03 Tuesday 81 1 0
4 2012-04-04 Wednesday 80 1 1
5 2012-04-01 Sunday 86 2 0
6 2012-04-02 Monday 89 2 0
7 2012-04-03 Tuesday 81 2 1
8 2012-04-04 Wednesday 80 2 1
I would like to see the solution without the reshape part. I suspect there is one using some form of theta join.
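For what it's worth, the join-only idea can be sketched as a cross join of every employee with every weather row, followed by a lookup of the attendance record. The sketch below is in Python rather than R, and the function name build_panel is made up for illustration. In R, merge with by = NULL should give the same Cartesian product, but I have not tested that against this data.

```python
from itertools import product

# Sample data copied from the question.
weather = [
    ("2012-04-01", "Sunday", 86),
    ("2012-04-02", "Monday", 89),
    ("2012-04-03", "Tuesday", 81),
    ("2012-04-04", "Wednesday", 80),
]
regularity = [
    ("2012-04-02", 1, 1),
    ("2012-04-04", 1, 1),
    ("2012-04-03", 2, 1),
    ("2012-04-04", 2, 1),
]


def build_panel(weather, regularity):
    """Cross join employees with weather rows, then look up attendance (default 0)."""
    attended = {(date, emp): att for date, emp, att in regularity}
    employees = sorted({emp for _, emp, _ in regularity})
    return [
        (date, day, temp, emp, attended.get((date, emp), 0))
        for emp, (date, day, temp) in product(employees, weather)
    ]


panel = build_panel(weather, regularity)
for row in panel:
    print(*row)
```

Each employee gets one row per date, and any (date, employee) pair without an attendance record falls back to 0, which replaces the NA-filling step of the reshape approach.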

Related

SQL Server PIVOT creating all Null Values

I have the following data I am trying to pivot. My goal is one row for each Label, and each week becomes a column with the rate as the week's value.
Label  Week     Rate
51220  Week 0   -11
51220  Week 1   -41
51220  Week 2   159
51220  Week 3   117
51220  Week 4   207
51220  Week 5   -37
51220  Week 6   138
51220  Week 7   139
51220  Week 8   -42
51220  Week 9   -45
51220  Week 10  -82
51220  Week 11  -85
51220  Week 12  -25
51347  Week 0   23
51347  Week 1   24
51347  Week 2   25
51347  Week 3   25
51347  Week 4   25
51347  Week 5   24
51347  Week 6   24
51347  Week 7   24
51347  Week 8   24
51347  Week 9   24
51347  Week 10  24
51347  Week 11  24
51347  Week 12  23
Here is my query:
SELECT * FROM table1
PIVOT (
SUM(Rate) FOR Week IN (Week0,Week1,Week2,Week3,Week4,Week5,Week6,Week7,Week8,Week9,Week10,Week11,Week12)
) pivot_table;
The results are always NULL. What am I doing incorrectly? I have followed several tutorials with no success.
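Yeah, those brackets will do it. The names in the IN list have to match the Week values exactly, space included, so write [Week 0] rather than Week0: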
SELECT *
FROM (VALUES('51220','Week 0',-11)
,('51220','Week 1',-41)
,('51347','Week 1', 24)
) table1(Label, Week, Rate)
PIVOT (SUM(rate) FOR WEEK IN ([Week 0],[Week 1])) AS pivot_table
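To see why the names matter, here is a rough Python sketch of what a pivot does: it sums the rates per label and then reads out only the column names it was asked for. The helper pivot_sum is hypothetical and only mimics the matching behaviour, it is not how SQL Server implements PIVOT.

```python
from collections import defaultdict

# Same three rows as the VALUES list in the query above.
rows = [
    ("51220", "Week 0", -11),
    ("51220", "Week 1", -41),
    ("51347", "Week 1", 24),
]


def pivot_sum(rows, columns):
    """One output row per label; each requested column holds the summed rate or None."""
    totals = defaultdict(dict)
    for label, week, rate in rows:
        totals[label][week] = totals[label].get(week, 0) + rate
    return {
        label: [weeks.get(col) for col in columns]
        for label, weeks in totals.items()
    }


no_match = pivot_sum(rows, ["Week0", "Week1"])      # names without the space
matched = pivot_sum(rows, ["Week 0", "Week 1"])     # names exactly as stored
print(no_match)
print(matched)
```

With the space missing, no stored Week value equals a requested column, so every cell stays empty, which is the all-NULL result from the question.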

SQL Server, running total, reset for each month and sum again

I have a calendar table where working days are marked.
Now I need a running total called "current_working_day" which sums up the working days until the end of a month and restarts again.
This is my query:
select
WDAYS.Date,
WDAYS.DayName,
WDAYS.WorkingDay,
sum(WDAYS.WorkingDay) OVER(order by (Date), MONTH(Date), YEAR(Date)) as 'current_working_day',
sum(WDAYS.WorkingDay) OVER(PARTITION by YEAR(WDAYS.Date), MONTH(WDAYS.Date) ) total_working_days_per_month
from WDAYS
where YEAR(WDAYS.Date) = 2022
This is my current output
Date DayName WorkingDay current_working_day total_working_days_per_month
2022-01-27 Thursday 1 19 21
2022-01-28 Friday 1 20 21
2022-01-29 Saturday 0 20 21
2022-01-30 Sunday 0 20 21
2022-01-31 Monday 1 21 21
2022-02-01 Tuesday 1 22 20
2022-02-02 Wednesday 1 23 20
2022-02-03 Thursday 1 24 20
But the column "current_working_day" should be like this:
Date DayName WorkingDay current_working_day total_working_days_per_month
2022-01-27 Thursday 1 19 21
2022-01-28 Friday 1 20 21
2022-01-29 Saturday 0 20 21
2022-01-30 Sunday 0 20 21
2022-01-31 Monday 1 21 21
2022-02-01 Tuesday 1 1 20
2022-02-02 Wednesday 1 2 20
2022-02-03 Thursday 1 3 20
Thanks for any advice.
Try partitioning by EOMONTH(Date). Every date in a month maps to the same month-end date, so the running total restarts each month and you only need to order by Date inside the partition. It may also perform better than applying YEAR and MONTH to the date separately.
select
WDAYS.Date,
WDAYS.DayName,
WDAYS.WorkingDay,
sum(WDAYS.WorkingDay) OVER(PARTITION by EOMONTH(WDAYS.Date) order by Date) as 'current_working_day',
sum(WDAYS.WorkingDay) OVER(PARTITION by EOMONTH(WDAYS.Date) ) total_working_days_per_month
from WDAYS
where YEAR(WDAYS.Date) = 2022
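The partition-then-accumulate logic can be sketched outside SQL as well. This Python version groups the rows by (year, month) and keeps a running sum inside each group. The function name is made up, and the data is only the eight rows shown in the question, so January starts counting at 1 here rather than at 19.

```python
from datetime import date
from itertools import accumulate, groupby

# (date, working_day) pairs taken from the question's sample rows.
days = [
    (date(2022, 1, 27), 1),
    (date(2022, 1, 28), 1),
    (date(2022, 1, 29), 0),
    (date(2022, 1, 30), 0),
    (date(2022, 1, 31), 1),
    (date(2022, 2, 1), 1),
    (date(2022, 2, 2), 1),
    (date(2022, 2, 3), 1),
]


def running_total_by_month(days):
    """Running sum of working days that restarts at the start of each month."""
    result = []
    ordered = sorted(days)
    for _, group in groupby(ordered, key=lambda r: (r[0].year, r[0].month)):
        group = list(group)
        totals = accumulate(flag for _, flag in group)
        result.extend((d, flag, total) for (d, flag), total in zip(group, totals))
    return result


running = running_total_by_month(days)
for d, flag, total in running:
    print(d, flag, total)
```

The February rows come out as 1, 2, 3, which is the reset the question asks for.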

SSRS - Using specific Row Number

I have a SQL query where I am getting the row number for a count of employees per division and per month at the beginning of the month and the end of the month. To do that, I use a payroll end date which is a weekly date. So in essence I have 4 dates where employee counts are shown. Some months have 5 dates which makes the row count for that month 5 instead of 4.
I then need to build an SSRS report to show only the first employee count and the last employee count per division, per month. I have the first number since I am using =IIF(Fields!RowNumber.Value = 1, Fields!EMPCOUNT.Value, 0)
The problem I have now is getting the last employee count where I need to conditionally select a count where row number needs to be 5 if exists or 4 if it doesn't exist. I'm not sure how to get the expression to work in SSRS. Sample data is below.
PRCo EMPCOUNT udDivision PREndDate ROWNUM Type
1 89 Civil 2018-01-06 00:00:00 1 1
1 97 Civil 2018-01-13 00:00:00 2 1
1 97 Civil 2018-01-20 00:00:00 3 1
1 97 Civil 2018-01-27 00:00:00 4 1
1 16 Colorado 2018-01-06 00:00:00 1 1
1 18 Colorado 2018-01-13 00:00:00 2 1
1 14 Colorado 2018-01-20 00:00:00 3 1
1 10 Colorado 2018-01-27 00:00:00 4 1
1 94 Civil 2018-02-03 00:00:00 1 2
1 91 Civil 2018-02-10 00:00:00 2 2
1 92 Civil 2018-02-17 00:00:00 3 2
1 91 Civil 2018-02-24 00:00:00 4 2
1 16 Colorado 2018-02-03 00:00:00 1 2
1 16 Colorado 2018-02-10 00:00:00 2 2
1 18 Colorado 2018-02-17 00:00:00 3 2
1 19 Colorado 2018-02-24 00:00:00 4 2
1 92 Civil 2018-03-03 00:00:00 1 3
1 91 Civil 2018-03-10 00:00:00 2 3
1 88 Civil 2018-03-17 00:00:00 3 3
1 92 Civil 2018-03-24 00:00:00 4 3
1 90 Civil 2018-03-31 00:00:00 5 3
1 19 Colorado 2018-03-03 00:00:00 1 3
1 26 Colorado 2018-03-10 00:00:00 2 3
1 25 Colorado 2018-03-17 00:00:00 3 3
1 27 Colorado 2018-03-24 00:00:00 4 3
1 24 Colorado 2018-03-31 00:00:00 5 3
I would do this in your query rather than trying to get it to work directly in SSRS. There might be a simpler way than this but this is just based on your existing query.
Please note this is untested and just off the top of my head so it may need some editing before it will work.
SELECT * INTO #t FROM YOUR_EXISTING_QUERY
SELECT DISTINCT
PRCo
, udDivision
, YEAR(PREndDate) AS Yr
, MONTH(PREndDate) AS Mnth
, FIRST_VALUE(EMPCOUNT) OVER(PARTITION BY PRCo, udDivision, YEAR(PREndDate), MONTH(PREndDate) ORDER BY ROWNUM) AS OpeningEMPCOUNT
-- Without an explicit frame, LAST_VALUE only looks as far as the current row
, LAST_VALUE(EMPCOUNT) OVER(PARTITION BY PRCo, udDivision, YEAR(PREndDate), MONTH(PREndDate) ORDER BY ROWNUM ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Closing_EMPCOUNT
FROM #t
You might need to include Type; I'm not sure what it does, but hopefully you get the idea.
The FIRST_VALUE and LAST_VALUE functions return the first and last value within the partition, here PRCo, udDivision, and the year and month of the payroll end date. The ORDER BY clause, here the row number, determines which rows count as first and last. LAST_VALUE also needs the explicit frame (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), because the default frame ends at the current row and would just return each row's own EMPCOUNT.
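As a cross-check of what the query should return, here is a small Python sketch that picks the opening and closing count per division and month. It uses the Civil rows for January and March from the sample data and orders by the payroll end date, which gives the same order as ROWNUM in that sample. The function name opening_and_closing is made up for illustration.

```python
from collections import defaultdict
from datetime import date

# (udDivision, PREndDate, EMPCOUNT) for Civil, January and March 2018, from the sample data.
rows = [
    ("Civil", date(2018, 1, 6), 89),
    ("Civil", date(2018, 1, 13), 97),
    ("Civil", date(2018, 1, 20), 97),
    ("Civil", date(2018, 1, 27), 97),
    ("Civil", date(2018, 3, 3), 92),
    ("Civil", date(2018, 3, 10), 91),
    ("Civil", date(2018, 3, 17), 88),
    ("Civil", date(2018, 3, 24), 92),
    ("Civil", date(2018, 3, 31), 90),
]


def opening_and_closing(rows):
    """Map (division, year, month) to (first EMPCOUNT, last EMPCOUNT) by end date."""
    groups = defaultdict(list)
    for division, end_date, count in rows:
        groups[(division, end_date.year, end_date.month)].append((end_date, count))
    summary = {}
    for key, entries in groups.items():
        entries.sort()  # order by payroll end date within the month
        summary[key] = (entries[0][1], entries[-1][1])
    return summary


summary = opening_and_closing(rows)
for key, (opening, closing) in summary.items():
    print(key, opening, closing)
```

It does not matter whether a month has four or five payroll dates, because the last entry is taken by position rather than by a fixed row number.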

Read next record?

We are on MS SQL Server 2012. Users want to know whether a patient's next admission falls within 30 days, for any reason. The patient doesn't have to be seen by the same provider.
I am not sure how to read the next record and, if the MRN ID is the same, calculate the difference in days between the current record and the next record.
For example:
Record 1 : MRNID =33 Discharge date = 1/1/2016
Record 2 : MRNID = 33 Admission date = 2/2/2016
MRNIDs are the same, so I calculate. Then I compare record 2 to record 3 and do the same process.
Use the LEAD() window function:
select mrnid,
admission_date,
discharge_date,
lead(admission_date) over (partition by mrnid order by admission_date) next_date
from table;
SAMPLE OUTPUT
mrnid admission_date next_date
33 2016-01-01 2016-01-02
33 2016-01-02 2016-01-03
33 2016-01-03 2016-01-04
33 2016-01-04 null
34 2016-01-01 2016-01-02
34 2016-01-02 2016-01-03
34 2016-01-03 2016-01-04
34 2016-01-04 null
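The query above only fetches the next admission date; the day difference still has to be computed from it. A minimal Python sketch of the whole calculation follows, assuming the gap is measured from the current discharge date to the next admission date for the same MRN. The first stay's admission date and all later rows are invented to extend the example in the question.

```python
from datetime import date

# Hypothetical stays: (mrn_id, admission_date, discharge_date).
stays = [
    (33, date(2015, 12, 28), date(2016, 1, 1)),
    (33, date(2016, 2, 2), date(2016, 2, 5)),
    (33, date(2016, 2, 20), date(2016, 2, 22)),
    (34, date(2016, 1, 10), date(2016, 1, 12)),
]


def days_to_next_admission(stays):
    """For each stay, days from discharge to the same patient's next admission, else None."""
    ordered = sorted(stays)  # by MRN, then admission date
    result = []
    for current, nxt in zip(ordered, ordered[1:] + [None]):
        mrn, admitted, discharged = current
        if nxt is not None and nxt[0] == mrn:
            gap = (nxt[1] - discharged).days
        else:
            gap = None  # no later stay for this patient
        result.append((mrn, admitted, discharged, gap))
    return result


gaps = days_to_next_admission(stays)
for mrn, admitted, discharged, gap in gaps:
    readmitted = gap is not None and gap < 30
    print(mrn, discharged, gap, readmitted)
```

Pairing each row with its successor is the same thing LEAD does inside a partition; the MRN comparison plays the role of PARTITION BY mrnid.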

MS SQL: Group by date [duplicate]

ID DateTime EmailCount
93 6/1/2014 00:00:00 4
94 6/2/2014 00:00:00 4
95 6/3/2014 00:00:00 2
96 6/4/2014 00:00:00 2
97 6/5/2014 00:00:00 2
98 6/6/2014 00:00:00 2
99 6/7/2014 00:00:00 2
73 6/8/2014 00:00:00 2
74 6/9/2014 00:00:00 2
75 6/10/2014 00:00:00 4
76 6/11/2014 00:00:00 4
77 6/12/2014 00:00:00 2
78 6/13/2014 00:00:00 2
79 6/14/2014 00:00:00 2
80 6/16/2014 00:00:00 2
81 6/17/2014 00:00:00 4
82 6/18/2014 00:00:00 4
83 6/19/2014 00:00:00 4
84 6/20/2014 00:00:00 4
100 6/21/2014 00:00:00 4
101 6/22/2014 00:00:00 4
102 6/23/2014 00:00:00 4
103 6/24/2014 00:00:00 4
89 6/27/2014 00:00:00 4
90 6/28/2014 00:00:00 4
91 6/29/2014 00:00:00 4
92 6/30/2014 00:00:00 4
104 7/1/2014 00:00:00 4
105 7/2/2014 00:00:00 4
106 7/3/2014 00:00:00 4
121 7/6/2014 00:00:00 2
122 7/7/2014 00:00:00 2
123 7/8/2014 00:00:00 2
Generated Output
Startdate EndDate EmailCount
6/3/2014 00:00:00 6/14/2014 00:00:00 2
6/16/2014 00:00:00 6/16/2014 00:00:00 2
7/6/2014 00:00:00 7/8/2014 00:00:00 2
6/1/2014 00:00:00 6/11/2014 00:00:00 4
6/17/2014 00:00:00 6/24/2014 00:00:00 4
6/27/2014 00:00:00 7/3/2014 00:00:00 4
The generated output is not what I want: StartDate to EndDate should form groups such as (6/3/2014 to 6/9/2014, EmailCount = 2), (6/10/2014 to 6/11/2014, EmailCount = 4) and (6/12/2014 to 6/14/2014, EmailCount = 2). Also, dates that are not in the database should not be added to a group.
The query is somewhat complex to explain, but here is an attempt.
If the time is always midnight, you can use a common table expression to assign a row number to each row within its emailid, and group by the difference between the date and the row number. As long as the sequence is not broken (i.e. the dates are consecutive and have the same emailid), the rows end up in the same group, and an outer query can extract the start and end date of each group:
WITH cte AS (
SELECT dateandtime, emailid,
ROW_NUMBER() OVER (PARTITION BY emailid ORDER BY dateandtime) rn
FROM mytable
)
SELECT MIN(dateandtime) start_time,
MAX(dateandtime) end_time,
emailid
-- Group by emailid as well, so runs of different emailids that share a date offset stay apart
FROM cte GROUP BY emailid, DATEADD(d, -rn, dateandtime) ORDER BY start_time
If the datetimes are not always midnight, the grouping will fail. If that's the case, you could add a common table expression that converts the datetime to a date as a separate step before running this query.
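The date-minus-row-number trick is easier to follow on a small example. The Python sketch below applies it to the question's rows from 6/1/2014 to 6/16/2014, with dates already reduced to plain dates. The function name islands is made up, and this is only an illustration of the grouping idea, not a replacement for the query.

```python
from datetime import date, timedelta
from itertools import groupby

# (day of June 2014, EmailCount) pairs from the question, 6/1 through 6/16.
sample = [(1, 4), (2, 4), (3, 2), (4, 2), (5, 2), (6, 2), (7, 2), (8, 2),
          (9, 2), (10, 4), (11, 4), (12, 2), (13, 2), (14, 2), (16, 2)]
rows = [(date(2014, 6, d), count) for d, count in sample]


def islands(rows):
    """Return (start, end, count) for each run of consecutive days with the same count."""
    by_count = {}
    for day, count in sorted(rows):
        by_count.setdefault(count, []).append(day)
    result = []
    for count, days in by_count.items():
        numbered = enumerate(days, start=1)  # row number within this count
        # day minus row number stays constant while the days are consecutive
        for _, run in groupby(numbered, key=lambda p: p[1] - timedelta(days=p[0])):
            run_days = [day for _, day in run]
            result.append((run_days[0], run_days[-1], count))
    return sorted(result)


groups = islands(rows)
for start, end, count in groups:
    print(start, end, count)
```

The missing 6/15/2014 breaks the last run, so 6/16/2014 forms its own group and no date absent from the data is pulled into a range.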
You're looking for runs of consecutive dates in blocks with the same EmailID. This assumes you have no gaps in the dates. I'm not sure it's the most elegant approach but you can find a lot of stuff on this topic.
with BlockStart as (
-- a row starts a block when the previous day has no row with the same EmailID
select t1.StartDate, t1.EmailID
from T as t1 left outer join T as t2
on t2.StartDate = t1.StartDate - 1 and t2.EmailID = t1.EmailID
where t2.StartDate is null
union all
-- sentinel row so the last block also has a following "start"
select max(StartDate) + 1, null
from T
)
select
StartDate,
(select min(bs2.StartDate) - 1 from BlockStart as bs2 where bs2.StartDate > bs.StartDate) as EndDate,
EmailID
from BlockStart as bs
where
EmailID is not null
-- /* or */ exists (select 1 from BlockStart as bs3 where bs3.StartDate > bs.StartDate)
