SQL - building a fact table with calendar day granularity - sql-server

I have a dataset (DATASET1) that lists all employees with their Dept IDs, the date they started and the date they were terminated.
I'd like my query to return a dataset in which every row represents one day on which an employee was employed, together with the number of days worked so far (start date to that date).
How do I write this query? Thanks in advance for your help.
DATASET1
DeptID EmployeeID StartDate EndDate
--------------------------------------------
001 123 20100101 20120101
001 124 20100505 20130101
DATASET2
DeptID EmployeeID Date #ofDaysWorked
--------------------------------------------
001 123 20100101 1
001 123 20100102 2
001 123 20100103 3
001 123 20100104 4
.... .... ........ ...
EDIT: My goal is to build a fact table that will be used to derive measures in SSAS. The measure I am building is 'average length of employment'. It will be deployed in a dashboard where users can select a calendar period and drill down into months, weeks and days. That's why I need to start with such a large dataset. Maybe I can accomplish this with MDX queries instead, but how?

You can use a recursive CTE to perform this:
;with data (deptid, employeeid, inc_date, enddate) as
(
    -- anchor: one row per employee, starting at the start date
    select deptid, employeeid, startdate, enddate
    from yourtable
    union all
    -- recursive step: add one day until the end date is reached
    select deptid, employeeid,
        dateadd(d, 1, inc_date),
        enddate
    from data
    where dateadd(d, 1, inc_date) <= enddate
)
select deptid,
    employeeid,
    inc_date,
    rn NoOfDaysWorked
from
(
    select deptid, employeeid,
        inc_date,
        row_number() over(partition by deptid, employeeid
                          order by inc_date) rn
    from data
) src
OPTION(MAXRECURSION 0)
The result is similar to this:
| DEPTID | EMPLOYEEID | DATE | NOOFDAYSWORKED |
-----------------------------------------------------
| 1 | 123 | 2010-01-01 | 1 |
| 1 | 123 | 2010-01-02 | 2 |
| 1 | 123 | 2010-01-03 | 3 |
| 1 | 123 | 2010-01-04 | 4 |
| 1 | 123 | 2010-01-05 | 5 |
| 1 | 123 | 2010-01-06 | 6 |
| 1 | 123 | 2010-01-07 | 7 |
| 1 | 123 | 2010-01-08 | 8 |
| 1 | 123 | 2010-01-09 | 9 |
| 1 | 123 | 2010-01-10 | 10 |
| 1 | 123 | 2010-01-11 | 11 |
| 1 | 123 | 2010-01-12 | 12 |
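
As an alternative to the recursion above, the same rows can be produced by joining to a tally (numbers) set. This is only a sketch under assumptions: the table variable @DATASET1 and its column types are hypothetical stand-ins for the question's data, the tally is built from sys.all_objects, and the cap of 10000 days (about 27 years of employment) is an arbitrary choice that would need raising for longer tenures.

```sql
-- Hypothetical sample data mirroring DATASET1 from the question.
DECLARE @DATASET1 TABLE (DeptID char(3), EmployeeID int, StartDate date, EndDate date);
INSERT INTO @DATASET1 VALUES
    ('001', 123, '20100101', '20120101'),
    ('001', 124, '20100505', '20130101');

WITH n AS (
    -- Tally of offsets 0..9999; the cap is an assumption, not a requirement.
    -- ROW_NUMBER() returns bigint, so cast to int for DATEADD.
    SELECT TOP (10000)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS int) AS i
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
SELECT d.DeptID,
       d.EmployeeID,
       DATEADD(DAY, n.i, d.StartDate) AS [Date],
       n.i + 1                        AS NoOfDaysWorked  -- start date counts as day 1
FROM @DATASET1 AS d
JOIN n
  ON n.i <= DATEDIFF(DAY, d.StartDate, d.EndDate)        -- one row per employed day
ORDER BY d.EmployeeID, [Date];
```

Because nothing recurses here, no MAXRECURSION hint is needed; the trade-off is the fixed upper bound on the number of days per employee.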

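SELECT DeptID, EmployeeID, StartDate, DATEDIFF(DAY, StartDate, '20110301') AS NoOfDaysWorked
FROM DATASET1
This returns one row per employee with the days worked as of a single fixed date (here 2011-03-01), not one row per day. See if that works for you!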

Related

Select row satisfying certain condition and rows next to it

Let's say I have a historical table keeping who has modified data
-------------------------------------------------------------
| ID | Last_Modif | User_Modif | Col3, Col4...
-------------------------------------------------------------
| 1 | 2018-04-09 12:12:00 | John
| 2 | 2018-04-09 11:10:00 | Jim
| 3 | 2018-04-09 11:05:00 | Mary
| 4 | 2018-04-09 11:00:00 | John
| 5 | 2018-04-09 10:56:00 | David
| 6 | 2018-04-09 10:53:00 | John
| 7 | 2018-04-08 19:50:00 | Eric
| 8 | 2018-04-08 18:50:00 | Chris
| 9 | 2018-04-08 15:50:00 | John
| 10 | 2018-04-08 12:50:00 | Chris
----------------------------------------------------------
I would like to find the modifications made by John together with the version immediately before each one, so I can check what he changed. In this scenario I would like to return rows 1, 2, 4, 5, 6, 7, 9 and 10.
I am thinking of ranking the rows by Last_Modif first and then joining to pick up the next row, but somehow the result is not correct. This does not seem to be a LAG/LEAD case, since I am not picking a single value from the next row but the whole next row. Any ideas?
-- sample 1000 rows with RowNumber
with TopRows as
(select top 1000 *, ROW_NUMBER() OVER(ORDER BY Last_modif desc) RowNum from [Table])
--Reference rows : Rows modif by John
, ModifByJohn as
(Select * from TopRows where USER_MODIF = 'John')
select * from ModifByJohn
UNION
select ModifByNext.* from ModifByJohn join TopRows ModifbyNext on ModifByJohn.RowNum + 1 = ModifByNext.RowNum
order by RowNum
What would the code look like if we wanted to return the last 2 modifications before each of John's instead of 1?
Maybe you can take advantage of your current ID:
with x as
(
select t1.*,
(select top 1 id from tbl where id > t1.id order by id) prev_id
from tbl t1
where t1.User_Modif = 'John'
)
select * from x;
GO
ID | Last_Modif | User_Modif | prev_id
-: | :------------------ | :--------- | ------:
1 | 09/04/2018 12:12:00 | John | 2
4 | 09/04/2018 11:00:00 | John | 5
6 | 09/04/2018 10:53:00 | John | 7
9 | 08/04/2018 15:50:00 | John | 10
with x as
(
select t1.*,
(select top 1 id from tbl where id > t1.id order by id) prev_id
from tbl t1
where t1.User_Modif = 'John'
)
select ID, Last_Modif, User_Modif from x
union all
select ID, Last_Modif, User_Modif
from tbl
where ID in (select prev_id from x)
order by ID
GO
ID | Last_Modif | User_Modif
-: | :------------------ | :---------
1 | 09/04/2018 12:12:00 | John
2 | 09/04/2018 11:10:00 | Jim
4 | 09/04/2018 11:00:00 | John
5 | 09/04/2018 10:56:00 | David
6 | 09/04/2018 10:53:00 | John
7 | 08/04/2018 19:50:00 | Eric
9 | 08/04/2018 15:50:00 | John
10 | 08/04/2018 12:50:00 | Chris
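
For the follow-up about returning the last 2 modifications before each of John's, one possible approach is to rank the rows once and join on a range of row numbers. This is a sketch, not a tested production query: the table variable @tbl and the variable @n are hypothetical names, and the data is copied from the question.

```sql
-- Hypothetical copy of the question's history table.
DECLARE @tbl TABLE (ID int PRIMARY KEY, Last_Modif datetime, User_Modif varchar(20));
INSERT INTO @tbl VALUES
    (1,  '2018-04-09T12:12:00', 'John'),
    (2,  '2018-04-09T11:10:00', 'Jim'),
    (3,  '2018-04-09T11:05:00', 'Mary'),
    (4,  '2018-04-09T11:00:00', 'John'),
    (5,  '2018-04-09T10:56:00', 'David'),
    (6,  '2018-04-09T10:53:00', 'John'),
    (7,  '2018-04-08T19:50:00', 'Eric'),
    (8,  '2018-04-08T18:50:00', 'Chris'),
    (9,  '2018-04-08T15:50:00', 'John'),
    (10, '2018-04-08T12:50:00', 'Chris');

-- How many earlier versions to return for each of John's rows (assumed parameter).
DECLARE @n int = 2;

WITH ranked AS (
    -- Newest modification gets rn = 1.
    SELECT ID, Last_Modif, User_Modif,
           ROW_NUMBER() OVER (ORDER BY Last_Modif DESC) AS rn
    FROM @tbl
)
SELECT DISTINCT r.ID, r.Last_Modif, r.User_Modif
FROM ranked AS j
JOIN ranked AS r
  ON r.rn BETWEEN j.rn AND j.rn + @n   -- John's row plus the @n rows before it in time
WHERE j.User_Modif = 'John'
ORDER BY r.Last_Modif DESC;
```

DISTINCT is there because the ranges of two nearby John rows can overlap. With @n = 1 the ranges cover exactly the rows the question asks for (1, 2, 4, 5, 6, 7, 9, 10).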

How to sum up values in a column for each week number?

I need to sum up values from Money column for each WeekNumber.
Now I have view:
WeekNumber | DayTime | Money
---------------------------------------
1 | 2012-01-01 | 20.4
1 | 2012-01-02 | 30.5
1 | 2012-01-03 | 55.1
2 | 2012-02-01 | 67.3
2 | 2012-02-02 | 33.4
3 | 2012-03-01 | 11.8
3 | 2012-03-04 | 23.9
3 | 2012-03-05 | 34.3
4 | 2012-04-01 | 76.6
4 | 2012-04-02 | 90.3
Tsql:
SELECT datepart(week,DayTime) AS WeekNumber, DayTime, Money FROM dbo.Transactions
In conclusion, I would like to get something like this:
WeekNumber | DayTime | Sum
---------------------------------------
1 | 2012-01-01 | 106
2 | 2012-02-02 | 100.7
3 | 2012-03-03 | 70
4 | 2012-04-01 | 166.9
DayTime can be any date from the given week, as long as it exists in the DayTime column of the view above.
Please feel free to share your ideas. Thanks.
SELECT datepart(week,DayTime) AS WeekNumber
, MIN(DayTime) DayTime --<-- Instead of random get first date from your data in that week
, SUM(Money) AS [Sum]
FROM dbo.Transactions
GROUP BY datepart(week,DayTime)
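Try this:
SELECT datepart(week,DayTime) AS WeekNumber, SUM(Money) FROM dbo.Transactions GROUP BY datepart(week,DayTime)
Because each week has several rows, you cannot return the plain DayTime column in the same grouped query. SQL Server also does not allow the column alias WeekNumber in GROUP BY, so the expression is repeated there. There are other ways to add a DayTime value, such as an aggregate or a JOIN.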
Change your SQL to sum the Money column and group by the week only, like this:
SELECT
datepart(week,DayTime) AS WeekNumber,
DayTime = MIN(DayTime), Money = SUM(Money)
FROM dbo.Transactions
GROUP BY datepart(week,DayTime)
SELECT datepart(week, DayTime) AS WeekNumber
,MIN(DayTime)
,SUM(MONEY)
FROM dbo.Transactions
GROUP BY datepart(week, DayTime)
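
One caveat with the queries above: datepart(week, ...) restarts every year, so data that spans more than one year would merge, for example, week 1 of 2012 with week 1 of 2013. A hedged sketch that also groups by year is shown below; the table variable @Transactions is a hypothetical stand-in for dbo.Transactions, filled with the dates from the question.

```sql
-- Hypothetical stand-in for dbo.Transactions.
DECLARE @Transactions TABLE (DayTime date, [Money] decimal(10, 2));
INSERT INTO @Transactions VALUES
    ('20120101', 20.4), ('20120102', 30.5), ('20120103', 55.1),
    ('20120201', 67.3), ('20120202', 33.4),
    ('20120301', 11.8), ('20120304', 23.9), ('20120305', 34.3),
    ('20120401', 76.6), ('20120402', 90.3);

SELECT DATEPART(YEAR, DayTime) AS YearNumber,
       DATEPART(WEEK, DayTime) AS WeekNumber,
       MIN(DayTime)            AS DayTime,   -- first date present in that week
       SUM([Money])            AS [Sum]
FROM @Transactions
GROUP BY DATEPART(YEAR, DayTime), DATEPART(WEEK, DayTime)
ORDER BY YearNumber, WeekNumber;
```

Note that the week numbers computed here come from the actual dates and depend on the session's DATEFIRST setting, so they will not necessarily match the illustrative WeekNumber values in the question's sample.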

Create a View From Max1 of One Column and the Max2 of Another if there are multiple Max1 for group

From my table, I want to select for each project ID the ID with the latest deploymentDate and if there are two identical latest deployment dates for the same project ID, select the ID with the latest submittedOn datetime. So if my table looks like this:
id | projectId | deploymentDate | submittedOn |
1 | 1 | 2017-01-02 | 2017-01-02 13:00:00 |
2 | 1 | 2017-01-04 | 2017-01-04 11:00:00 |
3 | 2 | 2017-01-06 | 2017-01-06 17:00:00 |
4 | 2 | 2017-01-06 | 2017-01-01 12:00:00 |
5 | 3 | 2017-01-02 | 2017-01-02 13:30:00 |
6 | 3 | 2017-01-02 | 2017-01-05 15:00:00 |
7 | 3 | 2017-01-02 | 2017-01-04 10:00:00 |
The desired rows are:
id | projectId | deploymentDate | submittedOn |
2 | 1 | 2017-01-04 | 2017-01-04 11:00:00 |
3 | 2 | 2017-01-06 | 2017-01-06 17:00:00 |
6 | 3 | 2017-01-02 | 2017-01-05 15:00:00 |
You can try the below. Adjust the sorting in the row_number as needed.
select
a.id,
a.projectid,
a.deploymentdate,
a.submittedOn
from project a
inner join
(select
id,
row_number() over (partition by projectid order by deploymentdate desc, submittedOn desc, id) as rid
from project
) as b
on b.id = a.id
and b.rid = 1
This would work:
select t.id, latest.*
from tab t join (
select projectid, max(deploymentdate) deploymentdate, max(submittedon) submittedon
from tab
group by projectid
) latest on t.projectid = latest.projectid and t.deploymentdate = latest.deploymentdate and t.submittedon = latest.submittedon
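I found the latest values per project ID and then joined back to the source table to find the corresponding id. Note that the two MAX values are computed independently, so a project is only returned when its latest submittedOn also falls on a row with its latest deploymentDate; otherwise that project drops out of the result.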
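
Since the title asks for a view, here is a minimal sketch of how the ROW_NUMBER approach could be wrapped in one. The object names dbo.project_demo and dbo.latest_deployment_demo are hypothetical, and the column types are assumptions based on the sample data.

```sql
-- Hypothetical demo table with the question's sample rows.
CREATE TABLE dbo.project_demo (
    id             int PRIMARY KEY,
    projectId      int,
    deploymentDate date,
    submittedOn    datetime
);
INSERT INTO dbo.project_demo VALUES
    (1, 1, '20170102', '2017-01-02T13:00:00'),
    (2, 1, '20170104', '2017-01-04T11:00:00'),
    (3, 2, '20170106', '2017-01-06T17:00:00'),
    (4, 2, '20170106', '2017-01-01T12:00:00'),
    (5, 3, '20170102', '2017-01-02T13:30:00'),
    (6, 3, '20170102', '2017-01-05T15:00:00'),
    (7, 3, '20170102', '2017-01-04T10:00:00');
GO
-- CREATE VIEW must be the first statement in its batch, hence the GO above.
CREATE VIEW dbo.latest_deployment_demo AS
SELECT id, projectId, deploymentDate, submittedOn
FROM (
    SELECT id, projectId, deploymentDate, submittedOn,
           ROW_NUMBER() OVER (PARTITION BY projectId
                              ORDER BY deploymentDate DESC,  -- latest deployment first
                                       submittedOn DESC,     -- tie-break on submission time
                                       id) AS rid
    FROM dbo.project_demo
) AS ranked
WHERE rid = 1;
GO
SELECT * FROM dbo.latest_deployment_demo ORDER BY projectId;
```

On the sample rows this ordering picks ids 2, 3 and 6, which are the rows the question lists as desired.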

SQL Server : how to use variable values from CTE in WHERE clause?

First of all, please correct me if my title is not specific or clear enough.
I have used the following code to generate the start dates and end dates:
DECLARE @start_date date, @end_date date;
SET @start_date = '2016-07-01';
with dates as
(
select
@start_date AS startDate,
DATEADD(DAY, 6, @start_date) AS endDate
union all
select
DATEADD(DAY, 7, startDate) AS startDate,
DATEADD(DAY, 7, endDate) AS endDate
from
dates
where
startDate < '2017-03-31'
)
select * from dates
Below is part of the output from above query :
+------------+------------+
| startDate | endDate |
+------------+------------+
| 2016-07-01 | 2016-07-07 |
| 2016-07-08 | 2016-07-14 |
| 2016-07-15 | 2016-07-21 |
| 2016-07-22 | 2016-07-28 |
| 2016-07-29 | 2016-08-04 |
+------------+------------+
Now I have another table named sales, which has 3 columns, sales_ID, sales_date and sales_amount, as below:
+----------+------------+--------------+
| sales_ID | sales_date | sales_amount |
+----------+------------+--------------+
| 1 | 2016-07-04 | 10 |
| 2 | 2016-07-06 | 20 |
| 3 | 2016-07-13 | 30 |
| 4 | 2016-07-19 | 15 |
| 5 | 2016-07-21 | 20 |
| 6 | 2016-07-25 | 25 |
| 7 | 2016-07-26 | 40 |
| 8 | 2016-07-29 | 20 |
| 9 | 2016-08-01 | 30 |
| 10 | 2016-08-02 | 30 |
| 11 | 2016-08-03 | 40 |
+----------+------------+--------------+
How can I write a query that shows the total sales amount for each week (that is, between each startDate and endDate from the first table)? I suppose I will need a recursive query with a WHERE clause that checks whether the dates fall between startDate and endDate, but I can't find a working example.
Here is my expected result (startDate and endDate are the records from the first table):
+------------+------------+--------------+
| startDate | endDate | sales_amount |
+------------+------------+--------------+
| 2016-07-01 | 2016-07-07 | 30 |
| 2016-07-08 | 2016-07-14 | 30 |
| 2016-07-15 | 2016-07-21 | 35 |
| 2016-07-22 | 2016-07-28 | 65 |
| 2016-07-29 | 2016-08-04 | 120 |
+------------+------------+--------------+
Thank you!
Your final SELECT (after the CTE) should be something like this:
Select D.*
,Sales_Amount = sum(S.sales_amount)
From dates D
Join Sales S on (S.sales_date between D.startDate and D.endDate)
Group By D.startDate,D.endDate
Order By D.startDate
EDIT: You could use a Left Join if you want to see weeks that have no rows in Sales.
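
Putting the question's CTE and the join together, a self-contained sketch could look like the following. The table variable @sales is a hypothetical stand-in for the sales table, and the LEFT JOIN with COALESCE is one way to keep weeks without sales as 0 rather than dropping them.

```sql
DECLARE @start_date date = '2016-07-01';

-- Hypothetical stand-in for the sales table, using the question's rows.
DECLARE @sales TABLE (sales_ID int, sales_date date, sales_amount int);
INSERT INTO @sales VALUES
    (1, '20160704', 10), (2, '20160706', 20), (3, '20160713', 30),
    (4, '20160719', 15), (5, '20160721', 20), (6, '20160725', 25),
    (7, '20160726', 40), (8, '20160729', 20), (9, '20160801', 30),
    (10, '20160802', 30), (11, '20160803', 40);

WITH dates AS (
    -- anchor: first 7-day window
    SELECT @start_date AS startDate,
           DATEADD(DAY, 6, @start_date) AS endDate
    UNION ALL
    -- recursive step: shift the window by one week
    SELECT DATEADD(DAY, 7, startDate),
           DATEADD(DAY, 7, endDate)
    FROM dates
    WHERE startDate < '2017-03-31'
)
SELECT D.startDate,
       D.endDate,
       COALESCE(SUM(S.sales_amount), 0) AS sales_amount  -- 0 for weeks with no sales
FROM dates AS D
LEFT JOIN @sales AS S
  ON S.sales_date BETWEEN D.startDate AND D.endDate
GROUP BY D.startDate, D.endDate
ORDER BY D.startDate;
```

The CTE is referenced directly in the final SELECT, so no variables need to be passed into the WHERE clause; the join condition does the date-range matching. The roughly 40 weekly steps stay under the default recursion limit of 100.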

Moving average in Temporal database in PostgreSQL

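How can I apply a moving average in a temporal database?
My data includes temperature readings and I want to compute a moving average over every 15 records.
You can use a window function with a frame clause. The example below averages each row with its immediate neighbours (1 preceding and 1 following); widen the frame to cover 15 records: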
marc=# SELECT entity, name, salary, start_date,
avg(salary) OVER (ORDER BY entity, start_date
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
FROM salary;
entity | name | salary | start_date | avg
-----------+-----------+---------+---------------+----------------------
Accounting | millicent | 850.00 | 2006-01-01 | 825.0000000000000000
Accounting | jack | 800.00 | 2010-05-01 | 916.6666666666666667
R&D | tom | 1100.00 | 2005-01-01 | 966.6666666666666667
R&D | john | 1000.00 | 2008-07-01 | 933.3333333333333333
R&D | maria | 700.00 | 2009-01-01 | 733.3333333333333333
R&D | kevin | 500.00 | 2009-05-01 | 633.3333333333333333
R&D | marc | 700.00 | 2010-02-15 | 600.0000000000000000
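
Adapted to the question's 15-record window, a PostgreSQL sketch might look like this. The table readings and its columns recorded_at and temperature are hypothetical, since the question gives no schema, and the generated values are only filler data.

```sql
-- Hypothetical table; the question does not specify a schema.
CREATE TEMP TABLE readings (recorded_at timestamp, temperature numeric);

-- Filler data: 30 readings, one per minute.
INSERT INTO readings
SELECT timestamp '2020-01-01 00:00' + g * interval '1 minute',
       20 + (g % 5)
FROM generate_series(0, 29) AS g;

SELECT recorded_at,
       temperature,
       -- current row plus the 14 rows before it = 15 records
       avg(temperature) OVER (ORDER BY recorded_at
                              ROWS BETWEEN 14 PRECEDING AND CURRENT ROW) AS moving_avg_15
FROM readings
ORDER BY recorded_at;
```

The first 14 rows are averaged over fewer than 15 records because the frame is cut off at the start of the data. For a centred window, ROWS BETWEEN 7 PRECEDING AND 7 FOLLOWING would be the equivalent frame.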
WITH moving_avg AS (
SELECT 0 AS [lag] UNION ALL
SELECT 1 AS [lag] UNION ALL
SELECT 2 AS [lag] UNION ALL
SELECT 3 AS [lag] --ETC
)
SELECT
DATEADD(day,[lag],[date]) AS [reference_date],
[otherkey1],[otherkey2],[otherkey3],
AVG([value1]) AS [avg_value1],
AVG([value2]) AS [avg_value2]
FROM [data_table]
CROSS JOIN moving_avg
GROUP BY [otherkey1],[otherkey2],[otherkey3],DATEADD(day,[lag],[date])
ORDER BY [otherkey1],[otherkey2],[otherkey3],[reference_date];
