Using SQL Server windowing function to get running total by fiscal year - sql-server

I'm using SQL Server 2014. I have a Claims table containing totals of claims made per month in my system:
+-----------+-------------+------------+
| Claim_ID | Claim_Date | Nett_Total |
+-----------+-------------+------------+
| 1 | 31 Jan 2012 | 321454.67 |
| 2 | 29 Feb 2012 | 523542.34 |
| 3 | 31 Mar 2012 | 35344.33 |
| 4 | 30 Apr 2012 | 142355.63 |
| etc. | etc. | etc. |
+-----------+-------------+------------+
For a report I am writing I need to be able to produce a cumulative running total that resets to zero at the start of each fiscal year (in my country this is from March 1 to February 28/29 of the following year).
The report will look similar to the table, with an extra running total column, something like:
+-----------+-------------+------------+---------------+
| Claim_ID | Claim_Date | Nett_Total | Running Total |
+-----------+-------------+------------+---------------+
| 1 | 31 Jan 2012 | 321454.67 | 321454.67 |
| 2 | 29 Feb 2012 | 523542.34 | 844997.01 |
| 3 | 31 Mar 2012 | 35344.33 | 35344.33 | (restart at 0
| 4 | 30 Apr 2012 | 142355.63 | 177699.96 | for new yr)
| etc. | etc. | etc. | |
+-----------+-------------+------------+---------------+
I know windowing functions are very powerful and I've used them in rudimentary ways in the past to get overall sums and averages while avoiding needing to group my resultset rows. I have an intuition that I will need to employ the 'preceding' keyword to get the running total for the current fiscal year each row falls into, but I can't quite grasp how to express the fiscal year as a concept to use in the 'preceding' clause (or if indeed it's possible to use a date range in this way).
Any assistance on the way of "phrasing" the fiscal year for the "preceding" clause will be of enormous help to me, please.

I think you should try this:
/* Create Table*/
CREATE TABLE dbo.Claims (
Claim_ID int
,Claim_Date datetime
,Nett_Total decimal(10,2)
);
/* Insert Testrows*/
INSERT INTO dbo.Claims VALUES
(1, '20120101', 10000)
,(2, '20120202', 10000)
,(3, '20120303', 10000)
,(4, '20120404', 10000)
,(5, '20120505', 10000)
,(6, '20120606', 10000)
,(7, '20120707', 10000)
,(8, '20120808', 10000);
Query the Data:
SELECT Claim_ID, Claim_Date, Nett_Total,
       SUM(Nett_Total) OVER (PARTITION BY YEAR(DATEADD(month, -2, Claim_Date))
                             ORDER BY Claim_ID) AS [Running Total]
FROM dbo.Claims;
The trick: PARTITION BY YEAR(DATEADD(month,-2,Claim_Date))
This starts a new partition every year, but first shifts each date back two months, so that March 1 maps to January 1 and the year boundary lines up with your fiscal year.
Output:
Claim_ID |Claim_Date |Nett_Total |Running Total
---------+---------------------------+------------+-------------
1 |2012-01-01 00:00:00.000 |10000.00 |10000.00
2 |2012-02-02 00:00:00.000 |10000.00 |20000.00
3 |2012-03-03 00:00:00.000 |10000.00 |10000.00 <- New partition
4 |2012-04-04 00:00:00.000 |10000.00 |20000.00
5 |2012-05-05 00:00:00.000 |10000.00 |30000.00
6 |2012-06-06 00:00:00.000 |10000.00 |40000.00
7 |2012-07-07 00:00:00.000 |10000.00 |50000.00
8 |2012-08-08 00:00:00.000 |10000.00 |60000.00
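Since the question asks specifically about the 'preceding' keyword, here is a minimal sketch of the same idea with the window frame written out explicitly. It assumes the fiscal year starts on March 1. @Claims is a hypothetical table variable standing in for dbo.Claims, loaded with the four sample rows from the question, and Fiscal_Year_Start is an illustrative column name. Ordering by Claim_Date (with Claim_ID as a tie-breaker) avoids relying on IDs being assigned in date order.

```sql
-- Sketch only: @Claims is a hypothetical stand-in for dbo.Claims.
DECLARE @Claims TABLE (
    Claim_ID   int           NOT NULL PRIMARY KEY,
    Claim_Date date          NOT NULL,
    Nett_Total decimal(10,2) NOT NULL
);

INSERT INTO @Claims (Claim_ID, Claim_Date, Nett_Total) VALUES
    (1, '20120131', 321454.67),
    (2, '20120229', 523542.34),
    (3, '20120331',  35344.33),
    (4, '20120430', 142355.63);

SELECT
    Claim_ID,
    Claim_Date,
    Nett_Total,
    -- Shift back two months: Jan/Feb 2012 land in 2011, March 2012 onwards in 2012.
    YEAR(DATEADD(month, -2, Claim_Date)) AS Fiscal_Year_Start,
    SUM(Nett_Total) OVER (
        PARTITION BY YEAR(DATEADD(month, -2, Claim_Date))   -- restart per fiscal year
        ORDER BY Claim_Date, Claim_ID                       -- accumulate in date order
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW    -- explicit running-total frame
    ) AS [Running Total]
FROM @Claims
ORDER BY Claim_Date, Claim_ID;
```

The frame clause only says how far back to look within a partition. The fiscal year itself is expressed by PARTITION BY, so no date range is needed in the frame. With the four sample rows this should reproduce the running totals shown in the question (844997.01 after February, then a restart at 35344.33 in March).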

Related

Sum running total in sql

I am trying to insert a running total column into a SQL Server table as part of a stored procedure. I am needing this for a financial database so I am dealing with accounts and departments. For example, let's say I have this data set:
Account | Dept | Date | Value | Running_Total
--------+--------+------------+----------+--------------
5000 | 40 | 2018-02-01 | 10 | 15
5000 | 40 | 2018-01-01 | 5 | 5
4000 | 40 | 2018-02-01 | 10 | 30
5000 | 30 | 2018-02-01 | 15 | 15
4000 | 40 | 2017-12-01 | 20 | 20
The Running_Total column provides a historical sum of dates less than or equal to each row's date value. However, the account and dept must match for this to be the case.
I was able to get close by using
SUM(Value) OVER (PARTITION BY Account, Dept, Date)
but it does not go back and get the previous months...
Any ideas? Thanks!
You are close. Partition by Account and Dept only, and move Date into an order by:
Sum(Value) over (partition by Account, Dept order by Date)
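As a quick way to check the expression above, here is a small self-contained sketch. @Ledger is a hypothetical table variable holding the five sample rows from the question; the column types are assumptions.

```sql
-- Sketch only: @Ledger is a hypothetical stand-in for the real table.
DECLARE @Ledger TABLE (
    Account int           NOT NULL,
    Dept    int           NOT NULL,
    [Date]  date          NOT NULL,
    [Value] decimal(12,2) NOT NULL
);

INSERT INTO @Ledger (Account, Dept, [Date], [Value]) VALUES
    (5000, 40, '20180201', 10),
    (5000, 40, '20180101',  5),
    (4000, 40, '20180201', 10),
    (5000, 30, '20180201', 15),
    (4000, 40, '20171201', 20);

SELECT
    Account,
    Dept,
    [Date],
    [Value],
    SUM([Value]) OVER (
        PARTITION BY Account, Dept   -- one running total per account/department pair
        ORDER BY [Date]              -- accumulate in date order within the pair
    ) AS Running_Total
FROM @Ledger
ORDER BY Account, Dept, [Date];
```

When ORDER BY is given without a frame, SQL Server defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so two rows with the same Account, Dept and Date would both show the total including each other. If each row should get its own step, add ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW and a tie-breaking column in the ORDER BY.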

Difference in time between current record and previous record (with a lag variable and case statement)

I want to calculate the time between subsequent records in SQL if the records happened on the same day.
I have a table that looks like this with data from January 24th and 25th from the same piece of equipment:
EQUIPMENTNUMBER | TimeOfDay | Month | WeekNumber | Day | Year
10020576 | 4:18:58 AM | 1 | 4 | 24 | 2019
10020576 | 4:57:23 AM | 1 | 4 | 24 | 2019
10020576 | 3:22:47 AM | 1 | 4 | 25 | 2019
10020576 | 4:19:14 AM | 1 | 4 | 25 | 2019
I am able to use a lag variable within a case statement to identify whether the previous record occurred on the same day as the current record. Using the following code, I get the following table:
SELECT
EQUIPMENTNUMBER,
[TimeOfDay],
Month,
WeekNumber,
Day,
Year,
CASE
WHEN Day = lag(day,1,0) over (order by EQUIPMENTNUMBER, YEAR, WeekNumber,
DAY,timeofday)
THEN 1
ELSE 0
END as [PreviousRecordOnSameDay]
FROM [Table]
EQUIPMENTNUMBER | TimeOfDay | Month | WeekNumber | Day | Year | PreviousRecordOnSameDay
10020576 | 4:18:58 AM | 1 | 4 | 24 | 2019 | 0
10020576 | 4:57:23 AM | 1 | 4 | 24 | 2019 | 1
10020576 | 3:22:47 AM | 1 | 4 | 25 | 2019 | 0
10020576 | 4:19:14 AM | 1 | 4 | 25 | 2019 | 1
So now I have an indicator telling me if the previous record occurred on the same day as the current one. Now I want to calculate the difference in time from the previous record to the current one if they occurred on the same day. I use the following SQL and get an error.
SELECT
EQUIPMENTNUMBER,
[TimeOfDay],
Month,
WeekNumber,
Day,
Year,
CASE
WHEN Day = lag(day,1,0) over (order by CEID, YEAR, WeekNumber, DAY,timeofday)
THEN
datediff(minute, lag(timeofday,1,0) over (order by CEID, YEAR, WeekNumber, DAY,timeofday), timeofday)
ELSE 0
END
FROM [Table]
I get the following error:
Msg 206, Level 16, State 2, Line 26
Operand type clash: int is incompatible with time
Can anyone out there please elaborate on what this error means or provide suggestions on how to calculate the difference in a timestamp between subsequent records in SQL Server?
Thanks!
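lag(timeofday,1,0) is the part that is causing the type clash.
timeofday is of type time, but the default specified for the lag function is an int, and the two must have the same type. Either remove the optional default (lag then returns NULL when there is no previous row) or change it to a time value such as '00:00:00.000'. For reference, the syntax is: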
LAG (scalar_expression [,offset] [,default])
OVER ( [ partition_by_clause ] order_by_clause )
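One way to apply this, sketched under assumptions: @Readings is a hypothetical table variable with the four sample rows, the column types are guesses, and MinutesSincePrevious is an illustrative name. Instead of comparing the day inside a CASE, the sketch partitions by equipment and day, so LAG with no default returns NULL for the first record of each day.

```sql
-- Sketch only: @Readings is a hypothetical stand-in for [Table].
DECLARE @Readings TABLE (
    EQUIPMENTNUMBER int     NOT NULL,
    TimeOfDay       time(0) NOT NULL,
    [Month]         int     NOT NULL,
    [Day]           int     NOT NULL,
    [Year]          int     NOT NULL
);

INSERT INTO @Readings (EQUIPMENTNUMBER, TimeOfDay, [Month], [Day], [Year]) VALUES
    (10020576, '04:18:58', 1, 24, 2019),
    (10020576, '04:57:23', 1, 24, 2019),
    (10020576, '03:22:47', 1, 25, 2019),
    (10020576, '04:19:14', 1, 25, 2019);

SELECT
    EQUIPMENTNUMBER,
    TimeOfDay,
    [Month],
    [Day],
    [Year],
    ISNULL(
        DATEDIFF(minute,
                 -- No default: LAG yields NULL (not an int 0) for the first row of a day.
                 LAG(TimeOfDay) OVER (PARTITION BY EQUIPMENTNUMBER, [Year], [Month], [Day]
                                      ORDER BY TimeOfDay),
                 TimeOfDay),
        0) AS MinutesSincePrevious   -- 0 mirrors the ELSE 0 branch in the question
FROM @Readings
ORDER BY EQUIPMENTNUMBER, [Year], [Month], [Day], TimeOfDay;
```

Note that DATEDIFF(minute, ...) counts minute boundaries crossed rather than full 60-second intervals, so for the sample rows the second reading on each day should come out as 39 and 57. Use a finer datepart such as second if exact elapsed time matters.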

Compute lag difference for different days

I need help computing a date difference across different rows with variable lag (specifically, rows that are not on the same day) without subqueries, joins, etc. I think this should be possible with some inline T-SQL window functions that use the OVER (PARTITION BY ...) clause, such as LAG, DENSE_RANK, etc., but I can't quite put my finger on it. This is for SQL Server 2017 Developer edition.
A clarifying example:
Consider a dataset with job beginning and end dates (across various projects). Some jobs start and end on the same day (such as jobs 2 & 3, 4 & 5). I need to compute the idle time between consecutive jobs that started on different days (per project), that is, the number of days between the previous job's ending time and the current job's beginning time. If the previous job started on the same day, then look further back in the history of the same project. In other words, jobs that started on the same day can be considered parts of the same job.
UPDATE: I simplified the code/output by dropping time values (question's history has original dataset).
IF OBJECT_ID('tempdb..#t') IS NOT NULL DROP TABLE #t;
CREATE TABLE #t(Prj TINYINT, Beg DATE, Eñd DATE);
INSERT INTO #t SELECT 1, '1/1/17', '1/2/17';
INSERT INTO #t SELECT 1, '1/5/17', '1/7/17';
INSERT INTO #t SELECT 1, '1/5/17', '1/7/17';
INSERT INTO #t SELECT 1, '1/15/17', '1/15/17';
INSERT INTO #t SELECT 1, '1/15/17', '1/18/17';
INSERT INTO #t SELECT 1, '1/20/17', '1/24/17';
INSERT INTO #t SELECT 2, '2/2/17', '2/5/17';
INSERT INTO #t SELECT 2, '2/7/17', '2/9/17';
ALTER TABLE #t ADD Job INT NOT NULL IDENTITY (1,1) PRIMARY KEY;
A LAG(.,1) function uses precisely the previous job's ending time, which is not what I want. It yields incorrect idle duration for jobs 2 & 3, 4 & 5. Jobs 2 & 3 should both use the ending time of job 1. Jobs 4 & 5 should both use the ending time of job 3. The joined query computes idle duration correctly, but an inline calculation is desirable here (without joins, subqueries).
SELECT c.Job, c.Prj, c.Beg, c.Eñd,
-- in-line computation with OVER clause
PrvEñd_lg=LAG(c.Eñd,1) OVER(PARTITION BY c.Prj ORDER BY c.Beg),
Idle_lg=DATEDIFF(DAY, LAG(c.Eñd,1) OVER(PARTITION BY c.Prj ORDER BY c.Beg), c.Beg),
-- calculation over current and (joined) previous records
PrvEñd_j=MAX(p.Eñd),
IdleDur_j=DATEDIFF(DAY, MAX(p.Eñd), c.Beg)
FROM #t c LEFT JOIN #t p ON c.Prj=p.Prj AND c.Beg > p.Eñd
GROUP BY c.Job, c.Prj, c.Beg, c.Eñd
ORDER BY c.Prj, c.Beg
Job Prj Beg Eñd PrvEñd_lg Idle_lg PrvEñd_j IdleDur_j
1 1 2017-01-01 2017-01-02 NULL NULL NULL NULL
2 1 2017-01-05 2017-01-07 2017-01-02 3 2017-01-02 3
3 1 2017-01-05 2017-01-07 2017-01-07 -2 2017-01-02 3
4 1 2017-01-15 2017-01-15 2017-01-07 8 2017-01-07 8
5 1 2017-01-15 2017-01-18 2017-01-15 0 2017-01-07 8
6 1 2017-01-20 2017-01-24 2017-01-18 2 2017-01-18 2
7 2 2017-02-02 2017-02-05 NULL NULL NULL NULL
8 2 2017-02-07 2017-02-09 2017-02-05 2 2017-02-05 2
Please let me know, if I can further clarify any specific details.
Many thanks!
You can use a self-join.
select a.Job
, a.Prj
, a.Beg
, a.Eñd
, max(b.Eñd) as PrevEñd
, min(datediff(mi, b.Eñd, a.Beg) / (60*24.0)) as IdleDur
from #t as a
left join #t as b on a.Prj = b.Prj
and cast(a.Beg as date) > cast(b.Eñd as date)
group by a.Job
, a.Prj
, a.Beg
, a.Eñd
This produces the following output:
+-----+-----+---------------------+---------------------+---------------------+-----------+
| Job | Prj | Beg | Eñd | PrevEñd | IdleDur |
+-----+-----+---------------------+---------------------+---------------------+-----------+
| 1 | 1 | 2017-01-01 01:00:00 | 2017-01-02 02:00:00 | NULL | NULL |
| 2 | 1 | 2017-01-05 02:00:00 | 2017-01-07 03:00:00 | 2017-01-02 02:00:00 | 3.0000000 |
| 3 | 1 | 2017-01-05 03:00:00 | 2017-01-07 02:00:00 | 2017-01-02 02:00:00 | 3.0416666 |
| 4 | 1 | 2017-01-15 04:00:00 | 2017-01-15 03:00:00 | 2017-01-07 03:00:00 | 8.0416666 |
| 5 | 1 | 2017-01-15 15:00:00 | 2017-01-18 03:00:00 | 2017-01-07 03:00:00 | 8.5000000 |
| 6 | 1 | 2017-01-20 05:00:00 | 2017-01-24 02:00:00 | 2017-01-18 03:00:00 | 2.0833333 |
| 7 | 2 | 2017-02-02 06:00:00 | 2017-02-05 03:00:00 | NULL | NULL |
| 8 | 2 | 2017-02-07 07:00:00 | 2017-02-09 02:00:00 | 2017-02-05 03:00:00 | 2.1666666 |
+-----+-----+---------------------+---------------------+---------------------+-----------+
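For comparison, here is a possible window-function variant that avoids the self-join. It is only a sketch: it still needs a CTE, because SQL Server does not allow one window function to be nested inside another, and it assumes that jobs within a project do not overlap across start days (so the latest-ending job of the previous start day is the right reference). @t, Marked and PrvEñdFirst are hypothetical names, and the data is the date-only set from the question.

```sql
-- Sketch only: @t mirrors #t from the question, with Job values written out explicitly.
DECLARE @t TABLE (
    Job int     NOT NULL PRIMARY KEY,
    Prj tinyint NOT NULL,
    Beg date    NOT NULL,
    Eñd date    NOT NULL
);

INSERT INTO @t (Job, Prj, Beg, Eñd) VALUES
    (1, 1, '20170101', '20170102'),
    (2, 1, '20170105', '20170107'),
    (3, 1, '20170105', '20170107'),
    (4, 1, '20170115', '20170115'),
    (5, 1, '20170115', '20170118'),
    (6, 1, '20170120', '20170124'),
    (7, 2, '20170202', '20170205'),
    (8, 2, '20170207', '20170209');

WITH Marked AS (
    SELECT Job, Prj, Beg, Eñd,
           -- Only the first row of each start day has a previous row from an earlier day;
           -- ordering by Eñd makes that previous row the latest-ending job of that day.
           CASE WHEN LAG(Beg) OVER (PARTITION BY Prj ORDER BY Beg, Eñd, Job) < Beg
                THEN LAG(Eñd) OVER (PARTITION BY Prj ORDER BY Beg, Eñd, Job)
           END AS PrvEñdFirst
    FROM @t
)
SELECT Job, Prj, Beg, Eñd,
       -- Spread the value found on the first row to every job that starts on the same day.
       MAX(PrvEñdFirst) OVER (PARTITION BY Prj, Beg) AS PrvEñd,
       DATEDIFF(DAY, MAX(PrvEñdFirst) OVER (PARTITION BY Prj, Beg), Beg) AS IdleDur
FROM Marked
ORDER BY Prj, Beg, Job;
```

On the sample data this should match the IdleDur_j column from the question (NULL, 3, 3, 8, 8, 2 for project 1 and NULL, 2 for project 2). If jobs can overlap across start days, the self-join is the safer choice, since it looks at every earlier end date rather than only the previous start day.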

Dynamic table/output each month for report

I have a table and report I need to create and I'm not sure how to wrap my head around how to make it display in the correct order each month for the output.
Using SQL Server 2012 and SSRS 2016 as the output, I need to create a rolling report that displays the last 12 months with their corresponding values. Each month the previous 12th month will drop off.
What's the best table design to approach something like this and how do you control the output to drop off the previous 12th month and keep it rolling?
Sample of desired output would be something like below but next month I need to drop off Dec - 15 and add Jan - 16 but have the columns sorted in a descending order so the previous month is always the last month in the report.
-- Desc | DEC - 15 | Jan - 16 | Feb - 16 | restofmonths| Nov 16 | Dec 16|
********************************************************************************
-- Loss | 1,000 | 2500 | 1700 | 123 | 4565 | 3433 |
-- Expense | 2,000 | 3200 | 900 | 456 | 1223 | 4445 |
-- Reserve | 3,000 | 3300 | 400 | 789 | 4747 | 4444 |
You need to use a matrix.
In your dataset, add two columns (if not already present). The first is the column header and the second is the column sort order. Something like this:
[Header] = LEFT(DATENAME(MONTH, DateValue), 3) + ' - ' + RIGHT(YearNum, 2)
, [HeaderSort] = CONVERT(VARCHAR, YEAR(DateValue)) + RIGHT('0' + CONVERT(VARCHAR, DATEPART(MONTH, DateValue)), 2)
Set the matrix column group to your Header value and set the sort order to your HeaderSort value.
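A dataset query along these lines might look like the sketch below. Everything named here is an assumption for illustration: @Facts, Descr, Amount and @AsOf are hypothetical, and the window is taken to be the 12 calendar months ending with the month of @AsOf. The WHERE clause is what makes the oldest month drop off as time moves on, so no special table design is needed beyond storing one dated row per value.

```sql
-- Sketch only: @Facts is a hypothetical source table with one row per description and date.
DECLARE @Facts TABLE (
    Descr     varchar(20)   NOT NULL,
    DateValue date          NOT NULL,
    Amount    decimal(12,2) NOT NULL
);

INSERT INTO @Facts (Descr, DateValue, Amount) VALUES
    ('Loss',    '20151215', 1000),   -- outside the window for the @AsOf used below
    ('Loss',    '20160115', 2500),
    ('Loss',    '20161215', 3433),
    ('Expense', '20160115', 3200),
    ('Expense', '20161215', 4445);

DECLARE @AsOf date = '20161231';   -- a real report might use CAST(GETDATE() AS date)
DECLARE @MonthStart date = DATEFROMPARTS(YEAR(@AsOf), MONTH(@AsOf), 1);

SELECT
    Descr,
    Amount,
    -- Column caption for the matrix, e.g. 'Dec - 16'
    LEFT(DATENAME(MONTH, DateValue), 3) + ' - '
        + RIGHT(CONVERT(char(4), YEAR(DateValue)), 2) AS Header,
    -- yyyymm as text, which sorts chronologically
    CONVERT(char(6), DateValue, 112) AS HeaderSort
FROM @Facts
WHERE DateValue >= DATEADD(MONTH, -11, @MonthStart)   -- first day of the oldest month kept
  AND DateValue <  DATEADD(MONTH,   1, @MonthStart);  -- first day after the newest month
```

DATENAME(MONTH, ...) returns the month name in the session language, so the captions depend on that setting. The matrix then groups columns on Header, sorts them on HeaderSort, and sums Amount in the data cell.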

Cassandra CQL token function for pagination

I am new to CQL and trying to add pagination support for my tables defined in cassandra as shown below -
cqlsh:dev> create table emp4 (empid uuid , year varchar , month varchar , day varchar, primary key((year, month, day), empid));
cqlsh:dev> insert into emp4 (empid, year, month, day) values (08f823ac-4dd2-11e5-8ad6-0c4de9ac7563,'2014','03','19');
cqlsh:dev> insert into emp4 (empid, year, month, day) values (08f823ac-4dd2-11e5-8ad6-0c4de9ac7562,'2016','03','19');
cqlsh:dev> select * from emp4;
year | month | day | empid
------+-------+-----+--------------------------------------
2016 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7562
2015 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac756f
2014 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7563
When I try to execute a query to fetch the records based on the following comparison, the statement seems to be incomplete, as shown below -
cqlsh:dev> select * from emp4 where token(year, month, day, empid) > token('2014','04',28',08f823ac-4dd2-11e5-8ad6-0c4de9ac7563) LIMIT 1;
... ;
...
I am trying to fetch records which have year,month, day greater than a specific value and also a given uuid. I think I am executing the query in the wrong way. Can someone help me with this ?
First of all, the inputs that you pass to the token() function must match your partition key (year, month, day), not your complete primary key.
Secondly, the order of your partition values is not necessarily the same as the order of the tokens generated for them. Look what happens when I insert three more rows and then query using the token function:
system.token(year, month, day) | year | month | day | empid
--------------------------------+------+-------+-----+--------------------------------------
-8209483605981607433 | 2016 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7562
-6378102587642519893 | 2015 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7562
-5253110411337677325 | 2013 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7562
-3665221797724106443 | 2011 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7562
-2421035798234525153 | 2012 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7562
-742508345287024993 | 2014 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7563
(6 rows)
As you can see, the rows are decidedly not in order by year. In this case your 2014 row has generated the largest token, so querying for rows with a token larger than that of the 2014 row will yield nothing. But if I query for rows with a token greater than that of the 2013 row, it works:
SELECT token(year,month,day),year,month,day,empid FROM emp4
WHERE token(year,month,day) > token('2013','03','19');
system.token(year, month, day) | year | month | day | empid
--------------------------------+------+-------+-----+--------------------------------------
-3665221797724106443 | 2011 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7562
-2421035798234525153 | 2012 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7562
-742508345287024993 | 2014 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7563
(3 rows)
Also note that you will need to use dates of your existing rows to return results that are of more value to you. After all, the token generated by token('2014','04','28') may not actually be greater than the token generated by token('2014','03','19').
