Cassandra CQL token function for pagination

Cassandra CQL token function for pagination - database

I am new to CQL and trying to add pagination support for my tables defined in cassandra as shown below -
cqlsh:dev> create table emp4 (empid uuid , year varchar , month varchar , day varchar, primary key((year, month, day), empid));
cqlsh:dev> insert into emp4 (empid, year, month, day) values (08f823ac-4dd2-11e5-8ad6-0c4de9ac7563,'2014','03','19');
cqlsh:dev> insert into emp4 (empid, year, month, day) values (08f823ac-4dd2-11e5-8ad6-0c4de9ac7562,'2016','03','19');
cqlsh:dev> select * from emp4;
year | month | day | empid
------+-------+-----+--------------------------------------
2016 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7562
2015 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac756f
2014 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7563
When I try to execute a query to fetch the records based on the the following comparison, the statement seems to be incomplete as show below -
cqlsh:dev> select * from emp4 where token(year, month, day, empid) > token('2014','04',28',08f823ac-4dd2-11e5-8ad6-0c4de9ac7563) LIMIT 1;
... ;
...
I am trying to fetch records which have year,month, day greater than a specific value and also a given uuid. I think I am executing the query in the wrong way. Can someone help me with this ?

First of all, inputs that you send to the token() function must match your partition key...not your complete primary key:
Secondly, the order of your partition values is not necessarily the same as the tokens generated for them. Look what happens when I insert three more rows, and the query using the token function:
system.token(year, month, day) | year | month | day | empid
--------------------------------+------+-------+-----+--------------------------------------
-8209483605981607433 | 2016 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7562
-6378102587642519893 | 2015 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7562
-5253110411337677325 | 2013 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7562
-3665221797724106443 | 2011 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7562
-2421035798234525153 | 2012 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7562
-742508345287024993 | 2014 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7563
(6 rows)
As you can see, the rows are decidedly not in order by year. And in this case your 2014 row has generated the largest token. Therefore querying for rows with a token value larger than that year will yield nothing. But, if I want to query for rows with a token year greater than 2013, it works:
SELECT token(year,month,day),year,month,day,empid FROM emp4
WHERE token(year,month,day) > token('2013','03','19');
system.token(year, month, day) | year | month | day | empid
--------------------------------+------+-------+-----+--------------------------------------
-3665221797724106443 | 2011 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7562
-2421035798234525153 | 2012 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7562
-742508345287024993 | 2014 | 03 | 19 | 08f823ac-4dd2-11e5-8ad6-0c4de9ac7563
(3 rows)
Also note that you will need to use dates of your existing rows to return results that are of more value to you. After all, the token generated by token('2014','04','28') may not actually be greater than the token generated by token('2014','03','19').

Related

Sum running total in sql

I am trying to insert a running total column into a SQL Server table as part of a stored procedure. I am needing this for a financial database so I am dealing with accounts and departments. For example, let's say I have this data set:
Account | Dept | Date | Value | Running_Total
--------+--------+------------+----------+--------------
5000 | 40 | 2018-02-01 | 10 | 15
5000 | 40 | 2018-01-01 | 5 | 5
4000 | 40 | 2018-02-01 | 10 | 30
5000 | 30 | 2018-02-01 | 15 | 15
4000 | 40 | 2017-12-01 | 20 | 20
The Running_Total column provides a historical sum of dates less than or equal to each row's date value. However, the account and dept must match for this to be the case.
I was able to get close by using
SUM(Value) OVER (PARTITION BY Account, Dept, Date)
but it does not go back and get the previous months...
Any ideas? Thanks!

You are close. You need an order by:
Sum(Value) over (partition by Account, Dept order by Date)

Difference in time between current record and previous record (with a lag variable and case statement)

I want to calculate the time between subsequent records in SQL if the records happened on the same day.
I have a table that looks like this with data from January 24th and 25th from the same piece of equipment:
EQUIPMENTNUMBER | TimeOfDay | Month | WeekNumber | Day | Year
10020576 | 4:18:58 AM | 1 | 4 | 24 | 2019
10020576 | 4:57:23 AM | 1 | 4 | 24 | 2019
10020576 | 3:22:47 AM | 1 | 4 | 25 | 2019
10020576 | 4:19:14 AM | 1 | 4 | 25 | 2019
I am able to use a lag variable within a case statement to identify if the previous record occured on the same day as the current record. Using the following code, I get the following table:
SELECT
EQUIPMENTNUMBER,
[TimeOfDay],
Month,
WeekNumber,
Day,
Year,
CASE
WHEN Day = lag(day,1,0) over (order by EQUIPMENTNUMBER, YEAR, WeekNumber,
DAY,timeofday)
THEN 1
ELSE 0
END as [PreviousRecordOnSameDay]
FROM [Table]
EQUIPMENTNUMBER | TimeOfDay | Month | WeekNumber | Day | Year | PreviousRecordOnSameDay
10020576 | 4:18:58 AM | 1 | 4 | 24 | 2019 | 0
10020576 | 4:57:23 AM | 1 | 4 | 24 | 2019 | 1
10020576 | 3:22:47 AM | 1 | 4 | 25 | 2019 | 0
10020576 | 4:19:14 AM | 1 | 4 | 25 | 2019 | 1
So now I have an indicator telling me if the previous record occurred on the same day as the current one. Now I want to calculate the difference in time from the previous record to the current one if they occurred on the same day. I use the following SQL and get an error.
SELECT
EQUIPMENTNUMBER,
[TimeOfDay],
Month,
WeekNumber,
Day,
Year,
CASE
WHEN Day = lag(day,1,0) over (order by CEID, YEAR, WeekNumber, DAY,timeofday)
THEN
datediff(minute, lag(timeofday,1,0) over (order by CEID, YEAR, WeekNumber, DAY,timeofday), timeofday)
ELSE 0
END
FROM [Table]
I get the following error:
Msg 206, Level 16, State 2, Line 26
Operand type clash: int is incompatible with time
Can anyone out there please elaborate on what this error means or provide suggestions on how to calculate the difference in a timestamp between subsequent records in SQL Server?
Thanks!

lag(timeofday,1,0) is the part that causing type clash.
timeofday is of type time and specified default for lag function is an int. these should be the same. either remove optional default or change to time type '00:00:00.000'
LAG (scalar_expression [,offset] [,default])
OVER ( [ partition_by_clause ] order_by_clause )

Limit RANGE with condition in Window function

Take an example I have the following transaction table, with transaction values of each department for each trimester.
TransactionID | Department | Trimester | Year | Value | Moving Avg
1 | Dep1 | T1 | 2014 | 13 |
2 | Dep1 | T1 | 2014 | 43 |
3 | Dep1 | T2 | 2014 | 36 |
300 | Dep1 T1 | 2017 | 28 |
301 | Dep2 T1 | 2014 | 24 |
I would like to calculate moving average for each transaction from the same department, taking the window as from the 6 trimesters to 2 trimesters before the current line's trimester. Example for transaction 300 in T1 2017, I'd like to have the average of transaction values for Dep1 from T1-2015 to T2-2016.
How can I achieve this with sliding window function in SQL Server 2014. My thought is that I should use something like
SELECT
AVG(VALUES) OVER
(PARTITION BY DEPARTMENT ORDER BY TRIMESTER,
YEAR RANGE [Take the range from previous 6 to 2 trimesters])
How would we define the RANGE clause. I suppose I could not use ROWS due to the number of rows for the window is unknown.
The same question for median. How would we rewrite for calculating the median instead of mean ?

Dynamic table/output each month for report

I have a table and report I need to create and I'm not sure how to wrap my head around how to make it display in the correct order each month for the output.
Using SQL Server 2012 and SSRS 2016 as the output, I need to create a rolling report that displays the last 12 months with their corresponding values. Each month the previous 12th month will drop off.
What's the best table design to approach something like this and how do you control the output to drop off the previous 12th month and keep it rolling?
Sample of desired output would be something like below but next month I need to drop off Dec - 15 and add Jan - 16 but have the columns sorted in a descending order so the previous month is always the last month in the report.
-- Desc | DEC - 15 | Jan - 16 | Feb - 16 | restofmonths| Nov 16 | Dec 16|
********************************************************************************
-- Loss | 1,000 | 2500 | 1700 | 123 | 4565 | 3433 |
-- Expense | 2,000 | 3200 | 900 | 456 | 1223 | 4445 |
-- Reserve | 3,000 | 3300 | 400 | 789 | 4747 | 4444 |

You need to use a matrix.
In your dataset, add 2 columns (if not already present). The first is the column header and the second is the column sort order. Something like this
[Header] = LEFT(DATENAME(MONTH, DateValue), 3) + ' - ' + RIGHT(YearNum, 2)
, [HeaderSort] = CONVERT(VARCHAR, YEAR(DateValue)) + RIGHT('0' + CONVERT(VARCHAR, DATEPART(MONTH, DateValue)), 2)
Set the matrix column group to your Header value and set the sort order to your HeaderSort value.

Using SQL Server windowing function to get running total by fiscal year

I'm using SQL Server 2014. I have a Claims table containing totals of claims made per month in my system:
+-----------+-------------+------------+
| Claim_ID | Claim_Date | Nett_Total |
+-----------+-------------+------------+
| 1 | 31 Jan 2012 | 321454.67 |
| 2 | 29 Feb 2012 | 523542.34 |
| 3 | 31 Mar 2012 | 35344.33 |
| 4 | 30 Apr 2012 | 142355.63 |
| etc. | etc. | etc. |
+-----------+-------------+------------+
For a report I am writing I need to be able to produce a cumulative running total that resets to zero at the start of each fiscal year (in my country this is from March 1 to February 28/29 of the following year).
The report will look similar to the table, with an extra running total column, something like:
+-----------+-------------+------------+---------------+
| Claim_ID | Claim_Date | Nett_Total | Running Total |
+-----------+-------------+------------+---------------+
| 1 | 31 Jan 2012 | 321454.67 | 321454.67 |
| 2 | 29 Feb 2012 | 523542.34 | 844997.01 |
| 3 | 31 Mar 2012 | 35344.33 | 35344.33 | (restart at 0
| 4 | 30 Apr 2012 | 142355.63 | 177699.96 | for new yr)
| etc. | etc. | etc. | |
+-----------+-------------+------------+---------------+
I know windowing functions are very powerful and I've used them in rudimentary ways in the past to get overall sums and averages while avoiding needing to group my resultset rows. I have an intuition that I will need to employ the 'preceding' keyword to get the running total for the current fiscal year each row falls into, but I can't quite grasp how to express the fiscal year as a concept to use in the 'preceding' clause (or if indeed it's possible to use a date range in this way).
Any assistance on the way of "phrasing" the fiscal year for the "preceding" clause will be of enormous help to me, please.

i think you should try this:
/* Create Table*/
CREATE TABLE dbo.Claims (
Claim_ID int
,Claim_Date datetime
,Nett_Total decimal(10,2)
);
/* Insert Testrows*/
INSERT INTO dbo.Claims VALUES
(1, '20120101', 10000)
,(2, '20120202', 10000)
,(3, '20120303', 10000)
,(4, '20120404', 10000)
,(5, '20120505', 10000)
,(6, '20120606', 10000)
,(7, '20120707', 10000)
,(8, '20120808', 10000)
Query the Data:
SELECT Claim_ID, Claim_Date, Nett_Total, SUM(Nett_Total) OVER
(PARTITION BY YEAR(DATEADD(month,-2,Claim_Date)) ORDER BY Claim_ID) AS
[Running Total] FROM dbo.Claims
The Trick: PARTITION BY YEAR(DATEADD(month,-2,Claim_Date))
New Partition by year, but i change the date so it fits your fiscal year.
Output:
Claim_ID |Claim_Date |Nett_Total |Running Total
---------+---------------------------+------------+-------------
1 |2012-01-01 00:00:00.000 |10000.00 |10000.00
2 |2012-02-02 00:00:00.000 |10000.00 |20000.00
3 |2012-03-03 00:00:00.000 |10000.00 |10000.00 <- New partition
4 |2012-04-04 00:00:00.000 |10000.00 |20000.00
5 |2012-05-05 00:00:00.000 |10000.00 |30000.00
6 |2012-06-06 00:00:00.000 |10000.00 |40000.00
7 |2012-07-07 00:00:00.000 |10000.00 |50000.00
8 |2012-08-08 00:00:00.000 |10000.00 |60000.00