Limit RANGE with condition in Window function - sql-server

For example, I have the following transaction table, holding the transaction values of each department for each trimester.
TransactionID | Department | Trimester | Year | Value | Moving Avg
1 | Dep1 | T1 | 2014 | 13 |
2 | Dep1 | T1 | 2014 | 43 |
3 | Dep1 | T2 | 2014 | 36 |
300 | Dep1 | T1 | 2017 | 28 |
301 | Dep2 | T1 | 2014 | 24 |
I would like to calculate a moving average for each transaction over the transactions of the same department, with a window running from 6 trimesters to 2 trimesters before the current row's trimester. For example, for transaction 300 in T1 2017, I'd like the average of the transaction values for Dep1 from T1 2015 to T2 2016.
How can I achieve this with a sliding window function in SQL Server 2014? My thought is that I should use something like
SELECT
AVG(VALUES) OVER
(PARTITION BY DEPARTMENT ORDER BY TRIMESTER,
YEAR RANGE [Take the range from previous 6 to 2 trimesters])
How would we define the RANGE clause? I suppose I cannot use ROWS, because the number of rows in the window is unknown.
The same question applies to the median: how would we rewrite this to calculate the median instead of the mean?
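One possible approach, offered as a sketch under assumptions rather than a definitive answer: in SQL Server, a RANGE frame only accepts UNBOUNDED PRECEDING, CURRENT ROW and UNBOUNDED FOLLOWING as delimiters, so a numeric offset such as 6 PRECEDING cannot be combined with RANGE. A correlated subquery over a consecutive trimester number can stand in for the frame. The table name Transactions and the helper column TrimIdx are hypothetical, and the sketch assumes three trimesters per year labelled T1 to T3, as in the example.

```sql
-- Assumption: the data lives in a table called Transactions (hypothetical name)
-- with the columns shown in the question, and Trimester is 'T1', 'T2' or 'T3'.
WITH t AS (
    SELECT TransactionID,
           Department,
           Value,
           -- Consecutive trimester number, e.g. T1 2017 -> 2017 * 3 + 1
           [Year] * 3 + CAST(RIGHT(Trimester, 1) AS int) AS TrimIdx
    FROM Transactions
)
SELECT cur.TransactionID,
       cur.Department,
       cur.Value,
       a.MovingAvg,
       m.MovingMedian
FROM t AS cur
OUTER APPLY (
    -- Mean over the same department, 6 to 2 trimesters before the current row
    SELECT AVG(CAST(prev.Value AS decimal(18, 4))) AS MovingAvg
    FROM t AS prev
    WHERE prev.Department = cur.Department
      AND prev.TrimIdx BETWEEN cur.TrimIdx - 6 AND cur.TrimIdx - 2
) AS a
OUTER APPLY (
    -- Median over the same window; PERCENTILE_CONT is an analytic function,
    -- so every row of the window carries the same value and TOP (1) keeps one
    SELECT TOP (1)
           PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY prev.Value) OVER () AS MovingMedian
    FROM t AS prev
    WHERE prev.Department = cur.Department
      AND prev.TrimIdx BETWEEN cur.TrimIdx - 6 AND cur.TrimIdx - 2
) AS m;
```

For transaction 300 (T1 2017) the index range works out to T1 2015 through T2 2016, matching the window described in the question. Rows with no transactions in the window get NULL for both columns.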

Related

SQL Server find sum of values based on criteria within another table

I have a table consisting of ID, Year, Value
---------------------------------------
| ID | Year | Value |
---------------------------------------
| 1 | 2006 | 100 |
| 1 | 2007 | 200 |
| 1 | 2008 | 150 |
| 1 | 2009 | 250 |
| 2 | 2005 | 50 |
| 2 | 2006 | 75 |
| 2 | 2007 | 65 |
---------------------------------------
I then create a derived, aggregated table consisting of an ID, MinYear, and MaxYear
---------------------------------------
| ID | MinYear | MaxYear |
---------------------------------------
| 1 | 2006 | 2009 |
| 2 | 2005 | 2007 |
---------------------------------------
I then want to find the sum of Value between MinYear and MaxYear for each ID in the aggregated table, but I am having trouble coming up with a proper query.
The final table should look something like this
----------------------------------------------------
| ID | MinYear | MaxYear | SumVal |
----------------------------------------------------
| 1 | 2006 | 2009 | 700 |
| 2 | 2005 | 2007 | 190 |
----------------------------------------------------
Right now I can perform all the joins needed to create the second table. I then use a fast-forward cursor to iterate through each record of the second table, with the code inside the loop looking like the following:
DECLARE @curMin int
DECLARE @curMax int
DECLARE @curID int
FETCH NEXT FROM fastCursor INTO @curID, @curMin, @curMax
WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT SUM(Value) FROM ValTable WHERE Year >= @curMin AND Year <= @curMax AND ID = @curID
    GROUP BY ID
    FETCH NEXT FROM fastCursor INTO @curID, @curMin, @curMax
END
Having found the sum of values between the specified years, I can join it back to the second table and end up with the desired result (the third table).
However, the second table in reality has roughly 4 million rows, so this iteration is extremely time consuming (about 300 results a minute) and presumably not the best solution.
My question is: is there a way to generate the third table's results without using a cursor or loop?
Within a GROUP BY, the sum only covers the ID in question, and since the min year and max year are taken for that same ID, you don't need to query twice. The query below should give you exactly what you need. If you have a different requirement, let me know.
SELECT ID, MIN(YEAR) as MinYear, MAX(YEAR) as MaxYear, SUM(VALUE) as SUMVALUE
FROM tablenameyoudidnotsay
GROUP BY ID
You could use a query like the one below.
TableA is your first table, and TableB is the second one.
SELECT *,
    (SELECT SUM(Value) FROM TableA WHERE TableA.ID = TableB.ID AND TableA.Year BETWEEN
        TableB.MinYear AND TableB.MaxYear) AS SumValue
FROM TableB
You can put your criteria into a join and obtain the result as one set, which should be faster:
SELECT b.Id, b.MinYear, b.MaxYear, sum(a.Value)
FROM Table2 b
JOIN Table1 a ON a.Id=b.Id AND b.MinYear <= a.Year AND b.MaxYear >= a.Year
GROUP BY b.Id, b.MinYear, b.MaxYear

Sum running total in sql

I am trying to insert a running total column into a SQL Server table as part of a stored procedure. I need this for a financial database, so I am dealing with accounts and departments. For example, let's say I have this data set:
Account | Dept | Date | Value | Running_Total
--------+--------+------------+----------+--------------
5000 | 40 | 2018-02-01 | 10 | 15
5000 | 40 | 2018-01-01 | 5 | 5
4000 | 40 | 2018-02-01 | 10 | 30
5000 | 30 | 2018-02-01 | 15 | 15
4000 | 40 | 2017-12-01 | 20 | 20
The Running_Total column holds the historical sum of Value over all rows whose date is less than or equal to each row's date. However, the account and dept must match for a row to be included.
I was able to get close by using
SUM(Value) OVER (PARTITION BY Account, Dept, Date)
but it does not go back and get the previous months...
Any ideas? Thanks!
You are close. You need an order by:
Sum(Value) over (partition by Account, Dept order by Date)
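A slightly fuller sketch of the same idea, with the window frame written out. The table name Ledger is hypothetical. When ORDER BY is present and no frame is given, SQL Server defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats rows with the same date as peers and sums them together. Spelling out a ROWS frame makes the running total advance row by row.

```sql
-- Assumption: the data set from the question lives in a table called Ledger.
SELECT Account,
       Dept,
       [Date],
       Value,
       SUM(Value) OVER (
           PARTITION BY Account, Dept      -- restart the total for each account/dept pair
           ORDER BY [Date]                 -- accumulate in date order
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS Running_Total
FROM Ledger;
```

If several rows can share the same Account, Dept and Date, the order among them is not deterministic with a ROWS frame, so adding a tie-breaking column to the ORDER BY may be worthwhile.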

Ranking within multiple groups & Efficient query for multiple table updates

I'm trying to add a rank by sales within each month, and also change the date column to a 'month end' field that shows only the last day of the month.
Can I do two SETs in a row like that without adding another UPDATE?
I'm looking for the top 2 within each month. Do LIMIT and GROUP BY work for that?
I feel like this is the right and most efficient query, but it's not working. Any help appreciated!
UPDATE table1
SET DATE=EOMONTH(DATE) AS MONTH_END;
ALTER TABLE table1
ADD COLUMN RANK INT AFTER sales;
UPDATE table1
SET RANK=
RANK() OVER(PARTITION BY cust ORDER BY sales DESC);
LIMIT 2
orig table
+------+----------+-------+
| CUST | DATE     | SALES |
+------+----------+-------+
| 36   | 3-5-2018 | 50    |
| 37   | 3-15-18  | 100   |
| 38   | 3-25-18  | 65    |
| 37   | 4-5-18   | 95    |
| 39   | 4-21-18  | 500   |
| 40   | 4-45-18  | 199   |
+------+----------+-------+
desired output
+------+-----------+-------+------+
| CUST | Month End | SALES | Rank |
+------+-----------+-------+------+
| 37   | 3-31-18   | 100   | 1    |
| 38   | 3-31-18   | 65    | 2    |
| 39   | 4-30-18   | 500   | 1    |
| 40   | 4-30-18   | 199   | 2    |
+------+-----------+-------+------+
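I do not know why you want EOMONTH as a stored value, but what you have for that will work once the AS MONTH_END alias is removed (an UPDATE ... SET assignment cannot take an alias).
I would not use [rank] as a column name as I avoid any words that are used in SQL, maybe [sales_rank] or similar.
-- SQL Server uses ADD without the COLUMN keyword and has no AFTER clause
ALTER TABLE table1
ADD [sales_rank] INT;
GO  -- start a new batch so the UPDATE below can see the new column
WITH cte AS (
    SELECT
        sales_rank
      -- rank within each month, as the desired output asks for
      , DENSE_RANK() OVER (PARTITION BY EOMONTH([DATE]) ORDER BY sales DESC) AS ranking
    FROM table1
)
UPDATE cte
SET sales_rank = ranking
WHERE ranking < 3
;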
LIMIT 2 is not something that can be used in SQL Server by the way, and it sure can't be used "per grouping". When you use a "window function" such as rank() or dense_rank() you can use the output of those in the where clause of the next "layer". i.e. use those functions in a subquery (or cte) and then use a where clause to filter rows by the calculated values.
Also note I used dense_rank() to guarantee that no rank numbers are skipped, so that the subsequent where clause will be effective.
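To illustrate the "next layer" idea without touching the stored data, here is a minimal sketch that returns the top 2 sales per month with a plain SELECT. It assumes the DATE column of table1 is an actual date type holding valid dates; the aliases month_end, sales_rank and ranked are made up for the example.

```sql
-- Assumption: table1.[DATE] is of type date (EOMONTH needs a date-like input).
SELECT cust, month_end, sales, sales_rank
FROM (
    SELECT cust,
           EOMONTH([DATE]) AS month_end,
           sales,
           -- The window function is computed in the inner query...
           DENSE_RANK() OVER (PARTITION BY EOMONTH([DATE])
                              ORDER BY sales DESC) AS sales_rank
    FROM table1
) AS ranked
-- ...so the outer WHERE clause can filter on its result.
WHERE sales_rank <= 2
ORDER BY month_end, sales_rank;
```

Because DENSE_RANK gives tied sales the same rank, a month with ties can return more than two rows. ROW_NUMBER would cap it at exactly two, at the cost of picking arbitrarily among ties.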

sql server update Balance field in one statement

In SQL Server, I have a simple table that stores an amount and a balance like this:
ID | Date | Amount | Balance
-------------------------------------
101 | 1/15/2017 | 3.00 | 67.50
102 | 1/16/2017 | 5.00 | 72.50
103 | 1/19/2017 | 9.00 | 81.50
104 | 1/20/2017 | -2.00 | 79.50
If I change the amount of a record, I need to update the balance of every record after it.
ID | Date | Amount | Balance
-------------------------------------
101 | 1/15/2017 | 3.00 | 67.50
102 | 1/16/2017 | *5.02* | *72.52*
103 | 1/19/2017 | 9.00 | *81.52*
104 | 1/20/2017 | -2.00 | *79.52*
The table now holds more than 100 million records. I don't want to do this work with a SQL cursor or a client program, since that would submit a huge number of UPDATE statements and take several hours to finish.
Can the balance of the entire table be recalculated in one SQL statement?
You can easily do it in a single SQL statement using SUM() OVER.
e.g.
WITH tot AS (
    SELECT ID, SUM(Amount) OVER (ORDER BY ID) AS Balance
    FROM YOURTABLE
)
UPDATE tab
SET Balance = tot.Balance
FROM YOURTABLE tab
JOIN tot
ON tot.ID = tab.ID
If the balance is reset by some other column, use that column in a PARTITION BY clause and include it in the join.
If you are then inserting a new row, you can simply run this update with a WHERE clause on the outer UPDATE; keep the filter out of the CTE so the running sum still includes the earlier rows.
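A sketch of the partitioned variant combined with a WHERE clause, under stated assumptions: AccountID is a hypothetical column by which the balance resets, each account's balance starts from zero, and @ChangedID is a hypothetical variable holding the ID of the edited row.

```sql
-- Assumptions: YOURTABLE has a hypothetical AccountID column, ID is an int,
-- and every row of an account is present so the sum starts from zero.
DECLARE @ChangedID int = 102;   -- hypothetical: the row whose Amount was edited

WITH tot AS (
    SELECT AccountID,
           ID,
           SUM(Amount) OVER (PARTITION BY AccountID
                             ORDER BY ID
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Balance
    FROM YOURTABLE              -- no filter here: the sum must see all earlier rows
)
UPDATE tab
SET Balance = tot.Balance
FROM YOURTABLE AS tab
JOIN tot
  ON tot.AccountID = tab.AccountID
 AND tot.ID = tab.ID
WHERE tab.ID >= @ChangedID;     -- only rewrite rows from the edited one onward
```

The filter limits how many rows are written, but the CTE still scans the whole table to compute the sums, so on 100 million rows an index supporting the PARTITION BY and ORDER BY columns is likely to matter.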

Return records based on min data in source and target if conditions satisfied in sql server

I have the following data in SQL Server.
Table : emp
Empid | deptid | doj | loc | Status|guid
1 | 10 | 2013-09-25 | hyd | 5 |10
1 | 10 | 2014-03-25 | che | 5 |11
1 | 10 | 2014-04-09 | pune | 5 |12
1 | 10 | 2015-01-22 | pune | 5 |13
2 | 20 | 2015-12-13 | beng | 5 |14
2 | 20 | 2014-12-17 | chen | 5 |15
2 | 20 | 2010-10-15 | beng | 4 |16
Table : empref
empid | deptid | startdate | status |guid
1 | 10 | 2013-10-02 | 2 |1
1 | 10 | 2014-04-09 | 2 |2
1 | 10 | 2015-12-09 | 1 |3
1 | 10 | 2015-01-30 | 2 |4
2 | 20 | 2015-12-14 | 2 |2
2 | 20 | 2015-12-15 | 2 |3
Both tables share the columns Empid and deptid.
We need to compare emp records with status = 5 against empref records with status = 2, where emp.doj <= empref.startdate and the difference between the two dates is at most 30 days.
If multiple empref records fall within 30 days of the doj, we take the record with min(startdate), and that record is treated as an update. Rows with status 4 or 1 are not needed in the result set at this time.
Likewise, for the same comparison (emp status = 5, empref status = 2, emp.doj <= empref.startdate, at most 30 days apart), if multiple emp records fall within 30 days, we take the record with min(doj). That record is marked as update in the Filter column and takes its guid from the empref table.
The remaining records are marked as insert in the Filter column and take their guid from the emp table.
If the condition emp.doj <= empref.startdate is not satisfied, or the difference is more than 30 days, the record is also marked as insert in the Filter column.
Based on the above tables, I want output like the following:
Empid | Deptid | loc | Status | Filter | Doj |guid
1 | 10 | hyd | 5 | Update | 2013-09-25|1
1 | 10 | che | 5 | insert | 2014-03-25|11 ------min(startdate) corresponding record
1 | 10 | pune | 5 | update | 2014-04-09|2 --------mul
1 | 10 | Pune | 5 | update | 2015-01-22|4
2 | 20 | beng | 5 | update | 2015-12-13|2 --------------min(doj) record
2 | 20 | chen | 5 | insert | 2014-12-17|15
2 | 20 | beng | 4 | insert | 2010-10-15|16 -----this record not fall the above conditions
I tried like below
select s1.*
,'Update' as Filter from emp e join empref er
on e.empid=er.empid and
e.deptid=t.deptid
and e.status='5'
and er.status='2' and
e.doj<=er.startdate and datediff(dd,er.startdate,e.doj)*-1<=30
group by er.startdate,
e.empid,e.deptid.e.doj,e.loc
having e.startdate= min(er.startdate)
The above query does not give the expected result. Please help me write a query that achieves this in SQL Server.
It seems like the query you supplied is very close. Here is what I quickly put together. I haven't tested it against a lot of the possible options.
select e.Empid, er.deptid, e.loc, e.[status],
case when DATEDIFF(DAY, e.doj, er.startdate) <= 0 THEN 'INSERT'
ELSE 'UPDATE' END [DaysOffset],
e.doj
FROM #emp e inner join #empref ER
on e.Empid = er.empid and
e.deptid = er.deptid
where e.[status] = 5 and er.[status] = 2
and e.doj <= er.startdate and
DATEDIFF(DAY, e.doj, er.startdate) <= 30
The CASE expression determines whether a record is flagged as INSERT or UPDATE. With the DATEDIFF in the WHERE clause, the query only returns records whose startdate is at most 30 days after the doj.
