I have a table with 3 fields: wk, cor, id
"wk" is the week, "cor" groups items from same location, "id" is the id of each item to retrieve from warehouse.
Given a certain number of items to retrieve, I must take almost the same quantity of items from each group ("cor" represents groups) to balance warehouse performance, while respecting week precedence (a week must be exhausted before moving on to the following one).
The linked image may make this clearer:
Data sample
Rows are taken in this order:
yellow, orange, green, gray (this last one starts with "cor 2" because "cor 1" was the last one used in week 28)
The RES column (filled in by hand in the sample) represents the right order in which I should take items. Currently this is obtained with a cursor, which is very slow, and I'd like to do something better if possible; I've tried window functions, CTEs, and recursive CTEs, but was not able to get anything right.
With this script you can build the same table:
CREATE TABLE #t (wk int, cor int, id int)
INSERT INTO #t
(
wk
,cor
,id
)
VALUES
(28,1,4044534),
(28,1,6778322),
(28,1,7921336),
(28,1,4326390),
(28,2,2669622),
(28,2,6580257),
(28,2,1179795),
(28,3,3980111),
(28,3,2549129),
(28,3,6763533),
(29,1,6023538),
(29,1,8219574),
(29,1,3836858),
(29,2,3355314),
(29,2,148847),
(29,2,8083320),
(29,3,1359966),
(29,3,8746308)
The expected result:
All fields are given, while the RES field must be calculated; it represents the order in which items will be taken out (explained below the table).
+----+-----+---------+-----+
| wk | cor | id | RES |
+----+-----+---------+-----+
| 28 | 1 | 4044534 | 1 |
| 28 | 1 | 6778322 | 4 |
| 28 | 1 | 7921336 | 7 |
| 28 | 1 | 4326390 | 10 |
| 28 | 2 | 2669622 | 2 |
| 28 | 2 | 6580257 | 5 |
| 28 | 2 | 1179795 | 8 |
| 28 | 3 | 3980111 | 3 |
| 28 | 3 | 2549129 | 6 |
| 28 | 3 | 6763533 | 9 |
| 29 | 1 | 6023538 | 11 |
| 29 | 1 | 8219574 | 14 |
| 29 | 1 | 3836858 | 17 |
| 29 | 2 | 3355314 | 12 |
| 29 | 2 | 148847 | 15 |
| 29 | 2 | 8083320 | 18 |
| 29 | 3 | 1359966 | 13 |
| 29 | 3 | 8746308 | 16 |
+----+-----+---------+-----+
The algorithm works like this:
The oldest week must be exhausted first (in the sample, wk 28 must be finished before taking items from wk 29).
Items must be equally distributed among "cor"s, so if 10 items are required they must come out like this: 3 from cor 1, 3 from cor 2, 3 from cor 3. The last one may come from whichever cor, because 10 is not divisible by 3, obviously.
If 11 items are required: week 28 only contains 10 items, so the last one will be taken from week 29, with the same principle of distributing the exits equally among cors, even across weeks. If the last article from week 28 was taken from cor 1, the next one in week 29 will be taken from cor 2.
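For reference, the round-robin rule can be simulated outside SQL. This is a minimal Python sketch of the ordering logic as it appears in the expected table (each week restarting at cor 1), not the T-SQL solution itself:

```python
from collections import defaultdict

# (wk, cor, id) rows in insertion order, from the sample script
rows = [
    (28, 1, 4044534), (28, 1, 6778322), (28, 1, 7921336), (28, 1, 4326390),
    (28, 2, 2669622), (28, 2, 6580257), (28, 2, 1179795),
    (28, 3, 3980111), (28, 3, 2549129), (28, 3, 6763533),
    (29, 1, 6023538), (29, 1, 8219574), (29, 1, 3836858),
    (29, 2, 3355314), (29, 2, 148847), (29, 2, 8083320),
    (29, 3, 1359966), (29, 3, 8746308),
]

# rank each row within its (wk, cor) group by insertion order
counts = defaultdict(int)
ranked = []
for wk, cor, item_id in rows:
    counts[(wk, cor)] += 1
    ranked.append((wk, counts[(wk, cor)], cor, item_id))

# exhaust each week first, then cycle cor 1, 2, 3 within the week
ranked.sort(key=lambda r: (r[0], r[1], r[2]))
res = {item_id: pos for pos, (_, _, _, item_id) in enumerate(ranked, start=1)}
```

The sort key (wk, rank-within-(wk, cor), cor) reproduces the RES column, e.g. `res[4044534]` is 1 and `res[4326390]` is 10; this is essentially what a window-function solution computes.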
Does this answer your problem?
DROP TABLE IF EXISTS #temp
DROP TABLE IF EXISTS #temp2
CREATE TABLE #temp (idx INT PRIMARY KEY IDENTITY(1,1), wk int, cor int, id int)
INSERT INTO #temp
VALUES
(28,1,4044534),
(28,1,6778322),
(28,1,7921336),
(28,1,4326390),
(28,2,2669622),
(28,2,6580257),
(28,2,1179795),
(28,3,3980111),
(28,3,2549129),
(28,3,6763533),
(29,1,6023538),
(29,1,8219574),
(29,1,3836858),
(29,2,3355314),
(29,2,148847),
(29,2,8083320),
(29,3,1359966),
(29,3,8746308)
SELECT wk, cor, id
, ROW_NUMBER() OVER (ORDER BY wk, RES, idx) as RES
FROM (
SELECT idx
, wk
, cor
, id
, ROW_NUMBER() OVER (PARTITION BY wk, cor ORDER BY idx) AS RES
FROM #temp
) AS t
ORDER BY idx
You don't have the correct information in your test data to support the desired output.
If, however, you had an identity column that represents the insertion order, you could use something like the following...
WITH cte_RankOrder AS (
SELECT
t.rn, t.wk, t.cor, t.id,
RankOrder = DENSE_RANK() OVER (PARTITION BY t.wk, t.cor ORDER BY t.rn)
FROM
#t t
)
SELECT
ro.rn, ro.wk, ro.cor, ro.id,
RES = ROW_NUMBER() OVER (ORDER BY wk, ro.RankOrder, ro.cor)
FROM
cte_RankOrder ro
ORDER BY ro.rn;
results...
rn wk cor id RES
----------- ----------- ----------- ----------- --------------------
1 28 1 4044534 1
2 28 1 6778322 4
3 28 1 7921336 7
4 28 1 4326390 10
5 28 2 2669622 2
6 28 2 6580257 5
7 28 2 1179795 8
8 28 3 3980111 3
9 28 3 2549129 6
10 28 3 6763533 9
11 29 1 6023538 11
12 29 1 8219574 14
13 29 1 3836858 17
14 29 2 3355314 12
15 29 2 148847 15
16 29 2 8083320 18
17 29 3 1359966 13
18 29 3 8746308 16
HTH, Jason
Related
I have a bunch of value pairs (Before, After) by users in a table. In ideal scenarios these values should form an unbroken chain. e.g.
| UserId | Before | After |
|--------|--------|-------|
| 1 | 0 | 10 |
| 1 | 10 | 20 |
| 1 | 20 | 30 |
| 1 | 30 | 40 |
| 1 | 40 | 30 |
| 1 | 30 | 52 |
| 1 | 52 | 0 |
Unfortunately, these records originate in multiple different tables and are imported into my investigation table. The other values in the table do not lend themselves to ordering (e.g. CreatedDate) due to some quirks in the system saving them out of order.
I need to produce a list of users with gaps in their data. e.g.
| UserId | Before | After |
|--------|--------|-------|
| 1 | 0 | 10 |
| 1 | 10 | 20 |
| 1 | 20 | 30 |
// Row Deleted (30->40)
| 1 | 40 | 30 |
| 1 | 30 | 52 |
| 1 | 52 | 0 |
I've looked at the other Daisy Chaining questions on SO (and online in general), but they all appear to be on a given problem space, where one value in the pair is always lower than the other in a predictable fashion. In my case, there can be increases or decreases.
Is there a way to quickly calculate the longest chain that can be created? I do have a CreatedAt column that provides some (very rough) relative ordering: when the dates are more than about 10 seconds apart, we could consider the rows orderable.
Are you not therefore simply after this to get the first row where the "chain" is broken?
SELECT UserID, Before, After
FROM dbo.YourTable YT
WHERE NOT EXISTS (SELECT 1
FROM dbo.YourTable NE
WHERE NE.After = YT.Before)
AND YT.Before != 0;
If you want the last row where the "chain" is broken, just swap the aliases on the columns in the WHERE clause of the NOT EXISTS (i.e. NE.After = YT.Before becomes NE.Before = YT.After).
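To see which rows that NOT EXISTS predicate flags, here is a quick Python simulation of the same check, run against the sample with the 30 -> 40 row deleted:

```python
# sample rows (UserId, Before, After) with the 30 -> 40 link missing
rows = [
    (1, 0, 10),
    (1, 10, 20),
    (1, 20, 30),
    (1, 40, 30),
    (1, 30, 52),
    (1, 52, 0),
]

# NOT EXISTS: keep rows whose Before value is never produced as an After
broken = [
    (uid, before, after)
    for (uid, before, after) in rows
    if before != 0
    and not any(u == uid and a == before for (u, _, a) in rows)
]
```

`broken` comes back as `[(1, 40, 30)]`, the row immediately after the deleted link, which is the row the SQL query returns.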
The following performs hierarchical recursion on your example data and calculates a "chain" count column called h_level.
;with recur_cte([UserId], [Before], [After], h_level) as (
select [UserId], [Before], [After], 0
from dbo.test_table
where [Before] is null
union all
select tt.[UserId], tt.[Before], tt.[After], rc.h_level+1
from dbo.test_table tt join recur_cte rc on tt.UserId=rc.UserId
and tt.[Before]=rc.[After]
where tt.[Before] < tt.[After])
select * from recur_cte;
Results:
UserId Before After h_level
1 NULL 10 0
1 10 20 1
1 20 30 2
1 30 40 3
1 30 52 3
Is this helpful? Could you further define which rows to exclude?
If you want users that have more than one chain:
select t.UserID
from <T> as t left outer join <T> as t2
on t2.UserID = t.UserID and t2.Before = t.After
where t2.UserID is null
group by t.UserID
having count(*) > 1;
I'm looking for help with calculating the difference between consecutive ordered rows within groups in SQL (Microsoft SQL server).
I have a table like this:
ID School_ID Enrollment_Start_Date Order
1 56 1/1/2018 10
1 56 5/5/2018 24
1 56 7/7/2018 35
1 103 4/4/2019 26
1 103 3/3/2019 19
I want to calculate the difference between consecutive Order values, grouping by ID and School_ID and ordering by Enrollment_Start_Date.
so I want something like this:
ID School_ID Enrollment_Start_Date Order Diff
1 56 1/1/2018 10 10 # nothing to be subtracted from 10
1 56 5/5/2018 24 14 # 24-10
1 56 7/7/2018 35 11 # 35-24
1 103 3/3/2019 19 19 # nothing to be subtracted from 19
1 103 4/4/2019 26 7 # 26-19
I have hundreds of IDs, and each ID can have at most 6 Enrollment_Start_Date, so I'm looking for some generalizable implementations.
Use the LAG(<column>) analytic function to obtain the "previous" column value as specified within the OVER part, then subtract the current value from it and make the result positive by multiplying it by -1. If the previous value isn't present (is NULL), then take the current value.
Pseudo code would be:
If previous_order_value exists:
-1 * (previous_order_value - current_order_value)
Else
current_order_value
where previous_order_value is based on the same id & school_id and is sorted by enrollment_start_date in ascending order
SQL Code:
select
id,
school_id,
enrollment_start_date,
[order],
coalesce(-1 * (lag([order]) over (partition by id, school_id order by enrollment_start_date ) - [order]), [order]) as diff
from yourtable
Also note that order is a reserved keyword in SQL Server, which is why the column name has to be wrapped in [ ]. I suggest using some other word for this column, if possible.
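If it helps to verify the arithmetic independently of SQL Server, the same partition/LAG/COALESCE logic can be mimicked in a few lines of Python (a sketch of the logic, not of the query itself):

```python
from itertools import groupby

# (id, school_id, enrollment_start_date, order) from the question
rows = [
    (1, 56, "2018-01-01", 10),
    (1, 56, "2018-05-05", 24),
    (1, 56, "2018-07-07", 35),
    (1, 103, "2019-04-04", 26),
    (1, 103, "2019-03-03", 19),
]

# PARTITION BY id, school_id ORDER BY enrollment_start_date
rows.sort(key=lambda r: (r[0], r[1], r[2]))

diffs = {}
for _, grp in groupby(rows, key=lambda r: (r[0], r[1])):
    prev = None  # LAG([order]) starts out NULL in each partition
    for _id, school, date, order in grp:
        # COALESCE: fall back to the order value itself on the first row
        diffs[(school, date)] = order - prev if prev is not None else order
        prev = order
```

This reproduces the Diff column from the question: 10, 14, 11 for school 56 and 19, 7 for school 103.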
Use the LAG() analytic function to get the difference between two rows, and CASE WHEN to return the original value of the order column where no difference exists.
with cte as
(
select 1 as id, 56 as sclid, '2018-01-01' as s_date, 10 as orders
union all
select 1,56,'2018-05-05',24 union all
select 1,56,'2018-07-07',35 union all
select 1,103,'2019-04-04',26 union all
select 1,103,'2019-03-03',19
) select t.*,
case when ( lag([orders])over(partition by id,sclid order by s_date ) -[orders] )
is null then [orders] else
( lag([orders])over(partition by id,sclid order by s_date ) -[orders] )*(-1) end
as diff
from cte t
output
id sclid s_date orders diff
1 56 2018-01-01 10 10
1 56 2018-05-05 24 14
1 56 2018-07-07 35 11
1 103 2019-03-03 19 19
1 103 2019-04-04 26 7
demo link
Use LAG(COLUMN_NAME)
Query
SELECT id, School_ID, Enrollment_Start_Date, cOrder,
ISNULL((cOrder - (LAG(cOrder) OVER(PARTITION BY id, School_ID ORDER BY Enrollment_Start_Date))),cOrder)Diff
FROM Table1
Sample Output
| id | School_ID | Enrollment_Start_Date | cOrder | Diff |
|----|-----------|-----------------------|--------|------|
| 1 | 56 | 2018-01-01 | 10 | 10 |
| 1 | 56 | 2018-05-05 | 24 | 14 |
| 1 | 56 | 2018-07-07 | 35 | 11 |
| 1 | 103 | 2019-03-03 | 19 | 19 |
| 1 | 103 | 2019-04-04 | 26 | 7 |
SQL Fiddle Demo
I have a slightly confusing conundrum and I have been stuck all day.
I have the following types of data ...
For each customer record I have order numbers; for each order, a series of package numbers; and for each package number, one or more possible zones. Normally the math would be relatively simple: if there were one package with one or more zones, we would just select the distinct number of seats, for example.
+-----------+-------+-----+------+-------+
| customer | order | pkg | zone | seats |
+-----------+-------+-----+------+-------+
| 1 | 1 | 11 | 7 | 2 |
| 1 | 1 | 12 | 7 | 2 |
+-----------+-------+-----+------+-------+
We know customer 1 has 2 seats per package.
Here is where it gets tricky
+----------+-------+-----+------+-------+
| customer | order | pkg | zone | seats |
+----------+-------+-----+------+-------+
| 2 | 3 | 8 | 5 | 2 |
| 2 | 3 | 9 | 5 | 2 |
| 2 | 3 | 10 | 5 | 2 |
-- In the above case we know a given customer has one order #3, with three packages in the same zone each package has two seats.
| 2 | 3 | 9 | 6 | 1 |
| 2 | 3 | 9 | 8 | 1 |
| 2 | 3 | 10 | 7 | 2 |
+----------+-------+-----+------+-------+
-- Here things are confusing because the same customer has a single order #3 (and it's possible
-- both scenarios occur in one single order) with two packages, 9 and 10: package 9 has two zones
-- with one seat each, and package 10 has one zone with two seats. How do we distinguish when we
-- simply count the seats, as in the first/second occurrence, from when we sum the seats, as in
-- the last example?
To reiterate: a single customer has a single order; each order can have many packages with distinct package numbers; each package can have one or more zones; and each zone can have one or more seats.
When the zones are the same for a single package, we simply count distinct. When a single order+package has more than one zone, we sum instead of counting.
I can't figure out how to code the logic. Please help.
My columns are customer_no, order_no, pkg_no, zone_no and pkg_seats.
Here is a real example
+----------+-------+-----+-------+------+
| customer | order | pkg | seats | zone |
+----------+-------+-----+-------+------+
| 374 | 876 | 68 | 2 | 26 |
| 374 | 876 | 68 | 1 | 32 |
| 374 | 876 | 68 | 1 | 56 |
| 374 | 876 | 71 | 2 | 56 |
| 374 | 876 | 71 | 2 | 79 |
| 862 | 538 | 71 | 2 | 33 |
| 862 | 538 | 71 | 1 | 81 |
| 862 | 538 | 71 | 1 | 82 |
-- In the below case we simply count 2; in the above we sum.
| 575 | 994 | 68 | 2 | 34 |
| 575 | 994 | 68 | 2 | 79 |
+----------+-------+-----+-------+------+
I should add one super confusing piece. We have a series of packages that are part of other packages. For example package 68, 70 and 71 are all together and the parent package is 68.
I can't figure out the grouping.
with data as (
select *,
min(zone_no) over
(partition by customer_no, order_no, pkg_no) as min_zone_no1,
min(zone_no) over
(partition by customer_no, order_no, pkg_no, pkg_seats) as min_zone_no2
from T
)
select
customer_no, order_no,
sum(case when zone_no = min_zone_no1 then pkg_seats end) as seat_total1,
sum(case when zone_no = min_zone_no2 then pkg_seats end) as seat_total2
from data
group by customer_no, order_no
order by customer_no, order_no;
I've pored over your description a few times and I'm still not certain I'm on the right track. You seem to have a problem of double-counting: essentially you want a sum, but some of the rows shouldn't be included. (To "count distinct seats" is likely the wrong nomenclature here.)
My approach above is to try to identify the sets of rows that involve "duplicates", plus some data that helps count only one row from each set. I'm not sure what to make of order 876, which has different numbers of seats across the three zones.
I have a question about SQL Server. I have a table something like this:
productname |Level| January | Feburary | March | total
------------x-----x-----------x----------x-------x------
Rin | L1 | 10 | 20 | 30 | 60
Rin | L2 | 5 | 10 | 10 | 25
Rin | L3 | 20 | 5 | 5 | 30
Pen | L1 | 5 | 6 | 10 | 21
Pen | L2 | 10 | 10 | 20 | 40
Pen         | L3  |   30      | 10       | 40    | 80
Based on the above table data, I want output like below:
productname |Level| January | Feburary | March | total
------------x-----x-----------x----------x-------x------
Rin | L1 | 10 | 20 | 30 | 60
Rin | L2 | 5 | 10 | 10 | 25
Rin | L3 | 20 | 5 | 5 | 30
RinTotal |All | 35 | 35 | 45 | 115
Pen | L1 | 5 | 6 | 10 | 21
Pen | L2 | 10 | 10 | 20 | 40
Pen | L3 | 30 | 10 | 40 | 80
PenTotal | All | 45 | 26 | 70 |141
I tried a query like the one below:
SELECT productname
,LEVEL
,sum(january) AS January
,sum(Feburary) AS Feburary
,Sum(march) AS March
,Sum(total) AS total
FROM test
UNION
SELECT *
FROM test
but it's not giving the exact output. Please point me in the right direction on how to achieve this task in SQL Server.
Please try this:
SELECT * FROM TEST
UNION
SELECT PRODUCTNAME+'TOTAL', 'ALL' AS LEVEL, SUM(JANUARY) AS JANUARY, SUM(FEBURARY) AS FEBURARY, SUM(MARCH) AS MARCH, SUM(TOTAL) AS TOTAL
FROM TEST GROUP BY PRODUCTNAME
This really belongs in the front end. Group subtotals and such are usually really simple from most reporting tools. Also, don't get lazy and use select *, you should always be explicit in your columns. Since you have a specific order I added a couple of extra columns to use for sorting.
Also don't be afraid to add some white space and formatting to your queries. It makes your life a lot easier to read and later debug.
I think something like this should get you close. Notice I changed to a UNION ALL: UNION excludes duplicates, and since you know for a fact that there are no duplicate rows, UNION ALL eliminates the need to check for them.
select productname + 'Total' as productname
, 'All' as level
, sum(january) as January
, sum(Feburary) as Feburary
, Sum(march) as March
, Sum(total) as total
, productname as SortName
, 1 as SortOrder
from test
group by productname
union ALL
select productname
, level
, January
, Feburary
, March
, Total
, productname as SortName
, 0 as SortOrder
from test
order by SortName, SortOrder
I would do this using GROUP BY WITH ROLLUP. For more info, check here.
SELECT *
FROM (SELECT productname=productname + CASE WHEN level IS NULL THEN 'Total'
ELSE '' END,
Level=Isnull(level, 'ALL'),
Sum(january) AS January,
Sum(feburary) AS Feburary,
Sum(march) AS March,
Sum(total) AS total
FROM Yourtable
GROUP BY rollup ( productname, level )) a
WHERE productname IS NOT NULL
SQLFIDDLE DEMO
Sample data from the Ranges table, named ranges, is shown below:
+-----------------+-------------------+----------+----------+
| SectionCategory | RangeName | LowerEnd | UpperEnd |
+-----------------+-------------------+----------+----------+
| Sanction | 0-7 days | 0 | 7 |
| Sanction | 8-15 days | 8 | 15 |
| Sanction | More than 15 days | 16 | 99999 |
| Disbursal | 0-7 days | 0 | 7 |
| Disbursal | 8-15 days | 8 | 15 |
| Disbursal | More than 15 days | 16 | 99999 |
+-----------------+-------------------+----------+----------+
Sample Data from the Delays Table is shown below:
+-----------+---------------+-----------------+
| Loan No. | SanctionDelay | Disbursal Delay |
+-----------+---------------+-----------------+
| 247 | 8 | 35 |
| 661 | 18 | 37 |
| 1235 | 12 | 6 |
| 1235 | 8 | 15 |
| 1241 | 28 | 9 |
| 1241 | 11 | 9 |
| 1283 | 22 | 20 |
| 1283 | 28 | 41 |
| 1523 | 1 | 27 |
| 1523 | 6 | 28 |
+-----------+---------------+-----------------+
The desired output is shown below:
+-----------+-------------------+-------+
| Section | Range | Count |
+-----------+-------------------+-------+
| Sanction | 0-7 days | 2 |
| Sanction | 8-15 days | 4 |
| Sanction | More than 15 days | 4 |
| Disbursal | 0-7 days | 1 |
| Disbursal | 8-15 days | 3 |
| Disbursal | More than 15 days | 6 |
+-----------+-------------------+-------+
Currently two separate queries are written, and UNION is used to collate the output.
From a maintainability point of view, would it be possible to do this in a single query?
(For Sanction in the Ranges table, the SanctionDelay column from the Delays table should be used; for Disbursal, the DisbursalDelay column.) The need arises because the number of stages in the loan lifecycle is expected to increase, and more and more UNIONs would be needed to collate the output.
It can be done with a CROSS JOIN, though I'm not sure how efficient it is.
Sample data:
declare #Ranges table (SectionCategory varchar(10) not null,RangeName varchar(20) not null,LowerEnd int not null,UpperEnd int not null)
insert into #Ranges (SectionCategory,RangeName,LowerEnd,UpperEnd) values
('Sanction','0-7 days',0,7),
('Sanction','8-15 days',8,15),
('Sanction','More than 15 days',16,99999),
('Disbursal','0-7 days',0,7),
('Disbursal','8-15 days',8,15),
('Disbursal','More than 15 days',16,99999)
declare #Delays table (LoanNo int not null,SanctionDelay int not null,DisbursalDelay int not null)
insert into #Delays (LoanNo,SanctionDelay,DisbursalDelay) values
( 247, 8,35),
( 661,18,37),
(1235,12, 6),
(1235, 8,15),
(1241,28, 9),
(1241,11, 9),
(1283,22,20),
(1283,28,41),
(1523, 1,27),
(1523, 6,28)
Query (must be run in same batch as sample data):
select
r.SectionCategory,
r.RangeName,
SUM(CASE
WHEN r.SectionCategory='Sanction' and d.SanctionDelay BETWEEN r.LowerEnd and r.UpperEnd then 1
WHEN r.SectionCategory='Disbursal' and d.DisbursalDelay BETWEEN r.LowerEnd and r.UpperEnd then 1
else 0 end) as Cnt
from #Ranges r
cross join
#Delays d
group by
r.SectionCategory,
r.RangeName
order by SectionCategory,RangeName
Results:
SectionCategory RangeName Cnt
--------------- -------------------- -----------
Disbursal 0-7 days 1
Disbursal 8-15 days 3
Disbursal More than 15 days 6
Sanction 0-7 days 2
Sanction 8-15 days 4
Sanction More than 15 days 4
From a maintainability perspective, it may be better to have a single delay column in the delays table and an additional column that specifies the type of the delay. At the moment, it feels like some form of attribute splitting - in the Ranges table, the type is represented as a column value (Sanction, Disbursal, etc), yet in the delays table, this same "type" is represented in the table meta-data, in terms of distinct column names.
You say that "the number of stages of the loan lifecycle is expected to increase", and I'd expect that this cross over (representing attributes as data in some tables and meta-data in others) will increase the pain of writing decent queries.
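To illustrate that normalization suggestion concretely, here is a sketch using SQLite purely for demonstration (the DelayEvents table and its Stage/DelayDays columns are made-up names for the unpivoted design): with one row per (loan, stage), a single generic join against Ranges produces all the counts, and a new lifecycle stage only needs new rows, not a new query branch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Ranges (SectionCategory TEXT, RangeName TEXT, LowerEnd INT, UpperEnd INT);
INSERT INTO Ranges VALUES
  ('Sanction','0-7 days',0,7), ('Sanction','8-15 days',8,15),
  ('Sanction','More than 15 days',16,99999),
  ('Disbursal','0-7 days',0,7), ('Disbursal','8-15 days',8,15),
  ('Disbursal','More than 15 days',16,99999);

-- one row per (loan, stage) instead of one column per stage
CREATE TABLE DelayEvents (LoanNo INT, Stage TEXT, DelayDays INT);
INSERT INTO DelayEvents VALUES
  (247,'Sanction',8),(661,'Sanction',18),(1235,'Sanction',12),(1235,'Sanction',8),
  (1241,'Sanction',28),(1241,'Sanction',11),(1283,'Sanction',22),(1283,'Sanction',28),
  (1523,'Sanction',1),(1523,'Sanction',6),
  (247,'Disbursal',35),(661,'Disbursal',37),(1235,'Disbursal',6),(1235,'Disbursal',15),
  (1241,'Disbursal',9),(1241,'Disbursal',9),(1283,'Disbursal',20),(1283,'Disbursal',41),
  (1523,'Disbursal',27),(1523,'Disbursal',28);
""")

# one join covers every stage; no per-column CASE needed
counts = con.execute("""
  SELECT r.SectionCategory, r.RangeName, COUNT(d.LoanNo)
  FROM Ranges r
  LEFT JOIN DelayEvents d
    ON d.Stage = r.SectionCategory
   AND d.DelayDays BETWEEN r.LowerEnd AND r.UpperEnd
  GROUP BY r.SectionCategory, r.RangeName
""").fetchall()
```

The counts match the desired output in the question (e.g. Sanction 0-7 days: 2, Disbursal More than 15 days: 6).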
Try this
SELECT
SectionCategory
,RangeName
,CASE
WHEN R.SectionCategory='Sanction' THEN
(SELECT COUNT(1) FROM Delays D WHERE D.SanctionDelay BETWEEN R.LowerEnd AND R.UpperEnd)
WHEN R.SectionCategory='Disbursal' THEN
(SELECT COUNT(1) FROM Delays D WHERE D.[Disbursal Delay] BETWEEN R.LowerEnd AND R.UpperEnd)
END as cnt
FROM Ranges R
Here is SQLFiddle demo