get first row for each group - sql-server

I want to transform this data
account; docdate; docnum
17700; 9/11/2015; 1
17700; 9/12/2015; 2
70070; 9/1/2015; 4
70070; 9/2/2015; 6
70070; 9/3/2015; 9
into this
account; docdate; docnum
17700; 9/12/2015; 2
70070; 9/3/2015; 9
For each account I want one row with the most recent docdate (max(docdate)). I have already tried different approaches with CROSS APPLY and ROW_NUMBER but couldn't achieve the desired result.

Use ROW_NUMBER:
SELECT account, docdate, docnum
FROM (
SELECT account, docdate, docnum,
ROW_NUMBER() OVER (PARTITION BY account
ORDER BY docdate DESC) AS rn
FROM mytable ) AS t
WHERE t.rn = 1
The PARTITION BY account clause creates slices of rows sharing the same account value. ORDER BY docdate DESC places the record with the maximum docdate value at the top of its slice. Hence rn = 1 points to the record with the maximum docdate value within each account partition.
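As a variation on the same idea, here is a minimal self-contained sketch that inlines the sample rows from the question and uses TOP (1) WITH TIES instead of a derived table. The inline VALUES list and the yyyymmdd date literals are assumptions made for illustration (the question's dates are read as month/day/year). This is an alternative idiom, not the query from the answer above.

```sql
-- Sample rows from the question, inlined so the sketch runs on its own.
-- Dates are assumed to be month/day/year and written as yyyymmdd literals.
WITH mytable (account, docdate, docnum) AS (
    SELECT account, CAST(docdate AS date), docnum
    FROM (VALUES
        (17700, '20150911', 1),
        (17700, '20150912', 2),
        (70070, '20150901', 4),
        (70070, '20150902', 6),
        (70070, '20150903', 9)
    ) AS v (account, docdate, docnum)
)
-- Every row numbered 1 within its account ties for first place,
-- so TOP (1) WITH TIES keeps exactly one row per account.
SELECT TOP (1) WITH TIES account, docdate, docnum
FROM mytable
ORDER BY ROW_NUMBER() OVER (PARTITION BY account ORDER BY docdate DESC);
```

With the sample data this keeps (17700, 2015-09-12, 2) and (70070, 2015-09-03, 9). If two rows in an account share the same latest docdate, ROW_NUMBER picks one of them arbitrarily unless a tie-breaker such as docnum DESC is added to the inner ORDER BY.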

Related

SQL spread out one group proportionally to other groups

So, for example, we have a number of users with different group id. Some of them don't have group:
userID groupID
-------------
user1 group1
user2 group1
user3 group2
user4 group1
user5 NULL
user6 NULL
user7 NULL
user8 NULL
We need to group users by their groupID. And we want users without group (groupID equals NULL) to be assigned to one of existing groups(group1 or group2 in this example). But we want to distribute them proportionally to amount of users already assigned to those groups. In our example group1 has 3 users and group2 has only 1 user. So 75% (3/4) of new users should be counted as members of group 1 and other 25% (1/4) should be "added" to group2. The end result should look like this:
groupID numOfUsers
-------------
group1 6
group2 2
This is a simplified example.
Basically we just can't figure out how users without a group can be divided between groups in a certain proportion, not just evenly distributed between them.
We can have any number of groups and users, so we can't just hardcode percentages.
Any help is appreciated.
Edit:
I tried to use NTILE(), but it gives even distribution, not proportional to amount of users in groups
SELECT userID ,
NTILE(2) OVER( ) gr
from(
select DISTINCT userID
from test_task
WHERE groupID IS NULL ) AS abc
here is one way:
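select
groupID
, count(*)
-- multiply by 1.0 to avoid integer division, and round after scaling by the number of unassigned users
+ cast(round(count(*) * 1.0 / sum(count(*)) over () * (select count(*) from test_task where groupID is null), 0) as int) as numOfUsers
from test_task
where groupID is not null
group by groupID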
We can use an updatable CTE to do this
First, we take all existing data, group it up by groupID, and calculate a running sum of the number of rows, as well as the total rows over the whole set
We take the rows we want to update and add a row-number (subtract 1 so the calculations work)
Join the two so that each row number, taken modulo the total number of existing rows, falls between the previous running sum (inclusive) and the current running sum (exclusive)
Note that the split is exactly proportional only when the number of rows to assign is a multiple of the number of existing rows, e.g. 4 or 8 new rows against 4 existing rows
WITH Groups AS (
SELECT
groupID,
perGroup = COUNT(*),
total = SUM(COUNT(*)) OVER (),
runningSum = SUM(COUNT(*)) OVER (ORDER BY groupID ROWS UNBOUNDED PRECEDING)
FROM test_task
WHERE groupID IS NOT NULL
GROUP BY groupID
),
ToUpdate AS (
SELECT
groupID,
userID,
rn = ROW_NUMBER() OVER (ORDER BY userID) - 1
FROM test_task tt
WHERE groupID IS NULL
)
UPDATE u
SET groupID = g.groupID
FROM ToUpdate u
JOIN Groups g
ON u.rn % (g.total) >= g.runningSum - g.perGroup
AND u.rn % (g.total) < g.runningSum;
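To check the join logic before running the UPDATE, the same CTEs can drive a plain SELECT. The sketch below loads the question's sample rows into a table variable (@test_task is a hypothetical stand-in for the real table, and the column types are assumed) and lists which group each unassigned user would receive.

```sql
-- Hypothetical stand-in for test_task, filled with the question's sample rows.
DECLARE @test_task TABLE (userID varchar(10) NOT NULL, groupID varchar(10) NULL);

INSERT INTO @test_task (userID, groupID) VALUES
    ('user1', 'group1'), ('user2', 'group1'), ('user3', 'group2'), ('user4', 'group1'),
    ('user5', NULL), ('user6', NULL), ('user7', NULL), ('user8', NULL);

WITH Groups AS (
    SELECT
        groupID,
        perGroup   = COUNT(*),
        total      = SUM(COUNT(*)) OVER (),
        runningSum = SUM(COUNT(*)) OVER (ORDER BY groupID ROWS UNBOUNDED PRECEDING)
    FROM @test_task
    WHERE groupID IS NOT NULL
    GROUP BY groupID
),
ToAssign AS (
    SELECT
        userID,
        rn = ROW_NUMBER() OVER (ORDER BY userID) - 1   -- zero-based position
    FROM @test_task
    WHERE groupID IS NULL
)
-- Each position, modulo the number of existing rows, falls into exactly one
-- half-open range [runningSum - perGroup, runningSum).
SELECT u.userID, u.rn, g.groupID AS assignedGroup
FROM ToAssign u
JOIN Groups g
    ON  u.rn % g.total >= g.runningSum - g.perGroup
    AND u.rn % g.total <  g.runningSum
ORDER BY u.rn;
```

For the sample data, group1 covers positions 0 to 2 and group2 covers position 3, so user5, user6 and user7 map to group1 and user8 maps to group2, which gives the 6 and 2 totals asked for in the question.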

T-SQL - Get last as-at date SUM(Quantity) was not negative

I am trying to find the last date, by location and product, on which a sum was positive. The only way I can think of to do it is with a cursor, and if that's the case I may as well just do it in code. Before I go down that route, I was hoping someone might have a better idea.
Table:
Product, Date, Location, Quantity
The scenario is this: I find the quantity by location and product at a particular date, and if it is negative I need to get the sum and the date when the group was last positive.
select
Product,
Location,
SUM(Quantity) Qty,
SUM(Value) Value
from
ProductTransactions PT
where
Date <= @AsAtDate
group by
Product,
Location
I am looking for the last date where the sum of the transactions up to and including it is positive.
Based on your revised question and your comment, here is another solution that I hope answers your question.
select Product, Location, max(Date) as Date
from (
select a.Product, a.Location, a.Date from ProductTransactions as a
join ProductTransactions as b
on a.Product = b.Product and a.Location = b.Location
where b.Date <= a.Date
group by a.Product, a.Location, a.Date
having sum(b.Value) >= 0
) as T
group by Product, Location
The subquery (table T) produces a list of {product, location, date} rows for which the sum of the values prior (and inclusive) is positive. From that set, we select the last date for each {product, location} pair.
This can be done in a set based way using windowed aggregates in order to construct the running total. Depending on the number of rows in the table this could be a bit slow but you can't really limit the time range going backwards as the last positive date is an unknown quantity.
I've used a CTE for convenience to construct the aggregated data set, but converting that to a temp table should be faster. (A CTE is not materialized, so it may be evaluated each time it is referenced, whereas a temp table is populated only once.)
The basic theory is to construct the running totals for all of the previous days using the OVER clause to partition and order the SUM aggregates. This data set is then used and filtered to the expected date. When a row in that table has a quantity less than zero it is joined back to the aggregate data set for all previous days for that product and location where the quantity was greater than zero.
Since this may return multiple positive date rows the ROW_NUMBER() function is used to order the rows based on the date of the positive quantity day. This is done in descending order so that row number 1 is the most recent positive day. It isn't possible to use a simple MIN() here because the MIN([Date]) may not correspond to the MIN(Quantity).
WITH x AS (
SELECT [Date],
Product,
[Location],
SUM(Quantity) OVER (PARTITION BY Product, [Location] ORDER BY [Date] ASC) AS Quantity,
SUM([Value]) OVER(PARTITION BY Product, [Location] ORDER BY [Date] ASC) AS [Value]
FROM ProductTransactions
WHERE [Date] <= @AsAtDate
)
SELECT [Date], Product, [Location], Quantity, [Value], Positive_date, Positive_date_quantity
FROM (
SELECT x1.[Date], x1.Product, x1.[Location], x1.Quantity, x1.[Value],
x2.[Date] AS Positive_date, x2.[Quantity] AS Positive_date_quantity,
ROW_NUMBER() OVER (PARTITION BY x1.Product, x1.[Location] ORDER BY x2.[Date] DESC) AS Positive_date_row
FROM x AS x1
LEFT JOIN x AS x2 ON x1.Product=x2.Product AND x1.[Location]=x2.[Location]
AND x2.[Date]<x1.[Date] AND x1.Quantity<0 AND x2.Quantity>0
WHERE x1.[Date] = @AsAtDate
) AS y
WHERE Positive_date_row=1
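If only the date is needed, the running-total idea can be reduced to a single aggregate over the windowed sum. The following is a minimal sketch under stated assumptions: the four sample rows and the tx and running names are made up for illustration, only Quantity is considered, and "positive" is taken to mean strictly greater than zero.

```sql
DECLARE @AsAtDate date = '20170104';

-- Hypothetical sample transactions for one product and location.
WITH tx (Product, Location, [Date], Quantity) AS (
    SELECT Product, Location, CAST(d AS date), Quantity
    FROM (VALUES
        ('A', 'B', '20170101',  5),
        ('A', 'B', '20170102', -2),
        ('A', 'B', '20170103', -6),
        ('A', 'B', '20170104',  1)
    ) AS v (Product, Location, d, Quantity)
),
running AS (
    -- Running total per product and location, up to the as-at date.
    SELECT Product, Location, [Date],
           SUM(Quantity) OVER (PARTITION BY Product, Location ORDER BY [Date]) AS RunningQty
    FROM tx
    WHERE [Date] <= @AsAtDate
)
-- Latest date on which the running total was still above zero.
SELECT Product, Location,
       MAX(CASE WHEN RunningQty > 0 THEN [Date] END) AS LastPositiveDate
FROM running
GROUP BY Product, Location;
```

The running totals here are 5, 3, -3 and -2, so the query returns 2017-01-02 for product A at location B. A group whose running total was never positive gets NULL.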
Do you mean that you want the last date on which the running quantity turned positive within each group?
For example, if you are using SQL Server 2012+:
In the following scenario, the running total of quantity reaches 1 (-10 + 5 + 6) on 01/03/2017.
Is it possible for the running total to become negative again on a later date?
;WITH tb(Product, Location,[Date],Quantity) AS(
SELECT 'A','B',CONVERT(DATETIME,'01/01/2017'),-10 UNION ALL
SELECT 'A','B','01/02/2017',5 UNION ALL
SELECT 'A','B','01/03/2017',6 UNION ALL
SELECT 'A','B','01/04/2017',2
)
SELECT t.Product,t.Location,SUM(t.Quantity) AS Qty,MIN(CASE WHEN t.CurrentSum>0 THEN t.Date ELSE NULL END ) AS LastPositiveDate
FROM (
SELECT *,SUM(tb.Quantity)OVER(PARTITION BY tb.Product,tb.Location ORDER BY [Date]) AS CurrentSum FROM tb
) AS t GROUP BY t.Product,t.Location
Product Location Qty LastPositiveDate
------- -------- ----------- -----------------------
A B 3 2017-01-03 00:00:00.000

Selecting changes in an employees details

I have a table in SQL Server where users are allowed to make changes to an employee's details. Every change inserts a new record into the EMPLOYEE_HIST table. Only the EMP_ID stays constant for an employee; all other details are modifiable.
There is also a SEQ_NO column that records the order in which entries were made.
EMPLOYEE_HIST:
SEQ_NO EMP_ID SOME_VAL1 SOME_VAL2
1 E1 V11 V21 (initial value of this employee)
2 E2 V12 V22 (initial value of this employee)
3 E3 V13 V23 (initial value of this employee)
4 E2 V00 V22
5 E1 V01 V21
6 E2 V02 V22
7 E4 V00 V00 (initial value of this employee)
I want a query which will give me changes made to particular employees, something like
EMP_ID SOME_VAL1_OLD SOME_VAL1_NEW SOME_VAL2_OLD SOME_VAL2_NEW
E1 V11 V01 V21 V21
E2 V12 V00 V22 V22
E2 V00 V02 V22 V22
UPDATE
Also, employee details may be modified any number of times, and each change should produce a row in the result set.
Please help.
EDIT:
I finally settled with using LAG function. It will work like this:
SELECT *,ROW_NUMBER() OVER(PARTITION BY EMP_ID,CHANGE_NO ORDER BY EMP_ID,CHANGE_NO,SEQ_NO) AS RN
FROM(
-- changes to SOME_VAL1
SELECT * FROM ( SELECT SEQ_NO, EMP_ID, LAG(SOME_VAL1)
OVER(PARTITION BY EMP_ID ORDER BY EMP_ID,SEQ_NO) AS OLD_VAL, SOME_VAL1 AS NEW_VAL, '1' AS CHANGE_NO FROM EMPLOYEE_HIST) T
WHERE OLD_VAL<>NEW_VAL UNION ALL
-- changes to SOME_VAL2
SELECT * FROM ( SELECT SEQ_NO, EMP_ID, LAG(SOME_VAL2) OVER(PARTITION BY EMP_ID ORDER BY EMP_ID,SEQ_NO) AS OLD_VAL, SOME_VAL2 AS NEW_VAL, '2' AS CHANGE_NO FROM EMPLOYEE_HIST) T
WHERE OLD_VAL<>NEW_VAL) TEMP
But the performance is terribly slow: it fetches only about 500 rows in total from a table containing 3 million records. Please suggest ways to reduce the sorting cost.
You can use a CTE with a Window function if you're using 2008 or newer:
;WITH r AS (
SELECT RANK() OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO DESC) [rank]
, EMP_ID
, SOME_VAL1
, SOME_VAL2
FROM EMPLOYEE_HIST
)
SELECT e.EMP_ID
, s2.SOME_VAL1 [SOME_VAL1_OLD]
, s1.SOME_VAL1 [SOME_VAL1_NEW]
, s2.SOME_VAL2 [SOME_VAL2_OLD]
, s1.SOME_VAL2 [SOME_VAL2_NEW]
FROM (SELECT DISTINCT EMP_ID FROM EMPLOYEE_HIST) AS e
LEFT JOIN r AS s1 ON e.EMP_ID = s1.EMP_ID and s1.rank = 1 --the last change
LEFT JOIN r AS s2 ON e.EMP_ID = s2.EMP_ID and s2.rank = 2 --the second to last change
If you want all of the changes, not just the top two, then you should be able to do something like this:
;WITH r AS (
SELECT RANK() OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO DESC) [rank]
, EMP_ID
, SOME_VAL1
, SOME_VAL2
FROM EMPLOYEE_HIST
)
SELECT e.EMP_ID
, s2.SOME_VAL1 [SOME_VAL1_OLD]
, s1.SOME_VAL1 [SOME_VAL1_NEW]
, s2.SOME_VAL2 [SOME_VAL2_OLD]
, s1.SOME_VAL2 [SOME_VAL2_NEW]
FROM (SELECT DISTINCT EMP_ID FROM EMPLOYEE_HIST) AS e
LEFT JOIN (r AS s1 --the change
INNER JOIN r AS s2 ON s1.EMP_ID = s2.EMP_ID and s2.rank = s1.rank + 1) --previous value
ON e.EMP_ID = s1.EMP_ID
This should enumerate all changes until it encounters the original value.
You could use a CTE to get a partitioned row number, by EMP_ID. Then join that against itself where the row number is offset by 1.
;WITH PartitionedRows
AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY EMP_ID ORDER BY SEQ_NO) AS RowID, EMP_ID, SOME_VAL1,SOME_VAL2
FROM EMPLOYEE_HIST
)
SELECT a.EMP_ID,b.SOME_VAL1 AS SOME_VAL1_OLD,a.SOME_VAL1 AS SOME_VAL1_NEW,b.SOME_VAL2 AS SOME_VAL2_OLD,a.SOME_VAL2 AS SOME_VAL2_NEW
FROM PartitionedRows a
LEFT JOIN PartitionedRows b ON a.EMP_ID = b.EMP_ID AND a.RowID = (b.RowID + 1)
WHERE b.RowID IS NOT NULL
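On SQL Server 2012 or later, the same pairing of each row with its predecessor can be sketched with LAG, which avoids the self-join. The block below is a minimal illustration, not a tuned solution: the hist and WithPrev names are hypothetical, and the rows are the sample data from the question inlined so it runs on its own.

```sql
-- Sample rows from the question; "hist" stands in for EMPLOYEE_HIST.
WITH hist (SEQ_NO, EMP_ID, SOME_VAL1, SOME_VAL2) AS (
    SELECT * FROM (VALUES
        (1, 'E1', 'V11', 'V21'),
        (2, 'E2', 'V12', 'V22'),
        (3, 'E3', 'V13', 'V23'),
        (4, 'E2', 'V00', 'V22'),
        (5, 'E1', 'V01', 'V21'),
        (6, 'E2', 'V02', 'V22'),
        (7, 'E4', 'V00', 'V00')
    ) AS v (SEQ_NO, EMP_ID, SOME_VAL1, SOME_VAL2)
),
WithPrev AS (
    SELECT SEQ_NO, EMP_ID,
           LAG(SEQ_NO)    OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO) AS PREV_SEQ_NO,
           LAG(SOME_VAL1) OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO) AS SOME_VAL1_OLD,
           SOME_VAL1                                                 AS SOME_VAL1_NEW,
           LAG(SOME_VAL2) OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO) AS SOME_VAL2_OLD,
           SOME_VAL2                                                 AS SOME_VAL2_NEW
    FROM hist
)
-- The first row of each employee has no predecessor and is not a change.
SELECT EMP_ID, SOME_VAL1_OLD, SOME_VAL1_NEW, SOME_VAL2_OLD, SOME_VAL2_NEW
FROM WithPrev
WHERE PREV_SEQ_NO IS NOT NULL
ORDER BY EMP_ID, SEQ_NO;
```

For the sample data this returns one row for E1 (V11 to V01) and two rows for E2 (V12 to V00, then V00 to V02), matching the result set requested in the question. All window functions share the same PARTITION BY and ORDER BY, so an index on (EMP_ID, SEQ_NO) may let the engine avoid a separate sort, though that depends on the actual plan.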
You may be better off with a different data model. You could have a table EMPLOYEE_HIST_OLD that contains the identical data structure. This would allow you to archive the former data (even with a timestamp and/or sequence number), keep the size of the EMPLOYEE_HIST table smaller and w/o data you would not reference regularly, etc. This would allow for a basic join statement between the two tables.
I would then suggest you use the timestamp of the EMPLOYEE_HIST_OLD records to find the most recent modifications, then join those records back to the current records. This will only present to you the changed records. You could limit the query on EMPLOYEE_HIST_OLD to simply return one record (most recent) if you like. See:
SQL query to get most recent row for each instance of a given key
If you must stay within the same EMPLOYEE_HIST table for everything and use the sequence number approach you may wish to use a count() to find changed records for a particular Employee ID and return the values ORDERED by sequence number. You could also limit the query to employees with count > 1. You would then view the data vertically in the table, though. To parse the values into separate columns like VAR1_OLD and VAR1 essentially would require you to only read the last two values and make one record out of two. You lose the visibility of all the changes when trying to view the data horizontally. There could be more than one historical change. To view the records horizontally would require you to do some array manipulation outside of SQL after the data was returned from the query.
For info on counting:
SQL query for finding records where count > 1

unique chat records sql

I have a table with 5 columns, as follows:
message_id
user_id_send
user_id_rec
message_date
message_details
I am looking for a SQL Server query that filters results on two columns (user_id_send, user_id_rec) for a given user ID, subject to the following constraints:
Get the latest record (by date or message_id)
Only unique records (1 - 2 and 2 - 1 are the same pair, so only one record is returned, whichever is the latest)
Ordered by message_id, descending
The main purpose of this query is to get records of user_id to find out to whom he has sent messages and from whom he had received messages.
I have also attached the sheet for your reference.
Here is my try
WITH t
AS (SELECT *
FROM messages
WHERE user_id_sender = 1)
SELECT DISTINCT user_id_reciever,
*
FROM t;
WITH h
AS (SELECT *
FROM messages
WHERE user_id_reciever = 1)
SELECT DISTINCT user_id_sender,
*
FROM h;
;WITH tmpMsg AS (
SELECT M2.message_id
,M2.user_id_receiver
,M2.user_id_sender
,M2.message_date
,M2.message_details
,ROW_NUMBER() OVER (PARTITION BY user_id_receiver+user_id_sender ORDER BY message_date DESC) AS 'RowNum'
FROM messages M2
WHERE M2.user_id_receiver = 1
OR M2.user_id_sender = 1
)
SELECT T.message_id
,T.user_id_receiver
,T.user_id_sender
,T.message_date
,T.message_details
FROM tmpMsg T
WHERE RowNum <= 1
The above should fetch the results you are looking for when you query for a particular user_id (replace the 1 with a parameter, e.g. @p_user_id). The user_id_receiver+user_id_sender expression in the PARTITION BY clause ensures that records with user ID combinations such as 1 - 2 and 2 - 1 are not selected twice.
Hope this helps.
select * from
(
select ROW_NUMBER() over (partition by user_id_sender order by message_date DESC) as rowno,
* from messages
where user_id_receiver = 1
--order by message_date DESC
) T where T.rowno = 1
UNION ALL
select * from
(
select ROW_NUMBER() over (partition by user_id_receiver order by message_date DESC) as rowno,
* from messages
where user_id_sender = 1
-- order by message_date DESC
) T where T.rowno = 1
Explanation: For each group of user_id_sender, it orders internally by message_date desc and then adds row numbers, and we only want the first one (chronologically last). Then it does the same for user_id_receiver and unions the results together to get one result set with all the desired rows. You can then add your own ORDER BY clause and additional WHERE conditions at the end as required.
Of course, this only works for one user_id at a time (replace = 1 with @user_id).
Getting results for all user_ids at once requires a totally different query, so I hope this helps.
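Another way to treat 1 - 2 and 2 - 1 as the same conversation is to partition by "the other participant" rather than by sender or receiver. The sketch below is an illustration under assumptions: the five sample messages are made up, the column names follow the answers above (user_id_sender, user_id_receiver) rather than the question's list, and message_id is used as the recency order, which the question allows.

```sql
DECLARE @user_id int = 1;

-- Hypothetical sample messages; message 5 does not involve user 1.
WITH messages (message_id, user_id_sender, user_id_receiver, message_details) AS (
    SELECT * FROM (VALUES
        (1, 1, 2, 'first to 2'),
        (2, 2, 1, 'reply from 2'),
        (3, 3, 1, 'from 3'),
        (4, 1, 4, 'to 4'),
        (5, 2, 3, 'unrelated')
    ) AS v (message_id, user_id_sender, user_id_receiver, message_details)
),
ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               -- The conversation partner is whichever side is not @user_id.
               PARTITION BY CASE WHEN user_id_sender = @user_id
                                 THEN user_id_receiver
                                 ELSE user_id_sender END
               ORDER BY message_id DESC
           ) AS rn
    FROM messages
    WHERE @user_id IN (user_id_sender, user_id_receiver)
)
SELECT message_id, user_id_sender, user_id_receiver, message_details
FROM ranked
WHERE rn = 1
ORDER BY message_id DESC;
```

With these rows the query returns messages 4, 3 and 2, in that order: one latest message per conversation partner of user 1, regardless of direction.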

Left outer join for first row in group only

I have a table that looks like this:
BANK ACCOUNT_NAME EXCESS DEBT
Acme Bank Checking1 500 300
Acme Bank Personal 200 100
Bank One Business 100 50
I need a SQL query that returns:
BANK ACCOUNT_NAME EXCESS DEBT AVAILABLE
Acme Bank Checking1 500 300 300
Acme Bank Personal 200 100 NULL
Bank One Business 100 50 50
AVAILABLE would be the Sum(EXCESS) - Sum(DEBT) grouped by BANK. AVAILABLE would then appear only on the first row of BANK-ACCOUNT_NAME combination. How do I do this?
My first attempt results in AVAILABLE having values on all rows, which is not intended. I only want the first row in the group to have an AVAILABLE value.
-- OUTER and INNER are reserved words, so the aliases are renamed to o and i
SELECT
o.BANK
,o.ACCOUNT_NAME
,o.EXCESS
,o.DEBT
,i2.AVAILABLE
FROM BankBalances AS o
CROSS APPLY
(
SELECT TOP 1
i.BANK
,SUM(i.EXCESS) - SUM(i.DEBT) AS AVAILABLE
FROM BankBalances AS i
WHERE o.BANK = i.BANK -- WHERE must come before GROUP BY
GROUP BY i.BANK
) AS i2
You can use the following query:
SELECT BANK, ACCOUNT_NAME, EXCESS, DEBT,
CASE WHEN ROW_NUMBER() OVER (PARTITION BY BANK ORDER BY ACCOUNT_NAME) = 1
THEN SUM(EXCESS) OVER (PARTITION BY BANK) -
SUM(DEBT) OVER (PARTITION BY BANK)
ELSE NULL
END AS AVAILABLE
FROM BankBalances
You can use windowed version of SUM in order to avoid CROSS APPLY. ROW_NUMBER is simply used to check for first row.
I have made the assumption that first row is considered the one having the 'minimum' ACCOUNT_NAME value within each BANK partition.
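To try the approach without creating a table, the sample rows from the question can be inlined. This is a small self-contained sketch: the VALUES list is the question's data, and SUM(EXCESS - DEBT) is used as a shorthand that gives the same result as the two separate sums when neither column contains NULLs, which is an assumption here.

```sql
-- Sample rows from the question, inlined so the sketch runs on its own.
WITH BankBalances (BANK, ACCOUNT_NAME, EXCESS, DEBT) AS (
    SELECT * FROM (VALUES
        ('Acme Bank', 'Checking1', 500, 300),
        ('Acme Bank', 'Personal',  200, 100),
        ('Bank One',  'Business',  100,  50)
    ) AS v (BANK, ACCOUNT_NAME, EXCESS, DEBT)
)
SELECT BANK, ACCOUNT_NAME, EXCESS, DEBT,
       -- Only the first account (alphabetically) of each bank shows the total.
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY BANK ORDER BY ACCOUNT_NAME) = 1
            THEN SUM(EXCESS - DEBT) OVER (PARTITION BY BANK)
       END AS AVAILABLE
FROM BankBalances
ORDER BY BANK, ACCOUNT_NAME;
```

This yields AVAILABLE = 300 on the Checking1 row, NULL on the Personal row, and 50 on the Business row, as in the expected output from the question.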
You can use ROW_NUMBER and SUM() OVER with PARTITION BY like this:
;WITH CTE AS
(
SELECT
BANK
,ACCOUNT_NAME
,EXCESS
,DEBT
,SUM(EXCESS - DEBT) OVER(PARTITION BY BANK) AS AVAILABLE
,ROW_NUMBER()OVER(PARTITION BY BANK ORDER BY ACCOUNT_NAME ASC) rn
FROM BankBalances
)
SELECT BANK
,ACCOUNT_NAME
,EXCESS
,DEBT
,CASE WHEN rn = 1 THEN AVAILABLE ELSE null end as AVAILABLE
FROM CTE
