Retrieving the most recent records within a query

Retrieving the most recent records within a query - sql-server

I have the following tables:
tblPerson:
PersonID | Name
---------------------
1 | John Smith
2 | Jane Doe
3 | David Hoshi
tblLocation:
LocationID | Timestamp | PersonID | X | Y | Z | More Columns...
---------------------------------------------------------------
40 | Jan. 1st | 3 | 0 | 0 | 0 | More Info...
41 | Jan. 2nd | 1 | 1 | 1 | 0 | More Info...
42 | Jan. 2nd | 3 | 2 | 2 | 2 | More Info...
43 | Jan. 3rd | 3 | 4 | 4 | 4 | More Info...
44 | Jan. 5th | 2 | 0 | 0 | 0 | More Info...
I can produce an SQL query that gets the Location records for each Person like so:
SELECT LocationID, Timestamp, Name, X, Y, Z
FROM tblLocation
JOIN tblPerson
ON tblLocation.PersonID = tblPerson.PersonID;
to produce the following:
LocationID | Timestamp | Name | X | Y | Z |
--------------------------------------------------
40 | Jan. 1st | David Hoshi | 0 | 0 | 0 |
41 | Jan. 2nd | John Smith | 1 | 1 | 0 |
42 | Jan. 2nd | David Hoshi | 2 | 2 | 2 |
43 | Jan. 3rd | David Hoshi | 4 | 4 | 4 |
44 | Jan. 5th | Jane Doe | 0 | 0 | 0 |
My issue is that we're only concerned with the most recent Location record. As such, we're only really interested in the following Rows: LocationID 41, 43, and 44.
The question is: How can we query these tables to give us the most recent data on a per-person basis? What special grouping needs to happen to produce the desired result?

MySQL doesn't have ranking/analytical/windowing functionality.
SELECT tl.locationid, tl.timestamp, tp.name, X, Y, Z
FROM tblPerson tp
JOIN tblLocation tl ON tl.personid = tp.personid
JOIN (SELECT t.personid,
MAX(t.timestamp) AS max_date
FROM tblLocation t
GROUP BY t.personid) x ON x.personid = tl.personid
AND x.max_date = tl.timestamp
SQL Server 2005+ and Oracle 9i+ support analytics, so you could use:
SELECT x.locationid, x.timestamp, x.name, x.X, x.Y, x.Z
FROM (SELECT tl.locationid, tl.timestamp, tp.name, X, Y, Z,
ROW_NUMBER() OVER (PARTITION BY tp.name ORDER BY tl.timestamp DESC) AS rank
FROM tblPerson tp
JOIN tblLocation tl ON tl.personid = tp.personid) x
WHERE x.rank = 1
Using a variable to get same as ROW_NUMBER functionality on MySQL:
SELECT x.locationid, x.timestamp, x.name, x.X, x.Y, x.Z
FROM (SELECT tl.locationid, tl.timestamp, tp.name, X, Y, Z,
CASE
WHEN #name != t.name THEN
#rownum := 1
ELSE #rownum := #rownum + 1
END AS rank,
#name := tp.name
FROM tblLocation tl
JOIN tblPerson tp ON tp.personid = tl.personid
JOIN (SELECT #rownum := NULL, #name := '') r
ORDER BY tp.name, tl.timestamp DESC) x
WHERE x.rank = 1

As #Mark Byers mentions, this problem comes up frequently on Stack Overflow.
Here's the solution I most frequently recommend, given your tables:
SELECT p.*, l1.*
FROM tblPerson p
JOIN tblLocation l1 ON p.PersonID = l1.PersonID
LEFT OUTER JOIN tblLocation l2 ON p.PersonID = l2.PersonID AND
(l1.timestamp < l2.timestamp OR l1.timestamp = l2.timestamp AND l1.LocationId < l2.LocationId)
WHERE l2.LocationID IS NULL;
To see other examples, follow the tag greatest-n-per-group, which I added to your question.

This is a classic 'max per group' question that comes up on Stack Overflow almost every day. There are many ways to solve it and you can find example solutions by searching Stack Overflow. Here is one way that you can do it in MySQL:
SELECT
location.LocationId,
location.Timestamp,
person.Name,
location.X,
location.Y,
location.Z
FROM (
SELECT
LocationID,
#rn := CASE WHEN #prev_PersonID = PersonID
THEN #rn + 1
ELSE 1
END AS rn,
#prev_PersonID := PersonID
FROM (SELECT #prev_PersonID := NULL) vars, tblLocation
ORDER BY PersonID, Timestamp DESC
) T1
JOIN tblLocation location ON location.LocationID = T1.LocationId
JOIN tblPerson person ON person.PersonID = location.PersonID
WHERE rn = 1

Related

Update All other Records Based on a single record

I have a table with a million records. I need to update some columns which are null based on the existing 'not null' records of a particular id based columns. I've tried with one query, it seems to be working fine but I don't have confidence in it that it will be able to update all those 1 million records exactly the way I need. I'm providing you some sample data how my table looks like.Any help will be appreciated
SELECT * INTO #TEST FROM (
SELECT 1 AS EMP_ID,10 AS DEPT_ID,15 AS ITEM_NBR ,NULL AS AMOUNT,NULL AS ITEM_NME
UNION ALL
SELECT 1,20,16,500,'ABCD'
UNION ALL
SELECT 1,30,17,NULL,NULL
UNION ALL
SELECT 2,10,15,1000,'XYZ'
UNION ALL
SELECT 2,30,16,NULL,NULL
UNION ALL
SELECT 2,40,17,NULL,NULL
) AS A
Sample data:
+--------+---------+----------+--------+----------+
| EMP_ID | DEPT_ID | ITEM_NBR | AMOUNT | ITEM_NME |
+--------+---------+----------+--------+----------+
| 1 | 10 | 15 | NULL | NULL |
| 1 | 20 | 16 | 500 | ABCD |
| 1 | 30 | 17 | NULL | NULL |
| 2 | 10 | 15 | 1000 | XYZ |
| 2 | 30 | 16 | NULL | NULL |
| 2 | 40 | 17 | NULL | NULL |
+--------+---------+----------+--------+----------+
Expected result:
+--------+---------+----------+--------+----------+
| EMP_ID | DEPT_ID | ITEM_NBR | AMOUNT | ITEM_NME |
+--------+---------+----------+--------+----------+
| 1 | 10 | 15 | 500 | ABCD |
| 1 | 20 | 16 | 500 | ABCD |
| 1 | 30 | 17 | 500 | ABCD |
| 2 | 10 | 15 | 1000 | XYZ |
| 2 | 30 | 16 | 1000 | XYZ |
| 2 | 40 | 17 | 1000 | XYZ |
+--------+---------+----------+--------+----------+
I tried this but I'm unable to conclude whether it is updating all the 1 million records properly.
SELECT * FROM #TEST T
inner JOIN #TEST T1 ON T1.EMP_ID=T.EMP_ID
WHERE T1.AMOUNT IS NOT NULL
UPDATE T SET AMOUNT=T1.AMOUNT
FROM #TEST T
inner JOIN #TEST T1 ON T1.EMP_ID=T.EMP_ID
WHERE T1.AMOUNT IS not NULL

I have used UPDATE using inner join
UPDATE T
SET T.AMOUNT = X.AMT,T.ITEM_NME=X.I_N
FROM #TEST T
JOIN
(SELECT EMP_ID,MAX(AMOUNT) AS AMT,MAX(ITEM_NME) AS I_N
FROM #TEST
GROUP BY EMP_ID) X ON X.EMP_ID = T.EMP_ID

SELECT * into #Test1
FROM #TEST
WHERE AMOUNT IS NOT NULL
For records validation run this query first
SELECT T.AMOUNT, T1.AMOUNT, T1.EMP_ID,T1.EMP_ID
FROM #TEST T
inner JOIN #TEST1 T1 ON T1.EMP_ID=T.EMP_ID
WHERE T.AMOUNT IS NULL
Begin Trans
UPDATE T
SET T.AMOUNT=T1.AMOUNT, T.ITEM_NME= = T1.ITEM_NME
FROM #TEST T
inner JOIN #TEST1 T1 ON T1.EMP_ID=T.EMP_ID
WHERE T.AMOUNT IS NULL
rollback

SELECT EMP_ID,MAX(AMOUNT) as AMOUNT MAX(ITEM_NAME) as ITEM_NAME
INTO #t
FROM #TEST
GROUP BY EMP_ID
UPDATE t SET t.AMOUNT = t1.AMOUNT, t.ITEM_NAME = t1.ITEM_NAME
FROM #TEST t INNER JOIN #t t1
ON t.emp_id = t1.emp_id
WHERE t.AMOUNT IS NULL and t.ITEM_NAME IS NULL
Use MAX aggregate function to get amount and item name for each employee and then replace null values of amount and item name with those values. For validation use COUNT function to calculate the number of rows with values of amount and item name as null. If the number of rows is zero then table is updated correctly

Swap row values in SQL Server

I have question about SQL Server
Source: emp
id | name | check |deptname
100 | a | 1 |ceo
100 | b | 2 |hr
100 | c | 3 |po
100 | d | 5 |no
101 | a | 1 |pm
101 | b | 5 |ceo
102 | a | 1 |rn
102 | b | 2 |han
Here same id have check 2 and 5 values then we need to replace check values to 2 check values for that id.
Based on above table I want load/output data into target table like below
Target : emp1
id | name | check |deptname
100 | a | 1 |ceo
100 | d | 2 |hr
100 | c | 3 |po
101 | a | 1 |pm
101 | b | 5 |ceo
102 | a | 1 |rn
102 | b | 2 |han
and I tried like below
select
a1.id,
a1.name,
isnull(a2.[check],a1.[check]) as [check]
from
emp as a1
left outer join
emp as a2 on a2.id = a1.id
and a1.[check] in (2,5)
and a2.[check] in (2,5)
and a2.[check] <> a1.[check]
where
a2.id is null
or (a1.[check] = 5
and a2.[check] = 2)
That query does not return the right result.
Please tell me how to write query to get the expected output in SQL Server

This is how you can do that:
select
e1.id,
isnull(e2.name, e1.name) as name,
e1.[check],
e1.deptname
from emp e1
left outer join emp e2
on e2.id = e1.id and e1.[check] = 2 and e2.[check] = 5
where
not exists (select 1 from emp e3 where e3.id = e1.id and
e1.[check] = 5 and e3.[check] = 2)
Example in SQL Fiddle

SUM On Column With Group By SQL

I have following data:
+----------------+--------------+-----+
| StgDescription | ID | Amt |
+----------------+--------------+-----+
| A | OA17 | 11 |
| A | OA17 | 11 |
| A | OA17 | 11 |
| A | OA17 | 11 |
| B | ZA47/ A | 12 |
| B | ZA47/ A | 12 |
| B | ZA47/ B | 10 |
| B | ZA47/ B | 10 |
| B | ZA48/ A | 14 |
| B | ZA48/ F | 10 |
| B | ZA48 /G | 13 |
| B | ZA48 /H | 10 |
| B | ZA48/ I | 15 |
| B | ZA48/ J | 10 |
| B | ZA48/ K | 16 |
| B | ZA48/ L | 10 |
| c | FA01LM100340 | 10 |
| c | PA53 AE | 10 |
+----------------+--------------+-----+
I want to generate report in following format. The amount should be sum for ID for same StgDescription.
+----------------+-----+
| StgDescription | Amt |
+----------------+-----+
| a | 11 |
| b | 120 |
| c | 20 |
+----------------+-----+
I've written following query to get this result:
WITH CTE AS(
SELECT
distinct
s.StgDescription
,p.ID
,Amt
FROM [DinDb].[dbo].[tblTvlTransaction] t
JOIN tblstgmaster s on t.StgId=s.StgId
JOIN tblProjDocSt p on t.TDocID=p.DocId
JOIN [PdasDb].[dbo].[tblIDmaster] f ON p.ID=f.ID
where OptAuthoDateTime between '2015-07-27 00:00:00' and '2015-09-01 00:00:00')
select StgDescription,sum(AMT) from cte group by StgDescription
Is there any other efficient alternative to do this?

First in cte remove duplicates, then GROUP BY like:
WITH cte AS (
SELECT DISTINCT StgDescription, ID, Amt
FROM your_tab
)
SELECT
StgDescription,
Amt = SUM(Amt)
FROM cte
GROUP BY StgDescription;
OR:
WITH cte AS (
SELECT StgDescription, ID, Amt
FROM your_tab
GROUP BY StgDescription, ID, Amt
)
SELECT
StgDescription,
Amt = SUM(Amt)
FROM cte
GROUP BY StgDescription;

I hope that you get the data from a query, not from a table. It would not be good to store data thus redundantly. And it would not be gould to name a column ID which is not the unique identifier for a row in a table.
Your problem with the data is that you have duplicates, which prevents you from getting the sum directly. So use DISTINCT to make your data unique first.
If this data is from a query then simply add DISTINCT after the SELECT keyword. If not, use a derived table (i.e. a subquery) where you select distinct records from the table.
select stgdescription, sum(amt)
from
(
select distinct stgdescription, id, amt
from mydata
) distinct_data
group by stgdescription;
You may want to replace stgdescription with lower(stgdescription), though, if stgdescription can be 'A' or 'a' and you want to treat them the same.

I'd keep it as simple as possible, like this:
select StgDescription, sum(Amt) from
(
select distinct StgDescription, ID, Amt from tablename
) a
group by StgDescription
Hope it helps!

I suspect your duplicates are coming from [tblTvlTransaction], therefore, I would remove this table as a JOIN and use EXISTS to just check a record is there. So essentially the only tables in the FROM clause are those you actually need data from:
SELECT s.StgDescription, p.ID, s.Amt
FROM tblstgmaster AS s
INNER JOIN tblProjDocSt p on
t.TDocID = p.DocId
INNER JOIN [PdasDb].[dbo].[tblIDmaster] AS f
ON p.ID = f.ID
WHERE EXISTS
( SELECT 1
FROM [DinDb].[dbo].[tblTvlTransaction] AS t
WHERE t.OptAuthoDateTime BETWEEN '2015-07-27 00:00:00' AND '2015-09-01 00:00:00'
AND t.StgId = s.StgId
);
The advantage of EXISTS is that it can use a semi-join, which essentially means rather than pulling back all the rows from the transaction table, it will stop the seek/scan as soon as it finds one matching record. This should leave you without duplicates so you can do the SUM directly:
SELECT s.StgDescription, Amount = SUM(s.Amt)
FROM tblstgmaster AS s
INNER JOIN tblProjDocSt p on
t.TDocID = p.DocId
INNER JOIN [PdasDb].[dbo].[tblIDmaster] AS f
ON p.ID = f.ID
WHERE EXISTS
( SELECT 1
FROM [DinDb].[dbo].[tblTvlTransaction] AS t
WHERE t.OptAuthoDateTime BETWEEN '2015-07-27 00:00:00' AND '2015-09-01 00:00:00'
AND t.StgId = s.StgId
)
GROUP BY s.StgDescription;

SQL: Place the sum of data in table A (which is dependent on table B) into table C arranged according to table B. (sample data provided)

Basically, I'll have to group the people in Table 2 according to departments (known only through referring to table 1) and then sum each department according to type (known through Table 2). The results have to grouped by departments (known through table 1).
Can anyone help me execute such an action in SQL?
Sample data:
Table 1
DeptID | Dept
1 | Eng
2 | Mkt
3 | Mkt
4 | Eng
Table 2
Person | DeptID | Type | Amount
A | 1 | p1 | 5
B | 2 | p2 | 3
C | 3 | p1 | 10
D | 2 | p1 | 20
E | 4 | p2 | 17
F | 1 | p2 | 15
G | 2 | p1 | 16
Table Results
Dept | Sum p1 | Sum p2
Eng | 5 | 32
Mkt | 46 | 3

You could do this using conditional aggregation.
SELECT
t1.Dept,
[Sum p1] = SUM(CASE WHEN t2.Type = 'p1' THEN t2.Amount END),
[Sum p2] = SUM(CASE WHEN t2.Type = 'p2' THEN t2.Amount END)
FROM Table2 t2
INNER JOIN Table1 t1
ON t1.DeptID = t2.DeptID
GROUP BY
t1.Dept

How do I utilize Row_Number() (partitioning) for my datapool correctly

we have following table (output is already ordered and separated for understanding):
| PK | FK1 | FK2 | ActionCode | CreationTS | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
| 6 | 100 | 500 | Create | 2011-01-02 00:00:00 | H |
----------------------------------------------------------------------------
| 3 | 100 | 500 | Change | 2011-01-01 02:00:00 | Z |
| 2 | 100 | 500 | Change | 2011-01-01 01:00:00 | X |
| 1 | 100 | 500 | Create | 2011-01-01 00:00:00 | Y |
----------------------------------------------------------------------------
| 4 | 100 | 510 | Create | 2011-01-01 00:30:00 | T |
----------------------------------------------------------------------------
| 5 | 100 | 520 | CreateSystem | 2011-01-01 00:30:00 | A |
----------------------------------------------------------------------------
what is ActionCode? we use this in c# and there it represents an enum-value
what do i want to achieve?
well, i need the following output:
| FK1 | FK2 | ActionCode | SomeAttributeValue |
+-----+-----+--------------+--------------------+
| 100 | 500 | Create | H |
| 100 | 500 | Create | Z |
| 100 | 510 | Create | T |
| 100 | 520 | CreateSystem | A |
-------------------------------------------------
well, what is the actual logic?
we have some logical groups for composite-key (FK1 + FK2). each of these groups can be broken into partitions, which begin with Create or CreateSystem. each partition ends with Create, CreateSystem or Change. The actual value of SomeAttributeValue for each partition should be the value from the last line of the partition.
it is not possible to have following datapool:
| PK | FK1 | FK2 | ActionCode | CreationTS | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
| 7 | 100 | 500 | Change | 2011-01-02 02:00:00 | Z |
| 6 | 100 | 500 | Create | 2011-01-02 00:00:00 | H |
| 2 | 100 | 500 | Change | 2011-01-01 01:00:00 | X |
| 1 | 100 | 500 | Create | 2011-01-01 00:00:00 | Y |
----------------------------------------------------------------------------
and then expect PK 7 to affect PK 2 or PK 6 to affect PK 1.
i don't even know how/where to start ... how can i achieve this?
we are running on mssql 2005+
EDIT:
there's a dump available:
instanceId: my PK
tenantId: FK 1
campaignId: FK 2
callId: FK 3
refillCounter: FK 4
ticketType: ActionCode (1 & 4 & 6 are Create, 5 is Change, 3 must be ignored)
ticketType, profileId, contactPersonId, ownerId, handlingStartTime, handlingEndTime, memo, callWasPreselected, creatorId, creationTS, changerId, changeTS should be taken from the Create (first line in partition in groups)
callingState, reasonId, followUpDate, callingAttempts and callingAttemptsConsecutivelyNotReached should be taken from the last Create (which then would be a "one-line-partition-in-group" / the same as the upper one) or Change (last line in partition in groups)

I'm assuming that each partition can only contain a single Create or CreateSystem, otherwise your requirements are ill-defined. The following is untested, since I don't have a sample table, nor sample data in an easily consumed format:
;With Partitions as (
Select
t1.FK1,
t1.FK2,
t1.CreationTS as StartTS,
t2.CreationTS as EndTS
From
Table t1
left join
Table t2
on
t1.FK1 = t2.FK1 and
t1.FK2 = t2.FK2 and
t1.CreationTS < t2.CreationTS and
t2.ActionCode in ('Create','CreateSystem')
left join
Table t3
on
t1.FK1 = t3.FK1 and
t1.FK2 = t3.FK2 and
t1.CreationTS < t3.CreationTS and
t3.CreationTS < t2.CreationTS and
t3.ActionCode in ('Create','CreateSystem')
where
t1.ActionCode in ('Create','CreateSystem') and
t3.FK1 is null
), PartitionRows as (
SELECT
t1.FK1,
t1.FK2,
t1.ActionCode,
t2.SomeAttributeValue,
ROW_NUMBER() OVER (PARTITION_FRAGMENT_ID BY t1.FK1,T1.FK2,t1.StartTS ORDER BY t2.CreationTS desc) as rn
from
Partitions t1
inner join
Table t2
on
t1.FK1 = t2.FK1 and
t1.FK2 = t2.FK2 and
t1.StartTS <= t2.CreationTS and
(t2.CreationTS < t1.EndTS or t1.EndTS is null)
)
select * from PartitionRows where rn = 1
(Please note than I'm using all kinds of reserved names here)
The basic logic is: The Partitions CTE is used to define each partition in terms of the FK1, FK2, an inclusive start timestamp, and exclusive end timestamp. It does this by a triple join to the base table. the rows from t2 are selected to occur after the rows from t1, then the rows from t3 are selected to occur between the matching rows from t1 and t2. Then, in the WHERE clause, we exclude any rows from the result set where a match occurred from t3 - the result being that the row from t1 and the row from t2 represent the start of two adjacent partitions.
The second CTE then retrieves all rows from Table for each partition, but assigning a ROW_NUMBER() score within each partition, based on the CreationTS, sorted descending, with the result that ROW_NUMBER() 1 within each partition is the last row to occur.
Finally, within the select, we choose those rows that occur last within their respective partitions.
This does all assume that CreationTS values are distinct within each partition. I may be able to re-work it using PK also, if that assumption doesn't hold up.

It is solvable with a recursive CTE. Here (assuming rows within partitions are ordered by CreationTS):
WITH partitioned AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY FK1, FK2 ORDER BY CreationTS)
FROM data
),
subgroups AS (
SELECT
PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
Subgroup = 1,
Subrank = 1
FROM partitioned
WHERE rn = 1
UNION ALL
SELECT
p.PK, p.FK1, p.FK2, p.ActionCode, p.CreationTS, p.SomeAttributeValue, p.rn,
Subgroup = s.Subgroup + CASE p.ActionCode WHEN 'Change' THEN 0 ELSE 1 END,
Subrank = CASE p.ActionCode WHEN 'Change' THEN s.Subrank ELSE 0 END + 1
FROM partitioned p
INNER JOIN subgroups s ON p.FK1 = s.FK1 AND p.FK2 = s.FK2
AND p.rn = s.rn + 1
),
finalranks AS (
SELECT
PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
Subgroup, Subrank,
rank = ROW_NUMBER() OVER (PARTITION BY FK1, FK2, Subgroup ORDER BY Subrank DESC)
/* or: rank = MAX(Subrank) OVER (PARTITION BY FK1, FK2, Subgroup) - Subrank + 1 */
FROM subgroups
)
SELECT PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue
FROM finalranks
WHERE rank = 1

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Retrieving the most recent records within a query - sql-server

Related

Update All other Records Based on a single record

Swap row values in SQL Server

SUM On Column With Group By SQL

SQL: Place the sum of data in table A (which is dependent on table B) into table C arranged according to table B. (sample data provided)

How do I utilize Row_Number() (partitioning) for my datapool correctly

Categories

Resources