SQL Join one-to-many tables, selecting only most recent entries - sql-server

This is my first post - so I apologise if it's in the wrong seciton!
I'm joining two tables with a one-to-many relationship using their respective ID numbers: but I only want to return the most recent record for the joined table and I'm not entirely sure where to even start!
My original code for returning everything is shown below:
SELECT table_DATES.[date-ID], *
FROM table_CORE LEFT JOIN table_DATES ON [table_CORE].[core-ID] = table_DATES.[date-ID]
WHERE table_CORE.[core-ID] Like '*'
ORDER BY [table_CORE].[core-ID], [table_DATES].[iteration];
This returns a group of records: showing every matching ID between table_CORE and table_DATES:
table_CORE date-ID iteration
1 1 1
1 1 2
1 1 3
2 2 1
2 2 2
3 3 1
4 4 1
But I need to return only the date with the maximum value in the "iteration" field as shown below
table_CORE date-ID iteration Additional data
1 1 3 MoreInfo
2 2 2 MoreInfo
3 3 1 MoreInfo
4 4 1 MoreInfo
I really don't even know where to start - obviously it's going to be a JOIN query of some sort - but I'm not sure how to get the subquery to return only the highest iteration for each item in table 2's ID field?
Hope that makes sense - I'll reword if it comes to it!
--edit--
I'm wondering how to integrate that when I'm needing all the fields from table 1 (table_CORE in this case) and all the fields from table2 (table_DATES) joined as well?
Both tables have additional fields that will need to be merged.
I'm pretty sure I can just add the fields into the "SELECT" and "GROUP BY" clauses, but there are around 40 fields altogether (and typing all of them will be tedious!)

Try using the MAX aggregate function like this with a GROUP BY clause.
SELECT
[ID1],
[ID2],
MAX([iteration])
FROM
table_CORE
LEFT JOIN table_DATES
ON [table_CORE].[core-ID] = table_DATES.[date-ID]
WHERE
table_CORE.[core-ID] Like '*' --LIKE '%something%' ??
GROUP BY
[ID1],
[ID2]
Your example field names don't match your sample query so I'm guessing a little bit.

Just to make sure that I have everything you’re asking for right, I am going to restate some of your question and then answer it.
Your source tables look like this:
table_core:
table_dates:
And your outputs are like this:
Current:
Desired:
In order to make that happen all you need to do is use a subquery (or a CTE) as a “cross-reference” table. (I used temp tables to recreate your data example and _ in place of the - in your column names).
--Loading the example data
create table #table_core
(
core_id int not null
)
create table #table_dates
(
date_id int not null
, iteration int not null
, additional_data varchar(25) null
)
insert into #table_core values (1), (2), (3), (4)
insert into #table_dates values (1,1, 'More Info 1'),(1,2, 'More Info 2'),(1,3, 'More Info 3'),(2,1, 'More Info 4'),(2,2, 'More Info 5'),(3,1, 'More Info 6'),(4,1, 'More Info 7')
--select query needed for desired output (using a CTE)
; with iter_max as
(
select td.date_id
, max(td.iteration) as iteration_max
from #table_dates as td
group by td.date_id
)
select tc.*
, td.*
from #table_core as tc
left join iter_max as im on tc.core_id = im.date_id
inner join #table_dates as td on im.date_id = td.date_id
and im.iteration_max = td.iteration

select *
from
(
SELECT table_DATES.[date-ID], *
, row_number() over (partition by table_CORE date-ID order by iteration desc) as rn
FROM table_CORE
LEFT JOIN table_DATES
ON [table_CORE].[core-ID] = table_DATES.[date-ID]
WHERE table_CORE.[core-ID] Like '*'
) tt
where tt.rn = 1
ORDER BY [core-ID]

Related

Getting non-deterministic results from WITH RECURSIVE cte

I'm trying to create a recursive CTE that traverses all the records for a given ID, and does some operations between ordered records. Let's say I have customers at a bank who get charged a uniquely identifiable fee, and a customer can pay that fee in any number of installments:
WITH recursive payments (
id
, index
, fees_paid
, fees_owed
)
AS (
SELECT id
, index
, fees_paid
, fee_charged
FROM table
WHERE index = 1
UNION ALL
SELECT t.id
, t.index
, t.fees_paid
, p.fees_owed - p.fees_paid
FROM table t
JOIN payments p
ON t.id = p.id
AND t.index = p.index + 1
)
SELECT *
FROM payments
ORDER BY 1,2;
The join logic seems sound, but when I join the output of this query to the source table, I'm getting non-deterministic and incorrect results.
This is my first foray into Snowflake's recursive CTEs. What am I missing in the intermediate result logic that is leading to the non-determinism here?
I assume this is edited code, because in the anchor of you CTE you select the fourth column fee_charged which does not exist, and then in the recursion you don't sum the fees paid and other stuff, basically you logic seems rather strange.
So creating some random data, that has two different id streams to recurse over:
create or replace table data (id number, index number, val text);
insert into data
select * from values (1,1,'a'),(2,1,'b')
,(1,2,'c'), (2,2,'d')
,(1,3,'e'), (2,3,'f')
v(id, index, val);
Now altering you CTE just a little bit to concat that strings together..
WITH RECURSIVE payments AS
(
SELECT id
, index
, val
FROM data
WHERE index = 1
UNION ALL
SELECT t.id
, t.index
, p.val || t.val as val
FROM data t
JOIN payments p
ON t.id = p.id
AND t.index = p.index + 1
)
SELECT *
FROM payments
ORDER BY 1,2;
we get:
ID INDEX VAL
1 1 a
1 2 ac
1 3 ace
2 1 b
2 2 bd
2 3 bdf
Which is exactly as I would expect. So how this relates to your "it gets strange when I join to other stuff" is ether, your output of you CTE is not how you expect it to be.. Or your join to other stuff is not working as you expect, Or there is a bug with snowflake.
Which all comes down to, if the CTE results are exactly what you expect, create a table and join that to your other table, so eliminate some form of CTE vs JOIN bug, and to debug why your join is not working.
But if your CTE output is not what you expect, then lets help debug that.

T-SQL "<> ANY(subquery)"

I have a question about the Any-Operator.
On Technet it says
For example, the following query finds customers located in a territory not covered by any sales persons.
Use AdventureWorks2008R2;
GO
SELECT
CustomerID
FROM
Sales.Customer
WHERE
TerritoryID <> ANY
(
SELECT
TerritoryID
FROM
Sales.SalesPerson
);
Further
The results include all customers, except those whose sales territories are NULL, because every territory that is assigned to a customer is covered by a sales person. The inner query finds all the sales territories covered by sales persons, and then, for each territory, the outer query finds the customers who are not in one.
But that query returns all customers.
I updated a customers TerritoryID to a value that no sales.person has, but still that query returns all customers, instead of that one I expected ..
Am I missing something ?
Might it be that that article on technet is simply wrong ?
https://technet.microsoft.com/de-de/library/ms187074(v=sql.105).aspx (german)
There is one customer with TerritoryID = 13
Inner query result (SELECT TerritoryID FROM Sales.SalesPerson) :
4
2
4
3
6
5
1
4
6
1
1
6
9
1
8
10
7
And in table Sales.Customer is a row with CustomerID = 13, which is the one not covered by a sales-person..
create table #t1
(
id int
)
insert into #t1
values(1),(2),(3)
As you can see,T1 has three values
now lets see,how Any Works
When 'is Equal to ' is used with any ,it works like IN
select * from #t1 where id=
any(select 0)--no result
when Any is used with > or <> ,Any means get me all the values which are greater than minimum value
select * from #t1 where id<>
any(select 1)--2,3
select * from #t1 where id<>
any(select 0)--1,2,3
If your subquery returns one value,the outer query will try to get values which are greater than inner query
<> ANY means any Sales.Customer with a TerritoryID that is Greater Than or Less Than any of the TerritoryID's in the Sales.SalesPerson
so TerritoryID = 13 is greater than all or your examples (4 2 4 3 6 5 1 4 6 1 1 6 9 1 8 10 7), so it's included.
<> ALL is the equivalent of NOT IN so that is what you're confusing <> ANY with
Look at <> ANY as, if there are any records in the set that are not equal to the quailifier, then include it.
The following query has the same result:
SELECT CustomerID FROM Sales.Customer
WHERE TerritoryID NOT IN (SELECT TerritoryID FROM Sales.SalesPerson)

How do I exclude rows when an incremental value starts over?

I am a newbie poster but have spent a lot of time researching answers here. I can't quite figure out how to create a SQL result set using SQL Server 2008 R2 that should probably be using lead/lag from more modern versions. I am trying to aggregate data based on sequencing of one column, but there can be varying numbers of instances in each sequence. The only way I know a sequence has ended is when the next row has a lower sequence number. So it may go 1-2, 1-2-3-4, 1-2-3, and I have to figure out how to make 3 aggregates out of that.
Source data is joined tables that look like this (please help me format):
recordID instanceDate moduleID iResult interactionNum
1356 10/6/15 16:14 1 68 1
1357 10/7/15 16:22 1 100 2
1434 10/9/15 16:58 1 52 1
1435 10/11/15 17:00 1 60 2
1436 10/15/15 16:57 1 100 3
1437 10/15/15 16:59 1 100 4
I need to find a way to separate the first 2 rows from the last 4 rows in this example, based on values in the last column.
What I would love to ultimately get is a result set that looks like this, which averages the iResult column based on the grouping and takes the first instanceDate from the grouping:
instanceDate moduleID iResult
10/6/15 1 84
10/9/15 1 78
I can aggregate to get this result using MIN and AVG if I can just find a way to separate the groups. The data is ordered by instanceDate (please ignore the date formatting here) then interactionNum and the group separation should happen when the query finds a row where the interactionNum is <= than the previous row (will usually start over with '1' but not always, so prefer just to separate on a lower or equal integer value).
Here is the query I have so far (includes the joins that give the above data set):
SELECT
X.*
FROM
(SELECT TOP 100 PERCENT
instanceDate, b.ModuleID, iResult, b.interactionNum
FROM
(firstTable a
INNER JOIN
secondTable b ON b.someID = a.someID)
WHERE
a.someID = 2
AND b.otherID LIKE 'xyz'
AND a.ModuleID = 1
ORDER BY
instanceDate) AS X
OUTER APPLY
(SELECT TOP 1
*
FROM
(SELECT
instanceDate, d.ModuleID, iResult, d.interactionNum
FROM
(firstTable c
INNER JOIN
secondTable d ON d.someID = c.someID)
WHERE
c.someID = 2
AND d.otherID LIKE 'xyz'
AND c.ModuleID = 1
AND d.interactionNum = X.interactionNum
AND c.instanceDate < X.instanceDate) X2
ORDER BY
instanceDate DESC) Y
WHERE
NOT EXISTS (SELECT Y.interactionNum INTERSECT SELECT X.interactionNum)
But this is returning an interim result set like this:
instanceDate ModuleID iResult interactionNum
10/6/15 16:10 1 68 1
10/6/15 16:14 1 100 2
10/15/15 16:57 1 100 3
10/15/15 16:59 1 100 4
and the problem is that interactionNum 3, 4 do not belong in this result set. They would go in the next result set when I loop over this query. How do I keep them out of the result set in this iteration? I need the result set from this query to just include the first two rows, 'seeing' that row 3 of the source data has a lower value for interactionNum than row 2 has.
Not sure what ModuleID was supposed to be used, but I guess you're looking for something like this:
select min (instanceDate), [moduleID], avg([iResult])
from (
select *,row_number() over (partition by [moduleID] order by instanceDate) as RN
from Table1
) X
group by [moduleID], RN - [interactionNum]
The idea here is to create a running number with row_number for each moduleid, and then use the difference between that and InteractionNum as grouping criteria.
Example in SQL Fiddle
Here is my solution, although it should be said, I think #JamesZ answer is cleaner.
I created a new field called newinstance which is 1 wherever your instanceNumber is 1. I then created a rolling sum(newinstance) called rollinginstance to group on.
Change the last select to SELECT * FROM cte2 to show all the fields I added.
IF OBJECT_ID('tempdb..#tmpData') IS NOT NULL
DROP TABLE #tmpData
CREATE TABLE #tmpData (recordID INT, instanceDate DATETIME, moduleID INT, iResult INT, interactionNum INT)
INSERT INTO #tmpData
SELECT 1356,'10/6/15 16:14',1,68,1 UNION
SELECT 1357,'10/7/15 16:22',1,100,2 UNION
SELECT 1434,'10/9/15 16:58',1,52,1 UNION
SELECT 1435,'10/11/15 17:00',1,60,2 UNION
SELECT 1436,'10/15/15 16:57',1,100,3 UNION
SELECT 1437,'10/15/15 16:59',1,100,4
;WITH cte1 AS
(
SELECT *,
CASE WHEN interactionNum=1 THEN 1 ELSE 0 END AS newinstance,
ROW_NUMBER() OVER(ORDER BY recordID) as rowid
FROM #tmpData
), cte2 AS
(
SELECT *,
(select SUM(newinstance) from cte1 b where b.rowid<=a.rowid) as rollinginstance
FROM cte1 a
)
SELECT MIN(instanceDate) AS instanceDate, moduleID, AVG(iResult) AS iResult
FROM cte2
GROUP BY moduleID, rollinginstance

Get the missing value in a sequence of numbers

I made the following query for the SQL Server backend
SELECT TOP(1) (v.rownum + 99)
FROM
(
SELECT incrementNo-99 as id, ROW_NUMBER() OVER (ORDER BY incrementNo) as rownum
FROM proposals
WHERE [year] = '12'
) as v
WHERE v.rownum <> v.id
ORDER BY v.rownum
to find the first unused proposal number.
(It's not about the lastrecord +1)
But I realized ROW_NUMBER is not supported in access.
I looked and I can't find something similar.
Does anyone know how to get the same result as a ROW_NUMBER in access?
Maybe there's a better way of doing this.
Actually people insert their proposal No (incrementID) with no constraint. This number looks like this 13-152. xx- is for the current year and the -xxx is the proposal number. The last 3 digits are supposed to be incremental but in some case maybe 10 times a year they have to skip some numbers. That's why I can't have the auto increment.
So I do this query so when they open the form, the default number is the first unused.
How it works:
Because the number starts at 100, I do -99 so it starts at 1.
Then I compare the row number with the id so it looks like this
ROW NUMBER | ID
1 1 (100)
2 2 (101)
3 3 (102)
4 5 (104)<--------- WRONG
5 6 (105)
So now I know that we skip 4. So I return (4 - 99) = 103
If there's a better way, I don't mind changing but I really like this query.
If there's really no other way and I can't simulate a row number in access, i will use the pass through query.
Thank you
From your question it appears that you are looking for a gap in a sequence of numbers, so:
SELECT b.akey, (
SELECT Top 1 akey
FROM table1 a
WHERE a.akey > b.akey) AS [next]
FROM table1 AS b
WHERE (
SELECT Top 1 akey
FROM table1 a
WHERE a.akey > b.akey) <> [b].[akey]+1
ORDER BY b.akey
Where table1 is the table and akey is the sequenced number.
SELECT T.Value, T.next -1 FROM (
SELECT b.Value , (
SELECT Top 1 Value
FROM tblSequence a
WHERE a.Value > b.Value) AS [next]
FROM tblSequence b
) T WHERE T.next <> T.Value +1

TSQL: Get all rows for given ID

I am trying to output all of my database reports into one report. I'm currently using nested select statements to get each line for each ID (the number of ID's is unknown). Now I would like to return all the rows for every ID (e.g. 1-25 if there are 25 rows) in one query. How would I do this?
SELECT (
(SELECT ... FROM ... WHERE id = x) As Col1
(SELECT ... FROM ... WHERE id = x) As Col2
(SELECT ... FROM ... WHERE id = x) As Col3
)
EDIT: Here's an example:
SELECT
(select post_id from posts where report_id = 1) As ID,
(select isnull(rank, 0) from results where report_id = 1 and url like '%www.testsite.com%') As Main,
(select isnull(rank, 0) from results where report_id = 1 and url like '%.testsite%' and url not like '%www.testsite%') As Sub
This will return the rank of a result for the main domain and the sub-domain, as well as the ID for the posts table.
ID Main Sub
--------------------------------------
1 5 0
I'd like to loop through this query and change report_id to 2, then 3, then 4 and carry on until all results are displayed. Nothing else needs to change other than the report_id.
Here's a basic example of what is inside the tables
POSTS
post_id post report_id
---------------------------------------------------------
1 "Hello, I am..." 1
2 "This may take..." 2
3 "Bla..." 2
4 "Bla..." 3
5 "Bla..." 4
RESULTS
result_id url title report_id
--------------------------------------------------------
1 http://... "Intro" 1
2 http://... "Hello!" 1
3 http://... "Question" 2
4 http://... "Help" 3
REPORTS
report_id description
---------------------------------
1 Introductions
2 Q&A
3 Starting Questions
4 Beginner Guides
5 Lectures
The query will want to pull the first post, the first result from the main website (www) and the first result from a subdomain by their report_id. These tables are part of a complicated join structure with many other tables but for these purposes these tables are the only ones that are needed.
I've managed to solve the problem by creating a table, setting variables to take all the contents and insert them in a while loop, then selecting them and dropping the table. I'll leave this open for a bit to see if anyone picks up a better way of doing it because I hate doing it this way.
If you need each report id on its own column, take a look at the PIVOT/UNPIVOT commands.
Here's one way of doing it :
SELECT posts.post_id AS ID,
IsNull(tblMain.Rank, 0) AS Main,
IsNull(tblSub.Rank, 0) AS Sub
FROM posts
LEFT JOIN results AS tblMain ON posts.post_id = tblMain.report_id AND tblMain.url like '%www.testsite.com%'
LEFT JOIN results AS tblSub ON posts.post_id = tblSub.report_id AND tblSub.url like '%.testsite%' and tblSub.url not like '%www.testsite%'
That is one query? You've provided your own answer?
If you mean you want to return a series of 'rows' as, for some reason, 'columns', this ability does exist, but I can't remember the exact name. Possible pivot. But it's a little odd.
see if this is what you are looking
SELECT
CASE WHEN reports.id = 1 THEN reports.Name
ELSE "" AS Col1,
CASE WHEN reports.id = 2 THEN reports.Name
ELSE "" AS Col2
....
FROM reports
Best Regards,
Iordan
Assuming you have a "master" table of IDs (if not I suggest you do so for Foreign Key purposes):
SELECT (
(SELECT ... FROM ... WHERE id = m.ID) As Col1
(SELECT ... FROM ... WHERE id = m.ID) As Col2
(SELECT ... FROM ... WHERE id = m.ID) As Col3
)
FROM MasterIDs m
Depending on how much each report is similar,you may be able to speed that up by moving some of the logic out of the nested statements and into the main body of the query.
Possibly a better way of thinking about this is to alter each report statement to return (ID,value) and do something like:
SELECT
report1.Id
,report1.Value AS Col1
,report2.Value AS Col2
FROM (SELECT Id, ... AS Value FROM ...) report1
JOIN (SELECT Id, ... AS Value FROM ...) report2 ON report1.Id = report2.Id
again, depending on the similarity of your reports you could probably combine these in someway.

Resources