Selecting the most recent date on a joined table - sql-server

I cannot figure out how to write a subquery that selects only the latest date for each sys_loc_code. Two tables, E and R, are joined on sys_sample_code. I want one row per distinct sys_loc_code value, and that row should be the one with the latest date in its sample_date field.
The code I have so far is:
SELECT E.sample_date, E.sys_loc_code, R.sys_sample_code, R.chemical_name, R.result_value, R.detect, R.LabSampType
FROM GMP.GMP_Sample_Results AS R
INNER JOIN GMP.GMP_Sample_Events AS E ON R.sys_sample_code = E.sys_sample_code
WHERE (R.chemical_name = N'Tetrachloroethene') and E.sample_date > '2016-01-01 00:00:00.000'
ORDER BY sys_loc_code, sample_date desc
Please see image for desired results. Desired results are in yellow.
I have tried MAX, DISTINCT, multiple joins, MAX DISTINCT, GROUP BY and countless others. Can someone please suggest the code I need to get the results I desire? Many thanks.

If you use ROW_NUMBER and PARTITION BY the column you want to be unique and ORDER BY the column you want the most recent of, and then take only those results where the row number is 1, you should get what you want.
SELECT sample_date, sys_loc_code, sys_sample_code, chemical_name, result_value, detect, LabSampType
FROM (
SELECT E.sample_date, E.sys_loc_code, R.sys_sample_code, R.chemical_name, R.result_value, R.detect, R.LabSampType
, ROW_NUMBER() OVER (PARTITION BY E.sys_loc_code ORDER BY sample_date DESC) RN
FROM GMP.GMP_Sample_Results AS R
INNER JOIN GMP.GMP_Sample_Events AS E ON R.sys_sample_code = E.sys_sample_code
WHERE (R.chemical_name = N'Tetrachloroethene') AND E.sample_date > '2016-01-01 00:00:00.000'
) X
WHERE RN = 1
ORDER BY sys_loc_code, sample_date DESC;
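To see the ROW_NUMBER approach in isolation, here is a minimal sketch run against SQLite through Python's sqlite3 module rather than SQL Server. The events and results tables and their rows are made up for illustration; only the column names come from the question. It assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Hypothetical miniature versions of the two tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events  (sys_sample_code TEXT, sys_loc_code TEXT, sample_date TEXT);
CREATE TABLE results (sys_sample_code TEXT, chemical_name TEXT, result_value REAL);
INSERT INTO events VALUES
  ('S1', 'MW-1', '2016-03-01'),
  ('S2', 'MW-1', '2016-09-15'),
  ('S3', 'MW-2', '2016-05-20');
INSERT INTO results VALUES
  ('S1', 'Tetrachloroethene', 1.2),
  ('S2', 'Tetrachloroethene', 0.8),
  ('S3', 'Tetrachloroethene', 3.4);
""")

# Number the rows inside each location, newest first, then keep row 1.
latest_per_location = conn.execute("""
SELECT sys_loc_code, sample_date, result_value
FROM (
    SELECT e.sys_loc_code, e.sample_date, r.result_value,
           ROW_NUMBER() OVER (PARTITION BY e.sys_loc_code
                              ORDER BY e.sample_date DESC) AS rn
    FROM results AS r
    JOIN events AS e ON r.sys_sample_code = e.sys_sample_code
    WHERE r.chemical_name = 'Tetrachloroethene'
) AS x
WHERE rn = 1
ORDER BY sys_loc_code
""").fetchall()

print(latest_per_location)
```

MW-1 has two samples here, and only the 2016-09-15 one survives the rn = 1 filter. If two rows in a location share the same sample_date, ROW_NUMBER picks one of them arbitrarily unless the ORDER BY includes a tie-breaker column.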

This is the code I used to get the results I wanted.
SELECT R.*, E.sample_date
FROM GMP.GMP_Sample_Results AS R
INNER JOIN GMP.GMP_Sample_Events AS E ON R.sys_sample_code = E.sys_sample_code
INNER JOIN (
SELECT sys_loc_code, MAX(sample_date) AS MAX_DATE
FROM GMP.GMP_Sample_Events
GROUP BY sys_loc_code
) AS MD ON E.sys_loc_code = MD.sys_loc_code AND E.sample_date = MD.MAX_DATE
WHERE (R.chemical_name = 'Tetrachloroethene') AND (E.sample_date > '2016-01-01 00:00:00.000')
ORDER BY E.sys_loc_code
I am intrigued by the TOP 1 and PARTITION solutions also, if anyone would care to explain how I can successfully get these to work. There is always more than one way to skin a cat. As a beginner, the more tools I have in my memory box, the better I will be at my job. Thanks to all that have helped so far. @Dale K, if you can find a solution to the errors in the comments section, I will mark yours as answered as well.
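One thing worth checking with the MAX(sample_date) join: the derived table looks at every event for a location, not only the events that have a Tetrachloroethene result. If the newest event at a location was analysed for something else, that location can drop out of the result entirely. The sketch below reproduces this with made-up rows in SQLite (via Python's sqlite3), so the table names and data are assumptions rather than the real GMP tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events  (sys_sample_code TEXT, sys_loc_code TEXT, sample_date TEXT);
CREATE TABLE results (sys_sample_code TEXT, chemical_name TEXT, result_value REAL);
INSERT INTO events VALUES
  ('S1', 'MW-1', '2016-03-01'),
  ('S2', 'MW-1', '2016-09-15'),
  ('S3', 'MW-2', '2016-05-20');
-- The newest MW-1 sample (S2) only has a Benzene result.
INSERT INTO results VALUES
  ('S1', 'Tetrachloroethene', 1.2),
  ('S2', 'Benzene',           0.5),
  ('S3', 'Tetrachloroethene', 3.4);
""")

QUERY = """
SELECT e.sys_loc_code, e.sample_date
FROM results AS r
JOIN events AS e ON r.sys_sample_code = e.sys_sample_code
JOIN ({max_dates}) AS md
  ON e.sys_loc_code = md.sys_loc_code AND e.sample_date = md.max_date
WHERE r.chemical_name = 'Tetrachloroethene'
ORDER BY e.sys_loc_code
"""

# Latest date over all events, as in the query above.
MAX_ALL = """
SELECT sys_loc_code, MAX(sample_date) AS max_date
FROM events
GROUP BY sys_loc_code
"""

# Latest date among events that actually have the chemical of interest.
MAX_FILTERED = """
SELECT e2.sys_loc_code, MAX(e2.sample_date) AS max_date
FROM events AS e2
JOIN results AS r2 ON r2.sys_sample_code = e2.sys_sample_code
WHERE r2.chemical_name = 'Tetrachloroethene'
GROUP BY e2.sys_loc_code
"""

unfiltered = conn.execute(QUERY.format(max_dates=MAX_ALL)).fetchall()
filtered = conn.execute(QUERY.format(max_dates=MAX_FILTERED)).fetchall()

print(unfiltered)  # MW-1 is missing
print(filtered)    # MW-1 falls back to its latest Tetrachloroethene sample
```

Whether this matters depends on the data: if every sampling event is always analysed for the chemical being filtered on, both versions return the same rows.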

Related

Two IDs matching one causing duplicates

I am trying to inner join however I keep getting this duplicate pop up where there are two job IDs matching one Invoice ID (inner joined with a middle table that links both).
I want to only get 1 invoice id and summing the total despite 2 job ids matching it.
Basically there is AINVOICEID (table: Invoice) matching ABINVOICEID (table: INLines), and the INLines table contains ARLJOBID, which matches the JOBID in Jobs.
Select distinct sum(totalBASE) as InvoiceTotal,
DATEADD(MONTH, DATEDIFF(MONTH, 0, InvDate), 0)
from (select distinct left(JOBID,5)as JOBID
from jobs
group by JOBID
) jobs
inner join (select distinct ABINVOICEID, left(ARLJOBID,5) as arljobid
from INLines
group by ARLJOBID, ABINVOICEID
) INLines
on left(ARLJOBID,5) = left(JOBID,5)
inner join (select distinct AINVOICEID
, sum(totalBASE) as totalBASE
, InvDate
from Invoice
group by AINVOICEID, InvDate
) Invoice
on AINVOICEID = ABINVOICEID
where left(JOBID ,5)=left(ARLJOBID,5) and AINVOICEID = ABINVOICEID
and InvDate between '05/01/2022' AND '05/31/2022'
group by left(JOBID ,5), DATEADD(MONTH, DATEDIFF(MONTH, 0, InvDate), 0)
It is quite difficult to try and make sense of what you're asking without an example of the output that you're getting and what the expected output is.
However, I think you're wanting something like this:
SELECT I.AINVOICEID, SUM(I.totalBASE) AS totalBASE, InvDate
FROM Invoice I INNER JOIN
(
SELECT L.ABINVOICEID, J.JOBID
FROM INLines L INNER JOIN
Jobs J ON L.ARLJOBID = J.JOBID
GROUP BY L.ABINVOICEID, J.JOBID
) LJ ON I.AINVOICEID = LJ.ABINVOICEID
WHERE I.InvDate BETWEEN '05/01/2022' AND '05/31/2022'
GROUP BY I.AINVOICEID, InvDate
This is based on your SQL query, which doesn't really look like it needs to JOIN on the INLines or Jobs tables, because you're getting everything you need from the Invoice table in the SELECT.
If this isn't what you're after, if you can elaborate a bit more, then the community on here should be able to better assist with your question.
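The duplicate totals described in the question come from joining an invoice to several job lines before summing. A minimal sketch of the effect, using SQLite through Python's sqlite3 with invented rows (the lower-case table and column names mirror the question but the data is hypothetical): reducing the line table to one row per invoice before the join keeps each invoice total from being counted once per job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (ainvoiceid INTEGER, totalbase INTEGER, invdate TEXT);
CREATE TABLE inlines (abinvoiceid INTEGER, arljobid TEXT);
INSERT INTO invoice VALUES (1, 100, '2022-05-10'), (2, 50, '2022-05-20');
-- Invoice 1 is linked to two different jobs.
INSERT INTO inlines VALUES (1, 'J0001'), (1, 'J0002'), (2, 'J0003');
""")

# Joining straight to the lines repeats invoice 1 once per job.
naive = conn.execute("""
SELECT i.ainvoiceid, SUM(i.totalbase)
FROM invoice AS i
JOIN inlines AS l ON l.abinvoiceid = i.ainvoiceid
GROUP BY i.ainvoiceid
ORDER BY i.ainvoiceid
""").fetchall()

# Collapsing the lines to one row per invoice first avoids the repeat.
deduped = conn.execute("""
SELECT i.ainvoiceid, SUM(i.totalbase)
FROM invoice AS i
JOIN (SELECT DISTINCT abinvoiceid FROM inlines) AS lj
  ON lj.abinvoiceid = i.ainvoiceid
GROUP BY i.ainvoiceid
ORDER BY i.ainvoiceid
""").fetchall()

print(naive)    # invoice 1 is doubled
print(deduped)  # invoice 1 is counted once
```

The same idea applies if the job ID has to stay in the join for filtering: keep it out of the derived table's output, or use EXISTS, so that one invoice never produces more than one joined row.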

Unsupported subquery type cannot be evaluated with LIMIT 1 [closed]

(Submitting the following as both an attempt at info-share of best practices as well as a request for additional opinions...)
Q: I need to run a query that gets the exchange rate for a transaction. Based on the date of the transaction, I need the latest exchange rate from a rate history table where the starting effective date for the rate is less than the transaction's date.
Very basically:
SELECT t.Amount, t.TrxDate, t.Currency, ex.ExchangeRate
FROM Transactions t
LEFT JOIN LATERAL (
SELECT ExchangeRate
FROM RateHistory h
WHERE h.EffectiveDate <= t.TrxDate AND t.Currency = h.Currency
ORDER BY h.EffectiveDate DESC
LIMIT 1) ex
The problem comes in with the LIMIT 1. I can't have this subquery returning multiple rows, I only need the most recent exchange rate. Using LIMIT 1 or attempting to use ROW_NUMBER() and a conditional join both result in the same "Unsupported subquery" error.
Anyone have any recommendations? The query works just fine elsewhere, but not in Snowflake.
Here is the actual query. The last subquery is the issue.
SELECT
'SG' AS Region,
CAST(gl.acct_no AS NUMERIC(12,6)) AS acct_no,
gl.trx_date,
source,
gl.reference,
doc_no,
gl.amount,
for_amt,
reference2,
gl.reg_int_id,
gl.reg_seq_no,
gl.curr_id AS gl_curr_id,
c.curr_id AS chart_curr_id,
z.reference AS reference3,
z.dist_dt,
cur.cur_desc,
ex.exchng_rate
FROM gltrx gl
JOIN chart c ON ROUND(c.acct_no,6) = ROUND(gl.acct_no,6)
LEFT OUTER JOIN currencies cur ON gl.curr_id = cur.curr_id
LEFT JOIN LATERAL (SELECT MAX(ap.reference) AS reference, MIN(aph.dist_dt) AS dist_dt
FROM apdist ap
LEFT OUTER JOIN aphist aph ON ap.voucher_no = aph.voucher_no
WHERE ap.reg_int_id = gl.reg_int_id AND ap.reg_seq_no = gl.reg_seq_no) z
LEFT JOIN LATERAL (SELECT Exchng_rate
FROM Curr_hist
WHERE Curr_Id = gl.curr_id
AND Exchng_Dt <= gl.trx_date
ORDER BY exchng_dt DESC
LIMIT 1) ex
Recommendation 1: I tried the query below, which returned one row successfully; both the EMP and DEPT tables have two rows each.
SELECT * FROM EMP WHERE DEPTNO = (SELECT DEPTNO FROM DEPT LIMIT 1)
So LIMIT can be used in a subquery; the issue is not LIMIT itself but something else.
Try breaking the query into pieces, running each piece individually, and then combining them one by one to find the part that fails.
Response To Recommendation 1: That subquery works, however the issue is with the LEFT JOIN LATERAL. I have tried my query without the LIMIT 1 and it works fine... except it obviously returns too many results.
Recommendation 2: Is there any specific reason for using LATERAL?
Try writing the subquery using a WITH clause and joining to it later.
Response to Recommendation 2: Using LATERAL because I need exactly one record from the subquery to join to each record in the transaction table. LATERAL allows for the conditional aggregation to achieve that. I'm not quite sure what you mean by using the WITH clause. Are you suggesting a CTE?
Recommendation 3: Maybe you can try removing LATERAL and using the MAX function instead of sorting the subquery and taking LIMIT 1, because essentially you need the latest rate as of the specified date, which is what MAX over the date will give you.
SELECT t.Amount, t.TrxDate, t.Currency, ex.ExchangeRate
FROM Transactions t
LEFT JOIN (
SELECT MAX(h.EffectiveDate ), h.Currency
FROM RateHistory h
WHERE t.TrxDate <= h.EffectiveDate
AND t.Currency = h.Currency
GROUP BY h.Currency
) ex
Response to Recommendation 3: Since what I need is the exchange rate for the max date, I'd need to select the exchange rate in that subquery as well, which means it would need to be in the GROUP BY, and I'd end up in the same place: too many joins.
Are there any other recommendations and/or work-arounds available out there??
Use a window function to make sure you get only 1 row?
(SELECT Exchng_rate,
RANK() OVER (ORDER BY exchng_dt DESC) AS rnk -- "order" is a reserved word, so a different alias is used
FROM Curr_hist
WHERE Curr_Id = gl.curr_id
AND Exchng_Dt <= gl.trx_date
)
WHERE rnk = 1
So I moved the first LATERAL to a CTE, although it might not have been the problem.
The second LATERAL needs each row's gl.trx_date, so you can just LEFT JOIN on that condition (which gives way too many rows) and then use ROW_NUMBER() to implement the filter you wanted from LIMIT 1. I suspect this will be slow, but it should work.
WITH cte_a AS (
SELECT ap.reg_int_id
,ap.reg_seq_no
,MAX(ap.reference) AS reference
,MIN(aph.dist_dt) AS dist_dt
FROM apdist ap
LEFT OUTER JOIN aphist aph
ON ap.voucher_no = aph.voucher_no
GROUP BY 1,2
)
SELECT Region,
acct_no,
trx_date,
source,
reference,
doc_no,
amount,
for_amt,
reference2,
reg_int_id,
reg_seq_no,
gl_curr_id,
chart_curr_id,
reference3,
dist_dt,
cur_desc,
exchng_rate
FROM (
SELECT
'SG' AS Region,
CAST(gl.acct_no AS NUMERIC(12,6)) AS acct_no,
gl.trx_date,
source,
gl.reference,
doc_no,
gl.amount,
for_amt,
reference2,
gl.reg_int_id,
gl.reg_seq_no,
gl.curr_id AS gl_curr_id,
c.curr_id AS chart_curr_id,
z.reference AS reference3,
z.dist_dt,
cur.cur_desc,
ex.exchng_rate,
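-- partition by the columns that identify one gltrx row (assumed here to be reg_int_id, reg_seq_no);
-- partitioning by gl.curr_id alone would keep only one row per currency
row_number() over (partition by gl.reg_int_id, gl.reg_seq_no order by ex.exchng_dt DESC) as rn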
FROM gltrx gl
JOIN chart c
ON ROUND(c.acct_no,6) = ROUND(gl.acct_no,6)
LEFT OUTER JOIN currencies cur ON gl.curr_id = cur.curr_id
LEFT JOIN cte_a as z
ON z.reg_int_id = gl.reg_int_id AND z.reg_seq_no = gl.reg_seq_no
LEFT JOIN Curr_hist as ex
ON ex.Curr_Id = gl.curr_id and ex.Exchng_Dt <= gl.trx_date
)
WHERE rn = 1;
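The join-then-ROW_NUMBER workaround can be tried on a small scale. This is a sketch in SQLite through Python's sqlite3, not Snowflake, and the trx table with its trx_id key is a hypothetical stand-in for gltrx. The important detail is that the window is partitioned by the transaction's own key, so every transaction keeps exactly one rate row, including transactions with no matching rate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- trx_id is an assumed unique key for a transaction row.
CREATE TABLE trx (trx_id INTEGER, curr_id TEXT, trx_date TEXT);
CREATE TABLE curr_hist (curr_id TEXT, exchng_dt TEXT, exchng_rate REAL);
INSERT INTO trx VALUES
  (1, 'USD', '2020-01-15'),
  (2, 'USD', '2020-03-10'),
  (3, 'EUR', '2020-02-01'),
  (4, 'JPY', '2020-02-01');
INSERT INTO curr_hist VALUES
  ('USD', '2020-01-01', 1.30),
  ('USD', '2020-03-01', 1.40),
  ('EUR', '2020-01-01', 1.50);
""")

# Join every rate effective on or before the transaction date,
# then keep only the newest one per transaction.
rates = conn.execute("""
SELECT trx_id, exchng_rate
FROM (
    SELECT t.trx_id, h.exchng_rate,
           ROW_NUMBER() OVER (PARTITION BY t.trx_id
                              ORDER BY h.exchng_dt DESC) AS rn
    FROM trx AS t
    LEFT JOIN curr_hist AS h
      ON h.curr_id = t.curr_id AND h.exchng_dt <= t.trx_date
) AS ranked
WHERE rn = 1
ORDER BY trx_id
""").fetchall()

print(rates)
```

Transaction 2 sees both USD rates and keeps the 2020-03-01 one; transaction 4 has no JPY history, so the LEFT JOIN leaves its rate as NULL (None in Python) instead of dropping the row.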

Complete Select Statement ELSE a defaulted value?

I'm wondering if there is a way to have a complete select statement and if no row is returned to return a row with a default value in a specific field (SQL Server)? Here's a generic version of my SQL to better explain:
SELECT COUNT(CASE WHEN CAST(c.InjuryDate as DATE)>DATEADD(dd,-60, getdate ()) THEN b.InjuryID end) InjuryCount, a.PersonID
FROM Person_Info a
JOIN Injury_Subject b on b.PersonID=a.PersonID
JOIN Injury_Info c on c.InjuryID=b.InjuryID
WHERE EXISTS (SELECT * FROM Hospital_Record d WHERE d.PersonID=b.PersonID and d.InjuryID=b.InjuryID) --There could be multiple people associated with the same InjuryID
GROUP BY a.PersonID
If NOT EXISTS (SELECT * FROM Hospital_Record d WHERE d.PersonID=a.PersonID) THEN '0' in InjuryCount
I want a row for each person who has had an injury to display. Then I'd like a count of how many injuries resulted in hospitalizations in the last 60 days. If they were not hospitalized, I'd like the row to still be generated, but display '0' in InjuryCount column. I've played with this a bunch, moving my date from the WHERE to the SELECT, trying IF ELSE combos, etc. Could someone help me figure out how to get what I want please?
It's hard to tell without an example of input and desired output, but I think this is what you are going for:
select
InjuryCount = count(case
when cast(ii.InjuryDate as date) > dateadd(day,-60,getdate())
then i.InjuryId
else null
end
)
, p.PersonId
from Person_Info p
left join Hospital_Record h on p.PersonId = h.PersonId
left join Injury_Subject i on i.PersonId = h.PersonId
and h.InjuryId = i.InjuryId
left join Injury_Info ii on ii.InjuryId = i.InjuryId
group by p.PersonId;
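As a rough check of the "keep the row, show 0" behaviour, here is a sketch in SQLite via Python's sqlite3. The tables are cut-down, hypothetical versions of the ones in the question, and a fixed as_of date replaces getdate() so the result is reproducible. It inner-joins people to their injuries (so only people with an injury appear) and LEFT JOINs the hospital records, letting COUNT skip the NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE injury_subject  (person_id INTEGER, injury_id INTEGER);
CREATE TABLE injury_info     (injury_id INTEGER, injury_date TEXT);
CREATE TABLE hospital_record (person_id INTEGER, injury_id INTEGER);
INSERT INTO injury_subject VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO injury_info VALUES
  (10, '2024-06-10'),   -- recent, hospitalized
  (20, '2024-06-12'),   -- recent, not hospitalized
  (30, '2024-01-05');   -- hospitalized, but older than 60 days
INSERT INTO hospital_record VALUES (1, 10), (3, 30);
""")

# COUNT ignores NULLs, so people without a qualifying hospital record get 0.
# DISTINCT guards against several hospital rows for the same injury.
counts = conn.execute("""
SELECT s.person_id,
       COUNT(DISTINCT CASE
                 WHEN i.injury_date > date(:as_of, '-60 days')
                 THEN h.injury_id
             END) AS injury_count
FROM injury_subject AS s
JOIN injury_info AS i ON i.injury_id = s.injury_id
LEFT JOIN hospital_record AS h
  ON h.person_id = s.person_id AND h.injury_id = s.injury_id
GROUP BY s.person_id
ORDER BY s.person_id
""", {"as_of": "2024-06-30"}).fetchall()

print(counts)
```

In SQL Server the cutoff expression would be the DATEADD(day, -60, GETDATE()) form used above; the LEFT JOIN and COUNT(CASE ...) pattern is the part that carries over.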

Join subquery with min

I'm pulling my hair out over a subquery that I'm using to avoid about 100 duplicates (out of about 40k records). The records that are duplicated are showing up because they have 2 dates in h2.datecreated for a valid reason, so I can't just scrub the data.
I'm trying to get only the earliest date to return. The first subquery (that starts with "select distinct address_id", with the MIN) works fine on its own...no duplicates are returned. So it would seem that the left join (or just plain join...I've tried that too) couldn't possibly see the second h2.datecreated, since it doesn't even show up in the subquery. But when I run the whole query, it's returning 2 values for some ipc.mfgid's, one with the h2.datecreated that I want, and the other one that I don't want.
I know it's got to be something really simple, or something that just isn't possible. It really seems like it should work! This is MSSQL. Thanks!
select distinct ipc.mfgid as IPC, h2.datecreated,
case when ad.Address is null
then ad.buildingname end as Address, cast(trace.name as varchar)
+ '-' + cast(trace.Number as varchar) as ONT,
c.ACCOUNT_Id,
case when h.datecreated is not null then h.datecreated
else h2.datecreated end as Install
from equipmentjoin as ipc
left join historyjoin as h on ipc.id = h.EQUIPMENT_Id
and h.type like 'add'
left join circuitjoin as c on ipc.ADDRESS_Id = c.ADDRESS_Id
and c.GRADE_Code like '%hpna%'
join (select distinct address_id, equipment_id,
min(datecreated) as datecreated, comment
from history where comment like 'MAC: 5%' group by equipment_id, address_id, comment)
as h2 on c.address_id = h2.address_id
left join (select car.id, infport.name, carport.number, car.PCIRCUITGROUP_Id
from circuit as car (NOLOCK)
join port as carport (NOLOCK) on car.id = carport.CIRCUIT_Id
and carport.name like 'lead%'
and car.GRADE_Id = 29
join circuit as inf (NOLOCK) on car.CCIRCUITGROUP_Id = inf.PCIRCUITGROUP_Id
join port as infport (NOLOCK) on inf.id = infport.CIRCUIT_Id
and infport.name like '%olt%' )
as trace on c.ccircuitgroup_id = trace.pcircuitgroup_id
join addressjoin as ad (NOLOCK) on ipc.address_id = ad.id
The typical approach to only getting the lowest row is one of the following. You didn't bother to specify what version of SQL Server you're using, what you want to do with ties, and I have little interest to try to work this into your complex query, so I'll show you an abstract simplification for different versions.
SQL Server 2000
SELECT x.grouping_column, x.min_column, x.other_columns ...
FROM dbo.foo AS x
INNER JOIN
(
SELECT grouping_column, min_column = MIN(min_column)
FROM dbo.foo GROUP BY grouping_column
) AS y
ON x.grouping_column = y.grouping_column
AND x.min_column = y.min_column;
SQL Server 2005+
;WITH x AS
(
SELECT grouping_column, min_column, other_columns,
rn = ROW_NUMBER() OVER (PARTITION BY grouping_column ORDER BY min_column)
FROM dbo.foo
)
SELECT grouping_column, min_column, other_columns
FROM x
WHERE rn = 1;
This subquery:
select distinct address_id, equipment_id,
min(datecreated) as datecreated, comment
from history where comment like 'MAC: 5%' group by equipment_id, address_id, comment
Probably will return multiple rows because the comment is not guaranteed to be the same.
Try this instead:
CROSS APPLY (
SELECT TOP 1 H2.DateCreated, H2.Comment -- H2.Equipment_id wasn't used
FROM History H2
WHERE
H2.Comment LIKE 'MAC: 5%'
AND C.Address_ID = H2.Address_ID
ORDER BY DateCreated
) H2
Switch that to OUTER APPLY in case you want rows that don't have a matching desired history entry.
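The point about comment being in the GROUP BY can be shown with a few invented rows. This sketch uses SQLite through Python's sqlite3, which has no APPLY or TOP, so a correlated MIN subquery stands in for the CROSS APPLY ... TOP 1 above; the history table here is a hypothetical three-column cut of the real one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history (address_id INTEGER, datecreated TEXT, comment TEXT);
INSERT INTO history VALUES
  (1, '2020-01-01', 'MAC: 5A'),
  (1, '2020-02-01', 'MAC: 5B'),
  (2, '2020-03-01', 'MAC: 5C');
""")

# Grouping by comment as well: address 1 has two distinct comments,
# so it yields two groups and therefore two "minimum" rows.
grouped_with_comment = conn.execute("""
SELECT address_id, MIN(datecreated), comment
FROM history
WHERE comment LIKE 'MAC: 5%'
GROUP BY address_id, comment
ORDER BY address_id, comment
""").fetchall()

# Picking the single earliest row per address and reading its comment.
earliest_row = conn.execute("""
SELECT h.address_id, h.datecreated, h.comment
FROM history AS h
WHERE h.comment LIKE 'MAC: 5%'
  AND h.datecreated = (SELECT MIN(h2.datecreated)
                       FROM history AS h2
                       WHERE h2.address_id = h.address_id
                         AND h2.comment LIKE 'MAC: 5%')
ORDER BY h.address_id
""").fetchall()

print(grouped_with_comment)
print(earliest_row)
```

Unlike TOP 1, the MIN-equality form returns more than one row for an address if two history rows share the same earliest datecreated.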

How to SELECT DISTINCT Info with TOP 1 Info and an Order By FROM the Top 1 Info

I have 2 tables, that look like:
CustomerInfo(CustomerID, CustomerName)
CustomerReviews(ReviewID, CustomerID, Review, Score)
I want to search reviews for a string and return CustomerInfo.CustomerID and CustomerInfo.CustomerName. However, I only want to show distinct CustomerID and CustomerName along with just one of their CustomerReviews.Reviews and CustomerReviews.Score. I also want to order by the CustomerReviews.Score.
I can't figure out how to do this, since a customer can leave multiple reviews, but I only want a list of customers with their highest scored review.
Any ideas?
This is the greatest-n-per-group problem that has come up dozens of times on Stack Overflow.
Here's a solution that works with a window function:
WITH CustomerCTE AS (
SELECT i.CustomerID, i.CustomerName, r.ReviewID, r.Review, r.Score,
ROW_NUMBER() OVER (PARTITION BY i.CustomerID ORDER BY r.Score DESC) AS RN
FROM CustomerInfo i
INNER JOIN CustomerReviews r ON i.CustomerID = r.CustomerID
WHERE CONTAINS(r.Review, '"search"')
)
SELECT * FROM CustomerCTE WHERE RN = 1
ORDER BY Score;
And here's a solution that works more broadly with RDBMS brands that don't support window functions:
SELECT i.*, r1.*
FROM CustomerInfo i
INNER JOIN CustomerReviews r1 ON i.CustomerID = r1.CustomerID
AND CONTAINS(r1.Review, '"search"')
LEFT OUTER JOIN CustomerReviews r2 ON i.CustomerID = r2.CustomerID
AND CONTAINS(r2.Review, '"search"')
AND (r1.Score < r2.Score OR (r1.Score = r2.Score AND r1.ReviewID < r2.ReviewID))
WHERE r2.CustomerID IS NULL
ORDER BY r1.Score;
I'm showing the CONTAINS() function because you should be using the fulltext search facility in SQL Server, not using LIKE with wildcards.
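The self-join version is easier to follow with data in front of it. Below is a sketch in SQLite via Python's sqlite3; LIKE '%search%' stands in for CONTAINS because SQLite has no SQL Server full-text predicate, and the snake_case tables and rows are made up. A review r1 is kept only when no other matching review r2 for the same customer beats it on score, with the higher review_id winning ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_info (customer_id INTEGER, customer_name TEXT);
CREATE TABLE customer_reviews
  (review_id INTEGER, customer_id INTEGER, review TEXT, score INTEGER);
INSERT INTO customer_info VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO customer_reviews VALUES
  (1, 1, 'search works',    3),
  (2, 1, 'great search',    5),
  (3, 1, 'no match here',   9),  -- highest score, but fails the text filter
  (4, 2, 'search ok',       4),
  (5, 2, 'search ok again', 4),  -- tie on score with review 4
  (6, 3, 'unrelated',       8);
""")

best_reviews = conn.execute("""
SELECT i.customer_id, i.customer_name, r1.review_id, r1.score
FROM customer_info AS i
JOIN customer_reviews AS r1
  ON i.customer_id = r1.customer_id
 AND r1.review LIKE '%search%'
LEFT JOIN customer_reviews AS r2
  ON i.customer_id = r2.customer_id
 AND r2.review LIKE '%search%'
 AND (r1.score < r2.score
      OR (r1.score = r2.score AND r1.review_id < r2.review_id))
WHERE r2.customer_id IS NULL      -- nothing better exists for this r1
ORDER BY r1.score DESC
""").fetchall()

print(best_reviews)
```

Note that the text filter has to be applied to r2 as well as r1; otherwise a non-matching review with a higher score (review 3 here) would knock out every matching review for that customer.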
I voted for Bill Karwin's answer, but I thought I'd throw out another option.
It uses a correlated subquery, which can often incur performance problems with large data sets, so use with caution. I think the only upside is that the query is easier to immediately understand.
select *
from [CustomerReviews] r
where [ReviewID] =
(
select top 1 [ReviewID]
from [CustomerReviews] rInner
where rInner.CustomerID = r.CustomerID
order by Score desc
)
order by Score desc
I didn't add the string search filter, but that can be easily added.
I think this should do it
select ci.CustomerID, ci.CustomerName, cr.Review, cr.Score
from CustomerInfo ci cross apply
(select top 1 r.Review, r.Score -- correlated on the customer, so TOP 1 applies per customer rather than to the whole table
from CustomerReviews r
where r.CustomerID = ci.CustomerID
and r.Review like '%search%'
order by r.Score desc) cr
order by cr.Score
