Query for item with subset of related items - sql-server

I've got two tables:
Part (Table)
----
PartID
SerialNumber
CreationDate
Test (Table)
----
PartID
TestName
TestDateTime
TestResult
The tables have a one to many relationship on PartID, one part may have many Test entries.
What I'm trying to do is return a list of parts with the information of only the last test performed on that part.
Part                                  Test
PartID  SerialNumber  CreationDate    PartID  TestName  TestDateTime  TestResult
----------------------------------    ------------------------------------------
1       555           12/9/2013       1       Test 1    1/1/2014      Pass
                                      1       Test 2    2/2/2014      Fail
I would like to return the last test data with the part's information:
PartID  SerialNumber  CreationDate  TestName  TestDateTime  TestResult
----------------------------------------------------------------------
1       555           12/9/2013     Test 2    2/2/2014      Fail
I can currently get the TestDateTime of the part's last test, but no other information, with this query (as a subquery in the select list cannot return more than one item):
SELECT PartID, SerialNumber, CreationDate,
(SELECT TOP (1) TestDateTime
FROM Test
WHERE (PartID = Part.PartID)
ORDER BY TestDateTime DESC) AS LastDateTime
FROM Part
ORDER BY SerialNumber
Is there a different approach I can take to get the data I'm looking for?

Here is another way to do it that only hits the Test table once.
with SortedData as
(
SELECT p.PartID
, p.SerialNumber
, p.CreationDate
, t.TestName
, t.TestDateTime
, t.TestResult
-- RowNum = 1 marks the most recent test for each part
, ROW_NUMBER() over (Partition by p.PartID ORDER BY t.TestDateTime DESC) AS RowNum
FROM Part p
join Test t on t.PartID = p.PartID
)
select PartID
, SerialNumber
, CreationDate
, TestName
, TestDateTime
, TestResult
from SortedData
where RowNum = 1
ORDER BY SerialNumber
If you are on SQL Server 2012 or later, you can also use FIRST_VALUE.
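For the FIRST_VALUE route, a minimal sketch might look like the following. It assumes SQL Server 2012 or later and the Part and Test tables from the question. The DISTINCT is there because FIRST_VALUE returns a value on every joined row rather than collapsing the rows.

```sql
-- Sketch: latest test per part via FIRST_VALUE (assumes SQL Server 2012+).
-- Each window looks at one part's tests, newest first, and takes the first value.
SELECT DISTINCT
       p.PartID,
       p.SerialNumber,
       p.CreationDate,
       FIRST_VALUE(t.TestName)     OVER (PARTITION BY p.PartID ORDER BY t.TestDateTime DESC) AS TestName,
       FIRST_VALUE(t.TestDateTime) OVER (PARTITION BY p.PartID ORDER BY t.TestDateTime DESC) AS TestDateTime,
       FIRST_VALUE(t.TestResult)   OVER (PARTITION BY p.PartID ORDER BY t.TestDateTime DESC) AS TestResult
FROM Part AS p
JOIN Test AS t ON t.PartID = p.PartID
ORDER BY p.SerialNumber;
```

Like the ROW_NUMBER version, the inner join leaves out parts that have no tests at all.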

Try using a subquery in your join and then filter based on that. The subquery should select the PartID and Max(TestDateTime):
Select TestSubQ.PartID, Max(TestSubQ.TestDateTime) As TestDateTime
From Test TestSubQ
group by TestSubQ.PartID
Then filter your main query by joining to this derived table:
Select Part.PartID, SerialNumber, CreationDate,
TestMain.PartID, TestMain.TestName, TestMain.TestDateTime, TestMain.TestResult
From Part
Left Outer Join (Select TestSubQ.PartID, Max(TestSubQ.TestDateTime) As TestDateTime
From Test TestSubQ
group by TestSubQ.PartID) TestPartSub
On Part.PartID = TestPartSub.PartID
Left Outer Join Test TestMain
On TestPartSub.PartID = TestMain.PartID
And TestPartSub.TestDateTime = TestMain.TestDateTime
Order By SerialNumber
Note, though, that if your data contains only dates and not times, you may still end up with two rows for a part when two tests were run on the same date. If the time is included, it is highly unlikely that two different tests for the same part will share exactly the same datetime.
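If ties on TestDateTime are a real concern, one option is a different technique from the join above: OUTER APPLY with TOP (1), which returns at most one test per part. This is only a sketch. The secondary sort on TestName is an assumed tie-breaker, not something the question specifies.

```sql
-- Sketch: one row per part, even when two tests share the same TestDateTime.
-- OUTER APPLY keeps parts without any tests (their test columns come back NULL).
SELECT p.PartID,
       p.SerialNumber,
       p.CreationDate,
       lt.TestName,
       lt.TestDateTime,
       lt.TestResult
FROM Part AS p
OUTER APPLY (
    SELECT TOP (1) t.TestName, t.TestDateTime, t.TestResult
    FROM Test AS t
    WHERE t.PartID = p.PartID
    ORDER BY t.TestDateTime DESC,
             t.TestName DESC      -- assumed tie-breaker; pick whatever suits your data
) AS lt
ORDER BY p.SerialNumber;
```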

Related

Select rows where a value is maximum, and a column is null

I have a table, products, that looks along these lines:
productID | version | done
1 | 1 | 2000-01-01
1 | 2 | NULL
2 | 1 | NULL
2 | 2 | 2000-01-01
Version is assumed to be increasing.
What I want is a query that returns a ProductID and its highest / current Version, if the Done column for that version is NULL. In plain English, I want all products where the latest version is not Done, and the corresponding version. The goal: among products, find the ones with a new version that have not been "done" / processed yet.
Note: in the example above, I would expect the query to return ProductID 1, Version 2 only. I do not want the highest not-done version of a product, I want the highest version of a product, if it is not-done. Sorry if the clarification is overkill.
I wrote a query which appears to do what I want:
SELECT productID ProductID, version Version
FROM products
WHERE done IS NULL
AND version IN (
SELECT MAX(version)
FROM products
GROUP BY productID
)
However, it also appears to not be very efficient. So my question is, is there a better way to approach this query?
We can try using ROW_NUMBER here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY productID ORDER BY version DESC) rn
FROM products
)
SELECT productID, version
FROM cte
WHERE rn = 1 AND done IS NULL;
The CTE above assigns a row number to each record within a product, ordered by version descending, so the latest version of each product gets row number 1. The outer query then keeps only the products whose latest record has no value in the done column.
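To try the ROW_NUMBER approach against the sample rows from the question, something like this should do. The table variable @products and the date type for done are assumptions made for the sketch.

```sql
-- Sketch: the question's sample data in a table variable (assumed column types).
DECLARE @products TABLE (productID int, version int, done date NULL);

INSERT INTO @products (productID, version, done)
VALUES (1, 1, '2000-01-01'),
       (1, 2, NULL),
       (2, 1, NULL),
       (2, 2, '2000-01-01');

WITH cte AS (
    SELECT productID, version, done,
           ROW_NUMBER() OVER (PARTITION BY productID ORDER BY version DESC) AS rn
    FROM @products
)
SELECT productID, version
FROM cte
WHERE rn = 1 AND done IS NULL;
-- Returns one row: productID 1, version 2.
```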
Your query is almost correct; what's missing is the correlation between the productID in the subquery and the one in the main table.
SELECT t.productID ProductID, t.version Version
FROM products t
WHERE t.done IS NULL
AND version IN (
SELECT MAX(p.version)
FROM products p
WHERE p.productID = t.productID
GROUP BY p.productID
)
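To see why the correlation matters, here is a small sketch. Product 3 is a hypothetical extra row added to the question's sample, and @products is an assumed table variable. Without the correlation, version 1 of product 2 slips through because 1 happens to be the maximum version of product 3.

```sql
DECLARE @products TABLE (productID int, version int, done date NULL);

INSERT INTO @products (productID, version, done)
VALUES (1, 1, '2000-01-01'),
       (1, 2, NULL),
       (2, 1, NULL),
       (2, 2, '2000-01-01'),
       (3, 1, '2000-01-01');   -- hypothetical row, not in the question

-- Uncorrelated: the IN list is {2, 2, 1}, so this returns (1, 2) and also (2, 1).
SELECT productID, version
FROM @products
WHERE done IS NULL
  AND version IN (SELECT MAX(version) FROM @products GROUP BY productID);

-- Correlated: each row is compared with the maximum of its own product,
-- so this returns (1, 2) only.
SELECT t.productID, t.version
FROM @products AS t
WHERE t.done IS NULL
  AND t.version = (SELECT MAX(p.version)
                   FROM @products AS p
                   WHERE p.productID = t.productID);
```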
Another solution is to use a join:
select t1.* from products t1
inner join
(select max(version) as versionId, productID
from products
group by productID) t2 on t2.productID = t1.productID and t2.versionId = t1.version
where t1.done is null

How to get multiple min values from two SQL tables?

I have two tables, a Members table and a Plan table. They are structured as follows.
Members                         Plan
member  start_date  Mplan       Pplan     version  start_dt  end_dt
John    20120701    johnplan    johnplan  1        20120601  20130531
John    20130201    johnplan    johnplan  2        20130601  20140531
John    20130901    johnplan
John    20131201    johnplan
I need to update the start_date on the Members table to be the minimum value present for that member but within the same Plan version.
Example:
20130201 would be changed to 20120701 and 20131201 would change to 20130901.
Code:
UPDATE Members
SET start_date =(
SELECT MIN(start_date) FROM Members a
LEFT JOIN Plan ON Mplan = Pplan AND
start_date BETWEEN start_dt AND end_dt
WHERE member=a.member
AND start_date BETWEEN start_dt AND end_dt
)
Unfortunately this sets every single start_date to 19900101 aka the lowest value in the entire table for that column.
First you need to get the minimum start date of each member for a specific plan. The following will provide you that.
select MIN(start_date) as min_date,a.member as member_name,a.Mplan as plan_name FROM Members a inner JOIN [plan] p ON a.Mplan = p.Pplan AND
start_date BETWEEN p.start_dt AND p.end_dt
group by a.member, a.Mplan
The result will be something like this.
min_date member_name plan_name
2012-07-01 00:00:00.000 John johnplan1
2013-09-01 00:00:00.000 John johnplan2
Use this to update each member's start date for a plan with the lowest start date of the respective plan.
update members
set start_date= tbl.min_date from
(SELECT MIN(start_date) as min_date,a.member as member_name,a.Mplan as plan_name FROM Members a
inner JOIN [plan] p ON a.Mplan = p.Pplan AND
start_date BETWEEN p.start_dt AND p.end_dt
group by a.member, a.Mplan) as tbl
where member=tbl.member_name and Mplan=tbl.plan_name
I created your 2 tables, members and plan, and tested this solution with sample data and it works. I hope it helps.
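If, as in the question's sample, every version shares the same plan name, grouping by member and Mplan alone gives a single minimum across all versions. One possible variation also groups by version. It is sketched here on the assumption that the Plan table's version column identifies the period and that the periods of a plan do not overlap.

```sql
-- Sketch: minimum start_date per member, plan and plan version.
-- Assumes each Members row falls inside at most one Plan period.
UPDATE m
SET m.start_date = x.min_date
FROM Members AS m
JOIN [Plan] AS p                       -- find the version this row belongs to
  ON p.Pplan = m.Mplan
 AND m.start_date BETWEEN p.start_dt AND p.end_dt
JOIN (
    SELECT a.member, a.Mplan, p2.version, MIN(a.start_date) AS min_date
    FROM Members AS a
    JOIN [Plan] AS p2
      ON p2.Pplan = a.Mplan
     AND a.start_date BETWEEN p2.start_dt AND p2.end_dt
    GROUP BY a.member, a.Mplan, p2.version
) AS x
  ON x.member  = m.member
 AND x.Mplan   = m.Mplan
 AND x.version = p.version;
```

With the question's four rows this would set 20130201 to 20120701 and 20131201 to 20130901, which is the result the question asks for.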
You really should convert the dates to Datetime. You gain precision (hours, minutes and seconds can be stored) as well as access to date-specific functions, international conversion and localization.
A Varchar(8) column uses no less space than a Datetime column, so nothing is saved by storing the dates as strings.
That said, what you are looking for is row_number().
Something like:
SELECT Member, MPlan, Start_Date, Row_Number() OVER (PARTITION BY Member, MPLan ORDER BY Start_Date) as Version
FROM Members
Could you try this? I didn't test it.
With Member_start_dt as
(
-- look up the start of the plan period that contains each member row
select M.*, (select P.start_dt from [Plan] P where P.Pplan = M.Mplan AND M.start_date >= P.start_dt AND M.start_date <= P.end_dt) as Pplan_date
from Members M
),
Member_by_plan as
(
select *, ROW_NUMBER () over (partition by member, Pplan_date order by start_date) num
from Member_start_dt
)
update M
Set M.start_date = MBP1.start_date
from Members M
inner join Member_by_plan MBP1 ON MBP1.member = M.member AND MBP1.num = 1
inner join Member_by_plan MBP2 ON MBP2.member = M.member AND MBP2.Pplan_date = MBP1.Pplan_date AND MBP2.start_date = M.start_date

MS SQL Server Can Not Get A Select Sum Column Correct

I am using MS SQL Server Management Studio. What I am trying to do is get a sum as one of my columns for each record but that sum would only sum up values based on the values from the first two columns.
The query looks like this so far:
SELECT DISTINCT
BeginPeriod,
EndPeriod,
(
SUM((select FO_NumPages from tbl_Folder where FO_StatisticDateTime > BeginPeriod AND FO_StatisticDateTime < EndPeriod))
) AS PageCount
FROM
(
SELECT
CONVERT(varchar(12),DATEADD(mm,DATEDIFF(mm,0,tbl_Folder.FO_StatisticDateTime),0),101) AS BeginPeriod,
tbl_Folder.FO_PK_ID AS COL1ID
FROM
tbl_Folder
)AS ProcMonth1
INNER JOIN
(
SELECT
CONVERT(varchar(12),DATEADD(mm,DATEDIFF(mm,0,tbl_Folder.FO_StatisticDateTime)+1,0),101) AS EndPeriod,
tbl_Folder.FO_PK_ID AS COL2ID
FROM
tbl_Folder
)AS ProcNextMonth1
ON ProcMonth1.COL1ID = ProcNextMonth1.COL2ID
ORDER BY BeginPeriod DESC;
The table I am getting the data from would look something like this:
FO_StatisticDateTime | FO_PK_ID | FO_NumPages
---------------------------------------------
03/21/2013           | 24       | 5
04/02/2013           | 22       | 6
I want the sum to count the number of pages for each record that is between the beginning period and the end period for each record.
I understand the sum with the select statement has an aggregate error in that function for the column values. But is there a way I can get that sum for each record?
I'm trusting that everything in the FROM clause works as you expect, and would suggest that this change to the top part of your query should get what you want:
SELECT DISTINCT
BeginPeriod,
EndPeriod,
(Select SUM(FO_NumPages)
from tbl_Folder f1
where f1.FO_StatisticDateTime >= ProcMonth1.BeginPeriod
AND f1.FO_StatisticDateTime < ProcNextMonth1.EndPeriod
) AS PageCount
FROM
(
SELECT
CONVERT(varchar(12),DATEADD(mm,DATEDIFF(mm,0,tbl_Folder.FO_StatisticDateTime),0),101) AS BeginPeriod,
tbl_Folder.FO_PK_ID AS COL1ID
FROM
tbl_Folder
)AS ProcMonth1
INNER JOIN
(
SELECT
CONVERT(varchar(12),DATEADD(mm,DATEDIFF(mm,0,tbl_Folder.FO_StatisticDateTime)+1,0),101) AS EndPeriod,
tbl_Folder.FO_PK_ID AS COL2ID
FROM
tbl_Folder
)AS ProcNextMonth1
ON ProcMonth1.COL1ID = ProcNextMonth1.COL2ID
ORDER BY BeginPeriod DESC;
This should work:
select BeginDate,
EndDate,
SUM(tbl_Folder.FO_NumPages) AS PageCount
from (select distinct dateadd(month,datediff(month,0,FO_StatisticDateTime),0) BeginDate from tbl_Folder) begindates
join (select distinct dateadd(month,datediff(month,0,FO_StatisticDateTime)+1,0) EndDate from tbl_Folder) enddates
on BeginDate < EndDate
join tbl_Folder
on tbl_Folder.FO_StatisticDateTime >= BeginDate
and tbl_Folder.FO_StatisticDateTime < EndDate
group by BeginDate, EndDate
order by 1, 2
I changed your expressions that converted the dates, because the string comparisons won't work as expected.
It joins two sub-queries of distinct beginning and ending dates to get all the possible date combinations. Then it joins that with your data that falls between the dates so that you can come up with your sum.
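If only calendar-month periods are needed, a shorter variation is to compute the month start once and group on it. This is a sketch of a different technique from the join above, and it assumes FO_StatisticDateTime is a datetime column.

```sql
-- Sketch: one row per calendar month with the total page count.
SELECT m.BeginPeriod,
       DATEADD(month, 1, m.BeginPeriod) AS EndPeriod,   -- first day of the next month
       SUM(f.FO_NumPages)               AS PageCount
FROM tbl_Folder AS f
CROSS APPLY (
    -- truncate the timestamp to the first day of its month
    SELECT DATEADD(month, DATEDIFF(month, 0, f.FO_StatisticDateTime), 0) AS BeginPeriod
) AS m
GROUP BY m.BeginPeriod
ORDER BY m.BeginPeriod DESC;
```

Because the periods stay as datetime values, the ordering is chronological rather than alphabetical.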

Max Value with unique values in more than one column

I feel like I'm missing something really obvious here.
Using T-SQL/SQL-Server:
I have unique values in more than one column but want to select the max version based on one particular column.
Dataset:
Example
ID | Name| Version | Code
------------------------
1 | Car | 3 | NULL
1 | Car | 2 | 1000
1 | Car | 1 | 2000
Target status: I want my query to only select the row with the highest version value. Running a MAX on the version column pulls all three because of the distinct values in the 'Code' column:
SELECT ID
,Name
,MAX(Version)
,Code
FROM Table
GROUP BY ID, Name, Code
The net result is that I get all three entries as per the data set due to the unique values in the Code column, but I only want the top row (Version 3).
Any help would be appreciated.
You need to identify the row with the highest version in one query and use an outer query to pull out all the fields for that row. Like so:
SELECT t.ID, t.Name, GRP.Version, t.Code
FROM (
SELECT ID
,Name
,MAX(Version) as Version
FROM [Table]
GROUP BY ID, Name
) GRP
INNER JOIN [Table] t on GRP.ID = t.ID and GRP.Name = t.Name and GRP.Version = t.Version
You can also use row_number() for this kind of logic, for example:
select ID, Name, Version, Code
from (
select *, row_number() over (partition by ID order by Version desc) as RN
from Table1
) X where RN = 1
Add TOP 1 to force the query to return a single row, and add an ORDER BY clause so that the row returned is the one with the highest version:
SELECT top 1 ID
,Name
,MAX(Version)
,Code
FROM Table
GROUP BY ID, Name, Code
order by max(version) desc
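TOP 1 returns a single row for the whole result, which fits the one-ID sample. If the table holds several IDs and you want the highest version of each, one variation is TOP (1) WITH TIES ordered by a row number. This is a sketch only: the table variable @Items and the second ID are hypothetical additions to the question's data.

```sql
DECLARE @Items TABLE (ID int, Name varchar(50), Version int, Code int NULL);

INSERT INTO @Items (ID, Name, Version, Code)
VALUES (1, 'Car',  3, NULL),
       (1, 'Car',  2, 1000),
       (1, 'Car',  1, 2000),
       (2, 'Bike', 1, 500);    -- hypothetical second ID

-- Every ID's highest version gets row number 1; WITH TIES keeps all rows
-- that tie for first place in the ORDER BY, so one row per ID comes back.
SELECT TOP (1) WITH TIES ID, Name, Version, Code
FROM @Items
ORDER BY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Version DESC);
-- Returns (1, Car, 3, NULL) and (2, Bike, 1, 500), in no guaranteed order.
```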

Problem with unique SQL query

I want to select all records, but have the query only return a single record per Product Name. My table looks similar to:
SellId ProductName Comment
1 Cake dasd
2 Cake dasdasd
3 Bread dasdasdd
where the Product Name is not unique. I want the query to return a single record per ProductName with results like:
SellId ProductName Comment
1 Cake dasd
3 Bread dasdasdd
I have tried this query,
Select distinct ProductName,Comment ,SellId from TBL#Sells
but it is returning multiple records with the same ProductName. My table is not really as simple as this; this is just a sample. What is the solution? Is it clear?
Select ProductName,
min(Comment) , min(SellId) from TBL#Sells
group by ProductName
If you only want one record per ProductName, you of course have to choose which value you want for the other fields.
If you aggregate (using GROUP BY) you can choose an aggregate function,
that is, a function that takes a list of values and returns only one. Here I have chosen MIN, the smallest value for each field.
NOTE: Comment and SellId can come from different records, since MIN is taken for each column separately.
Other aggregates you might find useful:
FIRST : first record encountered (available in MS Access, not in SQL Server)
LAST : last record encountered (available in MS Access, not in SQL Server)
AVG : average
COUNT : number of records
Where they are available, FIRST/LAST have the advantage that all fields are from the same record.
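A small sketch of the point about MIN mixing records. The table variable @Sells and its two rows are made up for illustration, and the result assumes a default collation in which 'aaa' sorts before 'zzz'.

```sql
DECLARE @Sells TABLE (SellId int, ProductName varchar(50), Comment varchar(50));

INSERT INTO @Sells (SellId, ProductName, Comment)
VALUES (1, 'Cake', 'zzz'),
       (2, 'Cake', 'aaa');

SELECT ProductName,
       MIN(Comment) AS Comment,
       MIN(SellId)  AS SellId
FROM @Sells
GROUP BY ProductName;
-- Returns Cake, aaa, 1: the comment comes from row 2 but the SellId from row 1.
```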
SELECT S.ProductName, S.Comment, S.SellId
FROM
Sells S
JOIN (SELECT MAX(SellId) AS SellId
FROM Sells
GROUP BY ProductName) AS TopSell ON TopSell.SellId = S.SellId
This returns the latest comment for each product, assuming that SellId is an auto-incremented identity that only goes up.
I know you've got an answer already, but I'd like to offer the approach that performed fastest for me in a similar situation. I'm assuming that SellId is the primary key and an identity column. You'd want an index on ProductName for best performance.
select
Sells.*
from
(
select
distinct ProductName
from
Sells
) x
join
Sells
on
Sells.ProductName = x.ProductName
and Sells.SellId =
(
select
top 1 s2.SellId
from
Sells s2
where
x.ProductName = s2.ProductName
Order By SellId
)
A slower method (but still better than GROUP BY with MIN on a long char column) is this:
select
*
from
(
select
*,ROW_NUMBER() over (PARTITION BY ProductName order by SellId) OccurenceId
from sells
) x
where
OccurenceId = 1
An advantage of this one is that it's much easier to read.
create table Sale
(
SaleId int not null
constraint PK_Sale primary key,
ProductName varchar(100) not null,
Comment varchar(100) not null
)
insert Sale
values
(1, 'Cake', 'dasd'),
(2, 'Cake', 'dasdasd'),
(3, 'Bread', 'dasdasdd')
-- Option #1 with over()
select *
from Sale
where SaleId in
(
select SaleId
from
(
select SaleId, row_number() over(partition by ProductName order by SaleId) RowNumber
from Sale
) tt
where RowNumber = 1
)
order by SaleId
-- Option #2
select *
from Sale
where SaleId in
(
select min(SaleId)
from Sale
group by ProductName
)
order by SaleId
drop table Sale
