SQL Selecting all but newest result per id - sql-server

I need to set a "waived" flag in my table for all but the newest result per id. I thought I had a query that would work here, but when I run a select with the same join logic, I'm getting incorrect results - I saw one case where it selected both of the only two results for a particular id. I'm also getting multiple rows with the exact same data.
What am I doing wrong here?
Here's my select statement:
select t.test_row_id, t.test_result_id, t.waived, t.pass, t.comment
from EV.Test_Result
join EV.Test_Result as t
    on EV.Test_Result.test_row_id = t.test_row_id
    and EV.Test_Result.start_time < t.start_time
    and t.device_id = 1219
    and t.waived = 0
order by t.test_row_id
Here's the actual query I want to run:
update EV.Test_Result
set waived = 1
from EV.Test_Result
join EV.Test_Result as t
    on EV.Test_Result.test_row_id = t.test_row_id
    and EV.Test_Result.start_time < t.start_time
    and t.device_id = 1219
    and t.waived = 0

If I understand this correctly, you are having problems because the Cardinality of the ON predicate returns all matching rows.
EV.Test_Result.test_row_id = t.test_row_id
and EV.Test_Result.start_time < t.start_time
This ON clause compares all of the start_time values that share the same id and returns every combination of rows where start_time is less than t.start_time. Clearly, this is not what you want.
and t.device_id = 1219
and t.waived = 0
These are actually filter predicates (ON technically is one too), but I would prefer to apply them in a subquery/CTE: that way you limit the number of rows SQL Server has to retrieve and compare.
Something like the following might be what you needed:
SELECT A.test_row_id
     , A.test_result_id
     , A.waived
     , A.pass
     , A.comment
FROM EV.Test_Result A
INNER JOIN (SELECT MAX(start_time) AS start_time
                 , test_row_id
            FROM EV.Test_Result
            WHERE device_id = 1219
              AND waived = 0
            GROUP BY test_row_id
           ) AS T ON A.test_row_id = T.test_row_id
                 AND A.start_time < T.start_time
ORDER BY A.test_row_id
This query then returns a 1:M relationship between the values in the ON predicate, unlike the M:M query you had run.
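If the end goal is still the UPDATE from the question, the same derived table can drive it. A sketch, assuming the schema above; depending on intent you may also want to filter the updated rows themselves (e.g. by device_id):

UPDATE A
SET waived = 1
FROM EV.Test_Result A
INNER JOIN (SELECT MAX(start_time) AS start_time
                 , test_row_id
            FROM EV.Test_Result
            WHERE device_id = 1219
              AND waived = 0
            GROUP BY test_row_id
           ) AS T ON A.test_row_id = T.test_row_id
                 AND A.start_time < T.start_time   -- only rows older than the newest per test_row_id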
UPDATE:
Since I sheepishly screwed up trying to alter my Query on SO, I'll redeem myself by explaining the physical and logical orders of basic SQL Query operators:
As you know, you write a simple SELECT statement like the following:
SELECT <aggregate column>, SUM(<non-aggregate column>) AS Cost
FROM <table_name>
WHERE <column> = 'some_value'
GROUP BY <aggregate column>
HAVING SUM(<non-aggregate column>) > some_value
ORDER BY <column>
Note that if you use an aggregate function, all other selected columns MUST appear in the GROUP BY or inside another aggregate function.
Now, SQL Server requires the clauses to be written in that order, although it actually processes them logically in the following order, which is worth memorizing:
FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY
There are more details in SELECT - MSDN, but this is why any column in the SELECT clause must be in the GROUP BY or in an aggregate function (SUM, MIN, MAX, etc.)...and also why my lazy code failed on your first attempt. :/
Note also that the ORDER BY is last (technically the TOP operator occurs after it), and that without it the result order is not deterministic unless a function such as DENSE_RANK enforces it (though that occurs in the SELECT phase).
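For example, this logical order is why a column alias defined in SELECT can be referenced in ORDER BY (which runs after SELECT) but not in WHERE or HAVING (which run before it). A small illustration, using a hypothetical dbo.Orders table:

SELECT  device_id,
        SUM(cost) AS TotalCost          -- alias created in the SELECT phase
FROM    dbo.Orders                      -- hypothetical table
WHERE   order_date >= '20150101'        -- cannot reference TotalCost here: WHERE runs before SELECT
GROUP BY device_id
HAVING  SUM(cost) > 100                 -- must repeat the aggregate, not the alias
ORDER BY TotalCost DESC;                -- alias is fine here: ORDER BY runs after SELECT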
Hope this helps solve the problem and, better yet, shows how SQL works. Cheers

Can you try the ROW_NUMBER() function, ordering by timestamp descending and filtering out the rows with ROW_NUMBER 1?
The query below should fetch all records per id except the latest one.
I tried the query below in Oracle with a table having the fields id, user_id, record_order and timestamp, and it worked:
select
    <table_name_alias>.*
from
(
    select
        id,
        user_id,
        row_number() over (partition by id order by record_order desc) as record_number
    from
        <your_table_name>
) <table_name_alias>
where
    record_number <> 1;
If you are using Teradata, you can also try the QUALIFY clause. I'm not sure if all DBs support this.
Select
table_name.*
from table_name
QUALIFY row_number() over (partition by id order by record_order desc) <>1;

Related

Splitting Data from a Column into two Columns

I have data in a column called "Medication_Description". First I need to find the patients who received Tylenol, then find out which of those also received a second drug (Ibuprofen here).
I also only want the top two results for each MRN (i.e. patient). I only want data for the past year.
Later I will be plugging this into an SSRS report where I will determine the percentage of patients that received both drugs.
I've played with a couple different ways of getting this to work but can't get it working quite right.
The data in this table looks like this:
As for desired results, I'd like to have something like this:
For MRN 654321, no Ibuprofen was administered so it returns NULL (it could also return another drug name - doesn't matter too much. I just need to be able to count the results later to determine a percentage).
For MRN 246824, only one dose of Ibuprofen was administered so the second line is NULL.
Below is my latest attempt but (as you can see) Med1 and Med2 will always reflect the same exact data - how can I make Med1 reflect one medication and Med2 reflect a second?
SELECT [MRN], [Med1], [Med2], [Row_Num], [Department_Name], [Date]
FROM
(   SELECT [MRN],
           [Medication_Description] AS [Med1],
           [Medication_Description] AS [Med2],
           ROW_NUMBER() OVER (PARTITION BY [MRN]
                              ORDER BY [Medication_Description] DESC) AS [Row_Num],
           [Date],
           [Department_Name]
    FROM T_Med_Orders
    WHERE [DATE] BETWEEN dateadd(year,-1,getdate()) AND getdate()
      AND [Department_Name] LIKE 'ICU'
      --First Med Must Match "Tylenol" but 2nd should match any result...???
      AND Medication_Description LIKE '%Tylenol%'
) tmp
WHERE [Row_Num] <= 2
  AND Med1 LIKE '%Tylenol%'
  --AND Med2 LIKE '%Ibuprofen%'
  AND [DATE] BETWEEN dateadd(year,-1,getdate()) AND getdate()
ORDER BY [MRN]
You have a solid start, but I think you're being too ambitious with your query. While there are ways to optimize it, using partitions may be overkill for the current requirements. What I ended up doing was to make two CTEs, each filtered to the individual medicine being identified. I can LEFT JOIN those to the source table and filter it to only show results where the CTE values are NOT NULL. I can also apply the other Department_Name and Date clauses, although I have not done so in my snippet (a sketch with those filters added follows it).
The main drawback is that this format would require more and more CTEs if you wanted to expand to include other medicines, quickly eroding any performance gains. But without knowing how Medication_Description is formatted (or if it even has a standard format), I can't write that for you.
WITH Tylenol_CTE AS
(SELECT *, 'Tylenol' AS [FilteredMedicine]
FROM #Temp
WHERE Medication_Description LIKE '%Tylenol%')
,Ibuprofen_CTE AS
(SELECT *, 'Ibuprofen' AS [FilteredMedicine]
FROM #Temp
WHERE Medication_Description LIKE '%Ibuprofen%')
SELECT t.*
, Tylenol_CTE.[FilteredMedicine] AS Med1
, Ibuprofen_CTE.[FilteredMedicine] AS Med2
FROM #Temp t
LEFT JOIN Tylenol_CTE
ON t.MRN = Tylenol_CTE.MRN
AND t.Date = Tylenol_CTE.Date
LEFT JOIN Ibuprofen_CTE
ON t.MRN = Ibuprofen_CTE.MRN
AND t.Date = Ibuprofen_CTE.Date
WHERE Ibuprofen_CTE.Medication_Description IS NOT NULL
AND Tylenol_CTE.Medication_Description IS NOT NULL
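For completeness, here is the same query with the department and past-year filters from the original post folded in. This is an untested sketch; it assumes #Temp also carries the Department_Name and [Date] columns from T_Med_Orders:

WITH Tylenol_CTE AS
    (SELECT *, 'Tylenol' AS [FilteredMedicine]
     FROM #Temp
     WHERE Medication_Description LIKE '%Tylenol%')
   , Ibuprofen_CTE AS
    (SELECT *, 'Ibuprofen' AS [FilteredMedicine]
     FROM #Temp
     WHERE Medication_Description LIKE '%Ibuprofen%')
SELECT t.*
     , Tylenol_CTE.[FilteredMedicine]   AS Med1
     , Ibuprofen_CTE.[FilteredMedicine] AS Med2
FROM #Temp t
LEFT JOIN Tylenol_CTE   ON t.MRN = Tylenol_CTE.MRN   AND t.[Date] = Tylenol_CTE.[Date]
LEFT JOIN Ibuprofen_CTE ON t.MRN = Ibuprofen_CTE.MRN AND t.[Date] = Ibuprofen_CTE.[Date]
WHERE Ibuprofen_CTE.Medication_Description IS NOT NULL
  AND Tylenol_CTE.Medication_Description IS NOT NULL
  AND t.Department_Name LIKE 'ICU'                                    -- department filter from the original query
  AND t.[Date] BETWEEN DATEADD(year, -1, GETDATE()) AND GETDATE();    -- past-year filter from the original query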

Solving Duplicates in Access

I had a table that depends on more than one table, and I get this final result.
Screenshot: have a look at the picture.
I need to choose between values when FirstDate is duplicated, using specific criteria.
For example, I need one row for 18.2.2016: the max value (take the greater one) and the min value (take the lesser one).
You need to provide us with better information, but here is what I think you're looking for.
You need a separate query for each min/max value you want to find. Where you see "MyTable" you need to replace it with the object name shown in the screenshot.
Query 1 "Max"
SELECT MyTable.FirstOfDate, Max(MyTable.MaxValue) AS MaxOfMaxValue
FROM MyTable
GROUP BY MyTable.FirstOfDate;
Query 2 "Min"
SELECT MyTable.FirstOfDate, Min(MyTable.MinValue) AS MinOfMinValue
FROM MyTable
GROUP BY MyTable.FirstOfDate;
Query 3 "Merge"
SELECT DISTINCT MyTable.FirstOfDate, Max.MaxOfMaxValue, Min.MinOfMinValue
FROM (MyTable
INNER JOIN [Max] ON MyTable.FirstOfDate = Max.FirstOfDate)
INNER JOIN [Min] ON MyTable.FirstOfDate = Min.FirstOfDate
GROUP BY MyTable.FirstOfDate, Max.MaxOfMaxValue, Min.MinOfMinValue;
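Depending on exactly what you need, the three queries can often be collapsed into a single aggregate query (a sketch, using the same MyTable placeholder):

SELECT MyTable.FirstOfDate,
       Max(MyTable.MaxValue) AS MaxOfMaxValue,
       Min(MyTable.MinValue) AS MinOfMinValue
FROM MyTable
GROUP BY MyTable.FirstOfDate;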

SQL Get Second Record

I am looking to retrieve only the second (duplicate) record from a data set. For example in the following picture:
Inside the UnitID column there are two separate records for 105. I only want the returned data set to include the second 105 record. Additionally, I want this query to return the second record for all duplicates, not just 105.
I have tried everything I can think of, albeit I am not that experienced, and I cannot figure it out. Any help would be greatly appreciated.
You need to use GROUP BY for this.
Here's an example (I can't read your first column name, so I'm calling it JobUnitK):
SELECT MAX(JobUnitK), Unit
FROM JobUnits
WHERE DispatchDate = 'oct 4, 2015'
GROUP BY Unit
HAVING COUNT(*) > 1
I'm assuming JobUnitK is your ordering/id field. If it's not, just replace MAX(JobUnitK) with MAX(FieldIOrderWith).
Use the RANK function. Rank the rows with OVER (PARTITION BY UnitID ...) and pick the rows with rank 2, as sketched after the reference link below.
For reference -
https://msdn.microsoft.com/en-IN/library/ms176102.aspx
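A sketch of that approach, assuming the JobUnits table and the column names used elsewhere in this thread:

SELECT *
FROM (
    SELECT *,
           RANK() OVER (PARTITION BY UnitID ORDER BY JobUnitKeyID) AS UnitRank
    FROM JobUnits
    WHERE DispatchDate = '20151004'
) ranked
WHERE UnitRank = 2;   -- the second record per UnitID (use ROW_NUMBER if JobUnitKeyID can tie)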
Assuming SQL Server 2005 and up, you can use the Row_Number windowing function:
WITH DupeCalc AS (
   SELECT
      DupID = Row_Number() OVER (PARTITION BY UnitID ORDER BY JobUnitKeyID),
      *
   FROM JobUnits
   WHERE DispatchDate = '20151004'
)
SELECT *
FROM DupeCalc
WHERE DupID >= 2
ORDER BY UnitID DESC;
This is better than a solution that uses Min(JobUnitKeyID) or Max(JobUnitKeyID) for multiple reasons:
There could be more than one duplicate, in which case you need Min(JobUnitKeyID) in conjunction with UnitID so you can join back on UnitID where JobUnitKeyID <> MinJobUnitKeyID.
Using Min or Max also requires you to join back to the same data (sketched below), which is inherently slower.
If the ordering key you use turns out to be non-unique, you won't be able to pull the right number of rows with either approach.
If the ordering key consists of multiple columns, the query using Min or Max explodes in complexity.
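For comparison, the Min()-based join-back would look roughly like this (a hypothetical sketch of the approach argued against above, not a recommendation):

-- Everything after the first row per UnitID, via a join back to an aggregate
SELECT ju.*
FROM JobUnits ju
INNER JOIN (
    SELECT UnitID, MIN(JobUnitKeyID) AS MinJobUnitKeyID
    FROM JobUnits
    WHERE DispatchDate = '20151004'
    GROUP BY UnitID
) firsts
    ON ju.UnitID = firsts.UnitID
   AND ju.JobUnitKeyID <> firsts.MinJobUnitKeyID
WHERE ju.DispatchDate = '20151004';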

What technique should I use for Optimizing the SQL Query

Hi, I have a stored procedure that is used to fetch records while searching. This procedure returns millions of records. However, a bug was found inside the search procedure: it also returns duplicate records in some scenarios when certain conditions are met. I have found why it was returning duplicate records. Below is the query in question:
With cteAutoApprove (AcctID, AutoApproved,DecisionDate)
AS (
select
A.AcctID,
CAST(autoEnter AS SMALLINT) AS AutoApproved,
DecisionDate
from
(
SELECT
awt.AcctID,
MIN(awt.dtEnter) AS DecisionDate
FROM
dbo.AccountWorkflowTask awt
JOIN dbo.WorkflowTask wt ON awt.WorkflowTaskID = wt.WorkflowTaskID
Join Task T on T.TaskID = wt.TaskID
WHERE
(
(T.TaskStageID = 3 and awt.ReasonIDExit is NULL)
OR (wt.TaskID IN (9,15,201,208,220,308,319,320,408,420,508,608,620,1470,1608,1620))
)
GROUP BY
awt.AcctID
) A
Join AccountWorkflowTask awt1
on awt1.dtEnter=A.DecisionDate and awt1.AcctID=a.AcctID
),
This CTE was returning duplicate records because of the join condition awt1.dtEnter = A.DecisionDate: for some accounts the dtEnter was exactly the same on more than one row. This is the reason it returned duplicate records.
My question is what I should use to prevent this. I cannot use DISTINCT here as it will definitely slow down the search procedure. Shall I use RANK or DENSE_RANK so that it is optimized and the query takes less time to execute? Or some other technique? Please help, as I am actually stuck here.
It does seem like a good candidate for row_number (not rank: with the same dates on the same acctid, rank would still give you multiple records).
Obviously I can't test the query here, but winging it:
select
A.AcctID,
CAST(autoEnter AS SMALLINT) AS AutoApproved,
DecisionDate
from
(
SELECT
awt.AcctID,
awt.dtEnter AS DecisionDate,
autoEnter,
row_number() over (partition by awt.acctid order by awt.dtEnter) rnr
FROM
dbo.AccountWorkflowTask awt
JOIN dbo.WorkflowTask wt ON awt.WorkflowTaskID = wt.WorkflowTaskID
Join Task T on T.TaskID = wt.TaskID
WHERE
(
(T.TaskStageID = 3 and awt.ReasonIDExit is NULL)
OR (wt.TaskID IN (9,15,201,208,220,308,319,320,408,420,508,608,620,1470,1608,1620))
)
) A
where rnr = 1
This way, the GROUP BY is no longer necessary: getting the first date is done by row_number. Neither is the second join: the subquery already contains all the data (and the optimizer is smart enough not to do anything with the rows it doesn't need).
PS: because SQL Server window functions are incredibly efficient, using row_number instead of the min()-join construction will most likely give a performance boost, even if there were no duplicate rows.
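As a quick illustration of why row_number (and not rank) is needed when dtEnter can tie, here is a tiny self-contained demo with made-up values:

-- RANK gives both tied rows the value 1, so "rank = 1" would still keep duplicates;
-- ROW_NUMBER breaks the tie arbitrarily, so "rnr = 1" keeps exactly one row per account.
SELECT AcctID, dtEnter,
       RANK()       OVER (PARTITION BY AcctID ORDER BY dtEnter) AS rnk,   -- 1, 1, 3
       ROW_NUMBER() OVER (PARTITION BY AcctID ORDER BY dtEnter) AS rnr    -- 1, 2, 3
FROM (VALUES (42, '20151001'), (42, '20151001'), (42, '20151002')) v(AcctID, dtEnter);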

Performant way to get the maximum value of a running total in TSQL

We have a table of transactions which is structured like the following :
TranxID int (PK and Identity field)
ItemID int
TranxDate datetime
TranxAmt money
TranxAmt can be positive or negative, so the running total of this field (for any ItemID) will go up and down as time goes by. Getting the current total is obviously simple, but what I'm after is a performant way of getting the highest value of the running total and the TranxDate when this occurred. Note that TranxDate is not unique, and due to some backdating the ID field is not necessarily in the same sequence as TranxDate for a given Item.
Currently we're doing something like this (#tblTranx is a table variable containing just the transactions for a given Item) :
SELECT Top 1 @HighestTotal = z.TotalToDate, @DateHighest = z.TranxDate
FROM
(SELECT a.TranxDate, a.TranxID, Sum(b.TranxAmt) AS TotalToDate
FROM #tblTranx AS a
INNER JOIN #tblTranx AS b ON a.TranxDate >= b.TranxDate
GROUP BY a.TranxDate, a.TranxID) AS z
ORDER BY z.TotalToDate DESC
(The TranxID grouping removes the issue caused by duplicate date values)
This, for one Item, gives us the HighestTotal and the TranxDate when this occurred. Rather than run this on the fly for tens of thousands of entries, we only calculate this value when the app updates the relevant entry and record the value in another table for use in reporting.
The question is, can this be done in a better way so that we can work out these values on the fly (for multiple items at once) without falling into the RBAR trap (some ItemIDs have hundreds of entries). If so, could this then be adapted to get the highest values of subsets of transactions (based on a TransactionTypeID not included above). I'm currently doing this with SQL Server 2000, but SQL Server 2008 will be taking over soon here so any SQL Server tricks can be used.
SQL Server sucks at calculating running totals.
Here's a solution for your very query (which groups by dates):
WITH q AS
(
SELECT TranxDate, SUM(TranxAmt) AS TranxSum
FROM t_transaction
GROUP BY
TranxDate
),
m (TranxDate, TranxSum) AS
(
SELECT MIN(TranxDate), SUM(TranxAmt)
FROM (
SELECT TOP 1 WITH TIES *
FROM t_transaction
ORDER BY
TranxDate
) q
UNION ALL
SELECT DATEADD(day, 1, m.TranxDate),
m.TranxSum + q.TranxSum
FROM m
CROSS APPLY
(
SELECT TranxSum
FROM q
WHERE q.TranxDate = DATEADD(day, 1, m.TranxDate)
) q
WHERE m.TranxDate <= GETDATE()
)
SELECT TOP 1 *
FROM m
ORDER BY
TranxSum DESC
OPTION (MAXRECURSION 0)
You need to have an index on TranxDate for this to work fast.
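As an aside (beyond the versions mentioned in the question): on SQL Server 2012 or later, a windowed SUM computes the running total directly, and the peak per item falls out of a ROW_NUMBER over that total. A sketch, with the transactions table name assumed:

SELECT ItemID, TranxDate AS PeakDate, RunningTotal AS HighestTotal
FROM (
    SELECT ItemID, TranxDate, RunningTotal,
           ROW_NUMBER() OVER (PARTITION BY ItemID ORDER BY RunningTotal DESC) AS rn
    FROM (
        SELECT ItemID, TranxDate, TranxID,
               SUM(TranxAmt) OVER (PARTITION BY ItemID
                                   ORDER BY TranxDate, TranxID
                                   ROWS UNBOUNDED PRECEDING) AS RunningTotal
        FROM dbo.Transactions            -- table name assumed; use the real transactions table
    ) rt
) ranked
WHERE rn = 1;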
