Select top field based on ordering of another field - sql-server

Sorry if the title is hard to understand -- I'm not quite sure how to describe what I want to do. Let's say I have this table. test1 and test2 are int columns, whereas test3 is a string/varchar:
test1 | test2 | test3
1 1 one
1 2 two
1 3 three
1 4 four
2 10 ten
2 11 eleven
2 12 twelve
3 101 one hundred one
3 104 one hundred four
3 107 one hundred seven
I am trying to figure out the select query that will return the top test3 for each test1, where the ordering is done based on the value in test2. In other words, I am trying to find the query that will return this:
test1 | test3
1 four
2 twelve
3 one hundred seven
It'd be really great if the solution could work on both MS SQL Server (2005 and 2008) and MS Access (2007 and 2010).

A Lowest common denominator answer.
SELECT yourtable.test1, yourtable.test3
FROM yourtable
INNER JOIN
(
SELECT test1, MAX(test2) AS test_2
FROM yourtable
GROUP BY test1
) t
ON t.test1 = yourtable.test1 AND t.test_2 = yourtable.test2

In SQL Server:
SELECT test1, test3
FROM (
SELECT test1, test3, ROW_NUMBER() OVER (PARTITION BY test1 ORDER BY test2 DESC) AS rn
FROM mytable
) q
WHERE rn = 1
or
SELECT test1, test3
FROM (
SELECT DISTINCT test1
FROM mytable
) md
CROSS APPLY
(
SELECT TOP 1
test3
FROM mytable mi
WHERE mi.test1 = md.test1
ORDER BY
mi.test2 DESC
) mt
Each of these methods can be more efficient than another, depending on your data distribution.
This article may be of interest to you:
SQL Server: Selecting records holding group-wise maximum

Related

Identifying changes over time

No doubt a similar question has come up before, but I haven't been able to locate it by searching...
I have a raw dataset with time series data including 'from' and 'to' date fields.
The problem is, when data is loaded, new records have been created ('to' date added to old record, new record 'from' load date) even where no values have changed.
I want to convert this to a table which just shows a row for each genuine change - and the from/ to dates reflecting this.
By way of example, the source data looks like this:
ID
Col1
Col2
Col3
From
To
Test1
1
1
1
01/01/2020
31/12/9999
Test2
1
2
3
01/01/2020
30/06/2020
Test2
1
2
3
01/07/2020
30/09/2020
Test2
3
2
1
01/10/2020
31/12/9999
The first two records for Test2 (rows 2 and 3) are essentially the same - there was no change when the second row was loaded on 01/07/2020. I want a single row for the period 01/01/2020 - 30/09/2020 for which there was no change:
ID
Col1
Col2
Col3
From
To
Test1
1
1
1
01/01/2020
31/12/9999
Test2
1
2
3
01/01/2020
30/09/2020
Test2
3
2
1
01/10/2020
31/12/9999
For this simplified example, I can achieve that by grouping by each column (apart from dates) and using the MIN from date/ MAX end date:
SELECT
ID, Col1, Col2, Col3, MIN(From) AS From, MAX(To) as TO
FROM TABLE
GROUP BY ID, Col1, Col2, Col3
However, this won't work if a value changes then subsequently changes back to what it was before eg
ID
Col1
Col2
Col3
From
To
Test1
1
1
1
01/01/2020
31/12/9999
Test2
1
2
3
01/01/2020
30/04/2020
Test2
1
2
3
01/05/2020
30/06/2020
Test2
3
2
1
01/07/2020
30/10/2020
Test2
1
2
3
01/11/2020
31/12/9999
Simply using MIN/ MAX in the code above would return this - so it looks like both sets of values were valid for the period from 01/07/2020 - 30/10/2020:
ID
Col1
Col2
Col3
From
To
Test1
1
1
1
01/01/2020
31/12/9999
Test2
1
2
3
01/01/2020
31/12/9999
Test2
3
2
1
01/07/2020
30/10/2020
Whereas actually the first set of values were valid before and after that period, but not during.
It should return a single row for instead of two for the period from 01/01/2020 - 30/06/2020 when there were no changes for this ID, but then another row for the period when the values were different, and then another row where it reverted to the initial values, but with a new From date.
ID
Col1
Col2
Col3
From
To
Test1
1
1
1
01/01/2020
31/12/9999
Test2
1
2
3
01/01/2020
30/06/2020
Test2
3
2
1
01/07/2020
30/10/2020
Test2
1
2
3
01/11/2020
31/12/9999
I'm struggling to conceptualise how to approach this.
I'm guessing I need to use LAG somehow but not sure how to apply it - eg rank everything in a staging table first, then use LAG to compare a concatenation of the whole row?
I'm sure I could find a fudged way eventually, but I've no doubt this problem has been solved many times before so hoping somebody can point me to a simpler/ neater solution than I'd inevitably come up with...
Advanced Gaps and Islands
I believe this is an advanced "gaps and islands" problem. Use that as a search term and you'll find plenty of literature on the subject. Only difference is normally only one column is being tracked, but you have 3.
No Gaps Assumption
One major assumption of this script is there is no gap in the overlapping dates, or in other words, it assumes the previous rows ToDate = current FromDate - 1 day.
Not sure if you need to account for gaps, would be simple just add criteria to IsChanged to check for that
Multi-Column Gaps and Islands Solution
DROP TABLE IF EXISTS #Grouping
DROP TABLE IF EXISTS #Test
CREATE TABLE #Test (ID INT IDENTITY(1,1),TestName Varchar(10),Col1 INT,Col2 INT,Col3 INT,FromDate Date,ToDate DATE)
INSERT INTO #Test VALUES
('Test1',1,1,1,'2020-01-01','9999-12-31')
,('Test2',1,2,3,'2020-01-01','2020-04-30')
,('Test2',1,2,3,'2020-05-01','2020-06-30')
,('Test2',3,2,1,'2020-07-01','2020-10-30')
,('Test2',1,2,3,'2020-11-01','9999-12-31')
;WITH cte_Prev AS (
SELECT *
,PrevCol1 = LAG(Col1) OVER (PARTITION BY TestName ORDER BY FromDate)
,PrevCol2 = LAG(Col2) OVER (PARTITION BY TestName ORDER BY FromDate)
,PrevCol3 = LAG(Col3) OVER (PARTITION BY TestName ORDER BY FromDate)
FROM #Test
), cte_Compare AS (
SELECT *
,IsChanged = CASE
WHEN Col1 = PrevCol1
AND Col2 = PrevCol2
AND Col3 = PrevCol3
THEN 0 /*No change*/
ELSE 1 /*Iterate so new group created */
END
FROM cte_Prev
)
SELECT *,GroupID = SUM(IsChanged) OVER (PARTITION BY TestName ORDER BY ID)
INTO #Grouping
FROM cte_Compare
/*Raw unformatted data so you can see how it works*/
SELECT *
FROM #Grouping
/*Aggregated results*/
SELECT GroupID,TestName,Col1,Col2,Col3
,FromDate = MIN(FromDate)
,ToDate = MAX(ToDate)
,NumberOfRowsCollapsedIntoOneRow = COUNT(*)
FROM #Grouping
GROUP BY GroupID,TestName,Col1,Col2,Col3

Create a view using SQL Server with repeating rows and new column

I have a table with the following columns.
EVAL_ID | GGRP_ID | GOAL_ID
1 1 1
2 2 1
2 2 2
3 1 3
I want to create a view with another columns called GOAL_VERSION which has values from 1 to 3. So that each row from the above table should be duplicated 5 times for different GOAL_VERSION numbers. The out put should be like this.
EVAL_ID | GGRP_ID | GOAL_ID |GOAL_VERSION
1 1 1 1
1 1 1 2
1 1 1 3
1 1 1 4
1 1 1 5
2 2 1 1
2 2 1 2
2 2 1 3
2 2 1 4
2 2 1 5
How can I do that. Help me. Thank you.
Is it this you are looking for?
DECLARE #tbl TABLE(EVAL_ID INT,GGRP_ID INT,GOAL_ID INT);
INSERT INTO #tbl VALUES
(1,1,1)
,(2,2,1)
,(2,2,2)
,(3,1,3);
SELECT tbl.*
,x.Nr
FROM #tbl AS tbl
CROSS JOIN (VALUES(1),(2),(3),(4),(5)) AS x(Nr)
EDIT: Varying count of repetition
DECLARE #tbl TABLE(EVAL_ID INT,GGRP_ID INT,GOAL_ID INT);
INSERT INTO #tbl VALUES
(1,1,1)
,(2,2,1)
,(2,2,2)
,(3,1,3);
DECLARE #tblCountOfRep TABLE(CountOfRep INT);
INSERT INTO #tblCountOfRep VALUES(3);
SELECT tbl.*
,y.Nr
FROM #tbl AS tbl
CROSS JOIN (SELECT TOP (SELECT CountOfRep FROM #tblCountOfRep) * FROM(VALUES(1),(2),(3),(4),(5) /*add the max count here*/) AS x(Nr)) AS y
In this case I'd prefer I numbers table...
Take a look at CROSS JOIN. If you make a table that's got one column with the 5 rows you want you can just CROSS JOIN it to get the result you're after.
You can achieve this using a CTE and CROSS APPLY:
;WITH CTE AS
(
SELECT 1 AS GOAL_VERSION
UNION
SELECT 2
UNION
SELECT 3
UNION
SELECT 4
UNION
SELECT 5
)
SELECT * FROM <your table>
CROSS APPLY CTE
use "with" (cte) with rank clause for creating view.
If you have a numbers table in SQL database, you can cross join your table with the numbers table for numbers between 1 and 5
Here is my SQL solution for your requirement
select
goals.*,
n.i as GOAL_VERSION
from goals, dbo.NumbersTable(1,5,1) n
And here is the modified version with "cross join" as suggested in the comments
select
goals.*,
n.i as GOAL_VERSION
from goals
cross join dbo.NumbersTable(1,5,1) n
You can realize, I used a SQL table-valued function for SQL numbers table
Please create that SQL function using the source codes given in the referred tutorial
I hope it helps,

Select randomly few Rows of the same ID in the same table (T-SQL)

I'm trying to select randomly few rows for each Id stored in one table where these Ids have multiple rows on this table. It's difficult to explain with words, so let me show you with an example :
Example from the table :
Id Review
1 Text11
1 Text12
1 Text13
2 Text21
3 Text31
3 Text32
4 Text41
5 Text51
6 Text61
6 Text62
6 Text63
Result expected :
Id Review
1 Text11
1 Text13
2 Text21
3 Text32
4 Text41
5 Text51
6 Text62
In fact, the table contains thousands of rows. Some Ids contain only one Review but others can contain hundreds of reviews. I would like to select 10% of these, and select at least once, all rows wich have 1-9 reviews (I saw the SELECT TOP 10 percent FROM table ORDER BY NEWID() includes the row even if it's alone)
I read some Stack topics, I think I have to use a subquery but I don't find the correct solution.
Thanks by advance.
Regards.
Try this:
DECLARE #t table(Id int, Review char(6))
INSERT #t values
(1,'Text11'),
(1,'Text12'),
(1,'Text13'),
(2,'Text21'),
(3,'Text31'),
(3,'Text32'),
(4,'Text41'),
(5,'Text51'),
(6,'Text61'),
(6,'Text62'),
(6,'Text63')
;WITH CTE AS
(
SELECT
id, Review,
row_number() over (partition by id order by newid()) rn,
count(*) over (partition by id) cnt
FROM #t
)
SELECT id, Review
FROM CTE
WHERE rn <= (cnt / 10) + 1
Result(random):
id Review
1 Text12
2 Text21
3 Text31
4 Text41
5 Text51
6 Text63

Expand row results based on a value in column (with iterator)

Need help from you all in writing up this query. Running SQL 2005 Standard edition.
I have a basic query that gets a subset of records from a table where the record_Count is greater then 1.
SELECT *
FROM Table_Records
WHERE Record_Count > 1
This query gives me a result set of, say:
TableRecords_ID Record_Desc Record_Count
123 XYZ 3
456 PQR 2
The above query needs to be modified so that each record appears as many times as the Record_Count and has its iteration number with it, as a value. So the new query should return results as follows:
TableRecords_ID Record_Desc Record_Count Rec_Iteration
123 XYZ 3 1
123 XYZ 3 2
123 XYZ 3 3
456 PQR 2 1
456 PQR 2 2
Could anyone help we write this query up? appreciate the help.
Clarification: Rec_Iteration column is a sub representation of the Record_Count. Basically, since there are three Record_Count for XYZ description thus three rows were returned with the Rec_Iteration representing the Row one , two and three respectively.
You can use a recursive CTE for this query. Below I use a table variable #T instead of your table Table_Records.
declare #T table(TableRecords_ID int,Record_Desc varchar(3), Record_Count int)
insert into #T
select 123, 'XYZ', 3 union all
select 456, 'PQR', 2
;with cte as
(
select TableRecords_ID,
Record_Desc,
Record_Count,
1 as Rec_Iteration
from #T
where Record_Count > 1
union all
select TableRecords_ID,
Record_Desc,
Record_Count,
Rec_Iteration + 1
from cte
where Rec_Iteration < Record_Count
)
select TableRecords_ID,
Record_Desc,
Record_Count,
Rec_Iteration
from cte
order by TableRecords_ID,
Rec_Iteration

Want data from table in SQL server

I have table data like this
id id1 name
1 1 test1
1 1 test1
1 2 test2
2 1 test1
2 2 test2
3 1 test1
3 2 test2
3 2 test2
now from table i want the data as below
like
for id = 1 order by id1 asc the first name = test1
so i want the first two row
id id1 name
1 1 test1
1 1 test1
not third row
For id=2 order by id1 asc the first name = test1
so i want first row as test1 has assign only ones for id=2
id id1 name
2 1 test1
And for id=3 same as id=2
Please suggest me how can get the perticlur value for ID , because the scenerio is differnt for ID=1
Use RANK() or DENSE_RANK() to get the first ranked rows, including duplicates, for each id.
select * from (
select *, dense_rank() over (partition by [id] order by [id2]) as ranking
from [table]
) as t
where ranking = 1
Sounds to me like you just want select * from [tablename] where id1 = 1, but I might be wrong. I find the question a bit, well, vague...
Perhaps this:
select * from [table] where id = 1 order by id1
I think the point is that you can use one column in the where clause, and a different column in the order by clause. that's no problem.
But I'm not sure you could actually have the data table you describe, because the first two rows are identical (how can SQL tell them apart? Or more technically, there'd be a primary key violation)?
select * from [table] where id = 1 and name like "test1" order by id1

Resources