SQL: select average and one value from same column - sql-server

How can I get an average value and one other value from the same column into two different columns in a new table?
I have this:
Person_ID col2 col3_values
1 101010A 20000
1 101010B 30000
2 101010A 25000
2 101010B 30000
3 101010A 22000
3 101010B 24000
And I want a table that averages col3_values over the Person_ID values (1, 2, 3) and then compares this average with a column which holds the value for Person_ID 1, like this:
col2 AVG(value personID_1-3) Value PersonID_1
101010A 22333 20000
101010B 28000 30000
I have tried a lot of code but nothing has worked. Can someone please help me with this? If this works, I would be grateful if I could also get a fourth column that shows the difference between the average column and the third column that holds the Person_ID 1 values.

There are many ways to do this; one would be to use the OUTER APPLY construct:
select
    t.col2,
    AVG(t.col3_values) as "AVG(value personID_1-3)",
    a.col3_values as "Value PersonID_1",
    AVG(t.col3_values) - a.col3_values as "Difference"
from your_table t
outer apply (
    -- value of Person_ID 1 for the same col2 as the outer row
    select p.col3_values from your_table p where p.Person_ID = 1 and p.col2 = t.col2
) a
group by t.col2, a.col3_values
Or you could use a correlated subquery:
select
    t.col2,
    AVG(t.col3_values) as "AVG(value personID_1-3)",
    (
        -- value of Person_ID 1 for the same col2 as the outer row
        select p.col3_values from your_table p where p.Person_ID = 1 and p.col2 = t.col2
    ) as "Value PersonID_1"
from your_table t
group by t.col2
Sample output:
Query 1:
col2 AVG(value personID_1-3) Value PersonID_1 Difference
-------------------- ----------------------- ---------------- -----------
101010A 22333 20000 2333
101010B 28000 30000 -2000
Query 2:
col2 AVG(value personID_1-3) Value PersonID_1
-------------------- ----------------------- ----------------
101010A 22333 20000
101010B 28000 30000
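As a cross-check of the numbers above, here is a minimal runnable sketch that uses Python's built-in sqlite3 module instead of SQL Server. It is not one of the two queries from the answer: it swaps in conditional aggregation (MAX over a CASE expression) to pick out the Person_ID 1 value, and the function name average_vs_person1 and the column aliases are made up for illustration. SQLite's AVG returns a float, so the sketch casts it to an integer to mimic what SQL Server does with an integer column.

```python
import sqlite3

# Sample rows from the question: (Person_ID, col2, col3_values).
ROWS = [
    (1, "101010A", 20000), (1, "101010B", 30000),
    (2, "101010A", 25000), (2, "101010B", 30000),
    (3, "101010A", 22000), (3, "101010B", 24000),
]

# Conditional aggregation: the CASE expression is NULL for every row except
# Person_ID 1, so MAX() returns that person's value within each col2 group.
QUERY = """
SELECT col2,
       CAST(AVG(col3_values) AS INTEGER) AS avg_value,
       MAX(CASE WHEN Person_ID = 1 THEN col3_values END) AS person1_value,
       CAST(AVG(col3_values) AS INTEGER)
         - MAX(CASE WHEN Person_ID = 1 THEN col3_values END) AS difference
FROM your_table
GROUP BY col2
ORDER BY col2
"""


def average_vs_person1():
    """Load the sample rows into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE your_table (Person_ID INTEGER, col2 TEXT, col3_values INTEGER)"
    )
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in average_vs_person1():
    print(row)
```

The same SELECT should carry over to SQL Server largely unchanged, although that has only been reasoned about here, not run there.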

Related

Identifying changes over time

No doubt a similar question has come up before, but I haven't been able to locate it by searching...
I have a raw dataset with time series data including 'from' and 'to' date fields.
The problem is, when data is loaded, new records have been created ('to' date added to old record, new record 'from' load date) even where no values have changed.
I want to convert this to a table which just shows a row for each genuine change - and the from/ to dates reflecting this.
By way of example, the source data looks like this:
ID     Col1  Col2  Col3  From        To
Test1  1     1     1     01/01/2020  31/12/9999
Test2  1     2     3     01/01/2020  30/06/2020
Test2  1     2     3     01/07/2020  30/09/2020
Test2  3     2     1     01/10/2020  31/12/9999
The first two records for Test2 (rows 2 and 3) are essentially the same - there was no change when the second row was loaded on 01/07/2020. I want a single row for the period 01/01/2020 - 30/09/2020 for which there was no change:
ID     Col1  Col2  Col3  From        To
Test1  1     1     1     01/01/2020  31/12/9999
Test2  1     2     3     01/01/2020  30/09/2020
Test2  3     2     1     01/10/2020  31/12/9999
For this simplified example, I can achieve that by grouping by each column (apart from dates) and using the MIN from date/ MAX end date:
SELECT
ID, Col1, Col2, Col3, MIN([From]) AS [From], MAX([To]) AS [To]
FROM TABLE
GROUP BY ID, Col1, Col2, Col3
However, this won't work if a value changes and then subsequently changes back to what it was before, e.g.:
ID     Col1  Col2  Col3  From        To
Test1  1     1     1     01/01/2020  31/12/9999
Test2  1     2     3     01/01/2020  30/04/2020
Test2  1     2     3     01/05/2020  30/06/2020
Test2  3     2     1     01/07/2020  30/10/2020
Test2  1     2     3     01/11/2020  31/12/9999
Simply using MIN/ MAX in the code above would return this - so it looks like both sets of values were valid for the period from 01/07/2020 - 30/10/2020:
ID     Col1  Col2  Col3  From        To
Test1  1     1     1     01/01/2020  31/12/9999
Test2  1     2     3     01/01/2020  31/12/9999
Test2  3     2     1     01/07/2020  30/10/2020
Whereas actually the first set of values was valid before and after that period, but not during.
It should return a single row instead of two for the period 01/01/2020 - 30/06/2020, when there were no changes for this ID, then another row for the period when the values were different, and then another row where it reverted to the initial values, but with a new From date:
ID     Col1  Col2  Col3  From        To
Test1  1     1     1     01/01/2020  31/12/9999
Test2  1     2     3     01/01/2020  30/06/2020
Test2  3     2     1     01/07/2020  30/10/2020
Test2  1     2     3     01/11/2020  31/12/9999
I'm struggling to conceptualise how to approach this.
I'm guessing I need to use LAG somehow but I'm not sure how to apply it, e.g. rank everything in a staging table first, then use LAG to compare a concatenation of the whole row?
I'm sure I could find a fudged way eventually, but I've no doubt this problem has been solved many times before so hoping somebody can point me to a simpler/ neater solution than I'd inevitably come up with...
Advanced Gaps and Islands
I believe this is an advanced "gaps and islands" problem. Use that as a search term and you'll find plenty of literature on the subject. The only difference is that normally only one column is tracked, whereas you have three.
No Gaps Assumption
One major assumption of this script is that there are no gaps between consecutive date ranges; in other words, it assumes the previous row's ToDate = current FromDate - 1 day.
I'm not sure if you need to account for gaps; if so, it would be simple to add a criterion to IsChanged to check for that.
Multi-Column Gaps and Islands Solution
DROP TABLE IF EXISTS #Grouping
DROP TABLE IF EXISTS #Test
CREATE TABLE #Test (ID INT IDENTITY(1,1), TestName VARCHAR(10), Col1 INT, Col2 INT, Col3 INT, FromDate DATE, ToDate DATE)
INSERT INTO #Test VALUES
 ('Test1',1,1,1,'2020-01-01','9999-12-31')
,('Test2',1,2,3,'2020-01-01','2020-04-30')
,('Test2',1,2,3,'2020-05-01','2020-06-30')
,('Test2',3,2,1,'2020-07-01','2020-10-30')
,('Test2',1,2,3,'2020-11-01','9999-12-31')
/*Step 1: fetch the previous row's values per TestName, in date order*/
;WITH cte_Prev AS (
    SELECT *
        ,PrevCol1 = LAG(Col1) OVER (PARTITION BY TestName ORDER BY FromDate)
        ,PrevCol2 = LAG(Col2) OVER (PARTITION BY TestName ORDER BY FromDate)
        ,PrevCol3 = LAG(Col3) OVER (PARTITION BY TestName ORDER BY FromDate)
    FROM #Test
), cte_Compare AS (
    /*Step 2: flag rows that differ from the previous row.
      The first row per TestName has NULL Prev values, so it falls into ELSE.*/
    SELECT *
        ,IsChanged = CASE
            WHEN Col1 = PrevCol1
             AND Col2 = PrevCol2
             AND Col3 = PrevCol3
            THEN 0 /*No change*/
            ELSE 1 /*Iterate so new group created */
         END
    FROM cte_Prev
)
/*Step 3: a running total of the flags gives each island its own GroupID.
  Ordered by FromDate (same order as LAG) rather than relying on insert order.*/
SELECT *,GroupID = SUM(IsChanged) OVER (PARTITION BY TestName ORDER BY FromDate)
INTO #Grouping
FROM cte_Compare
/*Raw unformatted data so you can see how it works*/
SELECT *
FROM #Grouping
/*Aggregated results*/
SELECT GroupID,TestName,Col1,Col2,Col3
    ,FromDate = MIN(FromDate)
    ,ToDate = MAX(ToDate)
    ,NumberOfRowsCollapsedIntoOneRow = COUNT(*)
FROM #Grouping
GROUP BY GroupID,TestName,Col1,Col2,Col3
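The "No Gaps Assumption" note above mentions that a gap check could be added to IsChanged. A minimal sketch of what that might look like follows, run through Python's sqlite3 module so it is self-contained. The table name Test, the helper collapse_unchanged_periods, the PrevToDate column and the output aliases are illustrative assumptions, not part of the original script. SQLite's date(x, '+1 day') stands in for date arithmetic that would be written differently in T-SQL, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Same sample rows as the T-SQL script: (TestName, Col1, Col2, Col3, FromDate, ToDate).
ROWS = [
    ("Test1", 1, 1, 1, "2020-01-01", "9999-12-31"),
    ("Test2", 1, 2, 3, "2020-01-01", "2020-04-30"),
    ("Test2", 1, 2, 3, "2020-05-01", "2020-06-30"),
    ("Test2", 3, 2, 1, "2020-07-01", "2020-10-30"),
    ("Test2", 1, 2, 3, "2020-11-01", "9999-12-31"),
]

QUERY = """
WITH cte_prev AS (
    SELECT *,
           LAG(Col1)   OVER (PARTITION BY TestName ORDER BY FromDate) AS PrevCol1,
           LAG(Col2)   OVER (PARTITION BY TestName ORDER BY FromDate) AS PrevCol2,
           LAG(Col3)   OVER (PARTITION BY TestName ORDER BY FromDate) AS PrevCol3,
           LAG(ToDate) OVER (PARTITION BY TestName ORDER BY FromDate) AS PrevToDate
    FROM Test
), cte_compare AS (
    -- A row continues the previous island only if all tracked columns match
    -- AND it starts the day after the previous row ended (the gap check).
    SELECT *,
           CASE WHEN Col1 = PrevCol1
                 AND Col2 = PrevCol2
                 AND Col3 = PrevCol3
                 AND FromDate = date(PrevToDate, '+1 day')
                THEN 0 ELSE 1 END AS IsChanged
    FROM cte_prev
), cte_grouping AS (
    -- Running total of the change flags numbers the islands per TestName.
    SELECT *,
           SUM(IsChanged) OVER (PARTITION BY TestName ORDER BY FromDate) AS GroupID
    FROM cte_compare
)
SELECT TestName, Col1, Col2, Col3,
       MIN(FromDate) AS PeriodFrom,
       MAX(ToDate)   AS PeriodTo,
       COUNT(*)      AS RowsCollapsed
FROM cte_grouping
GROUP BY TestName, GroupID, Col1, Col2, Col3
ORDER BY TestName, PeriodFrom
"""


def collapse_unchanged_periods():
    """Return one row per genuine change for the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Test (TestName TEXT, Col1 INTEGER, Col2 INTEGER, "
        "Col3 INTEGER, FromDate TEXT, ToDate TEXT)"
    )
    conn.executemany("INSERT INTO Test VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in collapse_unchanged_periods():
    print(row)
```

With the sample data, the two identical Test2 rows collapse into one period ending 2020-06-30, and the later return to the values 1, 2, 3 stays a separate row starting 2020-11-01.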

SQL Server query involving subqueries - performance issues

I have three tables:
Table 1: | dbo.pc_a21a22 |
batchNbr Other columns...
-------- ----------------
12345
12346
12347
Table 2: | dbo.outcome |
passageId record
---------- ---------
00003 200
00003 9
00004 7
Table 3: | dbo.passage |
passageId passageTime batchNbr
---------- ------------- ---------
00001 2015.01.01 12345
00002 2016.01.01 12345
00003 2017.01.01 12345
00004 2018.01.01 12346
What I want to do: for each batchNbr in Table 1 get first its latest passageTime and the corresponding passageID from Table 3. With that passageID, get the relevant rows in Table 2 and establish whether any of these rows contains the record 200. Per passageId there are at most 2 records in Table 2
What is the most efficient way to do this?
I have already created a query that works, but it's awfully slow and thus unfit for tables with millions of rows. Any suggestion on how to either change the query or do it another way? Altering the table structure is not an option, I only have read rights to the database.
My current solution (slow):
SELECT TOP 50000
a.batchNbr,
CAST ( CASE WHEN 200 in (SELECT TOP 2 record FROM dbo.outcome where passageId in (
SELECT SubqueryResults.passageId From (SELECT Top 1 passageId FROM dbo.passage pass WHERE pass.batchNbr = a.batchNbr ORDER BY passageTime Desc) SubqueryResults
)
) then 1 else 0 end as bit) as KGT_IO_END
FROM dbo.pc_a21a22 a
The desired output is:
batchNbr 200present
--------- ----------
12345 1
12346 0
I suggest you use table joining rather than subqueries.
select
a.*, b.*
from
dbo.table1 a
join
dbo.table2 b on a.id = b.id
where
/*your where clause for filtering*/
EDIT:
You could use this as a reference: Join vs. sub-query
Try this
SELECT TOP 50000 a.*, (CASE WHEN b.record = 200 THEN 1 ELSE 0 END) AS KGT_IO_END
FROM dbo.Test1 AS a
LEFT OUTER JOIN (
    SELECT record, p.batchNbr
    FROM dbo.Test2 AS o
    LEFT OUTER JOIN (
        SELECT MAX(passageId) AS passageId, batchNbr
        FROM dbo.Test3
        GROUP BY batchNbr
    ) AS p ON o.passageId = p.passageId
) AS b ON a.batchNbr = b.batchNbr;
The MAX subquery is to get the latest passageId by batchNbr.
However, your example won't get the record 200, since the passageId of the record with 200 is 00001, while the latest passageId of the batchNbr 12345 is 00003.
I used LEFT OUTER JOIN since the passageId from Table2 no longer matches any of the latest passageIds from Table3. The resulting subquery would have no records to join to Table1. Therefore INNER JOIN would not show any records from your example data.
Output from your example data:
batchNbr KGT_IO_END
12345 0
12346 0
12347 0
Output if we change the passageId of record 200 to 00003 (the latest for 12345)
batchNbr KGT_IO_END
12345 1
12346 0
12347 0
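For comparison, the requirement as worded in the question (take the latest passageTime per batchNbr, then check whether that passage has a record of 200) could also be sketched with ROW_NUMBER and EXISTS. This is an alternative to the answers above, not a restatement of them: it orders by passageTime instead of assuming that MAX(passageId) is the latest passage, and it returns exactly one row per batch. The example runs on Python's sqlite3 module (SQLite 3.25 or later) with the three tables exactly as shown in the question; the function name latest_passage_has_200 is made up for illustration, and no claim is made about how it performs on millions of rows in SQL Server.

```python
import sqlite3

# Tables as shown in the question.
BATCHES = [(12345,), (12346,), (12347,)]
OUTCOMES = [("00003", 200), ("00003", 9), ("00004", 7)]
PASSAGES = [
    ("00001", "2015.01.01", 12345),
    ("00002", "2016.01.01", 12345),
    ("00003", "2017.01.01", 12345),
    ("00004", "2018.01.01", 12346),
]

QUERY = """
WITH latest AS (
    -- rn = 1 marks the most recent passage of each batch.
    SELECT batchNbr, passageId,
           ROW_NUMBER() OVER (PARTITION BY batchNbr
                              ORDER BY passageTime DESC) AS rn
    FROM passage
)
SELECT a.batchNbr,
       CASE WHEN EXISTS (SELECT 1
                         FROM outcome o
                         WHERE o.passageId = l.passageId
                           AND o.record = 200)
            THEN 1 ELSE 0 END AS KGT_IO_END
FROM pc_a21a22 a
LEFT JOIN latest l
       ON l.batchNbr = a.batchNbr AND l.rn = 1
ORDER BY a.batchNbr
"""


def latest_passage_has_200():
    """Return (batchNbr, flag) with flag = 1 if the latest passage has record 200."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pc_a21a22 (batchNbr INTEGER)")
    conn.execute("CREATE TABLE outcome (passageId TEXT, record INTEGER)")
    conn.execute(
        "CREATE TABLE passage (passageId TEXT, passageTime TEXT, batchNbr INTEGER)"
    )
    conn.executemany("INSERT INTO pc_a21a22 VALUES (?)", BATCHES)
    conn.executemany("INSERT INTO outcome VALUES (?, ?)", OUTCOMES)
    conn.executemany("INSERT INTO passage VALUES (?, ?, ?)", PASSAGES)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in latest_passage_has_200():
    print(row)
```

Batch 12347 has no passage at all, so the LEFT JOIN keeps it and the flag comes out as 0.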

Getting number of records against each row using SQL server

I have table
col1 col2
---- ----
a rrr
a fff
b ccc
b zzz
b xxx
I want a query that returns the number of occurrences of col1 against each row, like this:
rows col1 col2
---- ---- ----
2 a rrr
2 a fff
3 b ccc
3 b zzz
3 b xxx
As a is repeated 2 times and b is repeated 3 times.
You can try COUNT with OVER and a partition_by_clause, which divides the result set produced by the FROM clause into partitions to which the function is applied.
If PARTITION BY is omitted, the function treats all rows of the query result set as a single group.
Try this...
select count (col1) over (partition by col1) [rows] ,col1 ,col2 from tablename
You can use the OVER clause with aggregate functions like COUNT:
SELECT rows = COUNT(*) OVER(PARTITION BY col1),
col1,
col2
FROM dbo.TableName
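A small runnable sketch of the windowed COUNT, using Python's sqlite3 module (SQLite 3.25 or later) rather than SQL Server. The alias row_count and the function name count_per_col1 are illustrative choices, and the ORDER BY is only there to make the printed order predictable.

```python
import sqlite3

# Sample rows from the question: (col1, col2).
ROWS = [("a", "rrr"), ("a", "fff"), ("b", "ccc"), ("b", "zzz"), ("b", "xxx")]

# COUNT(*) OVER (PARTITION BY col1) repeats the size of each col1 group
# on every row of that group, without collapsing the rows.
QUERY = """
SELECT COUNT(*) OVER (PARTITION BY col1) AS row_count,
       col1,
       col2
FROM TableName
ORDER BY col1, col2
"""


def count_per_col1():
    """Return every row together with the number of rows sharing its col1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableName (col1 TEXT, col2 TEXT)")
    conn.executemany("INSERT INTO TableName VALUES (?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in count_per_col1():
    print(row)
```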

T-SQL select rows by oldest date and unique category

I'm using Microsoft SQL. I have a table that contains information stored by two different categories and a date. For example:
ID Cat1 Cat2 Date/Time Data
1 1 A 11:00 456
2 1 B 11:01 789
3 1 A 11:01 123
4 2 A 11:05 987
5 2 B 11:06 654
6 1 A 11:06 321
I want to extract one line for each unique combination of Cat1 and Cat2 and I need the line with the oldest date. In the above I want ID = 1, 2, 4, and 5.
Thanks
Have a look at row_number() on MSDN.
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY cat1, cat2 ORDER BY date_time, id) rn
FROM mytable
) q
WHERE rn = 1
Quassnoi's answer is fine, but I'm a bit uncomfortable with how it handles dups. It seems to return based on insertion order, but I'm not sure if even that can be guaranteed? (see these two fiddles for an example where the result changes based on insertion order: dup at the end, dup at the beginning)
Plus, I kinda like staying with old-school SQL when I can, so I would do it this way (see this fiddle for how it handles dups):
select t1.*
from my_table t1
left join my_table t2
    on t1.cat1 = t2.cat1
    and t1.cat2 = t2.cat2
    and t1.datetime > t2.datetime
where t2.datetime is null
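Both approaches can be tried side by side on the sample data from the question. The sketch below uses Python's sqlite3 module (SQLite 3.25 or later for ROW_NUMBER); the table name readings, the column names date_time and data, and the function name oldest_per_category are assumptions made for this example. On this data both queries return IDs 1, 2, 4 and 5. They differ when two rows in the same category share the oldest time: ROW_NUMBER keeps one of them (here the lower id, because id is the tie-breaker), while the self-join keeps both.

```python
import sqlite3

# Sample rows from the question: (ID, Cat1, Cat2, Date/Time, Data).
ROWS = [
    (1, 1, "A", "11:00", 456),
    (2, 1, "B", "11:01", 789),
    (3, 1, "A", "11:01", 123),
    (4, 2, "A", "11:05", 987),
    (5, 2, "B", "11:06", 654),
    (6, 1, "A", "11:06", 321),
]

# Approach 1: number the rows of each (cat1, cat2) group from oldest to newest.
ROW_NUMBER_QUERY = """
SELECT id
FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY cat1, cat2
                              ORDER BY date_time, id) AS rn
    FROM readings
) q
WHERE rn = 1
ORDER BY id
"""

# Approach 2: anti-join, keeping rows for which no older row exists in the group.
ANTI_JOIN_QUERY = """
SELECT t1.id
FROM readings t1
LEFT JOIN readings t2
       ON t1.cat1 = t2.cat1
      AND t1.cat2 = t2.cat2
      AND t1.date_time > t2.date_time
WHERE t2.id IS NULL
ORDER BY t1.id
"""


def oldest_per_category(query):
    """Run one of the two queries and return the selected IDs as a list."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (id INTEGER, cat1 INTEGER, cat2 TEXT, "
        "date_time TEXT, data INTEGER)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


print(oldest_per_category(ROW_NUMBER_QUERY))
print(oldest_per_category(ANTI_JOIN_QUERY))
```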

Expand row results based on a value in column (with iterator)

Need help from you all in writing up this query. Running SQL 2005 Standard edition.
I have a basic query that gets a subset of records from a table where the Record_Count is greater than 1.
SELECT *
FROM Table_Records
WHERE Record_Count > 1
This query gives me a result set of, say:
TableRecords_ID Record_Desc Record_Count
123 XYZ 3
456 PQR 2
The above query needs to be modified so that each record appears as many times as the Record_Count and has its iteration number with it, as a value. So the new query should return results as follows:
TableRecords_ID Record_Desc Record_Count Rec_Iteration
123 XYZ 3 1
123 XYZ 3 2
123 XYZ 3 3
456 PQR 2 1
456 PQR 2 2
Could anyone help me write this query up? I appreciate the help.
Clarification: the Rec_Iteration column is a sub-representation of Record_Count. Basically, since the Record_Count for the XYZ description is three, three rows are returned, with Rec_Iteration representing rows one, two and three respectively.
You can use a recursive CTE for this query. Below I use a table variable @T instead of your table Table_Records.
declare @T table(TableRecords_ID int, Record_Desc varchar(3), Record_Count int)
insert into @T
select 123, 'XYZ', 3 union all
select 456, 'PQR', 2
;with cte as
(
    -- anchor: one row per record, starting at iteration 1
    select TableRecords_ID,
           Record_Desc,
           Record_Count,
           1 as Rec_Iteration
    from @T
    where Record_Count > 1
    union all
    -- recursive step: add the next iteration until Record_Count is reached
    select TableRecords_ID,
           Record_Desc,
           Record_Count,
           Rec_Iteration + 1
    from cte
    where Rec_Iteration < Record_Count
)
select TableRecords_ID,
       Record_Desc,
       Record_Count,
       Rec_Iteration
from cte
order by TableRecords_ID,
         Rec_Iteration
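The recursion can be checked outside SQL Server with a short sketch on Python's sqlite3 module. SQLite spells the construct WITH RECURSIVE, and the function name expand_records is made up for this example; the data is the two sample rows from the question. One thing to keep in mind on SQL Server is that recursion is limited to 100 levels by default, so Record_Count values well above that would likely need an OPTION (MAXRECURSION n) hint.

```python
import sqlite3

# Sample rows from the question: (TableRecords_ID, Record_Desc, Record_Count).
ROWS = [(123, "XYZ", 3), (456, "PQR", 2)]

QUERY = """
WITH RECURSIVE cte AS (
    -- Anchor member: every qualifying record starts at iteration 1.
    SELECT TableRecords_ID, Record_Desc, Record_Count, 1 AS Rec_Iteration
    FROM Table_Records
    WHERE Record_Count > 1
    UNION ALL
    -- Recursive member: emit the next iteration until Record_Count is reached.
    SELECT TableRecords_ID, Record_Desc, Record_Count, Rec_Iteration + 1
    FROM cte
    WHERE Rec_Iteration < Record_Count
)
SELECT TableRecords_ID, Record_Desc, Record_Count, Rec_Iteration
FROM cte
ORDER BY TableRecords_ID, Rec_Iteration
"""


def expand_records():
    """Return each record repeated Record_Count times with its iteration number."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Table_Records (TableRecords_ID INTEGER, "
        "Record_Desc TEXT, Record_Count INTEGER)"
    )
    conn.executemany("INSERT INTO Table_Records VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in expand_records():
    print(row)
```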
