Looking for a special way to join on SQL Server

I would like to join two tables, but none of the existing joins (I tried left, right, full, and cross) does what I need.
I want to combine Table 1 and Table 2 into Table 3.
The rows are in chronological order, and I would like to see the same order in the desired table.
TABLE 1:
Student---- Score1
A------------ 90
A------------ 80
B------------ 85
B------------ 60
C------------ 50
C------------ 40
Table2:
Student---- Score2
A------------ 66
A------------ 70
A------------ 85
B------------ 60
C------------ 40
Table 3: Desired Table
Student---- Score1-----Score2
A------------ 90 ----------- 66
A------------ 80 ----------- 70
A------------null -----------85
B------------ 85 ----------- 60
B------------ 60 ----------- null
C------------ 50 ----------- 40
C------------ 40 ----------- null
Thank you!

OK, the first thing we need is a way to add a positional column to your tables at runtime. This can be done with the ROW_NUMBER() function:
SELECT *, ROW_NUMBER() OVER (PARTITION BY Student ORDER BY Student) Position FROM Table1
SELECT *, ROW_NUMBER() OVER (PARTITION BY Student ORDER BY Student) Position FROM Table2
This creates a nice Position column in our result:
Student Score1 Position
---------- ----------- --------------------
A 90 1
A 80 2
B 85 1
B 60 2
C 50 1
C 40 2
(6 rows affected)
Student Score2 Position
---------- ----------- --------------------
A 66 1
A 70 2
A 85 3
B 60 1
C 40 1
(5 rows affected)
Now we need to join these two temporary results. Since you want to include all the rows from each table, leaving NULL where the other table has no matching row, FULL OUTER JOIN comes to the rescue, in all its beauty:
SELECT *
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY Student ORDER BY Student) Position FROM Table1) T1
FULL OUTER JOIN
(SELECT *, ROW_NUMBER() OVER (PARTITION BY Student ORDER BY Student) Position FROM Table2) T2
ON T1.Student = T2.Student AND T1.Position = T2.Position
We get this:
Student Score1 Position Student Score2 Position
---------- ----------- -------------------- ---------- ----------- --------------------
A 90 1 A 66 1
A 80 2 A 70 2
NULL NULL NULL A 85 3
B 85 1 B 60 1
B 60 2 NULL NULL NULL
C 50 1 C 40 1
C 40 2 NULL NULL NULL
(7 rows affected)
Now just select what you are interested in:
SELECT COALESCE(T1.student, T2.student) Student,
T1.score1,
T2.score2
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY Student ORDER BY Student) Position FROM Table1) T1
FULL OUTER JOIN
(SELECT *, ROW_NUMBER() OVER (PARTITION BY Student ORDER BY Student) Position FROM Table2) T2
ON T1.Student = T2.Student AND T1.Position = T2.Position
And voilà:
Student score1 score2
---------- ----------- -----------
A 90 66
A 80 70
A NULL 85
B 85 60
B 60 NULL
C 50 40
C 40 NULL
(7 rows affected)
Be aware though: with many records, this might not be the most efficient way of storing and retrieving your data...
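The pairing idea behind the query above (match the n-th row of each student in one table with the n-th row of the same student in the other) can be sketched outside the database as well. This is only an illustration in plain Python, not the SQL Server implementation: the function names are made up for the example, list order stands in for the Position column, and None plays the role of NULL.

```python
from itertools import zip_longest

# Rows in the order shown in the question (list order stands in for Position).
table1 = [("A", 90), ("A", 80), ("B", 85), ("B", 60), ("C", 50), ("C", 40)]
table2 = [("A", 66), ("A", 70), ("A", 85), ("B", 60), ("C", 40)]


def group_scores(rows):
    """Collect the scores of each student, preserving the incoming order."""
    grouped = {}
    for student, score in rows:
        grouped.setdefault(student, []).append(score)
    return grouped


def positional_full_join(left, right):
    """Pair the n-th score of each student on both sides, padding with None."""
    g1, g2 = group_scores(left), group_scores(right)
    result = []
    for student in sorted(set(g1) | set(g2)):
        pairs = zip_longest(g1.get(student, []), g2.get(student, []))
        for score1, score2 in pairs:
            result.append((student, score1, score2))
    return result


table3 = positional_full_join(table1, table2)
for row in table3:
    print(row)
```

The padding done by zip_longest corresponds to the NULLs that FULL OUTER JOIN produces for positions present on only one side.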
Edit: what follows has been added after answer acceptance
Really important: since a small dispute arose in the comments, let's state the obvious.
The database design proposed by the OP has several defects. First of all, it relies on the assumption that the order of the records in the table will always be the order in which they were inserted.
This is not guaranteed, so my solution may not work as expected until a more robust way of sorting the records is implemented.
It would be better to add a CreatedAt column of type datetime to both tables, storing each record's insert date:
ALTER TABLE dbo.Table1 ADD
CreatedAt datetime NOT NULL CONSTRAINT DF_Table1_CreatedAt DEFAULT getdate()
ALTER TABLE dbo.Table2 ADD
CreatedAt datetime NOT NULL CONSTRAINT DF_Table2_CreatedAt DEFAULT getdate()
This allows the records to be ordered more safely.
The solution would change as follows:
SELECT COALESCE(T1.student, T2.student) Student,
T1.score1,
T2.score2
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY Student ORDER BY CreatedAt) Position FROM Table1) T1
FULL OUTER JOIN
(SELECT *, ROW_NUMBER() OVER (PARTITION BY Student ORDER BY CreatedAt) Position FROM Table2) T2
ON T1.Student = T2.Student AND T1.Position = T2.Position
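To illustrate why an explicit ordering column helps, here is a small sketch using Python's standard-library sqlite3 module as a stand-in for SQL Server (an assumption of this example; it needs an SQLite build with window functions, 3.25 or later). The CreatedAt values are invented for the illustration. The rows are inserted out of chronological order on purpose, and the position still follows CreatedAt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (Student TEXT, Score1 INTEGER, CreatedAt TEXT)"
)
# Deliberately inserted out of chronological order; timestamps are made up.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        ("A", 80, "2020-01-02 09:00:00"),
        ("B", 85, "2020-01-01 10:00:00"),
        ("A", 90, "2020-01-01 09:00:00"),
        ("B", 60, "2020-01-03 10:00:00"),
    ],
)

# Position is derived from CreatedAt, not from physical or insertion order.
positions = conn.execute(
    """
    SELECT Student, Score1,
           ROW_NUMBER() OVER (PARTITION BY Student ORDER BY CreatedAt) AS Position
    FROM Table1
    ORDER BY Student, Position
    """
).fetchall()

for row in positions:
    print(row)
```

ISO-formatted timestamp strings sort chronologically, which is why a TEXT column is enough for this sketch; in SQL Server the datetime column from the ALTER TABLE statements above would be used instead.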

Well, technically:
with t1 as (
select *, ord = row_number() over(partition by student order by score1 desc)
from table1
),
t2 as (
select *, ord = row_number() over(partition by student order by score2)
from table2
)
select student = isnull(t1.student, t2.student),
t1.score1,
t2.score2
from t1
full join t2 on t1.student = t2.student and t1.ord = t2.ord;
But I doubt your desire is to order scores in table1 in descending order and scores in table2 in ascending order. So you're going to have to pin that down. Is there a column for time the test was administered? Probably best to order by that.
Better still, do this in your front-end software, such as SSRS, Crystal Reports, or the like. I say this because I assume this is for a reporting need where the rows don't really represent 'records' anymore.

Related

How would you write a T-SQL query that supported event study analysis

I'm trying to create a table that will support a simple event study analysis, but I'm not sure how best to approach this.
I'd like to create a table with the following columns: Customer, Date, Time on website, Outcome. I'm testing the premise that the outcome for a particular customer on any given day is a function of the time spent on the website on the current day as well as on the preceding five site visits. I'm envisioning a table similar to this:
I'm hoping to write a T-SQL query that will produce an output like this:
Given this objective, here are my questions:
Assuming this is indeed possible, how should I structure my table to accomplish this objective? Is there a need for a column that refers to the prior visit? Do I need to add an index to a particular column?
Would this be considered a recursive query?
Given the appropriate table structure, what would the query look like?
Is it possible to structure the query with a variable that determines the number of prior periods to include in addition to the current period (for example, if I want to compare 5 periods to 3 periods)?
Not sure I understand the analytic value of your matrix.
Declare @Table table (id int,VisitDate date,VisitTime int,Outcome varchar(25))
Insert Into @Table (id,VisitDate,VisitTime,Outcome) values
(123,'2015-12-01',100,'P'),
(123,'2016-01-01',101,'P'),
(123,'2016-02-01',102,'N'),
(123,'2016-03-01',100,'P'),
(123,'2016-04-01', 99,'N'),
(123,'2016-04-09', 98,'P'),
(123,'2016-05-09', 99,'P'),
(123,'2016-05-14',100,'N'),
(123,'2016-06-13', 99,'P'),
(123,'2016-06-15', 98,'P')
Select *
,T0 = VisitTime
,T1 = Lead(VisitTime,1,0) over(Partition By ID Order By ID,VisitDate Desc)
,T2 = Lead(VisitTime,2,0) over(Partition By ID Order By ID,VisitDate Desc)
,T3 = Lead(VisitTime,3,0) over(Partition By ID Order By ID,VisitDate Desc)
,T4 = Lead(VisitTime,4,0) over(Partition By ID Order By ID,VisitDate Desc)
,T5 = Lead(VisitTime,5,0) over(Partition By ID Order By ID,VisitDate Desc)
From @Table
Order By ID,VisitDate Desc
Returns
id VisitDate VisitTime Outcome T0 T1 T2 T3 T4 T5
123 2016-06-15 98 P 98 99 100 99 98 99
123 2016-06-13 99 P 99 100 99 98 99 100
123 2016-05-14 100 N 100 99 98 99 100 102
123 2016-05-09 99 P 99 98 99 100 102 101
123 2016-04-09 98 P 98 99 100 102 101 100
123 2016-04-01 99 N 99 100 102 101 100 0
123 2016-03-01 100 P 100 102 101 100 0 0
123 2016-02-01 102 N 102 101 100 0 0 0
123 2016-01-01 101 P 101 100 0 0 0 0
123 2015-12-01 100 P 100 0 0 0 0 0
With fixed columns you can do it like this with lag:
select
time,
-- ascending date order, so lag looks back at earlier visits
lag(time, 1) over (partition by customer order by date),
lag(time, 2) over (partition by customer order by date),
lag(time, 3) over (partition by customer order by date),
lag(time, 4) over (partition by customer order by date)
from
yourtable
If you need dynamic columns, then you'll have to build it using dynamic SQL.
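As a rough sketch of what "building it dynamically" could look like, the following generates one LAG column per prior visit from a parameter. It is written in Python with the standard-library sqlite3 module standing in for SQL Server (an assumption of this example, requiring SQLite 3.25 or later); the table name Visits and the helper build_lag_query are hypothetical, and the data is the sample from the answer above.

```python
import sqlite3

# Sample visits taken from the answer above (id, VisitDate, VisitTime).
visits = [
    (123, "2015-12-01", 100), (123, "2016-01-01", 101),
    (123, "2016-02-01", 102), (123, "2016-03-01", 100),
    (123, "2016-04-01", 99), (123, "2016-04-09", 98),
    (123, "2016-05-09", 99), (123, "2016-05-14", 100),
    (123, "2016-06-13", 99), (123, "2016-06-15", 98),
]


def build_lag_query(periods):
    """Build a SELECT with one LAG column (T1..Tn) per prior visit."""
    periods = int(periods)  # only an integer is ever interpolated into the SQL
    columns = ["id", "VisitDate", "VisitTime AS T0"]
    for k in range(1, periods + 1):
        columns.append(
            f"LAG(VisitTime, {k}, 0) OVER (PARTITION BY id ORDER BY VisitDate) AS T{k}"
        )
    return (
        "SELECT " + ", ".join(columns)
        + " FROM Visits ORDER BY id, VisitDate DESC"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Visits (id INTEGER, VisitDate TEXT, VisitTime INTEGER)")
conn.executemany("INSERT INTO Visits VALUES (?, ?, ?)", visits)

rows = conn.execute(build_lag_query(3)).fetchall()
for row in rows:
    print(row)
```

Calling build_lag_query(5) instead of build_lag_query(3) yields the five-period variant; in T-SQL the same string would typically be assembled and run through dynamic SQL.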

sql select based on first minimum value from top

I know we already have a few posts on a similar topic, but I think this case is a bit different, and I couldn't get the result I wanted from the answers given in other posts.
We have a table as below:
id code amount
------------------
1 A1 80
2 A1 75
3 A1 70
4 A1 70
5 A1 70
1 A2 92
2 A2 85
3 A2 79
4 A2 50
5 A2 50
How can I select the row for "A1" and "A2" based on the first lowest value (from the top) in the "amount" column? In this case I want the result like below:
id code amount
------------------
3 A1 70
4 A2 50
Thanks!
Use ROW_NUMBER:
SELECT
id, code, amount
FROM (
SELECT *,
Rn = ROW_NUMBER() OVER(PARTITION BY code ORDER BY amount, id)
FROM tbl
) AS t
WHERE Rn = 1
You don't need to order by id, so the correct way would be:
SELECT
id, code, amount
FROM (
SELECT *,
Rn = ROW_NUMBER() OVER(PARTITION BY code ORDER BY amount)
FROM tbl
) AS t
WHERE Rn = 1
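A quick way to check the ROW_NUMBER approach against the sample data is sketched below, using Python's standard-library sqlite3 module as a stand-in for SQL Server (an assumption of this example; SQLite 3.25 or later is needed for window functions). It uses the variant with id as a tie-breaker: when several rows share the lowest amount, ordering by amount alone does not guarantee which of the tied rows receives row number 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, code TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, "A1", 80), (2, "A1", 75), (3, "A1", 70), (4, "A1", 70), (5, "A1", 70),
        (1, "A2", 92), (2, "A2", 85), (3, "A2", 79), (4, "A2", 50), (5, "A2", 50),
    ],
)

# Lowest amount per code; among ties, the smallest id ("first from the top").
first_lowest = conn.execute(
    """
    SELECT id, code, amount
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY code ORDER BY amount, id) AS rn
        FROM tbl
    ) AS t
    WHERE rn = 1
    ORDER BY code
    """
).fetchall()

for row in first_lowest:
    print(row)
```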

Joining two SQL tables based on the equality of few columns

I am trying to create a SQL view by joining two SQL tables, returning all the rows from the first table (similar to a left join) and only the lowest value from the second table.
My problem can be clearly explained with the below example.
Table1
Id Product Grade Term Bid Offer
100 ABC A Q1 10 20
101 ABC A Q1 5 25
102 XYZ A Q2 25 30
103 XYZ B Q2 20 30
Table2
Id Product Grade Term TradeValue
1 ABC A Q1 100
2 ABC A Q1 95
3 XYZ B Q2 100
In the above data I want to join Table1 and Table2 whenever the columns Product, Grade and Term are equal in both tables, returning all the rows from Table1. The lowest TradeValue from Table2 should be joined to the first matching record only, with TradeValue set to NULL for the other rows of the resulting view. The view should also expose the Id of that Table2 row as LTID.
So the resultant SQL View should be
RESULT
Id Product Grade Term Bid Offer TradeValue LTID
100 ABC A Q1 10 20 95 2
101 ABC A Q1 5 25 NULL 2
102 XYZ A Q2 25 30 NULL NULL
103 XYZ B Q2 20 30 100 3
I tried using the following query
CREATE VIEW [dbo].[ViewCC]
AS
SELECT
a.Id,a.Product,a.Grade,a.Term,a.Bid,a.Offer,
b.TradeValue
FROM Table1 AS a
left JOIN (SELECT Product,Grade,Term,MIN(TradeValue) TradeValue from Table2 Group by Product,Grade,Term) AS b
ON b.Product=a.Product
and b.Grade=a.Grade
and b.Term=a.Term
GO
The above query returned the following data, which is correct for the query I wrote but is not what I was trying to get:
Id Product Grade Term Bid Offer TradeValue
100 ABC A Q1 10 20 95
101 ABC A Q1 5 25 95 --This should be null
102 XYZ A Q2 25 30 NULL
103 XYZ B Q2 20 30 100
As we can see, the minimum TradeValue is assigned to all matching rows in Table1. I was also unable to return Id as LTID from Table2: I cannot group by b.Id, because that returns too many rows.
May I know a better way to deal with this?
You need a row number attached to each record from Table1, so that the requirement of only joining the first record from each group of Table1 can be fulfilled:
CREATE VIEW [dbo].[ViewCC]
AS
SELECT a.Id, a.Product, a.Grade, a.Term, a.Bid, a.Offer,
b.TradeValue, b.Id AS LTID
FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY Product, Grade, Term ORDER BY Id) AS rn
FROM Table1
) a
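OUTER APPLY (
SELECT TOP 1 CASE WHEN a.rn = 1 THEN t.TradeValue
ELSE NULL
END AS TradeValue, t.Id
FROM Table2 AS t
WHERE t.Product=a.Product AND t.Grade=a.Grade AND t.Term=a.Term
-- qualified so the sort uses Table2's column rather than the CASE alias of the same name
ORDER BY t.TradeValue) b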
GO
OUTER APPLY returns a table expression containing either the matching record from Table2 with the lowest TradeValue, or NULL if no matching record exists.
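The logic of the view can also be traced in plain Python, which may help when checking the expected result by hand. This is only a sketch of the same idea, not a translation of OUTER APPLY: build_view is a hypothetical helper, tuples stand in for table rows, and None stands in for NULL.

```python
# (Id, Product, Grade, Term, Bid, Offer)
table1 = [
    (100, "ABC", "A", "Q1", 10, 20),
    (101, "ABC", "A", "Q1", 5, 25),
    (102, "XYZ", "A", "Q2", 25, 30),
    (103, "XYZ", "B", "Q2", 20, 30),
]
# (Id, Product, Grade, Term, TradeValue)
table2 = [
    (1, "ABC", "A", "Q1", 100),
    (2, "ABC", "A", "Q1", 95),
    (3, "XYZ", "B", "Q2", 100),
]


def build_view(rows1, rows2):
    """Attach the lowest trade per (Product, Grade, Term) to each Table1 row."""
    lowest = {}  # key -> (TradeValue, Id) of the lowest trade seen so far
    for trade_id, product, grade, term, value in rows2:
        key = (product, grade, term)
        if key not in lowest or (value, trade_id) < lowest[key]:
            lowest[key] = (value, trade_id)

    seen = set()
    result = []
    for row in sorted(rows1):  # ordered by Id, like ROW_NUMBER ... ORDER BY Id
        key = row[1:4]
        value, ltid = lowest.get(key, (None, None))
        if key in seen:
            value = None  # only the first row of each group shows TradeValue
        seen.add(key)
        result.append(row + (value, ltid))
    return result


view = build_view(table1, table2)
for row in view:
    print(row)
```

As in the desired result from the question, LTID is kept on every matching row while TradeValue appears only on the first row of each group.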

Removing Duplicates of two columns in a query

I have a select * query which returns lots of rows and lots of columns. I have an issue with duplicates in one column A for the same value of another column B, and I would like to include only one of them.
Basically I have a column that tells me the "name" of object and another that tells me the "number". Sometimes I have an object "name" with more than one entry for a given object "number". I only want distinct "numbers" within a "name" but I want the query to give the entire table when this is true and not just these two columns.
Name Number ColumnC ColumnD
Bob 1 93 12
Bob 2 432 546
Bob 3 443 76
This example above is fine
Name Number ColumnC ColumnD
Bob 1 93 12
Bob 2 432 546
Bill 1 443 76
Bill 2 54 1856
This example above is fine
Name Number ColumnC ColumnD
Bob 1 93 12
Bob 2 432 546
Bob 2 209 17
This example above is not fine, I only want one of the Bob 2's.
Try this if you are using SQL Server 2005 or above:
With ranked_records AS
(
select *,
ROW_NUMBER() OVER(Partition By name, number Order By name) [ranked]
from MyTable
)
select * from ranked_records
where ranked = 1
If you just want the Name and number, then
SELECT DISTINCT Name, Number FROM Table1
If you want to know how many of each there are, then
SELECT Name, Number, COUNT(*) FROM Table1 GROUP BY Name, Number
By using a Common Table Expression (CTE) and the ROW_NUMBER() OVER (PARTITION BY ...) syntax as follows:
WITH
CTE AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Name, Number ORDER BY Name, Number) AS R
FROM
dbo.ATable
)
SELECT
*
FROM
CTE
WHERE
R = 1
WITH
CTE AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Plant, BatchNumber ORDER BY Plant, BatchNumber) AS R
FROM dbo.StatisticalReports WHERE dbo.StatisticalReports."FermBatchStartTime" >= DATEADD(d,-90, getdate())
)
SELECT
*
FROM
CTE
WHERE
R = 1
ORDER BY Plant, FermBatchStartTime
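The ROW_NUMBER de-duplication used in these answers can be tried on the third sample from the question with the sketch below. It uses Python's standard-library sqlite3 module as a stand-in for SQL Server (an assumption of this example; SQLite 3.25 or later is needed). Ordering by ColumnC inside the window is a choice made here so that the surviving duplicate is predictable; the question does not say which of the two "Bob 2" rows should be kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (Name TEXT, Number INTEGER, ColumnC INTEGER, ColumnD INTEGER)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [("Bob", 1, 93, 12), ("Bob", 2, 432, 546), ("Bob", 2, 209, 17)],
)

# Keep one row per (Name, Number); ORDER BY ColumnC decides which one survives.
deduped = conn.execute(
    """
    WITH ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY Name, Number ORDER BY ColumnC) AS R
        FROM MyTable
    )
    SELECT Name, Number, ColumnC, ColumnD
    FROM ranked
    WHERE R = 1
    ORDER BY Name, Number
    """
).fetchall()

for row in deduped:
    print(row)
```

Ordering the window by the partition columns themselves, as in the queries above, still returns one row per pair, but which duplicate is kept is then left to the engine.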

Take average of only most recent group

There's one table named StudentScore with the fields Score, CourseID, StudentID and Semester. The latter three form the primary key.
I want to write a stored procedure to get the average score of each student. But the rule is quite complex and I don't know how to express it in one query. Nested queries should be avoided if possible.
Here is the rule:
If a student takes a course more than once, only the last score should be counted.
For example, given the following data:
StudentID | CourseID | Semester | Score
1 1 1 80
1 2 1 40
1 3 1 60
1 2 2 50
1 3 2 20
2 1 1 90
The stored procedure should return:
StudentID | AvgScore
1 50 // which is avg(80, 50, 20)
2 90
Please suggest a stored procedure that is as efficient as possible. Thanks!
;WITH x AS
(
SELECT StudentID, Score, rn = ROW_NUMBER() OVER
(PARTITION BY StudentID, CourseID
ORDER BY Semester DESC)
FROM dbo.StudentScore
)
SELECT StudentID, AvgScore = AVG(Score)
FROM x
WHERE rn = 1
GROUP BY StudentID;
If you want something rounded to certain decimal places, maybe:
;WITH x AS
(
SELECT StudentID, Score = 1.0*Score, rn = ROW_NUMBER() OVER
(PARTITION BY StudentID, CourseID
ORDER BY Semester DESC)
FROM dbo.StudentScore
)
SELECT StudentID, AvgScore = CONVERT(DECIMAL(10,2), AVG(Score))
FROM x
WHERE rn = 1
GROUP BY StudentID;
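The query can be checked against the sample data from the question with the sketch below, which uses Python's standard-library sqlite3 module as a stand-in for SQL Server (an assumption of this example; SQLite 3.25 or later is needed). One difference to keep in mind: SQLite's AVG returns a floating-point value, whereas SQL Server's AVG over an int column returns an int, which is why the second query above multiplies by 1.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE StudentScore (
        StudentID INTEGER, CourseID INTEGER, Semester INTEGER, Score INTEGER,
        PRIMARY KEY (StudentID, CourseID, Semester)
    )
    """
)
conn.executemany(
    "INSERT INTO StudentScore VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 80), (1, 2, 1, 40), (1, 3, 1, 60),
        (1, 2, 2, 50), (1, 3, 2, 20), (2, 1, 1, 90),
    ],
)

# rn = 1 marks the most recent attempt of each student at each course.
averages = conn.execute(
    """
    WITH x AS (
        SELECT StudentID, Score,
               ROW_NUMBER() OVER (PARTITION BY StudentID, CourseID
                                  ORDER BY Semester DESC) AS rn
        FROM StudentScore
    )
    SELECT StudentID, AVG(Score) AS AvgScore
    FROM x
    WHERE rn = 1
    GROUP BY StudentID
    ORDER BY StudentID
    """
).fetchall()

for row in averages:
    print(row)
```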

Resources