How does updating rows from a subquery work in SQL Server?

How does SQL Server know which rows to update when updating from a subquery rather than a table?
Say I have a table with three columns defined like below:
CREATE TABLE A (
AId int IDENTITY (1,1) PRIMARY KEY,
AExternalId int NULL,
ASequence int NULL
)
I want to fill the column ASequence with sequential numbers within each group of AExternalId, but only where ASequence is NULL.
For example, after inserting rows for four different AExternalId values (or groups),
INSERT INTO A ([AExternalId]) VALUES (1001)
INSERT INTO A ([AExternalId]) VALUES (1002)
INSERT INTO A ([AExternalId]) VALUES (1002)
INSERT INTO A ([AExternalId]) VALUES (1003)
INSERT INTO A ([AExternalId]) VALUES (1003)
INSERT INTO A ([AExternalId]) VALUES (1003)
INSERT INTO A ([AExternalId], [ASequence]) VALUES (1004, 10)
INSERT INTO A ([AExternalId], [ASequence]) VALUES (1004, 20)
INSERT INTO A ([AExternalId], [ASequence]) VALUES (1004, 30)
the table looks like this:
AId  AExternalId  ASequence
1    1001         NULL
2    1002         NULL
3    1002         NULL
4    1003         NULL
5    1003         NULL
6    1003         NULL
7    1004         10
8    1004         20
9    1004         30
After the update, the table should look like this:
AId  AExternalId  ASequence
1    1001         1
2    1002         1
3    1002         2
4    1003         1
5    1003         2
6    1003         3
7    1004         10
8    1004         20
9    1004         30
Every AId within each AExternalId group now has a sequential number (except the rows that already had a sequence).
I can achieve this by running the following query:
UPDATE t1
SET t1.[ASequence] = t1.[CalcSequence]
FROM (
SELECT AId, AExternalId, ASequence, ROW_NUMBER() OVER (PARTITION BY [AExternalId] ORDER BY [AExternalId], [AId] ASC) AS [CalcSequence]
FROM [A]
WHERE (ASequence IS NULL) AND (AExternalId IS NOT NULL)
) t1
The question is: why (or rather how) does this work, when no table is specified and there is no condition on the update?
I have been taught that an UPDATE without a condition updates all rows in a table, but in this case no table is specified (except inside the subquery).
Does this work because I am updating the resulting rows from the inner select? If so, how are rows "matched" so that the update is made on the correct row?
Is this an example of a Correlated Subquery?
I've tried to read up on those but couldn't tell whether they apply here. Many texts on correlated subqueries discuss performance issues and say that a correlated subquery requires values from its outer query, which does not seem to fit this example.
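For contrast, here is a minimal sketch of what a correlated subquery looks like for the same task, run against a scratch copy of the table (#A2 is an illustrative name, not from the question). The inner SELECT refers to t.AExternalId and t.AId from the outer UPDATE, so it is logically evaluated once per outer row. The derived table in the first query never refers to anything outside itself, so it is not correlated.

```sql
-- #A2 is a hypothetical scratch copy of table A from the question.
CREATE TABLE #A2 (
    AId int IDENTITY (1,1) PRIMARY KEY,
    AExternalId int NULL,
    ASequence int NULL
);

INSERT INTO #A2 (AExternalId, ASequence) VALUES
    (1001, NULL), (1002, NULL), (1002, NULL),
    (1003, NULL), (1003, NULL), (1003, NULL),
    (1004, 10), (1004, 20), (1004, 30);

-- Correlated subquery: for each outer row t, count the unnumbered rows in the
-- same group whose AId is less than or equal to t.AId.
-- SQL Server protects UPDATE statements against reading their own writes
-- (Halloween protection), so the subquery sees ASequence as it was before
-- the statement started.
UPDATE t
SET ASequence = (
    SELECT COUNT(*)
    FROM #A2 AS n
    WHERE n.AExternalId = t.AExternalId  -- reference to the outer row
      AND n.ASequence IS NULL
      AND n.AId <= t.AId
)
FROM #A2 AS t
WHERE t.ASequence IS NULL
  AND t.AExternalId IS NOT NULL;

SELECT AId, AExternalId, ASequence FROM #A2 ORDER BY AId;
```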
An alternative way of achieving the same result is by using an INNER JOIN:
UPDATE t1
SET t1.[ASequence] = t2.[CalcSequence]
FROM [A] t1
INNER JOIN (
SELECT AId, AExternalId, ASequence, ROW_NUMBER() OVER (PARTITION BY [AExternalId] ORDER BY [AExternalId], [AId] ASC) AS [CalcSequence]
FROM [A]
WHERE (ASequence IS NULL) AND (AExternalId IS NOT NULL)
) t2 ON t2.AId = t1.AId
I have compared the results of both queries and they are identical. Performance-wise, the first query seems to be a bit faster and to consume fewer resources.
The second query (with inner join) feels more "familiar", more "correct" but I would really like to understand how the first one works.
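For comparison, here is a minimal sketch of the same update written with a common table expression instead of a derived table, run against a scratch copy of the table (#A is an illustrative name, not from the question). SQL Server accepts a CTE or derived table as the target of an UPDATE when the column being changed comes directly from a single base table, much like updating through a view. Each row the CTE returns corresponds to exactly one row of that base table, which is how the update reaches the right rows without a join. Nothing inside the CTE refers to an outer query, so this is not a correlated subquery.

```sql
-- #A is a hypothetical scratch copy of table A from the question.
CREATE TABLE #A (
    AId int IDENTITY (1,1) PRIMARY KEY,
    AExternalId int NULL,
    ASequence int NULL
);

INSERT INTO #A (AExternalId, ASequence) VALUES
    (1001, NULL), (1002, NULL), (1002, NULL),
    (1003, NULL), (1003, NULL), (1003, NULL),
    (1004, 10), (1004, 20), (1004, 30);

-- The CTE reads from one base table and ASequence is a plain column of it,
-- so writing to ToNumber.ASequence writes to #A.ASequence.
-- CalcSequence is computed, so it can be read here but not assigned to.
WITH ToNumber AS (
    SELECT ASequence,
           ROW_NUMBER() OVER (PARTITION BY AExternalId ORDER BY AId) AS CalcSequence
    FROM #A
    WHERE ASequence IS NULL
      AND AExternalId IS NOT NULL
)
UPDATE ToNumber
SET ASequence = CalcSequence;

SELECT AId, AExternalId, ASequence FROM #A ORDER BY AId;
```

The ORDER BY inside ROW_NUMBER only needs AId here, because PARTITION BY already restricts each numbering to a single AExternalId.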

Related

Find duplicate rows and show only the earliest

I have the following table:
respid, uploadtime
I need a query that shows all records whose respid is duplicated, except the latest one (by upload time).
Example:
4 2014-01-01
4 2014-06-01
4 2015-01-01
4 2015-06-01
4 2016-01-01
In this case the query should return four records (the latest one, 4 2016-01-01, is left out).
Thank you very much.
Use ROW_NUMBER:
WITH cte AS (
SELECT respid, uploadtime,
ROW_NUMBER() OVER (PARTITION BY respid ORDER BY uploadtime DESC) rn
FROM yourTable
)
SELECT respid, uploadtime
FROM cte
WHERE rn > 1
ORDER BY respid, uploadtime;
The logic here is to show all records except those with row number 1, which are the latest records in each respid group.
If I interpreted your question correctly, then you want to see all records where respid occurs multiple times, but exclude the last duplicate.
Translating this to SQL could sound like "show all records that have a later record for the same respid". That is exactly what the solution below does: for every row in the result, a later record with the same respid must exist.
Sample data
declare @MyTable table
(
respid int,
uploadtime date
);
insert into @MyTable (respid, uploadtime) values
(4, '2014-01-01'),
(4, '2014-06-01'),
(4, '2015-01-01'),
(4, '2015-06-01'),
(4, '2016-01-01'), --> last duplicate of respid=4, not part of result
(5, '2020-01-01'); --> has no duplicate, not part of result
Solution
select mt.respid, mt.uploadtime
from @MyTable mt
where exists ( select top 1 'x'
from @MyTable mt2
where mt2.respid = mt.respid
and mt2.uploadtime > mt.uploadtime );
Result
respid uploadtime
----------- ----------
4 2014-01-01
4 2014-06-01
4 2015-01-01
4 2015-06-01
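One way to check that the ROW_NUMBER and EXISTS answers agree is to run both against the same sample data and compare them with EXCEPT. This is a sketch using a temp table #Uploads (an illustrative name). The two can differ when a respid has two rows sharing the latest uploadtime: ROW_NUMBER then excludes only one of them (chosen arbitrarily), while the EXISTS version excludes both.

```sql
-- #Uploads is a hypothetical stand-in for yourTable / @MyTable.
CREATE TABLE #Uploads (respid int, uploadtime date);

INSERT INTO #Uploads (respid, uploadtime) VALUES
    (4, '2014-01-01'), (4, '2014-06-01'), (4, '2015-01-01'),
    (4, '2015-06-01'), (4, '2016-01-01'), (5, '2020-01-01');

-- Rows returned by the ROW_NUMBER answer
WITH cte AS (
    SELECT respid, uploadtime,
           ROW_NUMBER() OVER (PARTITION BY respid ORDER BY uploadtime DESC) AS rn
    FROM #Uploads
)
SELECT respid, uploadtime INTO #ByRowNumber FROM cte WHERE rn > 1;

-- Rows returned by the EXISTS answer
SELECT mt.respid, mt.uploadtime
INTO #ByExists
FROM #Uploads AS mt
WHERE EXISTS (SELECT 1
              FROM #Uploads AS mt2
              WHERE mt2.respid = mt.respid
                AND mt2.uploadtime > mt.uploadtime);

-- Both differences should be empty for this data
SELECT * FROM #ByRowNumber EXCEPT SELECT * FROM #ByExists;
SELECT * FROM #ByExists EXCEPT SELECT * FROM #ByRowNumber;
```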

SQL Pivot only select rows

I am attempting to pivot a database so that only certain rows become columns. Below is what my table looks like:
ID  QType   CharV     NumV
1   AccNum            10
1   EmpNam  John Inc  0
1   UW      Josh      0
2   AccNum            11
2   EmpNam  CBS       0
2   UW      Dan       0
I would like the table to look like this:
ID AccNum EmpNam
1 10 John Inc
2 11 CBS
I have two main problems I am trying to account for.
1st: the value that I am trying to get isn't always in the same column: AccNum is always in the NumV column, while EmpNam is always in the CharV column.
2nd: I need to find a way to ignore data that I don't want. In this example it would be the row with UW in the QType column.
Below is the code that I have:
SELECT *
FROM testTable
Pivot(
MAX(NumV)
FOR[QType]
In ([AccNum],[TheValue])
)p
But it's giving me the below result:
ID  CharV     AccNum  TheValue
1             10      NULL
2             11      NULL
2   CBS       NULL    NULL
2   Dan       NULL    NULL
1   John Inc  NULL    NULL
1   Josh      NULL    NULL
In this case grouping with conditional aggregation should work. Try something like:
SELECT ID
, MAX(CASE WHEN QType = 'AccNum' THEN NumV END) AS AccNum
, MAX(CASE WHEN QType = 'EmpNam' THEN CharV END) AS EmpNam
FROM testTable
GROUP BY ID
Since each inner CASE only returns a value when its WHEN condition is met, MAX returns that value and ignores the NULLs produced by the other rows. Of course, this only works as long as each QType occurs at most once per ID.
Generally, PIVOT in SQL Server doesn't work in one step when your conditions are complex, especially when you need values from different columns. You could pivot your table in two queries and join those, but it would perform poorly and be less readable than my suggestion.
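A self-contained sketch of the conditional aggregation above, run against the question's sample data in a temp table #testTable (the column types are assumptions; the question doesn't give them). It also hints at why the original PIVOT misbehaved: PIVOT implicitly groups by every source column not named in the aggregate or the FOR clause, so CharV became a grouping column alongside ID. With GROUP BY you list exactly the grouping columns you want, and the UW rows simply never match a CASE branch.

```sql
-- Column types are assumed for illustration.
CREATE TABLE #testTable (ID int, QType varchar(10), CharV varchar(50), NumV int);

INSERT INTO #testTable (ID, QType, CharV, NumV) VALUES
    (1, 'AccNum', NULL, 10), (1, 'EmpNam', 'John Inc', 0), (1, 'UW', 'Josh', 0),
    (2, 'AccNum', NULL, 11), (2, 'EmpNam', 'CBS', 0),      (2, 'UW', 'Dan', 0);

SELECT ID
     , MAX(CASE WHEN QType = 'AccNum' THEN NumV END)  AS AccNum  -- value lives in NumV
     , MAX(CASE WHEN QType = 'EmpNam' THEN CharV END) AS EmpNam  -- value lives in CharV
INTO #Flattened
FROM #testTable
GROUP BY ID;  -- UW rows match neither CASE, so they only contribute NULLs

SELECT * FROM #Flattened ORDER BY ID;
```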

How can I always return Null for a column without updating the column's value in the database?

ID Name tuition num of courses
1 Brandon 4430 6
2 Lisa 2300 3
3 Victoria null 0
4 Jack 3330 4
The type of the tuition column is money, but I need to return null in my select statement without updating the values in the table.
I tried nullif(tuition is not null), but it didn't work.
How can I return results like those in the table below, without updating the table or modifying the data in database?
ID Name tuition num of courses
1 Brandon null 6
2 Lisa null 3
3 Victoria null 0
4 Jack null 4
If you are returning null for every row, just code the column as:
NULL AS Tuition
Example query:
SELECT Id, Name, NULL as Tuition, NumCourses FROM TheTable
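One detail worth knowing: an untyped NULL in the select list gets the type int, not money. That usually doesn't matter, but if the result is stored (for example with SELECT ... INTO) or consumed by code that expects the column's original type, you can CAST the NULL. A sketch, with #Tuition and its column names as hypothetical stand-ins for the question's table:

```sql
-- #Tuition is a hypothetical copy of the question's table.
CREATE TABLE #Tuition (Id int, Name varchar(50), Tuition money NULL, NumCourses int);

INSERT INTO #Tuition (Id, Name, Tuition, NumCourses) VALUES
    (1, 'Brandon', 4430, 6), (2, 'Lisa', 2300, 3),
    (3, 'Victoria', NULL, 0), (4, 'Jack', 3330, 4);

-- The table is only read; the stored tuition values are untouched.
-- CAST keeps the result column typed as money instead of int.
SELECT Id, Name, CAST(NULL AS money) AS Tuition, NumCourses
INTO #TuitionHidden
FROM #Tuition;

SELECT * FROM #TuitionHidden ORDER BY Id;
```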
I created the table and inserted the records as you showed above.
It is a self-join query.
-- Run both queries together to confirm that the underlying table is not updated.
select TT.Id, TT.Name,
nullif(TT.Tuition, BT.Tuition) as Tuition, TT.NOCs
from tblTuition TT
join tblTuition BT
on TT.Id = BT.Id
select * from tblTuition
Whenever you need a column whose value is always null, you can write:
SELECT NULL AS ABC FROM MYTABLE
The statement above adds a column ABC to the select list in which every value is NULL. The same trick works for a default value: for example, to get 1, simply use SELECT 1 AS ABC FROM MYTABLE

T-SQL select rows by oldest date and unique category

I'm using Microsoft SQL Server. I have a table that contains information stored by two different categories and a date. For example:
ID Cat1 Cat2 Date/Time Data
1 1 A 11:00 456
2 1 B 11:01 789
3 1 A 11:01 123
4 2 A 11:05 987
5 2 B 11:06 654
6 1 A 11:06 321
I want to extract one line for each unique combination of Cat1 and Cat2 and I need the line with the oldest date. In the above I want ID = 1, 2, 4, and 5.
Thanks
Have a look at row_number() on MSDN.
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY cat1, cat2 ORDER BY date_time, id) rn
FROM mytable
) q
WHERE rn = 1
Quassnoi's answer is fine, but I'm a bit uncomfortable with how it handles dups (rows with the same Cat1, Cat2 and date). It seems to pick one based on insertion order, and I'm not sure even that can be guaranteed: the result changes depending on whether the duplicate is inserted at the end or at the beginning.
Plus, I like to stay with old-school SQL when I can, so I would do it this way (when several rows tie on the oldest date, it returns all of them):
select t1.*
from my_table t1
left join my_table t2
on t1.cat1 = t2.cat1
and t1.cat2 = t2.cat2
and t1.datetime > t2.datetime
where t2.datetime is null
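A sketch that runs both answers against the question's sample rows so their results can be compared. The table name #Events and the column name date_time are assumptions (the question's header is Date/Time, which would need brackets as a real column name), and the times are stored in a time(0) column.

```sql
-- #Events is a hypothetical copy of the question's table.
CREATE TABLE #Events (ID int, Cat1 int, Cat2 char(1), date_time time(0), Data int);

INSERT INTO #Events (ID, Cat1, Cat2, date_time, Data) VALUES
    (1, 1, 'A', '11:00', 456), (2, 1, 'B', '11:01', 789), (3, 1, 'A', '11:01', 123),
    (4, 2, 'A', '11:05', 987), (5, 2, 'B', '11:06', 654), (6, 1, 'A', '11:06', 321);

-- ROW_NUMBER answer: rn = 1 is the oldest row per (Cat1, Cat2); ID breaks ties.
SELECT ID INTO #OldestRowNumber
FROM (
    SELECT ID, ROW_NUMBER() OVER (PARTITION BY Cat1, Cat2 ORDER BY date_time, ID) AS rn
    FROM #Events
) AS q
WHERE rn = 1;

-- Anti-join answer: keep t1 when no row in the same group is older.
SELECT t1.ID INTO #OldestAntiJoin
FROM #Events AS t1
LEFT JOIN #Events AS t2
    ON t1.Cat1 = t2.Cat1
   AND t1.Cat2 = t2.Cat2
   AND t1.date_time > t2.date_time
WHERE t2.ID IS NULL;

SELECT ID FROM #OldestRowNumber ORDER BY ID;
```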

SQL Server 2008: produce table of unique entries

I have the following problem. I have a table with a few hundred thousand records, which has the following columns (simplified):
MemberID SchemeName BenefitID BenefitAmount
10 ABC 1 10000
10 ABC 1 2000
10 ABC 2 5000
10 A.B.C 3 11000
What I need to do is to convert this into a single record that looks like this:
MemberID SchemeName B1 B2 B3
10 ABC 12000 5000 11000
The problem, of course, is that I need to differentiate by SchemeName. For most records this won't be a problem, but for some the SchemeName wasn't captured consistently. Now, I don't particularly care whether the converted table uses "ABC" or "A.B.C" as the scheme name, as long as it uses only one of them.
I'd love to hear your suggestions.
Thanks
Karl
(Using SQL Server 2008)
Based on the limited info in the original question, give this a try:
DECLARE @YourTable table(MemberID int, SchemeName varchar(10), BenefitID int, BenefitAmount int)
INSERT INTO @YourTable VALUES (10,'ABC' ,1,10000)
INSERT INTO @YourTable VALUES (10,'ABC' ,1,2000)
INSERT INTO @YourTable VALUES (10,'ABC' ,2,5000)
INSERT INTO @YourTable VALUES (10,'A.B.C',3,11000)
INSERT INTO @YourTable VALUES (11,'ABC' ,1,10000)
INSERT INTO @YourTable VALUES (11,'ABC' ,1,2000)
INSERT INTO @YourTable VALUES (11,'ABC' ,2,5000)
INSERT INTO @YourTable VALUES (11,'A.B.C',3,11000)
INSERT INTO @YourTable VALUES (10,'mnp',3,11000)
INSERT INTO @YourTable VALUES (11,'mnp' ,1,10000)
INSERT INTO @YourTable VALUES (11,'mnp' ,1,2000)
INSERT INTO @YourTable VALUES (11,'mnp' ,2,5000)
INSERT INTO @YourTable VALUES (11,'mnp',3,11000)
SELECT
MemberID, REPLACE(SchemeName,'.','') AS SchemeName
,SUM(CASE WHEN BenefitID=1 THEN BenefitAmount ELSE 0 END) AS B1
,SUM(CASE WHEN BenefitID=2 THEN BenefitAmount ELSE 0 END) AS B2
,SUM(CASE WHEN BenefitID=3 THEN BenefitAmount ELSE 0 END) AS B3
FROM @YourTable
GROUP BY MemberID, REPLACE(SchemeName,'.','')
ORDER BY MemberID, REPLACE(SchemeName,'.','')
OUTPUT:
MemberID SchemeName B1 B2 B3
----------- ----------- ----------- ----------- -----------
10 ABC 12000 5000 11000
10 mnp 0 0 11000
11 ABC 12000 5000 11000
11 mnp 12000 5000 11000
(4 row(s) affected)
It looks like PIVOT can help.
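That answer only names PIVOT, so here is a hedged sketch of what it might look like for the question's four sample rows, reusing the REPLACE normalization of SchemeName from the previous answer. The derived table limits the input to the grouping columns, the FOR column and the value column, because PIVOT groups by every other column it can see. #BenefitsP is an illustrative name; ISNULL turns a missing benefit into 0 to match the CASE version's output.

```sql
-- #BenefitsP is a hypothetical stand-in for the real table.
CREATE TABLE #BenefitsP (MemberID int, SchemeName varchar(10), BenefitID int, BenefitAmount int);

INSERT INTO #BenefitsP (MemberID, SchemeName, BenefitID, BenefitAmount) VALUES
    (10, 'ABC', 1, 10000), (10, 'ABC', 1, 2000),
    (10, 'ABC', 2, 5000),  (10, 'A.B.C', 3, 11000);

SELECT MemberID, SchemeName,
       ISNULL([1], 0) AS B1, ISNULL([2], 0) AS B2, ISNULL([3], 0) AS B3
INTO #BenefitPivot
FROM (
    -- Only the columns PIVOT should see; 'A.B.C' and 'ABC' collapse to one name.
    SELECT MemberID, REPLACE(SchemeName, '.', '') AS SchemeName, BenefitID, BenefitAmount
    FROM #BenefitsP
) AS src
PIVOT (SUM(BenefitAmount) FOR BenefitID IN ([1], [2], [3])) AS p;

SELECT * FROM #BenefitPivot ORDER BY MemberID, SchemeName;
```

The IN list has to name every BenefitID up front, so if the set of benefits changes you would need to edit the query or build it dynamically.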
The schemename issue is something that will have to be dealt with manually since the names can be so different. This indicates first and foremost a problem with how you are allowing data entry. You should not have these duplicate schemenames.
However, since you do, I think the best thing is to create a cross-reference table with two columns, something like recordedscheme and controllingscheme. Select the distinct scheme names to build a list of possible schemenames and insert them into the first column. Go through the list and decide which schemename you want to use for each one (most will be the same as the recorded schemename). Once you have done this, you can join to this table in the query. This will work for the current dataset; however, you need to fix whatever is causing the schemename to get duplicated before going further. You will also want to make sure that when a schemename is added, your table is populated with the new schemename in both columns. Then, if it later turns out that a new one is a duplicate, all you have to do is write a quick update to the second column showing which one it really is, and you are done.
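A sketch of the cross-reference approach described above, using the question's sample rows. The table and column names (#BenefitsX, #SchemeXref, RecordedScheme, ControllingScheme) are illustrative; a real version would be a permanent table maintained alongside the data.

```sql
-- Hypothetical names: #BenefitsX stands in for the real table,
-- #SchemeXref for the cross-reference table.
CREATE TABLE #BenefitsX (MemberID int, SchemeName varchar(10), BenefitID int, BenefitAmount int);

INSERT INTO #BenefitsX (MemberID, SchemeName, BenefitID, BenefitAmount) VALUES
    (10, 'ABC', 1, 10000), (10, 'ABC', 1, 2000),
    (10, 'ABC', 2, 5000),  (10, 'A.B.C', 3, 11000);

CREATE TABLE #SchemeXref (
    RecordedScheme    varchar(10) NOT NULL PRIMARY KEY, -- name as stored in the data
    ControllingScheme varchar(10) NOT NULL              -- name to report under
);

-- Seed every recorded name, mapped to itself...
INSERT INTO #SchemeXref (RecordedScheme, ControllingScheme)
SELECT DISTINCT SchemeName, SchemeName FROM #BenefitsX;

-- ...then point the known duplicates at the name you want to keep.
UPDATE #SchemeXref SET ControllingScheme = 'ABC' WHERE RecordedScheme = 'A.B.C';

SELECT b.MemberID, x.ControllingScheme AS SchemeName,
       SUM(CASE WHEN b.BenefitID = 1 THEN b.BenefitAmount ELSE 0 END) AS B1,
       SUM(CASE WHEN b.BenefitID = 2 THEN b.BenefitAmount ELSE 0 END) AS B2,
       SUM(CASE WHEN b.BenefitID = 3 THEN b.BenefitAmount ELSE 0 END) AS B3
INTO #XrefReport
FROM #BenefitsX AS b
JOIN #SchemeXref AS x ON x.RecordedScheme = b.SchemeName
GROUP BY b.MemberID, x.ControllingScheme;

SELECT * FROM #XrefReport ORDER BY MemberID, SchemeName;
```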
The alternative is to actually update the bad schemenames in the data set to the correct one. Depending on how many records you have to update, and in how many tables, this might be a performance issue. This too only helps with querying the data right now and doesn't address how to fix the data going forward.
