Is there a way to include, inside a View, a column from another table whose value is always the same? Example:
SELECT *,
(SELECT value FROM tblStudentPrefixes WHERE PrefixName = 'SeniorPrefix')
AS StudentPrefix
FROM tblStudents
Will the above nested query get executed for each row? Is there a way to execute it once and use the result for all rows?
Please note, I'm specifically talking about a View, not a Stored Procedure. I know this can be done in a Stored Procedure.
This actually depends on your table setup. Unless PrefixName is constrained to be unique, you could come across errors where the subquery returns more than one row. If it is not constrained to be unique but happens to be unique for SeniorPrefix, then the subquery will be executed once per row (10,000 times in the demonstration below). To demonstrate, I have used the following DDL:
CREATE TABLE #tblStudents (ID INT IDENTITY(1, 1), Filler CHAR(100));
INSERT #tblStudents (Filler)
SELECT TOP 10000 NULL
FROM sys.all_objects a, sys.all_objects b;
CREATE TABLE #tblStudentPrefixes (Value VARCHAR(10), PrefixName VARCHAR(20));
INSERT #tblStudentPrefixes (Value, PrefixName) VALUES ('A Value', 'SeniorPrefix');
Running your query gives the following IO output:
Table '#tblStudentPrefixes'. Scan count 10000, logical reads 10000
Table '#tblStudents'. Scan count 1, logical reads 142
The key being the 10,000 logical reads on tblStudentPrefixes. The other problem with it not being constrained to be unique is that if you have duplicates your query will fail with the error:
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
If you can't constrain PrefixName to be unique, then you can stop it executing for each row and avoid the errors by using TOP:
SELECT *,
(SELECT TOP 1 value FROM #tblStudentPrefixes WHERE PrefixName = 'SeniorPrefix' ORDER BY Value)
AS StudentPrefix
FROM #tblStudents
The IO now becomes:
Table '#tblStudentPrefixes'. Scan count 1, logical reads 1
Table '#tblStudents'. Scan count 1, logical reads 142
However, I would still recommend switching to a CROSS JOIN here:
SELECT s.*, p.Value AS StudentPrefix
FROM #tblStudents AS s
CROSS JOIN
( SELECT TOP 1 value
FROM #tblStudentPrefixes
WHERE PrefixName = 'SeniorPrefix'
ORDER BY Value
) AS p;
Inspection of the execution plans shows that the sub-select uses a table spool, which is unnecessary for a single value.
So in summary, whether it executes for each row depends on your table setup, but regardless you are giving the optimiser a better chance if you switch to a cross join.
EDIT
In light of the fact that you need to return rows from tblStudents when there is no match for SeniorPrefix in tblStudentPrefixes, and that PrefixName is not currently constrained to be unique, the best solution is:
SELECT *,
(SELECT MAX(value) FROM #tblStudentPrefixes WHERE PrefixName = 'SeniorPrefix')
AS StudentPrefix
FROM #tblStudents;
If you do constrain it to be unique, then the following 3 queries produce (essentially) the same plan and the same results; which one you use is simply personal preference:
SELECT *,
(SELECT value FROM #tblStudentPrefixes WHERE PrefixName = 'SeniorPrefix')
AS StudentPrefix
FROM #tblStudents;
SELECT s.*, p.Value AS StudentPrefix
FROM #tblStudents AS s
LEFT JOIN #tblStudentPrefixes AS p
ON p.PrefixName = 'SeniorPrefix';
SELECT s.*, p.Value AS StudentPrefix
FROM #tblStudents AS s
OUTER APPLY
( SELECT Value
FROM #tblStudentPrefixes
WHERE PrefixName = 'SeniorPrefix'
) AS p;
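The difference between these forms when no SeniorPrefix row exists can be sketched as follows. This is a minimal illustration that uses SQLite through Python as a stand-in for SQL Server (so LIMIT 1 replaces TOP 1), and the table and column names are assumptions made up for the example. It only shows result shape, not execution plans or IO.

```python
import sqlite3

# SQLite stands in for SQL Server; names below are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY);
    INSERT INTO students (id) VALUES (1), (2), (3);
    CREATE TABLE prefixes (value TEXT, prefix_name TEXT);
    -- Deliberately no 'SeniorPrefix' row.
    INSERT INTO prefixes VALUES ('Jr', 'JuniorPrefix');
""")

# Scalar subquery with MAX: every student row survives, prefix is NULL.
scalar_subquery = conn.execute("""
    SELECT s.id,
           (SELECT MAX(value) FROM prefixes WHERE prefix_name = 'SeniorPrefix')
    FROM students AS s
    ORDER BY s.id
""").fetchall()

# LEFT JOIN on a constant predicate: same shape as the scalar subquery.
left_join = conn.execute("""
    SELECT s.id, p.value
    FROM students AS s
    LEFT JOIN prefixes AS p ON p.prefix_name = 'SeniorPrefix'
    ORDER BY s.id
""").fetchall()

# CROSS JOIN against an empty derived table: all student rows disappear.
cross_join = conn.execute("""
    SELECT s.id, p.value
    FROM students AS s
    CROSS JOIN (SELECT value FROM prefixes
                WHERE prefix_name = 'SeniorPrefix'
                ORDER BY value LIMIT 1) AS p
    ORDER BY s.id
""").fetchall()

print(scalar_subquery)
print(left_join)
print(cross_join)
```

With this data the first two queries return one row per student with a NULL prefix, while the CROSS JOIN returns nothing, which is why the outer forms are preferred when unmatched students must still be returned.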
I hope I understand your question right, but try this
SELECT *
FROM tblStudents
Outer Apply
(
SELECT value
FROM tblStudentPrefixes
WHERE PrefixName = 'SeniorPrefix'
) as tble
This is OK, but the subquery would be executed for every row (which could give bad performance).
You could try also:
SELECT tblStudents.*,StudentPrefix.value
FROM tblStudents,
(SELECT value
FROM tblStudentPrefixes
WHERE PrefixName = 'SeniorPrefix')StudentPrefix
I need to calculate the difference of a column between two lines of a table. Is there any way I can do this directly in SQL? I'm using Microsoft SQL Server 2008.
I'm looking for something like this:
SELECT value - (previous.value) FROM table
Imagine that the "previous" variable references the most recently selected row. Of course, with a select like that I will end up with n-1 rows selected from a table with n rows. That's not a problem; it is actually exactly what I need.
Is that possible in some way?
Use the lag function:
SELECT value - lag(value) OVER (ORDER BY Id) FROM table
Sequences used for Ids can skip values, so Id-1 does not always work.
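A small sketch of LAG on ids that skip values, run on SQLite through Python rather than SQL Server. The readings table and its contents are assumptions invented for the illustration.

```python
import sqlite3

# Illustrative data: ids 3, 4, 6, 7 and 8 are missing on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (id INTEGER PRIMARY KEY, value INTEGER);
    INSERT INTO readings VALUES (1, 10), (2, 15), (5, 12), (9, 30);
""")

# LAG looks at the previous row in the window order, not at id - 1.
lag_diffs = conn.execute("""
    SELECT id, value - LAG(value) OVER (ORDER BY id) AS diff
    FROM readings
    ORDER BY id
""").fetchall()

# The first row has no predecessor, so its difference is NULL (None).
# Dropping it leaves the n-1 rows the question asks for.
n_minus_one = [row for row in lag_diffs if row[1] is not None]

print(lag_diffs)
print(n_minus_one)
```

The gaps in the ids make no difference here, because the pairing is driven by the ORDER BY in the window rather than by arithmetic on the key.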
SQL has no built in notion of order, so you need to order by some column for this to be meaningful. Something like this:
select t1.value - t2.value from table t1, table t2
where t1.primaryKey = t2.primaryKey - 1
If you know how to order things but not how to get the previous value given the current one (EG, you want to order alphabetically) then I don't know of a way to do that in standard SQL, but most SQL implementations will have extensions to do it.
Here is a way for SQL server that works if you can order rows such that each one is distinct:
select rank() OVER (ORDER BY id) as 'Rank', value into temp1 from t
select t1.value - t2.value from temp1 t1, temp1 t2
where t1.Rank = t2.Rank - 1
drop table temp1
If you need to break ties, you can add as many columns as necessary to the ORDER BY.
WITH CTE AS (
SELECT
rownum = ROW_NUMBER() OVER (ORDER BY columns_to_order_by),
value
FROM table
)
SELECT
cur.value - prev.value
FROM CTE cur
INNER JOIN CTE prev on prev.rownum = cur.rownum - 1
Oracle, PostgreSQL, SQL Server and many more RDBMS engines have analytic functions called LAG and LEAD that do this very thing.
In SQL Server prior to 2012 you'd need to do the following:
SELECT value - (
SELECT TOP 1 value
FROM mytable m2
WHERE m2.col1 < m1.col1 OR (m2.col1 = m1.col1 AND m2.pk < m1.pk)
ORDER BY
col1, pk
)
FROM mytable m1
ORDER BY
col1, pk
where col1 is the column you are ordering by.
Having an index on (col1, pk) will greatly improve this query.
LEFT JOIN the table to itself, with the join condition worked out so the row matched in the joined version of the table is one row previous, for your particular definition of "previous".
Update: At first I was thinking you would want to keep all rows, with NULLs where there was no previous row. Reading it again, you just want those rows culled, so you should use an inner join rather than a left join.
Update:
Newer versions of Sql Server also have the LAG and LEAD Windowing functions that can be used for this, too.
select t2.col from (
select col,MAX(ID) id from
(
select ROW_NUMBER() over(PARTITION by col order by col) id ,col from testtab t1) as t1
group by col) as t2
The selected answer will only work if there are no gaps in the sequence. However if you are using an autogenerated id, there are likely to be gaps in the sequence due to inserts that were rolled back.
This method should work if you have gaps:
declare @temp table (value int, primaryKey int, tempid int identity)
insert into @temp (value, primaryKey)
select value, primaryKey from mytable order by primaryKey
select t1.value - t2.value from @temp t1
join @temp t2
on t1.tempid = t2.tempid - 1
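To make the effect of gaps concrete, here is a hedged sketch comparing a join on id - 1 with a join on a gap-free row number. It runs on SQLite through Python, uses ROW_NUMBER in a CTE instead of an identity column in a table variable, and the readings table is an assumption made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (id INTEGER PRIMARY KEY, value INTEGER);
    INSERT INTO readings VALUES (1, 10), (2, 15), (5, 12), (9, 30);
""")

# Joining on id - 1 silently drops every row whose predecessor id is missing.
naive = conn.execute("""
    SELECT cur.id, cur.value - prev.value
    FROM readings AS cur
    JOIN readings AS prev ON prev.id = cur.id - 1
    ORDER BY cur.id
""").fetchall()

# Renumbering the rows first gives consecutive numbers to join on.
numbered = conn.execute("""
    WITH ordered AS (
        SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, id, value
        FROM readings
    )
    SELECT cur.id, cur.value - prev.value
    FROM ordered AS cur
    JOIN ordered AS prev ON prev.rn = cur.rn - 1
    ORDER BY cur.id
""").fetchall()

print(naive)
print(numbered)
```

Only one pair survives the naive join, while the renumbered join yields a difference for every row except the first.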
Another way to refer to the previous row in an SQL query is to use a recursive common table expression (CTE):
CREATE TABLE t (counter INTEGER);
INSERT INTO t VALUES (1),(2),(3),(4),(5);
WITH cte(counter, previous, difference) AS (
-- Anchor query
SELECT MIN(counter), 0, MIN(counter)
FROM t
UNION ALL
-- Recursive query
SELECT t.counter, cte.counter, t.counter - cte.counter
FROM t JOIN cte ON cte.counter = t.counter - 1
)
SELECT counter, previous, difference
FROM cte
ORDER BY counter;
Result:
counter  previous  difference
1        0         1
2        1         1
3        2         1
4        3         1
5        4         1
The anchor query generates the first row of the common table expression cte where it sets cte.counter to column t.counter in the first row of table t, cte.previous to 0, and cte.difference to the first row of t.counter.
The recursive query joins each row of common table expression cte to the previous row of table t. In the recursive query, cte.counter refers to t.counter in each row of table t, cte.previous refers to cte.counter in the previous row of cte, and t.counter - cte.counter refers to the difference between these two columns.
Note that a recursive CTE is more flexible than the LAG and LEAD functions because a row can refer to any arbitrary result of a previous row. (A recursive function or process is one where the input of the process is the output of the previous iteration of that process, except the first input which is a constant.)
I tested this query at SQLite Online.
You can use the following function to get the current row value and the previous row value:
SELECT value,
min(value) over (order by id rows between 1 preceding and 1
preceding) as value_prev
FROM table
Then you can just select value - value_prev from that select and get your answer
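As a quick check of the frame trick, the sketch below compares it with LAG on SQLite through Python. The readings table is an illustrative assumption, and the comparison says nothing about how SQL Server plans either form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (id INTEGER PRIMARY KEY, value INTEGER);
    INSERT INTO readings VALUES (1, 10), (2, 15), (5, 12), (9, 30);
""")

# A frame of exactly one row (the preceding one) makes MIN return that
# row's value; for the first row the frame is empty and MIN yields NULL.
rows = conn.execute("""
    SELECT id,
           value,
           MIN(value) OVER (ORDER BY id
                            ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS value_prev,
           LAG(value) OVER (ORDER BY id) AS value_lag
    FROM readings
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

On this data the value_prev and value_lag columns agree on every row, so the single-row frame behaves like LAG.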
I have a table with some names in a row. For each row I want to generate a random name. I wrote the following query to:
BEGIN transaction t1
Create table TestingName
(NameID int,
FirstName varchar(100),
LastName varchar(100)
)
INSERT INTO TestingName
SELECT 0,'SpongeBob','SquarePants'
UNION
SELECT 1, 'Bugs', 'Bunny'
UNION
SELECT 2, 'Homer', 'Simpson'
UNION
SELECT 3, 'Mickey', 'Mouse'
UNION
SELECT 4, 'Fred', 'Flintstone'
SELECT FirstName from TestingName
WHERE NameID = ABS(CHECKSUM(NEWID())) % 5
ROLLBACK Transaction t1
The problem is the "ABS(CHECKSUM(NEWID())) % 5" portion of this query sometime returns more than 1 row and sometimes returns 0 rows. I must be missing something but I can't see it.
If I change the query to
DECLARE @n int
set @n= ABS(CHECKSUM(NEWID())) % 5
SELECT FirstName from TestingName
WHERE NameID = @n
Then everything works and I get a random number per row.
If you take the query above and paste it into SQL management studio and run the first query a bunch of times you will see what I am attempting to describe.
The final update query will look like
Update TableWithABunchOfNames
set [FName] = (SELECT FirstName from TestingName
WHERE NameID = ABS(CHECKSUM(NEWID())) % 5)
This does not work because sometimes I get more than 1 row and sometimes I get no rows.
What am I missing?
The problem is that you are getting a different random value for each row. The query is probably doing a full table scan, and the WHERE clause is evaluated for each row, generating a different random number each time.
So, you might get a sequence of random numbers where none of the ids match. Or a sequence where more than one matches. On average, you'll have one match, but you don't want "on average", you want a guarantee.
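The effect can be imitated outside the database. The following is a plain Python simulation, not SQL Server behaviour itself: the two helper functions are hypothetical and simply model a predicate that draws a fresh random number per row versus one that draws once and reuses the value.

```python
import random

def matches_per_row(rng, n_rows=5):
    """Model WHERE NameID = <random>, with a new draw for each row."""
    return sum(1 for name_id in range(n_rows)
               if name_id == rng.randrange(n_rows))

def matches_single_draw(rng, n_rows=5):
    """Model drawing once into a variable, then WHERE NameID = <variable>."""
    n = rng.randrange(n_rows)
    return sum(1 for name_id in range(n_rows) if name_id == n)

rng = random.Random(0)  # fixed seed so repeated runs behave the same
per_row = [matches_per_row(rng) for _ in range(1000)]
single = [matches_single_draw(rng) for _ in range(1000)]

# Distinct match counts seen across the trials for each strategy.
print(sorted(set(per_row)))
print(sorted(set(single)))
```

The per-row strategy produces trials with zero matches and trials with several, while the single-draw strategy always matches exactly one row.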
This is when you want rand(), which produces only one random number per query:
SELECT FirstName
from TestingName
WHERE NameID = floor(rand() * 5);
This should get you one value.
Why not use top 1?
Select top 1 firstName
From testingName
Order by newId()
This worked for me:
WITH
CTE
AS
(
SELECT
ID
,FName
,CAST(5 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) AS int) AS rr
FROM
dbo.TableWithABunchOfNames
)
,CTE_ForUpdate
AS
(
SELECT
CTE.ID
, CTE.FName
, dbo.TestingName.FirstName AS RandomName
FROM
CTE
LEFT JOIN dbo.TestingName ON dbo.TestingName.NameID = CTE.rr
)
UPDATE CTE_ForUpdate
SET FName = RandomName
;
This solution depends on how smart the optimizer is.
For example, if I use INNER JOIN instead of LEFT JOIN (which is the correct choice for this query), the optimizer would move the calculation of random numbers outside the join loop and the end result would not be what we expect.
I created a table TestingName with 5 rows as in the question and a table TableWithABunchOfNames with 100 rows.
Here is the execution plan with LEFT JOIN. You can see the Compute scalar that calculates random numbers is done before the join loop. You can see that 100 rows were updated:
Here is the execution plan with INNER JOIN. You can see the Compute scalar that calculates random numbers is done after the join loop and with extra filter. This query may update not all rows in TableWithABunchOfNames and some rows in TableWithABunchOfNames may be updated several times. You can see that Filter left 102 rows and Stream aggregate left only 69 rows. It means that only 69 rows were eventually updated and also there were multiple matches for some rows (102 - 69 = 33).
To guarantee that the result is what you expect you should generate random number for each row in TableWithABunchOfNames and explicitly remember the result, i.e. materialize the CTE shown above. Then use this temporary result to join with the table TestingName.
You can add a column to TableWithABunchOfNames to store generated random numbers or save CTE to a temp table or table variable.
I have the following query:
Select PH.SubId
From dbo.PanelHistory PH
Where
PH.Scribe2Time <> (Select FIRST_VALUE(ReadTimeLocal) OVER (Order By ReadTimeLocal) From dbo.PanelWorkflow Where ProcessNumber = 2690 And dbo.PanelWorkflow.SubId = PH.SubId)
I'm getting an error (512) that says: Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
How can the subquery return more than a single value? There can only be one first value. I must be overlooking something with this query.
By the way, I realize I could easily use Min() instead of First_Value, but I wanted to experiment with some of these Windowing functions.
How many rows do you see?
SELECT FIRST_VALUE(name) OVER (ORDER BY create_date) AS RN
FROM sys.objects
Even though there is only one distinct first value it still returns it for every row in the query.
So if the sub query itself matches multiple rows you will get this error. You could get rid of it with DISTINCT or TOP 1.
Probably not very efficient but you say this is just for experimental purposes.
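A minimal sketch of that behaviour, run on SQLite through Python instead of SQL Server. The workflow table and its columns are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE workflow (sub_id INTEGER, read_time TEXT);
    INSERT INTO workflow VALUES
        (1, '2013-01-03'), (1, '2013-01-01'), (1, '2013-01-02');
""")

# A window function does not collapse rows: one output row per input row.
windowed = conn.execute("""
    SELECT FIRST_VALUE(read_time) OVER (ORDER BY read_time)
    FROM workflow WHERE sub_id = 1
""").fetchall()

# DISTINCT removes the repeats, leaving a single value.
distinct = conn.execute("""
    SELECT DISTINCT FIRST_VALUE(read_time) OVER (ORDER BY read_time)
    FROM workflow WHERE sub_id = 1
""").fetchall()

# A plain aggregate collapses the rows by itself.
aggregated = conn.execute("""
    SELECT MIN(read_time) FROM workflow WHERE sub_id = 1
""").fetchall()

print(len(windowed), windowed)
print(distinct)
print(aggregated)
```

The windowed query returns the same earliest value three times, which is exactly the shape that triggers the "more than 1 value" error when used as a scalar subquery.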
This isn't an answer. It's just an extended comment generated by the following conclusion:
I could easily use Min() instead of First_Value, but I
wanted to experiment with some of these Windowing functions.
Min can't be used instead of FIRST_VALUE.
Example:
SET NOCOUNT ON;
DECLARE @MyTable TABLE(ID INT, TranDate DATETIME)
INSERT @MyTable VALUES (1, '2012-02-02'), (2, '2011-01-01'), (3, '2013-03-03')
SELECT MIN(ID) AS MIN_ID FROM @MyTable
SELECT ID, MIN(ID) OVER(ORDER BY TranDate) AS MIN_ID_ORDER_BY FROM @MyTable;
SELECT ID, FIRST_VALUE(ID) OVER(ORDER BY TranDate) AS FIRST_VALUE_ID_ORDER_BY FROM @MyTable;
Results:
MIN_ID
-----------
1
ID MIN_ID_ORDER_BY
----------- ---------------
2 2
1 1
3 1
ID FIRST_VALUE_ID_ORDER_BY
----------- -----------------------
2 2
1 2
3 2
FIRST_VALUE() will still return a row for every record that meets your WHERE clause. TOP 1 should work:
Select PH.SubId
From dbo.PanelHistory PH
Where
PH.Scribe2Time <> (Select TOP 1 ReadTimeLocal
From dbo.PanelWorkflow
Where ProcessNumber = 2690
And dbo.PanelWorkflow.SubId = PH.SubId
Order By ReadTimeLocal)
or MIN:
Select PH.SubId
From dbo.PanelHistory PH
Where
PH.Scribe2Time <> (Select MIN(ReadTimeLocal)
From dbo.PanelWorkflow
Where ProcessNumber = 2690
And dbo.PanelWorkflow.SubId = PH.SubId)
The PARTITION/OVER functions are look-ahead column functions. They aren't row functions; by that I mean they don't affect an entire row, the number of rows returned, etc. An OVER aggregate can depend on values in other rows, but the tangible result is only to calculate a single column in the current row.
You may have seen something similar to what you are trying to do via an OVER ROW_NUMBER ranking function. Multiple rows are still returned, but only one of them has a ROW_NUMBER of 1. The rest are filtered in an encapsulating WHERE or JOIN predicate.
A few minutes ago I asked here how to get parent records with a recursive CTE.
This works now, but I get the wrong order (backwards, ordered by the PK idData) when I create a table-valued function which returns all parents. I cannot order directly because I need the logical order provided by the CTE.
This gives the correct order(from next parent to that parent and so on):
declare @fiData int;
set @fiData=16177344;
WITH PreviousClaims(idData,fiData)
AS(
SELECT parent.idData,parent.fiData
FROM tabData parent
WHERE parent.idData = @fiData
UNION ALL
SELECT child.idData,child.fiData
FROM tabData child
INNER JOIN PreviousClaims parent ON parent.fiData = child.idData
)
select iddata from PreviousClaims
But the following function returns all records in backwards order(ordered by PK):
CREATE FUNCTION [dbo].[_previousClaimsByFiData] (
@fiData INT
)
RETURNS @retPreviousClaims TABLE
(
idData int PRIMARY KEY NOT NULL
)
AS
BEGIN
DECLARE @idData int;
WITH PreviousClaims(idData,fiData)
AS(
SELECT parent.idData,parent.fiData
FROM tabData parent
WHERE parent.idData = @fiData
UNION ALL
SELECT child.idData,child.fiData
FROM tabData child
INNER JOIN PreviousClaims parent ON parent.fiData = child.idData
)
INSERT INTO @retPreviousClaims
SELECT idData FROM PreviousClaims;
RETURN;
END;
select * from dbo._previousClaimsByFiData(16177344);
UPDATE:
Since everybody believes that the CTE does no ordering (any "ordering" will be totally arbitrary and coincidental), I'm wondering why the opposite seems to be true. I have queried a child claim with many parents, and the order in the CTE is exactly the logical order when I go from child to parent and so on. This would mean that the CTE iterates from record to record like a cursor and the following select returns the rows in exactly this order. But when I call the TVF I get the order of the primary key idData instead.
The solution was simple. I only needed to remove the primary key from the return table of the TVF. So change...
RETURNS @retPreviousClaims TABLE
(
idData int PRIMARY KEY NOT NULL
)
to...
RETURNS @retPreviousClaims TABLE
(
idData int
)
.. and it keeps the right "order" (same order they were inserted into the CTE's temporary result set).
UPDATE2:
Because Damien mentioned that the "CTE-Order" could change in certain circumstances, i will add a new column relationLevel to the CTE which describes the level of relationship of the parent records (what is by the way quite useful in general f.e. for a ssas cube).
So the final Inline-TVF(which returns all columns) is now:
CREATE FUNCTION [dbo].[_previousClaimsByFiData] (
@fiData INT
)
RETURNS TABLE AS
RETURN(
WITH PreviousClaims
AS(
SELECT 1 AS relationLevel, child.*
FROM tabData child
WHERE child.idData = @fiData
UNION ALL
SELECT relationLevel+1, child.*
FROM tabData child
INNER JOIN PreviousClaims parent ON parent.fiData = child.idData
)
SELECT TOP 100 PERCENT * FROM PreviousClaims order by relationLevel
)
This is an exemplary relationship:
select idData,fiData,relationLevel from dbo._previousClaimsByFiData(46600314);
Thank you.
The correct way to do your ORDERing is to add an ORDER BY clause to your outermost select. Anything else is relying on implementation details that may change at any time (including if the size of your database/tables goes up, which may allow more parallel processing to occur).
If you need something convenient to allow the ordering to take place, look at Example D in the examples from the MSDN page on WITH:
WITH DirectReports(ManagerID, EmployeeID, Title, EmployeeLevel) AS
(
SELECT ManagerID, EmployeeID, Title, 0 AS EmployeeLevel
FROM dbo.MyEmployees
WHERE ManagerID IS NULL
UNION ALL
SELECT e.ManagerID, e.EmployeeID, e.Title, EmployeeLevel + 1
FROM dbo.MyEmployees AS e
INNER JOIN DirectReports AS d
ON e.ManagerID = d.EmployeeID
)
Add something similar to the EmployeeLevel column to your CTE, order by it in the outermost select, and everything should work.
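The level-column idea can be sketched on a tiny parent chain. This runs on SQLite through Python, so the CTE needs the RECURSIVE keyword that SQL Server omits, and the tab_data contents and the snake_case names are assumptions invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab_data (id_data INTEGER PRIMARY KEY, fi_data INTEGER);
    -- fi_data points at the parent record; the ids are deliberately
    -- not in the same order as the parent chain 40 -> 10 -> 30 -> 20.
    INSERT INTO tab_data VALUES (40, 10), (10, 30), (30, 20), (20, NULL);
""")

# Carry an explicit level through the recursion and sort on it in the
# outermost SELECT instead of relying on the order rows happen to arrive in.
chain = conn.execute("""
    WITH RECURSIVE previous_claims(relation_level, id_data, fi_data) AS (
        SELECT 1, id_data, fi_data
        FROM tab_data
        WHERE id_data = ?
        UNION ALL
        SELECT parent.relation_level + 1, child.id_data, child.fi_data
        FROM tab_data AS child
        JOIN previous_claims AS parent ON parent.fi_data = child.id_data
    )
    SELECT id_data, relation_level
    FROM previous_claims
    ORDER BY relation_level
""", (40,)).fetchall()

print(chain)
```

Sorting by the primary key would give 10, 20, 30, 40 here, whereas sorting by the level column returns the chain in child-to-parent order regardless of how the rows are stored.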
I think the impression that the CTE is creating an ordering is wrong. It's a coincidence that the rows are coming out in order (possibly due to how they were originally inserted into tabData). Regardless, the TVF is returning a table so you have to explicitly add an ORDER BY to the SELECT you're using to call it if you want to guarantee ordering:
select * from dbo._previousClaimsByFiData(16177344) order by idData
There is no ORDER BY anywhere in sight - neither in the table-valued function, nor in the SELECT from that TVF.
Any "ordering" will be totally arbitrary and coincidental.
If you want a specific order, you need to specify an ORDER BY.
So why can't you just add an ORDER BY to your SELECT:
SELECT * FROM dbo._previousClaimsByFiData(16177344)
ORDER BY (whatever you want to order by)....
or put your ORDER BY into the TVF:
INSERT INTO @retPreviousClaims
SELECT idData FROM PreviousClaims
ORDER BY idData DESC (or whatever it is you want to order by...)
I'm trying to select records with a statement
SELECT *
FROM A
WHERE
LEFT(B, 5) IN
(SELECT * FROM
(SELECT LEFT(A.B,5), COUNT(DISTINCT A.C) c_count
FROM A
GROUP BY LEFT(B,5)
) p1
WHERE p1.c_count = 1
)
AND C IN
(SELECT * FROM
(SELECT A.C , COUNT(DISTINCT LEFT(A.B,5)) b_count
FROM A
GROUP BY C
) p2
WHERE p2.b_count = 1)
which takes a long time to run ~15 sec.
Is there a better way of writing this SQL?
If you would like to represent Set Difference (A-B) in SQL, here is solution for you.
Let's say you have two tables A and B, and you want to retrieve all records that exist only in A but not in B, where A and B have a relationship via an attribute named ID.
An efficient query for this is:
# (A-B)
SELECT DISTINCT A.* FROM (A LEFT OUTER JOIN B on A.ID=B.ID) WHERE B.ID IS NULL
-from Jayaram Timsina's blog.
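For illustration, the anti-join can be checked against an equivalent NOT EXISTS form on a few rows. This sketch uses SQLite through Python, and the lowercase tables a and b with their contents are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, label TEXT);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO b VALUES (2), (4);
""")

# LEFT JOIN keeps every row of a; rows without a partner in b carry a
# NULL b.id, and the WHERE clause keeps only those.
anti_join = conn.execute("""
    SELECT DISTINCT a.*
    FROM a LEFT OUTER JOIN b ON a.id = b.id
    WHERE b.id IS NULL
    ORDER BY a.id
""").fetchall()

# The same set expressed with NOT EXISTS.
not_exists = conn.execute("""
    SELECT a.*
    FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
    ORDER BY a.id
""").fetchall()

print(anti_join)
print(not_exists)
```

Both queries return the rows with id 1 and 3, the ones present in a but absent from b.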
You don't need to return data from the nested subqueries. I'm not sure this will make a difference without indexing, but it's easier to read.
And EXISTS/JOIN is probably nicer, IMHO, than using IN.
SELECT *
FROM
A
JOIN
(SELECT LEFT(B,5) AS b1
FROM A
GROUP BY LEFT(B,5)
HAVING COUNT(DISTINCT C) = 1
) t1 On LEFT(A.B, 5) = t1.b1
JOIN
(SELECT C AS C1
FROM A
GROUP BY C
HAVING COUNT(DISTINCT LEFT(B,5)) = 1
) t2 ON A.C = t2.c1
But you'll need a computed column as marc_s said at least
And 2 indexes: one on (computed, C) and another on (C, computed)
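A small sanity check that the JOIN plus HAVING rewrite selects the same rows as an IN-based filter. It runs on SQLite through Python, so substr(b, 1, 5) stands in for LEFT(B, 5), and the sample rows are assumptions invented for the example. It says nothing about timing, which depends on the indexes discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (b TEXT, c INTEGER);
    INSERT INTO a VALUES
        ('AAAAA-1', 1), ('AAAAA-2', 1),  -- prefix and c map one-to-one
        ('BBBBB-1', 2), ('BBBBB-2', 3),  -- one prefix, two c values
        ('CCCCC-1', 4), ('DDDDD-1', 4);  -- one c value, two prefixes
""")

in_form = conn.execute("""
    SELECT b, c FROM a
    WHERE substr(b, 1, 5) IN (SELECT substr(b, 1, 5) FROM a
                              GROUP BY substr(b, 1, 5)
                              HAVING COUNT(DISTINCT c) = 1)
      AND c IN (SELECT c FROM a
                GROUP BY c
                HAVING COUNT(DISTINCT substr(b, 1, 5)) = 1)
    ORDER BY b
""").fetchall()

join_form = conn.execute("""
    SELECT a.b, a.c
    FROM a
    JOIN (SELECT substr(b, 1, 5) AS b1 FROM a
          GROUP BY substr(b, 1, 5)
          HAVING COUNT(DISTINCT c) = 1) AS t1
      ON substr(a.b, 1, 5) = t1.b1
    JOIN (SELECT c AS c1 FROM a
          GROUP BY c
          HAVING COUNT(DISTINCT substr(b, 1, 5)) = 1) AS t2
      ON a.c = t2.c1
    ORDER BY a.b
""").fetchall()

print(in_form)
print(join_form)
```

Only the two AAAAA rows pass both conditions, and both query shapes agree on that.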
Well, not sure what you're really trying to do here - but obviously, that LEFT(B, 5) expression keeps popping up. Since you're using a function, you're giving up any chance to use an index.
What you could do in your SQL Server table is to create a computed, persisted column for that expression, and then put an index on that:
ALTER TABLE A
ADD LeftB5 AS LEFT(B, 5) PERSISTED
CREATE NONCLUSTERED INDEX IX_LeftB5 ON dbo.A(LeftB5)
Now use the new computed column LeftB5 instead of LEFT(B, 5) anywhere in your query - that should help to speed up certain lookups and GROUP BY operations.
Also - you have a GROUP BY C in there - is that column C indexed?
If you are looking for just the set difference between table1 and table2, the query below is a simple way to get the rows that are in table1 but not in table2, assuming both tables are instances of the same schema with column names columnone, columntwo, ...
with
col1 as (
select columnone from table2
),
col2 as (
select columntwo from table2
)
...
select * from table1
where (
columnone not in (select columnone from col1)
and columntwo not in (select columntwo from col2)
...
);