Can "constant" lookups be done efficiently within a single query? - sql-server

Pop quiz, SQL Server hotshots:
How many times will the following student subquery be executed? (assuming there are at least ten rows in something):
SELECT TOP 10 a, b
, (SELECT type_id
FROM type
WHERE type_code = 'student') student
FROM something
If you said 1, then like me, you assume SQL Server would recognize the value of student as an invariant scalar.
Unfortunately, the answer is 10:
I know, I'll use a CTE!
WITH codes (student) AS (
SELECT (SELECT type_id
FROM type
WHERE type_code = 'student')
)
SELECT TOP 10 a, b
, student
FROM something
CROSS JOIN codes
The result is exactly the same.
Of course, I can get the desired efficiency by first capturing the scalar to a variable:
DECLARE #Student tinyint
SELECT #Student = type_id
FROM type
WHERE type_code = 'student'
SELECT TOP 10 a, b
, #Student student
FROM something
This only does one seek, and adds nothing to the main query plan:
But besides being more verbose, if you're defining an inline table-valued function, it means you also have to write out an otherwise implicit return schema, which is a pain (and adds a vector for errors).
Is there any way to write a single query that only runs the subquery once?

For this query:
SELECT TOP 10 a, b,
(SELECT type_id FROM type WHERE type_code = 'student'
) as student
FROM something;
You want an index on type(type_code, type_id).
You might find this more efficient if you move the subquery to the FROM clause:
SELECT TOP 10 a, b,
t.type_id
FROM something s CROSS JOIN
(SELECT type_id FROM type WHERE type_code = 'student'
) t
Or even:
SELECT TOP 10 s.a, s.b, t.type_id
FROM something s JOIN
type t
ON t.type_code = 'student';

Related

Rows to columns without PIVOT in SQL Server

I have a 3 tables from which contain this data:
Table 1:
Table 2:
Table 3:
Output:
I have tried using Pivot but it has to have an aggregate function in it.
SELECT
project_code, project_name, fk_prj_project_id,
[A], [B], [C], [D]
FROM
(SELECT
project_code, project_name, employee_name,
fk_prj_project_id, fk_prj_project_id AS nm,
activity_details
FROM
PRJ_MST_PROJECT AS a
LEFT JOIN
PRJ_TNS_DAILY_SUMMARY AS b ON a.pk_prj_project_id = b.fk_prj_project_id
LEFT JOIN
HRM_EMP_MST_EMPLOYEE AS c ON b.fk_hrm_emp_employee_id = c.pk_hrm_emp_employee_id
WHERE
a.project_status = 0
AND b.transaction_status = 1
AND CONVERT(date, b.transaction_date, 103) = CONVERT(date, '15/04/2021', 103)) x
PIVOT
(MAX(nm)
FOR nm IN ([A], [B], [C], [D])
) p
The problem is you set your PIVOT to look for values of nm in A, B, C, and D, but nm is an alias for fk_prj_project_id, which has possible values of 1, 2, 3, 4, and 5. So there are no A, B, C, or D values to be had. I don't even see a name for the column that holds A, B, C, and D, but whatever column that is needs to be what you put in the "FOR ___ IN" section of your pivot.
Test your query by commenting out the reference to the pivot columns in the SELECT and comment out the word PIVOT and everything after it and re-run your query. You should see some column with values A, B, C, D. If you don't, fix your query so you do. Once you do, that column is what you PIVOT on (put it between FOR and IN in the pivot block).
Oh, and if you provide data in a usable format people might run your query and give you directly usable results, it's a lot to ask to have people enter your data to get to help you so meet them half way. A link to sqlfiddle is ideal, but even just a bunch of DECLARE #T1 and INSERT INTO T1 VALUES statements is usually enough to get significantly better help.
EDIT:
Nice job with the Fiddle!
OK, so using your data, we can test out actual queries. For PIVOT to work, we need a column to look up (employee name), a column to aggregate (activity_details), and some columns that will be constant across the rows produced (the project's name and ID). You're working with text not numbers, so your aggregation can't be mathematical, leaving you with pretty much just MAX or MIN. To make sure you get the right (newest) one, I first built a table of comments and numbered them by how new they were, then I picked just the newest comment for each (project, user) pair. cteCommentNewest is the result of that.
Now with a clean (and verified) table to pivot, the actual pivot syntax is simple. Well, as simple as Pivot can be, it's inherently pretty confusing IMHO, but structuring it this way keeps the actual PIVOT as clean as possible.
Note that the query is in twice, I tested it as a static query before converting it to dynamic because it's much easier to troubleshoot a static query, then I left it in in case you want to experiment with it. You don't need it for the final solution to work.
Here's the final code, fully tested and producing the specified output:
DECLARE #cols3 AS NVARCHAR(MAX)
DECLARE #query3 AS NVARCHAR(MAX)=''
DECLARE #dt varchar(100)='14/04/2021'
select #cols3 = STUFF((SELECT ',' + QUOTENAME(employee_name)
from dbo.HRM_EMP_MST_EMPLOYEE
order by employee_name
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
--SELECT #cols3 --Test column list for dynamic query
--Test the core functions of pivot before making dynamic
;with cteCommentsAll as (
SELECT P.project_code , P.project_name, C.activity_details , E.employee_name
, ROW_NUMBER () over (PARTITION BY P.project_code , E.employee_name ORDER BY C.transaction_date DESC) as Newness
FROM dbo.PRJ_MST_PROJECT as P --Projects
LEFT OUTER JOIN dbo.PRJ_TNS_DAILY_SUMMARY as C --Comments on projects
ON P.pk_prj_project_id = C.fk_prj_project_id --Get all projects, then all comments for each project
LEFT OUTER JOIN dbo.HRM_EMP_MST_EMPLOYEE as E --Employees who commented
on E.pk_hrm_emp_employee_id = C.fk_hrm_emp_employee_id
), cteCommentsNewest as (
SELECT project_code , project_name, activity_details , employee_name
FROM cteCommentsAll WHERE Newness = 1 --Only one comment per user per project of CROSS problems
)
SELECT *
FROM cteCommentsNewest as N --TEST up to this point to see the raw table
PIVOT (MAX(activity_details) FOR employee_name IN (A, B, C) ) as P
--Put the working query, modified for dynamic columns, into a variable
set #query3 = N'
;with cteCommentsAll as (
SELECT P.project_code , P.project_name, C.activity_details , E.employee_name
, ROW_NUMBER () over (PARTITION BY P.project_code , E.employee_name ORDER BY C.transaction_date DESC) as Newness
FROM dbo.PRJ_MST_PROJECT as P --Projects
LEFT OUTER JOIN dbo.PRJ_TNS_DAILY_SUMMARY as C --Comments on projects
ON P.pk_prj_project_id = C.fk_prj_project_id --Get all projects, then all comments for each project
LEFT OUTER JOIN dbo.HRM_EMP_MST_EMPLOYEE as E --Employees who commented
on E.pk_hrm_emp_employee_id = C.fk_hrm_emp_employee_id
), cteCommentsNewest as (
SELECT project_code , project_name, activity_details , employee_name
FROM cteCommentsAll WHERE Newness = 1 --Only one comment per user per project of CROSS problems
)SELECT *
FROM cteCommentsNewest as N
PIVOT (MAX(activity_details) FOR employee_name IN (' + #cols3 + ') ) as P
'
exec sp_executesql #query3
which produces the following output
project_code
project_name
A
B
C
MOA20171
Project A
some remark By Employee A on 14
NULL
some remark By Employee C on 14
MOA20172
Project B
NULL
NULL
some remark By Employee C on 15
MOA20173
Project C
NULL
NULL
NULL

SQL Server - WHERE <several columns> in (<list of columns values>)

I have done this long ago in other DBMSs (Oracle or MySQL... don't really remember) and I'm looking for the way to do this in SQL Server, if possible at all.
Suppose you have a table with several columns, say A, B, C, ... M. I wish to phrase a select from this table where columns A, B, and C display specific sets of value or, in other words, a list of values combinations.
For instance, I wish to retrieve all the records that match any of the following combinations:
A B C
1 'Apples' '2016-04-12'
56 'Cars' '2014-02-11'
....
Since the list of possible combinations may be quite long (including the option of an inner SELECT), it would not be practical to use something like:
WHERE ( A = 1 AND B = 'Apples' and C = '2016-04-12' ) OR
( A = 56 AND B = 'Cars' and C = '2014-02-11' ) OR
...
As stated, I did use this type of construct in the past and it was something like:
SELECT *
FROM MyTable
WHERE (A,B,C) IN (SELECT A,B,C FROM MYOtherTable) ;
[Most likely this syntax is wrong but it shows what I'm looking for]
Also, I would rather avoid Dynamic SQL usage.
So, the questions would be:
Is this doable in SQL Server?
If the answer is YES, how should the SELECT be phrased?
Thanks in advance.
You can use JOIN
SELECT m1.*
FROM MyTable m1
JOIN MYOtherTable m2
ON m1.A = m2.A
AND m1.B = m2.B
AND m1.C = m2.C
or Exists
SELECT m1.*
FROM MyTable m1
WHERE EXISTS (SELECT 1
FROM MYOtherTable m2
WHERE m1.A = m2.A
AND m1.B = m2.B
AND m1.C = m2.C)

Selecting a set of rows more than once

Is there a simple, concise way to select the same set of rows repeated based on a count held in a variable, without using a loop?
For instance, suppose SELECT a, b, c FROM foo WHERE whatsit = something returns
a b c
--- --- ----
1 2 3
...and I have #count with 3 in it. Is there a reasonable way without a loop to get:
a b c
--- --- ----
1 2 3
1 2 3
1 2 3
? Order doesn't matter, and I don't need to know which group any given row belongs to. I actually only need this for one row (as above), and a solution that only works for one row would do the trick, but I assume if we can do it for one, we can do it for any number.
Try with a Recursive CTE
WITH cte
AS (SELECT 1 AS id,a,b,c
FROM tablename
UNION ALL
SELECT id + 1,a,b,c
FROM cte
WHERE id < 3) --#count
SELECT a,b,c
FROM cte
Another way to do using cross join
SELECT a, b, c
FROM Table1
CROSS JOIN (SELECT number
FROM master.dbo.spt_values
WHERE type = 'P'
AND number BETWEEN 1 AND 3) T
I don't know of a way you could do this without a loop or dynamic SQL. I think a union is all that I can come up with
select q1.a,q1.b,q1.c
from (
SELECT a, b, c FROM foo
union all
SELECT a, b, c FROM foo
union all
SELECT a, b, c FROM foo ) q1
order by q1.a

T-SQL get row count before TOP is applied

I have a SELECT that can return hundreds of rows from a table (table can be ~50000 rows). My app is interested in knowing the number of rows returned, it means something important to me, but it actually uses only the top 5 of those hundreds of rows. What I want to do is limit the SELECT query to return only 5 rows, but also tell my app how many it would have returned (the hundreds). This is the original query:
SELECT id, a, b, c FROM table WHERE a < 2
Here is what I came up with - a CTE - but I don't feel comfortable with the total row count appearing in every column. Ideally I would want a result set of the TOP 5 and a returned parameter for the total row count.
WITH Everything AS
(
SELECT id, a, b, c FROM table
),
DetermineCount AS
(
SELECT COUNT(*) AS Total FROM Everything
)
SELECT TOP (5) id, a, b, c, Total
FROM Everything
CROSS JOIN DetermineCount;
Can you think of a better way?
Is there a way in T-SQl to return the affected row count of a select top query before the top was applied? ##rowcount would return 5 but I wonder if there is a ##rowcountbeforetop sort of thing.
Thanks in advance for your help.
** Update **
This is what I'm doing now and I kind of like it over the CTE although CTEs as so elegant.
-- #count is passed in as an out param to the stored procedure
CREATE TABLE dbo.#everything (id int, a int, b int, c int);
INSERT INTO #everything
SELECT id, a, b, c FROM table WHERE a < 2;
SET #count = ##rowcount;
SELECT TOP (5) id FROM #everything;
DROP TABLE #everything;
Here's a relatively efficient way to get 5 random rows and include the total count. The random element will introduce a full sort no matter where you put it.
SELECT TOP (5) id,a,b,c,total = COUNT(*) OVER()
FROM dbo.mytable
ORDER BY NEWID();
Assuming you want the top 5 ordering by id ascending, this will do it with a single pass through your table.
; WITH Everything AS
(
SELECT id
, a
, b
, c
, ROW_NUMBER() OVER (ORDER BY id ASC) AS rn_asc
, ROW_NUMBER() OVER (ORDER BY id DESC) AS rn_desc
FROM <table>
)
SELECT id
, a
, b
, c
, rn_asc + rn_desc - 1 AS total_rows
FROM Everything
WHERE rn_asc <= 5
** Update **
This is what I'm doing now and I kind of like it over the CTE although CTEs as so elegant. Let me know what you think. Thanks!
-- #count is passed in as an out param to the stored procedure
CREATE TABLE dbo.#everything (id int, a int, b int, c int);
INSERT INTO #everything
SELECT id, a, b, c FROM table WHERE a < 2;
SET #count = ##rowcount;
SELECT TOP (5) id FROM #everything;
DROP TABLE #everything;

set difference in SQL query

I'm trying to select records with a statement
SELECT *
FROM A
WHERE
LEFT(B, 5) IN
(SELECT * FROM
(SELECT LEFT(A.B,5), COUNT(DISTINCT A.C) c_count
FROM A
GROUP BY LEFT(B,5)
) p1
WHERE p1.c_count = 1
)
AND C IN
(SELECT * FROM
(SELECT A.C , COUNT(DISTINCT LEFT(A.B,5)) b_count
FROM A
GROUP BY C
) p2
WHERE p2.b_count = 1)
which takes a long time to run ~15 sec.
Is there a better way of writing this SQL?
If you would like to represent Set Difference (A-B) in SQL, here is solution for you.
Let's say you have two tables A and B, and you want to retrieve all records that exist only in A but not in B, where A and B have a relationship via an attribute named ID.
An efficient query for this is:
# (A-B)
SELECT DISTINCT A.* FROM (A LEFT OUTER JOIN B on A.ID=B.ID) WHERE B.ID IS NULL
-from Jayaram Timsina's blog.
You don't need to return data from the nested subqueries. I'm not sure this will make a difference withiut indexing but it's easier to read.
And EXISTS/JOIN is probably nicer IMHO then using IN
SELECT *
FROM
A
JOIN
(SELECT LEFT(B,5) AS b1
FROM A
GROUP BY LEFT(B,5)
HAVING COUNT(DISTINCT C) = 1
) t1 On LEFT(A.B, 5) = t1.b1
JOIN
(SELECT C AS C1
FROM A
GROUP BY C
HAVING COUNT(DISTINCT LEFT(B,5)) = 1
) t2 ON A.C = t2.c1
But you'll need a computed column as marc_s said at least
And 2 indexes: one on (computed, C) and another on (C, computed)
Well, not sure what you're really trying to do here - but obviously, that LEFT(B, 5) expression keeps popping up. Since you're using a function, you're giving up any chance to use an index.
What you could do in your SQL Server table is to create a computed, persisted column for that expression, and then put an index on that:
ALTER TABLE A
ADD LeftB5 AS LEFT(B, 5) PERSISTED
CREATE NONCLUSTERED INDEX IX_LeftB5 ON dbo.A(LeftB5)
Now use the new computed column LeftB5 instead of LEFT(B, 5) anywhere in your query - that should help to speed up certain lookups and GROUP BY operations.
Also - you have a GROUP BY C in there - is that column C indexed?
If you are looking for just set difference between table1 and table2,
the below query is simple that gives the rows that are in table1, but not in table2, such that both tables are instances of the same schema with column names as
columnone, columntwo, ...
with
col1 as (
select columnone from table2
),
col2 as (
select columntwo from table2
)
...
select * from table1
where (
columnone not in col1
and columntwo not in col2
...
);

Resources