Combining columns of two different datasets - sql-server

I have a UDF that needs to always return the same dataset structure, columns a, b, c and d.
It needs to return a UNION ALL from more than one data source, including other UDFs.
Let's say I have another function (myOtherUDF) that returns column a and b.
I also have a table (myTable) with the column names a, b, c and d.
What I want to do is to UNION ALL on myOtherUDF and myTable in a way that the columns c and d are added to myOtherUDF.
i.e. I want this to work although myOtherUDF lacks the columns c and d:
CREATE FUNCTION myUDF (@param INT)
RETURNS @tbl TABLE
(
    a int NOT NULL,
    b int NOT NULL,
    c int NOT NULL,
    d int NOT NULL
)
AS
BEGIN
    INSERT INTO @tbl
    SELECT * FROM myTable
    UNION ALL -- this will obviously not work
    SELECT * FROM myOtherUDF(@param)
    RETURN
END
I cannot use a process to preload a table and I cannot use a view since I need the parameter @param.

If you explicitly list your columns - which is best practice - you immediately solve this problem, after adding defaults for c and d on the myOtherUDF side:
INSERT INTO @tbl (a, b, c, d)
SELECT a, b, c, d
FROM myTable
UNION ALL
SELECT a, b, 0, 0
FROM myOtherUDF(@param);
RETURN;
To be clear, you should pretty much never SELECT * and never INSERT INTO a table without listing the columns. It saves so many issues down the line.
And for performance reasons it's nearly always better to use an inline table-valued function, e.g.
CREATE FUNCTION myUDF
(
    @param INT
)
RETURNS TABLE
AS
RETURN
    SELECT a, b, c, d
    FROM myTable
    UNION ALL
    SELECT a, b, 0, 0
    FROM myOtherUDF(@param);
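Calling the inline function then works just like querying a table; a minimal usage sketch, assuming the function is created in the dbo schema:
SELECT a, b, c, d
FROM dbo.myUDF(42);  -- 42 stands in for whatever parameter value you need
Because the function is inline, the optimizer expands it into the calling query rather than materialising a table variable first.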

Related

Use variable as WHERE filter for multiple strings

I have a field containing event codes, sometimes two, sometimes three.
Column looks like this:
'3011, 6009'
'3011, 3054'
'3011, 3013'
'6009, 9524'
'3011, 9524'
'3011, 6009, 3054'
'3011, 6009, 9524'
'3011, 9950'
'6009, 9950'
The combinations define a certain group.
I want to use a variable @x and set the values. So far, so good.
But I am not sure how to use the variable in my SELECT statement:
SELECT A, B, C
FROM TableA
WHERE EventCodes IN (@x)
Can anyone point out where the quotes go in this? I can't find it.
The IN clause wants a set of elements. A variable like yours is a single string containing a concatenation of your elements, which is something very different.
You can create a temp table as follows:
CREATE TABLE #app
(id varchar(50))  -- wide enough for the longest code combination
INSERT INTO #app VALUES ('3011, 6009')  -- and so on for each combination
Then, you can re-write your query as follows:
SELECT A, B, C
FROM TableA
WHERE EventCodes IN (SELECT id FROM #app)
or
SELECT A, B, C
FROM TableA
JOIN #app
ON TableA.EventCodes = #app.id
If you're on SQL Server 2016 or later, you can use the STRING_SPLIT function. It'll take your comma-delimited string and return a table, which you can select from inside an IN clause. For example,
SELECT A, B, C
FROM TableA
WHERE EventCodes IN (SELECT value FROM STRING_SPLIT(@x, ','))
It'd be better if you could normalize your tables though.
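For what it's worth, a minimal sketch of what a normalized layout could look like; the names TableA_EventCode, TableAId and Id are hypothetical and purely for illustration:
CREATE TABLE TableA_EventCode
(
    TableAId  INT NOT NULL,  -- foreign key back to TableA
    EventCode INT NOT NULL   -- one row per code instead of a comma-separated string
);

SELECT A, B, C
FROM TableA t
WHERE EXISTS (SELECT 1
              FROM TableA_EventCode e
              WHERE e.TableAId = t.Id  -- assumes TableA has an Id key
                AND e.EventCode = 3011);
With one row per code, filtering on individual codes or combinations no longer needs any string handling.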

Can "constant" lookups be done efficiently within a single query?

Pop quiz, SQL Server hotshots:
How many times will the following student subquery be executed? (assuming there are at least ten rows in something):
SELECT TOP 10 a, b
, (SELECT type_id
FROM type
WHERE type_code = 'student') student
FROM something
If you said 1 then, like me, you assumed SQL Server would recognize the value of student as an invariant scalar.
Unfortunately, the answer is 10.
I know, I'll use a CTE!
WITH codes (student) AS (
SELECT (SELECT type_id
FROM type
WHERE type_code = 'student')
)
SELECT TOP 10 a, b
, student
FROM something
CROSS JOIN codes
The result is exactly the same.
Of course, I can get the desired efficiency by first capturing the scalar to a variable:
DECLARE @Student tinyint
SELECT @Student = type_id
FROM type
WHERE type_code = 'student'
SELECT TOP 10 a, b
, @Student student
FROM something
This only does one seek, and adds nothing to the main query plan.
But besides being more verbose, if you're defining a table-valued function it means the function can no longer be inline, so you also have to write out an otherwise implicit return schema, which is a pain (and adds a vector for errors).
Is there any way to write a single query that only runs the subquery once?
For this query:
SELECT TOP 10 a, b,
(SELECT type_id FROM type WHERE type_code = 'student'
) as student
FROM something;
You want an index on type(type_code, type_id).
You might find this more efficient if you move the subquery to the FROM clause:
SELECT TOP 10 a, b,
t.type_id
FROM something s CROSS JOIN
(SELECT type_id FROM type WHERE type_code = 'student'
) t
Or even:
SELECT TOP 10 s.a, s.b, t.type_id
FROM something s JOIN
type t
ON t.type_code = 'student';
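If the reason for avoiding the variable is the inline table-valued function scenario mentioned in the question, note that this last form is still a single statement and so can be wrapped directly; a sketch, with a made-up function name and assuming a and b are columns of something:
CREATE FUNCTION dbo.TopTenWithStudent ()
RETURNS TABLE
AS
RETURN
    SELECT TOP 10 s.a, s.b, t.type_id AS student
    FROM something s
    JOIN type t
      ON t.type_code = 'student';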

I need to get the whole row for all duplicate entries efficiently

Hi, internet! I'm pretty new to SQL and I need to get all the rows with duplicate information in certain fields and have them displayed right next to their other duplicates (grouped by duplicates).
For instance, say I have a table with columns:
A,B,C,D,E,F,G
I want to be able to get all entries (the full row) where B, C, D, and E share the same value as another entry and show the duplicates right next to the original entry. I already have a solution, but it is horribly inefficient. I am trying to improve my running time here.
My original solution was this:
SELECT TOP 1000
A,
B,
C,
D,
E,
F,
G
FROM tbl_myTable
WHERE (B+C+D+E+F+G) IN (
SELECT
B+C+D+E+F+G
FROM
tbl_myTable
GROUP BY
B,C,D,E,F,G
HAVING COUNT(*) > 1
)
ORDER BY B,C,D,E,F,G ASC
This gave me the results that I wanted, but it is horrendously slow (it took over 15 minutes to run). I reworked my solution with a temporary table and shaved the running time down to 5 minutes using this script:
--Drop the temp table if it exists.
IF OBJECT_ID('tempdb..#Temp1') IS NOT NULL
DROP TABLE #Temp1
SELECT
B+C+D+E+F+G AS CompareString
INTO #Temp1
FROM tbl_myTable
GROUP BY
B,C,D,E,F,G
HAVING COUNT(*) > 1
SELECT TOP 1000
A,
B,
C,
D,
E,
F,
G
FROM tbl_myTable
WHERE (B+C+D+E+F+G) IN (
SELECT * FROM #Temp1
)
ORDER BY B,C,D,E,F,G ASC
Five minutes still seems like a long time. Is there a faster way to do this? I'm new to SQL, so if something I did was not good, let me know! Thanks!
I would do something like this:
with cte as (
SELECT *
, count(*) over (partition by B, C, D, E, F, G) as cnt
, dense_rank() over (order by B, C, D, E, F, G) as grp
FROM tbl_myTable
)
select *
from cte
where cnt > 1
order by grp
Essentially, the dense_rank() call gives each unique tuple an identifier (so you can put duplicates next to each other with the order by clause) and the count counts the number of rows per group.
Without actual data, I have to make a few assumptions here.
First, I assume your lettered fields are all text types, and you are using + to concatenate and not to add numeric values (otherwise A+B+C = 6 when A = 1, B = 2 and C = 3 as well as when A = 2, B = 3 and C = 1, which would not be a match).
Next I'm going to assume there is a key field of some kind on each row that is not represented in your example. Something like tbl_myTable.MyTableKey bigint IDENTITY (1,1) NOT NULL.
Assuming all that, I'd try...
SELECT
[BaseTable].MyTableKey AS [Original Record],
[DupCheckTable].MyTableKey AS [Duplicate Record]
FROM
tbl_myTable [BaseTable]
LEFT OUTER JOIN tbl_myTable [DupCheckTable] ON
[BaseTable].A = [DupCheckTable].A
AND
[BaseTable].B = [DupCheckTable].B
AND
--... repeat for each actual field
--AND
[BaseTable].G = [DupCheckTable].G
AND
[BaseTable].MyTableKey < [DupCheckTable].MyTableKey --the less than operator prevents you from getting each match twice
WHERE
[DupCheckTable].MyTableKey IS NOT NULL
I think this will run faster because you can use the table key, which is presumably indexed, as part of the join. Also, you can feed any of your (or my) queries to the Database Engine Tuning Advisor to see what it thinks would help in the way of statistics and indexes.
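Before resorting to the Tuning Advisor, one candidate worth trying (sketched here with a made-up index name, and assuming B through G are narrow enough to index together) is a composite index over the compared columns, so both the GROUP BY variants above and this self-join can read from the index instead of scanning the table:
CREATE NONCLUSTERED INDEX IX_tbl_myTable_Dupes
ON tbl_myTable (B, C, D, E, F, G)
INCLUDE (A);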

How to reuse calculated columns avoiding duplicating the sql statement

I have a lot of calculated columns and they keep repeating themselves, one inside of the others, including nested CASE statements.
Here is a really simplified version of something that I've been searching for a way to do.
SELECT
(1+2) AS A,
A + 3 AS B,
B * 7 AS C
FROM MYTABLE
You could try something like this.
SELECT
A.Val AS A,
B.Val AS B,
C.Val AS C
FROM MYTABLE
cross apply(select 1 + 2) as A(Val)
cross apply(select A.Val + 3) as B(Val)
cross apply(select B.Val * 7) as C(Val)
You can't reference just-created expressions later in the same select list by their column aliases. Think of the entire select list as being materialized at the same time, or in arbitrary order: A doesn't exist yet when you're writing the expression that creates B. You need to repeat the expressions. I don't think you'll be able to make "simpler" computed columns or views without repeating them either - you'll have to nest things, like:
SELECT A, B, C = B * 7
FROM
(
SELECT A, B = A + 3
FROM
(
SELECT A = (1 + 2)
) AS x
) AS y;
Or repeat the expression (but I guess that is what you're trying to avoid).
Another option if someone is still interested:
with aa(a) as ( select 1+2 )
, bb(b) as ( select a+3 from aa )
, cc(c) as ( select b*7 from bb )
SELECT aa.a, bb.b, cc.c
from aa, bb, cc
The only way to "save" the results of your calculations would be using them in a subquery, that way you can use A, B and C. Unfortunately it cannot be done any other way.
You can create computed columns to represent the values you want. Also, you can use a view if your calculations are dependent on data in a separate table.
Do you want calculated results out of your table? In that case you can put the relevant calculations in a scalar-valued user-defined function and use that inside your SELECT statement.
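As a rough sketch of that idea, using the simplified numbers from the question (the function name dbo.fn_BaseCalc is made up):
CREATE FUNCTION dbo.fn_BaseCalc ()
RETURNS INT
AS
BEGIN
    RETURN 1 + 2;  -- the repeated calculation lives in one place
END
GO

SELECT
    dbo.fn_BaseCalc()           AS A,
    dbo.fn_BaseCalc() + 3       AS B,
    (dbo.fn_BaseCalc() + 3) * 7 AS C
FROM MYTABLE;
Keep in mind that scalar UDFs are evaluated per row, which can hurt on large tables (SQL Server 2019 can inline some of them).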
Or do you want the calculated results to appear as columns in the table, then use a computed column:
CREATE TABLE Test(
ID INT NOT NULL IDENTITY(1,1),
TimesTen AS ID * 10
)
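A quick illustration of how the computed column behaves (purely for demonstration):
INSERT INTO Test DEFAULT VALUES;
INSERT INTO Test DEFAULT VALUES;

SELECT ID, TimesTen  -- TimesTen is computed as ID * 10 on the fly
FROM Test;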

How to insert artificial data into a select statement

Suppose we have an original query as follows:
SELECT A, B, C FROM tblA
Now, I need to add artificial rows like
SELECT 'Kyala', B, C FROM tblA when, for example, C = 100 to be inserted into the resultset.
As an example, if the tblA hold one row:
A B C
John 1 100
my goal is to return two rows like below with a single SQL query.
A B C
John 1 100
Kyala 1 100
How could I achieve it using a single SQL instead of relying on table variable or temp table?
Just refined the query to resolve the error on the UNION:
SELECT A, B, C from tblA
UNION
SELECT 'Kyala' as A, B, C FROM tblA WHERE C = 100
And if you don't want the rows where C = 100 to also show up with their original A value (from the first SELECT in the union), you can do it like this:
SELECT A, B, C from tblA WHERE C <> 100
UNION
SELECT 'Kyala', B, C FROM tblA WHERE C = 100
or
SELECT CASE C
         WHEN 100 THEN 'Kyala'
         ELSE A
       END AS A, B, C
FROM tblA
You can use a CASE:
SELECT B, C,
CASE
WHEN C = 100 THEN 'Kyala'
ELSE A
END
FROM tblA
You could achieve this with the UNION operator.
SELECT A, B, C from tblA
UNION
SELECT 'Kyala', B, C FROM tblA WHERE C = 100
In response to the question in the comments about improving performance so that the table is only queried once - you could add a covering index over columns C and B so that the second part of the query uses that index rather than querying the table:
CREATE NONCLUSTERED INDEX [IX_tblA_CD] ON [dbo].[tblA]
(
[C] ASC
)
INCLUDE ( [B]) ON [PRIMARY]
GO
However, depending on the use case (this sounds like some kind of ad-hoc process for testing?), you might prefer to take the hit of two table scans rather than adding a new index which might not be appropriate for use in production.
You can use a UNION statement:
SELECT A, B, C FROM tblA
UNION
SELECT 'Kyala', B, C FROM tblA WHERE C = 100
Now, I need to add artificial rows like SELECT 'Kyala', B, C FROM tblA when, for example, C = 100 to be inserted into the resultset.
Now, read up on:
* IIF in SQL Server
* CASE expressions (T-SQL has no SWITCH; CASE is the equivalent)
Basically, you can define an additional column as was shown
(SELECT 'test', A, B, C FROM...)
But instead of 'test' you can put in an IIF or CASE and work with the other fields to determine the exact output.
SELECT IIF(xxxx) AS FirstColumn, A, B, C FROM ...
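A minimal sketch of that idea using IIF (available from SQL Server 2012 on), mirroring the CASE answers above:
SELECT IIF(C = 100, 'Kyala', A) AS A, B, C
FROM tblA;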
