How to reuse calculated columns without duplicating the SQL statement - sql-server

I have a lot of calculated columns, and they keep repeating themselves, one inside another, including nested CASE statements.
Here is a heavily simplified version of what I have been looking for a way to do:
SELECT
(1+2) AS A,
A + 3 AS B,
B * 7 AS C
FROM MYTABLE

You could try something like this.
SELECT
A.Val AS A,
B.Val AS B,
C.Val AS C
FROM MYTABLE
cross apply(select 1 + 2) as A(Val)
cross apply(select A.Val + 3) as B(Val)
cross apply(select B.Val * 7) as C(Val)
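The same chaining also works when the expressions depend on real columns and include a CASE. The sketch below is only an illustration: the table variable @Orders, its columns Qty and UnitPrice, and the discount rule are all hypothetical and are not taken from the question.

```sql
-- Hypothetical sample data standing in for MYTABLE.
DECLARE @Orders TABLE (Qty int, UnitPrice decimal(10,2));
INSERT INTO @Orders (Qty, UnitPrice) VALUES (2, 10.00), (5, 3.50);

SELECT o.Qty,
       o.UnitPrice,
       t.Total,
       d.Discount,
       n.Net
FROM @Orders AS o
-- Each CROSS APPLY names one expression; later ones can reuse earlier names.
CROSS APPLY (SELECT o.Qty * o.UnitPrice AS Total) AS t
CROSS APPLY (SELECT CASE WHEN t.Total >= 20 THEN t.Total * 0.10
                         ELSE 0 END AS Discount) AS d
CROSS APPLY (SELECT t.Total - d.Discount AS Net) AS n;
```

Each expression is written once, and because every APPLY returns exactly one row per input row, the number of rows in the result does not change.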

You can't reference a just-created expression by its column alias later in the same SELECT list. Think of the entire SELECT list as being materialized all at once, or in no particular order: A doesn't exist yet when you're trying to build the expression for B. Within a single SELECT you need to repeat the expressions. I don't think computed columns will make this any "simpler" without repetition, and the same goes for views. Otherwise you'll have to nest things, like:
SELECT A, B, C = B * 7
FROM
(
SELECT A, B = A + 3
FROM
(
SELECT A = (1 + 2)
) AS x
) AS y;
Or repeat the expression (but I guess that is what you're trying to avoid).

Another option if someone is still interested:
with aa(a) as ( select 1+2 )
, bb(b) as ( select a+3 from aa )
,cc(c) as ( select b*7 from bb)
SELECT aa.a, bb.b, cc.c
from aa,bb,cc

The only way to "save" the results of your calculations is to compute them in a subquery; that way the outer query can use A, B and C by name. Unfortunately, it cannot be done any other way.

You can create computed columns to represent the values you want. Also, you can use a view if your calculations are dependent on data in a separate table.

Do you want calculated results out of your table? In that case you can put the relevant calculations in a scalar-valued user-defined function and call that inside your SELECT statement.
Or do you want the calculated results to appear as columns in the table? Then use a computed column:
CREATE TABLE Test(
ID INT NOT NULL IDENTITY(1,1),
TimesTen AS ID * 10
)

Related

T-SQL - Verify that all values are included in the result of a select statement

Let's say I have a function to which I pass a comma-separated list of values. Then I have another function which returns a table. How can I verify that each of the values from the comma-separated list is included in the result of the second function? If the comma-separated list contains a value that is not in the result of the second function, the result must be FALSE.
Valid scenario:
input: A, B, C
result from second function: A, B, C, D
Invalid scenario:
input: A, B, C, D
result from second function: A, B, C
Thanks in advance.

Something like this:
DECLARE @input VARCHAR(100)='1,3,4';
WITH splittedInput AS
(
SELECT val.value('text()[1]','int') As theInt
FROM
(
SELECT CAST('<x>' + REPLACE(@input,',','</x><x>') + '</x>' AS XML) AS singleValue
) AS x
CROSS APPLY x.singleValue.nodes('/x') As y(val)
)
SELECT *
FROM splittedInput AS si
LEFT JOIN (VALUES(1),(2),(3),(4)) AS t(x) ON t.x=si.theInt
WHERE t.x IS NULL;
Run this part separately (together with the DECLARE):
SELECT val.value('text()[1]','int') As theInt
FROM
(
SELECT CAST('<x>' + REPLACE(@input,',','</x><x>') + '</x>' AS XML) AS singleValue
) AS x
CROSS APPLY x.singleValue.nodes('/x') As y(val)
You will see that it returns your comma-separated input as a derived table.
If you are using SQL Server 2016+ you can use STRING_SPLIT(), which makes the whole thing much simpler.
The example simulates the result of the second function, using VALUES to return 1, 2, 3 and 4. The LEFT JOIN returns every input value together with the matching result value. If there is no match, the result side is NULL, and the WHERE clause keeps only those unmatched input values. Run it without the WHERE clause to see the difference.
Try adding a value to your input that is not included in the output and run the script again.
UPDATE
With a SELECT like this in place of the SELECT *
SELECT CASE WHEN COUNT(*)>0 THEN 0 ELSE 1 END AS ResultIsValid
you'd get a single 0 or 1 marking the validity, which you can return if needed.
UPDATE 2: Using STRING_SPLIT()
With version 2016, MS introduced some new string functions; one of them is STRING_SPLIT().
I cannot test this at the moment (I would need SQL Server 2016+), but this should work:
SELECT *
FROM STRING_SPLIT(@input,',') AS ss
LEFT JOIN (VALUES(1),(2),(3),(4)) AS t(x) ON t.x=ss.value
WHERE t.x IS NULL;
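Putting the validity flag and STRING_SPLIT() together might look like the sketch below. It assumes SQL Server 2016+ with database compatibility level 130 or higher (required for STRING_SPLIT), and it keeps the VALUES list as a stand-in for the second function's result. The sample input '1,3,5' is made up for the demo.

```sql
DECLARE @input varchar(100) = '1,3,5';   -- 5 is not in the simulated result

SELECT CASE WHEN EXISTS (
           -- any input value that has no match in the result set?
           SELECT 1
           FROM STRING_SPLIT(@input, ',') AS ss
           WHERE NOT EXISTS (
                     SELECT 1
                     FROM (VALUES (1), (2), (3), (4)) AS t(x)
                     WHERE t.x = TRY_CAST(ss.value AS int)
                 )
       ) THEN 0 ELSE 1 END AS ResultIsValid;
-- Returns 0 for this input, because 5 is missing from the result.
```

TRY_CAST turns a non-numeric list entry into NULL, so such an entry counts as "not found" instead of raising a conversion error.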

SQL Server - WHERE <several columns> in (<list of columns values>)

I have done this long ago in other DBMSs (Oracle or MySQL... don't really remember) and I'm looking for the way to do this in SQL Server, if possible at all.
Suppose you have a table with several columns, say A, B, C, ... M. I wish to phrase a SELECT from this table where columns A, B, and C match specific sets of values or, in other words, a list of value combinations.
For instance, I wish to retrieve all the records that match any of the following combinations:
A B C
1 'Apples' '2016-04-12'
56 'Cars' '2014-02-11'
....
Since the list of possible combinations may be quite long (including the option of an inner SELECT), it would not be practical to use something like:
WHERE ( A = 1 AND B = 'Apples' and C = '2016-04-12' ) OR
( A = 56 AND B = 'Cars' and C = '2014-02-11' ) OR
...
As stated, I did use this type of construct in the past and it was something like:
SELECT *
FROM MyTable
WHERE (A,B,C) IN (SELECT A,B,C FROM MYOtherTable) ;
[Most likely this syntax is wrong but it shows what I'm looking for]
Also, I would rather avoid Dynamic SQL usage.
So, the questions would be:
Is this doable in SQL Server?
If the answer is YES, how should the SELECT be phrased?
Thanks in advance.

You can use JOIN
SELECT m1.*
FROM MyTable m1
JOIN MYOtherTable m2
ON m1.A = m2.A
AND m1.B = m2.B
AND m1.C = m2.C
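or EXISTS, which returns each MyTable row at most once even when MYOtherTable contains the same combination several times (the JOIN would return one copy per match):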
SELECT m1.*
FROM MyTable m1
WHERE EXISTS (SELECT 1
FROM MYOtherTable m2
WHERE m1.A = m2.A
AND m1.B = m2.B
AND m1.C = m2.C)
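When the combinations are a literal list rather than rows in another table, the same EXISTS pattern can be fed from a VALUES table constructor. This is a minimal sketch: the table variable @MyTable and its sample rows are hypothetical stand-ins for the question's MyTable.

```sql
-- Hypothetical sample table standing in for MyTable from the question.
DECLARE @MyTable TABLE (A int, B varchar(20), C date, M varchar(20));
INSERT INTO @MyTable (A, B, C, M)
VALUES (1,  'Apples', '2016-04-12', 'keep'),
       (56, 'Cars',   '2014-02-11', 'keep'),
       (56, 'Cars',   '2015-01-01', 'skip');

SELECT m.*
FROM @MyTable AS m
WHERE EXISTS (
    SELECT 1
    FROM (VALUES (1,  'Apples', CAST('2016-04-12' AS date)),
                 (56, 'Cars',   CAST('2014-02-11' AS date))
         ) AS v(A, B, C)              -- the list of wanted combinations
    WHERE v.A = m.A
      AND v.B = m.B
      AND v.C = m.C
);
-- Returns the two rows marked 'keep'.
```

No dynamic SQL is needed, and the derived table v can be swapped for a subquery on another table without touching the rest of the statement.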

I need to get the whole row for all duplicate entries efficiently

I'm pretty new to SQL and I need to get all the rows with duplicate information in certain fields and have them displayed right next to their other duplicates (grouped by duplicates).
For instance, say I have a table with columns:
A,B,C,D,E,F,G
I want to be able to get all entries (the full row) where B, C, D, and E share the same value as another entry and show the duplicates right next to the original entry. I already have a solution, but it is horribly inefficient. I am trying to improve my running time here.
My original solution was this:
SELECT TOP 1000
A,
B,
C,
D,
E,
F,
G
FROM tbl_myTable
WHERE (B+C+D+E+F+G) IN (
SELECT
B+C+D+E+F+G
FROM
tbl_myTable
GROUP BY
B,C,D,E,F,G
HAVING COUNT(*) > 1
)
ORDER BY B,C,D,E,F,G ASC
This gave me the results that I wanted, but it is horrendously slow (took over 15 mins to run). I reworked my solution with a temporary table and shaved the time down to 5 mins of running time using this script:
--Drop the temp table if it exists.
IF OBJECT_ID('tempdb..#Temp1') IS NOT NULL
DROP TABLE #Temp1
SELECT
B+C+D+E+F+G AS CompareString
INTO #Temp1
FROM tbl_myTable
GROUP BY
B,C,D,E,F,G
HAVING COUNT(*) > 1
SELECT TOP 1000
A,
B,
C,
D,
E,
F,
G
FROM tbl_myTable
WHERE (B+C+D+E+F+G) IN (
SELECT * FROM #Temp1
)
ORDER BY B,C,D,E,F,G ASC
Five minutes still seems like a long time. Is there a faster way to do this? I'm new to SQL, so if something I did was not good, let me know! Thanks!

I would do something like this:
with cte as (
SELECT *
, count(*) over (partition by B, C, D, E, F, G) as cnt
, dense_rank() over (order by B, C, D, E, F, G) as grp
FROM STI.[dbo].[tbl_Consignee]
)
select *
from cte
where cnt > 1
order by grp
Essentially, the dense_rank() call gives each unique tuple an identifier (so you can put duplicates next to each other with the order by clause) and the count counts the number of rows per group.
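To see the windowed count in action without the real table, here is a small self-contained sketch. The table variable @Rows and its data are hypothetical, and only two columns (B and C) are compared to keep it short.

```sql
-- Hypothetical sample data; only B and C are compared in this demo.
DECLARE @Rows TABLE (A int, B varchar(10), C varchar(10));
INSERT INTO @Rows (A, B, C)
VALUES (1, 'x', 'p'), (2, 'y', 'q'), (3, 'x', 'p'), (4, 'z', 'r'), (5, 'y', 'q');

WITH cte AS (
    SELECT A, B, C,
           COUNT(*) OVER (PARTITION BY B, C) AS cnt   -- size of each duplicate group
    FROM @Rows
)
SELECT A, B, C
FROM cte
WHERE cnt > 1
ORDER BY B, C, A;
-- Rows 1 and 3 come back together, followed by rows 2 and 5; row 4 is filtered out.
```

Ordering by the compared columns directly is enough to put duplicates side by side; the dense_rank() column is only needed if you also want a group number in the output.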

Without actual data, I have to make a few assumptions here.
First, I assume your lettered fields are all text types, and you are using + to concatenate and not to add numeric values (otherwise A+B+C = 6 when A = 1 B = 2 and C = 3 as well as when A=2 B=3 and C=1, which would not be a match).
Next I'm going to assume there is a key field of some kind on each row that is not represented in your example. Something like tbl_myTable.MyTableKey bigint IDENTITY (1,1) NOT NULL.
Assuming all that, I'd try...
SELECT
[BaseTable].MyTableKey AS [Original Record],
[DupCheckTable].MyTableKey AS [Duplicate Record]
FROM
tbl_myTable [BaseTable]
LEFT OUTER JOIN tbl_myTable [DupCheckTable] ON
[BaseTable].A = [DupCheckTable].A
AND
[BaseTable].B = [DupCheckTable].B
AND
--... repeat for each actual field
--AND
[BaseTable].G = [DupCheckTable].G
AND
[BaseTable].MyTableKey < [DupCheckTable].MyTableKey --the less than operator prevents you from getting each match twice
WHERE
[DupCheckTable].MyTableKey IS NOT NULL
I think this will run faster because you can use the table key, which is presumably indexed, as part of the join. Also, you can feed any of your (or my) queries to the Tuning Advisor to see what it thinks would help in terms of statistics and indexes.

Counts rows returned by query that uses distinct

I have a simple query in this form:
SELECT DISTINCT a, b, c
FROM MyTable
WHERE a = SomeConditionsEtc
etc...
But I need to know how many rows it's going to return. Initially I was doing this:
SELECT COUNT(DISTINCT a)
FROM MyTable
WHERE a = SomeConditionsEtc
But that's not reliable when a contains duplicates while the other columns don't. So now I'm using a nested query:
SELECT COUNT(*)
FROM (SELECT DISTINCT a, b, c
FROM MyTable
WHERE a = SomeConditionsEtc) AS Temp
Is that the correct approach or is there a better way?

Your query is straight to the point, does the job, and is simple enough. I'm sure you could bake some unnecessary rocket science into it, but that would be overblown IMHO. Aside from what you have, you can use a GROUP BY like the one below to illustrate what I mean, but you would basically be doing the same thing: getting the unique rows and counting them.
SELECT COUNT(1)
FROM (SELECT a
FROM MyTable
WHERE a = 'a'
GROUP BY a, b, c) Temp

set difference in SQL query

I'm trying to select records with a statement
SELECT *
FROM A
WHERE
LEFT(B, 5) IN
(SELECT p1.b5 FROM
(SELECT LEFT(A.B,5) AS b5, COUNT(DISTINCT A.C) AS c_count
FROM A
GROUP BY LEFT(A.B,5)
) p1
WHERE p1.c_count = 1
)
AND C IN
(SELECT p2.C FROM
(SELECT A.C, COUNT(DISTINCT LEFT(A.B,5)) AS b_count
FROM A
GROUP BY A.C
) p2
WHERE p2.b_count = 1)
which takes a long time to run ~15 sec.
Is there a better way of writing this SQL?

If you would like to represent set difference (A-B) in SQL, here is a solution for you.
Let's say you have two tables A and B, and you want to retrieve all records that exist only in A but not in B, where A and B have a relationship via an attribute named ID.
An efficient query for this is:
-- (A-B)
SELECT DISTINCT A.* FROM (A LEFT OUTER JOIN B on A.ID=B.ID) WHERE B.ID IS NULL
-from Jayaram Timsina's blog.

You don't need to return data from the nested subqueries. I'm not sure this will make a difference without indexing, but it's easier to read.
And EXISTS/JOIN is probably nicer IMHO than using IN:
SELECT *
FROM
A
JOIN
(SELECT LEFT(B,5) AS b1
FROM A
GROUP BY LEFT(B,5)
HAVING COUNT(DISTINCT C) = 1
) t1 On LEFT(A.B, 5) = t1.b1
JOIN
(SELECT C AS C1
FROM A
GROUP BY C
HAVING COUNT(DISTINCT LEFT(B,5)) = 1
) t2 ON A.C = t2.c1
But at the very least you'll need a computed column, as marc_s said,
and 2 indexes: one on (computed, C) and another on (C, computed).

Well, not sure what you're really trying to do here - but obviously, that LEFT(B, 5) expression keeps popping up. Since you're using a function, you're giving up any chance to use an index.
What you could do in your SQL Server table is to create a computed, persisted column for that expression, and then put an index on that:
ALTER TABLE A
ADD LeftB5 AS LEFT(B, 5) PERSISTED
CREATE NONCLUSTERED INDEX IX_LeftB5 ON dbo.A(LeftB5)
Now use the new computed column LeftB5 instead of LEFT(B, 5) anywhere in your query - that should help to speed up certain lookups and GROUP BY operations.
Also - you have a GROUP BY C in there - is that column C indexed?

If you are looking for just the set difference between table1 and table2,
the query below is a simple one that gives rows that are in table1 but not in table2, assuming both tables are instances of the same schema with column names
columnone, columntwo, ...
Note that it tests each column separately, so it only returns a table1 row when none of its column values appears anywhere in the corresponding column of table2.
with
col1 as (
select columnone from table2
),
col2 as (
select columntwo from table2
)
...
select * from table1
where (
columnone not in (select columnone from col1)
and columntwo not in (select columntwo from col2)
...
);
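For a row-by-row set difference, T-SQL also has the EXCEPT operator, which compares whole rows rather than individual columns. The sketch below uses two hypothetical table variables with made-up data in place of table1 and table2.

```sql
-- Hypothetical sample data standing in for table1 and table2.
DECLARE @table1 TABLE (columnone int, columntwo varchar(10));
DECLARE @table2 TABLE (columnone int, columntwo varchar(10));
INSERT INTO @table1 (columnone, columntwo) VALUES (1, 'x'), (2, 'y'), (3, 'z');
INSERT INTO @table2 (columnone, columntwo) VALUES (1, 'y'), (2, 'y');

-- Distinct rows of @table1 that have no identical row in @table2.
SELECT columnone, columntwo FROM @table1
EXCEPT
SELECT columnone, columntwo FROM @table2;
-- Returns (1, 'x') and (3, 'z').
```

The row (1, 'x') is kept here even though the value 1 appears in table2, because no table2 row matches it in every column. EXCEPT also treats two NULLs as equal when comparing rows, which NOT IN does not.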
