How does a view with UNION handle WHERE clauses? - sql-server

Let's say I have a view like this:
CREATE VIEW MyView
AS
SELECT Id, Name FROM Source1
UNION
SELECT Id, Name FROM Source2
Then I query the view:
SELECT Id, Name FROM MyView WHERE Name = 'Sally'
Will SQL Server internally first select all the data from Source1 and Source2 and then apply the WHERE, or will it push the WHERE into each SELECT statement?

SQL Server can move predicates around as it sees fit in order to optimize a query. Views are effectively macros that are expanded into the body of the query before optimization occurs.
It isn't possible to predict with 100% certainty what it will do in any particular case, because in SQL you tell the system what you want, not how to do it.
For a trivial example like this, I would expect it to evaluate the predicate against the base tables and then perform the union, but only an examination of the query plan on your database, with your tables and indexes, could answer the question for sure.
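A minimal sketch of why pushing the predicate down is legal, using Python's built-in sqlite3 as a stand-in engine. It only shows that both forms return the same rows; it says nothing about what SQL Server's optimizer actually chooses, and the table contents are made up:

```python
import sqlite3

# Made-up sample data; the view mirrors MyView from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Source1 (Id INTEGER, Name TEXT);
    CREATE TABLE Source2 (Id INTEGER, Name TEXT);
    INSERT INTO Source1 VALUES (1, 'Sally'), (2, 'Bob');
    INSERT INTO Source2 VALUES (1, 'Sally'), (3, 'Sally'), (4, 'Ann');
    CREATE VIEW MyView AS
        SELECT Id, Name FROM Source1
        UNION
        SELECT Id, Name FROM Source2;
""")

# Filter applied to the view, as written in the question.
filtered_view = conn.execute(
    "SELECT Id, Name FROM MyView WHERE Name = 'Sally' ORDER BY Id"
).fetchall()

# The same predicate pushed into each branch before the UNION.
pushed_down = conn.execute("""
    SELECT Id, Name FROM Source1 WHERE Name = 'Sally'
    UNION
    SELECT Id, Name FROM Source2 WHERE Name = 'Sally'
    ORDER BY Id
""").fetchall()

print(filtered_view)  # [(1, 'Sally'), (3, 'Sally')]
```

The rewrite is safe here because the filter only looks at columns that come straight from each branch, so filtering before or after duplicate removal keeps the same rows.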

It depends on the optimizer, cardinalities, available indexes, etc., but yes, it will apply the criteria to the base tables where appropriate.
Note that your UNION, as opposed to a UNION ALL, requires an extra step to remove duplicates (typically a SORT, though the optimizer may choose a hash-based operator instead).
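The difference between the two operators can be seen with a small sketch in Python's sqlite3 (made-up rows). UNION returns the distinct rows, and that deduplication is the extra work. UNION ALL just concatenates its inputs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Source1 (Id INTEGER, Name TEXT);
    CREATE TABLE Source2 (Id INTEGER, Name TEXT);
    INSERT INTO Source1 VALUES (1, 'Sally'), (2, 'Bob');
    INSERT INTO Source2 VALUES (1, 'Sally'), (3, 'Ann');
""")

# UNION drops the duplicate (1, 'Sally') row.
union_rows = conn.execute(
    "SELECT Id, Name FROM Source1 UNION SELECT Id, Name FROM Source2"
).fetchall()

# UNION ALL keeps every row from both inputs, duplicates included.
union_all_rows = conn.execute(
    "SELECT Id, Name FROM Source1 UNION ALL SELECT Id, Name FROM Source2"
).fetchall()

print(len(union_rows), len(union_all_rows))  # 3 4
```

If the sources cannot overlap, or duplicates are acceptable, UNION ALL avoids the dedup step entirely.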

Related

Using UNION select inside a view

I have a requirement to check whether a specific user is already referenced in one of our transaction tables (we have around 10 transaction tables). I suggested using a VIEW that contains all the users that are already referenced; the DEV team could then just SELECT from that view to find out whether the user they're looking for is there.
Here's my query for the view:
SELECT DISTINCT user_ID
FROM transaction_table_1
UNION
SELECT DISTINCT user_ID
FROM transaction_table_2
UNION
SELECT DISTINCT user_ID
FROM transaction_table_3
UNION
SELECT DISTINCT user_ID
FROM transaction_table_4
[...]
Right now it works, but my question is: is this a good idea? The requirement asks that I provide only a script (or a view), not a stored procedure. I think this would be better as an SP, since I could just do a quick IF EXISTS() check against each table to see whether the user passed as a parameter exists in any of them, but they really want it to be only a script they can run (with no variables).
Can you give me advice on a better way of meeting this requirement, one with less impact on performance? I suspect this may not be the optimal solution.
TIA,
Rommel
Well, you can remove the DISTINCT keywords, because UNION already removes duplicates :)
SELECT user_ID
FROM transaction_table_1
UNION
SELECT user_ID
FROM transaction_table_2
UNION
SELECT user_ID
FROM transaction_table_3
UNION
SELECT user_ID
FROM transaction_table_4
But since you have to use a view, I don't see a way to do it differently.
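A quick check of the "DISTINCT is redundant" point, sketched with Python's sqlite3 and three hypothetical transaction tables holding made-up user IDs. The UNION chain returns the same set with or without the inner DISTINCTs:

```python
import sqlite3

# Hypothetical sample data: three transaction tables sharing some user IDs.
conn = sqlite3.connect(":memory:")
tables = ["transaction_table_1", "transaction_table_2", "transaction_table_3"]
for name, ids in zip(tables, [[1, 1, 2], [2, 3, 3], [1, 4]]):
    conn.execute(f"CREATE TABLE {name} (user_ID INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in ids])

# The original view body, with DISTINCT in every branch.
with_distinct = " UNION ".join(f"SELECT DISTINCT user_ID FROM {t}" for t in tables)

# The same chain without DISTINCT: UNION deduplicates anyway.
without_distinct = " UNION ".join(f"SELECT user_ID FROM {t}" for t in tables)

with_ids = sorted(r[0] for r in conn.execute(with_distinct))
without_ids = sorted(r[0] for r in conn.execute(without_distinct))
print(without_ids)  # [1, 2, 3, 4]
```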
From a performance point of view I would structure the query slightly differently:
SELECT DISTINCT user_ID
FROM (
SELECT user_ID
FROM transaction_table_1
UNION ALL
SELECT user_ID
FROM transaction_table_2
UNION ALL
SELECT user_ID
FROM transaction_table_3
...
) x
This should reduce the duplicate-removal work to a single DISTINCT, rather than one each time a UNION is performed.
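The derived-table shape can be checked against the UNION chain with a sketch in Python's sqlite3 (hypothetical tables, made-up data). It confirms only that both return the same user IDs. Whether SQL Server really does less work depends on its plan:

```python
import sqlite3

# Hypothetical sample data: three transaction tables sharing some user IDs.
conn = sqlite3.connect(":memory:")
tables = ["transaction_table_1", "transaction_table_2", "transaction_table_3"]
for name, ids in zip(tables, [[1, 1, 2], [2, 3, 3], [1, 4]]):
    conn.execute(f"CREATE TABLE {name} (user_ID INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in ids])

# Chained UNIONs: each UNION is a set operation that removes duplicates.
union_chain = " UNION ".join(f"SELECT user_ID FROM {t}" for t in tables)

# The answer's shape: concatenate with UNION ALL, then deduplicate once.
derived = (
    "SELECT DISTINCT user_ID FROM ("
    + " UNION ALL ".join(f"SELECT user_ID FROM {t}" for t in tables)
    + ") AS x"
)

chain_ids = sorted(r[0] for r in conn.execute(union_chain))
derived_ids = sorted(r[0] for r in conn.execute(derived))
print(chain_ids, derived_ids)  # [1, 2, 3, 4] [1, 2, 3, 4]
```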

How to force SQL Server to process CONTAINS clauses before WHERE clauses?

I have a SQL query that uses both standard WHERE clauses and full text index CONTAINS clauses. The query is built dynamically from code and includes a variable number of WHERE and CONTAINS clauses.
In order for the query to be fast, it is very important that the full text index be searched before the rest of the criteria are applied.
However, SQL Server chooses to process the WHERE clauses before the CONTAINS clauses, which causes table scans and makes the query very slow.
I'm able to rewrite this using two queries and a temporary table. When I do so, the query executes 10 times faster. But I don't want to do that in the code that creates the query because it is too complex.
Is there a way to force SQL Server to process the CONTAINS before anything else? I can't force a plan (USE PLAN) because the query is built dynamically and varies a lot.
Note: I have the same problem on SQL Server 2005 and SQL Server 2008.
You can signal your intent to the optimiser like this:
SELECT *
FROM (
    SELECT *
    FROM ...
    WHERE CONTAINS(...)
) AS T1
WHERE (normal conditions)
However, SQL is declarative: you say what you want, not how to do it. So the optimiser may decide to ignore the nesting above.
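You can force the derived table with CONTAINS to be materialised before the classic WHERE clause is applied by wrapping it in TOP ... ORDER BY. The optimiser cannot push the outer conditions below the TOP, because filtering first could change which rows the TOP returns. I won't guarantee performance, though.
SELECT *
FROM (
    SELECT TOP 2000000000 *
    FROM ....
    WHERE CONTAINS(...)
    ORDER BY SomeID
) AS T1
WHERE (normal conditions)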
Try doing it with 2 queries without temp tables:
SELECT *
FROM table
WHERE id IN (
    SELECT id
    FROM table
    WHERE contains_criteria
)
AND further_where_clauses
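The sketch below checks that this rewrite keeps the same rows. It uses Python's sqlite3 and a plain LIKE as a stand-in for CONTAINS, because SQLite has no CONTAINS predicate. The docs table and its columns are hypothetical. The sketch says nothing about evaluation order in SQL Server:

```python
import sqlite3

# Hypothetical table: id, searchable text, and an ordinary filter column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, status TEXT);
    INSERT INTO docs VALUES
        (1, 'red apple', 'open'),
        (2, 'green apple', 'closed'),
        (3, 'red car', 'open');
""")

# Single query: text filter (LIKE standing in for CONTAINS) plus normal filter.
combined = conn.execute(
    "SELECT id FROM docs WHERE body LIKE '%apple%' AND status = 'open'"
).fetchall()

# Rewrite from the answer: the text filter isolated in an IN (...) subquery.
rewritten = conn.execute("""
    SELECT id FROM docs
    WHERE id IN (SELECT id FROM docs WHERE body LIKE '%apple%')
      AND status = 'open'
""").fetchall()

print(combined, rewritten)  # [(1,)] [(1,)]
```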
As I noted above, this is NOT as clean a way to "materialize" the derived table as the TOP clause that @gbn proposed, but a loop join hint forces an order of evaluation and has worked for me in the past (admittedly, usually with two different tables involved). There are a couple of problems, though:
The query is ugly
You still have no guarantee that the other WHERE conditions will be held back until after the join (I'd be interested to see what you get)
Here it is though, given that you asked:
SELECT OriginalTable.XXX
FROM (
    -- expose the key columns so the outer join can match on them
    SELECT PrimaryKeyColumns
    FROM OriginalTable
    WHERE CONTAINS(XXX)
) AS ContainsCheck
INNER LOOP JOIN OriginalTable
    ON ContainsCheck.PrimaryKeyColumns = OriginalTable.PrimaryKeyColumns
    AND OriginalTable.OtherWhereConditions = OtherValues

Possible to test for null records in SQL only?

I am trying to help a co-worker with a peculiar problem, and she's limited to MS SQL query code only. The goal is to insert a dummy record (into a surrounding UNION) if no records are returned from a query.
I am having a hard time going back and forth between PL/SQL and MS SQL, and I am appealing for help (I'm not particularly appealing, but I am appealing to the StackOverflow audience).
Basically, we need a single, testable value from the target Select ... statement.
In theory, it would do this:
(other records from unions)
UNION
SELECT 'These' AS fld1, 'are' AS fld2, 'Dummy' AS fld3, 'Fields' AS fld4
WHERE NOT (Matching Logic)
UNION
SELECT fld1, fld2, fld3, fld4 -- Regular records exist
FROM tested_table
WHERE (Matching Logic)
Forcing an individual dummy record, with no conditions, works.
Is there a way to get a single, testable result from a SELECT?
We can't do it in code (not allowed), but we can feed it SQL.
Anybody? Anybody? Bueller?
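You could put the unions in a CTE (a WITH clause), then add another union branch that returns a NULL row only when the big union is empty. Every branch of a UNION must have the same number of columns, so the dummy row needs one NULL per column:
;with BigUnion as
(
    select fld1, fld2, fld3, fld4
    from table1
    union all
    select fld1, fld2, fld3, fld4
    from table2
)
select fld1, fld2, fld3, fld4
from BigUnion
union all
select null, null, null, null
where not exists (select * from BigUnion)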
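A runnable sketch of the same CTE trick using Python's sqlite3 (which also accepts a SELECT with a WHERE but no FROM). Two columns are used instead of four, and table1/table2 are hypothetical. The dummy row appears only when both tables are empty:

```python
import sqlite3

QUERY = """
WITH BigUnion AS (
    SELECT fld1, fld2 FROM table1
    UNION ALL
    SELECT fld1, fld2 FROM table2
)
SELECT fld1, fld2 FROM BigUnion
UNION ALL
SELECT NULL, NULL  -- dummy row: one NULL per column
WHERE NOT EXISTS (SELECT * FROM BigUnion)
"""

def run(rows1, rows2):
    """Build two hypothetical tables from the given rows and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (fld1 TEXT, fld2 TEXT)")
    conn.execute("CREATE TABLE table2 (fld1 TEXT, fld2 TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows1)
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", rows2)
    return conn.execute(QUERY).fetchall()

print(run([], []))            # [(None, None)]
print(run([("a", "b")], []))  # [('a', 'b')]
```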

Count of Distinct Rows Without Using Subquery

Say I have Table1, which has duplicate rows (forget the fact that it has no primary key...). Is it possible to rewrite the following without using a JOIN, subquery or CTE, and also without having to spell out the columns in something like a GROUP BY?
SELECT COUNT(*)
FROM (
SELECT DISTINCT * FROM Table1
) T1
You can do something like this:
SELECT Count(DISTINCT ProductName) FROM Products
But if you want a count of completely distinct records, then you will have to use one of the other options you mentioned.
If you wanted to do something like you suggested in the question, that would imply you have duplicate records in your table.
If you didn't have duplicate records, SELECT DISTINCT * FROM table would return the same rows as the query without the DISTINCT.
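A sketch of the difference between counting distinct values of one column and counting distinct whole rows, using Python's sqlite3. The Color column and all rows are made up for illustration:

```python
import sqlite3

# Hypothetical Products table with one exact duplicate row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductName TEXT, Color TEXT);
    INSERT INTO Products VALUES
        ('Pen', 'Blue'), ('Pen', 'Blue'), ('Pen', 'Red'), ('Cup', 'Red');
""")

# Distinct values of a single column.
distinct_names = conn.execute(
    "SELECT COUNT(DISTINCT ProductName) FROM Products").fetchone()[0]

# Distinct whole rows: needs the subquery from the question.
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM Products) AS T1").fetchone()[0]

all_rows = conn.execute("SELECT COUNT(*) FROM Products").fetchone()[0]
print(distinct_names, distinct_rows, all_rows)  # 2 3 4
```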
No, it's not possible.
If you are limited by your framework/query tool/whatever, can't use a subquery, and can't spell out each column name in the GROUP BY, you are SOL.
If you are not limited by your framework/query tool/whatever, there's no reason not to use a subquery.
If you really, really want to do that, you can just run "SELECT COUNT(*) FROM table1 GROUP BY all, columns, here" and take the size of the result set as your count.
But it would be Daily WTF-worthy code ;)
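The GROUP BY workaround is sketched below in Python's sqlite3. It uses a hypothetical Table1 with two made-up columns. Each group is one distinct row, so the number of result rows matches the subquery count:

```python
import sqlite3

# Hypothetical Table1 with duplicate rows and two columns (col_a, col_b).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (col_a INTEGER, col_b TEXT);
    INSERT INTO Table1 VALUES (1, 'x'), (1, 'x'), (2, 'y'), (2, 'z');
""")

# One output row per distinct (col_a, col_b) combination; the count we
# want is the number of rows in this result set, not any value in it.
groups = conn.execute(
    "SELECT COUNT(*) FROM Table1 GROUP BY col_a, col_b"
).fetchall()
group_count = len(groups)

# Reference value using the subquery from the question.
subquery_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM Table1) AS T1"
).fetchone()[0]

print(group_count, subquery_count)  # 3 3
```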
I just wanted to refine the answer by saying that you need to check that the data types of the columns are comparable; otherwise you will get an error when trying to make them DISTINCT, e.g.:
com.microsoft.sqlserver.jdbc.SQLServerException: The ntext data type cannot be selected as DISTINCT because it is not comparable.
This is true for large binary columns, XML columns and others, depending on your RDBMS (RTM). The solution for SQL Server, for example, is to cast the ntext column to nvarchar(MAX), which works from SQL Server 2005 onwards.
If you stick to the PK columns, you should be OK (I haven't verified this myself, but I'd have thought that, logically, PK columns have to be comparable).

How to simplify this Sql query

The table Query has 2 columns (functionId, depFunctionId).
I want all values that appear in either functionId or depFunctionId.
I am using this:
select distinct depfunctionid from Query
union
select distinct functionid from Query
How to do it better?
I think that's the best you'll get.
That's as good as it gets, I think...
Lose the DISTINCT clauses, as your UNION (vs UNION ALL) will take care of removing duplicates.
An alternative (perhaps less clear, and probably with the same execution plan) would be to do a FULL JOIN across the 2 columns. It needs DISTINCT to drop the duplicates that UNION would otherwise remove:
SELECT DISTINCT
COALESCE(Query1.FunctionId, Query2.DepFunctionId) AS FunctionId
FROM Query AS Query1
FULL OUTER JOIN Query AS Query2 ON
Query1.FunctionId = Query2.DepFunctionId
I am almost sure you can lose the DISTINCTs.
When you use UNION instead of UNION ALL, duplicate results are thrown away.
It all depends on how heavy your inline view query is. The key to better performance would be to execute it only once, but that is not possible given the data it returns.
If you do it like this:
select depfunctionid , functionid from Query
group by depfunctionid , functionid
It is very likely that you'll get repeated results for depfunctionid or functionid.
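A small sketch in Python's sqlite3 with made-up rows shows what "repeated results" means here. The GROUP BY over both columns returns distinct pairs in which individual ids still repeat, while the question's UNION returns each id once. The table name is double-quoted only to be safe as an identifier:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Query" (functionId INTEGER, depFunctionId INTEGER);
    INSERT INTO "Query" VALUES (1, 2), (1, 3), (2, 3);
""")

# The question's query: a single column, each id once.
union_ids = sorted(r[0] for r in conn.execute(
    'SELECT depfunctionid FROM "Query" UNION SELECT functionid FROM "Query"'
))

# GROUP BY both columns: distinct pairs, but ids repeat across the pairs.
pairs = conn.execute(
    'SELECT depfunctionid, functionid FROM "Query" '
    'GROUP BY depfunctionid, functionid'
).fetchall()
flattened = [v for pair in pairs for v in pair]

print(union_ids)                            # [1, 2, 3]
print(len(flattened), len(set(flattened)))  # 6 3
```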
I may be wrong, but it seems to me that you're trying to retrieve a tree of dependencies. If that's the case, I personally would try a materialized path approach.
If the materialized path is stored in a self-referencing table, I would retrieve the tree using something like:
select asrt2.function_id
from a_self_referencig_table asrt1,
a_self_referencig_table asrt2
where asrt1.function_name = 'blah function'
and asrt2.materialized_path like (asrt1.materialized_path || '%')
order by asrt2.materialized_path, asrt2.some_child_node_ordering_column
This would retrieve the whole tree in the proper order. What sucks is having to construct the materialized path based on function_id and parent_function_id (or, in your case, functionid and depfunctionid), but a trigger could take care of that quite easily.
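A runnable sketch of the materialized-path query, using Python's sqlite3 (which supports `||` and LIKE as written). It assumes paths are stored as ancestor ids, each followed by '/'. The rows and the path format are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a_self_referencig_table (
        function_id INTEGER PRIMARY KEY,
        function_name TEXT,
        materialized_path TEXT,  -- ancestor ids, each followed by '/'
        some_child_node_ordering_column INTEGER
    );
    INSERT INTO a_self_referencig_table VALUES
        (1, 'blah function', '1/', 1),
        (2, 'child a', '1/2/', 1),
        (3, 'child b', '1/3/', 2),
        (4, 'grandchild', '1/2/4/', 1),
        (5, 'unrelated', '5/', 1);
""")

# Every node whose path starts with the root's path belongs to its subtree.
tree = [row[0] for row in conn.execute("""
    SELECT asrt2.function_id
    FROM a_self_referencig_table asrt1,
         a_self_referencig_table asrt2
    WHERE asrt1.function_name = 'blah function'
      AND asrt2.materialized_path LIKE (asrt1.materialized_path || '%')
    ORDER BY asrt2.materialized_path, asrt2.some_child_node_ordering_column
""")]

print(tree)  # [1, 2, 4, 3]
```

The trailing '/' is a design choice. Without it, a path of '1' would also prefix-match a node whose path starts with '10'. Ordering by the path string is lexicographic, so multi-digit ids would need zero-padding (e.g. '001/') to keep siblings in numeric order.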
