A client of mine would like to create a view that is a UNION of a few tables.
From the client:
These tables are populated by streaming data sources and are pretty
sizable already. Because of how the set operations get applied, any
queries of this view are causing performance issues, since any
filters/predicates get applied after the UNIONs. I know that you
can’t materialize a view with UNION operations, so I was wondering
if Snowflake recommends any other solution short of building a
separate table which unions the constituent tables.
I created a view as follows, and pruning worked as expected: it was applied before the UNION.
I wonder whether their problem comes from using UNION (which carries the performance penalty of de-duplication) rather than UNION ALL.
create or replace view test_unions
(
mycol1,
mycol2,
mycol3
) as (
(select mycol1,mycol2,mycol3 from tableA)
union all (select mycol1,mycol2,mycol3 from tableB)
union all (select mycol1,mycol2,mycol3 from tableC)
)
;
Let's say I have a View like this
CREATE VIEW MyView
AS
SELECT Id, Name FROM Source1
UNION
SELECT Id, Name FROM Source2
Then I query the View
SELECT Id, Name From MyView WHERE Name = 'Sally'
Will SQL Server internally first select all the data from Source1 and Source2 and then apply the WHERE, or will it apply the WHERE to each SELECT statement?
SQL Server can move predicates around as it sees fit in order to optimize a query. Views are effectively macros that are expanded into the body of the query before optimization occurs.
What it will do in any particular case can't be predicted with 100% certainty, because in SQL you tell the system what you want, not how to do it.
For a trivial example like this, I would expect it to evaluate the predicate against the base tables and then perform the union, but only an examination of the query plan on your database, with your tables and indexes could answer the question for sure.
It depends on the optimizer, cardinalities, available indexes, etc., but yes, it will apply the criteria to the base tables where appropriate.
Note that your UNION, as opposed to a UNION ALL, requires a sort (or an equivalent de-duplication step) to remove duplicates.
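To make the two points above concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The sample rows and the second view name (MyViewAll) are assumptions added for illustration. It only shows that filtering the view returns the same rows as filtering each branch first, and that UNION ALL keeps the duplicates UNION removes; it says nothing about which plan a given optimizer picks, so check the actual query plan for that.

```python
import sqlite3


def demo_union_view():
    """Compare filtering a UNION view with filtering each branch directly."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Source1 (Id INTEGER, Name TEXT);
        CREATE TABLE Source2 (Id INTEGER, Name TEXT);
        -- Hypothetical sample data
        INSERT INTO Source1 VALUES (1, 'Sally'), (2, 'Bob');
        INSERT INTO Source2 VALUES (1, 'Sally'), (3, 'Sally');

        CREATE VIEW MyView AS
            SELECT Id, Name FROM Source1
            UNION
            SELECT Id, Name FROM Source2;

        -- Same view, but without de-duplication (name is an assumption)
        CREATE VIEW MyViewAll AS
            SELECT Id, Name FROM Source1
            UNION ALL
            SELECT Id, Name FROM Source2;
    """)

    # Predicate applied to the view
    via_view = conn.execute(
        "SELECT Id, Name FROM MyView WHERE Name = 'Sally' ORDER BY Id"
    ).fetchall()

    # Predicate written against each base table before the UNION
    pushed_down = conn.execute("""
        SELECT Id, Name FROM Source1 WHERE Name = 'Sally'
        UNION
        SELECT Id, Name FROM Source2 WHERE Name = 'Sally'
        ORDER BY Id
    """).fetchall()

    # UNION ALL skips the de-duplication, so the repeated row survives
    with_all = conn.execute(
        "SELECT Id, Name FROM MyViewAll WHERE Name = 'Sally' ORDER BY Id"
    ).fetchall()

    conn.close()
    return via_view, pushed_down, with_all


via_view, pushed_down, with_all = demo_union_view()
print(via_view == pushed_down)        # both forms are logically equivalent
print(len(with_all) - len(via_view))  # rows that only UNION removes
```

Because the two forms are logically equivalent, an optimizer is free to evaluate the predicate against the base tables; whether it does is what the query plan tells you.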
This is a performance question where I want to combine two columns from two separate tables. How can you do the combination?
I understand this as an OR condition, something like
SELECT a.contract1 or b.contract2 from TABLE1 a, TABLE2 b
where my goal is to get a single column in which each element is either in Contract1 of Table1 or in Contract2 of Table2. The OR notation does not differentiate between distinct values and other values, and I need distinct values. The proposed solution, the UNION method, is slow on large datasets of many GBs because of the underlying DISTINCT.
Please propose efficient methods to deal with the performance.
Input
Column in Table A
1
2
3
Column in Table B
1
3
5
Wanted Output
1
2
3
5
That's what UNION does
SELECT contract1 FROM TABLE1
UNION
SELECT contract2 FROM TABLE2
Edit
The performance problem you mention in your comment is probably caused by the nature of UNION itself: behind the scenes, the DBMS executes both statements separately and then applies a DISTINCT to the combined result set. On large tables this latter step may hurt overall performance, and you can confirm that by switching to UNION ALL (which won't perform the DISTINCT).
If you cannot settle for UNION ALL because you don't want duplicates, I found this interesting article that proposes a solution for this kind of issue. It involves using a table variable that you populate with your two statements and then select from to get the final result.
Essentially the steps are
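DECLARE @Result TABLE (
Contract varchar(50),
-- Example of how to declare a PK within a table variable.
-- IGNORE_DUP_KEY = ON is an assumption added here: without it, a duplicate
-- value makes the INSERT fail instead of being skipped.
PRIMARY KEY ( Contract ) WITH (IGNORE_DUP_KEY = ON)
)
INSERT INTO @Result (Contract)
SELECT Contract1
FROM Table1
INSERT INTO @Result (Contract)
SELECT Contract2
FROM Table2
SELECT *
FROM @Result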
but you can find a more detailed explanation in the article mentioned above.
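For anyone who wants to try the idea without a SQL Server instance, here is a rough analogue in Python's built-in sqlite3 module. It is not the article's code: SQLite has no table variables, so this sketch swaps in a temporary table with a primary key and uses INSERT OR IGNORE to skip duplicates. The helper names are made up, and the sample values are the ones from the question.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory database holding the question's sample values."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Table1 (Contract1 TEXT);
        CREATE TABLE Table2 (Contract2 TEXT);
        INSERT INTO Table1 VALUES ('1'), ('2'), ('3');
        INSERT INTO Table2 VALUES ('1'), ('3'), ('5');
    """)
    return conn


def distinct_contracts(conn):
    """Collect distinct contracts through a keyed staging table."""
    conn.executescript("""
        -- Stand-in for the table variable: the primary key enforces uniqueness
        CREATE TEMP TABLE Result (Contract TEXT PRIMARY KEY);
        -- OR IGNORE skips rows whose key already exists instead of failing
        INSERT OR IGNORE INTO Result SELECT Contract1 FROM Table1;
        INSERT OR IGNORE INTO Result SELECT Contract2 FROM Table2;
    """)
    rows = conn.execute("SELECT Contract FROM Result ORDER BY Contract").fetchall()
    return [row[0] for row in rows]


print(distinct_contracts(make_sample_db()))
```

Whether this beats a plain UNION depends on the engine and the data, so it is worth timing both on a realistic volume before committing to either.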
I have a requirement to check whether a specific user is already referenced in one of our transaction tables (we have around 10 of them). I suggested using a VIEW that contains all the users that are already referenced; the DEV team could then just SELECT from that view to find out whether the data they're looking for is there.
Here's my query for the view:
SELECT DISTINCT user_ID
FROM transaction_table_1
UNION
SELECT DISTINCT user_ID
FROM transaction_table_2
UNION
SELECT DISTINCT user_ID
FROM transaction_table_3
UNION
SELECT DISTINCT user_ID
FROM transaction_table_4
[...]
Right now it works, but is this a good idea? The requirement asks that I provide only a script (or a view), not a stored procedure. I think an SP would be better, since I could do a quick IF EXISTS() check against each table to see whether the given user exists in any of them, but they really want only a script they can run (with no variables).
Can you advise me on a better way to meet this requirement, one with less impact on performance? This may not be the optimal solution.
TIA,
Rommel
Well, you can remove the DISTINCT, because UNION already removes duplicates :)
SELECT user_ID
FROM transaction_table_1
UNION
SELECT user_ID
FROM transaction_table_2
UNION
SELECT user_ID
FROM transaction_table_3
UNION
SELECT user_ID
FROM transaction_table_4
But since you have to use a view, I don't see how to do it differently.
From a performance point of view I would structure the query slightly differently:
SELECT DISTINCT user_ID
FROM (
SELECT user_ID
FROM transaction_table_1
UNION ALL
SELECT user_ID
FROM transaction_table_2
UNION ALL
SELECT user_ID
FROM transaction_table_3
...
) x
This should reduce the de-duplication work to a single pass, rather than one each time a UNION is performed.
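As a quick sanity check that the restructured query returns the same users as the chained UNION, here is a small sketch using Python's built-in sqlite3 module. The sample IDs and helper names are assumptions for illustration. It demonstrates equivalence of results only; any performance difference has to be measured on the real engine and data.

```python
import sqlite3

# Three tables are enough for the illustration; the question has about ten
TABLES = ["transaction_table_1", "transaction_table_2", "transaction_table_3"]


def make_db():
    """Create the tables with hypothetical, partly overlapping user IDs."""
    conn = sqlite3.connect(":memory:")
    sample = {
        "transaction_table_1": [1, 2, 2],
        "transaction_table_2": [2, 3],
        "transaction_table_3": [3, 3, 4],
    }
    for name in TABLES:
        conn.execute(f"CREATE TABLE {name} (user_ID INTEGER)")
        conn.executemany(
            f"INSERT INTO {name} VALUES (?)", [(uid,) for uid in sample[name]]
        )
    return conn


def users_chained_union(conn):
    """One UNION between each pair of SELECTs, as in the original view."""
    sql = " UNION ".join(f"SELECT user_ID FROM {t}" for t in TABLES)
    rows = conn.execute(sql + " ORDER BY user_ID").fetchall()
    return [row[0] for row in rows]


def users_union_all_distinct(conn):
    """UNION ALL inside, a single DISTINCT outside."""
    inner = " UNION ALL ".join(f"SELECT user_ID FROM {t}" for t in TABLES)
    sql = f"SELECT DISTINCT user_ID FROM ({inner}) x ORDER BY user_ID"
    rows = conn.execute(sql).fetchall()
    return [row[0] for row in rows]


conn = make_db()
print(users_chained_union(conn) == users_union_all_distinct(conn))
```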
I am trying to help a co-worker with a peculiar problem, and she's limited to MS SQL QUERY code only. The object is to insert a dummy record (into a surrounding union) IF no records are returned from a query.
I am having a hard time going back and forth between PL/SQL and MS SQL, and I am appealing for help (I'm not particularly appealing, but I am appealing to the StackOverflow audience).
Basically, we need a single, testable value from the target Select ... statement.
In theory, it would do this:
(other records from unions)
Union
Select 'These' as fld1, 'are' as fld2, 'Dummy' as fld3, 'Fields' as fld4
where NOT (Matching Logic)
Union
Select fld1, fld2, fld3, fld4 -- Regular records exist
From tested_table
Where (Matching Logic)
Forcing an individual dummy record, with no conditions, works.
Is there a way to get a single, testable result from a Select?
I can't do it in code (not allowed), but I can feed it SQL.
Anybody? Anybody? Bueller?
You could put the unions in a WITH clause (a common table expression), then add another union branch that returns a null row only when the big union is empty:
; with BigUnion as
(
    select *
    from table1
    union all
    select *
    from table2
)
select *
from BigUnion
union all
-- list one null (or dummy value) per column of BigUnion so the column counts match
select null
where not exists (select * from BigUnion)
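Here is a small runnable sketch of the same pattern using Python's built-in sqlite3 module instead of SQL Server. The two-column tables and the 'Dummy'/'Row' placeholder values are assumptions for illustration. The dummy branch supplies one value per column, and it only produces a row when the common table expression is empty.

```python
import sqlite3

QUERY = """
WITH BigUnion AS (
    SELECT fld1, fld2 FROM table1
    UNION ALL
    SELECT fld1, fld2 FROM table2
)
SELECT fld1, fld2 FROM BigUnion
UNION ALL
-- One placeholder value per column; emitted only if BigUnion has no rows
SELECT 'Dummy', 'Row'
WHERE NOT EXISTS (SELECT 1 FROM BigUnion)
"""


def run_with_rows(rows1, rows2):
    """Load the given rows into table1/table2 and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (fld1 TEXT, fld2 TEXT)")
    conn.execute("CREATE TABLE table2 (fld1 TEXT, fld2 TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows1)
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", rows2)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(run_with_rows([], []))            # only the dummy row
print(run_with_rows([("a", "b")], []))  # only the real row
```

The caller can then test for the placeholder values to tell "no matching records" apart from a real result.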
The Table - Query has 2 columns (functionId, depFunctionId)
I want all values that are either in functionid or in depfunctionid
I am using this:
select distinct depfunctionid from Query
union
select distinct functionid from Query
How to do it better?
I think that's the best you'll get.
That's as good as it gets, I think...
Lose the DISTINCT clauses, as your UNION (vs UNION ALL) will take care of removing duplicates.
An alternative - but perhaps less clear and probably with the same execution plan - would be to do a FULL JOIN across the 2 columns.
-- DISTINCT is needed here: the join alone can return the same value several times
SELECT DISTINCT
COALESCE(Query1.FunctionId, Query2.DepFunctionId) as FunctionId
FROM Query as Query1
FULL OUTER JOIN Query as Query2 ON
Query1.FunctionId = Query2.DepFunctionId
I am almost sure you can lose the DISTINCTs.
When you use UNION instead of UNION ALL, duplicate results are thrown away.
It all depends on how heavy your inline view query is. The key to better performance would be to execute it only once, but that is not possible given the data that it returns.
If you do it like this :
select depfunctionid , functionid from Query
group by depfunctionid , functionid
It is very likely that you'll get repeated results for depfunctionid or functionid.
I may be wrong, but it seems to me that you're trying to retrieve a tree of dependencies. If that's the case, I personally would try a materialized path approach.
If the materialized path is stored in a self-referencing table, I would retrieve the tree using something like
select asrt2.function_id
from a_self_referencig_table asrt1,
a_self_referencig_table asrt2
where asrt1.function_name = 'blah function'
and asrt2.materialized_path like (asrt1.materialized_path || '%')
order by asrt2.materialized_path, asrt2.some_child_node_ordering_column
This would retrieve the whole tree in the proper order. What sucks is having to construct the materialized path based on the function_id and parent_function_id (or in your case, functionid and depfunctionid), but a trigger could take care of it quite easily.
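To show how the materialized path lookup behaves, here is a minimal sketch using Python's built-in sqlite3 module. The table name (functions), the path format ('/1/2/' with a trailing slash) and the sample rows are all assumptions for illustration, and the paths are inserted by hand rather than maintained by a trigger.

```python
import sqlite3


def make_tree_db():
    """Create a hypothetical self-referencing table with precomputed paths."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE functions (
            function_id INTEGER PRIMARY KEY,
            function_name TEXT,
            materialized_path TEXT
        )
    """)
    conn.executemany(
        "INSERT INTO functions VALUES (?, ?, ?)",
        [
            (1, "root function", "/1/"),
            (2, "child a", "/1/2/"),
            (3, "grandchild", "/1/2/3/"),
            (4, "child b", "/1/4/"),
            (5, "unrelated", "/5/"),
        ],
    )
    return conn


def subtree_ids(conn, function_name):
    """Return the IDs of the named node and all its descendants, in path order."""
    rows = conn.execute(
        """
        SELECT node.function_id
        FROM functions AS root
        JOIN functions AS node
          ON node.materialized_path LIKE (root.materialized_path || '%')
        WHERE root.function_name = ?
        ORDER BY node.materialized_path
        """,
        (function_name,),
    ).fetchall()
    return [row[0] for row in rows]


conn = make_tree_db()
print(subtree_ids(conn, "root function"))
print(subtree_ids(conn, "child a"))
```

The trailing slash in each path is a deliberate choice in this sketch: it keeps a prefix such as '/1/' from also matching an unrelated path like '/10/'.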