How to simplify this Sql query

How to simplify this Sql query - sql-server

The Table - Query has 2 columns (functionId, depFunctionId)
I want all values that are either in functionid or in depfunctionid
I am using this:
select distinct depfunctionid from Query
union
select distinct functionid from Query
How to do it better?

I think that's the best you'll get.

Thats as good as it gets I think...

Lose the DISTINCT clauses, as your UNION (vs UNION ALL) will take care of removing duplicates.
An alternative - but perhaps less clear and probably with the same execution plan - would be to do a FULL JOIN across the 2 columns.
SELECT
COALESCE(Query1.FunctionId, Query2.DepFunctionId) as FunctionId
FROM Query as Query1
FULL OUTER JOIN Query as Query2 ON
Query1.FunctionId = Query2.DepFunctionId

I am almost sure you can loose the distinct's.
When you use UNION instead of UNION ALL, duplicated results are thrown away.
It all depends on how heavy your inline view query is. The key for a better perfomance would be to execute only once, but that is not possible given the data that it returns.
If you do it like this :
select depfunctionid , functionid from Query
group by depfunctionid , functionid
It is very likely that you'll get repeated results for depfunctionid or functionid.
I may be wrong, but it seems to me that you're trying to retrieve a tree of dependencies. If thats the case, I personally would try to use a materialized path approach.
If the materialized path is stored in a self referencing table name, I would retrieve the tree using something like
select asrt2.function_id
from a_self_referencig_table asrt1,
a_self_referencig_table asrt2
where asrt1.function_name = 'blah function'
and asrt2.materialized_path like (asrt1.materialized_path || '%')
order by asrt2.materialized_path, asrt2.some_child_node_ordering_column
This would retrieved the whole tree in the proper order. What sucks is having to construct the materialized path based on the function_id and parent_function_id (or in your case, functionid and depfunctionid), but a trigger could take care of it quite easily.

Related

BigQuery GENERATE_UUID() and CTE's

This behavior surprised me a little bit.
When you generate a uuid in a CTE (to make a row id, etc) and reference it in the future you'll find that it changes. It seems that generate_uuid() is being called twice instead of once. Anyone know why this is the case w/ BigQuery and what this is called?
I was using generate_uuid() to create a row_id and was finding that in my eventual joins that no matches were occurring because of this. Best way to get around it I've found is by just creating a table from the first CTE which cements the uuid in place for future use.
Still curious to know more about the why and what behind this.
with _first as (
select generate_uuid() as row_id
)
,_second as (
select * from _first
)
select row_id from _first
union all
select row_id from _second

curious to know more about the why and what behind this
This is by design:
WITH clauses are not materialized. Placing all your queries in WITH clauses and then running UNION ALL is a misuse of the WITH clause.
If a query appears in more than one WITH clause, it executes in each clause.
You can see in documentation - Do not treat WITH clauses as prepared statements

How does a view with union handle where's

Let's say I have a View like this
CREATE VIEW MyView
AS
SELECT Id, Name FROM Source1
UNION
SELECT Id, Name FROM Source2
Then I query the View
SELECT Id, Name From MyView WHERE Name = 'Sally'
Will SQL Server internally first Select from Source1 and Source2 all the Data and then apply the where or will it put the where for each Select statement?

SQL Server can move predicates around as it sees fit in order to optimize a query. Views are effectively macros that are expanded into the body of the query before optimization occurs.
What it will do in any particular case isn't 100% possible to predict - because in SQL, you tell the system what you want, not how to do it.
For a trivial example like this, I would expect it to evaluate the predicate against the base tables and then perform the union, but only an examination of the query plan on your database, with your tables and indexes could answer the question for sure.

Depends on the optimizer, cardinalities, indices available etc but yes it will apply the criteria to base tables where appropriate.
Note that your UNION as oppose to a UNION ALL requires a SORT to remove duplicates.

SQL Server Lookup Functions

Is it possible to build lookup type functions in SQL Server or are these always inferior (performance) to just writing subqueries/joins?
I would like to take some code like this
SELECT
ContactId,
ProductType,
SUM(OrderAmount) TotalOrders
FROM
(
SELECT
ContactId,
ProductType,
OrderAmount
FROM
UserOrders ord
JOIN
(
SELECT
ProductCode,
CASE
--Complex business logic
END ProductType
FROM
ItemTable
) item
ON
item.ProductCode=ord.ProductCode
) a
GROUP BY
ContactId,
ProductType
And instead be able to write a query like this
SELECT
ContactId,
UDF_GET_PRODUCT(ProductCode) ProductType,
SUM(OrderAmount) TotalOrders
FROM
UserOrders
GROUP BY
ContactId,
UDF_GET_PRODUCT(ProductCode)

It is possible, but not quite in the format you have described. Whether it is advisable or not really depends.
I agree with the other answer in that scalar functions are performance killers, and I personally do not use them at all.
That being said I don't think that is a reason to ignore the DRY principle where feasible. i.e. I would not take a short cut
if it had an impact on performance, however I also don't like the idea of having complex logic repeated in multiple places.
When anything changes you then have multiple queries to change, and inevitably some get missed, so if you will be re-using this
logic then it is a good idea to encapsulate it in a single place.
Based on your example perhaps a view would be most appropriate:
CREATE VIEW dbo.ItemTableWithLogic
AS
SELECT ProductCode,
ProductType = <your logic>
FROM ItemTable;
Then you can simply use:
SELECT ord.ContactId, item.ProductType, SUM(ord.OrderAmount) AS TotalOrders
FROM UserOrders AS ord
INNER JOIN dbo.ItemTableWithLogic AS item
ON item.ProductCode=ord.ProductCode
GROUP BY ord.ContactId, item.ProductType;
Which simplifies things somewhat.
Another alternative is an inline table valued function, something like:
CREATE FUNCTION dbo.GetProductType (#ProductCode INT)
RETURNS TABLE
AS
RETURN
( SELECT ProductType = <your logic>
FROM ItemTable
WHERE ProductCode = #ProductCode
);
Which can be called using:
SELECT ord.ContactId, item.ProductType, SUM(ord.OrderAmount) AS TotalOrders
FROM UserOrders AS ord
CROSS APPLY dbo.ItemTableWithLogic(ord.ProductCode) AS item
GROUP BY ord.ContactId, item.ProductType;
My preference is for views over table valued functions, however, it would really depend on your usage as to which I would recommend, so I don't really want to pick a side, I will stick to sitting on the fence.
In summary, If you only need to use the logic in one place, and won't need to reuse it in many queries then just stick to a subquery. If you need to reuse the same logic multiple times, don't use a scalar valued function in the same way you might in a procedural language, but also don't let this rule out other ways of keeping your logic in a single place.

Stick to sub-queries and Joins.
Because it would use a set based approach and execute the inner query once , apply aggregate on to the result set returned from the inner query and return the final result set.
On the other hand if you use a Scalar function like you have shown in your second query, all the code inside the function (sub-query in your original question) will be executed for the each row returned.
Scalar functions are performance killers and should avoid them whenever possible. This is the .net mentality that if you are having to write a piece of a code again and again put it inside a method and call the method, not true for sql server.

Is it okay to use table look up functionalities in scalar function?

In our case we have some business logic that looks into several tables in a certain order, so that the first non null value from one table is used. While the look up is not hard, but it does take several lines of SQL code to accomplish. I have read about scalar valued functions in SQL Server, but don't know if the re-compliation issue affects me enough to do it in a less convenient way.
So what's the general rule of thumb?
Would you rather have something like
select id, udfGetFirstNonNull(id), from mytable
Or is table-valued functions any better than scalar?
select id,
(select firstNonNull from udfGetFirstNonNull(id)) as firstNonNull
from myTable

The scalar udf will look up for each row in myTable which can run exponentially longer as data increases. Effectively you have a CURSOR. If you have a few rows, it won't matter of course.
I do the same myself where I don't expect a lot of rows (more than a few hundred).
However, I would consider a table value function where I've placed "foo" here. "foo" could also be a CTE in a UDF too (not tested):
select id,
(select firstNonNull from udfGetFirstNonNull(id)) as firstNonNull
from
myTable M
JOIN
(SELECT value, id as firstNonNull
FROM OtherTable
WHERE value IS NOT NULL
GROUP BY id
ORDER BY value) foo ON M.id = foo.id

Your first query is fine. One place I work for is absolutely obsessed with speed and optimization, and they use UDF's heavily in this way.

I think for readibility and maintainability, I would prefer to use the scalar function, as that is what it is returning.

How to add more OR searches with CONTAINS Brings Query to Crawl?

I have a simple query that relies on two full-text indexed tables, but it runs extremely slow when I have the CONTAINS combined with any additional OR search. As seen in the execution plan, the two full text searches crush the performance. If I query with just 1 of the CONTAINS, or neither, the query is sub-second, but the moment you add OR into the mix the query becomes ill-fated.
The two tables are nothing special, they're not overly wide (42 cols in one, 21 in the other; maybe 10 cols are FT indexed in each) or even contain very many records (36k recs in the biggest of the two).
I was able to solve the performance by splitting the two CONTAINS searches into their own SELECT queries and then UNION the three together. Is this UNION workaround my only hope?
SELECT a.CollectionID
FROM collections a
INNER JOIN determinations b ON a.CollectionID = b.CollectionID
WHERE a.CollrTeam_Text LIKE '%fa%'
OR CONTAINS(a.*, '"*fa*"')
OR CONTAINS(b.*, '"*fa*"')
Execution Plan:

I'd be curious to see if a LEFT JOIN to an equivalent CONTAINSTABLE would perform any better. Something like:
SELECT a.CollectionID
FROM collections a
INNER JOIN determinations b ON a.CollectionID = b.CollectionID
LEFT JOIN CONTAINSTABLE(a, *, '"*fa*"') ct1 on a.CollectionID = ct1.[Key]
LEFT JOIN CONTAINSTABLE(b, *, '"*fa*"') ct2 on b.CollectionID = ct2.[Key]
WHERE a.CollrTeam_Text LIKE '%fa%'
OR ct1.[Key] IS NOT NULL
OR ct2.[Key] IS NOT NULL

I was going to suggest to UNION each as their own query, but as I read your question I saw that you have found that. I can't think of a better way, so if it helps use it. The UNION method is a common approach to a poor performing query that has several OR conditions where each performs well on its own.

I would probably use the UNION. If you are really against it, you might try something like:
SELECT a.CollectionID
FROM collections a
LEFT OUTER JOIN (SELECT CollectionID FROM collections WHERE CONTAINS(*, '"*fa*"')) c
ON c.CollectionID = a.CollectionID
LEFT OUTER JOIN (SELECT CollectionID FROM determinations WHERE CONTAINS(*, '"*fa*"')) d
ON d.CollectionID = a.CollectionID
WHERE a.CollrTeam_Text LIKE '%fa%'
OR c.CollectionID IS NOT NULL
OR d.CollectionID IS NOT NULL

We've experience the exact same problem and at the time, put it down to our query being badly formed - that SQL 2005 had let us get away with it, but 2008 wouldn't.
In the end, we split the query into 2 SELECTs that were called using an IF. Glad someone else has had the same problem and that it's a known issue. We were seeing queries on a table with ~150,000 rows + full-text going from < 1 second (2005) to 30+ seconds (2008).