Which of the following queries is better? This is just an example; there are numerous situations where I want the user name to be displayed instead of the UserID.
Select EmailDate, B.EmployeeName as [UserName], EmailSubject
from Trn_Misc_Email as A
inner join
Mst_Users as B on A.CreatedUserID = B.EmployeeLoginName
or
Select EmailDate, GetUserName(CreatedUserID) as [UserName], EmailSubject
from Trn_Misc_Email
If there is no performance benefit to using the first, I would prefer the second... I will have around 2,000 records in the user table and 100k records in the email table...
Thanks
A good question and great to be thinking about SQL performance, etc.
From a pure SQL point of view the first is better. The first statement is able to do everything in a single set-based command with a join. In the second, for each row in Trn_Misc_Email it has to run a separate select to get the user name. This could cause a performance issue now, or in the future.
It is also easier to read for anyone else coming onto the project, as they can see what is happening. With the second one, you've then got to go and look in the function (I'm guessing that's what it is) to find out what it is doing.
So in reality, two reasons to use the first approach.
The inline SQL JOIN will usually be better than the scalar UDF as it can be optimised better.
When testing it, though, be sure to use SQL Profiler to view the cost of both versions. SET STATISTICS IO ON doesn't report the cost of scalar UDFs in its figures, which would make the scalar UDF version appear better than it actually is.
Scalar UDFs are very slow, but inline ones are much faster, typically as fast as joins and subqueries.
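For example, if GetUserName is a scalar UDF doing a name lookup, an inline table-valued version used with OUTER APPLY usually optimises much better. A sketch, guessing at the function body and parameter type from the question:

CREATE FUNCTION dbo.GetUserNameInline (@CreatedUserID varchar(50)) -- parameter type is a guess
RETURNS TABLE
AS
RETURN
    SELECT B.EmployeeName AS UserName
    FROM Mst_Users AS B
    WHERE B.EmployeeLoginName = @CreatedUserID
GO

SELECT EmailDate, U.UserName AS [UserName], EmailSubject
FROM Trn_Misc_Email
OUTER APPLY dbo.GetUserNameInline(CreatedUserID) AS U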
BTW, your query with the function call is equivalent to an outer join, not to an inner one: if an email row has no matching user, the function simply returns NULL instead of filtering the row out.
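In other words, to preserve the semantics of the function version (rows kept even when no matching user exists), the join form of your query would need to be:

Select EmailDate, B.EmployeeName as [UserName], EmailSubject
from Trn_Misc_Email as A
left outer join Mst_Users as B on A.CreatedUserID = B.EmployeeLoginName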
One more tip to help you: in SQL Server Management Studio you can evaluate performance via Display Estimated Execution Plan. It shows how the indexes and joins are used, so you can choose the best way to structure the query.
You can also use the DTA (Database Engine Tuning Advisor) for more information and optimization suggestions.
Related
I have two tables that I want to join; they both have an index on the column I am trying to join on.
QUERY 1
SELECT * FROM [A] INNER JOIN [B] ON [A].F = [B].F;
QUERY 2
SELECT * FROM (SELECT * FROM [A]) [A1] INNER JOIN (SELECT * FROM [B]) [B1] ON [A1].F = [B1].F;
The first query clearly will utilize the index; what about the second one?
My guess is that after the two SELECT statements in the brackets are executed, the join would occur, and the index wouldn't help to speed up the query, because the result is pretty much a new table.
The query isn't executed quite so literally as you suggest, where the inner queries are executed first and then their results are combined with the outer query. The optimizer will take your query and will look at many possible ways to get your data through various join orders, index usages, etc. etc. and come up with a plan that it feels is optimal enough.
If you execute both queries and look at their respective execution plans, I think you will find that they use the exact same one.
Here's a simple example of the same concept. I created my schema as so:
CREATE TABLE A (id int, value int)
CREATE TABLE B (id int, value int)
INSERT INTO A (id, value)
VALUES (1,900),(2,800),(3,700),(4,600)
INSERT INTO B (id, value)
VALUES (2,800),(3,700),(4,600),(5,500)
CREATE CLUSTERED INDEX IX_A ON A (id)
CREATE CLUSTERED INDEX IX_B ON B (id)
And ran queries like the ones you provided.
SELECT * FROM A INNER JOIN B ON A.id = B.id
SELECT * FROM (SELECT * FROM A) A1 INNER JOIN (SELECT * FROM B) B1 ON A1.id = B1.id
The plans that were generated were identical for both queries: as the screenshots showed, both utilize the index.
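If you want to check this for yourself without the graphical plans, you can have SQL Server return the plans as text instead of executing the queries (SET SHOWPLAN_TEXT has to be alone in its batch, hence the GO separators):

SET SHOWPLAN_TEXT ON
GO
SELECT * FROM A INNER JOIN B ON A.id = B.id
GO
SELECT * FROM (SELECT * FROM A) A1 INNER JOIN (SELECT * FROM B) B1 ON A1.id = B1.id
GO
SET SHOWPLAN_TEXT OFF
GO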
Chances are high that the SQL Server Query Optimizer will be able to detect that Query 2 is in fact the same as Query 1 and use the same indexed approach.
Whether this happens depends on a lot of factors: your table design, your table statistics, the complexity of your query, etc. If you want to know for certain, let SQL Server Query Analyzer show you the execution plan. Here are some links to help you get started:
Displaying Graphical Execution Plans
Examining Query Execution Plans
SQL Server uses predicate pushing (a.k.a. predicate pushdown) to move query conditions as far toward the source tables as possible. It doesn't slavishly do things in the order you parenthesize them. The optimizer uses complex rules--what is essentially a kind of geometry--to determine the meaning of your query, and restructure its access to the data as it pleases in order to gain the most performance while still returning the same final set of data that your query logic demands.
When queries become more and more complicated, there is a point where the optimizer cannot exhaustively search all possible execution plans and may end up with something that is suboptimal. However, you can pretty much assume that a simple case like you have presented is going to always be "seen through" and optimized away.
So the answer is that you should get just as good performance as if the two queries were combined. Now, if the values you are joining on are composite, that is they are the result of a computation or concatenation, then you are almost certainly not going to get the predicate push you want that will make the index useful, because the server won't or can't do a seek based on a partial string or after performing reverse arithmetic or something.
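To illustrate with the A/B schema from the earlier answer: the first join below can seek on the clustered index, while the second usually cannot, because the computation hides the column from the index.

SELECT * FROM A INNER JOIN B ON A.id = B.id       -- seek-friendly
SELECT * FROM A INNER JOIN B ON A.id = B.id + 1   -- the expression usually defeats the seek on B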
May I suggest that in the future, before asking questions like this here, you simply examine the execution plan for yourself to validate that it is using the index? You could have answered your own question with a little experimentation. If you still have questions, then come post, but in the meantime try to do some of your own research as a sign of respect for the people who are helping you.
To see execution plans, in SQL Server Management Studio (2005 and up) or SQL Query Analyzer (SQL 2000) you can just click the "Show Execution Plan" button on the menu bar, run your query, and switch to the tab at the bottom that displays a graphical version of the execution plan. Some little poking around and hovering your mouse over various pieces will quickly show you which indexes are being used on which tables.
However, if things aren't as you expect, don't automatically think that the server is making a mistake. It may decide that scanning your main table without using the index costs less--and it will almost always be right. There are many reasons that scanning can be less expensive, one of which is a very small table, another of which is that the number of rows the server statistically guesses it will have to return exceeds a significant portion of the table.
Both of these queries are the same; the second will be transformed into the same form as the first during query optimization.
However, if you have a more specific requirement, I would suggest you post the whole code. Then it would be much easier to answer your question.
A few days ago I wrote a query and it executed quickly, but nowadays it takes 1 hour.
This query runs on my SQL 7 server and takes about 10 seconds.
The same query exists on another SQL 7 server, and until last week it also took about 10 seconds.
The configuration of both servers is the same; only the hardware is different.
Now, on the second server this query takes about 30 minutes to extract the same details, and nobody has changed anything.
If I execute this query without the WHERE clause, it shows me the details in 7 seconds. Could the WHERE clause be the problem?
Without seeing the query, and probably the data, I can't do a lot other than offer tips.
Can you put more constraints on the query? If you can reduce the amount of data involved, this will speed up the query.
Look at the columns used in your joins, WHERE and HAVING clauses, and ORDER BY. Check that the tables those columns belong to have indexes on those columns.
Do you need to use the user-defined function, or can it be done another way?
Are you using subqueries? If so, can these be pulled out into separate views?
Hope this helps.
Without knowing how much data is going into your tables, and not knowing your schema, it's hard to give a definitive answer but things to look at:
Try running UPDATE STATISTICS or DBCC DBREINDEX.
Do you have any indexes on the tables? If not, try adding indexes to columns used in WHERE clauses and JOIN predicates.
Avoid cross-table OR clauses (i.e., WHERE table1.col1 = @somevalue OR table2.col2 = @someothervalue). SQL Server can't use indexes effectively with this construct, and you may get better performance by splitting the query into two and UNION'ing the results (see the sketch after these tips).
What do your functions (UDFs) do and how are you using them? It's worth noting that dropping them into the column list of a query gets expensive, as the function is executed per row returned: if a function does a select against the database, you end up running n + 1 queries (where n = number of rows returned by the main select). Try to engineer the function out if possible.
Make sure your JOINs are correct: where you're using a LEFT JOIN, revisit the logic and see if it needs to be a LEFT JOIN or whether it can be turned into an INNER JOIN. Sometimes people use LEFT JOINs, but when you examine the logic in the rest of the query it can become apparent that the LEFT JOIN gives you nothing (because, for example, someone may have added a WHERE col IS NOT NULL predicate against the joined table). INNER JOINs can be faster, so it's worth reviewing all of these.
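For example, the OR-across-tables split mentioned above might look like this (table, column, and join names are invented for illustration):

DECLARE @somevalue int, @someothervalue int
SET @somevalue = 1
SET @someothervalue = 2

-- Often slow: no single index covers a predicate spanning two tables
SELECT t1.col1, t2.col2
FROM table1 t1
INNER JOIN table2 t2 ON t2.t1_id = t1.id
WHERE t1.col1 = @somevalue OR t2.col2 = @someothervalue

-- Frequently faster: two index-friendly queries, UNION'd to remove duplicates
SELECT t1.col1, t2.col2
FROM table1 t1
INNER JOIN table2 t2 ON t2.t1_id = t1.id
WHERE t1.col1 = @somevalue
UNION
SELECT t1.col1, t2.col2
FROM table1 t1
INNER JOIN table2 t2 ON t2.t1_id = t1.id
WHERE t2.col2 = @someothervalue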
It would be a lot easier to suggest things if we could see the query.
I've got some SQL that looks roughly like this:
with InterestingObjects(ObjectID, OtherInformation, Whatever) as (
select X.ObjectID, Y.OtherInformation, Z.Whatever
from X join Y join Z -- abbreviated for brevity
)
-- ...long query follows, which uses InterestingObjects in several more CTEs,
-- and then uses those CTEs in a select statement at the end.
When I run it, I can see in the execution plan that it appears to be running the query in the CTE basically every single time the CTE is referenced. If I instead create a temp table #InterestingObjects and use it, of course, it runs the query once, puts the result in the temp table, and queries that from then on. In my particular instance, that makes the whole thing run much faster.
My question is: Is this always what I can expect from CTEs (not memoizing the results in any way, just as if it were inlining the query everywhere?) Is there a reason that SQL Server could not optimize this better? Usually I am in awe at how smart the optimizer is, but I'm surprised that it couldn't figure this out.
(edit: BTW, I'm running this on SQL Server '08 R2.)
CTEs can be better or worse, depending on how they're used (recursion, indexing, etc.). You might find this article interesting: http://www.sqlservercentral.com/articles/T-SQL/2926/
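A minimal repro of the difference the asker describes, using a throwaway schema (all names here are mine):

-- Throwaway schema just to make the behaviour visible
CREATE TABLE #Source (id int, val int)
INSERT INTO #Source (id, val) VALUES (1, 10), (2, 20)

-- CTE form: SQL Server inlines the CTE, so #Source is read once per reference
;WITH Expensive AS (
    SELECT id, val FROM #Source
)
SELECT a.id, b.val
FROM Expensive a
INNER JOIN Expensive b ON a.id = b.id

-- Temp-table form: the expensive part runs once, then the stored rows are reused
SELECT id, val INTO #Expensive FROM #Source
SELECT a.id, b.val
FROM #Expensive a
INNER JOIN #Expensive b ON a.id = b.id

DROP TABLE #Expensive
DROP TABLE #Source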
I'm having a performance issue with a select statement I'm executing.
Here it is:
SELECT Material.*
FROM Material
INNER JOIN LineInfo ON Material.LineInfoCtr = LineInfo.ctr
INNER JOIN Order_Header ON LineInfo.Order_HeaderCtr = Order_Header.ctr
WHERE (Order_Header.jobNum = 'ttest')
AND (Order_Header.revision_number = 0)
AND (LineInfo.lineNum = 46)
The statement is taking 5-10 seconds to execute depending on server load.
Some table stats:
- Material has 2,030,xxx records.
- Lineinfo has 190,xxx records
- Order_Header has 2,5xx records.
My statement is returning a total of 18 rows containing about 20-25 fields of data. Returning a single field or all of them makes no difference. Is this performance typical? Is there something I could do to improve it?
I've tried using a sub-select to retrieve the foreign key, and the IN clause, and I found one post where a fella said using a left outer join helped him. For me, they all yield the same 5 to 10 seconds of execution time.
This is MS SQL server 2005 accessed through MS SQL management studio. Times are the elapsed time in query analyzer.
Any ideas?
The first thing you should do is analyze the query plan, to see what indexes (if any) SQL Server is using.
You can probably benefit from some covering indexes in this query, since you only use columns in Lineinfo and Order_Header for the join and the query restriction (the WHERE clause).
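For example, something along these lines might serve as covering indexes here (the index names are mine; the key and included columns are taken from the query above):

CREATE NONCLUSTERED INDEX IX_OrderHeader_Job ON Order_Header (jobNum, revision_number) INCLUDE (ctr)
CREATE NONCLUSTERED INDEX IX_LineInfo_OrderLine ON LineInfo (Order_HeaderCtr, lineNum) INCLUDE (ctr)
CREATE NONCLUSTERED INDEX IX_Material_LineInfoCtr ON Material (LineInfoCtr)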
I do not see anything special in your query, so if the indexes are correct it should perform much faster than that; the number of rows is not very high.
Do you have indexes on the tables involved in the query, and have you tried the "display execution plan" option of the Query Analyzer? Basically you need to run the query, look at the execution plan, and add indexes so that you do not see any full table scan operation.
If you run it from SQL Server Management Studio, you have the option to tune the query automatically by adding indexes, but I would suggest trying to optimize it on your own to better understand what you're doing.
Regards
Massimo
It won't affect performance, but don't write a query such as "SELECT * FROM X". Eschew the star notation and spell out the individual columns; that way, the code that calls this will still work even if the schema is changed by adding a column.
Indexes are key here, as others have already said.
The order of the WHERE clauses can help. Execute the one that eliminates the greatest number of rows from consideration first.
Taking all the suggestions and rolling them together, I was able to set up some indexes, and now it's taking less than a second to execute. Honestly, it's almost immediate.
My problem was that by clicking on the table properties I saw that the primary key was indexed, and I mistakenly thought that was what everyone had been talking about. I looked at the execution plan, ran the tuning assistant, and, putting the two together, realized that you can index the foreign keys too. That is now done and things are exceptionally snappy.
Thanks for the help, and sorry for such a newb question.
I have a query that has been running every day for a little over 2 years now and has typically taken less than 30 seconds to complete. All of a sudden, yesterday, the query started taking 3+ hours to complete and was using 100% CPU the entire time.
The SQL is:
SELECT
    @id,
    alpha.A, alpha.B, alpha.C,
    beta.X, beta.Y, beta.Z,
    alpha.P, alpha.Q
FROM
    [DifferentDatabase].dbo.fnGetStuff(@id) beta
    INNER JOIN vwSomeData alpha ON beta.id = alpha.id
alpha.id is a BIGINT type and beta.id is an INT type. dbo.fnGetStuff() is a simple SELECT statement with 2 INNER JOINs on tables in the same DB, using a WHERE id = @id. The function returns approximately 11000 results.
The view vwSomeData is a simple SELECT statement with two INNER JOINs that returns about 590000 results.
Both the view and the function will complete in less than 10 seconds when executed by themselves. Selecting the results of the function into a temporary table first and then joining on that makes the query finish in < 10 seconds.
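For reference, the fast temp-table version looks like this:

SELECT * INTO #stuff FROM [DifferentDatabase].dbo.fnGetStuff(@id)

SELECT
    @id,
    alpha.A, alpha.B, alpha.C,
    beta.X, beta.Y, beta.Z,
    alpha.P, alpha.Q
FROM #stuff beta
INNER JOIN vwSomeData alpha ON beta.id = alpha.id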
How do I troubleshoot what's going on? I don't see any locks in Activity Monitor.
Look at the query plan. My guess is that there is a table scan (or more than one) in the execution plan. This will cause huge amounts of I/O for the few records you get in the result.
You could use the SQL Server Profiler tool to monitor what queries are running on SQL Server. It doesn't show the locks, but it can for instance also give you hints on how to improve your query by suggesting indexes.
If you've got a reasonably recent version of SQL Server Management Studio, it has a Database Engine Tuning Advisor as well, under Tools. It takes a trace from the Profiler and makes some, sometimes highly useful, suggestions. Make sure there aren't too many queries in the trace - it takes a long time to build advice.
I'm not an expert on it, but have had some luck with it in the past.
Do you need to use a function? Can you rewrite the entire thing into a stored procedure in which you pass in the @id as a parameter?
Even if your table has indexes, because you pass @id as a variable into the WHERE clause, the indexes may not be used, potentially greatly increasing the amount of time the query takes to run.
The reason the indexes may not be used is that the Query Optimizer does not know the value of the variable when it selects an access method to perform the query. Because this is a batch, only one pass is made over the Transact-SQL code, preventing the Query Optimizer from knowing what it needs to know in order to select an access method that uses the indexes.
You might want to consider an INDEX query hint if you cannot re-write the SQL.
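A minimal sketch of an index hint, with a hypothetical table and index name:

DECLARE @value int
SET @value = 42

SELECT col1
FROM dbo.SomeTable WITH (INDEX (IX_SomeTable_col1)) -- forces this index; names are placeholders
WHERE col1 = @value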
It might also be possible, since this just started happening, that the indexes have become fragmented and need to be rebuilt.
I've had similar problems with joining functions that return large datasets. I had to do what you've already suggested. Put the results in a temp table and join on that.
Look at the estimated plan, this will probably shed some light. Typically when query cost gets orders of magnitude more expensive it is because a loop or merge join is being used where a hash join is more appropriate. If you see a loop or merge join in the estimated plan, look at the number of rows it expects to process - is it far smaller than the number of rows you know will actually be in play? You can also specify a hint to use a hash join and see if it performs much better. If so, try updating statistics and see if it goes back to a hash join without a hint.
SELECT
    @id,
    alpha.A, alpha.B, alpha.C,
    beta.X, beta.Y, beta.Z,
    alpha.P, alpha.Q
FROM
    [DifferentDatabase].dbo.fnGetStuff(@id) beta
    INNER HASH JOIN vwSomeData alpha ON beta.id = alpha.id
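And to refresh the statistics that the optimizer bases its row estimates on (the table name here is a placeholder; run it against the tables under the view and function):

UPDATE STATISTICS dbo.SomeUnderlyingTable WITH FULLSCAN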
Having no idea what type of schema is in place, and just trying to throw out ideas:
Like others have said, use Profiler to find the source of pain, but I'm thinking it is the function in the other database. Since that function might be the source of the pain, have you thought about a little denormalization in [DifferentDatabase]? I think you'll find a bit more scalability in joining to a flattened, indexed table than in a costly function.
Run this command:
SET SHOWPLAN_ALL ON
Then run your query. It will display the execution plan; look for a "SCAN" on an index or a table. That is most likely what is happening to your query now. If that is the case, try to figure out why it is not using the indexes (refresh statistics, etc.).
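Note that SET SHOWPLAN_ALL has to be the only statement in its batch, and while it is ON the statements that follow are compiled but not executed. A typical pattern (the query is a placeholder):

SET SHOWPLAN_ALL ON
GO
SELECT * FROM dbo.SomeTable WHERE SomeColumn = 42 -- placeholder for your query
GO
SET SHOWPLAN_ALL OFF
GO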