Multiple Joins in SQL query - sql-server

I'm trying to run a query across multiple tables, and I'm having an issue with it taking over 10 minutes to return just 3 records. The query is as follows:
select TOP 100 pm_entity_type_name, year(event_date),
pm_event_type_name, pm_event_name, pm_entity_name,
pm_entity_code, event_priority, event_cost
from pm_event_priority, pm_entity, pm_entity_type, pm_event_type, pm_event
where pm_event.pm_event_id = pm_event_priority.pm_event_id
And pm_entity.pm_entity_id = pm_event_priority.pm_entity_id
And pm_entity_type.pm_entity_type_id = pm_entity.pm_entity_type_id
And pm_event_type.pm_event_type_id = pm_event_priority.pm_event_type_id
And ( pm_entity.pm_entity_type_id = '002LEITUU0005T8EX40001XFTEW000000OZX' OR
pm_entity_type.parent_id= '002LEITUU0005T8EX40001XFTEW000000OZX' )
ORDER BY 1,2,3
I wonder, is there any way I can modify this query to possibly make the query a little faster?

Query performance can tank when you have to join many large tables together, particularly when the join columns are not properly indexed. In your case, I suspect your tables are quite large (many rows) and the _id columns are not indexed.
If you are using SQL Server Management Studio, you can click on "Display Estimated Execution Plan" to see how the query optimizer is interpreting your query. If you see a bunch of Table Scans rather than Index Scans/Seeks, this means SQL Server has to read through each and every row in your tables; a performance nightmare! Try putting some indexes on the _id columns of each table (perhaps a clustered index), and/or using Database Engine Tuning Advisor to automatically recommend the best index structure to apply to your tables to improve this query's performance.
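A minimal sketch of what such indexes might look like, using the join and filter columns from the query (the index names here are invented; verify the final design against the execution plan or the tuning advisor):
CREATE NONCLUSTERED INDEX IX_pm_event_priority_event ON pm_event_priority (pm_event_id);
CREATE NONCLUSTERED INDEX IX_pm_event_priority_entity ON pm_event_priority (pm_entity_id);
CREATE NONCLUSTERED INDEX IX_pm_event_priority_event_type ON pm_event_priority (pm_event_type_id);
CREATE NONCLUSTERED INDEX IX_pm_entity_entity_type ON pm_entity (pm_entity_type_id);
CREATE NONCLUSTERED INDEX IX_pm_entity_type_parent ON pm_entity_type (parent_id);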

You need to look at the query plan. See this question on how to obtain it.
Once you have a query plan, see if you can tell from the list what is so slow. Chances are there are table scans because of missing indexes.
What happens if you take out the TOP 100 and do a SELECT * using the same criteria? If you get a ridiculous amount of data back, there may be missing join criteria.
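As an aside, rewriting the query with explicit JOIN syntax (semantically identical to the original, using only the predicates from the question) makes a missing join predicate much easier to spot:
select TOP 100 pm_entity_type_name, year(event_date),
pm_event_type_name, pm_event_name, pm_entity_name,
pm_entity_code, event_priority, event_cost
from pm_event_priority
inner join pm_event on pm_event.pm_event_id = pm_event_priority.pm_event_id
inner join pm_entity on pm_entity.pm_entity_id = pm_event_priority.pm_entity_id
inner join pm_entity_type on pm_entity_type.pm_entity_type_id = pm_entity.pm_entity_type_id
inner join pm_event_type on pm_event_type.pm_event_type_id = pm_event_priority.pm_event_type_id
where pm_entity.pm_entity_type_id = '002LEITUU0005T8EX40001XFTEW000000OZX'
or pm_entity_type.parent_id = '002LEITUU0005T8EX40001XFTEW000000OZX'
ORDER BY 1, 2, 3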

Related

is index still effective after data has been selected?

I have two tables that I want to join; both have an index on the column I am joining on.
QUERY 1
SELECT * FROM [A] INNER JOIN [B] ON [A].F = [B].F;
QUERY 2
SELECT * FROM (SELECT * FROM [A]) [A1] INNER JOIN (SELECT * FROM [B]) [B1] ON [A1].F = [B1].F;
The first query will clearly utilize the index; what about the second one?
My thinking is that after the two SELECT statements in the brackets are executed, the join would occur, and the index wouldn't help speed up the query because the join input is pretty much a new table.
The query isn't executed quite so literally as you suggest, where the inner queries are executed first and then their results are combined with the outer query. The optimizer will take your query and will look at many possible ways to get your data through various join orders, index usages, etc. etc. and come up with a plan that it feels is optimal enough.
If you execute both queries and look at their respective execution plans, I think you will find that they use the exact same one.
Here's a simple example of the same concept. I created my schema like so:
CREATE TABLE A (id int, value int)
CREATE TABLE B (id int, value int)
INSERT INTO A (id, value)
VALUES (1,900),(2,800),(3,700),(4,600)
INSERT INTO B (id, value)
VALUES (2,800),(3,700),(4,600),(5,500)
CREATE CLUSTERED INDEX IX_A ON A (id)
CREATE CLUSTERED INDEX IX_B ON B (id)
And ran queries like the ones you provided.
SELECT * FROM A INNER JOIN B ON A.id = B.id
SELECT * FROM (SELECT * FROM A) A1 INNER JOIN (SELECT * FROM B) B1 ON A1.id = B1.id
The plans generated for the two queries were identical: both utilized the clustered index.
Chances are high that the SQL Server Query Optimizer will be able to detect that Query 2 is in fact the same as Query 1 and use the same indexed approach.
Whether this happens depends on a lot of factors: your table design, your table statistics, the complexity of your query, etc. If you want to know for certain, let SQL Server Query Analyzer show you the execution plan. Here are some links to help you get started:
Displaying Graphical Execution Plans
Examining Query Execution Plans
SQL Server uses predicate pushing (a.k.a. predicate pushdown) to move query conditions as far toward the source tables as possible. It doesn't slavishly do things in the order you parenthesize them. The optimizer uses complex rules (essentially a kind of geometry) to determine the meaning of your query, and restructures its access to the data as it pleases in order to gain the most performance while still returning the same final set of data that your query logic demands.
When queries become more and more complicated, there is a point where the optimizer cannot exhaustively search all possible execution plans and may end up with something that is suboptimal. However, you can pretty much assume that a simple case like you have presented is going to always be "seen through" and optimized away.
So the answer is that the nested form should perform just as well as the plain join. However, if the values you are joining on are composite, that is, the result of a computation or concatenation, then you are almost certainly not going to get the predicate push you want, because the server won't or can't perform a seek based on a partial string or after reversing arithmetic; the index goes unused, as in the sketch below.
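A hypothetical illustration of the difference (all table and column names here are invented):
-- Sargable join: an index on Customers(customer_code) can be used for a seek.
SELECT o.order_id
FROM Orders o
INNER JOIN Customers c ON c.customer_code = o.customer_code;
-- Non-sargable join: wrapping the indexed column in a function hides it from
-- the optimizer, so the index on customer_code cannot be seeked and the
-- server falls back to a scan.
SELECT o.order_id
FROM Orders o
INNER JOIN Customers c ON UPPER(c.customer_code) = o.customer_code;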
May I suggest that in the future, before asking questions like this here, you simply examine the execution plan for yourself to validate that it is using the index? You could have answered your own question with a little experimentation. If you still have questions, then come post, but in the meantime try to do some of your own research as a sign of respect for the people who are helping you.
To see execution plans, in SQL Server Management Studio (2005 and up) or SQL Query Analyzer (SQL 2000) you can just click the "Show Execution Plan" button on the menu bar, run your query, and switch to the tab at the bottom that displays a graphical version of the execution plan. Some little poking around and hovering your mouse over various pieces will quickly show you which indexes are being used on which tables.
However, if things aren't as you expect, don't automatically think that the server is making a mistake. It may decide that scanning your main table without using the index costs less--and it will almost always be right. There are many reasons that scanning can be less expensive, one of which is a very small table, another of which is that the number of rows the server statistically guesses it will have to return exceeds a significant portion of the table.
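If you prefer a textual plan to the graphical one described above, a sketch like this works in both tools (shown against the A/B tables from the earlier answer):
SET SHOWPLAN_TEXT ON;
GO
-- The query is compiled but not executed; its plan is returned instead.
SELECT * FROM A INNER JOIN B ON A.id = B.id;
GO
SET SHOWPLAN_TEXT OFF;
GO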
Both queries are the same; the second will be transformed into the first during query optimization.
However, if you have a more specific requirement, I would suggest you post the whole code. It would then be much easier to answer your question.

Oracle 11g: Index not used in "select distinct"-query

My question concerns Oracle 11g and the use of indexes in SQL queries.
In my database, there is a table that is structured as followed:
CREATE TABLE tab (
  "rowid" NUMBER(11),  -- quoted: ROWID is a reserved word in Oracle
  unique_id_string VARCHAR2(2000),
  year NUMBER(4),
  dynamic_col_1 NUMBER(11),
  dynamic_col_1_text NVARCHAR2(2000)
) TABLESPACE tabspace_data;
I have created two indexes:
CREATE INDEX Index_dyn_col1 ON tab (dynamic_col_1, dynamic_col_1_text) TABLESPACE tabspace_index;
CREATE INDEX Index_unique_id_year ON tab (unique_id_string, year) TABLESPACE tabspace_index;
The table contains around 1 to 2 million records. I extract the data from it by executing the following SQL command:
SELECT distinct
"sub_select"."dynamic_col_1" "AS_dynamic_col_1","sub_select"."dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM
(
SELECT "tab".* FROM "tab"
where "tab".year = 2011
) "sub_select"
Unfortunately, the query needs around 1 hour to execute, although I created both of the indexes described above.
The explain plan shows a "TABLE ACCESS FULL", i.e. a full table scan. Why is the index not used?
As an experiment, I tested the following SQL command:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
Even in this case, the index is not used and a full table scan is performed.
In my real database, the table contains more indexed columns like "dynamic_col_1" and "dynamic_col_1_text".
The whole index file has a size of about 50 GB.
A few more details:
The database is Oracle 11g installed on my local computer.
I use Windows 7 Enterprise 64bit.
The whole index is split over 3 .dbf files totaling about 50 GB.
I would really be glad if someone could tell me how to make Oracle use the index in the first query.
Because the first query is used by another program to extract the data from the database, it can hardly be changed. So it would be good to tweak the table instead.
Thanks in advance.
[01.10.2011: UPDATE]
I think I've found the solution to the problem. Both columns dynamic_col_1 and dynamic_col_1_text are nullable. After altering the table to prohibit NULL values in both columns and adding a new index solely for the column year, Oracle performs a fast full index scan.
The query now takes about 5 seconds to execute instead of 1 hour.
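For context: an Oracle B-tree index stores no entry for rows in which all indexed columns are NULL, so the optimizer cannot answer a query from the index alone unless it knows no rows are missing from it; declaring the columns NOT NULL provides exactly that guarantee. A sketch of the changes described above, using the names from the question:
ALTER TABLE tab MODIFY (dynamic_col_1 NOT NULL, dynamic_col_1_text NOT NULL);
CREATE INDEX Index_year ON tab (year) TABLESPACE tabspace_index;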
Are you sure that an index access would be faster than a full table scan? As a very rough estimate, full table scans are 20 times faster than reading an index. If tab has more than 5% of the data in 2011, it's not surprising that Oracle would use a full table scan. And as @Dan and @Ollie mentioned, with year as the second column this will make the index even slower.
If the index really is faster, then the issue is probably bad statistics. There are hundreds of ways the statistics could be bad. Very briefly, here's what I'd look at first:
Run an explain plan with and without an index hint. Are the cardinalities off by 10x or more? Are the times off by 10x or more?
If the cardinality is off, make sure there are up-to-date stats on the table and index, and that you're using a reasonable ESTIMATE_PERCENT (DBMS_STATS.AUTO_SAMPLE_SIZE is almost always best for 11g); see the sketch after this list.
If the time is off, check your workload statistics.
Are you using parallelism? Oracle always assumes a near linear improvement for parallelism, but on a desktop with one hard drive you probably won't see any improvement at all.
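A sketch of refreshing the table and index statistics with AUTO_SAMPLE_SIZE, as suggested above (assumes the table lives in your own schema):
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,
    tabname          => 'TAB',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE  -- gather statistics for the indexes as well
  );
END;
/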
Also, this isn't really relevant to your problem, but you may want to avoid using quoted identifiers. Once you use them you have to use them everywhere, and it generally makes your tables and queries painful to work with.
Your index should be:
CREATE INDEX Index_year
ON tab (year)
TABLESPACE tabspace_index;
Also, your query could just be:
SELECT DISTINCT
dynamic_col_1 "AS_dynamic_col_1",
dynamic_col_1_text "AS_dynamic_col_1_text"
FROM tab
WHERE year = 2011;
If your index was created solely for this query, though, you could also include the two fetched columns in it; the optimiser would then not need to visit the table at all, since it could retrieve the data directly from the index, making your query more efficient again.
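A minimal sketch of such a covering index (the index name is invented):
CREATE INDEX Index_year_covering
ON tab (year, dynamic_col_1, dynamic_col_1_text)
TABLESPACE tabspace_index;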
Hope it helps...
I don't have an Oracle instance on hand so this is somewhat guesswork, but my inclination is to say it's because you have the compound index in the wrong order. If you had year as the first column in the index it might use it.
Your second test query:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
would not use the index because you have no WHERE clause, so you're asking Oracle to read every row in the table. In that situation the full table scan is the faster access method.
Also, as other posters have mentioned, your index on YEAR has it in the second column. Oracle can use this index by performing a skip scan, but there is a performance hit for doing so, and depending on the size of your table Oracle may just decide to use the FTS again.
I don't know if it's relevant, but I tested the following query:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
WHERE "dynamic_col_1" = 123 AND "dynamic_col_1_text" = 'abc'
The explain plan for that query shows that Oracle uses an index scan in this scenario.
The columns dynamic_col_1 and dynamic_col_1_text are nullable. Does this have an effect on the usage of the index?
Try this:
1) Create an index on the year field (see Ollie's answer).
2) Then use this query:
SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab
WHERE id IN (SELECT id FROM tab WHERE year = 2011)
or
SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab
WHERE id IN (SELECT id FROM tab WHERE year = 2011)
GROUP BY dynamic_col_1, dynamic_col_1_text
Maybe it will help you.

SQL query gets too slow

A few days ago I wrote a query, and it used to execute quickly, but nowadays it takes 1 hour.
This query runs on my SQL 7 server and takes about 10 seconds.
The same query exists on another SQL 7 server, and until last week it took about 10 seconds there too.
The configuration of both servers is the same; only the hardware is different.
Now, on the second server, this query takes about 30 minutes to extract the same details, yet nobody has changed anything.
If I execute this query without the WHERE clause, it shows me the details in 7 seconds.
With the WHERE clause the query still takes about the same long time, so the WHERE clause seems to be the problem.
Without seeing the query and probably the data I can't do a lot other than offer tips.
Can you put more constraints on the query? If you can reduce the amount of data involved, this will speed up the query.
Look at the columns used in your joins, WHERE and HAVING clauses, and ORDER BY. Check that the tables those columns belong to have indexes on them.
Do you need to use the user-defined function, or can it be done another way?
Are you using subqueries? If so, can these be pulled out into separate views?
Hope this helps.
Without knowing how much data is going into your tables, and without knowing your schema, it's hard to give a definitive answer, but here are some things to look at:
Try running UPDATE STATISTICS or DBCC DBREINDEX.
Do you have any indexes on the tables? If not, try adding indexes to columns used in WHERE clauses and JOIN predicates.
Avoid cross-table OR clauses (i.e., WHERE table1.col1 = @somevalue OR table2.col2 = @someothervalue). SQL can't use indexes effectively with this construct, and you may get better performance by splitting the query into two and UNION'ing the results, as sketched after this list.
What do your functions (UDFs) do and how are you using them? It's worth noting that dropping them in the columns part of a query gets expensive as the function is executed per row returned: thus if a function does a select against the database, then you end up running n + 1 queries against the database (where n = number of rows returned in the main select). Try and engineer the function out if possible.
Make sure your JOINs are correct -- where you're using a LEFT JOIN, revisit the logic and see if it needs to be a LEFT or whether it can be turned into an INNER JOIN. Sometimes people use LEFT JOINs, but when you examine the logic in the rest of the query, it can sometimes be apparent that the LEFT JOIN gives you nothing (because, for example, someone may had added a WHERE col IS NOT NULL predicate against the joined table). INNER JOINs can be faster, so it's worth reviewing all of these.
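A minimal sketch of the OR-to-UNION rewrite mentioned above (all table and column names are invented):
DECLARE @somevalue int = 1, @someothervalue int = 2;  -- types assumed for the sketch
SELECT t1.col1, t2.col2
FROM table1 t1
INNER JOIN table2 t2 ON t2.table1_id = t1.id
WHERE t1.col1 = @somevalue
UNION
SELECT t1.col1, t2.col2
FROM table1 t1
INNER JOIN table2 t2 ON t2.table1_id = t1.id
WHERE t2.col2 = @someothervalue;
Note that UNION (rather than UNION ALL) removes the duplicates that rows matching both predicates would otherwise produce, matching the semantics of the original OR.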
It would be a lot easier to suggest things if we could see the query.

SQL Server ORDER BY Performance Aberration

SQL Server 2008 running on Windows Server Enterprise(?) Edition 2008
I have a query joining against twenty some-odd tables (mostly LEFT OUTER JOINs). The full dataset returned by an unfiltered query returns less than 1,000 rows in less than 1s. When I apply a WHERE clause to filter the query it returns less than 300 rows in less than 1s.
When I apply an ORDER BY clause to the query it returns in 90s.
I examined the results of the query and notice a number of NULL results returned in the column that is being used to sort. I modified the query to COALESCE a NULL value to a valid search value without any change to the performance of the query.
I then did a
SELECT * FROM
(
my query goes here
) qry
ORDER BY myOrderByHere
And that produced the same results.
When I SELECT ... INTO #tempTable (without the ORDER BY) and then SELECT FROM the #tempTable with the ORDER BY, the query returns in less than 1s.
What is really strange at this point is that the SELECT... INTO will also take 90s even without the ORDER BY.
The Execution Plan says that the SORT is taking 98% of the execution time when included with the main query. If I am doing the INSERT INTO, the explain plan says that the actual insert into the temp table takes 99% of the execution time.
And to rule out server issues, I have run the same tests on two different instances of SQL Server 2008 with nearly identical results.
Many thanks!
rjsjr
Sounds like something strange is going on with your tempdb. Inserting 1000 rows in a temporary table should be fast, whether it's an implicit spool for sorting, or an explicit select into.
Check the size of your tempdb, the health of the hard disk it's on, and its recovery model (it should be simple, not full or bulk-logged).
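A quick sketch for checking both (works on SQL Server 2005 and up):
-- File sizes in tempdb (size is reported in 8 KB pages).
SELECT name, physical_name, size * 8 / 1024 AS size_mb
FROM tempdb.sys.database_files;
-- Recovery model of tempdb.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'tempdb';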
A sort operation is usually an expensive step in the query. So, it's not surprising that the addition of the sort adds time. You may be seeing similar results when you incorporate a temp table in your steps. The sort operation in your original query may use tempdb to help do the sort, and that can be the time-consuming step in each query you compare.
If you want to learn more about each query you're running, you can review query plan outputs.

Select statement performance

I'm having a performance issue with a select statement I'm executing.
Here it is:
SELECT Material.*
FROM Material
INNER JOIN LineInfo ON Material.LineInfoCtr = LineInfo.ctr
INNER JOIN Order_Header ON LineInfo.Order_HeaderCtr = Order_Header.ctr
WHERE (Order_Header.jobNum = 'ttest')
AND (Order_Header.revision_number = 0)
AND (LineInfo.lineNum = 46)
The statement is taking 5-10 seconds to execute depending on server load.
Some table stats:
- Material has 2,030,xxx records.
- Lineinfo has 190,xxx records
- Order_Header has 2,5xx records.
My statement is returning a total of 18 rows containing about 20-25 fields of data. Returning a single field or all of them makes no difference. Is this performance typical? Is there something I could do to improve it?
I've tried using a sub-select to retrieve the foreign key, using the IN clause, and (per one post where a fella said it helped him) a left outer join. For me, they all yield the same 5 to 10 seconds of execution time.
This is MS SQL server 2005 accessed through MS SQL management studio. Times are the elapsed time in query analyzer.
Any ideas?
The first thing you should do is analyze the query plan, to see what indexes (if any) SQL Server is using.
You can probably benefit from some covering indexes in this query, since you only use columns in Lineinfo and Order_Header for the join and the query restriction (the WHERE clause).
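For example (a sketch only; the index names are invented, and the exact key order should be confirmed against the actual plan):
CREATE NONCLUSTERED INDEX IX_Order_Header_job
ON Order_Header (jobNum, revision_number) INCLUDE (ctr);
CREATE NONCLUSTERED INDEX IX_LineInfo_order
ON LineInfo (Order_HeaderCtr, lineNum) INCLUDE (ctr);
CREATE NONCLUSTERED INDEX IX_Material_lineinfo
ON Material (LineInfoCtr);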
I do not see anything special in your query, so if the indexes are correct it should perform much faster than that; the number of rows is not very high.
Do you have indexes on the tables involved in the query, and have you tried the "display execution plan" option of the Query Analyzer? Basically you need to run the query, look at the execution plan, and add indexes so that you do not see any full table scan operation.
If you run it from SQL Management Studio then you have the option to tune the query automatically by adding indexes, but I would suggest trying to optimize it on your own to better understand what you're doing.
Regards
Massimo
It won't affect performance, but don't write a query such as "SELECT * FROM X". Eschew the star notation and spell out the individual columns. The code that calls this will still work that way, even if the schema is changed by adding a column.
Indexes are key here, as others have already said.
The order of the WHERE clauses can help. Execute the one that eliminates the greatest number of rows from consideration first.
Taking all suggestions and rolling them together I was able to setup some indexes and now it's taking less than a second to execute. Honestly, it's almost immediate.
My problem was that by clicking on the table properties I saw that the primary key was indexed and I mistakenly thought that's what everyone had been talking about. I looked at the execution plan and ran the tuning assistant and putting the two together, I realized that you could index the foreign keys too. That is now done and things are exceptionally snappy.
Thanks for the help, and sorry for such a newb question.
