SQL Multiple Table Join - Best Optimization - sql-server

Hi, I am hoping someone can help my SQL theory out. I have to create a set of reports which use joins across multiple tables. These reports are running far slower than I would like, and I am hoping to optimize my SQL, although my knowledge has hit a wall and I can't seem to find anything on Google.
I am hoping someone here can give me some best practice guidance.
Essentially I am trying to filter the result set as it comes back, to reduce the number of rows included in later joins:
SELECT ...
FROM Items
INNER JOIN BlueItems ON Items.ItemID = BlueItems.ItemID AND BlueItems.shape = 'square'
LEFT JOIN ItemHistory ON Items.ItemID = ItemHistory.ItemsID
LEFT JOIN ItemDates ON Items.ItemID = ItemDates.ItemID
WHERE ItemDates.ManufactureDate BETWEEN '01/01/2017' AND '01/05/2017'
I figure that inner joining on BlueItems rows that are squares vastly reduces the data set at this point?
I also understand that the WHERE clause is intelligent enough to reduce the data set at run time? Am I mistaken? Is it returning all the data and then just filtering on it?
Any guidance on how to speed this kind of query up would be fantastic; indexes and such have already been put in place. Unfortunately the database is actually looked after by someone else, and I am simply creating reports based on their database. This limits me to optimizing my queries rather than the data itself.
I guess at this point it's time for me to try to improve my knowledge of how SQL handles the various ways you can filter data, and to understand which actually reduce the dataset used and which simply filter on it. Any guidance would be very appreciated!
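A side note on the theory (an editorial aside, not from the original thread): with an INNER JOIN it makes no difference whether a filter sits in the ON clause or the WHERE clause, but with a LEFT JOIN it does. A WHERE predicate on a column of the left-joined table rejects the NULL rows the join preserved, silently turning it into an inner join, which is exactly what the WHERE on ItemDates.ManufactureDate above does. A minimal sketch of the two placements:

-- Filter in ON: keeps every Items row, with NULLs where no ItemDates row matched
SELECT Items.ItemID
FROM Items
LEFT JOIN ItemDates ON Items.ItemID = ItemDates.ItemID
    AND ItemDates.ManufactureDate BETWEEN '01/01/2017' AND '01/05/2017'

-- Filter in WHERE: NULL dates fail the predicate, so this behaves as an INNER JOIN
SELECT Items.ItemID
FROM Items
LEFT JOIN ItemDates ON Items.ItemID = ItemDates.ItemID
WHERE ItemDates.ManufactureDate BETWEEN '01/01/2017' AND '01/05/2017'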

You mentioned that the primary keys are all indexed, but that is always the case for primary key fields. The only portion of your current query which could possibly benefit from those indexes is the first join, against Items. The other joins and the WHERE clause do not use the primary key fields.
For this particular query, I would suggest the following indices:
CREATE INDEX bi_item_idx ON BlueItems (ItemID, shape);
CREATE INDEX ih_item_idx ON ItemHistory (ItemID);
CREATE INDEX id_idx ON ItemDates (ItemID, ManufactureDate);
For the ItemHistory table, the index ih_item_idx should speed up the join involving the ItemID foreign key. A column by the same name is also involved with the other two joins, and hence is part of the other indices. The reason for the composite indices (i.e. indices involving more than one column) is that we want to cover all the columns which appear in either the join or the WHERE clause.

These comments are not really an answer, but they are too big to put in a comment...
If the dates are being passed in as parameters (I'm guessing they are), then it might be parameter sniffing that is causing the issue: the query may be using a bad plan.
I've seen this a lot, especially when using the BETWEEN operator. A quick thing to try is adding OPTION (RECOMPILE) to the end of your query, as sketched below. This might seem counterintuitive, but just try it: although compiled queries should be faster than recompiling, a bad plan can slow things down A LOT.
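A minimal sketch of that hint applied to the original query, assuming the dates arrive as parameters named @StartDate and @EndDate (hypothetical names):

SELECT Items.ItemID
FROM Items
INNER JOIN BlueItems ON Items.ItemID = BlueItems.ItemID AND BlueItems.shape = 'square'
LEFT JOIN ItemHistory ON Items.ItemID = ItemHistory.ItemsID
LEFT JOIN ItemDates ON Items.ItemID = ItemDates.ItemID
WHERE ItemDates.ManufactureDate BETWEEN @StartDate AND @EndDate
OPTION (RECOMPILE) -- compile a fresh plan for the actual parameter values each run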
Also, if ItemDates is big, try dumping your filtered results to a temp table and joining to that, something like:
SELECT * INTO #id FROM ItemDates i WHERE i.ManufactureDate BETWEEN '01/01/2017' AND '01/05/2017'
Then change your main query to something like:
SELECT *
FROM Items
JOIN BlueItems ON Items.ItemID = BlueItems.ItemID AND BlueItems.shape = 'square'
JOIN #id i ON Items.ItemID = i.ItemID
LEFT JOIN ItemHistory ON Items.ItemID = ItemHistory.ItemsID
I also changed the join from a LEFT JOIN to a JOIN (implicitly an INNER JOIN), as you are only selecting items which have a match in ItemDates, so a LEFT join makes no sense.

Related

How to compare tables for possible combinations to match people

I have two tables: a table that we call "The Vault" (Table B), which has a ton of demographic data, and Table A, which has some of the demographic data. I am new at SQL Server, and I was tasked with finding a list of 21 students in The Vault table (Table B). The problem: there is no primary key or anything distinctive besides FirstName, LastName, BirthMonth, Birthday, Birthyear.
Goal: We could not match these people in the conventional way, so we are attempting a Hail Mary to see which of these shared column combinations will perhaps land us a match.
What I have tried: I have placed both tables into temp tables, Table A and Table B, and I am trying to do an INNER JOIN, but then I realized that in order to make it work I would have to write a crazy join statement (see the code below).
But the problem, as you can imagine, is that it brings back a lot more than my current 21 records (it is in the thousands), so I would have to make that join statement even longer, but I am not sure this is the right way to do this. I believe that is where the WHERE clause would come in, no?
Question: How do I compare these two tables for possible matches using a WHERE clause, where I can mix and match different columns without having to put all the constraints in the ON clause of the JOIN? I don't want to JOIN on something like 6 different columns. Is there another way to do this, so I can perhaps learn? I understand this is easier when there is a shared primary key to use as the JOIN criteria, but I have never compared two tables just to find possible matches.
FROM #TableA a
INNER JOIN #TableB b ON (a.LAST_NAME = b.LAST_NAME AND a.FIRST_NAME = b.FIRST_NAME.....)
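A common approach for this kind of approximate matching (my sketch, not from the original thread; it assumes the temp tables are #TableA and #TableB with the columns listed above) is to anchor the join on one selective column and then score how many of the other columns agree:

SELECT a.FIRST_NAME, a.LAST_NAME, b.FIRST_NAME AS VaultFirstName, b.LAST_NAME AS VaultLastName,
       -- count how many fields agree, so close matches float to the top
       CASE WHEN a.FIRST_NAME = b.FIRST_NAME THEN 1 ELSE 0 END
     + CASE WHEN a.BirthMonth = b.BirthMonth THEN 1 ELSE 0 END
     + CASE WHEN a.Birthday   = b.Birthday   THEN 1 ELSE 0 END
     + CASE WHEN a.Birthyear  = b.Birthyear  THEN 1 ELSE 0 END AS MatchScore
FROM #TableA a
INNER JOIN #TableB b
    ON a.LAST_NAME = b.LAST_NAME -- anchor on one column to keep the row count manageable
WHERE CASE WHEN a.FIRST_NAME = b.FIRST_NAME THEN 1 ELSE 0 END
    + CASE WHEN a.BirthMonth = b.BirthMonth THEN 1 ELSE 0 END
    + CASE WHEN a.Birthday   = b.Birthday   THEN 1 ELSE 0 END
    + CASE WHEN a.Birthyear  = b.Birthyear  THEN 1 ELSE 0 END >= 3 -- require most fields to agree
ORDER BY MatchScore DESC;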

Oracle Implicit Partition Pruning

I am trying to optimize a long-running stored procedure at my company. From checking the query plan, it looks like we could make some nice gains by writing the query to allow for better partition pruning. The trouble is, it seems like doing so would create a very verbose query. Essentially, we have a bunch of tables that have a foreign key to client and "sub-client". In many cases, data is not shared between clients/sub-clients so we partitioned on those IDs for each table. Here's a sample query to show what I mean:
SELECT ...
FROM CLIENT_PRODUCT cp
INNER JOIN ORDER o ON o.product_id = cp.id
INNER JOIN PRICE_HISTORY ph on ph.product_id = cp.id
WHERE cp.id = ?
All of the tables have a foreign key that references a client and sub-client. The same client product cannot belong to two different clients or sub-clients. (Sorry, this example uses made-up tables and is a bit contrived.)
I can improve partition pruning by doing the following:
SELECT ...
FROM CLIENT_PRODUCT cp
INNER JOIN ORDER o ON o.product_id = cp.id and o.client_id = l_client_id and o.sub_client_id = l_sub_client_id
INNER JOIN PRICE_HISTORY ph on ph.product_id = cp.id and ph.client_id = l_client_id and ph.sub_client_id = l_sub_client_id
WHERE cp.id = ? and cp.client_id = l_client_id and cp.sub_client_id = l_sub_client_id
With this change, I explicitly say which partition Oracle can look at for each join. This feels pretty gross, though, because I've added a bunch of mostly repeated SQL that doesn't functionally change what is returned. The same pattern would need to be applied to many joins (more than in the example).
I know that our application has an invariant that any Order for a Product must belong to the same Client and Sub-Client. Likewise, any Price-History item must belong to the same Client and Sub-Client as the Product. The same idea applies to many pairs of tables. In an ideal world, Oracle would be able to infer the Client and Sub-Client for each join from the other tables in the join because of that invariant. It does not seem to be doing that (and I understand that my specific invariant does not apply to everyone). Is there a way I can get Oracle to do this implicit partition pruning without me needing to add all those additional conditions? It seems like that would add a lot of value across the codebase and remove the need for all these "unnecessary" explicit joins.
There's also the possibility that I'm just totally overthinking / misunderstanding this so other suggestions would be great.
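For what it's worth, one Oracle feature aimed at exactly this situation is reference partitioning (11g and later): a child table declared PARTITION BY REFERENCE inherits the parent's partitioning through an enforced NOT NULL foreign key, so pruning the parent implicitly prunes the child without extra join predicates. A hedged sketch using the made-up tables above (note it requires reorganizing the child tables, which may not be practical):

-- "ORDER" is a reserved word in Oracle, so the sketch uses ORDERS
CREATE TABLE orders (
    id            NUMBER PRIMARY KEY,
    product_id    NUMBER NOT NULL,
    client_id     NUMBER NOT NULL,
    sub_client_id NUMBER NOT NULL,
    CONSTRAINT orders_product_fk
        FOREIGN KEY (product_id) REFERENCES client_product (id)
)
PARTITION BY REFERENCE (orders_product_fk);

-- With this in place, WHERE cp.id = ? prunes CLIENT_PRODUCT's partitions, and
-- ORDERS rows are co-located with their parent row, so the join is pruned too.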

TSQL Joining three tables where none have a common column

This will probably be pretty simple, as I am very much a novice.
I have two tables I have imported from Excel, and pretty much I need to update an existing table of e-mail addresses based off of the email addresses from the spreadsheet.
My only issue is I cannot join on a common column, as none of the tables share a column.
So I am wondering if, in a Join, I can just put something like
FROM table a
INNER JOIN table b ON b.column 'name' = a.column 'nameplus'
Any help would be appreciated!
A join without matching predicates can effectively be implemented by a cross join: i.e. every row in table A matched with every row in table B.
If you specify an INNER JOIN then you have to have an ON term, which either matches something or it doesn't: in your example you may have a technical match (i.e. b.column really does - perhaps totally coincidentally - match a.column) that makes no business sense.
So you either have
a CROSS JOIN: no way of linking the tables but the result is all possible combinations of rows
Or:
an inner join where you must specify how the rows are to be combined (or a left/right outer join if you want to include all rows from either side, regardless of matched-ness)
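A minimal sketch of both options, with hypothetical table and column names (Users / EmailImport and their columns are stand-ins of mine, not from the question):

-- Option 1: CROSS JOIN; every imported row paired with every existing row
SELECT u.Email, i.NewEmail
FROM Users u
CROSS JOIN EmailImport i

-- Option 2: INNER JOIN on an expression; the ON clause does not have to
-- compare raw columns, it can compare derived values
SELECT u.Email, i.NewEmail
FROM Users u
INNER JOIN EmailImport i
    ON u.FirstName + '.' + u.LastName = i.MailboxName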

join versus explicit in condition

Are there valid reasons, in an Oracle DB, to prefer in a generic query a filter condition expressed via a join table, instead of a filter with an IN condition containing a large number of elements (some hundreds)? I mean, if you can write something like
SELECT .... FROM t1
WHERE t1. IN (......) -- with 100-200 items
or if it is better to change it to
SELECT .... FROM t1
JOIN t2 ON t1. = T2.
where the t2 table contains the values needed for the filter
many thanks
Thanks for the answers
Let me try to explain the situation and my doubt.
I have a user interface where the user can choose many items in a control (for example, one or more people from a list of professionals).
I can use this list directly by putting it in an IN condition, that is:
SELECT .... FROM t1 WHERE t1. IN (p1... p200)
but this solution could raise some problems:
- if the selected items are numerous, the string can exceed the SQL string length limit (I remember Oracle had a limit of 4000 bytes)
- an IN condition with many values may be inefficient
So an alternative solution could be:
1. create a temporary table with the selected items
2. use a join between the temporary table and the main table
Filling a temporary table is usually fast, and my question is whether this second solution is more efficient than the first (see the sketch below).
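A minimal sketch of that temporary-table approach in Oracle (the table and column names are mine, and it assumes a numeric key):

-- One-time DDL: a global temporary table whose rows are private to each session
CREATE GLOBAL TEMPORARY TABLE selected_items (
    id NUMBER PRIMARY KEY
) ON COMMIT PRESERVE ROWS; -- rows survive until the session ends

-- Per request: load the user's selection (one bind per item, or a batch insert)...
INSERT INTO selected_items (id) VALUES (:selected_id);

-- ...then join instead of building a long IN list
SELECT t1.*
FROM t1
JOIN selected_items s ON t1.id = s.id;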
The two queries are not functionally equivalent, so the question is somewhat odd; I can't imagine this comes up very often (if ever).
That said, if you have a table that contains exactly the rows that need to be filtered, a JOIN would be a more natural/standard way to handle it.
Is the idea in the first example to query t2 to get all the values, then add them to a collection and generate an IN clause? If so, I would say that would be a very bad practice.
From what I see, there are two different questions.
a) Using a Static List/table.
If the (100-200) item list is a list of static values, e.g. a list of countries or currencies, I think it would be better to put them in a static/parameter table and change the query to use that table instead. If you need to track a new code/country etc. later, all you need to do is insert a new row into the lookup table.
Also, if there are other queries that use the same conditions (and there usually are), this lookup table will promote reuse. For example:
select * from t1 where id in (select id from t2);
and
select * from t1,t2
where t1.id = t2.id
are both equivalent and better than
select * from t1 where
id in ('USD','EUR'..... ); -- 100 to 200 items to track.
b) The choice of Join vs IN:
It really does not matter much. The final query that Oracle executes will be the transformed version of your query, which might evaluate to the same query in both cases.
You should pick whichever of the two queries is easier to read and conveys the intention more clearly.
Useful Link : http://explainextended.com/2009/09/30/in-vs-join-vs-exists-oracle/

Joining against views in SQLServer with strange query optimizer behavior

I have a complex view that I use to pull a list of primary keys indicating rows in a table that have been modified between two points in time.
This view has to query 13 related tables and look at a changelog table to determine whether an entity is "dirty" or not.
Even with all of this going on, doing a simple query:
select * from vwDirtyEntities;
Takes only 2 seconds.
However, if I change it to
select
e.Name
from
Entities e
inner join vwDirtyEntities de
on e.Entity_ID = de.Entity_ID
This takes 1.5 minutes.
However, if I do this:
declare #dirtyEntities table
(
Entity_id uniqueidentifier
)
insert into @dirtyEntities
select * from vwDirtyEntities;
select
e.Name
from
Entities e
inner join @dirtyEntities de
on e.Entity_ID = de.Entity_ID
I get the same results in only 2 seconds.
This leads me to believe that SQL Server is evaluating the view per row when it is joined to Entities, instead of constructing a query plan that merges the single inner join above with the other joins inside the view.
Note that I want to join against the full result set from this view, as it filters out only the keys I want internally.
I know I could make it into a materialized view, but this would involve schema binding the view and its dependencies, and I don't like the overhead that maintaining the index would cause (this view is only queried for exports, while there are far more writes to the underlying tables).
So, aside from using a table variable to cache the view results, is there any way to tell SQL Server to cache the view while evaluating the join? I tried changing the join order (selecting from the view and joining against Entities), but that did not make any difference.
The view itself is also very efficient, and there is no room to optimize there.
There is nothing magical about a view. It's a macro that expands: the optimizer decides, when the view is JOINed, how to expand it into the main query.
I'll address other points in your post:
you have ruled out an indexed view. A view can only be a discrete entity when it is indexed
SQL Server will never do a RBAR query on its own. Only developers can write loops.
there is no concept of caching: every query uses latest data unless you use temp tables
you insist on using the view, which you've decided is very efficient, but you have no idea how views are treated by the optimizer, and it touches 13 tables
SQL is declarative: join order usually does not matter
Many serious DB developers don't use views because of limitations like this: they are not reusable because they are macros
Edit: another possibility is predicate pushing on SQL Server 2005. That is, SQL Server cannot push the JOIN condition "deeper" into the view.
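If caching really is what's needed, a #temp table rather than a table variable may be worth a try (my suggestion, not part of the answer above). Unlike table variables, temp tables carry column statistics, which can give the outer join a better plan when the view returns many rows:

-- Materialize the view once; #temp tables have statistics, @table variables do not
select Entity_ID
into #dirtyEntities
from vwDirtyEntities;

select e.Name
from Entities e
inner join #dirtyEntities de
    on e.Entity_ID = de.Entity_ID;

drop table #dirtyEntities;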
