Probably more information is needed, but this is really odd. Using SQL 2005 I am executing an Inner Join on two tables. If I rename one of the tables (using Alter Table), the resulting Query Plan is significanly longer. There are views on the table, but the Inner Join is using the base table, not any of the views. Does this make sense? Is this to be expected?
Related
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
INNER JOIN versus WHERE clause — any difference?
What is the difference between an INNER JOIN query and an implicit join query (i.e. listing multiple tables after the FROM keyword)?
For example, given the following two tables:
CREATE TABLE Statuses(
id INT PRIMARY KEY,
description VARCHAR(50)
);
INSERT INTO Statuses VALUES (1, 'status');
CREATE TABLE Documents(
id INT PRIMARY KEY,
statusId INT REFERENCES Statuses(id)
);
INSERT INTO Documents VALUES (9, 1);
What is the difference between the below two SQL queries?
From the testing I've done, they return the same result. Do they do the same thing? Are there situations where they will return different result sets?
-- Using implicit join (listing multiple tables)
SELECT s.description
FROM Documents d, Statuses s
WHERE d.statusId = s.id
AND d.id = 9;
-- Using INNER JOIN
SELECT s.description
FROM Documents d
INNER JOIN Statuses s ON d.statusId = s.id
WHERE d.id = 9;
There is no reason to ever use an implicit join (the one with the commas). Yes for inner joins it will return the same results. However, it is subject to inadvertent cross joins especially in complex queries and it is harder for maintenance because the left/right outer join syntax (deprecated in SQL Server, where it doesn't work correctly right now anyway) differs from vendor to vendor. Since you shouldn't mix implicit and explict joins in the same query (you can get wrong results), needing to change something to a left join means rewriting the entire query.
If you do it the first way, people under the age of 30 will probably chuckle at you, but as long as you're doing an inner join, they produce the same result and the optimizer will generate the same execution plan (at least as far as I've ever been able to tell).
This does of course presume that the where clause in the first query is how you would be joining in the second query.
This will probably get closed as a duplicate, btw.
The nice part of the second method is that it helps separates the join condition (on ...) from the filter condition (where ...). This can help make the intent of the query more readable.
The join condition will typically be more descriptive of the structure of the database and the relation between the tables. e.g., the salary table is related to the employee table by the EmployeeID column, and queries involving those two tables will probably always join on that column.
The filter condition is more descriptive of the specific task being performed by the query. If the query is FindRichPeople, the where clause might be "where salaries.Salary > 1000000"... thats describing the task at hand, not the database structure.
Note that the SQL compiler doesn't see it that way... if it decides that it will be faster to cross join and then filter the results, it will cross join and filter the results. It doesn't care what is in the ON clause and whats in the WHERE clause. But, that typically wont happen if the on clause matches a foreign key or joins to a primary key or indexed column. As far as operating correctly, they are identical; as far as writing readable, maintainable code, the second way is probably a little better.
there is no difference as far as I know is the second one with the inner join the new way to write such statements and the first one the old method.
The first one does a Cartesian product on all record within those two tables then filters by the where clause.
The second only joins on records that meet the requirements of your ON clause.
EDIT: As others have indicated, the optimization engine will take care of an attempt on a Cartesian product and will result in the same query more or less.
A bit same. Can help you out.
Left join vs multiple tables in SQL (a)
Left join vs multiple tables in SQL (b)
In the example you've given, the queries are equivalent; if you're using SQL Server, run the query and display the actual exection plan to see what the server's doing internally.
why can't we use Order By clause while creating the view. What is the reason behind SQL supporting Order by clause with TOP clause mentioned in the query and not supporting the same without TOP clause
A view is nothing but a virtual table and the order in which data is stored in a table can never be guaranteed in any RDBMS.
What you will need to do is:
SELECT <Column1>,<Column2>,....,<ColumnN>
FROM <MyView>
ORDER BY <MyColumn>
Because the tsql is relational and view is a relation and the relation doesn't have order.
In SQL, a view is a virtual table based on the result-set of an SQL statement. A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database.
You can add SQL functions, WHERE, and JOIN statements to a view and present the data as if the data were coming from one single table.
For ordering the resulted data you need to query it and apply order by clause as per your requirement.
I created indexed view (clustered unique index on Table1_ID) view with such T-SQL:
Select Table1_ID, Count_BIG(*) as Table2TotalCount from Table2 inner join
Table1 inner join... where Table2_DeletedMark=0 AND ... Group BY Table1_ID
Also after creating the view, we set clustered unique index on column Table1_ID.
So View consists of two columns:
Table1_ID
Table2TotalCount
T-Sql for creating View is a heavy because of group by and several millions of rows in Table2.
But when I run a query to a view like
Select Total2TotalCount from MyView where Table1_ID = k
- it executes fast and without overhead for server.
Also in t-sql for creating view many conditions in where clause for Table2 columns. And
If I changed Table2_DeletedMark to 1 and run a query
Select Total2TotalCount from MyView where Table1_ID = k
again - I'll get correct results. (Table2TotalCount decreased by 1).
So our questions are:
1. Why does query execution time decreased so much when we used Indexed View (compare to without view using (even we run DBCC DROPCLEANBUFFERS() before executing query to VIEW))
2. After changing
Table2_DeletedMark
View immediately recalculated and we get correct results, but what is the process behind? we can't imagine that sql executes t-sql by what view was generated each time we changes any values of 10+ columns containing in the t-sql view generating, because it is too heavy.
We understand that it is enough to run a simple query to recalculate values, depends on columns values we changing.
But how does sql understand it?
An indexed view is materialized e.g. its rows that it contains (from the tables it depends on) are physically stored on disk - much like a "system-computed" table that's always kept up to date whenever its underlying tables change. This is done by adding the clustered index - the leaf pages of the clustered index on a SQL Server table (or view) are the data pages, really.
Columns in an indexed view can be indexed with non-clustered indexes, too, and thus you can improve query performance even more. The down side is: since the rows are stored, you need disk space (and some data is duplicated, obviously).
A normal view on the other hand is just a fragment of SQL that will be executed to compute the results - based on what you select from that view. There's no physical representation of that view, there are no rows stored for a regular view - they need to be joined together from the base tables as needed.
Why do you think there are so many bizarre rules on what's allowed in indexed views, and what the base tables are allowed to do? It's so that the SQL engine can immediately know "If I'm touching this row, it potentially affects the result of this view - let's see, this row no longer fits the view criteria, but I insisted on having a COUNT_BIG(*), so I can just decrement that value by one"
I have two tables, one has about 1500 records and the other has about 300000 child records. About a 1:200 ratio. I stage the parent table to a staging table, SomeParentTable_Staging, and then I stage all of it's child records, but I only want the ones that are related to the records I staged in the parent table. So I use the below query to perform this staging by joining with the parent tables staged data.
--Stage child records
INSERT INTO [dbo].[SomeChildTable_Staging]
([SomeChildTableId]
,[SomeParentTableId]
,SomeData1
,SomeData2
,SomeData3
,SomeData4
)
SELECT [SomeChildTableId]
,D.[SomeParentTableId]
,SomeData1
,SomeData2
,SomeData3
,SomeData4
FROM [dbo].[SomeChildTable] D
INNER JOIN dbo.SomeParentTable_Staging I ON D.SomeParentTableID = I.SomeParentTableID;
The execution plan indicates that the tables are being joined with a Nested Loop. When I run just the select portion of the query without the insert, the join is performed with Hash Match. So the select statement is the same, but in the context of an insert it uses the slower nested loop. I have added non-clustered index on the D.SomeParentTableID so that there is an index on both sides of the join. I.SomeParentTableID is a primary key with clustered index.
Why does it use a nested loop for inserts that use a join? Is there a way to improve the performance of the join for the insert?
A few thoughts:
Make sure your statistics are up to date. Bad statistics account for many of the bizarre "intermittent" query plan problems.
Make sure your indexes are covering, otherwise there's a much higher probability of the optimizer ignoring them.
If none of that helps, you can always force a specific join by writing INNER HASH JOIN as opposed to just INNER JOIN.
Does the destination table have a clustered index? The choice of join may be necessary to facilitate the ordering of the data in the insert. I've seen execution plans differ depending on whether the destination table has a clustered index and what column(s) it is on.
I am trying to join two tables that reside in two different databases. Every time, I try to join I get the following error:
An association from the table xxx refers to an unmapped class.
If the tables are in the same database, then everything works fine.
I am not sure about this error. But I have also faced problems in joining tables from different databases in a procedure. I created temp table in one of the databases and then inserted data from the other table and did join using temp table with table in the same database.