Is a JOIN between tables easier to read or faster to execute if I create a subquery that selects only the columns needed for the overall query?
-- Example
SELECT s.Id,
s.TransactionDate,
s.TransactionNo,
s.CustomerId,
s.SiteLocationId,
s.SubTotal,
sd.ItemId,
sd.UnitPrice,
sd.GrossAmount
FROM tblTransactions s
LEFT OUTER JOIN tblTransactionDetails sd ON sd.TransactionId = s.Id
Compared to this:
SELECT s.Id,
s.TransactionDate,
s.TransactionNo,
s.CustomerId,
s.SubTotal,
sd.ItemId,
sd.UnitPrice,
sd.GrossAmount
FROM tblTransactions s
LEFT OUTER JOIN (
SELECT TransactionId,
ItemId,
UnitPrice,
GrossAmount
FROM tblTransactionDetails
) sd ON sd.TransactionId = s.Id
What are the advantages and disadvantages of each example? I am also trying to reduce the percentage of reads against the Details table in the execution plan.
I think SQL Server creates the same execution plan for both queries. If you want to improve performance through column selection, you should create a non-clustered index; the query optimizer should then use the more compact index instead of the table.
CREATE NONCLUSTERED INDEX ix_tblTransactionDetails_test
ON tblTransactionDetails (TransactionId) INCLUDE (ItemId, UnitPrice, GrossAmount)
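To check whether the optimizer actually picks up the covering index, you can compare logical reads before and after creating it (a minimal sketch using the standard SET STATISTICS IO option in SSMS):
SET STATISTICS IO ON;

SELECT s.Id, s.TransactionDate, s.TransactionNo, s.CustomerId, s.SubTotal,
       sd.ItemId, sd.UnitPrice, sd.GrossAmount
FROM tblTransactions s
LEFT OUTER JOIN tblTransactionDetails sd ON sd.TransactionId = s.Id;

-- the Messages tab reports logical reads per table; with the covering index
-- in place, the reads against tblTransactionDetails should drop
SET STATISTICS IO OFF;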
Related
I have a query that runs fairly fast under normal circumstances, but it is currently running very slowly (at least 20 minutes in SSMS) because of how many values are in the filter.
Here's a generic version of it; you can see that one part filters on over 8,000 values, which makes it slow.
SELECT DISTINCT
column
FROM
table_a a
JOIN
table_b b ON (a.KEY = b.KEY)
WHERE
a.date BETWEEN #Start and #End
AND b.ID IN (... over 8,000 values)
AND b.place IN ( ... 20 values)
ORDER BY
a.column ASC
It's to the point where it's too slow to use in the production application.
Does anyone know how to fix this, or optimize the query?
To make a query fast, you need indexes.
You need a separate index on each of the following columns: a.KEY, b.KEY, a.date, b.ID, b.place.
As gotqn wrote before, if you put your 8,000 items into a temp table and inner join it, the query will get even faster, but without indexes on the other side of the join it will still be slow.
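A sketch of those indexes, using the generic table and column names from the question (the columns are bracketed because KEY collides with a reserved word; adjust everything to your real schema):
CREATE NONCLUSTERED INDEX ix_table_a_key   ON table_a ([KEY]);
CREATE NONCLUSTERED INDEX ix_table_a_date  ON table_a ([date]);
CREATE NONCLUSTERED INDEX ix_table_b_key   ON table_b ([KEY]);
CREATE NONCLUSTERED INDEX ix_table_b_id    ON table_b ([ID]);
CREATE NONCLUSTERED INDEX ix_table_b_place ON table_b ([place]);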
What you need is to put the filtering values in a temporary table, and then apply the filter using an INNER JOIN against that table instead of WHERE ... IN. For example:
IF OBJECT_ID('tempdb..#FilterDataSource') IS NOT NULL
BEGIN;
    DROP TABLE #FilterDataSource;
END;

CREATE TABLE #FilterDataSource
(
    [ID] INT PRIMARY KEY
);

INSERT INTO #FilterDataSource ([ID])
SELECT ... -- you need to split the values and select them here (one option is sketched after the notes below)

SELECT DISTINCT a.column
FROM table_a a
INNER JOIN table_b b
    ON (a.KEY = b.KEY)
INNER JOIN #FilterDataSource FS
    ON b.ID = FS.ID
WHERE a.date BETWEEN #Start AND #End
    AND b.place IN ( ... 20 values)
ORDER BY a.column ASC;
A few important notes:
- we are using a temporary table in order to allow parallel execution plans to be used
- if you have a fast splitting routine (for example, a CLR function), you can join the function itself instead of a temp table
- it is not good to use IN with many values; SQL Server is not always able to build an execution plan for such a query, which may lead to timeouts/internal errors - you can find more information here
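As one way to populate the temp table, SQL Server 2016 and later include a built-in STRING_SPLIT function (a minimal sketch; the @IdList variable and its sample contents are placeholders for your real list of 8,000+ values):
DECLARE @IdList NVARCHAR(MAX) = N'101,102,103'; -- the IDs as one comma-separated string

INSERT INTO #FilterDataSource ([ID])
SELECT DISTINCT CAST([value] AS INT) -- STRING_SPLIT returns a single column named "value"
FROM STRING_SPLIT(@IdList, N',');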
I have four tables that I am trying to join in a view in SQL Server. I have successfully written the join query and I am retrieving data from multiple tables with it. But when I execute the same query, SQL Server shows the results in a different order every time.
SELECT DISTINCT
dbo.tbl_verifyFinger2.ID
, dbo.tbl_verifyCnicDetails.fID
, dbo.tbl_verifyCnicDetails.colGRName
, dbo.tbl_verifyFinger2.colCompanyID
, dbo.tbl_verifyAvailableFingers.colCNIC
, dbo.tbl_agent.agent_id
, dbo.tbl_agent.colIMSI
, dbo.tbl_verifyFinger2.colDate
, dbo.tbl_verifyFinger2.colStatusMessage
FROM dbo.tbl_verifyFinger2
INNER JOIN dbo.tbl_verifyCnicDetails
ON dbo.tbl_verifyFinger2.ID = dbo.tbl_verifyCnicDetails.fID
INNER JOIN dbo.tbl_verifyAvailableFingers
ON dbo.tbl_verifyFinger2.colCNIC = dbo.tbl_verifyAvailableFingers.colCNIC
INNER JOIN dbo.tbl_agent
ON dbo.tbl_verifyAvailableFingers.colIMSI = dbo.tbl_agent.colIMSI
Because SQL Server does not allow an ORDER BY clause inside views, to get the same presentation of the result every time you must include an ORDER BY clause in your outer SELECT query, at the end of the query.
Of course, choose the columns in the ORDER BY clause carefully: the ordering must be deterministic, which guarantees that the sorted result will be the same every time, and your rows will no longer appear to move up and down.
SELECT
*
FROM schema_name.view_name AS v
ORDER BY
v.column_name (ASC|DESC) -- if omitting the direction, ASC is the default
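For this particular query, a deterministic choice could combine the key columns from the select list (a sketch; dbo.vw_VerifiedFingers is a hypothetical name standing in for whatever the view is actually called):
SELECT *
FROM dbo.vw_VerifiedFingers AS v
ORDER BY v.ID ASC, v.agent_id ASC;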
I have a view like this:
create view dbo.VEmployeeSalesOrders
as
select
employees.employeeID, Products.productID,
Sum(Price * Quantity) as Total,
salesDate,
COUNT_BIG() as [RecordCount]
from
dbo.Employees
inner join
dbo.sales on employees.employeeID = sales.employeeID
inner join
dbo.products on sales.productID = products.ProductID
group by
Employees.employeeID, products.ProductID, salesDate
When I SELECT * FROM dbo.VEmployeeSalesOrders, it takes 97% of the execution plan. I need it to be faster.
And when I try to create an index, an exception fires with the following message:
select list doesn't include a proper use on count_Big()
Why am I getting this error?
1 - First you need to alter your view so that it contains COUNT_BIG(*), because you used an aggregate function (SUM) with GROUP BY in the select statement. The reason this is required is that SQL Server needs to track the number of records in each group in order to maintain the indexed view. The view also has to be created WITH SCHEMABINDING before an index can be created on it.
Like this:
create view dbo.VEmployeeSalesOrders
with schemabinding
as
select employees.employeeID, Products.productID, Sum(Price * Quantity) as Total,
       salesDate, COUNT_BIG(*) as [RecordCount]
from dbo.Employees
inner join dbo.sales on employees.employeeID = sales.employeeID
inner join dbo.products on sales.productID = products.ProductID
group by Employees.employeeID, products.ProductID, salesDate
2 - Then you need to create the index, like this:
Create Unique Clustered Index Cidx_IndexName
on dbo.VEmployeeSalesOrders (employeeID, ProductID, SalesDate)
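Once the view is indexed, you can point queries at it directly. Note that outside Enterprise edition the optimizer will not use an indexed view automatically, so the NOEXPAND hint is needed to force it (a usage sketch):
SELECT employeeID, productID, Total, salesDate
FROM dbo.VEmployeeSalesOrders WITH (NOEXPAND);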
Hope It Works
I am having some performance issues with a query I am running in SQL Server 2008. I have the following query:
Query1:
SELECT GroupID, COUNT(*) AS TotalRows FROM Table1
INNER JOIN (
SELECT Column1 FROM Table2 WHERE GroupID = #GroupID
) AS Table2
ON Table2.Column1 = Table1.Column1
WHERE CONTAINS(Table1.*, #Word) GROUP BY GroupID
Table1 contains about 500,000 rows. Table2 contains about 50,000, but will eventually contain millions. Playing around with the query, I found that rewriting it as follows reduces the execution time to under 1 second.
Query 2:
SELECT GroupID FROM Table1
INNER JOIN (
SELECT Column1 FROM Table2 WHERE GroupID = #GroupID
) AS Table2 ON Table2.Column1 = Table1.Column1
WHERE CONTAINS(Table1.*, #Word)
What I do not understand is that this is a simple count query. If I execute the following query on Table1, it returns in < 1 s:
Query 3:
SELECT Count(*) FROM Table1
This query returns around 500,000 as the result.
However, the original query (Query 1) mentioned above only returns a count of 50,000 and takes 3 s to execute, even though simply removing the GROUP BY (Query 2) reduces the execution time to < 1 s.
I do not believe this is an indexing issue as I already have indexes on the appropriate columns. Any help would be very appreciated.
Performing a simple COUNT(*) FROM table can do a much more efficient scan of the clustered index, since it doesn't have to care about any filtering, joining, grouping, etc. The queries that include full-text search predicates and mysterious subqueries have to do a lot more work. The count is not the most expensive part there - I bet they're still relatively slow if you leave the count out but leave the group by in, e.g.:
SELECT GroupID FROM Table1
INNER JOIN (
SELECT Column1 FROM Table2 WHERE GroupID = #GroupID
) AS Table2 ON Table2.Column1 = Table1.Column1
WHERE CONTAINS(Table1.*, #Word)
GROUP BY GroupID;
Looking at the provided actual execution plan in the free SQL Sentry Plan Explorer* (the two plan screenshots are not reproduced here), I see a couple of things which lead me to believe you should:
Update the statistics on both Inventory and A001_Store_Inventory so that the optimizer can get a better rowcount estimate (which could lead to a better plan shape).
Ensure that Inventory.ItemNumber and A001_Store_Inventory.ItemNumber are the same data type to avoid an implicit conversion.
(*) disclaimer: I work for SQL Sentry.
You should have a look at the query plan to see what SQL Server is doing to retrieve the data you requested. Also, I think it would be better to rewrite your original query as follows:
SELECT
Table1.GroupID -- When you use JOINs, it's always better to specify Table (or Alias) names
,COUNT(Table1.GroupID) AS TotalRows
FROM
Table1
INNER JOIN
Table2 ON
(Table2.Column1 = Table1.Column1) AND
(Table2.GroupID = #GroupID)
WHERE
CONTAINS(Table1.*, #Word)
GROUP BY
Table1.GroupID
Also, keep in mind that a simple COUNT and a COUNT with a JOIN and GROUP BY are not the same thing. In one case, it's just a matter of going through an index and making a count, in the other there are other tables and grouping involved, which can be time consuming depending on several factors.
I have a table whose purpose is to hold IDs.
I want to select many records from another table (a big table with millions of records).
Which one would perform better:
SELECT id, att1, att2
FROM myTable
WHERE id IN (SELECT id FROM #myTabwithIDS)
Or
SELECT id, att1, att2
FROM myTable t
INNER JOIN #myTabwithIDS t2
ON t2.id = t.id
I would use the Query Analyzer built in to SQL Server to explore the execution plan.
http://www.sql-server-performance.com/2006/query-analyzer/
Specifically turn on Show Execution Plan, and Statistics IO and Time.
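For example, a minimal comparison you can run in SSMS with the two queries from the question:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT id, att1, att2
FROM myTable
WHERE id IN (SELECT id FROM #myTabwithIDS);

SELECT id, att1, att2
FROM myTable t
INNER JOIN #myTabwithIDS t2
    ON t2.id = t.id;

-- compare the logical reads and CPU/elapsed times reported on the Messages tab
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;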
Normally a join is better than a subquery, especially when the outer query's condition depends on the results of the subquery (a correlated subquery). See Subqueries vs joins for more details.