Inner join query completes but the results are wrong [duplicate] - sql-server

When I run this:
SELECT
R.RegionID AS Region, C.CountryName AS Country,
S.SegmentName AS Segment, SUM(SKPI.KPI) AS YearlySalesKPI,
ROUND(SUM(SOLI.SalePrice), 2) AS YearlySales
FROM
Country C
INNER JOIN
Region R ON C.CountryID = R.CountryID
INNER JOIN
Segment S ON R.SegmentID = S.SegmentID
INNER JOIN
SalesRegion SR ON R.RegionID = SR.RegionID
INNER JOIN
SalesPerson SP ON SR.SalesPersonID = SP.SalesPersonID
INNER JOIN
SalesKPI SKPI ON SP.SalesPersonID = SKPI.SalesPersonID
INNER JOIN
SalesOrder SO ON SR.SalesRegionID = SO.SalesRegionID
INNER JOIN
SalesOrderLineItem SOLI ON SO.SalesOrderID = SOLI.SalesOrderID
INNER JOIN
Product P ON SOLI.ProductID = P.ProductID
INNER JOIN
ProductCost PC ON PC.ProductID = P.ProductID
AND PC.CountryId = C.CountryID
GROUP BY
R.RegionID, C.CountryName, S.SegmentName
ORDER BY
C.CountryName ASC, R.RegionID ASC;
The query takes over a minute to complete, and when it finishes, the totals are larger than the real values.
I believe the additional joined tables are adding extra rows when they shouldn't, but I am unsure how to stop this from happening. I think there might be something wrong with the relationships.
I have tried everything I can think of. Any help is appreciated.
This is my database: [image: database diagram]
Expected results: [two images: expected results]
The expected results are the columns from the two images above, shown in one query output.

I'm not sure I fully understand your problem, but here are my thoughts.
For faster run time:
Consider creating indexes on the columns your tables are joined on.
Results that are larger than the real values:
If you are referring to YearlySalesKPI and YearlySales, duplicate rows are the likely cause. Try running your query without SUM and GROUP BY first and check whether values repeat; adding further join conditions (additional columns in the ON clauses) will probably solve it.
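The duplicate-row effect can be reproduced in a few lines. The following is a minimal sketch, assuming a simplified two-table schema (the `SalesPersonID` link on `SalesOrder` and the sample values are hypothetical, not the asker's real schema) and using SQLite through Python's `sqlite3` module rather than SQL Server. It also shows one common way around the problem, which is to aggregate each child table on its own before joining.

```python
import sqlite3

def yearly_totals():
    """Return (inflated, correct) pairs of (KPI total, sales total)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE SalesKPI   (SalesPersonID INTEGER, KPI REAL);
        CREATE TABLE SalesOrder (SalesPersonID INTEGER, SalePrice REAL);
        INSERT INTO SalesKPI   VALUES (1, 100), (1, 200);
        INSERT INTO SalesOrder VALUES (1, 10), (1, 20), (1, 30);
    """)
    # Joining both child tables directly pairs every KPI row with every
    # order row (2 x 3 = 6 rows), so each SUM counts its values repeatedly.
    inflated = con.execute("""
        SELECT SUM(k.KPI), SUM(o.SalePrice)
        FROM SalesKPI k
        JOIN SalesOrder o ON o.SalesPersonID = k.SalesPersonID
    """).fetchone()
    # Aggregating each child table first leaves one row per key to join.
    correct = con.execute("""
        SELECT k.TotalKPI, o.TotalSales
        FROM (SELECT SalesPersonID, SUM(KPI) AS TotalKPI
              FROM SalesKPI GROUP BY SalesPersonID) k
        JOIN (SELECT SalesPersonID, SUM(SalePrice) AS TotalSales
              FROM SalesOrder GROUP BY SalesPersonID) o
          ON o.SalesPersonID = k.SalesPersonID
    """).fetchone()
    con.close()
    return inflated, correct

inflated, correct = yearly_totals()
print("joined then summed:", inflated)
print("summed then joined:", correct)
```

Running the un-aggregated join with `SELECT *` instead of the sums is a quick way to see the repeated rows directly.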


SQLite join multiple values from two tables [duplicate]

Is there any difference (performance, best-practice, etc...) between putting a condition in the JOIN clause vs. the WHERE clause?
For example...
-- Condition in JOIN
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND CUS.FirstName = 'John'
-- Condition in WHERE
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE CUS.FirstName = 'John'
Which do you prefer (and perhaps why)?
Relational algebra allows the predicates in the WHERE clause and the INNER JOIN to be interchanged, so even for INNER JOIN queries with WHERE clauses the optimizer can rearrange the predicates so that rows are already excluded during the JOIN process.
I recommend you write the queries in the most readable way possible.
Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.
For example, instead of:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
AND a.Status = 1
Write:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
AND a.Status = 1
But it depends, of course.
For inner joins I have not really noticed a difference (but as with all performance tuning, you need to check against your database under your conditions).
However where you put the condition makes a huge difference if you are using left or right joins. For instance consider these two queries:
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderDate >'20090515'
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'
The first will give you only those records that have an order dated later than May 15, 2009, thus converting the left join to an inner join.
The second will give those records plus any customers with no orders after that date. The result set is very different depending on where you put the condition. (SELECT * is for example purposes only; of course you should not use it in production code.)
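The difference between the two LEFT JOIN queries is easy to check on a tiny data set. This is a minimal sketch using SQLite through Python's `sqlite3` module instead of SQL Server; the three sample customers and two orders are made up for illustration, and only the ID columns are selected to keep the output short.

```python
import sqlite3

def left_join_filter_placement():
    """Return the rows produced with the date filter in WHERE and in ON."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY);
        CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY,
                             CustomerID INTEGER, OrderDate TEXT);
        INSERT INTO Customers VALUES (1), (2), (3);
        -- customer 1: recent order, customer 2: old order, customer 3: none
        INSERT INTO Orders VALUES (10, 1, '20090601'), (11, 2, '20090101');
    """)
    # Filter in WHERE: rows with a NULL OrderDate are discarded after the
    # join, so the LEFT JOIN behaves like an INNER JOIN.
    in_where = con.execute("""
        SELECT CUS.CustomerID, ORD.OrderID
        FROM Customers AS CUS
        LEFT JOIN Orders AS ORD ON CUS.CustomerID = ORD.CustomerID
        WHERE ORD.OrderDate > '20090515'
        ORDER BY CUS.CustomerID
    """).fetchall()
    # Filter in ON: every customer is kept, with NULLs where no order matched.
    in_join = con.execute("""
        SELECT CUS.CustomerID, ORD.OrderID
        FROM Customers AS CUS
        LEFT JOIN Orders AS ORD ON CUS.CustomerID = ORD.CustomerID
                               AND ORD.OrderDate > '20090515'
        ORDER BY CUS.CustomerID
    """).fetchall()
    con.close()
    return in_where, in_join

in_where, in_join = left_join_filter_placement()
print(in_where)
print(in_join)
```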
The exception to this is when you want to see only the records in one table but not the other. Then you put the condition in the WHERE clause, not the join.
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderID is null
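To illustrate why the IS NULL test belongs in the WHERE clause for this pattern, here is a minimal sketch on made-up data, again using SQLite through Python's `sqlite3` module rather than SQL Server. It runs the query above and, for contrast, a variant with the same test moved into the ON clause.

```python
import sqlite3

def customers_without_orders():
    """Compare the IS NULL test in WHERE against the same test in ON."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY);
        CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
        INSERT INTO Customers VALUES (1), (2), (3);
        INSERT INTO Orders VALUES (10, 1), (11, 2);
    """)
    # In WHERE: keeps only the customers for which the join found no order.
    in_where = con.execute("""
        SELECT CUS.CustomerID
        FROM Customers AS CUS
        LEFT JOIN Orders AS ORD ON CUS.CustomerID = ORD.CustomerID
        WHERE ORD.OrderID IS NULL
    """).fetchall()
    # In ON: no stored order has a NULL OrderID, so nothing ever matches and
    # every customer comes back padded with NULL.
    in_join = con.execute("""
        SELECT CUS.CustomerID, ORD.OrderID
        FROM Customers AS CUS
        LEFT JOIN Orders AS ORD ON CUS.CustomerID = ORD.CustomerID
                               AND ORD.OrderID IS NULL
        ORDER BY CUS.CustomerID
    """).fetchall()
    con.close()
    return in_where, in_join

in_where, in_join = customers_without_orders()
print(in_where)
print(in_join)
```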
Most RDBMS products will optimize both queries identically. In "SQL Performance Tuning" by Peter Gulutzan and Trudy Pelzer, they tested multiple brands of RDBMS and found no performance difference.
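A quick way to check this on your own system is to run both forms and compare the rows and the plans. The sketch below does so with SQLite through Python's `sqlite3` module, which is an assumption made for portability; on SQL Server you would compare the execution plans instead. The sample rows are made up.

```python
import sqlite3

def compare_inner_join_filters():
    """Run the JOIN-filter and WHERE-filter forms and collect rows and plans."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, FirstName TEXT);
        CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
        INSERT INTO Customers VALUES (1, 'John'), (2, 'Mary');
        INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 2);
    """)
    base = """
        SELECT CUS.CustomerID, ORD.OrderID
        FROM Customers AS CUS
        INNER JOIN Orders AS ORD ON CUS.CustomerID = ORD.CustomerID
    """
    in_join = base + " AND CUS.FirstName = 'John' ORDER BY ORD.OrderID"
    in_where = base + " WHERE CUS.FirstName = 'John' ORDER BY ORD.OrderID"
    rows_join = con.execute(in_join).fetchall()
    rows_where = con.execute(in_where).fetchall()
    # Each EXPLAIN QUERY PLAN row has four columns; the last is the plan text.
    plan_join = [r[3] for r in con.execute("EXPLAIN QUERY PLAN " + in_join)]
    plan_where = [r[3] for r in con.execute("EXPLAIN QUERY PLAN " + in_where)]
    con.close()
    return rows_join, rows_where, plan_join, plan_where

rows_join, rows_where, plan_join, plan_where = compare_inner_join_filters()
print(rows_join == rows_where)
print(plan_join)
print(plan_where)
```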
I prefer to keep join conditions separate from query restriction conditions.
If you're using OUTER JOIN sometimes it's necessary to put conditions in the join clause.
WHERE will filter after the JOIN has occurred.
Filter on the JOIN to prevent rows from being added during the JOIN process.
I prefer the JOIN to join full tables/views and then use the WHERE to introduce the predicates on the resulting set.
It feels syntactically cleaner.
I typically see performance increases when filtering on the join. Especially if you can join on indexed columns for both tables. You should be able to cut down on logical reads with most queries doing this too, which is, in a high volume environment, a much better performance indicator than execution time.
I'm always mildly amused when someone shows their SQL benchmarking and they've executed both versions of a sproc 50,000 times at midnight on the dev server and compare the average times.
I agree with the second-most-voted answer that it makes a big difference when using LEFT JOIN or RIGHT JOIN. In fact, the two statements below are equivalent, so you can see that the AND clause filters before the JOIN while the WHERE clause filters after the JOIN.
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN (SELECT * FROM dbo.Orders WHERE OrderDate >'20090515') AS ORD
ON CUS.CustomerID = ORD.CustomerID
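The claimed equivalence can be checked on sample data. This is a minimal sketch with made-up rows, run on SQLite through Python's `sqlite3` module rather than SQL Server; it only compares the returned rows, not how either engine executes the statements.

```python
import sqlite3

def filter_in_on_vs_derived_table():
    """Return the rows of the ON-filtered join and the derived-table join."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY);
        CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY,
                             CustomerID INTEGER, OrderDate TEXT);
        INSERT INTO Customers VALUES (1), (2), (3);
        INSERT INTO Orders VALUES (10, 1, '20090601'), (11, 2, '20090101');
    """)
    # Date filter written as an extra ON condition.
    on_filter = con.execute("""
        SELECT CUS.CustomerID, ORD.OrderID
        FROM Customers AS CUS
        LEFT JOIN Orders AS ORD ON CUS.CustomerID = ORD.CustomerID
                               AND ORD.OrderDate > '20090515'
        ORDER BY CUS.CustomerID
    """).fetchall()
    # Same filter applied inside a derived table before the join.
    derived = con.execute("""
        SELECT CUS.CustomerID, ORD.OrderID
        FROM Customers AS CUS
        LEFT JOIN (SELECT * FROM Orders WHERE OrderDate > '20090515') AS ORD
               ON CUS.CustomerID = ORD.CustomerID
        ORDER BY CUS.CustomerID
    """).fetchall()
    con.close()
    return on_filter, derived

on_filter, derived = filter_in_on_vs_derived_table()
print(on_filter == derived, on_filter)
```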
Joins are quicker, in my opinion, when you have a larger table. It really isn't much of a difference, though, especially if you are dealing with a smaller table. When I first learned about joins, I was told that conditions in joins are just like WHERE clause conditions and that I could use them interchangeably, as long as the WHERE clause was specific about which table the condition applies to.
Putting the condition in the join seems "semantically wrong" to me, as that's not what JOINs are "for". But that's very qualitative.
Additional problem: if you decide to switch from an inner join to, say, a right join, having the condition be inside the JOIN could lead to unexpected results.
It is better to add the condition in the Join. Performance is more important than readability. For large datasets, it matters.

Optimize joins from multiple tables

How can I optimize the performance of the query below, given the table structure shown in the picture?
[image: table structure]
select CounterID, OutletTitle, CounterTitle
from(
select OutletID, Text as OutletTitle
from Outlets as q1
inner join
TranslationTexts as tt
on q1.TitleID=tt.TranslationID
where tt.Locale='ar-SA' and q1.CompanyID=311 and q1.OutletID=8 --Locale & CompanyID & OutletID
) as O
inner join
(
select CounterID, Text as CounterTitle, OutletID
from Counters as q1
inner join
TranslationTexts as tt
on q1.TitleID=tt.TranslationID
where tt.Locale='ar-SA' and q1.OutletID=8 --Locale & OutletID
) as C
on O.OutletID=C.OutletID
You should try this query:
SELECT CounterID, tou.Text as OutletTitle, tco.Text as CounterTitle
FROM Counters as co
INNER JOIN Outlets as ou ON co.OutletID = ou.OutletID
INNER JOIN TranslationTexts as tco on co.TitleID=tco.TranslationID
INNER JOIN TranslationTexts as tou on ou.TitleID=tou.TranslationID
WHERE ou.CompanyID=311 and co.OutletID=8 AND tco.Locale='ar-SA' and tou.Locale='ar-SA'
To get much better performance, you could add some indexes on the three tables.
This is a different approach. I cannot say whether it improves performance, because that depends on a lot of other things, but I believe it is an equivalent version and an easier one to read.
SELECT
C.CounterID
, tt.Text AS OutletTitle
, tt2.Text AS CounterTitle
FROM
Outlets AS q1
INNER JOIN TranslationTexts AS tt ON q1.TitleID=tt.TranslationID
INNER JOIN Counters C ON C.OutletID=q1.OutletID
INNER JOIN TranslationTexts AS tt2 ON tt2.TranslationID=C.TitleID AND tt2.Locale=tt.Locale
WHERE
tt.Locale='ar-SA' and q1.CompanyID=311 and q1.OutletID=8;
The question is what you want to optimize: readability (and maintainability), performance, or both?
Most people have their own 'style' when writing queries. I prefer the one below, but to the server it will probably look the same, and most likely the system will have the exact same amount of 'work' to do to get the data even though it 'looks' different to us humans. I'd suggest searching around a bit and learning how to interpret a query plan.
SELECT q2.CounterID,
tt1.Text as OutletTitle,
tt2.Text as CounterTitle
FROM Outlets as q1
INNER JOIN Counters as q2
ON q2.OutletID = q1.OutletID
INNER JOIN TranslationTexts as tt1
ON tt1.TranslationID = q1.TitleID
AND tt1.Locale = 'ar-SA'
INNER JOIN TranslationTexts as tt2
ON tt2.TranslationID = q2.TitleID
AND tt2.Locale = 'ar-SA'
WHERE q1.CompanyID = 311
AND q1.OutletID = 8
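Whether a flattened query really matches the original nested one is worth checking on sample data before comparing plans. The sketch below does that with SQLite through Python's `sqlite3` module; the column layout is inferred from the queries in this thread and the rows are invented, so treat both as assumptions.

```python
import sqlite3

NESTED = """
    SELECT CounterID, OutletTitle, CounterTitle
    FROM (SELECT OutletID, Text AS OutletTitle
          FROM Outlets AS q1
          INNER JOIN TranslationTexts AS tt ON q1.TitleID = tt.TranslationID
          WHERE tt.Locale = 'ar-SA' AND q1.CompanyID = 311 AND q1.OutletID = 8
         ) AS O
    INNER JOIN
         (SELECT CounterID, Text AS CounterTitle, OutletID
          FROM Counters AS q1
          INNER JOIN TranslationTexts AS tt ON q1.TitleID = tt.TranslationID
          WHERE tt.Locale = 'ar-SA' AND q1.OutletID = 8
         ) AS C
      ON O.OutletID = C.OutletID
    ORDER BY CounterID
"""

FLAT = """
    SELECT q2.CounterID, tt1.Text AS OutletTitle, tt2.Text AS CounterTitle
    FROM Outlets AS q1
    INNER JOIN Counters AS q2 ON q2.OutletID = q1.OutletID
    INNER JOIN TranslationTexts AS tt1
            ON tt1.TranslationID = q1.TitleID AND tt1.Locale = 'ar-SA'
    INNER JOIN TranslationTexts AS tt2
            ON tt2.TranslationID = q2.TitleID AND tt2.Locale = 'ar-SA'
    WHERE q1.CompanyID = 311 AND q1.OutletID = 8
    ORDER BY q2.CounterID
"""

def run_both():
    """Run the nested and the flat query against the same sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Outlets (OutletID INTEGER, CompanyID INTEGER, TitleID INTEGER);
        CREATE TABLE Counters (CounterID INTEGER, OutletID INTEGER, TitleID INTEGER);
        CREATE TABLE TranslationTexts (TranslationID INTEGER, Locale TEXT, Text TEXT);
        INSERT INTO Outlets VALUES (8, 311, 100), (9, 311, 200);
        INSERT INTO Counters VALUES (1, 8, 101), (2, 8, 102), (3, 9, 201);
        INSERT INTO TranslationTexts VALUES
            (100, 'ar-SA', 'outlet8-ar'), (100, 'en-US', 'outlet8-en'),
            (101, 'ar-SA', 'counter1-ar'), (101, 'en-US', 'counter1-en'),
            (102, 'ar-SA', 'counter2-ar'),
            (200, 'ar-SA', 'outlet9-ar'), (201, 'ar-SA', 'counter3-ar');
    """)
    nested = con.execute(NESTED).fetchall()
    flat = con.execute(FLAT).fetchall()
    con.close()
    return nested, flat

nested, flat = run_both()
print(nested == flat, flat)
```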
One of the things I notice is that you pass both CompanyID and OutletID as filters for the Outlets table. Since OutletID is the primary key of that table, I wonder if you really need the filter on CompanyID. At best it will eliminate the record because it belongs to the wrong company, but I'm under the impression that you already know the right CompanyID.
As for performance, I'd advise these indexes:
CREATE INDEX idx_Locale ON TranslationTexts (Locale, TranslationID)
CREATE INDEX idx_CompanyID ON Outlets (CompanyID) INCLUDE (TitleID, OutletID)
Most likely you can even make the index on Locale a UNIQUE index, making it work even better.
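As a rough illustration of the unique-index suggestion, the sketch below creates a comparable index in SQLite through Python's `sqlite3` module and checks two things: that the lookup by locale and translation ID uses the index, and that a duplicate (Locale, TranslationID) pair is rejected. SQLite is an assumption here; it has no INCLUDE clause, and SQL Server's optimizer may choose differently, so this is not a prediction of the SQL Server plan.

```python
import sqlite3

def unique_locale_index_demo():
    """Return the plan text for a lookup and whether a duplicate was rejected."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE TranslationTexts (TranslationID INTEGER, Locale TEXT, Text TEXT);
        CREATE UNIQUE INDEX idx_Locale ON TranslationTexts (Locale, TranslationID);
        INSERT INTO TranslationTexts VALUES (100, 'ar-SA', 'a'), (100, 'en-US', 'b');
    """)
    # The last column of each EXPLAIN QUERY PLAN row describes the access path.
    plan = [row[3] for row in con.execute("""
        EXPLAIN QUERY PLAN
        SELECT Text FROM TranslationTexts
        WHERE Locale = 'ar-SA' AND TranslationID = 100
    """)]
    try:
        con.execute("INSERT INTO TranslationTexts VALUES (100, 'ar-SA', 'dup')")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        # The unique index refuses a second row for the same locale and ID.
        duplicate_rejected = True
    con.close()
    return plan, duplicate_rejected

plan, duplicate_rejected = unique_locale_index_demo()
print(plan)
print(duplicate_rejected)
```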

Grouping data based on expression using CountDistinct aggregate function

I am new to Stack Overflow and also to SQL Server Reporting Services, so please excuse the format of the question.
So here is the situation:
I am developing an SSRS report which needs to be grouped by a count of distinct product names, as shown below.
I created a text box called ProdCount with the expression
CountDistinct(Fields!Product.value,"DataSet1")
which gives me the count 63 within the scope of DataSet1.
Now I need to group the data by taking product names where the above formula is > 1:
=IIF(ProdCount>1,Fields!Product.value,Nothing)
My problem:
1. I tried to call ProdCount from a calculated field, since I can't use aggregate functions in calculated fields, and to use it in the second expression with
= ReportItems!ProdCount.value
which gives me the error "FieldValue Denying ReportItems".
2. I tried to combine the above two expressions by creating a calculated field with
IIF(CountDistinct(Fields!Product.Value,"DataSet1")>1,Fields!Product.Value,Nothing)
which gives me the error "Calculated fields cannot have expressions".
3. I tried to use report variables in the same way as in (1), which did not work either.
4. I also tried to use a CROSS JOIN in the query
Select Count(Distinct ProductName)
from Query1
Cross join
My main query, which gives me the data
which takes more time to execute.
So can anyone help me with a solution where I can group the data by combining the above two expressions?
Please excuse the format; I was confused about how to frame the question. I accept all your edits so that I can learn for the future.
Here is my code:
SELECT * FROM
--Query1, which counts the number of distinct products
(SELECT DISTINCT COUNT(gproduct.ProductName) AS ProdCount
FROM Table1
LEFT JOIN Table4
ON Table1.column=Table1.column
LEFT JOIN Table2
ON Table3.Column = Table1.Column
LEFT JOIN
(
SELECT Distinct Table6.Name AS ProductName,Table9.ColumnId
FROM Table6
INNER JOIN Table7
ON Table6.Column=Table7.Column
INNER JOIN Table8
ON Table7.Column=Table8.Column
INNER JOIN Table9
ON Table9.Column=Table8.Column
)gproduct
ON Table1.ColumnId=gproduct.ColumnId
GROUP BY gproduct.ColumnId,
)qProduct
CROSS JOIN
--My main Query which get data from different table including Product name
(SELECT
Upper(CASE WHEN (CASE WHEN Table4.Column =1 THEN 'Yes' ELSE 'NO' END)='YES'
THEN gab.ProductName
ELSE
Table2.productName
END) AS Product,
FROM Table1 AS ec
LEFT JOIN Table2 AS ep
ON --
LEFT JOIN Table3 AS ebrd
ON --
Left JOIN Table4 AS etpc
ON --
LEFT JOIN Table5 AS gst
ON --
LEFT JOIN
(
SELECT Distinct Table6.Name AS ProductName,Table9.ColumnId
FROM Table6
INNER JOIN Table7
ON Table6.Column=Table7.Column
INNER JOIN Table8
ON Table7.Column=Table8.Column
INNER JOIN Table9
ON Table9.Column=Table8.Column
) gab
ON Table1.ColumnId=gab.ColumnId
)QMain
Personally, I would try to solve the problem in the query itself instead of in the SSRS report. According to the data you provided, it would be something like:
SELECT
ProductName,
count(distinct Product)
from
YourTable
group by
ProductName
having count(distinct product) > 1
After that, creating the SSRS report should be quite easy.
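For readers who want to try the GROUP BY ... HAVING pattern before wiring it into a report, here is a minimal sketch on SQLite through Python's `sqlite3` module. The table `YourTable` and its two columns follow the placeholder names in the query above, and the rows are invented, so the grouping column may need to change for the real data.

```python
import sqlite3

def groups_with_several_distinct_values():
    """Return the groups that contain more than one distinct Product value."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE YourTable (ProductName TEXT, Product TEXT);
        INSERT INTO YourTable VALUES
            ('Widget', 'W-1'), ('Widget', 'W-2'), ('Widget', 'W-2'),
            ('Gadget', 'G-1'), ('Gadget', 'G-1');
    """)
    # HAVING filters whole groups after aggregation, which a WHERE clause
    # cannot do because it runs before the groups are formed.
    rows = con.execute("""
        SELECT ProductName, COUNT(DISTINCT Product) AS DistinctProducts
        FROM YourTable
        GROUP BY ProductName
        HAVING COUNT(DISTINCT Product) > 1
        ORDER BY ProductName
    """).fetchall()
    con.close()
    return rows

print(groups_with_several_distinct_values())
```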

SQL Server speed: left outer join vs inner join

In theory, why would an inner join run remarkably faster than a left outer join, given that both queries return the same result set? I had a query which would take a long time to describe, but this is what I saw when changing a single join: left outer join - 6 sec, inner join - 0 sec (the rest of the query is the same). Result set: the same.
Actually, depending on the data, a left outer join and an inner join would not return the same results. Most likely the left outer join will return more rows, but again that depends on the data.
I'd be worried if I changed a left join to an inner join and the results were not different. I would suspect that you have a condition on the left-joined table in the where clause, effectively (and probably incorrectly) turning it into an inner join.
Something like:
select *
from table1 t1
left join table2 t2 on t1.myid = t2.myid
where t2.somefield = 'something'
Which is not the same thing as
select *
from table1 t1
left join table2 t2
on t1.myid = t2.myid and t2.somefield = 'something'
So first I would be worried that my query was incorrect to begin with, then I would worry about performance. An inner join is NOT a performance enhancement for a Left Join, they mean two different things and should return different results unless you have a table where there will always be a match for every record. In this case you change to an inner join because the other is incorrect not to improve performance.
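The point that the two join types agree only when every row has a match can be seen with a pair of two-row tables. This is a minimal sketch on SQLite through Python's `sqlite3` module; the table names follow the example above and the data is made up.

```python
import sqlite3

def join_row_counts(every_row_matches):
    """Return (inner_join_rows, left_join_rows) for a tiny pair of tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (myid INTEGER);
        CREATE TABLE table2 (myid INTEGER);
        INSERT INTO table1 VALUES (1), (2);
        INSERT INTO table2 VALUES (1);
    """)
    if every_row_matches:
        # Give the second table1 row a partner as well.
        con.execute("INSERT INTO table2 VALUES (2)")
    inner = con.execute("""
        SELECT COUNT(*) FROM table1 t1
        INNER JOIN table2 t2 ON t1.myid = t2.myid
    """).fetchone()[0]
    left = con.execute("""
        SELECT COUNT(*) FROM table1 t1
        LEFT JOIN table2 t2 ON t1.myid = t2.myid
    """).fetchone()[0]
    con.close()
    return inner, left

print(join_row_counts(True))   # every table1 row has a match
print(join_row_counts(False))  # one table1 row has no match
```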
My best guess as to the reason the left join takes longer is that it is joining to many more rows that then get filtered out by the where clause. But that is just a wild guess. To know you need to look at the Execution plans.

With LEFT OUTER JOIN there is always an index scan in SQL Server 2005

I have a query joining several tables; the last table is joined with a LEFT JOIN. The last table has more than a million rows, and the execution plan shows a table scan on it. I have indexed the columns on which the join is made. It always uses an index scan, but if I replace the LEFT JOIN with an INNER JOIN, an index seek is used and execution takes a few seconds, whereas with the LEFT JOIN there is a table scan, so execution takes several minutes. Does using outer joins turn off indexes? Did I miss something? What is the reason for such behavior?
Here is the query:
Select *
FROM
Subjects s
INNER join Question q ON q.SubjectID = s.SubjectID
INNER JOIN Answer a ON a.QuestionID = q.QuestionID
LEFT OUTER JOIN Cell c ON c.QuestionID = q.QuestionID
WHERE s.SubjectID = 15
There is a clustered index on SubjectID in the "Subjects" table, and there are non-clustered indexes on QuestionID in the other tables.
Solution:
I tried it another way and now I get an index seek on the Cell table. Here is the modified query:
Select *
FROM
Subjects s
INNER join Question q ON q.SubjectID = s.SubjectID
INNER JOIN Answer a ON a.QuestionID = q.QuestionID
LEFT OUTER JOIN Cell c ON c.QuestionID = q.QuestionID
AND C.QuestionID > 0
AND C.CellKey > 0
Where S.SubjectID =15
This way I got high selectivity on the Cell table. :)
I just tried to simulate the same issue; however, there was no table scan, and the clustered index of Cell was used instead. You could also try to force the index with an index hint, keeping in mind the issues you may face when forcing an index. Hope this helps.
