preventing display of duplicate records in SQL server - sql-server

I'm using an stored procedure in SQL server, but it is giving me some duplicate records, of course I don't have duplicate records in my database, but my stored procedure is giving me two instances of a same record, what can be wrong? how can I prevent my query from giving duplicate records?
it is my SP select clause:
select (ROW_NUMBER() OVER (ORDER BY Review.Point desc) ) as rownumber,
Business.BusinessId,Business.BName,Business.BAddress1
,Business.BAddress2,Business.BCity,Business.BState,Business.BZipCode,Business.countryCode,Business.BPhone1,Business.BPhone2,Business.BEmail,Business.Keyword
,Business.BWebAddress,Business.BCatId,Business.BSubCatId,Business.BDetail,Business.bImage,Business.UCId,Business.UCConfirm
,Business.UOId,Business.UOConfirm,Business.x,Business.y,Cat.CatName,SubCat1.SubCatName
from Business left outer join
Review on business.BusinessId=Review.BusinessId left outer join
Cat on business.BCatid=Cat.CatId left outer join
SubCat1 on business.BSubCatid=SubCat1.SubCatId '+#sql2+'
) as tbl
where rownumber between '+CONVERT(varchar, #lbound)+' and '+CONVERT(varchar, #ubound);

I don't know your data to dig in to your join logic, but if it duplicating across BusinessID, you could add another ROW_NUMBER() for the duplicates:
select (ROW_NUMBER() OVER (ORDER BY Review.Point desc) ) as rownumber,
r = ROW_NUMBER()OVER(PARTITION BY Business.BusinessId ORDER BY Business.BusinessId)
Business.BusinessId,Business.BName,Business.BAddress1
,Business.BAddress2,Business.BCity,Business.BState,Business.BZipCode,Business.countryCode,Business.BPhone1,Business.BPhone2,Business.BEmail,Business.Keyword
,Business.BWebAddress,Business.BCatId,Business.BSubCatId,Business.BDetail,Business.bImage,Business.UCId,Business.UCConfirm
,Business.UOId,Business.UOConfirm,Business.x,Business.y,Cat.CatName,SubCat1.SubCatName
from Business left outer join
Review on business.BusinessId=Review.BusinessId left outer join
Cat on business.BCatid=Cat.CatId left outer join
SubCat1 on business.BSubCatid=SubCat1.SubCatId '+#sql2+'
) as tbl
where rownumber between '+CONVERT(varchar, #lbound)+' and '+CONVERT(varchar, #ubound)
AND r = 1;

Include the reserved word DISTINCT in your query.
eg
select distinct
*
from
students s
inner join enrollments e on e.StudentId = s.Id
inner join courses c on c.Id = e.CourseId
However, unexpected duplicates in a result table is often (but not always) a clue that you have a badly formed query or a badly designed database.

Try to remove this left join
Review on business.BusinessId=Review.BusinessId left outer join
seems not needed in your query and if there are more than one review for one business ...

Related

How do I properly add this query into my existing query within Query Designer?

I currently have the below query written within Query Designer. I asked a question yesterday and it worked on its own but I would like to incorporate it into my existing report.
SELECT Distinct
i.ProductNumber
,i.ProductType
,i.ProductPurchaseDate
,ih.SalesPersonComputerID
,ih.SalesPerson
,ic2.FlaggedComments
FROM [Products] i
LEFT OUTER JOIN
(SELECT Distinct
MIN(c2.Comments) AS FlaggedComments
,c2.SalesKey
FROM [SalesComment] AS c2
WHERE(c2.Comments like 'Flagged*%')
GROUP BY c2.SalesKey) ic2
ON ic2.SalesKey = i.SalesKey
LEFT JOIN [SalesHistory] AS ih
ON ih.SalesKey = i.SalesKey
WHERE
i.SaleDate between #StartDate and #StopDate
AND ih.Status = 'SOLD'
My question yesterday was that I wanted a way to select only the first comment made for each sale. I have a query for selecting the flagged comments but I want both the first row and the flagged comment. They would both be pulling from the same table. This was the query provided and it worked on its own but I cant figure out how to make it work with my existing query.
SELECT a.DateTimeCommented, a.ProductNumber, a.Comments, a.SalesKey
FROM (
SELECT
DateTimeCommented, ProductNumber, Comments, SalesKey,
ROW_NUMBER() OVER(PARTITION BY ProductNumber ORDER BY DateTimeCommented) as RowN
FROM [SalesComment]
) a
WHERE a.RowN = 1
Thank you so much for your assistance.
You can use a combination of row-numbering and aggregation to get both the Flagged% comments, and the first comment.
You may want to change the PARTITION BY clause to suit.
DISTINCT on the outer query is probably spurious, on the inner query it definitely is, as you have GROUP BY anyway. If you are getting multiple rows, don't just throw DISTINCT at it, instead think about your joins and whether you need aggregation.
The second LEFT JOIN logically becomes an INNER JOIN due to the WHERE predicate. Perhaps that predicate should have been in the ON instead?
SELECT
i.ProductNumber
,i.ProductType
,i.ProductPurchaseDate
,ih.SalesPersonComputerID
,ih.SalesPerson
,ic2.FlaggedComments
,ic2.FirstComments
FROM [Products] i
LEFT OUTER JOIN
(SELECT
MIN(CASE WHEN c2.RowN = 1 THEN c2.Comments) AS FirstComments
,c2.SalesKey
,MIN(CASE WHEN c2.Comments like 'Flagged*%' THEN c2.Comments) AS FlaggedComments
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ProductNumber ORDER BY DateTimeCommented) as RowN
FROM [SalesComment]
) AS c2
GROUP BY c2.SalesKey
) ic2 ON ic2.SalesKey = i.SalesKey
JOIN [SalesHistory] AS ih
ON ih.SalesKey = i.SalesKey
WHERE
i.SaleDate between #StartDate and #StopDate
AND ih.Status = 'SOLD'

SQL Query Multiple Joins Unexpected Results

My task is to write a query that will return sales information for each customer category and year. The columns required in the result set are:
OrderYear - the year the orders were placed
CustomerCategoryName - as it appears in the table Sales.CustomerCategories
CustomerCount - the number of unique customers placing orders for each CustomerCategoryName and OrderYear
OrderCount - the number of orders placed for each CustomerCategoryName and OrderYear
Sales - the subtotal from the orders placed, calculated from Quantity and UnitPrice of the table Sales.OrderLines
AverageSalesPerCustomer - the average sales per customer for each CustomerCategoryName and OrderYear
The results should be sorted in ascending order, first by order year, then by customer category name.
My attempt at a solution:
SELECT
CC.CustomerCategoryName,
YEAR(O.OrderDate) AS OrderYear,
COUNT(DISTINCT C.CustomerID) AS CustomerCount,
COUNT(DISTINCT O.OrderID) AS OrderCount,
SUM(OL.Quantity * OL.UnitPrice) AS Sales,
SUM(OL.Quantity * OL.UnitPrice) / COUNT(DISTINCT C.CustomerID) AS AverageSalesPerCustomer
FROM
Sales.CustomerCategories CC
INNER JOIN
Sales.Customers C ON C.CustomerCategoryID = CC.CustomerCategoryID
INNER JOIN
Sales.Orders O ON O.CustomerID = C.CustomerID
INNER JOIN
Sales.OrderLines OL ON OL.OrderID = O.OrderID
GROUP BY
CC.CustomerCategoryName, YEAR(O.OrderDate)
ORDER BY
YEAR(O.OrderDate), CC.CustomerCategoryName;
My OrderCount seems correct. However, I don't believe my CustomerCount is correct and my Sales and AverageSalesPerCustomer seem way off. The Categories that do not have any customers and orders do not show up in my results.
Is the reason that my counts are off and that he categories that do not have any customers are omitted is because they only have null values? I believe the question is looking for all the categories.
I am using the sample tables of WideWorldImporters from Microsoft.
Any help would be appreciated as I am new to SQL and Joins are a very hard concept for me to understand.
Presently, you're getting only the data that exists in order details...and not getting anything for the non-existent orders. Normally, this is accomplished with outer joins instead of inner joins, and an isnull(possiblyNullValue,replacementValue).
Also, while you're grouping by year(o.OrderDate), your join for orders isn't distinguishing by year...probably getting all years worth of data for each customer for each reporting period.
So, let's get the reporting period out first...and make sure we're basing our results on that:
select distinct year(o.OrderDate) from Sales.Orders
But really, you want all categories and all years...so you can combine them to get the real basis:
select
cc.CustomerCategoryId,
cc.CustomerCategoryName,
year(o.OrderDate)
from
Sales.Orders o
cross join
Sales.CustomerCategories cc
group by
cc.CustomerCategoryId,
cc.CustomerCategoryName,
year(o.OrderDate)
Now, you want to join this mess into the remaining query. There are two ways to do this...one is to use a with clause...but sometimes it's just easier to just wrap the basis query up in parentheses and use it as if it was a table:
select
cy.CustomerCategoryName,
cy.CalendarYear,
count(distinct c.CustomerId) CustomerCount,
isnull(sum(ol.UnitPrice * ol.Quantitiy),0.0) Sales,
isnull(sum(ol.UnitPrice * ol.Quantitiy) / count(distinct c.CustomerId),0.0) AverageSalesPerCustomer
from
(
select
cc.CustomerCategoryId,
cc.CustomerCategoryName,
year(o.OrderDate) CalendarYear --> must name calc'd cols in virtual tables
from
Sales.Orders o
cross join
Sales.CustomerCategories cc
group by
cc.CustomerCategoryId,
cc.CustomerCategoryName,
year(o.OrderDate)
) as cy --> cy is the "Category Years" virtual table
left outer join
Sales.Customers c
on cy.CustomerCategoryId = c.CustomerCategoryId
left outer join
Sales.Orders o
on
c.CustomerId = o.CustomerId --> join on customer and year
and --> to make sure we're only getting
cy.CalendarYear = Year(o.OrderDate) --> orders in the right year
left outer join
Sales.OrderLines ol
on o.OrderId = ol.OrderId
group by
cy.CalendarYear,
cy.CustomerCategoryName
order by
cy.CalendarYear,
cy.CustomerCategoryName
By the way...get comfortable messing with your queries to select some subset...for example, you can add a where clause to select only one company...and then go have a look at the details...to see if it passes the smell test. It's a lot easier to evaluate the results when you limit them. Similarly, you can add the customer to the select list and the outer grouping for the same reason. Experimentation is the key.

The column "xxxx" was specified multiple times for "CTE_YYYY"

I am new to Microsoft SQL Server. I am trying to join two tables that has common key named CampaignID using LEFT OUTER JOIN. I need to reuse the result in a different query, so I decided to capture the result set using CTE_Results. For example,
-- This is my CTE script
WITH CTE_Results AS
(
SELECT t1.CampaignID, t2.CampaignID, t1.Name, t2.Vendor
FROM CampaignDetails AS t1
LEFT OUTER JOIN CampaignOnlineDetails AS t2
ON t1.CampaignID = t2.CampaignID
)
-- This is the script I want to use to compare the resulting table. For example,
SELECT Vendor
FROM CTE_Results
However, when I ran above, I get:
The column `CampaignID` was specified multiple times for `CTE_Results`.
From reading through old StackOverflow questions and answers, it seems like since CampaignID is in both tables that are being joined, I must use table aliases to specify whose (which table's) CampaignID I want to SELECT. But I think I did that and even that it seems like the error still occurs.
Is there a way for me to select and keep BOTH CampaignID's in my CTE? If so, what should be changed? Thank you for the answers!
You have CampaignID selected twice in CTE, use different alias name to fix the problem
WITH CTE_Results
AS (SELECT t1.CampaignID AS cd_CampaignID,
t2.CampaignID AS cod_CampaignID,
t1.NAME,
t2.Vendor
FROM CampaignDetails AS t1
LEFT OUTER JOIN CampaignOnlineDetails AS t2
ON t1.CampaignID = t2.CampaignID)
-- This is the script I want to use to compare the resulting table. For example,
SELECT Vendor
FROM CTE_Results
or use this
WITH CTE_Results(cd_CampaignID, cod_CampaignID, NAME, Vendor)
AS (SELECT t1.CampaignID,
t2.CampaignID,
t1.NAME,
t2.Vendor
FROM CampaignDetails AS t1
LEFT OUTER JOIN CampaignOnlineDetails AS t2
ON t1.CampaignID = t2.CampaignID)
-- This is the script I want to use to compare the resulting table. For example,
SELECT Vendor
FROM CTE_Results
You need to Alias the CampaignID Columns in your CTE or define the returned column names in the CTE declaration. Otherwise it would be like creating a table with two columns with the same name.
Example Column Alias:
WITH CTE_Results AS
(
SELECT t1.CampaignID as 'CampaignID1', t2.CampaignID as 'CampaignID2', t1.Name, t2.Vendor
FROM CampaignDetails AS t1
LEFT OUTER JOIN CampaignOnlineDetails AS t2
ON t1.CampaignID = t2.CampaignID
)
Or In CTE declaration:
WITH CTE_Results (CampaignID1, CampaignID2, [Name], Vendor) AS
(
SELECT t1.CampaignID, t2.CampaignID , t1.Name, t2.Vendor
FROM CampaignDetails AS t1
LEFT OUTER JOIN CampaignOnlineDetails AS t2
ON t1.CampaignID = t2.CampaignID
)

Convert T-SQL Cross Apply to Oracle

I'm looking to convert this SQL Server (T-SQL) query that uses a cross apply to Oracle 11g. Oracle does not support Cross Apply until 12g, so I have to find a work-around. The idea behind the query is for each Tab.Name that = 'Foobar', I need find the previous row's name with the same ID ordered by Tab.Date. (This table contains multiple rows for 1 ID with different Name and Date).
Here is the T-SQL code:
SELECT DISTINCT t1.ID
t1.Name,
t1.Date,
t2.Date as 'PreviousDate',
t2.Name as 'PreviousName'
FROM Tab t1
OUTER apply (SELECT TOP 1 t2.Date,
t2.Name
FROM Tab t2
WHERE t1.Id = t2.Id
ORDER BY t2.Date DESC) t2
WHERE t1.Name = 'Foobar' )
Technically, I was able to recreate this same functionality in Oracle using LEFT JOIN and LAG() function:
SELECT DISTINCT t1.ID
t1.Name,
t1.Date,
t2.PreviousDate as PreviousDate,
t2.PreviousName as PreviousName
FROM Tab t1
LEFT JOIN (
SELECT ID,
LAG(Name) OVER (PARTITION BY ID ORDER BY PreviousDate) as PreviousName,
LAG(Date) OVER (PARTITION BY ID ORDER BY PreviousDate) as PreviousDate
FROM Tab) t2 ON t2.ID = t1.ID
WHERE t1.Name = 'Foobar'
The issue is the order it executes the Oracle query. It will pull back ALL rows from Tab, order them (because of the LAG function), then it will filter them down using the ON statement when it joins it to the main query. That table has millions of records, so doing that for EACH ID is not feasible. Basically, I want to change the order of operations in the sub-query to just pull back rows for a single ID, sort those rows to find the previous, and join that. Any ideas on how to tweak it?
TL;DR
SQL Server: filters, orders, joins
Oracle: orders, filters, joins
You can look for the latest row per (id) group with row_number():
select *
from tab t1
left join
(
select row_number() over (
partition by id
order by Date desc) as rn
, *
from t2
) t2
on t1.id = t2.id
and t2.rn = 1 -- Latest row per id

Grouping data based on expression using CountDistinct aggregate function

I am newbie to Stack overflow and also SQL server reporting services. So please excuse me for the format of the question.
So here is the situation:
I am developing a SSRS report which needs to be grouped by an Count of Distinct product names as shown below.
I created a text box called ProdCount with an expression
COUNTDISTNCT(Fields!Product.value,"DataSet1")
which gives me the count 63 within the scope of DataSet1.
Now i need to group the data by taking product names where the above formula is >1 .
=IIF(ProdCount>1,Fields!Product.value,Nothing)
My Problem:
I tried to call the ProdCount from the calculated field since i
cant use the aggregate functions in Calculated Fields and use
the second expression by using
= ReportItems!ProdCount.value
which gives me an error FieldValue Denying ReportItems
I tried to combine the above two expressions by creating a calculated field by
IIF(CountDistinct(Fields!Product.Value,"DataSet1")>1,Fields!Product.Value,Nothing)
which gives me an error Calculated fields cannot have expressions
I tried to use Report Variables in the same way as above(1) which was not working either.
I also tried to use CROSS JOIN in the query
Select Count(Distinct(Product Name)
from Query1
Cross join
My Main Query which give me the data
which is taking more time to execute.
So Can anyone help me with solution where i can group the data by combining the above two expressions.
Please excuse me for the format. I was confused with framing question. I accept all your edits , so that i can learn in future.
Here is my code:
SELECT * FROM
--Query1 which counts the number of distinct products)
(SELECT DISTINCT COUNT(gproduct.ProductName) AS ProdCount
FROM Table1
LEFT JOIN Table4
ON Table1.column=Table1.column
LEFT JOIN Table2
ON Table3.Column = TTable1.Column
LEFT JOIN
(
SELECT Distinct Table6.Name AS ProductName,Table9.ColumnId
FROM Table6
INNER JOIN Table7
ON Table6.Column=Table7.Column
INNER JOIN Table8
ON Table7.Column=Table8.Column
INNER JOIN Table9
ON Table9.Column=Table8.Column
)gproduct
ON Table1.ColumnId=gproduct.ColumnId
GROUP BY gproduct.ColumnId,
)qProduct
CROSS JOIN
--My main Query which get data from different table including Product name
(SELECT
Upper(CASE WHEN (CASE WHEN Table4.Column =1 THEN 'Yes' ELSE 'NO' END)='YES'
THEN gab.ProductName
ELSE
Table2.productName
END) AS Product,
FROM Table1 AS ec
LEFT JOIN Table2 AS ep
ON --
LEFT JOIN Table3 AS ebrd
ON --
Left JOIN Table4 AS etpc
ON --
LEFT JOIN Table5 AS gst
ON --
LEFT JOIN
(
SELECT Distinct Table6.Name AS ProductName,Table9.ColumnId
FROM Table6
INNER JOIN Table7
ON Table6.Column=Table7.Column
INNER JOIN Table8
ON Table7.Column=Table8.Column
INNER JOIN Table9
ON Table9.Column=Table8.Column
) gab
ON Table1.ColumnId=gab.ColumnId
)QMain
Personally I would try to solve the problem in query itself instead of SSRS report. According the data you provided it would be something like:
SELECT
ProductName,
count(distinct Product)
from
YourTable
group by
ProductName
having count(distinct product) > 1
Later on creating SSRS report should be quite easy.

Resources