Duplicate values and group by throwing query results off - sql-server

I'm trying to execute a query and having some issues. The objective is to find all duplicate values for a specific field (Upc) that manufacturers (idPub) are using. For example, manufacturer A uses upc 1010 while manufacturer B also uses upc 1010. This data is stored in one table. So far, I've come up with the query below...
USE dbIdwWhseLC
SELECT tbItem.sUpc, COUNT(*) AS NumberofDups
FROM tbItem
WHERE sUpc IS NOT NULL
GROUP BY sUpc
HAVING COUNT(*) > 1
ORDER BY COUNT(*)
The query displays the correct data as far as upc numbers and counts go. However, when I add the manufacturer field to the query, I have to group by that field too, which throws the results off. I'm trying to get the query to return data like this...
Upc idPub
1010 A
1010 B
Any recommendations would be greatly appreciated. Thanks.

You have to join back to your main table, like so:
WITH Duplicates AS
(
Select tbItem.sUpc
,COUNT(*) As NumberofDups
From tbItem
Where sUpc is not null
Group by sUpc
Having COUNT(*)>1
)
SELECT
D.sUpc
,TI.idPub
,M.[name]
FROM
Duplicates AS D
INNER JOIN
tbItem AS TI
ON
D.sUpc = TI.sUpc
INNER JOIN
tbMfrReporting AS M
ON
TI.nIdPub = M.nIdPub
The Duplicates block above is known as a common table expression (CTE).
I hope that helps.
Ash
EDIT: query updated to add further table as per comments

You want to find all the entries where the upc is in the list of UPCs that have more than one entry, so...
select sUpc, idPub
from tbItem
where sUpc in
(
Select tbItem.sUpc
From tbItem
Where sUpc is not null
Group by sUpc
Having COUNT(*)>1
)
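One detail worth checking: COUNT(*) > 1 also flags a UPC that a single manufacturer has entered twice. If only UPCs shared by different manufacturers matter, COUNT(DISTINCT idPub) > 1 is a possible variant. The sketch below tries that idea in SQLite through Python's sqlite3 module, purely as a runnable stand-in for SQL Server; the sample rows are made up, and only the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbItem (sUpc TEXT, idPub TEXT)")
conn.executemany(
    "INSERT INTO tbItem (sUpc, idPub) VALUES (?, ?)",
    [
        ("1010", "A"),
        ("1010", "B"),   # same UPC, different manufacturer
        ("2020", "A"),
        ("2020", "A"),   # same UPC, same manufacturer twice
        ("3030", "C"),
        (None, "D"),     # NULL UPCs are ignored
    ],
)

# Keep only UPCs that appear under more than one distinct manufacturer.
shared_upcs = conn.execute(
    """
    SELECT DISTINCT sUpc, idPub
    FROM tbItem
    WHERE sUpc IN (
        SELECT sUpc
        FROM tbItem
        WHERE sUpc IS NOT NULL
        GROUP BY sUpc
        HAVING COUNT(DISTINCT idPub) > 1
    )
    ORDER BY sUpc, idPub
    """
).fetchall()

print(shared_upcs)
```

With these sample rows, UPC 2020 is left out because only manufacturer A uses it, even though it appears twice.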

Related

SQL Project using a where clause

This is what I am working with. I'm new to SQL and still learning, and I've been stuck on this for a few days now. Any advice would be appreciated. I attached an image of the goal I'm trying to achieve.
OrderItem And Product Table
Order And OrderItem Table(https://i.stack.imgur.com/pdbMT.png)
Scenario: Our boss would like to see the OrderNumber, OrderDate, Product Name, UnitPrice and Quantity for products that have TotalAmounts larger than the average
Create a query with a subquery in the WHERE clause. OrderNumber, OrderDate and TotalAmount come from the Order table. ProductName comes from the Product table. UnitPrice and Quantity come from the OrderItem table.
This is the code I came up with but it causes product name to run endlessly and displays wrong info.
USE TestCorp;
SELECT DISTINCT OrderNumber,
OrderDate,
ProductName,
i.UnitPrice,
Quantity,
TotalAmount
FROM [Order], Product
JOIN OrderItem i ON Product.UnitPrice = i.UnitPrice
WHERE TotalAmount < ( SELECT AVG(TotalAmount)
FROM [Order]
)
ORDER BY TotalAmount DESC;
Best guess assuming joins and fields not provided.
SELECT O.OrderNumber, O.orderDate, P.ProductName, OI.UnitPrice, OI.Quantity, O.TotalAmount
FROM [Order] O
INNER JOIN OrderItem OI
on O.ID = OI.orderID
INNER JOIN Product P
on P.ID= OI.ProductID
CROSS JOIN (SELECT avg(TotalAmount) AvgTotalAmount FROM [Order]) z
WHERE O.TotalAmount > z.AvgTotalAmount
Notes:
You're mixing join notations. Don't use the comma (,) and INNER JOIN together; that mixes the older implicit-join syntax with the explicit ANSI-92 JOIN syntax.
I'm not sure why you have a cross join to product to begin with
You don't specify how to join Order to order item.
It seems very odd to be joining on Price.... join on order ID or productID maybe?
You could cross join to an "Average" result so it's available on every record. (I aliased this inline view "z" in my attempt.)
What the above does is include all orders. For each order, an order item must be associated for it to be included. Then for each order item, a ProductID must be present and related to a record in Product. If for some reason an order item record doesn't have a related entry in the Product table, it gets excluded.
I use a cross join to get the average as it's executed 1 time and applied/joined to every record.
If we use the query in the where clause it's executed one time for EVERY record (unless the DB Engine optimizer figures it out and generates a better plan)
I assume:
Order.ID relates to OrderItem.OrderID
OrderItem.productID relates to Product.ID
Order.TotalAmount is what we are wanting to "Average" and compare against
Every Order has an Order Item entry
Every Order Item entry has a related product.
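To check that the cross join to the average and a plain subquery in the WHERE clause return the same rows, here is a small sketch run against SQLite through Python's sqlite3 module. SQLite is only a stand-in for SQL Server here, and the sample data, along with the ID, OrderID and ProductID key columns, follows the assumptions listed above rather than a known schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    '''
    CREATE TABLE "Order" (ID INTEGER PRIMARY KEY, OrderNumber TEXT, TotalAmount REAL);
    CREATE TABLE Product (ID INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE OrderItem (
        ID INTEGER PRIMARY KEY, OrderID INTEGER, ProductID INTEGER,
        UnitPrice REAL, Quantity INTEGER
    );
    -- Made-up data: the average TotalAmount is (100 + 300 + 50) / 3 = 150.
    INSERT INTO "Order" VALUES (1, 'A-1', 100.0), (2, 'A-2', 300.0), (3, 'A-3', 50.0);
    INSERT INTO Product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO OrderItem VALUES
        (1, 1, 1, 10.0, 10), (2, 2, 1, 10.0, 10), (3, 2, 2, 100.0, 2), (4, 3, 2, 50.0, 1);
    '''
)

# Shared SELECT list and joins on the assumed key columns.
base = '''
    SELECT O.OrderNumber, P.ProductName, OI.UnitPrice, OI.Quantity, O.TotalAmount
    FROM "Order" O
    INNER JOIN OrderItem OI ON O.ID = OI.OrderID
    INNER JOIN Product P ON P.ID = OI.ProductID
'''

# Variant 1: cross join to a one-row derived table holding the average.
cross_join_rows = conn.execute(
    base + '''
    CROSS JOIN (SELECT AVG(TotalAmount) AS AvgTotalAmount FROM "Order") z
    WHERE O.TotalAmount > z.AvgTotalAmount
    ORDER BY P.ProductName
    '''
).fetchall()

# Variant 2: subquery in the WHERE clause, as the assignment asks for.
subquery_rows = conn.execute(
    base + '''
    WHERE O.TotalAmount > (SELECT AVG(TotalAmount) FROM "Order")
    ORDER BY P.ProductName
    '''
).fetchall()

print(cross_join_rows)
print(subquery_rows)
```

Only order A-2 is above the average in this sample, so both variants return its two order items.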

How to filter or split a CTE so that 2 rows are not added with the same value in a specific column

So the title sounds convoluted because my problem kinda is.
I have a CTE that pulls in some values (LineId, OrderNumber, OrderLine, Type, BuildUsed)
Later on I have a SELECT that populates a view and joins on the CTE with something like this:
left join CTE C on C.LineId = (select top 1 lineId from CTE C2 where C2.orderNumber = orderNumber and C2.orderLine = orderLine order by LineId)
An example of my data would look like
LineId = 10, Order : OIP001, Line = 1, Type = Active, BuildUsed = XE9
LineId = 80, Order : OIP001, Line = 1, Type = Inactive, BuildUsed = XB2
The CTE does a Select, Union, Select. The first select gets all the active entries and the 2nd select gets all the inactive entries.
Any given order could have both active and inactive entries, or just one of them.
The issue I am having is that my runtime is bad. It runs in close to 20 seconds when it should be more like 4 or 5. The join I listed above has to search and order every time, and it's a huge time sink.
So I wondered whether there is a way to break the CTE into 2 steps:
Insert all the active orders (These are the ones that I would want to pick if they are available)
Insert all the inactive orders (If that ordernumber and orderline does not already exist in the first step)
That way I don't have to order and sort for every single join; I can just do a normal join that's significantly faster.
If it helps at all, the LineId is based on a ROW_NUMBER() in the CTE that looks like
ROW_NUMBER() OVER(ORDER BY Type desc, DescriptionStatus asc) as LineId
So the LineId is already ordered correctly.
Is there any way to split the CTE so that the 2nd part of my select can check whether the ordernumber and orderline already exist in the first part?
To clarify: I would like to find any Active entries for the ordernumber and orderline first and then, if none are found, try the inactive entries.
WHAT I HAVE TRIED SO FAR :
I tried adding the query for the 2nd part into the first part as a where clause. So it would only add where it wouldn't exist in the first part. But the time of the query got so insane I just stopped running it and scrapped that idea.
I believe you're just looking for a WHERE NOT EXISTS that uses a correlated sub-query to eliminate rows from your second result set that you've already retrieved in your first result set.
WHERE NOT EXISTS is generally pretty performant, but test the CTE by itself to be sure it meets your needs.
Something similar to this:
WITH cte
AS
(
SELECT
act.LineID,
act.OrderNumber,
act.OrderLine,
act.Type,
act.BuildUsed
FROM
ActiveSource AS act
UNION ALL
SELECT
inact.LineID
,inact.OrderNumber
,inact.OrderLine
,inact.Type
,inact.BuildUsed
FROM
InactiveSource AS inact
WHERE
NOT EXISTS
(
SELECT
1
FROM
ActiveSource AS a
WHERE
a.OrderNumber = inact.OrderNumber
AND a.OrderLine = inact.OrderLine
)
)
SELECT * FROM cte;
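The NOT EXISTS pattern above can be tried on the two sample rows from the question. This is a minimal sketch using SQLite through Python's sqlite3 module as a stand-in for SQL Server; ActiveSource and InactiveSource are the placeholder names from the query above, and the extra OIP002 row is made up to show an order that has no active entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ActiveSource (LineId INTEGER, OrderNumber TEXT, OrderLine INTEGER, Type TEXT, BuildUsed TEXT);
    CREATE TABLE InactiveSource (LineId INTEGER, OrderNumber TEXT, OrderLine INTEGER, Type TEXT, BuildUsed TEXT);
    INSERT INTO ActiveSource VALUES (10, 'OIP001', 1, 'Active', 'XE9');
    INSERT INTO InactiveSource VALUES
        (80, 'OIP001', 1, 'Inactive', 'XB2'),   -- shadowed by the active row
        (90, 'OIP002', 1, 'Inactive', 'XC3');   -- no active row, so it is kept
    """
)

# Active rows first, then inactive rows only where no active row
# exists for the same OrderNumber and OrderLine.
rows = conn.execute(
    """
    WITH cte AS (
        SELECT act.LineId, act.OrderNumber, act.OrderLine, act.Type, act.BuildUsed
        FROM ActiveSource AS act
        UNION ALL
        SELECT inact.LineId, inact.OrderNumber, inact.OrderLine, inact.Type, inact.BuildUsed
        FROM InactiveSource AS inact
        WHERE NOT EXISTS (
            SELECT 1
            FROM ActiveSource AS a
            WHERE a.OrderNumber = inact.OrderNumber
              AND a.OrderLine = inact.OrderLine
        )
    )
    SELECT * FROM cte ORDER BY LineId
    """
).fetchall()

print(rows)
```

Each OrderNumber and OrderLine pair now appears once, so the later join can match on those two columns directly instead of running a TOP 1 ... ORDER BY lookup per row.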

SQL queries combined into one row

I'm having some difficulty combining the following queries, so that the results display in one row rather than in multiple rows:
SELECT value FROM dbo.parameter WHERE name='xxxxx.name'
SELECT dbo.contest.name AS Event_Name
FROM contest
INNER JOIN open_box on open_box.contest_id = contest.id
GROUP BY dbo.contest.name
SELECT COUNT(*) FROM open_option AS total_people
SELECT SUM(scanned) AS TotalScanned,SUM(number) AS Totalnumber
FROM dbo.open_box
GROUP BY contest_id
SELECT COUNT(*) FROM open AS reff
WHERE refer = 'True'
I would like to display data from the fields in each column similar to what is shown in the image below. Any help is appreciated!
Tab's solution is fine, I just wanted to show an alternative way of doing this. The following statement uses subqueries to get the information in one row:
SELECT
[xxxx.name]=(SELECT value FROM dbo.parameter WHERE name='xxxxx.name'),
[Event Name]=(SELECT dbo.contest.name
FROM contest
INNER JOIN open_box on open_box.contest_id = contest.id
GROUP BY dbo.contest.name),
[Total People]=(SELECT COUNT(*) FROM open_option),
[Total Scanned]=(SELECT SUM(scanned)
FROM dbo.open_box
GROUP BY contest_id),
[Total Number]=(SELECT SUM(number)
FROM dbo.open_box
GROUP BY contest_id),
Ref=(SELECT COUNT(*) FROM open WHERE refer = 'True');
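This requires Total Scanned and Total Number to be queried separately. Note that each subquery used this way must return no more than one row; a subquery that returns several rows (for example, one row per contest_id) raises an error.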
Update: if you then want to INSERT that into another table there are essentially two ways to do that.
Create the table directly from the SELECT statement:
SELECT
-- the fields from the first query
INTO
[database_name].[schema_name].[new_table_name]; -- creates table new_table_name
Insert into a table that already exists with INSERT ... SELECT:
INSERT INTO [database_name].[schema_name].[existing_table_name](
-- the fields in the existing_table_name
)
SELECT
-- the fields from the first query
Just CROSS JOIN the five queries as derived tables:
SELECT * FROM (
Query1
) AS q1
CROSS JOIN (
Query2
) AS q2
CROSS JOIN (...
Assuming that each of your individual queries only returns one row, then this CROSS JOIN should result in only one row.
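As a rough illustration of the CROSS JOIN approach, the sketch below combines three single-row derived tables into one row. It runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server. The table and column names are taken from the question, the data is made up, and the GROUP BY contest_id is dropped on the assumption that there is only one contest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parameter (name TEXT, value TEXT);
    CREATE TABLE open_option (id INTEGER);
    CREATE TABLE open_box (contest_id INTEGER, scanned INTEGER, number INTEGER);
    INSERT INTO parameter VALUES ('xxxxx.name', 'Demo');
    INSERT INTO open_option VALUES (1), (2), (3);
    INSERT INTO open_box VALUES (1, 5, 10), (1, 7, 20);
    """
)

# Each derived table returns exactly one row, so the cross join
# of all of them also returns exactly one row.
combined = conn.execute(
    """
    SELECT q1.value, q2.total_people, q3.TotalScanned, q3.TotalNumber
    FROM (SELECT value FROM parameter WHERE name = 'xxxxx.name') AS q1
    CROSS JOIN (SELECT COUNT(*) AS total_people FROM open_option) AS q2
    CROSS JOIN (
        SELECT SUM(scanned) AS TotalScanned, SUM(number) AS TotalNumber
        FROM open_box
    ) AS q3
    """
).fetchall()

print(combined)
```

If any derived table returns zero rows, the whole cross join returns zero rows, and if one returns several rows, the result is multiplied accordingly, so the one-row assumption matters.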

Select all Item From Table Where Order has any Items between dates

I have an Orders table and an OrderItem table. I would like to select all OrderItems that have been shipped between 2 dates, and select the additional OrderItem of a certain type that was shipped outside of the 2 dates if it's part of an Order that has OrderItems shipped between the 2 dates.
This seemed really easy when I first thought of it, but I'm having a hard time putting it into a SQL statement. I'm using SQL Server.
EDIT:
Yes, I am familiar with the between keyword. What I have is an Order, Say Order #10001. It has 2 items, a product that is shipped on 01/20/2015 and a warranty that is marked as shipped on 02/04/2015. So when I run my query:
SELECT *
FROM OrderItems
WHERE ShipDate BETWEEN '01/01/2015' AND '01/31/2015'
I only get the 1 product, I want to get the warranty that is on the Order as well.
Hope that clarifies my question.
You can do this like this:
SELECT *
FROM OrderItems
WHERE OrderID IN(
SELECT DISTINCT OrderID
FROM OrderItems
WHERE ShipDate BETWEEN '01/01/2015' AND '01/31/2015'
)
Or:
SELECT *
FROM OrderItems oi1
JOIN (
SELECT DISTINCT OrderID
FROM OrderItems
WHERE ShipDate BETWEEN '01/01/2015' AND '01/31/2015'
) oi2 ON oi1.OrderID = oi2.OrderID
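The IN version can be checked against the order described in the question. This is a minimal sketch on SQLite through Python's sqlite3 module, used as a stand-in for SQL Server. The ItemType and OrderItemID columns and order 10002 are made up for illustration, and dates are stored as ISO strings because SQLite has no date type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE OrderItems (OrderItemID INTEGER, OrderID INTEGER, ItemType TEXT, ShipDate TEXT);
    INSERT INTO OrderItems VALUES
        (1, 10001, 'Product',  '2015-01-20'),
        (2, 10001, 'Warranty', '2015-02-04'),  -- outside the range, same order
        (3, 10002, 'Product',  '2015-02-10');  -- order with nothing in range
    """
)

# Return every item of any order that has at least one item
# shipped inside the date range.
items = conn.execute(
    """
    SELECT OrderItemID, OrderID, ItemType, ShipDate
    FROM OrderItems
    WHERE OrderID IN (
        SELECT DISTINCT OrderID
        FROM OrderItems
        WHERE ShipDate BETWEEN '2015-01-01' AND '2015-01-31'
    )
    ORDER BY OrderItemID
    """
).fetchall()

print(items)
```

Note that this returns all items of a matching order, not only items of one particular type; an extra condition on the item type would be needed to narrow the out-of-range rows further.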
Are you familiar with BETWEEN keyword?
SELECT ...
WHERE col BETWEEN AND
If you add more information, such as sample data to your question, I can elaborate on the answer.

Multiple Select against one CTE

I have a CTE query filtering a table Student
Student
(
StudentId PK,
FirstName ,
LastName,
GenderId,
ExperienceId,
NationalityId,
CityId
)
Based on a lot of filters (multiple cities, gender, multiple experiences (1, 2, 3), multiple nationalities), I create a CTE by using dynamic SQL and joining the Student table with user-defined tables (CityTable, NationalityTable, ...).
After that I have to retrieve the count of students for each filter, like
CityId City Count
NationalityId Nationality Count
and the same thing for the other filters.
Can I do something like
;WITH CTE AS (
Select
FROM Student
Inner JOIN ...
INNER JOIN ....)
SELECT CityId,City,Count(studentId)
FROM CTE
GROUP BY CityId,City
SELECT GenderId,Gender,Count(studentId)
FROM CTE
GROUP BY GenderId,Gender
I want to do something like what LinkedIn does with search (people search, job search):
http://www.linkedin.com/search/fpsearch?type=people&keywords=sales+manager&pplSearchOrigin=GLHD&pageKey=member-home
It's so fast and does the same thing.
You cannot run multiple SELECT statements against one CTE, but you can define more than one CTE, like this:
WITH CTEA
AS
(
SELECT 'Column1' A,'Column2' B
),
CTEB
AS
(
SELECT 'ColumnX' X,'ColumnY' Y
)
SELECT * FROM CTEA, CTEB
To get the count, use ROW_NUMBER and a windowed COUNT in the CTE, something like this:
ROW_NUMBER() OVER ( ORDER BY COLUMN NAME )AS RowNumber,
Count(1) OVER() AS TotalRecordsFound
Please let me know if you need more information on this.
Sample for your reference.
With CTE AS (
Select StudentId, S.CityId, S.GenderId
FROM Student S
Inner JOIN CITY C
ON S.CityId = C.CityId
INNER JOIN GENDER G
ON S.GenderId = G.GenderId)
,
GENDER
AS
(
SELECT GenderId
FROM CTE
GROUP BY GenderId
)
SELECT * FROM GENDER, CTE
It is not possible to get multiple result sets from a single CTE.
You can however use a table variable to cache some of the information and use it later instead of issuing the same complex query multiple times:
declare @relevantStudent table (StudentID int);
insert into @relevantStudent
select s.StudentID from Student s
join ...
where ...
-- now issue the multiple queries
select s.GenderID, count(*)
from Student s
join @relevantStudent r on r.StudentID = s.StudentID
group by s.GenderID
select s.CityID, count(*)
from Student s
join @relevantStudent r on r.StudentID = s.StudentID
group by s.CityID
The trick is to store only the minimum required information in the table variable.
As with any query whether this will actually improve performance vs. issuing the queries independently depends on many things (how big the table variable data set is, how complex is the query used to populate it and how complex are the subsequent joins/subselects against the table variable, etc.).
Do a UNION ALL to do multiple SELECT and concatenate the results together into one table.
;WITH CTE AS(
SELECT
FROM Student
INNER JOIN ...
INNER JOIN ....)
SELECT CityId,City,NULL AS GenderId,NULL AS Gender,Count(studentId) AS StudentCount
FROM CTE
GROUP BY CityId,City
UNION ALL
SELECT NULL,NULL,GenderId,Gender,Count(studentId)
FROM CTE
GROUP BY GenderId,Gender
Note: The NULL values above just allow the two results to have matching columns, so the results can be concatenated.
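A variant of the same UNION ALL idea uses a label column instead of NULL padding, so every branch has the same three columns. The sketch below runs it on SQLite through Python's sqlite3 module as a stand-in for SQL Server. The Facet and FacetId column names, the filter on ExperienceId and the sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Student (StudentId INTEGER, CityId INTEGER, GenderId INTEGER, ExperienceId INTEGER);
    INSERT INTO Student VALUES
        (1, 1, 1, 1),
        (2, 1, 2, 2),
        (3, 2, 1, 1),
        (4, 2, 2, 3);  -- filtered out by the CTE below
    """
)

# One CTE, one statement: each branch groups the same filtered set
# by a different column and tags its rows with a facet label.
facet_counts = conn.execute(
    """
    WITH cte AS (
        SELECT StudentId, CityId, GenderId
        FROM Student
        WHERE ExperienceId IN (1, 2)
    )
    SELECT 'City' AS Facet, CityId AS FacetId, COUNT(StudentId) AS StudentCount
    FROM cte
    GROUP BY CityId
    UNION ALL
    SELECT 'Gender', GenderId, COUNT(StudentId)
    FROM cte
    GROUP BY GenderId
    ORDER BY Facet, FacetId
    """
).fetchall()

print(facet_counts)
```

The caller can then split the single result set by the Facet column to build each filter list.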
I know this is a very old question, but here's a solution I just used. I have a stored procedure that returns a PAGE of search results, and I also need it to return the total count matching the query parameters.
WITH results AS (...complicated foo here...)
SELECT results.*,
CASE
WHEN @page=0 THEN (SELECT COUNT(*) FROM results)
ELSE -1
END AS totalCount
FROM results
ORDER BY bar
OFFSET @page * @pageSize ROWS FETCH NEXT @pageSize ROWS ONLY;
With this approach, there's a small "hit" on the first results page to get the count, and for the remaining pages, I pass back "-1" to avoid the hit (I assume the number of results won't change during the user session). Even though totalCount is returned for every row of the first page of results, it's only computed once.
My CTE is doing a bunch of filtering based on stored procedure arguments, so I couldn't just move it to a view and query it twice. This approach avoids having to duplicate the CTE's logic just to get a count.
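The paging pattern can be sketched outside a stored procedure as well. The example below uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, so it uses LIMIT/OFFSET where T-SQL uses OFFSET ... FETCH. The foo table, its rows and the fetch_page helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (bar INTEGER)")
conn.executemany("INSERT INTO foo (bar) VALUES (?)", [(n,) for n in range(1, 6)])


def fetch_page(connection, page, page_size):
    """Return one page of rows; totalCount is real on page 0 and -1 afterwards."""
    return connection.execute(
        """
        WITH results AS (
            SELECT bar FROM foo WHERE bar > 0  -- stand-in for the complicated filter
        )
        SELECT results.bar,
               CASE
                   WHEN :page = 0 THEN (SELECT COUNT(*) FROM results)
                   ELSE -1
               END AS totalCount
        FROM results
        ORDER BY bar
        LIMIT :size OFFSET :offset
        """,
        {"page": page, "size": page_size, "offset": page * page_size},
    ).fetchall()


first_page = fetch_page(conn, 0, 2)
second_page = fetch_page(conn, 1, 2)
print(first_page)
print(second_page)
```

The first page carries the total of 5 on each of its rows, while later pages return -1 in that column.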
