Count of operations in group by - sql-server

Recently, in my colleague's code, I saw an SQL query that used GROUP BY with lots of columns. Most of these columns didn't need to be grouped in the query. She did this to prevent this error:
Column 'some_col' is invalid in the select list because it is
not contained in either an aggregate function or the GROUP BY clause.
I was wondering how heavy GROUP BY is, and whether it is OK to use such statements. If it is heavy, then I'd better optimize her query, because I now work on that piece of code.

It is hard to tell for sure without seeing the specific query, but I used to achieve surprising performance gains (at least in SQL Server 2000) by minimizing the number of columns included in GROUP BY and resolving those columns afterwards with a join on the inner query. To be more specific: let's assume you have the classic OrderDetails (OrderID, ProductID, Quantity, Price) and Products (ProductID, ProductName) tables. Changing this query:
select P.ProductID, ProductName, sum(Quantity * Price)
from Products as P
inner join OrderDetails as OD on P.ProductID = OD.ProductID
group by P.ProductID, ProductName
to this:
select X.ProductID, PP.ProductName, X.OrderValue
from
(
select P.ProductID, sum(Quantity * Price) as OrderValue
from Products as P
inner join OrderDetails as OD on P.ProductID = OD.ProductID
group by P.ProductID
) as X
inner join Products as PP on X.ProductID = PP.ProductID
would give me a performance gain despite two joins to the same table, because grouping on an integer index was faster than grouping on the text-valued, unsorted product name.
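To check that the rewrite returns the same rows, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The sample rows are made up for illustration, and SQLite will not reproduce SQL Server's query plans, so this only shows that the two forms are equivalent, not that one is faster.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE OrderDetails (OrderID INTEGER, ProductID INTEGER,
                               Quantity INTEGER, Price REAL);
    INSERT INTO Products VALUES (1, 'Chai'), (2, 'Tofu');
    INSERT INTO OrderDetails VALUES (10, 1, 2, 5.0), (11, 1, 1, 5.0), (12, 2, 3, 4.0);
""")

# Form 1: group by the key and the descriptive text column.
WIDE_GROUP_BY = """
    SELECT P.ProductID, P.ProductName, SUM(OD.Quantity * OD.Price) AS OrderValue
    FROM Products AS P
    INNER JOIN OrderDetails AS OD ON P.ProductID = OD.ProductID
    GROUP BY P.ProductID, P.ProductName
    ORDER BY P.ProductID
"""

# Form 2: group by the key only, then join back to pick up the name.
NARROW_GROUP_BY = """
    SELECT X.ProductID, PP.ProductName, X.OrderValue
    FROM (
        SELECT P.ProductID, SUM(OD.Quantity * OD.Price) AS OrderValue
        FROM Products AS P
        INNER JOIN OrderDetails AS OD ON P.ProductID = OD.ProductID
        GROUP BY P.ProductID
    ) AS X
    INNER JOIN Products AS PP ON X.ProductID = PP.ProductID
    ORDER BY X.ProductID
"""

wide_rows = conn.execute(WIDE_GROUP_BY).fetchall()
narrow_rows = conn.execute(NARROW_GROUP_BY).fetchall()
print(wide_rows)                 # [(1, 'Chai', 15.0), (2, 'Tofu', 12.0)]
print(wide_rows == narrow_rows)  # True
```

Whether the second form is actually faster on your server depends on the indexes and the optimizer, so compare the execution plans on real data before rewriting a working query.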

Related

SQL Project using a where clause

This is what I am working with. I'm new to SQL and still learning, and I've been stuck on this for a few days now. Any advice would be appreciated. I attached an image of the goal I'm trying to achieve.
OrderItem And Product Table
Order And OrderItem Table(https://i.stack.imgur.com/pdbMT.png)
Scenario: Our boss would like to see the OrderNumber, OrderDate, Product Name, UnitPrice and Quantity for products that have TotalAmounts larger than the average
Create a query with a subquery in the WHERE clause. OrderNumber, OrderDate and TotalAmount come from the Order table. ProductName comes from the Product table. UnitPrice and Quantity come from the OrderItem table.
This is the code I came up with, but it causes the product name to run endlessly and displays the wrong information.
USE TestCorp;
SELECT DISTINCT OrderNumber,
OrderDate,
ProductName,
i.UnitPrice,
Quantity,
TotalAmount
FROM [Order], Product
JOIN OrderItem i ON Product.UnitPrice = i.UnitPrice
WHERE TotalAmount < ( SELECT AVG(TotalAmount)
FROM [Order]
)
ORDER BY TotalAmount DESC;
Best guess, assuming the joins and fields that were not provided:
SELECT O.OrderNumber, O.orderDate, P.ProductName, OI.UnitPrice, OI.Quantity, O.TotalAmount
FROM [Order] O
INNER JOIN OrderItem OI
on O.ID = OI.orderID
INNER JOIN Product P
on P.ID= OI.ProductID
CROSS JOIN (SELECT avg(TotalAmount) AvgTotalAmount FROM [Order]) z
WHERE O.TotalAmount > z.AvgTotalAmount
Notes:
You're mixing join notations: don't use the comma syntax and INNER JOIN together. That mixes the older ANSI-89 (comma) style with the ANSI-92 (explicit JOIN) style.
I'm not sure why you have a cross join to Product to begin with.
You don't specify how to join Order to OrderItem.
It seems very odd to be joining on price. Join on order ID or product ID, maybe?
You could cross join to an "average" result so it's available on every record. (I aliased this inline view "z" in my attempt.)
What the above does is include all orders. For each order, an order item must be associated for it to be included. Then, for each order item, a ProductID must be present and related to a record in Product. If for some reason an order item record doesn't have a related entry in the Product table, it gets excluded.
I use a cross join to get the average because it's executed one time and applied/joined to every record.
If we use the query in the WHERE clause, it may be executed once for EVERY record (unless the DB engine's optimizer figures it out and generates a better plan).
I assume:
Order.ID relates to OrderItem.OrderID
OrderItem.productID relates to Product.ID
Order.TotalAmount is what we are wanting to "Average" and compare against
Every Order has an Order Item entry
Every Order Item entry has a related product.
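A small sketch of the two variants discussed above (the cross join to the average, and the subquery in the WHERE clause that the assignment asks for), run against SQLite through Python's sqlite3 module. The schema follows the assumptions listed above, and the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical rows; the schema mirrors the assumptions listed above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [Order] (ID INTEGER PRIMARY KEY, OrderNumber TEXT,
                          OrderDate TEXT, TotalAmount REAL);
    CREATE TABLE Product (ID INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE OrderItem (OrderID INTEGER, ProductID INTEGER,
                            UnitPrice REAL, Quantity INTEGER);
    INSERT INTO [Order] VALUES (1, 'A-1', '2024-01-01', 100.0),
                               (2, 'A-2', '2024-01-02', 300.0),
                               (3, 'A-3', '2024-01-03', 200.0);
    INSERT INTO Product VALUES (1, 'Chai'), (2, 'Tofu');
    INSERT INTO OrderItem VALUES (1, 1, 5.0, 20), (2, 1, 5.0, 20),
                                 (2, 2, 4.0, 50), (3, 2, 4.0, 50);
""")

SELECT_AND_JOINS = """
    SELECT O.OrderNumber, O.OrderDate, P.ProductName,
           OI.UnitPrice, OI.Quantity, O.TotalAmount
    FROM [Order] AS O
    INNER JOIN OrderItem AS OI ON O.ID = OI.OrderID
    INNER JOIN Product AS P ON P.ID = OI.ProductID
"""

# Variant 1: cross join to a one-row derived table holding the average.
cross_join_sql = SELECT_AND_JOINS + """
    CROSS JOIN (SELECT AVG(TotalAmount) AS AvgTotalAmount FROM [Order]) AS z
    WHERE O.TotalAmount > z.AvgTotalAmount
    ORDER BY P.ProductName
"""

# Variant 2: uncorrelated subquery in the WHERE clause.
subquery_sql = SELECT_AND_JOINS + """
    WHERE O.TotalAmount > (SELECT AVG(TotalAmount) FROM [Order])
    ORDER BY P.ProductName
"""

cross_join_rows = conn.execute(cross_join_sql).fetchall()
subquery_rows = conn.execute(subquery_sql).fetchall()
for row in cross_join_rows:
    print(row)
print(cross_join_rows == subquery_rows)  # True
```

The average here is 200, so only order A-2 (300) qualifies, and it appears once per order item. Both variants return the same rows; which one the engine runs more efficiently is something to confirm in the execution plan.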

SQL Query Multiple Joins Unexpected Results

My task is to write a query that will return sales information for each customer category and year. The columns required in the result set are:
OrderYear - the year the orders were placed
CustomerCategoryName - as it appears in the table Sales.CustomerCategories
CustomerCount - the number of unique customers placing orders for each CustomerCategoryName and OrderYear
OrderCount - the number of orders placed for each CustomerCategoryName and OrderYear
Sales - the subtotal from the orders placed, calculated from Quantity and UnitPrice of the table Sales.OrderLines
AverageSalesPerCustomer - the average sales per customer for each CustomerCategoryName and OrderYear
The results should be sorted in ascending order, first by order year, then by customer category name.
My attempt at a solution:
SELECT
CC.CustomerCategoryName,
YEAR(O.OrderDate) AS OrderYear,
COUNT(DISTINCT C.CustomerID) AS CustomerCount,
COUNT(DISTINCT O.OrderID) AS OrderCount,
SUM(OL.Quantity * OL.UnitPrice) AS Sales,
SUM(OL.Quantity * OL.UnitPrice) / COUNT(DISTINCT C.CustomerID) AS AverageSalesPerCustomer
FROM
Sales.CustomerCategories CC
INNER JOIN
Sales.Customers C ON C.CustomerCategoryID = CC.CustomerCategoryID
INNER JOIN
Sales.Orders O ON O.CustomerID = C.CustomerID
INNER JOIN
Sales.OrderLines OL ON OL.OrderID = O.OrderID
GROUP BY
CC.CustomerCategoryName, YEAR(O.OrderDate)
ORDER BY
YEAR(O.OrderDate), CC.CustomerCategoryName;
My OrderCount seems correct. However, I don't believe my CustomerCount is correct, and my Sales and AverageSalesPerCustomer seem way off. The categories that do not have any customers and orders do not show up in my results.
Are my counts off, and are the categories without any customers omitted, because they only have null values? I believe the question is looking for all the categories.
I am using the WideWorldImporters sample tables from Microsoft.
Any help would be appreciated, as I am new to SQL and joins are a very hard concept for me to understand.
Presently, you're getting only the data that exists in the order lines, and nothing for the non-existent orders. Normally, this is accomplished with outer joins instead of inner joins, plus an isnull(possiblyNullValue, replacementValue).
Also, while you're grouping by year(o.OrderDate), your join for orders isn't distinguishing by year, so it is probably getting all years' worth of data for each customer for each reporting period.
So, let's get the reporting period out first...and make sure we're basing our results on that:
select distinct year(o.OrderDate) from Sales.Orders
But really, you want all categories and all years...so you can combine them to get the real basis:
select
cc.CustomerCategoryId,
cc.CustomerCategoryName,
year(o.OrderDate)
from
Sales.Orders o
cross join
Sales.CustomerCategories cc
group by
cc.CustomerCategoryId,
cc.CustomerCategoryName,
year(o.OrderDate)
Now, you want to join this mess into the remaining query. There are two ways to do this...one is to use a with clause...but sometimes it's just easier to just wrap the basis query up in parentheses and use it as if it was a table:
select
cy.CustomerCategoryName,
cy.CalendarYear,
count(distinct c.CustomerId) CustomerCount,
isnull(sum(ol.UnitPrice * ol.Quantity),0.0) Sales,
isnull(sum(ol.UnitPrice * ol.Quantity) / count(distinct c.CustomerId),0.0) AverageSalesPerCustomer
from
(
select
cc.CustomerCategoryId,
cc.CustomerCategoryName,
year(o.OrderDate) CalendarYear --> must name calc'd cols in virtual tables
from
Sales.Orders o
cross join
Sales.CustomerCategories cc
group by
cc.CustomerCategoryId,
cc.CustomerCategoryName,
year(o.OrderDate)
) as cy --> cy is the "Category Years" virtual table
left outer join
Sales.Customers c
on cy.CustomerCategoryId = c.CustomerCategoryId
left outer join
Sales.Orders o
on
c.CustomerId = o.CustomerId --> join on customer and year
and --> to make sure we're only getting
cy.CalendarYear = Year(o.OrderDate) --> orders in the right year
left outer join
Sales.OrderLines ol
on o.OrderId = ol.OrderId
group by
cy.CalendarYear,
cy.CustomerCategoryName
order by
cy.CalendarYear,
cy.CustomerCategoryName
By the way...get comfortable messing with your queries to select some subset...for example, you can add a where clause to select only one company...and then go have a look at the details...to see if it passes the smell test. It's a lot easier to evaluate the results when you limit them. Similarly, you can add the customer to the select list and the outer grouping for the same reason. Experimentation is the key.
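As a way to experiment with the "category years" idea on a tiny data set, here is a sketch using Python's sqlite3 module. It is an adaptation, not the exact query above: SQLite has no YEAR() or ISNULL(), so strftime('%Y', ...) and COALESCE stand in for them, and the sample rows are invented. It also counts o.CustomerID rather than c.CustomerId, on my assumption that only customers who actually ordered in that year should be counted.

```python
import sqlite3

# Hypothetical, trimmed-down versions of the WideWorldImporters tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerCategories (CustomerCategoryID INTEGER, CustomerCategoryName TEXT);
    CREATE TABLE Customers (CustomerID INTEGER, CustomerCategoryID INTEGER);
    CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER, OrderDate TEXT);
    CREATE TABLE OrderLines (OrderID INTEGER, Quantity INTEGER, UnitPrice REAL);
    INSERT INTO CustomerCategories VALUES (1, 'Retail'), (2, 'Wholesale');
    INSERT INTO Customers VALUES (1, 1), (2, 1);  -- 'Wholesale' has no customers
    INSERT INTO Orders VALUES (1, 1, '2015-03-01'), (2, 2, '2016-05-01');
    INSERT INTO OrderLines VALUES (1, 2, 10.0), (1, 1, 5.0), (2, 3, 10.0);
""")

CATEGORY_YEARS_SQL = """
    SELECT cy.CalendarYear,
           cy.CustomerCategoryName,
           COUNT(DISTINCT o.CustomerID) AS CustomerCount,
           COUNT(DISTINCT o.OrderID) AS OrderCount,
           COALESCE(SUM(ol.UnitPrice * ol.Quantity), 0.0) AS Sales
    FROM (
        -- every category paired with every year that has at least one order
        SELECT DISTINCT cc.CustomerCategoryID,
               cc.CustomerCategoryName,
               CAST(strftime('%Y', o.OrderDate) AS INTEGER) AS CalendarYear
        FROM Orders AS o
        CROSS JOIN CustomerCategories AS cc
    ) AS cy
    LEFT JOIN Customers AS c
           ON cy.CustomerCategoryID = c.CustomerCategoryID
    LEFT JOIN Orders AS o
           ON c.CustomerID = o.CustomerID
          AND cy.CalendarYear = CAST(strftime('%Y', o.OrderDate) AS INTEGER)
    LEFT JOIN OrderLines AS ol
           ON o.OrderID = ol.OrderID
    GROUP BY cy.CalendarYear, cy.CustomerCategoryName
    ORDER BY cy.CalendarYear, cy.CustomerCategoryName
"""

report_rows = conn.execute(CATEGORY_YEARS_SQL).fetchall()
for row in report_rows:
    print(row)
```

With this data the 'Wholesale' category shows up for both years with zero counts and zero sales, which is the behaviour the inner joins were hiding.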

SQL is it possible to use AND/OR with HAVING clause

So I have these tables:
Products
--------
Product ID | Quantity
AND
OrdersLines
-----------
Product ID | Amount --(multiple lines with the same ID)
I'm using this select:
SELECT
P.ProductID,
P.Quantity,
SUM(OL.Amount) AS Ordered
FROM atbl_Sales_Products AS P
INNER JOIN atbl_Sales_OrdersLines AS OL ON OL.ProductID = P.ProductID
GROUP BY P.ProductID, P.Quantity
HAVING P.Quantity > SUM(OL.Amount)
The select works properly if ProductID is used in both tables.
However, if ProductID is not used in OrdersLines table or Amount in that table is Null - such rows are not included.
When you want to join across tables but always need to include records from one side of the join or the other, you need to use one of the OUTER JOINs as opposed to the INNER JOIN in your SQL. If you want to include a record from your atbl_Sales_Products even when there may be no matching record in atbl_Sales_OrdersLines with the same ProductID, then you should use a LEFT JOIN.
As mentioned in the comments you can use any operators you use in a WHERE clause with a HAVING clause.
SELECT
P.ProductID,
P.Quantity,
SUM(OL.Amount) AS Ordered
FROM atbl_Sales_Products AS P
LEFT JOIN atbl_Sales_OrdersLines AS OL
ON P.ProductID = OL.ProductID
GROUP BY P.ProductID, P.Quantity
HAVING SUM(OL.Amount) IS NULL
OR P.Quantity > SUM(OL.Amount)
Additional OR statement solved my problem.
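For anyone who wants to see the difference on a small data set, here is a sketch that runs the INNER JOIN and the LEFT JOIN versions side by side in SQLite via Python's sqlite3 module. The table names come from the question; the rows are invented for illustration.

```python
import sqlite3

# Hypothetical rows: product 3 has no order lines at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE atbl_Sales_Products (ProductID INTEGER, Quantity INTEGER);
    CREATE TABLE atbl_Sales_OrdersLines (ProductID INTEGER, Amount INTEGER);
    INSERT INTO atbl_Sales_Products VALUES (1, 10), (2, 5), (3, 7);
    INSERT INTO atbl_Sales_OrdersLines VALUES (1, 4), (1, 3), (2, 6);
""")

# Original query: products without order lines never reach the HAVING clause.
inner_rows = conn.execute("""
    SELECT P.ProductID, P.Quantity, SUM(OL.Amount) AS Ordered
    FROM atbl_Sales_Products AS P
    INNER JOIN atbl_Sales_OrdersLines AS OL ON OL.ProductID = P.ProductID
    GROUP BY P.ProductID, P.Quantity
    HAVING P.Quantity > SUM(OL.Amount)
    ORDER BY P.ProductID
""").fetchall()

# LEFT JOIN keeps every product; the extra OR lets the NULL sums through.
left_rows = conn.execute("""
    SELECT P.ProductID, P.Quantity, SUM(OL.Amount) AS Ordered
    FROM atbl_Sales_Products AS P
    LEFT JOIN atbl_Sales_OrdersLines AS OL ON P.ProductID = OL.ProductID
    GROUP BY P.ProductID, P.Quantity
    HAVING SUM(OL.Amount) IS NULL
        OR P.Quantity > SUM(OL.Amount)
    ORDER BY P.ProductID
""").fetchall()

print(inner_rows)  # [(1, 10, 7)]
print(left_rows)   # [(1, 10, 7), (3, 7, None)]
```

Product 2 is dropped in both cases because 6 units were ordered against a quantity of 5. Product 3 only appears with the LEFT JOIN, where its sum is NULL.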

difference between joining three tables in a nested join query and a single join query

I want to know what the difference is between these two queries, given that both produce the same results:
SELECT a.OrderID,
ProductID,
a.LastName
FROM [Order Details],
(SELECT Employees.EmployeeID,
OrderID,
LastName
FROM Employees,
Orders
WHERE Employees.EmployeeID = Orders.EmployeeID
AND LastName = 'Buchanan')a
WHERE [Order Details].OrderID = a.OrderID
and
SELECT Orders.OrderID,
ProductID,
LastName
FROM [Order Details],
Employees,
Orders
WHERE Orders.OrderID = [Order Details].OrderID
AND Orders.EmployeeID = Employees.EmployeeID
AND LastName = 'Buchanan'
The first one is a nested join query and the second one is a single join query, but they join the same three tables and produce the same results.
Joins will perform better (in almost all cases), but subqueries are easier to understand and write, especially for people who are newer to SQL. For small data sets you will likely not see any difference, but the gap can widen quickly as your data set grows.
Join vs. sub-query
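A quick way to convince yourself that the two shapes are logically equivalent is to run both against the same rows. The sketch below does that in SQLite through Python's sqlite3 module, with explicit JOIN syntax instead of the comma style and a few made-up Northwind-like rows. It says nothing about relative speed on SQL Server; for that, compare the actual execution plans.

```python
import sqlite3

# Hypothetical Northwind-style rows for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER, LastName TEXT);
    CREATE TABLE Orders (OrderID INTEGER, EmployeeID INTEGER);
    CREATE TABLE [Order Details] (OrderID INTEGER, ProductID INTEGER);
    INSERT INTO Employees VALUES (5, 'Buchanan'), (6, 'Suyama');
    INSERT INTO Orders VALUES (10248, 5), (10249, 6);
    INSERT INTO [Order Details] VALUES (10248, 11), (10248, 42), (10249, 14);
""")

# Nested form: join Employees and Orders in a derived table first.
nested_rows = conn.execute("""
    SELECT a.OrderID, OD.ProductID, a.LastName
    FROM [Order Details] AS OD
    INNER JOIN (
        SELECT E.EmployeeID, O.OrderID, E.LastName
        FROM Employees AS E
        INNER JOIN Orders AS O ON E.EmployeeID = O.EmployeeID
        WHERE E.LastName = 'Buchanan'
    ) AS a ON OD.OrderID = a.OrderID
    ORDER BY OD.ProductID
""").fetchall()

# Flat form: join all three tables in a single FROM clause.
flat_rows = conn.execute("""
    SELECT O.OrderID, OD.ProductID, E.LastName
    FROM [Order Details] AS OD
    INNER JOIN Orders AS O ON O.OrderID = OD.OrderID
    INNER JOIN Employees AS E ON O.EmployeeID = E.EmployeeID
    WHERE E.LastName = 'Buchanan'
    ORDER BY OD.ProductID
""").fetchall()

print(nested_rows)               # [(10248, 11, 'Buchanan'), (10248, 42, 'Buchanan')]
print(nested_rows == flat_rows)  # True
```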

Need help creating a query for a non-normalized database

I've never worked with a non-normalized database before, so I'll try and explain my problem as best I can. So I have two tables:
The customers table holds all the customers information, and the orders table holds all the orders that they have placed. I haven't listed all the fields in the tables, just the ones that I need. The customer number in both tables is not the primary key, but I'm inner joining on them anyway. So the problem I'm having is that I don't know how to make a query that:
Selects all the customers with their first name, last name, and email, and also show the most recent orderdate, most recent total, and most recent ordertype. I know that I have to use a max() aggregate for the date, but that's as far as I got. Please help a noob out.
You can try:
SELECT FirstName,
LastName,
Email,
OrderDate,
OrderTotal,
OrderType
FROM Customers AS C
INNER JOIN [Order] AS O
ON O.CustomerNumber = C.CustomerNumber AND
O.OrderDate = (
SELECT MAX(O1.OrderDate)
FROM [Order] AS O1
WHERE O1.CustomerNumber = C.CustomerNumber
)
Assuming that Orders.OrderDate is unique for each CustomerNumber, does this work for you? If a single CustomerNumber has more than one entry in Orders for the same OrderDate, you'll get each of those rows.
select c.FirstName, c.LastName, c.Email, o.OrderDate, o.OrderTotal, o.OrderType
from Customers c
join
(select CustomerNumber, max(OrderDate) as MostRecentOrderDate
from Orders
group by CustomerNumber
) mro on mro.CustomerNumber=c.CustomerNumber
join Orders o on o.OrderDate=mro.MostRecentOrderDate and
o.CustomerNumber=mro.CustomerNumber
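To illustrate the caveat about duplicate dates, here is a sketch of the same "latest order per customer" join run in SQLite via Python's sqlite3 module. The column names follow the query above; the customers and orders are invented, and the second customer deliberately has two orders on the same most recent date.

```python
import sqlite3

# Hypothetical rows; customer 2 has two orders on the same latest date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerNumber INTEGER, FirstName TEXT,
                            LastName TEXT, Email TEXT);
    CREATE TABLE Orders (CustomerNumber INTEGER, OrderDate TEXT,
                         OrderTotal REAL, OrderType TEXT);
    INSERT INTO Customers VALUES (1, 'Ann', 'Lee', 'ann@example.com'),
                                 (2, 'Bob', 'Ray', 'bob@example.com');
    INSERT INTO Orders VALUES (1, '2024-01-05', 50.0, 'web'),
                              (1, '2024-02-10', 75.0, 'phone'),
                              (2, '2024-03-01', 20.0, 'web'),
                              (2, '2024-03-01', 35.0, 'store');
""")

latest_rows = conn.execute("""
    SELECT c.FirstName, o.OrderDate, o.OrderTotal, o.OrderType
    FROM Customers AS c
    JOIN (
        -- one row per customer with that customer's latest order date
        SELECT CustomerNumber, MAX(OrderDate) AS MostRecentOrderDate
        FROM Orders
        GROUP BY CustomerNumber
    ) AS mro ON mro.CustomerNumber = c.CustomerNumber
    JOIN Orders AS o
      ON o.CustomerNumber = mro.CustomerNumber
     AND o.OrderDate = mro.MostRecentOrderDate
    ORDER BY c.CustomerNumber, o.OrderTotal
""").fetchall()

for row in latest_rows:
    print(row)
# ('Ann', '2024-02-10', 75.0, 'phone')
# ('Bob', '2024-03-01', 20.0, 'web')
# ('Bob', '2024-03-01', 35.0, 'store')
```

Ann gets exactly one row, while Bob gets two because of the tie. If the real data can contain such ties, you need an extra tie-breaker column (an order ID, for example) to pick a single row.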
Try this:
SELECT
Customers.*, Orders.*
FROM
Customers
JOIN
(SELECT
Customer_Number,
MAX(Order_Date) OrderDate
FROM
Orders
GROUP BY
Customer_Number
) as Ord ON Customers.Customer_Number = Ord.Customer_Number
JOIN Orders ON Orders.Customer_Number = Ord.Customer_Number AND Orders.Order_Date = Ord.OrderDate
If you are doing this with SQL Server, you can use the query designer. Basically, all you want to do is a join, since both tables share the same key: join Customer to Orders on the customer number, aliasing the Customer table as C and the Orders table as O.
For example:
SELECT C.*, O.*
FROM Customer C INNER JOIN Orders O ON C.[Customer Number] = O.[Customer Number]
This should be enough to get you started. If you don't want all the fields, then fully qualify the names, for example:
SELECT C.FirstName, C.LastName, O.OrderDate, O.OrderType FROM Customer C, Orders O
WHERE C.[Customer Number] = O.[Customer Number] -- this is another way of doing a join, using the WHERE clause
