Join tables then sum on distinct values - sql-server

I'm working through some AdventureWorks challenges, and I'm trying to whittle down/summarize a joined table.
The following correctly joins my two tables and produces 3 columns:
use AdventureWorks2012
select datename(dw,orderdate ) as "Day", LineTotal,OrderQty
from sales.SalesOrderDetail join sales.SalesOrderHeader
on (SalesOrderDetail.SalesOrderID=SalesOrderHeader.SalesOrderID)
Adjusting it slightly results in a table that shows only the distinct day names (Monday, Tuesday, Wednesday, etc.):
use AdventureWorks2012
select distinct(datename(dw,orderdate)) as "Day"
from sales.SalesOrderDetail join sales.SalesOrderHeader
on (SalesOrderDetail.SalesOrderID=SalesOrderHeader.SalesOrderID)
I'm trying to figure out how to produce the sum of LineTotal and the sum of OrderQty per day, but not having success.
The closest I've gotten is the following:
use AdventureWorks2012
select distinct(datename(dw,orderdate)) as "Day", sum(LineTotal), SUM(OrderQty)
from sales.SalesOrderDetail join sales.SalesOrderHeader
on (SalesOrderDetail.SalesOrderID=SalesOrderHeader.SalesOrderID)
group by OrderDate
However, this produces many, many rows, instead of just 7 rows and the accompanying totals of LineTotal, OrderQty.
Thanks for any suggestions.

use AdventureWorks2012
select
datename(dw,orderdate ) as "Day",
SumLineTotal=SUM(LineTotal),
SumOrderQty=SUM(OrderQty)
from
sales.SalesOrderDetail
INNER join sales.SalesOrderHeader on (SalesOrderDetail.SalesOrderID=SalesOrderHeader.SalesOrderID)
group by
datename(dw,orderdate)

I don't have AdventureWorks, but I'm pretty sure you could do something like this. Notice that I am using aliases here to make this a little easier. I am also using datepart with the longhand date part name (weekday) instead of the shorthand (dw), because the shorthand versions are difficult to remember. Note that datepart returns the day number rather than the day name.
select OrderDay = datepart(weekday, soh.OrderDate)
, sum(sod.LineTotal)
, SUM(sod.OrderQty)
from sales.SalesOrderDetail sod
join sales.SalesOrderHeader soh on sod.SalesOrderID = soh.SalesOrderID
group by datepart(weekday, soh.OrderDate)

use AdventureWorks2012
select datename(dw,orderdate) as "Day", sum(LineTotal), SUM(OrderQty)
from sales.SalesOrderDetail join sales.SalesOrderHeader
on SalesOrderDetail.SalesOrderID=SalesOrderHeader.SalesOrderID
group by datename(dw,orderdate)
When a group by clause is present, it is the group by clause that controls the formation of rows, not the select clause. So apply the datename function to the orderdate column in the group by clause.
Also, please note that distinct is not a function, so the parentheses after the word distinct mean nothing; they only wrap the expression and are effectively ignored.
Do NOT use select distinct when grouping by the same expressions; it is redundant.
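As a minimal sketch of the point about group by, the query below uses a small inline table instead of AdventureWorks so it can run on its own. The alias o and its three rows are made up for illustration; they stand in for the joined header/detail rows.

```sql
-- Hypothetical inline data standing in for the joined header/detail rows.
SELECT DATENAME(weekday, o.OrderDate) AS [Day],
       SUM(o.LineTotal)               AS SumLineTotal,
       SUM(o.OrderQty)                AS SumOrderQty
FROM (VALUES
        (CAST('2024-01-01' AS date), 10.00, 1),  -- a Monday
        (CAST('2024-01-08' AS date), 20.00, 2),  -- also a Monday
        (CAST('2024-01-02' AS date),  5.00, 3)   -- a Tuesday
     ) AS o (OrderDate, LineTotal, OrderQty)
-- Grouping by the same expression that is selected yields one row per day name.
-- Grouping by o.OrderDate instead would yield one row per distinct date.
GROUP BY DATENAME(weekday, o.OrderDate);
```

With an English session language this should return two rows: Monday with totals 30.00 and 3, and Tuesday with totals 5.00 and 3. Swapping the GROUP BY to o.OrderDate reproduces the "many, many rows" symptom from the question on a small scale.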

Related

AdventureWorks in SQL Server 2019 : ordershare percent for each item per month

I'm new to SQL Server. I'm trying to write code that finds the sales percentage per item per month. Something like this:
Year  Month  ProductID  Order_Quantity_Per_Month  Total_Sold_Per_Month  %_Of_Total_Sale
2011  5      707        422                       17024                 2
First and most importantly, I want to write this code with a CTE and GROUP BY. I've tried many times but failed. How can I write this code with a CTE and GROUP BY?
I wrote this code with OVER and PARTITION BY. Could someone check the code I've written to see if it's actually correct?
USE AdventureWorks2019
GO
SELECT
YEAR (soh.OrderDate) As Year,
MONTH (soh.OrderDate) As Month,
pro.productid AS [Product ID],
pro.Name AS [Product Name],
SUM(sod.OrderQty) OVER (PARTITION BY Month(soh.OrderDate) ORDER BY soh.OrderDate) AS [Order Quantity Per Month],
SUM(sod.OrderQty) OVER (PARTITION BY Month(soh.OrderDate)) AS [Total Sold Per Month],
SUM(sod.OrderQty) OVER (PARTITION BY Month(soh.OrderDate) ORDER BY soh.OrderDate) * 100 / SUM(sod.OrderQty) OVER (PARTITION BY Month(soh.OrderDate)) AS [% of TotalSale]
FROM
Production.Product pro
INNER JOIN
Sales.SalesOrderdetail sod ON pro.ProductID = sod.ProductID
INNER JOIN
Sales.SalesOrderheader soh ON soh.SalesOrderID = sod.SalesOrderID
GROUP BY
YEAR(soh.OrderDate), MONTH(soh.OrderDate),
soh.OrderDate, pro.productid, pro.Name, sod.OrderQty
ORDER BY
Year, Month
If the above code is correct, how can I write it with a CTE and GROUP BY?
I think the better question is why you want (or need) to use a CTE. A simple CTE (i.e., not recursive) is just syntactic sugar for a derived table. There is nothing particularly special or complicated about writing and using one in a query. If you "tried many times", you should have included those attempts in your question.
But to satisfy the need to use a CTE, you can simply "cram" the query you have into the CTE and select rows from it. Example:
with cteOrders as (
select ... -- your original query here without ORDER BY clause
)
select * from cteOrders
order by [Year], [Month]
;
That is an extremely simplistic way of using a CTE. There is no real or obvious advantage to doing so, but it does satisfy your goal. Because of that, I smell an XY problem.
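For comparison, here is one possible sketch of the report built from CTEs and GROUP BY alone, without window functions. It is not the asker's query: the lines CTE and its three rows are made up so the example runs without AdventureWorks, and the names perProduct and perMonth are hypothetical. In the real database, lines would be replaced by the join of Sales.SalesOrderHeader and Sales.SalesOrderDetail.

```sql
-- Hypothetical order lines standing in for the header/detail join.
WITH lines AS (
    SELECT v.OrderDate, v.ProductID, v.OrderQty
    FROM (VALUES
            (CAST('2011-05-31' AS date), 707, 4),
            (CAST('2011-05-31' AS date), 708, 6),
            (CAST('2011-06-01' AS date), 707, 5)
         ) AS v (OrderDate, ProductID, OrderQty)
),
perProduct AS (   -- quantity per product per year and month
    SELECT YEAR(OrderDate) AS [Year], MONTH(OrderDate) AS [Month],
           ProductID, SUM(OrderQty) AS QtyPerMonth
    FROM lines
    GROUP BY YEAR(OrderDate), MONTH(OrderDate), ProductID
),
perMonth AS (     -- total quantity per year and month
    SELECT [Year], [Month], SUM(QtyPerMonth) AS TotalPerMonth
    FROM perProduct
    GROUP BY [Year], [Month]
)
SELECT p.[Year], p.[Month], p.ProductID, p.QtyPerMonth, m.TotalPerMonth,
       100.0 * p.QtyPerMonth / m.TotalPerMonth AS PctOfTotal  -- 100.0 avoids integer division
FROM perProduct AS p
JOIN perMonth AS m ON m.[Year] = p.[Year] AND m.[Month] = p.[Month]
ORDER BY p.[Year], p.[Month], p.ProductID;
```

On this sample data, products 707 and 708 get 40 and 60 percent of May 2011, and product 707 gets 100 percent of June 2011. Note that the grouping includes the year as well as the month; partitioning by MONTH(OrderDate) alone, as in the question, would pool the same month from different years.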

How do I properly add this query into my existing query within Query Designer?

I currently have the query below written in Query Designer. I asked a question yesterday, and the query from the answer worked on its own, but I would like to incorporate it into my existing report.
SELECT Distinct
i.ProductNumber
,i.ProductType
,i.ProductPurchaseDate
,ih.SalesPersonComputerID
,ih.SalesPerson
,ic2.FlaggedComments
FROM [Products] i
LEFT OUTER JOIN
(SELECT Distinct
MIN(c2.Comments) AS FlaggedComments
,c2.SalesKey
FROM [SalesComment] AS c2
WHERE(c2.Comments like 'Flagged*%')
GROUP BY c2.SalesKey) ic2
ON ic2.SalesKey = i.SalesKey
LEFT JOIN [SalesHistory] AS ih
ON ih.SalesKey = i.SalesKey
WHERE
i.SaleDate between #StartDate and #StopDate
AND ih.Status = 'SOLD'
My question yesterday asked for a way to select only the first comment made for each sale. I have a query for selecting the flagged comments, but I want both the first comment and the flagged comment. Both would be pulled from the same table. This was the query provided; it worked on its own, but I can't figure out how to make it work with my existing query.
SELECT a.DateTimeCommented, a.ProductNumber, a.Comments, a.SalesKey
FROM (
SELECT
DateTimeCommented, ProductNumber, Comments, SalesKey,
ROW_NUMBER() OVER(PARTITION BY ProductNumber ORDER BY DateTimeCommented) as RowN
FROM [SalesComment]
) a
WHERE a.RowN = 1
Thank you so much for your assistance.
You can use a combination of row-numbering and aggregation to get both the Flagged% comments, and the first comment.
You may want to change the PARTITION BY clause to suit.
DISTINCT on the outer query is probably spurious, on the inner query it definitely is, as you have GROUP BY anyway. If you are getting multiple rows, don't just throw DISTINCT at it, instead think about your joins and whether you need aggregation.
The second LEFT JOIN logically becomes an INNER JOIN due to the WHERE predicate. Perhaps that predicate should have been in the ON instead?
SELECT
i.ProductNumber
,i.ProductType
,i.ProductPurchaseDate
,ih.SalesPersonComputerID
,ih.SalesPerson
,ic2.FlaggedComments
,ic2.FirstComments
FROM [Products] i
LEFT OUTER JOIN
(SELECT
MIN(CASE WHEN c2.RowN = 1 THEN c2.Comments END) AS FirstComments
,c2.SalesKey
,MIN(CASE WHEN c2.Comments like 'Flagged*%' THEN c2.Comments END) AS FlaggedComments
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ProductNumber ORDER BY DateTimeCommented) as RowN
FROM [SalesComment]
) AS c2
GROUP BY c2.SalesKey
) ic2 ON ic2.SalesKey = i.SalesKey
JOIN [SalesHistory] AS ih
ON ih.SalesKey = i.SalesKey
WHERE
i.SaleDate between #StartDate and #StopDate
AND ih.Status = 'SOLD'
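The remark about a LEFT JOIN logically turning into an INNER JOIN can be seen on a tiny example. The two CTEs below are made-up stand-ins that only reuse the table and column names from the question; they are not the real tables.

```sql
-- Hypothetical data: SalesKey 2 has no matching history row.
WITH Products AS (
    SELECT * FROM (VALUES (1), (2)) AS p (SalesKey)
),
SalesHistory AS (
    SELECT * FROM (VALUES (1, 'SOLD')) AS h (SalesKey, Status)
)
-- Filter in WHERE: the NULL-extended row for SalesKey 2 fails the
-- predicate, so only SalesKey 1 survives (same result as an INNER JOIN).
SELECT p.SalesKey, h.Status, 'where' AS FilterPlacement
FROM Products AS p
LEFT JOIN SalesHistory AS h ON h.SalesKey = p.SalesKey
WHERE h.Status = 'SOLD'
UNION ALL
-- Filter in ON: unmatched products are kept, with a NULL Status.
SELECT p.SalesKey, h.Status, 'on'
FROM Products AS p
LEFT JOIN SalesHistory AS h ON h.SalesKey = p.SalesKey AND h.Status = 'SOLD';
```

This should return three rows: SalesKey 1 once for each placement, and SalesKey 2 only for the 'on' placement. Which behaviour is wanted depends on whether products without a sold history row belong in the report.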

How to self join a CTE?

Hi. This is the query I have created to fetch records from a history table. The query works, but it takes too long to execute because it selects data from the table twice, and the table has 20 million or more records, so I want to optimize it. I will try to explain what this query does and what I want to achieve.
I want to select two rows for each Id (ActivityId) from the table: the first where the date is the minimum and the second where the date is the maximum, so I can see how much progress happened in the meantime. What I am doing now is selecting the rows with the minimum date as cte and then selecting the rows with the maximum date as cte2. I can select both rows in a single CTE, but then I am not able to combine them into a single record. I only need the progress and planned fields from the maximum-date row, plus all the other fields (most fields are common to both rows). How can I achieve this? Please ask for clarification if needed.
CTE:
WITH cte AS
(
SELECT Row_number() OVER (partition BY pmaph.activityid ORDER BY date) r1,
pma.planenddate AS planenddate ,
pma.planstartdate AS planstartdate,
pmaph.activityid,
pmaph.projectmilestoneactivproghist_id AS promileavtiid,
umd.uom_name AS uomname,
pmm.milestonename AS milestonename,
pma.activityname AS activityname ,
pmp.projectname AS projectname,
Replace((Rtrim(Ltrim(CONVERT(VARCHAR(12),Cast(pmaph.date AS DATETIME),106)))),' ','-') AS rdate,
Isnull(pmaph.actual_progress,0) AS actualprogress,
Isnull(pmaph.planned_progress,0) AS plannedprogress
FROM projectmilestoneactivityprogresshistory AS pmaph
LEFT JOIN dbo.pm_project AS pmp
ON pmaph.projectid=pmp.projectid
LEFT JOIN dbo.pm_activity AS pma
ON pmaph.activityid=pma.activityid
LEFT JOIN dbo.pm_milestone AS pmm
ON pmaph.milestoneid=pmm.milestoneid
LEFT JOIN dbo.uomdetail AS umd
ON pma.uom_id=umd.uom_id
WHERE pmaph.client_id=1030),
cte2 AS
(
SELECT Row_number() OVER (partition BY pmaph.activityid ORDER BY date DESC) r2,
Isnull(pmaph.actual_progress,0) AS actualprogress,
Isnull(pmaph.planned_progress,0) AS plannedprogress,
Replace((Rtrim(Ltrim(CONVERT(VARCHAR(12),Cast(pmaph.date AS DATETIME),106)))),' ','-') AS rdate,
pmaph.activityid
FROM projectmilestoneactivityprogresshistory AS pmaph
WHERE pmaph.client_id=1030)
SELECT cte2.rdate AS todate,
cte2.actualprogress AS end_actualprogress,
cte2.plannedprogress AS end_plannedprogress,
cte.actualprogress AS start_actualprogress,
cte.plannedprogress AS start_plannedprogress,
cte.rdate AS fromdate ,
cte.planenddate,
cte.planstartdate,
cte.activityid,
cte.uomname,
cte.milestonename,
cte.activityname,
cte.projectname
FROM cte
JOIN cte2
ON cte.activityid= cte2.activityid
WHERE cte.r1=1
AND cte2.r2=1
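One possible way to read the history only once is to number the rows in both directions inside a single CTE and then fold the first and last row of each activity into one record with conditional aggregation. The sketch below is an assumption about what would fit, not a tested rewrite of the query above: the hist CTE, its simplified column names, and its four rows are all hypothetical.

```sql
-- Hypothetical progress history; column names are simplified stand-ins.
WITH hist AS (
    SELECT v.ActivityId, v.LogDate, v.ActualProgress,
           ROW_NUMBER() OVER (PARTITION BY v.ActivityId ORDER BY v.LogDate)      AS rnFirst,
           ROW_NUMBER() OVER (PARTITION BY v.ActivityId ORDER BY v.LogDate DESC) AS rnLast
    FROM (VALUES
            (1, CAST('2020-01-01' AS date), 10),
            (1, CAST('2020-02-01' AS date), 40),
            (1, CAST('2020-03-01' AS date), 90),
            (2, CAST('2020-01-15' AS date),  5)
         ) AS v (ActivityId, LogDate, ActualProgress)
)
SELECT ActivityId,
       MIN(CASE WHEN rnFirst = 1 THEN LogDate END)        AS FromDate,
       MAX(CASE WHEN rnLast  = 1 THEN LogDate END)        AS ToDate,
       MAX(CASE WHEN rnFirst = 1 THEN ActualProgress END) AS StartActualProgress,
       MAX(CASE WHEN rnLast  = 1 THEN ActualProgress END) AS EndActualProgress
FROM hist
WHERE rnFirst = 1 OR rnLast = 1   -- keep only the earliest and latest row per activity
GROUP BY ActivityId;
```

For activity 1 this gives 2020-01-01 to 2020-03-01 with progress 10 and 90; activity 2 has a single row, so its start and end values coincide. Whether this is actually faster on a 20-million-row table depends on indexing and the execution plan, so it is worth comparing plans before adopting it.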

SQL Query Multiple Joins Unexpected Results

My task is to write a query that will return sales information for each customer category and year. The columns required in the result set are:
OrderYear - the year the orders were placed
CustomerCategoryName - as it appears in the table Sales.CustomerCategories
CustomerCount - the number of unique customers placing orders for each CustomerCategoryName and OrderYear
OrderCount - the number of orders placed for each CustomerCategoryName and OrderYear
Sales - the subtotal from the orders placed, calculated from Quantity and UnitPrice of the table Sales.OrderLines
AverageSalesPerCustomer - the average sales per customer for each CustomerCategoryName and OrderYear
The results should be sorted in ascending order, first by order year, then by customer category name.
My attempt at a solution:
SELECT
CC.CustomerCategoryName,
YEAR(O.OrderDate) AS OrderYear,
COUNT(DISTINCT C.CustomerID) AS CustomerCount,
COUNT(DISTINCT O.OrderID) AS OrderCount,
SUM(OL.Quantity * OL.UnitPrice) AS Sales,
SUM(OL.Quantity * OL.UnitPrice) / COUNT(DISTINCT C.CustomerID) AS AverageSalesPerCustomer
FROM
Sales.CustomerCategories CC
INNER JOIN
Sales.Customers C ON C.CustomerCategoryID = CC.CustomerCategoryID
INNER JOIN
Sales.Orders O ON O.CustomerID = C.CustomerID
INNER JOIN
Sales.OrderLines OL ON OL.OrderID = O.OrderID
GROUP BY
CC.CustomerCategoryName, YEAR(O.OrderDate)
ORDER BY
YEAR(O.OrderDate), CC.CustomerCategoryName;
My OrderCount seems correct. However, I don't believe my CustomerCount is correct, and my Sales and AverageSalesPerCustomer seem way off. The categories that do not have any customers or orders do not show up in my results.
Are my counts off, and are the categories without customers omitted, because those categories only have null values? I believe the question is looking for all the categories.
I am using the WideWorldImporters sample tables from Microsoft.
Any help would be appreciated, as I am new to SQL and joins are a very hard concept for me to understand.
Presently, you're getting only the data that exists in order details...and not getting anything for the non-existent orders. Normally, this is accomplished with outer joins instead of inner joins, and an isnull(possiblyNullValue,replacementValue).
Also, while you're grouping by year(o.OrderDate), your join for orders isn't distinguishing by year...probably getting all years worth of data for each customer for each reporting period.
So, let's get the reporting period out first...and make sure we're basing our results on that:
select distinct year(o.OrderDate) from Sales.Orders
But really, you want all categories and all years...so you can combine them to get the real basis:
select
cc.CustomerCategoryId,
cc.CustomerCategoryName,
year(o.OrderDate)
from
Sales.Orders o
cross join
Sales.CustomerCategories cc
group by
cc.CustomerCategoryId,
cc.CustomerCategoryName,
year(o.OrderDate)
Now, you want to join this mess into the remaining query. There are two ways to do this...one is to use a with clause...but sometimes it's just easier to just wrap the basis query up in parentheses and use it as if it was a table:
select
cy.CustomerCategoryName,
cy.CalendarYear,
count(distinct c.CustomerId) CustomerCount,
isnull(sum(ol.UnitPrice * ol.Quantity),0.0) Sales,
isnull(sum(ol.UnitPrice * ol.Quantity) / count(distinct c.CustomerId),0.0) AverageSalesPerCustomer
from
(
select
cc.CustomerCategoryId,
cc.CustomerCategoryName,
year(o.OrderDate) CalendarYear --> must name calc'd cols in virtual tables
from
Sales.Orders o
cross join
Sales.CustomerCategories cc
group by
cc.CustomerCategoryId,
cc.CustomerCategoryName,
year(o.OrderDate)
) as cy --> cy is the "Category Years" virtual table
left outer join
Sales.Customers c
on cy.CustomerCategoryId = c.CustomerCategoryId
left outer join
Sales.Orders o
on
c.CustomerId = o.CustomerId --> join on customer and year
and --> to make sure we're only getting
cy.CalendarYear = Year(o.OrderDate) --> orders in the right year
left outer join
Sales.OrderLines ol
on o.OrderId = ol.OrderId
group by
cy.CalendarYear,
cy.CustomerCategoryName
order by
cy.CalendarYear,
cy.CustomerCategoryName
By the way...get comfortable messing with your queries to select some subset...for example, you can add a where clause to select only one company...and then go have a look at the details...to see if it passes the smell test. It's a lot easier to evaluate the results when you limit them. Similarly, you can add the customer to the select list and the outer grouping for the same reason. Experimentation is the key.
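To see the "category years" basis at work without WideWorldImporters, here is a scaled-down sketch. All four CTEs and their rows are made up for illustration, and Orders carries a precomputed OrderYear and Amount instead of order lines to keep the example short.

```sql
-- Hypothetical miniature of the categories/customers/orders tables.
WITH Categories AS (
    SELECT * FROM (VALUES (1, 'Retail'), (2, 'Wholesale')) AS c (CategoryId, CategoryName)
),
Customers AS (
    SELECT * FROM (VALUES (10, 1)) AS c (CustomerId, CategoryId)
),
Orders AS (
    SELECT * FROM (VALUES (100, 10, 2015, 50.00), (101, 10, 2016, 70.00))
         AS o (OrderId, CustomerId, OrderYear, Amount)
),
CategoryYears AS (   -- every category paired with every year that has orders
    SELECT c.CategoryId, c.CategoryName, y.OrderYear
    FROM Categories AS c
    CROSS JOIN (SELECT DISTINCT OrderYear FROM Orders) AS y
)
SELECT cy.OrderYear, cy.CategoryName,
       COUNT(DISTINCT o.CustomerId) AS CustomerCount,  -- customers with orders in that year
       COUNT(DISTINCT o.OrderId)    AS OrderCount,
       ISNULL(SUM(o.Amount), 0)     AS Sales
FROM CategoryYears AS cy
LEFT JOIN Customers AS cu ON cu.CategoryId = cy.CategoryId
LEFT JOIN Orders    AS o  ON o.CustomerId = cu.CustomerId
                         AND o.OrderYear  = cy.OrderYear
GROUP BY cy.OrderYear, cy.CategoryName
ORDER BY cy.OrderYear, cy.CategoryName;
```

This should return four rows: Retail with one customer, one order, and sales of 50.00 and 70.00 for 2015 and 2016, and Wholesale with zeros in both years. The sketch counts o.CustomerId rather than the customer table's key, on the assumption that CustomerCount should only include customers who actually placed an order in that year.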

Max(datetime) ignore seconds

I got my SQL query working, but when two rows have a date with the same hour and minutes, the query does not work. For example:
On the column "trans_date" my query fails: with max(trans_date) I get no results, as if SQL were somehow ignoring the seconds.
This is my complete SQL statement:
SELECT till.code,art.description
FROM [TCPOS4].[dbo].[transactions] as tra,
TCPOS4.dbo.articles as art,[TCPOS4].[dbo].[trans_articles] as tro,
[TCPOS4].[dbo].[tills] as till,[TCPOS4].[dbo].[shops] as shop
where tra.till_id=till.id and shop.id=till.shop_id and tro.transaction_id=tra.id and
art.id=tro.article_id and tra.trans_date =(select max(trans_date)
from tcpos4.dbo.transactions as t2 where t2.till_id=tra.till_id and trans_date > '2016-10-26 00:00:0.000' and trans_date< '2016-10-27 00:00:00.000' )
group by till.code,art.description
With this query I am getting for each "code" from the 2016-10-26 to 2016-10-27 the max transaction_date, but I am not getting any information from the code "5446". I should get "TABLE CHOCOLECHE-CONGUITOS" because it's the max trans_date in the range.
Can you try with a different approach like the following?
SELECT code, description
FROM
(
SELECT till.code, art.description,
row_number() OVER (PARTITION BY till.code ORDER BY trans_date DESC) RowNum
FROM [TCPOS4].[dbo].[transactions] AS tra
LEFT JOIN [TCPOS4].[dbo].[tills] AS till
ON tra.till_id=till.id
LEFT JOIN [TCPOS4].[dbo].[shops] AS shop
ON shop.id=till.shop_id
LEFT JOIN [TCPOS4].[dbo].[trans_articles] AS tro
ON tro.transaction_id=tra.id
LEFT JOIN TCPOS4.dbo.articles AS art
ON art.id=tro.article_id
) sbt
WHERE RowNum=1
In this way you will get one result for each till.code even if you have the same exact date.
You can add more fields in the ORDER BY if needed.
EDIT: removed art.description in PARTITION BY.
EDIT 2: converted with LEFT JOIN
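A small self-contained sketch of the tie behaviour, using made-up rows rather than the TCPOS4 tables (the tx CTE and the article names are hypothetical):

```sql
-- Hypothetical transactions; two rows for till 5446 share the same timestamp.
WITH tx AS (
    SELECT * FROM (VALUES
        ('5446', 'ARTICLE A', CAST('2016-10-26 20:15:10' AS datetime)),
        ('5446', 'ARTICLE B', CAST('2016-10-26 20:15:10' AS datetime)),
        ('5447', 'ARTICLE C', CAST('2016-10-26 18:00:00' AS datetime))
    ) AS t (code, description, trans_date)
)
SELECT code, description, trans_date
FROM (
    SELECT code, description, trans_date,
           -- description is added as a tie-breaker so the chosen row is deterministic
           ROW_NUMBER() OVER (PARTITION BY code
                              ORDER BY trans_date DESC, description) AS RowNum
    FROM tx
) AS ranked
WHERE RowNum = 1;
```

This returns one row per code: ARTICLE A for 5446 and ARTICLE C for 5447. Without the extra ORDER BY column, either of the two tied rows for 5446 could be picked.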
Try:
...CAST(tra.trans_date AS DATE) = (select CAST(max(trans_date) AS DATE)...
