TSQL sub query invalid column and only one expression - sql-server

It has been a while since I last wrote T-SQL. I know this can be done, and my memory is good enough to get me close (I think I'm close) but not good enough to get it right.
To start I have this query:
SELECT DISTINCT(COMM_TYPE),
COUNT(COMM_TYPE) AS 'Total'
FROM
[MYDB].[dbo].[COMM]
GROUP BY
COMM_TYPE
Which returns:
COMM_TYPE Total
--------------------------
TypeA 1
TypeB 44474
TypeC 3
TypeD 3854
TypeE 12327
TypeF 362912
TypeG 484344
TypeH 386
TypeI 106
This is an accurate result.
So now I want the above PLUS a sample of each one. Something with columns like:
ID COMM_TYPE TOTAL DATA COMMENTS PRIMARY COMM_NUMBER
I believe this can be done with a sub query but I am not writing it correctly as I get two errors.
Msg 207, Level 16, State 1, Line 10
Invalid column name 'CT'.
Msg 116, Level 16, State 1, Line 7
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
The second error I understand: my subquery returns two columns, but where I have placed it in the select list it is allowed to return only one.
The first error I'm more lost on. I thought I could reference a subquery column in the outer query?
Here is the query:
SELECT TOP(1)
*,
(SELECT
DISTINCT(COMM_TYPE),
COUNT(COMM_TYPE)
FROM
[MYDB].[dbo].[COMM]
GROUP BY
COMM_TYPE) AS CT
FROM
[MYDB].[dbo].[COMM]
WHERE
CT = COMM_TYPE
This is mostly for myself but if it helps anyone here ya go:
We start with a CTE to wrap the entire operation, as it brings many benefits, but the two applicable here are:
1. Enable grouping by a column that is derived from a scalar subselect.
2. Reference the resulting table multiple times in the same statement.
WITH T
AS (
CTE SELECT Statement
)
FINAL SELECT Statement
Next, our CTE select basically returns three columns for us:
1. Total, which in my query was a COUNT on a column
2. RN, which is the row number
3. Wildcard *, which gets all the columns from the table
Now from this point we get into the partitioning.
We need to choose how to break the table up. Since I had written DISTINCT(COMM_TYPE), I had already picked my partition without realizing it. In that first column definition we also do a COUNT(*). So what must be happening is that the SQL engine first breaks the table into pieces (partitions) and then counts the records in each piece?
SELECT Count(*)
OVER (PARTITION BY COMM_TYPE) AS Total,
Next we do a ROW_NUMBER() operation OVER (that is, operating against) my COMM_TYPE partition again. We order it and give the column the name RN. I wasn't sure why this was needed until I got to the end; then it made sense.
Row_number()
OVER (PARTITION BY COMM_TYPE
ORDER BY COMM_TYPE) AS RN,
Finally we just pull a wildcard, which is every column in the table.
So inside the SQL engine this must be quite a big hunk of data, with these repeated grouping operations "OVER" everything.
However, all we see is a single row per COMM_TYPE, and that is because of the last select, which gives me everything together as I wanted; we only get the first row of each partition because of that RN column I didn't understand earlier.
Do I understand it properly?

This should do what you need.
WITH T
AS (SELECT Count(*)
OVER (PARTITION BY COMM_TYPE) AS Total,
Row_number()
OVER (PARTITION BY COMM_TYPE
ORDER BY COMM_TYPE) AS RN,
*
FROM MyDb.dbo.Comm)
SELECT *
FROM T
WHERE RN = 1
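To see what the two window functions do before the RN = 1 filter is applied, here is a minimal sketch on a made-up miniature of the table. The table variable @Comm, its columns, and its values are assumptions for illustration only, and the ORDER BY uses ID rather than COMM_TYPE so that the numbering inside each partition is deterministic.

```sql
-- Hypothetical miniature of the COMM table; names and values are invented.
DECLARE @Comm TABLE (ID int, COMM_TYPE varchar(10), COMMENTS varchar(20));

INSERT INTO @Comm (ID, COMM_TYPE, COMMENTS)
VALUES (1, 'TypeA', 'a1'),
       (2, 'TypeB', 'b1'),
       (3, 'TypeB', 'b2'),
       (4, 'TypeB', 'b3');

-- Window functions add columns without collapsing rows:
-- every row keeps its own data and also carries its partition's count.
SELECT COUNT(*)     OVER (PARTITION BY COMM_TYPE)             AS Total,
       ROW_NUMBER() OVER (PARTITION BY COMM_TYPE ORDER BY ID) AS RN,
       *
FROM @Comm;
-- All four rows come back. The TypeA row has Total = 1 and RN = 1;
-- the three TypeB rows each have Total = 3 and RN = 1, 2, 3.
```

Wrapping that SELECT in a CTE and filtering on RN = 1 then keeps exactly one sample row per COMM_TYPE, with the total still attached.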

Related

How to filter or split a CTE so that 2 rows are not added with the same value in a specific column

So the title sounds convoluted because my problem kinda is.
I have a CTE that pulls in some values (LineId, OrderNumber, OrderLine, Type, BuildUsed)
Later on I have a Select that populates a view and joins on the CTE with something like this:
left join CTE C on C.LineId = (select top 1 lineId from CTE C2 where C2.orderNumber = orderNumber and C2.orderLine = orderLine order by LineId)
An example of my data would look like
LineId = 10, Order : OIP001, Line = 1, Type = Active, BuildUsed = XE9
LineId = 80, Order : OIP001, Line = 1, Type = Inactive, BuildUsed = XB2
The CTE does a Select, Union, Select. The first select gets all the active entries and the 2nd select gets all the inactive entries.
Any given order could have both active or inactive or just 1 of them.
The issue I am having is that my runtime is bad. It runs in close to 20 seconds when it should be more like 4 or 5. The join I listed above has to search and order every time, and it's a huge time sink.
So I thought there might be a way to break the CTE into 2 steps:
1. Insert all the active orders (these are the ones I would want to pick if they are available).
2. Insert all the inactive orders (only if that ordernumber and orderline does not already exist from the first step).
That way I don't have to order and sort for every single join, and I can just do a normal join that's significantly faster.
If it helps at all the LineId is based on a rownumber() in the CTE that looks like
ROW_NUMBER() OVER(ORDER BY Type desc, DescriptionStatus asc) as LineId
So the LineId is already ordered correctly.
Is there any way to split the CTE so that the 2nd part of the select can check whether the ordernumber and orderline already exist in the first part?
To clarify: I would like to find any Active entries for the ordernumber and orderline first and then, if none are found, try the inactive entries.
WHAT I HAVE TRIED SO FAR :
I tried adding the query for the 2nd part into the first part as a where clause. So it would only add where it wouldn't exist in the first part. But the time of the query got so insane I just stopped running it and scrapped that idea.
I believe you're just looking for a WHERE NOT EXISTS that uses a correlated sub-query to eliminate rows from your second result set that you've already retrieved in your first result set.
WHERE NOT EXISTS is generally pretty performant, but test the CTE by itself to be sure it meets your needs.
Something similar to this:
WITH cte
AS
(
SELECT
act.LineID,
act.OrderNumber,
act.OrderLine,
act.Type,
act.BuildUsed
FROM
ActiveSource AS act
UNION ALL
SELECT
inact.LineID
,inact.OrderNumber
,inact.OrderLine
,inact.Type
,inact.BuildUsed
FROM
InactiveSource AS inact
WHERE
NOT EXISTS
(
SELECT
1
FROM
ActiveSource AS a
WHERE
a.OrderNumber = inact.OrderNumber
AND a.OrderLine = inact.OrderLine
)
)
SELECT * FROM cte;
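As a quick check of the idea, here is a self-contained sketch using the two sample rows from the question plus one invented order (OIP002) that has only an inactive row. The table variables @ActiveSource and @InactiveSource are hypothetical stand-ins for whatever the two halves of the real union select from.

```sql
-- Hypothetical stand-ins for the two halves of the union.
DECLARE @ActiveSource TABLE
    (LineId int, OrderNumber varchar(10), OrderLine int, [Type] varchar(10), BuildUsed varchar(10));
DECLARE @InactiveSource TABLE
    (LineId int, OrderNumber varchar(10), OrderLine int, [Type] varchar(10), BuildUsed varchar(10));

INSERT INTO @ActiveSource
VALUES (10, 'OIP001', 1, 'Active', 'XE9');

INSERT INTO @InactiveSource
VALUES (80, 'OIP001', 1, 'Inactive', 'XB2'),
       (90, 'OIP002', 1, 'Inactive', 'XB3');   -- invented row: no active counterpart

SELECT act.LineId, act.OrderNumber, act.OrderLine, act.[Type], act.BuildUsed
FROM @ActiveSource AS act
UNION ALL
SELECT inact.LineId, inact.OrderNumber, inact.OrderLine, inact.[Type], inact.BuildUsed
FROM @InactiveSource AS inact
WHERE NOT EXISTS
      (
          SELECT 1
          FROM @ActiveSource AS a
          WHERE a.OrderNumber = inact.OrderNumber
            AND a.OrderLine   = inact.OrderLine
      );
-- Returns LineId 10 and LineId 90. LineId 80 is skipped because
-- OIP001 line 1 already has an active row.
```

Provided each source holds at most one row per OrderNumber and OrderLine, the result has at most one row per pair, so the view can join on those two columns directly instead of running a TOP 1 subquery for every row.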

Using Top in T-SQL

A question on using Top. For example, we have this SQL statement:
SELECT TOP (5) WITH TIES orderid, orderdate, custid, empid
FROM Sales.Orders
ORDER BY orderdate DESC;
It orders the returned rows by orderdate first and then selects the top five rows.
But doesn't the ORDER BY clause happen after the SELECT clause, which would mean that five arbitrary rows are returned first and then those five rows are ordered by orderdate?
The order of commands in the statement doesn't reflect the actual order of operations that SQL follows. See this article which shows the order to be:
from
where
group by
having
select
order by
limit
As you can see, the TOP operation (limit) is the last to be executed.
The question already has an accepted answer, but I would like to quote the Microsoft documentation.
Logical Processing Order of the SELECT statement
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
But doesn't the ORDER BY clause happen after the SELECT clause, which would mean
that five arbitrary rows are returned first and then those
five rows are ordered by orderdate?
No. ORDER BY is processed after the SELECT, but limiting the result set to 5 rows happens even later.
The physical details of actual query processing may vary, but the end result is as if the server sorted the whole table by orderdate, then picked the top 5 rows (or more if needed due to ties), returned those rows, and discarded the rest.
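A small sketch makes this easy to verify. The table variable @Orders and its rows are invented for illustration; the rows are deliberately inserted out of date order so that "the first rows in the table" and "the newest rows" are different sets.

```sql
-- Hypothetical sample data, inserted in no particular date order.
DECLARE @Orders TABLE (orderid int, orderdate date);

INSERT INTO @Orders (orderid, orderdate)
VALUES (1, '20240101'),
       (2, '20240102'),
       (3, '20240103'),
       (4, '20240103'),
       (5, '20240102');

-- ORDER BY is applied to the whole set first; TOP then keeps the leading rows.
SELECT TOP (3) WITH TIES orderid, orderdate
FROM @Orders
ORDER BY orderdate DESC;
-- Returns four rows: orders 3 and 4 (2024-01-03), then orders 2 and 5 (2024-01-02).
-- The third row has orderdate 2024-01-02, and WITH TIES pulls in the other
-- row that shares that date. Order 1, the first row inserted, is not returned.
```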

SQL Server - Delete Duplicate Rows - how does Partition By affect this query?

I've been using the following inherited query where I'm trying to delete duplicate rows and I'm getting some unexpected results when first running it as a SELECT - I believe it has something to do with my lack of understanding of the Partition part of the statement:
WITH CTE AS(
SELECT [Id],
[Url],
[Identifier],
[Name],
[Entity],
[DOB],
RN = ROW_NUMBER()OVER(PARTITION BY Name ORDER BY Name)
FROM Data.Statistics
where Id = 2170
)
DELETE FROM CTE WHERE RN > 1
Can someone help me understand exactly what I'm doing with the Partition BY Name part of this? This doesn't limit the query in any way to only looking for duplicates in the Name field, correct? I need to ensure that it's looking for records where all 5 of the fields inside the CTE definition are the same for a record to be considered a duplicate.
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) doesn't make a lot of sense. You wouldn't ORDER BY the same thing you used in PARTITION BY since it will be the same value for everything in the partition, making the ORDER BY part useless.
Basically the CTE part of this query is saying to split the matching rows (those with [Id] = 2170) temporarily into groups for each distinct name, and within each group of rows with the same name, order those by name (which are obviously all the same value) and then return the row number within that sequence group as RN. Unique names will all have a row number of 1, because there is only one row with that name. Duplicate names will have row numbers 1, 2, 3, and so on. The order of those rows is undefined in this case because of the silly ORDER BY clause, but if you changed the ORDER BY to something meaningful, the row numbers would follow that sequence.
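If a row should count as a duplicate only when all five non-Id columns match, one option is to list all five in the PARTITION BY. The sketch below assumes that reading of the requirement; the table variable @Stats and its rows are invented, and ORDER BY (SELECT NULL) is used only because ROW_NUMBER() requires an ORDER BY and there is nothing meaningful to order identical rows by.

```sql
-- Hypothetical sample data: two exact duplicates plus one row that
-- shares only the Name with them.
DECLARE @Stats TABLE
    (Id int, Url varchar(50), Identifier varchar(20), Name varchar(50), Entity varchar(20), DOB date);

INSERT INTO @Stats (Id, Url, Identifier, Name, Entity, DOB)
VALUES (2170, 'u1', 'x1', 'Ann', 'E1', '19800101'),
       (2170, 'u1', 'x1', 'Ann', 'E1', '19800101'),   -- exact duplicate of the first row
       (2170, 'u2', 'x2', 'Ann', 'E1', '19800101');   -- same Name, different Url and Identifier

WITH CTE AS (
    SELECT Id, Url, Identifier, Name, Entity, DOB,
           RN = ROW_NUMBER() OVER (PARTITION BY Url, Identifier, Name, Entity, DOB
                                   ORDER BY (SELECT NULL))
    FROM @Stats
    WHERE Id = 2170
)
DELETE FROM CTE WHERE RN > 1;

SELECT * FROM @Stats;
-- Two rows remain: one of the exact duplicates and the 'u2' row.
```

With PARTITION BY Name alone, all three rows would land in the same partition and two of them would be deleted, including the row that differs in Url and Identifier.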

MSSQL Group By failing, but no dup Column names

I know this question has been asked time and time again, but I have no two column names that are the same, yet I am getting:
Msg 8120, Level 16, State 1, Line 13 Column 'dbo.PRODUCT.ProductName'
is invalid in the select list because it is not contained in either an
aggregate function or the GROUP BY clause.
My ProductId column is unique to my dbo.Product Table, and I am not sure why it is getting confused with another value. In this image you can see the dup ProductIds
WITH products AS
(
SELECT
*,
ROW_NUMBER() OVER(ORDER BY p.[ProductName]) AS 'RowNumber'
FROM dbo.PRODUCT p
JOIN dbo.Category c ON p.ProductCategoryCode = c.CategoryCode
JOIN dbo.Supplier s ON p.ProductSupplierCode = s.SupplierCode
LEFT JOIN dbo.ProductTag pt ON pt.ProductUPC = p.UPC
LEFT JOIN dbo.Tag t ON pt.ProductTagTagCode = t.TagCode
GROUP BY p.ProductId
)
SELECT *
FROM products
WHERE RowNumber BETWEEN 0 AND 2;
Your error is because you are selecting ALL of the fields in ALL of the tables, but you are only grouping by one value. If a value is returned by the query, then it must either be GROUPED or aggregated (Min, Max, SUM, AVG, etcetera).
If you simply add the Product Name to your grouping:
GROUP BY p.ProductId, p.ProductName
You will still have the same problem with (for example) p.ProductCategoryCode, p.ProductSupplierCode, c.CategoryCode, etc, etc.
In this case, where you are looking for unique rows, do not use GROUP BY - use DISTINCT (which works on all fields returned automatically) instead. Note that @bjones is still correct as to why you are getting duplicates - one of the tables you are joining in can have multiple rows for each product (e.g. many times a product will come from more than one supplier.)
To solve this, you need to:
1. Determine what data you need to return, and select only those columns.
2. Determine whether you need to summarize any data (e.g. Total Sold or On Hand), then:
   - Use GROUP BY if you do need to summarize any values, or
   - Use DISTINCT if you do not need to summarize any values.

Row_Number Over Where RowNumber between

I'm trying to select certain rows from my table using ROW_NUMBER() OVER. However, SQL Server raises the error "Invalid column name 'ROWNUMBERS'". Can anyone correct me?
SELECT ROW_NUMBER() OVER (ORDER BY Price ASC) AS ROWNUMBERS, *
FROM Product
WHERE ROWNUMBERS BETWEEN @fromCount AND @toCount
Attempting to reference the aliased column in the WHERE clause does not work because of the logical query processing taking place. The WHERE is evaluated before the SELECT clause. Therefore, the column ROWNUMBERS does not exist when WHERE is evaluated.
The correct way to reference the column in this example would be:
SELECT a.*
FROM
(SELECT ROW_NUMBER() OVER (ORDER BY Price ASC) AS ROWNUMBERS, *
FROM Product) a
WHERE a.ROWNUMBERS BETWEEN @fromCount AND @toCount
For your reference, the order for operations is:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
There is another answer here that solves the specific error reported. However, I also want to address the wider problem. It looks a lot like what you are doing here is paging your results for display. If that is the case, and if you can use SQL Server 2012 or later, there is a better way now. Take a look at OFFSET/FETCH:
SELECT [First Name] + ' ' + [Last Name]
FROM Employees
ORDER BY [First Name]
OFFSET 10 ROWS FETCH NEXT 5 ROWS ONLY;
That would show the third page of a query where the page size is 5.
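The offset does not have to be a literal. Here is a minimal sketch that derives it from a page number, applied to a made-up version of the Product table from the question; @Product, @PageNumber, and @PageSize are names invented for this example, and ProductId is added to the ORDER BY as a tie-breaker so that rows with equal prices cannot shift between pages.

```sql
-- Hypothetical sample data: twelve products priced 1.00 through 12.00.
DECLARE @Product TABLE (ProductId int, Price decimal(9, 2));

INSERT INTO @Product (ProductId, Price)
VALUES (1, 1.00), (2, 2.00), (3, 3.00), (4, 4.00),
       (5, 5.00), (6, 6.00), (7, 7.00), (8, 8.00),
       (9, 9.00), (10, 10.00), (11, 11.00), (12, 12.00);

DECLARE @PageNumber int = 3,   -- 1-based page to display
        @PageSize   int = 5;

SELECT ProductId, Price
FROM @Product
ORDER BY Price ASC, ProductId ASC
OFFSET (@PageNumber - 1) * @PageSize ROWS   -- skip the rows of the earlier pages
FETCH NEXT @PageSize ROWS ONLY;
-- Page 3 skips 10 rows, so only ProductId 11 and 12 are returned.
```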
