Row_Number Over Where RowNumber between - sql-server

I'm trying to select certain rows from my table using ROW_NUMBER() OVER. However, SQL Server returns the error "Invalid column name 'ROWNUMBERS'". Can anyone correct me?
SELECT ROW_NUMBER() OVER (ORDER BY Price ASC) AS ROWNUMBERS, *
FROM Product
WHERE ROWNUMBERS BETWEEN @fromCount AND @toCount

Referencing the aliased column in the WHERE clause does not work because of the logical order in which a query is processed. The WHERE clause is evaluated before the SELECT clause, so the column alias ROWNUMBERS does not yet exist when WHERE is evaluated.
The correct way to reference the column in this example would be:
SELECT a.*
FROM
(SELECT ROW_NUMBER() OVER (ORDER BY Price ASC) AS ROWNUMBERS, *
FROM Product) a
WHERE a.ROWNUMBERS BETWEEN @fromCount AND @toCount
For your reference, the logical order of operations is:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
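The same fix can also be written with a common table expression instead of a derived table. The following is only a sketch: the table variable @Product, its columns and sample rows, and the values of @fromCount and @toCount are all made up for illustration.

```sql
-- Hypothetical sample data standing in for the Product table.
DECLARE @Product TABLE (ProductId int, Price decimal(10, 2));
INSERT INTO @Product (ProductId, Price)
VALUES (1, 9.99), (2, 4.50), (3, 19.00), (4, 1.25), (5, 7.00);

-- Hypothetical paging bounds.
DECLARE @fromCount int = 2, @toCount int = 4;

-- The CTE acts as the FROM source of the outer query, so ROWNUMBERS
-- already exists by the time the outer WHERE clause is evaluated.
WITH Numbered AS (
    SELECT ROW_NUMBER() OVER (ORDER BY Price ASC) AS ROWNUMBERS, *
    FROM @Product
)
SELECT *
FROM Numbered
WHERE ROWNUMBERS BETWEEN @fromCount AND @toCount;
```

With this sample data the query returns the rows priced 4.50, 7.00 and 9.99, which are row numbers 2 through 4.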

There is another answer here that solves the specific error reported. However, I also want to address the wider problem. It looks a lot like what you are doing here is paging your results for display. If that is the case, and if you can use SQL Server 2012 or later, there is a better way now. Take a look at OFFSET/FETCH:
SELECT [First Name] + ' ' + [Last Name]
FROM Employees
ORDER BY [First Name]
OFFSET 10 ROWS FETCH NEXT 5 ROWS ONLY;
That would show the third page of a query where the page size is 5.
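In practice the offset is usually computed from a page number and a page size. Here is a rough sketch of that, assuming SQL Server 2012 or later. The table variable @Employees, its rows, and the variables @pageNumber and @pageSize are hypothetical, and the column names are bracketed because they contain spaces.

```sql
-- Hypothetical sample data standing in for the Employees table.
DECLARE @Employees TABLE ([First Name] nvarchar(50), [Last Name] nvarchar(50));
INSERT INTO @Employees ([First Name], [Last Name])
VALUES (N'Alice', N'Adams'), (N'Bob', N'Brown'), (N'Carol', N'Clark'),
       (N'Dave', N'Davis'), (N'Erin', N'Evans'), (N'Frank', N'Ford');

-- Hypothetical paging parameters: pages are numbered from 1.
DECLARE @pageNumber int = 3, @pageSize int = 2;

SELECT [First Name] + N' ' + [Last Name] AS FullName
FROM @Employees
ORDER BY [First Name], [Last Name]          -- OFFSET/FETCH requires ORDER BY
OFFSET (@pageNumber - 1) * @pageSize ROWS   -- skip the earlier pages
FETCH NEXT @pageSize ROWS ONLY;             -- return one page
```

With six rows and a page size of 2, page 3 skips four rows and returns Erin Evans and Frank Ford.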

Return all columns excluding rows with a duplicateID in one column

I have an interesting issue.
I inherited a sloppy database with a table that has duplicate rows. However, they are not exact duplicates because of one column (a text column).
Here is an example:
TestID TestDescription Cost
115893hc127aaq Etiology • Understand the causes of acute pancreatitis $10
115893hc127aaq Etiology • Understand the causes of acute pancreatitis $10
115893hc127aaq Etiology • Understand the causes of acute pancreatitis $10
You can see that all the data except the 'TestDescription' is identical.
There are thousands of rows like this, where there might be 2 or 3 duplicate rows with minor spacing or spelling differences in 'TestDescription'.
Because of this, using DISTINCT won't work.
I want to SELECT all rows but only get one row for each TestID: let's say the first one, ignoring the rest.
I tried SELECT DISTINCT *
But I can't do this using DISTINCT because TestDescription contains minor differences between rows.
SELECT DISTINCT TestID works, but that only returns TestID and I need to see all columns.
Is there a way of doing this in SQL Server 2012?
Thanks!
One approach uses row_number():
select *
from (
select t.*, row_number() over(partition by testid order by (select null)) rn
from mytable t
) t
where rn = 1
This assumes that you want one row per testid, as your question suggests.
You did not say which column you want to use to break the ties, and I am unsure there actually is one, so I ordered by (select null). This is not a deterministic order by clause, so consecutive executions of the query might not always select the same row from a given duplicate group.
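If the table does have a column that can break ties, ordering by it makes the result repeatable. The sketch below assumes such a column exists. RowId is a hypothetical key that the question never mentions, and the sample rows are invented.

```sql
-- Hypothetical sample data; RowId is an assumed unique column.
DECLARE @mytable TABLE (
    RowId int,
    TestID varchar(20),
    TestDescription varchar(200),
    Cost money
);
INSERT INTO @mytable (RowId, TestID, TestDescription, Cost)
VALUES (1, '115893hc127aaq', 'Etiology: Understand the causes', 10),
       (2, '115893hc127aaq', 'Etiology:  Understand the  causes', 10),
       (3, '220045xy981bbz', 'Diagnosis: Interpret lab values', 15);

-- Ordering by RowId inside each TestID partition means the row with
-- the lowest RowId is always the one numbered 1.
SELECT TestID, TestDescription, Cost
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY TestID ORDER BY RowId) AS rn
    FROM @mytable t
) t
WHERE rn = 1;
```

This returns one row per TestID (RowId 1 and RowId 3 here), and it returns the same rows on every run.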

Using Top in T-SQL

A question on using Top. For example, we have this SQL statement:
SELECT TOP (5) WITH TIES orderid, orderdate, custid, empid
FROM Sales.Orders
ORDER BY orderdate DESC;
It orders the returned rows by orderdate first, then selects the top five rows.
But doesn't the ORDER BY clause happen after the SELECT clause, which would mean that five arbitrary rows are returned first and then those five rows are ordered by orderdate?
The order of commands in the statement doesn't reflect the actual order of operations that SQL follows. See this article which shows the order to be:
from
where
group by
having
select
order by
limit
As you can see, the TOP operation (limit) is the last to be executed.
The question already has an accepted answer, but I would like to quote the Microsoft documentation.
Logical Processing Order of the SELECT statement
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
"But doesn't the ORDER BY clause happen after the SELECT clause, which would mean that five arbitrary rows are returned first and then those five rows are ordered by orderdate?"
No. ORDER BY is processed after the SELECT, but limiting the result set to 5 rows happens even later.
The physical details of actual query processing may vary, but the end result is as if the server sorted the whole table by orderdate, picked the top 5 rows (or more if needed due to ties), returned those rows and discarded the rest.
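A small example makes the "sort first, then limit" behaviour visible. This is only an illustration: the table variable @Orders and its rows are made up, and TOP (2) is used instead of TOP (5) to keep the data short.

```sql
-- Hypothetical sample data standing in for Sales.Orders.
DECLARE @Orders TABLE (orderid int, orderdate date);
INSERT INTO @Orders (orderid, orderdate)
VALUES (1, '20240101'), (2, '20240102'), (3, '20240103'),
       (4, '20240103'), (5, '20240104');

-- Rows are ordered by orderdate DESC first; TOP is applied afterwards.
-- WITH TIES also returns any rows that tie with the last row kept.
SELECT TOP (2) WITH TIES orderid, orderdate
FROM @Orders
ORDER BY orderdate DESC;
```

The result contains three rows: order 5 (the latest date), plus orders 3 and 4, which share the second-latest date. If TOP were applied before ORDER BY, the two earliest-inserted rows could be returned instead, which is not what happens.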

SQL Server - Delete Duplicate Rows - how does Partition By affect this query?

I've been using the following inherited query where I'm trying to delete duplicate rows and I'm getting some unexpected results when first running it as a SELECT - I believe it has something to do with my lack of understanding of the Partition part of the statement:
WITH CTE AS(
SELECT [Id],
[Url],
[Identifier],
[Name],
[Entity],
[DOB],
RN = ROW_NUMBER()OVER(PARTITION BY Name ORDER BY Name)
FROM Data.Statistics
where Id = 2170
)
DELETE FROM CTE WHERE RN > 1
Can someone help me understand exactly what I'm doing with the Partition BY Name part of this? This doesn't limit the query in any way to only looking for duplicates in the Name field, correct? I need to ensure that it's looking for records where all 5 of the fields inside the CTE definition are the same for a record to be considered a duplicate.
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) doesn't make a lot of sense. You wouldn't ORDER BY the same thing you used in PARTITION BY since it will be the same value for everything in the partition, making the ORDER BY part useless.
Basically the CTE part of this query is saying to split the matching rows (those with [Id] = 2170) temporarily into groups for each distinct name, and within each group of rows with the same name, order those by name (which are obviously all the same value) and then return the row number within that sequence group as RN. Unique names will all have a row number of 1, because there is only one row with that name. Duplicate names will have row numbers 1, 2, 3, and so on. The order of those rows is undefined in this case because of the silly ORDER BY clause, but if you changed the ORDER BY to something meaningful, the row numbers would follow that sequence.
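To treat rows as duplicates only when every listed field matches, all of those fields would go into the PARTITION BY. The following sketch assumes that is the intent. The temp table #Stats and its rows are hypothetical stand-ins for Data.Statistics, and ORDER BY (SELECT NULL) is used because rows in a partition are identical, so it does not matter which one survives.

```sql
-- Hypothetical stand-in for Data.Statistics.
CREATE TABLE #Stats (
    [Id] int, [Url] varchar(200), [Identifier] varchar(50),
    [Name] varchar(100), [Entity] varchar(50), [DOB] date
);
INSERT INTO #Stats ([Id], [Url], [Identifier], [Name], [Entity], [DOB])
VALUES (2170, 'http://example.com/a', 'X1', 'Ann', 'Person', '19800101'),
       (2170, 'http://example.com/a', 'X1', 'Ann', 'Person', '19800101'), -- exact duplicate
       (2170, 'http://example.com/b', 'X2', 'Ann', 'Person', '19800101'); -- same Name, different Url

WITH CTE AS (
    SELECT [Id],
           RN = ROW_NUMBER() OVER (
                    PARTITION BY [Id], [Url], [Identifier], [Name], [Entity], [DOB]
                    ORDER BY (SELECT NULL))
    FROM #Stats
    WHERE [Id] = 2170
)
DELETE FROM CTE WHERE RN > 1;

SELECT * FROM #Stats;   -- two rows remain: one per distinct combination

DROP TABLE #Stats;
```

With PARTITION BY Name alone, all three rows would fall into one partition and two of them would be deleted, including the row with the different Url.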

TSQL sub query invalid column and only one expression

It has been a while since writing T-SQL for me and I know this can be done but my memory is good enough to get me close (I think I'm close) but poor enough to not get it right.
To start I have this query:
SELECT DISTINCT(COMM_TYPE),
COUNT(COMM_TYPE) AS 'Total'
FROM
[MYDB].[dbo].[COMM]
GROUP BY
COMM_TYPE
Which returns:
COMM_TYPE Total
--------------------------
TypeA 1
TypeB 44474
TypeC 3
TypeD 3854
TypeE 12327
TypeF 362912
TypeG 484344
TypeH 386
TypeI 106
This is an accurate result.
So now I want the above PLUS a sample of each one. Something with columns like:
ID COMM_TYPE TOTAL DATA COMMENTS PRIMARY COMM_NUMBER
I believe this can be done with a sub query but I am not writing it correctly as I get two errors.
Msg 207, Level 16, State 1, Line 10
Invalid column name 'CT'.
Msg 116, Level 16, State 1, Line 7
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
The second error I understand. My subquery returns two columns, but where I have placed it in the select list only one is allowed.
The first error I'm more lost on. I thought I could reference a subquery column in the outer query?
Here is the query:
SELECT TOP(1)
*,
(SELECT
DISTINCT(COMM_TYPE),
COUNT(COMM_TYPE)
FROM
[MYDB].[dbo].[COMM]
GROUP BY
COMM_TYPE) AS CT
FROM
[MYDB].[dbo].[COMM]
WHERE
CT = COMM_TYPE
This is mostly for myself but if it helps anyone here ya go:
We start with a CTE to wrap the entire operation, as it brings many benefits, but the two applicable here are:
1. Enable grouping by a column that is derived from a scalar subselect.
2. Reference the resulting table multiple times in the same statement.
WITH T
AS (
CTE SELECT Statement
)
FINAL SELECT Statement
Next, our CTE select basically returns three columns for us.
1. Total, which in my query was a COUNT on a column
2. RN, which is the row number
3. Wildcard *, which gets all the columns from the table
Now from this point we get into the partitioning....
So it seems that we need to choose how we are going to break this table up. Since I had defined DISTINCT(COMM_TYPE), without realizing it, there was my partition. In that first column definition we also do a COUNT(*). So what must be happening is that the SQL engine first breaks the table into pieces (partitions), then counts the records in each piece?
SELECT Count(*)
OVER (PARTITION BY COMM_TYPE) AS Total,
Next we do a ROW_NUMBER() operation OVER (aka operating against) my partition of COMM_TYPE again. We then order it and project the column as RN. I was not sure why this was needed until I got to the end; then it made sense.
Row_number()
OVER (PARTITION BY COMM_TYPE
ORDER BY COMM_TYPE) AS RN,
Finally we just pull a wildcard, which is every column in the table.
So in the depths of the SQL engine namespace memory registers this must be quite a big hunk of data with these repeated grouping operations "OVER" everything.
However, all we see is a single row per COMM_TYPE, and that is because of the last select, which gives me everything all mushed together as I wanted. We only get the first row of each partition because of the WHERE RN = 1 filter on that RN column I didn't understand earlier.
Do I understand it properly?
This should do what you need.
WITH T
AS (SELECT Count(*)
OVER (PARTITION BY COMM_TYPE) AS Total,
Row_number()
OVER (PARTITION BY COMM_TYPE
ORDER BY COMM_TYPE) AS RN,
*
FROM MyDb.dbo.Comm)
SELECT *
FROM T
WHERE RN = 1
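To see what the two window functions produce, here is a small sketch run against invented data. The table variable @Comm and its rows are hypothetical. ROW_NUMBER() is ordered by ID rather than COMM_TYPE here, which is an assumption made so that the sample row picked for each type is predictable.

```sql
-- Hypothetical sample data standing in for MyDb.dbo.Comm.
DECLARE @Comm TABLE (ID int, COMM_TYPE varchar(10), [DATA] varchar(50));
INSERT INTO @Comm (ID, COMM_TYPE, [DATA])
VALUES (1, 'TypeA', 'a1'),
       (2, 'TypeB', 'b1'),
       (3, 'TypeB', 'b2'),
       (4, 'TypeB', 'b3');

WITH T AS (
    SELECT COUNT(*) OVER (PARTITION BY COMM_TYPE) AS Total,    -- rows per type, repeated on every row
           ROW_NUMBER() OVER (PARTITION BY COMM_TYPE
                              ORDER BY ID) AS RN,              -- 1, 2, 3... within each type
           *
    FROM @Comm
)
SELECT ID, COMM_TYPE, Total, [DATA]
FROM T
WHERE RN = 1;   -- keep one sample row per type
```

This returns two rows: ID 1 (TypeA, Total 1) and ID 2 (TypeB, Total 3). Unlike GROUP BY, the window functions do not collapse rows, so every row carries its group's total until the RN = 1 filter keeps one per type.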

How ROW_NUMBER used with insertions?

I have multiple UNIONed statements in SQL Server, and it is very hard to find a unique column in the result.
I need a unique value for each row, so I've used the ROW_NUMBER() function.
This result set is being copied to another place (actually a SOLR index).
The next time I run the same query, I need to pick up only the newly added rows.
So I need to confirm that the newly added rows will be numbered after the last ROW_NUMBER value from the previous run.
In other words, does the ROW_NUMBER function order the results by insertion order, assuming I don't add an ORDER BY clause?
If not (as I suspect), are there any alternatives?
Thanks.
Without seeing the SQL I can only give the general answer: SQL Server does not guarantee the order of rows returned by a SELECT statement without an ORDER BY clause, so the row_number may not follow the insertion order.
I guess you can do something like this..
;WITH cte AS
(
SELECT *, rn = ROW_NUMBER() OVER (ORDER BY SomeColumn)
FROM
(
/* Your Union Queries here */
) q
)
INSERT INTO Destination_Table
-- Select only the CTE's columns; SELECT * would also return Destination_Table's columns
SELECT cte.*
FROM cte
LEFT JOIN Destination_Table
ON cte.Refrencing_Column = Destination_Table.Refrencing_Column
WHERE Destination_Table.Refrencing_Column IS NULL
I would suggest you consider 'timestamping' the row with the time it was inserted. Or adding an identity column to the table.
But what it sounds like you want to do is get current max id and then add the row_number to it.
Select col1, col2, mid + row_number() over(order by smt) id
From (
Select col1, col2, (select max(id) from tbl) mid
From query
) t
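The identity-column suggestion could look roughly like this. It is only a sketch under assumptions: the temp table #Source, its columns, and the variable @lastExportedId are hypothetical. In the asker's case the identity column would have to exist on the underlying tables, and each export would need to remember the highest value it has already copied.

```sql
-- Hypothetical source table with an identity column and an insert timestamp.
CREATE TABLE #Source (
    Id int IDENTITY(1, 1) PRIMARY KEY,
    Payload varchar(50),
    InsertedAt datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
);

-- Rows present at the time of the first export.
INSERT INTO #Source (Payload) VALUES ('first'), ('second');

-- Remember the highest Id that has already been exported.
DECLARE @lastExportedId int;
SELECT @lastExportedId = MAX(Id) FROM #Source;

-- A row added after the first export.
INSERT INTO #Source (Payload) VALUES ('third');

-- The next export picks up only rows with a higher Id.
SELECT Id, Payload
FROM #Source
WHERE Id > @lastExportedId
ORDER BY Id;

DROP TABLE #Source;
```

Here only the 'third' row is returned by the final query. Unlike ROW_NUMBER(), the identity value is stored with the row, so it does not change between runs.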
