Using Top in T-SQL - sql-server

A question on using Top. For example, we have this SQL statement:
SELECT TOP (5) WITH TIES orderid, orderdate, custid, empid
FROM Sales.Orders
ORDER BY orderdate DESC;
It orders the returned rows by orderdate first, then selects the top five rows.
But doesn't the ORDER BY clause happen after the SELECT clause? Wouldn't that mean five arbitrary rows are picked first, and only then are those five rows ordered by orderdate?

The order of commands in the statement doesn't reflect the actual order of operations that SQL follows. See this article which shows the order to be:
from
where
group by
having
select
order by
limit
As you can see, the TOP operation (limit) is the last to be executed.

The question already has an accepted answer, but I would like to quote the Microsoft documentation.
Logical Processing Order of the SELECT statement
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP

But doesn't the ORDER BY clause happen after the SELECT clause? Wouldn't that mean five arbitrary rows are picked first, and only then are those five rows ordered by orderdate?
No. ORDER BY is processed after the SELECT, but limiting the result set to 5 rows happens even later.
The physical details of actual query processing may vary, but the end result is as if the server sorted the whole table by orderdate, picked the top 5 rows (or more if needed due to ties), returned those rows, and discarded the rest.
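The "sort everything, then keep the top 5 plus ties" behavior described above can be sketched with a ranking function. This is only an illustrative rewrite, not how the server necessarily executes TOP; the derived-table alias ranked and the column alias rnk are names made up for this example.

```sql
-- Sketch: rows whose RANK() over the same ordering is <= 5 are the rows
-- that TOP (5) WITH TIES ... ORDER BY orderdate DESC returns.
SELECT orderid, orderdate, custid, empid
FROM (
    SELECT orderid, orderdate, custid, empid,
           RANK() OVER (ORDER BY orderdate DESC) AS rnk  -- ties share a rank
    FROM Sales.Orders
) AS ranked                                              -- hypothetical alias
WHERE rnk <= 5
ORDER BY orderdate DESC;
```

Because tied rows share the same rank, any row tied with the fifth row also satisfies rnk <= 5, which mirrors what WITH TIES does.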

Related

Return all columns excluding rows with a duplicateID in one column

I have an interesting issue.
I inherited a sloppy database with a table that has duplicate rows. However, they are not exact duplicates because of one column (a text column).
Here is an example:
TestID TestDescription Cost
115893hc127aaq Etiology • Understand the causes of acute pancreatitis $10
115893hc127aaq Etiology • Understand the causes of acute pancreatitis $10
115893hc127aaq Etiology • Understand the causes of acute pancreatitis $10
You can see that all the data except the 'TestDescription' is identical.
There are thousands of rows like this, where there might be 2 or 3 duplicate rows with minor spacing or spelling differences in 'TestDescription'.
Because of this, using DISTINCT won't work.
I want to SELECT all rows but get only one row for each TestID (let's say the first one) and ignore the rest.
I tried SELECT DISTINCT *, but that doesn't help because TestDescription contains minor differences between rows.
SELECT DISTINCT TestID works, but that returns only TestID and I need to see all columns.
Is there a way of doing this in SQL Server 2012?
Thanks!
One approach uses row_number():
select *
from (
    select t.*, row_number() over(partition by testid order by (select null)) rn
    from mytable t
) t
where rn = 1
This assumes that you want one row per testid, as your question suggests.
You did not say which column you want to use to break the ties, and I am unsure there actually is one, so I ordered by (select null). This is not a deterministic order by clause, so consecutive executions of the query might not always select the same row from a given duplicate group.

SQL Server - Delete Duplicate Rows - how does Partition By affect this query?

I've been using the following inherited query where I'm trying to delete duplicate rows and I'm getting some unexpected results when first running it as a SELECT - I believe it has something to do with my lack of understanding of the Partition part of the statement:
WITH CTE AS(
SELECT [Id],
[Url],
[Identifier],
[Name],
[Entity],
[DOB],
RN = ROW_NUMBER()OVER(PARTITION BY Name ORDER BY Name)
FROM Data.Statistics
where Id = 2170
)
DELETE FROM CTE WHERE RN > 1
Can someone help me understand exactly what I'm doing with the Partition BY Name part of this? This doesn't limit the query in any way to only looking for duplicates in the Name field, correct? I need to ensure that it's looking for records where all 5 of the fields inside the CTE definition are the same for a record to be considered a duplicate.
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) doesn't make a lot of sense. You wouldn't ORDER BY the same thing you used in PARTITION BY since it will be the same value for everything in the partition, making the ORDER BY part useless.
Basically the CTE part of this query is saying to split the matching rows (those with [Id] = 2170) temporarily into groups for each distinct name, and within each group of rows with the same name, order those by name (which are obviously all the same value) and then return the row number within that sequence group as RN. Unique names will all have a row number of 1, because there is only one row with that name. Duplicate names will have row numbers 1, 2, 3, and so on. The order of those rows is undefined in this case because of the silly ORDER BY clause, but if you changed the ORDER BY to something meaningful, the row numbers would follow that sequence.
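To treat rows as duplicates only when several columns match, the PARTITION BY list can name all of those columns. The following is a minimal sketch, assuming the five columns meant in the question are Url, Identifier, Name, Entity and DOB (Id is already fixed by the WHERE filter); that choice of columns is an assumption, not something the question confirms. It is written as a SELECT so the affected rows can be inspected before any DELETE is run.

```sql
WITH CTE AS (
    SELECT [Id], [Url], [Identifier], [Name], [Entity], [DOB],
           RN = ROW_NUMBER() OVER (
                    -- Assumed duplicate key: all five columns must match.
                    PARTITION BY [Url], [Identifier], [Name], [Entity], [DOB]
                    -- No tiebreaker column is known, so which duplicate gets
                    -- RN = 1 is arbitrary.
                    ORDER BY (SELECT NULL))
    FROM Data.[Statistics]
    WHERE [Id] = 2170
)
SELECT *        -- preview only; these are the rows a DELETE ... WHERE RN > 1 would remove
FROM CTE
WHERE RN > 1;
```

If the table has a column that distinguishes the copies (for example a creation timestamp), ordering by it instead of (SELECT NULL) would make the choice of surviving row predictable.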

SQL Server: order of returned rows when using IN clause

Running the following query returns 4 rows. As I can see in SSMS the order of returned rows is the same as I specified in the IN clause.
SELECT * FROM Table WHERE ID IN (4,3,2,1)
Can I say that the order of the returned rows is ALWAYS the same as the order in the IN clause?
If yes then is it true, that the following two queries return the rows in the same order? (as I've tested the orders are the same, but I don't know if I can trust this behavior)
SELECT TOP 10 * FROM Table ORDER BY LastModification DESC
SELECT * FROM Table WHERE ID IN (SELECT TOP 10 ID FROM Table ORDER BY LastModification DESC)
I ask this question because I have a quite complex select query. Using this trick over it brings me ca. 30% performance gain, in my case.
You cannot guarantee that the rows come back in any particular order unless you use an ORDER BY clause. Some tricks may appear to work some of the time, but they give no guarantee of the order.
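As a hedged sketch of what that means for the second query in the question: the IN subquery can still be used to pick the ten IDs, but the outer query needs its own ORDER BY if the output order matters. The alias t is introduced here for illustration, and the table name is bracketed because Table is a reserved word.

```sql
SELECT t.*
FROM [Table] AS t
WHERE t.ID IN (SELECT TOP (10) ID
               FROM [Table]
               ORDER BY LastModification DESC)  -- only decides WHICH ten IDs are chosen
ORDER BY t.LastModification DESC;               -- decides the order of the output rows
```

Whether this form keeps the performance gain mentioned in the question would have to be measured on the actual query.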

Row_Number Over Where RowNumber between

I'm trying to select certain rows from my table using ROW_NUMBER() OVER. However, SQL Server raises the error "Invalid column name 'ROWNUMBERS'". Can anyone correct me?
SELECT ROW_NUMBER() OVER (ORDER BY Price ASC) AS ROWNUMBERS, *
FROM Product
WHERE ROWNUMBERS BETWEEN @fromCount AND @toCount
Attempting to reference the aliased column in the WHERE clause does not work because of the logical query processing taking place. The WHERE is evaluated before the SELECT clause. Therefore, the column ROWNUMBERS does not exist when WHERE is evaluated.
The correct way to reference the column in this example would be:
SELECT a.*
FROM
    (SELECT ROW_NUMBER() OVER (ORDER BY Price ASC) AS ROWNUMBERS, *
     FROM Product) a
WHERE a.ROWNUMBERS BETWEEN @fromCount AND @toCount
For your reference, the order for operations is:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
There is another answer here that solves the specific error reported. However, I also want to address the wider problem. It looks a lot like what you are doing here is paging your results for display. If that is the case, and if you can use SQL Server 2012, there is a better way now. Take a look at OFFSET/FETCH:
SELECT [First Name] + ' ' + [Last Name]
FROM Employees
ORDER BY [First Name]
OFFSET 10 ROWS FETCH NEXT 5 ROWS ONLY;
That would show the third page of a query where the page size is 5.
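The offset for an arbitrary page can be computed from the page number and page size. The sketch below assumes two variables, @PageNumber and @PageSize, and a FullName alias, all of which are made up for this example; with the values shown it requests the same rows as the query above.

```sql
DECLARE @PageNumber int = 3,   -- hypothetical: 1-based page to show
        @PageSize   int = 5;   -- hypothetical: rows per page

SELECT [First Name] + ' ' + [Last Name] AS FullName
FROM Employees
ORDER BY [First Name]
OFFSET (@PageNumber - 1) * @PageSize ROWS   -- skip the earlier pages
FETCH NEXT @PageSize ROWS ONLY;             -- return one page
```

If several employees share a first name, adding a unique column to the ORDER BY would keep the pages stable between executions.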

How to order by name if order by a number is repeated in SQL Server?

I query my table to get the name and order from temp_tbl:
Select name, sequence from temp_tbl order by [order]
The above query returns a result set in which the [order] column has repeated values: two 3s and two 5s.
Since I order by [order], I need to apply some logic for these cases: rows that share the same [order] value should be ordered by name.
The expected result is
How can I achieve this in SQL query or stored procedure ?
You can have multiple terms in the ORDER BY clause. The terms are applied in order of precedence from left to right: the first term decides the order, the second term is used only to order rows that are equal on the first, and so on. So:
select name, sequence
from temp_tbl
order by [order], name
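A small self-contained sketch of the two-term ORDER BY follows. The table variable and its rows are invented for illustration (the sample data from the question is not available), with two 3s and two 5s in the [order] column as described.

```sql
-- Hypothetical sample data, not taken from the question.
DECLARE @temp_tbl TABLE (name varchar(20), [order] int);

INSERT INTO @temp_tbl (name, [order])
VALUES ('Delta', 3), ('Alpha', 1), ('Bravo', 3),
       ('Echo', 5), ('Charlie', 5), ('Foxtrot', 2);

SELECT name, [order]
FROM @temp_tbl
ORDER BY [order], name;   -- name only breaks ties within equal [order] values
```

With this data the rows come back as Alpha (1), Foxtrot (2), Bravo (3), Delta (3), Charlie (5), Echo (5): the repeated 3s and 5s are sorted by name, while the overall sequence still follows [order].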

Resources