Retrieving specific number of rows based on sum of row number - sql-server

After reading and experimenting, I decided I need to ask:
I am trying to retrieve a specific number of rows from a table based on the sum of the row number: This is a basic table with two columns: CusID, CusName.
I started by numbering each row to 1 so that I can use a SUM of the row number, or so I thought.
WITH Example AS
(
SELECT
*,
ROW_NUMBER() OVER (Partition by CusID ORDER BY CusID) AS RowNumber
FROM
MySchema.MyTable
)
I am not sure how to move beyond here. I tried using the HAVING clause but obviously that would not work. I could also use TOP or Percent.
But I would like to retrieve the rows based on the sum of row number.
What's the way to do this?

First of all, windowed functions cannot be used in the context of another windowed function or aggregate, so you cannot combine an aggregate function and ROW_NUMBER in the same expression. I think it would be better to apply the aggregate after your WITH, like this:
WITH Example AS
(
SELECT *, ROW_NUMBER() OVER (Partition by CusID ORDER BY CusID) AS RowNumber
FROM MySchema.MyTable
)
select cusid,cusname,sum(rownumber) from example
group by Cusid,Cusname
having .....
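A minimal runnable sketch of that pattern, assuming SQLite (through Python's sqlite3 module) as a stand-in for SQL Server. The sample rows, the dropped schema prefix, and the HAVING threshold of 3 are all hypothetical; the original answer leaves the HAVING condition open.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (CusID INTEGER, CusName TEXT);
INSERT INTO MyTable VALUES
    (1, 'Ann'), (1, 'Ann'), (2, 'Bob'), (3, 'Cy'), (3, 'Cy'), (3, 'Cy');
""")

# Number the rows in the CTE first, then aggregate in the outer query.
rows = conn.execute("""
WITH Example AS (
    SELECT CusID, CusName,
           ROW_NUMBER() OVER (PARTITION BY CusID ORDER BY CusID) AS RowNumber
    FROM MyTable
)
SELECT CusID, CusName, SUM(RowNumber) AS RowNumberSum
FROM Example
GROUP BY CusID, CusName
HAVING SUM(RowNumber) >= 3   -- hypothetical threshold
ORDER BY CusID
""").fetchall()

print(rows)
```

Because the row number restarts at 1 for every CusID, the sum for a customer with k rows is 1 + 2 + ... + k, so the filter here keeps customers with at least two rows.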

Related

rank gets affected by groupby, why

I've recently seen a query like the one below (RANK and DENSE_RANK with a GROUP BY clause). I found that the GROUP BY clause makes RANK behave like DENSE_RANK, and I could not find Microsoft documentation about it.
with FactTransactionHistory as
(
select 2 as ProductKey,'abc1' as trx
union
select 3 as ProductKey,'abc1' as trx
union
select 4 as ProductKey,'abc' as trx
union
select 4 as ProductKey,'abc2' as trx
union
select 4 as ProductKey,'abc3' as trx
union
select 5 as ProductKey,'abc' as trx
)
select ProductKey, DENSE_RANK() over(order by ProductKey) rowNumDense, RANK() over(order by ProductKey) rowNum
/*, count(*) recordCount*/
from FactTransactionHistory
group by ProductKey
My understanding is that if the OVER clause has PARTITION BY, rows are ordered within each partition, so the rank value is determined within the partition.
But this query has no PARTITION BY, so the ORDER BY applies to the whole dataset, and I cannot explain why RANK behaves like DENSE_RANK here.
Can you please help explain why?
Note: if I remove the GROUP BY clause, RANK and DENSE_RANK show different values, as the documentation states.
"I found the group by clause makes rank behave like dense rank."
These two ranking functions only differ on how they handle ties. Here, you are ordering the over() clause of the window function with the same column that is used in the group by - that is ProductKey. By nature, aggregation guarantees no duplicates on the product key, so both functions give the same result.
"But this query has no partition by, so the order by is on the whole dataset"
This is the place where your expectation goes wrong. To quote the docs on the OVER clause:
"If PARTITION BY is not specified, the function treats all rows of the query result set as a single group."
My emphasis on "query result set". It's the result set rows, not the source rows, that make up the single partition here.
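The effect can be checked with the question's own data. This is a sketch that assumes SQLite (via Python's sqlite3) in place of SQL Server; the table is created from the six rows in the question, and the same ranking query is run with and without the GROUP BY.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactTransactionHistory (ProductKey INTEGER, trx TEXT);
INSERT INTO FactTransactionHistory VALUES
    (2, 'abc1'), (3, 'abc1'), (4, 'abc'), (4, 'abc2'), (4, 'abc3'), (5, 'abc');
""")

ranking = """
SELECT ProductKey,
       DENSE_RANK() OVER (ORDER BY ProductKey) AS rowNumDense,
       RANK()       OVER (ORDER BY ProductKey) AS rowNum
FROM FactTransactionHistory
{group_by}
ORDER BY ProductKey
"""

# Without GROUP BY the window sees six rows, three of them tied on ProductKey 4.
ungrouped = conn.execute(ranking.format(group_by="")).fetchall()

# With GROUP BY the window sees four rows, one per ProductKey, so there are no ties.
grouped = conn.execute(ranking.format(group_by="GROUP BY ProductKey")).fetchall()

print(ungrouped)
print(grouped)
```

In the ungrouped result the last row gets dense rank 4 but rank 6; in the grouped result both columns are identical because the ties have been collapsed before the window functions run.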

T-SQL: aggregate function for calculating Nth percentile

I am trying to calculate the Nth percentile of all of the values in a single column in a table. All I want is a scalar, aggregate value below which N percent of the values fall. For instance, if the table has 100 rows where the value is the same as the row index plus one (1 to 100 consecutively), then I'd want this value to tell me that 95% of the values are below 95.
The PERCENTILE_CONT analytic function looks closest to what I want. But if I try to use it like this:
SELECT PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY ValueColumn) OVER () AS P95
I get one row per row in the table, all with the same value. I could use TOP 1 to just give me one of those rows, but now I've done an additional table scan.
I am not trying to create a wizbang table of results partitioned by some other column in the original table. I just want an aggregate, scalar value.
Edit: I have been able to use PERCENTILE_CONT in a query with a WHERE clause. For example:
DECLARE @P95 INT
SELECT TOP 1 @P95 = (PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY ValueColumn) OVER ())
FROM ExampleTable
WHERE LOWER(Color) = 'blue'
SELECT @P95
Including the WHERE clause gives a different result than I got without it.
From what I can tell, you will need to do a subquery here. For example, to find the number of records strictly below the 95th percentile we can try:
WITH cte AS (
SELECT ValueColumn,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY ValueColumn) OVER () AS P95
FROM yourTable
)
SELECT COUNT(*)
FROM cte
WHERE ValueColumn < P95;
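To make the numbers concrete, here is a small Python sketch of the calculation. It assumes the usual definition of PERCENTILE_CONT as linear interpolation between the two nearest ordered values; the helper name percentile_cont is hypothetical, and the data is the 1 to 100 example from the question.

```python
import math


def percentile_cont(values, p):
    """Continuous percentile by linear interpolation (assumed PERCENTILE_CONT behaviour)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Zero-based fractional position of the requested percentile.
    pos = p * (len(ordered) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


values = list(range(1, 101))                 # 1..100, as in the question
p95 = percentile_cont(values, 0.95)          # a single scalar
below = sum(1 for v in values if v < p95)    # same filter as the outer query

print(p95, below)
```

Under that assumption the percentile for this data is about 95.05 rather than exactly 95, and 95 of the 100 values fall strictly below it.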

SQL Numbering of non-sequential Groups

Can anyone help me with a select statement that will return the "Section" numbering I have shown below? I found some similar questions and answers but nothing that addresses my specific requirements.
My data (simplified for this example) are the "Sequence" and "Data" columns and I want to produce the "Section" column, based on:
my data being ordered by the value in the Sequence column, and
based on the break in value of the Data column:
Note that the "Section" numbering I desire breaks on the "change in value" of the Data column with no consideration for the actual values in that column or for them having to be in any particular sequence.
I should also clarify that the values in the Sequence column will be contiguous so no missing numbers in the sequence, which the chosen answer satisfies.
We can use the difference in row number method here:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (ORDER BY Sequence) -
ROW_NUMBER() OVER (PARTITION BY Data ORDER BY Sequence) rn
FROM yourTable
)
SELECT Sequence, Data, DENSE_RANK() OVER (ORDER BY rn) AS Section
FROM cte
ORDER BY Sequence;
It is difficult to explain in words why this method works here, but if you are curious, you may try SELECT * FROM cte to see what is happening.
This solution uses window functions twice:
-- [2] then use dense_rank() on the smallest sequence by Data
select *, dense_rank() over(order by s)
from
(
-- [1] find the smallest Sequence for each group of Data
select *, s = min(Sequence) over (partition by Data)
from tbl
) t
order by Sequence
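One caveat worth testing: if a Data value can reappear later in the sequence (for example A, A, B, B, A), the row-number difference can coincide for two different runs, and MIN(Sequence) per Data value puts every A in the same section. A commonly used alternative flags each change with LAG and takes a running sum of the flags. The sketch below assumes SQLite (via Python's sqlite3) in place of SQL Server, and the five sample rows are hypothetical because the question's sample table is not reproduced here.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourTable (Sequence INTEGER, Data TEXT);
-- Hypothetical sample: the value 'A' comes back after a run of 'B'.
INSERT INTO yourTable VALUES (1, 'A'), (2, 'A'), (3, 'B'), (4, 'B'), (5, 'A');
""")

sections = conn.execute("""
WITH flagged AS (
    SELECT Sequence, Data,
           -- 1 when Data differs from the previous row (or there is no previous row)
           CASE WHEN Data = LAG(Data) OVER (ORDER BY Sequence) THEN 0 ELSE 1 END AS is_new
    FROM yourTable
)
SELECT Sequence, Data,
       SUM(is_new) OVER (ORDER BY Sequence) AS Section   -- running count of changes
FROM flagged
ORDER BY Sequence
""").fetchall()

print(sections)
```

The running sum only ever increases, so sections are numbered in Sequence order and the returning A gets its own section.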

How ROW_NUMBER used with insertions?

I have multiple UNIONed statements in SQL Server, and it is very hard to find a unique column in the result.
I need a unique value for each row, so I used the ROW_NUMBER() function.
This result set is copied to another place (actually a Solr index).
The next time I run the same query, I need to pick up only the newly added rows.
So I need to confirm that the newly added rows will be numbered after the last ROW_NUMBER value from the previous run.
In other words, does ROW_NUMBER order the results by insertion order if I don't add an ORDER BY clause?
If not (as I suspect), are there any alternatives?
Thanks.
Without seeing the SQL I can only give a general answer: SQL Server does not guarantee the order of a SELECT statement's results without an ORDER BY clause, so ROW_NUMBER may not follow insertion order.
I guess you can do something like this:
;WITH
cte
AS
(
SELECT * , rn = ROW_NUMBER() OVER (ORDER BY SomeColumn)
FROM
(
/* Your Union Queries here*/
)q
)
INSERT INTO Destination_Table
SELECT CTE.*  -- only the CTE's columns; SELECT * would also return Destination_Table's columns
FROM CTE LEFT JOIN Destination_Table
ON CTE.Refrencing_Column = Destination_Table.Refrencing_Column
WHERE Destination_Table.Refrencing_Column IS NULL
I would suggest you consider 'timestamping' the row with the time it was inserted. Or adding an identity column to the table.
But it sounds like what you want to do is get the current max id and then add the row_number to it.
Select col1, col2, mid + row_number() over(order by smt) id
From (
Select col1, col2, (select max(id) from tbl) mid
From query
) t
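The two ideas above (anti-join against what is already stored, then continue numbering from the current maximum id) can be combined. The following is only a sketch, assuming SQLite via Python's sqlite3 instead of SQL Server; the tables dest and source_rows and the column ref are hypothetical stand-ins for the destination table, the UNION query, and a column that identifies a row.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dest (id INTEGER, ref TEXT);      -- rows already copied
CREATE TABLE source_rows (ref TEXT);           -- stands in for the UNION query
INSERT INTO dest VALUES (1, 'a'), (2, 'b');
INSERT INTO source_rows VALUES ('a'), ('b'), ('c'), ('d');
""")

append_new = """
INSERT INTO dest (id, ref)
SELECT (SELECT COALESCE(MAX(id), 0) FROM dest)      -- current max id
       + ROW_NUMBER() OVER (ORDER BY s.ref),        -- continue numbering from it
       s.ref
FROM source_rows AS s
LEFT JOIN dest AS d ON d.ref = s.ref
WHERE d.ref IS NULL                                 -- keep only rows not yet copied
"""

conn.execute(append_new)
first_run = conn.execute("SELECT id, ref FROM dest ORDER BY id").fetchall()

# Running the same statement again finds nothing new to add.
conn.execute(append_new)
second_run = conn.execute("SELECT id, ref FROM dest ORDER BY id").fetchall()

print(first_run)
```

This relies on some column (or combination of columns) identifying a source row; without one, the anti-join has nothing to match on, which is where a timestamp or identity column on the source becomes the more robust choice.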

Row_Number Over Where RowNumber between

I'm trying to select certain rows from my table using ROW_NUMBER() OVER. However, SQL Server raises the error "Invalid column name 'ROWNUMBERS'". Can anyone correct me?
SELECT ROW_NUMBER() OVER (ORDER BY Price ASC) AS ROWNUMBERS, *
FROM Product
WHERE ROWNUMBERS BETWEEN @fromCount AND @toCount
Attempting to reference the aliased column in the WHERE clause does not work because of the logical order of query processing. The WHERE clause is evaluated before the SELECT clause, so the column ROWNUMBERS does not exist yet when WHERE is evaluated.
The correct way to reference the column in this example would be:
SELECT a.*
FROM
(SELECT ROW_NUMBER() OVER (ORDER BY Price ASC) AS ROWNUMBERS, *
FROM Product) a
WHERE a.ROWNUMBERS BETWEEN @fromCount AND @toCount
For your reference, the logical order of operations is:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
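The derived-table fix can be tried end to end with a small sketch. It assumes SQLite via Python's sqlite3 in place of SQL Server; the Product rows and the Name column are hypothetical, and the @fromCount / @toCount variables become bound parameters.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Name TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [(f"item{i}", float(i)) for i in range(1, 11)],   # hypothetical prices 1.0 .. 10.0
)

from_count, to_count = 3, 5

# The row number is computed in the inner query, so the outer WHERE can see it.
page = conn.execute("""
SELECT a.Name, a.Price, a.ROWNUMBERS
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY Price ASC) AS ROWNUMBERS, *
    FROM Product
) AS a
WHERE a.ROWNUMBERS BETWEEN ? AND ?
ORDER BY a.ROWNUMBERS
""", (from_count, to_count)).fetchall()

print(page)
```

By the time the outer WHERE runs, the inner SELECT has already produced ROWNUMBERS as an ordinary column, which is why the filter is legal there.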
There is another answer here that solves the specific error reported. However, I also want to address the wider problem. It looks a lot like what you are doing here is paging your results for display. If that is the case, and if you can use SQL Server 2012 or later, there is a better way now. Take a look at OFFSET/FETCH:
SELECT [First Name] + ' ' + [Last Name]
FROM Employees
ORDER BY [First Name]
OFFSET 10 ROWS FETCH NEXT 5 ROWS ONLY;
That would show the third page of a query where the page size is 5.
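The page arithmetic is offset = (page number - 1) * page size. The sketch below assumes SQLite via Python's sqlite3, which spells the same idea as LIMIT ... OFFSET ... rather than T-SQL's OFFSET ... ROWS FETCH NEXT ... ROWS ONLY; the helper page_offset, the 20 sample employees, and the space-free column names are hypothetical.

```python
import sqlite3


def page_offset(page_number, page_size):
    """Rows to skip to reach a 1-based page number."""
    return (page_number - 1) * page_size


# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [(f"Emp{i:02d}", "Doe") for i in range(1, 21)],   # hypothetical Emp01 .. Emp20
)

page_size = 5
offset = page_offset(3, page_size)   # third page -> skip 10 rows

# SQLite's LIMIT/OFFSET plays the role of OFFSET ... FETCH NEXT ... here.
names = [
    row[0]
    for row in conn.execute(
        "SELECT FirstName || ' ' || LastName FROM Employees "
        "ORDER BY FirstName LIMIT ? OFFSET ?",
        (page_size, offset),
    )
]

print(names)
```

As with OFFSET/FETCH, the ORDER BY is what makes the pages stable; without it, the rows that land on each page are not guaranteed.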
