I have the next table:
Supposing these users are sort in descending order based on their inserted date.
Now, what I want to do, is to change their sorting numbers in that way that for each user, the sorting number has to start from 1 up to the number of appearances of each user. The result should look something like:
Can someone provide me some clues of how to do it in sql server ? Thanks.
You can use the ROW_NUMBER ranking function to calculate a row's rank given a partition and order.
In this case, you want to calculate row numbers for each user PARTITION BY User_ID. The desired output shows that ordering by ID is enough ORDER BY ID.
SELECT
Id,
User_ID,
ROW_NUMBER() OVER (PARTITION BY User_ID ORDER BY Id) AS Sort_Number
FROM MyTable
There are other ranking functions you can use, eg RANK, DENSE_RANK to calculate a rank according to a score, or NTILE to calculate percentiles for each row.
You can also use the OVER clause with aggragets to create running totals or moving averages, eg SUM(Id) OVER (PARTITION BY User_ID ORDER BY Id) will create a running total of the Id values for each user.
use ROW_NUMBER() PARTITION BY User_Id
SELECT
Id,
[User_Id],
Sort_Number = ROW_NUMBER() OVER(PARTITION BY [User_Id]
ORDER BY [User_Id],[CreatedDate] DESC)
FROM YourTable
select id
,user_id
,row_number() over(partition by user_id order by user_id) as sort_number
from table
Use ranking function row_number()
select *,
row_number() over (partition by User_id order by user_id, date desc)
from table t
Related
I have simple T-SQL query, which calculates row number, rows count and total volume across all records:
DECLARE #t TABLE
(
id varchar(100),
volume float,
prev_date date
);
INSERT INTO #t VALUES
('0318610084', 100, '2019-05-16'),
('0318610084', 200, '2016-06-04');
SELECT
row_num = ROW_NUMBER() OVER (PARTITION BY id ORDER BY prev_date),
rows_count = COUNT(*) OVER (PARTITION BY id ORDER BY prev_date),
vol_total = SUM(volume) OVER (PARTITION BY id ORDER BY prev_date),
*
FROM #t;
I get the following result:
However, this is NOT what I expected: in all two rows the rows_count must be 2 and vol_total must be 300:
The workaround would be to add ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. However, I thought that there must be another way.
In the end of the day I have found out that the ORDER BY clause must use id field rather prev_date field:
row_num = ROW_NUMBER() OVER (PARTITION BY id ORDER BY id),
rows_count = COUNT(*) OVER (PARTITION BY id ORDER BY id),
vol_total = SUM(volume) OVER (PARTITION BY id ORDER BY id)
After this change the query's output is as expected.
But! I don't understand why is this so? How come the ordering affects partitioning?
For Aggregate functions generally it is not required to have order in the window definition unless you want to do the aggregation one at a time in an ordered fashion, it is like running total. Simply removing the orders will fix the problem.
If I want to explain it from another way it would be like a window that is expanding row by row as you move on to another row. It is started with the first row, calculate the aggregation with all the rows from before (which in the first row is just the current row!) to the position of row.
if you remove the order, the aggregation will be computed for all the rows in the window definition and no order of applying window will take effect.
You can change the order in window definition to see the effect of it.
Of course, ranking functions need the order and this point is just for the aggregations.
DECLARE #t TABLE
(
id varchar(100),
volume float,
prev_date date
);
INSERT INTO #t VALUES
('0318610084', 100, '2019-05-16'),
('0318610084', 200, '2016-06-04');
SELECT
row_num = ROW_NUMBER() OVER (PARTITION BY id ORDER BY prev_date),
rows_count = COUNT(*) OVER (PARTITION BY id),
vol_total = SUM(volume) OVER (PARTITION BY id),
*
FROM #t;
Enabling order in the window for aggregations added after SqlServer 2012 and it was not part of the first release of the feature in 2005.
For a detailed explanation of the order in window functions on aggregates this is a great help:
Producing a moving average and cumulative total - SqlServer Documentation
We have a data structure with four columns:
ContractoreName, ProjectCode, InvoiceID, OrderID
We want to group the data by both ContractoreName and ProjectCode columns, and then get the InvoiceID of the row for each group with MAX(OrderID).
You could use ROW_NUMBER:
SELECT ContractorName, ProjectName, OrderId, InvoiceId
FROM (SELECT *, ROW_NUMBER() OVER(PARTITION BY ContractorName, ProjectName
ORDER BY OrderId DESC) AS rn
FROM tab
) AS sub
WHERE rn = 1;
ROW_NUMBER() is what I would call the canonical solution. In many cases, an old-fashioned solution has better performance:
select t.*
from t
where t.orderid = (select max(t2.orderid)
from t t2
where t2.contractorname = t.contractorname and
t2.projectname = t.projectname
);
This is especially true if there is an index on (contractorname, projectname, orderid).
Why is this faster? Basically, SQL Server can scan the table doing a lookup in an index. The lookup is really fast because the index is designed for it, so the scan is just a little faster than a full table scan.
When using row_number(), SQL Server has to scan the table to calculate the row number (and that can use the index, so it might be fast). But then it has to go back to the table to fetch the columns and apply the where clause. So, even if it uses an index, it is doing more work.
EDIT:
I should also point out that this can be done without a subquery:
select distinct contractorname, projectname,
max(orderid) over (partition by contractorname, projectname) as lastest_order,
first_value(invoiceid) partition by (order by contractorname, projectname order by orderid desc) as lastest_invoice
from t;
Unfortunately, SQL Server doesn't offer first_value() as an aggregation function, but you can use select distinct and get the same effect.
I tried to use this query to get the ranks of each vendr by their rating
SELECT vendorid, rating, RANK() over(ORDER BY rating DESC)ranking
FROM vendors
but I want to get the ranking of a specific vendor so I put the where clause like this:
SELECT vendorid, rating, RANK() over(ORDER BY rating DESC)ranking
FROM vendors
WHERE vendorid=1
but it returns a value of 1 in ranking even though it is not rank 1.
how should I fix this?
In this case
SELECT
vendorid, rating,
RANK() OVER (ORDER BY rating DESC) ranking
FROM
vendors
WHERE
vendorid = 1
Rank is calculated after where, so after filtering, SQL Server will assign ranks and show rank for whatever values left
How to fix this?
Use subquery or cte like below.
;With cte as
(
SELECT
vendorid, rating,
RANK() OVER (ORDER BY rating DESC) ranking
FROM
YOURTABLE
)
select *
from cte
where vendorid = 1
I'm converting a SQL Server stored procedure to HiveQL.
How can I convert something like:
SELECT
p.FirstName, p.LastName,
RANK() OVER (ORDER BY a.PostalCode) AS Rank
I Have seen this use case a few times, there is a way to do something similar to RANK() in Hive using a UDF.
There are basically a few steps:
Partition the data into groups with DISTRIBUTE BY
Order the data in each group with SORT BY
There is actually a nice article on the topic, and you can also find some code from Edward Capriolo here.
Here is an example query doing a rank in Hive:
ADD JAR p-rank-demo.jar;
CREATE TEMPORARY FUNCTION p_rank AS 'demo.PsuedoRank';
SELECT
category,country,product,sales,rank
FROM (
SELECT
category,country,product,sales,
p_rank(category, country) rank
FROM (
SELECT
category,country,product,
sales
FROM p_rank_demo
DISTRIBUTE BY
category,country
SORT BY
category,country,sales desc) t1) t2
WHERE rank <= 3
Which does the equivalent of the following query in MySQL:
SELECT
category,country,product,sales,rank
FROM (
SELECT
category,country,product, sales,
rank() over (PARTITION BY category, country ORDER BY sales DESC) rank
FROM p_rank_demo) t
WHERE rank <= 3
For anyone who comes across this question, Hive now supports rank() and other analytics functions.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
No big deal
Somehow in Hive 0.13.1 that comma does not work :-(
so I got it working without the comma
(PARTITION BY category country ORDER BY sales DESC)
I have a data set that contains the columns Date, Cat, and QTY. What I want to do is add a unique column that only counts unique Cat values when it does the row count. This is what I want my result set to look like:
By using the SQL query below, I'm able to get row using the row_number() function.
However, I can't get that unique column that I have depicted above. When I add group by to the OVER clause, it does not work. Does anybody have any ideas as how I could get this unique count column to work?
SELECT
Date,
ROW_NUMBER() OVER (PARTITION BY Date ORDER By Date, Cat) as ROW,
Cat,
Qty
FROM SOURCE
Here is a solution.
You need not worry about the ordering of Cat. Using following SQL you will be able to get unique values for your Date & Cat combination.
SELECT
Date,
ROW_NUMBER() OVER (PARTITION BY Date, Cat ORDER By Date, Cat) as ROW,
Cat,
Qty
FROM SOURCE
DENSE_RANK() OVER (PARTITION BY date ORDER BY cat)