Row_Number() CTE Performance when using ORDER BY CASE - sql-server

I have a table I'd like to do paging and ordering on and was able to get a query similar to the following to do the work (the real query is much more involved with joins and such).
WITH NumberedPosts (PostID, RowNum) AS
(
SELECT PostID, ROW_NUMBER() OVER (ORDER BY
CASE WHEN @sortCol = 'User' THEN User END DESC,
CASE WHEN @sortCol = 'Date' THEN Date END DESC,
CASE WHEN @sortCol = 'Email' THEN Email END DESC) as RowNum
FROM Post
)
INSERT INTO #temp(PostID, User, Date, Email)
SELECT Post.PostID, User, Date, Email
FROM Post
JOIN NumberedPosts ON NumberedPosts.PostID = Post.PostID
WHERE NumberedPosts.RowNum BETWEEN @start and (@start + @pageSize)
The trouble is that performance is severely degraded (at least a 10x slowdown) when using the CASE expressions, compared with a plain ORDER BY Date DESC clause. Looking at the query plan, it appears that all columns are still being sorted, even those that do not match the @sortCol qualifier.
Is there a way to get this to execute at near 'native' speed? Is dynamic SQL the best candidate for this problem? Thanks!

There shouldn't be any reason to query the Post table twice. You can go the dynamic route and address the performance issues that way, or create three queries selected by the @sortCol parameter. The code is redundant except for the ROW_NUMBER and ORDER BY parts, but sometimes you give up maintainability when speed is critical.
If @sortCol = 'User'
Begin
Select... Order by User
End
If @sortCol = 'Date'
Begin
Select .... Order by Date
End
If @sortCol = 'Email'
Begin
Select... Order by Email
End

Better to do this with either three hardcoded queries (in appropriate IF statements based on @sortCol) or dynamic SQL.
You might be able to do a trick with a UNION ALL of three different queries (based on a base CTE that does all your JOINs), where only one returns rows for a given @sortCol, but I'd have to profile it before recommending it:
WITH BasePosts(PostID, User, Date, Email) AS (
SELECT PostID, User, Date, Email
FROM Post -- This is your complicated query
)
,NumberedPosts (PostID, User, Date, Email, RowNum) AS
(
SELECT PostID, User, Date, Email, ROW_NUMBER() OVER (ORDER BY User DESC)
FROM BasePosts
WHERE @sortCol = 'User'
UNION ALL
SELECT PostID, User, Date, Email, ROW_NUMBER() OVER (ORDER BY Date DESC)
FROM BasePosts
WHERE @sortCol = 'Date'
UNION ALL
SELECT PostID, User, Date, Email, ROW_NUMBER() OVER (ORDER BY Email DESC)
FROM BasePosts
WHERE @sortCol = 'Email'
)
INSERT INTO #temp(PostID, User, Date, Email)
SELECT PostID, User, Date, Email
FROM NumberedPosts
WHERE NumberedPosts.RowNum BETWEEN @start and (@start + @pageSize)

I would definitely go down the dynamic SQL route (using sp_executesql with parameters to avoid any injection attacks). Using the CASE approach you're immediately stopping SQL Server from using any relevant indexes that would assist in the sorting process.
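To make the dynamic route concrete, here is a minimal sketch in Python against an in-memory SQLite database. SQLite is only a runnable stand-in (an assumption on my part); on SQL Server the same shape would be a string passed to sp_executesql. The column names, sample rows, and the fetch_page helper are hypothetical. The sort column comes from a fixed whitelist and only the paging bounds are bound as parameters, so the ORDER BY ends up as a plain column reference.

```python
import sqlite3

# Hypothetical whitelist: maps the caller's sort key to a real column name.
SORT_COLUMNS = {"User": "UserName", "Date": "PostDate", "Email": "Email"}

def fetch_page(conn, sort_col, start, page_size):
    """Return one page of posts ordered by a whitelisted column, descending."""
    if sort_col not in SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_col!r}")
    column = SORT_COLUMNS[sort_col]  # only whitelisted text reaches the SQL
    sql = f"""
        WITH NumberedPosts AS (
            SELECT PostID, UserName, PostDate, Email,
                   ROW_NUMBER() OVER (ORDER BY {column} DESC) AS RowNum
            FROM Post
        )
        SELECT PostID, UserName, PostDate, Email
        FROM NumberedPosts
        WHERE RowNum BETWEEN ? AND ?
        ORDER BY RowNum
    """
    # Paging bounds are bound parameters, never concatenated into the string.
    return conn.execute(sql, (start, start + page_size - 1)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Post (PostID INTEGER PRIMARY KEY, UserName TEXT, "
    "PostDate TEXT, Email TEXT)"
)
conn.executemany(
    "INSERT INTO Post VALUES (?, ?, ?, ?)",
    [
        (1, "alice", "2020-01-01", "a@example.com"),
        (2, "bob", "2020-03-01", "b@example.com"),
        (3, "carol", "2020-02-01", "c@example.com"),
    ],
)
page = fetch_page(conn, "Date", 1, 2)
print(page)
```

Note that this sketch uses start + page_size - 1 as the upper bound so a page holds exactly page_size rows; BETWEEN start AND start + page_size would return one extra row.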

This should work as long as the three columns share a compatible data type (a single CASE expression returns one type, so mixing a date column with string columns can raise a conversion error), but I'm not sure whether it improves performance:
WITH NumberedPosts (PostID, RowNum) AS
(
SELECT PostID, ROW_NUMBER() OVER (ORDER BY
CASE WHEN @sortCol = 'User' THEN User
WHEN @sortCol = 'Date' THEN Date
WHEN @sortCol = 'Email' THEN Email
END DESC) as RowNum
FROM Post
)
INSERT INTO #temp(PostID, User, Date, Email)
SELECT Post.PostID, User, Date, Email
FROM Post
JOIN NumberedPosts ON NumberedPosts.PostID = Post.PostID
WHERE NumberedPosts.RowNum BETWEEN @start and (@start + @pageSize)

Related

using Row_number() and Partition, then ordering desc, and selecting top result in DB2

I have a Db2 linked server that I'm querying through SQL Server.
select *
from openquery (DO,'
select distinct HOUSE_NUM, NAME, DOB, AGE, row_number()
over(partition by DOB) rownum
from schema.INFO
where HOUSE_NUM = ''332''
group by HOUSE_NUM, NAME, DOB, AGE
order by NAME, rownum desc
limit 1
with ur');
The table has historical records, so there is a row for each age of the individual. I want to select the highest-numbered row in each partition because that gives me their current age. However, when I add limit 1 to select the top result, I get only one row, which ignores all the other people. There are multiple people living in a house and I need all of their ages, not just one. How do I select the top result of each partition in Db2?
[Screenshot: result before applying limit]
[Screenshot: result after applying limit; I need the other names too]
A Db2 query would look like this. The row number cannot be referenced in the same query block that computes it, which is why I used a CTE:
with temp as (
select distinct HOUSE_NUM, NAME, DOB, AGE
, row_number() over(partition by HOUSE_NUM, NAME, DOB order by age desc) as rownum
from schema.INFO
where HOUSE_NUM = '332'
)
select *
from temp
where rownum = 1
Hope this helps. Given the limited information about the data, it is only a best guess.
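The CTE pattern above can be tried out with a small sketch in Python and an in-memory SQLite database (a stand-in I am assuming here, not Db2 itself; the sample rows are made up). It keeps the row with the highest AGE for each person in house 332 instead of a single row overall.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE INFO (HOUSE_NUM TEXT, NAME TEXT, DOB TEXT, AGE INTEGER)"
)
# Hypothetical historical rows: one row per age of each individual.
conn.executemany(
    "INSERT INTO INFO VALUES (?, ?, ?, ?)",
    [
        ("332", "Ann", "1980-05-01", 39),
        ("332", "Ann", "1980-05-01", 40),
        ("332", "Ben", "1975-09-12", 44),
        ("332", "Ben", "1975-09-12", 45),
        ("400", "Cy", "1990-01-01", 30),
    ],
)

rows = conn.execute(
    """
    WITH ranked AS (
        SELECT HOUSE_NUM, NAME, DOB, AGE,
               ROW_NUMBER() OVER (PARTITION BY HOUSE_NUM, NAME, DOB
                                  ORDER BY AGE DESC) AS rn
        FROM INFO
        WHERE HOUSE_NUM = '332'
    )
    -- rn = 1 is the newest (highest AGE) row of each person
    SELECT NAME, AGE FROM ranked WHERE rn = 1 ORDER BY NAME
    """
).fetchall()
print(rows)
```

Filtering on rn = 1 in the outer query is what replaces limit 1: the limit applies to the whole result, while the row number restarts for every partition.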

StackExchange Query Help t-sql

Would anybody be able to help me with this exercise? I am used to querying PostgreSQL, not T-SQL, and I am running into trouble with how some of my data aggregates.
My assignment requires me to:
Create a query that returns the number of comments made on each day for each post from the top 50 most commented on posts in the past year.
For example, the query below gives me a non-aggregated result set:
select cast(creationdate as date),
postid,
count(id)
from comments
where postid = 17654496
group by creationdate, postid
The schema is all here
https://data.stackexchange.com/stackoverflow/query/edit/898297
You can use a CTE to get the count per date, then use the ROW_NUMBER window function to number the rows by count in descending order.
;with CTE as (
select cast(creationdate as date) dt,
postid,
count(id) cnt
from comments
WHERE creationdate between dateadd(year,-1,getdate()) and getdate()
group by cast(creationdate as date), postid
), CTE2 AS (
select *,ROW_NUMBER() OVER (order by cnt desc) rn
from CTE
)
SELECT *
FROM CTE2
WHERE rn <=50
https://data.stackexchange.com/stackoverflow/query/898322/test
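To see why the original query came back non-aggregated, here is a small sketch in Python with an in-memory SQLite database (an assumed stand-in; SQLite's date() plays the role of cast(... as date), and the three sample comments are invented). Grouping by the raw timestamp yields one group per comment, while grouping by the truncated date yields one group per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, postid INTEGER, "
    "creationdate TEXT)"
)
conn.executemany(
    "INSERT INTO comments (postid, creationdate) VALUES (?, ?)",
    [
        (7, "2018-09-01 08:00:00"),
        (7, "2018-09-01 17:30:00"),
        (7, "2018-09-02 09:15:00"),
    ],
)

# Grouping by the full timestamp: every comment is its own group.
by_timestamp = conn.execute(
    """
    SELECT date(creationdate), postid, COUNT(id)
    FROM comments
    GROUP BY creationdate, postid
    ORDER BY creationdate
    """
).fetchall()

# Grouping by the same expression that is selected: one group per day.
by_day = conn.execute(
    """
    SELECT date(creationdate) AS dt, postid, COUNT(id)
    FROM comments
    GROUP BY date(creationdate), postid
    ORDER BY dt
    """
).fetchall()

print(by_timestamp)
print(by_day)
```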

Change where statement to exclude duplicates

I have a stored procedure that I use to import orders into a survey software's database. Currently it imports all the orders from the previous day, but recent changes in the business plan require these surveys to be sent out hourly. The imports come from a view in a different database. attribute_1 is the order number, which will be unique per survey, so we can use that to limit the imports.
How do I change this to not pull in duplicates?
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
Insert INTO SurveyTable
(SurveyTable.firstname,
SurveyTable.token,
SurveyTable.email,
SurveyTable.emailstatus,
SurveyTable.language,
SurveyTable.remindersent,
SurveyTable.attribute_1,
SurveyTable.attribute_2)
select
location,
cast( '' as xml ).value('xs:base64Binary(sql:column( "token" ) )', 'nvarchar(MAX)' ),
email,
emailstatus,
[language],
remindersent,
attribute_1,
attribute_2
from
(
select
RTRIM([Closed_Orders_For_Survey].[Location]) location,
crypt_gen_random(12) as token,
[Closed_Orders_For_Survey].[Email] email,
'OK' emailstatus,
'en' [language],
'N' remindersent,
[Closed_Orders_For_Survey].[Order Number] attribute_1,
CONVERT(VARCHAR(10), [Closed_Orders_For_Survey].[Invoice Date],110) attribute_2
from
[Closed_Orders_For_Survey]
where
[Closed_Orders_For_Survey].[Order Date] >= dateadd(DAY, -1, Convert(date, GETDATE()))
) as x
END
P.S. The token is a generated unique string used in creating the survey URL. We decided not to use the order number as the token because it would be too predictable and would let people change their URL to fill out other users' surveys.
This is not perfectly optimal for very large data sets (you'd want to make sure you have good indexes on attribute_1), but you can do a 'where not exists' clause to filter out any orders from the inner query that already have been inserted:
select
location,
cast( '' as xml ).value('xs:base64Binary(sql:column( "token" ) )', 'nvarchar(MAX)' ),
email,
emailstatus,
[language],
remindersent,
attribute_1,
attribute_2
from
(
select
RTRIM([Closed_Orders_For_Survey].[Location]) location,
crypt_gen_random(12) as token,
[Closed_Orders_For_Survey].[Email] email,
'OK' emailstatus,
'en' [language],
'N' remindersent,
[Closed_Orders_For_Survey].[Order Number] attribute_1,
CONVERT(VARCHAR(10), [Closed_Orders_For_Survey].[Invoice Date],110) attribute_2
from
[Closed_Orders_For_Survey]
where
[Closed_Orders_For_Survey].[Order Date] >= dateadd(DAY, -1, Convert(date, GETDATE()))
) as x
where
not exists (
select attribute_1
from SurveyTable
where
attribute_1 = x.attribute_1
)
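A minimal sketch of how the NOT EXISTS filter makes the import safe to rerun, using Python and an in-memory SQLite database as an assumed stand-in for SQL Server. The ClosedOrders table, its columns, the index name, and the run_import helper are hypothetical simplifications of the view and procedure above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ClosedOrders (OrderNumber TEXT, Email TEXT);
    CREATE TABLE SurveyTable (attribute_1 TEXT, email TEXT);
    -- Index on the lookup column, as the answer recommends.
    CREATE INDEX ix_survey_attr1 ON SurveyTable (attribute_1);
    INSERT INTO ClosedOrders VALUES ('A1', 'x@example.com');
    INSERT INTO ClosedOrders VALUES ('A2', 'y@example.com');
    """
)

IMPORT_SQL = """
    INSERT INTO SurveyTable (attribute_1, email)
    SELECT o.OrderNumber, o.Email
    FROM ClosedOrders AS o
    WHERE NOT EXISTS (
        SELECT 1 FROM SurveyTable AS s
        WHERE s.attribute_1 = o.OrderNumber
    )
"""

def run_import(conn):
    """Import orders not yet present; return the total survey row count."""
    conn.execute(IMPORT_SQL)
    return conn.execute("SELECT COUNT(*) FROM SurveyTable").fetchone()[0]

first = run_import(conn)    # both existing orders are imported
conn.execute("INSERT INTO ClosedOrders VALUES ('A3', 'z@example.com')")
second = run_import(conn)   # only the new order A3 is added
third = run_import(conn)    # nothing new, so nothing is inserted
print(first, second, third)
```

Because already-imported order numbers are skipped, the hourly job can keep using the same one-day window without creating duplicate surveys.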
This might seem silly, but if the token is a primary key, then the attempt to insert a duplicate token will simply fail. Trap that specific error in your application (and vigorously ignore it) and you're golden.
If it must be all one transaction, you can always add a "and token not in (select token from table)" subselect (potentially inefficient, but should work).

SQL Query Distinct two columns, Max(date) and retrieve ID

I'm having trouble figuring out how to make this query work. I've tried everything under the sun to avoid looping.
The table has ID (pk), UserID, BookID, BookDate (datetime), and SellerID. There are duplicate combinations of UserID and BookID.
I am trying to retrieve distinct records by UserID and BookID that have the most recent BookDate. That's easy enough (below), but I also need to retrieve the ID and SellerID columns for the returned record. That's where I'm having trouble...
Select Distinct
UserID, CourseID, MAX(AssignedON)
From
AssignmentS
Group By
UserID, CourseID
Every time I add a join I get all records. I've tried rowover, exists and nothing seems to work. Any help would be greatly appreciated!
select userid,courseid,bookdate,sellerid from
(select userid,courseid,bookdate,sellerid,
row_number() over (partition by userid,courseid
order by bookdate desc) as RNUM
from yourtable where yourwhere) t
where rnum = 1;
This blog post describes in detail how to do this with multiple tables: http://coding.feron.it/2012/08/mssql-having-maxid-id-problem-row_number-partition/
Figured it out. I just had to move things around a bit and this is working perfectly!
select userid,courseid,bookdate,sellerid from (
select *, row_number() over(partition by userid,courseid order by bookdate desc) as RNUM
from yourtable where yourwhere) t
where rnum = 1;

SQL Server over query

My query returns the latest entry for each userid, but I need it to return the latest entry for each userid and taskname. I tried to use group by, but I am getting an error. Is there something I'm doing wrong? Thanks!
SELECT UserId, TaskName, First, Last, email, ValueDate, Analysis
FROM (SELECT UserId, TaskName, First, Last,
email, ValueDate, Analysis,
ROW_NUMBER() OVER(PARTITION BY UserID
ORDER BY ValueDate DESC) AS rk
FROM MyTable) AS L
WHERE rk = 1
You should replace PARTITION BY UserID with PARTITION BY UserID, TaskName :
SELECT UserId, TaskName, First, Last, email, ValueDate, Analysis
FROM (SELECT UserId, TaskName, First, Last,
email, ValueDate, Analysis,
ROW_NUMBER() OVER(PARTITION BY UserID, TaskName
ORDER BY ValueDate DESC) AS rk
FROM MyTable) AS L
WHERE rk = 1
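The effect of the extra partition column can be checked with a short sketch in Python and an in-memory SQLite database (an assumed stand-in for SQL Server; the sample rows and the latest helper are hypothetical, and the table is trimmed to three columns).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (UserId INTEGER, TaskName TEXT, ValueDate TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        (1, "build", "2021-01-01"),
        (1, "build", "2021-02-01"),
        (1, "test", "2021-01-15"),
        (2, "build", "2021-01-10"),
    ],
)

def latest(partition_cols):
    """Latest row per partition; partition_cols is a fixed string, not user input."""
    sql = f"""
        SELECT UserId, TaskName, ValueDate
        FROM (SELECT UserId, TaskName, ValueDate,
                     ROW_NUMBER() OVER (PARTITION BY {partition_cols}
                                        ORDER BY ValueDate DESC) AS rk
              FROM MyTable) AS L
        WHERE rk = 1
        ORDER BY UserId, TaskName
    """
    return conn.execute(sql).fetchall()

per_user = latest("UserId")                 # one row per user
per_user_task = latest("UserId, TaskName")  # one row per user and task
print(per_user)
print(per_user_task)
```

With PARTITION BY UserId alone, user 1's "test" entry is dropped because the "build" entry is newer; adding TaskName restarts the numbering for each task, so both survive.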
Without some data to test and the table structure, I can only presume what you really want to achieve, but it could be something that looks like this:
SELECT UserId, TaskName, First, Last, email, max(ValueDate), Analysis
FROM MyTable
GROUP BY UserId, TaskName, First, Last, email, Analysis
ORDER BY MAX(ValueDate) DESC

Resources