SQL Server Pagination w/o row_number() or nested subqueries? - sql-server

I have been fighting with this all weekend and am out of ideas. In order to have pages in my search results on my website, I need to return a subset of rows from a SQL Server 2005 Express database (i.e. start at row 20 and give me the next 20 records). In MySQL you would use the "LIMIT" keyword to choose which row to start at and how many rows to return.
In SQL Server I found ROW_NUMBER()/OVER, but when I try to use it it says "Over not supported". I am thinking this is because I am using SQL Server 2005 Express (free version). Can anyone verify if this is true or if there is some other reason an OVER clause would not be supported?
Then I found the old school version similar to:
SELECT TOP X * FROM TABLE WHERE ID NOT IN (SELECT TOP Y ID FROM TABLE ORDER BY ID) ORDER BY ID where X=number per page and Y=which record to start on.
However, my queries are a lot more complex with many outer joins and sometimes ordering by something other than what is in the main table. For example, if someone chooses to order by how many videos a user has posted, the query might need to look like this:
SELECT TOP 50 iUserID, iVideoCount FROM MyTable LEFT OUTER JOIN (SELECT count(iVideoID) AS iVideoCount, iUserID FROM VideoTable GROUP BY iUserID) as TempVidTable ON MyTable.iUserID = TempVidTable.iUserID WHERE iUserID NOT IN (SELECT TOP 100 iUserID, iVideoCount FROM MyTable LEFT OUTER JOIN (SELECT count(iVideoID) AS iVideoCount, iUserID FROM VideoTable GROUP BY iUserID) as TempVidTable ON MyTable.iUserID = TempVidTable.iUserID ORDER BY iVideoCount) ORDER BY iVideoCount
The issue is in the subquery SELECT line: TOP 100 iUserID, iVideoCount
To use the "NOT IN" clause it seems I can only have 1 column in the subquery ("SELECT TOP 100 iUserID FROM ..."). But when I don't include iVideoCount in that subquery SELECT statement then the ORDER BY iVideoCount in the subquery doesn't order correctly so my subquery is ordered differently than my parent query, making this whole thing useless. There are about 5 more tables linked in with outer joins that can play a part in the ordering.
I am at a loss! The two above methods are the only two ways I can find to get SQL Server to return a subset of rows. I am about ready to return the whole result and loop through each record in PHP but only display the ones I want. That is such an inefficient way to things it is really my last resort.
Any ideas on how I can make SQL Server mimic MySQL's LIMIT clause in the above scenario?

Unfortunately, although SQL Server 2005 Row_Number() can be used for paging and with SQL Server 2012 data paging support is enhanced with Order By Offset and Fetch Next, in case you can not use any of these solutions you require to first
create a temp table with identity column.
then insert data into temp table with ORDER BY clause
Use the temp table Identity column value just like the ROW_NUMBER() value
I hope it helps,

Related

Add computed column using subquery

In SQL Server 2000, I want to add a computed column which basically is MAX(column1).
Of course I get an error because subqueries are not allowed.
What I basically try to do is to get the max(dateandtime) of a number of tables of my database.
However when I run my code it takes too long because it's a very old and badly designed database with no keys and indexes.
So, I believe that by adding a new computed column which is the max(datetime), I will do my query much much faster because I will query
(SELECT TOP 1 newcomputedcolumn FROM Mytable)
and I will not have to do
(SELECT TOP 1 dateandtime FROM Mytable
ORDER BY dateandtime DESC)
or
(SELECT MAX(dateandtime) FROM Mytable)
which takes too long.
Any ideas? Thanks a lot.

Index for using IN clause in where condition

My application access data from a table in SQL Server. Consider the table name is PurchaseDetail with some other columns.
The select query has below where clauses.
1. name - name has 10000 values only.
2. createdDateTime
The actual query is
select *
from PurchaseDetail
where name in (~2000 name)
and createdDateTime = 'someDateValue';
The SQL tuning advisor gave some recommendation. I tried with those recommended indexes. The performance increased a bit but not completely.
Is there any wrong in my query? or Is there any possible to change/improve my select query?
Because I didn't use IN in where clause before. My table having more than 100 million records.
Any suggestion please?
In this case using IN for that much data is not good at all.
this best way is to use INNER JOIN instead.
It would be nicer if insert those names into a temp table and INNER JOIN it with your SELECT query.

SQL Server : group all data by one column

I need some help in writing a SQL Server stored procedure. All data group by Train_B_N.
my table data
Expected result :
expecting output
with CTE as
(
select Train_B_N, Duration,Date,Trainer,Train_code,Training_Program
from Train_M
group by Train_B_N
)
select
*
from Train_M as m
join CTE as c on c.Train_B_N = m.Train_B_N
whats wrong with my query?
The GROUP BY smashes the table together, so having columns that are not GROUPED combine would cause problems with the data.
select Train_B_N, Duration,Date,Trainer,Train_code,Training_Program
from Train_M
group by Train_B_N
By ANSI standard, the GROUP BY must include all columns that are in the SELECT statement which are not in an aggregate function. No exceptions.
WITH CTE AS (SELECT TRAIN_B_N, MAX(DATE) AS Last_Date
FROM TRAIN_M
GROUP BY TRAIN_B_N)
SELECT A.Train_B_N, Duration, Date,Trainer,Train_code,Training_Program
FROM TRAIN_M AS A
INNER JOIN CTE ON CTE.Train_B_N = A.Train_B_N
AND CTE.Last_Date = A.Date
This example would return the last training program, trainer, train_code used by that ID.
This is accomplished from MAX(DATE) aggregate function, which kept the greatest (latest) DATE in the table. And since the GROUP BY smashed the rows to their distinct groupings, the JOIN only returns a subset of the table's results.
Keep in mind that SQL will return #table_rows X #Matching_rows, and if your #Matching_rows cardinality is greater than one, you will get extra rows.
Look up GROUP BY - MSDN. I suggest you read everything outside the syntax examples initially and obsorb what the purpose of the clause is.
Also, next time, try googling your problem like this: 'GROUP BY, SQL' or insert the error code given by your IDE (SSMS or otherwise). You need to understand why things work...and SO is here to help, not be your google search engine. ;)
Hope you find this begins your interest in learning all about SQL. :D

SQL Server: order of returned rows when using IN clause

Running the following query returns 4 rows. As I can see in SSMS the order of returned rows is the same as I specified in the IN clause.
SELECT * FROM Table WHERE ID IN (4,3,2,1)
Can I say that the order of returned rows are ALWAYS the same as they appear in the IN clause?
If yes then is it true, that the following two queries return the rows in the same order? (as I've tested the orders are the same, but I don't know if I can trust this behavior)
SELECT TOP 10 * FROM Table ORDER BY LastModification DESC
SELECT * FROM Table WHERE ID IN (SELECT TOP 10 ID FROM Table ORDER BY LastModification DESC)
I ask this question because I have a quite complex select query. Using this trick over it brings me ca. 30% performance gain, in my case.
You cannot guarantee the records to be in any particular order unless you use ORDER BY clause. You may use some tricks that may work some of the time but they won't give you guarantee of the order.

What are the differences between the older row_number() and the newer OFFSET + FETCH based pagination in SQL Server?

I have few questions in context of the older row_number (SQL Server 2008) and the newer OFFSET + FETCH (SQL Server 2012) paging mechanism provided by SQL Server 2012.
What are the limitations with row_number()?
Is OFFSET + FETCH an improved replacement for row_number()?
Are there any use-cases which could only be sufficed using one and not the other?
Are there any performance differences between the two? If yes, which one is recommended?
Thanks.
Using ROW_NUMBER() works fine - it's just more work than necessary; you need to write a "skeleton" CTE around your actual query, add the ROW_NUMBER() column to your output set, and then filter on that.
Using the new OFFSET / FETCH is simpler - and yes, it's also better for performance, as these two links can show you:
New T-SQL features in SQL Server 2012
Comparing performance for different SQL Server paging
So overall: if you're using SQL Server 2012 - then you should definitely use OFFSET/FETCH rather than ROW_NUMBER() for paging
By definition ROW_NUMBER is a temporary value calculated when the query is run. OFFSET / FETCH is an option that you can specify for the ORDER BY clause.
In terms of speed, they both achieve great performance and the difference between each method depends on the columns that you specify in the SELECT clause and the Indexes that you have on your tables.
In the following 2 examples, you can see a difference between the two methods:
1. Case when OFFSET / FETCH is faster:
SELECT
Id
FROM Orders
ORDER BY
Id
OFFSET 50000 ROWS FETCH NEXT 5000 ROWS ONLY
SELECT
A.Id
FROM
(
SELECT
Id,
ROW_NUMBER() OVER (ORDER BY Id) AS RowNumber FROM Orders
) AS A
WHERE
A.RowNumber BETWEEN 50001 AND 55000
2. Case when ROW_NUMBER() is faster:
SELECT
*
FROM Orders
ORDER BY
Id
OFFSET 50000 ROWS FETCH NEXT 5000 ROWS ONLY
SELECT
A.*
FROM
(
SELECT
*,
ROW_NUMBER() OVER (ORDER BY Id) AS RowNumber FROM Orders
) AS A
WHERE
A.RowNumber BETWEEN 50001 AND 55000
I found this article which presents 3 techniques of paging compares their results in a chart. Might be helpful in deciding which approach you want to follow.
Paging Function Performance in SQL Server 2012
All of the paging methods discussed work fine for smaller amounts of records, but not so much for larger quantities of data.

Resources