Which paging method (Sql Server 2008) for BEST performance? - sql-server

In Sql Server 2008, many options are available for database paging via stored procedure. For example, see here and here.
OPTIONS:
ROW_NUMBER() function
ROWCOUNT
CURSORS
temporary tables
Nested SQL queries
OTHERS
Paging using ROW_NUMBER() is known to have performance issues:
Please advise, which paging method has the best performance (for large tables with JOINs) ?
Please also provide links to relevant article(s), if possible.
Thank You.

One question you have to answer is if you want to display the total number of rows to the end user. To calculate the number of the last page, you also need the last row number.
If you can do without that information, a temporary table is a good option. You can select the pirmary key and use LIMIT to retrieve keys up to the key you're interested in. If you do this right, the typical use case will only retrieve the first few pages.
If you need the last page number, you can use ROW_NUMBER(). Using a temporary table won't be much faster because you can't use the LIMIT clause, making this strategy the equivalent of a ROW_NUMBER() calculation.

We can get a rowcount using following query.
WITH data AS
(
SELECT ROW_NUMBER() OVER (order by memberid ) AS rowid, memberid
FROM Customer
)
SELECT *, (select count(*) from data) AS TotalCount
FROM data
WHERE rowid > 20 AND rowid <= 30

Related

how to select first rows distinct by a column name in a sub-query in sql-server?

Actually I am building a Skype like tool wherein I have to show last 10 distinct users who have logged in my web application.
I have maintained a table in sql-server where there is one field called last_active_time. So, my requirement is to sort the table by last_active_time and show all the columns of last 10 distinct users.
There is another field called WWID which uniquely identifies a user.
I am able to find the distinct WWID but not able to select the all the columns of those rows.
I am using below query for finding the distinct wwid :
select distinct(wwid) from(select top 100 * from dbo.rvpvisitors where last_active_time!='' order by last_active_time DESC) as newView;
But how do I find those distinct rows. I want to show how much time they are away fromm web apps using the diff between curr time and last active time.
I am new to sql, may be the question is naive, but struggling to get it right.
If you are using proper data types for your columns you won't need a subquery to get that result, the following query should do the trick
SELECT TOP 10
[wwid]
,MAX([last_active_time]) AS [last_active_time]
FROM [dbo].[rvpvisitors]
WHERE
[last_active_time] != ''
GROUP BY
[wwid]
ORDER BY
[last_active_time] DESC
If the column [last_active_time] is of type varchar/nvarchar (which probably is the case since you check for empty strings in the WHERE statement) you might need to use CAST or CONVERT to treat it as an actual date, and be able to use function like MIN/MAX on it.
In general I would suggest you to use proper data types for your column, if you have dates or timestamps data use the "date" or "datetime2" data types
Edit:
The query aggregates the data based on the column [wwid], and for each returns the maximum [last_active_time].
The result is then sorted and filtered.
In order to add more columns "as-is" (without aggregating them) just add them in the SELECT and GROUP BY sections.
If you need more aggregated columns add them in the SELECT with the appropriate aggregation function (MIN/MAX/SUM/etc)
I suggest you have a look at GROUP BY on W3
To know more about the "execution order" of the instruction you can have a look here
You can solve problem like this by rank ordering the results by a key and finding the last x of those items, this removes duplicates while preserving the key order.
;
WITH RankOrdered AS
(
SELECT
*,
wwidRank = ROW_NUMBER() OVER (PARTITION BY wwid ORDER BY last_active_time DESC )
FROM
dbo.rvpvisitors
where
last_active_time!=''
)
SELECT TOP(10) * FROM RankOrdered WHERE wwidRank = 1
If my understanding is right, below query will give the desired output.
You can have conditions according to your need.
select top 10 distinct wwid from dbo.rvpvisitors order by last_active_time desc

MAX keyword taking a lot of time to select a value from a column

Well, I have a table which is 40,000,000+ records but when I try to execute a simple query, it takes ~3 min to finish execution. Since I am using the same query in my c# solution, which it needs to execute over 100+ times, the overall performance of the solution is deeply hit.
This is the query that I am using in a proc
DECLARE #Id bigint
SELECT #Id = MAX(ExecutionID) from ExecutionLog where TestID=50881
select #Id
Any help to improve the performance would be great. Thanks.
What indexes do you have on the table? It sounds like you don't have anything even close to useful for this particular query, so I'd suggest trying to do:
CREATE INDEX IX_ExecutionLog_TestID ON ExecutionLog (TestID, ExecutionID)
...at the very least. Your query is filtering by TestID, so this needs to be the primary column in the composite index: if you have no indexes on TestID, then SQL Server will resort to scanning the entire table in order to find rows where TestID = 50881.
It may help to think of indexes on SQL tables in the same way as those you'd find in the back of a big book that are hierarchial and multi-level. If you were looking for something, then you'd manually look under 'T' for TestID then there'd be a sub-heading under TestID for ExecutionID. Without an index entry for TestID, you'd have to read through the entire book looking for TestID, then see if there's a mention of ExecutionID with it. This is effectively what SQL Server has to do.
If you don't have any indexes, then you'll find it useful to review all the queries that hit the table, and ensure that one of those indexes is a clustered index (rather than non-clustered).
Try to re-work everything into something that works in a set based manner.
So, for instance, you could write a select statement like this:
;With OrderedLogs as (
Select ExecutionID,TestID,
ROW_NUMBER() OVER (PARTITION BY TestID ORDER By ExecutionID desc) as rn
from ExecutionLog
)
select * from OrderedLogs where rn = 1 and TestID in (50881, 50882, 50883)
This would then find the maximum ExecutionID for 3 different tests simultaneously.
You might need to store that result in a table variable/temp table, but hopefully, instead, you can continue building up a larger, single, query, that processes all of the results in parallel.
This is the sort of processing that SQL is meant to be good at - don't cripple the system by iterating through the TestIDs in your code.
If you need to pass many test IDs into a stored procedure for this sort of query, look at Table Valued Parameters.

SQL 2005 Huge table data paging with filter

I have a table in sql 2005 with a big count of data - smth like 1 500 000 rows right now and later it should be more. Before paging I need to detect what rows the user can read (sql query for checking is a heavy which refer to several other tables) and the result should be paged.
What the best practice to work with the huge table that should be filtered and paged after all?
Thanks in advance!
If you want to return paginated results in SQL Server your best bet is probably to use the ROW_NUMBER() function. Here is an example that would get you the 400th-410th results:
SELECT ID, Name, Date
FROM (SELECT TOP 410 ROW_NUMBER() OVER (ORDER BY id)
AS Row, ID, Name, Date FROM MyTable)
AS MyPagedTable
WHERE Row >= 400 AND Row <= 410
Make sure you have the proper indexes in place. If you are getting performance issues then I would recommend looking at the execution plan and seeing where the problem areas are.

Implement Oracle Paging For ANY Query?

I've found a lot of examples of paging in Oracle. The particular one I'm using now looks a like this:
SELECT * FROM (
SELECT a.*, ROWNUM RNUM FROM (
**Select * From SomeTable**) a
WHERE ROWNUM <= 500) b
WHERE b.RNUM >= 1
The line in bold represents the 'original' query. The rest of the SQL there is to implement the paging. The problem I'm running into is a query that is perfectly valid by itself; will fail when I place it inside of my paging code.
As an example - this query will fail:
SELECT TABLE1.*, TABLE1.SomeValue FROM TABLE1
With a ambiguous column error. But, without my extra code; it will run just fine. I have a large number of 'saved' queries, but I have to ensure that my paging solution doesn't invalidate them.
I've used SQL Developer as my Oracle querying tool and it manages to implement paging that works even with the above query that fails when I wrap it in the paging code. Can anyone tell me how they manage to pull it off?
First off, the original query would need to have an ORDER BY clause in order to make the paging solution work reasonably. Otherwise, it would be perfectly valid for Oracle to return the same 500 rows for the first page, the second page, and the Nth page.
SQL Developer is not changing your query to implement paging. It is simply sending the full query to Oracle and paging the results itself using JDBC. The JDBC client application can specify a fetch size which controls how many rows are returned from the database to the client at a time. The client application can then wait for the user to either decide to go to the next page or to do something else in which case the cursor is closed.
Whether the SQL Developer approach makes sense depends heavily on the architecture of your application. If you're trying to page data in a stateless web application, it probably doesn't work because you're not going to hold a database session open across multiple page requests. On the other hand, if you've got a fat client application with a dedicated Oracle database connection, it's quite reasonable.
First of all,
What's the point in doing
SELECT TABLE1.*, TABLE1.someValue from TABLE1
Wouldn't TABLE1.* automatically select "someValue", so why query it redundantly ?
Secondly for pagination, try the analytical query approach
SELECT * FROM {
SELECT col1, col2, col3
, row_number() OVER (order by col1) position
FROM TABLE1
} WHERE rn >= p_seek and rn < (p_seek+p_count)
p_seek is the starting position and p_count is the number of rows to fetch.
Here instead of col1, col2, col3, etc you can do TABLE1.*, TABLE1.someValue etc.

Microsoft SQL Server Paging

There are a number of sql server paging questions on stackoverflow and many of them talk about using ROW_NUMBER() OVER (ORDER BY ...) AND CTE. Once you get into the hundreds of thousands of rows and start adding sorting on non-primary key values and adding custom WHERE clauses, these methods become very inneficient. I have a dataset of several million rows I am trying to page through with custom sorting and filtering, but I am getting poor performance, even with indexes on all the fields that I sort by and filter by. I even went as far as to include my SELECT columns in each of the indexes, but this barely helped and severely bloated my database.
I noticed the stackoverflow paging only takes about 500 milliseconds no matter what sorting criteria or page number you click on. Anyone know how to make paging work efficiently in SQL Server 2008 with millions of rows? This would include getting the total rows as efficiently as possible.
My current query has the exact same logic as this stackoverflow question about paging:
Best paging solution using SQL Server 2005?
Anyone know how to make paging work efficiently in SQL Server 2008 with millions of rows?
If you want accurate perfect paging, there is no substitute for building an index key (position row number) for each record. However, there are alternatives.
(1) total number of pages (records)
You can use an approximation from sysindexes.rows (almost instant) assuming the rate of change is small.
You can use triggers to maintain a completely accurate, to the second, table row count
(2) paging
(a)
You can show page jumps within say the next five pages to either side of a record. These need to scan at most {page size} x 5 on each side. If your underlying query lends itself to travelling along the sort order quickly, this should not be slow. So given a record X, you can go to the previous page using (assuming sort order is a asc, b desc
select top(#pagesize) t.*
from tbl x
inner join tbl t on (t.a = x.a and t.b > x.b) OR
(t.a < a.x)
where x.id = #X
order by t.a asc, t.b desc
(i.e. the last {page size} of records prior to X)
To go five pages back, you increase it to TOP(#pagesize*5) then further TOP(#pagesize) from that subquery.
Downside: This option requires that you cannot directly jump to a particular location, your options are only FIRST (easy), LAST (easy), NEXT/PRIOR, <5 pages either side
(b)
If the paging is always going to be quite specific and predictable, maintain an INDEXED view or trigger-updated table that does not contain gaps in the row number. This may be an option if the tables normally only see updates at one end of the spectrum, with gaps from deletes easily filled quickly by shifting not-so-many records.
This approach gives you a rowcount (last row) and also direct access to any page.
try this, let say you have country table as below:
DECLARE #pageIndex INT=0;
DECLARE #pageSize INT= 10;
DECLARE #sortByColumn NVARCHAR(200)='Code';
DECLARE #sortByDesc BIT=0;
;WITH tbl AS (
SELECT COUNT(id) OVER() [RowTotal], c.Id, c.Code, c.Name
FROM dbo.[Country] c
ORDER BY
CASE WHEN #sortByColumn='Code' AND #sortByDesc=0 THEN c.Code END ASC,
CASE WHEN #sortByColumn='Code' AND #sortByDesc<>0 THEN c.Code END DESC,
CASE WHEN #sortByColumn='Name' AND #sortByDesc=0 THEN c.Name END ASC,
CASE WHEN #sortByColumn='Name' AND #sortByDesc<>0 THEN c.Name END DESC,
,c.Name ASC --DEFAULT SORTING ORDER
OFFSET #PageIndex*#pageSize ROWS
FETCH NEXT #pageSize ROWS ONLY
) SELECT (#PageIndex*#pageSize)+(ROW_NUMBER() OVER(ORDER BY Id))[RowNo],* from tbl;

Resources