Select next 50000 entries - sql-server

I selected the first 50,000 rows with TOP 50,000, now I want to get the next group of 50,000. I can't use ROW_NUMBER since the entries do not have an id. I use SQL Server 2012 with the help of SQL Server Management Studio.
How can I get the entries that come after the first 50,000?

With SQL Server 2012, you get the new OFFSET and FETCH NEXT keywords on the ORDER BY clause.
So you can select the first 50'000 rows with:
SELECT (list of columns)
FROM dbo.YourTable
WHERE (some condition)
ORDER BY (some column)
OFFSET 0 ROWS FETCH NEXT 50000 ROWS ONLY
and then the next 50'000 later with:
SELECT (list of columns)
FROM dbo.YourTable
WHERE (some condition)
ORDER BY (some column)
OFFSET 50000 ROWS FETCH NEXT 50000 ROWS ONLY
(just a side-note: a page size of 50'000 rows seems overly large - how about 1'000 or something?)
See Using the new OFFSET and FETCH NEXT options for more details and background info
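Since the two queries above differ only in the offset, the page can be parameterized. This is a minimal sketch, assuming SQL Server 2012 or later. The variables `@PageNumber` and `@PageSize` and the columns `Id` and `Name` are hypothetical names for illustration, and `Id` is assumed to be unique so that pages do not overlap:

```sql
DECLARE @PageNumber INT = 2;      -- 1-based page to fetch (hypothetical parameter)
DECLARE @PageSize   INT = 50000;  -- rows per page (hypothetical parameter)

SELECT Id, Name                   -- placeholder column list
FROM dbo.YourTable
ORDER BY Id                       -- assumed unique, so the ordering is stable
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;
```

With `@PageNumber = 1` the offset is 0, which matches the first query above; with `@PageNumber = 2` it is 50000, which matches the second.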

If you have some way to assure a consistent order of the results I can't see why you can't use row_number with a common table expression like this:
;WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY something_unique DESC) AS 'Row'
FROM my_table
)
-- To get the first 50k rows:
SELECT * FROM CTE WHERE Row BETWEEN 1 AND 50000
-- To get the next 50k rows:
--SELECT * FROM CTE WHERE Row BETWEEN 50001 AND 100000
It's not very efficient though, as you construct the row number over the whole table even though you only need 50k rows. If possible, you might want to consider adding an extra column to the source table to hold some artificial numbering.
As noted in the comments just using TOP X without ordering the results first will render inconsistent results as order isn't guaranteed if not explicitly specified.

Related

SQL Server paging using limit and offset with non-unique ordering column

I would like to implement paging using SQL Server limit and offset as shown below. My question is: will SQL Server guarantee the correct data page when the data is ordered by a non-unique column?
SELECT * FROM TableName
ORDER BY NonUniqueColumn
OFFSET 10 ROWS
FETCH NEXT 20 ROWS ONLY
OPTION (recompile)
SELECT * FROM TableName
ORDER BY NonUniqueColumn
OFFSET 20 ROWS
FETCH NEXT 10 ROWS ONLY
OPTION (recompile)
These two selects overlap. Will the second select (second page) contain the last ten rows from the first select?
Will SQL Server guarantee the correct data page when data is ordered by a non-unique column?
No. Add the primary key columns to the ORDER BY to guarantee a stable ordering. E.g.:
SELECT * FROM TableName
ORDER BY NonUniqueColumn, Id
OFFSET 10 ROWS
FETCH NEXT 20 ROWS ONLY
OPTION (recompile)

How to SELECT LIMIT in ASE 12.5? LIMIT 10, 10 gives syntax error?

How can I LIMIT the result returned by a query in Adaptive Server IQ/12.5.0/0306?
The following gives me a generic error near LIMIT:
SELECT * FROM mytable LIMIT 10, 10;
Any idea why? This is my first time with this dbms
Sybase IQ uses row_count to limit the number of rows returned.
You will have to set the row count at the beginning of your statement and decide whether the statement should be TEMPORARY or not.
ROW_COUNT
SET options
The LIMIT clause is not supported in Sybase IQ 12. I don't think there is a simple or clean solution, much as in older versions of SQL Server. But there are some approaches that work for SQL Server 2000 and should also work for Sybase IQ 12. I can't promise that the queries below will work as copied and pasted.
Subquery
SELECT TOP 10 *
FROM mytable
WHERE Id NOT IN (
SELECT TOP 10 Id FROM mytable ORDER BY OrderingColumn
)
ORDER BY OrderingColumn
Basically, this fetches 10 rows while skipping the first 10. For this to work, Id must be unique and the ordering matters: if the same Id appears more than once, the NOT IN filter can remove valid rows.
Asc-Desc
Another workaround depends on the ordering. It sorts twice to fetch the 10 rows of the second page. You have to take care of the last page, where the simple formula page * rows per page does not work properly.
SELECT *
FROM
(
SELECT TOP 10 *
FROM
(
SELECT TOP 20 * -- (page * rows per page)
FROM mytable
ORDER BY Id
) AS t1
ORDER BY Id DESC
) AS t2
ORDER BY Id ASC
I've found some reports that subqueries in the FROM clause do not work in ASE 12, so this approach may not be possible.
Basic iteration
In this scenario you just iterate through the rows. Let's assume the Id of the tenth row is 15. The query below then selects the next 10 rows after the tenth row. This does not work if you order by a column other than Id.
SELECT TOP 10 *
FROM mytable
WHERE Id > 15
ORDER BY Id
Here is an article about other workarounds in SQL Server 2000. Some should also work in a similar way in Sybase IQ 12.
http://www.codeproject.com/Articles/6936/Paging-of-Large-Resultsets-in-ASP-NET
All of these are workarounds. If you can, try to migrate to a newer version.

SQL Server 2005 SELECT TOP 1 from VIEW returns LAST row

I have a view that may contain more than one row, looking like this:
[rate] | [vendorID]
8374 1234
6523 4321
5234 9374
In a SPROC, I need to set a param equal to the value of the first column from the first row of the view. something like this:
DECLARE @rate int;
SET @rate = (select top 1 rate from vendor_view where vendorID = 123)
SELECT @rate
But this ALWAYS returns the LAST row of the view.
In fact, if I simply run the subselect by itself, I only get the last row.
With 3 rows in the view, TOP 2 returns the FIRST and THIRD rows in order. With 4 rows, it's returning the top 3 in order. Yet still top 1 is returning the last.
DERP?!?
This works..
DECLARE @rate int;
CREATE TABLE #temp (vRate int)
INSERT INTO #temp (vRate) (select rate from vendor_view where vendorID = 123)
SET @rate = (select top 1 vRate from #temp)
SELECT @rate
DROP TABLE #temp
.. but can someone tell me why the first behaves so fudgely and how to do what I want? As explained in the comments, there is no meaningful column by which I can do an order by. Can I force the order in which rows are inserted to be the order in which they are returned?
[EDIT] I've also noticed that: select top 1 rate from ([view definition select]) also returns the correct values time and again.[/EDIT]
That is by design.
If you don't specify how the query should be sorted, the database is free to return the records in any order that is convenient. There is no natural order for a table that is used as default sort order.
What the order will actually be depends on how the query is planned, so you can't even rely on the same query giving a consistent result over time, as the database will gather statistics about the data and may change how the query is planned based on that.
To get the record that you expect, you simply have to specify how you want them sorted, for example:
select top 1 rate
from vendor_view
where vendorID = 123
order by rate
I ran into this problem on a query that had worked for years. We upgraded SQL Server and all of a sudden, an unordered select top 1 was not returning the final record in a table. We simply added an order by to the select.
My understanding is that SQL Server will generally return results in clustered index order if no ORDER BY is provided, or in the order of whatever index the engine picks. But this is not a guarantee of a certain order.
If you don't have something to order off of, you need to add it. Either add a date inserted column and default it to GETDATE() or add an identity column. It won't help you historically, but it addresses the issue going forward.
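As a rough sketch of that suggestion, the statements below add both kinds of column. The table `dbo.VendorRates` (standing in for the base table behind the view), the column names `RowId` and `InsertedAt`, and the constraint name are all hypothetical. The `GO` lines are batch separators for SSMS or sqlcmd; they are needed because a batch cannot reference a column that is added earlier in the same batch.

```sql
-- dbo.VendorRates is a hypothetical base table behind vendor_view.
-- Existing rows receive identity values in no particular order.
ALTER TABLE dbo.VendorRates
    ADD RowId INT IDENTITY(1, 1) NOT NULL;
GO

-- Alternative: record the insert time; existing rows get the current time.
ALTER TABLE dbo.VendorRates
    ADD InsertedAt DATETIME NOT NULL
        CONSTRAINT DF_VendorRates_InsertedAt DEFAULT GETDATE();
GO

-- New rows can now be ordered deterministically.
SELECT TOP 1 rate
FROM dbo.VendorRates
WHERE vendorID = 123
ORDER BY RowId;
```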
While it doesn't necessarily make sense that the results of the query should be consistent, in this particular instance they are so we decided to leave it 'as is'. Ultimately it would be best to add a column, but this was not an option. The application this belongs to is slated to be discontinued sometime soon and the database server will not be upgraded from SQL 2005. I don't necessarily like this outcome, but it is what it is: until it breaks it shall not be fixed. :-x

Preserving ORDER BY in SELECT INTO

I have a T-SQL query that takes data from one table and copies it into a new table but only rows meeting a certain condition:
SELECT VibeFGEvents.*
INTO VibeFGEventsAfterStudyStart
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON
CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
ORDER BY VibeFGEvents.id
The code using the table relies on its order, and the copy above does not preserve the order I expected. I.e. the rows in the new table VibeFGEventsAfterStudyStart are not monotonically increasing in the VibeFGEventsAfterStudyStart.id column copied from VibeFGEvents.id.
In T-SQL how might I preserve the ordering of the rows from VibeFGEvents in VibeFGEventsStudyStart?
I know this is a bit old, but I needed to do something similar. I wanted to insert the contents of one table into another, but in a random order. I found that I could do this by using select top n and order by newid(). Without the 'top n', order was not preserved and the second table had rows in the same order as the first. However, with 'top n', the order (random in my case) was preserved. I used a value of 'n' that was greater than the number of rows. So my query was along the lines of:
insert Table2 (T2Col1, T2Col2)
select top 10000 T1Col1, T1Col2
from Table1
order by newid()
What for?
Point is – data in a table is not ordered. In SQL Server the intrinsic storage order of a table is that of the (if defined) clustered index.
The order in which data is inserted is basically "irrelevant". It is forgotten the moment the data is written into the table.
As such, nothing is gained, even if you get this stuff. If you need an order when dealing with data, you HAVE to put an ORDER BY clause on the select that gets it. Anything else is random, i.e. the order you get data in is not determined and may change.
So it makes no sense to have a specific order on the insert as you try to achieve.
SQL 101: sets have no order.
Just add top to your sql with a number that is greater than the actual number of rows:
SELECT top 25000 *
into spx_copy
from SPX
order by date
I've found a specific scenario where we want the new table to be created with a specific order in the columns' content:
The number of rows is very big (from 200 to 2000 million rows), so we are using SELECT INTO instead of CREATE TABLE + INSERT because the table needs to be loaded as fast as possible (minimal logging). We have tested using trace flag 610 for loading an already created empty table with a clustered index, but that still takes longer than the following approach.
We need the data to be ordered by specific columns for query performances, so we are creating a CLUSTERED INDEX just after the table is loaded. We discarded creating a non-clustered index because it would need another read for the data that's not included in the ordered columns from the index, and we discarded creating a full-covering non-clustered index because it would practically double the amount of space needed to hold the table.
It happens that if you manage to somehow create the table with columns already "ordered", creating the clustered index (with the same order) takes a lot less time than when the data isn't ordered. And sometimes (you will have to test your case), ordering the rows in the SELECT INTO is faster than loading without order and creating the clustered index later.
The problem is that SQL Server 2012+ will ignore the ORDER BY column list when doing INSERT INTO or when doing SELECT INTO. It will consider the ORDER BY columns if you specify an IDENTITY column on the SELECT INTO or if the inserted table has an IDENTITY column, but just to determine the identity values and not the actual storage order in the underlying table. In this case, it's likely that the sort will happen but not guaranteed as it's highly dependent on the execution plan.
A trick we have found is that doing a SELECT INTO with the result of a UNION ALL makes the engine perform a SORT (not always an explicit SORT operator, sometimes a MERGE JOIN CONCATENATION, etc.) if you have an ORDER BY list. This way the select into already creates the new table in the order we are going to create the clustered index later and thus the index takes less time to create.
So you can rewrite this query:
SELECT
FirstColumn = T.FirstColumn,
SecondColumn = T.SecondColumn
INTO
#NewTable
FROM
VeryBigTable AS T
ORDER BY -- ORDER BY is ignored!
FirstColumn,
SecondColumn
to
SELECT
FirstColumn = T.FirstColumn,
SecondColumn = T.SecondColumn
INTO
#NewTable
FROM
VeryBigTable AS T
UNION ALL
-- A "fake" row to be deleted
SELECT
FirstColumn = 0,
SecondColumn = 0
ORDER BY
FirstColumn,
SecondColumn
We have used this trick a few times, but I can't guarantee it will always sort. I'm just posting this as a possible workaround in case someone has a similar scenario.
You cannot do this with ORDER BY but if you create a Clustered Index on VibeFGEvents.id after your SELECT INTO the table will be sorted on disk by VibeFGEvents.id.
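A minimal sketch of that approach, run after the SELECT INTO from the question has created the table. The index name is hypothetical. The final query still carries its own ORDER BY, since only that guarantees the order of the returned rows; the clustered index just lets the engine satisfy it without a separate sort.

```sql
-- Index name is a hypothetical choice.
CREATE CLUSTERED INDEX IX_VibeFGEventsAfterStudyStart_id
    ON VibeFGEventsAfterStudyStart (id);

-- Readers of the table should still ask for the order explicitly.
SELECT *
FROM VibeFGEventsAfterStudyStart
ORDER BY id;
```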
I've made a test on MS SQL 2012, and it clearly shows that insert into ... select ... order by makes sense. Here is what I did:
create table tmp1 (id int not null identity, name sysname);
create table tmp2 (id int not null identity, name sysname);
insert into tmp1 (name) values ('Apple');
insert into tmp1 (name) values ('Carrot');
insert into tmp1 (name) values ('Pineapple');
insert into tmp1 (name) values ('Orange');
insert into tmp1 (name) values ('Kiwi');
insert into tmp1 (name) values ('Ananas');
insert into tmp1 (name) values ('Banana');
insert into tmp1 (name) values ('Blackberry');
select * from tmp1 order by id;
And I got this list:
1 Apple
2 Carrot
3 Pineapple
4 Orange
5 Kiwi
6 Ananas
7 Banana
8 Blackberry
No surprises here. Then I made a copy from tmp1 to tmp2 this way:
insert into tmp2 (name)
select name
from tmp1
order by id;
select * from tmp2 order by id;
I got the exact response like before. Apple to Blackberry.
Now reverse the order to test it:
delete from tmp2;
insert into tmp2 (name)
select name
from tmp1
order by id desc;
select * from tmp2 order by id;
9 Blackberry
10 Banana
11 Ananas
12 Kiwi
13 Orange
14 Pineapple
15 Carrot
16 Apple
So the order in tmp2 is reversed too, which means order by makes sense when there is an identity column in the target table!
The reason one would want a specific order is that you cannot define the order in a subquery. So the idea is that if you create a table variable and then query it, you would expect to retain the order (say, to concatenate rows that must be in order, for XML or JSON), but you can't.
So, what do you do?
The answer is to force SQL to order it by using TOP in your select (just pick a number high enough to cover all your rows).
I have run into the same issue, and one reason I have needed to preserve the order is when I try to use ROLLUP to get a weighted average based on the raw data and not an average of what is in that column. For instance, say I want to see the average profit based on the number of units sold by four store locations. I can do this very easily by creating the equation Profit / #Units = Avg. Now I include a ROLLUP in my GROUP BY so that I can also see the average across all locations. Now I think to myself, "This is good info but I want to see it in order of Best Average to Worst and keep the Overall at the bottom (or top) of the list." The ROLLUP will fail you in this, so you take a different approach.
Why not create row numbers based on the sequence (order) you need to preserve?
SELECT OrderBy = ROW_NUMBER() OVER(PARTITION BY 'field you want to count' ORDER BY 'field(s) you want to use ORDER BY')
, VibeFGEvents.*
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON
CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
Now you can use the OrderBy field from your table to set the order of values. I removed the ORDER BY statement from the query above since it does not affect how the data is loaded to the table.
I found this approach helpful to solve this problem:
WITH ordered as
(
SELECT TOP 1000
[Month]
FROM SourceTable
GROUP BY [Month]
ORDER BY [Month]
)
INSERT INTO DestinationTable (MonthStart)
(
SELECT * from ordered
)
Try using INSERT INTO instead of SELECT INTO
INSERT INTO VibeFGEventsAfterStudyStart
SELECT VibeFGEvents.*
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON
CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
ORDER BY VibeFGEvents.id

Microsoft SQL Server Paging

There are a number of SQL Server paging questions on Stack Overflow and many of them talk about using ROW_NUMBER() OVER (ORDER BY ...) and a CTE. Once you get into the hundreds of thousands of rows and start adding sorting on non-primary key values and adding custom WHERE clauses, these methods become very inefficient. I have a dataset of several million rows I am trying to page through with custom sorting and filtering, but I am getting poor performance, even with indexes on all the fields that I sort by and filter by. I even went as far as to include my SELECT columns in each of the indexes, but this barely helped and severely bloated my database.
I noticed the stackoverflow paging only takes about 500 milliseconds no matter what sorting criteria or page number you click on. Anyone know how to make paging work efficiently in SQL Server 2008 with millions of rows? This would include getting the total rows as efficiently as possible.
My current query has the exact same logic as this stackoverflow question about paging:
Best paging solution using SQL Server 2005?
Anyone know how to make paging work efficiently in SQL Server 2008 with millions of rows?
If you want accurate perfect paging, there is no substitute for building an index key (position row number) for each record. However, there are alternatives.
(1) total number of pages (records)
You can use an approximation from sysindexes.rows (almost instant) assuming the rate of change is small.
You can use triggers to maintain a completely accurate, to the second, table row count
(2) paging
(a)
You can show page jumps within, say, the next five pages to either side of a record. These need to scan at most {page size} x 5 rows on each side. If your underlying query lends itself to travelling along the sort order quickly, this should not be slow. So given a record X, you can go to the previous page using (assuming the sort order is a asc, b desc):
select top(@pagesize) t.*
from tbl x
inner join tbl t on (t.a = x.a and t.b > x.b) OR
(t.a < x.a)
where x.id = @X
order by t.a desc, t.b asc -- reversed sort, so TOP keeps the rows closest to X
(i.e. the last {page size} of records prior to X, returned in reverse display order)
To go five pages back, you increase it to TOP(@pagesize*5) then further TOP(@pagesize) from that subquery.
Downside: With this option you cannot jump directly to a particular location; your options are only FIRST (easy), LAST (easy), NEXT/PRIOR, and up to 5 pages either side.
(b)
If the paging is always going to be quite specific and predictable, maintain an INDEXED view or trigger-updated table that does not contain gaps in the row number. This may be an option if the tables normally only see updates at one end of the spectrum, with gaps from deletes easily filled quickly by shifting not-so-many records.
This approach gives you a rowcount (last row) and also direct access to any page.
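For the approximate row count in option (1), a sketch along the following lines may help. It reads `sys.partitions` rather than `sysindexes`, which SQL Server 2005 and later keep only as a compatibility view. The table name `dbo.tbl` is a placeholder matching the example above, and the count is approximate by design.

```sql
-- Approximate row count without scanning the table.
SELECT SUM(p.rows) AS ApproxRowCount
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.tbl')
  AND p.index_id IN (0, 1);   -- 0 = heap, 1 = clustered index
```

Summing over partitions keeps the query correct for partitioned tables; restricting `index_id` to 0 or 1 avoids counting the same rows once per nonclustered index.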
Try this. Let's say you have a Country table as below:
DECLARE @pageIndex INT=0;
DECLARE @pageSize INT= 10;
DECLARE @sortByColumn NVARCHAR(200)='Code';
DECLARE @sortByDesc BIT=0;
;WITH tbl AS (
SELECT COUNT(id) OVER() [RowTotal], c.Id, c.Code, c.Name
FROM dbo.[Country] c
ORDER BY
CASE WHEN @sortByColumn='Code' AND @sortByDesc=0 THEN c.Code END ASC,
CASE WHEN @sortByColumn='Code' AND @sortByDesc<>0 THEN c.Code END DESC,
CASE WHEN @sortByColumn='Name' AND @sortByDesc=0 THEN c.Name END ASC,
CASE WHEN @sortByColumn='Name' AND @sortByDesc<>0 THEN c.Name END DESC,
c.Name ASC --DEFAULT SORTING ORDER
OFFSET @pageIndex*@pageSize ROWS
FETCH NEXT @pageSize ROWS ONLY
) SELECT (@pageIndex*@pageSize)+(ROW_NUMBER() OVER(ORDER BY Id))[RowNo],* from tbl;
