I have a SQL batch that consists of multiple SELECT statements. I want to limit the total number of rows coming back to, let's say, 1000 rows. I thought that using the SET ROWCOUNT 1000 directive would do this... but it does not. For example:
SET ROWCOUNT 1000
select orderId from TableA
select name from TableB
My initial thought was that SET ROWCOUNT would apply to the entire batch, not the individual statements within it. The behavior I'm seeing is that it limits the first SELECT to 1000 rows and then the second one to 1000 rows, for a total of 2000 rows returned. Is there any way to have the 1000 limit applied to the batch as a whole?
Not in one statement. You're going to have to subtract @@ROWCOUNT from the total rows you want after each statement, and use a variable (say, @RowsLeft) to store the remaining rows you want. You can then SELECT TOP (@RowsLeft) from each individual query...
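A minimal sketch of that running-total approach, using the TableA and TableB from the question (untested; assumes SQL Server 2008 or later for the inline DECLARE initializer):

DECLARE @RowsLeft int = 1000;

-- first query takes as many rows as it can, up to the limit
SELECT TOP (@RowsLeft) orderId FROM TableA;
SET @RowsLeft = @RowsLeft - @@ROWCOUNT;

-- second query only runs if there is any budget left
IF @RowsLeft > 0
    SELECT TOP (@RowsLeft) name FROM TableB;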
And if you were able to do this across the batch, how would you ever see any records from the second query when the first one always returns more than 1000 rows?
If the queries are similar enough you could try to do this through a UNION and apply the ROWCOUNT to that, since it would only be one query at that point. If the queries differ in the columns returned, I'm not sure what you would gain by limiting the entire group to 1000 rows, because the meanings would be different. From a user perspective I'd rather consistently get 500 orders and 500 customer names than 998 orders and 2 names one day and 210 orders and 790 names the next. It would be nearly impossible to use the application, especially if you happened to be most interested in the information in the second query.
Use TOP not ROWCOUNT
http://msdn.microsoft.com/en-us/library/ms189463.aspx
You're trying to get 1000 rows max from all tables combined, right?
I think other methods may fill up from the top queries first, and you may never get results from the lower ones.
The requirement sounds odd. Unless you are unioning or joining the data from the two selects, treating them as one result set so that a single maximum row count applies simply does not make sense, since they are unrelated queries at that point. If you really need to do this, try:
select top 1000 * from (
select orderId, null as name, 'TableA' as Source from TableA
union all
select null as orderID, name, 'TableB' as Source from TableB
) a order by Source
SET ROWCOUNT applies to each individual statement. In your example it's applied twice, once to each SELECT, because it caps each statement's result set separately rather than the running total for the batch.
@RedFilter's approach seems the most likely to give you what you want.
Untested and doesn't make use of ROWCOUNT, but it could give you an idea?
Assumes orderId in TableA and name in TableB are compatible types.
SELECT TOP 1000 *
FROM (select orderId
from TableA
UNION ALL
select name from TableB) t
The following worked for me:
CREATE PROCEDURE selectTopN
(
@numberOfRecords int
)
AS
SELECT TOP (@numberOfRecords) * FROM Customers
GO
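A quick usage example for the procedure above (assuming the Customers table from that snippet exists):

EXEC selectTopN @numberOfRecords = 1000;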
This is your solution:
TOP (Transact-SQL)
And you can read about ROWCOUNT at this link:
SET ROWCOUNT (Transact-SQL)
Important
Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax. For more information, see TOP (Transact-SQL).
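As a rough illustration of that guidance (using a hypothetical dbo.Orders table and Status column), the TOP clause replaces SET ROWCOUNT for limiting DML:

-- old pattern, deprecated for DML:
SET ROWCOUNT 1000;
DELETE FROM dbo.Orders WHERE Status = 'Archived';
SET ROWCOUNT 0;

-- recommended replacement:
DELETE TOP (1000) FROM dbo.Orders WHERE Status = 'Archived';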
I think either way will work.
I am trying to delete millions of records from 4 databases, and running into an unexpected error. I made a temp table that holds a list of all the IDs I wish to delete:
CREATE TABLE #CaseList (case_id int)
INSERT INTO #CaseList
SELECT DISTINCT id
FROM my_table
WHERE <my criteria for choosing cases>
I have already deleted all the associated records (with a foreign key on case_id):
DELETE FROM image WHERE case_id in (SELECT case_id from #CaseList)
Then I'm deleting records from my_table in batches (so as not to blow up the transaction log, which, despite my database being in simple recovery mode, still grows when making changes like deletions):
DELETE FROM my_table WHERE id in (SELECT case_id
FROM #CaseList
ORDER by case_id
OFFSET 0 ROWS
FETCH NEXT 10000 ROWS ONLY)
This will work fine for one or three or five rounds (so I've deleted 10k-50k records), then will fail with this error message:
Msg 512, Level 16, State 1, Procedure trgd_image, Line 188
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
Which is really weird because as I said, I already deleted all the associated records from the image table. Then it gets weirder because if I select smaller batches, the deletion works without error.
I generally cut the FETCH NEXT n in half (5k), then in half again (2500), then in half again (1200), etc. until it works:
DELETE FROM my_table WHERE id in (SELECT case_id
FROM #CaseList
ORDER by case_id
OFFSET 50000 ROWS
FETCH NEXT 1200 ROWS ONLY)
Then repeat that amount until I get past where it failed, then turn it back up to 10000 and it will work again for a batch or three...
DELETE FROM my_table WHERE id in (SELECT case_id
FROM #CaseList
ORDER by case_id
OFFSET 60000 ROWS
FETCH NEXT 10000 ROWS ONLY)
then fail again with the same error... rinse, wash, and repeat.
What can cause that subquery error when there are NOT related records in the image table? Why would selecting the cases in smaller batches work "around it" and then allow larger batches again?
I would really love a solution to this so I can write a WHILE loop and run the deletion through the millions of rows that way, instead of having to manage it manually, which is going to take me weeks with millions of rows needing to be deleted out of 4 databases.
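For reference, here is a rough, untested sketch of the kind of loop described above (the batch size, table, and column names are taken from this question; OFFSET/FETCH needs SQL Server 2012 or later):

DECLARE @offset int = 0;
DECLARE @batch  int = 10000;
DECLARE @total  int = (SELECT COUNT(*) FROM #CaseList);

WHILE @offset < @total
BEGIN
    -- delete the next slice of case IDs from the work list
    DELETE FROM my_table
    WHERE id IN (SELECT case_id
                 FROM #CaseList
                 ORDER BY case_id
                 OFFSET @offset ROWS
                 FETCH NEXT @batch ROWS ONLY);

    SET @offset = @offset + @batch;
END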
The query you're showing cannot produce the error you're seeing. If you're sure it does, you have a bug report. My guess is that in trgd_image, at line 188 (or somewhere nearby), you'll find you're using a scalar comparison, = instead of IN.
I also have some advice for you, free for the asking. I've written lots of queries like yours, and never used anything like OFFSET 60000 ROWS FETCH NEXT 10000 ROWS ONLY. You don't need to either, and your SQL will be easier to write if you don't.
First, unless your machine is seriously undersized in 2018 for the scale of data you're using, I think you'll find 100,000-row transactions are just fine. If not, at least try to understand why not. A machine managing many millions of rows ought to be able to deal with 1% of them without breaking a sweat.
When you populate #CaseList, trap @@ROWCOUNT. Then you can print/record that, and compute the number of "chunks" in your work.
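A minimal sketch of that bookkeeping, reusing the #CaseList population from the question and assuming a batch size of 10000 (CONCAT needs SQL Server 2012 or later):

DECLARE @caseCount int, @chunks int;

INSERT INTO #CaseList
SELECT DISTINCT id
FROM my_table
WHERE <my criteria for choosing cases>;

-- capture the row count immediately, before anything else resets it
SET @caseCount = @@ROWCOUNT;
SET @chunks = CEILING(@caseCount / 10000.0);

PRINT CONCAT('cases: ', @caseCount, ', chunks of 10000: ', @chunks);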
Ideally, though, there's no temporary table. Instead, those cases probably have some logical grouping you can operate on. They might have regions or owners or dates, whatever was used to select them in the first place. Iterate over that, e.g.
delete from T where id in (select id from S where user = 1)
Once you do that, you can write a loop:
select @user = min(user) from S where ...
while @user is not NULL begin
    print 'deleting cases for user ' + cast(@user as varchar(30))
    delete from T where id in (select id from S where user = @user)
    select @u = @user
    select @user = min(user) from S where ... and user > @u
end
That way, if the process blows up partway through (for any reason) you have a logical grouping of deletions and a clean break: you know all the cases for users (or whatever the grouping is) less than @user are deleted, and you can look into what's wrong with the "current" one. Quite often, you'll discover that the problem isn't unique, and by solving it you'll prevent future problems with others.
When creating ad-hoc queries to look for information in a table I have run into this issue over and over.
Let's say I have a table with a million records with fields id - int, createddatetime - timestamp, category - varchar(50) and content - varchar(max). I want to find all records in the last day that have a certain string in the content field. If I create a query like this...
select *
from table
where createddatetime > '2018-1-31'
and content like '%something%'
it may complete in a second, because in the last day there may only be 100 records, so the LIKE clause is only operating on a small number of records.
However if I add one more item to the where clause...
select *
from table
where createddatetime > '2018-1-31'
and content like '%something%'
and category = 'testing'
then it could take many minutes to complete while locking up the table.
It appears to change from performing all the straightforward WHERE clause items first and then the LIKE on the limited set of records, to evaluating the LIKE clause first. There are even times when there are multiple LIKE predicates and adding one more causes the query to go from a split second to minutes.
The only solution I've found is to generate an intermediate table (maybe temp tables would work), insert records based on the basic WHERE clause items, and then run a separate query to filter by one or more LIKE predicates. I've tried various JOIN and CTE approaches, which usually show no improvement. Alternatively, CHARINDEX also appears to work, though it is difficult to use when trying to convert the logic of multiple LIKE statements.
Is there any hint or anything else that can be placed in the query to tell SQL Server to wait until records are filtered by the basic WHERE clause items before filtering by the LIKE?
I actually just tried this approach and it had the same issue...
select *
from (
select *, charindex('something', content) as found
from bounce
where createddatetime > '2018-1-31'
) t
where found > 0
while the subquery independently returns in a couple of seconds, the overall query just never returns. Why is this so bad?
Not fancy, but I've had better luck with temp tables than nested select statements... It will isolate the first data set, and then you can select just from that. If you're looking for quick and dirty, which usually serves my purposes for ad hoc work, this may help. If this is a permanent stored proc, the indexing suggestions may serve you better in the long run.
select *
into #like
from table
where createddatetime > '2018-1-31'
and content like '%something%'
select *
from #like
where category = 'testing'
As far as I am aware, the only way to get a random value in a SELECT statement is by using the newid() function, as the rand() function doesn't generate new values for each row.
This leads to the following awkward construction to get a random number from, say, 0 to 9:
abs(checksum(newid())) % 10
If I use this expression in the SELECT clause, it behaves as expected. However, if I try something like the following:
select *
from table
where abs(checksum(newid())) % 10>4;
I would have thought that I would get roughly half the rows. Instead I get all or none of them. Apparently newid() is only evaluated once, instead of once for each row.
The question is, how can I use a random number in the WHERE clause?
More
There is a similar question which asks for a fixed number of rows at random. In the above example I could have used:
select top 50 percent * from table order by newid();
which will get me what I am looking for.
The question remains: how can I use a random number in the WHERE clause? For example, is it possible to do something like this?
select *
from table
where code={random number};
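If the intent really is a single random number compared against every row, one way to read that literally (a sketch; table and code are the placeholder names from this question) is to generate the value once into a variable and use the variable in the WHERE clause:

DECLARE @r int = ABS(CHECKSUM(NEWID())) % 10;

SELECT *
FROM table
WHERE code = @r;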
Here is one way to get around the problem
SELECT *
FROM (SELECT *,
Abs(Checksum(Newid())) % 10 AS ran
FROM yourtable) a
WHERE ran > 4;
For some reason, newid() in the WHERE clause is executed only once and the result is compared as a constant.
When I check the execution plans, your query is missing the Compute Scalar operator, whereas my query has a Compute Scalar present in its execution plan.
The function newid() is calculated only once in the WHERE clause, not row by row. The trick is to force it to run row by row.
Of course it is possible to include it in a SELECT clause, and, in turn, include that in a CTE or a subquery, as per the other answers.
Microsoft offer a solution here: https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/ms189108(v=sql.105)?redirectedfrom=MSDN
The trick is to force newid() to recalculate by combining it with some row value. This is easily done in the checksum() function.
For example:
SELECT *
FROM table
WHERE abs(checksum(newid(),id)) % 10>4;
I would have thought that I would get roughly half the rows. Instead I get all or none of them
You may get all of the rows or none of them, since NEWID() is executed once per query when you use it in the WHERE clause. This is explained here by Conor Cunningham, and the technical term for it is a runtime constant.
You can look at your execution plan and look for the expression below:
Const ConstValue
which you can see is calculated once and used throughout; finally you are doing just a boolean comparison, so you will end up with all rows or none.
You have to use a CTE like the one stated in another answer, or use TOP with ORDER BY NEWID(), or TABLESAMPLE to return random rows.
You may find the TABLESAMPLE option more helpful, since it may not go through all the table data to get a sample set of rows, unlike NEWID().
Below is one example on a table having 1,000,000 rows:
select * from Orders
TABLESAMPLE (50 PERCENT)
plan
I'm trying to understand the behavior of an UPDATE/REPLACE that I'm carrying out that is removing some invalid data and replacing with preferred data.
The UPDATE executes normally and does what it needs to do, but the rows affected are not what I expected in some cases (I'm carrying this out on multiple databases).
I've put part of the script below (the rest essentially replicates the same function across multiple tables):
UPDATE TBL_HISTORY
SET DETAILS = REPLACE(DETAILS,'"','Times New Roman')
WHERE HISTORYID IN
(SELECT TOP 1000 (HISTORYID) FROM TBL_HISTORY
WHERE DETAILS LIKE '%"%')
GO
What I'd imagine should happen with the script above is that it selects the TOP 1000 records in TBL_HISTORY that contain the unwanted string and carries out the REPLACE.
The result has been that, in cases where there are more than 1000 affected rows, it updates all of them, returning a value of 1068 rows affected, for example.
HISTORYID is the PK on the table. Am I misunderstanding how this should work? Any guidance would be appreciated.
Try this instead (it is faster). If it still updates more than 1000 rows, it is due to a trigger. If it updates exactly 1000 rows, then HISTORYID is not the only column in the primary key (composite primary key).
;WITH CTE as
(
SELECT top 1000
DETAILS
FROM
TBL_HISTORY
WHERE
DETAILS LIKE '%"%'
)
UPDATE CTE
SET DETAILS = REPLACE(DETAILS,'"','Times New Roman')
I have a view that may contain more than one row, looking like this:
[rate] | [vendorID]
  8374 |       1234
  6523 |       4321
  5234 |       9374
In a SPROC, I need to set a param equal to the value of the first column from the first row of the view, something like this:
DECLARE #rate int;
SET #rate = (select top 1 rate from vendor_view where vendorID = 123)
SELECT #rate
But this ALWAYS returns the LAST row of the view.
In fact, if I simply run the subselect by itself, I only get the last row.
With 3 rows in the view, TOP 2 returns the FIRST and THIRD rows, in that order. With 4 rows, it returns the top 3 in order. Yet TOP 1 still returns the last row.
DERP?!?
This works..
DECLARE #rate int;
CREATE TABLE #temp (vRate int)
INSERT INTO #temp (vRate) (select rate from vendor_view where vendorID = 123)
SET #rate = (select top 1 vRate from #temp)
SELECT #rate
DROP TABLE #temp
.. but can someone tell me why the first behaves so fudgely and how to do what I want? As explained in the comments, there is no meaningful column by which I can do an order by. Can I force the order in which rows are inserted to be the order in which they are returned?
[EDIT] I've also noticed that: select top 1 rate from ([view definition select]) also returns the correct values time and again.[/EDIT]
That is by design.
If you don't specify how the query should be sorted, the database is free to return the records in any order that is convenient. There is no natural order for a table that can be used as a default sort order.
What the order will actually be depends on how the query is planned, so you can't even rely on the same query giving a consistent result over time, as the database will gather statistics about the data and may change how the query is planned based on them.
To get the record that you expect, you simply have to specify how you want them sorted, for example:
select top 1 rate
from vendor_view
where vendorID = 123
order by rate
I ran into this problem on a query that had worked for years. We upgraded SQL Server and all of a sudden, an unordered select top 1 was not returning the final record in a table. We simply added an order by to the select.
My understanding is that SQL Server will generally provide the results based on the clustered index if no ORDER BY is provided, or off of whatever index the engine picks, but this is not a guarantee of any particular order.
If you don't have something to order by, you need to add it. Either add a date-inserted column defaulted to GETDATE() or add an identity column. It won't help you historically, but it addresses the issue going forward.
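A rough sketch of that change, using a hypothetical vendor_rates base table behind the view (untested; both options add a sortable column for rows inserted from then on):

-- option 1: a date-inserted column defaulted to GETDATE()
ALTER TABLE vendor_rates
    ADD InsertedAt datetime NOT NULL
        CONSTRAINT DF_vendor_rates_InsertedAt DEFAULT (GETDATE());

-- option 2: an identity column
ALTER TABLE vendor_rates
    ADD RowId int IDENTITY(1, 1);

Either way, the SPROC can then use an explicit ORDER BY on the new column instead of relying on TOP 1 without an ORDER BY.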
While it doesn't necessarily make sense that the results of the query should be consistent, in this particular instance they are, so we decided to leave it as is. Ultimately it would be best to add a column, but this was not an option. The application this belongs to is slated to be discontinued sometime soon, and the database server will not be upgraded from SQL Server 2005. I don't necessarily like this outcome, but it is what it is: until it breaks, it shall not be fixed. :-x