Need help with this - can SQL do it?
I have a table with 100 rows, and I need to select 10 random rows from it.
Can SQL make every query call return different random data, excluding the rows chosen in previous calls?
1st random query = get 10 random rows
2nd random query = get another 10 random rows different from the 1st call
.
.
.
10th random query = get the last 10 rows not chosen
I am still searching but have no idea how. My last option is to have every SELECT update the chosen rows so they are not selected in the next call.
SELECT TOP 10 [your select list]
FROM [your table]
ORDER BY NEWID()
Should give you 10 "random" rows per call.
If you need to be 100% sure the rows were not selected previously, the only way is to mark them so you can exclude them in the WHERE clause, or to remove them from the table.
edit to expand on the mark approach:
Pseudo code for marking as extracted could look something like this:
DECLARE @tempList TABLE ([your key column] int)

INSERT INTO @tempList
SELECT TOP 10 [your key column]
FROM [your table]
WHERE extracted = 0
ORDER BY NEWID()

UPDATE [your table]
SET extracted = 1
WHERE [your key column] IN (SELECT [your key column] FROM @tempList)

SELECT * FROM @tempList
With this script you just need to declare which run you are doing; it will always give the same rows for each run, and you only have to change the value of @run. The result will seem random without actually being random:
DECLARE @run INT = 2

DECLARE @t TABLE (id int)
INSERT @t
SELECT x.id * 10 + y.id + 1
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x(id)
CROSS JOIN
     (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) y(id)

;WITH CTE AS
(
    SELECT id, NTILE(10) OVER (ORDER BY REVERSE(CHECKSUM(id, 'abc'))) AS rowgroup
    FROM @t
)
SELECT id
FROM CTE
WHERE rowgroup = @run
Objective:
I want to know which is faster / has better performance when retrieving a finite number of rows from a CTE that is already ordered.
Example:
Say I have a CTE (intentionally simplified) that looks like this, and I only want the top 5 rows:
WITH cte
AS (
SELECT Id = RANK() OVER (ORDER BY t.ActionID asc)
, t.Name
FROM tblSample AS t -- tblSample is indexed on Id
)
Which is faster:
SELECT TOP 5 * FROM cte
OR
SELECT * FROM cte WHERE Id BETWEEN 1 AND 5 ?
Notes:
I am not a DB programmer, so to me the TOP solution seems better: once SQL Server finds the 5th row, it will stop executing and "return" (100% assumption on my part), while the other method, I feel, will unnecessarily process the whole CTE.
My question is for a CTE; would the answer be the same if it were a table?
The most important thing to note is that the two queries will not always produce the same result set. Consider the following data:
CREATE TABLE #tblSample (ActionId int not null, name varchar(10) not null);
INSERT #tblSample VALUES (1,'aaa'),(2,'bbb'),(3,'ccc');
Both of these will produce the same result:
WITH CTE AS
(
SELECT id = RANK() OVER (ORDER BY t.ActionID asc), t.name
FROM #tblSample t
)
SELECT TOP(2) * FROM CTE;
WITH CTE AS
(
SELECT id = RANK() OVER (ORDER BY t.ActionID asc), t.name
FROM #tblSample t
)
SELECT * FROM CTE WHERE id BETWEEN 1 AND 2;
Now let's do this update:
UPDATE #tblSample SET ActionId = 1;
After this update the first query still returns two rows, while the second returns three. Keep in mind too that without an ORDER BY in the TOP query the results are not guaranteed, because there is no default order in SQL.
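For instance, a version of the TOP query that pins down which ranks come back (rows tied on id can still appear in either order) just adds an ORDER BY:

```sql
WITH CTE AS
(
    SELECT id = RANK() OVER (ORDER BY t.ActionID ASC), t.name
    FROM #tblSample t
)
SELECT TOP (2) * FROM CTE ORDER BY id;
```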
With that out of the way - which performs better? It depends. It depends on your indexing, your statistics, number of rows, and the execution plan that the SQL Engine goes with.
TOP 5 selects any 5 rows as per the index defined on the table, whereas 'Id BETWEEN 1 AND 5' tries to fetch data based on the Id column, by an index seek or a scan depending on the selected attributes. They are two different queries, and the 'Id BETWEEN' query might be slow if you do not have an index on Id.
Let me try to explain with an example, using a sample table yourcte with columns id and name.
create index nci_name on yourcte(id) include(name)
--drop index nci_name on yourcte
;with cte as (
select * from yourcte )
select top 5 * from cte
;with cte as (
select * from yourcte )
select * from cte where id between 1 and 5
First I create an index on id with name included. Now the second query does an index seek while the first one does an index scan before selecting the top 5, so in this case the second approach is better.
See the execution plan:
Now I remove the index by executing:
DROP INDEX nci_name ON yourcte
Now both approaches do a table scan.
If you look at the two table scans, the first one reads only 5 rows, while the second approach reads all 10 rows and then applies the predicate.
See the execution plan properties: the first plan reads only 5 rows, while the second reads 10.
Now the first approach is better.
In your case this index needs to be on ActionId, which determines the id. Hence performance depends on how you index your base table.
In order to get the RANK() you are calculating in your CTE, it must sort all the data by t.ActionID. Sorting is a blocking operation: the entire input must be processed before a single row is output.
So in this case, whether you select any five rows or take the five that sorted to the top of the pile is probably irrelevant.
I have inherited a stored procedure that utilizes a table variable to store data, then updates each row with a running total calculation. The order of the records in the table variable is very important, as we want the volume to be ordered highest to lowest (i.e. the running total will get increasingly larger as you go down the table).
My problem is that during the step where the table variable is updated, the running total is calculated, but not in the order the table variable was previously sorted in (descending by highest volume).
DECLARE @TableVariable TABLE ([ID] int, [Volume] decimal(18,4), [SortValue] decimal(18,4), [RunningTotal] decimal(18,4))
DECLARE @TotalVolume decimal(18,4)
DECLARE @RunningTotal decimal(18,4)

--Populate table variable and order by the sort value...
INSERT INTO @TableVariable (ID, Volume, SortValue)
SELECT
    [ID], [Volume], ABS([Volume]) AS SortValue
FROM
    dbo.VolumeTable
ORDER BY
    SortValue DESC

--Set TotalVolume variable...
SELECT @TotalVolume = ABS(SUM([Volume]))
FROM @TableVariable

--Calculate running total, update rows in table variable...I believe this is where the problem occurs?
SET @RunningTotal = 0
UPDATE @TableVariable
SET @RunningTotal = RunningTotal = @RunningTotal + [Volume]

--Output...
SELECT
    ID, Volume, SortValue, RunningTotal
FROM
    @TableVariable
ORDER BY
    SortValue DESC
The result is that the record with the highest volume, which I would have expected the running total to be calculated on first (thus running total = volume), somehow ends up much further down in the list. The running total seems to be calculated in a random order.
Here is what I would expect to get:
But here is what the code actually generates:
Is there a way to get the UPDATE statement to act on the table variable ordered by volume descending? From what I've read so far, it could be an issue with the sorting behavior of a table variable, but I'm not sure how to correct it. Can anyone help?
GarethD provided the definitive link to the multiple ways of calculating running totals and their performance. The correct one is both the simplest and the fastest: 300 times faster than the quirky update. That's because it can take advantage of any index that covers the sort column, and because it's a lot simpler.
I repeat it here to make clear how much simpler this is when the database provides the appropriate windowing functions:
SELECT
[Date],
TicketCount,
SUM(TicketCount) OVER (ORDER BY [Date] RANGE UNBOUNDED PRECEDING)
FROM dbo.SpeedingTickets
ORDER BY [Date];
The SUM line means: sum the ticket counts over all (UNBOUNDED) rows that came before (PRECEDING) the current one, when ordered by date.
That ends up being 300 times faster than the quirky update.
The equivalent query for VolumeTable would be:
SELECT
ID,
Volume,
ABS(Volume) as SortValue,
SUM(Volume) OVER (ORDER BY ABS(Volume) DESC RANGE UNBOUNDED PRECEDING)
FROM
VolumeTable
ORDER BY ABS(Volume) DESC
Note that this will be a lot faster if there is an index on the sort column (Volume) and ABS isn't used. Applying any function to a column means the optimizer can't use indexes that cover it, because the actual sort value is different from the one stored in the index.
If the table is very large and performance suffers, you could create a computed column and index it.
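A sketch of that computed-column approach (the column and index names here are made up):

```sql
-- AbsVolume is a hypothetical computed column; ABS is deterministic,
-- so the column can be indexed.
ALTER TABLE VolumeTable
    ADD AbsVolume AS ABS(Volume);

CREATE NONCLUSTERED INDEX IX_VolumeTable_AbsVolume
    ON VolumeTable (AbsVolume DESC)
    INCLUDE (ID, Volume);

-- The window function can then order by the indexed column directly:
SELECT ID,
       Volume,
       AbsVolume AS SortValue,
       SUM(Volume) OVER (ORDER BY AbsVolume DESC RANGE UNBOUNDED PRECEDING) AS RunningTotal
FROM VolumeTable
ORDER BY AbsVolume DESC;
```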
Take a peek at the window functions offered in SQL Server.
For example
DECLARE @YourTable TABLE (ID int, Volume int)

INSERT INTO @YourTable VALUES
       (100, 1306489),
       (125, 898426),
       (150, 907404)

SELECT ID
      ,Volume
      ,RunningTotal = SUM(Volume) OVER (ORDER BY Volume DESC)
FROM @YourTable
ORDER BY Volume DESC
Returns
ID Volume RunningTotal
100 1306489 1306489
150 907404 2213893
125 898426 3112319
To be clear, @YourTable is for demonstration purposes only. There should be no need to insert your actual data into a table variable.
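One caveat: the default window frame for SUM(...) OVER (ORDER BY ...) is RANGE UNBOUNDED PRECEDING, so rows tied on Volume receive the same running total. Specifying ROWS accumulates row by row and is typically faster. A small self-contained sketch with hypothetical tied data:

```sql
DECLARE @Ties TABLE (ID int, Volume int)
INSERT INTO @Ties VALUES (1, 500), (2, 500), (3, 100)

SELECT ID
      ,Volume
      ,RunningTotal = SUM(Volume) OVER (ORDER BY Volume DESC
                                        ROWS UNBOUNDED PRECEDING)
FROM @Ties
ORDER BY Volume DESC
-- With RANGE (the default), the two 500-volume rows would both show 1000;
-- with ROWS they show 500 and 1000 respectively (order among ties is arbitrary).
```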
EDIT to support SQL Server 2008 (the good news is that ROW_NUMBER() is supported in 2008):
SELECT ID
      ,Volume
      ,RowNr = ROW_NUMBER() OVER (ORDER BY Volume DESC)
INTO #Temp
FROM @YourTable

SELECT A.ID
      ,A.Volume
      ,RunningTotal = SUM(B.Volume)
FROM #Temp A
JOIN #Temp B ON B.RowNr <= A.RowNr
GROUP BY A.ID, A.Volume
ORDER BY A.Volume DESC
Assume I have previously created a table called #temp whose count column holds NULL values, and later on I want to update that column in my script. How would I do that?
count   CAM
1       201
1       2
1       2012
2       20
I have the update statement which would be:
Update #temp set [count] = ((ROW_NUMBER() over (order by CAM desc) - 1/3) + 1)
However, it gives me the following error:
Windowed functions can only appear in the SELECT or ORDER BY clauses.
I have tried many different ways using a SELECT statement, but no luck. Any help with this?
If I'm understanding what you want to do, although count is a bit of an odd column name here given the data it seems to hold:
WITH cte AS
(
SELECT (row_number() OVER(ORDER BY CAM DESC) - 1)/3 + 1 AS [count],
CAM
FROM #temp
)
UPDATE #temp
SET #temp.[count] = cte.[count]
FROM #temp
INNER JOIN cte ON #temp.CAM = cte.CAM
Note I've also pulled the /3 outside of the parentheses - I believe this is what you've intended.
This will work as long as CAM is unique.
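If CAM is not guaranteed unique, one alternative sketch of the same logic is to update through the CTE itself, which sidesteps the join back on CAM entirely (T-SQL allows updating a CTE that selects from a single base table):

```sql
WITH cte AS
(
    SELECT [count],
           (ROW_NUMBER() OVER (ORDER BY CAM DESC) - 1) / 3 + 1 AS new_count
    FROM #temp
)
UPDATE cte
SET [count] = new_count;
```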
I need a faster method to calculate and display a running sum.
It's an MVC Telerik grid that queries a view that generates the running sum using a sub-query. The query takes 73 seconds to complete, which is unacceptable. (Every time the user hits "Refresh Forecast Sheet", it takes 73 seconds to re-populate the grid.)
The query looks like this:
SELECT outside.EffectiveDate
[omitted for clarity]
,(
SELECT SUM(inside.Amount)
FROM vCI_UNIONALL inside
WHERE inside.EffectiveDate <= outside.EffectiveDate
) AS RunningBalance
[omitted for clarity]
FROM vCI_UNIONALL outside
"EffectiveDate" on certain items can change all the time... New items can get added, etc. I certainly need something that can calculate the running sum on the fly (when the Refresh button is hit). Stored proc or another View...? Please advise.
Solution (one of many; this one is orders of magnitude faster than a sub-query):
Create a new table with all the columns in the view except the RunningTotal column. Create a stored procedure that first truncates the table, then inserts into it using a SELECT of all columns, without the running sum column.
Use the update-local-variable method:
DECLARE @Amount DECIMAL(18,4)
SET @Amount = 0
UPDATE TABLE_YOU_JUST_CREATED SET RunningTotal = @Amount, @Amount = @Amount + ISNULL(Amount, 0)
Create a task agent that will run the stored procedure once a day. Use the TABLE_YOU_JUST_CREATED for all your reports.
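One caution: the engine does not guarantee the row order of such a "quirky update". The usual safeguards are a clustered index in the desired order plus hints that prevent parallelism. A sketch, assuming the table has an EffectiveDate column to order by (names hypothetical):

```sql
-- Assumption: TABLE_YOU_JUST_CREATED has an EffectiveDate column.
CREATE CLUSTERED INDEX IX_RunningTotal
    ON TABLE_YOU_JUST_CREATED (EffectiveDate);

DECLARE @Amount DECIMAL(18,4) = 0;

-- TABLOCKX and MAXDOP 1 reduce the chance of the update running
-- out of clustered-index order.
UPDATE TABLE_YOU_JUST_CREATED WITH (TABLOCKX)
SET RunningTotal = @Amount,
    @Amount = @Amount + ISNULL(Amount, 0)
OPTION (MAXDOP 1);
```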
Take a look at this post
Calculate a Running Total in SQL Server
If you have SQL Server Denali (2012), you can use the new windowed functions.
In SQL Server 2008 R2 I suggest you use a recursive common table expression.
A small snag with the CTE approach is that, for a fast query, you need an identity-like column without gaps (1, 2, 3, ...); if you don't have such a column, you have to create a temporary or variable table with one and move your data there.
The CTE approach will look something like this:
DECLARE @Temp_Numbers TABLE (RowNum int, Amount <your type>, EffectiveDate datetime)

INSERT INTO @Temp_Numbers (RowNum, Amount, EffectiveDate)
SELECT ROW_NUMBER() OVER (ORDER BY EffectiveDate), Amount, EffectiveDate
FROM vCI_UNIONALL

-- you can also use identity
-- DECLARE @Temp_Numbers TABLE (RowNum int identity(1, 1), Amount <your type>, EffectiveDate datetime)
-- INSERT INTO @Temp_Numbers (Amount, EffectiveDate)
-- SELECT Amount, EffectiveDate
-- FROM vCI_UNIONALL
-- ORDER BY EffectiveDate

;WITH
CTE_RunningTotal
AS
(
    SELECT T.RowNum, T.EffectiveDate, T.Amount AS Total_Amount
    FROM @Temp_Numbers AS T
    WHERE T.RowNum = 1

    UNION ALL

    SELECT T.RowNum, T.EffectiveDate, T.Amount + C.Total_Amount AS Total_Amount
    FROM CTE_RunningTotal AS C
    INNER JOIN @Temp_Numbers AS T ON T.RowNum = C.RowNum + 1
)
SELECT C.RowNum, C.EffectiveDate, C.Total_Amount
FROM CTE_RunningTotal AS C
OPTION (MAXRECURSION 0)
There may be some questions about duplicate EffectiveDate values; it depends on how you want to handle them - should they be ordered arbitrarily, or should tied rows end up with equal running amounts?
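For completeness, once you are on SQL Server 2012 ("Denali") or later, the whole running balance collapses into a single windowed query against the view:

```sql
SELECT EffectiveDate,
       Amount,
       SUM(Amount) OVER (ORDER BY EffectiveDate
                         ROWS UNBOUNDED PRECEDING) AS RunningBalance
FROM vCI_UNIONALL
ORDER BY EffectiveDate;
```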
I would like to know if I can select more than one uniqueidentifier in SQL Server.
To select one: SELECT NEWID() - this brings back one result.
I would like to bring back 50 results.
EDIT:
I would like the results to be returned in one grid, so I can copy all of them at once, not copy and paste one by one.
Are you trying to do this in SQL Server Management Studio?
Try:
SELECT NEWID()
GO 50
and run this batch
Update:
OK - how about this then??
SELECT NEWID()
FROM master..spt_values
WHERE name IS NULL
AND number < 50
Assuming the master.dbo.sysobjects table has at least 50 system objects in it:
SELECT TOP 50 NEWID() FROM master.dbo.sysobjects WHERE xtype = 'S'
You don't need an order by, since the NEWID is random every time.
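If you would rather not depend on a system table holding enough rows, a VALUES cross join (the same trick used earlier on this page to generate 1-100) guarantees exactly 50 rows:

```sql
-- 10 x 5 = 50 rows, each producing a fresh NEWID()
SELECT NEWID() AS ID
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x(i)
CROSS JOIN
     (VALUES (0),(1),(2),(3),(4)) y(i);
```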
--run these queries independently
CREATE TABLE #temp1 (ID UniqueIdentifier)
GO
INSERT INTO #temp1
SELECT NewID() AS ID
GO 50
SELECT *
FROM #temp1
GO
DROP TABLE #temp1
GO