SQL Filtering A Result Set To Return A Maximum Amount Of Rows At Even Intervals - sql-server

I currently use SQL2008 where I have a stored procedure that fetches data from a table that then gets fed in to a line graph on the client. This procedure takes a from date and a too date as parameters to filter the data. This works fine for small datasets but the graph gets a bit muddled when a large date range is entered causes thousends of results.
What I'd like to do is provide a max amount of records to be returned and return records at evenly spaced intervals to give that amount. For example say I limited it to 10 records and the result set was 100 records I'd like the stored procedure to return every 10th record.
Is this possible wihtout suffering big performance issues and what would be the best way to achieve it? I'm struggling to find a way to do it without cursors and if thats the case I'd rather not do it at all.
Thanks

Assuming you use at least SQL2005, you could do somesting like
WITH p as (
SELECT a, b,
row_number() OVER(ORDER BY time_column) as row_no,
count() OVER() as total_count
FROM myTable
WHERE <date is in range>
)
SELECT a, b
FROM p
WHERE row_no % (total_cnt / 10) = 1
The where condition in the bottom calculates the modulus of the row number by the total number of records divided by the required number of final records.
If you want to use the average instead of one specific value, you would extend this as follows:
WITH p as (
SELECT a, b,
row_number() OVER(ORDER BY time_column) as row_no,
count() OVER() as total_count
FROM myTable
WHERE <date is in range>
),
a as (
SELECT a, b, row_no, total_count,
avg(a) OVER(partition by row_no / (total_cnt / 10)) as avg_a
FROM p
)
SELECT a, b, avg_a
FROM a
WHERE row_no % (total_cnt / 10) = 1
The formula to select one of the values in the final WHERE clause is used with the % replaced by / in the partition by clause.

Related

Get random data from SQL Server without performance impact

I need to select random rows from my sql table, when search this cases in google, they suggested to ORDER BY NEWID() but it reduces the performance. Since my table has more than 2'000'000 rows of data, this solution does not suit me.
I tried this code to get random data :
SELECT TOP 10 *
FROM Table1
WHERE (ABS(CAST((BINARY_CHECKSUM(*) * RAND()) AS INT)) % 100) < 10
It also drops performance sometimes.
Could you please suggest good solution for getting random data from my table, I need minimum rows from that tables like 30 rows for each request. I tried TableSAMPLE to get the data, but it returns nothing once I added my where condition because it return the data by the basis of page not basis of row.
Try to calc the random ids before to filter your big table.
since your key is not identity, you need to number records and this will affect performances..
Pay attention, I have used distinct clause to be sure to get different numbers
EDIT: I have modified the query to use an arbitrary filter on your big table
declare #n int = 30
;with
t as (
-- EXTRACT DATA AND NUMBER ROWS
select *, ROW_NUMBER() over (order by YourPrimaryKey) n
from YourBigTable t
-- SOME FILTER
WHERE 1=1 /* <-- PUT HERE YOUR COMPLEX FILTER LOGIC */
),
r as (
-- RANDOM NUMBERS BETWEEN 1 AND COUNT(*) OF FILTERED TABLE
select distinct top (#n) abs(CHECKSUM(NEWID()) % n)+1 rnd
from sysobjects s
cross join (SELECT MAX(n) n FROM t) t
)
select t.*
from t
join r on r.rnd = t.n
If your uniqueidentifier key is a random GUID (not generated with NEWSEQUENTIALID() or UuidCreateSequential), you can use the method below. This will use the clustered primary key index without sorting all rows.
SELECT t1.*
FROM (VALUES(
NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID())
,(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID())
,(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID()),(NEWID())) AS ThirtyKeys(ID)
CROSS APPLY(SELECT TOP (1) * FROM dbo.Table1 WHERE ID >= ThirtyKeys.ID) AS t1;

Calculate a Recursive Rolling Average in SQL Server

We are attempting to calculate a rolling average and have tried to convert numerous SO answers to solve the problem. To this point we are still unsuccessful.
What we've tried:
Here are some of the SO answers we have considered.
SQL Server: How to get a rolling sum over 3 days for different customers within same table
SQL Query for 7 Day Rolling Average in SQL Server
T-SQL calculate moving average
Our latest attempt has been to modify one of the solutions (#4) found here.
https://www.red-gate.com/simple-talk/sql/t-sql-programming/calculating-values-within-a-rolling-window-in-transact-sql/
Example:
Here is an example in SQL Fiddle: http://sqlfiddle.com/#!6/4570a/17
In the fiddle, we are still trying to get the SUM to work right but ultimately we are trying to get the average.
The end goal
Using the Fiddle example, we need to find the difference between Value1 and ComparisonValue1 and present it as Diff1. When a row has no Value1 available, we need to estimate it by taking the average of the last two Diff1 values and then add it to the ComparisonValue1 for that row.
With the correct query, the result would look like this:
GroupID Number ComparisonValue1 Diff1 Value1
5 10 54.78 2.41 57.19
5 11 55.91 2.62 58.53
5 12 55.93 2.78 58.71
5 13 56.54 2.7 59.24
5 14 56.14 2.74 58.88
5 15 55.57 2.72 58.29
5 16 55.26 2.73 57.99
Question: is it possible to calculate this average when it could potentially factor into the average of the following rows?
Update:
Added a VIEW to the Fiddle schema to simplify the final query.
Updated the query to include the new rolling average for Diff1 (column Diff1Last2Avg). This rolling average works great until we run into nulls in the Value1 column. This is where we need to insert the estimate.
Updated the query to include the estimate that should be used when there is no Value1 (column Value1Estimate). This is working great and would be perfect if we could use the estimate in place of NULL in the Value1 column. Since the Diff1 column reflects the difference between Value1 (or its estimate) and ComparisonValue1, including the Estimate would fill in all the NULL values in Diff1. This in turn would continue to allow the Estimates of future rows to be calculated. It gets confusing at this point, but still hacking away at it. Any ideas?
Credit for the idea goes to this answer: https://stackoverflow.com/a/35152131/6305294 from #JesúsLópez
I have included comments in the code to explain it.
UPDATE
I have corrected the query based on comments.
I have swapped numbers in minuend and subtrahend to get difference as a positive number.
Removed Diff2Ago column.
Results of the query now exactly match your sample output.
;WITH cte AS
(
-- This is similar to your ItemWithComparison view
SELECT i.Number, i.Value1, i2.Value1 AS ComparisonValue1,
-- Calculated Differences; NULL will be returned when i.Value1 is NULL
CONVERT( DECIMAL( 10, 3 ), i.Value1 - i2.Value1 ) AS Diff
FROM Item AS i
LEFT JOIN [Group] AS G ON g.ID = i.GroupID
LEFT JOIN Item AS i2 ON i2.GroupID = g.ComparisonGroupID AND i2.Number = i.Number
WHERE NOT i2.Id IS NULL
),
cte2 AS(
/*
Start with the first number
Note if you do not have at least 2 consecutive numbers (in cte) with non-NULL Diff value and therefore Diff1Ago or Diff2Ago are NULL then everything else will not work;
You may need to add additional logic to handle these cases */
SELECT TOP 1 -- start with the 1st number (see ORDER BY)
a.Number, a.Value1, a.ComparisonValue1, a.Diff, b.Diff AS Diff1Ago
FROM cte AS a
-- "1 number ago"
LEFT JOIN cte AS b ON a.Number - 1 = b.Number
WHERE NOT a.Value1 IS NULL
ORDER BY a.Number
UNION ALL
SELECT b.Number, b.Value1, b.ComparisonValue1,
( CASE
WHEN NOT b.Value1 IS NULL THEN b.Diff
ELSE CONVERT( DECIMAL( 10, 3 ), ( a.Diff + a.Diff1Ago ) / 2.0 )
END ) AS Diff,
a.Diff AS Diff1Ago
FROM cte2 AS a
INNER JOIN cte AS b ON a.Number + 1 = b.Number
)
SELECT *, ( CASE WHEN Value1 IS NULL THEN ComparisonValue1 + Diff ELSE Value1 END ) AS NewValue1
FROM cte2 OPTION( MAXRECURSION 0 );
Limitations:
this solution works well only when you need to consider small number of preceding values.

How to select Second Last Row in mySql?

I want to retrieve the 2nd last row result and I have seen this question:
How can I retrieve second last row?
but it uses order by which in my case does not work because the Emp_Number Column contains number of rows and date time stamp that mixes data if I use order by .
The rows 22 and 23 contain the total number of rows (excluding row 21 and 22) and the time and day it got entered respectively.
I used this query which returns the required result 21 but if this number increases it will cause an error.
SELECT TOP 1 *
FROM(
SELECT TOP 2 *
FROM DAT_History
ORDER BY Emp_Number ASC
) t
ORDER BY Emp_Number desc
Is there any way to get the 2nd last row value without using the Order By function?
There is no guarantee that the count will be returned in the one-but-last row, as there is no definite order defined. Even if those records were written in the correct order, the engine is free to return the records in any order, unless you specify an order by clause. But apparently you don't have a column to put in that clause to reproduce the intended order.
I propose these solutions:
1. Return the minimum of those values that represent positive integers
select min(Emp_Number * 1)
from DAT_history
where Emp_Number not regexp '[^0-9]'
See SQL Fiddle
This will obviously fail when the count is larger then the smallest employee number. But seeing the sample data, that would represent a number of records that is maybe not expected...
2. Count the records, ignoring the 2 aggregated records
select count(*)-2
from DAT_history
See SQL Fiddle
3. Relying on correct order without order by
As explained at the start, you cannot rely on the order, but if for some reason you still want to rely on this, you can use a variable to number the rows in a sub query, and then pick out the one that has been attributed the one-but-last number:
select Emp_Number * 1
from (select Emp_Number,
#rn := #rn + 1 rn
from DAT_history,
(select #rn := 0) init
) numbered
where rn = #rn - 1
See SQL Fiddle
The * 1 is added to convert the text to a number data type.
This is not a perfect solution. I am making some assumptions for this. Check if this could work for you.
;WITH cte
AS (SELECT emp_number,
Row_number()
OVER (
ORDER BY emp_number ASC) AS rn
FROM dat_history
WHERE Isdate(emp_number) = 0) --Omit date entries
SELECT emp_number
FROM cte
WHERE rn = 1 -- select the minimum entry, assuming it would be the count and assuming count might not exceed the emp number range of 9888000

Recursive Decaying Average in Sql Server 2012

I need to calculate a decaying average (cumulative moving?) of a set of values. The last value in the series is 50% weight, with the decayed average of all the prior series as the other 50% weight, recursively.
I came up with a CTE query that produces correct results, but it depends on a sequential row number. I'm wondering if there is a better way to do this in SQL 2012, maybe with the new windowing functions for Over(), or something like that?
In the live data, the rows are ordered by time. I can use an SQL view and ROW_NUMBER() to generate the necessary Row field for my CTE approach, but if there is a more efficient way to do this, I would like to keep this as efficient as possible.
I have a sample table with 2 columns: Row int, and Value Float. I have 6 sample data values of 1,2,3,4,4,4. The correct result should be 3.78125.
My solution is:
;WITH items AS (
SELECT TOP 1
Row, Value, Value AS Decayed
FROM Sample Order By Row
UNION ALL
SELECT v.Row, v.Value, Decayed * .5 + v.Value *.5 AS Decayed
FROM Sample v
INNER JOIN items itms ON itms.Row = v.Row-1
)
SELECT top 1 Decayed FROM items order by Row desc
This correctly produces 3.78125 with the test data. My question is: Is there a more efficient and/or simpler way to do this in SQL 2012, or is this about the only way to do it? Thanks.
One possible alternative would be
WITH T AS
(
SELECT
Value * POWER(5E-1, ROW_NUMBER()
OVER (ORDER BY Row DESC)
/* first row decays less so special cased */
-IIF(LEAD(Value) OVER (ORDER BY Row DESC) IS NULL,1,0))
as x
FROM Sample
)
SELECT SUM(x)
FROM T
SQL Fiddle
Or for the updated question using 60%/40%
WITH T AS
(
SELECT IIF(LEAD(Value) OVER (ORDER BY Row DESC) IS NULL, 1,0.6)
* Value
* POWER(4E-1, ROW_NUMBER() OVER (ORDER BY Row DESC) -1)
as x
FROM Sample
)
SELECT SUM(x)
FROM T
SQL Fiddle
both of the above perform a single pass through the data and can potentially use an index on Row INCLUDE(Value) to avoid a sort.

I need to get the whole row for all duplicate entries efficiently

internet! I'm pretty new to SQL and I need to get all the rows with duplicate information in certain fields and have them display right next to their other duplicates (group by duplicates).
For instance, say I have a table with columns:
A,B,C,D,E,F,G
I want to be able to get all entries (the full row) where B, C, D, and E share the same value as another entry and show the duplicates right next to the original entry. I already have a solution, but it is horribly inefficient. I am trying to improve my running time here.
My original solution was this:
SELECT TOP 1000
A,
B,
C,
D,
E,
F,
G
FROM tbl_myTable
WHERE (B+C+D+E+F+G) IN (
SELECT
B+C+D+E+F+G
FROM
tbl_myTable
GROUP BY
B,C,D,E,F,G
HAVING COUNT(*) > 1
)
ORDER BY B,C,D,E,F,G ASC
This gave me the results that I wanted, but it is horrendously slow (took over 15 mins to run). I reworked my solution with a temporary table and shaved the time down to 5 mins of running time using this script:
--Drop the temp table if it exists.
IF OBJECT_ID('tempdb..#Temp1') IS NOT NULL
DROP TABLE #Temp1
SELECT
B+C+D+E+F+G AS CompareString
INTO #Temp1
FROM tbl_myTable
GROUP BY
B,C,D,E,F,G
HAVING COUNT(*) > 1
SELECT TOP 1000
A,
B,
C,
D,
E,
F,
G
FROM tbl_myTable
WHERE (B+C+D+E+F+G) IN (
SELECT * FROM #Temp1
)
ORDER BY B,C,D,E,F,G ASC
Five minutes still seems like a long time. Is there a faster way to do this? I'm new to SQL, so if something I did was not good, let me know! Thanks!
I would do something like this:
with cte as (
SELECT *
, count(*) over (partition by B, C, D, E, F, G) as cnt
, dense_rank() over (order by B, C, D, E, F, G) as grp
FROM STI.[dbo].[tbl_Consignee]
)
select *
from cte
where cnt > 1
order by grp
Essentially, the dense_rank() call gives each unique tuple an identifier (so you can put duplicates next to each other with the order by clause) and the count counts the number of rows per group.
Without actual data, I have to make a few assumptions here.
First, I assume your lettered fields are all text types, and you are using + to concatenate and not to add numeric values (otherwise A+B+C = 6 when A = 1 B = 2 and C = 3 as well as when A=2 B=3 and C=1, which would not be a match).
Next I'm going to assume there is a key field of some kind on each row that is not represented in your example. Something like tbl_myTable.MyTableKey bigint IDENTITY (1,1) NOT NULL.
Assuming all that, I'd try...
SELECT
[BaseTable].MyTableKey AS [Original Record],
[DupCheckTable].MyTableKey AS [Duplicate Record]
FROM
tbl_myTable [BaseTable]
LEFT OUTER JOIN tbl_myTable [DupCheckTable] ON
[BaseTable].A = [DupCheckTable].A
AND
[BaseTable].B = [DupCheckTable].B
AND
--... repeat for each actual field
--AND
[BaseTable].G = [DupCheckTable].G
AND
[BaseTable].MyTableKey < [DupCheckTable].MyTableKey --the less than operator prevents you from getting each match twice
WHERE
[DupCheckTable].MyTableKey IS NOT NULL
I think this will run faster because you can use the table key, which is presumably indexed, as part of the join. Also, you feed any of your (or my) queries to the Tuning Advisor to see what it thinks would help in the lines of statistics and indexes.

Resources