Creating test data for calculation using RAND() - sql-server

I attempted to populate a table with two columns of random FLOATs, but every row generated was identical.
;WITH CTE (x, y) AS (
SELECT RAND(), RAND()
UNION ALL
SELECT x, y FROM CTE
)
--INSERT INTO CalculationTestData (x, y)
SELECT TOP 5000000 x, y
FROM CTE
OPTION (MAXRECURSION 0)
I can accomplish what I need just fine without the CTE, but this has piqued my curiosity.
Is there a way to do this quickly?
I know "quickly" is a relative term; by it, I mean roughly as fast as the statement above executes.

What do you expect, other than for the CTE to repeat the rows? Your recursive member is just selecting them again:
SELECT RAND(), RAND() -- SELECT 9 , 10
UNION ALL
SELECT x, y -- SELECT 9 , 10
What you want to do is more like this:
SELECT RAND(), RAND()
UNION ALL
SELECT RAND(), RAND() -- but the problem is that this 'row' will be duplicated
So you need to seed and reseed for each row, giving you something like:
SELECT RAND(CAST(NEWID() AS VARBINARY)),
RAND(CAST(NEWID() AS VARBINARY))
UNION ALL
SELECT RAND(CAST(NEWID() AS VARBINARY)),
RAND(CAST(NEWID() AS VARBINARY))
Using NEWID() as the seed is one way; there may well be others that are more efficient.
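To make the mechanism concrete, here is a small Python model (not T-SQL) of what the CTE in the question does. The function names are hypothetical, and the model assumes, as described above, that the random values are drawn once in the anchor member and that the recursive member only re-selects the previous row.

```python
import random

def copy_previous_row(n_rows, rng):
    """Model of the question's CTE: random values are drawn once in the
    anchor member, and the recursive member re-selects the previous row."""
    anchor = (rng.random(), rng.random())
    rows = [anchor]
    while len(rows) < n_rows:
        rows.append(rows[-1])  # corresponds to: SELECT x, y FROM CTE
    return rows

def fresh_values_per_row(n_rows, rng):
    """Model of the goal: a new pair of random values for every row."""
    return [(rng.random(), rng.random()) for _ in range(n_rows)]

rng = random.Random(42)  # fixed seed so the demo is repeatable
repeated = copy_previous_row(5, rng)
fresh = fresh_values_per_row(5, rng)
print(len(set(repeated)))  # 1 distinct row
print(len(set(fresh)))
```

The first function can never return more than one distinct row, whatever the random source is, which is the behaviour seen in the question.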

Try this instead of rand(): it will give a random positive whole number on each entry. I had the same issue with rand() recently
ABS(Checksum(NewID()))
Float:
cast(ABS(Checksum(NewID()) ) as float)
To be Clear:
;WITH CTE (x, y) AS (
SELECT cast(ABS(Checksum(NewID()) ) as float), cast(ABS(Checksum(NewID()) ) as float)
UNION ALL
SELECT x, y FROM CTE
)
Did this not give a random entry on each line?

Related

How to use recursive CTE to add resolution to a data set

I'm attempting to create a recursive CTE statement that adds blank rows in between data points that will later be used for interpolation. I'm a beginner with SQL and this is my first time using CTEs, and I am having some difficulty finding the proper way to do this.
I've attempted a few slight variations on the code provided below after some research, but haven't grasped a good enough understanding to see my issue yet. The following code should simulate sparse sampling by taking an observation every 4 hours from the sample data set, and the second portion should add rows with their respective x values every 0.1 of an hour, which will later be filled with interpolated values derived from a cubic spline.
--Sample Data
create table #temperatures (hour integer, temperature double precision);
insert into #temperatures (hour, temperature) values
(0,18.5),
(1,16.9),
(2,15.3),
(3,14.1),
(4,13.8),
(5,14.7),
(6,14.7),
(7,13.5),
(8,12.2),
(9,11.4),
(10,10.9),
(11,10.5),
(12,12.3),
(13,16.4),
(14,22.3),
(15,27.2),
(16,31.1),
(17,34),
(18,35.6),
(19,33.1),
(20,25.1),
(21,21.3),
(22,22.3),
(23,20.3),
(24,18.4),
(25,16.8),
(26,15.6),
(27,15.4),
(28,14.7),
(29,14.1),
(30,14.2),
(31,14),
(32,13.9),
(33,13.9),
(34,13.6),
(35,13.1),
(36,15),
(37,18.2),
(38,21.8),
(39,24.1),
(40,25.7),
(41,29.9),
(42,28.9),
(43,31.7),
(44,29.4),
(45,30.7),
(46,29.9),
(47,27);
--1
WITH xy (x,y)
AS
(
SELECT TOP 12
CAST(hour AS double precision) AS x
,temperature AS y
FROM #temperatures
WHERE cast(hour as integer) % 4 = 0
)
Select x,y
INTO #xy
FROM xy
Select [x] As [x_input]
INTO #x_series
FROM #xy
--2
with recursive
, x_series(input_x) as (
select
min(x)
from
#xy
union all
select
input_x + 0.1
from
x_series
where
input_x + 0.1 < (select max(x) from x)
)
, x_coordinate as (
select
input_x
, max(x) over(order by input_x) as previous_x
from
x_series
left join
#xy on abs(x_series.input_x - xy.x) < 0.001
)
The first CTE works as expected and produces a list of 12 rows (a sample every 4 hours for two days), but the second produces a syntax error. The expected output would be something like
(4,13.8), (4.1,null/0), (4.2,null/0),....., (8,12.2)

I don't think you need a recursive CTE.
What about this:
SELECT DISTINCT n = number *1.0 /10 , #xy.x, #xy.y
FROM master..[spt_values] step
LEFT JOIN #xy
ON step.number*1.0 /10 = #xy.x
WHERE number BETWEEN 40 AND 480
The 480 is based on the two days you mention.
You don't even need the temp table:
SELECT DISTINCT n = number *1.0 /10 , #temperatures.temperature
FROM master..[spt_values] step
LEFT JOIN #temperatures
ON step.number *1.0 / 10 = #temperatures.hour
AND #temperatures.hour % 4 = 0
WHERE number BETWEEN 40 AND 480;

I don't think you need a recursive CTE here. I think a solution like this would be a better approach. Modify accordingly.
DECLARE @max_value FLOAT =
(SELECT MAX(hour) FROM #temperatures) * 10
-- hour must be a fractional type (e.g. FLOAT) for the 0.1 steps to survive the insert
INSERT INTO #temperatures (hour, temperature)
SELECT X.N / 10, NULL
FROM (
select CAST(ROW_NUMBER() over(order by t1.number) AS FLOAT) AS N
from master..spt_values t1
cross join master..spt_values t2
) X
WHERE X.N <= @max_value
AND X.N / 10 NOT IN (SELECT hour FROM #temperatures)

Using the temp table #xy that you produced in step --1, the following will give you an x series:
;with x_series(input_x)
as
(
select min(x) AS input_x
from #xy
union all
select input_x + 0.1
from x_series
where input_x + 0.1 < (select max(x) from #xy)
)
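SELECT * FROM x_series
OPTION (MAXRECURSION 0); -- 0 to 44 in steps of 0.1 needs more than the default 100 recursion levels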
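One thing to watch with this approach: x is double precision, so repeatedly adding 0.1 accumulates floating-point error, which is presumably why the question joins with a tolerance (abs(...) < 0.001). The Python sketch below illustrates the effect and an alternative that derives each value from an integer counter. The function name x_series and its parameters are assumptions made for the illustration, not part of the original code.

```python
def x_series(start, stop, steps_per_unit=10):
    """Return start, start + 1/steps_per_unit, ... strictly below stop,
    computing each value from an integer counter instead of adding 0.1 repeatedly."""
    count = round((stop - start) * steps_per_unit)
    return [start + i / steps_per_unit for i in range(count)]

# Repeated addition accumulates floating-point error.
acc = 0.0
for _ in range(10):
    acc += 0.1
print(acc == 1.0)  # False

# Deriving each value from an integer counter lands exactly on whole hours.
series = x_series(0, 44)
print(series[10] == 1.0)  # True
print(len(series))        # 440
```

The same idea carries over to SQL by generating integers (for example from a numbers table) and dividing by 10, as the other answers do.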

Multiple date ranges using CTE

I need to generate a table of half hour periods. I have the following which works:
WITH ctePeriods AS
(
SELECT @gapStart HalfHourPeriod
UNION ALL
SELECT DATEADD(MINUTE, 30, HalfHourPeriod)
FROM ctePeriods
WHERE HalfHourPeriod < DATEADD(MINUTE, -30, @gapEnd)
)
This gives me the values for the range between @gapStart and @gapEnd.
However, I also have a table of ranges which I need to generate:
create table #gaps(HHFrom datetime, HHTo datetime)
Currently I get the values for @gapStart and @gapEnd used above by taking the min and max from #gaps. But this means I'm filling in more rows than I need in ctePeriods.
Is there any way that I can use the rows in #gaps within ctePeriods so I only create the rows that I need?

I personally prefer using a Tally Table for things like this. You can use a persisted Tally Table, or create one on the fly (as I do here):
CREATE TABLE #gaps (HHFrom datetime,
HHTo datetime);
INSERT INTO #gaps (HHFrom,
HHTo)
VALUES('20190101','20190103'),
('20190217','20190315'),
('20190708',GETDATE());
GO
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS I
FROM N N1, N N2, N N3, N N4, N N5, N N6), --1000000 rows, feel free to increase/decrease per your own requirement
Dates AS(
SELECT G.HHFrom,
G.HHTo,
DATEADD(MINUTE, 30*T.I, G.HHFrom) AS HH
FROM #gaps G
CROSS JOIN Tally T
WHERE DATEADD(MINUTE, 30*T.I, G.HHFrom) <= G.HHTo)
SELECT *
FROM Dates D
ORDER BY D.HHFrom, D.HH;
GO
DROP TABLE #gaps;
Unlike an rCTE, this won't "fall over" for large ranges once a range needs more than 100 rows (100 is the default recursion limit), and it avoids recursion altogether.
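For readers who want to check the row counts, the following Python sketch models what the Dates CTE produces: one row per half hour per range, with the end point included because the filter uses <=. The function name half_hour_periods is hypothetical, and only the two fixed ranges from the sample data are used.

```python
from datetime import datetime, timedelta

def half_hour_periods(gaps):
    """Yield (range_start, period_start) for every half hour inside each range,
    including the end point when it falls exactly on a half-hour step."""
    step = timedelta(minutes=30)
    for start, end in gaps:
        current = start
        while current <= end:
            yield start, current
            current += step

gaps = [
    (datetime(2019, 1, 1), datetime(2019, 1, 3)),
    (datetime(2019, 2, 17), datetime(2019, 3, 15)),
]
periods = list(half_hour_periods(gaps))
first_range = [p for s, p in periods if s == datetime(2019, 1, 1)]
print(len(first_range))  # 97
```

Two days contain 96 half-hour steps, and the inclusive end point adds one more row. No rows are generated for the time between the two ranges, which is what the question asks for.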

Should I use a cursor for this?

I have a table with three fields. Group number, X-coord and Y-coord. There can be from 0 to about 10 rows within each group number.
What I want to do is calculate the maximum and minimum distance between points within each group. Obviously, this will only give you a value if there are 2 or more rows within that group.
Output should consist of fields: group number, minDistance, maxDistance.
Is a cursor a good solution for this?
(Coordinates are in WGS84 and I have a working formula for calculating distances)
My reasoning for using a cursor is that I cannot avoid doing a cross join for each group and then applying the formula for each result of the cross join.

I wouldn't use a cursor in your situation. I would rather use an inline table-valued user-defined function that takes the group number as an argument and calculates the minimum and maximum distance for that group.
Please note the calculation algorithm inside the function is much simpler than what you may have.
create table dist (groupId int, X int, Y int)
insert into dist(groupid, x, y) values (1,14,20),(1,11,20),(1,10,22),(1,12,24),(1,11,28),(1,19,78)
insert into dist(groupid, x, y) values (2,10,20),(2,11,20),(2,10,22),(2,12,24),(2,11,28),(2,17,52)
GO
create function dbo.getMinMaxDistanceForGroup (@groupId int)
returns table as return (
select MIN(SQRT(SQUARE(b.X - a.X) + SQUARE(b.Y - a.Y))) MinDistance,
MAX(SQRT(SQUARE(b.X - a.X) + SQUARE(b.Y - a.Y))) MaxDistance
from dist a cross join dist b
where a.groupId = @groupId and b.groupId = @groupId
and (a.X <> b.X or a.Y <> b.Y) -- skip a point paired with itself, otherwise MinDistance is always 0
)
GO
select groupId, MinDistance, MaxDistance
from dist OUTER APPLY dbo.getMinMaxDistanceForGroup(groupId)
group by groupid, MinDistance, MaxDistance
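As a cross-check of the expected results, here is a Python sketch of the same calculation on the sample data. It uses plain Euclidean distance like the answer does (not a WGS84 formula), only compares distinct pairs of points, and returns None for groups with fewer than two points. The function name min_max_distance and group 3 are made up for the illustration.

```python
from itertools import combinations
from math import hypot

def min_max_distance(points):
    """Return (min, max) Euclidean distance over all distinct pairs,
    or None when there are fewer than two points."""
    distances = [hypot(bx - ax, by - ay)
                 for (ax, ay), (bx, by) in combinations(points, 2)]
    if not distances:
        return None
    return min(distances), max(distances)

groups = {
    1: [(14, 20), (11, 20), (10, 22), (12, 24), (11, 28), (19, 78)],
    2: [(10, 20), (11, 20), (10, 22), (12, 24), (11, 28), (17, 52)],
    3: [(5, 5)],  # hypothetical single-point group, not in the answer's data
}
results = {g: min_max_distance(pts) for g, pts in groups.items()}
print(results[3])  # None
```

With at most about 10 rows per group, comparing every pair is cheap (45 pairs at most), so the cross join inside each group is not a performance concern.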

How to select Top % in T-SQL without using Top clause?

How to select Top 40% from a table without using the Top clause (or Top percent, the assignment is a little ambiguous) ? This question is for T-SQL, SQL Server 2008. I am not allowed to use Top for my assignment.
Thanks.
This is what I've tried but seems complicated. Isn't there an easier way ?
select top (convert (int, (select round (0.4*COUNT(*), 0) from MyTable))) * from MyTable

Try the NTILE function:
;WITH YourCTE AS
(
SELECT
(some columns),
percentile = NTILE(10) OVER(ORDER BY SomeColumn DESC)
FROM
dbo.YourTable
)
SELECT *
FROM YourCTE
WHERE percentile <= 4
NTILE(10) OVER(....) splits the ordered rows into 10 groups of (roughly) equal size, so the top 40% are groups 1, 2, 3 and 4 of that result.

Use NTILE
CREATE TABLE #temp(StudentID CHAR(3), Score INT)
INSERT #temp VALUES('S1',75 )
INSERT #temp VALUES('S2',83)
INSERT #temp VALUES('S3',91)
INSERT #temp VALUES('S4',83)
INSERT #temp VALUES('S5',93 )
INSERT #temp VALUES('S6',75 )
INSERT #temp VALUES('S7',83)
INSERT #temp VALUES('S8',91)
INSERT #temp VALUES('S9',83)
INSERT #temp VALUES('S10',93 )
SELECT * FROM (
SELECT NTILE(10) OVER(ORDER BY Score) AS NtileValue,*
FROM #temp) x
WHERE NtileValue <= 4
ORDER BY 1
Interestingly enough, I blogged about NTILE today: Does anyone use the NTILE() windowing function?

A problem with the NTILE(10) answers given so far is that if the table has 15 rows they will return 8 rows (53%) rather than the correct number to make up 40% (6).
If the number of rows is not evenly divisible by number of buckets the extra rows all go into the first buckets rather than being evenly distributed.
This alternative (borrows SQL Menace's table) avoids that issue.
WITH CTE
AS (SELECT *,
ROW_NUMBER() OVER ( ORDER BY Score) AS RN,
COUNT(*) OVER() AS Cnt
FROM #temp)
SELECT StudentID,
Score
FROM CTE
WHERE RN <= CEILING(0.4 * Cnt )
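The arithmetic behind the 15-row example can be checked with a short Python model. It assumes NTILE's documented behaviour (when the row count is not divisible by the number of buckets, the larger buckets come first) and uses hypothetical function names; the percentage is handled in integer arithmetic to avoid floating-point rounding.

```python
def ntile_bucket_sizes(row_count, buckets):
    """Bucket sizes as NTILE assigns them: the remainder rows go to the
    first buckets, one extra row each."""
    base, extra = divmod(row_count, buckets)
    return [base + 1] * extra + [base] * (buckets - extra)

def rows_via_ntile(row_count, buckets=10, keep=4):
    """Number of rows returned by WHERE NtileValue <= keep."""
    return sum(ntile_bucket_sizes(row_count, buckets)[:keep])

def rows_via_row_number(row_count, percent=40):
    """Number of rows returned by WHERE RN <= CEILING(0.4 * Cnt),
    computed as an integer ceiling of row_count * percent / 100."""
    return (row_count * percent + 99) // 100

print(ntile_bucket_sizes(15, 10))  # [2, 2, 2, 2, 2, 1, 1, 1, 1, 1]
print(rows_via_ntile(15))          # 8
print(rows_via_row_number(15))     # 6
```

When the row count is a multiple of 10, both approaches return the same number of rows; they only diverge when there is a remainder.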

Using Top t-sql command:
select top 10 [Column_1],
[Column_2] from [Table]
order by [Column_1]
Using Paging method:
select
[Column_1],
[Column_2]
from
(Select ROW_NUMBER() Over (ORDER BY [Column_1]) AS Row,
[Column_1],
[Column_2]
FROM [Table]) as [alias]
WHERE (Row between 0 and 10)
This finds the top 10 rows ordered by [Column_1]. Please note that the bracketed names are placeholders.
If you could provide column names and table names, I could write much more useful T-SQL. For example, to find the top 40% you will need another sub-query to get the count of all rows and then do the division; I'd likely run this as a query before the main query.

Calculate and set ROWCOUNT for the number of records you want.
Then execute your query for the limited set.
declare @rc as integer
select @rc = count(*)*0.40 from CTE
Set ROWCOUNT @rc
select * from CTE
ROWCOUNT is not deprecated yet - see http://msdn.microsoft.com/en-us/library/ms188774.aspx

SQL Filtering A Result Set To Return A Maximum Amount Of Rows At Even Intervals

I currently use SQL2008, where I have a stored procedure that fetches data from a table that then gets fed into a line graph on the client. This procedure takes a from date and a to date as parameters to filter the data. This works fine for small datasets, but the graph gets a bit muddled when a large date range is entered and produces thousands of results.
What I'd like to do is provide a max amount of records to be returned and return records at evenly spaced intervals to give that amount. For example, say I limited it to 10 records and the result set was 100 records: I'd like the stored procedure to return every 10th record.
Is this possible without suffering big performance issues, and what would be the best way to achieve it? I'm struggling to find a way to do it without cursors, and if that's the case I'd rather not do it at all.
Thanks

Assuming you use at least SQL2005, you could do something like
WITH p as (
SELECT a, b,
row_number() OVER(ORDER BY time_column) as row_no,
count(*) OVER() as total_count
FROM myTable
WHERE <date is in range>
)
SELECT a, b
FROM p
WHERE row_no % (total_count / 10) = 1
The WHERE condition at the bottom takes the row number modulo the total number of records divided by the required number of final records.
If you want to use the average instead of one specific value, you would extend this as follows:
WITH p as (
SELECT a, b,
row_number() OVER(ORDER BY time_column) as row_no,
count(*) OVER() as total_count
FROM myTable
WHERE <date is in range>
),
a as (
SELECT a, b, row_no, total_count,
avg(a) OVER(partition by row_no / (total_count / 10)) as avg_a
FROM p
)
SELECT a, b, avg_a
FROM a
WHERE row_no % (total_count / 10) = 1
The partition by clause reuses the formula from the final WHERE clause, with the % replaced by /.
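The selection logic can be sketched in Python to see which rows survive. Note that this is a variation rather than a literal translation: it uses ceiling division for the step so that at most max_points rows come back, and it guards against a step of 0, which the integer division in the query above would produce when fewer than 10 rows are in range. The function name sample_evenly is hypothetical.

```python
def sample_evenly(rows, max_points):
    """Keep every step-th row, starting with the first, so that at most
    max_points rows are returned."""
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    step = max(1, -(-len(rows) // max_points))  # ceiling division, never 0
    return rows[::step]

readings = list(range(1, 101))       # stand-in for 100 ordered rows
print(sample_evenly(readings, 10))   # [1, 11, 21, 31, 41, 51, 61, 71, 81, 91]
print(len(sample_evenly(list(range(105)), 10)))  # 10
print(sample_evenly([1, 2, 3], 10))  # [1, 2, 3]
```

Starting at the first row corresponds to the = 1 comparison on the 1-based row number in the query.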
