Avg of float inconsistency

Avg of float inconsistency - sql-server

The select returns right at 23,000 rows
The EXCEPT returns between 60 and 200 rows (and not the same rows each time)
The EXCEPT should return 0 rows, as it is SELECT a EXCEPT SELECT a
PK: [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
[tf] is a float, and I get that float is not exact
But I naively thought avg(float) would be repeatable
Avg(float) does not appear to be repeatable
What is the solution?
TF is between 0 and 1, and I only need about 5 significant digits
I just need avg(TF) to be the same number run to run
Decimal(9,8) gives me enough precision and if I cast to decimal(9,8) the except properly returns 0
I can change [TF] to decimal(9,8), but it will be a bit of work and a lot of regression testing, as some of the tests that use [tf] take over a day to run
Is changing [TF] to decimal(9,8) the best solution?
SELECT [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
, avg([FTSindexWordOnce].[tf]) AS [avgTFraw]
FROM [docSVenum1]
JOIN [docFieldLock]
ON [docFieldLock].[sID] = [docSVenum1].[sID]
AND [docFieldLock].[fieldID] = [docSVenum1].[enumID]
AND [docFieldLock].[lockID] IN (4, 5) /* secLvl docAdm */
JOIN [FTSindexWordOnce]
ON [FTSindexWordOnce].[sID] = [docSVenum1].[sID]
GROUP BY [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
except
SELECT [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
, avg([FTSindexWordOnce].[tf]) AS [avgTFraw]
FROM [docSVenum1]
JOIN [docFieldLock]
ON [docFieldLock].[sID] = [docSVenum1].[sID]
AND [docFieldLock].[fieldID] = [docSVenum1].[enumID]
AND [docFieldLock].[lockID] IN (4, 5) /* secLvl docAdm */
JOIN [FTSindexWordOnce]
ON [FTSindexWordOnce].[sID] = [docSVenum1].[sID]
GROUP BY [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
order by [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
In this case, tf is the term frequency of tf-idf
tf normalization is subjective and does not require much precision
Avg(tf) needs to be consistent from select to select, or the results are not consistent
In a single select with joins, I need a consistent avg(tf)
Switching to decimal with a low precision for tf gave consistent results

This is very similar to: SELECT SUM(...) is non-deterministic when adding the column-values of datatype float.
The problem is that with an approximate datatype (FLOAT/REAL), the order of arithmetic operations on floating-point values matters. Demo from Connect:
DECLARE @fl FLOAT = 100000000000000000000
DECLARE @i SMALLINT = 0
WHILE (@i < 100)
BEGIN
-- near 1e20, adjacent floats are 16384 apart, so each +5000 rounds back to 1e20
SET @fl = @fl + CONVERT(float, 5000)
SET @i = @i + 1
END
SET @fl = @fl - 100000000000000000000
SELECT CONVERT(NVARCHAR(40), @fl, 2)
-- 0.000000000000000e+000
GO
-- same values in a different order: add the small values first, then add and subtract 1e20
DECLARE @fl FLOAT = 0
DECLARE @i SMALLINT = 0
WHILE (@i < 100)
BEGIN
SET @fl = @fl + CONVERT(float, 5000)
SET @i = @i + 1
END
SET @fl = @fl + 100000000000000000000
SET @fl = @fl - 100000000000000000000
SELECT @fl
-- 507904
Possible solutions:
CAST all arguments to an exact datatype like DECIMAL/NUMERIC
ALTER TABLE and change FLOAT to DECIMAL
You can try to force the query optimizer to calculate the sum in the same order:
The good news is that when a stable query result matters to your application, you can force the order to be the same by preventing parallelism with OPTION (MAXDOP 1).
It looks like the initial link is dead; a copy is available on the Web Archive.
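Here is a minimal sketch of the first option: average an exact-numeric copy of the float column instead of the float itself. The table variable `@t` and its columns `grp` and `tf` are hypothetical stand-ins for `[FTSindexWordOnce]`. `DECIMAL(9, 8)` is assumed from the question's statement that tf lies between 0 and 1. The `OPTION (MAXDOP 1)` line shows how the third option could be combined with it.

```sql
-- Hypothetical stand-in for [FTSindexWordOnce]; tf assumed to lie in [0, 1].
DECLARE @t TABLE (grp INT NOT NULL, tf FLOAT NOT NULL);
INSERT INTO @t (grp, tf) VALUES (1, 0.1), (1, 0.2), (1, 0.3), (2, 0.125), (2, 0.375);

SELECT grp,
       AVG(tf) AS avgTFraw,                          -- float: last digits can depend on summation order
       AVG(CAST(tf AS DECIMAL(9, 8))) AS avgTFexact  -- exact numeric: same result in any order
FROM @t
GROUP BY grp
ORDER BY grp
OPTION (MAXDOP 1);                                   -- optional: also forces a serial plan
```

Because the cast happens inside AVG, this can be tried on the existing query without altering the table. That may be a cheaper first step than changing the column type.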

Related

How can I query a varchar(x) value for the number of decimal places

I have a varchar(250) ParameterValue that I would like to check the number of decimal places in.
This is the line of code that I cannot get working:
where RIGHT(CAST(ParameterValue as DECIMAL(10,5)), ParameterValue), 1) != 0
The code below is where the line of code is used:
select *
INTO #ParamPrecision
from Data_table
where RIGHT(CAST(ParameterValue as DECIMAL(10,5)), ParameterValue), 1) != 0
AND ParameterBindStatus = 0
UPDATE a
SET a.ParameterBindStatus = 5
FROM Data_table a, #ParamPrecision b
WHERE a.SQLParameterId = b.SQLParameterId
INSERT Log_table
(
SQLBatchId,
SQLProcess,
Error,
SQLError_Message,
ParametersSent
)
SELECT SQLBatchId,
'sp_ReadParametersToBindData',
1,
'Invalid parameter value sent from MES',
'some parameter info'
FROM #ParamPrecision
SELECT *
INTO #UnBoundPrompt
FROM Data_table
WHERE DATEADD(DAY, 1, SQLTimeStamp) < GETDATE()
AND ParameterBindStatus = 0
UPDATE a
SET a.ParameterBindStatus = 99
FROM Data_table a, #UnBoundPrompt b
WHERE a.SQLParameterId = b.SQLParameterId
INSERT Log_table
(
SQLBatchId,
SQLProcess,
Error,
SQLError_Message,
ParametersSent
)
SELECT SQLBatchId,
'sp_ReadParametersToBindData',
1,
'Parameter download timeout',
'some parameter info'
FROM #UnBoundPrompt
If the check for decimal places is not satisfied, the next select statement checks if the parameter timestamp is active for more than 1 day. If this is satisfied, a log entry is made.
If the number of decimal places exceeds 4, then I want to set the ParameterBindStatus = 5 and update the log table.
I have changed the code as follows to confirm that the rest of the code works, and it does; the code only fails when it tries to detect the number of decimal places.
select *
INTO #ParamPrecision
from Data_table
where ParameterValue > '1500'
AND ParameterBindStatus = 0

This may help with your precision problem. I've laid it out as a table so you can see each step of the transformation, and the pattern is easy to follow :) Essentially, you reverse the string and truncate it at the decimal point, so the length of what remains is the number of decimal places. All steps are included (it can be done faster). You may need to add a case for values with no decimal point.
--setup
create table test
(stringVal varchar(250));
insert into test values
('12.3456'),
('1.2345678'),
('12'),
('0.123')
--query
SELECT stringVal,
Reverse(CONVERT(VARCHAR(50), stringVal, 128)) as reversedText
, Cast(Reverse(CONVERT(VARCHAR(50), stringVal, 128)) as float) as float
, Cast(Cast(Reverse(CONVERT(VARCHAR(50), stringVal, 128)) as float) as bigint) as bigint
, len(Cast(Cast(Reverse(CONVERT(VARCHAR(50), stringVal, 128)) as float) as bigint)) as decimalPrecision
FROM test
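An alternative sketch avoids the float round trip and also handles values with no decimal point. It counts the characters after the first `.` with `CHARINDEX`. It assumes `ParameterValue` holds plain decimal text such as `'12.3456'`, with no exponent and no thousands separators. `@test` is a hypothetical table variable that mirrors the `test` table above.

```sql
-- Hypothetical sample data mirroring the test table above.
DECLARE @test TABLE (stringVal VARCHAR(250));
INSERT INTO @test (stringVal) VALUES ('12.3456'), ('1.2345678'), ('12'), ('0.123');

-- Digits after the decimal point = total length minus the position of '.'; 0 if there is no '.'.
SELECT stringVal,
       CASE WHEN CHARINDEX('.', stringVal) = 0 THEN 0
            ELSE LEN(stringVal) - CHARINDEX('.', stringVal)
       END AS decimalPlaces
FROM @test;
```

In the question's procedure, the same expression could replace the failing filter. For example: `WHERE CHARINDEX('.', ParameterValue) > 0 AND LEN(ParameterValue) - CHARINDEX('.', ParameterValue) > 4 AND ParameterBindStatus = 0`. Trailing zeros (as in `'1.50000'`) are counted as decimal places by this approach.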

TSQL remap big number to smaller, but keep Identity

I'm trying to load a huge number into Field1, an INT column that can hold at most 2,147,483,647. I can't change the DDL, so I tried to find an ad hoc solution: cut a single digit out of the middle of the number and then add a check for uniqueness.
These numbers are in a format like 29000001234, and I want to keep the zeros in the middle so they are easy to recognize. I don't want to introduce any new columns or tables, as my freedom is limited there: this is a third-party schema.
Can anybody suggest a better solution for remapping all numbers so they stay under that limit? This is my draft:
DECLARE @fl FLOAT = 29000001234
DECLARE @i INT
SELECT @i = (SUBSTRING(CAST(CAST(@fl AS BIGINT) AS VARCHAR(18)),1,4) +
SUBSTRING(CAST(CAST(@fl AS BIGINT) AS VARCHAR(18)),7,LEN(CAST(CAST(@fl AS BIGINT) AS VARCHAR(18)))) )
select @i;

But if you really want to remove the middle digits, here's another approach:
DECLARE @fl FLOAT = 29000001234
DECLARE @i INT
DECLARE @StringFloat as varchar(80)
SET @StringFloat = CONVERT(varchar(80), CAST(@fl AS bigint))
SET @i = CAST( CONCAT(LEFT( @StringFloat, 4 ), RIGHT( @StringFloat, 5 )) as int )
SELECT @i;

I think arithmetic operations should be less expensive than string operations, so you should use them instead:
DECLARE @fl FLOAT = 29000001234
DECLARE @flBig BIGINT = @fl
DECLARE @i INT
SET @i = (@flBig / 1000000000) * 10000000 + (@flBig % 100000000)
select @i; --> 290001234
The example assumes the first part of the number has at most two digits (i.e. 29 in your case) and that you want to allow a larger number in the left part (up to 999999).
NOTE: the parentheses are redundant, as division and multiplication have the same precedence and the modulo operator has higher precedence than addition. I have used them just to highlight the parts of the computation.
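Digits are discarded, so two different source values can land on the same INT, and the uniqueness check mentioned in the question is still needed. Below is a minimal sketch of that check. It applies the formula above to a hypothetical table variable `@src`. In the sample data, 29000001234 and 29100001234 differ only in a dropped digit.

```sql
-- Hypothetical source values; the last one differs from the first only in a dropped digit.
DECLARE @src TABLE (bigVal BIGINT NOT NULL);
INSERT INTO @src (bigVal) VALUES (29000001234), (29000001235), (31000001234), (29100001234);

-- Apply the remap formula from above and list every INT value produced more than once.
WITH Remapped AS (
    SELECT bigVal,
           CAST((bigVal / 1000000000) * 10000000 + (bigVal % 100000000) AS INT) AS newVal
    FROM @src
)
SELECT newVal, COUNT(*) AS sourceRows
FROM Remapped
GROUP BY newVal
HAVING COUNT(*) > 1;
```

With this sample data, the query returns a single row: 290001234 with a count of 2.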

You can't do that without either an arithmetic overflow or losing your original data.
If you are limited in the columns of your destination table or query, use multiple rows:
declare @c bigint = 29000001234;
declare @s bigint = 1000000000; -- Separator value
;with cte(partNo, partValue) as (
select 1, @c % @s
union all
select partNo + 1, (@c / power(@s, partNo)) % @s
from cte
where (@c / power(@s, partNo)) > 0
)
select partValue
from cte;
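To confirm that splitting the value into rows loses nothing, the parts can be recombined by weighting each one by a power of the separator. Below is a minimal sketch that reuses the same CTE. `@rebuilt` is a name introduced here for illustration, and the sketch assumes part 1 holds the lowest-order digits, as in the query above.

```sql
declare @c bigint = 29000001234;
declare @s bigint = 1000000000; -- Separator value
declare @rebuilt bigint;        -- illustrative: the recombined value
;with cte(partNo, partValue) as (
    select 1, @c % @s
    union all
    select partNo + 1, (@c / power(@s, partNo)) % @s
    from cte
    where (@c / power(@s, partNo)) > 0
)
-- part 1 * @s^0 + part 2 * @s^1 + ...
select @rebuilt = sum(partValue * power(@s, partNo - 1))
from cte;
select @rebuilt as rebuilt; -- 29000001234
```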

It seems like a strange situation; I'm not sure why you need to go to all the trouble of converting a big number to a string and then randomly removing a digit. Some more information about why, or what the real goal is, would be helpful.
That said, maybe it would be easier to just subtract a constant amount from these values? e.g.:
DECLARE @fl FLOAT = 29000001234
DECLARE @I INT
DECLARE @OFFSET BIGINT = 29000000000
SET @I = CAST(@fl AS BIGINT)-@OFFSET
SELECT @I
Which gives you an INT of 1234 as the result using your example.

The following query drops increasingly wide blocks of digits from the original third-party value and returns the results that fit in an INT. The results could be outer joined with the existing data to find a suitable new value.
declare @ThirdPartyValue as BigInt = 29000001234;
declare @MaxInt as BigInt = 2147483647;
declare @TPV as VarChar(19) = Cast( @ThirdPartyValue as VarChar(19) );
declare @TPVLen as Int = Len( @TPV );
with
-- 0 through 9.
Digits as (
select Digit from ( values (0), (1), (2), (3), (4), (5), (6), (7), (8), (9) ) as Digits( Digit ) ),
-- 0 through @TPVLen .
Positions as (
select Ten_1.Digit * 10 + Ten_0.Digit as Number
from Digits as Ten_0 cross join Digits as Ten_1
where Ten_1.Digit * 10 + Ten_0.Digit <= @TPVLen ),
-- 1 through @TPVLen - 1 .
Widths as (
select Number
from Positions
where 0 < Number and Number < @TPVLen ),
-- Try dropping Width digits at Position from @TPV .
AlteredTPVs as (
select P.Number as Position, W.Number as Width,
Stuff( @TPV, P.Number, W.Number, '' ) as AlteredTPV
from Positions as P cross join Widths as W
where P.Number + W.Number <= @TPVLen )
-- See which results fit in an Int .
select Position, Width, AlteredTPV, Cast( AlteredTPV as BigInt ) as AlteredTPVBigInt
from AlteredTPVs
where Cast( AlteredTPV as BigInt ) <= @MaxInt -- Comment out this line to see all results.
order by Width, Position
It could be more clever about returning only distinct new values.
This general idea could be used to hunt down blocks of zeroes or other suitable patterns to arrive at a set of values to be tested against the existing data.

How to generate random alphanumeric unique characters with specified length

Problem is described below:
Generate unique alphanumeric strings.
Each string should be 32 characters long.
The values may be seeded with the current time to help ensure uniqueness.
Alphabet characters must come from this pool: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
Sample Output: 445rpxlKYPkj1pg4q8nAy7Ab91zxZ8v1
I can do this in Java, but would greatly appreciate it if you could help me do this in MS SQL / T-SQL.

First, you need to split the string into separate rows. Then, do a SELECT with ORDER BY NEWID() for the random sort. Finally, use FOR XML PATH('') to concatenate them back:
DECLARE @str VARCHAR(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
;WITH E1(N) AS( -- 10 ^ 1 = 10 rows
SELECT 1 FROM(VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(N)
),
E2(N) AS(SELECT 1 FROM E1 a CROSS JOIN E1 b), -- 10 ^ 2 = 100 rows
E4(N) AS(SELECT 1 FROM E2 a CROSS JOIN E2 b), -- 10 ^ 4 = 10,000 rows
CteTally(N) AS(
SELECT TOP(LEN(@str)) ROW_NUMBER() OVER(ORDER BY(SELECT NULL))
FROM E4
)
SELECT (
SELECT TOP(32)
SUBSTRING(@str, N, 1)
FROM CteTally t
ORDER BY NEWID()
FOR XML PATH('')
) AS Result
The above is more of a generic random string generator. You can modify it to suit your need. If the requirement will not change, you can simply use this:
DECLARE @str VARCHAR(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
;WITH E1(N) AS( -- 52 Rows
SELECT 1 FROM( VALUES
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1)
)t(N)
),
CteTally(N) AS(
SELECT ROW_NUMBER() OVER(ORDER BY(SELECT NULL))
FROM E1
)
SELECT (
SELECT TOP(32)
SUBSTRING(@str, N, 1)
FROM CteTally t
ORDER BY NEWID()
FOR XML PATH('')
) AS Result
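Both queries above sample the pool without replacement, so no character can appear twice in the result. Repeats may be acceptable, since the sample output in the question does repeat characters. In that case, a minimal loop-based sketch can pick each position independently. It uses `CRYPT_GEN_RANDOM(4)` as the random source. `@pool`, `@outLen`, `@k` and `@result` are illustrative names, and the tiny modulo bias from `% LEN(@pool)` is ignored.

```sql
DECLARE @pool VARCHAR(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
DECLARE @outLen INT = 32;
DECLARE @result VARCHAR(100) = '';
DECLARE @k INT = 0;

WHILE @k < @outLen
BEGIN
    -- 4 random bytes converted to BIGINT give a non-negative number in 0 .. 2^32 - 1,
    -- which is reduced to a 1-based position in the pool.
    SET @result += SUBSTRING(@pool, CAST(CAST(CRYPT_GEN_RANDOM(4) AS BIGINT) % LEN(@pool) AS INT) + 1, 1);
    SET @k += 1;
END

SELECT @result AS Result;
```

Independent random picks do not guarantee uniqueness across calls. A unique constraint on the stored values, or the newid()-based approach that follows, is still needed for that.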

I made this generic enough to handle any pool of characters and any output length. The core idea is to take a random sequence of bytes and use a base conversion algorithm to convert a long number into a new representation then translated to a string using your desired characters as its "digits".
For your specific scenario we need about 183 bits, or log2(52) x 32, to get to your desired length. newid() generates the unique bit sequence, but only 128 bits at a time, so a series of values is concatenated until there are enough. Then, with a value to operate on, the main loop is essentially the same long division we learned in elementary school. The intermediate calculations are kept in place in the varbinary array, and the loop continues only until enough output characters are obtained. Each iteration determines another low-order digit in the new base, and the loop can terminate early since those digits won't change. The algorithm can't guarantee any global uniqueness if the output doesn't consume at least all of one newid(), so make sure log2(len(pool)) x output length is at least 128.
The target base, which is ultimately the length of the character pool, can't be more than 256. I hard-coded a limitation by setting the 128-byte maximum length of @e. For the question, @e only needs to be 32 bytes long. It could be adjusted upward or downward as necessary, or just defined as varbinary(max). If you need something more truly random, you could find another source for the entropy bits, like crypt_gen_random(). Since uniqueness appears to be the primary concern, this answer fits that requirement. And by the way, repeating characters in the pool will naturally open the door for collisions.
This is fast and generic, and it can be wrapped up in a function. Note that newid() can't be called directly inside a user-defined function, so the entropy would need to be passed in. A more robust implementation would also handle the extra checks mentioned above.
declare @characterPool varchar(256) =
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
declare @outputLength int = 32;
declare @n int = 0; /* counter */
declare @numLoops int = ceiling(log(len(@characterPool)) / log(2) * @outputLength / 128)
declare @e varbinary(128) = 0x; /* entropy */
while @n < @numLoops
begin
/* append another 128 bits from newid() to the entropy */
set @e = @e + cast(newid() as binary(16)); set @n += 1;
end
declare @b int; /* running remainder * 256 + current byte */
declare @d int; /* quotient byte */
declare @out varchar(128) = '';
declare @outputBase int = len(@characterPool);
declare @entropyBytes int = datalength(@e); /* number of entropy bytes */
declare @m int = 0;
while @m < @outputLength
begin
set @b = 0; set @d = 0; set @n = 0;
while @n < @entropyBytes /* big-endian */
begin
set @b = (@b - @d * @outputBase) * 256 + cast(substring(@e, @n + 1, 1) as int);
set @d = @b / @outputBase;
set @e = cast(stuff(@e, @n + 1, 1, cast(@d as binary(1))) as varbinary(128));
set @n += 1;
end
/* the final remainder is the next low-order digit */
set @out = substring(@characterPool, @b - @d * @outputBase + 1, 1) + @out;
set @m += 1;
end
select @out as "UniqueString"
http://rextester.com/EYAK79470
As one simple test of the algorithm, you could assign a known value in hexadecimal format and confirm that the output (using 0123456789ABCDEF as the character pool) is the same hexadecimal value. In the same way, this works with base64, binary and octal.
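Below is a minimal sketch of that test. It reuses the main loop from above, with the entropy replaced by the fixed value `0x1A2B3C4D` and an output length of 8 hex digits; both values are chosen only for illustration.

```sql
declare @characterPool varchar(256) = '0123456789ABCDEF';
declare @e varbinary(128) = 0x1A2B3C4D;   /* known test value: 4 bytes = 8 hex digits */
declare @outputLength int = 8;
declare @outputBase int = len(@characterPool);
declare @entropyBytes int = datalength(@e);
declare @out varchar(128) = '';
declare @b int, @d int, @n int, @m int = 0;
while @m < @outputLength
begin
    set @b = 0; set @d = 0; set @n = 0;
    while @n < @entropyBytes   /* one long-division pass over the big-endian bytes */
    begin
        set @b = (@b - @d * @outputBase) * 256 + cast(substring(@e, @n + 1, 1) as int);
        set @d = @b / @outputBase;
        set @e = cast(stuff(@e, @n + 1, 1, cast(@d as binary(1))) as varbinary(128));
        set @n += 1;
    end
    /* the final remainder is the next low-order digit */
    set @out = substring(@characterPool, @b - @d * @outputBase + 1, 1) + @out;
    set @m += 1;
end
select @out as HexString; /* 1A2B3C4D */
```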
Update: The main loop can be made faster by not iterating over more bytes than necessary. I don't know how crypt_gen_random() compares to newid() in terms of speed or CPU usage, so this change might not even be a net positive; I'll just note it as an alternative to explore. You will want to keep the bytes from newid() on the little end and attach the rest to the front.
declare @e varbinary(1024) = cast(newid() as binary(16));
/* extra bits needed beyond newid()'s 128, rounded up to whole bytes */
declare @padBytes int = ceiling((log(len(@characterPool)) / log(2) * @outputLength - 128) / 8);
if @padBytes > 0 set @e = crypt_gen_random(@padBytes) + @e; /* big end plus little end */

Given two lat/longs, how can I tell if they are within 1 mile of each other?

I'm trying to implement a very efficient check to see whether two points are within a mile of each other.
I only care whether they are within a mile - nothing else about the distance matters to me.
Because of that narrow focus, I am not looking for a general purpose "how far apart are these points" function.
My current approach is to compute the Haversine distance, and then check to see if it's less than a mile.
Efficiency matters in this case because I have to compute this yes/no flag for large record sets.
So, what is the most efficient way to tell whether two lat/long points are within a mile of each other?
I'm doing this check in T-SQL, not that it matters much.
My current haversine computation is below.
CREATE FUNCTION dbo.USR_UFN_HAVERSINE_DISTANCE
(
@LAT1 FLOAT(18)
,@LONG1 FLOAT(18)
,@LAT2 FLOAT(18)
,@LONG2 FLOAT(18)
,@UnitOfMeasure NVARCHAR(10) = 'KILOMETERS'
)
RETURNS FLOAT(18)
AS
BEGIN
DECLARE
@R FLOAT(8)
,@DLAT FLOAT(18)
,@DLON FLOAT(18)
,@A FLOAT(18)
,@C FLOAT(18)
,@D FLOAT(18)
;
SET @R =
CASE @UnitOfMeasure
WHEN 'MILES' THEN 3956.55
WHEN 'KILOMETERS' THEN 6367.45
WHEN 'FEET' THEN 20890584
WHEN 'METERS' THEN 6367450
ELSE 6367.45 --km
END
SET @DLAT = RADIANS(@LAT2 - @LAT1);
SET @DLON = RADIANS(@LONG2 - @LONG1);
SET @A = SIN(@DLAT / 2)
* SIN(@DLAT / 2)
+ COS(RADIANS(@LAT1))
* COS(RADIANS(@LAT2))
* SIN(@DLON / 2)
* SIN(@DLON / 2);
-- Clamp SQRT(@A) to at most 1 before ASIN; T-SQL's MIN is an aggregate, not a two-argument minimum.
SET @C = 2 * ASIN(CASE WHEN SQRT(@A) > 1 THEN 1 ELSE SQRT(@A) END);
SET @D = @R * @C;
RETURN @D;
END;

Since you need to run this over large data sets, I'd suggest an inline table-valued function. It's better still if you can pre-compute the geography points, but this does it all inline.
create function dbo.fn_areWithinOneMile(@long1 float, @lat1 float, @long2 float, @lat2 float)
returns table
as
return
select cast(
case when
geography::Point(@lat1, @long1, 4326).STDistance(geography::Point(@lat2, @long2, 4326)) > 1609.34 then 0
else 1
end as bit) as [withinOneMile?]
go
with cte as (select * from (values
(42, 42),
(43, 43),
(44, 44)
) as x(lat, long)
), j as (
select long, lat, lag(long, 1) over (order by lat) as long2, lag(lat, 1) over (order by lat) as lat2
from cte
)
select *
from j
cross apply dbo.fn_areWithinOneMile(long, lat, long2, lat2) as o
where long2 is not null;

DECLARE
@pt1 geography,
@pt2 geography;
SET @pt1 = geography::Point(45.65100, -120.34900, 4326);
SET @pt2 = geography::Point(44.65100, -120.37654, 4326);
SELECT @pt1.STDistance(@pt2);
- The return value is in meters for SRID 4326. STDistance reports distance in the unit of measure of the SRID, so a different SRID can give a different unit.
- The list of SRIDs is available here:
Select * from sys.spatial_reference_systems
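The question only needs a yes/no answer. One way to save work over large sets is to reject most pairs with a cheap bounding-box comparison and compute the exact distance only for pairs that pass. This is a minimal sketch under a few assumptions:
- SRID 4326, so `STDistance` returns meters.
- Points are away from the poles and the ±180° meridian.
- 110,000 m per degree is used as a deliberate underestimate, so the box is slightly larger than the 1-mile circle.

The sample coordinates are made up.

```sql
-- Hypothetical sample points, roughly 450 m apart.
DECLARE @lat1 FLOAT = 45.65100, @long1 FLOAT = -120.34900;
DECLARE @lat2 FLOAT = 45.65500, @long2 FLOAT = -120.35000;

-- Half-widths, in degrees, of a box around point 1 that contains the whole 1-mile circle.
-- 110,000 m per degree underestimates a degree's length, so the box errs on the large side.
DECLARE @latDeg  FLOAT = 1609.34 / 110000.0;
DECLARE @longDeg FLOAT = @latDeg / COS(RADIANS(ABS(@lat1) + @latDeg));

DECLARE @withinOneMile BIT =
    CASE
        -- Cheap rejection: plain subtraction, no geography objects created.
        WHEN ABS(@lat2 - @lat1) > @latDeg OR ABS(@long2 - @long1) > @longDeg THEN 0
        -- Exact check only for pairs inside the box (meters for SRID 4326).
        WHEN geography::Point(@lat1, @long1, 4326).STDistance(geography::Point(@lat2, @long2, 4326)) <= 1609.34 THEN 1
        ELSE 0
    END;

SELECT @withinOneMile AS withinOneMile;
```

In a set-based query, the two range tests can be written as WHERE predicates on the latitude and longitude columns. That may let the optimizer use an index on latitude before any distance is computed.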

want to generate coupon code 5 digit number [duplicate]

This question already has answers here:
TSQL Generate 5 character length string, all digits [0-9] that doesn't already exist in database
(6 answers)
Closed 7 years ago.
I want to create a coupon code generator using an SQL database, but I don't know how to generate 1,000 random numbers without repeating them. Can someone help me? It's important. Thanks.

Records in relational database tables are unordered by nature.
Therefore, you can simply create a table that holds all the values between @First and @Last (0 and 9999 in your case), and then use a random order by when selecting from that table. You can also store a simple int in the database table and just format it when you select the data from the table.
Since my main database is SQL Server and I have no experience with SQLite, I will use SQL Server syntax in my code example and leave it up to you to find the SQLite equivalent.
First, create the table:
CREATE TABLE Tbl
(
IntValue int PRIMARY KEY,
IsUsed bit NOT NULL DEFAULT 0
)
Then, populate it with numbers between 0 and 9999:
;With CTE AS (
SELECT 0 As IntValue
UNION ALL
SELECT IntValue + 1
FROM CTE
WHERE IntValue + 1 < 10000
)
INSERT INTO Tbl (IntValue)
SELECT IntValue
FROM CTE
OPTION(MAXRECURSION 0)
Then, you want to select multiple values each time, so I would write a stored procedure like this:
CREATE PROCEDURE stp_GetCouponCodes
(
@Number int = 5 -- or whatever number is default
)
AS
BEGIN
DECLARE @UsedValues AS TABLE
(
IntValue int
)
BEGIN TRY
BEGIN TRANSACTION
INSERT INTO @UsedValues
SELECT TOP(@Number) IntValue
FROM Tbl
WHERE IsUsed = 0
ORDER BY NEWID()
UPDATE Tbl
SET IsUsed = 1
FROM Tbl
INNER JOIN
@UsedValues uv ON(Tbl.IntValue = uv.IntValue)
SELECT RIGHT('00000' + CAST(IntValue as varchar), 5)
FROM @UsedValues
COMMIT TRANSACTION
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION
END CATCH
END
Then, whenever you want to generate coupons, simply execute the stored procedure with the number of coupons you want:
EXEC stp_GetCouponCodes 10;

The code below uses a quick method to generate 1,000 random 5-character strings based on the alphabet provided. You'll still need to perform duplicate checking, but this should get you started.
DECLARE @Quantity INT = 1000
DECLARE @Alphabet VARCHAR(100) = '0123456789'
DECLARE @Length INT = LEN(@Alphabet)
DECLARE @Top INT = SQRT(@Quantity) + 1
;WITH CTE AS (
SELECT TOP (@Top) *
FROM sys.objects
)
SELECT TOP (@Quantity)
SUBSTRING(@Alphabet, ABS(CHECKSUM(NEWID())) % @Length + 1, 1)
+ SUBSTRING(@Alphabet, ABS(CHECKSUM(NEWID())) % @Length + 1, 1)
+ SUBSTRING(@Alphabet, ABS(CHECKSUM(NEWID())) % @Length + 1, 1)
+ SUBSTRING(@Alphabet, ABS(CHECKSUM(NEWID())) % @Length + 1, 1)
+ SUBSTRING(@Alphabet, ABS(CHECKSUM(NEWID())) % @Length + 1, 1)
AS [Code]
FROM CTE X
CROSS JOIN CTE Y
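The answer above leaves duplicate checking open. One minimal sketch works in passes:
1. Materialize a batch of random candidates.
2. Keep only the codes that have not already been accepted.
3. Repeat until enough unique codes exist.

The table variables `@Codes` and `@Batch` and the variable `@Need` are illustrative. Candidates are drawn from 00000 to 99999. `sys.all_objects` is used only as a convenient row source.

```sql
DECLARE @Quantity INT = 1000;
DECLARE @Need INT;
DECLARE @Codes TABLE (Code CHAR(5) PRIMARY KEY);   -- accepted codes; the key enforces uniqueness
DECLARE @Batch TABLE (Code CHAR(5) NOT NULL);      -- random candidates for one pass

WHILE (SELECT COUNT(*) FROM @Codes) < @Quantity
BEGIN
    DELETE FROM @Batch;

    -- Materialize candidates first so each NEWID()-based value is fixed before it is filtered.
    -- CHECKSUM is cast to BIGINT before ABS so ABS cannot overflow on -2147483648.
    INSERT INTO @Batch (Code)
    SELECT TOP (@Quantity)
           RIGHT('0000' + CAST(ABS(CAST(CHECKSUM(NEWID()) AS BIGINT)) % 100000 AS VARCHAR(5)), 5)
    FROM sys.all_objects;

    -- Accept only new, distinct codes, and no more than are still needed.
    SET @Need = @Quantity - (SELECT COUNT(*) FROM @Codes);
    INSERT INTO @Codes (Code)
    SELECT TOP (@Need) b.Code
    FROM (SELECT DISTINCT Code FROM @Batch) AS b
    WHERE NOT EXISTS (SELECT 1 FROM @Codes AS c WHERE c.Code = b.Code);
END

SELECT Code FROM @Codes;
```

If a row source has fewer rows than `@Quantity`, the loop simply runs more passes. The pre-populated table in the previous answer avoids retries entirely, at the cost of storing every possible value.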
