Unwanted Real number generated - database

I have a weird problem, if you can call it a problem.
Sorry in advance, the database is in French.
I have a table that holds the time a user spent on a specific task.
I want to sum the time spent on every task.
I'm able to get a sum from the database, but the result is a bit weird.
The field is a real number to start with.
For example, if I sum 0,35 + 0,63 + 1 I should get 1,98. Data without a sum:
But instead Access gives me 1,97999998927116. Data with sum:
If I summed only integers, the result would be correct.
I know I could simply use a rounding function to get rid of it,
but I would like to know why it does this.

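This is because Sum uses floating-point arithmetic when you run it on a column that is defined as a Single or a Double.
Floating-point arithmetic cannot represent most decimal fractions exactly, so values such as 0,35 and 0,63 are stored as close binary approximations and the small errors show up in the sum.
You can avoid these kinds of errors by defining your column as a Decimal or as Currency.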
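The effect can be reproduced outside Access. The following is a minimal Python sketch, not what Access does internally: the helper `to_single` is a hypothetical name that rounds a value to 32-bit precision to imitate a Single column, and `Decimal` stands in for a Decimal or Currency column.

```python
import struct
from decimal import Decimal


def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


values = [0.35, 0.63, 1.0]

# Accumulate the way a Single column would: every intermediate
# result is rounded back to 32 bits.
single_total = 0.0
for v in values:
    single_total = to_single(single_total + to_single(v))

# The same sum with exact decimal arithmetic.
decimal_total = sum(Decimal(s) for s in ("0.35", "0.63", "1"))

print(single_total)   # close to, but not exactly, 1.98
print(decimal_total)  # 1.98
```

The single-precision total lands within a few units in the seventh decimal place of 1.98, which is the kind of residue shown in the question, while the decimal total is exact.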

Related

Real to Float conversion with no loss of data

I had a table with two columns that stored coordinates. These columns were of the REAL datatype, and I noticed that my application was only showing 5 decimals for the coordinates, so positions were not accurate enough.
I decided to change the datatype to FLOAT so I could use more decimals. To my pleasant surprise, when I changed the column data type the decimals suddenly appeared without my having to store all the coordinates again.
Can anyone tell me why this happens? What happens to the decimal precision with the REAL datatype? Isn't the data rounded and truncated when inserted? Why did the precision come back with no loss of data when I changed the datatype?
You want to use a Decimal data type.
Floating-point values are stored as a significand and an exponent. This allows you to store huge number representations in a small amount of memory. It also means that you don't always get exactly the number you're looking for, just something very, very close. This is why, when you compare floating-point values, you compare them within a certain tolerance.
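As a small illustration of comparing within a tolerance, here is a Python sketch. The tolerance of 1e-9 is an arbitrary choice for the example, not a recommendation for coordinate data.

```python
import math

a = 0.1 + 0.2
b = 0.3

# Exact equality fails because neither side is stored exactly.
print(a == b)                            # False

# Comparing within a relative tolerance succeeds.
print(math.isclose(a, b, rel_tol=1e-9))  # True
```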
"To my pleasant surprise, when I changed the column data type the decimals suddenly appeared without my having to store all the coordinates again."
Be careful: this doesn't mean that the value that was filled in is the accurate value you're looking for. If you truncated your original calculation, you need to get those numbers again without cutting off any precision. The digits that appear when you convert from REAL to FLOAT aren't the rest of what you truncated; they come from showing the approximate value already stored in the REAL column with more digits of precision.
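A rough Python sketch of what the conversion does, under stated assumptions: the coordinate value is made up, `to_single` is a hypothetical helper that imitates storing a value in a 32-bit REAL column, and a Python float plays the role of the 64-bit FLOAT.

```python
import struct


def to_single(x: float) -> float:
    """Imitate a 32-bit REAL column: round to single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]


original = 40.7127837              # hypothetical coordinate that was inserted
stored_real = to_single(original)  # what the REAL column actually keeps

# Shown with about 7 significant digits, the way a REAL is usually displayed.
print(f"{stored_real:.7g}")   # 40.71278

# The same stored value displayed with the 15 digits a FLOAT can show.
# The extra digits belong to the stored approximation, not to `original`.
print(f"{stored_real:.15g}")
```

Widening changes nothing about the stored number, so rounding the widened value back to single precision returns exactly the same value.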
Here is a good thread that explains the difference in data-types in SQL:
Difference between numeric, float and decimal in SQL Server
Another helpful link:
Bad habits to kick : choosing the wrong data type

Named range of consistent random numbers

Background
Following on from a question I asked a while ago about getting an array of different (but not necessarily unique) random numbers to which the answer was this:
=RANDBETWEEN(ROW(A1:A10)^0,10)
To get an array of 10 random numbers between 1 and 10
The Problem
If I create a named range (called "randArray") with the formula above I hoped I would be able to reference randArray a number of times and get the same set of random numbers. Granted, they would change each time I press F9 or update the worksheet -- but change together.
This is what I get instead, two completely different sets of random numbers
I'm not surprised by this behavior but how can I achieve this without using VBA and without putting the random numbers onto the worksheet?
If you're interested
This example is intended to be an MCVE. In my actual case, I am using random numbers to estimate Pi. The user stipulates how many random points to apply and gets an accordingly accurate estimate. The problem arises because I also graph the points, and when there is a small number of points it's very clear that the estimate and the graph don't represent the same dataset.
Update
I have awarded the initial bounty to @Michael for providing an interesting and different solution. I am still looking for a complete solution which allows the user to stipulate how many random points to use, and although there might not be a perfect answer I'm still interested in any other possible solutions and more than happy to put up further bounties.
Thank you to everyone who has contributed so far.
This solution generates 10 seemingly random numbers between 1 and 10 that persist for nearly 9 seconds at a time. This allows repeated calls of the same formula to return the same set of values in a single refresh.
You can modify the time frame if required. Shorter time periods allow for more frequent updates, but also slightly increase the extremely unlikely chance that some calls to the formula occur after the cutover point resulting in a 2nd set of 10 random numbers for subsequent calls.
Firstly, define an array "Primes" with 10 different prime numbers:
={157;163;167;173;179;181;191;193;197;199}
Then, define this formula that will return an array of 10 random numbers:
=MOD(ROUND(MOD(ROUND(NOW(),4)*70000,Primes),0),10)+1
Explanation:
We need to build our own random number generator that we can seed with the same value for an amount of time; long enough for the called formula to keep returning the same value.
Firstly, we create a seed: ROUND(NOW(),4) creates a new seed number every 0.0001 days = 8.64 seconds.
We can generate rough random numbers using the following formula:
Random = Seed * 7 mod Prime
https://cdsmith.wordpress.com/2011/10/10/build-your-own-simple-random-numbers/
Ideally, a sequence of random numbers is generated by feeding each output back in as the next input, but we can't do that in a single function. So instead, this uses 10 different prime numbers, essentially starting 10 different random number generators. This is less reliable at generating random numbers, but the testing results further below show it actually seems to do a pretty good job.
ROUND(NOW(),4)*70000 gets our seed up to an integer and multiplies by 7 at the same time.
MOD(ROUND(NOW(),4)*70000,Primes) generates a sequence of 10 random numbers, each from 0 up to the respective prime number.
ROUND(MOD(ROUND(NOW(),4)*70000,Primes),0) is required to get us back to an integer, because Excel seems to struggle with applying MOD to floating-point numbers.
=MOD(ROUND(MOD(ROUND(NOW(),4)*70000,Primes),0),10)+1 takes just the value from the ones place (a random number from 0 to 9) and shifts it to give us a random number from 1 to 10.
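The steps above can be sketched in Python to see why repeated calls within the same time window agree. This is an illustration of the formula's arithmetic, not a replacement for it: `rand_array` and `serial_days` are hypothetical names, with `serial_days` standing in for the value Excel's NOW() returns. Python's `round` resolves exact ties differently from Excel's ROUND, so individual values may occasionally differ from the spreadsheet.

```python
PRIMES = [157, 163, 167, 173, 179, 181, 191, 193, 197, 199]


def rand_array(serial_days: float) -> list:
    """Mimic =MOD(ROUND(MOD(ROUND(NOW(),4)*70000,Primes),0),10)+1."""
    seed = round(serial_days, 4)   # new seed every 0.0001 days
    scaled = seed * 70000          # scale up and multiply by 7 in one step
    # One small generator per prime; keep the ones digit and shift to 1..10.
    return [round(scaled % p) % 10 + 1 for p in PRIMES]


# Two "calls" a fraction of a second apart round to the same seed,
# so they return the same ten numbers.
first = rand_array(45000.12341)
second = rand_array(45000.12344)
print(first == second)  # True
```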
Testing results:
I generated 500 lots of 10 random numbers (in columns instead of rows) for seed values incrementing by 0.0001 and counted the number of times each digit occurred for each prime number. You can see that each digit occurred nearly 500 times in total and that the distribution of each digit is nearly equal between each prime number. So, this may be adequate for your purposes.
Looking at the numbers generated in immediate succession you can see similarities between adjacent prime numbers, they're not exactly the same but they're pretty close in places, even if they're offset by a few rows. However, if the refresh is occurring at random intervals, you'll still get seemingly random numbers and this should be sufficient for your purposes. Otherwise, you can still apply this approach to a more complex random number generator or try a different mix of prime numbers that are further apart.
Update 1: Trying to find a way of being able to specify the number of random numbers generated without storing a list of primes.
Attempt 1: Using a single prime with an array of seeds:
=MOD(ROUND(MOD(ROUND(NOW()+ROW(OFFSET(INDIRECT("A1"),0,0,SampleSize))/10000,4)*70000,1013),0),10)+1
This does give you an even distribution, but it really is just repeating the exact same sequence of 10 numbers over and over. Any analysis of the sample would be identical to analysing =MOD(ROW(1:SampleSize),10)+1. I think you want more variation than that!
Attempt 2: Working on a 2-dimensional array that still uses 10 primes....
Update 2: Didn't work. It had terrible performance. A new answer has been submitted that takes a similar but different approach.
OK, here's a solution where users can specify the number of values in the defined name SampleSize:
=MOD(ROUND(MOD(ROUND(NOW()+ROW(OFFSET(INDIRECT("A1"),0,0,SampleSize)),4)*10000*163,1013),0)+ROUND(MOD(ROUND(NOW()+ROW(OFFSET(INDIRECT("A1"),0,0,SampleSize))*2,4)*10000*211,1013),0)+ROUND(MOD(ROUND(NOW()+ROW(OFFSET(INDIRECT("A1"),0,0,SampleSize))*3,4)*10000*17,1013),0)+ROUND(MOD(ROUND(NOW()+ROW(OFFSET(INDIRECT("A1"),0,0,SampleSize))*5,4)*10000*179,53),0)+ROUND(MOD(ROUND(NOW()+ROW(OFFSET(INDIRECT("A1"),0,0,SampleSize))*7,4)*10000*6101,1013),0),10)+1
It's a long formula, but has good efficiency and can be used in other functions. Attempts at a shorter formula resulted in unusably poor performance and arrays that for some reason couldn't be used in other functions.
This solution combines 5 different prime number generators to increase variety in the generated random numbers. Some arbitrary constants were introduced to try to reduce repeating patterns.
This has correct distribution and fairly good randomness. Repeated testing with a SampleSize of 10,000 resulted in frequencies of individual numbers varying between 960 and 1040 with no overall favoritism. However it seems to have the strange property of never generating the same number twice in a row!
You can achieve this using just standard spreadsheet formulas.
One way is to use the so-called Lehmer random number method. It generates a sequence of random numbers in your spreadsheet that stays the same until you change the "seed number", a number you choose yourself. Each seed you choose produces a different random sequence.
The short version:
In cell B1, enter your "seed" number, it can be any number from 1 to 2,147,483,647
In cell B2 enter the formula =MOD(48271*B1,2^31-1) , this will generate the first random number of your sequence.
Now copy this cell down as far as the random sequence you want to generate.
That's it. For your named range, go ahead and name the range from B2 down as far as your sequence goes. If you want a different set of numbers, just change the seed in B1. If you ever want to recreate the same set of numbers just use the same seed and the same random sequence will appear.
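The recurrence behind the spreadsheet column can be sketched in Python. The multiplier 48271 and modulus 2^31-1 come from the formula above; the function name `lehmer_sequence` is made up for this example.

```python
MODULUS = 2**31 - 1
MULTIPLIER = 48271


def lehmer_sequence(seed: int, count: int) -> list:
    """Return `count` values of x -> 48271 * x mod (2^31 - 1), starting after `seed`."""
    values = []
    state = seed
    for _ in range(count):
        state = (MULTIPLIER * state) % MODULUS
        values.append(state)
    return values


# The same seed always reproduces the same sequence.
print(lehmer_sequence(1, 2))  # [48271, 182605794]
```

To get numbers from 1 to 10 as in the question, each value could be reduced with `value % 10 + 1`, at the cost of using only part of each generated number.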
More details in this tutorial:
How to generate random numbers that don't change in Excel and Google Sheets
It's not a great answer, but given the limitations of volatile functions, one possible approach is to wrap the volatile function in an IF formula that is controlled by a flag cell placed somewhere in the worksheet.
I used the below formula to achieve the desired result
=IF(rngIsVolatile,randArray,A1:A10)
I set cell B12 as rngIsVolatile. The screenshots below show it working.
When rngIsVolatile is set to True, it picks up new values from randArray:
When rngIsVolatile is set to False, it picks up old values from A1:A10:

TSQL - Do you calculate values then sum, or sum first then calculate values?

I feel stupid asking this - there is probably a math rule I am forgetting.
I am trying to calculate a gross profit based on net sales, cost, and billbacks.
I get two different values based on how I do the calculation:
(sum(netsales) - sum(cost)) + sum(billbackdollars) as CalculateOutsideSum,
sum((netsales - cost) + BillBackDollars) as CalculateWithinSum
This is coming off of a basic transaction fact table.
In this particular example, there are about 90 records being summed, and I get the following results
CalculateOutsideSum: 234.77
CalculateWithinSum: 247.70
I imagined this would be some sort of transitive property and both results would be the same considering it's just summation.
Which method is correct?
From a mathematical point of view, you should get exactly the same value with both formulas.
In any case, in situations like this it's better to perform the sum after any calculation.
EDIT AFTER OPENER RESPONSE:
Treat your data with the isnull function, or with a casting function that increases data precision, before summing.
Rounding, formatting and casts that decrease data precision should be applied after the sums.
Just figured it out...
Problem was Net Sales was null for 3 rows, causing the calculation to become null, and incorrectly summing. After adding an isnull, both sums come out the same.
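The effect of the NULL rows can be reproduced with a small in-memory table. This sketch uses Python's built-in sqlite3 module rather than SQL Server, so it uses COALESCE where the T-SQL above would use isnull; the three rows are made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (netsales INTEGER, cost INTEGER, billbackdollars INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(100, 60, 5), (None, 20, 3), (50, 30, 2)],  # second row has NULL net sales
)

outside, inside, inside_fixed = conn.execute(
    """
    SELECT (SUM(netsales) - SUM(cost)) + SUM(billbackdollars),
           SUM((netsales - cost) + billbackdollars),
           SUM((COALESCE(netsales, 0) - cost) + billbackdollars)
    FROM sales
    """
).fetchone()

# Summing first skips only the NULL net sales value; calculating first turns
# the whole second row into NULL, so its cost and billback are dropped too.
print(outside, inside, inside_fixed)  # 50 67 50
```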

Best way to remove integer part of float number

To remove the integer part of float numbers I use:
update ACTIVITIES set TIME = TIME - FLOOR(TIME) --TIME is float
This works, but the result contains some errors due to floating-point calculation.
EDIT: I cannot modify the schema; TIME must stay float.
The reason I need to do this is that, because of a bug, the float numbers become > 1 even though the decimal part is still OK. So I need to remove the integer part.
I cannot reproduce it now, but I remember I had something like:
1.6666666667 becomes 0.6666542534, while it should be 0.6666666667.
Please note that this is legacy code, so TIME is a float number; if I were writing this from scratch I would use a TIME datatype.
So my question is: is this correct, or can it be improved?
update ACTIVITIES set TIME = TIME - FLOOR(TIME)
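For reference, the same arithmetic can be tried in Python, whose floats are 64-bit IEEE doubles. This is only a sketch of the `x - FLOOR(x)` calculation and assumes the database column is also a 64-bit float; a 32-bit column would keep only about seven significant digits. `fractional_part` is a hypothetical helper name.

```python
import math


def fractional_part(x: float) -> float:
    """Same arithmetic as TIME - FLOOR(TIME)."""
    return x - math.floor(x)


print(fractional_part(25.75))         # 0.75
print(fractional_part(-1.25))         # 0.75, because FLOOR rounds toward minus infinity
print(fractional_part(1.6666666667))  # agrees with 0.6666666667 to well beyond 10 decimals
```

In double precision the subtraction itself does not introduce an error anywhere near the size described in the question, so a discrepancy that large would more likely come from a narrower storage type or from an earlier step.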

Multiply a number with 0.01

I need to multiply a number such as 00000000001099 by 0.01 and then convert it to two decimal places (e.g. 10.99 after the multiplication) in a derived column in an SSIS package.
Right now I am using this expression, (dt_numeric,2,2)((DT_CY)((dt_wstr,14)PRICE) * 0.01), but it is failing.
I get the PRICE column with the value 00000000001099 from a flat file; after the conversion I need to write the value back to a flat file again.
Since your string is 14 characters long you cannot use DT_I4: it will detect that the value does not fit and give you the error about potential loss of data. You could edit the error handling and ignore possible truncations, but a better way is to use a datatype that can hold your number.
Your Derivation should look like this:
(DT_NUMERIC,X,2)(((DT_NUMERIC,X+2,2)([InputColumn]))*0.01)
In your example
(DT_NUMERIC,14,2)(((DT_NUMERIC,16,2)([PRICE]))*0.01)
The extra step with X+2,2 lets the numeric hold 99999999999999; you then divide by 100 (or multiply by 0.01) and cast back to the smallest possible numeric (X,2). You might want to use a bigger standardized numeric type: look at MSDN/BOL to see the storage requirements for each of them, and pick the biggest type that takes the same number of bytes as your requirement.
This should work...
(DT_DECIMAL, 2 )(DT_WSTR, 20 )((DT_I4)@[User::Cost] * 0.01)
While the value 00000000001099 is a number, it cannot be represented this way in a numeric datatype; the leading zeros will be stripped. Because you are showing the number this way, I must presume it is stored in a string datatype. In the data flow before your derived column I would recommend using the "Data Conversion" component to convert the string to a numeric type. In the downstream derived column component, perform the multiplication to get the decimal point in the correct place.
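The convert-then-scale sequence described in these answers can be checked outside SSIS. The following Python sketch uses the decimal module as a stand-in for a numeric type with scale 2; it illustrates the arithmetic only and is not an SSIS expression.

```python
from decimal import Decimal

raw = "00000000001099"  # zero-padded string as read from the flat file

# Convert the string to an exact decimal first (leading zeros are dropped),
# then scale by 0.01 and fix the result at two decimal places.
price = (Decimal(raw) * Decimal("0.01")).quantize(Decimal("0.01"))

print(price)  # 10.99
```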

Resources