Accuracy loss when converting a tuple of floats into an array in Python 3

I have accuracy problems when I convert a tuple or list of floats into an array with the 'f' typecode; with the 'd' typecode it works "correctly".
For example:
import array
a = (2.16, -22.4, 95.12, -63.47, -0.02, 1245.2)
b = array.array('f', a)
print(b)
# array('f', [2.1600000858306885, -22.399999618530273, 95.12000274658203, -63.470001220703125, -0.019999999552965164, 1245.199951171875])
c = array.array('d', a)
print(c)
# array('d', [2.16, -22.4, 95.12, -63.47, -0.02, 1245.2])
As you can see, the array c contains the same numbers as the tuple a, but the array b has accuracy problems.
However, both type(b[0]) and type(c[0]) result in <class 'float'>.

There is actually no accuracy loss in the way you may suspect here; it's a case of "representation error".
The literal value 2.16 does not have an exact representation as a float; after parsing it is stored as 0x400147AE147AE148, because Python always uses double precision (see numbers.Real) to represent floats.
The value is then converted to 0x400A3D71 (in the case of 'f') or stays the same (in the case of 'd'). These values correspond to 2.1600000858306884765625 and 2.16000000000000014210854715202, each of which is the closest representation of the literal 2.16 available at its precision. The loss of precision relative to the written 2.16 is inevitable, because 2.16 simply does not exist as an exactly representable binary value.
What you see in the string representation is the interpreter printing the shortest decimal string that rounds back to the underlying float/double. The underlying values are as close to 2.16 as they can possibly get in both cases; only their string representations differ.
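If you want to inspect the exact values behind the two representations, here is a minimal sketch using only the standard struct and decimal modules (my addition, not part of the question):
import struct
from decimal import Decimal

x = 2.16  # parsed into a double (Python float)

# Exact decimal value of the double the literal 2.16 is stored as:
print(Decimal(x))
# 2.160000000000000142108547152020037174224853515625

# Round-trip through single precision ('f') and show its exact value:
single = struct.unpack('f', struct.pack('f', x))[0]
print(Decimal(single))
# 2.1600000858306884765625
Both printed values are the nearest representable numbers to 2.16 at their respective precisions.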

Related

Why do I get negative values in my array? [duplicate]

import numpy as np
a = np.arange(1000000).reshape(1000,1000)
print(a**2)
With this code I get the output below. Why do I get negative values?
[[ 0 1 4 ..., 994009 996004 998001]
[ 1000000 1002001 1004004 ..., 3988009 3992004 3996001]
[ 4000000 4004001 4008004 ..., 8982009 8988004 8994001]
...,
[1871554624 1873548625 1875542628 ..., -434400663 -432404668 -430408671]
[-428412672 -426416671 -424420668 ..., 1562593337 1564591332 1566589329]
[1568587328 1570585329 1572583332 ..., -733379959 -731379964 -729379967]]
On your platform, np.arange returns an array of dtype 'int32':
In [1]: np.arange(1000000).dtype
Out[1]: dtype('int32')
Each element of the array is a 32-bit integer. Squaring gives results that do not fit in 32 bits; the result is truncated to the low 32 bits and still interpreted as a signed 32-bit integer, which is why you see negative numbers.
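A minimal demonstration of the wraparound (not from the original answer; recent NumPy versions may additionally emit a RuntimeWarning here):
import numpy as np

# 50000**2 = 2,500,000,000, which exceeds the int32 maximum of 2**31 - 1 = 2,147,483,647.
x = np.int32(50000)
print(x * x)                     # -1794967296

# The same value by hand: keep the low 32 bits, then reinterpret as signed
# (subtracting 2**32 is valid here because the wrapped value is >= 2**31).
print(50000**2 % 2**32 - 2**32)  # -1794967296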
Edit: In this case, you can avoid the integer overflow by constructing an array of dtype 'int64' before squaring:
a = np.arange(1000000, dtype='int64').reshape(1000, 1000)
Note that the problem you've discovered is an inherent danger of working with NumPy: you have to choose your dtypes with care and know beforehand that your code will not lead to arithmetic overflow. For the sake of speed, NumPy cannot and will not warn you when this occurs.
See http://mail.scipy.org/pipermail/numpy-discussion/2009-April/041691.html for a discussion of this on the numpy mailing list.
Python integers don't have this problem, since they are automatically promoted to arbitrary-precision long integers when they overflow.
So if you do manage to overflow int64, one solution is to use Python ints in the NumPy array, via dtype=object:
import numpy
a = numpy.arange(1000, dtype=object)
a**20
NumPy integer types are fixed-width, and you are seeing the result of integer overflow.
A solution to this problem is as follows (taken from here):
...change in class StringConverter._mapper (numpy/lib/_iotools.py) from:
_mapper = [(nx.bool_, str2bool, False),
           (nx.integer, int, -1),
           (nx.floating, float, nx.nan),
           (complex, _bytes_to_complex, nx.nan + 0j),
           (nx.string_, bytes, asbytes('???'))]
to
_mapper = [(nx.bool_, str2bool, False),
           (nx.int64, int, -1),
           (nx.floating, float, nx.nan),
           (complex, _bytes_to_complex, nx.nan + 0j),
           (nx.string_, bytes, asbytes('???'))]
This solved a similar problem I had with numpy.genfromtxt.
Note that the author describes this as a 'temporary' and 'not optimal' solution; however, I have had no side effects using it with v2.7 (yet?!).
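Rather than patching NumPy internals, a less invasive alternative (my suggestion, not from the linked ticket) is usually to pass an explicit dtype to genfromtxt; exact text/bytes stream handling varies by NumPy version:
import io
import numpy as np

# Hypothetical data containing values too large for int32:
data = io.StringIO(u"3000000000 1\n4000000000 2")
a = np.genfromtxt(data, dtype=np.int64)
print(a.dtype)  # int64
print(a)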

Eiffel: REAL_32.to_double gives a strange value

Trying to transform a real_32 into a real_64, I get:
real_32: 61.55
real_64: 61.54999923706055
Am I using the to_double function incorrectly?
This is expected. In the particular example, the binary representation of the decimal 61.55 with single and double precision respectively is:
REAL_32: 0 10000100 11101100011001100110011
REAL_64: 0 10000000100 1110110001100110011001100110011001100110011001100110
As you can see, the pattern 0011 repeats, and would have to continue ad infinitum to represent the value exactly.
When a REAL_32 is converted to a REAL_64, the missing low-order bits are not extended with the repeating pattern; they are filled with zeroes instead:
REAL_32: 0 10000100 11101100011001100110011
REAL_64: 0 10000000100 1110110001100110011001100000000000000000000000000000
In decimal notation, this corresponds to 61.54999923706055. What is essential here is that 61.54999923706055 and 61.55 have exactly the same binary representation when using single-precision floating-point numbers. You can check it yourself with print ({REAL_32} 61.55 = {REAL_32} 61.54999923706055). In other words, the results you get are correct, and the two values are the same. The only difference is that when a REAL_32 is printed, it is rounded to a smaller number of significant decimal digits.
This is also why accounting and bookkeeping software avoids floating-point numbers and uses integer and decimal types instead.
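The same widening effect can be reproduced in Python with NumPy, which may be easier to experiment with (my illustration, not from the original answer):
import numpy as np

x = np.float32(61.55)  # nearest single-precision value to 61.55
print(float(x))        # 61.54999923706055: widening zero-fills the missing bits

# The two decimal strings denote the same single-precision value:
print(np.float32(61.55) == np.float32(61.54999923706055))  # True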
As a workaround (in my case, for values arriving via JSON deserialization), converting through the string representation worked:
a_real_32.out.to_real_64

Matlab changing array value in error when appending matrices

I have a very strange bug in MATLAB (R2016a): appending a ones array using vertcat (or regular concatenation with [A; B]) results in a matrix where the ones appear to have been scaled down to 0.0001 instead of 1. Multiplying the ones matrix by 10000 "fixes" the issue, but I would like to know why 0.0001 is displayed instead of 1. Here is the code:
temp = ones([1,307200]);
new_coords = vertcat(world_coords, temp);
new_coords
which outputs columns like the following:
0.4449
0.3673
1.8984
0.0001
The type for world_coords is double, so I don't think typecasting is the issue.
As mentioned in my comment, the output is scaled due to the range of the values in world_coords. You should see a scaling factor of 1.0e+04 in the first line of the output.
You can change the output format for example with:
format long
For more details see: format

ValueError: matplotlib display text must have all code points < 128 or use Unicode strings

In my code I get an array like this:
array(['2.83100e+07', '2.74000e+07', '2.79400e+07'], dtype='|S11')
How can I "cut" my values like:
2.83100e+07 --> 2.831 ?
Best regards!
Use a for loop and round(). Note that your array holds strings (dtype='|S11'), so convert each value to float first:
In [23]: round(66.66666666666,4)
Out[23]: 66.6667
In [24]: round(1.29578293,6)
Out[24]: 1.295783
help on round():
round(number[, ndigits]) -> floating point number
Round a number to a given precision in decimal digits (default 0 digits). This always returns a floating point number. Precision may be negative.
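Because the array in the question holds byte strings (dtype='|S11'), the values have to be parsed to floats before rounding. A minimal sketch, assuming every value is of order 1e+07 as in the example, so dividing by 1e7 gives the desired "cut" form:
import numpy as np

a = np.array([b'2.83100e+07', b'2.74000e+07', b'2.79400e+07'], dtype='|S11')

vals = a.astype(np.float64) / 1e7  # parse the strings, then rescale
print(np.round(vals, 4))           # [2.831  2.74   2.794]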

np.around and np.round not actually rounding

I'm writing a simple script in Python 2.7 to convert a couple of very long files I have into text files so that I can scroll through them in a text reader.
However, I found that the NumPy array in the file contains very long floats printed in unneeded scientific notation. I try to use numpy.around or numpy.round to keep only two places after the decimal, but it doesn't change anything. Here is my code:
import h5py
import sys
from Tkinter import Tk
from tkFileDialog import askopenfilename
import numpy as np
sys.stdout.write( 'Please pick file from window\n')
fileName = askopenfilename() # show an "Open" dialog box and return the path to the selected file
sys.stdout.write(fileName)
f = h5py.File(fileName, 'r')
dataset = f['/dcoor'][:]
newname = raw_input('New file name ')
print type(dataset[0][0])
dataset = np.asarray(dataset)
dataset = dataset.astype(float)
print type(dataset[0][0])
print '\nDataset before rounding: \n', dataset
dataset = np.around(dataset, decimals=2)
print '\nDataset after rounding: \n', dataset
np.savetxt(newname,dataset)
I do not get any error messages and my output is this:
New file name test4
<type 'numpy.float32'>
<type 'numpy.float64'>
Dataset before rounding:
[[ 1.48999996e+01 1.07949997e+02 1.80000007e-01 3.59000000e+02
0.00000000e+00]
[ 1.60100002e+01 1.07489998e+02 3.89999986e-01 3.98000000e+02
0.00000000e+00]
[ 1.86700001e+01 1.07669998e+02 5.89999974e-01 4.26000000e+02
0.00000000e+00]
...,
[ 2.78700008e+01 2.75200005e+01 2.99973999e+03 4.15000000e+02
0.00000000e+00]
[ 2.60499992e+01 2.72800007e+01 2.99991992e+03 4.10000000e+02
0.00000000e+00]
[ 2.56599998e+01 2.85400009e+01 3.00009009e+03 4.37500000e+02
0.00000000e+00]]
Dataset after rounding:
[[ 1.49000000e+01 1.07950000e+02 1.80000000e-01 3.59000000e+02
0.00000000e+00]
[ 1.60100000e+01 1.07490000e+02 3.90000000e-01 3.98000000e+02
0.00000000e+00]
[ 1.86700000e+01 1.07670000e+02 5.90000000e-01 4.26000000e+02
0.00000000e+00]
Which is odd, since it appears to round some numbers but not others, and it keeps the trailing zeros as well. I converted the original array because I thought that might make a difference, but obviously it did not. Could the problem be that the arrays are so long? Each one is roughly 16,000 rows. Could it be that the original array was saved in an HDF5 file, which keeps the original format? I can't go back and retest the mice I work with, so if that's the case I'm rather SOL. Thank you for any help.
The numbers are being rounded. The reason they don't come out as exactly two decimal places is that IEEE 754 floating-point numbers have representation errors: with a limited number of binary digits you cannot represent every decimal fraction exactly, so there are implicit precision limits.
Think about numbers like 2/3 or 5/7. You can't represent them perfectly in base 10 either.
However, I'm not sure why you care that NumPy's repr visually represents floats using scientific notation. When you want to write them out, you can loop over the array and specify the precision when writing:
for row in dataset:
    for elem in row:
        somefile.write("%.2f " % (elem,))
    somefile.write("\n")
This ensures that only two decimal places are written (rounded the way you intend). It's important to note, though, that when you load the file back, the values will still be subject to the same IEEE 754 limitations.
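A shorter route (my suggestion, not part of the original answer) is to let np.savetxt do the formatting via its fmt parameter, reusing the newname and dataset variables from the question:
import numpy as np

# Equivalent to the loop above: writes every element with two decimal places.
np.savetxt(newname, dataset, fmt='%.2f')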
