Python 2.7 convert 2d string array to float array

I read the following string from a .txt file:
{{1,2,3,0},{4,5,6,7},{8,-1,9,0}}
and used lin = lin.strip() to remove the trailing '\n'.
Then I replaced { and } with [ and ] using
lin = lin.replace ("{", "[")
lin = lin.replace ("}", "]")
My goal is to convert lin into a 2d float array, so I did
my_matrix = np.array(lin, dtype=float)
but I got an error message: "ValueError: could not convert string to float: [[1,2,3,0],[1,1,1,2],[0,-1,3,9]]"
If I remove the dtype, I get a string array. I already tried multiplying lin by 1.0 and making a copy of lin with .astype(float), but nothing seems to work.

I would use the json library to parse the contents of the file, then iterate through the nested lists and convert each element to float. However, the integer result might already be enough for what you want, and getting it is shorter and faster.
import json
fc = '{{1,2,3,0},{4,5,6,7},{8,-1,9,0}}'
a = json.loads(fc.replace('{','[').replace('}',']'))
print(a) # a is now a nested list of integers; this might be enough
for linenumber, linecontent in enumerate(a):
    for elementnumber, element in enumerate(linecontent):
        a[linenumber][elementnumber] = float(element)
print(a) # a is now a nested list of floats
Shorter solution
import json
fc = '{{1,2,3,0},{4,5,6,7},{8,-1,9,0}}'
a = json.loads(fc.replace('{','[').replace('}',']'))
print(a) # a is now a nested list of integers; this might be enough
a = [[float(c) for c in b] for b in a]
print(a) # a is now a nested list of floats
(works for both python 2 and 3)
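Since the question ultimately asks for a NumPy float array, the parsed nested list can be handed straight to np.array. This is a minimal sketch, assuming the text has the same brace layout as in the question; the names nested and my_matrix are only illustrative.

```python
import json
import numpy as np

fc = '{{1,2,3,0},{4,5,6,7},{8,-1,9,0}}'

# json.loads turns the bracketed text into a nested list of ints
nested = json.loads(fc.replace('{', '[').replace('}', ']'))

# np.array converts the nested list (not the raw string) to a 2d float array
my_matrix = np.array(nested, dtype=float)
print(my_matrix.shape)
print(my_matrix.dtype)
```

The original call failed because np.array received the whole text as a single string element, and a string containing brackets and commas cannot be parsed as one float.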

import numpy as np
readStr = "{{1,2,3,0},{4,5,6,7},{8,-1,9,0}}"
readStr = readStr[2:-2]
# Originally read string is now -> "1,2,3,0},{4,5,6,7},{8,-1,9,0"
line = readStr.split("},{")
# line is now a list object -> ["1,2,3,0", "4,5,6,7", "8,-1,9,0"]
array = []
temp = []
# Now we iterate through 'line', split each element into a list, and
# then append that list to 'array' on each iteration
for string in line:
    num_array = string.split(',')
    for num in num_array:
        temp.append(num)
    array.append(temp)
    temp = []
# 'array' is now -> [['1','2','3','0'], ['4','5','6','7'], ['8','-1','9','0']]
# (still strings; np.array converts them to floats below)
my_matrix = np.array(array, dtype = float)
# my_matrix = [[1.0, 2.0, 3.0, 0.0]
# [4.0, 5.0, 6.0, 7.0]
# [8.0, -1.0, 9.0, 0.0]]
Although this may not be the most elegant solution, I think it is easy to follow and gives you exactly what you're looking for.
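The same split-based idea can be written more compactly with a list comprehension. This is only a sketch under the assumption that the input always starts with "{{", ends with "}}" and uses "},{" between rows; the variable names are illustrative.

```python
import numpy as np

read_str = "{{1,2,3,0},{4,5,6,7},{8,-1,9,0}}"

# Drop the outer "{{" and "}}", then split into one string per row
rows = read_str.strip()[2:-2].split("},{")

# Split each row on commas; np.array converts the numeric strings to floats
my_matrix = np.array([row.split(",") for row in rows], dtype=float)
print(my_matrix)
```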

Related

append an element to 2d numpy array

I have a numpy array that has a shape of (500, 151296). Below is the array format
array:
array([[-0.18510018, 0.13180602, 0.32903048, ..., 0.39744213,
-0.01461623, 0.06420607],
[-0.14988784, 0.12030973, 0.34801325, ..., 0.36962894,
0.04133283, 0.04434045],
[-0.3080041 , 0.18728344, 0.36068922, ..., 0.09335024,
-0.11459247, 0.10187756],
...,
[-0.17399777, -0.02492459, -0.07236133, ..., 0.08901921,
-0.17250113, 0.22222663],
[-0.17399777, -0.02492459, -0.07236133, ..., 0.08901921,
-0.17250113, 0.22222663],
[-0.17399777, -0.02492459, -0.07236133, ..., 0.08901921,
-0.17250113, 0.22222663]], dtype=float32)
array[0]:
array([-0.18510018, 0.13180602, 0.32903048, ..., 0.39744213,
-0.01461623, 0.06420607], dtype=float32)
I have another list of stopwords whose length matches the number of rows in the numpy array:
stopwords = ['no', 'not', 'in' .........]
I want to append each stopword to the corresponding one of the 500 rows. Below is the code that I am using:
for i in range(len(stopwords)):
    array = np.append(array[i], str(stopwords[i]))
I am getting the error below:
IndexError                                Traceback (most recent call last)
<ipython-input-45-361e2cf6519b> in <module>
      1 for i in range(len(stopwords)):
----> 2     array = np.append(array[i], str(stopwords[i]))
IndexError: index 2 is out of bounds for axis 0 with size 2
Desired output:
array[0]:
array([-0.18510018, 0.13180602, 0.32903048, ..., 0.39744213,
-0.01461623, 0.06420607, 'no'], dtype=float32)
Can anyone tell me what I am doing wrong?
What you are doing wrong is overwriting the variable array inside the for loop:
for i in range(len(stopwords)):
    array = np.append(array[i], str(stopwords[i]))
#   ^^^^^             ^^^^^
A second problem is calling np.append in a for loop, which is almost always a bad idea.
You could instead do something like:
from string import ascii_letters
from random import choices
import numpy as np
N, M = 50, 7
arr = np.random.randn(N, M)
stopwords = np.array(["".join(choices(ascii_letters, k=10)) for _ in range(N)])
result = np.concatenate([arr, stopwords[:, None]], axis=-1)
assert result.shape == (N, M+1)
print(result[0]) # ['0.1' '-1.2' '-0.1' '1.6' '-1.4' '-0.2' '1.7' 'ybWyFlqhcS']
But this is also questionable: it mixes data types for no apparent reason, and every number ends up stored as a string.
In my opinion, you are better off keeping the two arrays separate.
Depending on what you are doing, you can iterate over them together as follows:
for vector, stopword in zip(arr, stopwords):
    print(f"{stopword = }")
    print(f"{vector = }")
# stopword = 'RgfTVGzPOl'
# vector = array([-0.9, 1.1, 0.7 , -0.3 , -0.7 , -0.7, -0.6])
#
# stopword = 'XlJqKdsvCC'
# vector = array([-0.5, 0.1, -0.7 , -0.6, -1.1, -0.6, -0.6])
#
#...
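If the goal is to look up the vector that belongs to a given stopword, keeping the two containers separate still works well. The sketch below assumes the stopwords are unique and uses a small stand-in array; row_of is a hypothetical helper name, not something from the question.

```python
import numpy as np

# Small stand-in for the (500, 151296) float32 array from the question
arr = np.arange(12, dtype=np.float32).reshape(3, 4)
stopwords = ["no", "not", "in"]

# Map each stopword to its row index (assumes the words are unique)
row_of = {word: i for i, word in enumerate(stopwords)}

# The numeric data keeps its float32 dtype; the words stay plain strings
vec = arr[row_of["not"]]
print(vec)
```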
Let's try some debugging.
Start with a smaller float array:
In [76]: arr = np.arange(12).reshape(3,4).astype(float)
In [77]: arr
Out[77]:
array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.]])
In [78]: words = ['no','not','in']
In [79]: for i in range(3):
    ...:     arr = np.append(arr[i], str(words[i]))
    ...:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Input In [79], in <cell line: 1>()
      1 for i in range(3):
----> 2     arr = np.append(arr[i], str(words[i]))
IndexError: index 2 is out of bounds for axis 0 with size 2
Look at i and arr when you get the error:
In [80]: arr
Out[80]: array(['1.0', 'not'], dtype='<U3')
In [81]: i
Out[81]: 2
arr looks nothing like the original arr, does it? It's a 1d array with 2 string elements. It's arr[2] that's raising the error. Do you understand why?
Recreate arr, and perform just one step:
In [82]: arr = np.arange(12).reshape(3,4).astype(float)
In [83]: np.append(arr[0], words[0])
Out[83]: array(['0.0', '1.0', '2.0', '3.0', 'no'], dtype='<U32')
That looks a bit like what you want for the first row, except it is string dtype. But you don't want to replace the original arr with this 1d array, do you?
Doing the i=1 step on this result produces
In [84]: np.append(Out[83][1], words[1])
Out[84]: array(['1.0', 'not'], dtype='<U3')
Which is the array that i=2 is having problems with (a shape (2,) array).
Don't just throw up your hands in despair when you get an error - debug by looking at variables, and testing the code step by step.
The kind of iteration that you attempt does work for lists:
In [85]: alist = arr.tolist()
In [86]: alist
Out[86]: [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0, 10.0, 11.0]]
In [87]: for i in range(3):
    ...:     alist[i].append(words[i])
    ...:
In [88]: alist
Out[88]:
[[0.0, 1.0, 2.0, 3.0, 'no'],
[4.0, 5.0, 6.0, 7.0, 'not'],
[8.0, 9.0, 10.0, 11.0, 'in']]
The elements of a list can differ in length; list append works in-place; lists can contain numbers and strings. None of this holds true for numpy arrays.
As a general rule, trying to replicate list methods with numpy arrays does not work.
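To make the contrast with list.append concrete, here is a short sketch of how np.append behaves. It uses the same kind of small float array as the session above; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(12, dtype=float).reshape(3, 4)

# np.append returns a new array; the original is left unchanged
taller = np.append(arr, [[12.0, 13.0, 14.0, 15.0]], axis=0)
print(arr.shape, taller.shape)

# Without axis, both inputs are flattened before joining
flat = np.append(arr, 99.0)
print(flat.shape)
```

Because each call copies all of the data into a new array, calling it repeatedly in a loop gets slower as the array grows.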

scipy curve_fit with arrays TypeError: only length-1 arrays can be converted to Python scalars

I am trying to fit a curve with scipy to the energy eigenvalues calculated from a 4x4 Hamiltonian matrix. In the error below, "energies" is the function in which I define the Hamiltonian, "xdata" is an array defined after and outside the function that corresponds to k, and "e" is the energy eigenvalues that I get.
The error seems to come from the Hamiltonian matrix. However, if I run the code without curve_fit, everything works fine.
I have also tried using np.array, as suggested in other questions I found here, but again it doesn't work.
If I pass a single value to curve_fit, like xdata[0], the code works, but that doesn't help me much since I want the fit to use all values.
Does anyone know what the problem is? Thank you all in advance!
Traceback (most recent call last):
  File "fitest.py", line 70, in <module>
    popt, pcov = curve_fit(energies,xdata, e)#,
  File "/eb/software/Python/2.7.12-intel-2016b/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 651, in curve_fit
    res = leastsq(func, p0, args=args, full_output=1, **kwargs)
  File "/eb/software/Python/2.7.12-intel-2016b/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 377, in leastsq
    shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
  File "/eb/software/Python/2.7.12-intel-2016b/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 26, in _check_func
    res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
  File "/eb/software/Python/2.7.12-intel-2016b/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 453, in _general_function
    return function(xdata, *params) - ydata
  File "fitest.py", line 23, in energies
    [ 0.0, 0.0, 0.0, ep-2*V4*cos(kpt*d) ]],dtype=complex)
TypeError: only length-1 arrays can be converted to Python scalars
Code:
from numpy import sin, cos, array
from scipy.optimize import curve_fit
from numpy import *
from numpy.linalg import *
def energies(kpt, a=1.0, b=2.0, c=3.0, f=4.0):
    e1=-15.0
    e2=-10.0
    d=1.0
    v0=(-2.0/d**2)
    V1=a*v0
    V2=b*v0
    V3=c*v0
    V4=d*v0
    basis=('|S, s>', '|S,px>', '|S, py>', '|S,pz>')
    h=array([[ e1-2*V1*cos(kpt*d), -2*V2*1j*sin(kpt*d), 0.0, 0.0 ],
             [ 2*V2*1j*sin(kpt*d), e2-2*V3*cos(kpt*d), 0.0, 0.0],
             [ 0.0, 0.0, e2-2*V4*cos(kpt*d), 0.0],
             [ 0.0, 0.0, 0.0, e2-2*V4*cos(kpt*d) ]],dtype=complex)
    e,psi=eigh(h)
    return e
print energies(kpt=0.0)
k2=0.4*2*pi/2.05
print energies(kpt=k2)
xdata = array([0.0,k2])
print xdata
popt, pcov = curve_fit(energies, xdata, e)
print " "
print popt
print " "
Your problem has nothing to do with the fit; you run into the same problem if you run
print energies(xdata)
The reason for this error message is that you put an array, kpt, into h as an array element and then tell numpy to convert that element into a complex number. Numpy is kind enough to convert an array of length one into a scalar, which can then be converted into a complex number. This explains why you didn't get an error message with xdata[0]. You can easily reproduce the problem like this:
import numpy as np
#all fine with an array of length one
xa = np.asarray([1])
a = np.asarray([[xa, 2, 3], [4, 5, 6]])
print a
print a.astype(complex)
#can't apply dtype = complex to an array with two elements
xb = np.asarray([1, 2])
b = np.asarray([[xb, 2, 3], [4, 5, 6]])
print b
print b.astype(complex)
I don't know what you were trying to achieve with your energies function, so I can only speculate about what you were aiming for when constructing the h array. Maybe a 3D array like this?
kpt = np.asarray([1, 2, 3])
h = np.zeros(16 * len(kpt), dtype = complex).reshape(len(kpt), 4, 4)
h[:, 0, 0] = 2 * kpt + 1
h[:, 0, 1] = kpt ** 2
h[:, 3, 2] = np.sin(kpt)
print h
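Building on the 3D-array idea, the sketch below fills one 4x4 Hamiltonian per k value and diagonalises the whole stack at once. It is an assumption about what the question intends, not a confirmed fix: energies_vectorized is a hypothetical name, the matrix entries are copied from the question (including V4 = d*v0, which leaves f unused), and the result is flattened on the assumption that curve_fit is given a 1-D ydata of matching length (4 values per k point).

```python
import numpy as np

def energies_vectorized(kpt, a=1.0, b=2.0, c=3.0, f=4.0):
    # Accept a scalar or an array of k points
    kpt = np.atleast_1d(np.asarray(kpt, dtype=float))
    e1, e2, d = -15.0, -10.0, 1.0
    v0 = -2.0 / d**2
    # Same definitions as in the question (f is unused there as well)
    V1, V2, V3, V4 = a * v0, b * v0, c * v0, d * v0

    # One 4x4 complex matrix per k point: shape (len(kpt), 4, 4)
    h = np.zeros((kpt.size, 4, 4), dtype=complex)
    h[:, 0, 0] = e1 - 2 * V1 * np.cos(kpt * d)
    h[:, 0, 1] = -2j * V2 * np.sin(kpt * d)
    h[:, 1, 0] = 2j * V2 * np.sin(kpt * d)
    h[:, 1, 1] = e2 - 2 * V3 * np.cos(kpt * d)
    h[:, 2, 2] = e2 - 2 * V4 * np.cos(kpt * d)
    h[:, 3, 3] = e2 - 2 * V4 * np.cos(kpt * d)

    # eigvalsh works on stacks of Hermitian matrices and returns
    # real eigenvalues in ascending order for each matrix
    return np.linalg.eigvalsh(h).ravel()

print(energies_vectorized(0.0))
print(energies_vectorized(np.array([0.0, 0.5])).shape)
```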

Saving to an empty array of arrays from nested for-loop

I have an array of arrays filled with zeros, so this is the shape I want for the result.
I'm having trouble saving the nested for-loop to this array of arrays. In other words, I want to replace all of the zeros with what the last line calculates.
percent = []
for i in range(len(F300)):
    percent.append(np.zeros(lengths[i]))
for i in range(0,len(Name)):
    for j in range(0,lengths[i]):
        percent[i][j]=(j+1)/lengths[i]
The last line only saves the last j value for each i.
I'm getting:
percent = [[0,0,1],[0,1],[0,0,0,1]]
but I want:
percent = [[.3,.6,1],[.5,1],[.25,.5,.75,1]]
The problem is that in Python 2.7 the / operator performs "classic" division: when both operands are integers, the result is floored to an integer, so (j+1)/lengths[i] is 0 until j+1 equals lengths[i]. There are a couple of different approaches to solve this in Python 2.7. One approach is to convert the numbers being divided into floating point numbers:
import numpy as np
lengths = [3, 2, 4] # Deduced values of lengths from your output.
percent = []
for i in range(3): # Deduced size of F300 from the length of percent.
    percent.append(np.zeros(lengths[i]))
for i in range(0, len(percent)):
    for j in range(0, lengths[i]):
        percent[i][j] = float(j + 1) / float(lengths[i])
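Another approach is to import division from the __future__ module. However, a __future__ import must appear at the top of the file, before any other statement (only the module docstring, comments and other __future__ imports may precede it).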
from __future__ import division
import numpy as np
lengths = [3, 2, 4] # Deduced values of lengths from your output.
percent = []
for i in range(3): # Deduced size of F300 from the length of percent.
    percent.append(np.zeros(lengths[i]))
for i in range(0, len(percent)):
    for j in range(0, lengths[i]):
        percent[i][j] = (j + 1) / lengths[i]
The third approach, and the one I personally prefer, is to make good use of NumPy's built-in functions:
import numpy as np
lengths = [3, 2, 4] # Deduced values of lengths from your output.
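# dtype=object because the rows have different lengths (newer NumPy versions require it)
percent = np.array([np.linspace(1.0 / float(l), 1.0, l) for l in lengths], dtype=object)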
All three approaches will produce a list (or in the last case, numpy.ndarray object) of numpy.ndarray objects with the following values:
[[0.33333333, 0.66666667, 1.], [0.5, 1.], [0.25, 0.5, 0.75, 1.]]
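A fourth variant, sketched here as an alternative rather than taken from the approaches above, avoids both the inner loop and the integer-division pitfall by dividing a whole np.arange at once. The lengths values are the same deduced ones used above.

```python
import numpy as np

lengths = [3, 2, 4]  # same deduced values as above

# For each length n, build 1..n and divide by float(n), so the division
# is floating point in Python 2 and Python 3 alike
percent = [np.arange(1, n + 1) / float(n) for n in lengths]

for row in percent:
    print(row)
```

The result stays a plain list of arrays, which suits rows of different lengths.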

How to decode a numpy array of encoded literals/strings in Python3? AttributeError: 'numpy.ndarray' object has no attribute 'decode'

In Python 3, I have the following NumPy array of strings.
Each string in the NumPy array is in the form b'MD18EE' instead of MD18EE.
For example:
import numpy as np
print(array1)
(b'first_element', b'element',...)
Normally, one would use .decode('UTF-8') to decode these elements.
However, if I try:
array1 = array1.decode('UTF-8')
I get the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'decode'
How do I decode these elements from a NumPy array? (That is, I don't want b'')
EDIT:
Let's say I was dealing with a Pandas DataFrame with only certain columns that were encoded in this manner. For example:
import pandas as pd
df = pd.DataFrame(...)
df
COL1 ....
0 b'entry1' ...
1 b'entry2'
2 b'entry3'
3 b'entry4'
4 b'entry5'
5 b'entry6'
You have an array of bytestrings; dtype is S:
In [338]: arr=np.array((b'first_element', b'element'))
In [339]: arr
Out[339]:
array([b'first_element', b'element'],
dtype='|S13')
astype easily converts them to unicode, the default string type for Py3.
In [340]: arr.astype('U13')
Out[340]:
array(['first_element', 'element'],
dtype='<U13')
There is also a library of string functions, np.char, which applies the corresponding str (or bytes) method to each element of a string array:
In [341]: np.char.decode(arr)
Out[341]:
array(['first_element', 'element'],
dtype='<U13')
The astype is faster, but the decode lets you specify an encoding.
See also How to decode a numpy array of dtype=numpy.string_?
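To illustrate the point about specifying an encoding, here is a small sketch with one non-ASCII element. The sample bytes are made up for the example and are assumed to be UTF-8 encoded.

```python
import numpy as np

# b'caf\xc3\xa9' is the UTF-8 encoding of 'café' (sample data for illustration)
arr = np.array([b'first_element', b'caf\xc3\xa9'])

# np.char.decode calls bytes.decode on every element with the given encoding
decoded = np.char.decode(arr, encoding='utf-8')
print(decoded)
print(decoded.dtype.kind)  # 'U' means a unicode string dtype
```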
If you want the result to be a (Python) list of strings, you can use a list comprehension:
>>> l = [el.decode('UTF-8') for el in array1]
>>> print(l)
['element', 'element 2']
>>> print(type(l))
<class 'list'>
Alternatively, if you want to keep it as a Numpy array, you can use np.vectorize to make a vectorized decoder function:
>>> decoder = np.vectorize(lambda x: x.decode('UTF-8'))
>>> array2 = decoder(array1)
>>> print(array2)
['element' 'element 2']
>>> print(type(array2))
<class 'numpy.ndarray'>
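For the pandas case mentioned in the EDIT, a possible approach is the Series.str.decode accessor, applied only to the affected columns. This is a sketch with a made-up frame: COL1 follows the question, while COL2 and the sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame: COL1 holds bytes, COL2 is an ordinary numeric column
df = pd.DataFrame({"COL1": [b"entry1", b"entry2"], "COL2": [1, 2]})

# Decode only the column that contains bytes objects
df["COL1"] = df["COL1"].str.decode("utf-8")
print(df)
```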

Cython buffer protocol: how to retrieve data?

I'm trying to setup a buffer protocol in cython. I declare a new class in which I setup the two necessary methods __getbuffer__ and __releasebuffer__
FYI I'm using Cython0.19 and Python2.7 and here is the cython code:
cimport numpy as CNY
# Cython buffer protocol implementation for my array class
cdef class P_NpArray:
    cdef CNY.ndarray npy_ar
    def __cinit__(self, inpy_ar):
        self.npy_ar=inpy_ar
    def __getbuffer__(self, Py_buffer *buffer, int flags):
        cdef Py_ssize_t ashape[2]
        ashape[0]=self.npy_ar.shape[0]
        ashape[1]=self.npy_ar.shape[1]
        cdef Py_ssize_t astrides[2]
        astrides[0]=self.npy_ar.strides[0]
        astrides[1]=self.npy_ar.strides[1]
        buffer.buf = <void *> self.npy_ar.data
        buffer.format = 'f'
        buffer.internal = NULL
        buffer.itemsize = self.npy_ar.itemsize
        buffer.len = self.npy_ar.size*self.npy_ar.itemsize
        buffer.ndim = self.npy_ar.ndim
        buffer.obj = self
        buffer.readonly = 0
        buffer.shape = ashape
        buffer.strides = astrides
        buffer.suboffsets = NULL
    def __releasebuffer__(self, Py_buffer *buffer):
        pass
This code compiles fine. But I can't retrieve the buffer data properly.
See the following test where:
I create a numpy array
load it with my buffer protocoled class
try to retrieve it as numpy array (Just to showcase my problem):
>>> import myarray
>>> import numpy as np
>>> ar=np.ones((2,4)) # create a numpy array
>>> ns=myarray.P_NpArray(ar) # declare numpy array as a new numpy-style array
>>> print ns
<myarray.P_NpArray object at 0x7f30f2791c58>
>>> nsa = np.asarray(ns) # Convert back to numpy array. Buffer protocol called here.
/home/tools/local/x86z/lib/python2.7/site-packages/numpy/core/numeric.py:235: RuntimeWarning: Item size computed from the PEP 3118 buffer format string does not match the actual item size.
return array(a, dtype, copy=False, order=order)
>>> print type(nsa) # Output array has the correct type
<type 'numpy.ndarray'>
>>> print "nsa=",nsa
nsa= <myarray.P_NpArray object at 0x7f30f2791c58>
>>> print "nsa.data=", nsa.data
nsa.data= Xy�0
>>> print "nsa.itemsize=",nsa.itemsize
nsa.itemsize= 8
>>> print "nsa.size=",nsa.size # Output array has a WRONG size
nsa.size= 1
>>> print "nsa.shape=",nsa.shape # Output array has a WRONG shape
nsa.shape= ()
>>> np.frombuffer(nsa.data, np.float64) # I can't get a proper read of the data buffer
[ 6.90941928e-310]
I looked into the RuntimeWarning and found that it is probably not relevant; see "PEP 3118 warning when using ctypes array as numpy array", http://bugs.python.org/issue10746 and http://bugs.python.org/issue10744. What do you think?
Obviously the buffer shape and size are not transmitted properly. So what am I missing? Is my buffer protocol correctly defined?
Thanks
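One thing that can be checked from plain Python is what a working export of the same array looks like, for comparison with the values set in __getbuffer__. The sketch below only inspects NumPy's own buffer through memoryview; it is not a fix for the Cython class. It does show that np.ones((2,4)) is float64, whose format code is 'd' with 8-byte items, whereas the format 'f' set above describes 4-byte floats, which is the mismatch the RuntimeWarning text refers to.

```python
import struct
import numpy as np

ar = np.ones((2, 4))  # float64 by default, as in the test session above

# memoryview reads the same buffer fields a consumer would receive
view = memoryview(ar)
print(view.format, view.itemsize, view.ndim)
print(view.shape, view.strides)

# Item sizes implied by the two format codes
print(struct.calcsize("f"), struct.calcsize("d"))
```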
