I have a NumPy array with shape (500, 151296). It looks like this:
array:
array([[-0.18510018, 0.13180602, 0.32903048, ..., 0.39744213,
-0.01461623, 0.06420607],
[-0.14988784, 0.12030973, 0.34801325, ..., 0.36962894,
0.04133283, 0.04434045],
[-0.3080041 , 0.18728344, 0.36068922, ..., 0.09335024,
-0.11459247, 0.10187756],
...,
[-0.17399777, -0.02492459, -0.07236133, ..., 0.08901921,
-0.17250113, 0.22222663],
[-0.17399777, -0.02492459, -0.07236133, ..., 0.08901921,
-0.17250113, 0.22222663],
[-0.17399777, -0.02492459, -0.07236133, ..., 0.08901921,
-0.17250113, 0.22222663]], dtype=float32)
array[0]:
array([-0.18510018, 0.13180602, 0.32903048, ..., 0.39744213,
-0.01461623, 0.06420607], dtype=float32)
I have another list of stopwords whose length matches the first axis of the array (500):
stopwords = ['no', 'not', 'in' .........]
I want to append each stopword to the corresponding row of the array, which has 500 rows. This is the code I am using:
for i in range(len(stopwords)):
    array = np.append(array[i], str(stopwords[i]))
It raises the following error:
IndexError Traceback (most recent call last)
<ipython-input-45-361e2cf6519b> in <module>
1 for i in range(len(stopwords)):
----> 2 array = np.append(array[i], str(stopwords[i]))
IndexError: index 2 is out of bounds for axis 0 with size 2
Desired output:
array[0]:
array([-0.18510018, 0.13180602, 0.32903048, ..., 0.39744213,
-0.01461623, 0.06420607, 'no'], dtype=float32)
Can anyone tell me what I am doing wrong?
What you are doing wrong is overwriting the variable array inside the for loop:
for i in range(len(stopwords)):
    array = np.append(array[i], str(stopwords[i]))
#   ^^^^^             ^^^^^
Using np.append in a for loop is a second problem: each call copies the whole array, so it is almost always a bad idea.
You could instead do something like:
from string import ascii_letters
from random import choices
import numpy as np
N, M = 50, 7
arr = np.random.randn(N, M)
stopwords = np.array(["".join(choices(ascii_letters, k=10)) for _ in range(N)])
result = np.concatenate([arr, stopwords[:, None]], axis=-1)
assert result.shape == (N, M+1)
print(result[0]) # ['0.1' '-1.2' '-0.1' '1.6' '-1.4' '-0.2' '1.7' 'ybWyFlqhcS']
But this is still a poor fit: a NumPy array holds a single dtype, so every float is converted to a string for no apparent reason.
IMHO, you are better off keeping the two arrays separate.
Depending on what you are doing, you can iterate over them together as follows:
for vector, stopword in zip(arr, stopwords):
    print(f"{stopword = }")
    print(f"{vector = }")
# stopword = 'RgfTVGzPOl'
# vector = array([-0.9, 1.1, 0.7 , -0.3 , -0.7 , -0.7, -0.6])
#
# stopword = 'XlJqKdsvCC'
# vector = array([-0.5, 0.1, -0.7 , -0.6, -1.1, -0.6, -0.6])
#
#...
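If the goal is to look up a vector by its stopword (an assumption about the use case, not something the question states), one option is to leave the float array untouched and build a plain dictionary next to it. A minimal sketch with made-up data; the name vector_for is hypothetical:

```python
import numpy as np

# Made-up stand-ins for the real (500, 151296) array and stopword list.
arr = np.arange(12, dtype=np.float32).reshape(3, 4)
stopwords = ["no", "not", "in"]

# Map each stopword to its row. Iterating over a 2-D array yields views,
# so no data is copied and the array keeps its float32 dtype.
vector_for = dict(zip(stopwords, arr))

print(vector_for["not"])        # row 1 of arr
print(vector_for["not"].dtype)  # float32
```

The numeric data stays numeric, and the strings live in ordinary Python objects where they belong.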
Let's try some debugging.
Start with a smaller float array:
In [76]: arr = np.arange(12).reshape(3,4).astype(float)
In [77]: arr
Out[77]:
array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.]])
In [78]: words = ['no','not','in']
In [79]: for i in range(3):
    ...:     arr = np.append(arr[i], str(words[i]))
    ...:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Input In [79], in <cell line: 1>()
1 for i in range(3):
----> 2 arr = np.append(arr[i], str(words[i]))
IndexError: index 2 is out of bounds for axis 0 with size 2
Look at i and arr when you get the error:
In [80]: arr
Out[80]: array(['1.0', 'not'], dtype='<U3')
In [81]: i
Out[81]: 2
arr looks nothing like the original arr, does it? It's a 1d array with 2 string elements. It's arr[2] that's raising the error. Do you understand why?
Recreate arr, and perform just one step:
In [82]: arr = np.arange(12).reshape(3,4).astype(float)
In [83]: np.append(arr[0], words[0])
Out[83]: array(['0.0', '1.0', '2.0', '3.0', 'no'], dtype='<U32')
That looks a bit like what you want for the first row, except it is string dtype. But you don't want to replace the original arr with this 1d array, do you?
Doing the i=1 step on this result produces
In [84]: np.append(Out[83][1], words[1])
Out[84]: array(['1.0', 'not'], dtype='<U3')
Which is the array that i=2 is having problems with (a shape (2,) array).
Don't just throw up your hands in despair when you get an error - debug by looking at variables, and testing the code step by step.
The kind of iteration that you attempt does work for lists:
In [85]: alist = arr.tolist()
In [86]: alist
Out[86]: [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0, 10.0, 11.0]]
In [87]: for i in range(3):
    ...:     alist[i].append(words[i])
    ...:
In [88]: alist
Out[88]:
[[0.0, 1.0, 2.0, 3.0, 'no'],
[4.0, 5.0, 6.0, 7.0, 'not'],
[8.0, 9.0, 10.0, 11.0, 'in']]
The elements of a list can differ in length; list append works in-place; lists can contain numbers and strings. None of this holds true for numpy arrays.
As a general rule, trying to replicate list methods with numpy arrays does not work.
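The difference from list.append can be seen directly. A minimal sketch with a made-up three-element row: np.append returns a new array and leaves its input alone, and the result is promoted to a string dtype.

```python
import numpy as np

row = np.array([0.0, 1.0, 2.0])

# np.append returns a NEW array; the input array is left unchanged.
extended = np.append(row, "no")

print(row.shape)            # (3,)
print(extended.shape)       # (4,)
print(extended.dtype.kind)  # U (every element is now a string)
```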
Let's say I have two lists a and b, where b is a list of arrays:
a = [1200, 1400, 1600, 1800]
b = [array([ 1.84714754, 4.94204658, 11.61580355, ..., 17.09772144,
17.09537562, 17.09499705]), array([ 3.08541849, 5.11338795, 10.26957508, ..., 16.90633304,
16.90417909, 16.90458781]), array([ 4.61916789, 4.58351918, 4.37590053, ..., -2.76705271,
-2.46715664, -1.94577492]), array([7.11040853, 7.79529924, 8.48873734, ..., 7.78736448, 8.47749987,
9.36040364])]
Both are reported to have shape (4,).
If I now try to plot these via plt.scatter(a, b),
I get an error I can't make sense of: ValueError: setting an array element with a sequence.
In the end I want a plot where, for the n-th value in a, all values stored in the n-th array of b are plotted.
I'm pretty sure I've done this before, but I can't get it working.
Any ideas? Thanks.
You need to repeat each element of a so that it lines up with the values of the corresponding sub-array in b:
len_b = [len(sub_array) for sub_array in b]
a = [repeat_a for i,repeat_a in enumerate(a) for _ in range(len_b[i])]
# convert the list of arrays to a flat list of values
# (np.ravel only flattens fully if all sub-arrays have the same length)
b = np.ravel(b).tolist()
# check if lengths are same
assert len(a) == len(b)
# if yes, now this should work
plt.scatter(a,b)
I am afraid repetition it is. If all lists in b have the same length, you can use numpy.repeat:
import numpy as np
import matplotlib.pyplot as plt
#fake data
np.random.seed(123)
a = [1200, 1400, 1600, 1800]
b = np.random.randint(1, 100, (4, 11)).tolist()
plt.scatter(np.repeat(a, len(b[0])), b)
plt.show()
If you are not sure and want to be on the safe side, list comprehension it is.
import numpy as np
import matplotlib.pyplot as plt
#fake data
np.random.seed(123)
a = [1200, 1400, 1600, 1800]
b = np.random.randint(1, 100, (4, 11)).tolist()
plt.scatter([[x]*len(b[i]) for i, x in enumerate(a)], b)
plt.show()
The output is the same:
Referring to the suggestion of @sai, I tried
import numpy as np
arr0 = np.array([1, 2, 3, 4, 5])
arr1 = np.array([6, 7, 8, 9])
arr2 = np.array([10, 11])
old_b = [arr0, arr1, arr2]
b = np.ravel(old_b).tolist()
print(len(b))
Which will give me length 3 instead of the length 11 I expected. How can I collapse a list of arrays to a single list?
edit:
b = np.concatenate(old_b).ravel().tolist()
will lead to the desired result. Thanks all.
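For sub-arrays of different lengths, the repetition and the flattening can both be done without a Python-level loop over the values. A minimal sketch with made-up data, assuming every entry of b is a 1-D array:

```python
import numpy as np

a = [1200, 1400, 1600, 1800]
# Made-up sub-arrays of different lengths, standing in for the real data.
b = [np.array([1.0, 2.0, 3.0]),
     np.array([4.0, 5.0]),
     np.array([6.0]),
     np.array([7.0, 8.0, 9.0, 10.0])]

# np.repeat accepts one repeat count per element of a.
x = np.repeat(a, [len(sub) for sub in b])
# np.concatenate joins 1-D arrays of any lengths end to end.
y = np.concatenate(b)

print(x.tolist())  # [1200, 1200, 1200, 1400, 1400, 1600, 1800, 1800, 1800, 1800]
print(y.shape)     # (10,)
```

Both x and y are flat and equally long, so plt.scatter(x, y) should accept them.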
Suppose we have the 1D array below:
arr = np.array([a,b,c])
The first thing I need to do is to form the pairwise products of all the elements, i.e.
[ab,ac,bc]
Then construct a 2D upper-triangular array from these elements:
[
[a,ab,ac],
[0,b,bc],
[0,0,c]
]
Create a diagonal with your 1-D array and fill the upper triangle of it with upper triangle of outer:
out = np.diag(arr)
#upper triangle indices
uidx = np.triu_indices(arr.size,k=1)
#replacing upper triangle with outer
out[uidx]=np.outer(arr,arr)[uidx]
One way to do this is to calculate the outer product of your 1d array and then use masking informed by the knowledge that you only want the upper triangle of the 2d triangular matrix.
import numpy as np
a = np.array([5,4,3])
n = len(a)
outer = np.outer(a, a)
outer[np.tril_indices(n)] = 0
outer[np.diag_indices(n)] = a
outer
array([[ 5, 20, 15],
[ 0, 4, 12],
[ 0, 0, 3]])
We can use masking to achieve our desired result, like so -
def upper_outer(a):
    out = a[:,None]*a
    out[np.tri(len(a), k=-1, dtype=bool)] = 0
    np.fill_diagonal(out,a)
    return out
Sample run -
In [84]: a = np.array([3,6,2])
In [86]: upper_outer(a)
Out[86]:
array([[ 3, 18, 6],
[ 0, 6, 12],
[ 0, 0, 2]])
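The same result can also be written as a single expression with np.triu. This is a compact variant for comparison, not one of the benchmarked solutions, and the function name upper_outer_triu is made up for this sketch:

```python
import numpy as np

def upper_outer_triu(a):
    # Hypothetical helper name; not taken from any of the answers.
    a = np.asarray(a)
    # Strict upper triangle of the outer product, plus a on the diagonal.
    return np.triu(np.outer(a, a), k=1) + np.diag(a)

print(upper_outer_triu([3, 6, 2]))
# [[ 3 18  6]
#  [ 0  6 12]
#  [ 0  0  2]]
```

It allocates a few temporaries, so it is likely slower than the in-place versions on large inputs.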
Benchmarking
Other approaches :
# @Nick Becker's soln
def tril_diag(a):
    n = len(a)
    outer = np.outer(a, a)
    outer[np.tril_indices(n)] = 0
    outer[np.diag_indices(n)] = a
    return outer
# @Ehsan's soln
def triu_outer(arr):
    out = np.diag(arr)
    uidx = np.triu_indices(arr.size,k=1)
    out[uidx]=np.outer(arr,arr)[uidx]
    return out
Using the benchit package (a few benchmarking tools packaged together; disclaimer: I am its author) to benchmark the proposed solutions:
import benchit
in_ = [np.random.rand(n) for n in [10,100,200,500,1000,5000]]
funcs = [upper_outer, tril_diag, triu_outer]
t = benchit.timings(funcs, in_)
t.rank()
t.plot(logx=True, save='timings.png')
For large datasets, we can also use numexpr to leverage multi-cores -
import numexpr as ne
def upper_outer_v2(a):
    mask = ~np.tri(len(a), dtype=bool)
    out = ne.evaluate('a2D*a*mask',{'a2D':a[:,None], 'a':a, 'mask':mask})
    np.fill_diagonal(out,a)
    return out
New timings plot :
There is a blas function for (almost) that:
# example
a = np.array([1.,2.,5.])
from scipy.linalg.blas import dsyr
# apply blas function; transpose since blas uses FORTRAN order
out = dsyr(1,a,1).T
# fix diagonal
out.reshape(-1)[::a.size+1] = a
out
# array([[ 1., 2., 5.],
# [ 0., 2., 10.],
# [ 0., 0., 5.]])
benchit (thanks @Divakar)
I am trying to fit a curve with scipy to the energy eigenvalues calculated from a 4x4 Hamiltonian matrix. In the error below, "energies" is the function in which I define the Hamiltonian, "xdata" is an array defined outside the function that corresponds to k, and "e" is the energy eigenvalues that I get.
The error seems to be in the Hamiltonian matrix. However, if I run the code without curve_fit, everything works fine.
I have also tried using np.array, as suggested in other questions I found here, but again it doesn't work.
If I pass a single value to curve_fit, such as xdata[0], the code works, but that doesn't help much since I want to fit using all values.
Does anyone know what the problem is? Thank you all in advance!
Traceback (most recent call last):
File "fitest.py", line 70, in <module>
popt, pcov = curve_fit(energies,xdata, e)#,
File "/eb/software/Python/2.7.12-intel-2016b/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 651, in curve_fit
res = leastsq(func, p0, args=args, full_output=1, **kwargs)
File "/eb/software/Python/2.7.12-intel-2016b/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 377, in leastsq
shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
File "/eb/software/Python/2.7.12-intel-2016b/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 26, in _check_func
res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
File "/eb/software/Python/2.7.12-intel-2016b/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 453, in _general_function
return function(xdata, *params) - ydata
File "fitest.py", line 23, in energies
[ 0.0, 0.0, 0.0, ep-2*V4*cos(kpt*d) ]],dtype=complex)
TypeError: only length-1 arrays can be converted to Python scalars
Code:
from numpy import sin, cos, array
from scipy.optimize import curve_fit
from numpy import *
from numpy.linalg import *
def energies(kpt, a=1.0, b=2.0, c=3.0, f=4.0):
    e1=-15.0
    e2=-10.0
    d=1.0
    v0=(-2.0/d**2)
    V1=a*v0
    V2=b*v0
    V3=c*v0
    V4=d*v0
    basis=('|S, s>', '|S,px>', '|S, py>', '|S,pz>')
    h=array([[ e1-2*V1*cos(kpt*d), -2*V2*1j*sin(kpt*d), 0.0, 0.0 ],
             [ 2*V2*1j*sin(kpt*d), e2-2*V3*cos(kpt*d), 0.0, 0.0],
             [ 0.0, 0.0, e2-2*V4*cos(kpt*d), 0.0],
             [ 0.0, 0.0, 0.0, e2-2*V4*cos(kpt*d) ]],dtype=complex)
    e,psi=eigh(h)
    return e
print energies(kpt=0.0)
k2=0.4*2*pi/2.05
print energies(kpt=k2)
xdata = array([0.0,k2])
print xdata
popt, pcov = curve_fit(energies, xdata, e)
print " "
print popt
print " "
Your problem has nothing to do with the fit; you run into the same error if you call
print energies(xdata)
The reason for this error message is that you put the array kpt into h as an array element and then tell NumPy to convert it into a complex number. NumPy is kind enough to convert an array of length one into a scalar, which can then be converted into a complex number. This explains why you didn't get an error message with xdata[0]. You can easily reproduce the problem like this:
import numpy as np
#all fine with an array of length one
xa = np.asarray([1])
a = np.asarray([[xa, 2, 3], [4, 5, 6]])
print a
print a.astype(complex)
#can't apply dtype = complex to an array with two elements
xb = np.asarray([1, 2])
b = np.asarray([[xb, 2, 3], [4, 5, 6]])
print b
print b.astype(complex)
I don't know what you were trying to achieve with your energies function, so I can only speculate about what you were aiming for when constructing the h array. Maybe a 3D array like this?
kpt = np.asarray([1, 2, 3])
h = np.zeros(16 * len(kpt), dtype = complex).reshape(len(kpt), 4, 4)
h[:, 0, 0] = 2 * kpt + 1
h[:, 0, 1] = kpt ** 2
h[:, 3, 2] = np.sin(kpt)
print h
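Applied to the Hamiltonian from the question, that idea could look like the sketch below. It is an illustration under assumptions, not a verified fix: the matrix entries and constants are copied from the question's code (including V4 = d*v0, which leaves the parameter f unused), and it relies on np.linalg.eigvalsh accepting a stack of matrices of shape (n, 4, 4).

```python
import numpy as np

def energies(kpt, a=1.0, b=2.0, c=3.0, f=4.0):
    # Accept a scalar or an array of k-points.
    kpt = np.atleast_1d(np.asarray(kpt, dtype=float))
    e1, e2, d = -15.0, -10.0, 1.0
    v0 = -2.0 / d**2
    V1, V2, V3 = a * v0, b * v0, c * v0
    V4 = d * v0  # as written in the question; f is unused there

    # One 4x4 Hamiltonian per k-point, filled entry by entry.
    h = np.zeros((kpt.size, 4, 4), dtype=complex)
    h[:, 0, 0] = e1 - 2 * V1 * np.cos(kpt * d)
    h[:, 0, 1] = -2j * V2 * np.sin(kpt * d)
    h[:, 1, 0] = 2j * V2 * np.sin(kpt * d)
    h[:, 1, 1] = e2 - 2 * V3 * np.cos(kpt * d)
    h[:, 2, 2] = e2 - 2 * V4 * np.cos(kpt * d)
    h[:, 3, 3] = e2 - 2 * V4 * np.cos(kpt * d)

    # eigvalsh works on stacked Hermitian matrices: result shape (n, 4).
    return np.linalg.eigvalsh(h)

print(energies(np.array([0.0, 0.5])).shape)  # (2, 4)
```

curve_fit expects the model to return a 1D array matching ydata, so the result would probably need to be flattened (for example with .ravel()) before fitting; that part is not tested here.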
I have an array of arrays filled with zeros, so this is the shape I want for the result.
I'm having trouble saving the nested for-loop to this array of arrays. In other words, I want to replace all of the zeros with what the last line calculates.
percent = []
for i in range(len(F300)):
    percent.append(np.zeros(lengths[i]))
for i in range(0,len(Name)):
    for j in range(0,lengths[i]):
        percent[i][j]=(j+1)/lengths[i]
The last line only saves the last j value for each i.
I'm getting:
percent = [[0,0,1],[0,1],[0,0,0,1]]
but I want:
percent = [[.3,.6,1],[.5,1],[.25,.5,.75,1]]
The problem with this code is that in Python 2.7 the / operator performs "classic" division, which floors the result when both operands are integers. There are a couple of different approaches to solve this in Python 2.7. One approach is to convert the numbers being divided into floating point numbers:
import numpy as np
lengths = [3, 2, 4] # Deduced values of lengths from your output.
percent = []
for i in range(3): # Deduced size of F300 from the length of percent.
    percent.append(np.zeros(lengths[i]))
for i in range(0, len(percent)):
    for j in range(0, lengths[i]):
        percent[i][j] = float(j + 1) / float(lengths[i])
Another approach is to import division from the __future__ module. This import must appear at the top of the file, before any other code (only the module docstring and comments may precede it).
from __future__ import division
import numpy as np
lengths = [3, 2, 4] # Deduced values of lengths from your output.
percent = []
for i in range(3): # Deduced size of F300 from the length of percent.
    percent.append(np.zeros(lengths[i]))
for i in range(0, len(percent)):
    for j in range(0, lengths[i]):
        percent[i][j] = (j + 1) / lengths[i]
The third approach, and the one I personally prefer, is to make good use of NumPy's built-in functions:
import numpy as np
lengths = [3, 2, 4] # Deduced values of lengths from your output.
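# dtype=object is required by recent NumPy versions because the rows differ in length.
percent = np.array([np.linspace(1.0 / float(l), 1.0, l) for l in lengths], dtype=object)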
All three approaches will produce a list (or in the last case, numpy.ndarray object) of numpy.ndarray objects with the following values:
[[0.33333333, 0.66666667, 1.], [0.5, 1.], [0.25, 0.5, 0.75, 1.]]
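In Python 3 the / operator already performs true division, so none of the workarounds is needed there. A minimal sketch, assuming Python 3 and the same deduced lengths, that keeps the rows in a plain list and so avoids building a ragged array:

```python
import numpy as np

lengths = [3, 2, 4]  # same deduced values as in the examples above

# np.arange(1, n + 1) gives 1..n; dividing by n yields the running fractions.
percent = [np.arange(1, n + 1) / n for n in lengths]

for row in percent:
    print(row)
```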
I read the following string on a .txt file
{{1,2,3,0},{4,5,6,7},{8,-1,9,0}}
using lin = lin.strip() to remove '\n'
Then I replaced { and } with [ and ] using
lin = lin.replace ("{", "[")
lin = lin.replace ("}", "]")
My goal is to convert lin into a float 2d array. So I did
my_matrix = np.array(lin, dtype=float)
but I got an error message: "ValueError: could not convert string to float: [[1,2,3,0],[1,1,1,2],[0,-1,3,9]]"
Removing the dtype, I get a string array. I already tried multiplying lin by 1.0 and making a copy of lin using .astype(float), but nothing seems to work.
I use the json module to parse the contents of the file, then iterate through the nested lists and convert each element to float. However, the integer result might already be enough for what you want, and getting it is faster and shorter.
import json
fc = '{{1,2,3,0},{4,5,6,7},{8,-1,9,0}}'
a = json.loads(fc.replace('{','[').replace('}',']'))
print(a) # a is now a list of lists of integers. this might be enough
for linenumber, linecontent in enumerate(a):
    for elementnumber, element in enumerate(linecontent):
        a[linenumber][elementnumber] = float(element)
print(a) # a is now a list of lists of floats
Shorter solution
import json
fc = '{{1,2,3,0},{4,5,6,7},{8,-1,9,0}}'
a = json.loads(fc.replace('{','[').replace('}',']'))
print(a) # a is now a list of lists of integers. this might be enough
a = [[float(c) for c in b] for b in a]
print(a) # a is now a list of lists of floats
(works for both python 2 and 3)
import numpy as np
readStr = "{{1,2,3,0},{4,5,6,7},{8,-1,9,0}}"
readStr = readStr[2:-2]
# Originally read string is now -> "1,2,3,0},{4,5,6,7},{8,-1,9,0"
line = readStr.split("},{")
# line is now a list object -> ["1,2,3,0", "4,5,6,7", "8,-1,9,0"]
array = []
temp = []
# Now we iterate through 'line', convert each element into a list, and
# then append said list to 'array' on each iteration of 'line'
for string in line:
    num_array = string.split(',')
    for num in num_array:
        temp.append(num)
    array.append(temp)
    temp = []
# Now 'array' holds strings -> [['1','2','3','0'], ['4','5','6','7'], ['8','-1','9','0']]
my_matrix = np.array(array, dtype = float)
# my_matrix = [[1.0, 2.0, 3.0, 0.0]
# [4.0, 5.0, 6.0, 7.0]
# [8.0, -1.0, 9.0, 0.0]]
Although this may not be the most elegant solution, I think it is easy to follow and gives you exactly what you're looking for.
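The two ideas above can also be combined: let json do the parsing and let NumPy do the float conversion. A minimal sketch, assuming the text contains only braces, commas, and numbers as in the question:

```python
import json
import numpy as np

text = "{{1,2,3,0},{4,5,6,7},{8,-1,9,0}}"

# Swap braces for brackets so the text becomes valid JSON, parse it into
# nested lists, and let np.array convert the integers to floats.
nested = json.loads(text.replace("{", "[").replace("}", "]"))
my_matrix = np.array(nested, dtype=float)

print(my_matrix.shape)  # (3, 4)
print(my_matrix.dtype)  # float64
```

The original attempt failed because np.array was given the string itself; parsing it into nested lists first is the missing step.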