I have two numpy structured arrays arr1, arr2.
arr1 has fields ['f1','f2','f3'].
arr2 has fields ['f1','f2','f3','f4'].
I.e.:
arr1 = [[f1_1_1, f2_1_1, f3_1_1 ],
        [f1_1_2, f2_1_2, f3_1_2 ],
        ... ,
        [f1_1_N1, f2_1_N1, f3_1_N1]]
arr2 = [[f1_2_1, f2_2_1, f3_2_1, f4_2_1 ],
        [f1_2_2, f2_2_2, f3_2_2, f4_2_2 ],
        ... ,
        [f1_2_N2, f2_2_N2, f3_2_N2, f4_2_N2]]
I want to assign various slices of arr1 to the corresponding slice of arr2 (slices in the indexes and in the fields).
See below for the various cases.
From answers I found (to related, but not exactly the same, questions) it seemed to me that the only way to do it is assigning one slice at a time, for a single field, i.e., something like
arr2['f1'][0:1] = arr1['f1'][0:1]
(and I can confirm this works), looping over all source fields in the slice.
Is there a way to assign all intended source fields in the slice at once?
I mean to assign, say, the elements marked x below.
Case 1 (only some fields in arr1)
arr1 = [[ x , x , f3_1_1 ],
        [ x , x , f3_1_2 ],
        ... ,
        [f1_1_N1, f2_1_N1, f3_1_N1]]
arr2 = [[ x , x , f3_2_1, f4_2_1 ],
        [ x , x , f3_2_2, f4_2_2 ],
        ... ,
        [f1_2_N2, f2_2_N2, f3_2_N2, f4_2_N2]]
Case 2 (all fields in arr1)
arr1 = [[ x , x , x ],
        [ x , x , x ],
        ... ,
        [f1_1_N1, f2_1_N1, f3_1_N1]]
arr2 = [[ x , x , x , f4_2_1 ],
        [ x , x , x , f4_2_2 ],
        ... ,
        [f1_2_N2, f2_2_N2, f3_2_N2, f4_2_N2]]
Case 3
arr1 has fields ['f1','f2','f3','f5'].
arr2 has fields ['f1','f2','f3','f4'].
Assign a slice of ['f1','f2','f3']
Sources:
Python Numpy Structured Array (recarray) assigning values into slices
Convert a slice of a structured array to regular NumPy array in NumPy 1.14
You can do it, for example, like this:
import numpy as np
x = np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)], dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
y = np.array([('Carl', 10, 75.0), ('Joe', 7, 76.0)], dtype=[('name2', 'U10'), ('age2', 'i4'), ('weight', 'f4')])
print(x[['name', 'age']])
print(y[['name2', 'age2']])
# multiple field indexing
y[['name2', 'age2']] = x[['name', 'age']]
print(y[['name2', 'age2']])
# you can also use slicing if you want specific parts or the size does not match
y[:1][['name2', 'age2']] = x[1:][['name', 'age']]
print(y[:][['name2', 'age2']])
Assignment between structured arrays matches fields by position, so the field names can be different. I am not sure about the dtypes and whether any (down)casting takes place.
https://docs.scipy.org/doc/numpy/user/basics.rec.html#assignment-from-other-structured-arrays
https://docs.scipy.org/doc/numpy/user/basics.rec.html#accessing-multiple-fields
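Applied to the arr1/arr2 layout from the question, the multi-field assignment above might look like the following. This is a minimal sketch, assuming NumPy 1.16 or later (where multi-field indexing returns a view); the dtypes and sample values are made up for illustration.

```python
import numpy as np

# Hypothetical dtypes and values; only the field names come from the question.
arr1 = np.zeros(4, dtype=[('f1', 'i4'), ('f2', 'f8'), ('f3', 'f8')])
arr2 = np.zeros(5, dtype=[('f1', 'i4'), ('f2', 'f8'), ('f3', 'f8'), ('f4', 'f8')])
arr1['f1'] = [1, 2, 3, 4]
arr1['f2'] = [0.1, 0.2, 0.3, 0.4]
arr1['f3'] = [10.0, 20.0, 30.0, 40.0]

# Case 1: copy only fields f1 and f2 for rows 0..1.
# arr2[0:2] is a view, so assigning to its fields writes into arr2.
arr2[0:2][['f1', 'f2']] = arr1[0:2][['f1', 'f2']]

# Case 2: copy every field of arr1 for rows 2..3.
# Both sides have three fields, which are matched by position.
arr2[2:4][['f1', 'f2', 'f3']] = arr1[2:4]

print(arr2)
```

Case 3 follows the same pattern as Case 1: select ['f1', 'f2', 'f3'] on both sides so that the number of fields matches.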
Related
I'm a bit confused about how numpy's ndarray's min/max function with a given axis argument works.
import numpy as np
x = np.random.rand(2,3,4)
x.min(axis=0)
produces
array([[[0.4139181 , 0.24235588, 0.50214552, 0.38806332],
[0.63775691, 0.08142376, 0.69722379, 0.1968098 ],
[0.50496744, 0.54245416, 0.75325114, 0.67245846]],
[[0.79760899, 0.35819981, 0.5043491 , 0.75274284],
[0.54778544, 0.5597848 , 0.52325408, 0.66775091],
[0.71255276, 0.85835137, 0.60197253, 0.33060771]]])
array([[0.4139181 , 0.24235588, 0.50214552, 0.38806332],
[0.54778544, 0.08142376, 0.52325408, 0.1968098 ],
[0.50496744, 0.54245416, 0.60197253, 0.33060771]])
a 3x4 numpy array. I was thinking it would produce a size 2 array with the minimum for x[0] and x[1].
Can someone explain how this min function is working?
When you call x.min(axis=0), the minimum is computed along axis 0: that dimension is collapsed to a single value per position, so the output has shape (3, 4).
What you want is to compute the min on the combined axes 1 and 2:
x.min(axis=(1,2))
# array([0.38344152, 0.0202184 ])
You can also first reshape the array to combine those two dimensions, then compute the min along this new dimension (here, 1):
x.reshape(2,-1).min(axis=1)
# array([0.38344152, 0.0202184 ])
Intermediate, reshaped array:
x.reshape(2,-1)
array([[0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ,
0.64589411, 0.43758721, 0.891773 , 0.96366276, 0.38344152,
0.79172504, 0.52889492],
[0.56804456, 0.92559664, 0.07103606, 0.0871293 , 0.0202184 ,
0.83261985, 0.77815675, 0.87001215, 0.97861834, 0.79915856,
0.46147936, 0.78052918]])
Input used:
np.random.seed(0)
x = np.random.rand(2,3,4)
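To make the difference between the two reductions concrete, here is a small sketch that reuses the seeded input above and compares the shapes. The variable names are chosen for illustration only.

```python
import numpy as np

np.random.seed(0)
x = np.random.rand(2, 3, 4)

# Reducing over axis 0 compares x[0] and x[1] element by element,
# leaving one minimum per (row, column) position.
per_position = x.min(axis=0)
print(per_position.shape)  # (3, 4)

# Reducing over axes 1 and 2 collapses each 3x4 block to one number.
per_block = x.min(axis=(1, 2))
print(per_block.shape)  # (2,)

# Same result as taking the minimum of each block separately.
manual = np.array([x[0].min(), x[1].min()])
print(np.array_equal(per_block, manual))  # True
```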
I have the following NumPy array, extracted from a DataFrame, that I want to reshape.
Extraction
x = c_df['x'].values
y = c_df['y'].values
z = c_df['z'].values
convert to array
x_y_z = np.array([x, y, z])
x_y_z
Array looks like this
array([[748260.27757, 748262.56478, 748263.52455, ..., 730354.86406,
730374.75 , 730388.45066],
[333346.25 , 333308.43521, 333296.25 , ..., 331466.13593,
331453.84365, 331446.25 ],
[ 2840. , 2840. , 2840. , ..., 2400. ,
2400. , 2400. ]])
Basically, I want to reshape it so that I can plot it using plt.contourf, which requires Z to be a 2D array,
so I assume the array needs to be reshaped to something like
YYYYYYYYYY
Xzzzzzzzzzz
Xzzzzzzzzzz
Xzzzzzzzzzz
Xzzzzzzzzzz
Is my assumption correct? If yes, how do I reshape the array?
If I understand you correctly, NumPy's mgrid should be able to help you. However, you might want some more explanation, which can be found in this thread.
Next time, you can make it easier by providing a simplified example of your problem.
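A minimal sketch of the reshaping step, assuming the (x, y) pairs lie on a regular grid and each pair occurs once. The small arrays are hypothetical stand-ins for the DataFrame columns. Scattered points would first need interpolating onto a grid, which is not shown here. np.meshgrid is used rather than np.mgrid because the grid coordinates are taken from the data instead of being generated from a range.

```python
import numpy as np

# Hypothetical stand-ins for c_df['x'], c_df['y'], c_df['z'].
xs = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
ys = np.array([10.0, 10.0, 10.0, 20.0, 20.0, 20.0])
zs = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Sorted unique coordinates, plus each point's position along each axis.
x_unique, x_idx = np.unique(xs, return_inverse=True)
y_unique, y_idx = np.unique(ys, return_inverse=True)

# Rows follow y and columns follow x; cells with no data stay NaN.
Z = np.full((y_unique.size, x_unique.size), np.nan)
Z[y_idx, x_idx] = zs

# 2D coordinate arrays with the same shape as Z.
X, Y = np.meshgrid(x_unique, y_unique)
print(Z)
```

X, Y and Z then have matching 2D shapes and can be passed to plt.contourf(X, Y, Z).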
I have a 2D array called X and a 1D array holding the class of each row of X. What I want to do is take the same number of elements, the first N percent, from each class and store them in a new array, in a simple way without for loops. For example:
For the following X array which is 2D:
[[0.612515 0.385088 ]
[0.213345 0.174123 ]
[0.432596 0.8714246]
[0.700230 0.730789 ]
[0.455105 0.128509 ]
[0.518423 0.295175 ]
[0.659871 0.320614 ]
[0.459677 0.940614 ]
[0.823733 0.831789 ]
[0.236175 0.10750 ]
[0.379032 0.241121 ]
[0.512535 0.8522193]
Output is 3.
Then I'd like to store the first 3 elements that belong to class 0 and the first 3 elements that belong to class 1, maintaining the order in which the indices occur, giving the following output:
First 3 from each class:
[1 0 0 1 0 1]
New_X =
[[0.612515 0.385088 ]
[0.213345 0.174123 ]
[0.432596 0.8714246]
[0.700230 0.730789 ]
[0.455105 0.128509 ]
[0.518423 0.295175 ]]
First, 30% is only 2 elements from each class (even when using np.ceil).
Second, I'll assume both arrays are numpy.array.
Given the 2 arrays, we can find the desired indices using np.where and array y in the following way:
in_ = sorted([x for x in [*np.where(y==0)[0][:np.ceil(0.3*6).astype(int)],*np.where(y==1)[0][:np.ceil(0.3*6).astype(int)]]]) # [0, 1, 2, 3]
Now we can simply slice X like so:
X[in_]
# array([[0.612515 , 0.385088 ],
# [0.213345 , 0.174123 ],
# [0.432596 , 0.8714246],
# [0.70023 , 0.730789 ]])
The definitions of X and y are:
X = np.array([[0.612515 , 0.385088 ],
[0.213345 , 0.174123 ],
[0.432596 , 0.8714246],
[0.70023 , 0.730789 ],
[0.455105 , 0.128509 ],
[0.518423 , 0.295175 ],
[0.659871 , 0.320614 ],
[0.459677 , 0.940614 ],
[0.823733 , 0.831789 ],
[0.236175 , 0.1075 ],
[0.379032 , 0.241121 ],
[0.512535 , 0.8522193]])
y = np.array([1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0])
Edit
The expression np.where(y==0)[0][:np.ceil(0.3*6).astype(int)] does the following:
np.where(y==0)[0] - returns all the indices where y==0
Since you wanted only the 30%, we slice those indices to get all the values up to 30% - [:np.ceil(0.3*6).astype(int)]
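The same idea can be written without hard-coding the class labels or the class size of 6. This is a sketch only: first_fraction_per_class is a hypothetical helper name, and it still loops over the classes (not over the rows), which is usually a short loop.

```python
import numpy as np

def first_fraction_per_class(y, fraction):
    """Return sorted indices of the first `fraction` of each class in y."""
    picked = []
    for label in np.unique(y):
        members = np.flatnonzero(y == label)          # indices of this class
        n = int(np.ceil(fraction * members.size))     # how many to keep
        picked.append(members[:n])
    return np.sort(np.concatenate(picked))            # restore occurrence order

y = np.array([1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0])

idx_30 = first_fraction_per_class(y, 0.3)
print(idx_30)      # indices to use as X[idx_30]

idx_50 = first_fraction_per_class(y, 0.5)
print(y[idx_50])   # class labels of the selected rows
```

With fraction 0.5, each class contributes 3 rows, which corresponds to the "first 3 from each class" output in the question.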
I have a CSV file:
8.84,17.22,13.22,3.84
3.99,11.73,19.66,1.27
import numpy as np
def jo(x):
    # Load the comma-separated values into a 2D array
    data = np.loadtxt(x, delimiter=',')
    return data
print(jo('data.csv'))
The code returns:
[ [8.84 17.22 13.22 3.84]
[3.99 11.73 19.66 1.27] ]
But I want all these elements in a single array, because I want to find their mean and median.
How can I combine these 2 arrays into 1?
Use numpy.reshape:
# data: data is your array
>>> data.reshape(-1)
In [245]: txt="""8.84,17.22,13.22,3.84
...: 3.99,11.73,19.66,1.27"""
In [246]: data = np.loadtxt(txt.splitlines(), delimiter=',')
In [247]: data
Out[247]:
array([[ 8.84, 17.22, 13.22, 3.84],
[ 3.99, 11.73, 19.66, 1.27]])
In [248]: data.shape
Out[248]: (2, 4)
That is one array, just 2d.
There are various ways of turning that into a 1d array:
In [259]: arr = data.ravel()
In [260]: arr
Out[260]: array([ 8.84, 17.22, 13.22, 3.84, 3.99, 11.73, 19.66, 1.27])
But there's no need to do that. mean (and median) without an axis parameter act on the raveled array. Check the docs:
In [261]: np.mean(data)
Out[261]: 9.971250000000001
In [262]: np.mean(arr)
Out[262]: 9.971250000000001
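For completeness, here is a short sketch that computes both statistics the question asks for directly on the 2D array. It reads the two CSV rows from an in-memory string instead of data.csv so that it runs on its own.

```python
import io
import numpy as np

# The two rows from the question, held in memory instead of data.csv.
csv_text = "8.84,17.22,13.22,3.84\n3.99,11.73,19.66,1.27\n"
data = np.loadtxt(io.StringIO(csv_text), delimiter=',')

# Without an axis argument, both functions use all 8 values.
mean_all = np.mean(data)
median_all = np.median(data)

# With axis=1, one value per row is returned instead.
row_means = np.mean(data, axis=1)

print(mean_all, median_all, row_means)
```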
I have got a 3d array (an array of triangles). I would like to get the triangles (2d arrays) containing a given point (1d array).
I went through in1d, where and argwhere, but I am still unsuccessful.
For instance, with:
import numpy as np
import numpy.random as rd
t = rd.random_sample((10,3,3))
v0 = np.array([1,2,3])
t[1,2] = v0
t[5,0] = v0
t[8,1] = v0
I would like to get:
array([[[ 0.87312 , 0.33411403, 0.56808291],
[ 0.36769417, 0.66884858, 0.99675896],
[ 1. , 2. , 3. ]],
[[ 0.31995867, 0.58351034, 0.38731405],
[ 1. , 2. , 3. ],
[ 0.04435288, 0.96613852, 0.83228402]],
[[ 1. , 2. , 3. ],
[ 0.28647107, 0.95755263, 0.5378722 ],
[ 0.73731078, 0.8777235 , 0.75866665]]])
to then get the set of v0 adjacent points
{[ 0.87312 , 0.33411403, 0.56808291],
[ 0.36769417, 0.66884858, 0.99675896],
[ 0.31995867, 0.58351034, 0.38731405],
[ 0.04435288, 0.96613852, 0.83228402],
[ 0.28647107, 0.95755263, 0.5378722 ],
[ 0.73731078, 0.8777235 , 0.75866665]}
without looping, the array being quite big.
For instance
In [28]: np.in1d(v0,t[8]).all()
Out[28]: True
works as a test on a single triangle, but I can't apply it to the whole array.
Thanks for your help.
What I mean is the vectorized equivalent to:
In[54]:[triangle for triangle in t if v0 in triangle ]
Out[54]:
[array([[ 0.87312 , 0.33411403, 0.56808291],
[ 0.36769417, 0.66884858, 0.99675896],
[ 1. , 2. , 3. ]]),
array([[ 0.31995867, 0.58351034, 0.38731405],
[ 1. , 2. , 3. ],
[ 0.04435288, 0.96613852, 0.83228402]]),
array([[ 1. , 2. , 3. ],
[ 0.28647107, 0.95755263, 0.5378722 ],
[ 0.73731078, 0.8777235 , 0.75866665]])]
You can simply do -
t[(t==v0).all(axis=-1).any(axis=-1)]
We perform an ALL reduction and then an ANY reduction, each along the last axis of its input (axis=-1). First, .all(axis=-1) finds the rows (vertices) that exactly match v0; then .any(axis=-1) checks whether each 2D block (triangle) contains any such row. The result is a boolean array with one entry per triangle, which we use to select the matching triangles from the input array.
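The two reductions can be seen step by step in the sketch below, which also extracts the adjacent points the question asks for. It assumes exact floating-point equality is acceptable, as in the question's setup; the intermediate names (is_v0, has_v0, neighbours) are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.random((10, 3, 3))          # 10 triangles, 3 vertices, 3 coordinates
v0 = np.array([1.0, 2.0, 3.0])
t[1, 2] = v0
t[5, 0] = v0
t[8, 1] = v0

# (10, 3): True where a vertex equals v0 in all three coordinates.
is_v0 = (t == v0).all(axis=-1)

# (10,): True where a triangle has at least one such vertex.
has_v0 = is_v0.any(axis=-1)

triangles = t[has_v0]               # the matching triangles

# Vertices of the matching triangles that are not v0 itself.
neighbours = triangles[~is_v0[has_v0]]

print(np.flatnonzero(has_v0))
print(neighbours)
```

If the coordinates come from computations rather than direct assignment, np.isclose(t, v0) could replace t == v0 in the first step.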