Confirm all columns in a pandas DataFrame are 1-D arrays

It is not good practice to store multidimensional arrays or lists as column values in a pandas DataFrame, so I want to raise a ValueError whenever any column in a DataFrame is not 1-D.
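Given a dataset:
import numpy as np
import pandas as pd

# Pass the nested list directly: wrapping it in np.array() fails on these
# ragged rows (tuples mixed with scalars) in recent NumPy versions.
dfA = pd.DataFrame(
    [
        [1, (0, 2), 0, 3],
        [1, (0, 0), 1, 2],
        [0, (5, 1), 6, 1],
        [4, (3, 0), 3, 4],
        [1, (1, 1), 0, 2],
        [2, (0, 1), 3, 5],
        [1, (3, 3), 1, 2],
        [6, (4, 3), 5, 3],
        [3, (0, 2), 1, 2],
        [2, (0, 0), 2, 1],
    ],
    columns=['A', 'B', 'C', 'D'])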
I want to do something similar to:
if columns in dfA are not all 1-D:
    raise ValueError("Dataframe must only have 1-D columns")

In your case, you can take the first row and apply np.shape to each value; a non-empty shape marks a column that holds arrays:
dfA.iloc[0].map(lambda x :np.shape(x))!=()
Out[413]:
A False
B True
C False
D False
Name: 0, dtype: bool
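Checking only the first row assumes every row has the same layout. Below is a minimal sketch that scans every cell instead, assuming that "not 1-D" means a cell holds a list-like value (list, tuple, set, ndarray and so on). The helper name `assert_1d_columns` is hypothetical; it relies on `pandas.api.types.is_list_like`, which treats strings as scalars.

```python
import pandas as pd
from pandas.api.types import is_list_like


def assert_1d_columns(df: pd.DataFrame) -> None:
    """Raise ValueError if any cell of df holds a list-like value.

    Hypothetical helper: a "1-D column" is taken to mean that every
    cell is a scalar (strings count as scalars for is_list_like).
    """
    # Check every cell of every column, not just the first row.
    bad = [name for name, col in df.items() if col.map(is_list_like).any()]
    if bad:
        raise ValueError(f"Dataframe must only have 1-D columns; offending: {bad}")


# Usage: assert_1d_columns(dfA) would raise and name column 'B'.
```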

Related

numpy shape inconsistent with array structure

I am going mad over this thing.
I have 2 lists
A = [ [[1,2,3],[1,2,3],[1,2,3]], [[1,2,3],[1,2,3],[1,2,3]]]
B = [ [[1,2,3],[1,2,3],[1,2,3]], [[1,2,3],[1,2,3]]]
When I convert A and B to NumPy arrays and check their shapes, I get this:
In [33]: np.asarray(A).shape
Out[33]: (2, 3, 3)
In [31]: np.asarray(B).shape
Out[31]: (2,)
How do I make A take the same shape as B, that is, (2,)?
I think I understand why it's happening, but I don't know how to prevent it.
Does anyone have any help or ideas, please?
Thanks!
Your 2 lists:
In [232]: A
Out[232]: [[[1, 2, 3], [1, 2, 3], [1, 2, 3]], [[1, 2, 3], [1, 2, 3], [1, 2, 3]]]
In [233]: B
Out[233]: [[[1, 2, 3], [1, 2, 3], [1, 2, 3]], [[1, 2, 3], [1, 2, 3]]]
Now, can you explain why the B result is better than the A one?
In [234]: np.array(A)
Out[234]:
array([[[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]],
       [[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]]])
In [235]: np.array(B)
<ipython-input-235-c938532b77c1>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
np.array(B)
Out[235]:
array([list([[1, 2, 3], [1, 2, 3], [1, 2, 3]]),
       list([[1, 2, 3], [1, 2, 3]])], dtype=object)
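To get shape (2,) for A as well, NumPy has to be stopped from looking inside the inner lists. Note that `np.array(A, dtype=object)` alone is not enough: because A's nested lists all have equal lengths, NumPy still descends into them and builds a (2, 3, 3) object array. One common approach, sketched below, is to preallocate a 1-D object array and assign each top-level element individually. The helper name `to_object_array` is hypothetical.

```python
import numpy as np


def to_object_array(items):
    """Return a 1-D object array whose elements are the top-level items."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        # Assigning to a single slot stores the Python object as-is,
        # so NumPy never inspects the nested lists.
        out[i] = item
    return out


A = [[[1, 2, 3], [1, 2, 3], [1, 2, 3]], [[1, 2, 3], [1, 2, 3], [1, 2, 3]]]
B = [[[1, 2, 3], [1, 2, 3], [1, 2, 3]], [[1, 2, 3], [1, 2, 3]]]
print(to_object_array(A).shape)  # (2,)
print(to_object_array(B).shape)  # (2,)
```

For a ragged input like B, this also avoids the deprecation warning shown above; newer NumPy releases raise an error there instead unless `dtype=object` is passed.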

numpy splitting arrays column wise and stacking to each row

I have a 2D array of shape (12, 80).
I want to split each row into subarrays of size 4 (20 subarrays per row) and stack all the resulting subarrays vertically.
Let us say that my array is
>>> A
array([[1, 2, 2, 2, 2, 2],
       [3, 3, 1, 3, 1, 3],
       [3, 1, 2, 1, 1, 3]])
>>>
and I want to split each row into 3 subarrays and stack them vertically. My expected output is
>>> A
array([[1, 2],
       [2, 2],
       [2, 2],
       [3, 3],
       [1, 3],
       [1, 3],
       [3, 1],
       [2, 1],
       [1, 3]])
>>>
Is there a way to do this without iterating over and splitting each row one by one? A more efficient implementation?
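In NumPy's default C (row-major) order, the consecutive chunks of each row are already adjacent, so splitting every row into chunks of size k and stacking them is the same as reading the data k values at a time. A single `reshape` therefore does the job without a loop. A minimal sketch, assuming the row length is divisible by the chunk size; `split_rows_and_stack` is a hypothetical name. For the 12×80 case, the call with `chunk=4` returns a 240×4 array.

```python
import numpy as np


def split_rows_and_stack(arr, chunk):
    """Split each row of a 2-D array into chunks of `chunk` columns and stack them."""
    _, n_cols = arr.shape
    if n_cols % chunk:
        raise ValueError("row length must be divisible by the chunk size")
    # Row-major order keeps each row's chunks consecutive, so reshaping to
    # (-1, chunk) stacks them in order (returning a view when possible).
    return arr.reshape(-1, chunk)


A = np.array([[1, 2, 2, 2, 2, 2],
              [3, 3, 1, 3, 1, 3],
              [3, 1, 2, 1, 1, 3]])
print(split_rows_and_stack(A, 2))
```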

Numpy - Indexing one dimension of a multidimensional array

I have a NumPy array like this, with shape (6, 2, 4):
x = np.array([[[0, 3, 2, 0],
               [1, 3, 1, 1]],
              [[3, 2, 3, 3],
               [0, 3, 2, 0]],
              [[1, 0, 3, 1],
               [3, 2, 3, 3]],
              [[0, 3, 2, 0],
               [1, 3, 2, 2]],
              [[3, 0, 3, 1],
               [1, 0, 1, 1]],
              [[1, 3, 1, 1],
               [3, 1, 3, 3]]])
And I have a choices array like this:
choices = np.array([[1, 1, 1, 1],
                    [0, 1, 1, 0],
                    [1, 1, 1, 1],
                    [1, 0, 0, 0],
                    [1, 0, 1, 1],
                    [0, 0, 0, 1]])
How can I use choices array to index only the middle dimension with size 2 and get a new numpy array with shape (6, 4) in the most efficient way possible?
The result would be this:
[[1 3 1 1]
 [3 3 2 3]
 [3 2 3 3]
 [1 3 2 0]
 [1 0 1 1]
 [1 3 1 3]]
I tried x[:, choices, :], but this doesn't return what I want. I also tried x.take(choices, axis=1), with no luck.
Use np.take_along_axis to index along the second axis:
In [16]: np.take_along_axis(x,choices[:,None],axis=1)[:,0]
Out[16]:
array([[1, 3, 1, 1],
       [3, 3, 2, 3],
       [3, 2, 3, 3],
       [1, 3, 2, 0],
       [1, 0, 1, 1],
       [1, 3, 1, 3]])
Or with explicit integer-array indexing:
In [22]: m,n = choices.shape
In [23]: x[np.arange(m)[:,None],choices,np.arange(n)]
Out[23]:
array([[1, 3, 1, 1],
       [3, 3, 2, 3],
       [3, 2, 3, 3],
       [1, 3, 2, 0],
       [1, 0, 1, 1],
       [1, 3, 1, 3]])
As I recently had this issue, I found @Divakar's answer useful, but I still wanted a general function for it (independent of the number of dimensions, etc.). Here it is:
import numpy as np

def take_indices_along_axis(array, choices, choice_axis):
    """
    array is N-dimensional.
    choices are integers with N-1 dimensions,
    with values between 0 and array.shape[choice_axis] - 1.
    choice_axis is the axis along which you want to take indices.
    """
    nb_dims = len(array.shape)
    list_indices = []
    for this_axis, this_axis_size in enumerate(array.shape):
        if this_axis == choice_axis:
            # this is the axis along which we want to choose
            list_indices.append(choices)
            continue
        # otherwise we want arange(this_axis_size), reshaped to broadcast against choices
        this_indices = np.arange(this_axis_size)
        reshape_target = [1 for _ in range(nb_dims)]
        reshape_target[this_axis] = this_axis_size  # put the range on its own axis
        del reshape_target[choice_axis]  # drop choice_axis so the shape lines up with choices
        list_indices.append(
            this_indices.reshape(tuple(reshape_target))
        )
    tuple_indices = tuple(list_indices)
    return array[tuple_indices]
# test it !
array = np.random.random(size=(10, 10, 10, 10))
choices = np.random.randint(10, size=(10, 10, 10))
assert take_indices_along_axis(array, choices, choice_axis=0)[5, 5, 5] == array[choices[5, 5, 5], 5, 5, 5]
assert take_indices_along_axis(array, choices, choice_axis=2)[5, 5, 5] == array[5, 5, choices[5, 5, 5], 5]
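The same generality can also be obtained from `np.take_along_axis`, which the first answer uses for the 3-D case: insert a length-1 axis into `choices` at the chosen position so its number of dimensions matches the array, take along that axis, then drop the axis again. This is an alternative sketch, not the poster's function; `take_along` is a hypothetical name.

```python
import numpy as np


def take_along(array, choices, axis):
    """Pick one element along `axis` per position; choices has ndim one less than array."""
    # take_along_axis needs indices with the same ndim as the array,
    # so re-insert the chosen axis as a length-1 dimension.
    idx = np.expand_dims(choices, axis=axis)
    picked = np.take_along_axis(array, idx, axis=axis)
    # Drop the length-1 axis again to get the (N-1)-dimensional result.
    return picked.squeeze(axis=axis)
```

With the arrays from the question, `take_along(x, choices, axis=1)` gives the same (6, 4) result as the answers above.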

Creating an array of 3D vectors where each element of each vector is from a given range

I am trying to implement an array of 3D vectors. Each vector is a combination of elements taken from a range. What I mean is:
array = [v_1, v_2, v_3,....]
v_j = [x_1, x_2, x_3] with x_i in [a, b].
The important thing for me is that I want all possible combinations.
So for example let a = 1, b = 10. Then it should be something like:
v_1 = [1, 1, 1], v_2 = [1, 1, 2],...v_10 = [1, 1, 10]
and then the next one should be:
v_11 = [1, 2, 1], v_12 = [1, 2, 2]....
I tried using linspace, but I only get vectors where all elements are equal, i.e.
v_1 = [1, 1, 1], v_2 = [2, 2, 2]....
Is there an easy way to do this, or do I have to use a lot of loops?
My linspace example was:
ffac = np.linspace(-1E-3, 1E-3, 100, endpoint=True)
for i in range(100):
    eps = np.ones(shape=[100, ]) * ffac[i]
With a and b, we can make np.arange(a, b+1), and then use np.meshgrid:
xij = np.arange(a, b+1)
np.transpose(np.meshgrid(xij, xij, xij), (2,1,3,0))
For a=1 and b=2, we obtain:
>>> np.transpose(np.meshgrid(xij, xij, xij), (2,1,3,0))
array([[[[1, 1, 1],
         [1, 1, 2]],
        [[1, 2, 1],
         [1, 2, 2]]],
       [[[2, 1, 1],
         [2, 1, 2]],
        [[2, 2, 1],
         [2, 2, 2]]]])
For n possible values, the result is thus an n×n×n×3 array.
Or if you want to flatten it:
>>> np.transpose(np.meshgrid(xij, xij, xij), (2,1,3,0)).reshape(-1, 3)
array([[1, 1, 1],
       [1, 1, 2],
       [1, 2, 1],
       [1, 2, 2],
       [2, 1, 1],
       [2, 1, 2],
       [2, 2, 1],
       [2, 2, 2]])
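Passing `indexing='ij'` to `np.meshgrid` makes the grids vary in the order the vectors should be enumerated, so stacking along a new last axis replaces the `transpose` call. A minimal sketch, generalized to any vector length through a `dim` parameter that is not in the question; `all_vectors` is a hypothetical name. The rows come out in the same order as the v_1, v_2, ... enumeration in the question.

```python
import numpy as np


def all_vectors(a, b, dim=3):
    """Return every length-`dim` vector with integer entries in [a, b], in lexicographic order."""
    values = np.arange(a, b + 1)
    # With indexing="ij", grids[d][i0, i1, ...] == values[i_d].
    grids = np.meshgrid(*([values] * dim), indexing="ij")
    # Stack to shape (n, ..., n, dim), then flatten to one vector per row.
    return np.stack(grids, axis=-1).reshape(-1, dim)


vectors = all_vectors(1, 10)
print(vectors.shape)  # (1000, 3)
```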

How to get all sub matrices of 2D array without numpy?

I need to get all submatrices of a 2D array and perform some manipulation on each one. So I created an example matrix:
M3 = [list(range(5)) for i in range(6)]
[[0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4]]
I need to capture 3 rows and 3 columns and then shift this "window" until I get all submatrices. The first submatrix would be:
[[0, 1, 2],
 [0, 1, 2],
 [0, 1, 2]]
and the last one is:
[[2, 3, 4],
 [2, 3, 4],
 [2, 3, 4]]
For this matrix I need 12 submatrices. However, the code I tried produces more than that:
for j in range(len(M3[0])-3):
    for i in range(len(M3)-3):
        for row in M3[0+j:3+j]:
            X_i_j = [row[0+i:3+i] for row in M3[0+j:3+j]]
            print(X_i_j)
I get 18 instead of 12 (with two duplicates of each submatrix):
[[0, 1, 2], [0, 1, 2], [0, 1, 2]]
[[0, 1, 2], [0, 1, 2], [0, 1, 2]]
[[0, 1, 2], [0, 1, 2], [0, 1, 2]]
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
...
[[2, 3, 4], [2, 3, 4], [2, 3, 4]]
[[2, 3, 4], [2, 3, 4], [2, 3, 4]]
And with this sample of code I get 6 submatrices with 1 duplicate for each:
for i in range(len(M3)-3):
    for j in range(len(M3[0])-3):
        X_i_j = [row[0+i:3+i] for row in M3[0+j:3+j]]
        print(X_i_j)
I do not see what is wrong with it and why I get the duplicates. How can I get all sub matrices of 2D array without numpy for this case?
Your code works once the loop bounds are swapped (j over rows, i over columns) and -3 becomes -2:
for j in range(len(M3)-2):
    for i in range(len(M3[0])-2):
        X_i_j = [row[0+i:3+i] for row in M3[0+j:3+j]]
        print('=======')
        for x in X_i_j:
            print(x)
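The duplicates in the question come from the extra `for row in ...` loop, which re-runs the list comprehension once per row of the window, while the `- 3` bounds skip the last start position: a window of size k fits at len - k + 1 start positions along each dimension. A minimal generator sketch without NumPy that takes the window size as parameters; `sliding_windows` is a hypothetical name, and a rectangular list of lists is assumed.

```python
def sliding_windows(matrix, n_rows, n_cols):
    """Yield every n_rows x n_cols submatrix of a rectangular list of lists."""
    height, width = len(matrix), len(matrix[0])
    # A window of size k fits at len - k + 1 start positions per dimension.
    for top in range(height - n_rows + 1):
        for left in range(width - n_cols + 1):
            yield [row[left:left + n_cols] for row in matrix[top:top + n_rows]]


M3 = [list(range(5)) for _ in range(6)]
windows = list(sliding_windows(M3, 3, 3))
print(len(windows))  # 12
```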
I would solve it slightly differently:
first, a function that reads a given number of rows (y);
then, a function that reads a given number of columns (x) from those rows, which gives you the submatrix.
This works for any 2D array and any submatrix size.
Sample:
def read_y_rows(array, rows, offset):
    return array[offset:rows + offset]

def read_x_cols(array, cols, offset):
    return list(row[offset:cols + offset] for row in array)

def get_sub_arrays(array, x_dim_cols, y_dim_rows):
    """
    get 2D sub arrays by x_dim columns and y_dim rows
    from 2D array (list of lists)
    """
    result = []
    for start_row in range(len(array) - y_dim_rows + 1):
        y_rows = read_y_rows(array, y_dim_rows, start_row)
        for start_col in range(len(max(array, key=len)) - x_dim_cols + 1):
            x_columns = read_x_cols(y_rows, x_dim_cols, start_col)
            result.append(x_columns)
    return result
To use it, you could do:
M3 = [list(range(5)) for i in range(6)]
sub_arrays = get_sub_arrays(M3, 3, 3)  # this would also work for 2x2 windows
sub_arrays is again a list of lists containing all the submatrices found; you could print them like this:
for sub_array in sub_arrays:
    print()
    for row in sub_array:
        print(row)
I know it is a lot more code than above, just wanted to share this code.
