I have a 2D array called X and a 1D array holding the class of each row of X. I want to take the first N percent of the elements of each class (the same number from each class) and store them in a new array, preferably in a simple way without for loops. For example:
For the following X array which is 2D:
[[0.612515 0.385088 ]
[0.213345 0.174123 ]
[0.432596 0.8714246]
[0.700230 0.730789 ]
[0.455105 0.128509 ]
[0.518423 0.295175 ]
[0.659871 0.320614 ]
[0.459677 0.940614 ]
[0.823733 0.831789 ]
[0.236175 0.10750 ]
[0.379032 0.241121 ]
[0.512535 0.8522193]
Output is 3.
Then I'd like to store the first 3 elements that belong to class 0 and the first 3 elements that belong to class 1, maintaining the order of occurrence of the indices, which gives the following output:
First 3 from each class:
[1 0 0 1 0 1]
New_X =
[[0.612515 0.385088 ]
[0.213345 0.174123 ]
[0.432596 0.8714246]
[0.700230 0.730789 ]
[0.455105 0.128509 ]
[0.518423 0.295175 ]]
First, 30% is only 2 elements from each class, even when rounding up with np.ceil: each class has 6 elements, and ceil(0.3 * 6) = ceil(1.8) = 2.
Second, I'll assume both arrays are numpy.array.
Given the 2 arrays, we can find the desired indices using np.where and array y in the following way:
in_ = sorted([x for x in [*np.where(y==0)[0][:np.ceil(0.3*6).astype(int)],*np.where(y==1)[0][:np.ceil(0.3*6).astype(int)]]]) # [0, 1, 2, 3]
Now we can simply slice X like so:
X[in_]
# array([[0.612515 , 0.385088 ],
# [0.213345 , 0.174123 ],
# [0.432596 , 0.8714246],
# [0.70023 , 0.730789 ]])
The definitions of X and y are:
X = np.array([[0.612515 , 0.385088 ],
[0.213345 , 0.174123 ],
[0.432596 , 0.8714246],
[0.70023 , 0.730789 ],
[0.455105 , 0.128509 ],
[0.518423 , 0.295175 ],
[0.659871 , 0.320614 ],
[0.459677 , 0.940614 ],
[0.823733 , 0.831789 ],
[0.236175 , 0.1075 ],
[0.379032 , 0.241121 ],
[0.512535 , 0.8522193]])
y = np.array([1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0])
Edit
The expression np.where(y==0)[0][:np.ceil(0.3*6).astype(int)] does the following:
np.where(y==0)[0] returns all the indices where y==0.
Since you wanted only 30%, we slice those indices to keep the first 30% of them: [:np.ceil(0.3*6).astype(int)]
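The same idea can be written without hard-coding the class size 6. The following is only a sketch: the helper name first_fraction_per_class and the placeholder X values are made up for illustration, and it loops over the distinct class labels (not over the rows), which may or may not count as "no for loops" for your purposes.

```python
import numpy as np

def first_fraction_per_class(X, y, fraction):
    """Keep the first `fraction` of the rows of each class, in original order."""
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)            # row positions of this class
        n_keep = int(np.ceil(fraction * idx.size))  # rows to keep for this class
        keep.append(idx[:n_keep])
    keep = np.sort(np.concatenate(keep))            # restore order of occurrence
    return X[keep], y[keep]

# Placeholder data: only y is taken from the answer above.
X = np.arange(24, dtype=float).reshape(12, 2)
y = np.array([1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0])

new_X, new_y = first_fraction_per_class(X, y, 0.3)
print(new_y)  # [1 0 0 1]
```

With fraction=0.5 the same helper keeps 3 rows per class, which corresponds to the "first 3 from each class" output in the question.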
Related
I have two numpy structured arrays arr1, arr2.
arr1 has fields ['f1','f2','f3'].
arr2 has fields ['f1','f2','f3','f4'].
I.e.:
arr1 = [[f1_1_1 , f2_1_1 , f3_1_1 ],    arr2 = [[f1_2_1 , f2_2_1 , f3_2_1 , f4_2_1 ],
        [f1_1_2 , f2_1_2 , f3_1_2 ],            [f1_2_2 , f2_2_2 , f3_2_2 , f4_2_2 ],
        ...                        ,            ...                                 ,
        [f1_1_N1, f2_1_N1, f3_1_N1]]            [f1_2_N2, f2_2_N2, f3_2_N2, f4_2_N2]]
I want to assign various slices of arr1 to the corresponding slice of arr2 (slices in the indexes and in the fields).
See below for the various cases.
From answers I found (to related, but not exactly the same, questions) it seemed to me that the only way to do it is assigning one slice at a time, for a single field, i.e., something like
arr2['f1'][0:1] = arr1['f1'][0:1]
(and I can confirm this works), looping over all source fields in the slice.
Is there a way to assign all intended source fields in the slice at a time?
I mean to assign, say, the elements x in the image
Case 1 (only some fields in arr1)
arr1 = [[   x   ,    x   , f3_1_1 ],    arr2 = [[   x   ,    x   , f3_2_1 , f4_2_1 ],
        [   x   ,    x   , f3_1_2 ],            [   x   ,    x   , f3_2_2 , f4_2_2 ],
        ...                        ,            ...                                 ,
        [f1_1_N1, f2_1_N1, f3_1_N1]]            [f1_2_N2, f2_2_N2, f3_2_N2, f4_2_N2]]
Case 2 (all fields in arr1)
arr1 = [[   x   ,    x   ,    x   ],    arr2 = [[   x   ,    x   ,    x   , f4_2_1 ],
        [   x   ,    x   ,    x   ],            [   x   ,    x   ,    x   , f4_2_2 ],
        ...                        ,            ...                                 ,
        [f1_1_N1, f2_1_N1, f3_1_N1]]            [f1_2_N2, f2_2_N2, f3_2_N2, f4_2_N2]]
Case 3
arr1 has fields ['f1','f2','f3','f5'].
arr2 has fields ['f1','f2','f3','f4'].
Assign a slice of ['f1','f2','f3']
Sources:
Python Numpy Structured Array (recarray) assigning values into slices
Convert a slice of a structured array to regular NumPy array in NumPy 1.14
You can do it for example like that:
import numpy as np
x = np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)], dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
y = np.array([('Carl', 10, 75.0), ('Joe', 7, 76.0)], dtype=[('name2', 'U10'), ('age2', 'i4'), ('weight', 'f4')])
print(x[['name', 'age']])
print(y[['name2', 'age2']])
# multiple field indexing
y[['name2', 'age2']] = x[['name', 'age']]
print(y[['name2', 'age2']])
# you can also use slicing if you want specific parts or the size does not match
y[:1][['name2', 'age2']] = x[1:][['name', 'age']]
print(y[:][['name2', 'age2']])
The field names can be different: assignment between structured arrays matches fields by position, not by name. I am not sure about the dtypes and whether any (down)casting takes place.
https://docs.scipy.org/doc/numpy/user/basics.rec.html#assignment-from-other-structured-arrays
https://docs.scipy.org/doc/numpy/user/basics.rec.html#accessing-multiple-fields
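Applied to the arrays from the question, the same technique might look like the sketch below for Case 3. The dtypes and values are hypothetical (the question only gives field names), and it assumes a reasonably recent NumPy (1.16 or later, where multi-field indexing returns a view).

```python
import numpy as np

# Hypothetical dtypes and values; only the field names come from the question.
arr1 = np.array([(1, 1.5, 10, 7), (2, 2.5, 20, 8), (3, 3.5, 30, 9)],
                dtype=[('f1', 'i4'), ('f2', 'f8'), ('f3', 'i4'), ('f5', 'i4')])
arr2 = np.zeros(4, dtype=[('f1', 'i4'), ('f2', 'f8'), ('f3', 'i4'), ('f4', 'i4')])

fields = ['f1', 'f2', 'f3']

# Slice the rows first (this gives a view), then assign all selected
# fields in one statement. Fields are matched by position.
arr2[0:2][fields] = arr1[0:2][fields]

print(arr2)
```

Only rows 0 and 1 of f1, f2 and f3 in arr2 are overwritten; f4 and the remaining rows keep their previous values.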
I have a 3D array (an array of triangles). I would like to get the triangles (2D arrays) containing a given point (1D array).
I went through in1d, where and argwhere, but I am still unsuccessful.
For instance with :
import numpy as np
import numpy.random as rd
t = rd.random_sample((10,3,3))
v0 = np.array([1,2,3])
t[1,2] = v0
t[5,0] = v0
t[8,1] = v0
I would like to get:
array([[[ 0.87312 , 0.33411403, 0.56808291],
        [ 0.36769417, 0.66884858, 0.99675896],
        [ 1. , 2. , 3. ]],
       [[ 0.31995867, 0.58351034, 0.38731405],
        [ 1. , 2. , 3. ],
        [ 0.04435288, 0.96613852, 0.83228402]],
       [[ 1. , 2. , 3. ],
        [ 0.28647107, 0.95755263, 0.5378722 ],
        [ 0.73731078, 0.8777235 , 0.75866665]]])
to then get the set of v0 adjacent points
{[ 0.87312 , 0.33411403, 0.56808291],
[ 0.36769417, 0.66884858, 0.99675896],
[ 0.31995867, 0.58351034, 0.38731405],
[ 0.04435288, 0.96613852, 0.83228402],
[ 0.28647107, 0.95755263, 0.5378722 ],
[ 0.73731078, 0.8777235 , 0.75866665]}
without looping, the array being quite big.
For instance
In [28]: np.in1d(v0,t[8]).all()
Out[28]: True
works as a test on a single triangle, but I can't get it to work over the whole array.
Thanks for your help.
What I mean is the vectorized equivalent to:
In[54]:[triangle for triangle in t if v0 in triangle ]
Out[54]:
[array([[ 0.87312 , 0.33411403, 0.56808291],
[ 0.36769417, 0.66884858, 0.99675896],
[ 1. , 2. , 3. ]]),
array([[ 0.31995867, 0.58351034, 0.38731405],
[ 1. , 2. , 3. ],
[ 0.04435288, 0.96613852, 0.83228402]]),
array([[ 1. , 2. , 3. ],
[ 0.28647107, 0.95755263, 0.5378722 ],
[ 0.73731078, 0.8777235 , 0.75866665]])]
You can simply do -
t[(t==v0).all(axis=-1).any(axis=-1)]
We are performing ALL and ANY reductions along the last axis with axis=-1 there. First, .all(axis=-1) looks for rows exactly matching the array v0, and then .any(axis=-1) looks for ANY such match within each of the 2D blocks. This results in a boolean array with one entry per triangle, which we use to select the matching triangles from the input array.
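Spelled out step by step on data shaped like the question's, the reductions look as follows. This is a sketch only: the seeded generator and the intermediate names row_matches, tri_mask and neighbours are introduced here for illustration, and the last step (collecting the points adjacent to v0) is one possible way to get the set the question asks for.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.random((10, 3, 3))            # 10 triangles, 3 vertices, 3 coordinates
v0 = np.array([1.0, 2.0, 3.0])
t[1, 2] = v0
t[5, 0] = v0
t[8, 1] = v0

row_matches = (t == v0).all(axis=-1)  # shape (10, 3): vertex equals v0?
tri_mask = row_matches.any(axis=-1)   # shape (10,): triangle contains v0?
triangles = t[tri_mask]               # shape (3, 3, 3)

# Vertices of the selected triangles that are not v0 itself.
neighbours = triangles[~row_matches[tri_mask]]  # shape (6, 3)

print(np.flatnonzero(tri_mask))  # [1 5 8]
print(neighbours.shape)          # (6, 3)
```

Note that the comparison is exact floating-point equality, which is fine here because v0 was copied into t verbatim.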
I have an array that represents distances, and I am trying to get an array with the indexes of the 3 smallest distances, in increasing order of distance, like this:
array([[ 2.8],
[ 206. ],
[ 84.4],
[ 297.6],
[ 112.7],
[ 235.4],
[ 170.7],
[ 22.2],
[ 264.1],
[ 163.2],
[ 43.7],
[ 131.2]])
Result = [0, 7, 10]
Any idea or suggestion? Thanks in advance.
I found a not very elegant but working approach. If someone has a faster or better version, it would be welcome.
from operator import itemgetter
# b is the distances array
b = [i for i in enumerate(b)]
b = sorted(b, key=itemgetter(1))
b = [i[0] for i in b]
result = b[:3]
result
# [0,7,10]
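A possible NumPy alternative to the approach above is np.argsort, or np.argpartition when the array is large and only a few indices are needed. This is a sketch, not a benchmark; the variable names distances, flat, result and top3 are chosen here for illustration.

```python
import numpy as np

distances = np.array([[2.8], [206.0], [84.4], [297.6], [112.7], [235.4],
                      [170.7], [22.2], [264.1], [163.2], [43.7], [131.2]])

flat = distances.ravel()          # shape (12, 1) -> (12,)

# Full sort of the indices, then keep the first three.
result = np.argsort(flat)[:3]
print(result.tolist())            # [0, 7, 10]

# Partial selection: find the three smallest (in arbitrary order),
# then sort just those three by distance.
part = np.argpartition(flat, 3)[:3]
top3 = part[np.argsort(flat[part])]
print(top3.tolist())              # [0, 7, 10]
```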
I'm trying to get a multidimensional array to work in CoffeeScript. I have tried the standard Python list comprehension notation, but that turns the inner bracket into a string or something similar, so I can't do list[0][1] to get 1; instead I get list[0][0] = '1,1' and list[0][1] = ''
[[i, 1] for i in [1]]
I also tried using a class as the storage container, to then grab x and y. That gives 'undefined undefined' for the list version, rather than the '1 1' I get from a directly constructed instance.
class Position
  constructor: (@x, @y) ->
x = [new Position(i,1) for i in [1]]
for i in x
  alert i.x + ' ' + i.y # 'undefined undefined'
i = new Position(1,1)
alert i.x + ' ' + i.y # '1 1'
Being able to use a list of points is extremely needed and I cannot find a way to make a list of them. I would prefer to use a simple multidimensional array, but I don't know how.
You just need to use parentheses, (), instead of square brackets, [].
From a REPL:
coffee> ([i, 1] for i in [1])
[ [ 1, 1 ] ]
coffee> [[i, 1] for i in [1]]
[ [ [ 1, 1 ] ] ]
You can see that using the square brackets, as you would in Python, puts the generating expression inside an extra list.
This is because the parentheses, (), are actually only there in CoffeeScript for when you want to assign the expression to a variable, so:
coffee> a = ([i, 1] for i in [1])
[ [ 1, 1 ] ]
coffee> a[0][1]
1
coffee> b = [i, 1] for i in [1]
[ [ 1, 1 ] ]
coffee> b[0][1]
undefined
Also, see the CoffeeScript Cookbook.
There is a question very similar to this already but I would like to do this for multiple arrays. I have an array of arrays.
my @AoA = (
    $arr1 = [ 1, 0, 0, 0, 1 ],
    $arr2 = [ 1, 1, 0, 1, 1 ],
    $arr3 = [ 2, 0, 2, 1, 0 ]
);
I want to sum the items of all the three (or more) arrays to get a new one like
( 4, 1, 2, 2, 2 )
The pairwise function (use List::MoreUtils qw/pairwise/) requires two array arguments.
@new_array = pairwise { $a + $b } @$arr1, @$arr2;
One solution that comes to mind is to loop through @AoA and pass the first two arrays into the pairwise function. In the subsequent iterations, I will pass the next @$arr in @AoA and the @new_array into the pairwise function. In the case of an odd sized array of arrays, after I've passed in the last @$arr in @AoA, I will pass in an equal sized array with elements of 0's.
Is this a good approach? And if so, how do I implement this? thanks
You can easily implement an “n-wise” function:
sub nwise (&@) # ← take a code block, and any number of further arguments
{
    my ($code, @arefs) = @_;
    return map {$code->( do{ my $i = $_; map $arefs[$_][$i], 0 .. $#arefs } )}
               0 .. $#{$arefs[0]};
}
That code is a bit ugly because Perl does not support slices of multidimensional arrays. Instead I use nested maps.
A quick test:
use Test::More;
my @a = (1, 0, 0, 0, 1);
my @b = (1, 1, 0, 1, 1);
my @c = (2, 0, 2, 1, 0);
is_deeply [ nwise { $_[0] + $_[1] + $_[2] } \@a, \@b, \@c], [4, 1, 2, 2, 2];
I prefer passing the arrays as references instead of using the \@ or + prototype: This allows you to do
my @arrays = (\@a, \@b, \@c);
nwise {...} @arrays;
From List::MoreUtils you could have also used each_arrayref:
use List::Util qw/sum/;
use List::MoreUtils qw/each_arrayref/;
my $iter = each_arrayref @arrays;
my @out;
while (my @vals = $iter->()) {
    push @out, sum @vals;
}
is_deeply \@out, [4, 1, 2, 2, 2];
Or just plain old loops:
my @out;
for my $i (0 .. $#a) {
    my $accumulator = 0;
    for my $array (@arrays) {
        $accumulator += $array->[$i];
    }
    push @out, $accumulator;
}
is_deeply \@out, [4, 1, 2, 2, 2];
The above all assumed that all arrays were of the same length.
A note on your snippet:
Your example of the array structure is of course legal perl, which will even run as intended, but it would be best to leave out the inner assignments:
my @AoA = (
    [ 1, 0, 0, 0, 1 ],
    [ 1, 1, 0, 1, 1 ],
    [ 2, 0, 2, 1, 0 ],
);
You might actually be looking for PDL, the Perl Data Language. It is a numerical array module for Perl. It has many functions for processing arrays of data. Unlike other numerical array modules for other languages it has this handy ability to use its functionality on arbitrary dimensions and it will do what you mean. Note that this is all done at the C level, so it is efficient and fast!
In your case you are looking for the projection method sumover which will take an N dimensional object and return an N-1 dimensional object created by summing over the first dimension. Since in your system you want to sum over the second we first have to transpose by exchanging dimensions 0 and 1.
#!/usr/bin/env perl
use strict;
use warnings;
use PDL;
my @AoA = (
    [ 1, 0, 0, 0, 1 ],
    [ 1, 1, 0, 1, 1 ],
    [ 2, 0, 2, 1, 0 ],
);
my $pdl = pdl \@AoA;
my $sum = $pdl->xchg(0,1)->sumover;
print $sum . "\n";
# [4 1 2 2 2]
The return value from sumover is another PDL object; if you need a Perl list, you can use the list method:
print "$_\n" for $sum->list;
Here's a simple iterative approach. It probably will perform terribly for large data sets. If you want a better performing solution you will probably need to change the data structure, or look on CPAN for one of the statistical packages. The below assumes that all arrays are the same size as the first array.
$sum = 0;
@rv = ();
for ($y=0; $y < scalar @{$AoA[0]}; $y++) {
    for ($x=0; $x < scalar @AoA; $x++) {
        $sum += ${$AoA[$x]}[$y];
    }
    push @rv, $sum;
    $sum = 0;
}
print '('.join(',',@rv).")\n";
Assumptions:
each row in your AoA will have the same number of columns as the first row.
each value in the arrayrefs will be a number (specifically, a value in a format that "works" with the += operator)
there will be at least one "row" with at least one "column"
Note: "$#{$AoA[0]}" means, "the index of the last element ($#) of the array that is the first arrayref in @AoA ({$AoA[0]})"
#!/usr/bin/perl
use strict;
use warnings;
my @AoA = (
    [ 1, 0, 0, 0, 1 ],
    [ 1, 1, 0, 1, 1 ],
    [ 2, 0, 2, 1, 0 ]
);
my @sums;
foreach my $column (0..$#{$AoA[0]}) {
    my $sum;
    foreach my $aref (@AoA){
        $sum += $aref->[$column];
    }
    push @sums,$sum;
}
use Data::Dumper;
print Dumper \@sums;