Binning then sorting arrays in each bin but keeping their indices together

I have two arrays and the indices of these arrays are related. So x[0] is related to y[0], so they need to stay organized. I have binned the x array into two bins as shown in the code below.
import numpy as np
x = [1,4,7,0,5]
y = [.1,.7,.6,.8,.3]
binx = [0,4,9]
index = np.digitize(x,binx)
Giving me the following:
In [1]: index
Out[1]: array([1, 2, 2, 1, 2])
So far so good. (I think)
The y array is a parameter telling me how well measured the x data point is, so .9 is better than .2, so I'm using the next code to sort out the best of the y array:
y.sort()
ysorted = y[int(len(y) * .5):]
which gives me:
In [2]: ysorted
Out[2]: [0.6, 0.7, 0.8]
giving me the last 50% of the array. Again, this is what I want.
My question is how do I combine these two operations? From each bin, I need to get the best 50% and put these new values into a new x and new y array. Again, keeping the indices of each array organized. Or is there an easier way to do this? I hope this makes sense.

Many numpy functions have arg... variants that don't operate "by value" but rather "by index". In your case argsort does what you want:
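x = np.asarray(x)  # plain lists cannot be indexed with an index array
y = np.asarray(y)
order = np.argsort(y)
# order is an array of indices such that y[order] is sorted
top50 = order[len(order) // 2:]
top50x = x[top50]
top50y = y[top50]
# top50x are now the x values corresponding 1-to-1 to the best 50% of y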
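To combine this with the binning from the question, one option is to apply the same argsort step separately inside each bin. The following is only a sketch under the question's own data; the names new_x, new_y, members and best are illustrative, and the "best half" uses the same len // 2 cut as above (so a bin with 3 points keeps 2).

```python
import numpy as np

x = np.array([1, 4, 7, 0, 5])
y = np.array([0.1, 0.7, 0.6, 0.8, 0.3])
binx = [0, 4, 9]

index = np.digitize(x, binx)  # bin number of every x value

new_x, new_y = [], []
for b in np.unique(index):
    members = np.flatnonzero(index == b)        # positions belonging to bin b
    order = members[np.argsort(y[members])]     # those positions, sorted by y
    best = order[len(order) // 2:]              # upper half of the bin by y
    new_x.append(x[best])
    new_y.append(y[best])

# x and y stay paired because both are indexed with the same positions
new_x = np.concatenate(new_x)
new_y = np.concatenate(new_y)
```

The loop runs once per bin, not once per data point, so it should stay cheap as long as the number of bins is small.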

You can make a list of pairs from your x and y lists with the zip function (in Python 3, zip returns an iterator, so wrap it in list):
x = [1,4,7,0,5]
y = [.1,.7,.6,.8,.3]
values = list(zip(x, y))
values
[(1, 0.1), (4, 0.7), (7, 0.6), (0, 0.8), (5, 0.3)]
To sort such a list of pairs by a specific element of each pair, use the key parameter of sort:
values.sort(key=lambda pair: pair[1])
values
[(1, 0.1), (5, 0.3), (7, 0.6), (4, 0.7), (0, 0.8)]
Then you may do whatever you want with this sorted list of pairs.
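For instance, keeping the best half and splitting the pairs back into two lists might look like the sketch below. The names pairs, best, best_x and best_y are illustrative and not part of the original answer.

```python
x = [1, 4, 7, 0, 5]
y = [.1, .7, .6, .8, .3]

# Sort the (x, y) pairs by their y component
pairs = sorted(zip(x, y), key=lambda pair: pair[1])

# Keep the upper half (same cut as y[int(len(y) * .5):] in the question)
best = pairs[len(pairs) // 2:]

# "Unzip" the pairs back into two aligned lists
best_x, best_y = (list(t) for t in zip(*best))
```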

How to do a cartesian product of a variable number of lists in Julia?

For each value j in the set {1, 2, ..., n} where the value of n can vary (it is some variable in my program that can be different depending on the inputs from the user), I have an array A_j. I would like to obtain the cartesian product of all the arrays A_j, so that I can then iterate through that cartesian product (taking one element from each A_1, A_2, ... A_n to get a tuple (a_1, a_2, ..., a_n) in A_1 x A_2 x ... x A_n). How would I accomplish this in Julia?
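Use Iterators.product. For a variable number of arrays, keep them in a collection and splat it into the call, e.g. Iterators.product(As...), where As is a hypothetical vector or tuple holding A_1, ..., A_n: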
help?> Iterators.product
product(iters...)
Return an iterator over the product of several iterators. Each generated
element is a tuple whose ith element comes from the ith argument iterator.
The first iterator changes the fastest.
Examples
≡≡≡≡≡≡≡≡≡≡
julia> collect(Iterators.product(1:2, 3:5))
2×3 Matrix{Tuple{Int64, Int64}}:
(1, 3) (1, 4) (1, 5)
(2, 3) (2, 4) (2, 5)

minimum operations to make array left part equal to right part

Given an even-length array [a1, a2, ..., an], a beautiful array is one where a[i] == a[i + n/2] for 0 <= i < n/2 (0-based indices). An operation is defined as changing all array elements equal to some value x to a value y. What is the minimum number of operations required to make a given array beautiful? All elements are in the range [1, 100000]. Simply counting the distinct unmatched pairs (ignoring order) between the left and right halves gives wrong results in some cases. For example, in [1, 1, 2, 5, 2, 5, 5, 2] the unmatched pairs are (1, 2), (1, 5) and (2, 5), but after changing 2 -> 5, the pairs (1, 2) and (1, 5) become the same. So what is the correct method to solve this problem?
This is a graph problem.
For every pair (a[i], a[i+n/2]) where a[i] != a[i+n/2], add an undirected edge between the two values, treated as nodes.
Do not add more than one edge between the same two numbers.
You now need to remove all the edges in the graph by performing operations. The final answer is the number of operations.
Each operation removes an edge: changing x to y merges the two vertices into one, and their remaining edges are re-attached to the merged vertex (dropping any duplicates). A connected component with k vertices therefore takes k - 1 operations, and the answer is the sum of k - 1 over all components.
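One way to count those merges without building the graph explicitly is a union-find (disjoint set) structure: every pair that joins two previously separate groups costs exactly one operation. This is a sketch of that idea, not code from the original answer; the function name min_operations is made up for illustration.

```python
def min_operations(a):
    """Count merges needed so that a[i] == a[i + n/2] for every i in the left half."""
    half = len(a) // 2
    parent = {}

    def find(v):
        # Return the representative of v's group, registering v on first sight
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    ops = 0
    for left, right in zip(a[:half], a[half:]):
        root_l, root_r = find(left), find(right)
        if root_l != root_r:
            # Two different groups of values must become equal: one operation
            parent[root_l] = root_r
            ops += 1
    return ops
```

For the array from the question, min_operations([1, 1, 2, 5, 2, 5, 5, 2]) gives 2: the values 1, 2 and 5 form a single component of three vertices.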

How can I create a tiled/stacked array based on ranges using these 2 input arrays - but without looping?

My basic problem is that I need to use 2 arrays of integers and arrive at a combined array that is the concatenation of many ranges, each built from a pairwise combination of the 2 initial arrays.
Said slightly differently, I want to use 2 arrays, combine them to produce a set of ranges, and then merge these ranges together. Importantly, I need to do this without using any looping, as I am going to need to do this almost 4 million times.
My 2 starting arrays are:
import numpy as np
sd = np.array([3,3,4,2,5,1]) # StartDate
ed = np.array([4,5,5,5,8,2]) # EndDate
Pairwise, they would look like this, combining (sd[i] with ed[i]):
[(3, 4), (3, 5), (4, 5), (2, 5), (5, 8), (1, 2)] # Pairwise combinations of StartDate and EndDate
By way of example, I could iterate over these pairs, creating ranges, exemplifying below:
[In]: range1 = np.arange(3,4)
[Out]: array([3])
[In]: range2 = np.arange(3,5)
[Out]: array([3,4])
...and so on, to arrive at the final output, which would be:
array([3, 3, 4, 4, 2, 3, 4, 5, 6, 7, 1]) # End result where the arrays are tiled after one another
# (note that the first 3 values are range1 and range2 from immediately above)
My issue is that I need to go from the input arrays and to the output array without looping, as I have already tried a version of this, and it is WAY too slow. Any help very much appreciated.
You are in luck. Here is a one liner solution:
indexer = np.r_[tuple([np.s_[i:j] for (i,j) in zip(sd,ed)])]
output:
[3 3 4 4 2 3 4 5 6 7 1]
Here is how it works (I have also explained a similar case for torch):
np.s_[i:j] creates a slice object (simply a range) of indices from start=i to end=j (exclusive).
np.r_[i:j, k:m] creates an array of all indices in the slices (i,j) and (k,m). (You can pass more slices to np.r_ to concatenate them all at once; this example concatenates only two.)
Therefore, indexer is an array of all indices, built by concatenating a list of slices (each slice is a range of indices).
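Note that the one-liner still iterates over the pairs in a Python list comprehension. If that turns out to be too slow, a fully vectorized alternative can be sketched with np.repeat and np.cumsum. This is a different technique from the np.r_ answer, and it assumes ed >= sd element-wise; the names lengths, offsets, within and result are illustrative.

```python
import numpy as np

sd = np.array([3, 3, 4, 2, 5, 1])  # StartDate
ed = np.array([4, 5, 5, 5, 8, 2])  # EndDate

lengths = ed - sd                        # number of values in each range (assumes ed >= sd)
offsets = np.cumsum(lengths) - lengths   # where each range starts in the output

# Position of every output element inside its own range: 0, 1, 2, ...
within = np.arange(lengths.sum()) - np.repeat(offsets, lengths)

# Start value of the owning range plus the position inside it
result = np.repeat(sd, lengths) + within
```

Here result should match the output of the np.r_ one-liner for the same inputs.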

Numpy array questions about .shape

I'm new to numpy, and I have some troubles in array shapes.
I want to operate the array like a matrix in matlab. However, I'm confused about the following things:
>>> b = np.array([[1,2],[3,4]])
>>> b
array([[1, 2],
       [3, 4]])
>>> c = b[:,1] # I want c to be a column vector
>>> c.shape
(2,)
>>> d = b[1,:] # I want d to be a row vector
>>> d.shape
(2,)
I want to treat c and d as column vector and row vector respectively.
I don't understand why c and d have the same shape (2,).
So it troubles me in later calculations.
Could anyone help me deal with this problem. Thanks a lot !
Using a plain integer as an index returns that column/row as a true vector. This is similar to indexing a list - you only receive the element at that index. The containing dimension is stripped away:
>>> my_list = ['a', 'b', 'c', 'd']
>>> my_list[2]
'c'
Instead, you want a slice. A slice of a list is a (sub-) list, and a slice of a matrix is a matrix. With numpy, you can specify this either as slice notation using : or a sequence of indices:
>>> c = b[:,:1] # slice notation
>>> c.shape
(2, 1)
>>> d = b[[1],:] # sequence of indices
>>> d.shape
(1, 2)
Slice notation is for consecutive index ranges. For example, :1 means "everything from the start up to, but not including, index 1". Sequence notation is for non-consecutive index sets. For example, [0, 2] skips index 1. If you just want a single index, sequence notation is simpler unless you are dealing with borders (first/last row/column).
You can use
c = b[:,[1]]
d = b[[1],:]
to get the vector as an explicit column/row vector:
c.shape # (2, 1)
d.shape # (1, 2)
In general, if you want your array c to be a column vector of shape (2,1), you can reshape it by:
c = c.reshape(-1,1) # c.shape --> (2, 1)
Similarly, if you want your array d to be a row vector of shape (1,2), you can reshape it by:
d = d.reshape(1,-1) # d.shape --> (1, 2)
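Another common spelling of the same reshape is indexing with np.newaxis. The sketch below uses it and then shows why the distinction matters in later calculations: with explicit 2D shapes, matrix products behave as they would in MATLAB. The names outer and inner are illustrative.

```python
import numpy as np

b = np.array([[1, 2], [3, 4]])

c = b[:, 1][:, np.newaxis]   # column vector, shape (2, 1)
d = b[1, :][np.newaxis, :]   # row vector, shape (1, 2)

outer = c @ d                # (2, 1) @ (1, 2) -> shape (2, 2)
inner = d @ c                # (1, 2) @ (2, 1) -> shape (1, 1)
```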

Looping through slices of Theano tensor

I have two 2D Theano tensors, call them x_1 and x_2, and suppose for the sake of example, both x_1 and x_2 have shape (1, 50). Now, to compute their mean squared error, I simply run:
T.sqr(x_1 - x_2).mean(axis = -1).
However, what I wanted to do was construct a new tensor that consists of their mean squared error in chunks of 10. In other words, since I'm more familiar with NumPy, what I had in mind was to create the following tensor M in Theano:
M = [theano.tensor.sqr(x_1[:, i:i+10] - x_2[:, i:i+10]).mean(axis = -1) for i in xrange(0, 50, 10)]
Now, since Theano doesn't have for loops, but instead uses scan (which map is a special case of), I thought I would try the following:
sequence = T.arange(0, 50, 10)
M = theano.map(lambda i: theano.tensor.sqr(x_1[:, i:i+10] - x_2[:, i:i+10]).mean(axis = -1), sequence)
However, this does not seem to work, as I get the error:
only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
Is there a way to loop through the slices using theano.scan (or map)? Thanks in advance, as I'm new to Theano!
Similar to what can be done in numpy, one solution is to reshape your (1, 50) tensor to a (1, 5, 10) tensor (or even a (5, 10) tensor), so that each row along the last axis holds one chunk of 10, and then compute the mean along the last axis.
To illustrate this with numpy, suppose I want to compute means over slices of 2:
x = np.array([0, 2, 0, 4, 0, 6])
x = x.reshape([3, 2])
np.mean(x, axis=1)
outputs
array([ 1., 2., 3.])
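Applied to the shapes in the question, the same idea can be checked in NumPy against the explicit loop over slices. This is a NumPy sketch with random stand-in data, not Theano code; Theano tensors also offer reshape, so the approach should carry over, but that is not tested here.

```python
import numpy as np

rng = np.random.default_rng(0)
x_1 = rng.normal(size=(1, 50))  # stand-in data with the shapes from the question
x_2 = rng.normal(size=(1, 50))

sq_err = (x_1 - x_2) ** 2

# Reshape so the last axis holds one chunk of 10, then average over it
M = sq_err.reshape(1, 5, 10).mean(axis=-1)   # shape (1, 5)

# Reference: the explicit loop over slices from the question
expected = np.stack(
    [sq_err[:, i:i + 10].mean(axis=-1) for i in range(0, 50, 10)], axis=1
)
```

M and expected agree, so the reshape replaces the loop over slices.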
