Splitting an array into arrays of at most three elements [duplicate]

This question already has answers here:
How to split (chunk) a Ruby array into parts of X elements? [duplicate]
(2 answers)
Closed 7 years ago.
I have an array:
array = [12, 13, 14, 18, 17, 19, 30, 23]
I need to split this array into arrays of maximum three elements each:
[12, 13, 14] [18, 17, 19] [30, 23]
How can I do this?

Try this, using Enumerable#each_slice to slice the array into chunks of at most x elements:
array = [12, 13, 14, 18, 17, 19, 30, 23]
array.each_slice(3)       # => an Enumerator
array.each_slice(3).to_a  # => [[12, 13, 14], [18, 17, 19], [30, 23]]

Take a look at Enumerable#each_slice. Given, say, foo = (1..10).map(&:to_s):
foo.each_slice(3).to_a
#=> [["1", "2", "3"], ["4", "5", "6"], ["7", "8", "9"], ["10"]]
If you're using Rails you can also use in_groups_of (which pads the last group with nil by default):
foo.in_groups_of(3)

By this time, I hope you got your answer. If you are using Rails, you can also go with in_groups (note that it splits into n groups, rather than groups of n); you won't have to call to_a explicitly then:
array.in_groups(3)
# => [[12, 13, 14], [18, 17, 19], [30, 23, nil]]
array.in_groups(3, false)
# => [[12, 13, 14], [18, 17, 19], [30, 23]]
One more advantage of in_groups is that it can keep all groups strictly the same size: by default it pads the last group with fill_with = nil, and passing false as the second argument disables the padding.


Get coordinates in a 2D array? [duplicate]

This question already has answers here:
How do I get indices of N maximum values in a NumPy array?
(21 answers)
Closed 1 year ago.
I have an array of shape (116, 116), and I would like to get the coordinates/indices of the 10 maximum values present in that array.
How can I achieve that?
Thanks!
Let's create a test array arr as:
arr = np.array([[  1,   2, 141,   4,   5,   6],
                [  7, 143,   9,  10,  11,  12],
                [ 13,  14,  15, 145,  17,  18],
                [ 19,  20,  21,  22,  23,  24],
                [ 25,  26,  27,  28,  29,  30]])
To find coordinates of e.g. the 3 maximum values, run:
ind = np.divmod(np.flip(np.argsort(arr, axis=None)[-3:]), arr.shape[1])
The result is a 2-tuple with row and column coordinates:
(array([2, 1, 0], dtype=int64), array([3, 1, 2], dtype=int64))
To test it, you can print indicated elements:
arr[ind]
getting:
array([145, 143, 141])
Now replace -3 with -10 and you will get the coordinates of the 10 maximum elements.
See also the answer using np.argpartition, which avoids fully sorting large arrays.
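As a sketch of that argpartition route (my addition, not part of the original answer; np.unravel_index plays the role of the divmod step):

```python
import numpy as np

arr = np.array([[  1,   2, 141,   4,   5,   6],
                [  7, 143,   9,  10,  11,  12],
                [ 13,  14,  15, 145,  17,  18],
                [ 19,  20,  21,  22,  23,  24],
                [ 25,  26,  27,  28,  29,  30]])

k = 3
# argpartition places the k largest values (in no particular order) at the
# end of the flattened index array in O(n), without a full sort
flat = np.argpartition(arr, -k, axis=None)[-k:]
# order just those k flat indices by value, largest first
flat = flat[np.argsort(arr.ravel()[flat])][::-1]
# convert flat indices to (row, col) coordinates
rows, cols = np.unravel_index(flat, arr.shape)
print(arr[rows, cols])   # [145 143 141]
```

For a (116, 116) array and k = 10 this only sorts 10 candidates instead of all 13456 entries.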

Easy way to do nd-array contraction using advanced indexing in Python

I know there must be an elegant way to do this using advanced indexing, I just can't figure it out.
Suppose I have the (2,3,4) array
x = np.array([[[ 0,  1,  2,  3],
               [ 4,  5,  6,  7],
               [ 8,  9, 10, 11]],
              [[12, 13, 14, 15],
               [16, 17, 18, 19],
               [20, 21, 22, 23]]])
and the (4,) array y = array([1,0,1,1])
What is the most elegant way to obtain the (3,4) array that
z = np.zeros((3, 4))
for ii in range(3):
    for jj in range(4):
        z[ii, jj] = x[y[jj], ii, jj]
produces?
In [490]: x[y, :, np.arange(4)]
Out[490]:
array([[12, 16, 20],
       [ 1,  5,  9],
       [14, 18, 22],
       [15, 19, 23]])
We need to transpose this. With a mix of basic and advanced indexing, the slice dimension has been put last:
In [491]: x[y, :, np.arange(4)].T
Out[491]:
array([[12,  1, 14, 15],
       [16,  5, 18, 19],
       [20,  9, 22, 23]])
(That basic/advanced indexing quirk, where the broadcast advanced dimensions are moved in front of the sliced one when the advanced indices are separated by a slice, is documented and has been discussed in several SO answers.)
or with advanced indexing all around:
In [492]: x[y, np.arange(3)[:, None], np.arange(4)]
Out[492]:
array([[12,  1, 14, 15],
       [16,  5, 18, 19],
       [20,  9, 22, 23]])
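Another option (my addition, assuming NumPy >= 1.15) is np.take_along_axis, which makes the "pick one plane of axis 0 per column" intent explicit:

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)   # same (2,3,4) array as in the question
y = np.array([1, 0, 1, 1])

# indices must have the same ndim as x and broadcast over the non-selected
# axes, so y becomes shape (1, 1, 4); the result then has shape (1, 3, 4)
z = np.take_along_axis(x, y[None, None, :], axis=0)[0]
print(z)
# [[12  1 14 15]
#  [16  5 18 19]
#  [20  9 22 23]]
```

This sidesteps the dimension-reordering quirk entirely, since no mixed basic/advanced indexing is involved.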

How do you maintain an index when you sort two index synced (paired) arrays?

I have two arrays that I need to keep the index pairs together:
arr1 = [17,9,8,20,14,16]
arr2 = [27,13,10,10,24,18]
I want to return them both as:
arr1 = [8,9,14,16,17,20]
arr2 = [10,13,24,18,27,10]
I've tried arr1.each.zip(arr2.each).sort which gives me: [[8, 10], [9, 13], [14, 24], [16, 18], [17, 27], [20, 10]]. I was hoping there was a faster way that also maintains the arrays.
I then went on to transpose which got me my nested arrays but then I just can't seem to get the map right to fix my original arrays.
arr1.each.zip(arr2.each).sort.transpose.map {
|a_1| a1.map { |a_2| arr1 = a_1; arr2 = a_2 }
}
I also feel like there should be a simpler less time and space complex solution to this as well.
You're very close.
arr1 = [17,9,8,20,14,16]
arr2 = [27,13,10,10,24,18]
arr1, arr2 = arr1.zip(arr2).sort.transpose
#=> [[8, 9, 14, 16, 17, 20], [10, 13, 24, 18, 27, 10]]
arr1
#=> [8, 9, 14, 16, 17, 20]
arr2
#=> [10, 13, 24, 18, 27, 10]
Note that if arr1 contains duplicates the corresponding values in arr2 will break ties in sorting.
Another way, if you only wish to sort on arr1, is the following.
sorted_indices = arr1.each_index.sort_by { |i| arr1[i] }
#=> [2, 1, 4, 5, 0, 3]
arr1 = arr1.values_at(*sorted_indices)
#=> [8, 9, 14, 16, 17, 20]
arr2 = arr2.values_at(*sorted_indices)
#=> [10, 13, 24, 18, 27, 10]
See Enumerable#sort_by and Array#values_at.

Pyspark 2.1.0 wrapped array to array

I have a Spark (Python) dataframe with two columns: a user ID and then an array of arrays, which is represented in Spark as a wrapped array like so:
[WrappedArray(9, 10, 11, 12), WrappedArray(20, 21, 22, 23, 24, 25, 26)]
In its usual representation this would look like this:
[[9, 10, 11, 12], [20, 21, 22, 23, 24, 25, 26]]
I want to perform operations on each of the subarrays, for example take a third list and check whether any of its values is in the first sub-array, but I can't seem to find solutions for pyspark 2.0 (only Scala-specific older solutions like this and this).
How does one access (and in general work with) wrapped arrays? What is an efficient way to do what I described above?
You can treat each wrapped array as an individual list. In your example, if you want to check which elements from the second wrapped array are present in the first array, you could do something like this:
# Prepare data
data = [[10001, [9, 10, 11, 12], [20, 10, 9, 23, 24, 25, 26]],
        [10002, [8, 1, 2, 3], [49, 3, 6, 5, 6]]]
rdd = sc.parallelize(data)
df = rdd.map(
    lambda row: row + [[x for x in row[2] if x in row[1]]]
).toDF(["userID", "array1", "array2", "commonElements"])
df.show()
Output:
+------+---------------+--------------------+--------------+
|userID| array1| array2|commonElements|
+------+---------------+--------------------+--------------+
| 10001|[9, 10, 11, 12]|[20, 10, 9, 23, 2...| [10, 9]|
| 10002| [8, 1, 2, 3]| [49, 3, 6, 5, 6]| [3]|
+------+---------------+--------------------+--------------+
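The per-row logic in that map is ordinary Python once Spark hands the WrappedArray over as a list, so it can be developed and tested outside Spark. A sketch (the helper name common_elements is mine, and the set lookup is a tweak for long arrays that the answer above does not use):

```python
def common_elements(arr1, arr2):
    """Return the elements of arr2 that also appear in arr1, keeping arr2's order."""
    lookup = set(arr1)  # O(1) membership checks instead of O(len(arr1)) per element
    return [x for x in arr2 if x in lookup]

print(common_elements([9, 10, 11, 12], [20, 10, 9, 23, 24, 25, 26]))  # [10, 9]
print(common_elements([8, 1, 2, 3], [49, 3, 6, 5, 6]))                # [3]
```

The same function can then be passed to the rdd.map (or wrapped in a udf) unchanged.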

Pythonic algorithm for doing an array calculation

I am looking for the most efficient and pythonic algorithm for doing an array calculation. Here is the problem:
I have an array of shape (5,2,3) and its sum along axis=0, as follows:
import numpy as np
A = np.array([[[ 6, 15, 89],
               [49, 62, 12]],
              [[92,  8, 34],
               [93, 81, 35]],
              [[ 8, 35, 63],
               [68, 89,  5]],
              [[27, 20, 85],
               [87, 42, 90]],
              [[99, 64, 12],
               [90, 93, 87]]])
B = A.sum(axis=0)
So B is basically equal to A[0]+A[1]+A[2]+A[3]+A[4], which is:
array([[232, 142, 283],
       [387, 367, 229]])
I want to know at what stage of the summing process each of the 6 elements of B first goes above 100. For example, element B[0,0] goes above 100 after 3 steps (A[0]+A[1]+A[2]), and B[1,1] goes above 100 after 2 steps (A[0]+A[1]).
So the final output of the algorithm should be this array:
array([[3, 5, 2],
[2, 2, 4]])
I know I can do the calculation for each element separately but I was wondering if anyone could come up with a creative and faster algorithm.
Cheers,
Use cumsum to get a cumulative summation, compare it against the threshold, and finally use argmax to catch the first instance of crossing that threshold:
(A.cumsum(axis=0) > 100).argmax(axis=0) + 1
Sample run -
In [228]: A
Out[228]:
array([[[ 6, 15, 89],
        [49, 62, 12]],
       [[92,  8, 34],
        [93, 81, 35]],
       [[ 8, 35, 63],
        [68, 89,  5]],
       [[27, 20, 85],
        [87, 42, 90]],
       [[99, 64, 12],
        [90, 93, 87]]])
In [229]: (A.cumsum(0) > 100).argmax(0)+1
Out[229]:
array([[3, 5, 2],
       [2, 2, 4]])
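One caveat worth noting (my addition): if some position never crosses the threshold, argmax over its all-False column returns 0, which the +1 would misreport as "crossed at step 1". A guarded sketch using np.where:

```python
import numpy as np

A = np.array([[[ 6, 15, 89], [49, 62, 12]],
              [[92,  8, 34], [93, 81, 35]],
              [[ 8, 35, 63], [68, 89,  5]],
              [[27, 20, 85], [87, 42, 90]],
              [[99, 64, 12], [90, 93, 87]]])

crossed = A.cumsum(axis=0) > 100
steps = crossed.argmax(axis=0) + 1
# positions whose running sum never exceeds 100 get 0 instead of a bogus 1
steps = np.where(crossed.any(axis=0), steps, 0)
print(steps)
# [[3 5 2]
#  [2 2 4]]
```

With this sample array every element eventually crosses, so the np.where is a no-op here; it only matters for inputs where some column stays at or below the threshold.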
