I am not sure what terminology to search for to find the right optimisation. I want to condense the final four lines of the code below into two, so that the +/- 1 is applied as part of the assignment to the plus and minus variables respectively.
# generic params to simulate loop conditions
position = np.arange(10)
axis = 2
# actual code to optimise
plus = np.array(position)  # np.array copies; np.asarray would return position itself
plus[axis] += 1
minus = np.array(position)
minus[axis] -= 1
To clarify, this is an iteration problem: any solution that does not take generic position and axis variables is wrong. Explicitly, the following are not solutions:
plus = np.asarray([0,1,3,3,4,5,6,7,8,9])
plus = np.asarray(range(axis)+[position[axis]+1]+range(axis+1,len(position)))
Here's an approach using np.in1d to condense those four lines to two -
mask = np.in1d(np.arange(position.size),axis)
plus, minus = position + mask, position - mask
Sample run
Let's test it out for a generic position array with another index 6 -
In [60]: position
Out[60]: array([1, 0, 6, 8, 1, 7, 1, 3, 1, 6])
In [61]: axis = 6
In [62]: mask = np.in1d(np.arange(position.size),axis)
In [63]: plus, minus = position + mask, position - mask
In [64]: plus
Out[64]: array([1, 0, 6, 8, 1, 7, 2, 3, 1, 6]) # Change at 6th index
In [65]: minus
Out[65]: array([1, 0, 6, 8, 1, 7, 0, 3, 1, 6]) # Change at 6th index
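For use inside a loop, the same masking idea can be wrapped in a small helper. This is only a sketch and not part of the approach as posted: the name plus_minus is hypothetical, and it builds the mask with a plain == comparison instead of np.in1d, which should give the same boolean array when axis is a single integer.

```python
import numpy as np

def plus_minus(position, axis):
    """Return (plus, minus) copies of position with +1 / -1 applied at index axis."""
    position = np.asarray(position)
    # Boolean mask that is True only at index `axis` (hypothetical variant of the in1d mask)
    mask = np.arange(position.size) == axis
    # Adding/subtracting a boolean mask creates new arrays; position is left untouched
    return position + mask, position - mask

position = np.arange(10)
plus, minus = plus_minus(position, 2)
print(plus)
print(minus)
```

Because the arithmetic allocates new arrays, position itself is not modified, so the helper can be called repeatedly with different axis values.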
Problem:
I am looking for the most computationally efficient way to get the indices of boundaries in an array, where every boundary starts with one particular number and non-boundary elements are marked by a different particular number.
Differences between this question and other boundary-based numpy questions on SO:
Here are some other boundary-based numpy questions:
Numpy 1D array - find indices of boundaries of subsequences of the same number
Getting the boundary of numpy array shape with a hole
Extracting boundary of a numpy array
The difference between my question and the other Stack Overflow posts I found while searching for a solution is that, in those posts, boundaries are indicated by a jump in value or by a 'hole' of values.
What seems to be unique to my case is that every boundary always starts with a particular number.
Motivation:
This problem is inspired by IOB tagging in natural language processing. In IOB tagging, B [beginning] is the tag of the first character in an entity, I [inside] is the tag for all other characters in the entity besides the first, and O [outside] is used to tag all non-entity characters.
Example:
import numpy as np
a = np.array(
[
0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1
]
)
1 is the start of each boundary. If a boundary has a length greater than one, then 2 makes up the rest of the boundary. 0 are non-boundary numbers.
The entities of these boundaries are [1, 2, 2, 2], [1], [1, 2], [1], [1], [1], [1], [1].
So the desired solution, the start and end indices of each boundary in a, is
desired = [[3, 6], [10, 10], [13, 14], [15, 15], [16,16], [19,19], [20,20], [21,21]]
Current Solution:
If flattened, the numbers in the desired solution are in ascending order. So the raw index values can be calculated, sorted, and reshaped later.
I can get the start indices using
starts = np.where(a==1)[0]
starts
array([ 3, 10, 13, 15, 16, 19, 20, 21])
So what's left is the end indices: 6, 10, 14, 15, 16, 19, 20, 21.
I can get all of them except one using three different conditionals, each comparing a shifted copy of the array to the original, based on how the value decreases and on the value in the non-shifted array.
first = np.where(a[:-1] - 2 == a[1:])[0]
first
array([6])
second = np.where((a[:-1] - 1 == a[1:]) &
((a[1:]==1) | (a[1:]==0)))[0]
second
array([10, 14, 16])
third = np.where(
(a[:-1] == a[1:]) &
(a[1:]==1)
)[0]
third
array([15, 19, 20])
The last number I need is 21, but since I needed to shorten the length of the array by 1 to do the shifted comparisons, I'm not sure how to get that particular value using logic, so I just used a simple if statement for that.
Using the rest of the retrieved values for the indices, I can concatenate all the values and reshape them.
if (a[-1] == 1) | (a[-1] == 2):
    pen = np.concatenate((
        starts, first, second, third, np.array([a.shape[0]-1])
    ))
else:
    pen = np.concatenate((
        starts, first, second, third,
    ))
np.sort(pen).reshape(-1,2)
array([[ 3, 6],
[10, 10],
[13, 14],
[15, 15],
[16, 16],
[19, 19],
[20, 20],
[21, 21]])
Is this the most computationally efficient solution for my problem? I realize the four where statements can be combined with or operators, but I wanted to keep each one separate so the reader can see each result in this post. I am wondering if there is a more computationally efficient solution, since I have not mastered all of numpy's functions and am unsure of the computational efficiency of each.
A standard trick for this type of problem is to pad the input appropriately. In this case, it is helpful to append a 0 to the end of the array:
In [55]: a1 = np.concatenate((a, [0]))
In [56]: a1
Out[56]:
array([0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1,
0])
Then your starts calculation still works:
In [57]: starts = np.where(a1 == 1)[0]
In [58]: starts
Out[58]: array([ 3, 10, 13, 15, 16, 19, 20, 21])
The condition for the end is that the value is a 1 or a 2 followed by a value that is not 2. You've already figured out that to handle the "followed by" condition, you can use a shifted version of the array. To implement the "and" and "or" conditions, use the bitwise binary operators & and |, respectively. In code, it looks like:
In [61]: ends = np.where((a1[:-1] != 0) & (a1[1:] != 2))[0]
In [62]: ends
Out[62]: array([ 6, 10, 14, 15, 16, 19, 20, 21])
Finally, put starts and ends into a single array:
In [63]: np.column_stack((starts, ends))
Out[63]:
array([[ 3, 6],
[10, 10],
[13, 14],
[15, 15],
[16, 16],
[19, 19],
[20, 20],
[21, 21]])
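The steps above can be collected into one function. The following is a sketch under stated assumptions rather than a tuned implementation: the names iob_spans and iob_spans_loop are hypothetical, np.flatnonzero is used in place of np.where(...)[0], and the plain loop is included only as a cross-check that assumes well-formed input (a 2 only ever follows a 1 or a 2).

```python
import numpy as np

def iob_spans(tags):
    """Vectorised [start, end] index pairs, using the zero-padding trick."""
    padded = np.concatenate((np.asarray(tags), [0]))
    # A span starts wherever the tag is 1
    starts = np.flatnonzero(padded == 1)
    # A span ends at a non-zero tag whose successor is not 2
    ends = np.flatnonzero((padded[:-1] != 0) & (padded[1:] != 2))
    return np.column_stack((starts, ends))

def iob_spans_loop(tags):
    """Plain-Python reference, assuming every 2 follows a 1 or a 2."""
    spans = []
    for i, t in enumerate(tags):
        if t == 1:
            spans.append([i, i])   # open a new span of length one
        elif t == 2 and spans:
            spans[-1][1] = i       # extend the most recent span
    return spans

a = np.array([0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1])
print(iob_spans(a))
```

Comparing iob_spans(a).tolist() with iob_spans_loop(a) is a cheap way to check the vectorised version on other inputs before timing it.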
For a NumPy 1-D array such as:
In [1]: A = np.array([2,5,1,3,9,0,7,4,1,2,0,11])
In [2]: A
Out[2]: array([2,5,1,3,9,0,7,4,1,2,0,11])
I need to split the array by using the values as a sub-array length.
For the example array:
The first index has a value of 2, so I need the first split to occur at index 0 + 2, so it would result in ([2,5,1]).
Skip to index 3 (since indices 0-2 were gobbled up in step 1).
The value at index 3 = 3, so the second split would occur at index 3 + 3, and result in ([3,9,0,7]).
Skip to index 7
The value at index 7 = 4, so the third and final split would occur at index 7 + 4, and result in ([4,1,2,0,11])
I'm using this simple array as an example, because I think it will help in my actual use case, which is reading data from binary files (either as bytes or unsigned shorts). I'm guessing that numpy will be the fastest way to do it, but I could also use struct/bytearray/lists or whatever would be best.
I hope this makes sense. I had a hard time trying to figure out how best to word the question.
Here is an approach using standard python lists and a while loop:
def custom_partition(arr):
    partitions = []
    i = 0
    while i < len(arr):
        # the value at i says how many further elements belong to this partition
        partition_size = arr[i]
        next_i = i + partition_size + 1
        partitions.append(arr[i:next_i])
        i = next_i
    return partitions
a = [2, 5, 1, 3, 9, 0, 7, 4, 1, 2, 0, 11]
b = custom_partition(a)
print(b)
Output:
[[2, 5, 1], [3, 9, 0, 7], [4, 1, 2, 0, 11]]
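If the data is already in a NumPy array, the same walk can be used to collect the cut positions and then hand them to np.split. This is a minimal sketch, not a benchmarked solution: the function name split_by_length_prefix is hypothetical, and the loop over the length values is still sequential, so any speed-up over the list version is an assumption that would need measuring.

```python
import numpy as np

def split_by_length_prefix(arr):
    """Split arr into chunks, where each chunk starts with its own payload length."""
    arr = np.asarray(arr)
    cuts = []
    i = 0
    while i < arr.size:
        # Jump past the length value itself plus the elements it announces
        i += int(arr[i]) + 1
        cuts.append(i)
    # The last cut is the end of the array, so np.split does not need it
    return np.split(arr, cuts[:-1])

A = np.array([2, 5, 1, 3, 9, 0, 7, 4, 1, 2, 0, 11])
parts = split_by_length_prefix(A)
print([p.tolist() for p in parts])
```

For an array that already lives in memory, the pieces np.split returns are views into it, so no element data is copied.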
Say I have an array of numbers, e.g. [0, 1, 2, 3, 4, 5] and I want to end up with an array, e.g. [2, 1, 4, 0, 5, 3]. At my disposal, I have a single method that I can use:
move(fromIndex, toIndex)
Thus, to achieve my desired array, I could call the method a number of times:
move(2, 0); // [2, 0, 1, 3, 4, 5]
move(1, 2); // [2, 1, 0, 3, 4, 5] (swapped 2 with 0)
move(4, 2); // [2, 1, 4, 0, 3, 5]
move(3, 4); // [2, 1, 4, 3, 0, 5] (swapped 4 with 0)
move(4, 3); // [2, 1, 4, 0, 3, 5] (swapped 0 with 3)
move(5, 4); // [2, 1, 4, 0, 5, 3] (swapped 5 with 3)
Thus, I also have a list of move() operations to achieve my desired result. The list of move() operations can possibly be reduced in size by changing the order and the indexes, to end up with the same result.
Is there an algorithm that I can use on my list of move() operations to reduce the number of operations to a minimum?
We can create a graph in which each element points towards the number it needs to be swapped with to reach its desired position. This gives one or more graphs, possibly with cycles. In your particular case, we get
2->0->3->5->4->2 (first and last elements denote a cycle)
This means that 2 wants to be swapped with 0 to get to the desired position. Similarly,0 wants to be swapped with 3 to get to the desired position. Notice that 1 does not want to be swapped.
Now, we can swap two adjacent elements of the graph to reduce the graph size by 1. Say we swap 3 and 5, so that arr = [0,1,2,5,4,3]. Now 3 is in its desired position, so we can remove it from the graph:
2->0->5->4->2
We need to repeat this process (m-1) times to remove the graph completely. Here m represents the number of edges of the graph. We can have multiple disconnected graphs or graphs without cycles. We need to make sure that we are swapping elements from the same graph. The final answer would be the sum of all steps (that is m-1 for each component) of the graph.
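The counting rule above (m-1 swaps per cycle) can be sketched in a few lines. This is an illustration under assumptions, not a definitive answer to the question: the function name min_swaps is hypothetical, the values are assumed to be distinct, and it counts swaps of arbitrary pairs as described here, which is not necessarily the same as the minimum number of move() calls.

```python
def min_swaps(current, target):
    """Count swaps needed to turn current into target via cycle decomposition."""
    # Desired index of every value (assumes distinct values present in both lists)
    where = {value: index for index, value in enumerate(target)}
    dest = [where[value] for value in current]

    seen = [False] * len(current)
    swaps = 0
    for start in range(len(current)):
        length = 0
        k = start
        # Walk the cycle that contains `start`, marking its members
        while not seen[k]:
            seen[k] = True
            k = dest[k]
            length += 1
        # A cycle with m elements needs m - 1 swaps; fixed points need none
        if length > 1:
            swaps += length - 1
    return swaps

print(min_swaps([0, 1, 2, 3, 4, 5], [2, 1, 4, 0, 5, 3]))
```

For the example arrays there is one cycle of five elements plus the fixed element 1, so the count is the m-1 of that single cycle.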
I want to create a method that returns an array containing exactly the same numbers as the given array, but rearranged so that every 3 is immediately followed by a 4.
Do not move the 3's, but every other number may move. The array contains the same number of 3's and 4's, every 3 has a number after it that is not a 3 or 4, and a 3 appears in the array before any 4.
Example:
problem({1, 3, 1, 4, 4, 3, 1}) → {1, 3, 4, 1, 1, 3, 4}
problem({3, 2, 2, 4}) → {3, 4, 2, 2}
thanks .
Set i = 0, j = 0. Then you repeat the following:
Find the first 3 at an index ≥ i which is not followed by a 4. If none are found, you succeeded. If the 3 is the last number in the array or followed by a 3, you failed. Now find the first 4 at an index ≥ j which is not preceded by a 3. If none are found, you fail. Otherwise set i = location of the 3, j = location of the 4, exchange the objects at positions i+1 and j, set i = i + 2 and j = j + 1, and repeat.
I don't like writing code that depends on promises about the data that I don't verify myself, so this will work whatever is in the array.
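The two-pointer procedure described above can be sketched as follows. This is one possible reading of those steps, not a reference implementation: the name fix34 is hypothetical, the input is copied rather than modified in place, and "failure" is signalled by raising ValueError, which is an assumption since the description does not say how to report it.

```python
def fix34(nums):
    """Rearrange a copy of nums so that every 3 is immediately followed by a 4."""
    a = list(nums)
    n = len(a)
    i = j = 0
    while True:
        # First 3 at index >= i that is not already followed by a 4
        while i < n and not (a[i] == 3 and (i + 1 >= n or a[i + 1] != 4)):
            i += 1
        if i >= n:
            return a  # every 3 is followed by a 4
        if i + 1 >= n or a[i + 1] == 3:
            raise ValueError("a 3 is last in the array or followed by another 3")
        # First 4 at index >= j that is not preceded by a 3
        while j < n and not (a[j] == 4 and (j == 0 or a[j - 1] != 3)):
            j += 1
        if j >= n:
            raise ValueError("no free 4 left to place after a 3")
        # Put the 4 right after the 3; the displaced number takes the 4's old slot
        a[i + 1], a[j] = a[j], a[i + 1]
        i += 2
        j += 1

print(fix34([1, 3, 1, 4, 4, 3, 1]))
print(fix34([3, 2, 2, 4]))
```

Both pointers only move forward, so the whole rearrangement is a single pass over the array.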
Let's say I have an array
Y = [1, 2, 3, 4, 5, 6]
I want to make a new array that replaces every other number with 0, so it creates
y = [1, 0, 3, 0, 5, 0]
How would I go about approaching this and writing code for it in an efficient way?
This should do that:
Y(2:2:end) = 0;
With this line you are saying that every element from the second up to the last, in steps of two, should be zero. This works for larger steps too: Y(N:N:end) = 0 sets every Nth element to 0.