Problem:
What is the most computationally efficient way to get the indices of boundaries in an array, where the start of each boundary is always marked by one particular number and non-boundary positions are marked by a different particular number?
Differences between this question and other boundary-based numpy questions on SO:
Here are some other boundary-based numpy questions:
Numpy 1D array - find indices of boundaries of subsequences of the same number
Getting the boundary of numpy array shape with a hole
Extracting boundary of a numpy array
The difference between my question and those posts is that in them, boundaries are indicated by a jump in value or by a 'hole' of values.
What seems to be unique to my case is that the start of each boundary is always marked by a particular number.
Motivation:
This problem is inspired by IOB tagging in natural language processing. In IOB tagging, B (beginning) tags the first token of an entity, I (inside) tags the remaining tokens of that entity, and O (outside) tags all non-entity tokens.
Example:
import numpy as np

a = np.array(
    [0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1]
)
1 is the start of each boundary. If a boundary has a length greater than one, then 2 makes up the rest of the boundary. 0 marks non-boundary positions.
The boundary runs are [1, 2, 2, 2], [1], [1, 2], [1], [1], [1], [1], [1].
So the desired solution, the start and end indices of the boundaries in a, is
desired = [[3, 6], [10, 10], [13, 14], [15, 15], [16, 16], [19, 19], [20, 20], [21, 21]]
Current Solution:
If flattened, the numbers in the desired solution are in ascending order, so the raw index values can be calculated separately, then sorted and reshaped.
I can get the start indices using
starts = np.where(a==1)[0]
starts
array([ 3, 10, 13, 15, 16, 19, 20, 21])
So what's left is 6, 10, 14, 15, 16, 19, 20, 21.
I can get all of these except the last one (21) using three different conditionals that compare a shifted view of the array to the original, based on decreases in value and on the values of the shifted array.
first = np.where(a[:-1] - 2 == a[1:])[0]
first
array([6])
second = np.where((a[:-1] - 1 == a[1:]) &
((a[1:]==1) | (a[1:]==0)))[0]
second
array([10, 14, 16])
third = np.where(
(a[:-1] == a[1:]) &
(a[1:]==1)
)[0]
third
array([15, 19, 20])
The last number I need is 21, but since the shifted comparisons shorten the array by one, I'm not sure how to capture that value with the same logic, so I just used a simple if statement for it.
With all the index values retrieved, I can concatenate them and reshape.
if (a[-1] == 1) | (a[-1] == 2):
    pen = np.concatenate((
        starts, first, second, third, np.array([a.shape[0] - 1])
    ))
else:
    pen = np.concatenate((
        starts, first, second, third,
    ))
np.sort(pen).reshape(-1, 2)
array([[ 3, 6],
[10, 10],
[13, 14],
[15, 15],
[16, 16],
[19, 19],
[20, 20],
[21, 21]])
Is this the most computationally efficient solution for my problem? I realize the four where statements can be combined with or operators, but I kept each one separate so the reader can see each intermediate result in this post. I am wondering whether there is a more computationally efficient solution, since I have not mastered all of numpy's functions and am unsure of the computational efficiency of each.
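For reference, the combined form mentioned above can be sketched as follows (my own consolidation of the three end-finding where calls; the if check for the last element is still needed, and the variable name is illustrative):
ends_except_last = np.where(
    (a[:-1] - 2 == a[1:]) |
    ((a[:-1] - 1 == a[1:]) & ((a[1:] == 1) | (a[1:] == 0))) |
    ((a[:-1] == a[1:]) & (a[1:] == 1))
)[0]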
A standard trick for this type of problem is to pad the input appropriately. In this case, it is helpful to append a 0 to the end of the array:
In [55]: a1 = np.concatenate((a, [0]))
In [56]: a1
Out[56]:
array([0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1,
0])
Then your starts calculation still works:
In [57]: starts = np.where(a1 == 1)[0]
In [58]: starts
Out[58]: array([ 3, 10, 13, 15, 16, 19, 20, 21])
The condition for the end is that the value is a 1 or a 2 followed by a value that is not 2. You've already figured out that to handle the "followed by" condition, you can use a shifted version of the array. To implement the and and or conditions, use the bitwise binary operators & and |, respectively. In code, it looks like:
In [61]: ends = np.where((a1[:-1] != 0) & (a1[1:] != 2))[0]
In [62]: ends
Out[62]: array([ 6, 10, 14, 15, 16, 19, 20, 21])
Finally, put starts and ends into a single array:
In [63]: np.column_stack((starts, ends))
Out[63]:
array([[ 3, 6],
[10, 10],
[13, 14],
[15, 15],
[16, 16],
[19, 19],
[20, 20],
[21, 21]])
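For reuse, the whole approach can be wrapped in a small function (a direct restatement of the steps above; the function name is mine):
import numpy as np

def boundary_spans(a):
    # Pad with a trailing 0 so a run that reaches the end of the array still terminates.
    a1 = np.concatenate((a, [0]))
    starts = np.where(a1 == 1)[0]
    # An end is a nonzero value followed by anything other than 2.
    ends = np.where((a1[:-1] != 0) & (a1[1:] != 2))[0]
    return np.column_stack((starts, ends))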
I am applying Dijkstra's algorithm to each node in Python using Spyder. I am getting the correct results, but I am unable to store the results (1-D arrays) in an n x n matrix by adding one row per source. I store the data obtained from dijkstra in the D_path variable, but in the variable explorer its type shows as NoneType. I append it to the row and append the row to the matrix.
import sys
import numpy as np
class Graph():

    def __init__(self, vertices):
        self.V = vertices
        self.graph = [[0 for column in range(vertices)]
                      for row in range(vertices)]

    def printSolution(self, dist):
        print("Vertex \tDistance from Source")
        for node in range(self.V):
            print(node, "\t", dist[node])

    # A utility function to find the vertex with the
    # minimum distance value, from the set of vertices
    # not yet included in the shortest path tree
    def minDistance(self, dist, sptSet):
        # Initialize minimum distance for the next node
        min = sys.maxsize
        # Search for the nearest vertex not in the
        # shortest path tree
        for v in range(self.V):
            if dist[v] < min and sptSet[v] == False:
                min = dist[v]
                min_index = v
        return min_index

    # Function that implements Dijkstra's single-source
    # shortest path algorithm for a graph represented
    # using an adjacency matrix
    def dijkstra(self, src):
        dist = [sys.maxsize] * self.V
        dist[src] = 0
        sptSet = [False] * self.V
        for cout in range(self.V):
            # Pick the minimum distance vertex from
            # the set of vertices not yet processed.
            # u is always equal to src in the first iteration
            u = self.minDistance(dist, sptSet)
            # Put the minimum distance vertex in the
            # shortest path tree
            sptSet[u] = True
            # Update the dist values of the adjacent vertices
            # of the picked vertex, but only if the current
            # distance is greater than the new distance and
            # the vertex is not in the shortest path tree
            for v in range(self.V):
                if self.graph[u][v] > 0 and sptSet[v] == False and dist[v] > dist[u] + self.graph[u][v]:
                    dist[v] = dist[u] + self.graph[u][v]
        self.printSolution(dist)

# Driver program
g = Graph(9)
g.graph = [[0, 4, 0, 0, 0, 0, 0, 8, 0],
           [4, 0, 8, 0, 0, 0, 0, 11, 0],
           [0, 8, 0, 7, 0, 4, 0, 0, 2],
           [0, 0, 7, 0, 9, 14, 0, 0, 0],
           [0, 0, 0, 9, 0, 10, 0, 0, 0],
           [0, 0, 4, 14, 10, 0, 2, 0, 0],
           [0, 0, 0, 0, 0, 2, 0, 1, 6],
           [8, 11, 0, 0, 0, 0, 1, 0, 7],
           [0, 0, 2, 0, 0, 0, 6, 7, 0]]

#D_path = list()
matrix = []  # define empty matrix
for i in range(9):  # one row per source vertex
    row = []
    D_path = g.dijkstra(i)
    row.append(D_path)  # add the result for this source to the row
    matrix.append(row)  # add the row to the matrix
print(matrix)
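For context when reading the code above: dijkstra prints the distances but never returns anything, so D_path is None. A minimal sketch of one way to get a real n x n matrix, assuming the goal is to collect each distance list (the return statement and list comprehension are my suggestion, not part of the original post):
    def dijkstra(self, src):
        # ... same body as above ...
        self.printSolution(dist)
        return dist  # returning dist means D_path is no longer None

matrix = [g.dijkstra(i) for i in range(9)]  # 9 x 9 list of shortest distances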
The question itself is language-agnostic. I will use Python for my example, mainly because I think it demonstrates the point nicely.
I have an N-dimensional array of shape (n1, n2, ..., nN) that is contiguous in memory (c-order) and filled with numbers. For each dimension by itself, the numbers are ordered in ascending order. A 2D example of such an array is:
>>> import numpy as np
>>> n1 = np.arange(5)[:, None]
>>> n2 = np.arange(7)[None, :]
>>> n1+n2
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 1, 2, 3, 4, 5, 6, 7],
[ 2, 3, 4, 5, 6, 7, 8],
[ 3, 4, 5, 6, 7, 8, 9],
[ 4, 5, 6, 7, 8, 9, 10]])
In this case, the values in each row are ascending, and the values in each column are ascending, too. A 1D example array is
>>> n1 = np.arange(10)
>>> n1*n1
array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
I would like to obtain a list/array containing the indices that would sort the flattened version of the nD array in ascending order. By the flattened array I mean interpreting the nD array as a 1D array of equivalent size. The sorting doesn't have to be stable, i.e., the order of indices pointing to equal numbers doesn't matter. For example
>>> n1 = np.arange(5)[:, None]
>>> n2 = np.arange(7)[None, :]
>>> arr = n1*n2
>>> arr
array([[ 0, 0, 0, 0, 0, 0, 0],
[ 0, 1, 2, 3, 4, 5, 6],
[ 0, 2, 4, 6, 8, 10, 12],
[ 0, 3, 6, 9, 12, 15, 18],
[ 0, 4, 8, 12, 16, 20, 24]])
>>> np.argsort(arr.ravel())
array([ 0, 28, 14, 7, 6, 21, 4, 3, 2, 1, 5, 8, 9, 15, 22, 10, 11,
29, 16, 12, 23, 17, 13, 18, 30, 24, 19, 25, 31, 20, 26, 32, 27, 33,
34], dtype=int64)
Standard sorting on the flattened array can accomplish this; however, it doesn't exploit the fact that the array is already partially sorted, so I suspect there exists a more efficient solution. What is the most efficient way to do so?
A comment asked what my use-case is, and if I could provide some more realistic test data for benchmarking. Here is how I encountered this problem:
Given an image and a binary mask for that image (which selects pixels), find the largest sub-image which contains only selected pixels.
In my case, I applied a perspective transformation to an image, and want to crop it so that there is no black background while preserving as much of the image as possible.
import numpy as np
from skimage import data
from skimage import transform
from skimage import img_as_float

tform = transform.EuclideanTransform(
    rotation=np.pi / 12.,
    translation=(10, -10)
)
img = img_as_float(data.chelsea())[50:100, 150:200]
tf_img = transform.warp(img, tform.inverse)
tf_mask = transform.warp(np.ones_like(img), tform.inverse)[..., 0]
y = np.arange(tf_mask.shape[0])
x = np.arange(tf_mask.shape[1])
y1 = y[:, None, None, None]
y2 = y[None, None, :, None]
x1 = x[None, :, None, None]
x2 = x[None, None, None, :]
y_padded, x_padded = np.where(tf_mask==0.0)
y_padded = y_padded[None, None, None, None, :]
x_padded = x_padded[None, None, None, None, :]
y_inside = np.logical_and(y1[..., None] <= y_padded, y_padded <= y2[..., None])
x_inside = np.logical_and(x1[..., None] <= x_padded, x_padded <= x2[..., None])
contains_padding = np.any(np.logical_and(y_inside, x_inside), axis=-1)
# size of the sub-image
height = np.clip(y2 - y1 + 1, 0, None)
width = np.clip(x2 - x1 + 1, 0, None)
img_size = width * height
# find all largest sub-images
img_size[contains_padding] = 0
y_low, x_low, y_high, x_high = np.where(img_size == np.max(img_size))
cropped_img = tf_img[y_low[0]:y_high[0]+1, x_low[0]:x_high[0]+1]
The algorithm is quite inefficient; I am aware. What is interesting for this question is img_size, which is a (50,50,50,50) 4D-array that is ordered as described above. Currently I do:
img_size[contains_padding] = 0
y_low, x_low, y_high, x_high = np.where(img_size == np.max(img_size))
but with a proper argsort algorithm (that I can interrupt early) this could potentially be made much better.
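As an aside, if only one maximizing index is needed, the last two steps can be written more directly with standard numpy (equivalent up to tie-breaking):
img_size[contains_padding] = 0
# argmax gives the flat index of a maximum; unravel_index maps it back to 4D
y_low, x_low, y_high, x_high = np.unravel_index(np.argmax(img_size), img_size.shape)
cropped_img = tf_img[y_low:y_high + 1, x_low:x_high + 1]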
I would do it using parts of mergesort and a divide and conquer approach.
You start with the first two arrays.
[0, 1, 2, 3, 4, 5, 6],//<- This
[ 1, 2, 3, 4, 5, 6, 7],//<- This
....
Then you can merge them like this (Java-like syntax):
List<Integer> merged = new ArrayList<>();
List<Integer> firstRow = ...; // Same would work with arrays
List<Integer> secondRow = ...;
int firstCnter = 0;
int secondCnter = 0;
while (firstCnter < firstRow.size() || secondCnter < secondRow.size()) {
    if (firstCnter == firstRow.size()) { // Unconditionally add all elements from the second, if we added all the elements from the first
        merged.add(secondRow.get(secondCnter++));
    } else if (secondCnter == secondRow.size()) {
        merged.add(firstRow.get(firstCnter++));
    } else { // Add the smaller value from both lists at the current index.
        int firstValue = firstRow.get(firstCnter);
        int secondValue = secondRow.get(secondCnter);
        merged.add(Math.min(firstValue, secondValue));
        if (firstValue <= secondValue)
            firstCnter++;
        else
            secondCnter++;
    }
}
After that you can merge the next two rows, and so on, until you have:
[0,1,1,2,2,3,3,4,4,5,5,6,6,7]
[2,3,3,4,4,5,5,6,6,7,7,8,8,9]
[4,5,6,7,8,9,10] //Not merged.
Continue to merge again.
[0,1,1,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,8,8,9]
[4,5,6,7,8,9,10]
After that, the last merge:
[0,1,1,2,2,2,3,3,3,3,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6,7,7,7,7,8,8,8,9,9,10]
I don't know the exact time complexity, but it should be a viable solution.
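The same pairwise merging can be sketched in Python using heapq.merge for each two-way merge (my own sketch; the function name is illustrative):
import heapq

def merge_rows(rows):
    # Repeatedly merge adjacent pairs of sorted lists until one sorted list remains.
    rows = [list(r) for r in rows]
    while len(rows) > 1:
        merged = []
        for i in range(0, len(rows), 2):
            if i + 1 < len(rows):
                merged.append(list(heapq.merge(rows[i], rows[i + 1])))
            else:
                merged.append(rows[i])  # odd list out, carried into the next pass
        rows = merged
    return rows[0]
Note that for the original question one would merge (value, flat_index) pairs instead of bare values, so the sorted indices can be read off at the end.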
Another idea: Use a min-heap with just the current must-have candidates for being the next-smallest value. Start with the value at the origin (index 0 in all dimensions), as that's smallest. Then repeatedly take out the smallest value from the heap and add its neighbors not yet added.
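A minimal sketch of this heap idea (my own implementation of the suggestion; it assumes values ascend along every axis, as in the question):
import heapq
import numpy as np

def monotone_argsort(arr):
    # Pop the globally smallest remaining value, then push its +1 neighbors
    # along each axis; monotonicity guarantees the next-smallest value is
    # always somewhere in the heap.
    start = (0,) * arr.ndim
    heap = [(arr[start], start)]
    seen = {start}
    order = []
    while heap:
        _, idx = heapq.heappop(heap)
        order.append(np.ravel_multi_index(idx, arr.shape))
        for axis in range(arr.ndim):
            if idx[axis] + 1 < arr.shape[axis]:
                nxt = idx[:axis] + (idx[axis] + 1,) + idx[axis + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (arr[nxt], nxt))
    return np.array(order)
For the arr in the question, arr.ravel()[monotone_argsort(arr)] equals np.sort(arr.ravel()), though the order among equal values may differ from np.argsort. Since values are popped one at a time in ascending order, this loop can also be interrupted early, as the question asks.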
So, I've searched a bit, but I'm not entirely sure what to search for, to be honest...
I'm currently writing some "level generation" code and creating classes to hold the objects for my generation.
Basically I have a "Cell" class which is defined by a Coordinate.
I'm then trying to create a "CellArea" class, which holds multiple cells in an area. It'd be simple enough if this was a rectangle, as I'd just use Cell[, ] then. But since this could be an "L" shape of sorts (I'm fine with limiting it to 2 "Corridors"), how would I go about doing that? Or is it simply more efficient to do some sort of list/collection?
I was wondering if you can do a Jagged Array that'd look something like...
{0, 1, 2}
{0, 1, 2}
{0, 1, 2}
{0, 1, 2}
{0, 1, 2, 3, 4, 5}
{0, 1, 2, 3, 4, 5}
{0, 1, 2, 3, 4, 5}
I hope I made my question clear? Obviously I could use a two-dimensional [7, 6] array and just not fill in the "null" places, but isn't that inefficient in a way?
Yes of course you can!
int[][] jaggedArray = new int[7][];
jaggedArray[0] = new int[] {0, 1, 2};
jaggedArray[1] = new int[] {0, 1, 2};
jaggedArray[2] = new int[] {0, 1, 2};
jaggedArray[3] = new int[] {0, 1, 2};
jaggedArray[4] = new int[] {0, 1, 2, 3, 4, 5};
jaggedArray[5] = new int[] {0, 1, 2, 3, 4, 5};
jaggedArray[6] = new int[] {0, 1, 2, 3, 4, 5};
Reference: http://msdn.microsoft.com/en-us/library/2s05feca.aspx
For a sequence of n distinct elements, for example {0, 1, 2, ..., n-1}, where n is even, it is possible to identify n permutations of the sequence that are completely dissimilar from each other. If n is odd, then it is possible to identify n-1 such permutations. Note, this is just a claim and not yet a fact. (A counting argument at least shows n is an upper bound: each permutation contains n-1 adjacent ordered pairs, and there are only n(n-1) ordered pairs of distinct elements.)
By completely dissimilar (which might not be an accurate term), I mean that no "adjacent and ordered pair" of elements, such as (0, 1), (2, 3), or (n-3, n-2), from one permutation repeats in any of the remaining permutations.
For example:
If n=2, then we can identify {0, 1} and {1, 0} as two completely dissimilar permutations.
If n=3, then we can identify {0, 1, 2} and {2, 1, 0} as two completely dissimilar permutations.
If n=4, then we can identify {0, 1, 2, 3}, {1, 3, 0, 2}, {2, 0, 3, 1} and {3, 2, 1, 0} as four completely dissimilar permutations.
If n=5, then we can identify {0, 1, 2, 3, 4}, {1, 4, 2, 0, 3}, {3, 0, 2, 4, 1}, {4, 3, 2, 1, 0} as four completely dissimilar permutations.
I would like to know:
1) Is there a general rule or algorithm to find, given a sequence, any n (if n is even) or n-1 (if n is odd) completely dissimilar permutations of that sequence?
2) Is there a formal definition of this problem?
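For experimentation, here is a small Python checker based on the definition above, applied to the n = 4 example (the helper name is my own):
def completely_dissimilar(perms):
    # True if no adjacent ordered pair (a, b) occurs in more than one permutation.
    seen = set()
    for p in perms:
        for pair in zip(p, p[1:]):
            if pair in seen:
                return False
            seen.add(pair)
    return True

print(completely_dissimilar([(0, 1, 2, 3), (1, 3, 0, 2), (2, 0, 3, 1), (3, 2, 1, 0)]))  # True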