Finding blocks in arrays

I was looking over some interview questions, and I stumbled onto this one:
There's an m x n array. A block in the array is denoted by a 1, and a 0 indicates no block. You are supposed to find the number of objects in the array. An object is nothing but a set of blocks that are connected horizontally and/or vertically.
For example:
0 1 0 0
0 1 0 0
0 1 1 0
0 0 0 0
0 1 1 0
Answer: There are 2 objects in this array: the L-shaped object and the object in the last row.
I'm having trouble coming up with an algorithm that would catch a 'U' shape (as below). How should I approach this?
0 1 0 1
0 1 0 1
0 1 1 1
0 0 0 0
0 1 1 0

One approach would use Flood Fill. The algorithm would be something like this:
for row in block_array:
    for block in row:
        if BLOCK IS A ONE and BLOCK NOT VISITED:
            FLOOD_FILL starting from BLOCK
You'd mark items as visited during the flood fill process, and track shapes from there as well.
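A concrete sketch of that pseudocode in Python (the function and variable names are mine, not from the question):

```python
def count_objects(grid):
    """Count groups of 1s connected horizontally/vertically, via flood fill."""
    rows, cols = len(grid), len(grid[0])
    visited = [[False] * cols for _ in range(rows)]

    def flood(r, c):
        # Stop at the border, at 0-cells, and at already-visited cells.
        if not (0 <= r < rows and 0 <= c < cols):
            return
        if visited[r][c] or grid[r][c] != 1:
            return
        visited[r][c] = True
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            flood(r + dr, c + dc)

    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and not visited[r][c]:
                count += 1          # every unvisited 1 starts a new object
                flood(r, c)
    return count

u_shape = [[0, 1, 0, 1],
           [0, 1, 0, 1],
           [0, 1, 1, 1],
           [0, 0, 0, 0],
           [0, 1, 1, 0]]
print(count_objects(u_shape))  # 2
```

Unlike the C# answer below, this keeps a separate visited matrix instead of zeroing out the input.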

This works in C#:
static void Main()
{
    int[][] array =
    {
        new int[] { 0, 1, 0, 1 },
        new int[] { 0, 1, 0, 1 },
        new int[] { 0, 1, 1, 1 },
        new int[] { 0, 0, 0, 0 },
        new int[] { 0, 1, 1, 0 }
    };
    Console.WriteLine(GetNumber(array));
    Console.ReadKey();
}

static int GetNumber(int[][] array)
{
    int objects = 0;
    for (int i = 0; i < array.Length; i++)
        for (int j = 0; j < array[i].Length; j++)
            if (ClearObjects(array, i, j))
                objects++;
    return objects;
}

static bool ClearObjects(int[][] array, int x, int y)
{
    if (x < 0 || y < 0 || x >= array.Length || y >= array[x].Length) return false;
    if (array[x][y] == 1)
    {
        array[x][y] = 0;
        ClearObjects(array, x - 1, y);
        ClearObjects(array, x + 1, y);
        ClearObjects(array, x, y - 1);
        ClearObjects(array, x, y + 1);
        return true;
    }
    return false;
}

I would use disjoint sets (connected components).
At the beginning, each matrix element (i,j) with value 1 is a one-element set by itself.
Then you iterate over the matrix elements, and for each element (i,j) with value 1 you join the sets of the adjacent positions {(i+1,j),(i-1,j),(i,j+1),(i,j-1)} to the set of (i,j) whenever the neighbour's value is 1.
You can find an implementation of disjoint sets at Disjoint Sets in Python.
At the end, the number of different sets is the number of objects.

I would use a disjoint-set data structure (otherwise known as union-find).
Briefly: for each connected component, build an "inverse tree" using a single link per element as a "parent" pointer. Following the parent pointers will eventually find the root of the tree, which is used for component identification (as it is the same for every member of the connected component). To merge neighboring components, make the root of one component the parent of the other (which will no longer be a root, as it now has a parent).
Two simple optimizations make this data structure very efficient. One is, make all root queries "collapse" their paths to point directly to the root; that way, the next query needs only one step. The other is, always use the "deeper" of the two trees as the new root; this requires maintaining a "rank" score for each root.
In addition, in order to make evaluating neighbors more efficient, you might consider preprocessing your input on a row-by-row basis. That way, a contiguous segment of 1's on the same row can start life as a single connected component, and you can efficiently scan the segments of the previous row based on your neighbor criterion.
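A minimal Python sketch of this approach (the names are mine): union-find with path compression and union by rank, scanning row by row so each cell only needs to union with its already-registered up and left neighbours.

```python
def count_objects(grid):
    """Count connected 1-groups with union-find (path compression + rank)."""
    rows, cols = len(grid), len(grid[0])
    parent = {}
    rank = {}

    def find(a):
        # Path compression: make nodes point (nearly) directly at the root.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        # Union by rank: attach the shallower tree under the deeper one.
        if rank[ra] < rank[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        if rank[ra] == rank[rb]:
            rank[ra] += 1

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:
                parent[(r, c)] = (r, c)
                rank[(r, c)] = 0
                # Only look up and left: those cells are already registered.
                if r > 0 and grid[r - 1][c] == 1:
                    union((r, c), (r - 1, c))
                if c > 0 and grid[r][c - 1] == 1:
                    union((r, c), (r, c - 1))

    # The number of distinct roots is the number of objects.
    return len({find(p) for p in parent})

u_shape = [[0, 1, 0, 1],
           [0, 1, 0, 1],
           [0, 1, 1, 1],
           [0, 0, 0, 0],
           [0, 1, 1, 0]]
print(count_objects(u_shape))  # 2
```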

My two cents' worth of an algorithm:
1. List only the 1's.
2. Group them (collect connected ones).
In Haskell:
import Data.List (elemIndices, delete)

example1 =
  [[0,1,0,0]
  ,[0,1,0,0]
  ,[0,1,1,0]
  ,[0,0,0,0]
  ,[0,1,1,0]]

example2 =
  [[0,1,0,1]
  ,[0,1,0,1]
  ,[0,1,1,1]
  ,[0,0,0,0]
  ,[0,1,1,0]]

objects a ws = solve (mapIndexes a) [] where
  mapIndexes s =
    concatMap (\(y,xs) -> map (\x -> (y,x)) xs) $ zip [0..] (map (elemIndices s) ws)
  areConnected (y,x) (y',x') =
    (y == y' && abs (x-x') == 1) || (x == x' && abs (y-y') == 1)
  solve [] r = r
  solve (x:xs) r =
    let r' = solve' xs [x]
    in solve (foldr delete xs r') (r':r)
  solve' vs r =
    let ys = filter (\y -> any (areConnected y) r) vs
    in if null ys then r else solve' (foldr delete vs ys) (ys ++ r)
Output:
*Main> objects 1 example1
[[(4,2),(4,1)],[(2,2),(2,1),(1,1),(0,1)]]
(0.01 secs, 1085360 bytes)
*Main> objects 1 example2
[[(4,2),(4,1)],[(0,3),(1,3),(2,3),(2,2),(2,1),(1,1),(0,1)]]
(0.01 secs, 1613356 bytes)

Why not just look at all the adjacent cells of a given block? Start at some cell that has a 1 in it, keep track of the cells you have visited before, and keep looking through adjacent cells until you cannot find a 1 anymore. Then move on to the cells you have not looked at yet and repeat the process.

Something like this should work:
while array has a 1 that's not marked:
    create a new object
    create a queue
    add the 1 to the queue
    while the queue is not empty:
        get the 1 at the front of the queue
        mark it
        add it to the current object
        look at its 4 neighbors
        if any of them is a 1 and not marked yet, add it to the queue
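That translates almost line for line into Python (a sketch; the names are mine, with collections.deque standing in for the queue):

```python
from collections import deque

def find_objects(grid):
    """Return each object as a list of (row, col) cells, via BFS."""
    rows, cols = len(grid), len(grid[0])
    marked = set()
    objects = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and (r, c) not in marked:
                obj = []                       # a new object begins here
                queue = deque([(r, c)])
                marked.add((r, c))
                while queue:
                    cr, cc = queue.popleft()   # the 1 at the front of the queue
                    obj.append((cr, cc))
                    for nr, nc in ((cr-1, cc), (cr+1, cc), (cr, cc-1), (cr, cc+1)):
                        if 0 <= nr < rows and 0 <= nc < cols \
                                and grid[nr][nc] == 1 and (nr, nc) not in marked:
                            marked.add((nr, nc))
                            queue.append((nr, nc))
                objects.append(obj)
    return objects

grid = [[0, 1, 0, 1],
        [0, 1, 0, 1],
        [0, 1, 1, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
print(len(find_objects(grid)))  # 2
```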


Next greater element over a certain percentage of each element in array

I have seen some posts about the next-greater-element problem. I am looking for a more performant solution to one of its variants.
The problem :
I have an array of numbers. For each number, I want to know the next index at which the value becomes bigger than that number by a given percentage X.
Example :
Let's suppose I have this array [1000, 900, 1005, 1022, 1006] and I set a target of 1%. For each value, I want to know when a later value becomes 1% bigger than it was.
1000 -> We want to know when the value becomes bigger than or equal to 1010 -> Index = 3
900 -> We want to know when the value becomes bigger than or equal to 909 -> Index = 2
1005 -> We want to know when the value becomes bigger than or equal to 1015.05 -> Index = 3
1022 -> We want to know when the value becomes bigger than or equal to 1032.22 -> Index = -1
1006 -> We want to know when the value becomes bigger than or equal to 1016.06 -> Index = -1
Naïve solution :
An O(n^2) algorithm can solve the problem, but it's too slow for my needs.
Does anyone know a faster algorithm for this problem or one of its close variants?
I'd use a min heap. Each element in the min heap is a tuple (value, index) where value is the target value, and index is the index in the input array where that target value originated.
Then the algorithm is:
create an output array with all elements set to -1
for each element in the input array:
    while the smallest target value on the min heap is less than the element's value:
        pop the (targetValue, targetIndex) tuple
        record the index of the current input element at targetIndex in the output array
    push the current element's (target value, index) tuple onto the min heap
For example, given the array in the question, the algorithm performs the following steps:
Create an output array with all elements set to -1
Read 1000, put (1010, 0) in the min heap.
Read 900, put (909, 1) in the min heap.
Read 1005. That's larger than 909, so pop (909, 1) and record index 2 as the answer for element 900 (whose target was 909). Put (1015.05, 2) in the min heap.
Read 1022. Pop (1010, 0) and then (1015.05, 2) from the min heap, recording index 3 as the answer for elements 1000 and 1005. Put (1032.22, 3) in the min heap.
Read 1006, put (1016.06, 4) in the min heap.
Since the end of the input array has been reached, (1032.22, 3) and (1016.06, 4) will never be popped, and the corresponding elements of the output array remain -1.
Running time is O(nlogn).
Sample Python implementation:
from heapq import heappush, heappop

def nextGreater(inputArray):
    targetHeap = []
    outputArray = [-1] * len(inputArray)
    for inputIndex, inputValue in enumerate(inputArray):
        # Pop every pending target that this value satisfies.
        while targetHeap and targetHeap[0][0] < inputValue:
            targetValue, targetIndex = heappop(targetHeap)
            outputArray[targetIndex] = inputIndex
        heappush(targetHeap, (inputValue * 1.01, inputIndex))
    return outputArray

inputArray = [1000, 900, 1005, 1022, 1006]
outputArray = nextGreater(inputArray)
print(outputArray)  # [3, 2, 3, -1, -1]
You can create a list of tuples of index and value in the array and sort the list by value. Then you can iterate over the list using two pointers, finding values that are greater by the given percentage and capturing the corresponding indices. Complexity would be O(nlogn).
Sample implementation in Java 17 given below:
final double percentage = 1.01;
int[] arr = new int[]{1000, 900, 1005, 1022, 1006};
record KeyValuePair(int value, int index) {}
List<KeyValuePair> keyValuePairs = new ArrayList<>();
for (int i = 0; i < arr.length; ++i) {
    keyValuePairs.add(new KeyValuePair(arr[i], i));
}
keyValuePairs.sort(Comparator.comparingInt(KeyValuePair::value));
int i = 0, j = 1;
while (i != keyValuePairs.size() && j != keyValuePairs.size()) {
    if (keyValuePairs.get(i).value() * percentage < keyValuePairs.get(j).value()) {
        if (keyValuePairs.get(i).index() < keyValuePairs.get(j).index()) {
            System.out.println("For index " + keyValuePairs.get(i).index() + " -> " + keyValuePairs.get(j).index());
        } else if (keyValuePairs.get(i).index() + 1 != keyValuePairs.size()) {
            System.out.println("For index " + keyValuePairs.get(i).index() + " -> " + (keyValuePairs.get(i).index() + 1));
        }
        ++i;
    } else {
        ++j;
    }
}

Generate a matrix of combinations (permutation) without repetition (array exceeds maximum array size preference)

I am trying to generate a matrix that has all unique combinations of [0 0 1 1]. I wrote this code for it:
v1 = [0 0 1 1];
M1 = unique(perms([0 0 1 1]),'rows');
• This isn't ideal, because perms() sees each vector element as unique and does
4! = 4 * 3 * 2 * 1 = 24 permutations.
• With unique() I tried to delete all the repeated rows, so I end up with the combination matrix M1 →
only 4!/(2! * (4-2)!) = 6 combinations!
Now, when I try to do something very simple like:
n = 15;
i = 1;
v1 = [zeros(1,n-i) ones(1,i)];
M = unique(perms(v1),'rows');
• Instead of getting 15!/(1! * (15-1)!) = 15 combinations, the perms() function tries to do
15! = 1.3077e+12 permutations and is interrupted.
• How would you go about doing this in a much better way? Thanks in advance!
You can use nchoosek to return the indices which should be 1. I think in your heart you knew this must be possible, because you were using the definition of nchoosek to determine the expected final number of combinations! So we can use:
idx = nchoosek( 1:N, k );
Where N is the number of elements in your array v1, and k is the number of elements which have the value 1. Then it's simply a case of creating the zeros array and populating the ones.
v1 = [0, 0, 1, 1];
N = numel(v1); % number of elements in array
k = nnz(v1); % number of non-zero elements in array
colidx = nchoosek( 1:N, k ); % column index for ones
rowidx = repmat( 1:size(colidx,1), k, 1 ).'; % row index for ones
M = zeros( size(colidx,1), N ); % create output
M( rowidx(:) + size(M,1) * (colidx(:)-1) ) = 1;
This works for both of your examples without the need for a huge intermediate matrix.
Aside: since you'd have the indices using this approach, you could instead create a sparse matrix, but whether that's a good idea or not would depend on what you're doing after this point.
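For comparison, the same nchoosek idea sketched in Python with itertools.combinations (the function name binary_rows is mine):

```python
from itertools import combinations

def binary_rows(n, k):
    """All length-n 0/1 rows containing exactly k ones (one per combination)."""
    rows = []
    for ones in combinations(range(n), k):   # plays the role of nchoosek(1:N, k)
        row = [0] * n
        for idx in ones:
            row[idx] = 1
        rows.append(row)
    return rows

print(len(binary_rows(4, 2)))   # 6
print(len(binary_rows(15, 1)))  # 15
```

As in the MATLAB answer, only the n-choose-k rows are ever materialized, so there is no factorial-sized intermediate.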

Gnuplot: Nested “plot” iteration (“plot for”) with dependent loop indices

I have recently attempted to concisely draw several graphs in a plot using gnuplot and the plot for ... syntax. In this case, I needed nested loops because I wanted to pass something like the following index combinations (simplified here) to the plot expression:
i = 0, j = 0
i = 1, j = 0
i = 1, j = 1
i = 2, j = 0
i = 2, j = 1
i = 2, j = 2
and so on.
So i loops from 0 to some upper limit N, and for each iteration of i, j loops from 0 to i (so j <= i). I tried doing this with the following:
# f(i, j, x) = ...
N = 5
plot for [i=0:N] for [j=0:i] f(i, j, x) title sprintf('j = %d', j)
but this only gives five iterations with j = 0 every time (as shown by the title). So it seems that gnuplot only evaluates the for expressions once, fixing i = 0 at the beginning and not re-evaluating to keep up with changing i values. Something like this has already been hinted at in this answer (“in the plot for ... structure the second index cannot depend on the first one.”).
Is there a simple way to do what I want in gnuplot (i.e. use the combinations of indices given above with some kind of loop)? There is the do for { ... } structure since gnuplot 4.6, but that requires individual statements in its body, so it can’t be used to assemble a single plot statement. I suppose one could use multiplot to get around this, but I’d like to avoid multiplot if possible because it makes things more complicated than seems necessary.
I took your problem personally. For your specific problem you can use a mathematical trick. Remap your indices (i,j) to a single index k, such that
(0,0) -> (0)
(1,0) -> (1)
(1,1) -> (2)
(2,0) -> (3)
...
It can be shown that the relation between i and j and k is
k = i*(i+1)/2 + j
which can be inverted with a bit of algebra
i(k)=floor((sqrt(1+8.*k)-1.)/2.)
j(k)=k-i(k)*(i(k)+1)/2
Now, you can use a single index k in your loop
N = 5
kmax = N*(N+1)/2 + N
plot for [k=0:kmax] f(i(k), j(k), x) title sprintf('j = %d', j(k))
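The remapping is easy to sanity-check outside gnuplot; here are the same formulas in a short Python sketch (the function names are mine):

```python
from math import floor, sqrt

def i_of(k):
    # i(k) = floor((sqrt(1 + 8k) - 1) / 2), as in the gnuplot definition
    return floor((sqrt(1 + 8 * k) - 1) / 2)

def j_of(k):
    # j(k) = k - i(k)*(i(k)+1)/2
    return k - i_of(k) * (i_of(k) + 1) // 2

N = 5
kmax = N * (N + 1) // 2 + N
pairs = [(i_of(k), j_of(k)) for k in range(kmax + 1)]
print(pairs[:6])  # [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2)]
```

Enumerating k from 0 to kmax reproduces exactly the (i, j) combinations listed in the question.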

Vector search Algorithm

I have the following problem. Say I have a vector:
v = [1,2,3,4,5,1,2,3,4,...]
I want to sequentially sample points from the vector that have an absolute magnitude difference higher than a threshold from the previously sampled point. So say my threshold is 2.
I start at index 1 and sample the first point, 1. Then my condition is met at v[3], and I sample 3 (since 3 - 1 >= 2). Then 3, the newly sampled point, becomes the reference that I check against. The next sampled point is 5, which is v[5] (5 - 3 >= 2). Then the next point is 1, which is v[6] (abs(1 - 5) >= 2).
Unfortunately my code in R is taking too long. Basically I am scanning the array repeatedly, looking for matches. I think that approach is naive, though; I have a feeling I can accomplish this task in a single pass through the array, but I don't know how. Any help appreciated. I guess the problem I am running into is that the location of the next sample point can be anywhere in the array, and I need to scan from the current point to the end to find it.
Thanks.
I don't see a way this can be done without a loop, so here is one:
my.sample <- function(x, thresh) {
  out <- x
  i <- 1
  for (j in seq_along(x)[-1]) {
    if (abs(x[i] - x[j]) >= thresh) {
      i <- j
    } else {
      out[j] <- NA
    }
  }
  out[!is.na(out)]
}
my.sample(x = c(1:5, 1:4), thresh = 2)
# [1] 1 3 5 1 3
You can do this without a loop using a bit of recursion:
vsearch = function(v, x, fun=NULL) {
  # v: input vector
  # x: threshold level
  if (!length(v) > 0) return(NULL)
  y = v - rep(v[1], times = length(v))
  if (!is.null(fun)) y = fun(y)
  i = which(y >= x)
  if (!length(i) > 0) return(NULL)
  i = i[1]
  return(c(v[i], vsearch(v[-(1:(i-1))], x, fun=fun)))
}
With your vector above:
> vsearch(c(1,2,3,4,5,1,2,3,4), 2, abs)
[1] 3 5 1 3
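For reference, the single pass the asker suspected exists is easy to sketch in Python (my own names; same semantics as the R loop above):

```python
def threshold_sample(values, thresh):
    """Keep the first value, then every value differing from the
    most recently kept value by at least thresh (one pass)."""
    if not values:
        return []
    kept = [values[0]]
    for v in values[1:]:
        if abs(v - kept[-1]) >= thresh:
            kept.append(v)   # this value becomes the new reference
    return kept

print(threshold_sample([1, 2, 3, 4, 5, 1, 2, 3, 4], 2))  # [1, 3, 5, 1, 3]
```

Each element is examined exactly once against the current reference, so the whole thing is O(n).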

Data.Map vs. Data.Array for symmetric matrices?

Sorry for the vague question, but I hope for an experienced Haskeller this is a no-brainer.
I have to represent and manipulate symmetric matrices, so there are basically three different choices for the data type:
Complete matrix storing both the (i,j) and (j,i) element, although m(i,j) = m(j,i)
Data.Array (Int, Int) Int
A map, storing only elements (i,j) with i <= j (upper triangular matrix)
Data.Map (Int, Int) Int
A vector indexed by k, storing the upper triangular matrix given some vector order f(i,j) = k
Data.Array Int Int
Many operations are going to be necessary on the matrices: updating a single element, querying for rows and columns, etc. However, they will mainly act as containers; no linear algebra operations (inversion, det, etc.) will be required.
Which of the options would be the fastest in general if the dimensionality of the matrices is going to be around 20x20? If I understand correctly, every update (with (//) in the case of array) requires a full copy, so going from 20x20 = 400 elements to 20*21/2 = 210 elements in cases 2 or 3 would make a lot of sense, but access is slower for case 2, and case 3 needs a conversion at some point.
Are there any guidelines?
Btw: The 3rd option is not a really good one, as computing f^-1 requires square roots.
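To make that aside concrete, the usual row-major triangular mapping f(i,j) = k and its square-root inverse look like this as a Python sketch (the function names are mine):

```python
from math import isqrt

def tri_index(i, j):
    """f(i, j) -> k for the triangle j <= i; symmetric in its arguments."""
    if j > i:
        i, j = j, i  # m(i, j) = m(j, i): store one triangle only
    return i * (i + 1) // 2 + j

def tri_inverse(k):
    """f^-1(k) -> (i, j); this is where the square root comes in."""
    i = (isqrt(1 + 8 * k) - 1) // 2
    return i, k - i * (i + 1) // 2

print(tri_index(3, 2), tri_index(2, 3))  # 8 8
print(tri_inverse(8))                    # (3, 2)
```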
You could try using Data.Array with a specialized Ix instance that only generates the upper half of the matrix:
newtype Symmetric = Symmetric { pair :: (Int, Int) } deriving (Ord, Eq)

instance Ix Symmetric where
  range (Symmetric (x1,y1), Symmetric (x2,y2)) =
    map Symmetric [(x,y) | x <- range (x1,x2), y <- range (y1,y2), x >= y]
  inRange (lo,hi) i = x <= hix && x >= lox && y <= hiy && y >= loy && x >= y
    where
      (lox,loy) = pair lo
      (hix,hiy) = pair hi
      (x,y)     = pair i
  index (lo,hi) i
    | inRange (lo,hi) i = (x - loy) + (sum $ take (y - loy) [hix-lox, hix-lox-1 ..])
    | otherwise         = error "Error in array index"
    where
      (lox,loy) = pair lo
      (hix,hiy) = pair hi
      (x,y)     = pair i

sym x y
  | x < y     = Symmetric (y,x)
  | otherwise = Symmetric (x,y)
*Main Data.Ix> let a = listArray (sym 0 0, sym 6 6) [0..]
*Main Data.Ix> a ! sym 3 2
14
*Main Data.Ix> a ! sym 2 3
14
*Main Data.Ix> a ! sym 2 2
13
*Main Data.Ix> length $ elems a
28
*Main Data.Ix> let b = listArray (sym 0 0, sym 19 19) [0..]
*Main Data.Ix> length $ elems b
210
There is a fourth option: use an array of decreasingly-large arrays. I would go with either option 1 (using a full array and just storing every element twice) or this last one. If you intend to be updating a lot of elements, I strongly recommend using a mutable array; IOArray and STArray are popular choices.
Unless this is for homework or something, you should also take a peek at Hackage. A quick look suggests the problem of manipulating matrices has been solved several times already.
