Optimizing 2D grid connectivity algorithm

Summary: I'm looking for an optimal algorithm to ensure connectivity over a 2D grid of binary values. I have a fairly involved algorithm that does it in effectively linear time, but only if certain pre-processing steps are performed. The following goes into fairly extensive detail about the algorithm and its run time. I've also put together a Unity app that offers a detailed visualization of all the steps mentioned below (and some others), which can be found here.
I have a set of scripts that procedurally generate terrain using an algorithm called marching squares. One of the steps is to connect all the regions together. Specifically, I have a grid of 0s (floors) and 1s (walls) and want to ensure that every 0 is reachable from every other 0. I'm optimizing for:
1. The amount of tunneling that needs to be done, i.e. the number of 1s turned into 0s should be minimized.
2. Asymptotic run time. I'm trying to make it linear in the number of tiles in the grid, or as close to linear as possible.
By treating the rooms (connected regions of 0s) as vertices and potential tunnels as edges, we can use a minimum spanning tree algorithm as our workhorse. I'll describe the algorithm from the starting point of an unconnected grid of 0s and 1s.
Input:
A 2D array of bytes, each either 0 or 1, representing terrain (0: floor, 1: wall).
For example, the following has four 'rooms' (connected components of floors):
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 1 1
1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 1 0 0 1 1 1 1
1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1
1 1 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1
1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1
1 1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1
1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1
1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Output:
The same grid, modified so that each room can be reached from any other room with the least amount of damage done to the grid (fewest 1s flipped to 0s). Here we've carved a total of three tunnels, flipping 8 walls in total:
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 1 1
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 1
1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1
1 1 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1
1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
1 1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1
1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1
1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Overview of basic algorithm:
The following is a high-level description of the algorithm, leaving out several crucial optimizations in order to illustrate the main ideas:
1. Run a BFS on the grid to find rooms (connected components of floors), storing only edge tiles (i.e. floor tiles adjacent to a wall tile), since the shortest path between two rooms will always be between two edge tiles.
2. For each pair of rooms, do a double loop to find the pair of tiles with the shortest Euclidean distance. This pair forms a potential tunnel between the two rooms, dug as a straight path between them.
3. Treat the rooms from (1) as vertices and the pairs from (2) as edges in a graph, weighted by Euclidean distance. Run Kruskal's minimum spanning tree algorithm on this graph to get a list of tunnels to dig that minimizes the number of tiles that need to be changed.
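The three basic steps can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the author's Unity implementation: rooms use 4-connectivity, the grid border counts as a wall when deciding whether a floor tile is an edge tile, and the hypothetical plan_tunnels only returns the (start, end) tile pairs to connect, leaving the digging of the straight tunnel itself out.

```python
from collections import deque
from itertools import combinations
import math

FLOOR, WALL = 0, 1
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity (assumed)


def find_rooms(grid):
    """Step 1: BFS over floor tiles; return the edge tiles of each room."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    rooms = []
    for sr in range(h):
        for sc in range(w):
            if grid[sr][sc] != FLOOR or seen[sr][sc]:
                continue
            seen[sr][sc] = True
            queue, edge_tiles = deque([(sr, sc)]), []
            while queue:
                r, c = queue.popleft()
                on_edge = False
                for dr, dc in NEIGHBOURS:
                    nr, nc = r + dr, c + dc
                    # The grid border is treated like a wall (assumption).
                    if not (0 <= nr < h and 0 <= nc < w) or grid[nr][nc] == WALL:
                        on_edge = True
                    elif not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
                if on_edge:
                    edge_tiles.append((r, c))
            rooms.append(edge_tiles)
    return rooms


def closest_pair(a, b):
    """Step 2 (brute force): closest pair of edge tiles by Euclidean distance."""
    return min((math.dist(p, q), p, q) for p in a for q in b)


def kruskal(num_rooms, candidates):
    """Step 3: keep the cheapest candidate tunnels that join separate rooms."""
    parent = list(range(num_rooms))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tunnels = []
    for dist, i, j, p, q in sorted(candidates):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tunnels.append((p, q))
    return tunnels


def plan_tunnels(grid):
    rooms = find_rooms(grid)
    candidates = []
    for i, j in combinations(range(len(rooms)), 2):
        dist, p, q = closest_pair(rooms[i], rooms[j])
        candidates.append((dist, i, j, p, q))
    return kruskal(len(rooms), candidates)
```

For example, plan_tunnels([[0, 1, 0]]) returns [((0, 0), (0, 2))], a single tunnel through the middle wall. The brute-force closest_pair is exactly the quadratic step (2) discussed below.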
This guarantees connectivity, and it guarantees the absolute minimum number of changed tiles (caveat: this is false if we consider the possibility of connecting a room not to another room, but to a tunnel between another pair of rooms). The issue is that it scales poorly: step (2) scales quadratically in the number of tiles in the grid.
Optimized algorithm
The bottleneck is step (2). We're meticulously checking every pair of tiles for every pair of rooms to guarantee the absolute shortest connection. If we accept a little error (i.e. a suboptimal connection), we can speed it up dramatically. The basic idea is to skip a number of tiles proportional to the distance we just computed: if tiles A and B are far apart, chances are we're nowhere near an optimal connection, so we can skip checking the tiles next to A. Any error from this skipping will be proportional to the length of the optimal connection.
To explain this visually, suppose X and Y represent a current pair of tiles being checked, and that we're currently looping X over the room on the left.
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 X 0 0 0 0 0 0 1 1 1 1 1 1 1
1 1 0 0 0 0 0 0 0 1 1 1 1 1 1
1 1 0 0 0 0 0 0 1 1 1 1 1 1 1
1 0 0 0 0 0 0 1 1 1 1 0 0 0 1
1 1 0 0 0 0 1 1 0 0 0 0 0 1 1
1 1 1 1 1 1 1 0 0 0 0 Y 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
These are about 11 tiles apart (sqrt(10^2 + 5^2) ≈ 11.2), so let's skip the next 11 tiles (marked with dashes):
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 - - - - - - 1 1 1 1 1 1 1
1 1 0 0 0 0 0 - - 1 1 1 1 1 1
1 1 0 0 0 0 - - 1 1 1 1 1 1 1
1 0 0 0 0 X - 1 1 1 1 0 0 0 1
1 1 0 0 0 0 1 1 0 0 0 0 0 1 1
1 1 1 1 1 1 1 0 0 0 0 Y 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Most compared pairs are far apart, so this dramatically reduces the run time of step (2) and, in practice, introduces minimal error.
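A sketch of the skipping loop in Python, assuming the edge tiles of each room are already ordered along the boundary. The name closest_pair_skipping is hypothetical, and skipping only in the inner loop (X walking over one room while Y is fixed, as in the picture) is an assumption about how the double loop is arranged.

```python
import math


def closest_pair_skipping(a, b):
    """Approximate closest pair of edge tiles between two rooms.

    Assumes `a` and `b` list each room's edge tiles in boundary order,
    so consecutive entries are (roughly) neighbouring tiles.
    """
    best = None
    for q in b:                   # Y: every edge tile of the second room
        i = 0
        while i < len(a):         # X: walk along the first room's boundary
            p = a[i]
            d = math.dist(p, q)
            if best is None or d < best[0]:
                best = (d, p, q)
            # Skip ahead by the truncated distance just computed (at least 1),
            # e.g. sqrt(10**2 + 5**2) ~ 11.18 skips 11 tiles.
            i += max(1, int(d))
    return best
```

Because the skip is a heuristic, the result is never better than the brute-force minimum and can be worse: with a = [(0, c) for c in range(20)] and b = [(0, 30)], the first distance (30) skips past the whole room, so the sketch reports 30 where 11 is achievable. How rare such cases are is the empirical question discussed in the run-time analysis.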
The one issue being overlooked here is that this assumes the edge tiles are ordered along the boundary, so that consecutive tiles in the list are next to each other in the grid; this requires an additional preprocessing step.
Here is the optimized algorithm:
1. Perform the BFS as in the previous algorithm.
2. Sort each room's edge tiles. This can be done with a depth-first search along the edge tiles, which gives a roughly continuous path along the edge of the room. There are a few (literal) corner cases where the path can jump, but the paths are continuous to a reasonable approximation.
3. Perform the double loop from step (2) of the previous algorithm, but this time, instead of incrementing by one tile, increment by the last computed distance.
4. Perform Kruskal's algorithm as before. Note that Kruskal's algorithm requires sorting the edges of the graph: since the graph is complete (every pair of rooms has a potential tunnel), a standard comparison sort becomes the new bottleneck. Since we're sorting by distance, which is a float, we can sort much faster by truncating the floats to integers and using an integer sort. Again, this produces minimal error (Kruskal's might choose a tunnel with distance 4.6 over one with distance 4.2, for example, since both truncate to 4) but offers a dramatic speedup.
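The edge-ordering preprocessing can be sketched as an iterative depth-first search over a room's edge tiles. Treating all 8 surrounding tiles as neighbours is an assumption (diagonal steps let the walk follow corners), and order_edge_tiles is a hypothetical helper. As noted above, the resulting order is only roughly continuous: it jumps whenever the search backtracks.

```python
# 8-neighbourhood used to link edge tiles (assumption).
NEIGHBOURS_8 = ((-1, -1), (-1, 0), (-1, 1), (0, -1),
                (0, 1), (1, -1), (1, 0), (1, 1))


def order_edge_tiles(edge_tiles):
    """Return the edge tiles in depth-first (pre-order) visiting order."""
    tiles = set(edge_tiles)
    visited, order = set(), []
    for start in edge_tiles:      # outer loop also covers e.g. holes in a room
        stack = [start]
        while stack:
            tile = stack.pop()
            if tile in visited:
                continue
            visited.add(tile)
            order.append(tile)
            r, c = tile
            for dr, dc in NEIGHBOURS_8:
                nxt = (r + dr, c + dc)
                if nxt in tiles and nxt not in visited:
                    stack.append(nxt)
    return order
```

Starting from an end of a straight run, the walk is continuous: order_edge_tiles([(0, 0), (0, 2), (0, 1), (0, 3)]) gives the tiles left to right. Starting in the middle of a run produces the kind of jump the author mentions.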
Run-time analysis
Let n denote the number of tiles in the grid and m the number of rooms (connected components). Note that in the worst case, m = O(n).
There are four steps in the algorithm. Steps (1) and (2) are both O(n) in memory and time, as they are a BFS and a DFS that process each tile in the map at most once.
Step (3) is a bit trickier. We're doing a double loop over the rooms and then finding a connection for each pair. This is O(m^2) for the double loop, multiplied by the average work done per pair of rooms. Giving a tight analytical bound on the average work per pair of rooms is not straightforward. But empirically, testing over a large variety of configurations, the average work converges to a single comparison as n grows large. This is because the average distance between tiles in different rooms, and hence the skip length, grows as the grid grows.
So in total the work done for (3) is O(m^2), with O(1) storage.
For (4), the cost is that of Kruskal's algorithm: O(m^2 log(m^2)) = O(m^2 log m) to sort the edges naively, and O(m^2 a(m^2)) to run the edges through the union-find data structure, where a is the inverse Ackermann function (effectively a constant). If we truncate the edge lengths and use an integer sorting algorithm (e.g. counting or bucket sort), we can get the sort down to O(m^2). Storage is O(m^2).
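The truncated-distance sort can be done as a bucket sort: one bucket per integer distance, filled in a single pass. A minimal sketch, assuming each candidate is a tuple whose first element is the float distance (e.g. (distance, room_i, room_j, ...)); bucket_sort_candidates is a hypothetical name.

```python
def bucket_sort_candidates(candidates):
    """Sort candidate tunnels by truncated distance in one pass.

    Each candidate is a tuple whose first element is a non-negative float
    distance. Candidates in the same integer bucket keep their input order.
    """
    if not candidates:
        return []
    num_buckets = int(max(cand[0] for cand in candidates)) + 1
    buckets = [[] for _ in range(num_buckets)]
    for cand in candidates:
        buckets[int(cand[0])].append(cand)   # truncate the float to an int key
    return [cand for bucket in buckets for cand in bucket]
```

The cost is linear in the number of candidate edges plus the number of buckets, which is bounded by the largest distance (at most the grid diagonal). Since ties within a bucket are not broken by the fractional part, a 4.6 candidate listed before a 4.2 one stays ahead of it, which is the small error described in step 4.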
So in total, the run time is dominated by Kruskal's algorithm, O(m^2 a(m^2)), which is effectively O(m^2). Since m = O(n) in the worst case, that is O(n^2), which is not very good performance. But one final preprocessing step on the grid brings it down to O(n): limiting the number of rooms in an organic way.
Prior to any of the other steps, we can use a flood-fill algorithm to fill in small rooms in linear time: specifically, we fill in any room of size less than sqrt(n). Since rooms are disjoint and the grid has only n tiles, there can be at most n / sqrt(n) = sqrt(n) rooms of size at least sqrt(n). It follows that m = O(sqrt(n)) and m^2 = O(n), making the entire algorithm linear in the size of the grid.
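The small-room filling step can be sketched as one more BFS pass that turns every room with fewer than sqrt(n) tiles into wall. This is a minimal sketch assuming 4-connectivity and a grid given as a list of lists of 0/1 that is modified in place; fill_small_rooms and its min_size parameter are hypothetical.

```python
from collections import deque
import math


def fill_small_rooms(grid, min_size=None):
    """Flood-fill (turn to wall) every room with fewer than min_size tiles.

    min_size defaults to sqrt(n), where n is the number of tiles.
    The grid (0 = floor, 1 = wall) is modified in place and returned.
    """
    h, w = len(grid), len(grid[0])
    if min_size is None:
        min_size = math.sqrt(h * w)
    seen = [[False] * w for _ in range(h)]
    for sr in range(h):
        for sc in range(w):
            if grid[sr][sc] != 0 or seen[sr][sc]:
                continue
            # BFS collects the whole room containing (sr, sc).
            seen[sr][sc] = True
            room, queue = [], deque([(sr, sc)])
            while queue:
                r, c = queue.popleft()
                room.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < h and 0 <= nc < w
                            and grid[nr][nc] == 0 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(room) < min_size:
                for r, c in room:
                    grid[r][c] = 1
    return grid
```

Each tile is enqueued at most once, so the pass is linear in n, matching the claim above.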
Conclusion
Is it possible to do better than this? Obviously we cannot do asymptotically better than linear time and storage, but in order to achieve those figures, a certain amount of sloppily quantified suboptimality in the tunnel lengths is accepted, and it requires modifying the original grid (namely, putting a bound on the number of rooms).

Related

How to isolate specific boolean island given a coordinate on it?

Given a 2D array of Boolean islands, where 1 is land and 0 is water, let's say I want only the island I point to with a coordinate. How would I transfer it to a new array where everything beyond the borders of that island is water?
Here is a simple example.
I am given this 2d array
1 0 0 1 1 0
0 1 0 0 0 1
1 1 1 0 0 0
0 1 0 1 0 1
1 1 1 1 1 0
and the coordinate [1][2] (that would be the 2nd column 3rd row)
Then the result in the new array should be something like:
0 0 0 0 0 0
0 1 0 0 0 0
1 1 1 0 0 0
0 1 0 1 0 0
1 1 1 1 1 0
The pixels can only be connected either up, down, left or right to each other (no diagonals)
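A minimal BFS flood-fill sketch in Python, assuming 4-connectivity as stated and reading the coordinate [1][2] as (column, row), which is how the example describes it; isolate_island is a hypothetical name. Only the cells reachable from the chosen cell are copied into a fresh array of water.

```python
from collections import deque


def isolate_island(grid, col, row):
    """Return a copy of grid keeping only the island that contains (row, col)."""
    h, w = len(grid), len(grid[0])
    result = [[0] * w for _ in range(h)]
    if grid[row][col] != 1:
        return result  # the coordinate is water, so there is no island to keep
    result[row][col] = 1
    queue = deque([(row, col)])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # no diagonals
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w
                    and grid[nr][nc] == 1 and result[nr][nc] == 0):
                result[nr][nc] = 1
                queue.append((nr, nc))
    return result
```

Called as isolate_island(grid, 1, 2) on the example array, it produces the expected result shown above.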

Matrix Permutations with Constraint

I have the following matrix:
1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2
0 0 0 0 1 1 1 1 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2
I'd like to randomly permute the columns, with the constraint that every block of four consecutive numbers in the second row should contain some arrangement of
0 0 1 2
e.g. Columns 1:4, 5:8, 9:12, 13:16, 17:20, 21:24 in the example below each contain the numbers 0 0 1 2.
0 1 0 2 2 0 1 0 0 0 2 1 1 2 0 0 2 0 1 0 1 0 0 2
Every column in the permuted matrix should have a corresponding one in the first matrix. In other words, nothing should be altered within a column.
I can't seem to think of an intuitive solution to this - Is there another way of coming up with some form of the initial matrix that both satisfies the constraint and retains the integrity of the columns? Each column represents conditions in an experiment, which is why I'd like them to be balanced.
You can compute the permutations directly in the following manner: First, permute all columns with 0 in the second row among themselves, then all 1s among themselves, and finally all 2s among themselves. This ensures that, for example, any two 0 columns are equally likely to be the first two columns in the resulting permutation of A.
The second step is to permute all columns in blocks of 4: permute columns 1-4 randomly, permute columns 5-8 randomly, etc. Once you do this, you have a matrix that maintains the (0 0 1 2) pattern for every block of 4 columns, but each set of (0 0 1 2) is equally likely to be in any given block of 4, and the (0 0 1 2) are equally likely to be in any order.
A = [1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2
0 0 0 0 1 1 1 1 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2];
%% Find the indices of the zeros and generate a random permutation with that size
zeroes = find(A(2,:)==0);
perm0 = zeroes(randperm(length(zeroes)));
%% Find the indices of the ones and generate a random permutation with that size
wons = find(A(2,:) == 1);
perm1 = wons(randperm(length(wons)));
%% NOTE: the spelling of `zeroes` and `wons` is to prevent overwriting
%% the MATLAB builtin functions `zeros` and `ones`
%% Find the indices of the twos and generate a random permutation with that size
twos = find(A(2,:) == 2);
perm2 = twos(randperm(length(twos)));
%% permute the zeros among themselves, the ones among themselves and the twos among themselves
A(:,zeroes) = A(:,perm0);
A(:,wons) = A(:,perm1);
A(:,twos) = A(:,perm2);
%% finally, permute each block of 4 columns, so that the (0 0 1 2) pattern is preserved, but each column still has an
%% equi-probable chance of being in any position
for i = 1:size(A,2)/4
    perm = randperm(4) + 4*i-4;
    A(:, 4*i-3:4*i) = A(:,perm);
end
Example result:
A =
Columns 1 through 15
1 1 2 2 2 2 1 1 2 2 1 2 2 1 2
0 0 2 1 0 2 0 1 0 2 1 0 1 2 0
0 1 2 2 2 0 1 1 1 1 2 0 0 2 0
Columns 16 through 24
2 1 1 1 1 1 2 2 1
0 2 0 0 1 0 0 1 2
1 1 2 2 0 0 2 1 0
I was able to generate 100000 constrained permutations of A in about 9.32 seconds running MATLAB 2016a, to give you an idea of how long this code takes. There are certainly ways to optimize the permutation selection so you don't have to make quite so many random draws, but I always prefer the simple, straightforward approach until it proves insufficient.
You could use a rejection method: keep trying random permutations, chosen equiprobably, until one satisfies the requirement. This guarantees that all valid permutations have the same probability of being picked.
A = [ 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2
0 0 0 0 1 1 1 1 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2 ]; % data matrix
required = [0 0 1 2]; % restriction
row = 2; % row to which the restriction applies
sorted_req = sort(required(:)); % sort required values
done = false; % initialize
while ~done
    result = A(:, randperm(size(A,2))); % random permutation of columns of A
    test = sort(reshape(result(row,:), numel(required), []), 1); % reshape row
    % into blocks, each block in a column; and sort each block
    done = all(all(bsxfun(@eq, test, sorted_req))); % test if valid
end
Here's an example result:
result =
2 1 1 1 1 2 1 2 1 2 2 1 2 2 1 2 2 2 1 1 1 2 1 2
2 0 0 1 2 1 0 0 0 1 0 2 2 0 1 0 1 2 0 0 2 0 1 0
2 1 2 2 1 2 2 0 1 1 1 2 1 1 0 0 0 0 0 0 0 2 1 2
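A back-of-the-envelope estimate of how many draws the rejection loop needs. The second row has twelve 0s, six 1s and six 2s, and a uniformly random column permutation makes every distinct arrangement of that row equally likely. A valid arrangement needs each of the six blocks to be one of the 4!/2! = 12 orderings of 0 0 1 2, so:

```latex
P(\text{accept})
  = \frac{\left(\tfrac{4!}{2!}\right)^{6}}{\tfrac{24!}{12!\,6!\,6!}}
  = \frac{12^{6}}{\binom{24}{12}\binom{12}{6}}
  = \frac{2\,985\,984}{2\,498\,640\,144}
  \approx 1.2 \times 10^{-3},
\qquad
\mathbb{E}[\text{tries}] = \frac{1}{P(\text{accept})} \approx 837
```

So the loop needs roughly 837 random permutations per accepted sample on average, which is cheap at this matrix size but grows quickly with more blocks.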

How to find all combinations of multiple 2D arrays (matrices), rotation allowed

I have 3 2D arrays (matrices) of 0s and 1s.
For each array, I will rotate it 4 times clockwise and 4 times anticlockwise, then flip it and repeat; for each of these orientations I repeat the same steps for the other arrays, and so on, combining the arrays to build a symmetric shape, a kind of Rubik's cube but with 5 elements on each side. To join 2 arrays, a 1 of Array 1 must fit with a 0 of Array 2.
Here are my 3 arrays:
0 0 1 0 1
1 1 1 1 1
0 1 1 1 0
1 1 1 1 1
0 1 0 1 1
-------------
0 1 0 1 0
0 1 1 1 0
1 1 1 1 1
0 1 1 1 0
0 0 1 0 0
-------------
1 0 1 0 0
1 1 1 1 1
0 1 1 1 0
1 1 1 1 1
0 1 0 1 0
-------------
This problem evolved from a problem I asked earlier, How to solve 5 * 5 Cube in efficient easy way.
My rotation methods are as follows:
rotateLeft()
rotateRight()
flipSide()
for (firstArray) {
    element = single.rotateLeft();
    for (secondArray) {
        element2 = single.rotateLeft();
        if (element.combine(element2)) {
            for (thirdArray) {
            }
        }
    }
}
Currently I have 3 fixed arrays; how can I solve this problem correctly and efficiently?
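A sketch of generating the orientations in Python. Four clockwise rotations already cover every rotation (four anticlockwise ones repeat them), so each piece has at most 8 distinct orientations: 4 rotations of the piece and 4 of its flipped side. Symmetric pieces have fewer; the second array above is left-right symmetric, so it has only 4. The functions below are hypothetical stand-ins for rotateRight() and flipSide(), and the combine check is left out because the fitting rule isn't fully specified.

```python
from itertools import product


def rotate_right(piece):
    """Rotate a square piece 90 degrees clockwise."""
    return tuple(zip(*piece[::-1]))


def flip_side(piece):
    """Mirror a piece left-to-right (turning it over)."""
    return tuple(row[::-1] for row in piece)


def orientations(piece):
    """All distinct orientations: 4 rotations of the piece and of its mirror."""
    piece = tuple(tuple(row) for row in piece)  # hashable, so we can deduplicate
    seen, result = set(), []
    for start in (piece, flip_side(piece)):
        current = start
        for _ in range(4):
            if current not in seen:
                seen.add(current)
                result.append(current)
            current = rotate_right(current)
    return result


def all_placements(pieces):
    """Every combination of one orientation per piece (brute force)."""
    return product(*(orientations(p) for p in pieces))
```

With three pieces that is at most 8 × 8 × 8 = 512 orientation combinations to pass to a combine check, so brute force over orientations is cheap; skipping duplicate orientations of symmetric pieces reduces it further.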

Algorithm for 'Pogo Painter' minigame

I am working on a minigame called 'Pogo Painter', and I need some mathematical solutions. Below is an image (made with Paint) to illustrate a bit what it's all about.
Four players, each a different color, must claim squares to gain points. The minigame will be similar to this: http://www.youtube.com/watch?v=rKCQfAlaRrc, but slightly different. The players will be allowed to run around the playground and claim any of the squares, and points are gathered when a pattern is closed. For example, claiming the blue square at A3 will create a closed blue pattern.
What kind of variables should I declare and how do I check if the pattern is closed?
Please answer if you have a solution :)
Here’s another (Discrete Optimization) way to model your problem.
Notation
View your grid as a ‘graph’ with n^2 nodes and edges of length 1 (edges connect two neighboring nodes). Let the nodes be numbered 1:n^2. (For ease of notation, you can use a coordinate pair (x,y) to denote each node if you prefer.)
Decision Variables
There are four colors, k = 1 through 4, one for each player; k = 0 denotes an unclaimed (white) cell.
X_ik = 1 if player k has claimed node i, and 0 otherwise.
To start out:
X_i0 = 1 for all nodes i.
All nodes start out as white (0).
Neighboring sets: two nodes i and j are ‘neighbors’ if they are adjacent to each other. (Any given node i can have at most 4 neighbors: up, down, left and right.)
Edge variables:
We can now define a new set of edge variables Y_ijk that connect two adjacent nodes (i and j) with a common color k.
Y_ijk = 1 if neighboring nodes i and j are both of color k, and 0 otherwise
(that is, X_ik = X_jk = 1), for non-zero k.
We now have an undirected graph. Checking for ‘closed patterns’ is the same as detecting cycles.
Detecting Cycles:
A simple DFS will do, since the graph is undirected. Start from each colored node i and search for cycles: if a path leads you back to an already-visited node other than the one you just came from, a cycle exists, and you can award points accordingly.
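A sketch of that cycle check in Python. It assumes the board is a list of lists of color indices (0 for unclaimed) with 4-neighbour adjacency, and it only reports whether the cells of one color contain a cycle; measuring the longest cycle for scoring is not attempted. has_closed_pattern is a hypothetical name.

```python
def has_closed_pattern(board, color):
    """True if the cells of `color` contain a cycle (4-neighbour adjacency).

    Iterative DFS: in an undirected graph, reaching an already-visited cell
    that is not the cell we came from means a cycle exists.
    """
    h, w = len(board), len(board[0])
    visited = set()
    for sr in range(h):
        for sc in range(w):
            if board[sr][sc] != color or (sr, sc) in visited:
                continue
            visited.add((sr, sc))
            stack = [((sr, sc), None)]          # (cell, parent cell)
            while stack:
                (r, c), parent = stack.pop()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < h and 0 <= nc < w) or board[nr][nc] != color:
                        continue
                    if (nr, nc) == parent:
                        continue                # the edge we arrived by
                    if (nr, nc) in visited:
                        return True             # back edge: closed pattern
                    visited.add((nr, nc))
                    stack.append(((nr, nc), (r, c)))
    return False
```

A 2-by-2 block of one color is the smallest closed pattern this detects, matching the 4-point example that follows.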
Finally, one suggestion as you design the game: you can award points according to the “longest cycle” you detect. The shortest possible cycle is worth 4 points, one point for each edge (or one point for each node in the cycle), whichever works best for you.
1 1
1 1 scores 4 points
1 1 1
1 1 1 scores 6 points
1 1 1
1 1 1
1 1 scores 8 points
Hope that helps.
Okay,
This is plenty of text, but it's simple.
An N-by-N square will do as the game board.
Each time a player claims a square:
- If the square is not adjacent to any square of that player, give that square a unique ID.
- If the square is adjacent to squares of that player:
  - Count how many neighbours of each ID it has (see the demos I put below to see what this means).
  - For each group of neighbours sharing an ID: patterns_count += group_size - 1
  - If there is more than one group, change the IDs of all those groups, and of every square connected to them, so they all share the same ID.
You must remember which IDs belong to which players.
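The ID-merging scheme above is essentially a union-find (disjoint-set) structure, where each group's root square plays the role of its ID. A minimal sketch in Python, assuming 4-neighbour adjacency and that a square, once claimed, is never taken over by another player (a plain union-find cannot split groups); PogoBoard and claim are hypothetical names.

```python
class PogoBoard:
    """Union-find over claimed squares; each group's root acts as its ID."""

    def __init__(self, n):
        self.n = n
        self.owner = [[None] * n for _ in range(n)]
        self.parent = {}                      # square -> parent square

    def _find(self, sq):
        while self.parent[sq] != sq:
            self.parent[sq] = self.parent[self.parent[sq]]  # path halving
            sq = self.parent[sq]
        return sq

    def claim(self, r, c, player):
        """Claim (r, c) for player; return the number of closed patterns created."""
        self.owner[r][c] = player
        self.parent[(r, c)] = (r, c)          # a fresh, unique ID
        counts = {}                           # neighbouring group ID -> group_size
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < self.n and 0 <= nc < self.n and self.owner[nr][nc] == player:
                root = self._find((nr, nc))
                counts[root] = counts.get(root, 0) + 1
        patterns = sum(size - 1 for size in counts.values())
        for root in counts:                   # merge all neighbouring groups
            self.parent[root] = (r, c)
        return patterns
```

Replaying the first demo below (claiming the seven 1-squares, then the blank square) returns 0 for each setup claim and 1 for the final one; the third demo, with two neighbouring groups, returns 2.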
This is what you have in your example:
1 1 1 0 0 0 0 2 2
1 0 0 0 1 3 3 0 0
1 1 0 0 3 3 0 0 0
0 1 0 0 4 5 0 0 0
0 0 0 6 4 0 0 0 0
7 7 0 0 0 0 8 8 8
0 7 7 0 9 8 8 0 8
A A 7 0 9 8 0 0 8
A 0 7 0 0 0 8 8 8
And this is what it would look like after blue grabs A-3:
1 1 1 0 0 0 0 2 2
1 0 0 0 1 3 3 0 0
1 1 0 0 3 3 0 0 0
0 1 0 0 4 5 0 0 0
0 0 0 6 4 0 0 0 0
7 7 0 0 0 0 8 8 8
0 7 7 0 9 8 8 0 8
A A 7 0 9 8 0 0 8
A 0 7 0 0 8 8 8 8
More examples of the algorithm in use (the square being claimed is shown as _):
1 1 1 0
1 0 1 0
1 1 _ 0
0 0 0 0
2 neighbours: 2x'1'
1x closed pattern
1 1 1 0
1 0 1 0
1 1 1 0
0 0 0 0
--
1 1 1 0 0
1 0 1 0 0
1 1 _ 0 0
1 0 1 0 0
1 1 1 0 0
3 neighbours: 3x'1'
2x closed patterns
1 1 1 0 0
1 0 1 0 0
1 1 1 0 0
1 0 1 0 0
1 1 1 0 0
--
1 1 1 0 0
1 0 1 0 0
1 1 _ 2 2
0 0 2 0 2
0 0 2 2 2
4 neighbours: 2x'1', 2x'2'
2x closed patterns
1 1 1 0 0
1 0 1 0 0
1 1 1 1 1
0 0 1 0 1
0 0 1 1 1
But I would also consider these closed patterns; you haven't described what should and shouldn't count as one.
1 1 0
1 1 0
0 0 0
1 1 1
1 1 1
0 0 0
1 1 1
1 1 1
1 1

rotate vector of arbitrary length circularly about an array about some point x,y in matlab

I have an array:
1 1 1 0 0
1 2 2 0 0
1 2 3 0 0
0 0 0 0 0
0 0 0 0 0
I want to make it
1 1 1 1 1
1 2 2 2 1
1 2 3 2 1
1 2 2 2 1
1 1 1 1 1
It is like rotating a quarter of a pie through 270 degrees to fill out the remaining parts of the pie and make a full circle; essentially, mirroring the corner in all directions. I don't want to use any built-in MATLAB functions if possible, just some vector tricks. Thanks.
EDIT:
This is embedded within a matrix of zeros of arbitrary size. I want it to work both for the above example and for this one:
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 0 0 0 0 0 0 0 0 0
0 0 1 2 2 0 0 0 0 0 0 0 0 0
0 0 1 2 3 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ideally, I want a vector, say [1, 2, 3, ..., N], that can be rotated circularly about the highest value in the array (N), centered at some point (xc, yc) in the grid. Or, if this isn't possible, take a base array [1 1 1; 1 2 2; 1 2 3] and rotate it so that 3 is in the centre and it fills a circle as in the second matrix above.
EDIT:
I found that rot90(M,k) rotates matrix M by k*90 degrees, but summing the rotations produces:
Mrot = M + rot90(M,1) + rot90(M,2) + rot90(M,3)
Mrot =
1 1 2 1 1
1 2 4 2 1
2 4 12 4 2
1 2 4 2 1
1 1 2 1 1
This adds up the overlapping values along the middle row and column (and at the centre), which isn't correct.
Assuming the corner you want to replicate is symmetric about the diagonal (as in your example), you can do this in one indexing step. Given a matrix M containing your sample 5-by-5 matrix, here's how:
>> index = [1 2 3 2 1];
>> M = M(index, index)
M =
1 1 1 1 1
1 2 2 2 1
1 2 3 2 1
1 2 2 2 1
1 1 1 1 1
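The same indexing trick works inside a larger matrix by offsetting the indices to where the corner sits. A Python sketch under the same diagonal-symmetry assumption; mirror_corner and its parameters top, left and k (the corner's size) are hypothetical names, and the mirrored (2k-1)-by-(2k-1) block is assumed to fit inside the grid. For a centre (xc, yc) in 0-based indexing, top = yc - (k - 1) and left = xc - (k - 1).

```python
def mirror_corner(grid, top, left, k):
    """Mirror the k-by-k corner whose top-left cell is (top, left).

    Python analogue of M(index, index) with index = [1 2 3 2 1] (k = 3),
    shifted to the corner's position. Returns a new grid; the input is
    not modified.
    """
    # 0-based version of [1 2 ... k ... 2 1], e.g. [0, 1, 2, 1, 0] for k = 3
    index = list(range(k)) + list(range(k - 2, -1, -1))
    out = [row[:] for row in grid]
    for i, src_r in enumerate(index):
        for j, src_c in enumerate(index):
            out[top + i][left + j] = grid[top + src_r][left + src_c]
    return out
```

For the embedded example, mirror_corner(grid, 1, 2, 3) fills rows 1 to 5 and columns 2 to 6 with the full 5-by-5 pattern centred on the 3.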
