Data structure for keeping track of a game board in OCaml - arrays

I am fairly new to OCaml and I want to implement a game which is similar to four-in-a-line.
What I need is some data structure to keep the game state. The game board is a 4x4 square with a total of 16 tiles.
I am looking for a representation for this in OCaml that will make it easy and fast to retrieve (or do some operation on) all the elements in entire column, row or diagonal.
I will be doing minimax search on this game, which is why speed is important.
So far I have considered a one-dimensional list. The problem with a list is that it's hard to figure out which elements belong to each row/column/diagonal and then retrieve them, for example with a pattern match.
I thought about using Array.make 4 (Array.make 4 Empty);;. This is absolutely perfect when it comes to rows. It's easy to get them and do a pattern match on them. But it is a chore to do pattern matching on individual columns and diagonals.
What I would like to be able to do is have a function that takes a game board and returns a list of lists containing all the rows/columns/diagonals. I would then like to do, for example, match (rows,columns,diagonals) with (Empty, Empty, Empty, Empty) -> something.

Since length is fixed, prefer arrays over lists: they use less memory and are faster to read and write.
I'm afraid you will need to write a function to get the diagonals; there is no simple pattern matching for them.
When you write "do some operation on [a diagonal]", I assume you're thinking about a function f that takes an array of length 4 storing the elements, for instance [|Empty;Empty;Empty;Empty|].
Maybe f could instead take as arguments the position p, and an array of indices inside the position:
f p [|x1,y1; x2,y2; x3,y3; x4,y4|] would extract the squares p.(x1).(y1) ... p.(x4).(y4). Then just pass different x's and y's to make f operate on row/columns/diagonals.
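The index-array idea can be sketched as follows. The sketch is in Python rather than OCaml, purely for illustration; the names LINES, cells_on and has_full_line are hypothetical, and the board is assumed to be a 4x4 list of lists indexed as board[x][y].

```python
# Hypothetical names throughout; a 4x4 board indexed as board[x][y].
N = 4

ROWS = [[(x, y) for y in range(N)] for x in range(N)]
COLUMNS = [[(x, y) for x in range(N)] for y in range(N)]
DIAGONALS = [
    [(i, i) for i in range(N)],          # top-left to bottom-right
    [(i, N - 1 - i) for i in range(N)],  # top-right to bottom-left
]
LINES = ROWS + COLUMNS + DIAGONALS  # 4 + 4 + 2 = 10 index lists


def cells_on(board, line):
    """Return the squares of `board` at the coordinates listed in `line`."""
    return [board[x][y] for x, y in line]


def has_full_line(board, value):
    """True if some row, column or diagonal holds only `value`."""
    return any(
        all(cell == value for cell in cells_on(board, line)) for line in LINES
    )
```

Because the index lists are built once, the same function serves rows, columns and diagonals; only the list passed in changes.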
Once the code is working and you're turning to optimization, you might want to have a look at bitvectors:
if there are a lot of positions stored in the tree of your minimax search, reducing the memory footprint means more cache hits and faster execution. You might even want to encode a position in a single int yourself, but this is tricky work that you don't want to do too early.
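One possible shape for such an encoding, sketched in Python for illustration only. It assumes one 16-bit integer per player and maps square (x, y) to bit 4*x + y; both the layout and the names bit, mask, LINE_MASKS and has_line are assumptions, not something the answer prescribes.

```python
# Assumption: square (x, y) of the 4x4 board maps to bit 4*x + y, and each
# player has one integer whose set bits mark the squares that player occupies.
def bit(x, y):
    return 1 << (4 * x + y)


def mask(coords):
    """Combine the bits of several squares into one mask."""
    m = 0
    for x, y in coords:
        m |= bit(x, y)
    return m


LINE_MASKS = (
    [mask([(x, y) for y in range(4)]) for x in range(4)]    # rows
    + [mask([(x, y) for x in range(4)]) for y in range(4)]  # columns
    + [
        mask([(i, i) for i in range(4)]),      # main diagonal
        mask([(i, 3 - i) for i in range(4)]),  # anti-diagonal
    ]
)


def has_line(player_bits):
    """True if the player occupies every square of at least one line."""
    return any((player_bits & m) == m for m in LINE_MASKS)
```

A line test then costs one AND and one comparison per mask, and a whole position fits in two small integers.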

Sometimes matching doesn't work. Here, I think you should use functions as much as possible. Getting your cells row first or column first then won't be that complex, and you can even move from one representation to the other by reversing the order of the indices.
If I use the following type:
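type color = Red | Yellow;;
type cell = Empty | Color of color;;
type board = cell array array;;
(* Array.make 4 (Array.make 4 Empty) would share one inner array between all four columns *)
let make_board () : board = Array.make_matrix 4 4 Empty;;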
and decide for column first, then the following functions will get me rows or columns:
let column (b: board) i j = b.(i).(j)
let row (b: board) i j = b.(j).(i)
For the diagonals, there are 2 sets of them, one going top-left toward down-right, and the other one in the other direction (top-right to bottom-left):
let ldiag (b: board) i j = b.((i + j) mod 4).(j)
let rdiag (b: board) i j = b.((i - j + 4) mod 4).(j)
Then I guess that checking a row, column, or diagonal is just a matter of checking the 4 cells of that line.
let check predicate linef b k =
  predicate (linef b k 0) &&
  predicate (linef b k 1) &&
  predicate (linef b k 2) &&
  predicate (linef b k 3)
then for instance, checking if there's a diagonal of red:
let has_line linef b color =
  let cmp x = x = Color color in
  let check k = check cmp linef b k in
  check 0 || check 1 || check 2 || check 3
let has_ldiag b color = has_line ldiag b color
let has_rdiag b color = has_line rdiag b color
let has_red_diagonal b = has_ldiag b Red || has_rdiag b Red

How about indexing your tiles with the corresponding coordinate? So, the elements on your one-d list would be of the form:
(int * int * tile ref)
Then you can filter rows / columns / diagonals like this:
Row n: (precondition: 0 <= n, u, v <= 3)
List.filter (fun (u, _, _) -> u = n) tiles;;
Column n: (precondition: 0 <= n, u, v <= 3)
List.filter (fun (_, v, _) -> v = n) tiles;;
Diagonal 1: (precondition: 0 <= u, v <= 3)
List.filter (fun (u, v, _) -> u = v) tiles;;
Diagonal 2: (precondition: 0 <= u, v <= 3)
List.filter (fun (u, v, _) -> u + v = 3) tiles;;
It should also be possible to index the tiles with just one integer (the index of the tile within the one-d list), but that would need some calculations in the filter function (given index, figure out the coordinate and then decide if it belongs to the desired row / column / diagonal).
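The single-integer variant could look roughly like this (sketched in Python for illustration). It assumes that tile k of the flat 16-element list sits at row k // 4 and column k % 4; that layout and the function names are assumptions.

```python
# Assumption: tile k of a flat 16-element list sits at row k // 4, column k % 4.
def row(tiles, n):
    return [t for k, t in enumerate(tiles) if k // 4 == n]


def column(tiles, n):
    return [t for k, t in enumerate(tiles) if k % 4 == n]


def diagonal1(tiles):
    # Squares where the row index equals the column index.
    return [t for k, t in enumerate(tiles) if k // 4 == k % 4]


def diagonal2(tiles):
    # Squares where the row and column indices sum to 3.
    return [t for k, t in enumerate(tiles) if k // 4 + k % 4 == 3]
```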


Julia / Cellular Automata: efficient way to get neighborhood

I'd like to implement a cellular automaton (CA) in Julia. Dimensions should be wrapped, this means: the left neighbor of the leftmost cell is the rightmost cell etc.
One crucial question is: how to get the neighbors of one cell to compute its state in the next generation? As dimensions should be wrapped and Julia does not allow negative indices (as in Python), I had this idea:
Considered a 1D CA, one generation is a one-dimensional array:
0 0 1 0 0
What if we create a two dimensional Array, where the first row is shifted right and the third is shifted left, like this:
0 0 0 1 0
0 0 1 0 0
0 1 0 0 0
Now, the first column contains the states of the first cell and its neighbors, etc.
I think this can easily be generalized to two or more dimensions.
First question: do you think this is a good idea, or is this a wrong track?
EDIT: Answer to first question was no, second Question and code example discarded.
Second question: If the approach is basically ok, please have a look at the following sketch:
EDIT: Other approach, here is a stripped down version of a 1D CA, using mod1() for getting neighborhood-indices, as Bogumił Kamiński suggested.
for any cell:
- A array of all indices
- B array of all neighborhood states
- C states converted to one integer
- D lookup next state
function digits2int(digits, base=10)
    int = 0
    for digit in digits
        int = int * base + digit
    end
    return int
end
gen = [0,0,0,0,0,1,0,0,0,0,0]
rule = [0,1,1,1,1,0,0,0]
function nextgen(gen, rule)
    values = [mod1.(x .+ [-1,0,1], size(gen)) for x in 1:length(gen)] # A
    values = [gen[value] for value in values] # B
    values = [digits2int(value, 2) for value in values] # C
    values = [rule[value+1] for value in values] # D
    return values
end
for _ in 1:100
    global gen
    gen = nextgen(gen, rule)
end
Next step should be to extend it to two dimensions, will try it now...
The way I typically do it is to use mod1 function for wrapped indexing.
With this approach, whatever the dimensionality of your array a, moving from position x by a delta dx only requires writing mod1(x+dx, size(a, 1)), where x is an index along the first dimension of the array.
Here is a simple example of a random walk on a 2D torus counting the number of times a given cell was visited (here I additionally use broadcasting to handle all dimensions in one expression):
function randomwalk()
    a = zeros(Int, 8, 8)
    pos = (1,1)
    for _ in 1:10^6
        # Von Neumann neighborhood
        dpos = rand(((1,0), (-1,0), (0,1), (0,-1)))
        pos = mod1.(pos .+ dpos, size(a))
        a[pos...] += 1
    end
    return a
end
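For comparison, here is the same wrapped-index idea applied to one step of a 1D automaton, sketched in Python. With 0-based indexing the plain % operator plays the role of mod1. The function name next_gen is hypothetical, and the rule table is assumed to be indexed by left*4 + center*2 + right, as in the question's sketch.

```python
def next_gen(gen, rule):
    """One step of a 1D cellular automaton with wrapped (periodic) edges.

    `rule[k]` is the next state for the neighbourhood whose
    left/center/right bits form the number k.
    """
    n = len(gen)
    new = []
    for x in range(n):
        left = gen[(x - 1) % n]   # wraps to the last cell when x == 0
        center = gen[x]
        right = gen[(x + 1) % n]  # wraps to the first cell when x == n - 1
        new.append(rule[left * 4 + center * 2 + right])
    return new
```

The modulo does all the wrapping, so no shifted copies of the generation are needed.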
Usually, if the CA has cells that are only dependent on the cells next to them, it's simpler just to "wrap" the vector by adding the last element to the front and the first element to the back, doing the simulation, and then "unwrap" by taking the first and last elements away again to get the result length the same as the starting array length. For the 1-D case:
const lines = 10
const start = ".........#........."
const rules = [90, 30, 14]
rule2poss(rule) = [rule & (1 << (i - 1)) != 0 for i in 1:8]
cells2bools(cells) = [cells[i] == '#' for i in 1:length(cells)]
bools2cells(bset) = prod([bset[i] ? "#" : "." for i in 1:length(bset)])
function transform(bset, ruleposs)
    newbset = map(x->ruleposs[x],
        [bset[i + 1] * 4 + bset[i] * 2 + bset[i - 1] + 1
        for i in 2:length(bset)-1])
    vcat(newbset[end], newbset, newbset[1])
end
const startset = cells2bools(start)
for rul in rules
    println("\nUsing Rule $rul:")
    bset = vcat(startset[end], startset, startset[1]) # wrap ends
    rp = rule2poss(rul)
    for _ in 1:lines
        println(bools2cells(bset[2:end-1])) # unwrap ends
        bset = transform(bset, rp)
    end
end
As long as only the adjacent cells are used in the simulation for any given cell, this is correct.
If you extend this to a 2D matrix, you would also "wrap" the first and last rows as well as the first and last columns, and so forth.
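A minimal sketch of that 2D wrapping step, in Python and assuming the grid is a list of equal-length lists (wrap_pad is a hypothetical name):

```python
def wrap_pad(grid):
    """Surround `grid` with one layer of cells copied from the opposite edges."""
    # Wrap rows: last row goes on top, first row goes at the bottom.
    rows = [grid[-1]] + grid + [grid[0]]
    # Wrap columns of every row, including the two rows just added,
    # so the corner cells come from the diagonally opposite corners.
    return [[r[-1]] + r + [r[0]] for r in rows]
```

After the simulation step, dropping the first and last row and column restores the original size.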

Maximize number of inversion count in array

We are given an unsorted array A of integers (duplicates allowed) with size N possibly large. We can count the number of pairs with indices i < j, for which A[i] < A[j], let's call this X.
We can change at most one element of the array, with a cost equal to the absolute difference between the old and new values (for instance, if we replace the element at index k with the new number K, the cost Y is | A[k] - K |).
We can only replace this element with other elements found in the array.
We want to find the minimum possible value of X + Y.
Some examples:
[1,2,2] should return 1 (change the 1 to 2 such that the array becomes [2,2,2])
[2,2,3] should return 1 (change the 3 to 2)
[2,1,1] should return 0 (because no changes are necessary)
[1,2,3,4] should return 6 (this is already the minimum possible value)
[4,4,5,5] should return 3 (this can be accomplished by changing the first 4 into a 5 or the last 5 into a 4)
The number of pairs can be found with a naive O(n²) solution, here in Python:
def calc_x(arr):
    n = len(arr)
    cnt = 0
    for i in range(n):
        for j in range(i+1, n):
            if arr[j] > arr[i]:
                cnt += 1
    return cnt
A brute-force solution is easily written as for example:
def f(arr):
    best_val = calc_x(arr)
    used = set(arr)
    for i, v in enumerate(arr):
        for replacement in used:
            if replacement == v:
                continue  # replacing a value with itself changes nothing
            arr2 = arr[:i] + [replacement] + arr[i+1:]
            y = abs(replacement - v)
            x = calc_x(arr2)
            best_val = min(best_val, x + y)
    return best_val
We can count for each element the number of items right of it larger than itself in O(n*log(n)) using for instance an AVL-tree or some variation on merge sort.
However, we still have to search which element to change and what improvement it can achieve.
This was given as an interview question and I would like some hints or insights as how to solve this problem efficiently (data structures or algorithm).
Definitely go for a O(n log n) complexity when counting inversions.
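One way to get the O(n log n) count is a Fenwick (binary indexed) tree over the ranks of the values. This is a sketch of one such technique, not necessarily the one the answer had in mind; it is in Python to match the question, and the function name is hypothetical.

```python
def count_increasing_pairs(arr):
    """Count pairs i < j with arr[i] < arr[j] using a Fenwick tree."""
    # Compress values to ranks 1..size so they can index the tree.
    ranks = {v: r for r, v in enumerate(sorted(set(arr)), start=1)}
    size = len(ranks)
    tree = [0] * (size + 1)
    total = 0
    for value in arr:
        # Query: how many earlier elements have a strictly smaller rank?
        r = ranks[value] - 1
        while r > 0:
            total += tree[r]
            r -= r & -r
        # Update: record the current element.
        r = ranks[value]
        while r <= size:
            tree[r] += 1
            r += r & -r
    return total
```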
We can see that when you change a value at index k, you can either:
1) increase it, and then possibly reduce the number of inversions with elements bigger than A[k], but increase the number of inversions with elements smaller than A[k]
2) decrease it (the opposite thing happens)
Let's try not to count x every time you change a value. What do you need to know?
In case 1):
You have to know how many elements on the left are smaller than your new value v and how many elements on the right are bigger than your value. You can pretty easily check that in O (n). So what is your x now? You can count it with the following formula:
prev_val - your previous value
prev_x - x that you've counted at the beginning of your program
prev_l - number of elements on the left smaller than prev_val
prev_r - number of elements on the right bigger than prev_val
v - new value
l - number of elements on the left smaller than v
r - number of elements on the right bigger than v
new_x = prev_x + r + l - prev_l - prev_r
In the second case you pretty much do the opposite thing.
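The update formula can be sketched in Python as follows. The names count_pairs and best_cost are hypothetical, and the counts are computed with plain O(n) scans, so this only illustrates the formula and is not an optimised solution.

```python
def count_pairs(arr):
    """Naive count of pairs i < j with arr[i] < arr[j]."""
    n = len(arr)
    return sum(1 for i in range(n) for j in range(i + 1, n) if arr[i] < arr[j])


def best_cost(arr):
    """Minimum of X + Y when at most one element is replaced."""
    prev_x = count_pairs(arr)
    best = prev_x  # changing nothing costs Y = 0
    values = set(arr)
    for k, prev_val in enumerate(arr):
        left, right = arr[:k], arr[k + 1:]
        prev_l = sum(1 for e in left if e < prev_val)
        prev_r = sum(1 for e in right if e > prev_val)
        for v in values:
            l = sum(1 for e in left if e < v)   # smaller elements on the left
            r = sum(1 for e in right if e > v)  # bigger elements on the right
            new_x = prev_x + r + l - prev_l - prev_r
            best = min(best, new_x + abs(v - prev_val))
    return best
```

Only the pairs that involve index k change, which is why the full count never has to be recomputed.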
Right now this gives something like O(n^3) instead of O(n^3 log n), which is probably still bad. Unfortunately that's all I have come up with so far. I'll definitely tell you if I come up with something better.
EDIT: What about a memory limit? Is there one? If not, you can, for each element in the array, build two sets holding the elements before and after it. Then you can find the number of smaller/bigger elements in O(log n), making the time complexity O(n^2 log n).
EDIT 2: We can also check, for every possible value v, which element would be the best one to change to v. You can then keep two sets and add/erase elements from them while scanning the array, making the time complexity O(n^2 log n) without using too much space. So the algorithm would be:
1) determine every value v that an element can be changed to, and calculate x
2) for each possible value v:
    make two sets, and push all elements into the second one
    for each element e in the array:
        add the previous element (if there is one) to the first set and erase element e from the second set, then count the number of bigger/smaller elements in sets 1 and 2 and calculate the new x
EDIT 3: Instead of making two sets, you could use a prefix sum for each value. That's already O(n^2), but I think we can do even better than this.

Substitute a vector value with two values in MATLAB

I have to create a function that takes as input a vector v and three scalars a, b and c. The function replaces every element of v that is equal to a with a two element array [b,c].
For example, given v = [1,2,3,4] and a = 2, b = 5, c = 5, the output would be:
out = [1,5,5,3,4]
My first attempt was to try this:
v = [1,2,3,4];
v(2) = [5,5];
However, I get an error, so I do not understand how to put two values in the place of one in a vector, i.e. shift all the following values one position to the right so that the two new values fit in the vector and, therefore, the size of the vector increases by one. In addition, if there are several values of a in v, I'm not sure how to replace them all at once.
How can I do this in MATLAB?
Here's a solution using cell arrays:
% remember the indices where a occurs
ind = (v == a);
% split array such that each element of a cell array contains one element
v = mat2cell(v, 1, ones(1, numel(v)));
% replace appropriate cells with two-element array
v(ind) = {[b c]};
% concatenate
v = cell2mat(v);
Like rayryeng's solution, it can replace multiple occurrences of a.
The problem mentioned by siliconwafer, that the array changes size, is solved here by temporarily keeping the partial arrays in the cells of a cell array. Converting back to an array concatenates these parts.
Something I would do is to first find the values of v that are equal to a which we will call ind. Then, create a new output vector that has the output size equal to numel(v) + numel(ind), as we are replacing each value of a that is in v with an additional value, then use indexing to place our new values in.
Assuming that you have created a row vector v, do the following:
%// Find all locations that are equal to a
ind = find(v == a);
%// Allocate output vector
out = zeros(1, numel(v) + numel(ind));
%// Determine locations in output vector that we need to
%// modify to place the value b in
indx = ind + (0:numel(ind)-1);
%// Determine locations in output vector that we need to
%// modify to place the value c in
indy = indx + 1;
%// Place values of b and c into the output
out(indx) = b;
out(indy) = c;
%// Get the rest of the values in v that are not equal to a
%// and place them in their corresponding spots.
rest = true(1,numel(out));
rest([indx,indy]) = false;
out(rest) = v(v ~= a);
The indx and indy statements are rather tricky, but certainly not hard to understand. For each index in v that is equal to a, we need to shift the vector over by 1. The first value requires that we shift the vector over to the right by 1, then the next value requires that we shift to the right by 1 with respect to the previous shift, which means that we actually need to take the second index and shift to the right by 2, as this is with respect to the original index.
The next value requires that we shift to the right by 1 with respect to the second shift, or shifting to the right by 3 with respect to the original index and so on. These shifts define where we're going to place b. To place c, we simply take the indices generated for placing b and move them over to the right by 1.
What's left is to populate the output vector with those values that are not equal to a. We simply define a logical mask where the indices used to populate the output array have their locations set to false while the rest are set to true. We use this to index into the output and find those locations that are not equal to a to complete the assignment.
v = [1,2,3,4,5,4,4,5];
a = 4;
b = 10;
c = 11;
Using the above code, we get:
out =
1 2 3 10 11 5 10 11 10 11 5
This successfully replaces every value that is 4 in v with the tuple of [10,11].
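The same index bookkeeping, sketched in Python with 0-based indices to make the shifts explicit. The function name replace_with_pair is hypothetical; the variables mirror ind, indx and indy from the MATLAB code.

```python
def replace_with_pair(v, a, b, c):
    """Replace every element of `v` equal to `a` with the two values b, c."""
    ind = [i for i, x in enumerate(v) if x == a]       # where a occurs in v
    out = [None] * (len(v) + len(ind))                 # one extra slot per match
    indx = [i + shift for shift, i in enumerate(ind)]  # output slots for b
    indy = [i + 1 for i in indx]                       # output slots for c
    for i in indx:
        out[i] = b
    for i in indy:
        out[i] = c
    used = set(indx) | set(indy)
    rest = (x for x in v if x != a)                    # remaining values, in order
    for pos in range(len(out)):
        if pos not in used:
            out[pos] = next(rest)
    return out
```

The k-th match (counting from zero) has k earlier matches before it, each of which added one element, hence the shift by k.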
I think that strrep deserves a mention here.
Although it's called string replacement and warns for non-char input, it still works perfectly fine for other numbers as well (including integers, doubles and even complex numbers).
v = [1,2,3,4]
a = 2, b = 5, c = 5
out = strrep(v, a, [b c])
Warning: Inputs must be character arrays or cell arrays of strings.
out =
1 5 5 3 4
You are not attempting to overwrite an existing value in the vector. You're attempting to change the size of the vector (meaning the number of rows or columns in the vector) because you're adding an element. This will always result in the vector being reallocated in memory.
Create a new vector from the parts of v before and after the element being replaced.
Let's say your index is stored in the variable index.
index = 2;
newValues = [5, 5];
x = [ v(1:index-1), newValues, v(index+1:end) ]
x =
1 5 5 3 4

Optimize parameters of a pairwise distance function in Matlab

This question is related to matlab: find the index of common values at the same entry from two arrays.
Suppose that I have a 1000 by 10000 matrix that contains the values 0, 1 and 2. Each row is treated as a sample. I want to calculate the pairwise distance between those samples according to the formula d = 1-1/(2p)sum(a/c+b/d), where a, b, c, d can be treated as row vectors of length 10000 according to some definition, and p=10000. c and d are probabilities such that c+d=1.
An example of how to find the values of a,b,c,d: suppose we want to find d between sample i and j, then I look at row i and j.
If kth entry of row i and j has value 2 and 2, then a=2,b=0,c=1,d=0 (I guess I will assign 0/0=0 in this case).
If kth entry of row i and j has value 2 and 1 or vice versa, then a=1,b=0,c=3/4,d=1/4.
The similar assignment will give to the case for 2,0(a=0,b=0,c=1/2,d=1/2),1,1(a=1,b=1,c=1/2,d=1/2),1,0(a=0,b=1,c=1/4,d=3/4),0,0(a=0,b=2,c=0,d=1).
The matlab code I have so far is using for loops for i and j, then find the cases above by using find, then create two arrays for a/c and b/d. This is extremely slow, is there a way that I can improve the efficiency?
Edit: the distance d is the formula given in this paper on page 13.
Provided those coefficients are fixed, then I think I've successfully vectorised the distance function. Figuring out the formulae was fun. I flipped things around a bit to minimise division, and since I wasn't aware of pdist until @horchler's comment, you get it wrapped in loops with the constants factored out:
% m is the data
[n, p] = size(m);
distance = zeros(n);
for ii=1:n
    for jj=ii+1:n
        a = min(m(ii,:), m(jj,:));
        b = 2 - max(m(ii,:), m(jj,:));
        c = 4 ./ (m(ii,:) + m(jj,:));     % reciprocal of the question's c
        c(c == Inf) = 0;                  % treat 0/0 as 0
        d = 4 ./ (4 - m(ii,:) - m(jj,:)); % reciprocal of the question's d = 1 - c
        d(d == Inf) = 0;                  % treat 0/0 as 0
        distance(ii,jj) = sum(a.*c + b.*d);
        % distance(jj,ii) = distance(ii,jj); % optional for the full matrix
    end
end
distance = 1 - (1 / (2 * p)) * distance;
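The closed forms behind the vectorised code (a = min, b = 2 - max, c = (x + y)/4, d = 1 - c) can be checked against the table in the question. This Python sketch uses exact fractions; the names TABLE, closed_form, ratio and distance are hypothetical, and 0/0 is taken to be 0 as the question suggests.

```python
from fractions import Fraction

# (a, b, c, d) for each pair of entries, copied from the question.
TABLE = {
    (2, 2): (2, 0, Fraction(1), Fraction(0)),
    (2, 1): (1, 0, Fraction(3, 4), Fraction(1, 4)),
    (2, 0): (0, 0, Fraction(1, 2), Fraction(1, 2)),
    (1, 1): (1, 1, Fraction(1, 2), Fraction(1, 2)),
    (1, 0): (0, 1, Fraction(1, 4), Fraction(3, 4)),
    (0, 0): (0, 2, Fraction(0), Fraction(1)),
}


def closed_form(x, y):
    """Return (a, b, c, d) for one pair of entries x, y in {0, 1, 2}."""
    c = Fraction(x + y, 4)
    return (min(x, y), 2 - max(x, y), c, 1 - c)


def ratio(num, den):
    """num / den, with anything divided by zero taken as 0."""
    return Fraction(0) if den == 0 else num / den


def distance(row_i, row_j):
    """d = 1 - 1/(2p) * sum(a/c + b/d) over the p entries of two rows."""
    p = len(row_i)
    total = Fraction(0)
    for x, y in zip(row_i, row_j):
        a, b, c, d = closed_form(x, y)
        total += ratio(a, c) + ratio(b, d)
    return 1 - total / (2 * p)
```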

Creating sets of similar elements in a 2D array

I am trying to solve a problem that is based on a 2D array. This array contains different kinds of elements (from a total of 3 possible kinds). Let's call the kinds X, Y and Z.
The array appears to be something like this. Note that it would always be completely filled. The diagram is for illustration.
7 | | | | | | |
6 | | | | | | |
5 | | | | | | |
4 | |X|Z|Y|X| |
3 | |Y|X|Y|Y|X|
2 |Y|Y|X|Z|Z|X|
1 |X|X|Y| |X|X|
0 | | | |Z| | |
0 1 2 3 4 5
I am trying to create sets of elements that are placed adjacent to each other. For example, set1 may consist of elements of type X located at: (0,1), (1,1), (2,2), (2,3), (1,4). Similarly, set2 may consist of elements of type Y located at: (3,4), (3,3), (4,3).
Problem: Given any point in the array, it must be capable of adding all elements to the appropriate set and ensuring that there are no two sets that contain the same element. Note that a set is only created if more than 2 adjacent elements of the same kind are encountered.
Moreover, if a certain subset of elements is removed, more elements are added to replace the removed ones. The array must then be re-iterated over to make new sets or modify the existing ones.
Solution: I implemented a recursive solution such that it would iterate over all the adjacent elements of, for example, element X (0,1). Then, while iterating over the 8 possible adjacent elements, it would call itself recursively whenever a type X occurred.
This kind of solution is too much brute-force and inefficient, especially in the case where some elements are replaced with new ones of possibly different types. In such a case, almost the whole array has to be re-iterated to make/modify sets and ensuring that no same element exists in more than one set.
Is there any algorithm to deal efficiently with this kind of problem? I need help with some ideas/suggestions or pseudo codes.
[EDIT 5/8/2013: Fixed time complexity. (O(a(n)) is essentially constant time!)]
In the following, by "connected component" I mean the set of all positions that are reachable from each other by a path that allows only horizontal, vertical or diagonal moves between neighbouring positions having the same kind of element. E.g. your example {(0,1), (1,1), (2,2), (2,3), (1,4)} is a connected component in your example input. Each position belongs to exactly one connected component.
We will build a union/find data structure that will be used to give every position (x, y) a numeric "label" having the property that if and only if any two positions (x, y) and (x', y') belong to the same component then they have the same label. In particular this data structure supports three operations:
set(x, y, i) will set the label for position (x, y) to i.
find(x, y) will return the label assigned to the position (x, y).
union(Z), for some set of labels Z, will combine all labels in Z into a single label k, in the sense that future calls to find(x, y) on any position (x, y) that previously had a label in Z will now return k. (In general k will be one of the labels already in Z, though this is not actually important.) union(Z) also returns the new "master" label, k.
If there are n = width * height positions in total, this can be done in O(n*a(n)) time, where a() is the extremely slow-growing inverse Ackermann function. For all practical input sizes, this is the same as O(n).
Notice that whenever two vertices are adjacent to each other, there are four possible cases:
One is above the other (connected by a vertical edge)
One is to the left of the other (connected by a horizontal edge)
One is above and to the left of the other (connected by a \ diagonal edge)
One is above and to the right of the other (connected by a / diagonal edge)
We can use the following pass to determine labels for each position (x, y):
Set nextLabel to 0.
For each row y in increasing order:
    For each column x in increasing order:
        Examine the W, NW, N and NE neighbours of (x, y). Let Z be the subset of these 4 neighbours that are of the same kind as (x, y).
        If Z is the empty set, then we tentatively suppose that (x, y) starts a brand new component, so call set(x, y, nextLabel) and increment nextLabel.
        Otherwise, call find(Z[i]) on each element of Z to find their labels, and call union() on this set of labels to combine them together. Assign the new label (the result of this union() call) to k, and then also call set(x, y, k) to add (x, y) to this component.
After this, calling find(x, y) on any position (x, y) effectively tells you which component it belongs to. If you want to be able to quickly answer queries of the form "Which positions belong to the connected component containing position (x, y)?" then create a hashtable of lists posInComp and make a second pass over the input array, appending each (x, y) to the list posInComp[find(x, y)]. This can all be done in linear time and space. Now to answer a query for some given position (x, y), simply call lab = find(x, y) to find that position's label, and then list the positions in posInComp[lab].
To deal with "too-small" components, just look at the size of posInComp[lab]. If it's 1 or 2, then (x, y) does not belong to any "large-enough" component.
Finally, all this work effectively takes linear time, so it will be lightning fast unless your input array is huge. So it's perfectly reasonable to recompute it from scratch after modifying the input array.
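The labelling pass described above could be sketched like this in Python. The grid is assumed to be a list of equal-length rows (strings work), and the names label_components and large_components are hypothetical. The union-find here uses path halving only, without union by rank, so it is a simplified variant of the structure the answer describes.

```python
def label_components(grid):
    """Group same-kind cells that touch horizontally, vertically or diagonally.

    Returns a dict mapping each component's master label to its (x, y) cells.
    """
    height, width = len(grid), len(grid[0])
    parent = []  # parent[i] == i means label i is a master label

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(labels):
        roots = {find(i) for i in labels}
        k = min(roots)
        for r in roots:
            parent[r] = k
        return k

    label = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # The W, NW, N and NE neighbours have already been labelled.
            candidates = ((x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1))
            z = [
                label[ny][nx]
                for nx, ny in candidates
                if 0 <= nx < width and 0 <= ny < height
                and grid[ny][nx] == grid[y][x]
            ]
            if z:
                label[y][x] = union(z)
            else:
                parent.append(len(parent))  # start a brand new component
                label[y][x] = len(parent) - 1

    pos_in_comp = {}
    for y in range(height):
        for x in range(width):
            pos_in_comp.setdefault(find(label[y][x]), []).append((x, y))
    return pos_in_comp


def large_components(grid, min_size=3):
    """Keep only components with more than 2 cells."""
    return [c for c in label_components(grid).values() if len(c) >= min_size]
```

Calling large_components again after the array changes recomputes everything from scratch, which is the approach the answer recommends.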
In your situation, I would rely, at least, on two different arrays:
Array1 (sets) -> all the sets and the associated list of points. Main indices: set names.
Array2 (setsDef) -> type of each set ("X", "Y" or "Z"). Main indices: type names.
It might be possible to create more supporting arrays like, for example, one including the minimum/maximum X/Y values for each set to speed up the analysis (although it would be pretty quick anyway, as shown below).
You do not mention any programming language, but I include some sample (C#) code because it is the best way to explain the point. Please don't take it as a suggestion of the best way to proceed (personally, I don't like Dictionaries/Lists too much, although I think they do provide a good graphical way to show an algorithm, even for inexperienced C# users). This code is only intended to show a data storage/retrieval approach; the best way to achieve optimal performance depends on the target language and further issues (e.g., dataset size) and is something you have to take care of.
Dictionary<string, List<Point>> sets = new Dictionary<string, List<Point>>(); //All sets and the associated list of points
Dictionary<string, List<string>> setsDef = new Dictionary<string, List<string>>(); //Array indicating the type of information stored in each set (X or Y)
List<Point> temp0 = new List<Point>();
temp0.Add(new Point(0, 0));
temp0.Add(new Point(0, 1));
sets.Add("Set1", temp0);
List<string> tempX = new List<string>();
tempX.Add("Set1"); //Assumed: Set1 holds X elements
temp0 = new List<Point>();
temp0.Add(new Point(0, 2));
temp0.Add(new Point(1, 2));
sets.Add("Set2", temp0);
List<string> tempY = new List<string>();
tempY.Add("Set2"); //Assumed: Set2 holds Y elements
setsDef.Add("X", tempX);
setsDef.Add("Y", tempY);
//-------- TEST
//I have a new Y value which is 2,2
Point targetPoint = new Point(2, 2);
string targetSet = "Y";
//I go through all the Y sets
List<string> targetSets = setsDef[targetSet];
bool alreadyThere = false;
Point candidatePoint;
string foundSet = "";
foreach (string set in targetSets) //Going through all the set names stored in setsDef for targetSet
{
    List<Point> curPoints = sets[set];
    foreach (Point point in curPoints) //Going through all the points in the given set
    {
        if (point == targetPoint)
        {
            //Already-stored point and thus the analysis will be stopped
            alreadyThere = true;
            break;
        }
        else if (isSurroundingPoint(point, targetPoint))
        {
            //A close point was found and thus the set where the targetPoint has to be stored
            candidatePoint = point;
            foundSet = set;
            break;
        }
    }
    if (alreadyThere || foundSet != "")
    {
        break;
    }
}
if (!alreadyThere)
{
    if (foundSet != "")
    {
        //Point added to an existing set
        List<Point> curPoints = sets[foundSet];
        curPoints.Add(targetPoint);
        sets[foundSet] = curPoints;
    }
    else
    {
        //A new set has to be created
        string newName = "New Set";
        temp0 = new List<Point>();
        temp0.Add(targetPoint);
        sets.Add(newName, temp0);
        targetSets.Add(newName);
        setsDef[targetSet] = targetSets;
    }
}
Where isSurroundingPoint is a function checking whether both points are close one to the other:
private bool isSurroundingPoint(Point point1, Point point2)
{
    bool isSurrounding = false;
    if (point1.X == point2.X || point1.X == point2.X + 1 || point1.X == point2.X - 1)
    {
        if (point1.Y == point2.Y || point1.Y == point2.Y + 1 || point1.Y == point2.Y - 1)
        {
            isSurrounding = true;
        }
    }
    return isSurrounding;
}
You may want to check out region growing algorithms, which are used for image segmentation. These algorithms start from a seed pixel and grow a contiguous region where all the pixels in the region have some property.
In your case, adjacent 'pixels' are in the same image segment if they have the same label (i.e. the same kind of element: X, Y or Z).
I wrote something to find objects of just one type for another SO question. The example below adds two more types. Any re-iteration would examine the whole list again. The idea is to process the list of points for each type separately. The function solve groups any connected points and removes them from the list before enumerating the next group. areConnected checks the relationship between the points' coordinates since we are only testing points of one type. In this generalized version, the types (a b c) could be anything (strings, numbers, tuples, etc.), as long as they match.
btw - here's a link to a JavaScript example of j_random_hacker's terrific algorithm:
Haskell code:
import Data.List (elemIndices, delete)
example = ["xxyyyz"
objects a b c ws = [("X",solve xs []),("Y",solve ys []),("Z",solve zs [])] where
  mapIndexes s =
    concatMap (\(y,xs)-> map (\x->(y,x)) xs) $ zip [0..] (map (elemIndices s) ws)
  [xs,ys,zs] = map mapIndexes [a,b,c]
  areConnected (y,x) (y',x') = abs (x-x') < 2 && abs (y-y') < 2
  solve [] r = r
  solve (x:xs) r =
    let r' = solve' xs [x]
    in solve (foldr delete xs r') (if null (drop 2 r') then r else r':r)
  solve' vs r =
    let ys = filter (\y -> any (areConnected y) r) vs
    in if null ys then r else solve' (foldr delete vs ys) (ys ++ r)
Sample output:
*Main> objects 'x' 'y' 'z' example
(0.02 secs, 1560072 bytes)
