Creating sets of similar elements in a 2D array

I am trying to solve a problem that is based on a 2D array. This array contains different kinds of elements (from a total of 3 possible kinds). Let's call the kinds X, Y and Z.
The array appears to be something like this. Note that it would always be completely filled. The diagram is for illustration.
7 | | | | | | |
6 | | | | | | |
5 | | | | | | |
4 | |X|Z|Y|X| |
3 | |Y|X|Y|Y|X|
2 |Y|Y|X|Z|Z|X|
1 |X|X|Y| |X|X|
0 | | | |Z| | |
0 1 2 3 4 5
I am trying to create sets of elements that are placed adjacent to each other. For example, set1 may consist of elements of type X located at: (0,1), (1,1), (2,2), (2,3), (1,4). Similarly, set2 may consist of elements of type Y located at: (3,4), (3,3), (4,3).
Problem: Given any point in the array, it must be capable of adding all elements to the appropriate set and ensuring that there are no two sets that contain the same element. Note that a set is only created if more than 2 adjacent elements of the same kind are encountered.
Moreover, if a certain subset of elements is removed, more elements are added to replace the removed ones. The array must then be re-iterated over to make new sets or modify the existing ones.
Solution: I implemented a recursive solution such that it would iterate over all the adjacent elements of, for example, element X (0,1). Then, while iterating over the 8 possible adjacent elements, it would call itself recursively whenever a type X occurred.
This kind of solution is too brute-force and inefficient, especially in the case where some elements are replaced with new ones of possibly different types. In such a case, almost the whole array has to be re-iterated to make/modify sets and to ensure that no element exists in more than one set.
Is there any algorithm to deal efficiently with this kind of problem? I need help with some ideas/suggestions or pseudocode.

[EDIT 5/8/2013: Fixed time complexity. (O(a(n)) is essentially constant time!)]
In the following, by "connected component" I mean the set of all positions that are reachable from each other by a path that allows only horizontal, vertical or diagonal moves between neighbouring positions having the same kind of element. E.g. your example {(0,1), (1,1), (2,2), (2,3), (1,4)} is a connected component in your example input. Each position belongs to exactly one connected component.
We will build a union/find data structure that will be used to give every position (x, y) a numeric "label" having the property that if and only if any two positions (x, y) and (x', y') belong to the same component then they have the same label. In particular this data structure supports three operations:
set(x, y, i) will set the label for position (x, y) to i.
find(x, y) will return the label assigned to the position (x, y).
union(Z), for some set of labels Z, will combine all labels in Z into a single label k, in the sense that future calls to find(x, y) on any position (x, y) that previously had a label in Z will now return k. (In general k will be one of the labels already in Z, though this is not actually important.) union(Z) also returns the new "master" label, k.
If there are n = width * height positions in total, this can be done in O(n*a(n)) time, where a() is the extremely slow-growing inverse Ackermann function. For all practical input sizes, this is the same as O(n).
Notice that whenever two vertices are adjacent to each other, there are four possible cases:
One is above the other (connected by a vertical edge)
One is to the left of the other (connected by a horizontal edge)
One is above and to the left of the other (connected by a \ diagonal edge)
One is above and to the right of the other (connected by a / diagonal edge)
We can use the following pass to determine labels for each position (x, y):
Set nextLabel to 0.
For each row y in increasing order:
    For each column x in increasing order:
        Examine the W, NW, N and NE neighbours of (x, y). Let Z be the subset of these 4 neighbours that are of the same kind as (x, y).
        If Z is the empty set, then we tentatively suppose that (x, y) starts a brand new component, so call set(x, y, nextLabel) and increment nextLabel.
        Otherwise, call find(Z[i]) on each element of Z to find their labels, and call union() on this set of labels to combine them together. Assign the new label (the result of this union() call) to k, and then also call set(x, y, k) to add (x, y) to this component.
After this, calling find(x, y) on any position (x, y) effectively tells you which component it belongs to. If you want to be able to quickly answer queries of the form "Which positions belong to the connected component containing position (x, y)?" then create a hashtable of lists posInComp and make a second pass over the input array, appending each (x, y) to the list posInComp[find(x, y)]. This can all be done in linear time and space. Now to answer a query for some given position (x, y), simply call lab = find(x, y) to find that position's label, and then list the positions in posInComp[lab].
To deal with "too-small" components, just look at the size of posInComp[lab]. If it's 1 or 2, then (x, y) does not belong to any "large-enough" component.
Finally, all this work effectively takes linear time, so it will be lightning fast unless your input array is huge. So it's perfectly reasonable to recompute it from scratch after modifying the input array.
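As a rough illustration of the pass described above, here is a minimal Python sketch. It assumes the grid is a list of equal-length rows indexed as grid[y][x], and the names label_components, min_size and pos_in_comp are my own choices, not anything from the question. For brevity it uses path halving only (no union by rank), so it is a simplified union/find rather than the fully optimised one.

```python
def label_components(grid, min_size=3):
    """Group 8-connected cells of the same kind.

    Returns a list of components (lists of (x, y)) with at least min_size cells.
    """
    height, width = len(grid), len(grid[0])
    parent = []  # parent[i] is the parent of label i in the union/find forest

    def find(i):
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    labels = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # W, NW, N and NE neighbours were already visited in this scan order.
            roots = set()
            for dx, dy in ((-1, 0), (-1, -1), (0, -1), (1, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height and grid[ny][nx] == grid[y][x]:
                    roots.add(find(labels[ny][nx]))
            if not roots:
                # Tentatively start a brand new component.
                parent.append(len(parent))
                labels[y][x] = len(parent) - 1
            else:
                # Merge all neighbouring labels into one master label k.
                k = min(roots)
                for r in roots:
                    parent[r] = k
                labels[y][x] = k

    # Second pass: collect the positions of every component.
    pos_in_comp = {}
    for y in range(height):
        for x in range(width):
            pos_in_comp.setdefault(find(labels[y][x]), []).append((x, y))
    return [cells for cells in pos_in_comp.values() if len(cells) >= min_size]
```

Calling label_components(["XXY", "YXY", "ZZX"]) keeps only the X component, since the Y and Z groups have fewer than 3 cells.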

In your situation, I would rely, at least, on two different arrays:
Array1 (sets) -> all the sets and the associated list of points. Main indices: set names.
Array2 (setsDef) -> type of each set ("X", "Y" or "Z"). Main indices: type names.
It might be possible to create more supporting arrays like, for example, one including the minimum/maximum X/Y values for each set to speed up the analysis (although it would be pretty quick anyway, as shown below).
You do not mention any programming language, but I include sample C# code because it is the best way to explain the point. Please don't take it as a suggestion of the best way to proceed (personally, I don't like Dictionaries/Lists too much, although I think they provide a good, visual way to show an algorithm, even for inexperienced C# users). This code is only intended to show a data storage/retrieval approach; the best way to achieve optimal performance depends on the target language and on further issues (e.g., dataset size), and is something you will have to take care of.
Dictionary<string, List<Point>> sets = new Dictionary<string, List<Point>>(); //All sets and the associated list of points
Dictionary<string, List<string>> setsDef = new Dictionary<string, List<string>>(); //Array indicating the type of information stored in each set (X or Y)
List<Point> temp0 = new List<Point>();
temp0.Add(new Point(0, 0));
temp0.Add(new Point(0, 1));
sets.Add("Set1", temp0);
List<String> tempX = new List<string>();
tempX.Add("Set1");
temp0 = new List<Point>();
temp0.Add(new Point(0, 2));
temp0.Add(new Point(1, 2));
sets.Add("Set2", temp0);
List<String> tempY = new List<string>();
tempY.Add("Set2");
setsDef.Add("X", tempX);
setsDef.Add("Y", tempY);
//-------- TEST
//I have a new Y value which is 2,2
Point targetPoint = new Point(2, 2);
string targetSet = "Y";
//I go through all the Y sets
List<string> targetSets = setsDef[targetSet];
bool alreadyThere = false;
Point candidatePoint;
string foundSet = "";
foreach (string set in targetSets) //Going through all the set names stored in setsDef for targetSet
{
List<Point> curPoints = sets[set];
foreach (Point point in curPoints) //Going through all the points in the given set
{
if (point == targetPoint)
{
//Already-stored point and thus the analysis will be stopped
alreadyThere = true;
break;
}
else if (isSurroundingPoint(point, targetPoint))
{
//A close point was found and thus the set where the targetPoint has to be stored
candidatePoint = point;
foundSet = set;
break;
}
}
if (alreadyThere || foundSet != "")
{
break;
}
}
if (!alreadyThere)
{
if (foundSet != "")
{
//Point added to an existing set
List<Point> curPoints = sets[foundSet];
curPoints.Add(targetPoint);
sets[foundSet] = curPoints;
}
else
{
//A new set has to be created
string newName = "New Set";
temp0 = new List<Point>();
temp0.Add(targetPoint);
sets.Add(newName, temp0);
targetSets.Add(newName);
setsDef[targetSet] = targetSets;
}
}
Where isSurroundingPoint is a function checking whether both points are close one to the other:
private bool isSurroundingPoint(Point point1, Point point2)
{
bool isSurrounding = false;
if (point1.X == point2.X || point1.X == point2.X + 1 || point1.X == point2.X - 1)
{
if (point1.Y == point2.Y || point1.Y == point2.Y + 1 || point1.Y == point2.Y - 1)
{
isSurrounding = true;
}
}
return isSurrounding;
}

You may want to check out region growing algorithms, which are used for image segmentation. These algorithms start from a seed pixel and grow a contiguous region where all the pixels in the region have some property.
In your case adjacent 'pixels' are in the same image segment if they have the same label (ie, kind of element X, Y or Z)
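A minimal sketch of region growing in Python, assuming the grid is a list of rows indexed as grid[y][x] and that the 8 surrounding cells count as adjacent (as in the question). The function name grow_region is just an illustrative choice.

```python
from collections import deque


def grow_region(grid, seed):
    """Return the set of (x, y) cells connected to seed that share its kind."""
    height, width = len(grid), len(grid[0])
    sx, sy = seed
    kind = grid[sy][sx]
    region = {seed}
    queue = deque([seed])
    while queue:
        x, y = queue.popleft()
        # Visit the 8 surrounding cells; the cell itself is already in region.
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if (0 <= nx < width and 0 <= ny < height
                        and (nx, ny) not in region
                        and grid[ny][nx] == kind):
                    region.add((nx, ny))
                    queue.append((nx, ny))
    return region
```

After replacing some elements, only the regions touching the changed cells need to be regrown from new seeds.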

I wrote something to find objects of just one type for another SO question. The example below adds two more types. Any re-iteration would examine the whole list again. The idea is to process the list of points for each type separately. The function solve groups any connected points and removes them from the list before enumerating the next group. areConnected checks the relationship between the points' coordinates since we are only testing points of one type. In this generalized version, the types (a b c) could be anything (strings, numbers, tuples, etc.), as long as they match.
btw - here's a link to a JavaScript example of j_random_hacker's terrific algorithm: http://jsfiddle.net/groovy/fP5kP/
Haskell code:
import Data.List (elemIndices, delete)
example = ["xxyyyz"
          ,"xyyzzz"
          ,"yxxzzy"
          ,"yyxzxy"
          ,"xyzxyy"
          ,"xzxxzz"
          ,"xyzyyz"
          ,"xyzxyy"]
objects a b c ws = [("X",solve xs []),("Y",solve ys []),("Z",solve zs [])] where
  mapIndexes s =
    concatMap (\(y,xs)-> map (\x->(y,x)) xs) $ zip [0..] (map (elemIndices s) ws)
  [xs,ys,zs] = map mapIndexes [a,b,c]
  areConnected (y,x) (y',x') = abs (x-x') < 2 && abs (y-y') < 2
  solve [] r = r
  solve (x:xs) r =
    let r' = solve' xs [x]
    in solve (foldr delete xs r') (if null (drop 2 r') then r else r':r)
  solve' vs r =
    let ys = filter (\y -> any (areConnected y) r) vs
    in if null ys then r else solve' (foldr delete vs ys) (ys ++ r)
Sample output:
*Main> objects 'x' 'y' 'z' example
[("X",[[(7,0),(6,0),(5,0),(4,0)]
,[(3,4),(5,2),(5,3),(4,3),(2,2),(3,2),(2,1),(0,1),(1,0),(0,0)]])
,("Y",[[(7,5),(6,4),(7,4),(6,3)],[(4,4),(4,5),(3,5),(2,5)]
,[(4,1),(3,0),(3,1),(0,4),(2,0),(0,3),(1,1),(1,2),(0,2)]])
,("Z",[[(5,5),(6,5),(5,4)]
,[(7,2),(6,2),(5,1),(4,2),(3,3),(1,3),(2,3),(2,4),(1,4),(1,5),(0,5)]])]
(0.02 secs, 1560072 bytes)

Related

FiPy: Can I directly change faceVariables depending on neighboring cells?

I am working with a biological model of the distribution of microbial biomass (b1) on a 2D grid. From the biomass a protein (p1) is produced. The biomass diffuses over the grid, while the protein does not. Only if a certain amount of protein is produced (p > p_lim), the biomass is supposed to diffuse.
I try to implement this by using a dummy cell variable z multiplied with the diffusion coefficient and setting it from 0 to 1 only in cells where p > p_lim.
The condition works fine and when the critical amount of p is reached in a cell, z is set to 1, and diffusion happens. However, the diffusion still does not work at the rate I would like, because the face variable, not the value of the cell itself, is used to calculate diffusion. The faces of z are always a mean of the cell with z=1 and its neighboring cells with z=0. I, however, would like the diffusion to work at its original rate even if the neighbouring cell is still at p < p_lim.
So, my question is: Can I somehow access a faceVariable and change it? For example, set a face to 1 if any neighboring cell has reached p1 > p_lim? I guess this is not a proper mathematical thing to do, but I couldn't think of another way to simulate this problem.
I will show a very reduced form of my model below. In any case, I thank you very much for your time!
from fipy import CellVariable, DiffusionTerm, Grid2D, ImplicitSourceTerm, TransientTerm, Viewer
##### produce mesh
nx= 5.
ny= nx
dx = 1.
dy = dx
L = nx*dx
mesh = Grid2D(nx=nx,ny=ny,dx=dx,dy=dy)
#parameters
h1 = 0.5 # production rate of p
Db = 10. # diffusion coeff of b
p_lim=0.1
# cell variables
z = CellVariable(name="z",mesh=mesh,value=0.)
b1 = CellVariable(name="b1",mesh=mesh,hasOld=True,value=0.)
p1= CellVariable(name="p1",mesh=mesh,hasOld=True,value=0.)
# equations
eqb1 = (TransientTerm(var=b1)== DiffusionTerm(var=b1,coeff=Db*z.arithmeticFaceValue)-ImplicitSourceTerm(var=b1,coeff=h1))
eqp1 = (TransientTerm(var=p1)==ImplicitSourceTerm(var=b1,coeff=h1))
# set b1 to 10. in the center of the grid
x, y = mesh.cellCenters
b1.setValue(10.,where=((x>2.)&(x<3.)&(y>2.)&(y<3.)))
vi=Viewer(vars=(b1,p1),FIPY_VIEWER="matplotlib")
eq = eqb1 & eqp1
from builtins import range
for t in range(10):
    b1.updateOld()
    p1.updateOld()
    z.setValue(z + 0.1,where=((p1>=p_lim) & (z < 1.)))
    eq.solve(dt=0.1)
    vi.plot()
In addition to .arithmeticFaceValue, FiPy provides other interpolators between cell and face values, such as .harmonicFaceValue and .minmodFaceValue.
These properties are implemented using subclasses of _CellToFaceVariable, specifically _ArithmeticCellToFaceVariable, _HarmonicCellToFaceVariable, and _MinmodCellToFaceVariable.
You can also make a custom interpolator by subclassing _CellToFaceVariable. Two such examples are _LevelSetDiffusionVariable and ScharfetterGummelFaceVariable (neither is well documented, I'm afraid).
You need to override the _calc_() method to provide your custom calculation. This method takes three arguments:
alpha: an array of the ratio (0-1) of the distance from the face to the cell on one side, relative to the distance from the cell on the other side to the cell on the first side
id1: an array of indices of the cells on one side of the face
id2: an array of indices of the cells on the other side of the face
Note: You can ignore any "if inline.doInline:" clause and look at the _calc_() method defined under the else: clause.
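To make the idea of a custom interpolator a little more concrete, here is a plain-Python sketch of the arithmetic such an override might perform for the "any neighbour" rule from the question. It deliberately does not import FiPy: the function name face_value_any_neighbour and the explicit cell_values argument are illustrative assumptions, and in a real subclass the cell values would come from the wrapped variable while id1 and id2 would be index arrays rather than lists.

```python
def face_value_any_neighbour(cell_values, id1, id2):
    """For every face, take the larger of the two adjacent cell values.

    cell_values: value of z in each cell (hypothetical plain list)
    id1, id2:    indices of the cells on either side of each face
    """
    return [max(cell_values[a], cell_values[b]) for a, b in zip(id1, id2)]


# Three cells in a row with two interior faces: 0|1 and 1|2.
z_cells = [0.0, 1.0, 0.0]
faces = face_value_any_neighbour(z_cells, id1=[0, 1], id2=[1, 2])
# Both faces touch the cell with z = 1, so both get the value 1.0
# instead of the arithmetic mean 0.5.
```

With this rule, a face is "switched on" as soon as either adjacent cell is, which is the behaviour asked for in the question.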

Julia / Cellular Automata: efficient way to get neighborhood

I'd like to implement a cellular automaton (CA) in Julia. Dimensions should be wrapped, this means: the left neighbor of the leftmost cell is the rightmost cell etc.
One crucial question is: how to get the neighbors of one cell to compute its state in the next generation? As dimensions should be wrapped and Julia does not allow negative indices (as Python does), I had this idea:
Considered a 1D CA, one generation is a one-dimensional array:
0 0 1 0 0
What if we create a two dimensional Array, where the first row is shifted right and the third is shifted left, like this:
0 0 0 1 0
0 0 1 0 0
0 1 0 0 0
Now, the first column contains the states of the first cell and its neighbors, etc.
I think this can easily be generalized to two or more dimensions.
First question: do you think this is a good idea, or is this a wrong track?
EDIT: Answer to first question was no, second Question and code example discarded.
Second question: If the approach is basically ok, please have a look at the following sketch:
EDIT: Other approach, here is a stripped down version of a 1D CA, using mod1() for getting neighborhood-indices, as Bogumił Kamiński suggested.
for any cell:
- A array of all indices
- B array of all neighborhood states
- C states converted to one integer
- D lookup next state
function digits2int(digits, base=10)
int = 0
for digit in digits
int = int * base + digit
end
return int
end
gen = [0,0,0,0,0,1,0,0,0,0,0]
rule = [0,1,1,1,1,0,0,0]
function nextgen(gen, rule)
values = [mod1.(x .+ [-1,0,1], size(gen)) for x in 1:length(gen)] # A
values = [gen[value] for value in values] # B
values = [digits2int(value, 2) for value in values] # C
values = [rule[value+1] for value in values] # D
return values
end
for _ in 1:100
global gen
println(gen)
gen = nextgen(gen, rule)
end
Next step should be to extend it to two dimensions, will try it now...
The way I typically do it is to use the mod1 function for wrapped indexing.
With this approach, whatever the dimensionality of your array a, moving from position x by an offset dx along the first dimension only requires writing mod1(x+dx, size(a, 1)).
Here is a simple example of a random walk on a 2D torus counting the number of times a given cell was visited (here I additionally use broadcasting to handle all dimensions in one expression):
function randomwalk()
a = zeros(Int, 8, 8)
pos = (1,1)
for _ in 1:10^6
# Von Neumann neighborhood
dpos = rand(((1,0), (-1,0), (0,1), (0,-1)))
pos = mod1.(pos .+ dpos, size(a))
a[pos...] += 1
end
a
end
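For comparison, here is the same wrap-around indexing sketched in Python, where indices are 0-based and the % operator plays the role of mod1. The function name next_generation is an illustrative choice, and the rule is assumed to be given as a standard Wolfram rule number (0-255).

```python
def next_generation(cells, rule):
    """Apply an elementary CA rule once, with wrap-around neighbours."""
    n = len(cells)
    out = []
    for i in range(n):
        # (i - 1) % n and (i + 1) % n wrap around the ends of the list.
        left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        pattern = left * 4 + centre * 2 + right
        # Bit number `pattern` of the rule gives the next state.
        out.append((rule >> pattern) & 1)
    return out
```

For example, next_generation([1, 0, 0, 0, 0], 90) switches on both the second and the last cell, showing that the first cell is treated as the right neighbour of the last one.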
Usually, if the CA has cells that are only dependent on the cells next to them, it's simpler just to "wrap" the vector by adding the last element to the front and the first element to the back, doing the simulation, and then "unwrap" by taking the first and last elements away again to get the result length the same as the starting array length. For the 1-D case:
const lines = 10
const start = ".........#........."
const rules = [90, 30, 14]
rule2poss(rule) = [rule & (1 << (i - 1)) != 0 for i in 1:8]
cells2bools(cells) = [cells[i] == '#' for i in 1:length(cells)]
bools2cells(bset) = prod([bset[i] ? "#" : "." for i in 1:length(bset)])
function transform(bset, ruleposs)
newbset = map(x->ruleposs[x],
[bset[i + 1] * 4 + bset[i] * 2 + bset[i - 1] + 1
for i in 2:length(bset)-1])
vcat(newbset[end], newbset, newbset[1])
end
const startset = cells2bools(start)
for rul in rules
println("\nUsing Rule $rul:")
bset = vcat(startset[end], startset, startset[1]) # wrap ends
rp = rule2poss(rul)
for _ in 1:lines
println(bools2cells(bset[2:end-1])) # unwrap ends
bset = transform(bset, rp)
end
end
As long as only the adjacent cells are used in the simulation for any given cell, this is correct.
If you extend this to a 2D matrix, you would also "wrap" the first and last rows as well as the first and last columns, and so forth.
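A minimal Python sketch of that 2D wrapping step, assuming the grid is a list of rows (lists); wrap_pad is a hypothetical helper name. Each row first gets its last element prepended and its first element appended, and then the last and first padded rows are added at the top and bottom.

```python
def wrap_pad(grid):
    """Return a copy of grid with one wrapped row/column added on every side."""
    rows = [row[-1:] + row + row[:1] for row in grid]
    return [rows[-1]] + rows + [rows[0]]


padded = wrap_pad([[1, 2],
                   [3, 4]])
# padded[1:-1] with each row's [1:-1] slice gives back the original grid.
```

After one simulation step on the interior cells, the padding would be stripped and rebuilt, just as in the 1-D example.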

Connecting random points in MATLAB without intersecting lines

I need help with solving this problem. I have randomly generated points (example in Picture #1) and I want to connect them with lines (example in Picture #2). The lines must not intersect, and after connection the connected points should form an irregular area.
%Generating random points
xn = randi([3 7],1,10);
yn = randi([3 6],1,10);
%Generated points
xn = [6,3,7,7,6,6,6,4,6,3];
yn = [5,3,4,3,3,6,5,4,6,3];
Picture #1:
Result should be like this:
Picture #2:
Any idea how to solve this?
I suppose for the general case it can be very difficult to come up with a solution. But, assuming your points are scattered "nicely" there is quite a simple solution.
Sort your points according to the angle, measured from the x axis, of the vector connecting the center of the point cloud to each point:
P = [xn;yn]; %// group the points as columns in a matrix
c = mean(P,2); %// center point relative to which you compute the angles
d = bsxfun(@minus, P, c ); %// vectors connecting the central point and the dots
th = atan2(d(2,:),d(1,:)); %// angle above x axis
[st si] = sort(th);
sP = P(:,si); %// sorting the points
And that's about it. To plot the result:
sP = [sP sP(:,1)]; %// add the first point again to close the polygon
figure;plot( sP(1,:), sP(2,:), 'x-');axis([0 10 0 10]);
This algorithm will fail if several points have the same angle w.r.t. the center of the point cloud.
An example with 20 random points:
P = rand(2,50);
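The angle-sorting idea can also be sketched in a few lines of Python. This is only an illustration under my own assumptions: points are (x, y) tuples, the name order_by_angle is made up, and, unlike the MATLAB snippet, duplicate points are dropped before the centroid is computed.

```python
import math


def order_by_angle(points):
    """Sort distinct points counter-clockwise around their centroid."""
    unique = list(dict.fromkeys(points))  # drop duplicates, keep first occurrence
    cx = sum(x for x, _ in unique) / len(unique)
    cy = sum(y for _, y in unique) / len(unique)
    # atan2 gives the angle of the vector from the centroid to each point.
    return sorted(unique, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
```

Joining consecutive points of the result, and the last one back to the first, gives the polygon outline. The same caveat applies: points sharing an angle with respect to the centroid are not ordered reliably.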
You could adapt the code from another answer I gave for generating random simple polygons of an arbitrary number of sides. The difference here is you already have your set of points chosen and thus implicitly the number of sides you want (i.e. the same as the number of unique points). Here's what the code would look like:
xn = [6,3,7,7,6,6,6,4,6,3]; % Sample x points
yn = [5,3,4,3,3,6,5,4,6,3]; % Sample y points
[~, index] = unique([xn.' yn.'], 'rows', 'stable'); % Get the unique pairs of points
x = xn(index).';
y = yn(index).';
numSides = numel(index);
dt = DelaunayTri(x, y);
boundaryEdges = freeBoundary(dt);
numEdges = size(boundaryEdges, 1);
while numEdges ~= numSides
if numEdges > numSides
triIndex = vertexAttachments(dt, boundaryEdges(:,1));
triIndex = triIndex(randperm(numel(triIndex)));
keep = (cellfun('size', triIndex, 2) ~= 1);
end
if (numEdges < numSides) || all(keep)
triIndex = edgeAttachments(dt, boundaryEdges);
triIndex = triIndex(randperm(numel(triIndex)));
triPoints = dt([triIndex{:}], :);
keep = all(ismember(triPoints, boundaryEdges(:,1)), 2);
end
if all(keep)
warning('Couldn''t achieve desired number of sides!');
break
end
triPoints = dt.Triangulation;
triPoints(triIndex{find(~keep, 1)}, :) = [];
dt = TriRep(triPoints, x, y);
boundaryEdges = freeBoundary(dt);
numEdges = size(boundaryEdges, 1);
end
boundaryEdges = [boundaryEdges(:,1); boundaryEdges(1,1)];
x = dt.X(boundaryEdges, 1);
y = dt.X(boundaryEdges, 2);
And here's the resulting polygon:
patch(x,y,'w');
hold on;
plot(x,y,'r*');
axis([0 10 0 10]);
Two things to note:
Some sets of points (like the ones you chose here) will not have a unique solution. Notice how my code connected the top 4 points in a slightly different way than you did.
I made use of the TriRep and DelaunayTri classes, both of which may be removed in future MATLAB releases in favor of the delaunayTriangulation class.

Data structure for keeping track of a game board in OCaml

I am fairly new to OCaml and I want to implement a game which is similar to four-in-a-line.
What I need is some data structure to keep the game state. The game board is a 4x4 square with a total of 16 tiles.
I am looking for a representation for this in OCaml that will make it easy and fast to retrieve (or do some operation on) all the elements in entire column, row or diagonal.
I will be doing minimax search on this game, which is why speed is important.
So far I have considered a one-dimensional list. The problem with a list is that it's hard to figure out what elements belong to each row/column/diagonal, and then retrieve them with a List.map for example.
I thought about using Array.make 4 (Array.make 4 Empty);;. This is absolutely perfect when it comes to rows. It's easy to get them and do a pattern match on them. But it is a chore to do pattern matching on individual columns and diagonals.
What I would like to be able to do is have a function that takes a game board and returns a list of lists containing all the rows/columns/diagonals. I would then like to do, for example, match (rows,columns,diagonals) with (Empty, Empty, Empty, Empty) -> something.
Since length is fixed, prefer arrays over lists: they use less memory and are faster to read and write.
I'm afraid you will need to write a function to get diagonals, there is no simple pattern matching.
When you write "do some operation on [a diagonal]", I assume you're thinking about a function f that takes an array of length 4 storing the elements, for instance [|Empty;Empty;Empty;Empty|].
Maybe f could instead take as arguments the position p, and an array of indices inside the position:
f p [|x1,y1; x2,y2; x3,y3; x4,y4|] would extract the squares p.(x1).(y1) ... p.(x4).(y4). Then just pass different x's and y's to make f operate on row/columns/diagonals.
Once the code is working and you're turning to optimization, you might want to have a look at bitvectors:
if there are a lot of positions stored in the tree of your minimax search, reducing the memory footprint means more cache hits and faster execution. You might even want to encode a position in a single int yourself, but this is tricky work that you don't want to do too early.
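To give an idea of what the bit-vector encoding could look like, here is a small Python sketch (the logic carries over directly to OCaml's land and lsl operators). It assumes one integer per player, with bit r*4+c set when that player occupies row r, column c, and it only checks the two main diagonals; the names line_masks, MASKS and has_line are illustrative.

```python
def line_masks(size=4):
    """Bit masks for every row, every column and both main diagonals."""
    def bit(r, c):
        return 1 << (r * size + c)

    masks = []
    for i in range(size):
        masks.append(sum(bit(i, c) for c in range(size)))  # row i
        masks.append(sum(bit(r, i) for r in range(size)))  # column i
    masks.append(sum(bit(i, i) for i in range(size)))             # main diagonal
    masks.append(sum(bit(i, size - 1 - i) for i in range(size)))  # anti-diagonal
    return masks


MASKS = line_masks()


def has_line(player_bits):
    """True if the player's occupied squares cover a complete line."""
    return any((player_bits & m) == m for m in MASKS)
```

A whole position then fits in two small integers, and testing a line is a single AND plus a comparison.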
Sometimes matching doesn't work. Here, I think you should try to use functions as much as possible, and then getting your cells row first or column first won't be that complex, and you could even move from one representation to the other by reversing the indices order.
If I use the following type:
type color = Red | Yellow;;
type cell = Empty | Color of color;;
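type board = cell array array;;
(* a fresh 4x4 board; make_matrix gives every row its own array *)
let empty_board () : board = Array.make_matrix 4 4 Empty;;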
and decide for column first, then the following functions will get me rows or columns:
let column (b: board) i j = b.(i).(j)
let row (b: board) i j = b.(j).(i)
For the diagonals, there are 2 sets of them, one going top-left toward down-right, and the other one in the other direction (top-right to bottom-left):
let ldiag (b: board) i j = b.((i + j) mod 4).(j)
let rdiag (b: board) i j = b.((i - j + 4) mod 4).(j)
Then I guess that checking a row, column, or diagonal is just a matter of checking the 4 cells of that line.
let check predicate linef b k = predicate (linef b k 0) &&
                                predicate (linef b k 1) &&
                                predicate (linef b k 2) &&
                                predicate (linef b k 3)
then for instance, checking if there's a diagonal of red:
let has_line linef b color =
  let cmp x = (x = Color color) in
  let check k = check cmp linef b k in
  check 0 || check 1 || check 2 || check 3
let has_ldiag b color = has_line ldiag b color
let has_rdiag b color = has_line rdiag b color
let has_red_diagonal b = has_ldiag b Red || has_rdiag b Red
Etc.
How about indexing your tiles with the corresponding coordinate? So, the elements on your one-d list would be of the form:
(int * int * tile ref)
Then you can filter rows / columns / diagonals like this:
Row n: (precondition: 0 <= n, u, v <= 3)
List.filter (fun (u, _, _) -> u = n) tiles;;
Column n: (precondition: 0 <= n, u, v <= 3)
List.filter (fun (_, v, _) -> v = n) tiles;;
Diagonal 1: (precondition: 0 <= u, v <= 3)
List.filter (fun (u, v, _) -> u = v) tiles;;
Diagonal 2: (precondition: 0 <= u, v <= 3)
List.filter (fun (u, v, _) -> u + v = 3) tiles;;
It should also be possible to index the tiles with just one integer (the index of the tile within the one-d list), but that would need some calculations in the filter function (given index, figure out the coordinate and then decide if it belongs to the desired row / column / diagonal).

Algorithm for maintaining an "ordering string" for ordering database elements

I have a database in which I'd like to store an arbitrary ordering for a particular element. The database in question doesn't support ordered sets, so I have to do this myself.
One way to do this would be to store a float value for the element's position, and then take the average of the position of the surrounding elements when inserting a new one:
Item A - Position 1
Item B - Position 1.5 (just inserted).
Item C - Position 2
Now, for various reasons I don't wish to use floats, I'd like to use strings instead. For example:
Item A - Position a
Item B - Position aa (just inserted).
Item C - Position b
I'd like to keep these strings as short as possible since they will never be "tidied up".
Can anyone suggest an algorithm for generating such strings as efficiently and compactly as possible?
Thanks,
Tim
It would be reasonable to assign position 'am' or 'an' to Item B and use binary division steps for further insertions.
This resembles a base-26 number system, where the symbols 'a'..'z' correspond to 0..25.
a b //0 1
a an b //insert after a - middle letter of alphabet
a an au b //insert after an
a an ar au b //insert after an again (middle of an, au)
a an ap ar au b //insert after an again
a an ao ap ar au b //insert after an again
a an ann ao... //insert after an; there is no more room after an, so we have to use a 3-symbol label
....
a an anb... //to insert after an, we treat it as ana
a an anan anb // it looks like 0 0.5 0.505 0.51
Pseudocode for binary tree structure:
function InsertAndGetStringKey(Root, Element): string
  if Root = nil then
    return Middle('a', 'z') //'n'
  if Element > Root then
    if Root.Right = nil then
      return Middle(Root.StringKey, 'z')
    else
      return InsertAndGetStringKey(Root.Right, Element)
  if Element < Root then
    if Root.Left = nil then
      return Middle(Root.StringKey, 'a')
    else
      return InsertAndGetStringKey(Root.Left, Element)
Middle(x, y):
  //equalize length of strings like (an, anf)-> (ana, anf)
  L = Length(x) - Length(y)
  if L < 0 then
    x = x + StringOf('a', -L) //x + 'aaaaa...' -L times
  else if L > 0 then
    y = y + StringOf('a', L)
  if Abs(LastSymbol(x) - LastSymbol(y)) = 1 then
    return(Min(x, y) + 'n') // (anf, ang) - > anfn
  else
    return(Min(x, y) + (LastSymbol(x) + LastSymbol(y))/2) // (nf, ni)-> ng
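For readers who want something runnable, here is a small Python sketch of the same "middle string" idea. It is my own variant rather than a transcription of the pseudocode above: the name key_between is made up, keys are lowercase a-z strings compared with plain string comparison, hi=None means "no upper bound", and hi is assumed not to end in 'a' (keys produced by this function never do, so the assumption holds once the initial keys satisfy it).

```python
def key_between(lo, hi=None):
    """Return a key k with lo < k < hi under ordinary string comparison.

    Assumes lo < hi and that hi does not end in 'a'.
    """
    digits = "abcdefghijklmnopqrstuvwxyz"
    out = []
    i = 0
    bounded = hi is not None
    while True:
        # Treat lo as padded with 'a' (0) and an absent upper bound as 26.
        lo_d = digits.index(lo[i]) if i < len(lo) else 0
        hi_d = digits.index(hi[i]) if bounded and i < len(hi) else 26
        if hi_d - lo_d >= 2:
            # Room for a symbol strictly between the two: finish here.
            out.append(digits[(lo_d + hi_d) // 2])
            return "".join(out)
        out.append(digits[lo_d])
        if hi_d > lo_d:
            # This prefix is already below hi, so only lo constrains the rest.
            bounded = False
        i += 1
```

With this sketch, key_between("a", "b") gives "an" and key_between("an", "anb") gives "anan". Other midpoints may differ from the hand-worked table above, because the rounding choices are not identical.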
As stated the problem has no solution. Once an algorithm has generated strings 'a' and 'aa' for adjacent elements there is no string which can be inserted between them. This is a fatal problem for the approach. This problem is independent of the alphabet used for the strings: replace 'a' by 'the first letter in the alphabet used' if you wish.
Of course, it can be worked around by changing the ordering string for other elements when this impasse is reached, but that seems to be beyond what OP wants.
I think that the problem is equivalent to finding an integer to represent the order of an element and finding that, say, 35 and 36 are already used to order existing elements. There is simply no integer between 35 and 36, no matter how hard you look.
Use real numbers, or a computer approximation such as floating-point numbers, or rationals.
EDIT in response to OP's comment
Just adapt the algorithm for adding 2 rationals: (a/b)+(c/d) = (ad+cb)/bd. Divide the sum by 2, i.e. take (ad+cb)/(2bd) (rounding if you want or need), and you have a rational midway between the first two.
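As a quick check of the rational-midpoint idea, Python's standard fractions module does the arithmetic exactly; the helper name midpoint is just an illustrative choice.

```python
from fractions import Fraction


def midpoint(p, q):
    """Exact rational halfway between p and q: (a/b + c/d) / 2 = (ad + cb) / (2bd)."""
    return (p + q) / 2


between_35_and_36 = midpoint(Fraction(35), Fraction(36))
between_half_and_two_thirds = midpoint(Fraction(1, 2), Fraction(2, 3))
```

Storing the numerator and denominator as two integer columns would avoid floats, at the cost of denominators that double with every insertion into the same gap.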
Are capitals an option?
If so, I would use them to insert between otherwise adjacent values.
For instance to insert between
a
aa
You could do:
a
aAaa <--- this capital indicates that there is one more place between adjacent small values, i.e. a[Aa]a
aAba
aAca
aBaa
aBba
aa
Now if you need to insert between a and aAaa
You could do
a
aAAaaa <--- 2 caps. tells there are two more places between adjacent small values i.e. a[AAaa]a
aAAaba
aAAaca
...
aAAbaa
aAaa
In terms of being compact or efficient I make no claims...
