String expansion in OpenCL - C

I have a simple task of expanding the string FX according to the following rules:
X -> X+YF+
Y -> -FX-Y
In OpenCL, string manipulation is not supported, but the use of an array of characters is. What would a kernel program that expands this string in parallel look like in OpenCL?
More details:
Consider the expansion of 'FX' in the python code below.
axiom = "FX"
def expand(s):
    switch = {
        "X": "X+YF+",
        "Y": "-FX-Y",
    }
    return switch.get(s, s)

def expand_once(string):
    return [expand(c) for c in string]

def expand_n(s, n):
    for i in range(n):
        s = ''.join(expand_once(s))
    return s
expanded = expand_n(axiom, 200)
The result expanded is the axiom 'FX' expanded 200 times. This is a rather slow process, hence the need to do it in OpenCL for parallelization.
This process results in a long string (an array of characters) which I will then use to draw a dragon curve.
Below is an example of how I would come up with such a dragon curve. This part is not of much importance; the expansion in OpenCL is the crucial part.
import turtles
from PIL import Image

turtles.setposition(5000, 5000)
turtles.left(90)  # Go up to start.
for c in expanded:
    if c == "F":
        turtles.forward(10)
    elif c == "-":
        turtles.left(90)
    elif c == "+":
        turtles.right(90)

# Write out the image.
im = Image.fromarray(turtles.canvas)
im.save("dragon_curve.jpg")

Recursive algorithms like this don't lend themselves well to GPU acceleration, especially as the data set changes size on each iteration.
If you do really need to do this iteratively, the challenge is for each work-item to know where in the output string to place its result. One way to do this would be to assign work groups a specific substring of the input, and on every iteration, keep count of the total number of Xs and Ys in each workgroup-sized substring of the output. From this you can calculate how much that substring will expand in one iteration, and if you accumulate those values, you'll know the offset of the output of each substring expansion. Whether this is efficient is another question. :-)
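For illustration, here is a minimal Python simulation of that bookkeeping: the chunking, per-chunk size counting, and exclusive prefix sum a kernel would have to perform so each chunk knows its write offset. This is a sketch only; chunk stands in for the work-group size, and the names are mine, not from any OpenCL API.
RULES = {"X": "X+YF+", "Y": "-FX-Y"}

def expand_parallel_sim(s, chunk=4):
    # Pass 1: each chunk counts how long its expansion will be.
    chunks = [s[i:i+chunk] for i in range(0, len(s), chunk)]
    sizes = [sum(len(RULES.get(c, c)) for c in ch) for ch in chunks]
    # Scan: an exclusive prefix sum turns sizes into write offsets.
    offsets, total = [], 0
    for sz in sizes:
        offsets.append(total)
        total += sz
    # Pass 2: each chunk writes its expansion at its own offset.
    # These writes are independent, which is what makes them parallelizable.
    out = [None] * total
    for ch, off in zip(chunks, offsets):
        for c in ch:
            for r in RULES.get(c, c):
                out[off] = r
                off += 1
    return "".join(out)

assert expand_parallel_sim("FX") == "FX+YF+"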
However, your algorithm is actually fairly predictable: you can calculate precisely how large the final string will be given the initial string and number of iterations. The best way to generate this string with OpenCL would be to come up with a non-recursive function which analytically calculates the character at position N given M iterations, and then call that function once per work-item, with the (known!) final length of the string as the work size. I don't know if it's possible to come up with such a function, but it seems like it might be, and if it is possible, this is probably the most efficient way to do it on a GPU.
It seems like this might be possible: as far as I can tell, the result will be highly periodic:
FX
FX+YF+
FX+YF++-FX-YF+
FX+YF++-FX-YF++-FX+YF+--FX-YF+
FX+YF++-FX-YF++-FX+YF+--FX-YF++-FX+YF++-FX-YF+--FX+YF+--FX-YF+
^^^^^^ ^^^^^^^ ^^^^^^^ ^^^^^^^ ^^^^^^^ ^^^^^^^ ^^^^^^^ ^^^^^^^
A* B A B A B A B
As far as I can see, those A blocks are all identical, and so are the Bs (apart from the first A, which is effectively at position -1). You can therefore determine the characters at 14 positions out of every 16 completely deterministically. I strongly suspect it's possible to work out the pattern of +s and -s that connects them too. If you figure that out, the solution becomes pretty easy.
Note though that when you have that function, you probably don't even need to put the result in a giant string: you can just feed your drawing algorithm with that function directly.
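For what it's worth, such a function does exist: the turn sequence of the dragon curve is the regular paperfolding sequence, which has a simple per-position closed form. A hedged Python sketch follows (dragon_turn is my own name; whether 'L' maps to '-' or '+' in this L-system's alphabet depends on orientation, so the mapping may need flipping to match the FX expansion exactly):
def dragon_turn(n):
    # n-th turn (n >= 1) of the dragon curve via the regular paperfolding
    # sequence: inspect the bit just above the lowest set bit of n.
    return 'L' if (((n & -n) << 1) & n) == 0 else 'R'

# Each work-item could evaluate this independently; no expanded string
# is needed to drive the drawing loop.
print("".join(dragon_turn(n) for n in range(1, 16)))  # LLRLLRRLLLRRLRR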

Related

Theory of arrays in Z3: (1) model is difficult to understand, (2) do not know how to implement functions and (3) difference with sequences

Following the question published in How expressive can we be with arrays in Z3(Py)? An example, I expressed the following formula in Z3Py:
Exists i::Integer s.t. (0<=i<|arr|) & (avg(arr)+t<arr[i])
This means: is there a position i (0<=i<|arr|) in the array whose value arr[i] is greater than the average of the array, avg(arr), plus a given threshold t?
The solution in Z3Py:
from z3 import *

i = Int('i')
t = Int('t')
avg_arr = Int('avg_arr')
len_arr = Int('len_arr')
arr = Array('arr', IntSort(), IntSort())

phi_1 = And(0 <= i, i < len_arr)
phi_2 = (t + avg_arr < arr[i])
phi = Exists(i, And(phi_1, phi_2))

s = Solver()
s.add(phi)
print(s.check())
print(s.model())
Note that, (1) the formula is satisfiable and (2) each time I execute it, I get a different model. For instance, I just got: [avg_a = 0, t = 7718, len_arr = 1, arr = K(Int, 7719)].
I have three questions now:
What does arr = K(Int, 7719) mean? Does this mean the array contains one Int element with value 7719? In that case, what does the K mean?
Of course, this implementation is wrong in the sense that the average and length values are independent of the array itself. How can I implement simple avg and len functions?
Where is the i index in the model given by the solver?
Also, in which sense would this implementation be different using sequences instead of arrays?
(1) arr = K(Int, 7719) means that it's a constant array. That is, at every location it has the value 7719. Note that this is truly "at every location," i.e., at every integer value. There's no "size" of the array in SMTLib parlance. For that, use sequences.
(2) Indeed, your average/length etc are not related at all to the array. There are ways of modeling this using quantifiers, but I'd recommend staying away from that. They are brittle, hard to code and maintain, and furthermore any interesting theorem you want to prove will get an unknown as answer.
(3) The i you declared and the i you used as the existential are completely independent of each other. (The latter is just a trick so z3 can recognize it as a value.) But I guess you removed that now.
The proper way to model such problems is using sequences. (Although, you shouldn't expect much proof performance there either.) Start here: https://microsoft.github.io/z3guide/docs/theories/Sequences/ and see how much you can push it through. Functions like avg will need a recursive definition most likely, for that you can use RecAddDefinition, for an example see: https://stackoverflow.com/a/68457868/936310
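As a rough illustration of the sequence-based approach, here is a sketch only: seq_sum is a name I made up, the division in avg is cleared by multiplying through by the length, and, as noted above, z3 may well answer unknown once recursive definitions and quantifiers mix.
from z3 import *

IntSeq = SeqSort(IntSort())
arr = Const('arr', IntSeq)
t = Int('t')
i = Int('i')

# Recursive sum over an integer sequence, via RecAddDefinition.
seq_sum = RecFunction('seq_sum', IntSeq, IntSort())
s = FreshConst(IntSeq)
RecAddDefinition(seq_sum, [s],
                 If(Length(s) == 0, 0,
                    s[0] + seq_sum(SubSeq(s, 1, Length(s) - 1))))

solver = Solver()
solver.add(Length(arr) > 0)
# arr[i] > avg + t, with avg = seq_sum(arr)/Length(arr), cleared of division:
solver.add(Exists(i, And(0 <= i, i < Length(arr),
                         Length(arr) * arr[i] > seq_sum(arr) + Length(arr) * t)))
print(solver.check())  # may be sat, or unknown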
Stack Overflow works best when you try to code these yourself and ask very specific questions about how to proceed, as opposed to overarching questions. (But you already knew that!) Best of luck.

Can a recursive function containing a for loop that calls the function itself be implemented using only for loops?

Similar questions have been asked and the general consensus is that anything can be converted from recursion to for loops and vice versa. However, I can't find a way to convert a function of the following pseudocode type to a for loop:
def recursive(n):
    if n == 0:
        return
    for i in range(some_number):
        do_sth...
        recursive(n-1)
In this case, there are n nested loops, and n varies depending on the given argument. When using only for loops, the number of nested loops seems to be predetermined in the code; it doesn't vary depending on the input. Is there a way to make something like this using only for loops?
Is there a way to make something like this using only for loops?
Well, if you admit a while loop as a case of a pseudocode for loop, at least your example can be done:
def nonrecursive(n):
    a = []
    z = 0
    while n:
        while n:
            i = z
            if i == some_number: break
            print((n, i))
            a += [[n, i]]
            n -= 1
            z = 0
        if not a: break
        n, i = a.pop()
        i += 1
        z = i
We need to be careful here.
The general true statement is that loops can replace recursion and vice versa. This can be shown lots of ways; see the structured programming theorem for ideas.
Whether for loops can replace recursion depends upon your definitions. Can your for loops run forever, or for an indefinite amount of time not known in advance? If so, they are functionally equivalent to while loops, and they can replace recursion. If your for loops cannot be made to run forever or for an unknown (initially) number of iterations, recursion cannot always be replaced.
Really, it's while loops (plus a stack data structure) that can replace recursion without much trouble.
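To make that concrete, here is a minimal explicit-stack rewrite of the recursive(n) above (a sketch with my own names: do_sth stands in for the loop body, and nonrecursive2 avoids clashing with the earlier answer's function):
def do_sth(n, i):
    print((n, i))  # stand-in for the real loop body

def nonrecursive2(n, some_number=3):
    # Each frame (n, i) on the explicit stack is a suspended call of
    # recursive(n) with loop iterations i, i+1, ... still to run.
    stack = [(n, 0)]
    while stack:
        n, i = stack.pop()
        if n == 0 or i >= some_number:
            continue                  # this frame is finished
        do_sth(n, i)                  # body of iteration i
        stack.append((n, i + 1))      # resume this loop afterwards
        stack.append((n - 1, 0))      # "call" recursive(n-1) first (popped next)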

Dynamic programming with sets

I have a typical question in dynamic programming.
My question is: given an array = {1,2,3,4,5,6}, I have to find all the subsets whose sum is at most k. If I consider all the sets, it becomes an exponential algorithm. I thought of achieving this with dynamic programming.
Suppose k = 7. My idea is:
Pass 1: {1},{2},...,{6}
Pass 2: Pass 1 + {1,2},{1,3},{1,4},{1,5},...
Pass 3: Pass 2 + {1,2,3},...
And my algo stops.
I'm not able to formulate this with dynamic programming. Any inputs? How do I formulate this algo into a program?
A DP solution for the problem should follow the next recursive formula, and build bottom-up:
f(i,0) = {{}} //a set containing only the empty set
f(0,W) = {{}} (W > 0)
f(i,W) = {} (W < 0) //an empty set
f(i,W) = f(i-1,W) [union] extend(f(i-1,W-element[i]),element[i])
Where the function extend(set,e) is:
extend(set,e):
    for each s in set: //s is a set itself
        s.add(e)
Note that complexity could still be exponential (and not even pseudo-polynomial), since the number of sets generated could be exponential, and is stored in the DP table.
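For concreteness, a memoized Python sketch of this recurrence (my own illustration, not the asker's code; note the result includes the empty set, and its size can itself be exponential):
from functools import lru_cache

def subsets_at_most(elements, k):
    # f(i, W) = all subsets of the first i elements with sum <= W.
    n = len(elements)

    @lru_cache(maxsize=None)
    def f(i, W):
        if W < 0:
            return frozenset()                   # no valid subsets
        if i == 0:
            return frozenset({frozenset()})      # only the empty set
        keep = f(i - 1, W)                       # subsets skipping element i
        take = frozenset(s | {elements[i - 1]}   # subsets taking element i
                         for s in f(i - 1, W - elements[i - 1]))
        return keep | take

    return f(n, k)

print(sorted(tuple(sorted(s)) for s in subsets_at_most((1, 2, 3, 4, 5, 6), 7)))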
Your problem is an instance of the knapsack problem, whose related decision problem is known to be NP-complete. This means that most certainly there will be no sub-exponential algorithm (though a mathematical proof is missing).
ZachLangley's comment shows that enumerating all solutions would still be exponential in the worst case even if there were an efficient decision solver, since producing the output already requires exponential time.
Since the decision problem is NP-complete, counting cannot be easier (otherwise you could count and afterwards test whether the result equals 0 or not).

use five point stencil to evaluate function with vector inputs and converge to maximum output value

I am familiar with iterative methods on paper, but MATLAB coding is relatively new to me and I cannot seem to find a way to code this.
In code language...
This is essentially what I have:
A = { [1;1] [2;1] [3;1] ... [33;1]
[1;2] [2;2] [3;2] ... [33;2]
... ... ... ... ....
[1;29] [2;29] [3;29] ... [33;29] }
... a 29x33 cell array of 2x1 column vectors, which I got from:
[X,Y] = meshgrid([1:33],[1:29])
A = squeeze(num2cell(permute(cat(3,X,Y),[3,1,2]),1))
[Thanks to members of Stack Overflow who helped me do this]
I have a function that calls each of these column vectors and returns a single value. I want to institute a 2-D 5-point stencil method that evaluates a column vector and its 4 neighbors and finds the maximum value attained through the function out of those 5 column vectors.
i.e., if I were starting from the middle, the points evaluated would be:
1. A{15,17}(1), A{15,17}(2)
2. A{14,17}(1), A{14,17}(2)
3. A{15,16}(1), A{15,16}(2)
4. A{16,17}(1), A{16,17}(2)
5. A{15,18}(1), A{15,18}(2)
Out of these 5 points, the method would choose the one with the largest returned value from the function, move to that point, and rerun the method. This would continue until a global maximum is reached. It's basically an iterative optimization method (albeit a primitive one). Note: I don't have access to the Optimization Toolbox.
Thanks a lot guys.
EDIT: sorry I didn't read the iterative part of your Q properly. Maybe someone else wants to use this as a template for a real answer, I'm too busy to do so now.
One solution using for loops (there might be a more elegant one):
overallmax = 0;
for v = 2:size(A,1)-1
    for w = 2:size(A,2)-1
        % temp is the vertical arm of the "plus" stencil (rows v-1..v+1)
        temp = A((v-1):(v+1),w);
        tmpmax = max(cat(1,temp{:}));
        % temp2 is the horizontal arm of the "plus" stencil (cols w-1..w+1)
        temp2 = A(v,(w-1):(w+1));
        tmpmax2 = max(cat(1,temp2{:}));
        mxmx = max(tmpmax,tmpmax2);
        if mxmx > overallmax
            overallmax = mxmx;
        end
    end
end
But if you're just looking for max value, this is equivalent to:
maxoverall=max(cat(1,A{:}));
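For the iterative part of the question, here is a rough sketch of the greedy stencil ascent being described (in Python rather than MATLAB, with my own names; note that in general it converges to a local maximum, not necessarily the global one):
def stencil_ascent(f, start, shape):
    # Greedy 5-point-stencil hill climb on a grid: evaluate the center and
    # its 4 neighbors, move to the best, repeat until no neighbor improves.
    r, c = start
    while True:
        candidates = [(r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        candidates = [(i, j) for i, j in candidates
                      if 0 <= i < shape[0] and 0 <= j < shape[1]]  # in bounds
        best = max(candidates, key=f)
        if best == (r, c):
            return (r, c), f((r, c))   # no neighbor is better: local maximum
        r, c = best

# Usage sketch: a smooth bump peaking at (14, 16) on a 29x33 grid.
g = lambda p: -((p[0] - 14)**2 + (p[1] - 16)**2)
print(stencil_ascent(g, (0, 0), (29, 33)))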

Does qsort demand consistent comparisons or can I use it for shuffling?

Update: Please file this under bad ideas. You don't get anything for free in life, and here is certainly proof. A simple idea gone bad. It is definitely something to learn from, however.
Lazy programming challenge: if I pass a function that returns true or false 50-50 as qsort's comparison function, I think that I can effectively unsort an array of structures writing 3 lines of code.
int main(int argc, char **argv)
{
    srand(time(NULL));   /* 1 */
    ...
    /* qsort(....) */    /* 2 */
}
...
int comp_nums(const int *num1, const int *num2)
{
    float frand =
        (float) (rand()) / ((float) (RAND_MAX + 1.0));  /* 3 */
    if (frand >= 0.5f)
        return GREATER_THAN;
    return LESS_THAN;
}
Any pitfalls I need to look out for? Is it possible in fewer lines through swapping, or is this the cleanest I get for 3 non-trivial lines?
Bad idea. I mean really bad.
Your solution gives an unpredictable result, not a random result, and there is a big difference. You have no real idea what a qsort with a random comparison will do, nor whether all combinations are equally likely. This is the most important criterion for a shuffle: all combinations must be equally likely. Biased results mean big trouble. There's no way to prove that in your example.
You should implement the Fisher-Yates shuffle (otherwise known as the Knuth shuffle).
In addition to the other answers, this is worse than a simple Fisher-Yates shuffle because it is too slow: qsort is O(n log n), while Fisher-Yates is O(n).
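For reference, a minimal Python sketch of the Fisher-Yates shuffle being recommended:
import random

def fisher_yates(a):
    # In-place Fisher-Yates (Knuth) shuffle: O(n), and each of the n!
    # orderings is equally likely given an unbiased random source.
    for i in range(len(a) - 1, 0, -1):
        j = random.randint(0, i)   # uniform over 0..i inclusive
        a[i], a[j] = a[j], a[i]

deck = list(range(10))
fisher_yates(deck)
print(deck)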
Some more detail is available in Wikipedia on why this kind of "shuffle" does not generally work as well as the Fisher-Yates method:
Comparison with other shuffling algorithms

The Fisher-Yates shuffle is quite efficient; indeed, its asymptotic time and space complexity are optimal. Combined with a high-quality unbiased random number source, it is also guaranteed to produce unbiased results. Compared to some other solutions, it also has the advantage that, if only part of the resulting permutation is needed, it can be stopped halfway through, or even stopped and restarted repeatedly, generating the permutation incrementally as needed. In high-level programming languages with a fast built-in sorting algorithm, an alternative method, where each element of the set to be shuffled is assigned a random number and the set is then sorted according to these numbers, may be faster in practice[citation needed], despite having worse asymptotic time complexity (O(n log n) vs. O(n)). Like the Fisher-Yates shuffle, this method will also produce unbiased results if correctly implemented, and may be more tolerant of certain kinds of bias in the random numbers. However, care must be taken to ensure that the assigned random numbers are never duplicated, since sorting algorithms in general won't order elements randomly in case of a tie. A variant of the above method that has seen some use in languages that support sorting with user-specified comparison functions is to shuffle a list by sorting it with a comparison function that returns random values. However, this does not always work: with a number of commonly used sorting algorithms, the results end up biased due to internal asymmetries in the sorting implementation.[7]
This links to here:
just one more thing: While writing this article I experimented with various versions of the methods and discovered one more flaw in the original version (renamed by me to shuffle_sort). I was wrong when I said "it returns a nicely shuffled array every time it is called."

The results are not nicely shuffled at all. They are biased. Badly. That means that some permutations (i.e. orderings) of elements are more likely than others. Here's another snippet of code to prove it, again borrowed from the newsgroup discussion:
N = 100000
A = %w(a b c)
Score = Hash.new { |h, k| h[k] = 0 }

N.times do
  sorted = A.shuffle
  Score[sorted.join("")] += 1
end

Score.keys.sort.each do |key|
  puts "#{key}: #{Score[key]}"
end
This code shuffles an array of three elements (a, b, c) 100,000 times and records how many times each possible result was achieved. In this case, there are only six possible orderings and we should get each one about 16666.66 times. If we try an unbiased version of shuffle (shuffle or shuffle_sort_by), the results are as expected:
abc: 16517
acb: 16893
bac: 16584
bca: 16568
cab: 16476
cba: 16962
Of course, there are some deviations, but they shouldn't exceed a few percent of the expected value and they should be different each time we run this code. We can say that the distribution is even.

OK, what happens if we use the shuffle_sort method?
abc: 44278
acb: 7462
bac: 7538
bca: 3710
cab: 3698
cba: 33314
This is not an even distribution at all. Again?

It shows how the sort method is biased and goes into detail about why this is so. Finally he links to Coding Horror:
Let's take a look at the correct Knuth-Fisher-Yates shuffle algorithm.
for (int i = cards.Length - 1; i > 0; i--)
{
    int n = rand.Next(i + 1);
    Swap(ref cards[i], ref cards[n]);
}
Do you see the difference? I missed it the first time. Compare the swaps for a 3 card deck:
Naïve shuffle        Knuth-Fisher-Yates shuffle
rand.Next(3);        rand.Next(3);
rand.Next(3);        rand.Next(2);
rand.Next(3);
The naive shuffle results in 3^3 (27) possible deck combinations. That's odd, because the mathematics tell us that there are really only 3! or 6 possible combinations of a 3 card deck. In the KFY shuffle, we start with an initial order, swap from the third position with any of the three cards, then swap again from the second position with the remaining two cards.
No, this won't properly shuffle the array; it will barely move elements from their original locations, with an exponential distribution.
The comparison function isn't supposed to return a boolean type; it's supposed to return a negative number, a positive number, or zero, which qsort() uses to determine which argument is greater than the other.
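To illustrate that contract (in Python via functools.cmp_to_key, since the same three-way convention applies there):
from functools import cmp_to_key

def comp_nums(a, b):
    # A well-formed three-way comparator: negative, zero, or positive,
    # and consistent: comp_nums(a, b) == -comp_nums(b, a).
    return (a > b) - (a < b)

print(sorted([3, 1, 2], key=cmp_to_key(comp_nums)))  # [1, 2, 3]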
The Old New Thing takes on this one
I think the basic idea of randomly partitioning the set recursively on the way down and concatenating the results on the way up will work (it will average O(n log n) binary decisions, and that is darn close to log2(n!)), but qsort will not be sure to do that with a random predicate.
BTW, I think the same argument and issues apply to any O(n log n) sort strategy.
rand() isn't the most random thing out there... if you want to shuffle cards or something, this isn't the best. Also, a Knuth shuffle would be quicker, but your solution is OK if it doesn't loop forever.
