I am working on a program which generates C code for one function. This generated C function resides in the central loop of another target program, so it is performance sensitive. The generated function decides whether to call another function based on a boolean value, which is looked up using two ints passed to the generated function: a state number and a mode number. The generated function looks like this:
void dispatch(System* system, int state, int mode) {
    // Some other code here...
    if (truthTable[state][mode]) {
        doExpensiveCall(system, state, mode);
    }
}
Some facts:
The ranges of 'state' and 'mode' values start at 0 and end at some number < 10,000. Their possible values are sequential, with no gaps in between. So, for example, if the end value of 'state' is 1000, then we know that there are 1001 states (including state 0).
The code generator is aware of the states and modes, and it knows ahead of time which combinations of state+mode will yield a value of true. Theoretically, any combination of state+mode could yield true, and thus make a call to doExpensiveCall, but in practice it will mostly be a handful of state+mode combinations that yield true. Again, this information is known during code generation.
Since this function will be called a lot, I want to optimize the check for the truth value. In the average case, I expect the test to yield false the vast majority of the time: on average, less than 1% of the calls should yield true. But, theoretically, it could be as high as 100% of the time (this depends on the end user).
I am exploring the different ways that I could compute whether a state+mode will yield a call to doExpensiveCall(). In the end, I'll have to choose something, so I'm exploring my options now. These are the ways I have thought of so far:
1) Create a precomputed two-dimensional array of booleans. This is what I'm using in the example above, and it gives the fastest check I can think of. The problem is that if state and mode have large ranges (say 10,000x1000), the generated table becomes very big (for 10,000x1000, that's 10MB for just that table). Example:
// STATE_COUNT=4, MODE_COUNT=3
static const char truthTable[STATE_COUNT][MODE_COUNT] = {
    {0,1,0},
    {0,0,0},
    {1,1,0},
    {0,0,1}
};
2) Create a table like #1, but compressed: instead of each array entry being a single boolean, each char would hold 8 entries as a bitfield. Then, during the check, I would do some computation with state+mode to find the right byte and bit. This shrinks the precomputed table by a factor of 8 (each row needs about MODE_COUNT/8 bytes instead of MODE_COUNT). The downside is that the reduction is not that much, and the check now has to compute the position of the boolean in the bitfield table instead of doing a simple array access as in #1.
3) Since the number of state+mode combinations that yield true is expected to be small, a switch statement is also possible (using the truthTable in #1 as reference):
switch(state){
    case 0: // row
        switch(mode){ // col
            case 1: doExpensiveCall(system, state, mode);
                break;
        }
        break;
    case 2:
        switch(mode){
            case 0:
            case 1: doExpensiveCall(system, state, mode);
                break;
        }
        break;
    case 3:
        switch(mode){
            case 2: doExpensiveCall(system, state, mode);
                break;
        }
        break;
}
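Option 2 can be sketched as follows, assuming the generator packs the whole table flat in row-major order (cell index = state * MODE_COUNT + mode) rather than one bitfield per row. The macros and the name isTrue are hypothetical; the data is the 4x3 example table from option 1. For a 10,000x1000 table this layout would take about 1.25 MB instead of 10 MB, at the cost of a shift and a mask per lookup.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes matching the 4x3 example table in option 1. */
#define STATE_COUNT 4
#define MODE_COUNT  3
#define CELL_COUNT  (STATE_COUNT * MODE_COUNT)

/* The example truth table, packed 8 cells per byte in row-major order:
   cell index = state * MODE_COUNT + mode, stored in bit (index % 8)
   of byte (index / 8).
   True cells: (0,1) -> 1, (2,0) -> 6, (2,1) -> 7, (3,2) -> 11. */
static const uint8_t truthBits[(CELL_COUNT + 7) / 8] = {
    (1u << 1) | (1u << 6) | (1u << 7),  /* cells 0..7  */
    (1u << (11 - 8))                    /* cells 8..15 */
};

/* Returns true if doExpensiveCall() should run for (state, mode). */
static inline bool isTrue(int state, int mode) {
    unsigned i = (unsigned)state * MODE_COUNT + (unsigned)mode;
    return (truthBits[i >> 3] >> (i & 7u)) & 1u;
}
```

In the generated dispatch(), the check would become `if (isTrue(state, mode)) doExpensiveCall(system, state, mode);`.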
QUESTION:
What other ways, given the facts above, could be used to calculate the boolean value needed to call doExpensiveCall()?
Thanks
Edit:
I thought about Jens's sample code, and the following occurred to me. To get by with just one switch statement, I can do this computation in the generated code:
// #if STATE_COUNT > MODE_COUNT
int i = s * STATE_COUNT + m;
// #else
int i = m * MODE_COUNT + s;
// #endif
switch(i) {
    case 1: // use computed values here, too.
    case 8:
    case 9:
    case 14:
        doExpensiveCall(system, s, m);
}
I'd try a modified version of (3) where there is only one call, and all the switch/case logic leads to that call. That way the compiler is free to apply whatever heuristics it has for optimizing this.
Something along the lines of:
switch(state) {
    default: return;
    case 0: // row
        switch(mode){ // col
            default: return;
            case 1: break;
        }
        break;
    case 2:
        switch(mode){
            default: return;
            case 0: break;
            case 1: break;
        }
        break;
    case 3:
        switch(mode){
            default: return;
            case 2: break;
        }
        break;
}
doExpensiveCall(system, state, mode);
That is, you'd only have "control" inside the switch. The compiler should be able to sort this out nicely.
These heuristics will probably differ between architectures and compilation options (e.g. -O3 versus -Os), but this is what compilers are for: making choices based on platform-specific knowledge.
As for time efficiency: if your function call really is as expensive as you claim, this part will just be buried in the noise, so don't worry about it. (Or benchmark your code to be sure.)
If the code generator knows what percentage of the table is true, it can choose the algorithm at build time.
So if it is about 50% true and 50% false, use the 10 MB table.
Otherwise use a hash table or a radix tree.
A hash table would choose a hash function and a number of buckets. You'd compute the hash, find the bucket and search the chain for the true (or false) values.
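A minimal sketch of the hash-table idea, assuming the key is the combined cell index state * MODE_COUNT + mode. It swaps the chains mentioned above for open addressing with linear probing, and uses multiplicative (Fibonacci) hashing into a power-of-two bucket count; all names and sizes are hypothetical. A real generator would emit the filled table as static const data instead of calling tableInsert() at startup.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODE_COUNT 3              /* hypothetical; emitted by the generator */
#define TABLE_BITS 4              /* 16 buckets: keep well above the true count */
#define TABLE_SIZE (1u << TABLE_BITS)
#define EMPTY_KEY  UINT32_MAX     /* never equals state * MODE_COUNT + mode */

static uint32_t table[TABLE_SIZE];

/* Combined key, one per (state, mode) cell. */
static uint32_t keyOf(int state, int mode) {
    return (uint32_t)state * MODE_COUNT + (uint32_t)mode;
}

/* Multiplicative hashing: top TABLE_BITS bits of key * constant. */
static uint32_t bucketOf(uint32_t key) {
    return (uint32_t)(key * 2654435761u) >> (32 - TABLE_BITS);
}

void tableClear(void) {
    for (uint32_t b = 0; b < TABLE_SIZE; b++)
        table[b] = EMPTY_KEY;
}

/* Build side: insert one true (state, mode) pair.
   Assumes the table never fills up, so probing always finds a free slot. */
void tableInsert(int state, int mode) {
    uint32_t key = keyOf(state, mode);
    uint32_t b = bucketOf(key);
    while (table[b] != EMPTY_KEY && table[b] != key)
        b = (b + 1) & (TABLE_SIZE - 1);   /* linear probing */
    table[b] = key;
}

/* Hot path: true if the pair was inserted; stops at the first empty slot. */
bool tableContains(int state, int mode) {
    uint32_t key = keyOf(state, mode);
    for (uint32_t b = bucketOf(key); table[b] != EMPTY_KEY;
         b = (b + 1) & (TABLE_SIZE - 1)) {
        if (table[b] == key)
            return true;
    }
    return false;
}
```

With a low load factor, a false lookup usually ends at the first or second probed slot, which suits the "mostly false" workload.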
The radix tree would choose a radix (like 10): each node has 10 entries, where a NULL pointer means there are no true values below it and the other entries point to another 10-entry node, and so on until you finally reach a value.
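Here is a two-level variant of that idea, sketched under assumptions: instead of radix 10, the combined key state * MODE_COUNT + mode is split into a wide top level (one pointer per block of 256 cells) and 256-bit leaves. The sizes, names, and the fixed leaf pool are hypothetical. For the 10,000x1000 worst case the top level is about 39,000 pointers (roughly 300 KB on a 64-bit machine), plus 32 bytes per block that contains at least one true cell.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STATE_COUNT 10000u                     /* hypothetical worst case */
#define MODE_COUNT  1000u
#define KEY_COUNT   (STATE_COUNT * MODE_COUNT) /* 10,000,000 cells */
#define LEAF_BITS   256u                       /* cells covered by one leaf */
#define TOP_COUNT   ((KEY_COUNT + LEAF_BITS - 1) / LEAF_BITS)

typedef struct { uint8_t bits[LEAF_BITS / 8]; } Leaf;

/* Top level: one pointer per block of 256 cells; NULL means all false. */
static Leaf *top[TOP_COUNT];

/* A real generator would emit exactly the leaves it needs as static data;
   this fixed pool only keeps the sketch self-contained. */
static Leaf pool[64];
static size_t poolUsed;

/* Build side: mark (state, mode) as true. Returns false if the pool is full. */
bool radixSet(unsigned state, unsigned mode) {
    uint32_t key = state * MODE_COUNT + mode;
    uint32_t hi = key / LEAF_BITS, lo = key % LEAF_BITS;
    if (top[hi] == NULL) {
        if (poolUsed == sizeof pool / sizeof pool[0])
            return false;
        top[hi] = &pool[poolUsed++];
    }
    top[hi]->bits[lo / 8] |= (uint8_t)(1u << (lo % 8));
    return true;
}

/* Hot path: one pointer load, then one bit test if the leaf exists. */
bool radixGet(unsigned state, unsigned mode) {
    uint32_t key = state * MODE_COUNT + mode;
    const Leaf *leaf = top[key / LEAF_BITS];
    if (leaf == NULL)
        return false;                          /* whole block is false */
    uint32_t lo = key % LEAF_BITS;
    return (leaf->bits[lo / 8] >> (lo % 8)) & 1u;
}
```

Most false lookups stop at the NULL check, so the common case costs one memory load.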
So for my class, we're supposed to make a simplified version of poker. For example, an array {1,2,3,4,5} would be a straight, while an array {1,1,1,4,5} would be a three of a kind. I've done everything except determining what kind of hand the user has, and here's what I have so far:
public class handTypes {
    public static void main(String[] args) {
        int[] testHand = {1,1,3,3,5}; //Just a test array
        int[] counts = new int [6];
        int counter = 0;
        for(int i=0; i < testHand.length ; i++){
            counts[testHand[i]]++;
        }
        // Determines a pair or 2 pairs
        for(int i =0; i <counts.length; i++){
            if(counts[i] == 2)
                counter++;
        }
        int option = counter;
        switch(option){
            case 1:
                System.out.println("You have one pair!");
                break;
            case 2:
                System.out.println("You have 2 pairs!");
                break;
        }
    }
}
Of course this doesn't work for a flush, 4 of a kind, 5 of a kind, etc. But I read online about a little algorithm that helps determine that. Let's say the array "int[] counts" holds the values {4,0,1,0,0}: that would be 4 of a kind, while the values {1,1,1,1,1} would be a straight. How would I use this in a statement?
A common general procedure for simple poker-hand evaluators is this:
Sort the hand by rank (high to low, deuces low, aces high).
Starting from the highest hand (five of a kind) and working downward, determine if the hand is of that type and return (e.g. if checking for five of a kind, just check that the first and fifth cards are the same rank--remember that they are sorted--if so, return "five of a kind" and you're done). Your simplified version doesn't seem to have flushes or straight flushes, so skip them.
Work your way down through the hands: for four-of-a-kind, check for XXXXY and XYYYY. For full house, check for XXXYY and XXYYY. For trips, check XXXYZ, XYYYZ, and XYZZZ, etc.
For straights, check five-in-a-row, then special case 2345A (this might not be a special case in your version).
If every check fails, just return "no pair" with the cards sorted high-to-low.
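The procedure above can be sketched in C (the language used for the other examples here; it translates directly to Java with Arrays.sort). This is a sketch under assumptions: ranks are plain ints as in the question, there are no suits (so no flushes), and the A-low straight special case is omitted. The enum and function names are hypothetical.

```c
#include <stdlib.h>

typedef enum {
    NO_PAIR, ONE_PAIR, TWO_PAIR, THREE_OF_A_KIND,
    STRAIGHT, FULL_HOUSE, FOUR_OF_A_KIND, FIVE_OF_A_KIND
} HandType;

/* qsort comparator: sort ranks high to low. */
static int byRankDesc(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (y > x) - (y < x);
}

/* Classifies a 5-card hand of ranks, checking from the highest hand type
   downward. Sorts h in place. */
HandType classify(int h[5]) {
    qsort(h, 5, sizeof h[0], byRankDesc);
    if (h[0] == h[4]) return FIVE_OF_A_KIND;                   /* XXXXX */
    if (h[0] == h[3] || h[1] == h[4]) return FOUR_OF_A_KIND;   /* XXXXY, XYYYY */
    if ((h[0] == h[2] && h[3] == h[4]) ||                      /* XXXYY */
        (h[0] == h[1] && h[2] == h[4])) return FULL_HOUSE;     /* XXYYY */
    if (h[0] == h[1] + 1 && h[1] == h[2] + 1 &&
        h[2] == h[3] + 1 && h[3] == h[4] + 1) return STRAIGHT; /* five in a row */
    if (h[0] == h[2] || h[1] == h[3] || h[2] == h[4])
        return THREE_OF_A_KIND;                                /* XXXYZ, XYYYZ, XYZZZ */
    /* Trips and better are ruled out, so each equal neighbour is one pair. */
    int pairs = (h[0] == h[1]) + (h[1] == h[2]) + (h[2] == h[3]) + (h[3] == h[4]);
    if (pairs == 2) return TWO_PAIR;
    if (pairs == 1) return ONE_PAIR;
    return NO_PAIR;
}
```

For the question's test hand {1,1,3,3,5}, classify() returns TWO_PAIR.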
We know that Duff's device interlaces the structures of a fallthrough switch and a loop, like this:
send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}
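For comparison, here is a sketch of the same device in ANSI C (the K&R-style declarations above rely on implicit int, which modern C no longer allows). It is a memory-to-memory variant: unlike the original, which writes every element to one memory-mapped register, `to` is incremented here, so the function behaves like an 8-way unrolled copy loop. The name duffCopy and the count == 0 guard are additions, not part of Duff's original.

```c
#include <stddef.h>

/* Copies count shorts from `from` to `to`, eight per loop iteration.
   The switch jumps into the middle of the loop to handle count % 8 first. */
void duffCopy(short *to, const short *from, size_t count) {
    if (count == 0)
        return;                        /* the original assumes count > 0 */
    size_t n = (count + 7) / 8;        /* passes through the loop */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

Compilers that warn about implicit fallthrough (for example with -Wextra) will flag every case label here, which is expected for this construct.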
Now, in Swift 2.1, switch-case control flow does not fall through implicitly, as the Swift docs say:
No Implicit Fallthrough
In contrast with switch statements in C and Objective-C, switch statements in Swift do not fall through the bottom of each case and into the next one by default. Instead, the entire switch statement finishes its execution as soon as the first matching switch case is completed, without requiring an explicit break statement. This makes the switch statement safer and easier to use than in C, and avoids executing more than one switch case by mistake.
Now, given that Swift has a fallthrough statement for requesting fallthrough explicitly:
Fallthrough
Switch statements in Swift do not fall through the bottom of each case and into the next one. Instead, the entire switch statement completes its execution as soon as the first matching case is completed. By contrast, C requires you to insert an explicit break statement at the end of every switch case to prevent fallthrough. Avoiding default fallthrough means that Swift switch statements are much more concise and predictable than their counterparts in C, and thus they avoid executing multiple switch cases by mistake.
which is used like this:
let integerToDescribe = 5
var description = "The number \(integerToDescribe) is"
switch integerToDescribe {
case 2, 3, 5, 7, 11, 13, 17, 19:
    description += " a prime number, and also"
    fallthrough
default:
    description += " an integer."
}
print(description)
// prints "The number 5 is a prime number, and also an integer."
and considering that, as Wikipedia reminds us, the device came out of this problem:
A straightforward code to copy items from an array to a memory-mapped output register might look like this:
do { /* count > 0 assumed */
    *to = *from++; /* "to" pointer is NOT incremented, see explanation below */
} while (--count > 0);
What would be an exact implementation of Duff's device in Swift?
This is just a language & coding question, it is not intended to be applied in real Swift applications.
Duff's device is about more than optimization. If you look at https://research.swtch.com/duff you'll find a discussion of implementing coroutines using this mechanism (see paragraph 8 for a comment from Mr. Duff).
If you try to write a portable coroutine package without this ability, you will end up in assembly or rewriting jmp_buf entries (neither is portable).
Modern languages like Go and Swift have more restrictive memory models than C, so this sort of mechanism would (I imagine) cause all sorts of tracking problems. Even the lambda-like block structures in clang and gcc end up intertwined with thread-local storage, and can cause all sorts of havoc unless you stick to trivial applications.
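The coroutine use mentioned above can be sketched in C in the style of Simon Tatham's "Coroutines in C": a switch jumps back into the middle of a loop so that each call resumes where the previous one "yielded". This is a minimal illustration with a hypothetical function name; the static variables make it non-reentrant, which is exactly the kind of hidden state that is hard to track.

```c
/* Returns 0, 1, 2 on successive calls, then -1 (finished),
   then starts over. */
int nextValue(void) {
    static int state = 0;   /* which resume point to jump to */
    static int i;           /* loop variable must survive across calls */

    switch (state) {
    case 0:
        for (i = 0; i < 3; i++) {
            state = 1;
            return i;       /* "yield i" */
    case 1:;                /* the next call resumes here, inside the loop */
        }
    }
    state = 0;              /* sequence finished; reset for next time */
    return -1;
}
```

Swift's switch cannot place a case label inside a nested loop, so this particular trick has no direct translation there.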
You express your intent in the highest level code possible, and trust the Swift compiler to optimize it for you, instead of trying to optimize it yourself. Swift is a high level language. You don't do low level loop unrolling in a high level language.
And in Swift, especially, you don't need to worry about copying arrays (the original application of Duff's Device) because Swift pretends to copy an array whenever you assign it, using "copy on write." This means that it will use the same array for two variables as long as you are just reading from them, but as soon as you modify one of them, it will create a duplicate in the background.
For example, from https://developer.apple.com/documentation/swift/array
Modifying Copies of Arrays
Each array has an independent value that includes the values of all of its elements. For simple types such as integers and other structures, this means that when you change a value in one array, the value of that element does not change in any copies of the array. For example:
var numbers = [1, 2, 3, 4, 5]
var numbersCopy = numbers
numbers[0] = 100
print(numbers)
// Prints "[100, 2, 3, 4, 5]"
print(numbersCopy)
// Prints "[1, 2, 3, 4, 5]"
I tried modifying the code given here to solve linear equations for values of x, such as
(3*x+7)/3+(2*x)/9=6/10
by first splitting it into two expressions, left and right, and then using "SolveSimpleRoot", and it worked, giving the value of x. But if the linear equation was written in the form
(3+2*x)/(5*x-2)=7, which you can multiply throughout by (5*x-2) and which is indeed linear, then the code fails at
// extract coefficients, solve known forms of order up to 1
MathNet.Symbolics.Expression[] coeff = MathNet.Symbolics.Polynomial.Coefficients(variable, simple);
with an error of:
The input sequence was empty. Parameter name: source
It also fails to solve if the expression was like (2x+7)/x=2, which still expands out to be linear.
Any idea why?
The code is basically:
public void solveForX()
{
    string eqn = "(3*x+7)/3+(2*x)/9=6/10";
    string[] expString = eqn.Split('=');
    var x = MathNet.Symbolics.Expression.Symbol("x");
    MathNet.Symbolics.Expression aleft = MathNet.Symbolics.Infix.ParseOrThrow(expString[0]);
    MathNet.Symbolics.Expression aright = MathNet.Symbolics.Infix.ParseOrThrow(expString[1]);
    // Solve aleft - aright = 0 for x
    MathNet.Symbolics.Expression ans = SolveSimpleRoot(x, aleft - aright);
    SelectionInsertText(MathNet.Symbolics.Infix.Print(ans));
}
private MathNet.Symbolics.Expression SolveSimpleRoot(MathNet.Symbolics.Expression variable, MathNet.Symbolics.Expression expr)
{
    // try to bring expression into polynomial form
    MathNet.Symbolics.Expression simple = MathNet.Symbolics.Algebraic.Expand(MathNet.Symbolics.Rational.Simplify(variable, expr));
    // extract coefficients, solve known forms of order up to 1
    MathNet.Symbolics.Expression[] coeff = MathNet.Symbolics.Polynomial.Coefficients(variable, simple);
    switch (coeff.Length)
    {
        case 1: return variable;
        case 2: return MathNet.Symbolics.Rational.Simplify(variable, MathNet.Symbolics.Algebraic.Expand(-coeff[0] / coeff[1]));
        default: return MathNet.Symbolics.Expression.Undefined;
    }
}
You can extend it to support such rational cases by multiplying both sides with the denominator (hence effectively taking the numerator only, with Rational.Numerator):
// Requires: using MathNet.Symbolics; using Expr = MathNet.Symbolics.Expression;
private Expr SolveSimpleRoot(Expr variable, Expr expr)
{
    // try to bring expression into polynomial form
    // (taking only the numerator = multiplying both sides by the denominator)
    Expr simple = Algebraic.Expand(Rational.Numerator(Rational.Simplify(variable, expr)));
    // extract coefficients, solve known forms of order up to 1
    Expr[] coeff = Polynomial.Coefficients(variable, simple);
    switch (coeff.Length)
    {
        case 1: return Expr.Zero.Equals(coeff[0]) ? variable : Expr.Undefined;
        case 2: return Rational.Simplify(variable, Algebraic.Expand(-coeff[0] / coeff[1]));
        default: return Expr.Undefined;
    }
}
This way it can also handle (3+2*x)/(5*x-2)=7. I also fixed case 1 to return undefined if the coefficient is not equal to zero in this case, so the solution of the other example, (2*x+7)/x=2, will be undefined as expected.
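The same "multiply through by the denominator" step can be checked numerically. This C sketch assumes the equation has the shape (p1*x + p0) / (q1*x + q0) = r, which is an assumption covering both examples above; the function name is hypothetical. Cross-multiplying gives (p1 - r*q1)*x + (p0 - r*q0) = 0. Unlike the simplified routine above, it also rejects a root that makes the original denominator zero.

```c
#include <stdbool.h>

/* Solves (p1*x + p0) / (q1*x + q0) = r for x.
   Returns false when there is no single solution: a zero x coefficient
   (no solution, or every x if the constant is also zero) or a root that
   makes the original denominator zero. */
bool solveRationalLinear(double p1, double p0, double q1, double q0,
                         double r, double *x) {
    double a = p1 - r * q1;   /* coefficient of x after cross-multiplying */
    double b = p0 - r * q0;   /* constant term */
    if (a == 0.0)
        return false;
    *x = -b / a;
    if (q1 * *x + q0 == 0.0)  /* root of the denominator: not a solution */
        return false;
    return true;
}
```

For (3+2*x)/(5*x-2)=7 this gives x = 17/33, and for (2*x+7)/x=2 the x coefficient cancels, so it reports no solution, matching the undefined result described above.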
Of course this still remains a very simplistic routine which will not be able to handle any higher order problems.
So I'm compiling a subset of C to a simple stack VM for learning purposes and I'd like to know how to best compile a switch statement, e.g.
switch (e) {
    case 0: { ... }
    case 1: { ... }
    ...
    case k: { ... }
}
The book I'm going through offers a simple way to compile it with indexed jumps but the simple scheme described in the book only works for contiguous, ascending ranges of case values.
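The indexed-jump scheme for contiguous case values can be sketched in C, with an array of function pointers standing in for the VM's jump targets. Everything here is hypothetical (the case values 10..13 and the case bodies); the key points are subtracting the lowest case value and doing a single unsigned range check before indexing.

```c
/* Case bodies for a switch over the contiguous values 10..13. */
static int case10(void) { return 100; }
static int case11(void) { return 110; }
static int case12(void) { return 120; }
static int case13(void) { return 130; }
static int caseDefault(void) { return -1; }

#define CASE_LO 10
#define CASE_HI 13

/* Jump table: entry k is the target for case value CASE_LO + k. */
static int (*const jumpTable[CASE_HI - CASE_LO + 1])(void) = {
    case10, case11, case12, case13
};

int switchDispatch(int e) {
    /* One unsigned compare covers both e < CASE_LO (wraps to a huge value)
       and e > CASE_HI. */
    unsigned idx = (unsigned)e - CASE_LO;
    if (idx > CASE_HI - CASE_LO)
        return caseDefault();
    return jumpTable[idx]();
}
```

In the stack VM this would correspond to subtracting the lowest case value, a bounds check that jumps to the default label, and then an indexed jump.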
Right now I'm using symbolic labels for the first pass and during the second pass I'm going to resolve the labels to actual jump targets because having labels simplifies the initial compilation to stack instructions quite a bit. My idea right now is to generalize what's in the book to any sequence of case values in any order with the following scheme. Call the case values c1, c2, ..., cn and the corresponding labels j1, j2, ..., jn then generate the following sequence of instructions assuming the value of e is on top of the stack:
dup, loadc c1, eq, jumpnz j1,
dup, loadc c2, eq, jumpnz j2,
...
dup, loadc cn, eq, jumpnz jn,
pop, jump default
j1: ...
j2: ...
...
jn: ...
default: ...
The notation is hopefully clear but if not: dup = duplicate value on top of stack, loadc c = push a constant c on top of the stack, eq = compare top two values on the stack and push 0 or 1 based on the comparison, jumpnz j = jump to the label j if the top stack value is not 0, label: = something that will be resolved to an actual address during second compilation pass.
So my question is: what are some other ways of compiling switch statements? My way of doing it is much less compact than an indexed jump table for contiguous ranges of case values, but it works out pretty well when there are large gaps.
First sort all the cases. Then identify all (large enough to be worthwhile) contiguous or near-contiguous sequences, and treat each as a single unit that is handled with a jump table. Then, instead of your linear sequence of compares and jumps, use a balanced binary tree of branches to minimize the average number of jumps. You do this by comparing against the median of the cases or blocks of cases, recursively.
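The decision logic that such a balanced tree of branches encodes is a binary search over the sorted case values. This C sketch (hypothetical name, jump-table blocks left out) shows what the generated compare-and-branch code computes; a compiler would unroll the recursion into nested comparisons against each median, which needs an ordering comparison (for example an `lt` instruction) in addition to `eq`.

```c
/* Returns the index of e in the sorted case values cases[lo..hi),
   i.e. which label j1..jn to jump to, or -1 for the default label.
   Each level compares against the median, as the generated tree would. */
int treeDispatch(int e, const int *cases, int lo, int hi) {
    if (lo >= hi)
        return -1;                        /* empty range: default */
    int mid = lo + (hi - lo) / 2;
    if (e == cases[mid])
        return mid;                       /* jump to this case's label */
    if (e < cases[mid])
        return treeDispatch(e, cases, lo, mid);
    return treeDispatch(e, cases, mid + 1, hi);
}
```

With n cases this needs about log2(n) levels of comparisons instead of up to n for the linear sequence.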
I am building a board game of sorts and am currently dealing with player movement. I want to restrict the player so that they can move only within the game board itself. The board is stored in a nested array, the ActionScript 3 equivalent of a 2D array, like array[x][y]. I know the length, and I know the target I'm trying to find. All I need to do is identify whether that target exists within the array, to confirm that the player can actually move to that slot, and return true or false. Can anyone offer any suggestions? This doesn't seem to be a very easy question.
Assuming your array looks like this:
var board:Array = [
    [a,0,0],
    [0,b,0],
    [0,0,c]
];
Then checking for a piece in the middle of the board would simply be:
if (board[1][1] != 0) {
    trace(board[1][1]); // outputs "b"
}
So if you wanted to move b in any direction by one block, you might generically try...
function move(row:int, col:int, direction:String):void {
    switch (direction) {
        case "up":
            if (row - 1 >= 0 && row - 1 < board.length) {
                board[row-1][col] = board[row][col];
                board[row][col] = 0;
            }
            break;
        case "down":
            if (row + 1 >= 0 && row + 1 < board.length) {
                board[row+1][col] = board[row][col];
                board[row][col] = 0;
            }
            break;
        case "left":
            if (col - 1 >= 0 && col - 1 < board[row].length) {
                board[row][col - 1] = board[row][col];
                board[row][col] = 0;
            }
            break;
        case "right":
            if (col + 1 >= 0 && col + 1 < board[row].length) {
                board[row][col + 1] = board[row][col];
                board[row][col] = 0;
            }
            break;
    }
}
I didn't post this as a comment because it's too long. It isn't an exhaustive answer, just a couple of things you need to clarify:
An array of arrays may or may not be the best way of emulating a 2D array. Other choices include a single one-dimensional array, where you compute the row and column from the index using the width (division and modulo). This has the benefit of faster lookup, and many built-in functions that work on arrays will work on it out of the box. The disadvantage is row insertion, which you'd have to write on your own, while inserting a row into an array of arrays is trivial. Another choice is using Cantor's pairing function (example here: Mapping two integers to one, in a unique and deterministic way) together with an integer-keyed hash table; there is more than one such function. This has the advantage of taking only as much space as there are actual values (if you have only a few elements on the map, it will take less memory). Searching for items in such a structure is a lot faster than in a normal array, because you don't look into "empty" items, which don't even exist. However, you will need to write a lot of functionality yourself.
Finding a path is the subject of many algorithms, collectively known as "pathfinding algorithms". The best one to choose depends significantly on your circumstances. Before you choose one (or invent your own), it is important to answer questions like these:
Do all movements have the same cost?
Is moving back, or even just revisiting of the same location allowed?
Are you looking for the solution that guarantees you to find the best way, or just any way?
If the movement is not straight-forward, is it possible that you may find a heuristic function that estimates the cost of intermediate solution?
Finally, what are you optimizing for? The speed? The memory?
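The two storage alternatives mentioned above (a flat one-dimensional array and a Cantor-paired key) can be sketched in C. The board width and the function names are hypothetical; cantorPair() implements the standard Cantor pairing function, which maps every pair of non-negative integers to a distinct non-negative integer, so its result can serve as the key in an integer hash table.

```c
#include <stdint.h>

#define WIDTH 8   /* hypothetical board width */

/* Flat row-major layout: cell (row, col) lives at row * WIDTH + col. */
static inline int indexOf(int row, int col) { return row * WIDTH + col; }
static inline int rowOf(int index)          { return index / WIDTH; }
static inline int colOf(int index)          { return index % WIDTH; }

/* Cantor pairing: (a + b)(a + b + 1) / 2 + b, unique for a, b >= 0. */
static inline uint64_t cantorPair(uint64_t a, uint64_t b) {
    return (a + b) * (a + b + 1) / 2 + b;
}
```

With the flat layout, the board is a single allocation and a lookup is one multiply-add; with Cantor keys, only occupied cells need to be stored.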
However, if you just want to know whether the point is within a rectangular boundary, I'd just have a Rectangle the size of the map and use http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/geom/Rectangle.html#containsPoint%28%29. This will spare you from writing a switch like the one in Artiace's answer.
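The bounds check alone can replace the per-direction switch. Here is a C sketch of that idea, assuming a fixed 3x3 board where 0 marks an empty cell and a hypothetical Direction enum; inBounds() plays the role of the rectangle containment test, and a table of row/column offsets replaces the four near-identical cases. Like the ActionScript version, it overwrites whatever is in the target cell.

```c
#include <stdbool.h>

#define ROWS 3
#define COLS 3

typedef enum { UP, DOWN, LEFT, RIGHT } Direction;

/* True if (row, col) lies on the ROWS x COLS board. */
static bool inBounds(int row, int col) {
    return row >= 0 && row < ROWS && col >= 0 && col < COLS;
}

/* Moves the piece at (row, col) one step in direction d.
   Returns false, leaving the board unchanged, if the target is off the board. */
bool move(int board[ROWS][COLS], int row, int col, Direction d) {
    static const int dr[] = { -1, 1, 0, 0 };   /* UP, DOWN, LEFT, RIGHT */
    static const int dc[] = { 0, 0, -1, 1 };
    int r = row + dr[d], c = col + dc[d];
    if (!inBounds(r, c))
        return false;
    board[r][c] = board[row][col];
    board[row][col] = 0;
    return true;
}
```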