Duff's device in Swift

We know that Duff's device works by interlacing the structure of a fallthrough switch with that of a loop, like this:
send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
               } while (--n > 0);
    }
}
Now, in Swift 2.1, switch-case control flow does not fall through implicitly, as we read in the Swift docs:
No Implicit Fallthrough
In contrast with switch statements in C and Objective-C, switch
statements in Swift do not fall through the bottom of each case and
into the next one by default. Instead, the entire switch statement
finishes its execution as soon as the first matching switch case is
completed, without requiring an explicit break statement. This makes
the switch statement safer and easier to use than in C, and avoids
executing more than one switch case by mistake.
Now, given that Swift has a fallthrough statement to produce the fallthrough effect explicitly:
Fallthrough
Switch statements in Swift do not fall through the bottom of each case
and into the next one. Instead, the entire switch statement completes
its execution as soon as the first matching case is completed. By
contrast, C requires you to insert an explicit break statement at the
end of every switch case to prevent fallthrough. Avoiding default
fallthrough means that Swift switch statements are much more concise
and predictable than their counterparts in C, and thus they avoid
executing multiple switch cases by mistake.
which is used pretty much like this:
let integerToDescribe = 5
var description = "The number \(integerToDescribe) is"
switch integerToDescribe {
case 2, 3, 5, 7, 11, 13, 17, 19:
description += " a prime number, and also"
fallthrough
default:
description += " an integer."
}
print(description)
// prints "The number 5 is a prime number, and also an integer."
and considering that, as Wikipedia reminds us, the device grew out of the following issue:
A straightforward code to copy items from an array to a memory-mapped output register might look like this:
do {                /* count > 0 assumed */
    *to = *from++;  /* "to" pointer is NOT incremented, see explanation below */
} while (--count > 0);
What would be the exact implementation of Duff's device in Swift?
This is just a language and coding question; it is not intended to be applied in real Swift applications.

Duff's device is about more than optimisation. If you look at https://research.swtch.com/duff, it is a discussion of implementing co-routines using this mechanism (see paragraph 8 for a comment from Mr. Duff).
If you try to write a portable co-routine package without this ability, you will end up in assembly or rewriting jmp_buf entries [neither is portable].
Modern languages like Go and Swift have more restrictive memory models than C, so this sort of mechanism (I imagine) would cause all sorts of tracking problems. Even the lambda-like block structure in clang and gcc ends up intertwined with thread-local storage, and can cause all sorts of havoc unless you stick to trivial applications.
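For a flavour of what that co-routine trick looks like in C, here is a minimal sketch in the style of Simon Tatham's coroutine macros / protothreads (not the code from the linked article; the names are illustrative):

#include <stdio.h>

/* A generator that yields 1, 2, 3, ... one value per call, resuming
   inside the loop where the previous call returned. */
static int generator(void) {
    static int state = 0;   /* where to resume on the next call */
    static int i;

    switch (state) {
    case 0:                 /* the first call starts at the top */
        for (i = 1; ; i++) {
            state = 1;
            return i;       /* "yield" i */
    case 1: ;               /* later calls jump back *inside* the loop */
        }
    }
    return -1;              /* never reached */
}

int main(void) {
    for (int n = 0; n < 3; n++)
        printf("%d\n", generator());  /* prints 1, 2, 3 */
    return 0;
}

This only works because C lets a case label sit in the middle of a nested loop, which is exactly the property Swift's switch does not give you.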

You express your intent in the highest-level code possible and trust the Swift compiler to optimize it for you, instead of trying to optimize it yourself. Swift is a high-level language. You don't do low-level loop unrolling in a high-level language.
And in Swift especially, you don't need to worry about copying arrays (the original application of Duff's device), because Swift only pretends to copy an array when you assign it, using "copy on write." This means that it will use the same storage for two variables as long as you are just reading from them, but as soon as you modify one of them, it will create a duplicate in the background.
For example, from https://developer.apple.com/documentation/swift/array
Modifying Copies of Arrays
Each array has an independent value that includes the values of all
of its elements. For simple types such as integers and other structures,
this means that when you change a value in one array, the value of that
element does not change in any copies of the array. For example:
var numbers = [1, 2, 3, 4, 5]
var numbersCopy = numbers
numbers[0] = 100
print(numbers)
// Prints "[100, 2, 3, 4, 5]"
print(numbersCopy)
// Prints "[1, 2, 3, 4, 5]"

Related

When to use const in a function for C

I've been studying for my midterm tomorrow, and I came across this question where I couldn't decide between answers 2 and 3. PlayingCard is a struct, and HANDSIZE is a constant with a value of 5.
The right answer was apparently option 2, but both a friend and I thought that the better choice here would be 3, as we were told that we should use const as good programming practice when we know that the variable isn't going to be changed. Seeing as it isn't changed here, why would option 2 be better than option 3?
/* QUESTION: What should go for the definition of isFlush that detects flushes
1: Nothing
2: bool isFlush(PlayingCard hand[HANDSIZE]) {
3: bool isFlush(const PlayingCard hand[HANDSIZE]) {
4: bool isFlush(PlayingCard ** hand) {
5: bool isFlush(const PlayingCard ** hand) {
6: bool isFlush(PlayingCard *** hand) {
7: bool isFlush(CardFace * hand) {
8: bool isFlush(CardSuit * hand) {
9: bool isFlush(CardSuit suits[HANDSIZE]) {
*/
// missing function definition goes here (one of the options above)
    CardSuit suit = hand[0].suit;
    for (int i = 1; i < HANDSIZE; i++) {
        if (suit != hand[i].suit) {
            return false;
        }
    }
    return true;
}
as we were told that we should use const as good programming practice
when we know that the variable isn't going to be changed
This is true, but only when you pass a pointer to a variable. If you pass the variable by value there is no advantage that I can see.
In this case I agree that the best answer is 3. It is probably too late to ask the author of the question why the correct answer is 2.
You can use const as a hint that the function will not modify something. You can pass a pointer to non-const thing to a function that declares it will take a pointer to const thing.
We don't have sufficient information to inform you which function declaration will work. I can eliminate options 1, 4, 5, 6, 8 and 9.
I can eliminate option 1 because it would not compile.
I can eliminate options 4, 5, and 6 because you cannot access a pointer with the . operator as is being done in the initialization of suit.
I can eliminate option 8, because a struct or union cannot be compared using !=.
I can eliminate option 9, because the name of the argument is wrong.
Options 8 and 9 are also unlikely because of the recursive definition required of the type CardSuit that is being suggested in the initialization of suit.
You asked:
Seeing as it isn't changed here, why would option 2 be better than option 3?
You seem to have eliminated option 7. I have insufficient information to do so, since PlayingCard might not have a suit member, whereas CardFace might have such a member. If that were the case, the correct answer would be option 7.
But assuming option 7 should be eliminated, then either options 2 or 3 would work for the function. So, to answer your question above in particular, option 3 is superior because it communicates the intention that the function will not modify the elements of the array.
The only advantage option 2 has over option 3 is that it will exclude attempts to call the function with an array of const things. You would only want to do that if the function wished to modify the elements.
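To make that concrete, here is a minimal sketch; the PlayingCard and CardSuit definitions are assumptions for illustration, since the question does not show them:

#include <stdbool.h>

#define HANDSIZE 5

typedef enum { CLUBS, DIAMONDS, HEARTS, SPADES } CardSuit;
typedef struct { CardSuit suit; int rank; } PlayingCard;

/* Option 3: const documents (and enforces) that the hand is only read. */
bool isFlush(const PlayingCard hand[HANDSIZE]) {
    CardSuit suit = hand[0].suit;
    for (int i = 1; i < HANDSIZE; i++) {
        if (suit != hand[i].suit) {
            return false;
        }
    }
    return true;
}

int main(void) {
    /* A non-const array converts to const PlayingCard * automatically,
       so adding const costs callers nothing. */
    PlayingCard hand[HANDSIZE] = {
        {HEARTS, 2}, {HEARTS, 5}, {HEARTS, 9}, {HEARTS, 11}, {HEARTS, 13}
    };
    return isFlush(hand) ? 0 : 1;
}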

Fast copy selected elements of an array in C?

This could be a very basic question but it really bugs me a lot. What I am trying to do is basically to copy elements in an old vector to a new vector using C.
The copy is based on an index vector where each element in this vector represents the index of the element in the old vector. The index vector is not sorted.
For example,
an old vector A = [3.4, 2.6, 1.1].
a index vector B = [1, 1, 3, 3, 2, 1, 2].
After copy, I would expect the new vector to be C = [3.4, 3.4, 1.1, 1.1, 2.6, 3.4, 2.6].
The most brute-force solution I can think of is to run a loop through B and copy the corresponding element of A into C. But when the vectors are very large, the cost is not bearable.
My question is, is there a faster/smarter way to do copy in C on this occasion?
Originally the code is written in Julia and I had no problem with that. In Julia, I simply use C = A[B] and it is fast. Anyone knows how they do it?
Add the C pseudo code:
float *A = [];  // old array
int *B = [];    // index array
float *C;       // new array
for (i = 0; i < length(B); i++)
{
    *(C+i) = *(A + *(B+i));
}
Assumption: length(B) isn't actually a function but you didn't post what it is. If it is a function, capture it in a local variable outside the for loop and read it once; else you have a Schlemiel the Painter's Algorithm.
I think the best we can do is Duff's Device aka loop unrolling. I also made some trivial optimizations the compiler can normally make, but I recall the compiler's loop unrolling isn't quite as good as Duff's device. My knowledge could be out of date and the compiler's optimizer could have caught up.
Probing may be required to determine the optimal unroll number. 8 is traditional but your code inside the loop body is larger than normal.
This code is destructive to the pointers B and C. Save them if you want them again.
float *A = [];  // old array
int *B = [];    // index array
float *C;       // new array
int ln = length(B);
int n = (ln + 7) / 8;
switch (ln % 8) {
case 0: do { *C++ = A[*B++];
case 7:      *C++ = A[*B++];
case 6:      *C++ = A[*B++];
case 5:      *C++ = A[*B++];
case 4:      *C++ = A[*B++];
case 3:      *C++ = A[*B++];
case 2:      *C++ = A[*B++];
case 1:      *C++ = A[*B++];
           } while (--n > 0);
}
With this much scope I can do no better, but with a larger scope a better choice might exist involving redesigning your data structures.

Different ways to do a precomputed truth check

I am working on a program which generates C code for one function. This generated C function resides in the central loop of another target program; this function is performance sensitive. The generated function is used to call another function, based on a bool value; this boolean value is fetched using 2 ints passed to the generated function: a state number and a mode number. The generated function looks like this:
void dispatch(System* system, int state, int mode) {
    // Some other code here...
    if (truthTable[state][mode]) {
        doExpensiveCall(system, state, mode);
    }
}
Some facts:
The range of 'state' and 'mode' values starts at 0 and ends at some number < 10,000. Their possible values are sequential, with no gaps in between. So, for example, if the end value of 'state' is 1000, then we know that there are 1001 states (including state 0).
The code generator is aware of the states and modes, and it knows ahead of time which combinations of state+mode will yield a value of true. Theoretically, any combination of state+mode could yield a true value, and thus make a call to doExpensiveCall, but in practice it will mostly be a handful of state+mode combinations that yield true. Again, this info is known during code generation.
Since this function will be called a lot, I want to optimize the check for the truth value. In the average case, I expect the test to yield false the vast percentage of the time. On average, I expect that less than 1% of the calls will yield a value of true. But, theoretically, it could be as high as 100% of the time (this point depends on the end user).
I am exploring the different ways that I could compute whether a state+mode will yield a call to doExpensiveCall(). In the end, I'll have to choose something, so I'm exploring my options now. These are the different ways that I could think of so far:
1) Create a precomputed two-dimensional array which contains booleans. This is what I'm using in the example above. This yields the fastest possible check that I can think of. The problem is that if state and mode have large ranges (say 10,000x1000), the generated table starts becoming very big (in the case of 10,000x1000, that's 10MB for just that table). Example:
// STATE_COUNT=4, MODE_COUNT=3
static const char truthTable[STATE_COUNT][MODE_COUNT] = {
    {0, 1, 0},
    {0, 0, 0},
    {1, 1, 0},
    {0, 0, 1}
};
2) Create a table like #1, but compressed: instead of each array entry being a single boolean, it would be a char used as a bitfield. Then, during the check, I would do some computation with state and mode to decide how to index into the array. This reduces each row of the precomputed table from MODE_COUNT entries to MODE_COUNT/8 chars. The downside is that the reduction is not that much, and there is now a need to compute the index of the boolean in the bitfield table, instead of the simple array access used in #1. (A rough sketch of this indexing follows option 3 below.)
3) Since the amount of state+mode combinations that will yield a value of true is expected to be small, a switch statement is also possible (using the truthTable in #1 as reference):
switch (state) {
case 0: // row
    switch (mode) { // col
    case 1: doExpensiveCall(system, state, mode);
        break;
    }
    break;
case 2:
    switch (mode) {
    case 0:
    case 1: doExpensiveCall(system, state, mode);
        break;
    }
    break;
case 3:
    switch (mode) {
    case 2: doExpensiveCall(system, state, mode);
        break;
    }
    break;
}
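For concreteness, option 2 could index the packed table along these lines (a rough sketch; the bit layout and the names are just illustrative):

// Rough sketch of option 2, packing the same 4x3 table as in option 1,
// one bit per (state, mode) pair, 8 pairs to a char.
#define STATE_COUNT 4
#define MODE_COUNT  3

static const unsigned char truthBits[(STATE_COUNT * MODE_COUNT + 7) / 8] = {
    // bit k is set iff (state = k / MODE_COUNT, mode = k % MODE_COUNT) is true;
    // bits 1, 6, 7 and 11 are the true entries of the table in #1
    0xC2, 0x08
};

static int isTrue(int state, int mode) {
    int k = state * MODE_COUNT + mode;         // flat index of the pair
    return (truthBits[k / 8] >> (k % 8)) & 1;  // pick out the single bit
}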
QUESTION:
What are other ways that, given the facts above, could be used to calculate the boolean value that decides whether to call doExpensiveCall()?
Thanks
Edit:
I thought about Jens' sample code, and the following occurred to me. In order to have just one switch statement, I can do this computation in the generated code:
// #if STATE_COUNT > MODE_COUNT
int i = s * STATE_COUNT + m;
// #else
int i = m * MODE_COUNT + s;
// #endif
switch(i) {
case 1: // use computed values here, too.
case 8:
case 9:
case 14:
doExpensiveCall(system, s, m);
}
I'd try to use a modified version of (3), where you actually have only one call, and all the switch/case logic leads to that call. That way you can ensure that the compiler will choose whatever heuristics it has for optimizing this.
Something along the line of
switch (state) {
default: return;
case 0: // row
    switch (mode) { // col
    default: return;
    case 1: break;
    }
    break;
case 2:
    switch (mode) {
    default: return;
    case 0: break;
    case 1: break;
    }
    break;
case 3:
    switch (mode) {
    default: return;
    case 2: break;
    }
    break;
}
doExpensiveCall(system, state, mode);
That is, you'd only have "control" inside the switch. The compiler should be able to sort this out nicely.
These heuristics will probably be different between architectures and compilation options (e.g -O3 versus -Os) but this is what compilers are for, make choices based on platform specific knowledge.
And as for your reference to time efficiency: if your function call is really as expensive as you claim, this part will just be buried in the noise, so don't worry about it. (Or otherwise benchmark your code to be sure.)
If the code generator knows the percentage of the table that's in use it can choose the algorithm at build time.
So if it is about 50% true/false use the 10 MB table.
Otherwise use a hash table or a radix tree.
A hash table would choose a hash function and a number of buckets. You'd compute the hash, find the bucket and search the chain for the true (or false) values.
The radix tree would choose a radix (like 10) and you'd have 10 entries with pointers to NULL (no true values down there) and one would have a pointer to another 10 entries, until you finally reach a value.
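As a rough sketch of the hash-table variant (the bucket count, the hash function and all the names are illustrative choices the generator would make):

#include <stdio.h>

#define N_BUCKETS 64   /* a power of two chosen by the generator */

typedef struct Entry {
    int state, mode;
    const struct Entry *next;   /* chain of true pairs sharing a bucket */
} Entry;

/* Generated data: the true (state, mode) pairs, chained per bucket. */
static const Entry pair_0_1 = { 0, 1, 0 };
static const Entry pair_2_0 = { 2, 0, 0 };

static unsigned hashPair(int state, int mode) {
    return ((unsigned)state * 31u + (unsigned)mode) & (N_BUCKETS - 1u);
}

/* With this hash, hashPair(0,1) == 1 and hashPair(2,0) == 62. */
static const Entry *buckets[N_BUCKETS] = { [1] = &pair_0_1, [62] = &pair_2_0 };

static int isTrue(int state, int mode) {
    for (const Entry *e = buckets[hashPair(state, mode)]; e != 0; e = e->next)
        if (e->state == state && e->mode == mode)
            return 1;
    return 0;
}

int main(void) {
    printf("%d %d %d\n", isTrue(0, 1), isTrue(2, 0), isTrue(3, 3));  /* 1 1 0 */
    return 0;
}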

Compiling switch statements for a simple VM

So I'm compiling a subset of C to a simple stack VM for learning purposes and I'd like to know how to best compile a switch statement, e.g.
switch (e) {
case 0: { ... }
case 1: { ... }
...
case k: { ... }
}
The book I'm going through offers a simple way to compile it with indexed jumps but the simple scheme described in the book only works for contiguous, ascending ranges of case values.
Right now I'm using symbolic labels for the first pass and during the second pass I'm going to resolve the labels to actual jump targets because having labels simplifies the initial compilation to stack instructions quite a bit. My idea right now is to generalize what's in the book to any sequence of case values in any order with the following scheme. Call the case values c1, c2, ..., cn and the corresponding labels j1, j2, ..., jn then generate the following sequence of instructions assuming the value of e is on top of the stack:
dup, loadc c1, eq, jumpnz j1,
dup, loadc c2, eq, jumpnz j2,
...
dup, loadc cn, eq, jumpnz jn,
pop, jump :default
j1: ...
j2: ...
...
jn: ...
default: ...
The notation is hopefully clear but if not: dup = duplicate value on top of stack, loadc c = push a constant c on top of the stack, eq = compare top two values on the stack and push 0 or 1 based on the comparison, jumpnz j = jump to the label j if the top stack value is not 0, label: = something that will be resolved to an actual address during second compilation pass.
So my question is then what are some other ways of compiling switch statements? My way of doing it is much less compact than an indexed jump table for contiguous ranges of case values but works out pretty well when there are large gaps.
First sort all the cases. Then identify all contiguous or near-contiguous sequences (large enough to be worthwhile) and treat each one as a single unit that is handled with a jump table. Then, instead of your linear sequence of compares and jumps, use a balanced binary tree of branches to minimize the average number of jumps. You do this by comparing against the median of the cases or blocks of cases, recursively.
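As a sketch of the shape of the dispatch this produces (written as C rather than as your VM's instructions, and with made-up case values): the dense run becomes one bounds check plus an indexed jump, the stragglers are reached by branching around that block, and with more blocks those outer branches form the balanced tree.

#include <stdio.h>

/* Emitted-code shape for case values {1, 4, 5, 6, 7, 42}:
   4..7 is a contiguous block handled by an indexed jump (a table here),
   everything else is found by comparing against the block's bounds. */
static void dispatch(int e) {
    if (e < 4) {                       /* left of the dense block */
        if (e == 1) { puts("case 1"); return; }
    } else if (e <= 7) {               /* inside the dense block: indexed jump */
        static const char *const block[] = { "case 4", "case 5", "case 6", "case 7" };
        puts(block[e - 4]);
        return;
    } else {                           /* right of the dense block */
        if (e == 42) { puts("case 42"); return; }
    }
    puts("default");
}

int main(void) {
    dispatch(5);    /* case 5 */
    dispatch(42);   /* case 42 */
    dispatch(9);    /* default */
    return 0;
}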

setting values to an array (post-init)

This seems to be a simple question, but I cannot get my head around it...
I want to set the elements of an array, based on some conditions, in the most elegant way.
This is my non-compiling pseudo code:
float*array=NULL;
switch(elements) {
case 1: array={1}; break;
case 2: array={7, 1}; break;
case 3: array={3, 2, -1}; break;
case 4: array={10, 20, 30, 40}; break;
default:break;
}
the size of the array is limited, so I could do something like 'float array[16];' as well, but the problem is obviously the assignment in the case statement.
I really don't want to do something like:
case 2: array[0]=1; array[1]=2;
My current (clumsy) implementation is:
#define ARRAYCOPY(dest, src) for(int i=0;i<sizeof(src)/sizeof(*src);i++)dest[i]=src[i]
// ...
case 2: do {float*vec={1, 2}; ARRAYCOPY(array, vec); } while(0); break;
I'm using the ARRAYCOPY define since memcpy() doesn't seem to work; at least doing
float *vec = {1, 2}; memcpy(array, vec, sizeof(vec)/sizeof(*vec));
did not fill any values into array.
I guess there must be a nicer solution?
There is the memcpy function in <string.h>, which is similar to what you implemented as ARRAYCOPY. It copies a block of memory of a given size; in your case this size is the number of elements in the array times the size of one element.
http://www.cplusplus.com/reference/clibrary/cstring/memcpy/
memcpy(destination_array, source_array, number_of_elements * size_of_element);
So, you'd have
// let's allocate exactly as much space as is needed
float* array = (float*)malloc(elements * sizeof(float));
// and now copy the elements we want
switch(elements) {
case 1: memcpy(array, (float[]){1}, 1 * sizeof(float)); break;
case 2: memcpy(array, (float[]){1, 2}, 2 * sizeof(float)); break;
case 3: memcpy(array, (float[]){1, 2, 3}, 3 * sizeof(float)); break;
case 4: memcpy(array, (float[]){10, 20, 30, 40}, 4 * sizeof(float)); break;
default:break;
}
How about:
float *global[] = {
NULL,
(float[]){1},
(float[]){1, 2},
(float[]){7, 8, 9},
};
/* ... make sure `elements` is in range */
float *array = global[elements];
EDIT
You're obviously free to continue using the switch:
switch(elements) {
case 1:
array = (float[]){1};
break;
}
Some options:
1) Declare all of the arrays you may want to use on the stack somewhere else, and assign the pointer "array" to whichever one you actually want. I think this is overly complex though, with rather pointless arrays floating around.
2) Why doesn't memcpy work? Just remember it thinks in terms of bytes, not array elements, so memcpy(target, source, sizeof(source)) should work fine provided source is a real array (not a pointer) and target is at least as big. I would guard it with a check along the lines of if (sizeof(source) <= sizeof(target)), handling the error otherwise, like this:
case 3: {
    float temp[] = {1.0, 2.0, 3.0};
    memcpy(array, temp, sizeof(temp));
    break;
}
Make sure you allocate memory (stack or heap) and keep track of the semantic size of array.
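Putting that together, a minimal self-contained sketch (the capacity of 16 and the element values are just taken from the question):

#include <stdio.h>
#include <string.h>

int main(void) {
    float array[16];          /* fixed capacity, as suggested in the question */
    size_t used = 0;          /* the "semantic size" of array */
    int elements = 3;         /* whatever condition you are switching on */

    switch (elements) {
    case 3: {
        const float temp[] = { 3.0f, 2.0f, -1.0f };
        /* sizeof gives the real byte count here because temp is an array,
           not a pointer; that is one reason the memcpy attempt in the
           question copied the wrong amount. */
        if (sizeof temp <= sizeof array) {
            memcpy(array, temp, sizeof temp);
            used = sizeof temp / sizeof temp[0];
        }
        break;
    }
    default:
        break;
    }

    for (size_t i = 0; i < used; i++)
        printf("%g ", array[i]);
    printf("\n");             /* prints: 3 2 -1 */
    return 0;
}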
