This could be a very basic question but it really bugs me a lot. What I am trying to do is basically to copy elements in an old vector to a new vector using C.
The copy is based on an index vector where each element in this vector represents the index of the element in the old vector. The index vector is not sorted.
For example,
an old vector A = [3.4, 2.6, 1.1].
an index vector B = [1, 1, 3, 3, 2, 1, 2].
After copy, I would expect the new vector to be C = [3.4, 3.4, 1.1, 1.1, 2.6, 3.4, 2.6].
The most brutal solution I can think of is to run a loop through B and copy the corresponding element in A to C. But when the vector is too large, the cost is not bearable.
My question is, is there a faster/smarter way to do copy in C on this occasion?
The code was originally written in Julia, where I had no problem: I simply use C = A[B] and it is fast. Does anyone know how they do it?
Here is the C pseudocode:
float *A = []; /* old array */
int *B = []; /* index array (0-based in C, unlike the 1-based example above) */
float *C; /* new array, same length as B */
for (i = 0; i < length(B); i++)
{
    C[i] = A[B[i]];
}
Assumption: length(B) isn't actually a function but you didn't post what it is. If it is a function, capture it in a local variable outside the for loop and read it once; else you have a Schlemiel the Painter's Algorithm.
I think the best we can do is Duff's Device aka loop unrolling. I also made some trivial optimizations the compiler can normally make, but I recall the compiler's loop unrolling isn't quite as good as Duff's device. My knowledge could be out of date and the compiler's optimizer could have caught up.
Probing may be required to determine the optimal unroll number. 8 is traditional but your code inside the loop body is larger than normal.
This code is destructive to the pointers A and B. Save them if you want them again.
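float *A = []; /* old array */
int *B = []; /* index array */
float *C; /* destination; must have room for ln elements */
int ln = length(B); /* read once; must be greater than 0 */
int n = (ln + 7) / 8; /* number of passes through the unrolled loop */
switch (ln % 8) { /* jump into the loop body to handle the remainder first */
case 0: do { *C++ = A[*B++];
case 7:      *C++ = A[*B++];
case 6:      *C++ = A[*B++];
case 5:      *C++ = A[*B++];
case 4:      *C++ = A[*B++];
case 3:      *C++ = A[*B++];
case 2:      *C++ = A[*B++];
case 1:      *C++ = A[*B++];
        } while (--n > 0);
}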
With this much scope I can do no better, but with a larger scope a better choice might exist involving redesigning your data structures.
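To make the comparison concrete, here is a minimal, self-contained sketch of the same gather written both as a plain loop and with Duff's device. The function names gather_plain and gather_duff and their parameters are hypothetical, the indices are assumed to be 0-based and in range, and whether the unrolled version is actually faster than what an optimizing compiler produces for the plain loop would have to be measured.

```c
#include <stddef.h>

/* Plain gather: dst[i] = src[idx[i]] for 0-based indices. */
static void gather_plain(float *dst, const float *src, const int *idx, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = src[idx[i]];
}

/* The same gather, unrolled eight times with Duff's device. */
static void gather_duff(float *dst, const float *src, const int *idx, size_t count)
{
    if (count == 0)
        return;                          /* the device assumes count > 0 */
    size_t passes = (count + 7) / 8;     /* trips through the do-while */
    switch (count % 8) {                 /* enter mid-body for the remainder */
    case 0: do { *dst++ = src[*idx++];
    case 7:      *dst++ = src[*idx++];
    case 6:      *dst++ = src[*idx++];
    case 5:      *dst++ = src[*idx++];
    case 4:      *dst++ = src[*idx++];
    case 3:      *dst++ = src[*idx++];
    case 2:      *dst++ = src[*idx++];
    case 1:      *dst++ = src[*idx++];
            } while (--passes > 0);
    }
}
```

Because the pointers are passed by value, the caller's A, B and C are left untouched, and the element count is read once rather than on every iteration.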
We know that a Duff's device makes use of interlacing the structures of a fallthrough switch and a loop like:
send(to, from, count)
register short *to, *from;
register count;
{
register n = (count + 7) / 8;
switch (count % 8) {
case 0: do { *to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
} while (--n > 0);
}
}
Now, in Swift 2.1, switch-case control flow does not fall through implicitly, as we read in the Swift docs:
No Implicit Fallthrough
In contrast with switch statements in C and Objective-C, switch
statements in Swift do not fall through the bottom of each case and
into the next one by default. Instead, the entire switch statement
finishes its execution as soon as the first matching switch case is
completed, without requiring an explicit break statement. This makes
the switch statement safer and easier to use than in C, and avoids
executing more than one switch case by mistake.
Swift does, however, provide a fallthrough keyword for opting into that behavior explicitly:
Fallthrough
Switch statements in Swift do not fall through the bottom of each case
and into the next one. Instead, the entire switch statement completes
its execution as soon as the first matching case is completed. By
contrast, C requires you to insert an explicit break statement at the
end of every switch case to prevent fallthrough. Avoiding default
fallthrough means that Swift switch statements are much more concise
and predictable than their counterparts in C, and thus they avoid
executing multiple switch cases by mistake.
that is pretty much like:
let integerToDescribe = 5
var description = "The number \(integerToDescribe) is"
switch integerToDescribe {
case 2, 3, 5, 7, 11, 13, 17, 19:
description += " a prime number, and also"
fallthrough
default:
description += " an integer."
}
print(description)
// prints "The number 5 is a prime number, and also an integer."
Considering that, as Wikipedia reminds us, the device arose from this problem:
A straightforward code to copy items from an array to a memory-mapped output register might look like this:
do { /* count > 0 assumed */
*to = *from++; /* "to" pointer is NOT incremented, see explanation below */
} while(--count > 0);
What would be the exact implementation of Duff's device in Swift?
This is just a language & coding question, it is not intended to be applied in real Swift applications.
Duff's device is about more than optimisation. The article at https://research.swtch.com/duff discusses implementing co-routines using this mechanism (see paragraph 8 for a comment from Mr. Duff).
If you try to write a portable co-routine package without this ability, you will end up in assembly or rewriting jmp_buf entries (neither is portable).
Modern languages like Go and Swift have more restrictive memory models than C, so this sort of mechanism (I imagine) would cause all sorts of tracking problems. Even the lambda-like block structures in clang and gcc end up intertwined with thread-local storage, and can cause all sorts of havoc unless you stick to trivial applications.
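As an illustration of the co-routine idea, here is a minimal C sketch, not taken from the linked article: a switch jumps back into the middle of a loop so the function resumes where it last returned. The name next_value and its static state variables are hypothetical, and keeping state in statics means only one such sequence can be active at a time.

```c
/* Yields 1, 2, 3 on successive calls, then -1, then starts over.
 * The switch re-enters the for loop at the point of the last return. */
static int next_value(void)
{
    static int state = 0;   /* 0: start from the top, 1: resume inside the loop */
    static int i;           /* loop counter must survive between calls */

    switch (state) {
    case 0:
        for (i = 1; i <= 3; i++) {
            state = 1;
            return i;       /* "yield" the current value */
    case 1:;                /* resume here on the next call */
        }
    }
    state = 0;              /* sequence exhausted; reset for reuse */
    return -1;
}
```

The case label inside the for body is legal for the same reason Duff's device is: in C, case labels may appear anywhere within the statement controlled by the switch.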
You express your intent in the highest level code possible, and trust the Swift compiler to optimize it for you, instead of trying to optimize it yourself. Swift is a high level language. You don't do low level loop unrolling in a high level language.
And in Swift, especially, you don't need to worry about copying arrays (the original application of Duff's Device) because Swift pretends to copy an array whenever you assign it, using "copy on write." This means that it will use the same array for two variables as long as you are just reading from them, but as soon as you modify one of them, it will create a duplicate in the background.
For example, from https://developer.apple.com/documentation/swift/array
Modifying Copies of Arrays
Each array has an independent value that includes the values of all
of its elements. For simple types such as integers and other structures,
this means that when you change a value in one array, the value of that
element does not change in any copies of the array. For example:
var numbers = [1, 2, 3, 4, 5]
var numbersCopy = numbers
numbers[0] = 100
print(numbers)
// Prints "[100, 2, 3, 4, 5]"
print(numbersCopy)
// Prints "[1, 2, 3, 4, 5]"
//runs through initial values and set them to null and zero;
for(int g =0;g<Arraysize;g++){Array1[g].word="NULL";Array1[g].usage=0;}
//struct
int Arraysize = 100;
struct HeavyWords{
string word;
int usage;
};
//runs through txt file and checks if word has already been stored; if it hasn't,
//it adds it as the next point in the struct; if it has, it adds to the usage int at that point in my array of structs
while (myfile >> Bookword)
{totalwords++;cout<<Bookword<<endl;
bool foundWord = false;
for(int q = 0;q<counter;q++)
{
if(Array1[q].word == Bookword)
{
Array1[q].usage++;
foundWord = true;
}
}
if(foundWord == false) {
Array1[counter].word = Bookword;
Array1[counter].usage = 1;
counter++;
//cout<<counter<<endl;
}
//double size of array when the counter reaches array size
if(counter==Arraysize)
{
HeavyWords * Array2;
Array2 = new HeavyWords[2*Arraysize];
for (int k= 0;k<Arraysize;k++)
{
Array2[k].word = Array1[k].word;
}
Arraysize = 2*Arraysize;
Arraydouble++;
HeavyWords* cursor = Array1;
Array1 = Array2;
delete [] cursor;
}
}
//I just started programming in C++, so I apologize if this code is an explosion of nonsense.
//Here is my code.
//I have been racking my brain as to why it is not correctly storing the usage of each word; when I run it, it gives me incorrect counts for certain words.
//I would really love it if someone could tell me where my logic went wrong.
Immediate problem - While copying Array1 to Array2 you are not copying the usage.
Solution - copy the usage. A statement such as Array2[k] = Array1[k] would do.
Suggestions:
You are also not breaking out of the first loop when you find a match in Array1 for the word you are looking for. Your code needlessly continues to iterate over the entire array even when, say, a match was found at the 10th index and you could have left the for loop.
You are reinventing the wheel. You need an expandable array; the C++ STL has one ready-made for you, called vector.
Also, an array or vector does NOT look like the right choice for what you are trying to do. For each word you are doing a linear search over Array1. A map from the C++ STL would do this neatly AND efficiently, and your code would also be much shorter. You can look up how to code with maps. If you write some code, I can help further. Or wait; someone here will write out the entire code for you :).
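The immediate fix and the early-exit suggestion can be sketched in plain C as follows. This is only an illustration under assumptions: the names heavy_word, grow, count_word and the fixed WORD_MAX buffer are hypothetical stand-ins for the poster's string-based struct, and longer words are silently truncated.

```c
#include <stdlib.h>
#include <string.h>

#define WORD_MAX 32   /* assumed maximum word length, including the terminator */

struct heavy_word {
    char word[WORD_MAX];
    int usage;
};

/* Doubles the array. Returns the new block, or NULL on failure
 * (in which case the old block is left untouched). */
static struct heavy_word *grow(struct heavy_word *old, size_t *capacity)
{
    struct heavy_word *bigger = malloc(2 * *capacity * sizeof *bigger);
    if (bigger == NULL)
        return NULL;
    for (size_t k = 0; k < *capacity; k++)
        bigger[k] = old[k];              /* whole-struct copy: word AND usage */
    free(old);
    *capacity *= 2;
    return bigger;
}

/* Counts one occurrence of word. Returns the (possibly moved) array,
 * or NULL if growing failed. */
static struct heavy_word *count_word(struct heavy_word *arr, size_t *used,
                                     size_t *capacity, const char *word)
{
    for (size_t q = 0; q < *used; q++) {
        if (strcmp(arr[q].word, word) == 0) {
            arr[q].usage++;
            return arr;                  /* stop at the first match */
        }
    }
    if (*used == *capacity) {            /* grow before writing past the end */
        struct heavy_word *bigger = grow(arr, capacity);
        if (bigger == NULL)
            return NULL;
        arr = bigger;
    }
    strncpy(arr[*used].word, word, WORD_MAX - 1);
    arr[*used].word[WORD_MAX - 1] = '\0';
    arr[*used].usage = 1;
    (*used)++;
    return arr;
}
```

Copying the whole struct in grow is what keeps the usage counts alive across a resize; the lookup is still a linear search, so the map suggestion above remains the better fit for large inputs.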
I want to generate 5 random positions on a map. I can only come up with the code below, which uses while (1) and break:
int map[10][10];
memset(map,0,sizeof(map));
for (int i = 0; i < 5; i++) {
while (1) {
int x = RAND_FROM_TO(0, 10);
int y = RAND_FROM_TO(0, 10);
if (map[x][y]==0) {
map[x][y]=1;
break;
}
}
}
Is there any other way to do the same job without while(1), because I have been told the while(1) is very bad.
I just want to find a simple way to do it, so the efficiency of the generating random numbers is not under my consideration.
You can use a shuffle algorithm such as Fisher–Yates. I would pose a modified (truncated) version as so:
Express your XY coordinates as a single number.
Construct a list of all coordinates.
Pick one at random, mark it.
Remove that coordinate from the list (swap it with the one at the end of the list, and treat the list as 1 element shorter)
repeat with the list that no longer contains the marked coordinate.
This way, rather than choosing 5 numbers from 0-99, you choose one 0-99, 0-98, ... 0-95, which guarantees that you can complete the task with exactly 5 choices.
EDIT: Upon further consideration, step 1 is not strictly necessary, and you could use this on a system with sparse coordinates if you did it that way.
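The steps above can be sketched in C roughly as follows. The function name mark_random_cells and the ROWS/COLS constants are hypothetical, and rand() % remaining stands in for the question's RAND_FROM_TO macro (it has a slight modulo bias, which the question says is not a concern).

```c
#include <stdlib.h>

#define ROWS 10
#define COLS 10

/* Marks `picks` distinct cells of map with 1 using a truncated
 * Fisher-Yates shuffle. Requires 0 <= picks <= ROWS * COLS. */
static void mark_random_cells(int map[ROWS][COLS], int picks)
{
    int cells[ROWS * COLS];
    int remaining = ROWS * COLS;

    /* Express each (x, y) as the single number x * COLS + y. */
    for (int i = 0; i < remaining; i++)
        cells[i] = i;

    for (int p = 0; p < picks; p++) {
        int j = rand() % remaining;          /* pick among the unused cells */
        int cell = cells[j];
        map[cell / COLS][cell % COLS] = 1;   /* mark it */
        cells[j] = cells[remaining - 1];     /* swap in the last unused cell */
        remaining--;                         /* the list is now one shorter */
    }
}
```

Each pick succeeds on the first try, so the loop runs exactly `picks` times and no while (1) is needed.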
What about something like this:
// Create an array of valid indexes for both x and y.
NSMutableArray *xCoords = [NSMutableArray array];
NSMutableArray *yCoords = [NSMutableArray array];
for (int i = 0; i < 10; ++i) {
[xCoords addObject:@(i)];
[yCoords addObject:@(i)];
}
int map[10][10];
memset(map, 0, sizeof(map));
for (int i = 0; i < 5; ++i) {
// Pick a random x coordinate from the valid x coordinate list.
int rand = RAND_FROM_TO(0, [xCoords count]);
int x = [[xCoords objectAtIndex:rand] intValue];
// Now remove that coordinate so it cannot be picked again.
[xCoords removeObjectAtIndex:rand];
// Repeat for y.
rand = RAND_FROM_TO(0, [yCoords count]);
int y = [[yCoords objectAtIndex:rand] intValue];
[yCoords removeObjectAtIndex:rand];
assert(map[x][y] == 0);
map[x][y] = 1;
}
Note: I'm using NSMutableArray because you originally specified Objective-C as a tag.
Note 2: An array of valid indexes is not the most efficient representation. Using NSMutableIndexSet instead is left as an exercise to the reader. As is using basic C primitives if you don't / can't use NSMutableArray.
Note 3: This has a bug where if you pick, say, x = 3 the first time, no further choices will end up with x = 3, even though there will be valid choices where x = 3 but y is different. Fixing that is also left as an exercise, but this does satisfy your requirements, on the surface.
Suppose I have an array A = {a,b,c,d,e,f,g} and a set of (zero-based) indexes I={1,3,5} in A. Now suppose that I actually don't have A, but only the array which is the result of removing the indexes specified in I from A, i.e. B = {a,c,e,g} (I also have I itself).
Given an index in B, can I analytically calculate the corresponding index in A? For example, for the index 3 in B the answer should be 6.
It's easy to think of an O(|A|) solution, but it's unacceptable as A can get pretty big. An O(|I|) solution should be fine. Also note that I may change periodically (more indexes removed).
Perhaps use an array that stores, for each element of B, the number of removed elements that precede it in A ({0,1,2,3} in the example). To convert, take the index into B, look up that count, and add it to the index to get the index into A. This takes additional space equal to the size of B, but each lookup is O(1).
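A minimal C sketch of that lookup table, assuming the removed indexes are sorted in ascending order; the names build_offsets and b_to_a are hypothetical. Building the table takes O(|B| + |I|) time, and it has to be rebuilt whenever I changes.

```c
#include <stddef.h>

/* Fills offset[b] with the number of removed indexes that precede the
 * element B[b] in A. `removed` must be sorted in ascending order. */
static void build_offsets(const size_t *removed, size_t n_removed,
                          size_t *offset, size_t size_b)
{
    size_t k = 0;   /* removed indexes known to lie before the current element */
    for (size_t b = 0; b < size_b; b++) {
        /* b + k is the candidate position in A; every removed index at or
         * before it pushes the real position one further along. */
        while (k < n_removed && removed[k] <= b + k)
            k++;
        offset[b] = k;
    }
}

/* Converts an index into B to the corresponding index into A. */
static size_t b_to_a(const size_t *offset, size_t b)
{
    return b + offset[b];
}
```

With I = {1,3,5} and |B| = 4 the table comes out as {0,1,2,3}, so index 3 in B maps to 3 + 3 = 6 in A, matching the example in the question.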
"I" splits original array into some slices. We can get B concatenating these slice. With I={1,3,5} we get slices {0, 0}, {2,2}, {4,4}, {6,lastA} We can create an ordered map where the keys are indices in B and the values are slices.
{ 0: {0,0}, 1: {2, 2}, 2: {4, 4}, 3: {6,lastA} }
Actually, we don't need to keep upper bound of each slice
{ 0: 0, 1: 2, 2: 4, 3: 6 }
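In C++ the code may look like this:
#include <cassert>
#include <functional>
#include <map>
#include <vector>

// I must be sorted in ascending order.
std::function<size_t (size_t)> getIndexConverter(size_t sizeOfA, std::vector<size_t> I)
{
    // Key: first index of a slice in B. Value: first index of that slice in A.
    // Keys are kept in descending order so that lower_bound(b) finds the
    // slice with the largest start that is <= b.
    std::map<size_t, size_t, std::greater<size_t>> abIndices;
    size_t sliceStart = 0;
    for (size_t i = 0, imax = I.size(); i < imax; ++i) {
        if (sliceStart < I[i])
            abIndices.emplace(sliceStart - i, sliceStart);
        sliceStart = I[i] + 1;
    }
    if (sliceStart < sizeOfA)
        abIndices.emplace(sliceStart - I.size(), sliceStart);
    return [abIndices](size_t bIndex) -> size_t {
        auto slice = abIndices.lower_bound(bIndex);
        assert(slice != abIndices.end()); // cannot happen for a valid index into B, given how abIndices is built
        return bIndex - slice->first + slice->second;
    };
}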
full example on Ideone
This method requires additional memory proportional to the number of slices, and each lookup takes time logarithmic in the number of slices.
I'm currently using a random number and a series of if statements to assign a pointer to one of four lists using the following:
struct listinfo//struct holds head, tail and the number of entries for the n2l, norm, known and old lists
{
struct vocab * head;
int entries;
struct vocab * tail;
};
...
int list_selector=0;
struct listinfo * currentlist = NULL;
//select a list at random, using the percentage probabilities in the if statements.
//FISH! Can this be done with a switch and ranges?
list_selector = (rand() % 100)+1;
if (list_selector<33) currentlist = &n2l;
if (list_selector>32&&list_selector<95) currentlist=&norm;
if (list_selector>94&&list_selector<100) currentlist = &known;
if (list_selector==100) currentlist = &old;
I was just wondering if there's a neater way to do this using ranges in the switch as in this question.
If so, an example would be great. Any additional tips would be much appreciated too.
Edit: Fixed! Was linking to wrong page instead of this.
I don't believe C supports ranges in switch statements. You could use an if-else construct to reduce comparisons:
if( x == 100 )
...
else if( x > 94 )
...
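Also, your random numbers are slightly biased: RAND_MAX + 1 is unlikely to be divisible by 100, so with % 100 some numbers come up more often than others. A commonly recommended way to map rand() onto 1 to 100 scales the whole value, so the result depends on the high-order bits rather than the low-order ones (a small bias remains whenever RAND_MAX + 1 is not a multiple of 100):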
x = 1 + (int) ( 100.0 * ( rand() / ( RAND_MAX + 1.0 ) ) );
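If exact uniformity matters, one option is rejection sampling: discard raw rand() values beyond the largest multiple of 100 and draw again. The sketch below pairs that with the if-else chain; the function names uniform_1_to_100 and list_for_roll are hypothetical, and the list is returned as a number (0 = n2l, 1 = norm, 2 = known, 3 = old) only to keep the example self-contained.

```c
#include <stdlib.h>

/* Returns an integer in [1, 100], each equally likely as long as
 * rand() itself is uniform over [0, RAND_MAX]. */
static int uniform_1_to_100(void)
{
    const unsigned long span = (unsigned long)RAND_MAX + 1UL;
    const unsigned long limit = span - span % 100UL;  /* largest multiple of 100 */
    unsigned long r;
    do {
        r = (unsigned long)rand();
    } while (r >= limit);            /* reject the uneven tail and retry */
    return (int)(r % 100UL) + 1;
}

/* Maps a roll in [1, 100] to a list number using the question's ranges:
 * 1..32 -> 0 (n2l), 33..94 -> 1 (norm), 95..99 -> 2 (known), 100 -> 3 (old). */
static int list_for_roll(int roll)
{
    if (roll == 100)
        return 3;
    else if (roll > 94)
        return 2;
    else if (roll > 32)
        return 1;
    else
        return 0;
}
```

Testing the rarest case first, as above, keeps the chain short to read; each roll needs at most three comparisons.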
As noted before, C doesn't support ranges as switch case selectors. For a compact and efficient form, you could use C's ternary operator, eg:
r = (random() % 100)+1;
currentlist = r<33? &n2l : r<95? &norm : r<100? &known : &old;
or if you prefer could use nested if's, like:
currentlist = &n2l;
if (r>32) {
currentlist = &norm;
if (r>94) {
currentlist = &known;
if (r==100)
currentlist = &old;
}
}
Note, per man rand, "on older rand() implementations... the lower-order bits are much less random than the higher-order bits", so I prefer random() to rand(). Or go with the 100.0 * ( rand() / ( RAND_MAX + 1.0)) formula suggested by asc99c.
No, in this situation, using less-than and greater-than comparisons is much more efficient. For a switch, you'd have to write out every possible value:
case 1:
case 2:
case 3:
case 4:
/* write all the numbers out all the way up to 31, then for 32 */
case 32: currentlist = &n2l;
break;
case 33:
case 34:
case 35:
/* write out all of these numbers until 93, then for 94 */
case 94: currentlist = &norm;
break;
/*etc*/
If it's possible to change that (rand() % 100)+1 to provide fewer possible values, you'll have a much easier time with the solution.
If that's not possible, I'm trying to come up with a solution, but it seems like a tricksy problem at first glance (without listing every possible value).