Efficient Implementation of Fitness-Proportionate "Roulette" Selection - c

I am currently writing a keyboard layout optimization algorithm in C (such as the one designed by Peter Klausler) and I want to implement a fitness-proportionate selection as described here (PDF Link):
With roulette selection you select members of the population based on a roulette wheel model. Make a pie chart, where the area of a member's slice relative to the whole circle is the ratio of the member's fitness to the total fitness of the population. As you can see, if a point on the circumference of the circle is picked at random, those population members with higher fitness will have a higher probability of being picked. This ensures natural selection takes place.
The problem is, I don't see how to implement it efficiently. I've thought of two methods: one is unreliable, and the other is slow.
First, the slow one:
For a keyboard pool of length N, create an array of length N where each element of the array actually contains two elements, a minimum and a maximum value. Each keyboard has a corresponding minimum and maximum value, and the range is based on the fitness of the keyboard. For example, if keyboard zero has a fitness of 10, keyboard one has a fitness of 20, and keyboard two has a fitness of 25, it would look like this:
Code:
array[0][0] = 0; // minimum
array[0][1] = 9; // maximum
array[1][0] = 10;
array[1][1] = 29;
array[2][0] = 30;
array[2][1] = 54;
(In this case a lower fitness is better, since it means less effort is required.)
Then generate a random number. For whichever range that number falls into, the corresponding keyboard is "killed" and replaced with the offspring of a different keyboard. Repeat this as many times as desired.
The problem with this is that it is very slow: each selection takes an O(N) scan, and the ranges have to be rebuilt after every replacement, so doing N replacements takes O(N^2) operations.
Next, the fast one:
First figure out what the lowest and highest fitnesses for the keyboards are. Then generate a random number between (lowest fitness) and (highest fitness) and kill all keyboards with a fitness higher than the generated number. This is efficient, but it's not guaranteed to only kill half the keyboards. It also has somewhat different mechanics from a "roulette wheel" selection, so it may not even be applicable.
So the question is, what is an efficient implementation?
There is a somewhat efficient algorithm on page 36 of this book (Link), but the problem is, it's only efficient if you do the roulette selection only one or a few times. Is there any efficient way to do many roulette selections in parallel?

For one thing, it sounds like you are talking about unfitness scores if you want to "kill off" your selection (which is likely to be a keyboard with a high score).
I see no need to maintain a minimum and a maximum value per keyboard. I think the simplest way is to maintain a single array of scores, which you then iterate through to make a choice:
/* These will need to be populated at the outset */
int scores[100];
int totalScore;
for (int gen = 0; gen < nGenerations; ++gen) {
    /* Perform a selection and update */
    int r = rand() % totalScore; /* HACK: using % introduces bias */
    int t = 0;
    for (int i = 0; i < 100; ++i) {
        t += scores[i];
        if (r < t) {
            /* Bingo! */
            totalScore -= scores[i];
            keyboards[i] = generate_new_keyboard_somehow();
            scores[i] = score_keyboard(keyboards[i]);
            totalScore += scores[i]; /* Now totalScore is correct again */
            break; /* Essential: once r < t holds, it holds for every later
                      i, and without the break we'd replace the whole pool */
        }
    }
}
Each selection/update takes O(n) time for n keyboards.
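If one O(n) scan per pick is still too much, a common refinement (a sketch of my own, not part of the answer above) is to keep a cumulative-score array and binary-search it, so each pick costs O(log n). The catch is that the cumulative array must be rebuilt (O(n)) after a score changes, so this pays off when many picks happen between updates; if every pick is followed by an update, a Fenwick (binary indexed) tree gives O(log n) for both operations.
#include <stdlib.h>

/* Sketch: roulette pick in O(log n) via prefix sums.
 * prefix[i] = scores[0] + ... + scores[i]; prefix[n-1] is the total. */
int roulette_pick(const int *prefix, int n)
{
    int r = rand() % prefix[n - 1]; /* same modulo-bias caveat as above */
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        int mid = (lo + hi) / 2;    /* find the first prefix[mid] > r */
        if (prefix[mid] > r)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}

/* Rebuild after scores change: O(n). */
void build_prefix(const int *scores, int *prefix, int n)
{
    prefix[0] = scores[0];
    for (int i = 1; i < n; ++i)
        prefix[i] = prefix[i - 1] + scores[i];
}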

Related

Specific permutations of 32 card deck (in C)

I want to generate all permutations of a 32-card deck. I represent the cards as numbers 0-7, since I don't care about the suit of a card. The game is very simple (divide the deck into two groups, compare two cards, add both cards to the group of the bigger card). I have already coded this part of the game, but the deck is currently generated randomly, and I want to look at all possible decks and gather some statistics about them. How can I code this deck generation? I have no idea how to approach it.
Because I was just studying Aaron Williams' 2009 paper "Loopless Generation of Multiset Permutations by Prefix Shifts", I'll contribute a version of his algorithm, which solves precisely this problem. I believe it to be faster than the standard C++ next_permutation, which is usually cited for this problem, because it doesn't rely on searching the input vector for the pivot point. But more extensive benchmarking would be required to produce a definitive answer; it is quite possible that it ends up moving more data around.
Williams' implementation of the algorithm avoids data movement by storing the permutation in a linked list, which allows the "prefix shift" (rotate a prefix of the vector by one position) to be implemented by just modifying two next pointers. That makes the algorithm loopless.
My version here differs in a couple of ways.
First, it uses an ordinary array to store the values, which means that the shift does require a loop. On the other hand, it avoids having to implement a linked-list datatype, and many operations are faster on arrays.
Second, it uses suffix shifts rather than prefix shifts; in effect, it produces the reverse of each permutation (compared with Williams' implementation). I did that because it simplifies the description of the starting condition.
Finally, it just does one permutation step. One of the great things about Williams' algorithm is that the state of the permutation sequence can be encapsulated in a single index value (as well as the permutation itself, of course). This implementation returns the state to be provided to the next call. (Since the state variable will be 0 at the end, the return value doubles as a termination indicator.)
Here's the code:
/* Do a single permutation step of v in reverse coolex order, using
 * a modification of Aaron Williams' loopless prefix shift algorithm.
 * v must have length n. It may have repeated elements; the permutations
 * generated will be unique.
 * For the first call, v must be sorted into non-descending order and the
 * third parameter must be 1. For subsequent calls, the third parameter must
 * be the return value of the previous call. When the return value is 0,
 * all permutations have been generated.
 */
unsigned multipermute_step(int* v, unsigned n, unsigned state) {
    int old_end = v[n - 1];
    unsigned pivot = state < 2 || v[state - 2] > v[state] ? state - 1 : state - 2;
    int new_end = v[pivot];
    for (; pivot < n - 1; ++pivot) v[pivot] = v[pivot + 1];
    v[pivot] = new_end;
    return new_end < old_end ? n - 1 : state - 1;
}
In case that comment was unclear, you could use the following to produce all shuffles of a deck of 4*k cards without regard to suit:
unsigned n = 4 * k;
int v[n];
for (unsigned i = 0; i < k; ++i)
    for (unsigned j = 0; j < 4; ++j)
        v[4 * i + j] = i;
unsigned state = 1;
do {
    /* process the permutation */
} while ((state = multipermute_step(v, n, state)));
Actually trying to do that for k == 8 will take a while, since there are 32!/(4!)^8 possible shuffles. That's about 2.39×10^24. But I did do all the shuffles of decks of 16 cards in 0.3 seconds, and I estimate that I could have done 20 cards in half an hour.

Can I speed up this function? [closed]

I'm trying to write John Conway's Game of Life in C, but I'm having trouble adding living cells to the board. The function I wrote to handle it is extremely slow.
Thought process: I want to add n living cells to the board randomly, so while there are cells left to set alive, get a random (x, y) pair, and if that cell is dead, make it living. That way I can guarantee n cells become alive.
Is my understanding of the problem incorrect, or am I just being inefficient? Why is it so slow, and how can I make it faster?
void add_cells( int board[BOARD_WIDTH][BOARD_HEIGHT], int n )
{
    // Randomly set n dead cells to live state.
    while ( n )
    {
        int randX = rand() % BOARD_WIDTH;
        int randY = rand() % BOARD_HEIGHT;
        if( board[randX][randY] == 0 )
        {
            board[randX][randY] = 1;
            n--;
        }
    }
}
If, let's say, 70% of the cells are alive, your program will have to pick another cell 7 times out of 10, which makes for unnecessary repetition.
You could pop the selected cell out of a "remaining cells" array when you set it alive, and select your cell randomly from this array. I suggest using a dynamically resizable container so you don't have to shift your entire "remaining cells" array each time you pop out a cell. This should save you time; a sketch of the idea follows.
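A minimal sketch of that idea in C (my illustration, using the question's board shape; the names dead and count are made up): collect the indices of the dead cells once, then pick from that list and remove the pick by swapping in the last element, so every iteration is guaranteed to make progress.
#include <stdlib.h>

void add_cells(int board[BOARD_WIDTH][BOARD_HEIGHT], int n)
{
    /* Indices of all currently dead cells, flattened to x * BOARD_HEIGHT + y.
     * For a large board this scratch array belongs on the heap. */
    int dead[BOARD_WIDTH * BOARD_HEIGHT];
    int count = 0;
    for (int x = 0; x < BOARD_WIDTH; x++)
        for (int y = 0; y < BOARD_HEIGHT; y++)
            if (board[x][y] == 0)
                dead[count++] = x * BOARD_HEIGHT + y;

    while (n > 0 && count > 0) {
        int pick = rand() % count;  /* modulo bias ignored in this sketch */
        int idx = dead[pick];
        board[idx / BOARD_HEIGHT][idx % BOARD_HEIGHT] = 1;
        dead[pick] = dead[--count]; /* O(1) removal: swap in the last index */
        n--;
    }
}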
There are several issues that might explain some slowness in your problem:
Is the board initialized to 0 before calling add_cells()? If the board has random contents, finding dead cells might take an arbitrary long time, or potentially take forever if fewer than n cells are dead.
Are you sure the board is correctly defined? The 2D array seems more natural with y being the first dimension and x the second: using int board[BOARD_HEIGHT][BOARD_WIDTH] and swapping the index values for randX and randY.
Testing for (n > 0) would protect against an infinite loop if add_cells() is ever called with a negative n.
If n is large, finding dead cells can take a long time as shooting at random has a small chance of hitting one.
If n is larger than BOARD_WIDTH * BOARD_HEIGHT or if there are fewer than n dead cells, the loop will iterate forever.
If n is large or if the board has only a few dead cells, it would be more efficient to enumerate the dead cells and choose the target cells at random from the dead cells only. The drawback is that such a method would be slower if n is small and the board has many dead cells.
The time complexity for n small compared to the number of dead cells is O(n), which is hard to beat and should be very fast on current hardware, but it tends towards O(n * BOARD_WIDTH * BOARD_HEIGHT) if n is large or close to the number of dead cells, which is much less efficient, and the function never finishes if n is greater than the number of dead cells.
If the board is known to be empty when add_cells() is called, and n is larger than BOARD_WIDTH * BOARD_HEIGHT / 2, it would be more efficient to set all cells alive and choose n cells to kill.
If the board is not necessarily empty, passing this function the number of live cells would help decide which approach is better and if there are at least n dead cells without the need for a lengthy loop to enumerate the dead cells.
If your board is contiguous in memory, you don't have to call rand() twice. You can just use rand() % (BOARD_WIDTH * BOARD_HEIGHT).
void add_cells(uint8_t board[BOARD_WIDTH][BOARD_HEIGHT], int n)
{
    /* A default-constructed engine would produce the same sequence on
     * every call, so seed it once from std::random_device. */
    static std::mt19937 eng{std::random_device{}()};
    std::uniform_int_distribution<int> dist(0, BOARD_WIDTH * BOARD_HEIGHT - 1);
    while (n)
    {
        int index = dist(eng);
        uint8_t* cell = (uint8_t*)board + index;
        if (*cell == 0)
        {
            *cell = 1;
            --n;
        }
    }
}
The modulo operation is pretty slow; scaling is an alternative: (int)((double)rand() / ((double)RAND_MAX + 1) * BOARD_WIDTH) maps rand() into [0, BOARD_WIDTH - 1]. (Beware the often-suggested (float)rand()/RAND_MAX*BOARD_WIDTH + 0.5, which is biased and can even yield BOARD_WIDTH itself.)
You can also use a faster rand, see here
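For illustration, one classic "faster rand" is Marsaglia's xorshift32 (my example, not necessarily the one behind the link). It costs a few shifts and XORs per number; its statistical quality is modest but plenty for scattering cells:
#include <stdint.h>

/* xorshift32: state must be initialized to a nonzero seed. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Usage sketch:
 *   uint32_t s = 2463534242u;
 *   int index = xorshift32(&s) % (BOARD_WIDTH * BOARD_HEIGHT);
 */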

How to understand linear partitioning in dynamic programming

A couple of days ago I learned about the linear partitioning problem; here is my code for it. Is this code right? I don't understand the formula behind it or why it works. If you can, please explain why the formula works.
for (int i = 1; i <= n; i++) {
    rsq[i] = rsq[i-1] + arr[i];
}
int dp[n+1][k+1];
for (int i = 0; i <= n; i++) {
    for (int j = 0; j <= k; j++) {
        dp[i][j] = 987654321;
    }
}
dp[0][0] = 0;
for (int i = 1; i <= n; i++) {
    dp[i][1] = rsq[i];
}
for (int i = 1; i <= k; i++) {
    dp[1][i] = arr[1];
}
for (int i = 2; i <= n; i++) {
    for (int j = 2; j <= k; j++) {
        for (int x = 1; x < i; x++) {
            int s = max(dp[x][j-1], rsq[i] - rsq[x]);
            if (dp[i][j] > s) dp[i][j] = s;
        }
    }
}
cout << dp[n][k];
Thanks in advance.
Following this explanation, the semantics of the state space dp are apparently as follows: arr contains the sizes of the items to process, and rsq contains the partial sums, precalculated to avoid recomputing them.
dp[i][j] = minimum possible cost over all partitions of
           arr[1],...,arr[i] into j ranges, where i in {1,...,n}
           and j in {1,...,k}, or positive infinity if such a
           partition does not exist
Apparently in the implementation 987654321 is used to model the value of positive infinity. Note that in the explanation, the axes of the state space are exchanged compared to the implementation in the original question. Based on this definition, we obtain the following recurrence relation for the values of the states.
dp[i][j] = min{ max{ dp[x][j-1], sum_{i'=x+1}^{i} arr[i'] } : x in {1,...,i-1} }
In the implementation, the inner sum is precalculated in rsq: sum_{i'=x+1}^{i} arr[i'] = rsq[i] - rsq[x]. The recurrence can be interpreted as follows. Suppose all values dp[x][j-1] are known, i.e. the optimal costs of partitioning the first x items into j-1 ranges. Then dp[i][j] is obtained by trying every split point x: the items arr[x+1],...,arr[i] together constitute the last range (its cost is the sum of its members), while the first x items are partitioned optimally into j-1 ranges using the precalculated value. The cost of a particular split point is the maximum of these two values, and dp[i][j] is the minimum over all split points.
Intuitively, this can be seen as partitioning the items arr[1],...,arr[n] at an arbitrary split point. The items to the right are considered as one range (the cost of which is the sum of their members, as they are placed together into one range); the items to the left are recursively partitioned optimally into one range less. For example, partitioning arr = {1,2,3,4,5} into k = 2 ranges: the split {1,2,3} | {4,5} costs max(6,9) = 9, which beats {1,2,3,4} | {5} with max(10,5) = 10, so dp[5][2] = 9. The dynamic programming algorithm (besides the precalculation of the partial sums) initializes the base cases, which correspond to placing the first i items into a single range and a single item into any number of ranges, and organizes the evaluation of the states in such a way that all values needed for the next larger value j of the second axis have already been calculated.
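For reference, here is the question's code packaged as a self-contained C function (my packaging; the recurrence, the base cases, and the 987654321 stand-in for infinity are taken directly from the question, with arr 1-based as there):
/* Minimum over all partitions of arr[1..n] into k contiguous ranges
 * of the largest range sum. arr[0] is unused (1-based, as in the question). */
int linear_partition(const int arr[], int n, int k)
{
    const int INF = 987654321;          /* stands in for positive infinity */
    int rsq[n + 1];                     /* prefix sums */
    int dp[n + 1][k + 1];

    rsq[0] = 0;
    for (int i = 1; i <= n; i++)
        rsq[i] = rsq[i - 1] + arr[i];

    for (int i = 0; i <= n; i++)
        for (int j = 0; j <= k; j++)
            dp[i][j] = INF;
    for (int i = 1; i <= n; i++)
        dp[i][1] = rsq[i];              /* one range: it takes everything */
    for (int j = 1; j <= k; j++)
        dp[1][j] = arr[1];              /* one item: it sits alone */

    for (int i = 2; i <= n; i++)
        for (int j = 2; j <= k; j++)
            for (int x = 1; x < i; x++) {   /* x = last split point */
                int last = rsq[i] - rsq[x]; /* cost of the final range */
                int s = dp[x][j - 1] > last ? dp[x][j - 1] : last;
                if (s < dp[i][j])
                    dp[i][j] = s;
            }
    return dp[n][k];
}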

Remove 1000Hz tone from FFT array in C

I have an array of doubles which is the result of the FFT applied to an array containing the audio data of a WAV file, to which I have added a 1000 Hz tone.
I obtained this array through the drealft routine defined in "Numerical Recipes". (I must use it.)
(The original array has a length that is a power of two.)
My array has this structure:
array[0] = first real valued component of the complex transform
array[1] = last real valued component of the complex transform
array[2] = real part of the second element
array[3] = imaginary part of the second element
etc......
Now, I know that this array represents the frequency domain.
I want to determine and kill the 1000 Hz frequency.
I have tried this formula for finding the index of the array which should contain the 1000 Hz frequency:
index = 1000. * NElements /44100;
Also, since I assume that this index refers to an array with real values only, I have determined the correct(?) position in my array, which contains imaginary values too:
int correctIndex = 2;
for (k = 0; k < index; k++) {
    correctIndex += 2;
}
(I know there is surely an easier way, but this is the first that came to mind.)
Then I find this value: 16275892957.123705, which I suppose to be the real part of the 1000 Hz frequency. (Sorry if this is an imprecise statement, but at the moment I do not care to know more about it.)
So I have tried to suppress it:
array[index]=-copy[index]*0.1f;
I don't know exactly why I used this formula, but it is the only one that gives some results; in fact the 1000 Hz tone appears to decrease slightly.
This is the part of the code in question:
double *copy = malloc( nCampioni * sizeof(double));
int nSamples;
/*...Fill copy with audio data...*/
/*...Apply ZERO PADDING and reach the length of 8388608 samples,
     or rather 8388608 double values...*/
/*Apply the FFT (Sure this works)*/
drealft(copy - 1, nSamples, 1);
/*I determine the REAL(?) array index*/
i = 1000. * nSamples /44100;
/*I determine MINE(?) array index*/
int j = 2;
for (k = 0; k < i; k++) {
    j += 2;
}
/*I reduce the array value, AND some other values around it as an attempt*/
for (i = -12; i < 12; i += 2) {
    copy[j-i] = -copy[i-j]*0.1f;
    printf("%d\n", j-i);
}
/*Apply the inverse FFT*/
drealft(copy - 1, nSamples, -1);
/*...Write the audio data on the file...*/
NOTE: for simplicity I omitted the part where I get an array of double from an array of int16_t
How can I determine and totally kill the 1000 Hz frequency?
Thank you!
As Oli Charlesworth writes, because your target frequency is not exactly one of the FFT bins (your index, TargetFrequency * NumberOfElements / SamplingRate, is not exactly an integer), the energy of the target frequency will be spread across all bins. For a start, you can eliminate some of the frequency by zeroing the bin closest to the target frequency. This will of course affect other frequencies somewhat too, since it is slightly off target. To better suppress the target frequency, you will need to consider a more sophisticated filter.
However, for educational purposes: To suppress the frequency corresponding to a bin, simply set that bin to zero. You must set both the real and the imaginary components of the bin to zero, which you can do with:
copy[index*2 + 0] = 0;
copy[index*2 + 1] = 0;
Some notes about this:
You had this code to calculate the position in the array:
int correctIndex = 2;
for (k = 0; k < index; k++) {
    correctIndex += 2;
}
That is equivalent to:
correctIndex = 2*(index+1);
I believe you want 2*index, not 2*(index+1). So you were likely reducing the wrong bin.
At one point in your question, you wrote array[index] = -copy[index]*0.1f;. I do not know what array is. You appeared to be working in place in copy. I also do not know why you multiplied by 1/10. If you want to eliminate a frequency, just set it to zero. Multiplying it by 1/10 only reduces it to 10% of its original magnitude.
I understand that you must pass copy-1 to drealft because the Numerical Recipes code uses one-based indexing. However, the C standard does not support the way you are doing it. The behavior of the expression copy-1 is not defined by the standard. It will work in most C implementations. However, to write supported portable code, you should do this instead:
// Allocate one extra element.
double *memory = malloc((nCampioni+1) * sizeof *memory);
// Make a pointer that is convenient for your work.
double *copy = memory+1;
…
// Pass the necessary base address to drealft.
drealft(memory, nSamples, 1);
// Suppress a frequency.
copy[index*2 + 0] = 0;
copy[index*2 + 1] = 0;
…
// Free the memory.
free(memory);
One experiment I suggest you consider is to initialize an array with just a sine wave at the desired frequency:
for (i = 0; i < nSamples; ++i)
    copy[i] = sin(TwoPi * Frequency / SampleRate * i);
(TwoPi is of course 2*3.1415926535897932384626433.) Then apply drealft and look at the results. You will see that much of the energy is at a peak in the closest bin to the target frequency, but much of it has also spread to other bins. Clearly, zeroing a single bin and performing the inverse FFT cannot eliminate all of the frequency. Also, you should see that the peak is in the same bin you calculated for index. If it is not, something is wrong.

Linear Search Algorithm Optimization

I just finished a homework problem for Computer Science 1 (yes, it's homework, but hear me out!). Now, the assignment is 100% complete and working, so I don't need help on it. My question involves the efficiency of an algorithm I'm using (we aren't graded on algorithmic efficiency yet, I'm just really curious).
The function I'm about to present currently uses a modified version of the linear search algorithm (that I came up with, all by myself!) in order to check how many numbers on a given lottery ticket match the winning numbers, assuming that both the numbers on the ticket and the numbers drawn are in ascending order. I was wondering, is there any way to make this algorithm more efficient?
/*
 * Function: ticketCheck
 *
 * @param struct ticket
 * @param array winningNums[6]
 *
 * Takes in a ticket, counts how many numbers
 * in the ticket match, and returns the number
 * of matches.
 *
 * Uses a modified linear search algorithm,
 * in which the index of the successor to the
 * last matched number is used as the index of
 * the first number tested for the next ticket value.
 *
 * @return int numMatches
 */
int ticketCheck( struct ticket ticket, int winningNums[6] )
{
int numMatches = 0;
int offset = 0;
int i;
int j;
for( i = 0; i < 6; i++ )
{
for( j = 0 + offset; j < 6; j++ )
{
if( ticket.ticketNum[i] == winningNums[j] )
{
numMatches++;
offset = j + 1;
break;
}
if( ticket.ticketNum[i] < winningNums[j] )
{
i++;
j--;
continue;
}
}
}
return numMatches;
}
It's more or less there, but not quite. In most situations, it's O(n), but it's O(n^2) if every ticketNum is greater than every winningNum. (This is because the inner j loop doesn't break when j==6 like it should, but runs the next i iteration instead.)
You want your algorithm to increment either i or j at each step, and to terminate when i==6 or j==6. [Your algorithm almost satisfies this, as stated above.] As a result, you only need one loop:
for (i=0,j=0; i<6 && j<6; /* no increment step here */) {
if (ticketNum[i] == winningNum[j]) {
numMatches++;
i++;
j++;
}
else if (ticketNum[i] < winningNum[j]) {
/* ticketNum[i] won't match any winningNum, discard it */
i++;
}
else { /* ticketNum[i] > winningNum[j] */
/* discard winningNum[j] similarly */
j++;
}
}
Clearly this is O(n); at each stage, it either increments i or j, so the most steps it can do is 2*n-1. This has almost the same behaviour as your algorithm, but is easier to follow and easier to see that it's correct.
You're basically looking for the size of the intersection of two sets. Given that most lottos use around 50 balls (or so), you could store the numbers as bits that are set in an unsigned long long. Finding the common numbers is then a simple matter of ANDing the two together: commonNums = TicketNums & winningNums;.
Finding the size of the intersection is a matter of counting the one bits in the resulting number, a subject that's been covered previously (though in this case, you'd use 64-bit numbers, or a pair of 32-bit numbers, instead of a single 32-bit number).
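A sketch of that approach (my illustration; the plain-array signature and GCC/Clang's __builtin_popcountll are my choices, and a portable fallback would count the bits in a loop):
#include <stdint.h>

/* Count matches by set intersection: one bit per possible ball number.
 * Assumes ball numbers are in the range 0..63. */
int ticketCheckBits(const int ticketNums[6], const int winningNums[6])
{
    uint64_t ticketMask = 0, winningMask = 0;
    for (int i = 0; i < 6; i++) {
        ticketMask  |= (uint64_t)1 << ticketNums[i];
        winningMask |= (uint64_t)1 << winningNums[i];
    }
    uint64_t commonNums = ticketMask & winningMask;
    return __builtin_popcountll(commonNums);    /* count the set bits */
}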
Yes, there is something faster, but it probably uses more memory. Make an array of zeros the size of the range of possible numbers, put a 1 at the index of every drawn number, then for every ticket number add the value at the index of that number.
int NumsArray[MAX_NUMBER+1];
memset(NumsArray, 0, sizeof NumsArray);
for( i = 0; i < 6; i++ )
    NumsArray[winningNums[i]] = 1;
for( i = 0; i < 6; i++ )
    numMatches += NumsArray[ticket.ticketNum[i]];
12 loop rounds instead of up to 36
The surrounding code left as an exercise.
EDIT: It also has the advantage of not needing to sort both set of values.
This is really only a minor change on a scale like this, but if the second loop reaches a number bigger than the current ticket number, it is already allowed to break. Furthermore, if your second loop traverses numbers lower than your ticket number, it may update the offset even if no match is found within that iteration.
PS:
Not to forget: general results on efficiency make more sense if we take the number of balls or the size of the ticket to be variable. Otherwise they are too dependent on the machine.
If instead of comparing the arrays of lottery numbers you were to create two bit arrays of flags -- each flag being set if its index is in that array -- then you could perform a bitwise AND on the two bit arrays (the lottery ticket and the winning number sets) and produce another bit array whose bits are flags for matching numbers only. Then count the bits set.
For many lotteries 64 bits would be enough, so a uint64_t should be big enough to cover this. Also, some architectures have instructions to count the bits set in a register, which some compilers might be able to recognize and optimize for.
The efficiency of this algorithm is based both on the range of lottery numbers (M) and the number of lottery numbers per ticket (N). The setting of the flags is O(N), while the ANDing of the two bit arrays and counting of the bits could be O(M), depending on whether your M (lotto number range) is larger than the size that the target CPU can perform these operations on directly. Most likely, though, M will be small and its impact will likely be less than that of N on the performance.
