Is there a neat algorithm that I can use to fill in random positions in a huge 2D n x n array with m integers without filling in an occupied position? Where , and
Kind of like this pseudo code:
int n;
int m;
void init(int new_n, int new_m) {
    n = new_n;
    m = new_m;
}
void create_grid() {
    int grid[n][n];
    int x, y;
    /* valid indexes run from 0 to n-1 */
    for(x = 0; x < n; x ++) {
        for(y = 0; y < n; y ++) {
            grid[x][y] = 0;
        }
    }
    populate_grid(grid);
}
void populate_grid(int grid[n][n]) {
    int i = 1;
    int x, y;
    while(i <= m) {
        x = get_pos();
        y = get_pos();
        if(grid[x][y] == 0) {
            grid[x][y] = i;
            i ++;
        }
    }
}
int get_pos() {
    return random() % n; /* 0 .. n-1 */
}
... but more efficient for bigger n's and m's. Especially if m is bigger and more positions are occupied, it would take longer to generate a random position that isn't occupied.
Unless the filling factor really gets large, you shouldn't worry about hitting occupied positions.
Assuming, for instance, that half of the cells are already filled, you have a 50% chance of first hitting a filled cell, a 25% chance of hitting two filled ones in a row, 12.5% of hitting three... On average, it takes... two attempts to find an empty place! (More generally, if only a fraction 1/M of the cells is free, the average number of attempts rises to M.)
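The figure of M attempts follows from the geometric distribution. As a short derivation, write p for the fraction of free cells (a symbol introduced here, so p = 1/M) and assume every attempt is independent and uniform over the grid:

```latex
\[
E[\text{attempts}] \;=\; \sum_{k=1}^{\infty} k\,p\,(1-p)^{k-1} \;=\; \frac{1}{p} \;=\; M .
\]
% Filling m cells of a grid with N = n^2 cells, one after another:
\[
E[\text{total attempts}] \;=\; \sum_{i=0}^{m-1} \frac{N}{N-i} .
\]
```

With p = 1/2 the first formula gives the two attempts mentioned above. The second sum stays close to m as long as m is a small fraction of N, and only grows quickly when the grid is nearly full.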
If you absolutely want to avoid having to test the cells, you can initialize an array with the indexes of the free cells. Then, instead of choosing a random cell, you choose a random entry in the array, between 1 and L (the length of the list, initially N²).
After having chosen an entry, you set the corresponding cell, you move the last element in the list to the random position, and set L= L-1. This way, the list of free positions is kept up-to-date.
Note that this process is probably less efficient than blind attempts.
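The free-list idea can be sketched in C roughly as follows. This is a minimal illustration, not a tuned implementation: the function name `fill_with_free_list` is made up for this example, the grid is assumed to be stored row-major in a flat array, indexes are 0-based (so the random entry is 0..L-1 rather than 1..L), and `rand()` stands in for whatever random source you actually use.

```c
#include <stdlib.h>

/* Fill m distinct random cells of an n x n grid (row-major in `grid`)
 * with the values 1..m. Returns 0 on success, -1 on bad input or if
 * memory cannot be allocated.
 * Assumptions: n * n fits in an int, and rand() % L is good enough
 * (it is slightly biased and needs L <= RAND_MAX). */
int fill_with_free_list(int *grid, int n, int m)
{
    long total = (long)n * n;
    if (n <= 0 || m < 0 || m > total)
        return -1;

    int *free_cells = malloc((size_t)total * sizeof *free_cells);
    if (free_cells == NULL)
        return -1;

    for (long c = 0; c < total; c++) {
        grid[c] = 0;
        free_cells[c] = (int)c;             /* every cell starts out free */
    }

    long L = total;                         /* current length of the free list */
    for (int i = 1; i <= m; i++) {
        long k = rand() % L;                /* random entry of the free list */
        grid[free_cells[k]] = i;            /* set the corresponding cell */
        free_cells[k] = free_cells[L - 1];  /* move the last entry into the gap */
        L--;
    }

    free(free_cells);
    return 0;
}
```

Every cell is written exactly once and no occupancy test is needed, at the price of an extra array of n² ints, which is the overhead the answer warns about.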
To generate pseudo-random positions without repeats, you can do something like this:
for (int y=0; y<n; ++y) {
    for(int x=0; x<n; ++x) {
        int u=x,v=y;
        u = (u+hash(v))%n;
        v = (v+hash(u))%n;
        u = (u+hash(v))%n;
        output(u,v);
    }
}
For this to work properly, hash(x) needs to be a good pseudo-random hash function that produces positive numbers that won't overflow when you add to a number between 0 and n.
This is a version of the Feistel structure (https://en.wikipedia.org/wiki/Feistel_cipher), which is commonly used to make cryptographic ciphers like DES.
The trick is that each step like u = (u+hash(v))%n; is invertible -- you can get your original u back by doing u = (u-hash(v))%n (I mean you could if the % operator worked with negative numbers the way everyone wishes it did)
Since you can invert the operations to get the original x,y back from each u,v output, each distinct x,y MUST produce a distinct u,v.
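To make the invertibility argument concrete, here is a small self-contained sketch in C. The names `mix32`, `scramble` and `unscramble` are invented for this example, and `mix32` is just an arbitrary integer mixing function standing in for hash(). Unsigned arithmetic is used so that the subtraction in the inverse never goes negative, which sidesteps the % problem mentioned above. The sketch assumes 0 < n <= 2^31.

```c
#include <stdint.h>

/* Hypothetical stand-in for hash(): an arbitrary 32-bit mixing function.
 * Unsigned multiplication wraps around, which is well defined in C. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* Map (x, y) to a scrambled (u, v); both pairs lie in [0, n) x [0, n).
 * Assumes 0 < n <= 2^31 so that the sums below cannot wrap. */
void scramble(uint32_t n, uint32_t x, uint32_t y, uint32_t *u, uint32_t *v)
{
    x = (x + mix32(y) % n) % n;
    y = (y + mix32(x) % n) % n;
    x = (x + mix32(y) % n) % n;
    *u = x;
    *v = y;
}

/* Undo scramble(): the same three rounds in reverse order, subtracting.
 * Adding n before subtracting keeps the unsigned value from wrapping. */
void unscramble(uint32_t n, uint32_t u, uint32_t v, uint32_t *x, uint32_t *y)
{
    u = (u + n - mix32(v) % n) % n;
    v = (v + n - mix32(u) % n) % n;
    u = (u + n - mix32(v) % n) % n;
    *x = u;
    *y = v;
}
```

Calling `scramble` for every (x, y) in the grid visits each cell exactly once, and stopping after m calls gives m distinct positions. How random the order looks depends entirely on the quality of the mixing function and on the number of rounds.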
Related
I want to generate 5 random positions on a map. I can only come up with the code below, which uses while (1) and break:
int map[10][10];
memset(map,0,sizeof(map));
for (int i = 0; i < 5; i++) {
    while (1) {
        int x = RAND_FROM_TO(0, 10);
        int y = RAND_FROM_TO(0, 10);
        if (map[x][y]==0) {
            map[x][y]=1;
            break;
        }
    }
}
Is there any other way to do the same job without while(1)? I have been told that while(1) is very bad.
I just want to find a simple way to do it, so the efficiency of generating the random numbers is not a concern.
You can use a shuffle algorithm such as Fisher–Yates. I would pose a modified (truncated) version as so:
1. Express your XY coordinates as a single number.
2. Construct a list of all coordinates.
3. Pick one at random, mark it.
4. Remove that coordinate from the list (swap it with the one at the end of the list, and treat the list as 1 element shorter).
5. Repeat with the list that no longer contains the marked coordinate.
This way, rather than choosing 5 numbers from 0-99, you choose one 0-99, 0-98, ... 0-95, which guarantees that you can complete the task with exactly 5 choices.
EDIT: Upon further consideration, step 1 is not strictly necessary, and you could use this on a system with sparse coordinates if you did it that way.
What about something like this:
// Create an array of valid indexes (0 through 9) for both x and y.
NSMutableArray *xCoords = [NSMutableArray array];
NSMutableArray *yCoords = [NSMutableArray array];
for (int i = 0; i < 10; ++i) {
    [xCoords addObject:@(i)];
    [yCoords addObject:@(i)];
}
int map[10][10];
memset(map, 0, sizeof(map));
for (int i = 0; i < 5; ++i) {
    // Pick a random x coordinate from the valid x coordinate list.
    int rand = RAND_FROM_TO(0, [xCoords count]);
    int x = [[xCoords objectAtIndex:rand] intValue];
    // Now remove that coordinate so it cannot be picked again.
    [xCoords removeObjectAtIndex:rand];
    // Repeat for y.
    rand = RAND_FROM_TO(0, [yCoords count]);
    int y = [[yCoords objectAtIndex:rand] intValue];
    [yCoords removeObjectAtIndex:rand];
    assert(map[x][y] == 0);
    map[x][y] = 1;
}
Note: I'm using NSMutableArray because you originally specified Objective-C as a tag.
Note 2: An array of valid indexes is not the most efficient representation. Using NSMutableIndexSet instead is left as an exercise to the reader. As is using basic C primitives if you don't / can't use NSMutableArray.
Note 3: This has a bug where if you pick, say, x = 3 the first time, no further choices will end up with x = 3, even though there will be valid choices where x = 3 but y is different. Fixing that is also left as an exercise, but this does satisfy your requirements, on the surface.
I have a linked list of objects each containing a 32-bit integer (and provably fewer than 2^32 such objects) and I want to efficiently choose an integer that's not present in the list, without using any additional storage (so copying them to an array, sorting the array, and choosing the minimum value not in the array would not be an option). However, the definition of the structure for list elements is under my control, so I could add (within reason) additional storage to each element as part of solving the problem. For example, I could add an extra set of prev/next pointers and merge-sort the list. Is this the best solution? Or is there a simpler or more efficient way to do it?
Given the conditions that you outline in the comments, especially your expectation of many identical values, you must expect a sparse distribution of used values.
Consequently, it might actually be best to just guess a value randomly and then check whether it coincides with a value in the list. Even if half the available value range were used (which seems extremely unlikely from your comments), you would only traverse the list twice on average. And you can drastically decrease this factor by simultaneously checking a number of guesses in one pass. Done correctly, the factor should always be close to one.
The advantage of such a probabilistic approach is that you are immune to bad sequences of values. Such sequences are always possible with range based approaches: If you calculate the min and max of the data, you run the risk, that the data contains both 0 and 2^32-1. If you sequentially subdivide an interval, you run the risk of always getting values in the middle of the interval, which can shrink it to zero in 32 steps. With a probabilistic approach, these sequences can't hurt you.
I think, I would use something like four guesses for very small lists, and crank it up to roughly 16 as the size of the list approaches the limit. The high starting value is due to the fact that any such algorithm will be memory bound, i. e. your CPU has ample amounts of time to check a value while it waits for the next values to arrive from memory, so you better make good use of that time to reduce the number of passes required.
A further optimization would instantly replace a busted guess with a new one and keep track of where the replacement happened, so that you can avoid a complete second pass through the data. Also, move the busted guess to the end of the list of guesses, so that you only need to check against the start position of the first guess in your loop to stop as early as possible.
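A bare-bones version of the multi-guess idea might look like the sketch below. The `struct node` layout, the `rand32` helper and the fixed `GUESSES` count are assumptions made for this example. It deliberately leaves out the refinements described above (replacing a busted guess mid-pass and stopping early), so a failed round simply starts over with fresh guesses.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical list element; the real structure is whatever your list uses. */
struct node {
    uint32_t value;
    struct node *next;
};

/* Hypothetical 32-bit random source built from rand(), which is only
 * guaranteed to deliver 15 random bits per call. */
static uint32_t rand32(void)
{
    return ((uint32_t)rand() << 30) ^ ((uint32_t)rand() << 15) ^ (uint32_t)rand();
}

#define GUESSES 8

/* Return a value that does not occur in the list. Each pass over the
 * list tests GUESSES candidates at once; passes repeat until at least
 * one candidate survives. */
uint32_t find_unused(const struct node *head)
{
    for (;;) {
        uint32_t guess[GUESSES];
        int alive[GUESSES];

        for (int g = 0; g < GUESSES; g++) {
            guess[g] = rand32();
            alive[g] = 1;
        }
        for (const struct node *p = head; p != NULL; p = p->next)
            for (int g = 0; g < GUESSES; g++)
                if (guess[g] == p->value)
                    alive[g] = 0;           /* this guess is busted */

        for (int g = 0; g < GUESSES; g++)
            if (alive[g])
                return guess[g];
    }
}
```

For a sparse list the first pass almost always succeeds, so the cost is one traversal with a handful of comparisons per element.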
If you can spare one pointer in each object, you get an O(n) worst-case algorithm easily (standard divide-and-conquer):
Divide the range of possible IDs equally.
Make a singly-linked list covering each subrange.
If one subrange is empty, choose any id in it.
Otherwise repeat with the elements of the subrange with fewest elements.
Example code using two sub-ranges per iteration:
unsigned getunusedid(element* h) {
    unsigned start = 0, stop = -1;
    element* p;
    /* copy the main links into the scratch links; keep h at the head */
    for(p = h; p; p = p->mainnext)
        p->next = p->mainnext;
    while(h) {
        element *l = 0, *r = 0;
        unsigned cl = 0, cr = 0;
        /* first id of the upper half; the +1 keeps both halves equal in size */
        unsigned mid = start + (stop - start) / 2 + 1;
        while(h) {
            element* next = h->next;
            if(h->id < mid) {
                h->next = l;
                cl++;
                l = h;
            } else {
                h->next = r;
                cr++;
                r = h;
            }
            h = next;
        }
        if(cl < cr) {
            h = l;
            stop = mid - 1;
        } else {
            h = r;
            start = mid;
        }
    }
    return start;
}
Some more remarks:
Beware of bugs in the above code; I have only proved it correct, not tried it.
Using more buckets (best keep to a power of 2 for easy and efficient handling) each iteration might be faster due to better data-locality (though only try and measure if it's not fast enough otherwise), as @MarkDickson rightly remarks.
Without those extra-pointers, you need full sweeps each iteration, raising the bound to O(n*lg n).
An alternative would be using 2+ extra-pointers per element to maintain a balanced tree. That would speed up id-search, at the expense of some memory and insertion/removal time overhead.
If you don't mind an O(n) scan for each change in the list and two extra bits per element, whenever an element is inserted or removed, scan through and use the two bits to represent whether an integer (element + 1) or (element - 1) exists in the list.
For example, inserting the element, 2, the extra bits for each 3 and 1 in the list would be updated to show that 3-1 (in the case of 3) and 1+1 (in the case of 1) now exist in the list.
Insertion/deletion time can be reduced by adding a pointer from each element to the next element with the same integer.
I am supposing that integers have random values not controlled by your code.
Add two unsigned integers in your list class:
unsigned int rangeMinId = 0;
unsigned int rangeMaxId = 0xFFFFFFFF ;
Or if not possible to change the List class add them as global variables.
When the list is empty, you know that the whole range is free. When you add a new item to the list, check whether its ID lies between rangeMinId and rangeMaxId, and if so, move the nearer of the two bounds to this ID.
After a long time, rangeMinId may become equal to rangeMaxId-1; then you need a simple function that traverses the whole list and searches for another free range. But this will not happen very frequently.
Other solutions are more complex and involve sets, binary trees or sorted arrays.
Update:
The free-range search function can be done in O(n*log(n)). An example of such a function is given below (I have not extensively tested it). The example is for an integer array but can easily be adapted for a list.
int g_Calls = 0;
bool _findFreeRange(const int* value, int n, int& left, int& right)
{
    g_Calls ++ ;
    int l=left, r=right,l2,r2;
    int m = (right + left) / 2 ;
    int nl=0, nr=0;
    for(int k = 0; k < n; k++)
    {
        const int& i = value[k] ;
        if(i > l && i < r)
        {
            if(i-l < r-i)
                l = i;
            else
                r = i;
        }
        if(i < m)
            nl ++ ;
        else
            nr ++ ;
    }
    if ( (r - l) > 1 )
    {
        left = l;
        right = r;
        return true ;
    }
    if( nl < nr)
    {
        // check first left then right
        l2 = left;
        r2 = m;
        if(r2-l2 > 1 && _findFreeRange(value, n, l2, r2))
        {
            left = l2 ;
            right = r2 ;
            return true;
        }
        l2 = m;
        r2 = right;
        if(r2-l2 > 1 && _findFreeRange(value, n, l2, r2))
        {
            left = l2 ;
            right = r2 ;
            return true;
        }
    }
    else
    {
        // check first right then left
        l2 = m;
        r2 = right;
        if(r2-l2 > 1 && _findFreeRange(value, n, l2, r2))
        {
            left = l2 ;
            right = r2 ;
            return true;
        }
        l2 = left;
        r2 = m;
        if(r2-l2 > 1 && _findFreeRange(value, n, l2, r2))
        {
            left = l2 ;
            right = r2 ;
            return true;
        }
    }
    return false;
}
bool findFreeRange(const int* value, int n, int& left, int& right, int maxx)
{
    g_Calls = 1;
    left = 0;
    right = maxx;
    if(!_findFreeRange(value, n, left, right))
        return false ;
    left++;
    right--;
    return (right - left) >= 0 ;
}
If it returns false, the list is full and there is no free range (very unlikely). maxx is the upper limit of the range, in this case 0xFFFFFFFF.
The idea is first to search for the biggest free range in the list and then, if no free hole is found, to recursively search the subranges for holes which may have been left during the first pass. If the list is sparsely filled, it is very unlikely that the function will be called more than once. However, when the list becomes almost completely filled, the range search can take longer. Thus, in this worst-case scenario, when the list gets close to full, it is better to start keeping all free ranges in a list.
This reminds me of the book Programming Pearls, and in particular the very first column, "Cracking the Oyster". What is the real problem you are trying to solve?
If your list is small, then a simple linear search to find max/min would work and it would work quickly.
When your list gets large and linear search becomes unwieldy, you can build a bitmap to represent the unused numbers for much less memory than adding 2 extra pointers at each node in the linked list. With one bit per possible value, the bitmap takes 2^32 bits = 2^(32-3) bytes = 512MB of RAM, compared to your linked list being potentially >10GB.
Then to find an unused number, you can just traverse the bitmap one machine-word at a time, checking if it's non-zero. If it is, then at least one number in that 32- or 64- bit block is unused, and you can inspect the word to find out exactly which bit is set. As you add numbers to the list, all you have to do is clear the corresponding bit in the bitmap.
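As a rough sketch of that bitmap in C: the names `free_bitmap`, `bitmap_init`, `bitmap_mark_used` and `bitmap_find_unused` are invented for this example, and the size of the number range is passed in as a parameter (it would be 2^32 for the question) and assumed to be a multiple of 64. The lowest set bit is found with a plain loop; a compiler intrinsic would be faster, but this keeps the sketch portable.

```c
#include <stdint.h>
#include <stdlib.h>

/* One bit per possible number; a set bit means "unused". */
struct free_bitmap {
    uint64_t *words;
    size_t nwords;
};

/* `count` is the size of the number range, assumed to be a multiple of 64.
 * Returns 0 on success, -1 if the memory cannot be allocated. */
int bitmap_init(struct free_bitmap *b, uint64_t count)
{
    b->nwords = (size_t)(count / 64);
    b->words = malloc(b->nwords * sizeof *b->words);
    if (b->words == NULL)
        return -1;
    for (size_t i = 0; i < b->nwords; i++)
        b->words[i] = ~(uint64_t)0;           /* everything starts unused */
    return 0;
}

/* Call this whenever `id` is added to the list. */
void bitmap_mark_used(struct free_bitmap *b, uint64_t id)
{
    b->words[id / 64] &= ~((uint64_t)1 << (id % 64));
}

/* Store an unused number in *id and return 0, or return -1 if none is left. */
int bitmap_find_unused(const struct free_bitmap *b, uint64_t *id)
{
    for (size_t i = 0; i < b->nwords; i++) {
        uint64_t w = b->words[i];
        if (w == 0)
            continue;                         /* all 64 numbers here are in use */
        unsigned bit = 0;
        while (((w >> bit) & 1) == 0)         /* locate the lowest set bit */
            bit++;
        *id = (uint64_t)i * 64 + bit;
        return 0;
    }
    return -1;
}
```

If the list allows duplicates and removals, a bit can only be set again once the last element with that value is gone, which needs bookkeeping that this sketch does not attempt.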
One possible solution is to take the min and max of the list with a simple O(n) iteration, then pick a number between max and min + (1 << 32). This is simple to do since overflow/underflow behavior is well-defined for unsigned integers:
uint32_t min, max;
// TODO: compute min and max here
// exclude max from choice space (min will be an exclusive upper bound)
max++;
uint32_t choice = rand32() % (min - max) + max; // where rand32 is a random unsigned 32-bit integer
Of course, if it doesn't need to be random, then you can just use one more than the maximum of the list.
Note: the only case where this fails is if min is 0 and max is UINT32_MAX (aka 4294967295).
Ok. Here is one really simple solution. Some of the answers have become too theoretical and complicated for optimization. If you need a quick solution do this:
1. In your list, add a member:
unsigned int NextFreeId = 1;
Also add an std::set<unsigned int> ids.
When you add an item to the list, also add the integer to the set and keep track of NextFreeId:
int insert(unsigned int id)
{
    ids.insert(id);
    if (NextFreeId == id) //will not happen too frequently
    {
        unsigned int nextid = id+1, previd = id-1;
        while(true)
        {
            if(nextid < 0xFFFFFFFF && !ids.count(nextid))
            {
                NextFreeId = nextid ;
                break ;
            }
            if(previd > 0 && !ids.count(previd))
            {
                NextFreeId = previd ;
                break ;
            }
            if(previd == 0 && nextid == 0xFFFFFFFF)
                break; // all the range is filled, there is no free id
            // stop at the ends of the range instead of wrapping around
            if(nextid < 0xFFFFFFFF) nextid++ ;
            if(previd > 0) previd -- ;
        }
    }
    return 1;
}
Sets are very efficient at checking whether a value is contained, so the complexity will be O(log(N)). It is quick to implement. Also, the set is not searched every time but only when NextFreeId gets taken. The list is not traversed at all.
So i have this piece of code:
int* get_lotto_draw() //Returns an array of six random lottery numbers 1-49
{
    int min = 1;
    int max = 49;
    int counter = 0;
    srand(time(NULL));
    int *arrayPointer = malloc(6 * sizeof(int));
    for(counter = 0; counter <= 5; counter++)
    {
        arrayPointer[counter] = rand()%(max-min+1)+min; // +1 so that 49 can be drawn
    }
    return arrayPointer;
}
This gives me six ints, but sometimes some of the values are the same. How can I compare each of them to each other, so that if two are the same number, it will recalculate one of the equal values? Thanks for your time.
Search the array for the same number before storing it, like this:
for(counter = 0; counter <= 5; counter++)
{
    int x1 = 1;
    while(x1)
    {
        int i;
        int temp = rand()%(max-min+1)+min; // +1 so that max itself can be drawn
        for(i = 0; i < counter; i++)
        {
            if(arrayPointer[i] == temp)
            {
                break;
            }
        }
        if(i == counter)
        {
            x1 = 0;
            arrayPointer[counter] = temp;
        }
    }
}
The problem is that random-number generators don't know or care about history (at least, not in the sense that you do--software random number generators use recent results to generate future results). So, your choices are generally to...
Live with the possibility of duplicates. Obviously, that doesn't help you.
Review previous results and discard duplicates by repeating the draw until the value is unique. This has the potential to never end, if your random numbers get stuck in a cycle.
Enumerate the possible values, shuffle them, and pick the values off the list as needed. Obviously, that's not sensible if you have an enormous set of possible values.
In your case, I'd pick the third.
Create an array (int[]) big enough to hold one of every possible value (1-49).
Use a for loop to initialize each array value.
Use another for loop to swap each entry with a random index. If you swap a position with itself, that's fine.
Use the first few (6) elements of the array.
You can combine the second and third steps, if you want, but splitting them seems more readable and less prone to error to me.
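For illustration, here is one way the shuffle-and-take-six idea could look in C. Note that this sketch swaps in the Fisher-Yates variant, which picks each swap partner only from the positions that have not been fixed yet (a fully random partner for every entry gives a slightly uneven shuffle), and it stops after six swaps because only the first six entries are used. The function name `lotto_draw` is made up, and the code assumes `srand()` has been called once elsewhere.

```c
#include <stdlib.h>

/* Fill out[0..5] with six distinct numbers from 1..49.
 * Assumes srand() was called once at program start; rand() % k is
 * slightly biased, which is ignored here for simplicity. */
void lotto_draw(int out[6])
{
    int pool[49];

    for (int i = 0; i < 49; i++)
        pool[i] = i + 1;                      /* enumerate the possible values */

    for (int i = 0; i < 6; i++) {
        int j = i + rand() % (49 - i);        /* partner from the unfixed part */
        int tmp = pool[i];
        pool[i] = pool[j];
        pool[j] = tmp;
        out[i] = pool[i];                     /* position i is now final */
    }
}
```

Because each drawn value is moved out of the part of the array that later picks come from, duplicates cannot occur and the loop always finishes after exactly six iterations.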
I am trying to write C code to randomly select 10 random sites from a grid of 10x10. The way I am considering going about this is to assign every cell a random number between zero and RAND_MAX and then picking out the 10 smallest/largest values. But I have very little idea about how to actually code something like that :/
I have used pseudo-random number generators before so I can do that part.
Just generate 2 random numbers between 0 and 9 and then select the random element from the array like:
arr[rand1][rand2];
Do that 10 times in a loop. No need to make it more complicated than that.
To simplify slightly, treat the 10x10 array as an equivalent linear array of 100 elements. Now the problem becomes that of picking 10 distinct numbers from a set of 100. To get the first index, just pick a random number in the range 0 to 99.
int hits[10]; /* stow randomly selected indexes here */
hits[0] = random1(100); /* random1(n) returns a random int in range 0..n-1 */
The second number is almost as easy. Choose another number from the 99 remaining possibilities. Random1 returns a number in the continuous range 0..98; you must then map that into the broken range 0..hits[0]-1, hits[0]+1..99.
hits[1] = random1(99);
if (hits[1] == hits[0]) hits[1]++;
Now for the third number the mapping starts to get interesting because it takes a little extra work to ensure the new number is distinct from both existing choices.
hits[2] = random1(98);
if (hits[2] == hits[0]) hits[2]++;
if (hits[2] == hits[1]) hits[2]++;
if (hits[2] == hits[0]) hits[2]++; /* re-check, in case hits[1] == hits[0]+1 */
If you sort the array of hits as you go, you can avoid the need to re-check elements for uniqueness. Putting everything together:
int hits[10];
int i, n;
for (n = 0; n < 10; n++) {
    int choice = random1( 100 - n ); /* pick a remaining index at random */
    for (i = 0; i < n; i++) {
        if (choice < hits[i]) /* find where it belongs in sorted hits */
            break;
        choice++; /* and make sure it is distinct */
        /* need ++ to preserve uniform random distribution! */
    }
    insert1( hits, n, choice, i );
    /* insert1(...) inserts choice at i in growing array hits */
}
You can use hits to fetch elements from your 10x10 array like this:
array[hits[0]/10][hits[0]%10]
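The loop above relies on two helpers that are described but not shown. A self-contained sketch of the same sorted approach, with both filled in, could look like this. The `rand()`-based `random1`, the inline shifting that plays the role of `insert1`, and the wrapper name `choose_sorted` are assumptions made for this example.

```c
#include <stdlib.h>

/* Hypothetical helper: random int in 0..n-1. Uses rand() % n, which is
 * slightly biased but keeps the example short. */
static int random1(int n)
{
    return rand() % n;
}

/* Choose k distinct indexes from 0..total-1 (requires 0 <= k <= total).
 * On return, hits[0..k-1] holds them in ascending order. */
void choose_sorted(int *hits, int k, int total)
{
    for (int n = 0; n < k; n++) {
        int choice = random1(total - n);  /* rank among the remaining indexes */
        int i;

        for (i = 0; i < n; i++) {
            if (choice < hits[i])         /* found its place in sorted hits */
                break;
            choice++;                     /* step over an index already taken */
        }

        /* Insert choice at position i, shifting the tail one slot right. */
        for (int j = n; j > i; j--)
            hits[j] = hits[j - 1];
        hits[i] = choice;
    }
}
```

For the 10x10 grid this would be called as `choose_sorted(hits, 10, 100)`. Because hits stays sorted, a single left-to-right scan is enough: once choice is smaller than hits[i], it is also smaller than every later entry.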
for (int i = 0; i < 10; i++) {
    // ith random entry in the matrix
    arr[rand() % 10][rand() % 10];
}
Modified this from Peter Raynham's answer - I think the idea in it is right, but his execution is too complex and isn't mapping the ranges correctly:
To simplify slightly, treat the 10x10 array as an equivalent linear array of 100 elements. Now the problem becomes that of picking 10 distinct numbers from a set of 100.
To get the first index, just pick a random number in the range 0 to 99.
int hits[10]; /* stow randomly selected indexes here */
hits[0] = random1(100); /* random1(n) returns a random int in range 0..n-1 */
The second number is almost as easy. Choose another number from the 99 remaining possibilities. Random1 returns a number in the continuous range 0..98; you must then map that into the broken range 0..hits[0]-1, hits[0]+1..99.
hits[1] = random1(99);
if (hits[1] >= hits[0]) hits[1]++;
Note that you must map the complete range of hits[0]..98 to hits[0]+1..99
For another number you must compare to all previous numbers, so for the third number you must do
hits[2] = random1(98);
if (hits[2] >= hits[0]) hits[2]++;
if (hits[2] >= hits[1]) hits[2]++;
You don't need to sort the numbers! Putting everything together:
int hits[10];
int i, n;
for (n = 0; n < 10; n++) {
    int choice = random1( 100 - n ); /* pick a remaining index at random */
    for (i = 0; i < n; i++)
        if (choice >= hits[i])
            choice++;
    hits[i] = choice;
}
You can use hits to fetch elements from your 10x10 array like this:
array[hits[0]/10][hits[0]%10]
If you want your chosen random cells from grid to be unique - it seems that you really want to construct random permutations. In that case:
Put cell number 0..99 into 1D array
Take some shuffle algorithm and toss that array with it
Read first 10 elements out of shuffled array.
Drawback: Running time of this algorithm increases linearly with increasing number of cells. So it may be better for practical reasons to do as @PaulP.R.O. says ...
There is a subtle bug in hjhill's solution. If you don't sort the elements in your list, then when you scan the list (inner for loop), you need to re-scan whenever you bump the choice index (choice++). This is because you may bump it into a previous entry in the list - for example with random numbers: 90, 89, 89.
The complete code:
int hits[10];
int i, j, n;
for (n = 0; n < 10; n++) {
    int choice = random1( 100 - n ); /* pick a remaining index at random */
    for (i = 0; i < n; i++) {
        if (choice >= hits[i]) { /* find its place in partitioned range */
            choice++;
            for (j = 0; j < i; j++) { /* adjusted the index, must ... */
                if (choice == hits[j]) { /* ... ensure no collateral damage */
                    choice++;
                    j = -1; /* restart the scan; the loop's j++ brings j back to 0 */
                }
            }
        }
    }
    hits[n] = choice;
}
I know it's getting a little ugly with five levels of nesting. When selecting just a few elements (e.g., 10 of 100) it will have better performance than the sorting solution; when selecting a lot of elements (e.g., 90 of 100), performance will likely be worse than the sorting solution.
Say I have a 2D array of random boolean ones and zeroes called 'lattice', and I have a 1D array called 'list' which lists the addresses of all the zeroes in the 2D array. This is how the arrays are defined:
#define n 100
bool lattice[n][n];
bool *list[n*n];
After filling the lattice with ones and zeroes, I store the addresses of the zeroes in list:
for(j = 0; j < n; j++)
{
    for(i = 0; i < n; i++)
    {
        if(!lattice[i][j]) // if element = 0
        {
            list[site_num] = &lattice[i][j]; // store address of zero
            site_num++;
        }
    }
}
How do I extract the x,y coordinates of each zero in the array? In other words, is there a way to return the indices of an array element through referring to its address?
EDIT: I need to make the code as efficient as possible, as I'm doing lots of other complicated stuff with much larger arrays. So a fast way of accomplishing this would be great
One solution is to map (x, y) to a natural number (say z).
z = N * x + y
x = z / N (integer division)
y = z % N
In this case, you should use int list[N * N];
Another solution is to just store the coordinates when you find a zero, something like:
list_x[site_num] = x;
list_y[site_num] = y;
site_num++;
Or you can define a struct of two ints.
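The first suggestion, storing z = N * x + y instead of an address, could be sketched like this. The function names `collect_zeroes` and `index_to_xy` are invented for the example, and N is fixed at 100 to match the question.

```c
#include <stdbool.h>

#define N 100

/* Record the linear index z = N * x + y of every zero in lattice[x][y].
 * Returns the number of zeroes found; zero_index needs room for N * N ints. */
int collect_zeroes(bool lattice[N][N], int zero_index[])
{
    int count = 0;

    for (int x = 0; x < N; x++)
        for (int y = 0; y < N; y++)
            if (!lattice[x][y])
                zero_index[count++] = N * x + y;
    return count;
}

/* Recover the coordinates from a linear index. */
void index_to_xy(int z, int *x, int *y)
{
    *x = z / N;   /* integer division */
    *y = z % N;
}
```

The outer loop runs over the first array index, so the lattice is read in the order it is laid out in memory, which usually helps when the arrays get large.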
Well, it is possible with some pointer arithmetic.
You have the address of the first element of lattice and the addresses of all zero-fields in list. You know the size of bool. By subtracting the first element's address from a zero-field address and dividing by the size of bool, you get a linear index. This linear index can be converted into the 2-dim index by using modulo and division.
But why don't you store the 2-dim index within your list instead of the address? Do you need the address or just the index?
And you should think about turning the for-loops around (outer loop i, inner loop j).
struct ListCoords
{
    int x, y;
} coords[n*n];
for(i = 0; i < site_num; i++)
{
    int index = list[i] - &lattice[0][0];
    coords[i].x = index % n;
    coords[i].y = index / n;
}
I may have the % and / operators backwards for your needs, but this should give you the idea.
How do I extract the x,y coordinates of each zero in the array? In other words, is there a way to return the indices of an array element through referring to its address?
You can't. Simple as that. If you need that information you need to pass it along with the arrays in question.
bool *list[n*n]; is an illegal statement in C89 (EDIT: Unless you made n a macro (yuck!)), you may wish to note that variable length arrays are a C99 feature.