Create actual random data in arrays - c

I have been working on an assignment where I have to create a given number of arrays and fill them with random data. I want only a percentage of each array to be filled. The problem is that in every array the random values land in the same positions instead of being spread out the way I would like.
I have been creating the arrays in this way:
int **array = malloc(DOC * sizeof *array);
for (i = 0; i < DOC; i++)
{
array[i] = malloc(MAXWORDS * sizeof **array);
}
and filling them using :
srand((unsigned) time(&t));
and
for(i = 0; i < DOC; i++){
for(j = 0; j < MAXWORDS; j++){
array[i][rand() %percentage]=rand() %VALUE;
}
}
Where
int percentage = rand() %MAXWORDS/10;
MAXWORDS defines the length of the array
DOC the number of arrays
VALUE is the max random value
As you can see the random values are all behaving identically.
I know that this has to do with the way srand depends on the time to generate the numbers: the program executes really fast, so the similar data come from the "similar" time. So what I am asking for is either a different way to generate random values or some trick I could use to fill the arrays differently.

With "rand() % percentage" you are only picking elements within the first 10% of each array. Instead, you probably want something like this:
for (i = 0; i < DOC; ++i){
for (j = 0; j < MAXWORDS; ++j) {
if (rand() % 100 < 10) { /* 10 of the 100 possible remainders: a 10% chance */
array[i][j] = rand() % VALUE;
}
}
}
This gives each element in the array roughly a 10% chance of being initialized, which should result (for large enough arrays) in about 10% of the elements being initialized.
If you want exactly 10% of the array to be initialized, you could instead place all indices (0..MAXWORDS-1) into an array, shuffle that array, and pick the first MAXWORDS/10 indices from the shuffled array for initialization.
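The "exactly 10%" idea could be sketched roughly as follows. The function name fill_exact and its parameters are made up for illustration; it uses a partial Fisher-Yates shuffle, so only the first count positions of the index array are actually shuffled. It assumes srand() has been called once at program start and that the row was zeroed beforehand (for example with calloc), since untouched slots keep whatever they held.

```c
#include <stdlib.h>

/* Hypothetical helper: set exactly `count` randomly chosen slots of `row`
 * (length `len`) to random values in 0..max_value-1.
 * Returns 0 on success, -1 on invalid arguments or allocation failure. */
int fill_exact(int *row, size_t len, size_t count, int max_value)
{
    if (count > len || max_value <= 0)
        return -1;
    if (len == 0)
        return 0;

    size_t *idx = malloc(len * sizeof *idx);
    if (idx == NULL)
        return -1;
    for (size_t i = 0; i < len; i++)
        idx[i] = i;

    /* Partial Fisher-Yates shuffle: after step i, idx[0..i] holds i+1
     * distinct indices drawn without replacement. */
    for (size_t i = 0; i < count; i++) {
        size_t j = i + (size_t)rand() % (len - i);
        size_t tmp = idx[i];
        idx[i] = idx[j];
        idx[j] = tmp;
        row[idx[i]] = rand() % max_value;
    }

    free(idx);
    return 0;
}
```

With the names from the question, a call would look like fill_exact(array[i], MAXWORDS, MAXWORDS / 10, VALUE) inside the loop over DOC.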

rand() and srand(), especially when used with %, don't produce random numbers as uniformly distributed as you may think.
Check Mersenne twister algorithm as an alternative pseudorandom number generator.

I think the problem is with how you are choosing the second index: rand() % percentage will always fill towards the front of the arrays.
The standard random number generator in C (srand + rand) tends to be pretty bad at generating numbers that pass statistical tests for randomness. There are more sophisticated random number generators with better properties available as part of the GNU Scientific Library that you may find helpful.

Related

What is the most efficient (fastest) way to find an N number of the largest integers in an array in C?

Let's have an array of size 8
Let's have N be 3
With an array:
1 3 2 17 19 23 0 2
Our output should be:
23, 19, 17
Explanation: The three largest numbers from the array, listed in descending order.
I have tried this:
int array[8];
int largest[N] = {0, 0, 0};
for (int i = 1; i < N; i++) {
for (int j = 0; j < SIZE_OF_ARRAY; j++) {
if (largest[i] > array[j]) {
largest[i] = array[j];
array[j] = 0;
}
}
}
Additionally, let the constraint be as such:
integers in the array should be 0 <= i <= 1 000
N should be 1 <= N <= SIZE_OF_ARRAY - 1
SIZE_OF_ARRAY should be 2 <= SIZE_OF_ARRAY <= 1 000 000
My way of implementing it is very inefficient, as it scrubs the entire array an N number of times. With huge arrays, this can take several minutes to do.
What would be the fastest and most efficient way to implement this in C?
You should look at the histogram algorithm. Since the values have to be between 0 and 1000, you just allocate an array for each of those values:
#define MAX_VALUE 1000
int occurrences[MAX_VALUE+1];
int largest[N];
int i, j;
for (i=0; i<N; i++)
largest[i] = -1;
for (i=0; i<=MAX_VALUE; i++)
occurrences[i] = 0;
for (i=0; i<SIZE_OF_ARRAY; i++)
occurrences[array[i]]++;
// Step through the occurrences array backward to find the N largest values.
for (i=MAX_VALUE, j=0; i>=0 && j<N; i--)
if (occurrences[i] > 0)
largest[j++] = i;
Note that this will yield only one element in largest for each unique value. Modify the insertion accordingly if you want all occurrences to appear in largest. Because of that, you may get values of -1 for some elements if there weren't enough unique large numbers to fill the largest array. Finally, the results in largest will be sorted from largest to smallest. That will be easy to fix if you want to: just fill the largest array from right to left.
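If you do want every occurrence to appear, the insertion step might be changed along these lines. This is only a sketch: the function name top_n_counting and its signature are assumptions, and it relies on every value being in 0..MAX_VALUE as stated in the question.

```c
#include <stddef.h>

#define MAX_VALUE 1000

/* Hypothetical helper: writes up to n of the largest values of
 * array[0..size-1] into largest[], in descending order, keeping duplicates.
 * Assumes every value lies in 0..MAX_VALUE. Returns how many were written. */
size_t top_n_counting(const int *array, size_t size, int *largest, size_t n)
{
    size_t occurrences[MAX_VALUE + 1] = {0};
    size_t written = 0;

    /* Count how often each value occurs. */
    for (size_t i = 0; i < size; i++)
        occurrences[array[i]]++;

    /* Walk the counts from the top, emitting each value once per occurrence. */
    for (int v = MAX_VALUE; v >= 0 && written < n; v--) {
        size_t c = occurrences[v];
        while (c > 0 && written < n) {
            largest[written++] = v;
            c--;
        }
    }
    return written;
}
```

Because duplicates are kept, the result is filled completely whenever the input has at least n elements, so no -1 placeholders are needed.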
The fastest way is to recognize that data doesn't just appear (it either exists at compile time; or arrives by IO - from files, from network, etc); and therefore you can find the 3 highest values when the data is created (at compile time; or when you're parsing and sanity checking and then storing data received by IO - from files, from network, etc). This is likely to be the fastest possible way (because you're either doing nothing at run-time, or avoiding the need to look at all the data a second time).
However; in this case, if the data is modified after it was created then you'd need to update the "3 highest values" at the same time as the data is modified; which is easy if a lower value is replaced by a higher value (you just check if the new value becomes one of the 3 highest values) but involves a search if a "previously highest" value is being replaced with a lower value.
If you need to search; then it can be done with a single loop, like:
firstHighest = INT_MIN;
secondHighest = INT_MIN;
thirdHighest = INT_MIN;
for (int i = 0; i < SIZE_OF_ARRAY; i++) {
if(array[i] > thirdHighest) {
if(array[i] > secondHighest) {
if(array[i] > firstHighest) {
thirdHighest = secondHighest;
secondHighest = firstHighest;
firstHighest = array[i];
} else {
thirdHighest = secondHighest;
secondHighest = array[i];
}
} else {
thirdHighest = array[i];
}
}
}
Note: The exact code will depend on what you want to do with duplicates. With these comparisons, a value equal to one already recorded still fills a lower slot, so the numbers 1, 2, 3, 4, 4, 4, 4 give the answer 4, 4, 4. If you want distinct values (2, 3, 4) instead, skip any array[i] that equals firstHighest or secondHighest.
For large amounts of data it can be accelerated with SIMD and/or multiple threads. For example; if SIMD can do "bundles of 8 integers" and you have 4 CPUs (and 4 threads); then you can split it into quarters then treat each quarter as columns of 8 elements; find the highest 3 values in each column in each quarter; then determine the highest 3 values from the "highest 3 values in each column in each quarter". In this case you will probably want to add padding (dummy values set to INT_MIN) to the end of the array to ensure that the array's total size is a multiple of SIMD width and number of CPUs.
For small amounts of data the extra overhead of setting up SIMD and/or coordinating multiple threads is going to cost more than it saves; and the "simple loop" version is likely to be as fast as it gets.
For unknown/variable amounts of data you could provide multiple alternatives (simple loop, SIMD with single thread, and SIMD with a variable number of threads) and decide which method to use (and how many threads to use) at run-time based on the amount of data.
One method I can think of is to sort the array in descending order and return the first N numbers. Since the array is sorted, those N numbers are the N largest numbers of the array. This method has a time complexity of O(n log n), where n is the number of elements in the given array. I think this is already a fairly good time complexity for this problem.
Another approach with similar time complexity would be to use a max-heap. Form max-heap from the given array and for N times, use pop() (or extract or whatever you call it) to get the top-most element which would be the max element remaining in the heap after each pop.
The time complexity of this approach could be considered to be even better than first one - O(n + Nlogn) where n is the number of elements in array and N is the number of largest elements to be found. Here, O(n) would be required to build heap and for popping the top-most element, we would need O(logn) for N times which sums up to - O(n + Nlogn), slightly better than O(nlogn)
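A rough sketch of the max-heap approach, assuming it is acceptable to reorder the input array in place. The names sift_down and top_n_heap are made up for this example. The heap is built bottom-up in O(n), and each of the N pops costs O(log n).

```c
#include <stddef.h>

/* Restore the max-heap property for the subtree rooted at index i. */
static void sift_down(int *heap, size_t size, size_t i)
{
    for (;;) {
        size_t left = 2 * i + 1, right = left + 1, big = i;
        if (left < size && heap[left] > heap[big])
            big = left;
        if (right < size && heap[right] > heap[big])
            big = right;
        if (big == i)
            return;
        int tmp = heap[i];
        heap[i] = heap[big];
        heap[big] = tmp;
        i = big;
    }
}

/* Hypothetical helper: reorders `array` in place and writes its n largest
 * values to `out` in descending order. Returns how many were written. */
size_t top_n_heap(int *array, size_t size, int *out, size_t n)
{
    if (n > size)
        n = size;

    /* Build the heap bottom-up, starting from the last parent node. */
    for (size_t i = size / 2; i-- > 0; )
        sift_down(array, size, i);

    /* Pop the maximum n times: take the root, move the last leaf to the
     * root, shrink the heap by one and repair it. */
    for (size_t k = 0; k < n; k++) {
        out[k] = array[0];
        array[0] = array[size - 1 - k];
        sift_down(array, size - 1 - k, 0);
    }
    return n;
}
```

For the value range in the question (0 to 1000) the counting approach is likely simpler, but the heap version does not depend on the values being small.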

Avoiding duplicates in a 2D array?

I am doing a program in C which needs to take in a set of values (integers) into a 2D array, and then performs certain mathematical operations on it. I have decided to implement a check in the program as the user is inputting the values to avoid them from entering values that are already present in the array.
I am however unsure of how to go about this check. I figured out I might need some sort of recursive function to check all the elements previous to the one that's being entered, but I don't know how to implement it.
Please find below a snippet of my code for illustrative purposes:
Row and col are values inputted by the user for the dimension of the array
for (int i=0; i<row;i++){
for (int j=0; j<col; j++){
scanf("%d", &arr[i][j]); //take in elements
}
}
for (int i = 0; i < row; i++)
{
for (int j = 0; i < col; j++)
{
if (arr[i][j] == arr[i][j-1]){
printf("Duplicate.\n");}
else {}
}
}
I know this is probably not correct but it's my attempt.
Any help would be much appreciated.
I would suggest that you store every element you read in a temporary 1D array. Every time you scan a new element, traverse the 1D array checking whether the value already exists. Although this is not optimal, it will at least be less expensive than traversing the 2D array every time.
Example:
int temp[SIZE]; // SIZE must be at least row * col
int elements = 0;
for (int i = 0; i < row; i++) {
    for (int j = 0; j < col; j++) {
        scanf("%d", &arr[i][j]); //take in elements
        // compare against the values read so far, before storing the new one
        for (int k = 0; k < elements; k++) {
            if (temp[k] == arr[i][j])
                printf("Duplicate.\n"); //or do whatever you wish
        }
        temp[elements] = arr[i][j];
        elements++;
    }
}
A balanced tree inserts and searches in O(log N) time.
Since the algorithms are quite simple & standard and were published in the seminal books by Knuth, there are plenty of implementations out there, including a clear and concise one at codereview.SE (which is thus automatically CC-BY-SA 3.0; do apply a bugfix in the answer). Using it (as well as virtually any other one) is simple: start with node* root = NULL;, then insert and search, and finally free_tree.
Asymptotically, the best method is a hash table with O(1) for both, but that is probably an overkill (the algorithms are much more complex and memory footprint is larger) unless you have a lot of numbers. For C++, there's a standard implementation, yet there are plenty 3rd-party ones for C, too.
If your number of input values is small, even the tree may be an overkill, and simply looking through previous values would be fast enough. If your 2D array is contiguous in memory, you can access it as 1D with int* arr1d = (int*)&arr2d.

Generate random numbers without repeats

I want to generate random numbers without repeats till all gone, then again generating random numbers with the initial dataset.
I know about keeping the already generated numbers in an array and looping through them to check whether a number has already been generated, and about the method of removing the generated numbers from the array and picking random numbers from the remaining array.
Those methods are not what I want. If there is an efficient way that uses data structures, that would be quite nice; any other method is also OK.
Thanks
Say you want to generate 1,000 unique random numbers and present them to some code one at a time. When you exhaust those numbers, you want to present the same numbers again, but in a different sequence.
To generate the numbers, use a hash table. In C#, it would look like this:
const int MaxNumbers = 1000;
HashSet<int> uniqueNumbers = new HashSet<int>();
Random rnd = new Random();
// generate random numbers until there are 1000 in the hash table
while (uniqueNumbers.Count < MaxNumbers)
{
uniqueNumbers.Add(rnd.Next());
}
// At this point you have 1,000 unique random numbers
// Put them in an array
int[] myNumbers = uniqueNumbers.ToArray();
This works because the HashSet.Add method rejects duplicates. And it's very fast because lookup in the hash table is O(1).
Now, you can serve them by setting a current index at 0 and increment it every time a number is requested. Something like:
int ixCurrent = 0;
int GetNextNumber()
{
if (ixCurrent < MaxNumbers)
{
++ixCurrent;
return myNumbers[ixCurrent-1];
}
But what to do when ixCurrent runs off the end of the array? Enter the Fisher-Yates Shuffle:
// out of numbers. Shuffle the array and return the first one.
for (int i = MaxNumbers-1; i > 0; --i)
{
int j = rnd.Next(i+1);
int temp = myNumbers[i];
myNumbers[i] = myNumbers[j];
myNumbers[j] = temp;
}
ixCurrent = 1;
return myNumbers[0];
}
If you know that the numbers you want to return are within a particular range (that is, you want to return the numbers 0-999 in random order), then you just fill an array with the values 0-999 and shuffle it.
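For the known-range case, the same serve-then-reshuffle idea might look like this in C. It is a sketch only: the struct bag type and the bag_init, bag_shuffle and bag_next names are made up for this example, and it assumes srand() is called once at program start.

```c
#include <stdlib.h>

#define BAG_SIZE 1000

/* Hypothetical "bag": hands out 0..BAG_SIZE-1 in random order and
 * reshuffles itself once every value has been served. */
struct bag {
    int values[BAG_SIZE];
    size_t next;            /* index of the next value to hand out */
};

/* Fisher-Yates shuffle of the whole bag, then start again from the front. */
static void bag_shuffle(struct bag *b)
{
    for (size_t i = BAG_SIZE - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        int tmp = b->values[i];
        b->values[i] = b->values[j];
        b->values[j] = tmp;
    }
    b->next = 0;
}

void bag_init(struct bag *b)
{
    for (size_t i = 0; i < BAG_SIZE; i++)
        b->values[i] = (int)i;
    bag_shuffle(b);
}

int bag_next(struct bag *b)
{
    if (b->next == BAG_SIZE)    /* all values served: start a new round */
        bag_shuffle(b);
    return b->values[b->next++];
}
```

Each round of BAG_SIZE calls to bag_next returns every value exactly once, and no lookup structure is needed because uniqueness comes from the shuffle itself.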
I'm not sure what language you are using, but here's some C++ code that does what you're looking for. Instead of it searching an array, it just does a direct check of a specific section of memory for a set flag and if it isn't set then the number chosen is new and printed.
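The section I marked as handler is the code that is first executed when a unique number is found. Change the 10 and the 11s to different numbers if you want a larger set of random numbers. Seed the generator once, before the loop: calling srand(time(NULL)) on every iteration restarts the sequence, so rand() keeps returning the same value until the clock ticks over and you might have to wait forever for the output.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
int main(int argc, char *argv[]){
    char randn[11]; // one flag per possible value 0..10
    int n;
    int ct=0;
    memset(randn,0,11);
    srand(time(NULL)); // seed once, outside the loop
    while (ct < 10){
        n=rand() % 11;
        if (!randn[n]){
            printf("%d\n",n); // handler
            randn[n]='1';
            ct++;
        }
    }
    return 0;
}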
Every random generator function takes a seed value as a parameter and uses it in its internal algorithm to generate random numbers. If you want to generate the same sequence of numbers, you have to use the same seed value. As an example you can achieve this in Java like this:
int seed = 10;
Random r = new Random(seed);
for(int i=0; i<10; i++){
System.out.println(r.nextInt());
}
The output is something like this (of course it will have different results in your system):
-1157793070
1913984760
1107254586
1773446580
254270492
-1408064384
1048475594
1581279777
-778209333
1532292428
and it gives me the same results each time I execute it.

How to compare and replace (int) pointers?

So i have this piece of code:
int* get_lotto_draw() //Returns an array of six random lottery numbers 1-49
{
int min = 1;
int max = 49;
int counter = 0;
srand(time(NULL));
int *arrayPointer = malloc(6 * sizeof(int));
for(counter = 0; counter <= 5; counter++)
{
arrayPointer[counter] = rand()%(max-min)+min;
}
return arrayPointer;
}
This gives me 6 int values, but sometimes some of these values are the same. How can I compare each of them to each other, so that if two are the same number, it will recalculate one of the values that are equal? Thanks for your time.
Search the array for the same number before storing the new one, like this:
for(counter = 0; counter <= 5; counter++)
{
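int x1 = 1; /* stays 1 until a unique number has been stored */
int i; /* declared here because it is checked after the inner loop */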
while(x1)
{
int temp = rand()%(max-min)+min;
for(i = 0; i < counter; i++)
{
if(arrayPointer[i] == temp)
{
break;
}
}
if(i == counter)
{
x1 = 0;
arrayPointer[counter] = temp;
}
}
}
The problem is that random-number generators don't know or care about history (at least, not in the sense that you do--software random number generators use recent results to generate future results). So, your choices are generally to...
Live with the possibility of duplicates. Obviously, that doesn't help you.
Review previous results and discard duplicates by repeating the draw until the value is unique. This has the potential to never end, if your random numbers get stuck in a cycle.
Enumerate the possible values, shuffle them, and pick the values off the list as needed. Obviously, that's not sensible if you have an enormous set of possible values.
In your case, I'd pick the third.
Create an array (int[]) big enough to hold one of every possible value (1-49).
Use a for loop to initialize each array value.
Use another for loop to swap each entry with a random index. If you swap a position with itself, that's fine.
Use the first few (6) elements of the array.
You can combine the second and third steps, if you want, but splitting them seems more readable and less prone to error to me.
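The steps above could be sketched like this for the lottery case. It keeps the function name from the question but is otherwise an assumption-laden example: the LOTTO_* macros are invented here, the swap step uses the Fisher-Yates form (each entry is swapped with a random index at or below it) so that every ordering is equally likely, and srand() is expected to be called once at program start rather than inside the function.

```c
#include <stdlib.h>

#define LOTTO_MIN 1
#define LOTTO_MAX 49
#define LOTTO_PICKS 6
#define LOTTO_POOL (LOTTO_MAX - LOTTO_MIN + 1)

/* Returns a malloc'd array of LOTTO_PICKS distinct numbers in
 * LOTTO_MIN..LOTTO_MAX, or NULL if allocation fails. The caller frees it. */
int *get_lotto_draw(void)
{
    int pool[LOTTO_POOL];

    /* Steps 1 and 2: one entry for every possible value. */
    for (int i = 0; i < LOTTO_POOL; i++)
        pool[i] = LOTTO_MIN + i;

    /* Step 3: shuffle (Fisher-Yates). */
    for (int i = LOTTO_POOL - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        int tmp = pool[i];
        pool[i] = pool[j];
        pool[j] = tmp;
    }

    /* Step 4: the first few entries are the draw. */
    int *draw = malloc(LOTTO_PICKS * sizeof *draw);
    if (draw == NULL)
        return NULL;
    for (int i = 0; i < LOTTO_PICKS; i++)
        draw[i] = pool[i];
    return draw;
}
```

Unlike the retry loop, this always finishes after a fixed number of steps, and it covers the full range 1 to 49 inclusive.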

Starting with a 10x10 array how do I choose 10 random sites

I am trying to write C code to randomly select 10 random sites from a grid of 10x10. The way I am considering going about this is to assign every cell a random number between zero and RAND_MAX and then picking out the 10 smallest/largest values. But I have very little idea about how to actually code something like that :/
I have used pseudo-random number generators before so I can do that part.
Just generate 2 random numbers between 0 and 9 and then select the random element from the array like:
arr[rand1][rand2];
Do that 10 times in a loop. No need to make it more complicated than that.
To simplify slightly, treat the 10x10 array as an equivalent linear array of 100 elements. Now the problem becomes that of picking 10 distinct numbers from a set of 100. To get the first index, just pick a random number in the range 0 to 99.
int hits[10]; /* stow randomly selected indexes here */
hits[0] = random1(100); /* random1(n) returns a random int in range 0..n-1 */
The second number is almost as easy. Choose another number from the 99 remaining possibilities. Random1 returns a number in the continuous range 0..98; you must then map that into the broken range 0..hits[0]-1, hits[0]+1..99.
hits[1] = random1(99);
if (hits[1] == hits[0]) hits[1]++;
Now for the third number the mapping starts to get interesting because it takes a little extra work to ensure the new number is distinct from both existing choices.
hits[2] = random1(98);
if (hits[2] == hits[0]) hits[2]++;
if (hits[2] == hits[1]) hits[2]++;
if (hits[2] == hits[0]) hits[2]++; /* re-check, in case hits[1] == hits[0]+1 */
If you sort the array of hits as you go, you can avoid the need to re-check elements for uniqueness. Putting everything together:
int hits[10];
int i, n;
for (n = 0; n < 10; n++) {
int choice = random1( 100 - n ); /* pick a remaining index at random */
for (i = 0; i < n; i++) {
if (choice < hits[i]) /* find where it belongs in sorted hits */
break;
choice++; /* and make sure it is distinct; */
/* need ++ to preserve uniform random distribution! */
}
insert1( hits, n, choice, i );
/* insert1(...) inserts choice at i in growing array hits */
}
You can use hits to fetch elements from your 10x10 array like this:
array[hits[0]/10][hits[0]%10]
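The answer leaves random1 and insert1 undefined. One possible way to fill them in, as a sketch rather than the author's own helpers, is shown here together with the sorted-selection loop wrapped in a function. The name pick_distinct is made up, and random1 uses rejection sampling on rand() to avoid modulo bias, which assumes 0 < n <= RAND_MAX.

```c
#include <stdlib.h>

/* random1(n): pseudo-random int in 0..n-1, assuming 0 < n <= RAND_MAX.
 * Values of rand() at or above the largest multiple of n are rejected
 * so that every result is equally likely. */
int random1(int n)
{
    int limit = RAND_MAX - (RAND_MAX % n);
    int r;
    do {
        r = rand();
    } while (r >= limit);
    return r % n;
}

/* insert1(a, n, value, pos): insert value at a[pos], shifting a[pos..n-1]
 * up by one. The array must have room for n + 1 elements. */
void insert1(int *a, int n, int value, int pos)
{
    for (int k = n; k > pos; k--)
        a[k] = a[k - 1];
    a[pos] = value;
}

/* Hypothetical wrapper: picks `count` distinct ints from 0..range-1 and
 * leaves them in `hits` in ascending order. Requires count <= range. */
void pick_distinct(int *hits, int count, int range)
{
    for (int n = 0; n < count; n++) {
        int choice = random1(range - n);  /* rank among the unused values */
        int i;
        for (i = 0; i < n; i++) {
            if (choice < hits[i])   /* found where it belongs */
                break;
            choice++;               /* skip over an already taken value */
        }
        insert1(hits, n, choice, i);
    }
}
```

Note that hits ends up sorted, so the selected cells come out in ascending index order; if the order of the 10 sites matters, shuffle hits afterwards.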
for (int i = 0; i < 10; i++) {
// ith random entry in the matrix
arr[rand() % 10][rand() % 10];
}
Modified this from Peter Raynham's answer - I think the idea in it is right, but his execution is too complex and isn't mapping the ranges correctly:
To simplify slightly, treat the 10x10 array as an equivalent linear array of 100 elements. Now the problem becomes that of picking 10 distinct numbers from a set of 100.
To get the first index, just pick a random number in the range 0 to 99.
int hits[10]; /* stow randomly selected indexes here */
hits[0] = random1(100); /* random1(n) returns a random int in range 0..n-1 */
The second number is almost as easy. Choose another number from the 99 remaining possibilities. Random1 returns a number in the continuous range 0..98; you must then map that into the broken range 0..hits[0]-1, hits[0]+1..99.
hits[1] = random1(99);
if (hits[1] >= hits[0]) hits[1]++;
Note that you must map the complete range of hits[0]..98 to hits[0]+1..99
For another number you must compare to all previous numbers, so for the third number you must do
hits[2] = random1(98);
if (hits[2] >= hits[0]) hits[2]++;
if (hits[2] >= hits[1]) hits[2]++;
You don't need to sort the numbers! Putting everything together:
int hits[10];
int i, n;
for (n = 0; n < 10; n++) {
int choice = random1( 100 - n ); /* pick a remaining index at random */
for (i = 0; i < n; i++)
if (choice >= hits[i])
choice++;
hits[i] = choice;
}
You can use hits to fetch elements from your 10x10 array like this:
array[hits[0]/10][hits[0]%10]
If you want your chosen random cells from grid to be unique - it seems that you really want to construct random permutations. In that case:
Put cell number 0..99 into 1D array
Take some shuffle algorithm and toss that array with it
Read first 10 elements out of shuffled array.
Drawback: Running time of this algorithm increases linearly with increasing number of cells. So it may be better for practical reasons to do as @PaulP.R.O. says ...
There is a subtle bug in hjhill's solution. If you don't sort the elements in your list, then when you scan the list (inner for loop), you need to re-scan whenever you bump the choice index (choice++). This is because you may bump it into a previous entry in the list - for example with random numbers: 90, 89, 89.
The complete code:
int hits[10];
int i, j, n;
for (n = 0; n < 10; n++) {
int choice = random1( 100 - n ); /* pick a remaining index at random */
for (i = 0; i < n; i++) {
if (choice >= hits[i]) { /* find its place in partitioned range */
choice++;
for (j = 0; j < i; j++) { /* adjusted the index, must ... */
if (choice == hits[j]) { /* ... ensure no collateral damage */
choice++;
j = -1; /* restart the scan at index 0 (the loop's j++ runs next) */
}
}
}
}
hits[n] = choice;
}
I know it's getting a little ugly with five levels of nesting. When selecting just a few elements (e.g., 10 of 100) it will have better performance than the sorting solution; when selecting a lot of elements (e.g., 90 of 100), performance will likely be worse than the sorting solution.
