Weighted random integers - C

I want to assign weightings to a randomly generated number, with the weightings represented below.
0 | 1 | 2 | 3 | 4 | 5 | 6
─────────────────────────────────────────
X | X | X | X | X | X | X
X | X | X | X | X | X |
X | X | X | X | X | |
X | X | X | X | | |
X | X | X | | | |
X | X | | | | |
X | | | | | |
What's the most efficient way to do it?

Kerrek's answer is good.
But if the weights in your histogram are not all small integers, you need something more powerful:
Divide [0..1] into intervals sized with the weights. Here you need segments with relative size ratios 7:6:5:4:3:2:1. So the size of one interval unit is 1/(7+6+5+4+3+2+1)=1/28, and the sizes of the intervals are 7/28, 6/28, ... 1/28.
These comprise a probability distribution because they sum to 1.
Now find the cumulative distribution:
P(x) => x
7/28 => 0
13/28 => 1
18/28 => 2
22/28 => 3
25/28 => 4
27/28 => 5
28/28 => 6
Now generate a random number r in [0..1] and look it up in this table by finding the smallest x such that r <= P(x). That x is the random value you want.
The table lookup can be done with binary search, which is a good idea when the histogram has many bins.
Note that you are effectively constructing the inverse of the cumulative distribution function, so this is sometimes called the method of inverse transforms (inverse transform sampling).
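A minimal C sketch of the cumulative-table method, with binary search for the lookup. To keep the arithmetic exact, it scales [0..1] by the total weight (28 here) and stays in integers: r is drawn from 0..27 and the rule becomes "smallest x with r < cum[x]", which picks each x with probability weight/28, just like the fractional version. The function names are made up for illustration, and rand() stands in for whatever generator you actually use.

```c
#include <stdlib.h>

/* Build the cumulative table: cum[i] = w[0] + ... + w[i].
   Returns the total weight, cum[n - 1]. */
int build_cumulative(const int *w, int *cum, int n)
{
    int running = 0;
    for (int i = 0; i < n; i++) {
        running += w[i];
        cum[i] = running;
    }
    return running;
}

/* Smallest x with r < cum[x], found by binary search.
   Requires 0 <= r < cum[n - 1] and a non-decreasing cum[]. */
int lookup_cumulative(const int *cum, int n, int r)
{
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (r < cum[mid])
            hi = mid;        /* answer is mid or to its left */
        else
            lo = mid + 1;    /* answer is strictly to the right */
    }
    return lo;
}

/* Draw one weighted value. rand() % total is adequate for a small
   total such as 28, but it has a slight modulo bias. */
int weighted_random(const int *cum, int n)
{
    int r = rand() % cum[n - 1];
    return lookup_cumulative(cum, n, r);
}
```

For example, with weights {7, 6, 5, 4, 3, 2, 1} the table is cum = {7, 13, 18, 22, 25, 27, 28}, so r = 7 falls in the second interval and maps to 1.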

If the total weight is small, just pick a uniformly random index into an array in which each value appears as many times as its weight:
int a[] = {0,0,0,0,0,0,0, 1,1,1,1,1,1, 2,2,2,2,2, 3,3,3,3, 4,4,4, 5,5, 6};
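As a sketch, the lookup-table approach needs nothing more than a single index draw. The names weight_table and weighted_from_table are illustrative, and rand() % 28 is assumed to be good enough here (it carries a small modulo bias).

```c
#include <stdlib.h>

/* Each value appears as many times as its weight: 7 + 6 + ... + 1 = 28 entries. */
static const int weight_table[] = {0,0,0,0,0,0,0, 1,1,1,1,1,1, 2,2,2,2,2,
                                   3,3,3,3, 4,4,4, 5,5, 6};

/* Return a uniformly chosen entry, i.e. value v with probability (7 - v) / 28. */
int weighted_from_table(void)
{
    int len = (int)(sizeof weight_table / sizeof weight_table[0]);
    return weight_table[rand() % len];
}
```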
If you can use C++ and want to build the distribution at runtime, use std::discrete_distribution from <random>.

To get the distribution you want, first add up the number of X's in your diagram. You can do it like this (my C is super rusty, so treat this as pseudocode):
int num_cols = 7; // for your example
int max;
if (num_cols % 2 == 0) // even
{
max = (num_cols+1) * (num_cols/2);
}
else // odd
{
max = (num_cols+1) * (num_cols/2) + ((num_cols+1)/2);
}
Then you need to randomly select an integer between 1 and max, inclusive.
If your random integer is r, the last step is to find which column holds the r-th X. Something like this should work:
for(int i=0;i<num_cols;i++)
{
r -= (num_cols-i);
if (r < 1) return i;
}
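Putting the pieces of this answer together, a runnable sketch might look like this. It uses the closed form num_cols * (num_cols + 1) / 2, which gives the same value as the even/odd branches above (28 for seven columns), draws r with rand() % max + 1, and assumes column i holds num_cols - i X's, as in the question's diagram. The function names are hypothetical.

```c
#include <stdlib.h>

/* Total number of X's when column i holds num_cols - i of them. */
int total_xs(int num_cols)
{
    return num_cols * (num_cols + 1) / 2;
}

/* Column (0-based) holding the r-th X, for 1 <= r <= total_xs(num_cols). */
int column_of_xs(int r, int num_cols)
{
    for (int i = 0; i < num_cols; i++) {
        r -= num_cols - i;   /* skip past the X's in column i */
        if (r < 1)
            return i;
    }
    return -1;               /* r was out of range */
}

/* Draw a column with probability proportional to its number of X's. */
int random_column(int num_cols)
{
    int r = rand() % total_xs(num_cols) + 1;   /* 1..total, inclusive */
    return column_of_xs(r, num_cols);
}
```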

Related

How to generate long (up to 25 millions) random sequence of integers in C (with no repetition)?

I need to generate long (pseudo)random arrays (1000-25 000 000 integers) in which no element is repeated. How do I do this, given that rand() does not generate large enough numbers?
I tried this idea: array[i] = (rand() << 14) | rand() % length; however, I suspect there is a much better way that I don't know of.
Thank you for your help.
You can use the Fisher-Yates shuffle for this.
Create an array of n elements and populate each element sequentially.
-------------------------
| 1 | 2 | 3 | 4 | 5 | 6 |
-------------------------
In this example n is 6. Now select a random index from 0 to n-1 (i.e. rand() % n) and swap the number at that index with the number at the top of the array (index n-1). Let's say the random index is 2. So we swap the value at index 2 (3) with the one at index n-1 (6). Now we have:
                      v
-------------------------
| 1 | 2 | 6 | 4 | 5 | 3 |
-------------------------
Now we do the same, this time with the upper bound of the index being n-2, and swap the value at the chosen index with the value at index n-2. Let's say this time we randomly get 0. So we swap index 0 (1) with index n-2 (5):
                  v
-------------------------
| 5 | 2 | 6 | 4 | 1 | 3 |
-------------------------
Then repeat. Let's say the next random index is 3. This happens to be our upper limit, so no change:
              v
-------------------------
| 5 | 2 | 6 | 4 | 1 | 3 |
-------------------------
Next we get 0:
          v
-------------------------
| 6 | 2 | 5 | 4 | 1 | 3 |
-------------------------
And finally 1:
      v
-------------------------
| 6 | 2 | 5 | 4 | 1 | 3 |
-------------------------
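A C sketch of the shuffle walked through above follows. One caveat: rand() % n does not by itself solve the original problem, because RAND_MAX may be as small as 32767. This sketch therefore builds 30 random bits from two rand() calls (assuming RAND_MAX + 1 is a power of two of at least 2^15, as in common C libraries) and rejects out-of-range draws to avoid modulo bias. The helper names are illustrative. For 25 000 000 elements, allocate the array with malloc rather than on the stack.

```c
#include <stdint.h>
#include <stdlib.h>

/* 30 random bits from two rand() calls, 15 low bits each.
   Assumes RAND_MAX + 1 is a power of two >= 2^15. */
static uint32_t random_bits30(void)
{
    uint32_t hi = (uint32_t)rand() & 0x7FFF;
    uint32_t lo = (uint32_t)rand() & 0x7FFF;
    return (hi << 15) | lo;
}

/* Uniform integer in [0, bound) for 0 < bound <= 2^30, without modulo
   bias: draws in the incomplete final block are rejected and redrawn. */
static uint32_t random_below(uint32_t bound)
{
    const uint32_t range = UINT32_C(1) << 30;
    const uint32_t limit = range - range % bound;   /* multiple of bound */
    uint32_t x;
    do {
        x = random_bits30();
    } while (x >= limit);
    return x % bound;
}

/* Fill arr with 1..n, then shuffle it in place (Fisher-Yates). */
void shuffled_sequence(int *arr, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++)
        arr[i] = (int)(i + 1);

    /* Position i - 1 is the current "top"; swap it with a random
       index in 0..i-1, then shrink the range by one. */
    for (uint32_t i = n; i > 1; i--) {
        uint32_t j = random_below(i);
        int tmp = arr[i - 1];
        arr[i - 1] = arr[j];
        arr[j] = tmp;
    }
}
```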

Why does this type of power function work?

res = 1;
for ( i = 1; i <= n; i <<= 1 ) // n = exponent
{
if ( n & i )
res *= a; // a = base
a *= a;
}
This is supposed to be a more efficient way to compute a power, but I don't know why it works.
The for() line is fine; I know why i <<= 1 is there. But I don't understand the line if ( n & i ). I know what it does, but I don't know why it works...
Let us say you have a binary representation of an unsigned number. How do you find the decimal representation?
Let us take a simple four bit example:
N = | 0 | 1 | 0 | 1 |
-----------------------------------------
| 2^3 = 8 | 2^2 = 4 | 2^1 = 2 | 2^0 = 1 |
-----------------------------------------
| 0 | 4 | 0 | 1 | N = 4 + 1 = 5
Now, what would happen if the place values were not fixed powers of 2, but each one was the square of the previous one (a, a^2, a^4, a^8), and you multiplied the contributions of the set bits instead of adding them:
N = | 0 | 1 | 0 | 1 |
----------------------------
| a^8 | a^4 | a^2 | a^1 |
----------------------------
| 0 | a^4 | 0 | a^1 | N = a^4 * a^1 = a^(4+1) = a^5
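As you can see, the code calculates a^n: it squares a on every pass, and multiplies the current a into res exactly when the corresponding bit of n is set (n & i).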
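Here is the same idea (binary exponentiation, also called exponentiation by squaring) as a self-contained C function. It is a slight variant of the loop in the question: instead of moving a mask i up through the bits of n, it shifts n down and tests its lowest bit, which visits the same bits in the same order. The name ipow is just for illustration, and overflow is not checked.

```c
/* Compute base^n by walking the bits of n from least significant upwards.
   base holds a^(2^k) for the current bit k; it is multiplied into the
   result only when that bit of n is set. No overflow checking. */
unsigned long long ipow(unsigned long long base, unsigned int n)
{
    unsigned long long result = 1;
    while (n != 0) {
        if (n & 1u)          /* this bit contributes base^(2^k) */
            result *= base;
        base *= base;        /* a^(2^k) -> a^(2^(k+1)) */
        n >>= 1;             /* move on to the next bit */
    }
    return result;
}
```

For example, ipow(3, 5) sees the bits 1, 0, 1 of 5 and returns 3^1 * 3^4 = 243.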

C - how to split a matrix in 4

What's the easiest way to split a matrix into 4?
I have an n x n matrix, where n is a multiple of 4.
http://i.stack.imgur.com/S4H2m.png
____________________
| | |
| | |
| 1st | 2nd |
| | |
|--------+----------
| | |
| 4th | 3rd |
| | |
|________|_________|
I don't need to make a new matrix, only to get the ranges of i, j that refer to each quadrant:
1st quadrant: range of indices 0 - i/2 and 0 - j/2
2nd: 0 - i/2 and j/2+1 - j
3rd: i/2+1 - i and j/2+1 - j
4th: i/2+1 - i and 0 - j/2
Maybe this could help you (coordinates are (row, column), both ends inclusive):
First quadrant: (0,0) - (n/2-1,n/2-1)
Second quadrant: (0,n/2) - (n/2-1,n-1)
Third quadrant: (n/2,n/2) - (n-1,n-1)
Fourth quadrant: (n/2,0) - (n-1,n/2-1)
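To make those ranges concrete, here is a small C sketch that returns the inclusive bounds of a quadrant, numbered as in the question's picture (1 top-left, 2 top-right, 3 bottom-right, 4 bottom-left), and uses them to visit one quadrant of a row-major n x n array without copying it. The struct and function names are hypothetical, and n is assumed to be even (which it is when n is a multiple of 4).

```c
/* Inclusive row/column bounds of one quadrant. */
struct quadrant {
    int row_first, row_last;
    int col_first, col_last;
};

/* Bounds of quadrant q (1..4) in an n x n matrix, n even. */
struct quadrant quadrant_bounds(int n, int q)
{
    int h = n / 2;
    int top = (q == 1 || q == 2);    /* quadrants 1 and 2 use rows 0..h-1 */
    int left = (q == 1 || q == 4);   /* quadrants 1 and 4 use cols 0..h-1 */
    struct quadrant b;
    b.row_first = top ? 0 : h;
    b.row_last  = top ? h - 1 : n - 1;
    b.col_first = left ? 0 : h;
    b.col_last  = left ? h - 1 : n - 1;
    return b;
}

/* Example use: sum quadrant q of a row-major n x n matrix in place. */
long quadrant_sum(const int *m, int n, int q)
{
    struct quadrant b = quadrant_bounds(n, q);
    long sum = 0;
    for (int i = b.row_first; i <= b.row_last; i++)
        for (int j = b.col_first; j <= b.col_last; j++)
            sum += m[i * n + j];     /* element (i, j) */
    return sum;
}
```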

Storing user-defined logic in a database

I'm designing a database to store information about events that are dynamic in nature. What I mean is that each type of event has some variables attached to it that change on each occurrence, based on rules defined by the user.
Let's say we have Event Type A with variables X and Y. In this event type, the user can define some rules that determine the values of X and Y on each occurrence of the event.
An example of a set of rules a user might define:
On first occurrence, X = 0; Y = 0;
On each occurrence, X = X + 1;
On each occurrence, if X == 100 then { X = 0; Y = Y + 1 }
By defining these rules, the values of X and Y change dynamically on every occurrence of the event, as follows:
1st occurrence: X = 1, Y = 0
2nd occurrence: X = 2, Y = 0
...
100th occurrence: X = 0, Y = 1
Now, I'm not sure how to store the "user-defined rules" in a database and later query them in my code. Can anyone point me in the right direction? Here's a start:
EVENTS
id;
name;
description;
event_type;
EVENT_TYPE_A_OCCURRENCES
id;
event_id;
X;
Y;
EVENT_RULES
id;
event_id;
frequency; // how often this rule applies
at_occurrence; // apply this rule at a specific occurrence
condition; // stores the code for the condition
statements; // stores the code for the statements
I'm no expert, please help me solve this problem. Thank you.
Assume the following user-defined rules are stored in a table:
------------------------------------------------------------------------------
| eventid | occurrence | keep-old-x | keep-old-y | x-frequency | y-frequency |
------------------------------------------------------------------------------
| A       | 1          | T          | F          | 1           | 100         |
------------------------------------------------------------------------------
| B       | 2          | F          | T          | -2          | 0           |
------------------------------------------------------------------------------
| C       | 5          | T          | T          | 100         | -3          |
------------------------------------------------------------------------------
Let's say that before the event, X = 10 and Y = 12.
Event = A, occurrence = 1, keep-old-x = T, keep-old-y = F, x-frequency = 1, y-frequency = 100
if keep-old-x is T then
X = X + x-frequency
else
X = x-frequency
endif
if keep-old-y is T then
Y = Y + y-frequency
else
Y = y-frequency
endif
Now, X = 11, Y = 100
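The if/else logic above translates directly into C. In this sketch the struct and its field names are assumptions that mirror the column headers of the rules table; in practice each row would be loaded from a database query.

```c
#include <stdbool.h>

/* One row of the rules table (illustrative field names). */
struct event_rule {
    char event_id;
    int occurrence;
    bool keep_old_x;
    bool keep_old_y;
    int x_frequency;
    int y_frequency;
};

/* Apply a rule: add the frequency to the old value if keep-old is T,
   otherwise replace the old value with the frequency. */
void apply_rule(const struct event_rule *rule, int *x, int *y)
{
    *x = rule->keep_old_x ? *x + rule->x_frequency : rule->x_frequency;
    *y = rule->keep_old_y ? *y + rule->y_frequency : rule->y_frequency;
}
```

Applying rule A to X = 10, Y = 12 gives X = 11, Y = 100, matching the worked example.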
You may need to add two more columns to set X to a new value when it reaches a specific value, for example:
--------------------------
|if-x-value| x-new-value |
--------------------------
| 100 | 0 |
--------------------------
| 125 | 5 |
--------------------------
| 150 | 10 |
--------------------------
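A sketch of how those two extra columns could be applied after a rule has updated X. The struct and function are hypothetical, and each row is read as "if X equals if-x-value, set X to x-new-value".

```c
#include <stddef.h>

/* One row of the extra table (illustrative names). */
struct x_reset {
    int if_x_value;
    int x_new_value;
};

/* Return the replacement for x if some row matches, otherwise x unchanged. */
int apply_x_resets(const struct x_reset *rows, size_t count, int x)
{
    for (size_t i = 0; i < count; i++) {
        if (x == rows[i].if_x_value)
            return rows[i].x_new_value;
    }
    return x;
}
```

With the rows above, X = 100 becomes 0 and X = 125 becomes 5; any other value is left unchanged.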
I hope this helps.

Loop unrolling and its effects on pipelining and CPE (have the solution, but don't understand it)

Below the line is a question from a practice test. The table actually has all the solutions filled in. However, I need clarification on why the solutions are what they are. (Read the question below the horizontal line.)
For example, I would really like to understand the solution row for A2 and A3.
As I see it, you have the following situation going on in A2:
x * y
xy * r
xyr * z
Now, let's look at how that'd be in the pipeline:
|1|2|3|4|5|6|7|8 |9|10|11|12|13|14|15|16|17|18|19|20|21|
| | | | | | | | | | | | | | | | | | | | | |
{ x * y } | | | | | | | | | | | | | | | | |
{ xy * r } | | | | | | | | | | | | |
{ xyr * z } | | | | | | | | |
//next iteration, which means different x, y and z's| |
{x2 * y2 } | | | | | | | |
{x2y2 * r } // this is dependent on both previous r and x2y2
{x2y2r * z }
So we are able to overlap xyr * z and x2 * y2, because there are no dependency conflicts. However, that only saves 3 cycles, right?
So it would still be (12 - 3) / 3 = 9 / 3 = 3 Cycles Per Element (three elements). So how are they getting 8/3 CPE for A2?
Any help understanding this concept will be greatly appreciated! There's no big rush, as the test isn't until next week. If there is any other information you need, please let me know!
(Below is the full test question text, along with the table completely filled in with the solutions)
Consider the following function for computing the product of an array of n integers.
We have unrolled the loop by a factor of 3.
int prod(int a[], int n) {
int i, x, y, z;
int r = 1;
for(i = 0; i < n-2; i += 3) {
x = a[i]; y = a[i+1]; z = a[i+2];
r = r * x * y * z; // Product computation
}
for (; i < n; i++)
r *= a[i];
return r;
}
For the line labeled Product computation, we can use parentheses to create five different
associations of the computation, as follows:
r = ((r * x) * y) * z; // A1
r = (r * (x * y)) * z; // A2
r = r * ((x * y) * z); // A3
r = r * (x * (y * z)); // A4
r = (r * x) * (y * z); // A5
We express the performance of the function in terms of the number of cycles per element
(CPE). As described in the book, this measure assumes the run time, measured in clock
cycles, for an array of length n is a function of the form Cn + K, where C is the CPE.
We measured the five versions of the function on an Intel Pentium III. Recall that the integer multiplication operation on this machine has a latency of 4 cycles and an issue time of 1 cycle.
The following table shows some values of the CPE, and other values missing. The measured
CPE values are those that were actually observed. “Theoretical CPE” means the performance
that would be achieved if the only limiting factor were the latency and issue time of
the integer multiplier.
Fill in the missing entries. For the missing values of the measured CPE, you can use the
values from other versions that would have the same computational behavior. For the values
of the theoretical CPE, you can determine the number of cycles that would be required for
an iteration considering only the latency and issue time of the multiplier, and then divide by 3.
Without knowing the CPU architecture, we can only guess.
My interpretation would be that the timing diagram only shows part of the pipeline, from gathering the operands to writing the result, because this is what is relevant to dependency resolution.
Now, the big if: If there is a buffer stage between the dependency resolver and the execution units, it would be possible to start the third multiplication of the first group (3) and the first multiplication of the second group (4) both at offset 8.
As 3 depends on 2, it does not make sense to use a different unit here, so 3 is queued to unit 1 right after 2. The following instruction, 4, does not depend on a previous result, so it can be queued to unit 2 and started in parallel.
In theory, this could happen as early as cycle 6, giving a CPE of 6/3. In practice, that is dependent on the CPU design.
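For comparison, the test's own hint (count the multiplier cycles per iteration, then divide by 3) can be worked through under its stated model, where the integer multiplier has a latency of 4 cycles and an issue time of 1 cycle and is the only limiting factor. Assuming that model, only the multiplications on the loop-carried dependency through r serialize from one iteration to the next; products such as x * y do not depend on r and can be issued early. Counting those multiplications gives:

```latex
% L = 4 cycles latency per multiply; 3 elements per iteration.
\begin{aligned}
\text{A2: } r \leftarrow (r \cdot (x y)) \cdot z
  &\;\Rightarrow\; 2 \text{ multiplies on the } r \text{ chain}
  \;\Rightarrow\; \tfrac{2 \cdot 4}{3} = \tfrac{8}{3} \approx 2.67 \text{ CPE} \\
\text{A3: } r \leftarrow r \cdot ((x y) \cdot z)
  &\;\Rightarrow\; 1 \text{ multiply on the } r \text{ chain}
  \;\Rightarrow\; \tfrac{1 \cdot 4}{3} = \tfrac{4}{3} \approx 1.33 \text{ CPE}
\end{aligned}
```

The issue-time limit (3 multiplications per iteration at 1 per cycle, i.e. 1 CPE) is lower than both figures, so latency is the binding constraint in this model. It also accounts for the A2 diagram in the question: x2 * y2 never has to wait for r, so each iteration costs only the 8 cycles of the two multiplications involving r.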
