Generate a random number between two numbers with one rare number - C

I can generate a random number between two numbers in C using this:
arc4random() % (high - low + 1) + low;
Now my requirement is that I want to make one number rare. That means if
high = 5,
low = 1,
and rare = 3,
then 3 should appear much more rarely than 1, 2, 4 and 5.
Thanks

You can use tables to calculate your final roll, similar to how pen and paper RPGs do this same type of calculation:
Roll 1d21 (easily possible in code).
If you get 1-5, it counts as a 1
If you get 6-10, it counts as a 2
If you get 11-15, it counts as a 4
If you get 16-20, it counts as a 5
If you get a 21, it counts as a 3
The advantage of this option is that you get a strong sense of the exact probabilities you are dealing with. You can see exactly how rare or common each number is, and you get fine-grained control over how common each number is in comparison to the other numbers.
You could also use fractions to generate the table. Use the Least Common Multiple of the denominators to determine a common base. That base is the max random number size you will need. Then, put all the fractions in like terms. Use the resulting numerators to determine the size of the range for each number in the table.
With this automated solution, the input numbers are very easy to understand in relation to each other. E.g.:
1/4 for 1
1/4 for 2
1/4 for 4
1/5 for 5
1/20 for 3
This would generate a table like so:
LCM = 20
1-5 = 1 (like terms - 5/20)
6-10 = 2 (5/20)
11-15 = 4 (5/20)
16-19 = 5 (4/20)
20 = 3 (1/20)
Some more on LCM: http://en.wikipedia.org/wiki/Least_common_multiple
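For illustration, here is a minimal C sketch of the generated 20-sided table (assuming arc4random() as in the question):
#include <stdlib.h> /* arc4random() on BSD/macOS */

/* Map a 1d20 roll onto the weighted table above:
   1-5 -> 1, 6-10 -> 2, 11-15 -> 4, 16-19 -> 5, 20 -> 3 */
int weighted_roll(void)
{
    unsigned roll = arc4random() % 20 + 1; /* 1..20 */
    if (roll <= 5)  return 1;
    if (roll <= 10) return 2;
    if (roll <= 15) return 4;
    if (roll <= 19) return 5;
    return 3; /* only the 20 maps to the rare 3 */
}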

One simple-to-understand option:
Generate one number to determine whether you're going to return the rare number (e.g. generate a number in the range [0-99], and if it's 0, return the rare number).
If you get to this step, you're returning a non-rare number: keep generating numbers in the normal range until you get any non-rare number, and return that.
There are other alternative approaches which would only require you to generate a single number, but the above feels like it would be the simplest one to write and understand.
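A minimal C sketch of those two steps, assuming arc4random() as in the question (the 1-in-100 chance for the rare number is an arbitrary choice):
#include <stdlib.h>

int rare_random(int low, int high, int rare)
{
    if (arc4random() % 100 == 0)  /* step 1: 1% of the time, return the rare number */
        return rare;
    for (;;) {                    /* step 2: draw until a non-rare number comes up */
        int n = arc4random() % (high - low + 1) + low;
        if (n != rare)
            return n;
    }
}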

You could create an array containing the numbers according to their probability:
int list[] = {1, 1, 2, 2, 3, 4, 4, 5, 5}; /* 3 appears once, the rest twice */
return list[random() % (sizeof list / sizeof list[0])];
This is not very elegant, but it works and easily scales should the probabilities get more complex.

The sum of all probabilities must be 1. We are working here with discrete probabilities over a finite range, so we are looking at (here) 5 possibilities with some distribution; call them p1, p2, p3, p4 and p5, the sum of which is 1.
f0 = 0
f1 = p1
f2 = f1 + p2
f3 = f2 + p3
f4 = f3 + p4
f5 = f4 + p5 and must be 1
Generate a random number from 0 to 1, which we will assume cannot be exactly 1. Find the first f value that is greater than it; that is the value of your random event. So perhaps
f1 = 0.222
f2 = 0.444
f3 = 0.555
f4 = 0.777
f5 = 1
If your random number is 0.645 then you have generated a 4 event.
With the above you have half as much chance of generating a 3 as any of the others. We can make it less likely still, e.g.:
f1 = 0.24
f2 = 0.48
f3 = 0.52
f4 = 0.76
f5 = 1
That gives a probability of 0.24 for each of the others and only 0.04 for a 3.
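A sketch of this in C, again assuming arc4random(); the f values are taken from the second table above:
#include <stdlib.h>

/* Cumulative-distribution sampling: draw r uniformly from [0,1) and
   return the index of the first f value that exceeds it. */
int sample_event(void)
{
    static const double f[] = {0.24, 0.48, 0.52, 0.76, 1.0}; /* f1..f5 */
    double r = arc4random() / 4294967296.0; /* divide by 2^32: r is in [0,1) */
    for (int i = 0; i < 5; i++)
        if (r < f[i])
            return i + 1; /* event 1..5 */
    return 5; /* unreachable; guards against rounding */
}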

Let's go through this. First we use the srand() function to seed the randomizer. Basically, the computer can generate random numbers based on the number that is fed to srand(). If you gave the same seed value, then the same random numbers would be generated every time.
Therefore, we have to seed the randomizer with a value that is always changing. We do this by feeding it the value of the current time with the time() function. This should happen once, at program start; re-seeding on every call can yield repeated results.
Now, when we call rand(), a new random number will be produced every time.
#include <stdio.h>
#include <stdlib.h> /* rand(), srand() */
#include <time.h>   /* time() */

int random_number(int min_num, int max_num);

int main(void) {
    srand(time(NULL)); /* seed once, at program start */
    printf("Min : 0 Max : 5 -> %d\n", random_number(0, 5));
    printf("Min : 100 Max : 1000 -> %d\n", random_number(100, 1000));
    return 0;
}

int random_number(int min_num, int max_num)
{
    int low_num = 0, hi_num = 0;

    if (min_num < max_num) {
        low_num = min_num;
        hi_num = max_num + 1; /* +1 so the upper bound is included in the output */
    } else {                  /* arguments given in reverse order: swap them */
        low_num = max_num;
        hi_num = min_num + 1; /* +1 so the upper bound is included in the output */
    }
    return (rand() % (hi_num - low_num)) + low_num;
}

while true
generate a random number
if it's not the rare number, return it
generate a second random number - say from 1 to 100
if that second number is <= the percentage chance of the rare number compared to the others, return the rare number
Note: this is fast for the common case of returning a non-rare number.
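In C, that loop might look like this (arc4random() as in the question; RARE_PERCENT, the rare number's chance relative to the others, is an assumed value):
#include <stdlib.h>

int rare_roll(int low, int high, int rare)
{
    enum { RARE_PERCENT = 5 }; /* assumption: rare appears 5% as often */
    for (;;) {
        int n = arc4random() % (high - low + 1) + low;
        if (n != rare)
            return n;  /* common case: return immediately */
        if (arc4random() % 100 + 1 <= RARE_PERCENT)
            return n;  /* keep the rare number this time */
    }
}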

Related

Binomial coefficient for very high numbers in C

So the task I have to solve is to calculate the binomial coefficient for 100 >= n > k >= 1 and then say how many solutions for n and k are over a barrier of 123456789.
I have no problem with my formula for calculating the binomial coefficient, but for high numbers n, k -> 100 the data types of C become too small for the calculation.
Do you have any suggestions how I can avoid overflowing the data types?
I thought about dividing by the barrier straight away so the numbers don't get too big in the first place, and then just checking whether the result is >= 1, but I couldn't make it work.
Say your task is to determine how many binomial coefficients C(n, k) for 1 ≤ k < n ≤ 8 exceed a limit of m = 18. You can do this by using the recurrence C(n, k) = C(n − 1, k) + C(n − 1, k − 1), which can be visualized in Pascal's triangle.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 (20) 15 6 1
1 7 (21 35 35 21) 7 1
1 8 (28 56 70 56 28) 8 1
Start at the top and work your way down. Up to n = 5, everything is below the limit of 18. On the next line, the 20 exceeds the limit. From now on, more and more coefficients are beyond 18.
The triangle is symmetric and strictly increasing in the first half of each row. You only need to find the first element that exceeds the limit on each line in order to know how many items to count.
You don't have to store the whole triangle. It is enough to keep the last and current line. Alternatively, you can use the algorithm detailed [in this article][ot] to work your way from left to right on each row. Since you just want to count the coefficients that exceed a limit and don't care about their values, the regular integer types should be sufficient.
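To make this concrete, here is a C sketch of the row-by-row count. Capping every entry at the limit plus one is an assumption of this sketch, not part of the answer above: a capped entry still forces every later sum containing it over the limit, so the count stays correct while all values fit comfortably in 64 bits.
#include <stdio.h>

#define N 100
#define LIMIT 123456789ULL

int main(void)
{
    unsigned long long row[N + 1] = {1}; /* current row of Pascal's triangle */
    long count = 0;

    for (int n = 1; n <= N; n++) {
        for (int k = n; k >= 1; k--) { /* update the row in place, right to left */
            row[k] += row[k - 1];
            if (row[k] > LIMIT) {
                row[k] = LIMIT + 1;    /* cap so values never overflow */
                if (k < n)             /* only 1 <= k < n counts */
                    count++;
            }
        }
    }
    printf("%ld coefficients C(n,k) with 1 <= k < n <= %d exceed %llu\n",
           count, N, LIMIT);
    return 0;
}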
First, you'll need a type that can handle the result. The largest number you need to handle is C(100,50) = 100,891,344,545,564,193,334,812,497,256. This number requires 97 bits of precision, so your normal data types won't do the trick. A quad-precision IEEE float would work if your environment provides one. Otherwise, you'll need some form of high/arbitrary precision library.
Then, to keep the numbers within this size, you'll want to cancel common terms in the numerator and the denominator, and calculate the result as ( a / c ) * ( b / d ) * ... instead of ( a * b * ... ) / ( c * d * ... ).
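For the second point, a small sketch of the interleaved multiply/divide, with long double standing in for a quad-precision type or bignum library (so the result is approximate, not exact):
#include <stdio.h>

long double binomial(int n, int k)
{
    long double result = 1.0L;
    if (k > n - k)
        k = n - k;                         /* symmetry: C(n,k) = C(n,n-k) */
    for (int i = 1; i <= k; i++)
        result = result * (n - k + i) / i; /* after step i, result = C(n-k+i, i) */
    return result;
}

int main(void)
{
    printf("C(100,50) ~= %.10Le\n", binomial(100, 50));
    return 0;
}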

Determine the adjacency of two Fibonacci numbers

I have many Fibonacci numbers. If I want to determine whether two Fibonacci numbers are adjacent or not, one basic approach is as follows:
Get the index of the first Fibonacci number, say i1
Get the index of the second Fibonacci number, say i2
Get the absolute value of i1-i2, that is |i1-i2|
If the value is 1, then return true,
else return false.
The first and second steps may need many comparisons to find the correct index by searching an array.
The third step needs one subtraction and one absolute-value operation.
I want to know whether there exists another approach to quickly determine the adjacency of two Fibonacci numbers.
I don't care whether this question is solved mathematically or by any hacking techniques.
If anyone has an idea, please let me know. Thanks a lot!
No need to find the index of both numbers.
Given that the two numbers belong to the Fibonacci series, if their difference is greater than the min number among them, then those two are not adjacent. Otherwise they are.
This is because the Fibonacci series follows this rule:
F(n) = F(n-1) + F(n-2), where F(n) > F(n-1) > F(n-2).
So F(n) - F(n-1) = F(n-2), and therefore
Diff(n, n-1) = F(n-2) < F(n-1) = min(F(n), F(n-1)).
The difference between two adjacent Fibonacci numbers will always be less than the min number among them.
NOTE: This will only hold if the numbers belong to the Fibonacci series.
Simply calculate the difference between them. If it is smaller than the smaller of the 2 numbers they are adjacent, If it is bigger, they are not.
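A tiny C sketch of this test, assuming both inputs really are Fibonacci numbers (using <= rather than < also covers the pair (1, 2), where the difference equals the smaller number):
#include <stdbool.h>

bool adjacent_fibs(unsigned long a, unsigned long b)
{
    unsigned long lo = a < b ? a : b;
    unsigned long hi = a < b ? b : a;
    return hi - lo <= lo; /* adjacent iff the gap is at most the smaller number */
}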
Each triplet in the Fibonacci sequence a, b, c conforms to the rule
c = a + b
So for every pair of adjacent Fibonaccis (x, y), the difference between them (y-x) is equal to the value of the previous Fibonacci, which of course must be less than x.
If 2 Fibonaccis, say (x, z) are not adjacent, then their difference must be greater than the smaller of the two. At minimum, (if they are one Fibonacci apart) the difference would be equal to the Fibonacci between them, (which is of course greater than the smaller of the two numbers).
Since for (a, b, c, d):
c = a + b
d = b + c
we have d - b = (b + c) - b = c.
By Binet's formula, the nth Fibonacci number is approximately phi**n/sqrt(5), where phi is the golden ratio. You can use base-phi logarithms to recover the index easily:
from math import log, sqrt

def fibs(n):
    nums = [1, 1]
    for i in range(n - 2):
        nums.append(sum(nums[-2:]))
    return nums

phi = (1 + sqrt(5)) / 2

def fibIndex(f):
    return round(log(sqrt(5) * f, phi))
To test this:
for f in fibs(20): print(fibIndex(f),f)
Output:
2 1
2 1
3 2
4 3
5 5
6 8
7 13
8 21
9 34
10 55
11 89
12 144
13 233
14 377
15 610
16 987
17 1597
18 2584
19 4181
20 6765
Of course,
def adjacentFibs(f, g):
    return abs(fibIndex(f) - fibIndex(g)) == 1
This fails with 1, 1 -- but there is little point in adding explicit special-case logic for such an edge case. Add it in if you want.
At some stage, floating-point round-off error will become an issue. For that, you would need to replace math.log by an integer log algorithm (e.g. one which involves binary search).
On Edit:
I concentrated on the question of how to recover the index (and I will keep the answer since that is an interesting problem in its own right), but as @LeandroCaniglia points out in their excellent comment, this is overkill if all you want to do is check whether two Fibonacci numbers are adjacent, since another consequence of Binet's formula is that sufficiently large adjacent Fibonacci numbers have a ratio which differs from phi by a negligible amount. You could do something like:
def adjFibs(f, g):
    f, g = min(f, g), max(f, g)
    if g <= 34:
        return adjacentFibs(f, g)
    else:
        return abs(g/f - phi) < 0.01
This assumes that they are indeed Fibonacci numbers. The index-based approach can be used to verify that they are (calculate the index and then use the full-fledged Binet's formula with that index).

Making Minimal Changes to Change Range of the Array

Consider having an array filled with elements a0, a1, a2, ..., a(n-1).
Consider that this array is sorted already; it will be easier to describe the problem.
Now the range of the array is defined as the biggest element minus the smallest element.
Say this range is some value x.
Now the problem I have is that I want to change the elements in such a way that the range becomes less than or equal to some target value y.
I also have the additional constraint that I want to change each element by a minimal amount. Consider an element a(i) that has value z: if I change it by an amount r, this costs r^2.
Thus, what is an efficient algorithm to update this array so that the range is less than or equal to the target range y, while minimizing the total cost?
An example:
Array = [ 0, 3, 19, 20, 23 ] Target range is 17.
I would make the new array [ 3, 3, 19, 20, 20 ] . The cost is (3)^2 + (3)^2 = 18.
This is the minimal cost.
If you are adding to or removing from a certain element a(i), you must add/remove that quantity q all at once. You cannot remove 1 unit three times from a certain element; you must remove a quantity of 3 units once.
I think you can build two heaps from the array: one min-heap, one max-heap. Take the top element of each heap, peek at the one right under it, and compare the two differences. Take from the heap with the bigger difference; if that difference is bigger than you need, take just the required amount and add the cost.
Now, if you had to take the whole difference and still didn't achieve your goal, you will need to repeat this step. However, if you once again choose from the same heap, remember to add the cost for the element you are taking out of the heap in that step AND also for those that were taken out of that heap before.
This yields an O(N log N) algorithm; I'm not sure whether it can be done faster.
Example:
Array [2,5,10,12] , I want difference 4.
The first heap has 2 on top, the second one 12. The 2 is 3 away from 5 and the 12 is 2 away from 10, so I take the min-heap, and the 2 will have to be changed by 3. So now we have a new situation:
[5, 10, 12]
The 12 is 2 away from 10, so we take it, subtract 2, and get the new situation:
[5,10]
Now we can choose either heap; both differences are the same (the same numbers :-) ). We just need to change by 1, so we subtract 1 from 10 and get the right result. And because we moved 10 down to 9, the number that was originally 12 (already reduced to 10) has to come down to 9 as well, so the resulting changes are:
[2 - changed to 5, 5 - unchanged, 10 - changed to 9, 12 - changed to 9].
Here is a linear-time algorithm that minimizes the piecewise quadratic objective function. Probably it can be simplified.
Let the range be [x, x + y], where x is a variable. For different choices of x, there are at most 2n + 1 possibilities for which points lie in the range, arising from 2n critical values a0 - y, a1 - y, ..., a(n-1) - y, a0, a1, ..., a(n-1). One linear-time merge yields the critical values in sorted order. For each of the 2n - 1 intervals [w, z] between critical values where the range contains at least one point, we can construct and minimize a quadratic function consisting of a sum where every point aj less than w yields a term (x - aj)^2 and every point aj greater than z + y yields a term (x + y - aj)^2. The global minimum lies at the mean of aj (for terms of the first type) or aj - y (for terms of the second type); the endpoints of the interval must be checked as well. Naively, this gives a quadratic-time algorithm.
To get down to linear time, it suffices to update the sum preceding the mean computation incrementally. Each of the critical values has an associated event indicating whether the point responsible for it is entering or leaving the interval, meaning that that point's term should enter or leave the sum.
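Here is a C sketch of the quadratic-time variant described above (the linear-time incremental update of the sums is omitted for clarity). For each interval between consecutive critical values it places x at the minimizer of the quadratic, clamped into the interval, and evaluates the exact clamping cost:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static int cmp_dbl(const void *p, const void *q)
{
    double a = *(const double *)p, b = *(const double *)q;
    return (a > b) - (a < b);
}

/* minimal total squared change to clamp all points into some window [x, x+y] */
double min_cost(const double *a, int n, double y)
{
    double *crit = malloc(2 * n * sizeof *crit);
    for (int j = 0; j < n; j++) {
        crit[j] = a[j];         /* window starting at a point */
        crit[n + j] = a[j] - y; /* window ending at a point */
    }
    qsort(crit, 2 * n, sizeof *crit, cmp_dbl);

    double best = INFINITY;
    for (int c = 0; c + 1 < 2 * n; c++) {
        double w = crit[c], z = crit[c + 1];
        double sum = 0.0;
        int m = 0;
        for (int j = 0; j < n; j++) {          /* points outside the window */
            if (a[j] <= w)          { sum += a[j];     m++; }
            else if (a[j] >= z + y) { sum += a[j] - y; m++; }
        }
        double x = m ? sum / m : w;            /* unconstrained minimizer (the mean) */
        if (x < w) x = w;                      /* clamp into [w, z] */
        if (x > z) x = z;
        double cost = 0.0;
        for (int j = 0; j < n; j++) {          /* exact cost at this x */
            double d = 0.0;
            if (a[j] < x)          d = x - a[j];
            else if (a[j] > x + y) d = a[j] - (x + y);
            cost += d * d;
        }
        if (cost < best) best = cost;
    }
    free(crit);
    return best;
}

int main(void)
{
    double a[] = {0, 3, 19, 20, 23};
    printf("%g\n", min_cost(a, 5, 17)); /* prints 18 for the example in the question */
    return 0;
}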

How to get an evenly distributed sample from Perl array values?

I have an array containing many values between 0 and 360 (like degrees on a circle), but unevenly distributed:
1,45,46,47,48,49,50,51,52,53,54,55,100,120,140,188, 210, 280, 355
Now I need to reduce those values to e.g. only 4, distributed as evenly as possible.
How can I do that?
Thanks,
Jan
Put the numbers on a circle, like a clock. Now construct a logical cross, say at 12, 3, 6, and 9 o’clock. Put the 12 at the first number. Now find what numbers would be nearest to 3, 6, and 9 o’clock, and record the sum of those three numbers’ distances next to the first number.
Iterate by rotating the top of your cross — the 12 o’clock point — clockwise until it exactly lines up with the next number. Again measure how far the nearest numbers are to each of your three other crosspoints, and record that score next to this current 12 o’clock number.
Repeat until your 12 o'clock has rotated all the way to the original 3 o'clock position, at which point you're done. Whichever number has the lowest sum assigned to it determines the winning configuration.
This solution generalizes to any range of values R and any number N of final points you wish to reduce the set to. Each point on the “cross” is R/N away from each other, and you need only rotate until the top of your cross reaches where the next arm was in the original position. So if you wanted 6 points, you would have a 6-pointed cross, each 60 degrees apart instead of a 4-pointed cross each 90 degrees apart. If your range is different, you still do the same sort of operation. That way you don’t need a physical clock and cross to implement this algorithm: it works for any R and N.
I feel bad about this answer from a Perl perspective, as I’ve not managed to include any dollar signs in the solution. :)
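Despite that, here is a sketch in C rather than Perl (the scoring follows the rotating-cross description above; the data set is the one from the question):
#include <stdio.h>
#include <math.h>

#define N 4 /* number of values to keep */

/* distance between two angles on a 360-degree circle */
static double circ_dist(double a, double b)
{
    double d = fmod(fabs(a - b), 360.0);
    return d > 180.0 ? 360.0 - d : d;
}

/* index of the data point nearest to a target angle */
static int nearest_idx(const double *v, int n, double t)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (circ_dist(v[i], t) < circ_dist(v[best], t))
            best = i;
    return best;
}

int main(void)
{
    double v[] = {1,45,46,47,48,49,50,51,52,53,54,55,
                  100,120,140,188,210,280,355};
    int n = sizeof v / sizeof v[0];
    double best_score = 1e9, best_start = v[0];

    /* align the top of the cross with each data point in turn and score
       how far the other N-1 crosspoints are from their nearest points */
    for (int i = 0; i < n; i++) {
        double score = 0.0;
        for (int k = 1; k < N; k++) {
            double t = v[i] + k * (360.0 / N);
            score += circ_dist(v[nearest_idx(v, n, t)], t);
        }
        if (score < best_score) {
            best_score = score;
            best_start = v[i];
        }
    }

    /* the kept values: nearest data point to each crosspoint */
    for (int k = 0; k < N; k++)
        printf("%g ", v[nearest_idx(v, n, best_start + k * (360.0 / N))]);
    printf("\n");
    return 0;
}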
Use a clustering algorithm to divide your data into evenly distributed partitions, then grab a random value from each cluster. The data file $datafile used below looks like this:
1 1
45 45
46 46
...
210 210
280 280
355 355
First column is a tag, second column is data. Running the following with $K = 4:
use strict; use warnings;
use Algorithm::KMeans;

my $datafile = $ARGV[0] or die;
my $K = $ARGV[1] || 0;
my $mask = 'N1';

my $clusterer = Algorithm::KMeans->new(
    datafile        => $datafile,
    mask            => $mask,
    K               => $K,
    terminal_output => 0,
);
$clusterer->read_data_from_file();
my ($clusters, $cluster_centers) = $clusterer->kmeans();

my %clusters;
while (@$clusters) {
    my $cluster = shift @$clusters;
    my $center  = shift @$cluster_centers;
    $clusters{"@$center"} = $cluster->[ int rand @$cluster ];  # pick a random member
}
use YAML; print Dump \%clusters;
returns this:
120: 120
199: 188
317.5: 355
45.9166666666667: 46
First column is the center of the cluster, second is the selected value from that cluster. The centers' distances to one another should be maximized according to the Expectation Maximization algorithm.

Find all possible row-wise sums in a 2D array

Ideally I'm looking for a c# solution, but any help on the algorithm will do.
I have a 2-dimensional array (x,y). The number of columns (max x) varies between 2 and 10 but can be determined before the array is actually populated. The number of rows (y) is capped at 5, but each column can have a varying number of values, something like:
1 2 3 4 5 6 7...10
A 1 1 7 9 1 1
B 2 2 5 2 2
C 3 3
D 4
E 5
I need to come up with all possible row-wise sums, for the purpose of looking for a specific total. That is, a row-wise total could be the cells A1 + B2 + A3 + B5 + D6 + A7 (any combination of one value from each column).
This process will be repeated several hundred times with different cell values each time, so I'm looking for a somewhat elegant solution (better than what I've been able to come with). Thanks for your help.
The Problem Size
Let's first consider the worst case:
You have 10 columns and 5 (full) rows per column. It should be clear that you will be able to get (with the appropriate number population for each place) up to 5^10 ≈ 10^7 different results (the solution space).
For example, the following matrix will give you the worst case for 3 columns:
| 1 10 100 |
| 2 20 200 |
| 3 30 300 |
| 4 40 400 |
| 5 50 500 |
resulting in 5^3 = 125 different results. Each result corresponds to a unique combination {a1, a2, a3} with ai ∈ {1,...,5}.
It's quite easy to show that such a matrix will always exist for any number n of columns.
Now, to get each numerical result, you will need to do n-1 sums, adding up to a problem size of O(n 5^n). So, that's the worst case and I think nothing can be done about it, because to know the possible results you NEED to effectively perform the sums.
More benign incarnations:
The problem complexity may be cut off in two ways:
Less numbers (i.e. not all columns are full)
Repeated results (i.e. several partial sums give the same result, and you can join them into one thread). Much more on this below.
Let's see a simplified example of the latter with three rows:
| 7 6 100 |
| 3 4 200 |
| 1 2 200 |
At first sight you will need to do 2·3^3 sums. But that's not really the case. As you add up the first two columns you don't get the expected 9 different results, but only 6 ({13, 11, 9, 7, 5, 3}).
So you don't have to carry nine results over to the third column, but only 6.
Of course, that comes at the expense of deleting the repeated numbers from the list. The "Removal of Repeated Integer Elements" was posted before on SO and I'll not repeat the discussion here; suffice it to say that a mergesort, O(m log m) in the list size m, will remove the duplicates. If you want something easier, a double loop, O(m^2), will do.
Anyway, I'll not try to calculate the size of the (mean) problem this way, for several reasons. One of them is that the m in the merge sort is not the size of the problem, but the size of the vector of results after adding up any two columns, and that operation is repeated (n-1) times ... and I really don't want to do the math :(.
The other reason is that once the algorithm is implemented, we can use some experimental results and spare ourselves my admittedly leaky theoretical considerations.
The Algorithm
With what we said before, it is clear that we should optimize for the benign cases, as the worst case is a lost one.
For doing so, we need to use lists (or variable dim vectors, or whatever can emulate those) for the columns and do a merge after every column add.
The merge may be replaced by several other algorithms (such as an insertion on a BTree) without modifying the results.
So the algorithm (procedural pseudocode) is something like:
Set result_vector to Column 1
For column i in (2 to n)
    Remove repeated integers in the result_vector
    Add every element of result_vector to every element of column i,
    giving a new result_vector
Next column
Remove repeated integers in the result_vector
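If it helps, here is a C sketch of that procedure, run on the example matrix (the buffer sizes are assumptions that comfortably fit this small case):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *p, const void *q)
{
    int a = *(const int *)p, b = *(const int *)q;
    return (a > b) - (a < b);
}

/* sort a vector of sums and drop the duplicates; returns the new size */
static int dedup(int *v, int n)
{
    qsort(v, n, sizeof *v, cmp_int);
    int m = 0;
    for (int i = 0; i < n; i++)
        if (m == 0 || v[i] != v[m - 1])
            v[m++] = v[i];
    return m;
}

int main(void)
{
    /* the example matrix from the question, stored column by column */
    int cols[6][5] = {{1,2,3,4,5}, {1,2,3}, {7,5}, {9,2}, {1,2}, {1}};
    int lens[6] = {5, 3, 2, 2, 2, 1};

    int result[4096] = {0}; /* distinct sums so far; starts as {0} */
    int rlen = 1;

    for (int c = 0; c < 6; c++) {
        int next[4096];
        int nlen = 0;
        for (int i = 0; i < rlen; i++)        /* every sum so far ... */
            for (int j = 0; j < lens[c]; j++) /* ... plus every column value */
                next[nlen++] = result[i] + cols[c][j];
        rlen = dedup(next, nlen);             /* the merge step */
        memcpy(result, next, rlen * sizeof *next);
    }

    for (int i = 0; i < rlen; i++)
        printf("%d ", result[i]);             /* prints 11 12 ... 27 */
    printf("\n");
    return 0;
}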
Or as you asked for it, a recursive version may work as follows:
function genResVector(a:list, b:list): returns list
local c:list
{
    Set c = CartesianProduct(a x b)
    Sum up each element {a[i], b[j]} of c
    Drop repeated elements of c
    Return(c)
}
function RecursiveAdd(a:matrix, i:integer): returns list
{
    genResVector[Column i from a, RecursiveAdd[a, i-1]];
}
function RecursiveAdd(a:matrix, i==0): returns list = {0}
Algorithm Implementation (Recursive)
I chose a functional language; I guess it's no big deal to translate it into any procedural one.
Our program has two functions:
genResVector, which sums two lists giving all possible results with repeated elements removed, and
recursiveAdd, which recurses on the matrix columns adding up all of them.
The code is:
genResVector[x__, y__] :=      (* a function that takes two lists as input *)
   Union[                      (* remove duplicates from the resulting list *)
      Apply[Plus,              (* Plus is the function to be distributed *)
            Tuples[{x, y}], 2]];   (* generate all combinations of the two lists *)
recursiveAdd[t_, i_] := genResVector[t[[i]], recursiveAdd[t, i - 1]];
                               (* recursive add function *)
recursiveAdd[t_, 0] := {0};    (* with its base case *)
Test
If we take your example list
| 1 1 7 9 1 1 |
| 2 2 5 2 2 |
| 3 3 |
| 4 |
| 5 |
And run the program the result is:
{11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27}
The maximum and minimum are very easy to verify since they correspond to taking the Min or Max from each column.
Some interesting results
Let's consider what happens when the numbers in each position of the matrix are bounded. For that we will take a full (10 x 5) matrix and populate it with random integers.
In the extreme case where the integers are only zeros or ones, we may expect two things:
A very small result set
Fast execution, since there will be a lot of duplicate intermediate results
If we increase the Range of our Random Integers we may expect increasing result sets and execution times.
Experiment 1: 5x10 matrix populated with varying range random integers
It's clear enough that for a result set near the maximum result set size (5^10 ≈ 10^7), the calculation time and the "number of != results" approach an asymptote. The fact that we see increasing functions just denotes that we are still far from that point.
Moral: the smaller your elements are, the better chance you have of getting it fast. This is because you are likely to have a lot of repetitions!
Note that our MAX calculation time is near 20 secs for the worst case tested
Experiment 2: Optimizations that aren't
Having a lot of memory available, we can calculate by brute force, not removing the repeated results.
The result is interesting ... 10.6 secs! ... Wait! What happened? Our little "remove repeated integers" trick is eating up a lot of time, and when there are not a lot of results to remove there is no gain, only a loss, in trying to get rid of the repetitions.
But we may get a lot of benefit from the optimization when the max numbers in the matrix are well under 5·10^5. Remember that I'm doing these tests with the 5x10 matrix fully loaded.
The moral of this experiment is: the repeated-integer-removal algorithm is critical.
HTH!
PS: I have a few more experiments to post, if I get the time to edit them.
