A probability theory problem in a skip list's C implementation

These days I am reading the skip list code in Algorithms in C, Parts 1-4, and inserting a new value into a skip list is more complex than I expected. During insertion, the code should ensure that the new value is inserted at level i with probability 1/2^i, and this is implemented by the code below:
static int Rand()
{
    int i, j = 0;
    uint32_t t = rand();
    for (i = 1, j = 2; i < lg_n_max; i ++, j += j)
        if (t > RANDMAX / j)
            break;
    if (i > lg_n)
        lg_n = i;
    return i;
}
I don't understand how the Rand function ensures this. Can you explain it to me? Thank you.

Presumably RANDMAX is intended to be RAND_MAX.
Neglecting rounding issues, half the return values of rand are above RAND_MAX / 2, and therefore half the time, the loop exits with i = 1.
If the loop continues, it updates i to 2 and j to 4. Then half the remaining return values (¼ of the total) are above RAND_MAX / 4, so, one-quarter of the time, the loop exits with i = 2.
Further iterations continue in the same manner, each iteration exiting with a portion of return values that is half the previous, until the lg_n_max limit is reached.
Thus, neglecting rounding issues and the final limit, the routine returns 1 half the time, 2 one-quarter of the time, 3 one-eighth the time, and so on.
lg_n is not defined in the routine. It appears to be a record of the greatest value returned by the routine so far.

Many thanks to Eric Postpischil for his answer; I now understand how the probability is ensured. Here is an explanation that I find easier to follow:
t is a random value between 0 and RANDMAX. Assume the loop runs two times. For the loop to get past the first iteration, t must not exceed RANDMAX/2^1, which means t falls in the range from 0 to RANDMAX/2; the probability of this is 1/2. In the second iteration, remember that t is already known to be in the range (0, RANDMAX/2^1). For the loop to continue, t must not exceed RANDMAX/2^2, which means t falls in the range from 0 to RANDMAX/2^2; the probability of this is also 1/2, because the range (0, RANDMAX/2^2) is only half of the range established by the first iteration. Note that the probability for the second iteration is a conditional probability, based on the outcome of the first iteration, so the overall probability of getting past the second iteration is 1/2 * 1/2 = 1/4.
In short, every iteration multiplies the previous iteration's probability by 1/2.

Related

c loop function computing time complexity

I am learning to compute the time complexity of algorithms.
I can handle simple loops and nested loops, but how do I compute the complexity when the loop variable is assigned inside the loop?
For example :
void f(int n){
    int count=0;
    for(int i=2;i<=n;i++){
        if(i%2==0){
            count++;
        }
        else{
            i=(i-1)*i;
        }
    }
}
i = (i-1)*i affects how many times the loop will run. How can I compute the time complexity of this function?

Since i * (i-1) is always even ((i * (i-1)) % 2 == 0), once the else branch has executed, the loop's i++ makes i odd again. As a result, after the first odd i in the loop, the condition always leads into the else branch.
Therefore, since i equals 3 (which is odd) after the first iteration and enters the else branch, i is set to i * (i-1) + 1 on every subsequent iteration, so it is roughly squared each time. Hence, if we denote the time complexity of the loop by T(n), we can write asymptotically: T(n) = T(sqrt(n)) + 1. So, if n = 2^(2^k), T(n) = k = log(log(n)).

There is no general rule to calculate the time complexity for such algorithms. You have to use your knowledge of mathematics to get the complexity.
For this particular algorithm, I would approach it like this.
Since initially i=2 and it is even, let's ignore that first iteration.
So I am only considering iterations from i=3 onward. From there, i will always be odd.
Your expression i = (i-1)*i, along with the i++ in the for loop, finally evaluates to i = (i-1)*i+1.
If you consider i=3 as 1st iteration and i(j) is the value of i in the jth iteration, then i(1)=3.
Also
i(j) = [i(j-1)]^2 - i(j-1) + 1
The above equation is called a recurrence relation and there are standard mathematical ways to solve it and get the value of i as a function of j. Sometimes it is possible to get and sometimes it might be very difficult or impossible. Frankly, I don't know how to solve this one.
But generally, we don't get situations where you need to go that far. In practical situations, I would just assume that the complexity is at most logarithmic, because the value of i grows at least exponentially (here it is roughly squared on each iteration, which is what leads to log(log(n))).

What is the time complexity of the following dependent loops?

I have a question that needs answer before an exam I'm supposed to have this week.
i = 1;
while (i <= n)
{
    for (j = 1; j < i; j++)
        printf("*");
    j *= 2;
    i *= 3;
}
For these dependent loops, I calculated the outer loop's big O to be O(log n).
The inner loop goes from 1 to i - 1 for every iteration of the outer loop.
The problem is that I do not know how to calculate the inner loop's time complexity, and then the overall complexity (I'm used to just multiplying both complexities, but I'm not sure about this one).
Thanks a lot!
P.S: I know that the j *= 2 doesn't affect the for loop.

As you recognized, computing the complexity of a loop nest where the bounds of an inner loop vary for different iterations of the outer loop is not as easy as a simple multiplication of two iteration counts. You need to look more deeply to get the tightest possible bound.
The question can be taken to be asking about how many times the body of the inner loop is executed, as a function of n. On the first outer-loop iteration, i is 1, so j is never less than i, so there are no inner-loop iterations. Next, i is 3, so there are two inner-loop iterations, then eight the next time, then 26 ... in short, $3^{i-1} - 1$ inner-loop iterations on the i-th pass of the outer loop (from here on, i counts outer-loop passes rather than naming the program variable). You need to add those all up to compute the overall complexity.
Well, that sum is $\sum_{i=1}^{\lfloor \log n \rfloor} (3^{i-1} - 1)$, so you could say that the complexity of the loop nest is
$$O\left(\sum_{i=1}^{\lfloor \log n \rfloor} (3^{i-1} - 1)\right)$$
but such an answer is unlikely to get you full marks.
We can simplify that by observing that our sum is bounded by a related one:
$$= O\left(\sum_{i=1}^{\lfloor \log n \rfloor} 3^{i-1}\right)$$
At this point (if not sooner) it would be useful to recognize the sum of powers pattern therein. It is often useful to know that $2^0 + 2^1 + \dots + 2^{k-1} = 2^k - 1$. This is closely related to base-2 numeric representations, and a similar formula can be written for any other natural number base. For example, for base 3, it is $2 \cdot 3^0 + 2 \cdot 3^1 + \dots + 2 \cdot 3^{k-1} = 3^k - 1$. This might be enough for you to intuit the answer: that the total number of inner-loop iterations is bounded by a constant multiple of the number of inner-loop iterations on the last iteration of the outer loop, which in turn is bounded by n.
But if you want to prove it, then you can observe that the sum in the previous bound expression is itself bounded by a related definite integral:
$$= O\left(\int_0^{\log n} 3^i \, di\right)$$
... and that has a closed-form solution:
$$= O\left(\frac{3^{\log n} - 3^0}{\log 3}\right)$$
which clearly has a simpler bound itself:
$$= O\left(3^{\log n}\right)$$
Exponentials of logarithms reduce to linear functions of the logarithm argument when the bases match (take the logarithm to base 3 here, since i triples on every pass). Since we need only an asymptotic bound, we don't care about the details, and thus we can go straight to
$$= O(n)$$

Code to generate bell curve is only creating data at even indexes. Why?

I'm writing some code to use random numbers to create a bell curve.
The basic approach is as follows:
Create an array of 2001 integers.
For some number of repeats, do the following:
• Start with a value of 1000 (the center-value)
• Loop 1000 times
• Generate a random number 0 or 1. If the random number is zero, subtract 1 from the value. If it's 1, add 1 to the value.
• Increment the count in my array at the resulting index value.
So 1000 times, we randomly add 1 or subtract 1 from a starting value of 1000. On average, we'll add 1 and subtract 1 about as often, so the outcome should be centered around 1000. Values farther above or below 1000 should be less and less frequent. A value at index 0 or index 2000 would require a "coin toss" with the same result 1000 times in a row... a VERY unlikely event that is still possible.
Here is the code I came up with, written in C with a thin Objective C wrapper:
#import "BellCurveUtils.h"
@implementation BellCurveUtils
#define KNumberOfEntries 1000
#define KPinCount 1000
#define KSlotCount (KPinCount*2+1)
static int bellCurveData[KSlotCount];
+(void) createBellCurveData;
{
    NSLog(@"Entering %s", __PRETTY_FUNCTION__);
    NSTimeInterval start = [NSDate timeIntervalSinceReferenceDate];
    int entry;
    int i;
    int random_index;
    //First zero out the data
    for (i = 0; i< KSlotCount; i++)
        bellCurveData[i] = 0;
    //Generate KNumberOfEntries entries in the array
    for (entry =0; entry<KNumberOfEntries; entry++)
    {
        //Start with a value of 1000 (center value)
        int value = 1000;
        //For each entry, add +/- 1 to the value 1000 times.
        for (random_index = 0; random_index<KPinCount; random_index++)
        {
            int random_value = arc4random_uniform(2) ? -1: 1;
            value += random_value;
        }
        bellCurveData[value] += 1;
    }
    NSTimeInterval elapsed = [NSDate timeIntervalSinceReferenceDate] - start;
    NSLog(@"Elapsed time = %.2f", elapsed);
    int startWithData=0;
    int endWithData=KSlotCount-1;
    for (i = 0; i< KSlotCount; i++)
    {
        if (bellCurveData[i] >0)
        {
            startWithData = i;
            break;
        }
    }
    for (i = KSlotCount-1; i>=0 ; i--)
    {
        if (bellCurveData[i] >0)
        {
            endWithData = i;
            break;
        }
    }
    for (i = startWithData; i <= endWithData; i++)
        printf("value[%d] = %d\n", i, bellCurveData[i]);
}
@end
The code does generate a bell-shaped curve. However, the array entries with odd indexes are ALL zero.
Here is some sample output:
value[990] = 23
value[991] = 0
value[992] = 22
value[993] = 0
value[994] = 20
value[995] = 0
value[996] = 25
value[997] = 0
value[998] = 37
value[999] = 0
value[1000] = 23
value[1001] = 0
value[1002] = 26
value[1003] = 0
value[1004] = 20
value[1005] = 0
value[1006] = 28
value[1007] = 0
value[1008] = 23
value[1009] = 0
value[1010] = 26
I have gone over this code line-by-line, and do not see why this is. When I step through it in the debugger, I get values that bounce around by single steps, starting at 1000, dropping to 999, incrementing to 1001, and various values even and odd. However, after 1000 iterations, the result of value is always even. What am I missing here?!?
I realize this isn't a typical SO development question, but I'm stumped. I cannot see what I am doing wrong. Can somebody explain these results?

//For each entry, add +/- 1 to the value 1000 times.
for (random_index = 0; random_index<KPinCount; random_index++)
{
    int random_value = arc4random_uniform(2) ? -1: 1;
    value += random_value;
}
For any two iterations of this loop, there are three potential outcomes:
random_value is -1 both times, in which case "value" decreases by 2.
random_value is +1 both times, in which case "value" increases by 2.
random_value is -1 once and +1 once, in which case "value" is unchanged.
Therefore, if the loop runs an even number of times (i.e. KPinCount is an even number), the parity of "value" will never change. Since it begins as an even number (1000), it ends as an even number.
Edit: If you want to resolve the problem but keep the same basic approach, then rather than starting with value = 1000 and running 1000 iterations in which you either add or subtract one, perhaps you could start with value = 0 and run 2000 iterations in which you add either one or zero. I'd have posted this as a comment to the discussion above, but can't comment since I just registered.

Your immediate problem is at
for (random_index = 0; random_index < KPinCount; random_index++)
{
    int random_value = arc4random_uniform(2) ? -1: 1;
    value += random_value;
}
Because KPinCount is defined as 1000 (an even number), at the end of the loop, value will have changed by a multiple of 2.
Maybe try varying KPinCount between 999 and 1000?

Ok, I've gotten some very useful feedback on this project.
To summarize:
If you always add or subtract one from a value, and do it twice, the possibilities are:
+1 +1 = even change
+1 -1 = no (even) change
-1 -1 = even change
Thus in that case the value always changes by 0 or +/-2, so starting from an even value the result is always an even number.
Likewise, if you always apply an odd number of +1/-1 value changes, the resulting value will always be odd.
A couple of solutions were proposed.
Option 1 (the change I used in my testing) was, before calculating each value, to randomly decide to loop either 999 or 1000 times. That way half the time the result will be even and the other half of the time it will be odd.
This has the effect that the spread of the graph will be very slightly narrower, because half of the time the possible range of values will be smaller by +/- 1.
Option 2 was to generate 3 random values, and add +1,0, or -1 to the value based on the result.
Option 3, suggested by @rhashimoto in the comments to one of the other answers, was to generate 4 random values, and add +1, 0, 0, or -1 to the value based on the result.
I suspected that options 2 and 3 would cause a narrower spread of the curve because for 1/3 or 1/4 of the possible random values on each iteration, the value would not change, so the average spread of values would be smaller.
I've run a number of tests with different settings, and confirmed my suspicions.
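The narrower spread can also be estimated on paper. As a rough sketch, assume the steps are independent and each listed outcome is equally likely; then the per-step variances are 1 for the +1/-1 walk, 2/3 for option 2, and 1/2 for option 3. Option 1 mixes 999 and 1000 steps, so its value is only approximately that of a plain 1000-step +1/-1 walk.

```latex
\begin{align*}
\text{Option 1 } (\pm 1):\quad
  &\sigma_1 \approx \sqrt{1000 \cdot 1} \approx 31.6 \\
\text{Option 2 } (-1, 0, +1):\quad
  &\sigma_2 = \sqrt{1000 \cdot \tfrac{2}{3}} \approx 25.8 \\
\text{Option 3 } (-1, 0, 0, +1):\quad
  &\sigma_3 = \sqrt{1000 \cdot \tfrac{1}{2}} \approx 22.4
\end{align*}
```

Under these assumptions the standard deviation shrinks from about 32 steps to about 26 and 22 steps, which matches the direction of the effect described above.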
Here are graphs of the different approaches. All sample graphs are plots of 1,000,000 points, with the graph clamped to values ranging from 800 to 1200 since there are never values outside that range in practice. The green bars on the graph are at the center point and +/- 50 steps
First, option 1, which randomly applies either 999 or 1000 +/-1 changes to the starting value:
Option 2, 1000 iterations of applying 3 random possible changes, -1,0, or +1:
And option 3, 1000 iterations of applying 4 random possible changes, -1,0, 0, or +1, as suggested by rhashimoto in the comments to pmg's answer:
And overlaying all the graphs on top of each other in Photoshop:
I have created graphs using a much larger number of points (100 million instead of 1 million) and the graphs are much smoother and less "jittery", but the shape of the curve is for all practical purposes identical. Applying a modest rolling average to the results from a one-million iteration graph would no doubt yield a very smooth curve.

Efficient way to detect "rank of corner" in flattened multi-dimensional array

This is a small piece of very frequently-called code, and part of a convolution algorithm I am trying to optimise (technically it's my first-pass optimisation, and I have already improved speed by a factor of 2, but now I am stuck):
inline int corner_rank( int max_ranks, int *shape, int pos ) {
    int i;
    int corners = 0;
    for ( i = 0; i < max_ranks; i++ ) {
        if ( pos % shape[i] ) break;
        pos /= shape[i];
        corners++;
    }
    return corners;
}
The code is being used to calculate a property of a position pos within an N-dimensional array (that has been flattened to pointer, plus arithmetic). max_ranks is the dimensionality, and shape is the array of sizes in each dimension.
An example 3-dimensional array might have max_ranks = 3, and shape = { 3, 4, 5 }. The schematic layout of the first few elements might look like this:
   0       1       2       3       4       5       6       7       8
[0,0,0] [1,0,0] [2,0,0] [0,1,0] [1,1,0] [2,1,0] [0,2,0] [1,2,0] [2,2,0]
Returned by function:
   3       0       0       1       0       0       1       0       0
Where the first row 0..8 shows the index offset given by pos, and the numbers below give the multi-dimensional indices. Edit: Below that I have put the value returned by the function (the value of 2 is returned at positions 12, 24 and 36).
The function is effectively returning the number of "leading" zeros in the multi-dimensional index, and is designed as it is to avoid needing to make a full conversion to array indices on every increment.
Is there anything I can do with this function to make it inherently faster? Is there a clever way of avoiding %, or another way to calculate the "corner rank" - apologies by the way if it has a more formal name that I do not know . . .

The only time you should return max_ranks is if pos equals zero. Checking for this allows you to remove the conditional check from your for-loop. This should improve both the worst case completion time, and speed of the looping for large values of max_ranks.
Here is my addition, plus an alternative way of avoiding the division operation. I believe that this is as fast as a handwritten div like @twalberg was suggesting, unless there is some way to produce the remainder without a second multiplication.
I'm afraid that since the most common answer is 0 (which doesn't even get past the first mod call), you aren't going to see much improvement. My guess is that your average run time is very close to the run time of the modulus operation itself. You might try searching for a faster way to determine whether a number is a factor of pos. You don't actually need to calculate the remainder; you just need to know whether there is a remainder or not.
Sorry if I made things confusing by restructuring your code. I believe this will be slightly faster unless your compiler was already making these optimizations.
#include <stdbool.h>  /* for true */

inline int corner_rank( int max_ranks, int *shape, int pos ) {
    // Most calls will not get farther than this.
    if (pos % shape[0] != 0) return 0;
    // One check here guarantees that the while loop below always returns
    // (assuming pos is a valid offset, smaller than the product of all sizes).
    if (pos == 0) return max_ranks;
    int divisor = shape[0] * shape[1];
    int i = 1;
    while (true) {
        if (pos % divisor != 0) return i;
        divisor *= shape[++i];
    }
}
Also try declaring pos and divisor as the smallest types possible. If they will never be greater than 255 you can use an unsigned char. I know that some processors can perform a divide with smaller numbers faster than larger numbers, but you have to set your variable types appropriately.

Incorrect sum - For loop in C

I am trying to sum values in a for loop with C. The initial value of variable x = 1 and I want to double it a set number of times and add the result. I have made the for loop, but my sum is always off by the initial value.
For example, if x = 1, the pattern should go:
1, 2, 4, 8, 16
...and the total should be 31. Unfortunately, total is off by one.
int x = 1;
int y = 10;
int total;
for(int i = 1; i < y; i++)
{
    x *= 2;
    total += x;
}
printf("Total: %d\n", total);
This is off by one. How can I have the loop start with 1 instead of 2?

Switch the two statements in the body of the for loop. Also it is a good idea to initialize total to 0, in case you want to move all of this into a function.

As is usually the case with erroneous code, there is more than one way to "fix" it. While you made it sufficiently clear what you are trying to implement, nobody knows how you are trying to implement it.
As @Ray Toal already noted in his answer, the correct result can be obtained by initializing total to 0 before the cycle and doing x *= 2 after the addition inside the cycle.
Alternatively, one can say that the cycle is OK as it is in the original code. You just have to initialize the total to 1 before the cycle.
Which approach is closer to what you were trying to implement originally - only you know. In both cases make sure you make the correct number of iterations.
