Estimating frequency of element of an array in O(n) time - arrays

I suppose the title might be a little misleading, but I couldn't think of a better one.
I have an array A[] in which every element but one occurs a number of times that is a multiple of 15, e.g. 2 occurs 30 times, 3 occurs 45 times. One element, however, occurs x times, where x is not a multiple of 15. How do I print the number x? I'm looking for a linear solution without a hash table.
Thanks.

There was a similar question here on StackOverflow, but I can't find it.
Let's use 3 instead of 15, because it is easier to follow and the method works the same way for any modulus. The sequence will be 4, 5, 4, 5, 3, 3, 4, 5, in binary 100, 101, 100, 101, 11, 11, 100, 101.
You can do the following: sum the least significant bits of all the numbers and take the remainder modulo 3 (15 originally):
bit1 = (0 + 1 + 0 + 1 + 1 + 1 + 0 + 1) % 3 = 5 % 3 = 2 != 0
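If it is != 0, that bit is 1 in the number we are trying to find: every value that occurs a multiple of 3 times adds a multiple of 3 to the sum, so only the odd one out can leave a nonzero remainder. Now let's move to the next bit: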
bit2 = (0 + 0 + 0 + 0 + 1 + 1 + 0 + 0) % 3 = 2 % 3 = 2 != 0
bit3 = (1 + 1 + 1 + 1 + 0 + 0 + 1 + 1) % 3 = 6 % 3 = 0 == 0
So we have bit3 == 0, bit2 != 0, bit1 != 0, making 011. Convert to decimal: 3. This gives the odd element itself; one more linear pass that counts its occurrences gives x (here, 3 occurs 2 times).
The space complexity is O(1) and the time complexity is O(n * BIT_LENGTH_OF_VARS), where BIT_LENGTH_OF_VARS == 8 for byte, BIT_LENGTH_OF_VARS == 32 for int, etc. The constant factor can be large, but the bit length is fixed, so O(n * BIT_LENGTH_OF_VARS) is really O(n).
That's it!
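A minimal C sketch of the bit-counting method, assuming the values fit in `unsigned` (convert signed inputs first) and that exactly one value has a count that is not a multiple of `period`. The names `odd_one_out` and `count_of` are illustrative, not from the answer. The second function is the extra linear pass that turns the recovered value into the count x that the question asks for.

```c
#include <limits.h>
#include <stddef.h>

/* Recover the one value whose number of occurrences is not a multiple of
 * `period` (15 in the question, 3 in the worked example; period > 1).
 * Every other value adds a multiple of `period` to each per-bit sum, so a
 * nonzero remainder can only come from the odd value. */
unsigned odd_one_out(const unsigned *a, size_t n, unsigned period)
{
    unsigned result = 0;
    for (unsigned bit = 0; bit < sizeof(unsigned) * CHAR_BIT; bit++) {
        size_t ones = 0;
        for (size_t i = 0; i < n; i++)
            ones += (a[i] >> bit) & 1u;      /* count set bits in this column */
        if (ones % period != 0)
            result |= 1u << bit;             /* bit belongs to the odd value */
    }
    return result;
}

/* Second O(n) pass: how many times does `value` occur? This is x. */
size_t count_of(const unsigned *a, size_t n, unsigned value)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] == value)
            count++;
    return count;
}
```

With the example sequence 4, 5, 4, 5, 3, 3, 4, 5 and `period` 3, `odd_one_out` recovers 3 and `count_of` then reports 2.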

Related

How to improve an algorithm to check if there is an element in the array that is equal to the difference between any other two elements in the array?

I know that this is apparently a simple question. But I can't get a better approach to get better efficiency. Here's what I'm trying. It is very naive but I still can't get it correct.
1. Sort the array. (Divide and Conquer)
2. a) Select one element at a time.
   b) Loop through all the remaining elements of the array (in pairs) to get the difference between them and match it against the selected element.
3. Repeat step 2 until all the elements have been checked.
4. Store all the elements that match the condition.
5. Print the stored elements.
The condition A[i] - A[j] = A[k] is equivalent to A[i] = A[j] + A[k], so we can look for a sum instead.
Sort the array.
For every element, check whether it is the sum of two others using the two-pointer approach (increment the lower index when the sum is too small, decrement the upper index when the sum is too big).
The resulting complexity is quadratic: O(n) per element, O(n^2) in total.
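A C sketch of the sort-plus-two-pointers idea, assuming `int` values whose pairwise sums fit in `long long`; `has_difference_element` is an illustrative name. It treats each sorted element in turn as the sum A[i] and scans the rest of the array from both ends, skipping that element's own index so that i, j and k stay distinct.

```c
#include <stdbool.h>
#include <stdlib.h>

/* qsort comparator for ints, written to avoid overflow. */
static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* True if a[i] == a[j] + a[k] (i.e. a[i] - a[j] == a[k]) for three
 * distinct indices. Sorts a in place; O(n log n + n^2) overall. */
bool has_difference_element(int *a, size_t n)
{
    qsort(a, n, sizeof *a, cmp_int);
    for (size_t t = 0; t < n; t++) {           /* a[t] plays the role of A[i] */
        size_t lo = 0, hi = n - 1;
        while (lo < hi) {
            if (lo == t) { lo++; continue; }   /* keep indices distinct */
            if (hi == t) { hi--; continue; }
            long long sum = (long long)a[lo] + a[hi];
            if (sum == a[t])
                return true;
            if (sum < a[t])
                lo++;                          /* sum too small */
            else
                hi--;                          /* sum too big */
        }
    }
    return false;
}
```

For 1 3 7 no element is the sum of two others, so it returns false; for 4 1 3 it finds 1 + 3 = 4 and returns true.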
Just out of interest, we can solve this problem in O(n log n + m log m) time, where m is the range, using a Fast Fourier Transform.
First sort the input. Now consider that each of the attainable distances between numbers can be achieved by subtracting one difference-prefix-sum from another. For example:
input: 1 3 7
diff-prefix-sums: 2 6
difference between 7 and 3 is 6 - 2
Now let's add the total (the rightmost prefix sum) to each side of the equation:
ps[r] - ps[l] = D
ps[r] + (T - ps[l]) = D + T
Let's list the differences:
1 3 7
2 4
and the prefix sums:
p => 0 2 6
T - p => 6 4 0 // 6-0, 6-2, 6-6
We need to efficiently determine the counts of all the different achievable differences. This is akin to multiplying the polynomial with coefficients [1, 0, 0, 0, 1, 0, 1] by the polynomial with coefficients [1, 0, 1, 0, 0, 0, 0] (coefficients listed from degree 6 down to degree 0; we don't need the x^0 term of the second polynomial, since it only generates degrees less than or equal to T). We can do this in O(m log m) time, where m is the degree, with a Fast Fourier Transform.
The resultant coefficients would be:
1 0 0 0 1 0 1
*
1 0 1 0 0 0 0
=>
x^6 + x^2 + 1
*
x^6 + x^4
= x^12 + x^10 + x^8 + 2x^6 + x^4
=> 1 0 1 0 1 0 2 0 1 0 0 0 0
We discard counts of degrees lower than or equal to T, and display our ordered results:
coefficient 1 at degree 12 = T + 6 => 1 diff of 6
coefficient 1 at degree 10 = T + 4 => 1 diff of 4
coefficient 1 at degree 8 = T + 2 => 1 diff of 2
If any of these differences (T itself among them) or their negatives is an element of the array, we have a match.
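A C sketch of the prefix-sum polynomial construction above. To stay short and self-contained it multiplies the polynomials naively in O(m^2); an FFT-based product would replace that double loop to reach the O(m log m) bound. `diff_in_array_poly` is an illustrative name. It assumes the value range fits in an `int` and is small enough to allocate, and, like the answer, it only checks that a difference value occurs in the array, not that the matching element is distinct from the two elements forming the difference.

```c
#include <stdbool.h>
#include <stdlib.h>

static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* True if some attainable difference D > 0 (or -D) is an element of a.
 * Sorts a in place. */
bool diff_in_array_poly(int *a, size_t n)
{
    if (n < 2)
        return false;
    qsort(a, n, sizeof *a, cmp_int);
    int T = a[n - 1] - a[0];                 /* total of consecutive diffs */
    size_t m = (size_t)T + 1;
    long *P = calloc(m, sizeof *P);          /* P(x) = sum of x^p          */
    long *Q = calloc(m, sizeof *Q);          /* Q(x) = sum of x^(T - p)    */
    long *R = calloc(2 * m - 1, sizeof *R);  /* R = P * Q, degrees 0..2T   */
    if (!P || !Q || !R) {
        free(P); free(Q); free(R);
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        int p = a[i] - a[0];                 /* prefix sum of differences  */
        P[p]++;
        Q[T - p]++;  /* the x^0 term only adds degrees <= T, ignored below */
    }
    /* Naive product; an FFT would do this step in O(m log m). */
    for (size_t x = 0; x < m; x++)
        for (size_t y = 0; y < m; y++)
            R[x + y] += P[x] * Q[y];
    bool found = false;
    /* A nonzero coefficient at degree T + D (D > 0) means D is attainable. */
    for (size_t d = m; d < 2 * m - 1 && !found; d++) {
        if (R[d] == 0)
            continue;
        int D = (int)d - T, negD = -D;
        if (bsearch(&D, a, n, sizeof *a, cmp_int) ||
            bsearch(&negD, a, n, sizeof *a, cmp_int))
            found = true;
    }
    free(P); free(Q); free(R);
    return found;
}
```

For 1 3 7 it finds the differences 2, 4 and 6 at degrees 8, 10 and 12, none of which is in the array, so it returns false; for 1 3 4 the difference 4 - 3 = 1 is itself an element, so it returns true.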

Generate random even number in range [m, n]

I was looking for C code to generate a set of random even numbers in the range [start, end]. I tried,
int random = ((start + rand() % (end - start) / 2)) * 2;
This won't work, for example if the range is [0, 4], both 0 & 4 included
int random = (0 + rand() % (4 - 0) / 2) * 2
=> (rand() % 2) * 2
=> 0, 2, ... (never includes 4) but expectation = 0, 2, 4 ...
On the other hand, if I use,
int random = ((start + rand() % (end - start) / 2) + 1) * 2;
This won't work, for example,
int random = (0 + (rand() % (4 - 0) / 2) + 1) * 2
=> ((rand() % 4 / 2) + 1) * 2
=> 2, 4, ... (never includes 0) but expectation = 0, 2, 4 ...
Any clue how to get rid of this problem?
You complicated it too much. Since you're using rand() and the modulo operator, I'm assuming that you will not be using this for cryptographic or security purposes, but as a simple even number generator.
The formula I have found for generating a random even number in the range of [0, 2n] is to use
s = (rand() % (n + 1)) * 2
An example code:
#include <stdio.h>
#include <stdlib.h>  /* rand() */

int main(void) {
    int i, s;
    for (i = 0; i < 100; i++) {
        s = (rand() % 3) * 2;  /* one of 0, 2, 4 */
        printf("%d ", s);
    }
    return 0;
}
And it gave me the following output:
2 2 0 2 4 2 2 0 0 2 4 2 4 2 4 2 0 0 2 2 4 4 0 0 4 4 4 2 2 2 4 0 0 0 4 0 2 2 2 2 0 0 0 4 4 2 4 4 4 0 4 2 2 4 4 0 4 4 2 2 0 0 4 0 4 4 2 0 2 4 0 0 0 0 4 0 4 4 0 4 2 0 0 4 4 0 0 4 4 2 0 0 4 0 2 2 2 0 0 4 0 2 4 2
Best regards!
rand() % x will generate a number in the range [0,x) so if you want the range [0,x] then use rand() % (x+1)
Common notation for ranges is to use [] for inclusive and () for exclusive, so [a,b) would be a range such that a is included but not b.
So in your case, just use (rand() % 3)*2 to get random numbers among {0,2,4}
If you want even numbers in the range [m,n], then use ((m/2) + rand() % ((n-m+2)/2))*2. This assumes m is even; for an odd m, use m+1 in its place, otherwise the result can be m-1.
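A C sketch generalizing this to any range, with an illustrative function name `rand_even`. It assumes 0 <= m <= n and that [m, n] contains at least one even number; an odd lower bound is rounded up and an odd upper bound rounded down, so m does not have to be even. Like the answers above, it uses rand() % count, which carries a small modulo bias and is not meant for cryptographic use.

```c
#include <stdlib.h>

/* Random even number in [m, n], assuming 0 <= m <= n and at least one
 * even number in the range. */
int rand_even(int m, int n)
{
    int lo = m + (m & 1);            /* round an odd m up   */
    int hi = n - (n & 1);            /* round an odd n down */
    int count = (hi - lo) / 2 + 1;   /* even numbers in [lo, hi] */
    return lo + 2 * (rand() % count);
}
```

For example, rand_even(0, 4) yields one of 0, 2, 4, which matches (rand() % 3) * 2.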
I do not trust the mod operator for random numbers. I prefer
start + ((1 + stop - start) * rand())
/ (1 + RAND_MAX)
which only relies on the distribution of rand() in the interval
[0, .. , RAND_MAX] and not on any distribution of rand()%n in the
interval [0, .. , n-1].
Note: If you use this expression you should add appropriate casts to avoid multiplication overflow.
Note also
ISO/IEC 9899:201x (p.346):
There are no guarantees as to the quality of the random sequence produced and some implementations are known to produce sequences with distressingly non-random low-order bits. Applications with particular requirements should use a generator that is known to be sufficient for their needs.
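A C sketch of the scaling expression with the casts the note asks for. `rand_scaled` and `rand_even_scaled` are illustrative names; the code assumes stop - start + 1 fits in an `int`, and the even variant further assumes non-negative bounds and at least one even number in the range. Computing 1 + RAND_MAX in `long long` matters because RAND_MAX may equal INT_MAX (as in glibc), where the `int` expression would overflow.

```c
#include <stdlib.h>

/* Random integer in [start, stop] by scaling rand() into the range
 * instead of taking a remainder. The product is formed in long long:
 * span <= 2^32 and rand() <= 2^31 - 1, so it stays below 2^63. */
int rand_scaled(int start, int stop)
{
    long long span = (long long)stop - start + 1;
    return start + (int)((span * rand()) / ((long long)RAND_MAX + 1));
}

/* Even numbers in [start, stop] for non-negative bounds: draw from the
 * halved range [ceil(start/2), floor(stop/2)] and double the result. */
int rand_even_scaled(int start, int stop)
{
    return 2 * rand_scaled((start + 1) / 2, stop / 2);
}
```

Because rand() never exceeds RAND_MAX, the scaled offset is always below span, so the result never passes stop.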
Just and-out the low bit, which makes it even:
n= (rand()%N)&(-2);
and to use a start/stop (a range), the values can be offset:
int n, start= 5, stop= 20+1;
n= ((rand()%(stop-start))+start)&(-2);
rand() generates a random number between 0 and RAND_MAX (this value is library-dependent, but is guaranteed to be at least 32767).
If the stop value must be included in the range of generated numbers, add 1 to the stop value, as done with 20+1 above.
The expression takes that value modulo (stop - start) and then adds the start value, so the value is now within [start, stop). As only even numbers are required, the low bit is then cleared, because even numbers are exactly those whose lowest bit is 0.
The clearing is done with a mask of all 1's except the lowest bit. As -1 is all 1's (0xFFF...FFFFF) in two's complement, -2 is all 1's except the low bit (0xFFF...FFFFE). The bitwise AND (&) with this mask rounds an odd value down to the even number below it. The result stays within [start, stop] only if start is even: with start = 5 as above, 5 & -2 gives 4, so round an odd start up to the next even number first.

C rand() dice issue

I'm new to C and I'm reading a book about it. I just came across the rand() function. The book states that rand() returns a random number from 0 to 32767. It also states that you can narrow the range of the random numbers by using % (the modulus operator).
Here is an example: the following expression puts a random number from 1 to 6 in the variable dice
dice = (rand() % 5) + 1;
I'm unable to get a remainder of 5, as any number from 0 to 32767 % 5 gives 0 to 4, but never 5.
Shouldn't it be % 6 in the above statement instead?
For example, if I choose randomly a number between 0 and 32767, let's say 75, then:
75 % 5 == 0
76 % 5 == 1
77 % 5 == 2
78 % 5 == 3
79 % 5 == 4
80 % 5 == 0
Etc.
So regardless of the random number between 0 and 32767, the remainder will never be 5, so it will not be possible to roll a 6 (as per the above statement).
Not sure if you will understand what I mean but your help would be much appreciated.
dice = (rand() % 5) + 1;
This will generate a random number between 1 and 5, inclusive, as you have analyzed. The % 5 in the book is probably just a typo. To get 1 to 6 it needs to be % 6.
First you have to understand how modulo (%) works. If you divide 10 by 5 you get 2 with a remainder of 0, hence 10 % 5 == 0. The possible remainders when you take a number mod 5 are 0 to 4; in general, the possible remainders when you divide by x are 0 to x-1. In your dice program you need numbers from 1 to 6 (the faces of a die), so you take the value mod 6 and add 1 to shift the range from 0-5 to 1-6: (rand() % 6) + 1
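A small C sketch, with illustrative names `roll_die` and `tally_rolls`, that tallies many rolls of (rand() % 6) + 1. Every face from 1 to 6 can appear and 0 or 7 never can; with the book's % 5 instead, counts[6] would always stay 0.

```c
#include <stdlib.h>

/* Roll one die: rand() % 6 is in 0..5, so adding 1 gives 1..6. */
int roll_die(void)
{
    return rand() % 6 + 1;
}

/* Tally `rolls` throws into counts[0..7]; counts[0] and counts[7]
 * stay 0 because no roll can produce them. */
void tally_rolls(int counts[8], int rolls)
{
    for (int i = 0; i < 8; i++)
        counts[i] = 0;
    for (int i = 0; i < rolls; i++)
        counts[roll_die()]++;
}
```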

Count permutations - store counter in array [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
Input:
I have some arrays, like:
1, 2, 3, 4, 5
2, 1, 3, 4, 5
3, 2, 5, 4, 1
5, 4, 3, 1, 2
.....
All of them are non-repeating permutations of 5 digits (5P5 = 5! = 120 possible rows). Rows can repeat, but the digits within a row are distinct.
Aim:
Count how many arrays of each type (permutation) are in the input data.
My thoughts:
5P5 says that there can be only 120 distinct rows. So I can store counters in an int[120] array and increment them while reading the input.
My question:
Is there any efficient algorithm to convert (hash) this array into array index?
My preferred language is C, with its pointers and manual memory management. Ideally, I'm trying to do something like:
FILE *f;
int counters[120] = {0};
char seq[20];
parse_line(f, seq); /* scans and parses a string into an array */
counters[hash(seq)]++;
PS:
I was inspired to ask this question while solving "UVa 157 - Recycling". Later I looked at solutions and realized that I had misunderstood the task, but the question remained unanswered.
Do a base conversion. The first digit is in base 5, the second in base 4, then base 3, and base 2. So, for example:
1, 2, 3, 4, 5 -> 0 * 4*3*2*1 + 0 * 3*2*1 + 0 * 2*1 + 0 * 1 -> 0
2, 1, 3, 4, 5 -> 1 * 4*3*2*1 + 0 * 3*2*1 + 0 * 2*1 + 0 * 1 -> 24
3, 2, 5, 4, 1 -> 2 * 4*3*2*1 + 1 * 3*2*1 + 2 * 2*1 + 1 * 1 -> 59
5, 4, 3, 1, 2 -> 4 * 4*3*2*1 + 3 * 3*2*1 + 2 * 2*1 + 0 * 1 -> 118
5, 4, 3, 2, 1 -> 4 * 4*3*2*1 + 3 * 3*2*1 + 2 * 2*1 + 1 * 1 -> 119
Remember to only count numbers you haven't seen when choosing the digit! Walking carefully through the third row of the above:
3, 2, 5, 4, 1
At first, we have the following mapping of numbers to digits:
1 2 3 4 5
0 1 2 3 4
Since the first number is 3, the first digit is 2. Now we delete 3 from the numbers, giving
1 2 4 5
0 1 2 3
The next number is 2, so the next digit is 1. The mapping is now
1 4 5
0 1 2
The next number is 5, so the next digit is 2. The mapping is now
1 4
0 1
The next number is 4, so the next digit is 1. The last digit will be 0, and it won't contribute anything to the sum: the last digit is in unary, so it is always 0. So the numbers 32541 correspond to the digits 21210.
To calculate the value of this number in base 10, we use the usual mixed-radix conversion: working from the rightmost digit, each column's value is the previous column's value times the previous column's base, and we add up each digit times its column value. So:
0 * 1
+ 1 * (1*1)
+ 2 * (2*1*1)
+ 1 * (3*2*1*1)
+ 2 * (4*3*2*1*1)
-----------------
59
See also the wikipedia page on factorial number systems.
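A C sketch of the base conversion described above (the factorial number system, also known as the Lehmer code). `perm_index` is an illustrative name; it assumes the row has been parsed into an `int` array holding a permutation of 1..n with n <= 12, so that n! fits in an `int`. Instead of summing digit times column value separately, it uses Horner's rule: multiply the running total by the current column's base, then add the digit.

```c
#include <stddef.h>

/* Map a permutation of 1..n (n <= 12) to a unique index in 0..n!-1. */
int perm_index(const int *perm, size_t n)
{
    int used[13] = {0};              /* used[v] != 0 once v has appeared */
    int index = 0;
    for (size_t i = 0; i < n; i++) {
        /* Digit = number of still-unused values smaller than perm[i]. */
        int digit = 0;
        for (int v = 1; v < perm[i]; v++)
            if (!used[v])
                digit++;
        used[perm[i]] = 1;
        /* Column i has base n - i (5, 4, 3, 2, 1 for n = 5). */
        index = index * (int)(n - i) + digit;
    }
    return index;
}
```

With this, the counting loop from the question becomes counters[perm_index(row, 5)]++ on an int[120] table, and the five example rows map to 0, 24, 59, 118 and 119.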
The simplest but memory-consuming solution is a non-colliding hash: convert the array to a number, assuming the permutations contain only 5 digits. The maximum value of that number is 54321, so take an array A[54322], calculate the number from the digits and increment the counter at that index.
Theoretically, a more compact collision-free hash treats the digits as a number in base M. If $S = s_0 s_1 s_2 \ldots s_{n-1}$, then
$$\mathrm{Hash}(S) = s_0 M^0 + s_1 M^1 + s_2 M^2 + \cdots + s_{n-1} M^{n-1},$$
where $M$ is the size of the set of values each $s_i$ can take.
In your case, $M = 5$ and $n = 5$,
so the maximum value of the hash is
$$1 \cdot 5^0 + 2 \cdot 5^1 + 3 \cdot 5^2 + 4 \cdot 5^3 + 5 \cdot 5^4 = 3711.$$
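A C sketch of this positional hash; `hash_base_m` is an illustrative name. For rows that are permutations of 1..5 with M = 5, two different rows cannot collide: the digit-wise differences lie between -4 and 4, so they cannot cancel in base 5. The largest value is 3711, so an array of 3712 counters suffices.

```c
#include <stddef.h>

/* Hash(S) = s[0]*M^0 + s[1]*M^1 + ... + s[n-1]*M^(n-1). */
int hash_base_m(const int *s, size_t n, int M)
{
    int hash = 0, weight = 1;        /* weight = M^i */
    for (size_t i = 0; i < n; i++) {
        hash += s[i] * weight;
        weight *= M;
    }
    return hash;
}
```

For example, 1, 2, 3, 4, 5 hashes to the maximum 3711, while 5, 4, 3, 2, 1 hashes to 975.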

Sum of all subparts of an array of integers

Given an array {1,3,5,7}, its subparts are defined as {1357,135,137,157,357,13,15,17,35,37,57,1,3,5,7}.
I have to find the sum of all these numbers in the new array. In this case sum comes out to be 2333.
Please help me find a solution in O(n). My O(n^2) solution times out.
link to the problem is here or here.
My current attempt (at finding a pattern) is:
for(i=0 to len) //len is length of the array
{
for(j=0 to len-i)
{
sum+= arr[i]*pow(10,j)*((len-i) C i)*pow(2,i)
}
}
In words - len-i C i = (number of integers to right) C weight. (combinations {from permutation and combination})
2^i = 2 power (number of integers to left)
Thanks
You can easily solve this problem with simple recursion.
def F(arr):
    # Returns (sum of all subparts of arr, number of subparts of arr).
    # Note: arr[:-1] copies the list, so this runs in O(n^2) overall
    # and recurses n levels deep.
    if len(arr) == 1:
        return (arr[0], 1)
    else:
        r = F(arr[:-1])
        return (11 * r[0] + (r[1] + 1) * arr[-1], 2 * r[1] + 1)
So, how does it work? Let's say we want to compute the sum of all subparts of {1,3,5,7}. Assume that we know the number of subparts of {1,3,5} and the sum of the subparts of {1,3,5}; then we can compute the result for {1,3,5,7} using the following formula:
SUM_SUBPART({1,3,5,7}) = 11 * SUM_SUBPART({1,3,5}) + NUMBER_COMBINATION({1,3,5}) * 7 + 7
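This formula follows from a simple observation. Say we have all subparts of {1,3,5}:
A = [135, 13, 15, 35, 1, 3, 5]
We can build the subparts of {1,3,5,7} by keeping A, appending the digit 7 to every element of A (multiply by 10, add 7), and adding 7 on its own. The appended group sums to 10 * SUM_SUBPART({1,3,5}) + NUMBER_COMBINATION({1,3,5}) * 7, which is where the factor 11 comes from: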
A = [135, 13, 15, 35, 1, 3, 5] +
[135 * 10 + 7,
13 * 10 + 7,
15 * 10 + 7,
35 * 10 + 7,
1 * 10 + 7,
3 * 10 + 7,
5 * 10 + 7] + [7]
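A brute-force C sketch of the definition, useful for cross-checking the recurrence on small inputs. It enumerates every non-empty order-preserving subpart with a bitmask; the name `brute_subpart_sum` is illustrative, the running time is O(2^n * n), and it assumes every element is a single digit (as in the example) and n < 20.

```c
/* Sum of all non-empty order-preserving subparts, by enumeration. */
long long brute_subpart_sum(const int *a, int n)
{
    long long total = 0;
    for (unsigned long mask = 1; mask < (1ul << n); mask++) {
        long long value = 0;
        for (int i = 0; i < n; i++)
            if (mask & (1ul << i))
                value = value * 10 + a[i];   /* append digit a[i] */
        total += value;
    }
    return total;
}
```

It returns 2333 for {1,3,5,7} and 207 for {1,3,5}, matching the recurrence step 11 * 207 + 7 * 7 + 7 = 2333.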
Well, you could look at the subparts as sums of numbers:
1357 = 1000*1 + 100*3 + 10*5 + 1*7
135 = 100*1 + 10*3 + 1*5
137 = 100*1 + 10*3 + 1*7
etc..
So, all you need to do is sum up the numbers you have, and then, according to the number of items, work out what the multiplier is:
Two numbers [x, y]:
[x, y, 10x+y, 10y+x]
=> your multiplier is 1 + 10 + 1 = 12
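Three numbers [x, y, z]:
[x, y, z,
10x+y, 10x+z,
10y+x, 10y+z,
10z+x, 10z+y,
100x+10y+z, 100x+10z+y,
...]
=> your multiplier is 1+10+10+1+1+100+100+10+10+1+1=245
You can work out the equation for n numbers in the same way. Note that this enumeration counts every ordering of each subset (both 10x+y and 10y+x), whereas the subparts in the question keep the original order, so the multipliers have to be adjusted for that definition.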
If you expand invisal's recursive solution you get this explicit formula:
subpart sum = sum for k=0 to N-1: 11^(N-1-k) * 2^k * a[k]
For {1,3,5,7}: 11^3*1*1 + 11^2*2*3 + 11*4*5 + 1*8*7 = 1331 + 726 + 220 + 56 = 2333.
This suggests the following O(n) algorithm:
multiplier = 1
for k from 0 to N-1:
a[k] = a[k]*multiplier
multiplier = multiplier*2
multiplier = 1
sum = 0
for k from N-1 to 0:
sum = sum + a[k]*multiplier
multiplier = multiplier*11
Multiplication and addition should be done modulo M of course.
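A C sketch of the two-pass pseudocode with the modular arithmetic made explicit. `MOD` is a hypothetical modulus (10^9 + 7) standing in for the problem's M, and `subpart_sum_mod` is an illustrative name. The comments give the counting argument behind the formula: a[k] can be combined with any of the 2^k subsets of the elements to its left, and each of the N-1-k elements to its right is either skipped (factor 1) or shifts a[k] one decimal place left (factor 10), contributing (1 + 10)^(N-1-k) = 11^(N-1-k) in total.

```c
#include <stddef.h>

#define MOD 1000000007LL  /* hypothetical modulus; use the problem's M */

/* Overwrites a[] like the pseudocode. Assumes 0 <= a[k] < MOD. */
long long subpart_sum_mod(long long *a, size_t n)
{
    /* Pass 1: weight a[k] by 2^k, the number of left-hand subsets. */
    long long multiplier = 1;
    for (size_t k = 0; k < n; k++) {
        a[k] = a[k] * multiplier % MOD;
        multiplier = multiplier * 2 % MOD;
    }
    /* Pass 2: from the right, weight a[k] by 11^(N-1-k). */
    long long sum = 0;
    multiplier = 1;
    for (size_t k = n; k-- > 0;) {         /* k = N-1 down to 0 */
        sum = (sum + a[k] * multiplier) % MOD;
        multiplier = multiplier * 11 % MOD;
    }
    return sum;
}
```

Both factors stay below MOD, so each product is under 10^18 and fits in a `long long`.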
