How do I find the median for the given program - C

We have to find the mean, median, and mode for certain sets of observations. The last observation is always -1 and should not be counted in the list.
Sample input and output:
1 2 3 -1
2 2 -1 (no mode)
My code
int mean, median, mode, i, sum = 0, count = 0, temp;
int obs[size];
for (i = 0; i < size; i++)
{
    scanf("%d", &obs[size]);
    if (obs[size] == -1)
    {
        break;
    }
    sum = sum + obs[size];
    count++;
}
mean = sum / count;
printf("%d", mean);
temp = count;
if (temp % 2 == 0)
{
    median = (obs[(temp)] / 2 - obs[(temp - 1) / 2]) / 2;
}
else
{
    median = obs[(temp - 1) / 2];
}
printf("\t%d", median);
But here the median computation doesn't work. What's the problem?

Your problem is that your code doesn't find the median.
In fact, it finds nothing and will likely crash at runtime, because the way you are reading inputs into your array causes an out-of-bounds array access.
Replace obs[size] with obs[i] in the loop where you are getting the inputs into your array.
I assume that you fill the array in ascending order while entering your observations, otherwise you have to sort the array before finding the median.
And I don't understand your need to introduce the temp variable. Use count and don't ruin the code's readability. I think you have a false notion that operations on count will change its value; it won't, unless you assign the results back to count.
You have declared median as of type int. Make it float for reasons you will find below.
Getting into your code for the median:
if (count % 2 == 0)
{
    // median=(obs[(count)] /2 - obs[(count - 1)/2])/2; --> Mistake
    median = (obs[count / 2 - 1] + obs[count / 2]) / 2.0;
    // int median cannot handle this case: the average of the two middle values need not be an integer
}
else
{
    // median=obs[(count-1)/2]; --> this one was already right; for odd count it equals obs[count / 2]
    median = obs[count / 2];
}
One more thing to double-check: in your loop the break happens before count++, so count correctly excludes the terminating -1. If you ever restructure the loop so that the -1 does get counted, remember to subtract one before computing the mean and the median. Since you have not posted the code for the mode, if you think that isn't working either, this might be the problem.
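Putting those fixes together, here is a minimal sketch (not the exact assignment template: it assumes the observations are entered in ascending order, that at least one observation precedes the -1, and uses a hypothetical SIZE constant in place of the original size; the mode is left out since that code wasn't posted):

#include <stdio.h>

#define SIZE 100                 /* hypothetical upper bound standing in for the original `size` */

int main(void)
{
    int obs[SIZE];
    int sum = 0, count = 0, i;
    int mean;
    float median;

    for (i = 0; i < SIZE; i++)
    {
        scanf("%d", &obs[i]);          /* obs[i], not obs[size] */
        if (obs[i] == -1)
            break;                     /* -1 terminates the input and is never counted */
        sum = sum + obs[i];
        count++;
    }

    mean = sum / count;

    if (count % 2 == 0)
        median = (obs[count / 2 - 1] + obs[count / 2]) / 2.0f;   /* average of the two middle values */
    else
        median = obs[count / 2];                                 /* the single middle value */

    printf("%d\t%.1f\n", mean, median);
    return 0;
}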

Related

Picking random indexes into a sorted array

Let's say I have a sorted array of values:
int n = 4; // always lower than or equal to the number of unique values in the array
int i[256] = {0};
int v[] = {1, 1, 2, 4, 5, 5, 5, 5, 5, 7, 7, 9, 9, 11, 11, 13};
// EX 1, EX 2, EX 3: three example selections, each marking four valid indices
// (the rightmost occurrence of each chosen value, always including the last element)
I would like to generate n random index values i[0] ... i[n-1], so that:
v[i[0]] ... v[i[n-1]] point to unique numbers (i.e. must not point to 5 twice)
Each index must point to the rightmost element of its kind (i.e. must point to the last 5)
An index to the final number (13 in this case) should always be included.
What I've tried so far:
Getting the indexes to the last of the unique values
Shuffling the indexes
Pick out the n first indexes
I'm implementing this in C, so the more standard C functions I can rely on and the shorter code, the better. (For example, shuffle is not a standard C function, but if I must, I must.)
Create an array of the last index values
int last[] = { 1, 2, 3, 8, 10, 12, 14 };
Fisher-Yates shuffle the array.
Take the first n-1 elements from the shuffled array.
Add the index to the final number.
Sort the resulting array, if desired.
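Here is one way those steps might look in C (a sketch only; it assumes the same randrange() helper that the reservoir-sampling code below relies on, and that the last[] array of last-occurrence indices has already been built):

#include <string.h>

/* Assumed helper, as in the code further down: uniform random integer in [lo, hi). */
size_t randrange(size_t lo, size_t hi);

/* Picks n indices from last[0..m-1] (the last occurrence of each unique value),
 * always including the final index last[m-1].  Shuffles last[] in place.
 * Assumes 1 <= n <= m and m >= 2. */
void pick_indices(size_t *last, size_t m, size_t *out, size_t n)
{
    /* Fisher-Yates shuffle of last[0..m-2]; the final index stays in place. */
    for (size_t i = m - 2; i > 0; --i) {
        size_t j = randrange(0, i + 1);
        size_t tmp = last[i];
        last[i] = last[j];
        last[j] = tmp;
    }
    memcpy(out, last, (n - 1) * sizeof *out);   /* the first n-1 shuffled indices */
    out[n - 1] = last[m - 1];                   /* plus the index of the final number */
    /* sort `out` here if a sorted result is desired */
}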
The following algorithm is called reservoir sampling, and can be used whenever you know how big a sample you need but not how many elements you're sampling from. (The name comes from the idea that you always maintain a reservoir of the correct number of samples. When a new value comes in, you mix it into the reservoir, remove a random element, and continue.)
Create the return value array sample of size n.
Start scanning the input array. Each time you find a new value, add its index to the end of sample, until you have n sampled elements.
Continue scanning the array, but now when you find a new value:
a. Choose a random number r in the range [0, i) where i is the number of unique values seen so far.
b. If r is less than n, overwrite element r with the new element.
When you get to the end, sort sample, assuming you need it to be sorted.
To make sure you always have the last element in the sample, run the above algorithm to select a sample of size n-1. Only consider a new element when you have found a bigger one.
The algorithm is linear in the size of v (plus an n log n term for the sort in the last step.) If you already have the list of last indices of each value, there are faster algorithms (but then you would know the size of the universe before you started sampling; reservoir sampling is primarily useful if you don't know that.)
In fact, it is not conceptually different from collecting all the indices and then finding the prefix of a Fisher-Yates shuffle. But it uses O(n) temporary memory instead of enough to store the entire index list, which may be considered a plus.
Here's an untested sample C implementation (which requires you to write the function randrange()):
/* Produces (in `out`) a uniformly distributed sample of maximum size
* `outlen` of the indices of the last occurrences of each unique
* element in `in` with the requirement that the last element must
* be in the sample.
* Requires: `in` must be sorted.
* Returns: the size of the generated sample, which will be `outlen`
* unless there were not enough unique elements.
* Note: `out` is not sorted, except that the last element in the
* generated sample is the last valid index in `in`
*/
size_t sample(int* in, size_t inlen, size_t* out, size_t outlen) {
    size_t found = 0;
    if (inlen && outlen) {
        // The last output is fixed so we need outlen-1 random indices
        --outlen;
        int prev = in[0];
        for (size_t curr = 1; curr < inlen; ++curr) {
            if (in[curr] == prev) continue;
            // Add curr - 1 to the output
            size_t r = randrange(0, ++found);
            if (r < outlen) out[r] = curr - 1;
            prev = in[curr];
        }
        // Add the last index to the output
        if (found > outlen) found = outlen;
        out[found++] = inlen - 1;   // count the fixed last index in the returned size
    }
    return found;
}
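For reference, here is one possible randrange() and a small usage sketch with the array from the question (assuming sample() above is in the same file; the helper is purely illustrative and ignores modulo bias):

#include <stdio.h>
#include <stdlib.h>

/* one possible randrange(): uniform-ish integer in [lo, hi) */
size_t randrange(size_t lo, size_t hi)
{
    return lo + (size_t)rand() % (hi - lo);
}

int main(void)
{
    int v[] = {1, 1, 2, 4, 5, 5, 5, 5, 5, 7, 7, 9, 9, 11, 11, 13};
    size_t idx[4];
    size_t got = sample(v, sizeof v / sizeof *v, idx, 4);
    for (size_t i = 0; i < got; ++i)
        printf("i = %zu, v[i] = %d\n", idx[i], v[idx[i]]);
    return 0;
}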

Minimum Size Subarray Sum with sorting

The Minimum Size Subarray Sum problem:
given an array of n positive integers and a positive integer s, find the minimal length of a contiguous subarray whose sum is ≥ s. If there isn't one, return 0 instead.
For example, given the array [2,3,1,2,4,3] and s = 7,
the subarray [4,3] has the minimal length under the problem constraint.
The following is my solution:
public int minSubArrayLen(int s, int[] nums) {
    long sum = 0;
    int a = 0;
    if (nums.length < 1)
        return 0;
    Arrays.sort(nums);
    for (int i = nums.length - 1; i >= 0; i--) {
        sum += nums[i];
        a++;
        if (sum >= s)
            break;
    }
    if (sum < s) {
        return 0;
    }
    return a;
}
This solution was not accepted because it did not pass the following test case:
697439
[5334,6299,4199,9663,8945,3566,9509,3124,6026,6250,7475,5420,9201,9501,38,5897,4411,6638,9845,161,9563,8854,3731,5564,5331,4294,3275,1972,1521,2377,3701,6462,6778,187,9778,758,550,7510,6225,8691,3666,4622,9722,8011,7247,575,5431,4777,4032,8682,5888,8047,3562,9462,6501,7855,505,4675,6973,493,1374,3227,1244,7364,2298,3244,8627,5102,6375,8653,1820,3857,7195,7830,4461,7821,5037,2918,4279,2791,1500,9858,6915,5156,970,1471,5296,1688,578,7266,4182,1430,4985,5730,7941,3880,607,8776,1348,2974,1094,6733,5177,4975,5421,8190,8255,9112,8651,2797,335,8677,3754,893,1818,8479,5875,1695,8295,7993,7037,8546,7906,4102,7279,1407,2462,4425,2148,2925,3903,5447,5893,3534,3663,8307,8679,8474,1202,3474,2961,1149,7451,4279,7875,5692,6186,8109,7763,7798,2250,2969,7974,9781,7741,4914,5446,1861,8914,2544,5683,8952,6745,4870,1848,7887,6448,7873,128,3281,794,1965,7036,8094,1211,9450,6981,4244,2418,8610,8681,2402,2904,7712,3252,5029,3004,5526,6965,8866,2764,600,631,9075,2631,3411,2737,2328,652,494,6556,9391,4517,8934,8892,4561,9331,1386,4636,9627,5435,9272,110,413,9706,5470,5008,1706,7045,9648,7505,6968,7509,3120,7869,6776,6434,7994,5441,288,492,1617,3274,7019,5575,6664,6056,7069,1996,9581,3103,9266,2554,7471,4251,4320,4749,649,2617,3018,4332,415,2243,1924,69,5902,3602,2925,6542,345,4657,9034,8977,6799,8397,1187,3678,4921,6518,851,6941,6920,259,4503,2637,7438,3893,5042,8552,6661,5043,9555,9095,4123,142,1446,8047,6234,1199,8848,5656,1910,3430,2843,8043,9156,7838,2332,9634,2410,2958,3431,4270,1420,4227,7712,6648,1607,1575,3741,1493,7770,3018,5398,6215,8601,6244,7551,2587,2254,3607,1147,5184,9173,8680,8610,1597,1763,7914,3441,7006,1318,7044,7267,8206,9684,4814,9748,4497,2239]
The expected answer is 132 but my output was 80.
Does anyone have any idea what went wrong with my algorithm/code?
I will simply explain the flaw in the logic rather than giving the correct logic to handle the problem statement.
You are taking the numbers in a specific sequence and then adding them up until you reach the sum. But the required sum can quite easily be reached by a different combination of numbers that your order never considers.
For example [2,3,1,2,4,3] and s = 7.
Based on your logic:
Step 1 -> Sort the numbers and you get [1,2,2,3,3,4]
Step 2 -> You pick the last 2 numbers (3,4) to get your sum 7
Let's change the sum to 8.
From Step 2 -> You get 4+3+3 = 10, so you break out of the loop. After this step you return a = 3.
The flaw here is that 4+3+1 also makes 8, a combination your logic skips.
In the same way, 3+3+2 is also a possible combination to achieve 8.
Sorting the array is the first flaw in the logic itself. The problem asks about subarrays of the existing arrangement; sorting changes that arrangement, so you will never reliably get the expected solution.

Efficient way to detect "rank of corner" in flattened multi-dimensional array

This is a small piece of very frequently-called code, and part of a convolution algorithm I am trying to optimise (technically it's my first-pass optimisation, and I have already improved speed by a factor of 2, but now I am stuck):
inline int corner_rank( int max_ranks, int *shape, int pos ) {
    int i;
    int corners = 0;
    for ( i = 0; i < max_ranks; i++ ) {
        if ( pos % shape[i] ) break;
        pos /= shape[i];
        corners++;
    }
    return corners;
}
The code is being used to calculate a property of a position pos within an N-dimensional array (that has been flattened to pointer, plus arithmetic). max_ranks is the dimensionality, and shape is the array of sizes in each dimension.
An example 3-dimensional array might have max_ranks = 3, and shape = { 3, 4, 5 }. The schematic layout of the first few elements might look like this:
0       1       2       3       4       5       6       7       8
[0,0,0] [1,0,0] [2,0,0] [0,1,0] [1,1,0] [2,1,0] [0,2,0] [1,2,0] [2,2,0]
Returned by function:
3       0       0       1       0       0       1       0       0
Where the first row 0..8 shows the index offset given by pos, and the numbers below give the multi-dimensional indices. Edit: Below that I have put the value returned by the function (the value of 2 is returned at positions 12, 24 and 36).
The function is effectively returning the number of "leading" zeros in the multi-dimensional index, and is designed as it is to avoid needing to make a full conversion to array indices on every increment.
Is there anything I can do with this function to make it inherently faster? Is there a clever way of avoiding %, or another way to calculate the "corner rank" - apologies by the way if it has a more formal name that I do not know . . .
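Just to make the table above concrete, here is a tiny usage sketch for the shape {3, 4, 5} (assuming corner_rank as posted is in the same file); it prints, for example, "12 -> 2":

#include <stdio.h>

int main(void)
{
    int shape[] = {3, 4, 5};
    for (int pos = 0; pos < 3 * 4 * 5; ++pos)
        printf("%d -> %d\n", pos, corner_rank(3, shape, pos));
    return 0;
}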
The only time you should return max_ranks is if pos equals zero. Checking for this allows you to remove the conditional check from your for-loop. This should improve both the worst case completion time, and speed of the looping for large values of max_ranks.
Here is my addition, plus an alternative way of avoiding the division operation. I believe that this is as fast as a handwritten div like @twalberg was suggesting, unless there is some way to produce the remainder without a second multiplication.
I'm afraid since the most common answer is 0 (which doesn't even get past the first mod call) you aren't going to see much improvement. My guess is that your average run time is very close to the run time of the modulus function itself. You might try searching for a faster way to determine if a number is a factor of pos. You don't actually need to calculate the remainder; you just need to know if there is a remainder or not.
Sorry if I made things confusing by restructuring your code. I believe this will be slightly faster unless your compiler was already making these optimizations.
inline int corner_rank( int max_ranks, int *shape, int pos ) {
    // Most calls will not get farther than this.
    if (pos % shape[0] != 0) return 0;
    // One check here guarantees that the loop below always returns.
    if (pos == 0) return max_ranks;
    int divisor = shape[0] * shape[1];
    int i = 1;
    while (1) {
        if (pos % divisor != 0) return i;
        divisor *= shape[++i];
    }
}
Also try declaring pos and divisor as the smallest types possible. If they will never be greater than 255 you can use an unsigned char. I know that some processors can perform a divide with smaller numbers faster than larger numbers, but you have to set your variable types appropriately.

Given an array of booleans, what is the most efficient way to select the index of a random TRUE value?

You're given an array of size n, containing arbitrary boolean values.
What is the fastest way to return the index of a random TRUE value.
The algorithm should randomly return any one of the indices containing a TRUE.
Something like this:
int count = 0;
int index = -1;
for (int i = 0; i != n; ++i)
{
    if (values[i])
    {
        ++count;
        if (unit_random <= 1.0f / count)
        {
            index = i;
        }
    }
}
So for 4 values for example you get the following probabilities for their indices:
1: (1 / 1) * (1 / 2) * (2 / 3) * (3 / 4) = 1 / 4
2: (1 / 2) * (2 / 3) * (3 / 4) = 1 / 4
3: (1 / 3) * (3 / 4) = 1 / 4
4: 1 / 4 = 1 / 4
EDIT: As Steve Jessop pointed out, the floating point comparison will eventually lead to a very non-uniform selection. Assuming unit_random is defined as rand() / RAND_MAX, the comparison can be changed to:
typedef unsigned long long u64;
u64 product = u64(count) * rand();
if (product <= u64(RAND_MAX))
This won't give a perfect distribution due to the discrete nature of rand(), but it will be better.
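Pulled together as a function, the single-pass selection with the integer test might look like this (a sketch only; the function name is illustrative and rand() is assumed to be seeded elsewhere):

#include <stdbool.h>
#include <stdlib.h>

/* Returns the index of a uniformly chosen true entry, or -1 if there is none. */
int random_true_index_scan(const bool *values, int n)
{
    int count = 0;
    int index = -1;
    for (int i = 0; i != n; ++i) {
        if (values[i]) {
            ++count;
            /* keep the new index with probability ~1/count, using integers only */
            unsigned long long product = (unsigned long long)count * (unsigned long long)rand();
            if (product <= (unsigned long long)RAND_MAX)
                index = i;
        }
    }
    return index;
}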
The quickest solution - assuming you don't select repeatedly on the same array - is to pick a random index, return it if it's true, and repeat if not. In the best case, where all entries are true, this is O(1); in the worst case, where only one entry is true, this is O(n) (each try has a 1/n chance of hitting the only true value, which means an expected number of tries of n). This is no worse than any of the other posted solutions.
If you expect the array to usually be almost all false, though, you may want to pick another solution, as the variance in runtime of this random method will be high.
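A minimal C sketch of this retry approach (the cap on attempts is an addition so the sketch cannot loop forever on an all-false array; drop it if you know at least one entry is true):

#include <stdbool.h>
#include <stdlib.h>

/* Returns the index of a random true entry, or -1 if none was hit within max_tries. */
int random_true_index_retry(const bool *values, int n, int max_tries)
{
    for (int t = 0; t < max_tries; ++t) {
        int i = rand() % n;          /* random index; modulo bias ignored for the sketch */
        if (values[i])
            return i;
    }
    return -1;
}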
It's not quite clear what "randomly distributed" means. Does it mean "with some unknown distribution"? If so, let's pretend all possible distributions are equally probable, so the "expected distribution" (like in "expected value") is uniform (the "average" of all possible distributions.) Then any index is TRUE with a probability of 1/2. So your task becomes the task of iterating through the array as quickly as possible. Start at the beginning, like you would normally iterate an array, until you encounter a TRUE value.
In order to return it, you must first count the True values (there is no way to skip that) and accumulate their indices in another array. After counting, you just generate a random integer from 0 to N-1 (where N is the number of True values) and pick the value from the created array.
in pseudo-python:
indices = []
for i, val in enumerate(arr):
    if val:
        indices.append(i)
randi = randint(0, len(indices) - 1)
return indices[randi]
Simple solution: generate a permutation of the possible indexes (1:n ?) and, in the order of that permutation, return the first index whose corresponding value is true.
import random

def randomTrue(x):
    perm = list(range(len(x)))
    random.shuffle(perm)   # stands in for the pseudocode's randomPermute
    for i in perm:
        if x[i]:
            return i

How can I find a number which occurs an odd number of times in a SORTED array in O(n) time?

I have a question and I tried to think over it again and again... but got nothing so posting the question here. Maybe I could get some view-point of others, to try and make it work...
The question is: we are given a SORTED array, which consists of a collection of values occurring an EVEN number of times, except one, which occurs ODD number of times. We need to find the solution in log n time.
It is easy to find the solution in O(n) time, but it looks pretty tricky to perform in log n time.
Theorem: Every deterministic algorithm for this problem probes Ω(log² n) memory locations in the worst case.
Proof (completely rewritten in a more formal style):
Let k > 0 be an odd integer and let n = k². We describe an adversary that forces (log₂(k + 1))² = Ω(log² n) probes.
We call the maximal subsequences of identical elements groups. The adversary's possible inputs consist of k length-k segments x_1 x_2 … x_k. For each segment x_j, there exists an integer b_j ∈ [0, k] such that x_j consists of b_j copies of j - 1 followed by k - b_j copies of j. Each group overlaps at most two segments, and each segment overlaps at most two groups.
Group boundaries
| | | | |
0 0 1 1 1 2 2 3 3
| | | |
Segment boundaries
Wherever there is an increase of two, we assume a double boundary by convention.
Group boundaries
| || | |
0 0 0 2 2 2 2 3 3
Claim: The location of the jth group boundary (1 ≤ j ≤ k) is uniquely determined by the segment x_j.
Proof: It's just after the ((j - 1)k + b_j)th memory location, and x_j uniquely determines b_j. //
We say that the algorithm has observed the jth group boundary in case the results of its probes of x_j uniquely determine x_j. By convention, the beginning and the end of the input are always observed. It is possible for the algorithm to uniquely determine the location of a group boundary without observing it.
Group boundaries
| X | | |
0 0 ? 1 2 2 3 3 3
| | | |
Segment boundaries
Given only 0 0 ?, the algorithm cannot tell for sure whether ? is a 0 or a 1. In context, however, ? must be a 1, as otherwise there would be three odd groups, and the group boundary at X can be inferred. These inferences could be problematic for the adversary, but it turns out that they can be made only after the group boundary in question is "irrelevant".
Claim: At any given point during the algorithm's execution, consider the set of group boundaries that it has observed. Exactly one consecutive pair is at odd distance, and the odd group lies between them.
Proof: Every other consecutive pair bounds only even groups. //
Define the odd-length subsequence bounded by the special consecutive pair to be the relevant subsequence.
Claim: No group boundary in the interior of the relevant subsequence is uniquely determined. If there is at least one such boundary, then the identity of the odd group is not uniquely determined.
Proof: Without loss of generality, assume that each memory location not in the relevant subsequence has been probed and that each segment contained in the relevant subsequence has exactly one location that has not been probed. Suppose that the jth group boundary (call it B) lies in the interior of the relevant subsequence. By hypothesis, the probes to x_j determine B's location up to two consecutive possibilities. We call the one at odd distance from the left observed boundary odd-left and the other odd-right. For both possibilities, we work left to right and fix the location of every remaining interior group boundary so that the group to its left is even. (We can do this because they each have two consecutive possibilities as well.) If B is at odd-left, then the group to its left is the unique odd group. If B is at odd-right, then the last group in the relevant subsequence is the unique odd group. Both are valid inputs, so the algorithm has uniquely determined neither the location of B nor the odd group. //
Example:
Observed group boundaries; relevant subsequence marked by […]
[ ] |
0 0 Y 1 1 Z 2 3 3
| | | |
Segment boundaries
Possibility #1: Y=0, Z=2
Possibility #2: Y=1, Z=2
Possibility #3: Y=1, Z=1
As a consequence of this claim, the algorithm, regardless of how it works, must narrow the relevant subsequence to one group. By definition, it therefore must observe some group boundaries. The adversary now has the simple task of keeping open as many possibilities as it can.
At any given point during the algorithm's execution, the adversary is internally committed to one possibility for each memory location outside of the relevant subsequence. At the beginning, the relevant subsequence is the entire input, so there are no initial commitments. Whenever the algorithm probes an uncommitted location of x_j, the adversary must commit to one of two values: j - 1, or j. If it can avoid letting the jth boundary be observed, it chooses a value that leaves at least half of the remaining possibilities (with respect to observation). Otherwise, it chooses so as to keep at least half of the groups in the relevant interval and commits values for the others.
In this way, the adversary forces the algorithm to observe at least log₂(k + 1) group boundaries, and in observing the jth group boundary, the algorithm is forced to make at least log₂(k + 1) probes.
Extensions:
This result extends straightforwardly to randomized algorithms by randomizing the input, replacing "at best halved" (from the algorithm's point of view) with "at best halved in expectation", and applying standard concentration inequalities.
It also extends to the case where no group can be larger than s copies; in this case the lower bound is Ω(log n log s).
A sorted array suggests a binary search. We have to redefine equality and comparison. Equality simply means an odd number of elements. We can do the comparison by observing the index of the first or last element of the group. The first element will be at an even index (0-based) before the odd group, and at an odd index after the odd group. We can find the first and last elements of a group using binary search. The total cost is O((log N)²).
PROOF OF O((log N)²)
T(2) = 1 //to make the summation nice
T(N) = log(N) + T(N/2) //log(N) is finding the first/last elements
For some N=2^k,
T(2^k) = (log 2^k) + T(2^(k-1))
= (log 2^k) + (log 2^(k-1)) + T(2^(k-2))
= (log 2^k) + (log 2^(k-1)) + (log 2^(k-2)) + ... + (log 2^2) + 1
= k + (k-1) + (k-2) + ... + 1
= k(k+1)/2
= (k² + k)/2
= (log(N)² + log(N))/ 2
= O(log(N)²)
Look at the middle element of the array. With a couple of appropriate binary searches, you can find its first and last appearance in the array. E.g., if the middle element is 'a', you need to find i and j as shown below:
[* * * * a a a a * * *]
         ^     ^
         |     |
         |     |
         i     j
Is j - i an even number? You are done! Otherwise (and this is the key here), the question to ask is: is i an even or an odd number? Do you see what this piece of knowledge implies? Then the rest is easy.
This answer is in support of the answer posted by "throwawayacct". He deserves the bounty. I spent some time on this question and I'm totally convinced that his proof is correct that you need Ω(log(n)^2) queries to find the number that occurs an odd number of times. I'm convinced because I ended up recreating the exact same argument after only skimming his solution.
In the solution, an adversary creates an input to make life hard for the algorithm, but also simple for a human analyzer. The input consists of k pages that each have k entries. The total number of entries is n = k^2, and it is important that O(log(k)) = O(log(n)) and Ω(log(k)) = Ω(log(n)). To make the input, the adversary makes a string of length k of the form 00...011...1, with the transition in an arbitrary position. Then each symbol in the string is expanded into a page of length k of the form aa...abb...b, where on the ith page, a=i and b=i+1. The transition on each page is also in an arbitrary position, except that the parity agrees with the symbol that the page was expanded from.
It is important to understand the "adversary method" of analyzing an algorithm's worst case. The adversary answers queries about the algorithm's input, without committing to future answers. The answers have to be consistent, and the game is over when the adversary has been pinned down enough for the algorithm to reach a conclusion.
With that background, here are some observations:
1) If you want to learn the parity of a transition in a page by making queries in that page, you have to learn the exact position of the transition and you need Ω(log(k)) queries. Any collection of queries restricts the transition point to an interval, and any interval of length more than 1 has both parities. The most efficient search for the transition in that page is a binary search.
2) The most subtle and most important point: There are two ways to determine the parity of a transition inside a specific page. You can either make enough queries in that page to find the transition, or you can infer the parity if you find the same parity in both an earlier and a later page. There is no escape from this either-or. Any set of queries restricts the transition point in each page to some interval. The only restriction on parities comes from intervals of length 1. Otherwise the transition points are free to wiggle to have any consistent parities.
3) In the adversary method, there are no lucky strikes. For instance, suppose that your first query in some page is toward one end instead of in the middle. Since the adversary hasn't committed to an answer, he's free to put the transition on the long side.
4) The end result is that you are forced to directly probe the parities in Ω(log(k)) pages, and the work for each of these subproblems is also Ω(log(k)).
5) Things are not much better with random choices than with adversarial choices. The math is more complicated, because now you can get partial statistical information, rather than a strict yes you know a parity or no you don't know it. But it makes little difference. For instance, you can give each page length k^2, so that with high probability, the first log(k) queries in each page tell you almost nothing about the parity in that page. The adversary can make random choices at the beginning and it still works.
Start at the middle of the array and walk backward until you get to a value that's different from the one at the center. Check whether the number above that boundary is at an odd or even index. If it's odd, then the number occurring an odd number of times is to the left, so repeat your search between the beginning and the boundary you found. If it's even, then the number occurring an odd number of times must be later in the array, so repeat the search in the right half.
As stated, this has both a logarithmic and a linear component. If you want to keep the whole thing logarithmic, instead of just walking backward through the array to a different value, you want to use a binary search instead. Unless you expect many repetitions of the same numbers, the binary search may not be worthwhile though.
I have an algorithm which works in log(N/C)*log(K) time, where K is the length of the maximum same-value range and C is the length of the range being searched for.
The main difference of this algorithm from most posted before is that it takes advantage of the case where all same-value ranges are short. It finds boundaries not by binary-searching the entire array, but by first quickly finding a rough estimate by jumping back by 1, 2, 4, 8, ... (log(K) iterations) steps, and then binary-searching the resulting range (log(K) again).
The algorithm is as follows (written in C#):
// Finds the start of the range of equal numbers containing the index "index",
// which is assumed to be inside the array
//
// Complexity is O(log(K)) with K being the length of range
static int findRangeStart(int[] arr, int index)
{
    int candidate = index;
    int value = arr[index];
    int step = 1;
    // find the boundary for binary search:
    while (candidate >= 0 && arr[candidate] == value)
    {
        candidate -= step;
        step *= 2;
    }
    // binary search:
    int a = Math.Max(0, candidate);
    if (arr[a] == value) return a;   // the range extends all the way to index 0
    int b = candidate + step / 2;
    while (a + 1 != b)
    {
        int c = (a + b) / 2;
        if (arr[c] == value)
            b = c;
        else
            a = c;
    }
    return b;
}
// Finds the index after the only "odd" range of equal numbers in the array.
// The result should be in the range (start; end]
// The "end" is considered to always be the end of some equal number range.
static int search(int[] arr, int start, int end)
{
    if (arr[start] == arr[end - 1])
        return end;
    int middle = (start + end) / 2;
    int rangeStart = findRangeStart(arr, middle);
    if ((rangeStart & 1) == 0)
        return search(arr, middle, end);
    return search(arr, start, rangeStart);
}

// Finds the index after the only "odd" range of equal numbers in the array
static int search(int[] arr)
{
    return search(arr, 0, arr.Length);
}
Take the middle element e. Use binary search to find the first and last occurrence. O(log(n))
If the number of occurrences is odd, return e.
Otherwise, recurse onto the side that has an odd number of elements [....]eeee[....]
Runtime will be log(n) + log(n/2) + log(n/4).... = O(log(n)^2).
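A sketch of that recursion in C (names are illustrative; it assumes a sorted array with exactly one value occurring an odd number of times, so the recursion always has a valid range to descend into):

#include <stddef.h>

/* First index of arr[idx]'s run, searching within [lo, idx]. */
static size_t first_of_run(const int *arr, size_t lo, size_t idx)
{
    size_t a = lo, b = idx;                 /* invariant: arr[b] == arr[idx] */
    while (a < b) {
        size_t m = (a + b) / 2;
        if (arr[m] == arr[idx]) b = m; else a = m + 1;
    }
    return b;
}

/* Last index of arr[idx]'s run, searching within [idx, hi). */
static size_t last_of_run(const int *arr, size_t idx, size_t hi)
{
    size_t a = idx, b = hi - 1;             /* invariant: arr[a] == arr[idx] */
    while (a < b) {
        size_t m = (a + b + 1) / 2;
        if (arr[m] == arr[idx]) a = m; else b = m - 1;
    }
    return a;
}

/* Value occurring an odd number of times in arr[lo, hi); ranges always start
 * and end on run boundaries, so every run inside them is complete. */
int odd_value(const int *arr, size_t lo, size_t hi)
{
    size_t mid = (lo + hi) / 2;
    size_t i = first_of_run(arr, lo, mid);
    size_t j = last_of_run(arr, mid, hi);
    if ((j - i + 1) % 2 == 1)
        return arr[mid];                    /* the middle run itself is the odd one */
    if ((i - lo) % 2 == 1)
        return odd_value(arr, lo, i);       /* odd run is somewhere to the left */
    return odd_value(arr, j + 1, hi);       /* otherwise to the right */
}

Called as odd_value(arr, 0, n) on the whole array, this does two binary searches per level and halves (roughly) the range each time, matching the O(log(n)^2) bound above.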
AHhh. There is an answer.
Do a binary search and as you search, for each value, move backwards until you find the first entry with that same value. If its index is even, it is before the oddball, so move to the right.
If its array index is odd, it is after the oddball, so move to the left.
In pseudocode (this is the general idea, not tested...):
private static int FindOddBall(int[] ary)
{
    int l = 0,
        r = ary.Length - 1;
    int n = (l + r) / 2;
    while (r > l + 2)
    {
        n = (l + r) / 2;
        while (ary[n] == ary[n-1])
            n = FindBreakIndex(ary, l, n);
        if (n % 2 == 0) // even index: we are on or to the left of the oddball
            l = n;
        else // odd index: we are to the right of the oddball
            r = n - 1;
    }
    return ary[l];
}

private static int FindBreakIndex(int[] ary, int l, int n)
{
    var t = ary[n];
    var r = n;
    while (ary[n] != t || ary[n] == ary[n-1])
        if (ary[n] == t)
        {
            r = n;
            n = (l + r) / 2;
        }
        else
        {
            l = n;
            n = (l + r) / 2;
        }
    return n;
}
You can use this algorithm:
int GetSpecialOne(int[] array, int length)
{
    int specialOne = array[0];
    for (int i = 1; i < length; i++)
    {
        specialOne ^= array[i];
    }
    return specialOne;
}
Solved with the help of a similar question which can be found here on http://www.technicalinterviewquestions.net
We don't have any information about the distribution of lengths inside the array, or of the array as a whole, right?
So the array length might be 1, 11, 101, 1001 or something, at least 1 with no upper bound, and it must contain at least 1 distinct value ('number') and at most (length-1)/2 + 1; for total sizes of 1, 11, 101 that's 1, 1 to 6, and 1 to 51 distinct values, and so on.
Shall we assume every possible size to be of equal probability? This would lead to an average sublist length of size/4, wouldn't it?
An array of size 5 could be divided into 1, 2 or 3 sublists.
What seems to be obvious is not that obvious, if we go into details.
An array of size 5 can be 'divided' into one sublist in just one way, with arguable right to call it 'dividing'. It's just a list of 5 elements (aaaaa). To avoid confusion, let's assume the elements inside the list to be ordered characters, not numbers (a, b, c, ...).
Divided into two sublists, they might be (1, 4), (2, 3), (3, 2), (4, 1): (abbbb, aabbb, aaabb, aaaab).
Now let's look back at the claim made before: shall the 'division' (5) be assumed to have the same probability as those 4 divisions into 2 sublists? Or shall we mix them together and assume every partition to be equally probable, (1/5)?
Or can we calculate the solution without knowing the probability of the lengths of the sublists?
The clue is you're looking for log(n). That's less than n.
Stepping through the entire array, one at a time? That's n. That's not going to work.
We know the first two indexes in the array (0 and 1) should be the same number. Same with 50 and 51, if the odd number in the array is after them.
So find the middle element in the array, compare it to the element right after it. If the change in numbers happens on the wrong index, we know the odd number in the array is before it; otherwise, it's after. With one set of comparisons, we figure out which half of the array the target is in.
Keep going from there.
Use a hash table.
For each element E in the input set
    if E is already in the hash table
        increment its count
    else
        insert E into the hash table with a count of 1
For each key K in the hash table
    if the count stored for K is odd
        return K
As this algorithm is 2n, it belongs to O(n).
Try this:
int getOddOccurrence(int ar[], int ar_size)
{
    int i;
    int xor = 0;
    for (i = 0; i < ar_size; i++)
        xor = xor ^ ar[i];
    return xor;
}
XOR will cancel out every time you XOR with the same number, so 1^1 = 0 but 1^1^1 = 1; every pair cancels out, leaving only the number that occurs an odd number of times.
Assume indexing starts at 0. Binary-search for the smallest even i such that x[i] != x[i+1]; your answer is x[i].
edit: due to public demand, here is the code
int f(int *x, int min, int max) {
    int size = max;
    min /= 2;
    max /= 2;
    while (min < max) {
        int i = (min + max) / 2;
        if (i == 0 || x[2*i - 1] == x[2*i])
            min = i + 1;
        else
            max = i - 1;
    }
    if (2*max == size || x[2*max] != x[2*max + 1])
        return x[2*max];
    return x[2*min];
}
