Making use of binary search algorithm - c

I'm more interested in the thought process than in the code itself. I'm supposed to find the number of elements v[i] in a sorted array such that x < v[i] < y, where x and y are read as input from the keyboard.
I'm supposed to solve this efficiently using a modified binary search instead of a regular search through the vector.
My problem is that I can't visualize how to implement a binary search in this case; the method just doesn't seem to fit.
Any thoughts ?

You could do a binary search in the array for x and y. Then you could subtract the index of x from the index of y to get how many items are between them.
You'd actually have to subtract 1 from that result since you are using strictly less-than.
For example, say you have an array:
[3, 6, 8, 10, 14, 39, 41, 100]
Let x=8 and y = 39.
The index of x is 2 and the index of y is 5.
5-2 = 3
3-1 = 2
If x and y are allowed to be values that are not contained in your array, you can still follow the same approach. Search for x and if it is not found, use the index of the element that is just larger than x. Likewise for the index of the element that is just smaller than y.
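The two boundary searches described above can be sketched with Python's bisect module (my choice for illustration; any binary search that returns these two boundary indices works the same way):

```python
from bisect import bisect_left, bisect_right

def count_strictly_between(v, x, y):
    # v is sorted; count elements v[i] with x < v[i] < y
    lo = bisect_right(v, x)   # index of the first element > x
    hi = bisect_left(v, y)    # index of the first element >= y
    return max(0, hi - lo)
```

With the example array, count_strictly_between([3, 6, 8, 10, 14, 39, 41, 100], 8, 39) gives 2, matching the walkthrough above, and the same call works whether or not x and y themselves appear in the array.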

Supposing the original array v is sorted, just follow those steps:
Use binary search to locate value x in the array; since the comparison is strict, take the index of the closest value higher than x (if the lower and upper bounds in binary search meet without finding x, that meeting point is the index)
Do the same for the y value, this time taking the index of the closest value lower than y
Find out how many items are in between (by subtracting the first index from the second and adding 1)

Here's a great article if you're interested: Binary Search
If x and y are the lower and upper limits of the values within the array, it's as simple as applying a general binary search with those bounds in place of the usual first and last elements of the array.
Drawing it up on paper will help.

Related

Given an array of sorted integers, find the closest value to a given number in C. The array may contain duplicate values

For example, we have 5 values (a, b, c, e, d) in an array that may contain duplicate values. The values of a, b, c, d, e are calculated and assigned by another function.
Let the user input value be x.
We need to find the value closest to x in the array, and the output must be one of a, b, c, e, d. If the closest number is one of the duplicates, alphabetical order must be considered.
for example:
Array: a, b, c, e, d
a=6, b=5, c=3, d=9, e=9 are the values assigned to them by a function.
for x = 5, output: b
for x = 11, output: d
for x = 4, output: c
Try and implement the following algorithm:
get the element at the middle of the array;
if this element has the value x, the array contains it, and you know what to print;
if the element is larger than x, look in the first half of the array;
otherwise look in the second half of the array;
Once the array portion you search into has a size of 0, you have found the place where x would be inserted to preserve the ordering. The closest value is either the one on the left or the one on the right, if any. Compute the differences to determine what to print.
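Those steps can be sketched as follows (a minimal version, assuming the values have already been sorted; the function name and the lower-value tie-break are my own choices):

```python
from bisect import bisect_left

def closest(arr, x):
    # arr is sorted; return the value in arr closest to x
    i = bisect_left(arr, x)      # place where x would be inserted
    if i == 0:
        return arr[0]
    if i == len(arr):
        return arr[-1]
    left, right = arr[i - 1], arr[i]
    # compare the two neighbours; ties go to the lower value
    return left if x - left <= right - x else right
```

Mapping the winning value back to its name (a, b, c, ...) and breaking ties between duplicates alphabetically is bookkeeping on top of this search.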

The best order to choose elements in the random array to maximize output?

We have an array as input to production.
R = [5, 2, 8, 3, 6, 9]
If the ith input is chosen, the output is the sum of the ith element, the max element whose index is less than i, and the min element whose index is greater than i.
For example if I take 8, output would be 8+5+3=16.
Selected items cannot be selected again. So, if I select 8 the next array for next selection would look like R = [5, 2, 3, 6, 9]
What is the order to choose all inputs with maximum output in total? If possible, please send dynamic programming solutions.
I'll start the bidding with an O(n * 2^n) solution...
There are a number of ambiguities in your description of the problem, that you have declined to address in comments. None of these ambiguities affects the runtime complexity of this solution, but they do affect implementation details of the solution, so the solution is necessarily somewhat of a sketch.
The solution is as follows:
Create an array results of 2^n integers. Each array index i will denote a certain subsequence of the input, and results[i] will be the greatest sum that we can achieve starting with that subsequence.
A convenient way to manage the index-to-subsequence mapping is to represent the first element of the input using the least significant bit (the 1's place), the second element with the 2's place, etc.; so, for example, if our input is [5, 2, 8, 3, 6, 9], then the subsequence 5 2 8 would be represented as array index 000111 (binary) = 7, meaning results[7]. (You can also start with the most significant bit, which is probably more intuitive, but then the implementation of that mapping is a little bit less convenient. Up to you.)
Then proceed in order, from subset #0 (the empty subset) up through subset #(2^n - 1) (the full input), calculating each array element by seeing how much we get if we select each possible element and add the corresponding previously-stored values. So, for example, to calculate results[7] (for the subsequence 5 2 8), we select the largest of these values:
results[6] plus how much we get if we select the 5
results[5] plus how much we get if we select the 2
results[3] plus how much we get if we select the 8
Now, it might seem like it should require O(n^2) time to compute any given array element, since there are n elements in the input that we could potentially select, and seeing how much we get if we do so requires examining all other elements (to find the maximum among prior elements and the minimum among later elements). However, we can actually do it in just O(n) time by first making a pass from right to left to record the minimal value that is later than each element of the input, and then proceeding from left to right to try each possible value. (Two O(n) passes add up to O(n).)
An important caveat: I suspect that the correct solution only ever involves, at each step, selecting either the rightmost or second-to-rightmost element. If so, then the above solution calculates many, many more values than an algorithm that took that into account. For example, the result at index 111000 (binary) is clearly not relevant in that case. But I can't prove this suspicion, so I present the above O(n * 2^n) solution as the fastest solution whose correctness I'm certain of.
(I'm assuming that the elements are nonnegative absent a suggestion to the contrary.)
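A direct sketch of this table-filling scheme (my own Python rendering; it uses the straightforward O(n^2) inner scan rather than the two-pass O(n) refinement described above):

```python
def best_total(R):
    # results[mask]: best sum achievable when the remaining elements are
    # exactly those whose bits are set in mask (bit i = i-th input element)
    n = len(R)
    results = [0] * (1 << n)
    for mask in range(1, 1 << n):
        idx = [i for i in range(n) if mask >> i & 1]   # indices still present
        best = 0
        for p, i in enumerate(idx):
            gain = R[i]
            if p > 0:
                gain += max(R[j] for j in idx[:p])        # max to the left
            if p + 1 < len(idx):
                gain += min(R[j] for j in idx[p + 1:])    # min to the right
            best = max(best, gain + results[mask ^ (1 << i)])
        results[mask] = best
    return results[(1 << n) - 1]   # value for the full input
```

For instance, best_total([1, 2]) is 5: take the 1 first (1 plus the min to its right, 2), then the 2.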
Here's an O(n^2)-time algorithm based on ruakh's conjecture that there exists an optimal solution where every selection is from the rightmost two, which I prove below.
The states of the DP are (1) n, the number of elements remaining (2) k, the index of the rightmost element. We have a recurrence
OPT(n, k) = max(max(R(0), ..., R(n - 2)) + R(n - 1) + R(k) + OPT(n - 1, k),
max(R(0), ..., R(n - 1)) + R(k) + OPT(n - 1, n - 1)),
where the first line is when we take the second rightmost element, and the second line is when we take the rightmost. The empty max is zero. The base cases are
OPT(1, k) = R(k)
for all k.
Proof: the condition of choosing from the two rightmost elements is equivalent to the restriction that the element at index i (counting from zero) can be chosen only when at most i + 2 elements remain. We show by induction that there exists an optimal solution satisfying this condition for all i < j where j is the induction variable.
The base case is trivial, since every optimal solution satisfies the vacuous restriction for j = 0. In the inductive case, assume that there exists an optimal solution satisfying the restriction for all i < j. If j is chosen when there are more than j + 2 elements left, let's consider what happens if we defer that choice until there are exactly j + 2 elements left. None of the elements left of j are chosen in this interval by the inductive hypothesis, so they are irrelevant. Choosing the elements right of j can only be at least as profitable, since including j cannot decrease the max. Meanwhile, the set of elements left of j is the same at both times, and the set of the elements right of j is a subset at the later time as compared to the earlier time, so the min does not decrease. We conclude that this deferral does not affect the profitability of the solution.

Minimum Complexity of two lists element summation comparison

I have a question about algorithm design with arrays, to be implemented in C.
Suppose we have an array of n elements, where for simplicity n is a power of 2 (1, 2, 4, 8, 16, etc.). I want to separate it into two parts of n/2 elements each. The separation criterion is the lowest absolute difference between the sums of all elements in the two arrays. For example, the array (9,2,5,3,6,1,4,7) separates into (9,5,1,3) and (6,7,4,2): the first array's elements sum to 18, the second's to 19, and the difference is 1, so these two arrays are the answer. Two arrays like (9,5,4,2) and (7,6,3,1) are not the answer, because the difference of their sums is 3 and we have already found 1, so 3 isn't the minimum difference. How can I solve this?
Thank you.
This is the Partition Problem, which is unfortunately NP-Hard.
However, since your numbers are integers, if they are relatively low, there is a pseudo polynomial O(W*n^2) solution using Dynamic Programming (where W is sum of all elements).
The idea is to create the DP matrix of size (W/2+1)*(n+1)*(n/2+1), based on the following recursive formula:
D(0,i,0) = true
D(0,i,k) = false k != 0
D(x,i,k) = false x < 0
D(x,0,k) = false x > 0
D(x,i,0) = false x > 0
D(x,i,k) = D(x,i-1,k) OR D(x-arr[i], i-1,k-1)
The above gives a 3d matrix, where each entry D(x,i,k) says if there is a subset containing exactly k elements, that sums to x, and uses the first i elements as candidates.
Once you have this matrix, you just need to find the highest x (at most SUM/2) such that D(x,n,n/2) = true
Later, you can get the relevant subset by going back on the table and "retracing" your choices at each step. This thread deals with how it is done on a very similar problem.
For small sets, there is also the alternative of a naive brute-force solution, which basically splits the array into all possible halves (n!/((n/2)!*(n/2)!) of those) and picks the best one out of them.
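A sketch of the DP described above (my own code; it compresses the i dimension with the usual rolling-array trick and returns the minimal difference rather than the subsets themselves, which would need the retracing step mentioned above):

```python
def min_partition_diff(arr):
    # D[x][j] is True if some subset of the elements processed so far
    # has exactly j elements and sums to x
    n, total = len(arr), sum(arr)
    half = total // 2
    D = [[False] * (n // 2 + 1) for _ in range(half + 1)]
    D[0][0] = True
    for v in arr:
        # iterate downwards so each element is used at most once
        for x in range(half, v - 1, -1):
            for j in range(n // 2, 0, -1):
                if D[x - v][j - 1]:
                    D[x][j] = True
    # best achievable sum of an n/2-element subset not exceeding half
    best = max(x for x in range(half + 1) if D[x][n // 2])
    return total - 2 * best
```

For the example array (9,2,5,3,6,1,4,7) this returns 1, matching the (9,5,1,3) / (6,7,4,2) split.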

Find all pairs (x, y) in a sorted array so that x + y < z

This is an interview question. Given a sorted integer array and number z find all pairs (x, y) in the array so that x + y < z. Can it be done better than O(n^2)?
P.S. I know that we can find all pairs (x, y | x + y == z) in O(N).
You cannot necessarily find all such pairs in O(n) time, because there might be O(n^2) pairs of values that have this property. In general, an algorithm can't take any less time to run than the number of values that it produces.
Hope this helps!
In general, no, it can't. Consider the case where x + y < z for all x, y in the array. You have to touch (e.g. display) all of the n(n - 1)/2 possible pairs in the set. This is fundamentally O(n^2).
If you are asked to output all pairs that satisfy that property, I don't think there is anything better than O(N^2) since there can be O(N^2) pairs in the output.
But this is also true for x + y = z, for which you claim there is a O(N) solution - so I might be missing something.
I suspect the original question asked for the number of pairs. In that case, it can be done in O(N log(N)). For each element x, find y = z - x and do a binary search for y in the array. The position of y gives the number of pairs that can be formed with that particular value of x. Summing this over all values in the array gives you the answer. There are N values and finding the number of pairs for each takes O(log(N)) (binary search), so the whole thing is O(N log(N)).
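That counting idea can be sketched as follows (my code; note it restricts the binary search to indices after i so each unordered pair is counted exactly once, avoiding the double-counting that a naive sum over all x would give):

```python
from bisect import bisect_left

def count_pairs_below(arr, z):
    # arr sorted ascending; count pairs (i, j) with i < j and arr[i] + arr[j] < z
    total = 0
    for i, x in enumerate(arr):
        # first position at or after i+1 whose value is >= z - x;
        # everything between i+1 and that position pairs with x
        j = bisect_left(arr, z - x, i + 1)
        total += j - (i + 1)
    return total
```

For example, count_pairs_below([1, 2, 3, 4], 5) is 2: the pairs (1, 2) and (1, 3).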
You can find them in O(N), if you add the additional constraint that each element is unique.
After finding all of the x+y==z pairs, you know that for every x and y that satisfies that condition, every x or y (choose one) that is at a lower index than its pair satisfies the x+y < z condition.
Actually selecting these and outputting them would take O(n^2), but in a sense, the x+y==z pairs are a compressed form of the answer, together with the input.
(You can preprocess the input to a form where each element is unique, together with a counter for number of occurrences. This would take O(N) time. You can generalize this solution to unsorted arrays, increasing the time to O(nlogn).)
The justification for claiming that the pairs can be found in less time than is linearly proportional to the size of the solution: suppose the question were "what are the integers between 0 and a given input K?"
Because it is a sorted integer array, you could use the binary search algorithm: the best case is O(N), and the worst and average cases are O(N*logN).
You can sort the array, and for every element less than z, use binary search - total O(NlogN).
Total run-time : O(|P| + NlogN), where P is the resulting pairs.
There actually exists an O(nlogn) solution to this question.
What I would do (after checking first if I'm allowed to do that) is to define the output format of my algorithm/function.
I would define it as a sequence of elements (S,T). S - Position of element in the array (or its value). T - Position of the sub-array [0,T]. So for example, if T=3, it means that element S combined with elements 0,1,2 and 3 satisfy the desired condition.
The total result of this is O(nlogn) run time, and O(n) memory.

Find the Element Occurring b times in an array of size n*k+b

Description
Given an array of size (n*k+b) where n elements occur k times and one element occurs b times; in other words, there are n+1 distinct elements. Given that 0 < b < k, find the element occurring b times.
My Attempted solutions
The obvious solution is hashing, but it will not work if the numbers are very large. Complexity is O(n).
Using a map to store the frequency of each element, then traversing the map to find the element occurring b times. As maps are implemented as height-balanced trees, complexity will be O(nlogn).
Both of my solutions were accepted, but the interviewer wanted a linear solution without hashing. The hint he gave was to make the height of the tree in which you store the frequencies constant, but I have not been able to figure out the correct solution yet.
I want to know how to solve this problem in linear time without hashing?
EDIT:
Sample:
Input: n=2 b=2 k=3
Array: 2 2 2 3 3 3 1 1
Output: 1
I assume:
The elements of the array are comparable.
We know the values of n and k beforehand.
A solution O(n*k+b) is good enough.
Let the number occurring only b times be S. We are trying to find S in an array of size n*k+b.
Recursive step: find the median element of the current array slice, as in Quick Sort, in linear time. Let the median element be M.
After the recursive step you have an array where all elements smaller than M occur on the left of the first occurrence of M. All M elements are next to each other, and all elements larger than M are on the right of all occurrences of M.
Look at the index of the leftmost M and calculate whether S<M or S>=M. Recurse either on the left slice or the right slice.
So you are doing a quick sort but delving only one part of the divisions at any time. You will recurse O(logN) times but each time with 1/2, 1/4, 1/8, .. sizes of the original array, so the total time will still be O(n).
Clarification: Let's say n=20 and k=10. Then there are 21 distinct elements in the array, 20 of which occur 10 times and the last occurs, say, 7 times. I find the median element, let's say it is 1111. If S<1111 then the index of the leftmost occurrence of 1111 will be less than 11*10. If S>=1111 then the index will be equal to 11*10.
Full example: n = 4. k = 3. Array = {1,2,3,4,5,1,2,3,4,5,1,2,3,5}
After the first recursive step I find the median element is 3 and the array is something like: {1,2,1,2,1,2,3,3,3,5,4,5,5,4}. There are 6 elements to the left of 3. 6 is a multiple of k=3, so each element must be occurring 3 times there. So S>=3. Recurse on the right side. And so on.
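The recursion just described can be sketched like this (my own rendering; it rebuilds the three partitions as fresh lists for clarity instead of partitioning in place, and stays expected O(n) overall since the slices shrink geometrically):

```python
import random

def find_rare(seq, k):
    # seq has n*k + b elements: n values occur exactly k times,
    # one value occurs b times with 0 < b < k; return that value
    while True:
        pivot = random.choice(seq)
        equal = [x for x in seq if x == pivot]
        if len(equal) % k:          # pivot itself is the rare value
            return pivot
        less = [x for x in seq if x < pivot]
        if len(less) % k:           # count not a multiple of k: S is on the left
            seq = less
        else:                       # otherwise S must be on the right
            seq = [x for x in seq if x > pivot]
```

With the sample input, find_rare([2, 2, 2, 3, 3, 3, 1, 1], 3) returns 1.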
An idea using cyclic groups.
To guess the i-th bit of the answer, follow this procedure:
Count how many numbers in the array have the i-th bit set; store this as cnt
If cnt % k is non-zero, then the i-th bit of the answer is set. Otherwise it is clear.
To guess the whole number, repeat the above for every bit.
This solution is technically O((n*k+b) * log max N), where max N is the maximal value in the array, but because the number of bits is usually constant, this solution is linear in the array size.
No hashing, memory usage is O(log k * log max N).
Example implementation:
from random import randint, shuffle
from functools import reduce

def generate_test_data(n, k, b):
    k_rep = [randint(0, 1000) for _ in range(n)]
    b_rep = [randint(0, 1000)]
    numbers = k_rep * k + b_rep * b
    shuffle(numbers)
    print("k_rep:", k_rep)
    print("b_rep:", b_rep)
    return numbers

def solve(data, k):
    cnts = [0] * 10          # per-bit counts; generated values stay below 2**10
    for number in data:
        bits = [number >> b & 1 for b in range(10)]
        cnts = [cnts[i] + bits[i] for i in range(10)]
    # rebuild the answer from the most significant counted bit down
    return reduce(lambda a, b: 2 * a + (b % k > 0), reversed(cnts), 0)

# data built with k=15, b=13 but counted mod 3: 15 % 3 == 0 and 13 % 3 != 0,
# so counting modulo 3 still isolates the b-repeated value
print("Answer:", solve(generate_test_data(10, 15, 13), 3))
In order to have a constant-height B-tree containing n distinct elements, with height h constant, you need z = n^(1/h) children per node: h = log_z(n), thus h = log(n)/log(z), thus log(z) = log(n)/h, thus z = e^(log(n)/h), thus z = n^(1/h).
Example, with n=1000000, h=10, z=3.98, that is z=4.
The time to reach a node in that case is O(h*log(z)). Assuming h and z to be "constant" (since N = n*k, log(z) = log(n^(1/h)) = log((N/k)^(1/h)) = constant by properly choosing h based on k), you can then say that O(h*log(z)) = O(1)... This is a bit far-fetched, but maybe that was the kind of thing the interviewer wanted to hear?
UPDATE: this one use hashing, so it's not a good answer :(
In Python this would be linear time (the set removes the duplicates):
result = (sum(set(arr)) * k - sum(arr)) // (k - b)
If 'k' is even and 'b' is odd, then XOR will do. :)
