Number of intervals that contain a given query point - arrays

I know a similar question exists here. My question is the same: I have N intervals (some possibly overlapping, some even identical). Then Q point queries are given, and for each one I need to tell how many intervals contain that point.
I tried to develop my algorithm by sorting the endpoint array and then counting the number of overlapping intervals with the +1/-1 trick mentioned in an answer. But what should I do after performing the binary search? The corresponding index of the prefix sum array is not always the answer.
e.g.
Intervals are : [1,4] [5,7] [6,10] [7,13]
sorted end point array : [1,4,5,6,7,7,10,13]
+1/-1 array : [1,-1,1,1,1,-1,-1,-1]
prefix sum array : [1,0,1,2,3,2,1,0]
Query : 10
My algorithm gives 1 (the corresponding prefix sum entry),
but the actual answer should be 2.
How should I fix my algorithm?

There are no good answers in the question you linked, so:
First:
Put the entry and exit positions of each interval into separate arrays. (If you are using closed intervals, then the exit position is end position + 1, i.e., for [4,6] the entry is 4 and the exit is 7.)
Sort the arrays.
Then, for each point p:
Binary search in the entry array to find the number of entry positions <= p.
Binary search in the exit array to find the number of exit positions <= p.
The number of intervals that contain the point is entry_count - exit_count
NOTE that the number of positions <= p is the index of the first element > p. See: Where is the mistake in my code to perform Binary Search? to help you get that search right.
For your example:
Intervals: [1,4], [5,7], [6,10], [7,13]
Entry positions: [1,5,6,7]
Exit positions: [5,8,11,14]
Entry positions <= 6: 3
Exit positions <= 6: 1
Intervals that contain 6: 3-1 = 2
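The steps above can be sketched in Python as follows. This is only an illustration under the answer's assumptions (closed integer intervals, so the exit position is end + 1); the function names `build_arrays` and `count_containing` are made up for the example.

```python
from bisect import bisect_right

def build_arrays(intervals):
    """Return the sorted entry and exit arrays for closed integer intervals."""
    entries = sorted(lo for lo, hi in intervals)
    exits = sorted(hi + 1 for lo, hi in intervals)  # exit = end + 1
    return entries, exits

def count_containing(entries, exits, p):
    """Number of intervals containing p: entries <= p minus exits <= p."""
    # bisect_right returns the index of the first element > p,
    # which equals the number of elements <= p.
    return bisect_right(entries, p) - bisect_right(exits, p)

entries, exits = build_arrays([(1, 4), (5, 7), (6, 10), (7, 13)])
print(count_containing(entries, exits, 6))
print(count_containing(entries, exits, 10))
```

Preprocessing costs O(N log N) for the two sorts, and each query is two binary searches.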

The problem is that your intervals are closed, [], instead of half-open, [), and the answer was probably written for the latter. First transform each end index to its value + 1.
After this, and after "compressing" repeated coordinates, you should have:
points = [1,5,6,7,8,11,14]
sums = [1,0,1,1,-1,-1,-1]
accumulated = [1,1,2,3,2,1,0]
Then for a query, if query < points[0] or query > points[max], return 0. Otherwise, binary search over points for the index of the last point <= query; the answer is accumulated[index].
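A minimal sketch of this variant, assuming closed integer intervals and taking "binary search to get index" to mean the last point that is <= the query. The helper names `build_sweep` and `query` are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

def build_sweep(intervals):
    """Compress coordinates and build the accumulated +1/-1 sums."""
    delta = {}
    for lo, hi in intervals:
        delta[lo] = delta.get(lo, 0) + 1          # interval opens at lo
        delta[hi + 1] = delta.get(hi + 1, 0) - 1  # and closes just after hi
    points = sorted(delta)
    accumulated = list(accumulate(delta[p] for p in points))
    return points, accumulated

def query(points, accumulated, q):
    """Count intervals containing q using the last point <= q."""
    idx = bisect_right(points, q) - 1
    return accumulated[idx] if idx >= 0 else 0

points, accumulated = build_sweep([(1, 4), (5, 7), (6, 10), (7, 13)])
print(points, accumulated, query(points, accumulated, 10))
```

Queries past the last point need no special case here, because the accumulated sum at the last point is always 0.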

Related

Changing the values of array by the distance of the indexes (c)

I'm having hard time with this one:
I need to write a function in C that receives a binary array and its size. The function should replace the current values with the distance (by index) of each 1 to the closest 0.
For example: if the function receives the array {1,1,0,1,1,1,0,1}, then the new values of the array should be {2,1,0,1,2,1,0,1}. It is known that the input has at least one zero.
The first step I thought of was to locate a pair of zeros (or just one if there is only one) and store their positions in two indexes (z1, z2). Then I set another index i
that checks each time which zero is closest to it (by absolute value), and the difference between i and z1 or z2 would be the new value.
I have the plan, but things are not going exactly as I planned. Basically I deleted the code (it wasn't good anyway), so I would appreciate any help. Thanks!
This problem is based on two things
Keep an array left[i] which has the distance of nearest 0 from index i from left to right.
Keep an array right[i] which has the distance of nearest 0 from index i from right to left.
Each can be calculated with a single pass over the array, in O(n).
Then for each position take the minimum of left[i] and right[i]. That is the answer for the 1 at position i.
Overall the time complexity is O(n).
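The two-pass idea might look like the sketch below. It is written in Python for brevity, although the question asks for C; the same loops translate directly. The function name `distances_to_nearest_zero` is an illustrative choice, and the input is assumed to contain at least one 0, as the question states.

```python
def distances_to_nearest_zero(bits):
    """Return, for each index, the distance to the closest 0 in bits."""
    n = len(bits)
    inf = n  # larger than any real distance inside the array
    left = [inf] * n   # distance to the nearest 0 at or before i
    right = [inf] * n  # distance to the nearest 0 at or after i

    last_zero = None
    for i in range(n):
        if bits[i] == 0:
            last_zero = i
        if last_zero is not None:
            left[i] = i - last_zero

    next_zero = None
    for i in range(n - 1, -1, -1):
        if bits[i] == 0:
            next_zero = i
        if next_zero is not None:
            right[i] = next_zero - i

    return [min(l, r) for l, r in zip(left, right)]

print(distances_to_nearest_zero([1, 1, 0, 1, 1, 1, 0, 1]))
```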

Maximize sum of weights with constraints given on left and right indices in array

I recently came through an interesting coding problem, which is as follows:
There are n boxes, let us assume this is an array of n boxes.
For each index i of this array, three values are given -
1.) Weight(i)
2.) Left(i)
3.) Right(i)
Left(i) means: if box i is chosen, we are not allowed to choose the Left(i) elements immediately to the left of it.
Similarly, Right(i) means: if box i is chosen, we are not allowed to choose the Right(i) elements immediately to the right of it.
Example :
Weight[2] = 5
Left[2] = 1
Right[2] = 3
Then, if I pick element at position 2, I get weight of 5 units. But, I cannot pick elements at position {1} (due to left constraint). And cannot pick elements at position {3,4,5} (due to right constraint).
Objective - We need to calculate the maximum sum of the weights we can pick.
Sample Test Case :-
Input:
5
2 0 3
4 0 0
3 2 0
7 2 1
9 2 0
Output:
13
Note - First column is weights, Second column is left constraints, Third column is right constraints
I used a Dynamic Programming approach (similar to Longest Increasing Subsequence) to reach an O(n^2) solution, but I am not able to think of an O(n*logn) solution. (n can be up to 10^5.)
I also tried to use a priority queue, in which elements with a lower value of (right[i] + i) are given higher priority (with ties broken in favor of the element with the lower value of "i"). But it also gives a timeout error.
Is there any other approach for this, or any optimization of the priority queue method? I can post both of my codes if needed.
Thanks.
One approach is to use a binary indexed tree to create a data structure that makes it easy to do two operations in O(logn) time each:
Insert number into an array
Find maximum in a given range
We will use this data structure to hold the maximum weight that can be achieved by selecting box i along with an optimal selection of boxes to the left.
The key is that we will only insert values into this data structure when we reach a point where the right constraint has been met.
To find the best value for box i, we need to find the maximum value in the data structure for all points up to location i-left[i], which can be done in O(logn).
The final algorithm is to loop over i=0..n-1 and for each i:
Compute result for box i by finding maximum in range 0..(i-left[i])
Schedule the result to be added when we reach location i+right[i]
Add any previously scheduled results into our data structure
The final result is the maximum value in the whole data structure.
Overall, the complexity is O(n log n) because each value of i results in one lookup and one update operation.
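One possible reading of this algorithm, sketched in Python. The assumptions are mine: boxes are 0-indexed, weights are non-negative, box i may be combined with boxes at positions 0..i-left[i]-1, and the result for box i becomes usable from position i+right[i]+1 onward. The name `max_weight` is hypothetical. Because each position is written once and only prefix maxima are queried, a binary indexed tree storing maxima is enough.

```python
def max_weight(weights, left, right):
    """Maximum total weight of a compatible selection of boxes."""
    n = len(weights)
    tree = [0] * (n + 1)  # binary indexed tree over prefix maxima (1-based)

    def update(pos, value):
        pos += 1
        while pos <= n:
            if tree[pos] < value:
                tree[pos] = value
            pos += pos & -pos

    def prefix_max(count):
        """Maximum over positions 0..count-1 (0 if the range is empty)."""
        best = 0
        while count > 0:
            best = max(best, tree[count])
            count -= count & -count
        return best

    pending = [[] for _ in range(n + 1)]  # results scheduled per position
    answer = 0
    for i in range(n):
        # Insert results whose right constraint no longer covers position i.
        for j, value in pending[i]:
            update(j, value)
        # Best chain ending at i: boxes allowed on the left are 0..i-left[i]-1.
        value = weights[i] + prefix_max(max(0, i - left[i]))
        answer = max(answer, value)
        release = i + right[i] + 1
        if release < n:
            pending[release].append((i, value))
    return answer

print(max_weight([2, 4, 3, 7, 9], [0, 0, 2, 2, 2], [3, 0, 0, 1, 0]))
```

On the sample input from the question this selects the boxes with weights 4 and 9.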

Count of subarray

The problem is a variant of subarray counting. Given an array of numbers, say 1,2,2,3,2,1,2,2,2,2, I look for subarrays and count the frequency of each. I start by looking at subarrays of some length K (for example K = 3).
Count of subarray 1,2,2 is C1:2.
Count of subarray 2,2,3 is 1.
Count of subarray 2,3,2 is 1.
and so on.
Now, I look for subarrays of length 2.
Count of subarray 1,2 is C2: 2. But (1,2) is a subset of the subarray 1,2,2. So, I calculate its count by subtracting C1 from C2 which gives count of 1,2 as 0. Similarly, count of 2,2 is 1.
My problem is in handling cases where more than one parent subset exists. I don't consider the sub-arrays in my result set whose frequency comes out to be 1. Example:
1,2,3,1,2,3,1,2,2,3
Here, Count of 1,2,3 is 2.
Count of 2,3,1 is 2.
Now, when I look for count of 2,3, it should be 1 as all the greater length parents have covered the occurrences. How shall I handle these cases?
The approach I thought was to mark all the pattern occurrences of the parent. In the above case, mark all the occurrences of 1,2,3 and 2,3,1. Array looks like this:
1,2,3,1,2,3,1,2,2,3
X,X,X,X,X,X,X,2,2,3
where X denotes the marked position. Now, frequency of 2,3 we see is 1 as per the positions which are unmarked. So, basically, I mark all the pattern occurrences I find at the current step. For the next step, I start looking for patterns from the unmarked locations only to get the correct count.
I am dealing with large data, for which this does not seem like a good approach. Also, I'm not sure whether it is correct. Any other approaches or ideas would be a big help.
Build a suffix array for the given array.
To count all repeating subarrays of a given length, walk through this suffix array, comparing neighboring suffixes on a prefix of the needed length.
For your first example
source array
1,2,2,3,2,1,2,2,2,2
suffix array is
5,0,9,4,8,7,6,1,2,3:
1,2,2,2,2 (5)
1,2,2,3,2,1,2,2,2,2 (0)
2 (9)
2,1,2,2,2,2 (4)
2,2 (8)
2,2,2 (7)
2,2,2,2 (6)
2,2,3,2,1,2,2,2,2 (1)
2,3,2,1,2,2,2,2 (2)
3,2,1,2,2,2,2 (3)
With length 2 we can count two subarrays 1,2 and four subarrays 2,2.
If you want to count any given subarray, for example all suffixes beginning with (1,2), just use binary search to get the first and the last indexes (like the std::upper_bound and std::lower_bound operations in the C++ STL).
For the same example, the indexes of the first and last occurrences of (1,2) in the suffix array are 0 and 1, so the count is last-first+1=2.
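A small Python sketch of both uses of the suffix array. It is only meant to illustrate the idea: the construction sorts whole suffixes, which is quadratic in the worst case, and `count_occurrences` materializes the truncated prefixes before bisecting, so a real implementation would use a proper suffix array algorithm and compare in place. All function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def build_suffix_array(arr):
    """Naive construction: sort suffix start indexes by the suffix itself."""
    return sorted(range(len(arr)), key=lambda i: arr[i:])

def repeated_subarrays(arr, sa, length):
    """Walk neighboring suffixes and count runs sharing a prefix of `length`."""
    runs = []
    for start in sa:
        prefix = tuple(arr[start:start + length])
        if len(prefix) < length:
            continue  # suffix is too short to contain such a subarray
        if runs and runs[-1][0] == prefix:
            runs[-1][1] += 1
        else:
            runs.append([prefix, 1])
    return {prefix: count for prefix, count in runs if count > 1}

def count_occurrences(arr, sa, pattern):
    """Count suffixes starting with pattern via two binary searches."""
    pattern = tuple(pattern)
    m = len(pattern)
    # Truncating sorted suffixes to m elements keeps them sorted.
    prefixes = [tuple(arr[i:i + m]) for i in sa]
    return bisect_right(prefixes, pattern) - bisect_left(prefixes, pattern)

data = [1, 2, 2, 3, 2, 1, 2, 2, 2, 2]
sa = build_suffix_array(data)
print(sa)
print(repeated_subarrays(data, sa, 2))
print(count_occurrences(data, sa, (1, 2)))
```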

Minimum number of moves required to get a permutation of a int of array?

You have a sequence d[0], d[1], d[2], d[3], ..., d[n]. In each move you are allowed to increase any d[i] (i from 0 to n) by 1, 2 or 5. What is the minimum number of moves required to transform the sequence into a permutation of [1,2,3,..,n]? If it is not possible, return -1. 1<=n<=1000
My approach is to sort the given array in ascending order and then count the moves by adding 1, 2 or 5. But it fails in many cases. Some of my classmates used this method in the exam, but they misread the question, so read the question carefully.
e.g. for [1,1,3,2,1] the answer is 4, since we can get [1,2,5,4,3] by adding 0,1,2,2,2 respectively.
[1,2,3,4,1] => [1,1,2,3,4]: the sorting method gives 4 (adding [0,1,1,1,1]), but the answer is 2, since we can add 2+2 to a 1 to get [1,2,3,4,5].
Similarly,
[1,2,3,1] => [1,1,2,3] to [1,2,3,4] requires 3 moves with the sorting method, but the answer is 2, since by adding 1+2 to a 1 we can get [1,2,3,4].
Another method can be used, but I don't have any proof of its correctness.
Algorithm
input "n" is number of element , array "a" which contains input element
initialize cnt = 0 ;
initialize boolarray[n] ={0};
1. for i=0...n boolarray[a[i]]=1;
2. put all element in sorted order whose boolarray[a[i]]=0 for i=0...n
3. Now make boolarray[a[i]]=1; for i=0..n and count
how many additions are required .
4. return count ;
As far as I can tell, the result will always be 0 or more, since any number can be produced using 1, 2 and 5, except when some d[i] (i=0..n) is greater than the number of inputs.
How can this be solved correctly?
Any answers and suggestions are welcome.
Your problem can be converted into a weighted bipartite matching problem:
The first part p1 of the graph has the current array numbers as nodes.
The second part p2 of the graph has the numbers 1 to n.
There is an edge between a node of p1 and a node of p2 if we can add 1, 2 and 5 to the former to make the latter.
Weighted bipartite matching can be solved using the Hungarian algorithm.
Edit:
If you are evaluating the minimum number of moves, then you can use unweighted bipartite matching. You can use the Hopcroft-Karp algorithm, which runs in O(n^1.5) in your case, as the number of edges in the graph is E = O(n).
Create an array count which contains the count of how often we have a specific number in our base array
input 1 1 3 2 1
count 3 1 1 0 0
now walk over this array and calculate the steps
sum = 0
for i: 1..n
    while count[i] > 1 // as long as we have spare numbers
        missing = -1 // find the biggest empty spot which is bigger than the number at i
        for x: n..i+1 // look for the biggest missing
            if count[x] > 0 continue // this one is not missing
            missing = x
            break;
        if missing == -1 return -1 // no empty spot found
        sum += calcCost(i, missing)
        count[i]--
        count[missing]++
return sum
calcCost must be greedy
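A greedy calcCost could be sketched as below: use as many 5s as possible, then 2s, then 1s. Greedy is optimal for the step sizes 1, 2 and 5. The snake_case name `calc_cost` and its signature are assumptions made for this example.

```python
def calc_cost(current, target):
    """Minimum number of +1/+2/+5 moves to raise current to target."""
    diff = target - current
    if diff < 0:
        raise ValueError("values can only be increased")
    fives, rest = divmod(diff, 5)  # take as many 5s as possible
    twos, ones = divmod(rest, 2)   # then 2s, then a final 1 if needed
    return fives + twos + ones

print(calc_cost(1, 5))  # 1 -> 5 takes 2 + 2
print(calc_cost(1, 4))  # 1 -> 4 takes 1 + 2
```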

Find the Element Occurring b times in an an array of size n*k+b

Description
Given an Array of size (n*k+b) where n elements occur k times and one element occurs b times, in other words there are n+1 distinct Elements. Given that 0 < b < k find the element occurring b times.
My Attempted solutions
The obvious solution is to use hashing, but it will not work if the numbers are very large. Complexity is O(n).
Using a map to store the frequency of each element and then traversing the map to find the element occurring b times. As maps are implemented as height-balanced trees, the complexity will be O(n log n).
Both of my solutions were accepted, but the interviewer wanted a linear solution without hashing. The hint he gave was to make the height of the tree in which the frequencies are stored constant, but I have not been able to figure out the correct solution yet.
I want to know how to solve this problem in linear time without hashing?
EDIT:
Sample:
Input: n=2 b=2 k=3
Array: 2 2 2 3 3 3 1 1
Output: 1
I assume:
The elements of the array are comparable.
We know the values of n and k beforehand.
A solution O(n*k+b) is good enough.
Let the number occurring only b times be S. We are trying to find S in an array of size n*k+b.
Recursive Step: Find the median element of the current array slice in linear time and partition the slice around it, as in Quick Sort. Let the median element be M.
After the recursive step you have an array where all elements smaller than M occur to the left of the first occurrence of M. All M elements are next to each other, and all elements larger than M are to the right of all occurrences of M.
Look at the index of the leftmost M and calculate whether S<M or S>=M. Recurse either on the left slice or the right slice.
So you are doing a quick sort but descending into only one part of the division each time. You will recurse O(log N) times, but each time with 1/2, 1/4, 1/8, ... of the size of the original array, so the total time will still be linear in the array size.
Clarification: Let's say n=20 and k = 10. Then there are 21 distinct elements in the array, 20 of which occur 10 times, and the last occurs, let's say, 7 times. I find the median element; let's say it is 1111. If S<1111, then the index of the leftmost occurrence of 1111 will be less than 11*10. If S>=1111, then the index will be equal to 11*10.
Full example: n = 4. k = 3. Array = {1,2,3,4,5,1,2,3,4,5,1,2,3,5}
After the first recursive step I find the median element is 3 and the array is something like: {1,2,1,2,1,2,3,3,3,5,4,5,5,4}. There are 6 elements to the left of 3. 6 is a multiple of k=3, so each element there must occur 3 times. So S>=3. Recurse on the right side. And so on.
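A rough Python sketch of this selection idea. Two simplifications are mine and not part of the answer: the pivot is picked at random instead of being the true median (so the running time is linear in expectation, not in the worst case), and the partition builds new lists instead of working in place. The name `find_b_element` is hypothetical.

```python
import random

def find_b_element(arr, k):
    """Find the element whose count is not a multiple of k (0 < b < k)."""
    items = list(arr)
    while True:
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        larger = [x for x in items if x > pivot]
        equal_count = len(items) - len(smaller) - len(larger)
        if len(smaller) % k != 0:
            items = smaller  # S is among the elements below the pivot
        elif equal_count % k != 0:
            return pivot     # the pivot itself occurs b times
        else:
            items = larger   # everything up to the pivot is complete

print(find_b_element([2, 2, 2, 3, 3, 3, 1, 1], 3))
print(find_b_element([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 5], 3))
```

The side that still contains S is the one whose size is not a multiple of k, which is the same test as checking the index of the leftmost M.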
An idea using cyclic groups.
To guess i-th bit of answer, follow this procedure:
Count how many numbers in the array have the i-th bit set, and store the result as cnt
If cnt % k is non-zero, then i-th bit of answer is set. Otherwise it is clear.
To guess whole number, repeat the above for every bit.
This solution is technically O((n*k+b)*log max N), where max N is maximal value in the table, but because number of bits is usually constant, this solution is linear in array size.
No hashing, memory usage is O(log k * log max N).
Example implementation:
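from functools import reduce
from random import sample, shuffle

BITS = 10  # enough for values in 0..1000

def generate_test_data(n, k, b):
    # n + 1 distinct values: n of them repeated k times, one repeated b times
    values = sample(range(1001), n + 1)
    k_rep, b_rep = values[:n], values[n:]
    numbers = k_rep * k + b_rep * b
    shuffle(numbers)
    print("k_rep:", k_rep)
    print("b_rep:", b_rep)
    return numbers

def solve(data, k):
    # cnts[i] = how many numbers have bit i set
    cnts = [0] * BITS
    for number in data:
        for i in range(BITS):
            cnts[i] += number >> i & 1
    # bit i belongs to the answer iff cnts[i] is not a multiple of k
    return reduce(lambda a, c: 2 * a + (c % k > 0), reversed(cnts), 0)

k = 15  # must be the same k that was used to generate the data
print("Answer:", solve(generate_test_data(10, k, 13), k))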
In order to have a constant-height B-tree containing n distinct elements, with constant height h, you need z=n^(1/h) children per node: h=log_z(n), thus h=log(n)/log(z), thus log(z)=log(n)/h, thus z=e^(log(n)/h), thus z=n^(1/h).
Example: with n=1000000 and h=10, z=3.98, that is z=4.
The time to reach a node in that case is O(h.log(z)). Assuming h and z to be "constant" (since N=n.k, then log(z)=log(n^(1/h))=log((N/k)^(1/h))=ct by properly choosing h based on k), you can then say that O(h.log(z))=O(1)... This is a bit far-fetched, but maybe that was the kind of thing the interviewer wanted to hear?
UPDATE: this one use hashing, so it's not a good answer :(
in python this would be linear time (set will remove the duplicates):
result = (sum(set(arr))*k - sum(arr)) // (k - b)
If 'k' is even and 'b' is odd, then XOR will do. :)
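The XOR remark can be checked with a few lines of Python. With k even, every element that occurs k times cancels itself out, and with b odd one copy of the remaining element survives. The name `find_by_xor` is made up for this sketch, which assumes integer elements.

```python
from functools import reduce
from operator import xor

def find_by_xor(arr):
    """XOR of all elements; valid only when k is even and b is odd."""
    return reduce(xor, arr, 0)

# k = 4 (even), b = 3 (odd)
print(find_by_xor([5] * 4 + [7] * 4 + [9] * 3))
```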
