How to replace the elements of a range less than k with k? - c

How do I replace the elements of an array that are greater than k with k, over given ranges, when the number of queries is high?
Each query is given in the form l r k, where [l...r] is the range of the array to update.

Since my first answer created a big thread of comments, I'm going to combine everything into a new answer.
We are going to use a segment tree as a helper data structure; it will answer this question: what is the minimum on the range [l, r]? Initially all segment tree nodes are filled with some "infinity" value, which can be 201 in your problem (since all K are lower than 200, based on your comment).
Once we read our input array (let's call it A), we process the queries:
for each query [L, R, K] we update our segment tree: try to set a new minimum K on the range [L, R]. That can be done in O(log N) using lazy propagation. Here is a great example: http://se7so.blogspot.com/2012/12/segment-trees-and-lazy-propagation.html
Now we build the final array: we iterate over each index and set A[i] = min(A[i], minimum_on_range(i, i)). That takes N * log(N) steps.
The total complexity of this approach is O(M * log(N) + N * log(N)).
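To make this concrete, here is a minimal sketch of the idea (my own code, not the answerer's; the class and method names are mine). Because updates only ever lower the stored tags and the final queries are single points, the point query can simply take the minimum of the lazy tags along the root-to-leaf path, with no push-down needed:

INF = 201   # sentinel larger than any K, per the answer above

class MinSegTree:
    def __init__(self, n):
        self.n = n
        self.tag = [INF] * (4 * n)   # lazy "minimum assigned so far" per node

    def range_min_assign(self, l, r, k, node=1, lo=0, hi=None):
        # lazily record that everything in [l, r] is at most k
        if hi is None:
            hi = self.n - 1
        if r < lo or hi < l:
            return
        if l <= lo and hi <= r:
            self.tag[node] = min(self.tag[node], k)
            return
        mid = (lo + hi) // 2
        self.range_min_assign(l, r, k, 2 * node, lo, mid)
        self.range_min_assign(l, r, k, 2 * node + 1, mid + 1, hi)

    def point_query(self, i):
        # minimum of all tags on the path from the root to leaf i
        node, lo, hi, best = 1, 0, self.n - 1, INF
        while lo < hi:
            best = min(best, self.tag[node])
            mid = (lo + hi) // 2
            if i <= mid:
                node, hi = 2 * node, mid
            else:
                node, lo = 2 * node + 1, mid + 1
        return min(best, self.tag[node])

For example, with A = [5, 1, 8, 3] and t = MinSegTree(4), the single query 1 3 4 (1-based) becomes t.range_min_assign(0, 2, 4), and [min(a, t.point_query(i)) for i, a in enumerate(A)] yields [4, 1, 4, 3].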

Related

Maximize number of inversion count in array

We are given an unsorted array A of integers (duplicates allowed) of size N, possibly large. We count the number of pairs with indices i < j for which A[i] < A[j]; let's call this X.
We can change at most one element of the array, with a cost equal to the difference in absolute value (for instance, if we replace the element at index k with the new number K, the cost Y is |A[k] - K|).
We can only replace this element with other elements found in the array.
We want to find the minimum possible value of X + Y.
Some examples:
[1,2,2] should return 1 (change the 1 to 2 such that the array becomes [2,2,2])
[2,2,3] should return 1 (change the 3 to 2)
[2,1,1] should return 0 (because no changes are necessary)
[1,2,3,4] should return 6 (this is already the minimum possible value)
[4,4,5,5] should return 3 (this can be accomplished by changing the first 4 into a 5 or the last 5 into a 4)
The number of pairs can be found with a naive O(n²) solution, here in Python:
def calc_x(arr):
    n = len(arr)
    cnt = 0
    for i in range(n):
        for j in range(i+1, n):
            if arr[j] > arr[i]:
                cnt += 1
    return cnt
A brute-force solution is easily written, for example:
def f(arr):
    best_val = calc_x(arr)
    used = set(arr)
    for i, v in enumerate(arr):
        for replacement in used:
            if replacement == v:
                continue
            # splice the replacement in as a one-element list and drop
            # the old element at index i
            arr2 = arr[0:i] + [replacement] + arr[i+1:]
            y = abs(replacement - v)
            x = calc_x(arr2)
            best_val = min(best_val, x + y)
    return best_val
We can count, for each element, the number of items to its right that are larger than it in O(n log n), using for instance an AVL tree or some variation on merge sort.
However, we still have to search which element to change and what improvement it can achieve.
This was given as an interview question and I would like some hints or insights as how to solve this problem efficiently (data structures or algorithm).
Definitely go for an O(n log n) approach when counting inversions.
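For reference, here is a short sketch of such a counter (my own code, not from the answer): a Fenwick tree counting, for each position, how many earlier elements are strictly smaller, in O(n log n) overall.

def count_x(arr):
    # coordinate-compress the values to ranks 1..m
    ranks = {v: i + 1 for i, v in enumerate(sorted(set(arr)))}
    tree = [0] * (len(ranks) + 1)

    def add(i):                       # record one occurrence of rank i
        while i <= len(ranks):
            tree[i] += 1
            i += i & -i

    def seen_up_to(i):                # how many recorded ranks are <= i
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & -i
        return s

    total = 0
    for v in arr:
        total += seen_up_to(ranks[v] - 1)   # earlier elements strictly smaller than v
        add(ranks[v])
    return total

For example, count_x([1, 2, 2]) returns 2, matching the naive calc_x above.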
We can see that when you change a value at index k, you can either:
1) increase it, and then possibly reduce the number of inversions with elements bigger than it, but increase the number of inversions with elements smaller than it
2) decrease it (the opposite thing happens)
Let's try not to count x every time you change a value. What do you need to know?
In case 1):
You have to know how many elements on the left are smaller than your new value v and how many elements on the right are bigger than your value. You can pretty easily check that in O(n). So what is your x now? You can count it with the following formula:
prev_val - your previous value
prev_x - x that you've counted at the beginning of your program
prev_l - number of elements on the left smaller than prev_val
prev_r - number of elements on the right bigger than prev_val
v - new value
l - number of elements on the left smaller than v
r - number of elements on the right bigger than v
new_x = prev_x + r + l - prev_l - prev_r
In the second case you pretty much do the opposite.
Right now you get something like O(n^3) instead of O(n^3 log n), which is probably still bad. Unfortunately that's all I've come up with for now. I'll definitely update if I come up with something better.
EDIT: What about the memory limit? Is there any? If not, you can, for each element in the array, make two sets holding the elements before and after the current one. Then you can find the number of smaller/bigger elements in O(log n), making the time complexity O(n^2 log n).
EDIT 2: We can also check, for every possible value v, which element would be best to change to v. You can then maintain two sets and add/erase elements from them while checking each element, keeping the time complexity at O(n^2 log n) without using too much space. So the algorithm would be:
1) determine every value v that an element could be changed to, and calculate x
2) for each possible value v:
   make two sets, and push all elements into the second one
   for each element e in the array: add the previous element (if there is any) to the first set and erase e from the second set, then count the number of bigger/smaller elements in sets 1 and 2 and calculate the new x
EDIT 3: Instead of making two sets, you could go with a prefix sum for each value. That's O(n^2) already, but I think we can go even better than this.
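Here is how EDIT 3 might look (my own interpretation and code, not the answerer's). For each candidate value v, running counters give the number of smaller elements to the left and bigger elements to the right of each position in O(1), and the per-index contributions prev_l/prev_r are precomputed naively, which stays within the O(n^2) budget:

def minimize_x_plus_y(arr):
    n = len(arr)
    # contribution of each index to x with its original value
    prev_l = [sum(arr[i] < arr[k] for i in range(k)) for k in range(n)]
    prev_r = [sum(arr[j] > arr[k] for j in range(k + 1, n)) for k in range(n)]
    base_x = sum(prev_l)              # each counted pair is seen once from its right end
    best = base_x
    for v in set(arr):                # replacements must come from the array itself
        smaller_left = 0                           # elements before k that are < v
        bigger_right = sum(a > v for a in arr)     # elements after k that are > v
        for k in range(n):
            bigger_right -= arr[k] > v             # arr[k] is not to the right of itself
            y = abs(arr[k] - v)
            new_x = base_x - prev_l[k] - prev_r[k] + smaller_left + bigger_right
            best = min(best, new_x + y)
            smaller_left += arr[k] < v             # arr[k] is left of the next position
    return best

This reproduces the question's examples: [1,2,2] and [2,2,3] give 1, [2,1,1] gives 0, [1,2,3,4] gives 6, and [4,4,5,5] gives 3.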

Possibly simpler O(n) solution to find the Sub-array of length K (or more) with the maximum average

I saw this question on a coding competition site.
Suppose you are given an array of n integers and an integer k (n <= 10^5, 1 <= k <= n). How do you find the (contiguous) sub-array with the maximum average whose length is at least k?
There's an O(n) solution presented in a research paper (arxiv.org/abs/cs/0207026), linked in a duplicate SO question. I'm posting this as a separate question since I think I have a similar method with a simpler explanation. Do you think there's anything wrong with my logic in the solution below?
Here's the logic:
Start with the window range [i, j] = [0, K-1]. Then iterate over the remaining elements.
For every next element j, update the prefix sum**. Now we have a choice: we can use the full range [i, j], or discard the range [i : j-k] and keep [j-k+1 : j] (i.e. keep the latest K elements). Choose the range with the higher average (using the prefix sum this takes O(1)).
Keep track of the max average at every step.
Return the max average at the end.
** I calculate the prefix sum as I iterate over the array. The prefix sum at i is the cumulative sum of the first i elements of the array.
Code:
def findMaxAverage(nums, k):
    prefix = [0]
    for i in range(k):
        prefix.append(float(prefix[-1] + nums[i]))
    mavg = prefix[-1] / k
    lbound = -1
    for i in range(k, len(nums)):
        prefix.append(prefix[-1] + nums[i])
        cavg = (prefix[i+1] - prefix[lbound+1]) / (i - lbound)
        altavg = (prefix[i+1] - prefix[i-k+1]) / k
        if altavg > cavg:
            lbound = i - k
            cavg = altavg
        mavg = max(mavg, cavg)
    return mavg
Consider k = 3 and the sequence
3,0,0,2,0,1,3
The output of your program is 1.3333333333333333. It has found the subarray 0,1,3, but the best possible subarray is 2,0,1,3, with average 1.5.
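For completeness, running the code above on this input reproduces the failure:

print(findMaxAverage([3, 0, 0, 2, 0, 1, 3], 3))   # prints 1.3333333333333333
# but the window [2, 0, 1, 3] (length 4 >= k) has average 6/4 = 1.5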

Is it possible to select a number from every given interval without repetition in selections? Solution in LINEAR TIME

I have been trying this question on HackerEarth practice, which requires the work described below.
PROBLEM
Given an integer n, which signifies the sequence of n numbers {0, 1, 2, ..., n-2, n-1}.
We are provided m ranges of the form (L, R), such that (0 <= L <= n-1) and (0 <= R <= n-1).
If (L <= R), (L, R) signifies the numbers {L, L+1, L+2, ..., R-1, R} from the above sequence;
else (L, R) signifies the numbers {L, L+1, ..., n-2, n-1} together with {0, 1, 2, ..., R-1, R}, i.e. the numbers wrap around.
Example:
n = 5, i.e. {0,1,2,3,4}
(0,3) signifies {0,1,2,3}
(3,0) signifies {3,4,0}
(3,2) signifies {3,4,0,1,2}
Now we have to select ONE (only one) number from each range without repeating any selection, and tell whether it is possible to select one number from each and every range without repetition.
Example test case
n = 5 // numbers {0,1,2,3,4}
// m ranges follow //
0 0 i.e. {0}
1 2 i.e. {1,2}
2 3 i.e. {2,3}
4 4 i.e. {4}
4 0 i.e. {4,0}
The answer is "NO", it's not possible.
We cannot make a valid selection for the range 4 0: if we select 4 from it, we can no longer select anything for the range 4 4, and if we select 0 from it, we can no longer select anything for the range 0 0.
My approaches:
1) It can be done in O(N*M) using recursion, checking all possibilities of selection from each range and using a hash map along the way to record our selections.
2) I was trying to do it in order N or M, i.e. in linear time. The problem lacks an editorial explanation; only code is given in the editorial, without comments or explanation. There is also a linear solution by someone else which passes all test cases and was accepted, but I am not able to follow that code either.
I am not able to understand the logic/algorithm used in the code and why it works.
Please suggest ANY linear method and the logic behind it, because the problem has these constraints:
1 <= N <= 10^9
1 <= M <= 10^5
0 <= L, R < N
which demand a linear or N log N solution, I guess.
The code in the editorial can also be seen here: http://ideone.com/5Xb6xw
Warning: after looking at the code I found that it uses n and m interchangeably, so I would like to state the input format of the problem.
INPUT FORMAT
The first line contains the number of test cases, tc. Each test case then has two integers N and M: the first is the number of countries on the globe, the second is the number of ranges his girlfriend has given him. The next M lines each contain two integers X and Y describing a range. If (X <= Y), the range covers countries [X, X+1, ..., Y]; else the range covers [X, X+1, ..., N-1, 0, 1, ..., Y].
Output Format
Print "YES" if it is possible to do so, print "NO", if it is not.
There are two components to the editorial solution.
Linear-time reduction to a problem on ordinary intervals
Assume to avoid trivial cases that the number of input intervals is less than n.
The first is to reduce the problem to one where the intervals don't wrap around as follows. Given an interval [L, R], if L ≤ R, then emit two intervals [L, R] and [L + n, R + n]; if L > R, emit [L, R + n]. The easy direction of the reduction is showing that, if the original problem has a solution, then the reduced problem has a solution. For [L, R] with L ≤ R assigned a number k, assign k to [L, R] and k + n to [L + n, R + n]. For [L, R] with L > R, assign whichever of k, k + n belongs to [L, R + n]. Except for the dual assignment of k and k + n for intervals [L, R] and [L + n, R + n] respectively, each interval gets its own residue class mod n, so the assignments do not conflict.
Conversely, the hard direction of the reduction (if the original problem has no solution, then the reduced problem has no solution) is proved using Hall's marriage theorem. By Hall's criterion, an unsolvable original problem has, for some k, a set of k input intervals whose union has size less than k. We argue first that there exists such a set of input intervals whose union is a (circular) interval (which by assumption isn't all of 0..n-1). Decompose the union into the set of maximal (circular) intervals that comprise it. Each input interval is contained in exactly one of these intervals. By an averaging argument, some maximal (circular) interval contains more input intervals than its size. We finish by "lifting" this counterexample to the reduced problem. Given the maximal (circular) interval [L*, R*], we lift it to the ordinary interval [L*, R*] if L* ≤ R*, or [L*, R* + n] if L* > R*. Do likewise with the circular intervals contained in this interval. It is tedious but straightforward to show that this lifted counterexample satisfies Hall's criterion, which implies that the reduced problem has no solution.
O(m log m)-time solution for ordinary intervals
This is a sweep-line algorithm. Sort the intervals by lower endpoint and scan them in that order. We imagine that the sweep line moves from lower endpoint to lower endpoint. Maintain the set of intervals that intersect the sweep line and have not been assigned a number, sorted by upper endpoint. When the sweep line is about to move, assign the numbers between the old and new positions to the intervals in the set, preferentially to the ones whose upper endpoint is the lowest. The correctness of this strategy should be clear: the intervals that could be assigned a number but are passed over have at least as many options (in the sense of being a superset) as the intervals that are assigned, so we never make a choice that we have cause to regret.
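Below is a sketch of the two steps described above (my own code, not the editorial's): it unrolls the circular intervals onto [0, 2n) and then runs the sweep-line greedy with a min-heap keyed on upper endpoints.

import heapq

def selectable(n, ranges):
    if len(ranges) > n:                    # more intervals than available numbers
        return False
    # reduction: unroll circular intervals onto the line [0, 2n)
    intervals = []
    for l, r in ranges:
        if l <= r:
            intervals.append((l, r))
            intervals.append((l + n, r + n))
        else:
            intervals.append((l, r + n))
    # sweep-line greedy: hand the next free number to the active interval
    # whose upper endpoint is lowest
    intervals.sort()
    heap = []                              # upper endpoints of active, unassigned intervals
    i, pos = 0, 0                          # next interval to activate, next free number
    while i < len(intervals) or heap:
        if not heap:                       # jump the sweep line forward
            pos = max(pos, intervals[i][0])
        while i < len(intervals) and intervals[i][0] <= pos:
            heapq.heappush(heap, intervals[i][1])
            i += 1
        if heapq.heappop(heap) < pos:      # the most urgent interval has run out
            return False
        pos += 1
    return True

On the sample above, selectable(5, [(0, 0), (1, 2), (2, 3), (4, 4), (4, 0)]) returns False; dropping the (4, 0) range makes it True.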

find nth-smallest value across m sorted arrays using idea from 2 sorted arrays

May I ask whether this would be possible? The general approach would be somewhat like finding the n-th value over two sorted arrays: ignore the insignificant parts and focus on the rest by adjusting the value of n in the recursion.
The two-sorted-arrays problem can be solved in O(log|A| + log|B|) time. Since this question is similar, I would like to ask whether there exists an algorithm for m sorted arrays with time O(log|A1| + log|A2| + ... + log|Am|),
or some similar variation close to the time mentioned above (due to the variable m, we might need some other sorting algorithm for the pivots from those arrays),
or, if such an algorithm doesn't exist, why not?
I just can't find this algorithm by googling.
There is a simple randomized algorithm:
Select a pivot randomly from any of the m arrays. Let's call it x.
For every array, do a binary search for x to find out how many values < x are in the array. Say there are r_i values < x in array i. Then x has rank r = r_1 + r_2 + ... + r_m in the union of all arrays.
If n <= r, we restrict each array i to the indices 0...(r_i - 1) and recurse. If n > r, we restrict each array to the indices r_i...(|A_i| - 1) and recurse for the (n - r)-th element of what remains.
repeat
The expected recursion depth is O(log(N)), with N being the total number of elements, by a proof similar to that of quickselect, so the expected running time is something like O(m * log^2(N)).
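Here is a sketch of this randomized selection (my own code, not from the answer; it adds an equal-to-pivot case so that progress is guaranteed even when values repeat):

import bisect
import random

def nth_smallest(arrays, n):
    # n is 1-based; every list in `arrays` must be sorted ascending
    views = [(a, 0, len(a)) for a in arrays]          # half-open index windows [lo, hi)
    while True:
        total = sum(hi - lo for _, lo, hi in views)
        if total == 1:
            return next(a[lo] for a, lo, hi in views if hi > lo)
        # pick a pivot x uniformly at random among the remaining elements
        t = random.randrange(total)
        for a, lo, hi in views:
            if t < hi - lo:
                x = a[lo + t]
                break
            t -= hi - lo
        # count remaining elements strictly smaller than x via binary searches
        lt = [bisect.bisect_left(a, x, lo, hi) for a, lo, hi in views]
        r = sum(c - lo for (a, lo, hi), c in zip(views, lt))
        if n <= r:                                    # answer lies below x
            views = [(a, lo, c) for (a, lo, hi), c in zip(views, lt)]
            continue
        le = [bisect.bisect_right(a, x, lo, hi) for a, lo, hi in views]
        r2 = sum(c - lo for (a, lo, hi), c in zip(views, le))
        if n <= r2:                                   # answer equals the pivot
            return x
        n -= r2                                       # discard everything <= x
        views = [(a, c, hi) for (a, lo, hi), c in zip(views, le)]

For example, nth_smallest([[1, 3, 5], [2, 4]], 3) returns 3.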
The paper "Generalized Selection and Ranking" by Frederickson and Johnson proposes selection and ranking algorithms for different scenarios, for example an O(m + c * log(k/c)) algorithm to select the k-th element from m equally sized sorted sequences, with c = min{m, k}.

Find the Element Occurring b times in an array of size n*k+b

Description
Given an array of size (n*k+b) where n elements occur k times and one element occurs b times; in other words, there are n+1 distinct elements. Given that 0 < b < k, find the element occurring b times.
My Attempted solutions
The obvious solution would be hashing, but it will not work if the numbers are very large. Complexity is O(n).
Using a map to store the frequency of each element and then traversing the map to find the element occurring b times. As maps are implemented as height-balanced trees, the complexity will be O(n log n).
Both of my solutions were accepted, but the interviewer wanted a linear solution without hashing. The hint he gave was to make the height of the tree in which you store the frequencies constant, but I have not been able to figure out the correct solution yet.
I want to know how to solve this problem in linear time without hashing.
EDIT:
Sample:
Input: n=2 b=2 k=3
Array: 2 2 2 3 3 3 1 1
Output: 1
I assume:
The elements of the array are comparable.
We know the values of n and k beforehand.
A solution in O(n*k+b) is good enough.
Let the number occurring only b times be S. We are trying to find S in an array of size n*k+b.
Recursive step: find the median element of the current array slice in linear time (as in the selection step of quicksort). Let the median element be M.
After partitioning you have an array where all elements smaller than M are to the left of the first occurrence of M, all the M elements are next to each other, and all elements larger than M are to the right of all occurrences of M.
Look at the index of the leftmost M and calculate whether S < M or S >= M. Recurse on either the left slice or the right slice.
So you are doing a quicksort, but descending into only one part of the partition at any time. You will recurse O(log N) times, but each time on 1/2, 1/4, 1/8, ... of the original array, so the total time will still be O(N), where N = n*k+b.
Clarification: let's say n = 20 and k = 10. Then there are 21 distinct elements in the array, 20 of which occur 10 times each and the last of which occurs, let's say, 7 times. I find the median element; let's say it is 1111. If S < 1111, the index of the leftmost occurrence of 1111 is not a multiple of k (for instance 9*10 + 7 = 97); if S >= 1111, it is a multiple of k (here 10*10 = 100).
Full example: n = 4, k = 3, array = {1,2,3,4,5,1,2,3,4,5,1,2,3,5}.
After the first partition the median element is 3 and the array is something like {1,2,1,2,1,2,3,3,3,5,4,5,5,4}. There are 6 elements to the left of the 3s; 6 is a multiple of k = 3, so each element there must occur exactly 3 times, and therefore S >= 3. Recurse on the right side, and so on.
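A sketch of this search (my own code; for brevity it uses a random pivot, giving expected linear time, where the answer's version would use a worst-case linear median selection):

import random

def find_b_element(arr, k):
    # arr has n*k + b elements with 0 < b < k; returns the element occurring b times
    a = list(arr)
    while True:
        m = random.choice(a)              # pivot; the answer picks the median instead
        eq = [x for x in a if x == m]
        if len(eq) % k:                   # m occurs b times, not k: found it
            return m
        lt = [x for x in a if x < m]
        # every distinct value falls entirely on one side of the pivot, so the
        # side whose size is not a multiple of k is the side containing S
        a = lt if len(lt) % k else [x for x in a if x > m]

On the sample above, find_b_element([2, 2, 2, 3, 3, 3, 1, 1], 3) returns 1.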
An idea using cyclic groups.
To find the i-th bit of the answer, follow this procedure:
Count how many numbers in the array have the i-th bit set; store this as cnt.
If cnt % k is non-zero, then the i-th bit of the answer is set; otherwise it is clear. (The n elements occurring k times each contribute a multiple of k to cnt, so cnt % k equals b times the i-th bit of the answer, which is non-zero exactly when that bit is set, since 0 < b < k.)
To find the whole number, repeat the above for every bit.
This solution is technically O((n*k+b) * log maxN), where maxN is the maximal value in the array, but because the number of bits is usually constant, it is linear in the array size.
No hashing; memory usage is O(log k * log maxN).
Example implementation:
from functools import reduce
from random import randint, shuffle

def generate_test_data(n, k, b):
    k_rep = [randint(0, 1000) for _ in range(n)]   # n values, each repeated k times
    b_rep = [randint(0, 1000)]                     # one value repeated b times
    numbers = k_rep * k + b_rep * b
    shuffle(numbers)
    print("k_rep: ", k_rep)
    print("b_rep: ", b_rep)
    return numbers

def solve(data, k):
    cnts = [0] * 10                 # per-bit counters; values up to 1000 fit in 10 bits
    for number in data:
        bits = [number >> b & 1 for b in range(10)]
        cnts = [cnts[i] + bits[i] for i in range(10)]
    # rebuild the answer from the most significant counter down
    return reduce(lambda a, b: 2 * a + (b % k > 0), reversed(cnts), 0)

# the data is built with k=15, b=13, and solve is called with k=3;
# this still works because 15 % 3 == 0 while 13 % 3 != 0
print("Answer: ", solve(generate_test_data(10, 15, 13), 3))
In order to have a constant-height B-tree containing n distinct elements, with height h constant, you need z = n^(1/h) children per node: h = log_z(n), thus h = log(n)/log(z), thus log(z) = log(n)/h, thus z = e^(log(n)/h), thus z = n^(1/h).
For example, with n = 1000000 and h = 10, z = 3.98, that is z = 4.
The time to reach a node in that case is O(h * log(z)). Assuming h and z to be "constant" (since N = n*k, we have log(z) = log(n^(1/h)) = log((N/k)^(1/h)) = constant by properly choosing h based on k), you can then say that O(h * log(z)) = O(1)... This is a bit far-fetched, but maybe that was the kind of thing the interviewer wanted to hear?
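A quick numeric check (my own) of the z = n^(1/h) formula for the example above:

n, h = 1000000, 10
z = n ** (1.0 / h)
print(z)   # 3.981..., i.e. about 4 children per node, as stated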
UPDATE: this one uses hashing, so it's not a good answer :(
In Python this would be linear time (the set removes the duplicates):
result = (sum(set(arr)) * k - sum(arr)) // (k - b)
Each distinct value contributes k times to the first term, so subtracting the actual sum leaves exactly (k - b) copies of the element that occurs b times.
If 'k' is even and 'b' is odd, then XOR will do. :) (XOR-ing the whole array cancels every element that occurs an even number of times and leaves the one that occurs an odd number of times.)
