Find array indices of values greater than given distance in O(n)

Given an array with indices 1 ... n and corresponding values x1 ... xn, sorted in increasing order, let f(i) be the largest index j < i with |xi - xj| > 5, i.e. the first index found when scanning leftwards from i whose value is more than 5 away from xi. For example, given (x1, x2, x3, x4) = (6, 7, 12, 14), f(4) = 2, because scanning leftwards from x4 = 14 the first value at a distance greater than 5 is x2 = 7 (14 - 12 would only be a distance of 2); f(3) = 1; and f(2) and f(1) are undefined (there is no wrap-around, because the index f(i) must be smaller than i).
I'm searching for an algorithm that finds f(n), f(n-1), ..., f(1) all together in O(n) time.
The naive version is O(n^2).

The observation that leads to an O(n)-time algorithm is that for i < j where both f(i) and f(j) are defined, f(i) <= f(j), i.e., f is a (non-strictly) increasing function. Therefore, once we have found f(i) for an index i, f(i-1) has to be either equal to or smaller than f(i), so the search for f(i-1) can continue leftwards from f(i) and never has to move right.
Here is a piece of code that captures the essence of the algorithm.
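for (int i = 1; i <= n; i++)
    f[i] = -1; // initialize to undefined
int j = n - 1;
int i = n;
while (i > 0 && j > 0) {
    // j only ever moves left, so the inner loop runs O(n) times in total
    while ((j > 0) && (x[i] - x[j] <= 5))
        j--;
    if (j > 0) // j == 0 means no valid index is left; x[0] is not part of the array
        f[i] = j;
    i--;
}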
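For readers who want to run the idea, here is a minimal self-contained sketch of the same two-pointer scan. It is an illustration rather than the answer's exact code: it uses 0-based indices, returns -1 for "undefined", and the function name `firstFarIndices` and the `dist` parameter (5 in the question) are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not from the original answer).
// f[i] is the largest index j < i with x[i] - x[j] > dist, or -1 if none exists.
// Indices are 0-based here; x must be sorted in non-decreasing order.
std::vector<int> firstFarIndices(const std::vector<long long>& x, long long dist) {
    assert(std::is_sorted(x.begin(), x.end()));
    const int n = static_cast<int>(x.size());
    std::vector<int> f(n, -1);
    int j = n - 2;                        // candidate index; it only ever moves left
    for (int i = n - 1; i >= 0; --i) {
        if (j >= i) j = i - 1;            // the candidate must stay strictly left of i
        while (j >= 0 && x[i] - x[j] <= dist) --j;
        if (j < 0) break;                 // every remaining f[i] stays undefined
        f[i] = j;
    }
    return f;
}
```

With the question's values (6, 7, 12, 14) and dist = 5 this gives -1, -1, 0, 1 in 0-based terms, which corresponds to f(3) = 1 and f(4) = 2 in the question's 1-based notation.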

Related

Find the number of subarrays of odd lengths that have a median equal to k

Find the number of subarrays of odd lengths that have a median equal to k.
For example: array = [5,3,1,4,7,7], k=4 then there are 4 odd length subarrays with 4 as their median: [4], [1,4,7], [5,3,1,4,7], [3,1,4,7,7] therefore return 4 as the answer.
Can anyone please help me with this subarray problem, I'm not sure how to get the output.

I recently encountered this problem in an online assessment.
However, there 'k' was an index, with 1 <= k <= n where n is the length of the array.
We have to find how many subarrays have arr[k] as their median; the subarrays also have to be of odd length. This is the only hint we need.
Since the subarrays have odd length, one can be (arr[k]) alone, or (1 element on the left and right), (2 elements on the left and right), and so on...
We can maintain smaller and bigger arrays of length n and populate them as follows:
if (arr[i] < arr[k])
    smaller[i] = 1;
else
    smaller[i] = 0;
and for the elements bigger than arr[k]:
if (arr[i] > arr[k])
    bigger[i] = 1;
else
    bigger[i] = 0;
Taking prefix sums of smaller and bigger (so that smaller[i] and bigger[i] hold the counts over positions 1...i) lets us find, for any range i...j with i <= j, the count of smaller and bigger elements with respect to arr[k].
For arr[k] to be the median of the range [i, j], the following condition has to hold (using the prefix sums):
(smaller[j] - smaller[i - 1]) = (bigger[j] - bigger[i - 1])
In other words, the difference between the number of smaller and bigger elements in the range [i, j] is 0. Provided no other element in the range equals arr[k], this also makes the length odd.
We create a new array d of length n from the 0/1 indicator values, such that
d[i] = smaller[i] - bigger[i]
Now the problem reduces to finding the number of subarrays of d having a sum of 0.
But not all subarrays having sum 0 are useful to us.
We don't care about the subarrays that do not include 'k'. So,
ans = subarray_sum_zero(1, n, d) - subarray_sum_zero(1, k - 1, d) - subarray_sum_zero(k + 1, n, d)
The subarray_sum_zero function counts the subarrays with sum 0 within the given index range of d.
You can count the subarrays with a given sum in linear time using a hash map of prefix sums.
The overall runtime complexity is O(n) and the space complexity is O(n).
It should be able to pass the tests with n = 1e5.

#adf_hater's logic is correct (because the median is the middle element, the number of smaller elements has to equal the number of bigger elements). Here is code using the same logic:
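#include <iostream>
#include <unordered_map>
#include <vector>
using namespace std;
typedef long long ll;
// Counts the subarrays of v[start..end) whose elements sum to 0.
ll sum(int start, int end, vector<int>& v) {
    unordered_map<ll, ll> prevSum; // prefix sum -> number of times it has been seen
    ll res = 0, currSum = 0;       // ll: the count can exceed the range of int for n = 1e5
    for (int i = start; i < end; i++) {
        currSum += v[i];
        if (currSum == 0)
            res++;
        if (prevSum.find(currSum) != prevSum.end())
            res += prevSum[currSum];
        prevSum[currSum]++;
    }
    return res;
}
// k is the 1-based index of the element that has to be the median.
void solve(int n, vector<int>& v, int k) {
    vector<int> smaller(n, 0), bigger(n, 0), d(n, 0);
    k -= 1; // switch to 0-based indexing
    for (int i = 0; i < n; i++)
        smaller[i] = v[i] < v[k];
    for (int i = 0; i < n; i++)
        bigger[i] = v[i] > v[k];
    for (int i = 0; i < n; i++)
        d[i] = smaller[i] - bigger[i];
    // all zero-sum subarrays minus those lying entirely left or right of k
    cout << sum(0, n, d) - sum(0, k, d) - sum(k + 1, n, d);
}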

Find the number that repeats (r-l+1)/2 times in range [l:r]

Given an array A[N] and some queries, each consisting of Li and Ri, we must find the number that appears more than (Ri-Li+1)/2 times in the range [Li:Ri].
For example:
INPUT:
N=7
1 1 3 2 3 4 3
OUTPUT:
Ranges:
[1:3] ans is :>1
[1:4] no answer
[1:7] ans is :>3
[2:7] no answer
My first idea was to use a map to store how many times A[i] appears from 1 to j,
but that takes up a lot of memory if N is up to 5e5.
Then I sorted the queries in increasing order of Ri, and I have no further ideas.
Suggestions:
Is there an efficient algorithm for this problem, or a data structure that stores the frequency of A[i] from 1 to j?

I don't know of such a data structure, but I found a solution for this problem.
If Ri - Li + 1 is odd, there may be two elements that appear (Ri - Li + 1) / 2 times. Which one do you want to get? We can use the algorithm below to get one of them, and it can be extended to return both if you want.
If there are only a few queries, so that the sum of (Ri - Li) over all queries is small enough, get the answer for each [Li, Ri] separately.
For each [Li, Ri], we can use an O(Ri - Li) time, O(1) auxiliary memory algorithm to get the answer. If there is an x that appears exactly (Ri - Li + 1) / 2 times, at least one of the three cases below must happen (suppose Ri > Li).
1. x appears (Ri - Li + 1) / 2 times in [Li, Ri - 1].
2. x appears (Ri - Li + 1) / 2 times in [Li + 1, Ri].
3. A[Li] == A[Ri] == x.
For cases 1 and 2 we can use the 'Heavy Hitters' (majority vote) algorithm to find the candidate x: when Ri - Li + 1 is even, (Ri - Li + 1) / 2 occurrences form a strict majority of a subrange of length Ri - Li.
So we can get three candidates for x with one traversal each, and check each of them to find the answer (see the C++ code below).
// Majority-vote scan: returns the strict majority element of A[L..R] if one exists.
int getCandidateX(int L, int R) {
    int x = A[L], count = 1;
    for (int i = L + 1; i <= R; ++i) {
        if (A[i] == x) ++count;
        else if (--count == 0) {
            x = A[i];
            count = 1;
        }
    }
    return x;
}
int getFrequency(int L, int R, int x) {
    int count = 0;
    for (int i = L; i <= R; ++i) {
        if (A[i] == x) ++count;
    }
    return count;
}
/**
 * if Ri == Li, no answer
 * suppose Ri > Li
 * return {x, 0}, or {-1, -1} if no such element
 */
pair<int,int> getAnswer(int Li, int Ri) {
    int t = (Ri - Li + 1) / 2;
    int x;
    if (!((Ri - Li) & 1)) {
        // Odd length: t occurrences are not a majority of any subrange,
        // so this candidate is only a best-effort guess.
        x = getCandidateX(Li, Ri);
        if (getFrequency(Li, Ri, x) == t) return {x, 0};
        return {-1, -1};
    }
    // Even length: one of the three cases above must hold.
    x = getCandidateX(Li, Ri - 1);
    if (getFrequency(Li, Ri, x) == t) return {x, 0};
    x = getCandidateX(Li + 1, Ri);
    if (getFrequency(Li, Ri, x) == t) return {x, 0};
    if (A[Li] == A[Ri] && getFrequency(Li, Ri, A[Li]) == t)
        return {A[Li], 0}; // return the value, not the index
    return {-1, -1};
}
When the sum of (Ri - Li) over all queries is large, I found an O((m + n)logn) online solution, but it also costs a lot of memory. I treat it as an RMQ (Range Maximum Query) style problem and solve it with the ST (sparse table) algorithm.
First, we can get the frequency in [L, R] of any x in O(logn) time.
We can store all the positions of x in map[x], where map maps x to its position array (we can use a treemap or a hashmap).
Then we can get the frequency of x in [L, R] by binary search, which costs O(logn) time.
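The frequency lookup described above can be sketched as follows. This is only an illustration under assumptions: indices are 0-based, a hash map is used for the position lists, and the names `PositionIndex`, `buildPositionIndex` and `frequencyInRange` are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical type: for each value, the increasing list of indices where it occurs.
using PositionIndex = std::unordered_map<int, std::vector<int>>;

PositionIndex buildPositionIndex(const std::vector<int>& a) {
    PositionIndex pos;
    for (int i = 0; i < static_cast<int>(a.size()); ++i)
        pos[a[i]].push_back(i);          // appended in increasing index order
    return pos;
}

// Number of occurrences of x in a[L..R] (inclusive, 0-based), in O(log n).
int frequencyInRange(const PositionIndex& pos, int x, int L, int R) {
    assert(L <= R);
    auto it = pos.find(x);
    if (it == pos.end()) return 0;       // x never occurs in the array
    const std::vector<int>& p = it->second;
    auto lo = std::lower_bound(p.begin(), p.end(), L);  // first position >= L
    auto hi = std::upper_bound(p.begin(), p.end(), R);  // first position >  R
    return static_cast<int>(hi - lo);
}
```

Building the index takes O(n) expected time and O(n) memory in total, since every array position is stored exactly once.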
Define num[L][R] to be the set of elements that appear more than (R - L + 1) / 4 times in the interval [L, R]. Let val[i][k] = num[i][i + 2^k - 1], k >= 2.
Every val[i][k] has at most 4 elements, and we can calculate all val[i][k] for 0 <= i <= n and i + 2^k <= n in O(nlogn) time and O(nlogn) memory.
This is because for every interval [L, R] and every M such that L <= M < R, num[L][R] is a subset of the union of num[L][M] and num[M + 1][R]. Hence val[i][k] is a subset of the union of val[i][k - 1] and val[i + 2^(k - 1)][k - 1].
Let t be the greatest number such that 2^t <= R - L + 1. We can conclude that if an x in [L, R] appears at least (R - L + 1) / 2 times, x must be in val[L][t] or val[R - 2^t + 1][t].
This means it is sufficient to check the frequency of every element in the union of val[L][t] and val[R - 2^t + 1][t].
For every query [L, R] we can check every candidate in O(logn) time, so the total time is O((m + n)logn), where n is the number of elements of A and m is the number of queries.
If the question is to get the element that appears at least (Ri-Li+1)/2 + 1 times, it can be solved in a simpler way.

Time complexity in terms of n if time complexity is O(x*y) where x+y = n

This code is to move all the zeroes in the vector to the end of the vector while maintaining the order of the non zero elements.
Eg: 0 3 0 8 0 9
Output : 3 8 9 0 0 0
I wrote the following code for this
void moveZeroes(vector<int>& nums) {
    vector<int> v, v1; // v has the index of all the zero elements while v1 has index of non zero elements
    for(int i = 0; i < nums.size(); i++){
        if(nums[i] == 0)
            v.push_back(i);
        else
            v1.push_back(i);
    }
    //Here i'm swapping all the zero elements with non zero elements
    for(int i = 0; i < v.size(); i++){
        for(int j = 0; j < v1.size(); j++){
            if(v[i] < v1[j]){
                swap(nums[v[i]], nums[v1[j]]);
                v[i] = v1[j];
            }
        }
    }
}
So if nums has size n and v has size x & v1 has size y where x + y = n, then time complexity is O(x*y) . But what will be the time complexity in terms of n?

Let N be the number of elements in the vector.
In the best case the time complexity is linear, i.e., O(N). This happens in 2 cases:
when all the elements are non-zero. The second loop, in which you swap elements, is not executed (v is empty).
when all the elements are zero. In this case only the outer part of the second loop runs (v1 is empty).
In general the double loop performs x*y iterations, which gives a quadratic time complexity O(N^2) in the worst case.
For instance, suppose that half of the elements are zero and the other half are non-zero. Then the number of iterations is:
N/2*N/2 = N^2/4 = O(N^2)
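That the even split really is the worst case can be checked with a short derivation, assuming only that x and y are non-negative and x + y = N as in the question:

```latex
x \cdot y \;=\; x\,(N - x) \;=\; \frac{N^2}{4} - \left(x - \frac{N}{2}\right)^2 \;\le\; \frac{N^2}{4},
\qquad \text{with equality iff } x = y = \frac{N}{2}.
```

So O(x*y) is O(N^2) when expressed in N alone, and the bound is reached exactly when zeros and non-zeros are evenly split.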

So if nums has size n and v has size x & v1 has size y where x + y = n, then time complexity is O(x*y) . But what will be the time complexity in terms of n?
It could be up to O(n^2) if x and y are roughly n/2, e.g.: y = x = n / 2 => x * y = n^2/4.
I recommend doing the following:
void moveZeroes(vector<int>& nums)
{
    for(int i = 0, p = 0; i < (int)nums.size(); ++i)
        if(nums[i])
        {
            if(p != i)
                swap(nums[i], nums[p]);
            ++p;
        }
}
Pointer p indicates how many non-zero elements have been placed so far.
This way you get O(n) time complexity, the code is clearer, and it doesn't use an extra O(n) of memory.

Xor of all pairwise sums of integers in an array

We have an array A for example [1, 2, 3]. I want to find the XOR of the SUM of all pairs of integers in the array.
Though this can easily be done in O(n^2) (where n is the size of the array) by passing over all of the pairs, I want to improve the time complexity of the solution. Any answer that improves the time complexity would be great.
E.g. for the above example array, A, the answer would be (1+2)^(1+3)^(2+3) = 2. Since the pairwise elements are (1,2), (1,3), (2,3), and 3 ^ 4 ^ 5 = 2.

Here's an idea for a solution in O(nw) time, where w is the size of a machine word (generally 64 or some other constant). The most important thing is counting how many of the pairs will have a particular bit set, and the parity of this number determines whether that bit will be set in the result. The goal is to count that in O(n) time instead of O(n^2).
Finding the right-most bit of the result is easiest. Count how many of the input numbers have a 0 in the right-most place (i.e. how many are even), and how many have a 1 there (i.e. how many are odd). The number of pairs whose sum has a 1 in the rightmost place equals the product of those two counts, since a pair must have one odd and one even number for its sum to be odd. The result has a 1 in the rightmost position if and only if this product is odd.
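The rightmost-bit rule just described can be sketched in a few lines. This is a minimal illustration, not the answer's implementation: the names `lowestBitOfPairSumXor` and `bruteForcePairSumXor` are invented here, and the brute-force routine is included only as an O(n^2) reference to compare against.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Lowest bit of the XOR of a[i] + a[j] over all pairs i < j, in O(n).
// A sum is odd exactly when one summand is odd and the other is even.
int lowestBitOfPairSumXor(const std::vector<std::uint32_t>& a) {
    std::uint64_t odd = 0;
    for (std::uint32_t v : a) odd += v & 1u;
    const std::uint64_t even = a.size() - odd;
    return static_cast<int>((odd * even) & 1u);   // parity of the number of odd sums
}

// O(n^2) reference used only for cross-checking the rule above.
std::uint64_t bruteForcePairSumXor(const std::vector<std::uint32_t>& a) {
    std::uint64_t acc = 0;
    for (std::size_t i = 1; i < a.size(); ++i)
        for (std::size_t j = 0; j < i; ++j)
            acc ^= static_cast<std::uint64_t>(a[i]) + a[j];
    return acc;
}
```

For the question's array [1, 2, 3] there are two odd numbers and one even number, so the product is 2 and the lowest bit is 0, matching 3 ^ 4 ^ 5 = 2.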
Finding the second-right-most bit of the result is a bit harder. We can do the same trick of counting how many elements do and don't have a 1 there, then taking the product of those counts; but we also need to count how many 1 bits are carried into the second place from sums where both numbers had a 1 in the first place. Fortunately, we can compute this using the count from the previous stage; it is the number of pairs given by the formula k*(k-1)/2 where k is the count of those with a 1 bit in the previous place. This can be added to the product in this stage, and the parity of the total determines whether the second bit of the result is set.
Each stage takes O(n) time to count the elements with a 0 or 1 bit in the appropriate place. By repeating this process w times, we can compute all w bits of the result in O(nw) time. One caveat: from the third place onwards, whether a pair carries into a place depends on all of the lower bits of both numbers, not only on the bit in the previous place, so counting the carries there needs more than the k*(k-1)/2 formula. I will leave the actual implementation of this to you.

Here's my understanding of at least one author's intention for an O(n * log n * w) solution, where w is the number of bits in the largest sum.
The idea is to examine the contribution of each bit one at a time. Since we are only interested in whether the kth bit in the sums is set in any one iteration, we can remove all parts of the numbers that include higher bits, taking them each modulo 2^(k + 1).
Now the sums that would necessarily have the kth bit set lie in the intervals, [2^k, 2^(k + 1)) and [2^(k+1) + 2^k, 2^(k+2) − 2]. So we sort the input list (modulo 2^(k + 1)), and for each left summand, we decrement a pointer to the end of each of the two intervals, and binary search the relevant start index.
Here's JavaScript code with a random comparison to brute force to show that it works (easily translatable to C or Python):
// https://stackoverflow.com/q/64082509
// Returns the lowest index of a value
// greater than or equal to the target
function lowerIdx(a, val, left, right){
  if (left >= right)
    return left;
  // declared with const; the original assigned to an implicit global
  const mid = left + ((right - left) >> 1);
  if (a[mid] < val)
    return lowerIdx(a, val, mid+1, right);
  else
    return lowerIdx(a, val, left, mid);
}
function bruteForce(A){
  let answer = 0;
  for (let i=1; i<A.length; i++)
    for (let j=0; j<i; j++)
      answer ^= A[i] + A[j];
  return answer;
}
function f(A, W){
  const n = A.length;
  const _A = new Array(n);
  let result = 0;
  for (let k=0; k<W; k++){
    for (let i=0; i<n; i++)
      _A[i] = A[i] % (1 << (k + 1));
    _A.sort((a, b) => a - b);
    let pairs_with_kth_bit = 0;
    let l1 = 1 << k;
    let r1 = 1 << (k + 1);
    let l2 = (1 << (k + 1)) + (1 << k);
    let r2 = (1 << (k + 2)) - 2;
    let ptr1 = n - 1;
    let ptr2 = n - 1;
    for (let i=0; i<n-1; i++){
      // Interval [2^k, 2^(k+1))
      while (ptr1 > i+1 && _A[i] + _A[ptr1] >= r1)
        ptr1 -= 1;
      const idx1 = lowerIdx(_A, l1-_A[i], i+1, ptr1);
      let sum = _A[i] + _A[idx1];
      if (sum >= l1 && sum < r1)
        pairs_with_kth_bit += ptr1 - idx1 + 1;
      // Interval [2^(k+1)+2^k, 2^(k+2)-2]
      while (ptr2 > i+1 && _A[i] + _A[ptr2] > r2)
        ptr2 -= 1;
      const idx2 = lowerIdx(_A, l2-_A[i], i+1, ptr2);
      sum = _A[i] + _A[idx2];
      if (sum >= l2 && sum <= r2)
        pairs_with_kth_bit += ptr2 - idx2 + 1;
    }
    if (pairs_with_kth_bit & 1)
      result |= 1 << k;
  }
  return result;
}
var As = [
  [1, 2, 3], // 2
  [1, 2, 10, 11, 18, 20], // 50
  [10, 26, 38, 44, 51, 70, 59, 20] // 182
];
for (let A of As){
  console.log(JSON.stringify(A));
  console.log(`DP, brute force: ${ f(A, 10) }, ${ bruteForce(A) }`);
  console.log('');
}
var numTests = 500;
for (let i=0; i<numTests; i++){
  const W = 8;
  const A = [];
  const n = 12;
  for (let j=0; j<n; j++){
    const num = Math.floor(Math.random() * (1 << (W - 1)));
    A.push(num);
  }
  const fA = f(A, W);
  const brute = bruteForce(A);
  if (fA != brute){
    console.log('Mismatch:');
    console.log(A);
    console.log(fA, brute);
    console.log('');
  }
}
console.log("Done testing.");

Need explanation for this code (algorithm)

The problem:
Larry is very bad at math - he usually uses a calculator, which worked well throughout college. Unfortunately, he is now stuck on a deserted island with his good buddy Ryan after a snowboarding accident. They're now trying to spend some time figuring out some good problems, and Ryan will eat Larry if he cannot answer, so his fate is up to you!
It's a very simple problem - given a number N, how many ways can K numbers less than N add up to N?
For example, for N = 20 and K = 2, there are 21 ways:
0+20
1+19
2+18
3+17
4+16
5+15
...
18+2
19+1
20+0
Input
Each line will contain a pair of numbers N and K. N and K will both be an integer from 1 to 100, inclusive. The input will terminate on 2 0's.
Output
Since Larry is only interested in the last few digits of the answer, for each pair of numbers N and K, print a single number mod 1,000,000 on a single line.
Sample Input
20 2
20 2
0 0
Sample Output
21
21
The solution code:
#include<iostream>
#include<stdlib.h>
#include<stdio.h>
using namespace std;
#define maxn 100
typedef long ss;
ss T[maxn+2][maxn+2];
void Gen() {
    ss i, j;
    for(i = 0; i<= maxn; i++)
        T[1][i] = 1;
    for(i = 2; i<= 100; i++) {
        T[i][0] = 1;
        for(j = 1; j <= 100; j++)
            T[i][j] = (T[i][j-1] + T[i-1][j]) % 1000000;
    }
}
int main() {
    //freopen("in.txt", "r", stdin);
    ss n, m;
    Gen();
    while(cin>>n>>m) {
        if(!n && !m) break;
        cout<<T[m][n]<<endl;
    }
    return 0;
}
How was this calculation derived?
Where does T[i][j] = (T[i][j-1] + T[i-1][j]) come from?

Note: I only use n and k (lower case) to refer to some anonymous variable. I will always use N and K (upper case) to refer to N and K as defined in the question (sum and the number of portions).
Let C(n, k) be the result of n choose k; then the solution to the problem is C(N + K - 1, K - 1), with the assumption that those K numbers are non-negative (otherwise there would be infinitely many solutions even for N = 0 and K = 2).
Since the K numbers are non-negative, and the sum N is fixed, we can think of the problem as: how many ways are there to divide N candies among K people? We can divide the candies by laying them in a line and putting (K - 1) separators between the candies. The (K - 1) separators will divide the candies into K portions. Looking at it from another perspective, it is also like choosing (K - 1) positions among (N + K - 1) positions to put the separators in; the rest of the positions are candies. So, this explains why the number of ways is N + (K - 1) choose (K - 1).
Then the problem reduces to how to find the least significant digits of C(n, k). (Since the maximum of N and K is 100, as defined in maxn, we don't have to worry even if the algorithm goes up to O(n^3).)
The calculation uses the combinatorial identity C(n, k) = C(n - 1, k) + C(n - 1, k - 1) (Pascal's rule). The clever thing about the implementation is that it doesn't store the table of C(n, k) (which is a jagged array); it stores the answers indexed directly by K and N instead. The identity is actually present in T[i][j] = (T[i][j-1] + T[i-1][j]):
The first dimension is actually K, the number of portions. And the second dimension is the sum N. T[K][N] will directly store the result, which according to the mathematical result derived above is (the least significant digits of) C(N + K - 1, K - 1).
Re-writing T[i][j] = (T[i][j-1] + T[i-1][j]) back to the equivalent mathematical result:
C(i + j - 1, i - 1) = C(i + j - 2, i - 1) + C(i + j - 2, i - 2), which is correct according to the identity.
The program will fill the array row by row:
The row K = 0 is already initialized to 0, using the fact that a static array is zero-initialized.
It fills the row K = 1 with 1 (there is only 1 way to divide N into 1 portion).
For the rest of the rows, it sets the case N = 0 to 1 (there is only 1 way to divide 0 into K parts - all parts are 0).
Then the rest are filled with the expression T[i][j] = (T[i][j-1] + T[i-1][j]), which refers to the previous row and to the previous element of the same row, both of which have been filled in earlier iterations.
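The claim that T[K][N] equals C(N + K - 1, K - 1) can be checked numerically. The sketch below is an illustration only: `binomialMod` and `waysTable` are names invented here, `waysTable` rebuilds the same table as Gen() for one (N, K), and both work modulo 1000000 as in the problem.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

const long MOD = 1000000;

// C(n, k) mod MOD, built row by row with Pascal's rule.
long binomialMod(int n, int k) {
    if (k < 0 || k > n) return 0;
    std::vector<long> row(k + 1, 0);
    row[0] = 1;
    for (int r = 1; r <= n; ++r)
        for (int c = std::min(r, k); c >= 1; --c)   // descending, so row[c - 1] is still row r - 1
            row[c] = (row[c] + row[c - 1]) % MOD;
    return row[k];
}

// T[K][N], filled exactly like Gen() in the question (first index K, second index N).
long waysTable(int N, int K) {
    assert(N >= 0 && K >= 1);
    std::vector<std::vector<long>> T(K + 1, std::vector<long>(N + 1, 0));
    for (int j = 0; j <= N; ++j) T[1][j] = 1;
    for (int i = 2; i <= K; ++i) {
        T[i][0] = 1;
        for (int j = 1; j <= N; ++j)
            T[i][j] = (T[i][j - 1] + T[i - 1][j]) % MOD;
    }
    return T[K][N];
}
```

For the sample input N = 20, K = 2, both functions give 21, i.e. waysTable(20, 2) and binomialMod(21, 1) agree.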

Let C(x, y) be the result of x choose y; then the value of T[i][j] equals C(i - 1 + j, j).
You can prove this by induction.
Base cases:
T[1][j] = C(1 - 1 + j, j) = C(j, j) = 1
T[i][0] = C(i - 1, 0) = 1
For the induction step, use the formula (for 0<=y<=x):
C(x,y) = C(x - 1, y - 1) + C(x - 1, y)
Therefore:
C(i - 1 + j, j) = C(i-1+j - 1, j - 1) + C(i-1+j - 1, j) = C(i-1+(j-1), (j-1)) + C((i-1)-1+j, j)
Or in other words:
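T[i][j] = T[i][j-1] + T[i-1][j]
Now, as nhahtdh mentioned before, the value you are looking for is C(N + K - 1, K - 1)
which equals:
T[N+1][K-1] = C(N+1-1+K-1, K-1)
and, by the symmetry C(n, k) = C(n, n - k), also equals T[K][N] = C(K-1+N, N), which is the entry the program prints
(modulo 1000000)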

This is a famous problem - you can check the solution here:
How many ways are there to drop N identical balls into K boxes?
The following algorithm is a dynamic-programming solution to your problem:
Define D[i,j] to be the number of ways j non-negative numbers can sum up to i.
0 <= i <= N
1 <= j <= K
Where D[i,1] = 1 for every i.
And where j > 1 you get (the last number takes a value from 0 to i, and the remaining j-1 numbers sum to the rest):
D[i,j] = D[i,j-1] + D[i-1,j-1] +...+ D[0,j-1]
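A minimal sketch of this recurrence, assuming D[i,j] counts the ways j non-negative numbers sum to i. The function name `countWays` is invented for this example, and the running prefix sum is a small optimisation added here (it is not part of the answer) so that each column costs O(N) instead of O(N^2).

```cpp
#include <cassert>
#include <vector>

const long MOD = 1000000;

// Number of ways K non-negative numbers can sum to N, modulo MOD.
long countWays(int N, int K) {
    assert(N >= 0 && K >= 1);
    std::vector<long> prev(N + 1, 1);              // D[i,1] = 1 for every i
    for (int j = 2; j <= K; ++j) {
        std::vector<long> cur(N + 1, 0);
        long running = 0;                          // D[0,j-1] + ... + D[i,j-1]
        for (int i = 0; i <= N; ++i) {
            running = (running + prev[i]) % MOD;
            cur[i] = running;                      // this is D[i,j]
        }
        prev = cur;
    }
    return prev[N];
}
```

countWays(20, 2) returns 21, matching the sample output in the question.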

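The problem is closely related to "the integer partition problem", except that here the order of the summands matters and zeros are allowed (such ordered sums are called weak compositions of N into K parts). Basically there exists a recursive computation of the number of ways to split n into k parts, and your solution is just the dynamic programming version of it (non-recursive and computing bottom-up for short).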
