Find the unique integer that appears m times in an array - arrays

where all other integers in this array appeared n times. we have n>m.
all elements in this array are integers. Can you design an algorithm that works in O(N) where N is the number of elements in the array, while minimizing the space complexity? In the best case space complexity can be limited to O(log(m)).
a special case is n=2 and m=1 (which is easy). Is there a generalized algorithm that can handle arbitrary m and n?
thanks

You can use a hashtable that maps numbers in the array to the number of occurrences. You can iterate through the array, incrementing the number of occurrences for each number. Then, you can iterate through the hashtable, searching for a key with n occurrences.

If the array has length > m, then pivot on a random element in the array. Find the half of the array that has length m (mod n), and repeat on that half.
This has expected run-time O(N), and requires O(1) additional storage.

The below algo might be what you're looking for. Basically, you create a map with key as the integer value and the value as the number of occurrence.you loop through your array just once and anytime a numbers occurs more than once, you increase it's count
public static void findCount (int[] array,int m, int n){
if(m>n){
thrown new IllegalArgumentException("m is greater than n");
}
Map<Integer,Integer> intCount = new HashMap<Integer,Integer>();
for(int i = 0; i<array.length; i++){
if (!intCount.containsKey(array[i])) intCount.put(array[i], 0);
intCount.put(array[i], intCount.get(array[i]) + 1);
}
for (Map.Entry<String,Integer> entry : words.entrySet()) {
Integer key = entry.getKey();
Integer value = entry.getValue();
if(value==m){
System.out.println("Value "+key+" Occurs "+value+" times");
}
}
}

The case n=2, m=1 can be done by xoring all the numbers together in the array A.
This idea can be generalised by counting (modulo n) the number of elements in A with the i'th bit set. That count is non-zero if and only if the i'th bit of the answer is set.
This gives you an O(N.log(max(A))) way to compute the solution using O(log(n)) additional storage.
This doesn't achieve the O(log(N)) run-time and O(log(m)) storage complexities given in the question, but it seems an interesting approach.

Related

Hacker Earth(Basic I/O Question) Play With Numbers [ subarry ]

I have been trying to solve this problem and it works good with small numbers but not the big 10^9 numbers in Hacker Earth
You are given an array of n numbers and q queries. For each query you have to print the floor of the expected value(mean) of the subarray from L to R.
INPUT:
First line contains two integers N and Q denoting number of array elements and number of queries.
Next line contains N space separated integers denoting array elements.
Next Q lines contain two integers L and R(indices of the array).
OUTPUT:
print a single integer denoting the answer.
Constraints:
1<= N ,Q,L,R <= 10^6
1<= Array elements <= 10^9
NOTE
Use Fast I/O
using namespace std;
long int solvepb(int a, int b, long int *arr,int n){
int result, count = 0;
vector<long int>res;
for(int i=0;i<n;i++){
if(i+1 >= a && i+1 <=b){
res.push_back(arr[i]);
count += arr[i];
}
}
result = count / res.size();
return result;
}
int main(){
int n,q;cin>>n>>q;
long int arr[n];
for(int i=0;i<n;i++){
cin>>arr[i];
}
while(q--){
int a,b;
cin>>a>>b;
cout<<solvepb(a,b,arr,n)<<endl;
}
return 0;
}```
So currently, the issue with your algorithm is that each time you are computing the mean over two indices in the array. This means that if the queries are particularly bad, for each of the Q queries, you might iterate through all N elements of the array.
How can one try to reduce this? Notice that because sums are additive, the sum up to an index i is the same as the sum up to an index j plus the sum of the numbers between i and j. Let me rewrite that as an equation -
sum[0:i] = sum[0:j] + sum[j+1:i]
It should be obvious now that by rearranging this equation, you can quickly get the sum between two indices by storing the sum of numbers up to an index. (i.e. sum[j+1:i] = sum[0:i] - sum[0:j]). This means that rather than having O(N*Q), you can have O(N + Q) runtime complexity. The O(N) part of the new complexity is from iterating the array once to get all the sums. The O(Q) part comes from answering the Q queries.
This kind of approach is called prefix sums. There are some optimized data structures like Fenwick trees made specifically for prefix sums that you can read about online or on Wikipedia. But for your question, a simple array should work just fine.
A few comments about your code:
In your for loop in the solvepb function, you are going from 0 to n always, but you didn't need to. You could have specified to go from a to b if you knew a was smaller than b. Otherwise, you go from b to a.
You also do not really use the vector. The vector in the solvepb function stores array elements, but these are never used again. You only seem to use it to find the number of elements from a to b, but you can get that by simply subtracting the difference between the two indices (i.e. b-a+1 if a < b otherwise a-b+1)

Largest 3 numbers c language [duplicate]

I have an array
A[4]={4,5,9,1}
I need it would give the first 3 top elements like 9,5,4
I know how to find the max element but how to find the 2nd and 3rd max?
i.e if
max=A[0]
for(i=1;i<4;i++)
{
if (A[i]>max)
{
max=A[i];
location=i+1;
}
}
actually sorting will not be suitable for my application because,
the position number is also important for me i.e. I have to know in which positions the first 3 maximum is occurring, here it is in 0th,1th and 2nd position...so I am thinking of a logic
that after getting the max value if I could put 0 at that location and could apply the same steps for that new array i.e.{4,5,0,1}
But I am bit confused how to put my logic in code
Consider using the technique employed in the Python standard library. It uses an underlying heap data structure:
def nlargest(n, iterable):
"""Find the n largest elements in a dataset.
Equivalent to: sorted(iterable, reverse=True)[:n]
"""
if n < 0:
return []
it = iter(iterable)
result = list(islice(it, n))
if not result:
return result
heapify(result)
for elem in it:
heappushpop(result, elem)
result.sort(reverse=True)
return result
The steps are:
Make an n length fixed array to hold the results.
Populate the array with the first n elements of the input.
Transform the array into a minheap.
Loop over remaining inputs, replacing the top element of the heap if new data element is larger.
If needed, sort the final n elements.
The heap approach is memory efficient (not requiring more memory than the target output) and typically has a very low number of comparisons (see this comparative analysis).
You can use the selection algorithm
Also to mention that the complexity will be O(n) ie, O(n) for selection and O(n) for iterating, so the total is also O(n)
What your essentially asking is equivalent to sorting your array in descending order. The fastest way to do this is using heapsort or quicksort depending on the size of your array.
Once your array is sorted your largest number will be at index 0, your second largest will be at index 1, ...., in general your nth largest will be at index n-1
you can follw this procedure,
1. Add the n elements to another array B[n];
2. Sort the array B[n]
3. Then for each element in A[n...m] check,
A[k]>B[0]
if so then number A[k] is among n large elements so,
search for proper position for A[k] in B[n] and replace and move the numbers on left in B[n] so that B[n] contains n large elements.
4. Repeat this for all elements in A[m].
At the end B[n] will have the n largest elements.

find pair of numbers whose difference is an input value 'k' in an unsorted array

As mentioned in the title, I want to find the pairs of elements whose difference is K
example k=4 and a[]={7 ,6 23,19,10,11,9,3,15}
output should be :
7,11
7,3
6,10
19,23
15,19
15,11
I have read the previous posts in SO " find pair of numbers in array that add to given sum"
In order to find an efficient solution, how much time does it take? Is the time complexity O(nlogn) or O(n)?
I tried to do this by a divide and conquer technique, but i'm not getting any clue of exit condition...
If an efficient solution includes sorting the input array and manipulating elements using two pointers, then I think I should take minimum of O(nlogn)...
Is there any math related technique which brings solution in O(n). Any help is appreciated..
You can do it in O(n) with a hash table. Put all numbers in the hash for O(n), then go through them all again looking for number[i]+k. Hash table returns "Yes" or "No" in O(1), and you need to go through all numbers, so the total is O(n). Any set structure with O(1) setting and O(1) checking time will work instead of a hash table.
A simple solution in O(n*Log(n)) is to sort your array and then go through your array with this function:
void find_pairs(int n, int array[], int k)
{
int first = 0;
int second = 0;
while (second < n)
{
while (array[second] < array[first]+k)
second++;
if (array[second] == array[first]+k)
printf("%d, %d\n", array[first], array[second]);
first++;
}
}
This solution does not use extra space unlike the solution with a hashtable.
One thing may be done using indexing in O(n)
Take a boolean array arr indexed by the input list.
For each integer i is in the input list then set arr[i] = true
Traverse the entire arr from the lowest integer to the highest integer as follows:
whenever you find a true at ith index, note down this index.
see if there arr[i+k] is true. If yes then i and i+k numbers are the required pair
else continue with the next integer i+1

Finding kth smallest number from n sorted arrays

So, you have n sorted arrays (not necessarily of equal length), and you are to return the kth smallest element in the combined array (i.e the combined array formed by merging all the n sorted arrays)
I have been trying it and its other variants for quite a while now, and till now I only feel comfortable in the case where there are two arrays of equal length, both sorted and one has to return the median of these two.
This has logarithmic time complexity.
After this I tried to generalize it to finding kth smallest among two sorted arrays. Here is the question on SO.
Even here the solution given is not obvious to me. But even if I somehow manage to convince myself of this solution, I am still curious as to how to solve the absolute general case (which is my question)
Can somebody explain me a step by step solution (which again in my opinion should take logarithmic time i.e O( log(n1) + log(n2) ... + log(nN) where n1, n2...nN are the lengths of the n arrays) which starts from the more specific cases and moves on to the more general one?
I know similar questions for more specific cases are there all over the internet, but I haven't found a convincing and clear answer.
Here is a link to a question (and its answer) on SO which deals with 5 sorted arrays and finding the median of the combined array. The answer just gets too complicated for me to able to generalize it.
Even clean approaches for the more specific cases (as I mentioned during the post) are welcome.
PS: Do you think this can be further generalized to the case of unsorted arrays?
PPS: It's not a homework problem, I am just preparing for interviews.
This doesn't generalize the links, but does solve the problem:
Go through all the arrays and if any have length > k, truncate to length k (this is silly, but we'll mess with k later, so do it anyway)
Identify the largest remaining array A. If more than one, pick one.
Pick the middle element M of the largest array A.
Use a binary search on the remaining arrays to find the same element (or the largest element <= M).
Based on the indexes of the various elements, calculate the total number of elements <= M and > M. This should give you two numbers: L, the number <= M and G, the number > M
If k < L, truncate all the arrays at the split points you've found and iterate on the smaller arrays (use the bottom halves).
If k > L, truncate all the arrays at the split points you've found and iterate on the smaller arrays (use the top halves, and search for element (k-L).
When you get to the point where you only have one element per array (or 0), make a new array of size n with those data, sort, and pick the kth element.
Because you're always guaranteed to remove at least half of one array, in N iterations, you'll get rid of half the elements. That means there are N log k iterations. Each iteration is of order N log k (due to the binary searches), so the whole thing is N^2 (log k)^2 That's all, of course, worst case, based on the assumption that you only get rid of half of the largest array, not of the other arrays. In practice, I imagine the typical performance would be quite a bit better than the worst case.
It can not be done in less than O(n) time. Proof Sketch If it did, it would have to completely not look at at least one array. Obviously, one array can arbitrarily change the value of the kth element.
I have a relatively simple O(n*log(n)*log(m)) where m is the length of the longest array. I'm sure it is possible to be slightly faster, but not a lot faster.
Consider the simple case where you have n arrays each of length 1. Obviously, this is isomorphic to finding the kth element in an unsorted list of length n. It is possible to find this in O(n), see Median of Medians algorithm, originally by Blum, Floyd, Pratt, Rivest and Tarjan, and no (asymptotically) faster algorithms are possible.
Now the problem is how to expand this to longer sorted arrays. Here is the algorithm: Find the median of each array. Sort the list of tuples (median,length of array/2) and sort it by median. Walk through keeping a sum of the lengths, until you reach a sum greater than k. You now have a pair of medians, such that you know the kth element is between them. Now for each median, we know if the kth is greater or less than it, so we can throw away half of each array. Repeat. Once the arrays are all one element long (or less), we use the selection algorithm.
Implementing this will reveal additional complexities and edge conditions, but nothing that increases the asymptotic complexity. Each step
Finds the medians or the arrays, O(1) each, so O(n) total
Sorts the medians O(n log n)
Walks through the sorted list O(n)
Slices the arrays O(1) each so, O(n) total
that is O(n) + O(n log n) + O(n) + O(n) = O(n log n). And, we must perform this untill the longest array is length 1, which will take log m steps for a total of O(n*log(n)*log(m))
You ask if this can be generalized to the case of unsorted arrays. Sadly, the answer is no. Consider the case where we only have one array, then the best algorithm will have to compare at least once with each element for a total of O(m). If there were a faster solution for n unsorted arrays, then we could implement selection by splitting our single array into n parts. Since we just proved selection is O(m), we are stuck.
You could look at my recent answer on the related question here. The same idea can be generalized to multiple arrays instead of 2. In each iteration you could reject the second half of the array with the largest middle element if k is less than sum of mid indexes of all arrays. Alternately, you could reject the first half of the array with the smallest middle element if k is greater than sum of mid indexes of all arrays, adjust k. Keep doing this until you have all but one array reduced to 0 in length. The answer is kth element of the last array which wasn't stripped to 0 elements.
Run-time analysis:
You get rid of half of one array in each iteration. But to determine which array is going to be reduced, you spend time linear to the number of arrays. Assume each array is of the same length, the run time is going to be cclog(n), where c is the number of arrays and n is the length of each array.
There exist an generalization that solves the problem in O(N log k) time, see the question here.
Old question, but none of the answers were good enough. So I am posting the solution using sliding window technique and heap:
class Node {
int elementIndex;
int arrayIndex;
public Node(int elementIndex, int arrayIndex) {
super();
this.elementIndex = elementIndex;
this.arrayIndex = arrayIndex;
}
}
public class KthSmallestInMSortedArrays {
public int findKthSmallest(List<Integer[]> lists, int k) {
int ans = 0;
PriorityQueue<Node> pq = new PriorityQueue<>((a, b) -> {
return lists.get(a.arrayIndex)[a.elementIndex] -
lists.get(b.arrayIndex)[b.elementIndex];
});
for (int i = 0; i < lists.size(); i++) {
Integer[] arr = lists.get(i);
if (arr != null) {
Node n = new Node(0, i);
pq.add(n);
}
}
int count = 0;
while (!pq.isEmpty()) {
Node curr = pq.poll();
ans = lists.get(curr.arrayIndex)[curr.elementIndex];
if (++count == k) {
break;
}
curr.elementIndex++;
pq.offer(curr);
}
return ans;
}
}
The maximum number of elements that we need to access here is O(K) and there are M arrays. So the effective time complexity will be O(K*log(M)).
This would be the code. O(k*log(m))
public int findKSmallest(int[][] A, int k) {
PriorityQueue<int[]> queue = new PriorityQueue<>(Comparator.comparingInt(x -> A[x[0]][x[1]]));
for (int i = 0; i < A.length; i++)
queue.offer(new int[] { i, 0 });
int ans = 0;
while (!queue.isEmpty() && --k >= 0) {
int[] el = queue.poll();
ans = A[el[0]][el[1]];
if (el[1] < A[el[0]].length - 1) {
el[1]++;
queue.offer(el);
}
}
return ans;
}
If the k is not that huge, we can maintain a priority min queue. then loop for every head of the sorted array to get the smallest element and en-queue. when the size of the queue is k. we get the first k smallest .
maybe we can regard the n sorted array as buckets then try the bucket sort method.
This could be considered the second half of a merge sort. We could simply merge all the sorted lists into a single list...but only keep k elements in the combined lists from merge to merge. This has the advantage of only using O(k) space, but something slightly better than merge sort's O(n log n) complexity. That is, it should in practice operate slightly faster than a merge sort. Choosing the kth smallest from the final combined list is O(1). This is kind of complexity is not so bad.
It can be done by doing binary search in each array, while calculating the number of smaller elements.
I used the bisect_left and bisect_right to make it work for non-unique numbers as well,
from bisect import bisect_left
from bisect import bisect_right
def kthOfPiles(givenPiles, k, count):
'''
Perform binary search for kth element in multiple sorted list
parameters
==========
givenPiles are list of sorted list
count is the total number of
k is the target index in range [0..count-1]
'''
begins = [0 for pile in givenPiles]
ends = [len(pile) for pile in givenPiles]
#print('finding k=', k, 'count=', count)
for pileidx,pivotpile in enumerate(givenPiles):
while begins[pileidx] < ends[pileidx]:
mid = (begins[pileidx]+ends[pileidx])>>1
midval = pivotpile[mid]
smaller_count = 0
smaller_right_count = 0
for pile in givenPiles:
smaller_count += bisect_left(pile,midval)
smaller_right_count += bisect_right(pile,midval)
#print('check midval', midval,smaller_count,k,smaller_right_count)
if smaller_count <= k and k < smaller_right_count:
return midval
elif smaller_count > k:
ends[pileidx] = mid
else:
begins[pileidx] = mid+1
return -1
Please find the below C# code to Find the k-th Smallest Element in the Union of Two Sorted Arrays. Time Complexity : O(logk)
public int findKthElement(int k, int[] array1, int start1, int end1, int[] array2, int start2, int end2)
{
// if (k>m+n) exception
if (k == 0)
{
return Math.Min(array1[start1], array2[start2]);
}
if (start1 == end1)
{
return array2[k];
}
if (start2 == end2)
{
return array1[k];
}
int mid = k / 2;
int sub1 = Math.Min(mid, end1 - start1);
int sub2 = Math.Min(mid, end2 - start2);
if (array1[start1 + sub1] < array2[start2 + sub2])
{
return findKthElement(k - mid, array1, start1 + sub1, end1, array2, start2, end2);
}
else
{
return findKthElement(k - mid, array1, start1, end1, array2, start2 + sub2, end2);
}
}

Compare two integer arrays with same length

[Description] Given two integer arrays with the same length. Design an algorithm which can judge whether they're the same. The definition of "same" is that, if these two arrays were in sorted order, the elements in corresponding position should be the same.
[Example]
<1 2 3 4> = <3 1 2 4>
<1 2 3 4> != <3 4 1 1>
[Limitation] The algorithm should require constant extra space, and O(n) running time.
(Probably too complex for an interview question.)
(You can use O(N) time to check the min, max, sum, sumsq, etc. are equal first.)
Use no-extra-space radix sort to sort the two arrays in-place. O(N) time complexity, O(1) space.
Then compare them using the usual algorithm. O(N) time complexity, O(1) space.
(Provided (max − min) of the arrays is of O(Nk) with a finite k.)
You can try a probabilistic approach - convert the arrays into a number in some huge base B and mod by some prime P, for example sum B^a_i for all i mod some big-ish P. If they both come out to the same number, try again for as many primes as you want. If it's false at any attempts, then they are not correct. If they pass enough challenges, then they are equal, with high probability.
There's a trivial proof for B > N, P > biggest number. So there must be a challenge that cannot be met. This is actually the deterministic approach, though the complexity analysis might be more difficult, depending on how people view the complexity in terms of the size of the input (as opposed to just the number of elements).
I claim that: Unless the range of input is specified, then it is IMPOSSIBLE to solve in onstant extra space, and O(n) running time.
I will be happy to be proven wrong, so that I can learn something new.
Insert all elements from the first array into a hashtable
Try to insert all elements from the second array into the same hashtable - for each insert to element should already be there
Ok, this is not with constant extra space, but the best I could come up at the moment:-). Are there any other constraints imposed on the question, like for example to biggest integer that may be included in the array?
A few answers are basically correct, even though they don't look like it. The hash table approach (for one example) has an upper limit based on the range of the type involved rather than the number of elements in the arrays. At least by by most definitions, that makes the (upper limit on) the space a constant, although the constant may be quite large.
In theory, you could change that from an upper limit to a true constant amount of space. Just for example, if you were working in C or C++, and it was an array of char, you could use something like:
size_t counts[UCHAR_MAX];
Since UCHAR_MAX is a constant, the amount of space used by the array is also a constant.
Edit: I'd note for the record that a bound on the ranges/sizes of items involved is implicit in nearly all descriptions of algorithmic complexity. Just for example, we all "know" that Quicksort is an O(N log N) algorithm. That's only true, however, if we assume that comparing and swapping the items being sorted takes constant time, which can only be true if we bound the range. If the range of items involved is large enough that we can no longer treat a comparison or a swap as taking constant time, then its complexity would become something like O(N log N log R), were R is the range, so log R approximates the number of bits necessary to represent an item.
Is this a trick question? If the authors assumed integers to be within a given range (2^32 etc.) then "extra constant space" might simply be an array of size 2^32 in which you count the occurrences in both lists.
If the integers are unranged, it cannot be done.
You could add each element into a hashmap<Integer, Integer>, with the following rules: Array A is the adder, array B is the remover. When inserting from Array A, if the key does not exist, insert it with a value of 1. If the key exists, increment the value (keep a count). When removing, if the key exists and is greater than 1, reduce it by 1. If the key exists and is 1, remove the element.
Run through array A followed by array B using the rules above. If at any time during the removal phase array B does not find an element, you can immediately return false. If after both the adder and remover are finished the hashmap is empty, the arrays are equivalent.
Edit: The size of the hashtable will be equal to the number of distinct values in the array does this fit the definition of constant space?
I imagine the solution will require some sort of transformation that is both associative and commutative and guarantees a unique result for a unique set of inputs. However I'm not sure if that even exists.
public static boolean match(int[] array1, int[] array2) {
int x, y = 0;
for(x = 0; x < array1.length; x++) {
y = x;
while(array1[x] != array2[y]) {
if (y + 1 == array1.length)
return false;
y++;
}
int swap = array2[x];
array2[x] = array2[y];
array2[y] = swap;
}
return true;
}
For each array, Use Counting sort technique to build the count of number of elements less than or equal to a particular element . Then compare the two built auxillary arrays at every index, if they r equal arrays r equal else they r not . COunting sort requires O(n) and array comparison at every index is again O(n) so totally its O(n) and the space required is equal to the size of two arrays . Here is a link to counting sort http://en.wikipedia.org/wiki/Counting_sort.
given int are in the range -n..+n a simple way to check for equity may be the following (pseudo code):
// a & b are the array
accumulator = 0
arraysize = size(a)
for(i=0 ; i < arraysize; ++i) {
accumulator = accumulator + a[i] - b[i]
if abs(accumulator) > ((arraysize - i) * n) { return FALSE }
}
return (accumulator == 0)
accumulator must be able to store integer with range = +- arraysize * n
How 'bout this - XOR all the numbers in both the arrays. If the result is 0, you got a match.

Resources