Finding the count of distinct elements in every subarray of size k

How can this be solved efficiently?
Given an array of size n and an integer k, we need to return the sum of the counts of distinct numbers over every window of size k as the window slides forward.
e.g. arr[] = {1,2,1,3,4,2,3};
Let k = 4.
The first window is {1,2,1,3}; the count of distinct numbers is 2 (1 is repeated, so it is not counted).
The second window is {2,1,3,4}; the count of distinct numbers is 4.
The third window is {1,3,4,2}; the count of distinct numbers is 4.
The fourth window is {3,4,2,3}; the count of distinct numbers is 2 (3 is repeated).

You should keep track of:
a map that counts the frequencies of the elements in your window
a running count of distinct elements in the window.
The map with frequencies can also be an array if the possible elements are from a limited set.
Then, when your window slides to the right:
increase the frequency of the new number by 1.
if that frequency is now 1, add 1 to the running count.
decrease the frequency of the old number by 1.
if that frequency is now 0, subtract 1 from the running count.
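The bookkeeping above can be sketched in Python roughly as follows. This is only an illustration, not the answerer's own code: the name `distinct_counts` is made up here, and it counts distinct values in the usual sense, so a repeated value still counts once.

```python
from collections import defaultdict


def distinct_counts(arr, k):
    """Return the number of distinct values in each window of size k."""
    freq = defaultdict(int)  # value -> occurrences inside the current window
    distinct = 0             # number of values with a non-zero frequency
    out = []
    for i, x in enumerate(arr):
        # The new element enters the window on the right.
        freq[x] += 1
        if freq[x] == 1:
            distinct += 1
        # Once the window is longer than k, the oldest element leaves.
        if i >= k:
            old = arr[i - k]
            freq[old] -= 1
            if freq[old] == 0:
                distinct -= 1
        # A full window ends at every index from k - 1 onwards.
        if i >= k - 1:
            out.append(distinct)
    return out


print(distinct_counts([1, 2, 1, 3, 4, 2, 3], 4))
```

Summing the returned list gives the total over all windows. Each element enters and leaves the window once, so the work is linear in the array length.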

Actually, I am the asker of the question. I am not answering it; I just wanted to comment on the answers, but I can't since I have too little reputation.
I think that for {1, 2, 1, 3} and k = 4, the given algorithms produce count = 3, but according to the question the count should be 2 (since 1 is repeated).

You can use a hash table H to keep track of the window as you iterate over the array. You also keep an additional field for each entry in the hash table that tracks how many times that element occurs in your window.
You start by adding the first k elements of arr to H. Then you iterate through the rest of arr, decreasing the counter field of the element that just left the window and increasing the counter field of the element that enters the window.
At any point (including the initial insertion into H), if a counter field becomes 1, you increase the number of distinct elements in your window. This happens when the last-but-one occurrence of an element leaves the window or when a first occurrence enters it. If a counter field changes from 1 to any other value (to 0 when the only occurrence leaves, or to 2 when a second occurrence enters), you decrease the number of distinct elements in the window.
This solution is linear in the number of elements in arr. Hashing integers can be done like this, but depending on the language you use to implement your solution you might not really need to hash them yourself. If the range in which the elements of arr reside is small enough, you can use a simple array instead of the hash table, as the other contributors suggested.
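A minimal Python sketch of this variant, where only values that occur exactly once in the window are counted (the asker's definition), might look like the following. The names `exactly_once_counts` and `change` are made up for illustration.

```python
from collections import defaultdict


def exactly_once_counts(arr, k):
    """For each window of size k, count the values that occur exactly once."""
    freq = defaultdict(int)  # value -> occurrences inside the current window
    once = 0                 # number of values whose frequency is exactly 1
    out = []

    def change(x, delta):
        nonlocal once
        if freq[x] == 1:     # x is about to stop being a "once" value
            once -= 1
        freq[x] += delta
        if freq[x] == 1:     # x has just become a "once" value
            once += 1

    for i, x in enumerate(arr):
        change(x, +1)                # element entering the window
        if i >= k:
            change(arr[i - k], -1)   # element leaving the window
        if i >= k - 1:
            out.append(once)
    return out


print(exactly_once_counts([1, 2, 1, 3, 4, 2, 3], 4))
```

For the example in the question this reproduces the per-window counts 2, 4, 4, 2.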

This is how I solved the problem:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Returns the number of distinct values in each window of size B.
private static int[] getSolve(int[] A, int B) {
    // value -> number of occurrences inside the current window
    Map<Integer, Integer> map = new HashMap<>();
    for (int i = 0; i < B; i++) {
        map.put(A[i], map.getOrDefault(A[i], 0) + 1);
    }
    List<Integer> res = new ArrayList<>();
    res.add(map.size());
    for (int i = B; i < A.length; i++) {
        int leaving = A[i - B];
        if (map.get(leaving) == 1) {
            map.remove(leaving); // its last occurrence left the window
        } else {
            map.put(leaving, map.get(leaving) - 1);
        }
        map.put(A[i], map.getOrDefault(A[i], 0) + 1);
        res.add(map.size());
    }
    return res.stream().mapToInt(i -> i).toArray();
}

Related

Subset sum problem with known subset size and array being a range

I'm trying to find a fast way to solve the subset sum problem with a few modifications: I know the exact size of the subset I need to reach the target number, and I also know the input array is the range from 1 to 2000. My question is whether there is any way to improve on the basic subset sum solution to make it faster under these conditions, as the normal solutions are too slow. Basically, the only part that changes is the wanted target sum.
Preferably, it would return all the possible subsets of the given size that add up to the target value, if that is possible without slowing the program down too much. Example code in Python or a similar language would be appreciated.
I've tried many of the solutions for the basic subset sum problem, but they are too slow to execute due to the size of the input array.
Knowing the size of the subset is incredibly powerful information, because you don't have to iterate over subset sizes.
Given N, your subset size, you could just:
Sum up the first N elements of your input array (the first subarray of size N)
Iterate by subtracting the first element of your subarray and adding the element next to it, which amounts to looking at the next subarray
Return the subarray if the sum equals your target number
This is O(input array size) in time and O(1) in memory, regardless of the initial array content. Note that it only examines contiguous subarrays, so it misses subsets whose elements are not adjacent. There is probably a more optimal solution using the range property of your initial array.
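Here is an example in C++:
#include <cstddef>
#include <iostream>
#include <vector>

// Prints the starting index of every contiguous run of subArraySize
// elements whose sum equals targetNumber.
void subsetSum(const std::vector<int>& array, std::size_t subArraySize, int targetNumber)
{
    if (subArraySize == 0 || subArraySize > array.size())
        return;
    int sum = 0;
    for (std::size_t i = 0; i < subArraySize; ++i) // Initial sum
    {
        sum += array[i];
    }
    if (sum == targetNumber)
        std::cout << 0 << '\n'; // the first window starts at index 0
    for (std::size_t i = subArraySize; i < array.size(); ++i)
    {
        sum -= array[i - subArraySize]; // element leaving the window
        sum += array[i];                // element entering the window
        if (sum == targetNumber)
            std::cout << i - subArraySize + 1 << '\n'; // starting position of the subarray
    }
}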
First find the contiguous subarray that solves this, or as close to contiguous as we can get. The center of this is going to be target/width if width is odd, or (target-1)/width, (target+1)/width if width is even.
Having found the center, add the same number of neighbors on both sides until you get to the desired width. The rightmost element of the array will need to be shifted further right in cases where there is no contiguous solution.
Ruby code:
def f(target, width)
  arr = []
  # put in center of array
  if width % 2 == 0
    arr.append target / width
    arr.append target / width + 1
  else
    arr.append target / width
  end
  # repeatedly prepend next smallest integer
  # and append next largest integer
  while arr.length < width
    arr.unshift(arr[0] - 1)
    arr.append(arr[-1] + 1)
  end
  # increase the last element of the array to match
  # the target sum. This is only necessary if there is no
  # contiguous solution. Because of integer division,
  # where we need to adjust it will always be to increase
  # the sum of the array.
  arr[-1] += target - arr.sum
  return arr
end
Example run:
> f(12342, 7)
=> [1760, 1761, 1762, 1763, 1764, 1765, 1767]
Note that this code doesn't do any of the work of confirming that a solution exists in the range (1, 2000), but your code should.
So far so fast, but finding all subsets that solve this will be slow because there are many. You can find them by pushing elements to the left and right in pairs.
The final answer will be the sum over i of (number of ways of pushing elements to the left by a cumulative i spaces) times (number of ways of pushing elements to the right by a cumulative i spaces).
To give a simple example: for a target of 13, width of 3, we start with [3,4,6].
pushes: arrays
0: [3, 4, 6]
1: [2, 4, 7], [2, 5, 6]
2: [1, 4, 8], [1, 5, 7], [2, 3, 8]
3: [1, 3, 9]
4: [1, 2, 10]
... and we're done. There will be a massive number of these, peaking (I think) when the width of the array is half the width of the range, and the initial array is centered in the range.
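If you do want every subset listed, one way to sketch it in Python is a depth-first search that prunes with the smallest and largest sums still reachable. This is a different technique from the pushing scheme above, offered only as an illustration; the name `subsets_with_sum` and its parameters are made up, and the number of results can still be enormous for mid-range targets.

```python
def subsets_with_sum(lo, hi, size, target):
    """Yield each increasing tuple of `size` integers in [lo, hi] summing to target."""

    def rec(start, remaining, need, prefix):
        if remaining == 0:
            if need == 0:
                yield tuple(prefix)
            return
        if remaining == 1:
            # The last element is forced: it must be exactly `need`.
            if start <= need <= hi:
                yield tuple(prefix) + (need,)
            return
        # v must leave room for remaining - 1 larger values below or at hi.
        for v in range(start, hi - remaining + 2):
            # Smallest sum with v next: v, v+1, ..., v+remaining-1.
            min_sum = remaining * v + remaining * (remaining - 1) // 2
            if min_sum > need:
                break  # larger v only makes the minimum bigger
            # Largest sum with v next: v plus the remaining-1 largest values.
            max_sum = v + (remaining - 1) * hi - (remaining - 1) * (remaining - 2) // 2
            if max_sum < need:
                continue  # v is too small to ever reach the target
            prefix.append(v)
            yield from rec(v + 1, remaining - 1, need - v, prefix)
            prefix.pop()

    yield from rec(lo, size, target, [])


for subset in subsets_with_sum(1, 2000, 3, 13):
    print(subset)
```

For a target of 13 and a width of 3 this lists the same eight arrays as the table above, in lexicographic order.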

Number of subarrays with same 'degree' as the array

So this problem was asked in a quiz and the problem goes like:
You are given an array 'a' with elements ranging from 1 to 10^6, and the size of the array can be at most 10^5. We are asked to find the number of subarrays with the same 'degree' as the original array. The degree of an array is defined as the frequency of its most frequently occurring element. Multiple elements could have the same frequency.
I was stuck in this problem for like an hour but couldn't think of any solution. How do I solve it?
Sample Input:
first-input
1,2,2,3,1
first-output 2
second-input
1,1,2,1,2,2
second-output 4
The element that occurs most frequently is called the mode; this problem defines degree as the frequency count. Your tasks are:
Identify all of the mode values.
For each mode value, find the index range of that value. For instance, in the array
[1, 1, 2, 1, 3, 3, 2, 4, 2, 4, 5, 5, 5]
You have three modes (1 2 5) with a degree of 3. The index ranges are
1 - 0:3
2 - 2:8
5 - 10:12
You need to count all index ranges (subarrays) that include at least one of those three ranges.
I've tailored this example to have both basic cases: modes that overlap, and those that do not. Note that containment is a moot point: if you have an array where one mode's range contains another:
[0, 1, 1, 1, 0, 0]
You can ignore the outer one altogether: any subarray that contains 0 will also contain 1.
ANALYSIS
A subarray is defined by two numbers, the starting and ending indices. Since we must have 0 <= start <= end < len(array), this is the "handshake" problem between array bounds. We have N(N+1)/2 possible subarrays.
For 10**5 elements, you could just brute-force the problem from here: for each pair of indices, check to see whether that range contains any of the mode ranges. However, you can easily cut that down with interval recognition.
ALGORITHM
Step through the mode ranges, left to right. First, count all subranges that include the first mode range [0:3]. There is only 1 possible start [0] and 10 possible ends [3:12]; that's 10 subarrays.
Now move to the second mode range, [2:8]. You need to count subarrays that include this, but exclude those you've already counted. Since there's an overlap, you need a starting point later than 0, or an ending point before 3. This second clause is not possible with the given range.
Thus, you consider start [1:2], end [8:12]. That's 2 * 5 more subarrays.
For the third range [10:12] (no overlap), you need a starting point that does not include any other mode range. This means that any starting point [3:10] will do. Since there's only one possible endpoint, you have 8*1, or 8 more subarrays.
Can you turn this into something formal?
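One possible way to formalize the counting, sketched in Python: for each end index, the valid starts are exactly those at or before the largest left endpoint among the mode ranges that have already closed. This is an assumption-laden illustration rather than the answerer's own formulation, and the name `count_same_degree_subarrays` is made up.

```python
from collections import Counter


def count_same_degree_subarrays(a):
    """Count subarrays whose degree equals the degree of the whole array."""
    if not a:
        return 0
    degree = max(Counter(a).values())
    first = {}        # value -> index of its first occurrence
    seen = Counter()  # occurrences of each value up to the current end
    best_left = -1    # largest left endpoint of a mode range closed so far
    total = 0
    for end, x in enumerate(a):
        first.setdefault(x, end)
        seen[x] += 1
        if seen[x] == degree:
            # x is a mode and this is its last occurrence: its range closes here.
            best_left = max(best_left, first[x])
        # Any start in 0..best_left makes a[start..end] contain a full mode range.
        total += best_left + 1
    return total


print(count_same_degree_subarrays([1, 1, 2, 1, 3, 3, 2, 4, 2, 4, 5, 5, 5]))
```

On the thirteen-element example above this gives 10 + 10 + 8 = 28, in a single pass after the frequency count.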
Taking reference from LeetCode:
https://leetcode.com/problems/degree-of-an-array/solution/
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class Solution {
    // Returns the length of the shortest subarray with the same degree as nums.
    public int findShortestSubArray(int[] nums) {
        // first index, last index, and occurrence count of each value
        Map<Integer, Integer> left = new HashMap<>(),
            right = new HashMap<>(), count = new HashMap<>();
        for (int i = 0; i < nums.length; i++) {
            int x = nums[i];
            if (left.get(x) == null) left.put(x, i);
            right.put(x, i);
            count.put(x, count.getOrDefault(x, 0) + 1);
        }
        int ans = nums.length;
        int degree = Collections.max(count.values());
        for (int x : count.keySet()) {
            if (count.get(x) == degree) {
                ans = Math.min(ans, right.get(x) - left.get(x) + 1);
            }
        }
        return ans;
    }
}

How do I generate random numbers from an array without repetition?

I know similar questions have been asked before, but bear with me.
I have an array:
int [] arr = {1,2,3,4,5,6,7,8,9};
I want numbers to be generated randomly 10 times. Something like this:
4,6,8,2,4,9,3,8,7
Although some numbers are repeated, no number is generated more than once in a row. So not like this:
7,3,1,8,8,2,4,9,5,6
As you can see, the number 8 was repeated immediately after it was generated. This is not the desired effect.
So basically, I'm ok with a number being repeated as long as it doesn't appear more than once in a row.
1. Generate a random number.
2. Compare it to the last number you generated.
3. If it is the same, discard it.
4. If it is different, add it to the array.
5. Return to step 1 until you have enough numbers.
generate a random index into the array.
repeat until it's different from the last index used.
pull the value corresponding to that index out of the array.
repeat from beginning until you have as many numbers as you need.
While the answers posted are not bad and would work well, someone might not be pleased with them, since it is possible (though incredibly unlikely) for the loop to hang if you keep generating the same number over and over.
An algorithm that deals with this "problem" while preserving the distribution of numbers would be:
Pick a random number from the original array, let's call it n, and output it.
Make an array of all elements but n.
Generate a random index into the shorter array. Swap the element at that index with n. Output n.
Repeat the last step until enough numbers have been output.
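The swap-based steps above might be sketched in Python like this. It is a minimal illustration under a few assumptions: the source values are unique, there are at least two of them, and the "shorter array" is represented by keeping the previous pick in the last slot of a single list instead of making a copy. The name `no_immediate_repeats` is made up.

```python
import random


def no_immediate_repeats(values, count):
    """Return `count` random picks from `values` with no value twice in a row."""
    if count <= 0:
        return []
    pool = list(values)
    if len(pool) < 2:
        raise ValueError("need at least two values to avoid repeats")
    # First pick: any slot. Park the pick in the last slot.
    idx = random.randrange(len(pool))
    pool[idx], pool[-1] = pool[-1], pool[idx]
    out = [pool[-1]]
    while len(out) < count:
        # Later picks: any slot except the last, which holds the previous pick.
        idx = random.randrange(len(pool) - 1)
        pool[idx], pool[-1] = pool[-1], pool[idx]
        out.append(pool[-1])
    return out


print(no_immediate_repeats([1, 2, 3, 4, 5, 6, 7, 8, 9], 10))
```

Each pick costs exactly one random number and one swap, so the loop cannot stall on repeated draws.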
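int[] arr = {1, 2, 3, 4, 5, 6, 7, 8, 9};
int[] result = new int[10];
int previousChoice = -1; // sentinel: assumes -1 never appears in arr
int i = 0;
while (i < 10) {
    int randomIndex = (int) (Math.random() * arr.length);
    if (arr[randomIndex] != previousChoice) {
        result[i] = arr[randomIndex];
        previousChoice = arr[randomIndex]; // remember the pick so it is not repeated immediately
        i++;
    }
}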
The solutions given so far all involve non-constant work per generation; if you repeatedly generate indices and test for repetition, you could conceivably generate the same index many times before finally getting a new index. (An exception is Kiraa's answer, but that one involves high constant overhead to make copies of partial arrays)
The best solution here (assuming you want unique indices, not unique values, and/or that the source array has unique values) is to cycle the indices so you always generate a new index in (low) constant time.
Basically, you'd have a loop like this (using Python mostly for brevity):
# randrange(x, y) generates an int in range x to y-1 inclusive
from random import randrange

arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
result = []
selectidx = 0
randstart = 0
for _ in range(10):  # Runs loop body 10 times
    # Generate offset from last selected index (randstart is initially 0,
    # allowing any index to be selected; on subsequent loops, it's 1, preventing
    # repeated selection of the last index)
    offset = randrange(randstart, len(arr))
    randstart = 1
    # Add offset to last selected index and wrap so we cycle around the array
    selectidx = (selectidx + offset) % len(arr)
    # Append element at newly selected index
    result.append(arr[selectidx])
This way, each generation step is guaranteed to require no more than one new random number, with the only constant additional work being a single addition and remainder operation.

How to determine to which extent/level an array of integers is already sorted

Consider an array of any given unique integers e.g. [1,3,2,4,6,5] how would one determine
the level of "sortedness", ranging from 0.0 to 1.0 ?
One way would be to evaluate the number of items that would have to be moved to make it sorted and then divide that by the total number of items.
As a first approach, I would detect the former as just the number of times a transition occurs from higher to lower value. In your list, that would be:
3 -> 2
6 -> 5
for a total of two movements. Dividing that by six elements gives you 33%.
In a way, this makes sense since you can simply move the 2 to between 1 and 3, and the 5 to between 4 and 6.
Now there may be edge cases where it's more efficient to move things differently but then you're likely going to have to write really complicated search algorithms to find the best solution.
Personally, I'd start with the simplest option that gave you what you wanted and only bother expanding if it turns out to be inadequate.
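As a rough Python sketch of this first approach (the name `descent_ratio` is made up, and note that the value it returns grows with disorder, so 0.0 means fully sorted):

```python
def descent_ratio(items):
    """Fraction of elements that are followed by a smaller element."""
    if not items:
        return 0.0
    # A descent is a position where the next value is lower than the current one.
    descents = sum(1 for a, b in zip(items, items[1:]) if a > b)
    return descents / len(items)


print(descent_ratio([1, 3, 2, 4, 6, 5]))
```

For the list in the question there are two descents (3 -> 2 and 6 -> 5) over six elements, i.e. about 0.33.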
I would say the number of swaps is not a very good way to determine this. Most importantly because you can sort the array using a different number of swaps. In your case, you could switch 2<-->3 and 6<-->5, but you could also do a lot more switches.
How would you sort, say:
1 4 3 2 5
Would you directly switch 2 and 4, or would you switch 3 and 4, then 4 and 2, and then 3 and 2.
I would say a more correct method would be the number of elements in the right place divided by the total number of elements.
In your case, that would be 2/6.
Ok this is just an idea, but what if you can actually sort the array, i.e.
1,2,3,4,5,6
then get it as a string
123456
now get your original array in string
132465
and compare the Levenshtein distance between the two
I'll propose a different approach: let's count the number of non-descending sequences k in the array, then take its reciprocal: 1/k. For a perfectly sorted array there's only one such sequence, so 1/k = 1/1 = 1. This sortedness level is lowest when the array is sorted in descending order.
0 level is approached only asymptotically when the size of the array approaches infinity.
This simple approach can be computed in O(n) time.
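A minimal Python sketch of the 1/k measure, assuming a "non-descending sequence" means a maximal run in which no element is smaller than its predecessor (the name `run_sortedness` is made up):

```python
def run_sortedness(items):
    """Return 1/k, where k is the number of maximal non-descending runs."""
    if len(items) < 2:
        return 1.0  # an empty or single-element list is trivially sorted
    # Every descent ends one run and starts the next.
    runs = 1 + sum(1 for a, b in zip(items, items[1:]) if a > b)
    return 1.0 / runs


print(run_sortedness([1, 3, 2, 4, 6, 5]))
```

The list from the question splits into the runs [1, 3], [2, 4, 6] and [5], so k = 3 and the score is 1/3; a descending list of length n scores 1/n.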
In practice, one would measure unsortedness by the amount of work needed to get the array sorted. That depends on what you consider "work". If only swaps are allowed, you could count the number of swaps needed. That has a nice upper bound of (n-1). For a mergesort kind of view you are mostly interested in the number of runs, since you'll need about log(nrun) merge steps. Statistically, you would probably take "sum(abs(rank - intended_rank))" as a measure, similar to a K-S test. But at a glance, sequences like "HABCDEFG" (7 swaps, 2 runs, submean distance) and "HGFEDCBA" (4 swaps, 8 runs, maximal distance) are always showstoppers.
You could sum up the distances of the items to their sorted positions and divide by the maximum such sum.
import java.util.Arrays;
import java.util.Comparator;

public static <T extends Comparable<T>> double sortedMeasure(final T[] items) {
    int n = items.length;
    // Find the sorted positions: sorted[i] is the original index of the
    // item that belongs at position i
    Integer[] sorted = new Integer[n];
    for (int i = 0; i < n; i++) {
        sorted[i] = i;
    }
    Arrays.sort(sorted, new Comparator<Integer>() {
        public int compare(Integer i1, Integer i2) {
            T o1 = items[i1];
            T o2 = items[i2];
            return o1.compareTo(o2);
        }
        public boolean equals(Object other) {
            return this == other;
        }
    });
    // Sum up the distances
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += Math.abs(sorted[i] - i);
    }
    // Calculate the maximum
    int maximum = n*n/2;
    // Return the ratio (0 means sorted, 1 means reversed)
    return (double) sum / maximum;
}
Example:
sortedMeasure(new Integer[] {1, 2, 3, 4, 5}) // -> 0.000
sortedMeasure(new Integer[] {1, 5, 2, 4, 3}) // -> 0.500
sortedMeasure(new Integer[] {5, 1, 4, 2, 3}) // -> 0.833
sortedMeasure(new Integer[] {5, 4, 3, 2, 1}) // -> 1.000
One relevant measurement of sortedness would be the "number of swaps needed to sort the array". In your case that would be 2: swapping 3,2 and 6,5. What remains is how to map this to [0,1]. You could calculate the maximum number of swaps needed for the length of the array, a kind of "maximum unsortedness", which should yield a sortedness value of 0. Then take the number of swaps for the actual array, subtract it from the max and divide by the max.

Find shortest subarray containing all elements

Suppose you have an array of numbers, and another set of numbers. You have to find the shortest subarray containing all numbers with minimal complexity.
The array can have duplicates, and let's assume the set of numbers does not. It's not ordered - the subarray may contain the set of number in any order.
For example:
Array: 1 2 5 8 7 6 2 6 5 3 8 5
Numbers: 5 7
Then the shortest subarray is obviously Array[2:5] (python notation).
Also, what would you do if you want to avoid sorting the array for some reason (a la online algorithms)?
Proof of a linear-time solution
I will write right-extension to mean increasing the right endpoint of a range by 1, and left-contraction to mean increasing the left endpoint of a range by 1. This answer is a slight variation of Aasmund Eldhuset's answer. The difference here is that once we find the smallest j such that [0, j] contains all interesting numbers, we thereafter consider only ranges that contain all interesting numbers. (It's possible to interpret Aasmund's answer this way, but it's also possible to interpret it as allowing a single interesting number to be lost due to a left-contraction -- an algorithm whose correctness has yet to be established.)
The basic idea is that for each position j, we will find the shortest satisfying range ending at position j, given that we know the shortest satisfying range ending at position j-1.
EDIT: Fixed a glitch in the base case.
Base case: Find the smallest j' such that [0, j'] contains all interesting numbers. By construction, there can be no ranges [0, k < j'] that contain all interesting numbers so we don't need to worry about them further. Now find the largest i such that [i, j'] contains all interesting numbers (i.e. hold j' fixed). This is the smallest satisfying range ending at position j'.
To find the smallest satisfying range ending at any arbitrary position j, we can right-extend the smallest satisfying range ending at position j-1 by 1 position. This range will necessarily also contain all interesting numbers, though it may not be minimal-length. The fact that we already know this is a satisfying range means that we don't have to worry about extending the range "backwards" to the left, since that can only increase the range over its minimal length (i.e. make the solution worse). The only operations we need to consider are left-contractions that preserve the property of containing all interesting numbers. So the left endpoint of the range should be advanced as far as possible while this property holds. When no more left-contractions can be performed, we have the minimal-length satisfying range ending at j (since further left-contractions clearly cannot make the range satisfying again) and we are done.
Since we perform this for each rightmost position j, we can take the minimum-length range over all rightmost positions to find the overall minimum. This can be done using a nested loop in which j advances on each outer loop cycle. Clearly j advances by 1 n times. Since at any point in time we only ever need the leftmost position of the best range for the previous value of j, we can store this in i and just update it as we go. i starts at 0, is at all times <= j <= n, and only ever advances upwards by 1, meaning it can advance at most n times. Both i and j advance at most n times, meaning that the algorithm is linear-time.
In the following pseudo-code, I've combined both phases into a single loop. We only try to contract the left side if we have reached the stage of having all interesting numbers:
# x[0..m-1] is the array of interesting numbers.
# Load them into a hash/dictionary:
For i from 0 to m-1:
    isInteresting[x[i]] = 1
i = 0
nDistinctInteresting = 0
minRange = infinity
For j from 0 to n-1:
    If count[a[j]] == 0 and isInteresting[a[j]]:
        nDistinctInteresting++
    count[a[j]]++
    If nDistinctInteresting == m:
        # We are in phase 2: contract the left side as far as possible
        While count[a[i]] > 1 or not isInteresting[a[i]]:
            count[a[i]]--
            i++
        If j - i < minRange:
            minRange = j - i
            (minI, minJ) = (i, j)
count[] and isInteresting[] are hashes/dictionaries (or plain arrays if the numbers involved are small).
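For reference, the pseudo-code above might translate to Python along these lines. This is a sketch under the stated assumption that the interesting numbers contain no duplicates; the name `shortest_covering_window` is made up, and unlike the pseudo-code it only keeps counts for interesting numbers.

```python
def shortest_covering_window(a, interesting):
    """Return inclusive (i, j) of a shortest window of `a` that contains
    every interesting number, or None if no such window exists."""
    need = set(interesting)
    if not need:
        return None
    count = {}   # interesting value -> occurrences inside the window a[i..j]
    have = 0     # number of distinct interesting values currently in the window
    i = 0
    best = None
    for j, x in enumerate(a):
        if x in need:
            count[x] = count.get(x, 0) + 1
            if count[x] == 1:
                have += 1
        if have == len(need):
            # Phase 2: drop elements on the left while the window stays valid.
            while a[i] not in need or count[a[i]] > 1:
                if a[i] in need:
                    count[a[i]] -= 1
                i += 1
            if best is None or j - i < best[1] - best[0]:
                best = (i, j)
    return best


print(shortest_covering_window([1, 2, 5, 8, 7, 6, 2, 6, 5, 3, 8, 5], [5, 7]))
```

For the example in the question this returns the inclusive indices (2, 4), which is Array[2:5] in Python slice notation. Both i and j only move forward, so the loop is linear in the array length.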
This sounds like a problem that is well-suited for a sliding window approach: maintain a window (a subarray) that is gradually expanding and contracting, and use a hashmap to keep track of the number of times each "interesting" number occurs in the window. E.g. start with an empty window, then expand it to include only element 0, then elements 0-1, then 0-2, 0-3, and so on, by adding subsequent elements (and using the hashmap to keep track of which numbers exist in the window). When the hashmap tells you that all interesting numbers exist in the window, you can begin contracting it: e.g. 0-5, 1-5, 2-5, etc., until you find out that the window no longer contains all interesting numbers. Then, you can begin expanding it on the right hand side again, and so on. I'm quite (but not entirely) sure that this would work for your problem, and it can be implemented to run in linear time.
Say the array has n elements, and the set has m elements.
Sort the array, noting the reverse index (position in the original array)
    // O(n log n) time
for each element in the given set
    find it in the array
    // O(m log n) time: log n for binary search, m times
    keep track of the minimum and maximum index for each found element
min - max defines your range
Total time complexity: O((m+n) log n)
This solution definitely does not run in O(n) time as suggested by some of the pseudocode above; however, it is real (Python) code that solves the problem and by my estimates runs in O(n^2):
def small_sub(A, B):
    len_A = len(A)
    len_B = len(B)
    sub_A = []
    sub_size = -1
    dict_b = {}
    for elem in B:
        if elem in dict_b:
            dict_b[elem] += 1
        else:
            dict_b.update({elem: 1})
    for i in range(0, len_A - len_B + 1):
        if A[i] in dict_b:
            temp_size, temp_sub = find_sub(A[i:], dict_b.copy())
            if (sub_size == -1 or (temp_size != -1 and temp_size < sub_size)):
                sub_A = temp_sub
                sub_size = temp_size
    return sub_size, sub_A

def find_sub(A, dict_b):
    index = 0
    for i in A:
        if len(dict_b) == 0:
            break
        if i in dict_b:
            dict_b[i] -= 1
            if dict_b[i] <= 0:
                del(dict_b[i])
        index += 1
    if len(dict_b) > 0:
        return -1, {}
    else:
        return index, A[0:index]
Here's how I solved this problem in linear time using collections.Counter objects:
from collections import Counter

def smallest_subsequence(stream, search):
    if not search:
        return []  # the shortest subsequence containing nothing is nothing
    search_counts = Counter(search)
    minimal_subsequence = None
    start = 0
    end = 0
    subsequence_counts = Counter()
    while True:
        # while subsequence_counts doesn't have enough elements to cancel out every
        # element in search_counts, take the next element from stream
        while search_counts - subsequence_counts:
            if end == len(stream):  # if we've reached the end of the list, we're done
                return minimal_subsequence
            subsequence_counts[stream[end]] += 1
            end += 1
        # while subsequence_counts has enough elements to cover search_counts, keep
        # removing from the start of the sequence
        while not search_counts - subsequence_counts:
            if minimal_subsequence is None or (end - start) < len(minimal_subsequence):
                minimal_subsequence = stream[start:end]
            subsequence_counts[stream[start]] -= 1
            start += 1

print(smallest_subsequence([1, 2, 5, 8, 7, 6, 2, 6, 5, 3, 8, 5], [5, 7]))
# [5, 8, 7]
Java solution
import java.util.*;

// Subarray was not shown in the original answer; this minimal holder for
// inclusive [start, end] indices is an assumption.
class Subarray {
    final int start, end;
    Subarray(int start, int end) { this.start = start; this.end = end; }
}

// The method name is hypothetical. Example input:
//   List<String> paragraph = Arrays.asList("a", "c", "d", "m", "b", "a");
//   Set<String> keywords = new HashSet<>(Arrays.asList("a", "b"));
static Subarray shortestSubarray(List<String> paragraph, Set<String> keywords) {
    Subarray result = new Subarray(-1, -1);
    Map<String, Integer> keyWordFreq = new HashMap<>();
    int numKeywords = keywords.size();
    // slide the window until it contains all the keywords,
    // starting with [0,0]
    for (int left = 0, right = 0; right < paragraph.size(); right++) {
        // expand right to contain all the keywords
        String currRight = paragraph.get(right);
        if (keywords.contains(currRight)) {
            keyWordFreq.merge(currRight, 1, Integer::sum);
        }
        // loop is entered when all the keywords are present in the current window;
        // contract left for as long as all the keywords are still present
        while (keyWordFreq.size() == numKeywords) {
            String currLeft = paragraph.get(left);
            if (keywords.contains(currLeft)) {
                // remove from the map if it's the last occurrence so that the loop exits
                if (keyWordFreq.get(currLeft).equals(1)) {
                    // now check if the current subarray is the smallest
                    if ((result.start == -1 && result.end == -1) || (right - left) < (result.end - result.start)) {
                        result = new Subarray(left, right);
                    }
                    keyWordFreq.remove(currLeft);
                } else {
                    // else reduce the frequency
                    keyWordFreq.put(currLeft, keyWordFreq.get(currLeft) - 1);
                }
            }
            left++;
        }
    }
    return result;
}

Resources