Counting appropriate number of subarrays in an array excluding some specific pairs? - arrays

Let's say, I have an array like this:
1 2 3 4 5
And the given pair is (2,3). Then the subarrays that don't have (2,3) present in them are:
1. 1
2. 2
3. 3
4. 4
5. 5
6. 1 2
7. 3 4
8. 4 5
9. 3 4 5
So, the answer will be 9.
Obviously, there can be more of such pairs.
Now, one method that I thought of is O(n^2): it involves enumerating all candidate subarrays, of every length up to n. Can I do better? Thanks!

Let's see, this ad hoc pseudocode should be O(n):
array = [1 2 3 4 5]
pair = [2 3]
length = array.length
n = 0
start = 0
while (start < length)
{
    # Find the next pair (pair_pos stops at length-1 if there is none)
    pair_pos = start
    while (pair_pos < length-1) and (array[pair_pos,pair_pos+1] != pair) # (**1)
    {
        pair_pos++
    }
    # Count subarrays of the segment start..pair_pos (inclusive)
    n += calc_number_of_subarrays(pair_pos-start+1) # (**2)
    # Continue at the second element of the pair
    start = pair_pos+1
}
print n
Note **1: This seems to involve a loop inside the outer loop. Since every element of the array is visited exactly once, both loops together are O(n). In fact, it is probably easy to refactor this to use only one while loop.
Note **2: An array of length l has l+(l-1)+(l-2)+...+1 = l(l+1)/2 subarrays (including the array itself). That closed form can be calculated in O(1), with no loop involved. It is the sum formula usually attributed to Gauss. :)

You don't need to enumerate the subarrays of an array to know how many of them there are. Finding where the pair is in the array takes at most 2(n-1) array operations. After that you only need a simple calculation with the two lengths you extract. The number of subarrays of an array of length n is n(n+1)/2; for length 3, for example, that is 3 + 2 + 1 = 6.
The solution uses the fact that, in a given array [a, ..., p1, p2, ..., b], the number of subarrays without the pair is the number of subarrays of [a, ..., p1] plus the number of subarrays of [p2, ..., b]. If several such pairs exist, we repeat the same trick on [p2, ..., b] as if it were the whole array.
function amount_of_subarrays ::
    index := 1
    amount := 0
    lastmatch := 0
    while length( array ) > index do
        if array[index] == pair[1] then
            if array[index+1] == pair[2] then
                length2 := index - lastmatch
                amount := amount + ((length2 * (length2 + 1)) / 2)
                lastmatch := index
            fi
        fi
        index := index + 1
    od
    //index is now equal to the length
    length2 := index - lastmatch
    amount := amount + ((length2 * (length2 + 1)) / 2)
    return amount
For an array [1, 2, 3, 4, 5] with pair [2, 3], index will be 2 when the two if-statements are true. amount will be updated to 3 and lastmatch will be updated to 2. No more matches will be found, so lastmatch is 2 and index is 5. amount will be 3 + 6 = 9.
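For anyone who wants something runnable, the pseudocode above could be sketched in Python roughly like this. It is only a sketch: the name count_subarrays_without_pair is made up for illustration, indices are 0-based instead of 1-based, and it assumes (as both answers here do) that the pair shows up as two adjacent elements in the given order.

```python
def count_subarrays_without_pair(array, pair):
    """Count contiguous subarrays that do not contain the adjacent pair.

    Assumption: `pair` is a 2-element sequence that must appear as
    array[i], array[i + 1] to count as "present".
    """
    total = 0
    seg_start = 0  # start of the current pair-free segment
    for i in range(len(array) - 1):
        if array[i] == pair[0] and array[i + 1] == pair[1]:
            # The segment seg_start..i (inclusive) ends at the pair's first element.
            length = i - seg_start + 1
            total += length * (length + 1) // 2
            # The next segment starts at the pair's second element.
            seg_start = i + 1
    # Remaining tail segment after the last match (or the whole array).
    length = len(array) - seg_start
    total += length * (length + 1) // 2
    return total


print(count_subarrays_without_pair([1, 2, 3, 4, 5], (2, 3)))  # 9
```

The split happens between the two elements of the pair, so each half keeps one of them, which matches the 3 + 6 = 9 breakdown above.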

Related

How to find the sum of every 3 elements in an array?

I have an array A=[a1,a2,a3, ..., aN]. I would like to take the sum of each 3 elements:
s1=a1+a2+a3
s2=a4+a5+a6
...
sM=a(N-2)+a(N-1)+aN
My solution:
k=size(A);
s=0;
for n=1:k
s(n)=s(n-2)+s(n-1)+s(n);
end
Error: Attempted to access s(2); index out of bounds because numel(s)=1.
How to fix it?
If you want to sum in blocks, for the general case when the number of elements of A is not necessarily a multiple of the block size, you can use accumarray:
A = [3 8 5 8 2 3 4 7 9 6 4]; % 11 elements
s = 3; % block size
result = accumarray(ceil((1:numel(A))/s).', A(:));
If you want a sliding sum with a given block size, you can use conv:
A = [3 8 5 8 2 3 4 7 9 6 4]; % 11 elements
s = 3; % block size
result = conv(A(:).', ones(1,s), 'valid');
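To make the difference between the two variants concrete, here is a rough Python sketch (not MATLAB) of the same two computations. The function names block_sums and sliding_sums are hypothetical and only used for illustration. Like the accumarray version, the block variant lets the last block be shorter when the length is not a multiple of the block size.

```python
def block_sums(values, size):
    """Sum consecutive, non-overlapping blocks of `size` elements.

    The last block may be shorter if len(values) is not a multiple of size.
    """
    return [sum(values[i:i + size]) for i in range(0, len(values), size)]


def sliding_sums(values, size):
    """Sum every window of `size` consecutive elements (full windows only)."""
    return [sum(values[i:i + size]) for i in range(len(values) - size + 1)]


A = [3, 8, 5, 8, 2, 3, 4, 7, 9, 6, 4]  # 11 elements, same data as above
print(block_sums(A, 3))    # [16, 13, 20, 10]
print(sliding_sums(A, 3))  # [16, 21, 15, 13, 9, 14, 20, 22, 19]
```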
You try to calculate s by using values from s. Don't you mean s(n)=A(n-2)+A(n-1)+A(n);? Also, size on its own returns more than one dimension.
That being said, using the two previous values n-2 and n-1 doesn't work for n=1 and n=2 (because indices must be positive). You have to explain how the first two values should be handled. I assume either 0 for elements that don't exist yet:
k=size(A,2); %only the second dimension when A 1xn, or length(A)
s=zeros(1,k); %get empty values instead of appending each value for better performance
s(1)=A(1);
s(2)=A(2)+A(1);
for n=3:k %start at 3
s(n)=A(n-2)+A(n-1)+A(n);
end
or s should be 2 values shorter than A:
k=size(A,2);
s=zeros(1,k-2);
for n=1:k-2
s(n)=A(n)+A(n+1)+A(n+2);
end
You initialise s as a scalar with s = 0. Then you try and index it like an array, but it only has a single element.
Your current logic (if fixed) will calculate this:
s(1) = A(1)+A(2)+A(3)
s(2) = A(2)+A(3)+A(4)
...
% 's' will be 2 elements shorter than 'A'
So we need to be a bit wiser with the indexing to get what you describe, which is
s(1) = A(1)+A(2)+A(3)
s(2) = A(4)+A(5)+A(6)
...
% 's' will be a third as big as 'A'
You should pre-allocate s to the right size, like so:
k = numel(A); % Number of elements in 'A'
s = zeros( 1, k/3 ); % Output array, assuming 'k' is divisible by 3
for n = 0:3:k-3
s(n/3+1) = A(n+1) + A(n+2) + A(n+3);
end
You could do this in one line by reshaping the array to have 3 rows, then summing down each column. This assumes that the number of elements in A is divisible by 3, and that A is a row vector...
s = sum( reshape( A, 3, [] ) );

Number of ways of partitioning an array

Given an array of n elements, a k-partitioning of the array would be to split the array in k contiguous subarrays such that the maximums of the subarrays are non-increasing. Namely max(subarray1) >= max(subarray2) >= ... >= max(subarrayK).
In how many ways can an array be partitioned into valid partitions like the ones mentioned before?
Note: k isn't given as input or anything, I merely used it to illustrate the general case. A partition could have any size from 1 to n, we just need to find all the valid ones.
Example, the array [3, 2, 1] can be partitioned in 4 ways, you can see them below:
The valid partitions: [[3, 2, 1]]; [[3], [2, 1]]; [[3, 2], [1]]; [[3], [2], [1]].
I've found a similar problem related to linear partitioning, but I couldn't find a way to adapt the thinking to this problem. I'm pretty sure this is dynamic programming, but I haven't been able to properly identify how to model the problem using a recurrence relation.
How would you solve this?
Call an element of the input a tail-max if it is at least as great as all elements that follow. For example, in the following input:
5 9 3 3 1 2
the following elements are tail-maxes:
5 9 3 3 1 2
  ^ ^ ^   ^
In a valid partition, every subarray must contain the next tail-max at or after the subarray's starting position; otherwise, the next tail-max will be the max of some later subarray, and the condition of non-increasing subarray maximums will be violated.
On the other hand, if every subarray contains the next tail-max at or after the subarray's starting position, then the partition must be valid, as the definition of a tail-max ensures that the maximum of a later subarray cannot be greater.
If we identify the tail-maxes of an array, for example
1 1 9 2 1 6 5 1
. . X . . X X X
where X means tail-max and . means not, then we can't place any subarray boundaries before the first tail-max, because if we do, the first subarray won't contain a tail-max. We can place at most one subarray boundary between a tail-max and the next; if we place more, we get a subarray that doesn't contain a tail-max. The last tail-max must be the last element of the input, so we can't place a subarray boundary after the last tail-max.
If there are m non-tail-max elements between a tail-max and the next, that gives us m+2 options: m+1 places to put an array boundary, or we can choose not to place a boundary between these elements. These factors are multiplicative.
We can make one pass from the end of the input to the start, identifying the lengths of the gaps between tail-maxes and multiplying together the appropriate factors to solve the problem in O(n) time:
def partitions(array):
    tailmax = None
    factor = 1
    result = 1
    for i in reversed(array):
        if tailmax is None:
            tailmax = i
            continue
        factor += 1
        if i >= tailmax:
            # i is a new tail-max.
            # Multiply the result by a factor indicating how many options we
            # have for placing a boundary between i and the old tail-max.
            tailmax = i
            result *= factor
            factor = 1
    return result
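One way to gain confidence in the counting argument above is a brute-force check on small inputs. The sketch below is an illustration only (the name brute_force_partitions is made up here); it tries every set of boundaries and keeps those whose subarray maximums are non-increasing. It is exponential in n, so it is only suitable for cross-checking the O(n) function on short arrays.

```python
from itertools import combinations


def brute_force_partitions(array):
    """Count valid partitions by trying every set of cut positions.

    Assumption: `array` is non-empty. Exponential time, for testing only.
    """
    n = len(array)
    if n == 0:
        raise ValueError("array must be non-empty")
    count = 0
    for k in range(n):  # k = number of boundaries placed
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            # Maximum of each contiguous piece array[a:b].
            maxes = [max(array[a:b]) for a, b in zip(bounds, bounds[1:])]
            if all(x >= y for x, y in zip(maxes, maxes[1:])):
                count += 1
    return count


print(brute_force_partitions([3, 2, 1]))           # 4
print(brute_force_partitions([5, 9, 3, 3, 1, 2]))  # 12
```

For [5, 9, 3, 3, 1, 2] the tail-max gaps give factors 2, 2 and 3, so the result 12 agrees with the multiplicative argument.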
Update: Sorry, I misunderstood the problem. In this case, split the array into sub-arrays where every tail is the max element of its sub-array; then it will work in narrow cases. E.g. [2 4 5 9 6 8 3 1] would first be split to [[2 4 5 9] 6 8 9 3 1]. Then we can freely choose within range 0 - 5 to decide whether the following elements are included. You can use an array to record the results of the DP. Our goal is res[0]. In the above example we already have res[0] = res[5] + res[6] + res[7] + res[8] + res[9] + res[10], and res[10] = 1.
def getnum(array):
    res = [-1 for x in range(len(array))]
    res[0] = valueAt(array, res, 0)
    return res[0]

def valueAt(array, res, i):
    m = array[i]
    idx = i
    for index in range(i, len(array), 1):
        if array[index] > m:
            idx = index
            m = array[index]
    value = 1
    for index in range(idx + 1, len(array), 1):
        if res[index] == -1:
            res[index] = valueAt(array, res, index)
        value = value + res[index]
    return value
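This is slower than the answer above (roughly O(n^2) versus O(n)), because the DP fills in a table entry for every start index and scans the rest of the array each time.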
Old Answer: If duplicate elements are not allowed in the array, the following approach would work:
Notice that the number of partitions does not depend on the values of the elements if there are no duplicates. We can denote that number by N(n) if there are n elements in the array.
The largest element must be in the first sub-array; the other elements may or may not be in the first sub-array. Depending on whether they are in the first sub-array, the number of partitions for the remaining elements varies.
So,
N(n) = C(n-1, 1)N(n-1) + C(n-1, 2)N(n-2) + ... + C(n-1, n-1)N(0)
where C(n,k) is the binomial coefficient, C(n,k) = n! / (k! (n-k)!).
Then it can be solved by DP.
Hope this helps

Vectorizing addition of subarray

Let's say I have two (large) vectors a=[0 0 0 0 0] and b=[1 2 3 4 5] of the same size and one index vector ind=[1 5 2 1] with values in {1,...,length(a)}. I would like to compute
for k = 1:length(ind)
a(ind(k)) = a(ind(k)) + b(ind(k));
end
% a = [2 2 0 0 5]
That is, I want to add those entries of b declared in ind to a including multiplicity.
a(ind)=a(ind)+b(ind);
% a = [1 2 0 0 5]
is much faster, of course, but ignores indices which appear multiple times.
How can I speed up the above code?
We can use unique to identify the unique index values and use the third output to determine which elements of ind share the same index. We can then use accumarray to sum all the elements of b which share the same index. We then add these to the original value of a at these locations.
[uniqueinds, ~, inds] = unique(ind);
a(uniqueinds) = a(uniqueinds) + accumarray(inds, b(ind)).';
If max(ind) == numel(a), then this can be simplified to the following, since accumarray will simply return 0 for any index that is missing from ind.
a(:) = a(:) + accumarray(ind(:), b(ind));
Another approach based on accumarray:
a(:) = a(:) + accumarray(ind(:), b(ind(:)), [numel(a) 1]);
How it works
accumarray with two column vectors as inputs aggregates the values of the second input corresponding to the same index in the first. The third input is used here to force the result to be the same size as a, padding with zeros if needed.
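The idea behind both answers (group the repeated indices first, then apply one update per distinct index) can be sketched outside MATLAB as well. Below is a plain-Python illustration, not a translation of accumarray itself. The function name add_with_multiplicity is hypothetical, and the indices are 0-based, so ind=[1 5 2 1] becomes [0, 4, 1, 0].

```python
from collections import Counter


def add_with_multiplicity(a, b, ind):
    """Return a copy of `a` where b[i] is added once per occurrence of i in `ind`."""
    result = list(a)
    # Counter plays the role of unique + accumarray: one count per distinct index.
    for i, times in Counter(ind).items():
        result[i] += times * b[i]
    return result


a = [0, 0, 0, 0, 0]
b = [1, 2, 3, 4, 5]
ind = [0, 4, 1, 0]  # 0-based version of [1 5 2 1]
print(add_with_multiplicity(a, b, ind))  # [2, 2, 0, 0, 5]
```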

Finding indexes of maximum values of an array

How do I find the index of the 2 maximum values of a 1D array in MATLAB? Mine is an array with a list of different scores, and I want to print the 2 highest scores.
You can use sort, as @LuisMendo suggested:
[B,I] = sort(array,'descend');
This gives you the sorted version of your array in the variable B and the indexes of the original position in I sorted from highest to lowest. Thus, B(1:2) gives you the highest two values and I(1:2) gives you their indices in your array.
I'll go for an O(k*n) solution, where k is the number of maximum values you're looking for, rather than O(n log n):
x = [3 2 5 4 7 3 2 6 4];
y = x; %// make a copy of x because we're going to modify it
[~, m(1)] = max(y);
y(m(1)) = -Inf;
[~, m(2)] = max(y);
m =
5 8
This is only practical if k is less than log n. In fact, if k>=3 I would put it in a loop, which may offend the sensibilities of some. ;)
To get the indices of the two largest elements: use the second output of sort to get the sorted indices, and then pick the last two:
x = [3 2 5 4 7 3 2 6 4];
[~, ind] = sort(x);
result = ind(end-1:end);
In this case,
result =
8 5
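For comparison, the same "indices of the k largest" idea can be sketched in Python with the standard library's heapq.nlargest, which avoids a full sort when k is small. The helper name top_k_indices is made up for this example, and the indices it returns are 0-based, so MATLAB's 5 and 8 become 4 and 7.

```python
import heapq


def top_k_indices(values, k):
    """Return the indices of the k largest values, largest first."""
    # Rank the positions 0..len-1 by the value stored at each position.
    return heapq.nlargest(k, range(len(values)), key=values.__getitem__)


x = [3, 2, 5, 4, 7, 3, 2, 6, 4]
idx = top_k_indices(x, 2)
print(idx)                   # [4, 7]
print([x[i] for i in idx])   # [7, 6]
```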

matlab: eliminate elements from array

I have quite a big array. To make things simple, let's simplify it to:
A = [1 1 1 1 2 2 3 3 3 3 4 4 5 5 5 5 5 5 5 5];
So, there is a group of 1's (4 elements), 2's (2 elements), 3's (4 elements), 4's (2 elements) and 5's (8 elements). Now, I want to keep only the columns which belong to a group of 3 or more elements. So it will be like:
B = [1 1 1 1 3 3 3 3 5 5 5 5 5 5 5 5];
I was doing it using a for loop, scanning separately the 1's, 2's, 3's and so on, but it's extremely slow with big arrays...
Thanks for any suggestions on how to do it in a more efficient way :)
Art.
A general approach
If your vector is not necessarily sorted, then you need to count the number of occurrences of each element in the vector. You have histc just for that:
elem = unique(A);
counts = histc(A, elem);
B = A;
B(ismember(A, elem(counts < 3))) = []
The last line picks the elements that have less than 3 occurrences and deletes them.
An approach for a grouped vector
If your vector is "semi-sorted", that is if similar elements in the vector are grouped together (as in your example), you can speed things up a little by doing the following:
start_idx = find(diff([0, A]))
counts = diff([start_idx, numel(A) + 1]);
B = A;
B(ismember(A, A(start_idx(counts < 3)))) = []
Again, note that the vector need not be entirely sorted, just that similar elements are adjacent to each other.
Here is my two-liner
counts = accumarray(A', 1);
B = A(ismember(A, find(counts>=3)));
accumarray is used to count the individual members of A. find extracts the ones that meet your '3 or more elements' criterion. Finally, ismember tells you where they are in A. Note that A need not be sorted. Of course, accumarray only works for positive integer values in A.
What you are describing is called run-length encoding.
There is software for this in Matlab on the FileExchange. Or you can do it directly as follows:
len = diff([ 0 find(A(1:end-1) ~= A(2:end)) length(A) ]);
val = A(logical([ A(1:end-1) ~= A(2:end) 1 ]));
Once you have your run-length encoding you can remove elements based on the length. i.e.
idx = (len>=3)
len = len(idx);
val = val(idx);
And then decode to get the array you want:
i = cumsum(len);
j = zeros(1, i(end));
j(i(1:end-1)+1) = 1;
j(1) = 1;
B = val(cumsum(j));
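The encode / filter / decode steps above can also be illustrated compactly in Python, where itertools.groupby yields the runs of equal adjacent elements directly. This is only a sketch of the same run-length idea, not a MATLAB replacement. The function name drop_short_runs and the min_len parameter are assumptions made for the example.

```python
from itertools import groupby


def drop_short_runs(values, min_len=3):
    """Keep only elements belonging to a run of at least `min_len` equal neighbours."""
    result = []
    for _, run in groupby(values):  # one group per run of equal adjacent values
        run = list(run)
        if len(run) >= min_len:
            result.extend(run)
    return result


A = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5]
print(drop_short_runs(A))
# [1, 1, 1, 1, 3, 3, 3, 3, 5, 5, 5, 5, 5, 5, 5, 5]
```

Because it works on runs rather than on total counts, a value that appears in several short, separated groups is dropped from each of them, which is the same behaviour as the run-length approach.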
Here's another way to do it using matlab built-ins.
% Set up
A=[1 1 1 1 2 2 3 3 3 3 4 4 5 5 5 5 5];
threshold=2;
% Get the unique elements of the array
uniqueElements=unique(A);
% Count how many times each unique element occurs
counts=histc(A,uniqueElements);
% Write which elements should be kept
toKeep=uniqueElements(counts>threshold);
% Make a logical index
indexer=false(size(A));
for i=1:length(toKeep)
% For every unique element we want to keep select the indices in A that
% are equal
indexer=indexer|(toKeep(i)==A);
end
% Apply index
B=A(indexer);
