Permutations with predicate Scala - arrays

I am trying to solve a combinations task in Scala. I have an array with repeated elements, and I have to count the number of combinations which satisfy the condition a+b+c = 0. Numbers should not be repeated; the same numbers appearing in different places do not count as a distinct combination.
So I turned my array into a Set, so the elements would not repeat each other. I have also found out about the combinations method for sequences, but I am not really sure how to use it in this case, and I do not know where to put the condition.
Here is what I have for now:
var arr = Array(-1, -1, -2, -2, 1, -5, 1, 0, 1, 14, -8, 4, 5, -11, 13, 5, 7, -10, -4, 3, -6, 8, 6, 2, -9, -1, -4, 0)
val arrSet = Set(arr)
arrSet.toSeq.combinations(n)
I am new to Scala, so I would be really grateful for any advice!

Here's what you need:
arr.distinct.combinations(3).filter(_.sum == 0).size
where:
distinct removes the duplicates
combinations(n) produces combinations of n elements
filter filters them by keeping only those whose sum is 0
size returns the total number of such combinations
P.S.: arr doesn't need to be a var. You should strive to avoid var in Scala and stick to val whenever possible.
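If it helps to sanity-check the counting logic, the same pipeline can be sketched in Python with itertools (purely illustrative; the Scala one-liner above is the answer):
from itertools import combinations

arr = [-1, -1, -2, -2, 1, -5, 1, 0, 1, 14, -8, 4, 5, -11, 13, 5, 7, -10, -4, 3, -6, 8, 6, 2, -9, -1, -4, 0]
# set(arr) plays the role of distinct; count the 3-element combinations summing to 0
print(sum(1 for c in combinations(set(arr), 3) if sum(c) == 0))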

Related

Numpy: Get indices of boundaries in an array where starts of boundaries always start with a particular number; non-boundaries by a particular number

Problem:
I am looking for the most computationally efficient way to get the indices of boundaries in an array, where the start of each boundary is always marked by a particular number and non-boundary positions are indicated by a different particular number.
Differences between this question and other boundary-based numpy questions on SO.
Here are some other boundary-based numpy questions:
Numpy 1D array - find indices of boundaries of subsequences of the same number
Getting the boundary of numpy array shape with a hole
Extracting boundary of a numpy array
The difference between my question and those posts is that in them the boundaries are indicated by a jump in value, or by a 'hole' of values.
What seems to be unique to my case is that the starts of boundaries always begin with a particular number.
Motivation:
This problem is inspired by IOB tagging in natural language processing. In IOB tagging, B [beginning] is the tag of the first character of an entity, I [inside] is the tag for all other characters inside an entity, and O [outside] is used to tag all non-entity characters.
Example:
import numpy as np
a = np.array(
[
0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1
]
)
1 marks the start of each boundary. If a boundary has a length greater than one, then 2 makes up the rest of the boundary. 0 marks non-boundary positions.
The entities of these boundaries are [1, 2, 2, 2], [1], [1, 2], [1], [1], [1], [1], [1].
So the desired solution, the start and end indices of the boundary values for a, is
desired = [[3, 6], [10, 10], [13, 14], [15, 15], [16, 16], [19, 19], [20, 20], [21, 21]]
Current Solution:
If flattened, the numbers in the desired solution are in ascending order, so the raw index values can be calculated separately, then sorted and reshaped.
I can get the start indices using
starts = np.where(a==1)[0]
starts
array([ 3, 10, 13, 15, 16, 19, 20, 21])
So what's left is 6, 10, 14, 15, 16, 19, 20, 21.
I can get all of them except the last one using 3 different conditionals, where I compare a shifted array to the original, looking at decreases in value and at the values of the shifted array.
first = np.where(a[:-1] - 2 == a[1:])[0]
first
array([6])
second = np.where((a[:-1] - 1 == a[1:]) &
((a[1:]==1) | (a[1:]==0)))[0]
second
array([10, 14, 16])
third = np.where(
(a[:-1] == a[1:]) &
(a[1:]==1)
)[0]
third
array([15, 19, 20])
The last number I need is 21, but since I needed to shorten the length of the array by 1 to do the shifted comparisons, I'm not sure how to get that particular value using logic, so I just used a simple if statement for that.
Using the rest of the retrieved values for the indices, I can concatenate all the values and reshape them.
if (a[-1] == 1) | (a[-1] == 2):
    pen = np.concatenate((
        starts, first, second, third, np.array([a.shape[0] - 1])
    ))
else:
    pen = np.concatenate((
        starts, first, second, third,
    ))
np.sort(pen).reshape(-1, 2)
array([[ 3, 6],
[10, 10],
[13, 14],
[15, 15],
[16, 16],
[19, 19],
[20, 20],
[21, 21]])
Is this the most computationally efficient solution? I realize the four where statements could be combined with or operators, but I wanted to keep each one separate so the reader can see each intermediate result in this post. I am wondering whether there is a more computationally efficient solution, since I have not mastered all of numpy's functions and am unsure of the computational efficiency of each.
A standard trick for this type of problem is to pad the input appropriately. In this case, it is helpful to append a 0 to the end of the array:
In [55]: a1 = np.concatenate((a, [0]))
In [56]: a1
Out[56]:
array([0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1,
0])
Then your starts calculation still works:
In [57]: starts = np.where(a1 == 1)[0]
In [58]: starts
Out[58]: array([ 3, 10, 13, 15, 16, 19, 20, 21])
The condition for the end is that the value is a 1 or a 2 followed by a value that is not 2. You've already figured out that to handle the "followed by" condition, you can use a shifted version of the array. To implement the and and or conditions, use the bitwise binary operators & and |, respectively. In code, it looks like:
In [61]: ends = np.where((a1[:-1] != 0) & (a1[1:] != 2))[0]
In [62]: ends
Out[62]: array([ 6, 10, 14, 15, 16, 19, 20, 21])
Finally, put starts and ends into a single array:
In [63]: np.column_stack((starts, ends))
Out[63]:
array([[ 3, 6],
[10, 10],
[13, 14],
[15, 15],
[16, 16],
[19, 19],
[20, 20],
[21, 21]])
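Put together, the answer's steps can be wrapped in a small helper (a sketch; the name boundary_indices is mine, and it assumes every detected start has a matching end, as in well-formed input):
import numpy as np

def boundary_indices(a):
    # pad with a trailing 0 so a boundary ending at the last element
    # is still "followed by" a non-2 value
    a1 = np.concatenate((a, [0]))
    starts = np.where(a1 == 1)[0]
    ends = np.where((a1[:-1] != 0) & (a1[1:] != 2))[0]
    return np.column_stack((starts, ends))

a = np.array([0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1])
print(boundary_indices(a))   # same 8 x 2 array as above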

Regarding positions of elements within a row's 'islands of positive values'

Consider a specified row in a numpy integer array. My task has 3 parts:
a) Identify the locations of any positive islands (i.e. consecutive positive values) in the specified row.
b) Identify the lengths of each of these positive islands.
c) Determine (True or False) whether the island element that is closest in value to the row index is in the FIRST or LAST island position.
The following code, I believe, answers parts a) and b).
import numpy as np
arr = np.array([[-1, -4, -2, -8, 8, -3, -5, -6, 7],
[-4, -9, -1, 3, 8, -7, -6, 2, -5],
[ 4, 6, 9, 3, -1, -2, 5, 4, 8],
[ 5, -1, 2, 5, 6, 7, -3, -4, 1]])
row_idx = 2
arr_row = arr[row_idx]
mask = arr_row > 0
changes = np.concatenate(([mask[0]], mask[:-1] != mask[1:], [mask[-1]]))
isl_idx = np.where(changes)[0] # 1st index of islands
pos_idx = isl_idx[::2] # 1st index of POSITIVE islands
print('pos_idx = ', pos_idx)
pos_len = np.diff(isl_idx)[::2] # length of POSITIVE islands
print('pos_len = ', pos_len)
print()
When row_idx = 2, for example, we have output:
pos_idx = [0, 6] # first indices of the two positive islands
pos_len = [4, 3] # lengths of the two positive islands
My problem is that I can't find a good way to tackle part c). The desired output, for the example above, would look like:
firstLast = [True, False]
Explanation: We are in row_idx = 2, so:
The value in the 1st positive island closest to 2 is 3, and this 3 is indeed in the FIRST or LAST position of its island. (True)
The value in the 2nd positive island closest to 2 is 4, which is not in the FIRST or LAST position of its island. (False)
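One possible way to express the check for part c), reusing pos_idx and pos_len from the code above, is the sketch below (it assumes "closest in value to the row index" means smallest absolute difference, as in the explanation):
import numpy as np

arr = np.array([[-1, -4, -2, -8, 8, -3, -5, -6, 7],
                [-4, -9, -1, 3, 8, -7, -6, 2, -5],
                [ 4, 6, 9, 3, -1, -2, 5, 4, 8],
                [ 5, -1, 2, 5, 6, 7, -3, -4, 1]])
row_idx = 2
arr_row = arr[row_idx]
mask = arr_row > 0
changes = np.concatenate(([mask[0]], mask[:-1] != mask[1:], [mask[-1]]))
isl_idx = np.where(changes)[0]
pos_idx = isl_idx[::2]
pos_len = np.diff(isl_idx)[::2]

# for each positive island, find the element closest in value to row_idx
# and check whether it sits at the island's first or last position
firstLast = []
for start, length in zip(pos_idx, pos_len):
    island = arr_row[start:start + length]
    closest = np.argmin(np.abs(island - row_idx))
    firstLast.append(closest == 0 or closest == length - 1)
print(firstLast)   # [True, False] for row_idx = 2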

Efficient method of finding single unsorted element in array

Assume an array of length n, which is sorted. (values are arbitrary, can be negative, not only ints)
One element is out of place, for example
-1, 2, 3.0, 4, 5, 10, 6, 7, 8, 9, 11, 12, 13, 14
n can be large.
Is there a way better than O(n) to find that element?

Find All Numbers in Array which Sum upto Zero

Given an array, output the consecutive elements (a subarray) whose total sum is 0.
Eg:
For input [2, 3, -3, 4, -4, 5, 6, -6, -5, 10],
Output is [3, -3, 4, -4, 5, 6, -6, -5]
I just can't find an optimal solution.
Clarification 1: For any element in the output subarray, there should be a subset within the subarray which adds with the element to zero.
Eg: For -5, at least one of the subsets {[-2, -3], [-1, -4], [-5], ....} should be present in the output subarray.
Clarification 2: The output subarray should consist of consecutive elements.
Here is a python solution that runs in O(n³):
import numpy

def conSumZero(input):
    take = [False] * len(input)
    for i in range(len(input)):
        for j in range(i + 1, len(input) + 1):
            if sum(input[i:j]) == 0:
                # mark every element of the zero-sum subarray input[i:j]
                for k in range(i, j):
                    take[k] = True
    return numpy.asarray(input)[take]
EDIT: Now more efficient! (Not sure if it's quite O(n²); will update once I finish calculating the complexity.)
def conSumZero(input):
    take = [False] * len(input)
    # cs[j] - cs[i] == sum(input[i:j]); prepend 0 so cs has len(input) + 1 entries
    cs = numpy.concatenate(([0], numpy.cumsum(input)))
    for i in range(len(input)):
        for j in range(i + 1, len(input) + 1):
            if cs[j] - cs[i] == 0:
                for k in range(i, j):
                    take[k] = True
    return numpy.asarray(input)[take]
The difference here is that I precompute the partial sums of the sequence, and use them to calculate subsequence sums - since sum(a[i:j]) = sum(a[0:j]) - sum(a[0:i]) - rather than iterating each time.
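For the sample input from the question, this returns the elements of the expected output:
print(conSumZero([2, 3, -3, 4, -4, 5, 6, -6, -5, 10]))
# [ 3 -3  4 -4  5  6 -6 -5]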
Why not just hash the incremental sum totals and update their indexes as you traverse the array, the winner being the one with the largest index range? O(n) time complexity (assuming average hash table complexity).
[2, 3, -3, 4, -4, 5, 6, -6, -5, 10]
sum 0 2 5 2 6 2 7 13 7 2 12
The winner is 2, indexed 1 to 8!
To also guarantee an exact counterpart contiguous-subarray for each number in the output array, I don't yet see a way around checking/hashing all the sum subsequences in the candidate subarrays, which would raise the time complexity to O(n^2).
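A sketch of that idea in Python (the function and variable names are my own): remember the earliest index at which each prefix sum appears; whenever the same prefix sum shows up again, the elements in between sum to zero, and we keep the longest such range.
def longest_zero_sum_subarray(arr):
    first_seen = {0: 0}      # prefix sum -> earliest index where it occurs
    best = (0, 0)            # (start, end) of the best range, end exclusive
    prefix = 0
    for j, x in enumerate(arr, start=1):
        prefix += x
        if prefix in first_seen:
            i = first_seen[prefix]
            if j - i > best[1] - best[0]:
                best = (i, j)   # arr[i:j] sums to zero and is the longest so far
        else:
            first_seen[prefix] = j
    return arr[best[0]:best[1]]

print(longest_zero_sum_subarray([2, 3, -3, 4, -4, 5, 6, -6, -5, 10]))
# [3, -3, 4, -4, 5, 6, -6, -5]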
Based on the example, I assumed that you wanted to find only the ones where 2 values together added up to 0. If you want to include ones that add up to 0 when you add more of them together (like 5 + -2 + -3), then you would need to clarify your parameters a bit more.
The implementation is different based on language, but here is a javascript example that shows the algorithm, which you can implement in any language:
var inputArray = [2, 3, -3, 4, -4, 5, 6, -6, -5, 10];
var outputArray = [];
for (var i = 0; i < inputArray.length; i++) {
    var num1 = inputArray[i];
    for (var x = 0; x < inputArray.length; x++) {
        var num2 = inputArray[x];
        var sumVal = num1 + num2;
        // record both members of every pair that sums to zero
        if (sumVal == 0) {
            outputArray.push(num1);
            outputArray.push(num2);
        }
    }
}
Is this the problem you are trying to solve?
Given a sequence $a_1, \dots, a_n$, find a contiguous index range $S$ of maximum length such that some subset of the corresponding elements sums to zero?
If so, here is the algorithm for solving it:
let $U$ be an empty set of contiguous integers
for each contiguous $S = [i, j) \subseteq \Bbb Z^+_{\le n}$
    for each $T \in \wp([i, j))$
        if $\sum_{k \in T} a_k = 0$
            if $|U| < |S|$
                $U \gets S$
return $U$

Removing numbers from a large range of numbers

I've got the following problem that I'm trying to find a more optimal solution for.
Let's say you have a range of numbers between 0 and 9:
Values: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Index: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Now, let's say you "remove" 1, 4, 5, and 7:
Values: 0, -, 2, 3, -, -, 6, -, 8, 9
Index: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Where there is no value, all subsequent values are shifted to the left:
Values: 0, 2, 3, 6, 8, 9
Index: 0, 1, 2, 3, 4, 5
The value at index 1 has now become 2 (was 1), the value at index 2 is now 3 (was 2), the value at index 3 is now 6 (was 3), etc.
Here's the problem. I need to manage this on a larger scale, up to tens of thousands of values. A random number of those values will be removed from the original contiguous range, and potentially added back afterwards (but not in the same order they were removed). The starting state will always be a complete sequence of numbers between 0 and MAX_VAL.
Things I've tried:
1) Maintaining an array of values, removing values from that array, and shifting everything over by one. This fails because you're iterating through all the values after the one you've just removed, and it's too slow as a result. Getting the value for a given index afterwards is really fast though.
2) Maintaining a linked list of values, and removing the value by pulling it out of the list. This seems to be slow both adding/removing values and getting the value at a given index, since I need to walk through the list first.
3) Keeping track of the "removed" values, rather then maintaining a giant array/list/etc of values from 0 to MAX_VAL. If the removed values are stored in an ordered array, then it becomes trivial to calculate how many values have been removed before and after a given index, and just return an offset index instead. This kinda works, except it's slow to maintain the ordered array of removed values and iterate through that instead, especially if the number of removed values approaches MAX_VAL.
Is there some sort of algorithm or technique that can handle this kind of problem more quickly and efficiently?
Is there some sort of algorithm or technique that can handle this kind of problem more quickly and efficiently?
The answer very much depends on typical use cases:
Is the set of numbers typically sparse or dense?
How often do you do insertions vs. removals vs. lookups?
In which patterns are numbers inserted or removed (random, continuous, from the end or start)?
Are there any memory constraints?
Here are some ideas for a generic solution:
Create a structure that stores ranges instead of numbers.
Start with a single entry: 0 - MAX_VAL.
A range can have subranges. This resulting graph of ranges forms a tree.
Removing a number splits a leaf range into two, creating two new leaves.
This algorithm would perform quite well when the set is dense (because there are few ranges). It would still perform reasonably fast as the graph grows (O(log n) for lookups), provided you keep the tree balanced.
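Here is a minimal sketch of that idea, flattened into a sorted list of surviving (start, end) ranges instead of a full tree (removals splice the list and lookups walk it; a production version would keep the ranges in a balanced tree to get the O(log n) lookups mentioned above). The class and method names are mine:
import bisect

class RangeSet:
    def __init__(self, max_val):
        self.ranges = [(0, max_val)]        # inclusive (start, end) ranges still present

    def remove(self, value):
        # locate the range containing `value` and split it in two
        i = bisect.bisect_right(self.ranges, (value, float('inf'))) - 1
        start, end = self.ranges[i]
        pieces = []
        if start <= value - 1:
            pieces.append((start, value - 1))
        if value + 1 <= end:
            pieces.append((value + 1, end))
        self.ranges[i:i + 1] = pieces

    def value_at(self, index):
        # walk the surviving ranges, skipping over their lengths
        for start, end in self.ranges:
            length = end - start + 1
            if index < length:
                return start + index
            index -= length
        raise IndexError(index)

rs = RangeSet(9)
for v in (1, 4, 5, 7):
    rs.remove(v)
print([rs.value_at(i) for i in range(6)])   # [0, 2, 3, 6, 8, 9]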
Now, let's say you "remove" 1, 4, 5, and 7:
Values: 0, -100, 2, 3, -100, -100, 6, -100, 8, 9   // use a sentinel value that does not appear in the array
Index: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
