Cut a set of ranges of integers from another set of ranges - arrays

I have two arrays of ranges in this form:
wanted = {[10, 15], [20, 25]}
cut = {[5, 12], [22, 24]}
So wanted is an array of two elements (ranges) - [10, 15] and [20, 25].
Each of the two arrays fulfils these conditions:
It is sorted by the first value in each range of integers
The ranges will never overlap (e.g. [10, 15], [15, 25] is not possible)
This also means that each range is unique within the array (no [1, 5], [1, 5])
If a range is just one integer wide, it will be displayed as [5, 5] (beginning and end are equal)
I now want to obtain an array of ranges, where all ranges from cut have been removed from the ranges in wanted.
result = {[13, 15], [20, 21], [25, 25]}
Is there some brilliant algorithm better / easier / faster than the below?
For each element in wanted, compare that element to one element after another from cut until the element from cut ends above the element from wanted.

Say there are n elements in wanted and m elements in cut.
The following is an O(m + n) algorithm to perform the required task:
j = 1
result = {}
for i = 1:n
    // go to next cut while current cut ends before current item
    while j <= m && cut[j].end < wanted[i].start
        j++
    start = wanted[i].start
    // handle every cut that overlaps the current item
    while j <= m && cut[j].start <= wanted[i].end
        // extract from start to cut start
        if cut[j].start > start
            result += (start, cut[j].start-1)
        start = cut[j].end+1
        // cut reaches past the item, so it may also overlap the next item
        if cut[j].end > wanted[i].end
            break
        j++
    // extract from last cut end to end (the whole item if nothing overlapped)
    if start <= wanted[i].end
        result += (start, wanted[i].end)
Note that, asymptotically, you can't do better than O(m + n), since it should be reasonably easy to prove that you need to look at every element (in the worst case).
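For anyone who prefers something runnable, here is a minimal Python sketch of the same two-pointer sweep. It assumes the ranges are (start, end) tuples with inclusive ends, sorted and non-overlapping as described in the question; the name cut_ranges is just illustrative. It keeps consuming cuts for as long as they overlap the current wanted range.

```python
def cut_ranges(wanted, cut):
    """Remove every range in `cut` from the ranges in `wanted`.

    Both inputs are lists of (start, end) tuples with inclusive ends,
    sorted by start and non-overlapping (assumed, as in the question).
    """
    result = []
    j = 0
    for start, end in wanted:
        # Skip cuts that end before this wanted range begins.
        while j < len(cut) and cut[j][1] < start:
            j += 1
        # Consume every cut that overlaps this wanted range.
        while j < len(cut) and cut[j][0] <= end:
            if cut[j][0] > start:
                result.append((start, cut[j][0] - 1))
            start = cut[j][1] + 1
            if cut[j][1] > end:
                break  # this cut may also reach into the next wanted range
            j += 1
        # Keep whatever is left after the last overlapping cut.
        if start <= end:
            result.append((start, end))
    return result


print(cut_ranges([(10, 15), (20, 25)], [(5, 12), (22, 24)]))
# prints [(13, 15), (20, 21), (25, 25)]
```

Since j never moves backwards, the total work stays O(m + n).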

What is the largest size that wanted and cut may reach? Comparing each element from wanted with all elements from cut takes O(n*m) run time (O(n^2) when both have similar size), i.e. very slow if the arrays are large.
It would be much faster to work over each array in parallel until you reach the end of both, something like a "merge".

Related

How to find out if an arithmetic sequence exists in an array

If there is an array that contains random integers in ascending order, how can I tell if this array contains an arithmetic sequence (length>3) with the common difference x?
Example:
Input: Array=[1,2,4,5,8,10,17,19,20,23,30,36,40,50]
x=10
Output: True
Explanation of the Example: the array contains [10,20,30,40,50], which is an arithmetic sequence (length=5) with the common difference 10.
Thanks!
I apologize that I have not tried any code to solve this, since I have no clue yet.
After reading the answers, I tried it in Python.
Here is my code:
df = [1,10,11,20,21,30,40]
i=0
common_differene=10
df_len=len(df)
for position_1 in range(df_len):
    for position_2 in range(df_len):
        if df[position_1] + common_differene == df[position_2]:
            position_1=position_2
            i=i+1
print(i)
However, it returns 9 instead of 4.
Is there any way to prevent the repetitive counting within one sequence [10,20,30,40] and also prevent i from accumulating counts from other sequences [1,11,21]?
You can solve your problem by using 2 loops: one to run through every element and the other to check whether some element equals currentElement+x. If you find one that does, you can continue from there.
With the added rule of the sequence being more than 2 elements long, I have recreated your problem in FreeBASIC:
DIM array(13) As Integer = {1, 2, 4, 5, 8, 10, 17, 19, 20, 23, 30, 36, 40, 50}
DIM x as Integer = 10
DIM arithmeticArrayMinLength as Integer = 3
DIM index as Integer = 0
FOR position As Integer = LBound(array) To UBound(array)
    FOR position2 As Integer = LBound(array) To UBound(array)
        IF (array(position) + x = array(position2)) THEN
            position = position2
            index = index + 1
        END IF
    NEXT
NEXT
IF (index <= arithmeticArrayMinLength) THEN
    PRINT false
ELSE
    PRINT true
END IF
Hope it helps
Edit:
After reviewing your edit, I have come up with a solution in Python that returns all arithmetic sequences, keeping the order of the list:
def arithmeticSequence(A,n):
    SubSequence=[]
    ArithmeticSequences=[]
    #Create array of pairs from array A
    for index,item in enumerate(A[:-1]):
        for index2,item2 in enumerate(A[index+1:]):
            SubSequence.append([item,item2])
    #finding arithmetic sequences
    for index,pair in enumerate(SubSequence):
        if (pair[1] - pair[0] == n):
            found = [pair[0],pair[1]]
            for index2,pair2 in enumerate(SubSequence[index+1:]):
                if (pair2[0]==found[-1] and pair2[1]-pair2[0]==n):
                    found.append(pair2[1])
            if (len(found)>2): ArithmeticSequences.append(found)
    return ArithmeticSequences
df = [1,10,11,20,21,30,40]
common_differene=10
arseq=arithmeticSequence(df,common_differene)
print(arseq)
Output: [[1, 11, 21], [10, 20, 30, 40], [20, 30, 40]]
This is how you can get all the arithmetic sequences out of df for you to do whatever you want with them.
Now, if you want to remove the sub-sequences of already existing arithmetic sequences, you can try running it through:
def distinct(A):
    DistinctArithmeticSequences = A
    for index,item in enumerate(A):
        for index2,item2 in enumerate([x for x in A if x != item]):
            if (set(item2) <= set(item)):
                DistinctArithmeticSequences.remove(item2)
    return DistinctArithmeticSequences
darseq=distinct(arseq)
print(darseq)
Output: [[1, 11, 21], [10, 20, 30, 40]]
Note: Not gonna lie, this was fun figuring out!
Try from 1: check the presence of 11, 21, 31... (you can stop immediately)
Try from 2: check the presence of 12, 22, 32... (you can stop immediately)
Try from 4: check the presence of 14, 24, 34... (you can stop immediately)
...
Try from 10: check the presence of 20, 30, 40... (bingo !)
You can use linear searches, but for a large array, a hash map will be better. If you can stop as soon as you have found a sequence of length > 3, this procedure takes linear time.
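A minimal sketch of the hash-based check in Python, assuming a positive common difference and distinct integers. The function name has_arithmetic_run and the min_length parameter are made up for illustration. Python's built-in set plays the role of the hash map, and the scan only starts counting at the first element of a run, so every element is looked at a bounded number of times.

```python
def has_arithmetic_run(values, step, min_length):
    """Return True if `values` contains v, v+step, v+2*step, ...
    with at least `min_length` elements. Assumes step > 0."""
    present = set(values)
    for v in values:
        # Skip v if it is not the first element of its run.
        if v - step in present:
            continue
        length = 1
        while v + length * step in present:
            length += 1
        if length >= min_length:
            return True
    return False


array = [1, 2, 4, 5, 8, 10, 17, 19, 20, 23, 30, 36, 40, 50]
print(has_arithmetic_run(array, 10, 4))
# prints True
```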
Scan the list increasingly and for every element v, check if the element v + 10 is present and draw a link between them. This search can be done in linear time as a modified merge operation.
E.g. from 1, search 11; you can stop at 17; from 2, search 12; you can stop at 17; ... ; from 8, search 18; you can stop at 19...
Now you have a graph, the connected components of which form arithmetic sequences. You can traverse the array in search of a long sequence (or a longest), also in linear time.
In the given example, the only links are 10 -> 20 -> 30 -> 40 -> 50.
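The merge-style linking could look roughly like this in Python. This is only a sketch under the assumptions that the list is sorted, holds distinct integers, and the common difference is positive. The name longest_run_by_merge is hypothetical. Instead of building an explicit graph, it stores for each element the length of the chain of links ending there.

```python
def longest_run_by_merge(sorted_values, step):
    """Length of the longest run with common difference `step`,
    found with two pointers over a sorted list of distinct integers."""
    n = len(sorted_values)
    run_ending_at = [1] * n   # length of the run that ends at index i
    j = 0                     # candidate index for value + step
    for i, v in enumerate(sorted_values):
        # Advance j until it reaches or passes v + step; j never moves back.
        while j < n and sorted_values[j] < v + step:
            j += 1
        if j < n and sorted_values[j] == v + step:
            # Link v -> v + step: the run ending at j extends the one at i.
            run_ending_at[j] = run_ending_at[i] + 1
    return max(run_ending_at, default=0)


array = [1, 2, 4, 5, 8, 10, 17, 19, 20, 23, 30, 36, 40, 50]
print(longest_run_by_merge(array, 10))
# prints 5
```

Both pointers only move forward, which is what keeps the whole pass linear.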

Designing a wireless sensor algorithm

So I'm given a sorted array containing n >= 2 integers. These integers represent wireless sensors, and each has a broadcast radius of 2, meaning that a sensor at "4" can reach at most "2" or "6". I need to design an algorithm that returns an array containing all pairs of sensors (as subarrays) that can communicate with each other, possibly having their message forwarded by some intermediate sensor, e.g. "8" is able to communicate with "12" given that "10" exists in the array. The algorithm also needs to run in O(n^2) time.
So at first it was pretty simple, I would just get the length of the array, n, and iterate through it using a while loop (i < n) and if the current element + 2 was greater or equal to the following element, add its index and the following element's index to a subarray and add it to an empty array. But I was having problems with the intermediate sensor part. How would I find the intermediate sensor connections though?
Sort the list (takes O(NlogN) time)
Start traversing the array
See if the current element can communicate with our current set. If yes, then add it to the set and continue. Else, store the old set, create a new one and add current element to it.
For each set, generate all pairs of sensors that can communicate.
Something like:
def generate_pairs(array):
    pairs = []
    array_length = len(array)
    for i in range(0, array_length):
        for j in range(i+1, array_length):
            pairs.append([array[i],array[j]])
    return pairs

main_list = [1,2,4,7,9,10,13]
main_list.sort()
curr_set = [main_list[0]]
all_sets = []
for i in range (1,len(main_list)):
    if main_list[i]-main_list[i-1] <=2:
        curr_set.append(main_list[i])
    else:
        all_sets.append(curr_set)
        curr_set = [main_list[i]]
# store the last set, which the loop above never closes
all_sets.append(curr_set)
all_pairs = []
for i in all_sets:
    all_pairs += generate_pairs(i)
print(all_pairs)
# prints [[1, 2], [1, 4], [2, 4], [7, 9], [7, 10], [9, 10]]

Most Efficient Algorithm to Align Multiple Ordered Sequences

I have a strange feeling this is a very easy problem to solve but I'm not finding a good way of doing this without using brute force or dynamic programming. Here it goes:
Given N arrays of ordered and monotonic values, find the set of positions i_1, i_2, ..., i_N (one per array) that minimises the pair-wise difference of the values at those indexes across all arrays. In other words, find the positions for all arrays whose values are closest to each other. Multiple solutions may exist and arrays may or may not be equally sized.
If A denotes the list of all arrays, the pair-wise difference is the sum of absolute differences between the values at the given indexes, taken over all pairs of different arrays: d(i_1, ..., i_N) = sum over all j < k of |A_j[i_j] - A_k[i_k]|.
An example, 3 arrays a, b and c:
a = [20 29 30 32 33]
b = [28 29 30 32 33]
c = [10 12 28 31 32 33]
The best alignment for these arrays would be a[3] b[3] c[4] or a[4] b[4] c[5], because (32,32,32) and (33,33,33) are all equal values and therefore have the minimum pairwise difference between each other. (Assuming array indexes start at 0.)
This is a common problem in bioinformatics that's usually solved with Dynamic Programming, but since the sequences are ordered, I think there's somehow a way of exploiting this notion of order. I first thought about doing this pairwise, but this does not guarantee the global optimum because the best local answer might not be the best global answer.
This is meant to be language agnostic, but I don't really mind an answer for a specific language, as long as there is no loss of generality. I know Dynamic Programming is an option here, but I have a feeling there's an easier way to do this?
The tricky thing is parsing the arrays so that at some point you're guaranteed to be considering the set of indices that realize the pairwise min. Using a min heap on the values doesn't work. Counterexample with 4 arrays: [0,5], [1,2], [2], [2]. We start with a d(0,1,2,2) = 7, optimal is d(0,2,2,2) = 6, but the min heap moves us from 7 to d(5,1,2,2) = 12, then d(5,2,2,2) = 9.
I believe (but haven't proved) that if we always increment the index that improves pairwise distance the most (or degrades it the least), we're guaranteed to visit every local min and the global min.
Assuming n total elements across k arrays:
Simple approach: we repeatedly get the pairwise distance deltas (delta with respect to incrementing each index), increment the best one, and any time doing so switches us from improvement to degradation (i.e. a local minimum) we calculate the pairwise distance. All this is O(k^2) per increment, for a total running time of O((n-k) * (k^2)).
With O(k^2) storage, we could keep an array where (i,j) stores the pairwise distance delta achieved by incrementing the index of array i with respect to array j. We also store the column sums. Then on incrementing an index we can update the appropriate row, column and column sums in O(k). This gives us a running time of O((n-k)*k).
To just complete Dave's answer, here is the pseudocode of the delta algorithm:
initialise index_table to 0's where each row i denotes the index for the ith array
initialise delta_table with the corresponding cost of incrementing index of ith array and keeping the other indexes at their current values
cur_cost <- cost of current index table
best_cost <- cur_cost
best_solutions <- list with the current index table
while (can_at_least_one_index_increase)
    i <- index whose delta is lowest
    increment i-th entry of the index_table
    if cost(index_table) < cur_cost
        cur_cost = cost(index_table)
        best_solutions = {} U {index_table}
    else if cost(index_table) = cur_cost
        best_solutions = best_solutions U {index_table}
    update delta_table
Important Note: During an iteration, some index_table entries might have already reached the maximum value for that array. Whenever updating the delta_table, it is necessary to never pick those entries, otherwise this will result in an array out of bounds error, a segmentation fault or undefined behaviour. A neat trick is to simply check which indexes are already at max and give them a sufficiently large delta, so they are never picked. If no index can increase anymore, the loop will end.
Here's an implementation in Python:
def align_ordered_sequences(arrays: list):
    def get_cost(index_table):
        n = len(arrays)
        if n == 1:
            return 0
        total = 0
        for i in range(0, n-1):
            for j in range(i+1, n):
                v1 = arrays[i][index_table[i]]
                v2 = arrays[j][index_table[j]]
                # Absolute difference between the two selected values
                total += abs(v1 - v2)
        return total

    def compute_delta_table(index_table):
        # Initialise the delta table: we increment each index by 1, call
        # the cost method and then revert the change, this avoids having to
        # create copies, which decreases performance unnecessarily
        delta_table = []
        for i in range(n):
            if index_table[i] + 1 >= len(arrays[i]):
                # Implementation detail: if the index is outside the bounds of
                # array i, choose a "large enough" number
                delta_table.append(999999999999999)
            else:
                index_table[i] = index_table[i] + 1
                delta_table.append(get_cost(index_table))
                index_table[i] = index_table[i] - 1
        return delta_table

    def can_at_least_one_index_increase(index_table):
        answer = False
        for i in range(len(arrays)):
            if index_table[i] < len(arrays[i]) - 1:
                answer = True
        return answer

    n = len(arrays)
    index_table = [0] * n
    delta_table = compute_delta_table(index_table)
    best_solutions = [index_table.copy()]
    cur_cost = get_cost(index_table)
    best_cost = cur_cost
    while can_at_least_one_index_increase(index_table):
        i = delta_table.index(min(delta_table))
        index_table[i] = index_table[i] + 1
        new_cost = get_cost(index_table)
        # A new best solution was found
        if new_cost < cur_cost:
            cur_cost = new_cost
            best_solutions = [index_table.copy()]
        # A new solution with the same cost was found
        elif new_cost == cur_cost:
            best_solutions.append(index_table.copy())
        # Update the delta table
        delta_table = compute_delta_table(index_table)
    return best_solutions
And here are some examples:
>>> print(align_ordered_sequences([[0,5], [1,2], [2], [2]]))
[[0, 1, 0, 0]]
>>> print(align_ordered_sequences([[3, 5, 8, 29, 40, 50], [1, 4, 14, 17, 29, 50]]))
[[3, 4], [5, 5]]
Note 2: this outputs the indexes, not the actual values, for each array.

Ruby - Efficient method of checking if sum of two numbers in array equal a value

Here's my problem: I have a list of 28,123 numbers to iterate through and an array of 6965 other numbers. For each of the 28,123 numbers, I need to check whether it equals the sum of two numbers from the array (the two can be the same number). I want to put the matches in a new array or mark each number as true / false. Any solutions I've come up with so far are extremely inefficient.
So a dumbed-down version of what I want is if I have the following: array = [1, 2, 5] and the numbers 1 to 5 would return result = [2, 3, 4] or the array of result = [false, true, true, true, false]
I read this SE question: Check if the sum of two different numbers in an array equal a variable number? but I need something more efficient in my case it seems, or maybe a different approach to the problem. It also doesn't seem to work for two of the same number being added together.
Any help is much appreciated!
non_abundant(n) is a function that returns the first n non_abundant numbers. It executes almost instantaneously.
My Code:
def contains_pair?(array, n)
  !!array.combination(2).detect { |a, b| a + b == n }
end
result = []
array = non_abundant(6965)
(1..28123).each do |n|
  if array.index(n) == nil
    index = array.length - 1
  else
    index = array.index(n)
  end
  puts n
  if contains_pair?( array.take(index), n)
    result << n
  end
end
numbers = [1, 2, 5]
results = (1..10).to_a
numbers_set = numbers.each_with_object({}){ |i, h| h[i] = true }
results.select do |item|
  numbers.detect do |num|
    numbers_set[item - num]
  end
end
#=> [2, 3, 4, 6, 7, 10]
You can add some optimizations by sorting your numbers and stopping once num is bigger than item/2.
The complexity is O(n*m), where n and m are the lengths of the two lists.
Another optimization: if the numbers list is much shorter than the results list (n << m), you can achieve O(n*n) complexity by calculating all possible sums in the numbers list first.
The most inefficient part of your algorithm is the fact that you are re-calculating many possible sums of combinations, 28123 times. You only need to do this once.
Here is a very simple improvement to your code:
array = non_abundant(6965)
# repeated_combination also pairs each number with itself, which the question allows
combination_sums = array.repeated_combination(2).map {|comb| comb.inject(:+)}.uniq
result = (1..28123).select do |n|
  combination_sums.include? n
end
The rest of your algorithm seems to be an attempt to compensate for that original performance mistake of re-calculating the sums - which is no longer needed.
There are further optimisations you could potentially make, such as using a binary search. But I'm guessing this improvement will already be sufficient for your needs.
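To illustrate the "compute the sums once" idea independently of Ruby, here is a small Python sketch. It assumes a number may be paired with itself, as the question states. The name reachable_sums is made up for this example. Storing the sums in a set makes each membership check a constant-time lookup on average rather than a scan.

```python
from itertools import combinations_with_replacement


def reachable_sums(numbers, limit):
    """Return the values in 1..limit that equal a + b for some
    a, b in `numbers` (a and b may be the same element)."""
    # Compute every pairwise sum exactly once and keep them in a set.
    sums = {a + b for a, b in combinations_with_replacement(numbers, 2)}
    return [n for n in range(1, limit + 1) if n in sums]


print(reachable_sums([1, 2, 5], 10))
# prints [2, 3, 4, 6, 7, 10]
```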

How do I compare integer indexes in arrays when there are duplicate values?

First, some necessary background. I'm trying to make a number-based version of the game Mastermind as a way of learning to code in Ruby. My code basically works like this:
The computer generates an array (@computer_sequence) of 4 random numbers from 1-5
The user enters a 4 digit sequence, which winds up in an array called @user_array.
A method, called compare, iterates through @user_array, comparing the value and index of each number to those in @computer_sequence. The program then tells the user how many of their numbers have the correct value and the correct position, or how many numbers have the correct value only.
The problem: If there are multiple instances of a number in an array, they get the same index, right? Like if I have the array [1, 3, 3, 4], the number three has an index of 1, even though there are two 3s. For this program to work, though, each number has to have a unique position (is index even the word I want here?) in the array, even if the number occurs multiple times. Does that make sense?
Also, here's the code for the compare method:
def compare
  value_only = 0
  value_and_place = 0
  puts "The computer's values are: #{@computer_sequence}"
  puts "The user's values are: #{@user_array}"
  @user_array.each do |candidate|
    @computer_sequence.each do |computer_number|
      if candidate == computer_number && @user_array.index(candidate) == @computer_sequence.index(computer_number)
        value_and_place +=1
      elsif candidate == computer_number && @user_array.index(candidate) != @computer_sequence.index(computer_number)
        value_only +=1
      end
    end
  end
end
Suppose
n = 4
computer = Array.new(n) { [1,2,3,4,5].sample }
#=> [3, 2, 3, 3]
user_digits = [2, 4, 2, 3]
First compute pairs of elements at the same index of computer and user_digits.
pairs = computer.zip(user_digits)
#=> [[3, 2], [2, 4], [3, 2], [3, 3]]
Compute number of values that match at the same position
pairs.count { |c,u| c==u }
#=> 1
Compute number of values that match at different positions
First remove the matches at the same positions of computer and user_digits.
comp, users = pairs.reject { |c,u| c==u }.transpose
#=> [[3, 2, 3], [2, 4, 2]]
meaning
comp #=> [3, 2, 3]
users #=> [2, 4, 2]
Now step through users removing the first matching element in comp (if there is one).
users.each do |n|
  i = comp.index(n)
  comp.delete_at(i) if i
end
So now:
comp #=> [3,3]
meaning that the number of elements that match at different positions is:
users.size-comp.size
#=> 1
Notice that we could alternatively compute the number of values that match at the same position as
n - users.size
For n equal to 4 this doesn’t offer any significant time saving, but it would if we had a problem with the same structure and n were large.
Alternative calculation
After computing
comp, users = pairs.reject { |c,u| c==u }.transpose
we could write
users.size - comp.difference(users).size
#=> 1
where Array#difference is as I defined it in my answer here.
Here
comp.difference(users)
#=> [3,3]
No, equal elements in an array don't have the same index. Maybe you're thinking that because Array#index only returns the index of the first element equal to its argument. But there are many ways to see that other equal elements have their own indexes. For example,
a = [1, 3, 3, 4]
a[1] == 3 # true
a[2] == 3 # also true
Aside from that issue, your algorithm doesn't quite match the rules of Mastermind. If there is one three in the computer's sequence and the player guesses two threes, both in different positions than the three in the computer's sequence, the player should be told that only one element of their sequence matches the computer's sequence in value but not position.
Given the above, plus that I think it would be clearer to calculate the two numbers separately, I'd do it like this:
value_and_place = 4.times.count { |i| @user_array[i] == @computer_sequence[i] }
value_only = (@user_array & @computer_sequence).length - value_and_place
That's less efficient than the approach you're taking, but CPU efficiency isn't important for 4-element arrays.
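For comparison, here is one way the duplicate-aware scoring could be sketched in Python. The function name score_guess is hypothetical, and the approach is a multiset intersection that I'm swapping in, not the code from the question. Counter & Counter keeps the minimum count of each value, so a digit in the secret can be matched by at most one digit in the guess.

```python
from collections import Counter


def score_guess(secret, guess):
    """Return (value_and_place, value_only) for two equal-length sequences."""
    # Exact matches: same value at the same position.
    value_and_place = sum(s == g for s, g in zip(secret, guess))
    # Multiset intersection counts every value match, exact ones included.
    common = sum((Counter(secret) & Counter(guess)).values())
    return value_and_place, common - value_and_place


print(score_guess([3, 2, 3, 3], [2, 4, 2, 3]))
# prints (1, 1)
```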
You can pass in the index value to your loop for each candidate using the each_with_index method. So when the first 3 is passed in, index will be 1 and when the second 3 is passed in, index will be 2.
The problem with using .index(candidate) is it returns the first index.
Try this:
@user_array.each_with_index do |candidate, index|
  @computer_sequence.each do |computer_number|
    if candidate == computer_number && candidate == @computer_sequence[index]
      value_and_place +=1
    elsif candidate == computer_number && candidate != @computer_sequence[index]
      value_only +=1
    end
  end
end
