Find a random amount of missing numbers in an array with duplicates

I should have a complete array of numeric identifiers like this one:
a = [3, 4, 5, 6, 7, 8, 9, 10]
But instead, I have a messed up array in random order, with duplicates and missing numbers, like this one:
b = [4, 9, 7, 7, 3, 3]
Is there a better way to find out which numbers are missing, apart from subtracting the array without duplicates?
a - b.uniq

(a - b).empty?
works, but, depending on the data, it may not be the fastest way of determining if a contains an element not in b. For example, if the probability were high that a contains at least one element not in b, it might be faster, on average, to check if a[0] is in b, then (if it is) if a[1] is in b, and so on, stopping if and when an element is found that is not in b. But again, that depends on the data, in particular the likelihood that (a - b).empty? is true. If that likelihood is great, Array#-, which is written in C, would be relatively fast and probably the best choice.
On the other hand, if it's all but certain that a will contain many elements that are not in b, it may be faster to do something like the following:
require 'set'
b_set = b.to_set
#=> #<Set: {4, 9, 7, 3}>
a.all? { |n| b_set.include?(n) }
In any event, you might first perform a cheap test:
b.size < a.size
If that is true, there will certainly be at least one element of a that is not in b (assuming that a contains no duplicates).

Ruby 2.6 introduced Array#difference which seems perfect here:
a = [3, 4, 5, 6, 7, 8, 9, 10]
b = [4, 9, 7, 7, 3, 3]
a.difference(b)
# => [5, 6, 8, 10]
Seems handy for this, with the added benefit of being very readable.

Related

How do you subdivide an array into n subarrays, such that the sums of the subarrays are as equal as possible?

So an example of the question is as follows:
Let's say we want to subdivide [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] into 3 sub-arrays.
While I imagine there are many correct answers, one of them would be [10, 8], [9, 7, 2], [1, 3, 4, 5, 6]. The reason is that the sums of these subarrays are 18, 18, and 19, meaning they are as close to equal as they can possibly be.
Is there an algorithm that can consistently return such an answer, given any starting array and any number of subarrays to divide into? (Assuming that the length of the starting array is greater than the number of subarrays.)
(PS: If you want to show your logic in code, I feel most comfortable with Python.)
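A common starting point (a greedy sketch, not a guaranteed-optimal algorithm, since exactly balanced partitioning is NP-hard in general) is to sort the numbers in descending order and always place the next number into the subarray with the smallest running sum. The greedy_partition name below is just illustrative:

def greedy_partition(nums, k):
    # Greedy heuristic: take the largest remaining number and put it into the
    # bucket whose sum is currently smallest. Usually close to balanced, but
    # not guaranteed optimal.
    buckets = [[] for _ in range(k)]
    sums = [0] * k
    for n in sorted(nums, reverse=True):
        i = sums.index(min(sums))      # bucket with the smallest running sum
        buckets[i].append(n)
        sums[i] += n
    return buckets

print(greedy_partition([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3))
# [[10, 5, 4], [9, 6, 3], [8, 7, 2, 1]] -> sums 19, 18, 18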

How to find array elements that sum to a given number

For example, we have the array arr = [1, 1, 3, 4, 5, 7] and a given number 8; we need to find any n elements in this array that sum to the given number. In this case, it could be [1, 3, 4] or [1, 7] or [3, 5]. What is the easiest way to do it in Ruby?
Like @Stefan and @Jorg said in the comments, there is no easy way to do this. If I had to solve it myself, I would probably write something like the following.
arr = [1, 1, 3, 4, 5, 7]
number = 8
result = []
for i in 0..(arr.length) do
  arr.combination(i).each do |combination|
    result.push(combination) if combination.sum == number
  end
end
print result.uniq
Depending on the magnitude of the given number, it may be faster to use dynamic programming. If tot is the given number and arr is the array of possible summands, the method given below has a computational complexity of O(tot*arr.size).
Code
def find_summands(arr, tot)
  return [] if tot.zero?
  arr.each_with_object([{tot=>nil}]) do |n,a|
    h = a.last.each_key.with_object({}) do |t,h|
      return soln(arr, a.drop(1), n) if t==n
      h[t] = 0
      h[t-n] = n
    end
    a << h
  end
  nil
end

def soln(arr, a, n)
  t = n
  a.reverse.each_with_object([n]) do |h,b|
    m = h[t]
    b << m
    t += m
  end.reverse.tap { |a| (arr.size-a.size).times { a << 0 } }
end
Examples
arr = [1, 1, 3, 4, 5, 7]
find_summands(arr, 8)
#=> [1, 0, 3, 4, 0, 0]
find_summands(arr, 11)
#=> [1, 1, 0, 4, 5, 0]
find_summands(arr, 21)
#=> [1, 1, 3, 4, 5, 7]
find_summands(arr, 22)
#=> nil
find_summands([1, -2, 3, 4, 5, 7], 6)
#=> [1, -2, 3, 4, 0, 0]
Each zero in the array returned indicates that the corresponding element in arr is not used in the summation.
Explanation
Suppose:
arr = [4, 2, 6, 3, 5, 1]
tot = 13
then
find_summands(arr, tot)
#=> [4, 0, 6, 3, 0, 0]
When a solution is obtained, soln is called to put it into a more useful form:
soln(arr, a.drop(1), n)
Here, arr is as above and
n #=> 3
a #=> [
  {13=>nil},                                     # for tot
  {13=>0, 9=>4},                                 # for arr[0] => 4
  {13=>0, 11=>2, 9=>0, 7=>2},                    # for arr[1] => 2
  {13=>0, 7=>0, 11=>0, 5=>6, 9=>0, 3=>6, 1=>6}   # for arr[2] => 6
]
n equals the value of the last summand used from arr, left to right.
When considering arr[0] #=> 4 the remaining amount to be summed is 13, the key of a[0] #=> {13=>nil}. There are two possibilities, 4 is a summand or it is not. This gives rise to the hash
a[1]
#=> {13-0=>0, 13-4=>4}
# { 13=>0, 9=>4}
where the keys are the remaining amount to be summed and the value is 4 if 4 is a summand and is zero if it is not.
Now consider arr[1] #=> 2. We look to the keys of a[1] to see what the possible remaining amounts might be after 4 is used or not (13 and 9). For each of these we consider using or not using 2. That gives rise to the hash
a[2]
#=> {13-0=>0, 13-2=>2, 9-0=>0, 9-2=>2}
# { 13=>0, 11=>2, 9=>0, 7=>2}
The pair 7=>2 can be read, "if 2 (the value) is a summand, there is a choice of using arr[0] or not that results in the remaining amount to be summed, after 2 is included, being 7".
Next consider arr[2] #=> 6. We look to the keys of a[2] to see what the possible remaining amounts might be after 4 and 2 are used or not (13, 11, 9 and 7). For each of these we consider using or not using 6. We therefore now create the hash
a[3]
#=> {13-0=>0, 13-6=>6, 11-0=>0, 11-6=>6, 9-0=>0, 9-6=>6, 7-0=>0, 7-6=>6}
# { 13=>0, 7=>6, 11=>0, 5=>6, 9=>0, 3=>6, 7=>0, 1=>6}
# { 13=>0, 11=>0, 5=>6, 9=>0, 3=>6, 7=>0, 1=>6}
The pair 11=>0 can be read, "if 6 is not a summand, there is a choice of using or not using arr[0] #=> 4 and arr[1] #=> 2 that results in the remaining amount to be summed after 6 is excluded being 11".
Note that the key-value pair 7=>6 was overwritten with 7=>0 when not using 6 was considered with a remaining amount of 7. We are only looking for one solution, so it doesn't matter how we get to a remaining amount of 7 after the first three elements of arr are considered. These collisions tend to increase as we move left-to-right in arr, so the number of states we need to keep track of is greatly reduced because we are able to "throw away" so many of them.
Lastly (as it turns out), we consider arr[3] #=> 3. We look to the keys of a[3] to see what the possible remaining amounts might be after 4, 2 and 6 have been used or not (13, 11, 5, 9, 3, 7 and 1). For each of these we consider using or not using 3. We get this far in creating the hash a[4]:
{13=>0, 10=>3, 11=>0, 8=>3, 5=>0, 2=>3, 9=>0, 6=>3, 3=>0, 0=>3}
As the last key-value pair has a key of zero we know we have found a solution.
Let's construct the solution. Because the value of 0 is 3, 3 is a summand. (We would have found the solution earlier if the value were zero.) We now work backwards. As 3 is used, the remaining amount before 3 is used is 0+3 #=> 3. We find that a[3][3] #=> 6, meaning 6 is also a summand. The remaining balance before using the 6 was 3+6 #=> 9, so we compute a[2][9] #=> 0, which tells us that the 2 is not a summand. Lastly, a[1][9-0] #=> 4 shows that 4 is also a summand. Hence the solution
[4, 0, 6, 3, 0, 0]

How can I create a function that combines list/array rows/columns/elements in arbitrary sized array/list?

Afternoon. I'm currently trying to create a function (or functions) that, when given an array or list and a specified selection of columns/rows/elements, removes the specified columns/rows/etc. and concatenates them into a new array/list, much in this fashion (but for arbitrarily sized objects that may or may not be pretty big):
a = [1 2 3      b = ['a','b','c'
     4 5 6           'd','e','f'
     7 8 9]          'g','h','i']
Now, let's say I want the first and third columns. Then this would look like
a' = [1 3      b' = ['a', 'c'
      4 6            'd', 'f'
      7 9]           'g', 'i']
I'm familiar with slicing indices and extracting them using numpy, so I guess where I'm really hung up is creating some object (a list or array of arrays/lists?) that contains the columns/whatever I've chosen (in the above I chose the first and third columns, as you can see), and then iterating over that object to create a concatenated/combined list of what I've specified (i.e. if I'm given an array with 127 variables and I want to extract an arbitrary number of arbitrary columns at a given time).
Thanks for taking a look. Let me know how to update the op if anything is unclear.
How is this different from advanced indexing?
In [324]: A = np.arange(12).reshape(2,6)
In [325]: A
Out[325]:
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])
In [326]: A[:,[1,2,4]]
Out[326]:
array([[ 1,  2,  4],
       [ 7,  8, 10]])
To select both rows and columns you have to pay attention to index broadcasting:
In [327]: A = np.arange(24).reshape(4,6)
In [328]: A[[[1],[3]], [1,2,4]] # row indices (as a column vector) and column indices
Out[328]:
array([[ 7,  8, 10],
       [19, 20, 22]])
In [329]: A[np.ix_([1,3], [1,2,4])] # easier with ix_()
Out[329]:
array([[ 7,  8, 10],
       [19, 20, 22]])
https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#purely-integer-array-indexing
The index arrays/lists can be assigned to variables; the input to the A indexing can be a tuple.
In [330]: idx = [[1,3],[1,2,4]]
In [331]: idx1 = np.ix_(*idx)
In [332]: idx1
Out[332]:
(array([[1],
        [3]]), array([[1, 2, 4]]))
In [333]: A[idx1]
Out[333]:
array([[ 7,  8, 10],
       [19, 20, 22]])
And to expand a set of slices and indices into a single array, np.r_ is handy (though not magical):
In [335]: np.r_[slice(0,5),7,6, 3:6]
Out[335]: array([0, 1, 2, 3, 4, 7, 6, 3, 4, 5])
There are other indexing tools, utilities in indexing_tricks, functions like np.delete and np.take.
Try np.source(np.delete) to see how that handles general purpose deletion.
You could use a double list comprehension
>>> def select(arr, rows, cols):
... return [[el for j, el in enumerate(row) if j in cols] for i, row in enumerate(arr) if i in rows]
...
>>> select([[1,2,3,4],[5,6,7,8],[9,10,11,12]],(0,2),(1,3))
[[2, 4], [10, 12]]
>>>
Please note that, independent of the order of indices in rows and cols, select doesn't reorder the rows and columns of the input. Note also that using the same index repeatedly in either rows or cols does not give you duplicated rows or columns. Finally, note that select works only for lists of lists.
That said, I'd advise you to use numpy, which is hugely more flexible and far more efficient.

Efficient way of finding sequential numbers across multiple arrays?

I'm not looking for any code or having anything being done for me. I need some help to get started in the right direction but do not know how to go about it. If someone could provide some resources on how to go about solving these problems I would very much appreciate it. I've sat with my notebook and am having trouble designing an algorithm that can do what I'm trying to do.
I can probably do:
foreach element in array1
  foreach element in array2
    check if array1[i] == array2[j] + x
I believe this would work for both forward and backward sequences, and for the multiples I could just check array1[i] % array2[j] == 0. I have a list which contains int arrays and am getting list[index] (for array1) and list[index+1] (for array2), but this solution can get complex and lengthy fast, especially with large arrays and a large list of those arrays. Thus, I'm searching for a better solution.
I'm trying to come up with an algorithm for finding sequential numbers in different arrays.
For example:
[1, 5, 7] and [9, 2, 11] would find that 1 and 2 are sequential.
This should also work for multiple sequences in multiple arrays. So if there is a third array of [24, 3, 15], it will also include 3 in that sequence, and continue on to the next array until there isn't a number that matches the last sequential element + 1.
It also should be able to find more than one sequence between arrays.
For example:
[1, 5, 7] and [6, 3, 8] would find that 5 and 6 are sequential and also 7 and 8 are sequential.
I'm also interested in finding reverse sequences.
For example:
[1, 5, 7] and [9, 4, 11] would return that 5 and 4 are reverse sequential.
Example with all:
[1, 5, 8, 11] and [2, 6, 7, 10] would return 1 and 2 are sequential, 5 and 6 are sequential, 8 and 7 are reverse sequential, 11 and 10 are reverse sequential.
It can also overlap:
[1, 5, 7, 9] and [2, 6, 11, 13] would return 1 and 2 sequential, 5 and 6 sequential and also 7 and 6 reverse sequential.
I also want to expand this to check numbers with a difference of x (above examples check with a difference of 1).
In addition to all of that (although this might be a different question), I also want to check for multiples,
Example:
[5, 7, 9] and [10, 27, 8] would return 5 and 10 as multiples, 9 and 27 as multiples.
and numbers with the same ones place.
Example:
[3, 5, 7] and [13, 23, 25] would return that 3, 13 and 23 have the same ones digit.
Use a dictionary (set or hashmap)
dictionary1 = {}
Go through each item in the first array and add it to the dictionary.
[1, 5, 7]
Now dictionary1 = {1:true, 5:true, 7:true}
dictionary2 = {}
Now go through each item in [6, 3, 8] and look up whether it's part of a sequence.
6 is part of a sequence because dictionary1[6+1] == true
so dictionary2[6] = true
We get dictionary2 = {6:true, 8:true}
Now set dictionary1 = dictionary2 and dictionary2 = {}, and go to the third array, and so on.
We only keep track of sequences.
Since each lookup is O(1) and we do two lookups per number (e.g. 6-1 and 6+1), the total is n*O(1), which is O(N), where N is the number of numbers across all the arrays.
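A small Python sketch of the idea described above (the sequential_pairs name and the pair-list output are just illustrative; the answer itself only describes the dictionaries):

def sequential_pairs(arrays):
    # Walk the arrays left to right, keeping a set of the previous array's
    # sequence members, and report values that continue a sequence by +1 or -1.
    pairs = []
    current = set(arrays[0])
    for arr in arrays[1:]:
        nxt = set()
        for n in arr:
            if n - 1 in current:      # forward sequence: ..., n-1, n
                pairs.append((n - 1, n))
                nxt.add(n)
            if n + 1 in current:      # reverse sequence: ..., n+1, n
                pairs.append((n + 1, n))
                nxt.add(n)
        current = nxt                 # only keep track of sequence members
    return pairs

print(sequential_pairs([[1, 5, 7], [6, 3, 8]]))
# [(5, 6), (7, 6), (7, 8)]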
The brute force approach outlined in your pseudocode will be O(c^n) (exponential), where c is the average number of elements per array and n is the number of total arrays.
If the input space is sparse (meaning there will be more missing numbers on average than present numbers), then one way to speed up this process is to first create a single sorted set of all the unique numbers from all your different arrays. This "master" set will then allow you to exit early (i.e. break statements in your loops) on any sequences which are not viable.
For example, if we have input arrays [1, 5, 7] and [6, 3, 8] and [9, 11, 2], the master ordered set would be {1, 2, 3, 5, 6, 7, 8, 9, 11}. If we are looking for n+1 type sequences, we could skip checking any sequence that contains a 3, 9 or 11 (because the n+1 value is not the next element in the sorted set). While the speedups are not drastic in this particular example, if you have hundreds of input arrays and a very large range of values for n (sparsity), then the speedups should be exponential because you will be able to exit early on many permutations. If the input space is not sparse (such as in this example, where we didn't have many holes), the speedups will be less than exponential.
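As a rough illustration of that pruning (the variable names are made up, not from the answer above):

arrays = [[1, 5, 7], [6, 3, 8], [9, 11, 2]]
master = sorted(set().union(*arrays))   # [1, 2, 3, 5, 6, 7, 8, 9, 11]
present = set(master)
# values that cannot possibly continue as an n+1 sequence are dropped up front
viable = [v for v in master if v + 1 in present]
print(viable)                           # [1, 2, 5, 6, 7, 8]  (3, 9 and 11 are pruned)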
A further improvement would be to store a "master" set of key-value pairs, where the key is the n value as shown in the example above, and the value portion of the pair is a list of the indices of any arrays that contain that value. The master set of the previous example would then be: {[1, 0], [2, 2], [3, 1], [5, 0], [6, 1], [7, 0], [8, 1], [9, 2], [11, 2]}. With this architecture, scan time could potentially be as low as O(c*n), because you could just traverse this single sorted master set looking for valid sequences instead of looping over all the sub-arrays. By also requiring the array indexes to increment, you can clearly see that the 1->2 sequence can be skipped because the arrays are not in the correct order, and the same with the 2->3 sequence, etc. Note this toy example is somewhat oversimplified because in practice you would need a list of indices for the value portions of the key-value pairs. This would be necessary if the same value of n ever appeared in multiple arrays (duplicate values).
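A rough sketch of that value-to-array-indices map (again with illustrative names, and using the question's rule that a sequence must continue in the next array):

from collections import defaultdict

arrays = [[1, 5, 7], [6, 3, 8], [9, 11, 2]]
where = defaultdict(list)               # value -> indices of the arrays containing it
for i, arr in enumerate(arrays):
    for v in arr:
        where[v].append(i)

for v in sorted(where):
    # an n -> n+1 step only counts if n+1 appears in the array right after n
    if v + 1 in where and any(j == i + 1 for i in where[v] for j in where[v + 1]):
        print(v, "->", v + 1)
# prints 5 -> 6, then 7 -> 8, then 8 -> 9 (1 -> 2 and 2 -> 3 are skipped: wrong array order)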

How do I find the complement of an array?

If I have a sorted array of numerical values such as Double, Integer, or Time, what is the general logic for finding a complement?
Over my CS career in college, I've gotten better at understanding complements and edge cases for ranges. As I help students whose skill levels and understanding match mine when I wrote this, I need help finding a generalized way to convey this concept to them for single elements and ranges.
Try something like this:
def complement(l, universe=None):
    """
    Return the complement of a list of integers, as compared to
    a given "universe" set. If no universe is specified,
    consider the universe to be all integers between
    the minimum and maximum values of the given list.
    """
    if universe is not None:
        universe = set(universe)
    else:
        universe = set(range(min(l), max(l)+1))
    return sorted(universe - set(l))
then
l = [1,3,5,7,10]
complement(l)
yields:
[2, 4, 6, 8, 9]
Or you can specify your own universe:
complement(l, range(12))
yields:
[0, 2, 4, 6, 8, 9, 11]
To add another option, using a data type (the set) that is always useful to learn about for these kinds of operations:
a = set([1, 3, 5, 7, 10])
b = set(range(1, 11))
c = sorted(list(b.symmetric_difference(a)))
print(c)
[2, 4, 6, 8, 9]
>>> nums = [1, 3, 5, 7, 10]
>>> [n + ((n&1)*2-1) for n in nums]
[2, 4, 6, 8, 9]
The easiest way is to iterate from the beginning of your list to the second-to-last element. Set j equal to the element at that index plus 1. While j is less than the next number in your list, append it to your list of complements and increment it.
# find the skipped numbers in a list sorted in ascending order
def getSkippedNumbers(arr):
    complement = []
    for i in xrange(0, len(arr) - 1):
        j = arr[i] + 1
        while j < arr[i + 1]:
            complement.append(j)
            j += 1
    return complement

test = [1, 3, 5, 7, 10]
print getSkippedNumbers(test)  # returns [2, 4, 6, 8, 9]
You can find the complement of two lists using a list comprehension. Here we keep the elements of x that are not in y (the relative complement of y in x):
>>> x = [1, 3, 5, 7, 10]
>>> y = [1, 2, 3, 4, 8, 9, 20]
>>> z = [n for n in x if not n in y]
>>> z
[5, 7, 10]
>>>
