How to delete near-duplicate elements in an array in Python - arrays

I have a two-column array that is already sorted by the first column. I want to remove some elements according to this rule:
1) Compare each value in the first column with all the other values in that column. If it differs from every other value by more than a given threshold (0.1, for example), keep the row in the new array. Otherwise, the rows whose first-column values differ by less than the threshold form a comparison group, and then
2) for each comparison group, compare the values in the second column and keep only the row with the smallest second-column value in the group.
For example, if my array is
a= [[1.2, 3],
[2.2, 3],
[2.25, 1],
[2.28, 3],
[3.2, 8],
[4.2, 10]]
then what I want to get is the following:
a=[[1.2, 3],
[2.25, 1],
[3.2, 8],
[4.2, 10]]
The second and fourth rows are deleted because the first-column values 2.2, 2.25 and 2.28 differ by less than 0.1, and among them the row with second-column value 1 has the smallest second element.
Could someone give me some hints, please?
Thanks

import numpy as np

eps = 0.1
# ASSUMING the second column is sorted (otherwise sort by it first)
a = np.array(
    [[1, 1.2, 3],
     [2, 2.2, 3],
     [3, 2.25, 1],
     [4, 2.28, 4],
     [5, 3.2, 8],
     [6, 4.2, 10],
     [7, 4.21, 3],
     [8, 4.25, 4],
     [9, 4.28, 1],
     [10, 5.2, 10],
     ])
# expected result
# a = [[1, 1.2, 3],
#      [3, 2.25, 1],
#      [5, 3.2, 8],
#      [9, 4.28, 1],
#      [10, 5.2, 10],
#     ]
b = a[:, 1]
# True where a row lies within eps of the row before it
close_to_prev = np.diff(b) < eps
# label the groups: the label increases whenever the gap to the previous
# row is not below eps, so a row with no close neighbour is a group of one
labels = np.concatenate(([0], np.cumsum(~close_to_prev)))
# in each group keep the row(s) with the smallest value in the last column
keep = np.zeros(len(a), dtype=bool)
for g in np.unique(labels):
    members = labels == g
    keep[members] = a[members, 2] == a[members, 2].min()
# boolean indexing preserves the original (sorted) row order
result = a[keep]
print("final result")
print(result)
Here is the output of this code
In [1]: run filter.py
final result
[[ 1. 1.2 3. ]
[ 3. 2.25 1. ]
[ 5. 3.2 8. ]
[ 9. 4.28 1. ]
[ 10. 5.2 10. ]]

Related

Calculating the sum of integers in a nested array, Ruby

I'm fairly new to Ruby, so please bear with me. I am working on a 7 kyu Ruby coding challenge in which I have to find how many people are left on the bus (in each pair, the first value is the number of people getting on and the second the number getting off). Please see the comments in the code for more detail.
Below is a test example:
([[10, 0], [3, 5], [5, 8]]) # => should return 5
This is my solution so far:
def number(bus_stops)
bus_stops.each{ | on, off | on[0] -= off[1] }
end
bus_stops
# loop through the array
# for the first array in the nested array subtract second value from first
# add the sum of last nested array to first value of second array and repeat
# subtract value of last element in nested array and repeat
How can I approach this? Are there any resources you would recommend?
There are many ways to achieve this. Here is one with inject:
arr.map { |inner_array| inner_array.inject(&:-) }.inject(&:+)
The map step iterates over the inner arrays and computes, for each stop, the number of people getting on minus the number getting off (this can be negative). For the example it returns
[10, -2, -3]
[10 on, none off][3 on, 5 off][5 on, 8 off]
The final inject then puts a + operator between the elements to sum the net changes, which gives the number of people left on the bus. This only works if the bus starts empty, i.e. you count from 0 people on and 0 people off.
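The two steps of the inject approach can be sketched separately as follows. This is only an illustration under the same assumption that the bus starts empty; the names net_changes and remaining are made up for the sketch and do not come from the answer.

```ruby
# Each inner array is [people_on, people_off] for one stop.
arr = [[10, 0], [3, 5], [5, 8]]

# Step 1: net change at each stop (on minus off); values may be negative.
net_changes = arr.map { |on, off| on - off }

# Step 2: add the net changes together, starting from an empty bus (0).
remaining = net_changes.inject(0, :+)

p net_changes # prints [10, -2, -3]
p remaining   # prints 5
```

Passing 0 as the initial value to inject makes the "empty bus" assumption explicit and also returns 0 rather than nil for an empty list of stops.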
Here are two other ways to compute the desired result.
arr = [[10, 0], [3, 5], [5, 8]]
Use Array#transpose
arr.transpose.map(&:sum).reduce(:-)
#=> 5
The steps are as follows.
a = arr.transpose
#=> [[10, 3, 5], [0, 5, 8]]
b = a.map(&:sum)
#=> [18, 13] ([total ons, total offs])
b.reduce(:-)
#=> 5
Use Matrix methods
require 'matrix'
(Matrix.row_vector([1] * arr.size) * Matrix[*arr] * Matrix.column_vector([1,-1]))[0,0]
#=> 5
The steps are as follows.
a = [1] * arr.size
#=> [1, 1, 1]
b = Matrix.row_vector(a)
#=> Matrix[[1, 1, 1]]
c = Matrix[*arr]
#=> Matrix[[10, 0], [3, 5], [5, 8]]
d = b * c
#=> Matrix[[18, 13]]
e = Matrix.column_vector([1,-1])
#=> Matrix[[1], [-1]]
f = d * e
#=> Matrix[[5]]
f[0,0]
#=> 5
See Matrix::[], Matrix::row_vector, Matrix::column_vector and Matrix#[]. Notice that the instance method [] is documented in Object.
sum takes a block, which is really simple in this case:
arr = [[10, 0], [3, 5], [5, 8]]
p arr.sum{|on, off| on - off} # => 5
So you were very close.

Can someone explain in detail how mapping this array works step by step

The original array: [ [3, 5], [9, 6], [3,1] ]
1) Make the first number in each mini array the smaller number and the second number in each mini array the bigger number
- [ [3, 5], [9, 6], [3,1] ] -> [[ 3,5], [6,9], [1,3]]
2) Sort the mini arrays by first number.
- [ [3,5], [6,9], [1,3] ] -> [ [1,3], [3,5], [6,9] ]
So [ [3, 5], [9, 6], [3,1] ] -> [ [1,3], [3,5], [6,9] ] by the end of the sorting transformation.
Can someone explain in a step by step, detailed, clear, concise way how to use array.map() to make this happen?
Use map to put the two numbers of each nested array in increasing order, then sort by the first number of each nested array.
const arr = [
  [3, 5],
  [9, 6],
  [3, 1]
];
const sortedArr = arr
  .map(([num1, num2]) => num2 < num1 ? [num2, num1] : [num1, num2])
  .sort(([arr1num1], [arr2num1]) => arr1num1 - arr2num1);
console.log(sortedArr);
-- Edit--
Result of map
[[3, 5], [6, 9], [1, 3]];
In the sort function, I'm using array destructuring (see https://javascript.info/destructuring-assignment). Basically, I extract only the first element of each array and store it in a separate variable: in our case, arr1num1 and arr2num1.
Example-
const arr = [3, 5];
// Extract the first element of the array and save it in the variable arr1num1.
// The second element (5) is not needed, so it is not extracted.
const [arr1num1] = arr; // arr1num1 now has the value 3
console.log(arr1num1);
// If the second element is also required, the statement becomes const [num1, num2] = arr; where num1 -> 3 and num2 -> 5
An alternative to the above is to access the array elements directly by index.
const arr = [
  [3, 5],
  [9, 6],
  [3, 1]
];
const sortedArr = arr
  .map(([num1, num2]) => num2 < num1 ? [num2, num1] : [num1, num2])
  .sort((arr1, arr2) => arr1[0] - arr2[0]);
console.log(sortedArr);
To learn more about sort, see - Sorting
How sorting works (a simplified illustration; the order in which the engine actually compares elements depends on its sort algorithm):
In the first step, [3, 5] and [6, 9] are compared.
arr1num1 -> 3
arr2num1 -> 6
The expression arr1num1 - arr2num1 evaluates to 3 - 6 = -3. A positive result would mean the first array belongs after the second, so they would be swapped. Since 3 is not greater than 6, the result is negative and no swap takes place.
In the second step, [6, 9] and [1, 3] are compared.
arr1num1 -> 6
arr2num1 -> 1
Since 6 is greater than 1, the result is positive and a swap takes place.
Before swapping, the array is `[[3, 5], [6, 9], [1, 3]]`.
After swapping, the array is `[[3, 5], [1, 3], [6, 9]]`.
This process continues until the array is sorted.

How to find indices of max n elements in array in stable order

I have a number and an array:
n = 4
a = [0, 1, 2, 3, 3, 4]
I want to find the indices corresponding to the maximal n elements of a in the reverse order of the element size, and in stable order when the element sizes are equal. The expected output is:
[5, 3, 4, 2]
This code:
a.each_with_index.max(n).map(&:last)
# => [5, 4, 3, 2]
gives the right indices, but changes the order.
Code
def max_with_order(arr, n)
  arr.each_with_index.max_by(n) { |x,i| [x,-i] }.map(&:last)
end
Examples
a = [0,1,2,3,3,4]
max_with_order(a, 1) #=> [5]
max_with_order(a, 2) #=> [5, 3]
max_with_order(a, 3) #=> [5, 3, 4]
max_with_order(a, 4) #=> [5, 3, 4, 2]
max_with_order(a, 5) #=> [5, 3, 4, 2, 1]
max_with_order(a, 6) #=> [5, 3, 4, 2, 1, 0]
Explanation
For n = 3 the steps are as follows.
b = a.each_with_index
#=> #<Enumerator: [0, 1, 2, 3, 3, 4]:each_with_index>
We can convert b to an array to see the (six) values it will generate and pass to the block.
b.to_a
#=> [[0, 0], [1, 1], [2, 2], [3, 3], [3, 4], [4, 5]]
Continuing,
c = b.max_by(n) { |x,i| [x,-i] }
#=> [[4, 5], [3, 3], [3, 4]]
c.map(&:last)
#=> [5, 3, 4]
Note that the elements of arr need not be numeric, merely comparable.
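As a small illustration of the "merely comparable" remark, the same method can be applied to strings. The words array is invented for this sketch; max_with_order is repeated from the answer only so that the snippet runs on its own.

```ruby
# max_with_order is copied from the answer so the sketch is self-contained.
def max_with_order(arr, n)
  arr.each_with_index.max_by(n) { |x, i| [x, -i] }.map(&:last)
end

words = %w[pear apple pear fig] # illustrative data, not from the question

# "pear" is the largest string and appears at indices 0 and 2;
# the lower index comes first, followed by "fig" at index 3.
p max_with_order(words, 3) # prints [0, 2, 3]
```

This works because the block returns [x, -i] arrays, which Ruby compares element by element, so any values of x that support <=> with each other can be used.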
You can supply a block to max to make the determination more specific like so
a.each_with_index.max(n) do |a,b|
  if a[0] == b[0] # the numbers are the same
    b[1] <=> a[1] # compare the indexes in reverse
  else
    a[0] <=> b[0] # compare the numbers themselves
  end
end.map(&:last)
#=> [5,3,4,2]
The block given to max must return a comparison result (-1, 0 or 1). Here, if the numbers are the same, the indexes are compared in reverse order: for the elements at indexes 4 and 3 the block evaluates 3 <=> 4 #=> -1, and the -1 indicates that the element at index 4 counts as smaller, so it is placed after the one at index 3.
Also, to expand on @CarySwoveland's answer (which I am a bit jealous I did not think of): since you only care about returning the indices, we could implement it as follows without a secondary map
a.each_index.max_by(n) { |x| [a[x],-x] }
#=> [5,3,4,2]
@compsy, you wrote "without changing the order", so it would be:
a = [0,1,2,3,3,4]
n = a.max
i = 0
a.each do |x|
  break if x == n
  i += 1
end
I use the variable i as the index. When x (the value being examined) equals n, break stops the each loop, so i keeps its last value, which is the position of the max value in the array. Be aware that i is one less than the natural (1-based) position, because array indices start at 0, not 1.
I break out of each because there is no need to keep checking the remaining values once the position has been found.
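The loop above can likely be condensed with Array#index, which returns the index of the first element equal to its argument. This is only a sketch of an equivalent, not part of the original answer, and like the loop it finds a single position rather than the top n.

```ruby
a = [0, 1, 2, 3, 3, 4]

# Array#index returns the 0-based index of the first element equal to
# its argument, so this is the position of the first occurrence of the maximum.
i = a.index(a.max)

p i # prints 5
```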

Prevent identical pairs when shuffling and slicing Ruby array

I'd like to prevent producing pairs with the same items when producing a random set of pairs in a Ruby array.
For example:
[1,1,2,2,3,4].shuffle.each_slice(2).to_a
might produce:
[[1, 1], [3, 4], [2, 2]]
I'd like to be able to ensure that it produces a result such as:
[[4, 1], [1, 2], [3, 2]]
Thanks in advance for the help!
arr = [1,1,2,2,3,4]
loop do
  sliced = arr.shuffle.each_slice(2).to_a
  break sliced if sliced.none? { |a| a.reduce(:==) }
end
Here are three ways to produce the desired result (not including the approach of sampling repeatedly until a valid sample is found). The following array will be used for illustration.
arr = [1,4,1,2,3,2,1]
Use Array#combination and Array#sample
If pairs sampled were permitted to have the same number twice, the sample space would be
arr.combination(2).to_a
#=> [[1, 4], [1, 1], [1, 2], [1, 3], [1, 2], [1, 1], [4, 1], [4, 2],
# [4, 3], [4, 2], [4, 1], [1, 2], [1, 3], [1, 2], [1, 1], [2, 3],
# [2, 2], [2, 1], [3, 2], [3, 1], [2, 1]]
The pairs containing the same value twice (here [1, 1] and [2, 2]) are not wanted, so they are simply removed from the above array.
sample_space = arr.combination(2).reject { |x,y| x==y }
#=> [[1, 4], [1, 2], [1, 3], [1, 2], [4, 1], [4, 2], [4, 3],
# [4, 2], [4, 1], [1, 2], [1, 3], [1, 2], [2, 3], [2, 1],
# [3, 2], [3, 1], [2, 1]]
We evidently are to sample arr.size/2 elements from sample_space. Depending on whether this is to be done with or without replacement we would write
sample_space.sample(arr.size/2)
#=> [[4, 3], [1, 2], [1, 3]]
for sampling without replacement and
Array.new(arr.size/2) { sample_space.sample }
#=> [[1, 3], [4, 1], [2, 1]]
for sampling with replacement.
Sample elements of each pair sequentially, Method 1
This method, like the next, can only be used to sample with replacement.
Let's first consider sampling a single pair. We could do that by selecting the first element of the pair randomly from arr, removing all instances of that element from arr and then sampling the second element from what's left of arr.
def sample_one_pair(arr)
  first = arr.sample
  [first, (arr - [first]).sample]
end
To draw a sample of arr.size/2 pairs we then execute the following.
Array.new(arr.size/2) { sample_one_pair(arr) }
#=> [[1, 2], [4, 3], [1, 2]]
Sample elements of each pair sequentially, Method 2
This method is a very fast way of sampling large numbers of pairs with replacement. Like the previous method, it cannot be used to sample without replacement.
First, compute the cdf (cumulative distribution function) for drawing an element of arr at random.
counts = arr.group_by(&:itself).transform_values { |v| v.size }
#=> {1=>3, 4=>1, 2=>2, 3=>1}
def cdf(sz, counts)
  frac = 1.0/sz
  counts.each_with_object([]) { |(k,v),a|
    a << [k, frac * v + (a.empty? ? 0 : a.last.last)] }
end
cdf_first = cdf(arr.size, counts)
#=> [[1, 0.429], [4, 0.571], [2, 0.857], [3, 1.0]]
This means that there is a probability of 0.429 (rounded) of randomly drawing a 1, 0.571 of drawing a 1 or a 4, 0.857 of drawing a 1, 4 or 2 and 1.0 of drawing one of the four numbers. We can therefore randomly sample a number from arr by obtaining a (pseudo-)random number between zero and one (p = rand) and then finding the first element [n, q] of cdf_first for which p <= q:
def draw_random(cdf)
  p = rand
  cdf.find { |n,q| p <= q }.first
end
draw_random(cdf_first) #=> 1
draw_random(cdf_first) #=> 4
draw_random(cdf_first) #=> 1
draw_random(cdf_first) #=> 1
draw_random(cdf_first) #=> 2
draw_random(cdf_first) #=> 3
In simulation models, incidentally, this is the standard way of generating pseudo-random variates from discrete probability distributions.
Before drawing the second random number of the pair we need to modify cdf_first to reflect the fact that the first number cannot be drawn again. Assuming there will be many pairs to generate randomly, it is most efficient to construct a hash cdf_second whose keys are the first values drawn randomly for the pair and whose values are the corresponding cdf's.
cdf_second = counts.keys.each_with_object({}) { |n, h|
  h[n] = cdf(arr.size - counts[n], counts.reject { |k,_| k==n }) }
#=> {1=>[[4, 0.25], [2, 0.75], [3, 1.0]],
# 4=>[[1, 0.5], [2, 0.833], [3, 1.0]],
# 2=>[[1, 0.6], [4, 0.8], [3, 1.0]],
# 3=>[[1, 0.5], [4, 0.667], [2, 1.0]]}
If, for example, a 2 is drawn for the first element of the pair, the probability is 0.6 of drawing a 1 for the second element, 0.8 of drawing a 1 or 4 and 1.0 of drawing a 1, 4, or 3.
We can then sample one pair as follows.
def sample_one_pair(cdf_first, cdf_second)
  first = draw_random(cdf_first)
  [first, draw_random(cdf_second[first])]
end
As before, to sample arr.size/2 pairs with replacement, we execute
Array.new(arr.size/2) { sample_one_pair(cdf_first, cdf_second) }
#=> [[2, 1], [3, 2], [1, 2]]
With replacement, you may get results like:
unique_pairs([1, 1, 2, 2, 3, 4]) # => [[4, 1], [1, 2], [1, 3]]
Note that 1 gets chosen three times, even though it's only in the original array twice. This is because the 1 is "replaced" each time it's chosen. In other words, it's put back into the collection to potentially be chosen again.
Here's a version of Cary's excellent sample_one_pair solution without replacement:
def unique_pairs(arr)
  dup = arr.dup
  Array.new(dup.size / 2) do
    dup.shuffle!
    first = dup.pop
    second_index = dup.rindex { |e| e != first }
    raise StopIteration unless second_index
    second = dup.delete_at(second_index)
    [first, second]
  end
rescue StopIteration
  retry
end
unique_pairs([1, 1, 2, 2, 3, 4]) # => [[4, 3], [1, 2], [2, 1]]
This works by creating a copy of the original array and deleting elements out of it as they're chosen (so they can't be chosen again). The rescue/retry is in there in case it becomes impossible to produce the correct number of pairs. For example, if [1, 3] is chosen first, and [1, 4] is chosen second, it becomes impossible to make three unique pairs because [2, 2] is all that's left; the sample space is exhausted.
This should be slower than Cary's solution (with replacement) but faster (on average) than the posted solutions (without replacement) that require looping and retrying. Welp, chalk up another point for "always benchmark!" I was wrong about almost all of my assumptions. Here are the results on my machine with an array of 16 numbers ([1, 1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7, 8, 9, 9, 10]):
cary_with_replacement
93.737k (± 2.9%) i/s - 470.690k in 5.025734s
mwp_without_replacement
187.739k (± 3.3%) i/s - 943.415k in 5.030774s
mudasobwa_without_replacement
129.490k (± 9.4%) i/s - 653.150k in 5.096761s
EDIT: I've updated the above solution to address Stefan's numerous concerns. In hindsight, the errors are obvious and embarrassing! On the plus side, the revised solution is now faster than mudasobwa's solution, and I've confirmed that the two solutions have the same biases.
You can check whether there are any matches and shuffle again:
a = [1,1,2,2,3,4]
# first shuffle
sliced = a.shuffle.each_slice(2).to_a
# check whether there are matches and reshuffle if there are
while sliced.combination(2).any? { |a, b| a.sort == b.sort } do
  sliced = a.shuffle.each_slice(2).to_a
end
It is unlikely, but be aware of the possibility of an infinite loop.
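One way to guard against that infinite loop is to cap the number of reshuffles. The sketch below is an assumption-laden illustration, not part of any answer: pair_up and max_attempts are hypothetical names, and the check used is the one from the question (no pair may hold the same item twice).

```ruby
# Hypothetical helper: pair_up and max_attempts are names made up for this sketch.
# Returns an array of pairs in which no pair holds the same item twice,
# or nil if no such arrangement turned up within max_attempts shuffles.
def pair_up(items, max_attempts: 1_000)
  max_attempts.times do
    pairs = items.shuffle.each_slice(2).to_a
    return pairs if pairs.none? { |x, y| x == y }
  end
  nil
end

p pair_up([1, 1, 2, 2, 3, 4]) # e.g. [[4, 1], [1, 2], [3, 2]] (random)
p pair_up([1, 1, 1, 1])       # prints nil, since every pair would be [1, 1]
```

Returning nil lets the caller decide what to do when the input cannot be paired, instead of spinning forever.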

ruby array subrange when that range is a variable

Is it possible to apply a sub range to an array in ruby like this:
> array = [4, 3, 2, 1]
> array[0...2]
=> [4, 3]
if the [0...2] is stored in a variable? I can't seem to get a syntax to give me what I want. What replaces the <?> in the following, if anything?
> array = [4, 3, 2, 1]
> range = [0...2]
> array<?>
=> [4, 3]
Yes, sure! Do it this way:
array = [4, 3, 2, 1]
exclusive_range = [0...2] # Will get 0th and 1st element of the array
inclusive_range = [0..2] # Will get 0th, 1st and 2nd element of the array
array[exclusive_range.first]
# => [4, 3]
array[inclusive_range.first]
# => [4, 3, 2]
If you want to avoid .first call, you can put your range in a variable (Not in an array):
range = 0...2
array[range]
# => [4, 3]
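To make the difference between a range and a range wrapped in an array concrete, here is a minimal sketch. The variable names exclusive, inclusive and wrapped are chosen for illustration only.

```ruby
array = [4, 3, 2, 1]

exclusive = 0...2 # three dots: the end index 2 is excluded
inclusive = 0..2  # two dots: the end index 2 is included

p array[exclusive] # prints [4, 3]
p array[inclusive] # prints [4, 3, 2]

# Wrapping the range in brackets creates a one-element array holding the range,
# which is why the question's `range = [0...2]` cannot be used as an index directly.
wrapped = [0...2]
p wrapped.first == exclusive # prints true
```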
Note that (0..2).size #=> 3. If you want to return [4,3] you want:
range = 0..1
You could use it like this:
array[range] #=> [4, 3]
or like this:
array.values_at *range #=> [4, 3]
