Custom sort using spaceship operator in ruby [duplicate] - arrays

I am implementing a custom sort. There is the spaceship operator <=> to deal with sorting an array:
myArray.sort { |a, b| a <=> b }
a <=> b returns 1 when a is larger than b, and the two elements are swapped.
a <=> b returns 0 when a equals b, and the two elements stay in their original positions.
a <=> b returns -1 when a is less than b, and the two elements stay in their original positions.
So I tested with an example:
myArray = [2, 1, 7, 9, 3, 8, 0]
myArray.sort { |a, b| 1 <=> 1 } # making it always return 0
#=> [9, 1, 7, 2, 3, 8, 0]
The result is not what I expect. In my expectation, when the spaceship operator returns 0, every element would stay in the original position. Since the spaceship operator always returns 0 in the example above, the array should remain as is. However, the result is different from the original array.
Is there something I misunderstand?
Updated:
The following is the idea my question above came from.
Originally I was trying to order objects by one of their attributes (let's assume the attribute is status).
For example
myObjects = [obj1, obj2,..., objn] # the objects are ordered by created_time already
myObjects.sort { |a, b|
  case a.status
  when 'failed'
    x = 1
  when 'success'
    x = 0
  when 'pending'
    x = -1
  end
  case b.status
  when 'failed'
    y = 1
  when 'success'
    y = 0
  when 'pending'
    y = -1
  end
  x <=> y # compare two objects (a and b) by the status
}
myObjects is ordered by created_time already, but I want to sort it again by each object's status.
For instance,
Two objects with the same created time (taking only hours and minutes into account and ignoring seconds) will be sorted again by their status, so that objects with the failed status are put at the end of the array.
The x and y values in the code above depend on each object's status,
and x and y are compared to decide the order. If the statuses of two objects are equal (x == y), the elements should stay in the same position because they are already sorted by created_time; there is no need to sort them again.
When the statuses of two objects are both success, x <=> y returns 0.
But according to some comments, a comparison value of 0 returned by the spaceship operator seems to produce an unpredictable order.
What if myObjects contains only elements with the same status? That would make the spaceship operator return 0 every time, since x == y.
I would expect myObjects to keep the same order since the statuses are all the same. How should I correct my use of the spaceship operator in this case?
Many thanks to everyone's help!

Your assumption about how the sorting works is incorrect. As per the documentation #sort is not stable:
The result is not guaranteed to be stable. When the comparison of two elements returns 0, the order of the elements is unpredictable.

Array#sort uses the Quicksort algorithm, which is not stable and can produce this behaviour when elements are "equal".
The reason lies in how the pivot element is chosen and moved at every step. The Ruby implementation seems to choose the pivot in the middle for this case (but it could be chosen differently).
This is what happens in your example:
The pivot is chosen at element 9, in the middle of the array.
The algorithm then ensures that items to the left of the pivot are less than it and items to the right are greater or equal. Because all items are "equal", everything ends up in the right part.
The same steps are repeated recursively for the left partition (always empty in this case) and the right partition.
The result is sorted_left + [pivot] + sorted_right. The left part is empty, so the pivot is moved to the front.
Ruby core documentation mentions this:
When the comparison of two elements returns 0, the order of the elements is unpredictable.
Also, the spaceship operator <=> does not play any role here; you could just call myArray.sort{0} for the same effect.
Update:
From the updated question it's clear that you want to sort by two attributes. This can be done in several ways:
Method 1: you can invent a metric/key that takes both values into account and sort by it:
status_order = { 'success' => 1, 'failed' => 2, 'pending' => 3 }
myObjects.sort_by{|o| "#{status_order[o.status]}_#{o.created_time}" }
This is not the fastest option, because a string key is built for every element, but it is shorter.
Method2: implicit composite key by writing comparison rule like this:
status_order = { 'success' => 1, 'failed' => 2, 'pending' => 3 }
status_order.default = 0
myObjects.sort { |a, b|
  if a.status == b.status
    a.created_time <=> b.created_time
  else
    status_order[a.status] <=> status_order[b.status]
  end
}
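If the goal is only to keep the existing order among elements with equal status, another option is to use each element's original index as the tie-breaker. The sketch below is one way to do that, not the only one. The Job struct, the STATUS_RANK table and the sample data are hypothetical stand-ins for the asker's objects; the rank values follow the question's mapping (pending first, failed last).

```ruby
# Hypothetical record type standing in for the asker's objects.
Job = Struct.new(:name, :status, :created_time)

# Hypothetical ranking, taken from the question: pending < success < failed.
STATUS_RANK = { 'pending' => -1, 'success' => 0, 'failed' => 1 }.freeze

# Sample data, assumed to be ordered by created_time already.
jobs = [
  Job.new('a', 'failed',  1),
  Job.new('b', 'success', 1),
  Job.new('c', 'success', 2),
  Job.new('d', 'pending', 2),
  Job.new('e', 'success', 3)
]

# Sort by [rank, original index]. The index is unique, so no two keys
# compare as equal and elements with the same status keep their order.
stable = jobs.each_with_index
             .sort_by { |job, i| [STATUS_RANK.fetch(job.status), i] }
             .map(&:first)

stable.map(&:name) # => ["d", "b", "c", "e", "a"]
```

Because the keys never tie, the instability of the underlying sort no longer matters.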

How it works
myArray = [2, 1, 7, 9, 3, 8, 0]
myArray.sort { |a, b| a <=> b }
#=> [0, 1, 2, 3, 7, 8, 9]
myArray.sort { |a, b| b <=> a }
#=> [9, 8, 7, 3, 2, 1, 0]
If the result of the comparison is always 0, it is impossible to sort the elements. That is quite logical.
However, the documentation explicitly states that the order of the elements is unpredictable in this case. That's why the result can differ from the original array.
That said, I replicated your situation in Ruby 2.5.1 and it returned the original array:
myArray = [2, 1, 7, 9, 3, 8, 0]
myArray.sort { |a, b| 1 <=> 1 }
#=> [2, 1, 7, 9, 3, 8, 0]
There's also a misunderstanding in your code. You wrote
myArray #=> [9, 1, 7, 2, 3, 8, 0]
But in fact Array#sort doesn't modify the array; it returns a new one. Only Array#sort! sorts in place.
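A minimal illustration of the difference between the two methods (the variable names are made up for the example):

```ruby
my_array = [2, 1, 7, 9, 3, 8, 0]

# sort returns a new, sorted array and leaves the receiver untouched.
sorted_copy = my_array.sort

# sort! reorders the receiver itself, so work on a copy here
# to keep my_array available for comparison.
in_place = my_array.dup
in_place.sort!

my_array    # => [2, 1, 7, 9, 3, 8, 0]
sorted_copy # => [0, 1, 2, 3, 7, 8, 9]
in_place    # => [0, 1, 2, 3, 7, 8, 9]
```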

Related

How to get the indices of subarray in large array

I have two arrays as follows:
a = [1,2,3,4,5,6,7,8,9,10]
b = [3,5,8,10,11]
For each number in b that is present in a, I want to find its index in a. The expected output is:
res = [2,4,7,9]
I have done as follows:
res_array = []
b.each do |element|
  index = a.find_index(element)
  res_array << index unless index.nil? # skip numbers that are not in a
end
res_array
But I think there is a better approach to do this.
If performance matters (i.e. if your arrays are huge), you can build a hash of all number-index pairs in a, using each_with_index and to_h:
a.each_with_index.to_h
#=> {1=>0, 2=>1, 3=>2, 4=>3, 5=>4, 6=>5, 7=>6, 8=>7, 9=>8, 10=>9}
A hash allows fetching the values (i.e. indices) for the numbers in b much faster (as opposed to traversing an array each time), e.g. via values_at:
a.each_with_index.to_h.values_at(*b)
#=> [2, 4, 7, 9, nil]
Use compact to eliminate nil values:
a.each_with_index.to_h.values_at(*b).compact
#=> [2, 4, 7, 9]
or alternatively slice and values:
a.each_with_index.to_h.slice(*b).values
#=> [2, 4, 7, 9]
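One caveat worth checking if a can contain duplicates: to_h keeps the last value stored for a repeated key, so the hash built this way maps each number to its last index, whereas Array#index returns the first. The small sketch below uses a made-up array to show the difference, plus one possible way to get first indices from a hash.

```ruby
a = [1, 2, 3, 8, 5, 8] # hypothetical input with a repeated 8

# Later pairs overwrite earlier ones, so 8 maps to its last index.
last_index = a.each_with_index.to_h
last_index[8] # => 5

# Array#index scans from the front and returns the first match.
a.index(8) # => 3

# Reversing the pairs before to_h makes the earliest pair win instead.
first_index = a.each_with_index.to_a.reverse.to_h
first_index[8] # => 3
```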
b.map { |e| a.index(e) }.compact
#⇒ [2, 4, 7, 9]
or, more concise:
b.map(&a.method(:index)).compact
Here is another simpler solution,
indxs = a.each_with_index.to_h
(a&b).map{|e| indxs[e]}
All the answers so far traverse all of a once (@Stefan's) or traverse all or part of a b.size times. My answer traverses part or all of a once. It is relatively efficient when a is large, b is small relative to a, and all elements in b appear in a.
My solution is particularly efficient when a is ordered in such a way that the elements of b typically appear towards the beginning of a. For example, a might be a list of last names sorted by decreasing frequency of occurrence in the general population (e.g., ['smith', 'jones',...]) and b a list of names to look up in a.
a and b may contain duplicates[1] and not all elements of b are guaranteed to be in a. I assume b is not empty.
Code
require 'set'
def lookup_index(a, b)
b_set = b.to_set
b_hash = {}
a.each_with_index do |n,i|
next unless b_set.include?(n)
b_hash[n] = i
b_set.delete(n)
break if b_set.empty?
end
b_hash.values_at(*b)
end
I converted b to a set to make lookups comparable in speed to hash lookups (which should not be surprising considering that sets are implemented with an underlying hash). Hash lookups are very fast, of course.
Examples
a = [1,2,3,4,5,6,7,8,9,10,8]
b = [3,5,8,10,11,5]
Note that in this example both a and b contain duplicates and 11 in b is not present in a.
lookup_index(a, b)
#=> [2, 4, 7, 9, nil, 4]
Observe the array returned contains the index 4 twice, once for each 5 in b. Also, the array contains nil at index 4 to show that it is b[4] #=> 11 that does not appear in a. Without the nil placeholder there would be no means to map the elements of b to indices in a. If, however, the nil placeholder is not desired, one may replace b_hash.values_at(*b) with b_hash.values_at(*b).compact, or, if duplicates are unwanted, with b_hash.values_at(*b).compact.uniq.
As a second example suppose we are given the following.
a = [*1..10_000]
b = 10.times.map { rand(100) }.shuffle
#=> [30, 62, 36, 24, 41, 27, 83, 61, 15, 55]
lookup_index(a, b)
#=> [29, 61, 35, 23, 40, 26, 82, 60, 14, 54]
Here the solution was found after the first 83 elements of a were enumerated.
[1] My solution would be no more efficient if duplicates were not permitted in a and/or b.

Sort starting with different order

In the following, the results are the same:
[3, 5].sort{|a, b| b <=> a}
[5, 3].sort{|a, b| b <=> a}
I would like to know what happened internally and how it depends on input array.
The first line:
[3, 5].sort { |a, b| b <=> a }
Invokes the block with a = 3 and b = 5. It returns the result of 5 <=> 3 which is 1. An integer greater than 0 tells sort that a follows b. The result is therefore [5, 3].
The second line:
[5, 3].sort { |a, b| b <=> a }
Invokes the block with a = 5 and b = 3. It returns the result of 3 <=> 5 which is -1. An integer less than 0 tells sort that b follows a. The result is therefore (again) [5, 3].
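To see this for yourself, the block can record what it was given. The helper below (traced_sort is a hypothetical name) is only a sketch: which pairs the block receives, and in what order, is an implementation detail, so only the sorted result should be relied on.

```ruby
# Sorts in descending order and records every comparison the block performs.
def traced_sort(array)
  comparisons = []
  result = array.sort do |a, b|
    outcome = b <=> a # positive: a follows b; negative: b follows a
    comparisons << [a, b, outcome]
    outcome
  end
  [result, comparisons]
end

result1, calls1 = traced_sort([3, 5])
result2, calls2 = traced_sort([5, 3])

result1 # => [5, 3]
result2 # => [5, 3]
# calls1 and calls2 hold [a, b, outcome] triples for inspection.
```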
Because you are sorting an array, changing the order of the array's elements does not change the sorting result.
This is the whole point of sorting after all: to get a sorted result regardless of the initial array's ordering.
To change the result, you will want to change the sorting rule, not the array.
The output is the same regardless of the input order because you are sorting the array.
If you want to sort in the opposite order, write
[3,5].sort{|a,b| a <=> b}

Ruby sorting even and odd numbers issue

I'm learning Ruby and just started with sorting. I'm trying to sort the array so that the result looks like this: [1,3,5,2,4,6], and I don't really understand what is wrong with the code. Any help would be appreciated!
[1,2,3,4,5,6].sort do |x,y|
if x.odd? and y.odd?
0
elsif x.odd?
-1
else
1
end
if (x.odd? && y.odd?) or (x.even? && y.even?)
x <=> y
end
end
First off, let's fix your indentation (and convert to standard Ruby community coding style), so that we can better see what's going on:
[1, 2, 3, 4, 5, 6].sort do |x, y|
  if x.odd? && y.odd?
    0
  elsif x.odd?
    -1
  else
    1
  end
  if (x.odd? && y.odd?) || (x.even? && y.even?)
    x <=> y
  end
end
Now, the problem becomes obvious: your first conditional expression evaluates to 0, -1, or 1, but nothing is being done with this value. The value is not stored in a variable, not passed as an argument, not returned. It is simply ignored. The entire expression is a NO-OP.
Which means that the only thing that matters is this:
  if (x.odd? && y.odd?) || (x.even? && y.even?)
    x <=> y
  end
This will return 0 for two elements that are equal, -1 or 1 for two elements that are unequal but both odd or both even, and nil (which to sort means "these two elements are un-comparable, they don't have a defined relative ordering") for elements where one element is odd and one is even. Since sort requires all elements to be comparable, it will then abort.
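A small sketch of that failure mode. The helper same_parity_compare is a hypothetical name that mirrors the second conditional; in the Ruby versions I am aware of, sort raises ArgumentError when the block returns nil.

```ruby
# Mirrors the block's effective logic: compare only when both numbers
# have the same parity, otherwise return nil (i.e. "not comparable").
def same_parity_compare(x, y)
  (x.odd? == y.odd?) ? (x <=> y) : nil
end

same_parity_compare(1, 3) # => -1
same_parity_compare(2, 2) # => 0
same_parity_compare(1, 2) # => nil

# Sorting a mixed-parity array with this rule aborts.
error = begin
  [1, 2].sort { |x, y| same_parity_compare(x, y) }
  nil
rescue ArgumentError => e
  e
end

error.class # => ArgumentError
```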
The easiest way to approach this problem would probably be to partition the array into odds and evens, sort them separately, and then concatenate them:
[1, 2, 3, 4, 5, 6].partition(&:odd?).map(&:sort).inject(:concat)
#=> [1, 3, 5, 2, 4, 6]
Or do it the other way round: just sort them all, and then partition (thanks @Eric Duminil):
[1, 2, 3, 4, 5, 6].sort.partition(&:odd?).inject(:concat)
#=> [1, 3, 5, 2, 4, 6]
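If a single comparison block is still preferred, one possible repair of the original block is to make sure every branch yields the block's return value. This is a sketch of that idea rather than a recommended style; the partition approach above is shorter.

```ruby
sorted = [3, 2, 1, 5, 6, 4].sort do |x, y|
  if x.odd? == y.odd?
    x <=> y # same parity: fall back to the numeric order
  elsif x.odd?
    -1      # x is odd, y is even: x comes first
  else
    1       # x is even, y is odd: y comes first
  end
end

sorted # => [1, 3, 5, 2, 4, 6]
```

Since the if expression is the last expression in the block, its value is what sort receives, and no pair of elements ever produces nil.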
It's probably the first time I ever used a negative modulo:
i % -2 is -1 if i is odd
i % -2 is 0 if i is even
So sorting by i % -2 first and then by i should achieve the desired result.
If you want even numbers before odd numbers, you can sort by i % 2.
[3, 2, 1, 5, 6, 4].sort_by{ |i| [ i % -2, i] }
#=> [1, 3, 5, 2, 4, 6]
Thanks to @Stefan for his original idea!
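The remainders can be checked directly. In Ruby the result of % takes the sign of the divisor, which is why odd numbers give -1 here. The variable names are made up for the example.

```ruby
remainders = [1, 2, 3, 4].map { |i| i % -2 }
remainders # => [-1, 0, -1, 0]

numbers = [3, 2, 1, 5, 6, 4]

# Odd numbers get key -1 and therefore sort before even numbers (key 0).
odds_first = numbers.sort_by { |i| [i % -2, i] }
odds_first # => [1, 3, 5, 2, 4, 6]

# With a positive divisor, even numbers get key 0 and come first.
evens_first = numbers.sort_by { |i| [i % 2, i] }
evens_first # => [2, 4, 6, 1, 3, 5]
```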

Modifying an array during iteration

Consider the excerpt:
a = [1, 2, 3, 4, 5]
a.each { |e| a.shift ; p e ; p a }
that outputs:
1
[2, 3, 4, 5]
3
[3, 4, 5]
5
[4, 5]
It reveals that the implementation of each is done in terms of an index (1 is the element at position 0 when printed, 3 is the element at 1 when printed, and 5 is the element at position 2 when printed).
An alternative would be to print 1, 2, 3.
Is this behaviour intended? Or is it just implementation detail, and it is possible that someday Array gets reimplemented and this behavior may change?
Yes, this is a well-known behaviour.
It is likely that the behaviour exists for implementation reasons such as efficiency.
But whether or not it is just an implementation detail does not matter much. Ruby developers are careful not to break backward compatibility, and it is very unlikely that such basic behaviour will be altered. Even if they were to change it, there would be a reasonably long transition period.
Regardless of that, modifying an array during iteration is usually bad for readability. In the relevant cases, you should probably duplicate the array.
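The sketch below first reproduces the observed output with an explicit index loop, then shows the duplication approach. The index loop is only a rough model that happens to match the output in the question; it is not a claim about how each is actually implemented.

```ruby
# Rough model: an index-based walk over an array that shrinks from the front.
a = [1, 2, 3, 4, 5]
visited_with_index = []
i = 0
while i < a.size
  element = a[i]
  a.shift # removes the first element, shifting the rest left
  visited_with_index << element
  i += 1
end
visited_with_index # => [1, 3, 5]

# Iterating over a copy visits every original element,
# no matter how the original array is modified meanwhile.
b = [1, 2, 3, 4, 5]
visited_with_dup = []
b.dup.each do |element|
  b.shift
  visited_with_dup << element
end
visited_with_dup # => [1, 2, 3, 4, 5]
b                # => []
```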

Vectorizing Matlab replace array values from start to end

I have an array in which I want to replace values at a known set of indices with the value immediately preceding it. As an example, my array might be
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0];
and the indices of values to be replaced by previous values might be
y = [2, 3, 8];
I want this replacement to occur from left to right, or else start to finish. That is, the value at index 2 should be replaced by the value at index 1, before the value at index 3 is replaced by the value at index 2. The result using the arrays above should be
[1, 1, 1, 4, 5, 6, 7, 7, 9, 0]
However, if I use the obvious method to achieve this in Matlab, my result is
>> x(y) = x(y-1)
x =
1 1 2 4 5 6 7 7 9 0
Hopefully you can see that this operation was performed right to left and the value at index 3 was replaced by the value at index 2, then 2 was replaced by 1.
My question is this: Is there some way of achieving my desired result in a simple way, without brute force looping over the arrays or doing something time consuming like reversing the arrays around?
Well, in practice this is still a loop, but the number of iterations is on the order of the longest run of consecutive indices in y rather than the length of y:
while ~isequal(x(y),x(y-1))
x(y)=x(y-1)
end
Using nancumsum you can achieve a fully vectorized version. Nevertheless, in most cases the solution karakfa provided is probably the one to prefer. Only in extreme cases with long runs of consecutive indices in y is this code faster.
c1=[0,diff(y)==1];
c1(c1==0)=nan;
shift=nancumsum(c1,2,4);
y(~isnan(shift))=y(~isnan(shift))-shift(~isnan(shift));
x(y)=x(y-1)
