Is it safe to delete from an Array inside each? - arrays

Is it possible to safely delete elements from an Array while iterating over it via each? A first test looks promising:
a = (1..4).to_a
a.each { |i| a.delete(i) if i == 2 }
# => [1, 3, 4]
However, I could not find hard facts on:
Whether it is safe (by design)
Since which Ruby version it is safe
At some points in the past, it seems that it was not possible to do:
It's not working because Ruby exits the .each loop when attempting to delete something.
The documentation does not state anything about deletability during iteration.
I am not looking for reject or delete_if. I want to do things with the elements of an array, and sometimes also remove an element from the array (after I've done other things with said element).
Update 1: I was not very clear on my definition of "safe", what I meant was:
do not raise any exceptions
do not skip any element in the Array

You should not rely too much on unauthoritative answers. The answer you cited is wrong, as is pointed out by Kevin's comment on it.
It is safe (and has been since the beginning of Ruby) to delete elements from an Array while iterating over it with each, in the sense that Ruby will not raise an error for doing that and will give a deterministic (i.e., not random) result.
However, you need to be careful: when you delete an element, the elements following it are shifted down by one. The element that was supposed to be iterated next therefore moves into the position of the deleted element, which has already been visited, and it will be skipped.
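A minimal sketch of the skipping described above, as observed on MRI. The visited array is an illustration-only name used to record which elements the block actually saw:

```ruby
a = [1, 2, 3, 4]
visited = []

a.each do |i|
  visited << i             # record every element the block is called with
  a.delete(i) if i == 2    # deleting 2 shifts 3 into the slot just visited
end

p visited  # => [1, 2, 4]  (3 was never passed to the block)
p a        # => [1, 3, 4]
```

No exception is raised, but the element that followed the deleted one is silently skipped.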

In order to answer your question, whether it is "safe" to do so, you will first have to define what you mean by "safe". Do you mean
1. it doesn't crash the runtime?
2. it doesn't raise an Exception?
3. it does raise an Exception?
4. it behaves deterministically?
5. it does what you expect it to do? (What do you expect it to do?)
Unfortunately, the Ruby Language Specification is not exactly helpful:
15.2.12.5.10 Array#each
each(&block)
Visibility: public
Behavior:
If block is given:
For each element of the receiver in the indexing order, call block with the element as the only argument.
Return the receiver.
This seems to imply that it is indeed completely safe in the sense of 1., 2., 4., and 5. above.
The documentation says:
each { |item| block } → ary
Calls the given block once for each element in self, passing that element as a parameter.
Again, this seems to imply the same thing as the spec.
Unfortunately, none of the currently existing Ruby implementations interpret the spec in this way.
What actually happens in MRI and YARV is the following: the mutation to the array, including any shifting of the elements and/or indices becomes visible immediately, including to the internal implementation of the iterator code which is based on array indices. So, if you delete an element at or before the position you are currently iterating, you will skip the next element, whereas if you delete an element after the position you are currently iterating, you will skip that element. For each_with_index, you will also observe that all elements after the deleted element have their indices shifted (or rather the other way around: the indices stay put, but the elements are shifted).
So, this behavior is "safe" in the sense of 1., 2., and 4.
The other Ruby implementations mostly copy this (undocumented) behavior, but being undocumented, you cannot rely on it, and in fact, I believe at least one did experiment briefly with raising some sort of ConcurrentModificationException instead.
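The index shifting mentioned for each_with_index can be observed with a small sketch like the following. It reflects the MRI behaviour described above, which is undocumented; the seen array is a name introduced only for this illustration:

```ruby
a = %w[a b c d]
seen = []

a.each_with_index do |item, index|
  seen << [item, index]
  a.delete_at(0) if index == 0   # remove the element currently being visited
end

p seen  # => [["a", 0], ["c", 1], ["d", 2]]
p a     # => ["b", "c", "d"]
```

"b" is skipped, and "c" is reported at index 1 even though it started out at index 2.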

I would say that it is safe, based on the following:
2.2.2 :035 > a = (1..4).to_a
=> [1, 2, 3, 4]
2.2.2 :036 > a.each { |i| a.delete(i+1) if i > 1 ; puts i }
1
2
4
=> [1, 2, 4]
I'd infer from this test that Ruby correctly recognises while iterating through the contents that the element "3" has been deleted while element "2" was being processed, otherwise element "4" would also have been deleted.
However,
2.2.2 :040 > a.each { |i| puts i; a.delete(i) if i > 1 ; puts i }
1
1
2
2
4
4
This suggests that after "2" is deleted, the next element processed is whichever is now third in the array, so the element that used to be in third place does not get processed at all. each appears to re-examine the array to find the next element to process on every iteration.
I think that with that in mind, you ought to duplicate the array in your circumstances prior to processing.
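A sketch of that suggestion, assuming a shallow copy is enough for your data: iterate over a.dup and delete from the original, so the iteration never sees the shifting. The visited array is again only for illustration:

```ruby
a = [1, 2, 3, 4]
visited = []

a.dup.each do |i|
  visited << i            # do whatever work is needed with the element
  a.delete(i) if i > 1    # then remove it from the original array
end

p visited  # => [1, 2, 3, 4]  (nothing skipped)
p a        # => [1]
```

Note that Array#delete removes every element equal to its argument, so arrays with duplicates may need delete_at with an index instead.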

It depends.
All .each does without a block is return an enumerator, which holds the collection and a pointer to where it left off. Example:
a = [1,2,3]
b = a.each # => #<Enumerator: [1, 2, 3]:each>
b.next # => 1
a.delete(2)
b.next # => 3
a.clear
b.next # => StopIteration: iteration reached an end
Calling each with a block behaves as if next were called until the iteration reaches its end. So as long as you don't modify any 'future' array records, it should be safe.
However there are so many helpful methods in ruby's Enumerable and Array you really shouldn't ever need to do this.

You are right, in the past it was advised not to remove items from a collection while iterating over it. In my tests, at least with version 1.9.3, doing this on an array causes no problems in practice, even when deleting previous or following elements.
It is my opinion that, while you can, you shouldn't.
A clearer and safer approach is to reject the elements and assign the result to a new array.
b = a.reject{ |i| i == 2 } #[1, 3, 4]
In case you want to reuse your a array that is also possible
a = a.reject{ |i| i == 2 } #[1, 3, 4]
which has the same effect on a as
a.reject!{ |i| i == 2 } #[1, 3, 4]
You say you don't want to use reject because you want to do other things with the elements before deleting, but that is also possible.
a.reject!{ |i| puts i if i == 2;i == 2 }
# 2
#[1, 3, 4]

Related

How do I use an across loop in post condition to compare an old array and new array at certain indices?

I have a method that shifts all the items in an array to the left by one position. In my postcondition I need to ensure that my items have shifted to the left by one. I have already compared the first element of the old array to the last element of the new array. How do I use an across loop to go through the old array from 2 until count and through the new array from 1 until count-1, and compare them? This is my implementation so far.
items_shifted:
old array.deep_twin[1] ~ array[array.count]
and
across 2 |..| (old array.deep_twin.count) as i_twin all
across 1 |..| (array.count-1) as i_orig all
i_twin.item ~ i_orig.item
end
end
end
I expected the result to be true but instead I get a contract violation pointing to this post condition. I have tested the method out manually by printing out the array before and after the method and I get the expected result.
In the postcondition that fails, the loop cursors i_twin and i_orig iterate over sequences 2 .. array.count and 1 .. array.count - 1 respectively, i.e. their items are indexes 2, 3, ... and 1, 2, .... So, the loop performs comparisons 2 ~ 1, 3 ~ 2, etc. (at run-time, it stops on the first inequality). However, you would like to compare elements, not indexes.
One possible solution is shown below:
items_shifted:
across array as c all
c.item =
if c.target_index < array.upper then
(old array.twin) [c.target_index + 1]
else
old array [array.lower]
end
end
The loop checks that all elements are shifted. If the cursor points to the last element, it compares it against the old first element. Otherwise, it tests whether the current element is equal to the old element at the next index.
Cosmetics:
The postcondition does not assume that the array starts at 1, and uses array.lower and array.upper instead.
The postcondition does not perform a deep twin of the original array. This allows for comparing elements using = rather than ~.
Edit: To avoid potential confusion caused by precedence rules, and to highlight that comparison is performed for all items between old and new array, a better variant suggested by Eric Bezault looks like:
items_shifted:
across array as c all
c.item =(old array.twin)
[if c.target_index < array.upper then
c.target_index + 1
else
array.lower
end]
end

The number of same elements in an array

My aim is to display the number of identical elements in an array.
Here is my code:
a = [5, 2, 4, 1, 2]
b = []
for i in a
unless b.include?(a[i])
b << a[i]
print i," appears ",a.count(i)," times\n"
end
end
I get this output:
5 appears 1 times
2 appears 2 times
4 appears 1 times
The output misses 1.
Here's a different way to do it, assuming I understand what "it" is (counting elements in an array):
a = [5,2,4,1,2]
counts = a.each_with_object(Hash.new(0)) do |element, counter|
counter[element] += 1
end
# => {5=>1, 2=>2, 4=>1, 1=>1}
# i.e. one 5, two 2s, one 4, one 1.
counts.each do |element, count|
puts "#{element} appears #{count} times"
end
# => 5 appears 1 times
# => 2 appears 2 times
# => 4 appears 1 times
# => 1 appears 1 times
Hash.new(0) initialises a hash with a default value 0. We iterate on a (while passing the hash as an additional object), so element will be each element of a in order, and counter will be our hash. We will increment the value of the hash indexed by the element by one; on the first go for each element, there won't be anything there, but our default value saves our bacon (and 0 + 1 is 1). The next time we encounter an element, it will increment whatever value already is present in the hash under that index.
Having obtained a hash of elements and their counts, we can print them. puts is the same as print but automatically appends a newline, and rather than using commas to print several things, it is much nicer to put the values directly into the printed string using the string interpolation syntax ("...#{...}...").
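On Ruby 2.7 and later, Array#tally builds the same hash in a single call. This is offered as an alternative to the each_with_object version above, not as what the answer originally used:

```ruby
a = [5, 2, 4, 1, 2]

# tally returns a hash mapping each distinct element to its number of occurrences
counts = a.tally

counts.each do |element, count|
  puts "#{element} appears #{count} times"
end
```

The resulting hash has the same keys and values as the one built by hand with Hash.new(0).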
The problems in your code are as follows:
[logic] for i in a will give you elements of a, not indices. Thus, a[i] will give you nil for the first element, not 5, since a[5] is outside the list. This is why 1 is missing from your output: a[1] (i.e. 2) is already in b when you try to process it.
[style] for ... in ... is almost never seen in Ruby code, with a strong preference for each and other methods of the Enumerable module.
[performance] a.count(i) inside a loop increases your algorithmic complexity: count itself has to scan the whole array, and you call it while iterating over that same array, so the running time grows quadratically and becomes much slower with huge arrays. The method above only has one loop, and since access to hashes is very fast, it grows more or less linearly with the size of the array.
The stylistic and performance problems are minor, of course; you won't see performance drop until you need to process really large arrays, and style errors won't stop your code from working. However, if you're learning Ruby, you should aim to work with the language from the start and get used to its idioms as you go along, as it will give you a much stronger foundation than transplanting other languages' idioms onto it.
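For comparison, here is a sketch of the original loop with only the logic error fixed, using the element itself rather than a[element]. The names seen and lines are introduced for this illustration; it still calls count inside the loop, so the performance remark above applies:

```ruby
a = [5, 2, 4, 1, 2]
seen = []
lines = []

a.each do |element|
  next if seen.include?(element)   # skip values that were already reported
  seen << element
  lines << "#{element} appears #{a.count(element)} times"
end

puts lines
# 5 appears 1 times
# 2 appears 2 times
# 4 appears 1 times
# 1 appears 1 times
```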
a = [5,2,4,1,2]
b = a.uniq
for i in b
print i," appears ",a.count(i)," times\n"
end
print b
Result:
5 appears 1 times
2 appears 2 times
4 appears 1 times
1 appears 1 times
[5, 2, 4, 1]

Is it possible to alter an Array object's length?

How does one alter self in an Array to be a totally new array? How do I fill in the commented portion below?
class Array
def change_self
#make this array be `[5,5,5]`
end
end
I understand this: "Why can't I change the value of self?" and know I can't just assign self to a new object. When I do:
arr = [1,2,3,4,5]
arr contains a reference to an Array object. I can add a method to Array class that alters an array, something like:
self[0] = 100
but is it possible to change the length of the array referenced by arr?
How are these values stored in the Array object?
You are asking three very different questions in your title and in your text:
Is it possible to alter an Array object's length using an Array method?
Yes, there are 20 methods which can (potentially) change the length of an Array:
<< increases the length by 1
[]= can alter the length arbitrarily, depending on arguments
clear sets the length to 0
compact! can decrease the length, depending on contents
concat can increase the length, depending on arguments
delete can decrease the length, depending on arguments and contents
delete_at can decrease the length, depending on arguments
delete_if / reject! can decrease the length, depending on arguments and contents
fill can increase the length, depending on arguments
insert increases the length
keep_if / select! can decrease the length, depending on arguments and contents
pop decreases the length
push increases the length
replace can alter the length arbitrarily, depending on arguments and contents (it simply replaces the Array completely with a different Array)
shift decreases the length
slice! decreases the length
uniq! can decrease the length, depending on contents
unshift increases the length
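A short sketch exercising a few of the methods listed above on a single Array object. The lengths array and the original_id variable are names used only for this illustration:

```ruby
arr = [1, 2, 3, 4, 5]
original_id = arr.object_id
lengths = []

arr << 6;               lengths << arr.length  # appended one element
arr.pop;                lengths << arr.length  # removed the last element
arr[7] = 0;             lengths << arr.length  # assigning past the end pads with nil
arr.compact!;           lengths << arr.length  # drops the nil padding
arr.replace([5, 5, 5]); lengths << arr.length  # swaps in entirely new contents

p lengths                        # => [6, 5, 8, 6, 3]
p arr                            # => [5, 5, 5]
p arr.object_id == original_id   # => true, still the same object
```

Every call mutates the receiver in place, which is why the object id never changes.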
When monkey patching the Array class, how does one alter "self" to be a totally new array? How do I fill in the commented portion below?
class Array
def change_self
#make this array be [5,5,5] no matter what
end
end
class Array
def change_self
replace([5, 5, 5])
end
end
How are these values actually stored in the Array object?
We don't know. The Ruby Language Specification does not prescribe any particular storage mechanism or implementation strategy. Implementors are free to implement Arrays any way they like, as long as they obey the contracts of the Array methods.
As an example, here's the Array implementation in Rubinius, which I find fairly readable (at least more so than YARV):
vm/builtin/array.cpp: certain core methods and data structures
kernel/bootstrap/array.rb: a minimal implementation for bootstrapping the Rubinius kernel
kernel/common/array.rb: the bulk of the implementation
For comparison, here is Topaz's implementation:
lib-topaz/array.rb
And JRuby:
core/src/main/java/org/jruby/RubyArray.java
arr = [1,2,3,4,5]
arr.replace([5,5,5])
I wouldn't monkey-patch a new method into Array, especially since it already exists: Array#replace.
As Arrays are mutable, you can alter their contents:
class Array
def change_self
self.clear
self.concat [5, 5, 5]
end
end
You modify the array so it becomes empty, and then add all the elements from the target array. They are still two different objects (i.e., myAry.object_id would differ from [5, 5, 5].object_id), but now they are equivalent arrays.
Moreover, the array is still the same object as before; only its contents have changed:
myAry = [1, 2, 3]
otherRef = myAry
previousId = myAry.object_id
previousHash = myAry.hash
myAry.change_self
puts "myAry is now #{myAry}"
puts "Hash changed from #{previousHash} to #{myAry.hash}"
puts "ID #{previousId} remained as #{myAry.object_id}, as it's still the same instance"
puts "otherRef points to the same instance - it shows the changes, too: #{otherRef}"
Anyway, I really don't know why one would want to do this. Are you solving the right problem, or just playing around with the language?

Is the reverse() method in groovy merely an abstraction of an iteration?

In another question, a user wanted to access the 99999th line of a 100000-line file without having to iterate over the first 99998 lines with an eachLine closure. So, I had suggested that he use
file.readLines().reverse()[1] to access the 99999th line of the file.
This is logically appealing to a programmer. However, I was quite doubtful about the intricacy regarding the implementation of this method.
Is the reverse() method a mere abstraction over a complete iteration of the lines that is hidden from the programmer, or is it really intelligent enough to iterate over as few lines as possible to reach the required line?
As you can see from the code, reverse() calls Collections.reverse in Java to reverse the list.
However the non-mutating code gives you another option. Using listIterator() you can get an iterator with hasPrevious and previous to walk back through the list, so if you do:
// Our list
def a = [ 1, 2, 3, 4 ]
// Get a list iterator pointing at the end
def listIterator = a.listIterator( a.size() )
// Wrap the previous calls in another iterator
def iter = [ hasNext:{ listIterator.hasPrevious() },
next:{ listIterator.previous() } ] as Iterator
We can then do:
// Check the value of 1 element from the end of the list
assert iter[ 1 ] == 3
However, all of this is an ArrayList under the covers, so it's almost certainly quicker (and easier for the code to be read) if you just do:
assert a[ 2 ] == 3
Rather than all the reversing. Though obviously, this would need profiling to make sure I'm right...
According to the "Javadoc" it simply creates a new list in reverse order:
http://groovy.codehaus.org/groovy-jdk/java/util/List.html
Unless I'm missing something, you're correct, and it's not smart enough to jump the cursor there immediately. My understanding is that if the list is indexed like an array, it can access the element directly, but if not, it has to iterate over everything.
Alternative might be:
file.readLines()[-2]
This answer:
def a= [1, 2, 3, 4]
def listIterator= a.listIterator(a.size())
def iter= [hasNext: {listIterator.hasPrevious()},
next: {listIterator.previous()}] as Iterator
assert iter[1] == 3
only works in Groovy-1.7.2 and after.
In Groovy-1.7.1, 1.7.0, the 1.7 betas, 1.6, 1.5, and back to 1.0-RC-01, it doesn't find the getAt(1) method call for the proxy. For version 1.0-RC-06 and before, java.util.HashMap cannot be cast to java.util.Iterator.

Ruby 1.9.2 in Rails 3: A Case where Array.length Does Not Equal Array.count?

My understanding is that count and length should return the same number for Ruby arrays. So I can't figure out what is going on here (FactoryGirl is set to create--save to database--by default):
f = Factory(:family) # Also creates one dependent member
f.members.count # => 1
f.members.length # => 1
m = Factory(:member, :family=>f, :first_name=>'Sam') #Create a 2nd family member
f.members.count # => 2
f.members.length # => 1
puts f.members # prints a single member, the one created in the first step
f.members.class # => Array
f.reload
[ Now count == length = 2, and puts f.members prints both members]
I vaguely understand why f needs to be reloaded, though I would have expected that f.members would involve a database lookup for members with family_id=f.id, and would return all the members even if f is stale.
But how can the count be different from the length? f.members is an Array, but is the count method being overridden somewhere, or is the Array.count actually returning a different result from Array.length? Not a pressing issue, just a mystery that might indicate a basic flaw in my understanding of Ruby or Rails.
Looking at the source, https://github.com/rails/rails/blob/master/activerecord/lib/active_record/associations/collection_association.rb, length calls the size method on the internal collection, whereas count actually calls count on the database.
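The difference can be illustrated with a toy model in plain Ruby. This is not ActiveRecord code: ToyAssociation, its load_target helper, and the table array standing in for the database are all hypothetical names, and the sketch only assumes that length reads a cached copy while count always goes back to the source:

```ruby
# Toy stand-in for an association proxy; NOT the real Rails implementation.
class ToyAssociation
  def initialize(table)
    @table = table    # plays the role of the database table
    @loaded = nil     # in-memory copy, filled on first access
  end

  def load_target
    @loaded ||= @table.dup
  end

  def length
    load_target.size  # answered from the cached copy
  end

  def count
    @table.size       # always asks the "database"
  end

  def reload
    @loaded = nil     # throw the cached copy away
    self
  end
end

table = ["Pat"]
members = ToyAssociation.new(table)
before = [members.count, members.length]         # => [1, 1]
table << "Sam"                                   # a row is added behind the proxy's back
stale  = [members.count, members.length]         # => [2, 1]
fresh  = [members.count, members.reload.length]  # => [2, 2]
```

Under that assumption, the mismatch in the question appears as soon as a record is created without going through the already loaded collection, and disappears after a reload.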
