Is the reverse() method in Groovy merely an abstraction of an iteration?

Based on a question, the user wanted to access the 99999th line of a 100000-line file without having to iterate over the first 99998 lines with an eachLine closure. So I suggested that he use
file.readLines().reverse()[1] to access the 99999th line of the file.
This is logically appealing to a programmer. However, I was doubtful about how this method is actually implemented.
Is the reverse() method merely an abstraction over a complete iteration of the lines that is hidden from the programmer, or is it intelligent enough to iterate over as few lines as possible to reach the required line?

As you can see from the code, reverse() calls Collections.reverse in Java to reverse the list.
However the non-mutating code gives you another option. Using listIterator() you can get an iterator with hasPrevious and previous to walk back through the list, so if you do:
// Our list
def a = [ 1, 2, 3, 4 ]
// Get a list iterator pointing at the end
def listIterator = a.listIterator( a.size() )
// Wrap the previous calls in another iterator
def iter = [ hasNext:{ listIterator.hasPrevious() },
next:{ listIterator.previous() } ] as Iterator
We can then do:
// Check the value of 1 element from the end of the list
assert iter[ 1 ] == 3
However, all of this is an ArrayList under the covers, so it's almost certainly quicker (and easier for the code to be read) if you just do:
assert a[ 2 ] == 3
Rather than all the reversing. Though obviously, this would need profiling to make sure I'm right...

According to the "Javadoc" it simply creates a new list in reverse order:
http://groovy.codehaus.org/groovy-jdk/java/util/List.html
Unless I'm missing something, you're correct, and it's not smart enough to jump the cursor straight to the required line. My understanding is that if the list is indexed like an array it can be accessed directly, but otherwise it has to iterate over everything.
Alternative might be:
file.readLines()[-2]

This answer:
def a= [1, 2, 3, 4]
def listIterator= a.listIterator(a.size())
def iter= [hasNext: {listIterator.hasPrevious()},
next: {listIterator.previous()}] as Iterator
assert iter[1] == 3
only works in Groovy-1.7.2 and after.
In Groovy-1.7.1, 1.7.0, the 1.7 betas, 1.6, 1.5, and back to 1.0-RC-01, it doesn't find the getAt(1) method call for the proxy. For version 1.0-RC-06 and before, java.util.HashMap cannot be cast to java.util.Iterator.

Related

Each slice or keep the full array

each_slice keeps slices of length n, but in some cases I want to keep the full array, i.e. do nothing.
module MyModule
  def num_slice
    # Note: the content of some_boolean_test? is uninteresting; just assume it can sometimes return true or false
    some_boolean_test? ? :full_array : 10
  end
end
class Foo
  include MyModule
  def a_method
    big_array.each_slice(num_slice) do |array_slice|
      # I want array_slice == big_array if num_slice returns :full_array
      ...
    end
  end
end
I could write a wrapper around Array#each_slice instead so I could define a different behaviour when the parameter is :full_array.
Could anyone help with that?
I'd first caution against significant logic differences between environments, since either one branch is less tested or you have twice the code to maintain. But assuming good reasons for the way you're doing it, here are some options:
Pass the array
Since num_slice is making a decision about the array, it seems reasonable num_slice should get access to it.
def num_slice(arr)
  some_boolean_test? ? arr.size : 10
end
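One edge case in the arr.size approach is an empty array: each_slice(0) raises ArgumentError. A minimal sketch of a guard, where slice_size_for, whole, and batch are hypothetical names introduced here for illustration (whole stands in for the result of some_boolean_test?):

```ruby
# Hypothetical helper, not from the original answer: `whole` stands in for
# the result of some_boolean_test?, and `batch` for the usual slice length.
def slice_size_for(arr, whole, batch = 10)
  # each_slice(0) raises ArgumentError, so never return 0 for an empty array.
  whole ? [arr.size, 1].max : batch
end

big_array = (1..25).to_a
p(big_array.each_slice(slice_size_for(big_array, true)).to_a.length)   # 1
p(big_array.each_slice(slice_size_for(big_array, false)).map(&:size))  # [10, 10, 5]
p([].each_slice(slice_size_for([], true)).to_a)                        # []
```

With whole set to true the block runs once with the full array; with an empty array it does not run at all.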
Environment configuration
You're using Rails, so you can set the slice size differently in production and your other environments. In production, make it 10, and in test, make it arbitrarily large; then just use the configured value. This is nice because there's no code difference.
def a_method
  big_array.each_slice(Rails.application.config.slice_size) do |array_slice|
    # ...
  end
end
Wrap it
I wouldn't recommend this method because it causes the most significant difference between your environments, but since you asked about it, here's a way.
def a_method
  magic_slice(big_array) do |array_slice|
    # ...
  end
end
def magic_slice(arr, &block)
  if some_boolean_test?
    block.call(arr)
  else
    arr.each_slice(10, &block)
  end
end
def a_method(big_array, debug_context)
  num_slice = debug_context ? big_array.length : 10
  big_array.each_slice(num_slice) do |array_slice|
    # array_slice will be equal to big_array if debug_context == true
    puts array_slice.inspect
  end
end
Test:
a_method([1,2,3,4,5], true)
[1, 2, 3, 4, 5]

Is it safe to delete from an Array inside each?

Is it possible to safely delete elements from an Array while iterating over it via each? A first test looks promising:
a = (1..4).to_a
a.each { |i| a.delete(i) if i == 2 }
# => [1, 3, 4]
However, I could not find hard facts on:
Whether it is safe (by design)
Since which Ruby version it is safe
At some point in the past, it seems that this was not possible:
It's not working because Ruby exits the .each loop when attempting to delete something.
The documentation does not state anything about deletability during iteration.
I am not looking for reject or delete_if. I want to do things with the elements of an array, and sometimes also remove an element from the array (after I've done other things with said element).
Update 1: I was not very clear on my definition of "safe", what I meant was:
do not raise any exceptions
do not skip any element in the Array
You should not rely too much on non-authoritative answers. The answer you cited is wrong, as Kevin's comment on it points out.
It is safe (and has been since the beginning of Ruby) to delete elements from an Array while iterating with each, in the sense that Ruby will not raise an error for doing so and will give a deterministic (i.e., not random) result.
However, you need to be careful: when you delete an element, the elements following it are shifted down, so the element that was supposed to be iterated next moves into the position of the deleted element, which has already been iterated over, and is skipped.
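The skipping described above can be made visible by recording which elements the block actually receives. This is a small sketch of the behavior as observed on MRI; visit_while_deleting is a hypothetical helper name used only for illustration:

```ruby
# Hypothetical helper: iterate with each, record every element the block
# actually sees, and delete `target` from the array when it is visited.
def visit_while_deleting(arr, target)
  visited = []
  arr.each do |item|
    visited << item
    arr.delete(item) if item == target
  end
  visited
end

a = [1, 2, 3, 4]
visited = visit_while_deleting(a, 2)
p visited  # [1, 2, 4] (3 slid into the slot of 2 and was never yielded)
p a        # [1, 3, 4]
```

No exception is raised, and the resulting array is what one would expect, but the block never saw the element 3.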
In order to answer your question of whether it is "safe" to do so, you will first have to define what you mean by "safe". Do you mean:
1. it doesn't crash the runtime?
2. it doesn't raise an Exception?
3. it does raise an Exception?
4. it behaves deterministically?
5. it does what you expect it to do? (What do you expect it to do?)
Unfortunately, the Ruby Language Specification is not exactly helpful:
15.2.12.5.10 Array#each
each(&block)
Visibility: public
Behavior:
If block is given:
For each element of the receiver in the indexing order, call block with the element as the only argument.
Return the receiver.
This seems to imply that it is indeed completely safe in the sense of 1., 2., 4., and 5. above.
The documentation says:
each { |item| block } → ary
Calls the given block once for each element in self, passing that element as a parameter.
Again, this seems to imply the same thing as the spec.
Unfortunately, none of the currently existing Ruby implementations interpret the spec in this way.
What actually happens in MRI and YARV is the following: the mutation to the array, including any shifting of the elements and/or indices becomes visible immediately, including to the internal implementation of the iterator code which is based on array indices. So, if you delete an element at or before the position you are currently iterating, you will skip the next element, whereas if you delete an element after the position you are currently iterating, you will skip that element. For each_with_index, you will also observe that all elements after the deleted element have their indices shifted (or rather the other way around: the indices stay put, but the elements are shifted).
So, this behavior is "safe" in the sense of 1., 2., and 4.
The other Ruby implementations mostly copy this (undocumented) behavior, but being undocumented, you cannot rely on it, and in fact, I believe at least one did experiment briefly with raising some sort of ConcurrentModificationException instead.
I would say that it is safe, based on the following:
2.2.2 :035 > a = (1..4).to_a
=> [1, 2, 3, 4]
2.2.2 :036 > a.each { |i| a.delete(i+1) if i > 1 ; puts i }
1
2
4
=> [1, 2, 4]
I'd infer from this test that Ruby correctly recognises while iterating through the contents that the element "3" has been deleted while element "2" was being processed, otherwise element "4" would also have been deleted.
However, with a reset to [1, 2, 3, 4] first:
2.2.2 :040 > a.each { |i| puts i; a.delete(i) if i > 1 ; puts i }
1
1
2
2
4
4
This suggests that after "2" is deleted, the next element processed is whichever is now third in the array, so the element that used to be in third place does not get processed at all. each appears to re-examine the array to find the next element to process on every iteration.
I think that with that in mind, you ought to duplicate the array in your circumstances prior to processing.
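A minimal sketch of the duplicate-first approach: iterate over a shallow copy and delete from the original, so every element is visited exactly once. The helper name each_with_removal is hypothetical, introduced only for this example:

```ruby
# Hypothetical helper: yields every original element exactly once, and
# removes from `arr` those for which the block returns a truthy value.
def each_with_removal(arr)
  arr.dup.each do |item|             # iterate over a shallow copy
    arr.delete(item) if yield(item)  # mutate only the original
  end
  arr
end

a = [1, 2, 3, 4]
seen = []
each_with_removal(a) do |i|
  seen << i   # "do other things" with the element first
  i > 1       # then decide whether to remove it
end
p seen  # [1, 2, 3, 4]
p a     # [1]
```

Note that Array#delete removes every element equal to its argument, so with duplicate values one deletion may remove several entries; the copy costs one extra array of references.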
It depends.
When called without a block, all .each does is return an enumerator, which holds the collection and a pointer to where it left off. Example:
a = [1,2,3]
b = a.each # => #<Enumerator: [1, 2, 3]:each>
b.next # => 1
a.delete(2)
b.next # => 3
a.clear
b.next # => StopIteration: iteration reached an end
each with a block, in effect, keeps calling next until the iteration reaches its end. So as long as you don't modify any 'future' array elements, it should be safe.
However there are so many helpful methods in ruby's Enumerable and Array you really shouldn't ever need to do this.
You are right, in the past it was advised not to remove items from a collection while iterating over it. In my tests, at least with version 1.9.3, this causes no problems in practice for an array, even when deleting previous or following elements.
It is my opinion that while you can, you shouldn't.
A clearer and safer approach is to reject the elements and assign the result to a new array.
b = a.reject{ |i| i == 2 } #[1, 3, 4]
In case you want to reuse your a array that is also possible
a = a.reject{ |i| i == 2 } #[1, 3, 4]
which is in fact the same as
a.reject!{ |i| i == 2 } #[1, 3, 4]
You say you don't want to use reject because you want to do other things with the elements before deleting, but that is also possible.
a.reject!{ |i| puts i if i == 2;i == 2 }
# 2
#[1, 3, 4]

In Python, which would be the fastest way to compare / test lists?

In my code the lists will eventually end up with all elements empty. That is what I am testing for: are all elements == ''?
The size of the lists can vary with input.
The two tests I was considering are an equality test and the list.count() function. Which will be faster at runtime?
I am new to Python, so how things are done in the background is not that familiar to me. My assumption is that Test 2 will be faster if it does not iteratively check each element to do the comparison. As the data in the lists can vary from an empty string to a string of over 100 characters, the simple check done by Test 1, count(''), could also be very fast.
Sample code to set up my variables for testing.
>>> mylist = [''] * 33
>>> testlist = []
>>> testlist.extend([''] * len(mylist))  # extend('' * n) would add nothing, since '' * n is still ''
>>> testlist.count('')
33
>>> mylist.count('')
33
Which of the following tests is going to be faster?
Test 1
if mylist.count('') == 33:
    ...  # do something
while mylist.count('') != 33:
    ...  # do something
Test 2
if mylist == testlist:
    ...  # do something
while mylist != testlist:
    ...  # do something
You don't describe the problem that you are actually trying to solve, but are you setting list entries to the empty string in order to mark them as finished, so that you don't process them again?
If that's the case, then you may get better results by using a different data structure. For example, perhaps you could use a set, and remove items when you are done with them. Then you can just test to see if your set is empty, which is a constant-time operation.
But we need to know more about what you are trying to do in order to be able to help you.
If you just want to figure out which of two implementations is faster, Python's timeit module contains functions for timing execution of code. For example:
>>> from timeit import timeit
>>> l1 = [''] * 1000
>>> l2 = [''] * 1000
>>> timeit(lambda:l1 == l2)
4.670141935348511
>>> timeit(lambda:l1.count('') == len(l1))
4.50224494934082
so it looks like these two approaches take almost exactly the same time in this case (as you might have guessed). But in the case where the list is not full of empty strings, == is faster (because when it finds a mismatch it can return False immediately without having to check any more list elements):
>>> l3 = ['a'] + [''] * 999
>>> timeit(lambda:l3.count('') == len(l3))
4.379799842834473
>>> timeit(lambda:l3 == l2)
0.19073486328125

SWI-Prolog Array Retrieve [Index and Element]

I'm trying to program an array retrieval in swi-prolog. With the current code printed below I can retrieve the element at the given index but I also want to be able to retrieve the index[es] of a given element.
aget([_|X],Y,Z) :- Y \= 0, Y2 is (Y-1), aget(X,Y2,Z).
aget([W|_],Y,Z) :- Y = 0, Z is W.
Example 1: aget([9,8,7,6,5],1,N) {retrieve the element 8 at index 1}
output: N = 8. {correct}
Example 2: aget([9,8,7,6,5],N,7) {retrieve the index 2 for element 7}
output: false {incorrect}
The way I understood it, SWI-Prolog would work this way with little to no additional programming. So clearly I'm doing something wrong. If you could point me in the right direction or tell me what I'm doing wrong, I would greatly appreciate it.
Your code is too procedural, and the second clause is plainly wrong, since it works only for numbers.
The functionality you're looking for is implemented by nth0/3. In SWI-Prolog you can see the optimized source with ?- edit(nth0). An alternative implementation has been discussed here on SO (here is my answer).
Note that Prolog doesn't have arrays, but lists. When an algorithm can be rephrased to avoid indexing, we should do so.
If you represent arrays as compounds, you can also use the ISO standard predicate arg/3 to access an array element. Here is an example run:
?- X = array(11,33,44,77), arg(2,X,Y).
X = array(11, 33, 44, 77),
Y = 33.
The advantage over lists is that compound access needs O(1) time, whereas list access needs O(n) time, where n is the length of the array.

How to stop a loop in Groovy

From the collection fileMatches, I want to assign the maps with the 10 greatest values to a new collection called topTen. So I try to make a collection:
def fileMatches = [:].withDefault{[]}
new File('C:\\BRUCE\\ForensicAll.txt').eachLine { line ->
    def (source, matches) = line.split (/\t/)[0, 2]
    fileMatches[source] << (matches as int)
}
I want to iterate through my collection and grab the 10 maps with greatest values. One issue I might be having is that the output of this doesn't look quite like I imagined. One entry for example:
C:\cygwin\home\pro-services\git\projectdb\project\stats\top.h:[984, 984]
The advice so far has been excellent, but I'm not sure if my collection is arranged to take advantage of the suggested solutions (I have filename:[984, 984] when maybe I want [filename, 984] as the map entries in my collection). I don't understand this stuff quite yet, such as how fileMatches[source] << (matches as int) works, given that it produces the line I posted immediately above (with source:[matches, matches] being the output).
Please advise, and thanks for the help!
Here is another approach, using some of the Collection methods. It does what you want fairly simply:
def fileMatches = [um: 123, dois: 234, tres: 293, quatro: 920, cinco: 290];
def topThree;
topThree = fileMatches.sort({tup1, tup2 -> tup2.value <=> tup1.value}).take(3);
Result: [quatro:920, tres:293, cinco:290]
You might find it easier to use some of the built-in collection methods that Groovy provides, e.g.:
fileMatches.sort { a, b -> b.someFilename <=> a.someFilename }[0..9]
or
fileMatches.sort { it.someFileName }[-1..-10]
The range on the end there will cause an error if you have < 10 entries, so it may need some adjusting if that's your case.
