How to stop a loop in Groovy - arrays

From the collection fileMatches, I want to assign the maps with the 10 greatest values to a new collection called topTen. So I try to make a collection:
def fileMatches = [:].withDefault{[]}
new File('C:\\BRUCE\\ForensicAll.txt').eachLine { line ->
def (source, matches) = line.split (/\t/)[0, 2]
fileMatches[source] << (matches as int)
}
I want to iterate through my collection and grab the 10 maps with greatest values. One issue I might be having is that the output of this doesn't look quite like I imagined. One entry for example:
C:\cygwin\home\pro-services\git\projectdb\project\stats\top.h:[984, 984]
The advice so far has been excellent, but I'm not sure if my collection is arranged to take advantage of the suggested solutions (I have filename:[984, 984] when maybe I want [filename, 984] as the map entries in my collection). I don't understand this stuff quite yet, like how fileMatches[source] << (matches as int) works, as it produces the line I posted immediately above (with source:[matches, matches] being the output).
Please advise, and thanks for the help!

Here is another approach, using Groovy's collection methods. It does what you want quite simply:
def fileMatches = [um: 123, dois: 234, tres: 293, quatro: 920, cinco: 290];
def topThree;
topThree = fileMatches.sort({tup1, tup2 -> tup2.value <=> tup1.value}).take(3);
Result:
[quatro:920, tres:293, cinco:290]

You might find it easier to use some of the built-in collection methods that Groovy provides, e.g.:
fileMatches.sort { a, b -> b.someFilename <=> a.someFilename }[0..9]
or
fileMatches.sort { it.someFilename }[-1..-10]
The range on the end will cause an error if you have fewer than 10 entries, so it may need some adjusting if that's your case.

Related

How do I only add an item to an array if it doesn't exist already (in a case-insensitive way)? [duplicate]

I want to know the best way to make the String.include? method ignore case. Currently I'm doing the following. Any suggestions? Thanks!
a = "abcDE"
b = "CD"
result = a.downcase.include? b.downcase
Edit:
How about Array.include? as well? All elements of the array are strings.
Summary
If you are only going to test a single word against an array, or if the contents of your array changes frequently, the fastest answer is Aaron's:
array.any?{ |s| s.casecmp(mystr)==0 }
If you are going to test many words against a static array, it's far better to use a variation of farnoy's answer: create a copy of your array that has all-lowercase versions of your words, and use include?. (This assumes that you can spare the memory to create a mutated copy of your array.)
# Do this once, or each time the array changes
downcased = array.map(&:downcase)
# Test lowercase words against that array
downcased.include?( mystr.downcase )
Even better, create a Set from your array.
require 'set'
# Do this once, or each time the array changes
downcased = Set.new array.map(&:downcase)
# Test lowercase words against that array
downcased.include?( mystr.downcase )
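Since the question title asks about adding an item only when it is not already present, here is a minimal sketch of how the Set idea could be wrapped up for that purpose. The class name CaseInsensitiveList and its methods are hypothetical, not part of any library; it assumes you can afford to store a downcased copy of every string.

```ruby
require 'set'

# Hypothetical helper: keeps the original strings in insertion order, plus a
# Set of their downcased forms for fast case-insensitive membership tests.
class CaseInsensitiveList
  attr_reader :items

  def initialize(items = [])
    @items = []
    @seen  = Set.new
    items.each { |s| add(s) }
  end

  # Appends str unless an equal string (ignoring case) is already present.
  # Returns true if the string was added, false otherwise.
  def add(str)
    # Set#add? returns nil when the element was already in the set.
    added = !@seen.add?(str.downcase).nil?
    @items << str if added
    added
  end

  def include?(str)
    @seen.include?(str.downcase)
  end
end
```

With this, CaseInsensitiveList.new(%w[Apple banana]).add("APPLE") returns false and leaves the list unchanged, while add("Cherry") appends the new string.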
My original answer below is a very poor performer and generally not appropriate.
Benchmarks
Following are benchmarks for looking for 1,000 words with random casing in an array of slightly over 100,000 words, where 500 of the words will be found and 500 will not.
The 'regex' test is my answer here, using any?.
The 'casecmp' test is Aaron's answer, using any? from my comment.
The 'downarray' test is farnoy's answer, re-creating a new downcased array for each of the 1,000 tests.
The 'downonce' test is farnoy's answer, but pre-creating the lookup array once only.
The 'set_once' test is creating a Set from the array of downcased strings, once before testing.
                user     system      total        real
regex      18.710000   0.020000  18.730000 ( 18.725266)
casecmp     5.160000   0.000000   5.160000 (  5.155496)
downarray  16.760000   0.030000  16.790000 ( 16.809063)
downonce    0.650000   0.000000   0.650000 (  0.643165)
set_once    0.040000   0.000000   0.040000 (  0.038955)
If you can create a single downcased copy of your array once to perform many lookups against, farnoy's answer is the best (assuming you must use an array). If you can create a Set, though, do that.
If you like, examine the benchmarking code.
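In case the original benchmarking code is not at hand, the following is a rough sketch of how such a comparison could be set up with Ruby's standard Benchmark module. It is not the code that produced the numbers above: the word list, the needles, and the helper names (find_casecmp, find_downcased) are made up for illustration, and the data set is much smaller.

```ruby
require 'benchmark'
require 'set'

# Hypothetical data, far smaller than the 100,000-word list described above.
WORDS   = (1..5_000).map { |i| "Word#{i}" }
NEEDLES = (1..200).map { |i| i.even? ? "wORD#{i}" : "missing#{i}" }

# Built once, before any lookups are timed.
DOWNCASED_ARRAY = WORDS.map(&:downcase)
DOWNCASED_SET   = Set.new(DOWNCASED_ARRAY)

# Linear scan comparing each element with casecmp.
def find_casecmp(words, needle)
  words.any? { |s| s.casecmp(needle) == 0 }
end

# Lookup against a pre-downcased collection (Array or Set).
def find_downcased(downcased, needle)
  downcased.include?(needle.downcase)
end

Benchmark.bm(10) do |bm|
  bm.report('casecmp')  { NEEDLES.each { |n| find_casecmp(WORDS, n) } }
  bm.report('downonce') { NEEDLES.each { |n| find_downcased(DOWNCASED_ARRAY, n) } }
  bm.report('set_once') { NEEDLES.each { |n| find_downcased(DOWNCASED_SET, n) } }
end
```

The absolute timings will depend on your machine and Ruby version, so treat any run as a relative comparison only.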
Original Answer
I (originally said that I) would personally create a case-insensitive regex (for a string literal) and use that:
re = /\A#{Regexp.escape(str)}\z/i # Match exactly this string, no substrings
all = array.grep(re) # Find all matching strings…
any = array.any?{ |s| s =~ re } # …or see if any matching string is present
Using any? can be slightly faster than grep as it can exit the loop as soon as it finds a single match.
For an array, use:
array.map(&:downcase).include?(string.downcase)
Regexps are very slow and should be avoided.
You can use casecmp to do your comparison, ignoring case.
"abcdef".casecmp("abcde") #=> 1
"aBcDeF".casecmp("abcdef") #=> 0
"abcdef".casecmp("abcdefg") #=> -1
"abcdef".casecmp("ABCDEF") #=> 0
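If you are on Ruby 2.4 or later, String#casecmp? returns true or false directly, which reads a little more naturally inside any?. A small sketch, where the method name in_list_ignoring_case? is just an illustrative choice:

```ruby
# True if any element of array equals str, ignoring case.
# Relies on String#casecmp?, available from Ruby 2.4 onwards.
def in_list_ignoring_case?(array, str)
  array.any? { |s| s.casecmp?(str) }
end

names = ["Alice", "BOB", "carol"]
puts in_list_ignoring_case?(names, "bob")   # prints true
puts in_list_ignoring_case?(names, "dave")  # prints false
```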
class String
  # Case-insensitive version of include?
  def caseinclude?(x)
    downcase.include?(x.downcase)
  end
end
my_array.map!{|c| c.downcase.strip}
Here map! changes my_array in place, whereas map returns a new array.
To farnoy: in my case your example doesn't work. I'm actually looking to match a substring of any element.
Here's my test case.
x = "<TD>", "<tr>", "<BODY>"
y = "td"
x.collect { |r| r.downcase }.include? y
=> false
x[0].include? y
=> false
x[0].downcase.include? y
=> true
Your case works with an exact case-insensitive match.
a = "TD", "tr", "BODY"
b = "td"
a.collect { |r| r.downcase }.include? b
=> true
I'm still experimenting with the other suggestions here.
Edit:
I found the answer, thanks to Drew Olsen:
var1 = "<TD>", "<tr>","<BODY>"
=> ["<TD>", "<tr>", "<BODY>"]
var2 = "td"
=> "td"
var1.find_all{|item| item.downcase.include?(var2)}
=> ["<TD>"]
var1[0] = "<html>"
=> "<html>"
var1.find_all{|item| item.downcase.include?(var2)}
=> []
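If you only need a yes/no answer rather than the list of matches, the same idea can be written with any?, which stops at the first hit. A minimal sketch, with a made-up method name:

```ruby
# Hypothetical helper: true if any element contains fragment, ignoring case.
def any_include_ignoring_case?(array, fragment)
  needle = fragment.downcase
  array.any? { |item| item.downcase.include?(needle) }
end

tags = ["<TD>", "<tr>", "<BODY>"]
puts any_include_ignoring_case?(tags, "td")    # prints true
puts any_include_ignoring_case?(tags, "html")  # prints false
```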

Is the reverse() method in groovy merely an abstraction of an iteration?

Based on a question, the user wanted to access the 99999th line of a 100000-line file without having to iterate over the first 99998 lines with an eachLine closure. So I had suggested that he use
file.readLines().reverse()[1] to access the 99999th line of the file.
This is logically appealing to a programmer. However, I have doubts about how this method is actually implemented.
Is the reverse() method merely an abstraction over a complete iteration of the lines, hidden from the programmer, or is it smart enough to iterate over as few lines as possible to reach the required line?
As you can see from the code, reverse() calls Collections.reverse in Java to reverse the list.
However the non-mutating code gives you another option. Using listIterator() you can get an iterator with hasPrevious and previous to walk back through the list, so if you do:
// Our list
def a = [ 1, 2, 3, 4 ]
// Get a list iterator pointing at the end
def listIterator = a.listIterator( a.size() )
// Wrap the previous calls in another iterator
def iter = [ hasNext:{ listIterator.hasPrevious() },
next:{ listIterator.previous() } ] as Iterator
We can then do:
// Check the value of 1 element from the end of the list
assert iter[ 1 ] == 3
However, all of this is an ArrayList under the covers, so it's almost certainly quicker (and easier for the code to be read) if you just do:
assert a[ 2 ] == 3
Rather than all the reversing. Though obviously, this would need profiling to make sure I'm right...
According to the "Javadoc" it simply creates a new list in reverse order:
http://groovy.codehaus.org/groovy-jdk/java/util/List.html
Unless I'm missing something, you're correct, and it's not smart enough to jump the cursor there immediately. My understanding is that if the list is indexed like an array it can access the element directly, but if not it has to iterate over everything.
Alternative might be:
file.readLines()[-2]
This answer:
def a= [1, 2, 3, 4]
def listIterator= a.listIterator(a.size())
def iter= [hasNext: {listIterator.hasPrevious()},
next: {listIterator.previous()}] as Iterator
assert iter[1] == 3
only works in Groovy-1.7.2 and after.
In Groovy-1.7.1, 1.7.0, the 1.7 betas, 1.6, 1.5, and back to 1.0-RC-01, it doesn't find the getAt(1) method call for the proxy. For version 1.0-RC-06 and before, java.util.HashMap cannot be cast to java.util.Iterator.

Clearing up confusion with Maps/Collections (Groovy)

I define a collection that is supposed to map two parts of a line in a tab-separated text file:
def fileMatches = [:].withDefault{[]}
new File('C:\\BRUCE\\ForensicAll.txt').eachLine { line ->
def (source, matches) = line.split (/\t/)[0, 2]
fileMatches[source] << (matches as int)}
I end up with entries such as filename:[984, 984] and I want [filename:984]. I don't understand how fileMatches[source] << (matches as int) works. How might I get the kind of collection I need?
I'm not quite sure I understand what you are trying to do. How, for example, would you handle a line where the two values are different instead of the same (which seems to be what is implied by your code)? A map requires unique keys, so you can't use filename as a key if it has multiple values.
That said, you could get the result you want with the data implied by your result using:
def fileMatches = [:]
new File('C:\\BRUCE\\ForensicAll.txt').eachLine { line ->
def (source, matches) = line.split(/\t/)[0,2]
fileMatches[source] = (matches as int)
}
But this will clobber the data (i.e., for each key you will always end up with the value from the last line of your file that has that key). If that's not what you want, you may want to rethink your data structure here.
Alternatively, assuming you want unique values, you could do:
// withDefault takes a closure, so each new key gets its own empty Set
def fileMatches = [:].withDefault { [] as Set }
new File('C:\\BRUCE\\ForensicAll.txt').eachLine { line ->
    def (source, matches) = line.split(/\t/)[0,2]
    fileMatches[source] << (matches as int)
}
This will result in something like [filename:[984]] for the example data and, e.g., [filename:[984, 987]] for files having those two values in the two columns you are checking.
Again, it really depends on what you are trying to capture. If you could provide more detail on what you are trying to accomplish, your question may become answerable...

Scala repeat Array

I am a newbie to Scala. I am trying to write a function that "repeats" an Array (Scala 2.9.0):
def repeat[V](original: Array[V],times:Int):Array[V]= {
  if (times==0)
    Array[V]()
  else
    Array.concat(original,repeat(original,times-1))
}
But I am not able to compile this (get an error about the manifest)...
You need to ask the compiler to provide a class manifest for V:
def repeat[V : Manifest](original: Array[V], times: Int): Array[V] = ...
You can find the answer to why that is needed here:
Why is ClassManifest needed with Array but not List?
I'm not sure where you want to use it, but I generally recommend using List or another suitable collection instead of Array.
BTW, an alternative way to repeat an Array would be to "fill" a Seq with references to the Array and then flatten that:
def repeat[V: Manifest](original: Array[V], times: Int) : Array[V] =
Seq.fill(times)(original).flatten.toArray;

ruby oneliner vs groovy

I was going through railstutorial and saw the following one-liner:
('a'..'z').to_a.shuffle[0..7].join
It creates a random 8-character domain name (the range 0..7 is inclusive), like the following:
hwpcbmze.heroku.com
seyjhflo.heroku.com
jhyicevg.heroku.com
I tried converting the one-liner to Groovy, but I could only come up with:
def range = ('a'..'z')
def tempList = new ArrayList (range)
Collections.shuffle(tempList)
println tempList[0..7].join()+".heroku.com"
Can the above be improved and made to a one liner? I tried to make the above code shorter by
println Collections.shuffle(new ArrayList ( ('a'..'z') ))[0..7].join()+".heroku.com"
However, apparently Collections.shuffle(new ArrayList ( ('a'..'z') )) is null.
Not having shuffle built-in adds the most to the length, but here's a one liner that'll do it:
('a'..'z').toList().sort{new Random().nextInt()}[1..7].join()+".heroku.com"
Yours doesn't work because Collections.shuffle does an in-place shuffle but doesn't return anything. To use that as a one-liner, you'd need to do this:
('a'..'z').toList().with{Collections.shuffle(it); it[1..7].join()+".heroku.com"}
It isn't a one-liner, but another Groovy way to do this is to add a shuffle method to String...
String.metaClass.shuffle = { range ->
def r = new Random()
delegate.toList().sort { r.nextInt() }.join()[range]}
And then you have something very Ruby-like...
('a'..'z').join().shuffle(0..7)+'.heroku.com'
This is my attempt. It is a one-liner but allows repetition of characters. It does not perform a shuffle, though it generates suitable output for a random domain name.
I'm posting it as an example of a recursive anonymous closure:
{ i -> i > 0 ? "${(97 + new Random().nextInt(26) as char)}" + call(i-1) : "" }.call(7) + ".heroku.com"
This is definitely not as pretty as the Ruby counterpart but, as Ted mentioned, that's mostly because shuffle is a static method in Collections.
[*'a'..'z'].with{ Collections.shuffle it; it }.take(7).join() + '.heroku.com'
I am using the spread operator trick to convert the range into a list :)
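For comparison on the Ruby side, it may be worth noting that 0..7 is an inclusive range, so the original one-liner yields 8 letters, and that Array#sample(n) picks n distinct random elements without an explicit shuffle and slice. A small sketch, where the method name random_subdomain is just an illustrative choice:

```ruby
# Hypothetical helper: builds a random name from `length` distinct
# lowercase letters using Array#sample.
def random_subdomain(length = 8)
  ('a'..'z').to_a.sample(length).join
end

# The original one-liner: the inclusive range 0..7 selects 8 elements.
sliced = ('a'..'z').to_a.shuffle[0..7].join
puts sliced.length   # prints 8

puts "#{random_subdomain}.heroku.com"
```

Use [0...7] (exclusive range) or sample(7) if exactly seven characters are wanted.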
