How to get the element index when mapping an array in Scala? - arrays

Let's consider a simple mapping example:
val a = Array("One", "Two", "Three")
val b = a.map(s => myFn(s))
What I need is to use not myFn(s: String): String here, but myFn(s: String, n: Int): String, where n would be the index of s in a. In this particular case myFn would expect the second argument to be 0 for s == "One", 1 for s == "Two" and 2 for s == "Three". How can I achieve this?

Depends whether you want convenience or speed.
Slow:
a.zipWithIndex.map{ case (s,i) => myFn(s,i) }
Faster:
for (i <- a.indices) yield myFn(a(i),i)
{ var i = -1; a.map{ s => i += 1; myFn(s,i) } }
Possibly fastest:
Array.tabulate(a.length){ i => myFn(a(i),i) }
If not, this surely is:
val b = new Array[Whatever](a.length)
var i = 0
while (i < a.length) {
b(i) = myFn(a(i),i)
i += 1
}
(In Scala 2.10.1 with Java 1.6u37, if "possibly fastest" is declared to take 1x time for a trivial string operation (truncation of a long string to a few characters), then "slow" takes 2x longer, "faster" each take 1.3x longer, and "surely" takes only 0.5x the time.)
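For anyone who wants to try these side by side, here is a minimal sketch. The body of myFn is made up for illustration (the question does not say what it does); the point is only that the variants produce the same result.

```scala
// Hypothetical myFn, invented for this example: prefixes the string with its index.
def myFn(s: String, n: Int): String = s"$n:$s"

val a = Array("One", "Two", "Three")

// Convenient: pair each element with its index, then map over the pairs.
val viaZip: Array[String] = a.zipWithIndex.map { case (s, i) => myFn(s, i) }

// No intermediate array of tuples: build the result directly from the indices.
val viaTabulate: Array[String] = Array.tabulate(a.length)(i => myFn(a(i), i))
```

Both values contain "0:One", "1:Two", "2:Three"; which one is faster on a current JVM would have to be measured again.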

A general tip: Use .iterator method liberally, to avoid creation of intermediate collections, and thus speed up your computation. (Only when performance requirements demand it. Or else don't.)
scala> def myFun(s: String, i: Int) = s + i
myFun: (s: String, i: Int)java.lang.String
scala> Array("nami", "zoro", "usopp")
res17: Array[java.lang.String] = Array(nami, zoro, usopp)
scala> res17.iterator.zipWithIndex
res19: java.lang.Object with Iterator[(java.lang.String, Int)]{def idx: Int; def idx_=(x$1: Int): Unit} = non-empty iterator
scala> res19 map { case (k, v) => myFun(k, v) }
res22: Iterator[java.lang.String] = non-empty iterator
scala> res22.toArray
res23: Array[java.lang.String] = Array(nami0, zoro1, usopp2)
Keep in mind that iterators are mutable, and hence once consumed cannot be used again.
An aside: The map call above involves de-tupling and then function application. This forces use of some local variables. You can avoid that using some higher order sorcery - convert a regular function to the one accepting tuple, and then pass it to map.
scala> Array("nami", "zoro", "usopp").zipWithIndex.map(Function.tupled(myFun))
res24: Array[java.lang.String] = Array(nami0, zoro1, usopp2)

What about this? I think it should be fast and it's pretty. But I'm no expert on Scala speed...
a.foldLeft(0) ((i, x) => {myFn(x, i); i + 1;} )
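Note that the fold as written only carries the index along: the value returned by myFn is thrown away and the whole expression evaluates to the element count. If the mapped values are needed, they have to be part of the accumulator. A rough sketch, assuming a made-up myFn:

```scala
// Hypothetical myFn, invented for this example.
def myFn(s: String, n: Int): String = s"$n:$s"

val a = Array("One", "Two", "Three")

// Accumulator is (next index, results so far in reverse order).
val folded: (Int, List[String]) =
  a.foldLeft((0, List.empty[String])) {
    case ((i, acc), x) => (i + 1, myFn(x, i) :: acc)
  }

// Prepending reversed the order, so undo that once at the end.
val b: Array[String] = folded._2.reverse.toArray
```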

Index can also be accessed via the second element of tuples generated by the zipWithIndex method:
val a = Array("One", "Two", "Three")
val b = a.zipWithIndex.map(s => myFn(s._1, s._2))

Related

How do I extend a class which needs a ClassTag AND to extend Comparable?

I have a simple class I always implement when working with a new language: MergeSort. I am looking at my implementation of it for type Int and it looks great. Then I wanted to make it generic. I started with a simple implementation over T, but I noticed that it needed a ClassTag for T. How do I combine the ClassTag with extending Comparable?
class MergeSort[T: scala.reflect.ClassTag] {
var array: Array[T] = Array[T]()
var length: Int = 0
var tempArray: Array[T] = new Array(length)
def sort(data: Array[T]): Unit = {
array = data;
length = data.length;
tempArray = new Array[T](length)
//sort(0, length - 1)
}
}
Now this looks nice! It works, but when I do the sort and the rest of the functionality, I need to be able to compare two items of type T. The "Java" way was to just make sure the object has the compareTo method, so I was thinking: [T extends Comparable]
But in Scala I am already giving T a ClassTag context bound, so I tried, for example:
class MergeSort[T: scala.reflect.ClassTag extends Comparable] {}
It fails with the error:
']' expected, but 'extends' found.
I was thinking this would sort of be the way to do things, but I am not sure what's going on here.
The end goal is to implement the merge portion of the class:
def merge(lower: Int, center: Int, upper: Int){
// ...
// loop
// if (tempArr(i) <= tempArr(j)) {} // OLD WAY, since First attempt was with Int.
// if (tempArr(i).compareTo(tempArr(j)) < 0) {} // Modified way with Comparable
}
Is this the Scala way of implementing it? I noticed that people mention Ordering, but I thought Comparable made sense.
The Scala way of implementing merge sort is using List and vals and the Ordering trait. The advantage of Ordering (the Java Comparator) is that Scala gives you implicit orderings for all standard library types by default.
import scala.annotation.tailrec
def msort[T: Ordering](xs: List[T]): List[T] = {
@tailrec
def merge(xs: List[T], ys: List[T], acc: List[T] = Nil): List[T] =
(xs, ys) match {
case (Nil, _) => acc.reverse ++ ys
case (_, Nil) => acc.reverse ++ xs
case (x :: xs1, y :: ys1) =>
if (implicitly[Ordering[T]].lt(x, y))
merge(xs1, ys, x :: acc)
else
merge(xs, ys1, y :: acc)
}
xs match {
case Nil | _ :: Nil => xs
case _ =>
val (xs1, xs2) = xs splitAt (xs.length / 2)
merge(msort(xs1), msort(xs2))
}
}
msort(List(4, 23, 1, 2, 5, 76, 3, 142, 4321, 213, 42323))
// List(1, 2, 3, 4, 5, 23, 76, 142, 213, 4321, 42323)
msort(List("John", "Chris", "Helen", "Danny", "Michelle"))
// List(Chris, Danny, Helen, John, Michelle)
Another advantage over Ordered is that Scala provides implicit conversions from Ordered[A] => Ordering[A], which means your custom types that mix in Ordered will work with msort without the need to define implicit orderings.
Finally, the last advantage over Ordered is when using numeric types: Int, Double, etc. do not mix in Ordered, so you will not be able to sort elements of these types with Ordered, this is why most use Ordering instead.
I'm well aware this variant is not in-place, but it does not require ClassTag at all to implement.
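If you do want to keep the array-based class from the question, a type parameter can carry more than one context bound, so ClassTag and Ordering can sit side by side. The following is only a sketch of how that might look; the method names sortRange and merge and the overall layout are my own choices, not something from the question.

```scala
import scala.reflect.ClassTag

// Two context bounds on one type parameter: ClassTag (to allocate Array[T])
// and Ordering (to compare elements).
class MergeSort[T: ClassTag: Ordering] {
  private val ord = implicitly[Ordering[T]]

  def sort(data: Array[T]): Unit = {
    val temp = new Array[T](data.length) // this allocation is what needs the ClassTag
    sortRange(data, temp, 0, data.length - 1)
  }

  private def sortRange(data: Array[T], temp: Array[T], lower: Int, upper: Int): Unit =
    if (lower < upper) {
      val center = lower + (upper - lower) / 2
      sortRange(data, temp, lower, center)
      sortRange(data, temp, center + 1, upper)
      merge(data, temp, lower, center, upper)
    }

  private def merge(data: Array[T], temp: Array[T], lower: Int, center: Int, upper: Int): Unit = {
    // Copy the range being merged into the scratch array.
    var c = lower
    while (c <= upper) { temp(c) = data(c); c += 1 }
    var i = lower
    var j = center + 1
    var k = lower
    while (i <= center && j <= upper) {
      if (ord.lteq(temp(i), temp(j))) { data(k) = temp(i); i += 1 }
      else { data(k) = temp(j); j += 1 }
      k += 1
    }
    // Left-half leftovers must be copied; right-half leftovers are already in place.
    while (i <= center) { data(k) = temp(i); i += 1; k += 1 }
  }
}

val sortedNums: Array[Int] = {
  val xs = Array(4, 23, 1, 2, 5)
  val sorter = new MergeSort[Int]
  sorter.sort(xs)
  xs
}

val sortedNames: Array[String] = {
  val xs = Array("John", "Chris", "Helen")
  val sorter = new MergeSort[String]
  sorter.sort(xs)
  xs
}
```

An upper bound can also be combined with a context bound, as in [T <: Comparable[T]: ClassTag], but that version would not accept Int, since Int does not extend Comparable.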

Process sequence of elements in scala

Given a sorted sequence of pairs, I want to combine adjacent elements if they are continuous or overlapping, and keep them separate if they are not.
For example:
if A = Seq((1,2),(3,4),(5,6),(8,10),(11,15))
output should be Seq((1,6),(8,15))
since last element of current entry is continuous with first element of next entry
if B = Seq((1,4),(3,5),(6,7),(9,10),(11,15))
output should be Seq((1,7),(9,15))
since the first element of the next entry is smaller than, or continuous with, the last element of the current entry
I tried something like:
val finalOut = mySeq.sliding(2).map {
case Array(x, y, _*) => (x, y, (x._2 - y._1))
}.toList
The problem with this is it will just take 2 elements at a time, whereas we need to keep traversing unless there is a gap in continuity. I am not sure how to obtain that in scala.
I tried implementing for loop as well, but that also doesn't help, because it processes one element at a time and doesn't help in keeping a track of other elements or counter like c++.
You can do this with foldRight and accumulate into a new Seq.
This works
val tups1 = Seq((1,2),(3,4),(5,6),(8,10),(11,15))
val tups2 = Seq((1,4),(3,5),(6,7),(9,10),(11,15))
def f(tups: Seq[(Int, Int)]): Seq[(Int, Int)] = {
val emptySeq: Seq[(Int, Int)] = Seq()
tups.foldRight(emptySeq){ (next, accum) => accum match {
case Nil => Seq(next)
case (a, b) +: cs if a - 1 <= next._2 => (next._1, next._2 max b) +: cs
case _ => next +: accum
}}
}
f(tups1) // Seq[(Int, Int)] = List((1,6), (8,15))
f(tups2) // Seq[(Int, Int)] = List((1,7), (9,15))
I'd use foldRight over foldLeft because the default Seq implementation is a List, and for Lists prepend is constant time while append is O(n); foldRight lets you accomplish this with only prepends.
https://docs.scala-lang.org/overviews/collections/performance-characteristics.html
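If you would rather traverse left to right, the same constant-time prepends can be kept with foldLeft by building the result in reverse and reversing it once at the end. A minimal sketch (mergeRanges is just a name picked for this example):

```scala
def mergeRanges(tups: Seq[(Int, Int)]): List[(Int, Int)] = {
  val reversed = tups.foldLeft(List.empty[(Int, Int)]) {
    // The most recent range is at the head; extend it if the next one touches or overlaps it.
    case ((lo, hi) :: rest, (a, b)) if a - 1 <= hi => (lo, hi max b) :: rest
    // Otherwise start a new range.
    case (acc, next) => next :: acc
  }
  reversed.reverse
}

val merged1 = mergeRanges(Seq((1, 2), (3, 4), (5, 6), (8, 10), (11, 15)))
val merged2 = mergeRanges(Seq((1, 4), (3, 5), (6, 7), (9, 10), (11, 15)))
```

Like the foldRight version, this assumes the input is already sorted by the first element of each pair.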

How to obtain nested WrappedArray

I need a read-only structure with fast indexed access and minimum overhead. That structure would be queried quite often by the application. So, as suggested in various places online, I tried to use Arrays and cast them to IndexedSeq
scala> val wa : IndexedSeq[Int] = Array(1,2,3)
wa: IndexedSeq[Int] = WrappedArray(1, 2, 3)
So far, so good. But I need to use nested Arrays and there the problem lies.
val wa2d : IndexedSeq[IndexedSeq[Int]] = Array(Array(1,2), Array(3), Array())
<console>:8: error: type mismatch;
found : Array[Array[_ <: Int]]
required: IndexedSeq[IndexedSeq[Int]]
val wa2d : IndexedSeq[IndexedSeq[Int]] = Array(Array(1,2), Array(3), Array())
Scala compiler could not apply implicit conversion recursively.
scala> val wa2d : IndexedSeq[IndexedSeq[Int]] = Array(Array[Int](1,2) : IndexedSeq[Int], Array[Int](3) : IndexedSeq[Int], Array[Int]() : IndexedSeq[Int])
wa2d: IndexedSeq[IndexedSeq[Int]] = WrappedArray(WrappedArray(1, 2), WrappedArray(3), WrappedArray())
That worked as expected, but this form is too verbose, for each subarray I need to specify types twice. And I would like to avoid it completely. So I've tried another approach
scala> val wa2d : IndexedSeq[IndexedSeq[Int]] = Array(Array(1,2), Array(3), Array()).map(_.to[IndexedSeq])
wa2d: IndexedSeq[IndexedSeq[Int]] = ArraySeq(Vector(1, 2), Vector(3), Vector())
But all the WrappedArrays mysteriously disappeared and were replaced with ArraySeq and Vector.
So what is the less obscure way to define nested WrappedArrays ?
Here is how you do it:
scala> def wrap[T](a: Array[Array[T]]): IndexedSeq[IndexedSeq[T]] = { val x = a.map(x => x: IndexedSeq[T]); x }
scala> wrap(Array(Array(1,2), Array(3,4)))
res13: IndexedSeq[IndexedSeq[Int]] = WrappedArray(WrappedArray(1, 2), WrappedArray(3, 4))
If you want to use implicit conversions, use this:
def wrap[T](a: Array[Array[T]]): IndexedSeq[IndexedSeq[T]] = { val x = a.map(x => x: IndexedSeq[T]); x }
implicit def nestedArrayIsNestedIndexedSeq[T](x: Array[Array[T]]): IndexedSeq[IndexedSeq[T]] = wrap(x)
val x: IndexedSeq[IndexedSeq[Int]] = Array(Array(1,2),Array(3,4))
And here is why you might not want to do it:
val th = ichi.bench.Thyme.warmed()
val a = (0 until 100).toArray
val b = a: IndexedSeq[Int]
def sumArray(a: Array[Int]): Int = { var i = 0; var sum = 0; while(i < a.length) { sum += a(i); i += 1 }; sum }
def sumIndexedSeq(a: IndexedSeq[Int]): Int = { var i = 0; var sum = 0; while(i < a.length) { sum += a(i); i += 1 }; sum }
scala> th.pbenchOff("")(sumArray(a))(sumIndexedSeq(b))
Benchmark comparison (in 439.6 ms)
Significantly different (p ~= 0)
Time ratio: 3.18875 95% CI 3.06446 - 3.31303 (n=30)
First 65.12 ns 95% CI 62.69 ns - 67.54 ns
Second 207.6 ns 95% CI 205.2 ns - 210.1 ns
res15: Int = 4950
The bottom line is that once you access your Array[Int] indirectly via WrappedArray[Int], primitives get boxed. So things get much slower. If you really need the full performance of arrays, you have to use them directly. And if you don't, just use a Vector and stop worrying about it.
I would just go with Vector for prototyping and then go to Array once/if you are sure that this is actually a performance bottleneck. Use a type alias so you can quickly switch from Vector to Array.
Somewhere in your package object:
type Vec[T] = Vector[T]
val Vec = Vector
// type Vec[T] = Array[T]
// val Vec = Array
Then you can write code like this
val grid = Vec(Vec(1,2), Vec(3,4))
and switch quickly to an array version in case you measure that this is actually a performance bottleneck.

2d scala array iteration

I have a 2d array of type boolean (not important)
It is easy to iterate over the array in non-functional style.
How to do it FP style?
var matrix = Array.ofDim[Boolean](5, 5)
For example, I would like to iterate through all the rows for a given column and return a list of Int for the rows that match a specific function.
Example: for column 3, iterate through rows 1 to 5 and return 4, 5 if the cells at (4, 3) and (5, 3) match a specific function. Thanks very much.
def getChildren(nodeId: Int) : List[Int] = {
info("getChildren("+nodeId+")")
var list = List[Int]()
val nodeIndex = id2indexMap(nodeId)
for (rowIndex <- 0 until matrix.size) {
val elem = matrix(rowIndex)(nodeIndex)
if (elem) {
println("Row Index = " + rowIndex)
list = rowIndex :: list
}
}
list
}
What about
(0 until 5) filter {i => predicate(matrix(i)(3))}
where predicate is your function?
Note that an array initialized with (5, 5) has indices from 0 to 4, so the range is 0 until 5 rather than 1 to 5.
Update: based on your example
def getChildren(nodeId: Int) : List[Int] = {
info("getChildren("+nodeId+")")
val nodeIndex = id2indexMap(nodeId)
val result = (0 until matrix.size).filter(matrix(_)(nodeIndex)).toList
result.foreach(println)
result
}
You may move the print into the filter if you want, and reverse the list if you want it exactly as in your example.
If you're not comfortable with filters and zips, you can stick with the for-comprehension but use it in a more functional way:
for {
rowIndex <- matrix.indices
if matrix(rowIndex)(nodeIndex)
} yield {
println("Row Index = " + rowIndex)
rowIndex
}
yield builds a new collection from the results of the for-comprehension, so this expression evaluates to the collection you want to return. seq.indices is a method equivalent to 0 until seq.size. The curly braces allow you to span multiple lines without semicolons, but you can make it in-line if you want:
for (rowIndex <- matrix.indices; if matrix(rowIndex)(nodeIndex)) yield rowIndex
Should probably also mention that normally if you're iterating through an Array you won't need to refer to the indices at all. You'd do something like
for {
row <- matrix
elem <- row
} yield f(elem)
but your use-case is a bit unusual in that it requires the indices of the elements, which you shouldn't normally be concerned with (using array indices is essentially a quick and dirty hack to pair a data element with a number). If you want to capture and use the notion of position you might be better off using a Map[Int, Boolean] or a case class with such a field.
def findIndices[A](aa: Array[Array[A]], pred: A => Boolean): Array[Array[Int]] =
aa.map(row =>
row.zipWithIndex.collect{
case (v,i) if pred(v) => i
}
)
You can refactor it to be a bit nicer by extracting the function that finds the indices in a single row only:
def findIndices2[A](xs: Array[A], pred: A => Boolean): Array[Int] =
xs.zipWithIndex.collect{
case (v,i) if pred(v) => i
}
And then write
matrix.map(row => findIndices2(row, pred))
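To make the difference between the two styles concrete, here is a small sketch on a made-up 5x5 matrix (the contents and the name rowsMatching are invented for the example). The first query walks down one column and returns the matching row indices, which is what the question asks for; the second collects matching column indices row by row, as findIndices does.

```scala
// Hypothetical data: only cells (3, 2) and (4, 2) are true.
val matrix: Array[Array[Boolean]] =
  Array.tabulate(5, 5)((row, col) => col == 2 && row >= 3)

// Column-wise: indices of the rows whose cell in the given column is true.
def rowsMatching(column: Int): List[Int] =
  matrix.indices.filter(row => matrix(row)(column)).toList

val hits: List[Int] = rowsMatching(2)

// Row-wise: for every row, the column indices whose cell is true.
val perRow: Array[Array[Int]] =
  matrix.map(row => row.zipWithIndex.collect { case (v, i) if v => i })
```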

Convert an array into an index hash in Ruby

I have an array, and I want to make a hash so I can quickly ask "is X in the array?".
In perl, there is an easy (and fast) way to do this:
my @array = qw( 1 2 3 );
my %hash;
@hash{@array} = undef;
This generates a hash that looks like:
{
1 => undef,
2 => undef,
3 => undef,
}
The best I've come up with in Ruby is:
array = [1, 2, 3]
hash = Hash[array.map {|x| [x, nil]}]
which gives:
{1=>nil, 2=>nil, 3=>nil}
Is there a better Ruby way?
EDIT 1
No, Array.include? is not a good idea. It's slow. It does a query in O(n) instead of O(1). My example array had three elements for brevity; assume the actual one has a million elements. Let's do a little benchmarking:
#!/usr/bin/ruby -w
require 'benchmark'
array = (1..1_000_000).to_a
hash = Hash[array.map {|x| [x, nil]}]
Benchmark.bm(15) do |x|
x.report("Array.include?") { 1000.times { array.include?(500_000) } }
x.report("Hash.include?") { 1000.times { hash.include?(500_000) } }
end
Produces:
user system total real
Array.include? 46.190000 0.160000 46.350000 ( 46.593477)
Hash.include? 0.000000 0.000000 0.000000 ( 0.000523)
If all you need the hash for is membership, consider using a Set:
Set
Set implements a collection of unordered values with no
duplicates. This is a hybrid of Array's intuitive inter-operation
facilities and Hash's fast lookup.
Set is easy to use with Enumerable objects (implementing
each). Most of the initializer methods and binary operators accept
generic Enumerable objects besides sets and arrays. An
Enumerable object can be converted to Set using the
to_set method.
Set uses Hash as storage, so you must note the following points:
Equality of elements is determined according to Object#eql? and Object#hash.
Set assumes that the identity of each element does not change while it is stored. Modifying an element of a set will render the set to an
unreliable state.
When a string is to be stored, a frozen copy of the string is stored instead unless the original string is already frozen.
Comparison
The comparison operators <, >, <= and >= are implemented as
shorthand for the {proper_,}{subset?,superset?} methods. However, the
<=> operator is intentionally left out because not every pair of
sets is comparable. ({x,y} vs. {x,z} for example)
Example
require 'set'
s1 = Set.new [1, 2] # -> #<Set: {1, 2}>
s2 = [1, 2].to_set # -> #<Set: {1, 2}>
s1 == s2 # -> true
s1.add("foo") # -> #<Set: {1, 2, "foo"}>
s1.merge([2, 6]) # -> #<Set: {1, 2, "foo", 6}>
s1.subset? s2 # -> false
s2.subset? s1 # -> true
[...]
Public Class Methods
new(enum = nil)
Creates a new set containing the elements of the given enumerable
object.
If a block is given, the elements of enum are preprocessed by the
given block.
try this one:
a=[1,2,3]
Hash[a.zip]
You can do this very handy trick:
Hash[*[1, 2, 3, 4].map {|k| [k, nil]}.flatten]
=> {1=>nil, 2=>nil, 3=>nil, 4=>nil}
If you want to quickly ask "is X in the array?" you should use Array#include?.
Edit (in response to addition in OP):
If you want speedy look up times, use a Set. Having a Hash that points to all nils is silly. Conversion is an easy process too with Array#to_set.
require 'benchmark'
require 'set'
array = (1..1_000_000).to_a
set = array.to_set
Benchmark.bm(15) do |x|
x.report("Array.include?") { 1000.times { array.include?(500_000) } }
x.report("Set.include?") { 1000.times { set.include?(500_000) } }
end
Results on my machine:
user system total real
Array.include? 36.200000 0.140000 36.340000 ( 36.740605)
Set.include? 0.000000 0.000000 0.000000 ( 0.000515)
You should consider just using a set to begin with, instead of an array so that a conversion is never necessary.
I'm fairly certain that there isn't a one-shot clever way to construct this hash. My inclination would be to just be explicit and state what I'm doing:
hash = {}
array.each{|x| hash[x] = nil}
It doesn't look particularly elegant, but it's clear, and does the job.
FWIW, your original suggestion (under Ruby 1.8.6 at least) doesn't seem to work. I get an "ArgumentError: odd number of arguments for Hash" error. Hash.[] expects a literal, even-lengthed list of values:
Hash[a, 1, b, 2] # => {a => 1, b => 2}
so I tried changing your code to:
hash = Hash[*array.map {|x| [x, nil]}.flatten]
but the performance is dire:
#!/usr/bin/ruby -w
require 'benchmark'
array = (1..100_000).to_a
Benchmark.bm(15) do |x|
x.report("assignment loop") {hash = {}; array.each{|e| hash[e] = nil}}
x.report("hash constructor") {hash = Hash[*array.map {|e| [e, nil]}.flatten]}
end
gives
user system total real
assignment loop 0.440000 0.200000 0.640000 ( 0.657287)
hash constructor 4.440000 0.250000 4.690000 ( 4.758663)
Unless I'm missing something here, a simple assignment loop seems the clearest and most efficient way to construct this hash.
Rampion beat me to it. Set might be the answer.
You can do:
require 'set'
set = array.to_set
set.include?(x)
Your way of creating the hash looks good. I had a muck around in irb and this is another way
>> [1,2,3,4].inject(Hash.new) { |h,i| {i => nil}.merge(h) }
=> {1=>nil, 2=>nil, 3=>nil, 4=>nil}
I think chrismear's point on using assignment over creation is great. To make the whole thing a little more Ruby-esque, though, I might suggest assigning something other than nil to each element:
hash = {}
array.each { |x| hash[x] = 1 } # or true or something else "truthy"
...
if hash[376] # instead of if hash.has_key?(376)
...
end
The problem with assigning to nil is that you have to use has_key? instead of [], since [] give you nil (your marker value) if the Hash doesn't have the specified key. You could get around this by using a different default value, but why go through the extra work?
# much less elegant than above:
hash = Hash.new(42)
array.each { |x| hash[x] = nil }
...
unless hash[376]
...
end
Maybe I am misunderstanding the goal here; If you wanted to know if X was in the array, why not do array.include?("X") ?
Doing some benchmarking on the suggestions so far gives that chrismear and Gaius's assignment-based hash creation is slightly faster than my map method (and assigning nil is slightly faster than assigning true). mtyaka and rampion's Set suggestion is about 35% slower to create.
As far as lookups go, hash.include?(x) is a very tiny amount faster than hash[x]; both are twice as fast as set.include?(x).
user system total real
chrismear 6.050000 0.850000 6.900000 ( 6.959355)
derobert 6.010000 1.060000 7.070000 ( 7.113237)
Gaius 6.210000 0.810000 7.020000 ( 7.049815)
mtyaka 8.750000 1.190000 9.940000 ( 9.967548)
rampion 8.700000 1.210000 9.910000 ( 9.962281)
user system total real
times 10.880000 0.000000 10.880000 ( 10.921315)
set 93.030000 17.490000 110.520000 (110.817044)
hash-i 45.820000 8.040000 53.860000 ( 53.981141)
hash-e 47.070000 8.280000 55.350000 ( 55.487760)
Benchmarking code is:
#!/usr/bin/ruby -w
require 'benchmark'
require 'set'
array = (1..5_000_000).to_a
Benchmark.bmbm(10) do |bm|
bm.report('chrismear') { hash = {}; array.each{|x| hash[x] = nil} }
bm.report('derobert') { hash = Hash[array.map {|x| [x, nil]}] }
bm.report('Gaius') { hash = {}; array.each{|x| hash[x] = true} }
bm.report('mtyaka') { set = array.to_set }
bm.report('rampion') { set = Set.new(array) }
end
hash = Hash[array.map {|x| [x, true]}]
set = array.to_set
array = nil
GC.start
GC.disable
Benchmark.bmbm(10) do |bm|
bm.report('times') { 100_000_000.times { } }
bm.report('set') { 100_000_000.times { set.include?(500_000) } }
bm.report('hash-i') { 100_000_000.times { hash.include?(500_000) } }
bm.report('hash-e') { 100_000_000.times { hash[500_000] } }
end
GC.enable
If you're not bothered what the hash values are
irb(main):031:0> a=(1..1_000_000).to_a ; a.length
=> 1000000
irb(main):032:0> h=Hash[a.zip a] ; h.keys.length
=> 1000000
Takes a second or so on my desktop.
If you're looking for an equivalent of this Perl code:
grep {$_ eq $element} @array
You can just use the simple Ruby code:
array.include?(element)
Here's a neat way to cache lookups with a Hash:
a = (1..1000000).to_a
h = Hash.new{|hash,key| hash[key] = true if a.include? key}
Pretty much what it does is create a default constructor for new hash values, then stores "true" in the cache if it's in the array (nil otherwise). This allows lazy loading into the cache, just in case you don't use every element.
This preserves 0's if your array was [0,0,0,1,0]
hash = {}
arr.each_with_index{|el, idx| hash.merge!({(idx + 1 )=> el }) }
Returns :
# {1=>0, 2=>0, 3=>0, 4=>1, 5=>0}
