I am experiencing a boxing issue that negatively affects the performance of my Scala code. I have extracted the relevant code, which still shows the issue (with some added strangeness). I have the following representation of a 2D Double array, which lets me transform it by passing in my own functions:
case class Container(
  a: Array[Array[Double]] = Array.tabulate[Double](10000, 10000)((x, y) => x.toDouble * y)
) {
  def transformXY(f: (Double, Double, Double) => Double): Container = {
    Container(Array.tabulate[Double](a.length, a.length) { (x, y) =>
      f(x, y, a(x)(y))
    })
  }
  def transform(f: Double => Double): Container = {
    Container(Array.tabulate[Double](a.length, a.length) { (x, y) =>
      f(a(x)(y))
    })
  }
}
The following code reproduces the issue for me:
object Main extends App {
  def now = System.currentTimeMillis()
  val iters = 3
  def doTransformsXY() = {
    var t = Container()
    for (i <- 0 until iters) {
      val start = now
      t = t.transformXY { (x, y, h) =>
        h + math.sqrt(x * x + y * y)
      }
      println(s"transformXY: Duration ${now - start}")
    }
  }
  def doTransforms() = {
    var t = Container()
    for (i <- 0 until iters) {
      val start = now
      t = t.transform { h =>
        h + math.sqrt(h * h * h)
      }
      println(s"transform: Duration ${now - start}")
    }
  }
  if (true) { // Shows a lot of boxing if enabled
    doTransformsXY()
  }
  if (true) { // Shows a lot of boxing again - if enabled
    doTransformsXY()
  }
  if (true) { // Shows java8.JFunction...apply()
    doTransforms()
  }
  if (true) { // Shows java8.JFunction...apply() if doTransforms() is enabled
    doTransformsXY()
  }
}
When I run this code and sample it with Java VisualVM, I observe the following:
while doTransformsXY is running, a lot of time is spent in scala.runtime.BoxesRunTime.boxToDouble()
once doTransforms is running, there is no longer significant time spent in boxing; the samples show scala.runtime.java8.JFunction2$mcDII$sp.apply() instead
when I run doTransformsXY again, there is still no significant boxing; again, time accumulates in scala.runtime.java8.JFunction2$mcDII$sp.apply()
This is with Scala 2.12.4, Windows x64 jdk1.8.0_92
My primary question is about the boxing, which I see in my production code as well:
Why is Double boxing happening in Array.tabulate? Do I need to go procedural (while loops, manual Array creation) to avoid it?
My secondary question is:
Why is no more boxing done once I call the transform variant?
Regarding why no more boxing is done once the transform variant is called: I could not reproduce that. If I carefully pause the VMs and check with JProfiler, there is still a lot of boxing and allocation of Doubles, which is what I expected and what I can explain.
Looking at the Function1 and Function2 traits in the standard library, we can see @specialized annotations:
trait Function1[@specialized(Int, Long, Float, Double) -T1, @specialized(Unit, Boolean, Int, Float, Long, Double) +R]
trait Function2[@specialized(Int, Long, Double) -T1, @specialized(Int, Long, Double) -T2, @specialized(Unit, Boolean, Int, Float, Long, Double) +R]
but Function3 is just
trait Function3[-T1, -T2, -T3, +R]
@specialized is how Scala lets you avoid boxing when generics are used with primitives. But this comes at the price of the compiler having to generate additional methods and classes, so beyond a certain threshold it would produce a ridiculous amount of code (if not crash outright). So Function1 has, if my math is correct, 4 (specializations of T1) x 6 (specializations of R) = 24 copies of each specialized method and 24 extra classes, in addition to the plain apply and the generic trait.
By the way, those specialized methods are suffixed with $mc followed by the JVM type-descriptor letters. So a method ending in $mcDII is a specialized overload that returns a Double and accepts two Ints as parameters. This is the type of the function you're passing into tabulate inside transform, i.e. this part:
(x, y) => f(a(x)(y))
Calls to f itself, meanwhile, should show up with the $mcDD suffix (returns a Double and accepts a Double).
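As a small illustration of that naming convention, here is a Java sketch of a hypothetical helper, SpecSuffix.describe (not part of Scala or the JDK), that turns a suffix such as $mcDII into a readable signature. It assumes, as in the $mcDII example, that the first letter after $mc is the return type and the remaining letters are the parameter types, written with the usual JVM descriptor letters.

```java
// Hypothetical helper (not part of Scala or the JDK): decodes a Scala specialization
// suffix like "$mcDII" into a readable signature. Assumption: the first letter after
// "$mc" is the return type, the remaining letters are the parameter types.
final class SpecSuffix {
    // JVM descriptor letters; 'V' (void) is what Scala's Unit maps to.
    private static String typeName(char c) {
        switch (c) {
            case 'Z': return "boolean";
            case 'B': return "byte";
            case 'C': return "char";
            case 'S': return "short";
            case 'I': return "int";
            case 'J': return "long";
            case 'F': return "float";
            case 'D': return "double";
            case 'V': return "void";
            default: throw new IllegalArgumentException("Unknown descriptor letter: " + c);
        }
    }

    static String describe(String suffix) {
        if (!suffix.startsWith("$mc") || suffix.length() < 4) {
            throw new IllegalArgumentException("Not a specialization suffix: " + suffix);
        }
        String letters = suffix.substring(3); // e.g. "DII"
        StringBuilder params = new StringBuilder("(");
        for (int i = 1; i < letters.length(); i++) {
            if (i > 1) {
                params.append(", ");
            }
            params.append(typeName(letters.charAt(i)));
        }
        params.append(")");
        return params + " => " + typeName(letters.charAt(0));
    }
}
```

For example, describe("$mcDII") yields "(int, int) => double", matching the (x, y) => f(a(x)(y)) lambda passed to tabulate.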
However, calling
f(x, y, a(x)(y))
will become something like
unbox(f(box(x), box(y), box(a(x)(y))))
Enough explanation; now for the solution. To bring both methods to the same boxing behavior, create a specialized interface:
trait DoubleFunction3 {
  def apply(a: Double, b: Double, c: Double): Double
}
and change the signature of transformXY:
def transformXY(f: DoubleFunction3): Container = //... same code
Since it's Scala 2.12 and the trait has just one abstract method, you can still pass lambdas (SAM conversion), so this code:
t = t.transformXY { (x, y, h) =>
  h + math.sqrt(x * x + y * y)
}
requires no change.
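The same distinction exists in plain Java, which may help show where the boxing comes from at the JVM level. This is an illustration, not the Scala compiler's actual output; TriFunction, DoubleFunction3 and BoxingDemo are hypothetical names. A generic three-argument interface erases to Object parameters, so primitive doubles are boxed on every call, while a hand-specialized interface takes double directly. java.util.function only provides primitive specializations up to two arguments (for example DoubleBinaryOperator), so a three-argument one has to be written by hand, much like the Scala trait.

```java
// Generic three-argument function: after erasure apply takes and returns Object,
// so primitive doubles are boxed on the way in and unboxed on the way out.
@FunctionalInterface
interface TriFunction<A, B, C, R> {
    R apply(A a, B b, C c);
}

// Hand-specialized variant (hypothetical, mirrors the Scala trait): works on
// primitive doubles only, so no Double objects are needed.
@FunctionalInterface
interface DoubleFunction3 {
    double apply(double a, double b, double c);
}

final class BoxingDemo {
    static double viaGeneric(TriFunction<Double, Double, Double, Double> f,
                             double x, double y, double h) {
        // x, y and h are autoboxed to Double; the Double result is auto-unboxed.
        return f.apply(x, y, h);
    }

    static double viaPrimitive(DoubleFunction3 f, double x, double y, double h) {
        // Primitive doubles all the way through.
        return f.apply(x, y, h);
    }
}
```

Both methods compute the same value; the difference is only whether Double objects are allocated along the way, which a profiler, not the result, will reveal.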
You might notice that this still does not fully eliminate boxing, because tabulate causes some too. This is the definition of the one-dimensional tabulate:
def tabulate[T: ClassTag](n: Int)(f: Int => T): Array[T] = {
  val b = newBuilder[T]
  b.sizeHint(n)
  var i = 0
  while (i < n) {
    b += f(i)
    i += 1
  }
  b.result()
}
Note that it works with a generic Builder[T], calling its method +=(elem: T). Builder itself is not specialized, so it does wasteful boxing/unboxing when creating your arrays. The fix is to write a version that works directly with Double instead of T, for the dimensions you need.
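A minimal Java sketch of that idea, assuming a rectangular grid: the hypothetical DoubleGrid.tabulate fills a double[][] with plain nested loops and a primitive generator interface, so neither the generator call nor the array store boxes. DoubleGrid, IndexToDouble and DoubleTriOperator are illustrative names; in Scala the equivalent would be a while-loop version of tabulate written against Array[Double].

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical primitive generator: computes the value at (x, y) without boxing.
@FunctionalInterface
interface IndexToDouble {
    double apply(int x, int y);
}

// Hypothetical primitive three-argument function: (x, y, current value) -> new value.
@FunctionalInterface
interface DoubleTriOperator {
    double apply(double x, double y, double h);
}

final class DoubleGrid {
    // Primitive 2D tabulate: writes straight into double[][], no Builder[T], no boxing.
    static double[][] tabulate(int rows, int cols, IndexToDouble f) {
        double[][] out = new double[rows][cols];
        for (int x = 0; x < rows; x++) {
            double[] row = out[x];
            for (int y = 0; y < cols; y++) {
                row[y] = f.apply(x, y);
            }
        }
        return out;
    }

    // Counterpart of Container.transformXY; assumes every row has the same length.
    static double[][] transformXY(double[][] a, DoubleTriOperator f) {
        int cols = a.length == 0 ? 0 : a[0].length;
        return tabulate(a.length, cols, (x, y) -> f.apply(x, y, a[x][y]));
    }

    // Counterpart of Container.transform.
    static double[][] transform(double[][] a, DoubleUnaryOperator f) {
        int cols = a.length == 0 ? 0 : a[0].length;
        return tabulate(a.length, cols, (x, y) -> f.applyAsDouble(a[x][y]));
    }
}
```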
Related
The code below is written in Scala:
val Array(f, t) = readLine().trim().split(" +").map(_.toInt)
I do not understand val Array(f, t).
To me, Array is a class, so we can only create an object from it and use that object to call the class's methods, or else call static methods of the Array class without creating an object.
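import java.util.Arrays
import scala.io.StdIn.{readInt, readLine}

object Main {
  def main(args: Array[String]): Unit = {
    val n = readInt()
    val m = readInt()
    val f = Array.ofDim[Int](100000)
    Arrays.fill(f, -1)
    for (e <- 1 to m) {
      // Note: this f shadows the array f declared above
      val Array(f, t) = readLine().trim().split(" +").map(_.toInt)
      // Code continues
    }
  }
}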
This is pattern matching (see, for example, extractors). The code means: assign the first element (index 0) of the resulting array to f and the second element (index 1) to t; the array must contain exactly two elements, otherwise a scala.MatchError is thrown. Both f and t are fresh variables.
You also mentioned being confused by the val Array(...) syntax. It is resolved through the extractor method scala.Array.unapplySeq[T](x: Array[T]).
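Java has no array patterns, but the check the Scala pattern performs can be written out by hand. This is a rough analogue of val Array(f, t) = ..., using a hypothetical PairPattern.matchTwo helper: it converts every token first (as map(_.toInt) does) and then insists on exactly two values, throwing IllegalArgumentException where Scala would throw scala.MatchError.

```java
// Hypothetical helper: a rough Java analogue of
//   val Array(f, t) = readLine().trim().split(" +").map(_.toInt)
final class PairPattern {
    static int[] matchTwo(String line) {
        String[] parts = line.trim().split(" +");
        // Like map(_.toInt): convert every token first (bad tokens throw NumberFormatException).
        int[] values = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Integer.parseInt(parts[i]);
        }
        // Like the pattern Array(f, t): only an array of exactly two elements matches.
        if (values.length != 2) {
            throw new IllegalArgumentException("Expected exactly 2 values, got " + values.length);
        }
        return values; // values[0] plays the role of f, values[1] of t
    }
}
```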
What is the main difference between these two:
val array: Array<Double> = arrayOf()
vs
val array: DoubleArray = doubleArrayOf()
I know that one uses the object-based type Double and the other its primitive counterpart double.
Is there any penalty or disadvantage in using plain DoubleArray?
Why I want to know:
I am using JNI, and for Array<Double> I have to call
jclass doubleClass = env->FindClass("java/lang/Double");
jmethodID doubleCtor = env->GetMethodID(doubleClass, "<init>", "(D)V");
jobjectArray res = env->NewObjectArray(elementCount, doubleClass, nullptr);
for (int i = 0; i < elementCount; i++) {
    jobject javaDouble = env->NewObject(doubleClass, doubleCtor, array[i]);
    env->SetObjectArrayElement(res, i, javaDouble);
    env->DeleteLocalRef(javaDouble);
}
vs
jdoubleArray res = env->NewDoubleArray(elementCount);
env->SetDoubleArrayRegion(res, 0, elementCount, array);
There is no penalty (in fact, it will be faster due to no boxing), but, as with primitive types in Java, it forces you to create specialized overloads of certain methods if you want to be able to use them with [Int/Double/etc]Array.
This has actually been discussed over at the Kotlin forums:
the memory layout of an array of integers is quite different from that of an array of object pointers.
Norswap's comment in that discussion summarizes the tradeoff quite well:
The native one [int[]/IntArray] is faster to read/write, but the wrapped [Integer[]/Array<Int>] does not need to be entirely converted each time it crosses a generic boundary.
#7, norswap
For example, a function accepting Array<Int> (Integer[] on the JVM) will not accept an IntArray (int[]).
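To make the tradeoff concrete on the JVM, where Array<Double> becomes Double[] and DoubleArray becomes double[], here is a Java sketch with a hypothetical ArrayBoundary class. Summing reads every element, which for Double[] means unboxing; handing the array to a generic List API, on the other hand, is a cheap view for Double[] but requires boxing every element of a double[].

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class ArrayBoundary {
    // Double[] (Kotlin Array<Double>): every read unboxes a Double object.
    static double sumBoxed(Double[] values) {
        double s = 0.0;
        for (Double v : values) {
            s += v; // auto-unboxing
        }
        return s;
    }

    // double[] (Kotlin DoubleArray): plain primitive loads, no unboxing.
    static double sumPrimitive(double[] values) {
        double s = 0.0;
        for (double v : values) {
            s += v;
        }
        return s;
    }

    // Crossing a generic boundary: Double[] can be wrapped as a List view, without copying.
    static List<Double> asListBoxed(Double[] values) {
        return Arrays.asList(values);
    }

    // double[] must be converted element by element, boxing each value into a new list.
    static List<Double> asListPrimitive(double[] values) {
        return Arrays.stream(values).boxed().collect(Collectors.toList());
    }
}
```

Because asListBoxed returns a view, later writes to the Double[] show up in the list, while the list built from the double[] is an independent boxed copy.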
You have already listed the only real difference: one is compiled to the primitive double[] and the other to Double[]. However, Double[] is an array of objects, so any time you store a double into the array or read one out of it, boxing or unboxing, respectively, is performed.
It is usually recommended to use DoubleArray instead, for speed and memory reasons.
As an example of speed penalties due to the object wrappers, take a look at the start of this post, taken from Effective Java:
public static void main(String[] args) {
    Long sum = 0L; // uses Long, not long
    for (long i = 0; i <= Integer.MAX_VALUE; i++) {
        sum += i;
    }
    System.out.println(sum);
}
Replacing Long with long brings runtime from 43 seconds down to 8 seconds.
I frequently need to calculate the mean and standard deviation of numeric arrays, so I've written a small protocol and extensions for numeric types that seem to work. I would just like feedback on whether anything is wrong with how I have done this. Specifically, I am wondering if there is a better way to check whether the type can be cast to a Double, to avoid the need for the asDouble variable and the init(_:Double) constructor.
I know there are issues with protocols that allow for arithmetic, but this seems to work ok and saves me from putting the standard deviation function into classes that need it.
protocol Numeric {
    var asDouble: Double { get }
    init(_: Double)
}
extension Int: Numeric {var asDouble: Double { get {return Double(self)}}}
extension Float: Numeric {var asDouble: Double { get {return Double(self)}}}
extension Double: Numeric {var asDouble: Double { get {return Double(self)}}}
extension CGFloat: Numeric {var asDouble: Double { get {return Double(self)}}}
extension Array where Element: Numeric {
    var mean : Element { get { return Element(self.reduce(0, combine: {$0.asDouble + $1.asDouble}) / Double(self.count))}}
    var sd : Element { get {
        let mu = self.reduce(0, combine: {$0.asDouble + $1.asDouble}) / Double(self.count)
        let variances = self.map{pow(($0.asDouble - mu), 2)}
        return Element(sqrt(variances.mean))
    }}
}
edit: I know it's kind of pointless to get [Int].mean and sd, but I might use Numeric elsewhere, so it's for consistency.
edit: as @Severin Pappadeux pointed out, variance can be expressed in a way that avoids the three passes over the array (mean, then map, then mean). Here is the final standard deviation extension:
extension Array where Element: Numeric {
    var sd : Element { get {
        let sss = self.reduce((0.0, 0.0)){ return ($0.0 + $1.asDouble, $0.1 + ($1.asDouble * $1.asDouble))}
        let n = Double(self.count)
        return Element(sqrt(sss.1/n - (sss.0/n * sss.0/n)))
    }}
}
Swift 4 Array extension with FloatingPoint elements:
extension Array where Element: FloatingPoint {
    func sum() -> Element {
        return self.reduce(0, +)
    }
    func avg() -> Element {
        return self.sum() / Element(self.count)
    }
    func std() -> Element {
        let mean = self.avg()
        let v = self.reduce(0, { $0 + ($1-mean)*($1-mean) })
        return sqrt(v / (Element(self.count) - 1))
    }
}
There's actually a class that already provides this functionality: NSExpression. You could reduce your code size and complexity by using it instead. There's quite a bit to this class, but a simple implementation of what you want is as follows.
let expression = NSExpression(forFunction: "stddev:", arguments: [NSExpression(forConstantValue: [1,2,3,4,5])])
let standardDeviation = expression.expressionValueWithObject(nil, context: nil)
You can calculate mean too, and much more. Info here: http://nshipster.com/nsexpression/
In Swift 3 you might (or might not) be able to save yourself some duplication with the FloatingPoint protocol, but otherwise what you're doing is exactly right.
To follow up on Matt's observation, I'd implement the main algorithm on FloatingPoint, taking care of Double, Float, CGFloat, etc. But then I'd add another variant on BinaryInteger, to take care of all of the integer types.
E.g. on FloatingPoint:
extension Array where Element: FloatingPoint {
    /// The mean average of the items in the collection.
    var mean: Element { return reduce(Element(0), +) / Element(count) }

    /// The unbiased sample standard deviation. Is `nil` if there is an insufficient number of items in the collection.
    var stdev: Element? {
        guard count > 1 else { return nil }
        return sqrt(sumSquaredDeviations() / Element(count - 1))
    }

    /// The population standard deviation. Is `nil` if there is an insufficient number of items in the collection.
    var stdevp: Element? {
        guard count > 0 else { return nil }
        return sqrt(sumSquaredDeviations() / Element(count))
    }

    /// Calculate the sum of the squares of the differences of the values from the mean
    ///
    /// A calculation common for both sample and population standard deviations.
    ///
    /// - calculate mean
    /// - calculate deviation of each value from that mean
    /// - square that
    /// - sum all of those squares
    private func sumSquaredDeviations() -> Element {
        let average = mean
        return map {
            let difference = $0 - average
            return difference * difference
        }.reduce(Element(0), +)
    }
}
But then on BinaryInteger:
extension Array where Element: BinaryInteger {
    var mean: Double { return map { Double(exactly: $0)! }.mean }
    var stdev: Double? { return map { Double(exactly: $0)! }.stdev }
    var stdevp: Double? { return map { Double(exactly: $0)! }.stdevp }
}
Note, in my scenario, even when dealing with integer input data, I generally want floating point mean and standard deviations, so I arbitrarily chose Double. And you might want to do safer unwrapping of Double(exactly:). You can handle this scenario any way you want. But it illustrates the idea.
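For comparison, here is the same design sketched in Java (the Stats class and its method names are hypothetical): the algorithm is implemented once for double[], an empty OptionalDouble stands in for Swift's nil when there are too few items, and int[] input is widened to double (exact for every int value), which plays the role of Double(exactly:).

```java
import java.util.Arrays;
import java.util.OptionalDouble;

final class Stats {
    // Arithmetic mean; NaN for an empty array (0.0 / 0).
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Shared by both standard deviations: sum of squared differences from the mean.
    private static double sumSquaredDeviations(double[] xs) {
        double avg = mean(xs);
        double acc = 0.0;
        for (double x : xs) {
            double d = x - avg;
            acc += d * d;
        }
        return acc;
    }

    // Unbiased sample standard deviation (divides by n - 1); empty if fewer than 2 items.
    static OptionalDouble stdev(double[] xs) {
        if (xs.length < 2) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(Math.sqrt(sumSquaredDeviations(xs) / (xs.length - 1)));
    }

    // Population standard deviation (divides by n); empty if there are no items.
    static OptionalDouble stdevp(double[] xs) {
        if (xs.length < 1) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(Math.sqrt(sumSquaredDeviations(xs) / xs.length));
    }

    // Integer overloads: widen to double (exact for every int) and reuse the code above.
    static double mean(int[] xs) { return mean(toDoubles(xs)); }
    static OptionalDouble stdev(int[] xs) { return stdev(toDoubles(xs)); }
    static OptionalDouble stdevp(int[] xs) { return stdevp(toDoubles(xs)); }

    private static double[] toDoubles(int[] xs) {
        return Arrays.stream(xs).asDoubleStream().toArray();
    }
}
```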
I don't know Swift, but from a numerics point of view you're doing it a bit inefficiently.
Basically, you're making two passes (actually, three) over the array to compute two values, where one pass should be enough. Variance can be expressed as $E(X^2) - E(X)^2$, so in C++:
#include <cmath>
#include <utility>
#include <vector>

std::pair<float, float> get_mean_sd(const std::vector<float>& data) {
    float s = 0.0f;
    float s2 = 0.0f;
    for (float v : data) {
        s += v;
        s2 += v * v;
    }
    const float count = static_cast<float>(data.size());
    s /= count;
    s2 /= count;
    s2 -= s * s; // E(X^2) - E(X)^2
    return {s, std::sqrt(s2 > 0.0f ? s2 : 0.0f)};
}
Just a heads-up: when I tested the code outlined by Severin Pappadeux, the result was a "population standard deviation" rather than a "sample standard deviation". You would use the former when 100% of the relevant data is available to you, such as when computing the variance around the average grade of all 20 students in a class. You would use the latter when you do not have access to all the relevant data and have to estimate the variance from a much smaller sample, such as when estimating the variance of the heights of all males in a large country.
The population standard deviation is often denoted StDevP. The Swift 5.0 code I used is shown below. Note that it is not suitable for very large arrays, because the low-order bits of small values are lost as the sums grow large. Especially when the variance is close to zero, rounding can even produce a slightly negative variance, in which case sqrt returns NaN. For such serious work you might need an algorithm such as compensated summation.
import Foundation
extension Array where Element: FloatingPoint
{
    var sum: Element {
        return self.reduce( 0, + )
    }
    var average: Element {
        return self.sum / Element( count )
    }
    /**
     (for a floating point array) returns a tuple containing the average and the "standard deviation for populations"
     */
    var averageAndStandardDeviationP: ( average: Element, stDevP: Element ) {
        let sumsTuple = sumAndSumSquared
        let populationSize = Element( count )
        let average = sumsTuple.sum / populationSize
        let expectedXSquared = sumsTuple.sumSquared / populationSize
        let variance = expectedXSquared - (average * average )
        return ( average, sqrt( variance ) )
    }
    /**
     (for a floating point array) returns a tuple containing the sum of all the values and the sum of all the values-squared
     */
    private var sumAndSumSquared: ( sum: Element, sumSquared: Element ) {
        return self.reduce( (Element(0), Element(0) ) )
        {
            ( arg0, x) in
            let (sumOfX, sumOfSquaredX) = arg0
            return ( sumOfX + x, sumOfSquaredX + ( x * x ) )
        }
    }
}
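Besides compensated summation, another widely used remedy is Welford's online algorithm, a different technique from the one named above: it still makes a single pass, but it updates a running mean and the sum of squared deviations from that mean instead of accumulating sum(x) and sum(x^2), which avoids subtracting two large, nearly equal numbers. A Java sketch with a hypothetical RunningStats class, giving both the population (StDevP) and the sample standard deviation:

```java
// Hypothetical class implementing Welford's online algorithm for mean and variance.
final class RunningStats {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the current mean

    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // uses the already-updated mean
    }

    long count() { return n; }

    double mean() { return mean; }

    // Population standard deviation (divides by n); NaN when there is no data.
    double populationStdDev() { return n > 0 ? Math.sqrt(m2 / n) : Double.NaN; }

    // Sample standard deviation (divides by n - 1); NaN with fewer than 2 values.
    double sampleStdDev() { return n > 1 ? Math.sqrt(m2 / (n - 1)) : Double.NaN; }
}
```

On data such as 1e9 + 4, 1e9 + 7, 1e9 + 13 and 1e9 + 16, the running quantities stay small, so the population variance comes out as 22.5 and the sample variance as 30.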
In this article, the author explains monads using this example (I am guessing the language is Haskell):
bind f' :: (Float,String) -> (Float,String)
which implies that
bind :: (Float -> (Float,String)) -> ((Float,String) -> (Float,String))
The author then asks the reader to implement bind and offers this solution:
bind f' (gx,gs) = let (fx,fs) = f' gx in (fx,gs++fs)
I am having problem understanding the solution. What would this look like in C or Swift?
I have implemented the example as far as I can, and I am stuck at implementing bind:
let f: (Float) -> Float = { value in return 2 * value }
let g: (Float) -> Float = { value in return 10 + value }
let ff: (Float) -> (Float, String) = { value in return (f(value), "f called") }
let gg: (Float) -> (Float, String) = { value in return (g(value), "g called") }
In C++ I think it would look something like this:
#include <functional>
#include <string>
#include <utility>

using P = std::pair<float, std::string>;
using FP = std::function<P(P)>;

FP mbind(std::function<P(float)> f) {
    // Explicit return type: a lambda cannot deduce its return type from a braced list
    return [f](P in) -> P {
        auto && res = f(in.first);
        return {res.first, in.second + res.second};
    };
}
In C you could do something similar by storing function pointers, though the invocation syntax would have to be more verbose since you'll need to pass the state around explicitly.
In Swift, perhaps something like this:
let bind: (@escaping (Float) -> (Float, String)) -> ((Float, String)) -> (Float, String) = {
    lhs in
    return {
        rhs in
        let result = lhs(rhs.0)
        // old log first, then the new entry (gs ++ fs in the Haskell version)
        return (result.0, "\(rhs.1); \(result.1)")
    }
}
This is bind for the Writer monad. The bind function for that monad should do two things:
Execute computation with Float value.
Update the existing log (a String value).
Initially you have a tuple (oldFloat,oldString) and want to apply to this tuple a function with type Float -> (Float,String).
Your function takes an oldFloat value from tuple (oldFloat,oldString) and returns a tuple (newFloat,newString).
What behavior do you expect from your bind function? I suppose you want to get a tuple containing newFloat and the updated log oldString ++ newString, right? Here is a straightforward implementation of it:
bind f (oldFloat,oldString) =
  -- apply function f to oldFloat from tuple (oldFloat,oldString)
  -- to get a new tuple (newFloat,newString)
  let (newFloat,newString) = f oldFloat
  -- you want from your bind function to get a tuple containing
  -- a newFloat and a newString added to oldString
  in (newFloat, oldString ++ newString)
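The same bind can be sketched in Java, which may make the data flow easier to follow for readers coming from C-like languages. Logged and Writer are hypothetical names; Logged stands in for the (Float, String) tuple, and the log is concatenated old-first, matching gs ++ fs in the Haskell solution.

```java
import java.util.function.DoubleFunction;
import java.util.function.UnaryOperator;

// (value, log) pair; plays the role of (Float, String) in the Haskell example.
final class Logged {
    final double value;
    final String log;

    Logged(double value, String log) {
        this.value = value;
        this.log = log;
    }
}

final class Writer {
    // bind f' (gx, gs) = let (fx, fs) = f' gx in (fx, gs ++ fs)
    static UnaryOperator<Logged> bind(DoubleFunction<Logged> f) {
        return prev -> {
            Logged next = f.apply(prev.value);                   // run f on the old value
            return new Logged(next.value, prev.log + next.log);  // old log, then new entry
        };
    }
}
```

Composing bind(ff) after bind(gg), starting from (1, ""), applies gg first and then ff, and the log records the calls in that order.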
I'm attempting to represent the basic strategy of a blackjack game as a map with integer keys whose values are fixed-length arrays of strings.
The keys represent the value of the player's hand, and the array index represents the value of the dealer's up card (hence the fixed-length array of size 10, corresponding to card values 2-11). The string in the array at the position corresponding to the dealer's up card contains the ideal play (stay, hit, split, double).
For example, the player's hand is a hard 8 and the dealer's up card is a 2. Basic strategy says the player should hit. To determine this using my map, I would get the array whose key is 8 (player hand value = 8) and then look at the string at array index 0 (dealer up card = 2).
I've attempted to define it this way:
val hardHandBasicStrategy = collection.mutable.Map[Int,Array[String](10)]
but Scala doesn't seem to like this...
Please help me understand what I've done wrong, and/or suggest a way to make it work.
Scala doesn't have a type that represents arrays of a fixed size. You can either simply use arrays of size ten--this is what is normally done--or, if you want stronger guarantees that it really is size ten, you can construct a ten-long-array-class yourself:
import scala.reflect.ClassTag
import scala.util.hashing.MurmurHash3

class ArrayTen[T: ClassTag](zero: T) {
  protected val data = Array.fill(10)(zero)
  def apply(i: Int) = data(i)
  def update(i: Int, t: T): Unit = { data(i) = t }
  protected def set(ts: Array[T]): Unit = { for (i <- data.indices) data(i) = ts(i) }
  def map[U: ClassTag](f: T => U) = {
    val at = new ArrayTen(null.asInstanceOf[U])
    at.set(data.map(f))
    at
  }
  def foreach(f: T => Unit): Unit = { data.foreach(f) }
  override def toString = data.mkString("#10[", ",", "]")
  override def hashCode = MurmurHash3.arrayHash(data)
  override def equals(a: Any) = a match {
    case a: ArrayTen[_] => data.toSeq == a.data.toSeq
    case _ => false
  }
  // Add other methods here if you really need them
}
Example:
scala> new ArrayTen("(nothing)")
res1: ArrayTen[java.lang.String] =
#10[(nothing),(nothing),(nothing),(nothing),(nothing),
(nothing),(nothing),(nothing),(nothing),(nothing)]
scala> res1(3) = "something!!!"
scala> res1
res3: ArrayTen[java.lang.String] =
#10[(nothing),(nothing),(nothing),something!!!,(nothing),
(nothing),(nothing),(nothing),(nothing),(nothing)]
If you need the fixed-length array to take a parameter that determines the length, then you could do something like this:
trait Size { def size: Int }
class ArraySize[S <: Size, T: ClassTag](zero: T, s: S) {
  protected val data = Array.fill(s.size)(zero)
  ...
}
You won't have all the collections goodies unless you reimplement them, but then again you don't want most of the goodies, since most of them can change the length of the array.
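For comparison, the plain-array approach looks like this in Java, where arrays likewise carry no length in their type. The hypothetical BasicStrategyTable enforces the length of 10 when a row is inserted and maps a dealer up card of 2..11 to index 0..9, as in the question's hard-8-versus-2 lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class: basic-strategy lookup keyed by the player's hand total; each row
// holds exactly 10 plays indexed by dealer up card 2..11 (index 0 = up card 2).
final class BasicStrategyTable {
    private static final int UP_CARDS = 10;
    private final Map<Integer, String[]> table = new HashMap<>();

    // The length is checked at insertion time, since the array type cannot express it.
    void put(int playerTotal, String... plays) {
        if (plays.length != UP_CARDS) {
            throw new IllegalArgumentException("Expected " + UP_CARDS + " plays, got " + plays.length);
        }
        table.put(playerTotal, plays.clone()); // defensive copy: callers cannot alter stored plays
    }

    String play(int playerTotal, int dealerUpCard) {
        if (dealerUpCard < 2 || dealerUpCard > 11) {
            throw new IllegalArgumentException("Dealer up card must be 2..11: " + dealerUpCard);
        }
        String[] plays = table.get(playerTotal);
        if (plays == null) {
            throw new IllegalArgumentException("No entry for hand total " + playerTotal);
        }
        return plays[dealerUpCard - 2];
    }
}
```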