Scala: remove first column (first element in each row) - arrays

Given a var x: Array[Seq[Any]], what would be the most efficient (fast, in-place if possible?) way to remove the first element from each row?
I've tried the following but it didn't work - probably because of immutability...
for (row <- x.indices)
x(row).drop(1)

First off, Arrays are mutable, and I'm guessing you're intending to change the elements of x, rather than replacing the entire array with a new array object, so it makes more sense to use val instead of var:
val x: Array[Seq[Any]]
Since your Seq objects are immutable, drop returns a new sequence instead of changing the row, so you need to assign the result back into the array. This will work:
for (row <- x.indices)
x(row) = x(row).drop(1)
This can be written in nicer ways. For example, you can use transform to map all the values of your array with a function:
x.transform(_.drop(1))
This updates in-place, unlike map, which will leave the old array unmodified and return a new array.
EDIT: I started to speculate on which method would be faster or more efficient, but the more I think about it, the more I realize I'm not sure. Both should have acceptable performance for most use cases.
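To make the in-place versus copying distinction concrete, here is a minimal sketch. The names FirstColumn, dropFirstInPlace and dropFirstCopy are made up for illustration, and the index loop is used instead of transform only so that the snippet does not depend on a particular collections version.

```scala
object FirstColumn {
  // In-place: overwrite each slot of the existing array with the shortened row.
  def dropFirstInPlace(rows: Array[Seq[Any]]): Unit =
    for (i <- rows.indices) rows(i) = rows(i).drop(1)

  // Copying: build a new array and leave the input untouched.
  def dropFirstCopy(rows: Array[Seq[Any]]): Array[Seq[Any]] =
    rows.map(_.drop(1))
}
```

After dropFirstCopy the original array still holds the full rows; after dropFirstInPlace it does not.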

This would work, although map returns a new array rather than updating x in place:
scala> val x = Array(List(1,2,3),List(1,2,3))
x: Array[List[Int]] = Array(List(1, 2, 3), List(1, 2, 3))
scala> x map(_.drop(1))
res0: Array[List[Int]] = Array(List(2, 3), List(2, 3))

Related

Scala - Efficient element wise sum of two arrays

I have two arrays which I would like to reduce to one array in which at each index you have the sum of the two elements in the original arrays. For example:
val arr1: Array[Int] = Array(1, 1, 3, 3, 5)
val arr2: Array[Int] = Array(2, 1, 2, 2, 1)
val arr3: Array[Int] = sum(arr1, arr2)
// This should result in:
// arr3 = Array(3, 2, 5, 5, 6)
I've seen this post: Element-wise sum of arrays in Scala, and I currently use this approach (zip/map). However, using this for a big data application I am concerned about its performance. Using this approach one has to traverse the array(s) at least twice. Is there a better approach in terms of efficiency?
The most efficient way might well be to do it lazily.
As with anything collection-oriented, Scala 2.12 and 2.13 are going to be different (this code is Scala 2.13, but 2.12 will be similar... might extend IndexedSeqLike, but I don't know for sure)
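import scala.collection.IndexedSeq
import scala.math.Numeric.Implicits._ // brings the infix + for any T with a Numeric instance into scope
case class SumIndexedSeq[+T: Numeric](seq1: IndexedSeq[T], seq2: IndexedSeq[T]) extends IndexedSeq[T] {
  // The result is only as long as the shorter of the two inputs.
  override val length: Int = seq1.length.min(seq2.length)
  override def apply(i: Int): T =
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
    else seq1(i) + seq2(i) // computed on every access, nothing is stored
}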
Arrays are implicitly convertible to a subtype of collection.IndexedSeq. This will compute the sum of the corresponding elements on every access (which may be generally desirable as it's possible to use a mutable IndexedSeq).
If you need an Array, you can get one with only a single traversal via
val arr3: Array[Int] = SumIndexedSeq(arr1, arr2).toArray
but SumIndexedSeq can be used anywhere a Seq can be used without a traversal.
As a further optimization, especially if you're sure that the underlying collections/arrays won't mutate, you can add a cache so you don't add the same elements together twice. It can also be generalized, if you so care, to any binary operations on T (in which case the Numeric constraint can be removed).
As Luis noted, for a performance question: experiment and benchmark. It's worth keeping in mind that a cache implementation may well entail boxing every element to put in the cache, so you might need to be accessing the same elements many times in order for the cache to be a win (and a sufficiently large cache may have implications for the stability of a distributed system).
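The generalisation to an arbitrary binary operation mentioned above might look roughly like the sketch below. ZipWithIndexedSeq is a hypothetical name, not a standard library class; the sketch assumes Scala 2.13 collections, does no caching, and recomputes the value on every access.

```scala
import scala.collection.IndexedSeq

// Hypothetical generalisation: combines two indexed sequences lazily with any binary function.
final class ZipWithIndexedSeq[A, B, C](left: IndexedSeq[A], right: IndexedSeq[B])(f: (A, B) => C)
    extends IndexedSeq[C] {
  // Truncate to the shorter input, as SumIndexedSeq does.
  override val length: Int = math.min(left.length, right.length)

  override def apply(i: Int): C =
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
    else f(left(i), right(i))
}
```

For example, new ZipWithIndexedSeq[Int, Int, Int](Vector(1, 2, 3), Vector(10, 20))(_ + _) behaves like a two-element sequence without allocating a result collection up front.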
Well, first of all, as with all things related to performance the only answer is to benchmark.
Second, are you sure you need plain mutable, invariant, weird Arrays? Can't you use something like Vector or ArraySeq?
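Third, you can do something like this, or use a while loop, which would amount to the same thing:
import scala.collection.immutable.ArraySeq // assuming the immutable ArraySeq from Scala 2.13
val result = ArraySeq.tabulate(math.min(arr1.length, arr2.length)) { i =>
  arr1(i) + arr2(i)
}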
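For reference, the while-loop variant could be sketched as follows. ElementwiseSum and sum are illustrative names; the sketch assumes Int arrays and truncates to the shorter input, and whether it is actually faster than tabulate is something only a benchmark can tell.

```scala
object ElementwiseSum {
  // Single pass over both arrays, writing straight into a preallocated result.
  def sum(a: Array[Int], b: Array[Int]): Array[Int] = {
    val n = math.min(a.length, b.length)
    val out = new Array[Int](n)
    var i = 0
    while (i < n) {
      out(i) = a(i) + b(i)
      i += 1
    }
    out
  }
}
```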

Julia Quick way to initialise an empty array that's the same size as another?

I have an array
array1 = Array{Int,2}(undef, 2, 3)
Is there a way to quickly make a new array that's the same size as the first one? E.g. something like
array2 = Array{Int,2}(undef, size(array1))
Currently I have to do this, which is pretty cumbersome, and even worse for higher-dimensional arrays:
array2 = Array{Int,2}(undef, size(array1)[1], size(array1)[2])
What you're looking for is similar(array1).
You can even change up the array type by passing in a type, e.g.
similar(array1, Float64)
similar(array1, Int64)
Using similar is a great solution. But the reason your original attempt doesn't work is the number 2 in the type parameter signature: Array{Int, 2}. The number 2 specifies that the array must have 2 dimensions. If you remove it you can have exactly as many dimensions as you like:
julia> a = rand(2,4,3,2);
julia> b = Array{Int}(undef, size(a));
julia> size(b)
(2, 4, 3, 2)
This works for other array constructors too:
zeros(size(a))
ones(size(a))
fill(5, size(a))
# etc.

Adding value to arrays in scala

I faced a problem where I needed to add a new value in the middle of an Array (i.e. make a copy of the original array and replace that with the new one). I successfully solved my problem, but I was wondering whether there were other methods to do this without temporarily converting the array to a buffer.
val original = Array(0, 1, 3, 4)
val parts = original.splitAt(2)
val modified = parts._1 ++ (2 +: parts._2)
res0: Array[Int] = Array(0, 1, 2, 3, 4)
What I don't like about my solution is the parts variable; I'd prefer not to use an intermediate step like that. Is that the easiest way to add the value, or is there a better way to add an element?
This is precisely what patch does:
val original = Array(0, 1, 3, 4)
original.patch(2, Array(2), 0) // Array[Int] = Array(0, 1, 2, 3, 4)
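The third argument of patch is the number of existing elements to replace, which is why 0 turns it into a pure insertion. A small sketch (PatchDemo and the value names are only for illustration):

```scala
object PatchDemo {
  val original: Array[Int] = Array(0, 1, 3, 4)

  // replaced = 0: nothing is removed, so the new element is inserted at index 2.
  val inserted: Array[Int] = original.patch(2, Array(2), 0)

  // replaced = 1: the element at index 2 is swapped out for the new one.
  val swapped: Array[Int] = original.patch(2, Array(9), 1)
}
```

In both cases original is left unchanged, since patch returns a new array.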
You can use a mutable version of a collection to do this. For example, ArrayBuffer's insert method does what you want (it inserts an element at a given index).
Well, if indeed the extra variable is what's troubling you, you can do it in one go:
val modified = original.take(2) ++ (2 +: original.drop(2))
But using a mutable collection like Augusto suggested might fit better, depending on your use case (e.g. performance, array size, multiple such edits...).
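If the data can live in a mutable buffer from the start, the insertion becomes a single call. A minimal sketch, assuming an ArrayBuffer is acceptable in place of the Array (BufferInsert and demo are illustrative names):

```scala
import scala.collection.mutable.ArrayBuffer

object BufferInsert {
  def demo(): ArrayBuffer[Int] = {
    val buf = ArrayBuffer(0, 1, 3, 4)
    buf.insert(2, 2) // shifts the elements from index 2 onwards and stores 2 there
    buf
  }
}
```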
The question is, what's the context? If you are doing this in a loop, allocating a new array every time will kill your performance anyway, and you should rethink your approach (e.g. collect all the elements you want to insert before inserting them).
If you aren't, well, you can use System.arraycopy to avoid any intermediate conversions:
val original = Array(0, 1, 3, 4)
val index = 2
val valueToInsert = 2
val modified = Array.ofDim[Int](original.length + 1)
System.arraycopy(original, 0, modified, 0, index)
modified(index) = valueToInsert
System.arraycopy(original, index, modified, index + 1, original.length - index)
But note how easy it is to make an off-by-one error here (I think there isn't one, but I haven't tested it). So the only reason to do it is if you really need high performance, and that's only likely if it happens in a loop, in which case go back to the second sentence.
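One way to guard against such off-by-one mistakes is to wrap the copy in a small function and compare it with patch at every valid index. This is only a sketch for Int arrays; ArrayInsert, insertAt and agreesWithPatch are hypothetical names.

```scala
object ArrayInsert {
  // Same two arraycopy calls as above, wrapped in a function with a bounds check.
  def insertAt(original: Array[Int], index: Int, value: Int): Array[Int] = {
    require(index >= 0 && index <= original.length, s"index $index out of range")
    val modified = new Array[Int](original.length + 1)
    System.arraycopy(original, 0, modified, 0, index)
    modified(index) = value
    System.arraycopy(original, index, modified, index + 1, original.length - index)
    modified
  }

  // Cross-check against patch at every insertion point, including both ends.
  def agreesWithPatch(original: Array[Int], value: Int): Boolean =
    (0 to original.length).forall { i =>
      val expected: Array[Int] = original.patch(i, Array(value), 0)
      insertAt(original, i, value).toSeq == expected.toSeq
    }
}
```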

How do I algorithmically instantiate and manipulate a multidimensional array in Scala

I am trying to write a program to manage a database through a Scala GUI, and have been running into a lot of trouble formatting my data in such a way as to input it into a Table and have the column headers populate. To do this, I have been told I would need to use an Array[Array[Any]] instead of an ArrayBuffer[ArrayBuffer[String]] as I have been using.
My problem is that the way I am trying to fill these arrays is modular: I am trying to use the same function to draw from different tables in a MySQL database, each of which has a different number of columns and entries.
I have been able to (I think) define a 2-D array with
val Data = new Array[Array[String]](numColumns)(numRows)
but I haven't found any ways of editing individual cells in this new array.
Data(i)(j)=Value //or
Data(i,j)=Value
do not work, and give me errors about "Update" functionality
I am sure this can't possibly be as complicated as I have been making it, so what is the easy way of managing these things in this language?
You don't need to read your data into an Array of Arrays - you just need to convert it to that format when you feed it to the Table constructor - which is easy, as demonstrated in my answer to your other question: How do I configure the Column names in a Scala Table?
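A rough sketch of that conversion, assuming the rows were collected as the ArrayBuffer[ArrayBuffer[String]] described in the question (TableData and toRowData are made-up names, and the Table constructor itself is not shown):

```scala
import scala.collection.mutable.ArrayBuffer

object TableData {
  // Convert each inner buffer to an Array[Any], then the outer buffer to an array of those.
  def toRowData(rows: ArrayBuffer[ArrayBuffer[String]]): Array[Array[Any]] =
    rows.map(_.toArray[Any]).toArray
}
```

This works for any number of rows and columns, so the same function can serve tables of different shapes.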
If you're creating a 2D array, the idiom you want is
val data = Array.ofDim[String](numColumns, numRows)
(There is also new Array[String](numColumns, numRows), but that's deprecated.)
You access element (i, j) of an Array data with data(i)(j) (remember they start from 0).
But in general you should avoid mutable collections (like Array, ArrayBuffer) unless there's a good reason. Try Vector instead.
Without knowing the format in which you're retrieving data from the database it's not possible to say how to put it into a collection.
Update:
You can alternatively put the type information on the left hand side, so the following are equivalent (decide for yourself which you prefer):
val a: Array[Array[String]] = Array.ofDim(2,2)
val a = Array.ofDim[String](2,2)
To explain the syntax for accessing / updating elements: as in Java, a multi-dimensional array is just an array of arrays. So here, a(i) is element i of a, which is an Array[String], and so a(i)(j) is element j of that array, which is a String.
Luigi's answer is great, but I'd like to shed some light on why your code isn't working.
val Data = new Array[Array[String]](numColumns)(numRows)
does not do what you expect it to do. The new Array[Array[String]](numColumns) part does create an array of arrays of strings with numColumns entries, with all entries (arrays of strings) being null, and returns it. The following (numRows) then just calls the apply function on that returned object, which returns the numRowsth entry in that array, which is null.
You can try that out in the scala REPL: When you input
new Array[Array[String]](10)(9)
you get this as output:
res0: Array[String] = null
Luigi's solution, instead
Array.ofDim[String](2,2)
does the right thing:
res1: Array[Array[String]] = Array(Array(null, null), Array(null, null))
It's rather ugly, but you can update a multidimensional array with update:
> val data = Array.ofDim[String](2,2)
data: Array[Array[String]] = Array(Array(null, null), Array(null, null))
> data(0).update(0, "foo")
> data
data: Array[Array[String]] = Array(Array(foo, null), Array(null, null))
Not sure about the efficiency of this technique.
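For what it's worth, the assignment form data(i)(j) = v is rewritten by the compiler into data(i).update(j, v), so the two spellings do the same thing. A small sketch (CellUpdate and demo are illustrative names):

```scala
object CellUpdate {
  def demo(): Array[Array[String]] = {
    val data = Array.ofDim[String](2, 2) // every cell starts out as null
    data(0)(0) = "foo"                   // assignment syntax
    data(1).update(1, "bar")             // the explicit call that the assignment syntax expands to
    data
  }
}
```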
Luigi's answer is great, but I just wanted to point out another way of initialising an Array that is more idiomatic/functional – using tabulate. This takes a function that takes the array cell coordinates as input and produces the cell value:
scala> Array.tabulate[String](4, 4) _
res0: (Int, Int) => String => Array[Array[String]] = <function1>
scala> val data = Array.tabulate(4, 4) {case (x, y) => x * y }
data: Array[Array[Int]] = Array(Array(0, 0, 0, 0), Array(0, 1, 2, 3), Array(0, 2, 4, 6), Array(0, 3, 6, 9))

ArrayStack Remove method?

Is there a method that does the same as Java's remove in the ArrayStack class?
Or is it possible to write one in Scala?
All Scala's collection types support adding/removing elements at either the start or the end (with varying performance trade-offs), and are only limited in size by certain properties of the JVM - such as the maximum size of pointers.
So if this is the only reason that you're using a Stack, then you've chosen the wrong collection type. Given the requirement to be able to remove elements from the middle of the collection, something like a Vector would be a much better fit.
You can use filterNot(_ == o) to create another stack with any instances of o missing (at least in 2.9), and you can use stack.slice(0,n) ++ stack.slice(n+1,stack.length) to create a new stack with the element at index n missing.
But, no, there isn't an exact analog, probably because removing an item at a random position in an array is a low-performance thing to do.
Edit: slice seems buggy to me, actually, in 2.9.0.RC2 (I have filed a bug report with code to fix it, so this will be fixed for 2.9.0.final, presumably). And in 2.8.1, you have to create a new ArrayStack by hand. So I guess the answer for now is a pretty emphatic "no".
Edit: slice has been fixed, so as of 2.9.0.RC4 and later, the slice approach should work.
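The same two ideas can be sketched on a Vector, which was suggested above as a better fit when elements have to be removed from the middle. RemoveAt, removeAt and removeAll are hypothetical helper names; both return new collections rather than mutating the input.

```scala
object RemoveAt {
  // Copy of `xs` without the element at index `n`; an out-of-range `n` leaves the contents unchanged.
  def removeAt[A](xs: Vector[A], n: Int): Vector[A] =
    xs.take(n) ++ xs.drop(n + 1)

  // Copy of `xs` without any element equal to `elem`.
  def removeAll[A](xs: Vector[A], elem: A): Vector[A] =
    xs.filterNot(_ == elem)
}
```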
Maybe this could fit your needs:
scala> import collection.mutable.Stack
import collection.mutable.Stack
scala> val s = new Stack[Int]
s: scala.collection.mutable.Stack[Int] = Stack()
scala> s push 1
res0: s.type = Stack(1)
scala> s push 2
res1: s.type = Stack(2, 1)
scala> s push 3
res2: s.type = Stack(3, 2, 1)
scala> s pop
res3: Int = 3
scala> s pop
res4: Int = 2
scala> s pop
res5: Int = 1
There is also an immutable version of the Stack class.
