Adding value to arrays in scala - arrays

I faced a problem where I needed to add a new value in the middle of an Array (i.e. make a copy of the original array and replace that with the new one). I successfully solved my problem, but I was wondering whether there were other methods to do this without changing the array to buffer for a while.
val original = Array(0, 1, 3, 4)
val parts = original.splitAt(2)
val modified = parts._1 ++ (2 +: parts._2)
res0: Array[Int] = Array(0, 1, 2, 3, 4)
What I don't like on my solution is the parts variable; I'd prefer not using an intermediate step like that. Is that the easiest way to add the value or is there some better ways to do add an element?

This is precisely what patch does:
val original = Array(0, 1, 3, 4)
original.patch(2, Array(2), 0) // Array[Int] = Array(0, 1, 2, 3, 4)

You can use a mutable version of a collection to do this. The method insert do what you want (insert an element at a given index).

Well, if indeed the extra variable is what's troubling you, you can do it in one go:
val modified = original.take(2) ++ (2 +: original.drop(2))
But using a mutable collection like Augusto suggested might fit better, depending on your use case (e.g. performance, array size, multiple such edits...).

The question is, what's the context? If you are doing this in a loop, allocating a new array every time will kill your performance anyway, and you should rethink your approach (e.g. collect all the elements you want to insert before inserting them).
If you aren't, well, you can use System.arraycopy to avoid any intermediate conversions:
val original = Array(0, 1, 3, 4)
val index = 2
val valueToInsert = 2
val modified = Array.ofDim[Int](original.length + 1)
System.arraycopy(original, 0, modified, 0, index)
modified(index) = valueToInsert
System.arraycopy(original, index, modified, index + 1, original.length - index)
But note how easy it's to make an off-by-one error here (I think there isn't one, but I haven't tested it). So the only reason to do it is if you really need high performance, and that's only likely if it happens in a loop, in which case go back to the second sentence.

Related

numpy array difference to the largest value in another large array which less than the original array

numpy experts,
I'm using numpy.
I want to compare two arrays, get the largest value that is smaller than one of the arrays, and calculate the difference between them.
For example,
A = np.array([3, 5, 7, 12, 13, 18])
B = np.array([4, 7, 17, 20])
I want [1, 0, 4, 2] (4-3, 7-7, 17-13, 20-18) , in this case.
The problem is that the size of the A and B arrays is so large that it would take a very long time to do this by simple means. I can try to divide them to some size, but I wonder if there is a simple numpy function to solve this problem.
Or can I use numba?
For your information, This is my current very stupid codes.
delta = np.zeros_like(B)
for i in range(len(B)):
index_A = (A <= B[i]).argmin() - 1
delta[i] = B[i] - A[index_A]
I agree with #tarlen555 that the problem is mostly related to the for-loop. I guess this one is already much faster:
diff = B-A[:,np.newaxis]
diff[diff<0] = max(A.max(), B.max())
diff.min(axis=0)
In the second line, I wanted to fill all entries with negative values with something ridiculously large. Since your numbers are integer, np.inf doesn't work, but something like that could be more elegant.
EDIT:
Another way:
from scipy.spatial import cKDTree
tree = cKDTree(A.reshape(-1, 1))
k = 2
large_value = max(A.max(), B.max())
while True:
indices = tree.query(B.reshape(-1, 1), k=k)[1]
diff = B[:,np.newaxis]-A[indices]
if np.all(diff.max(axis=-1)>=0):
break
k += 1
diff[diff<0] = large_value
diff.min(axis=1)
This solution could be more memory-efficient but frankly I'm not sure how much more.

Scala - Efficient element wise sum of two arrays

I have two arrays which I would like to reduce to one array in which at each index you have the sum of the two elements in the original arrays. For example:
val arr1: Array[Int] = Array(1, 1, 3, 3, 5)
val arr1: Array[Int] = Array(2, 1, 2, 2, 1)
val arr3: Array[Int] = sum(arr1, arr2)
// This should result in:
// arr3 = Array(3, 2, 5, 5, 6)
I've seen this post: Element-wise sum of arrays in Scala, and I currently use this approach (zip/map). However, using this for a big data application I am concerned about its performance. Using this approach one has to traverse the array(s) at least twice. Is there a better approach in terms of efficiency?
The most efficient way might well be to do it lazily.
As with anything collection-oriented, Scala 2.12 and 2.13 are going to be different (this code is Scala 2.13, but 2.12 will be similar... might extend IndexedSeqLike, but I don't know for sure)
import scala.collection.IndexedSeq
import scala.math.Numeric
case class SumIndexedSeq[+T: Numeric](seq1: IndexedSeq[T], seq2: IndexedSeq[T]) extends IndexedSeq[T] {
override val length: Int = seq1.length.min(seq2.length)
override def apply(i: Int) =
if (i >= length) throw new IndexOutOfBoundsException
else seq1(i) + seq2(i)
}
Arrays are implicitly convertible to a subtype of collection.IndexedSeq. This will compute the sum of the corresponding elements on every access (which may be generally desirable as it's possible to use a mutable IndexedSeq).
If you need an Array, you can get one with only a single traversal via
val arr3: Array[Int] = SumIndexedSeq(arr1, arr2).toArray
but SumIndexedSeq can be used anywhere a Seq can be used without a traversal.
As a further optimization, especially if you're sure that the underlying collections/arrays won't mutate, you can add a cache so you don't add the same elements together twice. It can also be generalized, if you so care, to any binary operations on T (in which case the Numeric constraint can be removed).
As Luis noted, for a performance question: experiment and benchmark. It's worth keeping in mind that a cache implementation may well entail boxing every element to put in the cache, so you might need to be accessing the same elements many times in order for the cache to be a win (and a sufficiently large cache may have implications for the stability of a distributed system).
Well, first of all, as with all things related to performance the only answer is to benchmark.
Second, are you sure you need plain mutable, invariant, weird Arrays? Can't you use something like Vector or ArraySeq?
Third, you can just do something like this or using a while loop, which would be the same.
val result = ArraySeq.tabulate(math.min(arr1.length, arr2.length)) { i =>
arr1(i) + arr2(i)
}

Scala: remove first column (first element in each row)

Given an var x: Array[Seq[Any]], what would be the most efficient (fast, in-place if possible?) way to remove the first element from each row?
I've tried the following but it didn't work - probably because of immutability...
for (row <- x.indices)
x(row).drop(1)
First off, Arrays are mutable, and I'm guessing you're intending to change the elements of x, rather than replacing the entire array with a new array object, so it makes more sense to use val instead of var:
val x: Array[Seq[Any]]
Since you said your Seq objects are immutable, then you need to make sure you are setting the values in your array. This will work:
for (row <- x.indices)
x(row) = x(row).drop(1)
This can be written in nicer ways. For example, you can use transform to map all the values of your array with a function:
x.transform(_.drop(1))
This updates in-place, unlike map, which will leave the old array unmodified and return a new array.
EDIT: I started to speculate on which method would be faster or more efficient, but the more I think about it, the more I realize I'm not sure. Both should have acceptable performance for most use cases.
this would work
scala> val x = Array(List(1,2,3),List(1,2,3))
x: Array[List[Int]] = Array(List(1, 2, 3), List(1, 2, 3))
scala> x map(_.drop(1))
res0: Array[List[Int]] = Array(List(2, 3), List(2, 3))

How do I algorithmically instantiate and manipulate a multidimensional array in Scala

I am trying to wrote a program to manage a Database through a Scala Gui, and have been running into alot of trouble formatting my data in such a way as to input it into a Table and have the Column Headers populate. To do this, I have been told I would need to use an Array[Array[Any]] instead of an ArrayBuffer[ArrayBuffer[String]] as I have been using.
My problem is that the way I am trying to fill these arrays is modular: I am trying to use the same function to draw from different tables in a MySQL database, each of which has a different number of columns and entries.
I have been able to (I think) define a 2-D array with
val Data = new Array[Array[String]](numColumns)(numRows)
but I haven't found any ways of editing individual cells in this new array.
Data(i)(j)=Value //or
Data(i,j)=Value
do not work, and give me errors about "Update" functionality
I am sure this can't possibly be as complicated as I have been making it, so what is the easy way of managing these things in this language?
You don't need to read your data into an Array of Arrays - you just need to convert it to that format when you feed it to the Table constuctor - which is easy, as demonstrated my answer to your other question: How do I configure the Column names in a Scala Table?
If you're creating a 2D array, the idiom you want is
val data = Array.ofDim[String](numColumms, numRows)
(There is also new Array[String](numColumns, numRows), but that's deprecated.)
You access element (i, j) of an Array data with data(i)(j) (remember they start from 0).
But in general you should avoid mutable collections (like Array, ArrayBuffer) unless there's a good reason. Try Vector instead.
Without knowing the format in which you're retrieving data from the database it's not possible to say how to put it into a collection.
Update:
You can alternatively put the type information on the left hand side, so the following are equivalent (decide for yourself which you prefer):
val a: Array[Array[String]] = Array.ofDim(2,2)
val a = Array.ofDim[String](2,2)
To explain the syntax for accessing / updating elements: as in Java, a multi-dimensional array is just an array of arrays. So here, a(i) is element i of a, which an Array[String], and so a(i)(j) is element j of that array, which is a String.
Luigi's answer is great, but I'd like to shed some light on why your code isn't working.
val Data = new Array[Array[String]](numColumns)(numRows)
does not do what you expect it to do. The new Array[Array[String]](numColumns) part does create an array of array of strings with numColumns entries, with all entries (arrys of strings) being null, and returns it. The following (numRows) then just calls the apply function on that returned object, which returns the numRowsth entry in that list, which is null.
You can try that out in the scala REPL: When you input
new Array[Array[String]](10)(9)
you get this as output:
res0: Array[String] = null
Luigi's solution, instead
Array.ofDim[String](2,2)
does the right thing:
res1: Array[Array[String]] = Array(Array(null, null), Array(null, null))
It's rather ugly, but you can update a multidimensional array with update
> val data = Array.ofDim[String](2,2)
data: Array[Array[String]] = Array(Array(null, null), Array(null, null))
> data(0).update(0, "foo")
> data
data: Array[Array[String]] = Array(Array(foo, null), Array(null, null))
Not sure about the efficiency of this technique.
Luigi's answer is great, but I just wanted to point out another way of initialising an Array that is more idiomatic/functional – using tabulate. This takes a function that takes the array cell coordinates as input and produces the cell value:
scala> Array.tabulate[String](4, 4) _
res0: (Int, Int) => String => Array[Array[String]] = <function1>
scala> val data = Array.tabulate(4, 4) {case (x, y) => x * y }
data: Array[Array[Int]] = Array(Array(0, 0, 0, 0), Array(0, 1, 2, 3), Array(0, 2, 4, 6), Array(0, 3, 6, 9))

Scala: what is the best way to append an element to an Array?

Say I have an Array[Int] like
val array = Array( 1, 2, 3 )
Now I would like to append an element to the array, say the value 4, as in the following example:
val array2 = array + 4 // will not compile
I can of course use System.arraycopy() and do this on my own, but there must be a Scala library function for this, which I simply could not find. Thanks for any pointers!
Notes:
I am aware that I can append another Array of elements, like in the following line, but that seems too round-about:
val array2b = array ++ Array( 4 ) // this works
I am aware of the advantages and drawbacks of List vs Array and here I am for various reasons specifically interested in extending an Array.
Edit 1
Thanks for the answers pointing to the :+ operator method. This is what I was looking for. Unfortunately, it is rather slower than a custom append() method implementation using arraycopy -- about two to three times slower. Looking at the implementation in SeqLike[], a builder is created, then the array is added to it, then the append is done via the builder, then the builder is rendered. Not a good implementation for arrays. I did a quick benchmark comparing the two methods, looking at the fastest time out of ten cycles. Doing 10 million repetitions of a single-item append to an 8-element array instance of some class Foo takes 3.1 sec with :+ and 1.7 sec with a simple append() method that uses System.arraycopy(); doing 10 million single-item append repetitions on 8-element arrays of Long takes 2.1 sec with :+ and 0.78 sec with the simple append() method. Wonder if this couldn't be fixed in the library with a custom implementation for Array?
Edit 2
For what it's worth, I filed a ticket:
https://issues.scala-lang.org/browse/SI-5017
You can use :+ to append element to array and +: to prepend it:
0 +: array :+ 4
should produce:
res3: Array[Int] = Array(0, 1, 2, 3, 4)
It's the same as with any other implementation of Seq.
val array2 = array :+ 4
//Array(1, 2, 3, 4)
Works also "reversed":
val array2 = 4 +: array
Array(4, 1, 2, 3)
There is also an "in-place" version:
var array = Array( 1, 2, 3 )
array +:= 4
//Array(4, 1, 2, 3)
array :+= 0
//Array(4, 1, 2, 3, 0)
The easiest might be:
Array(1, 2, 3) :+ 4
Actually, Array can be implcitly transformed in a WrappedArray

Resources