Type-safe rectangular multidimensional array type - arrays

How do you represent a rectangular 2-dimensional (or multidimensional) array data structure in Scala?
That is, each row has the same length, verified at compile time, but the dimensions are determined at runtime?
Seq[Seq[A]] has the desired interface, but it permits the user to provide a "ragged" array, which can result in a run-time failure.
Seq[(A, A, A, A, A, A)] (and similar) does verify that the lengths are the same, but it also forces this length to be specified at compile time.
Example interface
Here's an example interface of what I mean (of course, the inner dimension doesn't have to be tuples; it could be specified as lists or some other type):
// Function that takes a rectangular array
def processArray(arr : RectArray2D[Int]) = {
// do something that assumes all rows of RectArray are the same length
}
// Calling the function (OK)
println(processArray(RectArray2D(
( 0, 1, 2, 3),
(10, 11, 12, 13),
(20, 21, 22, 23)
)))
// Compile-time error
println(processArray(RectArray2D(
( 0, 1, 2, 3),
(10, 11, 12),
(20, 21, 22, 23, 24)
)))

This is possible using the Shapeless library's sized types:
import shapeless._
def foo[A, N <: Nat](rect: Seq[Sized[Seq[A], N]]) = rect
val a = Seq(Sized(1, 2, 3), Sized(4, 5, 6))
val b = Seq(Sized(1, 2, 3), Sized(4, 5))
Now foo(a) compiles, but foo(b) doesn't.
This allows us to write something very close to your desired interface:
case class RectArray2D[A, N <: Nat](rows: Sized[Seq[A], N]*)
def processArray(arr: RectArray2D[Int, _]) = {
// Run-time confirmation of what we've verified at compile-time.
require(arr.rows.map(_.size).distinct.size <= 1) // <= 1 so that an empty array is also accepted
// Do something.
}
// Compiles and runs.
processArray(RectArray2D(
Sized( 0, 1, 2, 3),
Sized(10, 11, 12, 13),
Sized(20, 21, 22, 23)
))
// Doesn't compile.
processArray(RectArray2D(
Sized( 0, 1, 2, 3),
Sized(10, 11, 12),
Sized(20, 21, 22, 23)
))

Using encapsulation to ensure proper size.
import scala.reflect.ClassTag
// A ClassTag is needed so that Array.ofDim can create an Array[T] at run time.
final class Matrix[T: ClassTag](cols: Int, rows: Int) {
  private val container: Array[Array[T]] = Array.ofDim[T](cols, rows)
  def get(col: Int, row: Int): T = container(col)(row)
  def set(col: Int, row: Int)(value: T): Unit = { container(col)(row) = value }
}
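A minimal sketch of the same encapsulation idea, assuming a run-time check at construction is acceptable in place of a compile-time guarantee. The names `fromRows`, `numRows`, and `numCols` are hypothetical and not part of any library. Because the constructor is private, a ragged instance can never be created.

```scala
// Hypothetical sketch: rows are validated once, at construction time.
final class RectArray2D[A] private (private val rows: Vector[Vector[A]]) {
  def numRows: Int = rows.length
  def numCols: Int = if (rows.isEmpty) 0 else rows.head.length
  def apply(row: Int, col: Int): A = rows(row)(col)
}

object RectArray2D {
  // Returns None for ragged input, so callers must handle the failure explicitly.
  def fromRows[A](rows: Seq[Seq[A]]): Option[RectArray2D[A]] = {
    val allSameLength = rows.map(_.length).distinct.size <= 1
    if (allSameLength) Some(new RectArray2D(rows.map(_.toVector).toVector))
    else None
  }
}

val ok     = RectArray2D.fromRows(Seq(Seq(0, 1, 2), Seq(10, 11, 12)))
val ragged = RectArray2D.fromRows(Seq(Seq(0, 1, 2), Seq(10, 11)))
```

Here `ok` is a `Some` wrapping a 2 x 3 array, while `ragged` is `None`. Code that receives a `RectArray2D` can rely on equal row lengths without re-checking.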

Note: I misread the question, mistaking a rectangle for a square. Oh, well, if you're looking for squares, this would fit. Otherwise, you should go with @Travis Brown's answer.
This solution may not be the most generic one, but it coincides with the way Tuple classes are defined in Scala.
class Rect[T] private (val data: Seq[T])
object Rect {
def apply[T](a1: (T, T), a2: (T, T)) = new Rect(Seq(a1, a2))
def apply[T](a1: (T, T, T), a2: (T, T, T), a3: (T, T, T)) = new Rect(Seq(a1, a2, a3))
// Continued...
}
Rect(
(1, 2, 3),
(3, 4, 5),
(5, 6, 7))
This is the interface you were looking for and the compiler will stop you if you have invalid-sized rows, columns or type of element.

Related

Scala Array Copy looks to be working differently

object Solution extends App {
val arr1 = Array(
Array(1,2,3),
Array(4,5,6)
)
var arr2 = Array.ofDim[Int](2,3)
Array.copy(arr1,0,arr2,0,arr1.length)
arr1(0)(1) = 23
println(arr1.map(_.mkString(",")).mkString("\n"))
println()
println(arr2.map(_.mkString(",")).mkString("\n"))
}
1,23,3
4,5,6
1,23,3
4,5,6
What is wrong? Why does the 23 appear in both arrays?
Because an Array in Scala (more precisely, on the JVM, since Scala arrays are Java arrays) is a mutable structure, and you are performing a shallow copy rather than a deep copy. In other words, you copy only the outer array, whose elements are references to the inner arrays. You do not copy the whole structure recursively, so both outer arrays end up sharing the same inner arrays.
Solution might look like:
val source = Array(
Array(1, 2, 3),
Array(4, 5, 6)
)
val target = Array.ofDim[Int](2, 3)
source.zipWithIndex.foreach { case (row, index) =>
  // Copy the whole row: the length must be row.length, not source.length.
  Array.copy(row, 0, target(index), 0, row.length)
}
target(0)(1) = 23
println(source.map(_.mkString(",")).mkString("\n"))
println()
println(target.map(_.mkString(",")).mkString("\n"))
which prints:
1,2,3
4,5,6
1,23,3
4,5,6
Scastie example: https://scastie.scala-lang.org/lrrHyGqZRxKk7mZ6CbLoiA
UPDATE
As @Luis Miguel Mejía Suárez correctly pointed out in the comments, zipWithIndex is a comparatively expensive operation here. A cheaper solution is:
source.indices.foreach { index =>
  // Use the length of the row being copied, not the number of rows.
  Array.copy(source(index), 0, target(index), 0, source(index).length)
}
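Array.copy uses System.arraycopy, which copies only the elements of the outer array. Those elements are references to the inner arrays, so both outer arrays end up pointing at the same inner arrays. From the documentation: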
Copy one array to another. Equivalent to Java's System.arraycopy(src, srcPos, dest, destPos, length), except that this also works for polymorphic and boxed arrays.
Note that the passed-in dest array will be modified by this call.
You can try a simple map with identity:
scala> val arr1 = Array(Array(1,2,3),Array(4,5,6))
arr1: Array[Array[Int]] = Array(Array(1, 2, 3), Array(4, 5, 6))
scala> val arr3 = arr1.map(_.map(identity))
arr3: Array[Array[Int]] = Array(Array(1, 2, 3), Array(4, 5, 6))
scala> arr1(0)(1) = 23
scala> arr1
res16: Array[Array[Int]] = Array(Array(1, 23, 3), Array(4, 5, 6))
scala> arr3
res17: Array[Array[Int]] = Array(Array(1, 2, 3), Array(4, 5, 6))
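The shallow-versus-deep distinction discussed above can be checked directly with reference equality (`eq`). This is a minimal sketch with illustrative variable names. Calling `clone()` on each inner array is one way to get an independent copy of a two-level array.

```scala
val source = Array(Array(1, 2, 3), Array(4, 5, 6))

// Shallow copy: a new outer array whose slots point at the same inner arrays.
val shallow = new Array[Array[Int]](source.length)
Array.copy(source, 0, shallow, 0, source.length)

// Deep copy (two levels): each inner array is cloned.
val deep = source.map(_.clone())

source(0)(1) = 23

println(shallow(0) eq source(0)) // true: same inner array object
println(deep(0) eq source(0))    // false: independent inner array
println(shallow(0)(1))           // 23
println(deep(0)(1))              // 2
```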

Pairing last element from previous pair as first in next pair [duplicate]

Given an array in Swift such as [1,2,3,4], a method pairs() should transform it into the array of tuples [(1,2), (2,3), (3,4)].
Here are some more examples of how pairs() should behave:
pairs([]) should return [] as it has no pairs.
pairs([1]) should also return [], as it has no pairs.
pairs([1,2]) should be [(1,2)]. It has just one pair.
I can write code to do this for Array, but I'd like to have pairs() available as an extension on Sequence, so that it returns a Sequence of the pairs. This would make it useable on any sequence, and compatible with methods such as map, reduce, filter, etc.
How do I go about creating a Sequence like this? And how do I write the method to transform any Sequence in this way so that it can be used as flexibly as possible?
We can use zip() and dropFirst() if we define an extension
on the Collection type:
extension Collection {
func pairs() -> AnySequence<(Element, Element)> {
return AnySequence(zip(self, self.dropFirst()))
}
}
Example:
let array = [1, 2, 3, 4]
for p in array.pairs() {
print(p)
}
Output:
(1, 2)
(2, 3)
(3, 4)
More examples:
print(Array("abc".pairs()))
// [("a", "b"), ("b", "c")]
print([1, 2, 3, 4, 5].pairs().map(+))
// [3, 5, 7, 9]
print([3, 1, 4, 1, 5, 9, 2].pairs().filter(<))
// [(1, 4), (1, 5), (5, 9)]
(Contrary to what I wrote in the first version of this answer ...) this
approach is not safe when applied to a Sequence, because a sequence
is not guaranteed to support multiple non-destructive traversals.
Here is a direct implementation with a custom iterator type
which works on sequences as well:
struct PairSequence<S: Sequence>: IteratorProtocol, Sequence {
var it: S.Iterator
var last: S.Element?
init(seq: S) {
it = seq.makeIterator()
last = it.next()
}
mutating func next() -> (S.Element, S.Element)? {
guard let a = last, let b = it.next() else { return nil }
last = b
return (a, b)
}
}
extension Sequence {
func pairs() -> PairSequence<Self> {
return PairSequence(seq: self)
}
}
Example:
print(Array([1, 2, 3, 4].pairs().pairs()))
// [((1, 2), (2, 3)), ((2, 3), (3, 4))]

Read values and list of lists in Haskell

Before marking this question as a duplicate: I have already read the topic "Haskell read Integer and list of lists from file", and the solution there doesn't solve my problem.
I'm trying to read the content in a File that contains this structure:
String, String, [(Int, Int, Int)]
The file looks something like this:
Name1 22/05/2018 [(1, 5, 10), (2, 5, 5), (3, 10, 40)]
Name2 23/05/2018 [(1, 10, 10), (2, 15, 5), (3, 50, 40),(4,20,5)]
Name3 22/05/2018 [(4, 2, 1), (5, 2, 2), (6, 50, 3), (1,2,3)]
Name4 23/05/2018 [(1, 3, 10), (2, 1, 5), (3, 2, 40), (6,20,20)]
In Haskell, I created this function to read the contents of the file and "convert" this content to my custom type.
rlist :: String -> [(Int, Int, Int)]
rlist = read
loadPurchases :: IO [(String, String, [(Int, Int, Int)])]
loadPurchases = do s <- readFile "tes.txt"
return (glpurch (map words (lines s)))
glpurch :: [[String]] -> [(String, String, [(Int, Int, Int)])]
glpurch [] = []
glpurch ([name, dt, c]:r) = (name, dt, (rlist c)) : glpurch r
But when I try to execute the "loadPurchases" function, I get this error:
Non-exhaustive patterns in function glpurch.
Using :set -Wall, I received this help message:
<interactive>:6:1: warning: [-Wincomplete-patterns]
Pattern match(es) are non-exhaustive
In an equation for `glpurch':
Patterns not matched:
([]:_:_)
([_]:_)
([_, _]:_)
((_:_:_:_:_):_)
My problem is how to create all these conditions.
I will be very grateful if anyone can help me create those conditions that are likely to determine the "stopping condition"
You are only matching lists of length 3 when in fact there are many more words on each line. Just try it in GHCi:
> words "Name1 22/05/2018 [(1, 5, 10), (2, 5, 5), (3, 10, 40)]"
["Name1","22/05/2018","[(1,","5,","10),","(2,","5,","5),","(3,","10,","40)]"]
You probably want to recombine all words past the first two:
glpurch ((name : dt : rest) : r) = (name, dt, rlist (unwords rest)) : glpurch r
To solve my problem, I did what @Welperooni and @Thomas M. DuBuisson suggested.
I added this code to my function:
glpurch ((name: dt: c: _): r) = (name, dt, (read c :: [(Cod, Quant, Price)])) : glpurch r
I also removed the blanks inside the lists in my file, because those spaces caused words to split each line in the wrong places.

How to sum up every column of a Scala array?

If I have an array of array (similar to a matrix) in Scala, what's the efficient way to sum up each column of the matrix? For example, if my array of array is like below:
val arr = Array(Array(1, 100, ...), Array(2, 200, ...), Array(3, 300, ...))
and I want to sum up each column (e.g., sum up the first element of all sub-arrays, sum up the second element of all sub-arrays, etc.) and get a new array like below:
newArr = Array(6, 600, ...)
How can I do this efficiently in Spark Scala?
There is a suitable .transpose method on List that can help here, although I can't say what its efficiency is like:
arr.toList.transpose.map(_.sum)
(then call .toArray if you specifically need the result as an array).
Using breeze Vector:
scala> val arr = Array(Array(1, 100), Array(2, 200), Array(3, 300))
arr: Array[Array[Int]] = Array(Array(1, 100), Array(2, 200), Array(3, 300))
scala> arr.map(breeze.linalg.Vector(_)).reduce(_ + _)
res0: breeze.linalg.Vector[Int] = DenseVector(6, 600)
If your input is sparse you may consider using breeze.linalg.SparseVector.
In practice, a linear algebra vector library as mentioned by @zero323 will often be the better choice.
If you can't use a vector library, I suggest writing a function col2sum that adds two rows element-wise, even if they are not the same length, and then using Array.reduce to extend this operation to N rows. Using reduce is valid because addition is associative and commutative, so the result does not depend on the order of operations (i.e. 1+2+3 == 3+2+1 == 3+1+2 == 6):
def col2sum(x:Array[Int],y:Array[Int]):Array[Int] = {
x.zipAll(y,0,0).map(pair=>pair._1+pair._2)
}
def colsum(a:Array[Array[Int]]):Array[Int] = {
a.reduce(col2sum)
}
val z = Array(Array(1, 2, 3, 4, 5), Array(2, 4, 6, 8, 10), Array(1, 9));
colsum(z)
--> Array[Int] = Array(4, 15, 9, 12, 15)
scala> val arr = Array(Array(1, 100), Array(2, 200), Array(3, 300 ))
arr: Array[Array[Int]] = Array(Array(1, 100), Array(2, 200), Array(3, 300))
scala> arr.flatten.zipWithIndex.groupBy(c => c._2 % arr.head.length)
.map(a => a._1 -> a._2.foldLeft(0)((sum, i) => sum + i._1))
res40: scala.collection.immutable.Map[Int,Int] = Map(0 -> 6, 1 -> 600)
Flatten the array, use zipWithIndex to attach a position to each element, groupBy the column index (position modulo the row length), and foldLeft to sum each group. This assumes all rows have the same length. The entry order of the resulting Map may vary.
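If allocation matters, the column sums can also be accumulated in place with plain index-based loops. This is a sketch rather than a benchmarked recommendation, and `columnSums` is a hypothetical helper name. The result width is taken from the longest row, so ragged input is tolerated.

```scala
// Hypothetical helper: sums each column using index-based loops.
def columnSums(rows: Array[Array[Int]]): Array[Int] = {
  // Width of the result is the length of the longest row (0 for empty input).
  val width = if (rows.isEmpty) 0 else rows.map(_.length).max
  val sums  = new Array[Int](width) // initialised to zeros
  var r = 0
  while (r < rows.length) {
    val row = rows(r)
    var c = 0
    while (c < row.length) {
      sums(c) += row(c)
      c += 1
    }
    r += 1
  }
  sums
}

val arr = Array(Array(1, 100), Array(2, 200), Array(3, 300))
println(columnSums(arr).mkString(", ")) // 6, 600
```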

Removing arrays which are subsets of other arrays in Scala

I have an array of array of integers. Like:
val t1 = Array(Array(1, 2, 3), Array(2), Array(4, 5, 6), Array(5, 6))
I want to remove the arrays that are subsets of another array. So, the result should be:
Array(Array(1, 2, 3), Array(4, 5, 6))
Ideally, these should be Sets, but in the context of my program, they are arrays, and I don't want to convert them to sets due to performance reasons.
I solved it this way in Scala, but I would like to know if there is a more elegant (and/or more efficient) way to do this:
import scala.reflect.ClassTag
// ClassTag replaces the deprecated ClassManifest.
def removeSubsets[T: ClassTag](clusters: Array[Array[T]]): Array[Array[T]] = {
  val sortedClusters = clusters.sortBy(-1 * _.length)
  sortedClusters.foldLeft(Array.empty[Array[T]]) { (acc, ele) =>
    val isASubset = acc.exists(arr => (ele diff arr).isEmpty)
    if (isASubset) acc else acc :+ ele
  }
}
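One possible variation, offered as a sketch rather than a measured improvement, tests containment with `forall` and `contains` instead of `diff`. Note that `diff` is a multiset difference, so `Array(2, 2) diff Array(2)` is non-empty, whereas the version below treats each array as a set. The name `removeSubsetArrays` is hypothetical.

```scala
import scala.reflect.ClassTag

// Hypothetical variant: keeps an array only if no already-kept array
// contains all of its elements.
def removeSubsetArrays[T: ClassTag](clusters: Array[Array[T]]): Array[Array[T]] = {
  // Longest first, so that a superset is always seen before its subsets.
  val longestFirst = clusters.sortBy(-_.length)
  longestFirst.foldLeft(Array.empty[Array[T]]) { (kept, candidate) =>
    val isSubset = kept.exists(k => candidate.forall(x => k.contains(x)))
    if (isSubset) kept else kept :+ candidate
  }
}

val t1 = Array(Array(1, 2, 3), Array(2), Array(4, 5, 6), Array(5, 6))
println(removeSubsetArrays(t1).map(_.mkString("[", ", ", "]")).mkString(" "))
// [1, 2, 3] [4, 5, 6]
```

The containment test is still linear in the size of each kept array. Whether it beats `diff` in practice would need measuring on the real data.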

Resources