I've been looking for a method similar to String.split in Scala Array, but I've not been able to find it.
What I want to do is to split an array by a separator.
For example, separating the following array:
val array = Array('a', 'b', '\n', 'c', 'd', 'e', '\n', 'g', '\n')
using the '\n' separator, should result in:
List(Array(a, b), Array(c, d, e), Array(g))
I know that I can convert the Array to String, and apply split there:
array.mkString.split('\n').map(_.toArray)
but I would prefer to skip the conversion.
The solution I have so far involves using span recursively and is a bit too boilerplate:
def splitArray[T](array: Array[T], separator: T): List[Array[T]] = {
def spanRec(array: Array[T], aggResult: List[Array[T]]): List[Array[T]] = {
val (firstElement, restOfArray) = array.span(_ != separator)
if (firstElement.isEmpty) aggResult
else spanRec(restOfArray.dropWhile(_ == separator), firstElement :: aggResult)
}
spanRec(array, List()).reverse
}
I'm sure there must be something in Scala I'm missing. Any idea?
thanks,
Ruben
This is not the most concise implementation, but it should be fairly performed and preserves the array type without resorting to reflection. The loop can of course be replaced by a recursion.
Since your question doesn't explicitly state what should be done with the separator I assume, that they should not cause any entry in the output list (see the test cases below).
def splitArray[T](xs: Array[T], sep: T): List[Array[T]] = {
var (res, i) = (List[Array[T]](), 0)
while (i < xs.length) {
var j = xs.indexOf(sep, i)
if (j == -1) j = xs.length
if (j != i) res ::= xs.slice(i, j)
i = j + 1
}
res.reverse
}
Some tests:
val res1 =
// Notice the two consecutive '\n'
splitArray(Array('a', 'b', '\n', 'c', 'd', 'e', '\n', '\n', 'g', '\n'), '\n')
println(res1)
// List([C#12189646, [C#c31d6f2, [C#1c16b01f)
res1.foreach(ar => {ar foreach print; print(" ")})
// ab cde g
// No separator
val res2 = splitArray(Array('a', 'b'), '\n')
println(res2)
// List([C#3a2128d0)
res2.foreach(ar => {ar foreach print; print(" ")})
// ab
// Only separators
val res3 = splitArray(Array('\n', '\n'), '\n')
println(res3)
// List()
A borrowed the arguments from sschaef's solution:
def split[T](array : Array[T])(where : T=>Boolean) : List[Array[T]] = {
if (array.isEmpty) Nil
else {
val (head, tail) = array span {!where(_)}
head :: split(tail drop 1)(where)
}
} //> split: [T](array: Array[T])(where: T => Boolean)List[Array[T]]
val array = Array('a', 'b', '\n', 'c', 'd', 'e', '\n', 'g', '\n')
split(array){_ =='\n'} //> res2: List[Array[Char]] = List(Array(a, b), Array(c, d, e), Array(g))
def splitByNewLines(array : Array[Char]) = split(array){_ =='\n'}
splitByNewLines(array) //> res3: List[Array[Char]] = List(Array(a, b), Array(c, d, e), Array(g))
You can use the span method to split the array into two parts and then call your split method recursively on the second part.
import scala.reflect.ClassTag
def split[A](l:Array[A], a:A)(implicit act:ClassTag[Array[A]]):Array[Array[A]] = {
val (p,s) = l.span(a !=)
p +: (if (s.isEmpty) Array[Array[A]]() else split(s.tail,a))
}
This is not very efficient though, since it has quadratic performance. If you want something fast, a simple tail recursive solution will probably be the best approach.
With lists instead of arrays you would get linear performance and wouldn't need reflection.
This is a short formulation that should do the job:
def split(array:Array[Char], sep:Char) : Array[Array[Char]] = {
/* iterate the list from right to left and recursively calculate a
pair (chars,list), where chars contains the elements encountered
since the last occurrence of sep.
*/
val (chars, list) = array.foldRight[(List[Char],List[Array[Char]])]((Nil,Nil))((x,y) => if (x == sep) (Nil, (y._1.toArray)::y._2) else (x::y._1, y._2) );
/* if the last element was sep, do nothing;
otherwise prepend the last collected chars
*/
if (chars.isEmpty)
list.toArray
else
(chars.toArray::list).toArray
}
/* example:
scala> split(array,'\n')
res26: Array[Array[Char]] = Array(Array(a, b), Array(c, d, e), Array(g), Array())
*/
If we use List instead of Array, we can generalize the code a bit:
def split[T](array:List[T], char:T) : List[List[T]] = {
val (chars, list) = array.foldRight[(List[T],List[List[T]])]((Nil,Nil))((x,y) => if (x == char) (Nil, (y._1)::y._2) else (x::y._1, y._2) )
if (chars.isEmpty) list else (chars::list)
}
/* example:
scala> split(array.toList, '\n')
res32: List[List[Char]] = List(List(a, b), List(c, d, e), List(g), List())
scala> split(((1 to 5) ++ (1 to 5)).toList, 3)
res35: List[List[Int]] = List(List(1, 2), List(4, 5, 1, 2), List(4, 5))
*/
If this solution is considered as elegant or unreadable is left to the reader and her preference for functional programming :)
Simple way of doing it using foldLeft
val f = array.foldLeft((Array[Char](),List[Array[Char]]()))(
(acc, char: Char) => {
char match {
case '\n' => (Array(),acc._1 :: acc._2)
case _ => (acc._1 :+ char,acc._2)
}
}
)._2.reverse
I came up with a solution that aims at the following:
is generic: you should be able to split an Array just like a Vector, and a collection of Chars just like a collection of arbitrary objects
preserves the types of the inputs: an Array[A] gets split in an Array[Array[A]], a Vector[A] gets split in a Vector[Vector[A]]
allows to use a lazy approach if needed (via an Iterator)
exposes a compact interface for most cases (just call a split method on your collection)
Before getting to the explanation, note that you can play with the code that follows here on Scastie.
The first step is implementing an Iterator that chunks your collection:
import scala.language.higherKinds
import scala.collection.generic.CanBuildFrom
final class Split[A, CC[_]](delimiter: A => Boolean, as: CC[A])(
implicit view: CC[A] => Seq[A], cbf: CanBuildFrom[Nothing, A, CC[A]])
extends Iterator[CC[A]] {
private[this] var it: Iterator[A] = view(as).iterator
private def skipDelimiters() = {
it = it.dropWhile(delimiter)
}
skipDelimiters()
override def hasNext: Boolean = it.hasNext
override def next(): CC[A] = {
val builder = cbf()
builder ++= it.takeWhile(!delimiter(_))
skipDelimiters()
builder.result()
}
}
I'm using a predicate instead of a value to be more elastic in how the collection gets split, especially when splitting a collection of non-scalar values (like Chars).
I'm using an implicit view on the collection type to be able to apply this to all kinds of collection that can be seen as a Seq (like Vectors and Arrays) and a CanBuildFrom to be able to build the exact type of collection I'm receiving as an input.
The implementation of the Iterator simply makes sure to drop delimiters and chunk the rest.
We can now use an implicit class to offer a friendly interface and add the split method to all the collections, both allowing a predicate or a value to be defined as delimiters:
final implicit class Splittable[A, CC[_]](val as: CC[A])(implicit ev1: CC[A] => Seq[A], ev2: CanBuildFrom[Nothing, A, CC[A]], ev3: CanBuildFrom[Nothing, CC[A], CC[CC[A]]]) {
def split(delimiter: A => Boolean): CC[CC[A]] = new Split(as)(delimiter).to[CC]
def split(delimiter: A): CC[CC[A]] = new Split(as)(_ == delimiter).to[CC]
}
Now you can use your method freely on collection of Chars
val a = Array('a', 'b', '\n', 'c', 'd', 'e', '\n', 'g', '\n')
val b = List('\n', '\n', '\n')
val c = Vector('\n', 'c', 'd', 'e', '\n', 'g', '\n')
val d = Array('a', 'b', 'c', 'd', 'e', 'g', '\n')
val e = Array('a', 'b', 'c', 'd', 'e', 'g', '\n')
a.split('\n')
b.split('\n')
c.split('\n')
d.split('\n')
e.split('\n')
and arbitrary objects alike:
final case class N(n: Int, isDelimiter: Boolean)
Vector(N(1, false), N(2, false), N(3, true), N(4, false), N(5, false)).split(_.isDelimiter)
Note that by using the iterator directly you use a lazy approach, as you can see if you add a debug print to the next method and try to execute the following:
new Split(Vector('\n', 'c', 'd', 'e', '\n', 'g', '\n'))(_ == '\n'}).take(1).foreach(println)
If you want, you can add a couple of methods to Splittable that return an Iterator, so that you can expose the lazy approach as well directly through it.
I don't know of any build-in method, but I came up with a simpler one than yours:
def splitOn[A](xs: List[A])(p: A => Boolean): List[List[A]] = xs match {
case Nil => Nil
case x :: xs =>
val (ys, zs) = xs span (!p(_))
(x :: ys) :: splitOn(zs.tail)(p)
}
// for Array
def splitOn[A : reflect.ClassTag](xs: Array[A])(p: A => Boolean): List[Array[A]] =
if (xs.isEmpty) List()
else {
val (ys, zs) = xs.tail span (!p(_))
(xs.head +: ys) :: splitOn(zs.tail)(p)
}
scala> val xs = List('a', 'b', '\n', 'c', 'd', 'e', '\n', 'g', '\n')
xs: List[Char] =
List(a, b,
, c, d, e,
, g,
)
scala> splitOn(xs)(_ == '\n')
res7: List[List[Char]] = List(List(a, b), List(c, d, e), List(g))
How about this? No reflection, and not recursive either but tries to use as much of the scala library as possible.
def split[T](a: Array[T], sep: T)(implicit m:ClassManifest[T]): Array[Array[T]] = {
val is = a.indices filter (a(_) == sep)
(0 +: (is map (1+))) zip (is :+ (a.size+1)) map {
case(from,till) => a.slice(from, till)
}
}
Probably slow, but just for fun. :-)
The indices filter gives you the indices (is) of where your separator was found.
In your example, that's 2,6,8. I think this is O(n).
The next line transforms that into (0,2), (3,6), (7,8), (9, 10). So k separators yield k+1 ranges.
These are handed to slice, which does the rest of the work. The transformation is also O(n) where n is the number of separators found.
(This means an input of Array[Char]() will yield Array(Array()) and not the more intuitive Array() but that's not too interesting).
The array appending/prepending (:+, +:) is wasteful using arrays, but nothing that can't be solved by using an appropriate collection that lets you have O(1) appends/prepends.
You can also accomplish this using fold:
def splitArray[T](array: Array[T], separator: T) =
array.foldRight(List(List.empty[T])) { (c, list) =>
if (c == separator) Nil :: list
else (c :: list.head) :: list.tail
}.filter(!_.isEmpty).map(_.reverse).toArray
which was already mentioned by lambda.xy.x, but for some reason it was a bit less readable then necessary ;)
Pimped version of generic sequence / array split -
implicit def toDivide[A, B <% TraversableLike[A, B]](a : B) = new {
private def divide(x : B, condition: (A) => Boolean) : Iterable[B] = {
if (x.size > 0)
x.span(condition) match {
case (e, f) => if (e.size > 0) Iterable(e) ++ divide(f.drop(1),condition) else Iterable(f)
}
else
Iterable()
}
def divide(condition: (A) => Boolean): Iterable[B] = divide(a, condition)
}
Almost a one-liner:
val it = array.iterator
List.range(0, array.count(_ == '\n')).map(_ => it.takeWhile(_ != '\n').toArray)
For a given array, this makes use of an Iterator version of the Array in order to call .takeWhile as many times as there are occurrences of the separator.
Another version of the same a bit shorter although less readable, using List.tabulate which is a map over a range:
val it = array.iterator
List.tabulate(array.count(_ == '\n'))(_ => it.takeWhile(_ != '\n').toArray)
This can be made a generic equivalent of array.mkString.split("\n", -1).map(_.toArray) by pimping Array with:
implicit class ArrayExtensions[T: ClassTag](array: Array[T]) {
def split(sep: T): List[Array[T]] = {
val it = array.iterator
List.range(0, array.count(_ == sep)).map(_ => it.takeWhile(_ != sep).toArray)
}
}
and used this way:
Array('\n', '\n', 'a', 'b', '\n', 'c', 'd', 'e', '\n', 'g', '\n').split('\n')
// List(Array(), Array(), Array(a, b), Array(c, d, e), Array(g), Array())
To get rid of empty sub-arrays resulting from 2 successive occurrences of the separator, one can pipe the result with .filter(_.nonEmpty).