Hi, I am trying to append elements to the arrays in an RDD[Array[String]] using Scala in Spark.
Here is an example:
val data: RDD[Array[String]] = Array(Array(1,2,3), Array(1,2,3,4), Array(1,2))
I want every array in this data to have length 4.
If an array is shorter than 4, I want to fill the remaining positions with the value NULL.
Here is the code I tried:
val newData = data.map(x =>
  if (x.length < 4) {
    for (i <- x.length until 4) { x.union("NULL") }
  } else x)
But the result is Array[Any] = Array((), Array(1, 2, 3, 4), ()).
So I tried another way, using yield in the for loop:
val newData = data.map(x =>
  if (x.length < 4) {
    for (i <- x.length until 4) yield { x.union("NULL") }
  } else x)
The result is Array[Object] = Array(Vector(Array(1, 2, 3, N, U, L, L)), Array(1, 2, 3, 4), Vector(Array(1, 2, N, U, L, L), Array(1, 2, N, U, L, L)))
Neither of these is what I want. I want the result to look like this:
RDD[Array[String]] = Array(Array(1,2,3,NULL), Array(1,2,3,4), Array(1,2,NULL,NULL))
What should I do? Is there a method that solves this?

union is a functional operation: it returns a new collection and doesn't change the array x. You don't need to do this with a loop, though, and any loop implementation will probably be slower. It's much better to create one new collection with all the NULL values than to mutate something every time you add a NULL. Here's a function that should work for you:
def fillNull(x: Array[Int], desiredLength: Int): Array[String] =
  x.map(_.toString) ++ Array.fill(desiredLength - x.length)("NULL")
val newData = data.map(fillNull(_, 4))
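As a rough illustration of the same idea on plain Scala collections (no Spark involved), the sketch below uses padTo from the standard library instead of ++ with Array.fill. The sample data and the name padded are made up for the example; on an RDD the same function could presumably be passed to map.

```scala
// Sample input standing in for the RDD contents (hypothetical data).
val data: Array[Array[String]] =
  Array(Array("1", "2", "3"), Array("1", "2", "3", "4"), Array("1", "2"))

// padTo appends "NULL" until the array has length 4;
// arrays that already have 4 or more elements keep their contents.
val padded: Array[Array[String]] = data.map(_.padTo(4, "NULL"))

padded.foreach(a => println(a.mkString("Array(", ", ", ")")))
```

Like union, padTo returns a new array and leaves the original untouched, so the result has to be the value of the lambda rather than a side effect inside a loop.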

I solved your use case with the following code:
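// Array[Any] is used because the arrays mix Ints with the String "NULL"
val initialRDD = sparkContext.parallelize(Array(Array[Any](1, 2, 3), Array[Any](1, 2, 3, 4), Array[Any](1, 2, 3)))
val transformedRDD = initialRDD.map(array =>
  if (array.length < 4) {
    // start from an array of four "NULL" values, then copy the existing elements over it
    val transformedArray = Array.fill[Any](4)("NULL")
    Array.copy(array, 0, transformedArray, 0, array.length)
    transformedArray
  } else {
    array
  }
)
val result = transformedRDD.collect()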


How to rotate an array using array reversal technique?

I am trying to rotate an array from a particular position using the array reversal method.
Input array: [1,2,3,4,5,6,7]
d = 3
Output array: [5,6,7,1,2,3,4]
To achieve this I thought of working on the array in three steps.
Step1: Reverse the array from starting position until d => [4,3,2,1,5,6,7]
Step2: Reverse the array from d till the end of the array => [4,3,2,1,7,6,5]
Step3: Reverse the complete array from Step2 => [5,6,7,1,2,3,4]
I haven't followed any functional programming pattern as I want to check the algorithm step by step.
val arr = Array[Int](1, 2, 3, 4, 5, 6, 7)
def reverseAlgo(brr: Array[Int], start: Int, end: Int): Unit = {
  var temp = 0
  for(i <- start until end/2) {
    temp = brr(i)
    brr(i) = brr(end-i-1)
    brr(end-i-1) = temp
  }
}
Step1 is working fine:
reverseAlgo(arr, 0, 3)
But Step2 is not producing the required output:
reverseAlgo(arr, 3, 7)
As you see, the output of the array should be: 3,2,1,7,6,5,4
Since the output from Step2 is incorrect, the final output is also wrong.
reverseAlgo(arr, 0, arr.length)
Could anyone tell me what mistake I am making here?
Why not just something as simple as this?
import scala.collection.immutable.ArraySeq
import scala.reflect.ClassTag
def rotate[T : ClassTag](arr: ArraySeq[T])(pos: Int): ArraySeq[T] = {
  val length = arr.length
  ArraySeq.tabulate[T](n = length) { i =>
    arr((i + 1 + pos) % length)
  }
}
Which can be used like this:
rotate(arr = ArraySeq(1, 2, 3, 4, 5, 6, 7))(pos = 3)
// res: ArraySeq[Int] = ArraySeq(5, 6, 7, 1, 2, 3, 4)
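Your code only works when the range starts at zero. With a nonzero start, the loop bound end/2 and the mirror index end-i-1 no longer line up with the range being reversed; for example, reverseAlgo(arr, 3, 7) iterates over 3 until 3, which is empty, so nothing is swapped.
for(i <- start until end/2) {
  temp = brr(i)
  brr(i) = brr(end-i-1)
  brr(end-i-1) = temp
}
It should be something like:
for(i <- 0 until (end-start)/2) {
  temp = brr(start+i)
  brr(start+i) = brr(end-i-1)
  brr(end-i-1) = temp
}
With this change your code works.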
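For reference, here is a minimal sketch of the corrected reversal together with the three steps from the question. The name reverseRange is hypothetical, and the split point 4 is an assumption chosen so that the intermediate arrays match Step1 to Step3 as listed in the question (end is exclusive here).

```scala
// Reverses brr in place over the half-open index range [start, end).
def reverseRange(brr: Array[Int], start: Int, end: Int): Unit = {
  for (i <- 0 until (end - start) / 2) {
    val temp = brr(start + i)
    brr(start + i) = brr(end - i - 1)
    brr(end - i - 1) = temp
  }
}

val arr = Array(1, 2, 3, 4, 5, 6, 7)
reverseRange(arr, 0, 4)          // Array(4, 3, 2, 1, 5, 6, 7)
reverseRange(arr, 4, arr.length) // Array(4, 3, 2, 1, 7, 6, 5)
reverseRange(arr, 0, arr.length) // Array(5, 6, 7, 1, 2, 3, 4)
println(arr.mkString(", "))
```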
Mutation is to be avoided but, if you must, recursion is still useful.
def reversePart[A](arr: Array[A], start: Int, end: Int): Unit = {
  def loop(a: Int, b: Int): Unit =
    if (a < b) {
      val temp = arr(a)
      arr(a) = arr(b)
      arr(b) = temp
      loop(a+1, b-1)
    }
  loop(start max 0, end min arr.length-1)
}
val test = Array(1, 2, 3, 4, 5, 6, 7)
reversePart(test, 0, 3) //Array(4, 3, 2, 1, 5, 6, 7)
reversePart(test, 4, 7) //Array(4, 3, 2, 1, 7, 6, 5)
reversePart(test, -1, 99) //Array(5, 6, 7, 1, 2, 3, 4)
I realize this doesn't directly answer your question, but for reference and for readers interested in a slightly different approach, one possibility is to implement this as a view.
In this example, Rotate implements the logic, while IndexedSeqViewRotate adds the rotate method as an extension to any IndexedSeqView as long as it's in scope.
In the tests, I materialized the views into Vectors to take advantage of the equality, but of course you can materialize them into an Array as well.
import scala.collection.IndexedSeqView
import scala.collection.IndexedSeqView.SomeIndexedSeqOps
final class Rotate[A](underlying: SomeIndexedSeqOps[A], n: Int) extends IndexedSeqView[A] {
  @inline private def rotateIndex(i: Int): Int = ((i - n) % length + length) % length
  override def apply(i: Int): A = underlying(rotateIndex(i))
  override lazy val length: Int = underlying.length
}
final implicit class IndexedSeqViewRotate[A](val underlying: IndexedSeqView[A]) extends AnyVal {
  def rotate(n: Int): IndexedSeqView[A] = new Rotate(underlying, n)
}
assert(Array().view.rotate(7).to(Vector) == Vector.empty)
assert(Array(1,2,3,4,5,6,7).view.rotate(7).to(Vector) == Vector(1,2,3,4,5,6,7))
assert(Array(1,2,3,4,5,6,7).view.rotate(0).to(Vector) == Vector(1,2,3,4,5,6,7))
assert(Array(1,2,3,4,5,6,7).view.rotate(1).to(Vector) == Vector(7,1,2,3,4,5,6))
assert(Array(1,2,3,4,5,6,7).view.rotate(3).to(Vector) == Vector(5,6,7,1,2,3,4))
assert(Array(1,2,3,4,5,6,7).view.rotate(-1).to(Vector) == Vector(2,3,4,5,6,7,1))

Pairwise comparison array Scala

I'm trying to check whether any two consecutive elements of an array are equal.
I tried a for loop, expecting it to return a Boolean, but it does not seem to work. What am I missing?
val array1 = Array(1, 4, 2, 3)
def equalElements(array : Array[Int]) : Boolean = {
  for (i <- 1 to (array.size )) {
    if (array(i) == array(i + 1)) true else false
  }
}
You can use sliding, which, according to the documentation, "groups elements in fixed size blocks by passing a 'sliding window' over them (as opposed to partitioning them, as is done in grouped)."
val array1 = Array(1, 1, 2, 2)
val equalElements = array1
.sliding(size = 2, step = 1) //step = 1 is a default value.
.exists(window => window.length == 2 && window(0) == window(1))
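An alternative sketch, not taken from the answer above: pairing the array with itself shifted by one position gives every pair of neighbours, and exists then does the comparison. The function name hasEqualNeighbours is made up for this example.

```scala
// Hypothetical helper: true if any two neighbouring elements are equal.
// drop(1) is safe on empty and single-element arrays, and zip stops at
// the shorter side, so no index can run past the end of the array.
def hasEqualNeighbours(array: Array[Int]): Boolean =
  array.zip(array.drop(1)).exists { case (a, b) => a == b }

println(hasEqualNeighbours(Array(1, 4, 2, 3))) // false
println(hasEqualNeighbours(Array(1, 1, 2, 2))) // true
```

Unlike the for loop in the question, this is an expression whose value is the Boolean result, and it never evaluates array(i + 1) beyond the last index.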

How do I take slice from an array position to end of the array?

How do I get an array of arrays with elements like this? Is there a built-in Scala API that can provide this value (without using combinations)?
val inp = Array(1,2,3,4)
Vector(Vector((1,2), (1,3), (1,4)),
       Vector((2,3), (2,4)),
       Vector((3,4)))
My answer is below. I feel that there should be a more elegant answer than this in Scala.
val inp = Array(1,2,3,4)
val mp = (0 until inp.length - 1).map( x => {
  (x + 1 until inp.length).map( y => (inp(x), inp(y)) )
})
Added combination constraint.
Using combinations(2) and groupBy() on the first element (0) of each combination will give you the values and structure you want. Getting the result as a Vector[Vector[...]] will require some conversion using toVector.
scala> inp.combinations(2).toList.groupBy(a => a(0)).values
res11: Iterable[List[Array[Int]]] = MapLike.DefaultValuesIterable(
  List(Array(2, 3), Array(2, 4)),
  List(Array(1, 2), Array(1, 3), Array(1, 4)),
  List(Array(3, 4))
)
Note: this answer is OK only if the elements in the Seq are unique and sorted (according to <). See the edit for the more general case.
val v = a.toVector
and by forgoing combinations, I can choose tuples instead and not have to convert at the end:
for (i <- v.init) yield { for (j <- v if i < j) yield (i, j) }
or
v.init.map(i => v.filter(i < _).map((i, _)))
I'm not sure whether there's a performance hit for using init on a Vector.
For non-unique elements, we can use the indices
val v = a.toVector.zipWithIndex
for ((i, idx) <- v.init) yield { for ((j, jdx) <- v if idx < jdx) yield (i, j) }
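Another possible sketch, closer to the "slice from a position to the end" wording of the question: tails produces every suffix of the collection, so each element can be paired with the slice that follows it. This is offered as an assumption about what would count as more elegant, not as one of the answers above; the name pairs is made up.

```scala
val inp = Array(1, 2, 3, 4)

// tails yields Vector(1,2,3,4), Vector(2,3,4), Vector(3,4), Vector(4), Vector().
// Each non-empty suffix is split into its head and the remaining slice;
// suffixes whose remaining slice is empty are skipped.
val pairs: Vector[Vector[(Int, Int)]] =
  inp.toVector.tails.collect {
    case head +: rest if rest.nonEmpty => rest.map(r => (head, r))
  }.toVector

println(pairs)
```

Because the pairing is positional, this does not depend on the elements being unique or sorted.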

Faster way to make a zeroed array in Scala

I create zeroed Arrays in Scala with
(0 until Nrows).map(_ => 0).toArray, but is there anything faster? map is slow.
I have the same question with 1 instead of 0, i.e. I also want to speed up (0 until Nrows).map(_ => 1).toArray.
Zero is the default value for an array of Ints, so just do this:
val array = new Array[Int](NRows)
If you want all those values to be 1s then use .fill() (with thanks to @gourlaysama):
val array = Array.fill(NRows)(1)
However, looking at how this works internally, it involves the creation of a few objects that you don't need. I suspect the following (uglier) approach may be quicker if speed is your main concern:
val array = new Array[Int](NRows)
for (i <- 0 until array.length) { array(i) = 1 }
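One more option, sketched here as an assumption rather than a measured recommendation: since a Scala Array[Int] is a plain JVM int array, java.util.Arrays.fill can overwrite it in place. Whether it is actually faster than the loop above would need to be benchmarked. The value of NRows is made up for the example.

```scala
val NRows = 5 // hypothetical size for the example

val ones = new Array[Int](NRows) // starts out as all zeros
java.util.Arrays.fill(ones, 1)   // overwrites every slot with 1, in place

println(ones.mkString("Array(", ", ", ")"))
```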
For multidimensional arrays consider Array.ofDim, for instance,
scala> val a = Array.ofDim[Int](3,3)
a: Array[Array[Int]] = Array(Array(0, 0, 0), Array(0, 0, 0), Array(0, 0, 0))
scala> val a = Array.ofDim[Int](3)
a: Array[Int] = Array(0, 0, 0)
In the context here,
val a = Array.ofDim[Int](NRows)
For setting (possibly nonzero) initial values, consider Array.tabulate, for instance,
scala> Array.tabulate(3,3)( (x,y) => 1)
res5: Array[Array[Int]] = Array(Array(1, 1, 1), Array(1, 1, 1), Array(1, 1, 1))
scala> Array.tabulate(3)( x => 1)
res18: Array[Int] = Array(1, 1, 1)

Removing arrays which are subsets of other arrays in Scala

I have an array of arrays of integers, like:
val t1 = Array(Array(1, 2, 3), Array(2), Array(4, 5, 6), Array(5, 6))
I want to remove the arrays that are subsets of another array. So, the result should be:
Array(Array(1, 2, 3), Array(4, 5, 6))
Ideally, these should be Sets, but in the context of my program, they are arrays, and I don't want to convert them to sets due to performance reasons.
I solved it this way in Scala, but I would like to know if there is a more elegant (and/or more efficient) way to do this:
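import scala.reflect.ClassTag // ClassTag replaces the deprecated ClassManifest
def removeSubsets[T: ClassTag](clusters: Array[Array[T]]) = {
  // longest arrays first, so a superset is always kept before its subsets are seen
  val sortedClusters = clusters.sortBy(-1 * _.length)
  sortedClusters.foldLeft(Array[Array[T]]()){ (acc, ele) =>
    val isASubset = acc.exists(arr => (ele diff arr).length == 0)
    if (isASubset) acc else acc :+ ele
  }
}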
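For readers who want to run the idea, here is a self-contained sketch of the same sort-then-fold approach with a usage example. It swaps the diff-based test for a forall/contains membership test, which is my assumption about an equivalent check when the arrays hold no duplicates; it is not claimed to be faster. The name removeSubsetArrays is hypothetical.

```scala
import scala.reflect.ClassTag

// Keeps only the arrays that are not contained in an already-kept array.
def removeSubsetArrays[T: ClassTag](clusters: Array[Array[T]]): Array[Array[T]] = {
  // Longest first: a superset is then always examined before its subsets.
  val sorted = clusters.sortBy(c => -c.length)
  sorted.foldLeft(Array.empty[Array[T]]) { (acc, ele) =>
    val isSubset = acc.exists(kept => ele.forall(e => kept.contains(e)))
    if (isSubset) acc else acc :+ ele
  }
}

val t1 = Array(Array(1, 2, 3), Array(2), Array(4, 5, 6), Array(5, 6))
val kept = removeSubsetArrays(t1)
println(kept.map(_.mkString("Array(", ", ", ")")).mkString(", "))
```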
