How to insert an element into an RDD array in Spark

Hi, I am trying to insert elements into an RDD of Array[String] using Scala in Spark.
Here is an example.
val data: RDD[Array[String]] = Array(Array(1,2,3), Array(1,2,3,4), Array(1,2))
I want every array in this data to have length 4.
If an array is shorter than 4, I want to fill the remaining positions with the value NULL.
Here is the code I tried.
val newData = data.map(x =>
if(x.length < 4){
for(i <- x.length until 4){
x.union("NULL")
}
}
else{
x
}
)
But the result is Array[Any] = Array((), Array(1, 2, 3, 4), ()).
So I tried another way, using yield in the for loop.
val newData = data.map(x =>
if(x.length < 4){
for(i <- x.length until 4)yield{
x.union("NULL")
}
}
else{
x
}
)
The result is Array[Object] = Array(Vector(Array(1, 2, 3, N, U, L, L)), Array(1, 2, 3, 4), Vector(Array(1, 2, N, U, L, L), Array(1, 2, N, U, L, L)))
Neither of these is what I want. I want a result like this:
RDD[Array[String]] = Array(Array(1,2,3,NULL), Array(1,2,3,4), Array(1,2,NULL,NULL))
What should I do?
Is there a method to solve it?

union is a functional operation: it returns a new collection and does not change the array x. You don't need a loop here, and a loop-based implementation will probably be slower. It is much better to create one new collection with all the NULL values than to mutate something every time you add one. Here is a helper function that should work for you:
def fillNull(x: Array[String], desiredLength: Int): Array[String] = {
  // Append as many "NULL" strings as are needed to reach desiredLength.
  x ++ Array.fill(desiredLength - x.length)("NULL")
}
val newData = data.map(fillNull(_, 4))
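The same padding can also be written with the standard padTo method, which appends an element until the collection reaches the requested length and leaves longer collections untouched. The following is only a sketch: padWithNull is a hypothetical name, and a local Seq stands in for the RDD so the snippet runs without Spark.

```scala
// Sketch: pad each row to a fixed length with the literal string "NULL".
// padWithNull is a hypothetical helper name, not part of any library.
def padWithNull(row: Array[String], desiredLength: Int): Array[String] =
  row.padTo(desiredLength, "NULL")

// A local Seq stands in for the RDD here (assumption for illustration).
val rows = Seq(Array("1", "2", "3"), Array("1", "2", "3", "4"), Array("1", "2"))
val padded = rows.map(padWithNull(_, 4))
```

With an RDD the call has the same shape: data.map(padWithNull(_, 4)).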

I solved your use case with the following code:
val initialRDD = sparkContext.parallelize(Seq(Array("1", "2", "3"), Array("1", "2", "3", "4"), Array("1", "2")))
val transformedRDD = initialRDD.map(array =>
  if (array.length < 4) {
    // Start from four "NULL" strings, then copy the original values over the front.
    val transformedArray = Array.fill(4)("NULL")
    Array.copy(array, 0, transformedArray, 0, array.length)
    transformedArray
  } else {
    array
  }
)
val result = transformedRDD.collect()

Related

How to rotate an array using array reversal technique?

I am trying to rotate an array from a particular position using array reversal method.
Input array: [1,2,3,4,5,6,7]
d = 3
Output array: [5,6,7,1,2,3,4]
To achieve this I thought of working on the array in three steps.
Step1: Reverse the array from starting position until d => [4,3,2,1,5,6,7]
Step2: Reverse the array from d till the end of the array => [4,3,2,1,7,6,5]
Step3: Reverse the complete array from Step2 => [5,6,7,1,2,3,4]
I haven't followed any functional programming pattern as I want to check the algorithm step by step.
val arr = Array[Int](1, 2, 3, 4, 5, 6, 7)
def reverseAlgo(brr: Array[Int], start: Int, end: Int): Unit = {
var temp = 0
for(i <- start until end/2) {
temp = brr(i)
brr(i) = brr(end-i-1)
brr(end-i-1) = temp
}
brr.foreach(println)
}
Step1 is working fine:
reverseAlgo(arr, 0, 3)
Output:
3
2
1
4
5
6
7
But Step2 is not producing the required output:
reverseAlgo(arr, 3, 7)
Output:
3
2
1
4
5
6
7
As you see, the output of the array should be: 3,2,1,7,6,5,4
Since the output from Step2 is incorrect, the final output is also wrong.
Step3:
reverseAlgo(arr, 0, arr.length)
Output:
7
6
5
4
1
2
3
Could anyone tell me what mistake I am making here?
Why not just something as simple as this?
import scala.collection.immutable.ArraySeq
import scala.reflect.ClassTag
def rotate[T : ClassTag](arr: ArraySeq[T])(pos: Int): ArraySeq[T] = {
val length = arr.length
ArraySeq.tabulate[T](n = length) { i =>
arr((i + 1 + pos) % length)
}
}
Which can be used like this:
rotate(arr = ArraySeq(1, 2, 3, 4, 5, 6, 7))(pos = 3)
// res: ArraySeq[Int] = ArraySeq(5, 6, 7, 1, 2, 3, 4)
Your code will only work when the range starts at zero.
for(i <- start until end/2) {
temp = brr(i)
brr(i) = brr(end-i-1)
brr(end-i-1) = temp
}
Should be something like:
for(i <- 0 until (end-start)/2) {
temp = brr(start+i)
brr(start+i) = brr(end-i-1)
brr(end-i-1) = temp
}
With this change your code works.
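Putting the corrected loop together with the three steps from the question gives a complete rotation. This is a minimal sketch: reverseRange and rotateRight are hypothetical names, the range is treated as half-open (start inclusive, end exclusive), and the first reversal covers the first n - d elements, which is what the expected arrays in the question's steps show.

```scala
// Reverse the half-open range [start, end) of brr in place.
def reverseRange(brr: Array[Int], start: Int, end: Int): Unit = {
  for (i <- 0 until (end - start) / 2) {
    val temp = brr(start + i)
    brr(start + i) = brr(end - i - 1)
    brr(end - i - 1) = temp
  }
}

// Rotate arr to the right by d positions using three reversals.
def rotateRight(arr: Array[Int], d: Int): Unit = {
  val n = arr.length
  if (n > 0) {
    val k = ((d % n) + n) % n   // normalise d into 0 until n
    reverseRange(arr, 0, n - k) // Step 1: reverse the first n - k elements
    reverseRange(arr, n - k, n) // Step 2: reverse the last k elements
    reverseRange(arr, 0, n)     // Step 3: reverse the whole array
  }
}

val arr = Array(1, 2, 3, 4, 5, 6, 7)
rotateRight(arr, 3)
// arr is now Array(5, 6, 7, 1, 2, 3, 4)
```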
Mutation is to be avoided but, if you must, recursion is still useful.
def reversePart[A](arr: Array[A], start: Int, end: Int): Unit = {
def loop(a:Int, b:Int): Unit =
if (a < b) {
val temp = arr(a)
arr(a) = arr(b)
arr(b) = temp
loop(a+1, b-1)
}
loop(start max 0, end min arr.length-1)
}
val test = Array(1, 2, 3, 4, 5, 6, 7)
reversePart(test, 0, 3) //Array(4, 3, 2, 1, 5, 6, 7)
reversePart(test, 4, 7) //Array(4, 3, 2, 1, 7, 6, 5)
reversePart(test, -1, 99) //Array(5, 6, 7, 1, 2, 3, 4)
I realize this doesn't directly answer your question, but for reference and for readers interested in a slightly different approach, one possibility is to implement this as a view.
In this example, Rotate implements the logic, while IndexedSeqViewRotate adds the rotate method as an extension to any IndexedSeqView as long as it's in scope.
In the tests, I materialized the views into Vectors to take advantage of the equality, but of course you can materialize them into an Array as well.
import scala.collection.IndexedSeqView
import scala.collection.IndexedSeqView.SomeIndexedSeqOps
final class Rotate[A](underlying: SomeIndexedSeqOps[A], n: Int) extends IndexedSeqView[A] {
@inline private def rotateIndex(i: Int): Int = ((i - n) % length + length) % length
override def apply(i: Int): A = underlying(rotateIndex(i))
override lazy val length: Int = underlying.length
}
final implicit class IndexedSeqViewRotate[A](val underlying: IndexedSeqView[A]) extends AnyVal {
def rotate(n: Int): IndexedSeqView[A] = new Rotate(underlying, n)
}
assert(Array().view.rotate(7).to(Vector) == Vector.empty)
assert(Array(1,2,3,4,5,6,7).view.rotate(7).to(Vector) == Vector(1,2,3,4,5,6,7))
assert(Array(1,2,3,4,5,6,7).view.rotate(0).to(Vector) == Vector(1,2,3,4,5,6,7))
assert(Array(1,2,3,4,5,6,7).view.rotate(1).to(Vector) == Vector(7,1,2,3,4,5,6))
assert(Array(1,2,3,4,5,6,7).view.rotate(3).to(Vector) == Vector(5,6,7,1,2,3,4))
assert(Array(1,2,3,4,5,6,7).view.rotate(-1).to(Vector) == Vector(2,3,4,5,6,7,1))

Pairwise comparison array Scala

I'm trying to check whether two consecutive elements of an array are equal.
I have tried using a for loop, expecting it to return a Boolean, but it does not work. What am I missing?
val array1 = Array(1, 4, 2, 3)
def equalElements(array : Array[Int]) : Boolean = {
for (i <- 1 to (array.size )) {
if (array(i) == array(i + 1)) true else false
}
}
You can use sliding, which
groups elements in fixed size blocks by passing a "sliding window"
over them (as opposed to partitioning them, as is done in grouped).
val array1 = Array(1, 1, 2, 2)
val equalElements = array1
.sliding(size = 2, step = 1) //step = 1 is a default value.
.exists(window => window.length == 2 && window(0) == window(1))
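As for why the loop in the question fails: a for loop without yield evaluates to Unit rather than Boolean, and with i running up to array.size the access array(i + 1) goes past the end of the array. Another way to express the check, shown here only as a sketch with the hypothetical name hasEqualNeighbours, is to zip the array with itself shifted by one element:

```scala
// Pair every element with its successor and look for an equal pair.
// drop(1) (rather than tail) keeps this safe for an empty array.
def hasEqualNeighbours(array: Array[Int]): Boolean =
  array.zip(array.drop(1)).exists { case (a, b) => a == b }

val noPairs = hasEqualNeighbours(Array(1, 4, 2, 3))   // no equal neighbours
val withPairs = hasEqualNeighbours(Array(1, 1, 2, 2)) // 1,1 and 2,2 are equal neighbours
val emptyCase = hasEqualNeighbours(Array.empty[Int])  // nothing to compare
```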

How do I take slice from an array position to end of the array?

How do I get an array of array with elements like this? Is there an inbuilt scala api that can provide this value (without using combinations)?
e.g
val inp = Array(1,2,3,4)
Output
Vector(
Vector((1,2), (1,3), (1,4)),
Vector((2,3), (2,4)),
Vector((3,4))
)
My answer is below. I feel that there should be a more elegant answer than this in Scala.
val inp = Array(1,2,3,4)
val mp = (0 until inp.length - 1).map( x => {
(x + 1 until inp.length).map( y => {
(inp(x),inp(y))
})
})
print(mp)
+Edit
Added combination constraint.
Using combinations(2) and groupBy() on the first element (0) of each combination will give you the values and structure you want. Getting the result as a Vector[Vector[...]] will require some conversion using toVector. Note that groupBy returns a Map, so the order of the groups is not guaranteed.
scala> inp.combinations(2).toList.groupBy(a => a(0)).values
res11: Iterable[List[Array[Int]]] = MapLike.DefaultValuesIterable
(
List(Array(2, 3), Array(2, 4)),
List(Array(1, 2), Array(1, 3), Array(1, 4)),
List(Array(3, 4))
)
ORIGINAL ANSWER
Note: this answer is OK only if the elements in the Seq are unique and sorted (according to <). See the edit below for the more general case.
With
val v = a.toVector
and by foregoing combinations, I can choose tuples instead and not have to cast at the end
for (i <- v.init) yield { for (j <- v if i < j) yield (i, j) }
or
v.init.map(i => v.filter(i < _).map((i, _)))
Not sure if there's a performance hit for using init on vector
EDIT
For non-unique elements, we can use the indices
val v = a.toVector.zipWithIndex
for ((i, idx) <- v.init) yield { for ((j, jdx) <- v if idx < jdx) yield (i, j) }
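For readers who want to avoid both combinations and explicit index arithmetic, one more possibility is tails, which yields every suffix of the collection (the "slice from a position to the end" in the title). The snippet below is a sketch under that assumption; it pairs the head of each suffix with the remaining elements and works positionally, so duplicates and unsorted input are fine.

```scala
val inp = Array(1, 2, 3, 4)

// tails yields Vector(1,2,3,4), Vector(2,3,4), Vector(3,4), Vector(4), Vector().
// Suffixes with fewer than two elements produce no pairs and are skipped.
val pairs: Vector[Vector[(Int, Int)]] =
  inp.toVector.tails.collect {
    case head +: rest if rest.nonEmpty => rest.map(other => (head, other))
  }.toVector
// pairs: Vector(Vector((1,2), (1,3), (1,4)), Vector((2,3), (2,4)), Vector((3,4)))
```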

Faster way to make a zeroed array in Scala

I create zeroed Arrays in Scala with
(0 until Nrows).map (_ => 0).toArray but is there anything faster? map is slow.
I have the same question but with 1 instead of 0, i.e. I also want to speed up (0 until Nrows).map (_ => 1).toArray
Zero is the default value for an array of Ints, so just do this:
val array = new Array[Int](NRows)
If you want all those values to be 1s then use .fill() (with thanks to @gourlaysama):
val array = Array.fill(NRows)(1)
However, looking at how this works internally, it involves the creation of a few objects that you don't need. I suspect the following (uglier) approach may be quicker if speed is your main concern:
val array = new Array[Int](NRows)
for (i <- 0 until array.length) { array(i) = 1 }
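Two further variants are sometimes used when speed matters: a plain while loop, which avoids the Range and closure behind a for comprehension, and java.util.Arrays.fill from the Java standard library. This is only a sketch; the value of NRows is a placeholder, and whether either variant is measurably faster depends on the JVM and Scala version, so it is worth benchmarking on your own data.

```scala
import java.util.Arrays

val NRows = 5 // placeholder size, for illustration only

// Variant 1: a while loop over a freshly allocated (zeroed) array.
val ones1 = new Array[Int](NRows)
var i = 0
while (i < ones1.length) {
  ones1(i) = 1
  i += 1
}

// Variant 2: delegate the fill to the JDK.
val ones2 = new Array[Int](NRows)
Arrays.fill(ones2, 1)
```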
For multidimensional arrays consider Array.ofDim, for instance,
scala> val a = Array.ofDim[Int](3,3)
a: Array[Array[Int]] = Array(Array(0, 0, 0), Array(0, 0, 0), Array(0, 0, 0))
Likewise,
scala> val a = Array.ofDim[Int](3)
a: Array[Int] = Array(0, 0, 0)
In the context here,
val a = Array.ofDim[Int](NRows)
For setting (possibly nonzero) initial values, consider Array.tabulate, for instance,
scala> Array.tabulate(3,3)( (x,y) => 1)
res5: Array[Array[Int]] = Array(Array(1, 1, 1), Array(1, 1, 1), Array(1, 1, 1))
scala> Array.tabulate(3)( x => 1)
res18: Array[Int] = Array(1, 1, 1)

Removing arrays which are subsets of other arrays in Scala

I have an array of array of integers. Like:
val t1 = Array(Array(1, 2, 3), Array(2), Array(4, 5, 6), Array(5, 6))
I want to remove the arrays that are subsets of another array. So, the result should be:
Array(Array(1, 2, 3), Array(4, 5, 6))
Ideally, these should be Sets, but in the context of my program, they are arrays, and I don't want to convert them to sets due to performance reasons.
I solved it this way in Scala, but I would like to know if there is a more elegant (and/or more efficient) way to do this:
import scala.reflect.ClassTag // ClassManifest is deprecated in favour of ClassTag
def removeSubsets[T: ClassTag](clusters: Array[Array[T]]) = {
val sortedClusters = clusters.sortBy(-1 * _.length)
sortedClusters.foldLeft(Array[Array[T]]()){ (acc, ele) =>
val isASubset = acc.exists(arr => (ele diff arr).length == 0)
if (isASubset) acc else acc :+ ele
}
}
