Find out frequency of certain integers appearing in an array in Scala - arrays

Say I have two arrays. One array A of a set of integers - all distinct. Another array B of a list of integers, all appearing in array A, but not necessarily distinct. For example:
A could be Array(123, 456, 789)
B could be Array(123, 123, 456, 123, 789, 456)
I want to create an array C, which tells us the frequency of each element (from array A) appearing in array B. In this case, C would be Array(3, 2, 1) because 123 appears 3 times, 456 appears 2 times, and 789 appears 1 time.
What is an efficient way to do this in Scala?
My attempt is
val C: Array[Int] = Array.fill(3)(0)
var idx = 0
for (i <- A) {
  for (j <- B) { if (j == i) C(idx) += 1 }
  idx += 1
}
for (i <- C) println(i)
But I understand that this is probably inefficient, and would take a long time if I am dealing with a much larger array A and array B. But I am restricted to for loops and if statements since I am only a beginner with Scala. Is there a more efficient way to do this?

Let's say that n is the length of array A and m is the length of array B.
As it stands, your solution is O(n * m), because it scans all of B once for every element of A.
You can improve this to O(n + m) by using a mutable HashMap, at the cost of O(n) extra space.
import scala.collection.mutable
val a = Array(123, 456, 789)
val b = Array(123, 123, 456, 123, 789, 456)
val countMap = mutable.HashMap.empty[Int, Int]
// add all integers in `a` with count 0
for (i <- a) {
  countMap.put(i, 0)
}
// iterate on b
// and update the count in countMap (if exists)
for (i <- b) {
  countMap.get(i).foreach(c => countMap.put(i, c + 1))
}
// fill your array `c`
val c = Array.ofDim[Int](a.length)
for ((i, index) <- a.zipWithIndex) {
  c(index) = countMap.getOrElse(i, 0)
}
println(c.mkString(", "))
// 3, 2, 1
Keep in mind that for comprehensions over Scala collections carry their own overhead; you can improve performance further by using while loops.
import scala.collection.mutable
val a = Array(123, 456, 789)
val b = Array(123, 123, 456, 123, 789, 456)
val countMap = mutable.HashMap.empty[Int, Int]
// shared index for our while loops
var i = 0
// add all integers in `a` with count 0
while (i < a.length) {
  countMap.put(a(i), 0)
  i = i + 1
}
// iterate on b
// and update the count in countMap (if exists)
i = 0
while (i < b.length) {
  if (countMap.contains(b(i))) {
    countMap.put(b(i), countMap(b(i)) + 1)
  }
  i = i + 1
}
// fill your array `c`
val c = Array.ofDim[Int](a.length)
i = 0
while (i < a.length) {
  c(i) = countMap.getOrElse(a(i), 0)
  i = i + 1
}
println(c.mkString(", "))
// 3, 2, 1
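For comparison, the same O(n + m) idea can be sketched with immutable collections instead of explicit loops. This is only an illustration: the names FrequencyCount and countOccurrences are made up here, and it goes beyond the for/if constructs the question is restricted to.

```scala
object FrequencyCount {
  // Count how often each element of `a` occurs in `b`.
  // Elements of `a` that never occur in `b` get a count of 0.
  def countOccurrences(a: Array[Int], b: Array[Int]): Array[Int] = {
    // One pass over `b`: value -> number of occurrences.
    val counts: Map[Int, Int] =
      b.groupBy(identity).map { case (value, group) => value -> group.length }
    // One pass over `a`: look each element up, defaulting to 0.
    a.map(value => counts.getOrElse(value, 0))
  }
}
```

Unlike the HashMap version above, this counts every value in b, including ones that are not in a, so it may use more memory when b contains many values that a does not.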

Related

want to use only one user define function for input and output in python 3+

from numpy import *
def row():
    for i in range(len(a)):
        for j in range(len(a[i])):
            t = [i, j]
            inp(*t)
def inp(*m):
    a[m] = int(input(f"enter the element of {m} = "))
    out(*m)
def out(*o):
    print(a[o])
a = zeros((1,2), dtype = int)
row()
Output is showing like this:
enter the element of (0, 0) = 2
2
enter the element of (0, 1) = 3
3
but I want the output to look like this, with all the inputs entered first:
enter the element of (0, 0) = 2
enter the element of (0, 1) = 3
and then all the outputs printed together:
2
3
It is possible to do this by creating two separate for loops, or two functions for input and output,
but my question is: how can I do it with only one function?
def row():
    for i in range(len(a)):
        for j in range(len(a[i])):
            t = [i, j]
            inp(*t)
Instead of printing the output right away, save the values and print them after you've finished collecting them:
import numpy as np
def row(a):
    values = []
    for i in range(len(a)):
        for j in range(len(a[i])):
            value = get_input(a, i, j)
            values.append(value)
    for value in values:
        print(value)
def get_input(a, *m):
    a[m] = int(input(f"enter the element of {m} = "))
    return a[m]
a = np.zeros((1,2), dtype = int)
row(a)
gives
enter the element of (0, 0) = 1
enter the element of (0, 1) = 2
1
2
However, there's an even simpler way: since you also set the new elements in your matrix, you don't have to save them separately but can simply print out the matrix values:
for item in a.ravel():
    print(item)

Spark Scala apply function on array of arrays element-wise

Disclaimer: I'm VERY new to spark and scala. I am working on a document similarity project in Scala with Spark. I have a dataframe which looks like this:
+--------+--------------------+------------------+
| text| shingles| hashed_shingles|
+--------+--------------------+------------------+
| qwerty|[qwe, wer, ert, rty]| [-4, -6, -1, -9]|
|qwerasfg|[qwe, wer, era, r...|[-4, -6, 6, -2, 2]|
+--------+--------------------+------------------+
Where I split the document text into shingles and computed a hash value for each one.
Imagine I have a hash_function(integer, seed) -> integer.
Now I want to apply n different hash functions of this form to the hashed_shingles arrays. I.e. obtain an array of n arrays such that each array is hash_function(hashed_shingles, seed) with seed from 1 to n.
I'm trying something like this, but I cannot get it to work:
val n = 3
df = df.withColumn("tmp", array_repeat($"hashed_shingles", n)) // Repeat minhashes
val minhash_expr = "transform(tmp,(x,i) -> hash_function(x, i))"
df = df.withColumn("tmp", expr(minhash_expr)) // Apply hash to each array
I know how to do it with a udf, but as I understand they are not optimized and I should try to avoid using them, so I try to do everything with org.apache.spark.sql.functions.
Any ideas on how to approach it without udf?
The udf which achieves the same goal is this:
// Family of hashing functions
class Hasher(seed: Int, max_val : Int, p : Int = 104729) {
private val random_generator = new scala.util.Random(seed)
val a = 1 + 2*random_generator.nextInt((p-2)/2)// a odd in [1, p-1]
val b = 1 + random_generator.nextInt(p - 2) // b in [1, p-1]
def getHash(x : Int) : Int = ((a*x + b) % p) % max_val
}
// Compute a list of minhashes from a list of hashers given a set of ids
class MinHasher(hashes : List[Hasher]) {
def getMinHash(set : Seq[Int])(hasher : Hasher) : Int = set.map(hasher.getHash).min
def getMinHashes(set: Seq[Int]) : Seq[Int] = hashes.map(getMinHash(set))
}
// Minhasher
val minhash_len = 100
val hashes = List.tabulate(minhash_len)(n => new Hasher(n, shingle_bins))
val minhasher = new MinHasher(hashes)
// Compute Minhashes
val minhasherUDF = udf[Seq[Int], Seq[Int]](minhasher.getMinHashes)
df = df.withColumn("minhashes", minhasherUDF('hashed_shingles))
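The shape of the result the question asks for (n arrays, one per seed) can be sketched with plain Scala collections. This is only an illustration of the nesting, not a Spark solution: SeededHashes and hashAll are names invented here, and hashFunction is a hypothetical placeholder for the question's hash_function(integer, seed). In Spark SQL the same nesting might be expressible with the built-in higher-order functions (an outer transform over sequence(1, n) wrapping an inner transform over hashed_shingles), provided the hash itself can be written as a SQL expression; that part is an assumption and has not been verified here.

```scala
object SeededHashes {
  // Hypothetical stand-in for the question's hash_function(integer, seed);
  // the arithmetic is an arbitrary placeholder, not a real hash family.
  def hashFunction(x: Int, seed: Int): Int = x * 31 + seed

  // For each seed in 1..n, hash every element of `hashed`.
  // The result has n inner sequences, each as long as `hashed`.
  def hashAll(hashed: Seq[Int], n: Int): Seq[Seq[Int]] =
    (1 to n).map(seed => hashed.map(x => hashFunction(x, seed)))
}
```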

How to manually sort an array in Scala? [closed]

I have tried:
for(i<-0 to arr.length){
println(i)
if(a(i) > a(i+1)){
var tempVal: Int = a(i)
a(i)= a(i+1)
a(i+1) = tempVal
}
}
The example I tried is: [1,2,8,5,10]. I want to sort this array without using any type of the built-in sorted scala functions. When I try the above, it throws: Index 4 out of bounds for length 4. How can I fix this? Is there any better way to sort an array in Scala?
You can find in the Scala By Example book, in chapter 2, an example for a sort, without using a .sort kind of function:
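def sort(xs: Array[Int]): Array[Int] = {
  if (xs.length <= 1) xs
  else {
    val pivot = xs(xs.length / 2)
    Array.concat(
      sort(xs.filter(_ < pivot)), // elements smaller than the pivot
      xs.filter(_ == pivot),      // elements equal to the pivot
      sort(xs.filter(_ > pivot))) // elements larger than the pivot
  }
}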
If you want to read more about this algorithm, you can do it at Scala Quicksort algorithms: FP/recursive, imperative (and performance). This article also analyses the memory complexity.
Welcome to Scala. In Software Engineering there are multiple ways to sort outside of the standard library. One way to understand the different ones is to watch some of the Hungarian dancers entertain you with:
Quicksort https://www.youtube.com/watch?v=ywWBy6J5gz8
Bubblesort https://www.youtube.com/watch?v=lyZQPjUT5B4
But to answer your question: it is hard to say exactly, since a is never defined in your snippet and we don't know what arr originally looks like.
I reworked it a bit. This looks like a bubble sort, so you would have to keep making passes until one completes with no swaps. The version below at least runs, but it is still incorrect because it makes only one pass; post an update, and read up on bubble sort. By the way, your check at the last element was reaching out of bounds. :)
val arr = Array(10, 3, 4, 9, 2, 5, 1)
for (i <- 0 to (arr.length - 1)) {
  if (i < (arr.length - 1) && arr(i) > arr(i + 1)) {
    var tempVal: Int = arr(i)
    arr(i) = arr(i + 1)
    arr(i + 1) = tempVal
  }
}
println(arr.mkString(","))
Let's go through your code one step at a time. First of all, as an example I'll say that
val a = Array(5, 3, 4, 7, 1)
I'll let the formatter tidy up my code, fix the reference to arr (it should be a, judging from the rest of the code) and get rid of the debug print.
We get to this point (playground):
val a = Array(5, 3, 4, 7, 1)
for (i <- 0 to a.length) {
if (a(i) > a(i + 1)) {
var tempVal: Int = a(i)
a(i) = a(i + 1)
a(i + 1) = tempVal
}
}
Now at least the code compiles. As suggested in a comment, one error is using to to produce the range. You are correctly assuming that array indexes are 0-based, which means that an array of length 5 (as in my case) has valid indexes from 0 to 4. The to method produces a range that includes both ends, so 0 to a.length creates the range 0 to 5, where index 5 causes an ArrayIndexOutOfBoundsException. Again, as suggested in the comment, we should replace to with until, which excludes the upper bound: 0 until n yields the same range as 0 to (n - 1).
Furthermore, you are indexing the next element inside the loop, so the loop has to stop at the second to last element, which means we need to iterate from 0 until (a.length - 1).
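The difference between the two range constructors can be checked in isolation. The helper names below (RangeBounds, inclusive, exclusive, adjacentPairStarts) are made up for this sketch and are not part of the original code.

```scala
object RangeBounds {
  // `to` includes the upper bound; `until` excludes it.
  def inclusive(n: Int): List[Int] = (0 to n).toList
  def exclusive(n: Int): List[Int] = (0 until n).toList

  // Indexes i for which both a(i) and a(i + 1) exist.
  def adjacentPairStarts(a: Array[Int]): List[Int] =
    (0 until (a.length - 1)).toList
}
```

For an array of length 5, inclusive(5) contains the invalid index 5, exclusive(5) stops at 4, and adjacentPairStarts stops at 3 so that i + 1 is still a valid index.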
After this change the code also runs, so I'll add a println at the end to see the result (playground):
for (i <- 0 until (a.length - 1)) {
  if (a(i) > a(i + 1)) {
    var tempVal: Int = a(i)
    a(i) = a(i + 1)
    a(i + 1) = tempVal
  }
}
println(a.iterator.mkString(", "))
Unfortunately this prints 3, 4, 5, 1, 7, which is definitely not sorted.
It looks like you are implementing bubble sort, but in order to do that we cannot simply go through the array once, we need to iterate over and over again until the array is sorted. We'll introduce a boolean variable to keep track of whether we reached the desired conclusion (playground):
val a = Array(5, 3, 4, 7, 1)
var needsSorting = true
while (needsSorting) {
  needsSorting = false
  for (i <- 0 until (a.length - 1)) {
    if (a(i) > a(i + 1)) {
      var tempVal: Int = a(i)
      a(i) = a(i + 1)
      a(i + 1) = tempVal
      needsSorting = true
    }
  }
}
println(a.iterator.mkString(", "))
Now the output is 1, 3, 4, 5, 7, which is sorted! This successfully implements a bubble sort. It is, however, a very inefficient algorithm: in the worst case it has to go through the entire array once for every item in the array, which means that it has quadratic complexity.
The next step for you is to learn about more efficient sorting algorithms.
In the meantime, we can probably have a look at the code and improve where possible. A first step could be to remove the unnecessary mutable variable when swapping and factor out the swap method (playground):
def swap(a: Array[Int], i: Int, j: Int): Unit = {
  val tmp = a(j)
  a(j) = a(i)
  a(i) = tmp
}
val a = Array(5, 3, 4, 7, 1)
var needsSorting = true
while (needsSorting) {
  needsSorting = false
  for (i <- 0 until (a.length - 1)) {
    if (a(i) > a(i + 1)) {
      swap(a, i, i + 1)
      needsSorting = true
    }
  }
}
println(a.iterator.mkString(", "))
Another thing I would do is factor out sorting in its own function (playground):
def swap(a: Array[Int], i: Int, j: Int): Unit = {
  val tmp = a(j)
  a(j) = a(i)
  a(i) = tmp
}
def sort(a: Array[Int]): Unit = {
  var needsSorting = true
  while (needsSorting) {
    needsSorting = false
    for (i <- 0 until (a.length - 1)) {
      if (a(i) > a(i + 1)) {
        swap(a, i, i + 1)
        needsSorting = true
      }
    }
  }
}
val a = Array(5, 3, 4, 7, 1)
sort(a)
println(a.iterator.mkString(", "))
Another thing I would do is probably to factor out a single pass as its own method and declare both helpers in the sort function itself to limit the scope in which they can be used and take advantage of a being in scope so that we don't have to pass it in (playground):
def sort(a: Array[Int]): Unit = {
  val lastIndex = a.length - 1
  def swap(i: Int, j: Int): Unit = {
    val tmp = a(j)
    a(j) = a(i)
    a(i) = tmp
  }
  // Performs one pass and returns true if any swap was needed.
  def onePassSwapped(): Boolean = {
    var swapped = false
    for (i <- 0 until lastIndex) {
      val j = i + 1
      if (a(i) > a(j)) {
        swap(i, j)
        swapped = true
      }
    }
    swapped
  }
  // Keep making passes until one completes without swapping anything.
  while (onePassSwapped()) {}
}
val a = Array(5, 3, 4, 7, 1)
sort(a)
println(a.iterator.mkString(", "))
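As one possible next step beyond the bubble sort above, here is a minimal sketch of a top-down merge sort, which runs in O(n log n) and also avoids the built-in sorting functions. It works on an immutable List rather than sorting an Array in place, and the names MergeSortSketch, merge and sort are chosen just for this example.

```scala
object MergeSortSketch {
  // Merge two already-sorted lists into one sorted list.
  // `acc` holds the merged prefix in reverse order.
  @scala.annotation.tailrec
  private def merge(left: List[Int], right: List[Int], acc: List[Int] = Nil): List[Int] =
    (left, right) match {
      case (Nil, _) => acc.reverse ::: right
      case (_, Nil) => acc.reverse ::: left
      case (l :: ls, r :: rs) =>
        if (l <= r) merge(ls, right, l :: acc)
        else merge(left, rs, r :: acc)
    }

  // Split the list in half, sort each half recursively, then merge.
  def sort(xs: List[Int]): List[Int] =
    if (xs.length <= 1) xs
    else {
      val (left, right) = xs.splitAt(xs.length / 2)
      merge(sort(left), sort(right))
    }
}
```

An array can be sorted with it via MergeSortSketch.sort(a.toList).toArray, at the cost of copying the data.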

Count elements of array A in array B with Scala

I have two arrays of strings, say
A = ('abc', 'joia', 'abas8', '09ma09', 'oiam0')
and
B = ('gfdg', '89jkjj', '09ma09', 'asda', '45645ghf', 'dgfdg', 'yui345gd', '6456ds', '456dfs3', 'abas8', 'sfgds').
What I want to do is simply count, for each string in A, the number of times it appears in B (if any). For example, the resulting array here should be: C = (0, 0, 1, 1, 0). How can I do that?
try this:
A.map(x => B.count(y => y == x))
You can do it the way idursun suggested; it works, but it may be less efficient than computing the intersection first. If B is much bigger than A and only a few elements of A occur in B, this can give a massive speedup, because B is then scanned only for the elements that actually appear in it. The intersect method has better big-O complexity than doing a linear search in B for each element of A.
val A = Array("abc", "joia", "abas8", "09ma09", "oiam0")
val B = Array("gfdg", "89jkjj", "09ma09", "asda", "45645ghf", "dgfdg", "yui345gd", "6456ds", "456dfs3", "abas8", "sfgds")
val intersectCounts: Map[String, Int] =
A.intersect(B).map(s => s -> B.count(_ == s)).toMap
val count = A.map(intersectCounts.getOrElse(_, 0))
println(count.toSeq)
Result
(0, 0, 1, 1, 0)
Use a foldLeft over B as the value yielded for each element of A:
val A = List("a","b")
val B = List("b","b")
val C = for (a <- A)
  yield B.foldLeft(0) { case (totalc : Int, w : String) =>
    totalc + (if (w == a) 1 else 0)
  }
And the result:
C: List[Int] = List(0, 2)
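The answers above all scan B once per element of A (or per matching element). A hedged alternative is to fold over B a single time, building a map of counts, and then look each element of A up in that map. The names CountInB and counts are invented for this sketch.

```scala
object CountInB {
  def counts(a: Seq[String], b: Seq[String]): Seq[Int] = {
    // Only strings that occur in `a` need to be tracked.
    val wanted: Set[String] = a.toSet
    // Single pass over `b`, accumulating string -> occurrences.
    val seen: Map[String, Int] = b.foldLeft(Map.empty[String, Int]) { (acc, s) =>
      if (wanted.contains(s)) acc.updated(s, acc.getOrElse(s, 0) + 1) else acc
    }
    // Strings of `a` that never appeared in `b` default to 0.
    a.map(s => seen.getOrElse(s, 0))
  }
}
```

With hash-based sets and maps this takes roughly one pass over each collection, instead of one pass over B for every element of A.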

what is the way to find if array contain Arithmetic progression (sequence)

I have a sorted array of numbers, like
1, 4, 5, 6, 8
How can I find out whether this array contains an arithmetic progression (sequence)?
For example, in this array:
4, 6, 8
or
4, 5, 6
Remark: a sequence must contain at least 3 numbers.
You can solve this recursively, by breaking it into smaller problems, which are:
Identify the pairs {1,4},{1,5}...{6,8}
For each pair, look for sequences with the same interval
First create the scaffolding to run the problems:
Dim number(7) As Integer
Dim result() As Integer
Dim numbers As Integer
Sub FindThem()
number(1) = 1
number(2) = 4
number(3) = 5
number(4) = 6
number(5) = 8
number(6) = 10
number(7) = 15
numbers = UBound(number)
ReDim result(numbers)
Dim i As Integer
For i = 1 To numbers - 2
FindPairs i
Next
End Sub
Now iterate over the pairs
Sub FindPairs(start As Integer)
Dim delta As Integer
Dim j As Integer
result(1) = number(start)
For j = start + 1 To numbers
result(2) = number(j)
delta = result(2) - result(1)
FindMore j, 2, delta
Next
End Sub
Finding sequences as you go
Sub FindMore(start As Integer, count As Integer, delta As Integer)
Dim k As Integer
For k = start + 1 To numbers
step = number(k) - result(count)
result(count + 1) = number(k) ' should be after the if statement
' but here makes debugging easier
If step = delta Then
PrintSeq "Found ", count + 1
FindMore k, count + 1, delta
ElseIf step > delta Then ' Pointless to search further
Exit Sub
End If
Next
End Sub
This is just to show the results
Sub PrintSeq(text As String, count As Integer)
ans = ""
For t = 1 To count
ans = ans & "," & result(t)
Next
ans = text & " " & Mid(ans, 2)
Debug.Print ans
End Sub
Results
findthem
Found 1,8,15
Found 4,5,6
Found 4,6,8
Found 4,6,8,10
Found 5,10,15
Found 6,8,10
Edit: Oh, and of course, the array MUST be sorted!
HTH
First, I will assume that you only want arithmetic sequences of three terms or more.
I would suggest checking each number a[i] as the start of an arithmetic sequence, and a[i+n] as the next one.
Now that you have the first two terms in your series, you can find the next. In general, if x is your first term and y is your second, your terms will be x + i*(y-x), with the first term at i = 0. The next term will be x + 2*(y-x). Search your array for that value. If that value is in your array, you have an arithmetic sequence of three items or more!
You can continue with i=3, i=4, etc. until you reach one that is not found in your array.
If l is the size of your array, do this for all i from 0 to l-2, and all n from 1 to l-i-1.
The only major caveat is that, in the example, this will find both sequences 4,6,8 as well as 6,8. Technically, both of them are arithmetic sequences in your series. You will have to more specifically define what you want there. In your case, it might be trivial to just check and eliminate all progressions that are totally contained inside others.
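The approach described above can be sketched as follows. This is an illustration under stated assumptions, not the answerer's code: the names ProgressionSearch and progressions are invented, the input is assumed to be sorted, duplicates are dropped so that every difference is non-zero, and a hash set replaces the array search. Like the description, it also reports progressions contained in longer ones.

```scala
object ProgressionSearch {
  // Returns every progression of 3 or more terms, identified by its
  // first two terms and extended as far as the array allows.
  def progressions(sorted: Array[Int]): List[List[Int]] = {
    // Long arithmetic avoids overflow when adding the difference.
    val values: Array[Long] = sorted.distinct.map(_.toLong)
    val present: Set[Long] = values.toSet
    val found = for {
      i <- values.indices
      j <- (i + 1) until values.length
      d = values(j) - values(i)
      // Generate x, x + d, x + 2d, ... while each term is in the array.
      terms = Iterator.iterate(values(i))(_ + d)
        .takeWhile(present.contains)
        .map(_.toInt)
        .toList
      if terms.length >= 3
    } yield terms
    found.toList
  }
}
```

For the question's example 1, 4, 5, 6, 8 this yields 4, 5, 6 and 4, 6, 8.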
The general idea is to pick an element as your a_1, then any element after that one as your a_2, compute the difference, and then see whether any later elements match that difference. As long as there are at least 3 elements with the same difference, we consider it a progression.
progression (A, n)
    for i = 1 ... n - 2
        a_1 = A[i]
        for j = i + 1 ... n - 1
            a_2 = A[j]
            d = a_2 - a_1
            S = [ i, j ]
            for k = j + 1 ... n
                if ( d == ( A[k] - A[S.last] ) )
                    /* Append the element index to the sequence so far. */
                    S += k
                    if ( |S| > 2 )
                        /* We define a progression to have at least 3 numbers. */
                        return true
    return false
You can modify the algorithm to store each set S before it is lost, to compute all the progressions for the given array A. The algorithm runs in O(n^3) assuming appending to and getting the last element of the set S are in constant time.
Although I feel like there might be a more efficient solution...
Certainly not the optimal way to solve your problem, but you can do the following:
Iterate through all pairs of numbers in your array: any two numbers fully define an arithmetic sequence if we assume that they are its 1st and 2nd members. Knowing those two numbers, you can construct further progression elements and check whether they are in your array.
If you just want to find 3 numbers forming an arithmetic progression, you can iterate through all pairs of non-adjacent numbers a[i] and a[j], j > i+1, and check whether their arithmetic mean belongs to the array; you can do that using binary search on the open interval ]i,j[.
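The arithmetic-mean check can be sketched like this, assuming the array is sorted in ascending order. The names ThreeTermCheck and hasThreeTermProgression are made up for the example; the range-limited binary search is the standard java.util.Arrays.binarySearch overload, whose upper index is exclusive.

```scala
import java.util.Arrays

object ThreeTermCheck {
  // True if some a(i), a(k), a(j) with i < k < j form an arithmetic progression.
  def hasThreeTermProgression(sorted: Array[Int]): Boolean = {
    val n = sorted.length
    (0 until n).exists { i =>
      ((i + 2) until n).exists { j =>
        // Use Long so the sum cannot overflow.
        val sum = sorted(i).toLong + sorted(j).toLong
        // The mean is an integer only when the sum is even;
        // search for it strictly between positions i and j.
        sum % 2 == 0 &&
          Arrays.binarySearch(sorted, i + 1, j, (sum / 2).toInt) >= 0
      }
    }
  }
}
```

This examines O(n^2) pairs with an O(log n) lookup each, and it stops at the first progression found.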
Here's the code in Swift 4:
extension Array where Element == Int {
var isArithmeticSequence: Bool {
let difference = self[1] - self[0]
for (index, _) in self.enumerated() {
if index < self.count-1 {
if self[index + 1] - self[index] != difference {
return false
}
}
}
return true
}
var arithmeticSlices: [[Int]] {
var arithmeticSlices = [[Int]]()
var sliceSize = 3
while sliceSize < self.count+1 {
for (index, _) in self.enumerated() {
if (index + sliceSize-1) <= self.count - 1 {
let currentSlice = Array(self[index...index + sliceSize-1])
if currentSlice.isArithmeticSequence {
arithmeticSlices.append(currentSlice)
}
}
}
sliceSize+=1
}
return arithmeticSlices
}
}
let A = [23, 24, 98, 1, 2, 5]
print(A.arithmeticSlices) // []
let B = [1, 2, 3, 4, 5]
print(B.arithmeticSlices) //[[1, 2, 3], [2, 3, 4], [3, 4, 5], [1, 2, 3, 4], [2, 3, 4, 5], [1, 2, 3, 4, 5]]
let C = [4, 7, 10, 23, 11, 12, 13]
print(C.arithmeticSlices) // [[4, 7, 10], [11, 12, 13]]

Resources