Folding an array of arrays

I'm trying to fold an array of arrays of strings into a single string, but I'm not having much luck. It seems Array.reduce expects my lambda to return an array of strings, because the input is an array of arrays of strings.
I'm getting:
Error on line 37: The type 'string[]' does not match the type 'string'
This is the offending line:
(fold state) + (fold item)
The compiler expects the lambda to return a string[].
Here is the code:
open System
open System.Globalization
open System.IO
open System.Text

let splitStr (separator: string[]) (str: string) = str.Split(separator, StringSplitOptions.None)

let convertFile fileName =
    let arrayToTransaction arr =
        let rec foldArray index (sb: StringBuilder) (arr: string[]) =
            if index > 5 then sb.ToString()
            else
                let text =
                    match index with
                    | 0 -> sb.Append(DateTime.Parse(arr.[1]).ToString("dd/MM/yy", CultureInfo.InvariantCulture))
                    | 1 -> sb.Append(arr.[0].Substring(0, arr.[0].IndexOf(',')).Trim())
                    | 2 -> sb.Append("Test")
                    | 3 -> sb.Append("Test")
                    | 4 -> sb.Append(Single.Parse(arr.[2].Substring(arr.[2].IndexOf('-') + 1)).ToString("F2", CultureInfo.InvariantCulture))
                    | _ -> sb.Append(String.Empty)
                foldArray (index + 1) (text.Append(",")) arr
        arr
        |> Array.map (splitStr [|"\n"|])
        |> Array.reduce (fun state item ->
            let fold x = foldArray 0 (new StringBuilder()) x
            (fold state) + (fold item))
    File.ReadAllText(fileName)
    |> splitStr [|"\r\n\r\n"|]
    |> arrayToTransaction

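The lambda you pass to Array.reduce must return a string[]. Array.reduce has the signature ('T -> 'T -> 'T) -> 'T[] -> 'T, so the lambda's two arguments and its result all share one type; because the elements are string[], 'T is unified with string[] and the result must be a string[] as well. To end up with a single string, either map each inner array to a string before reducing, or use Array.fold, whose accumulator type can differ from the element type.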
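A minimal sketch of both options, assuming a hypothetical recordToString that stands in for the question's foldArray 0 (new StringBuilder()). Mapping first turns the input into a string[], so reduce then works on strings; Array.fold keeps a string accumulator while walking over string[] elements. Note that Array.reduce throws on an empty array, whereas Array.fold simply returns its initial state.

```fsharp
// Hypothetical stand-in for the question's foldArray: turns one record (string[]) into one line.
let recordToString (lines: string[]) = String.concat "," lines

// Sample input shaped like the question's data: an array of string arrays.
let records : string[][] = [| [| "a"; "b" |]; [| "c"; "d" |] |]

// Option 1: map each string[] to a string first; reduce is then string -> string -> string.
let viaReduce =
    records
    |> Array.map recordToString
    |> Array.reduce (fun acc line -> acc + "\n" + line)

// Option 2: fold with a string accumulator directly over the string[] elements.
let viaFold =
    records
    |> Array.fold (fun acc item ->
        let line = recordToString item
        if acc = "" then line else acc + "\n" + line) ""
```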

Related

Applying a function on an F# array

type Sizes =
    | Big
    | Medium
    | Small
;;
// defines cup/can/bottle with size
type Containment =
    | CupDrink of s:Sizes
    | CannedDrink of s:Sizes
    | BottledDrink of s:Sizes
;;
// defines record for each type of drink
type Coffee = {DrinkName : string; price: double }
type Soda = {DrinkName : string; price: double }
type Brew = {DrinkName : string; price: double }
// union for type of drink
type liquid =
    | Coffee of c:Coffee
    | Cola of s:Soda
    | Beer of b:Brew
;;
let Guiness = Beer {DrinkName = "Guiness"; price = 0.15}
let CocaCola = Cola {DrinkName = "Cola"; price = 0.15}
let smallCup = CupDrink Small // it could be just containment | size
let bigBottle = BottledDrink Big
let findPricePrML (dr:liquid) =
    let price = 0.0
    match dr with
    | Beer(b=h) -> h.price
    | Cola(s=h) -> h.price
    | Coffee(c=h) -> h.price
    | _ -> failwith "not found"
// returns size in ml for each size available, assuming that small bottle, can and cup have the same size;
// if not, another kind of program can be made, but it's not part of the assignment
let find size =
    match size with
    | Big -> 250.00
    | Medium -> 125.00
    | Small -> 75.00
let grandTotal (dr:liquid, cont:Containment) = function
    (*let temp = {dra=dr;conta = cont} // supposed to search the menu list if such an item exists (can't figure out the syntax)
    if List.contains temp menuList then *)
    | CupDrink (s=z) -> findPricePrML dr * find z
    | BottledDrink (s=z) -> findPricePrML dr * find z
    | CannedDrink (s=z) -> findPricePrML dr * find z
    | _ -> failwith "not found"
    (* else failwith "no such item exists"*)
;;
let source = [|(CocaCola, bigBottle); (CocaCola, smallCup); (Tuborg, smallCup)|]
let Test =
    Async.Parallel [ async { return Array.map grandTotal source } ]
    |> Async.RunSynchronously
Hello, I'm trying to learn the basics of CPU-bound parallel programming in F#. Here I have a function that calculates drink prices. All I want is to apply another function (which multiplies the result by a certain number) to the results I get from the parallel calculation, but I keep getting type mismatch errors. In my solution the result I get is a jagged array, and I couldn't figure out how to get the results as a flat array.
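I think that the first issue with your code is that your grandTotal function takes two arguments: the (drink, container) tuple and then, because the body uses function, a second Containment value to match on: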
let grandTotal (dr:liquid, cont:Containment) = function
    | ...
This means that you have to call it with something like grandTotal (CocaCola, bigBottle) drinkKind. However, in your code that tries to call this, you use:
Array.map grandTotal source
This calls grandTotal with only a single argument (an item from the source array), so you get back a function rather than a price. You probably need something like:
Array.map (fun drink -> grandTotal drink kind) source
The second issue is that you are not really parallelising anything. The way you use async, you are just creating a single computation and then running that on a background thread. You could do something like:
let test =
    [ for a in source -> async { return grandTotal a kind } ]
    |> Async.Parallel
    |> Async.RunSynchronously
However, a more efficient and simpler approach is to use Array.Parallel.map:
Array.Parallel.map (fun drink -> grandTotal drink kind) source
To answer your question about calling another function: that is impossible to say without seeing a more complete code sample that we can run.
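A sketch of both points under simplifying assumptions: the Liquid record (a name plus a price per ml) is a hypothetical stand-in for the question's three record types, and grandTotal is rewritten to take the whole (drink, container) tuple and match on the container inside it, so it can be passed directly to Array.Parallel.map. The 25% markup is made up for illustration. The last binding shows where the jagged array came from: the question runs a list containing one async that returns a float[], so the result is float[][], which Array.concat flattens.

```fsharp
type Sizes = Big | Medium | Small

type Containment =
    | CupDrink of Sizes
    | CannedDrink of Sizes
    | BottledDrink of Sizes

// Hypothetical simplification of the question's liquid type.
type Liquid = { DrinkName: string; PricePerMl: float }

let sizeInMl size =
    match size with
    | Big -> 250.0
    | Medium -> 125.0
    | Small -> 75.0

// One tupled argument: the container comes from the tuple, not from an extra `function` argument.
let grandTotal (drink: Liquid, container: Containment) =
    match container with
    | CupDrink size | CannedDrink size | BottledDrink size -> drink.PricePerMl * sizeInMl size

let cola = { DrinkName = "Cola"; PricePerMl = 0.15 }
let source = [| (cola, BottledDrink Big); (cola, CupDrink Small) |]

// Parallel map over the items: a flat float[], one price per item.
let prices = Array.Parallel.map grandTotal source

// Applying another function to the results is then an ordinary map (hypothetical 25% markup).
let withMarkup = prices |> Array.map (fun p -> p * 1.25)

// The question's shape: one async returning a whole array gives float[][]; Array.concat flattens it.
let viaAsync =
    [ async { return Array.map grandTotal source } ]
    |> Async.Parallel
    |> Async.RunSynchronously
    |> Array.concat
```

Array.Parallel.map is usually the simpler choice for CPU-bound work over an in-memory array; the async version only runs in parallel when it is given one async per item.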

Create Tuple out of Array(Array[String]) of Varying Sizes using Scala

I am new to Scala and I am trying to make tuple pairs out of an RDD of type Array(Array[String]) that looks like:
(122abc,223cde,334vbn,445das),(221bca,321dsa),(231dsa,653asd,698poq,897qwa)
I am trying to create tuple pairs out of these arrays so that the first element of each array is the key and every other element of the array is a value. For example, the output would look like:
122abc 223cde
122abc 334vbn
122abc 445das
221bca 321dsa
231dsa 653asd
231dsa 698poq
231dsa 897qwa
I can't figure out how to separate the first element from each array and then map it to every other element.
If I'm reading it correctly, the core of your question is separating the head (first element) of the inner arrays from the tail (remaining elements), for which you can use the head and tail methods. RDDs behave a lot like Scala lists, so you can do all of this with what looks like pure Scala code.
Given the following input RDD:
val input: RDD[Array[Array[String]]] = sc.parallelize(
  Seq(
    Array(
      Array("122abc","223cde","334vbn","445das"),
      Array("221bca","321dsa"),
      Array("231dsa","653asd","698poq","897qwa")
    )
  )
)
The following should do what you want:
val output: RDD[(String,String)] =
  input.flatMap { arrArrStr: Array[Array[String]] =>
    arrArrStr.flatMap { arrStrs: Array[String] =>
      arrStrs.tail.map { value => arrStrs.head -> value }
    }
  }
And in fact, because of how the flatMap/map calls are composed, you could rewrite it as a for-comprehension:
val output: RDD[(String,String)] =
  for {
    arrArrStr: Array[Array[String]] <- input
    arrStr: Array[String] <- arrArrStr
    str: String <- arrStr.tail
  } yield (arrStr.head -> str)
Which one you go with is ultimately a matter of personal preference (though in this case, I prefer the latter, as you don't have to indent code as much).
For verification:
output.collect().foreach(println)
Should print out:
(122abc,223cde)
(122abc,334vbn)
(122abc,445das)
(221bca,321dsa)
(231dsa,653asd)
(231dsa,698poq)
(231dsa,897qwa)
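This is a classic fold operation; in Spark, folding into a result type that differs from the element type is done with aggregate, which takes one function to fold the elements of each partition and another to combine the partition results:
// Start with an empty array
data.aggregate(Array.empty[(String, String)])(
  // seqOp: `arr.drop(1).map(e => (arr.head, e))` creates tuples of the first
  // element with every other element in the row; append them to the accumulator.
  (acc, arr) => acc ++ arr.drop(1).map(e => (arr.head, e)),
  // combOp: merge the arrays built on different partitions.
  (acc1, acc2) => acc1 ++ acc2
)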
The same solution in a non-Spark environment:
scala> val data = Array(Array("122abc","223cde","334vbn","445das"),Array("221bca","321dsa"),Array("231dsa","653asd","698poq","897qwa"))
scala> data.foldLeft(Array.empty[(String, String)]) { case (acc, arr) =>
| acc ++ arr.drop(1).map(e => (arr.head, e))
| }
res0: Array[(String, String)] = Array((122abc,223cde), (122abc,334vbn), (122abc,445das), (221bca,321dsa), (231dsa,653asd), (231dsa,698poq), (231dsa,897qwa))
Convert your input elements to Seq, then write a wrapper that will give you List(List(item1,item2), List(item1,item2),...).
Try the code below:
val seqs = Seq("122abc","223cde","334vbn","445das")++
Seq("221bca","321dsa")++
Seq("231dsa","653asd","698poq","897qwa")
Write a wrapper that converts the sequence into pairs (zip with tail pairs each element with the one that follows it):
def toPairs[A](xs: Seq[A]): Seq[(A,A)] = xs.zip(xs.tail)
Now pass your sequence to it and it will give you the pairs:
toPairs(seqs).mkString(" ")
After converting it to a string you will get output like:
res8: String = (122abc,223cde) (223cde,334vbn) (334vbn,445das) (445das,221bca) (221bca,321dsa) (321dsa,231dsa) (231dsa,653asd) (653asd,698poq) (698poq,897qwa)
You can then convert the string however you want.
Using a DataFrame and explode:
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{explode, struct}
import spark.implicits._

val df = Seq(
  Array("122abc","223cde","334vbn","445das"),
  Array("221bca","321dsa"),
  Array("231dsa","653asd","698poq","897qwa")
).toDF("arr")
val df2 = df.withColumn("key", 'arr(0)).withColumn("values", explode('arr)).filter('key =!= 'values).drop('arr).withColumn("tuple", struct('key, 'values))
df2.show(false)
df2.rdd.map(x => Row((x(0), x(1)))).collect.foreach(println)
Output:
+------+------+---------------+
|key |values|tuple |
+------+------+---------------+
|122abc|223cde|[122abc,223cde]|
|122abc|334vbn|[122abc,334vbn]|
|122abc|445das|[122abc,445das]|
|221bca|321dsa|[221bca,321dsa]|
|231dsa|653asd|[231dsa,653asd]|
|231dsa|698poq|[231dsa,698poq]|
|231dsa|897qwa|[231dsa,897qwa]|
+------+------+---------------+
[(122abc,223cde)]
[(122abc,334vbn)]
[(122abc,445das)]
[(221bca,321dsa)]
[(231dsa,653asd)]
[(231dsa,698poq)]
[(231dsa,897qwa)]
Update 1:
Using a pair RDD:
val df = Seq(
Array("122abc","223cde","334vbn","445das"),
Array("221bca","321dsa"),
Array("231dsa","653asd","698poq","897qwa")
).toDF("arr")
import scala.collection.mutable
import org.apache.spark.rdd.PairRDDFunctions
val rdd1 = df.rdd.map( x => { val y = x.getAs[mutable.WrappedArray[String]]("arr")(0); (y,x)} )
val pair = new PairRDDFunctions(rdd1)
pair.flatMapValues( x => x.getAs[mutable.WrappedArray[String]]("arr") )
.filter( x=> x._1 != x._2)
.collect.foreach(println)
Results:
(122abc,223cde)
(122abc,334vbn)
(122abc,445das)
(221bca,321dsa)
(231dsa,653asd)
(231dsa,698poq)
(231dsa,897qwa)

F# concatenate int array option to string

I have a data contract (WCF) with a field defined as:
[<DataContract(Namespace = _Namespace.ws)>]
type CommitRequest =
    {
        // Excluded for brevity
        ...
        [<field: DataMember(Name="ExcludeList", IsRequired=false) >]
        ExcludeList : int array option
    }
From the entries in ExcludeList I want to create a comma-separated string (to reduce the number of network hops to the database when updating the status). I have tried the following two approaches, but neither creates the desired string; both are empty:
// Logic to determine if we need to execute this block works correctly
try
    // Use F# concat
    let strList = request.ExcludeList.Value |> Array.map string
    let idString = String.concat ",", strList
    // Next try using .NET Join
    let idList = String.Join ((",", (request.ExcludeList.Value.Select (fun f -> f)).Distinct).ToString ())
with | ex ->
    ...
Both compile and execute but neither give me anything in the string. Would greatly appreciate someone pointing out what I am doing wrong here.
let intoArray : int array option = Some [| 1; 23; 16 |]
let strList = intoArray.Value |> Array.map string
// String.concat takes two curried arguments, so there is no comma between them;
// `String.concat ",", strList` builds a tuple instead of calling the function
let idString = String.concat "," strList
// Next try using .NET Join
let idList = System.String.Join (",", strList) // that also works
Output:
>
val intoArray : int array option = Some [|1; 23; 16|]
val strList : string [] = [|"1"; "23"; "16"|]
val idString : string = "1,23,16"
val idList : string = "1,23,16"
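Because .Value throws when ExcludeList is None, a slightly safer sketch maps over the option instead. toIdString is a hypothetical helper name; Array.distinct stands in for the Distinct call from the question's second attempt, and Option.defaultValue needs F# 4.1 or later.

```fsharp
// Hypothetical helper: None yields an empty string instead of throwing.
let toIdString (ids: int array option) =
    ids
    |> Option.map (Array.distinct >> Array.map string >> String.concat ",")
    |> Option.defaultValue ""
```

For example, toIdString (Some [| 1; 23; 16 |]) gives "1,23,16", and toIdString None gives "".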

F#: Writing a function that takes any kind of array as input

I am new to programming and F# is my first language.
Here is part of my code:
open System
open System.Collections.Generic

let splitArrayIntoGroups (inputArray: string[]) (groupSize: int) =
    let groups = new LinkedList<string[]>()
    let rec splitRecursively currentStartIndex currentEndIndex =
        groups.AddLast(inputArray.[currentStartIndex..currentEndIndex]) |> ignore
        let newEndIndex = Math.Min((inputArray.Length - 1), (currentEndIndex + groupSize))
        if newEndIndex <> currentEndIndex then
            splitRecursively (currentStartIndex + groupSize) newEndIndex
    splitRecursively 0 (groupSize - 1)
    groups
I want this function to be able to accept arrays of any type (including types that I define myself) as input. What changes should I make?
This has already been answered, but here is an implementation that uses an array of lists instead of a linked list:
let rec split<'T> (input: 'T array) size =
    let rec loopOn (tail : 'T array) grouped =
        let lastIndex = Array.length tail - 1
        let endindx = min (size - 1) lastIndex
        let arrWrapper = (fun e -> [|e|])
        let newGroup =
            tail.[0..endindx]
            |> List.ofArray
            |> arrWrapper
            |> Array.append grouped
        match tail with
        | [||] -> newGroup |> Array.filter (fun e -> List.length e > 0)
        | _ -> loopOn tail.[endindx + 1..] newGroup
    let initialState = [|List.empty<'T>|]
    loopOn input initialState
Because this is a generic implementation, you can call it with different types:
type Custom = {Value : int}
let r = split<int> [|1..1000|] 10
let r2 = split<float> [|1.0..1000.0|] 10
let r3 = split<Custom> [| for i in 1..1000 -> {Value = i} |] 10
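Replace string[] with _[] in the function signature, and LinkedList<string[]> with LinkedList<_[]> as well, since that annotation also fixes the element type to string.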
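For reference, a sketch of the question's function with string[] replaced by 'T[] in both places it appears (the parameter and the LinkedList element type). Custom is an illustrative user-defined record. Array.chunkBySize from FSharp.Core (F# 4.0 and later) produces the same grouping as a 'T[][] without hand-written recursion.

```fsharp
open System
open System.Collections.Generic

// The question's algorithm, made generic: 'T[] instead of string[] in both places.
let splitArrayIntoGroups (inputArray: 'T[]) (groupSize: int) =
    let groups = LinkedList<'T[]>()
    let rec splitRecursively currentStartIndex currentEndIndex =
        groups.AddLast(inputArray.[currentStartIndex..currentEndIndex]) |> ignore
        let newEndIndex = Math.Min(inputArray.Length - 1, currentEndIndex + groupSize)
        if newEndIndex <> currentEndIndex then
            splitRecursively (currentStartIndex + groupSize) newEndIndex
    splitRecursively 0 (groupSize - 1)
    groups

// A user-defined type works too.
type Custom = { Value : int }

let intGroups = splitArrayIntoGroups [| 1 .. 7 |] 3
let customGroups = splitArrayIntoGroups [| for i in 1 .. 7 -> { Value = i } |] 3

// Library alternative: Array.chunkBySize returns the groups as an array of arrays.
let chunked = Array.chunkBySize 3 [| 1 .. 7 |]
```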

F# Array instantiates with 5 items but not with 6

I am new to F#, so I am probably missing something trivial but here goes.
This works:
let monthsWith31Days = [| MonthType.January;
                          MonthType.March;
                          MonthType.May;
                          MonthType.July;
                          MonthType.December |]
But this doesn't:
let monthsWith31Days = [| MonthType.January;
                          MonthType.March;
                          MonthType.May;
                          MonthType.July;
                          MonthType.August;
                          MonthType.December |]
I have noticed that it's not the content itself but the number of items that matters (even if I change the actual items used). The problem starts when the number of items exceeds 5.
This is the error I get when I run my NUnit tests:
System.ArgumentException: Value does not fall within expected range.
Any ideas what I'm missing?
Edit:
Entire type definition (the two types are related, so I'm showing both here):
type public Month(monthType:MonthType, year:Year) =
    member public this.Year
        with get () = year
    member public this.MonthType
        with get () = monthType
    member public this.GetDaysCount () =
        let monthsWith31Days = [| MonthType.January;
                                  MonthType.March;
                                  MonthType.May;
                                  MonthType.July;
                                  MonthType.August;
                                  MonthType.December |]
        let has31 = monthsWith31Days |> Array.filter(fun n -> (int)n = (int)this.monthType) |> Array.length
        if (has31 > 0)
        then 31
        // else if (this.MonthType = MonthType.February)
        // then (if this.Year.Leap then 29
        //       else 28)
        else 30
and public Year(ad:int) =
    member public this.AD
        with get() = ad
    member public this.Months =
        Enum.GetValues(typeof<MonthType>).Cast().ToArray()
        |> Array.map(fun n -> new Month (n, this))
    member public this.GetMonth (index:int) =
        (this.Months |> Array.filter(fun p-> (int)p.MonthType = index)).First()
    member public this.GetMonth (monthName:string) =
        let requiredMonthType = Enum.Parse(typeof<MonthType>, monthName) |> unbox<MonthType>
        (this.Months |> Array.filter(fun p-> p.MonthType = requiredMonthType)).First()
    member public this.Leap =
        if this.AD % 400 = 0 then true
        else if this.AD % 100 = 0 then false
        else if this.AD % 4 = 0 then true
        else false
    member this.DaysCount = if this.Leap then 366 else 365
I vaguely recall a bug with creating array literals full of enums on some target CLR platform, where if you had more than 5 elements, bad code was generated. Maybe you're hitting that? Are you targeting x64 and CLR2? You can work around the bug by avoiding array literals, e.g. by using a list and then calling List.toArray.
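A sketch of that workaround. The MonthType enum is a hypothetical reconstruction, since the question does not show its definition. The collection is built from a list literal and converted with List.toArray, avoiding the enum array literal, and Array.contains replaces the filter-and-count check. The list also includes October, which has 31 days but is missing from the question's array.

```fsharp
// Hypothetical reconstruction; the question's MonthType definition is not shown.
type MonthType =
    | January = 1
    | February = 2
    | March = 3
    | April = 4
    | May = 5
    | June = 6
    | July = 7
    | August = 8
    | September = 9
    | October = 10
    | November = 11
    | December = 12

// Built from a list literal, then converted, instead of a large enum array literal.
let monthsWith31Days =
    [ MonthType.January; MonthType.March; MonthType.May; MonthType.July
      MonthType.August; MonthType.October; MonthType.December ]
    |> List.toArray

// Leap years are ignored in this sketch.
let daysIn (m: MonthType) =
    if Array.contains m monthsWith31Days then 31
    elif m = MonthType.February then 28
    else 30
```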
