F# Arrays - Count Yes And No - arrays

I have three arrays - first is a float array, second is a string array, and third is a float array comprising the sorted, unique values from the first array.
module SOQN =
    open System
    type collective = { score:double; yes:int; no:int; correct:double }
    let first = [| 25; 20; 23; 10; 8; 5; 4; 12; 19; 15; 15; 12; 11; 11 |]
    let second = [| "No"; "No"; "Yes"; "Yes"; "Yes"; "No"; "Yes"; "No"; "Yes"; "Yes"; "Yes"; "No"; "Yes"; "No" |]
    let third = Array.distinct (first |> Array.sort)
    let fourth = Seq.zip first second
    let fifth = fourth |> Seq.sortBy fst
    let yesCounts =
        fifth
        |> Seq.filter (fun (_, y) -> if y = "Yes" then true else false)
        |> Seq.map fst
    let noCounts =
        fifth
        |> Seq.filter (fun (_, y) -> if y = "No" then true else false)
        |> Seq.map fst
    (*
    Expected Result:
    third = [| 4; 5; 8; 10; 11; 12; 15; 19; 20; 23; 25 |]
    yesCounts = [| 1; 1; 2; 3; 4; 4; 6; 7; 7; 8; 8 |]
    noCounts = [| 0; 1; 1; 1; 2; 4; 4; 4; 5; 5; 6 |]
    yesProportions = [| 1/1; 1/2; 2/3; 3/4; 4/6; 4/8; 6/10; 7/11; 7/12; 8/13; 8/14 |]
    *)
I need a new collection generated from iterating through the third array and including the yes and no counts "<=" each of its values. Finally, I need to iterate through this new collection to create a new column comprising the yes proportions at each value and printing each unique value and its matching yes proportion.
Please advise?

First, use some more meaningful names:
let nums = [| 25; 20; 23; 10; 8; 5; 4; 12; 19; 15; 15; 12; 11; 11 |]
let yesNos = // convert string -> bool to simplify following code
    [| "No"; "No"; "Yes"; "Yes"; "Yes"; "No"; "Yes"; "No"; "Yes"; "Yes"; "Yes"; "No"; "Yes"; "No" |]
    |> Array.map (fun s -> s = "Yes")
let distinctNums = nums |> Array.distinct |> Array.sort
let numYesNos = Array.zip nums yesNos
Then if you really want separate collections for each yes/no/ratio calculation, you can build those using a fold:
let foldYesNos num (yesCounts, noCounts, yesRatios) =
    // filter yes/no array by <= comparison
    // then partition it into two arrays by the second bool item
    let (yays, nays) = numYesNos |> Array.filter (fun (n,_) -> n <= num) |> Array.partition snd
    let yesCount = Array.length yays
    let noCount = Array.length nays
    let yesRatio = float yesCount / float(yesCount + noCount)
    (yesCount::yesCounts, noCount::noCounts, yesRatio::yesRatios)
// fold *back* over the distinct numbers
// to make the list accumulation easier/not require a reversal
let (yays, nays, ratio) = Seq.foldBack foldYesNos (distinctNums |> Seq.sort) ([], [], [])
However, I assume since you posted a Collective record type in the sample that you might actually want to emit one of these records for each input:
type Collective = { score:int; yes:int; no:int; correct:float }
let scoreNum num =
    let (yays, nays) = numYesNos |> Array.filter (fun (n,_) -> n <= num) |> Array.partition snd
    let yesCount = Array.length yays
    let noCount = Array.length nays
    let yesRatio = float yesCount / float(yesCount + noCount)
    { score = num; yes = yesCount; no = noCount; correct = yesRatio }
distinctNums |> Array.map scoreNum
This code is very similar. It returns a Collective record for each input rather than building lists for the individual calculations, so a map can be used instead of a fold.
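The question also asks to print each unique value with its yes proportion. A minimal sketch of one way to do that is below; it is a variant, not the answer's method. It assumes the same input data and computes the running totals in a single pass with Array.groupBy and Array.scan instead of re-filtering for every value. The name cumulative is made up for this illustration.

```fsharp
let nums = [| 25; 20; 23; 10; 8; 5; 4; 12; 19; 15; 15; 12; 11; 11 |]
let yesNos =
    [| "No"; "No"; "Yes"; "Yes"; "Yes"; "No"; "Yes"; "No"; "Yes"; "Yes"; "Yes"; "No"; "Yes"; "No" |]
    |> Array.map (fun s -> s = "Yes")

// Hypothetical name: running (value, yes, no) totals, one entry per distinct value
let cumulative =
    Array.zip nums yesNos
    |> Array.groupBy fst                 // one group per distinct value
    |> Array.sortBy fst                  // ascending by value
    |> Array.map (fun (n, group) ->
        let yes = group |> Array.filter snd |> Array.length
        n, yes, group.Length - yes)      // per-value yes/no counts
    |> Array.scan (fun (_, y, no) (n, dy, dn) -> (n, y + dy, no + dn)) (0, 0, 0)
    |> Array.tail                        // drop the (0, 0, 0) seed

// Print each unique value with its cumulative yes proportion
for (n, yes, no) in cumulative do
    printfn "%d: %d/%d = %.3f" n yes (yes + no) (float yes / float (yes + no))
```

The first line printed is "4: 1/1 = 1.000" and the last is "25: 8/14 = 0.571", matching the expected yesProportions in the question.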

Related

Implementing Array.make in OCaml

I am trying to implement Array.make of the OCaml Array module. However I'm not getting the values right, my implementation:
let make_diy s v =
  let rec aux s v =
    if s = 0 then [| |]
    else Array.append v ( aux (s-1) v )
  in aux s v;;
...has the type: val make_diy : int -> 'a array -> 'a array = <fun> instead of: int -> 'a -> 'a array. It only creates an array if v is already in an array:
make_diy 10 [|3|];;
- : int array = [|3; 3; 3; 3; 3; 3; 3; 3; 3; 3|]
make_diy 10 3;;
Error: This expression has type int but an expression was expected of type
'a array
I also tried to make it with an accumulator, but it has the same result:
let make_diy s v =
  let rec aux s v acc =
    if s=0 then acc
    else append v ( aux (s-1) v acc )
  in aux s v [| |];;
EDIT: (typo) append instead of add
The problem seems to be that Array.append expects two arrays. If you want to use it with just an element, you need to wrap the first argument into an array:
Array.append [| v |] (aux (s - 1) v)
Note that this isn't a particularly efficient way to build up an array. Each iteration will allocate a new, slightly larger array. But presumably this is just a learning exercise.

How to match sub-string in arrays of strings in F#

I am trying to learn Machine learning. I am new to F#.
For a given dataset, let's say I have two string arrays.
let labels = [|"cat"; "dog"; "horse"|]
let scan_data = [|"cat\1.jpg"; "cat\2.jpg"; "dog\1.jpg"; "dog\2.jpg"; "dog\3.jpg"; "horse\1.jpg"; "horse\2.jpg"; "horse\3.jpg"; "horse\4.jpg"; "horse\5.jpg"|]
As you may have guessed, there are 3 labels (a kind of folder), each containing training image data (10 images in total). Using the two arrays above, I want to create an array like this:
let data_labels = [| //val data_labels : int [] []
    [|1; 0; 0|]; //since 0th scan_data item represent "cat"
    [|1; 0; 0|];
    [|0; 1; 0|]; //since 2nd scan_data item represent "dog"
    [|0; 1; 0|];
    [|0; 1; 0|];
    [|0; 0; 1|]; //since 5th scan_data item represent "horse"
    [|0; 0; 1|];
    [|0; 0; 1|];
    [|0; 0; 1|];
    [|0; 0; 1|];
|]
So whenever a sub-string from "labels" is found in a "scan_data" item, that item should produce an array with "1" at the matching label and "0" everywhere else.
Any thoughts on how to achieve this in F#?
let helper (str1:string) str2 = if str1.Contains(str2) then 1 else 0
let t = scan_data |> Array.map (fun item -> labels |> Array.map (helper item) )
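One caveat with Contains is that it matches the label anywhere in the path, so a file name that happens to contain another label would produce a second 1. A hedged variant is sketched below. It assumes every path begins with the label followed by a backslash, as in the question's data; the names scanData, oneHot and dataLabels are made up for this illustration, and only three of the ten paths are shown.

```fsharp
let labels = [| "cat"; "dog"; "horse" |]
// Verbatim strings (@"...") keep the backslash as a literal character
let scanData = [| @"cat\1.jpg"; @"dog\2.jpg"; @"horse\5.jpg" |]

// Hypothetical helper: 1 where the path starts with "<label>\", otherwise 0
let oneHot (path: string) =
    labels
    |> Array.map (fun label ->
        if path.StartsWith(label + @"\", System.StringComparison.Ordinal) then 1 else 0)

let dataLabels = scanData |> Array.map oneHot
printfn "%A" dataLabels
```

For these three paths dataLabels holds [|1; 0; 0|], [|0; 1; 0|] and [|0; 0; 1|], in that order.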

Sort Array and a Corresponding Array [duplicate]

This question already has answers here:
How can I sort multiple arrays based on the sorted order of another array
(2 answers)
Closed 6 years ago.
Say I have an array [4, 1, 8, 5] and another array that corresponds to each object in the first array, say ["Four", "One", "Eight", "Five"]. How can I sort the first array in ascending order while also moving the corresponding object in the second array to the same index in Swift?
Doesn't sound like best practice but this will solve your problem:
var numbers = [4, 7, 8, 3]
var numbersString = ["Four", "Seven", "Eight", "Three"]
// Bubble sort that mirrors every swap into a second, parallel array.
// compare(a, b) returns true when a must move after b.
func bubbleSort<T, Y>(_ numbers: inout [T], _ mirrorArray: inout [Y], _ compare: (T, T) -> Bool) {
    let numbersLength = numbers.count
    for i in 0 ..< numbersLength {
        for j in 1 ..< numbersLength - i {
            if compare(numbers[j - 1], numbers[j]) {
                numbers.swapAt(j - 1, j)
                mirrorArray.swapAt(j - 1, j)
            }
        }
    }
}
// Swap whenever the left element is larger, which yields ascending order
bubbleSort(&numbers, &numbersString) { a, b in a > b }
print(numbers, numbersString)
This is generic, so it works with any type and lets you supply the comparison.
Using quick sort:
func quicksort_swift(_ a: inout [Int], b: inout [String], start: Int, end: Int) {
    if end - start < 2 {
        return
    }
    let p = a[start + (end - start) / 2] // pivot value
    var l = start
    var r = end - 1
    while l <= r {
        if a[l] < p {
            l += 1
            continue
        }
        if a[r] > p {
            r -= 1
            continue
        }
        // Swap the same positions in both arrays to keep them aligned
        a.swapAt(l, r)
        b.swapAt(l, r)
        l += 1
        r -= 1
    }
    quicksort_swift(&a, b: &b, start: start, end: r + 1)
    quicksort_swift(&a, b: &b, start: r + 1, end: end)
}
That said, the dictionary solution offered by @NSNoob should be faster and more elegant.

Why is iterating through an array faster than Seq.find

I have an array sums that gives all the possible sums of a function f. This function accepts integers (say between 1 and 200, but same applies for say 1 and 10000) and converts them to double. I want to store sums as an array as I still haven't figured out how to do the algorithm I need without a loop.
Here's the code for how I generate sums:
let f n k = exp (double(k)/double(n)) - 1.0
let n = 200
let maxLimit = int(Math.Round(float(n)*1.5))
let FunctionValues = [|1..maxLimit|] |> Array.map (fun k -> f n k)
let sums = FunctionValues |> Array.map (fun i -> Array.map (fun j -> j + i) FunctionValues) |> Array.concat |> Array.sort
For certain elements of the array sums, I want to find the integers that, when passed to f and added together, equal that element. I could store the integers alongside sums, but I found that this exhausts my memory.
Now I have two algorithms. Algorithm 1 uses a simple loop and mutable ints to store the values I care about. It shouldn't be very efficient, since there is no break statement once it has found the integers. I tried implementing Algorithm 2 in a more functional style, but I found it slower (~10% slower, or 4200ms vs 4600ms with n = 10000), despite Seq being lazy. Why is this?
Algorithm 1:
let mutable a = 0
let mutable b = 0
let mutable c = 0
let mutable d = 0
for i in 1..maxLimit do
    for j in i..maxLimit do
        if sums.[bestI] = f n i + f n j then
            a <- i
            b <- j
        if sums.[bestMid] = f n i + f n j then
            c <- i
            d <- j
Algorithm 2:
let findNM x =
    let seq = {1..maxLimit} |> Seq.map (fun k -> (f n k, k))
    let get2nd3rd (a, b, c) = (b, c)
    seq |> Seq.map (fun (i, n) -> Seq.map (fun (j, m) -> (j + i, n, m) ) seq)
        |> Seq.concat |> Seq.find (fun (i, n, m) -> i = x)
        |> get2nd3rd
let digitsBestI = findNM sums.[bestI]
let digitsBestMid = findNM sums.[bestMid]
let a = fst digitsBestI
let b = snd digitsBestI
let c = fst digitsBestMid
let d = snd digitsBestMid
Edit: Note that the array sums is length maxLimit*maxLimit not length n. bestI and bestMid are then indices between 0 and maxLimit*maxLimit. For the purposes of this question they can be any number in that range. Their specific values are not particularly relevant.
I extended the OP's code a bit in order to profile it:
open System
let f n k = exp (double(k)/double(n)) - 1.0
let outer = 200
let n = 200
let maxLimit= int(Math.Round(float(n)*1.5))
let FunctionValues = [|1..maxLimit|] |> Array.map (fun k -> f n k)
let random = System.Random 19740531
let sums = FunctionValues |> Array.map (fun i -> Array.map (fun j -> j + i) FunctionValues) |> Array.concat |> Array.sort
let bests =
    [| for i in [1..outer] -> (random.Next (n, maxLimit*maxLimit), random.Next (n, maxLimit*maxLimit))|]
let stopWatch =
    let sw = System.Diagnostics.Stopwatch ()
    sw.Start ()
    sw
let timeIt (name : string) (a : int*int -> 'T) : unit =
    let t = stopWatch.ElapsedMilliseconds
    let v = a (bests.[0])
    for i = 1 to (outer - 1) do
        a bests.[i] |> ignore
    let d = stopWatch.ElapsedMilliseconds - t
    printfn "%s, elapsed %d ms, result %A" name d v
let algo1 (bestI, bestMid) =
    let mutable a = 0
    let mutable b = 0
    let mutable c = 0
    let mutable d = 0
    for i in 1..maxLimit do
        for j in i..maxLimit do
            if sums.[bestI] = f n i + f n j then
                a <- i
                b <- j
            if sums.[bestMid] = f n i + f n j then
                c <- i
                d <- j
    a,b,c,d
let algo2 (bestI, bestMid) =
    let findNM x =
        let seq = {1..maxLimit} |> Seq.map (fun k -> (f n k, k))
        let get2nd3rd (a, b, c) = (b, c)
        seq |> Seq.map (fun (i, n) -> Seq.map (fun (j, m) -> (j + i, n, m) ) seq)
            |> Seq.concat |> Seq.find (fun (i, n, m) -> i = x)
            |> get2nd3rd
    let digitsBestI = findNM sums.[bestI]
    let digitsBestMid = findNM sums.[bestMid]
    let a = fst digitsBestI
    let b = snd digitsBestI
    let c = fst digitsBestMid
    let d = snd digitsBestMid
    a,b,c,d
let algo3 (bestI, bestMid) =
    let rec find best i j =
        if best = f n i + f n j then i, j
        elif i = maxLimit && j = maxLimit then 0, 0
        elif j = maxLimit then find best (i + 1) 1
        else find best i (j + 1)
    let a, b = find sums.[bestI] 1 1
    let c, d = find sums.[bestMid] 1 1
    a, b, c, d
let algo4 (bestI, bestMid) =
    let rec findI bestI mid i j =
        if bestI = f n i + f n j then
            let x, y = mid
            i, j, x, y
        elif i = maxLimit && j = maxLimit then 0, 0, 0, 0
        elif j = maxLimit then findI bestI mid (i + 1) 1
        else findI bestI mid i (j + 1)
    let rec findMid ii bestMid i j =
        if bestMid = f n i + f n j then
            let x, y = ii
            x, y, i, j
        elif i = maxLimit && j = maxLimit then 0, 0, 0, 0
        elif j = maxLimit then findMid ii bestMid (i + 1) 1
        else findMid ii bestMid i (j + 1)
    let rec find bestI bestMid i j =
        if bestI = f n i + f n j then findMid (i, j) bestMid i j
        elif bestMid = f n i + f n j then findI bestI (i, j) i j
        elif i = maxLimit && j = maxLimit then 0, 0, 0, 0
        elif j = maxLimit then find bestI bestMid (i + 1) 1
        else find bestI bestMid i (j + 1)
    find sums.[bestI] sums.[bestMid] 1 1
[<EntryPoint>]
let main argv =
    timeIt "algo1" algo1
    timeIt "algo2" algo2
    timeIt "algo3" algo3
    timeIt "algo4" algo4
    0
The test results on my machine:
algo1, elapsed 438 ms, result (162, 268, 13, 135)
algo2, elapsed 1012 ms, result (162, 268, 13, 135)
algo3, elapsed 348 ms, result (162, 268, 13, 135)
algo4, elapsed 322 ms, result (162, 268, 13, 135)
algo1 uses the naive for loop implementation. algo2 uses a more refined algorithm relying on Seq.find. I describe algo3 and algo4 later.
The OP wondered why the naive algo1 performed better even though it does more work than algo2, which is based around lazy Seq (essentially an IEnumerable<>).
The answer is that the Seq abstraction introduces overhead and prevents useful optimizations from occurring.
I usually resort to looking at the generated IL code in order to understand what's going on (there are many good decompilers for .NET, such as ILSpy).
Let's look at algo1 (decompiled to C#):
// Program
public static Tuple<int, int, int, int> algo1(int bestI, int bestMid)
{
    int a = 0;
    int b = 0;
    int c = 0;
    int d = 0;
    int i = 1;
    int maxLimit = Program.maxLimit;
    if (maxLimit >= i)
    {
        do
        {
            int j = i;
            int maxLimit2 = Program.maxLimit;
            if (maxLimit2 >= j)
            {
                do
                {
                    if (Program.sums[bestI] == Math.Exp((double)i / (double)200) - 1.0 + (Math.Exp((double)j / (double)200) - 1.0))
                    {
                        a = i;
                        b = j;
                    }
                    if (Program.sums[bestMid] == Math.Exp((double)i / (double)200) - 1.0 + (Math.Exp((double)j / (double)200) - 1.0))
                    {
                        c = i;
                        d = j;
                    }
                    j++;
                }
                while (j != maxLimit2 + 1);
            }
            i++;
        }
        while (i != maxLimit + 1);
    }
    return new Tuple<int, int, int, int>(a, b, c, d);
}
algo1 is then expanded to an efficient while loop. In addition f is inlined. The JITter is easily able to create efficient machine code from this.
For algo2, unpacking the full structure is too much for this post, so I focus on findNM:
internal static Tuple<int, int> findNM@48(double x)
{
    IEnumerable<Tuple<double, int>> seq = SeqModule.Map<int, Tuple<double, int>>(new Program.seq@49(), Operators.OperatorIntrinsics.RangeInt32(1, 1, Program.maxLimit));
    FSharpTypeFunc get2nd3rd = new Program.get2nd3rd@50-1();
    Tuple<double, int, int> tupledArg = SeqModule.Find<Tuple<double, int, int>>(new Program.findNM@52-1(x), SeqModule.Concat<IEnumerable<Tuple<double, int, int>>, Tuple<double, int, int>>(SeqModule.Map<Tuple<double, int>, IEnumerable<Tuple<double, int, int>>>(new Program.findNM@51-2(seq), seq)));
    FSharpFunc<Tuple<double, int, int>, Tuple<int, int>> fSharpFunc = (FSharpFunc<Tuple<double, int, int>, Tuple<int, int>>)((FSharpTypeFunc)((FSharpTypeFunc)get2nd3rd.Specialize<double>()).Specialize<int>()).Specialize<int>();
    return Program.get2nd3rd@50<double, int, int>(tupledArg);
}
We see that it requires creating multiple objects implementing IEnumerable<> as well as function objects that are passed to higher-order functions like Seq.find. While it is in principle possible for the JITter to inline the loop, it most likely won't because of time and memory constraints. This means each call to the function object is a virtual call, and virtual calls are quite expensive (tip: check the machine code). Because the virtual call might do anything, it in turn prevents optimizations such as using SIMD instructions.
The OP noted that F# loop expressions lack break/continue constructs, which are useful when writing efficient for loops. F# does, however, support them implicitly: if you write a tail-recursive function, F# unwinds it into an efficient loop that uses break/continue to exit early.
algo3 is an example of implementing algo2 using tail recursion. The disassembled code is something like this:
internal static Tuple<int, int> find@66(double best, int i, int j)
{
    while (best != Math.Exp((double)i / (double)200) - 1.0 + (Math.Exp((double)j / (double)200) - 1.0))
    {
        if (i == Program.maxLimit && j == Program.maxLimit)
        {
            return new Tuple<int, int>(0, 0);
        }
        if (j == Program.maxLimit)
        {
            double arg_6F_0 = best;
            int arg_6D_0 = i + 1;
            j = 1;
            i = arg_6D_0;
            best = arg_6F_0;
        }
        else
        {
            double arg_7F_0 = best;
            int arg_7D_0 = i;
            j++;
            i = arg_7D_0;
            best = arg_7F_0;
        }
    }
    return new Tuple<int, int>(i, j);
}
This enables us to write idiomatic functional code and yet get very good performance while avoiding stack overflows.
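To illustrate the pattern in isolation, here is a minimal sketch of an early-exit search written as a tail-recursive function. It is not taken from the benchmark above; the helper name findIndexOrMinusOne is made up, and returning -1 for "not found" is only an assumption chosen to keep the example short.

```fsharp
// Hypothetical helper: index of the first element satisfying pred, or -1 if none does
let findIndexOrMinusOne (pred: 'T -> bool) (xs: 'T[]) =
    let rec loop i =
        if i >= xs.Length then -1     // ran off the end: nothing matched
        elif pred xs.[i] then i       // early exit, the equivalent of `break`
        else loop (i + 1)             // call in tail position, so no stack growth
    loop 0

printfn "%d" (findIndexOrMinusOne (fun x -> x > 3) [| 1; 2; 5; 7 |])   // prints 2
printfn "%d" (findIndexOrMinusOne (fun x -> x > 9) [| 1; 2; 5; 7 |])   // prints -1
```

Because the recursive call is the last thing loop does, the compiler can turn it into a plain loop, in the same way as the decompiled find above.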
Before I realized how well tail recursion is implemented in F#, I tried to write efficient while loops with mutable logic in the while test expression. For the sake of humanity, that code has been erased from existence.
algo4 is an optimized version in that it only iterates over sums once for both bestMid and bestI, much like algo1, but algo4 exits early if it can.
Hope this helps

Elementwise multiplication of arrays in F#

Is there a simple way to multiply the items of an array in F#?
So for example, if I want to calculate a population mean from samples, I would multiply the observed values by their frequencies and then divide by the number of samples.
let array_1 = [|1;32;9;5;6|];;
let denominator = Array.sum(array_1);;
denominator;;
let array_2 = [|1;2;3;4;5|];;
let productArray = [| for x in array_1 do
                        for y in array_2 do
                          yield x*y |];;
productArray;;
let numerator = Array.sum(productArray);;
numerator/denominator;;
Unfortunately this is yielding a product array like this:-
val it : int [] =
[|1; 2; 3; 4; 5; 32; 64; 96; 128; 160; 9; 18; 27; 36; 45; 5; 10; 15; 20; 25;
6; 12; 18; 24; 30|]
Which is the product of everything with everything, whereas I am after the dot product (x.[i]*y.[i] for each i).
Unfortunately adding an i variable and an index to the for loops does not seem to work.
What is the best solution to use here?
Array.zip array_1 array_2
|> Array.map (fun (x,y) -> x * y)
As the comment points out, you can also use Array.map2:
Array.map2 (*) array_1 array_2
Like this:
Array.map2 (*) xs ys
Something like
[| for i in 0 .. array_1.Length - 1 ->
    array_1.[i] * array_2.[i] |]
should work.
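Putting this together for the mean in the question, a minimal sketch follows. It assumes array_1 holds the frequencies and array_2 the observed values (the names frequencies, values, products and mean are made up for this illustration). It also converts to float before dividing, since numerator/denominator on ints truncates the result.

```fsharp
let frequencies = [| 1; 32; 9; 5; 6 |]   // array_1 in the question
let values      = [| 1; 2; 3; 4; 5 |]    // array_2 in the question

// Elementwise product; Array.map2 raises ArgumentException if the lengths differ
let products = Array.map2 ( * ) frequencies values

// Convert to float before dividing so the mean is not truncated to an int
let mean = float (Array.sum products) / float (Array.sum frequencies)

printfn "%A" products   // [|1; 64; 27; 20; 30|]
printfn "%.3f" mean     // 2.679
```

With integer division the same data would give 142 / 53 = 2.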
