F# memoization efficiency - near 1 million elements - arrays

I'm working on an F# solution to this problem, where I need to find the generator element below 1,000,000 with the longest generated sequence.
I use a tail-recursive function that memoizes the previous results to speed up the calculation. This is my current implementation:
open System.Collections.Generic

let memoize f =
    let cache = new Dictionary<_,_>(1000000)
    (fun x ->
        match cache.TryGetValue x with
        | true, v -> v
        | _ ->
            let v = f x
            cache.Add(x, v)
            v)

let rec memSequence =
    memoize (fun generator s ->
        if generator = 1 then s + 1
        else
            let state = s + 1
            // generator % 2 = 0 stands in for an 'even' helper not shown in the question
            if generator % 2 = 0 then memSequence (generator / 2) state
            else memSequence (3 * generator + 1) state)

let problem14 =
    Array.init 999999 (fun idx -> (idx + 1, memSequence (idx + 1) 0))
    |> Array.maxBy snd
    |> fst
It seems to work well for the lengths of the sequences generated by the first 100,000 numbers, but it slows down significantly beyond that. In fact, for 120,000 it doesn't seem to terminate. I suspected the Dictionary I use, but I read that this shouldn't be the problem. Could you point out why this may be inefficient?

You're on the right track, but there's one thing very wrong in how you implement your memoization.
Your memoize function takes a function of one argument and returns a memoized version of it. When you use it in memSequence, however, you give it a curried, two-argument function. What then happens is that memoize takes the function and stores the result of partially applying it to the first argument only, i.e. it stores the closure resulting from applying the function to generator, and then proceeds to call those closures on s.
This means that your memoization effectively doesn't do anything - add some print statements in your memoize function and you'll see that you're still doing full recursion.
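A small sketch of the effect described above, assuming a memoize with the same shape as the question's (the counters and the names curried and tupled are illustrative only): a curried two-argument lambda only gets its closure for the first argument cached, so the body keeps running, whereas a tupled argument becomes the full cache key.

```fsharp
open System.Collections.Generic

// Same shape as the question's memoize: caches results of a one-argument function.
let memoize f =
    let cache = Dictionary<_, _>()
    fun x ->
        match cache.TryGetValue x with
        | true, v -> v
        | _ ->
            let v = f x
            cache.Add(x, v)
            v

// Counters recording how often each function body actually runs.
let curriedRuns = ref 0
let tupledRuns = ref 0

// Curried: the cache stores the closure returned by applying 'a' only,
// so the body still runs on every call.
let curried =
    memoize (fun (a: int) (b: int) ->
        curriedRuns.Value <- curriedRuns.Value + 1
        a + b)

// Tupled: the whole argument pair is the cache key, so the body runs once per pair.
let tupled =
    memoize (fun (a: int, b: int) ->
        tupledRuns.Value <- tupledRuns.Value + 1
        a + b)

for _ in 1 .. 3 do
    curried 1 2 |> ignore
    tupled (1, 2) |> ignore

printfn "curried body ran %d times, tupled body ran %d times" curriedRuns.Value tupledRuns.Value
```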

I think the underlying question may have been "How to combine a memoizing function with a potentially costly calculating function that takes more than one argument?".
In this case, that second argument isn't needed. There's nothing inherently wrong with memoizing 2168612 elements (the size of the dictionary after the calculation).
Beware of overflow, since at 113383 the sequence surpasses System.Int32.MaxValue. A solution might thus look like this:
let memoRec f =
    let d = new System.Collections.Generic.Dictionary<_,_>()
    let rec g x =
        match d.TryGetValue x with
        | true, res -> res
        | _ -> let res = f g x in d.Add(x, res); res
    g

let collatzLong =
    memoRec (fun f n ->
        if n <= 1L then 0
        else 1 + f (if n % 2L = 0L then n / 2L else n * 3L + 1L))

{0L .. 999999L}
|> Seq.map (fun i -> i, collatzLong i)
|> Seq.maxBy snd
|> fst
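Because the question's title mentions arrays, here is an alternative sketch, not taken from the answer above, that swaps the Dictionary for an int array indexed by the start value. The function name longestCollatzBelow is hypothetical; it caches only values below the limit, uses int64 for intermediate values to avoid the overflow mentioned above, and counts steps the same way collatzLong does (0 for n = 1).

```fsharp
/// Returns (start, steps) for the start value below 'limit' whose Collatz
/// sequence takes the most steps to reach 1.
let longestCollatzBelow (limit: int) =
    // cache.[n] holds the step count for n < limit; 0 means "not computed yet"
    // (the only start value with 0 steps is 1, which is handled separately).
    let cache = Array.zeroCreate<int> limit
    let rec steps (n: int64) =
        if n = 1L then 0
        elif n < int64 limit && cache.[int n] <> 0 then cache.[int n]
        else
            // int64 avoids the Int32 overflow that starts at 113383
            let next = if n % 2L = 0L then n / 2L else 3L * n + 1L
            let s = 1 + steps next
            if n < int64 limit then cache.[int n] <- s
            s
    let mutable best = 1
    let mutable bestSteps = 0
    for start in 1 .. limit - 1 do
        let s = steps (int64 start)
        if s > bestSteps then
            best <- start
            bestSteps <- s
    best, bestSteps

printfn "%A" (longestCollatzBelow 1000000)
```

Values that leave the cached range (above the limit) are recomputed rather than stored; that keeps the cache a fixed-size array at the cost of some repeated work.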


What would be an idiomatic F# way to scale a list of (n-tuples or list) with another list, also arrays?

Given:
let weights = [0.5;0.4;0.3]
let X = [[2;3;4];[7;3;2];[5;3;6]]
What I want is: wX = [(0.5)*[2;3;4];(0.4)*[7;3;2];(0.3)*[5;3;6]]
I would like to know an elegant way to do this with lists as well as with arrays. Additional optimization information is welcome.
You write about a list of lists, but your code shows a list of tuples. Taking the liberty to adjust for that, a solution would be
let weights = [0.5; 0.4; 0.3]
let X = [[2; 3; 4]; [7; 3; 2]; [5; 3; 6]]

X
|> List.map2 (fun w x ->
    x |> List.map (fun xi -> float xi * w)) weights
Depending on how comfortable you are with the syntax, you may prefer a one-liner like
List.map2 (fun w x -> List.map (float >> (*) w) x) weights X
The same library functions exist for sequences (Seq.map2, Seq.map) and arrays (in the Array module).
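For the arrays part of the question, a minimal sketch with the same data written as arrays, using Array.map2 and Array.map from FSharp.Core (like List.map2, Array.map2 throws if the two inputs differ in length):

```fsharp
// Array versions of the question's data (same values, array syntax).
let weights = [| 0.5; 0.4; 0.3 |]
let X = [| [| 2; 3; 4 |]; [| 7; 3; 2 |]; [| 5; 3; 6 |] |]

// Pair each weight with the row at the same index and scale that row.
let wX = Array.map2 (fun w row -> Array.map (fun xi -> w * float xi) row) weights X

printfn "%A" wX
```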
This is much more than an answer to the specific question, but after a chat in the comments, in which I learned that the question was specifically part of a neural network in F#, I am posting this. It covers the question and implements the feedforward part of a neural network. It makes use of MathNet Numerics.
This code is an F# translation of part of the Python code from Neural Networks and Deep Learning.
Python
def backprop(self, x, y):
    """Return a tuple ``(nabla_b, nabla_w)`` representing the
    gradient for the cost function C_x.  ``nabla_b`` and
    ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
    to ``self.biases`` and ``self.weights``."""
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    # feedforward
    activation = x
    activations = [x] # list to store all the activations, layer by layer
    zs = [] # list to store all the z vectors, layer by layer
    for b, w in zip(self.biases, self.weights):
        z = np.dot(w, activation)+b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
F#
module NeuralNetwork1 =
    //# Third-party libraries
    open MathNet.Numerics.Distributions // Normal.Sample
    open MathNet.Numerics.LinearAlgebra // Matrix

    type Network(sizes : int array) =
        let mutable (_biases : Matrix<double> list) = []
        let mutable (_weights : Matrix<double> list) = []

        member __.Biases
            with get() = _biases
            and set value =
                _biases <- value
        member __.Weights
            with get() = _weights
            and set value =
                _weights <- value

        member __.Backprop (x : Matrix<double>) (y : Matrix<double>) =
            // Note: There is a separate member for feedforward. This one is only used within Backprop
            // Note: In the text layers are numbered from 1 to n with 1 being the input and n being the output
            //       In the code layers are numbered from 0 to n-1 with 0 being the input and n-1 being the output
            //       Layers
            //         1     2    3    Text
            //         0     1    2    Code
            //       784 -> 30 -> 10
            let feedforward () : (Matrix<double> list * Matrix<double> list) =
                let (bw : (Matrix<double> * Matrix<double>) list) = List.zip __.Biases __.Weights
                let rec feedforwardInner layer activation zs activations =
                    match layer with
                    | x when x < (__.NumLayers - 1) ->
                        let (bias, weight) = bw.[layer]
                        let z = weight * activation + bias
                        let activation = __.Sigmoid z
                        feedforwardInner (layer + 1) activation (z :: zs) (activation :: activations)
                    | _ ->
                        // Normally with recursive functions that build lists for returning,
                        // the final list(s) would be reversed before returning.
                        // However, since the returned lists will be accessed in reverse order
                        // for the backpropagation step, we leave them in reverse order.
                        (zs, activations)
                feedforwardInner 0 x [] [x]
            // ... NumLayers, Sigmoid and the rest of Backprop (the backward pass)
            //     belong to the full code and are not part of this excerpt
In weight * activation, the * is an overloaded operator operating on Matrix<double>.
Relating this back to your example data, and using MathNet Numerics arithmetic:
let weights = [0.5;0.4;0.3]
let X = [[2;3;4];[7;3;2];[5;3;6]]
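First, the values of X need to be converted to float:
let x1 = [[2.0;3.0;4.0];[7.0;3.0;2.0];[5.0;3.0;6.0]]
Now notice that x1 can be viewed as a matrix and weights as a vector. Both are still plain F# lists, so they first have to be turned into MathNet types (for example with the vector and matrix functions from MathNet.Numerics.FSharp), and then we can just multiply:
let wx1 = (vector weights) * (matrix x1)
Note that this vector-matrix product is the weighted sum of the rows (the kind of product used in the feedforward step). To scale each row by its own weight, as in the question, multiply x1 on the left by a diagonal matrix built from weights instead.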
Since the way I validated the code was more thorough than usual, I will explain it so that you don't have doubts about its validity.
When working with Neural Networks and in particular mini-batches, the starting numbers for the weights and biases are random and the generation of the mini-batches is also done randomly.
I know the original Python code was valid and I was able to run it successfully and get the same results as indicated in the book, meaning that the initial successes were within a couple of percent of the book and the graphs of the success were the same. I did this for several runs and several configurations of the neural network as discussed in the book. Then I ran the F# code and achieved the same graphs.
I also copied the starting random number sets from the Python code into the F# code, so that while the data generated was random, both the Python and F# code used the same starting numbers, of which there are thousands. I then single-stepped through both the Python and F# code to verify that each individual function was returning a comparable float value, i.e. I put a breakpoint on each line and made sure I checked each one. This actually took a few days because I had to write export and import code and massage the data from Python to F#.
See: How to determine type of nested data structures in Python?
I also tried a variation where I replaced the F# list with a LinkedList<Matrix<double>>, but found no increase in speed. It was an interesting exercise.
If I understand correctly, and assuming X is a list of 3-tuples such as [(2, 3, 4); (7, 3, 2); (5, 3, 6)]:
let wX = weights |> List.map (fun w ->
    X |> List.map (fun (a, b, c) ->
        w * float a,
        w * float b,
        w * float c))
This is an alternate way to achieve this using Math.Net: https://numerics.mathdotnet.com/Matrix.html#Arithmetics

Exit / stop Array2D.initBased early

How is it possible to exit early from / break out of / stop an array creation in F# (in this case, of Array2D.initBased)?
Remark: dic is a Dictionary<_,_>() whose values are objects that have a method named someMethod taking two int parameters.
let arr =
    Array2D.initBased 1 1 width height (fun x y ->
        let distinctValues =
            dic |> Seq.map (fun (KeyValue(k, v)) -> v.someMethod x y) |> Set.ofSeq
        match distinctValues.Count with
        | n when n = dic.Count ->
            // do something
            // exit array creation here, because I do not need arr any more
            // if v.someMethod x y produced distinct values for each dic value
        | _ ->
            // do something else
    )
This is a tricky question - I don't think there is any function that lets you do this easily. I think the best option is probably to define your own higher-order function (implemented using not very elegant recursion) that hides the behavior.
The idea is to define tryInitBased, which behaves like initBased except that the user-provided function returns an option (None to indicate failure) and tryInitBased itself returns an option (either Some with the successfully created array, or None):
/// Attempts to initialize a 2D array using the specified base offsets and lengths.
/// The provided function can return 'None' to indicate a failure - if the initializer
/// fails for any of the locations inside the array, the construction is stopped and
/// the function returns 'None'.
let tryInitBased base1 base2 length1 length2 f =
    let arr = Array2D.createBased base1 base2 length1 length2 (Unchecked.defaultof<_>)
    /// Recursive function that fills a specified 'x' line
    /// (returns false as soon as any call to 'f' fails, or true)
    let rec fillY x y =
        if y < (base2 + length2) then
            match f x y with
            | Some v ->
                arr.[x, y] <- v
                fillY x (y + 1)
            | _ -> false
        else true
    /// Recursive function that iterates over all 'x' positions
    /// and calls 'fillY' to fill individual lines
    let rec fillX x =
        if x < (base1 + length1) then
            if fillY x base2 then fillX (x + 1)
            else false
        else true
    if fillX base1 then Some arr else None
Then you can keep your code pretty much the same, but replace initBased with tryInitBased and return None or Some(res) from the lambda function.
I also posted the function to F# snippets with nicer formatting.
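A short usage sketch, assuming the tryInitBased definition above (repeated so the block runs on its own). The toy initializer x * 10 + y and the failing position (2, 2) are made up for illustration; the counter shows that the initializer is not called again after the first None.

```fsharp
// Copy of tryInitBased from the answer above so this block runs on its own.
let tryInitBased base1 base2 length1 length2 f =
    let arr = Array2D.createBased base1 base2 length1 length2 (Unchecked.defaultof<_>)
    let rec fillY x y =
        if y < base2 + length2 then
            match f x y with
            | Some v ->
                arr.[x, y] <- v
                fillY x (y + 1)
            | None -> false
        else true
    let rec fillX x =
        if x < base1 + length1 then
            if fillY x base2 then fillX (x + 1) else false
        else true
    if fillX base1 then Some arr else None

// Succeeds: every cell of the 1-based 3x3 array gets a value.
let full = tryInitBased 1 1 3 3 (fun x y -> Some (x * 10 + y))

// Stops at (2, 2): 'visited' counts how many cells the initializer saw.
let visited = ref 0
let stopped =
    tryInitBased 1 1 3 3 (fun x y ->
        visited.Value <- visited.Value + 1
        if x = 2 && y = 2 then None else Some (x * 10 + y))

printfn "full: %A, stopped: %A after %d calls" full stopped visited.Value
```

Row 1 takes three calls, row 2 takes two (the second returns None), so the initializer runs five times instead of nine.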

2d Array Sort in Haskell

I'm trying to teach myself Haskell (coming from OOP languages) and I'm having a hard time grasping the immutable variables stuff. I'm trying to sort a 2D array in row-major order.
In Java, for example (pseudocode):
int array[3][3] = **initialize array here
for(i = 0; i<3; i++)
    for(j = 0; j<3; j++)
        if(array[i][j] < current_low)
            current_low = array[i][j]
How can I implement this same sort of thing in Haskell? If I create a temp array to add the low values to after each iteration, I won't be able to add to it because it is immutable, correct? Also, Haskell doesn't have loops, right?
Here's some useful stuff I know in Haskell:
main = do
    let a = [[10,4],[6,10],[5,2]] --assign random numbers
    print (a !! 0 !! 1) --will print a[0][1] in java notation
    --How can we loop through the values?
First, your Java code does not sort anything. It just finds the smallest element. And, well, there's a kind of obvious Haskell solution... guess what, the function is called minimum! Let's see what it does:
GHCi> :t minimum
minimum :: Ord a => [a] -> a
OK, so it takes a list of values that can be compared (hence Ord) and outputs a single value, namely the smallest. How do we apply this to a "2D list" (nested list)? Well, basically we need the minimum amongst all minima of the sub-lists. So we first replace the list of lists with the list of minima:
allMinima = map minimum a
...and then use minimum allMinima.
Written compactly:
main :: IO ()
main = do
    let a = [[10,4],[6,10],[5,2]] -- don't forget the indentation
    print (minimum $ map minimum a)
That's all!
Indeed "looping through values" is a very un-functional concept. We generally don't want to talk about single steps that need to be taken, rather think about properties of the result we want, and let the compiler figure out how to do it. So if we weren't allowed to use the pre-defined minimum, here's how to think about it:
If we have a list and look at a single value... under what circumstances is it the correct result? Well, if it's smaller than all other values. And what is the smallest of the other values? Exactly, the minimum amongst them.
minimum' :: Ord a => [a] -> a
minimum' (x:xs)
  | x < minimum' xs  = x
If it's not smaller, then we just use the minimum of the other values
minimum' (x:xs)
  | x < minxs  = x
  | otherwise  = minxs
  where minxs = minimum' xs
One more thing: if we recurse through the list this way, there will at some point be no first element left to compare with something. To prevent that, we first need the special case of a single-element list:
minimum' :: Ord a => [a] -> a
minimum' [x] = x -- obviously smallest, since there's no other element.
minimum' (x:xs)
  | x < minxs  = x
  | otherwise  = minxs
  where minxs = minimum' xs
Alright, well, I'll take a stab. Zach, this answer is intended to get you thinking in recursions and folds. Recursions, folds, and maps are the fundamental ways that loops are replaced in functional style. Just try to believe that, in reality, the question of nested looping rarely arises naturally in functional programming. When you actually need it, you'll often enter a special section of code, called a monad, in which you can do destructive writes in an imperative style. Here's an example. But since you asked for help with breaking out of loop thinking, I'm going to focus on that part of the answer instead. @leftaroundabout's answer is also very good, and you can fill in his definition of minimum here.
flatten :: [[a]] -> [a]
flatten [] = []
flatten xs = foldr (++) [] xs

squarize :: Int -> [a] -> [[a]]
squarize _ [] = []
squarize len xs = (take len xs) : (squarize len $ drop len xs)

crappySort :: Ord a => [a] -> [a]
crappySort [] = []
crappySort xs =
    let smallest = minimum xs
        rest = filter (smallest /=) xs
        count = (length xs) - (length rest)
    in
        replicate count smallest ++ crappySort rest

sortByThrees xs = squarize 3 $ crappySort $ flatten xs

Matching elements of two arrays in F#

I have two sequences of stock data, and I'm trying to line up the dates and combine the data so that I can pass it to other functions that will run some statistics on it. Essentially, I want to pass two (or more) sequences that look like:
sequenceA = [(float,DateTime)]
sequenceB = [(float,DateTime)]
to a function, and have it return a single sequence where all the data is properly aligned by DateTime. Something like:
return = [(float,float,DateTime)]
where the floats are the close prices of the two sequences for that DateTime.
I've tried using a nested for loop, and I'm fairly certain that should work (though I've had some trouble with it), but it seems like F#'s match expression should also be able to handle this. I've looked up some documentation and examples of match expressions, but I'm running into a number of different issues that I haven't been able to get past.
This is my most recent attempt at a simplified version of what I'm trying to accomplish. As you can see, I'm just trying to see if the first element of the sequence 'x' has the date "1/11/2011". The problem is that 1) it always returns "Yes", and 2) I can't figure out how to get from here to the whole sequence, and then ultimately 2+ sequences.
open System

let x = seq [(1.0, System.DateTime.Parse("1/8/2011")); (2.0, System.DateTime.Parse("1/9/2011"))]
type t = seq<float * DateTime>
let align (a : t) =
    let testDate = System.DateTime.Parse("1/11/2011")
    let b = Seq.head(a)
    match snd b with
    | testDate -> printfn "Yes"
    | _ -> printfn "No"
align x
I'm relatively new to F#, but I'm fairly sure that this should be possible with a match expression. Any help would be much appreciated!
Your question has two parts:
As to the pattern matching: in the pattern you have above, testDate is a new name that gets bound to the second item of tuple b (it shadows your earlier testDate rather than comparing against it). Both patterns match any date, but since the first pattern is tried first, your example always prints "Yes".
If you want to match on a specific date value, you can add a 'when' guard to your pattern:
let dateValue = DateTime.Today
match dateValue with
| someDate when someDate = DateTime.Today -> "Today"
| _ -> "Not Today"
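Applied to the question's own check, a sketch might look like this. It returns a string instead of printing so the result is easy to inspect, and builds the dates with the DateTime constructor (an assumption, to sidestep culture-dependent parsing of strings like "1/11/2011").

```fsharp
open System

// A 'when' guard compares against the existing value instead of binding a new name.
let align (a: seq<float * DateTime>) (testDate: DateTime) =
    let _, date = Seq.head a
    match date with
    | d when d = testDate -> "Yes"
    | _ -> "No"

let x = seq [ (1.0, DateTime(2011, 1, 8)); (2.0, DateTime(2011, 1, 9)) ]

printfn "%s" (align x (DateTime(2011, 1, 8)))
printfn "%s" (align x (DateTime(2011, 1, 11)))
```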
If I had to implement the align function, I probably wouldn't try to use pattern matching. You can use Seq.groupBy to collect all entries with the same date.
open System

///Groups two sequences together by key
let align a b =
    let simplifyEntry (key, values) =
        let prices = [for value in values -> snd value]
        key, prices
    a
    |> Seq.append b
    |> Seq.groupBy fst
    |> Seq.map simplifyEntry
    |> Seq.toList

//Demonstrate alignment of two sequences
let s1 = [DateTime.Today, 1.0]
let s2 = [
    DateTime.Today, 2.0
    DateTime.Today.AddDays(2.0), 10.0]
let pricesByDate = align s1 s2
for day, prices in pricesByDate do
    let pricesText =
        prices
        |> Seq.map string
        |> String.concat ", "
    printfn "%A %s" day pricesText
I happen to be working on a library for working with time series data and it has a function for doing this - it is actually a bit more general, because it returns DateTime * float option * float option to represent the case when one series has value for a specified date, but the other one does not.
The function assumes that the two series are already sorted - which means that it only needs to walk over them once (for not-sorted sequences, you need to do multiple iterations or build some temporary tables).
Also note that the tuple elements are swapped compared with your example: you need to give it DateTime * float. The function is not particularly nice - it works with IEnumerable, which means that it needs to use mutable enumerators (and ugly imperative stuff, in general). In general, pattern matching just does not work well with sequences - you can get the head, but you cannot get the tail, because that would be inefficient. You could write a much nicer one for F# lists...
open System.Collections.Generic

let alignWithOrdering (seq1:seq<'T * 'TAddress>) (seq2:seq<'T * 'TAddress>) (comparer:IComparer<_>) = seq {
    let withIndex seq = Seq.mapi (fun i v -> i, v) seq
    use en1 = seq1.GetEnumerator()
    use en2 = seq2.GetEnumerator()
    let en1HasNext = ref (en1.MoveNext())
    let en2HasNext = ref (en2.MoveNext())

    let returnAll (en:IEnumerator<_>) hasNext f = seq {
        if hasNext then
            yield f en.Current
            while en.MoveNext() do yield f en.Current }

    let rec next () = seq {
        if not en1HasNext.Value then yield! returnAll en2 en2HasNext.Value (fun (k, i) -> k, None, Some i)
        elif not en2HasNext.Value then yield! returnAll en1 en1HasNext.Value (fun (k, i) -> k, Some i, None)
        else
            let en1Val, en2Val = fst en1.Current, fst en2.Current
            let comparison = comparer.Compare(en1Val, en2Val)
            if comparison = 0 then
                yield en1Val, Some(snd en1.Current), Some(snd en2.Current)
                en1HasNext := en1.MoveNext()
                en2HasNext := en2.MoveNext()
                yield! next()
            elif comparison < 0 then
                yield en1Val, Some(snd en1.Current), None
                en1HasNext := en1.MoveNext()
                yield! next ()
            else
                yield en2Val, None, Some(snd en2.Current)
                en2HasNext := en2.MoveNext()
                yield! next () }
    yield! next () }
Assuming that we want to use strings as keys (rather than your DateTime), you can call it like this:
alignWithOrdering
    [ ("b", 0); ("c", 1); ("d", 2) ]
    [ ("a", 0); ("b", 1); ("c", 2) ] (Comparer<string>.Default) |> List.ofSeq

// Returns
[ ("a", None, Some 0); ("b", Some 0, Some 1);
  ("c", Some 1, Some 2); ("d", Some 2, None) ]
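As for the remark that a nicer version could be written for F# lists, here is one possible sketch (alignLists is a hypothetical name, not part of any library), assuming both lists are sorted by key in ascending order. It uses structural comparison on the keys instead of an IComparer and is not tail-recursive, so it is meant for modest list sizes.

```fsharp
/// Merge-join two lists that are both sorted by key (ascending).
/// A key missing from one side is reported as None for that side.
let rec alignLists (xs: ('K * 'A) list) (ys: ('K * 'B) list) : ('K * 'A option * 'B option) list =
    match xs, ys with
    | [], [] -> []
    | (k, a) :: rest, [] -> (k, Some a, None) :: alignLists rest []
    | [], (k, b) :: rest -> (k, None, Some b) :: alignLists [] rest
    | (k1, a) :: xrest, (k2, b) :: yrest ->
        if k1 = k2 then (k1, Some a, Some b) :: alignLists xrest yrest
        elif k1 < k2 then (k1, Some a, None) :: alignLists xrest ys
        else (k2, None, Some b) :: alignLists xs yrest

let aligned =
    alignLists [ ("b", 0); ("c", 1); ("d", 2) ] [ ("a", 0); ("b", 1); ("c", 2) ]

printfn "%A" aligned
```

With DateTime keys (which support comparison) the same function lines up two price series by date.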
If you're interested in working with time series of stock data in F#, you might be interested in joining the F# for Data and Machine Learning working group of the F# Foundation. We're currently working on an open-source library with support for time series that makes this much nicer :-). If you're interested in looking at & contributing to the early preview, then you can do that via this working group.
open System
let x = seq [(1.0, System.DateTime.Parse("1/8/2011")); (2.0, DateTime.Parse("1/9/2011"))]
//type t = seq<float*DateTime>
// The string annotation lets the compiler pick the DateTime.TryParse(string, out DateTime) overload
let (|EqualDate|_|) (str: string) dt =
    DateTime.TryParse str
    |> function
        | true, x when x = dt -> Some()
        | _ -> None
let align a =
    //let testDate = System.DateTime.Parse("1/11/2011")
    let b = Seq.head(a)
    match b with
    | _, EqualDate "1/9/2011" -> printfn "Yes"
    | _ -> printfn "No"
align x
x |> Seq.skip 1 |> align

F# sorting array

I have an array like this:
[|{Name = "000016.SZ";
   turnover = 3191591006.0;
   MV = 34462194.8;};
  {Name = "000019.SZ";
   turnover = 2316868899.0;
   MV = 18438461.48;};
  {Name = "000020.SZ";
   turnover = 1268882399.0;
   MV = 7392964.366;};
  .......
|]
How do I sort this array according to "turnover"? Thanks
Assuming that the array is in arr you can just do
arr |> Array.sortBy (fun t -> t.turnover)
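A self-contained sketch of the above, assuming a record type (named Stock here purely for illustration) with the fields shown in the question. Array.sortByDescending gives the reverse order, and Array.sortInPlaceBy sorts the existing array instead of returning a new one.

```fsharp
// Hypothetical record type matching the fields shown in the question.
type Stock = { Name: string; turnover: float; MV: float }

let arr =
    [| { Name = "000016.SZ"; turnover = 3191591006.0; MV = 34462194.8 }
       { Name = "000019.SZ"; turnover = 2316868899.0; MV = 18438461.48 }
       { Name = "000020.SZ"; turnover = 1268882399.0; MV = 7392964.366 } |]

// Ascending by turnover (returns a new array; arr is unchanged).
let ascending = arr |> Array.sortBy (fun t -> t.turnover)

// Descending by turnover.
let descending = arr |> Array.sortByDescending (fun t -> t.turnover)

// In-place alternative, which mutates arr itself:
// Array.sortInPlaceBy (fun t -> t.turnover) arr

printfn "%A" (ascending |> Array.map (fun t -> t.Name))
```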
I know this has already been answered beautifully; however, I am finding that, like Haskell, F# matches the way I think and thought I'd add this for other novices :)
let rec sortData =
    function
    | [] -> []
    | x :: xs ->
        let smaller = List.filter (fun e -> e <= x) >> sortData
        let larger = List.filter (fun e -> e > x) >> sortData
        smaller xs @ [ x ] @ larger xs
Note 1: "a >> b" is function composition and means "create a function, f, such that f x = b(a(x))" as in "apply a then apply b" and so on if it continues: a >> b >> c >>...
Note 2: "@" is list concatenation, as in [1..100] = [1..12] @ [13..50] @ [51..89] @ [90..100]. This is more powerful but less efficient than cons, "::", which can only add one element at a time and only to the head of a list, a::[b;c;d] = [a;b;c;d]
Note 3: each List.filter (fun e -> ...) expression is a partial application: it produces a function that still expects the list and holds the provided filtering lambda.
Note 4: I could have made "smaller" and "larger" lists instead of functions (as in "xs |> filter |> sort"). My choice to make them functions was arbitrary.
Note 5: The type signature of the sortData function states that it requires and returns a list whose elements support comparison:
_arg1:'a list -> 'a list when 'a : comparison
Note 6: There is clarity in brevity (despite this particular post :) )
As a testament to the algorithmic clarity of functional languages, the following optimization of the above filter sort is three times faster (as reported by VS Test Explorer). In this case, the list is traversed only once per pivot (the first element) to produce the sub-lists of smaller and larger items. Also, an equivalence list is introduced which collects matching elements away from further comparisons.
let rec sort3 =
    function
    | [] -> []
    | x :: xs ->
        let accum tot y =
            match tot with
            | (a, b, c) when y < x -> (y :: a, b, c)
            | (a, b, c) when y = x -> (a, y :: b, c)
            | (a, b, c) -> (a, b, y :: c)
        let (a, b, c) = List.fold accum ([], [x], []) xs
        (sort3 a) @ b @ (sort3 c)
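A quick sanity check, under the assumption that agreeing with List.sort on random input is a reasonable smoke test. Both functions are repeated here (with @ as the concatenation operator) and compared against List.sort on input with many duplicate values; the seed and sizes are arbitrary.

```fsharp
let rec sortData =
    function
    | [] -> []
    | x :: xs ->
        let smaller = List.filter (fun e -> e <= x) >> sortData
        let larger = List.filter (fun e -> e > x) >> sortData
        smaller xs @ [ x ] @ larger xs

let rec sort3 =
    function
    | [] -> []
    | x :: xs ->
        // One pass per pivot: split into smaller, equal and larger elements.
        let accum tot y =
            match tot with
            | (a, b, c) when y < x -> (y :: a, b, c)
            | (a, b, c) when y = x -> (a, y :: b, c)
            | (a, b, c) -> (a, b, y :: c)
        let (a, b, c) = List.fold accum ([], [ x ], []) xs
        (sort3 a) @ b @ (sort3 c)

// Pseudo-random input with many duplicates (values 0..49).
let rng = System.Random(42)
let data = List.init 1000 (fun _ -> rng.Next(0, 50))
let expected = List.sort data

printfn "sortData ok: %b, sort3 ok: %b" (sortData data = expected) (sort3 data = expected)
```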
