Matching elements of two arrays in F#

I have two sequences of stock data, and I'm trying to line up the dates and combine the data so that I can pass it to other functions that will run some statistics on it. Essentially, I want to pass two (or more) sequences that look like:
sequenceA = [(float,DateTime)]
sequenceB = [(float,DateTime)]
to a function, and have it return a single sequence where all the data is properly aligned by DateTime. Something like:
return = [(float,float,DateTime)]
where the floats are the close prices of the two sequences for that DateTime.
I've tried using a nested for loop, and I'm fairly certain that should work (though I've had some trouble with it), but it seems like F#'s match expression should also be able to handle this. I've looked up some documentation and examples of match expressions, but I'm running into a number of different issues that I haven't been able to get past.
This is my most recent attempt at a simplified version of what I'm trying to accomplish. As you can see, I'm just trying to see if the first element of the sequence 'x' has the date "1/11/2011". The problem is that 1) it always returns "Yes", and 2) I can't figure out how to get from here to the whole sequence, and then ultimately 2+ sequences.
let x = seq[(1.0,System.DateTime.Parse("1/8/2011"));(2.0,System.DateTime.Parse("1/9/2011"))]
type t = seq<float*System.DateTime>
let align (a:t) =
    let testDate = System.DateTime.Parse("1/11/2011")
    let b = Seq.head(a)
    match snd b with
    | testDate -> printfn "Yes"
    | _ -> printfn "No"
align x
I'm relatively new to F#, but I'm fairly sure that this should be possible with a match expression. Any help would be much appreciated!

Your question has two parts:
As to the pattern matching: in the pattern you have above, testDate is a new name that gets bound to the second item in tuple b; it is not compared against the testDate value defined earlier. Both patterns would match any date, but since the first pattern always matches, your example always prints "Yes".
If you want to match on a specific date value, you can use the 'when' keyword to add a guard to your pattern:
let dateValue = DateTime.Today
match dateValue with
| someDate when someDate = DateTime.Today -> "Today"
| _ -> "Not Today"
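Applied to the function from the question, the guard might look like the following minimal sketch. The name headMatches and the explicit testDate parameter are illustrative assumptions, not part of the original code:

```fsharp
open System

// Hypothetical helper: reports whether the first entry of the sequence falls on testDate.
let headMatches (testDate: DateTime) (a: seq<float * DateTime>) =
    match snd (Seq.head a) with
    | d when d = testDate -> "Yes" // the guard compares against the existing value
    | _ -> "No"

let x = seq [ (1.0, DateTime(2011, 1, 8)); (2.0, DateTime(2011, 1, 9)) ]

printfn "%s" (headMatches (DateTime(2011, 1, 11)) x) // prints "No"
printfn "%s" (headMatches (DateTime(2011, 1, 8)) x)  // prints "Yes"
```

Here d is still a fresh binding, but the when clause is what performs the comparison.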
If I had to implement the align function, I probably wouldn't try to use pattern matching. You can use Seq.groupBy to collect all entries with the same date.
open System

/// Groups two sequences together by key
let align a b =
    let simplifyEntry (key, values) =
        let prices = [for value in values -> snd value]
        key, prices
    a
    |> Seq.append b
    |> Seq.groupBy fst
    |> Seq.map simplifyEntry
    |> Seq.toList

// Demonstrate alignment of two sequences
let s1 = [DateTime.Today, 1.0]
let s2 = [
    DateTime.Today, 2.0
    DateTime.Today.AddDays(2.0), 10.0]
let pricesByDate = align s1 s2
for day, prices in pricesByDate do
    let pricesText =
        prices
        |> Seq.map string
        |> String.concat ", "
    printfn "%A %s" day pricesText
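If only the dates present in both sequences are wanted, in the (float, float, DateTime) shape the question asks for, one possible sketch is an inner join through a Map. The name alignInner is hypothetical, and the code assumes the (price, date) tuple order from the question and at most one price per date in each sequence:

```fsharp
open System

// Hypothetical helper: inner join of two (price, date) sequences on the date.
let alignInner (a: seq<float * DateTime>) (b: seq<float * DateTime>) =
    // Index the second sequence by date; a later duplicate date overwrites an earlier one.
    let byDate = b |> Seq.map (fun (price, date) -> date, price) |> Map.ofSeq
    a
    |> Seq.choose (fun (priceA, date) ->
        byDate
        |> Map.tryFind date
        |> Option.map (fun priceB -> priceA, priceB, date))
    |> Seq.toList

let d1, d2, d3 = DateTime(2011, 1, 8), DateTime(2011, 1, 9), DateTime(2011, 1, 10)
let result = alignInner [ 1.0, d1; 2.0, d2 ] [ 10.0, d2; 20.0, d3 ]
// result = [(2.0, 10.0, d2)]: only 1/9/2011 appears in both inputs
```

Dates that occur in just one sequence are dropped, which differs from the groupBy version above that keeps every date.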

I happen to be working on a library for working with time series data and it has a function for doing this - it is actually a bit more general, because it returns DateTime * float option * float option to represent the case when one series has value for a specified date, but the other one does not.
The function assumes that the two series are already sorted - which means that it only needs to walk over them once (for not-sorted sequences, you need to do multiple iterations or build some temporary tables).
Also note that the tuple elements are swapped compared with your example: you need to give it DateTime * float. The function is not particularly nice: it works on IEnumerable, which means that it needs to use mutable enumerators (and ugly imperative stuff in general). Pattern matching just does not work well with sequences: you can get the head, but you cannot get the tail, because that would be inefficient. You could write a much nicer one for F# lists...
open System.Collections.Generic

let alignWithOrdering (seq1:seq<'T * 'TAddress>) (seq2:seq<'T * 'TAddress>) (comparer:IComparer<_>) = seq {
    use en1 = seq1.GetEnumerator()
    use en2 = seq2.GetEnumerator()
    let en1HasNext = ref (en1.MoveNext())
    let en2HasNext = ref (en2.MoveNext())
    // Drain whatever is left in one enumerator once the other is exhausted
    let returnAll (en:IEnumerator<_>) hasNext f = seq {
        if hasNext then
            yield f en.Current
            while en.MoveNext() do yield f en.Current }
    let rec next () = seq {
        if not en1HasNext.Value then yield! returnAll en2 en2HasNext.Value (fun (k, i) -> k, None, Some i)
        elif not en2HasNext.Value then yield! returnAll en1 en1HasNext.Value (fun (k, i) -> k, Some i, None)
        else
            let en1Val, en2Val = fst en1.Current, fst en2.Current
            let comparison = comparer.Compare(en1Val, en2Val)
            if comparison = 0 then
                // Same key in both series: emit both values and advance both
                yield en1Val, Some(snd en1.Current), Some(snd en2.Current)
                en1HasNext.Value <- en1.MoveNext()
                en2HasNext.Value <- en2.MoveNext()
                yield! next ()
            elif comparison < 0 then
                // Key only in the first series so far: advance the first
                yield en1Val, Some(snd en1.Current), None
                en1HasNext.Value <- en1.MoveNext()
                yield! next ()
            else
                // Key only in the second series so far: advance the second
                yield en2Val, None, Some(snd en2.Current)
                en2HasNext.Value <- en2.MoveNext()
                yield! next () }
    yield! next () }
Assuming that we want to use strings as keys (rather than your DateTime), you can call it like this:
alignWithOrdering
[ ("b", 0); ("c", 1); ("d", 2) ]
[ ("a", 0); ("b", 1); ("c", 2) ] (Comparer<string>.Default) |> List.ofSeq
// Returns
[ ("a", None, Some 0); ("b", Some 0, Some 1);
("c", Some 1, Some 2); ("d", Some 2, None) ]
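The remark above that a nicer version is possible for F# lists could be sketched as follows. The name alignLists is hypothetical, and the sketch assumes both lists are sorted by key with unique keys, and that the default comparison on the key type is the desired ordering:

```fsharp
// Hypothetical list-based counterpart of alignWithOrdering.
// Assumes both inputs are sorted by key and keys are unique within each list.
let rec alignLists xs ys =
    match xs, ys with
    | [], rest -> rest |> List.map (fun (k, v) -> k, None, Some v)
    | rest, [] -> rest |> List.map (fun (k, v) -> k, Some v, None)
    | (kx, vx) :: xt, (ky, vy) :: yt ->
        if kx = ky then (kx, Some vx, Some vy) :: alignLists xt yt
        elif kx < ky then (kx, Some vx, None) :: alignLists xt ys
        else (ky, None, Some vy) :: alignLists xs yt

let aligned =
    alignLists [ ("b", 0); ("c", 1); ("d", 2) ] [ ("a", 0); ("b", 1); ("c", 2) ]
// aligned = [("a", None, Some 0); ("b", Some 0, Some 1);
//            ("c", Some 1, Some 2); ("d", Some 2, None)]
```

This version is not tail-recursive, so for very long lists an accumulator-based variant would be safer.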
If you're interested in working with time series of stock data in F#, you might be interested in joining the F# for Data and Machine Learning working group of the F# Foundation. We're currently working on an open-source library with support for time series that makes this much nicer :-). If you're interested in looking at & contributing to the early preview, then you can do that via this working group.

open System
let x = seq[(1.0,DateTime.Parse("1/8/2011"));(2.0,DateTime.Parse("1/9/2011"))]

// Partial active pattern: matches when dt equals the date parsed from str
let (|EqualDate|_|) (str: string) dt =
    match DateTime.TryParse str with
    | true, x when x = dt -> Some ()
    | _ -> None

let align a =
    let b = Seq.head(a)
    match b with
    | _, EqualDate "1/9/2011" -> printfn "Yes"
    | _ -> printfn "No"

align x                   // prints "No"
x |> Seq.skip 1 |> align  // prints "Yes"

Related

map multiple elements with same key in F#

Suppose two sorted lists with the same length: a key list let k = [1;1;2;2;3;3] and a number list let n = [1;2;3;4;5;6]. The ith positions of the two lists map to each other, meaning the element n=1 has a key value of k=1, n=2 has a key value of k=1, and so on.
How to create a list that maps multiple elements with same key value in F#?
So the output would be in a format [[n1;n2;key];[n1;n2;key]...], the example output is [[1;2;1];[3;4;2];[5;6;3]].
What you describe is a map data structure, so I would use one instead of implementing it yourself with a bunch of list functions. You can convert your two lists like this:
let k = [1;1;2;2;3;3]
let n = [1;2;3;4;5;6]
let mapKeyToValues keys values =
    let folder m k v =
        m |> Map.change k (function
            | Some old -> Some (v :: old)
            | None -> Some [v]
        )
    List.fold2 folder Map.empty keys values
Now with
let r = mapKeyToValues k n
You get a data structure like
Map [
1, [2; 1]
2, [4; 3]
3, [6; 5]
]
This would be equivalent to the JSON-Object
{
"1": [2,1],
"2": [4,3],
"3": [6,5]
}
Usually you could write
let r = Map (List.zip k n)
This way, you create a map out of a list of (k, v) tuples. But a repeated key just overrides the older entry, so you get
Map [
(1, 2)
(2, 4)
(3, 6)
]
That's the reason why you need Map.change. You go through every (key,value) of a list with List.fold2 and then add it to the map data-structure. If a key is not present (the None case) you create a list as a value with your only value. In the Some case a key was already added, and you add your value to the list that is already present.
In this example I assume the order of the values doesn't matter. If you want the same order as they appear in the value list, you must use List.foldBack2 instead of List.fold2. But you must change the order of some arguments. This should be an exercise for you, to understand it better.
There is a Map module with different functions to work with such a data structure, and it provides things like Map.map, Map.filter, Map.fold, Map.find, and so on. So use this instead. If really needed, you could also use Map.toList to transform it back to a list again, or use Map.fold.
As a reminder, since some people use List.zip followed by List.map (or List.zip followed by List.fold): List.map2 and List.fold2 do this in one operation instead of two.
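For readers who want to check their attempt at the List.foldBack2 variant, one possible sketch follows. The names mapKeyToValuesOrdered, ordered and asLists are illustrative; the final step converts the map back to the list format from the question using Map.toList:

```fsharp
// Order-preserving variant: List.foldBack2 walks both lists from the end,
// so prepending with (::) leaves the values in their original order.
// Note the folder's argument order: key, value, then the accumulated map.
let mapKeyToValuesOrdered keys values =
    let folder k v m =
        m |> Map.change k (function
            | Some old -> Some (v :: old)
            | None -> Some [ v ])
    List.foldBack2 folder keys values Map.empty

let ordered = mapKeyToValuesOrdered [1;1;2;2;3;3] [1;2;3;4;5;6]
// Map.toList ordered = [(1, [1; 2]); (2, [3; 4]); (3, [5; 6])]

// Back to the [[n1; n2; key]; ...] shape asked for in the question
let asLists = ordered |> Map.toList |> List.map (fun (key, vs) -> vs @ [ key ])
// asLists = [[1; 2; 1]; [3; 4; 2]; [5; 6; 3]]
```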
This can be achieved by "zipping" the two lists together and then grouping by the key:
let groupByKey keys numbers =
    List.zip keys numbers
    |> List.groupBy fst
    |> List.map (fun (key, values) -> key, (values |> List.map snd))
groupByKey k n // [(1, [1; 2]); (2, [3; 4]); (3, [5; 6])]
The above version of the function doesn't have the exact output format that you specified, but it is a more structured output, where each item in the list is a tuple of the key and the list of values for that key. This might be preferred for cleaner code, depending on what you're going to do with the result.
To get your specified format it can be changed slightly:
let groupByKey keys numbers =
    List.zip keys numbers
    |> List.groupBy fst
    |> List.map (fun (key, values) -> [
        yield! values |> List.map snd
        yield key
    ])
groupByKey k n // [[1; 2; 1]; [3; 4; 2]; [5; 6; 3]]
I think what you are looking for is Seq.zip, which takes two sequences (lists are sequences) and gives back a sequence of pairs.
Also note you specified your lists with commas, but F# uses semi-colons to separate list elements.
let keys = [ 1; 1; 2; 2; 3; 3 ]
let values = [ 1; 2; 3; 4; 5; 6 ]
let map =
    values
    |> Seq.zip keys
    |> Seq.groupBy (fun (k, v) -> k)
    |> Seq.map (fun (k, vs) -> k, vs |> Seq.map snd |> Seq.toList)
    |> Map.ofSeq
printfn "%A" map
// map [(1, [1; 2]); (2, [3; 4]); (3, [5; 6])]
To get the format you are looking for (although I think you should consider a Map) you can unpack it like so:
let mapList =
    map
    |> Map.toSeq
    |> Seq.map (fun (k, vs) -> vs @ [ k ])
    |> Seq.toList
printfn "%A" mapList
// [[1; 2; 1]; [3; 4; 2]; [5; 6; 3]]

F#: How to iterate through a List of Arrays of Strings (string [] list) the functional way

I'm a newbie at F#,
I've got a List that contains arrays, each arrays contains 7 Strings.
I want to loop through the Arrays and do some kind of Array.map later on,
However my problem is that I can't send individual arrays to some other function.
I don't want to use for-loops but focus on the functional way using pipelines and mapping only.
let stockArray =
    [[|"2012-03-30"; "32.40"; "32.41"; "32.04"; "32.26"; "31749400"; "32.26"|];
     [|"2012-03-29"; "32.06"; "32.19"; "31.81"; "32.12"; "37038500"; "32.12"|];
     [|"2012-03-28"; "32.52"; "32.70"; "32.04"; "32.19"; "41344800"; "32.19"|];
     [|"2012-03-27"; "32.65"; "32.70"; "32.40"; "32.52"; "36274900"; "32.52"|];]
let tryout =
    stockArray
    |> List.iter;;
Output complains about List.iter:
error FS0001: Type mismatch. Expecting a
'string [] list -> 'a' but given a
'('b -> unit) -> 'b list -> unit'
The type 'string [] list' does not match the type ''a -> unit'
When trying Array.iter, same difference:
error FS0001: Type mismatch. Expecting a
'string [] list -> 'a' but given a
'('b -> unit) -> 'b [] -> unit'
The type 'string [] list' does not match the type ''a -> unit'
In C# I would simply go about it with a foreach to start treating my arrays one at a time, but with F# I feel real stuck.
Thank you for your help
The question is not clear, even with the extra comments. Anyway, I think you will finally be able to figure out your needs from this answer.
I have implemented parseDate and parseFloat in such a way that I expect it to work on any machine, whatever locale, with the given data. You may want something else for your production application. Also, how theInt is calculated is perhaps not what you want.
List.iter, as you already discovered, converts data to unit, effectively throwing away data. So what's the point in that? It is usually placed last when used in a pipe sequence, often doing some work that involves side effects (e.g. printing out data) or mutable data operations (e.g. filling a mutable list with items). I suggest you study functions in the List, Array, Seq and Option modules, to see how they're used to transform data.
open System
open System.Globalization
let stockArray =
    [
        [| "2012-03-30"; "32.40"; "32.41"; "32.04"; "32.26"; "31749400"; "32.26" |]
        [| "2012-03-29"; "32.06"; "32.19"; "31.81"; "32.12"; "37038500"; "32.12" |]
        [| "2012-03-28"; "32.52"; "32.70"; "32.04"; "32.19"; "41344800"; "32.19" |]
        [| "2012-03-27"; "32.65"; "32.70"; "32.40"; "32.52"; "36274900"; "32.52" |]
    ]
type OutData = { TheDate: DateTime; TheInt: int }
// The string annotations keep overload resolution unambiguous on newer .NET versions
let parseDate (s: string) = DateTime.ParseExact (s, "yyyy-MM-dd", CultureInfo.InvariantCulture)
let parseFloat (s: string) = Double.Parse (s, CultureInfo.InvariantCulture)
let myFirstMap (inArray: string[]) : OutData =
    if inArray.Length <> 7 then
        failwith "Expected array with seven strings."
    else
        let theDate = parseDate inArray.[0]
        let f2 = parseFloat inArray.[2]
        let f3 = parseFloat inArray.[3]
        let f = f2 - f3
        let theInt = int f
        { TheDate = theDate; TheInt = theInt }
let tryout =
    stockArray
    |> List.map myFirstMap
The following is an alternative implementation of myFirstMap. I guess some would say it's more idiomatic, but I would just say that what you prefer to use depends on what you might expect from a possible future development.
let myFirstMap inArray =
    match inArray with
    | [| sDate; _; s2; s3; _; _; _ |] ->
        let theDate = parseDate sDate
        let f2 = parseFloat s2
        let f3 = parseFloat s3
        let f = f2 - f3
        let theInt = int f
        { TheDate = theDate; TheInt = theInt }
    | _ -> failwith "Expected array with seven strings."
The pipe operator |> is used to write an f x as x |> f.
The signature of List.iter is:
action: ('a -> unit) -> list: ('a list) -> unit
You give it an action, then a list, and it gives you back unit (F#'s counterpart of void).
You can read it thus: when you give List.iter an action, its type will be
list: ('a list) -> unit
a function to which you can pass a list.
So when you write stockArray |> List.iter, what you're actually trying to give it in place of an action is your list - that's the error. So pass in an action:
let tryout = List.iter (fun arr -> printfn "%A" arr) stockArray
which can be rewritten as:
let tryout = stockArray |> List.iter (fun arr -> printfn "%A" arr)
"However my problem is that I can't send individual arrays to some other function"
List.map and similar functions allow you to do precisely this - you don't need to iterate the list yourself.
For example, this will return just the first element of each array in your list:
stockArray
|> List.map (fun x -> x.[0])
You can replace the function passed to List.map with any function that operates on one array and returns some value.
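As a slightly larger sketch of the same idea, the pipeline below sends each array to a named function with List.map and only then uses List.iter for the side effect of printing. The helper toDateAndClose is hypothetical, and treating column 4 as the close price is an assumption about the data layout:

```fsharp
open System
open System.Globalization

let rows =
    [ [| "2012-03-30"; "32.40"; "32.41"; "32.04"; "32.26"; "31749400"; "32.26" |]
      [| "2012-03-29"; "32.06"; "32.19"; "31.81"; "32.12"; "37038500"; "32.12" |] ]

// Hypothetical helper: assumes column 0 is the date and column 4 is the close price.
let toDateAndClose (row: string[]) =
    let date = DateTime.ParseExact(row.[0], "yyyy-MM-dd", CultureInfo.InvariantCulture)
    let close = Double.Parse(row.[4], CultureInfo.InvariantCulture)
    date, close

// List.map hands each array to the function and collects the results
let closes = rows |> List.map toDateAndClose

// List.iter comes last, purely for the printing side effect
closes
|> List.iter (fun (date, close) -> printfn "%s %.2f" (date.ToString "yyyy-MM-dd") close)
```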

F# fsharp Fast way of finding a string "starting with" from two arrays

I have a performance problem with large arrays (50k elements each). Given two arrays, what would be the fastest way of finding a string that starts with another string? I have tried different things, but the code below seems to be as good as I can get it.
let findFile (f:string, p:string, pc:string, pcn:string) =
    f.StartsWith(p + "-" + pc) ||
    f.StartsWith(p + "_" + pc) ||
    f.StartsWith(p + "-" + pcn) ||
    f.StartsWith(p + "_" + pcn)
products
|> Array.Parallel.map (fun i p ->
    allFiles |> Array.Parallel.map (fun f ->
        if findFile (f.Filename, p.Style, p.ColorCode, p.ColorName)
        then {p with Filename = f.Filename }
        else p
    ))
Thank you in advance.
First, I would recommend sanitizing the filenames by splitting out the two parts and, if possible, removing the rest:
Split the filenames on the '-' or '_' character so that you compare tuples of (style * color) instead of comparing strings twice. Also, if at all possible, distinguish color codes from color names and separate them into 2 arrays.
Now you have 2 options: use a Dictionary or sort the values.
Dictionary: take the longer list and put it in a dictionary. Scan the shorter list looking for the values. Dictionaries use hash tables, which makes lookups and comparisons very fast. This requires that you use only the style and color code/name as the key, leaving the rest of the string out.
The solution could look like this:
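open System.Collections.Generic

let dict () =
    let dict = new Dictionary<_, _>()
    // Key each file by (style, color); assumes every filename contains a separator
    // and that no two files share the same key (Add throws on duplicates)
    allFiles |> Seq.iter (fun f -> f.Filename.Split('-', '_') |> fun a -> dict.Add((a.[0], a.[1]), f) )
    products
    |> Array.Parallel.map (fun p ->
        match dict.TryGetValue((p.Style, p.ColorCode)) with
        | true, f -> {p with Filename = f.Filename }
        | _ -> p
    )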
If that is not possible, then consider:
Sorting both lists: products and filenames. Scan both ordered lists simultaneously with an index each, advancing only the index that points at the lower value each time.
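A minimal sketch of that two-index scan is shown below. It is simplified to exact key matches on plain strings rather than the prefix test from the question; intersectSorted and ordinal are hypothetical names, and both inputs must be sorted with the same ordering that the scan uses (ordinal comparison here):

```fsharp
open System

// Ordinal comparison, used both for sorting and for the scan
let ordinal (a: string) (b: string) = String.CompareOrdinal(a, b)

// Hypothetical sketch: returns the keys present in both sorted arrays.
let intersectSorted (xs: string[]) (ys: string[]) =
    let result = ResizeArray<string>()
    let mutable i = 0
    let mutable j = 0
    while i < xs.Length && j < ys.Length do
        let c = ordinal xs.[i] ys.[j]
        if c = 0 then
            // Same key on both sides: record it and advance both indices
            result.Add(xs.[i])
            i <- i + 1
            j <- j + 1
        elif c < 0 then
            i <- i + 1 // xs holds the lower value, so advance xs only
        else
            j <- j + 1 // ys holds the lower value, so advance ys only
    result.ToArray()

let common =
    intersectSorted
        (Array.sortWith ordinal [| "b-red"; "a-blue"; "c-green" |])
        (Array.sortWith ordinal [| "c-green"; "a-blue"; "d-black" |])
// common = [| "a-blue"; "c-green" |]
```

After the two sorts, the scan itself visits each element at most once, so it is linear in the combined length of the arrays.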
One more thing:
If you still want to do string comparisons you should consider using compiled Regex which are very efficient. Your regex could be something like: ^code[-_](red|FF0000) which would match any of the 4 values:
code-red
code_red
code-FF0000
code_FF0000
This is how you use compiled Regex:
// requires: open System.Text.RegularExpressions
let regex = new Regex(sprintf "^%s[-_](%s|%s)" p.Style p.ColorCode p.ColorName, RegexOptions.Singleline ||| RegexOptions.Compiled)
for i in 1..30 do
    if regex.IsMatch(sprintf "code-%d" i) then printfn "%A" i

F# memoization efficiency - near 1 million elements

I'm working on an F# solution to this problem, where I need to find the generator element below 1,000,000 with the longest generated sequence.
I use a tail-recursive function that memoizes the previous results to speed up the calculation. This is my current implementation.
let memoize f =
    let cache = new Dictionary<_,_>(1000000)
    (fun x ->
        match cache.TryGetValue x with
        | true, v ->
            v
        | _ ->
            let v = f x
            cache.Add(x, v)
            v)
let rec memSequence =
    memoize (fun generator s ->
        if generator = 1 then s + 1
        else
            let state = s+1
            if even generator then memSequence(generator/2) state
            else memSequence(3*generator + 1) state )
let problem14 =
    Array.init 999999 (fun idx -> (idx+1, (memSequence (idx+1) 0))) |> Array.maxBy snd |> fst
It seems to work well when I calculate the lengths of the sequences generated by the first 100,000 numbers, but it slows down significantly beyond that. In fact, for 120,000 it doesn't seem to terminate. I had a feeling that it might be due to the Dictionary I use, but I read that this shouldn't be the case. Could you point out why this may be inefficient?
You're on the right track, but there's one thing very wrong in how you implement your memoization.
Your memoize function takes a function of one argument and returns a memoized version of it. When you use it in memSequence, however, you give it a curried, two-argument function. What then happens is that memoize takes the function and saves the result of partially applying it to the first argument only, i.e. it stores the closure resulting from applying the function to generator, and then proceeds to call those closures on s.
This means that your memoization effectively doesn't do anything - add some print statements in your memoize function and you'll see that you're still doing full recursion.
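A small experiment along those lines, as a sketch: the counters and the addCurried/addTupled functions are illustrative names, and memoize is a plain single-argument memoizer like the one in the question (without the initial capacity):

```fsharp
open System.Collections.Generic

let memoize f =
    let cache = Dictionary<_, _>()
    fun x ->
        match cache.TryGetValue x with
        | true, v -> v
        | _ ->
            let v = f x
            cache.Add(x, v)
            v

// Counters are illustrative only; they record how often each body runs.
let curriedCalls = ref 0
let tupledCalls = ref 0

// Curried: memoize caches the closure returned for `a`, not the final sum.
let addCurried =
    memoize (fun (a: int) (b: int) ->
        curriedCalls.Value <- curriedCalls.Value + 1
        a + b)

// Tupled: the whole argument pair is the cache key.
let addTupled =
    memoize (fun (a: int, b: int) ->
        tupledCalls.Value <- tupledCalls.Value + 1
        a + b)

addCurried 1 2 |> ignore
addCurried 1 2 |> ignore
addTupled (1, 2) |> ignore
addTupled (1, 2) |> ignore

// The curried body ran twice, the tupled body only once
printfn "curried: %d, tupled: %d" curriedCalls.Value tupledCalls.Value
```

The cache lookup for the curried version does hit on the second call, but what it returns is the stored closure, which still has to run the body when applied to the second argument.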
I think the underlying question may have been: how do you combine a memoizing function with a potentially costly calculating function that takes more than one argument?
In this case, that second argument isn't needed. There's nothing inherently wrong with memoizing 2168612 elements (the size of the dictionary after the calculation).
Beware of overflow, since at 113383 the sequence surpasses System.Int32.MaxValue. A solution might thus look like this:
let memoRec f =
    let d = new System.Collections.Generic.Dictionary<_,_>()
    let rec g x =
        match d.TryGetValue x with
        | true, res -> res
        | _ -> let res = f g x in d.Add(x, res); res
    g
// int64 arguments avoid the Int32 overflow mentioned above
let collatzLong =
    memoRec (fun f n ->
        if n <= 1L then 0
        else 1 + f (if n % 2L = 0L then n / 2L else n * 3L + 1L) )
seq {0L .. 999999L}
|> Seq.map (fun i -> i, collatzLong i)
|> Seq.maxBy snd
|> fst

F# sorting array

I have an array like this,
[|{Name = "000016.SZ";
turnover = 3191591006.0;
MV = 34462194.8;};
{Name = "000019.SZ";
turnover = 2316868899.0;
MV = 18438461.48;};
{Name = "000020.SZ";
turnover = 1268882399.0;
MV = 7392964.366;};
.......
|]
How do I sort this array according to "turnover"? Thanks
(does not have much context to explain the code section? how much context should I write)
Assuming that the array is in arr you can just do
arr |> Array.sortBy (fun t -> t.turnover)
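A self-contained sketch of that call, with the record type spelled out. The type name Stock is an assumption, since the question only shows the field names; Array.sortByDescending is included for the common case of wanting the largest turnover first:

```fsharp
// Hypothetical record type; the field names follow the array shown in the question.
type Stock = { Name: string; turnover: float; MV: float }

let arr =
    [| { Name = "000016.SZ"; turnover = 3191591006.0; MV = 34462194.8 }
       { Name = "000019.SZ"; turnover = 2316868899.0; MV = 18438461.48 }
       { Name = "000020.SZ"; turnover = 1268882399.0; MV = 7392964.366 } |]

// Both functions return a new array and leave arr unchanged
let ascending = arr |> Array.sortBy (fun t -> t.turnover)
let descending = arr |> Array.sortByDescending (fun t -> t.turnover)

ascending |> Array.iter (fun t -> printfn "%s %.0f" t.Name t.turnover)
```

If the original array should be reordered instead of copied, Array.sortInPlaceBy takes the same key function.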
I know this has already been answered beautifully; however, I am finding that, like Haskell, F# matches the way I think and thought I'd add this for other novices :)
let rec sortData =
    function
    | [] -> []
    | x :: xs ->
        let smaller = List.filter (fun e -> e <= x) >> sortData
        let larger = List.filter (fun e -> e > x) >> sortData
        smaller xs @ [ x ] @ larger xs
Note 1: "a >> b" is function composition and means "create a function, f, such that f x = b(a(x))" as in "apply a then apply b" and so on if it continues: a >> b >> c >>...
Note 2: "@" is list concatenation, as in [1..100] = [1..12] @ [13..50] @ [51..89] @ [90..100]. This is more powerful but less efficient than cons, "::", which can only add one element at a time and only to the head of a list, a::[b;c;d] = [a;b;c;d]
Note 3: the List.filter (fun e ->...) expressions produces a "curried function" version holding the provided filtering lambda.
Note 4: I could have made "smaller" and "larger" lists instead of functions (as in "xs |> filter |> sort"). My choice to make them functions was arbitrary.
Note 5: The type signature of the sortData function states that it requires and returns a list whose elements support comparison:
_arg1:'a list -> 'a list when 'a : comparison
Note 6: There is clarity in brevity (despite this particular post :) )
As a testament to the algorithmic clarity of functional languages, the following optimization of the above filter sort is three times faster (as reported by VS Test Explorer). In this case, the list is traversed only once per pivot (the first element) to produce the sub-lists of smaller and larger items. Also, an equivalence list is introduced which collects matching elements away from further comparisons.
let rec sort3 =
    function
    | [] -> []
    | x::xs ->
        let accum tot y =
            match tot with
            | (a,b,c) when y < x -> (y::a,b,c)
            | (a,b,c) when y = x -> (a,y::b,c)
            | (a,b,c) -> (a,b,y::c)
        let (a,b,c) = List.fold accum ([],[x],[]) xs
        (sort3 a) @ b @ (sort3 c)
