fold inverts the order of a sequence - arrays

The order of my sequence (array, etc.) matters. I have tried converting between List, Seq and Array to see whether it makes a difference, and in each case fold reverses the order.
For example, I have a seq of [noun] [verb] [adjective], which is transformed into strings and then folded together. A sample result for this template might be "bad run bandits" instead of "bandits run bad".
Any thoughts on why fold does this, or how to get it to process the elements in the right order?
let res =
    template
    |> Seq.map (fun pos ->
        let e = s |> PredictionEngine.GetRandom
        pos |> PredictionEngine.GetBestPartOfSpeechWord e)
    |> Seq.fold (fun acc w -> w.Text + " " + acc) ""

I believe this is just a matter of changing the order of your last function to:
Seq.fold(fun acc w -> acc + " " + w.Text) ""
This way, each new item is appended to the end of the accumulated string instead of being put in front of it. Note that the result starts with a leading space, because the initial accumulator is the empty string. A minimal working example:
["Bandits";"Run";"Bad"]
|> Seq.fold(fun acc w -> acc + " " + w) ""
|> printfn "%s"

For a job as common as concatenating strings, I recommend using the library functions.
["noun"; "verb"; "adj"]
|> String.concat " "
In day-to-day work, I think, few problems require writing a custom fold.
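To make the difference concrete, here is a small sketch comparing the three variants. The word list and the names `words`, `reversed`, `inOrder` and `joined` are made up for illustration and are not from the question.

```fsharp
let words = [ "bandits"; "run"; "bad" ]

// Prepending each word to the accumulator reverses the order.
let reversed = words |> Seq.fold (fun acc w -> w + " " + acc) ""

// Appending keeps the order, but leaves a leading space because the seed is "".
let inOrder = words |> Seq.fold (fun acc w -> acc + " " + w) ""

// String.concat only puts the separator between elements.
let joined = words |> String.concat " "

// Brackets make the stray spaces visible.
printfn "[%s]" reversed
printfn "[%s]" inOrder
printfn "[%s]" joined
```

Only the String.concat version comes out without a stray space at either end, which is one more reason to prefer it here.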

Related

F#: fast way of finding a string "starting with" from two arrays

I have a performance problem with large arrays (50k elements each). Given two arrays, what would be the fastest way of finding the strings in one that start with a string from the other? I have tried different things, but the code below seems to be as good as I can get it.
let findFile (f:string, p:string, pc:string, pcn:string) =
    f.StartsWith(p + "-" + pc) ||
    f.StartsWith(p + "_" + pc) ||
    f.StartsWith(p + "-" + pcn) ||
    f.StartsWith(p + "_" + pcn)
products
|> Array.Parallel.map (fun p ->
    allFiles |> Array.Parallel.map (fun f ->
        if findFile (f.Filename, p.Style, p.ColorCode, p.ColorName)
        then {p with Filename = f.Filename }
        else p
    ))
Thank you in advance.
First, I would recommend sanitizing the filenames by splitting out the two relevant parts and, if possible, dropping the rest:
Split the filenames on the '-' or '_' character so that you compare tuples of (style * color) once instead of comparing strings twice. Also, if at all possible, distinguish files named by color code from files named by color name, and separate them into two arrays.
Now you have two options: use a Dictionary or sort the values.
Dictionary: take the longer list and put it in a dictionary, then scan the shorter list looking up the values. Dictionaries use hash tables, which makes lookups very efficient, and comparisons are also very fast. This requires that you use only the style and color code/name as the key, leaving the rest of the string out.
The solution could look like this:
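open System.Collections.Generic

let dict () =
    // Index the files by (style, color) so that each lookup is a single hash probe.
    let table = Dictionary<_, _>()
    for f in allFiles do
        let a = f.Filename.Split([| '-'; '_' |])
        // Indexer assignment: a duplicate key overwrites instead of throwing like Add.
        table.[(a.[0], a.[1])] <- f
    products
    |> Array.Parallel.map (fun p ->
        let key = (p.Style, p.ColorCode)
        match table.TryGetValue key with
        | true, f -> { p with Filename = f.Filename }
        | _ -> p)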
If that is not possible, consider the second option:
Sorting both lists (products and filenames): scan the two sorted lists simultaneously with one index each, advancing only the index that points at the lower value each time.
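The sorted scan could be sketched roughly as follows. This is a simplified illustration, not the asker's data model: it assumes both sides have already been reduced to plain string keys (for example "style-color") and that keys are unique within each array. The name `findMatches` is hypothetical.

```fsharp
// Returns the keys present in both arrays, using one pass over the sorted data.
let findMatches (productKeys: string[]) (fileKeys: string[]) =
    let ordinal (a: string) (b: string) = System.String.CompareOrdinal(a, b)
    let ps = Array.sortWith ordinal productKeys
    let fs = Array.sortWith ordinal fileKeys
    let matches = ResizeArray<string>()
    let mutable i = 0
    let mutable j = 0
    while i < ps.Length && j < fs.Length do
        let c = ordinal ps.[i] fs.[j]
        if c = 0 then
            matches.Add(ps.[i])
            i <- i + 1
            j <- j + 1
        elif c < 0 then
            i <- i + 1   // product key is lower: advance the product index
        else
            j <- j + 1   // file key is lower: advance the file index
    List.ofSeq matches

printfn "%A" (findMatches [| "b-red"; "a-blue"; "c-green" |] [| "c-green"; "x-1"; "a-blue" |])
```

After the two sorts, the scan itself touches each element at most once, so the cost is dominated by sorting.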
One more thing:
If you still want to do string comparisons, you should consider using a compiled Regex, which is very efficient. Your regex could be something like ^code[-_](red|FF0000), which would match any of these 4 values:
code-red
code_red
code-FF0000
code_FF0000
This is how you use compiled Regex:
// Enum flags are combined with ||| in F#; + is not defined for enum types.
let regex = new Regex(sprintf "^%s[-_](%s|%s)" p.Style p.ColorCode p.ColorName, RegexOptions.Singleline ||| RegexOptions.Compiled)
for i in 1..30 do
    if regex.IsMatch(sprintf "code-%d" i) then printfn "%A" i
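One thing to watch when building a pattern from data: if a style or color can contain characters that mean something in a regex (such as '.' or '+'), they should be escaped first. A minimal sketch, with made-up values standing in for p.Style, p.ColorCode and p.ColorName:

```fsharp
open System.Text.RegularExpressions

// Hypothetical values standing in for p.Style, p.ColorCode and p.ColorName.
let style, colorCode, colorName = "code", "FF0000", "red"

// Regex.Escape neutralizes any regex metacharacters in the inputs.
let pattern =
    sprintf "^%s[-_](%s|%s)" (Regex.Escape style) (Regex.Escape colorCode) (Regex.Escape colorName)

let regex = Regex(pattern, RegexOptions.Compiled)

let candidates = [ "code-red.jpg"; "code_FF0000.jpg"; "code.red.jpg"; "other-red.jpg" ]

// Keep only the file names that start with style, a separator, and one of the two colors.
let matched = candidates |> List.filter (fun s -> regex.IsMatch(s))
printfn "%A" matched
```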

Is there a simple way to print each element of an array?

let x=[|15..20|]
let y=Array.map f x
printf "%O" y
Well, that only prints the type information.
Is there any way to print each element of "y" separated by ",", without having to use a for loop?
Either use String.Join in the System namespace or F# 'native':
let x = [| 15 .. 20 |]
printfn "%s" (System.String.Join(",", x))
x |> Seq.map string |> String.concat "," |> printfn "%s"
Using String.concat to concatenate the string with a separator is probably the best option in this case (because you do not want to have the separator at the end).
However, if you just wanted to print all elements, you can also use Array.iter:
let nums= [|15..20|]
Array.iter (fun x -> printfn "%O" x) nums // Using function call
nums |> Array.iter (fun x -> printfn "%O" x) // Using the pipe
Adding the separators in this case is harder, but possible using iteri:
nums |> Array.iteri (fun i x ->
    if i <> 0 then printf ", "
    printf "%O" x)
The following won't print the entire array if it is large (I think it prints only the first 100 elements), but I suspect it is what you're after:
printfn "%A" y
If the array of items is large and you do not want to build one large string, another option is to generate an interleaved sequence and skip its first item (the leading delimiter). The following code assumes the array has at least one element.
One advantage of this approach is that it cleanly separates interleaving the items from printing them. It also avoids checking for the first item on every iteration.
let items = [| 15 .. 20|]
let strInterleaved delimiter items =
    items
    |> Seq.collect (fun item -> seq { yield delimiter; yield item})
    |> Seq.skip(1)
items
|> Seq.map string
|> strInterleaved ","
|> Seq.iter (printf "%s")

F# memoization efficiency - near 1 million elements

I'm working on an F# solution to this problem, where I need to find the generator element below 1,000,000 with the longest generated sequence.
I use a tail-recursive function that memoizes the previous results to speed up the calculation. This is my current implementation.
let memoize f =
    let cache = new Dictionary<_,_>(1000000)
    (fun x ->
        match cache.TryGetValue x with
        | true, v ->
            v
        | _ -> let v = f x
               cache.Add(x, v)
               v)
let rec memSequence =
    memoize (fun generator s ->
        if generator = 1 then s + 1
        else
            let state = s+1
            if even generator then memSequence(generator/2) state
            else memSequence(3*generator + 1) state )
let problem14 =
    Array.init 999999 (fun idx -> (idx+1, (memSequence (idx+1) 0))) |> Array.maxBy snd |> fst
It seems to work well for the sequences generated by the first 100,000 numbers, but it slows down significantly beyond that. In fact, for 120,000 it doesn't seem to terminate. I had a feeling that it might be due to the Dictionary I use, but I read that this shouldn't be the case. Could you point out why this might be inefficient?
You're on the right track, but there's one thing very wrong in how you implement your memoization.
Your memoize function takes a function of one argument and returns a memoized version of it. When you use it in memSequence, however, you give it a curried, two-argument function. What then happens is that memoize takes the function and caches the result of partially applying it to the first argument only, i.e. it stores the closure resulting from applying the function to generator, and then proceeds to call those closures on s.
This means that your memoization effectively doesn't do anything - add some print statements in your memoize function and you'll see that you're still doing full recursion.
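A small experiment along those lines, using call counters instead of print statements. This is only a sketch: `addCurried`, `addTupled` and the two counters are made-up names, and the memoize here is a trimmed copy of the one in the question.

```fsharp
open System.Collections.Generic

let memoize f =
    let cache = Dictionary<_, _>()
    fun x ->
        match cache.TryGetValue x with
        | true, v -> v
        | _ ->
            let v = f x
            cache.Add(x, v)
            v

let curriedCalls = ref 0
let tupledCalls = ref 0

// Curried: the cache stores the closure for `a`; the body still runs on every call.
let addCurried =
    memoize (fun (a: int) (b: int) ->
        curriedCalls.Value <- curriedCalls.Value + 1
        a + b)

// Tupled: the whole argument is the cache key, so the body runs once per distinct pair.
let addTupled =
    memoize (fun (a: int, b: int) ->
        tupledCalls.Value <- tupledCalls.Value + 1
        a + b)

addCurried 1 2 |> ignore
addCurried 1 2 |> ignore
addTupled (1, 2) |> ignore
addTupled (1, 2) |> ignore

printfn "curried body ran %d times, tupled body ran %d times" curriedCalls.Value tupledCalls.Value
```

The curried version runs its body on both calls, while the tupled version hits the cache the second time.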
I think the underlying question may have been How to combine a memoizing function with a potentially costly calculating function that takes more than one argument?.
In this case, that second argument isn't needed. There's nothing inherently wrong with memoizing 2168612 elements (the size of the dictionary after the calculation).
Beware of overflow, since at 113383 the sequence surpasses System.Int32.MaxValue. A solution might thus look like this:
let memoRec f =
    let d = new System.Collections.Generic.Dictionary<_,_>()
    let rec g x =
        match d.TryGetValue x with
        | true, res -> res
        | _ -> let res = f g x in d.Add(x, res); res
    g
let collatzLong =
    memoRec (fun f n ->
        if n <= 1L then 0
        else 1 + f (if n % 2L = 0L then n / 2L else n * 3L + 1L) )
seq { 0L .. 999999L }
|> Seq.map (fun i -> i, collatzLong i)
|> Seq.maxBy snd
|> fst

Matching elements of two arrays in F#

I have two sequences of stock data, and I'm trying to line up the dates and combine the data so that I can pass it to other functions that will run some statistics on it. Essentially, I want to pass two (or more) sequences that look like:
sequenceA = [(float,DateTime)]
sequenceB = [(float,DateTime)]
to a function, and have it return a single sequence where all the data is properly aligned by DateTime. Something like:
return = [(float,float,DateTime)]
where the floats are the close prices of the two sequences for that DateTime.
I've tried using a nested for loop, and I'm fairly certain that should work (though I've had some trouble with it), but it seems like F#'s match expression should also be able to handle this. I've looked up some documentation and examples of match expressions, but I'm running into a number of different issues that I haven't been able to get past.
This is my most recent attempt at a simplified version of what I'm trying to accomplish. As you can see, I'm just trying to see if the first element of the sequence 'x' has the date "1/11/2011". The problem is that 1) it always returns "Yes", and 2) I can't figure out how to get from here to the whole sequence, and then ultimately 2+ sequences.
let x = seq[(1.0,System.DateTime.Parse("1/8/2011"));(2.0,System.DateTime.Parse("1/9/2011"))]
type t = seq<float*DateTime>
let align (a:t) =
    let testDate = System.DateTime.Parse("1/11/2011")
    let b = Seq.head(a)
    match snd b with
    | testDate -> printfn "Yes"
    | _ -> printfn "No"
align x
I'm relatively new to F#, but I'm fairly sure that this should be possible with a match expression. Any help would be much appreciated!
Your question has two parts:
As to the pattern matching: in the pattern that you have above, testDate is a new name that will be bound to the second item in tuple b (it shadows the testDate defined earlier). Both patterns will match any date, but since the first pattern always matches, your example always prints 'Yes'.
If you want to match on a specific date value, you can use the 'when' keyword in your pattern:
let dateValue = DateTime.Today
match dateValue with
| someDate when someDate = DateTime.Today -> "Today"
| _ -> "Not Today"
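To illustrate the difference between binding a name and comparing against a value, here is a small sketch using integers. `Target`, `target` and `describe` are made-up names. A [<Literal>] constant is matched by value, but DateTime values cannot be declared as literals, so for dates the 'when' guard is the form that applies.

```fsharp
[<Literal>]
let Target = 42   // a literal constant: a pattern named Target compares by value

let target = 42   // an ordinary value: a pattern named `target` would bind a new variable instead

let describe n =
    match n with
    | Target -> "equals the literal"
    | other when other = target + 1 -> "matched by guard"
    | _ -> "no match"

printfn "%s" (describe 42)
printfn "%s" (describe 43)
printfn "%s" (describe 7)
```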
If I had to implement the align function, I probably wouldn't try to use pattern matching. You can use Seq.groupBy to collect all entries with the same date.
///Groups two sequences together by key
let align a b =
    let simplifyEntry (key, values) =
        let prices = [for value in values -> snd value]
        key, prices
    a
    |> Seq.append b
    |> Seq.groupBy fst
    |> Seq.map simplifyEntry
    |> Seq.toList
//Demonstrate alignment of two sequences
let s1 = [DateTime.Today, 1.0]
let s2 = [
    DateTime.Today, 2.0
    DateTime.Today.AddDays(2.0), 10.0]
let pricesByDate = align s1 s2
for day, prices in pricesByDate do
    let pricesText =
        prices
        |> Seq.map string
        |> String.concat ", "
    printfn "%A %s" day pricesText
I happen to be working on a library for time series data, and it has a function for doing this. It is actually a bit more general, because it returns DateTime * float option * float option to represent the case when one series has a value for a given date but the other one does not.
The function assumes that the two series are already sorted, which means that it only needs to walk over them once (for unsorted sequences, you need to do multiple iterations or build some temporary tables).
Also note that the tuple elements are swapped compared with your example: you need to give it DateTime * float. The function is not particularly nice. It works on IEnumerable, which means that it needs to use mutable enumerators (and ugly imperative stuff, in general). Pattern matching just does not work well with sequences: you can get the head, but you cannot get the tail, because that would be inefficient. You could write a much nicer version for F# lists...
open System.Collections.Generic
let alignWithOrdering (seq1:seq<'T * 'TAddress>) (seq2:seq<'T * 'TAddress>) (comparer:IComparer<_>) = seq {
    let withIndex seq = Seq.mapi (fun i v -> i, v) seq
    use en1 = seq1.GetEnumerator()
    use en2 = seq2.GetEnumerator()
    let en1HasNext = ref (en1.MoveNext())
    let en2HasNext = ref (en2.MoveNext())
    let returnAll (en:IEnumerator<_>) hasNext f = seq {
        if hasNext then
            yield f en.Current
            while en.MoveNext() do yield f en.Current }
    let rec next () = seq {
        if not en1HasNext.Value then yield! returnAll en2 en2HasNext.Value (fun (k, i) -> k, None, Some i)
        elif not en2HasNext.Value then yield! returnAll en1 en1HasNext.Value (fun (k, i) -> k, Some i, None)
        else
            let en1Val, en2Val = fst en1.Current, fst en2.Current
            let comparison = comparer.Compare(en1Val, en2Val)
            if comparison = 0 then
                yield en1Val, Some(snd en1.Current), Some(snd en2.Current)
                en1HasNext := en1.MoveNext()
                en2HasNext := en2.MoveNext()
                yield! next()
            elif comparison < 0 then
                yield en1Val, Some(snd en1.Current), None
                en1HasNext := en1.MoveNext()
                yield! next ()
            else
                yield en2Val, None, Some(snd en2.Current)
                en2HasNext := en2.MoveNext()
                yield! next () }
    yield! next () }
Assuming that we want to use strings as keys (rather than your DateTime), you can call it like this:
alignWithOrdering
    [ ("b", 0); ("c", 1); ("d", 2) ]
    [ ("a", 0); ("b", 1); ("c", 2) ] (Comparer<string>.Default) |> List.ofSeq
// Returns
[ ("a", None, Some 0); ("b", Some 0, Some 1);
("c", Some 1, Some 2); ("d", Some 2, None) ]
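For comparison, the list-based version hinted at above might look something like this. It is a sketch, not the library's code: it assumes both lists are sorted by key with unique keys, it uses the built-in comparison operators instead of an IComparer, and it is not tail-recursive, so it is only suitable for moderately sized lists. The name `alignLists` is made up.

```fsharp
// Aligns two key-sorted lists, producing (key, value from xs, value from ys).
let rec alignLists xs ys =
    match xs, ys with
    | [], rest -> rest |> List.map (fun (k, v) -> k, None, Some v)
    | rest, [] -> rest |> List.map (fun (k, v) -> k, Some v, None)
    | (kx, vx) :: xt, (ky, vy) :: yt ->
        if kx = ky then (kx, Some vx, Some vy) :: alignLists xt yt
        elif kx < ky then (kx, Some vx, None) :: alignLists xt ys
        else (ky, None, Some vy) :: alignLists xs yt

let aligned =
    alignLists
        [ ("b", 0); ("c", 1); ("d", 2) ]
        [ ("a", 0); ("b", 1); ("c", 2) ]

printfn "%A" aligned
```

Here pattern matching on head and tail replaces the mutable enumerators, which is the "nicer" part.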
If you're interested in working with time series of stock data in F#, you might be interested in joining the F# for Data and Machine Learning working group of the F# Foundation. We're currently working on an open-source library with support for time series that makes this much nicer :-). If you're interested in looking at & contributing to the early preview, then you can do that via this working group.
open System
let x = seq[(1.0,System.DateTime.Parse("1/8/2011"));(2.0,DateTime.Parse("1/9/2011"))]
//type t = seq<float*DateTime>
let (|EqualDate|_|) str dt=
    DateTime.TryParse str|>function
    |true,x when x=dt->Some()
    |_->None
let align a =
    //let testDate = System.DateTime.Parse("1/11/2011")
    let b = Seq.head(a)
    match b with
    |_,EqualDate "1/9/2011" -> printfn "Yes"
    | _ -> printfn "No"
align x
x|>Seq.skip 1|>align

print Large Lists in F#

I am trying to print a large list with F# and am having a difficult time. I am trying to create a lexical analyzer in F#. I believe I am done, but I can't seem to get it to print the entire list so that I can check it.
Here is an example of what I am trying to do:
let modifierReg = Regex("(public|private)");
let isModifier str = if (modifierReg.IsMatch(str)) then ["Modifier"; str] else ["Keyword"; str]
let readLines filePath = seq {
    use sr = new StreamReader (filePath:string)
    while not sr.EndOfStream do
        yield sr.ReadLine () }
let splitLines listArray =
    listArray
    |> Seq.map (fun (line: string) -> let m = Regex.Match(line, commentReg) in if m.Success then (m.Groups.Item 1).Value.Split([|' '|], System.StringSplitOptions.RemoveEmptyEntries) else line.Split([|' '|], System.StringSplitOptions.RemoveEmptyEntries) )
let res =
    string1
    |> readLines
    |> splitLines
let scanLines lexicons =
    lexicons
    |> Seq.map (fun strArray -> strArray |> Seq.map (fun str -> isModifier(str)))
let printSeq seq =
    printfn "%A" seq
let scanner filePath =
    filePath
    |> readLines
    |> splitLines
    |> scanLines
let scannerResults = scanner pathToCode
printSeq scannerResults
When I try to print the list I get the following
seq
[seq [["Keyword"; "class"]; ["Identifier"; "A"]]; seq [["Block"; "{"]];
seq [["Modifier"; "public"]; ["Type"; "int"]; ["Identifier"; "x;"]];
seq [["Modifier"; "public"]; ["Type"; "int"]; ["Identifier"; "y;"]]; ...]
I can't get it to print any further. I get the same behavior with something as simple as the following
printfn "%A" [1 .. 101]
I can't seem to figure out how to print it off. Anyone have any experience with this? I can't seem to find any examples
Seq.iter will iterate over all the elements of a sequence, so e.g.
somelist|> Seq.iter (printfn "%A")
will print each of the elements. (The "%A" specifier is good at the common case for printing arbitrary data, but for large lists or whatnot, you can exercise finer control, as here, by iterating over every element and printing each individually, e.g. on a new line as above.)
You're not working with lists, you're working with sequences. Since sequences may be infinite, printf and friends only output the first N elements. Makes sense.
Brain and Daniel have already answered your question. I would add that %A uses reflection to print the object passed to printfn. In your case the object is not a simple list of items but a list of lists of lists and so on, which is basically a tree. If this tree is too large, printfn "%A" would pose a performance problem, and you would need to write your own print function that traverses the tree and prints it.
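A hand-written print function of that kind could be sketched as below. The `tokens` value is made-up data shaped like the scanner output in the question (a sequence of lines, each a sequence of [kind; text] lists), and `formatLine` is a hypothetical helper.

```fsharp
// Hypothetical data shaped like the scanner output in the question.
let tokens : seq<seq<string list>> =
    seq [ seq [ [ "Keyword"; "class" ]; [ "Identifier"; "A" ] ]
          seq [ [ "Block"; "{" ] ] ]

// Render one line of tokens as "kind:text kind:text ...".
let formatLine (line: seq<string list>) =
    line
    |> Seq.map (fun token -> String.concat ":" token)
    |> String.concat " "

// Seq.iter visits every element, so nothing is truncated.
tokens |> Seq.iter (fun line -> printfn "%s" (formatLine line))
```

Because each level is walked explicitly, the output is not cut off the way "%A" output is, and no reflection is involved.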
