print Large Lists in F# - arrays

I am trying to print a large list with F# and am having a difficult time. I am trying to create a lexical analyzer in F#. I believe I am done, but I can't seem to get it to print the entire list to check it.
Here is an example of what I am trying to do:
let modifierReg = Regex("(public|private)");
let isModifier str = if (modifierReg.IsMatch(str)) then ["Modifier"; str] else ["Keyword"; str]
let readLines filePath = seq {
    use sr = new StreamReader (filePath:string)
    while not sr.EndOfStream do
        yield sr.ReadLine () }
let splitLines listArray =
    listArray
    |> Seq.map (fun (line: string) -> let m = Regex.Match(line, commentReg) in if m.Success then (m.Groups.Item 1).Value.Split([|' '|], System.StringSplitOptions.RemoveEmptyEntries) else line.Split([|' '|], System.StringSplitOptions.RemoveEmptyEntries) )
let res =
    string1
    |> readLines
    |> splitLines
let scanLines lexicons =
    lexicons
    |> Seq.map (fun strArray -> strArray |> Seq.map (fun str -> isModifier(str)))
let printSeq seq =
    printfn "%A" seq
let scanner filePath =
    filePath
    |> readLines
    |> splitLines
    |> scanLines
let scannerResults = scanner pathToCode
printSeq scannerResults
When I try to print the list I get the following
seq
[seq [["Keyword"; "class"]; ["Identifier"; "A"]]; seq [["Block"; "{"]];
seq [["Modifier"; "public"]; ["Type"; "int"]; ["Identifier"; "x;"]];
seq [["Modifier"; "public"]; ["Type"; "int"]; ["Identifier"; "y;"]]; ...]
I can't get it to print any further. I get the same behavior with something as simple as the following
printfn "%A" [1 .. 101]
I can't seem to figure out how to print all of it. Does anyone have any experience with this? I can't seem to find any examples.

Seq.iter will iterate over all the elements of a sequence, so e.g.
somelist |> Seq.iter (printfn "%A")
will print each of the elements. (The "%A" specifier is good for the common case of printing arbitrary data, but for large lists and the like you can exercise finer control, as here, by iterating over every element and printing each one individually, e.g. on a new line as above.)

You're not working with lists, you're working with sequences. Since sequences may be infinite, printf and friends only output the first N elements. Makes sense.

Brian and Daniel have already answered your question. I would add that %A uses reflection to print the object passed to printfn. In your case it is not a simple list of items but rather a list of lists of lists and so on, which is basically a tree. If this tree is too large, printfn "%A" would pose a performance problem, and you would need to write your own print function that traverses the tree and prints it.
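A minimal sketch of such a hand-written print function, assuming the results have the shape seq<seq<string list>> shown in the question. The names sampleResults and printAll are hypothetical and only stand in for the real scanner output:

```fsharp
// Hypothetical stand-in for scannerResults: a sequence of sequences of string lists.
let sampleResults : seq<seq<string list>> =
    seq {
        for i in 1 .. 150 ->
            seq { yield ["Type"; "int"]; yield ["Identifier"; sprintf "x%d" i] }
    }

// Walks every line and every token explicitly, so nothing is cut off with "...".
let printAll (results: seq<seq<string list>>) =
    for line in results do
        for token in line do
            printf "%A " token
        printfn ""

printAll sampleResults
```

Each inner list is small, so "%A" prints it in full; the truncation only affects the outer sequences, which are iterated by hand here.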

Related

F#: How to iterate through a List of Arrays of Strings (string [] list) the functional way

I'm a newbie at F#.
I've got a list that contains arrays; each array contains 7 strings.
I want to loop through the arrays and do some kind of Array.map later on.
However, my problem is that I can't send individual arrays to some other function.
I don't want to use for-loops, but rather focus on the functional way, using pipelines and mapping only.
let stockArray =
    [[|"2012-03-30"; "32.40"; "32.41"; "32.04"; "32.26"; "31749400"; "32.26"|];
     [|"2012-03-29"; "32.06"; "32.19"; "31.81"; "32.12"; "37038500"; "32.12"|];
     [|"2012-03-28"; "32.52"; "32.70"; "32.04"; "32.19"; "41344800"; "32.19"|];
     [|"2012-03-27"; "32.65"; "32.70"; "32.40"; "32.52"; "36274900"; "32.52"|];]
let tryout =
    stockArray
    |> List.iter;;
Output complains about List.iter:
error FS0001: Type mismatch. Expecting a
'string [] list -> 'a' but given a
'('b -> unit) -> 'b list -> unit'
The type 'string [] list' does not match the type ''a -> unit'
When trying Array.iter, same difference:
error FS0001: Type mismatch. Expecting a
'string [] list -> 'a' but given a
'('b -> unit) -> 'b [] -> unit'
The type 'string [] list' does not match the type ''a -> unit'
In C# I would simply go about it with a foreach to start treating my arrays one at a time, but with F# I feel real stuck.
Thank you for your help
The question is not clear, even with the extra comments. Anyway, I think you will finally be able to figure out your needs from this answer.
I have implemented parseDate and parseFloat in such a way that I expect it to work on any machine, whatever locale, with the given data. You may want something else for your production application. Also, how theInt is calculated is perhaps not what you want.
List.iter, as you already discovered, converts data to unit, effectively throwing away data. So what's the point in that? It is usually placed last when used in a pipe sequence, often doing some work that involves side effects (e.g. printing out data) or mutable data operations (e.g. filling a mutable list with items). I suggest you study functions in the List, Array, Seq and Option modules, to see how they're used to transform data.
open System
open System.Globalization
let stockArray =
    [
        [| "2012-03-30"; "32.40"; "32.41"; "32.04"; "32.26"; "31749400"; "32.26" |]
        [| "2012-03-29"; "32.06"; "32.19"; "31.81"; "32.12"; "37038500"; "32.12" |]
        [| "2012-03-28"; "32.52"; "32.70"; "32.04"; "32.19"; "41344800"; "32.19" |]
        [| "2012-03-27"; "32.65"; "32.70"; "32.40"; "32.52"; "36274900"; "32.52" |]
    ]
type OutData = { TheDate: DateTime; TheInt: int }
let parseDate s = DateTime.ParseExact (s, "yyyy-MM-dd", CultureInfo.InvariantCulture)
let parseFloat s = Double.Parse (s, CultureInfo.InvariantCulture)
let myFirstMap (inArray: string[]) : OutData =
    if inArray.Length <> 7 then
        failwith "Expected array with seven strings."
    else
        let theDate = parseDate inArray.[0]
        let f2 = parseFloat inArray.[2]
        let f3 = parseFloat inArray.[3]
        let f = f2 - f3
        let theInt = int f
        { TheDate = theDate; TheInt = theInt }
let tryout =
    stockArray
    |> List.map myFirstMap
The following is an alternative implementation of myFirstMap. I guess some would say it's more idiomatic, but I would just say that what you prefer to use depends on what you might expect from a possible future development.
let myFirstMap inArray =
    match inArray with
    | [| sDate; _; s2; s3; _; _; _ |] ->
        let theDate = parseDate sDate
        let f2 = parseFloat s2
        let f3 = parseFloat s3
        let f = f2 - f3
        let theInt = int f
        { TheDate = theDate; TheInt = theInt }
    | _ -> failwith "Expected array with seven strings."
The pipe operator |> lets you write f x as x |> f.
The signature of List.iter is:
action: ('a -> unit) -> list: ('a list) -> unit
You give it an action, then a list, and it gives you back unit (F#'s counterpart of void).
You can read it thus: when you give List.iter an action, its type will be
list: ('a list) -> unit
a function to which you can pass a list.
So when you write stockArray |> List.iter, what you're actually trying to give it in place of an action is your list - that's the error. So pass in an action:
let tryout = List.iter (fun arr -> printfn "%A" arr) stockArray
which can be rewritten as:
let tryout = stockArray |> List.iter (fun arr -> printfn "%A" arr)
However my problem is that I can't send individual arrays to some other function
List.map and similar functions allow you to do precisely this - you don't need to iterate the list yourself.
For example, this will return just the first element of each array in your list:
stockArray
|> List.map (fun x -> x.[0])
You can replace the function passed to List.map with any function that operates on one array and returns some value.

How do I write in text file, when I've created a file

This is my first time writing here.
I'm new to F# and wanted to get some help.
I've made a program that's supposed to take the words out of an existing text file, edit them, and write them to a new text file, ordered from most frequent word to least frequent.
I've got most of it working, but when the text file appears, inside it says:
System.Tuple`2[System.String,System.Int32][]
Here's my code:
let reg = RegularExpressions.Regex "\s+"
let cleanEx = RegularExpressions.Regex "[\,\.\!\"\:\;\?\-]"
let read = (File.OpenText "clep.txt").ReadToEnd()
let clen = (cleanEx.Replace(read, "")).ToLower()
let clean = reg.Split(clen)
let finAr = Array.countBy id clean
let finlist = Array.sortByDescending (fun (_, count) -> count) finAr
// printfn "%A" finlist
let string = finlist.ToString()
let writer = File.AppendText("descend.txt")
writer.WriteLine(finlist);
writer.Close();
Why do you see the following?
System.Tuple`2[System.String,System.Int32][]
Because finAr is an array of tuples (string * int), and finlist is an array of the same items, but ordered by count. When you call finlist.ToString(), it does not give you a string representation of the array items. By default (if not overridden), ToString() returns the full name of the object's type, which is an array of tuples in your case.
Now, what do you need in order to write a file of words in frequency order? Just map the array items to strings:
let lines =
    clean
    |> Array.countBy id // finAr
    |> Array.sortByDescending (fun (_,count) -> count) // finlist
    |> Array.map (fun (word, _) -> word) // here mapping each tuple to string
File.WriteAllLines("descent.txt", lines)
With a couple of wrappers, you can pipe operations related to reading file and writing to file:
"clep.txt"
|> readTextFile
|> getWordsMostFrequestFirst
|> writeLinesToFile "descent.txt"
Wrappers:
let readTextFile (path: string) =
    (File.OpenText path).ReadToEnd()
let writeLinesToFile (path: string) (contents: string seq) =
    File.WriteAllLines(path, contents)
And a function which processes text:
let getWordsMostFrequestFirst (text: string) =
    let splitByWhitespaces (input: string) = Regex.Split(input, "\s+")
    let toLower (input: string) = input.ToLower()
    let removeDelimiters (input: string) = Regex.Replace(input, "[\,\.\!\"\:\;\?\-]", "")
    text
    |> removeDelimiters
    |> toLower
    |> splitByWhitespaces
    |> Array.countBy id
    |> Array.sortByDescending snd // easy way to get tuple items
    |> Array.map fst
You're only writing a single line of text to the file, and because finlist is not a type for which StreamWriter.WriteLine() has a specific overload, it is treated as object, and the string used is the result of finlist.ToString(), which, as is common with built-in .NET types, is just the type name.
If you want to write the actual elements of the array to the file, you need to actually process the array.
This would write the string parts from all the tuples to the text file:
finlist
|> Array.map fst
|> Array.iter writer.WriteLine
To include the numbers, for example in the format "text: 1", you would have to create an appropriately formatted string for each array item first:
finlist
|> Array.map (fun (text, number) -> sprintf "%s: %i" text number)
|> Array.iter writer.WriteLine
By the way, .NET strings use \ for escaping characters, just like regular expressions do, so your RegExes may not behave the way you've written them. It is safer to write them as verbatim strings:
let reg = RegularExpressions.Regex @"\s+"
let cleanEx = RegularExpressions.Regex @"[\,\.\!\""\:\;\?\-]"
There are two changes here: the @ before each string tells the compiler not to use \ to escape characters (alternatively, you can write every single backslash in a RegEx as \\, but that doesn't make it any more readable). In the middle of the second one, another " escapes the double quote, because otherwise it would now terminate the string, and the line wouldn't compile anymore.
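To make the two ways of writing a backslash concrete, here is a small sketch comparing a regular string with doubled backslashes against a verbatim string (prefix @). The names escaped, verbatim and quoted are only illustrative:

```fsharp
let escaped = "\\s+"     // regular string: every backslash is doubled
let verbatim = @"\s+"    // verbatim string: backslashes are taken literally
printfn "%b" (escaped = verbatim)   // prints true: both contain \s+

let quoted = @"[""]"     // inside a verbatim string, "" stands for one double quote
printfn "%d" quoted.Length          // prints 3: the characters [ " ]
```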

F# fsharp Fast way of finding a string "starting with" from two arrays

I have a performance problem with large arrays (50k items each). Given two arrays, what would be the fastest way of finding a string that starts with another string? I have tried different things, but the code below seems to be as good as I can get it.
let findFile (f:string, p:string, pc:string, pcn:string) =
    f.StartsWith(p + "-" + pc) ||
    f.StartsWith(p + "_" + pc) ||
    f.StartsWith(p + "-" + pcn) ||
    f.StartsWith(p + "_" + pcn)
products
|> Array.Parallel.map (fun i p ->
    allFiles |> Array.Parallel.map (fun f ->
        if findFile (f.Filename, p.Style, p.ColorCode, p.ColorName)
        then {p with Filename = f.Filename }
        else p
    ))
Thank you in advance.
First, I would recommend sanitizing the filenames by splitting out the two parts and, if possible, removing the rest:
Split the filenames on the '-' or '_' character so that you compare tuples of (style * color) instead of comparing strings twice. Also, if at all possible, distinguish between uses of the color code and uses of the color name, and separate them into two arrays.
Now you have two options: use a Dictionary or sort the values.
Dictionary: take the longer list and put it in a dictionary. Scan the shorter list, looking up the values. Dictionaries use hash tables, which makes lookups very efficient, and comparisons are also very fast. This requires that you use only the style and color code/name as the key, leaving the rest of the string out.
The solution could look like this:
let dict () =
    let dict = new Dictionary<_, _>()
    allFiles |> Seq.iter (fun f -> f.Filename.Split '-' |> fun a -> dict.Add((a.[0], a.[1]), f) )
    products
    |> Array.Parallel.map (fun p ->
        let vRef = ref { Filename = "" }
        if dict.TryGetValue((p.Style, p.ColorCode) , vRef)
        then {p with Filename = (!vRef).Filename }
        else p
    )
If that is not possible, then consider sorting:
Sort both lists, products and filenames. Then scan both ordered lists simultaneously, with one index each, advancing only the index that points at the lower value each time.
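The sorted scan could be sketched roughly as follows. This is only an illustration under simplifying assumptions: plain string prefixes (e.g. "style-color") and plain file names replace the product and file records from the question, ordinal comparison is used throughout, and matchSorted is a hypothetical name:

```fsharp
open System

// Hypothetical simplified inputs: string prefixes and string file names.
let matchSorted (prefixes: string[]) (fileNames: string[]) =
    let ordinal (a: string) (b: string) = String.CompareOrdinal(a, b)
    let ps = Array.sortWith ordinal prefixes
    let fs = Array.sortWith ordinal fileNames
    let results = ResizeArray<string * string>()
    let mutable i = 0
    let mutable j = 0
    while i < ps.Length && j < fs.Length do
        if fs.[j].StartsWith(ps.[i], StringComparison.Ordinal) then
            results.Add((ps.[i], fs.[j]))   // match found: move on to the next file
            j <- j + 1
        elif String.CompareOrdinal(ps.[i], fs.[j]) < 0 then
            i <- i + 1                      // the prefix is the lower value: advance it
        else
            j <- j + 1                      // the file name is the lower value: advance it
    results.ToArray()

matchSorted [| "b_blue"; "a-red" |] [| "b_blue_2.png"; "a-red_1.png"; "c-green.png" |]
|> printfn "%A"
```

After the two sorts, the scan itself touches each element at most once. If one prefix is itself the start of another prefix, each file is paired with the shorter one, which may or may not be what you want.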
One more thing:
If you still want to do string comparisons you should consider using compiled Regex which are very efficient. Your regex could be something like: ^code[-_](red|FF0000) which would match any of the 4 values:
code-red
code_red
code-FF0000
code_FF0000
This is how you use compiled Regex:
// Enum flags are combined with ||| in F#
let regex = new Regex(sprintf "^%s[-_](%s|%s)" p.Style p.ColorCode p.ColorName, RegexOptions.Singleline ||| RegexOptions.Compiled)
for i in 1..30 do
    if regex.IsMatch(sprintf "code-%d" i) then printfn "%A" i

Is there a simple way to print each element of an array?

let x=[|15..20|]
let y=Array.map f x
printf "%O" y
Well, I only got type information.
Is there any way to print each element of "y" with "," as the delimiter, without having to use a for loop?
Either use String.Join in the System namespace or F# 'native':
let x = [| 15 .. 20 |]
printfn "%s" (System.String.Join(",", x))
x |> Seq.map string |> String.concat "," |> printfn "%s"
Using String.concat to concatenate the string with a separator is probably the best option in this case (because you do not want to have the separator at the end).
However, if you just wanted to print all elements, you can also use Array.iter:
let nums= [|15..20|]
Array.iter (fun x -> printfn "%O" x) nums // Using function call
nums |> Array.iter (fun x -> printfn "%O" x) // Using the pipe
Adding the separators in this case is harder, but possible using iteri:
nums |> Array.iteri (fun i x ->
    if i <> 0 then printf ", "
    printf "%O" x)
This won't print the entire array if it is large; I think it prints only the first 100 elements. Still, I suspect this is what you're after:
printfn "%A" y
If the array of items is large and you do not want to generate a large string, another option is to generate an interleaved sequence and skip the first item. The following code works assuming the array has at least one element.
One advantage of this approach is that it cleanly separates the act of interleaving the items and that of printing. It also eliminates having to do a check for the first item on every iteration.
let items = [| 15 .. 20|]
let strInterleaved delimiter items =
    items
    |> Seq.collect (fun item -> seq { yield delimiter; yield item})
    |> Seq.skip(1)
items
|> Seq.map string
|> strInterleaved ","
|> Seq.iter (printf "%s")

Matching elements of two arrays in F#

I have two sequences of stock data, and I'm trying to line up the dates and combine the data so that I can pass it to other functions that will run some statistics on it. Essentially, I want to pass two (or more) sequences that look like:
sequenceA = [(float,DateTime)]
sequenceB = [(float,DateTime)]
to a function, and have it return a single sequence where all the data is properly aligned by DateTime. Something like:
return = [(float,float,DateTime)]
where the floats are the close prices of the two sequences for that DateTime.
I've tried using a nested for loop, and I'm fairly certain that should work (though I've had some trouble with it), but it seems like F#'s match expression should also be able to handle this. I've looked up some documentation and examples of match expressions, but I'm running into a number of different issues that I haven't been able to get past.
This is my most recent attempt at a simplified version of what I'm trying to accomplish. As you can see, I'm just trying to see if the first element of the sequence 'x' has the date "1/11/2011". The problem is that 1) it always returns "Yes", and 2) I can't figure out how to get from here to the whole sequence, and then ultimately 2+ sequences.
let x = seq[(1.0,System.DateTime.Parse("1/8/2011"));(2.0,System.DateTime.Parse("1/9/2011"))]
type t = seq<float*DateTime>
let align (a:t) =
    let testDate = System.DateTime.Parse("1/11/2011")
    let b = Seq.head(a)
    match snd b with
    | testDate -> printfn "Yes"
    | _ -> printfn "No"
align x
I'm relatively new to F#, but I'm fairly sure that this should be possible with a match expression. Any help would be much appreciated!
Your question has two parts:
As to the pattern matching: in the pattern that you have above, testDate is a new name that will be bound to the second item in tuple b; it does not refer to the testDate value defined earlier. Both patterns will match any date, but since the first pattern matches, your example always prints 'Yes'.
If you want to match on a specific date value, you can use the 'when' keyword in your pattern:
let dateValue = DateTime.Today
match dateValue with
| someDate when someDate = DateTime.Today -> "Today"
| _ -> "Not Today"
If I had to implement the align function, I probably wouldn't try to use pattern matching. You can use Seq.groupBy to collect all entries with the same date.
///Groups two sequences together by key
let align a b =
    let simplifyEntry (key, values) =
        let prices = [for value in values -> snd value]
        key, prices
    a
    |> Seq.append b
    |> Seq.groupBy fst
    |> Seq.map simplifyEntry
    |> Seq.toList
//Demonstrate alignment of two sequences
let s1 = [DateTime.Today, 1.0]
let s2 = [
    DateTime.Today, 2.0
    DateTime.Today.AddDays(2.0), 10.0]
let pricesByDate = align s1 s2
for day, prices in pricesByDate do
    let pricesText =
        prices
        |> Seq.map string
        |> String.concat ", "
    printfn "%A %s" day pricesText
I happen to be working on a library for working with time series data and it has a function for doing this - it is actually a bit more general, because it returns DateTime * float option * float option to represent the case when one series has value for a specified date, but the other one does not.
The function assumes that the two series are already sorted - which means that it only needs to walk over them once (for not-sorted sequences, you need to do multiple iterations or build some temporary tables).
Also note that the tuple elements are swapped compared to your example: you need to give it DateTime * float. The function is not particularly nice. It works on IEnumerable, which means that it needs to use mutable enumerators (and ugly imperative stuff in general). In general, pattern matching just does not work well with sequences: you can get the head, but you cannot get the tail, because that would be inefficient. You could write a much nicer one for F# lists...
open System.Collections.Generic
let alignWithOrdering (seq1:seq<'T * 'TAddress>) (seq2:seq<'T * 'TAddress>) (comparer:IComparer<_>) = seq {
    let withIndex seq = Seq.mapi (fun i v -> i, v) seq
    use en1 = seq1.GetEnumerator()
    use en2 = seq2.GetEnumerator()
    let en1HasNext = ref (en1.MoveNext())
    let en2HasNext = ref (en2.MoveNext())
    let returnAll (en:IEnumerator<_>) hasNext f = seq {
        if hasNext then
            yield f en.Current
            while en.MoveNext() do yield f en.Current }
    let rec next () = seq {
        if not en1HasNext.Value then yield! returnAll en2 en2HasNext.Value (fun (k, i) -> k, None, Some i)
        elif not en2HasNext.Value then yield! returnAll en1 en1HasNext.Value (fun (k, i) -> k, Some i, None)
        else
            let en1Val, en2Val = fst en1.Current, fst en2.Current
            let comparison = comparer.Compare(en1Val, en2Val)
            if comparison = 0 then
                yield en1Val, Some(snd en1.Current), Some(snd en2.Current)
                en1HasNext := en1.MoveNext()
                en2HasNext := en2.MoveNext()
                yield! next()
            elif comparison < 0 then
                yield en1Val, Some(snd en1.Current), None
                en1HasNext := en1.MoveNext()
                yield! next ()
            else
                yield en2Val, None, Some(snd en2.Current)
                en2HasNext := en2.MoveNext()
                yield! next () }
    yield! next () }
Assuming that we want to use strings as keys (rather than your DateTime), you can call it like this:
alignWithOrdering
    [ ("b", 0); ("c", 1); ("d", 2) ]
    [ ("a", 0); ("b", 1); ("c", 2) ] (Comparer<string>.Default) |> List.ofSeq
// Returns
[ ("a", None, Some 0); ("b", Some 0, Some 1);
("c", Some 1, Some 2); ("d", Some 2, None) ]
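The remark that a nicer version can be written for F# lists could look something like the sketch below. It is not the library function mentioned above; alignLists is a hypothetical name, both lists are assumed to be sorted by key already, and the recursion is not tail-recursive, so very long lists would need an accumulator-based variant:

```fsharp
// Aligns two key-sorted lists of (key, value) pairs using list pattern matching.
let rec alignLists xs ys =
    match xs, ys with
    | [], [] -> []
    | (k, v) :: rest, [] -> (k, Some v, None) :: alignLists rest []
    | [], (k, v) :: rest -> (k, None, Some v) :: alignLists [] rest
    | (k1, v1) :: r1, (k2, v2) :: r2 ->
        if k1 = k2 then (k1, Some v1, Some v2) :: alignLists r1 r2   // key in both lists
        elif k1 < k2 then (k1, Some v1, None) :: alignLists r1 ys    // key only in the first
        else (k2, None, Some v2) :: alignLists xs r2                 // key only in the second

alignLists
    [ ("b", 0); ("c", 1); ("d", 2) ]
    [ ("a", 0); ("b", 1); ("c", 2) ]
|> printfn "%A"
```

With the same inputs as the call above, this yields the same four aligned entries for "a", "b", "c" and "d".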
If you're interested in working with time series of stock data in F#, you might be interested in joining the F# for Data and Machine Learning working group of the F# Foundation. We're currently working on an open-source library with support for time series that makes this much nicer :-). If you're interested in looking at & contributing to the early preview, then you can do that via this working group.
open System
let x = seq[(1.0,System.DateTime.Parse("1/8/2011"));(2.0,DateTime.Parse("1/9/2011"))]
//type t = seq<float*DateTime>
let (|EqualDate|_|) str dt=
    DateTime.TryParse str|>function
    |true,x when x=dt->Some()
    |_->None
let align a =
    //let testDate = System.DateTime.Parse("1/11/2011")
    let b = Seq.head(a)
    match b with
    |_,EqualDate "1/9/2011" -> printfn "Yes"
    | _ -> printfn "No"
align x
x|>Seq.skip 1|>align
