F# CSV - for each row create an array from columns data - arrays

I have a CSV file, where the fst column is a title and next 700+ columns are some int data.
Title D1 D2 D3 D4 .. D700
Name1 0 1 7 5 48
I try to use CsvProvider to read the file and then convert data to my custom type
type DigitRecord = { Title:string; Digits:int[] }
The problem is I don't know how to put all column data (except the first one with a title) into a int[] array.
let dataRecords =
CSV.Rows
|> Seq.map (fun record -> {Title = record.Title; Digits = ???})
I want to get a record with Title=Name1 and Digits=[|0,1,7,5...48|]
I'm newbie in F#, I'd be grateful for any help!

I think the easiest way is to use CsvParser like this:
let readData (path : string) seps =
CsvFile.Load(path, seps).Rows
|> Seq.map
(fun row -> row.Columns.[0], row.Columns |> Array.skip 1 |> Array.map int)
|> Seq.map
(fun (title, digits) -> {Title = title; Digits = digits})

Related

F#: How to iterate through a List of Arrays of Strings (string [] list) the functional way

I'm a newbie at F#,
I've got a List that contains arrays, each arrays contains 7 Strings.
I want to loop through the Arrays and do some kind of Array.map later on,
However my problem is that I can't send individual arrays to some other function.
I don't want to use for-loops but focus on the functional way using pipelines and mapping only.
let stockArray =
[[|"2012-03-30"; "32.40"; "32.41"; "32.04"; "32.26"; "31749400"; "32.26"|];
[|"2012-03-29"; "32.06"; "32.19"; "31.81"; "32.12"; "37038500"; "32.12"|];
[|"2012-03-28"; "32.52"; "32.70"; "32.04"; "32.19"; "41344800"; "32.19"|];
[|"2012-03-27"; "32.65"; "32.70"; "32.40"; "32.52"; "36274900"; "32.52"|];]
let tryout =
stockArray
|> List.iter;;
Output complains about List.iter:
error FS0001: Type mismatch. Expecting a
'string [] list -> 'a' but given a
'('b -> unit) -> 'b list -> unit'
The type 'string [] list' does not match the type ''a -> unit'
When trying Array.iter, same difference:
error FS0001: Type mismatch. Expecting a
'string [] list -> 'a' but given a
'('b -> unit) -> 'b [] -> unit'
The type 'string [] list' does not match the type ''a -> unit'
In C# I would simply go about it with a foreach to start treating my arrays one at a time, but with F# I feel real stuck.
Thank you for your help
The question is not clear, even with the extra comments. Anyway, I think you will finally be able to figure out your needs from this answer.
I have implemented parseDate and parseFloat in such a way that I expect it to work on any machine, whatever locale, with the given data. You may want something else for your production application. Also, how theInt is calculated is perhaps not what you want.
List.iter, as you already discovered, converts data to unit, effectively throwing away data. So what's the point in that? It is usually placed last when used in a pipe sequence, often doing some work that involves side effects (e.g. printing out data) or mutable data operations (e.g. filling a mutable list with items). I suggest you study functions in the List, Array, Seq and Option modules, to see how they're used to transform data.
open System
open System.Globalization
let stockArray =
[
[| "2012-03-30"; "32.40"; "32.41"; "32.04"; "32.26"; "31749400"; "32.26" |]
[| "2012-03-29"; "32.06"; "32.19"; "31.81"; "32.12"; "37038500"; "32.12" |]
[| "2012-03-28"; "32.52"; "32.70"; "32.04"; "32.19"; "41344800"; "32.19" |]
[| "2012-03-27"; "32.65"; "32.70"; "32.40"; "32.52"; "36274900"; "32.52" |]
]
type OutData = { TheDate: DateTime; TheInt: int }
let parseDate s = DateTime.ParseExact (s, "yyyy-MM-dd", CultureInfo.InvariantCulture)
let parseFloat s = Double.Parse (s, CultureInfo.InvariantCulture)
let myFirstMap (inArray: string[]) : OutData =
if inArray.Length <> 7 then
failwith "Expected array with seven strings."
else
let theDate = parseDate inArray.[0]
let f2 = parseFloat inArray.[2]
let f3 = parseFloat inArray.[3]
let f = f2 - f3
let theInt = int f
{ TheDate = theDate; TheInt = theInt }
let tryout =
stockArray
|> List.map myFirstMap
The following is an alternative implementation of myFirstMap. I guess some would say it's more idiomatic, but I would just say that what you prefer to use depends on what you might expect from a possible future development.
let myFirstMap inArray =
match inArray with
| [| sDate; _; s2; s3; _; _; _ |] ->
let theDate = parseDate sDate
let f2 = parseFloat s2
let f3 = parseFloat s3
let f = f2 - f3
let theInt = int f
{ TheDate = theDate; TheInt = theInt }
| _ -> failwith "Expected array with seven strings."
The pipe operator |> is used to write an f x as x |> f.
The signature of List.iter is:
action: ('a -> unit) -> list: ('a list) -> unit
You give it an action, then a list, and it gives you a void.
You can read it thus: when you give List.iter an action, its type will be
list: ('a list) -> unit
a function to which you can pass a list.
So when you write stockArray |> List.iter, what you're actually trying to give it in place of an action is your list - that's the error. So pass in an action:
let tryout = List.iter (fun arr -> printfn "%A" arr) stockArray
which can be rewritten as:
let tryout = stockArray |> List.iter (fun arr -> printfn "%A" arr)
However my problem is that I can't send individual arrays to some other function
List.map and similar functions allow you to do precisely this - you don't need to iterate the list yourself.
For example, this will return just the first element of each array in your list:
stockArray
|> List.map (fun x -> x.[0])
You can replace the function passed to List.map with any function that operates on one array and returns some value.

How do I write in text file, when I've created a file

this is my first time writing in here.
I'm new to f# and wanted to get some help.
I've made a program that's supposed to take words out of an existing text file, edit it and write it in a new text file, in order by most frequent word to least.
I've made the most, but when the text file appears, but inside it says:
System.Tuple`2[System.String,System.Int32][]
Here's my code:
let reg = RegularExpressions.Regex "\s+"
let cleanEx = RegularExpressions.Regex "[\,\.\!\"\:\;\?\-]"
let read = (File.OpenText "clep.txt").ReadToEnd()
let clen = (cleanEx.Replace(read, "")).ToLower()
let clean = reg.Split(clen)
let finAr = Array.countBy id clean
let finlist = Array.sortByDescending (fun (_, count) -> count) finAr
// printfn "%A" finlist
let string = finlist.ToString()
let writer = File.AppendText("descend.txt")
writer.WriteLine(finlist);
writer.Close();
Why do you see?
System.Tuple`2[System.String,System.Int32][]
Because finAr is an array of tuples (string*int) and finlist is the array of same items, but ordered by count. When you do finlist.ToString() it does not give you a string representation of array items. ToString() by default (if not overridden) return full name of the object type. Which is array of tuples in your case.
Now what do you need to write a file of words in the frequency order? Just mapping array items to strings:
let lines =
clean
|> Array.countBy id // finAr
|> Array.sortByDescending (fun (_,count) -> count) // finlist
|> Array.map (fun (word, _) -> word) // here mapping each tuple to string
File.WriteAllLines("descent.txt", lines)
With a couple of wrappers, you can pipe operations related to reading file and writing to file:
"clep.txt"
|> readTextFile
|> getWordsMostFrequestFirst
|> writeLinesToFile "descent.txt"
Wrappers:
let readTextFile (path: string) =
(File.OpenText path).ReadToEnd()
let writeLinesToFile (path: string) (contents: string seq) =
File.WriteAllLines(path, contents)
And a function which processes text:
let getWordsMostFrequestFirst (text: string) =
let splitByWhitespaces (input: string) = Regex.Split(input, "\s+")
let toLower (input: string) = input.ToLower()
let removeDelimiters (input: string) = Regex.Replace(input, "[\,\.\!\"\:\;\?\-]", "")
text
|> removeDelimiters
|> toLower
|> splitByWhitespaces
|> Array.countBy id
|> Array.sortByDescending snd // easy way to get tuple items
|> Array.map fst
You're only writing a single line of text to the file, and because finlist is not a type for which StreamWriter.WriteLine() has a specific overload, it is treated as object, and the string used is the result of finlist.ToString(), which, as is common with built-in .NET types, is just the type name.
If you want to write the actual elements of the array to the file, you need to actually process the array.
This would write the string parts from all the tuples to the text file:
finlist
|> Array.map fst
|> Array.iter writer.WriteLine
To include the numbers, for example in the format "text: 1", you would have to create an appropriately formatted string for each array item first:
finlist
|> Array.map (fun (text, number) -> sprintf "%s: %i" text number)
|> Array.iter writer.WriteLine
By the way, because of the way .NET strings use \ for escaping characters, just like regular expressions do, your RegExes won't work the way you've written them. It should be
let reg = RegularExpressions.Regex #"\s+"
let cleanEx = RegularExpressions.Regex #"[\,\.\!\""\:\;\?\-]"
There are two changes here: The # before the strings tell the compiler not to use \ to escape characters (alternatively you can write every single backslash in a RegEx as \\, but that doesn't make it any more readable). In the middle of the second one, another " escapes the double quotes, because otherwise they would now terminate the string, and the line wouldn't compile anymore.

process subarrays asynchronously and reduce the results to a single array

Input If input is in the form of array of arrays.
let items = [|
[|"item1"; "item2"|]
[|"item3"; "item4"|]
[|"item5"; "item6"|]
[|"item7"; "item8"|]
[|"item9"; "item10"|]
[|"item11"; "item12"|]
|]
Asynchronous action that returns asynchronous result or error
let action (item: string) : Async<Result<string, string>> =
async {
return Ok (item + ":processed")
}
Attempt process one subarray at a time in parallel
let result = items
|> Seq.map (Seq.map action >> Async.Parallel)
|> Async.Parallel // wrong? process root items sequentially
|> Async.RunSynchronously
Expectations:
a) Process one subarray at a time in parallel, then process the second subarray in parallel and so on. (In other words sequential processing for the root items and parallel processing for subitems)
b) Then collect all the results and merge them into a singly dimensioned results array while maintaining the order.
c) Preferably using built-in methods provided by Array, Seq, List, Async etc. instead of any custom operators (that'd be last resort)
d) Optional - If it's not possible to have something within the chain, then as a last resort perhaps convert the result subarrays into single array at the end and return to the caller, if that leads to a cleaner and minimalistic approach which I prefer.
Attempt 2
let result2 = items
|> Seq.map (Seq.map action >> Async.Parallel)
|> Async.Parallel // wrong? is it processing root items sequentially
|> Async.RunSynchronously
|> Array.collect id
Array.iter (fun (item: Result<string, string>) ->
match item with
| Ok r -> Console.WriteLine(r)
| Error e -> Console.WriteLine(e)
) result2
Edit
let action (item: string) : Async<Result<string, string>> =
async {
return Ok (item + ":processed")
}
let items = [| "item1"; "item2"; "item3"; "item4"; "item5"; "item6"; "item7"; "item8"; "item9"; "item10"|]
let result = items
|> Seq.chunkBySize 2
|> Seq.map (Seq.map action >> Async.Parallel)
|> Seq.map Async.RunSynchronously
|> Seq.toArray
|> Array.collect id
let result = items |> Array.map ( Array.map action >> Async.Parallel)
|> Array.map Async.RunSynchronously
|> Array.collect id
Edit: Note that majority of operations defined on Seq can be found in array and vice versa. If you initially have an array you can use array operation all the way down.
let items = [| "item1"; "item2"; "item3"; "item4"; "item5"; "item6"; "item7"; "item8"; "item9"; "item10"|]
let result = items
|> Array.chunkBySize 2
|> Array.map (Array.map action >> Async.Parallel >> Async.RunSynchronously)
|> Array.concat

F# remove duplicates from a string [] list

I have a program that results in an [] list, and I'm trying to remove near duplicated arrays from the list. An example of the list is...
[
[|
"Jackson";
"Stentzke";
"22";
"001"
|];
[|
"Jackson";
"Stentzke";
"22";
"002"
|];
[|
"Alec";
"Stentzke";
"18";
"003"
|]
]
Basically I'm trying to write a function that would read over the list and remove all examples of near identical data. So the final returned [] list should look like...
[
[|
"Alec";
"Stentzke";
"18";
"003"
|]
]
I've tried a number of functions to try and get this result or something close to it that can work with. My current attempt is this...
let removeDuplicates (arrayList: string[]list) =
let list = arrayList|> List.map(fun aL ->
let a = arrayList|> List.map(fun aL2 ->
try
match (aL.GetValue(0).Equals(aL2.GetValue(0))) && (aL.GetValue(2).Equals(aL2.GetValue(2))) && (aL.GetValue(3).Equals(aL2.GetValue(3))) with
| false -> aL2
| _ -> [|""|]
with
| ex -> [|""|]
)
a
)
list |> List.concat |> List.distinct
But all this returns is the a reversed version on the input []list.
Does anyone know how to remove near duplicated arrays from a list?
I believe your code and comments don't match up very well. Considering your comments "the first, second and third values are the same", I believe this can get you in the right track:
let removeDuplicates (arrayList: string[]list) =
arrayList |> Seq.distinctBy (fun elem -> (elem.[0] , elem.[1] , elem.[2]))
The result of this against your input data is a two element list containing:
[
[|
"Jackson";
"Stentzke";
"22";
"001"
|];
[|
"Alec";
"Stentzke";
"18";
"003"
|]
]
You should create a dictionary/map based on the fields you consider identical then just remove any duplicate occurance. Here's a simply and mechanical way, assuming xs is the List you specified above:
type DataRec = { key:string
fname:string
lname:string
id1:string
id2:string}
let dataRecs = xs |> List.map (fun x -> {key=x.[0]+x.[1]+x.[2];fname=x.[0];lname=x.[1];id1=x.[2];id2=x.[3]})
dataRecs |> Seq.groupBy (fun x -> x.key)
|> Seq.filter (fun x -> Seq.length (snd x) = 1)
|> Seq.collect snd
|> Seq.map (fun x -> [|x.fname;x.lname;x.id1;x.id2|])
|> Seq.toList
Output:
val it : string [] list = [[|"Alec"; "Stentzke"; "18"; "003"|]]
It basically creates a key from the first three items, groups by it, filters out anything over 2 counst, and then maps back to an array.
Using some Linq:
let comparer (atMost) =
{ new System.Collections.Generic.IEqualityComparer<string[]> with
member __.Equals(a, b) =
Seq.zip a b
|> Seq.sumBy (fun (a',b') -> System.StringComparer.InvariantCulture.Compare(a', b') |> abs |> min 1)
|> ((>=) atMost)
member __.GetHashCode(a) = 1
}
System.Linq.Enumerable.GroupBy(data, id, comparer 1)
|> Seq.choose (fun g -> match Seq.length g with | 1 -> Some g.Key | _ -> None)
The comparer allows for atMost : int number of differences between two arrays.

How to generate an array with a dynamic type in F#

I'm trying to write a function to return a dynamically typed array. Here is an example of one of the ways I've tried:
let rng = System.Random()
type ConvertType =
| AsInts
| AsFloat32s
| AsFloats
| AsInt64s
type InputType =
| Ints of int[]
| Float32s of float32[]
| Floats of float[]
| Int64s of int64[]
let genData : int -> int -> ConvertType -> InputType * int[] =
fun (sCount:int) (rCount:int) (ct:ConvertType) ->
let source =
match ct with
| AsInts -> Array.init sCount (fun _ -> rng.Next()) |> Array.map (fun e -> int e) |> Ints
| AsFloat32s -> Array.init sCount (fun _ -> rng.Next()) |> Array.map (fun e -> float32 e) |> Float32s
| AsFloats -> Array.init sCount (fun _ -> rng.Next()) |> Array.map (fun e -> float e) |> Floats
| AsInt64s -> Array.init sCount (fun _ -> rng.Next()) |> Array.map (fun e -> int64 e) |> Int64s
let indices = Array.init rCount (fun _ -> rng.Next sCount) |> Array.sort
source, indices
The problem I have is that on down when I use the function, I need the array to be the primitive type e.g. float32[] and not "InputType."
I have also tried doing it via an interface created by an inline function and using generics. I couldn't get that to work how I wanted either but I could have just been doing it wrong.
Edit:
Thanks for the great reply(s), I'll have to try that out today. I'm adding the edit because I solved my problem, although I didn't solve it how I wanted (i.e. like the answer). So an FYI for those who may look at this, I did the following:
let counts = [100; 1000; 10000]
let itCounts = [ 1000; 500; 200]
let helperFunct =
fun (count:int) (numIt:int) (genData : int -> int -> ('T[] * int[] )) ->
let c2 = int( count / 2 )
let source, indices = genData count c2
....
[<Test>]
let ``int test case`` () =
let genData sCount rCount =
let source = Array.init sCount (fun _ -> rng.Next())
let indices = Array.init rCount (fun _ -> rng.Next sCount) |> Array.sort
source, indices
(counts, itCounts) ||> List.Iter2 (fun s i -> helperFunct s i genData)
.....
Then each proceeding test case would be something like:
[<Test>]
let ``float test case`` () =
let genData sCount rCount =
let source = Array.init sCount (fun _ -> rng.Next()) |> Array.map (fun e -> float e)
let indices = Array.init rCount (fun _ -> rng.Next sCount) |> Array.sort
source, indices
.....
But, the whole reason I asked the question was that I was trying to avoid rewriting that genData function for every test case. In my real code, this temporary solution kept me from having to break up some stuff in the "helperFunct."
I think that your current design is actually quite good. By listing the supported types explicitly, you can be sure that people will not try to call the function to generate data of type that does not make sense (say byte).
You can write a generic function using System.Convert (which lets you convert value to an arbitrary type, but may fail if this does not make sense). It is also (very likely) going to be less efficient, but I have not measured that:
let genDataGeneric<'T> sCount rCount : 'T[] * int[] =
let genValue() = System.Convert.ChangeType(rng.Next(), typeof<'T>) |> unbox<'T>
let source = Array.init sCount (fun _ -> genValue())
let indices = Array.init rCount (fun _ -> rng.Next sCount) |> Array.sort
source, indices
You can then write
genDataGeneric<float> 10 10
but people can also write genDataGeneric<bool> and the code will either crash or produce nonsense, so that's why I think that your original approach had its benefits too.
Alternatively, you could parameterize the function to take a converter that turns int (which is what you get from rng.Next) to the type you want:
let inline genDataGeneric convertor sCount rCount : 'T[] * int[] =
let genValue() = rng.Next() |> convertor
let source = Array.init sCount (fun _ -> genValue())
let indices = Array.init rCount (fun _ -> rng.Next sCount) |> Array.sort
source, indices
Then you can write just getDataGeneric float 10 10 which is still quite elegant and it will be efficient too (I added inline because I think it can help here)
EDIT: Based on the Leaf's comment, I tried using the overloaded operator trick and it turns out you can also do this (this blog has the best explanation of what the hell is going on in the following weird snippet!):
// Specifies conversions that we want to allow
type Overloads = Overloads with
static member ($) (Overloads, fake:float) = fun (n:int) -> float n
static member ($) (Overloads, fake:int64) = fun (n:int) -> int64 n
let inline genDataGeneric sCount rCount : 'T[] * int[] =
let convert = (Overloads $ Unchecked.defaultof<'T>)
let genValue() = rng.Next() |> convert
let source = Array.init sCount (fun _ -> genValue())
let indices = Array.init rCount (fun _ -> rng.Next sCount) |> Array.sort
source, indices
let (r : float[] * _) = genDataGeneric 10 10 // This compiles & works
let (r : byte[] * _) = genDataGeneric 10 10 // This does not compile
This is interesting, but note that you still have to specify the type somewhere (either by usage later or using a type annotation). If you're using type annotation, then the earlier code is probably simpler and gives less confusing error messages. But the trick used here is certainly interesting.

Resources