map multiple elements with same key in F# - arrays

Suppose there are two sorted lists of the same length: a key list, let k = [1;1;2;2;3;3], and a number list, let n = [1;2;3;4;5;6]. The ith positions of the two lists correspond to each other, meaning the element n=1 has a key value of k=1, n=2 has a key value of k=1, and so on.
How can I create a list that groups the elements sharing the same key value in F#?
The output should have the format [[n1;n2;key];[n1;n2;key]...]; for the example above it is [[1;2;1];[3;4;2];[5;6;3]].

What you describe is a map data structure, so I would use one instead of implementing it yourself with a bunch of list functions. You can convert your two lists like this:
let k = [1;1;2;2;3;3]
let n = [1;2;3;4;5;6]
let mapKeyToValues keys values =
    let folder m k v =
        m |> Map.change k (function
            | Some old -> Some (v :: old)
            | None -> Some [v]
        )
    List.fold2 folder Map.empty keys values
Now with
let r = mapKeyToValues k n
You get a data structure like
Map [
1, [2; 1]
2, [4; 3]
3, [6; 5]
]
This would be equivalent to the JSON-Object
{
"1": [2,1],
"2": [4,3],
"3": [6,5]
}
Usually you could write
let r = Map (List.zip k n)
This way you create a map from a list of (k, v) tuples. But a later entry with the same key just overrides the earlier one, so you get
Map [
(1, 2)
(2, 4)
(3, 6)
]
That's the reason why you need Map.change. List.fold2 walks through both lists in step and hands each (key, value) pair to the folder, which adds it to the map. If the key is not present yet (the None case), you create a new list containing just that value. In the Some case the key was already added, so you prepend your value to the list that is already there.
In this example I assume the order of the values doesn't matter. If you want them in the same order as they appear in the value list, you must use List.foldBack2 instead of List.fold2, and you must change the order of some arguments. This should be an exercise for you, to understand it better.
The Map module has many functions for working with such a data structure, such as Map.map, Map.filter, Map.fold, Map.find, and so on, so use those instead of hand-written list code. If really needed, you can also use Map.toList to transform it back to a list, or use Map.fold.
As a reminder, since some people use List.zip followed by List.map, or List.zip followed by List.fold: List.map2 and List.fold2 do this in one operation instead of two.
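For reference, the order-preserving variant mentioned above might look roughly like the sketch below. The name mapKeyToValuesOrdered is made up for illustration; the only real change is that List.foldBack2 walks the lists from the end and passes the accumulator as the last argument of the folder.

```fsharp
let k = [1;1;2;2;3;3]
let n = [1;2;3;4;5;6]

// Hypothetical name; same idea as mapKeyToValues, but folding from the back
// so that prepending with (::) leaves the values in their original order.
let mapKeyToValuesOrdered keys values =
    // With foldBack2 the folder receives the state as its LAST argument.
    let folder key value m =
        m |> Map.change key (function
            | Some old -> Some (value :: old)
            | None -> Some [value]
        )
    List.foldBack2 folder keys values Map.empty

let ordered = mapKeyToValuesOrdered k n
// map [(1, [1; 2]); (2, [3; 4]); (3, [5; 6])]
```

Like List.fold2, List.foldBack2 throws if the two lists differ in length.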

This can be achieved by "zipping" the two lists together and then grouping by the key:
let groupByKey keys numbers =
    List.zip keys numbers
    |> List.groupBy fst
    |> List.map (fun (key, values) -> key, (values |> List.map snd))
groupByKey k n // [(1, [1; 2]); (2, [3; 4]); (3, [5; 6])]
The above version of the function doesn't have the exact output format that you specified, but it is a more structured output, where each item in the list is a tuple of the key and the list of values for that key. This might be preferred for cleaner code, depending on what you're going to do with the result.
To get your specified format it can be changed slightly:
let groupByKey keys numbers =
    List.zip keys numbers
    |> List.groupBy fst
    |> List.map (fun (key, values) -> [
        yield! values |> List.map snd
        yield key
    ])
groupByKey k n // [[1; 2; 1]; [3; 4; 2]; [5; 6; 3]]

I think what you are looking for is Seq.zip, which takes two sequences (lists are sequences) and gives back a sequence of pairs.
Also note you specified your lists with commas, but F# uses semi-colons to separate list elements.
let keys = [ 1; 1; 2; 2; 3; 3 ]
let values = [ 1; 2; 3; 4; 5; 6 ]
let map =
    values
    |> Seq.zip keys
    |> Seq.groupBy (fun (k, v) -> k)
    |> Seq.map (fun (k, vs) -> k, vs |> Seq.map snd |> Seq.toList)
    |> Map.ofSeq
printfn "%A" map
// map [(1, [1; 2]); (2, [3; 4]); (3, [5; 6])]
To get the format you are looking for (although I think you should consider a Map) you can unpack it like so:
let mapList =
    map
    |> Map.toSeq
    |> Seq.map (fun (k, vs) -> vs @ [ k ])
    |> Seq.toList
printfn "%A" mapList
// [[1; 2; 1]; [3; 4; 2]; [5; 6; 3]]

Related

process subarrays asynchronously and reduce the results to a single array

The input is in the form of an array of arrays:
let items = [|
    [|"item1"; "item2"|]
    [|"item3"; "item4"|]
    [|"item5"; "item6"|]
    [|"item7"; "item8"|]
    [|"item9"; "item10"|]
    [|"item11"; "item12"|]
|]
An asynchronous action that returns a result or an error:
let action (item: string) : Async<Result<string, string>> =
    async {
        return Ok (item + ":processed")
    }
An attempt to process one subarray at a time in parallel:
let result =
    items
    |> Seq.map (Seq.map action >> Async.Parallel)
    |> Async.Parallel // wrong? process root items sequentially
    |> Async.RunSynchronously
Expectations:
a) Process one subarray at a time in parallel, then process the second subarray in parallel and so on. (In other words sequential processing for the root items and parallel processing for subitems)
b) Then collect all the results and merge them into a singly dimensioned results array while maintaining the order.
c) Preferably using built-in methods provided by Array, Seq, List, Async etc. instead of any custom operators (that'd be last resort)
d) Optional - If it's not possible to have something within the chain, then as a last resort perhaps convert the result subarrays into single array at the end and return to the caller, if that leads to a cleaner and minimalistic approach which I prefer.
Attempt 2
let result2 =
    items
    |> Seq.map (Seq.map action >> Async.Parallel)
    |> Async.Parallel // wrong? is it processing root items sequentially
    |> Async.RunSynchronously
    |> Array.collect id
Array.iter (fun (item: Result<string, string>) ->
    match item with
    | Ok r -> Console.WriteLine(r)
    | Error e -> Console.WriteLine(e)
) result2
Edit
let action (item: string) : Async<Result<string, string>> =
    async {
        return Ok (item + ":processed")
    }
let items = [| "item1"; "item2"; "item3"; "item4"; "item5"; "item6"; "item7"; "item8"; "item9"; "item10"|]
let result =
    items
    |> Seq.chunkBySize 2
    |> Seq.map (Seq.map action >> Async.Parallel)
    |> Seq.map Async.RunSynchronously
    |> Seq.toArray
    |> Array.collect id
let result =
    items
    |> Array.map (Array.map action >> Async.Parallel)
    |> Array.map Async.RunSynchronously
    |> Array.collect id
Edit: Note that the majority of the operations defined on Seq are also available on Array, and vice versa. If you start with an array, you can use Array operations all the way down.
let items = [| "item1"; "item2"; "item3"; "item4"; "item5"; "item6"; "item7"; "item8"; "item9"; "item10"|]
let result =
    items
    |> Array.chunkBySize 2
    |> Array.map (Array.map action >> Async.Parallel >> Async.RunSynchronously)
    |> Array.concat
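If blocking with Async.RunSynchronously on every chunk is undesirable, one possible alternative is to keep everything inside a single async workflow. This is only a sketch: it assumes FSharp.Core 4.7 or later, where Async.Sequential is available, and processAll is a hypothetical name.

```fsharp
let action (item: string) : Async<Result<string, string>> =
    async {
        return Ok (item + ":processed")
    }

// Hypothetical helper: runs the subarrays one after another,
// while the items inside each subarray run in parallel.
let processAll (items: string[][]) : Async<Result<string, string>[]> =
    async {
        let! perGroup =
            items
            |> Array.map (Array.map action >> Async.Parallel) // parallel within a subarray
            |> Async.Sequential                               // one subarray at a time
        // Flatten into a single array; input order is kept.
        return Array.concat perGroup
    }

let flat =
    processAll [| [|"item1"; "item2"|]; [|"item3"; "item4"|] |]
    |> Async.RunSynchronously
```

The caller decides where to block, so RunSynchronously appears only once at the outermost level.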

F# remove duplicates from a string [] list

I have a program that results in a string[] list, and I'm trying to remove near-duplicate arrays from the list. An example of the list is...
[
[|
"Jackson";
"Stentzke";
"22";
"001"
|];
[|
"Jackson";
"Stentzke";
"22";
"002"
|];
[|
"Alec";
"Stentzke";
"18";
"003"
|]
]
Basically I'm trying to write a function that reads over the list and removes all instances of near-identical data. So the final returned string[] list should look like...
[
[|
"Alec";
"Stentzke";
"18";
"003"
|]
]
I've tried a number of functions to get this result, or something close to it that I can work with. My current attempt is this...
let removeDuplicates (arrayList: string[]list) =
    let list = arrayList|> List.map(fun aL ->
        let a = arrayList|> List.map(fun aL2 ->
            try
                match (aL.GetValue(0).Equals(aL2.GetValue(0))) && (aL.GetValue(2).Equals(aL2.GetValue(2))) && (aL.GetValue(3).Equals(aL2.GetValue(3))) with
                | false -> aL2
                | _ -> [|""|]
            with
            | ex -> [|""|]
        )
        a
    )
    list |> List.concat |> List.distinct
But all this returns is a reversed version of the input string[] list.
Does anyone know how to remove near-duplicate arrays from a list?
I believe your code and comments don't match up very well. Considering your comment "the first, second and third values are the same", I believe this can put you on the right track:
let removeDuplicates (arrayList: string[]list) =
    arrayList |> List.distinctBy (fun elem -> (elem.[0] , elem.[1] , elem.[2]))
The result of this against your input data is a two element list containing:
[
[|
"Jackson";
"Stentzke";
"22";
"001"
|];
[|
"Alec";
"Stentzke";
"18";
"003"
|]
]
You should create a dictionary/map based on the fields you consider identical, then just remove any duplicate occurrence. Here's a simple and mechanical way, assuming xs is the list you specified above:
type DataRec = { key:string
                 fname:string
                 lname:string
                 id1:string
                 id2:string}
let dataRecs = xs |> List.map (fun x -> {key=x.[0]+x.[1]+x.[2];fname=x.[0];lname=x.[1];id1=x.[2];id2=x.[3]})
dataRecs |> Seq.groupBy (fun x -> x.key)
         |> Seq.filter (fun x -> Seq.length (snd x) = 1)
         |> Seq.collect snd
         |> Seq.map (fun x -> [|x.fname;x.lname;x.id1;x.id2|])
         |> Seq.toList
Output:
val it : string [] list = [[|"Alec"; "Stentzke"; "18"; "003"|]]
It basically creates a key from the first three items, groups by it, filters out any group with more than one element, and then maps back to an array.
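The same idea can probably be expressed without the intermediate record type. The sketch below is not from the answer above; it assumes, like that answer, that "near identical" means the first three fields match, and removeAllDuplicated is a made-up name.

```fsharp
// Hypothetical helper: keeps only rows whose (first, second, third) fields
// occur exactly once in the input.
let removeAllDuplicated (rows: string[] list) : string[] list =
    // A tuple key avoids accidental collisions from string concatenation.
    let key (row: string[]) = row.[0], row.[1], row.[2]
    // Count how many rows share each key.
    let counts = rows |> List.countBy key |> Map.ofList
    rows |> List.filter (fun row -> counts.[key row] = 1)

let sample =
    [ [|"Jackson"; "Stentzke"; "22"; "001"|]
      [|"Jackson"; "Stentzke"; "22"; "002"|]
      [|"Alec"; "Stentzke"; "18"; "003"|] ]

let unique = removeAllDuplicated sample
// [[|"Alec"; "Stentzke"; "18"; "003"|]]
```

Because the filter runs over the original list, the surviving rows keep their input order.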
Using some LINQ:
let comparer (atMost) =
    { new System.Collections.Generic.IEqualityComparer<string[]> with
        member __.Equals(a, b) =
            Seq.zip a b
            |> Seq.sumBy (fun (a',b') -> System.StringComparer.InvariantCulture.Compare(a', b') |> abs |> min 1)
            |> ((>=) atMost)
        member __.GetHashCode(a) = 1
    }
System.Linq.Enumerable.GroupBy(data, id, comparer 1)
|> Seq.choose (fun g -> match Seq.length g with | 1 -> Some g.Key | _ -> None)
The comparer allows for atMost : int number of differences between two arrays.

f# Finding the difference between 2 obj[]lists

I have two obj[] lists, list1 and list2. list1 has a length of 8 and list2 has a length of 10. Some arrays exist only in list1, and the same goes for list2, but some arrays exist in both. I'm wondering how to get the arrays that exist only in list1. At the moment, when I run my code I get the arrays that exist in both lists, but the data unique to list1 is missing. Any suggestions?
let getProdOnly (index:int)(list1:obj[]list)(list2:obj[]list) =
    let mutable list3 = list.Empty
    for i = 0 to list1.Length-1 do
        for j = 0 to list2.Length-1 do
            if list1.Item(i).GetValue(index).Equals(list2.Item(j).GetValue(index)) then
                System.Diagnostics.Debug.WriteLine("Exists in List 1 and 2")
            else
                list3 <- list1.Item(i)
Something like this:
let ar1 = [|1;2;3|]
let ar2 = [|2;3;4|]
let s1 = ar1 |> Set.ofArray
let s2 = ar2 |> Set.ofArray
Set.difference s1 s2
//val it : Set<int> = set [1]
There are also a bunch of Array related functions, like compareWith, distinct, exists if you want to work with Arrays directly.
But as was pointed out in previous answers, this type of imperative code is not very idiomatic. Try to avoid mutable variables and loops. It could probably be rewritten with Array.map, for example.
If you want the elements unique to one list, this is the easiest way to do it in F# 4.0:
list1
|> List.except list2
which will remove all the elements of list2 from list1. Note that except also calls a distinct, so you might need to watch out for that.
First I took your code with minor changes and added some printf debugging to see what it does.
let getProdOnly2 (index:int)(list1:obj[] list)(list2:obj[] list) =
    let mutable list3 : obj[] list= list.Empty
    for i = 0 to list1.Length-1 do
        for j = 0 to list2.Length-1 do
            if list1.[i].[index] = list2.[j].[index] then
                printfn "equal"
                System.Diagnostics.Debug.WriteLine("Exists in List 1 and 2")
                list3
            else
                printfn "add %A %A" (list1.Item(i)) (list2.Item(j))
                list3 <- list1.Item(i) :: list3
                list3
    list3
It adds an element every time it finds an element of list2 that is not equal to the current element of list1.
So my approach is to take list1 and keep, or rather filter, only the elements that are not part of list2.
let getProdOnly3 (index:int)(list1:obj[] list)(list2:obj[] list) =
    list1
    |> List.filter (fun el1 ->
        list2
        |> List.fold (fun acc el2 -> acc && (el2<>el1)) true )
I tested the code with the following lists
let list1 = [ [| 1;2;3;4|]
              [| 1;2;3;4|]
              [| 2;3;4|]
              [| 3;4;5|] ] |> List.map (fun a -> a |> Array.map (fun e -> box e))
let list2 = [ [| 2;3;4|]
              [| 3;4;5|] ] |> List.map (fun a -> a |> Array.map (fun e -> box e))
In contrast to s952163's answer, my result will contain duplicate entries if list1 contains duplicate entries; I do not know whether that is wanted or unwanted behaviour.
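The original question compares rows only by the element at position index, which none of the versions above do directly. A minimal sketch of that reading follows; getProdOnlyByIndex is a hypothetical name, and the HashSet is simply one way to avoid rescanning list2 for every row of list1.

```fsharp
open System.Collections.Generic

// Hypothetical helper: keeps the rows of list1 whose element at `index`
// does not occur at `index` in any row of list2.
let getProdOnlyByIndex (index: int) (list1: obj[] list) (list2: obj[] list) : obj[] list =
    // Collect the key column of list2 once; lookups use Object.Equals.
    let keysInList2 = HashSet<obj>(list2 |> Seq.map (fun row -> row.[index]))
    list1 |> List.filter (fun row -> not (keysInList2.Contains(row.[index])))

let boxRows (rows: int[] list) : obj[] list = rows |> List.map (Array.map box)

let onlyInFirst =
    getProdOnlyByIndex 0
        (boxRows [ [|1; 10|]; [|2; 20|]; [|3; 30|] ])
        (boxRows [ [|2; 99|]; [|4; 40|] ])
// keeps the rows whose first element is 1 or 3
```

Like getProdOnly3, this keeps duplicates that are present in list1.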

Matching elements of two arrays in F#

I have two sequences of stock data, and I'm trying to line up the dates and combine the data so that I can pass it to other functions that will run some statistics on it. Essentially, I want to pass two (or more) sequences that look like:
sequenceA = [(float,DateTime)]
sequenceB = [(float,DateTime)]
to a function, and have it return a single sequence where all the data is properly aligned by DateTime. Something like:
return = [(float,float,DateTime)]
where the floats are the close prices of the two sequences for that DateTime.
I've tried using a nested for loop, and I'm fairly certain that should work (though I've had some trouble with it), but it seems like F#'s match expression should also be able to handle this. I've looked up some documentation and examples of match expressions, but I'm running into a number of different issues that I haven't been able to get past.
This is my most recent attempt at a simplified version of what I'm trying to accomplish. As you can see, I'm just trying to see if the first element of the sequence 'x' has the date "1/11/2011". The problem is that 1) it always returns "Yes", and 2) I can't figure out how to get from here to the whole sequence, and then ultimately 2+ sequences.
let x = seq[(1.0,System.DateTime.Parse("1/8/2011"));(2.0,System.DateTime.Parse("1/9/2011"))]
type t = seq<float*DateTime>
let align (a:t) =
    let testDate = System.DateTime.Parse("1/11/2011")
    let b = Seq.head(a)
    match snd b with
    | testDate -> printfn "Yes"
    | _ -> printfn "No"
align x
I'm relatively new to F#, but I'm fairly sure that this should be possible with a match expression. Any help would be much appreciated!
Your question has two parts:
As to the pattern matching: in the pattern that you have above, testDate is a new name that will be bound to the second item in tuple b; it is not compared against the testDate value defined earlier. Both patterns will match any date, but since the first pattern matches, your example always prints "Yes".
If you want to match on a specific date value, you can use the 'when' keyword in your pattern:
let dateValue = DateTime.Today
match dateValue with
| someDate when someDate = DateTime.Today -> "Today"
| _ -> "Not Today"
If I had to implement the align function, I probably wouldn't try to use pattern matching. You can use Seq.groupBy to collect all entries with the same date.
open System
///Groups two sequences together by key
let align a b =
    let simplifyEntry (key, values) =
        let prices = [for value in values -> snd value]
        key, prices
    a
    |> Seq.append b
    |> Seq.groupBy fst
    |> Seq.map simplifyEntry
    |> Seq.toList
//Demonstrate alignment of two sequences
let s1 = [DateTime.Today, 1.0]
let s2 = [
    DateTime.Today, 2.0
    DateTime.Today.AddDays(2.0), 10.0]
let pricesByDate = align s1 s2
for day, prices in pricesByDate do
    let pricesText =
        prices
        |> Seq.map string
        |> String.concat ", "
    printfn "%A %s" day pricesText
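If the exact shape from the question, (float, float, DateTime), is needed, a lookup-based inner join is one option. The sketch below is an illustration rather than part of the answer above: alignByDate is a made-up name, it assumes each date occurs at most once per sequence, and dates present in only one sequence are dropped.

```fsharp
open System

// Hypothetical helper: inner join of two price series on the date.
let alignByDate (a: seq<float * DateTime>) (b: seq<float * DateTime>) : (float * float * DateTime) list =
    // Index the second series by date so each lookup is cheap.
    let bByDate = b |> Seq.map (fun (price, date) -> date, price) |> dict
    a
    |> Seq.choose (fun (priceA, date) ->
        match bByDate.TryGetValue date with
        | true, priceB -> Some (priceA, priceB, date)
        | _ -> None)
    |> Seq.toList

let day1 = DateTime(2011, 1, 8)
let day2 = DateTime(2011, 1, 9)
let day3 = DateTime(2011, 1, 10)

let aligned = alignByDate [ (1.0, day1); (2.0, day2) ] [ (20.0, day2); (30.0, day3) ]
// only day2 appears in both series
```

The result follows the order of the first sequence, so neither input has to be sorted.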
I happen to be working on a library for working with time series data and it has a function for doing this - it is actually a bit more general, because it returns DateTime * float option * float option to represent the case when one series has value for a specified date, but the other one does not.
The function assumes that the two series are already sorted - which means that it only needs to walk over them once (for not-sorted sequences, you need to do multiple iterations or build some temporary tables).
Also note that the tuple elements are swapped compared to your example: you need to give it DateTime * float. The function is not particularly nice. It works on IEnumerable, which means that it needs to use mutable enumerators (and ugly imperative stuff in general). Pattern matching just does not work well with sequences: you can get the head, but you cannot get the tail, because that would be inefficient. You could write a much nicer one for F# lists...
open System.Collections.Generic
let alignWithOrdering (seq1:seq<'T * 'TAddress>) (seq2:seq<'T * 'TAddress>) (comparer:IComparer<_>) = seq {
    let withIndex seq = Seq.mapi (fun i v -> i, v) seq
    use en1 = seq1.GetEnumerator()
    use en2 = seq2.GetEnumerator()
    let en1HasNext = ref (en1.MoveNext())
    let en2HasNext = ref (en2.MoveNext())
    let returnAll (en:IEnumerator<_>) hasNext f = seq {
        if hasNext then
            yield f en.Current
            while en.MoveNext() do yield f en.Current }
    let rec next () = seq {
        if not en1HasNext.Value then yield! returnAll en2 en2HasNext.Value (fun (k, i) -> k, None, Some i)
        elif not en2HasNext.Value then yield! returnAll en1 en1HasNext.Value (fun (k, i) -> k, Some i, None)
        else
            let en1Val, en2Val = fst en1.Current, fst en2.Current
            let comparison = comparer.Compare(en1Val, en2Val)
            if comparison = 0 then
                yield en1Val, Some(snd en1.Current), Some(snd en2.Current)
                en1HasNext := en1.MoveNext()
                en2HasNext := en2.MoveNext()
                yield! next()
            elif comparison < 0 then
                yield en1Val, Some(snd en1.Current), None
                en1HasNext := en1.MoveNext()
                yield! next ()
            else
                yield en2Val, None, Some(snd en2.Current)
                en2HasNext := en2.MoveNext()
                yield! next () }
    yield! next () }
Assuming that we want to use strings as keys (rather than your DateTime), you can call it like this:
alignWithOrdering
    [ ("b", 0); ("c", 1); ("d", 2) ]
    [ ("a", 0); ("b", 1); ("c", 2) ] (Comparer<string>.Default) |> List.ofSeq
// Returns
[ ("a", None, Some 0); ("b", Some 0, Some 1);
  ("c", Some 1, Some 2); ("d", Some 2, None) ]
If you're interested in working with time series of stock data in F#, you might be interested in joining the F# for Data and Machine Learning working group of the F# Foundation. We're currently working on an open-source library with support for time series that makes this much nicer :-). If you're interested in looking at & contributing to the early preview, then you can do that via this working group.
open System
let x = seq[(1.0,System.DateTime.Parse("1/8/2011"));(2.0,DateTime.Parse("1/9/2011"))]
//type t = seq<float*DateTime>
let (|EqualDate|_|) str dt=
    DateTime.TryParse str|>function
    |true,x when x=dt->Some()
    |_->None
let align a =
    //let testDate = System.DateTime.Parse("1/11/2011")
    let b = Seq.head(a)
    match b with
    |_,EqualDate "1/9/2011" -> printfn "Yes"
    | _ -> printfn "No"
align x
x|>Seq.skip 1|>align

How to Generate A Specific Number of Random Indices for Array Element Removal F#

So sCount is the number of elements in the source array, iCount is the number of elements I want to remove.
let indices = Array.init iCount (fun _ -> rng.Next sCount) |> Seq.distinct |> Seq.toArray |> Array.sort
The problem with the method above is that I need to specifically remove iCount indices, and this doesn't guarantee that.
I've tried stuff like
while indices.Count < iCount do
    let x = rng.Next sCount
    if not (indices.Contains x) then
        indices <- indices.Add x
And a few other similar things...
Every way I've tried has been extremely slow though - I'm dealing with source arrays of sizes up to 20 million elements.
What you're doing should be fine if you need a set of indices of negligible size compared to the array. Otherwise, consider doing a variation of a Knuth-Fisher-Yates shuffle to get the first i elements of a random permutation of 0 .. n-1:
// rnd is assumed to be a System.Random instance (rng in the question); requires 1 <= i <= n
let rndSubset i n =
    let arr = Array.zeroCreate i
    arr.[0] <- 0
    for j in 1 .. n-1 do
        let ind = rnd.Next(j+1)
        if j < i then arr.[j] <- arr.[ind]
        if ind < i then arr.[ind] <- j
    arr
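A self-contained variant of the same truncated shuffle, with the random generator passed in explicitly, could look like the sketch below. The names randomSubset and removeRandom are invented for illustration, and the removal step is just one plausible way to apply the chosen indices.

```fsharp
open System
open System.Collections.Generic

// Picks `count` distinct indices from 0 .. n-1 in a single pass over n,
// keeping only the first `count` slots of an inside-out shuffle.
let randomSubset (rng: Random) (count: int) (n: int) : int[] =
    if count < 0 || count > n then invalidArg "count" "count must be between 0 and n"
    if count = 0 then [||]
    else
        let arr = Array.zeroCreate<int> count
        for j in 1 .. n - 1 do
            let ind = rng.Next(j + 1)              // uniform in 0 .. j
            if j < count then arr.[j] <- arr.[ind] // move the old occupant to slot j
            if ind < count then arr.[ind] <- j     // place j into the chosen slot
        arr

// Removes `count` randomly chosen elements, keeping the rest in order.
let removeRandom (rng: Random) (count: int) (source: 'T[]) : 'T[] =
    let toRemove = HashSet<int>(randomSubset rng count source.Length)
    source
    |> Array.indexed
    |> Array.filter (fun (i, _) -> not (toRemove.Contains i))
    |> Array.map snd

let picked = randomSubset (Random 42) 10 100
let remaining = removeRandom (Random 7) 3 [| "a"; "b"; "c"; "d"; "e" |]
```

Exactly count indices come back regardless of how close count is to n, which is the guarantee the Seq.distinct approach in the question lacks.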
I won't give you F# code for this (because I don't know F#...), but I'll describe the approach/algorithm that you should use.
Basically, what you want to do is pick n random elements of a given list. This can be done in pseudocode:
chosen = []
n times:
    index = rng.upto(list.length)
    elem = list.at(index)
    list.remove-at(index)
    chosen.add(elem)
Your list variable should be populated with all possible indices in the source list, and then when you pick n random values from that list of indices, you have random, distinct indices that you can do whatever you want with, including printing values, removing values, knocking yourself out with values, etc...
Is iCount closer to the size of the array or closer to 0? That will change the algorithm you should use.
If it is closer to 0, then keep track of the previously generated numbers and check whether each new number has already been generated.
If it is closer to the size of the array, then use the method described by @feralin.
open System
let getRandomNumbers =
    let rand = Random()
    fun max count ->
        Seq.initInfinite (fun _ -> rand.Next(max))
        |> Seq.distinct
        |> Seq.take count
let indices = Array.init 100 id
let numToRemove = 10
// rand.Next(max) excludes max, so pass the full length to make the last index eligible
let indicesToRemove = getRandomNumbers indices.Length numToRemove |> Seq.toList
> val indicesToRemove : int list = [32; 38; 26; 51; 91; 43; 92; 94; 18; 35]
