I am trying to implement a function stride n a where n is a stride length and a is an array. Given a call like stride 2 [| "a"; "b"; "c"; "d" |] it should return something like [ [|"a"; "b"|]; [|"c"; "d" |] ]. I am brand new to F# and don't know anything about using arrays in a functional language. I know what I've written is garbage, but it's a start:
let stride n (a : array 'a) =
    let rec r ind =
        if ind >= a.length()
        then
            []
        else
            a[ind .. (ind + n - 1)]::r(ind + n)
    in
    r 0
(see also on dotnetfiddle). This does not compile. I added the array 'a type annotation because the compiler couldn't find the length method, but this annotation does not appear to be allowed.
For context, I am trying to get groups of letters from a string, so I plan to call this like stride 2 myString.ToCharArray().
First of all, there is already an app for that - it's called Array.chunkBySize:
> Array.chunkBySize 2 [|1;2;3;4;5;6;7|]
val it : int [] [] = [|[|1; 2|]; [|3; 4|]; [|5; 6|]; [|7|]|]
And there are similar functions in the Seq and List module, so if your goal is to work with strings, I would consider using the Seq variant, since string already implements the seq interface:
> Seq.chunkBySize 2 "abcdefg";;
val it : seq<char []> =
seq [[|'a'; 'b'|]; [|'c'; 'd'|]; [|'e'; 'f'|]; [|'g'|]]
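For the string use case mentioned in the question, the chunks can be turned back into strings. This is a minimal sketch; the name groups is only illustrative.

```fsharp
// Split a string into chunks of two characters and rebuild each chunk as a string.
let groups =
    "abcdefg"
    |> Seq.chunkBySize 2
    |> Seq.map (fun (chars: char[]) -> System.String(chars))
    |> List.ofSeq

printfn "%A" groups // ["ab"; "cd"; "ef"; "g"]
```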
But if you're interested in education rather than GSD, then here it is:
The logic of the code is fine, except you have a few purely syntactic mistakes and one logical.
First, the type "array of a" is not denoted array 'a. In general the F# notation for generic types is either T<'a> or 'a T, for example list<int> or int list. Arrays follow the same pattern, so 'a array and array<'a> are accepted, but array 'a (type name first, no angle brackets) is not. Arrays are also special: there is a type System.Array, but it's not generic, so you can't use it like that. Instead, the idea of an array is kind of baked into the CLR, and it has its own dedicated syntax, which is 'a[].
So: a : 'a[] instead of a : array 'a
Second, while an array does have a length property, it's capitalized (i.e. Length), and because it's a property rather than a method, there shouldn't be parens after it.
So: a.Length instead of a.length()
However, that is not quite the F# way. Methods and properties are meh, functions are way better. The F# way would be to use the Array.length function.
So: Array.length a instead of a.length()
Bonus: if you do that, there is no need for the type annotation : 'a[], because the compiler can now figure it out from the type of Array.length.
Third, indexing of arrays, lists, and everything else that has an index needs a dot before the opening bracket (F# 6 and later also accept the form without the dot).
So: a.[ind .. (ind + n - 1)] instead of a[ind .. (ind + n - 1)]
Fourth, the in is not necessary in F#. You can just omit it.
With the above modifications, your program will work.
But only on arrays whose length is a multiple of n. On all others you'll get an IndexOutOfRangeException. This is because you also have...
The logical mistake is that while you're checking that ind is within the array bounds, you're not checking that ind + n - 1 is as well. So you need a third case in your branch:
if ind >= Array.length a then
    []
elif ind + n - 1 >= Array.length a then
    a.[ind..] :: r (ind+n)
else
    a.[ind .. (ind+n-1)] :: r (ind+n)
Now this is ready for prime time.
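Putting the fixes above together, the whole function might look like the sketch below. It assumes n is positive (with n <= 0 the recursion would never advance), and it returns the short tail chunk directly instead of recursing one more time.

```fsharp
// All corrections applied: Array.length, dotted slicing, no 'in',
// and a separate branch for the final, shorter chunk.
let stride n a =
    let len = Array.length a
    let rec r ind =
        if ind >= len then
            []
        elif ind + n - 1 >= len then
            // Fewer than n elements remain: take whatever is left.
            [ a.[ind..] ]
        else
            a.[ind .. ind + n - 1] :: r (ind + n)
    r 0

printfn "%A" (stride 2 [| "a"; "b"; "c"; "d"; "e" |])
```

Note that r is not tail-recursive, so for very large arrays Array.chunkBySize remains the safer choice.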
I'm working on an F# solution to this problem, where I need to find the generator element below 1,000,000 with the longest generated sequence.
I use a tail-recursive function that memoizes the previous results to speed up the calculation. This is my current implementation.
open System.Collections.Generic

let memoize f =
    let cache = new Dictionary<_,_>(1000000)
    (fun x ->
        match cache.TryGetValue x with
        | true, v ->
            v
        | _ ->
            let v = f x
            cache.Add(x, v)
            v)

// 'even' is assumed to be defined elsewhere, e.g. let even n = n % 2 = 0
let rec memSequence =
    memoize (fun generator s ->
        if generator = 1 then s + 1
        else
            let state = s+1
            if even generator then memSequence(generator/2) state
            else memSequence(3*generator + 1) state )

let problem14 =
    Array.init 999999 (fun idx -> (idx+1, (memSequence (idx+1) 0))) |> Array.maxBy snd |> fst
It seems to work well when I calculate the lengths of the sequences generated by the first 100,000 numbers, but it slows down significantly beyond that. In fact, for 120,000 it doesn't seem to terminate. I had a feeling that it might be due to the Dictionary I use, but I read that this shouldn't be the case. Could you point out why this may be inefficient?
You're on the right track, but there's one thing very wrong in how you implement your memoization.
Your memoize function takes a function of one argument and returns a memoized version of it. When you use it in memSequence, however, you give it a curried, two-argument function. What then happens is that memoize takes the function and caches the result of partially applying it to the first argument only, i.e. it stores the closure resulting from applying the function to generator, and then proceeds to call those closures on s.
This means that your memoization effectively doesn't do anything - add some print statements in your memoize function and you'll see that you're still doing full recursion.
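A small experiment makes this visible. The sketch below reuses the memoize from the question (without the capacity hint); the names add and bodyRuns are hypothetical and exist only for the demonstration.

```fsharp
open System.Collections.Generic

let memoize f =
    let cache = Dictionary<_,_>()
    fun x ->
        match cache.TryGetValue x with
        | true, v -> v
        | _ ->
            let v = f x
            cache.Add(x, v)
            v

// Counts how many times the body of the two-argument function really runs.
let bodyRuns = ref 0

let add =
    memoize (fun a b ->
        bodyRuns.Value <- bodyRuns.Value + 1
        a + b)

add 1 2 |> ignore
add 1 2 |> ignore

// Only the closure for a = 1 was cached, so the body ran on both calls.
printfn "%d" bodyRuns.Value // prints 2
```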
I think the underlying question may have been How to combine a memoizing function with a potentially costly calculating function that takes more than one argument?.
In this case, that second argument isn't needed. There's nothing inherently wrong with memoizing 2168612 elements (the size of the dictionary after the calculation).
Beware of overflow, since at 113383 the sequence surpasses System.Int32.MaxValue. A solution might thus look like this:
let memoRec f =
    let d = new System.Collections.Generic.Dictionary<_,_>()
    let rec g x =
        match d.TryGetValue x with
        | true, res -> res
        | _ -> let res = f g x in d.Add(x, res); res
    g

let collatzLong =
    memoRec (fun f n ->
        if n <= 1L then 0
        else 1 + f (if n % 2L = 0L then n / 2L else n * 3L + 1L) )

seq { 0L .. 999999L }
|> Seq.map (fun i -> i, collatzLong i)
|> Seq.maxBy snd
|> fst
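As for the more general question of memoizing a function that really does need two arguments, one option is to key the cache on the pair of arguments. This is only a sketch: memoize2, slowAdd and calls are hypothetical names, not part of the code above.

```fsharp
open System.Collections.Generic

// Memoizes a curried two-argument function by caching on the tuple (x, y).
// F# tuples have structural equality and hashing, so they work as dictionary keys.
let memoize2 f =
    let cache = Dictionary<_,_>()
    fun x y ->
        match cache.TryGetValue((x, y)) with
        | true, v -> v
        | _ ->
            let v = f x y
            cache.Add((x, y), v)
            v

let calls = ref 0

let slowAdd =
    memoize2 (fun a b ->
        calls.Value <- calls.Value + 1
        a + b)

slowAdd 1 2 |> ignore
slowAdd 1 2 |> ignore

// The second call is served from the cache.
printfn "%d" calls.Value // prints 1
```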
How is it possible to exit early / break out of / stop an array creation in F# (in this case, of Array2D.initBased)?
Remark: dic is a Dictionary<,>() whose value is an object that has a method named someMethod that takes two int parameters.
let arr = Array2D.initBased 1 1 width height (fun x y ->
    let distinctValues = dic |> Seq.map (fun (KeyValue(k,v)) -> v.someMethod x y) |> Set.ofSeq
    match distinctValues.count with
    | dic.Count ->
        // do something
        // exit array creation here, because I do not need arr any more if v.someMethod x y produced distinct values for each dic value
    | _ ->
        // do something else
This is a tricky question - I don't think there is any function that lets you do this easily. I think the best option is probably to define your own higher-order function (implemented using not very elegant recursion) that hides the behavior.
The idea would be to define tryInitBased that behaves as initBased but the user-provided function can return option (to indicate failure) and the function returns option (either successfully created array or None):
/// Attempts to initialize a 2D array using the specified base offsets and lengths.
/// The provided function can return 'None' to indicate a failure - if the initializer
/// fails for any of the locations inside the array, the construction is stopped and
/// the function returns 'None'.
let tryInitBased base1 base2 length1 length2 f =
    let arr = Array2D.createBased base1 base2 length1 length2 (Unchecked.defaultof<_>)
    /// Recursive function that fills a specified 'x' line
    /// (returns false as soon as any call to 'f' fails, or true)
    let rec fillY x y =
        if y < (base2+length2) then
            match f x y with
            | Some v ->
                arr.[x, y] <- v
                fillY x (y + 1)
            | _ -> false
        else true
    /// Recursive function that iterates over all 'x' positions
    /// and calls 'fillY' to fill individual lines
    let rec fillX x =
        if x < (base1+length1) then
            if fillY x base2 then fillX (x + 1)
            else false
        else true
    if fillX base1 then Some arr else None
Then you can keep your code pretty much the same, but replace initBased with tryInitBased and return None or Some(res) from the lambda function.
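To make the calling side concrete, here is a self-contained sketch. It repeats a condensed tryInitBased so that it runs on its own; the initializers (stop when x * y exceeds 6, or always succeed) are invented purely for illustration and stand in for the dictionary check from the question.

```fsharp
// Condensed copy of tryInitBased so the example is self-contained.
let tryInitBased base1 base2 length1 length2 f =
    let arr = Array2D.createBased base1 base2 length1 length2 Unchecked.defaultof<_>
    let rec fillY x y =
        if y < base2 + length2 then
            match f x y with
            | Some v ->
                arr.[x, y] <- v
                fillY x (y + 1)
            | None -> false
        else true
    let rec fillX x =
        if x < base1 + length1 then fillY x base2 && fillX (x + 1)
        else true
    if fillX base1 then Some arr else None

// Hypothetical initializer: abort as soon as a cell would exceed 6.
let stopped =
    tryInitBased 1 1 3 3 (fun x y -> if x * y > 6 then None else Some (x * y))

// Hypothetical initializer that never fails.
let filled = tryInitBased 1 1 2 2 (fun x y -> Some (x + y))

printfn "%b %b" stopped.IsSome filled.IsSome // prints false true
```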
I also posted the function to F# snippets with a nicer formatting.
I have an array like this,
[|{Name = "000016.SZ";
   turnover = 3191591006.0;
   MV = 34462194.8;};
  {Name = "000019.SZ";
   turnover = 2316868899.0;
   MV = 18438461.48;};
  {Name = "000020.SZ";
   turnover = 1268882399.0;
   MV = 7392964.366;};
  .......
|]
How do I sort this array according to "turnover"? Thanks
Assuming that the array is in arr you can just do
arr |> Array.sortBy (fun t -> t.turnover)
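A self-contained version might look like the sketch below. The record type name Stock is an assumption, since the question does not show the type definition; the field names and values are taken from the question.

```fsharp
// Assumed record type; only the field names come from the question.
type Stock = { Name: string; turnover: float; MV: float }

let arr =
    [| { Name = "000016.SZ"; turnover = 3191591006.0; MV = 34462194.8 }
       { Name = "000019.SZ"; turnover = 2316868899.0; MV = 18438461.48 }
       { Name = "000020.SZ"; turnover = 1268882399.0; MV = 7392964.366 } |]

// Array.sortBy returns a new array sorted ascending by the key; arr is unchanged.
let ascending = arr |> Array.sortBy (fun t -> t.turnover)

// Array.sortByDescending gives the largest turnover first.
let descending = arr |> Array.sortByDescending (fun t -> t.turnover)

printfn "%s %s" ascending.[0].Name descending.[0].Name // prints 000020.SZ 000016.SZ
```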
I know this has already been answered beautifully; however, I am finding that, like Haskell, F# matches the way I think and thought I'd add this for other novices :)
let rec sortData =
    function
    | [] -> []
    | x :: xs ->
        let smaller = List.filter (fun e -> e <= x) >> sortData
        let larger = List.filter (fun e -> e > x) >> sortData
        smaller xs @ [ x ] @ larger xs
Note 1: "a >> b" is function composition and means "create a function, f, such that f x = b(a(x))" as in "apply a then apply b" and so on if it continues: a >> b >> c >>...
Note 2: "@" is list concatenation, as in [1..100] = [1..12] @ [13..50] @ [51..89] @ [90..100]. This is more powerful but less efficient than cons, "::", which can only add one element at a time and only to the head of a list, a::[b;c;d] = [a;b;c;d]
Note 3: the List.filter (fun e ->...) expressions produce partially applied functions that hold the provided filtering lambda and still expect the list to filter.
Note 4: I could have made "smaller" and "larger" lists instead of functions (as in "xs |> filter |> sort"). My choice to make them functions was arbitrary.
Note 5: The type signature of the sortData function states that it requires and returns a list whose elements support comparison:
_arg1:'a list -> 'a list when 'a : comparison
Note 6: There is clarity in brevity (despite this particular post :) )
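The operators from Notes 1 and 2 can be tried in isolation. The helper names below (addOne, double and so on) are made up for this small sketch.

```fsharp
let addOne x = x + 1
let double x = x * 2

// Composition: apply addOne first, then double.
let addOneThenDouble = addOne >> double
printfn "%d" (addOneThenDouble 3) // prints 8

// Concatenation joins whole lists; cons adds one element at the head.
let joined = [ 1; 2 ] @ [ 3 ] @ [ 4; 5 ]
let consed = 0 :: joined
printfn "%A" consed // [0; 1; 2; 3; 4; 5]
```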
As a testament to the algorithmic clarity of functional languages, the following optimization of the above filter sort is three times faster (as reported by VS Test Explorer). In this case, the list is traversed only once per pivot (the first element) to produce the sub-lists of smaller and larger items. Also, an equivalence list is introduced which collects matching elements away from further comparisons.
let rec sort3 =
    function
    | [] -> []
    | x::xs ->
        let accum tot y =
            match tot with
            | (a,b,c) when y < x -> (y::a,b,c)
            | (a,b,c) when y = x -> (a,y::b,c)
            | (a,b,c) -> (a,b,y::c)
        let (a,b,c) = List.fold accum ([],[x],[]) xs
        (sort3 a) @ b @ (sort3 c)
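A quick sanity check against the built-in List.sort might look like this sketch. It repeats sort3 so that it runs on its own; the sample list is arbitrary.

```fsharp
// Three-way partition sort: one fold per pivot splits the rest of the list
// into smaller, equal and larger elements.
let rec sort3 =
    function
    | [] -> []
    | x :: xs ->
        let accum (a, b, c) y =
            if y < x then (y :: a, b, c)
            elif y = x then (a, y :: b, c)
            else (a, b, y :: c)
        let (a, b, c) = List.fold accum ([], [ x ], []) xs
        sort3 a @ b @ sort3 c

let sample = [ 5; 3; 8; 3; 1; 9; 5; 2 ]

printfn "%A" (sort3 sample)                  // [1; 2; 3; 3; 5; 5; 8; 9]
printfn "%b" (sort3 sample = List.sort sample) // prints true
```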
EDIT3: I'm writing code to process a very long input list of Ints with only a few hundred non-duplicates. I use two auxiliary lists to maintain cumulative partial sums to calculate some accumulator value; the hows and whys are not important. I want to ditch all lists here and turn it into a nice destructive loop, and I don't know how. I don't need the whole code; a skeleton would be great, where reads and writes are done to two auxiliary arrays and some end result is returned. What I have right now would run 0.5 hour for the input. I've now coded this in C++, and it runs in 90 seconds for the same input.
I can't understand how to do this, at all. This is the list-based code that I have right now:(but the Map-based code below is clearer)
ins :: (Num b, Ord a) => a -> b -> [(a, b)] -> ([(a, b)], b)
ins n x [] = ( [(n,x)], 0)
ins n x l@((v, s):t) =
  case compare n v of
    LT -> ( (n,s+x) : l , s )
    EQ -> ( (n,s+x) : t , if null t then 0 else snd (head t))
    GT -> let (u,z) = ins n x t
          in ((v,s+x):u,z)
This is used in a loop, to process a list of numbers of known length, (changed it to foldl now)
scanl g (0,([],[])) ns -- ns :: [Int]

g ::
  (Num t, Ord t, Ord a) =>
  (t, ([(a, t)], [(a, t)])) -> a -> (t, ([(a, t)], [(a, t)]))
g (c,( a, b)) n =
  let
    (a2,x) = ins n 1 a
    (b2,y) = if x>0 then ins n x b else (b,0)
    c2 = c + y
  in
    (c2,( a2, b2))
This works, but I need to speed it up. In C, I would keep the lists (a,b) as arrays; use binary search to find the element with the key just above or equal to n (instead of the sequential search used here); and use in-place update to change all the preceding entries.
I'm only really interested in final value. How is this done in Haskell, with mutable arrays?
I tried something, but I really don't know what I'm doing here, and am getting strange and very long error messages (like "can not deduce ... from context ..."):
goarr top = runSTArray $ do
  let sz = 10000
  a <- newArray (1,sz) (0,0) :: ST s (STArray s Int (Integer,Integer))
  b <- newArray (1,sz) (0,0) :: ST s (STArray s Int (Integer,Integer))
  let p1 = somefunc 2 -- somefunc :: Integer -> [(Integer, Int)]
  go1 p1 2 0 top a b

go1 p1 i c top a b =
  if i >= top
    then
      do
        return c
    else
      go2 p1 i c top a b

go2 p1 i c top a b =
  do
    let p2 = somefunc (i+1) -- p2 :: [(Integer, Int)]
    let n = combine p1 p2 -- n :: Int
    -- update arrays and calc new c
    -- like the "g" function is doing:
    -- (a2,x) = ins n 1 a
    -- (b2,y) = if x>0 then ins n x b else (b,0)
    -- c2 = c + y
    go1 p2 (i+1) c2 top a b -- a2 b2??
This doesn't work at all. I don't even know how to encode loops in do notation. Please help.
UPD: the Map based code that runs 3 times slower:
ins3 :: (Ord k, Num a) => k -> a -> Map.Map k a -> (Map.Map k a, a)
ins3 n x a | Map.null a = (Map.insert n x a , 0)
ins3 n x a = let (p,q,r) = Map.splitLookup n a in
  case q of
    Nothing -> (Map.union (Map.map (+x) p)
                  (Map.insert n (x+leftmost r) r) , leftmost r)
    Just s -> (Map.union (Map.map (+x) p)
                 (Map.insert n (x+s) r) , leftmost r)

leftmost r | Map.null r = 0
           | otherwise = snd . head $ Map.toList r
UPD2: The error message is " Could not deduce (Num (STArray s1 i e)) from the context () arising from the literal `0' at filename.hs:417:11"
that's where it says return c in go1 function. Perhaps c is expected to be an array, but I want to return the accumulator value that is built while using the two auxiliary arrays.
EDIT3: I've replaced scanl and (!!) with foldl and take as per Chris's advice, and now it runs in constant space with sane empirical complexity and is actually projected to finish in under 0.5 hour - a.o.t. ... 3 days ! I knew about it of course but was so sure GHC optimizes the stuff away for me, surely it wouldn't make that much of a difference, I thought! And so felt only mutable arrays could help... Bummer.
Still, C++ does same in 90 sec, and I would very much appreciate help in learning how to code this with mutable arrays, in Haskell.
Are the input values ever EQ? If they are not EQ then the way scanl g (0,([],[])) ns is used means that the first [(,)] array, call it a always has map snd a == reverse [1..length a] at each stage of g. For example, in a length 10 list the value of snd (a !! 4) is going to be 10-4. Keeping these reversed index values by mutating the second value of each preceding entry in a is quite wasteful. If you need speed then this is one place to make a better algorithm.
None of this applies to the second [(,)] whose purpose is still mysterious to me. It records all insertions that were not done at the end of a, so perhaps it allows one to reconstruct the initial sequence of values.
You said "I'm only really interested in final value." Do you mean you only care about the last value in list output by the scanl .. line? If so then you need a foldl instead of scanl.
Edit: I am adding a non-mutable solution using a custom Finger Tree. It passes my ad hoc testing (at bottom of code):
{-# LANGUAGE MultiParamTypeClasses #-}
import Data.Monoid
import Data.FingerTree

data Entry a v = E !a !v deriving Show

data ME a v = NoF | F !(Entry a v) deriving Show

instance Num v => Monoid (ME a v) where
  mempty = NoF
  NoF `mappend` k = k
  k `mappend` NoF = k
  (F (E _a1 v1)) `mappend` (F (E a2 v2)) = F (E a2 (v1 + v2))

instance Num v => Measured (ME a v) (Entry a v) where
  measure = F

type M a v = FingerTree (ME a v) (Entry a v)

getV NoF = 0
getV (F (E _a v)) = v

expand :: Num v => M a v -> [(a, v)]
expand m = case viewl m of
  EmptyL -> []
  (E a _v) :< m' -> (a, getV (measure m)) : expand m'

ins :: (Ord a, Num v) => a -> v -> M a v -> (M a v, v)
ins n x m =
  let comp (F (E a _)) = n <= a
      comp NoF = False
      (lo, hi) = split comp m
  in case viewl hi of
       EmptyL -> (lo |> E n x, 0)
       (E v s) :< higher | n < v ->
                             (lo >< (E n x <| hi), getV (measure hi))
                         | otherwise ->
                             (lo >< (E n (s+x) <| higher), getV (measure higher))

g :: (Num t, Ord t, Ord a) =>
     (t, (M a t, M a t)) -> a -> (t, (M a t, M a t))
g (c, (a, b)) n =
  let (a2, x) = ins n 1 a
      (b2, y) = if x>0 then ins n x b else (b, 0)
  in (c+y, (a2, b2))

go :: (Ord a, Num v, Ord v) => [a] -> (v, ([(a, v)], [(a, v)]))
go ns = let (t, (a, b)) = foldl g (0, (mempty, mempty)) ns
        in (t, (expand a, expand b))

up = [1..6]
down = [5,4..1]
see'tests = map go [ up, down, up ++ down, down ++ up ]

main = putStrLn . unlines . map show $ see'tests
Slightly unorthodox, I am adding a second answer using a mutable technique. Since user1308992 mentioned Fenwick trees, I have used them to implement the algorithm. Two STUArray are allocated and mutated during the run. The basic Fenwick tree keeps totals for all smaller indices and the algorithm here needs totals for all larger indices. This change is handled by the (sz-x) subtraction.
import Control.Monad.ST(runST,ST)
import Data.Array.ST(STUArray,newArray)
import Data.Array.Base(unsafeRead, unsafeWrite)
import Data.Bits((.&.))
import Debug.Trace(trace)
import Data.List(group,sort)

{-# INLINE lsb #-}
lsb :: Int -> Int
lsb i = (negate i) .&. i

go :: [Int] -> Int
go xs = compute (maximum xs) xs

-- Require "top == maximum xs" and "all (>=0) xs"
compute :: Int -> [Int] -> Int
compute top xs = runST mutating where
  -- Have (sz - (top+1)) > 0 to keep algorithm simple
  sz = top + 2

  -- Reversed Fenwick tree (no bounds checking)
  insert :: STUArray s Int Int -> Int -> Int -> ST s ()
  insert arr x v = loop (sz-x) where
    loop i | i > sz = return ()
           | i <= 0 = error "wtf"
           | otherwise = do
               oldVal <- unsafeRead arr i
               unsafeWrite arr i (oldVal + v)
               loop (i + lsb i)

  getSum :: STUArray s Int Int -> Int -> ST s Int
  getSum arr x = loop (sz - x) 0 where
    loop i acc | i <= 0 = return acc
               | otherwise = do
                   val <- unsafeRead arr i
                   loop (i - lsb i) $! acc + val

  ins n x arr = do
    insert arr n x
    getSum arr (succ n)

  mutating :: ST s Int
  mutating = do
    -- Start index from 0 to make unsafeRead, unsafeWrite easy
    a <- newArray (0,sz) 0 :: ST s (STUArray s Int Int)
    b <- newArray (0,sz) 0 :: ST s (STUArray s Int Int)
    let loop [] c = return c
        loop (n:ns) c = do
          x <- ins n 1 a
          y <- if x > 0
                 then
                   ins n x b
                 else
                   return 0
          loop ns $! c + y
    -- Without debugging use the next line
    -- loop xs 0
    -- With debugging use the next five lines
    c <- loop xs 0
    a' <- see a
    b' <- see b
    trace (show (c,(a',b'))) $ do
      return c

  -- see is only used in debugging
  see arr = do
    let zs = map head . group . sort $ xs
    vs <- sequence [ getSum arr z | z <- zs ]
    let ans = filter (\(a,v) -> v>0) (zip zs vs)
    return ans

up = [1..6]
down = [5,4..1]
see'tests = map go [ up, down, up ++ down, down ++ up ]

main = putStrLn . unlines . map show $ see'tests