Haskell Constant Propagation on Data Structures? - arrays

I want to know how deeply Haskell evaluates data structures at compile time.
Consider the following list:
simpleTableMultsList :: [Int]
simpleTableMultsList = [n*m | n <- [1 ..9],m <- [1 ..9]]
This list gives a representation of the multiplication table for 1 through 9. Now, suppose we want to change it so that we represent the product of two one digit numbers as a pair of numbers (first digit, second digit). Then we may consider
simpleTableMultsList :: [(Int,Int)]
simpleTableMultsList = [(k `div` 10, k `rem` 10) | n <- [1 ..9],m <- [1 ..9],let k = n*m]
Now we can implement multiplication on one digit numbers as a table lookup. YAY!! However, we want to be more efficient than this! So we want to make this structure an unboxed array. Haskell gives a really great way to do this using
import qualified Data.Array.Unboxed as A
Then we can do:
simpleTableMults :: A.Array (Int,Int) (Int,Int)
simpleTableMults = A.listArray ((1,1),(9,9)) simpleTableMultsList
Now if I want a constant time multiplication of two one digit numbers n and m, I can do:
simpleTableMults A.! (n,m)
This is great! Now suppose I compile the module we've been working on. Does simpleTableMults get fully evaluated, so that when I run the computation simpleTableMults A.! (n,m) the program literally makes a lookup in memory, or does it have to build the data structure in memory first? Since it is an unboxed array, my understanding is that the array must be created all at once and is completely strict in its elements, so that all the elements of the array are fully evaluated.
So really my question is: when does this evaluation occur, and can I force it to occur at compile time?
------- Edit ---------------
I tried to dig further on this! I tried compiling and examining information about the core. It seems GHC is performing a lot of reductions on the code at compile time. I wish I knew more about core to be able to tell. If we compile with
ghc -O2 -ddump-simpl-stats Main.hs
We can see that 98 beta reductions are performed, an unpack-list operation is carried out, many things are unfolded, and a bunch of inlines are performed (around 150). It even tells you where the beta reductions occur. Since the word IxArray keeps coming up, I am even more curious whether some sort of simplification is occurring. Now the interesting thing from my point of view is that adding
simpleTableMults = D.deepseq t t
  where t = A.listArray ((1,1),(9,9)) simpleTableMultsList
increases the number of beta reductions, inlines, and simplifications quite substantially at compile time. It would be really great if I could load the compiled program into a debugger of some sort and "view" the data structure! I am, as it stands, more mystified than before.
------ Edit 2 -------------
I still don't know what beta reductions are being performed. However, I did find out some interesting things based on sassa-nf's response. For the following experiment, I used the ghc-heap-view package. I changed the way Array was represented in the source according to the Sassa-NF answer. I loaded the program into GHCi, and immediately called
:printHeap simpleTableMults
And, as expected, got an index too large exception. But under the suggested unpacked datatype, I got a let expression with a toArray and a bunch of _thunks, and some _funs. Not really sure yet what these mean ... The other interesting thing is that by using seq, or some other strictness forcing in the source code, I ended up with all _thunks inside of the let. I can upload the exact emission if that helps.
Also, if I perform a single indexing, the array gets completely evaluated in all cases.
Also, there is no way to call ghci with optimizations, so I might not be getting the same results as when compiled with GHC -O2.

Let's exaggerate:
import qualified Data.Array.Unboxed as A
simpleTableMults :: A.Array (Int,Int) (Int,Int)
simpleTableMults = A.listArray ((1,1),(10000,2000))
  [(k `div` 10, k `rem` 10) | n <- [1 ..10000],m <- [1 ..2000],let k = n*m]
main = print $ simpleTableMults A.! (10000,1000)
Then
ghc -O2 -prof b.hs
b +RTS -hy
......Out of memory
hp2ps b.exe.hp
What happened?! You can see the heap consumption graph go above 1GB, and then it died.
Well, the pair is computed eagerly, but the projections of the pair are lazy, so we end up with tons of thunks to compute k `div` 10 and k `rem` 10.
import qualified Data.Array.Unboxed as A
data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Int deriving (Show)
simpleTableMults :: A.Array (Int,Int) P
simpleTableMults = A.listArray ((1,1),(10000,2000))
  [P (k `div` 10) (k `rem` 10) |
    n <- [1 ..10000],m <- [1 ..2000],let k = n*m]
main = print $ simpleTableMults A.! (10000,1000)
This one is fine, because the strict fields of P force both components as soon as each value is built.
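If the goal is a genuinely unboxed table, note that A.Array, as re-exported by Data.Array.Unboxed, is still the ordinary boxed array, and UArray has no instance for pairs. Here is a minimal sketch, assuming it is acceptable to split the table into two UArrays (the names tens, ones and mulDigits are made up for illustration). As far as I know, each table is still built once at run time, on first use, rather than at compile time:

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))

-- Tens digit of n*m for one-digit n and m, stored unboxed.
tens :: UArray (Int, Int) Int
tens = listArray ((1, 1), (9, 9)) [(n * m) `div` 10 | n <- [1 .. 9], m <- [1 .. 9]]

-- Ones digit of n*m, stored unboxed.
ones :: UArray (Int, Int) Int
ones = listArray ((1, 1), (9, 9)) [(n * m) `rem` 10 | n <- [1 .. 9], m <- [1 .. 9]]

-- Look up both digits of the product; only valid for 1 <= n, m <= 9.
mulDigits :: Int -> Int -> (Int, Int)
mulDigits n m = (tens ! (n, m), ones ! (n, m))

main :: IO ()
main = print (mulDigits 7 8) -- prints (5,6)
```

Because a UArray stores raw Ints, there are no per-element thunks to pile up, at the price of two lookups per multiplication.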

Related

Optimize a file writing operation in OCaml?

Basically, in my project I am trying to write a list of strings into a file like this:
val mutable rodata_list : (string*string) list = []
.....
let zip1 ll =
  List.map (fun (h,e) -> h^e) ll in
let oc = open_out_gen [Open_append; Open_creat] 0o666 "final_data.s" in
List.iter (fun l -> Printf.fprintf oc "%s\n" l) (zip1 rodata_list);
Here is my problem: rodata_list can usually grow to as many as 800,000 elements, and the above code takes about 3.5 seconds on our server (64-bit, 32 core Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz). The OCaml version I use is 4.01.0.
This is not acceptable, especially as I have 4 pieces of code like this that write into a file. In total they could take over 15 seconds.
I tried this:
Printf.fprintf oc "%s\n" (String.concat "\n" (zip1 rodata_list));
But there was no obvious improvement.
So I am wondering how to optimize this part. I appreciate any solutions. Thank you!
Don't use ^ to concatenate a bunch of strings in performance critical code, as it will lead to quadratic complexity;
Try not to rely on *printf functions, when performance matters (although in OCaml 4.02 it is pretty fast);
Don't apply several iterations on a list in a row, since OCaml doesn't perform deforestation. Try to do as many operations in one iteration as possible;
If you're using lists of 1 million elements, then you're actually doing something wrong. Try to use a different data structure;
So, given the advice above, we have the following:
List.iter (fun (x,y) ->
    output_string oc x;
    output_string oc y;
    output_char oc '\n') rodata_list
Also, any optimization should start from profiling. To get the profile, you need to compile with profiling info, for example like this:
ocamlbuild myprogram.p.native
Then you can run the program to collect the profile, which can be read with gprof. My guess is that you will spend all the time not in the actual IO, or even in concatenation, but in garbage collection, since your zip will create millions of strings.
How fast it should be
So, to prove that you're actually trying to optimize the wrong part of your code, I wrote this small program:
let rec init_rev acc = function
  | 0 -> acc
  | n -> init_rev (("hello", "world") :: acc) (n-1)
let () = List.iter (fun (x,y) ->
    print_string x;
    print_endline y) (init_rev [] 1000_000)
It creates a list of one million elements and outputs it:
$ ocamlbuild main.native
$ time ./main.native > data.txt
real 0m0.998s
user 0m0.211s
sys 0m0.783s
This is on a MacBook laptop. Moreover, we spend most of the time in the system, with only 200ms in OCaml. And a simple loop of 1000_000 iterations, without creating a list, takes only 11ms.
So, profile.

Code becomes slower as more boxed arrays are allocated

Edit: It turns out that things generally (not just array/ref operations) slow down the more arrays have been created, so I guess this might just be measuring increased GC times and might not be as strange as I thought. But I'd really like to know (and learn how to find out) what's happening here though, and if there's some way to mitigate this effect in code that creates lots of smallish arrays. Original question follows.
In investigating some weird benchmarking results in a library, I stumbled upon some behavior I don't understand, though it might be really obvious. It seems that the time taken for many operations (creating a new MutableArray, reading or modifying an IORef) increases in proportion to the number of arrays in memory.
Here's the first example:
module Main
where
import Control.Monad
import qualified Data.Primitive as P
import Control.Concurrent
import Data.IORef
import Criterion.Main
import Control.Monad.Primitive(PrimState)
main = do
    let n = 100000
    allTheArrays <- newIORef []
    defaultMain $
      [ bench "array creation" $ do
            newArr <- P.newArray 64 () :: IO (P.MutableArray (PrimState IO) ())
            atomicModifyIORef' allTheArrays (\l-> (newArr:l,()))
      ]
We're creating a new array and adding it to a stack. As criterion does more samples and the stack grows, array creation takes more time, and this seems to grow linearly and regularly.
Even more odd, IORef reads and writes are affected, and we can see the atomicModifyIORef' getting faster presumably as more arrays are GC'd.
main = do
    let n = 1000000
    arrs <- replicateM (n) $ (P.newArray 64 () :: IO (P.MutableArray (PrimState IO) ()))
    -- print $ length arrs -- THIS WORKS TO MAKE THINGS FASTER
    arrsRef <- newIORef arrs
    defaultMain $
      [ bench "atomic-mods of IORef" $
      -- nfIO $ -- OR THIS ALSO WORKS
          replicateM 1000 $
            atomicModifyIORef' arrsRef (\(a:as)-> (as,()))
      ]
Either of the two commented-out lines gets rid of this behavior, but I'm not sure why (maybe after we force the spine of the list, the elements can actually be collected).
Questions
What's happening here?
Is it expected behavior?
Is there a way I can avoid this slowdown?
Edit: I assume this has something to do with GC taking longer, but I'd like to understand more precisely what's happening, especially in the first benchmark.
Bonus example
Finally, here's a simple test program that can be used to pre-allocate some number of arrays and time a bunch of atomicModifyIORefs. This seems to exhibit the slow IORef behavior.
import Control.Monad
import System.Environment
import qualified Data.Primitive as P
import Control.Concurrent
import Control.Concurrent.Chan
import Control.Concurrent.MVar
import Data.IORef
import Criterion.Main
import Control.Exception(evaluate)
import Control.Monad.Primitive(PrimState)
import qualified Data.Array.IO as IO
import qualified Data.Vector.Mutable as V
import System.CPUTime
import System.Mem(performGC)
import System.Environment
main :: IO ()
main = do
    [n] <- fmap (map read) getArgs
    arrs <- replicateM (n) $ (P.newArray 64 () :: IO (P.MutableArray (PrimState IO) ()))
    arrsRef <- newIORef arrs
    t0 <- getCPUTimeDouble
    cnt <- newIORef (0::Int)
    replicateM_ 1000000 $
      (atomicModifyIORef' cnt (\n-> (n+1,())) >>= evaluate)
    t1 <- getCPUTimeDouble
    -- make sure these stick around
    readIORef cnt >>= print
    readIORef arrsRef >>= (flip P.readArray 0 . head) >>= print
    putStrLn "The time:"
    print (t1 - t0)
A heap profile with -hy shows mostly MUT_ARR_PTRS_CLEAN, which I don't completely understand.
If you want to reproduce, here is the cabal file I've been using
name: small-concurrency-benchmarks
version: 0.1.0.0
build-type: Simple
cabal-version: >=1.10
executable small-concurrency-benchmarks
  main-is: Main.hs
  build-depends: base >=4.6
               , criterion
               , primitive
  default-language: Haskell2010
  ghc-options: -O2 -rtsopts
Edit: Here's another test program, that can be used to compare slowdown with heaps of the same size of arrays vs [Integer]. It takes some trial and error adjusting n and observing profiling to get comparable runs.
main4 :: IO ()
main4= do
    [n] <- fmap (map read) getArgs
    let ns = [(1::Integer).. n]
    arrsRef <- newIORef ns
    print $ length ns
    t0 <- getCPUTimeDouble
    mapM (evaluate . sum) (tails [1.. 10000])
    t1 <- getCPUTimeDouble
    readIORef arrsRef >>= (print . sum)
    print (t1 - t0)
Interestingly, when I test this I find that the same heap size-worth of arrays affects performance to a greater degree than [Integer]. E.g.
         Baseline   20M    200M
Lists:   0.7        1.0    4.4
Arrays:  0.7        2.6    20.4
Conclusions (WIP)
1. This is most likely due to GC behavior.
2. But mutable boxed arrays seem to lead to more severe slowdowns (see above). Setting +RTS -A200M brings performance of the array garbage version in line with the list version, supporting that this has to do with GC.
3. The slowdown is proportional to the number of arrays allocated, not the number of total cells in the array. Here is a set of runs showing, for a similar test to main4, the effects of number of arrays allocated both on the time taken to allocate, and a completely unrelated "payload". This is for 16777216 total cells (divided amongst however many arrays):
   Array size   Array create time   Time for "payload"
   8            3.164               14.264
   16           1.532               9.008
   32           1.208               6.668
   64           0.644               3.78
   128          0.528               2.052
   256          0.444               3.08
   512          0.336               4.648
   1024         0.356               0.652
   And running this same test on 16777216*4 cells shows basically identical payload times as above, only shifted down two places.
4. From what I understand about how GHC works, and looking at (3), I think this overhead might simply come from having pointers to all these arrays sticking around in the remembered set (see also: here), and whatever overhead that causes for the GC.
You are paying linear overhead every minor GC per mutable array that remains live and gets promoted to the old generation. This is because GHC unconditionally places all mutable arrays on the mutable list and traverses the entire list every minor GC. See https://ghc.haskell.org/trac/ghc/ticket/7662 for more information, as well as my mailing list response to your question: http://www.haskell.org/pipermail/glasgow-haskell-users/2014-May/024976.html
I think you're definitely seeing GC effects. I had a related issue in cassava (https://github.com/tibbe/cassava/issues/49#issuecomment-34929984) where the GC time was increasing linearly with increasing heap size.
Try to measure how the GC time and mutator time increase as you hold on to more and more arrays in memory.
You can reduce GC time by playing with the +RTS options. For example, try setting -A to your L3 cache size.
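One way to take that measurement from inside the program is GHC.Stats from base. The following is only a rough sketch: it assumes the program is run with +RTS -T so that statistics are collected, the names allocArrays and payload are made up, and IOArray from the array package stands in for the question's MutableArray:

```haskell
import Control.Monad (replicateM, replicateM_)
import Data.Array.IO (IOArray, newArray, readArray)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Allocate n small boxed mutable arrays (64 cells each, as in the question).
allocArrays :: Int -> IO [IOArray Int ()]
allocArrays n = replicateM n (newArray (0, 63) ())

-- A payload that never touches the arrays: bump a counter k times.
payload :: Int -> IO Int
payload k = do
  cnt <- newIORef (0 :: Int)
  replicateM_ k (atomicModifyIORef' cnt (\c -> (c + 1, ())))
  readIORef cnt

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled          -- True only when run with +RTS -T
  arrs <- allocArrays 100000
  before <- if enabled then Just <$> getRTSStats else return Nothing
  total <- payload 1000000
  after <- if enabled then Just <$> getRTSStats else return Nothing
  print total
  readArray (head arrs) 0 >>= print      -- keep the arrays alive until here
  case (before, after) of
    (Just b, Just a) -> do
      putStrLn ("GC CPU ns:      " ++ show (gc_cpu_ns a - gc_cpu_ns b))
      putStrLn ("mutator CPU ns: " ++ show (mutator_cpu_ns a - mutator_cpu_ns b))
      putStrLn ("collections:    " ++ show (gcs a - gcs b))
    _ -> putStrLn "run with +RTS -T to collect GC statistics"
```

Changing the number passed to allocArrays and comparing the GC and mutator figures, with and without a larger -A, should show whether the extra time is spent in the collector.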

Parallel mapM on Repa arrays

In my recent work with Gibbs sampling, I've been making great use of the RVar which, in my view, provides a near ideal interface to random number generation. Sadly, I've been unable to make use of Repa due to the inability to use monadic actions in maps.
While clearly monadic maps can't be parallelized in general, it seems to me that RVar may be at least one example of a monad where effects can be safely parallelized (at least in principle; I'm not terribly familiar with the inner workings of RVar). Namely, I want to write something like the following,
drawClass :: Sample -> RVar Class
drawClass = ...
drawClasses :: Array U DIM1 Sample -> RVar (Array U DIM1 Class)
drawClasses samples = A.mapM drawClass samples
where A.mapM would look something like,
mapM :: ParallelMonad m => (a -> m b) -> Array r sh a -> m (Array r sh b)
While clearly how this would work depends crucially on the implementation of RVar and its underlying RandomSource, in principle one would think that this would involve drawing a new random seed for each thread spawned and proceeding as usual.
Intuitively, it seems that this same idea might generalize to some other monads.
So, my question is: Could one construct a class ParallelMonad of monads for which effects can be safely parallelized (presumably inhabited by, at the least, RVar)?
What might it look like? What other monads might inhabit this class? Have others considered the possibility of how this might work in Repa?
Finally, if this notion of parallel monadic actions can't be generalized, does anyone see any nice way to make this work in the specific case of RVar (where it would be very useful)? Giving up RVar for parallelism is a very difficult trade-off.
It's been 7 years since this question was asked, and it still seems that no one has come up with a good solution to this problem. Repa doesn't have a mapM/traverse-like function, not even one that runs without parallelization. Moreover, considering the amount of progress there has been in the last few years, it seems unlikely that it will happen.
Because of the stale state of many array libraries in Haskell and my overall dissatisfaction with their feature sets, I've put a couple of years of work into an array library, massiv, which borrows some concepts from Repa but takes them to a completely different level. Enough with the intro.
Prior to today, there were three monadic map-like functions in massiv (not counting the synonym-like functions: imapM, forM et al.):
mapM - the usual mapping in an arbitrary Monad. Not parallelizable for obvious reasons and is also a bit slow (along the lines of usual mapM over a list slow)
traversePrim - here we are restricted to PrimMonad, which is significantly faster than mapM, but the reason for this is not important for this discussion.
mapIO - this one, as the name suggests, is restricted to IO (or rather MonadUnliftIO, but that is irrelevant). Because we are in IO we can automatically split the array into as many chunks as there are cores and use separate worker threads to map the IO action over each element in those chunks. Unlike pure fmap, which is also parallelizable, we have to be in IO here because of the non-determinism of scheduling combined with the side effects of our mapping action.
So, once I read this question, I thought to myself that the problem is practically solved in massiv, but not so fast. Random number generators, such as those in mwc-random and others in random-fu, can't use the same generator across many threads. This means that the only piece of the puzzle I was missing was: "drawing a new random seed for each thread spawned and proceeding as usual". In other words, I needed two things:
A function that would initialize as many generators as there are going to be worker threads
and an abstraction that would seamlessly give the correct generator to the mapping function depending on which thread the action is running in.
So that is exactly what I did.
First I will give examples using the specially crafted randomArrayWS and initWorkerStates functions, as they are more relevant to the question and later move to the more general monadic map. Here are their type signatures:
randomArrayWS ::
     (Mutable r ix e, MonadUnliftIO m, PrimMonad m)
  => WorkerStates g -- ^ Use `initWorkerStates` to initialize your per thread generators
  -> Sz ix -- ^ Resulting size of the array
  -> (g -> m e) -- ^ Generate the value using the per thread generator.
  -> m (Array r ix e)
initWorkerStates :: MonadIO m => Comp -> (WorkerId -> m s) -> m (WorkerStates s)
For those who are not familiar with massiv, the Comp argument is a computation strategy to use, notable constructors are:
Seq - run computation sequentially, without forking any threads
Par - spin up as many threads as there are capabilities and use those to do the work.
I'll use mwc-random package as an example initially and later move to RVarT:
λ> import Data.Massiv.Array
λ> import System.Random.MWC (createSystemRandom, uniformR)
λ> import System.Random.MWC.Distributions (standard)
λ> gens <- initWorkerStates Par (\_ -> createSystemRandom)
Above we initialized a separate generator per thread using system randomness, but we could have just as well used a unique per thread seed by deriving it from the WorkerId argument, which is a mere Int index of the worker. And now we can use those generators to create an array with random values:
λ> randomArrayWS gens (Sz2 2 3) standard :: IO (Array P Ix2 Double)
Array P Par (Sz (2 :. 3))
[ [ -0.9066144845415213, 0.5264323240310042, -1.320943607597422 ]
, [ -0.6837929005619592, -0.3041255565826211, 6.53353089112833e-2 ]
]
By using the Par strategy, the scheduler library will split the work of generation evenly among the available workers, and each worker will use its own generator, thus making it thread safe. Nothing prevents us from reusing the same WorkerStates an arbitrary number of times as long as it is not done concurrently, which would otherwise result in an exception:
λ> randomArrayWS gens (Sz1 10) (uniformR (0, 9)) :: IO (Array P Ix1 Int)
Array P Par (Sz1 10)
[ 3, 6, 1, 2, 1, 7, 6, 0, 8, 8 ]
Now, putting mwc-random to the side, we can reuse the same concept for other possible use cases by using functions like generateArrayWS:
generateArrayWS ::
     (Mutable r ix e, MonadUnliftIO m, PrimMonad m)
  => WorkerStates s
  -> Sz ix -- ^ size of new array
  -> (ix -> s -> m e) -- ^ element generating action
  -> m (Array r ix e)
and mapWS:
mapWS ::
     (Source r' ix a, Mutable r ix b, MonadUnliftIO m, PrimMonad m)
  => WorkerStates s
  -> (a -> s -> m b) -- ^ Mapping action
  -> Array r' ix a -- ^ Source array
  -> m (Array r ix b)
Here is the promised example on how to use this functionality with rvar, random-fu and mersenne-random-pure64 libraries. We could have used randomArrayWS here as well, but for the sake of example let's say we already have an array with different RVarTs, in which case we need a mapWS:
λ> import Data.Massiv.Array
λ> import Control.Scheduler (WorkerId(..), initWorkerStates)
λ> import Data.IORef
λ> import System.Random.Mersenne.Pure64 as MT
λ> import Data.RVar as RVar
λ> import Data.Random as Fu
λ> rvarArray = makeArrayR D Par (Sz2 3 9) (\ (i :. j) -> Fu.uniformT i j)
λ> mtState <- initWorkerStates Par (newIORef . MT.pureMT . fromIntegral . getWorkerId)
λ> mapWS mtState RVar.runRVarT rvarArray :: IO (Array P Ix2 Int)
Array P Par (Sz (3 :. 9))
[ [ 0, 1, 2, 2, 2, 4, 5, 0, 3 ]
, [ 1, 1, 1, 2, 3, 2, 6, 6, 2 ]
, [ 0, 1, 2, 3, 4, 4, 6, 7, 7 ]
]
It is important to note, that despite the fact that pure implementation of Mersenne Twister is being used in the above example, we cannot escape the IO. This is because of the non-deterministic scheduling, which means that we never know which one of the workers will be handling which chunk of the array and consequently which generator will be used for which part of the array. On the up side, if the generator is pure and splittable, such as splitmix, then we can use the pure, deterministic and parallelizable generation function: randomArray, but that is already a separate story.
It's probably not a good idea to do this, due to the inherently sequential nature of PRNGs. Instead, you might want to transition your code as follows:
Declare an IO function (main, or what have you).
Read as many random numbers as you need.
Pass the (now pure) numbers onto your repa functions.
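The three steps above might look roughly like this. It is only a sketch using base: the hand-rolled linear congruential generator (lcgStep) and the names randomWords, toUnit and drawClasses are invented for illustration, and plain lists stand in for the Repa arrays. A real program would draw its numbers from a proper generator and then hand the pure values to Repa:

```haskell
import Data.Bits (shiftR)
import Data.Word (Word64)
import System.CPUTime (getCPUTime)

-- One step of a 64-bit linear congruential generator (illustrative only;
-- Word64 arithmetic wraps around, which gives the modulus 2^64 for free).
lcgStep :: Word64 -> Word64
lcgStep s = s * 6364136223846793005 + 1442695040888963407

-- Draw n raw pseudo-random words from a seed. This is the sequential part.
randomWords :: Int -> Word64 -> [Word64]
randomWords n seed = take n (tail (iterate lcgStep seed))

-- Map a raw word to a Double in [0, 1) using its top 53 bits.
toUnit :: Word64 -> Double
toUnit w = fromIntegral (w `shiftR` 11) / 9007199254740992 -- 2^53

-- A pure consumer: each sample is a probability, paired with one
-- pre-drawn uniform number. A function like this is safe to parallelise.
drawClasses :: [Double] -> [Double] -> [Bool]
drawClasses = zipWith (<)

main :: IO ()
main = do
  seed <- fromIntegral <$> getCPUTime          -- step 1: we are in IO
  let us = map toUnit (randomWords 5 seed)     -- step 2: read the numbers
  print (drawClasses us [0.1, 0.5, 0.9, 0.0, 1.0]) -- step 3: pure from here on
```

The trade-off is that all randomness has to be drawn up front, so the number of draws per element must be known before the pure computation starts.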

Non-monolithic arrays in Haskell

I have accepted an answer to the question below, but it seems I misunderstood how arrays in Haskell work. I thought they were just beefed up lists. Keep that in mind when reading the question below.
I've found that monolithic arrays in Haskell are quite inefficient when using them for larger arrays.
I haven't been able to find a non-monolithic implementation of arrays in Haskell. What I need is O(1) time look up on a multidimensional array.
Is there an implementation of arrays that supports this?
EDIT: I seem to have misunderstood the term monolithic. The problem is that it seems like arrays in Haskell are treated like lists. I might be wrong though.
EDIT2: Short example of inefficient code:
fibArray n = a where
  bnds = (0,n)
  a = array bnds [ (i, f i) | i <- range bnds ]
  f 0 = 0
  f 1 = 1
  f i = a!(i-1) + a!(i-2)
This is an array of length n+1 where the i'th field holds the i'th Fibonacci number. But since arrays in Haskell have O(n) time lookup, it takes O(n²) time to compute.
You're confusing linked lists in Haskell with arrays.
Linked lists are the data types that use the following syntax:
[1,2,3,5]
defined as:
data [a] = [] | a : [a]
These are classical recursive data types, supporting O(n) indexing and O(1) prepend.
If you're looking for multidimensional data with O(1) lookup, instead you should use a true array or matrix data structure. Good candidates are:
Repa - fast, parallel, multidimensional arrays -- (Tutorial)
Vector - An efficient implementation of Int-indexed arrays (both mutable and immutable), with a powerful loop optimisation framework. (Tutorial)
HMatrix - Purely functional interface to basic linear algebra and other numerical computations, internally implemented using GSL, BLAS and LAPACK.
Arrays have O(1) indexing. The problem is that each element is calculated lazily. So this is what happens when you run this in ghci:
*Main> :set +s
*Main> let t = 100000
(0.00 secs, 556576 bytes)
*Main> let a = fibArray t
Loading package array-0.4.0.0 ... linking ... done.
(0.01 secs, 1033640 bytes)
*Main> a!t -- result omitted
(1.51 secs, 570473504 bytes)
*Main> a!t -- result omitted
(0.17 secs, 17954296 bytes)
*Main>
Note that lookup is very fast after it's already been looked up once. The array function creates an array of pointers to thunks that will eventually be calculated to produce a value. The first time you evaluate a value, you pay this cost. Here are the first few expansions of the thunk for evaluating a!t:
a!t -> a!(t-1)+a!(t-2) -> a!(t-2)+a!(t-3)+a!(t-2) -> a!(t-3)+a!(t-4)+a!(t-3)+a!(t-2)
It's not the cost of the calculations per se that's expensive, rather it's the need to create and traverse this very large thunk.
I tried strictifying the values in the list passed to array, but that seemed to result in an endless loop.
One common way around this is to use a mutable array, such as an STArray. The elements can be updated as they're available during the array creation, and the end result is frozen and returned. In the vector package, the create and constructN functions provide easy ways to do this.
-- constructN :: Unbox a => Int -> (Vector a -> a) -> Vector a
import qualified Data.Vector.Unboxed as V
import Data.Int
fibVec :: Int -> V.Vector Int64
fibVec n = V.constructN (n+1) c
  where
    c v | V.length v == 0 = 0
    c v | V.length v == 1 = 1
    c v | V.length v == 2 = 1
    c v = let len = V.length v
          in v V.! (len-1) + v V.! (len-2)
BUT, the fibVec function only works with unboxed vectors. Regular vectors (and arrays) aren't strict enough, leading back to the same problem you've already found. And unfortunately there isn't an Unbox instance for Integer, so if you need unbounded integer types (this fibVec has already overflowed in this test) you're stuck with creating a mutable array in IO or ST to enable the necessary strictness.
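For the Integer case, a mutable array in ST could look roughly like the following. This is a sketch using STArray from the array package; the name fibArrayST is made up here, and the `$!` is what forces each sum before it is stored, so no chain of thunks builds up:

```haskell
import Control.Monad (forM_)
import Data.Array (Array, (!))
import Data.Array.ST (newArray, readArray, runSTArray, writeArray)

-- Fibonacci numbers 0..n as a boxed array of Integers, filled in order.
-- The array always has at least the indices 0 and 1, even for n < 1.
fibArrayST :: Int -> Array Int Integer
fibArrayST n = runSTArray (do
  arr <- newArray (0, max 1 n) 0      -- every cell starts at 0
  writeArray arr 1 1
  forM_ [2 .. n] $ \i -> do
    x <- readArray arr (i - 1)
    y <- readArray arr (i - 2)
    writeArray arr i $! x + y         -- store an evaluated Integer, not a thunk
  return arr)

main :: IO ()
main = print (fibArrayST 100 ! 100) -- prints 354224848179261915075
```

The result is an ordinary immutable Array, so callers still index it with (!) in O(1).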
Referring specifically to your fibArray example, try this and see if it speeds things up a bit:
-- gradually calculate m-th item in steps of k
-- to prevent STACK OVERFLOW , etc
gradualth m k arr
    | m <= v = pre `seq` arr!m
  where
    pre = foldl1 (\a b-> a `seq` arr!b) [u,u+k..m]
    (u,v) = bounds arr
For me, for let a=fibArray 50000, gradualth 50000 10 a ran at 0.65 of the run time of just calling a!50000 right away.

Growing arrays in Haskell

I have the following (imperative) algorithm that I want to implement in Haskell:
Given a sequence of pairs [(e0,s0), (e1,s1), (e2,s2),...,(en,sn)], where both "e" and "s" parts are natural numbers not necessarily different, at each time step one element of this sequence is randomly selected, let's say (ei,si), and based on the values of (ei,si), a new element is built and added to the sequence.
How can I implement this efficiently in Haskell? The need for random access would make it bad for lists, while the need for appending one element at a time would make it bad for arrays, as far as I know.
Thanks in advance.
I suggest using either Data.Set or Data.Sequence, depending on what you're needing it for. The latter in particular provides you with logarithmic index lookup (as opposed to linear for lists) and O(1) appending on either end.
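A minimal sketch of the Data.Sequence route, using the containers package. The names growStep and grow are invented for illustration, and the "random" choices are passed in as a plain list of indices so that the example stays deterministic; in a real program they would come from a random number generator:

```haskell
import Data.Foldable (foldl', toList)
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq

-- One time step: take the element at position i (wrapped into range),
-- build a new element from it with f, and append it at the end.
growStep :: ((Int, Int) -> (Int, Int)) -> Seq (Int, Int) -> Int -> Seq (Int, Int)
growStep f s i
  | Seq.null s = s                                    -- nothing to pick from
  | otherwise  = s |> f (Seq.index s (i `mod` Seq.length s))

-- Run one step for every index in the list of picks.
grow :: ((Int, Int) -> (Int, Int)) -> [Int] -> Seq (Int, Int) -> Seq (Int, Int)
grow f picks s0 = foldl' (growStep f) s0 picks

main :: IO ()
main =
  print (toList (grow (\(e, s) -> (e + 1, s * 2)) [0, 1, 5] (Seq.fromList [(1, 2), (3, 4)])))
  -- prints [(1,2),(3,4),(2,4),(4,8),(4,8)]
```

Unlike a Set, a Seq keeps duplicates and insertion order, which matches the description of the sequence in the question.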
"while the need for appending one element at a time would make it bad for arrays" Algorithmically, it seems like you want a dynamic array (aka vector, array list, etc.), which has amortized O(1) time to append an element. I don't know of a Haskell implementation of it off-hand, and it is not a very "functional" data structure, but it is definitely possible to implement it in Haskell in some kind of state monad.
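As a rough idea of what such a dynamic array could look like in IO, here is a sketch built on IOArray from the array package. Everything in it (DynArray, newDynArray, push, readAt, the initial capacity of 4) is made up for illustration and is not taken from an existing library. The backing store doubles whenever it is full, which is what gives amortized O(1) appends:

```haskell
import Control.Monad (forM_)
import Data.Array.IO (IOArray, getBounds, newArray_, readArray, writeArray)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- A growable array: the number of used slots plus the current backing store.
data DynArray a = DynArray
  { usedRef  :: IORef Int
  , storeRef :: IORef (IOArray Int a)
  }

newDynArray :: IO (DynArray a)
newDynArray = do
  store <- newArray_ (0, 3)                 -- initial capacity 4 (arbitrary)
  DynArray <$> newIORef 0 <*> newIORef store

-- Append one element, doubling the backing store when it is full.
push :: DynArray a -> a -> IO ()
push d x = do
  used <- readIORef (usedRef d)
  store <- readIORef (storeRef d)
  (_, hi) <- getBounds store
  store' <-
    if used <= hi
      then return store
      else do
        bigger <- newArray_ (0, 2 * (hi + 1) - 1)
        forM_ [0 .. hi] $ \i -> readArray store i >>= writeArray bigger i
        writeIORef (storeRef d) bigger
        return bigger
  writeArray store' used x
  writeIORef (usedRef d) (used + 1)

-- O(1) random access to the used part of the array.
readAt :: DynArray a -> Int -> IO a
readAt d i = do
  used <- readIORef (usedRef d)
  if i < 0 || i >= used
    then ioError (userError "readAt: index out of range")
    else readIORef (storeRef d) >>= \store -> readArray store i

main :: IO ()
main = do
  d <- newDynArray
  mapM_ (push d) [1 .. 10 :: Int]
  mapM (readAt d) [0 .. 9] >>= print -- prints [1,2,3,4,5,6,7,8,9,10]
```

The same structure could be written in ST with STArray and STRef if the mutation should stay hidden behind a pure interface.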
If you know approximately how many elements you will need in total, then you can create an array of that size which is "sparse" at first, and then put elements into it as needed.
Something like below can be used to represent this new array:
data MyArray = MyArray (Array Int Int) Int
(where the last Int represent how many elements are used in the array)
If you really need stop-and-start resizing, you could think about using the simple-rope package along with a StringLike instance for something like Vector. In particular, this might accommodate scenarios where you start out with a large array and are interested in relatively small additions.
That said, adding individual elements into the chunks of the rope may still induce a lot of copying. You will need to try out your specific case, but you should be prepared to use a mutable vector as you may not need pure intermediate results.
If you can build your array in one shot and just need the indexing behavior you describe, something like the following may suffice,
import Data.Array.IArray
test :: Array Int (Int,Int)
test = accumArray (flip const) (0,0) (0,20) [(i, f i) | i <- [0..19]]
  where f 0 = (1,0)
        f i = let (e,s) = test ! (i `div` 2) in (e*2,s+1)
Taking a note from ivanm, I think Sets are the way to go for this.
import Data.Set as Set
import System.Random (RandomGen, getStdGen)
startSet :: Set (Int, Int)
startSet = Set.fromList [(1,2), (3,4)] -- etc. Whatever the initial set is
-- grow the set by randomly producing "n" elements.
growSet :: (RandomGen g) => g -> Set (Int, Int) -> Int -> (Set (Int, Int), g)
growSet g s n | n <= 0 = (s, g)
              | otherwise = growSet g'' s' (n-1)
  where s' = Set.insert (x,y) s
        ((x,_), g') = randElem s g
        ((_,y), g'') = randElem s g'
randElem :: (RandomGen g) => Set a -> g -> (a, g)
randElem = undefined
main = do
  g <- getStdGen
  let (grownSet,_) = growSet g startSet 2
  print $ grownSet -- or whatever you want to do with it
This assumes that randElem is an efficient, definable method for selecting a random element from a Set. (I asked this SO question regarding efficient implementations of such a method). One thing I realized upon writing up this implementation is that it may not suit your needs, since Sets cannot contain duplicate elements, and my algorithm has no way to give extra weight to pairings that appear multiple times in the list.

Resources