What is the fastest way to initialize an immutable unboxed int array in Haskell?

Is this the fastest way to initialize an immutable array in Haskell with non-default (non-zero) values? In the following examples I am simply initializing the array with values from 0 to (size-1).
Fastest so far (twice the speed of Code.ST below). Thanks to leftaroundabout:
...
import qualified Data.Vector.Unboxed as V
stArray :: Int -> V.Vector Int
stArray size =
  V.generate size id
...
My original fastest:
module Code.ST where
import Data.Array.MArray
import Data.Array.ST
import Data.Array.Unboxed
stArray :: Int -> UArray Int Int
stArray size =
  runSTUArray $ newArray (0,size-1) 0 >>= f 0
  where
    f i a
      | i >= size = return a
      | otherwise = writeArray a i i >> f (i + 1) a
stMain :: IO ()
stMain = do
  let size = 340000000
  let a = stArray size
  putStrLn $ "Size: " ++ show size ++ " Min: " ++ show (a ! 0) ++ " Max: " ++ show (a ! (size - 1))
I have tried the simpler immutable ways of doing it, and they are 2 to 3 times slower on my PC (YMMV). I also tried Repa, but it falls over even with arrays smaller than 340000000 elements (lots of HD thrashing; I gave up before it finished).

Have you tried listArray from Data.Array.Unboxed? You can use them like this:
-- listArray :: (Ix i, IArray a e) => (i, i) -> [e] -> a i e
listArray (0,3) "abcdefgh" :: UArray Int Char
This will create
array (0,3) [(0,'a'),(1,'b'),(2,'c'),(3,'d')]
If you need a bit more flexibility you can use array from the same module.
-- array :: (Ix i, IArray a e) => (i, i) -> [(i, e)] -> a i e
array (0,3) (zip [1,3,0,2] "abcd") :: UArray Int Char
Which will produce
array (0,3) [(0,'c'),(1,'a'),(2,'d'),(3,'b')]
I don't really know whether it is fast or not, but certainly it is more convenient to use than hand-written ST loops.
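Applied to the question's case (values 0 to size-1), a minimal sketch could look like the following. The name listInit is made up for illustration, and no claim is made about how its speed compares with the ST loop; it only shows that the result has the same contents.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))

-- Hypothetical name: builds the same contents as stArray in the question,
-- but from a lazily produced list instead of an explicit ST loop.
listInit :: Int -> UArray Int Int
listInit size = listArray (0, size - 1) [0 .. size - 1]

-- First and last element, mirroring what stMain prints.
endpoints :: Int -> (Int, Int)
endpoints size = (a ! 0, a ! (size - 1))
  where
    a = listInit size
```

For example, endpoints 1000 evaluates to (0,999).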

Related

Converting a list to `IO (IOArray Int a)` in Haskell

I need to write a function that takes a list of a and returns IO (IOArray Int a)
listToArray :: [a] -> IO (IOArray Int a)
I need some help to get started with IO arrays. I create a new one by newListArray but then it seems that I have to commit to a specific type and my function needs to work with any type a.
Thanks for the help!
If you want to work with any type you can take a look at this type signature
newListArray :: (MArray a e m, Ix i) => (i, i) -> [e] -> m (a i e)
where the m is IO, a is IOArray and i is Int.
This one requires start and end index of the array. You can see in detail here: http://hackage.haskell.org/package/array-0.5.4.0/docs/Data-Array-MArray.html#v:newListArray
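A minimal sketch of how that signature can be used for listToArray, assuming a zero-based index range is what you want (the question does not say). The helper roundTrip is only there to show the array being read back.

```haskell
import Data.Array.IO (IOArray, getElems, newListArray)

-- Works for any element type a, because IOArray is a boxed array.
-- Assumption: indices run from 0 to length xs - 1.
listToArray :: [a] -> IO (IOArray Int a)
listToArray xs = newListArray (0, length xs - 1) xs

-- Illustration only: build the array and read its elements back.
roundTrip :: [a] -> IO [a]
roundTrip xs = listToArray xs >>= getElems
```

Note that length xs walks the whole list once before the array is allocated, so this will not terminate on an infinite list.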

Run-length encoding of a Repa array

I have a one-dimensional Repa array that consists of 0's and 1's and I want to calculate its run-length encoding.
E.g.: Turn [0,0,1,1,1,0,0,0,1,0,1,1] into [2,3,3,1,1,2] or something similar. (I'm using a list representation because of readability)
Ideally, I would like the run-length of the 1's and ignore the 0's.
So [0,0,1,1,1,0,0,0,1,0,1,1] becomes [3,1,2].
I would like the result to be a (Repa) array as well.
How can I do this using Repa? I can't use map or traverse since they only give me one element at a time. I could try to fold with some special kind of accumulator, but that doesn't seem ideal and I don't know if it's even possible (due to monad laws).
I'm currently just iterating over the array and returning a list without using any Repa function. I'm working on Booleans instead of 1's and 0's but the algorithm is the same. I'm converting this list to a Repa Array afterwards.
runLength :: Array U DIM1 Bool -> [Length]
runLength arr = go ([], 0, False) 0 arr
  where
    Z :. n = extent arr
    go :: Accumulator -> Int -> Array U DIM1 Bool -> [Length]
    go !acc@(xs, c, b) !i !arr
      | i == n = if c > 0 then c:xs else xs
      | otherwise =
          if unsafeIndex arr (Z :. i)
            then if b
              then go (xs, c+1, b) (i+1) arr
              else go (xs, 1, True) (i+1) arr
            else if b
              then go (c:xs, 0, False) (i+1) arr
              else go (xs, 0, False) (i+1) arr
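For reference, the intended result can be written down on plain lists in a few lines. This is not a Repa solution, only a sketch of the specification that a Repa version would have to match; the name runLengthsOfTrue is made up.

```haskell
import Data.List (group)

-- Lengths of the maximal runs of True, in left-to-right order.
-- Runs of False are grouped too, but the pattern match drops them.
runLengthsOfTrue :: [Bool] -> [Int]
runLengthsOfTrue bs = [length g | g@(True : _) <- group bs]
```

With the example from above, runLengthsOfTrue (map (== 1) [0,0,1,1,1,0,0,0,1,0,1,1]) gives [3,1,2].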

SML/NJ: able to find the max real of an array, can't get its index

I have an array of real values like [|1.2, 3.4, 5.3, 2.5|]
fun max_arr arr = foldl Real.max (sub (arr, 0)) arr;
works fine to find the max value, 5.3.
Then I would expect something like
fun max_arri arr = foldli (Real.max(sub (arr, 0))) arr;
to successfully return the location of the max value, 2, but it doesn't work.
(Error: unbound variable or constructor: max_arri)
I went through everything I could find online, but the documentation for SML seems sparse...
According to the manual, they both take the same data, so what would I need to change?
foldli f init arr
foldl f init arr
Also, I don't want to use lists because I change the data a lot.
They do not take the same input.
val foldl  : ('a * 'b -> 'b) -> 'b -> 'a array -> 'b
val foldli : (int * 'a * 'b -> 'b) -> 'b -> 'a array -> 'b
As we can see from the types, the difference is that foldli takes a function that also takes an integer -- the index of the element.
Both functions return a 'b, so foldli does not return the index. Rather, this expression:
foldli (fn (i, a, b) => f (a, b)) init arr
is exactly equivalent to this expression:
foldl f init arr
Now, if we want to return the index of an element, we need the 'b in the type of foldli to become int. However, finding the maximum relies on comparison of elements, so we also need the current maximum, just like in your max_arr function. The obvious solution is to use a tuple. 'b now becomes (real * int).
(* cElem = current element
* cI = current index
*)
fun fmax (i, elem : real, (cElem, cI)) =
if cElem > elem
then (cElem, cI)
else (elem, i)
fun max_arri arr = foldli fmax (sub (arr, 0), 0) arr
Of course, this is not the type we want our max_arri to return - we only want the index. The loose helper function isn't very nice either, but it's a bit long to have as a lambda. Instead, we wrap it all in a local:
local
fun fmax (i, elem : real, (cElem, cI)) =
if cElem > elem
then (cElem, cI)
else (elem, i)
fun max_arri' arr = foldli fmax (sub (arr, 0), 0) arr
in
fun max_arri arr = let val (_, i) = max_arri' arr
in i end
end

How to use `getBounds' with STArray?

I'm trying to write a Fisher-Yates shuffle algorithm using STArray. Unlike all the other examples I've found on the net, I am trying to avoid using native lists. I just want to shuffle an array, in-place.
This is what I have:
randShuffleST arr gen = runST $ do
  _ <- getBounds arr
  return (arr, gen)
arr is the STArray and gen will be a generator state of type (RandomGen g).
I was hoping I could rely on the (MArray (STArray s) e (ST s)) instance declaration defined in MArray for being able to use MArray's getBounds but GHCi cannot infer the type of randShuffleST. It fails with:
Could not deduce (MArray a e (ST s))
arising from a use of `getBounds'
from the context (Ix i)
bound by the inferred type of
randShuffleST :: Ix i => a i e -> t -> (a i e, t)
at CGS/Random.hs:(64,1)-(66,25)
Possible fix:
add (MArray a e (ST s)) to the context of
a type expected by the context: ST s (a i e, t)
or the inferred type of
randShuffleST :: Ix i => a i e -> t -> (a i e, t)
or add an instance declaration for (MArray a e (ST s))
In a stmt of a 'do' block: _ <- getBounds arr
In the second argument of `($)', namely
`do { _ <- getBounds arr;
return (arr, gen) }'
In the expression:
runST
$ do { _ <- getBounds arr;
return (arr, gen) }
Interestingly, if I remove the call to `runST' like so:
randShuffleST arr gen = do
  _ <- getBounds arr
  return (arr, gen)
it compiles fine, with the type signature
randShuffleST :: (Ix i, MArray a e m) => a i e -> t -> m (a i e, t)
I'm using GHC 7.4.2 on Arch Linux.
Please give explicit type signatures in your responses to help me understand your code, thank you.
EDIT: I really like Antal S-Z's answer, but I cannot select it because frankly I do not fully understand it. Maybe once I understand my own problem better I'll answer my own question in the future... thanks.
You probably shouldn't use runST in your function. runST should be used once, on the outside of some computation that uses mutation internally but has a pure interface. You probably want your shuffle function, which shuffles the array in-place, to have a type like STArray s i e -> ST s () (or possibly a more general type), and then have a different function that uses runST to present a pure interface, if you want that (that function would probably need to copy values, though). In general the goal of ST is that STRefs and STArrays can never escape from one runST invocation and be used in another.
The type inferred for your function without runST is fine, just more polymorphic (it'll work for IO arrays, ST arrays, STM arrays, unboxed arrays, etc.). You'll have an easier time with inference errors if you specify explicit type signatures, though.
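As a rough sketch of that split (a mutating worker that stays in ST, plus a separate pure wrapper), here is the pattern with array reversal standing in for the shuffle so that no random generator is needed. The names reverseInPlace and reversed are made up for illustration.

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (ST)
import Data.Array (Array, elems, listArray)
import Data.Array.ST (STArray, getBounds, readArray, runSTArray, thaw, writeArray)

-- Mutating worker: stays inside ST and returns nothing.
reverseInPlace :: STArray s Int e -> ST s ()
reverseInPlace arr = do
  (lo, hi) <- getBounds arr
  -- Swap element k from the front with element k from the back.
  forM_ [0 .. (hi - lo + 1) `div` 2 - 1] $ \k -> do
    x <- readArray arr (lo + k)
    y <- readArray arr (hi - k)
    writeArray arr (lo + k) y
    writeArray arr (hi - k) x

-- Pure wrapper: copy in with thaw, mutate, freeze once on the way out.
reversed :: Array Int e -> Array Int e
reversed a = runSTArray (do
  m <- thaw a
  reverseInPlace m
  return m)

-- Small usage example.
example :: String
example = elems (reversed (listArray (0, 3) "abcd"))
```

Here example evaluates to "dcba". The mutable array never leaves the runSTArray call, which is the point of the arrangement.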
This occurs because the rank-2 type of runST is preventing you from giving a meaningful type to randShuffleST. (There's a second problem with your code as written: mutable ST arrays can't meaningfully exist outside of the ST monad, so returning one from inside runST is impossible, and constructing one to pass into a pure function is unlikely at best. This is "uninteresting," but might end up being confusing on its own; see the bottom of this answer for how to address it.)
So, let's see why you can't write down a type signature. It's worth saying up-front that I agree with shachaf about the best way to write functions like the one you're writing: stay inside ST, and use runST only once, at the very end. If you do this, then I've included some sample code at the bottom of the answer which shows how to write your code successfully. But I think it's interesting to understand why you get the error you do; errors like the one you're getting are some of the reasons that you don't want to write your code this way!
To begin with, let's first look at a simplified version of the function which produces the same error message:
bounds arr = runST (getBounds arr)
Now, let's try to give a type to bounds. The obvious choice is
bounds :: (MArray a e (ST s), Ix i) => a i e -> (i,i)
bounds arr = runST (getBounds arr)
We know that arr must be an MArray and we don't care what elements or index type it has (as long as its indices are in Ix), but we know that it must live inside the ST monad. So this should work, right? Not so fast!
ghci> :set -XFlexibleContexts +m
ghci> :module + Control.Monad.ST Data.Array.ST
ghci> let bounds :: (MArray a e (ST s), Ix i) => a i e -> (i,i)
ghci| bounds arr = runST (getBounds arr)
ghci|
<interactive>:8:25:
Could not deduce (MArray a e (ST s1))
arising from a use of `getBounds'
from the context (MArray a e (ST s), Ix i)
bound by the type signature for
bounds :: (MArray a e (ST s), Ix i) => a i e -> (i, i)
at <interactive>:7:5-38
...
Wait a minute: Could not deduce (MArray a e (ST s1))? Where'd that s1 come from‽ We don't mention such a type variable anywhere! The answer is that it's coming from the runST in the definition of bounds. In general, runST has the type (renaming some type variables for convenience) runST :: (forall σ. ST σ α) -> α; when we use it here, we've constrained it to the type (forall σ. ST σ (i,i)) -> (i,i). What's happening here is that the forall is like a lambda (in fact, it is a lambda), binding σ locally inside the parentheses. So when getBounds arr returns something of type ST s (i,i), we can unify α with the (i,i)---but we can't unify the σ with the s, because the σ isn't in scope. In GHC, the type variables for runST are s and a, not σ and α, so it renames s to s1 to remove ambiguity, and it's this type variable that you're seeing.
So the error is fair: we've claimed that for some particular s, MArray a e (ST s) holds. But runST needs that to be true for every s. The error is, however, very unclear, since it introduces a new type variable which you can't actually refer to (so the "possible fix" is meaningless, although it's never helpful anyway).
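To see the "for every s" requirement satisfied rather than violated, here is a small sketch in which the array is created inside the action passed to runST, so the action really is polymorphic in s. The name boundsOf is made up, and the array type is fixed to STUArray s Int Int to keep the instance resolution trivial.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, getBounds, newArray)

-- The argument must work for every state thread s, which is exactly
-- what runST demands of the action it runs.
boundsOf :: (forall s. ST s (STUArray s Int Int)) -> (Int, Int)
boundsOf mk = runST (mk >>= getBounds)

-- The array is allocated inside the action, so no particular s leaks in.
example :: (Int, Int)
example = boundsOf (newArray (3, 7) 0)
```

Here example evaluates to (3,7). The difference from the failing bounds above is that the caller hands over a recipe for building the array, not an array that already lives in some specific state thread.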
Now, the obvious question is, "So can I write a correct type signature?" The answer is "…sort of." (But you probably don't want to.) The desired type would be something like the following:
ghci> :set -XConstraintKinds -XRank2Types
ghci> let bounds :: (forall s. MArray a e (ST s), Ix i) => a i e -> (i,i)
ghci| bounds arr = runST (getBounds arr)
ghci|
<interactive>:170:25:
Could not deduce (MArray a e (ST s))
arising from a use of `getBounds'
from the context (forall s. MArray a e (ST s), Ix i)
...
This constraint says that MArray a e (ST s) holds for every s, but we still get a type error. It appears that "GHC does not support polymorphic constraints to the left of an arrow"—and in fact, while googling around trying to find that information, I found an excellent blog post at "Main Is Usually A Function", which runs into the same problem as you, explains the error, and provides the following workaround. (They also get the superior error message "malformed class assertion," which makes clear that such a thing is impossible; this is probably due to differing GHC versions.)
The idea is, as is common when we want more out of type class constraints than we can get from GHC's built-in system, to provide explicit evidence for the existence of such a type class by (ab)using a GADT:
ghci> :set -XNoFlexibleContexts -XNoConstraintKinds
ghci> -- We still need -XRank2Types, though
ghci> :set -XGADTs
ghci> data MArrayE a e m where
ghci|   MArrayE :: MArray a e m => MArrayE a e m
ghci|
ghci>
Now, whenever we have a value of type MArrayE a e m, we know that the value must have been constructed with the MArrayE constructor; this constructor can only be called when there's an MArray a e m constraint available, and so pattern-matching on MArrayE will make that constraint available again. (The only other possibility is that your value of that type was undefined, which is why a pattern match is in fact necessary.) Now, we can provide that as an explicit argument to the bounds function, so we'd call it as bounds MArrayE arr:
ghci> :set -XScopedTypeVariables
ghci> let bounds :: forall a e i.
ghci|               Ix i => (forall s. MArrayE a e (ST s)) -> a i e -> (i,i)
ghci|     bounds evidence arr = runST (go evidence)
ghci|       where go :: MArrayE a e (ST s) -> ST s (i,i)
ghci|             go MArrayE = getBounds arr
ghci|
ghci> -- Hooray!
Note the weirdness where we have to factor out the body into its own function and pattern-match there. What's going on is that if you pattern-match in bounds's argument list, the s from the evidence gets fixed to a particular value too early, and so we need to put this off; and (I think because inference with higher-rank types is hard) we also need to provide an explicit type for go, which necessitates scoped type variables.
And finally, returning to your original code:
ghci> let randShuffleST :: forall a e i g. Ix i => (forall s. MArrayE a e (ST s))
ghci|                   -> a i e
ghci|                   -> g
ghci|                   -> (a i e, g)
ghci|     randShuffleST evidence arr gen = runST $ go evidence
ghci|       where go :: MArrayE a e (ST s) -> ST s (a i e,g)
ghci|             go MArrayE = do _ <- getBounds arr
ghci|                             return (arr, gen)
ghci|
ghci> -- Hooray again! But...
Now, as I said at the beginning, there's one problem left to address. In the code above, there's never going to be a way to construct a value of type forall s. MArrayE a e (ST s), because the constraint forall s. MArray a e (ST s) is unsatisfiable. For the same reason, in your original code, you couldn't write randShuffleST even without the type error you're getting, because you can't write a function which returns an STArray outside of ST.
The reason for both of these problems is the same: an STArray's first parameter is the state thread it lives on. The MArray instance for STArray is instance MArray (STArray s) e (ST s), and so you'll always have types of the form ST s (STArray s i e). Since runST :: (forall s. ST s a) -> a, running runST mySTArrayAction would "leak" the s out in an illegal way. Look into
runSTArray :: Ix i => (forall s. ST s (STArray s i e)) -> Array i e
and its unboxed friend
runSTUArray :: Ix i => (forall s. ST s (STUArray s i e)) -> UArray i e.
You can also use
unsafeFreeze :: (Ix i, MArray a e m, IArray b e) => a i e -> m (b i e)
to do the same thing, as long as you promise that that's the last function you'll ever call on your mutable array; the freeze function relaxes this restriction, but has to copy the array. By the same token, if you want to pass an array, and not a list, into the pure version of your function, you'll probably also want
thaw :: (Ix i, IArray a e, MArray b e m) => a i e -> m (b i e);
using unsafeThaw would probably be disastrous here, since you're passing in an immutable array that you have no control over! This would all combine to give us something like:
ghci> :set -XNoRank2Types -XNoGADTs
ghci> -- We still need -XScopedTypeVariables for our use of `thaw`
ghci> import Data.Array.IArray
ghci> let randShuffleST :: forall ia i e g. (Ix i, IArray ia e)
ghci|                   => ia i e
ghci|                   -> g
ghci|                   -> (Array i e, g)
ghci|     randShuffleST iarr gen = runST $ do
ghci|       marr <- thaw iarr :: ST s (STArray s i e)
ghci|       _ <- getBounds marr
ghci|       iarr' <- unsafeFreeze marr
ghci|       return (iarr', gen)
ghci|
ghci> randShuffleST (listArray (0,2) "abc" :: Array Int Char) "gen"
(array (0,2) [(0,'a'),(1,'b'),(2,'c')],"gen")
This takes O(n) time to copy the input immutable array, but—with optimizations—takes O(1) time to freeze the mutable array for the output, since STArray and Array are the same under the hood.
Applying this to your problem in particular, we have the following:
{-# LANGUAGE FlexibleContexts #-}
import System.Random
import Control.Monad
import Control.Applicative
import Control.Monad.ST
import Data.Array.ST
import Data.STRef
import Data.Array.IArray
updateSTRef :: STRef s a -> (a -> (b,a)) -> ST s b
updateSTRef r f = do
  (b,a) <- f <$> readSTRef r
  writeSTRef r a
  return b
swapArray :: (MArray a e m, Ix i) => a i e -> i -> i -> m ()
swapArray arr i j = do
  temp <- readArray arr i
  writeArray arr i =<< readArray arr j
  writeArray arr j temp
shuffle :: (MArray a e (ST s), Ix i, Random i, RandomGen g)
        => a i e -> g -> ST s g
shuffle arr gen = do
  rand <- newSTRef gen
  bounds@(low,_) <- getBounds arr
  when (rangeSize bounds > 1) .
    forM_ (reverse . tail $ range bounds) $ \i ->
      swapArray arr i =<< updateSTRef rand (randomR (low,i))
  readSTRef rand
-- Two different pure wrappers
-- We need to specify a specific type, so that GHC knows *which* mutable array
-- to work with. This replaces our use of ScopedTypeVariables.
thawToSTArray :: (Ix i, IArray a e) => a i e -> ST s (STArray s i e)
thawToSTArray = thaw
shufflePure :: (IArray a e, Ix i, Random i, RandomGen g)
            => a i e -> g -> (a i e, g)
shufflePure iarr g = runST $ do
  marr <- thawToSTArray iarr
  g' <- shuffle marr g
  iarr' <- freeze marr
  return (iarr',g')
shufflePure' :: (IArray a e, Ix i, Random i, RandomGen g)
             => a i e -> g -> (Array i e, g)
shufflePure' iarr g =
  let (g',g'') = split g
      iarr' = runSTArray $ do
        marr <- thaw iarr -- `runSTArray` fixes the type of `thaw`
        void $ shuffle marr g'
        return marr
  in (iarr',g'')
Again, you could replace freeze with Data.Array.Unsafe.unsafeFreeze in shufflePure; this would probably produce a speedup, since it wouldn't have to copy the array to return it if it was an Array i e. The runSTArray function wraps unsafeFreeze safely, so that's not an issue in shufflePure'. (The two are equivalent, modulo some details about splitting the PRNG.)
What do we see here? Importantly, only the mutable code ever references the mutable arrays, and it stays mutable (i.e., returns something inside ST s). Since shuffle does an in-place shuffle, it doesn't need to return an array, just the PRNG. To build a pure interface, we thaw an immutable array into a mutable array, shuffle that in-place, and then freeze the resulting array back into an immutable one. This is important: it prevents us from leaking mutable data back into the pure world. You can't directly mutably shuffle the passed-in array, because it's immutable; contrariwise, you can't directly return the mutably shuffled array as an immutable array, because it's mutable, and what if someone could mutate it?
This doesn't run afoul of any of the errors we saw above, because all of those errors come from improper use of runST. If we restrict our use of runST, only running it once we've assembled a pure result, all the inner state-threading can happen automatically. Since runST is the only function with a rank-2 type, it's the only place where severe type-weirdness can be produced; everything else just requires your standard type-based reasoning, albeit perhaps with a little more thought to keep the s state-thread parameter consistent.
And lo and behold:
*Main> let arr10 = listArray (0,9) [0..9] :: Array Int Int
*Main> elems arr10
[0,1,2,3,4,5,6,7,8,9]
*Main> elems . fst . shufflePure arr10 <$> newStdGen
[3,9,0,5,1,2,8,7,6,4]
*Main> elems . fst . shufflePure arr10 <$> newStdGen
[3,1,0,5,9,8,4,7,6,2]
*Main> elems . fst . shufflePure' arr10 <$> newStdGen
[3,9,2,6,8,4,5,0,7,1]
*Main> elems . fst . shufflePure' arr10 <$> newStdGen
[8,5,2,1,9,4,3,0,7,6]
Success, at long last! (Way too long last, really. Sorry about the length of this answer.)
Below is one way of implementing an in-place Fisher-Yates shuffle (I think that is called a Durstenfeld or Knuth shuffle). Notice that runST is never called; runSTArray is used instead, and it is called only once.
import Data.Array
import Data.Array.ST
import Control.Monad.ST
import Control.Monad
import System.Random
fisherYates :: (RandomGen g,Ix ix, Random ix) => g -> Array ix e -> Array ix e
fisherYates gen a' = runSTArray $ do
  a <- thaw a'
  (bot,top) <- getBounds a
  foldM (\g i -> do
    ai <- readArray a i
    let (j,g') = randomR (bot,i) g
    aj <- readArray a j
    writeArray a i aj
    writeArray a j ai
    return g') gen (range (bot,top))
  return a
Note that although the algorithm is performed in-place, the function first copies the array given in the input (a result of using the function thaw) before performing the algorithm on the copy. In order to avoid copying the array you have at least two options:
1. Use unsafeThaw, which is (as the name suggests) unsafe and can only be used if you are sure that the input array will never be used again. This is not trivial to guarantee because of lazy evaluation.
2. Let fisherYates have the type (RandomGen g,Ix ix, Random ix) => g -> STArray s ix e -> ST s (STArray s ix e) and perform the whole operation that requires an in-place Fisher-Yates algorithm inside the ST monad and only give the final answer with runST.
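The second option could be sketched roughly as follows. To keep the example free of extra packages, it threads a plain Int seed through a toy linear congruential step (lcgNext, a made-up stand-in and not a good generator) instead of a RandomGen, and it fixes the index type to Int; both are simplifications of the signature suggested above.

```haskell
import Control.Monad (foldM)
import Control.Monad.ST (ST)
import Data.Array (Array, elems, listArray)
import Data.Array.ST (STArray, getBounds, readArray, runSTArray, thaw, writeArray)

-- Hypothetical stand-in for a real random generator: one LCG step.
-- It is only here so the sketch needs nothing beyond the array package.
lcgNext :: Int -> Int
lcgNext s = (s * 1103515245 + 12345) `mod` 2147483648

-- In-place Fisher-Yates on an Int-indexed array; returns the final seed.
shuffleST :: Int -> STArray s Int e -> ST s Int
shuffleST seed arr = do
  (lo, hi) <- getBounds arr
  let step s i = do
        let s' = lcgNext s
            j  = lo + s' `mod` (i - lo + 1)   -- some index in [lo, i]
        ai <- readArray arr i
        aj <- readArray arr j
        writeArray arr i aj
        writeArray arr j ai
        return s'
  -- Walk from the top index down to lo + 1, swapping each slot with an
  -- index chosen at or below it.
  foldM step seed [hi, hi - 1 .. lo + 1]

-- Pure interface for callers that only have an immutable array.
-- This one still copies via thaw; callers already inside ST can call
-- shuffleST directly and avoid the copy.
shuffled :: Int -> Array Int e -> Array Int e
shuffled seed a = runSTArray (do
  m <- thaw a
  _ <- shuffleST seed m
  return m)
```

Code that already works on an STArray can call shuffleST as often as it likes without any copying, and only the outermost caller needs runST or runSTArray.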

Something like mapM, but for arrays? (like arrayMap, but mapping an impure function)

I see that I can map a function over mutable arrays with mapArray, but there doesn't seem to be something like mapM (and mapM_). mapArray won't let me print its elements, for example:
import Data.Array.Storable
arr <- newArray (1,10) 42 :: IO -- answer to Life, Universe and Everything
x <- readLn :: IO Int
mapArray (putStrLn.show) arr -- <== this doesn't work!
The result will be:
No instances for (MArray StorableArray Int m,
MArray StorableArray (IO ()) m)
arising from a use of `mapArray' at <interactive>:1:0-27
Possible fix:
add an instance declaration for
(MArray StorableArray Int m, MArray StorableArray (IO ()) m)
In the expression: mapArray (putStrLn . show) arr
In the definition of `it': it = mapArray (putStrLn . show) arr
Is there something like that in Haskell (or in GHC even if not standard Haskell)?
Also, I found no foldr/foldl functions for arrays (mutable or not). Do they exist?
Thanks a lot!
Import the module Data.Traversable. It defines a typeclass for just what you want with instances already defined for array and all sorts of things. It has generalized versions of sequence and mapM, plus some even more general functions that you probably won't bother with very often.
Just a simple
import Data.Traversable as T
T.mapM doIOStuff arr
works fine.
Perhaps use one of the other array libraries, if you're doing a lot of mutation? Like uvector?
Otherwise,
forM_ [1..n] $ \i -> unsafeWrite x i
should be fine.
For the example of printing all the elements: you can use "mapM_ print . elems".
But it sounds like you want to create a new array where each value is the result of a monadic action of the previous one? In that case:
arrayMapM :: (Monad m, Ix i) => (a -> m b) -> Array i a -> m (Array i b)
arrayMapM func src =
  liftM (listArray (bounds src)) . mapM func . elems $ src
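For mutable arrays, which is what the question started from, a similar pair of helpers can be sketched on top of the MArray class. The names mapArrayM_ and updateArrayM are made up, and demo uses an IOUArray purely as an example of a concrete mutable array type.

```haskell
import Data.Array.IO (IOUArray)
import Data.Array.MArray (MArray, getBounds, getElems, newListArray, readArray, writeArray)
import Data.Ix (Ix, range)

-- Run an action on every element and discard the results (like mapM_).
mapArrayM_ :: (MArray a e m, Ix i) => (e -> m ()) -> a i e -> m ()
mapArrayM_ f arr = getElems arr >>= mapM_ f

-- Replace every element with the result of a monadic action, in place.
updateArrayM :: (MArray a e m, Ix i) => (e -> m e) -> a i e -> m ()
updateArrayM f arr = do
  bnds <- getBounds arr
  mapM_ (\i -> readArray arr i >>= f >>= writeArray arr i) (range bnds)

-- Usage: double each element in place, print them, return the contents.
demo :: IO [Int]
demo = do
  arr <- newListArray (0, 2) [1, 2, 3] :: IO (IOUArray Int Int)
  updateArrayM (\x -> return (x * 2)) arr
  mapArrayM_ print arr
  getElems arr
```

Running demo prints 2, 4 and 6 on separate lines and returns [2,4,6]. Note that mapArrayM_ goes through an intermediate list from getElems, which is simple but may not be the fastest choice for large arrays.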
