map runSTArray over a list of STArrays?

I have a function that recursively creates a flattened list of matrices from a tree. The matrices have to be mutable because their elements are updated often during their creation. So far I have come up with a recursive solution that has the signature:
doAll :: .. -> [ST s (STArray s (Int, Int) Int)]
The reason I do not return the [UArray (Int,Int) Int] directly is that doAll is called recursively, modifies elements of the matrices in the list, and appends new matrices. I don't want to freeze and thaw the matrices unnecessarily.
So far so good. I can inspect the n-th matrix (of type Array (Int, Int) Int) in ghci
runSTArray (matrices !! 0)
runSTArray (matrices !! 1)
and indeed I get the correct results for my algorithm. However, I didn't find a way to map runSTArray over the list that is returned by doAll:
map (runSTArray) matrices
Couldn't match expected type `forall s. ST s (STArray s i0 e0)'
with actual type `ST s0 (STArray s0 (Int, Int) Int)'
The same problem happens if I try to evaluate the list recursively, or if I try to evaluate single elements wrapped in a function.
Could someone please explain what is going on (I didn't really understand the implications of the forall keyword) and how I could evaluate the arrays in the list?

This is an unfortunate consequence of the type trick that makes ST safe. First, you need to know how ST works: the only way to get from the ST monad to pure code is the runST function, or other functions built upon it such as runSTArray. These all have types of the form forall s. .... This means that, in order to construct an Array from an STArray, the compiler must be able to determine that it can substitute any type it likes for the s type variable inside runST.
Now consider the function map :: (a -> b) -> [a] -> [b]. This shows that every element in the list must have exactly the same type (a), and therefore also the same s. But this extra constraint violates the type of runSTArray, which declares that the compiler must be able to freely substitute other values for s.
You can work around this by defining a new function to first freeze the arrays inside the ST monad, then run the resulting ST action:
runSTArrays :: Ix ix => (forall s. [ST s (STArray s ix a)]) -> [Array ix a]
runSTArrays arrayList = runST $ (sequence arrayList >>= mapM freeze)
Note the forall requires the RankNTypes extension.
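For reference, here is a minimal self-contained sketch of this approach; the toy arrays built in the list comprehension are my own invention for illustration:
{-# LANGUAGE RankNTypes #-}
import Control.Monad.ST (ST, runST)
import Data.Array (Array, Ix)
import Data.Array.ST (STArray, newArray, writeArray, freeze)

runSTArrays :: Ix ix => (forall s. [ST s (STArray s ix a)]) -> [Array ix a]
runSTArrays arrayList = runST (sequence arrayList >>= mapM freeze)

-- Two tiny example arrays, each mutated inside ST before freezing.
example :: [Array (Int, Int) Int]
example = runSTArrays
  [ do arr <- newArray ((0,0),(1,1)) 0
       writeArray arr (0,1) k
       return arr
  | k <- [1, 2] ]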

You just bounced against the limitations of the type system.
runSTArray has a higher-rank type: you must pass it an ST action whose state type variable s is unique. Yet in Haskell it is normally not possible to have such values in lists.
The whole thing is a clever scheme to make sure that values you produce in an ST action can't escape from it. Which means it looks like your design is somehow broken.
One suggestion: can't you process the values in another ST action, like
sequence [ ... your ST s (STArray s x) ... ] >>= processing
  where
    processing :: [STArray s x] -> ST s (your results)
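A sketch of that suggestion might look like the following; the array contents and the summing step are invented purely so the example compiles and runs:
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STArray, newArray, getElems)

-- Hypothetical processing step: sum the elements of each array.
processing :: [STArray s (Int, Int) Int] -> ST s [Int]
processing = mapM (fmap sum . getElems)

results :: [Int]
results = runST $ do
  arrays <- sequence [newArray ((0,0),(1,1)) k | k <- [1,2,3]]
  processing arrays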

Related

Haskell: Break a loop conditionally

I want to break a loop in a situation like this:
import Data.Maybe (fromJust, isJust)

tryCombination :: Int -> Int -> Maybe String
tryCombination x y
  | x * y == 20 = Just "Okay"
  | otherwise   = Nothing

result :: [String]
result = map fromJust $
  filter isJust [tryCombination x y | x <- [1..5], y <- [1..5]]

main = putStrLn $ unlines result
Imagine that "tryCombination" is a lot more complicated than in this example, and that it's consuming a lot of CPU power. And it's not an evaluation of 25 possibilities, but 26^3.
So when "tryCombination" finds a solution for a given combination, it returns a Just, otherwise a Nothing. How can I break the loop instantly on the first found solution?
Simple solution: find and join
It looks like you're looking for Data.List.find. find has the type signature
find :: (a -> Bool) -> [a] -> Maybe a
So you'd do something like
result :: Maybe (Maybe String)
result = find isJust [tryCombination x y | x <- [1..5], y <- [1..5]]
Or, if you don't want a Maybe (Maybe String) (why would you?), you can fold them together with Control.Monad.join, which has the signature
join :: Maybe (Maybe a) -> Maybe a
so that you have
result :: Maybe String
result = join $ find isJust [tryCombination x y | x <- [1..5], y <- [1..5]]
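Putting the pieces together with the question's tryCombination, a runnable version of the find-plus-join approach could look like this:
import Control.Monad (join)
import Data.List (find)
import Data.Maybe (isJust)

tryCombination :: Int -> Int -> Maybe String
tryCombination x y
  | x * y == 20 = Just "Okay"
  | otherwise   = Nothing

main :: IO ()
main = print $ join $
  find isJust [tryCombination x y | x <- [1..5], y <- [1..5]]
-- prints: Just "Okay"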
More advanced solution: asum
If you wanted a slightly more advanced solution, you could use Data.Foldable.asum, which has the signature
asum :: [Maybe a] -> Maybe a
What it does is pick out the first Just value from a list of many. It does this by using the Alternative instance of Maybe. The Alternative instance of Maybe works like this: (import Control.Applicative to get access to the <|> operator)
λ> Nothing <|> Nothing
Nothing
λ> Nothing <|> Just "world"
Just "world"
λ> Just "hello" <|> Just "world"
Just "hello"
In other words, it picks the first Just value from two alternatives. Imagine putting <|> between every element of your list, so that
[Nothing, Nothing, Just "okay", Nothing, Nothing, Nothing, Just "okay"]
gets turned to
Nothing <|> Nothing <|> Just "okay" <|> Nothing <|> Nothing <|> Nothing <|> Just "okay"
This is exactly what the asum function does! Since <|> is short-circuiting, it will only evaluate up to the first Just value. With that, your function would be as simple as
result :: Maybe String
result = asum [tryCombination x y | x <- [1..5], y <- [1..5]]
Why would you want this more advanced solution? Not only is it shorter; once you know the idiom (i.e. when you are familiar with Alternative and asum) it is much clearer what the function does, just by reading the first few characters of the code.
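Starting from the same definitions of tryCombination, the asum version swaps join $ find isJust for a single call (asum lives in Data.Foldable):
import Data.Foldable (asum)

-- Short-circuits at the first Just; later combinations are never evaluated.
result :: Maybe String
result = asum [tryCombination x y | x <- [1..5], y <- [1..5]]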
To answer your question: the find function is what you need. After you get a Maybe (Maybe String) you can transform it into a Maybe String with join.
While find is nicer, more readable, and surely does only what's needed, I wouldn't be so sure about the inefficiency of the code in the question. Lazy evaluation would probably take care of that and compute only what's needed (extra memory can still be consumed). If you are interested, try to benchmark.
Laziness can actually take care of that in this situation.
By calling unlines you are requesting all of the output of your "loop"¹, so obviously it can't stop after the first successful tryCombination. But if you only need one match, just use listToMaybe (from Data.Maybe); it will convert your list to Nothing if there are no matches at all, or Just the first match found.
Laziness means that the results in the list will only be evaluated on demand; if you never demand any more elements of the list, the computations necessary to produce them (or even see whether there are any more elements in the list) will never be run!
This means you often don't have to "break loops" the way you do in imperative languages. You can write the full "loop" as a list generator, and the consumer(s) can decide independently how much of it they want. The extreme case of this idea is that Haskell is perfectly happy to generate and even filter infinite lists; it will only run the generation code just enough to produce exactly as many elements as you later end up examining.
¹ Actually, even unlines produces a lazy string, so if you e.g. only read the first line of the resulting joined string you could still "break the loop" early! But you print the whole thing here.
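As a sketch, reusing tryCombination from the question; pairing listToMaybe with catMaybes is my own choice of spelling here:
import Data.Maybe (listToMaybe, catMaybes)

-- catMaybes drops the Nothings lazily; listToMaybe takes the first
-- survivor (if any), so no further combinations are ever tried.
firstMatch :: Maybe String
firstMatch = listToMaybe $
  catMaybes [tryCombination x y | x <- [1..5], y <- [1..5]]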
The evaluation strategy you are looking for is exactly the purpose of the Maybe instance of MonadPlus. In particular, there is the function msum whose type specializes in this case to
msum :: [Maybe a] -> Maybe a
Intuitively, this version of msum takes a list of potentially failing computations, executes them one after another until the first computation succeeds, and returns the corresponding result. So, result would become
result :: Maybe String
result = msum [tryCombination x y | x <- [1..5], y <- [1..5]]
On top of that, you could make your code in some sense agnostic to the exact evaluation strategy by generalizing from Maybe to any instance of MonadPlus:
tryCombination :: MonadPlus m => Int -> Int -> m (Int,Int)
-- For the sake of illustration I changed to a more verbose result than "Okay".
tryCombination x y
  | x * y == 20 = return (x,y) -- `return` specializes to `Just`.
  | otherwise   = mzero        -- `mzero` specializes to `Nothing`.

result :: MonadPlus m => m (Int,Int)
result = msum [tryCombination x y | x <- [1..5], y <- [1..5]]
To get your desired behavior, just run the following:
*Main> result :: Maybe (Int,Int)
Just (4,5)
However, if you decide you need not only the first combination but all of them, just use the [] instance of MonadPlus:
*Main> result :: [(Int,Int)]
[(4,5),(5,4)]
I hope this helps more on a conceptual level than just providing a solution.
PS: I just noticed that MonadPlus and msum are indeed a bit too restrictive for this purpose; Alternative and asum would have been enough.

Haskell Constant Propagation on Data Structures?

I want to know how deeply Haskell evaluates data structures at compile time.
Consider the following list:
simpleTableMultsList :: [Int]
simpleTableMultsList = [n*m | n <- [1..9], m <- [1..9]]
This list gives a representation of the multiplication table for 1 through 9. Now, suppose we want to change it so that we represent the product of two one digit numbers as a pair of numbers (first digit, second digit). Then we may consider
simpleTableMultsList :: [(Int,Int)]
simpleTableMultsList = [(k `div` 10, k `rem` 10) | n <- [1..9], m <- [1..9], let k = n*m]
Now we can implement multiplication on one digit numbers as a table lookup. YAY!! However, we want to be more efficient than this! So we want to make this structure an unboxed array. Haskell gives a really great way to do this using
import qualified Data.Array.Unboxed as A
Then we can do:
simpleTableMults :: A.Array (Int,Int) (Int,Int)
simpleTableMults = A.listArray ((1,1),(9,9)) simpleTableMultsList
Now if I want a constant time multiplication of two one digit numbers n and m, I can do:
simpleTableMults ! (n,m)
This is great! Now suppose I compile this module we've been working on. Does simpleTableMults get fully evaluated, so that when I run the computation simpleTableMults ! (n,m) the program literally makes a lookup in memory... or does it have to build the data structure in memory first? Since it is an unboxed array, my understanding is that the Array must be created at once and is completely strict in its elements -- so that all the elements of the array are fully evaluated.
So really my question is: when does this evaluation occur, and can I force it to occur at compile time?
------- Edit ---------------
I tried to dig further on this! I tried compiling and examining information about the Core. It seems GHC is performing a lot of reductions on the code at compile time. I wish I knew more about Core to be able to tell. If we compile with
ghc -O2 -ddump-simpl-stats Main.hs
We can see that 98 beta reductions are performed, an unpack-list operation is carried out, many things are unfolded, and a bunch of inlines are performed (around 150). It even tells you where the beta reductions occur; ... since the word IxArray is coming up, I am more curious whether some sort of simplification is occurring. Now the interesting thing from my point of view is that adding
simpleTableMults = D.deepseq t t
  where t = A.listArray ((1,1),(9,9)) simpleTableMultsList
increases the number of beta reductions, inlines, and simplifications quite substantially at compile time. It would be really great if I could load the compiled code into a debugger of some sort and "view" the data structure! I am, as it stands, more mystified than before.
------ Edit 2 -------------
I still don't know what beta reductions are being performed. However, I did find out some interesting things based on sassa-nf's response. For the following experiment, I used the ghc-heap-view package. I changed the way the Array was represented in the source according to the sassa-nf answer. I loaded the program into GHCi, and immediately called
:printHeap simpleTableMults
And as expected I got an index too large exception. But under the suggested unpacked datatype, I got a let expression with a toArray and a bunch of _thunks and some _funs. I'm not really sure yet what these mean... The other interesting thing is that by using seq, or some other strictness forcing in the source code, I ended up with all _thunks inside of the let. I can upload the exact emission if that helps.
Also, if I perform a single indexing, the array gets completely evaluated in all cases.
Also, there is no way to call ghci with optimizations, so I might not be getting the same results as when compiled with GHC -O2.
Let's exaggerate:
import qualified Data.Array.Unboxed as A
simpleTableMults :: A.Array (Int,Int) (Int,Int)
simpleTableMults = A.listArray ((1,1),(10000,2000))
  [(k `div` 10, k `rem` 10) | n <- [1..10000], m <- [1..2000], let k = n*m]

main = print $ simpleTableMults A.! (10000,1000)
Then
ghc -O2 -prof b.hs
b +RTS -hy
......Out of memory
hp2ps b.exe.hp
What happened?! You can see the heap consumption graph go above 1GB, and then it died.
Well, the pair is computed eagerly, but the projections of the pair are lazy, so we end up with tons of thunks to compute k `div` 10 and k `rem` 10.
import qualified Data.Array.Unboxed as A

data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Int deriving (Show)

simpleTableMults :: A.Array (Int,Int) P
simpleTableMults = A.listArray ((1,1),(10000,2000))
  [P (k `div` 10) (k `rem` 10) | n <- [1..10000], m <- [1..2000], let k = n*m]

main = print $ simpleTableMults A.! (10000,1000)
This one is fine, because we eagerly computed the pair.

Non-monolithic arrays in Haskell

I have accepted an answer to the question below, but it seemed I misunderstood how arrays in Haskell worked. I thought they were just beefed-up lists. Keep that in mind when reading the question below.
I've found that monolithic arrays in Haskell are quite inefficient when using them for larger arrays.
I haven't been able to find a non-monolithic implementation of arrays in Haskell. What I need is O(1) time lookup on a multidimensional array.
Is there an implementation of arrays that supports this?
EDIT: I seem to have misunderstood the term monolithic. The problem is that it seems like arrays in Haskell treat an array like a list. I might be wrong though.
EDIT2: Short example of inefficient code:
fibArray n = a where
  bnds = (0,n)
  a = array bnds [ (i, f i) | i <- range bnds ]
  f 0 = 0
  f 1 = 1
  f i = a!(i-1) + a!(i-2)
This is an array of length n+1 where the i-th field holds the i-th Fibonacci number. But since arrays in Haskell have O(n) time lookup, it takes O(n²) time to compute.
You're confusing linked lists in Haskell with arrays.
Linked lists are the data types that use the following syntax:
[1,2,3,5]
defined as:
data [a] = [] | a : [a]
These are classical recursive data types, supporting O(n) indexing and O(1) prepend.
If you're looking for multidimensional data with O(1) lookup, instead you should use a true array or matrix data structure. Good candidates are:
Repa - fast, parallel, multidimensional arrays -- (Tutorial)
Vector - An efficient implementation of Int-indexed arrays (both mutable and immutable), with a powerful loop optimisation framework. (Tutorial)
HMatrix - Purely functional interface to basic linear algebra and other numerical computations, internally implemented using GSL, BLAS and LAPACK.
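As a quick illustration of the constant-time lookup these libraries give you, here is a toy Vector example (my own, not from the original answer):
import qualified Data.Vector.Unboxed as V

-- An unboxed vector is a contiguous block of memory,
-- so (V.!) is a genuine constant-time lookup.
squares :: V.Vector Int
squares = V.generate 10 (\i -> i * i)

main :: IO ()
main = print (squares V.! 7) -- 49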
Arrays have O(1) indexing. The problem is that each element is calculated lazily. So this is what happens when you run this in ghci:
*Main> :set +s
*Main> let t = 100000
(0.00 secs, 556576 bytes)
*Main> let a = fibArray t
Loading package array-0.4.0.0 ... linking ... done.
(0.01 secs, 1033640 bytes)
*Main> a!t -- result omitted
(1.51 secs, 570473504 bytes)
*Main> a!t -- result omitted
(0.17 secs, 17954296 bytes)
*Main>
Note that lookup is very fast, after it's already been looked up once. The array function creates an array of pointers to thunks that will eventually be calculated to produce a value. The first time you evaluate a value, you pay this cost. Here are the first few expansions of the thunk for evaluating a!t:
a!t -> a!(t-1) + a!(t-2) -> (a!(t-2) + a!(t-3)) + a!(t-2) -> ((a!(t-3) + a!(t-4)) + a!(t-3)) + a!(t-2)
It's not the cost of the calculations per se that's expensive, rather it's the need to create and traverse this very large thunk.
I tried strictifying the values in the list passed to array, but that seemed to result in an endless loop.
One common way around this is to use a mutable array, such as an STArray. The elements can be updated as they're available during the array creation, and the end result is frozen and returned. In the vector package, the create and constructN functions provide easy ways to do this.
-- constructN :: Unbox a => Int -> (Vector a -> a) -> Vector a
import qualified Data.Vector.Unboxed as V
import Data.Int

fibVec :: Int -> V.Vector Int64
fibVec n = V.constructN (n+1) c
  where
    c v | V.length v == 0 = 0
    c v | V.length v == 1 = 1
    c v | V.length v == 2 = 1
    c v = let len = V.length v
          in v V.! (len-1) + v V.! (len-2)
BUT, the fibVec function only works with unboxed vectors. Regular vectors (and arrays) aren't strict enough, leading back to the same problem you've already found. And unfortunately there isn't an Unboxed instance for Integer, so if you need unbounded integer types (this fibVec has already overflowed in this test) you're stuck with creating a mutable array in IO or ST to enable the necessary strictness.
Referring specifically to your fibArray example, try this and see if it speeds things up a bit:
-- gradually calculate m-th item in steps of k
-- to prevent STACK OVERFLOW, etc.
gradualth m k arr
  | m <= v = pre `seq` arr!m
  where
    pre   = foldl1 (\a b -> a `seq` arr!b) [u,u+k..m]
    (u,v) = bounds arr
For me, with let a = fibArray 50000, gradualth 50000 10 a ran at 0.65 of the run time of just calling a!50000 right away.

Growing arrays in Haskell

I have the following (imperative) algorithm that I want to implement in Haskell:
Given a sequence of pairs [(e0,s0), (e1,s1), (e2,s2),...,(en,sn)], where both the "e" and "s" parts are natural numbers, not necessarily distinct, at each time step one element of this sequence is randomly selected, let's say (ei,si), and based on the values of (ei,si), a new element is built and added to the sequence.
How can I implement this efficiently in Haskell? The need for random access would make it bad for lists, while the need for appending one element at a time would make it bad for arrays, as far as I know.
Thanks in advance.
I suggest using either Data.Set or Data.Sequence, depending on what you're needing it for. The latter in particular provides you with logarithmic index lookup (as opposed to linear for lists) and O(1) appending on either end.
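A sketch of one step of the algorithm on top of Data.Sequence; the rule for deriving the new pair is a made-up stand-in, since the question leaves it abstract:
import Data.Sequence (Seq, index, (|>))
import qualified Data.Sequence as Seq

-- Select element i (O(log n)), build a new pair from it
-- (hypothetical rule), and append it (O(1)).
step :: Int -> Seq (Int, Int) -> Seq (Int, Int)
step i s = s |> derive (index s i)
  where derive (e, v) = (e + 1, v) -- stand-in for the real rule

-- ghci> step 0 (Seq.fromList [(1,2),(3,4)])
-- fromList [(1,2),(3,4),(2,2)]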
"while the need for appending one element at a time would make it bad for arrays" Algorithmically, it seems like you want a dynamic array (aka vector, array list, etc.), which has amortized O(1) time to append an element. I don't know of a Haskell implementation of it off-hand, and it is not a very "functional" data structure, but it is definitely possible to implement it in Haskell in some kind of state monad.
If you know approximately how many elements you will need in total, then you can create an array of that size which is "sparse" at first, and then fill in elements as needed.
Something like below can be used to represent this new array:
data MyArray = MyArray (Array Int Int) Int
(where the last Int represents how many elements are used in the array)
If you really need stop-and-start resizing, you could think about using the simple-rope package along with a StringLike instance for something like Vector. In particular, this might accommodate scenarios where you start out with a large array and are interested in relatively small additions.
That said, adding individual elements into the chunks of the rope may still induce a lot of copying. You will need to try out your specific case, but you should be prepared to use a mutable vector as you may not need pure intermediate results.
If you can build your array in one shot and just need the indexing behavior you describe, something like the following may suffice,
import Data.Array.IArray

test :: Array Int (Int,Int)
test = accumArray (flip const) (0,0) (0,20) [(i, f i) | i <- [0..19]]
  where f 0 = (1,0)
        f i = let (e,s) = test ! (i `div` 2) in (e*2,s+1)
Taking a note from ivanm, I think Sets are the way to go for this.
import Data.Set as Set
import System.Random (RandomGen, getStdGen)
startSet :: Set (Int, Int)
startSet = Set.fromList [(1,2), (3,4)] -- etc. Whatever the initial set is

-- Grow the set by randomly producing "n" elements.
growSet :: (RandomGen g) => g -> Set (Int, Int) -> Int -> (Set (Int, Int), g)
growSet g s n
  | n <= 0    = (s, g)
  | otherwise = growSet g'' s' (n-1)
  where s'           = Set.insert (x,y) s
        ((x,_), g')  = randElem s g
        ((_,y), g'') = randElem s g'

randElem :: (RandomGen g) => Set a -> g -> (a, g)
randElem = undefined

main = do
  g <- getStdGen
  let (grownSet,_) = growSet g startSet 2
  print grownSet -- or whatever you want to do with it
This assumes that randElem is an efficient, definable method for selecting a random element from a Set. (I asked this SO question regarding efficient implementations of such a method). One thing I realized upon writing up this implementation is that it may not suit your needs, since Sets cannot contain duplicate elements, and my algorithm has no way to give extra weight to pairings that appear multiple times in the list.

How to make my Haskell code use Laziness and Garbage collector

I wrote some Haskell code which has to solve the following problem: we have n files f1, f2, f3, ..., fn, and I cut those files in such a way that each slice has 100 lines:
f1_1, f1_2, f1_3 .... f1_m
f2_1, f2_2, .... f2_n
...
fn_1, fn_2, .... fn_k
Finally, I construct special data types (Dags) using the slices in the following way:
f1_1, f2_1, f3_1, .... fn_1 => Dag1
f1_2, f2_2, f3_2, ..... fn_2 => Dag2
....
f1_k, f2_k, f3_k, ..... fn_k => Dagk
The code that I wrote starts by cutting all the files, then it couples the i-th elements of the resulting lists and constructs the Dags from the final result list.
It looks like this:
-- # take a filename and cut the file into slices of 100 lines
sliceFile :: FilePath -> [[String]]

-- # take a list of lists and group the i-th elements into lists
coupleIthElement :: [[String]] -> [[String]]

-- # take a list of lines and create a DAG
makeDags :: [String] -> Dag

-- # the final code looks like this
makeDag_ :: [FilePath] -> [Dag]
makeDag_ files = map makeDags $ coupleIthElement (concat (map sliceFile files))
The problem is that this code is inefficient because:
it needs to store all the files in memory in list form
the garbage collector cannot work efficiently, since every function needs the result list of the previous function
How could I rewrite my program to take advantage of the garbage collector and of Haskell's laziness?
If that is not possible or easy, what can I do to be even a bit more efficient?
Thanks in advance for any reply.
edit
coupleIthElement ["abc", "123", "xyz"] must return ["a1x","b2y","c3z"]
Of course the 100 lines are arbitrarily selected using a particular criterion upon some elements of the lines, but I discarded this aspect to make the problem easier to understand.
another edit
data Dag = Dag ([(Int, String)], [((Int, Int), Int)]) deriving Show
test_dag = Dag ([(1, "a"),(2, "b"),(3, "c")],[((1,2),1),((1,3),1)])
test_dag2 = Dag ([],[])
The first list gives the vertices, each defined by a number and a label; the second list gives the edges: ((1,2),3) means an edge between vertices 1 and 2 with cost 3.
A few points:
1) Have you considered using fgl? It's probably more efficient than your own Dag implementation. If you really need to use Dag, you could construct your graphs with fgl then convert them to Dag when they're complete.
2) It seems like you don't actually use the slices when constructing your graphs, rather they control how many graphs you have. If so, how about something like this:
dagFromHandles :: [Handle] -> IO Dag
dagFromHandles = fmap makeDags . mapM hGetLine

allDags :: [FilePath] -> IO [Dag]
allDags listOfFiles = do
  handles <- mapM (flip openFile ReadMode) listOfFiles
  replicateM 100 (dagFromHandles handles)
This assumes that each file has at least 100 lines, and any extra lines will be ignored. Even better would be if you had a function that would consume a Dag, then you could do
useDag :: Dag -> IO ()

runDags :: [FilePath] -> IO ()
runDags listOfFiles = do
  handles <- mapM (flip openFile ReadMode) listOfFiles
  replicateM_ 100 (dagFromHandles handles >>= useDag)
This should make more efficient use of garbage collection.
Of course this assumes that I understand the problem properly, and I'm not certain that I do. Note that concat (map sliceFile) should be a no-op (sliceFile would need to be in IO as you've defined the type, but ignoring that for now), so I don't see why you're bothering with it at all.
If you don't need to process your file in slices, avoid doing so. Haskell does this automatically! In Haskell, you think of IO as a stream: data is read from input as soon as it's needed and discarded as soon as it's unused. So, for instance, this is an easy file-copying program:
main = interact id
interact has the signature interact :: (String -> String) -> IO (), and feeds the input into a function which handles it and produces some output, which is written to stdout. This program is more efficient than most C implementations, as the runtime automatically buffers the input and output.
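For instance, the same streaming style works for any line- or character-wise transformation; this toy upper-casing filter is my own example:
import Data.Char (toUpper)

-- Upper-case stdin to stdout, streaming: the runtime reads,
-- transforms, and discards the input incrementally.
main :: IO ()
main = interact (map toUpper)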
If you want to understand laziness, you have to forget all the wisdom you learned as an imperative programmer and think of a program as a description of how to transform data, not as a set of instructions - data is only processed when needed!
The key point, and why your data may be handled the wrong way, is the multiple traversal of the list. Your function makeDags traverses the transposed slices list one by one, so the elements of the original list may not be discarded. What you should try is to write your function in a way like this:
sliceFile :: FilePath -> IO [[String]]
sliceFile fp = do
  f <- readFile fp
  let l = lines f
      slice [] = []
      slice x  = ll : slice ls where (ll,ls) = splitAt 100 x
  return (slice l)
sliceFirstRow :: [[String]] -> ([String],[[String]])
sliceFirstRow list = unzip $ map (\(x:xs) -> (x,xs)) list

makeDags :: [[String]] -> [Dag]
makeDags [[]] = []
makeDags list = makeDag firstRow : makeDags restOfList where
  (firstRow,restOfList) = sliceFirstRow list
This function may be a solution, since the first row is no longer referenced when it's done. But in most places this is a result of laziness, so you could probably try to use seq to force building the Dags and allow the IO data to be garbage-collected. (If you don't force building the Dags, the data can't be garbage collected.)
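As a sketch (reusing the question's Dag type), that forcing could look like this; note that seq only evaluates to weak head normal form, so whether this actually frees the file data depends on how strict makeDag itself is:
-- Force each Dag (to WHNF) as the result list is consumed,
-- so the slices it was built from can be collected.
forceDags :: [Dag] -> [Dag]
forceDags []     = []
forceDags (d:ds) = d `seq` (d : forceDags ds)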
But anyway, I could provide a more helpful answer if you gave some information about what these Dags are.
