In Haskell it is possible to write functions over a size-indexed list that guarantee we never index out of bounds. A possible implementation is:
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Zero | Succ Nat deriving (Eq, Ord, Show)

infixr 5 :-
data Vec (n :: Nat) a where
  Nil  :: Vec 'Zero a
  (:-) :: a -> Vec n a -> Vec ('Succ n) a

data Fin (n :: Nat) where
  FZ :: Fin ('Succ n)
  FS :: Fin n -> Fin ('Succ n)

vLookup :: Vec n a -> Fin n -> a
vLookup Nil _ = undefined  -- unreachable: Fin 'Zero has no (non-bottom) values
vLookup (x :- _) FZ = x
vLookup (_ :- xs) (FS i) = vLookup xs i
Of course, this is just fine for immutable size-indexed lists (aka Vec).
But how about mutable ones? Is it possible to define (or is there a library for) mutable size-indexed arrays in Haskell? If there's no such library, how could it be implemented?
Edit 1: I searched Hackage and didn't find any library matching my description (size-indexed mutable arrays).
Edit 2: I would like to mention that I have thought of using IORefs to get the desired mutability:
type Array n a = IORef (Vec n a)
but I'm wondering if there's a better (more efficient, more elegant) option...
Such a type does exist on Hackage.
I would avoid something like type Array n a = IORef (Vec n a). Mutable arrays are all about efficiency. If you don't need it to run fast / with low memory footprint, then there's not much point in using them – even “mutable algorithms” are generally easier to express in Haskell using functional style, perhaps with a state monad but no true destructive mutable state.
But if efficiency matters, then you also want tight cache-efficient storage. Unboxed vectors are ideal. OTOH, Vec is on the runtime data level no different from ordinary lists, which are of course not so great in terms of cache locality. Even if you defined them to actually interweave mutability into the list spine, it wouldn't really be better in any way than using immutable Vecs in a pure-functional style.
So, if I had to write something like that, I'd rather wrap an (unsafe, length-wise) unboxed mutable array in a length-indexed newtype.
import qualified Data.Vector.Unboxed.Mutable as VUM

newtype MVec (s :: *) (n :: Nat) (a :: *)
  = MVec { getMVector :: VUM.MVector s a }
You can then define an interface that makes all public operations type-checked length-safe, while still retaining the performance profile of MVector.
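A minimal sketch of what such an interface could look like, assuming the vector package is available. The singleton type SNat and the functions replicateMVec, readMVec, writeMVec and demo are hypothetical names introduced only for illustration; they are not part of any library. Because a Fin n can only denote an index below n, the unchecked VUM.unsafeRead and VUM.unsafeWrite stay in bounds as long as the MVec constructor itself is not exported.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

import Control.Monad.ST (ST, runST)
import qualified Data.Vector.Unboxed.Mutable as VUM

data Nat = Zero | Succ Nat

-- Indices that are statically known to be smaller than n.
data Fin (n :: Nat) where
  FZ :: Fin ('Succ n)
  FS :: Fin n -> Fin ('Succ n)

-- Hypothetical singleton: a runtime witness of a type-level length.
data SNat (n :: Nat) where
  SZ :: SNat 'Zero
  SS :: SNat n -> SNat ('Succ n)

-- In a real module the constructor would stay unexported.
newtype MVec s (n :: Nat) a = MVec (VUM.MVector s a)

finToInt :: Fin n -> Int
finToInt FZ     = 0
finToInt (FS i) = 1 + finToInt i

snatToInt :: SNat n -> Int
snatToInt SZ     = 0
snatToInt (SS n) = 1 + snatToInt n

-- Allocate a vector whose runtime length matches its type-level length.
replicateMVec :: VUM.Unbox a => SNat n -> a -> ST s (MVec s n a)
replicateMVec n x = MVec <$> VUM.replicate (snatToInt n) x

-- The Fin n index makes the unchecked read safe.
readMVec :: VUM.Unbox a => MVec s n a -> Fin n -> ST s a
readMVec (MVec v) i = VUM.unsafeRead v (finToInt i)

-- Likewise for the unchecked write.
writeMVec :: VUM.Unbox a => MVec s n a -> Fin n -> a -> ST s ()
writeMVec (MVec v) i x = VUM.unsafeWrite v (finToInt i) x

-- Usage: a length-3 vector, written and read at index 1.
demo :: Int
demo = runST (do
  v <- replicateMVec (SS (SS (SS SZ))) (0 :: Int)
  writeMVec v (FS FZ) 42
  readMVec v (FS FZ))
```

Converting a Fin to an Int this way is linear in the index, so a serious implementation would probably represent indices as a checked newtype over Int instead; the unary Fin is kept here only to match the question.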
What is the fastest way to flatten an array of arrays in OCaml? Note that I mean arrays, and not lists.
I'd like to do this linearly, with the lowest coefficients possible.
The OCaml Standard Library is rather deficient and requires you to implement many things from scratch. That's why we have extended libraries like Batteries and Core. I would suggest using them, so that you will not face such problems.
Still, for the sake of completeness, let's try to implement our own solution, and then compare it with a proposed fun xxs -> Array.(concat (to_list xxs)) solution.
In the implementation we have a few small problems. First of all, in order to construct an array we need to provide a value for each cell. We can't just create an uninitialized array; this would break the type system. We could, of course, use the Obj module, but this is rather ugly. Another problem is that the input array can be empty, so we need to handle this case somehow. We could just raise an exception, but I prefer to make my functions total. It is not obvious how to create an empty array, but it is not impossible:
let empty () = Array.init 0 (fun _ -> assert false)
This is a function that will create an empty polymorphic array. We use a bottom value (a value that is an inhabitant of every type), denoted as assert false. This is typesafe and neat.
Next is how to create an array without having a default value. We could write very complex code that uses Array.init and translates the i'th index to the j'th index of the n'th array. But this is tedious, error-prone and quite inefficient. Another approach would be to find the first value in the input array and use it as a default.
Here comes another problem: the Standard Library doesn't have an Array.find function. Sic. It's a shame that in the 21st century we need to write an Array.find function, but this is how life is made. Again, use the Core (or Core_kernel) library or Batteries. There are lots of excellent libraries in the OCaml community available via opam.
But back to our problem. Since we don't have a find function, we will use our own custom solution. We could use fold_left, but it would traverse the whole array, although we only need to find the first element. There is a solution: we can use exceptions for non-local exits. Don't be afraid, this is idiomatic in OCaml. Also, raising and catching an exception in OCaml is very fast.
Besides the non-local exit, we also need to send back the value that we've found. We could use a reference cell as a communication channel, but this is rather ugly, so we will use the exception itself to carry the value for us. Since we don't know the type of an element in advance, we will use two modern features of the OCaml language: locally abstract types and local modules. So let's go for the implementation:
let array_concat (type t) xxs =
  let module Search = struct exception Done of t end in
  try
    Array.iter (fun xs ->
        if Array.length xs <> 0
        then raise_notrace (Search.Done xs.(0))) xxs;
    empty ()
  with Search.Done default ->
    let len =
      Array.fold_left (fun n xs -> n + Array.length xs) 0 xxs in
    let ys = Array.make len default in
    let _ : int = Array.fold_left (fun i xs ->
        let len = Array.length xs in
        Array.blit xs 0 ys i len;
        i+len) 0 xxs in
    ys
Now, the interesting part. Benchmarking! Let's use a proposed solution for comparison:
let default_concat xxs = Array.concat (Array.to_list xxs)
Here goes our testing harness:
let random_array =
  Random.init 42;
  let max = 100000 in
  Array.init 1000 (fun _ -> Array.init (Random.int max) (fun i -> i))

let test name f =
  Gc.major ();
  let t0 = Sys.time () in
  let xs = f random_array in
  let t1 = Sys.time () in
  let n = Array.length xs in
  Printf.printf "%s: %g sec (%d bytes)\n%!" name (t1 -. t0) n

let () =
  test "custom " array_concat;
  test "default" default_concat
And... the results:
$ ./array_concat.native
custom : 0.38 sec (49203647 bytes)
default: 0.20 sec (49203647 bytes)
They don't surprise me, by the way. Our solution is two times slower than the standard library. The moral of this story is:
Always benchmark before optimizing
Use extended libraries (core, batteries, containers, ...)
Update (concatenating arrays using Base)
With the base library, we can concatenate arrays easily,
let concat_base = Array.concat_map ~f:ident
And here's our benchmark:
./example.native
custom : 0.524071 sec (49203647 bytes)
default: 0.308085 sec (49203647 bytes)
base : 0.201688 sec (49203647 bytes)
So now the base implementation is the fastest and the smallest.
Say I have an array of integers A such that A[i] = j, and I want to "invert it"; that is, to create another array of integers B such that B[j] = i.
This is trivial to do procedurally in linear time in any language; here's a Python example:
def invert_procedurally(A):
    B = [None] * (max(A) + 1)
    for i, j in enumerate(A):
        B[j] = i
    return B
However, is there any way to do this functionally (as in functional programming, using map, reduce, or functions like those) in linear time?
The code might look something like this:
def invert_functionally(A):
    # We can't modify variables in FP; we can only return a value
    return map(???, A) # What goes here?
If this is not possible, what is the best (most efficient) alternative when doing functional programming?
In this context are arrays mutable or immutable? Generally I'd expect the mutable case to be about as straightforward as your Python implementation, perhaps aside from a few wrinkles with types. I'll assume you're more interested in the immutable scenario.
This operation inverts the indices and elements, so it's also important to know something about what constitutes valid array indices and impose those same constraints on the elements. Haskell has a class for index constraints called Ix. Any Ix type is ordered and has a range implementation to make an ordered list of indices ranging from one specified index to another. I think this Haskell implementation does what you want.
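import Data.Array.IArray

invertArray :: (Ix x) => Array x x -> Array x x
invertArray arr = array (low, high) [(e, i) | (i, e) <- assocs arr]
  where oldElems = elems arr
        low      = minimum oldElems
        high     = maximum oldElems

Each old element becomes a new index and each old index becomes the element stored there. (Pairing range (low,high) with indices arr by position via listArray would not do this: it would place the old indices in order, regardless of where the elements point.) Under the hood array allocates the result once and writes each (index, element) pair into place, so that part ought to be linear time, and so is the one-time operation of extracting the associations from an array.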
Whenever the sets of the input array's indices and elements differ, some elements will be undefined, which, for better or worse, blow up faster than Python's None. I believe you could overcome the undefined issue by implementing new Ix a instances over the Maybe monad, for instance.
Quick side-note: check out the invPerm example in the Haskell 98 Library Report. It does something similar to invertArray, but assumes up front that input array's elements are a permutation of its indices.
A solution needing map and 3 operations:
toTuples views the array as a list of tuples (i,e) where i is the index and e the element in the array at that index.
fromTuples creates and loads an array from a list of tuples.
swap takes a tuple (a,b) and returns (b,a).
Hence the solution would be (in Haskellish notation):
invert = fromTuples . map swap . toTuples
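As a concrete sketch of that pipeline, assuming the standard array package: assocs can play the role of toTuples and array the role of fromTuples, with swap taken from Data.Tuple. The sketch further assumes the elements are a permutation of the indices, so the original bounds can be reused; that restriction is an assumption of this example, not something the answer above states.

```haskell
import Data.Array (Array, array, assocs, bounds, elems, listArray)
import Data.Tuple (swap)

-- Assumes the elements of the input are a permutation of its indices.
-- assocs stands in for toTuples, array (bounds arr) for fromTuples.
invert :: Array Int Int -> Array Int Int
invert arr = array (bounds arr) (map swap (assocs arr))

-- Example input: A[0] = 2, A[1] = 0, A[2] = 1.
example :: Array Int Int
example = listArray (0, 2) [2, 0, 1]

-- elems (invert example) == [1, 2, 0], i.e. B[2] = 0, B[0] = 1, B[1] = 2.
inverted :: [Int]
inverted = elems (invert example)
```

Both assocs and array touch each element once, so the whole inversion stays linear in the array size.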
How is an array created in Haskell using the constructor array? I mean, does it create the first element first, and so on? In that case, how does it read the association list?
For example if we consider the following two programs:-
ar :: Int->(Array Int Int)
ar n = array (0,(n-1)) (((n-1),1):[(i,((ar n)!(i+1))) | i<-[0..(n-2)]])
ar :: Int->(Array Int Int)
ar n = array (0,(n-1)) ((0,1):[(i,((ar n)!(i-1))) | i<-[1..(n-1)]])
Will these two have different time complexity?
That depends on the implementation, but in a reasonable implementation, both have the same complexity (linear in the array size).
In GHC's array implementation, if we look at the code
array (l,u) ies
    = let n = safeRangeSize (l,u)
      in unsafeArray' (l,u) n
                      [(safeIndex (l,u) n i, e) | (i, e) <- ies]

{-# INLINE unsafeArray' #-}
unsafeArray' :: Ix i => (i,i) -> Int -> [(Int, e)] -> Array i e
unsafeArray' (l,u) n@(I# n#) ies = runST (ST $ \s1# ->
    case newArray# n# arrEleBottom s1# of
        (# s2#, marr# #) ->
            foldr (fill marr#) (done l u n marr#) ies s2#)

{-# INLINE fill #-}
fill :: MutableArray# s e -> (Int, e) -> STRep s a -> STRep s a
-- NB: put the \s after the "=" so that 'fill'
--     inlines when applied to three args
fill marr# (I# i#, e) next
 = \s1# -> case writeArray# marr# i# e s1# of
             s2# -> next s2#
we can see that first a new chunk of memory is allocated for the array, that is then sequentially filled with arrEleBottom (which is an error call with message "undefined array element"), and then the elements supplied in the list are written to the respective indices in the order they appear in the list.
In general, since it is a boxed array, what is written to the array on construction is a thunk that specifies how to compute the value when it is needed (explicitly specified values, like the literal 1 in your examples, will result in a direct pointer to that value written to the array).
When the evaluation of such a thunk is forced, it may also force the evaluation of further thunks in the array, if the thunk refers to other array elements, as it does here. In these specific examples, forcing any thunk forces all thunks later (in the first example) or earlier (in the second) in the array, until the entry that doesn't refer to another array element is reached. In the first example, if the first array element that is forced is the one at index 0, that builds a chain of thunks of size proportional to the array length, which is then reduced. So forcing the first array element has complexity O(n); after that, all further elements are already evaluated, and forcing them is O(1). In the second example, the situation is symmetric: there, forcing the last element first incurs the total evaluation cost. If the elements are demanded in a different order, the cost of evaluating all thunks is spread across the requests for different elements, but the total cost is the same. The cost of evaluating any not-yet-evaluated thunk is proportional to its distance from the next already evaluated thunk, and includes evaluating all thunks in between.
Since array access is constant time (except for cache effects, but those should not make a difference if you fill the array either forward or backward, they could make a big difference if the indices were in random order, but that still wouldn't affect time complexity), both have the same complexity.
Note however, that your using ar n to define the array elements carries the risk of multiple arrays being allocated (if compiled without optimisations, GHC does that - just tested: even with optimisations that can happen). To make sure that only one is constructed, make it
ar n = result
  where
    result = array (0,n-1) (... result!index ...)
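For illustration, here is a minimal runnable sketch of that sharing pattern applied to both programs from the question, assuming the standard array package. The names arForward and arBackward are hypothetical, and both functions assume n >= 1, since otherwise the explicitly listed index falls outside the bounds.

```haskell
import Data.Array (Array, array, elems, (!))

-- Second program from the question: element i refers to element i - 1.
-- Every reference goes through the single shared array 'result'.
arForward :: Int -> Array Int Int
arForward n = result
  where
    result = array (0, n - 1)
               ((0, 1) : [(i, result ! (i - 1)) | i <- [1 .. n - 1]])

-- First program from the question: element i refers to element i + 1.
arBackward :: Int -> Array Int Int
arBackward n = result
  where
    result = array (0, n - 1)
               ((n - 1, 1) : [(i, result ! (i + 1)) | i <- [0 .. n - 2]])

-- Both arrays end up holding 1 in every cell.
forwardElems, backwardElems :: [Int]
forwardElems  = elems (arForward 5)
backwardElems = elems (arBackward 5)
```

Because each thunk points back into the same result, an element is evaluated at most once, however the elements are demanded.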
Is there an efficient fixed-size list library in Haskell? I think the IArray interface is somewhat complicated when one only wants arrays indexed by natural numbers [including zero]. I want to write code like
zeroToTwenty :: Int -> FixedList Int
zeroToTwenty 0 = createFixedList 21 []
zeroToTwenty n = zeroToTwenty (n-1) `append` n
my naive solution is below.
Edit: Sorry for the lack of context; I want a datastructure that can be allocated once, to avoid excessive garbage collection. This was in the context of the merge routine for merge sort, which takes two sorted sublists and produces a single sorted list.
How about using the vector package? It provides very efficient growable vectors with a list-like interface, and O(1) indexing.
I would probably use Vector as Don Stewart suggests, but you can use a list-like interface with IArray by using ListLike.
You might consider using a finger tree. It offers amortized O(1) cons, snoc, uncons, and unsnoc, and O(log n) split.
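A small sketch of the finger-tree suggestion, using Data.Sequence from the containers package, which is implemented as a finger tree. Whether this is the finger tree the answer had in mind is an assumption, and the function name upTo is hypothetical; it mirrors the shape of zeroToTwenty from the question.

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq

-- Mirrors the question's recursion: start empty, append n at the end.
upTo :: Int -> Seq Int
upTo 0 = Seq.empty
upTo n = upTo (n - 1) |> n

-- toList (upTo 3) == [1, 2, 3]
firstThree :: [Int]
firstThree = toList (upTo 3)

-- Positional lookup is logarithmic rather than constant time.
fifth :: Int
fifth = Seq.index (upTo 20) 4   -- the element at position 4 is 5
```

Note that a Seq is still a persistent structure, so it does not by itself address the "allocate once" requirement from the edit to the question.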
Here's my naive solution,
import Data.Array.Diff

newtype FixedList a = FixedList (Int, (DiffArray Int a))

createFixedList n init = FixedList (0, array (0, n - 1) init)

append (FixedList (curr, array)) v = FixedList (curr + 1, array // [(curr, v)])

instance Show a => Show (FixedList a) where
  show (FixedList (curr, arr)) = show $ take curr (elems arr)
I see that I can map a function over mutable arrays with mapArray, but there doesn't seem to be something like mapM (and mapM_). mapArray won't let me print its elements, for example:
import Data.Array.Storable
arr <- newArray (1,10) 42 :: IO (StorableArray Int Int) -- answer to Life, Universe and Everything
x <- readLn :: IO Int
mapArray (putStrLn.show) arr -- <== this doesn't work!
The result will be:
No instances for (MArray StorableArray Int m,
MArray StorableArray (IO ()) m)
arising from a use of `mapArray' at <interactive>:1:0-27
Possible fix:
add an instance declaration for
(MArray StorableArray Int m, MArray StorableArray (IO ()) m)
In the expression: mapArray (putStrLn . show) arr
In the definition of `it': it = mapArray (putStrLn . show) arr
Is there something like that in Haskell (or in GHC even if not standard Haskell)?
Also, I found no foldr/foldl functions for arrays (mutable or not). Do they exist?
Thanks a lot!
Import the module Data.Traversable. It defines a typeclass for just what you want with instances already defined for array and all sorts of things. It has generalized versions of sequence and mapM, plus some even more general functions that you probably won't bother with very often.
Just a simple
import Data.Traversable as T
T.mapM doIOStuff arr
works fine.
Perhaps use one of the other array libraries, if you're doing a lot of mutation? Like uvector?
Otherwise,
forM_ [1..n] $ \i -> unsafeWrite x i
should be fine.
For the example of printing all the elements: you can use "mapM_ print . elems".
But it sounds like you want to create a new array where each value is the result of a monadic action of the previous one? In that case:
arrayMapM :: (Monad m, Ix i) => (a -> m b) -> Array i a -> m (Array i b)
arrayMapM func src =
  liftM (listArray (bounds src)) . mapM func . elems $ src
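For the mutable arrays the question asks about, a minimal sketch along the same lines is to read the contents out with getElems from Data.Array.MArray (re-exported by Data.Array.IO) and then use the ordinary list functions. The helper names mapMArray_ and foldMArray are hypothetical, and the types are fixed to IOArray Int only to keep the example small.

```haskell
import Data.Array.IO (IOArray, getElems, newListArray)
import Data.List (foldl')

-- Hypothetical helper: run an action on every element, in index order.
mapMArray_ :: (a -> IO ()) -> IOArray Int a -> IO ()
mapMArray_ f arr = getElems arr >>= mapM_ f

-- Hypothetical helper: strict left fold over the current contents.
foldMArray :: (b -> a -> b) -> b -> IOArray Int a -> IO b
foldMArray f z arr = foldl' f z <$> getElems arr

-- Usage: print every element, then return their sum.
printAndSum :: IO Int
printAndSum = do
  arr <- newListArray (1, 3) [10, 20, 30] :: IO (IOArray Int Int)
  mapMArray_ print arr
  foldMArray (+) 0 arr
```

getElems copies the contents into a list first, so this trades some allocation for convenience; an index loop with readArray avoids the intermediate list.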