I'm trying to implement a recursive function to remove empty directories in PureScript. For the following code I get an error about matching Effect with Array.
module Test where
import Prelude
import Data.Array as Array
import Effect (Effect)
import Node.Buffer.Class (toArray)
import Node.FS.Stats (isDirectory)
import Node.FS.Sync as FS
import Node.Path (FilePath)
import Prim.Boolean (False)
rmEmptyDirs :: FilePath -> Effect Unit
rmEmptyDirs path = do
  stats <- FS.stat path
  if isDirectory stats then do
    files <- FS.readdir path
    if Array.length files == 0 then
      FS.rmdir path
    else do
      file <- files
      rmEmptyDirs file
  else
    pure unit
Here is the error message:
Could not match type
Effect
with type
Array
while trying to match type Effect Unit
with type Array t0
while checking that expression rmEmptyDirs file
has type Array t0
in binding group rmEmptyDirs
where t0 is an unknown type
I understand that the innermost do block is in an Array context. I don't know how to "strip off" the Effect from the recursive call to rmEmptyDirs. Putting Array.singleton $ before the call doesn't help. liftEffect has the opposite effect of what I want to do. How do I get this to compile?
The standard way to thread one context through another is traverse.
Look at the type signature:
traverse :: forall a b m. Applicative m => (a -> m b) -> t a -> m (t b)
First you give it a function a -> m b - in your case that would be rmEmptyDirs with a ~ FilePath, m ~ Effect, and b ~ Unit.
Then you give it some container t (in your case Array) full of a (in your case FilePath).
And it runs the function on every value in the container, and returns the same container full of resulting values b, the whole container wrapped in context m.
In practice this would look like this:
traverse rmEmptyDirs files
Then you'd also need to throw away the array of units, or else the compiler would complain that you're implicitly discarding it. To do that, either bind it to a throwaway variable:
_ <- traverse rmEmptyDirs files
Or use the void combinator, which does the same thing:
void $ traverse rmEmptyDirs files
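The same combinators exist in Haskell, so the behaviour of traverse can be sketched there with a context that is easy to inspect. This is only an illustration: halve, halveAll and checkAll are hypothetical names, and Maybe stands in for Effect as the context m.

```haskell
import Data.Foldable (traverse_)

-- Hypothetical function of shape a -> m b: it succeeds only on even numbers.
halve :: Int -> Maybe Int
halve n
  | even n = Just (n `div` 2)
  | otherwise = Nothing

-- traverse runs halve on every element and collects the results
-- inside the context: [Int] -> Maybe [Int].
halveAll :: [Int] -> Maybe [Int]
halveAll = traverse halve

-- traverse_ runs the same function but throws the results away,
-- which is what void (traverse ...) does.
checkAll :: [Int] -> Maybe ()
checkAll = traverse_ halve
```

PureScript's Data.Foldable also provides traverse_ and for_, so either can replace void $ traverse ... when the results are not needed.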
Another useful thing is for, which is just traverse with arguments flipped, but the flipped arguments allow you to seamlessly pass a lambda-expression as the argument, making the whole thing look almost like a for statement from C-like languages. Very handy for cases when you don't want to give a name to the function you're using to traverse:
for files \file -> do
  log $ "Recursing into " <> file
  rmEmptyDirs file
Finally, unrelated hint: instead of if foo then bar else pure unit, use the when combinator. It would allow you to drop the else branch:
when (isDirectory stats) do
  files <- FS.readdir ...
  ...
And instead of length ... == 0 use null:
if Array.null files then ...
For Array this doesn't matter, but for many other containers length is O(n) while null is O(1), so it's good to build the habit.
I need your help guys.
I'm trying to learn Haskell by doing a simple task, but it's still hard for me.
What I'm trying to do is: read a line of numbers separated by whitespace, iterate over that list, check the values, and add 1 for each value that is not zero and -1 otherwise. I tried to do it by following some tutorials and other project code, but it just outputs a bunch of errors.
My code:
import System.Environment
import Control.Monad
import Text.Printf
import Data.List
import System.IO
solve :: IO ()
solve = do
  nums <- map read . words <$> getLine
  print (calculate nums)
calculate (x:xs) = x + check xs
check num
  | num == 0 =
      -1
  | otherwise =
      1
main :: IO ()
main = do
  n <- readLn
  if n /= 0
    then do
      printf "Case: "
      solve
    else main
Errors:
C:\Users\Donatas\Documents\haskell\la3.hs:9:21: error:
* Ambiguous type variable `b0' arising from a use of `read'
prevents the constraint `(Read b0)' from being solved.
Probable fix: use a type annotation to specify what `b0' should be.
These potential instances exist:
instance Read BufferMode -- Defined in `GHC.IO.Handle.Types'
instance Read Newline -- Defined in `GHC.IO.Handle.Types'
instance Read NewlineMode -- Defined in `GHC.IO.Handle.Types'
...plus 25 others
...plus six instances involving out-of-scope types
(use -fprint-potential-instances to see them all)
* In the first argument of `map', namely `read'
In the first argument of `(.)', namely `map read'
In the first argument of `(<$>)', namely `map read . words'
|
9 | nums <- map read . words <$> getLine
| ^^^^
C:\Users\Donatas\Documents\haskell\la3.hs:10:9: error:
* Ambiguous type variable `a0' arising from a use of `print'
prevents the constraint `(Show a0)' from being solved.
Probable fix: use a type annotation to specify what `a0' should be.
These potential instances exist:
instance Show HandlePosn -- Defined in `GHC.IO.Handle'
instance Show BufferMode -- Defined in `GHC.IO.Handle.Types'
instance Show Handle -- Defined in `GHC.IO.Handle.Types'
...plus 27 others
...plus 13 instances involving out-of-scope types
(use -fprint-potential-instances to see them all)
* In a stmt of a 'do' block: print (calculate nums)
In the expression:
do nums <- map read . words <$> getLine
print (calculate nums)
In an equation for `solve':
solve
= do nums <- map read . words <$> getLine
print (calculate nums)
|
10 | print (calculate nums)
| ^^^^^^^^^^^^^^^^^^^^^^
C:\Users\Donatas\Documents\haskell\la3.hs:12:1: error:
* Non type-variable argument in the constraint: Num [a]
(Use FlexibleContexts to permit this)
* When checking the inferred type
calculate :: forall a. (Eq a, Num [a], Num a) => [a] -> a
|
12 | calculate (x:xs) = x + check xs
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Failed, no modules loaded.
To start with, I suggest you default to always writing type annotations. And before you start implementing anything, sketch out what the types of your program look like. For this program I suggest you start from:
main :: IO ()
solve :: String -> String
calculate :: [Int] -> Int
check :: Int -> Int
The names could also probably be improved to better convey what it is they're doing.
Note that there is only one function with type IO _. This serves to isolate the impure part of your program, which will make your life easier (e.g. testing, code reasoning, etc).
You're not far off. Just try reworking your code to fit into the above types. And be aware that you're missing a pattern in your calculate implementation ;)
If you inspect your code and follow the types, it is crystal-clear where the error is. Yes, you can add type annotations -- that is highly recommended -- but I find your code is so simple you could get away with just a bit of equational reasoning.
It starts with solve, it is easy to see that nums is of type Read a => [a], given that you split a string by words (i.e. [String]) and map read over it. So a list of as is what you give to calculate. As you know, a list is the disjoint sum between (1) the empty list ([]) and (2) a cons cell made of a head, an element of type a, and a tail, the rest of the list ((x:xs)).
First thing you notice is that the case of the empty list is missing; let's add it:
calculate [] = 0 -- I assume this is correct
On to the body of calculate and check. The latter clearly expects a number, you can be a bit more concise by the way:
check 0 = -1
check _ = 1
Now if you look at calculate, you see that you are calling check and passing it xs. What is xs? It is bound in the pattern (x:xs) which is how you uncons a cons cell. Clearly, xs is the tail of the cell and so a list itself. But check expects a number! The only number you can expect here is x, not xs. So let's change your code to
calculate (x:xs) = check x + ...
Your specifications state that you want to iterate over the list. That can only happen if you do something with xs. What can you do with it? The only answer to that is to call calculate recursively:
calculate (x:xs) = check x + calculate xs
... and with these changes, your code is fine.
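Putting the pieces from both answers together, a complete version might look like the sketch below. It assumes Int inputs and the pure solve :: String -> String shape suggested earlier; the choice of 0 for the empty list is the same assumption made above.

```haskell
-- Score a single number: 0 counts as -1, anything else as +1.
check :: Int -> Int
check 0 = -1
check _ = 1

-- Sum the scores of all numbers; the empty list scores 0 (an assumption).
calculate :: [Int] -> Int
calculate [] = 0
calculate (x:xs) = check x + calculate xs

-- Pure core: parse a whitespace-separated line and render the result.
-- The composition with calculate :: [Int] -> Int fixes the type of read,
-- which removes the ambiguity GHC complained about.
solve :: String -> String
solve = show . calculate . map read . words
```

The only impure part left would be something like getLine >>= putStrLn . solve inside main.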
I have a function that creates recursively a flattened list of matrices from a tree that have to be mutable as their elements are updated often during their creation. So far I have come up with a recursive solution that has the signature:
doAll :: .. -> [ST s (STArray s (Int, Int) Int)]
The reason I do not return the [UArray (Int,Int) Int] directly is because doAll is called recursively, modifies elements of the matrices in the list and appends new matrices. I don't want to freeze and thaw the matrices unnecessarily.
So far so good. I can inspect the n-th matrix (of type Array (Int, Int) Int) in ghci
runSTArray (matrices !! 0)
runSTArray (matrices !! 1)
and indeed I get the correct results for my algorithm. However, I didn't find a way to map runSTArray over the list that is returned by doAll:
map (runSTArray) matrices
Couldn't match expected type `forall s. ST s (STArray s i0 e0)'
with actual type `ST s0 (STArray s0 (Int, Int) Int)'
The same problem happens if I try to evaluate recursively over the list or try to evaluate single elements wrapped in a function
Could someone please explain what is going on (I didn't really understand the implications of the forall keyword) and how I could evaluate the arrays in the list?
This is an unfortunate consequence of the type trick that makes ST safe. First, you need to know how ST works. The only way to get from the ST monad to pure code is with the runST function, or other functions built upon it like runSTArray. These all take an argument whose type has the form (forall s. ST s ...). This means that, in order to construct an Array from an STArray, the compiler must be able to determine that it can substitute any type it likes for the s type variable in the action passed to runST.
Now consider the function map :: (a -> b) -> [a] -> [b]. This shows that every element in the list must have exactly the same type (a), and therefore also the same s. But this extra constraint violates the type of runSTArray, which declares that the compiler must be able to freely substitute other types for s.
You can work around this by defining a new function to first freeze the arrays inside the ST monad, then run the resulting ST action:
runSTArrays :: Ix ix => (forall s. [ST s (STArray s ix a)]) -> [Array ix a]
runSTArrays arrayList = runST $ (sequence arrayList >>= mapM freeze)
Note the forall requires the RankNTypes extension.
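For reference, here is a self-contained sketch of that workaround with the imports it needs. The builder mkMatrix is a hypothetical stand-in for whatever doAll produces, and the element and index types are fixed to the ones from the question to keep the example small.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.ST (ST, runST)
import Data.Array (Array, elems)
import Data.Array.ST (STArray, freeze, newArray, writeArray)

-- Hypothetical builder: a 2x2 matrix of zeros with k written at (0,0).
mkMatrix :: Int -> ST s (STArray s (Int, Int) Int)
mkMatrix k = do
  arr <- newArray ((0, 0), (1, 1)) 0
  writeArray arr (0, 0) k
  pure arr

-- Run every action in ONE ST computation, freeze each array there,
-- and only then leave ST. The rank-2 argument keeps s from escaping.
runSTArrays :: (forall s. [ST s (STArray s (Int, Int) Int)])
            -> [Array (Int, Int) Int]
runSTArrays actions = runST (sequence actions >>= mapM freeze)
```

With these definitions, map elems (runSTArrays [mkMatrix 1, mkMatrix 2]) yields one element list per matrix. Note that freeze copies each array once; that is the price of leaving ST through a list.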
You just bounced against the limitations of the type system.
The runSTArray has a higher ranked type. You must pass it a ST-action whose state type variable is unique. Yet, in Haskell it is normally not possible to have such values in lists.
The whole thing is a clever scheme to make sure that values you produce in an ST action can't escape from there. Which means, it looks like your design is somehow broken.
One suggestion: can't you process the values in another ST action, like
sequence [ ... your ST s (STArray s x) ...] >>= processing
where
processing :: [STArray s x] -> ST s (your results)
I have the following (imperative) algorithm that I want to implement in Haskell:
Given a sequence of pairs [(e0,s0), (e1,s1), (e2,s2),...,(en,sn)], where both the "e" and "s" parts are natural numbers, not necessarily distinct, at each time step one element of this sequence is randomly selected, say (ei,si), and based on the values of (ei,si), a new element is built and added to the sequence.
How can I implement this efficiently in Haskell? The need for random access would make it bad for lists, while the need for appending one element at a time would make it bad for arrays, as far as I know.
Thanks in advance.
I suggest using either Data.Set or Data.Sequence, depending on what you're needing it for. The latter in particular provides you with logarithmic index lookup (as opposed to linear for lists) and O(1) appending on either end.
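A minimal sketch of the Data.Sequence approach follows. The rule derive is purely hypothetical (the question does not say how the new pair is built), and the indices are passed in by the caller so that the example stays pure; in a real program they could come from a random generator such as System.Random's randomR.

```haskell
import Data.Foldable (foldl', toList)
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq

-- Hypothetical rule for building a new pair from the selected one.
derive :: (Int, Int) -> (Int, Int)
derive (e, s) = (e + 1, s * 2)

-- One time step: look up the pair at index i (logarithmic time) and
-- append the derived pair at the right end (constant time).
-- An out-of-range index leaves the sequence unchanged.
step :: Int -> Seq (Int, Int) -> Seq (Int, Int)
step i xs = case Seq.lookup i xs of
  Just p -> xs |> derive p
  Nothing -> xs

-- Run one step per selected index, left to right.
grow :: [Int] -> Seq (Int, Int) -> Seq (Int, Int)
grow picks xs = foldl' (flip step) xs picks
```

Unlike a Set, a Seq keeps duplicate pairs, so repeated elements keep their extra weight in the random selection.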
"while the need for appending one element at a time would make it bad for arrays" Algorithmically, it seems like you want a dynamic array (aka vector, array list, etc.), which has amortized O(1) time to append an element. I don't know of a Haskell implementation of it off-hand, and it is not a very "functional" data structure, but it is definitely possible to implement it in Haskell in some kind of state monad.
If you know approximately how many elements you will need in total, you can create an array of that size which is "sparse" at first, and then put elements into it as needed.
Something like below can be used to represent this new array:
data MyArray = MyArray (Array Int Int) Int
(where the last Int represent how many elements are used in the array)
If you really need stop-and-start resizing, you could think about using the simple-rope package along with a StringLike instance for something like Vector. In particular, this might accommodate scenarios where you start out with a large array and are interested in relatively small additions.
That said, adding individual elements into the chunks of the rope may still induce a lot of copying. You will need to try out your specific case, but you should be prepared to use a mutable vector as you may not need pure intermediate results.
If you can build your array in one shot and just need the indexing behavior you describe, something like the following may suffice,
import Data.Array.IArray
test :: Array Int (Int,Int)
test = accumArray (flip const) (0,0) (0,20) [(i, f i) | i <- [0..19]]
  where f 0 = (1,0)
        f i = let (e,s) = test ! (i `div` 2) in (e*2,s+1)
Taking a note from ivanm, I think Sets are the way to go for this.
import Data.Set as Set
import System.Random (RandomGen, getStdGen)
startSet :: Set (Int, Int)
startSet = Set.fromList [(1,2), (3,4)] -- etc. Whatever the initial set is
-- grow the set by randomly producing "n" elements.
growSet :: (RandomGen g) => g -> Set (Int, Int) -> Int -> (Set (Int, Int), g)
growSet g s n | n <= 0 = (s, g)
              | otherwise = growSet g'' s' (n-1)
  where s' = Set.insert (x,y) s
        ((x,_), g') = randElem s g
        ((_,y), g'') = randElem s g'
randElem :: (RandomGen g) => Set a -> g -> (a, g)
randElem = undefined
main = do
  g <- getStdGen
  let (grownSet,_) = growSet g startSet 2
  print $ grownSet -- or whatever you want to do with it
This assumes that randElem is an efficient, definable method for selecting a random element from a Set. (I asked this SO question regarding efficient implementations of such a method). One thing I realized upon writing up this implementation is that it may not suit your needs, since Sets cannot contain duplicate elements, and my algorithm has no way to give extra weight to pairings that appear multiple times in the list.
I wrote some Haskell code to solve the following problem: we have n files f1, f2, f3, ..., fn, and I cut each file into slices of 100 lines each:
f1_1, f1_2, f1_3 .... f1_m
f2_1, f2_2, .... f2_n
...
fn_1, fn_2, .... fn_k
Finally, I construct a special data type (Dag) from the slices in the following way:
f1_1, f2_1, f3_1, .... fn_1 => Dag1
f1_2, f2_2, f3_2, ..... fn_2 => Dag2
....
f1_k, f2_k, f3_k, ..... fn_k => Dagk
The code I wrote starts by cutting all the files, then couples the i-th elements of the resulting lists, and constructs a Dag from each element of the final list.
It looks like this:
-- # take a filename and cut the file in slices of 100 lines
sliceFile :: FilePath -> [[String]]
-- # take a list of lists and group the i-th elements into list
coupleIthElement :: [[String]] -> [[String]]
-- # take a list of lines and create a DAG
makeDags :: [String] -> Dag
-- # final code look like this
makeDag_ :: [FilePath] -> [Dag]
makeDag_ files = map makeDags $ coupleIthElement (concat (map sliceFile files))
The problem is that this code is inefficient because:
it needs to store all the files in memory as lists
the garbage collector cannot work efficiently, since every function needs the whole result list of the previous function
How could I rewrite my program to take advantage of the garbage collector and of Haskell's laziness?
If that is not possible or not easy, what can I do to make it even a bit more efficient?
Thanks for any reply.
edit
coupleIthElement ["abc", "123", "xyz"] must return ["a1x","b2y","c3z"]
Of course, the 100 lines are actually selected by a particular criterion on some element of the lines, but I left this aspect out to make the problem easier to understand.
Another edit
data Dag = Dag ([(Int, String)], [((Int, Int), Int)]) deriving Show
test_dag = Dag ([(1, "a"),(2, "b"),(3, "c")],[((1,2),1),((1,3),1)])
test_dag2 = Dag ([],[])
The first list contains the vertices, each defined by a number and a label. The second list contains the edges: ((1,2),3) means an edge between vertices 1 and 2 with cost 3.
A few points:
1) Have you considered using fgl? It's probably more efficient than your own Dag implementation. If you really need to use Dag, you could construct your graphs with fgl then convert them to Dag when they're complete.
2) It seems like you don't actually use the slices when constructing your graphs, rather they control how many graphs you have. If so, how about something like this:
dagFromHandles :: [Handle] -> IO Dag
dagFromHandles = fmap makeDags . mapM hGetLine
allDags :: [FilePath] -> IO [Dag]
allDags listOfFiles = do
  handles <- mapM (flip openFile ReadMode) listOfFiles
  replicateM 100 (dagFromHandles handles)
This assumes that each file has at least 100 lines, and any extra lines will be ignored. Even better would be if you had a function that would consume a Dag, then you could do
useDag :: Dag -> IO ()
runDags :: [FilePath] -> IO ()
runDags listOfFiles = do
  handles <- mapM (flip openFile ReadMode) listOfFiles
  replicateM_ 100 (dagFromHandles handles >>= useDag)
This should make more efficient use of garbage collection.
Of course this assumes that I understand the problem properly, and I'm not certain that I do. Note that concat (map sliceFile) should be a no-op (sliceFile would need to be in IO as you've defined the type, but ignoring that for now), so I don't see why you're bothering with it at all.
If you don't actually need to process your file in slices, avoid doing so. Haskell does this automatically! In Haskell, you can think of lazy IO as a stream: data is read from the input as soon as it's needed and discarded as soon as it's no longer used. For instance, this is a simple file-copying program:
main = interact id
interact has the signature interact :: (String -> String) -> IO (), and feeds the input into a function which handles it and produces some output, which is written to stdout. This program is often more efficient than a naive C implementation, as the runtime automatically buffers the input and output.
If you want to understand laziness, you have to set aside the habits you learned as an imperative programmer and think of a program as a description of how to transform data, not as a sequence of instructions: data is only processed when it is needed!
The key reason your data may be retained is that the list is traversed multiple times. Your function makeDags walks through the transposed list of slices one by one, so the elements of the original list cannot be discarded. What you should try is to write your functions along these lines:
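-- Reading a file is an IO action, so the result lives in IO.
sliceFile :: FilePath -> IO [[String]]
sliceFile fp = do
  f <- readFile fp
  let l = lines f
      slice [] = []
      slice x = ll : slice ls where (ll,ls) = splitAt 100 x
  return (slice l)
-- Split off the first element of every inner list.
sliceFirstRow :: [[String]] -> ([String],[[String]])
sliceFirstRow list = unzip $ map (\(x:xs) -> (x,xs)) list
-- makeDag is the question's function of type [String] -> Dag.
makeDags :: [[String]] -> [Dag]
makeDags list
  | null list || any null list = [] -- stop as soon as any input runs out
  | otherwise = makeDag firstRow : makeDags restOfList
  where
    (firstRow,restOfList) = sliceFirstRow list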
This function may be a solution, since the first row is no longer referenced once it has been processed. In most places this is a consequence of laziness, so you could try using seq to force construction of the Dags, which allows the input data to be garbage-collected. (If you don't force the Dags, the data they depend on can't be garbage-collected.)
In any case, I could give a more helpful answer if you provided some information about what these Dags are.
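As a side note on the coupling step from the question: the behaviour required of coupleIthElement in the question's edit is exactly what Data.List.transpose does, and transpose is lazy enough to produce the first group without forcing the later ones. The sketch below assumes plain fixed-size slicing; chunksOf' and coupleSlices are hypothetical helper names, not functions from the question.

```haskell
import Data.List (transpose)

-- Split a list into consecutive chunks of n elements; the last chunk
-- may be shorter. Returns [] for a non-positive n.
chunksOf' :: Int -> [a] -> [[a]]
chunksOf' n xs
  | n <= 0 || null xs = []
  | otherwise = let (h, t) = splitAt n xs in h : chunksOf' n t

-- Group the i-th elements of every inner list together.
coupleIthElement :: [[a]] -> [[a]]
coupleIthElement = transpose

-- Given the lines of each file, return for every slice index i the
-- list of i-th slices, one per file.
coupleSlices :: Int -> [[String]] -> [[[String]]]
coupleSlices n = transpose . map (chunksOf' n)
```

Be aware that transpose silently skips inputs that have run out, so files of different lengths produce shorter groups towards the end rather than an error.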
I have a file whose lines look like index : label. The index values are keys in the range 0 ... 100000000 and the label can be any String value. I want to split this file, which is 110 MB, into slices of 100 lines each and do some computation on each slice. How can I do this?
123 : "acgbdv"
127 : "ytehdh"
129 : "yhdhgdt"
...
9898657 : "bdggdggd"
If you're using String IO, you can do the following:
import System.IO
import Control.Monad
-- | Process 100 lines
process100 :: [String] -> MyData
-- whatever this function does
loop :: [String] -> [MyData]
loop lns = go [] lns
  where
    go acc [] = reverse acc
    go acc lns = let (this, next) = splitAt 100 lns in go (process100 this:acc) next
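processFile :: FilePath -> IO [MyData]
-- readFile reads lazily and closes the handle once the contents are consumed.
-- Returning lazily read contents out of withFile would close the handle
-- before the data is read.
processFile f = fmap (loop . lines) (readFile f)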
Note that this function will silently process the last chunk even if it isn't exactly 100 lines.
Packages like bytestring and text generally provide functions like lines and hGetContents so you should be able to easily adapt this function to any of them.
It's important to know what you're doing with the results of processing each slice, because you don't want to hold on to that data for longer than necessary. Ideally, after each slice is calculated the data would be entirely consumed and could be gc'd. Generally either the separate results get combined into a single data structure (a "fold"), or each one is dealt with separately (maybe outputting a line to a file or something similar). If it's a fold, you should change "loop" to look like this:
loopFold :: [String] -> MyData -- assuming there is a Monoid instance for MyData
loopFold lns = go mempty lns
  where
    go !acc [] = acc
    go !acc lns = let (this, next) = splitAt 100 lns in go (acc `mappend` process100 this) next
The loopFold function uses bang patterns (enabled with "LANGUAGE BangPatterns" pragma) to force evaluation of the "MyData". Depending on what MyData is, you may need to use deepseq to make sure it's fully evaluated.
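To make the fold concrete, here is a self-contained sketch. MyData and process100 are stand-ins chosen only for illustration: MyData is taken to be Sum Int and each chunk is summarised by its line count, which is not what the question's real computation does.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Monoid (Sum (..))

-- Stand-in result type with a Monoid instance (an assumption).
type MyData = Sum Int

-- Stand-in for the real per-chunk computation: count the lines.
process100 :: [String] -> MyData
process100 = Sum . length

-- Fold over the input in chunks of 100 lines, forcing the accumulator
-- at every step so that processed chunks can be garbage-collected.
loopFold :: [String] -> MyData
loopFold = go mempty
  where
    go !acc [] = acc
    go !acc lns =
      let (this, next) = splitAt 100 lns
      in go (acc <> process100 this) next
```

For a flat type such as Sum Int the bang pattern forces the whole value; for a nested structure it only reaches the outermost constructor, which is where deepseq comes in.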
If instead you're writing each line to output, leave loop as it is and change processFile:
processFileMapping :: FilePath -> IO ()
processFileMapping f = withFile f ReadMode pf
  where
    pf = mapM_ (putStrLn . show) <=< fmap (loop . lines) . hGetContents
If you're interested in enumerator/iteratee style processing, this is a pretty simple problem. I can't give a good example without knowing what sort of work process100 is doing, but it would involve enumLines and take.
Is it necessary to process exactly 100 lines at a time, or do you just want to process in chunks for efficiency? If it's the latter, don't worry about it. You'd most likely be better off processing one line at a time, using either an actual fold function or a function similar to processFileMapping.