I have a file containing 200,000+ words, and I want the program to read the data and store all 200,000+ words in an array.
I wrote the following code:
import System.IO
main = do
  handle <- openFile "words.txt" ReadMode
  contents <- hGetContents handle
  con <- lines contents
  putStrLn ( show con)
  hClose handle
But it gives a type error at line 5.
The text file has the form:
ABRIDGMENT
ABRIDGMENTS
ABRIM
ABRIN
ABRINS
ABRIS
and so on
What changes does the code need so that it forms an array of words?
I solved it in Python (HTH):
def readFile():
    allWords = []
    for word in open("words.txt"):
        allWords.append(word.strip())
    return allWords
Maybe
readFile "words.txt" >>= return . words
with type
:: IO [String]
or you can write
getWordsFromFile :: String -> IO [String]
getWordsFromFile file = readFile file >>= return . words
and use as
main = do
  wordList <- getWordsFromFile "words.txt"
  putStrLn $ "File contains " ++ show (length wordList) ++ " words."
Very constructive comments from #sanityinc and #Sarah (thanks!):
#sanityinc: "Other options: fmap words $ readFile file or words <$> readFile file if you've imported <$> from Control.Applicative"
#Sarah: "To elaborate a bit, whenever you see foo >>= return . bar you can (and should) replace it with fmap bar foo because you're not actually using the extra powers that come with Monad and in most cases restricting yourself to a needlessly complex type is not beneficial. This will be even more true in the future where Applicative is a superclass of Monad"
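Putting these suggestions together, a minimal sketch (the file name is a placeholder). It also shows why the original code failed at line 5: lines is a pure function, so its result has to be bound with let; <- only works for IO actions. getLinesViaHandle is a hypothetical name for the corrected handle-based version from the question.

```haskell
import System.IO

-- Split a file into whitespace-separated words.
-- fmap applies the pure function `words` to the result of the IO action.
getWordsFromFile :: FilePath -> IO [String]
getWordsFromFile file = fmap words (readFile file)

-- The handle-based code from the question, corrected.
getLinesViaHandle :: FilePath -> IO [String]
getLinesViaHandle file = do
  handle <- openFile file ReadMode
  contents <- hGetContents handle
  let con = lines contents          -- `lines` is pure: bind with let, not <-
  length con `seq` hClose handle    -- force the lazy read before closing
  return con
```

Forcing the list before hClose matters because hGetContents reads lazily; closing first would silently truncate the result.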
Related
I am trying to read the text of all files in a folder with the following code:
import System.IO
import System.Directory (getFileSize)

readALine :: FilePath -> IO ()
readALine fname = do
  putStr . show $ "Filename: " ++ fname ++ "; "
  fs <- getFileSize fname
  if fs > 0 then do
      hand <- openFile fname ReadMode
      fline <- hGetLine hand
      hClose hand
      print $ "First line: " <> fline
    else return ()
However, some of these files are binary. How can I tell whether a given file is binary? I could not find any such function in https://hoogle.haskell.org/?hoogle=binary%20file
Thanks for your help.
Edit: By binary I mean the file has unprintable characters. I am not sure of the proper term for these files.
I installed utf8-string and modified the code:
readALine :: FilePath -> IO ()
readALine fname = do
  putStr . show $ "Filename: " ++ fname ++ "; "
  fs <- getFileSize fname
  if fs > 0 then do
      hand <- openFile fname ReadMode
      fline <- hGetLine hand
      hClose hand
      if isUTF8Encoded (unpack fline) then do
          print $ "Not binary file."
          print $ "First line: " <> fline
        else return ()
    else return ()
Now it works, but on encountering a 'binary' executable file (called esync.x), the hGetLine hand expression fails with:
"Filename: ./esync.x; "firstline2.hs: ./esync.x: hGetLine: invalid argument (invalid byte sequence)
How can I check the characters via the file handle itself?
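The definition of binary is quite vague, but I'll assume you mean content that is not valid UTF-8 text.
The String-based hGetLine fails because it decodes the bytes with the handle's text encoding. You should instead read a ByteString and use toString from Data.ByteString.UTF8, which replaces invalid UTF-8 sequences with a replacement character instead of failing with an error.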
Converting your example to use UTF-8 ByteStrings:
import Data.Monoid
import System.IO
import System.Directory
import qualified Data.ByteString as B
import qualified Data.ByteString.UTF8 as B

readALine :: FilePath -> IO ()
readALine fname = do
  putStr . show $ "Filename: " ++ fname ++ "; "
  fs <- getFileSize fname
  if fs > 0 then do
      hand <- openFile fname ReadMode
      fline <- B.hGetLine hand
      hClose hand
      print $ "First line: " <> B.toString fline
    else return ()
This code doesn't fail on binary files, but it doesn't really detect binary content either. If you want to detect binary, look for B.replacement_char in the decoded data. To detect non-printable characters, you can also look for code points below 32 (the space character), probably excluding tab, newline and carriage return.
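A minimal sketch of such a check, under stated assumptions: instead of searching for utf8-string's replacement_char, it swaps in decodeUtf8' from the text package, which reports invalid UTF-8 as a Left rather than substituting characters; and it treats any control character other than tab, newline and carriage return as a sign of binary content. looksBinary and isBinaryFile are hypothetical names, and reading the whole file strictly is a simplification.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8')

-- Heuristic (an assumption, not a standard definition of "binary"):
-- invalid UTF-8, or control characters other than \t, \n, \r.
looksBinary :: B.ByteString -> Bool
looksBinary bytes =
  case decodeUtf8' bytes of
    Left _    -> True                     -- not valid UTF-8
    Right txt -> T.any isBadControl txt   -- valid UTF-8: inspect code points
  where
    isBadControl c = c < ' ' && c `notElem` "\t\n\r"

-- Read raw bytes strictly, so no text decoding can throw and the
-- handle is closed before we return.
isBinaryFile :: FilePath -> IO Bool
isBinaryFile path = fmap looksBinary (B.readFile path)
```

Because the bytes are never decoded through a Handle's text encoding, an executable like esync.x is reported as binary instead of raising "invalid byte sequence".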
I want to read a list of strings separated by newlines from STDIN until an empty line is read, and I want an action of type IO [String]. Here is how I would do it with recursion:
myReadList :: IO [String]
myReadList = go []
  where
    go :: [String] -> IO [String]
    go l = do {
      inp <- getLine;
      if (inp == "") then
        return (reverse l);  -- lines were accumulated in reverse order
      else go (inp:l);
    }
However, this go helper obscures readability, and the pattern is so common that one would ideally want to abstract it out.
So, this was my attempt:
whileM :: (Monad m) => (a -> Bool) -> [m a] -> m [a]
whileM p [] = return []
whileM p (x:xs) = do
  s <- x
  if p s
    then do
      l <- whileM p xs
      return (s:l)
    else
      return []

myReadList :: IO [String]
myReadList = whileM (/= "") (repeat getLine)
I suspect there is already a standard implementation of this whileM, or something similar, but I cannot find it.
Could someone point out what is the most natural and elegant way to deal with this problem?
unfoldWhileM from Control.Monad.Loops (monad-loops package) is the same as your whileM, except that it takes an action (not a list) as its second argument.
myReadList = unfoldWhileM (/= "") getLine
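If you'd rather not depend on monad-loops, here is a hand-rolled sketch with the same behaviour as unfoldWhileM (the result that fails the predicate is discarded). unfoldWhileM' and readLinesUntilBlank are hypothetical names; taking a Handle rather than hard-wiring stdin just makes it easy to try on a file.

```haskell
import System.IO

-- Run `act` repeatedly, collecting results while `p` holds.
-- The first result that fails `p` is dropped, as in unfoldWhileM.
unfoldWhileM' :: Monad m => (a -> Bool) -> m a -> m [a]
unfoldWhileM' p act = go
  where
    go = do
      x <- act
      if p x
        then fmap (x :) go
        else return []

-- Read lines from a handle until the first empty line.
readLinesUntilBlank :: Handle -> IO [String]
readLinesUntilBlank h = unfoldWhileM' (/= "") (hGetLine h)

-- The question's myReadList, reading from standard input.
myReadList :: IO [String]
myReadList = readLinesUntilBlank stdin
```

Note that hGetLine throws at end of input if no empty line is ever read.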
Yes, for abstracting out the explicit recursion, the Control.Monad.Loops module (monad-loops package) mentioned in the previous answer is useful. For those who are interested, there is a nice tutorial on monad loops.
However, there is another way. While struggling with this, and knowing that Haskell is lazy by default, I first tried:
(sequence . repeat $ getLine) >>= return . takeWhile (/="q")
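I expected the above to collect the entered lines into an IO [String]. Nah... it runs indefinitely, and IO actions don't look lazy at all: sequence has to run every action in the infinite list before it can return the list, so takeWhile never gets to run. At this point System.IO.Lazy (from the lazyio package) might come in handy. It's a simple library with only two functions: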
run :: T a -> IO a
interleave :: IO a -> T a
So run takes a lazy IO action (T a) and turns it into an ordinary IO action, and interleave does the opposite. Accordingly, we can rephrase the above function as:
import qualified System.IO.Lazy as LIO
gls = LIO.run (sequence . repeat $ LIO.interleave getLine) >>= return . takeWhile (/="q")
Prelude> gls >>= return . sum . fmap (read :: String -> Int)
1
2
3
4
q
10
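For comparison, the same lazy behaviour can be sketched with unsafeInterleaveIO from base instead of the lazyio package. This is a swapped-in technique, not a description of how System.IO.Lazy works internally, and lazyLines is a hypothetical name. Each line is read only when that part of the list is demanded, so takeWhile stops reading at "q"; the usual lazy-IO caveats apply (reads happen during pure evaluation, and the handle must stay open until the list has been consumed).

```haskell
import System.IO
import System.IO.Unsafe (unsafeInterleaveIO)

-- An infinite list of lines whose reads are deferred: each hGetLine
-- runs only when that element (or the rest of the list) is demanded.
lazyLines :: Handle -> IO [String]
lazyLines h = unsafeInterleaveIO $ do
  l <- hGetLine h
  rest <- lazyLines h   -- returns immediately; the read is deferred
  return (l : rest)
```

In GHCi, lazyLines stdin >>= print . sum . map (read :: String -> Int) . takeWhile (/= "q") should behave like the session above.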
A solution using the effectful streams of the streaming package:
import Streaming
import qualified Streaming.Prelude as S
main :: IO ()
main = do
  result <- S.toList_ . S.takeWhile (/="") . S.repeatM $ getLine
  print result
A solution that shows prompts, keeping them separated from the reading actions:
main :: IO ()
main = do
  result <- S.toList_
    $ S.zipWith (\_ s -> s)
        (S.repeatM $ putStrLn "Write something: ")
        (S.takeWhile (/="") . S.repeatM $ getLine)
  print result
I'm writing a parser in Haskell, but I'm stuck on this error. Here is my code:
main = do
  arguments <- getArgs
  let fileName = head arguments
  fileContents <- readFile fileName
  converter <- open "UTF-8" Nothing
  let titleLength = length fileName
      titleWithoutExtension = take (titleLength - 4) fileName
      allNonEmptyLines = unlines $ tail $ filter (/= "") $ lines fileContents
When I try to read a file with "US-ASCII" encoding, I get the famous error hGetContents: invalid argument (invalid byte sequence). I've tried replacing "UTF-8" in my code with "US-ASCII", but the error persists. Is there a way to read these files, or to handle file encoding problems in general?
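You should use hSetEncoding to configure the file handle for a specific text encoding. readFile always decodes with the locale's default encoding, which is why changing the converter made no difference. For example: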
import System.Environment
import System.IO

main = do
  (path : _) <- getArgs
  h <- openFile path ReadMode
  hSetEncoding h latin1
  contents <- hGetContents h
  -- no need to close h: hGetContents semi-closes it, and it is closed
  -- once contents has been read to the end
  putStrLn $ show $ length contents
If your file contains non-ASCII characters and it's not UTF-8 encoded, then latin1 is a good bet (it maps every byte to a character, so decoding never fails), although it's not the only possibility.
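To handle both cases in one place, here is a sketch that tries UTF-8 first and falls back to latin1 when decoding fails. It assumes that falling back silently is acceptable for your parser (text in other 8-bit encodings, such as Windows-1252, will be mis-rendered without any error). readFileWith and readFileUtf8OrLatin1 are hypothetical names. The contents are forced with evaluate while the handle is still open, because with lazy hGetContents a decoding error would otherwise only surface later, outside the try.

```haskell
import Control.Exception (IOException, evaluate, try)
import System.IO

-- Read a whole file with an explicit text encoding, forcing the contents
-- so that any decoding error is raised here.
readFileWith :: TextEncoding -> FilePath -> IO String
readFileWith enc path = withFile path ReadMode $ \h -> do
  hSetEncoding h enc
  contents <- hGetContents h
  _ <- evaluate (length contents)   -- read to EOF while the handle is open
  return contents

-- Try UTF-8; on a decoding error (an IOException), retry with latin1,
-- which accepts every byte. The fallback order is an assumption.
readFileUtf8OrLatin1 :: FilePath -> IO String
readFileUtf8OrLatin1 path = do
  r <- try (readFileWith utf8 path) :: IO (Either IOException String)
  either (const (readFileWith latin1 path)) return r
```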
Problem
I have a type:
data FSObject = Folder String [FSObject] | File String
and I need to make a function
search :: String -> FSObject -> Maybe String
that returns the path to the File we are searching for (if it exists).
(The String argument of search should equal the name of the searched File, i.e. the String in the File object, without the path.)
My thoughts/attempts
I feel like I am not doing it properly, i.e. in a functional way. I am new to this language, so I apologize for the approach below.
I have been trying to do this for several hours, like this:
a helper function contains that returns True if the given FSObject contains the File we are looking for
a helper function that returns the first element containing the File (using the previous function and find)
a helper function to convert from Maybe String to String
My search function would check whether the File exists: if not, return Nothing; otherwise return Just with a String computed somehow using recursion.
I can paste my work, but I don't know how to make it work, and it's totally unreadable.
Does anyone have hints or comments to share? How do I properly deal with Maybe in this kind of problem?
You can do this using recursion.
import Control.Monad (msum)

findPath :: String -> FSObject -> Maybe String
findPath name (File name') | name == name' = Just name
findPath _ (File _) = Nothing
findPath name (Folder path contents) =
  fmap ((path ++ "/") ++) $ msum $ map (findPath name) contents
The only tricky part is the last clause. Breaking its last line apart:
map (findPath name) contents
will output a list of subresults
[Just "a/b/filename", Nothing, Just "c/d/filename", ....]
msum will take the first Just in the list, throwing the rest away (and, by laziness, nothing beyond that first value will actually be computed).
Finally, the folder name is prepended (the fmap is there to reach inside the Maybe).
While jamshidh's solution is short, it isn't modular. Here is my process of writing the program, built around two main points:
I'll use the so-called "generate then search" approach: first I generate paths for all files, then I search that collection for the right path.
I'll store a path as a list of components. This way the code will be more generic (in Haskell, more generic code is less error-prone), and I will insert path separators afterwards in a small function (so I can do one thing at a time, which is easier).
OK, I need a function allPaths that gives me a list of all files along with their paths. The paths of a single file are just that file, and the paths of a folder are the concatenated paths of its children, with the folder name prepended:
allPaths (File file) = singleFile file
allPaths (Folder folder children) = concatMap (addFolder folder . allPaths) children
I actually wrote the code top-down, so I didn't bother defining singleFile and addFolder at this point. singleFile is simple:
singleFile file = [(file, [])]
addFolder prepends f to the second component of each tuple. Control.Arrow already has a function for that (second), but I include its implementation here for simplicity:
second f (a,x) = (a, f x)
addFolder f files = map (second (f:)) files
When I was learning Haskell, it was hard to write such code in one go, but now it comes automatically, without intermediate steps.
Now we basically implement search by filtering all matching files, taking the first match and extracting the path. Oh, there is already a function in the standard library for that, lookup:
search fileToSearch l = lookup fileToSearch $ allPaths l
It took me quite a while to figure out how to compose lookup and allPaths. Fortunately the order in the tuple was chosen correctly by accident.
Note that this search returns Maybe [String], the list of folder components. You still need to convert the folder list into a path by inserting separators and appending the file name, e.g. using concatMap.
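Collecting the answer's pieces into one runnable sketch, with the final conversion step filled in. renderPath is a hypothetical helper that joins the folder components with "/" using concatMap, as suggested, and then appends the file name; search wraps the answer's lookup so that it has the String result type the question asks for.

```haskell
data FSObject = Folder String [FSObject] | File String

-- Every file paired with the folders leading to it, outermost first.
allPaths :: FSObject -> [(String, [String])]
allPaths (File file) = singleFile file
allPaths (Folder folder children) = concatMap (addFolder folder . allPaths) children

singleFile :: String -> [(String, [String])]
singleFile file = [(file, [])]

second :: (b -> c) -> (a, b) -> (a, c)
second f (a, x) = (a, f x)

addFolder :: String -> [(String, [String])] -> [(String, [String])]
addFolder f files = map (second (f :)) files

-- Hypothetical helper: "root", "src" and "x" become "root/src/x".
renderPath :: String -> [String] -> String
renderPath file dirs = concatMap (++ "/") dirs ++ file

search :: String -> FSObject -> Maybe String
search fileToSearch tree =
  fmap (renderPath fileToSearch) (lookup fileToSearch (allPaths tree))
```

Keeping the path as a list until the very end means the separator is decided in exactly one place.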
You should create a recursive function using an accumulation parameter to save the current path where you're searching. It would be something like this:
search :: String -> FSObject -> Maybe String
search str (File f) = if str == f then Just str else Nothing
search str (Folder dir xs) = if null hits then Nothing else Just hits
  where
    -- path accumulates the folders walked so far; matches are "|"-terminated
    hits = sear dir xs
    sear _ [] = ""
    sear path (File f : rest) =
      (if str == f then path ++ "/" ++ str ++ "|" else "") ++ sear path rest
    sear path (Folder d children : rest) =
      sear (path ++ "/" ++ d) children ++ sear path rest
I hope it's helpful for you.
I am writing a daemon that reads something from a small file, modifies it, and writes it back to the same file. I need to make sure that each file is closed promptly after reading before I try to write to it. I also need to make sure each file is closed promptly after writing, because I might occasionally read from it again right away.
I have looked into using binary-strict instead of binary, but it seems to provide only a strict Get, not a strict Put. The same issue applies to System.IO.Strict. And from reading the binary-strict documentation, I'm not sure it really solves my problem of ensuring that files are promptly closed. What's the best way to handle this? DeepSeq?
Here's a highly simplified example that will give you an idea of the structure of my application. This example terminates with
*** Exception: test.dat: openBinaryFile: resource busy (file is locked)
for obvious reasons.
import Data.Binary ( Binary, encode, decode )
import Data.ByteString.Lazy as B ( readFile, writeFile )
import Codec.Compression.GZip ( compress, decompress )

encodeAndCompressFile :: Binary a => FilePath -> a -> IO ()
encodeAndCompressFile f = B.writeFile f . compress . encode

decodeAndDecompressFile :: Binary a => FilePath -> IO a
decodeAndDecompressFile f = return . decode . decompress =<< B.readFile f

main = do
  let i = 0 :: Int
  encodeAndCompressFile "test.dat" i
  doStuff

doStuff = do
  i <- decodeAndDecompressFile "test.dat" :: IO Int
  print i
  encodeAndCompressFile "test.dat" (i+1)
  doStuff
All 'puts' or 'writes' to files are strict. The act of writeFile demands all Haskell data be evaluated in order to put it on disk.
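So what you need to concentrate on is the lazy reading of the input. In your example you lazily read the file and then lazily decode it. Nothing in that pipeline forces the lazy read all the way to end of file, so the handle stays open and the following writeFile finds the file locked.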
Instead, try reading the file strictly (e.g. with strict bytestrings), and you'll be fine.
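A minimal sketch of the strict-read fix, assuming the zlib compression layer is left out so that the example only needs the binary and bytestring packages (compress and decompress would slot in around encode and decode as before). writeValue and readValueStrict are hypothetical names, not functions from binary. Data.ByteString.readFile reads the entire file and closes it before returning, so the following write to the same path is no longer blocked.

```haskell
import Data.Binary (Binary, decode, encode)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- Writing is already strict: the file is fully written and closed
-- before BL.writeFile returns.
writeValue :: Binary a => FilePath -> a -> IO ()
writeValue f = BL.writeFile f . encode

-- Read strictly (file closed on return), then decode from a lazy view.
readValueStrict :: Binary a => FilePath -> IO a
readValueStrict f = do
  bytes <- BS.readFile f
  return $! decode (BL.fromStrict bytes)
```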
Consider using a package such as conduit, pipes, iteratee or enumerator. They provide many of the benefits of lazy IO (simpler code, potentially smaller memory footprint) without the lazy IO. Here's an example using conduit and cereal (written against the older conduit API with $$ and =$):
import Data.Conduit
import Data.Conduit.Binary (sinkFile, sourceFile)
import Data.Conduit.Cereal (sinkGet, sourcePut)
import Data.Conduit.Zlib (gzip, ungzip)
import Data.Serialize (Serialize, get, put)

encodeAndCompressFile :: Serialize a => FilePath -> a -> IO ()
encodeAndCompressFile f v =
  runResourceT $ sourcePut (put v) $$ gzip =$ sinkFile f

decodeAndDecompressFile :: Serialize a => FilePath -> IO a
decodeAndDecompressFile f = do
  val <- runResourceT $ sourceFile f $$ ungzip =$ sinkGet get
  case val of
    Right v -> return v
    Left err -> fail err

main = do
  let i = 0 :: Int
  encodeAndCompressFile "test.dat" i
  doStuff

doStuff = do
  i <- decodeAndDecompressFile "test.dat" :: IO Int
  print i
  encodeAndCompressFile "test.dat" (i+1)
  doStuff
An alternative to using conduits et al. would be to just use System.IO, which will allow you to control explicitly when files are closed with respect to the IO execution order.
You can use openBinaryFile followed by normal reading operations (probably the ones from Data.ByteString) and hClose when you're done with it, or withBinaryFile, which closes the file automatically (but beware this sort of problem).
Whatever method you use, as Don said, you probably want to read a strict bytestring and then convert it to a lazy one afterwards with fromChunks (or fromStrict in newer versions of bytestring).
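A sketch of the System.IO route using withBinaryFile: the strict Data.ByteString.hGetContents reads to end of file (and closes the handle itself) before the block returns, so nothing lazy escapes it, and fromChunks then wraps the strict chunk as a lazy ByteString for decode. readStrictThenLazy is a hypothetical name.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import System.IO

-- Read the whole file strictly inside withBinaryFile, then present it
-- as a lazy ByteString (a single chunk) for lazy-ByteString consumers.
readStrictThenLazy :: FilePath -> IO BL.ByteString
readStrictThenLazy path = withBinaryFile path ReadMode $ \h -> do
  chunk <- BS.hGetContents h        -- strict: reads to EOF
  return (BL.fromChunks [chunk])
```

Since the data is fully in memory once the block returns, the same path can be rewritten immediately afterwards.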