Open a file in Haskell, passing the filepath in a C FFI call (CString)

I want to open a file in Haskell, but I want the top level function to be called from C (I want to pass the filepath from C).
I'm having trouble getting the filepath CString into a type that I can use readFile on.
Here's my first attempt, adapting the example from the docs:
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types
import Foreign.C (CString, peekCString)
openFileDoStuff :: String -> IO Bool
openFileDoStuff filename = do
    lines <- (fmap lines . readFile) filename
    print lines
    -- do stuff with lines
    return True
openFilepathHs :: CString -> IO Bool
openFilepathHs cstr = openFileDoStuff (peekCString cstr)
foreign export ccall openFilepathHs :: CString -> IO Bool
I get a compiler error passing (peekCString cstr) to openFileDoStuff:
• Couldn't match type: IO String
with: [Char]
If I change the signature of my function to openFileDoStuff :: IO String -> IO Bool, I then can't use the filename parameter in the readFile call:
• Couldn't match type: IO String
with: [Char]
If it's not abundantly clear, I am a newbie in Haskell. I know there's no way to convert IO String -> String, but there must be a way to actually use the CString type.

Use >>= to combine IO actions.
openFilepathHs cstr = peekCString cstr >>= openFileDoStuff
Actually, this pattern of threading a value through successive IO actions is so common that there is a standard combinator for it, >=> (Kleisli composition, exported by Control.Monad):
openFilepathHs = peekCString >=> openFileDoStuff
You can also use do notation to hide the calls to >>=, although as a beginner I personally found do notation hard to follow until I understood how to write the >>= calls myself.
openFilepathHs cstr = do
    cstrContents <- peekCString cstr
    openFileDoStuff cstrContents
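For reference, here is a minimal sketch of the whole corrected module, keeping the original String -> IO Bool design (the shadowed lines binding is renamed to fileLines only for clarity; otherwise this mirrors the question's code):

{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C (CString, peekCString)

openFileDoStuff :: String -> IO Bool
openFileDoStuff filename = do
    fileLines <- lines <$> readFile filename
    print fileLines
    -- do stuff with fileLines
    return True

openFilepathHs :: CString -> IO Bool
openFilepathHs cstr = peekCString cstr >>= openFileDoStuff

foreign export ccall openFilepathHs :: CString -> IO Bool

peekCString copies the NUL-terminated bytes into a Haskell String, so nothing on the Haskell side keeps a reference to the C buffer afterwards.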

I needed to run the IO String and bind it to a variable:
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types
import Foreign.C (CString, peekCString)
openFileDoStuff :: IO String -> IO Bool
openFileDoStuff filename = do
    filenameString <- filename
    lines <- (fmap lines . readFile) filenameString
    print lines
    -- do stuff with lines
    return True
openFilepathHs :: CString -> IO Bool
openFilepathHs cstr = openFileDoStuff (peekCString cstr)
foreign export ccall openFilepathHs :: CString -> IO Bool

Related

The Haskell way to do IO Loops (without explicit recursion)?

I want to read a list of strings separated by newlines from STDIN, stopping when an empty line is encountered, and I want an action of type IO [String]. Here is how I would do it with recursion:
myReadList :: IO [String]
myReadList = go []
  where
    go :: [String] -> IO [String]
    go l = do
      inp <- getLine
      if inp == ""
        then return l
        else go (inp:l)
However, this method of using go obscures readability and is a pattern so common that one would ideally want to abstract this out.
So, this was my attempt:
whileM :: (Monad m) => (a -> Bool) -> [m a] -> m [a]
whileM p [] = return []
whileM p (x:xs) = do
  s <- x
  if p s
    then do
      l <- whileM p xs
      return (s:l)
    else
      return []
myReadList :: IO [String]
myReadList = whileM (/= "") (repeat getLine)
I am guessing there is some default implementation of this whileM or something similar already. However I cannot find it.
Could someone point out what is the most natural and elegant way to deal with this problem?
unfoldWhileM (from Control.Monad.Loops in the monad-loops package) is the same as your whileM, except that it takes an action (not a list) as its second argument.
myReadList = unfoldWhileM (/= "") getLine
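Spelled out as a self-contained snippet (this assumes the monad-loops package is installed):

import Control.Monad.Loops (unfoldWhileM)

-- Keep reading lines until the first empty one; the empty line is not included.
myReadList :: IO [String]
myReadList = unfoldWhileM (/= "") getLine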
Yes, for abstracting out the explicit recursion there is the Control.Monad.Loops module (from the monad-loops library) mentioned in the previous answer, which is useful. For those who are interested, here is a nice tutorial on monad loops.
However, there is another way. When I first struggled with this, knowing that Haskell is lazy by default, I tried:
(sequence . repeat $ getLine) >>= return . takeWhile (/="q")
I expected the above to collect the entered lines into an IO [String]. It doesn't: it runs indefinitely, because IO actions are not lazy at all. At this point System.IO.Lazy comes in handy. It is a simple library with only two functions:
run :: T a -> IO a
interleave :: IO a -> T a
So run takes a lazy IO action and turns it into an ordinary IO action, and interleave does the opposite. Accordingly, if we rephrase the above function as:
import qualified System.IO.Lazy as LIO
gls = LIO.run (sequence . repeat $ LIO.interleave getLine) >>= return . takeWhile (/="q")
Prelude> gls >>= return . sum . fmap (read :: String -> Int)
1
2
3
4
q
10
A solution using the effectful streams of the streaming package:
import Streaming
import qualified Streaming.Prelude as S
main :: IO ()
main = do
  result <- S.toList_ . S.takeWhile (/="") . S.repeatM $ getLine
  print result
A solution that shows prompts, keeping them separated from the reading actions:
main :: IO ()
main = do
  result <- S.toList_
            $ S.zipWith (\_ s -> s)
                (S.repeatM $ putStrLn "Write something: ")
                (S.takeWhile (/="") . S.repeatM $ getLine)
  print result

Reading file with "US-ASCII" encoding in Haskell: hGetContents: invalid argument (invalid byte sequence)

I'm using Haskell to program a parser, but this error is a wall I can't get past. Here is my code:
main = do
  arguments <- getArgs
  let fileName = head arguments
  fileContents <- readFile fileName
  converter <- open "UTF-8" Nothing
  let titleLength = length fileName
      titleWithoutExtension = take (titleLength - 4) fileName
      allNonEmptyLines = unlines $ tail $ filter (/= "") $ lines fileContents
When I try to read a file with "US-ASCII" encoding I get the famous error hGetContents: invalid argument (invalid byte sequence). I've tried changing the "UTF-8" in my code to "US-ASCII", but the error persists. Is there a way to read these files, or some general way of handling encoding problems when reading files?
You should use hSetEncoding to configure the file handle with a specific text encoding, e.g.:
import System.Environment
import System.IO
main = do
  (path : _) <- getArgs
  h <- openFile path ReadMode
  hSetEncoding h latin1
  contents <- hGetContents h
  -- no need to close h
  putStrLn $ show $ length contents
If your file contains non-ASCII characters and it's not UTF-8 encoded, then latin1 is a good bet, although it's not the only possibility.
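If latin1 is not the right encoding, System.IO also provides mkTextEncoding, which looks an encoding up by name at run time. A minimal sketch (readWithEncoding is a made-up helper name here, and the set of accepted encoding names depends on the platform):

import System.IO

readWithEncoding :: String -> FilePath -> IO String
readWithEncoding encName path = do
  h <- openFile path ReadMode
  enc <- mkTextEncoding encName   -- e.g. "ISO-8859-1"
  hSetEncoding h enc
  hGetContents h                  -- lazy read; the handle is closed once the contents are fully consumed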

Convert all the elements in a file into an array in Haskell

I have a file which contains a set of 200,000+ words, and I want the program to read the data, store it in an array, and form a new array with all the 200,000+ words.
I wrote the code as
import System.IO
main = do
  handle <- openFile "words.txt" ReadMode
  contents <- hGetContents handle
  con <- lines contents
  putStrLn (show con)
  hClose handle
But it gives a type error at line 5 (the con <- lines contents line).
And the text file is of the form
ABRIDGMENT
ABRIDGMENTS
ABRIM
ABRIN
ABRINS
ABRIS
and so on
What changes do I need to make to the code so that it can form an array of words?
I solved it in Python (HTH):
def readFile():
    allWords = []
    for word in open("words.txt"):
        allWords.append(word.strip())
    return allWords
Maybe
readFile "words.txt" >>= return . words
with type
:: IO [String]
or you can write
getWordsFromFile :: String -> IO [String]
getWordsFromFile file = readFile file >>= return . words
and use as
main = do
  wordList <- getWordsFromFile "words.txt"
  putStrLn $ "File contains " ++ show (length wordList) ++ " words."
Very constructive comments from #sanityinc and #Sarah (thanks!):
#sanityinc: "Other options: fmap words $ readFile file or words <$> readFile file if you've imported <$> from Control.Applicative"
#Sarah: "To elaborate a bit, whenever you see foo >>= return . bar you can (and should) replace it with fmap bar foo because you're not actually using the extra powers that come with Monad and in most cases restricting yourself to a needlessly complex type is not beneficial. This will be even more true in the future where Applicative is a superclass of Monad"

Why does print statement change gzread behavior?

I'm trying to read a gzip file in Fortran using the C functions gzopen, gzread, and gzclose from the zlib library. My subroutine works properly when it contains a print statement, but gives a Z_STREAM_ERROR (-2) without it. What is causing this to happen, and how can I fix it?
module gzmodule
  use :: iso_c_binding
  implicit none
  private
  public fastunzip

  interface
    type(c_ptr) function gzopen(filename, mode) bind(c)
      use :: iso_c_binding
      character(kind=c_char), dimension(*) :: filename
      character(kind=c_char), dimension(*) :: mode
    end function gzopen
  end interface

  interface
    integer(c_int) function gzread(gzfile, buffer, length) bind(c)
      use :: iso_c_binding
      type(c_ptr), value :: gzfile
      character(len=1,kind=c_char) :: buffer(*)
      integer(c_int) :: length
    end function gzread
  end interface

  interface
    integer(c_int) function gzclose(gzfile) bind(c)
      use :: iso_c_binding
      type(c_ptr), value :: gzfile
    end function
  end interface

contains

  subroutine fastunzip(filename, isize, abuf, ierr)
    use :: iso_c_binding
    character(len=*,kind=c_char), intent(in) :: filename
    integer(c_int), intent(out) :: isize
    character(len=1,kind=c_char), intent(inout) :: abuf(:,:,:,:)
    integer(4), intent(out) :: ierr
    type(c_ptr) :: gzfile
    integer(c_int) :: iclose
    logical :: c_associated

    ierr = 1 !! indicates that an error has occurred
    isize = 0
    gzfile = gzopen(trim(filename)//c_null_char, "rb")
    if (.not.c_associated(gzfile)) return
    isize = gzread(gzfile, abuf, size(abuf))
    print*, isize !! why do I need this for it to work?
    if (isize.ne.size(abuf)) return
    iclose = gzclose(gzfile)
    if (iclose.ne.0) return
    ierr = 0 !! success
  end subroutine fastunzip
end module gzmodule
program main
  use gzmodule
  implicit none
  character(100) :: filename = './f10_19950120v7.gz'
  integer(4) :: isize
  integer(4) :: ierr
  logical(4) :: exists
  integer(4), parameter :: nlon = 1440
  integer(4), parameter :: nlat = 720
  integer(4), parameter :: nvar = 5
  integer(4), parameter :: nasc = 2
  character(1) :: abuf(nlon,nlat,nvar,nasc)

  inquire(file=filename, exist=exists)
  if (.not.exists) stop 'file not found'
  call fastunzip(filename, isize, abuf, ierr)
  print*, 'return value of isize ', isize
  if (ierr.ne.0) stop 'error in fastunzip'
  print*, 'done'
end program main
I'm on CentOS and compiling with:
gfortran -o example_usage.exe example_usage.f90 /lib64/libz.so.1
and the data file is available at this site.
In subroutine fastunzip you declare logical :: c_associated. However, you get this function by use association (of iso_c_binding), so you should remove that line.
My installed gfortran (4.8) marks that as an error, so I guess you have an older version? But once I remove that line your code appears to work even without the print, so perhaps that is worth trying for you.
On a style note, I'd recommend use, intrinsic :: iso_c_binding, perhaps even with only (which would also flag to you that the c_associated is through use association).

Ensuring files are closed promptly

I am writing a daemon that reads something from a small file, modifies it, and writes it back to the same file. I need to make sure that each file is closed promptly after reading before I try to write to it. I also need to make sure each file is closed promptly after writing, because I might occasionally read from it again right away.
I have looked into using binary-strict instead of binary, but it seems that only provides a strict Get, not a strict Put. Same issue with System.IO.Strict. And from reading the binary-strict documentation, I'm not sure it really solves my problem of ensuring that files are promptly closed. What's the best way to handle this? DeepSeq?
Here's a highly simplified example that will give you an idea of the structure of my application. This example terminates with
*** Exception: test.dat: openBinaryFile: resource busy (file is locked)
for obvious reasons.
import Data.Binary ( Binary, encode, decode )
import Data.ByteString.Lazy as B ( readFile, writeFile )
import Codec.Compression.GZip ( compress, decompress )
encodeAndCompressFile :: Binary a => FilePath -> a -> IO ()
encodeAndCompressFile f = B.writeFile f . compress . encode
decodeAndDecompressFile :: Binary a => FilePath -> IO a
decodeAndDecompressFile f = return . decode . decompress =<< B.readFile f
main = do
  let i = 0 :: Int
  encodeAndCompressFile "test.dat" i
  doStuff

doStuff = do
  i <- decodeAndDecompressFile "test.dat" :: IO Int
  print i
  encodeAndCompressFile "test.dat" (i+1)
  doStuff
All 'puts' or 'writes' to files are strict. The act of writeFile demands all Haskell data be evaluated in order to put it on disk.
So what you need to concentrate on is the lazy reading of the input. In your example above you both lazily read the file, then lazily decode it.
Instead, try reading the file strictly (e.g. with strict bytestrings), and you'll be fine.
Consider using a package such as conduit, pipes, iteratee or enumerator. They provide much of the benefits of lazy IO (simpler code, potentially smaller memory footprint) without the lazy IO. Here's an example using conduit and cereal:
import Data.Conduit
import Data.Conduit.Binary (sinkFile, sourceFile)
import Data.Conduit.Cereal (sinkGet, sourcePut)
import Data.Conduit.Zlib (gzip, ungzip)
import Data.Serialize (Serialize, get, put)
encodeAndCompressFile :: Serialize a => FilePath -> a -> IO ()
encodeAndCompressFile f v =
  runResourceT $ sourcePut (put v) $$ gzip =$ sinkFile f

decodeAndDecompressFile :: Serialize a => FilePath -> IO a
decodeAndDecompressFile f = do
  val <- runResourceT $ sourceFile f $$ ungzip =$ sinkGet get
  case val of
    Right v -> return v
    Left err -> fail err
main = do
  let i = 0 :: Int
  encodeAndCompressFile "test.dat" i
  doStuff

doStuff = do
  i <- decodeAndDecompressFile "test.dat" :: IO Int
  print i
  encodeAndCompressFile "test.dat" (i+1)
  doStuff
An alternative to using conduits et al. would be to just use System.IO, which will allow you to control explicitly when files are closed with respect to the IO execution order.
You can use openBinaryFile followed by normal reading operations (probably the ones from Data.ByteString) and hClose when you're done with it, or withBinaryFile, which closes the file automatically (but beware this sort of problem).
Whatever method you use, as Don said, you probably want to read the file as a strict bytestring and then convert the strict bytestring to a lazy one afterwards with fromChunks.
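For illustration, here is a minimal sketch of the question's decodeAndDecompressFile rewritten along those lines, using the same binary and zlib packages as the question (the strict read happens entirely inside withBinaryFile, so the handle is closed as soon as it returns):

import Data.Binary (Binary, decode)
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L (fromChunks)
import Codec.Compression.GZip (decompress)
import System.IO (withBinaryFile, IOMode(ReadMode))

decodeAndDecompressFile :: Binary a => FilePath -> IO a
decodeAndDecompressFile f =
  withBinaryFile f ReadMode $ \h -> do
    chunk <- S.hGetContents h   -- strict read: the whole file is in memory before the handle closes
    return (decode (decompress (L.fromChunks [chunk])))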
