Ensuring files are closed promptly

I am writing a daemon that reads something from a small file, modifies it, and writes it back to the same file. I need to make sure that each file is closed promptly after reading before I try to write to it. I also need to make sure each file is closed promptly after writing, because I might occasionally read from it again right away.
I have looked into using binary-strict instead of binary, but it seems that only provides a strict Get, not a strict Put. Same issue with System.IO.Strict. And from reading the binary-strict documentation, I'm not sure it really solves my problem of ensuring that files are promptly closed. What's the best way to handle this? DeepSeq?
Here's a highly simplified example that will give you an idea of the structure of my application. This example terminates with
*** Exception: test.dat: openBinaryFile: resource busy (file is locked)
for obvious reasons.
import Data.Binary ( Binary, encode, decode )
import Data.ByteString.Lazy as B ( readFile, writeFile )
import Codec.Compression.GZip ( compress, decompress )

encodeAndCompressFile :: Binary a => FilePath -> a -> IO ()
encodeAndCompressFile f = B.writeFile f . compress . encode

decodeAndDecompressFile :: Binary a => FilePath -> IO a
decodeAndDecompressFile f = return . decode . decompress =<< B.readFile f

main = do
  let i = 0 :: Int
  encodeAndCompressFile "test.dat" i
  doStuff

doStuff = do
  i <- decodeAndDecompressFile "test.dat" :: IO Int
  print i
  encodeAndCompressFile "test.dat" (i+1)
  doStuff

All 'puts' or 'writes' to files are strict. The act of writeFile demands all Haskell data be evaluated in order to put it on disk.
So what you need to concentrate on is the lazy reading of the input. In your example above you lazily read the file and then lazily decode it, so the read handle is still open when you try to write.
Instead, try reading the file strictly (e.g. with strict bytestrings), and you'll be fine.
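A minimal sketch of that advice, assuming the binary and bytestring packages that ship with GHC. The helper names writeValue, readValue, and bump are made up for illustration, and the gzip step from the question is left out to keep the sketch dependency-light; decompress would slot in after BL.fromStrict.

```haskell
import Data.Binary (Binary, decode, encode)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- Writing forces the whole encoded value and closes the file afterwards.
writeValue :: Binary a => FilePath -> a -> IO ()
writeValue f = BL.writeFile f . encode

-- The strict readFile loads the whole file and closes it before returning;
-- only then are the bytes converted to a lazy ByteString for decode.
readValue :: Binary a => FilePath -> IO a
readValue f = decode . BL.fromStrict <$> BS.readFile f

-- One read-modify-write cycle on the same file (hypothetical helper).
bump :: FilePath -> IO ()
bump path = do
  i <- readValue path :: IO Int
  writeValue path (i + 1)

main :: IO ()
main = do
  let path = "test.dat"
  writeValue path (0 :: Int)
  mapM_ (\_ -> bump path) [1 .. 3 :: Int]
  final <- readValue path :: IO Int
  print final  -- prints 3
```

Because the bytes are already in memory when decode runs, it does not matter how lazily the decoded value is consumed.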

Consider using a package such as conduit, pipes, iteratee or enumerator. They provide many of the benefits of lazy IO (simpler code, potentially smaller memory footprint) without lazy IO itself. Here's an example using conduit and cereal:
import Data.Conduit
import Data.Conduit.Binary (sinkFile, sourceFile)
import Data.Conduit.Cereal (sinkGet, sourcePut)
import Data.Conduit.Zlib (gzip, ungzip)
import Data.Serialize (Serialize, get, put)

encodeAndCompressFile :: Serialize a => FilePath -> a -> IO ()
encodeAndCompressFile f v =
  runResourceT $ sourcePut (put v) $$ gzip =$ sinkFile f

decodeAndDecompressFile :: Serialize a => FilePath -> IO a
decodeAndDecompressFile f = do
  val <- runResourceT $ sourceFile f $$ ungzip =$ sinkGet get
  case val of
    Right v  -> return v
    Left err -> fail err

main = do
  let i = 0 :: Int
  encodeAndCompressFile "test.dat" i
  doStuff

doStuff = do
  i <- decodeAndDecompressFile "test.dat" :: IO Int
  print i
  encodeAndCompressFile "test.dat" (i+1)
  doStuff

An alternative to using conduits et al. would be to just use System.IO, which will allow you to control explicitly when files are closed with respect to the IO execution order.
You can use openBinaryFile followed by normal reading operations (probably the ones from Data.ByteString) and hClose when you're done with it, or withBinaryFile, which closes the file automatically (but beware: a lazy read started inside withBinaryFile can be cut off when the handle is closed on exit).
Whatever the method you use, as Don said, you probably want to read as a strict bytestring and then convert the strict to lazy afterwards with fromChunks.
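Both variants can be sketched with strict bytestrings as follows. The names readStrict and writeStrict and the file name are hypothetical; in real code, bracket (or withBinaryFile for the write as well) would be more robust if an exception occurs between opening and closing.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import System.IO

-- Read the whole file strictly; the handle is closed by the time
-- withBinaryFile returns.
readStrict :: FilePath -> IO BS.ByteString
readStrict path = withBinaryFile path ReadMode BS.hGetContents

-- Open, write, and close explicitly, so the close happens at a known
-- point in the IO sequence.
writeStrict :: FilePath -> BS.ByteString -> IO ()
writeStrict path bytes = do
  h <- openBinaryFile path WriteMode
  BS.hPut h bytes
  hClose h

main :: IO ()
main = do
  let path = "handle-demo.dat"  -- hypothetical file name
  writeStrict path (BC.pack "first")
  a <- readStrict path
  -- The read handle is already closed, so reopening for writing succeeds.
  writeStrict path (BC.pack "second")
  b <- readStrict path
  BC.putStrLn a
  BC.putStrLn b
```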

Related

Why does a File need to be mutable to call Read::read_to_string?

Here's a line from the 2nd edition Rust tutorial:
let mut f = File::open(filename).expect("file not found");
My assumption is that the file descriptor is a wrapper around a number that basically doesn't change and is read-only.
The compiler complains that the file cannot be borrowed mutably, and I'm assuming it's because the method read_to_string takes the instance as the self argument as mutable, but the question is "why"? What is ever going to change about the file descriptor? Is it keeping track of the cursor location or something?
error[E0596]: cannot borrow immutable local variable `fdesc` as mutable
  --> main.rs:13:5
   |
11 |     let fdesc = File::open(fname).expect("file not found");
   |         ----- consider changing this to `mut fdesc`
12 |     let mut fdata = String::new();
13 |     fdesc.read_to_string(&mut fdata)
   |     ^^^^^ cannot borrow mutably
The whole source:
use std::env;
use std::fs::File;
use std::io::prelude::*;

fn main() {
    let args: Vec<String> = env::args().collect();
    let query = &args[1];
    let fname = &args[2];
    println!("Searching for '{}' in file '{}'...", query, fname);

    let fdesc = File::open(fname).expect("file not found"); //not mut
    let mut fdata = String::new();
    fdesc.read_to_string(&mut fdata)
        .expect("something went wrong reading the file");
    println!("Found: \n{}", fdata);
}
"I'm assuming it's because the method read_to_string takes the instance as the self argument as mutable"
Yes, that's correct:
fn read_to_string(&mut self, buf: &mut String) -> Result<usize>
The trait method Read::read_to_string takes the receiver as a mutable reference because in general, that's what is needed to implement "reading" from something. You are going to change a buffer or an offset or something.
Yes, an actual File may simply contain an underlying file descriptor (e.g. on Linux or macOS) or a handle (e.g. Windows). In these cases, the operating system deals with synchronizing the access across threads. That's not even guaranteed though — it depends on the platform. Something like Redox might actually have a mutable reference in its implementation of File.
If the Read trait didn't accept a &mut self, then types like BufReader would have to use things like internal mutability, reducing the usefulness of Rust's references.
See also:
Why is it possible to implement Read on an immutable reference to File?

The Haskell way to do IO Loops (without explicit recursion)?

I want to read a list of strings separated by newlines from STDIN until an empty line is encountered, and I want the result as an action of type IO [String]. Here is how I would do it with recursion:
myReadList :: IO [String]
myReadList = go []
  where
    go :: [String] -> IO [String]
    go l = do {
      inp <- getLine;
      if (inp == "") then
        return l; -- note: the lines have been accumulated in reverse order
      else go (inp:l);
    }
However, this method of using go obscures readability and is a pattern so common that one would ideally want to abstract this out.
So, this was my attempt:
whileM :: (Monad m) => (a -> Bool) -> [m a] -> m [a]
whileM p [] = return []
whileM p (x:xs) = do
  s <- x
  if p s
    then do
      l <- whileM p xs
      return (s:l)
    else
      return []

myReadList :: IO [String]
myReadList = whileM (/= "") (repeat getLine)
I am guessing there is some default implementation of this whileM or something similar already. However I cannot find it.
Could someone point out what is the most natural and elegant way to deal with this problem?
unfoldWhileM (from Control.Monad.Loops in the monad-loops package) is the same as your whileM, except that it takes an action (not a list) as its second argument.
myReadList = unfoldWhileM (/= "") getLine
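To show the shape of such a combinator without installing anything, here is a sketch of a local stand-in with the same type as unfoldWhileM. The name collectWhileM and the helper pop are made up for illustration, and the input is scripted through an IORef instead of read from stdin so the result is reproducible.

```haskell
import Data.IORef

-- Run the action repeatedly, collecting results while the predicate holds.
-- The first result that fails the predicate is discarded and ends the loop.
collectWhileM :: Monad m => (a -> Bool) -> m a -> m [a]
collectWhileM p act = go
  where
    go = do
      x <- act
      if p x
        then (x :) <$> go
        else return []

-- Take the next scripted line; an exhausted script yields the empty string.
pop :: [String] -> ([String], String)
pop []       = ([], "")
pop (y : ys) = (ys, y)

main :: IO ()
main = do
  script <- newIORef ["alpha", "beta", "", "gamma"]
  let nextLine = atomicModifyIORef' script pop  -- stands in for getLine
  result <- collectWhileM (/= "") nextLine
  print result  -- prints ["alpha","beta"]
```

With getLine in place of nextLine this reads from stdin until the first empty line, returning the lines in input order.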
Yes, for abstracting out the explicit recursion, as mentioned in the previous answer, there is the Control.Monad.Loops module, which is useful. For those who are interested, here is a nice tutorial on Monad Loops.
However, there is another way. Previously, struggling with this job and knowing that Haskell is lazy by default, I first tried:
(sequence . repeat $ getLine) >>= return . takeWhile (/="q")
I expected the above to collect the entered lines into an IO [String]. Nah... It runs indefinitely, because IO actions are not lazy at all: sequence has to run every getLine before takeWhile sees anything. At this point System.IO.Lazy might come in handy too. It is a simple library with only two functions:
run :: T a -> IO a
interleave :: IO a -> T a
So run takes a lazy IO action and turns it into an IO action, and interleave does the opposite. Accordingly, if we rephrase the above function as:
import qualified System.IO.Lazy as LIO
gls = LIO.run (sequence . repeat $ LIO.interleave getLine) >>= return . takeWhile (/="q")
Prelude> gls >>= return . sum . fmap (read :: String -> Int)
1
2
3
4
q
10
A solution using the effectful streams of the streaming package:
import Streaming
import qualified Streaming.Prelude as S

main :: IO ()
main = do
  result <- S.toList_ . S.takeWhile (/="") . S.repeatM $ getLine
  print result
A solution that shows prompts, keeping them separated from the reading actions:
main :: IO ()
main = do
  result <- S.toList_
          $ S.zipWith (\_ s -> s)
              (S.repeatM $ putStrLn "Write something: ")
              (S.takeWhile (/="") . S.repeatM $ getLine)
  print result

Reading file with "US-ASCII" encoding in Haskell: hGetContents: invalid argument (invalid byte sequence)

I'm using Haskell for programming a parser, but this error is a wall I can't pass. Here is my code:
main = do
  arguments <- getArgs
  let fileName = head arguments
  fileContents <- readFile fileName
  converter <- open "UTF-8" Nothing
  let titleLength = length fileName
      titleWithoutExtension = take (titleLength - 4) fileName
      allNonEmptyLines = unlines $ tail $ filter (/= "") $ lines fileContents
When I try to read a file with "US-ASCII" encoding I get the famous error hGetContents: invalid argument (invalid byte sequence). I've tried replacing "UTF-8" in my code with "US-ASCII", but the error persists. Is there a way to read these files, or a general way to handle this kind of file encoding problem?
You should use hSetEncoding to configure the file handle for a specific text encoding, e.g.:
import System.Environment
import System.IO

main = do
  (path : _) <- getArgs
  h <- openFile path ReadMode
  hSetEncoding h latin1
  contents <- hGetContents h
  -- no need to close h: hGetContents closes it once the contents are fully read
  print (length contents)
If your file contains non-ASCII characters and it's not UTF-8 encoded, then latin1 is a good bet, although it's not the only possibility.
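A self-contained sketch of the same idea that creates its own test input: it writes four raw bytes, the last of which (0xE9, "é" in Latin-1) is not valid UTF-8 on its own, and reads them back through a handle set to latin1. The function name readLatin1 and the file name are hypothetical.

```haskell
import qualified Data.ByteString as BS
import System.IO

-- Read a whole file as Latin-1 text. The string is forced before
-- withFile closes the handle, so no lazy read outlives the handle.
readLatin1 :: FilePath -> IO String
readLatin1 path = withFile path ReadMode $ \h -> do
  hSetEncoding h latin1
  contents <- hGetContents h
  length contents `seq` return contents

main :: IO ()
main = do
  let path = "latin1-sample.txt"  -- hypothetical file name
  -- "caf" followed by the single byte 0xE9
  BS.writeFile path (BS.pack [0x63, 0x61, 0x66, 0xE9])
  text <- readLatin1 path
  print (length text)  -- prints 4
  print text           -- prints "caf\233"
```

Latin-1 maps every byte to exactly one character, so decoding with it never fails, though the result is only meaningful if the file really uses that encoding.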

Convert all the elements in a file into a array in haskell

I have a file containing 200,000+ words, and I want the program to read the data and store all of the words in an array.
I wrote the code as
import System.IO
main = do
  handle <- openFile "words.txt" ReadMode
  contents <- hGetContents handle
  con <- lines contents
  putStrLn ( show con)
  hClose handle
But it gives a type error at line 5.
The text file is of the form
ABRIDGMENT
ABRIDGMENTS
ABRIM
ABRIN
ABRINS
ABRIS
and so on
What changes does the code need so that it forms an array of words?
I solved it in python (HTH)
def readFile():
    allWords = []
    for word in open("words.txt"):
        allWords.append(word.strip())
    return allWords
Maybe
readFile "words.txt" >>= return . words
with type
:: IO [String]
or you can write
getWordsFromFile :: String -> IO [String]
getWordsFromFile file = readFile file >>= return . words
and use as
main = do
  wordList <- getWordsFromFile "words.txt"
  putStrLn $ "File contains " ++ show (length wordList) ++ " words."
Very constructive comments from @sanityinc and @Sarah (thanks!):
@sanityinc: "Other options: fmap words $ readFile file or words <$> readFile file if you've imported <$> from Control.Applicative"
@Sarah: "To elaborate a bit, whenever you see foo >>= return . bar you can (and should) replace it with fmap bar foo because you're not actually using the extra powers that come with Monad and in most cases restricting yourself to a needlessly complex type is not beneficial. This will be even more true in the future where Applicative is a superclass of Monad"
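For completeness, a sketch of why the original program fails at line 5: lines is a pure function, so its result is bound with let rather than with <-, which expects an IO action. The sample file name and the helper loadWords are made up for illustration; the program writes a small sample file so that it can run on its own.

```haskell
-- fmap version, as suggested in the comments above
loadWords :: FilePath -> IO [String]
loadWords path = lines <$> readFile path

main :: IO ()
main = do
  let path = "words-sample.txt"  -- hypothetical file name
  writeFile path (unlines ["ABRIDGMENT", "ABRIDGMENTS", "ABRIM"])
  contents <- readFile path   -- IO action: bind with <-
  let con = lines contents    -- pure function: bind with let
  print con                   -- prints ["ABRIDGMENT","ABRIDGMENTS","ABRIM"]
  viaFmap <- loadWords path
  print (length viaFmap)      -- prints 3
```

Using lines instead of words keeps one entry per line, which matters only if a line can contain spaces.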

Ways to validate converted code from FORTRAN to C

I have converted 90+ Fortran files into C files using a tool, and I need to validate whether the conversion is correct.
Can you give me some ideas on how best to ensure that the functionality has been preserved through the translation?
You need verification tests that exercise those Fortran functions. Then you run those tests against the C code.
You can use unit test technology/methodology. In fact I can't see how else you would prove that the conversion is correct.
In many unit test methodologies you would write the tests in the same language as the code, but in this case I very strongly recommend picking one language and one code base to exercise both sets of functions. Also, don't worry about trying to create pure unit tests; rather, use the techniques to give you coverage of all the uses that the Fortran code was supposed to handle.
Use unit tests.
First write your unit tests on the Fortran code and check whether they all run correctly, then rewrite them in C and run those.
The problem with this approach is that you also need to rewrite your unit tests, which you normally don't do when refactoring code (except for API changes). This means that you might end up debugging your ported unit testing code as well, besides the actual code.
Therefore, it might be better to write testing code that contains minimal logic and only writes the results of the functions to a file. Then you can rewrite this minimal testing code in C, generate the same files and compare the files.
Here is what I did for a "similar" task (comparing Fortran 90 to Fortran 90 + OpenACC GPU-accelerated code):
1. Analyze what the output of each Fortran module is.
2. Write these output arrays to .dat files.
3. Copy the .dat files into a reference folder.
4. Write the output of the converted modules to files (either CSV or binary). Use the same filenames for convenience.
5. Write a Python script that compares the two versions.
I used convenience functions like these in Fortran (analogous for the 1D and 2D cases):
subroutine write3DToFile(path, array, n1, n2, n3)
  use pp_vardef
  use pp_service, only: find_new_mt
  implicit none
  ! input arguments (dimensions are declared before the array that uses them)
  integer(4), intent(in) :: n1
  integer(4), intent(in) :: n2
  integer(4), intent(in) :: n3
  real(kind = r_size), intent(in) :: array(n1,n2,n3)
  character(len=*), intent(in) :: path
  ! temporary
  integer(4) :: imt
  ! find_new_mt presumably returns a free unit number
  call find_new_mt(imt)
  open(imt, file = path, form = 'unformatted', status = 'replace')
  write(imt) array
  close(imt)
end subroutine write3DToFile
In Python I used the following script for reading binary Fortran data and comparing it. Note: Since you want to convert to C, you would have to adapt it so that it can read the data produced by C instead of Fortran.
from optparse import OptionParser
import struct
import sys
import math


def unpackNextRecord(file, readEndianFormat, numOfBytesPerValue):
    # Each record is expected as: 4-byte length header, data, 4-byte length trailer.
    header = file.read(4)
    if len(header) != 4:
        # we have reached the end of the file
        return None
    headerFormat = '%si' % readEndianFormat
    headerUnpacked = struct.unpack(headerFormat, header)
    recordByteLength = headerUnpacked[0]
    if recordByteLength % numOfBytesPerValue != 0:
        raise Exception("Odd record length.")
    recordLength = recordByteLength // numOfBytesPerValue
    data = file.read(recordByteLength)
    if len(data) != recordByteLength:
        raise Exception("Could not read %i bytes as expected. Only %i bytes read." % (recordByteLength, len(data)))
    trailer = file.read(4)
    if len(trailer) != 4:
        raise Exception("Could not read trailer.")
    trailerUnpacked = struct.unpack(headerFormat, trailer)
    redundantRecordLength = trailerUnpacked[0]
    if recordByteLength != redundantRecordLength:
        raise Exception("Header and trailer do not match.")
    # typeSpecifier is the module-level variable set in the main section below
    dataFormat = '%s%i%s' % (readEndianFormat, recordLength, typeSpecifier)
    return struct.unpack(dataFormat, data)


def rootMeanSquareDeviation(tup, tupRef):
    # Note: this returns the square root of the summed squared differences;
    # it does not divide by the number of values.
    err = 0.0
    i = 0
    for val in tup:
        err = err + (val - tupRef[i])**2
        i = i + 1
    return math.sqrt(err)


##################### MAIN ##############################
# get all program arguments
parser = OptionParser()
parser.add_option("-f", "--file", dest="inFile",
                  help="read from FILE", metavar="FILE", default="in.dat")
parser.add_option("--reference", dest="refFile",
                  help="reference FILE", metavar="FILE", default="ref.dat")
parser.add_option("-b", "--bytesPerValue", dest="bytes", default="4")
parser.add_option("-r", "--readEndian", dest="readEndian", default="big")
parser.add_option("-v", action="store_true", dest="verbose")
(options, args) = parser.parse_args()

numOfBytesPerValue = int(options.bytes)
if numOfBytesPerValue != 4 and numOfBytesPerValue != 8:
    print("Unsupported number of bytes per value specified.")
    sys.exit()
typeSpecifier = 'f'
if numOfBytesPerValue == 8:
    typeSpecifier = 'd'
readEndianFormat = '>'
if options.readEndian == "little":
    readEndianFormat = '<'

inFile = None
refFile = None
try:
    # prepare files (binary mode, since the records are raw bytes)
    inFile = open(str(options.inFile), 'rb')
    refFile = open(str(options.refFile), 'rb')
    i = 0
    while True:
        passedStr = "pass"
        i = i + 1
        unpackedRef = None
        try:
            unpackedRef = unpackNextRecord(refFile, readEndianFormat, numOfBytesPerValue)
        except Exception as e:
            print("Error reading record %i from %s: %s" % (i, str(options.refFile), e))
            sys.exit()
        if unpackedRef is None:
            break
        unpacked = None
        try:
            unpacked = unpackNextRecord(inFile, readEndianFormat, numOfBytesPerValue)
        except Exception as e:
            print("Error reading record %i from %s: %s" % (i, str(options.inFile), e))
            sys.exit()
        if unpacked is None:
            print("Error in %s: Record %i expected, but none could be loaded" % (str(options.inFile), i))
            sys.exit()
        if len(unpacked) != len(unpackedRef):
            print("Error in %s: Record %i does not have same length as reference" % (str(options.inFile), i))
            sys.exit()
        # analyse unpacked data
        err = rootMeanSquareDeviation(unpacked, unpackedRef)
        if abs(err) > 1E-08:
            passedStr = "FAIL <-------"
        print("%s, record %i: Mean square error: %e; %s" % (options.inFile, i, err, passedStr))
        if options.verbose:
            print(unpacked)
except Exception as e:
    print("Error: %s" % e)
finally:
    # cleanup
    if inFile is not None:
        inFile.close()
    if refFile is not None:
        refFile.close()
