My application (a C program) opens two file handles to the same file (one in write mode and one in read mode). Two separate threads in the app read from and write to the file. This works fine.
Since my app runs on an embedded device with a limited RAM disk size, I would like the write file handle to wrap to the beginning of the file on reaching the maximum size, and the read file handle to follow it, like a circular buffer. I understand from answers to this question that this should work. However, as soon as I fseek the write file handle to the beginning of the file, fread returns an error. Does the EOF indicator get reset by an fseek to the beginning of the file? If so, which function should be used to set the write file position to 0 without resetting EOF?
EDIT/UPDATE:
I tried a couple of things:
Based on @neodelphi's suggestion I used pipes, and this works. However, my use case requires writing to a file: I receive multiple channels of live video surveillance streams that need to be stored to the hard disk and also read back, decoded, and displayed on a monitor.
Thanks to @Clement's suggestion to use ftell, I fixed a couple of bugs in my code, and the wrap now works for the reader. However, the data read appears to be stale: writes are still buffered, so the reader reads old content from the hard disk. I can't avoid buffering for performance reasons (I receive 32 Mbps of live data that needs to be written to the hard disk). I have tried things like flushing writes only in the interval between the write wrapping and the read wrapping, and truncating the file (ftruncate) after the read wraps, but this doesn't solve the stale-data problem.
I am trying to use two files in ping-pong fashion to see if this solves the issue, but I want to know if there is a better solution.
You should have something like this:
// Write
if (ftell(WriteHandle) > BUFFER_MAX)
    rewind(WriteHandle);
fwrite(writeBuffer, 1, WRITE_CHUNK_SIZE, WriteHandle);
// Read (assuming binary)
readSize = fread(buffer, 1, READ_CHUNK_SIZE, ReadHandle);
if (readSize != READ_CHUNK_SIZE) {
    rewind(ReadHandle);
    if (fread(buffer + readSize, 1, READ_CHUNK_SIZE - readSize, ReadHandle)
            != READ_CHUNK_SIZE - readSize)
        ; // ERROR!
}
Not tested, but it gives the idea. Note that fwrite takes the stream as its last argument, not its first. The write side should also handle the case where BUFFER_MAX is not a multiple of WRITE_CHUNK_SIZE.
Also, you should only read data that you are sure has already been written. But I guess you already do that.
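Building on that last point about BUFFER_MAX not being a multiple of the chunk size, here is an untested sketch of how the write side might split a chunk at the wrap point. BUFFER_MAX, WRITE_CHUNK_SIZE, and ring_write are illustrative names, not part of the original code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BUFFER_MAX 1000      /* hypothetical ring size in bytes */
#define WRITE_CHUNK_SIZE 64  /* hypothetical chunk size */

/* Write one chunk, splitting it at the wrap point when it would
   run past BUFFER_MAX. Returns the number of bytes written. */
static size_t ring_write(FILE *wh, const char *chunk, size_t n)
{
    long pos = ftell(wh);
    size_t written = 0;

    if (pos + (long)n > BUFFER_MAX) {
        size_t first = (size_t)(BUFFER_MAX - pos); /* bytes until the wrap */
        written += fwrite(chunk, 1, first, wh);
        rewind(wh);                                /* wrap to the start */
        written += fwrite(chunk + first, 1, n - first, wh);
    } else {
        written += fwrite(chunk, 1, n, wh);
        if (ftell(wh) >= BUFFER_MAX)
            rewind(wh);
    }
    return written;
}
```

With these example sizes, after 20 chunks of 64 bytes the write position lands at 1280 % 1000 = 280, i.e. the writer has wrapped once mid-chunk.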
You could mmap the file into your virtual memory and then just maintain a normal circular buffer through the returned pointer.
int fd = open(path, O_RDWR);
ftruncate(fd, max_size); /* the file must be at least max_size bytes to map */
volatile void *mem = mmap(NULL, max_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
volatile char *c_mem = (volatile char *)mem;
c_mem[index % max_size] = 'a'; /* writes to offset index % max_size in the file */
You can probably also be stricter with the permissions, depending on the exact case.
I'm trying to parse some code which works with O_DIRECT files.
ssize_t written = write(fd, buf, size);
What is confusing is that size can be lower than the sector size of the disk, thus does write(fd,buf,size) write the entirety of buf to fd or only the first size bytes of buf to disk?
Without O_DIRECT this is simply the second case, but I can't find any documentation about the O_DIRECT case. From what I've read, it will still send buf to the disk, so the only thing I can think of is that it also tells the disk to write only size bytes...
[...] does write(fd,buf,size) write the entirety of buf to fd or only the first size bytes of buf to disk?
If the write() call is successful, it means all of the requested size bytes have been written, but the question becomes: written to where? You have to remember that opening a file with O_DIRECT sends more of a hint that you want to bypass OS caches than an order. The filesystem could choose to simply route your I/O through the page cache, either because that's what it always does or because you broke the rules regarding alignment and using the page cache is a way of quietly fixing up your mistake. The only way to know would be to investigate the data path when the I/O was issued.
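To illustrate the alignment rules mentioned above: on Linux, O_DIRECT transfers generally require the user buffer, the file offset, and the transfer size to each be aligned to the device's logical block size. A sketch under that assumption; ALIGN, the helper names, and the 512-byte block size are mine, not guaranteed values:

```c
#define _GNU_SOURCE /* for O_DIRECT on Linux */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALIGN 512 /* assumed logical block size; query the device in real code */

/* Allocate a block-aligned, zeroed buffer suitable for O_DIRECT I/O. */
static void *alloc_aligned(size_t size)
{
    void *buf = NULL;
    if (posix_memalign(&buf, ALIGN, size) != 0)
        return NULL;
    memset(buf, 0, size);
    return buf;
}

/* Open a file for direct writes; misaligned writes on this fd may fail
   with EINVAL, or the filesystem may quietly fall back to the page cache. */
static int open_direct(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
}
```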
I am new to C programming and am trying to execute the code below.
My objective is to create a program that continuously polls on an input file, reads it when it is modified and writes the contents to a new output file.
The program writes to the output file when there is no while(1) loop, but inside the infinite loop it doesn't write to the output file (though the contents have been read). Could you please point out where I am going wrong?
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void){
    struct stat time_buf;
    time_t input_timestamp = 0;
    while(1){
        if(access("inpfile.txt", F_OK) != -1){
            sleep(5);
            stat("inpfile.txt", &time_buf);
            if(time_buf.st_mtime > input_timestamp){
                FILE *fpi, *fpo;
                long length;
                char *buffer = 0;
                fpi = fopen("inpfile.txt", "r");
                fseek(fpi, 0, SEEK_END);
                length = ftell(fpi);
                fseek(fpi, 0, SEEK_SET);
                buffer = malloc(length);
                fread(buffer, 1, length, fpi);
                fclose(fpi);
                fpo = fopen("outfile.txt", "w+");
                fwrite(buffer, sizeof(char), length, fpo);
                fclose(fpo);
                free(buffer);
                input_timestamp = time_buf.st_mtime;
            }
        }
    }
}
None of this is really going to answer your problem directly, since you don't state anything other than that your code isn't working. What do you think a car mechanic would ask if you told him, "My car isn't working"?
For example, you're not telling us if the modification time reported by your call to stat() changes when you think the file contents change.
First, the call to access() is worse than useless:
if(access( "inpfile.txt", F_OK ) != -1){
    ...
That call does you no good, and it introduces the possibility of a TOCTOU (time-of-check to time-of-use) bug. Just open the file; don't check whether it exists first. If it doesn't exist, opening it will fail with ENOENT or a similar value in errno.
Trying to check in advance whether something you plan to do later will work is almost always a bad idea. "I want to check if the file exists before I try to open it" and "I want to ping the server to make sure it's alive before I try to connect to it" are BAD IDEAS. They introduce a race condition if the situation changes between the check and the action (what happens if the file is deleted between your access() call and the fopen() call?). Moreover, the check has to differ from your intended action in some way, otherwise it would be the action; and since it's different, different rules apply. A fully functional server may be configured not to respond to pings, for example. So the check can fail in situations where the action itself would actually work, and the other way around: the check can succeed and then the action can fail.
Don't check. Just do. And then handle all errors.
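For instance, a sketch of the open-then-handle-errors approach (the helper name is mine):

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Just try to open the file; on failure, errno says why (e.g. ENOENT). */
static FILE *open_or_report(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
    return fp;
}
```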
Second, since you're using stat() to get the file's modification time, there's no need to use fseek()/ftell() to get the size of the file as there's an st_size field in struct stat that tells you the file size.
This will copy a file in one step. Note that there's no error checking at all, and I used open()/read()/write() instead of fopen()/fread()/fwrite() since you're already using POSIX functions; that allows using fstat() on the file descriptor instead of stat() on the input file name, so the name only has to be used once:
struct stat sb;
int fd = open( inputFileName, O_RDONLY );
fstat( fd, &sb );
char *data = malloc( sb.st_size );
read( fd, data, sb.st_size );
close( fd );
fd = open( outputFileName, O_WRONLY | O_CREAT | O_TRUNC, 0644 );
write( fd, data, sb.st_size );
close( fd );
free( data );
Note that there is no error checking in that code - open() can fail, malloc() can fail, read() and write() can both be only partially successful (try to write 8 KB, and only 4 KB actually get written), so you have to check return values. I also left out all the necessary header files.
Adapting that code to a loop where you check for a change in the modification time shouldn't be hard.
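For completeness, here is one way the same copy could look with every return value checked. This is my sketch, not part of the answer's code, and it still loads the whole file into memory:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy 'in' to 'out', checking all return values.
   Returns 0 on success, -1 on any failure. Partial reads and
   writes are resumed in a loop. */
static int copy_file(const char *in, const char *out)
{
    struct stat sb;
    char *data = NULL;
    int ofd = -1, rc = -1;

    int fd = open(in, O_RDONLY);
    if (fd == -1)
        return -1;
    if (fstat(fd, &sb) == -1)
        goto done;

    data = malloc(sb.st_size);
    if (data == NULL && sb.st_size > 0)
        goto done;

    for (off_t off = 0; off < sb.st_size; ) {
        ssize_t n = read(fd, data + off, sb.st_size - off);
        if (n <= 0)
            goto done; /* error, or the file shrank under us */
        off += n;
    }

    ofd = open(out, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (ofd == -1)
        goto done;

    for (off_t off = 0; off < sb.st_size; ) {
        ssize_t n = write(ofd, data + off, sb.st_size - off);
        if (n <= 0)
            goto done;
        off += n;
    }
    rc = 0;

done:
    free(data);
    close(fd);
    if (ofd != -1)
        close(ofd);
    return rc;
}
```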
Third, technically using fseek()/ftell() to get the size of a file is undefined behavior in C. (I have no idea why it's taught so much. Bad teachers? And if you use it on Windows on files larger than 2 GB you'll have issues, as long is 32 bits even on 64-bit Windows.)
Streams opened in binary mode are not required to support fseek(..., SEEK_END); footnote 268 of the C Standard specifically states "Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream ...". And ftell() on a text stream does not represent how many bytes will be read from a file. Per 7.21.9.4 The ftell function, paragraph 2:
The ftell function obtains the current value of the file position indicator for the stream pointed to by stream. For a binary stream, the value is the number of characters from the beginning of the file. For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.
Fourth, you state "though the contents have been read", but from the posted code you can't know that - you don't check any return values. You just make calls and hope they actually work.
You're also at risk of having this question closed as off-topic. One of the reasons for that is:
Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example.
As one of the comments already noted, you didn't mention your debugging efforts.
And none of this answer gets into the difficulty of actually detecting file system changes and then processing the changed data. It's not easy to do reliably and quickly, and I'd say it indicates a bad design.
You need to call fflush after the fwrite.
fwrite does not write the data to the actual file; it writes to a stream buffer, which is flushed to the file after some time or when the buffer is full.
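A minimal sketch of that fix (the helper name is mine): flushing after each fwrite pushes the stdio buffer down to the OS, so a second handle reading the same file sees fresh data:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* fwrite, then fflush, so the data leaves the stream buffer
   immediately instead of waiting for the buffer to fill. */
static size_t write_and_flush(FILE *fp, const void *buf, size_t n)
{
    size_t w = fwrite(buf, 1, n, fp);
    if (fflush(fp) != 0)
        return 0; /* flush failed: the data may not have reached the file */
    return w;
}
```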
I am working with a file API that only provides a const char* filename interface (accepting - for stdout) when writing files. I would instead like the output to be written into memory, so I can then pass the data elsewhere.
I can use dup2 to redirect stdout to an arbitrary file descriptor and something like fmemopen/open_memstream as my sink. However, the memory stream functions expect a size, which -- in my case -- I don't know in advance and could be arbitrarily large.
The data I need to access, however, does have a fixed length and offset within what's being produced (e.g., out of 1MB, I need the 64KB starting at 384KB, etc.). As such, is there a way to set up a circular buffer with fmemopen/open_memstream that just keeps being rewritten until reaching the offset in question? (I realise this is inefficient, but there's no ability to seek.)
Or is this the wrong approach? I've read a bit about memory mapped files and that seems to be similar to what I'm trying to achieve, but it's not something I know much about...
EDIT Just to be clear, I cannot write anything to disk.
Using dup2, redirect stdout to a pipe and call the API with - to instruct it to use standard output. Then read the data generated by the API from the pipe, filter it, and store it in a memory region.
If the pipe capacity is not enough, you will need two threads to make this approach work.
One will be running the API call generating data and putting it into the pipe.
The other thread, will take the data from the pipe, check the offset and store the data in memory when the target offset is reached. But keep reading from the pipe until EOF so that the other thread can complete the API call and finish gracefully.
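A single-threaded sketch of the redirection itself; the printf stands in for the API writing to -, and with data larger than the pipe capacity you would need the second thread described above:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Point stdout at a pipe, produce some output, restore stdout,
   then read the captured bytes back from the pipe's read end. */
static ssize_t capture_stdout(char *out, size_t cap)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    fflush(stdout);                 /* drain anything already buffered */
    int saved = dup(STDOUT_FILENO); /* remember the real stdout */
    dup2(fds[1], STDOUT_FILENO);    /* stdout now feeds the pipe */

    printf("hello from the API");   /* stand-in for the API call */
    fflush(stdout);

    dup2(saved, STDOUT_FILENO);     /* restore stdout */
    close(saved);
    close(fds[1]);                  /* writer done: reader sees EOF */

    ssize_t n = read(fds[0], out, cap);
    close(fds[0]);
    return n;
}
```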
Consider the following code:
lseek(fd, 100, SEEK_SET); /* Seek to the 100th byte in the file fd. */
write(fd, buf, n);        /* Write from that position. */
lseek(fd, 0, SEEK_SET);   /* Is this necessary? Will it trigger an actual disk movement? */
I'd like to lseek back to the beginning of the file, in case another line of code continues writing from that position thinking that it starts at the beginning of the file. First, is this good practice? Second...
I'd like to know whether an lseek triggers an actual disk movement, or whether the disk movement is triggered only by an actual read or write.
Disk seeking is a huge performance hit, and I'd like to know the tradeoffs between such defensive coding practices and performance.
Assuming that this is a Windows or Unix type system, a regular file and you did nothing fancy with file open flags, none of those functions will trigger a disk seek.
It is likely that in 5 seconds or so, the buffer containing that new file data will be written to disk, along with everything else that happened.
Also, the file position that lseek sets is an entirely imaginary property of a file. It controls where data will be read from or written to by default, but there are many functions that simply override the file position.
As to whether it is good practice, I don't think it matters much. However, I've gotten out of the habit of using seek functions when writing to files because of multithreading. You might want to use pread and pwrite in preference.
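A small sketch of the pread/pwrite suggestion (path, offset, and the helper name are arbitrary): each call carries its own offset, so no thread ever moves the shared file position:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write at byte 100 and read it back without ever calling lseek(). */
static int demo_positioned_io(const char *path)
{
    char back[6] = {0};
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    if (pwrite(fd, "hello", 5, 100) != 5) { close(fd); return -1; }
    if (pread(fd, back, 5, 100) != 5)     { close(fd); return -1; }
    close(fd);
    return strcmp(back, "hello") == 0 ? 0 : -1;
}
```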
The core of my app looks approximately as follows:
size_t bufsize;
char *buf1;
size_t r1;
FILE *f1 = fopen("/path/to/file", "rb");
...
do {
    r1 = fread(buf1, 1, bufsize, f1);
    processChunk(buf1, r1);
} while (!feof(f1));
...
(In reality, I have multiple FILE*s and multiple bufNs.) Now, I hear that FILE is quite ready to manage a buffer (referred to as a "stream buffer") all by itself, and this behavior appears to be quite tweakable: https://www.gnu.org/software/libc/manual/html_mono/libc.html#Controlling-Buffering .
How can I refactor the above piece of code to ditch the buf1 buffer and use f1's internal stream buffer instead (while setting it to bufsize)?
If you don't want opaquely buffered I/O, don't use FILE *. Use lower-level APIs that let you manage all the application-side buffering yourself, such as plain POSIX open() and read() for instance.
So I've read a little bit of the C standard and run some benchmarks and here are my findings:
1) Doing it as in the above example does involve unnecessary in-memory copying, which roughly doubles the user time of a simple cmp program based on the above example. Nevertheless, user time is insignificant for most I/O-heavy programs, unless the source of the file is extremely fast.
On in-memory file sources (/dev/shm on Linux), however, turning off FILE buffering (setvbuf(f1, NULL, _IONBF, 0);) does yield a nice and consistent speed increase of about 10–15% on my machine when using buffer sizes close to BUFSIZ (again, measured with the I/O-heavy cmp utility based on the above snippet, tested on 2 identical 700 MB files 100 times).
2) Whereas there is an API for setting the FILE buffer, I haven't found any standardized API for reading it, so I'm sticking with the tried and tested approach, but with FILE buffering turned off (setvbuf(f1, NULL, _IONBF, 0);).
(But I guess I could solve my question by installing my own buffer as the FILE stream buffer via setvbuf, and then accessing it through some unstandardized pointer in the FILE struct.)
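For reference, a sketch of the unbuffered setup from finding (2); the helper name is mine:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Open for binary reading with the stdio buffer disabled, so each
   fread() goes straight from the OS into the caller's buffer. */
static FILE *fopen_unbuffered(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp != NULL)
        setvbuf(fp, NULL, _IONBF, 0); /* must precede any other I/O on fp */
    return fp;
}
```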