Read and write from a file in a circular buffer fashion - file

I need to make a file behave as a circular buffer. From one thread I have to write the data; from another thread I have to read from the file. But the size of the file is fixed.
Any idea?

Since you have not mentioned the language you will be using, I am only able to provide you with a general answer: Write an abstraction that, when reading past the end of the file, seeks to the beginning of the file and resumes reading there.
Be advised that reading from and writing to the file from multiple threads needs proper synchronization.

I assume that each thread knows the position of the other. In that case the writer can append to the file and advance its position until it reaches MAXSIZE. Then it wraps around by seeking to position 0 and continues overwriting the old contents as long as its position is smaller than the reader's position; after that it has to block. At the same time the reader can read, wrapping around when necessary, until it reaches the position of the writer.
In other words it's not much different from a standard in-memory circular buffer. Are you sure that using a file is necessary in your case? You might also want to do some research on the producer-consumer problem.
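Below is a minimal sketch in C of that scheme, assuming POSIX threads and a single FILE* opened in update mode ("w+b") as the fixed-size backing file; MAXSIZE, ring_put and ring_get are illustrative names, not a finished implementation:

#include <pthread.h>
#include <stdio.h>

#define MAXSIZE 4096            /* fixed size of the backing file */

static FILE *ring;                      /* opened once with "w+b" */
static size_t wpos, rpos, used;         /* write offset, read offset, bytes stored */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

void ring_put(char c)                   /* called by the writer thread */
{
    pthread_mutex_lock(&lock);
    while (used == MAXSIZE)             /* block while the buffer is full */
        pthread_cond_wait(&not_full, &lock);

    fseek(ring, (long)wpos, SEEK_SET);
    fputc(c, ring);
    fflush(ring);                       /* make the byte visible to the reader */
    wpos = (wpos + 1) % MAXSIZE;        /* wrap around at MAXSIZE */
    used++;

    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

int ring_get(void)                      /* called by the reader thread */
{
    pthread_mutex_lock(&lock);
    while (used == 0)                   /* block while the buffer is empty */
        pthread_cond_wait(&not_empty, &lock);

    fseek(ring, (long)rpos, SEEK_SET);
    int c = fgetc(ring);
    rpos = (rpos + 1) % MAXSIZE;        /* wrap around at MAXSIZE */
    used--;

    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
    return c;
}

The mutex and the two condition variables form the classic producer-consumer handshake: the writer blocks while the buffer is full, the reader blocks while it is empty, and both wrap their offsets at MAXSIZE.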

You could also consider using a named pipe.

Related

What happens if stdin fills up?

This seems like a simple question, but I have had a really hard time finding an answer. I am writing a program in C where this seems possible (though remotely so) on some systems, as it appears there are situations where stdin has a buffer of only 4k.
So, my question is: is there a standard way an OS deals with stdin filling up (a de facto standard, a POSIX requirement, etc.)? How predictable is the outcome, if there is in fact some standard way of dealing with the situation?
The OS will have a buffer that stores the unread stdin input. In general, things writing to stdin will use blocking calls, so if the buffer fills up they will simply stall until room is available and no data will be lost. If that behaviour is undesirable (you don't want the writer to block), then you need to make sure you read the buffer in time so that it doesn't fill up.
One thing you could do is create a worker thread that sits in a tight loop reading stdin as fast as it can and putting the data somewhere else (a much larger buffer, for example); the main program then accesses the data from that buffer rather than reading stdin itself.
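As a sketch of that idea (POSIX assumed; BUF_CAP and the function names are made up for illustration), a drainer thread could look roughly like this:

#include <pthread.h>
#include <string.h>
#include <unistd.h>

#define BUF_CAP (1024 * 1024)           /* much larger than a 4k pipe buffer */

static char   big_buf[BUF_CAP];
static size_t big_len;                  /* bytes currently stored */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *drain_stdin(void *arg)
{
    char chunk[4096];
    ssize_t n;

    (void)arg;
    while ((n = read(STDIN_FILENO, chunk, sizeof chunk)) > 0) {
        pthread_mutex_lock(&lock);
        if (big_len + (size_t)n <= BUF_CAP) {   /* drop data if our buffer is full */
            memcpy(big_buf + big_len, chunk, (size_t)n);
            big_len += (size_t)n;
        }
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Usage: start the drainer once, then read from big_buf under the lock. */
int start_stdin_drainer(void)
{
    pthread_t tid;
    return pthread_create(&tid, NULL, drain_stdin, NULL);
}

A real version would use a proper ring buffer and a condition variable instead of silently dropping data when big_buf is full, but the point is that the tight read() loop keeps the OS pipe buffer empty.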

Redirect file descriptor into memory

I am working with a file API that only provides a const char* filename interface (accepting - for stdout) when writing files. I would instead like the output to be written into memory, so I can then pass the data elsewhere.
I can use dup2 to redirect stdout to an arbitrary file descriptor and something like fmemopen/open_memstream as my sink. However, the memory stream functions expect a size, which -- in my case -- I don't know in advance and could be arbitrarily large.
The data I need to access, however, does have a fixed length and offset within what's being produced (e.g., out of 1MB, I need the 64KB starting at 384KB, etc.). As such, is there a way to set up a circular buffer with fmemopen/open_memstream that just keeps being rewritten until reaching the offset in question? (I realise this is inefficient, but there's no ability to seek.)
Or is this the wrong approach? I've read a bit about memory mapped files and that seems to be similar to what I'm trying to achieve, but it's not something I know much about...
EDIT Just to be clear, I cannot write anything to disk.
Using dup2, redirect stdout to a pipe and call the API with - to instruct it to use standard output. Then read the data generated by the API from the pipe, filter it, and store it in a memory region.
If the pipe capacity is not enough, you will need two threads to make this approach work.
One will run the API call, generating data and putting it into the pipe.
The other will take the data from the pipe, check the offset, and store the data in memory once the target offset is reached, but it should keep reading from the pipe until EOF so that the first thread can complete the API call and finish gracefully.
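A rough sketch of that setup in C (POSIX assumed; api_write, the offsets and the buffer sizes are hypothetical placeholders, not a real API):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define WANT_OFF  (384u * 1024u)        /* start of the region we care about */
#define WANT_LEN  ( 64u * 1024u)        /* its length */

extern void api_write(const char *filename);   /* hypothetical API accepting "-" */

static char keep[WANT_LEN];

static void *pipe_reader(void *arg)
{
    int fd = *(int *)arg;               /* read end of the pipe */
    char buf[4096];
    size_t seen = 0, kept = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0) {   /* keep reading until EOF */
        for (ssize_t i = 0; i < n; i++, seen++) {
            if (seen >= WANT_OFF && kept < WANT_LEN)
                keep[kept++] = buf[i];  /* store only the window we need */
        }
    }
    close(fd);
    return NULL;
}

int capture_api_output(void)
{
    int fds[2];
    pthread_t tid;

    if (pipe(fds) != 0)
        return -1;
    dup2(fds[1], STDOUT_FILENO);        /* the API's "-" output now feeds the pipe */
    close(fds[1]);

    pthread_create(&tid, NULL, pipe_reader, &fds[0]);
    api_write("-");                     /* runs while the reader drains the pipe */
    fflush(stdout);
    close(STDOUT_FILENO);               /* signal EOF to the reader thread */

    pthread_join(tid, NULL);
    return 0;
}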

Weird output to text file using fprintf()

In a simulation program I am trying to print the measures to a text file. The project is a combination of Java, C and C++, but the file I am working with is in C. The code for printing is as follows:
if (sample)
    fprintf(MeasureInfo->measuresFile, "%d: %f\n", count++, sample);
This works for part of the output, but there are large bulks (about 100 to 1000 measures) of data that are not printed to the text file. Instead I see just a block of NULs in Sublime Text and 0-bytes in bless:
436: 0.851661
437: 0.043466
(Really large block of NUL all in one line).210402
751: 0.357543
752: 0.816120
I have only worked with part of the code so far and thought it might be a concurrency problem, so I printed out all PIDs that access the function with getpid(), and it gave me different ones (19036, 19037, 19038 for instance). I then tried to use pthread_mutex_lock and pthread_mutex_unlock, but it produced the same output.
Another thing I tried was using sleep after every 400 measures. This actually helped, but it shortened the number of measures produced by a fourth.
Do you have any idea what the actual problem might be and how to fix it? I am really sorry if this is an already-answered or easy question, but I searched for a while and didn't find a solution.
I/O and multitasking are a dangerous combination. At the thread level, fprintf is writing into a buffer in thread-unsafe fashion. The individual formatting components might overwrite each other, and when one call flushes the buffer while another is trying to write, data is certain to be lost. It could even cause a buffer overflow and a crash right inside the stdio library.
At the process level, the buffers are flushed to particular locations in the file. If two processes try to write at the same time, they will clobber each other's results. A process typically doesn't poll the filesystem to check whether a file is growing due to outside influence, then seek to the new end before writing.
If you have a very large project and synchronizing all the I/O to this file isn't an option, consider assigning each thread or process its own file, and merge the files after the fact.
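For the thread-level part, one common fix is to funnel every write to the shared file through a single locked helper; a sketch (the function name is made up, not taken from the question):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

void log_measure(FILE *out, int count, double sample)
{
    pthread_mutex_lock(&log_lock);
    fprintf(out, "%d: %f\n", count, sample);
    fflush(out);                     /* flush while still holding the lock */
    pthread_mutex_unlock(&log_lock);
}

Note that a pthread mutex only serializes threads inside one process; since the question reports several different PIDs, separate processes would still need their own files (as suggested above) or OS-level file locking.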
sample looks to be a float or double. You shouldn't compare a floating-point variable against a fixed value, as it will most likely never have exactly that value; for floating-point variables you should always compare against a delta. Your if clause will probably always be entered.
fprintf shouldn't print NUL though. Can you show the exact output (or part of it)?
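As a small illustration of the delta comparison suggested above (EPSILON and log_sample are illustrative choices, not taken from the question):

#include <math.h>
#include <stdio.h>

#define EPSILON 1e-9   /* tolerance, chosen for illustration */

/* Log the sample only when it is meaningfully non-zero. */
static void log_sample(FILE *out, int count, double sample)
{
    if (fabs(sample) > EPSILON)          /* instead of: if (sample) */
        fprintf(out, "%d: %f\n", count, sample);
}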

File being simultaneously written, read and removed

I'm wondering if there is a mechanism that reads a file while it is being written and simultaneously removes the content that has already been read. The reason for doing this is that the file is stored in memory (a ramdisk), and as the file size increases we need to remove the part that has already been processed.
Thanks a lot!!!
PS: I'm using Linux and Java for this. :)
Data cannot be removed from the beginning or middle of a file. Process the data using multiple files and erase them as they are consumed.
Reading from a file while it is being written to is no big deal; this is the purpose of every tail program. However, deleting already-read content from an open file... I don't think it is possible.
You may want to think of a workaround. For example, you can have a number of files {0,n}, each with the same limit of bytes to write. Start writing to file_i, where i is the highest available number out of {0,n}, and go up to the limit. Reading starts from the lowest available file_i, reads up to the limit, and when done deletes the file just consumed.
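A compact sketch of that rotating-files workaround in C (the chunk naming under /dev/shm and the LIMIT value are just illustrative assumptions):

#include <stdio.h>

#define LIMIT 4096          /* max bytes per chunk file (assumption) */

/* Append data, rotating to a new chunk file whenever LIMIT is reached. */
static void write_chunked(const void *buf, size_t len)
{
    static unsigned next = 0;      /* index of the chunk currently written */
    static size_t   used = 0;      /* bytes already in that chunk */
    char name[64];

    while (len > 0) {
        size_t room = LIMIT - used;
        size_t n = len < room ? len : room;

        snprintf(name, sizeof name, "/dev/shm/chunk_%u", next);
        FILE *f = fopen(name, "ab");
        fwrite(buf, 1, n, f);
        fclose(f);

        buf = (const char *)buf + n;
        len -= n;
        used += n;
        if (used == LIMIT) {       /* chunk full: rotate */
            next++;
            used = 0;
        }
    }
}

/* Consume and delete the oldest chunk; returns bytes read, 0 if none left. */
static size_t read_and_discard_chunk(unsigned index, char *out, size_t cap)
{
    char name[64];
    snprintf(name, sizeof name, "/dev/shm/chunk_%u", index);

    FILE *f = fopen(name, "rb");
    if (!f)
        return 0;
    size_t n = fread(out, 1, cap, f);
    fclose(f);
    remove(name);                  /* chunk fully processed: free the ramdisk space */
    return n;
}

Each fully consumed chunk is deleted with remove(), so the ramdisk only ever holds data that has not yet been processed.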
We still haven't heard what OS our friend user2386567 is using, but as a counterpoint to the other answers declaring that it's impossible to delete data from the middle of a file, I'd like to point out that Linux has FALLOC_FL_PUNCH_HOLE for that exact purpose.
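On Linux that looks roughly like this (a sketch; offset and length must normally be aligned to the filesystem block size for the blocks to actually be freed):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>

int discard_consumed(int fd, off_t consumed_offset, off_t consumed_len)
{
    /* Deallocate the byte range without changing the file size; reads of the
     * punched range will return zeros afterwards. */
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     consumed_offset, consumed_len);
}

The file keeps its size and offsets, but the punched range no longer occupies space and reads back as zeros.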

Multi-Threading with files

So let's say I have the following code, where I open a file, read the contents line by line, use each line in a function somewhere else, and then, when I'm done, rewind the file.
FILE *file = Open_File();
char line[max];
while (!EndofFile())
{
    fgets(line, max, file);    /* read the next line (the reading step implied in the question) */
    int length = GetLength(line);
    if (length > 0)
    {
        DoStuffToLine(line);
    }
}
rewind(file);
I'm wondering if there is a way to use threads here to add concurrency. Since I'm just reading the file and not writing to it, I feel like I don't have to worry about race conditions. However, I'm not sure how to handle the code that's in the while loop: if one thread is looping over the file and another thread is looping over it at the same time, would they cause each other to skip lines or make other errors? What's a good way to approach this?
If you're trying to do this to improve read performance, you're likely to be disappointed, since this will almost surely be disk-I/O bound. Adding more threads won't help the OS and disk controller fetch data any faster.
However, if you're trying to just process the data in parallel, that's another matter. In that case, I would read the entire file into a memory buffer somewhere, then have your threads process it in parallel. That way you don't have to worry about thread safety with rewinding the file pointer or any other annoying issues like it.
You'll likely still need to use other locking mechanisms for the multithreaded parts of course, depending on exactly what you're doing, but you shouldn't have to worry about what the standard library is going to do when you start accessing a file with multiple threads.
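A sketch of that approach with POSIX threads (process_line stands in for the question's DoStuffToLine; the thread count and the naive byte-range split are illustrative):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NTHREADS 4

struct slice { char *start; size_t len; };

extern void process_line(const char *line);   /* hypothetical per-line work */

static void *worker(void *arg)
{
    struct slice *s = (struct slice *)arg;
    char *p = s->start, *end = s->start + s->len;

    while (p < end) {                       /* walk the lines in this slice */
        char *nl = memchr(p, '\n', (size_t)(end - p));
        size_t len = nl ? (size_t)(nl - p) : (size_t)(end - p);
        char line[1024];
        snprintf(line, sizeof line, "%.*s", (int)len, p);
        process_line(line);
        p += len + 1;
    }
    return NULL;
}

int process_file_in_parallel(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    rewind(f);

    char *buf = malloc((size_t)size);
    fread(buf, 1, (size_t)size, f);         /* whole file into memory */
    fclose(f);

    /* Naive split into NTHREADS equal byte ranges; a real version would cut
     * on line boundaries so no line straddles two slices. */
    pthread_t tid[NTHREADS];
    struct slice sl[NTHREADS];
    size_t chunk = (size_t)size / NTHREADS;
    for (int i = 0; i < NTHREADS; i++) {
        sl[i].start = buf + (size_t)i * chunk;
        sl[i].len = (i == NTHREADS - 1) ? (size_t)size - (size_t)i * chunk : chunk;
        pthread_create(&tid[i], NULL, worker, &sl[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    free(buf);
    return 0;
}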
The concurrency adds some race condition problems:
1. The EndofFile() check is evaluated at the start of the loop, so the loop condition may hold for two threads at once; one thread then reads the last line and reaches the end of the file while the other still attempts to read. You never know when a thread will be scheduled.
2. The same goes for the GetLength function: by the time a thread has the length information, it may already have changed because another thread has read another line.
3. You are reading the file sequentially, and even if you rewind it, the current position of the I/O pointer may be altered by another thread at any moment.
Furthermore, as Telgin pointed out, reading a file is not CPU bound but I/O bound, so adding threads will not make the system read the file any faster. You can't improve performance because you need locks, and locking to guarantee thread safety just introduces overhead.
I'm not sure that this is the best approach, but you could read the file, store its contents in two separate objects, and read from the objects instead of the file. Just make sure to do the cleanup afterward.
