Does C fopen read the entire file into memory? - c

I'm wondering if the fopen command is smart enough to stop reading a file if it's too large and then wait for some subsequent read command to continue reading.
For that matter, how large is _iobuf?

fopen(...) doesn't do any size checks; it just returns a file pointer. Are you thinking of fread(...), by any chance?
You can always find the size of the file that you are going to read by using the stat(...) system call.
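As a minimal sketch of the stat(...) suggestion above, assuming a POSIX-like system (on Windows the call is spelled differently); file_size is a hypothetical helper name, not something from the answer:

```c
#include <sys/stat.h>

/* Returns the size of the file at path in bytes, or -1 if stat() fails
   (for example because the file does not exist). */
long long file_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}
```

The size comes from file system metadata, so none of the file's contents are read to obtain it.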

Per the C standard, setbuf()/setvbuf(), if used at all, must be called after fopen() and before any other operation on the stream. These set the buffering mode and buffer size for the freshly opened file. This implies that, at least at the C level, fopen() reads nothing from the file it has opened.
The underlying file system implementation in the OS, however, may read ahead into a file cache, but this is clearly not defined in the language standard. You need to find this out in the OS documentation or by experimentation.
_iobuf is not defined in the C standard. While I may "guess" what it is, it's unlikely something you need to concern yourself with (it would not contain a fixed-size C file buffer anyway and rather contain a pair of values: a pointer to a buffer and the buffer size).
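The ordering described above can be sketched as follows. The helper names open_buffered and count_bytes are made up for illustration, and the buffer size is whatever the caller picks; nothing is read from the file until fread() is called.

```c
#include <stdio.h>

/* Opens a file and requests a fully buffered stream of the given size.
   setvbuf() is called right after fopen(), before any other operation
   on the stream. Passing NULL lets the library allocate the buffer. */
FILE *open_buffered(const char *path, const char *mode, size_t bufsize)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return NULL;
    if (setvbuf(fp, NULL, _IOFBF, bufsize) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}

/* Counts the bytes in a stream by reading one chunk at a time, so the
   program's memory use stays at sizeof chunk however large the file is. */
size_t count_bytes(FILE *fp)
{
    unsigned char chunk[BUFSIZ];
    size_t total = 0, n;
    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0)
        total += n;
    return total;
}
```

A caller would do something like `FILE *fp = open_buffered("big.bin", "rb", 4096);` and then pass fp to count_bytes(); the file name here is only a placeholder.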

If you look here, you'll see that #define BUFSIZ 1024
But it depends on your C library implementation. AFAIK, the standard does not fix the value; it only requires BUFSIZ to be at least 256.

Related

What is the difference between a pointer to a buffer and a pointer to a file?

In Chapter 22 of "C Programming: A Modern Approach", the basics of the <stdio.h> header are explained.
One detail that has me a little confused is the difference between a pointer to a buffer and a pointer to a file (denoted as FILE *).
Consider the following (through which the confusion is derived):
fopen is prototyped as: FILE *fopen(const char * restrict filename, const char * restrict mode).
fflush is prototyped as int fflush (FILE *stream). fflush is described as a function that flushes a file's buffer.
setvbuf is prototyped as int setvbuf (FILE * restrict stream, char * restrict buf, int mode, size_t size). When the author describes this function, he refers to the second parameter (buf) as the address of the buffer...which presumably is the same idea as pointer to buffer.
Firstly, from what I understand (especially given the name choice of the first parameter in fflush and setvbuf), a stream is semantically equivalent to a pointer to a file. Importantly, therefore, a stream IS NOT the file itself. A stream is the location of the file, at least as is represented through virtual memory (please correct if this is off base).
Secondly, when one opens a file, this amounts to creating a corresponding buffer (that is also represented in virtual memory).
At first, because of fflush's prototype, I was under the impression that the pointer to a file was in practice the pointer to a buffer; this is clearly wrong given the prototype of setvbuf (which has distinct parameters for the pointer to a file and the address of the buffer). So what exactly is the pointer to file pointing to?
Further, how does one acquire the address associated with a given file's buffer? (The author has not yet shown a function that returns the address of a buffer associated with the file that was opened.)
Any insight is greatly appreciated. Cheers~
The terms “stream” and “file” are a bit muddled in C. A file is something outside the program, and it may be a physical device, a file on disk, or some other thing provided by the operating system.
A stream is, roughly, an interface to a file. It is largely constructed in the C environment by using various data structures to remember information about the file it is connected to, to hold data being written to or read from the file, and so on.
For historic reasons, a stream is managed through a data structure type called FILE. A FILE * is actually a pointer to a stream (or, more technically, a pointer to the data used to control a stream). The data in a FILE includes a file position indicator, a pointer to its associated internal buffer (not anything you should use), and information about errors that have occurred or whether the end of the file has been reached. It would be better if the name were STREAM instead of FILE, but we are stuck with FILE due to history.
A buffer is often an array of char or unsigned char used to hold data being moved between various things, although there can be buffers of other types. The buf argument to setvbuf is used to provide a buffer to be used with the stream. This is not a commonly used routine. Passing an array to setvbuf gives the array to the C library to use for that stream. The program should stop using the array for any other purpose until it closes that stream. This is different from an array you use to read or write characters using the other functions like getchar or fputc.
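To make the two kinds of array concrete, here is a small sketch. The function names are hypothetical. stream_buf is handed to the library with setvbuf and belongs to the stream until fclose, while data is an ordinary array the program fills with fread.

```c
#include <stdio.h>

/* Writes text to path through a stream whose internal buffer is the
   caller-supplied array stream_buf. Returns 0 on success, -1 on failure. */
int write_with_own_buffer(const char *path, const char *text,
                          char *stream_buf, size_t stream_buf_size)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    /* Hand stream_buf to the library. Until fclose() the program must
       not use it for anything else; its contents are indeterminate. */
    if (setvbuf(fp, stream_buf, _IOFBF, stream_buf_size) != 0) {
        fclose(fp);
        return -1;
    }
    int failed = (fputs(text, fp) == EOF);
    if (fclose(fp) != 0)
        failed = 1;
    return failed ? -1 : 0;
}

/* Reads up to size - 1 characters from path into the caller's own array
   and NUL-terminates it. This array is plain program data and has
   nothing to do with the stream's internal buffer. */
size_t read_into_array(const char *path, char *data, size_t size)
{
    if (size == 0)
        return 0;
    data[0] = '\0';
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return 0;
    size_t n = fread(data, 1, size - 1, fp);
    data[n] = '\0';
    fclose(fp);
    return n;
}
```

Note that stream_buf must outlive the stream, which is why the sketch takes it from the caller instead of declaring it as a local array that could go out of scope before fclose.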

Understanding file pointers and buffers

Why, in C, do you need a separate buffer to read a FILE *? When you declare a FILE * and assign to it with fopen, does the file then not exist in contiguous memory starting at the address of said pointer? I'm struggling to make the connection as to why you need to read via fread() into a separate buffer. If someone could explain how the FILE *file = fopen(filename, "r") and the subsequent fread(&buffer,...) work in conjunction it would help my understanding tremendously. Thanks in advance.
The FILE * returned by fopen is an unnecessary, but useful, layer of indirection.
Theoretically, fopen could have been designed to read the whole file into a buffer in memory, and just return you that buffer.
The issue with that approach is that it's not flexible at all. It forces you to read the entire file for all file IO operations, which is very undesirable. For example, here are some problems that would come about:
How could you read a file that's too big to fit in RAM?
What if you just want to append a new line at the end of a file (such as for logging)? You would have to read the whole file, append the line at the end, then rewrite the entire file back. Expensive!
What if you're only interested in reading a small part of a file, such as reading the magic number to identify the file's type, without regard for its actual content?
What if you wanted to simultaneously edit the file from multiple programs? Each program would need to constantly reread the whole file into memory to ensure it stayed up to date.
fopen returns a file handle that identifies a file still on disk. How much you read out of this file into memory is entirely up to you.
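Two of the cases listed above can be sketched with ordinary stdio calls. The helper names read_magic and append_line are invented for this example; neither function reads more of the file than it needs.

```c
#include <stdio.h>

/* Reads only the first n bytes of a file into out and returns how many
   bytes were actually read (fewer than n if the file is shorter). */
size_t read_magic(const char *path, unsigned char *out, size_t n)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    size_t got = fread(out, 1, n, fp);
    fclose(fp);
    return got;
}

/* Appends one line to a file. In "a" mode every write goes to the end
   of the file, so the existing contents are never read or rewritten.
   Returns 0 on success, -1 on failure. */
int append_line(const char *path, const char *line)
{
    FILE *fp = fopen(path, "a");
    if (fp == NULL)
        return -1;
    int failed = (fprintf(fp, "%s\n", line) < 0);
    if (fclose(fp) != 0)
        failed = 1;
    return failed ? -1 : 0;
}
```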
The explanation given above is pretty much self-explanatory, but I'll try to put it more simply (in case anyone has trouble understanding it).
In short, consider these two points and you'll see why:
1) Your file might be large and stored on your hard drive; if you read from it frequently, loading the whole file again and again is a lot of overhead.
2) Worse, if the file is huge, loading all of it consumes your RAM even when you don't need the whole file at once.
Why, in C, do you need a separate buffer to read a FILE *?
First, because reading into a buffer and then working from it is usually faster.
does the file then not exist in contiguous memory starting at the
address of said pointer?
It may or may not, depending on its size.
If someone could explain how the FILE *file = fopen(filename, "r") and
the subsequent fread(&buffer,...) work in conjunction
The fopen() function is used to open a file and associate an I/O stream with it. This function takes two arguments. The first argument is a pointer to a string containing the name of the file to be opened, while the second argument is the mode in which the file is to be opened.
The mode can be one of r, r+, w, w+, a, a+.
The fopen() function returns a FILE stream pointer on success and NULL on failure.
Look here for detailed info.
Why, in C, do you need a separate buffer to read a FILE *?
No, buffers aren't strictly necessary, but they are usually present to speed up I/O.
When you declare a FILE * and assign to it with fopen, does the file
then not exist in contiguous memory starting at the address of said
pointer?
Certainly not, because this would be inefficient at best (why read an entire huge file if it turns out not to be needed?) and impossible at worst (RAM is usually much smaller than disk).
If someone could explain how the FILE *file = fopen(filename, "r") and
the subsequent fread(&buffer,...) work in conjunction it would help my
understanding tremendously.
So a FILE * is not a handle to a memory object that contains the file's data; it points to a memory object that contains the data needed to access the file's data on disk. That opaque object (opaque means the details are hidden, so don't try to look inside) contains, for example, the current offset (reads and writes happen at a given offset and advance it accordingly), the open mode (so that writing to a file opened for reading correctly fails), some buffer (which may contain part of the file, and sometimes the whole file!), etc. A FILE * is a handle in the sense of a door handle. Don't confuse file and FILE *: the first is a generic term for what you already know (data on disk); the second is a type representing an open file, a dynamic object through which a given file is manipulated.
I'm struggling to make the connection as to why you need to read
via fread() into a separate buffer.
If you don't have/can't have the file in memory, then you need to ask for reading the part you are interested in.
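The "current offset" kept inside the FILE object can be observed with ftell. A minimal sketch, assuming a binary stream so that ftell reports a byte count; read_and_tell is a hypothetical helper:

```c
#include <stdio.h>

/* Reads up to n bytes from fp into out, then returns the stream's
   position as reported by ftell(), or -1L on a read error. Each call
   continues from wherever the previous read left off. */
long read_and_tell(FILE *fp, unsigned char *out, size_t n)
{
    if (fread(out, 1, n, fp) != n && ferror(fp))
        return -1L;
    return ftell(fp);
}
```

Calling it twice with n = 4 on the same stream fills out with the first four bytes and then the next four: the caller's array is reused, while the position lives in the stream.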

using setvbuf to make memory buffer act like FILE*

I need a cross-platform way of treating a memory buffer as a FILE*. I have seen other questions which point out that there is no portable way to do this (fmemopen on Linux is what I need, but it is not available on Windows).
I have tried using setvbuf and it seems to work. Can anyone please point out the exact problem with using the setvbuf function?
Also, I have seen the C standard draft WG14/N1256, and 7.19.5.6 says:
The contents of the array at any time are indeterminate.
I don't understand: if I use my own buffer, how can its contents be indeterminate?
EDIT: Thanks for all the answers. Not using this method anymore.
No really, there's no portable way to do this.
Using setvbuf may appear to work but you're really invoking undefined behavior, and it will fail in unexpected ways at unexpected times. The GNU C library does have fmemopen(3) as an extension, as you mentioned, but it's not portable to non-GNU systems.
If you're using some library that requires a FILE* pointer and you only have the required data in memory, you'll just have to write it out to a temporary file and pass in a handle to that file. Ideally, your library should provide an alternative function that takes a memory pointer instead of a file pointer, but if not, you're out of luck (and you should complain to the library writer about that deficiency).
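The temporary-file workaround described above might look like this. It is only a sketch: stream_from_memory is a hypothetical name, and tmpfile() may touch the disk.

```c
#include <stdio.h>

/* Copies len bytes from data into a temporary file and returns a stream
   positioned at its start, or NULL on failure. The caller closes it with
   fclose(); a file created by tmpfile() is removed automatically. */
FILE *stream_from_memory(const void *data, size_t len)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return NULL;
    if (fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);
    return fp;
}
```

The returned FILE * can then be passed to the library function that insists on a stream.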
Function setvbuf() is used to tell the FILE the memory to be used as buffer, but it does not specify how this memory will be used: that's up to the implementation.
Thus, the contents of the buffer are indeterminate at any time, and if it happens to work for you, it is just by chance.
It depends on what you want to do with the buffer/FILE*. You can certainly perform simple operations and get away with them, but you cannot guarantee that all of the FILE* operations will perform as expected on your memory buffer.
Sorry, there is simply no cross-platform one-liner that gives you full FILE* behavior; I've tried myself many times.
What you can try:
#define-wrapped OS-specific logic
Look further into the interface you are trying to interact with. At some point it just plays with a buffer anyway. Then splice in your buffer. This is what I did.
Your technique + faith.

Make FILE* struct map to a buffer?

I am working with some legacy code which uses something like this:
void store_data(FILE *file);
However, I don't want to store data on the disk, I want to store it in memory (char *buf). I could edit all of the code, but the code jumps all over the place and fwrite gets called on the file all over the place. So is there an easier way, for example that I can map a FILE* object to an (auto-growing) buffer? I do not know the total size of the data before-hand.
The solution has to be portable.
There is no way to do this using only the facilities provided in the C standard. The closest you can come is
FILE *scratch = tmpfile();
...
store_data(scratch);
...
/* after you're completely done calling the legacy code */
rewind(scratch);
buf = read_into_memory_buffer(scratch);
fclose(scratch);
This does hit the disk, at least potentially, but I'd say it's your best bet if you need wide portability and can't modify the "legacy code".
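The answer leaves read_into_memory_buffer undefined. One possible shape for such a helper is sketched below under a different, made-up name and signature (read_all, returning the length through a pointer), using a buffer that doubles when full since the total size is not known in advance.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads everything from the current position of fp to end of file into
   a malloc'ed buffer that grows as needed. On success returns the buffer
   and stores its length in *len_out; on failure returns NULL. The caller
   frees the buffer. */
char *read_all(FILE *fp, size_t *len_out)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    for (;;) {
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap)          /* short read: end of file or error */
            break;
        char *bigger = realloc(buf, cap * 2);   /* buffer full: double it */
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        cap *= 2;
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    *len_out = len;
    return buf;
}
```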
In POSIX.1-2008, there is open_memstream, which does exactly what you want; however, this revision of POSIX is not yet widely adopted. GNU libc (used on Linux and a few others) has it, but it's not available on OSX or the *BSDs as far as I know, and certainly not on Windows.
You might want to look at fmemopen and open_memstream. They do something in the direction of what you want.
From the man page:
The open_memstream() function opens a stream for writing to a buffer.
The buffer is dynamically allocated (as with malloc(3)), and automatically
grows as required. After closing the stream, the caller should
free(3) this buffer.
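A minimal sketch of how open_memstream could be used with legacy code of that shape, assuming a platform that provides it (POSIX.1-2008). The store_data body here is a stand-in invented for the example, and capture_store_data is a hypothetical wrapper.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask the headers to declare open_memstream() */
#endif
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the legacy routine: it only knows how to write to a FILE *. */
static void store_data(FILE *file)
{
    fputs("hello, ", file);
    fputs("memory", file);
}

/* Runs store_data() against an in-memory stream. On success returns a
   malloc'ed, NUL-terminated buffer and stores its length in *len_out;
   on failure returns NULL. The caller frees the buffer. */
char *capture_store_data(size_t *len_out)
{
    char *buf = NULL;
    size_t len = 0;
    FILE *mem = open_memstream(&buf, &len);
    if (mem == NULL)
        return NULL;
    store_data(mem);
    /* buf and len are only guaranteed up to date after fflush() or fclose(). */
    if (fclose(mem) != 0) {
        free(buf);
        return NULL;
    }
    *len_out = len;
    return buf;
}
```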
I don't know if it's a good idea, but it's an idea.
You can "redefine" fwrite using a macro.
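#define fwrite(a, b, c, d) your_memory_write_function(a, b, c, d)
Then implement your_memory_write_function to write data to your auto-growing buffer instead of a file. Note that fwrite takes four arguments (pointer, element size, element count, stream), so the macro has to take four as well.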
You will need to call store_data with a pointer to something else though (not a pointer to FILE). But that's possible with C so you will have no issues there.
On what platform are you running? Can't you use tmpfs? If you open a file on tmpfs, is it not, from the point of view of the kernel, the same as a regular file, but written to memory?
You may want to look into fmemopen(). If that's not available to you, then you could possibly use a named shared memory segment along with fdopen() to convert the file descriptor returned by shm_open() to a FILE*.

Where is the FILE struct allocated?

In C, when opening a file with
FILE *fin;
fin=fopen("file.bin","rb");
I only have a pointer to a structure of FILE. Where is the actual FILE struct allocated on a Windows machine? And does it contain all the necessary information for accessing the file?
My aim is to dump the whole data segment to disk and then to reload the dumped file back to the beginning of the data segment. The code that reloads the dumped file is placed in a separate function. This way, the fin pointer is local and is on the stack, thus is not being overwritten on reload. But the FILE struct itself is not local. I take care not to overwrite the memory region of size sizeof(FILE) that starts at the address fin.
The
fread(DataSegStart,1,szTillFin,fin);
fread(dummy,1,sizeof(FILE),fin);
fread(DataSegAfterFin,1,szFinTillEnd,fin);
operations complete successfully, but I get an assertion failure on
fclose(fin)
Do I overwrite some other necessary file data other than in the FILE struct?
The actual instance of the FILE structure exists within the standard library. Typically the standard library allocates some number of FILE structures, which may or may not be a fixed number of them. When you call fopen(), it returns a pointer to one of those structures.
The data within the FILE structure likely contains pointers to other things such as buffers. You're unlikely to be able to save and restore those structures to disk without some really deep integration with your standard library implementation.
You may be interested in something like CryoPID which does process save and restore at a different level.
It seems like you're trying to do something dangerous, unlikely to work.
fopen allocates a FILE structure and initializes it. fclose releases it. How it allocates it and what it puts in it is implementation dependent. It could contain a pointer to another piece of memory, which is also allocated somewhere (since it's buffered I/O, I guess it does allocate a buffer somewhere).
Writing code that relies on the internals of fopen is dangerous, most likely won't work, and surely won't be stable and portable.
Well, you have a pointer to a FILE object, so technically you know where it is, but you should be aware that FILE is deliberately an opaque type. You shouldn't need to know what it contains; you just need to know that you can pass it to functions that know about it to perform certain actions. Additionally, FILE may not be a complete type, so sizeof(FILE) might not be correct, and the object might contain pointers to other structures. Simply avoiding overwriting the FILE object is not likely to be sufficient for you to avoid corrupting the program by writing over most of its memory.
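Since the FILE object itself cannot be meaningfully saved, the only stream state a program can portably carry across something like a reload is what the public API exposes, such as the file name and a position from ftell. The sketch below illustrates that alternative and is not the asker's technique; reopen_at is a hypothetical helper and assumes a binary stream.

```c
#include <stdio.h>

/* Reopens path in binary read mode and seeks to offset, a position that
   was obtained earlier from ftell() on a binary stream for the same file.
   Returns NULL on failure. */
FILE *reopen_at(const char *path, long offset)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    if (fseek(fp, offset, SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```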
FILE is defined in stdio.h. It holds the information the C library needs to access the open file, but, looking at the code you show, I think you misunderstand its purpose. The C library creates it and fills it in, working with the operating system; it is not stored in the file itself.

Resources