Where is the FILE struct allocated? - c

In C, when opening a file with
FILE *fin;
fin=fopen("file.bin","rb");
I only have a pointer to a FILE structure. Where is the actual FILE struct allocated on a Windows machine? And does it contain all the information necessary for accessing the file?
My aim is to dump the whole data segment to disk and then to reload the dumped file back to the beginning of the data segment. The code that reloads the dumped file is placed in a separate function, so the fin pointer is local, lives on the stack, and is therefore not overwritten on reload. But the FILE struct itself is not local, so I take care not to overwrite the memory region of size sizeof(FILE) that starts at the address fin.
The
fread(DataSegStart,1,szTillFin,fin);
fread(dummy,1,sizeof(FILE),fin);
fread(DataSegAfterFin,1,szFinTillEnd,fin);
operations complete successfully, but I get an assertion failure on
fclose(fin)
Do I overwrite some other necessary file data other than in the FILE struct?

The actual instance of the FILE structure lives within the standard library. Typically the standard library allocates some number of FILE structures, which may or may not be a fixed number. When you call fopen(), it returns a pointer to one of those structures.
The data within the FILE structure likely contains pointers to other things such as buffers. You're unlikely to be able to save and restore those structures to disk without some really deep integration with your standard library implementation.
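For illustration only, a FILE implementation might look roughly like the sketch below. The field names are invented here; they are not the real layout of MSVC's, glibc's, or any other libc's FILE.

struct illustrative_file {          /* NOT any real libc's FILE; invented for illustration */
    int   fd;                       /* OS-level file descriptor or handle  */
    char *buffer;                   /* separately allocated I/O buffer     */
    char *buf_pos;                  /* current position inside that buffer */
    char *buf_end;                  /* end of the buffered data            */
    int   flags;                    /* open mode, error/EOF state, ...     */
};

Copying sizeof(FILE) bytes to disk and back preserves pointer values like buffer above, but not the memory they point to, so after a reload those pointers can easily be stale; that is one plausible reason for the assertion inside fclose().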
You may be interested in something like CryoPID which does process save and restore at a different level.

It seems like you're trying to do something dangerous that is unlikely to work.
fopen allocates a FILE structure and initializes it. fclose releases it. How it allocates it and what it puts in it is implementation dependent. It could contain a pointer to another piece of memory, which is also allocated somewhere (since it's buffered I/O, I guess it does allocate a buffer somewhere).
Writing code that relies on the internals of fopen is dangerous, most likely won't work, and surely won't be stable and portable.

Well, you have a pointer to a FILE object, so technically you know where it is, but you should be aware that FILE is deliberately an opaque type. You shouldn't need to know what it contains; you just need to know that you can pass it to functions that know about it to perform certain actions. FILE may not even be a complete type, so sizeof(FILE) might not be meaningful, and the object may contain pointers to other structures. Simply avoiding overwriting the FILE object itself is unlikely to be enough to keep you from corrupting the program when you write over most of its memory.

FILE is defined in stdio.h. It holds the bookkeeping information about an open file but, looking at the code you show, I think you misunderstand its purpose. The C library creates it when you open the file and, together with the operating system, fills it with information about that file; the structure is not stored in the file itself.

Related

FILE in C and different output

#include <stdio.h>

int main(void){
    FILE *fp = fopen("loop.txt", "r");
    printf("%p\n", (void *)fp);   /* %p expects a void pointer */
    return 0;
}
Output:
Run 1 : 0x101d010
Run 2 : 0x13f9010
Run 3 : 0xeaf010
Why is the output different every time?
The fopen() function call returns a pointer to a FILE structure that 'describes' the file, in terms of what the operating system needs in order to access that file on disk. That FILE structure will be located somewhere in memory (allocated at run-time); the actual location (address) of that memory block will vary between different runs of the program - which is exactly why you need to keep track of it in your fp (pointer) variable.
All other calls to library functions (such as fwrite(), fread() and fclose()), which access that file, will need that fp variable as a parameter; this indicates to the functions (and to the system) which file object you are working with.
To give an authoritative and detailed explanation about why your program receives a different address in the file pointer, each time you run it, would require equally detailed and authoritative knowledge of your system's implementation of the fopen() call (and related I/O support code) – and that is knowledge that I don't have.
However, here are two possible explanations:
Each time you call fopen(), the system allocates space for the required FILE structure by calling malloc(sizeof(FILE)); this will return the address of the first available chunk of system memory of sufficient size, which will clearly vary between runs, depending on what other programs and/or services are using the system's memory pool.
The I/O subsystem has a fixed, internal table of FILE structures, each with its (fixed) starting address; when you call fopen(), the system assigns the first available table entry to your opened file and the function returns the address of that. But this can also vary between runs, depending on what other programs/services are using entries in that table.
If I had to make a guess (and that's all it would be), the large differences between the addresses you show in your example would tend to make me favour the first possibility. But there are numerous other ways your system could handle the task.
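You can see the same effect with any heap allocation, no files involved. A minimal sketch (assuming a system with heap and address-space randomization, as modern Linux has):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *p = malloc(64);      /* any dynamic allocation */
    printf("%p\n", p);         /* the printed address typically changes per run */
    free(p);
    return 0;
}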

Understanding file pointers and buffers

Why, in C, do you need a separate buffer to read a FILE *? When you declare a FILE * and assign to it with fopen, does the file then not exist in contiguous memory starting at the address of said pointer? I'm struggling to make the connection as to why you need to read via fread() into a separate buffer. If someone could explain how FILE *file = fopen(filename, "r") and the subsequent fread(&buffer,...) work in conjunction, it would help my understanding tremendously. Thanks in advance.
The FILE * returned by fopen is an unnecessary, but useful, layer of indirection.
Theoretically, fopen could have been designed to read the whole file into a buffer in memory, and just return you that buffer.
The issue with that approach is that it's not flexible at all. It forces you to read the entire file for all file IO operations, which is very undesirable. For example, here are some problems that would come about:
How could you read a file that's too big to fit in RAM?
What if you just want to append a new line at the end of a file (such as for logging)? You would have to read the whole file, append the line at the end, and then write the entire file back. Expensive!
What if you're only interested in reading a small part of a file, such as reading the magic number to identify the file's type, without regard for its actual content?
What if you wanted to edit the file from multiple programs simultaneously? Each program would need to constantly reread the whole file into memory to make sure its copy stayed up to date.
fopen returns a file handle that identifies a file still on disk. How much you read out of this file into memory is entirely up to you.
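As a small illustration of that last sentence, here is a sketch that reads only a file's first four bytes (a typical "magic number" check) and never touches the rest; the file name is just an example:

#include <stdio.h>

int main(void) {
    unsigned char magic[4];
    FILE *fp = fopen("image.png", "rb");    /* example file name */
    if (fp == NULL)
        return 1;
    if (fread(magic, 1, sizeof magic, fp) == sizeof magic)
        printf("%02x %02x %02x %02x\n", magic[0], magic[1], magic[2], magic[3]);
    fclose(fp);                             /* the rest of the file was never read */
    return 0;
}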
The explanation given above is pretty much self-explanatory, but I'll still try to make it simple (in case anyone has trouble understanding it).
In short, consider these two points and you will see why:
1) Your files might be large and stored on your hard drive; if you read them frequently, loading the whole file again and again is a lot of overhead.
2) Worse, if the file is huge, loading all of it consumes your RAM even when you don't need the whole file at once.
Why, in C, do you need a separate buffer to read a FILE *?
First of all, because reading into a buffer and then working on the data there is generally faster.
does the file then not exist in contiguous memory starting at the
address of said pointer?
It may or may not, depending on its size.
If someone could explain how the FILE *file = fopen(filename, "r") and
the subsequent fread(&buffer,...) work in conjunction
The fopen() function is used to open a file and associates an I/O stream with it. It takes two arguments: the first is a pointer to a string containing the name of the file to be opened, and the second is the mode in which the file is to be opened.
The possible modes include r, r+, w, w+, a, and a+.
The fopen() function returns a FILE stream pointer on success while it returns NULL in case of a failure.
Look here for detailed info.
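A minimal open/check/close sketch of that contract (the file name is arbitrary):

#include <stdio.h>

int main(void) {
    FILE *fp = fopen("data.txt", "r");
    if (fp == NULL) {          /* fopen() returns NULL on failure */
        perror("fopen");
        return 1;
    }
    /* ... fread()/fscanf() calls would go here ... */
    fclose(fp);
    return 0;
}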
Why, in C, do you need a separate buffer to read a FILE *?
No, buffers aren't strictly necessary, but they are usually present to accelerate I/O.
When you declare a FILE * and assign to it with fopen, does the file
then not exist in contiguous memory starting at the address of said
pointer?
Certainly not, because this would be inefficient at best (why read an entire huge file if it is not needed in the end?) and impossible at worst (RAM is usually much smaller than disk).
If someone could explain how the FILE *file = fopen(filename, "r") and
the subsequent fread(&buffer,...) work in conjunction it would help my
understanding tremendously.
So: a FILE * is not a handle to a memory object that contains the file's data; it is a memory object that contains data to help access the file's data on disk. That opaque object (opaque means: don't try to look inside, the details are hidden) contains, for example, the current offset (remember that reads and writes happen at a given offset and advance it accordingly), the open mode (so that writing to a file opened for reading fails correctly), some buffer (which may contain part of the file, and sometimes the whole file!), and so on. Think of a FILE * as a handle, like the handle of a door. Don't confuse a file with a FILE *: the first is the generic term for what you already know (data on disk), the second is a type representing an opened file, a dynamic object used to manipulate a given file.
I'm struggling to make the connection as to why you need to read
via fread() into a separate buffer.
If you don't have (or can't have) the whole file in memory, then you need to ask to read just the part you are interested in.
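For example, here is a sketch that pulls out just 16 bytes starting at offset 1024 (both numbers are arbitrary), leaving the rest of the file untouched on disk:

#include <stdio.h>

int main(void) {
    char chunk[16];
    FILE *fp = fopen("big.bin", "rb");      /* example file name */
    if (fp == NULL)
        return 1;
    if (fseek(fp, 1024L, SEEK_SET) == 0 &&  /* move the stream position */
        fread(chunk, 1, sizeof chunk, fp) == sizeof chunk) {
        /* work with chunk[] here */
    }
    fclose(fp);
    return 0;
}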

What is the concept behind file pointer or the stream pointer?

I know that a pointer is a variable that stores the address of another variable. So I understood the concepts of char pointers, integer pointers, what happens when we add one to a pointer, and so on. But I didn't get the real concept behind a file pointer. Why can't we point to the data directly, as we do in the case of a character data type? For example, consider a file with the content:
Hello World
fantastic
Let 'ptr' point to this file. Why can't we use ptr to point to 'H', (ptr+1) to 'e', (ptr+2) to 'l', and so on? If my question is stupid, forgive me; sometimes things become clear only once I understand the real concept. I think this file is actually stored in memory just like a string is stored in memory.
(I know fscanf() function and all)
There's something called a memory-mapped file, but that apart, you can achieve what you want (if I understood it correctly) simply by opening the file and loading it into a buffer (which is, by the way, a common way of reading data from files).
Once it is in memory, you access the first byte with *buf, the second with *(buf+1), and so on; or, usually better because clearer, with buf[0], buf[1], and so on.
Why can't you do it without a memory-mapped file? Because what you get when you open a file in C (using fopen) is an opaque pointer, i.e. a pointer to data unknown to you, which you must consider a "concept" rather than actual data you can read. It allows other functions (fread, fwrite, fseek, and so on) to operate on the file you opened, but the pointer does not "contain" the bytes of the file. It is sometimes called a handle for a reason: it makes it possible to "handle" the file.
Using that opaque FILE * pointer, you can read bytes from the file into memory, and then process the data in memory.
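A sketch of that common read-it-all pattern, assuming the file fits comfortably in memory:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *fp = fopen("hello.txt", "rb");    /* example file name */
    if (fp == NULL)
        return 1;
    fseek(fp, 0L, SEEK_END);
    long size = ftell(fp);                  /* file length in bytes */
    rewind(fp);
    char *buf = malloc((size_t)size);
    if (size > 0 && buf != NULL && fread(buf, 1, (size_t)size, fp) == (size_t)size) {
        /* now buf[0], buf[1], ... are the file's bytes */
        printf("first byte: %c\n", buf[0]);
    }
    free(buf);
    fclose(fp);
    return 0;
}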
I think you're talking in the context of the C language. No, the file is not actually stored in memory. Think of the file pointer as an arrow showing how far you are in the process of reading that file. This means that if you now do a read operation, the pointer tells you which char/int etc. you will read from the file, i.e. where you currently are in it. That is what the pointer is for. This is my way of roughly and informally explaining it.
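One way to watch that arrow move (just a small sketch):

#include <stdio.h>

int main(void) {
    FILE *fp = fopen("hello.txt", "r");     /* example file name */
    if (fp == NULL)
        return 1;
    printf("position before: %ld\n", ftell(fp));   /* 0: at the start */
    fgetc(fp);                                     /* read one character */
    printf("position after:  %ld\n", ftell(fp));   /* 1: the arrow moved forward */
    fclose(fp);
    return 0;
}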
I suppose the basic reason I wouldn't expect ptr+1 to give me the second character of the file is that generally, in C, pointer arithmetic moves you by one object, not one byte; so I would expect ptr+1 to point to the "next" file, whatever that means (if anything).
And files are generally stored on disk, not in memory.
The file is not stored in memory. It can be brought to memory (or parts of it) when you open it. Files are not part of your program's data, they're just an entity you can use with the help of the operating system.
There is a lot more behind files compared to regular character arrays in memory. Reading from and writing to files is generally buffered; this is handled through the standard C library's FILE structure, which lets you invoke operations on a file.
And what would it even mean to have a "pointer to a file"? You see, using ptr+1 to scan through the file is not a good choice for many reasons. If it's binary data, what exactly do you expect ptr+1 to give you? What if you wanted to read larger chunks of data, like a line at a time?
As you can see, there are several reasons for this choice, the main one being that files are not laid out in memory in your program's address space like regular variables. A structure describing the file and your cursor position in it is the most common approach.
Another important point to note is that the semantics of ptr+1 make sense only for the language's built-in types. A file is not a built-in type, and it wouldn't make sense for it to be.

Make FILE* struct map to a buffer?

I am working with some legacy code which uses something like this:
void store_data(FILE *file);
However, I don't want to store data on the disk, I want to store it in memory (char *buf). I could edit all of the code, but it jumps around a lot and fwrite gets called on the file all over the place. So is there an easier way, for example a way to map a FILE* object to an (auto-growing) buffer? I do not know the total size of the data beforehand.
The solution has to be portable.
There is no way to do this using only the facilities provided in the C standard. The closest you can come is
FILE *scratch = tmpfile();
...
store_data(scratch);
...
/* after you're completely done calling the legacy code */
rewind(scratch);
buf = read_into_memory_buffer(scratch);
fclose(scratch);
This does hit the disk, at least potentially, but I'd say it's your best bet if you need wide portability and can't modify the "legacy code".
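For reference, read_into_memory_buffer() above is just a placeholder. One plausible way to write it (a sketch with minimal error handling; a real version would probably also report the size to the caller):

#include <stdio.h>
#include <stdlib.h>

static char *read_into_memory_buffer(FILE *fp)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
        return NULL;
    long size = ftell(fp);
    if (size < 0)
        return NULL;
    rewind(fp);
    char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf != NULL && fread(buf, 1, (size_t)size, fp) != (size_t)size) {
        free(buf);
        return NULL;
    }
    return buf;
}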
In POSIX.1-2008, there is open_memstream, which does exactly what you want; however, this revision of POSIX is not yet widely adopted. GNU libc (used on Linux and a few others) has it, but it's not available on OSX or the *BSDs as far as I know, and certainly not on Windows.
You might want to look at fmemopen and open_memstream. They do something in the direction of what you want.
From the man page:
The open_memstream() function opens a stream for writing to a buffer. The buffer is dynamically allocated (as with malloc(3)), and automatically grows as required. After closing the stream, the caller should free(3) this buffer.
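A hedged sketch of how open_memstream() could be wired up with the legacy function; the store_data() body here is just a stand-in so the example is self-contained:

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

static void store_data(FILE *file)          /* stand-in for the legacy function */
{
    fputs("example payload\n", file);
}

int main(void) {
    char  *buf  = NULL;
    size_t size = 0;
    FILE  *mem  = open_memstream(&buf, &size);   /* stream backed by a growing buffer */
    if (mem == NULL)
        return 1;
    store_data(mem);            /* legacy code writes to a FILE* as usual */
    fclose(mem);                /* finalizes buf and size */
    printf("captured %zu bytes\n", size);
    free(buf);
    return 0;
}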
I don't know if it's a good idea, but it's an idea.
You can "redefine" fwrite using a macro.
#define fwrite(ptr, size, nmemb, stream) your_memory_write_function(ptr, size, nmemb, stream)
Then implement your_memory_write_function() to write the data to your auto-growing buffer instead of to a file.
You will need to call store_data with a pointer to something else though (not a pointer to FILE). But that's possible with C so you will have no issues there.
On what platform are you running? Can't you use tmpfs? If you open a file on tmpfs, is it not, from the point of view of the kernel, the same as a regular file, but written to memory?
You may want to look into fmemopen(). If that's not available to you, then you could possibly use a named shared memory segment along with fdopen() to convert the file descriptor returned by shm_open() to a FILE*.
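For completeness, a sketch of fmemopen(), which goes the other way when the buffer size is known up front (POSIX, not standard C):

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

int main(void) {
    char buf[256];                                /* fixed-size backing buffer */
    FILE *mem = fmemopen(buf, sizeof buf, "w+");  /* FILE* that writes into buf */
    if (mem == NULL)
        return 1;
    fprintf(mem, "hello from memory\n");
    fflush(mem);                                  /* make the bytes visible in buf */
    printf("%s", buf);
    fclose(mem);
    return 0;
}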

Storing struct array in kernel space, Linux

I believe I may be over-thinking this problem a bit... I've got a text file located on my filesystem which I am parsing at boot and storing the results into an array of structs. I need to copy this array from user space into kernel space (copy_from_user), and must have this data accessible by the kernel at any time. The data in kernel space will need to be accessed by the Sockets.c file. Is there a special place to store an array within kernel space, or can I simply add a reference to the array in Sockets.c? My C is a bit rusty...
Thanks for any advice.
I believe there are two main parts in your problem:
Passing the data from userspace to kernelspace
Storing the data in the kernelspace
For the first issue, I would suggest using a Netlink socket, rather than the more traditional system call (read/write/ioctl) interface. Netlink sockets allow configuration data to be passed to the kernel using a socket-like interface, which is significantly simpler and safer to use.
Your program should perform all the input parsing and validation and then pass the data to the kernel, preferably in a more structured form (e.g. entry-by-entry) than a massive data blob.
Unless you are interested in high throughput (megabytes of data per second), the netlink interface is fine. The following links provide an explanation, as well as an example:
http://en.wikipedia.org/wiki/Netlink
http://www.linuxjournal.com/article/7356
http://linux-net.osdl.org/index.php/Generic_Netlink_HOWTO
http://www.kernel.org/doc/Documentation/connector/
As far as the array storage goes, if you plan on storing more than 128KB of data you will have to use vmalloc() to allocate the space, otherwise kmalloc() is preferred. You should read the related chapter of the Linux Device Drivers book:
http://lwn.net/images/pdf/LDD3/ch08.pdf
Please note that buffers allocated with vmalloc() are not suitable for DMA to/from devices, since the memory pages are not contiguous. You might also want to consider a more complex data structure like a list if you do not know how many entries you will have beforehand.
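A rough kernel-side sketch of that size-based choice (the threshold follows the advice above; names are invented and error handling is trimmed):

#include <linux/slab.h>      /* kmalloc(), kfree() */
#include <linux/vmalloc.h>   /* vmalloc(), vfree() */
#include <linux/types.h>
#include <linux/errno.h>

static void *table;          /* backing storage for the parsed entries */
static bool table_is_vmalloc;

static int alloc_table(size_t nbytes)
{
    if (nbytes > 128 * 1024) {                /* large: physically contiguous pages not needed */
        table = vmalloc(nbytes);
        table_is_vmalloc = true;
    } else {                                  /* small: kmalloc() is preferred */
        table = kmalloc(nbytes, GFP_KERNEL);
        table_is_vmalloc = false;
    }
    return table ? 0 : -ENOMEM;
}
/* free with vfree() or kfree() to match how the memory was allocated */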
As for accessing the storage globally, you can do it as with any C program:
In a header file included by all the .c files that need to access the data, put something like:
extern struct my_struct *unique_name_that_will_not_conflict_with_other_symbols;
The extern keyword indicates that this declares a variable that is defined in another source file. This makes the pointer accessible to all C files that include this header.
Then in one C file, preferably the one containing the rest of your related code if such a file exists:
struct my_struct *unique_name_that_will_not_conflict_with_other_symbols = NULL;
This is the actual definition of the variable declared in the header file.
PS: If you are going to work with the Linux kernel, you really need to brush up on your C. Otherwise you will be in for some very frustrating moments and you WILL end up sorry and sore.
PS2: You will also save a lot of time if you at least skim through the whole Linux Device Drivers book. Despite its name and its relative age, it has a lot of information that is both current and important when writing any code for the Linux Kernel.
You can just define an extern pointer somewhere in the kernel (say, in the sockets.c file where you're going to use it). Initialise it to NULL, and include a declaration for it in some appropriate header file.
In the part of the code that does the copy_from_user(), allocate space for the array using kmalloc() and store the address in the pointer. Copy the data into it. You'll also want a mutex to be locked around access to the array.
The memory allocated by kmalloc() will persist until freed with kfree().
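A hedged sketch of that pattern, reusing the my_struct name from the answer above (the fields, the function, and the other identifiers here are invented, and real code needs stricter validation of the user-supplied count):

#include <linux/slab.h>
#include <linux/mutex.h>
#include <linux/uaccess.h>   /* copy_from_user() */
#include <linux/errno.h>

struct my_struct { int key; int value; };        /* placeholder layout */

struct my_struct *shared_table;                  /* the pointer declared extern in a header */
static size_t shared_table_entries;
static DEFINE_MUTEX(shared_table_lock);          /* guards shared_table and its count */

static int load_table(const void __user *ubuf, size_t nentries)
{
    struct my_struct *tmp = kmalloc(nentries * sizeof(*tmp), GFP_KERNEL);
    if (!tmp)
        return -ENOMEM;
    if (copy_from_user(tmp, ubuf, nentries * sizeof(*tmp))) {
        kfree(tmp);
        return -EFAULT;
    }
    mutex_lock(&shared_table_lock);
    kfree(shared_table);                         /* kfree(NULL) is a no-op */
    shared_table = tmp;
    shared_table_entries = nentries;
    mutex_unlock(&shared_table_lock);
    return 0;
}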
Your question is basic and vague enough that I recommend you work through some of the exercises in this book. The whole of chapter 8 is dedicated to allocating kernel memory.
Declaring the array as a global variable in your kernel module will keep it accessible for as long as the kernel is running, i.e. until your system shuts down.
