What is the concept behind file pointer or the stream pointer? - c

I know that a pointer is a variable that stores the address of another variable. So I understand char pointers, int pointers, what happens when we add one to a pointer, and so on. But I don't get the real concept behind a file pointer. Why can't we point directly into a file as we do with character data? For example, consider a file with this content:
Hello World
fantastic
Let 'ptr' point to this file. Why can't we use ptr to point to 'H', (ptr+1) to 'e', (ptr+2) to 'l' and so on? Forgive me if the question is stupid; it might become clear once I understand the real concept. I think this file is actually stored in memory just like a string is stored in memory.
(I know the fscanf() function and so on.)

There is such a thing as a memory-mapped file, but that aside, you can achieve what you want (if I understood correctly) simply by opening the file and loading it into a buffer, which is a common way of reading data from files.
Once it is in memory, you access the first byte with *buf, the second with *(buf+1) and so on; or, usually better since it is clearer, with buf[0], buf[1] and so on.
Why can't you do this without a memory-mapped file? Because what you get when you open a file in C (using fopen) is an opaque pointer, i.e. a pointer to data whose layout is unknown to you; treat it as a "concept" rather than as data you can read. It lets other functions (fread, fwrite, fseek, and so on) operate on the file you opened, but the pointer does not "contain" the bytes of the file. It is sometimes called a handle for a reason: it makes it possible to "handle" the file.
Using that opaque FILE* pointer, you can read bytes from the file into memory and then process the data there.

I think you're talking in the context of the C language. No, the file is not actually stored in memory. Think of the file pointer as an arrow that shows how far you have got in reading the file. If you now do a read operation, it tells you which char, int, etc. you will read next, i.e. where you currently are in the file. This is what the pointer is for. That is my rough, informal way of explaining it.

I suppose the basic reason I wouldn't expect ptr+1 to give me the second character of the file is that generally, in C, pointer arithmetic moves you by one object, not one byte; so I would expect ptr+1 to point to the "next" file, whatever that means (if anything).
And files are generally stored on disk, not in memory.

The file is not stored in memory. It can be brought to memory (or parts of it) when you open it. Files are not part of your program's data, they're just an entity you can use with the help of the operating system.
There is a lot more behind files than behind regular character arrays in memory. Reading from and writing to files is generally buffered; this is handled by the standard C library's FILE structure, which lets you invoke operations on a file.
And what would it even mean to have a "pointer to a file"? You see, using ptr+1 to scan through the file is not a good choice for many reasons. If it's binary data, what exactly do you expect from ptr+1? What if you wanted to read larger chunks of data, like a line at a time?
As you can see, there are several reasons for this choice, the main one being that files are not laid out in memory in your program's address space like regular variables. A structure describing the file and your cursor position is the most common approach.
Another important point to note is that the semantics of ptr+1 only make sense for a pointer to an object in your program's memory, typically an element of an array. A file is not such an object, and it wouldn't make sense for it to be.

Related

What is the difference between a pointer to a buffer and a pointer to a file?

In Chapter 22 of "C Programming: A Modern Approach", the basics of the <stdio.h> header are explained.
One detail that has me a little confused is the difference between a pointer to a buffer and a pointer to a file (denoted as FILE *).
Consider the following (through which the confusion is derived):
fopen is prototyped as: FILE *fopen(const char * restrict filename, const char * restrict mode).
fflush is prototyped as int fflush (FILE *stream). fflush is described as a function that flushes a file's buffer.
setvbuf is prototyped as int setvbuf (FILE * restrict stream, char * restrict buf, int mode, size_t size). When the author describes this function, he refers to the second parameter (buf) as the address of the buffer...which presumably is the same idea as pointer to buffer.
Firstly, from what I understand (especially given the name choice of the first parameter in fflush and setvbuf), a stream is semantically equivalent to a pointer to a file. Importantly, therefore, a stream IS NOT the file itself. A stream is the location of the file, at least as is represented through virtual memory (please correct if this is off base).
Secondly, when one opens a file, this amounts to creating a corresponding buffer (that is also represented in virtual memory).
At first, because of fflush's prototype, I was under the impression that the pointer to a file was in practice the pointer to a buffer; this is clearly wrong given the prototype of setvbuf (which has distinct parameters for the pointer to a file and the address of the buffer). So what exactly is the pointer to file pointing to?
Further, how does one acquire the address associated with a given file's buffer? (The author has not yet shown a function that returns the address of the buffer associated with the file that was opened.)
Any insight is greatly appreciated. Cheers~
The terms “stream” and “file” are a bit muddled in C. A file is something outside the program, and it may be a physical device, a file on disk, or some other thing provided by the operating system.
A stream is, roughly, an interface to a file. It is largely constructed in the C environment by using various data structures to remember information about the file it is connected to, to hold data being written to or read from the file, and so on.
For historic reasons, a stream is managed through a data structure type called FILE. A FILE * is actually a pointer to a stream (or, more technically, a pointer to the data used to control a stream). The data in a FILE includes a file position indicator, a pointer to its associated internal buffer (not anything you should use), and information about errors that have occurred or whether the end of the file has been reached. It would be better if the name were STREAM instead of FILE, but we are stuck with FILE due to history.
A buffer is often an array of char or unsigned char used to hold data being moved between various things, although there can be buffers of other types. The buf argument to setvbuf is used to provide a buffer to be used with the stream. This is not a commonly used routine. Passing an array to setvbuf gives the array to the C library to use for that stream. The program should stop using the array for any other purpose until it closes that stream. This is different from an array you use to read or write characters using the other functions like getchar or fputc.

Pointer alignment issue

I have the content of a file already loaded in memory and I want to assign the data from the file to a convenient set of structs, and I don't want to allocate new memory.
So I have the pointer of the memory where the data from the file starts, from there I work down this pointer assigning the values to different structs but then I reach a point where the program crashes.
//_pack_dynamic is the pointer to the data in memory
us *l_all_indexes = (us *) _pack_dynamic; //us is an unsigned short
printf("Index 0:%d", l_all_indexes[0]); //here is where the program crashes
_pack_dynamic += sizeof(us) * m_number_of_indexes;
The data, at least for the first element, is there, I can get it out like so:
us temp;
memcpy(&temp, _pack_dynamic, sizeof(us));
Any idea how I could extract all the indexes (m_number_of_indexes) from _pack_dynamic and assign them to l_all_indexes without allocating new memory?
Accessing _pack_dynamic as if it contained us object(s) has undefined behaviour unless it actually does contain such objects. (This is a slight simplification, but a good rule of thumb: an array of char certainly cannot be accessed as short.)
The memcpy way into a proper us object is the only standard way to interpret memory as an object. Another approach for integers is to read char by char and shift-mask-or them together. This approach allows assuming a particular endianness instead of native.
A system-dependent way that might work is to make sure that _pack_dynamic is aligned to the boundary required by us. But even then, the standard gives you no guarantees about behaviour.
"Allocating" an automatic variable has hardly any runtime overhead. Allocating a few bytes for a short is usually insignificant.

C: Synchronising two file pointers to the same file

I need two file pointers (FILE *) to operate alongside each other. One is to apply append operations and another is for reading and overwriting.
I need appends to the file from one pointer to be recognised by the other file pointer so that the other file pointer can both correctly read and overwrite this appended data.
To synchronise the data, it appears that using fflush() on the appending file pointer works (at least for reading it does), but is this the correct way to achieve what I want and is it portable?
Thank you.
You should be able to do that with one pointer (and thus avoid syncing unnecessarily). Just use fseek(f, 0, SEEK_END); when you want to add at the end. Open the file with "rb+" to make it both readable and writable.
As long as you don't use multiple threads to access the file, this should work just fine.

Where is the FILE struct allocated?

In C, when opening a file with
FILE *fin;
fin=fopen("file.bin","rb");
I only have a pointer to a FILE structure. Where is the actual FILE struct allocated on a Windows machine? And does it contain all the necessary information for accessing the file?
My aim is to dump the whole data segment to disk and then to reload the dumped file back to the beginning of the data segment. The code that reloads the dumped file is placed in a separate function. This way, the fin pointer is local and is on the stack, thus is not being overwritten on reload. But the FILE struct itself is not local. I take care not to overwrite the memory region of size sizeof(FILE) that starts at the address fin.
The
fread(DataSegStart,1,szTillFin,fin);
fread(dummy,1,sizeof(FILE),fin);
fread(DataSegAfterFin,1,szFinTillEnd,fin);
operations complete successfully, but I get an assertion failure on
fclose(fin)
Do I overwrite some other necessary file data other than in the FILE struct?
The actual instance of the FILE structure exists within the standard library. Typically the standard library allocates some number of FILE structures, which may or may not be a fixed number of them. When you call fopen(), it returns a pointer to one of those structures.
The data within the FILE structure likely contains pointers to other things such as buffers. You're unlikely to be able to save and restore those structures to disk without some really deep integration with your standard library implementation.
You may be interested in something like CryoPID which does process save and restore at a different level.
It seems like you're trying to do something dangerous, unlikely to work.
fopen allocates a FILE structure and initializes it. fclose releases it. How it allocates it and what it puts in it is implementation dependent. It could contain a pointer to another piece of memory, which is also allocated somewhere (since it's buffered I/O, I guess it does allocate a buffer somewhere).
Writing code that relies on the internals of fopen is dangerous, most likely won't work, and surely won't be stable and portable.
Well, you have a pointer to a FILE object, so technically you know where it is, but you should be aware that FILE is deliberately an opaque type. You shouldn't need to know what it contains; you just need to know that you can pass it to functions that know about it to perform certain actions. Additionally, FILE may not be a complete type, so sizeof(FILE) might not be correct, and the object might contain pointers to other structures. Simply avoiding overwriting the FILE object is not likely to be sufficient to avoid corrupting the program when you write over most of its memory.
FILE is defined in stdio.h. It contains all the information about the file but, looking at the code you show, I think you don't understand its purpose. It is created and managed by the C library, working with the operating system, which fills the FILE with information about the file; it is not contained in the file itself.

Why must we convert everything to a char* when we write it to a file?

Out of curiosity, why are we converting something to a char* when we write it to a ( binary ) file?
Because the typical I/O functions that write take a pointer to char. Someone considered that the most representative way of describing the data stored in a binary file: it's just a sequence of the machine's smallest addressable units. The C type name for that is char.
char* merely represents a pointer to the beginning of a sequence of bytes, which is exactly what one expects a binary file to contain.
Unwind and vezult have already answered your question, and I assume you know what a pointer is. But just in case you think of converting something to a char* as an operation that actually somehow changes your data in memory (and, for example, may take more time if there's a lot of data), then note that such is not the behavior of getting a pointer.
Are you talking about fread() and fwrite()? The data they read or write are passed as void* (or const void*), so you don't have to convert.
But in C++ when you use, say, istream::read(), then the pointer to the reception buffer must be passed as a char*, so there is no implicit conversion.

Resources