Confused about using ftell() to check if a file is empty - C

I want to add a structure to a binary file, but first I need to check whether the file has previous data stored in it. If not, I can add the structure; otherwise I'll have to read all the stored data and stick the structure in its correct place. But I got confused about how to check if the file is empty. I thought about trying something like this:
size = 0;
if (fp != NULL)
{
    fseek(fp, 0, SEEK_END);
    size = ftell(fp);
    rewind(fp);
}
if (size == 0)
{
    // print your error message here
}
But if the file is empty or doesn't exist yet, how can the file pointer not be NULL? What's the point of using ftell() if I can simply do something like this:
if (fp == NULL)
{
    fp = fopen("data.bin", "wb");
    fwrite(&my_record, sizeof my_record, 1, fp);  /* my_record: some struct variable */
    fclose(fp);
}
I know that NULL can be returned in other cases, such as protected files, but I still can't understand how using ftell() is effective when file pointers will always be NULL if the file is empty. Any help will be appreciated :)

i need to check whether the file has previous data stored in it
There might be no portable and robust way to do that (the file might change during the check, because other processes are using it). For example, on Unix or Linux, that file might be opened by another process writing into it while your own program is running (and that might even happen between your ftell and your rewind). And your program itself might be running as several processes.
You could use operating system specific functions. For POSIX (including Linux and many Unixes like MacOSX or Android), you might use stat(2) to query the file status (including its size with st_size). But after that, some other process might still write data into that file.
You might consider advisory locking, e.g. with flock(2), but then you adopt the system-wide convention that every program using that file would lock it.
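For illustration, a minimal sketch of that convention using flock(2) (POSIX-specific; "data.bin" is a placeholder name, and this only keeps out programs that also take the lock):

#include <stdio.h>
#include <sys/file.h>   /* flock(), LOCK_EX, LOCK_UN */

int main(void)
{
    FILE *fp = fopen("data.bin", "r+b");       /* "data.bin" is a placeholder */
    if (fp == NULL)
        return 1;
    if (flock(fileno(fp), LOCK_EX) == 0) {     /* blocks until the lock is granted */
        /* ... safely check the size and read/write here ... */
        flock(fileno(fp), LOCK_UN);
    }
    fclose(fp);
    return 0;
}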
You could use some database with ACID properties. Look into sqlite or into RDBMS systems like PostgreSQL or MariaDB. Or an indexed file library like gdbm.
You can continue coding with the implicit assumption (but be aware of it) that only your program is using that file, and that your program has at most one process running it.
if the file is empty [...] how can the file pointer not be NULL ?
As Increasingly Idiotic answered, fopen can fail, but it usually doesn't fail on empty files. Of course, you need to handle fopen failure (see also this). So most of the time your fp would be valid, and your code chunk (assuming no other process is changing that file simultaneously) using ftell and rewind is an approximate way to check that the file is empty. BTW, if you read (e.g. with fread or fgetc) something from that file, that read would fail if your file was empty, so you probably don't need to check for emptiness beforehand.
A POSIX-specific way to query the status (including the size) of an fopen-ed file is to use fileno(3) and fstat(2) together, like fstat(fileno(fp), &mystat), after having declared struct stat mystat;
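A short sketch of that, assuming a POSIX system (error handling kept minimal):

#include <stdio.h>
#include <sys/stat.h>   /* fstat(), struct stat */

int main(void)
{
    FILE *fp = fopen("data.bin", "rb");        /* placeholder name */
    if (fp == NULL)
        return 1;
    struct stat mystat;
    if (fstat(fileno(fp), &mystat) == 0)
        printf("size: %lld bytes\n", (long long)mystat.st_size);
    fclose(fp);
    return 0;
}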

fopen() does not return NULL for empty files.
From the documentation:
If successful, returns a pointer to the object that controls the opened file stream ... On error, returns a null pointer.
NULL is returned only when the file could not be opened. The file could fail to open due to any number of reasons such as:
The file doesn't exist
You don't have permissions to read the file
The file cannot be opened multiple times simultaneously.
More possible reasons are listed in this SO answer
In your case, if fp == NULL, you'll need to figure out why fopen failed and handle each case accordingly. In most cases, an empty file will open just fine and return a non-NULL file pointer.
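For example, a sketch of telling the failure cases apart with errno; note the C standard doesn't promise that fopen sets errno, but POSIX does, and "data.bin" is just a placeholder name:

#include <stdio.h>
#include <errno.h>      /* errno, ENOENT */
#include <string.h>     /* strerror() */

int main(void)
{
    FILE *fp = fopen("data.bin", "rb");
    if (fp == NULL) {
        if (errno == ENOENT)                    /* file doesn't exist yet: create it */
            fp = fopen("data.bin", "wb");
        else                                    /* permissions, too many open files, ... */
            fprintf(stderr, "fopen: %s\n", strerror(errno));
    }
    if (fp != NULL)
        fclose(fp);
    return 0;
}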

Related

Load file content into memory, C

I will be dealing with really huge files, for which I want to load only part of the content into memory. So I was wondering whether the command:
FILE* file=fopen("my/link/file.txt", "r");
loads the whole file content into memory, or is it just a pointer to the content? After I open the file I use fgets() to read it line by line.
And what about fwrite()? Do I need to open and close the file every time I write something so it doesn't get overloaded, or is that managed in the background?
Another thing: is there maybe a nice bash command like time which could tell me the peak memory usage of my executed program? I am using OS X.
As per the man page for fopen(),
The fopen() function opens the file whose name is the string pointed to by path and associates a stream with it.
So, no, it does not load the content of the file into memory or elsewhere.
To operate on the returned file pointer, as you already know, you need to use fgets() and family.
Also, once you open the file and get a pointer, as long as you don't fclose() it you can use the pointer any number of times to write into the file (remember to open the file in append mode). You don't need to open and close it for every read and write.
Also, FWIW, if you want to move the file position back and forth, fseek() can come in handy.
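For example, a typical pattern for reading a huge file line by line, so only one line is ever in memory (the path is taken from the question):

#include <stdio.h>

int main(void)
{
    FILE *file = fopen("my/link/file.txt", "r");
    if (file == NULL)
        return 1;
    char line[4096];                            /* only one line is in memory at a time */
    while (fgets(line, sizeof line, file) != NULL) {
        /* ... process the current line ... */
    }
    fclose(file);
    return 0;
}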
fopen does not load the whole file into memory. It creates a file descriptor for the file - essentially a pointer to its entry in the open file table.
In the open file table there is a pointer to the location of the file on disk.
If you want to go to a different place in the file, use fseek.
Another option is to use mmap. This creates a new mapping in the virtual address space of the calling process, and you can then access the file like an array. (Not all of the file is loaded into memory; the paging mechanism brings in the data on demand.)
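A rough sketch of that mmap approach, assuming a POSIX system (read-only mapping; the kernel faults pages in on demand):

#include <fcntl.h>      /* open() */
#include <unistd.h>     /* close() */
#include <sys/mman.h>   /* mmap(), munmap() */
#include <sys/stat.h>   /* fstat() */

int main(void)
{
    int fd = open("my/link/file.txt", O_RDONLY);
    if (fd < 0)
        return 1;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        /* The file now looks like an array; pages are loaded lazily. */
        char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data != MAP_FAILED) {
            /* ... access data[0] .. data[st.st_size - 1] ... */
            munmap(data, st.st_size);
        }
    }
    close(fd);
    return 0;
}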
fopen does not read the file, fread and fgets and similar functions do.
Personally I've never tried reading and writing a file at the same time.
It should work, though.
You can use multiple file pointers to the same file.
There is no command like time for memory consumption. The simplest way is to look at top. There exist malloc/new replacement libraries which can do that for you.
loads the whole file content into memory or it is just a pointer to the content?
No:
fopen() opens the file with the specified filename and associates it with a stream that can be identified by the FILE pointer.
fread() can be used to get file contents into a buffer.
Multiple read/write operations can be carried out without any need to open the file again each time.
Functions like rewind() and fseek() can be used to change the position of the cursor in the file.

mkstemp() - is it safe to close descriptor and reopen it again?

When generating a temporary file name using mkstemp(), is it safe to immediately call close() on the file descriptor returned by mkstemp(), store the file name generated by mkstemp() somewhere, and use it (at a much later time) to open the file again for writing a temporary file? Or will this temporary file name become available again as soon as I call close() on it?
The reason why I'm asking is that I'm wondering why mkstemp() returns a file descriptor at all. If it is safe to close() the descriptor immediately, why does it return a descriptor at all? mkstemp() could close it then on its own and just give me a file name.
No. In between the time when you use mkstemp() to create the file and the time when you reopen it, your adversary may have removed the file you created and put a symlink in its place pointing to somewhere else altogether. This is a TOCTOU — Time of Check, Time of Use — vulnerability which the use of mkstemp() largely avoids, provided you keep the file descriptor open.
Once you close the file descriptor, all bets are off in a sufficiently hostile environment.
Note that even if you keep the file descriptor open, an adversary might remove the file, or rename it, and then create their own file (or symlink, or directory) in its place. The file descriptor remains valid. You could use stat() to get the information for the name and fstat() to get the information for the file descriptor, and if the two match (the st_dev and st_ino fields), then you're probably still OK. If they differ, someone's got at the file; if you rename it, you may be renaming their file rather than the one you created.
While the file originally created by mkstemp() still exists, the name will not be regenerated. In general, successive calls to mkstemp() will create distinct names anyway, but the name is guaranteed to be unique at the moment of creation (see the O_EXCL flag for open()).
And just in case you're wondering, no — there isn't a way to associate a name with a file descriptor (there is no hypothetical int flink(int fd, const char *name) system call). There was a question about that on one of the Stack Exchange sites a while ago, and the answer was definitely negative, with references to the Linux Kernel mailing list and so on. One such question is Is it possible to recreate a file from an opened file descriptor?, but I think there was a more thorough version of the question too.
The mkstemp function specifically uses descriptors instead of filenames to avoid the race conditions commonly associated with its predecessors such as mktemp. In fact, the "s" in "mkstemp" means "secure", because the race condition can be a source of vulnerability (if you use the temporary file to store JIT code, for example, someone guessing/stomping the file before you open it could cause your application to load and run the provided code rather than the code that your program generates).
Once you close the descriptor, nothing prevents another application from writing a file with the same name, so please don't do that. You should retain the descriptor for as long as the temporary file is needed (and close the descriptor once the temporary file is no longer going to be used by your program).
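A minimal sketch of the recommended pattern, assuming POSIX: keep the descriptor returned by mkstemp() open for the lifetime of the temporary file, wrapping it with fdopen() if a stdio stream is wanted (the template path is a placeholder):

#include <stdio.h>
#include <stdlib.h>     /* mkstemp() */
#include <unistd.h>     /* unlink() */

int main(void)
{
    char name[] = "/tmp/myappXXXXXX";   /* placeholder template; the six X's are replaced */
    int fd = mkstemp(name);
    if (fd < 0)
        return 1;
    FILE *fp = fdopen(fd, "w+");        /* wrap the descriptor; don't reopen by name */
    if (fp == NULL)
        return 1;
    fprintf(fp, "scratch data\n");
    /* ... keep fp open for as long as the temporary file is needed ... */
    fclose(fp);                         /* also closes fd */
    unlink(name);                       /* finally remove the name */
    return 0;
}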

Can a failed fopen impact the filesystem?

If fopen(path, "w") succeeds, then the file will be truncated. If the fopen fails, are there any guarantees that the file is not modified?
No, there are no guarantees about the state of a file if fopen(path, "w") fails. The failure could come from any of the operations involved: opening the file, committing the truncation to disk, and so on. The only guarantee a failure provides is that you don't have access to the file.
The only reason fopen() would fail is if the file is somehow inaccessible or cannot be modified. If you are worried about the file being modified, though, you could instead use the open() call with the flag O_WRONLY. You could then convert the descriptor to a FILE * by using fdopen().
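A sketch of that approach, assuming POSIX ("data.txt" is a placeholder name):

#include <stdio.h>
#include <fcntl.h>      /* open(), O_WRONLY */

int main(void)
{
    /* O_WRONLY alone neither creates nor truncates: if open() fails,
       the file's contents are untouched. */
    int fd = open("data.txt", O_WRONLY);
    if (fd < 0)
        return 1;
    FILE *fp = fdopen(fd, "w");         /* fdopen() itself never truncates */
    if (fp != NULL) {
        /* ... write via fp ... */
        fclose(fp);                     /* also closes fd */
    }
    return 0;
}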
Excellent question, and I think the answer is no. fopen has to allocate a FILE structure, and the natural order of operations when implementing it would be to open the file first, then attempt to allocate the FILE. This way, fopen is just a wrapper around fdopen (or a similar function with some leading underscores for namespace conformance).
Personally, I would not use stdio functions at all when I care about the state of my files after any failure. Even once you have the file open, stdio's buffering makes it almost impossible to know where an error occurred if a write function ever returns failure, and harder still to return your file to a usable, consistent state.

What is Opening a file in C?

In C, when we open a file, what happens? As I understand it, the contents of the file are not loaded into memory when we open a file; it just sets up a file descriptor. So what is this file descriptor? And if the contents of the file are not loaded into memory, then how is the file opened?
Typically, if you're opening a file with fopen or (on a POSIX system) open, the function, if successful, will "open the file" - it merely gives you a value (a FILE * or an int) to use in future calls to a read function.
Under the hood, the operating system might read some or all of the file in, or it might not. You have to call some function to request data to be read anyway, and if the OS hasn't read it by the time you call fread/fgets/read/etc., then it will at that point.
A "file descriptor" typically refers to the integer returned by open in POSIX systems. It is used to identify an open file. If you get a value 3, somewhere, the operating system is keeping track that 3 refers to /home/user/dir/file.txt, or whatever. It's a short little value to indicate to the OS which file to read from. When you call open, and open say, foo.txt, the OS says, "ok, file open, calling it 3 from here on".
This question is not entirely related to the programming language. Although the library does have an impact on what happens when opening a file (using open or fopen, for example), the main behavior comes from the operating system.
Linux, and I assume other OSes, performs read-ahead in most cases. This means that the file is actually read from physical storage even before you call read on the file. This is done as an optimization, reducing the time for the read when the data is actually requested by the user. This behavior can be controlled partially by the programmer, using specific flags for the open functions. For example, the Win32 API CreateFile can take FILE_FLAG_RANDOM_ACCESS or FILE_FLAG_SEQUENTIAL_SCAN to specify random access (in which case the file is not read ahead) or sequential access (in which case the OS will perform quite aggressive read-ahead), respectively. Other OS APIs might give more or less control.
For the basic POSIX API of open, read, and write, the file descriptor is a simple integer that is passed to the OS and identifies the file. In the OS itself this is most often translated to some structure that contains all the needed information for the file (name, path, seek offsets, size, read and write buffers, etc.). The OS will open the file - meaning it finds the specific file system entry (an inode under Linux) that corresponds to the path you've given to the open call, creates the file structure, and returns an ID to the user: the file descriptor. From that point on, the OS is free to read whatever data it sees fit, even if not requested by the user (reading more than was requested is often done, if only to work in the file system's native block size).
C itself has no primitives for file I/O; it all depends on what operating system and what libraries you are using.
File descriptors are just abstractions. Everything is done by the operating system.
If the program uses fopen() then a buffering package will use an implementation-specific system call to get a file descriptor and it will store it in a FILE structure.
The system call (at least on Unix, Linux, and the Mac) will look around on (usually) a disk-based filesystem to find the file. It creates data structures in the kernel memory that collects the information needed to read or write the file.
It also creates a table for each process that links to the other kernel data structures necessary to access the file. The index into this table is a (usually) small number. This is the file descriptor that is returned from the system call to the user process, and then stored in the FILE struct.
As already mentioned, this is OS functionality, but for C file I/O you most probably want info on the fopen function. If you check the description for that function, it says:
Description:
Opens a stream.
fopen opens the file named by filename and associates a stream with it. fopen returns a pointer to be used to identify the stream in subsequent operations.
So on successful completion fopen just returns a pointer to the newly opened stream. And it returns NULL in case of any error.
When you open the file, the file pointer gets the base address (starting address) of that file. Then you use different functions to work on the file.
EDIT:
Thanks to Chris, here is one implementation's definition of the structure named FILE:
typedef struct {
    int            level;   /* fill/empty level of buffer */
    unsigned       flags;   /* File status flags */
    char           fd;      /* File descriptor */
    unsigned char  hold;    /* Ungetc char if no buffer */
    int            bsize;   /* Buffer size */
    unsigned char *buffer;  /* Data transfer buffer */
    unsigned char *curp;    /* Current active pointer */
    unsigned       istemp;  /* Temporary file indicator */
    short          token;   /* Used for validity checking */
} FILE;

Is it ‘safe’ to remove() an open file?

I am thinking about adding the possibility of using the same filename for both the input and output file to my program, so that it will replace the input file.
As the processed file may be quite large, I think the best solution would be to first open the file, then remove it and create a new one, i.e. like this:
/* input == output in this case */
FILE *inf = fopen(input, "r");
remove(output);
FILE *outf = fopen(output, "w");
(of course, with error handling added)
I am aware that not all systems are going to allow me to remove an open file, and that's acceptable as long as remove() fails in that case.
I am worried, though, that there may be some system which will allow me to remove the open file and then fail to read its contents.
The C99 standard specifies the behavior in that case as ‘implementation-defined’; SUS doesn't even mention the case.
What is your opinion/experience? Do I have to worry? Should I avoid such solutions?
EDIT: Please note this isn't supposed to be some mainline feature but rather a ‘last resort’ in case the user specifies the same filename as both the input and output file.
EDIT: OK, one more question then: is it possible that, in this particular case, the solution I proposed can do more evil than just opening the output file write-only (i.e. like above but without the remove() call)?
No, it's not safe. It may work on your file system, but fail on others. Or it may intermittently fail. It really depends on your operating system AND file system. For an in depth look at Solaris, see this article on file rotation.
Take a look at GNU sed's --in-place option. This option works by writing the output to a temporary file and then copying it over the original. This is the only safe, compatible method.
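A sketch of that pattern in C, under the assumption that rename() over the original is acceptable (it is atomic on POSIX filesystems; the filenames are placeholders):

#include <stdio.h>

int main(void)
{
    FILE *inf  = fopen("input.txt", "r");       /* placeholder names */
    FILE *outf = fopen("input.txt.tmp", "w");
    if (inf == NULL || outf == NULL)
        return 1;
    int c;
    while ((c = getc(inf)) != EOF)
        putc(c, outf);                          /* ... transform as needed ... */
    fclose(inf);
    if (fclose(outf) == 0)                      /* flush succeeded: safe to swap */
        rename("input.txt.tmp", "input.txt");
    return 0;
}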
You should also consider that your program could fail at any time, due to a power outage or the process being killed. If this occurs, your original file will be lost. Additionally, on file systems which do have reference counting, you're not saving any space over the temp-file solution, as both files have to exist on disk until the input file is closed.
If the files are huge, space is at a premium, and developer time is cheap, you may be able to open a single file for read/write, and ensure that your write pointer does not advance beyond your read pointer.
All systems that I'm aware of that let you remove open files implement some form of reference-counting for file nodes. So, removing a file removes the directory entry, but the file node itself still has one reference from open file handle. In such an implementation, removing a file obviously won't affect the ability to keep reading it, and I find it hard to imagine any other reasonable way to implement this behavior.
I've always got this to work on Linux/Unix. Never on Windows, OS/2, or (shudder) DOS. Any other platforms you are concerned about?
This behaviour is actually useful for using temporary disk space - open the file for read/write, and immediately delete it. It gets cleaned up automatically on program exit (for any reason, including power outage), and makes it much harder (but not impossible) for others to monitor it (/proc can give clues, if someone has read access to that process).
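A sketch of that idiom ("scratch.bin" is a placeholder name; this relies on POSIX unlink semantics):

#include <stdio.h>

int main(void)
{
    FILE *tmp = fopen("scratch.bin", "w+b");
    if (tmp == NULL)
        return 1;
    remove("scratch.bin");  /* on POSIX, the open stream keeps the data alive */
    fputs("temporary data", tmp);
    rewind(tmp);
    /* ... use tmp as anonymous scratch space ... */
    fclose(tmp);            /* last reference gone: the space is reclaimed */
    return 0;
}

The standard tmpfile() function packages up essentially this trick.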
