Why does fgetc move the file position indicator backwards? - c

A program that runs just fine on my FreeBSD system fails when I build it on Windows (Visual Studio 15). It goes into an endless loop here:
//...
while (1) {
if ('#' == fgetc(f)) {
// we do some stuff here. irrelevant for stackoverflow question
break;
}
fseek(f, -1, SEEK_CUR);
if (0 != fseek(f, -1, SEEK_CUR)) {
// Beginning of file.
break;
}
}
//...
On closer inspection (by adding a bunch of fgetpos() calls) I find that fgetc moves the file position indicator backwards. So it misses the beginning of the file and some '#' if they are not in a multiple-of-3 position from the end.
I notice that this only happens when the file f is opened with
fopen(filename, "a+");
//text mode read/append
When I change it to
fopen(filename, "ab+");
//binary mode read/append
then everything works as expected.
I think for my code it is safe just to use binary mode all the time.
But two questions remain:
Are there reasons that stand against binary mode?
What trickery is this with wrong direction in text mode?

Quoting C11 7.21.9.2 the fseek function:
For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET.
Invoking fseek with a whence argument of SEEK_CUR on a stream open in text mode is not covered by the C Standard. Opening the file in binary mode seems a much better option.
The value returned by fgetpos() may not be meaningful as an offset in the file, it is only meant to be passed as an argument to fsetpos().
As a general remark, you should try to change your algorithms to avoid relying on backwards seeks in the stream; relying on fseek() errors in particular seems unreliable. Instead, save the position before the fgetc() with ftell() or fgetpos() and restore it when needed with fseek(fp, pos, SEEK_SET) or fsetpos().
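As a rough illustration of scanning backwards with absolute offsets rather than relative seeks, here is a minimal sketch. The helper name find_last_hash is hypothetical, and the sketch assumes a stream opened in binary mode on a platform where fseek(f, 0, SEEK_END) gives a meaningful position (the C Standard does not promise this for binary streams).

```c
#include <stdio.h>

/* Hypothetical helper: returns the offset of the last '#' in a stream
 * opened in binary mode, or -1 if there is none or an I/O call fails. */
long find_last_hash(FILE *f)
{
    /* Assumption: seeking to the end of a binary stream is meaningful here. */
    if (fseek(f, 0L, SEEK_END) != 0)
        return -1;

    long pos = ftell(f);
    if (pos < 0)
        return -1;

    /* Walk backwards using absolute positions only, so the loop ends
     * at offset 0 without depending on fseek() failing. */
    while (pos > 0) {
        pos--;
        if (fseek(f, pos, SEEK_SET) != 0)
            return -1;
        if (fgetc(f) == '#')
            return pos;
    }
    return -1;
}
```

Because the offset is tracked in a variable, the beginning of the file is detected by pos reaching 0, not by an error return.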

Related

Is it recommended method for computing the size of a file using fseek()?

In C, we can find the size of file using fseek() function. Like,
if (fseek(fp, 0L, SEEK_END) != 0)
{
// Handle repositioning error
}
So my question is: is using fseek() and ftell() a recommended method for computing the size of a file?
If you're on Linux or some other UNIX like system, what you want is the stat function:
struct stat statbuf;
int rval;
rval = stat(path_to_file, &statbuf);
if (rval == -1) {
perror("stat failed");
} else {
printf("file size = %lld\n", (long long)statbuf.st_size);
}
On Windows under MSVC, you can use _stati64:
struct _stati64 statbuf;
int rval;
rval = _stati64(path_to_file, &statbuf);
if (rval == -1) {
perror("_stati64 failed");
} else {
printf("file size = %lld\n", (long long)statbuf.st_size);
}
Unlike using fseek, this method doesn't involve opening the file or seeking through it. It just reads the file metadata.
The fseek()/ftell() works sometimes.
if (fseek(fp, 0L, SEEK_END) == 0) {
printf("Size: %ld\n", ftell(fp));
}
Problems.
If the file size exceeds about LONG_MAX, long int ftell(FILE *stream) response is problematic.
If the file is opened in text mode, the return value from ftell() may not correspond to the file length. "For a text stream, its file position indicator contains unspecified information," C11dr §7.21.9.4 2
If the file is opened in binary mode, fseek(fp, 0L, SEEK_END) is not well defined. "Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state." C11dr footnote 268. @Evert This most often applies to earlier platforms than today, but it is still part of the spec.
If the file is a stream like a serial input or stdin, fseek(file, 0, SEEK_END) makes little sense.
The usual solution to finding file size is a non-portable, platform-specific one. Example good answer: @dbush.
Note: If code attempts to allocate memory based on file size, the memory available can easily be exceeded by the file size.
Due to these issues, I do not recommend this approach.
Typically the problem should be re-worked to not need to find the file size, but to grow the data as more input is processed.
LL disclaimer: Note that C spec footnotes are informative and so not necessarily normative.
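The suggestion above to grow the data as more input is processed could look roughly like the following sketch. The helper name read_all, the initial capacity of 4096 bytes, and the doubling policy are illustrative assumptions, not part of the answer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the rest of `fp` into a malloc'd buffer that
 * doubles whenever it fills up. Stores the byte count in *out_len and
 * returns the buffer (caller frees), or NULL on allocation or read failure. */
char *read_all(FILE *fp, size_t *out_len)
{
    size_t cap = 4096;   /* arbitrary starting capacity */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap)
            break;                      /* short read: end of file or error */

        if (cap > (size_t)-1 / 2) {     /* doubling would overflow size_t */
            free(buf);
            return NULL;
        }
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

This never asks for the file size, so it also works on pipes and stdin, where fseek() and ftell() are of no use.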
The best method in my opinion is fstat(): https://linux.die.net/man/2/fstat
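A minimal sketch of the fstat() route for an already open stream, assuming a POSIX system. The helper name stream_size is hypothetical, and rejecting anything that is not a regular file is a design choice of this sketch.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and fstat() under strict ISO modes */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: size in bytes of the regular file behind `fp`,
 * or -1 on error or if `fp` is not a regular file (pipe, terminal, ...).
 * Buffered writes that have not been flushed yet are not counted,
 * so the caller should fflush() first if it has written to `fp`. */
long long stream_size(FILE *fp)
{
    struct stat sb;

    if (fstat(fileno(fp), &sb) == -1)
        return -1;
    if (!S_ISREG(sb.st_mode))
        return -1;
    return (long long)sb.st_size;
}
```

Unlike the fseek()/ftell() pair, this does not move the file position indicator and does not depend on text or binary mode.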
Well, you can estimate the size of a file in several ways:
You can read(2) the file from the beginning to the end, and the number of chars read is the size of the file. This is a tedious way of getting the size of a file, as you have to read the whole file to get the size. But if the operating system doesn't allow positioning the file pointer arbitrarily, then this is the only way to get the file size.
Or you can move the pointer at the end of file position. This is the lseek(2) you showed in the question, but be careful that you have to do the system call twice, as the value returned is the actual position before moving the pointer to the desired place.
Or you can use the stat(2) system call, that will tell you all the administrative information of the file, like the owner, group, permissions, size, number of blocks the file occupies in the disk, disk this file belongs to, number of directory entries pointing to it, etc. This allows you to get all this information with only one syscall.
Other methods you point out (like the use of the ftell(3) stdio library call) will also work (with the same problem that it results in two system calls to set and retrieve/restore the file pointer) but have the problem of involving libraries that you are probably not using for anything else. It would be complicated to get a FILE * pointer (e.g. with fdopen(3)) on an int file descriptor, just to be able to use the ftell(3) function on it (twice), and then fclose(3) it again.

Unexpected result from fseek/ftell when using C.UTF-8 as the current locale

I was testing the interaction of fseek and fgetpos (more precisely if I can get an fpos_t that's inside a multi byte) and got into a pretty unexpected situation.
Whenever I use setlocale(LC_CTYPE, "C.UTF-8"); and fputwc, fseek seems to not work anymore and the only way to move the cursor inside the file is to use fgetwc.
The code is below (all calls complete successfully, i.e. setlocale, fseek, fputwc, etc.., for brevity I stripped the checking of the return value).
This happens on Ubuntu with glibc 2.16. Does anyone have a good explanation why this happens? Is this a bug in glibc?
setlocale(LC_CTYPE, "C.UTF-8");
uselocale(LC_GLOBAL_LOCALE);
FILE* fp = fopen("/tmp/wc.test", "w+");
wchar_t wc = 0x00a2;
fputwc(wc, fp);
fflush(fp);
rewind(fp);
long ftell_out;
fpos_t fpos_out;
fseek(fp, 1, SEEK_SET); // looks like it doesn't have any effect
ftell_out = ftell(fp); // ftell_out is 0
fgetpos(fp, &fpos_out); // the (inner) offset of fpos_out is 0 as well
fgetwc(fp); // it reads wc(0x00a2) here as if we are at
ftell_out = ftell(fp); // this is 2
fgetpos(fp, &fpos_out); // this is 2
Some notes:
if I close the file and reopen it in read mode then everything works as expected (after fseek, ftell_out/fpos_out are 1 and fgetwc fails with a proper errno since the position is inside a multibyte)
if I don't use setlocale the output is almost as expected except that fgetwc doesn't set the errno anymore.

Why fread() returns 0?

I don't understand that.
The file exists. It has content whose length matches the value held by sizeIndexI.
I'm at the beginning of the file (am I not?) and still it won't read from that file...
Of course the file was also successfully opened before (in this case with a+), and the access permissions for the file are also granted.
fpNewsPageLogger = fopen ("/NewsLogx", "a+");
if (fpNewsPageLogger == nullptr)
{
/*...*/
}
else
{
fseek (fpNewsPageLogger, 0 ,SEEK_END);
sizeIndexI = ftell (fpNewsPageLogger);
rewind (fpNewsPageLogger);
DebugLogMsg10 (pDebugLogger, sizeThreadID, "ReadAmount:%d IndexI:%d!", sizeBytesRead, sizeIndexI);
cpTmpNews = calloc (sizeIndexI, sizeof(char));
if (cpTmpNews == nullptr)
{
fclose (fpNewsPageLogger);
return;
}
sizeBytesRead = fread (cpTmpNews, sizeof (char), sizeIndexI, fpNewsPageLogger);
/*...*/
}
Is there anything I'm not thinking about?
Firstly, standard library is not required to meaningfully support seeking from SEEK_END. Did you check the value of sizeIndexI? Maybe it is simply zero? If you ask fread to read zero elements, it expectedly returns zero.
Secondly, you are opening your stream as a text stream. For a text stream values returned by ftell do not generally have any meaningful numerical semantics. In general case ftell for text streams returns an implementation defined encoding of the current position, not the byte offset from the beginning of the file. If you want to work with your stream as binary stream, add "b" to fopen
fpNewsPageLogger = fopen ("/NewsLogx", "ab+");

The use of "r+" in fopen on windows vs linux

I was toying around with some code which was opening, reading, and modifying a text file. A quick (simplified) example would be:
#include <stdio.h>
int main()
{
FILE * fp = fopen("test.txt", "r+");
char line[100] = {'\0'};
int count = 0;
int ret_code = 0;
while(!feof(fp)){
fgets(line, 100, fp);
// do some processing on line...
count++;
if(count == 4) {
ret_code = fprintf(fp, "replaced this line\n");
printf("ret code was %d\n", ret_code);
perror("Error was: ");
}
}
fclose(fp);
return 0;
}
Now on Linux, compiled with gcc (4.6.2), this code runs and modifies the file's 5th line. The same code, running on Windows 7 compiled with Visual C++ 2010, runs and claims to have succeeded (reports a return code of 19 characters and perror says "No error") but fails to replace the line.
On Linux my file has full permissions:
-rw-rw-rw- 1 mike users 191 Feb 14 10:11 test.txt
And as far as I can tell it's the same on Windows:
test.txt (right click) -> properties -> Security
"Allow" is checked for Read & Write for user, System, and Admin.
I get the same results using MinGW's gcc on Windows so I know it's not a Visual C++ "feature".
Am I missing something obvious, or is the fact that I get no errors, but also no output just an undocumented "feature" of using r+ with fopen() on Windows?
EDIT: It seems that even Microsoft's site says "r+" should open for reading and writing. They also make this note:
When the "r+", "w+", or "a+" access type is specified, both reading and writing are allowed (the file is said to be open for "update"). However, when you switch between reading and writing, there must be an intervening fflush, fsetpos, fseek, or rewind operation. The current position can be specified for the fsetpos or fseek operation, if desired.
So I tried:
...
if(count == 4) {
fflush(fp);
ret_code = fprintf(fp, "replaced this line\n");
fflush(fp);
printf("ret code was %d\n", ret_code);
...
to no avail.
According to the Linux man page for fopen():
Reads and writes may be intermixed on read/write streams in any order.
Note that ANSI C requires that a file positioning function intervene
between output and input, unless an input operation encounters
end-of-file. (If this condition is not met, then a read is allowed to
return the result of writes other than the most recent.) Therefore it
is good practice (and indeed sometimes necessary under Linux) to put
an fseek(3) or fgetpos(3) operation between write and read operations
on such a stream. This operation may be an apparent no-op (as in
fseek(..., 0L, SEEK_CUR) called for its synchronizing side effect).
So you should always call fseek() (e.g. fseek(..., 0, SEEK_CUR)) when switching between reading and writing on a file.
Before performing output after input, an fflush() isn't any good - you need to perform a seek operation. Something like:
fseek(fp, ftell(fp), SEEK_SET); // not fflush(fp);
From the C99 standard (7.19.5.3/6, "The fopen function"):
When a file is opened with update mode ('+' as the second or third
character in the above list of mode argument values), both input and
output may be performed on the associated stream. However, output
shall not be directly followed by input without an intervening call to
the fflush function or to a file positioning function (fseek,
fsetpos, or rewind), and input shall not be directly followed by output
without an intervening call to a file positioning function, unless the
input operation encounters end-of-file.
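Applied to the question's scenario, the rule quoted above might be followed as in this sketch. The helper name overwrite_after_lines is hypothetical, and it assumes no line exceeds 99 characters, since a longer line would be counted more than once by fgets().

```c
#include <stdio.h>

/* Hypothetical helper: reads `skip` lines from an update-mode stream
 * ("r+", "w+", ...), then overwrites the bytes that follow with `text`.
 * Returns 0 on success, -1 on failure. */
int overwrite_after_lines(FILE *fp, int skip, const char *text)
{
    char line[100];

    for (int i = 0; i < skip; i++) {
        if (fgets(line, sizeof line, fp) == NULL)
            return -1;   /* fewer lines than expected, or a read error */
    }

    /* Input must not be directly followed by output: reposition first.
     * Seeking by zero from the current position satisfies the rule. */
    if (fseek(fp, 0L, SEEK_CUR) != 0)
        return -1;

    if (fputs(text, fp) == EOF)
        return -1;

    /* Flush so that a later read on the same stream is allowed as well. */
    return fflush(fp) == 0 ? 0 : -1;
}
```

Note that this overwrites bytes in place: if the replacement text is not the same length as the old line, the following lines are partially overwritten or leftovers of the old line remain.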

in append/update mode is a call to a file positioning function still required?

After opening a file in append update mode, is it necessary to execute a file positioning statement before each write to the file?
FILE *h;
int ch;
if ((h = fopen("data", "a+")) == NULL) exit(1);
if (fseek(h, 0, SEEK_SET)) exit(2);
ch = fgetc(h); /* read very first character */
if (ch == EOF) exit(3);
/* redundant? mandatory? */
fseek(h, 0, SEEK_END); /* call file positioning before output */
/* add 1st character to the end of file on a single line*/
fprintf(h, "%c\n", ch);
The C11 Standard says:
7.21.5.3/6 ... all subsequent writes to the file to be forced to the then current end-of-file ...
and
7.21.5.3/7 ... input shall not be directly followed by output without an
intervening call to a file positioning function ...
I take it the shall in 7.21.5.3/7 is stronger than the description in 7.21.5.3/6.
Probably not redundant in portable C. While the underlying file descriptor will always append (at least on Unix), the point of the fseek/fflush requirement is to get rid of the input buffer before writing to the output, so that the same buffer can be used for reading and writing. AFAIK you're not even required to seek to end of file, you can seek anywhere, as long as you seek.
The second description is stronger than the first, but that is to be expected. The first only states that all writes go to EOF, i.e. that there's no way to write anywhere else. The second establishes the rule that switching from reading to writing must be accompanied by a flush or seek, to ensure that read and write aspects of the buffer don't get mixed up.
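To make the two rules concrete, here is a small sketch of the question's sequence with a zero-offset seek as the intervening positioning call. The helper name append_first_char is hypothetical; that the write lands at end-of-file regardless of the seek target follows from the 7.21.5.3/6 wording quoted in the question.

```c
#include <stdio.h>

/* Hypothetical helper: opens `path` in "a+" mode, reads its first character,
 * and appends that character plus a newline. Returns 0 on success, -1 on
 * failure (including an empty file). */
int append_first_char(const char *path)
{
    FILE *h = fopen(path, "a+");
    if (h == NULL)
        return -1;

    int ch = EOF;
    if (fseek(h, 0L, SEEK_SET) != 0 || (ch = fgetc(h)) == EOF) {
        fclose(h);
        return -1;
    }

    /* Positioning call between input and output. Where it seeks to does not
     * decide where the data lands: append mode forces writes to end-of-file. */
    if (fseek(h, 0L, SEEK_CUR) != 0) {
        fclose(h);
        return -1;
    }

    if (fprintf(h, "%c\n", ch) < 0) {
        fclose(h);
        return -1;
    }
    return fclose(h) == 0 ? 0 : -1;
}
```

The seek here goes nowhere near the end of the file, which matches the point above that any positioning call will do.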

Resources