Reading the last 50 characters of a file with fseek() - c

I'm trying to read the last 50 characters in a file by doing this:
FILE* fptIn;
char sLine[51];
if ((fptIn = fopen("input.txt", "rb")) == NULL) {
printf("Couldn't access input.txt.\n");
exit(0);
}
if (fseek(fptIn, 50, SEEK_END) != 0) {
perror("Failed");
fclose(fptIn);
exit(0);
}
fgets(sLine, 50, fptIn);
printf("%s", sLine);
This doesn't print anything that remotely makes sense. Why?

Change 50 to -50. Also note that this will only work with fixed-length character encodings like ASCII. Finding the 50th character from the end is far from trivial with things like UTF-8.

Try setting the offset to -50.

Besides the sign of the offset the following things could make trouble:
A newline character makes fgets stop reading, but it is considered a valid character and therefore it is included in the string copied to str.
Use either ferror or feof to check whether an error happened or the End-of-File was reached.

fseek(fptIn, 50, SEEK_END)
Sets the stream pointer at the end of the file, and then tries to position the cursor 50 bytes ahead thereof. Remember, for binary streams:
3 For a binary stream, the new position, measured in characters from the beginning of the file, is obtained by adding offset to the position specified by whence. The specified
position is the beginning of the file if whence is SEEK_SET, the current value of the file
position indicator if SEEK_CUR, or end-of-file if SEEK_END. A binary stream need not
meaningfully support fseek calls with a whence value of SEEK_END.
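This call may fail. On many implementations it succeeds instead and leaves the position 50 bytes past the end of the file, in which case fgets has nothing to read, returns NULL, and leaves sLine uninitialised, so printing it invokes UB. Try -50 as the offset, and read into your buffer only if the call succeeds.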
Note: emphasis mine
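A minimal sketch of the corrected approach, assuming a single-byte encoding and an implementation that supports SEEK_END on binary streams (most do). The helper name read_tail is hypothetical and not from the question; it uses fread rather than fgets so that an embedded newline does not cut the read short.

```c
#include <stdio.h>

/* Hypothetical helper: copies up to n trailing bytes of the file into buf
   (which must have room for n + 1 bytes) and NUL-terminates the result.
   Returns the number of bytes read, or -1 on error. */
long read_tail(const char *path, char *buf, long n)
{
    if (n < 0)
        return -1;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    /* Find the file size so that a short file does not make the seek fail. */
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }
    long size = ftell(fp);
    if (size < 0) {
        fclose(fp);
        return -1;
    }
    if (n > size)
        n = size;

    /* Negative offset: move n bytes back from the end of the file. */
    if (fseek(fp, -n, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }

    size_t got = fread(buf, 1, (size_t)n, fp);
    buf[got] = '\0';
    fclose(fp);
    return (long)got;
}
```

With char sLine[51], a call such as read_tail("input.txt", sLine, 50) fills the buffer with at most the last 50 bytes of the file.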

Related

How to know if the file end with a new line character or not

I'm trying to append a line of the form "1 :1 :1 :1" to the end of a file. The file may or may not already end with a newline character, and I have to handle both cases, so I came up with the following solution:
Go to the end of the file and step back by 1 character (the length of the newline character on Linux, as far as I know), then read that character. If it isn't a newline, write one and then write the whole line; otherwise just write the line. This is my translation of that solution into C:
int insert_element(char filename[]){
elements *elem;
FILE *p,*test;
size_t size = 0;
char *buff=NULL;
char c='\n';
if((p = fopen(filename,"a"))!=NULL){
if(test = fopen(filename,"a")){
fseek(test,-1,SEEK_END );
c= getc(test);
if(c!='\n'){
fprintf(test,"\n");
}
}
fclose(test);
p = fopen(filename,"a");
fseek(p,0,SEEK_END);
elem=(elements *)malloc(sizeof(elements));
fflush(stdin);
printf("\ninput the ID\n");
scanf("%d",&elem->id);
printf("input the address \n");
scanf("%s",elem->adr);
printf("input the type \n");
scanf("%s",elem->type);
printf("input the mark \n");
scanf("%s",elem->mark);
fprintf(p,"%d :%s :%s :%s",elem->id,elem->adr,elem->type,elem->mark);
free(elem);
fflush(stdin);
fclose(p);
return 1;
}else{
printf("\nError while opening the file !\n");
return 0;
}
}
As you may notice, the whole program depends on the length of the newline character (1 character, "\n"), so I wonder whether there is a better way, in other words one that works on all OSes.
It seems you already understand the basics of appending to a file, so we just have to figure out whether the file already ends with a newline.
In a perfect world, you'd jump to the end of the file, back up one character, read that character, and see if it matches '\n'. Something like this:
FILE *f = fopen(filename, "r");
fseek(f, -1, SEEK_END); /* this is a problem */
int c = fgetc(f);
fclose(f);
if (c != '\n') {
/* we need to append a newline before the new content */
}
Though this will likely work on Posix systems, it won't work on many others. The problem is rooted in the many different ways systems separate and/or terminate lines in text files. In C and C++, '\n' is a special value that tells the text mode output routines to do whatever needs to be done to insert a line break. Likewise, the text mode input routines will translate each line break to '\n' as it returns the data read.
On Posix systems (e.g., Linux), a line break is indicated by a line feed character (LF) which occupies a single byte in UTF-8 encoded text. So the compiler just defines '\n' to be a line feed character, and then the input and output routines don't have to do anything special in text mode.
On some older systems (like old MacOS and Amiga) a line break might be represented by a carriage return character (CR). Many IBM mainframes used different character encodings called EBCDIC that don't have direct mappings for LF or CR, but they do have a special control character called next line (NL). There were even systems (like VMS, IIRC) that didn't use a stream model for text files but instead used variable length records to represent each line, so the line breaks themselves were implicit rather than marked by a specific control character.
Most of those are challenges you won't face on modern systems. Unicode added more line break conventions, but very little software supports them in a general way.
The remaining major line break convention is the combination CR+LF. What makes CR+LF challenging is that it's two control characters, but the C i/o functions have to make them appear to the programmer as though they are the single character '\n'. That's not a big deal with streaming text in or out. But it makes seeking within a file hard to define. And that brings us back to the problematic line:
fseek(f, -1, SEEK_END);
What does it mean to back up "one character" from the end on a system where line breaks are indicated by a two character sequence like CR+LF? Do we really want the i/o system to have to possibly scan the entire file in order for fseek (and ftell) to figure out how to make sense of the offset?
The C standards people punted. In text mode, the offset argument for fseek can only be 0 or a value returned by a previous call to ftell. So the problematic call, with a negative offset, isn't valid. (On Posix systems, the invalid call to fseek will likely work, but the standard doesn't require it to.)
Also note that Posix defines LF as a line terminator rather than a separator, so a non-empty text file that doesn't end with a '\n' should be uncommon (though it does happen).
For a more portable solution, we have two choices:
Read the entire file in text mode, remembering whether the most recent character you read was '\n'.
This option is hugely inefficient, so unless you're going to do this only occasionally or only with short files, we can rule that out.
Open the file in binary mode, seek backwards a few bytes from the end, and then read to the end, remembering whether the last thing you read was a valid line break sequence.
This might be a problem if our fseek doesn't support the SEEK_END origin when the file is opened in binary mode. Yep, the C standard says supporting that is optional. However, most implementations do support it, so we'll keep this option open.
Since the file will be read in binary mode, the input routines aren't going to convert the platform's line break sequence to '\n'. We'll need a state machine to detect line break sequences that are more than one byte long.
Let's make the simplifying assumption that a line break is either LF or CR+LF. In the latter case, we don't care about the CR, so we can simply back up one byte from the end and test whether it's LF.
Oh, and we have to figure out what to do with an empty file.
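#include <stdbool.h>
#include <stdio.h>

/* Returns true if the file is non-empty and does not end with a line feed. */
bool NeedsLineBreak(const char *filename) {
    const int LINE_FEED = '\x0A';
    FILE *f = fopen(filename, "rb"); /* binary mode */
    if (f == NULL) return false;
    const bool empty_file = fseek(f, 0, SEEK_END) == 0 && ftell(f) == 0;
    /* An empty file needs no line break; otherwise check the last byte. */
    const bool result = !empty_file &&
        (fseek(f, -1, SEEK_END) == 0 && fgetc(f) != LINE_FEED);
    fclose(f);
    return result;
}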
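For illustration, here is one way the check could be combined with the append step. This is only a sketch under the same simplifying assumption (a line break is LF or CR+LF); the names missing_final_lf and append_record are hypothetical, and the sketch writes a trailing newline after each record, which the question's code does not do.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the file is non-empty and its last byte
   is not a line feed, 0 otherwise (including when it cannot be opened). */
static int missing_final_lf(const char *filename)
{
    FILE *f = fopen(filename, "rb"); /* binary mode: no newline translation */
    if (f == NULL)
        return 0;
    int missing = 0;
    if (fseek(f, 0L, SEEK_END) == 0 && ftell(f) > 0 &&
        fseek(f, -1L, SEEK_END) == 0)
        missing = (fgetc(f) != 0x0A);
    fclose(f);
    return missing;
}

/* Hypothetical helper: appends one record on its own line.
   Returns 0 on success, -1 on failure. */
int append_record(const char *filename, const char *record)
{
    int need_break = missing_final_lf(filename);
    FILE *p = fopen(filename, "a"); /* text mode: '\n' is translated on output */
    if (p == NULL)
        return -1;
    if (need_break)
        fputc('\n', p);
    fprintf(p, "%s\n", record);
    return fclose(p) == 0 ? 0 : -1;
}
```

Doing the check in binary mode and the write in text mode keeps the seek well defined while still letting the library emit the platform's own line break.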

How to use fgets after using fgetc?

I'm trying to write a specific program that reads data from a file but I realized that when I read the file with fgetc, if I use fgets later, it doesn't have any output.
For example, this code:
#include <stdio.h>
#include <stdlib.h>
int main() {
FILE * arq = fopen("arquivo.txt", "r");
char enter = fgetc(arq);
int line_count = 1;
while(enter != EOF) {
if (enter == '\n') line_count++;
enter = fgetc(arq);
}
printf("%d", line_count);
char str[128];
while(fgets(str, 128, arq)) printf("%s", str);
}
the second while doesn't print anything but if I delete the first while, the code prints the file content. Why is that happening?
TLDR: rewind(arq); is what you want
When you read from a file, the internal file pointer advances as you read, so that each subsequent read will return the next data in the file. When you get to the end, all subsequent reads will return EOF as there is nothing more to read.
You can manipulate the internal file pointer with the fseek and ftell functions. fseek allows you to set the internal file pointer to any point in the file, relative to the beginning, the end, or the current position. ftell will tell you the current position. This allows you to easily remember any position in the file and go back to it later.
SYNOPSIS
#include <stdio.h>
int fseek(FILE *stream, long offset, int whence);
long ftell(FILE *stream);
void rewind(FILE *stream);
DESCRIPTION
The fseek() function sets the file position indicator for the stream pointed to by stream.
The new position, measured in bytes, is obtained by adding offset bytes to the position
specified by whence. If whence is set to SEEK_SET, SEEK_CUR, or SEEK_END, the offset is
relative to the start of the file, the current position indicator, or end-of-file, respectively.
A successful call to the fseek() function clears the end-of-file indicator for
the stream and undoes any effects of the ungetc(3) function on the same stream.
The ftell() function obtains the current value of the file position indicator for the
stream pointed to by stream.
The rewind() function sets the file position indicator for the stream pointed to by stream
to the beginning of the file. It is equivalent to:
(void) fseek(stream, 0L, SEEK_SET)
except that the error indicator for the stream is also cleared (see clearerr(3)).
One caveat here is that the offsets used by fseek and returned by ftell are byte offsets, not character offsets. So when accessing a non-binary file (anything not opened with a "b" modifier to fopen) the offsets might not correspond to characters exactly. It should always be ok to pass an offset returned by ftell back to fseek unmodified to get to the same spot in the file, but trying to compute offsets otherwise may be tricky.
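A small sketch of the rewind approach. The helper name count_newlines_and_rewind is hypothetical; note that it stores the result of fgetc in an int so that EOF stays distinguishable from real characters, and that it counts newline characters rather than starting the count at 1 as the question does.

```c
#include <stdio.h>

/* Hypothetical helper: counts '\n' characters in the stream, then rewinds it
   so the caller can read the same data again from the start.
   Returns the count, or -1 on a read error. */
long count_newlines_and_rewind(FILE *fp)
{
    long count = 0;
    int ch; /* int, not char: fgetc returns EOF as an out-of-band value */

    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\n')
            count++;
    }
    if (ferror(fp))
        return -1;

    rewind(fp); /* back to the beginning; also clears the EOF indicator */
    return count;
}
```

After this call returns, a loop such as while (fgets(str, 128, arq)) printf("%s", str); prints the file from its first line.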

Difference between fread(&c, 1, 1, input) and fgetc(input) for reading one byte

I'm currently trying to read in a PNG file, one byte at a time, and I'm getting different results when I use fread((void*), size_t, size_t, FILE*) and fgetc(FILE*).
I essentially want to "Read one byte at a time until the file ends", and I do so in two different ways. In both cases, I open the image I want in binary mode through:
FILE* input = fopen( /* Name of File */, "rb");
And store each byte in a character, char c
fread:
while( fread(&c, 1, 1, input) != 0) //read until there are no more bytes read
fgetc:
while( (c = fgetc(input)) != EOF) //Read while EOF hasn't been reached
In the fread case, I read all the bytes I need. The reading function stops at the end of the file, and I end up printing all 380,000 bytes (which makes sense, as the input file is a 380kB file).
However, in the fgetc case, I stop once I reach a byte with a value of ff (which is -1, the value of the macro EOF).
My question is, if both functions are doing the same thing, reading one byte at a time, how does fread know to continue reading even if it comes across a byte with a value of EOF? And building off of this, how does fread know when to stop if EOF is passed when reading the file?
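fgetc returns an int, not a char. It returns each byte as an unsigned char value converted to int (0 to 255 on typical systems), plus the out-of-band value EOF, so a char cannot hold every possible result distinctly. Where char is signed, the byte ff becomes -1 when stored in c and compares equal to EOF, which is why your loop stops early; where char is unsigned, c never compares equal to EOF and the loop never ends. So don't do that. Store the return value in an int. fread doesn't have this problem because it reports how many items it read instead of returning the data itself.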
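To make the fix concrete, here is a short sketch of the fgetc loop with an int variable. The helper name count_bytes is hypothetical; it simply counts the bytes so that the effect on ff bytes is easy to check.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes in a file using fgetc.
   Returns the byte count, or -1 if the file cannot be opened or read. */
long count_bytes(const char *path)
{
    FILE *input = fopen(path, "rb");
    if (input == NULL)
        return -1;

    long n = 0;
    int c; /* int keeps the byte 0xFF (255) distinct from EOF */

    while ((c = fgetc(input)) != EOF)
        n++;

    int failed = ferror(input); /* EOF can also mean a read error */
    fclose(input);
    return failed ? -1 : n;
}
```

Inside the loop, a cast such as (unsigned char)c recovers the byte value if it needs to be stored in a buffer.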

Read from file number of characters from a given position - C

I have a file called "cache.txt", where I have some data. I have an unordered_map, cache, with a pair of ints as its value: the first element of the pair is the number of characters I want to read, and the second is the position I want to read from. The key of the unordered_map, comps, is not relevant.
....
fp = fopen("cache.txt", "r");
fseek(fp, cache[comps].second, SEEK_SET);
int number_of_chars = cache[comps].first;
char c;
while((c = getc(fp)) != EOF && number_of_chars > 0) {
--number_of_chars;
printf("%c",c);
}
fclose(fp);
I have to use it several times, so that's the reason for opening and closing the file each time.
If you are using text streams (and apparently you are, because you open the file in mode "r", not "rb"), then the only position values you can pass to fseek are 0 or some value returned by ftell. In other words, you cannot count characters read yourself and use that count in a call to fseek.
If you use binary streams, then you can count the bytes read yourself, but you will have to deal with the fact that whatever the OS uses to represent newlines will not be translated to a newline character. Because there is no way to know how a given OS handles newlines, it's impossible to portably read a text stream in binary mode.
In short, you should take the requirement in the C standard library seriously. Make sure that the offset you are storing in your map came directly from a call to ftell. ("Directly" means that you cannot even use ftell(file) + 1; only ftell(file).)
Unix/Linux programmers can get away with not dealing with the above, since Posix mandates that there is no difference between text and binary modes, and only requires that fseek use a value returned by ftell for wide streams. But if you are using Windows, you will find that trying to use fseek to return to a byte position you computed yourself does not work. And if you are not using Windows, you should ask yourself whether your program might someday be ported to Windows.
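A sketch of what "store only values that came from ftell" could look like. Both helper names (index_lines and read_line_at) are hypothetical, and the sketch assumes lines shorter than 256 characters; a longer line would be recorded as several entries.

```c
#include <stdio.h>

/* Hypothetical helper: records the ftell() position of the start of each
   line, up to max entries. Returns the number of positions stored, or -1
   if the file cannot be opened. */
int index_lines(const char *path, long *positions, int max)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    int n = 0;
    char line[256];
    while (n < max) {
        long pos = ftell(fp); /* taken as-is, never adjusted by hand */
        if (pos < 0 || fgets(line, sizeof line, fp) == NULL)
            break;
        positions[n++] = pos;
    }
    fclose(fp);
    return n;
}

/* Hypothetical helper: reads one line starting at a position that was
   previously returned by ftell() for the same file.
   Returns 0 on success, -1 on failure. */
int read_line_at(const char *path, long pos, char *buf, int size)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    int ok = fseek(fp, pos, SEEK_SET) == 0 && fgets(buf, size, fp) != NULL;
    fclose(fp);
    return ok ? 0 : -1;
}
```

The stored positions are treated as opaque bookmarks: they are passed back to fseek with SEEK_SET unchanged, which is the only use the C standard guarantees for text streams.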

Undoing the effects of ungetc(): "How" do fseek(), rewind() and fsetpos() do it? Is the buffer refilled each time?

Huh!! How shall I put the whole thing in a clear question!! Let me try:
I know that files opened using fopen() are buffered in memory. We use a buffer for efficiency and ease. During a read from the file, the contents of the file are first read into the buffer, and we read from that buffer. Similarly, in a write to the file, the contents are written to the buffer first, and then to the file.
But what about fseek(), fsetpos() and rewind() dropping the effect of previous calls to ungetc()? Can you tell me how it is done? I mean, suppose we have opened a file for reading and it has been copied into the buffer. Now using ungetc() we've changed some characters in the buffer. Here is what I just fail to understand even after much effort:
Here's what is said about ungetc(): "A call to fseek, fsetpos or rewind on stream will discard any characters previously put back into it with this function." How can characters already put into the buffer be discarded? One approach is that the original characters that were removed are "remembered", and each new character that was put in is identified and replaced with the original character. But that seems very inefficient. The other option is to load a copy of the original file into the buffer and place the file pointer at the intended position. Which of these two approaches do fseek, fsetpos or rewind take to discard the characters put back using ungetc()?
For text streams, how does the presence of unread characters in the stream, characters that were put in using ungetc(), affect the return value of ftell()? My confusion arises from the following line about ftell() and ungetc() from this link about ftell (SOURCE)
"For text streams, the numerical value may not be meaningful but can still be used to restore the position to the same position later using fseek (if there are characters put back using ungetc still pending of being read, the behavior is undefined)."
Focusing on the last line of the above paragraph, what has "pending of being read" got to do with an "ungetc()-obtained" character being discarded? Each time we read a character that was put into the stream using ungetc(), is it discarded after the read?
A good mental model of the put back character is simply that it's some extra little property which hangs off the FILE * object. Imagine you have:
typedef struct {
/* ... */
int putback_char;
/* ... */
} FILE;
Imagine putback_char is initialized to the value EOF which indicates "there is no putback char", and ungetc simply stores the character to this member.
Imagine that every read operation goes through getc, and that getc does something like this:
int getc(FILE *stream)
{
int ret = stream->putback_char;
if (ret != EOF) {
stream->putback_char = EOF;
if (__is_binary(stream))
stream->current_position++; /* reading advances the position that ungetc moved back */
return ret;
}
return __internal_getc(stream); /* __internal_getc doesn't know about putback_char */
}
The functions which clear the pushback simply assign EOF to putback_char.
In other words, the put back character (and only one needs to be supported) can actually be a miniature buffer which is separate from the regular buffering. (Consider that even an unbuffered stream supports ungetc: such a stream has to put the byte or character somewhere.)
Regarding the position indicator, the C99 standard says this:
For a text stream, the value of its file position indicator after a successful call to the ungetc function is unspecified until all pushed-back characters are read or discarded. For a binary stream, its file position indicator is decremented by each successful call to the ungetc function; if its value was zero before a call, it is indeterminate after the call. [7.19.7.11 The ungetc function]
So, the www.cplusplus.com reference you're using is incorrect; the behavior of ftell is not undefined when there are pending characters pushed back with ungetc.
For text streams, the value is unspecified. Accessing an unspecified value isn't undefined behavior, because an unspecified value cannot be a trap representation.
The undefined behavior exists for binary streams if a push back occurs at position zero, because the position then becomes indeterminate. An indeterminate value is either an unspecified value or a trap representation. Accessing it could halt the program with an error message, or trigger other behaviors.
It's better to get programming language and library specifications from the horse's mouth, rather than from random websites.
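The binary-stream rule quoted above can be observed with a short sketch. The helper name ungetc_positions is hypothetical; it reports what ftell returns immediately before and after a single ungetc call on a stream opened in binary mode.

```c
#include <stdio.h>

/* Hypothetical helper: reads two bytes from a binary stream, pushes the
   second one back, and stores ftell() from before and after the push-back.
   Returns 0 on success, -1 on failure. */
int ungetc_positions(const char *path, long *before, long *after)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    int first = fgetc(fp);
    int second = fgetc(fp);
    if (first == EOF || second == EOF) {
        fclose(fp);
        return -1;
    }

    *before = ftell(fp);           /* position after two reads */
    if (ungetc(second, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    *after = ftell(fp);            /* binary stream: one less than before */

    fclose(fp);
    return 0;
}
```

For a text stream the second value would be unspecified, so the same experiment proves nothing there.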
Let's start from the beginning,
int ungetc(int c, FILE *stream);
The ungetc() function shall push the byte specified by c (converted to an unsigned char) back onto the input stream pointed to by stream. A character is virtually put back into an input stream, decreasing its internal file position as if a previous getc operation was undone. This only affects further input operations on that stream, and not the content of the physical file associated with it, which is not modified by any calls to this function.
int fseek(FILE *stream, long offset, int whence);
The new position, measured in bytes from the beginning of the file, shall be obtained by adding offset to the position specified by whence. The specified point is the beginning of the file for SEEK_SET, the current value of the file-position indicator for SEEK_CUR, or end-of-file for SEEK_END. fseek either flushes any buffered output before setting the file position or else remembers it so it will be written later in its proper place in the file.
int fsetpos(FILE *stream, const fpos_t *pos);
The fsetpos() function sets the file position and state indicators for the stream pointed to by stream according to the value of the object pointed to by pos, which must be a value obtained from an earlier call to fgetpos() on the same stream.
void rewind(FILE *stream);
The rewind function repositions the file pointer associated with stream to the beginning of the file. A call to rewind is similar to
(void) fseek( stream, 0L, SEEK_SET );
So, as you can see, pushing back characters with ungetc() doesn't alter the file; only the internal buffering for the stream is affected. So your second comment, "The other option is to load a copy of the original file into buffer and place the file pointer at the intended position", is correct.
Now, answering your second question: a successful intervening call (with the stream pointed to by stream) to a file-positioning function discards any pushed-back characters for the stream. The external storage corresponding to the stream is unchanged.
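The discard behaviour is easy to check with a small experiment. In this sketch the helper name read_after_discarded_pushback is hypothetical; it pushes back a character that does not occur in the file, calls rewind, and returns whatever the next read produces.

```c
#include <stdio.h>

/* Hypothetical helper: reads one character, pushes back '#', repositions the
   stream with rewind(), and returns the next character read.
   Returns EOF if the file cannot be opened or is empty. */
int read_after_discarded_pushback(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return EOF;

    int c = fgetc(fp);                 /* consume the first character */
    if (c == EOF || ungetc('#', fp) == EOF) {
        fclose(fp);
        return EOF;
    }

    rewind(fp);                        /* positioning call: '#' is dropped */
    c = fgetc(fp);                     /* the file's first character again */
    fclose(fp);
    return c;
}
```

Without the rewind call the final fgetc would return '#', and in neither case does the file on disk change.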
