Binary file reading - c

I am working with code that reads data from a binary file; it is shown below. Could someone explain the role of fseek and fread here?
fc = fopen(CLOUDS_FILE, "rb");
if (fc == NULL){ fputs("File open error.\n", stderr); exit(1); }
crs = aux[CLRS];
fpos = (int) (pixel[2]*crs*crs + pixel[1]*crs + pixel[0]);
flsz = sizeof(fd);
fseek(fc, fpos*flsz, 0);
rd = fread((void *) &fd, flsz, 1, fc);
if (rd != 1){ fputs("Read error.\n", stderr); exit(1); }
fclose(fc);

fseek() changes the file offset. fread() reads data starting from the current offset, incrementing the offset by the number of elements read.
(Or is the question something else entirely? I mean, the above is something one can trivially figure by reading the manpages)

Binary file reading works with an internal 'pointer', much like the cursor position in a text editor. When a file is opened for reading (using fopen), the pointer is at the beginning of the file. Read operations (like fread, which reads a specified number of bytes from the stream) start at the pointer position and advance it by the number of bytes actually read. If only a specific part of the file is needed, the pointer can be set manually to a certain (relative or absolute) position; this is what fseek is used for.

#include <stdio.h>
int fseek(FILE *stream, long offset, int whence);
The fseek() function sets the file position indicator for the stream
pointed to by stream. The new position, measured in bytes, is obtained
by adding offset bytes to the position specified by whence. If whence
is set to SEEK_SET, SEEK_CUR, or SEEK_END, the offset is relative to
the start of the file, the current position indicator, or end-of-file,
respectively.
#include <stdio.h>
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
The function fread() reads nmemb elements of data, each size bytes
long, from the stream pointed to by stream, storing them at the
location given by ptr.

Sure, fseek is forwarding the "read from" index in the file to a calculated offset in CLOUDS_FILE, while fread is reading one object of size sizeof(fd) (whatever fd is, as that's not in your pasted code) into fd.
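A minimal sketch of the same seek-then-read pattern, assuming a hypothetical record type (the question never shows how fd is declared). The names struct record and read_record_at are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record type: stands in for whatever `fd` is in the question. */
struct record {
    float value;
};

/* Read the record with the given zero-based index from a binary stream.
   Returns 1 on success, 0 if the seek or the read fails. */
int read_record_at(FILE *fp, long index, struct record *out)
{
    assert(fp != NULL && out != NULL);
    /* Jump to the byte offset of the wanted record, counted from the start. */
    if (fseek(fp, index * (long)sizeof *out, SEEK_SET) != 0)
        return 0;
    /* Read exactly one element of sizeof *out bytes; the file position
       then sits just past that record. */
    return fread(out, sizeof *out, 1, fp) == 1;
}
```

In the question's terms, index plays the role of fpos and sizeof *out the role of flsz. Writing SEEK_SET instead of a literal 0 makes the intent of the third argument visible.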

Related

confusion with fseek() in C

I am new to the C programming language. I am learning file I/O, and am confused with the fseek function. Here is my code
#include <stdio.h>
#include <stdlib.h>
struct threeNumbers {
int n1,n2,n3;
};
int main (){
int n;
struct threeNumbers number;
FILE *filePointer;
if ((filePointer=fopen("\\\\wsl$\\Ubuntu-20.04\\home\\haseeb\\learningC\\file Input and Output\\program2\\program.bin","rb"))==NULL){
printf("error! opening file");
/* if pointer is null, the program will exit */
exit(1);
}
/* moves the cursor at the end of the file*/
fseek(filePointer,-sizeof(struct threeNumbers),SEEK_END);
for(n=1;n<5;++n){
fread(&number,sizeof(struct threeNumbers),1,filePointer);
printf (" n1:%i\tn2:%i\tn3:%i\n",number.n1,number.n2,number.n3);
fseek(filePointer,-2*sizeof(struct threeNumbers),SEEK_CUR);
}
fclose(filePointer);
return 0;
}
I know that this program reads the records from the file program.bin in reverse order (last to first) and prints them.
My confusion is this: I know that "fseek(filePointer,-sizeof(struct threeNumbers),SEEK_END);" will move the cursor at the end of the binary file. What does "fseek(filePointer,-2*sizeof(struct threeNumbers),SEEK_CUR);" do? I think it moves to the current location, but what is the point of the cursor coming to the current location in this program? Also, why is it -2 instead of just "-sizeof(struct threeNumbers)"?
Disregarding the actual code, this is what fseek() does:
The fseek() function sets the file position indicator for the stream
pointed to by stream. The new position, measured in bytes, is obtained
by adding offset bytes to the position specified by whence. If whence
is set to SEEK_SET, SEEK_CUR, or SEEK_END, the offset is relative to
the start of the file, the current position indicator, or end-of-file,
respectively. A successful call to the fseek() function clears the
end-of-file indicator for the stream and undoes any effects of the
ungetc(3) function on the same stream.
fseek(filePointer,-sizeof(struct threeNumbers),SEEK_END) will not "move the cursor at the end of the binary file"; it will move it sizeof(struct threeNumbers) before the end of the file.
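As for the -2: each fread() moves the position forward by one record, so stepping back by one record would only return to the record just read. Stepping back by two lands on the record before it. A rough sketch of that loop, with an explicit check for the first record added; read_backwards is a made-up name, and the offsets are cast to long so that the negation is not done on an unsigned size_t:

```c
#include <assert.h>
#include <stdio.h>

struct threeNumbers {
    int n1, n2, n3;
};

/* Read up to max records from the end of fp towards the start into out[].
   Returns the number of records stored. */
size_t read_backwards(FILE *fp, struct threeNumbers *out, size_t max)
{
    const long rec = (long)sizeof(struct threeNumbers);
    size_t got = 0;

    assert(fp != NULL && out != NULL);
    /* Position one record before end-of-file, i.e. on the last record. */
    if (fseek(fp, -rec, SEEK_END) != 0)
        return 0;
    while (got < max) {
        /* fread advances the position by one record ... */
        if (fread(&out[got], sizeof out[got], 1, fp) != 1)
            break;
        ++got;
        if (ftell(fp) < 2 * rec)
            break;                      /* that was the first record */
        /* ... so step back two: one to undo the read, one more to reach
           the record before it. */
        if (fseek(fp, -2 * rec, SEEK_CUR) != 0)
            break;
    }
    return got;
}
```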

Displaying size of a file [C]

I'm making a simple sockets program to send a text file or a picture file over to another socket connected to a port. However, I want to also send the size of the file over to the client socket so that it knows how many bytes to receive.
I also want to implement something where I can send a certain number of bytes instead of the file itself. For example, if a file I wanted to send was 14,003 bytes and I felt like sending 400 bytes, then only 400 bytes would be sent.
I am implementing something like this:
#include <stdio.h>
int main(int argc, char* argv[]) {
FILE *fp;
char* file = "text.txt";
int offset = 40;
int sendSize = 5;
int fileSize = 0;
if ((fp = fopen(file, "r")) == NULL) {
printf("Error: Cannot open the file!\n");
return 1;
} else {
/* Seek from offset into the file */
//fseek(fp, 0L, SEEK_END);
fseek(fp, offset, sendSize + offset); // seek to sendSize
fileSize = ftell(fp); // get current file pointer
//fseek(fp, 0, SEEK_SET); // seek back to beginning of file
}
printf("The size is: %d", fileSize);
}
offset is pretty much going to go 40 bytes into the file and then send whatever sendSize bytes over to the other program.
I keep getting an output of 0 instead of 5. Any reason behind this?
You can try this.
#include <stdio.h>
int main(int argc, char* argv[]) {
FILE *fp;
char* file = "text.txt";
int offset = 40;
int sendSize = 5;
int fileSize = 0;
if ((fp = fopen(file, "r")) == NULL) {
printf("Error: Cannot open the file!\n");
return 1;
} else {
fseek(fp, 0L, SEEK_END);
fileSize = ftell(fp);
}
printf("The size is: %d", fileSize);
}
The fseek() to the end, then ftell() method is a reasonably portable way of getting the size of a file, but not guaranteed to be correct. It won't transparently handle newline / carriage return conversions, and as a result, the standard doesn't actually guarantee that the return from ftell() is useful for any purpose other than seeking to the same position.
The only portable way is to read the file until data runs out and keep a count of bytes. Or stat() the file using the (non-ANSI) Unix standard function.
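The read-and-count approach could look roughly like this. It is only a sketch: count_bytes is a made-up name, and the stream is assumed to be opened in binary mode ("rb") so that no newline translation changes the count:

```c
#include <assert.h>
#include <stdio.h>

/* Count the bytes from the current position to end-of-file by reading
   them in chunks. Returns the count, or -1 if a read error occurred. */
long long count_bytes(FILE *fp)
{
    unsigned char buf[4096];
    size_t got;
    long long total = 0;

    assert(fp != NULL);
    while ((got = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long long)got;
    /* fread returning 0 means end-of-file or an error; tell them apart. */
    return ferror(fp) ? -1 : total;
}
```

The stream is left at end-of-file afterwards, so a rewind(fp) is needed before the data can be read again.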
You may be opening the file in text mode as Windows can open a file in text mode even without the "t" option.
And you can't use ftell() to get the size of a file opened in text mode. Per 7.21.9.4 The ftell function of the C Standard:
For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file
position indicator for the stream to its position at the time
of the ftell call; the difference between two such return
values is not necessarily a meaningful measure of the number of
characters written or read.
Even if it does return the "size" of the file, the translation to "text" may change the actual number of bytes read.
It's also not portable or standard-conforming to use fseek() to find the end of a binary file. Per 7.21.9.2 The fseek function:
A binary stream need not meaningfully support fseek calls with a
whence value of SEEK_END.
I think your seek does not work because of the third parameter. Try seeking with
fseek(fp, offset, SEEK_SET);
Your call passes the number sendSize + offset as the "origin" constant. That value is compared with the three constants listed below (typically 0, 1 and 2); since 45 matches none of them, the seek most likely fails and leaves the position at 0, which is what ftell() then reports.
http://www.cplusplus.com/reference/cstdio/fseek/
Parameters
stream, offset, origin
origin: Position used as reference for the offset. It is specified by one of the following constants, defined in <cstdio> exclusively to be used as arguments for this function:
Constant    Reference position
SEEK_SET    Beginning of file
SEEK_CUR    Current position of the file pointer
SEEK_END    End of file
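For the "go 40 bytes in, then send 5 bytes" part of the question, a sketch along these lines may help. read_chunk is a hypothetical helper name, and the file is assumed to be opened with "rb" so that offsets are plain byte counts:

```c
#include <assert.h>
#include <stdio.h>

/* Copy up to n bytes starting at byte `offset` of fp into buf.
   Returns the number of bytes actually read, which is smaller than n
   when the file ends early and 0 if the seek fails. */
size_t read_chunk(FILE *fp, long offset, unsigned char *buf, size_t n)
{
    assert(fp != NULL && buf != NULL);
    /* The third argument must be SEEK_SET, SEEK_CUR or SEEK_END. */
    if (fseek(fp, offset, SEEK_SET) != 0)
        return 0;
    return fread(buf, 1, n, fp);
}
```

With offset = 40 and n = 5 the return value is the number of bytes that can be passed on to the socket; it is not the file size.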

C using fstat() to read size of file

I am reading a file of unknown size that I know does not change size in the meantime. So I intend to use the fstat() function and struct stat. Now I am wondering what the st_size field really means and how I should use it.
If I get the file size this way, allocate a buffer of that size and read exactly that many bytes, there seems to be one byte left over. I came to this conclusion when I used feof() to check whether there was really nothing left in the FILE *. It returns false! So I need to read (st_size + 1) bytes, and only then have all bytes been read and feof() works correctly. Should I always add this +1 to the size to read all bytes from a binary file, or is there some hidden reason why this isn't reading to EOF?
struct stat finfo;
fstat(fileno(fp), &finfo);
data_length = finfo.st_size;
I am asking about this because when I add +1, the number of bytes read by fread() is one byte less than requested, and the last byte in the buffer is a 00 byte. Before checking with feof() I could also do something like this:
fread(NULL, 1, 1, fp);
This is the real code; it is a slightly odd situation:
// reading png bytes from file
FILE *fp = fopen("./test/resources/RGBA_8bits.png", "rb");
// get file size from file info
struct stat finfo;
fstat(fileno(fp), &finfo);
pngDataLength = finfo.st_size;
pngData = malloc(sizeof(unsigned char)*pngDataLength);
if( fread(pngData, 1, pngDataLength, fp) != pngDataLength) {
fprintf(stderr, "%s: Incorrect number of bytes read from file!\n", __func__);
fclose(fp);
free(pngData);
return;
}
fread(NULL, 1, 1, fp);
if(!feof(fp)) {
fprintf(stderr, "%s: Not the whole binary file has been read.\n", __func__);
fclose(fp);
free(pngData);
return;
}
fclose(fp);
This behaviour is normal.
feof will return true only once you have tried to read beyond the file's end which you don't do as you read exactly the size of the file.
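A small sketch of that behaviour. read_exact_and_check_end is a made-up name; it uses fgetc() for the extra read attempt, since passing NULL as the buffer to fread() (as in the question) is not something the standard allows:

```c
#include <assert.h>
#include <stdio.h>

/* Read exactly n bytes, then find out whether anything is left.
   Returns 1 if the stream is at end-of-file, 0 if more data follows,
   -1 on a short read or read error. */
int read_exact_and_check_end(FILE *fp, unsigned char *buf, size_t n)
{
    int c;

    assert(fp != NULL && buf != NULL);
    if (fread(buf, 1, n, fp) != n)
        return -1;
    /* Even if n was the whole file, feof(fp) is still 0 at this point:
       no read has run past the end yet. */
    c = fgetc(fp);                  /* attempt to read one more byte */
    if (c == EOF)
        return feof(fp) ? 1 : -1;   /* EOF without feof means an error */
    ungetc(c, fp);                  /* more data follows: put the byte back */
    return 0;
}
```

So there is no need to read st_size + 1 bytes; one failed read attempt after the data is enough to set the end-of-file indicator.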

Is it recommended method for computing the size of a file using fseek()?

In C, we can find the size of a file using the fseek() function, like this:
if (fseek(fp, 0L, SEEK_END) != 0)
{
// Handle repositioning error
}
So my question is: is using fseek() and ftell() a recommended method for computing the size of a file?
If you're on Linux or some other UNIX like system, what you want is the stat function:
struct stat statbuf;
int rval;
rval = stat(path_to_file, &statbuf);
if (rval == -1) {
perror("stat failed");
} else {
printf("file size = %lld\n", (long long)statbuf.st_size);
}
On Windows under MSVC, you can use _stati64:
struct _stati64 statbuf;
int rval;
rval = _stati64(path_to_file, &statbuf);
if (rval == -1) {
perror("_stati64 failed");
} else {
printf("file size = %lld\n", (long long)statbuf.st_size);
}
Unlike using fseek, this method doesn't involve opening the file or seeking through it. It just reads the file metadata.
The fseek()/ftell() approach works sometimes.
if (fseek(fp, 0L, SEEK_END) == 0) {
printf("Size: %ld\n", ftell(fp));
}
Problems.
If the file size exceeds about LONG_MAX, long int ftell(FILE *stream) response is problematic.
If the file is opened in text mode, the return value from ftell() may not correspond to the file length. "For a text stream, its file position indicator contains unspecified information," C11dr §7.21.9.4 2
If the file is opened in binary mode, fseek(fp, 0L, SEEK_END) is not well defined. "Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state." C11dr footnote 268. @Evert This most often applies to earlier platforms than today, but it is still part of the spec.
If the file is a stream like a serial input or stdin, fseek(file, 0, SEEK_END) makes little sense.
The usual solution to finding file size is a non-portable platform specific one. Example good answer @dbush.
Note: If code attempts to allocate memory based on file size, the memory available can easily be exceeded by the file size.
Due to these issues, I do not recommend this approach.
Typically the problem should be re-worked to not need to find the file size, but to grow the data as more input is processed.
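One way to "grow the data as more input is processed" is a buffer that doubles whenever it fills up. This is only a sketch under assumptions: read_all is a made-up name and the starting capacity of 4096 bytes is arbitrary:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read everything remaining in fp into a malloc'd buffer that doubles as
   needed. Stores the byte count in *len. Returns NULL on failure; the
   caller frees the result. */
unsigned char *read_all(FILE *fp, size_t *len)
{
    size_t cap = 4096, used = 0, got;
    unsigned char *buf, *tmp;

    assert(fp != NULL && len != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    while ((got = fread(buf + used, 1, cap - used, fp)) > 0) {
        used += got;
        if (used == cap) {                  /* buffer full: double it */
            if (cap > (size_t)-1 / 2) {     /* doubling would overflow */
                free(buf);
                return NULL;
            }
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {                       /* stopped on an error, not EOF */
        free(buf);
        return NULL;
    }
    *len = used;
    return buf;
}
```

This also works on streams that cannot seek, such as stdin or a pipe, because it never asks for the size up front.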
LL disclaimer: Note that C spec footnotes are informative and so not necessarily normative.
The best method in my opinion is fstat(): https://linux.die.net/man/2/fstat
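A short sketch of the fstat() route for a stream that is already open, assuming a POSIX system (fileno() and fstat() are not part of ISO C). stream_size is a made-up name:

```c
#define _POSIX_C_SOURCE 200809L     /* ask for fileno() and fstat() */
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Return the size in bytes of the regular file behind fp, or -1 if the
   size cannot be determined. */
long long stream_size(FILE *fp)
{
    struct stat st;

    assert(fp != NULL);
    if (fstat(fileno(fp), &st) != 0)
        return -1;
    if (!S_ISREG(st.st_mode))
        return -1;      /* pipes, terminals etc. have no meaningful size */
    return (long long)st.st_size;
}
```

The file position is not touched. If the stream has been written to, an fflush(fp) beforehand makes sure buffered data is counted.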
Well, you can estimate the size of a file in several ways:
You can read(2) the file from the beginning to the end, and the number of chars read is the size of the file. This is a tedious way of getting the size of a file, as you have to read the whole file to get the size. But if the operating system doesn't allow positioning the file pointer arbitrarily, then this is the only way to get the file size.
Or you can move the pointer to the end-of-file position. This is the seek approach you showed in the question (lseek(2) underneath), but be aware that it takes two system calls: lseek(2) returns the offset after the move, which gives the size, and a second call is needed to put the pointer back where it was.
Or you can use the stat(2) system call, that will tell you all the administrative information of the file, like the owner, group, permissions, size, number of blocks the file occupies in the disk, disk this file belongs to, number of directory entries pointing to it, etc. This allows you to get all this information with only one syscall.
Other methods you mention (like the ftell(3) stdio library call) will also work (with the same drawback of needing two system calls to set and retrieve/restore the file pointer), but they involve a library that you are probably not using for anything else. It would be cumbersome to get a FILE * pointer (e.g. with fdopen(3)) from an int file descriptor just to be able to call ftell(3) on it (twice) and then fclose(3) it again.

Why does fgetpos() return a negative offset?

When I am using fgetpos(fp,&pos), the call is setting pos to a negative value where pos is of type fpos_t. Can someone explain why this is happening?
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#define TRUE 1
#define FALSE 0
#define MAX_TAG_LEN 50
char filename[1000] = "d:\\ire\\a.xml";
//extract each tag from the xml file
int getTag(char * tag, FILE *fp)
{
//skip until a beginning of a next
while(!feof(fp))
if((char)fgetc(fp) == '<')break;
if(!feof(fp)){
char temp[MAX_TAG_LEN]={0};
char *ptr;
int len;
fpos_t b;
fgetpos(fp,&b); // here the b is containing -ve values.....???
fread(temp,sizeof(char),MAX_TAG_LEN - 1,fp);
temp[MAX_TAG_LEN-1] = 0;
ptr = strchr(temp,'>'); //search of ending tag bracket
len = ptr - temp + 1;
sprintf(tag,"<%.*s",len,temp); //copy the tag
printf("%s",tag); //print the tag
b += len; //reset the position of file pointer to just after the tag character.
fsetpos(fp,&b);
return TRUE;
}
else{
return FALSE;
}
}
int main()
{
int ch;
char tag[100]={0};
FILE *fp = fopen(filename,"r");
while(getTag(tag,fp)){
}
fclose(fp);
return 0;
}
where a.xml is a very basic xml file
<file>
<page>
<title>AccessibleComputing</title>
<id>10</id>
<redirect />
<revision>
<id>133452289</id>
<timestamp>2007-05-25T17:12:12Z</timestamp>
<contributor>
<username>Gurch</username>
<id>241822</id>
</contributor>
<minor />
<comment>Revert edit(s) by [[Special:Contributions/Ngaiklin|Ngaiklin]] to last version by [[Special:Contributions/Rory096|Rory096]]</comment>
<text xml:space="preserve">#REDIRECT [[Computer accessibility]] {{R from CamelCase}}</text>
</revision>
</page>
</file>
The code works for some XML files, but for the XML file above it stops after printing the first tag.
According to the cplusplus.com description of fpos_t:
fpos_t objects are usually created by a call to fgetpos, which returns a reference to an object of this type. The content of a fpos_t is not meant to be read directly, but only to use its reference as an argument in a call to fsetpos.
I think that means that in theory the value of an fpos_t could be arbitrarily positive or negative, so long as the implementation treats it correctly. For example, fpos_t could be some offset from the end of the file rather than the beginning, in which case negative values make sense. It also could be some weird bit-packed representation that would use each bit, including the sign bit, to encode some other information about the file position.
Finally I found the error....
msdn says
You can use fseek to reposition the
pointer anywhere in a file. The
pointer can also be positioned beyond
the end of the file. fseek clears the
end-of-file indicator and negates the
effect of any prior ungetc calls
against stream.
When a file is opened for appending
data, the current file position is
determined by the last I/O operation,
not by where the next write would
occur. If no I/O operation has yet
occurred on a file opened for
appending, the file position is the
start of the file.
For streams opened in text mode, fseek has limited use, because
carriage return–linefeed translations
can cause fseek to produce unexpected
results. The only fseek operations
guaranteed to work on streams opened
in text mode are:
Seeking with an offset of 0 relative to any of the origin values.
Seeking from the beginning of the file with an offset value returned from a call to ftell.
Once the fopen call was changed from "r" to "rb", it worked fine.
thanks
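For reference, a possible variant of getTag() that avoids arithmetic on fpos_t altogether. It is a sketch, not the original poster's final code: next_tag is a made-up name, the stream is assumed to be opened with "rb", and ftell()/fseek() with SEEK_SET are used because in binary mode their offsets are byte counts that can be added to:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TAG_LEN 50

/* Copy the next "<...>" tag from a stream opened with "rb" into tag
   (capacity cap) and leave the position just after the closing '>'.
   Returns 1 on success, 0 when no complete tag is found. */
int next_tag(FILE *fp, char *tag, size_t cap)
{
    char temp[MAX_TAG_LEN];
    char *end;
    long start;
    size_t got, len;
    int c;

    assert(fp != NULL && tag != NULL);
    while ((c = fgetc(fp)) != EOF && c != '<')
        ;                               /* skip to the next '<' */
    if (c == EOF)
        return 0;
    start = ftell(fp);                  /* byte offset just after '<' */
    if (start < 0)
        return 0;
    got = fread(temp, 1, MAX_TAG_LEN - 1, fp);   /* look ahead */
    temp[got] = '\0';
    end = strchr(temp, '>');
    if (end == NULL)
        return 0;                       /* no '>' within the look-ahead */
    len = (size_t)(end - temp) + 1;     /* bytes up to and including '>' */
    if (len + 2 > cap)
        return 0;                       /* '<' + tag text + NUL must fit */
    snprintf(tag, cap, "<%.*s", (int)len, temp);
    /* Move back to just after the tag; fseek also clears end-of-file. */
    return fseek(fp, start + (long)len, SEEK_SET) == 0;
}
```

Unlike the code in the question, this checks the result of strchr() before using it, so a missing '>' does not lead to a null pointer subtraction.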
