fseek() giving 2 different counts when I expected the same count - c

If I use
fseek(file_ptr, 0, SEEK_END);
size = ftell(file_ptr);
I get 480000, which is right, as the file holds 60000 doubles at 8 bytes each.
But when I use
fseek(file_ptr, sizeof(double), SEEK_END);
size = ftell(file_ptr);
I get 480008, an extra 8 bytes. Does anyone know what they are?

But when I use
fseek(file_ptr, sizeof(double), SEEK_END);
size = ftell(file_ptr);
I get 480008 an extra 8 bytes
That is because the size of a double on your system is 8 bytes, and fseek() set the file position indicator 8 bytes past SEEK_END (the end of the file). From the fseek() man page:
The new position, measured in bytes, is obtained by adding offset
bytes to the position specified by whence.
Re:
Anyone know what they are?
This is what The Open Group's manual page has to say about it:
The fseek() function shall allow the file-position indicator to be set
beyond the end of existing data in the file. If data is later written
at this point, subsequent reads of data in the gap shall return bytes
with the value 0 until data is actually written into the gap.
The behavior of fseek() on devices which are incapable of seeking is
implementation-defined. The value of the file offset associated with
such a device is undefined.
Note that this behaviour is only specified for POSIX-compliant systems.
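A small sketch of the behaviour quoted above, assuming a POSIX system: write two bytes, seek past the end, write one more byte, then read everything back and count how many bytes in the gap are zero. The function name count_zero_bytes_in_gap is hypothetical, and tmpfile() is used so no real file name is assumed.

```c
#include <stdio.h>

/* Hypothetical demo: writes "AB", seeks `gap` bytes past the end, writes
   "C", then reads the file back and counts the zero bytes in the gap.
   On a POSIX-conforming system the result should equal `gap`.
   Returns -1 on error or if `gap` is outside 0..32. */
int count_zero_bytes_in_gap(long gap)
{
    unsigned char buf[64];
    int zeros = -1;

    if (gap < 0 || gap > 32)
        return -1;

    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    if (fputs("AB", fp) != EOF &&
        fseek(fp, gap, SEEK_END) == 0 &&   /* move past the end of data */
        fputc('C', fp) != EOF) {           /* write after the gap */
        rewind(fp);                        /* reposition before reading */
        size_t n = fread(buf, 1, sizeof buf, fp);
        if (n == (size_t)gap + 3 && buf[gap + 2] == 'C') {
            zeros = 0;
            for (long i = 2; i < gap + 2; i++)
                if (buf[i] == 0)
                    zeros++;
        }
    }
    fclose(fp);
    return zeros;
}
```

Calling count_zero_bytes_in_gap(8) is expected to return 8 on Linux and other POSIX systems; ISO C alone does not promise this.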

The answer is in the fseek() man page (Linux).
The function prototype is
int fseek(FILE *stream, long offset, int whence);
where the offset parameter is described as follows:
The new position, measured in bytes, is obtained by adding offset bytes to the position specified by whence.
Since your offset is 8 (sizeof(double) on your system) and the original file size is 480000, you get 480008.
The manual does not mention any limitation preventing the position from being set past SEEK_END.
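To reproduce the question's numbers, here is a minimal sketch that writes `count` doubles to a temporary file and reports ftell() after seeking `extra` bytes relative to SEEK_END. The helper name is hypothetical; with extra == 0 it gives the file size, and with extra == sizeof(double) it gives the size plus 8 on a system with 8-byte doubles.

```c
#include <stdio.h>

/* Hypothetical helper: returns ftell() after fseek(fp, extra, SEEK_END)
   on a temporary file holding `count` doubles, or -1 on error. */
long position_after_seek_past_end(size_t count, long extra)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    double zero = 0.0;
    for (size_t i = 0; i < count; i++) {
        if (fwrite(&zero, sizeof zero, 1, fp) != 1) {
            fclose(fp);
            return -1;
        }
    }

    long pos = -1;
    if (fseek(fp, extra, SEEK_END) == 0)  /* end of data + extra bytes */
        pos = ftell(fp);

    fclose(fp);
    return pos;
}
```

So position_after_seek_past_end(60000, 0) matches the 480000 in the question, and passing sizeof(double) as `extra` yields the extra 8 bytes: the position is simply past the end, no data was added.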

Related

fseek to a 32-bit unsigned offset

I am reading a file format (TIFF) that has 32-bit unsigned offsets from the beginning of the file.
Unfortunately the prototype for fseek, the usual way I would go to a particular file offset, is:
int fseek ( FILE * stream, long int offset, int origin );
so the offset is signed. How should I handle this situation? Should I be using a different function for seeking?
After studying this question more deeply and considering the other comments and answers (thank you), I think the simplest approach is to do two seeks if the offset is greater than 2147483647 bytes. This allows me to keep the offsets as uint32_t and continue using fseek. The positioning code is therefore like this:
// note: error handling code omitted
uint32_t offset = ... (whatever it is)
if( offset > 2147483647 ){
    fseek( file, 2147483647, SEEK_SET );
    fseek( file, (long int)( offset - 2147483647 ), SEEK_CUR );
} else {
    fseek( file, (long int) offset, SEEK_SET );
}
The problem with using 64-bit types is that the code might be running on a 32-bit architecture (among other things). There is a function fsetpos which uses a type fpos_t to manage arbitrarily large offsets, but that brings with it a range of complexities. Although fsetpos might make sense if I were truly using offsets of arbitrarily large size, I know the largest possible offset fits in a uint32_t, so the double seek meets that need.
Note that this solution allows all TIFF files to be handled on a 32-bit system. The advantage of this is obvious if you consider commercial programs like PixInsight. PixInsight can only handle TIFF files smaller than 2147483648 bytes when running on 32-bit systems. To handle full sized TIFF files, a user has to use the 64-bit version of PixInsight on a 64-bit computer. This is probably because the PixInsight programmers used a 64-bit type to handle the offsets internally. Since my solution only uses 32-bit types, I can handle full-sized TIFF files on a 32-bit system (as long as the underlying operating system can handle files that large).
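A loop-based variant of the same chunked-seek idea, sketched here as a hypothetical helper seek_u32. One edge case motivates it: with a 32-bit long, offset == 4294967295 makes the second seek above 4294967295 - 2147483647 = 2147483648, which does not fit in a long. Stepping at most LONG_MAX bytes at a time avoids that. Whether positions above LONG_MAX work at all still depends on the platform's large-file support, so fseek() may fail there.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: seek to an unsigned 32-bit offset from the start
   of the file using fseek() steps of at most LONG_MAX bytes.
   Returns 0 on success, -1 if any fseek() call fails. */
int seek_u32(FILE *fp, uint32_t offset)
{
    uint32_t remaining = offset;
    int whence = SEEK_SET;            /* first step is absolute */

    do {
        long step;
        if ((unsigned long)remaining > (unsigned long)LONG_MAX)
            step = LONG_MAX;          /* only possible when long is 32 bits */
        else
            step = (long)remaining;

        if (fseek(fp, step, whence) != 0)
            return -1;
        remaining -= (uint32_t)step;
        whence = SEEK_CUR;            /* later steps are relative */
    } while (remaining > 0);

    return 0;
}
```

On a system with a 64-bit long the loop runs once; with a 32-bit long it takes up to three steps.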
You can try lseek64() (see its man page):
#define _LARGEFILE64_SOURCE /* See feature_test_macros(7) */
#include <sys/types.h>
#include <unistd.h>
off64_t lseek64(int fd, off64_t offset, int whence);
With
int fd = fileno (stream);
Notes from The GNU C Library, "Setting the File Position of a Descriptor":
This function is similar to the lseek function. The difference is that the offset parameter is of type off64_t instead of off_t which makes it possible on 32 bit machines to address files larger than 2^31 bytes and up to 2^63 bytes. The file descriptor filedes must be opened using open64 since otherwise the large offsets possible with off64_t will lead to errors with a descriptor in small file mode.
When the source file is compiled with _FILE_OFFSET_BITS == 64 on a 32 bits machine this function is actually available under the name lseek and so transparently replaces the 32 bit interface.
About fd and stream, from "Streams and File Descriptors":
Since streams are implemented in terms of file descriptors, you can extract the file descriptor from a stream and perform low-level operations directly on the file descriptor. You can also initially open a connection as a file descriptor and then make a stream associated with that file descriptor.
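A minimal sketch of working on the descriptor behind a stream, assuming a POSIX system. It follows the quoted note that compiling with _FILE_OFFSET_BITS set to 64 makes the plain lseek() use a 64-bit off_t on 32-bit glibc, so it calls lseek() rather than lseek64(). The helper read_at_u32 is hypothetical; it flushes the stream first so buffered output is not lost, and afterwards the stream's own position is out of sync, so reposition it (for example with fseek or rewind) before using stdio on it again.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t even on 32-bit glibc */
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `n` bytes at an unsigned 32-bit offset
   from the file descriptor underlying `stream`.
   Returns the number of bytes read, or -1 on error. */
ssize_t read_at_u32(FILE *stream, uint32_t offset, void *buf, size_t n)
{
    if (fflush(stream) != 0)           /* push pending stdio output */
        return -1;

    int fd = fileno(stream);
    if (fd < 0)
        return -1;

    /* With a 64-bit off_t every uint32_t value fits. */
    if (lseek(fd, (off_t)offset, SEEK_SET) == (off_t)-1)
        return -1;

    return read(fd, buf, n);
}
```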

Reading from binary after x bytes in C

I am trying to read double values from a binary file in C, but the file starts with an integer followed by the doubles I am looking for.
How do I skip that first 4 bytes when reading with fread()?
Thanks
Try this:
fseek(input, sizeof(int), SEEK_SET);
before any calls to fread.
As Weather Vane said, you can safely use sizeof(int) if the file was generated on the same system architecture as the program you are writing. Otherwise, you should explicitly specify the size of an integer on the system where the file originated.
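Putting the fseek() suggestion together with fread(), a minimal sketch that skips the leading int and reads the doubles that follow. It assumes, as above, that the file was written on the same platform (same sizeof(int), same double format); the helper name read_doubles_after_int is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: skip a leading int header (sizeof(int) bytes on this
   platform) and read up to `max` doubles into `out`.
   Returns the number of doubles read, or 0 if the seek fails. */
size_t read_doubles_after_int(FILE *input, double *out, size_t max)
{
    if (fseek(input, (long)sizeof(int), SEEK_SET) != 0)
        return 0;
    return fread(out, sizeof(double), max, input);
}
```

Checking the return value against the expected count also catches a truncated file.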
You can use fseek to skip the initial integer. If you insist on using fread anyway, then you can read the integer first:
fread(ptr, sizeof(int), 1, stream).
Of course, ptr must point to storage large enough for an int before you call fread.
As I said, fseek is another option:
fseek(stream, sizeof(int), SEEK_SET).
Beware that fseek's offset is in bytes (here measured from the beginning of the file, because of SEEK_SET); an int can be 4 bytes or some other system-specific size.
Be careful when implementing things like this. If the file isn't created on the same machine, you may get invalid values due to different floating point specifications.
If the file you're reading is created on the same machine, make sure that the program that writes it uses the correct type sizes.
If both writer and reader are developed in C and are supposed to run only on the same machine, use the fseek() with the sizeof(type) used in the writer in the offset parameter.
If the machine that writes the binary isn't the same one that reads it, you probably don't even want to read the doubles with fread(), as their format may differ between architectures.
Many architectures rely on IEEE 754 for their floating-point format, but if the application is supposed to support multiple platforms, you should make sure that the serialized format can be read on all of them (or converted while deserializing).
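One way to "convert while deserializing", sketched under two assumptions: the file format is defined to store each double as 8 little-endian IEEE 754 binary64 bytes, and the host's double is also IEEE 754 binary64 with the same byte order as its uint64_t (true on mainstream platforms). The helper decode_le_double is hypothetical; it fixes only the byte order, not a different floating-point format.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode 8 bytes holding a little-endian IEEE 754
   binary64 value into a host double. Assumes the host double is binary64. */
double decode_le_double(const unsigned char b[8])
{
    uint64_t bits = 0;
    for (int i = 7; i >= 0; i--)
        bits = (bits << 8) | b[i];     /* b[0] is the least significant byte */

    double d;
    memcpy(&d, &bits, sizeof d);       /* reinterpret the bit pattern */
    return d;
}
```

The raw bytes would come from fread() into an unsigned char buffer rather than directly into a double.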
Just read those 4 unneeded bytes first, like this:
unsigned char skip[4];                 // scratch space for the leading 4 bytes
double value;
fread(skip, sizeof skip, 1, input);    // skip those four bytes
fread(&value, sizeof value, 1, input); // then read the first double
And so on.

Strange value for offset in fwrite

For the purpose of tracing I/O calls, I have overridden fwrite in a shared library that I LD_PRELOAD when running a program. In my version of fwrite, I get the absolute offset of the write using
long int pos = ftell(stream);
The value is then passed to a function that takes a size_t argument (unsigned long on this platform), and the value is printed.
I encountered some calls with 18446744073709551615 as the offset (2^64 - 1), so I guess the initial long int returned by ftell was -1. These operations always wrote 10 bytes.
So my question is: what could have led the stream offset to be set to -1?
I don't trace the names of the files being accessed by those strange fwrites, so I don't know what is being accessed. I should also point out that the program is an MPI program. It is possible that the stream involved is actually a socket, or something other than a file...
A return value of -1 from ftell means that an error occurred and errno was set. Here is the relevant text from the man page:
ftell() returns the current offset. Otherwise, -1 is returned and errno is set to indicate the error.
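One common cause is a stream that is not seekable, such as a pipe or socket. A minimal sketch, assuming a POSIX system, that wraps the write end of a pipe in a FILE * and reports the errno left by ftell(); on Linux this is expected to be ESPIPE ("Illegal seek"). The function name is hypothetical. In a tracing wrapper, checking pos == -1 and logging errno would distinguish this case from a real offset.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical demo: returns the errno set by ftell() on a pipe stream,
   0 if ftell() unexpectedly succeeded, or -1 if setup failed. */
int ftell_errno_on_pipe(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    FILE *w = fdopen(fds[1], "w");     /* stream over the write end */
    if (w == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    errno = 0;
    long pos = ftell(w);               /* pipes have no file offset */
    int err = (pos == -1L) ? errno : 0;

    fclose(w);                         /* also closes fds[1] */
    close(fds[0]);
    return err;
}
```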

When going over a binary file in c, do I go by bits or bytes?

I have a long binary string which contains a number of bytes,
I need to use fseek to get to a specific byte in the string.
I know I need to calculate the offset, but I'm not sure whether the offset
is counted in bits or bytes. For example, to get to the 3rd byte,
do I advance the indicator by 3 or by 24 (3*8)?
fseek takes offset as the number of bytes, not bits:
The new position, measured in bytes from the beginning of the file, shall be obtained by adding offset to the position specified by whence.
int fseek(FILE *stream, long offset, int whence);
The fseek() function sets the file position indicator for the stream pointed to by stream. The new position, measured in bytes, is obtained by adding offset bytes to the position specified by whence.
If whence is set to SEEK_SET, SEEK_CUR, or SEEK_END, the offset is relative to the start of the file, the current position indicator, or end-of-file, respectively.
For example, with a file opened in binary mode (positions are zero-based):
First Byte position ==> fseek(fp,0,SEEK_SET);
Second Byte position ==> fseek(fp,1,SEEK_SET);
Third Byte position ==> fseek(fp,2,SEEK_SET);
You just need to specify the number of bytes, not the number of bytes * 8.
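As a small sketch of the same point, a hypothetical helper that returns the byte at a zero-based position, so the 3rd byte is requested with index 2 (not 24). It assumes the stream was opened in binary mode, where arbitrary byte offsets are well defined.

```c
#include <stdio.h>

/* Hypothetical helper: return the byte at zero-based position `index`
   (the 3rd byte is index 2), or EOF if the seek fails or the position
   is at or past the end of the file. */
int byte_at(FILE *fp, long index)
{
    if (fseek(fp, index, SEEK_SET) != 0)   /* offset counts bytes, not bits */
        return EOF;
    return fgetc(fp);
}
```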

What is the reading limit of function 'read' in 'unistd.h'?

Standard Unix C has this function:
ssize_t read(int fd, void *buf, size_t count);
But what is the maximum number of bytes that read can read in one call?
From man read(2):
read() attempts to read up to count bytes from file descriptor fd into
the buffer starting at buf.
If count is zero, read() returns zero and has no other results. If
count is greater than SSIZE_MAX, the result is unspecified.
The value of SSIZE_MAX depends on your system, but generally it's something akin to the maximum value of signed long, which is often 2^31 - 1 (32-bit systems) or 2^63 - 1 (64-bit systems).
2^31 bytes is 2 GiB, so you're probably safe; in practice, the actual device driver, buffers, or network I/O is never going to give you a 2 GiB chunk of data in one go.
Quote from IEEE Std 1003.1 (aka POSIX.1):
If the value of nbyte is greater than {SSIZE_MAX}, the result is implementation-defined.
So you have to check man 2 read on your target platform. For example, the FreeBSD man page says in its ERRORS section:
[EINVAL] The value nbytes is greater than INT_MAX.
Generally it can read as many bytes as buf has room for. In reality, the underlying device driver (be it a filesystem, the network, or a pipe) will return fewer bytes than you asked for when no more data is available.
So, the particular behaviour of read depends on the underlying driver in the kernel.
This is why it's important to always check the return value of read and examine the actual bytes read.
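Checking the return value usually means looping until the request is satisfied. A minimal sketch of that pattern, assuming a POSIX system; read_full is a hypothetical name. It rejects counts above SSIZE_MAX (where POSIX leaves the result implementation-defined), retries on EINTR, and stops early at end of file. Note that Linux itself transfers at most 0x7ffff000 (2,147,479,552) bytes per read() call, which this loop also absorbs as a short read.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>     /* SSIZE_MAX (POSIX) */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: call read() until `count` bytes are read, end of
   file is reached, or an error other than EINTR occurs.
   Returns the bytes read (possibly fewer at EOF) or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    unsigned char *p = buf;
    size_t total = 0;

    if (count > (size_t)SSIZE_MAX) {  /* result would not fit in ssize_t */
        errno = EINVAL;
        return -1;
    }

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n > 0)
            total += (size_t)n;       /* short read: ask for the rest */
        else if (n == 0)
            break;                    /* end of file */
        else if (errno != EINTR)
            return -1;                /* real error; EINTR just retries */
    }
    return (ssize_t)total;
}
```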
read() takes an open file descriptor, the address of a buffer, and a
number, count of bytes. It attempts to read count bytes into the
buffer from the file described by the descriptor. It is important to
assure that buf points to at least count bytes of storage!
It can read as much as your buffer can hold; the limit is SSIZE_MAX, along with the limits of your hardware.

Resources