Strange value for offset in fwrite - c

For the purpose of tracing I/O calls, I have overwritten fwrite in a shared library that I LD_PRELOAD when running a program. In my version of fwrite, I get the absolute offset of the write using
long int pos = ftell(stream);
The value is then passed to a function that takes a size_t argument (so unsigned long), and the value is printed.
I encountered some calls with 18446744073709551615 as the offset (2^64 - 1), so I guess the initial long int returned by ftell was -1. These operations always wrote 10 bytes.
So my question is: what could have led the stream offset to be set to -1?
I don't trace the names of the files being accessed by those strange fwrites, so I don't know what is being written to. I should also point out that the program is an MPI program, so the stream involved might actually be a socket, or something other than a regular file.

A return value of -1 from ftell means that the call failed and errno was set. Here is the relevant text from the man page:
ftell() returns the current offset. Otherwise, -1 is returned and errno is set to indicate the error.
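On POSIX systems, one documented reason for this failure is ESPIPE: the stream sits on a pipe, FIFO, or socket, which would fit the MPI guess in the question. The sketch below is only an illustration under that assumption; the helper names tell_on_pipe and as_unsigned are hypothetical and not part of the asker's tracing library.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: wrap the read end of a pipe in a FILE* and ask
   for its offset. Returns whatever ftell() returned and stores the
   errno value observed right after the call in *err_out. */
long tell_on_pipe(int *err_out)
{
    int fds[2];
    long pos = -1L;

    *err_out = 0;
    if (pipe(fds) != 0) {
        *err_out = errno;
        return -1L;
    }

    FILE *stream = fdopen(fds[0], "r");
    if (stream != NULL) {
        errno = 0;
        pos = ftell(stream);      /* a pipe has no file offset */
        *err_out = errno;         /* save errno before any other call */
        fclose(stream);           /* also closes fds[0] */
    } else {
        *err_out = errno;
        close(fds[0]);
    }
    close(fds[1]);
    return pos;
}

/* Hypothetical helper: the conversion that turns -1 into a huge offset. */
size_t as_unsigned(long pos)
{
    return (size_t)pos;
}
```

In a tracing wrapper it is probably safer to test the long for a negative value and record errno before converting it to size_t.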

Related

fseek() giving 2 different counts when I expected the same count

If I use
fseek(file_ptr, 0, SEEK_END);
size = ftell(file_ptr);
I get 480000 which is right as I have 60000 x double float at 8 bytes per double in the file.
But when I use
fseek(file_ptr, sizeof(double), SEEK_END);
size = ftell(file_ptr);
I get 480008, an extra 8 bytes. Does anyone know what they are?
That is because the size of a double on your system is 8 bytes, and fseek() set the file position indicator 8 bytes from SEEK_END.
The new position, measured in bytes, is obtained by adding offset
bytes to the position specified by whence.
Re:
Anyone know what they are?
This is what the open group's manual page has to say about it:
The fseek() function shall allow the file-position indicator to be set
beyond the end of existing data in the file. If data is later written
at this point, subsequent reads of data in the gap shall return bytes
with the value 0 until data is actually written into the gap.
The behavior of fseek() on devices which are incapable of seeking is
implementation-defined. The value of the file offset associated with
such a device is undefined.
Note that this behaviour is only specified for POSIX-compliant systems.
The answer is in fseek() man page (Linux OS).
The function prototype is
int fseek(FILE *stream, long offset, int whence);
where offset parameter is described as follows
The new position, measured in bytes, is obtained by adding offset bytes to the position specified by whence.
Your offset is 8 (sizeof(double) on your system) and the original file size is 480000, so you get 480008.
The manual never mentions any limitation that prevents the position from being set past SEEK_END.
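As a rough way to check the arithmetic, the following sketch builds a temporary file of a given size and reports where ftell() ends up after seeking relative to the end. The helper name tell_after_seek_from_end is made up for this illustration, and the behaviour shown assumes a POSIX-style system where seeking past the end is allowed.

```c
#include <stdio.h>

/* Hypothetical helper: create a temporary file holding `size` zero
   bytes, seek `offset` bytes relative to its end, and return what
   ftell() reports. Returns -1L if any step fails. */
long tell_after_seek_from_end(long size, long offset)
{
    FILE *f = tmpfile();
    long pos = -1L;

    if (f == NULL)
        return -1L;

    for (long i = 0; i < size; i++) {
        if (fputc(0, f) == EOF) {
            fclose(f);
            return -1L;
        }
    }

    if (fseek(f, offset, SEEK_END) == 0)
        pos = ftell(f);           /* end position plus offset */

    fclose(f);
    return pos;
}
```

With a 480000-byte file, an offset of 0 should give the file size, sizeof(double) should give 480008, and a negative offset such as -(long)sizeof(double) should land on the start of the last double.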

Reading from binary after x bytes in C

I am trying to read double values from a binary in C, but the binary starts with an integer and then the doubles I am looking for.
How do I skip that first 4 bytes when reading with fread()?
Thanks
Try this:
fseek(input, sizeof(int), SEEK_SET);
before any calls to fread.
As Weather Vane said, you can use sizeof(int) safely if the file was generated on the same system architecture as the one your program runs on. Otherwise, you should manually specify the size of an integer on the system where the file originated.
You can use fseek to skip the initial integer. If you insist on using fread anyway, then you can read the integer first:
fread(ptr, sizeof(int), 1, stream).
Of course you have to declare ptr before calling fread.
As I said, fseek is another option:
fseek(stream, sizeof(int), SEEK_SET).
Beware that fseek moves the file position in bytes (sizeof(int) bytes from the beginning of the file in the line above); an int can be 4 bytes or some other size, depending on the system.
Be careful when implementing things like this. If the file isn't created on the same machine, you may get invalid values due to different floating point specifications.
If the file you're reading is created on the same machine, make sure that the program that writes it uses the type sizes you expect.
If both writer and reader are developed in C and are supposed to run only on the same machine, call fseek() with the sizeof(type) used in the writer as the offset parameter.
If the machine that writes the binary isn't the one that will read it, you probably don't even want to read the doubles with fread(), as their format may differ between architectures.
Many architectures rely on the IEEE 754 for floating point format, but if the application is supposed to address multi-platform support, you should make sure that the serialized format can be read from all architectures (or converted while unserializing).
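One way to make the header independent of the reader's int size and byte order is to decode it byte by byte. This sketch assumes, purely for illustration, that the file format stores the leading integer as 4 bytes in little-endian order; the question does not say so, and decode_le32 is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: rebuild a 32-bit value from 4 bytes stored in
   little-endian order, regardless of the host's own byte order. */
uint32_t decode_le32(const unsigned char bytes[4])
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

The 4 bytes would be read with fread() into an unsigned char array first. Doubles need more care, since both byte order and floating point format can differ.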
Just read those 4 unneeded bytes, like
unsigned char skipped[4];
double value;
fread(skipped, sizeof skipped, 1, input); // read and discard the four-byte integer
fread(&value, sizeof value, 1, input); // then read the first double into value
And so on
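Putting the fseek suggestion together with error checking, a minimal sketch might look like this. It assumes the file was written on the same kind of machine, so that sizeof(int) matches the header size; read_doubles_after_int is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: skip a leading int, then read up to n doubles
   into out. Returns the number of doubles actually read, which is 0
   if the seek fails. */
size_t read_doubles_after_int(FILE *in, double *out, size_t n)
{
    if (fseek(in, (long)sizeof(int), SEEK_SET) != 0)
        return 0;

    /* fread reports how many complete items it stored in out */
    return fread(out, sizeof(double), n, in);
}
```

Comparing the return value with n tells the caller whether the file was shorter than expected.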

Discerning whether a value of 255 is genuine data or an error output

I'm using a microcontroller to send unsigned 8-bit data to Matlab. Whenever there is any data loss, Matlab displays a value of 255. The underlying code of the Matlab program that interfaces with the WIN32 USB APIs shows that a value of -1 is returned for a range of errors. Since the data is of the unsigned 8-bit type, a value of -1 would be interpreted as 255, which explains why the latter number is displayed when a transmission error has occurred.
So, how could one tell whether a value of 255 represents genuine data or an error output?
Thanks and cheers!
(This is only a partial answer.)
This sounds similar to the way C's standard character input is done.
The fgetc() function returns an int result, which is either the value EOF (typically -1), if there was an error or there's no more data to read, or the value of the character that was successfully read, treated as an unsigned char and converted to int.
If you store the value returned by fgetc() in a signed char object (note that plain char may be either signed or unsigned), a value of -1 could indicate either that fgetc() returned EOF, or that it successfully read a byte with the value 0xFF. That's the problem with this kind of in-band signalling; it can be difficult to distinguish between an error indication and valid data that happens to look like an error indication.
With fgetc(), there are two ways to resolve this. You can store the result in an int, which means you'll get distinct values for EOF (-1) and for 0xFF (255). Or you can call the feof() and ferror() functions after calling fgetc(); if either returns a true value, you know that the EOF indicated an actual error or end-of-file condition.
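Both approaches can be sketched in a few lines of C. The function name count_ff_bytes is hypothetical; the point is only that an int keeps EOF and 0xFF apart, and that ferror() tells an error from a normal end of file.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes with value 0xFF that appear
   before end-of-file. Returns -1L if the loop stopped on a read error. */
long count_ff_bytes(FILE *in)
{
    long count = 0;
    int c;                        /* int, not char: EOF stays distinct from 0xFF */

    while ((c = fgetc(in)) != EOF) {
        if (c == 0xFF)
            count++;
    }

    /* EOF was returned: decide whether it was an error or the real end */
    return ferror(in) ? -1L : count;
}
```

Had c been declared as a signed char, the loop would likely stop at the first 0xFF byte instead of counting it.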
You haven't told us enough about the interface between your microcontroller and Matlab to know how you can make this distinction. If there's some other function you can call, something similar to feof() or ferror(), you could call it when you get a -1 or 255 result to determine what that result means. Or, if possible, you might consider modifying the interface you're using so it returns a result bigger than one byte, so that the error indication -1 is distinct from all possible valid data values.
Well, if the function is supposed to return -1 upon failure, there is no way that reasonable output would be 255. If the function can return -1, it's using a signed 8-bit return type, not an unsigned one, which means its return range should be -128 to 127. 255 would never be genuine data.

What is the reading limit of function 'read' in 'unistd.h'?

Standard unix C has this function:
ssize_t read(int fd, void *buf, size_t count);
But what is the maximum bytes that this 'read' function can read 1 time?
From man read(2):
read() attempts to read up to count bytes from file descriptor fd into
the buffer starting at buf.
If count is zero, read() returns zero and has no other results. If
count is greater than SSIZE_MAX, the result is unspecified.
The value of SSIZE_MAX depends on your system, but generally it's the maximum value of a signed long, which is often 2^31 - 1 (32-bit systems) or 2^63 - 1 (64-bit systems).
2^31 bytes is 2 gigabytes, so you're probably safe; in practice, the actual device driver/buffers/network I/O is never going to give you a 2 gigabyte chunk of data in one go.
quote from IEEE Std 1003.1 (aka POSIX.1)
If the value of nbyte is greater than {SSIZE_MAX}, the result is implementation-defined.
So you have to check man 2 read at your target platform. For example, FreeBSD man says in ERRORS part:
[EINVAL] The value nbytes is greater than INT_MAX.
Generally it can read as many bytes as there is room for in buf. In reality, the underlying device driver (be it the filesystem, the network, or a pipe) may return fewer bytes than you asked for when nothing more is available.
So, the particular behaviour of read depends on the underlying driver in the kernel.
This is why it's important to always check the return value of read and examine the actual bytes read.
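A common pattern that follows from this is to call read() in a loop until the requested amount has arrived. The sketch below is one possible version, not the only one; read_full is a hypothetical name, and clamping the request to SSIZE_MAX is a design choice made here to avoid the unspecified case quoted above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until `count` bytes have
   arrived, end-of-file is reached, or an error occurs. Returns the
   number of bytes read, or -1 on error with errno set by read(). */
ssize_t read_full(int fd, void *buf, size_t count)
{
    unsigned char *p = buf;
    size_t total = 0;

    if (count > (size_t)SSIZE_MAX)
        count = (size_t)SSIZE_MAX;    /* stay within what ssize_t can report */

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n > 0) {
            total += (size_t)n;       /* short read: ask for the rest */
        } else if (n == 0) {
            break;                    /* end of file */
        } else if (errno != EINTR) {
            return -1;                /* real error */
        }
        /* n < 0 with EINTR: interrupted by a signal, simply retry */
    }
    return (ssize_t)total;
}
```

A return value smaller than count then means end-of-file was reached, not that the caller should try again.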
read() takes an open file descriptor, the address of a buffer, and a
number, count of bytes. It attempts to read count bytes into the
buffer from the file described by the descriptor. It is important to
assure that buf points to at least count bytes of storage!
It can read as much as your buffer can hold; the limits are SSIZE_MAX and whatever your hardware imposes.

What is "-1L" / "1L" in C?

What do "-1L", "1L" etc. mean in C?
For example, in ftell reference, it says
... If an error occurs, -1L is returned ...
What does this mean? What is the type of "1L"?
Why not return NULL if an error occurs?
The L specifies that the number is a long type, so -1L is a long set to negative one, and 1L is a long set to positive one.
As for why ftell doesn't just return NULL, it's because NULL is used for pointers, and here a long is returned. Note that 0 isn't used because 0 is a valid value for ftell to return.
Catching this situation involves checking for a non-negative value:
long size;
FILE *pFile;
...
size = ftell(pFile);
if(size > -1L){
// size is a valid value
}else{
// error occurred
}
ftell() returns type long int; the L suffix applied to a literal forces its type to long rather than plain int.
NULL would be wholly incorrect because it is a macro representing a pointer, not an integer. Its value, when interpreted as an integer, may represent a valid file position, while -1 (or any negative value) cannot.
For all intents and purposes you can generally simply regard the error return as -1; the L suffix is not critical to correct operation in most cases due to the implicit conversion rules.
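The type difference can be checked at compile time with C11's _Generic. This is only a small illustration and assumes a C11 compiler; the macro IS_LONG and the function plain_and_long_compare_equal are hypothetical names.

```c
/* Hypothetical macro: evaluates to 1 if the expression has type long,
   0 otherwise. Requires C11 for _Generic. */
#define IS_LONG(expr) _Generic((expr), long: 1, default: 0)

/* Hypothetical helper: a long holding -1 compares equal to both
   spellings, because the int literal -1 is converted to long first. */
int plain_and_long_compare_equal(void)
{
    long failed = -1L;
    return failed == -1 && failed == -1L;
}
```

Here IS_LONG(-1L) is 1 and IS_LONG(-1) is 0, yet the comparison helper returns 1, which is why the suffix rarely matters when testing ftell()'s result.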
It means to return the value as a long, not an int.
That means -1 as a long (rather than the default type for numbers, which is an integer)
-1 written as a long int is -1L. Why not simply NULL? Because NULL would be a normal result for this function and so cannot signal an error as well. Why would NULL be a normal result? Because NULL == 0, and ftell returns a position in a stream: at the start of the stream the function returns 0, which is a normal result, not an error. If you compared the return value to NULL to check for errors, you would get a false error whenever you were at the start of the stream.
Editing today implies more details are still wanted.
Mark has it right. The "L" suffix is long. -1L is thus a long -1.
My favored way to test is different from Mark's and is a matter of preference, not goodness.
if ( err >= 0L )
success
else
error
By general habit I do not like looking for explicit -1. If a -2 ever pops up in the future my code will likely not break.
Ever since I started using C, way back in the beginning of C, I noticed most library routines returning int values return 0 for success and -1 on error. Most.
NULL is not normally returned by integer functions as NULL is a pointer value. Besides the clash of types a huge reason for not returning NULL depends on a bit of history.
Things were not clean back when C was being invented, and maybe not even on small systems today. The original K&R C did not guarantee NULL would be zero as is usually the case on CPUs with virtual memory. On small "real memory" systems zero may be a valid address making it necessary for "invalid" addresses to be moved to some other OS dependent location. Such would really be accepted by the CPU, just not generated in the normal scheme of things. Perhaps a very high memory address. I can even see a hidden array called extern const long NULL[1]; allowing NULL to become the address of this otherwise unused array.
Back then you saw a lot of if ( ptr != NULL ) statements rather than if ( ptr ) for people serious about writing portable code.
