What is the reading limit of function 'read' in 'unistd.h'? - file

Standard unix C has this function:
ssize_t read(int fd, void *buf, size_t count);
But what is the maximum number of bytes that this 'read' function can read in one call?

From man read(2):
read() attempts to read up to count bytes from file descriptor fd into
the buffer starting at buf.
If count is zero, read() returns zero and has no other results. If
count is greater than SSIZE_MAX, the result is unspecified.
The value of SSIZE_MAX depends on your system, but generally it's something akin to the maximum value of signed long, which is often 2^31 - 1 (32-bit systems) or 2^63 - 1 (64-bit systems).
2^31 bytes is 2 gigabytes, so you're probably safe; in practice, the actual device driver/buffers/network I/O is never going to give you a 2 gigabyte chunk of data in one go.

quote from IEEE Std 1003.1 (aka POSIX.1)
If the value of nbyte is greater than {SSIZE_MAX}, the result is implementation-defined.
So you have to check man 2 read at your target platform. For example, FreeBSD man says in ERRORS part:
[EINVAL] The value nbytes is greater than INT_MAX.

Generally it can read as many bytes as there are available in buf. In reality, the underlying device driver (be it the filesystem or the network, or a pipe), would return less than what you want in case there is nothing more available.
So, the particular behaviour of read depends on the underlying driver in the kernel.
This is why it's important to always check the return value of read and examine the actual bytes read.
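As a minimal sketch of what checking the return value looks like in practice, the helper below keeps calling read() until the requested number of bytes has arrived, end of file is reached, or a real error occurs. The name read_full is hypothetical (it is not part of unistd.h), and the sketch assumes the caller keeps count at or below SSIZE_MAX.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read up to `count` bytes from `fd` into `buf`,
 * retrying after short reads. Returns the number of bytes actually read
 * (less than `count` only at end of file), or -1 on error with errno set
 * by read(). Assumes count <= SSIZE_MAX. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);
        if (n > 0) {
            done += (size_t)n;   /* short read: keep going */
        } else if (n == 0) {
            break;               /* end of file */
        } else if (errno != EINTR) {
            return -1;           /* real error */
        }
        /* n < 0 with EINTR: interrupted by a signal, so retry */
    }
    return (ssize_t)done;
}
```

A caller would compare the result against the requested count to tell a complete read from an early end of file.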

read() takes an open file descriptor, the address of a buffer, and a
number, count of bytes. It attempts to read count bytes into the
buffer from the file described by the descriptor. It is important to
assure that buf points to at least count bytes of storage!
It can read as much as your buffer can hold; the limits are SSIZE_MAX and those of your hardware.

Related

fseek() giving 2 different counts when I expected the same count

If I use
fseek(file_ptr, 0, SEEK_END);
size = ftell(file_ptr);
I get 480000 which is right as I have 60000 x double float at 8 bytes per double in the file.
But when I use
fseek(file_ptr, sizeof(double), SEEK_END);
size = ftell(file_ptr);
I get 480008 an extra 8 bytes. Anyone know what they are?
But when I use
fseek(file_ptr, sizeof(double), SEEK_END);
size = ftell(file_ptr);
I get 480008 an extra 8 bytes
That is because the size of a double on your system is 8 bytes, and fseek() set the file position indicator 8 bytes from SEEK_END.
The new position, measured in bytes, is obtained by adding offset
bytes to the position specified by whence.
Re:
Anyone know what they are?
This is what the open group's manual page has to say about it:
The fseek() function shall allow the file-position indicator to be set
beyond the end of existing data in the file. If data is later written
at this point, subsequent reads of data in the gap shall return bytes
with the value 0 until data is actually written into the gap.
The behavior of fseek() on devices which are incapable of seeking is
implementation-defined. The value of the file offset associated with
such a device is undefined.
Note that this behaviour is only specified for POSIX-compliant systems.
The answer is in fseek() man page (Linux OS).
The function prototype is
int fseek(FILE *stream, long offset, int whence);
where offset parameter is described as follows
The new position, measured in bytes, is obtained by adding offset bytes to the position specified by whence.
Since your offset is 8 (sizeof double, in your system) and the original file size is 480000, that's why you get 480008.
The manual never mentions any limitation that prevents the position from being set past the end of the file.
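The arithmetic can be checked with a small sketch. The helper name tell_from_end is hypothetical; it seeks relative to the end of the file and reports where the position indicator ended up. A positive offset lands past the end, while a negative one lands inside the file, which is probably what is wanted when looking for the last double.

```c
#include <stdio.h>

/* Hypothetical helper: seek `offset` bytes relative to the end of `fp`
 * and report the resulting position, or -1L if either call fails. */
long tell_from_end(FILE *fp, long offset)
{
    if (fseek(fp, offset, SEEK_END) != 0)
        return -1L;
    return ftell(fp);
}
```

For the 480000-byte file in the question, tell_from_end(file_ptr, 0) would give 480000, an offset of sizeof(double) would give 480008, and -(long)sizeof(double) would give 479992, the start of the last double.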

How does C read bytes using the read syscall?

I need to read an unsigned short using the read system call.
According to the manpage:
read() attempts to read up to count bytes from file descriptor fd
into the buffer starting at buf.
In my case, an unsigned short is two bytes in size, so it can store numbers up to 65535. But when I execute this code:
char buf[2];
bytes_read = read(0, buf, 2);
bytes_wrote = write(1, buf, 2);
and type in the command line, say, the number 123, it returns only 12. Does it not read bytes, but symbols? How can I read a value with more than 2 symbols into a 2-byte buffer? For example, the maximum values of an unsigned short. I found nothing in either K&R or the manpages about it, so I think it's very simple.
NB: I'm assuming your terminal uses either ASCII or UTF8. For the purposes of this explanation, they're equivalent.
When you type, say, 123, read isn't getting that as a number. It's seeing that as a sequence of bytes -- since you said that it should look to fill a 2-char buffer, it sees the first two bytes: First, 0x31, then 0x32. It reads the first byte, and then the second; it doesn't interpret them into numbers. That the series of bytes happens to represent a number when decoded as ASCII or UTF8 is irrelevant; all C cares about is the actual sequence of bytes, and that's what it gives you.
Note that it doesn't even see the third byte. That's left in the input stream to be consumed by later input operations.
If that's what you want, great! If you wanna get a number typed out (i.e. that's been entered as a string of bytes whose values align with a decimal number), take a look at fscanf and its related functions.
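If you would rather stay with read(), one possible approach is to read the typed characters into a buffer that is larger than two bytes, terminate it with a NUL byte, and convert the text yourself. The sketch below shows only the conversion step, using strtoul; the function name parse_ushort is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert decimal text such as "123\n" to an
 * unsigned short. Returns 0 on success, or -1 if no digits were found
 * or the value does not fit in an unsigned short. */
int parse_ushort(const char *text, unsigned short *out)
{
    char *end;
    unsigned long value;

    errno = 0;
    value = strtoul(text, &end, 10);
    if (end == text || errno == ERANGE || value > USHRT_MAX)
        return -1;
    *out = (unsigned short)value;
    return 0;
}
```

The two-byte unsigned short is the result of the conversion, not the buffer that receives the keystrokes: typing 65535 puts five bytes (plus a newline) into the input stream.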

Reading from binary after x bytes in C

I am trying to read double values from a binary in C, but the binary starts with an integer and then the doubles I am looking for.
How do I skip that first 4 bytes when reading with fread()?
Thanks
Try this:
fseek(input, sizeof(int), SEEK_SET);
before any calls to fread.
As Weather Vane said you can use sizeof(int) safely if the file was generated in the same system architecture as the program you are writing. Otherwise, you should manually specify the size of integer of the system where the file originated.
You can use fseek to skip the initial integer. If you insist on using fread anyway, then you can read the integer first:
fread(ptr, sizeof(int), 1, stream).
Of course you have to declare ptr before calling fread.
As I said, fseek is another option:
fseek(stream, sizeof(int), SEEK_SET).
Beware that fseek moves the file pointer in bytes (sizeof(int) bytes from the beginning of the file in the line above); an int can be 4 bytes or some other size, which is system specific.
Be careful when implementing things like this. If the file isn't created on the same machine, you may get invalid values due to different floating point specifications.
If the file you're reading is created on the same machine, make sure that the program that writes it uses the same type sizes.
If both writer and reader are developed in C and are supposed to run only on the same machine, use the fseek() with the sizeof(type) used in the writer in the offset parameter.
If the machine that writes the binary isn't the same that will read it, you probably don't want to even read the doubles with fread() as their format may differ due to possible different architectures.
Many architectures rely on the IEEE 754 for floating point format, but if the application is supposed to address multi-platform support, you should make sure that the serialized format can be read from all architectures (or converted while unserializing).
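One way to avoid depending on sizeof(int) is to pin the header width with a fixed-width type. The sketch below assumes the header is exactly 4 bytes (int32_t) and that the doubles were written in the reader's native format and byte order; both are assumptions about the file, and the function name is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: skip a 4-byte header at the start of `in`, then
 * read up to `max_count` doubles into `out`. Returns the number of
 * doubles actually read (0 if the seek fails). Assumes the doubles are
 * stored in this machine's native representation. */
size_t read_doubles_after_header(FILE *in, double *out, size_t max_count)
{
    if (fseek(in, (long)sizeof(int32_t), SEEK_SET) != 0)
        return 0;
    return fread(out, sizeof(double), max_count, in);
}
```

The return value should be compared with the expected count, since fread returns fewer items at end of file or on error.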
Just read those 4 unneeded bytes, like
void* buffer = malloc(sizeof(double));
fread(buffer,4,1,input); //to skip those four bytes
fread(buffer,sizeof(double),1,input); //then read first double =)
double* data = (double*)buffer;//then convert it to double
And so on

Strange value for offset in fwrite

For the purpose of tracing I/O calls, I have overwritten fwrite in a shared library that I LD_PRELOAD when running a program. In my version of fwrite, I get the absolute offset of the write using
long int pos = ftell(stream);
The value is then passed to a function that takes a size_t argument (so unsigned long), and the value is printed.
I encountered some calls with 18446744073709551615 as the offset (2^64 - 1), so I guess the initial long int returned by ftell was -1. These operations always wrote 10 bytes.
So my question is: what could have led the stream offset to be set to -1?
I don't trace the names of the files being accessed by those strange fwrites, so I don't know what is being accessed. I should also mention that the program is an MPI program. The stream involved might actually be a socket, or something other than a file...
A return value of -1 from ftell means that the call failed and errno was set. Here is the relevant text from the manpage:
ftell() returns the current offset. Otherwise, -1 is returned and errno is set to indicate the error.

Using RNDADDENTROPY to add entropy to /dev/random

I have a device which generates some noise that I want to add to the entropy pool for the /dev/random device in an embedded Linux system.
I'm reading the man page on /dev/random and I don't really understand the structure that you pass into the RNDADDENTROPY ioctl call.
RNDADDENTROPY
Add some additional entropy to the input pool, incrementing
the entropy count. This differs from writing to /dev/random
or /dev/urandom, which only adds some data but does not
increment the entropy count. The following structure is used:
struct rand_pool_info {
int entropy_count;
int buf_size;
__u32 buf[0];
};
Here entropy_count is the value added to (or subtracted from)
the entropy count, and buf is the buffer of size buf_size
which gets added to the entropy pool.
Is entropy_count in this structure the number of bits that I am adding? Why wouldn't this just always be buf_size * 8 (assuming that buf_size is in terms of bytes)?
Additionally why is buf a zero size array? How am I supposed to assign a value to it?
Thanks for any help here!
I am using a hardware RNG to stock my entropy pool. My struct is a static size
and looks like this (my kernel has a slightly different random.h; just copy what
you find in yours and increase the array size to whatever you want):
#define BUFSIZE 256
/* WARNING - this struct must match random.h's struct rand_pool_info */
typedef struct {
int bit_count; /* number of bits of entropy in data */
int byte_count; /* number of bytes of data in array */
unsigned char buf[BUFSIZE];
} entropy_t;
Whatever you pass in buf will be hashed and will stir the entropy pool.
If you are using /dev/urandom, it does not matter what you pass for bit_count
because /dev/urandom does not block when the entropy count reaches zero; it just keeps on going.
What bit_count does is push the point out at which /dev/random will block
and wait for something to add more entropy from a physical RNG source.
Thus, it's okay to guesstimate on bit_count. If you guess low, the worst
that will happen is that /dev/random will block sooner than it otherwise
would have. If you guess high, /dev/random will operate like /dev/urandom
for a little bit longer than it otherwise would have before it blocks.
You can guesstimate based on the "quality" of your entropy source.
If it's low, like characters typed by humans, you can set it to 1 or 2
per byte. If it's high, like values read from a dedicated hardware RNG,
you can set it to 8 bits per byte.
If your data is perfectly random, then I believe it would be appropriate for entropy_count to be the number of bits in the buffer you provide. However, many (most?) sources of randomness aren't perfect, and so it makes sense for the buffer size and amount of entropy to be kept as separate parameters.
buf being declared to be size zero is a common C idiom. The deal is that when you actually allocate a rand_pool_info, you do malloc(sizeof(struct rand_pool_info) + size_of_desired_buf), and then you refer to the buffer using the buf member. Note: Since C99, you can declare buf[] (a flexible array member) instead of buf[0] to be explicit that in reality buf is "stretchy".
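A minimal sketch of that allocation pattern follows, assuming a Linux system where linux/random.h provides struct rand_pool_info and RNDADDENTROPY. The helper names make_pool_info and add_entropy are hypothetical, buf_size is taken to be in bytes and entropy_count in bits, and the ioctl itself normally needs root privileges (CAP_SYS_ADMIN).

```c
#include <fcntl.h>
#include <linux/random.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: allocate a rand_pool_info with room for `nbytes`
 * of data after the fixed header, and fill it in. The caller frees it. */
struct rand_pool_info *make_pool_info(const void *data, int nbytes,
                                      int entropy_bits)
{
    struct rand_pool_info *info;

    if (nbytes < 0)
        return NULL;
    info = malloc(sizeof *info + (size_t)nbytes);
    if (info == NULL)
        return NULL;
    info->entropy_count = entropy_bits;  /* bits of entropy credited */
    info->buf_size = nbytes;             /* bytes of data in buf */
    memcpy(info->buf, data, (size_t)nbytes);
    return info;
}

/* Hypothetical helper: hand the data to the kernel's input pool.
 * Returns 0 on success, -1 on failure. */
int add_entropy(const void *data, int nbytes, int entropy_bits)
{
    struct rand_pool_info *info = make_pool_info(data, nbytes, entropy_bits);
    int fd, rc;

    if (info == NULL)
        return -1;
    fd = open("/dev/random", O_WRONLY);
    if (fd < 0) {
        free(info);
        return -1;
    }
    rc = ioctl(fd, RNDADDENTROPY, info);
    close(fd);
    free(info);
    return rc;
}
```

With a noisy but imperfect source, a call such as add_entropy(noise, 32, 64) would credit 2 bits per byte rather than the full 8.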
The number of bytes you have in the buffer correlates to the entropy of the data but the entropy can not be calculated only from that data or its length.
Sure, if the data came from a good, unpredictable, uniformly distributed hardware random noise generator, the entropy (in bits) is 8 * size of the buffer (in bytes).
But if the bits are not equally distributed or are somehow predictable the entropy becomes less.
See https://en.wikipedia.org/wiki/Entropy_(information_theory)
I hope that helps.

Resources