Finding out the number of chars read/write reads - C

I'm fairly new to c so bear with me.
How do I go about finding out the number of chars read/write reads?
Can I be more specific and designate the # of chars read/write reads in an argument? If so, how?

From man(2) read:
If successful, the number of bytes actually read is returned
From man(2) write:
Upon successful completion the number of bytes which were written is returned
Now concerning:
Can I be more specific and designate the # of chars read/write reads in an argument? If so, how?
AFAIK no, but there might be some device/kernel specific ways using for example ioctl(2)

C and C++ have different IO libraries. I guess you are coding in C.
fprintf(3) returns (when successful) the number of printed characters.
scanf(3) returns the number of successfully read items, but also accepts the %n specifier:
n Nothing is expected; instead, the number of characters
consumed thus far from the input is stored through the next
pointer, which must be a pointer to int.
You could also do IO line by line... (getline, snprintf, sscanf, fputs ....)
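For instance, a minimal sketch of both the return-value counting and the %n specifier (count_consumed is a name invented for this example):

```c
#include <stdio.h>

// Returns the number of characters sscanf consumed while parsing an int
// from s, via the %n specifier (%n itself consumes no input).
// count_consumed is a helper name made up for this sketch.
int count_consumed(const char *s) {
    int value = 0, consumed = 0;
    if (sscanf(s, "%d%n", &value, &consumed) != 1)
        return -1;          // no integer found
    return consumed;
}

// printf/fprintf return the number of characters written on success:
//   int written = printf("hello\n");   // written == 6
```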
For Linux and POSIX:
If you call directly the read(2) or write(2) functions (i.e. syscalls) they return the number of input or output bytes on success.
And you could use the lseek(2) syscall, or the ftell(3) <stdio.h> function, to query the current file offset (which has no meaning so would fail on non-seekable files like pipes, sockets, FIFOs, ...).
See also FIONREAD
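A small sketch tying these together, assuming a writable path such as /tmp (demo_rw is an invented helper name):

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

// Writes msg to path, reads it back, and returns the byte count that
// read(2) reported -- checking it against lseek(2)'s current offset.
// demo_rw is a helper name invented for this sketch.
ssize_t demo_rw(const char *path, const char *msg) {
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0) return -1;
    ssize_t w = write(fd, msg, strlen(msg));   // bytes actually written
    close(fd);
    if (w < 0) return -1;

    fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    char buf[256];
    ssize_t r = read(fd, buf, sizeof buf);     // bytes actually read
    off_t pos = lseek(fd, 0, SEEK_CUR);        // current offset == r here
    close(fd);
    return (pos == r) ? r : -1;
}
```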

Related

Amount of data read() syscall will actually read

Suppose I have a file for which the file descriptor has more than n bytes left until EOF, and I invoke the read() syscall for n bytes. Is the function guaranteed to read n bytes into the buffer? Or can it read less?
The read system call is guaranteed to read as many characters as you asked for, except when it can't. But it turns out that there are so many exceptions -- so many cases where it can't read as many characters as you asked for -- that it basically ends up being safest to assume that any given read call probably won't read as many characters as you asked for. I believe it's good practice to always write your code with that in mind.
The man page on my system says
The system guarantees to read the number of bytes requested if the descriptor references a normal file that has that many bytes left before the end-of-file, but in no other case.
So if it's not a normal file, or if it is a normal file but there aren't enough characters, you'll get fewer than you asked for. But in the case you asked about, yes, you should be guaranteed to get exactly as many characters as you asked for.
With that said, though, if you find yourself with a choice between assuming that read is allegedly guaranteed to read exactly the number of characters requested, versus acknowledging that it might return less, I would always write the code to assume it might return less. That is, if you have a call like
r = read(fd, buf, n);
there isn't usually much to be gained by assuming that if r is greater than 0, it must be exactly n. Your code has to be able to handle the r < n case so it will behave properly when it's almost at end-of-file, so unless you want to have two different code paths (one for "normal" reads, and one for the last read), you might as well write one piece of code, that can handle the r < n case, and let it operate all the time.
(Also, as Zan Lynx reminds in a comment, don't have the code notice that r < n, and infer from that that end-of-file is coming up soon. Wait for r == 0 before deciding you're at end-of-file.)
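A sketch of that one-piece-of-code approach, handling short reads, EINTR, and EOF in a single loop (read_fully is a name chosen for this example, not a standard function):

```c
#include <errno.h>
#include <unistd.h>

// Reads up to n bytes, retrying on short reads and EINTR. Returns the
// number of bytes read (less than n only at EOF), or -1 on error.
// read_fully is a name invented for this sketch.
ssize_t read_fully(int fd, void *buf, size_t n) {
    size_t total = 0;
    while (total < n) {
        ssize_t r = read(fd, (char *)buf + total, n - total);
        if (r < 0) {
            if (errno == EINTR)
                continue;          // interrupted by a signal: retry
            return -1;             // genuine error
        }
        if (r == 0)
            break;                 // r == 0 is the real EOF indicator
        total += r;                // a short read is normal: keep looping
    }
    return (ssize_t)total;
}
```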
You could've read it from the man page yourself:
On Linux, read() (and similar system calls) will transfer at most
0x7ffff000 (2,147,479,552) bytes, returning the number of bytes
actually transferred. (This is true on both 32-bit and 64-bit
systems.)
So even if you had enough RAM and so on, you couldn't read a full-size DVD image in one go - however, this wouldn't be the sane thing to do either; to access such large files, mmap would be better.
Other than that, a signal might be delivered, which can cause read to fail with EINTR, leaving the buffer contents indeterminate.
ERRORS
[...]
EINTR The call was interrupted by a signal before any data was read; see signal(7).
Is the function guaranteed to read n bytes into the buffer? Or can it
read less?
No, even if your file has more than n bytes before its end, the read(fd, buf, n) function is not guaranteed to read n bytes into the buffer and then return n. It can read less and return a positive value that is less than n.
See Linux man page at https://man7.org/linux/man-pages/man2/read.2.html
RETURN VALUE
It is not an error if this number is smaller than the number of
bytes requested; this may happen for example because fewer bytes
are actually available right now (maybe because we were close to
end-of-file, or because we are reading from a pipe, or from a
terminal), or because read() was interrupted by a signal.

read system in Linux vs Windows

Is there any difference between using read() in Linux than in Windows?
Is it possible that in Windows, it will usually read less than I request, and in Linux it usually reads as much as I request?
read isn't a standard C function. Historically it is a POSIX syscall, and as such, Windows (assuming Windows means MSVC) isn't required to implement it at all. Still, they tried. And we can compare the two implementations:
linux:
http://man7.org/linux/man-pages/man2/read.2.html
On success, the number of bytes read is returned (zero indicates end
of file), and the file position is advanced by this number. It is
not an error if this number is smaller than the number of bytes
requested; this may happen for example because fewer bytes are
actually available right now (maybe because we were close to end-of-
file, or because we are reading from a pipe, or from a terminal), or
because read() was interrupted by a signal. See also NOTES.
windows:
https://msdn.microsoft.com/en-us/library/ms235412.aspx
https://msdn.microsoft.com/en-us/library/wyssk1bs.aspx
_read returns the number of bytes read, which might be less than count if there are fewer than count bytes left in the file or if the file was opened in text mode, in which case each carriage return–line feed (CR-LF) pair is replaced with a single linefeed character. Only the single linefeed character is counted in the return value. The replacement does not affect the file pointer.
So you should expect both implementations to return less than the requested number of bytes. Furthermore, there is a clear difference when reading files in text mode.

How does the STDIN buffer and getchar() pointer change during successive calls? [duplicate]

This question already has an answer here:
Confusion about how a getchar() loop works internally
(1 answer)
Closed 8 years ago.
Given input in the stdin buffer, when successive calls to getchar() are performed, does the pointer move along the memory address of the stdin buffer, allowing getchar() to retrieve the value at each address? If so, once they have been retrieved are the values removed and the pointer then incremented?
Generally my understanding of getchar() in a loop follows this logic:
1. getchar() called
2. stdin buffer checked for input
3. If stdin buffer empty, getchar() sleeps
4. User enters input and awakens getchar()
5. stdin buffer checked again for input
6. stdin buffer not empty
7. getchar() retrieves value at address at the start of the stdin buffer
8. Value at address removed from stdin buffer, pointer incremented
9. Subsequent calls repeat steps 7-8 until EOF encountered
A similar question was asked before on stackoverflow but I had trouble understanding the responses.
Generally there is a stdio internal buffer. getchar() may trigger a line read into the buffer, and generally on subsequent calls, it will simply increment a pointer until the pointer reaches the end of the current data in the buffer. The implementation usually uses a simple internal char * to an underlying chunk of dynamic memory, with a few pointers and state variable(s).
Implementations vary, I don't recall the POSIX standard implying much about the internal implementation of getchar() or stdio streams in general, except that given operations should be supported.
If I recall, some implementations are unbuffered (I think the DOS compiler I used did not buffer), but there are multiple standard lib implementations for a given OS.
It is not uncommon to have 2 stdio libs on the same system, example: sys-admins managing AIX, Solaris, HPUX, and other non-Linux/BSD UNIX platforms will frequently install the GNU stack to get tools like gcc, and that stack includes glibc (GNU LIBC).
You can download a libc/stdio source online. See glibc.
If it helps, consider that stdio provides peek and unget functionality, and the only way to do that is by an internal buffer between the terminal and the user program.
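That peek/unget capability can be sketched with getc and ungetc, which the standard guarantees can push back at least one character (peekc is an invented helper name):

```c
#include <stdio.h>

// Peeks at the next character on a stream without consuming it, using
// getc + ungetc -- the one character of pushback stdio guarantees.
// peekc is a helper name invented for this example.
int peekc(FILE *fp) {
    int c = getc(fp);
    if (c != EOF)
        ungetc(c, fp);   // push it back; the next read sees it again
    return c;
}
```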

In C, what's the size of stdout buffer?

Today I learned that stdout is line buffered when it's connected to a terminal, and fully buffered in other cases. So, in a normal situation, if I use printf() without a terminating '\n', it will be printed on the screen only when the buffer is full. How do I get the size of this buffer? How big is it?
The actual size is defined by the individual implementation; the standard doesn't mandate a minimum size (based on what I've been able to find, anyway). Don't have a clue on how you'd determine the size of the buffer.
Edit
Chapter and verse:
7.19.3 Files
...
3 When a stream is unbuffered, characters are intended to appear from the source or at the
destination as soon as possible. Otherwise characters may be accumulated and
transmitted to or from the host environment as a block. When a stream is fully buffered,
characters are intended to be transmitted to or from the host environment as a block when
a buffer is filled. When a stream is line buffered, characters are intended to be
transmitted to or from the host environment as a block when a new-line character is
encountered. Furthermore, characters are intended to be transmitted as a block to the host
environment when a buffer is filled, when input is requested on an unbuffered stream, or
when input is requested on a line buffered stream that requires the transmission of
characters from the host environment. Support for these characteristics is
implementation-defined, and may be affected via the setbuf and setvbuf functions.
Emphasis added.
"Implementation-defined" is not a euphemism for "I don't know", it's simply a statement that the language standard explicitly leaves it up to the implementation to define the behavior.
And having said that, there is a non-programmatic way to find out; consult the documentation for your compiler. "Implementation-defined" also means that the implementation must document the behavior:
3.4.1
1 implementation-defined behavior
unspecified behavior where each implementation documents how the choice is made
2 EXAMPLE An example of implementation-defined behavior is the propagation of the high-order bit
when a signed integer is shifted right.
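As a concrete example, stdio does expose one implementation-defined constant you can inspect: BUFSIZ, the buffer size that setbuf() uses. The standard only requires it to be at least 256, and it need not equal the buffer a given stream actually gets, but it is the documented, portable hint:

```c
#include <stdio.h>

// BUFSIZ is the implementation-defined default buffer size used by
// setbuf(); the C standard only requires it to be at least 256.
// default_buf_size is a trivial wrapper invented for this sketch.
int default_buf_size(void) {
    return BUFSIZ;   // 8192 with glibc; other C libraries differ
}
```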
On Linux, when a pipe is created, a default pipe size of 64K is used. /proc/sys/fs/pipe-max-size holds the maximum pipe size; its default is typically 1048576.
For glibc's default file buffer, 65536 bytes would seem a reasonable guess. However, a grep of the glibc source tree shows otherwise:
libio/libio.h:#define _IO_BUFSIZ _G_BUFSIZ
sysdeps/generic/_G_config.h:#define _G_BUFSIZ 8192
sysdeps/unix/sysv/linux/_G_config.h:#define _G_BUFSIZ 8192
So, for a minute's effort, the best answer to the original question is 8 kilobytes. For mere line buffering, 8K is adequate.
However, for output that goes beyond line buffering, 8K is inefficient compared with 64K. Since the default pipe size is 64K, and unless a larger pipe size is expected or explicitly set, a 64K stdio buffer is recommended. If performance is required, meager 8K buffers do not suffice.
A pipe's size can be increased with fcntl(pipefd, F_SETPIPE_SZ, 1048576), and the stdio-provided file buffer can be replaced with setvbuf(stdout, buffer, _IOFBF, 1048576).
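A minimal sketch of both calls, assuming Linux (F_SETPIPE_SZ is Linux-specific and needs _GNU_SOURCE; the helper names are invented for this example):

```c
#define _GNU_SOURCE            // F_SETPIPE_SZ is Linux-specific
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Grow a pipe's kernel buffer (Linux only). Returns the new size
// (possibly rounded up), or -1 on failure. Invented helper name.
long grow_pipe(int pipefd, long size) {
    return fcntl(pipefd, F_SETPIPE_SZ, size);
}

// Give a stream a large, fully buffered stdio buffer. Must be called
// before any I/O on the stream; buf must outlive the stream.
int use_big_buffer(FILE *fp, char *buf, size_t size) {
    return setvbuf(fp, buf, _IOFBF, size);
}
```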
If a pipe is not used, pipe size is irrelevant. But if data is piped between two processes, increasing the pipe size can be a performance boon; otherwise the smallest buffer or smallest pipe becomes the bottleneck.
When reading, a larger buffer means stdio might need fewer read function invocations. The word "might" is an important consideration: a single read call can return as much data as a single write call provided, but a read call can also return fewer bytes than requested, in which case additional read calls may be needed to obtain the rest.
For writing a single line of data, stdio is overkill. However, stdio makes line-buffered output possible, and in some scenarios line-buffered output is essential. When writing to a file under the proc or sys virtual file systems, the line-feed byte should be included in a single write buffer; using a second, separate write could produce an unexpected outcome.
Mixing read/write with stdio comes with caveats. Before a raw write function invocation, an fflush invocation is required; because stderr is not buffered, the fflush is not required for stderr. And a raw read might return fewer bytes than expected, because stdio might already have buffered the earlier ones.
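A small sketch of why that fflush matters when mixing stdio with a raw write on the same file (mixed_io_demo is an invented helper; it returns 1 if the bytes land in program order):

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Demonstrates why fflush is needed before mixing a raw write(2) with
// stdio output on the same file. Returns 1 if the file contents come out
// in program order, 0 otherwise. mixed_io_demo is an invented helper.
int mixed_io_demo(const char *path) {
    FILE *fp = fopen(path, "w");
    if (!fp) return 0;
    fprintf(fp, "AAA");        // lands in stdio's buffer first
    fflush(fp);                // push it to the kernel before the raw write
    write(fileno(fp), "BBB", 3);
    fclose(fp);                // nothing left to flush here

    char buf[16] = {0};
    FILE *in = fopen(path, "r");
    if (!in) return 0;
    fread(buf, 1, sizeof buf - 1, in);
    fclose(in);
    return strcmp(buf, "AAABBB") == 0;
}
```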
Not mixing unistd and stdio I/O is good advice, but often ignored. Mixing buffered input is unreasonable; mixing unbuffered input is possible; mixing buffered output is plausible.
stdio provides the convenience of buffered IO. Buffered IO is possible without stdio, but it requires additional code. When a sufficiently sized buffer is used, the write function invocation is not necessarily slower than stdio's output functions. However, when a pipe is not involved, mmap can provide superior IO. (mmap on a pipe does not return an error, but the data does not appear in the address space; lseek on a pipe does return an error.)
Lastly, man 3 setvbuf provides a good example: if the buffer is allocated on the stack, the fclose invocation must not be omitted before returning.
The actual question was "In C, what's the size of stdout buffer?", and 8192 largely answers it. But those who encounter this question are often implicitly curious about buffered input/output efficiency, and terse replies leave the significance of pipe size, buffer size, and mmap unexplained. This reply explains them.
Here are some pretty interesting answers on a similar question.
On a Linux system you can view buffer sizes from different tools, including ulimit. The header files limits.h and pipe.h should also contain that kind of information.
You could set the stream to unbuffered, or just flush it.
This seems to have some decent info on when the C runtime typically flushes it for you, with some examples. Take a look at this.

character reading in C

I am struggling to know the difference between these functions. Which one of them can be used if I want to read one character at a time?
fread()
read()
getc()
Depending on how you want to do it you can use any of those functions.
The easiest to use would probably be fgetc().
fread(): read a block of data from a stream (documentation)
read(): the POSIX low-level call for reading raw bytes from a file descriptor (documentation)
getc(): get a character from a stream (documentation). Please consider using fgetc() (doc) instead, since it's somewhat safer.
fread() is a standard C function for reading blocks of binary data from a file.
read() is a POSIX function for doing the same.
getc() is a standard C function (a macro, actually) for reading a single character from a file - i.e., it's what you are looking for.
In addition to the other answers, also note that read is an unbuffered way to read from a file, while fread maintains an internal buffer, so reading is buffered; the buffer size is determined by you. Each time you call read, a system call occurs which attempts to read the number of bytes you requested. With fread, by contrast, a chunk is read into the internal buffer and only the bytes you asked for are returned: on each call it first checks whether it can serve you more data from the buffer, and if not it makes a system call (read) to fetch another chunk, again returning only the portion you wanted.
Also, read deals directly with a file descriptor number, whereas fread needs the file opened as a FILE pointer.
The answer depends on what you mean by "one character at a time".
If you want to ensure that only one character is consumed from the underlying file descriptor (which may refer to a non-seekable object like a pipe, socket, or terminal device) then the only solution is to use read with a length of 1. If you use strace (or similar) to monitor a shell script using the shell command read, you'll see that it repeatedly calls read with a length of 1. Otherwise it would risk reading too many bytes (past the newline it's looking for) and having subsequent processes fail to see the data on the "next line".
On the other hand, if the only program that should be performing further reads is your program itself, fread or getc will work just fine. Note that getc should be a lot faster than fread if you're just reading a single byte.
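To make the distinction concrete, here is a sketch of both single-character approaches (the helper names are invented for this example):

```c
#include <stdio.h>
#include <unistd.h>

// Two ways to consume one character. getc() pulls from stdio's internal
// buffer (roughly one syscall per block); read() with length 1 issues one
// syscall per character and consumes exactly one byte from the descriptor.
// Both helper names are invented for this sketch.
int next_char_stdio(FILE *fp) {
    return getc(fp);                  // EOF on end-of-file or error
}

int next_char_raw(int fd) {
    unsigned char c;
    ssize_t r = read(fd, &c, 1);      // exactly one byte from the fd
    return (r == 1) ? c : EOF;
}
```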
