C read() function difficulty understanding file descriptor [duplicate]

I'm having difficulty understanding the read function in C.
len = read(fd, buf, 32);
When I assign fd as 0, 1, or 2 and run the program, it's basically doing the same thing. Can someone tell me what difference this makes?

read() attempts to read up to count bytes from file descriptor fd.
fd = 0
fd = 1
fd = 2
These are reads from three different file descriptors. The difference is that you are reading from different files, so the data read into the buffer is different.
What is the difference between reading from Book A and reading from Book B? It is the same process of reading a book... it is the content that changes.

As far as I understand it, your question is why nothing changes when you read from file descriptors 0, 1, and 2.
In a normal program, file descriptor 0 is stdin, 1 is stdout, and 2 is stderr. 0 is where you should read your input, 1 is where you should write your output, and 2 is where you should write your error messages.
It is not uncommon that all three file descriptors may point to the same underlying file (a file can also be the console, network connection, etc.) behind the scenes. If you're just running your program from the command line this is actually quite likely. In that case you may be able to read from all of them and get the exact same result.
But. Then you decide that you want to save the output of the program in a file and run it like this: program > output. Now file descriptor 1 is no longer pointing to the same file as stdin and your program would break. Same thing happens if you point stderr to some error logging facility. Or get the input from a file or a pipe. Or run the program in some debuggers. Or a different terminal. This is why you should only read from 0 and no other file descriptors, even if you might get away with it sometimes.
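If you want to see this for yourself, here is a minimal sketch (not from the question; the program name readfd and the file out.txt are just examples). Run it on a terminal with 0, 1 or 2 as the argument and the read succeeds from all three; run it as ./readfd 1 > out.txt and the read on descriptor 1 fails, because descriptor 1 now refers to a file opened only for writing.

/* Minimal sketch (hypothetical program "readfd"): read up to 32 bytes from
   the descriptor given on the command line and report what happened. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int fd = (argc > 1) ? atoi(argv[1]) : 0;   /* default to stdin (fd 0) */
    char buf[32];

    ssize_t len = read(fd, buf, sizeof buf);
    if (len < 0) {
        perror("read");     /* e.g. EBADF when fd 1 refers to a write-only file */
        return 1;
    }
    fprintf(stderr, "read %zd bytes from fd %d\n", len, fd);
    return 0;
}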

Related

Sequential write calls to the same file descriptor, but only the second crashes?

I am currently writing a C program that is writing some data to a file descriptor, where the file descriptor represents some other process that has opened a connection to the program.
My program invariably crashes at a certain point, and I have narrowed down the last few actions it takes, which look something like this:
write(clientfd, "start", 5);
printf("something goes here");
write(clientfd, "end", 3);
printf("something else goes here");
The writes are to the same file descriptor and write basic string literals. However, in the course of this program's execution, only the first write and printf go through; the program appears to crash at the second write, as the second printf never appears.
This doesn't seem to make much sense to me. I've also printed the return value of the first write (the number of bytes it actually wrote), and it appears to be correct (5 in this instance), meaning that the first write call didn't fail, but the second one causes the program to crash for some arcane reason. It may be important to note that, for this file descriptor connection, on the client's side of the connection, the client has already closed their end of the file descriptor. I wasn't sure if that was relevant or not, but I felt that it wasn't, since the first write succeeded.
for this file descriptor connection, on the client's side of the connection, the client has already closed their end of the file descriptor.
From the man page, I guess you're getting one of the following errors:
EBADF fd is not a valid file descriptor or is not open for writing.
EPIPE fd is connected to a pipe or socket whose reading end is closed. When this happens the writing process will also receive a SIGPIPE signal. (Thus, the write return value is seen only if the program catches, blocks or ignores this signal.)
I suggest you check the return value, and in case of error, make sure you inspect it with perror().
On error, -1 is returned, and errno is set appropriately.
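Here is a minimal, self-contained sketch of that check (this is not the asker's program; the pipe below merely simulates a peer that has already closed its end). Ignoring SIGPIPE turns the silent crash into a write() that returns -1 with errno set to EPIPE, which you can then report with perror():

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Write a buffer and report the error instead of dying on SIGPIPE. */
static int checked_write(int fd, const char *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);
    if (n < 0) {
        if (errno == EPIPE)
            fprintf(stderr, "peer closed the connection\n");
        perror("write");            /* prints "write: Broken pipe" for EPIPE */
        return -1;
    }
    return 0;
}

int main(void)
{
    signal(SIGPIPE, SIG_IGN);       /* get EPIPE from write() instead of a fatal signal */

    int pipefd[2];
    pipe(pipefd);
    close(pipefd[0]);               /* simulate the client closing its end */

    checked_write(pipefd[1], "start", 5);   /* fails with EPIPE but does not crash */
    return 0;
}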

What is the purpose of file descriptors? [duplicate]

My understanding is that both fopen() and open() can be used to open files. open() returns a file descriptor. But they should be equivalent in terms of getting a file for reading or writing. What is the purpose of defining file descriptors? It is not clear from the wiki page.
https://en.wikipedia.org/wiki/File_descriptor
fopen returns a FILE * which is a wrapper around the file descriptor (I will ignore the "this is not required by the specification" aspect here, as I am not aware of an implementation that does not do this). At a high level, it looks like this:
application --FILE *--> libc --file descriptor--> kernel
Shells operate directly on file descriptors mainly because they are executing other programs, and you cannot modify another program's FILE * objects. However, you are able to modify another program's file descriptors using the dup syscall at startup (i.e. between fork and exec). For example:
/bin/cat > foo.txt
This tells the shell to execute the /bin/cat program, but first redirect stdout (file descriptor #1) to a file that it opens. This is implemented as (pseudocode):
if (fork() == 0) {
    int fd = open("foo.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    dup2(fd, 1);    /* file descriptor 1 (stdout) now refers to foo.txt */
    close(fd);
    execl("/bin/cat", "cat", (char *)NULL);
}
The closest thing you can do with a FILE * is call freopen, but unlike a file descriptor this does not persist across exec.
But why do we need FILE * at all then, if it's just a wrapper around a file descriptor? One main benefit is having a readahead buffer. For example, consider fgets. This will eventually call the read syscall on the file descriptor associated with the FILE * that you pass in. But how does it know how much to read? The kernel has no option to say "give me one line" (line-buffered ttys aside). If you read more than one line in the first read, the next time you call fgets you might only get part of the next line, since the kernel has already given you the first part in the previous read syscall. The other option would be calling read one character at a time, which is horrible for performance.
So what does libc do? It reads a bunch of characters at once, then stores the extra characters in an internal buffer on the FILE * object. The next time you call fgets, it is able to use the internal buffer. This buffer is also shared with functions like fread, so you can interleave calls to fgets and fread without losing data.
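You can make that readahead buffer visible with a small sketch like the following (the file name lines.txt is just an example and is assumed to contain several lines of text): after a single fgets(), the position the FILE * reports via ftell() is typically well behind the offset of the underlying descriptor, because libc has already read a whole block and buffered the rest.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    FILE *fp = fopen("lines.txt", "r");        /* hypothetical multi-line text file */
    if (!fp)
        return 1;

    char line[128];
    fgets(line, sizeof line, fp);              /* read one line through stdio */

    long stdio_pos   = ftell(fp);                        /* logical position: end of line 1 */
    off_t kernel_pos = lseek(fileno(fp), 0, SEEK_CUR);   /* real offset of the descriptor */

    printf("stdio position: %ld, kernel offset: %lld\n",
           stdio_pos, (long long)kernel_pos);
    fclose(fp);
    return 0;
}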
The two functions operate at different levels:
open() is a lower-level, POSIX function to open a file. It returns a distinct integer to identify, and enable access to, the file opened. This integer is a file descriptor.
fopen() is a higher-level, portable, C standard-library function to open a file.
On a POSIX system, the portable fopen() probably calls the nonportable open(), but this is an implementation detail.
When in doubt, prefer fopen().
For more information on a Linux system, see man 2 read: the POSIX read() function reads data via the file descriptor returned by open().
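A minimal sketch contrasting the two levels (the file name data.txt is only an example); neither version is "the" correct one, fopen()/fgets() is the portable C way and open()/read() is the POSIX way:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Standard C: a buffered FILE * stream */
    FILE *fp = fopen("data.txt", "r");         /* hypothetical input file */
    if (fp) {
        char line[128];
        while (fgets(line, sizeof line, fp))
            fputs(line, stdout);
        fclose(fp);
    }

    /* POSIX: a raw file descriptor */
    int fd = open("data.txt", O_RDONLY);
    if (fd >= 0) {
        char buf[128];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            write(STDOUT_FILENO, buf, (size_t)n);
        close(fd);
    }
    return 0;
}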

fclose(stdout) vs close(STDOUT_FILENO) - C [duplicate]

I want to redirect STDOUT to a file on disk. The point is to make the printf calls in my program write to a file instead of the console.
I saw some articles on the web where they used:
dup2(fileno(outputFile), STDOUT_FILENO);
Or:
close(STDOUT_FILENO);
dup(fileno(outputFile));
In every tutorial they use close(), and it actually works. But I was curious, so I tried to use fclose(stdout) instead, and an error occurred when I then tried to use printf:
fclose(stdout);
dup(fileno(outputFile));
Error:
Bad file descriptor
My question is, why does fclose() not work but close() does?
Thanks.
STDOUT_FILENO is a numeric file descriptor (usually 1). When you use close, you release the descriptor, but then reassign it with dup2. Any output to that descriptor will now go to the new file.
stdout, on the other hand, is a FILE *, an object of sorts that contains a file descriptor. printf formats output into a buffer associated with the FILE and then writes the buffer to the descriptor (depending on the buffering mode). When you fclose a FILE *, it (normally) closes the underlying descriptor and frees the FILE object. The subsequent dup does not resurrect stdout, so printf fails.
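For completeness, a minimal sketch of the usual way to redirect printf to a file (the name out.txt is just an example): flush stdout first so no buffered output is lost, then repoint descriptor 1 with dup2 while leaving the stdout FILE object alive; freopen() is the purely stdio-level alternative.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    FILE *outputFile = fopen("out.txt", "w");  /* hypothetical destination file */
    if (!outputFile)
        return 1;

    fflush(stdout);                            /* flush anything still buffered for the console */
    dup2(fileno(outputFile), STDOUT_FILENO);   /* descriptor 1 now refers to out.txt */

    printf("this line ends up in out.txt\n");  /* stdout's FILE object is still valid */
    fflush(stdout);

    /* Alternative, purely at the stdio level:
       freopen("out.txt", "w", stdout);        */
    return 0;
}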

File Descriptor 0

Drawing from this thread discussing file descriptors and tables;
I want to know how stdin (that is, file descriptor 0, not C's stdin FILE structure) is handled within shells.
When I run a piece of code like read(0, buffer, 1024) in C, where file descriptor 0 is by default connected to the keyboard, the shell lets me type text in because, we assume, read is waiting to read the contents of the character device 'standard input', i.e. the keyboard. But wouldn't standard input simply be empty, and produce that as its result? Alright, let's say the 'connected to the keyboard' explanation is the right one; if that's the case, then that must mean shells line-buffer their commands, right? Calling read on file descriptor 0 would then mean that file descriptor 0 in a shell is connected to this line-buffered output of standard input, and not directly to the keyboard, so what is making the C program wait around? Furthermore, why can we not use lseek() on standard input? Does said 'file' get overwritten with every 'write' made to it, so there is nothing to seek around in, since standard input (being the keyboard) is not really a file on a storage device per se?
read(0, buffer, 1024)
is a system call, a call into kernel code. The kernel's implementation of read will dispatch to the terminal (or pseudo-terminal) device driver, which will wait until you've either typed 1024 characters, a newline, or an EOF marker, Ctrl+D.
then that must mean shells line-buffer their commands, right?
The buffering is performed in the terminal driver, if the terminal is set to the right mode. Otherwise, the program will just wait until 1024 bytes are entered.
Furthermore, why can we not use lseek() on standard input
You can if stdin is a regular file. You just can't seek on a terminal, because that would require the terminal driver to remember all data that passed through the terminal device since it was created.
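A small sketch you can try (not from the question): lseek() on file descriptor 0 typically fails with ESPIPE when stdin is a terminal or a pipe, and succeeds when stdin has been redirected from a regular file. Run it both ways, e.g. ./a.out and ./a.out < somefile.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    if (lseek(0, 0, SEEK_SET) == -1)
        perror("lseek on fd 0");     /* "Illegal seek" (ESPIPE) on a tty or pipe */
    else
        printf("fd 0 is seekable: stdin is a regular file\n");
    return 0;
}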

dup2 function and program behaviour

If I write this code:
#include <stdio.h>
#include <unistd.h>   /* for dup2 */

int main()
{
    FILE *fp;
    fp = fopen("shiv.txt", "w");
    printf("%d", fileno(fp));
    dup2(3, 1);
    fprintf(fp, "hello");
}
As output, the program writes hello3 to the shiv.txt file.
As we can see, printf is called first, yet its output appears after the output of fprintf.
Moreover, dup2 was called after the printf statement, so the output of printf should have gone to the terminal.
The standard I/O streams are buffered — with the possible exception of the standard error stream, which is only required to not be fully buffered by POSIX. Without a call to fflush(stdout) to flush the output buffer for standard output (or the output of a newline sequence if it is line-buffered), the way things work with respect to the FILE interface is not defined once you call dup2.
Since dup2 works with file descriptors and not FILE pointers, you have a problem: POSIX doesn't specify what to do in this case. The buffer associated with stdout may be discarded, or it may be flushed as if with fclose. The buffer may even remain associated and not flushed/discarded since stdout from the perspective of the FILE interface is still open.
So the behavior isn't necessarily deterministic without syncing the FILE interface with the underlying file description (add an fclose(stdout) call after dup2). Additionally, what happens with, e.g., stderr in addition to stdout with dup2 associated with the file description of the file you open? Is the behavior in order of the dup2 calls as with a queue or in reverse order as with a stack or even in a seemingly random order, the latter of which suggests that a segfault may be possible? And what is the order of output if you dup2(STDERR_FILENO, STDOUT_FILENO), followed by dup2(fileno(fp), STDERR_FILENO)? Do the results of writing to the standard output/error buffers appear before the fprintf results or after or mixed (or sometimes one and sometimes another)? Which appears first — the data written to stderr or the data written to stdout? Can you be certain this will always happen in that order?
The answer probably won't surprise you: No. What happens on one configuration may differ from what happens on another configuration because the interaction between file descriptors, the buffers used by the standard streams, and the FILE interface of the standard streams is left undefined.
As #twalberg commented, there is no guarantee that the file you opened has file descriptor 3, so be careful when hard-coding numbers like that. Also, you have STDOUT_FILENO available from <unistd.h>, which is where dup2 is actually declared, so you can avoid hard-coding the 1 by using it.
There are rules to follow when manipulating "handles" to open file descriptions.
Per System Interfaces Chapter 2, file descriptors and streams are "handles" to file descriptions and a file description can have multiple handles.
This chapter defines rules when working with both a file descriptor and a stream for the same file description. If they are not followed, then the results are "undefined".
Calling dup2() to replace the stdout file descriptor after calling printf() (which writes to the stdout stream) without a call to fflush(stdout) in between is a violation of those rules (simplified), hence the undefined behavior.
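For illustration, here is one way the question's program could be rewritten to follow those rules (a sketch, keeping the same structure): flush stdout before the dup2, and use fileno(fp) instead of assuming the descriptor is 3.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    FILE *fp = fopen("shiv.txt", "w");
    if (!fp)
        return 1;

    printf("%d", fileno(fp));          /* goes to the terminal ... */
    fflush(stdout);                    /* ... and is flushed there before redirecting */

    dup2(fileno(fp), STDOUT_FILENO);   /* descriptor 1 now refers to shiv.txt */

    fprintf(fp, "hello");
    fclose(fp);                        /* flushes "hello" to the file */
    return 0;
}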
