Meaning of read and syscall with arguments - c

I'm playing with the strace and ltrace tools to get information about a program which prompts the user for input.
With strace, after the read from the prompt is called, there is an openat of a specific file in read-only mode:
openat(AT_FDCWD, "file", O_RDONLY) = 3
read(3, "22d72c", 6) = 6
I know that the second argument for read is supposed to be a buffer and that read starts at the buffer, but what exactly does it mean here? Does it mean it starts at the 22d72c bit? Or is 22d72c a key and it reads its value?
As for syscall, when I use ltrace, after a scanf for the prompt and an fopen to open the file, it reports a similar syscall such as:
syscall(0, 3, 0x56127f5c96c0, 6)
What is the meaning of syscall's third argument here (0x56127f5c96c0)?

No. "22d72c" are the 6 characters that read read from your file... just check the beginning of file.
Indeed, if you read from STDIN_FILENO using read (or, for example, use fgets), strace will output
read(0,
and stop there, waiting for the read to complete so that it can print out the characters read!
As for
syscall(0, 3, 0x56127f5c96c0, 6)
that output is from a program that doesn't know how to decode the system call parameters for system call 0 (read), so it just displays each argument in some sensible default format: small numbers in decimal, pointers in hex. 0x56127f5c96c0 is the pointer to the first character of the buffer you're reading into.

0x56127f5c96c0 is the pointer passed to read. It's not very useful to you, is it? strace was nice enough to decode the system call, notice that it's a pointer argument, and show you what it points to instead.
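To make the relationship concrete, here is a minimal sketch (assuming a file named "file" exists in the current directory with at least 6 bytes, as in the trace above). The raw read() call only ever sees a pointer; strace is what dereferences it for display:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[7] = {0};               /* room for 6 bytes plus a terminating NUL */
    int fd = open("file", O_RDONLY); /* corresponds to the openat() in the trace */
    if (fd == -1) {
        perror("open");
        return 1;
    }
    /* read() receives a pointer to buf; strace dereferences that pointer
       after the call and prints the bytes it finds there, e.g. "22d72c" */
    ssize_t n = read(fd, buf, 6);
    if (n > 0)
        printf("read %zd bytes: %s\n", n, buf);
    close(fd);
    return 0;
}

Running this under strace should show read(3, "...", 6) with the actual file contents displayed in place of the pointer.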

Related

C Read in bash : stdin and stdout

I have a simple C program with the read function and I don't understand the output.
//code1.c
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
int main(void)
{
    int r;
    char c; // In C, char values are stored in 1 byte
    r = read(0, &c, 1);
    // DOC:
    // ssize_t read (int filedes, void *buffer, size_t size)
    // The read function reads up to size bytes from the file with descriptor filedes, storing the results in the buffer.
    // The return value is the number of bytes actually read.
    // Here:
    // filedes is 0, which is standard input (STDIN_FILENO from <unistd.h>)
    // buffer is &c : the address in memory of char c
    // size is 1, meaning it will read only 1 byte
    printf("r = %d\n", r);
    return 0;
}
And here is a screenshot of the result:
I ran this program twice, as shown above, and typed "a" for the first try and "aecho hi" for the second try.
How I try to explain the results:
When read is called it sees that stdin is closed and opens it (from my point of view, why? It should just read it. I don't know why it opens it).
I type "aecho hi" in the bash and press enter.
read has priority to process stdin and reads the first byte of "aecho hi" : "a".
I get the confirmation that read has processed 1 byte with the printf.
a.out has finished and is terminated.
Somehow the remaining data in stdin is processed in bash (the parent of my program) and goes to stdout, which executes it, and for some reason the first byte has been deleted by read.
This is all hypothetical and very blurry. Any help understanding what is happening would be very welcome.
When you type at your terminal emulator, it writes your keystrokes to a "file", in this case an in-memory buffer that, thanks to the file system, looks just like any other file that might be on disk.
Every process inherits 3 open file handles from its parent. We are interested in one of them here, standard input. The program executed by the terminal emulator (here, bash), is given as its standard input the in-memory buffer described in the first paragraph.
a.out, when run by bash, also receives this same file as its standard input. Keep this in mind: bash and a.out are reading from the same, already-opened file.
After you run a.out, its read blocks, because its standard input is empty. When you type aecho hi<enter>, the terminal writes these characters to the buffer (<enter> becoming a single linefeed character). a.out only requests one character, so it gets a and leaves the rest of the characters in the file. (Or more precisely, the file pointer is still pointing at the e after a is read.)
After a.out completes, bash tries to read from the same file. Normally, the file is empty (i.e., the file pointer is at the end of the file), so bash blocks waiting for another command. In this case, though, there is input available already: echo hi\n. bash reads this now the same as if you had typed it after a.out completed.
Check this. As alk suggests, stdin and stdout are already open when the program starts. Now you have to understand that once you type:
aecho hi
and hit return, the stdin buffer is filled with all those letters (and the space), and it stays full as long as you don't flush it. When the program exits, the stdin buffer is still full, and your terminal automatically echoes what is written into stdin to stdout; this is what you're seeing at the end: your shell reading stdin.
Now, as you point out, your code "presses return" for you, so to speak: in the first execution adding an empty shell line, and in the second executing echo hi. But you must remember that you pressed return, so "\n" is in the buffer! To be explicit, you in fact typed:
aecho hi\n
Once your program exits, the shell reads the remaining characters in the buffer, including the return, and that's what you see!
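If you want to keep the shell from executing the leftover characters, one option is to drain the rest of the line before exiting. A minimal sketch (not part of the original answers):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int r;
    char c;
    r = read(0, &c, 1); /* read the single byte, as in code1.c */
    printf("r = %d\n", r);

    /* drain whatever is left on the line, so the shell never sees it */
    char junk;
    while (read(0, &junk, 1) == 1 && junk != '\n')
        ;
    return 0;
}

With this version, typing "aecho hi" no longer leaves echo hi behind for bash to execute.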

how to use a GDB input file for multiple input

EDIT: GDB was not the issue. Bugs in my code created the behaviour.
I am wondering how GDB's input works.
For example I created the following small C program:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h> // for read()

int main(){
    setbuf(stdout, NULL);
    printf("first:\n");
    char *inp;
    size_t k = 0;
    getline(&inp, &k, stdin);
    printf("%s", inp);
    free(inp);
    // read buffer overflow
    printf("second:\n");
    char buf[0x101];
    read(fileno(stdin), buf, 0x100);
    printf("%s", buf);
    printf("finished\n");
}
It reads a string from stdin twice and echoes it back each time.
To automate this input I created the following python code:
python3 -c 'import sys,time; l1 = b"aaaa\n"; l2 = b"bbbb\n"; sys.stdout.buffer.write(l1); sys.stdout.buffer.flush(); time.sleep(1); sys.stdout.buffer.write(l2); sys.stdout.buffer.flush();'
Running the C program works fine. Running the C program with the python input works fine, too:
python-snippet-above | ./c-program
Running gdb without an input file, typing the strings when requested, also seems fine.
But when it comes to using an input file in gdb, I am afraid I am using the debugger wrongly.
Through tutorials and stackoverflow posts I know that gdb can take input via a file.
So I tried:
& python-snippet > in
& gdb ./c-program
run < in
I expected that gdb would use the first line of the file in for the first read and the second line of in for the second read.
in looks like (due to the python code):
aaaa
bbbb
But instead gdb prints:
(gdb) r < in
Starting program: /home/user/tmp/stackoverflow/test < in
first:
aaaa
second:
finished
[Inferior 1 (process 24635) exited with code 011]
Observing the variable buf after read(fileno(stdin),buf,0x100) shows me:
(gdb) print buf
$1 = 0x0
So I assume that my second input (bbbb) gets lost. How can I use multiple inputs inside gdb?
Thanks for reading :)
I am wondering how GDB's input works.
Your problem doesn't appear to have anything to do with GDB, and everything to do with bugs in your program itself.
First, if you run the program outside of GDB in the same way, namely:
./a.out < in
you should see the same behavior that you see in GDB. Here is what I see:
./a.out < in
first:
aaaa
second:
p ��finished
So what are the bugs?
The first one: from "man getline"
getline() reads an entire line from stream, storing the address of the buffer containing the text into *lineptr.
If *lineptr is NULL, then getline() will allocate a buffer for storing the line, which should be freed by the user program.
You did not set inp to NULL, nor to an allocated buffer. If inp didn't happen to be NULL, you would have gotten heap corruption.
Second bug: you don't check return value from read. If you did, you'd discover that it returns 0, and therefore your printf("%s",buf); prints uninitialized values (which are visible in my terminal as ��).
Third bug: you are expecting read to return the second line. But you used getline on stdin before, and when reading from a file, stdin will use full buffering. Since your input is small, the first getline tries to read BUFSIZ worth of data, and reads (buffers) all of it. A subsequent read (naturally) returns 0 since you've already reached end of file.
You have setbuf(stdout,NULL);. Did you mean to disable buffering on stdin instead?
Fourth bug: read does not NUL-terminate the string, you have to do that yourself, before you can call printf("%s", ...) on it.
With the bugs corrected, I get the expected output:
first:
aaaa
second:
bbbb
finished
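The answer doesn't show the corrected source, but a sketch folding in all four fixes might look like this (keeping the original structure):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    setbuf(stdin, NULL);   /* bug 3: stop stdio from buffering past the first line */
    printf("first:\n");
    char *inp = NULL;      /* bug 1: getline() needs NULL or a malloc'd pointer */
    size_t k = 0;
    getline(&inp, &k, stdin);
    printf("%s", inp);
    free(inp);

    printf("second:\n");
    char buf[0x101];
    ssize_t n = read(fileno(stdin), buf, 0x100);
    if (n < 0) {           /* bug 2: check read's return value */
        perror("read");
        return 1;
    }
    buf[n] = '\0';         /* bug 4: read() does not NUL-terminate */
    printf("%s", buf);
    printf("finished\n");
    return 0;
}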

Linux C: what happens to unused file descriptors?

(apologies for not taking care of my accepts lately - will do so as soon as I get some time; just wanted to ask this question now that it occurred)
Consider the following C program:
#include <unistd.h> // for write()

int main(void) {
    write(3, "aaaaaa\n", 7);
    write(2, "bbbbbb\n", 7);
    write(1, "cccccc\n", 7);
    return 0;
}
I build and run it from the bash shell like this:
$ gcc -o wtest wtest.c
$ ./wtest 3>/dev/stdout
aaaaaa
bbbbbb
cccccc
The way I see it, in this case, due to the shell redirection of fd 3 to stdout, that file descriptor is now "used" (not sure about "opened", since there is no opening of files, in the C code at least) - and so we get the cccccc string output to terminal, as expected.
If I don't use the redirection, then the output is this:
$ ./wtest
aaaaaa
bbbbbb
Now fd 3 is not redirected - and so cccccc string is not output, again as expected.
My question is - what happened to those cccccc bytes? Did they disappear in the same sense as if I had redirected fd 3 to /dev/null? (as in:
$ ./wtest 3>/dev/null
)
In addition, assuming that in a particular case I'd like to "hide" the fd 3 output: would there be a performance difference between redirecting "3>/dev/null" and not addressing fd 3 in the shell at all? In terms of streaming data, that is: if fd 3 outputs a really long byte stream, would there be an instruction penalty per byte written in the "3>/dev/null" case, as opposed to not addressing fd 3?
Many thanks in advance for any answers,
Cheers!
My question is - what happened to those cccccc bytes?
Nothing. You failed to check the return value of write; it would tell you that there was an error, and errno would tell you what the error was.
You also seem to have a questionable concept of what is persistent: the "bytes" are still sitting in the string literal where the compiler put them from the beginning. write copies bytes to a stream.
Jens is right. If you run your program under strace in both situations, you'll see that when you redirect, the write works, because the shell opened the target and duplicated it onto fd 3 on your behalf before exec'ing your executable.
When you look at the strace without the redirection:
write(3, "aaaaaa\n", 7) = -1 EBADF (Bad file descriptor)
write(2, "bbbbbb\n", 7bbbbbb) = 7
write(1, "cccccc\n", 7cccccc) = 7
Which reminds us of the best practice - always check your return values.
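A sketch of the same program with its return values checked (checked_write is a hypothetical helper, not part of the original code):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void checked_write(int fd, const char *buf, size_t len)
{
    /* report failures instead of silently discarding them */
    if (write(fd, buf, len) == -1)
        fprintf(stderr, "write to fd %d failed: %s\n", fd, strerror(errno));
}

int main(void)
{
    checked_write(3, "aaaaaa\n", 7);
    checked_write(2, "bbbbbb\n", 7);
    checked_write(1, "cccccc\n", 7);
    return 0;
}

Run without the redirection, this prints "write to fd 3 failed: Bad file descriptor" instead of silently dropping the bytes.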

read() from stdin

Consider the following line of code:
while((n = read(STDIN_FILENO, buff, BUFSIZ)) > 0)
As per my understanding, the read/write functions are a part of non-buffered I/O. So does that mean the read() function will read only one character per call from stdin? Or in other words, the value of n will be
-1 in case of error
n = 0 in case of EOF
1 otherwise
If it is not the case, when would the above read() function will return and why?
Note: I was also thinking that read() will wait until it successfully reads BUFSIZ characters from stdin. But what happens when the number of characters available to read is less than BUFSIZ? Will read wait forever, or until EOF arrives (Ctrl + D on Unix or Ctrl + Z on Windows)?
Also, let's say BUFSIZ = 100 and stdin = A followed by Ctrl + D (i.e. EOF immediately following a single character). Now how many times will the while loop iterate?
The way read() behaves depends on what is being read. For regular files, if you ask for N characters, you get N characters if they are available, less than N if end of file intervenes.
If read() is reading from a terminal in canonical/cooked mode, the tty driver provides data a line at a time. So if you tell read() to get 3 characters or 300, read will hang until the tty driver has seen a newline or the terminal's defined EOF key, and then read() will return with either the number of characters in the line or the number of characters you requested, whichever is smaller.
If read() is reading from a terminal in non-canonical/raw mode, read will have access to keypresses immediately. If you ask read() to get 3 characters it might return with anywhere from 0 to 3 characters depending on input timing and how the terminal was configured.
read() will behave differently in the face of signals, returning with less than the requested number of characters, or -1 with errno set to EINTR if a signal interrupted the read before any characters arrived.
read() will behave differently if the descriptor has been configured for non-blocking I/O. read() will return -1 with errno set to EAGAIN or EWOULDBLOCK if no input was immediately available. This applies to sockets.
So as you can see, you should be ready for surprises when you call read(). You won't always get the number of characters you requested, and you might get non-fatal errors like EINTR, which means you should retry the read().
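For instance, the usual idiom for coping with EINTR is a small retry wrapper (a sketch, not part of the original answer):

#include <errno.h>
#include <unistd.h>

/* Read up to count bytes, retrying if a signal interrupts the call.
   Returns the number of bytes read, 0 at end of file, or -1 on a real error. */
ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;
    do {
        n = read(fd, buf, count);
    } while (n == -1 && errno == EINTR);
    return n;
}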
Your code reads:
while((n = read(0, buff, BUFSIZ) != 0))
This is flawed - the parentheses mean it is interpreted as:
while ((n = (read(0, buff, BUFSIZ) != 0)) != 0)
where the boolean condition is evaluated before the assignment, so n will only obtain the values 0 (the condition is not true) and 1 (the condition is true).
You should write:
while ((n = read(0, buff, BUFSIZ)) > 0)
This stops on EOF or a read error, and n lets you know which condition you encountered.
Apparently, the code above was a typo in the question.
Unbuffered I/O will read up to the number of characters you read (but not more). It may read less on account of EOF or an error. It may also read less because less is available at the time of the call. Consider a terminal; typically, that will only read up to the end of line because there isn't any more available than that. Consider a pipe; if the feeding process has generated 128 unread bytes, then if BUFSIZ is 4096, you'll only get 128 bytes from the read. A non-blocking file descriptor may return because nothing is available; a socket may return fewer bytes because there isn't more information available yet; a disk read may return fewer bytes because there are fewer than the requested number of bytes left in the file when the read is performed.
In general, though, read() won't return just one byte if you request many bytes.
As the read() manpage states:
Return Value
On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. On error, -1 is returned, and errno is set appropriately. In this case it is left unspecified whether the file position (if any) changes.
So, each read() will read up to the number of specified bytes; but it may read less. "Non-buffered" means that if you specify read(fd, bar, 1), read will only read one byte. Buffered IO attempts to read in quanta of BUFSIZ, even if you only want one character. This may sound wasteful, but it avoids the overhead of making system calls, which makes it fast.
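To see a short read in action, consider a pipe that holds fewer bytes than the read requests. A self-contained sketch:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    char buf[4096];

    if (pipe(fds) == -1)
        return 1;
    write(fds[1], "hello\n", 6); /* only 6 bytes are available */

    /* ask for 4096 bytes; read() returns as soon as data is available */
    ssize_t n = read(fds[0], buf, sizeof buf);
    printf("requested %zu, got %zd\n", sizeof buf, n); /* requested 4096, got 6 */
    return 0;
}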
read attempts to get all of the characters requested.
If EOF happens before all of the requested characters can be returned, it returns what it got.
After it does this, the next read returns 0, to let you know you have reached the end of the file.
What happens when read is called and there is nothing there involves something called blocking. You can open a file in blocking or non-blocking mode. "Blocking" means wait until there is something to return.
This is what you see in a shell waiting for input. It sits there. Until you hit return.
Non-blocking means that read will return -1 and no data if none is available. Depending on a lot of other factors which would make a completely correct answer unusable for you, read will set errno to something like EWOULDBLOCK, which lets you know why your read returned no data. It is not necessarily a fatal error.
Your code could test for a negative return value to find errors, and a zero return to find EOF.
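For the non-blocking case, a sketch of how a descriptor is switched to non-blocking mode with fcntl and how EWOULDBLOCK shows up:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* switch stdin to non-blocking mode */
    int flags = fcntl(STDIN_FILENO, F_GETFL, 0);
    fcntl(STDIN_FILENO, F_SETFL, flags | O_NONBLOCK);

    char buf[64];
    ssize_t n = read(STDIN_FILENO, buf, sizeof buf);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        printf("no input available right now\n"); /* not a fatal error */
    else
        printf("got %zd bytes\n", n);
    return 0;
}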
When we say read is unbuffered, it means no buffering takes place at the level of your process after the data is pulled off the underlying open file description, which is a potentially-shared resource. If stdin is a terminal, there are likely at least 2 additional buffers in play, however:
The terminal buffer, which can probably hold 1-4k of data off the line until it is read.
The kernel's cooked/canonical mode buffer for line entry/editing on a terminal, which lets the user perform primitive editing (backspace, backword, erase line, etc.) on the line until it's submitted (to the buffer described above) by pressing enter.
read will pull whatever has already been submitted, up to the max read length you passed to it, but it cannot pull anything from the line editing buffer. If you want to disable this extra layer of buffering, you need to lookup how to disable cooked/canonical mode for a terminal using tcsetattr, etc.
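A sketch of what disabling canonical mode looks like with tcsetattr (settings restored before exit):

#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int main(void)
{
    struct termios old, raw;

    tcgetattr(STDIN_FILENO, &old);   /* save the current settings */
    raw = old;
    raw.c_lflag &= ~(ICANON | ECHO); /* leave canonical (line) mode */
    raw.c_cc[VMIN] = 1;              /* read() may return after 1 byte */
    raw.c_cc[VTIME] = 0;             /* no inter-byte timeout */
    tcsetattr(STDIN_FILENO, TCSANOW, &raw);

    char c;
    read(STDIN_FILENO, &c, 1);       /* sees the keypress immediately */
    printf("got: %c\n", c);

    tcsetattr(STDIN_FILENO, TCSANOW, &old); /* restore line mode */
    return 0;
}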

Is this kind of behavior defined by standard?

#include <unistd.h>

int main(int argc, char* argv[])
{
    char buf[500];
    read(0, buf, 5);
    return 0;
}
The above reads 5 characters from stdin, but if I input more than 5:
12345morethan5
[root# test]# morethan5
-bash: morethan5: command not found
The remaining characters will be executed as shell commands.
Is this kind of behavior defined by standard?
Sort of :-)
Your program reads 5 characters, and that's it. Not less, not more. The rest remain in the terminal buffer and get sent to your shell once your C program terminates.
Since you are using read(), which is a raw system call, instead of any of the C stdio buffering alternatives, this behaviour is not just expected, but required.
From the POSIX standard on read():
The read() function shall attempt to read nbyte bytes from the file associated with the open file descriptor, fildes, into the buffer pointed to by buf.
...
Upon successful completion, where nbyte is greater than 0, read() shall mark for update the st_atime field of the file, and shall return the number of bytes read. This number shall never be greater than nbyte.
...
Upon successful completion, read() [XSI] and pread() shall return a non-negative integer indicating the number of bytes actually read.
I.e. read() should never read more bytes from the file descriptor than requested.
From the related part on terminals:
It is not, however, necessary to read a whole line at once; any number of bytes, even one, may be requested in a read() without losing information.
...
The last process to close a terminal device file shall cause any output to be sent to the device and any input to be discarded.
Note: normally your shell will still have an open file descriptor for the terminal, until you end the session.
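If you wanted the program to discard the unread characters itself so they never reach the shell, one option (a sketch, assuming stdin is a terminal) is to flush the terminal's input queue before exiting:

#include <termios.h>
#include <unistd.h>

int main(void)
{
    char buf[500];
    read(0, buf, 5);

    /* throw away anything typed beyond the 5 bytes we consumed */
    tcflush(STDIN_FILENO, TCIFLUSH);
    return 0;
}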
That has nothing to do with any standard; it's up to your runtime what to do with stdin. Your runtime makes the standard input available to your program, which reads some bytes from it and quits, and then the remaining bytes are processed by the runtime itself. If you could configure it to clear all the file descriptors after forking a process, you could maybe prevent this behaviour, but that would seriously impede most of the standard command-line workflows, which rely on attaching one process's input to another process's output...
