Strange behavior with read()

Strange behavior with read() - c

I discovered the function read(), but I don't understand everything.
Here is my code:
#include <unistd.h>
#include <stdio.h>
int main(void)
{
char array[10];
int ret;
printf("read : ");
fflush(stdout);
array[sizeof(array) - 1] = '\0';
ret = read(STDIN_FILENO, array, sizeof(array) - 1);
printf("array = %s\n", array);
printf("characters read = %d\n", ret);
//getchar();
return (0);
}
Here is an example of the running program :
$> ./a.out
read : hi guys how are you
array = hi guys h
characters read = 9
$> ow are you
zsh: command not found: ow
$>
Why is it launching a shell command after the end of the program?
I noticed that if I uncomment the getchar() line, this strange behavior disappears. I'd like to understand what is going on, if someone has an idea :)

Your call to read is reading in the first 9 characters of what you've type. Anything else will be left in the input buffer so that when you program exits, your shell will read it instead.
You should check the return value of read so you know how much has been read as it's not guaranteed that it'll be the amount you ask for and also the value returned is used to indicate an error.
The string read in won't be null-terminated either, so you also should use the return value (if positive) to put the NUL character in so that your string is valid.
If you want to read in the whole line, you'll need to put in a loop and identify when there is an end of line character (most likely '\n').

You typed about 20 characters, but you only read 9 characters with read(). Everything after that was left in the terminal driver's input buffer. So when the shell called read() after the program exited, it got the rest of the line, and tried to execute it as a command.
To prevent this, you should keep reading until you get to the end of the line.

Related

Different execution flow using read() and fgets() in C

I have a sample program that takes in an input from the terminal and executes it in a cloned child in a subshell.
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/wait.h>
#include <sched.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>
int clone_function(void *arg) {
execl("/bin/sh", "sh", "-c", (char *)arg, (char *)NULL);
}
int main() {
while (1) {
char data[512] = {'\0'};
int n = read(0, data, sizeof(data));
// fgets(data, 512, stdin);
// int n = strlen(data);
if ((strcmp(data, "exit\n") != 0) && n > 1) {
char *line;
char *lines = strdup(data);
while ((line = strsep(&lines, "\n")) != NULL && strcmp(line, "") != 0) {
void *clone_process_stack = malloc(8192);
void *stack_top = clone_process_stack + 8192;
int clone_flags = CLONE_VFORK | CLONE_FS;
clone(clone_function, stack_top, clone_flags | SIGCHLD, (void *)line);
int status;
wait(&status);
free(clone_process_stack);
}
} else {
exit(0);
}
}
return 0;
}
The above code works in an older Linux system (with minimal RAM( but not in a newer one. Not works means that if I type a simple command like "ls" I don't see the output on the console. But with the older system I see it.
Also, if I run the same code on gdb in debugger mode then I see the output printed onto the console in the newer system as well.
In addition, if I use fgets() instead of read() it works as expected in both systems without an issue.
I have been trying to understand the behavior and I couldn't figure it out. I tried doing an strace. The difference I see is that the wait() return has the output of the ls program in the cases it works and nothing for the cases it does not work.
Only thing I can think of is that read(), since its not a library function has undefined behavior across systems. But I can't agree as to how its affecting the output.
Can someone point me out to why I might be observing this behavior?
EDIT
The code is compiled as:
gcc test.c -o test
strace when it's not working as expected is shown below
strace when it's working as expected (only difference is I added a printf("%d\n", n); following the call for read())
Thank you
Shabir

There are multiple problems in your code:
a successful read system call can return any non zero number between 1 and the buffer size depending on the type of handle and available input. It does not stop at newlines like fgets(), so you might get line fragments, multiple lines, or multiple lines and a line fragment.
furthermore, if read fills the whole buffer, as it might when reading from a regular file, there is no trailing null terminator, so passing the buffer to string functions has undefined behavior.
the test if ((strcmp(data, "exit\n") != 0) && n > 1) { is performed in the wrong order: first test if read was successful, and only then test the buffer contents.
you do not set the null terminator after the last byte read by read, relying on buffer initialization, which is wasteful and insufficient if read fills the whole buffer. Instead you should make data one byte longer then the read size argument, and set data[n] = '\0'; if n > 0.
Here are ways to fix the code:
using fgets(), you can remove the line splitting code: just remove initial and trailing white space, ignore empty and comment lines, clone and execute the commands.
using read(), you could just read one byte at a time, collect these into the buffer until you have a complete line, null terminate the buffer and use the same rudimentary parser as above. This approach mimics fgets(), by-passing the buffering performed by the standard streams: it is quite inefficient but avoids reading from handle 0 past the end of the line, thus leaving pending input available for the child process to read.

It looks like 8192 is simply too small a value for stack size on a modern system. execl needs more than that, so you are hitting a stack overflow. Increase the value to 32768 or so and everything should start working again.

Using control+D (EOF) but return an unexpected character D [duplicate]

This question already has answers here:
Simple program adding "D" to output
(3 answers)
Why does C program print 0D instead of 0? (When EOF sent as Ctrl+D) [duplicate]
(1 answer)
Closed 5 years ago.
I was coding a very simple programme to detect word pattern by entering to stdin and return the times found the pattern.
However the code return me the correct number but follow a char D.
#include <stdio.h>
#include "string.h"
#define MAXLINE 1000 /* maximum input line length */
char pattern[] = "ould"; /* pattern to search for */
/* print all lines from standard input that match pattern */
int main()
{
char line[MAXLINE];
int found = 0;
while (fgets(line, MAXLINE, stdin) != NULL)
if (strstr(line, pattern) != NULL) {
printf("%s", line);
found++;
}
printf("%d \n", found);
return 0;
}
Result:
glaroam2-180-76:Lab2 apple$ ./find0
fould
fould
1D

The code is correct (apart from the #include "string.h" which should be
#include <string.h>)1, the problem is that when you press
Ctrl+D on your terminal, your terminal might write
something on the terminal, which you cannot control and this output might be
^D
After fgets returns NULL, you do printf("%d \n", found); which prints the '1'.
But because there was ^D on the terminal, the ^ was replaced by the '1' and
you end up with:
1D
Change your last printf to this:
printf("\n\n%d \n, found);
And you might see only a '1' in the next lines of the output.
This has nothing to do with your C program, it's the behaviour of your terminal.
My terminal for example doesn't print when pressing Ctrl+D,
but when pressing Ctrl+C I get ^C. There's nothing you
can do.
edit
With There's nothing you can do I mean that you cannot control the way the
terminal from you C program without calling external tools like stty. While
this might solve your problem, you are loosing portability.
However, before you start you program, you can configure your terminal using a
program like stty. See Jonathan Leffler's answer for more info on that.
Fotenotes
1As Jonathan Leffler points out in the comments, using quotes instead
of angle brackets for system headers is not an error per se. For example my GCC
compiler searches in the same directory of the compiled file for headers that
were included with quotes. But in general, it's a good practice to include the
header files included provided by your system with angle brackets.

It's a terminal setting: echoctl. It means that when you type Control-D, the terminal echoes ^D, and then the 1 overwrites the ^. Try using:
stty -echoctl
and then rerunning your program.
With that said, I'm surprised that the D isn't wiped out by the blank after the %d in the format string. I suspect your actual code may be missing that. When I tested on my Mac, the program with the space after the %d did not show the D for long enough for me to spot it; when I removed that space, I got the output shown in the question.

C Read in bash : stdin and stdout

I have a simple C program with the read function and I don't understand the output.
//code1.c
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
int main()
{
int r;
char c; // In C, char values are stored in 1 byte
r = read ( 0, &c, 1);
// DOC:
//ssize_t read (int filedes, void *buffer, size_t size)
//The read function reads up to size bytes from the file with descriptor filedes, storing the results in the buffer.
//The return value is the number of bytes actually read.
// Here:
// filedes is 0, which is stdin from <stdio.h>
// *buffer is &c : address in memory of char c
// size is 1 meaning it will read only 1 byte
printf ("r = %d\n", r);
return 0;
}
And here is a screenshot of the result:
I ran this program 2 times as showed above and typed "a" for the first try and "aecho hi" for the second try.
How I try to explain the results:
When read is called it sees that stdin is closed and opens it (from my point of view, why? It should just read it. I don't know why it opens it).
I type "aecho hi" in the bash and press enter.
read has priority to process stdin and reads the first byte of "aecho hi" : "a".
I get the confirmation that read has processed 1 byte with the printf.
a.out has finished and is terminated.
Somehow the remaining data in stdin is processed in bash (the father of my program) and goes to stdout which executes it and for some reason the first byte has been deleted by read.
This is all hypothetical and very blurry. Any help understanding what is happening would be very welcome.

When you type at your terminal emulator, it writes your keystrokes to a "file", in this case an in-memory buffer that, thanks to the file system, looks just like any other file that might be on disk.
Every process inherits 3 open file handles from its parent. We are interested in one of them here, standard input. The program executed by the terminal emulator (here, bash), is given as its standard input the in-memory buffer described in the first paragraph.
a.out, when run by bash, also receives this same file as its standard input. Keep this in mind: bash and a.out are reading from the same, already-opened file.
After you run a.out, its read blocks, because its standard input is empty. When you type aecho hi<enter>, the terminal writes these characters to the buffer (<enter> becoming a single linefeed character). a.out only requests one character, so it gets a and leaves the rest of the characters in the file. (Or more precisely, the file pointer is still pointing at the e after a is read.)
After a.out completes, bash tries to read from the same file. Normally, the file is empty (i.e., the file pointer is at the end of the file), so bash blocks waiting for another command. In this case, though, there is input available already: echo hi\n. bash reads this now the same as if you had typed it after a.out completed.

Check this. As alk suggests stdin and stdout are already open with the program. Now you have to understand, once you type:
aecho hi
and hit return the stdin buffer is filled with all those letters (and space) - and will continue to be as long as you don't flush it. When the program exits, the stdin buffer is still full, and your terminal automatically handles a write into stdin by echoing it to stdout - this is what you're seeing at the end - your shell reading stdin.
Now as you point out, your code "presses return" for you so to speak - in the first execution adding an empty shell line, and in the second executing echo hi. But you must remember, you pressed return, so "\n" is in the buffer! To be explicit, you in fact typed:
aecho hi\n
Once your program exits the shell reads the remaining characters in the buffer, including the return, and that's what you see!

how to use a GDB input file for multiple input

EDIT: GDB was not the issue. Bugs in my code created the behaviour.
I am wondering how GDB's input works.
For example I created the following small c program:
#include <stdlib.h>
#include <stdio.h>
int main(){
setbuf(stdout,NULL);
printf("first:\n");
char *inp;
size_t k = 0;
getline(&inp, &k, stdin);
printf("%s",inp);
free(inp);
// read buffer overflow
printf("second:\n");
char buf[0x101];
read(fileno(stdin),buf,0x100);
printf("%s",buf);
printf("finished\n");
}
It reads two times a string from stdin and prints the echo of it.
To automate this reading I created following python code:
python3 -c 'import sys,time; l1 = b"aaaa\n"; l2 = b"bbbb\n"; sys.stdout.buffer.write(l1); sys.stdout.buffer.flush(); time.sleep(1); sys.stdout.buffer.write(l2); sys.stdout.buffer.flush();'
Running the c programm works fine. Running the c program with the python input runs fine, too:
python-snippet-above | ./c-program
Running gdb without an input file, typing the strings when requested, seems also fine.
But when it comes to using an inputfile in gdb, I am afraid I am using the debugger wrongly.
Through tutorials and stackoverflow posts I know that gdb can take input via file.
So I tried:
& python-snippet > in
& gdb ./c-program
run < in
I expected that gdb would use for the first read the first line of the file in and for the second read the second line of in.
in looks like (due to the python code):
aaaa
bbbb
But instead gdb prints:
(gdb) r < in
Starting program: /home/user/tmp/stackoverflow/test < in
first:
aaaa
second:
finished
[Inferior 1 (process 24635) exited with code 011]
Observing the variable buf after read(fileno(stdin),buf,0x100) shows me:
(gdb) print buf
$1 = 0x0
So i assume that my second input (bbbb) gets lost. How can I use multiple input inside gdb?
Thanks for reading :)

I am wondering how GDB's input works.
Your problem doesn't appear to have anything to with GDB, and everything to do with bugs in your program itself.
First, if you run the program outside of GDB in the same way, namely:
./a.out < in
you should see the same behavior that you see in GDB. Here is what I see:
./a.out < in
first:
aaaa
second:
p ��finished
So what are the bugs?
The first one: from "man getline"
getline() reads an entire line from stream, storing the address
of the buffer containing the text into *lineptr.
If *lineptr is NULL, then getline() will allocate a buffer
for storing the line, which should be freed by the user program.
You did not set inp to NULL, nor to an allocated buffer. If inp didn't happen to be NULL, you would have gotten heap corruption.
Second bug: you don't check return value from read. If you did, you'd discover that it returns 0, and therefore your printf("%s",buf); prints uninitialized values (which are visible in my terminal as ��).
Third bug: you are expecting read to return the second line. But you used getline on stdin before, and when reading from a file, stdin will use full buffering. Since your input is small, the first getline tries to read BUFSIZ worth of data, and reads (buffers) all of it. A subsequent read (naturally) returns 0 since you've already reached end of file.
You have setbuf(stdout,NULL);. Did you mean to disable buffering on stdin instead?
Fourth bug: read does not NUL-terminate the string, you have to do that yourself, before you can call printf("%s", ...) on it.
With the bugs corrected, I get expected:
first:
aaaa
second:
bbbb
finished

How can I flush unread data from a tty input queue on a UNIX system?

My program has to read just ONE character from the standard input, and so I use read(0, buffer, 1).
But if the user insert more than one single character, they remain in some buffer and when I call a read again they are still there.
So, how can I discard these characters?
I want that when I call a read again, the buffer is filled with the new character, not with the old ones.
An example:
I've a read(0, buffer, 1) and the user writes abcde. My buffer contains a (and it's right), but then I call read(0, buffer, 1) again and I want the next character written by the user from now, and not the b written before.

The POSIX answer is tcflush(): flush non-transmitted output data, non-read input data, or both. There is also tcdrain() which waits for output to be transmitted. They've been in POSIX since there was a POSIX standard (1988 for the trial-use version), though I don't recall ever using them directly.
Example program
Compile this code so the resulting program is called tcflush:
#include <stdio.h>
#include <unistd.h>
#include <termios.h>
int main(void)
{
char buffer[20] = "";
read(0, buffer, 1);
printf("%c\n", buffer[0]);
tcflush(0, TCIFLUSH);
read(0, buffer, 1);
printf("%c\n", buffer[0]);
tcflush(0, TCIFLUSH);
return 0;
}
Example dialog
$ ./tcflush
abc
a
def
d
$
Looks like what the doctor ordered. Without the second tcflush(), the shell complains that it can't find a command ef. You can place a tcflush() before the first read if you like. It wasn't necessary for my simple testing, but if I'd used sleep 10; ./tcflush and then typed ahead, it would make a difference.
Tested on RHEL 5 Linux on an x86/64 machine, and also on Mac OS X 10.7.4.

When your program wants to start reading characters, it must drain the buffer of existing characters and then wait to read the character.
Otherwise, it will read the last character entered, not the last character entered after right now.
Naturally, you do not need to do anything with the read characters; but, you do need to read them.