Understanding behaviour of read() and write() - c

Hi, I am a student and have just started learning low-level C programming. I tried to understand the read() and write() functions with this program:
#include <unistd.h>
#include <stdlib.h>
int main(void)
{
    char *st;
    st = calloc(2, sizeof(char)); /* allocate memory for 2 chars */
    read(0, st, 2);
    write(1, st, 2);
    return 0;
}
I was expecting it to give a segmentation fault when I input more than 2 characters, but when I execute the program and enter "asdf", it prints "as" as output and then executes "df" as a command.
I want to know why it doesn't give a segmentation fault when we assign more than 2 chars to a string of size 2, and why it executes the rest of the input (after 2 chars) as a command instead of just printing it.
Also, reading the man page of read(), I found that read() should give an EFAULT error, but it doesn't.
I am using Linux.

Your read() specifically states that it only wants two characters, so that's all it gets. You are not putting any more characters into the st area, so you won't get any segmentation violations.
As to why it's executing the df part: that doesn't actually happen on my immediate system, since the program hangs around until ENTER is pressed, and it appears the program's I/O is absorbing the extra. But that immediate system is Cygwin; see the update below for behaviour on a "real" UNIX box.
And you'll only get EFAULT if st is outside your address space or otherwise invalid. That's not the case here.
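If you want to see EFAULT for yourself, you can hand read() an address that is deliberately invalid. A minimal sketch (the address 8 is arbitrary; it just needs to be unmapped):
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* (char *)8 is almost certainly not a mapped address, so the
       kernel cannot copy the input there and read() fails. */
    ssize_t n = read(0, (char *)8, 2);

    if (n == -1)
        printf("read failed: %s\n", strerror(errno));
    return 0;
}
Feed it some input, e.g. echo hi | ./a.out; on Linux this typically reports "read failed: Bad address", which is EFAULT.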
Update:
Trying this on Ubuntu 9, I see that the behaviour is identical to yours. When I supply the characters asls, the program outputs as and then does a directory listing.
That means your program is only reading the two characters and leaving the rest for the "next" program to read, which is the shell.
Just make sure you don't try entering:
asrm -rf /
(no, seriously, don't do that).

You ask read() to read no more than 2 characters (the third parameter to read()), so it overwrites no more than two characters in the buffer you supplied. That's why there's no reason for any erroneous behavior.

When you read(), you specify how many bytes you want. You won't get more than that unless your libc is broken, so you'll never write beyond the end of your buffer as long as your count is never greater than the size of your buffer. The extra bytes remain in the stream, and the next read() will get them. And if you don't have a next read() in your app, the process that spawned it (which would normally be the shell) may see them, since spawning a console app from the shell involves attaching the shell's input and output streams to the process. Whether the shell sees and gets the bytes depends partly on how much buffering is done behind the scenes by libc, and whether it can/does "unget" any buffered bytes on exit.
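If you want your program, rather than the shell, to consume those leftover bytes, a minimal sketch is to keep reading until the newline or EOF:
#include <unistd.h>

/* Drain whatever is left on stdin, up to and including the newline,
   so the shell does not see it after the program exits. */
static void drain_stdin(void)
{
    char c;
    while (read(0, &c, 1) == 1 && c != '\n')
        ;
}
Calling this after the write() in the original program keeps the df\n from ever reaching the shell.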

With read(0, st, 2); you read 2 chars from standard input.
The rest of what you typed is not acquired by the program, but it is not discarded either, so the keystrokes go back to the shell from which your program started (namely df and Enter).

Since you only read 2 characters, there is no problem. The df characters are not consumed by your program, so they stay in the terminal buffer and are consumed by the shell:
your program runs
you type asdf\n
your program reads as and leaves df\n in the tty buffer
you write the content of the st buffer to stdout
your program stops
the shell reads df\n from input and executes the df command.
Fun things to try:
strace your program to trace the system calls: strace -e trace=read,write ./yourprogram
read(0, st, 5)

Related

How can I find the dimension of the standard input?

I have a problem finding out how big the stdin stream is when it comes through a pipe. I know that a lot of you will be furious at this question, but just hear me out.
Half of it already works:
$ echo "BYE" | ./my_prog
In the Linux shell this outputs 4, which is exactly what I want.
The problem comes when I try to feed it some bytes: the first time it works, while after that it doesn't work anymore.
$ ./create_bytes.py -n 200 | ./my_prog
$ 200
$ ./create_bytes.py -n 200 | ./my_prog
$ 0
and I can't understand why. I'm sure the stream is always the same length.
The code I'm using is the following:
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    struct stat fd_s;
    if (fstat(STDIN_FILENO, &fd_s) == -1) {
        perror("fstat(fdin)");
        exit(EXIT_FAILURE);
    }
    printf("%lld\n", (long long)fd_s.st_size);
    ...
}
Thanks in advance
EDIT:
This is the actual requirement:
Read a stream of lines (byte sequences terminated by \n) from stdin in 16-byte blocks. No line can be bigger than 128 bytes.
Maybe I'm just making it more difficult than it should be?
I hope it can help
Thanks
If the input is a pipe, it doesn't have a size. It's a stream that in principle can go on forever. The fact that the first time you ran it it gave you a number is not something you can rely on.
If you want to read everything from stdin into memory, you need to read data in a loop, and have a buffer that you realloc() when it is full and there is still more data to be read.
If you need to read in a text file and are going to process it line by line, you can consider using the POSIX function getline(), or you might even read a whole file with getdelim() if you are sure it doesn't contain a given delimiter.
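A minimal sketch of that read-and-realloc loop (the initial buffer size is arbitrary, and error handling on malloc()/realloc() is trimmed for brevity):
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    ssize_t n;

    /* Keep reading until EOF, growing the buffer whenever it fills. */
    while ((n = read(STDIN_FILENO, buf + len, cap - len)) > 0) {
        len += (size_t)n;
        if (len == cap) {
            cap *= 2;
            buf = realloc(buf, cap);
        }
    }
    printf("%zu\n", len);   /* the total number of bytes on stdin */
    free(buf);
    return 0;
}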
You've run into an ill-defined corner case. POSIX specifies that fstat returns the struct stat info of the file associated with a file descriptor. But what happens when the file descriptor does not correspond to a regular file is not really defined. You might expect the stat call to return an error (and I'm sure there are some systems that do so), but on most systems it returns some information about the object the file descriptor refers to. What info depends on the OS and the type of the object.
On Linux with a pipe (the case you seem to be using) it will always return st_size = 0 (which implies you are using something other than Linux). I would imagine there are systems that return with st_size set to the amount of data buffered in the pipe, as that seems a useful piece of information. Your results seem consistent with that.
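Rather than trusting st_size, you can at least detect the pipe case portably by testing st_mode with the POSIX S_ISFIFO macro; a short sketch:
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat fd_s;

    if (fstat(STDIN_FILENO, &fd_s) == 0) {
        if (S_ISFIFO(fd_s.st_mode))
            printf("stdin is a pipe; st_size is not meaningful\n");
        else if (S_ISREG(fd_s.st_mode))
            printf("stdin is a regular file of %lld bytes\n",
                   (long long)fd_s.st_size);
    }
    return 0;
}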
In a comment I asked
If you were to invoke ./create_bytes.py -n 100000000 | ./my_prog, how would you expect it to work?
and you replied
It should print out 100000000 I guess
So let's think about this, and ask: How could this possibly work?
The create_bytes.py script is going to write 100,000,000 bytes. Where do they go? Into the pipe.
But what happens over in my_prog? It doesn't actually read any characters from the pipe, it just asks, what is the "size" of the pipe?
But if create_bytes.py has written 100,000,000 characters, and if my_prog hasn't read them, where are they? Are they all "in the pipe"? And the answer is, no, they are not.
Pipes have a finite capacity. If they fill up, and if the reader doesn't read characters out fast enough, the operating system automatically puts the writing process to sleep. The writing process isn't woken up again, isn't given the opportunity to write any more characters, until some empty space has cleared up in the pipe for it to write into again.
My point is that if pipes have a finite capacity (as I assert that they do), it's impossible for the example I posed to print "100000000", for the simple reason that there is no piece of code, anywhere, that can possibly read and count those characters.
You might imagine that fstat ought to read and count them in this situation somehow, but (a) it doesn't and (b) it couldn't. If fstat read characters from the pipe so it could count them, the characters would be gone. If your program then tried to read them (perhaps down below the ... you had in your code fragment), it wouldn't be able to read them, and that would be Wrong.
But, to convince yourself, I encourage you to try that invocation
./create_bytes.py -n 100000000 | ./my_prog
and see what you get. I'll bet you $100 you don't get "100000000", but the result you do get might be interesting.
I don't have your create_bytes.py script, so instead I tried
yes | a.out
yes is a standard Unix program that prints "y" an infinite number of times. a.out was where I'd just compiled your test program, after fixing it up a bit. And, on my machine, it printed
65536
So evidently, on my machine, when fstat is called on a file descriptor that's connected to a pipe, fstat fills in st_size with the size of the contents of the pipe, and on my machine, pipes evidently have a capacity of 65536, which is of course 2^16.
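As a side note, on Linux you can query a pipe's capacity directly with fcntl() and the Linux-specific F_GETPIPE_SZ operation (available since kernel 2.6.35); a hedged sketch:
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* F_GETPIPE_SZ is Linux-specific: it reports the pipe's buffer size. */
    int sz = fcntl(STDIN_FILENO, F_GETPIPE_SZ);
    if (sz == -1)
        perror("fcntl");   /* stdin is probably not a pipe */
    else
        printf("pipe capacity: %d bytes\n", sz);
    return 0;
}
Running it as yes | ./a.out should print the pipe buffer size, commonly 65536.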

Read STDIN using syscall READ in Linux: unconsumed input is sent to bash

The following program (64-bit YASM) reads 4 bytes from the standard input and exits:
section .data
buf times 16 db 0 ; allocate 16 bytes for the string
section .text
global _start
_start:
mov rax, 0 ; READ syscall
mov rdi, 0 ; STDIN
mov rsi, buf ; Address of the string
mov rdx, 4 ; How many bytes to read
syscall
; Exit:
mov rax, 60
mov rdi, 0
syscall
Once compiled
yasm -f elf64 -l hello.lst -o input.o input.asm
ld -o input input.o
If it is run just as
./input
with, say, 123456\n as user input, it will consume 1234, but the end bit, 56\n, is sent to bash. So bash will try to run the command 56... thankfully unsuccessfully. But imagine if the input were 1234rm -f *. However, if I provide the input using redirection or piping, for example,
echo "123456" | ./input
56 is not sent to bash.
So, how can I prevent unconsumed input from being sent to bash? Do I need to keep consuming it until some form of EOF is encountered? Is this even expected behavior?
The same thing happens in a C program:
#include <unistd.h>

int main()
{
    char buf[16];
    read(0, buf, 4);
    return 0;
}
(I was just wondering if the C runtime was somehow clearing STDIN, but no, it doesn't.)
Yep, it's normal behavior. Anything you don't consume is available for the next process. You know how when you're doing something slow you can type ahead, and when the slow thing finishes the shell will run what you typed? Same thing here.
There's no one-size-fits-all solution. It's really about user expectation. How much input did they expect your program to consume? That's how much you should read.
Does your program act like a single-line prompt, like the shell's read builtin? Then you should read a full line of input, up through the next \n character. The easiest way to do that without over-reading is to read 1 character at a time, as sketched below; if you do bulk reads, you might consume part of the next line by mistake.
Does your program act like a filter like cat or sed or grep? Then you should read until you reach EOF.
Does your program not read from stdin at all like echo or gcc? Then you should leave stdin alone and consume nothing, leaving the input for the next program.
Consuming exactly 4 bytes is unusual but could be reasonable behavior for, say, an interactive program that prompts for a 4-digit PIN and doesn't require the user to press Enter.
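As a sketch of the one-character-at-a-time approach from the single-line-prompt case above, something like this consumes exactly one line and nothing past the newline:
#include <unistd.h>

/* Read one line from stdin without consuming anything past the
   newline; returns the number of bytes stored (newline excluded). */
static ssize_t read_line(char *buf, size_t size)
{
    size_t i = 0;
    char c;

    while (i + 1 < size && read(0, &c, 1) == 1) {
        if (c == '\n')
            break;
        buf[i++] = c;
    }
    buf[i] = '\0';
    return (ssize_t)i;
}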
"Is sent to" bash (or to any other program) is a suboptimal way to think about it, and perhaps that's contributing to your surprise / confusion. "Made available to" would be a more accurate characterization, whether you're talking about a terminal, a pipe, or any other input source connected to a program's standard input.
When one process forks another, such as a shell does to execute many of the commands you enter, the new process inherits a lot of properties from its parent. Among those are its open file descriptors, and in particular, those of the parent's standard streams. On POSIX systems, this is where a process's standard streams come from in the absence of redirection, and it is also the mechanism by which redirection is implemented.
Thus, when no I/O redirection is involved, of course the parent shell reads input data that programs it launches leave unread. The input could not be available for those programs if it were not also available to the parent shell, because both are reading from the same source. This is also why you can move programs between foreground and background in the same terminal, and each one is able to read from the terminal when it is in the foreground.
Inasmuch as you mention the read() syscall in particular, I suspect your surprise might also have been related to seeing different-seeming behavior from programs that use stdio functions to read from the standard input. This has everything to do with the fact that when the standard input is connected to a terminal, the stdio functions read it in line-buffered mode by default. That is, they transfer data from the underlying source into an internal buffer, thus removing it from the stream, one line at a time (with some caveats).
You can emulate that with read. The simplest way would be to read one byte at a time, until you see a newline or end-of-file, but the C library functions do it more efficiently by configuring the terminal driver appropriately when the standard input is in fact connected to a terminal. You can do that directly, too, but it's a bit complicated.
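For the curious, here is a hedged sketch of that direct route using the POSIX termios interface: take stdin out of canonical mode so read() returns per keystroke rather than per line (and restore the old settings before exiting):
#include <termios.h>
#include <unistd.h>

int main(void)
{
    struct termios old, raw;

    tcgetattr(STDIN_FILENO, &old);       /* save the current settings */
    raw = old;
    raw.c_lflag &= ~(ICANON | ECHO);     /* no line buffering, no echo */
    raw.c_cc[VMIN] = 1;                  /* read() returns after 1 byte */
    raw.c_cc[VTIME] = 0;                 /* no inter-byte timeout */
    tcsetattr(STDIN_FILENO, TCSANOW, &raw);

    char c;
    read(STDIN_FILENO, &c, 1);           /* delivered per keystroke now */

    tcsetattr(STDIN_FILENO, TCSANOW, &old);  /* restore before exiting */
    return 0;
}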

System Calls Function for prompting and getting user input

OK, so I am writing a C program for my class, but I am only allowed to use system calls. Basically, we are making our own cp command: the program takes two files as inputs from the command line, copies the first file, and puts its contents into the second file. It is relatively simple, and I have most of the code right or just about right with maybe some small fixes. However, one part of the program is that if the destination file already exists, we need to prompt the user to ask whether it should be overwritten or not. So I need to know how to get user input using a system call; i.e., I can't use scanf, fgets, gets, etc. The only function I can use from the standard library is basically printf. So I need to know what system call to use to prompt the user. This part of the code is supposed to work like cp -i, if that helps anyone. Thank you in advance.
You could use the read system call. To read from standard input, the fd (file descriptor) is 0.
$ man read
READ(2)
Linux Programmer's Manual (2)
NAME
read - read from a file descriptor
SYNOPSIS
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
DESCRIPTION
read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf.
If count is zero, read() returns zero and has no other results. If count is greater than SSIZE_MAX, the result is unspecified.
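Putting write() and read() together for the cp -i case, here is a minimal sketch of a yes/no prompt built from those two system calls alone (the prompt text and the single-character test are just illustrative):
#include <unistd.h>

/* Ask a yes/no question using only the write() and read() system
   calls; returns 1 for an answer starting with 'y' or 'Y'. */
static int confirm(const char *prompt, size_t len)
{
    char buf[16];
    ssize_t n;

    write(1, prompt, len);              /* 1 = standard output */
    n = read(0, buf, sizeof buf);       /* 0 = standard input */
    return n > 0 && (buf[0] == 'y' || buf[0] == 'Y');
}

int main(void)
{
    if (confirm("overwrite file? (y/n) ", 22))
        write(1, "overwriting\n", 12);
    else
        write(1, "skipping\n", 9);
    return 0;
}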

When does scanf start and stop scanning?

It seems scanf begins scanning the input when the Enter key is pressed, and I want to verify this with the code below (I eliminated error checking and handling for simplicity).
#include <stdio.h>

int main(int argc, char **argv) {
    /* disable buffering */
    setvbuf(stdin, NULL, _IONBF, 0);

    int number;
    scanf("%d", &number);
    printf("number: %d\n", number);
    return 0;
}
Here comes another problem: after I disable input buffering (just to verify the result; I know I should next-to-never do that in reality, in case it interferes with the results), the output is (note the extra prompt):
$ ./ionbf
12(space)(enter)
number: 12
$
$
which is different from the output when input buffering is enabled (no extra prompt):
$ ./iofbf
12(space)(enter)
number: 12
$
It seems the newline character is consumed when buffering is enabled. I tested on two different machines, one with gcc 4.1.2 and bash 3.2.25 installed, the other with gcc 4.4.4 and bash 4.1.5, and the result is the same on both.
The problems are:
How to explain the different behaviors when input buffering is enabled and disabled?
Back to the original problem, when does scanf begin scanning user input? The moment a character is entered? Or is it buffered until a line completes?
Interesting question — long-winded answer. In case of doubt, I'm describing what I think happens on Unix; I leave Windows to other people. I think the behaviour would be similar, but I'm not sure.
When you use setvbuf(stdin, NULL, _IONBF, 0), you force the stdin stream to read one character at a time using the read(0, buffer, 1) system call. When you run with _IOFBF or _IOLBF, then the code managing the stream will attempt to read many more bytes at a time (up to the size of the buffer you provide if you use setvbuf(), or BUFSIZ if you don't). These observations plus the space in your input are key to explaining what happens. I'm assuming your terminal is in normal or canonical input mode — see Canonical vs non-canonical terminal input for a discussion of that.
You are correct that the terminal driver does not make any characters available until you type return. This allows you to use backspace etc to edit the line as you type it.
When you hit return, the kernel has 4 characters available to send to any program that wants to read them: 1 2 space return.
In the case where you are not using _IONBF, those 4 characters are all read at once into the standard I/O buffer for stdin by a call such as read(0, buffer, BUFSIZ). The scanf() then collects the 1, the 2 and the space characters from the buffer, and puts back the space into the buffer. (Note that the kernel has passed all four characters to the program.) The program prints its output and exits. The shell resumes, prints a prompt and waits for some more input to be available — but there won't be any input available until the user types another return, possibly (usually) preceded by some other characters.
In the case where you are using _IONBF, the program reads the characters one at a time. It makes a read() call to get one character and gets the 1; it makes another read() call and gets the 2; it makes another read() call and gets the space character. (Note that the kernel still has the return ready and waiting.) It doesn't need the space to interpret the number, so it puts it back in its pushback buffer (there is guaranteed to be space for at least one byte in the pushback buffer), ready for the next standard I/O read operation, and returns. The program prints its output and exits. The shell resumes, prints a prompt, and tries to read a new command from the terminal. The kernel obliges by returning the newline that is waiting, and the shell says "Oh, that's an empty command" and gives you another prompt.
You can demonstrate this is what happens by typing 1 2 x p s return to your (_IONBF) program. When you do that, your program reads the value 12 and the 'x', leaving 'ps' and the newline to be read by the shell, which will then execute the ps command (without echoing the characters that it read), and then prompt again.
You could also use truss or strace or a similar command to track the system calls that are executed by your program to see the veracity of what I suggest happens.
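For example, on Linux something like the following shows each read() your program issues:
$ strace -e trace=read ./ionbf
With _IONBF you should see a run of one-byte read(0, ..., 1) calls; with full buffering, a single large read (the exact output format varies by strace version).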

Interacting with shell program in C to feed a custom Buffer into its gets() function (and more)

If you don't want the long spiel, skip to the last paragraph.
I found a buffer overflow vulnerability in a program that is using gets() to fill a function's local 1024-char* buffer. It's on Sparc Solaris 5.8 (sun4u) 32-bit.
The first obstacle to overcome was that tcsh was not letting me manually input more than 257 chars
(256 if I want to be able to hit Enter ;)
To bypass this, I have been executing /bin/sh and stty raw, and I can now effectively overflow the buffer with more than 1095 chars.
(Note: I have to use Ctrl-J to do line feeds/Enter, though I haven't researched stty raw to examine why this change occurs.)
My issue is this: it is now time to not only overflow the buffer but also write a new return address / preserved %fp in hex. But since I know of no way to manually enter hex codes from inside a terminal, I figured I could write a C program to execute and interact with the vulnerable program and eventually send it my custom buffer.
HOWEVER, if I had a way to manually enter / copy-paste hex bytes, I could just do something EASY like this:
perl -e 'print "n" . "A"x1094 . "\xff\xbe\xf5\x58" . "\xff\xbe\xff\x68" . "\0"'
(If you're wondering why I am printing 'n': the vulnerable program checks for a yes/no at index 0 of the string.)
Because I know of no way to manually paste such hex information, I have been trying in C.
In C, I craft the special buffer and have been learning to popen() the vulnerable program ("w") and fputs my buffer, but it has been working iffily at best. (popen and IPC are all new to me.)
(I also tried piping/dup2ing and I got NO results, no evidence of effective string output/input.) I'm not sure what is going wrong; I experimented a lot with the code and later abandoned it.
The best way to depict the output from my popen program is this: there is a segfault in the vulnerable program only when delimiting the buffer at indexes [1096->1099], which is effectively the location of the function's %fp, so it seemed normal at first. However, delimiting the string at indexes HIGHER than this leaves the program working fine (WTF)! That is not the same behavior as manually pasting, where going further most definitely changes the segfault to a bus error, because I would next be overwriting the return address, followed by whatever possibly important info lies in that stack frame and beyond!
Is the whole string not actually getting sent in one go?! I read something about buffer fflush() issues in the popen() man page, but I don't understand that discussion!
It's my first time using popen(), and there is more behavior that I have deemed strange: if I stop fputs()ing data, the vulnerable program goes into an infinite loop, repeatedly printing the last output string that it NORMALLY would
only print once.
Now, I expected that if I stopped sending input, the program would just sit and wait for more input like a good duck. Apparently not: it has to keep on complaining that I need to enter the next string! Is this normal behavior with popen? Perhaps it is due to my popen program exiting and closing with pclose() before actually finishing (but I was expecting a buffer overflow, and I don't know why I am not getting it like I could when pasting manually).
Note: I am using "\r\n" to signal the vulnerable program to do a 'return'; I am not sure of the equivalent of Ctrl-J / the Enter key (the Enter key does not work in a raw tty). I am also not sure whether a raw tty is even necessary when piping a buffer.
Then I thought I'd try to be clever: cat the strings to a file and then pipe it in via the command line. I have no idea if you can pipe like this to a program expecting inputs.
In this form, I could not even get a single overflow! i.e.
printf "\r\n" > derp && perl -e 'print "n" . "A"x1025' >> derp && printf "\r\n" >> derp
cat derp | ./vuln
Now, rewind back to tcsh: I said I have a 257-char limit, and I needed one less than that if I wanted to be able to hit Enter and have the program continue operation. So perhaps \r\n is not right here, because that's 2 chars; either that, or you just cannot cat into a program like this. But I AM using \r\n in my C programs to tell the vulnerable program that I have hit Enter, and they are at least mildly more functional (not really), though still not overflowing the buffer in the same fashion as manually pasting my trash buffer.
ARGH!
Also, using just one or the other, '\r' or '\n', was most definitely not working! Is there another control char out there I am missing? And is it possible that this could be one of the issues with my programs?
But basically my whole problem is that I can't seem to understand how to create a program that runs and interfaces with a command-line executable and says: hey, take this whole buffer into your gets(), just as if I were running the program from the terminal myself.
I know of no way to manually paste / write hex codes into the terminal, which is the whole reason I am trying to write an interacting program to
craft a string with hex bytes in C and send it to that program's gets().
If you jumped to this paragraph, I want you also to know that I am specifically using /bin/bash and stty raw so that I could manually input more than 257 chars. (I'm not sure if I need to continue doing this if I can successfully create an interacting program to send the vulnerable program the buffer; maybe sending a buffer that way bypasses tcsh's 257-char terminal limit.)
Can anyone help me?!
The popen call is probably the call you want. Make sure to call pclose after the test is finished so that the child process is properly reaped.
Edit: The Linux man page mentions that adding "b" to the mode is possible for binary mode, but POSIX says anything other than "r" or "w" is undefined. Thanks to Dan Moulding for pointing this out.
FILE *f = popen("./vuln", "w");
fwrite(buf, size, count, f);
pclose(f);
If the victim program is reading with gets(), it is reading its standard input.
In your exploit code, therefore, you need to generate an appropriate overlong string. Unless you're playing at being expect, you simply write the overlong buffer to a pipe connected from your exploit program to the victim's standard input. You just need to be sure that your overlong string doesn't contain any newlines (CR or LF). If you pipe, you avoid the vagaries of terminal settings and control-J for control-M etc; the pipe is a transparent 8-bit transport mechanism.
So, your program should:
Create a pipe (pipe()).
Fork.
Child:
connect the read end of the pipe to standard input (dup2()).
close the read and write ends of the pipe.
exec the victim program.
report an error and exit if it fails to exec the victim.
Parent:
close the read end of the pipe.
generates the string to overflow the victim's input buffer.
write the string to the victim down the pipe.
Sit back and watch the fireworks!
You might be able to simplify this with popen() and the "w" option (since the parent process will want to write to the child).
You might need to consider what to do about signal handling. There again, it is simpler not to do so, though if you write to a pipe when the receiver (victim) has exited, you will get a SIGPIPE signal which will terminate the parent.
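A hedged sketch of that plan in C (the victim path ./vuln and the payload layout are placeholders for your actual target):
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    if (pipe(fd) == -1) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == -1) { perror("fork"); return 1; }

    if (pid == 0) {                       /* child: become the victim */
        dup2(fd[0], STDIN_FILENO);        /* read end of pipe -> stdin */
        close(fd[0]);
        close(fd[1]);
        execl("./vuln", "./vuln", (char *)NULL);
        perror("execl");                  /* reached only on failure */
        _exit(1);
    }

    close(fd[0]);                         /* parent: keep write end only */

    /* Build the overlong string: 'n' satisfies the yes/no check,
       the A's are filler, and real exploit bytes would go at the end. */
    char payload[1200];
    payload[0] = 'n';
    memset(payload + 1, 'A', sizeof(payload) - 2);
    payload[sizeof(payload) - 1] = '\n';

    write(fd[1], payload, sizeof(payload));
    close(fd[1]);                         /* EOF for the victim */
    waitpid(pid, NULL, 0);                /* watch the fireworks */
    return 0;
}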
Nothing is yielding results.
Let me highlight what I suspect are the issues.
The string that I pipe includes a \n at the beginning to acknowledge the "press enter to continue" of the vulnerable program.
The buffer I proceed to overflow is declared char c[1024]; I fill this up with over 1100 bytes. I don't get it; sometimes it works, sometimes it doesn't. The wavering factor is whether I am in gdb (being in gdb yields better results), but sometimes it doesn't overflow there either. DUE TO THIS, I really believe this is some sort of issue with the shell / terminal settings and how my buffer is getting transferred. But I have no idea how to fix it :(
I really appreciate the help, everybody, but I am not getting consistent results. I have done a number of things and have learned a lot of rough material, but I think it might be time to abandon this effort. Or at least wait longer until someone comes through with answers.
P.S.
I installed Expect :) but I could not get an overflow from within it...
I seemed to need Expect anyway, because after the pipe is done doing its work I need to regain control of the streams. Expect made this very simple, aside from the fact that I can't get the program to overflow.
I swear this has something to do with the terminal shell settings, but I don't have a clue.
Another update
It's the strangest thing.
I have actually effectively overwritten the return address with the address of a shellcode environment variable.
That was last night. Oddly enough, the program crashed after going to the end of the environment variable, and never gave me a shell. The shellcode is handwritten and works (in an empty test program that alters main's return address to the address of the shellcode and returns, simply to ensure working shellcode). In that test program, main returns into my SPARC shellcode and produces a shell.
...so... I don't know why it didn't work in the new context. But that's the least of my problems, because the overflow itself is strange...
I couldn't seem to reproduce the overflow after some time, as I stated in my prior post. So I figured, hey, why not send a bigger, more dangerous 4000-char buffer filled with trash "A"s, like @JonathanLeffler recommended, to ensure segfaulting. And, OK, let's just say: STRANGE results.
If I send fewer than 3960 chars, there will NOT be an overflow (WTF?!), although earlier I could sometimes get an overflow with only about 1100 chars, which is significantly less, and that smaller buffer would overwrite the exact spot of the return address (when it worked, *cough*).
NOW THE strangest part!
This 'picky' buffer seems to segfault only for specific lengths. I tried using gdb after sending the big 4000-char buffer and noticed something strange. Yes, it segfaulted, but there were 'preserved areas': the return address I was previously able to overwrite is somehow unharmed. You can see this in the image (DON'T CLICK IT YET; read the next paragraph first so you can properly view it, as I am sure it looks a mess without proper understanding). Parts of my crafted buffer are not affecting certain areas of memory that I have affected in the past with a smaller buffer! How or why this is happening, I do not know yet. I have no idea how regular this behavior is, but I will try to find out.
The image starts about 1000 bytes in from the buffer's start address. You can see the 'preserved memory segments' embedded between many 0x41s from my buffer ("A" in hex). In actuality, address 0xffbef4bc holds the return address 0x0001136c, which needs to be overwritten; it is the return address of the function that called this one ('this one' being the function that holds the vulnerable buffer). We cannot overwrite the vulnerable function's own return address due to the nature of register windows on SPARC: that return address is actually BELOW the address of the buffer, unreachable, so we must overwrite the return address of the function above us, aka our caller ;)
Anyway, the point is that I was previously able to overwrite that return address sometimes with a smaller buffer. So WTF is up with these gaps?! Shouldn't a larger buffer be able to overflow them, especially if the smaller buffer could (though not consistently)? Whatever, here's the image:
[image] http://s16.postimage.org/4l5u9g3c3/Screen_shot_2012_06_26_at_11_29_38_PM.png
