How can I find the dimension of the standard input? - c

I'm having trouble finding the size of stdin when it comes through a pipe. I know a lot of you will be furious at this question, but just hear me out.
Half of it already works:
$ echo "BYE" | ./my_prog
In the Linux shell this outputs 4, which is exactly what I want.
The problem appears when I try to feed it more bytes: the first run works, but later runs don't.
$ ./create_bytes.py -n 200 | ./my_prog
$ 200
$ ./create_bytes.py -n 200 | ./my_prog
$ 0
and I can't understand why. I'm sure the stream is always the same length.
The code I'm using is the following
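#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main (int argc, char *argv[]) {
struct stat fd_s;
if (fstat(STDIN_FILENO, &fd_s) == -1) {
perror("fstat(fdin)");
exit(EXIT_FAILURE);
}
/* the original printed fdin_stat.st_size, but the variable is named fd_s */
printf("%lld\n", (long long)fd_s.st_size);
...
}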
Thanks in advance
EDIT:
This is the actual request:
Read a stream of lines (bytes sequence that terminates with \n) from stdin in 16 bytes blocks. Every line can't be bigger than 128 bytes.
Maybe I'm just making it more difficult than it should be?
I hope it can help
Thanks

If the input is a pipe, it doesn't have a size. It's a stream that in principle can go on forever. The fact that the first time you ran it it gave you a number is not something you can rely on.
If you want to read everything from stdin into memory, you need to read data in a loop, and have a buffer that you realloc() when it is full and there is still more data to be read.
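The read-and-grow loop described above could look roughly like this. It is only a sketch: read_all is a made-up name, the starting capacity of 4096 is arbitrary, and the caller is assumed to free the returned buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read everything from fd until end-of-file.
 * Returns a malloc'd buffer (caller frees) and stores the byte count
 * in *len. Returns NULL on allocation or read failure. */
char *read_all(int fd, size_t *len)
{
    assert(len != NULL);
    size_t cap = 4096, used = 0;      /* arbitrary starting capacity */
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (used == cap) {            /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        ssize_t r = read(fd, buf + used, cap - used);
        if (r > 0) {
            used += (size_t)r;        /* may be fewer bytes than requested */
        } else if (r == 0) {
            break;                    /* end-of-file */
        } else if (errno != EINTR) {
            free(buf);                /* real error; EINTR just retries */
            return NULL;
        }
    }
    *len = used;
    return buf;
}
```

Calling read_all(STDIN_FILENO, &n) then gives the total number of bytes in n once the writer has closed its end of the pipe.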
If you need to read in a text file and are going to process it line by line, you can consider using the POSIX function getline(), or you might even read a whole file with getdelim() if you are sure it doesn't contain a given delimiter.
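For the line-by-line case, a getline() loop might be sketched as follows. The name count_lines and the idea of tracking the longest line are illustrative assumptions, not part of the question; the length check would be one way to enforce a limit such as 128 bytes per line.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: count the lines in fp and report the length of the
 * longest one (in bytes, newline included) through *longest.
 * Returns the number of lines, or -1 on a read error. */
long count_lines(FILE *fp, size_t *longest)
{
    assert(fp != NULL && longest != NULL);
    char *line = NULL;    /* getline() allocates and grows this buffer */
    size_t cap = 0;
    ssize_t n;
    long lines = 0;

    *longest = 0;
    while ((n = getline(&line, &cap, fp)) != -1) {
        lines++;
        if ((size_t)n > *longest)
            *longest = (size_t)n;
    }
    free(line);
    return ferror(fp) ? -1 : lines;
}
```

With stdin this would be called as count_lines(stdin, &longest); a final line without a trailing newline still counts as a line.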

You've run into an ill-defined corner case. POSIX specifies that fstat returns the struct stat info of a file associated with a file descriptor. But what happens when the file descriptor does not correspond to a file is not really defined. You might expect the fstat call to return an error (and I'm sure there are some systems that do so), but on most systems it returns some information about the object the file descriptor refers to. What info depends on the OS and the type of the object.
On Linux with a pipe (the case you seem to be using) it will always return st_size = 0 (which implies you are using something other than Linux). I would imagine there are systems that return with st_size set to the amount of data buffered in the pipe, as that seems a useful piece of information. Your results seem consistent with that.
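Since st_size is only dependable for regular files, one defensive option is to check the file type first. This is a sketch with made-up helper names (regular_file_size, report_stdin_size); it only uses fstat() and the S_ISREG macro.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 and stores the size in *size when fd
 * refers to a regular file, 0 when it is something else (pipe, terminal,
 * socket, ...), and -1 if fstat() itself fails. */
int regular_file_size(int fd, off_t *size)
{
    assert(size != NULL);
    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;
    if (!S_ISREG(st.st_mode))
        return 0;             /* st_size is not meaningful here */
    *size = st.st_size;
    return 1;
}

/* Example use on standard input. */
void report_stdin_size(void)
{
    off_t size;
    int r = regular_file_size(STDIN_FILENO, &size);
    if (r == 1)
        printf("%lld\n", (long long)size);
    else if (r == 0)
        puts("stdin is not a regular file; read it to count the bytes");
    else
        perror("fstat");
}
```

Run as ./my_prog < file this would print the file size; run behind a pipe it would take the "not a regular file" branch instead of printing a misleading number.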

In a comment I asked
If you were to invoke ./create_bytes.py -n 100000000 | ./my_prog, how would you expect it to work?
and you replied
It should print out 100000000 I guess
So let's think about this, and ask: How could this possibly work?
The create_bytes.py script is going to write 100,000,000 bytes. Where do they go? Into the pipe.
But what happens over in my_prog? It doesn't actually read any characters from the pipe, it just asks, what is the "size" of the pipe?
But if create_bytes.py has written 100,000,000 characters, and if my_prog hasn't read them, where are they? Are they all "in the pipe"? And the answer is, no, they are not.
Pipes have a finite capacity. If they fill up, and if the reader doesn't read characters out fast enough, the operating system automatically puts the writing process to sleep. The writing process isn't woken up again, isn't given the opportunity to write any more characters, until some empty space has cleared up in the pipe for it to write into again.
My point is that if pipes have a finite capacity (as I assert that they do), it's impossible for the example I posed to print "100000000", for the simple reason that there is no piece of code, anywhere, that can possibly read and count those characters.
You might imagine that fstat ought to read and count them in this situation somehow, but (a) it doesn't and (b) it couldn't. If fstat read characters from the pipe so it could count them, the characters would be gone. If your program then tried to read them (perhaps down below the ... you had in your code fragment), it wouldn't be able to read them, and that would be Wrong.
But, to convince yourself, I encourage you to try that invocation
./create_bytes.py -n 100000000 | ./my_prog
and see what you get. I'll bet you $100 you don't get "100000000", but the result you do get might be interesting.
I don't have your create_bytes.py script, so instead I tried
yes | a.out
yes is a standard Unix program that prints "y" an infinite number of times. a.out was where I'd just compiled your test program, after fixing it up a bit. And, on my machine, it printed
65536
So evidently, on my machine, when fstat is called on a file descriptor that's connected to a pipe, fstat fills in st_size with the size of the contents of the pipe, and on my machine, pipes evidently have a capacity of 65536, which is of course 2^16.
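If the goal really is "how many bytes are sitting in the pipe right now", some systems let you ask that directly. The sketch below assumes the FIONREAD ioctl is available and works on pipes, which is the case on Linux but is not guaranteed by POSIX; bytes_pending is a made-up name. Like the st_size trick, it only reports what is buffered at that instant, never the total the writer will eventually send.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: store in *count the number of bytes that can be
 * read from fd right now. Returns 0 on success, -1 if the ioctl fails
 * (for example because FIONREAD is not supported for this descriptor). */
int bytes_pending(int fd, int *count)
{
    assert(count != NULL);
    int n = 0;
    if (ioctl(fd, FIONREAD, &n) == -1)
        return -1;
    *count = n;
    return 0;
}
```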

Related

Amount of data read() syscall will actually read

Suppose I have a file for which the file descriptor has more than n bytes left until EOF, and I invoke the read() syscall for n bytes. Is the function guaranteed to read n bytes into the buffer? Or can it read less?
The read system call is guaranteed to read as many characters as you asked for, except when it can't. But it turns out that there are so many exceptions -- so many cases where it can't read as many characters as you asked for -- that it basically ends up being safest to assume that any given read call probably won't read as many characters as you asked for. I believe it's good practice to always write your code with that in mind.
The man page on my system says
The system guarantees to read the number of bytes requested if the descriptor references a normal file that has that many bytes left before the end-of-file, but in no other case.
So if it's not a normal file, or if it is a normal file but there aren't enough characters, you'll get fewer than you asked for. But in the case you asked about, yes, you should be guaranteed to get exactly as many characters as you asked for.
With that said, though, if you find yourself with a choice between assuming that read is allegedly guaranteed to read exactly the number of characters requested, versus acknowledging that it might return less, I would always write the code to assume it might return less. That is, if you have a call like
r = read(fd, buf, n);
there isn't usually much to be gained by assuming that if r is greater than 0, it must be exactly n. Your code has to be able to handle the r < n case so it will behave properly when it's almost at end-of-file, so unless you want to have two different code paths (one for "normal" reads, and one for the last read), you might as well write one piece of code, that can handle the r < n case, and let it operate all the time.
(Also, as Zan Lynx reminds in a comment, don't have the code notice that r < n, and infer from that that end-of-file is coming up soon. Wait for r == 0 before deciding you're at end-of-file.)
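A single code path that tolerates short reads could be sketched like this. read_full is a made-up name; the loop keeps asking until it has n bytes, sees r == 0, or hits an error other than EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until n bytes have arrived,
 * end-of-file is reached, or a real error occurs. Returns the number of
 * bytes stored in buf (less than n only at end-of-file), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    char *p = buf;
    size_t got = 0;

    while (got < n) {
        ssize_t r = read(fd, p + got, n - got);
        if (r > 0) {
            got += (size_t)r;   /* short read: loop and ask for the rest */
        } else if (r == 0) {
            break;              /* only r == 0 means end-of-file */
        } else if (errno != EINTR) {
            return -1;          /* real error */
        }
        /* r == -1 with EINTR: interrupted before any data, so retry */
    }
    return (ssize_t)got;
}
```

Note that the function never infers end-of-file from a short read; it stops early only when read() actually returns 0.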
You could've read it from the man page yourself:
On Linux, read() (and similar system calls) will transfer at most
0x7ffff000 (2,147,479,552) bytes, returning the number of bytes
actually transferred. (This is true on both 32-bit and 64-bit
systems.)
So even if you had enough RAM and so on, you couldn't read a full-size DVD image in one go - however, this wouldn't be the sane thing to do either; to access such large files, mmap would be better.
Other than that, a signal might be delivered, which can cause read to fail with EINTR, leaving the buffer contents indeterminate.
ERRORS
[...]
EINTR The call was interrupted by a signal before any data was read; see signal(7).
Is the function guaranteed to read n bytes into the buffer? Or can it
read less?
No, even if your file has more than n bytes before its end, the read(fd, buf, n) function is not guaranteed to read n bytes into the buffer and then return n. It can read less and return a positive value that is less than n.
See Linux man page at https://man7.org/linux/man-pages/man2/read.2.html
RETURN VALUE
It is not an error if this number is smaller than the number of
bytes requested; this may happen for example because fewer bytes
are actually available right now (maybe because we were close to
end-of-file, or because we are reading from a pipe, or from a
terminal), or because read() was interrupted by a signal.

System Calls Function for prompting and getting user input

Ok, so I am writing a C program for my class, but I am only allowed to use system calls. Basically our program is making our own cp command, where we are taking two files as inputs from the command line and copying the first file and putting it into a second file. It is relatively simple and I have most of the code right or just about right with maybe some small fixes. However, one part of the program is if the destination file already exists, we need to prompt the user to ask if it should be overwritten or not, so I need to know how to get user input using a system call function, aka I can't use scanf, fgets, gets etc. The only function I can use from the standard library is printf basically. So I need to know what the system call function is to get a user prompt. This part of the code is supposed to work like cp -i, if that helps anyone. Thank you in advance.
You could use system call read. To read from standard input, fd (file descriptor) is 0.
$ man read
READ(2)
Linux Programmer's Manual (2)
NAME
read - read from a file descriptor
SYNOPSIS
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
DESCRIPTION
read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf.
If count is zero, read() returns zero and has no other results. If count is greater than SSIZE_MAX, the result is unspecified.
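As a rough illustration of prompting with nothing but write() and read(), here is a sketch. The name confirm_overwrite, the prompt wording, and the 64-byte answer buffer are all assumptions made for the example; in the real program the descriptors would be STDIN_FILENO and STDOUT_FILENO.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: print "overwrite PATH? (y/n) " on out_fd and read
 * an answer from in_fd using only system calls. Returns 1 if the answer
 * starts with 'y' or 'Y', and 0 otherwise (including EOF and errors). */
int confirm_overwrite(int in_fd, int out_fd, const char *path)
{
    assert(path != NULL);
    static const char head[] = "overwrite ";
    static const char tail[] = "? (y/n) ";
    char answer[64];   /* arbitrary size; longer input stays unread */

    if (write(out_fd, head, sizeof head - 1) < 0 ||
        write(out_fd, path, strlen(path)) < 0 ||
        write(out_fd, tail, sizeof tail - 1) < 0)
        return 0;

    /* On a terminal in its usual line mode, read() returns once the
     * user presses Enter, with at most one line in the buffer. */
    ssize_t r = read(in_fd, answer, sizeof answer);
    if (r <= 0)
        return 0;
    return answer[0] == 'y' || answer[0] == 'Y';
}
```

A call such as confirm_overwrite(STDIN_FILENO, STDOUT_FILENO, argv[2]) would then decide whether the copy goes ahead.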

Interacting with shell program in C to feed a custom Buffer into its gets() function (and more)

If you don't want the long spiel, head to the last paragraph-->
I found a buffer overflow vulnerability in a program that is using gets() to fill a function's local 1024-char* buffer. It's on Sparc Solaris 5.8 (sun4u) 32-bit.
The first obstacle to overcome was the tch was not letting me manually input > 257 chars
(256 if I want to be able to hit enter ;)
To bypass this, I have been executing /bin/sh and stty raw and I can effectively overflow the buffer now with > 1095 chars.
(Note : I have to use Ctrl-J to do line-feeds/enter , though I haven't researched stty raw to examine why this change occurs.
My issue is this: it is now time to not only overflow the buffer but also write new return address / preserve %fp in hex codes. But since I know of no way to manually enter hex codes from inside a terminal program, I figured I could find a way to use C and have it execute/interact with the vulnerable program and eventually send it my custom buffer.
HOWEVER, if I had a way to manually enter / copy paste hex bytes, I could just do something EASY like this!!!
perl -e 'print "n" . "A"x1094 . "\xff\xbe\xf5\x58" . "\xff\xbe\xff\x68" . "\0"'
(if you're wondering why I am printing 'n' it is because the vulnerable program checks for a yes/no at index 0 of the string)
because I know no way to manually paste such hex-information, I have been trying in C.
In C, I craft the special buffer and have been learning to popen() the vulnerable program ("w") and fputs my buffer, but it has been working iffy at best. (popen and IPC is all new to me)
(I also tried piping/dup2ing and i got NO results, no evidence of effective string output/input) not sure what is going wrong, and I experimented much with the code and later abandoned it.
The best way to depict the output from my 'popen' program is that there is a segfault in the vulnerable program only when delimiting the buffer at indexes [1096->1099]; this is effectively the location of the function's %fp, so it seemed normal at first. However, delimiting the string at indexes HIGHER than this leaves the program working fine (WTF)!!! And that sort of behavior makes me think WTF!!? That is not the same behavior as manually pasting, as going more chars most definitely changes seg fault -> bus error, because I will be next overwriting the return address followed by whatever possibly important info in that stack frame and beyond!!
Is the whole string not actually getting sent in one bang?!?!? I heard something about buffer fflush() issues from the popen() manpage, but I dont understand that talk!!
It's my first time using popen(), and there is more behavior that I have deemed strange-> if i stop fputs()ing data , the vulnerable program goes into an infinite loop, repeatedly printing the last output string that it NORMALLY would
only print once,
but in this case, whenever i stop fputs'ing, the thing starts infinitely printing out. Now, I expected that if I am not outputting, wouldn't the program just sit and wait for more input like a good duck. ??? apparently not. apparently it has to keep on pissing and moaning that I need to enter the next string!! is this normal behavior with popen?! Perhaps it is due to my popen' program exiting and closing with pclose(), before actually finishing (but i was expecting a buffer overflow and i dont know why I am not getting it like I could when pasting manually)
Note: I am using "\r\n" to signal the vulnerable program to do a 'return' , I am not sure the equivalent of CTRL-J / Enter key (which enter key does not work in raw tty). I am also not sure if raw tty is even necessary when piping a buffer.
then I thought I try to be clever and cat the strings to a file and then do a pipe via command line. I have no idea if u can pipe like this to a program expecting inputs
in this form, I could not even get a single overflow!! i.e.
printf "\r\n" > derp && perl -e 'print "n" . "A"x1025' >> derp && printf "\r\n" >> derp
cat derp | ./vuln
Now, rewind <-> back in tsh, i said I have a 257 char limit, and i needed to do ONE LESS THAN THAT if i wanted to be able to hit enter and have the program continue operation. So, perhaps \r\n is not right here, cause that's 2 chars. either that or you just Cannot cat into a program like this. But I AM using \r\n in my C programs to tell the vulnerable program that I have hit enter, and they are at least mildly more functional (not really), though still not overflowing the buffer in the same fashion as manually pasting my trash buffer.
ARGh!!!
Also, using just one or the other: '\r' or '\n' was most definitely not working! is there another control char out there I am missing out on? And is it possible that this could be one of my issues with my programs???
but basically my whole problem is I cant' seem to understand how to create a program to run and interface with a command-line executable and say hey!!! Take this whole buffer into your gets(), i know you'd really love it!! just as I would if I was running the program from terminal myself.
And i know of no way to manually paste / write hex codes into the terminal, which is the whole reason why i am trying to write an interacting program to
craft a string with hex bytes in C and send to that program's gets()!!!!
If you jumped to this paragraph, i want you also to know that I am using specifically /bin/bash and stty raw so that I could manually input more than 257 chars (not sure if I NEED to continue doing this if I can successfully create an interacting program to send the vulnerable program the buffer. maybe sending a buffer in that way bypasses tch' terminal 257 char limit)
Can anyone help me!?!?!?!?!
The popen call is probably the call you want. Make sure to call pclose after the test is finished so that the child process is properly reaped.
Edit: The Linux man page mentions that adding "b" to the mode is possible for binary mode, but POSIX says anything other than "r" or "w" is undefined. Thanks to Dan Moulding for pointing this out.
FILE *f = popen("./vuln", "w");
fwrite(buf, size, count, f);
pclose(f);
If the shell is reading with gets(), it is reading its standard input.
In your exploit code, therefore, you need to generate an appropriate overlong string. Unless you're playing at being expect, you simply write the overlong buffer to a pipe connected from your exploit program to the victim's standard input. You just need to be sure that your overlong string doesn't contain any newlines (CR or LF). If you pipe, you avoid the vagaries of terminal settings and control-J for control-M etc; the pipe is a transparent 8-bit transport mechanism.
So, your program should:
Create a pipe (pipe()).
Fork.
Child:
connect the read end of the pipe to standard input (dup2()).
close the read and write ends of the pipe.
exec the victim program.
report an error and exit if it fails to exec the victim.
Parent:
close the read end of the pipe.
generate the string to overflow the victim's input buffer.
write the string to the victim down the pipe.
Sit back and watch the fireworks!
You might be able to simplify this with popen() and the "w" option (since the parent process will want to write to the child).
You might need to consider what to do about signal handling. There again, it is simpler not to do so, though if you write to a pipe when the receiver (victim) has exited, you will get a SIGPIPE signal which will terminate the parent.
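The pipe/fork/dup2/exec plumbing listed above is generic and can be sketched independently of what gets written. In this sketch feed_child is a made-up name, the bytes to send are supplied by the caller, and SIGPIPE is deliberately left at its default (the simpler option mentioned above), so the parent is killed if the child exits before reading everything.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its standard input connected to a
 * pipe, write len bytes from buf into that pipe, then wait for the child.
 * Returns the status reported by waitpid(), or -1 on failure. */
int feed_child(char *const argv[], const void *buf, size_t len)
{
    assert(argv != NULL && argv[0] != NULL);
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                      /* child */
        dup2(fds[0], STDIN_FILENO);      /* read end becomes stdin */
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);                      /* reached only if exec failed */
    }

    close(fds[0]);                       /* parent keeps only the write end */
    const char *p = buf;
    while (len > 0) {
        ssize_t w = write(fds[1], p, len);
        if (w <= 0)
            break;                       /* this sketch gives up on any error */
        p += w;
        len -= (size_t)w;
    }
    close(fds[1]);                       /* child now sees end-of-file */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;
}
```

Because the data travels through a pipe rather than a terminal, every byte value arrives unchanged and no stty settings are involved. The returned status can be inspected with WIFEXITED and WIFSIGNALED to see how the child ended.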
Nothing is yielding results.
Let me make highlights of what I suspect are issues.
the string that I pipe includes a \n at the beginning to acknowledge the "press enter to continue" of the vulnerable program.
The buffer I proceed to overflow is declared char c[1024]; now I fill this up with over 1100 bytes. I don't get it; sometimes it works, sometimes it doesn't. Wavering factor is if I am in gdb (being in gdb yields better results). but sometimes it doesn't overflow there either. DUE TO THIS, I really believe this to be some sort of issue with the shell / terminal settings on how my buffer is getting transferred. But I have no idea how to fix this :(
I really appreciate the help everybody. But I am not receiving consistent results. I have done a number of things and have learned a lot of rough material, but I think it might be time to abandon this effort. Or, at least wait longer until someone comes through with answers.
p.s.
installed Expect, :) but I could not receive an overflow from within it...
I seemed to necessitate Expect anyways, because after the pipe is done doing its work I need to regain control of the streams. Expect made this very simple, aside from that fact that I can't get the program to overflow.
I swear this has to do something with the terminal shell settings but I don't have a clue.
Another update
It's the strangest.
I have actually effectively overwritten the return address with the address of a shellcode environment variable.
That was last night, Oddly enough, the program crashed after going to the end of the environment variable, and never gave me a shell. The shellcode is handwritten, and works (in an empty program that alters main's return address to the addr of the shellcode and returns, simply for test purposes to ensure working shellcode). In this test program Main returns into my SPARC shellcode and produces a shell.
...so.... idk why it didn't work in the new context. but thats the least of my problems. because the overflow it's strange.....
I couldn't seem to reproduce the overflow after some time, as I had stated in my prior post. So, i figured hey why not, let's send a bigger, more dangerous 4000 char buffer filled with trash "A"s like @JonathanLeffler recommended, to ensure segfaulting. And ok let's just say STRANGE results.
If I send less than 3960 chars, there will NOT be an overflow (WTF?!?!), although earlier i could get overflow at times when doing only about 1100 chars, which is significantly less, and that smaller buffer would overwrite the exact spot of return address (when it worked .*cough)
NOW THE strangest part!!!
this 'picky' buffer seems to segfault only for specific lengths. But i tried using gdb after sending the big 4000 char buffer, and noticed something strange. Ok yes it segfaulted, but there were 'preserved areas,' including the return address i previously was able to overflow, is somehow unharmed, and u can see from the image (DONT CLICK IT YET) Read the next paragraph to understand everything so u can properly view it. I am sure it looks a mess without proper understanding. parts of my crafted buffer are not affecting certain areas of memory that I have affected in the past with a smaller buffer! How or why this is happening. I do not know yet. I have no idea how regular this behavior is. but i will try to find out .
That image takes place about 1000 bytes in from the buffer's start address. you can see the 'preserved memory segments', embedded between many 0x41's from my buffer ("A" in hex) . In actuality, address 0xffbef4bc holds the return address of 0x0001136c, which needs to be overwritten, it is the return address of the function that called this one, 'this one' = the function that holds the vulnerable buffer. we cannot write (*the function that vulnerable buffer belongs to)*'s return address due to the nature of stack windows in SPARC -- that return address is actually BELOW the address of the buffer, unreachable, so therefore we must overwrite the return address of the function above us. aka our caller ;)
Anyways the point is that I was also able to previously overflow that return address sometimes with a smaller buffer. So WTF is up with these gaps!!?!??! Shouldnt a larger buffer be able to overflow these, esp. if the smaller buffer could (though not consistently).. Whatever, here's the image.
[image] http://s16.postimage.org/4l5u9g3c3/Screen_shot_2012_06_26_at_11_29_38_PM.png

Can a program output be redirected to a pipe through program itself?

Well this is regarding a program for a competition.
I was submitting a program & finding my metrics to be relatively way slower than the top scorers in terms of total execution speed. All others (page faults, memory...) were similar. I found that when I ran through my program without the printf (or write) my total execution speed (as measured in my own pc) seemed to be similar.
The competition evaluates the output by redirecting the output (with a pipe, i suppose) into a file & matching its MD5 with theirs....
My question is, Is there by any means something in C, that doesn't write to the output stream but still the pipe gets its input. Or perhaps I am even framing the question wrong. But either way, I am in a fix.
I have been beating my head off with optimizing the algorithm. BTW they accept makefile where many have tried to optimize. For me neither of the optimization flags have worked. I don't know what else can be done about that too...
If you need to make a program that writes its output to a file, you just need to:
open the file with int fd = open("/file/path", O_WRONLY); (you may need to check the parameters, it's been a long time since I've done C programming) and then write(fd, ...); or, if you opened it with fopen instead, fprintf(fp, ...);
open the file with open, close the standard output and use dup2() to duplicate the file descriptor to the file descriptor number 1 (i.e. standard output).
You may try dprintf on the pipe fd (fprintf needs a FILE *, which fdopen can create from an fd).
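The dup2() approach can be sketched as below. redirect_fd is a made-up name, and the open() flags and 0644 mode are assumptions chosen for the example; passing STDOUT_FILENO as target_fd sends everything later written to standard output into the file.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make target_fd (for example STDOUT_FILENO) refer
 * to the file at path, creating or truncating it.
 * Returns 0 on success, -1 on failure. */
int redirect_fd(int target_fd, const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    if (dup2(fd, target_fd) == -1) {   /* dup2 closes target_fd first if needed */
        close(fd);
        return -1;
    }
    if (fd != target_fd)
        close(fd);                     /* target_fd keeps the file open */
    return 0;
}
```

If stdio functions such as printf were used before the call, flushing with fflush(stdout) first avoids buffered output ending up in the wrong place.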

Understanding behaviour of read() and write()

Hi, I am a student and have just started learning low-level C programming. I tried to understand the read() and write() functions with this program.
#include <unistd.h>
#include <stdlib.h>
int main(void)
{
char *st;
st=calloc(2,sizeof(char));//allocate memory for 2 char
read(0,st,2);
write(1,st,2);
return 0;
}
I was expecting a segmentation fault when I entered more than 2 characters. But when I run the program and enter " asdf ", it outputs " as " and then the "df" command gets executed.
I want to know why there is no segmentation fault when more than 2 chars are entered for a buffer of size 2, and why the rest of the input (after the first 2 chars) is executed as a command instead of only being output.
Also, reading the man page of read(), I found that read() should give an EFAULT error, but it doesn't.
I am using linux.
Your read specifically states that it only wants two characters so that's all it gets. You are not putting any more characters into the st area so you won't get any segmentation violations.
As to why it's executing the df part, that doesn't actually happen on my immediate system since the program hangs around until ENTER is pressed, and it appears the program's I/O is absorbing the extra. But that immediate system is Cygwin - see update below for behaviour on a "real" UNIX box.
And you'll only get EFAULT if st is outside your address space or otherwise invalid. That's not the case here.
Update:
Trying this on Ubuntu 9, I see that the behaviour is identical to yours. When I supply the characters asls, the program outputs as then does a directory listing
That means your program is only reading the two characters and leaving the rest for the "next" program to read, which is the shell.
Just make sure you don't try entering:
asrm -rf /
(no, seriously, don't do that).
You ask read() to read no more than 2 characters (third parameters to read()) and so it overwrites no more than two characters in the buffer you supplied. That's why there's no reason for any erroneous behavior.
When you read(), you specify how many bytes you want. You won't get more than that unless your libc is broken, so you'll never write beyond the end of your buffer as long as your count is never greater than the size of your buffer. The extra bytes remain in the stream, and the next read() will get them. And if you don't have a next read() in your app, the process that spawned it (which would normally be the shell) may see them, since spawning a console app from the shell involves attaching the shell's input and output streams to the process. Whether the shell sees and gets the bytes depends partly on how much buffering is done behind the scenes by libc, and whether it can/does "unget" any buffered bytes on exit.
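If the leftover bytes should not reach the shell, the program has to consume the rest of the line itself. Here is a sketch with a made-up helper name, read_line_prefix; it reads the remainder one byte at a time so it never goes past the newline.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read up to n bytes of the current input line into
 * buf, then throw away whatever is left of that line. Returns the number
 * of bytes stored in buf, 0 at end-of-file, or -1 on error. */
ssize_t read_line_prefix(int fd, char *buf, size_t n)
{
    assert(buf != NULL);
    ssize_t r = read(fd, buf, n);
    if (r <= 0)
        return r;
    if (memchr(buf, '\n', (size_t)r) != NULL)
        return r;                     /* the whole line fit in buf */

    char c;
    for (;;) {
        ssize_t d = read(fd, &c, 1);  /* discard the rest, byte by byte */
        if (d != 1 || c == '\n')
            break;                    /* newline, end-of-file, or error */
    }
    return r;
}
```

With read_line_prefix(0, st, 2) in place of read(0, st, 2), typing asdf would still output as, but df and the newline would be consumed by the program instead of being left for the shell.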
with read(0, st, 2); you read 2 chars from standard input.
The rest of what you typed is not consumed by the program, but it is not discarded either, so those keystrokes (df and Enter) go back to the shell from which your program was started.
Since you only read 2 characters, there is no problem. The df characters are not consumed by your program, so they stay in the terminal buffer and are consumed by the shell:
your program runs
you type asdf\n
your program reads as and leaves df\n in the tty buffer
you write the content of the st buffer to stdout
your program stops
the shell reads df\n from input and executes df command.
Fun things to try :
strace your program to trace the system calls: strace -e trace=read,write ./yourprogram
read(0, st, 5)

Resources