read ( C function ) behave strangely - c

Let consider a fragment of code:
char *buffer = (char*) malloc(MAX_LENGTH_OF_COMMAND);
while(1){
printf("gsh> ");
read(0, buffer, sizeof(buffer) );
}
And behaviour is quite strange. I mean, the output for following input "input,input,input" is "gsh> gsh> gsh> gsh>".
So I expected that there is a interrupt during I/O process ( I mean getting data from user) because waiting for user is a wasting of time. Ok, I understand it. But, what if I have to use buffer in next line, for example:
char *buffer = (char*) malloc(MAX_LENGTH_OF_COMMAND);
while(1){
printf("gsh> ");
read(0, buffer, sizeof(buffer) );
// do something with buffer.
}
So it is necessary to have COMPLETE buffer ( input). I don't understand that and I don't know what is way to ensure that complete input is available.
Please explain. ( and correct my train of thought).
Thanks in advance.

You just discovered that read() doesn't guarantee how many bytes it will read. You normally have to call read() in a loop until you find input delimiting characters (such as a newline). In addition, we note that after said newline, you will need to keep whatever remains in the read buffer (if anything) as it is valid input to the next thing that needs to read().
Please note that read()'s return is the number of bytes it read, and your input will not be null-terminated (because it's not expecting a string).

Related

Heap Overflow Attack

I am learning about heap overflow attacks and my textbook provides the following vulnerable C code:
/* record type to allocate on heap */
typedef struct chunk {
char inp[64]; /* vulnerable input buffer */
void (*process)(char *); /* pointer to function to process inp */
} chunk_t;
void showlen(char *buf)
{
int len;
len = strlen(buf);
printf("buffer5 read %d chars\n", len);
}
int main(int argc, char *argv[])
{
chunk_t *next;
setbuf(stdin, NULL);
next = malloc(sizeof(chunk_t));
next->process = showlen;
printf("Enter value: ");
gets(next->inp);
next->process(next->inp);
printf("buffer5 done\n");
}
However, the textbook doesn't explain how one would fix this vulnerability. If anyone could please explain the vulnerability and a way(s) to fix it that would be great. (Part of the problem is that I am coming from Java, not C)
The problem is that gets() will keep reading into the buffer until it reads a newline or reaches EOF. It doesn't know the size of the buffer, so it doesn't know that it should stop when it hits its limit. If the line is 64 bytes or longer, this will go outside the buffer, and overwrite process. If the user entering the input knows about this, he can type just the right characters at position 64 to replace the function pointer with a pointer to some other function that he wants to make the program call instead.
The fix is to use a function other than gets(), so you can specify a limit on the amount of input that will be read. Instead of
gets(next->inp);
you can use:
fgets(next->inp, sizeof(next->inp), stdin);
The second argument to fgets() tells it to write at most 64 bytes into next->inp. So it will read at most 63 bytes from stdin (it needs to allow a byte for the null string terminator).
The code uses gets, which is infamous for its potential security problem: there's no way to specify the length of the buffer you pass to it, it'll just keep reading from stdin until it encounters \n or EOF. It may therefore overflow your buffer and write to memory outside of it, and then bad things will happen - it could crash, it could keep running, it could start playing porn.
To fix this, you should use fgets instead.
You can fill up next with more than 64 bytes you will by setting the address for process. Thereby enable one to insert whatever address one wishes. The address could be a pointer to any function.
To fix simple ensure that only 63 bytes (one for null) is read into the array inp - use fgets
The function gets does not limit the amount of text that comes from stdin. If more than 63 chars come from stdin, there will be an overflow.
The gets discards the LF char, that would be an [Enter] key, but it adds a null char at the end, thus the 63 chars limit.
If the value at inp is filled with 64 non-null chars, as it can be directly accessed, the showlen function will trigger an access violation, as strlen will search for the null-char beyond inp to determine its size.
Using fgets would be a good fix to the first problem but it will also add a LF char and the null, so the new limit of readable text would be 62.
For the second, just take care of what is written on inp.

fopen() in read-only mode and its buffer

Consider the following, albeit very messy, code in C:
#include<stdio.h>
int main() {
char buf[3]; //a new, small buffer
FILE *fp = fopen("test.txt", "r"); //our test file, with the contents "123abc"
setvbuf(fp, buf, _IOFBF, 2); //we assign our small buffer as fp's buffer \
//in fully buffered mode
char character = fgetc(fp); // get the first character...
character = fgetc(fp); // and the next...
character = fgetc(fp); // and the next... (third character, '3')
buf[2] = '\0'; //add a terminating line for display
fputs(buf, stderr); //write our buffer to stderr, should show up immediately
}
Compiling and running the code will print '3a' as the contents of our self-designated buffer, buf. My question is: how does this buffer get filled? Does a call to fgetc() mean several calls until the buffer is full and then stops (we only made three calls to fgetc, which should not include the present 'a')? The first buffer was "12", so does this mean when another fgetc() call is made and the pointer references something outside of the scope of the buffer, is the buffer purged and then filled with the next block of data, or simply overwritten? I understand buffer sizes are platform dependent so I'm more concerned with how, in general, an fopen()ed stream in a read mode pulls characters into it's buffer.
The buffer, and exactly how and when it is filled, is an implementation detail inside the stdio package. But the way it is likely to be implemented is that fgetc gets one character from the buffer, if there are characters available in the buffer. If the buffer is empty, it fills it by reading (in your case) two more characters from the file.
So your first fgetc will read 12 from the file and put it in the buffer, and then return '1'. Your second fgetc will not read from the file, since a character is available in the buffer, and return '2'. Your third fgetc will find that the buffer is empty, so it will read 3a from the file and put it in the buffer, and then return '3'. Therefore, when you print the content of the buffer, it will be 3a.
Note that there are two levels of "reading" happening here. First you have your fgetc calls, and then, below that level, code inside the stdio packade which is reading from the file. If we assume this is on a Unix or Linux system, the second type of reading is done using the system call read(2).
The lower-level reading fills the entire buffer at once, so you don't need as many calls to read as calls to fgetc. (Which is the entire point of having the buffer.)

using read system call after a scanf

I am having a confusion regarding the following code,
#include<stdio.h>
int main()
{
char buf[100]={'\0'};
int data=0;
scanf("%d",&data);
read(stdin,buf,4); //attaching to stdin
printf("buffer is %s\n",buf);
return 1;
}
suppose on runtime I provided with the input 10abcd so as per my understanding following should happen:
scanf should place 10 in data
and abcd will still be on the stdin buffer
when read tries to read the stdin (already abcd is there) it should place the abcd into the buf
so printf should print abcd
but it is not happening ,printf showing no o/p
am I missing something here?
First of all read (stdin, ...) should give warnings (if you have them enabled) which you would be wise to heed. read() takes an integer as the first parameter specifying which channel to read from. stdin is of type FILE *.
Even if you changed it to read(0,..., this is not recommended practice. scanf is reading from FILE *stdin which is buffered from file handle 0. read (0, ...) reads directly from the underlying file handle and ignore any characters which were buffered. This will cause strange results unless stdin is set unbuffered.
Ignoring mechanical issues related to the syntax of the read() function call, there are two cases to consider:
Input is from a terminal.
Input is from a file.
Terminal
No data will be available for reading until the user hits return. At that point, the standard I/O library will read all the available data into the buffer associated with stdin (that would be "10abcd\n"). It will then parse the number, leaving the a in the buffer to be read later by other standard I/O functions.
When the read() occurs, it will also wait for the user to provide some input. It has no clue about the data in the stdin buffer. It will hang until the user hits return, and will then read the next lot of data, returning up to 4 bytes in the buffer (no null termination unless it so happens that the fourth character is an ASCII NUL '\0').
File
Actually, this isn't all that much different, except that instead of reading a line of data into the buffer, the standard I/O library will probably read an entire buffer full, (BUFSIZ bytes, which might be 512 or larger). It will then convert the 10 and leave the a for later use. (If the file is shorter than the buffer size, it will all be read into the stdin buffer.)
The read will then collect the next 4 bytes from the file. If the whole file was read already, then it will return nothing — 0 bytes read.
You need to record and check the return value from read(). You should also check the return value from scanf() to ensure it did actually read a number.
try... man read first.
read is declared as ssize_t read(int fd, void *buf, size_t count);
and stdin is declared as FILE *. thats the issue. use fread() instead and you will be sorted.
int main()
{
char buf[100]={'\0'};
int data=0;
scanf("%d",&data);
fread(buf, 1, 4, stdin);
printf("buffer is %s\n",buf);
return 1;
}
EDIT: Your understanding is almost correct but not totally.
To address your question properly, i will agree with Jonathen Laffer.
how your code works,
1) scanf should place 10 in data.
2) abcd will still be on the stdin buffer when you press ENTER.
3) then read() will again wait for entry and you have to again press ENTER to run program further.
4)now if you have entered anything before pressing ENTER for 2nd time the printf should print it else you will not get anything on output other than your printf statement.
Thats why i asked you to use fread instead. hope it helps.

Is there any way to peek at the stdin buffer?

We know that stdin is, by default, a buffered input; the proof of that is in usage of any of the mechanisms that "leave data" on stdin, such as scanf():
int main()
{
char c[10] = {'\0'};
scanf("%9s", c);
printf("%s, and left is: %d\n", c, getchar());
return 0;
}
./a.out
hello
hello, and left is 10
10 being newline of course...
I've always been curious, is there any way to "peek" at the stdin buffer without removing whatever may reside there?
EDIT
A better example might be:
scanf("%9[^.]", c);
With an input of "at.ct", now I have "data" (ct\n) left on stdin, not just a newline.
Portably, you can get the next character in the input stream with getchar() and then push it back with ungetc(), which results in a state as if the character wasn't removed from the stream.
The ungetc function pushes the character specified by c (converted to an unsigned char) back onto the input stream pointed to by stream. Pushed-back characters will be returned by subsequent reads on that stream in the reverse order of their pushing.
Only one character of pushback is guaranteed by the standard, but usually, you can push back more.
As mentioned in the other answers resp. the comments there, in practice, you can almost certainly peek at the buffer if you provide your own buffer with setvbuf, although that is not without problems:
If buf is not a null pointer, the array it points to may be used instead of a buffer allocated by the setvbuf function
that leaves the possibility that the provided buffer may not be used at all.
The contents of the array at any time are indeterminate.
that means you have no guarantee that the contents of the buffer reflects the actual input (and it makes using the buffer undefined behaviour if it has automatic storage duration, if we're picky).
However, in practice the principal problem would be finding out where in the buffer the not-yet-consumed part of the buffered input begins and where it ends.
If you want to look at the stdin buffer without changing it, you could tell it to use a another buffer with setbuf, using an array you can access:
char buffer[BUFSIZ];
if (setbuf(stdin, buffer) != 0)
// error
getchar();
printf("%15s\n", buffer);
This let you see something more than ungetc, but I don't think you can go further in a portable way.
Actually this is legal but is not correct for the standard, quoting from it about the setvbuf (setbuf has the same behavior):
The contents of the array at any time are indeterminate.
So this is not what you need if you're looking for complete portability and standard-compliance, but I can't imagine why the buffer should not contain what is expected. However, it seems to work on my computer.
Beware that you have to provide an array of at least BUFSIZ characters to setbuf, and you must not do any I/O operation on the stream before it. If you need more flexibility, take a look at setvbuf.
You could set your own buffer with setvbuf on stdin, and peek there whenever you want.

Reading from stdin using read(..) and figuring out the size of the buffer

I was wondering if anyone could tell me if there is a way to dynamically allocate a buffer when reading an input from stdin using read(...)
For example:
n = read(0, buffer, sizeof ?); How do I ensure that the number of bytes read from stdin (here 0) is the same as in buffer ?
You can't. You do a read into a fixed-size buffer, e.g.:
char buf[BUF_SIZE];
int num_read = read(0, buf, BUF_SIZE);
and then figure out if there's any more data available (usually by checking whether num_read is equal to BUF_SIZE, but in some cases, maybe you need to interpret the data itself). If there is, then you do another read. And so on.
It's up to you to deal with concatenating all the read data.
You can't (unless you've got precognitive skills) figure out the size of what you will get.
But the read method allow you to read part by part the content of stdin, if you put your read() call into a (while your_stop_condition) loop, you will be able to read all the stuff you need from stdin, by packets.
char buffer_to_read[SIZE];
int bytes=0;
while your_stop_condition
{
bytes = read(0, buffer_to_read, SIZE);
// do what you want with your data read
// if bytes < SIZE, you read an EOF
}

Resources