I am learning about heap overflow attacks and my textbook provides the following vulnerable C code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* record type to allocate on heap */
typedef struct chunk {
    char inp[64];            /* vulnerable input buffer */
    void (*process)(char *); /* pointer to function to process inp */
} chunk_t;

void showlen(char *buf)
{
    int len;
    len = strlen(buf);
    printf("buffer5 read %d chars\n", len);
}

int main(int argc, char *argv[])
{
    chunk_t *next;

    setbuf(stdin, NULL);
    next = malloc(sizeof(chunk_t));
    next->process = showlen;
    printf("Enter value: ");
    gets(next->inp);
    next->process(next->inp);
    printf("buffer5 done\n");
}
However, the textbook doesn't explain how one would fix this vulnerability. Could someone explain the vulnerability and ways to fix it? (Part of the problem is that I am coming from Java, not C.)
The problem is that gets() will keep reading into the buffer until it reads a newline or reaches EOF. It doesn't know the size of the buffer, so it doesn't know that it should stop when it hits its limit. If the line is 64 bytes or longer, this will go outside the buffer and overwrite process. If the user entering the input knows about this, he can type just the right bytes starting at offset 64 to replace the function pointer with a pointer to some other function that he wants to make the program call instead.
The fix is to use a function other than gets(), so you can specify a limit on the amount of input that will be read. Instead of
gets(next->inp);
you can use:
fgets(next->inp, sizeof(next->inp), stdin);
The second argument to fgets() tells it to write at most 64 bytes into next->inp. So it will read at most 63 bytes from stdin (it needs to allow a byte for the null string terminator).
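Putting it together, the fixed main might look like this (a sketch; note that unlike gets, fgets also stores the newline in the buffer if there is room for it, so showlen will count it):

int main(int argc, char *argv[])
{
    chunk_t *next;

    setbuf(stdin, NULL);
    next = malloc(sizeof(chunk_t));
    if (next == NULL)
        return 1;
    next->process = showlen;
    printf("Enter value: ");
    if (fgets(next->inp, sizeof(next->inp), stdin) != NULL)  /* never writes past inp */
        next->process(next->inp);
    printf("buffer5 done\n");
    free(next);
    return 0;
}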
The code uses gets, which is infamous for its potential security problem: there's no way to specify the length of the buffer you pass to it, it'll just keep reading from stdin until it encounters \n or EOF. It may therefore overflow your buffer and write to memory outside of it, and then bad things will happen - it could crash, it could keep running, it could start playing porn.
To fix this, you should use fgets instead.
If you fill next->inp with more than 64 bytes, the excess overwrites the process pointer. That lets an attacker insert whatever address he wishes; the address could point to any function, or to injected code.
To fix it, simply ensure that at most 63 bytes (plus one for the null terminator) are read into the array inp - use fgets.
The function gets does not limit the amount of text that comes from stdin. If more than 63 chars come from stdin, there will be an overflow.
gets discards the LF char generated by the [Enter] key, but it adds a null char at the end, which is why only 63 chars fit safely in the 64-byte buffer.
If inp is filled with 64 non-null chars (it can be written to directly), the showlen function can trigger an access violation, because strlen will search beyond inp for the terminating null char to determine the length.
Using fgets would be a good fix for the first problem, but note that fgets stores the LF char too (it does not discard it the way gets does), plus the null, so the readable text in a 64-byte buffer drops to 62 chars when the user presses [Enter].
For the second, just take care of what gets written into inp.
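If the trailing newline from fgets is unwanted, a common idiom (a sketch using strcspn) strips it immediately after the read:

if (fgets(next->inp, sizeof(next->inp), stdin) != NULL) {
    /* strcspn returns the index of the first '\n', or the index of
       the terminating '\0' if no newline was stored */
    next->inp[strcspn(next->inp, "\n")] = '\0';
}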
I am reading a line from a file and I do not know the length it is going to be. I know there are ways to do this with pointers, but I am specifically asking about a plain char string. For example, if I initialize the string like this:
char string[300]; //(or bigger)
Will having large string values like this be a problem?
Any hard coded number is potentially too small to read the contents of a file. It's best to compute the size at run time, allocate memory for the contents, and then read the contents.
See Read file contents with unknown size.
char string[300]; //(or bigger)
I am not sure which of the two issues you are concerned with, so I will try to address both below:
If the string in the file is larger than 300 bytes and you try to "stick" it into that buffer without accounting for the maximum length of your array, you will get undefined behaviour from writing past the end of the array.
If you are just asking whether 300 bytes is too much to allocate - then no, it is not a big deal unless you are on some very restricted device. E.g. in Visual Studio the default stack size (where that array would be stored) is 1 MB, if I am not mistaken. The benefit of doing it this way is clear, e.g. you don't need to concern yourself with freeing the memory.
PS. So if you are sure the buffer size you specify is enough, this can be a fine approach: it frees you from the memory-management issues that come with pointers and dynamic allocation.
Will having large string values like this be a problem?
Absolutely.
If your application must read the entire line from a file before processing it, then you have two options.
1) Allocate a buffer large enough to hold a line of the maximum allowed length. For example, the SMTP protocol does not allow lines longer than 998 characters. In that case you can allocate a static buffer of length 1001 (998 + \r + \n + \0). Once you have read a line from a file (or from a client, in the example context) which is longer than the maximum length (that is, you have read 1000 characters and the last one is not \n), you can treat it as a fatal (protocol) error and report it. A sketch of this check follows below.
2) If there are no limitations on the length of the input line, the only thing you can do to ensure your program's robustness is to allocate buffers dynamically as the input is read. This may involve storing multiple malloc-ed buffers in a linked list, or calling realloc when buffer exhaustion is detected (this is how the getline function works, although it is not specified in the C standard, only in POSIX.1-2008).
In either case, never use gets to read the line. Call fgets instead.
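As an illustration of option 1, a sketch (read_line_checked is a hypothetical helper, not a standard function; the caller would pass a buffer of 1001 bytes for the SMTP case):

#include <stdio.h>
#include <string.h>

/* Returns 1 for a complete line, 0 on EOF or read error,
   -1 if the line exceeded the buffer. */
int read_line_checked(FILE *fp, char *buf, int bufsize)
{
    if (fgets(buf, bufsize, fp) == NULL)
        return 0;                 /* EOF or read error */
    if (strchr(buf, '\n') == NULL)
        return -1;                /* buffer filled with no newline: line too long
                                     (or the file ended without one) */
    return 1;                     /* complete line, newline included */
}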
It all depends on how you read the line. For example:
char string[300];
FILE* fp = fopen(filename, "r");
//Error checking omitted
fgets(string, 300, fp);
Taken from tutorialspoint.com
The C library function char *fgets(char *str, int n, FILE *stream) reads a line from the specified stream and stores it into the string pointed to by str. It stops when either (n-1) characters are read, the newline character is read, or the end-of-file is reached, whichever comes first.
That means this will read at most 299 characters from the file. At worst you get a logical error (you might not get all the data you need), but no undefined behavior.
But, if you do:
char string[300];
int i = 0;
FILE* fp = fopen(filename, "r");
do {
    string[i] = fgetc(fp);
    i++;
} while (string[i-1] != '\n');
This can cause a segmentation fault, because on lines longer than 299 characters it writes past the end of string, into memory it doesn't own.
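A bounded version of the same loop (a sketch; fopen error checking omitted as above) stops at the array limit as well as at newline or EOF, and null-terminates the result:

char string[300];
int i = 0;
int c;                              /* int, so EOF is distinguishable */
FILE *fp = fopen(filename, "r");
while (i < (int)sizeof(string) - 1 && (c = fgetc(fp)) != EOF && c != '\n') {
    string[i++] = (char)c;
}
string[i] = '\0';                   /* room for the terminator is always left */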
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main() {
    char *input = (char *)malloc(sizeof(char));
    input = "\0";
    while (1) {
        scanf("%s\n", input);
        if (strcmp(input, "0 0 0") == 0) break;
        printf("%s\n", input);
    }
}
I'm trying to read in a string of integers until "0 0 0" is entered.
The program spits out a bus error as soon as it executes the scanf line, and I have no clue how to fix it.
Below is the error log.
[1] 59443 bus error
You set input to point to the first element of a string literal (while leaking the recently allocated buffer):
input = "\0"; // now the malloc'd buffer is lost
Then you try to modify said literal:
scanf("%s\n", input);
That is undefined behaviour. You can't write to that location. You can fix that problem by removing the first line, input = "\0";.
Next, note that you're only allocating space for one character:
char *input = (char *)malloc(sizeof(char));
Once you fix the memory leak and the undefined behaviour, you can think about allocating more space. How much space you need is for you to say, but you need enough to contain the longest string you want to read in plus an extra character for the null terminator. For example,
char *input = malloc(257);
would allow you to read in strings up to 256 characters long.
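Combining that with a width specifier (covered in more detail in other answers) might look like this sketch:

char *input = malloc(257);          /* room for 256 characters + '\0' */
if (input == NULL)
    return 1;
if (scanf("%256s", input) == 1) {   /* the width must match the allocation */
    printf("%s\n", input);
}
free(input);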
The immediate problem, (thanks to another answer) is that you're initializing input wrong, by pointing it at read-only data, then later trying to write to it via scanf. (Yes, even the lowly literal "" is a pointer to a memory area where the empty string is stored.)
The next problem is semantic: there's no point in trying to initialize it when scanf() will soon overwrite whatever you put there. But if you wanted to, a valid way is input[0] = '\0', which would be appropriate for, say, a loop using strcat().
And finally, waiting in the wings to bite you is a deeper issue: You need to understand malloc() and sizeof() better. You're only allocating enough space for one character, then overrunning the 1-char buffer with a string of arbitrary length (up to the maximum that your terminal will allow on a line.)
A rough cut would be to allocate far more than you'll ever need, say 256 chars, but scanf is an awful function for this reason -- it makes buffer overruns painfully easy, especially for novices. I'll leave it to others to suggest alternatives.
Interestingly, the type of crash can indicate something about what you did wrong. A Bus error often relates to modifying read-only memory (which is still a mapped page), such as you're trying to do, but a Segmentation Violation often indicates overrunning a buffer of a writable memory range, by hitting an unmapped page.
input = "\0";
is wrong.
input is a pointer, not a buffer; "\0" is a string literal, not a char.
You are assigning the pointer a new value: it now points into a segment of memory that holds constants, because "\0" is a constant string literal.
When you then try to modify that constant memory, you get a bus error, which is expected.
In your case I assume you wanted to initialize input with an empty string.
Use
input[0] = '\0';
(note the single quotes around the 0).
The next problem is the malloc:
char *input = (char *)malloc(sizeof(char));
This allocates memory for only one character.
When the user enters "0 0 0", which is 5 characters plus the terminating zero, you get a buffer overflow that will probably corrupt some innocent variable.
Allocate enough memory up front to hold all the user input. Usual values are 256 or 8192 bytes; the exact number doesn't matter much.
Then,
scanf("%s\n", input);
may still overrun the buffer if the user enters a lot of text. Use fgets(buf, limit /* e.g. 8192 */, stdin); that is safer.
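Putting those fixes together, the loop might look like this (a sketch; fgets keeps the newline, so it is stripped before the comparison). Note also that the original scanf("%s", ...) reads one whitespace-delimited word at a time, so it could never have matched "0 0 0" anyway:

#include <stdio.h>
#include <string.h>

int main(void) {
    char input[8192];                             /* generously sized line buffer */
    while (fgets(input, sizeof(input), stdin) != NULL) {
        input[strcspn(input, "\n")] = '\0';       /* strip the trailing newline */
        if (strcmp(input, "0 0 0") == 0)
            break;
        printf("%s\n", input);
    }
    return 0;
}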
I found a program that takes in standard input
int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <PATTERN>\n", argv[0]);
        return 2;
    }
    /* we're not going to worry about long lines */
    char buf[4096]; // 4kibi
    while (!feof(stdin) && !ferror(stdin)) { // when given a file through input redirection, the file becomes stdin
        if (!fgets(buf, sizeof(buf), stdin)) { // fgets reads at most sizeof(buf)-1 characters from stdin into buf; it stops after reading a newline
            break;
        }
        if (rgrep_matches(buf, argv[1])) {
            fputs(buf, stdout); // writes the string to stdout
            fflush(stdout);
        }
    }
    if (ferror(stdin)) {
        perror(argv[0]); // prints a description of the error
        return 1;
    }
    return 0;
}
Why is buf set to 4096 elements? Is it because the maximum number of characters on each line can only be 4096?
The answer is in the code you pasted:
/* we're not going to worry about long lines */
char buf[4096]; // 4kibi
Lines longer than 4096 characters can occur, but the author didn't deem them worth caring about.
Note also the definition of fgets:
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte (\0) is stored after the last character in the buffer.
So if there is a line longer than 4095 characters (since the 4096'th is reserved for the null byte), it will be split across multiple iterations of the while loop.
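If the program did need to care, detecting the split is straightforward (a sketch): a complete line ends with '\n' inside the buffer.

char buf[4096];
while (fgets(buf, sizeof(buf), stdin)) {
    if (strchr(buf, '\n') == NULL && !feof(stdin)) {
        /* buffer filled before a newline arrived: this chunk is
           only part of a longer line */
    }
    /* ... process buf ... */
}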
The program just reads up to 4095 characters per iteration.
There's no limit on the size of a line, but there may be a limit on the size of the stack (8 MB on modern Linux systems).
Most programmers choose what fits the program being implemented; in this case the programmer commented that there's no need to worry about longer lines.
The author seems to have simply picked a very large memory block for the expected input, to avoid dealing with chunks.
The seemingly arbitrary number 4096 is most likely explained by the fact that it is (a) a power of two and (b) a common memory page size. So when the system chooses to swap a page out to disk, it can do so in one go without extra overhead.
Whether this really helps is another question, because if you allocate a page with malloc, it may not be aligned on a page boundary.
I myself often use such a number, because it doesn't hurt and in the best case it might help. However, it is only really relevant if you care about speed and have real control over the allocation process. If you allocate a page directly from the OS, such a size might really have some benefits.
There is no such thing as a maximum number of characters in a line. 4096 is chosen on the assumption that, under normal conditions, no line will be longer than 4096 bytes.
It's more like preparing for the worst case.
If the array is smaller than the line, the operation is simply broken into more than one step, until EOF is encountered.
I think the author simply chose the char buffer size to be 4 kibi (4096 = 4 * 1024) by design, as the comment in the code says.
#include <stdio.h>
#include <string.h>

int main()
{
    char a[1000000];
    int i, j;
    int arr[1000000];
    gets(a);
    unsigned long int len = strlen(a);
    if (len < 1000000) {
        for (i = 0, j = len - 1; i < len && j >= 0; i++, j--)
            arr[j] = a[i] - '0';
    }
    return 0;
}
I am using this code to store a number entered at the keyboard into an integer array, but it keeps giving me a segmentation fault and I can't find where it comes from. Also, I've heard gets() isn't a good option,
but I don't know how to use the alternative way to do it. It seems to be fairly simple code.
Can anyone point out where the memory problem is and why?
I have used the debugger in Code::Blocks; the call stack is empty.
The alternative to gets is fgets:
fgets(a, sizeof(a), stdin);
You have placed two very large arrays on the stack. It's unlikely that your process was launched with a large enough stack (over 5MB). The arrays a and arr should be dynamically allocated using malloc() or calloc() and freed later using free().
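A sketch of what that change might look like inside main (same sizes as the question):

char *a = malloc(1000000);
int *arr = malloc(1000000 * sizeof(*arr));
if (a == NULL || arr == NULL) {
    free(a);                /* free(NULL) is a no-op, so this is safe */
    free(arr);
    return 1;
}
/* ... same logic as before ... */
free(a);
free(arr);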
Define your array a as follows:
char a[ 1000000 ] = { 0 };
The convention in C is for strings to have a null terminator ('\0'). This initialization ensures two things:
a does not contain leftover values from a previous stack frame.
a has a null terminator at the end.
Remark:
Having an array of length 1 million will exhaust your call stack quite quickly.
Consider using a dynamic array to be more space efficient. You would need to read one byte at a time from stdin until EOF or new line is sent, implemented as follows:
for ( int byte = getchar(); byte != EOF && byte != '\n'; byte = getchar() ) {
dynarray_Add( dynArray, byte );
}
... where dynarray_Add would be some function that adds a character to your array of characters and performs the appropriate doubling when the length reaches the capacity.
If you are unfamiliar with a dynamic array, read more here.
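dynarray_Add is hypothetical, but a minimal version of the idea might look like this (a sketch; the growth policy and abort-on-failure are arbitrary choices):

#include <stdlib.h>

typedef struct {
    char  *data;
    size_t len;
    size_t cap;
} dynarray_t;                                  /* start with dynarray_t d = {0}; */

void dynarray_Add(dynarray_t *d, int byte) {
    if (d->len == d->cap) {                    /* full: double the capacity */
        size_t newcap = d->cap ? d->cap * 2 : 16;
        char *tmp = realloc(d->data, newcap);  /* acts as malloc when data is NULL */
        if (tmp == NULL)
            abort();                           /* out of memory */
        d->data = tmp;
        d->cap = newcap;
    }
    d->data[d->len++] = (char)byte;
}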
There are a number of issues with your code. The most obvious is that you're allocating a huge amount of memory on the stack, which is a bad idea even if it isn't causing a problem in itself. You're also using an old, unsafe function (gets) and you're not doing appropriate error checking.
fgets is almost a drop-in replacement for gets, with better safety. Simply call it as fgets(a, 1000000, stdin) and it will never overrun the size of your buffer. Check the return value against NULL and you won't have uninitialised-memory issues. Use malloc to get the memory and you won't have stack-size issues (don't forget to free!). Finally, don't use int for your loop counters when the length is stored as an unsigned long: in this case the size of the buffer means it can't become an infinite loop, but it's still bad style. (Also, I think you want size_t, not unsigned long; they just happen to be the same on your system.)
I have this snippet of the code:
char* receiveInput() {
    char *s;
    scanf("%s", s);
    return s;
}

int main()
{
    char *str = receiveInput();
    int length = strlen(str);
    printf("Your string is %s, length is %d\n", str, length);
    return 0;
}
I receive this output:
Your string is hellàÿ", length is 11
my input was:
helloworld!
Can somebody explain why, and why this style of coding is bad? Thanks in advance.
Several questions have addressed what you've done wrong and how to fix it, but you also said (emphasis mine):
can somebody explain why, and why this style of coding is bad
I think scanf is a terrible way to read input. It's inconsistent with printf, makes it easy to forget to check for errors, makes it hard to recover from errors, and is incompatible with ordinary (and easier to do correctly) read operations (like fgets and company).
First, note that the "%s" format will read only until it sees whitespace. Why whitespace? Why does "%s" print out an entire string, but reads in strings in such a limited capacity?
If you'd like to read in an entire line, as you may often be wont to do, scanf provides a way... with "%[^\n]". What? What is that? When did this become Perl?
But the real problem is that neither of those are safe. They both freely overflow with no bounds checking. Want bounds checking? Okay, you got it: "%10s" (and "%10[^\n]" is starting to look even worse). That will only read 9 characters, and add a terminating nul-character automatically. So that's good... for when our array size never needs to change.
What if we want to pass the size of our array as an argument to scanf? printf can do this:
char string[] = "Hello, world!";
printf("%.*s\n", sizeof string, string); // prints whole message;
printf("%.*s\n", 6, string); // prints just "Hello,"
Want to do the same thing with scanf? Here's how:
static char tmp[/*bit twiddling to get the log10 of SIZE_MAX plus a few*/];
// if we did the math right we shouldn't need to use snprintf
snprintf(tmp, sizeof tmp, "%%%us", bufsize);
scanf(tmp, buffer);
That's right - scanf doesn't support the "%.*s" variable precision printf does, so to do dynamic bounds checking with scanf we have to construct our own format string in a temporary buffer. This is all kinds of bad, and even though it's actually safe here it will look like a really bad idea to anyone just dropping in.
Meanwhile, let's look at another world. Let's look at the world of fgets. Here's how we read in a line of data with fgets:
fgets(buffer, bufsize, stdin);
Infinitely less headache, no wasted processor time converting an integer precision into a string that will only be reparsed by the library back into an integer, and all the relevant elements are sitting there on one line for us to see how they work together.
Granted, this may not read an entire line. It will only read an entire line if the line is shorter than bufsize - 1 characters. Here's how we can read an entire line:
char *readline(FILE *file)
{
    size_t size = 80; // start off small
    size_t curr = 0;
    char *buffer = malloc(size);
    if (buffer == NULL) return NULL;
    while (fgets(buffer + curr, size - curr, file))
    {
        if (strchr(buffer + curr, '\n')) return buffer; // success
        curr = size - 1;
        size *= 2;
        char *tmp = realloc(buffer, size);
        if (tmp == NULL) /* handle error */;
        buffer = tmp;
    }
    /* handle error (EOF or read failure) */
    return NULL;
}
The curr variable is an optimization to prevent us from rechecking data we've already read, and is unnecessary (although useful as we read more data). We could even use the return value of strchr to strip off the ending "\n" character if you preferred.
Notice also that size_t size = 80; as a starting place is completely arbitrary. We could use 81, or 79, or 100, or add it as a user-supplied argument to the function. We could even add an int (*inc)(int) argument, and change size *= 2; to size = inc(size);, allowing the user to control how fast the array grows. These can be useful for efficiency, when reallocations get costly and boatloads of lines of data need to be read and processed.
We could write the same with scanf, but think of how many times we'd have to rewrite the format string. We could limit it to a constant increment, instead of the doubling (easily) implemented above, and never have to adjust the format string; we could give in, just store the number, do the math as above, and use snprintf to convert it to a format string every time we reallocate so that scanf can convert it back to the same number; or we could limit our growth and starting position so that we can adjust the format string manually (say, just increment the digits), but this gets hairy after a while and may require recursion (!) to work cleanly.
Furthermore, it's hard to mix reading with scanf with reading with other functions. Why? Say you want to read an integer from a line, then read a string from the next line. You try this:
int i;
char buf[BUFSIZE];
scanf("%i", &i);
fgets(buf, BUFSIZE, stdin);
That will read the integer, but then fgets will read an empty line, because scanf left the newline in the stream! Okay, take two:
...
scanf("%i\n", &i);
...
You think this eats up the newline, and it does - but it also eats up leading whitespace on the next line, because scanf can't tell the difference between newlines and other forms of whitespace. (Also, turns out you're writing a Python parser, and leading whitespace in lines is important.) To make this work, you have to call getchar or something to read in the newline and throw it away:
...
scanf("%i", &i);
getchar();
...
Isn't that silly? What happens if you use scanf in a function, but don't call getchar because you don't know whether the next read is going to be scanf or something saner (or whether or not the next character is even going to be a newline)? Suddenly the best way to handle the situation seems to be to pick one or the other: do we use scanf exclusively and never have access to fgets-style full-control input, or do we use fgets exclusively and make it harder to perform complex parsing?
Actually, the answer is we don't. We use fgets (or non-scanf functions) exclusively, and when we need scanf-like functionality, we just call sscanf on the strings! We don't need to have scanf mucking up our filestreams unnecessarily! We can have all the precise control over our input we want and still get all the functionality of scanf formatting. And even if we couldn't, many scanf format options have near-direct corresponding functions in the standard library, like the infinitely more flexible strtol and strtod functions (and friends). Plus, i = strtoumax(str, NULL, 10) for C99 sized integer types is a lot cleaner looking than scanf("%" SCNuMAX, &i);, and a lot safer (we can use that strtoumax line unchanged for smaller types and let the implicit conversion handle the extra bits, but with scanf we have to make a temporary uintmax_t to read into).
The moral of this story: avoid scanf. If you need the formatting it provides, and don't want to (or can't) do it (more efficiently) yourself, use fgets / sscanf.
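For example, the integer-then-string sequence from above becomes (a sketch, reusing the BUFSIZE placeholder from earlier):

char line[BUFSIZE];
int i;

/* read a whole line, then parse the integer out of it */
if (fgets(line, sizeof(line), stdin) && sscanf(line, "%i", &i) == 1) {
    /* got the integer; the newline went with the rest of the line */
}

/* the next read starts cleanly at the next line, leading whitespace intact */
if (fgets(line, sizeof(line), stdin)) {
    /* ... */
}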
scanf doesn't allocate memory for you.
You need to allocate memory for the variable passed to scanf.
You could do like this:
char* receiveInput() {
    char *s = (char *)malloc(100);
    scanf("%s", s);
    return s;
}
But warning:
the function that calls receiveInput will take the ownership of the returned memory: you'll have to free(str) after you print it in main. (Giving the ownership away in this way is usually not considered a good practice).
An easy fix is getting the allocated memory as a parameter.
if the input string is longer than 99 (in my case), your program will suffer a buffer overflow (which is what's already happening).
An easy fix is to pass to scanf the length of your buffer:
scanf("%99s",s);
A fixed code could be like this:
// s must be of at least 100 chars!!!
char* receiveInput(char *s) {
    scanf("%99s", s);
    return s;
}

int main()
{
    char str[100];
    receiveInput(str);
    int length = strlen(str);
    printf("Your string is %s, length is %d\n", str, length);
    return 0;
}
You first have to allocate memory for s in your receiveInput() function, such as:
s = calloc(50, sizeof(char));