I am programming in C in Unix,
and I am using gets to read the inputs from keyboard.
I always get this warning and the program stops running:
warning: this program uses gets(), which is unsafe.
Can anybody tell me the reason why this is happening?
gets is unsafe because you give it a buffer, but you don't tell it how big the buffer is. The input may write past the end of the buffer, blowing up your program fairly spectacularly. Using fgets instead is a bit better because you tell it how big the buffer is, like this:
const int bufsize = 4096; /* Or a #define or whatever */
char buffer[bufsize];
fgets(buffer, bufsize, stdin);
...so provided you give it the correct information, it doesn't write past the end of the buffer and blow things up.
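Two details of fgets are easy to miss: it keeps the trailing newline in the buffer, and if the line is longer than the buffer it silently hands you only the first part. Here is a minimal sketch of a wrapper that deals with both. The name read_line and its return convention are my own invention for illustration, not anything from the standard library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp into buf (capacity `size`).
 * Returns  1 if a complete line was read (the newline is stripped),
 *          0 if no newline was found (line longer than the buffer, or a
 *            final line that lacks a trailing newline),
 *         -1 on end-of-file or read error. */
int read_line(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';    /* drop the newline that fgets keeps */
        return 1;
    }
    return 0;
}
```

A caller would typically pass stdin and sizeof buffer, and treat a return of 0 as "the rest of the line is still waiting to be read".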
Slightly OT, but:
You don't have to use a const int for the buffer size, but I would strongly recommend you don't just put a literal number in both places, because inevitably you'll change one but not the other later. The compiler can help:
char buffer[4096];
fgets(buffer, (sizeof buffer / sizeof buffer[0]), stdin);
That expression gets resolved at compile-time, not runtime. It's a pain to type, so I used to use a macro in my usual set of headers:
#define ARRAYCOUNT(a) (sizeof (a) / sizeof (a)[0])
...but I'm a few years out of date with my pure C, there's probably a better way these days.
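For what it's worth, here is a small sketch of the element-count macro in use. The array names and the two accessor functions are made up for the example. One caveat that is easy to trip over: the trick only works on a real array. Hand it a pointer (including an array that has been passed to a function as a parameter) and it quietly computes the wrong number.

```c
#include <stddef.h>

/* Parenthesising the argument keeps the macro correct for arguments that
 * are expressions. Only valid for true arrays, never for pointers. */
#define ARRAYCOUNT(a) (sizeof (a) / sizeof (a)[0])

static char line_buffer[4096];   /* hypothetical buffers for illustration */
static int  samples[16];

/* Both counts are computed by the compiler, not at run time. */
size_t line_capacity(void)   { return ARRAYCOUNT(line_buffer); }
size_t sample_capacity(void) { return ARRAYCOUNT(samples); }
```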
As mentioned in the previous answers use fgets instead of gets.
But it is not like gets doesn't work at all, it is just very very unsafe. My guess is that you have a bug in your code that would appear with fgets as well so please post your source.
EDIT
Based on the updated information you gave in your comment I have a few suggestions.
I recommend searching for a good C tutorial in your native language; Google is your friend here. As a book, I would recommend The C Programming Language.
If you have new information, it is a good idea to edit it into your original post, especially if it is code; that makes it easier for people to understand what you mean.
You are trying to read a string, which is an array of characters, into a single character; that will of course fail. What you want to do is something like the following.
char username[256];
char password[256];
scanf("%s%s", username, password);
Feel free to comment/edit, I am very rusty even in basic C.
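EDIT 2 As jamesdlin warned, scanf with a bare %s is as dangerous as gets, because it has no idea how big the destination is. Give each %s a field width one less than the array size, for example %255s for a 256-byte array.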
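A minimal sketch of the bounded version, assuming two 256-byte arrays. I've used fscanf on a FILE * rather than scanf so the function is easy to try out on something other than the keyboard; read_credentials is just a name I picked.

```c
#include <stdio.h>

/* Reads two whitespace-delimited words from fp.
 * The field width 255 leaves room for the terminating '\0' in each
 * 256-byte array, so an over-long word is cut off instead of overflowing.
 * Returns 1 if both words were read, 0 otherwise. */
int read_credentials(FILE *fp, char username[256], char password[256])
{
    return fscanf(fp, "%255s%255s", username, password) == 2;
}
```

Calling read_credentials(stdin, username, password) reads from the keyboard. Note that a word longer than 255 characters is split, and the remainder becomes the next word, so the return value alone does not tell you the input was well formed.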
man gets says:
Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
gets() is unsafe. It takes one parameter, a pointer to a char buffer. Ask yourself how big you have to make that buffer and how long a user can type input without hitting the return key.
Basically, there is no way to prevent a buffer overflow with gets() - use fgets().
I'm trying to create an array whose starting length is chosen at run time, to hold a string. The code should count the words and adjust the size of the array, but this is only a test. I'm posting it here because I want to know whether it's good practice or a mistake, and whether there is anything I should know or keep in mind.
Note: I'm talking about C, not C++.
#include <stdio.h>
int main()
{ int c,b,count;
scanf("%d",&c);
count=c+1;
getchar();
char a[count];
for ( c=b=0 ; c!=count && b!='\n' ; c++ )
{
b=getchar();
a[c]=b;
}
a[c]='\0';
printf("%s",a); printf("%d",c-1);
}
I don't need to change the size of the array at execution time.
I was testing, and I don't remember exactly why I use the c variable at first instead of count directly, but I remember the first getchar was there to flush the buffer, because it didn't work without it.
I don't know why I need the getchar. If I delete it, the program fails.
Anyway, the program works fine. When you run it, it first expects a number via scanf and then expects the text.
If the text is longer than the array, the program ignores the excess.
The number is the size of the array.
My questions are:
Is it good practice to use a[variable] for this job?
Why do I need the getchar?
Will it be portable? I mean, I don't know whether some systems or standards, such as some old C compilers, don't accept this.
Are there better methods?
Is it good practice to use a[variable] for this job?
It depends on the compiler configuration. Variable-length arrays have been supported since C99. However, since there's no good reason to use one in such a simple program, use the standard malloc instead. Here's an in-depth discussion of the topic.
Why do I need the getchar?
There's likely some input still buffered up from your terminal, and that first getchar is discarding it. Try printing its value to the screen to see what it is; that might help us figure it out.
Will it be portable?
See my answer to your first question. It will probably work on modern versions of gcc, but, for example, it doesn't work with Microsoft's C compiler (which is still basically on C89).
Is it good practice to use a[variable] for this job?
Where the size is determined by arbitrary user input without imposed limits, it is not good practice. A user could easily enter a very large value and overflow the stack.
Use either dynamic allocation, or check and coerce the input value to some sensible limit.
Also worth noting that VLAs are not supported in C++ or some C compilers, so the code lacks portability.
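To make the "dynamic allocation plus a sensible limit" suggestion concrete, here is a rough sketch. The limit MAX_TEXT and the function name alloc_text_buffer are assumptions of mine; pick whatever bound suits the program.

```c
#include <stdlib.h>

#define MAX_TEXT 4096   /* assumed upper bound, chosen for illustration */

/* Returns a zero-initialised buffer with room for `requested` characters
 * plus the terminating '\0', or NULL if the request is out of range or
 * the allocation fails. The caller must free() the result. */
char *alloc_text_buffer(long requested)
{
    if (requested < 1 || requested > MAX_TEXT)
        return NULL;
    return calloc((size_t)requested + 1, 1);
}
```

Unlike a VLA, a failed or refused request shows up as a NULL return that the program can report, rather than as a crash when the stack runs out.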
Why do I need the getchar?
The user has to enter at least a newline for scanf() to return, but the %d format specifier does not consume non-digit characters, so the newline remains buffered. However, your code is easily broken by entering additional non-digit characters: for example, "16a<newline>" will assign 16 to c, the single getchar() will discard only the a, and the newline will remain buffered as before. A better solution is:
while( getchar() != '\n' ) {}
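One thing to watch with a discard loop like this: if the input ends before a newline arrives (end of file, or a closed pipe), getchar() keeps returning EOF and the loop never finishes. A sketch of a variant that stops on either condition follows; the function name is hypothetical, and it takes a FILE * so it can be tried on something other than stdin.

```c
#include <stdio.h>

/* Discards characters up to and including the next newline, or up to
 * end-of-file, whichever comes first.
 * Returns the number of characters discarded, not counting the newline. */
int discard_rest_of_line(FILE *fp)
{
    int ch;
    int discarded = 0;

    while ((ch = getc(fp)) != '\n' && ch != EOF)
        discarded++;
    return discarded;
}
```

Calling discard_rest_of_line(stdin) right after the scanf("%d", ...) copes with input such as "16a" followed by a newline. A non-zero return could also be used to tell the user that extra characters were ignored.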
Will it be portable? I mean, I don't know whether some systems or standards, such as some old C compilers, don't accept this.
Adoption of C99 VLAs is variable, and in C11 they are optional in any case.
Are there better methods?
I hesitate to say "better", but there are certainly safer, more flexible, and more portable ways. For the array allocation, you could use malloc().
Using malloc or calloc would be a better choice in C.
https://www.tutorialspoint.com/c_standard_library/c_function_malloc.htm
My question is about gets() and puts(): are they a perfect solution for string input and output?
gets was marked as obsolescent in C99 and removed in C11 because of its security issues. Don't use it; use fgets instead. As a historical note, gets was exploited (in fingerd) by the first massive Internet worm: the Morris worm, back in 1988.
The puts function is fine if it fits your needs.
gets() is fundamentally insecure in a really horrific way: it will write an unlimited number of characters to its argument, overflowing any buffer it is provided. As such, it should never, ever be used. Many newer compilers will issue an automatic warning if you use it. Instead, use fgets(), which takes a length argument:
char buf[...];
fgets(buf, sizeof(buf), stdin);
On the other hand, puts() is totally fine. It's equivalent to printf("%s\n", x);, and some compilers will in fact convert certain constant printf() calls to puts() as a standard optimization. Go wild.
For gets, see the man page:
BUGS
Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
puts is fine, if you're just looking to write a string to stdout.
I recently faced an interview question asking what the hidden problem is in the following code. I was unable to detect it. Can anyone help?
#include<stdio.h>
int main(void)
{
char buff[10];
memset(buff,0,sizeof(buff));
gets(buff);
printf("\n The buffer entered is [%s]\n",buff);
return 0;
}
The function gets accepts a string from stdin and does not check the capacity of the buffer. This may result in a buffer overflow. The standard function fgets() can be used here.
gets could store far more than 10 characters.
gets is really problematic because you can't tell it to only fill 'buff' up to a length of 10.
Check the BUGS section of the manual, which says:
Never use gets(). Because it is impossible to tell without knowing
the data in advance how many characters gets() will read, and because
gets() will continue to store characters past the end of the buffer,
it is extremely dangerous to use. It has been used to break computer
security. Use fgets() instead.
It is not advisable to mix calls to input functions from the stdio
library with low-level calls to read(2) for the file descriptor
associated with the input stream; the results will be undefined and
very probably not what you want.
It is always recommended to use fgets()/ scanf() over gets().
By using gets() you have no way to limit the user to a certain text length, which may cause a buffer overflow. That is why you should not use it.
Try to use fgets() instead:
fgets(buff, MAX_LENGTH, stdin);
Good luck!
From man gets:
Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
Almost everywhere, I see scanf being used in a way that should have the same problem (buffer overflow/buffer overrun): scanf("%s",string). Does the problem exist in this case? Why is there no mention of it in the scanf man page? Why doesn't gcc warn when compiling this with -Wall?
PS: I know that there is a way to specify the maximum length of the string in the scanf format string:
char str[10];
scanf("%9s",str);
Edit: I am not asking whether the preceding code is right or not. My question is: if scanf("%s",string) is always wrong, why are there no warnings, and why is there nothing about it in the man page?
The answer is simply that no-one has written the code in GCC to produce that warning.
As you point out, a warning for the specific case of "%s" (with no field width) is quite appropriate.
However, bear in mind that this applies only to scanf(), vscanf(), fscanf() and vfscanf(). The same format specifier can be perfectly safe with sscanf() and vsscanf(), where the length of the input string is known, so the warning should not be issued in that case. This means that you cannot simply add it to the existing "scanf-style-format-string" analysis code; you will have to split that into "fscanf-style-format-string" and "sscanf-style-format-string" options.
I'm sure if you produce a patch for the latest version of GCC it stands a good chance of being accepted (and of course, you will need to submit patches for the glibc header files too).
Using gets() is never safe. scanf() can be used safely, as you said in your question. However, determining if you're using it safely is a more difficult problem for the compiler to work out (e.g. if you're calling scanf() in a function where you pass in the buffer and a character count as arguments, it won't be able to tell); in that case, it has to assume that you know what you're doing.
When the compiler looks at the formatting string of scanf, it sees a string! That's assuming the formatting string is not entered at run-time. Some compilers like GCC have some extra functionality to analyze the formatting string if entered at compile time. That extra functionality is not comprehensive, because in some situations a run-time overhead is needed which is a NO NO for languages like C. For example, can you detect an unsafe usage without inserting some extra hidden code in this case:
char* str;
size_t size;
scanf("%zu", &size);
str = malloc(size);
scanf("%9s", str); // how can the compiler determine if this is a safe call?!
Of course, there are ways to write safe code with scanf if you specify the number of characters to read, and that there is enough memory to hold the string. In the case of gets, there is no way to specify the number of characters to read.
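When the buffer size is only known at run time, one way to keep the field width in step with it is to build the format string on the fly. This is only a sketch under my own assumptions: read_word is a made-up name, and note that scanf, unlike printf, does not accept * as a width taken from an argument (in scanf, * means "do not store"), which is why the format has to be constructed.

```c
#include <stdio.h>

/* Reads one whitespace-delimited word from fp into buf, storing at most
 * size - 1 characters plus the terminating '\0'.
 * Returns 1 on success, 0 on failure (size too small, EOF, or no word). */
int read_word(FILE *fp, char *buf, size_t size)
{
    char fmt[32];
    int n;

    if (size < 2)
        return 0;

    /* Produces, for example, "%63s" when size is 64. */
    n = snprintf(fmt, sizeof fmt, "%%%zus", size - 1);
    if (n < 0 || (size_t)n >= sizeof fmt)
        return 0;

    return fscanf(fp, fmt, buf) == 1;
}
```

The price is exactly the point made above: the compiler can no longer see the format string, so it cannot check it for you.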
I am not sure why the man page for scanf doesn't mention the probability of a buffer overrun, but vanilla scanf is not a secure option. A rather dated link - Link shows this as the case. Also, check this (not gcc but informative nevertheless) - Link
Note that scanf does not allocate space on the heap based on how much data is read in. With %s it stores characters directly into the buffer you pass, stopping only at whitespace or at a field-width limit, so without a width it can write past the end of that buffer. (POSIX.1-2008 adds an "m" modifier, as in %ms, that does make scanf allocate the buffer itself, but that is an extension to standard C.)
"The average man does not want to be free. He simply wants to be safe." - H. L. Menken
I am attempting to write very secure C. Below I list some of the techniques I use and ask whether they are as secure as I think they are. Please don't hesitate to tear my code/preconceptions to shreds. Any answer that finds even the most trivial vulnerability or teaches me a new idea will be highly valued.
Reading from a stream:
According to the GNU C Programming Tutorial getline:
The getline function will
automatically enlarge the block of
memory as needed, via the realloc
function, so there is never a shortage
of space -- one reason why getline is
so safe. [..] Notice that getline can
safely handle your line of input, no
matter how long it is.
I assume that getline should, under all inputs, prevent a buffer overflow from occurring when reading from a stream.
Is my assumption correct? Are there inputs and/or allocation schemes under which this could lead to an exploit? For instance what if the first character from the stream is some bizarre control character, maybe 0x08 BACKSPACE (ctl-H).
Has any work been done to mathematically prove getline as secure?
Malloc Returns Null on Failure:
If malloc encounters an error, it returns a NULL pointer. This presents a security risk, since one can still apply pointer arithmetic to a NULL (0x0) pointer; thus Wikipedia recommends:
/* Allocate space for an array with ten elements of type int. */
int *ptr = (int*)malloc(10 * sizeof (int));
if (ptr == NULL) {
/* Memory could not be allocated, the program should handle
the error here as appropriate. */
}
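Besides the NULL check, the size calculation itself is worth guarding: if the element count comes from outside the program, count * sizeof(int) can wrap around and malloc will happily return a block that is far too small. A minimal sketch of a checked wrapper, with a name I made up:

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates an array of `count` ints. Refuses a zero count and any count
 * whose size in bytes would overflow size_t.
 * Returns NULL on refusal or allocation failure; the caller must free(). */
int *alloc_int_array(size_t count)
{
    if (count == 0 || count > SIZE_MAX / sizeof(int))
        return NULL;
    return malloc(count * sizeof(int));
}
```

calloc(count, sizeof(int)) is another option, since it takes the count and element size separately and zeroes the memory as well.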
Secure sscanf:
When using sscanf, I've gotten into the habit of sizing the buffers for the to-be-extracted strings to the size of the input string, hoping to avoid the possibility of an overrun. For example:
const char *inputStr = "a01234b4567c";
const char *formatStr = "a%[0-9]b%[0-9]c";
char str1[strlen(inputStr)];
char str2[strlen(inputStr)];
sscanf(inputStr, formatStr, str1, str2);
Because str1 and str2 are the size of inputStr, and no more characters than strlen(inputStr) can be read from inputStr, it seems impossible, for any possible value of inputStr, to cause a buffer overflow?
Am I correct? Are there strange corner cases I haven't thought of?
Are there better ways to write this? Libraries that have already solved it?
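One possible alternative, sketched under my own assumptions (fixed-size fields of at most 15 digits, and a helper name I invented): give each %[ conversion an explicit width, check how many conversions succeeded, and use %n to confirm that the whole pattern, including the trailing c, was matched.

```c
#include <stdio.h>

#define FIELD_MAX 15   /* assumed maximum digits per field; the widths in
                          the format string below must match this value */

/* Parses strings of the form a<digits>b<digits>c.
 * Returns 1 if both fields matched and nothing follows the final 'c',
 * 0 otherwise. */
int parse_fields(const char *input,
                 char first[FIELD_MAX + 1], char second[FIELD_MAX + 1])
{
    int end = 0;
    /* %n stores the number of characters consumed so far. It is only
     * reached if the literal 'c' matched, so `end` stays 0 otherwise. */
    int matched = sscanf(input, "a%15[0-9]b%15[0-9]c%n", first, second, &end);

    return matched == 2 && end > 0 && input[end] == '\0';
}
```

With this shape, a field that is too long makes the parse fail instead of being cut short silently, and the buffers no longer depend on the length of the input.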
General Questions:
While I've posted a large number of questions, I don't expect anyone to answer all of them. The questions are more of a guideline to the sorts of answers I am looking for. I really want to learn the secure C mindset.
What other secure C idioms are out there?
What corner cases do I need to always check?
How can I write unit tests to enforce these rules?
How can I enforce constraints in a testable or provably correct way?
Any recommended static/dynamic analysis techniques or tools for C?
What secure C practices do you follow and how do you justify them to yourself and others?
Resources:
Many of the resources were borrowed from the answers.
Secure Programming for Linux and Unix HOWTO by David Wheeler
Secure C programming - SUN Microsystems
Insecure Programming by Example
Add More NOPS - blog covering these issues
CERT Secure Coding Initiative
flawfinder - static analysis tool
Using Thm Provers to prove safety by Yannick Moy
libsafe
I think your sscanf example is wrong. It can still overflow when used that way.
Try this, which specifies the maximum number of bytes to read:
int main(int argc, char **argv)
{
char buf[256];
sscanf(argv[0], "%255s", buf); /* buf, not &buf: %s expects a char * */
return 0;
}
Take a look at this IBM dev article about protecting against buffer overflows.
In terms of testing, I would write a program that generates random strings of random length and feed them to your program, and make sure they are handled appropriately.
Reading from a stream
The fact that getline() "will automatically enlarge the block of memory as needed" means that this could be used as a denial-of-service attack, as it would be trivial to generate an input that was so long it would exhaust the available memory for the process (or worse, the system!). Once an out-of-memory condition occurs, other vulnerabilities may also come into play. The behaviour of code in low/no memory is rarely nice, and very hard to predict. IMHO it is safer to set reasonable upper bounds on everything, especially in security-sensitive applications.
Furthermore (as you anticipate by mentioning special characters), getline() only gives you a buffer; it does not make any guarantees about the contents of the buffer (as the safety is entirely application-dependent). So sanitising the input is still an essential part of processing and validating user data.
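As a rough illustration of "set reasonable upper bounds", here is a sketch of a line reader that grows its buffer on demand, as getline() does, but gives up once a caller-supplied limit is exceeded. It uses only standard C; the function name and the choice to return NULL for an over-long line are my assumptions, and it assumes max_len is a modest value.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads one line of at most max_len characters (newline excluded) from fp.
 * Returns a malloc'd string the caller must free(), or NULL on end-of-file
 * with no data, on allocation failure, or if the line exceeds max_len. */
char *read_line_capped(FILE *fp, size_t max_len)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    int ch;

    if (buf == NULL)
        return NULL;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (len == max_len) {            /* over the limit: give up */
            free(buf);
            return NULL;
        }
        if (len + 1 == cap) {            /* keep one byte for '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }

    if (ch == EOF && len == 0) {         /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

When a line is rejected, the rest of it is still sitting in the stream, so a real program would need to decide whether to skip it or stop reading altogether. The returned bytes are still unvalidated input.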
sscanf
I would tend to prefer to use a regular expression library, and have very narrowly defined regexps for user data, rather than use sscanf. This way you can perform a good deal of validation at the time of input.
General comments
Fuzzing tools are available which generate random input (both valid and invalid) that can be used to test your input handling
Buffer management is critical: buffer overflows, underflows, out-of-memory
Race conditions can be exploited in otherwise secure code
Binary files could be manipulated to inject invalid values or oversized values into headers, so file format code must be rock-solid and not assume binary data is valid
Temporary files can often be a source of security issues, and must be carefully managed
Code injection can be used to replace system or runtime library calls with malicious versions
Plugins provide a huge vector for attack
As a general principle, I would suggest having clearly defined interfaces where user data (or any data from outside the application) is assumed invalid and hostile until it is processed, sanitised and validated, and making those interfaces the only way for user data to enter the application.
A good place to start looking at this is David Wheeler's excellent secure coding site.
His free online book "Secure Programming for Linux and Unix HOWTO" is an excellent resource that is regularly updated.
You might also like to look at his excellent static analyser FlawFinder to get some further hints. But remember, no automated tool is a replacement for a good pair of experienced eyes, or as David so colourfully puts it:
Any static analysis tool, such as Flawfinder, is merely a tool. No tool can substitute for human thought! In short, "a fool with a tool is still a fool". It's a mistake to think that analysis tools (like flawfinder) are a substitute for security training and knowledge.
I have personally used David's resources for several years now and find them to be excellent.
Insecure Programming by Example
blog with a few of the answers
Yannick Moy developed a Hoare/Floyd weakest precondition system for C during his PhD and applied it to the CERT managed strings library. He found a number of bugs (see page 197 of his memoir). The good news is that the library is safer now for his work.
You could also look at Les Hatton's web site here and at his book Safer C which you can get from Amazon.
Don't use gets() for input, use fgets(). To use fgets(), if your buffer is automatically allocated (i.e., "on the stack"), then use this idiom:
char buf[N];
...
if (fgets(buf, sizeof buf, fp) != NULL)
This will keep working if you decide to change the size of buf. I prefer this form to:
#define N whatever
char buf[N];
if (fgets(buf, N, fp) != NULL)
because the first form uses buf to determine the second argument, and is clearer.
Check the return value of fclose().
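The reason fclose() matters is that stdio buffers output, so a write that appeared to succeed may only fail when the buffer is flushed at close time (a full disk, for instance). A small sketch, with a function name and return convention that are purely illustrative:

```c
#include <stdio.h>

/* Writes text to the file at path, replacing any existing contents.
 * Returns 0 on success, -1 if opening, writing, or closing fails. */
int write_text_file(const char *path, const char *text)
{
    FILE *fp = fopen(path, "w");
    int failed;

    if (fp == NULL)
        return -1;

    failed = (fputs(text, fp) == EOF);

    /* fclose flushes buffered data, so some write errors surface only here.
     * It is called even after a failed write so the stream is released. */
    if (fclose(fp) == EOF)
        failed = 1;

    return failed ? -1 : 0;
}
```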