Are functions like strcpy, gets, etc. always dangerous? What if I write a code like this:
int main(void)
{
char *str1 = "abcdefghijklmnop";
char *str2 = malloc(100);
strcpy(str2, str1);
}
This way the function doesn't accept arguments(parameters...) and the str variable will always be the same length...which is here 16 or slightly more depending on the compiler version...but yeah 100 will suffice as of march, 2011 :).
Is there a way for a hacker to take advantage of the code above?
10x!
Absolutely not. Contrary to Microsoft's marketing campaign for their non-standard functions, strcpy is safe when used properly.
The above is redundant, but mostly safe. The only potential issue is that you're not checking the malloc return value, so you may be dereferencing null (as pointed out by kotlinski). In practice, this is likely to cause an immediate SIGSEGV and program termination.
An improper and dangerous use would be:
char array[100];
// ... Read line into uncheckedInput
// Extract substring without checking length
strcpy(array, uncheckedInput + 10);
This is unsafe because the strcpy may overflow, causing undefined behavior. In practice, this is likely to overwrite other local variables (itself a major security breach). It may also overwrite the return address. Through a return-to-libc attack, the attacker may be able to use C functions like system to execute arbitrary programs. There are other possible consequences to overflows.
However, gets is indeed inherently unsafe, and will be removed from the next version of C (C1X). There is simply no way to ensure the input won't overflow (causing the same consequences given above). Some people would argue it's safe when used with a known input file, but there's really no reason to ever use it. POSIX's getline is a far better alternative.
Also, the length of str1 doesn't vary by compiler. It should always be 17, including the terminating NUL.
You are forcefully stuffing completely different things into one category.
The function gets is indeed always dangerous. There's no way to make a safe call to gets regardless of what steps you are willing to take and how defensive you are willing to get.
Function strcpy is perfectly safe if you are willing to take the [simple] necessary steps to make sure that your calls to strcpy are safe.
That already puts gets and strcpy in vastly different categories, which have nothing in common with regard to safety.
The popular criticisms directed at safety aspects of strcpy are based entirely on anecdotal social observations as opposed to formal facts, e.g. "programmers are lazy and incompetent, so don't let them use strcpy". Taken in the context of C programming, this is, of course, utter nonsense. Following this logic we should also declare the division operator exactly as unsafe for exactly the same reasons.
In reality, there are no problems with strcpy whatsoever. gets, on the other hand, is a completely different story, as I said above.
yes, it is dangerous. After 5 years of maintenance, your code will look like this:
int main(void)
{
char *str1 = "abcdefghijklmnop";
{enough lines have been inserted here so as to not have str1 and str2 nice and close to each other on the screen}
char *str2 = malloc(100);
strcpy(str2, str1);
}
at that point, someone will go and change str1 to
str1 = "THIS IS A REALLY LONG STRING WHICH WILL NOW OVERRUN ANY BUFFER BEING USED TO COPY IT INTO UNLESS PRECAUTIONS ARE TAKEN TO RANGE CHECK THE LIMITS OF THE STRING. AND FEW PEOPLE REMEMBER TO DO THAT WHEN BUGFIXING A PROBLEM IN A 5 YEAR OLD BUGGY PROGRAM"
and forget to look where str1 is used and then random errors will start happening...
Your code is not safe. The return value of malloc is unchecked; if it fails and returns a null pointer, the strcpy call has undefined behavior.
Besides that, I see no problem other than that the example basically does not do anything.
strcpy isn't dangerous as long as you know that the destination buffer is large enough to hold the characters of the source string; otherwise strcpy will happily copy more characters than your target buffer can hold, which can lead to several unfortunate consequences (stack/other variables overwriting, which can result in crashes, stack smashing attacks & co.).
But: if you have a generic char * in input which hasn't been already checked, the only way to be sure is to apply strlen to such string and check if it's too large for your buffer; however, now you have to walk the entire source string twice, once for checking its length, once to perform the copy.
This is suboptimal, since, if strcpy were a little bit more advanced, it could receive as a parameter the size of the buffer and stop copying if the source string were too long; in a perfect world, this is how strncpy would perform (following the pattern of other strn*** functions). However, this is not a perfect world, and strncpy is not designed to do this. Instead, the nonstandard (but popular) alternative is strlcpy, which, instead of going out of the bounds of the target buffer, truncates.
Several CRT implementations do not provide this function (notably glibc), but you can still get one of the BSD implementations and put it in your application. A standard (but slower) alternative can be to use snprintf with "%s" as format string.
That said, since you're programming in C++ (edit I see now that the C++ tag has been removed), why don't you just avoid all the C-string nonsense (when you can, obviously) and go with std::string? All these potential security problems vanish and string operations become much easier.
The only way malloc may fail is when an out-of-memory error occurs, which is a disaster by itself. You cannot reliably recover from it because virtually anything may trigger it again, and the OS is likely to kill your process anyway.
As you point out, under constrained circumstances strcpy isn't dangerous. It is more typical to take in a string parameter and copy it to a local buffer, which is when things can get dangerous and lead to a buffer overrun. Just remember to check your copy lengths before calling strcpy and null terminate the string afterward.
Aside from potentially dereferencing NULL (as you do not check the result from malloc) which is UB and likely not a security threat, there is no potential security problem with this.
gets() is always unsafe; the other functions can be used safely.
gets() is unsafe even when you have full control over the input -- someday, the program may be run by someone else.
The only safe way to use gets() is to use it for a single run thing: create the source; compile; run; delete the binary and the source; interpret results.
Related
This is code from a tutorial in which the user enters the size of a string and then the string itself. The code should use memory allocation to reproduce the same string. I have a few doubts about the code:
Why is the *text pointer initialized to NULL at the beginning? Was this initialization useful in later part of the program or it is good practice to initialize to NULL.
Why is it comparing the pointer to NULL. Won't the address change once we allocate a string to the pointer? At the end of the string will pointer point to NULL (no address)?
What is the use of scanf(" ")?
After freeing *text pointer, it was again allocated to NULL. So did it effectively free up memory?
#include <stdio.h>
#include <stdlib.h>
int main()
{
int size;
char *text = NULL; //---------------------------------------------------------->1
printf("Enter limit of the text: \n");
scanf("%d", &size);
text = (char *) malloc(size * sizeof(char));
if (text != NULL) //---------------------------------------------------------->2
{
printf("Enter some text: \n");
scanf(" "); //---------------------------------------------------------->3
gets(text);
printf("Inputted text is: %s\n", text);
}
free(text);
text = NULL;//---------------------------------------------------------->4
return 0;
}
Why is the *text pointer initialized to NULL at the beginning?
To protect you from your own humanity, mainly. As the code evolves, it's often easy to forget to initialize the pointer in one or more branches of code and then you're dereferencing an uninitialized pointer - it is undefined behavior, and as such it's not guaranteed to crash. In the worst case, if you don't use proper tools such as Valgrind (it'd point it out right away), you can spend hours or days finding such a problem because of how unpredictable it is, and because the behavior changes based on what else was on the stack before the call - so you might see a "bug" in completely unrelated and perfectly not-buggy code.
Why is it comparing the pointer to NULL.
Because malloc can return a NULL and just because it returns it doesn't mean you can dereference it. The null pointer value is special: it means "hey, this pointer is not valid, don't use it for anything". So before you dereference anything returned from malloc, you have to check that it's not null. To do otherwise is undefined behavior, and modern compilers may do quite unexpected things to your code when such behavior is present. But before asking such a question I'd advise to always check what is the function you're wondering about actually designed to do. Google cppref malloc and the first hit is: https://en.cppreference.com/w/c/memory/malloc. There, under the heading of Return value, we read:
On failure, returns a null pointer.
That's why it's comparing the pointer to NULL!
What is the use of scanf(" ")?
That one is easy: you could have looked it up yourself. The C standard library is well documented: https://en.cppreference.com/w/c/io/fscanf
When you read it, the relevant part is:
format: pointer to a null-terminated character string specifying how to read the input.
The format string consists of [...]
whitespace characters: any single whitespace character in the format string consumes all available consecutive whitespace characters from the input (determined as if by calling isspace in a loop). Note that there is no difference between "\n", " ", "\t\t", or other whitespace in the format string.
And there's your answer: scanf(" ") will consume any whitespace characters in the input, until it reaches either EOF or the first non-whitespace character.
After freeing *text pointer, it was again allocated to NULL. So did it effectively free up memory?
No. First of all, the language used here is wrong: the pointer was assigned a value of NULL. Nothing was allocated! A pointer is like a postal address. You can replace it with the word "NOWHERE", and that's what NULL is. But by putting something like "this person has no address" in your address book, you have not "allocated" anything.
Yes - free did free the memory. Then you can set it to NULL because you're human, so that you won't forget so easily that the pointer's value is not valid anymore. It's in this case a "note to self". Humans tend to forget that a pointer is null and then will use it. Such use is undefined behavior (your program can do anything, for example erase your hard drive). So the text = NULL assignment has nothing to do with the machine. It has everything to do with you: humans are not perfect, and it's best to program defensively so that you give yourself fewer chances to introduce a bug as you change the code, or as you work under deadline pressure, etc.
Generally speaking, the NULL assignment at the end of main is not necessary in such a simple short program. But you have to recognize the fact that text cannot be dereferenced after it has been free-d.
Personally, I find it best to leverage the property of C language that gives variables lexical scope. Once the scope ends, the variable is not accessible, so you can't write a bug that would use text - it won't compile. This is called "correctness by design": you design the software in such a way that some bugs are impossible by construction, and if you code the bug then the code won't compile. That's a million times better than catching the bug at runtime, or worse - having to debug it, potentially in unrelated code (remember: undefined behavior is nasty - it often manifests as problems thousands of lines away from the source).
So here's how I'd rewrite it just to address this one issue (there are others still left there):
#include <stdio.h>
#include <stdlib.h>
void process_text(int size)
{
char *const text = malloc(size * sizeof(char));
if (!text) return;
printf("Enter some text: \n");
scanf(" ");
gets(text);
printf("Inputted text is: %s\n", text);
free(text);
}
int main()
{
int size;
printf("Enter limit of the text: \n");
scanf("%d", &size);
process_text(size);
}
The scope of text is limited to the block of process_text. You initialize it immediately at the point of declaration: that's always preferred. There's no need to set it to NULL first, since you assign the desired value right away. You check if maybe malloc has returned NULL (i.e. it failed to allocate memory), and if so you immediately return from the function. A NULL check is idiomatically written as if (pointer) /* do something if the pointer is non-null */ or as if (!pointer) /* do something if the pointer IS NULL */. It's less verbose that way, and anyone reading such code is supposed to know what it means if they have any sort of experience. Now you know too what such code means. It's not a big hurdle to be aware of this idiom. It's less typing and less distraction.
Generally speaking, code that returns early should be preferred to nested if blocks and unending levels of indentation. When there are multiple checks before a function can do its job, they often end up in nested if statements, making the function much harder to read.
There's a flip side to that: in C++ the code is supposed to leverage C++ (i.e. it's not just C compiled with a C++ compiler), and the resources that have to be released when returning from a function should be automatically released by the compiler generated code that invokes destructors. But in C no such automatic destructor calls are made. So if you return from a function early, you have to make sure that you've released any resources that were allocated earlier on. Sometimes the nested if statements help with that, so you shouldn't be blindly following some advice without understanding the context and assumptions the advice makes :)
Although it's truly a matter of preference - and I have C++ background where the code written as above is way more natural - in C probably it'd be better not to return early:
void process_text_alternative_version(int size)
{
char *text = malloc(size * sizeof(char));
if (text) {
printf("Enter some text: \n");
scanf(" ");
gets(text);
printf("Inputted text is: %s\n", text);
}
free(text);
}
The value of text is only used if it's not null, but we don't return from the function early. This ensures that in all cases the memory block pointed to by text - if any - gets freed! This is very important: it's yet another way to write code that's correct by design, i.e. in a way that makes certain mistakes either impossible or much harder to commit. Written as above, you have no way of forgetting to free the memory (unless you add a return statement somewhere inside).
It must be said that even though some decisions made in the design of the C language library have been atrocious, the interface to free has been thoughtfully made in a way that makes the above code valid. free is explicitly allowed to be passed a null pointer. When you pass it a null pointer - e.g. when malloc above failed to allocate the memory - it will do nothing. That is: "freeing" a null pointer is a perfectly valid thing to do. It doesn't do anything, but it's not a bug. It enables writing code like above, where it's easy to see that in all cases text will be freed.
A VERY IMPORTANT COROLLARY: null pointer checks before free (in C) or delete (in C++) indicate that the author of the code doesn't have a clue about the most basic behavior of free and delete: it's usually an indicator that the code will be written as if it was a black magical art that no mere mortal understands. If the author doesn't understand it, that is. But we can and must do better: we can educate ourselves about what the functions/operators that we use do. It's documented. It costs no money to look that documentation up. People have spent long hours making sure the documentation is there for anyone so inclined to see. Ignoring it is IMHO the very definition of insanity. It's sheer irrationality on a wild rollercoaster ride. For the sane among us: all it takes is a google search that includes the word cppref somewhere. You'll get cppreference links up top, and that's a reliable resource - and collaboratively written, so you can fix any shortcomings you note, since it's a wiki. It's called "cpp"reference, but it really is two references in one: a C++ Reference as well as a C Reference.
Back to the code in question, though: someone could have written it as follows:
void process_text_alternative_version_not_recommended(int size)
{
char *text = malloc(size * sizeof(char));
if (text) {
printf("Enter some text: \n");
scanf(" ");
gets(text);
printf("Inputted text is: %s\n", text);
free(text);
}
}
It's just as valid, but such form defeats the purpose: it's not clear at a glance that text is always freed. You have to inspect the condition of the if block to convince yourself that indeed it will get freed. This code will be OK for a while, and then years later someone will change it to have a bit fancier if condition. And now you got yourself a memory leak, since in some cases malloc will succeed, but free won't be called. You're now hoping that some future programmer, working under pressure and stress (almost invariably!) will notice and catch the problem. Defensive programming means that we protect ourselves not only from bad inputs (whether errant or malicious), but also from our own inherent human fallibility. Thus it makes most sense in my opinion to use the first alternative version: it won't turn into a memory leak no matter how you modify the if condition. But beware: messing up the if condition may turn it into undefined behavior if the test becomes broken such that the body of if executes in spite of the pointer being null. It's not possible to completely protect ourselves from us, sometimes.
As far as constness is concerned, there are 4 ways of declaring the text pointer. I'll explain what they all mean:
char *text - a non-const pointer to non-const character(s): the pointer can be changed later to point to something else, and the characters it points to can be changed as well (or at least the compiler won't prevent you from doing it).
char *const text - a const pointer to non-const character(s) - the pointer itself cannot be changed past this point (the code won't compile if you try), but the characters will be allowed to be changed (the compiler won't complain but that doesn't mean that it's valid to do it - it's up to you the programmer to understand what the circumstances are).
const char *text - a non-const pointer to const character(s): the pointer can be changed later to point somewhere else, but the characters it points to cannot be changed using that pointer - if you try, the code won't compile.
const char *const text - a const pointer to const character(s): the pointer cannot be changed after its definition, and it cannot be used to change the character(s) it points to - an attempt to do either will prevent the code from compiling.
We chose variant #2: the pointed-to characters can't be constant since gets will definitely be altering them. If you used the variant #4, the code wouldn't compile, since gets expects a pointer to non-const characters.
Choosing #2 we're less likely to mess it up, and we're explicit: this pointer here will remain the same for the duration of the rest of this function.
We also free the pointer immediately before leaving the function: there's no chance we'll inadvertently use it after it was freed, because there's literally nothing done after free.
This coding style protects you from your own humanity. Remember that a lot of software engineering has nothing whatsoever to do with machines. The machine doesn't care much about how comprehensible the code is: it will do what it's told - the code can be completely impenetrable to any human being. The machine doesn't care one bit. The only entities that are affected - positively or negatively - by the design of the code are the human developers, maintainers, and users. Their humanity is an inseparable aspect of their being, and that implies that they are imperfect (as opposed to the machine which normally is completely dependable).
Finally, this code has a big problem - it again has to do with humans. Indeed you ask the user to enter the size limit for the text. But the assumption must be that humans - being humans - will invariably mess it up. And you'll be absolutely in the wrong if you blame them for messing it up: to err is human, and if you pretend otherwise then you're just an ostrich sticking your head in the sand and pretending there's no problem.
The user can easily make a mistake and enter text longer than the size they declared. That's undefined behavior: the program at this point can do anything, up to and including erasing your hard drive. Here it's not even a joke: in some circumstances it's possible to artificially create an input to this program that would cause the hard drive to indeed be wiped. You may think that it's a far-off possibility, but that's not the case. If you wrote this sort of a program on an Arduino, with an SD card attached, I could create input for both size and text that would cause the contents of the SD card to be zeroed - possibly even an input that can all be typed on a keyboard without use of special control characters. I'm 100% serious here.
Yes, typically this "undefined behavior means you'll format your hard drive" is said tongue-in-cheek, but that doesn't preclude it from being a true statement in the right circumstances (usually the more expensive the circumstances, the truer it becomes - such is life). Of course in most cases the user is not malicious - merely error-prone: they'll burn your house down because they were drunk, not because they tried to kill you - that's an awesome consolation I'm sure! But if you get a user that's an adversary - oh boy, they absolutely will leverage all such buffer overrun bugs to take over your system, and soon make you think hard about your choice of career. Maybe landscaping doesn't look all that bad in retrospect when the alternative is to face a massive lawsuit over loss of data (whether disclosure of data or a true loss when the data is wiped and lost).
To this effect, gets() is an absolutely forbidden sort of an interface: it's not possible to make it safe, that is: to make it work when faced with users that are either human, drunk and just error-prone, or worse - an adversary determined to create yet another "Data leak from Bobby Tables' Bank, Inc." headline in the newspaper.
In the second round of fixes, we need to get rid of the gets call: it's basically a big, absurdly bad mistake that the authors of the original C standard library have committed. I am not joking when I say that millions if not billions of dollars have been lost over decades because gets and similarly unsafe interfaces should never ever have been born, and because programmers have been unwittingly using them in spite of their inherently broken, dangerous and unsafe design. What's the problem: well, how on Earth can you tell gets to limit the length of input to actually fit in however much memory you have provided? Sadly, you can't. gets assumes that you-the-programmer have made no mistakes, and that wherever the input's coming from will fit into the space available. Ergo gets is totally utterly broken and any reasonable C coding standard will simply state "Calls to gets are not allowed".
Yes. Forget about gets. Forget about any examples you saw of people calling gets. They are all wrong. Every single one of them. I'm serious. All code using gets is broken, there's no qualification here. If you use gets, you're basically saying "Hey, I've got nothing to lose. If some big institution exposes millions of their users' data, I'm fine with getting sued and having to live under a bridge thereafter". I bet you'd be not so happy about getting sued by a million angry users, so that's where the tale of gets ends. From now on it doesn't exist, and if someone tells you about using gets, you need to look at them weird and tell them "WTF are you talking about? Have you lost your mind?!". That's the only proper response. It's that bad of a problem. No exaggeration here, I'm not trying to scare you.
As for what to do instead of gets? Of course it's a well solved problem. See this question to learn everything you should know about it!
In this function:
1. It is not needed at all, as there is no danger that the automatic variable will be used uninitialized in this function.
2. This test checks whether malloc was successful or not. If malloc fails, it returns NULL.
3. A slightly odd way to skip whitespace.
4. This statement is not needed at all. The function terminates and the variable ceases to exist.
The conclusion: I would not recommend this kind of code as an example when you are learning programming. The author's C knowledge is IMO very limited.
Whenever we declare a variable, it is a good practice to initialize it with some value. As you are declaring a dynamic array here, you are initializing it with NULL.
It is set to NULL so that it can be helpful to check if the text is valid or not. If somehow the malloc failed, the text will be still NULL. So you can check whether the malloc failed or not to allocate the memory. Try to put an invalid number for size like -1. You will see that the program won't prompt for the text input, as malloc failed and text is still NULL. I think this answers your queries 1, 2, and 4 about why the text is being set to NULL and why it is checking whether the text is NULL or not.
For the 3rd query: after you enter the size for scanf("%d", &size);, you press Enter. If you don't use scanf(" "), that Enter will be taken as the end of input by gets(text), and text will always be empty. So scanf(" ") is used to skip the Enter pressed after scanf("%d", &size);.
char buffer[8];
strncpy(buffer, "12345678", 8);
printf("%s\n", buffer);
prints: 12345678�
I understand that the issue is that there is not room for the null terminator, and that the solution is to change the 8 to a 9.
But, I am curious what it is printing and why it stops after two characters.
Is this a security flaw or just a bug? Could it be exploited by a user?
EDIT 1
I understand that officially it is undefined behavior and that nasal demons may occur at this point from a developer perspective, but if anyone has a good understanding regarding the actual code that is running, are there people who could exploit this code in a controlled manner. I am wondering from the point of view of an exploiter, not a developer, whether this could be used to make effective exploits.
EDIT 2
One of the comments led me to this site and I think it covers the whole idea that I am wondering about: http://www.cse.scu.edu/~tschwarz/coen152_05/Lectures/BufferOverflow.html
It is the way strncpy was designed and implemented. There is a clear warning which is mentioned in most of the man pages of strncpy as below. So, the onus is on the user to ensure he/she uses it correctly in such a way that, it cannot be exploited.
Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
But, I am curious what it is printing and why it stops after two characters.
It is undefined behavior! When you try to print a string using "%s", the printf function keeps printing characters in contiguous memory starting from the beginning address of the string provided till it encounters a '\0'. As the string provided by you is not null terminated, the behavior of printf in such a case cannot be predicted. It may print 2 additional characters or even 200 additional characters or may lead to other unpredictable behaviors.
are there people who could exploit this code in a controlled manner
Yes, of course. This can lead to printing the contents of memory which would otherwise be inaccessible / unknown to users. Now, how useful the contents of the memory are depends on what is actually present in the memory. It could be a private key or some such information. But please do note that you need carefully crafted attacks to extract the critical information an attacker wants.
When you try to print something that is not a string by using %s in printf, the behavior is undefined. That undefined behavior is what you observe.
The function strncpy is, by design, intended to produce so-called fixed-width strings. And that is exactly what it does in your case. But in the general case fixed-width strings are different from normal zero-terminated strings. You cannot print them with %s.
In general case, trying to use strncpy to create normal zero-terminated strings makes little or no sense. So no, "the solution" is not to change 8 to 9. The solution is to stop using strncpy if you want to work with zero-terminated strings. Many platforms provide strlcpy function, which is designed as a limited-length string copying function for zero-terminated strings. (Alas, it is not standard.)
If you want to print a fixed-width string with printf, use format s with precision. In your case printf("%.8s", buffer) would print your fixed-width string properly.
The C11 & C++14 standards have dropped the gets() function, which is inherently insecure and leads to security problems: it performs no bounds checking, which results in buffer overflows. Then why doesn't the C11 standard drop the strcat() & strcpy() functions as well? The strcat() function doesn't check whether the second string will fit in the first array. The strcpy() function also contains no provision for checking the boundary of the target array. What if the source array has more characters than the destination array can hold? Most probably the program will crash at runtime.
So, wouldn't it be nice if these two unsafe functions were completely removed from the language? Why do they still exist? What is the reason? Wouldn't it be fine to have only functions like strncat() and strncpy()? If I am not wrong, the Microsoft C & C++ compiler provides safe versions of these functions, strcpy_s() and strcat_s(). Then why aren't they officially implemented by other C compilers to provide safety?
gets() is inherently unsafe, because in general it can overflow the target if too much data is received on stdin. This:
char s[MANY];
gets(s);
will cause undefined behavior if more than MANY characters are entered, and there is typically nothing the program can do to prevent it.
strcpy() and strcat() can be used completely safely, since they can overflow the target only if the source string is too long to be contained in the target array. The source string is contained in an array object that is under the control of the program itself, not of any external input. For example, this:
char s[100];
strcpy(s, "hello");
strcat(s, ", ");
strcat(s, "world");
cannot possibly overflow unless the program itself is modified.
strncat() can be used as a safer version of strcat() -- as long as you specify the third argument correctly. One problem with strncat() is that it only gives you one way of handling the case where there's not enough room in the target array: it silently truncates the string. Sometimes that might be what you want, but sometimes you might want to detect the overflow and do something about it.
As for strncpy(), it is not simply a safer version of strcpy(). It's not inherently dangerous, but if you're not very careful you can easily leave the target array without a terminating '\0' null character, leading to undefined behavior next time you pass it to a function expecting a pointer to a string. As it happens, I've written about this.
strcpy and strcat aren't similar to gets. The problem with gets is that it reads from input, so whether a buffer overflow occurs is out of the programmer's control.
The C99 Rationale explains strncpy as follows:
Rationale for International Standard — Programming Languages — C §7.21.2.4 The strncpy function
strncpy was initially introduced into the C library to deal with fixed-length name fields in structures such as directory entries. Such fields are not used in the same way as strings: the trailing null is unnecessary for a maximum-length field, and setting trailing bytes for shorter names to null assures efficient field-wise comparisons. strncpy is not by origin a “bounded strcpy,” and the Committee preferred to recognize existing practice rather than alter the function to better suit it to such use.
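The use described in the Rationale can be sketched as follows. The structure layout, the field width of 14 and the function names are illustrative assumptions, loosely modeled on old directory entries rather than taken from any real header.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 14   /* illustrative field width (an assumption) */

struct dir_entry {
    unsigned short inode;
    char name[NAME_LEN];   /* fixed-width field, NOT necessarily terminated */
};

/* Fill the fixed-width field: short names are padded with '\0' bytes,
   names of NAME_LEN characters or more are stored without a terminator. */
void set_name(struct dir_entry *e, const char *name)
{
    assert(e != NULL && name != NULL);
    strncpy(e->name, name, NAME_LEN);
}

/* Because of the padding, whole fields can be compared bytewise. */
int same_name(const struct dir_entry *a, const struct dir_entry *b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a->name, b->name, NAME_LEN) == 0;
}
```

Such a field must never be passed to functions expecting a string; printing it needs an explicit bound, for example printf("%.*s", NAME_LEN, e->name).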
Myth 1: strcpy() is unsafe and how it works comes as a great surprise to a veteran C programmer.
Myth 2: strncpy() is safe.
Myth 3: strncpy() is a safer version of strcpy().
Myth 4: Microsoft is some kind of authority on the use of the C language and knows what it is talking about.
strcat() and strcpy() are perfectly safe functions.
Also note that strncpy was never intended to be a safe version of strcpy. It was designed for an obscure, obsolete string format used in an ancient version of Unix. strncpy is actually very unsafe (one of many blog posts about it here), unlike strcpy, since very few programmers seem to be able to use the former without producing fatal bugs (no null termination).
A better question is why the inherently unsafe strncpy() wasn't removed from the language. Is anyone working with obscure Unix strings from the 1970s much?
When removing a function completely, one of the major things the standards committee has to consider is how much code it could break and how many people (programmers, library writers, compiler vendors, etc.) would be annoyed by (or would oppose) the change.
gets() was deprecated in LSB (Linux Standard Base). POSIX-2008 marked it obsolescent, and gets() has historically been known as a seriously bad function whose use has always been strongly discouraged. Pretty much every C programmer knew that gets() is seriously dangerous. So the chance of its removal breaking any production code is very small, if not nonexistent. That made it easy for the committee to remove gets() from C11.
But that's not the case with strcpy, strcat, etc. They can be used safely and are still used by many programmers in new code. While they can be subject to buffer overflows, that is mostly under the programmer's control, whereas with gets() it isn't.
An argument can be made for using snprintf in place of strcpy and strcat. But it would seem pointless in simple cases like:
char buf[256];
strcpy(buf, "hello");
(if buf were a pointer, the allocated size would need to be tracked for use in snprintf)
because as a programmer, I know the above is perfectly safe. More importantly, a lot of legacy code would break. Basically, no strong argument can be made for removing strcpy and similar functions, since they can be used safely.
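For the cases where the source is not a short literal, the snprintf alternative mentioned above might be sketched like this. The function name format_greeting and the "hello, %s" format are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write "hello, <name>" into buf of the given size.
   Returns 0 if the whole text fit, -1 if it was truncated or an error
   occurred. With size > 0, buf is '\0'-terminated even on truncation. */
int format_greeting(char *buf, size_t size, const char *name)
{
    assert(buf != NULL && name != NULL && size > 0);

    int n = snprintf(buf, size, "hello, %s", name);
    /* snprintf returns the length the full output would have had */
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

The return value is what distinguishes this from strncat-style silent truncation: the caller can tell that the output did not fit.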
What you are talking about are scenarios that lead to undefined behavior.
Let's say
char a[3] = "abc";   /* exactly fills a; there is no room for a '\0' */
for (int i = 0; i < 5; i++)
    printf("%c\n", a[i]);   /* a[3] and a[4] are out of bounds */
You have an out-of-bounds array access, and the standard hasn't removed this because you are the one assigning the values and it is under your control.
The same goes for strcpy() and strcat().
So the standard can't remove every scenario that leads to UB.
Whereas with gets(), the input is not under the programmer's control: it takes data from a stream, and you never know what the input might be, so there is a high probability of ending up with a buffer overflow. That is why it has been removed, and the safer fgets() should be used instead.
I'm reading a book about computers, cryptography, etc., and the author often uses things such as strncpy(dest, src, strlen(src)); instead of strcpy(dest, src);. It doesn't make much sense to me. Well, I'm a professional programmer.
The question is: does it really make a difference? Do real applications use something like this?
The author is almost certainly abusing the strn* functions. Unfortunately, there is almost never a good reason to use strn* functions, since they don't actually do what you want.
Let's take a look at strcpy and strncpy:
char *strcpy(char *dest, const char *src);
char *strncpy(char *dest, const char *src, size_t n);
The strcpy function copies src into dest, including a trailing \0. It is only rarely useful:
Unless you know the source string fits in the destination buffer, you have a buffer overflow. Buffer overflows are security hazards, they can crash your program, and cause it to behave incorrectly.
If you do know the size of the source string and destination buffer, you might as well use memcpy.
By comparison, the strncpy copies at most n bytes of src into dest and then pads the rest with \0. It is only rarely useful:
Unless you know the source string is smaller than the destination buffer, you cannot be certain that the resulting buffer will be nul-terminated. This can cause errors elsewhere in the program, if you assume the result is nul-terminated.
If you do know the source string is smaller, again, you might as well use memcpy.
You can simply terminate the string after you call strncpy, but are you sure that you want to silently truncate the result? I'd imagine most of the time, you'd rather have an error message.
Why do these functions exist?
The strcpy function is occasionally handy, but it is mostly a relic of an era where people did not care very much about validating input. Feeding a string that is too large to a program would crash it, and the advice was "don't do that".
The strncpy function is useful when you want to transmit data in fixed-size fields, and you don't want to put garbage in the remainder of the field. This is mostly a relic of an era where people used fixed-size fields.
So you will rarely see strcpy or strncpy in modern C software.
A worse problem
However, your example combines the worst of both worlds. Let's examine this piece of source code:
strncpy(dest, src, strlen(src));
This copies src into dest, without a \0 terminator and without bounds checking. It combines the worst aspect of strcpy (no bounds checking) with the worst aspect of strncpy (no terminator). If you see code like this, run away.
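The missing terminator can be observed directly. The sketch below is a demonstration only; the function name, the 32-byte buffer and the 'X' fill (standing in for stale buffer contents) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <string.h>

/* Demonstration only: returns 1 if strncpy(dest, src, strlen(src)) wrote a
   terminating '\0' right after the copied characters, 0 otherwise. */
int idiom_writes_terminator(const char *src)
{
    char dest[32];
    size_t len = strlen(src);

    assert(len < sizeof dest);       /* keep the demonstration in bounds */
    memset(dest, 'X', sizeof dest);  /* stand-in for stale buffer contents */
    strncpy(dest, src, len);         /* the idiom under discussion */
    return dest[len] == '\0';
}
```

For any src that fits, this returns 0: exactly strlen(src) characters are copied, so the byte after them still holds whatever was in dest before.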
How to work with strings
Good C code typically uses one of a few options for working with strings:
Use fixed buffers and snprintf, as long as you don't mind fixed buffers.
Use bounded string functions like strlcpy and strlcat. These are BSD extensions.
Use a custom string type which tracks string lengths, and write your own functions using memcpy and malloc.
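The last option could be sketched roughly as follows. The type and function names are hypothetical, and realloc is used for growing the buffer as a convenience next to the malloc and memcpy mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-tracking string: data holds len characters plus '\0'. */
struct str {
    char  *data;
    size_t len;
};

/* Initialize s with a copy of src. Returns 0 on success, -1 on failure. */
int str_init(struct str *s, const char *src)
{
    assert(s != NULL && src != NULL);
    size_t n = strlen(src);
    char *p = malloc(n + 1);
    if (p == NULL)
        return -1;
    memcpy(p, src, n + 1);
    s->data = p;
    s->len = n;
    return 0;
}

/* Append src, growing the buffer. Returns 0 on success, -1 on failure
   (s is left unchanged in that case). */
int str_append(struct str *s, const char *src)
{
    assert(s != NULL && s->data != NULL && src != NULL);
    size_t n = strlen(src);
    if (n > SIZE_MAX - s->len - 1)      /* size computation would overflow */
        return -1;
    char *p = realloc(s->data, s->len + n + 1);
    if (p == NULL)
        return -1;
    memcpy(p + s->len, src, n + 1);     /* known lengths, so memcpy suffices */
    s->data = p;
    s->len += n;
    return 0;
}

void str_free(struct str *s)
{
    assert(s != NULL);
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

Because every length is known at the point of the copy, there is no need for strcpy or strncpy at all, and allocation failure is reported instead of ignored.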
If that code is a verbatim copy from the book, the author either does not know C or is not a security specialist or both.
It could also be that he's using a misconfigured compiler, explicitly prohibiting the use of certain known to be potentially unsafe functions. And it's a questionable practice when the compiler cannot distinguish safe from unsafe from potentially unsafe and is just getting in the way all the time.
The reason for the strn family is to prevent you from overflowing buffers. If you're passed data from a caller and blindly trust that it's properly null-terminated and won't overflow the buffer you're copying it to, you're going to get owned by a buffer overrun attack.
The efficiency difference is negligible. The strn family might be slightly slower as it needs to keep checking that you're not overflowing.
strncpy(dest, src, strlen(src)); is no safer than strcpy(dest, src); (it performs no bounds check on dest and additionally omits the terminating '\0'), so there's nothing gained here, but...
strcpy can lead to security holes when you don't know how long the strings are. Somebody clever can overwrite something on your server using this.
Choosing strncpy over strcpy has little to do with execution speed; when its size argument is the size of the destination, strncpy greatly lowers the possibility of buffer overflow attacks.
In C, pointers are very powerful, but they also make the system vulnerable.
Following excerpt from a good Hack-Proofing book might help you.
Many overflow bugs are a result of bad string manipulation. Calls such as
strcpy() do not check the length of a string before copying it. The result is
that a buffer overflow may occur. It is expected that a NULL terminator will
be present. In one sense, the attacker relies on this bug in order to exploit a
machine; however, it also means that the attacker’s injected buffer also must
be free of NULL characters. If the attacker inserts a NULL character, the
string copy will be terminated before the entire payload can be inserted.
What should I use when I want to copy src_str to dst_arr, and why?
char dst_arr[10];
char *src_str = "hello";
PS: my head is spinning faster than the disk of my computer after reading a lot of things on how good or bad is strncpy and strlcpy.
Note: I know strlcpy is not available everywhere. That is not the concern here.
strncpy is never the right answer when your destination string is zero-terminated. strncpy is a function intended to be used with non-terminated fixed-width strings. More precisely, its purpose is to convert a zero-terminated string to a non-terminated fixed-width string (by copying). In other words, strncpy is not meaningfully applicable here.
The real choice you have here is between strlcpy and plain strcpy.
When you want to perform "safe" (i.e. potentially truncated) copying to dst_arr, the proper function to use is strlcpy.
As for dst_ptr... There's no such thing as "copy to dst_ptr". You can copy to memory pointed by dst_ptr, but first you have to make sure it points somewhere and allocate that memory. There are many different ways to do it.
For example, you can just make dst_ptr point to dst_arr, in which case the answer is the same as in the previous case: strlcpy.
Or you can allocate the memory using malloc. If the amount of memory you allocated is guaranteed to be enough for the string (i.e. at least strlen(src_str) + 1 bytes are allocated), then you can use plain strcpy or even memcpy to copy the string. There's no need and no reason to use strlcpy in this case, although some people might prefer using it, since it somehow gives them the feeling of extra safety.
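A minimal sketch of the malloc case, assuming the caller frees the result; dup_string is a hypothetical name:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly allocated copy of src, or NULL if
   the allocation fails. The caller is responsible for calling free(). */
char *dup_string(const char *src)
{
    assert(src != NULL);

    size_t size = strlen(src) + 1;   /* + 1 for the terminating '\0' */
    char *copy = malloc(size);
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, size);         /* size is known, so memcpy is enough */
    return copy;
}
```

Since the buffer is sized from the source, the copy cannot overflow or truncate; the only failure mode left is malloc returning NULL, which is checked.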
If you intentionally allocate less memory (i.e. you want your string to get truncated), then strlcpy becomes the right function to use.
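Where strlcpy is not available, its documented behavior (copy at most size - 1 characters, always terminate when size is nonzero, return the length of the source) can be approximated with a small helper. This is a sketch under that assumption, not the BSD implementation, and bounded_copy is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical strlcpy-like helper: copies at most dst_size - 1 characters
   of src into dst and terminates dst whenever dst_size > 0.
   Returns strlen(src); a result >= dst_size means truncation occurred. */
size_t bounded_copy(char *dst, const char *src, size_t dst_size)
{
    assert(src != NULL && (dst != NULL || dst_size == 0));

    size_t src_len = strlen(src);
    if (dst_size > 0) {
        size_t n = (src_len < dst_size) ? src_len : dst_size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

Comparing the return value with the destination size, as in bounded_copy(dst_arr, src_str, sizeof dst_arr) >= sizeof dst_arr, lets the caller detect truncation instead of silently accepting it.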
strlcpy() is safer than strncpy() so you might as well use it.
Systems that don't have it will often have a s_strncpy() that does the same thing.
Note : you can't copy anything to dst_ptr until it points to something
I did not know of strlcpy. I just found here that:
The strlcpy() and strlcat() functions copy and concatenate strings
respectively. They are designed to be safer, more consistent, and
less error prone replacements for strncpy(3) and strncat(3).
So strlcpy seems safer.
Edit: A full discussion is available here.
Edit2:
I realize that what I wrote above does not answer the "in your case" part of your question. If you understand the limitations of strncpy, I guess you can use it and write good code around it to avoid its pitfalls; but if you are not sure about your understanding of its limits, use strlcpy.
My understanding of the limitations of strncpy and strlcpy is that you can do something very bad with strncpy (a buffer overflow), and the worst you can do with strlcpy is to lose characters to truncation.
You should always use the standard function, which in this case is the C11 strcpy_s() function (part of the optional Annex K). Not strncpy(), as this is unsafe: it does not guarantee zero termination. And not the OpenBSD-only strlcpy(), as it is also unsafe, and OpenBSD always comes up with its own inventions, which usually don't make it into any standard.
See
http://en.cppreference.com/w/c/string/byte/strcpy
The function strcpy_s is similar to the BSD function strlcpy, except that
strlcpy truncates the source string to fit in the destination (which is a security risk)
strlcpy does not perform all the runtime checks that strcpy_s does
strlcpy does not make failures obvious by setting the destination to a null string or calling a handler if the call fails.
Although strcpy_s prohibits truncation due to potential security risks, it's possible to truncate a string using bounds-checked strncpy_s instead.
If your C library doesn't have strcpy_s, use the safec lib.
https://rurban.github.io/safeclib/doc/safec-3.1/df/d8e/strcpy__s_8c.html
First of all, your dst_ptr has no space allocated and you haven't set it to point at anything, so assigning anything through it would probably cause a segmentation fault.
strncpy should work perfectly fine; just do:
strncpy(dst_arr, src_str, sizeof(dst_arr));
and you know you won't overflow dst_arr. If you use a bigger src_str you will have to put your own null terminator at the end of dst_arr, but in this case your source is shorter than your destination, so it will be padded with nulls anyway.
This works everywhere and it's safe, so I wouldn't look at anything else unless it's out of intellectual curiosity.
Also note that it would be good to use a non-magic number for the 10 so you know that size matches the size passed to strncpy :)
You should use neither strncpy nor strlcpy for this. Better to use
*dst_arr = 0; strncat(dst_arr, src_str, sizeof dst_arr - 1);
or, without an initialization,
sprintf(dst_arr, "%.*s", (int)(sizeof dst_arr - 1), src_str);
dst_arr here must be an array, NOT a pointer.
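Wrapped in a function, the strncat idiom needs the size passed explicitly, because sizeof on a pointer parameter would give the pointer size rather than the array size. A sketch with the hypothetical name copy_with_strncat:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: truncating copy built on strncat. Unlike strncpy,
   strncat always appends a terminating '\0' after the copied characters. */
void copy_with_strncat(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    dst[0] = '\0';                     /* start from an empty string */
    strncat(dst, src, dst_size - 1);   /* at most dst_size - 1 chars + '\0' */
}
```

At the call site the array is still visible, so copy_with_strncat(dst_arr, sizeof dst_arr, src_str) passes the right size.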