strcpy when dest buffer is smaller than src buffer - c

I am trying to understand the difference/disadvantages of strcpy and strncpy.
Can somebody please help:
#include <stdio.h>
#include <string.h>
int main(void)
{
    char src[] = "this is a long string";
    char dest[5];
    strcpy(dest, src);   /* overflows dest: src needs 22 bytes */
    printf("%s \n", dest);
    printf("%s \n", src);
    return 0;
}
The output is:
this is a long string
a long string
QUESTION: I don't understand how the source string got modified. As per the explanation, strcpy should keep copying until it encounters a '\0', which it does, but how come the "src" string got modified?
Please explain.

The easy answer is that you have (with that strcpy() call) done something outside the specifications of the system, and thus deservedly suffer from undefined behaviour.
The more difficult answer involves examining the concrete memory layout on your system, and how strcpy() works internally. It probably goes something like this:
N+28 "g0PP"
N+24 "trin"
N+20 "ng s"
N+16 "a lo"
N+12 " is "
src N+08 "this"
N+04 "DPPP"
dest N+00 "DDDD"
The letters D stand for bytes in dest, the letters P are padding bytes, the 0 characters are ASCII NUL characters used as string terminators.
Now strcpy(dest,src) will change the memory content somewhat (presuming it correctly handles the overlapping memory areas):
N+28 "g0PP"
N+24 "trin"
N+20 "g0 s"
N+16 "trin"
N+12 "ng s"
src N+08 "a lo"
N+04 " is "
dest N+00 "this"
I.e. while dest now "contains" the full string "this is a long string" (if you count the overflowed memory), src now contains a completely different NUL-terminated string "a long string".
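The layout above can be reproduced without invoking undefined behaviour by modelling both arrays as regions of one larger buffer. This is only a sketch under assumed offsets (dest at 0, src at 8, as in the diagram); the name simulate_overflow is hypothetical, and memmove stands in for strcpy because memmove is defined for overlapping regions.

```c
#include <stddef.h>
#include <string.h>

/* Models the diagram inside one 32-byte region so every access stays in bounds.
 * Offsets 0 (dest) and 8 (src) are assumptions taken from the layout above.
 * Returns a pointer to where src starts after the simulated overflow. */
const char *simulate_overflow(char mem[32])
{
    char *dest = mem;       /* stands in for char dest[5] plus padding */
    char *src  = mem + 8;   /* stands in for char src[] */

    memset(mem, 0, 32);
    strcpy(src, "this is a long string");   /* 22 bytes, fits in mem[8..29] */

    /* memmove is used instead of strcpy because the regions overlap. */
    memmove(dest, src, strlen(src) + 1);

    return src;
}
```

Printing mem and then the returned pointer should give the same two lines as in the question: the full string from dest, and "a long string" from src.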

This is a buffer overflow, and undefined behavior.
In your case, it appears that the compiler has placed dest and src sequentially in memory. When you copy from src to dest, it continues copying past the end of dest and overwrites part of src.

Most likely the two strings are direct neighbours in memory. So in your case you may have this picture:
dst | | | | |src | | | | | |
So you start writing, and it happens that the bytes of src are overwritten.
However, you certainly cannot rely on this. Anything can happen, because what you have is undefined behaviour. Something else may happen on another computer, at another time, or with other compiler options.
Regards
Friedrich

Your code caused a buffer overflow - copying to dest more characters than it can hold.
The extra characters were written to another place on the stack, which in your case is where src is stored.
You need to use the strncpy() function.

As an additional note, please keep in mind that strncpy is not the right function to use when you need copying with buffer overrun protection. It is not intended for that purpose and never has been. strncpy was created a long time ago to perform some very application-specific string copying within a very specific file system in an old version of UNIX. Unfortunately, the authors of the library managed to "hijack" the generic-sounding name strncpy for that very narrow and specific purpose. It was then preserved for backward compatibility. And now we have a generation or two of programmers who base their assumptions about strncpy's purpose solely on its name, and consequently use it improperly. In reality, strncpy has very few meaningful uses, if any.
The C standard library (at least its C89/90 version) offers no string copying function with buffer overrun protection. To perform such protected copying, you have to either use a platform-specific function, such as strlcpy or strcpy_s, or write one yourself.
P.S. This thread on StackOverflow contains a good discussion about the real purpose strncpy was developed for. See this post specifically for the precise explanation of its role in UNIX file system. Also, see here for a good article on how strncpy came to be.
Once again, strncpy is a function for copying a completely different kind of string - fixed length string. It is not even intended to be used with traditional C-style null-terminated strings.
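For the "write one yourself" option, one possible shape is sketched below. The name copy_checked and its return convention are made up for illustration; this version refuses to copy rather than truncating, which is just one design choice among several.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits, terminator included.
 * Returns 0 on success and -1 if the arguments are unusable or src does not
 * fit; on a failed fit dst is set to the empty string. */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    size_t len;

    if (dst == NULL || src == NULL || dstsize == 0)
        return -1;

    len = strlen(src);
    if (len >= dstsize) {   /* no room for the characters plus '\0' */
        dst[0] = '\0';
        return -1;
    }

    memcpy(dst, src, len + 1);   /* copies the terminator as well */
    return 0;
}
```

With the buffers from the question, copy_checked(dest, sizeof dest, src) would return -1 and leave src untouched.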

I suggest a quick read of:
http://en.wikipedia.org/wiki/Strncpy#strncpy
which shows you the differences. Essentially, strncpy lets you specify a number of bytes to copy, which means the resultant string isn't necessarily null-terminated.
Now when you use strcpy to copy one string over another, it doesn't check the destination area of memory to see if it's big enough; it doesn't hold your hand in that regard. It only looks for the null character in the src string.
Of course, dest in this example is only 5 bytes. So what happens? It keeps on writing, to the end of dest and onwards past it in memory. And in this case, the next part of memory on the stack is your src string. So while your code isn't intentionally modifying src, the layout of bytes in memory coupled with the writing past the end of dest has caused this.
Hope that helps!

Either I'm misunderstanding your question, or you're misunderstanding strcpy:
QUESTION: I don't understand how the source string got modified. As per the explanation, strcpy should keep copying until it encounters a '\0', which it does, but how come the "src" string got modified?
It sounds to me like you're expecting strcpy to stop copying into dest when it reaches the end of dest, based on seeing a \0 character. This isn't what it does. strcpy copies into the destination until it reaches the end of the source string, delimited by a \0 character. It assumes you allocated enough memory for the copy. Before the copy the dest buffer could have anything in it, including all nulls.
strncpy solves this by having you actually tell it how big the buffer you're copying into is, so you can avoid cases where it copies more than can fit.

Related

strcpy has no room for the null terminator, so what is it printing?

char buffer[8];
strncpy(buffer, "12345678", 8);
printf("%s\n", buffer);
prints: 12345678�
I understand that the issue is that there is not room for the null terminator, and that the solution is to change the 8 to a 9.
But, I am curious what it is printing and why it stops after two characters.
Is this a security flaw or just a bug? Could it be exploited by a user?
EDIT 1
I understand that officially it is undefined behavior and that nasal demons may occur at this point from a developer perspective, but if anyone has a good understanding regarding the actual code that is running, are there people who could exploit this code in a controlled manner. I am wondering from the point of view of an exploiter, not a developer, whether this could be used to make effective exploits.
EDIT 2
One of the comments led me to this site and I think it covers the whole idea that I am wondering about: http://www.cse.scu.edu/~tschwarz/coen152_05/Lectures/BufferOverflow.html
It is the way strncpy was designed and implemented. There is a clear warning which is mentioned in most of the man pages of strncpy as below. So, the onus is on the user to ensure he/she uses it correctly in such a way that, it cannot be exploited.
Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
But, I am curious what it is printing and why it stops after two characters.
It is an undefined behavior! When you try to print a string using "%s", the printf function keeps printing characters in contiguous memory starting from the beginning address of the string provided till it encounters a '\0'. As the string provided by you is not null terminated, the behavior of printf in such a case cannot be predicted. It may print 2 additional characters or even 200 additional characters or may lead to other unpredictable behaviors.
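To see that the buffer really ends up without a terminator, one can search for a '\0' inside its bounds instead of printing it. A small sketch (both function names are hypothetical):

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if a '\0' occurs within the first n bytes of buf, 0 otherwise. */
int is_terminated(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Repeats the call from the question and reports whether buffer
 * ended up holding a proper C string. */
int question_buffer_is_string(void)
{
    char buffer[8];
    strncpy(buffer, "12345678", 8);   /* exactly 8 characters, no room for '\0' */
    return is_terminated(buffer, sizeof buffer);
}
```

question_buffer_is_string() returns 0: all eight bytes are digits, so printf with %s has to read past the array to find a terminator.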
are there people who could exploit this code in a controlled manner
Yes, of course. This can lead to printing the contents of memory that would otherwise be inaccessible or unknown to users. How useful those contents are depends on what is actually present in memory. It could be a private key or similar information. But please note that an attacker needs a carefully crafted attack to extract the critical information they want.
When you try to print something that is not a string by using %s in printf, the behavior is undefined. That undefined behavior is what you observe.
Function strncpy is, by design, intended to produce so-called fixed-width strings, and that is exactly what it does in your case. But in the general case, fixed-width strings are different from normal zero-terminated strings. You cannot print them with %s.
In the general case, trying to use strncpy to create normal zero-terminated strings makes little or no sense. So no, "the solution" is not to change 8 to 9. The solution is to stop using strncpy if you want to work with zero-terminated strings. Many platforms provide the strlcpy function, which is designed as a limited-length copying function for zero-terminated strings. (Alas, it is not standard.)
If you want to print a fixed-width string with printf, use the s format with a precision. In your case, printf("%.8s", buffer) would print your fixed-width string properly.
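The same precision trick works with snprintf if you want to turn a fixed-width field into an ordinary terminated string. A sketch, assuming an 8-byte field as in the question (field_to_string is a hypothetical name):

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: renders an 8-byte fixed-width field (possibly without a
 * terminator) into out as an ordinary C string. The precision in "%.8s" makes
 * snprintf read at most 8 bytes from field.
 * Returns 0 on success, -1 if out is too small or formatting failed. */
int field_to_string(char *out, size_t outsize, const char field[8])
{
    int n = snprintf(out, outsize, "%.8s", field);
    return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}
```

A field that is shorter than 8 characters and padded with '\0' comes out as the shorter string, since %.8s still stops at the first terminator.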

How fast is strn*() compared to str*()? [duplicate]

I'm reading a book about computers, cryptography, etc., and the writer often uses things such as strncpy(dest, src, strlen(src)); instead of strcpy(dest, src);. It doesn't make much sense to me. Well, I'm a professional programmer.
The question is: does it really make a difference? Do real applications use something like this?
The author is almost certainly abusing the strn* functions. Unfortunately, there is almost never a good reason to use strn* functions, since they don't actually do what you want.
Let's take a look at strcpy and strncpy:
char *strcpy(char *dest, const char *src);
char *strncpy(char *dest, const char *src, size_t n);
The strcpy function copies src into dest, including a trailing \0. It is only rarely useful:
Unless you know the source string fits in the destination buffer, you have a buffer overflow. Buffer overflows are security hazards, they can crash your program, and cause it to behave incorrectly.
If you do know the size of the source string and destination buffer, you might as well use memcpy.
By comparison, strncpy copies at most n bytes of src into dest and, if src is shorter than n, pads the rest with \0. It is only rarely useful:
Unless you know the source string is smaller than the destination buffer, you cannot be certain that the resulting buffer will be nul-terminated. This can cause errors elsewhere in the program, if you assume the result is nul-terminated.
If you do know the source string is smaller, again, you might as well use memcpy.
You can simply terminate the string after you call strncpy, but are you sure that you want to silently truncate the result? I'd imagine most of the time, you'd rather have an error message.
Why do these functions exist?
The strcpy function is occasionally handy, but it is mostly a relic of an era where people did not care very much about validating input. Feeding a string that is too large to a program would crash it, and the advice was "don't do that".
The strncpy function is useful when you want to transmit data in fixed-size fields, and you don't want to put garbage in the remainder of the field. This is mostly a relic of an era where people used fixed-size fields.
So you will rarely see strcpy or strncpy in modern C software.
A worse problem
However, your example combines the worst of both worlds. Let's examine this piece of source code:
strncpy(dest, src, strlen(src));
This copies src into dest, without a \0 terminator and without bounds checking. It combines the worst aspect of strcpy (no bounds checking) with the worst aspect of strncpy (no terminator). If you see code like this, run away.
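The missing terminator is easy to make visible by pre-filling the destination with a marker byte. A sketch (byte_after_copy is a hypothetical name, and the caller is assumed to pass a dest with room for at least strlen(src) + 1 bytes):

```c
#include <string.h>

/* Fills dest with 'X' first so that any byte strncpy leaves untouched is
 * visible. dest must have room for at least strlen(src) + 1 bytes; nothing
 * here checks that, exactly as with strcpy.
 * Returns the byte that follows the copied characters. */
char byte_after_copy(char *dest, size_t destsize, const char *src)
{
    memset(dest, 'X', destsize);
    strncpy(dest, src, strlen(src));   /* the pattern from the book */
    return dest[strlen(src)];
}
```

For an 8-byte dest and src "abc", the returned byte is 'X' rather than '\0', so dest is not a valid C string afterwards.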
How to work with strings
Good C code typically uses one of a few options for working with strings:
Use fixed buffers and snprintf, as long as you don't mind fixed buffers.
Use bounded string functions like strlcpy and strlcat. These are BSD extensions.
Use a custom string type which tracks string lengths, and write your own functions using memcpy and malloc.
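As a rough illustration of the first option, snprintf's return value tells you whether the string fit, so truncation does not have to be silent. The wrapper name copy_snprintf is hypothetical:

```c
#include <stdio.h>
#include <stddef.h>

/* Copies src into a fixed buffer with snprintf.
 * Returns 1 if the whole string fit, 0 if it was truncated or an error
 * occurred. When dstsize > 0 and snprintf succeeds, dst is terminated. */
int copy_snprintf(char *dst, size_t dstsize, const char *src)
{
    int n = snprintf(dst, dstsize, "%s", src);
    return n >= 0 && (size_t)n < dstsize;
}
```

The caller can then decide whether a truncated result is acceptable or should be reported as an error.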
If that code is a verbatim copy from the book, the author either does not know C or is not a security specialist or both.
It could also be that he's using a misconfigured compiler, explicitly prohibiting the use of certain known to be potentially unsafe functions. And it's a questionable practice when the compiler cannot distinguish safe from unsafe from potentially unsafe and is just getting in the way all the time.
The reason for the strn family is to prevent you from overflowing buffers. If you're passed data from a caller and blindly trust that it's properly null-terminated and won't overflow the buffer you're copying it to, you're going to get owned by a buffer overrun attack.
The efficiency difference is negligible. The strn family might be slightly slower as it needs to keep checking that you're not overflowing.
strncpy(dest, src, strlen(src)); copies the same characters as strcpy(dest, src); except that it does not write the terminating \0, so it gives no extra protection here, but..
Strcpy can lead to security holes, since you don't know how long the strings are. Somebody clever can overwrite something on your server using this.
Use of strncpy against strcpy has little to do with its execution speed but strncpy strongly lowers the possibility of buffer overflow attacks.
In C, the pointers are most powerful but they also make the system vulnerable.
Following excerpt from a good Hack-Proofing book might help you.
Many overflow bugs are a result of bad string manipulation. Calls such as
strcpy() do not check the length of a string before copying it. The result is
that a buffer overflow may occur. It is expected that a NULL terminator will
be present. In one sense, the attacker relies on this bug in order to exploit a
machine; however, it also means that the attacker’s injected buffer also must
be free of NULL characters. If the attacker inserts a NULL character, the
string copy will be terminated before the entire payload can be inserted.

What do I have to pay attention to when copying strings?

What do I have to pay attention to when copying strings in C? And what can go wrong?
I can think of two things: I need to reserve sufficient memory before copying a string, and I have to make sure I have the privileges to write to that memory (to avoid General Protection Faults). Is there anything else I have to pay attention to when copying strings? Are there any additional potential errors?
Make sure that you have sufficient buffer space in the destination (i.e. know ahead how many bytes you're copying). This also means making sure that the source string is properly terminated (by a null character, i.e. 0 = '\0').
Make sure that the destination string gets properly terminated.
If your application is character-set aware, bear in mind that some character sets can have embedded null characters, while some can have variable-length characters.
C strings are conventionally arrays of non-zero bytes ended by a zero byte. Routines dealing with them get a pointer to a byte (that is, to the array starting at that byte). You should take care that the data fits in the reserved space. Learn about strcpy and strcat and their bounded counterparts strncpy and strncat, and also about strdup. And don't forget to set the terminating byte to zero (in particular, when reaching the bound with strncpy etc.). Some libraries (e.g. GLib from GTK) provide nice utility functions (e.g. g_strdup_printf) for building strings.
Read the documentation of all the functions I mentioned.
You need to:
allocate sufficient memory for destination (strlen(source) + 1)
make sure source is not NULL
deal with unicode issues (string length might be less than length in bytes)
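The first two points of that checklist could look roughly like this. copy_string is a hypothetical name; the Unicode point is only touched on in that strlen counts bytes, which is what the allocation needs anyway.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a freshly allocated copy of source, or NULL if
 * source is NULL or the allocation fails. The caller must free() the result. */
char *copy_string(const char *source)
{
    size_t size;
    char *copy;

    if (source == NULL)
        return NULL;

    size = strlen(source) + 1;      /* length in bytes, plus the terminator */
    copy = malloc(size);
    if (copy == NULL)
        return NULL;

    memcpy(copy, source, size);     /* size includes the '\0' */
    return copy;
}
```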
You have to pay attention to the definition of a string in C: they are a sequence of characters ended with a null terminating character ('\0').
All the functions in string.h will work with this assumption. Work with those functions and you should be fine.

Is the function strcpy always dangerous?

Are functions like strcpy, gets, etc. always dangerous? What if I write a code like this:
int main(void)
{
char *str1 = "abcdefghijklmnop";
char *str2 = malloc(100);
strcpy(str2, str1);
}
This way the function doesn't accept arguments(parameters...) and the str variable will always be the same length...which is here 16 or slightly more depending on the compiler version...but yeah 100 will suffice as of march, 2011 :).
Is there a way for a hacker to take advantage of the code above?
10x!
Absolutely not. Contrary to Microsoft's marketing campaign for their non-standard functions, strcpy is safe when used properly.
The above is redundant, but mostly safe. The only potential issue is that you're not checking the malloc return value, so you may be dereferencing null (as pointed out by kotlinski). In practice, this is likely to cause an immediate SIGSEGV and program termination.
An improper and dangerous use would be:
char array[100];
// ... Read line into uncheckedInput
// Extract substring without checking length
strcpy(array, uncheckedInput + 10);
This is unsafe because the strcpy may overflow, causing undefined behavior. In practice, this is likely to overwrite other local variables (itself a major security breach) and possibly the return address. Through a return-to-libc attack, the attacker may be able to use C functions like system to execute arbitrary programs. There are other possible consequences to overflows.
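A checked version of that substring extraction might look like the sketch below. The name copy_tail and the return convention are assumptions for illustration; the point is only that both the offset and the remaining length get validated before anything is copied.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the part of input that starts at offset into out.
 * Returns 0 on success, -1 if input is shorter than offset or the tail does
 * not fit in out (terminator included). */
int copy_tail(char *out, size_t outsize, const char *input, size_t offset)
{
    size_t len = strlen(input);
    size_t tail;

    if (offset > len)
        return -1;             /* input + offset would point past the string */

    tail = len - offset;
    if (tail >= outsize)
        return -1;             /* would overflow out */

    memcpy(out, input + offset, tail + 1);   /* tail characters plus '\0' */
    return 0;
}
```

In the example above, the call would become copy_tail(array, sizeof array, uncheckedInput, 10), with the return value checked.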
However, gets is indeed inherently unsafe, and it was removed from the language in C11 (called C1X while in draft). There is simply no way to ensure the input won't overflow (causing the same consequences given above). Some people would argue it's safe when used with a known input file, but there's really no reason to ever use it. POSIX's getline is a far better alternative.
Also, the length of str1 doesn't vary by compiler. It should always be 17, including the terminating NUL.
You are forcefully stuffing completely different things into one category.
Function gets is indeed always dangerous. There's no way to make a safe call to gets regardless of what steps you are willing to take and how defensive you are willing to get.
Function strcpy is perfectly safe if you are willing to take the [simple] necessary steps to make sure that your calls to strcpy are safe.
That already puts gets and strcpy in vastly different categories, which have nothing in common with regard to safety.
The popular criticisms directed at safety aspects of strcpy are based entirely on anecdotal social observations as opposed to formal facts, e.g. "programmers are lazy and incompetent, so don't let them use strcpy". Taken in the context of C programming, this is, of course, utter nonsense. Following this logic we should also declare the division operator exactly as unsafe for exactly the same reasons.
In reality, there are no problems with strcpy whatsoever. gets, on the other hand, is a completely different story, as I said above.
Yes, it is dangerous. After 5 years of maintenance, your code will look like this:
int main(void)
{
char *str1 = "abcdefghijklmnop";
{enough lines have been inserted here so as to not have str1 and str2 nice and close to each other on the screen}
char *str2 = malloc(100);
strcpy(str2, str1);
}
at that point, someone will go and change str1 to
str1 = "THIS IS A REALLY LONG STRING WHICH WILL NOW OVERRUN ANY BUFFER BEING USED TO COPY IT INTO UNLESS PRECAUTIONS ARE TAKEN TO RANGE CHECK THE LIMITS OF THE STRING. AND FEW PEOPLE REMEMBER TO DO THAT WHEN BUGFIXING A PROBLEM IN A 5 YEAR OLD BUGGY PROGRAM"
and forget to look where str1 is used and then random errors will start happening...
Your code is not safe. The return value of malloc is unchecked, if it fails and returns 0 the strcpy will give undefined behavior.
Besides that, I see no problem other than that the example basically does not do anything.
strcpy isn't dangerous as long as you know that the destination buffer is large enough to hold the characters of the source string; otherwise strcpy will happily copy more characters than your target buffer can hold, which can lead to several unfortunate consequences (stack/other variables overwriting, which can result in crashes, stack smashing attacks & co.).
But: if you have a generic char * in input which hasn't been already checked, the only way to be sure is to apply strlen to such string and check if it's too large for your buffer; however, now you have to walk the entire source string twice, once for checking its length, once to perform the copy.
This is suboptimal, since, if strcpy were a little bit more advanced, it could receive as a parameter the size of the buffer and stop copying if the source string were too long; in a perfect world, this is how strncpy would perform (following the pattern of other strn*** functions). However, this is not a perfect world, and strncpy is not designed to do this. Instead, the nonstandard (but popular) alternative is strlcpy, which, instead of going out of the bounds of the target buffer, truncates.
Several CRT implementations do not provide this function (notably glibc), but you can still get one of the BSD implementations and put it in your application. A standard (but slower) alternative can be to use snprintf with "%s" as format string.
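For reference, the truncating behaviour described for strlcpy can be sketched in a few lines. This is not one of the BSD implementations, just an illustration of the semantics under a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

/* Sketch of strlcpy-style semantics under a hypothetical name.
 * Copies at most dstsize - 1 characters and always terminates dst when
 * dstsize > 0. Returns strlen(src), so a result >= dstsize means the
 * copy was truncated. */
size_t copy_truncating(char *dst, size_t dstsize, const char *src)
{
    size_t srclen = strlen(src);

    if (dstsize > 0) {
        size_t n = (srclen < dstsize) ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

Comparing the return value against the buffer size is how the caller finds out that truncation happened.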
That said, since you're programming in C++ (edit I see now that the C++ tag has been removed), why don't you just avoid all the C-string nonsense (when you can, obviously) and go with std::string? All these potential security problems vanish and string operations become much easier.
The only way malloc may fail is when an out-of-memory error occurs, which is a disaster by itself. You cannot reliably recover from it because virtually anything may trigger it again, and the OS is likely to kill your process anyway.
As you point out, under constrained circumstances strcpy isn't dangerous. It is more typical to take in a string parameter and copy it to a local buffer, which is when things can get dangerous and lead to a buffer overrun. Just remember to check your copy lengths before calling strcpy and null terminate the string afterward.
Aside from potentially dereferencing NULL (as you do not check the result from malloc), which is UB and likely not a security threat, there is no potential security problem with this.
gets() is always unsafe; the other functions can be used safely.
gets() is unsafe even when you have full control on the input -- someday, the program may be run by someone else.
The only safe way to use gets() is to use it for a single run thing: create the source; compile; run; delete the binary and the source; interpret results.

Valgrind Warning: Should I Take It Seriously

Background:
I have a small routine that mimics fgets(character, 2, fp) except it takes a character from a string instead of a stream. newBuff is dynamically allocated string passed as a parameter and character is declared as char character[2].
Routine:
character[0] = newBuff[0];
character[1] = '\0';
strcpy(newBuff, newBuff+1);
The strcpy replicates the loss of information as each character is read from it.
Problem: Valgrind warns me about this activity: "Source and destination overlap in strcpy(0x419b818, 0x419b819)".
Should I worry about this warning?
Probably the standard does not specify what happens when these buffers overlap. So yes, valgrind is right to complain about this.
In practical terms you will most likely find that your strcpy copies in order from left to right (e.g. while (*dst++ = *src++);) and that it's not an issue. But it is still incorrect and may have issues when running with other C libraries.
One standards-correct way to write this would be:
memmove(newBuff, newBuff+1, strlen(newBuff));
Because memmove is defined to handle overlap. (Although here you would end up traversing the string twice, once to check the length and once to copy. I also took a shortcut, since strlen(newBuff) should equal strlen(newBuff+1)+1, which is what I originally wrote.)
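Wrapped up as a function, the memmove version might look like this (drop_first_char is a hypothetical name; the empty-string guard is an addition for safety):

```c
#include <string.h>

/* Removes the first character of s in place. memmove is specified to
 * handle overlapping source and destination, unlike strcpy. */
void drop_first_char(char *s)
{
    size_t len = strlen(s);

    if (len == 0)
        return;                    /* nothing to remove */

    memmove(s, s + 1, len);        /* len bytes: len - 1 characters plus '\0' */
}
```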
Yes, and you should also worry that your function has pathologically bad performance (O(n^2) for a task that should be O(n)). Moving the entire contents of the string back by a character every time you read a character is a huge waste of time. Instead you should just keep a pointer to the current position and increment that pointer.
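The pointer-based approach could be sketched as follows. next_char is a hypothetical replacement for the routine in the question; it leaves newBuff untouched and only advances a cursor into it.

```c
#include <stddef.h>

/* Hypothetical replacement for the routine in the question: instead of
 * shifting the buffer, advance a cursor through it.
 * Writes the next character to character[] as a one-character string and
 * returns 1, or writes an empty string and returns 0 at the end. */
int next_char(const char **cursor, char character[2])
{
    if (**cursor == '\0') {
        character[0] = '\0';
        return 0;
    }

    character[0] = **cursor;
    character[1] = '\0';
    (*cursor)++;                   /* O(1) per character, no copying */
    return 1;
}
```

Usage would be along the lines of const char *pos = newBuff; followed by repeated calls to next_char(&pos, character) until it returns 0.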
Situations where you find yourself needing memmove or the equivalent (copying between buffers that overlap) almost always indicate a design flaw. Often it's not just a flaw in the implementation but in the interface.
Yes -- the behavior of strcpy is only defined if the source and dest don't overlap. You might consider a combination of strlen and memmove instead.
Yes, you should worry. The C standard states that the behavior of strcpy is undefined when the source and destination objects overlap. Undefined behavior means it may work sometimes, or it may fail, or it may appear to succeed but manifest failure elsewhere in the program.
The behavior of strcpy() is officially undefined if source and destination overlap.
From the manpage for memcpy comes a suggestion:
The memcpy() function copies n bytes from memory area s2 to memory area s1. If s1 and s2 overlap, behavior is undefined. Applications in which s1 and s2 might overlap should use memmove(3) instead.
The answer is yes: with certain compiler/library implementations, newest ones I guess, you'll end up with a bogus result. See How is strcpy implemented? for an example.

Resources