Valgrind Warning: Should I Take It Seriously - c

Background:
I have a small routine that mimics fgets(character, 2, fp) except it takes a character from a string instead of a stream. newBuff is a dynamically allocated string passed as a parameter and character is declared as char character[2].
Routine:
character[0] = newBuff[0];
character[1] = '\0';
strcpy(newBuff, newBuff+1);
The strcpy replicates the loss of information as each character is read from it.
Problem: Valgrind warns me about this activity: "Source and destination overlap in strcpy(0x419b818, 0x419b819)".
Should I worry about this warning?

The standard does not define what happens when these buffers overlap, so yes, valgrind is right to complain about this.
In practical terms you will most likely find that your strcpy copies in order from left to right (e.g. while (*dst++ = *src++);) and that it's not an issue. But it is still incorrect and may cause problems when running with other C libraries.
One standards-correct way to write this would be:
memmove(newBuff, newBuff+1, strlen(newBuff));
Because memmove is defined to handle overlap. (Although here you would end up traversing the string twice, once to check the length and once to copy. I also took a shortcut, since strlen(newBuff) should equal strlen(newBuff+1)+1, which is what I originally wrote.)
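As a minimal sketch of the routine rewritten around memmove (the function name take_first_char is made up for illustration, and the empty-string check is an added assumption about the desired behavior):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the first character of buf into out as a
   one-character string, then shift the rest of buf left by one.
   memmove is used because source and destination overlap. */
static void take_first_char(char *buf, char out[2])
{
    assert(buf != NULL && out != NULL);
    out[0] = buf[0];
    out[1] = '\0';
    if (buf[0] != '\0') {
        /* strlen(buf) bytes starting at buf + 1 cover the remaining
           characters plus the terminating '\0'. */
        memmove(buf, buf + 1, strlen(buf));
    }
}
```

On an empty string the helper returns an empty string in out and leaves buf alone, rather than reading past the terminator.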

Yes, and you should also worry that your function has pathologically bad performance (O(n^2) for a task that should be O(n)). Moving the entire contents of the string back by a character every time you read a character is a huge waste of time. Instead you should just keep a pointer to the current position and increment that pointer.
Situations where you find yourself needing memmove or the equivalent (copying between buffers that overlap) almost always indicate a design flaw. Often it's not just a flaw in the implementation but in the interface.

Yes -- the behavior of strcpy is only defined if the source and dest don't overlap. You might consider a combination of strlen and memmove instead.

Yes, you should worry. The C standard states that the behavior of strcpy is undefined when the source and destination objects overlap. Undefined behavior means it may work sometimes, or it may fail, or it may appear to succeed but manifest failure elsewhere in the program.

The behavior of strcpy() is officially undefined if source and destination overlap.
From the manpage for memcpy comes a suggestion:
The memcpy() function copies n bytes from memory area s2 to memory area s1. If s1 and s2 overlap, behavior is undefined. Applications in which s1 and s2 might overlap should use memmove(3) instead.

The answer is yes: with certain compiler/library implementations, newest ones I guess, you'll end up with a bogus result. See How is strcpy implemented? for an example.

Related

strcpy has no room for the null terminator, so what is it printing?

char buffer[8];
strncpy(buffer, "12345678", 8);
printf("%s\n", buffer);
prints: 12345678�
I understand that the issue is that there is not room for the null terminator, and that the solution is to change the 8 to a 9.
But, I am curious what it is printing and why it stops after two characters.
Is this a security flaw or just a bug? Could it be exploited by a user?
EDIT 1
I understand that officially it is undefined behavior and that nasal demons may occur at this point from a developer perspective, but if anyone has a good understanding regarding the actual code that is running, are there people who could exploit this code in a controlled manner. I am wondering from the point of view of an exploiter, not a developer, whether this could be used to make effective exploits.
EDIT 2
One of the comments led me to this site and I think it covers the whole idea that I am wondering about: http://www.cse.scu.edu/~tschwarz/coen152_05/Lectures/BufferOverflow.html
This is how strncpy was designed and implemented. Most strncpy man pages carry the clear warning quoted below, so the onus is on the user to use it in a way that cannot be exploited.
Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
But, I am curious what it is printing and why it stops after two characters.
It is undefined behavior! When you try to print a string using "%s", the printf function keeps printing characters in contiguous memory, starting from the address of the string provided, until it encounters a '\0'. As the string you provided is not null-terminated, the behavior of printf in such a case cannot be predicted. It may print 2 additional characters or even 200 additional characters, or may lead to other unpredictable behavior.
are there people who could exploit this code in a controlled manner
Yes, of course. This can lead to printing memory contents that would otherwise be inaccessible or unknown to users. How useful those contents are depends on what is actually present in memory: it could be a private key or similar information. But please note that an attacker needs a carefully crafted attack to extract the critical information they want.
When you try to print something that is not a string by using %s in printf, the behavior is undefined. That undefined behavior is what you observe.
Function strncpy, by design, is intended to produce so-called fixed-width strings. And that is exactly what it does in your case. But in the general case fixed-width strings are different from normal zero-terminated strings. You cannot print them with %s.
In the general case, trying to use strncpy to create normal zero-terminated strings makes little or no sense. So no, "the solution" is not to change 8 to 9. The solution is to stop using strncpy if you want to work with zero-terminated strings. Many platforms provide the strlcpy function, which is designed as a limited-length string copying function for zero-terminated strings. (Alas, it is not standard.)
If you want to print a fixed-width string with printf, use format s with precision. In your case printf("%.8s", buffer) would print your fixed-width string properly.
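To illustrate the precision trick, here is a small sketch that formats a fixed-width field into an ordinary C string. The helper name format_fixed_field is made up, and it uses snprintf with "%.*s" so the width can be passed at run time:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format a fixed-width, possibly unterminated field
   of `width` bytes into out as a normal zero-terminated string.
   Returns what snprintf returns (the number of characters needed). */
static int format_fixed_field(char *out, size_t out_size,
                              const char *field, int width)
{
    assert(out != NULL && field != NULL && width >= 0);
    /* The precision caps how many bytes %s reads, so field does not
       need a terminator within its first `width` bytes. */
    return snprintf(out, out_size, "%.*s", width, field);
}
```

The precision argument for "%.*s" must be an int, which is why width is declared as int rather than size_t.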

Is strcpy where src == dest strictly defined?

It's not too hard to demonstrate that strcpy on overlapped source and destination addresses fails on some platforms, either producing incorrect results or trapping (the latter with some negative random offsets on Linux/amd64).
I've instrumented a strcpy wrapper function for our codebase with debug-build assertions that check for such overlapped copies, and have received a number of internal development requests to weaken this assertion checking so that it only aborts for non-zero overlaps.
I've been hesitant to do so, based on my read of the strcpy documentation since I'd assume that equal source and destinations would count as overlapped. Is overlapped defined explicitly in the C++ standard (or C), and does this also include equality?
I suspect many vendor strcpy implementations special case this despite the freedom the standard allows to have this be undefined behavior. Are there any platform/hardware combinations where such an equal copy is known to fail?
Since you are using C++, the question is really, why are you not using std::string? strcpy is notoriously unsafe (as are its cousins strcpy_s and strncpy, although they are mildly safer than strcpy).
If you attempt to copy from a source to a destination that are the same, at best you will get no change.
If you found yourself a better documentation website for C functions, you'd see this signature:
char *strcpy(char *restrict s1, const char *restrict s2);
restrict in this case indicates that the caller promises that the two buffers in question do not overlap.
We can further search for the meaning of restrict as of C99, and find this wikipedia page:
It says that for the lifetime of the pointer, only it or a value
directly derived from it (such as pointer + 1) will be used to access
the object to which it points.
which is pretty clear that identical pointers are not allowed. If it happens to work on your system, there is no reason for you to think it will work in the next iteration of the compiler, library, or on new hardware.
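A debug wrapper of the kind the question describes might look roughly like this. It is only a sketch: both function names are hypothetical, and comparing addresses through uintptr_t is an assumption that holds on common flat-memory platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if [a, a+a_len) and [b, b+b_len) share at least one byte.
   Assumption: converting pointers to uintptr_t preserves their ordering. */
static int regions_overlap(const void *a, size_t a_len,
                           const void *b, size_t b_len)
{
    uintptr_t a0 = (uintptr_t)a, b0 = (uintptr_t)b;
    return a_len != 0 && b_len != 0 && a0 < b0 + b_len && b0 < a0 + a_len;
}

/* Hypothetical debug wrapper: asserts before delegating to strcpy. */
static char *checked_strcpy(char *dst, const char *src)
{
    size_t n = strlen(src) + 1;   /* bytes strcpy will read and write */
    assert(!regions_overlap(dst, n, src, n));
    return strcpy(dst, src);
}
```

With this definition equal pointers count as overlapping, so checked_strcpy(p, p) trips the assertion in a debug build.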

strncpy or strlcpy in my case

what should I use when I want to copy src_str to dst_arr and why?
char dst_arr[10];
char *src_str = "hello";
PS: my head is spinning faster than the disk of my computer after reading a lot of things on how good or bad is strncpy and strlcpy.
Note: I know strlcpy is not available everywhere. That is not the concern here.
strncpy is never the right answer when your destination string is zero-terminated. strncpy is a function intended to be used with non-terminated fixed-width strings. More precisely, its purpose is to convert a zero-terminated string to a non-terminated fixed-width string (by copying). In other words, strncpy is not meaningfully applicable here.
The real choice you have here is between strlcpy and plain strcpy.
When you want to perform "safe" (i.e. potentially truncated) copying to dst_arr, the proper function to use is strlcpy.
As for dst_ptr... There's no such thing as "copy to dst_ptr". You can copy to memory pointed by dst_ptr, but first you have to make sure it points somewhere and allocate that memory. There are many different ways to do it.
For example, you can just make dst_ptr to point to dst_arr, in which case the answer is the same as in the previous case - strlcpy.
Or you can allocate the memory using malloc. If the amount of memory you allocated is guaranteed to be enough for the string (i.e. at least strlen(src_str) + 1 bytes is allocated), then you can use the plain strcpy or even memcpy to copy the string. There's no need and no reason to use strlcpy in this case, although some people might prefer using it, since it somehow gives them the feeling of extra safety.
If you intentionally allocate less memory (i.e. you want your string to get truncated), then strlcpy becomes the right function to use.
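Since strlcpy is not available everywhere, a truncating copy with the same return convention can be sketched by hand. The name copy_truncating is hypothetical, and this is an illustration rather than the BSD implementation:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for strlcpy: copies at most dst_size - 1 bytes,
   always terminates dst when dst_size > 0, and returns strlen(src) so
   the caller can detect truncation (result >= dst_size). */
static size_t copy_truncating(char *dst, const char *src, size_t dst_size)
{
    size_t src_len;
    assert(src != NULL);
    src_len = strlen(src);
    if (dst_size != 0) {
        size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

For the arrays in the question, copy_truncating(dst_arr, src_str, sizeof dst_arr) copies "hello" unchanged because it fits in 10 bytes.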
strlcpy() is safer than strncpy() so you might as well use it.
Systems that don't have it will often have a s_strncpy() that does the same thing.
Note : you can't copy anything to dst_ptr until it points to something
I did not know of strlcpy. I just found here that:
The strlcpy() and strlcat() functions copy and concatenate strings
respectively. They are designed to be safer, more consistent, and
less error prone replacements for strncpy(3) and strncat(3).
So strlcpy seems safer.
Edit: A full discussion is available here.
Edit2:
I realize that what I wrote above does not answer the "in your case" part of your question. If you understand the limitations of strncpy, I guess you can use it and write good code around it to avoid its pitfalls; but if you are not sure about your understanding of its limits, use strlcpy.
My understanding of the limitations of strncpy and strlcpy is that you can do something very bad with strncpy (buffer overflow), and the worst you can do with strlcpy is to lose one char in the process.
You should always use the standard function, which in this case is the C11 strcpy_s() function (part of the optional Annex K). Not strncpy(), as this is unsafe, not guaranteeing zero termination. And not the OpenBSD-only strlcpy(), as it is also unsafe, and OpenBSD always comes up with its own inventions, which usually don't make it into any standard.
See
http://en.cppreference.com/w/c/string/byte/strcpy
The function strcpy_s is similar to the BSD function strlcpy, except that
strlcpy truncates the source string to fit in the destination (which is a security risk)
strlcpy does not perform all the runtime checks that strcpy_s does
strlcpy does not make failures obvious by setting the destination to a null string or calling a handler if the call fails.
Although strcpy_s prohibits truncation due to potential security risks, it's possible to truncate a string using bounds-checked strncpy_s instead.
If your C library doesn't have strcpy_s, use the safec lib.
https://rurban.github.io/safeclib/doc/safec-3.1/df/d8e/strcpy__s_8c.html
First of all, your dst_ptr has no space allocated and you haven't set it to point at the others, so assigning anything to that would probably cause a segmentation fault.
strncpy should work perfectly fine - just do:
strncpy(dst_arr, src_str, sizeof(dst_arr));
and you know you won't overflow dst_arr. If you use a bigger src_str you might have to put your own null terminator at the end of dst_arr, but in this case your source is shorter than your destination, so it will be padded with nulls anyway.
This works everywhere and it's safe, so I wouldn't look at anything else unless it's out of intellectual curiosity.
Also note that it would be good to use a non-magic number for the 10 so you know the size of that matches the size of the strncpy :)
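Putting those two remarks together, a sketch with a named size and explicit termination could look like this (DST_SIZE and copy_into_dst are names invented for the example):

```c
#include <assert.h>
#include <string.h>

enum { DST_SIZE = 10 };   /* named constant instead of a magic 10 */

/* Copies src into a DST_SIZE-byte array, truncating if necessary.
   strncpy alone does not terminate dst when src is too long, so the
   last byte is always set to '\0' by hand. */
static void copy_into_dst(char dst[DST_SIZE], const char *src)
{
    assert(dst != NULL && src != NULL);
    strncpy(dst, src, DST_SIZE - 1);
    dst[DST_SIZE - 1] = '\0';
}
```

A short source such as "hello" is copied whole and padded with nulls; a longer one is cut to DST_SIZE - 1 characters.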
You should use neither strncpy nor strlcpy for this. Better to use
*dst_arr=0; strncat(dst_arr,src_str,(sizeof dst_arr)-1);
or without an initialization
sprintf(dst_arr,"%.*s",(int)((sizeof dst_arr)-1),src_str);
dst_arr here must be an array NOT a pointer.

Is the function strcpy always dangerous?

Are functions like strcpy, gets, etc. always dangerous? What if I write a code like this:
int main(void)
{
char *str1 = "abcdefghijklmnop";
char *str2 = malloc(100);
strcpy(str2, str1);
}
This way the function doesn't accept arguments(parameters...) and the str variable will always be the same length...which is here 16 or slightly more depending on the compiler version...but yeah 100 will suffice as of march, 2011 :).
Is there a way for a hacker to take advantage of the code above?
10x!
Absolutely not. Contrary to Microsoft's marketing campaign for their non-standard functions, strcpy is safe when used properly.
The above is redundant, but mostly safe. The only potential issue is that you're not checking the malloc return value, so you may be dereferencing null (as pointed out by kotlinski). In practice, this is likely to cause an immediate SIGSEGV and program termination.
An improper and dangerous use would be:
char array[100];
// ... Read line into uncheckedInput
// Extract substring without checking length
strcpy(array, uncheckedInput + 10);
This is unsafe because the strcpy may overflow, causing undefined behavior. In practice, this is likely to overwrite other local variables (itself a major security breach). One of these may be the return address. Through a return-to-libc attack, the attacker may be able to use C functions like system to execute arbitrary programs. There are other possible consequences to overflows.
However, gets is indeed inherently unsafe, and was removed from the language in C11. There is simply no way to ensure the input won't overflow (causing the same consequences given above). Some people would argue it's safe when used with a known input file, but there's really no reason to ever use it. POSIX's getline is a far better alternative.
Also, the length of str1 doesn't vary by compiler. It should always be 17, including the terminating NUL.
You are forcefully stuffing completely different things into one category.
Function gets is indeed always dangerous. There's no way to make a safe call to gets regardless of what steps you are willing to take and how defensive you are willing to get.
Function strcpy is perfectly safe if you are willing to take the [simple] necessary steps to make sure that your calls to strcpy are safe.
That already puts gets and strcpy in vastly different categories, which have nothing in common with regard to safety.
The popular criticisms directed at safety aspects of strcpy are based entirely on anecdotal social observations as opposed to formal facts, e.g. "programmers are lazy and incompetent, so don't let them use strcpy". Taken in the context of C programming, this is, of course, utter nonsense. Following this logic we should also declare the division operator exactly as unsafe for exactly the same reasons.
In reality, there are no problems with strcpy whatsoever. gets, on the other hand, is a completely different story, as I said above.
yes, it is dangerous. After 5 years of maintenance, your code will look like this:
int main(void)
{
char *str1 = "abcdefghijklmnop";
{enough lines have been inserted here so as to not have str1 and str2 nice and close to each other on the screen}
char *str2 = malloc(100);
strcpy(str2, str1);
}
at that point, someone will go and change str1 to
str1 = "THIS IS A REALLY LONG STRING WHICH WILL NOW OVERRUN ANY BUFFER BEING USED TO COPY IT INTO UNLESS PRECAUTIONS ARE TAKEN TO RANGE CHECK THE LIMITS OF THE STRING. AND FEW PEOPLE REMEMBER TO DO THAT WHEN BUGFIXING A PROBLEM IN A 5 YEAR OLD BUGGY PROGRAM"
and forget to look where str1 is used and then random errors will start happening...
Your code is not safe. The return value of malloc is unchecked, if it fails and returns 0 the strcpy will give undefined behavior.
Besides that, I see no problem other than that the example basically does not do anything.
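For illustration, the example could check the allocation along these lines. This is a sketch only: copy_to_heap is a made-up name, and it sizes the buffer from the source rather than using a fixed 100.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL if malloc
   fails. The caller owns the result and must free() it. */
static char *copy_to_heap(const char *src)
{
    size_t n;
    char *dst;
    assert(src != NULL);
    n = strlen(src) + 1;          /* include the terminating '\0' */
    dst = malloc(n);
    if (dst == NULL) {
        return NULL;              /* report failure instead of writing through NULL */
    }
    memcpy(dst, src, n);          /* length already known, so memcpy suffices */
    return dst;
}
```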
strcpy isn't dangerous as long as you know that the destination buffer is large enough to hold the characters of the source string; otherwise strcpy will happily copy more characters than your target buffer can hold, which can lead to several unfortunate consequences (overwriting the stack or other variables, which can result in crashes, stack smashing attacks and so on).
But: if you have a generic char * in input which hasn't been already checked, the only way to be sure is to apply strlen to such string and check if it's too large for your buffer; however, now you have to walk the entire source string twice, once for checking its length, once to perform the copy.
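The check-then-copy pattern just described can be sketched as follows (copy_if_fits is a hypothetical name); note the two passes over the source:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits, terminator
   included. Returns 1 on success; returns 0 and leaves dst untouched
   when src is too long. */
static int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (strlen(src) >= dst_size) {   /* first pass: measure */
        return 0;
    }
    strcpy(dst, src);                /* second pass: copy, now known to fit */
    return 1;
}
```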
This is suboptimal, since, if strcpy were a little bit more advanced, it could receive as a parameter the size of the buffer and stop copying if the source string were too long; in a perfect world, this is how strncpy would perform (following the pattern of other strn*** functions). However, this is not a perfect world, and strncpy is not designed to do this. Instead, the nonstandard (but popular) alternative is strlcpy, which, instead of going out of the bounds of the target buffer, truncates.
Several CRT implementations do not provide this function (notably glibc before version 2.38), but you can still get one of the BSD implementations and put it in your application. A standard (but slower) alternative can be to use snprintf with "%s" as format string.
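A minimal sketch of the snprintf alternative, with a made-up helper name and a return value chosen here to flag truncation:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies src into dst (dst_size bytes), truncating
   if needed. Returns 1 if the whole string fit, 0 if it was truncated
   or snprintf reported an error. */
static int copy_with_snprintf(char *dst, size_t dst_size, const char *src)
{
    int needed;
    assert(dst != NULL && src != NULL && dst_size > 0);
    /* snprintf writes at most dst_size - 1 characters plus '\0' and
       returns the length the full output would have had. */
    needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```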
That said, since you're programming in C++ (edit I see now that the C++ tag has been removed), why don't you just avoid all the C-string nonsense (when you can, obviously) and go with std::string? All these potential security problems vanish and string operations become much easier.
The only way malloc may fail is when an out-of-memory error occurs, which is a disaster by itself. You cannot reliably recover from it because virtually anything may trigger it again, and the OS is likely to kill your process anyway.
As you point out, under constrained circumstances strcpy isn't dangerous. It is more typical to take in a string parameter and copy it to a local buffer, which is when things can get dangerous and lead to a buffer overrun. Just remember to check your copy lengths before calling strcpy and null terminate the string afterward.
Aside from potentially dereferencing NULL (as you do not check the result from malloc), which is UB and likely not a security threat, there is no potential security problem with this.
gets() is always unsafe; the other functions can be used safely.
gets() is unsafe even when you have full control on the input -- someday, the program may be run by someone else.
The only safe way to use gets() is to use it for a single run thing: create the source; compile; run; delete the binary and the source; interpret results.

strcpy when dest buffer is smaller than src buffer

I am trying to understand the difference/disadvantages of strcpy and strncpy.
Can somebody please help:
void main()
{
char src[] = "this is a long string";
char dest[5];
strcpy(dest,src) ;
printf("%s \n", dest);
printf("%s \n", src);
}
The output is:
this is a long string
a long string
QUESTION: I don't understand how the source string got modified. As per the explanation, strcpy should keep copying till it encounters a '\0', and so it does, but how come the "src" string got modified?
Please explain.
The easy answer is that you have (with that strcpy() call) done something outside the specifications of the system, and thus deservedly suffer from undefined behaviour.
The more difficult answer involves examining the concrete memory layout on your system, and how strcpy() works internally. It probably goes something like this:
      N+28 "g0PP"
      N+24 "trin"
      N+20 "ng s"
      N+16 "a lo"
      N+12 " is "
src   N+08 "this"
      N+04 "DPPP"
dest  N+00 "DDDD"
The letters D stand for bytes in dest, the letters P are padding bytes, the 0 characters are ASCII NUL characters used as string terminators.
Now strcpy(dest,src) will change the memory content somewhat (presuming it correctly handles the overlapping memory areas):
      N+28 "g0PP"
      N+24 "trin"
      N+20 "g0 s"
      N+16 "trin"
      N+12 "ng s"
src   N+08 "a lo"
      N+04 " is "
dest  N+00 "this"
I.e. while dest now "contains" the full string "this is a long string" (if you count the overflowed memory), src now contains a completely different NUL-terminated string "a long string".
This is a buffer overflow, and undefined behavior.
In your case, it appears that the compiler has placed dest and src sequentially in memory. When you copy from src to dest, it continues copying past the end of dest and overwrites part of src.
Very likely the strings are direct neighbours in memory, so in your case you may have this picture:
dst | | | | |src | | | | | |
so you start writing and it happens that the fields of src are overwritten.
However, you certainly cannot rely on it. What you have is undefined behaviour, so anything could happen: something else may happen on another computer, at another time, and/or with other compiler options.
Regards
Friedrich
Your code caused a buffer overflow - copying to dest more characters than it can hold.
The additional characters were written on another place on the stack, in your case, where src was pointing to.
You need to use the strncpy() function.
As an additional note, please keep in mind that the strncpy function is not the right function to use when you need to perform copying with buffer overrun protection. This function is not intended for that purpose and has never been intended for that purpose. strncpy is a function that was created a long time ago to perform some very application-specific string copying within some very specific filesystem in some old version of UNIX. Unfortunately, the authors of the library managed to "hijack" the generic-sounding name strncpy to use for that very narrow and specific purpose. It was then preserved for backward compatibility purposes. And now, we have a generation or two of programmers who make their assumptions about strncpy's purpose based solely on its name, and consequently use it improperly. In reality, strncpy has very little or no meaningful uses at all.
The C standard library (at least its C89/90 version) offers no string copying function with buffer overrun protection. In order to perform such protected copying, you have to either use some platform-specific function, like strlcpy or strcpy_s, or write one yourself.
P.S. This thread on StackOverflow contains a good discussion about the real purpose strncpy was developed for. See this post specifically for the precise explanation of its role in UNIX file system. Also, see here for a good article on how strncpy came to be.
Once again, strncpy is a function for copying a completely different kind of string - fixed length string. It is not even intended to be used with traditional C-style null-terminated strings.
I suggest a quick read of:
http://en.wikipedia.org/wiki/Strncpy#strncpy
which shows you the differences. Essentially strncpy lets you specify a number of bytes to copy, which means the resultant string isn't necessarily null-terminated.
Now when you use strcpy to copy one string over another, it doesn't check the resultant area of memory to see if it's big enough - it doesn't hold your hand in that regard. It checks up to the null character in the src string.
Of course, dest in this example is only 5 bytes. So what happens? It keeps on writing, to the end of dest and onwards past it in memory. And in this case, the next part of memory on the stack is your src string. So while your code isn't intentionally copying it, the layout of bytes in memory coupled with the writing past the end of dest has caused this.
Hope that helps!
Either I'm misunderstanding your question, or you're misunderstanding strcpy:
QUESTION: I don't understand how the
source string got modified. As per the
explanation, strcpy should keep
copying till it encounters a '\0', and so
it does, but how come the "src" string got
modified?
It sounds to me like you're expecting strcpy to stop copying into dest when it reaches the end of dest, based on seeing a \0 character. This isn't what it does. strcpy copies into the destination until it reaches the end of the source string, delimited by a \0 character. It assumes you allocated enough memory for the copy. Before the copy the dest buffer could have anything in it, including all nulls.
strncpy solves this by having you actually tell it how big the buffer you're copying into is, so you can avoid cases where it copies more than can fit.

Resources