What do I have to pay attention to when copying strings in C? And what can go wrong?
I can think of two things: I need to reserve sufficient memory before copying a string, and I have to make sure I have enough privileges to write to that memory (to avoid General Protection Faults). Is there anything else I have to pay attention to when copying strings? Are there any additional potential errors?
Make sure that you have sufficient buffer space in the destination (i.e. know ahead how many bytes you're copying). This also means making sure that the source string is properly terminated (by a null character, i.e. 0 = '\0').
Make sure that the destination string gets properly terminated.
If your application is character-set aware, bear in mind that some character sets can have embedded null characters, while some can have variable-length characters.
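To make the variable-length point concrete, here is a minimal sketch that assumes the text is valid UTF-8. The helper name utf8_codepoints is made up for this example. It counts characters by skipping continuation bytes, while strlen keeps counting bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts UTF-8 code points in a NUL-terminated string.
   Assumes valid UTF-8, where continuation bytes look like 10xxxxxx. */
size_t utf8_codepoints(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)   /* not a continuation byte */
            count++;
    }
    return count;
}
```

For "h\xC3\xA9llo" the helper reports 5 characters while strlen reports 6 bytes. Buffer sizes must always be computed from the byte count, plus one for the terminator.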
C strings are conventionally arrays of non-zero bytes ended by a zero byte. Routines dealing with them get a pointer to a byte (that is, to the array starting at that byte). You should take care that the reserved space is large enough. Learn about strcpy and strcat and their bounded counterparts strncpy and strncat. Also about strdup. And don't forget to set the terminating byte to zero (in particular, when reaching the bound with strncpy etc.). Some libraries (e.g. GLib from GTK) provide nice utility functions (e.g. g_strdup_printf) for building strings.
Read the documentation of all the functions I mentioned.
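A minimal sketch of the "set the terminating byte yourself" advice for strncpy. The helper name bounded_copy and its return convention are assumptions for illustration, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity dstsize bytes) with
   strncpy and always terminates the result. Returns 1 if src was truncated. */
int bounded_copy(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL && dstsize > 0);
    strncpy(dst, src, dstsize - 1);   /* writes no '\0' if src is too long */
    dst[dstsize - 1] = '\0';          /* so terminate explicitly */
    return strlen(src) >= dstsize;
}
```

With an 8-byte destination, copying "12345678" stores "1234567" and returns 1, so the caller can notice the truncation.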
You need to:
allocate sufficient memory for destination (strlen(source) + 1)
make sure source is not NULL
deal with Unicode issues (string length might be less than length in bytes)
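The first two points of this checklist can be sketched roughly as follows. The function name copy_string is hypothetical, and returning NULL for a NULL source is just one possible policy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL if src is NULL
   or the allocation fails. The caller must free() the result. */
char *copy_string(const char *src)
{
    if (src == NULL)
        return NULL;
    size_t size = strlen(src) + 1;      /* +1 for the terminating '\0' */
    char *dst = malloc(size);
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, size);             /* copies the terminator as well */
    assert(dst[size - 1] == '\0');
    return dst;
}
```

Note that strlen counts bytes, which is what the allocation needs, regardless of how many characters those bytes encode.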
You have to pay attention to the definition of a string in C: it is a sequence of characters ended with a null terminating character ('\0').
All the functions in string.h will work with this assumption. Work with those functions and you should be fine.
Related
char buffer[8];
strncpy(buffer, "12345678", 8);
printf("%s\n", buffer);
prints: 12345678�
I understand that the issue is that there is not room for the null terminator, and that the solution is to change the 8 to a 9.
But, I am curious what it is printing and why it stops after two characters.
Is this a security flaw or just a bug? Could it be exploited by a user?
EDIT 1
I understand that officially it is undefined behavior and that nasal demons may occur at this point from a developer's perspective, but if anyone has a good understanding of the actual code that is running: are there people who could exploit this code in a controlled manner? I am wondering from the point of view of an exploiter, not a developer, whether this could be used to make effective exploits.
EDIT 2
One of the comments led me to this site and I think it covers the whole idea that I am wondering about: http://www.cse.scu.edu/~tschwarz/coen152_05/Lectures/BufferOverflow.html
It is the way strncpy was designed and implemented. There is a clear warning, mentioned in most of the man pages of strncpy, as below. So the onus is on the user to ensure he/she uses it correctly, in such a way that it cannot be exploited.
Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
But, I am curious what it is printing and why it stops after two characters.
It is undefined behavior! When you try to print a string using "%s", the printf function keeps printing characters in contiguous memory, starting from the beginning address of the string provided, until it encounters a '\0'. As the string provided by you is not null terminated, the behavior of printf in such a case cannot be predicted. It may print 2 additional characters or even 200 additional characters, or may lead to other unpredictable behaviors.
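One way to check for this condition before printing is to look for a terminator inside the buffer bounds with memchr. This is only a sketch, and is_terminated is a name invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf holds a '\0' within its first n bytes,
   i.e. if it is safe to treat buf as a C string of capacity n. */
int is_terminated(const char *buf, size_t n)
{
    assert(buf != NULL);
    return memchr(buf, '\0', n) != NULL;
}
```

For the 8-byte buffer filled with "12345678" the check returns 0, which signals that passing the buffer to printf with a plain "%s" would read past its end.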
are there people who could exploit this code in a controlled manner
Yes, of course. This can lead to printing the contents of memory that would otherwise be inaccessible or unknown to users. How useful those contents are depends on what is actually present in the memory. It could be a private key or some such information. But please do note that you need carefully crafted attacks to extract the critical information that the attacker wants.
When you try to print something that is not a string by using %s in printf, the behavior is undefined. That undefined behavior is what you observe.
Function strncpy is, by design, intended to produce so-called fixed-width strings. And that is exactly what it does in your case. But in the general case fixed-width strings are different from normal zero-terminated strings. You cannot print them with %s.
In the general case, trying to use strncpy to create normal zero-terminated strings makes little or no sense. So no, "the solution" is not to change 8 to 9. The solution is to stop using strncpy if you want to work with zero-terminated strings. Many platforms provide the strlcpy function, which is designed as a limited-length string copying function for zero-terminated strings. (Alas, it is not standard.)
If you want to print a fixed-width string with printf, use format s with precision. In your case printf("%.8s", buffer) would print your fixed-width string properly.
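The precision trick also works with a run-time width through "%.*s". The sketch below uses snprintf so the result can be inspected; format_fixed is a hypothetical helper name, and the width is assumed to be no larger than the field itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats a fixed-width field of `width` bytes (which
   may lack a terminator) into out as a normal NUL-terminated string.
   Returns whatever snprintf returns. */
int format_fixed(char *out, size_t outsize, const char *field, int width)
{
    assert(out != NULL && field != NULL && width >= 0);
    /* The precision caps how many bytes are read from field. */
    return snprintf(out, outsize, "%.*s", width, field);
}
```

Called with the unterminated 8-byte buffer and a width of 8, it produces the ordinary string "12345678" without reading beyond the array.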
I have a function that gets passed an array of chars, or a string. I use this array as a data array, so it has a lot of random characters in it, including null chars. My problem comes when I try to retrieve this data: the compiler sees the null char and thinks the string ends, effectively throwing out all the data after it. Is there an option where I can somehow make an array that is not ended by a null char?
A C string and an array of char are not the same thing. The first is implemented by means of the second, with the additional convention that the string ends at the first 0 element of the array.
So what you need is just an unsigned char[something] and you'd have to keep track of the length that you want to have separately. Then also you shouldn't use strcpy or similar functions but memcpy etc.
The null ('\0') character is treated as the string terminator in C. So you need to tell the compiler exactly how much data to read. Why don't you maintain a separate count for the size of the data and then use functions that take that size to operate on the data?
Firstly, a string in C language is not some sort of "black box" object that can somehow choose to ignore or not ignore something out of its own will. It is based on a mere raw array of chars, of which you have full unrestricted control. This means that it is really you who chooses how to process the data stored in that array of chars. Not the compiler, not the string itself, but you and only you.
Secondly, a string in C language is defined as a sequence of characters ending with zero character. This immediately means that if you attempt using string-specific functions with your array, they will always stop at zeros. If you want your data to contain embedded zeros, then you should not call it "strings" and you should not use any string-specific functions with it. So, forget about strcmp, strcpy and such. Again, it is something you are responsible for, not the compiler.
Thirdly, the functions you would use with such data would typically be functions like memcpy for copying, memcmp for comparison and so on. Anything that's missing you'll have to implement yourself. And since you no longer have any terminating characters in your data, it is your responsibility to know where the data begins and where it ends.
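A rough sketch of what this looks like in practice: the bytes travel together with an explicit length, and only memcpy and memcmp touch them. The struct blob type, its capacity and the function names are all invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BLOB_CAPACITY = 64 };   /* arbitrary size chosen for this sketch */

/* Hypothetical type: raw bytes plus an explicit length, no terminator. */
struct blob {
    unsigned char data[BLOB_CAPACITY];
    size_t len;
};

/* Stores n bytes in b; returns 0 on success, -1 if they do not fit. */
int blob_set(struct blob *b, const void *bytes, size_t n)
{
    assert(b != NULL && (bytes != NULL || n == 0));
    if (n > BLOB_CAPACITY)
        return -1;
    if (n > 0)
        memcpy(b->data, bytes, n);
    b->len = n;
    return 0;
}

/* Returns 1 if both blobs hold the same bytes, embedded zeros included. */
int blob_equal(const struct blob *a, const struct blob *b)
{
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}
```

Storing the 5 bytes "ab\0cd" keeps all five of them, whereas strlen on the same data would stop after "ab".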
what should I use when I want to copy src_str to dst_arr and why?
char dst_arr[10];
char *src_str = "hello";
PS: my head is spinning faster than the disk of my computer after reading a lot of things on how good or bad is strncpy and strlcpy.
Note: I know strlcpy is not available everywhere. That is not the concern here.
strncpy is never the right answer when your destination string is zero-terminated. strncpy is a function intended to be used with non-terminated fixed-width strings. More precisely, its purpose is to convert a zero-terminated string to a non-terminated fixed-width string (by copying). In other words, strncpy is not meaningfully applicable here.
The real choice you have here is between strlcpy and plain strcpy.
When you want to perform "safe" (i.e. potentially truncated) copying to dst_arr, the proper function to use is strlcpy.
As for dst_ptr... There's no such thing as "copy to dst_ptr". You can copy to memory pointed by dst_ptr, but first you have to make sure it points somewhere and allocate that memory. There are many different ways to do it.
For example, you can just make dst_ptr to point to dst_arr, in which case the answer is the same as in the previous case - strlcpy.
Or you can allocate the memory using malloc. If the amount of memory you allocated is guaranteed to be enough for the string (i.e. at least strlen(src_str) + 1 bytes are allocated), then you can use plain strcpy or even memcpy to copy the string. There's no need and no reason to use strlcpy in this case, although some people might prefer using it, since it somehow gives them a feeling of extra safety.
If you intentionally allocate less memory (i.e. you want your string to get truncated), then strlcpy becomes the right function to use.
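Since strlcpy is not available everywhere, here is a sketch of a truncating copy with the same general contract (terminate whenever there is room, return the source length). truncating_copy is a hypothetical stand-in written for this example, not the BSD implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strlcpy: copies at most dstsize - 1 bytes,
   always terminates when dstsize > 0, and returns strlen(src). */
size_t truncating_copy(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t srclen = strlen(src);
    if (dstsize > 0) {
        size_t n = srclen < dstsize ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;   /* a value >= dstsize means the copy was truncated */
}
```

With char dst_arr[10], copying "hello" stores the whole string, while copying "hello, world" stores "hello, wo" and returns 12, which tells the caller that truncation took place.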
strlcpy() is safer than strncpy() so you might as well use it.
Systems that don't have it will often have a s_strncpy() that does the same thing.
Note : you can't copy anything to dst_ptr until it points to something
I did not know of strlcpy. I just found here that:
The strlcpy() and strlcat() functions copy and concatenate strings
respectively. They are designed to be safer, more consistent, and
less error prone replacements for strncpy(3) and strncat(3).
So strlcpy seems safer.
Edit: A full discussion is available here.
Edit2:
I realize that what I wrote above does not answer the "in your case" part of your question. If you understand the limitations of strncpy, I guess you can use it and write good code around it to avoid its pitfalls; but if you are not sure about your understanding of its limits, use strlcpy.
My understanding of the limitations of strncpy and strlcpy is that you can do something very bad with strncpy (buffer overflow), and the worst you can do with strlcpy is to lose one char in the process.
You should always use the standard function, which in this case is the C11 strcpy_s() function (part of the optional Annex K). Not strncpy(), as this is unsafe, not guaranteeing zero termination. And not the OpenBSD-only strlcpy(), as it is also unsafe, and OpenBSD always comes up with its own inventions, which usually don't make it into any standard.
See
http://en.cppreference.com/w/c/string/byte/strcpy
The function strcpy_s is similar to the BSD function strlcpy, except that
strlcpy truncates the source string to fit in the destination (which is a security risk)
strlcpy does not perform all the runtime checks that strcpy_s does
strlcpy does not make failures obvious by setting the destination to a null string or calling a handler if the call fails.
Although strcpy_s prohibits truncation due to potential security risks, it's possible to truncate a string using bounds-checked strncpy_s instead.
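The "fail instead of truncate" behavior described in the quote can be sketched without Annex K support. The function below is a hypothetical illustration only. It is not strcpy_s and does not perform its other runtime checks or call a constraint handler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the real strcpy_s: copies src only if it fits
   completely. On failure dst becomes the empty string and -1 is returned. */
int copy_or_fail(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && dstsize > 0);
    if (src == NULL || strlen(src) >= dstsize) {
        dst[0] = '\0';      /* make the failure obvious to later readers */
        return -1;
    }
    strcpy(dst, src);       /* safe: the length was checked above */
    return 0;
}
```

Whether refusing or truncating is the better policy depends on the application, which is really what the disagreement in this thread is about.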
If your C library doesn't have strcpy_s, use the safec lib.
https://rurban.github.io/safeclib/doc/safec-3.1/df/d8e/strcpy__s_8c.html
First of all, your dst_ptr has no space allocated and you haven't set it to point at the others, so assigning anything to that would probably cause a segmentation fault.
strncpy should work perfectly fine - just do:
strncpy(dst_arr, src_str, sizeof(dst_arr));
and you know you won't overflow dst_arr. If you use a bigger src_str you might have to put your own null-terminator at the end of dst_arr, but in this case your source is shorter than your destination, so it will be padded with nulls anyway.
This works everywhere and it's safe, so I wouldn't look at anything else unless it's out of intellectual curiosity.
Also note that it would be good to use a non-magic number for the 10 so you know the size of that matches the size of the strncpy :)
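Putting the points of this answer together, as a sketch: a named constant for the size, strncpy for the copy, and an explicit terminator for the case where the source is too long. DST_SIZE and fill_dst are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DST_SIZE = 10 };   /* named constant instead of a bare 10 */

/* Hypothetical helper: fills a DST_SIZE-byte array from src_str with strncpy
   and terminates it explicitly in case src_str has DST_SIZE or more chars. */
void fill_dst(char dst_arr[DST_SIZE], const char *src_str)
{
    assert(dst_arr != NULL && src_str != NULL);
    strncpy(dst_arr, src_str, DST_SIZE);   /* pads with '\0' if src is short */
    dst_arr[DST_SIZE - 1] = '\0';          /* needed when src is too long */
}
```

For "hello" every byte after the fifth ends up as '\0', so the explicit terminator changes nothing. For a longer source it is what keeps the result a valid string.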
You should use neither strncpy nor strlcpy for this. Better to use
*dst_arr=0; strncat(dst_arr,src_str,(sizeof dst_arr)-1);
or, without an initialization,
sprintf(dst_arr,"%.*s",(int)((sizeof dst_arr)-1),src_str);
dst_arr here must be an array, NOT a pointer.
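A variation on the same idea, sketched with snprintf (standard since C99) and an explicit size parameter, so it also works when the destination is only known through a pointer. The name snprintf_copy is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies src into dst via snprintf, truncating if
   needed. Returns 1 if truncation (or an error) happened, 0 otherwise. */
int snprintf_copy(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL && dstsize > 0);
    int needed = snprintf(dst, dstsize, "%s", src);   /* always terminates */
    return needed < 0 || (size_t)needed >= dstsize;
}
```

The return value of snprintf is the length the full output would have had, which is how the helper detects truncation.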
Background:
I have a small routine that mimics fgets(character, 2, fp) except it takes a character from a string instead of a stream. newBuff is dynamically allocated string passed as a parameter and character is declared as char character[2].
Routine:
character[0] = newBuff[0];
character[1] = '\0';
strcpy(newBuff, newBuff+1);
The strcpy replicates the loss of information as each character is read from it.
Problem: Valgrind warns me about this activity: "Source and destination overlap in strcpy(0x419b818, 0x419b819)".
Should I worry about this warning?
Probably the standard does not specify what happens when these buffers overlap. So yes, valgrind is right to complain about this.
In practical terms you will most likely find that your strcpy copies in order from left to right (e.g. while (*dst++ = *src++);) and that it's not an issue. But it is still incorrect and may have issues when running with other C libraries.
One standards-correct way to write this would be:
memmove(newBuff, newBuff+1, strlen(newBuff));
Because memmove is defined to handle overlap. (Although here you would end up traversing the string twice, once to check the length and once to copy. I also took a shortcut, since strlen(newBuff) should equal strlen(newBuff+1)+1, which is what I originally wrote.)
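Wrapped into a small function, the memmove approach might look like this. drop_first_char is a name invented for this sketch, and the empty-string guard is an addition that the original routine did not have.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: removes the first character of s in place.
   memmove is defined for overlapping source and destination. */
void drop_first_char(char *s)
{
    assert(s != NULL);
    if (s[0] == '\0')
        return;                       /* nothing to remove */
    memmove(s, s + 1, strlen(s));     /* strlen(s) bytes include the '\0' */
}
```

Starting from "abc", one call leaves "bc" in the buffer, and calling it on an empty string is a no-op.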
Yes, and you should also worry that your function has pathologically bad performance (O(n^2) for a task that should be O(n)). Moving the entire contents of the string back by a character every time you read a character is a huge waste of time. Instead you should just keep a pointer to the current position and increment that pointer.
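The pointer-based alternative could be sketched like this. The name next_char and its return convention are assumptions for illustration. The string itself is never modified, only the cursor moves.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reads one character at *cursor into character[]
   and advances the cursor. Returns 0 at the end of the string, else 1. */
int next_char(const char **cursor, char character[2])
{
    assert(cursor != NULL && *cursor != NULL && character != NULL);
    if (**cursor == '\0')
        return 0;
    character[0] = **cursor;
    character[1] = '\0';
    (*cursor)++;            /* O(1): no bytes are moved */
    return 1;
}
```

The caller keeps the original newBuff pointer for the eventual free() and passes a separate cursor that starts at newBuff.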
Situations where you find yourself needing memmove or the equivalent (copying between buffers that overlap) almost always indicate a design flaw. Often it's not just a flaw in the implementation but in the interface.
Yes -- the behavior of strcpy is only defined if the source and dest don't overlap. You might consider a combination of strlen and memmove instead.
Yes, you should worry. The C standard states that the behavior of strcpy is undefined when the source and destination objects overlap. Undefined behavior means it may work sometimes, or it may fail, or it may appear to succeed but manifest failure elsewhere in the program.
The behavior of strcpy() is officially undefined if source and destination overlap.
From the manpage for memcpy comes a suggestion:
The memcpy() function copies n bytes from memory area s2 to memory area s1. If s1 and s2 overlap, behavior is undefined. Applications in which s1 and s2 might overlap should use memmove(3) instead.
The answer is yes: with certain compiler/library implementations, newest ones I guess, you'll end up with a bogus result. See How is strcpy implemented? for an example.
I am trying to understand the difference/disadvantages of strcpy and strncpy.
Can somebody please help:
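#include <stdio.h>
#include <string.h>

int main(void)
{
    char src[] = "this is a long string";
    char dest[5];
    strcpy(dest, src);   /* overflows dest: src needs 22 bytes, dest has 5 */
    printf("%s \n", dest);
    printf("%s \n", src);
    return 0;
}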
The output is:
this is a long string
a long string
QUESTION: I don't understand how the source string got modified. As per the explanation, strcpy should keep copying till it encounters a '\0', so it does, but how come the "src" string got modified?
Please explain.
The easy answer is that you have (with that strcpy() call) done something outside the specifications of the system, and thus deservedly suffer from undefined behaviour.
The more difficult answer involves examining the concrete memory layout on your system, and how strcpy() works internally. It probably goes something like this:
      N+28  "g0PP"
      N+24  "trin"
      N+20  "ng s"
      N+16  "a lo"
      N+12  " is "
src   N+08  "this"
      N+04  "DPPP"
dest  N+00  "DDDD"
The letters D stand for bytes in dest, the letters P are padding bytes, the 0 characters are ASCII NUL characters used as string terminators.
Now strcpy(dest,src) will change the memory content somewhat (presuming it correctly handles the overlapping memory areas):
      N+28  "g0PP"
      N+24  "trin"
      N+20  "g0 s"
      N+16  "trin"
      N+12  "ng s"
src   N+08  "a lo"
      N+04  " is "
dest  N+00  "this"
I.e. while dest now "contains" the full string "this is a long string" (if you count the overflowed memory), src now contains a completely different NUL-terminated string "a long string".
This is a buffer overflow, and undefined behavior.
In your case, it appears that the compiler has placed dest and src sequentially in memory. When you copy from src to dest, it continues copying past the end of dest and overwrites part of src.
With high likelihood the strings are exact neighbours. So in your case you may have this picture
dst | | | | |src | | | | | |
so you start writing and it happens that the fields of src are overwritten.
However, you surely cannot rely on it. Anything could happen; what you have is undefined behaviour. So something else can happen on another computer, at another time and/or with other options.
Regards
Friedrich
Your code caused a buffer overflow - copying to dest more characters than it can hold.
The additional characters were written on another place on the stack, in your case, where src was pointing to.
You need to use the strncpy() function.
As an additional note, please keep in mind that strncpy is not the right function to use when you need to perform copying with buffer overrun protection. This function is not intended for that purpose and has never been intended for that purpose. strncpy is a function that was created a long time ago to perform some very application-specific string copying within some very specific filesystem in some old version of UNIX. Unfortunately, the authors of the library managed to "hijack" the generic-sounding name strncpy to use for that very narrow and specific purpose. It was then preserved for backward compatibility purposes. And now we have a generation or two of programmers who make their assumptions about strncpy's purpose based solely on its name, and consequently use it improperly. In reality, strncpy has very few or no meaningful uses at all.
The C standard library (at least its C89/90 version) offers no string copying function with buffer overrun protection. In order to perform such protected copying, you have to use some platform-specific function, like strlcpy or strcpy_s, or write one yourself.
P.S. This thread on StackOverflow contains a good discussion about the real purpose strncpy was developed for. See this post specifically for the precise explanation of its role in UNIX file system. Also, see here for a good article on how strncpy came to be.
Once again, strncpy is a function for copying a completely different kind of string - fixed length string. It is not even intended to be used with traditional C-style null-terminated strings.
I suggest a quick read of:
http://en.wikipedia.org/wiki/Strncpy#strncpy
which shows you the differences. Essentially strncpy lets you specify a number of bytes to copy, which means the resultant string isn't necessarily null-terminated.
Now when you use strcpy to copy one string over another, it doesn't check the resultant area of memory to see if it's big enough - it doesn't hold your hand in that regard. It checks up to the null character in the src string.
Of course, dst in this example is only 5 bytes. So what happens? It keeps on writing, to the end of dest and onwards past it in memory. And in this case, the next part of memory on the stack is your src string. So while your code isn't intentionally copying it, the layout of bytes in memory coupled with the writing past the end of dst has caused this.
Hope that helps!
Either I'm misunderstanding your question, or you're misunderstanding strcpy:
QUESTION: I don't understand how the source string got modified. As per the explanation, strcpy should keep copying till it encounters a '\0', so it does, but how come the "src" string got modified?
It sounds to me like you're expecting strcpy to stop copying into dest when it reaches the end of dest, based on seeing a \0 character. This isn't what it does. strcpy copies into the destination until it reaches the end of the source string, delimited by a \0 character. It assumes you allocated enough memory for the copy. Before the copy the dest buffer could have anything in it, including all nulls.
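To see why, it helps to look at what a straightforward strcpy might look like. This is an illustrative re-implementation under the name naive_strcpy, not the code of any particular C library.

```c
#include <assert.h>
#include <string.h>

/* Illustrative re-implementation, not any particular library's code:
   the loop ends only when the source terminator has been copied. It never
   inspects what dest holds and has no idea how large dest is. */
char *naive_strcpy(char *dest, const char *src)
{
    assert(dest != NULL && src != NULL);
    char *d = dest;
    while ((*d++ = *src++) != '\0')
        ;                   /* empty body: the copy happens in the condition */
    return dest;
}
```

Even if dest is filled with '\0' bytes beforehand, all 21 characters of "this is a long string" plus the terminator are written, so the destination must provide at least 22 bytes.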
strncpy solves this by having you actually tell it how big the buffer you're copying into is, so you can avoid cases where it copies more than can fit.