Related
This question already has answers here:
Why should you use strncpy instead of strcpy?
(10 answers)
Closed 5 years ago.
I know there are a lot of answers on the web which say one is better than the other but I have come to this understanding of strncpy() is better in terms of preventing unnecessary crashes in the system because of the missing '\0'. As many people put it, it is quite defensive.
I have already gone through this link: Why should you use strncpy instead of strcpy?
Let's say:
char dest[5];
char src[7] = "Hello";
strncpy(dest, src, sizeof(dest)-1);
With this implementation (the key being sizeof(dest)-1 which always gives room for the '\0' to be appended), even if the destination has a truncated string, isn't it always safer to use strncpy() instead of strcpy()? In other words, I'm trying to understand when would one want to use strcpy() over strncpy()?
It is not better. Contrary to common belief strncpy was not created to replace strcpy or to be a safe version of it. It fails to null terminate the string when the source string is larger than the num parameter, so you have to do it manually. Like this:
strncpy(dst, src, num);
dst[num - 1] = 0;
Which requires 2 lines to achieve 1 simple goal and it is easy for developers to forget the second one.
Besides that, it always writes num characters (appends 0's if necessary) regardless of the sizes of the strings which can be a performance concern when dst is small and num is large.
Using strncat fixes the second problem of writing extra characters but still forces you to write 2 lines:
dst[0];
strncat(dst, src, num);
The function strlcpy fixes both of these problems but it is not a C standard library function so do not use it.
The right choice is snprintf which is guaranteed to null terminate the string and only writes the necessary amount of characters, even if it means truncating the source string.
snprintf(dst, size, "%s", src);
In C, strncpy has the advantage over strcpy that it does not overflow the buffer when the source is larger than the destination.
strncpy has the disadvantage that, when the source is larger than the destination, it does not terminate the buffer with a \0. So every call of strncpy must be followed with a statement terminating the buffer, such as
strncpy(dst, src, n);
dst[n-1]='\0';
User Eugene Sh points out that for defensive programming strlcpy must be used: "strlcpy() takes the full size of the destination buffer and
guarantees NUL-termination."
In C++, use std::string rather than c-style strings unless performance in a particular area is critical, in which case performance test to see if it is more than a negligible difference. You can use std::string::reserve to minimize allocations when you know the size up front.
In my opinion, 85% of bugs are caused by c-style strings. std::string is much safer and is less error prone.
In C, you won't have std::string available.
If you have to use c-style strings, then I'd prefer strncpy, because at least it asks for the size and makes you think about the size, rather than just overflowing.
Recently I was programming in my Code Blocks and I did a little program only for hobby in C.
char littleString[1];
fflush( stdin );
scanf( "%s", littleString );
printf( "\n%s", littleString);
If I created a string of one character, why does the CodeBlocks allow me to save 13 characters?
C have no bounds-checking, writing out of bounds of arrays or dynamically allocated memory can't be checked by the compiler. Instead it will lead to undefined behavior.
To prevent buffer overflow with scanf you can tell it to only read a specific number of characters, and nothing more. So to tell it to read only one character you use the format "%1s".
As a small side-note: Remember that strings in C have an extra character in them, the terminator (character '\0'). So if you have a string that should contain one character, the size actually needs to be two characters.
LittleString is not a string. It is a char array of length one. In order for a char array to be a string, it must be null terminated with an \0. You are writing past the memory you have allotted for littleString. This is undefined behavior.Scanf just reads user input from the console and assigns it to the variable specified, in this case littleString. If you would like to control the length of user input which is assigned to the variable, I would suggest using scanf_s. Please note that scanf_s is not a C99 standard
Many functions in C is implemented without any checks for correctness of use. In other words, it is the callers responsibility that the arguments fulfill some rules set by the function.
Example: For strcpy the Linux man page says
The strcpy() function copies the string pointed to by src,
including the terminating null byte ('\0'), to the buffer
pointed to by dest. The strings may not overlap, and the
destination string dest must be large enough to receive the copy.
If you as a caller break that contract by passing a too small buffer, you'll have undefined behavior and anything can happen.
The program may crash or even do exactly what you expected in 99 out of 100 times and do something strange in 1 out of 100 times.
In the various cases that a buffer is provided to the standard library's many string functions, is it guaranteed that the buffer will not be modified beyond the null terminator? For example:
char buffer[17] = "abcdefghijklmnop";
sscanf("123", "%16s", buffer);
Is buffer now required to equal "123\0efghijklmnop"?
Another example:
char buffer[10];
fgets(buffer, 10, fp);
If the read line is only 3 characters long, can one be certain that the 6th character is the same as before fgets was called?
The C99 draft standard does not explicitly state what should happen in those cases, but by considering multiple variations, you can show that it must work a certain way so that it meets the specification in all cases.
The standard says:
%s - Matches a sequence of non-white-space characters.252)
If no l length modifier is present, the corresponding argument shall be a
pointer to the initial element of a character array large enough to accept the
sequence and a terminating null character, which will be added automatically.
Here's a pair of examples that show it must work the way you are proposing to meet the standard.
Example A:
char buffer[4] = "abcd";
char buffer2[10]; // Note the this could be placed at what would be buffer+4
sscanf("123 4", "%s %s", buffer, buffer2);
// Result is buffer = "123\0"
// buffer2 = "4\0"
Example B:
char buffer[17] = "abcdefghijklmnop";
char* buffer2 = &buffer[4];
sscanf("123 4", "%s %s", buffer, buffer2);
// Result is buffer = "123\04\0"
Note that the interface of sscanf doesn't provide enough information to really know that these were different. So, if Example B is to work properly, it must not mess with the bytes after the null character in Example A. This is because it must work in both cases according to this bit of spec.
So implicitly it must work as you stated due to the spec.
Similar arguments can be placed for other functions, but I think you can see the idea from this example.
NOTE:
Providing size limits in the format, such as "%16s", could change the behavior. By the specification, it would be functionally acceptable for sscanf to zero out a buffer to its limits before writing the data into the buffer. In practice, most implementations opt for performance, which means they leave the remainder alone.
When the intent of the specification is to do this sort of zeroing out, it is usually explicitly specified. strncpy is an example. If the length of the string is less than the maximum buffer length specified, it will fill the rest of the space with null characters. The fact that this same "string" function could return a non-terminated string as well makes this one of the most common functions for people to roll their own version.
As far as fgets, a similar situation could arise. The only gotcha is that the specification explicitly states that if nothing is read in, the buffer remains untouched. An acceptable functional implementation could sidestep this by checking to see if there is at least one byte to read before zeroing out the buffer.
Each individual byte in the buffer is an object. Unless some part of the function description of sscanf or fgets mentions modifying those bytes, or even implies their values may change e.g. by stating their values become unspecified, then the general rule applies: (emphasis mine)
6.2.4 Storage durations of objects
2 [...] An object exists, has a constant address, and retains its last-stored value throughout its lifetime. [...]
It's this same principle that guarantees that
#include <stdio.h>
int a = 1;
int main() {
printf ("%d\n", a);
printf ("%d\n", a);
}
attempts to print 1 twice. Even though a is global, printf can access global variables, and the description of printf doesn't mention not modifying a.
Neither the description of fgets nor that of sscanf mentions modifying buffers past the bytes that actually were supposed to be written (except in the case of a read error), so those bytes don't get modified.
The standard is somewhat ambiguous on this, but I think a reasonable reading of it is that the answer is: yes, it's not allowed to write more bytes to the buffer than it read+null. On the other hand, a stricter reading/interpretation of the text could conclude that the answer is no, there's no guarantee. Here's what a publicly avaialble draft says about fgets.
char *fgets(char * restrict s, int n, FILE * restrict stream);
The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
The fgets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
There's a guarantee about how much it is supposed to read from the input, i.e. stop reading at newline or EOF and not read more than n-1 bytes. Although nothing is said explicitly about how much it's allowed to write to the buffer, the common knowledge is that fgets's n parameter is used to prevent buffer overflows. It's a little strange that the standard uses the ambiguous term read, which may not necessarily imply that gets can't write to the buffer more than n bytes, if you want to nitpick on the terminology it uses. But note that the same "read" terminology is used about both issues: the n-limit and the EOF/newline limit. So if you interpret the n-related "read" as a buffer-write limit, then [for consistency] you can/should interpret the other "read" the same way, i.e. not write more than what it read when string is shorter than the buffer.
On the other hand, if you distinguish between the uses of the phrase-verb "read into" (="write") and just "read", then you can't read the committee's text the same way. You are guaranteed that it won't "read into" (="write to") the array more than n bytes, but if the input string is terminated sooner by newline or EOF you're only guaranteed the rest (of the input) won't be "read", but whether that implies in won't be "read into" (="written to") the buffer is unclear under this stricter reading. The crucial issue is keyword is "into", which is elided, so the problem is whether the completion given by me in brackets in the following modified quote is the intended interpretation:
No additional characters are read [into the array] after a new-line character (which is retained) or after end-of-file.
Frankly a single postcondition stated as a formula (and would be pretty short in this case) would have been a lot more helpful than the verbiage I quoted...
I can't be bothered to try and analyze their writeup about the *scanf family, because I suspect it's going to be even more complicated given all the other things that happen in those functions; their writeup for fscanf is about five pages long... But I suspect a similar logic applies.
is it guaranteed that the buffer will not be modified beyond the null
terminator?
No, there's no guarantee.
Is buffer now required to equal "123\0efghijklmnop"?
Yes. But that's only because you've used correct parameters to your string related functions. Should you mess up buffer length, input modifiers to sscanf and such, then you program will compile. But it will most likely fail during runtime.
If the read line is only 3 characters long, can one be certain that the 6th character is the same as before fgets was called?
Yes. Once fgets() figures you have a 3 character input string it stores the input in the provided buffer, and it doesn't care about the reset of provided space at all.
Is buffer now required to equal "123\0efghijklmnop"?
Here buffer is just consists of 123 string guaranteed terminating at NUL.
Yes the memory allocated for array buffer will not get de-allocated, however you are making sure/restricting your string buffer can atmost only have 16 char elements which you can read into it at any point of time. Now depends whether you write just a single char or maximum what buffer can take.
For example:
char buffer[4096] = "abc";`
actually does something below,
memcpy(buffer, "abc", sizeof("abc"));
memset(&buffer[sizeof("abc")], 0, sizeof(buffer)-sizeof("abc"));
The standard insists that if any part of char array is initialized that is all it consists of at any moment until obeying its memory boundary.
There are no any guarantees from standard, which is why the functions sscanf and fgets are recommended to be used (with respect to the size of the buffer) as you show in your question (and using of fgets is considered preferable compared with gets).
However, some standard functions use null-terminator in their work, e.g. strlen (but I suppose you ask about string modification)
EDIT:
In your example
fgets(buffer, 10, fp);
untouching characters after 10-th is guaranteed (content and length of buffer will not be considered by fgets)
EDIT2:
Moreover, when using fgets keep in mind that '\n' will be stored in the buffers. e.g.
"123\n\0fghijklmnop"
instead of expected
"123\0efghijklmnop"
Depends on the function in use (and to a lesser degree its implementation). sscanf will start writing when it encounters its first non-whitespace character, and continue writing until its first whitespace character, where it will add a finishing 0 and return. But a function like strncpy (famously) zeroes out the rest of the buffer.
There is however nothing in the C standard which mandates how these functions behave.
A comment on one of my answers has left me a little puzzled. When trying to compute how much memory is needed to concat two strings to a new block of memory, it was said that using snprintf was preferred over strlen, as shown below:
size_t length = snprintf(0, 0, "%s%s", str1, str2);
// preferred over:
size_t length = strlen(str1) + strlen(str2);
Can I get some reasoning behind this? What is the advantage, if any, and would one ever see one result differ from the other?
I was the one who said it, and I left out the +1 in my comment which was written quickly and carelessly, so let me explain. My point was merely that you should use the pattern of using the same method to compute the length that will eventually be used to fill the string, rather than using two different methods that could potentially differ in subtle ways.
For example, if you had three strings rather than two, and two or more of them overlapped, it would be possible that strlen(str1)+strlen(str2)+strlen(str3)+1 exceeds SIZE_MAX and wraps past zero, resulting in under-allocation and truncation of the output (if snprintf is used) or extremely dangerous memory corruption (if strcpy and strcat are used).
snprintf will return -1 with errno=EOVERFLOW when the resulting string would be longer than INT_MAX, so you're protected. You do need to check the return value before using it though, and add one for the null terminator.
If you only need to determine how big would be the concatenation of the two strings, I don't see any particular reason to prefer snprintf, since the minimum operations to determine the total length of the two strings is what the two strlen calls do. snprintf will almost surely be slower, because it has to check the parameters and parse the format string besides just walking the two strings counting the characters.
... but... it may be an intelligent move to use snprintf if you are in a scenario where you want to concatenate two strings, and have a static, not too big buffer to handle normal cases, but you can fallback to a dynamically allocated buffer in case of big strings, e.g.:
/* static buffer "big enough" for most cases */
char buffer[256];
/* pointer used in the part where work on the string is actually done */
char * outputStr=buffer;
/* try to concatenate, get the length of the resulting string */
int length = snprintf(buffer, sizeof(buffer), "%s%s", str1, str2);
if(length<0)
{
/* error, panic and death */
}
else if(length>sizeof(buffer)-1)
{
/* buffer wasn't enough, allocate dynamically */
outputStr=malloc(length+1);
if(outputStr==NULL)
{
/* allocation error, death and panic */
}
if(snprintf(outputStr, length, "%s%s", str1, str2)<0)
{
/* error, the world is doomed */
}
}
/* here do whatever you want with outputStr */
if(outputStr!=buffer)
free(outputStr);
One advantage would be that the input strings are only scanned once (inside the snprintf()) instead of twice for the strlen/strcpy solution.
Actually, on rereading this question and the comment on your previous answer, I don't see what the point is in using sprintf() just to calculate the concatenated string length. If you're actually doing the concatenation, my above paragraph applies.
You need to add 1 to the strlen() example. Remember you need to allocate space for nul terminating byte.
So snprintf( ) gives me the size a string would have been. That means I can malloc( ) space for that guy. Hugely useful.
I wanted (but did not find until now) this function of snprintf( ) because I format tons of strings for output later; but I wanted not to have to assign static bufs for the outputs because it's hard to predict how long the outputs will be. So I ended up with a lot of 4096-long char arrays :-(
But now -- using this newly-discovered (to me) snprintf( ) char-counting function, I can malloc( ) output bufs AND sleep at night, both.
Thanks again and apologies to the OP and to Matteo.
EDIT: random, mistaken nonsense removed. Did I say that?
EDIT: Matteo in his comment below is absolutely right and I was absolutely wrong.
From C99:
2 The snprintf function is equivalent to fprintf, except that the output is written into
an array (specified by argument s) rather than to a stream. If n is zero, nothing is written,
and s may be a null pointer. Otherwise, output characters beyond the n-1st are
discarded rather than being written to the array, and a null character is written at the end
of the characters actually written into the array. If copying takes place between objects
that overlap, the behavior is undefined.
Returns
3 The snprintf function returns the number of characters that would have been written
had n been sufficiently large, not counting the terminating null character, or a neg ative
value if an encoding error occurred. Thus, the null-terminated output has been
completely written if and only if the returned value is nonnegative and less than n.
Thank you, Matteo, and I apologize to the OP.
This is great news because it gives a positive answer to a question I'd asked here only a three weeks ago. I can't explain why I didn't read all of the answers, which gave me what I wanted. Awesome!
The "advantage" that I can see here is that strlen(NULL) might cause a segmentation fault, while (at least glibc's) snprintf() handles NULL parameters without failing.
Hence, with glibc-snprintf() you don't need to check whether one of the strings is NULL, although length might be slightly larger than needed, because (at least on my system) printf("%s", NULL); prints "(null)" instead of nothing.
I wouldn't recommend using snprintf() instead of strlen() though. It's just not obvious. A much better solution is a wrapper for strlen() which returns 0 when the argument is NULL:
size_t my_strlen(const char *str)
{
return str ? strlen(str) : 0;
}
Edit: I've added the source for the example.
I came across this example:
char source[MAX] = "123456789";
char source1[MAX] = "123456789";
char destination[MAX] = "abcdefg";
char destination1[MAX] = "abcdefg";
char *return_string;
int index = 5;
/* This is how strcpy works */
printf("destination is originally = '%s'\n", destination);
return_string = strcpy(destination, source);
printf("after strcpy, dest becomes '%s'\n\n", destination);
/* This is how strncpy works */
printf( "destination1 is originally = '%s'\n", destination1 );
return_string = strncpy( destination1, source1, index );
printf( "After strncpy, destination1 becomes '%s'\n", destination1 );
Which produced this output:
destination is originally = 'abcdefg'
After strcpy, destination becomes '123456789'
destination1 is originally = 'abcdefg'
After strncpy, destination1 becomes '12345fg'
Which makes me wonder why anyone would want this effect. It looks like it would be confusing. This program makes me think you could basically copy over someone's name (eg. Tom Brokaw) with Tom Bro763.
What are the advantages of using strncpy() over strcpy()?
The strncpy() function was designed with a very particular problem in mind: manipulating strings stored in the manner of original UNIX directory entries. These used a short fixed-sized array (14 bytes), and a nul-terminator was only used if the filename was shorter than the array.
That's what's behind the two oddities of strncpy():
It doesn't put a nul-terminator on the destination if it is completely filled; and
It always completely fills the destination, with nuls if necessary.
For a "safer strcpy()", you are better off using strncat() like so:
if (dest_size > 0)
{
dest[0] = '\0';
strncat(dest, source, dest_size - 1);
}
That will always nul-terminate the result, and won't copy more than necessary.
strncpy combats buffer overflow by requiring you to put a length in it. strcpy depends on a trailing \0, which may not always occur.
Secondly, why you chose to only copy 5 characters on 7 character string is beyond me, but it's producing expected behavior. It's only copying over the first n characters, where n is the third argument.
The n functions are all used as defensive coding against buffer overflows. Please use them in lieu of older functions, such as strcpy.
While I know the intent behind strncpy, it is not really a good function. Avoid both. Raymond Chen explains.
Personally, my conclusion is simply to avoid strncpy and all its friends if you are dealing with null-terminated strings. Despite the "str" in the name, these functions do not produce null-terminated strings. They convert a null-terminated string into a raw character buffer. Using them where a null-terminated string is expected as the second buffer is plain wrong. Not only do you fail to get proper null termination if the source is too long, but if the source is short you get unnecessary null padding.
See also Why is strncpy insecure?
strncpy is NOT safer than strcpy, it just trades one type of bugs with another. In C, when handling C strings, you need to know the size of your buffers, there is no way around it. strncpy was justified for the directory thing mentioned by others, but otherwise, you should never use it:
if you know the length of your string and buffer, why using strncpy ? It is a waste of computing power at best (adding useless 0)
if you don't know the lengths, then you risk silently truncating your strings, which is not much better than a buffer overflow
What you're looking for is the function strlcpy() which does terminate always the string with 0 and initializes the buffer. It also is able to detect overflows. Only problem, it's not (really) portable and is present only on some systems (BSD, Solaris). The problem with this function is that it opens another can of worms as can be seen by the discussions on
http://en.wikipedia.org/wiki/Strlcpy
My personal opinion is that it is vastly more useful than strncpy() and strcpy(). It has better performance and is a good companion to snprintf(). For platforms which do not have it, it is relatively easy to implement.
(for the developement phase of a application I substitute these two function (snprintf() and strlcpy()) with a trapping version which aborts brutally the program on buffer overflows or truncations. This allows to catch quickly the worst offenders. Especially if you work on a codebase from someone else.
EDIT: strlcpy() can be implemented easily:
size_t strlcpy(char *dst, const char *src, size_t dstsize)
{
size_t len = strlen(src);
if(dstsize) {
size_t bl = (len < dstsize-1 ? len : dstsize-1);
((char*)memcpy(dst, src, bl))[bl] = 0;
}
return len;
}
The strncpy() function is the safer one: you have to pass the maximum length the destination buffer can accept. Otherwise it could happen that the source string is not correctly 0 terminated, in which case the strcpy() function could write more characters to destination, corrupting anything which is in the memory after the destination buffer. This is the buffer-overrun problem used in many exploits
Also for POSIX API functions like read() which does not put the terminating 0 in the buffer, but returns the number of bytes read, you will either manually put the 0, or copy it using strncpy().
In your example code, index is actually not an index, but a count - it tells how many characters at most to copy from source to destination. If there is no null byte among the first n bytes of source, the string placed in destination will not be null terminated
strncpy fills the destination up with '\0' for the size of source, eventhough the size of the destination is smaller....
manpage:
If the length of src is less than n, strncpy() pads the remainder of
dest with null bytes.
and not only the remainder...also after this until n characters is
reached. And thus you get an overflow... (see the man page
implementation)
This may be used in many other scenarios, where you need to copy only a portion of your original string to the destination. Using strncpy() you can copy a limited portion of the original string as opposed by strcpy(). I see the code you have put up comes from publib.boulder.ibm.com.
That depends on our requirement.
For windows users
We use strncpy whenever we don't want to copy entire string or we want to copy only n number of characters. But strcpy copies the entire string including terminating null character.
These links will help you more to know about strcpy and strncpy
and where we can use.
about strcpy
about strncpy
the strncpy is a safer version of strcpy as a matter of fact you should never use strcpy because its potential buffer overflow vulnerability which makes you system vulnerable to all sort of attacks