Below is some code which I ran through a static analyzer. It came back saying there is a stack overflow vulnerability in the function that uses strtok below, as described here:
https://cwe.mitre.org/data/definitions/121.html
If you trace the execution, the variables used by strtok ultimately derive their data from the user_input variable in somefunction coming in from the wild. But I figured I prevented problems by first checking the length of user_input as well as by explicitly using strncpy with a bound any time I copied pieces of user_input.
somefunction(user_input) {
if (strlen(user_input) != 23) {
if (user_input != NULL)
free(user_input);
exit(1);
}
Mystruct* mystruct = malloc(sizeof(Mystruct));
mystruct->foo = malloc(3 * sizeof(char));
memset(mystruct->foo, '\0', 3);
strncpy(mystruct->foo,&(user_input[0]),2);
mystruct->bar = malloc(19 * sizeof(char));
memset(mystruct->bar, '\0', 19);
/* Remove spaces from user's input. strtok is not guaranteed to
* not modify the source string so we copy it first.
*/
char *input = malloc(22 * sizeof(char));
strncpy(input,&(user_input[2]),21);
remove_spaces(input,mystruct->bar);
}
void remove_spaces(char *input, char *output) {
const char space[2] = " ";
char *token;
token = strtok(input, space);
while( token != NULL ) {
// the error is indicated on this line
strncat(output, token, strlen(token));
token = strtok(NULL, space);
}
}
I presumed that I didn't have to malloc token per this comment, and elsewhere. Is there something else I'm missing?
strncpy does not increase the safety of your code; indeed, it may well make the code less safe by introducing the possibility of an unterminated output string. But the issue being flagged by the static analyser involves neither with strncpy nor strtok; it's with strncat.
Although they are frequently touted as increasing code safety, that was never the purpose of strncpy, strncat nor strncmp. The strn* alternatives to str* functions are intended for use in a context in which string data is not null-terminated. Such a context exists, although it is rare in student code: fixed-length string fields in fixed-size database records. If a field in a database record always contains 20 characters (CHAR(20) in SQL terms), there's no need to force a trailing 0-byte, which could have been used to allow 21-character names (or whatever the field is). It's a waste of space, and the only reason that those unnecessary bytes might be examined by the database code is to check database integrity. (Not that the extra byte really helps maintain integrity, either. But it must be checked for correctness.)
If you were writing code which used or created fixed-length unterminated string fields, you would certainly need a set of string functions which accept a length argument. But the string library already had those functions: memcpy and memcmp. The strn versions were added to ease the interface when both null-terminated and fixed-length strings are being used in the same application; for example, if a null-terminated string is read from user input and needs to be copied into a fixed-length database field. In that context, the interface of strncpy makes sense: the database field must be completed cleared of old data, but the input string might be too short to guarantee that. So you can't use strcpy even if you check that it won't overflow (because it doesn't necessarily erase old data) and you can't use memcpy (because the bytes following the end of the input string are indeterminated). Hence an interface like strncpy, which guarantees that the destination will be filled but doesn't guarantee that it will be null-terminated.
strncmp and strnlen do have some applications which don't necessarily have to do with fixed-length string records, but they are not safety-related either. strncmp is handy if you want to know whether a given string is a prefix of another string (although a startswith function would have more directly addressed this use case) and strnlen lets you answer the question "Are there at least four characters in this string?" without having to worry about how many cycles would be wasted if the string continued for another four million characters. But that doesn't justify using them in other, more normal, contexts.
OK, that was a bit of a detour. Let's get back to strncat, whose prototype is
char *strncat(char *dest, const char *src, size_t n);
where n is the maximum number of characters to copy. As the man page notes, you (and not the standard library) are responsible for ensuring that the destination has n+1 bytes available for the copy. The library function cannot take responsibility, because it cannot know how much space is available, and it hasn't asked you to specify that.
In my opinion, that makes strncat completely useless. In order to know how much space is available in the destination, you need to know where the concatenation's copy will start. But if you knew that, why on earth would you ask the standard library to scan over the destination looking for the concatenation point? In any case, you are not verifying how much space is available; you simply call:
strncat(output, token, strlen(token));
That does exactly the same thing as strcat(output, token) except that it scans token twice (once to count the bytes and a second time to copy them) and during the copy it does a redundant check to ensure that the count has not been exceeded while copying.
A "safe" version of strncat would require you to specify the length of the destination, but since there is no such function in the standard C library and also no consensus as to what the prototype for such a function would be, you need to guarantee safety yourself by tracking the amount of space used in output by each concatenation. As an extra benefit, if you do that, you can then make the computational complexity of a sequence of concatenations linear in the number of bytes copied, which one might intuitively expect, as opposed to quadratic, as implemented by strcat and strncat.
So a safe and efficient procedure might look like this:
void remove_spaces(char *output, size_t outmax,
char *input) {
if (outmax = 0) return;
char *token = strtok(input, " ");
char *outlimit = output + outmax;
while( token ) {
size_t tokelen = strlen(token);
if (tokelen >= outlimit - output)
tokelen = outlimit - output - 1;
memcpy(output, token, tokelen);
output += tokelen;
token = strtok(NULL, " ");
}
*output = 0;
}
The CWE warning does not mention strtok at all, so the question in the title itself is a red herring. strtok is one of the few parts of your code which is not problematic, although (as you note) it does force an otherwise unnecessary copy of the input string, in case that string is in read-only memory. (As noted above, strncpy does not guarantee that the copy is null-terminated, so it is not safe here. strdup, which needs to be paired with free, is the safest way to copy a string. Fortunately, it will still be part of the C standard instead of just being available almost everywhere.)
That might be a good enough reason to avoid strtok. If so, it's easy to get rid of:
void remove_spaces(char *output, size_t outmax,
/* This version doesn't modify input */
const char *input) {
if (outmax = 0) return;
char *token = strtok(input, " ");
char *outlimit = output + outmax;
while ( *(input += strspn(input, " ")) ) {
size_t cpylen = (tokelen < outlimit - outptr)
? tokelen
: outlimit - outptr - 1;
memcpy(output, input, cpylen);
output += cpylen;
input += tokelen;
}
*output = 0;
}
A better interface would manage to indicate whether the output was truncated, and perhaps give an indication of how many bytes were necessary to accommodate the operation. See snprintf for an example.
Related
This might be somewhat pointless, but I'm curious what you guys think about it. I'm iterating over a string with pointers and want to pull a short substring out of it (placing the substring into a pre-allocated temporary array). Are there any reasons to use assignment over strncopy, or vice-versa? I.e.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main()
{ char orig[] = "Hello. I am looking for Molly.";
/* Strings to store the copies
* Pretend that strings had some prior value, ensure null-termination */
char cpy1[4] = "huh\0";
char cpy2[4] = "huh\0";
/* Pointer to simulate iteration over a string */
char *startptr = orig + 2;
int length = 3;
int i;
/* Using strncopy */
strncpy(cpy1, startptr, length);
/* Using assignment operator */
for (i = 0; i < length; i++)
{ cpy2[i] = *(startptr + i);
}
/* Display Results */
printf("strncpy result:\n");
printf("%s\n\n", cpy1);
printf("loop result:\n");
printf("%s\n", cpy2);
}
It seems to me that strncopy is both less typing and more easily readable, but I've seen people advocate looping instead. Is there a difference? Does it even matter? Assume that this is for small values of i (0 < i < 5), and null-termination is assured.
Refs: Strings in c, how to get subString, How to get substring in C, Difference between strncpy and memcpy?
strncpy(char * dst, char *src, size_t len) has two peculiar properties:
if (strlen(src) >= len) : the resulting string will not be nul-terminated.
if (strlen(src) < len) : the end of the string will be filled/padded with '\0'.
The first property will force you to actually check if (strlen(src) >= len) and act appropiately. (or brutally set the final character to nul with dst[len-1] = '\0';, like #Gilles does above) The other property is not particular dangerous, but can spill a lot of cycles. Imagine:
char buff[10000];
strncpy(buff, "Hello!", sizeof buff);
which touches 10000 bytes, where only 7 need to be touched.
My advice:
A: if you know the sizes, just do memcpy(dst,src,len); dst[len] = 0;
B: if you don't know the sizes, get them somehow (using strlen and/or sizeof and/or the allocated size for dynamically allocced memory). Then: goto A above.
Since for safe operation the strncpy() version already needs to know the sizes, (and the checks on them!), the memcpy() version is not more complex or more dangerous than the strncpy() version. (technically it is even marginally faster; because memcpy() does not have to check for the '\0' byte)
While this may seem counter-intuitive, there are more optimized ways to copy a string than by using the assignment operator in a loop. For instance, IA-32 provides the REP prefix for MOVS, STOS, CMPS etc for string handling, and these can be much faster than a loop that copies one char at a time. The implementation of strncpy or strcpy may choose to use such hardware-optimized code to achieve better performance.
As long as you know your lengths are "in range" and everything is correctly nul terminated, then strncpy is better.
If you need to get length checks etc in there, looping could be more convenient.
A loop with assignment is a bad idea because you're reinventing the wheel. You might make a mistake, and your code is likely to be less efficient than the code in the standard library (some processors have optimized instructions for memory copies, and optimized implementations usually at least copy word by word if possible).
However, note that strncpy is not a well-rounded wheel. In particular, if the string is too long, it does not append a null byte to the destination. The BSD function strlcpy is better designed, but not available everywhere. Even strlcpy is not a panacea: you need to get the buffer size right, and be aware that it might truncate the string.
A portable way to copy a string, with truncation if the string is too long, is to call strncpy and always add the terminating null byte. If the buffer is an array:
char buffer[BUFFER_SIZE];
strncpy(buffer, source, sizeof(buffer)-1);
buf[sizeof(buffer)-1] = 0;
If the buffer is given by a pointer and size:
strncpy(buf, source, buffer_size-1);
buf[buffer_size-1] = 0;
I have this snippet of the code:
char* receiveInput(){
char *s;
scanf("%s",s);
return s;
}
int main()
{
char *str = receiveInput();
int length = strlen(str);
printf("Your string is %s, length is %d\n", str, length);
return 0;
}
I receive this output:
Your string is hellàÿ", length is 11
my input was:
helloworld!
can somebody explain why, and why this style of the coding is bad, thanks in advance
Several questions have addressed what you've done wrong and how to fix it, but you also said (emphasis mine):
can somebody explain why, and why this style of the coding is bad
I think scanf is a terrible way to read input. It's inconsistent with printf, makes it easy to forget to check for errors, makes it hard to recover from errors, and is incompatable with ordinary (and easier to do correctly) read operations (like fgets and company).
First, note that the "%s" format will read only until it sees whitespace. Why whitespace? Why does "%s" print out an entire string, but reads in strings in such a limited capacity?
If you'd like to read in an entire line, as you may often be wont to do, scanf provides... with "%[^\n]". What? What is that? When did this become Perl?
But the real problem is that neither of those are safe. They both freely overflow with no bounds checking. Want bounds checking? Okay, you got it: "%10s" (and "%10[^\n]" is starting to look even worse). That will only read 9 characters, and add a terminating nul-character automatically. So that's good... for when our array size never needs to change.
What if we want to pass the size of our array as an argument to scanf? printf can do this:
char string[] = "Hello, world!";
printf("%.*s\n", sizeof string, string); // prints whole message;
printf("%.*s\n", 6, string); // prints just "Hello,"
Want to do the same thing with scanf? Here's how:
static char tmp[/*bit twiddling to get the log10 of SIZE_MAX plus a few*/];
// if we did the math right we shouldn't need to use snprintf
snprintf(tmp, sizeof tmp, "%%%us", bufsize);
scanf(tmp, buffer);
That's right - scanf doesn't support the "%.*s" variable precision printf does, so to do dynamic bounds checking with scanf we have to construct our own format string in a temporary buffer. This is all kinds of bad, and even though it's actually safe here it will look like a really bad idea to anyone just dropping in.
Meanwhile, let's look at another world. Let's look at the world of fgets. Here's how we read in a line of data with fgets:
fgets(buffer, bufsize, stdin);
Infinitely less headache, no wasted processor time converting an integer precision into a string that will only be reparsed by the library back into an integer, and all the relevant elements are sitting there on one line for us to see how they work together.
Granted, this may not read an entire line. It will only read an entire line if the line is shorter than bufsize - 1 characters. Here's how we can read an entire line:
char *readline(FILE *file)
{
size_t size = 80; // start off small
size_t curr = 0;
char *buffer = malloc(size);
while(fgets(buffer + curr, size - curr, file))
{
if(strchr(buffer + curr, '\n')) return buffer; // success
curr = size - 1;
size *= 2;
char *tmp = realloc(buffer, size);
if(tmp == NULL) /* handle error */;
buffer = tmp;
}
/* handle error */;
}
The curr variable is an optimization to prevent us from rechecking data we've already read, and is unnecessary (although useful as we read more data). We could even use the return value of strchr to strip off the ending "\n" character if you preferred.
Notice also that size_t size = 80; as a starting place is completely arbitrary. We could use 81, or 79, or 100, or add it as a user-supplied argument to the function. We could even add an int (*inc)(int) argument, and change size *= 2; to size = inc(size);, allowing the user to control how fast the array grows. These can be useful for efficiency, when reallocations get costly and boatloads of lines of data need to be read and processed.
We could write the same with scanf, but think of how many times we'd have to rewrite the format string. We could limit it to a constant increment, instead of the doubling (easily) implemented above, and never have to adjust the format string; we could give in and just store the number, do the math with as above, and use snprintf to convert it to a format string every time we reallocate so that scanf can convert it back to the same number; we could limit our growth and starting position in such a way that we can manually adjust the format string (say, just increment the digits), but this could get hairy after a while and may require recursion (!) to work cleanly.
Furthermore, it's hard to mix reading with scanf with reading with other functions. Why? Say you want to read an integer from a line, then read a string from the next line. You try this:
int i;
char buf[BUSIZE];
scanf("%i", &i);
fgets(buf, BUFSIZE, stdin);
That will read the "2" but then fgets will read an empty line because scanf didn't read the newline! Okay, take two:
...
scanf("%i\n", &i);
...
You think this eats up the newline, and it does - but it also eats up leading whitespace on the next line, because scanf can't tell the difference between newlines and other forms of whitespace. (Also, turns out you're writing a Python parser, and leading whitespace in lines is important.) To make this work, you have to call getchar or something to read in the newline and throw it away it:
...
scanf("%i", &i);
getchar();
...
Isn't that silly? What happens if you use scanf in a function, but don't call getchar because you don't know whether the next read is going to be scanf or something saner (or whether or not the next character is even going to be a newline)? Suddenly the best way to handle the situation seems to be to pick one or the other: do we use scanf exclusively and never have access to fgets-style full-control input, or do we use fgets exclusively and make it harder to perform complex parsing?
Actually, the answer is we don't. We use fgets (or non-scanf functions) exclusively, and when we need scanf-like functionality, we just call sscanf on the strings! We don't need to have scanf mucking up our filestreams unnecessarily! We can have all the precise control over our input we want and still get all the functionality of scanf formatting. And even if we couldn't, many scanf format options have near-direct corresponding functions in the standard library, like the infinitely more flexible strtol and strtod functions (and friends). Plus, i = strtoumax(str, NULL) for C99 sized integer types is a lot cleaner looking than scanf("%" SCNuMAX, &i);, and a lot safer (we can use that strtoumax line unchanged for smaller types and let the implicit conversion handle the extra bits, but with scanf we have to make a temporary uintmax_t to read into).
The moral of this story: avoid scanf. If you need the formatting it provides, and don't want to (or can't) do it (more efficiently) yourself, use fgets / sscanf.
scanf doesn't allocate memory for you.
You need to allocate memory for the variable passed to scanf.
You could do like this:
char* receiveInput(){
char *s = (char*) malloc( 100 );
scanf("%s",s);
return s;
}
But warning:
the function that calls receiveInput will take the ownership of the returned memory: you'll have to free(str) after you print it in main. (Giving the ownership away in this way is usually not considered a good practice).
An easy fix is getting the allocated memory as a parameter.
if the input string is longer than 99 (in my case) your program will suffer of buffer overflow (which is what it's already happening).
An easy fix is to pass to scanf the length of your buffer:
scanf("%99s",s);
A fixed code could be like this:
// s must be of at least 100 chars!!!
char* receiveInput( char *s ){
scanf("%99s",s);
return s;
}
int main()
{
char str[100];
receiveInput( str );
int length = strlen(str);
printf("Your string is %s, length is %d\n", str, length);
return 0;
}
You have to first allocate memory to your s object in your receiveInput() method. Such as:
s = (char *)calloc(50, sizeof(char));
For doing string concatenation, I've been doing basic strcpy, strncpy of char* buffers. Then I learned about the snprintf and friends.
Should I stick with my strcpy, strcpy + \0 termination? Or should I just use snprintf in the future?
For most purposes I doubt the difference between using strncpy and snprintf is measurable.
If there's any formatting involved I tend to stick to only snprintf rather than mixing in strncpy as well.
I find this helps code clarity, and means you can use the following idiom to keep track of where you are in the buffer (thus avoiding creating a Shlemiel the Painter algorithm):
char sBuffer[iBufferSize];
char* pCursor = sBuffer;
pCursor += snprintf(pCursor, sizeof(sBuffer) - (pCursor - sBuffer), "some stuff\n");
for(int i = 0; i < 10; i++)
{
pCursor += snprintf(pCursor, sizeof(sBuffer) - (pCursor - sBuffer), " iter %d\n", i);
}
pCursor += snprintf(pCursor, sizeof(sBuffer) - (pCursor - sBuffer), "into a string\n");
snprintf is more robust if you want to format your string. If you only want to concatenate, use strncpy (don't use strcpy) since it's more efficient.
As others did point out already: Do not use strncpy.
strncpy will not zero terminate in case of truncation.
strncpy will zero-pad the whole buffer if string is shorter than buffer. If buffer is large, this may be a performance drain.
snprintf will (on POSIX platforms) zero-terminate. On Windows, there is only _snprintf, which will not zero-terminate, so take that into account.
Note: when using snprintf, use this form:
snprintf(buffer, sizeof(buffer), "%s", string);
instead of
snprintf(buffer, sizeof(buffer), string);
The latter is insecure and - if string depends on user input - can lead to stack smashes, etc.
sprintf has an extremely useful return value that allows for efficient appending.
Here's the idiom:
char buffer[HUGE] = {0};
char *end_of_string = &buffer[0];
end_of_string += sprintf( /* whatever */ );
end_of_string += sprintf( /* whatever */ );
end_of_string += sprintf( /* whatever */ );
You get the idea. This works because sprintf returns the number of characters it wrote to the buffer, so advancing your buffer by that many positions will leave you pointing to the '\0' at the end of what's been written so far. So when you hand the updated position to the next sprintf, it can start writing new characters right there.
Constrast with strcpy, whose return value is required to be useless. It hands you back the same argument you passed it. So appending with strcpy implies traversing the entire first string looking for the end of it. And then appending again with another strcpy call implies traversing the entire first string, followed by the 2nd string that now lives after it, looking for the '\0'. A third strcpy will re-traverse the strings that have already been written yet again. And so forth.
So for many small appends to a very large buffer, strcpy approches (O^n) where n is the number of appends. Which is terrible.
Plus, as others mentioned, they do different things. sprintf can be used to format numbers, pointer values, etc, into your buffer.
I think there is another difference between strncpy and snprintf.
Think about this:
const int N=1000000;
char arr[N];
strncpy(arr, "abce", N);
Usually, strncpy will set the rest of the destination buffer to '\0'. This will cost lots of CPU time. While when you call snprintf,
snprintf(a, N, "%s", "abce");
it will leave the buffer unchanged.
I don't know why strncpy will do that, but in this case, I will choose snprintf instead of strncpy.
All *printf functions check formatting and expand its corresponding argument, thus it is slower than a simple strcpy/strncpy, which only copy a given number of bytes from linear memory.
My rule of thumb is:
Use snprintf whenever formatting is needed.
Stick to strncpy/memcpy when only need to copy a block of linear memory.
You can use strcpy whenever you know exatcly the size of buffers you're copying. Don't use that if you don't have full control over the buffers size.
strcpy, strncpy, etc. only copies strings from one memory location to another. But, with snprint, you can do more stuff like formatting the string. Copying integers into buffer, etc.
It purely depends on your requirement which one to use. If as per your logic, strcpy & strncpy is already working for you, there is no need to jump to snprintf.
Also, remember to use strncpy for better safety as suggested by others.
The difference between strncpy and snprintf is that strncpy basically lays on you responsibility of terminating string with '\0'. It may terminate dst with '\0' but only if src is short enough.
Typical examples are:
strncpy(dst, src, n);
// if src is longer than n dst will not contain null
// terminated string at this point
dst[n - 1] = '\0';
snprintf(dst, n, "%s", src); // dst will 100% contain null terminated string
strncpy() supposedly protects from buffer overflows. But if it prevents an overflow without null terminating, in all likelihood a subsequent string operation is going to overflow. So to protect against this I find myself doing:
strncpy( dest, src, LEN );
dest[LEN - 1] = '\0';
man strncpy gives:
The strncpy() function is similar, except that not more than n bytes of src are copied. Thus, if there is no null byte among the first n bytes of src, the result will not be null-terminated.
Without null terminating something seemingly innocent like:
printf( "FOO: %s\n", dest );
...could crash.
Are there better, safer alternatives to strncpy()?
strncpy() is not intended to be used as a safer strcpy(), it is supposed to be used to insert one string in the middle of another.
All those "safe" string handling functions such as snprintf() and vsnprintf() are fixes that have been added in later standards to mitigate buffer overflow exploits etc.
Wikipedia mentions strncat() as an alternative to writing your own safe strncpy():
*dst = '\0';
strncat(dst, src, LEN);
EDIT
I missed that strncat() exceeds LEN characters when null terminating the string if it is longer or equal to LEN char's.
Anyway, the point of using strncat() instead of any homegrown solution such as memcpy(..., strlen(...))/whatever is that the implementation of strncat() might be target/platform optimized in the library.
Of course you need to check that dst holds at least the nullchar, so the correct use of strncat() would be something like:
if (LEN) {
*dst = '\0'; strncat(dst, src, LEN-1);
}
I also admit that strncpy() is not very useful for copying a substring into another string, if the src is shorter than n char's, the destination string will be truncated.
Originally, the 7th Edition UNIX file system (see DIR(5)) had directory entries that limited file names to 14 bytes; each entry in a directory consisted of 2 bytes for the inode number plus 14 bytes for the name, null padded to 14 characters, but not necessarily null-terminated. It's my belief that strncpy() was designed to work with those directory structures - or, at least, it works perfectly for that structure.
Consider:
A 14 character file name was not null terminated.
If the name was shorter than 14 bytes, it was null padded to full length (14 bytes).
This is exactly what would be achieved by:
strncpy(inode->d_name, filename, 14);
So, strncpy() was ideally fitted to its original niche application. It was only coincidentally about preventing overflows of null-terminated strings.
(Note that null padding up to the length 14 is not a serious overhead - if the length of the buffer is 4 KB and all you want is to safely copy 20 characters into it, then the extra 4075 nulls is serious overkill, and can easily lead to quadratic behaviour if you are repeatedly adding material to a long buffer.)
There are already open source implementations like strlcpy that do safe copying.
http://en.wikipedia.org/wiki/Strlcpy
In the references there are links to the sources.
Strncpy is safer against stack overflow attacks by the user of your program, it doesn't protect you against errors you the programmer do, such as printing a non-null-terminated string, the way you've described.
You can avoid crashing from the problem you've described by limiting the number of chars printed by printf:
char my_string[10];
//other code here
printf("%.9s",my_string); //limit the number of chars to be printed to 9
Some new alternatives are specified in ISO/IEC TR 24731 (Check https://buildsecurityin.us-cert.gov/daisy/bsi/articles/knowledge/coding/317-BSI.html for info). Most of these functions take an additional parameter that specifies the maximum length of the target variable, ensure that all strings are null-terminated, and have names that end in _s (for "safe" ?) to differentiate them from their earlier "unsafe" versions.1
Unfortunately, they're still gaining support and may not be available with your particular tool set. Later versions of Visual Studio will throw warnings if you use the old unsafe functions.
If your tools don't support the new functions, it should be fairly easy to create your own wrappers for the old functions. Here's an example:
errCode_t strncpy_safe(char *sDst, size_t lenDst,
const char *sSrc, size_t count)
{
// No NULLs allowed.
if (sDst == NULL || sSrc == NULL)
return ERR_INVALID_ARGUMENT;
// Validate buffer space.
if (count >= lenDst)
return ERR_BUFFER_OVERFLOW;
// Copy and always null-terminate
memcpy(sDst, sSrc, count);
*(sDst + count) = '\0';
return OK;
}
You can change the function to suit your needs, for example, to always copy as much of the string as possible without overflowing. In fact, the VC++ implementation can do this if you pass _TRUNCATE as the count.
1Of course, you still need to be accurate about the size of the target buffer: if you supply a 3-character buffer but tell strcpy_s() it has space for 25 chars, you're still in trouble.
Use strlcpy(), specified here: http://www.courtesan.com/todd/papers/strlcpy.html
If your libc doesn't have an implementation, then try this one:
size_t strlcpy(char* dst, const char* src, size_t bufsize)
{
size_t srclen =strlen(src);
size_t result =srclen; /* Result is always the length of the src string */
if(bufsize>0)
{
if(srclen>=bufsize)
srclen=bufsize-1;
if(srclen>0)
memcpy(dst,src,srclen);
dst[srclen]='\0';
}
return result;
}
(Written by me in 2004 - dedicated to the public domain.)
Instead of strncpy(), you could use
snprintf(buffer, BUFFER_SIZE, "%s", src);
Here's a one-liner which copies at most size-1 non-null characters from src to dest and adds a null terminator:
static inline void cpystr(char *dest, const char *src, size_t size)
{ if(size) while((*dest++ = --size ? *src++ : 0)); }
strncpy works directly with the string buffers available, if you are working directly with your memory, you MUST now buffer sizes and you could set the '\0' manually.
I believe there is no better alternative in plain C, but its not really that bad if you are as careful as you should be when playing with raw memory.
Without relying on newer extensions, I have done something like this in the past:
/* copy N "visible" chars, adding a null in the position just beyond them */
#define MSTRNCPY( dst, src, len) ( strncpy( (dst), (src), (len)), (dst)[ (len) ] = '\0')
and perhaps even:
/* pull up to size - 1 "visible" characters into a fixed size buffer of known size */
#define MFBCPY( dst, src) MSTRNCPY( (dst), (src), sizeof( dst) - 1)
Why the macros instead of newer "built-in" (?) functions? Because there used to be quite a few different unices, as well as other non-unix (non-windows) environments that I had to port to back when I was doing C on a daily basis.
I have always preferred:
memset(dest, 0, LEN);
strncpy(dest, src, LEN - 1);
to the fix it up afterwards approach, but that is really just a matter of preference.
These functions have evolved more than being designed, so there really is no "why".
You just have to learn "how". Unfortunately the linux man pages at least are
devoid of common use case examples for these functions, and I've noticed lots
of misuse in code I've reviewed. I've made some notes here:
http://www.pixelbeat.org/programming/gcc/string_buffers.html
Edit: I've added the source for the example.
I came across this example:
char source[MAX] = "123456789";
char source1[MAX] = "123456789";
char destination[MAX] = "abcdefg";
char destination1[MAX] = "abcdefg";
char *return_string;
int index = 5;
/* This is how strcpy works */
printf("destination is originally = '%s'\n", destination);
return_string = strcpy(destination, source);
printf("after strcpy, dest becomes '%s'\n\n", destination);
/* This is how strncpy works */
printf( "destination1 is originally = '%s'\n", destination1 );
return_string = strncpy( destination1, source1, index );
printf( "After strncpy, destination1 becomes '%s'\n", destination1 );
Which produced this output:
destination is originally = 'abcdefg'
After strcpy, destination becomes '123456789'
destination1 is originally = 'abcdefg'
After strncpy, destination1 becomes '12345fg'
Which makes me wonder why anyone would want this effect. It looks like it would be confusing. This program makes me think you could basically copy over someone's name (eg. Tom Brokaw) with Tom Bro763.
What are the advantages of using strncpy() over strcpy()?
The strncpy() function was designed with a very particular problem in mind: manipulating strings stored in the manner of original UNIX directory entries. These used a short fixed-sized array (14 bytes), and a nul-terminator was only used if the filename was shorter than the array.
That's what's behind the two oddities of strncpy():
It doesn't put a nul-terminator on the destination if it is completely filled; and
It always completely fills the destination, with nuls if necessary.
For a "safer strcpy()", you are better off using strncat() like so:
if (dest_size > 0)
{
dest[0] = '\0';
strncat(dest, source, dest_size - 1);
}
That will always nul-terminate the result, and won't copy more than necessary.
strncpy combats buffer overflow by requiring you to put a length in it. strcpy depends on a trailing \0, which may not always occur.
Secondly, why you chose to only copy 5 characters on 7 character string is beyond me, but it's producing expected behavior. It's only copying over the first n characters, where n is the third argument.
The n functions are all used as defensive coding against buffer overflows. Please use them in lieu of older functions, such as strcpy.
While I know the intent behind strncpy, it is not really a good function. Avoid both. Raymond Chen explains.
Personally, my conclusion is simply to avoid strncpy and all its friends if you are dealing with null-terminated strings. Despite the "str" in the name, these functions do not produce null-terminated strings. They convert a null-terminated string into a raw character buffer. Using them where a null-terminated string is expected as the second buffer is plain wrong. Not only do you fail to get proper null termination if the source is too long, but if the source is short you get unnecessary null padding.
See also Why is strncpy insecure?
strncpy is NOT safer than strcpy, it just trades one type of bugs with another. In C, when handling C strings, you need to know the size of your buffers, there is no way around it. strncpy was justified for the directory thing mentioned by others, but otherwise, you should never use it:
if you know the length of your string and buffer, why using strncpy ? It is a waste of computing power at best (adding useless 0)
if you don't know the lengths, then you risk silently truncating your strings, which is not much better than a buffer overflow
What you're looking for is the function strlcpy() which does terminate always the string with 0 and initializes the buffer. It also is able to detect overflows. Only problem, it's not (really) portable and is present only on some systems (BSD, Solaris). The problem with this function is that it opens another can of worms as can be seen by the discussions on
http://en.wikipedia.org/wiki/Strlcpy
My personal opinion is that it is vastly more useful than strncpy() and strcpy(). It has better performance and is a good companion to snprintf(). For platforms which do not have it, it is relatively easy to implement.
(for the developement phase of a application I substitute these two function (snprintf() and strlcpy()) with a trapping version which aborts brutally the program on buffer overflows or truncations. This allows to catch quickly the worst offenders. Especially if you work on a codebase from someone else.
EDIT: strlcpy() can be implemented easily:
size_t strlcpy(char *dst, const char *src, size_t dstsize)
{
size_t len = strlen(src);
if(dstsize) {
size_t bl = (len < dstsize-1 ? len : dstsize-1);
((char*)memcpy(dst, src, bl))[bl] = 0;
}
return len;
}
The strncpy() function is the safer one: you have to pass the maximum length the destination buffer can accept. Otherwise it could happen that the source string is not correctly 0 terminated, in which case the strcpy() function could write more characters to destination, corrupting anything which is in the memory after the destination buffer. This is the buffer-overrun problem used in many exploits
Also for POSIX API functions like read() which does not put the terminating 0 in the buffer, but returns the number of bytes read, you will either manually put the 0, or copy it using strncpy().
In your example code, index is actually not an index, but a count - it tells how many characters at most to copy from source to destination. If there is no null byte among the first n bytes of source, the string placed in destination will not be null terminated
strncpy fills the destination up with '\0' for the size of source, eventhough the size of the destination is smaller....
manpage:
If the length of src is less than n, strncpy() pads the remainder of
dest with null bytes.
and not only the remainder...also after this until n characters is
reached. And thus you get an overflow... (see the man page
implementation)
This may be used in many other scenarios, where you need to copy only a portion of your original string to the destination. Using strncpy() you can copy a limited portion of the original string as opposed by strcpy(). I see the code you have put up comes from publib.boulder.ibm.com.
That depends on our requirement.
For windows users
We use strncpy whenever we don't want to copy entire string or we want to copy only n number of characters. But strcpy copies the entire string including terminating null character.
These links will help you more to know about strcpy and strncpy
and where we can use.
about strcpy
about strncpy
the strncpy is a safer version of strcpy as a matter of fact you should never use strcpy because its potential buffer overflow vulnerability which makes you system vulnerable to all sort of attacks