strcat Vs strncat - When should which function be used? - c

Some static code analyzer tools suggest that all strcat usage should be replaced with strncat for safety.
In a program, if we know clearly the size of the target buffer and source buffers, is it still recommended to go for strncat?
Also, given the suggestions by static tools, should strcat be used ever?

Concatenate two strings into a single string.
Prototypes
#include <string.h>
char * strcat(char *restrict s1, const char *restrict s2);
char * strncat(char *restrict s1, const char *restrict s2, size_t n);
DESCRIPTION
The strcat() and strncat() functions append a copy of the null-terminated
string s2 to the end of the null-terminated string s1, then add a terminating '\0'. The string s1 must have sufficient space to hold the
result.
The strncat() function appends not more than n characters from s2, and
then adds a terminating '\0'.
The source and destination strings should not overlap, as the behavior is
undefined.
RETURN VALUES
The `strcat()` and `strncat()` functions return the pointer s1.
SECURITY CONSIDERATIONS
The strcat() function is easily misused in a manner which enables malicious users to arbitrarily change a running program's functionality
through a buffer overflow attack.
Avoid using strcat(). Instead, use strncat() or strlcat() and ensure
that no more characters are copied to the destination buffer than it can
hold.
Note that strncat() can also be problematic. It may be a security concern for a string to be truncated at all. Since the truncated string
will not be as long as the original, it may refer to a completely different resource and usage of the truncated resource could result in very
incorrect behavior. Example:
void
foo(const char *arbitrary_string)
{
char onstack[8] = "";
#if defined(BAD)
/*
* This first strcat is bad behavior. Do not use strcat!
*/
(void)strcat(onstack, arbitrary_string); /* BAD! */
#elif defined(BETTER)
/*
* The following two lines demonstrate better use of
* strncat().
*/
(void)strncat(onstack, arbitrary_string,
sizeof(onstack) - strlen(onstack) - 1);
#elif defined(BEST)
/*
* These lines are even more robust due to testing for
* truncation.
*/
if (strlen(arbitrary_string) + 1 >
sizeof(onstack) - strlen(onstack))
err(1, "onstack would be truncated");
(void)strncat(onstack, arbitrary_string,
sizeof(onstack) - strlen(onstack) - 1);
#endif
}
Example
char dest[20] = "Hello";
char *src = ", World!";
char numbers[] = "12345678";
printf("dest before strcat: \"%s\"\n", dest); // "Hello"
strcat(dest, src);
printf("dest after strcat: \"%s\"\n", dest); // "Hello, World!"
strncat(dest, numbers, 3); // strcat first 3 chars of numbers
printf("dest after strncat: \"%s\"\n", dest); // "Hello, World!123"

If you are absolutely sure about the source buffer's size and that the source buffer contains a null character terminating the string, then you can safely use strcat when the destination buffer is large enough.
I still recommend using strncat and passing it the size of the destination buffer minus the length of the destination string minus 1.
Note: I edited this since comments noted that my previous answer was horribly wrong.
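To make that size calculation concrete, here is a minimal sketch. The helper name append_bounded is made up for illustration (it is not a standard function), and it assumes dst already holds a valid null-terminated string that is shorter than dst_size.

```c
#include <stddef.h>
#include <string.h>

/*
 * append_bounded() is a hypothetical helper, not part of <string.h>.
 * dst_size is the total size of the dst array. Assumes dst already
 * contains a null-terminated string with strlen(dst) < dst_size.
 * Returns 0 if all of src fit, -1 if src had to be truncated.
 */
int append_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);          /* characters already in dst */
    size_t room = dst_size - used - 1;  /* space left, minus the '\0' */

    /* strncat appends at most room characters, then writes a '\0'. */
    strncat(dst, src, room);

    return strlen(src) > room ? -1 : 0;
}
```

Called as append_bounded(buf, sizeof buf, text), the return value tells you whether truncation happened, which is the case the man page warns about.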

They don't do the same thing, so they can't be substituted for one another. They have different data models.
A string for strcat is a null-terminated string for which you (as the programmer) guarantee that it has enough space.
A string for strncat is a sequence of char that is terminated either at the length you indicate or by a null terminator if it is supposed to be shorter than that length.
So the use of these functions just depends on the assumptions that you may (or want to) make about your data.

Static tools are generally poor at understanding the circumstances around the use of a function. I bet most of them just warn for every strcat encountered instead of actually looking whether the data passed to the function is deterministic or not. As already mentioned, if you have control over your input data neither function is unsafe.
Though note that strncat() is slightly slower, as it has to check against '\0' termination and a counter, and also explicitly add it to the end. strcat() on the other hand just checks for '\0', and it adds the trailing '\0' to the new string by copying the terminator from the second string along with all the data.

It's very simple: strcat is used to concatenate two strings. For example:
char a[20] = "data";
char b[] = "structures";
If you perform strcat:
strcat(a, b);
then
a = "datastructures"
But if you want to concatenate only a specific number of characters, you can use strncat.
For example, if you want to concatenate only the first two characters of b onto a (starting again from a = "data"), you write:
strncat(a, b, 2);
This appends just the first two characters of b to a, so a becomes
a = "datast"

Related

Why strncpy() is not respecting the given size_t n which is 10 in temp2?

This problem is blowing my mind...Can anyone please sort out the problem because i have already wasted hours on this.. ;(
#include <stdio.h>
#include <string.h>
int main(){
char string[] = "Iam pretty much big string.";
char temp1[50];
char temp2[10];
// strcpy() and strncpy()
strcpy(temp1, string);
printf("%s\n", temp1);
strncpy(temp2, temp1, 10);
printf("%s\n", temp2);
return 0;
}
Result
Iam pretty much big string.
Iam prettyIam pretty much big string.
Expected Result:
Iam pretty much big string.
Iam pretty
The strncpy function is respecting the 10 byte limit you're giving it.
It copies the first 10 bytes from temp1 to temp2. None of those 10 bytes is a null byte, and the size of temp2 is 10, so there are no null bytes in temp2. When you then pass temp2 to printf, it reads past the end of the array, invoking undefined behavior.
You would need to set the size given to strncpy to the array size - 1, then manually add the null byte to the end.
strncpy(temp2, temp1, sizeof(temp2)-1);
temp2[sizeof(temp2)-1] = 0;
The address of temp2 happens to be just before the address of temp1 here, and because you do not copy the final 0, printf continues printing past the end of temp2.
As long as you do not insert the 0, the result of printf is undefined.
You invoke Undefined Behavior attempting to print temp2 as temp2 is not nul-terminated. From man strncpy:
"Warning: If there is no null byte among the first n bytes of src,
the string placed in dest will not be null-terminated." (emphasis in
original)
See also C11 Standard - 7.24.2.4 The strncpy function (specifically footnote: 308)
So temp2 is not nul-terminated.
Citation of the appropriate [strncpy] tag on Stack Overflow https://stackoverflow.com/tags/strncpy/info, which may help you to understand what happens exactly:
This function is not recommended to use for any purpose, neither in C nor C++. It was never intended to be a "safe version of strcpy" but is often misused for such purposes. It is in fact considered to be much more dangerous than strcpy, since the null termination mechanism of strncpy is not intuitive and therefore often misunderstood. This is because of the following behavior specified by ISO 9899:2011 7.24.2.4:
char *strncpy(char * restrict s1,
const char * restrict s2,
size_t n);
/--/
3 If the array pointed to by s2 is a string that is shorter than n characters, null characters
are appended to the copy in the array pointed to by s1, until n characters in all have been
written.
A very common mistake is to pass an s2 which is exactly as many characters as the n parameter, in which case s1 will not get null terminated. That is: strncpy(dst, src, strlen(src));
/* MCVE of incorrect use of strncpy */
#include <string.h>
#include <stdio.h>
int main (void)
{
const char* STR = "hello";
char buf[] = "halt and catch fire";
strncpy(buf, STR, strlen(STR));
puts(buf); // prints "helloand catch fire"
return 0;
}
Recommended practice in C is to check the buffer size in advance and then use strcpy(), alternatively memcpy().
Recommended practice in C++ is to use std::string instead.
From the manpage for strncpy():
Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
Either your input is shorter than the supplied length, you add the terminating null byte yourself, or it won't be there. printf() expects the string to be properly null terminated, and thus overruns your allocated buffer.
This only goes to show that the n variants of many standard functions are by no means safe. You must read their respective man pages, and specifically look for what they do when the supplied length does not suffice.

Example for strncpy()

If I would use the strncpy function for the strings cat and dog. I don't understand if the \0 character is counted in or not, so I would like to know if the end result will be catdo? or would it be something like cat\0do
strncpy("cat", "dog", 2);
You should not use the strncpy and strncat functions at all.
Their names start with str, but they do not really work with strings. In C, a string is defined as "a character sequence terminated by '\0'". These functions do not guarantee that the resulting character array is always null-terminated.
The better alternatives are strlcpy and strlcat, but these are not available everywhere.
Even better would be a separate string library in which determining the length of a string were a constant-time operation. But that gets distracting.
As torstenvl mentioned, "cat" and "dog" are string literals, so you're not using the function correctly here. The first parameter is the destination, the second parameter is the source, and the third parameter is the number of bytes to copy.
char *strncpy(char *restrict s1, const char *restrict s2, size_t n)
Source: The Open Group Base Specification - strncpy
To answer your specific question: the null terminator is copied to the destination only if it falls within the first n bytes of the source. Exactly n bytes are written, and if your source string s2 is shorter than n bytes, null bytes are filled in until n bytes have been written. With n = 2 and the source "dog", only 'd' and 'o' are copied and no terminator is written.
In your question, it looks like you're trying to append the two strings. To do this in C, you need to first allocate a destination buffer, copy the first string over, then copy the second string, starting from the end of the first string. Depending on where you start the last step, you can end up with either "catdog\0" or "cat\0dog\0". This is another example of the quintessential "off by one" error.
To start, you have to calculate the length of the two strings you want to append. You can do this using strlen, from string.h. strlen does not count the null-terminator as part of the length, so remember that to get the length of the final string, you'll have to do strlen(s1) + strlen(s2) + 1.
You can then copy the first string over as you normally would. An easy way to do the second copy is to do this:
char *s2start = &finalString[strlen(s1)];
You can then do strncpy(s2start, s2, [the size of s2]) and that way you know you're starting right on the s1 null terminator, avoiding the "cat\0dog" error.
Hope this helps, good luck.
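The steps above can be sketched roughly as follows. Note that this swaps in memcpy for strncpy, since both lengths are already known at that point; concat_new is a made-up name, and the caller is assumed to free the result.

```c
#include <stdlib.h>
#include <string.h>

/*
 * concat_new() is a hypothetical helper. It returns a newly allocated
 * string holding s1 followed by s2, or NULL if malloc fails.
 * The caller is responsible for calling free() on the result.
 */
char *concat_new(const char *s1, const char *s2)
{
    size_t len1 = strlen(s1);
    size_t len2 = strlen(s2);
    char *out = malloc(len1 + len2 + 1);  /* +1 for the single '\0' */

    if (out == NULL)
        return NULL;

    memcpy(out, s1, len1);             /* copy s1 without its '\0' */
    memcpy(out + len1, s2, len2 + 1);  /* start where s1's '\0' would go,
                                          and copy s2's '\0' as well */
    return out;
}
```

With "cat" and "dog" this yields "catdog" with one terminator at the end, rather than "cat\0dog".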
When you write out a string literal like "cat" or "dog" in C, the array cannot be changed; trying to modify it results in undefined behavior. You can only pass a literal where a function expects a const char * input; const tells you that the data will not be changed in the function. When you write "dog", the data in the character array looks something like this:
{'d','o','g','\0'}
Notice it is NUL terminated.
The function you are using:
char *strncpy(char *dest, const char *src, size_t n)
Copies src to dest with a maximum length of n. You cannot copy into "cat", as mentioned above: you can see that char *dest is not const, but const char *src is. So the source could be "cat" or "dog".
If you were to allocate space for the string you are allowed to modify it:
char cat_str[] = "cat";
Now the character array cat_str is initialized to "cat", but we can always change it. Note that its length will be 4 (one for each letter plus a NUL) because we did not specify the length. So be sure not to touch anything past cat_str[3]; you can index it from 0 to 3.
There is a common misconception from some static analysis tools that strncpy is a safer version of strcpy. It's not; it has a different purpose. If you insist on using it to prevent buffer overflows, you need to be cognizant of the fact that for its signature
char * strncpy ( char * destination, const char * source, size_t num );
No null-character is implicitly appended at the end of destination if source is longer than num. Thus, in this case, destination shall not be considered a null terminated C string (reading it as such would overflow).
So if you know that your source is a null terminated C string, then you can do the following:
#include <stdio.h>
#include <string.h>
int main()
{
const char* source = "dog";
char destination[4] = "cat";
printf("source is %s\n", source);
printf("destination is %s\n", destination);
/* the strlen+1 accounts for null termination on source */
/* but you need to be sure that source can fit into destination */
/* and still be null terminated - (that's on you the programmer) */
strncpy(destination, source, strlen(source) + 1);
printf("source is still %s\n", source);
printf("destination is now %s\n", destination);
return 0;
}

Wrong strlen output

I have the following piece of code in C:
char a[55] = "hello";
size_t length = strlen(a);
char b[length];
strncpy(b,a,length);
size_t length2 = strlen(b);
printf("%zu\n", length); // output = 5
printf("%zu\n", length2); // output = 8
Why is this the case?
it has to be 'b [length +1]'
strlen does not include the null character in the end of c strings.
You never initialized b to anything. Therefore its contents are undefined. The call to strlen(b) could read beyond the size of b and cause undefined behavior (such as a crash).
b is not initialized: it contains whatever is in your RAM when the program is run.
For the first string a, the length is 5 as it should be "hello" has 5 characters.
For the second string, b you declare it as a string of 5 characters, but you don't initialise it, so it counts the characters until it finds a byte containing the 0 terminator.
UPDATE: the following line was added after I wrote the original answer.
strncpy(b,a,length);
after this addition, the problem is that you declared b of size length, while it should be length + 1 to provision space for the string terminator.
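A minimal sketch of the corrected snippet, wrapped in a function so it can be checked. The function name copy_and_measure is made up, and like the question it relies on a variable-length array, which is an optional feature since C11.

```c
#include <stddef.h>
#include <string.h>

/*
 * copy_and_measure() is a hypothetical wrapper around the question's
 * code. It copies a into a local buffer and returns strlen of the copy.
 */
size_t copy_and_measure(const char *a)
{
    size_t length = strlen(a);
    char b[length + 1];      /* one extra byte for the terminator */

    strncpy(b, a, length);   /* copies length chars, writes no '\0' here */
    b[length] = '\0';        /* terminate by hand */

    return strlen(b);        /* now well defined, equals length */
}
```

For "hello" both lengths come out as 5, because strlen(b) now stops at the terminator that was written explicitly.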
Others have already pointed out that you need to allocate strlen(a)+1 characters for b to be able to hold the whole string.
They've given you a set of parameters to use for strncpy that will (attempt to) cover up the fact that it's not really suitable for the job at hand (or almost any other, truth be told). What you really want is to just use strcpy instead. Also note, however, that as you've allocated it, b is also a local (auto storage class) variable. It's rarely useful to copy a string into a local variable.
Most of the time, if you're copying a string, you need to copy it to dynamically allocated storage -- otherwise, you might as well use the original and skip doing a copy at all. Copying a string into dynamically allocated storage is sufficiently common that many libraries already include a function (typically named strdup) for the purpose. If your library doesn't have that, it's fairly easy to write one of your own:
char *dupstr(char const *input) {
char *ret = malloc(strlen(input)+1);
if (ret)
strcpy(ret, input);
return ret;
}
[Edit: I've named this dupstr because strdup (along with anything else starting with str) is reserved for the implementation.]
Actually, the char array is not terminated by '\0', so strlen has no way to know where it should stop calculating the length of the string. Its prototype is size_t strlen(const char *s); it returns the number of chars in the string up to the '\0' (null character).
To avoid this, we have to append a null character (b[length] = '\0'), which also requires b to have room for length + 1 chars.
Otherwise strlen keeps counting chars until a null character is encountered.

C: String manipulation adding more characters without causing a buffer overflow

In C I have a path in one of my strings
/home/frankv/
I now want to add the name of files that are contained in this folder - e.g. file1.txt file123.txt etc.
Having declared my variable either like this
char pathToFile[strlen("/home/frankv/")+1]
or
char *pathToFile = malloc(strlen("/home/frankv/")+1)
My problem is that I cannot simply add more characters because it would cause a buffer overflow. Also, what do I do in case I do not know how long the filenames will be?
I've really gotten used to PHP lazy $string1.$string2 .. What is the easiest way to do this in C?
If you've allocated a buffer with malloc(), you can use realloc() to expand it:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
char *buf;
const char s1[] = "hello";
const char s2[] = ", world";
buf = malloc(sizeof s1);
strcpy(buf, s1);
buf = realloc(buf, sizeof s1 + sizeof s2 - 1);
strcat(buf, s2);
puts(buf);
return 0;
}
NOTE: I have omitted error checking. You shouldn't. Always check whether malloc() returns a null pointer; if it does, take some corrective action, even if it's just terminating the program. Likewise for realloc(). And if you want to be able to recover from a realloc() failure, store the result in a temporary so you don't clobber your original pointer.
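For what the error-checked version might look like, here is a rough sketch. The helper append_realloc is a made-up name; it assumes *bufp points to a malloc'd, null-terminated string.

```c
#include <stdlib.h>
#include <string.h>

/*
 * append_realloc() is a hypothetical helper. On success the buffer is
 * grown, suffix is appended, and 0 is returned. On failure -1 is
 * returned and *bufp is left untouched, so the caller can still use
 * or free the original string.
 */
int append_realloc(char **bufp, const char *suffix)
{
    size_t old_len = strlen(*bufp);
    size_t add_len = strlen(suffix);

    /* Store the result in a temporary so a failure does not
       clobber the original pointer. */
    char *tmp = realloc(*bufp, old_len + add_len + 1);
    if (tmp == NULL)
        return -1;

    memcpy(tmp + old_len, suffix, add_len + 1);  /* includes the '\0' */
    *bufp = tmp;
    return 0;
}
```

The caller checks the return value and decides whether to recover or terminate; either way the original buffer is never leaked.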
Use std::string, if possible. Else, reallocate another block of memory and use strcpy and strcat.
You have a couple options, but, if you want to do this dynamically using no additional libraries, realloc() is the stdlib function you're looking for:
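char *pathToFile = malloc(strlen("/home/frankv/")+1);
if (!pathToFile) abort();
strcpy(pathToFile, "/home/frankv/"); /* the buffer must hold a string before strlen()/strcat() */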
char *string_to_add = "filename.txt";
char *p = realloc(pathToFile, strlen(pathToFile) + strlen(string_to_add) + 1);
if (!p) abort();
pathToFile = p;
strcat(p, string_to_add);
Note: you should always assign the result of realloc to a new pointer first, as realloc() returns NULL on failure. If you assign to the original pointer, you are begging for a memory leak.
If you're going to be doing much string manipulation, though, you may want to consider using a string library. Two I've found useful are bstring and ustr.
In case you can use C++, use std::string. In case you must use pure C, use what's called doubling: when you run out of space in the string, double the memory and copy the string into the new memory. And you'll have to use the second syntax:
char *pathToFile = malloc(strlen("/home/frankv/")+1);
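A rough sketch of the doubling idea, assuming a small made-up struct (strbuf) that tracks length and capacity alongside the pointer. The starting capacity of 16 is arbitrary, and overflow of the capacity arithmetic is not handled here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; start from a zeroed struct. */
struct strbuf {
    char  *data;  /* malloc'd, null-terminated once something is appended */
    size_t len;   /* characters in use, excluding the '\0' */
    size_t cap;   /* bytes allocated */
};

/* Returns 0 on success, -1 on allocation failure (sb is unchanged). */
int strbuf_append(struct strbuf *sb, const char *s)
{
    size_t add = strlen(s);
    size_t need = sb->len + add + 1;

    if (need > sb->cap) {
        size_t new_cap = sb->cap ? sb->cap : 16;
        while (new_cap < need)
            new_cap *= 2;                    /* the doubling step */

        char *tmp = realloc(sb->data, new_cap);  /* realloc(NULL, n) acts like malloc(n) */
        if (tmp == NULL)
            return -1;
        sb->data = tmp;
        sb->cap = new_cap;
    }

    memcpy(sb->data + sb->len, s, add + 1);  /* copy s with its '\0' */
    sb->len += add;
    return 0;
}
```

Starting from struct strbuf sb = {0}, you would append "/home/frankv/" and then each file name, and free(sb.data) when done. Doubling keeps the number of reallocations small even when many short pieces are appended.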
You have chosen the wrong language for manipulating strings!
The easy and conventional way out is to do something like:
#define MAX_PATH 260
char pathToFile[MAX_PATH+1] = "/home/frankv/";
strcat(pathToFile, "wibble/");
Of course, this is error prone - if the resulting string exceeds MAX_PATH characters, anything can happen, and it is this sort of programming which is the route many trojans and worms use to penetrate security (by corrupting memory in a carefully defined way). Hence my deliberate choice of 260 for MAX_PATH, which is what it used to be in Windows - you can still make Windows Explorer do strange things to your files with paths over 260 characters, possibly because of code like this!
strncat may be a small help - you can at least tell it the maximum number of characters to append (the space remaining in the destination, minus one for the terminator), and it won't copy beyond that.
To do it robustly you need a string library which does variable length strings correctly. But I don't know if there is such a thing for C (C++ is a different matter, of course).

C strcpy() - evil?

Some people seem to think that C's strcpy() function is bad or evil. While I admit that it's usually better to use strncpy() in order to avoid buffer overflows, the following (an implementation of the strdup() function for those not lucky enough to have it) safely uses strcpy() and should never overflow:
char *strdup(const char *s1)
{
char *s2 = malloc(strlen(s1)+1);
if(s2 == NULL)
{
return NULL;
}
strcpy(s2, s1);
return s2;
}
*s2 is guaranteed to have enough space to store *s1, and using strcpy() saves us from having to store the strlen() result in another function to use later as the unnecessary (in this case) length parameter to strncpy(). Yet some people write this function with strncpy(), or even memcpy(), which both require a length parameter. I would like to know what people think about this. If you think strcpy() is safe in certain situations, say so. If you have a good reason not to use strcpy() in this situation, please give it - I'd like to know why it might be better to use strncpy() or memcpy() in situations like this. If you think strcpy() is okay, but not here, please explain.
Basically, I just want to know why some people use memcpy() when others use strcpy() and still others use plain strncpy(). Is there any logic to preferring one over the three (disregarding the buffer checks of the first two)?
memcpy can be faster than strcpy and strncpy because it does not have to compare each copied byte with '\0', and because it already knows the length of the copied object. It can be implemented in a similar way with the Duff's device, or use assembler instructions that copy several bytes at a time, like movsw and movsd
I'm following the rules in here. Let me quote from it
strncpy was initially introduced into the C library to deal with fixed-length name fields in structures such as directory entries. Such fields are not used in the same way as strings: the trailing null is unnecessary for a maximum-length field, and setting trailing bytes for shorter names to null assures efficient field-wise comparisons. strncpy is not by origin a ``bounded strcpy,'' and the Committee has preferred to recognize existing practice rather than alter the function to better suit it to such use.
For that reason, you will not get a trailing '\0' in a string if you hit the n not finding a '\0' from the source string so far. It's easy to misuse it (of course, if you know about that pitfall, you can avoid it). As the quote says, it wasn't designed as a bounded strcpy. And i would prefer not to use it if not necessary. In your case, clearly its use is not necessary and you proved it. Why then use it?
And generally speaking, programming is also about reducing redundancy. If you know you have a string containing 'n' characters, why tell the copying function to copy at most n characters? You do redundant checking. It's little about performance, but much more about consistent code. Readers will ask themselves what strcpy could do that could cross the n characters and make it necessary to limit the copying, just to read in manuals that this cannot happen in that case. And that is where confusion starts among readers of the code.
For the rationale to use mem-, str- or strn-, I choose among them as in the above linked document:
mem- when i want to copy raw bytes, like bytes of a structure.
str- when copying a null terminated string - only when 100% no overflow could happen.
strn- when copying a null terminated string up to some length, filling the remaining bytes with zero. Probably not what i want in most cases. It's easy to forget the fact with the trailing zero-fill, but it's by design as the above quote explains. So, i would just code my own small loop that copies characters, adding a trailing '\0':
char * sstrcpy(char *dst, char const *src, size_t n) {
char *ret = dst;
while(n-- > 0) {
if((*dst++ = *src++) == '\0')
return ret;
}
*dst++ = '\0';
return ret;
}
Just a few lines that do exactly what i want. If i wanted "raw speed" i can still look out for a portable and optimized implementation that does exactly this bounded strcpy job. As always, profile first and then mess with it.
Later, C got functions for working with wide characters, called wcs- and wcsn- (for C99). I would use them likewise.
The reason why people use strncpy not strcpy is because strings are not always null terminated and it's very easy to overflow the buffer (the space you have allocated for the string with strcpy) and overwrite some unrelated bit of memory.
With strcpy this can happen, with strncpy this will never happen. That is why strcpy is considered unsafe. Evil might be a little strong.
Frankly, if you are doing much string handling in C, you should not ask yourself whether you should use strcpy or strncpy or memcpy. You should find or write a string library that provides a higher level abstraction. For example, one that keeps track of the length of each string, allocates memory for you, and provides all the string operations you need.
This will almost certainly guarantee you make very few of the kinds of mistakes usually associated with C string handling, such as buffer overflows, forgetting to terminate a string with a NUL byte, and so on.
The library might have functions such as these:
typedef struct MyString MyString;
MyString *mystring_new(const char *c_str);
MyString *mystring_new_from_buffer(const void *p, size_t len);
void mystring_free(MyString *s);
size_t mystring_len(MyString *s);
int mystring_char_at(MyString *s, size_t offset);
MyString *mystring_cat(MyString *s1, ...); /* NULL terminated list */
MyString *mystring_copy_substring(MyString *s, size_t start, size_t max_chars);
MyString *mystring_find(MyString *s, MyString *pattern);
size_t mystring_find_char(MyString *s, int c);
void mystring_copy_out(void *output, MyString *s, size_t max_chars);
int mystring_write_to_fd(int fd, MyString *s);
int mystring_write_to_file(FILE *f, MyString *s);
I wrote one for the Kannel project, see the gwlib/octstr.h file. It made life much simpler for us. On the other hand, such a library is fairly simple to write, so you might write one for yourself, even if only as an exercise.
No one has mentioned strlcpy, developed by Todd C. Miller and Theo de Raadt. As they say in their paper:
The most common misconception is that strncpy() NUL-terminates the destination string. This is only true, however, if length of the source string is less than the size parameter. This can be problematic when copying user input that may be of arbitrary length into a fixed size buffer. The safest way to use strncpy() in this situation is to pass it one less than the size of the destination string, and then terminate the string by hand. That way you are guaranteed to always have a NUL-terminated destination string.
There are counter-arguments for the use of strlcpy; the Wikipedia page makes note that
Drepper argues that strlcpy and strlcat make truncation errors easier for a programmer to ignore and thus can introduce more bugs than they remove.
However, I believe that this just forces people that know what they're doing to add a manual NULL termination, in addition to a manual adjustment to the argument to strncpy. Use of strlcpy makes it much easier to avoid buffer overruns because you failed to NULL terminate your buffer.
Also note that the lack of strlcpy in glibc or Microsoft's libraries should not be a barrier to use; you can find the source for strlcpy and friends in any BSD distribution, and the license is likely friendly to your commercial/non-commercial project. See the comment at the top of strlcpy.c.
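To give an idea of the semantics, here is a small sketch of a strlcpy-style function. This is not the BSD source; my_strlcpy is a made-up name, and the behavior (always terminate when size > 0, return the length of src) follows my understanding of the documented interface.

```c
#include <stddef.h>
#include <string.h>

/*
 * my_strlcpy() is a hypothetical stand-in, not the BSD implementation.
 * Copies at most size - 1 characters from src to dst and always
 * null-terminates dst when size > 0. Returns strlen(src), so a return
 * value >= size means the copy was truncated.
 */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);

    if (size > 0) {
        size_t n = src_len < size - 1 ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

The caller can still ignore the return value, which is exactly Drepper's objection, but the destination is at least always a valid string.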
I personally am of the mindset that if the code can be proven to be valid—and done so quickly—it is perfectly acceptable. That is, if the code is simple and thus obviously correct, then it is fine.
However, your assumption seems to be that while your function is executing, no other thread will modify the string pointed to by s1. What happens if this function is interrupted after successful memory allocation (and thus the call to strlen), the string grows, and bam you have a buffer overflow condition since strcpy copies to the NULL byte.
The following might be better:
char *
strdup(const char *s1) {
int s1_len = strlen(s1);
char *s2 = malloc(s1_len+1);
if(s2 == NULL) {
return NULL;
}
strncpy(s2, s1, s1_len);
return s2;
}
Now, the string can grow through no fault of your own and you're safe. The result will not be a dup, but it won't be any crazy overflows, either.
The probability of the code you provided actually being a bug is pretty low (pretty close to non-existent, if not non-existent, if you are working in an environment that has no support for threading whatsoever). It's just something to think about.
ETA: Here is a slightly better implementation:
char *
strdup(const char *s1, int *retnum) {
int s1_len = strlen(s1);
char *s2 = malloc(s1_len+1);
if(s2 == NULL) {
return NULL;
}
strncpy(s2, s1, s1_len);
*retnum = s1_len;
return s2;
}
There the number of characters is being returned. You can also:
char *
strdup(const char *s1) {
int s1_len = strlen(s1);
char *s2 = malloc(s1_len+1);
if(s2 == NULL) {
return NULL;
}
strncpy(s2, s1, s1_len);
s2[s1_len] = '\0';
return s2;
}
Which will terminate it with a NUL byte. Either way is better than the one that I quickly put together originally.
I agree. I would recommend against strncpy() though, since it will always pad your output to the indicated length. This is some historical decision, which I think was really unfortunate as it seriously worsens the performance.
Consider code like this:
char buf[128];
strncpy(buf, "foo", sizeof buf);
This will not write the expected four characters to buf, but will instead write "foo" followed by 125 zero characters. If you're for instance collecting a lot of short strings, this will mean your actual performance is far worse than expected.
If available, I prefer to use snprintf(), writing the above like:
snprintf(buf, sizeof buf, "foo");
If instead copying a non-constant string, it's done like this:
snprintf(buf, sizeof buf, "%s", input);
Using the "%s" format here, rather than passing input itself as the format string, is important: if input contains % characters, snprintf() would interpret them, opening up whole shelvefuls of cans of worms.
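snprintf() also reports how many characters it wanted to write, so truncation can be detected. A minimal sketch, with copy_checked as a made-up helper name:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * copy_checked() is a hypothetical helper. It copies input into buf
 * (always null-terminated when size > 0) and returns 0 if the whole
 * string fit, or -1 if it was truncated or snprintf reported an error.
 */
int copy_checked(char *buf, size_t size, const char *input)
{
    /* "%s" keeps any % characters in input from being interpreted. */
    int n = snprintf(buf, size, "%s", input);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

Unlike strncpy(), this never zero-fills the rest of the buffer and never leaves it unterminated.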
I think strncpy is evil too.
To truly protect yourself from programming errors of this kind, you need to make it impossible to write code that (a) looks OK, and (b) overruns a buffer.
This means you need a real string abstraction, which stores the buffer and capacity opaquely, binds them together, forever, and checks bounds. Otherwise, you end up passing strings and their capacities all over the shop. Once you get to real string ops, like modifying the middle of a string, it's almost as easy to pass the wrong length into strncpy (and especially strncat), as it is to call strcpy with a too-small destination.
Of course you might still ask whether to use strncpy or strcpy in implementing that abstraction: strncpy is safer there provided you fully grok what it does. But in string-handling application code, relying on strncpy to prevent buffer overflows is like wearing half a condom.
So, your strdup-replacement might look something like this (order of definitions changed to keep you in suspense):
string *string_dup(const string *s1) {
string *s2 = string_alloc(string_len(s1));
if (s2 != NULL) {
string_set(s2,s1);
}
return s2;
}
static inline size_t string_len(const string *s) {
return strlen(s->data);
}
static inline void string_set(string *dest, const string *src) {
// potential (but unlikely) performance issue: strncpy 0-fills dest,
// even if the src is very short. We may wish to optimise
// by switching to memcpy later. But strncpy is better here than
// strcpy, because it means we can use string_set even when
// the length of src is unknown.
strncpy(dest->data, src->data, dest->capacity);
}
string *string_alloc(size_t maxlen) {
if (maxlen > SIZE_MAX - sizeof(string) - 1) return NULL;
string *self = malloc(sizeof(string) + maxlen + 1);
if (self != NULL) {
// empty string
self->data[0] = '\0';
// strncpy doesn't NUL-terminate if it prevents overflow,
// so exclude the NUL-terminator from the capacity, set it now,
// and it can never be overwritten.
self->capacity = maxlen;
self->data[maxlen] = '\0';
}
return self;
}
typedef struct string {
size_t capacity;
char data[];
} string;
The problem with these string abstractions is that nobody can ever agree on one (for instance whether strncpy's idiosyncrasies mentioned in comments above are good or bad, whether you need immutable and/or copy-on-write strings that share buffers when you create a substring, etc). So although in theory you should just take one off the shelf, you can end up with one per project.
I'd tend to use memcpy if I have already calculated the length. Although strcpy is usually optimised to work on machine words, it seems best to give the library as much information as you can, so that it can pick the best copying mechanism.
But for the example you give, it doesn't matter: if it's going to fail, it will fail in the initial strlen, so strncpy doesn't buy you anything in terms of safety (and presumably strncpy is slower, as it has to check both the bound and for NUL), and any difference between memcpy and strcpy isn't worth changing code for speculatively.
The evil comes when people use it like this (although the below is super simplified):
void BadFunction(char *input)
{
    char buffer[1024]; //surely this will **always** be enough
    strcpy(buffer, input);
    ...
}
Which is a situation that happens surprisingly often.
But yeah, strcpy is as good as strncpy in any situation where you are allocating memory for the destination buffer and have already used strlen to find the length.
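For the fixed-buffer case, a minimal sketch of the check that BadFunction is missing might look like this. copy_checked is a hypothetical name, and it assumes src really is a NUL-terminated string; it refuses an oversized input instead of truncating it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, terminator included.
 * dst_size is the total size of dst in bytes.
 * Returns 0 on success, -1 if src is too long (dst is then set to ""). */
static int copy_checked(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return -1;                 /* no room even for the terminator */
    }
    size_t len = strlen(src);      /* assumes src is NUL-terminated */
    if (len >= dst_size) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);     /* length already known, so memcpy suffices */
    return 0;
}
```

The caller would write something like copy_checked(buffer, sizeof buffer, input) and handle the -1 case explicitly, rather than hoping 1024 bytes is always enough.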
strlen scans up to the first null terminator.
But in practice, buffers are not always null-terminated.
That's why people use different functions.
Well, strcpy() is not as evil as strdup() - at least strcpy() is part of Standard C.
In the situation you describe, strcpy is a good choice. This strdup will only get into trouble if s1 is not terminated with a '\0'.
I would add a comment indicating why there are no problems with strcpy, to prevent others (and yourself one year from now) wondering about its correctness for too long.
strncpy often seems safe, but may get you into trouble. If the source "string" is shorter than count, it pads the target with '\0' until it reaches count. That may be bad for performance. If the source string is longer than count, strncpy does not append a '\0' to the target. That is bound to get you into trouble later on when you expect a '\0' terminated "string". So strncpy should also be used with caution!
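If you do use strncpy, the usual way to stay out of that trouble is to terminate the destination yourself. A minimal sketch (copy_truncating is a hypothetical name, and silently truncating may or may not be acceptable for your data):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, truncating if necessary, and
 * always leave dst NUL-terminated. dst_size is the total size of dst.
 * Returns 1 if src was truncated, 0 otherwise. */
static int copy_truncating(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return 1;                      /* nothing can be stored at all */
    }
    /* Copies at most dst_size - 1 characters; pads with '\0' when src is
     * shorter than that. */
    strncpy(dst, src, dst_size - 1);
    /* strncpy writes no terminator when src is too long, so add one. */
    dst[dst_size - 1] = '\0';
    return strlen(src) >= dst_size;
}
```

The return value lets the caller tell a complete copy from a truncated one, which plain strncpy does not report.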
I would only use memcpy if I was not working with '\0' terminated strings, but that seems to be a matter of taste.
char *strdup(const char *s1)
{
    char *s2 = malloc(strlen(s1)+1);
    if(s2 == NULL)
    {
        return NULL;
    }
    strcpy(s2, s1);
    return s2;
}
Problems:
s1 is unterminated, strlen causes the access of unallocated memory, program crashes.
s1 is unterminated, strlen does not touch unallocated memory but reads memory from another part of your application. That data is returned to the user (security issue) or parsed by another part of your program (a heisenbug appears).
s1 is unterminated, strlen results in a malloc which the system can't satisfy, returns NULL. strcpy is passed NULL, program crashes.
s1 is unterminated, strlen results in a malloc which is very large, system allocs far too much memory to perform the task at hand, becomes unstable.
In the best case the code is inefficient, strlen requires access to every element in the string.
There are probably other problems... Look, null termination isn't always a bad idea. There are situations where, for computational efficiency, or to reduce storage requirements it makes sense.
For writing general-purpose code, e.g. business logic, does it make sense? No.
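If the input might not be terminated, one option is to make the caller state how many bytes may be read. A minimal sketch, similar in spirit to POSIX strndup (dup_bounded is a hypothetical name, and max is assumed to be the number of readable bytes at buf):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate the string at buf, reading at most max
 * bytes. buf does not have to contain a '\0' within those max bytes.
 * Returns a NUL-terminated copy, or NULL on allocation failure. */
static char *dup_bounded(const char *buf, size_t max) {
    const char *end = memchr(buf, '\0', max);   /* never reads past max */
    size_t len = end ? (size_t)(end - buf) : max;
    if (len == SIZE_MAX) {
        return NULL;                            /* len + 1 would overflow */
    }
    char *copy = malloc(len + 1);
    if (copy != NULL) {
        memcpy(copy, buf, len);
        copy[len] = '\0';
    }
    return copy;
}
```

This bounds both the read and the allocation, which addresses the unterminated-s1 problems in the list above. It still depends on the caller passing a correct max.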
char* dupstr(char* str)
{
    int full_len; // includes null terminator
    char* ret;
    char* s = str;
#ifdef _DEBUG
    if (! str)
        toss("arg 1 null", __WHENCE__);
#endif
    full_len = strlen(s) + 1;
    if (! (ret = (char*) malloc(full_len)))
        toss("out of memory", __WHENCE__);
    memcpy(ret, s, full_len); // already know len, so strcpy() would be slower
    return ret;
}
This answer uses size_t and memcpy() for a fast and simple strdup().
Best to use type size_t as that is the type returned from strlen() and used by malloc() and memcpy(). int is not the proper type for these operations.
memcpy() is rarely slower than strcpy() or strncpy() and often significantly faster.
// Assumption: `s1` points to a C string.
char *strdup(const char *s1) {
    size_t size = strlen(s1) + 1;
    char *s2 = malloc(size);
    if(s2 != NULL) {
        memcpy(s2, s1, size);
    }
    return s2;
}
C11 §7.1.1, paragraph 1: "A string is a contiguous sequence of characters terminated by and including the first null character. ..."
Your code is terribly inefficient because it runs through the string twice to copy it.
Once in strlen().
Then again in strcpy().
And you don't check s1 for NULL.
Storing the length in some additional variable costs you about nothing, while running through each and every string twice to copy it is a cardinal sin.
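A minimal sketch of what storing the length could look like. The lstr type and both functions are hypothetical names for illustration; the terminator is scanned for once, when the string is created, and copies after that use the stored length.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-carrying string: len excludes the terminator. */
typedef struct {
    size_t len;
    char *data;   /* NULL means "no string" */
} lstr;

/* Build an lstr from a C string. This is the only place that scans
 * for the terminator. Returns data == NULL on NULL input or failure. */
static lstr lstr_from(const char *s) {
    lstr out = { 0, NULL };
    if (s == NULL) {
        return out;
    }
    size_t len = strlen(s);
    out.data = malloc(len + 1);
    if (out.data != NULL) {
        memcpy(out.data, s, len + 1);
        out.len = len;
    }
    return out;
}

/* Duplicate using the stored length: one pass over the data, no strlen. */
static lstr lstr_dup(lstr src) {
    lstr out = { 0, NULL };
    if (src.data == NULL) {
        return out;
    }
    out.data = malloc(src.len + 1);
    if (out.data != NULL) {
        memcpy(out.data, src.data, src.len + 1);
        out.len = src.len;
    }
    return out;
}
```

The cost is one size_t per string and the obligation to keep len in sync whenever data is modified.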

Resources