Strings in C: pitfalls and techniques - c

I will be coaching an ACM Team next month (go figure), and the time has come to talk about strings in C. Besides a discussion on the standard lib, strcpy, strcmp, etc., I would like to give them some hints (something like str[0] is equivalent to *str, and things like that).
Do you know of any lists (like cheat sheets) or your own experience in the matter?
I'm already aware of the books for the ACM competition (which are good, see particularly this), but I'm after tricks of the trade.
Thank you.
Edit: Thank you very much everybody. I will accept the most voted answer, and have duly upvoted others which I think are relevant. I expect to do a summary here (like I did here, asap). I have enough material now and I'm certain this has improved the session on strings immensely. Once again, thanks.

It's obvious but I think it's important to know that strings are nothing more than an array of bytes, delimited by a zero byte.
C strings aren't all that user-friendly as you probably know.
Writing a zero byte somewhere in the string will truncate it.
Going out of bounds generally ends bad.
Never, ever use strcpy, strcmp, strcat, etc.., instead use their safe variants: strncmp, strncat, strndup,...
Avoid strncpy. strncpy will not always zero delimit your string! If the source string doesn't fit in the destination buffer it truncates the string but it won't write a nul byte at the end of the buffer. Also, even if the source buffer is a lot smaller than the destination, strncpy will still overwrite the whole buffer with zeroes. I personally use strlcpy.
Don't use printf(string), instead use printf("%s", string). Try thinking of the consequences if the user puts a %d in the string.
You can't compare strings with if( s1 == s2 )
doStuff(s1);
You have to compare every character in the string. Use strcmp or better strncmp.
if( strncmp( s1, s2, BUFFER_SIZE ) == 0 )
doStuff(s1);

Abusing strlen() will dramatically worsen the performance.
for( int i = 0; i < strlen( string ); i++ ) {
processChar( string[i] );
}
will have at least O(n2) time complexity whereas
int length = strlen( string );
for( int i = 0; i < length; i++ ) {
processChar( string[i] );
}
will have at least O(n) time complexity. This is not so obvious for people who haven't taken time to think of it.

The following functions can be used to implement a non-mutating strtok:
strcspn(string, delimiters)
strspn(string, delimiters)
The first one finds the first character in the set of delimiters you pass in. The second one finds the first character not in the set of delimiters you pass in.
I prefer these to strpbrk as they return the length of the string if they can't match.

str[0] is equivalent to 0[str], or more generally str[i] is i[str] and i[str] is *(str + i).
NB
this is not specific to strings but it works also for C arrays

The strn* variants in stdlib do not necessarily null terminate the destination string.
As an example: from MSDN's documentation on strncpy:
The strncpy function copies the
initial count characters of strSource
to strDest and returns strDest. If
count is less than or equal to the
length of strSource, a null character
is not appended automatically to the
copied string. If count is greater
than the length of strSource, the
destination string is padded with null
characters up to length count.

confuse strlen() with sizeof() when using a string:
char *p = "hello!!";
strlen(p) != sizeof(p)
sizeof(p) yield, at compile time, the size of the pointer (4 or 8 bytes) whereas strlen(p) counts, at runtime, the lenght of the null terminated char array (7 in this example).

strtok is not thread safe, since it uses a mutable private buffer to store data between calls; you cannot interleave or annidate strtok calls also.
A more useful alternative is strtok_r, use it whenever you can.

kmm has already a good list. Here are the things I had problems with when I started to code C.
String literals have an own memory section and are always accessible. Hence they can for example be a return value of function.
Memory management of strings, in particular with a high level library (not libc). Who is responsible to free the string if it is returned by function or passed to a function?
When should "const char *" and when "char *" be used. And what does it tell me if a function returns a "const char *".
All these questions are not too difficult to learn, but hard to figure out if you don't get taught them.

I have found that the char buff[0] technique has been incredibly useful.
Consider:
struct foo {
int x;
char * payload;
};
vs
struct foo {
int x;
char payload[0];
};
see https://stackoverflow.com/questions/295027
See the link for implications and variations

I'd point out the performance pitfalls of over-reliance on the built-in string functions.
char* triple(char* source)
{
int n=strlen(source);
char* dest=malloc(n*3+1);
strcpy(dest,src);
strcat(dest,src);
strcat(dest,src);
return dest;
}

I would discuss when and when not to use strcpy and strncpy and what can go wrong:
char *strncpy(char* destination, const char* source, size_t n);
char *strcpy(char* destination, const char* source );
I would also mention return values of the ansi C stdlib string functions. For example ask "does this if statement pass or fail?"
if (stricmp("StrInG 1", "string 1")==0)
{
.
.
.
}

perhaps you could illustrate the value of sentinel '\0' with following example
char* a = "hello \0 world";
char b[100];
strcpy(b,a);
printf(b);
I once had my fingers burnt when in my zeal I used strcpy() to copy binary data. It worked most of the time but failed mysteriously sometimes. Mystery was revealed when I realized that binary input sometimes contained a zero byte and strcpy() would terminate there.

You could mention indexed addressing.
An elements address is the base address + index * sizeof element

A common error is:
char *p;
snprintf(p, 3, "%d", 42);
it works until you use up to sizeof(p) bytes.. then funny things happens (welcome to the jungle).
Explaination
with char *p you are allocating space for holding a pointer (sizeof(void*) bytes) on the stack. The right thing here is to allocate a buffer or just to specify the size of the pointer at compile time:
char buf[12];
char *p = buf;
snprintf(p, sizeof(buf), "%d", 42);

Pointers and arrays, while having the similar syntax, are not at all the same. Given:
char a[100];
char *p = a;
For the array, a, there is no pointer stored anywhere. sizeof(a) != sizeof(p), for the array it is the size of the block of memory, for the pointer it is the size of the pointer. This become important if you use something like: sizeof(a)/sizeof(a[0]). Also, you can't ++a, and you can make the pointer a 'const' pointer to 'const' chars, but the array can only be 'const' chars, in which case you'd be init it first. etc etc etc

If possible, use strlcpy (instead of strncpy) and strlcat.
Even better, to make life a bit safer, you can use a macro such as:
#define strlcpy_sz(dst, src) (strlcpy(dst, src, sizeof(dst)))

Related

writing myself strncat, unable to enlarge the size of the array C

as far as I'm concerned, strncat enlarges the size of the array you want to cat.
for example:
char str1[] = "This is str1";
char str2[] = "This is str2";
and here the length of str1 is 12 and str2 is also 12, but when I strncat them, str1 changes from 12 to 24.
I was asked to write strncat by my own, but I can't figure out how to enlarge the size of an array, taking in account that we didn't learn pointers yet.
I tried just putting every char in the end of the array while moving the distance by 1 each iteration, but as you would have thought, it doesn't put the data in the array because there is no such position like this in the array (str[20] when str's length is 10 for example).
Thanks in advance,
every help would be appreciated.
strlen returns the length of the string, that is, counts until the first null character. It does NOT return the size of the memory allocated for str1!
When you concatenatestr2 to str1, you write beyond the memory allocated for str1. That will cause undefined behavior. In your particular case, it seems nothing happens and it even seems that str1 has become larger. That is not so. However (in your paticular case), if str2 follows str1 in memory, you just overwrote str2. Try printing str2. It will probaby print his is str2.
Since strcat() et al. does not enlarge a buffer, your implementation does not have to do it. (And it is simply not possible with the parameter list of strcat().) It is the caller's responsibility to pass a destination buffer big enough.
On the caller's side you can simply create an array big enough and pass its address. However, you can still use variable length arrays (VLA):
char str1[] = "This is str1";
char str2[] = "This is str2";
char str1str2[strlen(str1)+strlen(str2)+1];
strcpy( str1str2, str1 );
yourstrcat( str1str2, str2 );
str1str2 is big enough to store both contents plus 1 for the string terminator \0.
Thanks for everyone, I solved the problem. As some of you said, I don't need to enlarge the string, I just need to make sure it's big enough to contain all the data.
what I did eventually is this:
void strnCat(char dest[], char src[], int length)
{
int i = 0;
int len = strlen(dest);
for(i=0; i < length; i++)
{
dest[len+i] = src[i];
dest[len+i+1] = 0;
}
}
so my main problem was that I forget to add the null at the end of the array to make it a string and that I used strlen(str) instead of saving the length in a variable. I did that because I forgot that there is no end of the string after the null disappears.
It is a really strange task to let students implement strncat, since this is one of the C functions that is very difficult to use correctly.
So to implement it yourself, you should read its specification in the C standard or in the POSIX standard. There you will find that strncat doesn't enlarge any array. By the way, arrays cannot be enlarged in C at all, it's impossible by definition. Note the careful distinction between the words array (can contain arbitrary bytes) and string (must contain one null byte) in the standard wording.
A saner alternative to implement is strlcat, which is not in the C standard but also widely known.

C Initialize Character Array from Character Pointer

My question should be rather simple.
I need to give a function a char array of a pre-defined length, but I have a character pointer with variable length, but not longer than the length of my array.
Here the code:
#define SIZE_MAX_PERSON_NAME 50
char person[SIZE_MAX_PERSON_NAME];
char* currentPerson = "John";
now how would I get John into the person array but also setting the rest of the array to 0 (/NUL) ?
so that I would have
BINARY DATA: "John/NUL/NUL/NUL/NUL/NUL/NUL/NUL/NUL/NUL/NUL/NUL/NUL/NUL/NUL....."
in my memory?
sorry if this is overly stupid, but I can't seem to find a solution right now.
First, zero-initialize the fixed-size array :
// Using memset here because I don't know if the whole copy operation can or will be used
// multiple times. We want to be sure that the array is properly zero-initialized if the next
// string to copy is shorter than the previous.
memset(person, '\0', SIZE_MAX_PERSON_NAME);
Then, copy the variable-size string into it :
strcpy(person, currentPerson);
If you are not certain that currentPerson will fit into person :
strncpy(person, currentPerson, SIZE_MAX_PERSON_NAME - 1);
Note that strncpy also zero-initialize the remaining bytes of the array if
strlen(currentPerson) < SIZE_MAX_PERSON_NAME - 1
So you basically have these two options :
memset(person, '\0', SIZE_MAX_PERSON_NAME);
strcpy(person, currentPerson);
Or :
strncpy(person, currentPerson, SIZE_MAX_PERSON_NAME - 1);
person[SIZE_MAX_PERSON_NAME - 1] = '\0';
After this answer was posted the question was retagged from C++ to C.
Use a std::string, like this:
// "using namespace std;" or "using std::string;", then:
string const person = currentPerson;
old_c_function( person.c_str() );
To do things at the C level, which I recommend that you don't, first replace the unnecessary #define with a typed constant:
int const max_person_name_size = 50;
Then zero-initialize your array:
char person[max_person_name_size] = {};
(Note: no silly memset here.)
(Also note: this zeroing is only a preventive measure. You wanted it. But it's not really necessary since strcpy will ensure a trailing zero-byte.)
Then just copy in the string:
assert( strlen( current_person ) < max_person_name_size );
strcpy( person, current_person );
But don't do this. Use std::string instead.
Update: doing other things for some minutes made me realize that this answer, as all the others so far, is completely off the mark. The OP states in a comment elsewhere that
” I've got a function in the library which only takes a character array. Not a character pointer.
Thus, apparently it's all about a misconception.
The only way this can make sense is if the array is modified by the function, and then std::string::c_str() is not a solution. But a std::string can still be used, if its length is set to something sufficient for the C function. Can go like this:
person.resize( max_person_name_size );
foo( &person[0] ); // Assuming foo modifies the array.
person.resize( strlen( person.c_str() ) );
With literal, you may do:
char person[SIZE_MAX_PERSON_NAME] = "John";
if c-string is not a literal, you have to do the copy with strcpy
strcpy(person, currentPerson);
This is the one and only reason for the existence of strncpy:
Putting a string (up to the 0-terminator or buffer end) into a fixed-length array and zeroing out the rest.
This does not ensure 0-termination, thus avoid it for anything else.
7.24.2.4 The strncpy function
#include <string.h>
char *strncpy(char * restrict s1, const char * restrict s2, size_t n);
2 The strncpy function copies not more than n characters (characters that follow a null
character are not copied) from the array pointed to by s2 to the array pointed to by s1.308) If copying takes place between objects that overlap, the behavior is undefined.
3 If the array pointed to by s2 is a string that is shorter than n characters, null characters are appended to the copy in the array pointed to by s1, until n characters in all have been written.
4 The strncpy function returns the value of s1.

copy a string in c - memory question:

consider the following code:
t[7] = "Hellow\0";
s[3] = "Dad";
//now copy t to s using the following strcpy function:
void strcpy(char *s, char *t) {
int i = 0;
while ((s[i] = t[i]) != '\0')
i++;
}
the above code is taken from "The C programming Language book".
my question is - we are copying 7 bytes to what was declared as 3 bytes.
how do I know that after copying, other data that was after s[] in the memory
wasn't deleted?
and one more question please: char *s is identical to char* s?
Thank you !
As you correctly point out, passing s[3] as the first argument is going to overwrite some memory that could well be used by something else. At best your program will crash right there and then; at worst, it will carry on running, damaged, and eventually end up corrupting something it was supposed to handle.
The intended way to do this in C is to never pass an array shorter than required.
By the way, it looks like you've swapped s and t; what was meant was probably this:
void strcpy(char *t, char *s) {
int i = 0;
while ((t[i] = s[i]) != '\0')
i++;
}
You can now copy s[4] into t[7] using this amended strcpy routine:
char t[] = "Hellow";
char s[] = "Dad";
strcpy(t, s);
(edit: the length of s is now fixed)
About the first question.
If you're lucky your program will crash.
If you are not it will keep on running and overwrite memory areas that shouldn't be touched (as you don't know what's actually in there). This would be a hell to debug...
About the second question.
Both char* s and char *s do the same thing. It's just a matter of style.
That is, char* s could be interpreted as "s is of type char pointer" while char *s could be interpreted as "s is a pointer to a char". But really, syntactically it's the same.
That example does nothing, you're not invoking strcpy yet. But if you did this:
strcpy(s,t);
It would be wrong in several ways:
The string s is not null terminated. In C the only way strcpy can know where a string ends is by finding the '\0'. The function may think that s is infinite and it might corrupt your memory and make the program crash.
Even if was null terminated, as you said the size of s is only 3. Because of the same cause, strcpy would write memory beyond where s ends, with maybe catastrophic results.
The workaround for this in C is the function strncpy(dst, src, max) in which you specify the maximum number of chars to copy. Still beware that this function might generate a not null terminated string if src is shorter than max chars.
I will assume that both s and t (above the function definition) are arrays of char.
how do I know that after copying, other data that was after s[] in the memory wasn't deleted?
No, this is worse, you are invoking undefined behavior and we know this because the standard says so. All you are allowed to do after the three elements in s is compare. Assignment is a strict no-no. Advance further, and you're not even allowed to compare.
and one more question please: char s is identical to char s?
In most cases it is a matter of style where you stick your asterix except if you are going to declare/define more than one, in which case you need to stick one to every variable you are going to name (as a pointer).
a string-literal "Hellow\0" is equal to "Hellow"
if you define
t[7] = "Hellow";
s[7] = "Dad";
your example is defined and crashes not.

C strcpy() - evil?

Some people seem to think that C's strcpy() function is bad or evil. While I admit that it's usually better to use strncpy() in order to avoid buffer overflows, the following (an implementation of the strdup() function for those not lucky enough to have it) safely uses strcpy() and should never overflow:
char *strdup(const char *s1)
{
char *s2 = malloc(strlen(s1)+1);
if(s2 == NULL)
{
return NULL;
}
strcpy(s2, s1);
return s2;
}
*s2 is guaranteed to have enough space to store *s1, and using strcpy() saves us from having to store the strlen() result in another function to use later as the unnecessary (in this case) length parameter to strncpy(). Yet some people write this function with strncpy(), or even memcpy(), which both require a length parameter. I would like to know what people think about this. If you think strcpy() is safe in certain situations, say so. If you have a good reason not to use strcpy() in this situation, please give it - I'd like to know why it might be better to use strncpy() or memcpy() in situations like this. If you think strcpy() is okay, but not here, please explain.
Basically, I just want to know why some people use memcpy() when others use strcpy() and still others use plain strncpy(). Is there any logic to preferring one over the three (disregarding the buffer checks of the first two)?
memcpy can be faster than strcpy and strncpy because it does not have to compare each copied byte with '\0', and because it already knows the length of the copied object. It can be implemented in a similar way with the Duff's device, or use assembler instructions that copy several bytes at a time, like movsw and movsd
I'm following the rules in here. Let me quote from it
strncpy was initially introduced into the C library to deal with fixed-length name fields in structures such as directory entries. Such fields are not used in the same way as strings: the trailing null is unnecessary for a maximum-length field, and setting trailing bytes for shorter names to null assures efficient field-wise comparisons. strncpy is not by origin a ``bounded strcpy,'' and the Committee has preferred to recognize existing practice rather than alter the function to better suit it to such use.
For that reason, you will not get a trailing '\0' in a string if you hit the n not finding a '\0' from the source string so far. It's easy to misuse it (of course, if you know about that pitfall, you can avoid it). As the quote says, it wasn't designed as a bounded strcpy. And i would prefer not to use it if not necessary. In your case, clearly its use is not necessary and you proved it. Why then use it?
And generally speaking, programming code is also about reducing redundancy. If you know you have a string containing 'n' characters, why tell the copying function to copy maximal n characters? You do redundant checking. It's little about performance, but much more about consistent code. Readers will ask themselves what strcpy could do that could cross the n characters and which makes it necessary to limit the copying, just to read in manuals that this cannot happen in that case. And there the confusion start happen among readers of the code.
For the rational to use mem-, str- or strn-, i chose among them like in the above linked document:
mem- when i want to copy raw bytes, like bytes of a structure.
str- when copying a null terminated string - only when 100% no overflow could happen.
strn- when copying a null terminated string up to some length, filling the remaining bytes with zero. Probably not what i want in most cases. It's easy to forget the fact with the trailing zero-fill, but it's by design as the above quote explains. So, i would just code my own small loop that copies characters, adding a trailing '\0':
char * sstrcpy(char *dst, char const *src, size_t n) {
char *ret = dst;
while(n-- > 0) {
if((*dst++ = *src++) == '\0')
return ret;
}
*dst++ = '\0';
return ret;
}
Just a few lines that do exactly what i want. If i wanted "raw speed" i can still look out for a portable and optimized implementation that does exactly this bounded strcpy job. As always, profile first and then mess with it.
Later, C got functions for working with wide characters, called wcs- and wcsn- (for C99). I would use them likewise.
The reason why people use strncpy not strcpy is because strings are not always null terminated and it's very easy to overflow the buffer (the space you have allocated for the string with strcpy) and overwrite some unrelated bit of memory.
With strcpy this can happen, with strncpy this will never happen. That is why strcpy is considered unsafe. Evil might be a little strong.
Frankly, if you are doing much string handling in C, you should not ask yourself whether you should use strcpy or strncpy or memcpy. You should find or write a string library that provides a higher level abstraction. For example, one that keeps track of the length of each string, allocates memory for you, and provides all the string operations you need.
This will almost certainly guarantee you make very few of the kinds of mistakes usually associated with C string handling, such as buffer overflows, forgetting to terminate a string with a NUL byte, and so on.
The library might have functions such as these:
typedef struct MyString MyString;
MyString *mystring_new(const char *c_str);
MyString *mystring_new_from_buffer(const void *p, size_t len);
void mystring_free(MyString *s);
size_t mystring_len(MyString *s);
int mystring_char_at(MyString *s, size_t offset);
MyString *mystring_cat(MyString *s1, ...); /* NULL terminated list */
MyString *mystring_copy_substring(MyString *s, size_t start, size_t max_chars);
MyString *mystring_find(MyString *s, MyString *pattern);
size_t mystring_find_char(MyString *s, int c);
void mystring_copy_out(void *output, MyString *s, size_t max_chars);
int mystring_write_to_fd(int fd, MyString *s);
int mystring_write_to_file(FILE *f, MyString *s);
I wrote one for the Kannel project, see the gwlib/octstr.h file. It made life much simpler for us. On the other hand, such a library is fairly simple to write, so you might write one for yourself, even if only as an exercise.
No one has mentioned strlcpy, developed by Todd C. Miller and Theo de Raadt. As they say in their paper:
The most common misconception is that
strncpy() NUL-terminates the
destination string. This is only true,
however, if length of the source
string is less than the size
parameter. This can be problematic
when copying user input that may be of
arbitrary length into a fixed size
buffer. The safest way to use
strncpy() in this situation is to pass
it one less than the size of the
destination string, and then terminate
the string by hand. That way you are
guaranteed to always have a
NUL-terminated destination string.
There are counter-arguments for the use of strlcpy; the Wikipedia page makes note that
Drepper argues that strlcpy and
strlcat make truncation errors easier
for a programmer to ignore and thus
can introduce more bugs than they
remove.*
However, I believe that this just forces people that know what they're doing to add a manual NULL termination, in addition to a manual adjustment to the argument to strncpy. Use of strlcpy makes it much easier to avoid buffer overruns because you failed to NULL terminate your buffer.
Also note that the lack of strlcpy in glibc or Microsoft's libraries should not be a barrier to use; you can find the source for strlcpy and friends in any BSD distribution, and the license is likely friendly to your commercial/non-commercial project. See the comment at the top of strlcpy.c.
I personally am of the mindset that if the code can be proven to be valid—and done so quickly—it is perfectly acceptable. That is, if the code is simple and thus obviously correct, then it is fine.
However, your assumption seems to be that while your function is executing, no other thread will modify the string pointed to by s1. What happens if this function is interrupted after successful memory allocation (and thus the call to strlen), the string grows, and bam you have a buffer overflow condition since strcpy copies to the NULL byte.
The following might be better:
char *
strdup(const char *s1) {
int s1_len = strlen(s1);
char *s2 = malloc(s1_len+1);
if(s2 == NULL) {
return NULL;
}
strncpy(s2, s1, s1_len);
return s2;
}
Now, the string can grow through no fault of your own and you're safe. The result will not be a dup, but it won't be any crazy overflows, either.
The probability of the code you provided actually being a bug is pretty low (pretty close to non-existent, if not non-existent, if you are working in an environment that has no support for threading whatsoever). It's just something to think about.
ETA: Here is a slightly better implementation:
char *
strdup(const char *s1, int *retnum) {
int s1_len = strlen(s1);
char *s2 = malloc(s1_len+1);
if(s2 == NULL) {
return NULL;
}
strncpy(s2, s1, s1_len);
retnum = s1_len;
return s2;
}
There the number of characters is being returned. You can also:
char *
strdup(const char *s1) {
int s1_len = strlen(s1);
char *s2 = malloc(s1_len+1);
if(s2 == NULL) {
return NULL;
}
strncpy(s2, s1, s1_len);
s2[s1_len+1] = '\0';
return s2;
}
Which will terminate it with a NUL byte. Either way is better than the one that I quickly put together originally.
I agree. I would recommend against strncpy() though, since it will always pad your output to the indicated length. This is some historical decision, which I think was really unfortunate as it seriously worsens the performance.
Consider code like this:
char buf[128];
strncpy(buf, "foo", sizeof buf);
This will not write the expected four characters to buf, but will instead write "foo" followed by 125 zero characters. If you're for instance collecting a lot of short strings, this will mean your actual performance is far worse than expected.
If available, I prefer to use snprintf(), writing the above like:
snprintf(buf, sizeof buf, "foo");
If instead copying a non-constant string, it's done like this:
snprintf(buf, sizeof buf, "%s", input);
This is important, since if input contains % characters snprintf() would interpret them, opening up whole shelvefuls of cans of worms.
I think strncpy is evil too.
To truly protect yourself from programming errors of this kind, you need to make it impossible to write code that (a) looks OK, and (b) overruns a buffer.
This means you need a real string abstraction, which stores the buffer and capacity opaquely, binds them together, forever, and checks bounds. Otherwise, you end up passing strings and their capacities all over the shop. Once you get to real string ops, like modifying the middle of a string, it's almost as easy to pass the wrong length into strncpy (and especially strncat), as it is to call strcpy with a too-small destination.
Of course you might still ask whether to use strncpy or strcpy in implementing that abstraction: strncpy is safer there provided you fully grok what it does. But in string-handling application code, relying on strncpy to prevent buffer overflows is like wearing half a condom.
So, your strdup-replacement might look something like this (order of definitions changed to keep you in suspense):
string *string_dup(const string *s1) {
string *s2 = string_alloc(string_len(s1));
if (s2 != NULL) {
string_set(s2,s1);
}
return s2;
}
static inline size_t string_len(const string *s) {
return strlen(s->data);
}
static inline void string_set(string *dest, const string *src) {
// potential (but unlikely) performance issue: strncpy 0-fills dest,
// even if the src is very short. We may wish to optimise
// by switching to memcpy later. But strncpy is better here than
// strcpy, because it means we can use string_set even when
// the length of src is unknown.
strncpy(dest->data, src->data, dest->capacity);
}
string *string_alloc(size_t maxlen) {
if (maxlen > SIZE_MAX - sizeof(string) - 1) return NULL;
string *self = malloc(sizeof(string) + maxlen + 1);
if (self != NULL) {
// empty string
self->data[0] = '\0';
// strncpy doesn't NUL-terminate if it prevents overflow,
// so exclude the NUL-terminator from the capacity, set it now,
// and it can never be overwritten.
self->capacity = maxlen;
self->data[maxlen] = '\0';
}
return self;
}
typedef struct string {
size_t capacity;
char data[0];
} string;
The problem with these string abstractions is that nobody can ever agree on one (for instance whether strncpy's idiosyncrasies mentioned in comments above are good or bad, whether you need immutable and/or copy-on-write strings that share buffers when you create a substring, etc). So although in theory you should just take one off the shelf, you can end up with one per project.
I'd tend to use memcpy if I have already calculated the length, although strcpy is usually optimised to work on machine words, it feels that you should provide the library with as much information as you can, so it can use the most optimal copying mechanism.
But for the example you give, it doesn't matter - if it's going to fail, it will be in the initial strlen, so strncpy doesn't buy you anything in terms of safety (and presumbly strncpy is slower as it has to both check bounds and for nul), and any difference between memcpy and strcpy isn't worth changing code for speculatively.
The evil comes when people use it like this (although the below is super simplified):
void BadFunction(char *input)
{
char buffer[1024]; //surely this will **always** be enough
strcpy(buffer, input);
...
}
Which is a situation that happens suprising often.
But yeah, strcpy is as good as strncpy in any situation where you are allocating memory for the destination buffer and have already used strlen to find the length.
strlen finds upto last null terminating place.
But in reality buffers are not null terminated.
that's why people use different functions.
Well, strcpy() is not as evil as strdup() - at least strcpy() is part of Standard C.
In the situation you describe, strcpy is a good choice. This strdup will only get into trouble if the s1 was not ended with a '\0'.
I would add a comment indicating why there are no problems with strcpy, to prevent others (and yourself one year from now) wondering about its correctness for too long.
strncpy often seems safe, but may get you into trouble. If the source "string" is shorter than count, it pads the target with '\0' until it reaches count. That may be bad for performance. If the source string is longer than count, strncpy does not append a '\0' to the target. That is bound to get you into trouble later on when you expect a '\0' terminated "string". So strncpy should also be used with caution!
I would only use memcpy if I was not working with '\0' terminated strings, but that seems to be a matter of taste.
char *strdup(const char *s1)
{
char *s2 = malloc(strlen(s1)+1);
if(s2 == NULL)
{
return NULL;
}
strcpy(s2, s1);
return s2;
}
Problems:
s1 is unterminated, strlen causes the access of unallocated memory, program crashes.
s1 is unterminated, strlen while not causing the access of unallocated memory access memory from another part of your application. It's returned to the user (security issue) or parsed by another part of your program (heisenbug appears).
s1 is unterminated, strlen results in a malloc which the system can't satisfy, returns NULL. strcpy is passed NULL, program crashes.
s1 is unterminated, strlen results in a malloc which is very large, system allocs far too much memory to perform the task at hand, becomes unstable.
In the best case the code is inefficient, strlen requires access to every element in the string.
There are probably other problems... Look, null termination isn't always a bad idea. There are situations where, for computational efficiency, or to reduce storage requirements it makes sense.
For writing general purpose code, e.g. business logic does it make sense? No.
char* dupstr(char* str)
{
int full_len; // includes null terminator
char* ret;
char* s = str;
#ifdef _DEBUG
if (! str)
toss("arg 1 null", __WHENCE__);
#endif
full_len = strlen(s) + 1;
if (! (ret = (char*) malloc(full_len)))
toss("out of memory", __WHENCE__);
memcpy(ret, s, full_len); // already know len, so strcpy() would be slower
return ret;
}
This answer uses size_t and memcpy() for a fast and simple strdup().
Best to use type size_t as that is the type returned from strlen() and used by malloc() and memcpy(). int is not the proper type for these operations.
memcpy() is rarely slower than strcpy() or strncpy() and often significantly faster.
// Assumption: `s1` points to a C string.
char *strdup(const char *s1) {
size_t size = strlen(s1) + 1;
char *s2 = malloc(size);
if(s2 != NULL) {
memcpy(s2, s1, size);
}
return s2;
}
§7.1.1 1 "A string is a contiguous sequence of characters terminated by and including the first null character. ..."
Your code is terribly inefficient because it runs through the string twice to copy it.
Once in strlen().
Then again in strcpy().
And you don't check s1 for NULL.
Storing the length in some additional variable costs you about nothing, while running through each and every string twice to copy it is a cardinal sin.

How could this C fragment be written more safely?

I have the following C code fragment and have to identify the error and suggest a way of writing it more safely:
char somestring[] = "Send money!\n";
char *copy;
copy = (char *) malloc(strlen(somestring));
strcpy(copy, somestring);
printf(copy);
So the error is that strlen ignores the trailing '\0' of a string and therefore it is not going to be allocated enough memory for the copy but I'm not sure what they're getting at about writing it more safely?
I could just use malloc(strlen(somestring)+1)) I assume but I'm thinking there must be a better way than that?
EDIT: OK, I've accepted an answer, I suspect that the strdup solution would not be expected from us as it's not part of ANSI C. It seems to be quite a subjective question so I'm not sure if what I've accepted is actually the best. Thanks anyway for all the answers.
I can't comment on the responses above, but in addition to checking the
return code and using strncpy, you should never do:
printf(string)
But use:
printf("%s", string);
ref: http://en.wikipedia.org/wiki/Format_string_attack
char somestring[] = "Send money!\n";
char *copy = strdup(something);
if (copy == NULL) {
// error
}
or just put this logic in a separate function xstrdup:
char * xstrdup(const char *src)
{
char *copy = strdup(src);
if (copy == NULL) {
abort();
}
return copy;
}
char somestring[] = "Send money!\n";
char *copy;
size_t copysize;
copysize = strlen(somestring)+1;
copy = (char *) malloc(copysize);
if (copy == NULL)
bail("Oh noes!\n");
strncpy(copy, somestring, copysize);
printf("%s", copy);
Noted differences above:
Result of malloc() must be checked!
Compute and store the memory size!
Use strncpy() because strcpy() is naughty. In this contrived example it won't hurt, but don't get into the habit of using it.
EDIT:
To those thinking I should be using strdup()... that only works if you take the very narrowest view of the question. That's not only silly, it's overlooking an even better answer:
char somestring[] = "Send money!\n";
char *copy = somestring;
printf(copy);
If you're going to be obtuse, at least be good at it.
strlen + 1, for the \0 terminator
malloc may fail; always check malloc return value
Ick... use strdup() like everyone else said and write it yourself if you have to. Since you have time to think about this now... check out the 25 Most Dangerous Programming Errors at Mitre, then consider why the phrase printf(copy) should never appear in code. That is right up there with malloc(strlen(str)) in terms of utter badness not to mention the headache of tracking down why it causes lots of grief when copy is something like "%s%n"...
I would comment to previous solutions but I do not have enough rep.
Using strncpy here is as wrong as using strcpy(As there is absolutely no risk of overflow). There is a function called memcpy in < string.h > and it is meant exactly for this. It is not only significantly faster, but also the correct function to use to copy strings of known length in standard C.
From the accepted answer:
char somestring[] = "Send money!\n";
char *copy;
size_t copysize;
copysize = strlen(somestring)+1;
copy = (char *) malloc(copysize);
if (copy == NULL)
bail("Oh noes!\n");
memcpy(copy, somestring, copysize); /* You don't use str* functions for this! */
printf("%s", copy);
to add more to Adrian McCarthy's ways to make safer code,
Use a static code analyzer, they are very good at finding this kind of errors
Ways to make the code safer (and more correct).
Don't make an unnecessary copy. From the example, there's no apparent requirement that you actually need to copy somestring. You can output it directly.
If you have to make a copy of a string, write a function to do it (or use strdup if you have it). Then you only have to get it right in one place.
Whenever possible, initialize the pointer to the copy immediately when you declare it.
Remember to allocate space for the null terminator.
Remember to check the return value from malloc.
Remember to free the malloc'ed memory.
Don't call printf with an untrusted format string. Use printf("%s", copy) or puts(copy).
Use an object-oriented language with a string class or any language with built-in string support to avoid most of these problems.
The best way to write it more safely, if one were to be truly interested in such a thing, would be to write it in Ada.
somestring : constant string := "Send money!";
declare
copy : constant string := somestring;
begin
put_line (somestring);
end;
Same result, so what are the differences?
The whole thing is done on the stack
(no pointers). Deallocation is
automatic and safe.
Everything is automaticly range-checked so
there's no chance of buffer-overflow
exploits
Both strings are constants,
so there's no chance of screwing up
and modifying them.
It will probably be way faster than the C, not only because of the lack of dynamic allocation, but because there isn't that extra scan through the string required by strlen().
Note that in Ada "string" is not some special dynamic construct. It's the built-in array of characters. However, Ada arrays can be sized at declaration by the array you assign into them.
The safer way would be to use strncpy instead of strcpy. That function takes a third argument: the length of the string to copy. This solution doesn't stretch beyond ANSI C, so this will work under all environments (whereas other methods may only work under POSIX-compliant systems).
char somestring[] = "Send money!\n";
char *copy;
copy = (char *) malloc(strlen(somestring));
strncpy(copy, somestring, strlen(somestring));
printf(copy);

Resources