C: String manipulation adding more characters without causing a buffer overflow - c

In C I have a path in one of my strings
/home/frankv/
I now want to add the name of files that are contained in this folder - e.g. file1.txt file123.txt etc.
Having declared my variable either like this
char pathToFile[strlen("/home/frankv/")+1];
or
char *pathToFile = malloc(strlen("/home/frankv/")+1);
My problem is that I cannot simply add more characters because it would cause a buffer overflow. Also, what do I do in case I do not know how long the filenames will be?
I've really gotten used to PHP's lazy $string1.$string2. What is the easiest way to do this in C?

If you've allocated a buffer with malloc(), you can use realloc() to expand it:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
char *buf;
const char s1[] = "hello";
const char s2[] = ", world";
buf = malloc(sizeof s1);
strcpy(buf, s1);
buf = realloc(buf, sizeof s1 + sizeof s2 - 1);
strcat(buf, s2);
puts(buf);
return 0;
}
NOTE: I have omitted error checking. You shouldn't. Always check whether malloc() returns a null pointer; if it does, take some corrective action, even if it's just terminating the program. Likewise for realloc(). And if you want to be able to recover from a realloc() failure, store the result in a temporary so you don't clobber your original pointer.
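To make the advice about the temporary pointer concrete, here is a minimal sketch with the error checking left in. The helper name append_str and its return convention are made up for illustration; they are not part of the standard library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: appends suffix to the heap string *dst.
 * *dst must be NULL or a string obtained from malloc/realloc.
 * Returns 0 on success, -1 on allocation failure, in which case
 * *dst is left untouched and still valid. */
int append_str(char **dst, const char *suffix)
{
    assert(dst != NULL && suffix != NULL);

    size_t old_len = (*dst != NULL) ? strlen(*dst) : 0;
    size_t add_len = strlen(suffix);

    /* Keep the result in a temporary so a failure does not leak *dst. */
    char *tmp = realloc(*dst, old_len + add_len + 1);
    if (tmp == NULL)
        return -1;

    memcpy(tmp + old_len, suffix, add_len + 1); /* copies the '\0' too */
    *dst = tmp;
    return 0;
}
```

Starting from char *s = NULL, calling append_str(&s, "/home/frankv/") and then append_str(&s, "file1.txt") leaves s pointing at the joined path; the caller still has to free(s).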

Use std::string, if possible. Else, reallocate another block of memory and use strcpy and strcat.

You have a couple options, but, if you want to do this dynamically using no additional libraries, realloc() is the stdlib function you're looking for:
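char *pathToFile = malloc(strlen("/home/frankv/")+1);
if (!pathToFile) abort();
strcpy(pathToFile, "/home/frankv/"); /* initialize before calling strlen on it */
const char *string_to_add = "filename.txt";
char *p = realloc(pathToFile, strlen(pathToFile) + strlen(string_to_add) + 1);
if (!p) abort();
pathToFile = p;
strcat(pathToFile, string_to_add);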
Note: you should always assign the result of realloc to a new pointer first, as realloc() returns NULL on failure. If you assign to the original pointer, you are begging for a memory leak.
If you're going to be doing much string manipulation, though, you may want to consider using a string library. Two I've found useful are bstring and ustr.

In case you can use C++, use std::string. If you must use pure C, use what's called doubling: when you run out of space in the string, double the memory and copy the string into the new memory. You'll have to use the second syntax:
char *pathToFile = malloc(strlen("/home/frankv/")+1);
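A rough sketch of the doubling idea, assuming a small struct that remembers the current length and capacity. The names struct strbuf and strbuf_append, and the starting capacity of 16, are invented for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; initialise with {NULL, 0, 0}. */
struct strbuf {
    char  *data;  /* NUL-terminated once anything is appended */
    size_t len;   /* characters stored, excluding the '\0' */
    size_t cap;   /* bytes allocated */
};

/* Appends s, doubling the capacity until it fits.
 * Returns 0 on success, -1 if memory runs out (buffer unchanged). */
int strbuf_append(struct strbuf *b, const char *s)
{
    assert(b != NULL && s != NULL);

    size_t add = strlen(s);
    size_t need = b->len + add + 1;

    if (need > b->cap) {
        size_t new_cap = b->cap ? b->cap : 16;
        while (new_cap < need)
            new_cap *= 2;
        char *tmp = realloc(b->data, new_cap);
        if (tmp == NULL)
            return -1;
        b->data = tmp;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, s, add + 1); /* includes the '\0' */
    b->len += add;
    return 0;
}
```

Because the capacity grows geometrically, a long run of small appends triggers only a handful of reallocations. The sketch does not guard against size_t overflow for enormous strings.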

You have chosen the wrong language for manipulating strings!
The easy and conventional way out is to do something like:
#define MAX_PATH 260
char pathToFile[MAX_PATH+1] = "/home/frankv/";
strcat(pathToFile, "wibble/");
Of course, this is error prone - if the resulting string exceeds MAX_PATH characters, anything can happen, and it is this sort of programming which is the route many trojans and worms use to penetrate security (by corrupting memory in a carefully defined way). Hence my deliberate choice of 260 for MAX_PATH, which is what it used to be in Windows - you can still make Windows Explorer do strange things to your files with paths over 260 characters, possibly because of code like this!
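strncat may be a small help - its third argument caps how many characters are appended, so if you pass the space remaining in the destination (its size, minus the current length, minus one for the terminator), it won't write beyond the buffer.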
To do it robustly you need a string library which does variable length strings correctly. But I don't know if there is such a thing for C (C++ is a different matter, of course).
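One detail worth spelling out about strncat: its third argument is the maximum number of characters to append, not the total size of the destination, so the remaining room has to be computed by hand. A small sketch, with append_bounded being a name made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends as much of src as fits into dest,
 * which is an array of dest_size bytes already holding a string.
 * Returns 1 if all of src fit, 0 if it had to be truncated. */
int append_bounded(char *dest, size_t dest_size, const char *src)
{
    assert(dest != NULL && src != NULL && dest_size > 0);

    size_t used = strlen(dest);
    assert(used < dest_size);            /* dest must be a valid string */
    size_t room = dest_size - used - 1;  /* keep one byte for the '\0' */

    /* strncat appends at most room characters and then a '\0'. */
    strncat(dest, src, room);
    return strlen(src) <= room;
}
```

Silent truncation of a path is rarely what you want, which is why the sketch reports it to the caller instead of hiding it.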

Related

Is this a good way to use strcpy? [closed]

I try to use strcpy; is it a good way to do so?
char *stringvar;
stringvar = malloc(strlen("mi string") + 1);
strcpy(stringvar, "mi string");
stringvar[strlen("mi string")] = '\0';
What do you think?
There is exactly one bug in it: You don't check for malloc-failure.
Aside from that, it's repetitive and error-prone.
And overwriting the terminator with the terminator is useless.
Also, the repeated recalculation of the string-length is expensive.
Anyway, as you already have to determine the length, prefer memcpy() over strcpy().
What you should do is extract it into a function; let's call it strdup() (that is the name POSIX and the next C standard give it):
char* strdup(const char* s) {
size_t n = strlen(s) + 1;
char* r = malloc(n);
if (r)
memcpy(r, s, n);
return r;
}
char* stringvar = strdup("mi string");
if (!stringvar)
handle_error();
You don't need the last line
stringvar[strlen("mi string")] = '\0';
strcpy takes care of that for you.
In real code you absolutely must check malloc's return value to make sure it's not NULL.
Other than that your code is fine. In particular, you've got the vital + 1 in the call to malloc. strlen gives you the length of the string not including the terminating '\0' character, but strcpy is going to add it, so you absolutely need to allocate space for it.
The problem with strcpy -- the fatal problem, some people say -- is that at the moment you call strcpy, you have no way of telling strcpy how big the destination array is. It's your responsibility to make the array big enough, and avoid overflow. strcpy is unable, by itself, to prevent buffer overflow -- and of course if the destination does overflow, that's a big problem.
So then the question is, how can you ensure -- absolutely ensure -- that all the calls to strcpy in all the code you write are correct? And how can you ensure that later, someone modifying your program won't accidentally mess things up?
Basically, if you use strcpy at all, you want to arrange that two things are right next to each other:
the code that arranges that the pointer variable points to enough space for the string you're about to copy into it, and
the actual call to strcpy that copies the string into that pointer.
So your code
stringvar = malloc(strlen("mi string") + 1);
strcpy(stringvar, "mi string");
comes pretty close to that ideal.
I know your code is only an example, but it does let us explore the concern, what if later, someone modifying your program accidentally messes things up? What if someone changes it to
stringvar = malloc(strlen("mi string") + 1);
strcpy(stringvar, "mi string asombroso");
Obviously we've got a problem. So to make absolutely sure that there's room for the string we're copying, it's even better, I think, if the string we're copying is in a variable, so it's patently obvious that the string we're copying is the same string we allocated space for.
So here's my improved version of your code:
char *inputstring = "mi string";
char *stringvar = malloc(strlen(inputstring) + 1);
if(stringvar == NULL) {
fprintf(stderr, "out of memory\n");
exit(1);
}
strcpy(stringvar, inputstring);
(Unfortunately, the check for a NULL return from malloc gets in the way of the goal of having the strcpy call right next to the malloc call.)
Basically your code is an implementation of the C library function strdup, which takes a string you give it and returns a copy in malloc'ed memory.
One more thing. You were worried about the + 1 in the call to malloc, and as I said, it's correct. A pretty common error is
stringvar = malloc(strlen(inputstring));
strcpy(stringvar, inputstring);
This fails to allocate space for the \0, so when strcpy adds the \0 it overflows the allocated space, so it's a problem.
And with that said, make sure you don't accidentally write
stringvar = malloc(strlen("mi string" + 1));
Do you see the error? This is a surprisingly easy mistake to make, but obviously it doesn't do what you want it to do.
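For anyone who wants to check the difference rather than spot it by eye, here is a small sketch comparing the two spellings. The function names size_right and size_wrong are invented for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Correct: room for every character plus the terminator. */
size_t size_right(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}

/* Misplaced parenthesis: s + 1 points at the second character, so
 * strlen sees a string that is one character shorter. Requires a
 * non-empty s, otherwise s + 1 points past the terminator. */
size_t size_wrong(const char *s)
{
    assert(s != NULL && s[0] != '\0');
    return strlen(s + 1);
}
```

For "mi string" (9 characters) the first gives 10 and the second gives 8, so the misplaced parenthesis leaves the allocation two bytes short of what strcpy will write.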
There are some issues with the code posted:
you do not check if malloc() succeeded: if malloc() fails and returns NULL, passing this null pointer to strcpy has undefined behavior.
the last statement stringvar[strlen("mi string")] = '\0'; is useless: strcpy does copy the null terminator at the end of the source string, making the destination array a proper C string.
Here is a corrected version:
#include <stdlib.h>
#include <string.h>
...
char *stringvar;
if ((stringvar = malloc(strlen("mi string") + 1)) != NULL)
strcpy(stringvar, "mi string");
Note that it would be slightly better to store the allocation size and not use strcpy:
char *stringvar;
size_t size = strlen("mi string") + 1;
if ((stringvar = malloc(size)) != NULL)
memcpy(stringvar, "mi string", size);
Indeed it would be even simpler and safer to use strdup(), available on POSIX compliant systems, that performs exactly the above steps:
char *stringvar = strdup("mi string");

Confusion in "strcat function in C assumes the destination string is large enough to hold contents of source string and its own."

So I read that strcat function is to be used carefully as the destination string should be large enough to hold contents of its own and source string. And it was true for the following program that I wrote:
#include <stdio.h>
#include <string.h>
int main(){
char *src, *dest;
printf("Enter Source String : ");
fgets(src, 10, stdin);
printf("Enter destination String : ");
fgets(dest, 20, stdin);
strcat(dest, src);
printf("Concatenated string is %s", dest);
return 0;
}
But not true for the one that I wrote here:
#include <stdio.h>
#include <string.h>
int main(){
char src[11] = "Hello ABC";
char dest[15] = "Hello DEFGIJK";
strcat(dest, src);
printf("concatenated string %s", dest);
getchar();
return 0;
}
This program ends up adding both without considering that destination string is not large enough. Why is it so?
The strcat function has no way of knowing exactly how long the destination buffer is, so it assumes that the buffer passed to it is large enough. If it's not, you invoke undefined behavior by writing past the end of the buffer. That's what's happening in the second piece of code.
The first piece of code is also invalid because both src and dest are uninitialized pointers. When you pass them to fgets, it reads whatever garbage value they contain, treats it as a valid address, then tries to write values to that invalid address. This is also undefined behavior.
One of the things that makes C fast is that it doesn't check to make sure you follow the rules. It just tells you the rules and assumes that you follow them, and if you don't, bad things may or may not happen. In your particular case it appeared to work but there's no guarantee of that.
For example, when I ran your second piece of code it also appeared to work. But if I changed it to this:
#include <stdio.h>
#include <string.h>
int main(){
char dest[15] = "Hello DEFGIJK";
strcat(dest, "Hello ABC XXXXXXXXXX");
printf("concatenated string %s", dest);
return 0;
}
The program crashes.
I think your confusion is not actually about the definition of strcat. Your real confusion is that you assumed that the C compiler would enforce all the "rules". That assumption is quite false.
Yes, the first argument to strcat must be a pointer to memory sufficient to store the concatenated result. In both of your programs, that requirement is violated. You may be getting the impression, from the lack of error messages in either program, that perhaps the rule isn't what you thought it was, that somehow it's valid to call strcat even when the first argument is not a pointer to enough memory. But no, that's not the case: calling strcat when there's not enough memory is definitely wrong. The fact that there were no error messages, or that one or both programs appeared to "work", proves nothing.
Here's an analogy. (You may even have had this experience when you were a child.) Suppose your mother tells you not to run across the street, because you might get hit by a car. Suppose you run across the street anyway, and do not get hit by a car. Do you conclude that your mother's advice was incorrect? Is this a valid conclusion?
In summary, what you read was correct: strcat must be used carefully. But let's rephrase that: you must be careful when calling strcat. If you're not careful, all sorts of things can go wrong, without any warning. In fact, many style guides recommend not using functions such as strcat at all, because they're so easy to misuse if you're careless. (Functions such as strcat can be used perfectly safely as long as you're careful -- but of course not all programmers are sufficiently careful.)
The strcat() function is indeed to be used carefully because it doesn't protect you from anything. If the source string isn't null-terminated, the destination string isn't null-terminated, or the destination string doesn't have enough space, strcat will still copy data. Therefore, it is easy to overwrite data you didn't mean to overwrite. It is your responsibility to make sure you have enough space. Using strncat() instead of strcat will also give you some extra safety, as long as the count you pass reflects the space left in the destination.
Edit Here's an example:
#include <stdio.h>
#include <string.h>
int main()
{
char s1[16] = {0};
char s2[16] = {0};
strcpy(s2, "0123456789abcdefOOPS WAY TOO LONG");
/* ^^^ purposefully copy too much data into s2 */
printf("-%s-\n",s1);
return 0;
}
I never assigned to s1, so the output should ideally be --. However, because of how the compiler happened to arrange s1 and s2 in memory, the output I actually got was -OOPS WAY TOO LONG-. The strcpy(s2,...) overwrote the contents of s1 as well.
On gcc, -Wall or -Wstringop-overflow will help you detect situations like this one, where the compiler knows the size of the source string. However, in general, the compiler can't know how big your data will be. Therefore, you have to write code that makes sure you don't copy more than you have room for.
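As a sketch of what "code that makes sure you don't copy more than you have room for" can look like, the helper below refuses to concatenate unless the result fits. The name concat_checked and the 0/-1 return convention are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends src to the string in dest only if the
 * result, including its '\0', fits in dest_size bytes.
 * Returns 0 on success, -1 (dest unchanged) if it would not fit. */
int concat_checked(char *dest, size_t dest_size, const char *src)
{
    assert(dest != NULL && src != NULL);

    size_t dest_len = strlen(dest);
    size_t src_len = strlen(src);

    /* Need dest_len + src_len + 1 <= dest_size, written without overflow. */
    if (dest_len >= dest_size || src_len >= dest_size - dest_len)
        return -1;

    memcpy(dest + dest_len, src, src_len + 1);
    return 0;
}
```

With the arrays from the question, a 15-byte dest holding "Hello DEFGIJK" is rejected when "Hello ABC" is appended, while a 23-byte dest is exactly large enough.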
Both snippets invoke undefined behavior - the first because src and dest are not initialized to point anywhere meaningful, and the second because you are writing past the end of the array.
C does not enforce any kind of bounds checking on array accesses - you won't get an "Index out of range" exception if you try to write past the end of an array. You may get a runtime error if you try to access past a page boundary or clobber something important like the frame pointer, but otherwise you just risk corrupting data in your program.
Yes, you are responsible for making sure the target buffer is large enough for the final string. Otherwise the results are unpredictable.
I'd like to point out what is actually happening in the 2nd program in order to illustrate the problem.
It allocates 15 bytes at the memory location starting at dest and copies 14 bytes into it (including the null terminator):
char dest[15] = "Hello DEFGIJK";
...and 11 bytes at src with 10 bytes copied into it:
char src[11] = "Hello ABC";
The strcat() call then copies 10 bytes (9 chars plus the null terminator) from src into dest, starting right after the 'K' in dest. The resulting string at dest will be 23 bytes long including the null terminator. The problem is, you allocated only 15 bytes at dest, and the memory adjacent to that memory will be overwritten, i.e. corrupted, leading to program instability, wrong results, data corruption, etc.
Note that the strcat() function knows nothing about the amount of memory you've allocated at dest (or src, for that matter). It is up to you to make sure you've allocated enough memory at dest to prevent memory corruption.
By the way, the first program doesn't allocate memory at dest or src at all, so your calls to fgets() are corrupting memory starting at those locations.

C strcpy() - evil?

Some people seem to think that C's strcpy() function is bad or evil. While I admit that it's usually better to use strncpy() in order to avoid buffer overflows, the following (an implementation of the strdup() function for those not lucky enough to have it) safely uses strcpy() and should never overflow:
char *strdup(const char *s1)
{
char *s2 = malloc(strlen(s1)+1);
if(s2 == NULL)
{
return NULL;
}
strcpy(s2, s1);
return s2;
}
*s2 is guaranteed to have enough space to store *s1, and using strcpy() saves us from having to store the strlen() result in another variable to use later as the unnecessary (in this case) length parameter to strncpy(). Yet some people write this function with strncpy(), or even memcpy(), which both require a length parameter. I would like to know what people think about this. If you think strcpy() is safe in certain situations, say so. If you have a good reason not to use strcpy() in this situation, please give it - I'd like to know why it might be better to use strncpy() or memcpy() in situations like this. If you think strcpy() is okay, but not here, please explain.
Basically, I just want to know why some people use memcpy() when others use strcpy() and still others use plain strncpy(). Is there any logic to preferring one of the three (disregarding the buffer checks of the first two)?
memcpy can be faster than strcpy and strncpy because it does not have to compare each copied byte with '\0', and because it already knows the length of the copied object. It can be implemented in a similar way with Duff's device, or use assembler instructions that copy several bytes at a time, like movsw and movsd.
I'm following the rules in here. Let me quote from it
strncpy was initially introduced into the C library to deal with fixed-length name fields in structures such as directory entries. Such fields are not used in the same way as strings: the trailing null is unnecessary for a maximum-length field, and setting trailing bytes for shorter names to null assures efficient field-wise comparisons. strncpy is not by origin a ``bounded strcpy,'' and the Committee has preferred to recognize existing practice rather than alter the function to better suit it to such use.
For that reason, you will not get a trailing '\0' in a string if you hit the n not finding a '\0' from the source string so far. It's easy to misuse it (of course, if you know about that pitfall, you can avoid it). As the quote says, it wasn't designed as a bounded strcpy. And i would prefer not to use it if not necessary. In your case, clearly its use is not necessary and you proved it. Why then use it?
And generally speaking, programming is also about reducing redundancy. If you know you have a string containing 'n' characters, why tell the copying function to copy at most n characters? You do redundant checking. It's little about performance, but much more about consistent code. Readers will ask themselves what strcpy could do that would exceed the n characters and make it necessary to limit the copying, only to read in the manuals that this cannot happen in that case. And that is where confusion starts among readers of the code.
For the rationale to use mem-, str- or strn-, i chose among them like in the above linked document:
mem- when i want to copy raw bytes, like bytes of a structure.
str- when copying a null terminated string - only when 100% no overflow could happen.
strn- when copying a null terminated string up to some length, filling the remaining bytes with zero. Probably not what i want in most cases. It's easy to forget the fact with the trailing zero-fill, but it's by design as the above quote explains. So, i would just code my own small loop that copies characters, adding a trailing '\0':
char * sstrcpy(char *dst, char const *src, size_t n) {
char *ret = dst;
while(n-- > 0) {
if((*dst++ = *src++) == '\0')
return ret;
}
*dst++ = '\0';
return ret;
}
Just a few lines that do exactly what i want. If i wanted "raw speed" i can still look out for a portable and optimized implementation that does exactly this bounded strcpy job. As always, profile first and then mess with it.
Later, C got functions for working with wide characters, called wcs- and wcsn- (for C99). I would use them likewise.
The reason why people use strncpy not strcpy is because strings are not always null terminated and it's very easy to overflow the buffer (the space you have allocated for the string with strcpy) and overwrite some unrelated bit of memory.
With strcpy this can happen, with strncpy this will never happen. That is why strcpy is considered unsafe. Evil might be a little strong.
Frankly, if you are doing much string handling in C, you should not ask yourself whether you should use strcpy or strncpy or memcpy. You should find or write a string library that provides a higher level abstraction. For example, one that keeps track of the length of each string, allocates memory for you, and provides all the string operations you need.
This will almost certainly guarantee you make very few of the kinds of mistakes usually associated with C string handling, such as buffer overflows, forgetting to terminate a string with a NUL byte, and so on.
The library might have functions such as these:
typedef struct MyString MyString;
MyString *mystring_new(const char *c_str);
MyString *mystring_new_from_buffer(const void *p, size_t len);
void mystring_free(MyString *s);
size_t mystring_len(MyString *s);
int mystring_char_at(MyString *s, size_t offset);
MyString *mystring_cat(MyString *s1, ...); /* NULL terminated list */
MyString *mystring_copy_substring(MyString *s, size_t start, size_t max_chars);
MyString *mystring_find(MyString *s, MyString *pattern);
size_t mystring_find_char(MyString *s, int c);
void mystring_copy_out(void *output, MyString *s, size_t max_chars);
int mystring_write_to_fd(int fd, MyString *s);
int mystring_write_to_file(FILE *f, MyString *s);
I wrote one for the Kannel project, see the gwlib/octstr.h file. It made life much simpler for us. On the other hand, such a library is fairly simple to write, so you might write one for yourself, even if only as an exercise.
No one has mentioned strlcpy, developed by Todd C. Miller and Theo de Raadt. As they say in their paper:
The most common misconception is that strncpy() NUL-terminates the destination string. This is only true, however, if length of the source string is less than the size parameter. This can be problematic when copying user input that may be of arbitrary length into a fixed size buffer. The safest way to use strncpy() in this situation is to pass it one less than the size of the destination string, and then terminate the string by hand. That way you are guaranteed to always have a NUL-terminated destination string.
There are counter-arguments for the use of strlcpy; the Wikipedia page makes note that
Drepper argues that strlcpy and strlcat make truncation errors easier for a programmer to ignore and thus can introduce more bugs than they remove.
However, I believe that this just forces people that know what they're doing to add a manual NUL termination, in addition to a manual adjustment to the argument to strncpy. Use of strlcpy makes it much easier to avoid buffer overruns because you failed to NUL-terminate your buffer.
Also note that the lack of strlcpy in glibc or Microsoft's libraries should not be a barrier to use; you can find the source for strlcpy and friends in any BSD distribution, and the license is likely friendly to your commercial/non-commercial project. See the comment at the top of strlcpy.c.
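For readers who have not met the interface, here is a sketch of a strlcpy-style copy. It is not the BSD source; it is written under the made-up name copy_strl so it cannot clash with a system strlcpy, and it only mirrors the documented contract (always terminate, return the length of the source).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of a strlcpy-style copy under a hypothetical name.
 * Copies at most size - 1 characters and always terminates dst
 * when size > 0. Returns strlen(src); a result >= size means the
 * copy was truncated. */
size_t copy_strl(char *dst, const char *src, size_t size)
{
    assert(dst != NULL && src != NULL);

    size_t src_len = strlen(src);
    if (size > 0) {
        size_t n = (src_len < size) ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

The return value is what answers the truncation objection: a caller that compares it against the buffer size can tell that the copy was cut short, but only if it actually checks.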
I personally am of the mindset that if the code can be proven to be valid—and done so quickly—it is perfectly acceptable. That is, if the code is simple and thus obviously correct, then it is fine.
However, your assumption seems to be that while your function is executing, no other thread will modify the string pointed to by s1. What happens if this function is interrupted after successful memory allocation (and thus the call to strlen) and the string grows? Bam, you have a buffer overflow condition, since strcpy copies up to the NUL byte.
The following might be better:
char *
strdup(const char *s1) {
int s1_len = strlen(s1);
char *s2 = malloc(s1_len+1);
if(s2 == NULL) {
return NULL;
}
strncpy(s2, s1, s1_len);
return s2;
}
Now, the string can grow through no fault of your own and you're safe. The result will not be a dup, but it won't be any crazy overflows, either.
The probability of the code you provided actually being a bug is pretty low (pretty close to non-existent, if not non-existent, if you are working in an environment that has no support for threading whatsoever). It's just something to think about.
ETA: Here is a slightly better implementation:
char *
strdup(const char *s1, int *retnum) {
int s1_len = strlen(s1);
char *s2 = malloc(s1_len+1);
if(s2 == NULL) {
return NULL;
}
strncpy(s2, s1, s1_len);
*retnum = s1_len; /* store the length through the pointer */
return s2;
}
There the number of characters is being returned. You can also:
char *
strdup(const char *s1) {
int s1_len = strlen(s1);
char *s2 = malloc(s1_len+1);
if(s2 == NULL) {
return NULL;
}
strncpy(s2, s1, s1_len);
s2[s1_len] = '\0'; /* last valid index of the s1_len+1 byte buffer */
return s2;
}
Which will terminate it with a NUL byte. Either way is better than the one that I quickly put together originally.
I agree. I would recommend against strncpy() though, since it will always pad your output to the indicated length. This is some historical decision, which I think was really unfortunate as it seriously worsens the performance.
Consider code like this:
char buf[128];
strncpy(buf, "foo", sizeof buf);
This will not write the expected four characters to buf, but will instead write "foo" followed by 125 zero characters. If you're for instance collecting a lot of short strings, this will mean your actual performance is far worse than expected.
If available, I prefer to use snprintf(), writing the above like:
snprintf(buf, sizeof buf, "foo");
If instead copying a non-constant string, it's done like this:
snprintf(buf, sizeof buf, "%s", input);
This is important: if input were passed as the format string itself, snprintf() would interpret any % characters it contains, opening up whole shelvefuls of cans of worms.
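snprintf() also tells you when the text did not fit, via its return value. A minimal sketch, assuming a C99-conforming snprintf; the helper name copy_snprintf and its 0/1/-1 return codes are invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies input into buf via snprintf.
 * Returns 0 if it fit, 1 if it was truncated, -1 on an output error. */
int copy_snprintf(char *buf, size_t size, const char *input)
{
    assert(buf != NULL && input != NULL && size > 0);

    /* snprintf returns the length the full result would have had. */
    int n = snprintf(buf, size, "%s", input);
    if (n < 0)
        return -1;
    return (size_t)n >= size;
}
```

Either way buf ends up NUL-terminated, and unlike strncpy() nothing beyond the terminator is written.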
I think strncpy is evil too.
To truly protect yourself from programming errors of this kind, you need to make it impossible to write code that (a) looks OK, and (b) overruns a buffer.
This means you need a real string abstraction, which stores the buffer and capacity opaquely, binds them together, forever, and checks bounds. Otherwise, you end up passing strings and their capacities all over the shop. Once you get to real string ops, like modifying the middle of a string, it's almost as easy to pass the wrong length into strncpy (and especially strncat), as it is to call strcpy with a too-small destination.
Of course you might still ask whether to use strncpy or strcpy in implementing that abstraction: strncpy is safer there provided you fully grok what it does. But in string-handling application code, relying on strncpy to prevent buffer overflows is like wearing half a condom.
So, your strdup-replacement might look something like this (order of definitions changed to keep you in suspense):
string *string_dup(const string *s1) {
string *s2 = string_alloc(string_len(s1));
if (s2 != NULL) {
string_set(s2,s1);
}
return s2;
}
static inline size_t string_len(const string *s) {
return strlen(s->data);
}
static inline void string_set(string *dest, const string *src) {
// potential (but unlikely) performance issue: strncpy 0-fills dest,
// even if the src is very short. We may wish to optimise
// by switching to memcpy later. But strncpy is better here than
// strcpy, because it means we can use string_set even when
// the length of src is unknown.
strncpy(dest->data, src->data, dest->capacity);
}
string *string_alloc(size_t maxlen) {
if (maxlen > SIZE_MAX - sizeof(string) - 1) return NULL;
string *self = malloc(sizeof(string) + maxlen + 1);
if (self != NULL) {
// empty string
self->data[0] = '\0';
// strncpy doesn't NUL-terminate if it prevents overflow,
// so exclude the NUL-terminator from the capacity, set it now,
// and it can never be overwritten.
self->capacity = maxlen;
self->data[maxlen] = '\0';
}
return self;
}
typedef struct string {
size_t capacity;
char data[]; // C99 flexible array member (data[0] is a GNU extension)
} string;
The problem with these string abstractions is that nobody can ever agree on one (for instance whether strncpy's idiosyncrasies mentioned in comments above are good or bad, whether you need immutable and/or copy-on-write strings that share buffers when you create a substring, etc). So although in theory you should just take one off the shelf, you can end up with one per project.
I'd tend to use memcpy if I have already calculated the length. Although strcpy is usually optimised to work on machine words, it feels that you should provide the library with as much information as you can, so it can use the most optimal copying mechanism.
But for the example you give, it doesn't matter - if it's going to fail, it will be in the initial strlen, so strncpy doesn't buy you anything in terms of safety (and presumably strncpy is slower as it has to both check bounds and for nul), and any difference between memcpy and strcpy isn't worth changing code for speculatively.
The evil comes when people use it like this (although the below is super simplified):
void BadFunction(char *input)
{
char buffer[1024]; //surely this will **always** be enough
strcpy(buffer, input);
...
}
Which is a situation that happens surprisingly often.
But yeah, strcpy is as good as strncpy in any situation where you are allocating memory for the destination buffer and have already used strlen to find the length.
strlen scans up to the first null terminator.
But in reality buffers are not always null-terminated.
That's why people use different functions.
Well, strcpy() is not as evil as strdup() - at least strcpy() is part of Standard C.
In the situation you describe, strcpy is a good choice. This strdup will only get into trouble if the s1 was not ended with a '\0'.
I would add a comment indicating why there are no problems with strcpy, to prevent others (and yourself one year from now) wondering about its correctness for too long.
strncpy often seems safe, but may get you into trouble. If the source "string" is shorter than count, it pads the target with '\0' until it reaches count. That may be bad for performance. If the source string is longer than count, strncpy does not append a '\0' to the target. That is bound to get you into trouble later on when you expect a '\0' terminated "string". So strncpy should also be used with caution!
I would only use memcpy if I was not working with '\0' terminated strings, but that seems to be a matter of taste.
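If strncpy is used anyway, the usual way around the missing terminator is to copy one byte less than the buffer holds and write the '\0' yourself. A short sketch; copy_terminated is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: strncpy followed by explicit termination.
 * dst must have room for size bytes, and size must be at least 1. */
void copy_terminated(char *dst, const char *src, size_t size)
{
    assert(dst != NULL && src != NULL && size > 0);

    strncpy(dst, src, size - 1); /* may leave dst without a '\0' */
    dst[size - 1] = '\0';        /* so always write one ourselves */
}
```

The zero padding is still there: copying "a" into a 6-byte buffer writes all six bytes, which is the performance cost mentioned above.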
char *strdup(const char *s1)
{
char *s2 = malloc(strlen(s1)+1);
if(s2 == NULL)
{
return NULL;
}
strcpy(s2, s1);
return s2;
}
Problems:
s1 is unterminated, strlen causes the access of unallocated memory, program crashes.
s1 is unterminated, strlen doesn't touch unallocated memory but reads memory belonging to another part of your application. That data is returned to the user (security issue) or parsed by another part of your program (heisenbug appears).
s1 is unterminated, strlen results in a malloc which the system can't satisfy, returns NULL. strcpy is passed NULL, program crashes.
s1 is unterminated, strlen results in a malloc which is very large, system allocs far too much memory to perform the task at hand, becomes unstable.
In the best case the code is inefficient, strlen requires access to every element in the string.
There are probably other problems... Look, null termination isn't always a bad idea. There are situations where, for computational efficiency, or to reduce storage requirements it makes sense.
For writing general-purpose code, e.g. business logic, does it make sense? No.
char* dupstr(char* str)
{
int full_len; // includes null terminator
char* ret;
char* s = str;
#ifdef _DEBUG
if (! str)
toss("arg 1 null", __WHENCE__);
#endif
full_len = strlen(s) + 1;
if (! (ret = (char*) malloc(full_len)))
toss("out of memory", __WHENCE__);
memcpy(ret, s, full_len); // already know len, so strcpy() would be slower
return ret;
}
This answer uses size_t and memcpy() for a fast and simple strdup().
Best to use type size_t as that is the type returned from strlen() and used by malloc() and memcpy(). int is not the proper type for these operations.
memcpy() is rarely slower than strcpy() or strncpy() and often significantly faster.
// Assumption: `s1` points to a C string.
char *strdup(const char *s1) {
size_t size = strlen(s1) + 1;
char *s2 = malloc(size);
if(s2 != NULL) {
memcpy(s2, s1, size);
}
return s2;
}
§7.1.1 1 "A string is a contiguous sequence of characters terminated by and including the first null character. ..."
Your code is terribly inefficient because it runs through the string twice to copy it.
Once in strlen().
Then again in strcpy().
And you don't check s1 for NULL.
Storing the length in some additional variable costs you about nothing, while running through each and every string twice to copy it is a cardinal sin.
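One way to picture "storing the length in some additional variable" is a small struct that carries the length next to the data. This is only a sketch under my own assumptions: struct lstr, lstr_from and lstr_dup are hypothetical names, not from any library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-carrying string: len excludes the terminator. */
struct lstr {
    size_t len;
    char *data;   /* len + 1 bytes, always '\0'-terminated; NULL on failure */
};

/* Build an lstr from a C string. This is the only place strlen() runs. */
struct lstr lstr_from(const char *s)
{
    struct lstr r = { 0, NULL };
    size_t n = strlen(s);
    char *p = malloc(n + 1);
    if (p != NULL) {
        memcpy(p, s, n + 1);
        r.len = n;
        r.data = p;
    }
    return r;
}

/* Duplicate without rescanning: the stored length drives one memcpy. */
struct lstr lstr_dup(struct lstr src)
{
    struct lstr r = { 0, NULL };
    if (src.data == NULL)
        return r;
    char *p = malloc(src.len + 1);
    if (p != NULL) {
        memcpy(p, src.data, src.len + 1);
        r.len = src.len;
        r.data = p;
    }
    return r;
}
```

The caller owns data in both results and has to free() it.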

How could this C fragment be written more safely?

I have the following C code fragment and have to identify the error and suggest a way of writing it more safely:
char somestring[] = "Send money!\n";
char *copy;
copy = (char *) malloc(strlen(somestring));
strcpy(copy, somestring);
printf(copy);
So the error is that strlen ignores the trailing '\0' of a string, so not enough memory is allocated for the copy. But I'm not sure what they're getting at about writing it more safely?
I could just use malloc(strlen(somestring)+1) I assume, but I'm thinking there must be a better way than that?
EDIT: OK, I've accepted an answer, I suspect that the strdup solution would not be expected from us as it's not part of ANSI C. It seems to be quite a subjective question so I'm not sure if what I've accepted is actually the best. Thanks anyway for all the answers.
I can't comment on the responses above, but in addition to checking the return code and using strncpy, you should never do:
printf(string)
But use:
printf("%s", string);
ref: http://en.wikipedia.org/wiki/Format_string_attack
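To illustrate why the fixed "%s" format matters, here is a small sketch. render_text is a hypothetical helper; it uses snprintf instead of printf only so the result can be inspected in a buffer.

```c
#include <stdio.h>

/* Hypothetical helper: render untrusted text into out. The text is passed
 * as an argument to a fixed "%s" format, so any '%' in it is copied
 * literally and never interpreted as a conversion specifier. */
int render_text(char *out, size_t outsize, const char *text)
{
    return snprintf(out, outsize, "%s", text);
}
```

Had text been passed as the format itself, a value such as "%s%n" would make the function read and write through arguments that were never supplied.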
char somestring[] = "Send money!\n";
char *copy = strdup(somestring);
if (copy == NULL) {
// error
}
or just put this logic in a separate function xstrdup:
char * xstrdup(const char *src)
{
char *copy = strdup(src);
if (copy == NULL) {
abort();
}
return copy;
}
char somestring[] = "Send money!\n";
char *copy;
size_t copysize;
copysize = strlen(somestring)+1;
copy = (char *) malloc(copysize);
if (copy == NULL)
bail("Oh noes!\n");
strncpy(copy, somestring, copysize);
printf("%s", copy);
Noted differences above:
Result of malloc() must be checked!
Compute and store the memory size!
Use strncpy() because strcpy() is naughty. In this contrived example it won't hurt, but don't get into the habit of using it.
EDIT:
To those thinking I should be using strdup()... that only works if you take the very narrowest view of the question. That's not only silly, it's overlooking an even better answer:
char somestring[] = "Send money!\n";
char *copy = somestring;
printf(copy);
If you're going to be obtuse, at least be good at it.
strlen + 1, for the \0 terminator
malloc may fail; always check malloc return value
Ick... use strdup() like everyone else said and write it yourself if you have to. Since you have time to think about this now... check out the 25 Most Dangerous Programming Errors at Mitre, then consider why the phrase printf(copy) should never appear in code. That is right up there with malloc(strlen(str)) in terms of utter badness not to mention the headache of tracking down why it causes lots of grief when copy is something like "%s%n"...
I would comment to previous solutions but I do not have enough rep.
Using strncpy here is as wrong as using strcpy (as there is absolutely no risk of overflow). There is a function called memcpy in <string.h> and it is meant for exactly this. It is not only significantly faster, but also the correct function to use to copy strings of known length in standard C.
From the accepted answer:
char somestring[] = "Send money!\n";
char *copy;
size_t copysize;
copysize = strlen(somestring)+1;
copy = (char *) malloc(copysize);
if (copy == NULL)
bail("Oh noes!\n");
memcpy(copy, somestring, copysize); /* You don't use str* functions for this! */
printf("%s", copy);
To add to Adrian McCarthy's ways to make safer code:
Use a static code analyzer; they are very good at finding these kinds of errors.
Ways to make the code safer (and more correct).
Don't make an unnecessary copy. From the example, there's no apparent requirement that you actually need to copy somestring. You can output it directly.
If you have to make a copy of a string, write a function to do it (or use strdup if you have it). Then you only have to get it right in one place.
Whenever possible, initialize the pointer to the copy immediately when you declare it.
Remember to allocate space for the null terminator.
Remember to check the return value from malloc.
Remember to free the malloc'ed memory.
Don't call printf with an untrusted format string. Use printf("%s", copy) or puts(copy).
Use an object-oriented language with a string class or any language with built-in string support to avoid most of these problems.
The best way to write it more safely, if one were to be truly interested in such a thing, would be to write it in Ada.
somestring : constant string := "Send money!";
declare
copy : constant string := somestring;
begin
put_line (somestring);
end;
Same result, so what are the differences?
The whole thing is done on the stack (no pointers). Deallocation is automatic and safe.
Everything is automatically range-checked, so there's no chance of buffer-overflow exploits.
Both strings are constants, so there's no chance of screwing up and modifying them.
It will probably be way faster than the C, not only because of the lack of dynamic allocation, but because there isn't that extra scan through the string required by strlen().
Note that in Ada "string" is not some special dynamic construct. It's the built-in array of characters. However, Ada arrays can be sized at declaration by the array you assign into them.
The safer way would be to use strncpy instead of strcpy. That function takes a third argument: the maximum number of characters to copy. This solution doesn't stretch beyond ANSI C, so it will work in all environments (whereas other methods may only work on POSIX-compliant systems).
char somestring[] = "Send money!\n";
char *copy;
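copy = (char *) malloc(strlen(somestring) + 1); /* +1 for the '\0' terminator */
if (copy != NULL) {
strncpy(copy, somestring, strlen(somestring) + 1); /* copies the terminator too */
printf("%s", copy); /* never pass data as the format string */
}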

Make a copy of a char*

I have a function that accepts a char* as one of its parameters. I need to manipulate it, but leave the original char* intact. Essentially, I want to create a working copy of this char*. It seems like this should be easy, but I am really struggling.
My first (naive) attempt was to create another char* and set it equal to the original:
char* linkCopy = link;
This doesn't work, of course, because all I did was cause them to point to the same place.
Should I use strncpy to accomplish this?
I have tried the following, but it causes a crash:
char linkCopy[sizeof(link)] = strncpy(linkCopy, link, sizeof(link));
Am I missing something obvious...?
EDIT: My apologies, I was trying to simplify the examples, but I left some of the longer variable names in the second example. Fixed.
sizeof will give you the size of the pointer, which is often 4 or 8 bytes depending on your processor/compiler, not the size of the string pointed to. You can use strlen and strcpy:
// +1 because of '\0' at the end
char * copy = malloc(strlen(original) + 1);
strcpy(copy, original);
...
free(copy); // at the end, free it again.
I've seen some answers propose use of strdup, but that's a POSIX function, and not part of standard C (it was only added in C23).
You might want to take a look at the strdup (man strdup) function:
char *linkCopy = strdup(link);
/* Do some work here */
free(linkCopy);
Edit: And since you need it to be standard C, do as others have pointed out:
char *linkCopy = malloc(strlen(link) + 1);
/* Note that strncpy is unnecessary here since you know both the size
* of the source and destination buffers
*/
strcpy(linkCopy, link);
/* Do some work */
free(linkCopy);
Since strdup() is not in ANSI/ISO standard C, if it's not available in your compiler's runtime, go ahead and use this:
/*
** Portable, public domain strdup() originally by Bob Stout
*/
#include <stdlib.h>
#include <string.h>
char* strdup(const char* str)
{
char* newstr = (char*) malloc( strlen( str) + 1);
if (newstr) {
strcpy( newstr, str);
}
return newstr;
}
Use strdup, or strndup if you know the size (more secure).
Like:
char* new_char = strdup(original);
/* ... manipulate it ... */
free(new_char);
P.S.: These functions are not part of standard C.
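Where strndup is not available, a bounded duplicate can be sketched with memchr and memcpy. dup_bounded is a hypothetical name and this is only an approximation of what strndup does, not its actual implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for POSIX strndup(): copy at most max bytes of s
 * and always terminate the result. It never reads past s[max - 1], so s
 * does not have to be '\0'-terminated. */
char *dup_bounded(const char *s, size_t max)
{
    const char *end = memchr(s, '\0', max);       /* terminator within max? */
    size_t len = end ? (size_t)(end - s) : max;
    char *copy = malloc(len + 1);
    if (copy != NULL) {
        memcpy(copy, s, len);
        copy[len] = '\0';
    }
    return copy;   /* NULL on allocation failure; caller must free() */
}
```

The bound is what makes this "more secure": an unterminated source cannot send the copy running off the end of its buffer.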
Some answers, including the accepted one are a bit off. You do not strcpy a string you have just strlen'd. strcpy should not be used at all in modern programs.
The correct thing to do is a memcpy.
EDIT: memcpy is very likely to be faster in any architecture, strcpy can only possibly perform better for very short strings and should be avoided for security reasons even if they are not relevant in this case.
You are on the right track, you need to use strcpy/strncpy to make copies of strings. Simply assigning them just makes an "alias" of it, a different name that points to the same thing.
Your main problem in your second attempt is that you can't assign to an array that way. The second problem is you seem to have come up with some new names in the function call that I can't tell where they came from.
What you want is:
char linkCopy[sizeof(link)];
strncpy(linkCopy, chLastLink, sizeof(link));
but be careful, sizeof does not always work the way you want it to on strings. Use strlen, or use strdup.
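The sizeof caveat can be seen with two tiny functions. Both names are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Inside a function, a char * parameter is just a pointer: sizeof yields
 * the size of the pointer, not the length of the string it points to. */
size_t size_via_sizeof(const char *link)
{
    return sizeof(link);          /* typically 4 or 8 */
}

/* strlen() counts the characters; add 1 for the '\0' terminator. */
size_t size_via_strlen(const char *link)
{
    return strlen(link) + 1;
}
```

Only the second value is usable as a buffer size for a copy of the string.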
As sean.bright said, strdup() is the easiest way to deal with the copy. But strdup(), while widely available, is not standard C. This method also keeps the copied string on the heap.
char *linkCopy = strdup(link);
/* Do some work here */
free(linkCopy);
If you are committed to using a stack allocated string and strncpy() you need some changes. You wrote:
char linkCopy[sizeof(link)]
That creates a char array (aka string) on the stack that is the size of a pointer (probably 4 bytes). Your third parameter to strncpy() has the same problem. You probably want to write:
char linkCopy[strlen(link)+1];
strncpy(linkCopy,link,strlen(link)+1);
You don't say whether you can use C++ instead of C, but if you can use C++ and the STL it's even easier:
std::string newString( original );
Use newString as you would have used the C-style copy above; its semantics are identical. You don't need to free() it; it is a stack object and will be disposed of automatically.

Resources