I've spotted the following piece of C code, marked as BAD (aka buffer overflow bad).
The problem is I don't quite get why? The input string length is captured before the allocation etc.
char *my_strdup(const char *s)
{
size_t len = strlen(s) + 1;
char *c = malloc(len);
if (c) {
strcpy(c, s); // BAD
}
return c;
}
Update from comments:
The 'BAD' marker is imprecise: the code is not buggy, though it is somewhat inefficient and can look risky (see below).
Why risky? The +1 after the strlen() call is required so that the heap allocation also has room for the string terminator ('\0'); without it, strcpy() would overflow the buffer.
There is no bug in your sample function.
However, to make it obvious to future readers (both human and mechanical) that there is no bug, you should replace the strcpy call with a memcpy:
char *my_strdup(const char *s)
{
size_t len = strlen(s) + 1;
char *c = malloc(len);
if (c) {
memcpy(c, s, len);
}
return c;
}
Either way, len bytes are allocated and len bytes are copied, but with memcpy that fact stands out much more clearly to the reader.
There's no problem with this code.
While it's possible that strcpy can cause undefined behavior if the destination buffer isn't large enough to hold the string in question, the buffer is allocated to be the correct size. This means there is no risk of overrunning the buffer.
You may see some guides recommend using strncpy instead, which lets you specify the maximum number of characters to copy, but this has its own problems. If the source string is too long, only the specified number of characters is copied, and the result is not null-terminated, so you have to terminate it manually. For example:
char src[] = "test data";
char dest[5];
strncpy(dest, src, sizeof dest); // dest holds "test " with no null terminator
dest[sizeof(dest) - 1] = 0; // manually null terminate, dest holds "test"
I tend towards the use of strcpy if I know the source string will fit, otherwise I'll use strncpy and manually null-terminate.
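As a rough sketch of the "copy what fits, then terminate" pattern described above, the helper below wraps both steps in one call. The name copy_bounded and its return convention are hypothetical, chosen only for illustration; it is not a standard library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies as much of src as fits into dest and
 * always null-terminates. Returns 1 if the whole string fit, 0 if it
 * was truncated (or if dest_size is 0 and nothing could be written). */
int copy_bounded(char *dest, size_t dest_size, const char *src)
{
    if (dest_size == 0)
        return 0;

    size_t len = strlen(src);
    size_t n = (len < dest_size) ? len : dest_size - 1;

    memcpy(dest, src, n);   /* copy only the part that fits */
    dest[n] = '\0';         /* terminate in every case */
    return len < dest_size;
}
```

With char dest[5], calling copy_bounded(dest, sizeof dest, "test data") leaves "test" in dest and reports truncation through the return value.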
I cannot see any problem with the code when it comes to the use of strcpy
But you should be aware that it requires s to be a valid C string. That is a reasonable requirement, but it should be specified.
If you want, you could put in a simple check for NULL, but I would say that it's ok to do without it. If you're about to make a copy of a "string" pointed to by a null pointer, then you probably should check either the argument or the result. But if you want, just add this as the first line:
if(!s) return NULL;
But as I said, it does not add much. It just makes it possible to change
if(!str) {
// Handle error
} else {
new_str = my_strdup(str);
}
to:
new_str = my_strdup(str);
if(!new_str) {
// Handle error
}
Not really a huge gain
I'm somewhat new to C and am wondering about certain things about memory allocation. My function is as follows:
size_t count_nwords(const char* str) {
//char* copied_str = strdup(str); // because 'strtok()' alters the string it passes through
char copied_str[strlen(str)];
strcpy(copied_str, str);
size_t count = 1;
strtok(copied_str, " ");
while(strtok(NULL, " ") != 0) {
count++;
}
//free(copied_str);
return count;
}
This function counts the number of words in a string (the delimiter is a space, i.e. " "). I do not want the string passed as the argument to be modified.
I have two questions:
Should the strdup() way (which is the commented part in the code) be preferred over the strcpy() one? My understanding is that strcpy() is sufficient and faster, but I am not certain.
Since no memory is allocated for the size_t value to be returned (it's a local variable), should it be done in order to ensure the function is robust? Or is using size_t nwords = count_nwords(copied_input); completely safe and will always properly get the returned value?
Thank you!
EDIT: I've accepted the only answer that concerned my questions precisely, but I advise reading the other answers as they provide good insights regarding errors I had made in my code.
Failure to account for the null character
// char copied_str[strlen(str)];
char copied_str[strlen(str) + 1];
strcpy(copied_str, str);
Wrong algorithm
Even with above fix, code returns 1 with count_nwords(" ")
Unnecessary copying of string
strtok() not needed here. A copy of the string is not needed.
Alternative: walk the string.
size_t count_nwords(const char* str) {
size_t count = 0;
while (*str) {
while (isspace((unsigned char) *str)) {
str++;
}
if (*str) {
count++;
while (!isspace((unsigned char) *str) && *str) {
str++;
}
}
}
return count;
}
Another option is the state-loop approach where you continually loop over each character keeping track of the state of your count with a simple flag. (you are either in a word reading characters or you are reading spaces). The benefit being you have only a single loop involved. A short example would be:
size_t count_words (const char *str)
{
size_t words = 0;
int in_word = 0;
while (*str) {
if (isspace ((unsigned char)*str))
in_word = 0;
else {
if (!in_word)
words++;
in_word = 1;
}
str++;
}
return words;
}
It is worth understanding all techniques. isspace requires the inclusion of ctype.h.
Should the strdup() way (which is the commented part in the code) be preferred over the strcpy() one? My understanding is that strcpy()
is sufficient and faster, but I am not certain.
Your solution is clean and works well, so don't bother. The only point is that you are using a VLA, which has been an optional feature since C11, so strdup (a POSIX function) may actually be the more portable choice. Regarding performance, the standard does not specify how VLAs are implemented, so it may vary between compilers and platforms (gcc is known to put VLAs on the stack, but another compiler could use the heap). We only know that strdup allocates on the heap, that's all. I doubt that any performance problem will come from this choice.
Note: your allocation size is wrong and should be at least strlen(str)+1.
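For comparison, here is a minimal sketch of the heap-copy variant with the size fixed. It uses malloc and memcpy instead of strdup so that it only depends on standard C; the name count_nwords_heap and the choice to return 0 when the allocation fails are assumptions made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of count_nwords that tokenizes a heap copy,
 * so the caller's string is never modified. */
size_t count_nwords_heap(const char *str)
{
    size_t len = strlen(str) + 1;        /* +1 for the '\0' terminator */
    char *copy = malloc(len);
    if (copy == NULL)
        return 0;                        /* assumption: report 0 on failure */
    memcpy(copy, str, len);

    size_t count = 0;
    for (char *tok = strtok(copy, " "); tok != NULL; tok = strtok(NULL, " "))
        count++;                         /* one increment per token found */

    free(copy);
    return count;
}
```

Because the counter starts at 0 and is incremented once per token, a string made only of spaces yields 0 rather than 1.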
Since no memory is allocated for the size_t value to be returned (it's a local variable), should it be done in order to ensure the
function is robust? Or is using size_t nwords =
count_nwords(copied_input); completely safe and will always properly
get the returned value?
Managing return values, and the storage they need, is the compiler's job. Usually these values are transferred in a register or via the stack (have a read about "stack frames"). Where stack space is used, it is reserved for the value just before the call and released after the call, as soon as you discard or copy the returned value. Either way, the caller receives a copy, so you do not need to allocate anything yourself.
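A small sketch may make the point concrete. The function below (count_char is a hypothetical name, not from the question) returns a local size_t by value, and the caller simply stores the copy it receives.

```c
#include <stddef.h>

/* Hypothetical example: 'count' is a local variable, but 'return count'
 * hands a copy of its value to the caller, so nothing dangles. */
size_t count_char(const char *str, char wanted)
{
    size_t count = 0;
    for (; *str != '\0'; str++) {
        if (*str == wanted)
            count++;
    }
    return count;   /* the value is copied out, not the variable */
}
```

A call such as size_t n = count_char("a b c", ' '); is safe: n holds its own copy of the result after the function's local storage is gone. The situation to avoid is returning a pointer to a local array, which is a different case.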
void trim(char *line)
{
int i = 0;
char new_line[strlen(line)];
char *start_line = line;
while (*line != '\0')
{
if (*line != ' ' && *line != '\t')
{
new_line[i] = *line;
i++;
}
line++;
}
new_line[i] = '\0';
printf("%s\n", start_line);
printf("%s\n", new_line);
strcpy(start_line, new_line);
}
I really cannot find the problem here. My pointers are initialized, and I made a pointer to have the start of the string line. At the end I would like to copy the new line in the old one, so the caller has a changed value of his line.
But strcpy() makes a segmentation fault. What is wrong?
This is the code that calls trim():
char *str = "Irish People Try American Food";
printf("%s\n", str);
trim(str);
printf("%s\n", str);
You need to show the whole program; what calls "trim()"? Paul R's answer is right, you are one character short and it should be at least:
char new_line[strlen(line) + 1];
However, this will not always cause a segfault, and if it did it would probably not be at strcpy().
The likely reason strcpy(start_line, new_line) is faulting is that start_line points to the original value of line. It is likely that you are calling the function like:
int main() {
trim("blah blah\tblah");
return 0;
}
If so, line is a pointer to a constant char array that can't be modified. On many OSes this array is stored in a read-only memory area, so a write attempt causes an immediate segmentation fault. That is why strcpy() faults when it tries to write into this read-only location.
As a quick test try this:
int main() {
char test[100] = "blah blah\tblah";
trim(test);
return 0;
}
If it works, that's your specific issue with strcpy() faulting.
EDIT - the question was updated later to include the main() calling function, which confirmed that the trim function was called with a pointer to a string constant. The problem line is:
char *str = "Irish People Try American Food";
This creates a string literal, an array of 31 characters including a null terminator, which cannot be modified. The pointer str is then initialized with the address of this constant array.
The correction is to allocate a regular array of characters and then initialize it with the known string. In this case the assignment and temporary constant string literal may or may not be optimized out, but the end result is always the same - a writable array of characters initialized with the desired text:
char str[100] = "Irish People Try American Food";
/* or */
char str2[] = "American People Like Irish Beer";
/* or */
char str3[37];
strcpy(str3, "And German Beer"); /* example only, need to check length */
These create normal writable char arrays of lengths 100, 32, and 37, respectively. Each is then initialized with the given strings.
The ANSI/ISO C standard defined the language such that a string literal is an array of char that cannot be modified. This has been the case ever since it was first standardized in C89. Prior to this, string literals had commonly been writable, such as in the pre-standard K&R C of very early UNIX code.
Identical string literals of either form need not be distinct. If
the program attempts to modify a string literal of either form, the
behavior is undefined. - ANSI X3.159-1989
Many C89 and newer compilers have since placed this array into the .text or .rodata segments, where it may even be physically unwritable (ROM, read-only MMU pages, etc.), as discovered here. Compilers may also coalesce duplicate string constants into a single one to conserve space, and you wouldn't want to write into "one" of those either!
The fact that these semantically unwritable strings were still left as type char *, and that they could be assigned to and passed as such, was known to be a compromise even as the C89 standard was being drafted. That they did not use the then-brand-new type qualifier const was described as a "not-completely-happy result". See Ritchie's (DMR's) explanation.
And apparently that result still boomerangs around and whacks people upside the head nearly 30 years later.
Your new_line string is one char too small - it does not have room for the final '\0' terminator - change:
char new_line[strlen(line)];
to:
char new_line[strlen(line) + 1];
You should also be aware that string literals cannot be modified, so if you try to call your function like this:
trim("Hello world!");
then this will result in undefined behaviour. (Some compilers can warn about this if asked, for example GCC with -Wwrite-strings.)
As @PaulR stated, your new line's buffer is too small. But instead of using another buffer that takes up more space, you could use a single-character approach, like this:
void trim(char *s)
{
char *src = s, *dest = s;
while (*src)
{
if ((*src != ' ') && (*src != '\t'))
*dest++ = *src;
++src;
}
*dest = '\0';
}
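The in-place version still needs a writable buffer, so it cannot be called on a string literal. One possible way around that, sketched below, is a wrapper that makes a heap copy first and strips the copy. The names strip_blanks and strip_blanks_copy are hypothetical, and the caller is assumed to free() the result.

```c
#include <stdlib.h>
#include <string.h>

/* Same single-pass idea as above: remove spaces and tabs in place. */
void strip_blanks(char *s)
{
    char *dest = s;
    for (const char *src = s; *src != '\0'; src++) {
        if (*src != ' ' && *src != '\t')
            *dest++ = *src;
    }
    *dest = '\0';
}

/* Hypothetical wrapper: works on a writable copy, so the argument may
 * be a string literal. Returns NULL if the allocation fails. */
char *strip_blanks_copy(const char *s)
{
    size_t len = strlen(s) + 1;      /* +1 for the '\0' terminator */
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
        strip_blanks(copy);
    }
    return copy;
}
```

The stripped text is never longer than the original, so a copy of strlen(s) + 1 bytes is always large enough.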
I'm working my way in understanding pointers. I wrote this string copy functionality in C.
#include<stdio.h>
char *my_strcpy(char *dest, char *source)
{
while (*source != '\0')
{
*dest++ = *source++;
}
*dest = '\0';
return dest;
}
int main(void)
{
char* temp="temp";
char* temp1=NULL;
my_strcpy(temp1,temp);
puts(temp1);
return 0;
}
This program gives a segfault. If I change char* temp1=NULL to char* temp1, it still fails. If I change char* temp1 to char temp1[80], the code works. The code also works with char temp1[1] and prints temp, although I was expecting the output to be t. Why is that, and why do I get an error with char* temp1?
Because you're not allocating space for the destination string. You're trying to write to memory at position NULL (almost certainly 0x00).
Try char* temp1= malloc(strlen(temp)+1); or something like it. That will allocate some memory and then you can copy the characters into it. The +1 is for the trailing null character.
If you wrote Java and friends, it would prevent you from accessing memory off the end of the array. But at a language level, C lets you write to memory anywhere you want. And then crash (hopefully immediately but maybe next week). Arrays aren't strictly enforced data types, they are just conventions for allocating and referencing memory.
If you create it as char temp1[1] then you are allocating some memory on the stack. Memory near that may be accessible (you can read and write to it) but you will be scribbling over other memory intended for something else. This is a classic memory bug.
Also style: I personally advise against using the return values from ++s. It's harder to read and makes you think twice.
*dest = *source;
dest++;
source++;
Is clearer. But that's just my opinion.
You must allocate space for the destination parameter.
When you use char temp1[80], you allocate 80 bytes of memory.
You can allocate the memory as a fixed-size array, or dynamically with the malloc function.
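Both options can be sketched as follows. This is only an illustration: copy_string is a variant of the question's my_strcpy that returns the start of the destination (as the standard strcpy does), and copy_to_heap is a hypothetical helper that allocates the destination with malloc.

```c
#include <stdlib.h>
#include <string.h>

/* Copies source into dest, which must already point to enough
 * writable memory. Returns the start of dest. */
char *copy_string(char *dest, const char *source)
{
    char *start = dest;
    while (*source != '\0')
        *dest++ = *source++;
    *dest = '\0';
    return start;
}

/* Hypothetical helper: allocates exactly enough room, then copies.
 * Returns NULL on allocation failure; the caller frees the result. */
char *copy_to_heap(const char *source)
{
    char *dest = malloc(strlen(source) + 1);   /* +1 for '\0' */
    if (dest != NULL)
        copy_string(dest, source);
    return dest;
}
```

With an array, the call would be char temp1[80]; copy_string(temp1, temp);. With the heap version, it would be char *temp1 = copy_to_heap(temp); followed later by free(temp1);.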
I've tried to write a string replace function in C, which works on a char *, which has been allocated using malloc(). It's a little different in that it will find and replace strings, rather than characters in the starting string.
It's trivial to do if the search and replace strings are the same length (or the replace string is shorter than the search string), since I have enough space allocated. If I try to use realloc(), I get an error that tells me I am doing a double free - which I don't see how I am, since I am only using realloc().
Perhaps a little code will help:
void strrep(char *input, char *search, char *replace) {
int searchLen = strlen(search);
int replaceLen = strlen(replace);
int delta = replaceLen - searchLen;
char *find = input;
while (find = strstr(find, search)) {
if (delta > 0) {
realloc(input, strlen(input) + delta);
find = strstr(input, search);
}
memmove(find + replaceLen, find + searchLen, strlen(input) - (find - input));
memmove(find, replace, replaceLen);
}
}
The program works, until I try to realloc() in an instance where the replaced string will be longer than the initial string. (It still kind of works, it just spits out errors as well as the result).
If it helps, the calling code looks like:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void strrep(char *input, char *search, char *replace);
int main(void) {
char *input = malloc(81);
while ((fgets(input, 81, stdin)) != NULL) {
strrep(input, "Noel", "Christmas");
}
}
As a general rule, you should never do a free or realloc on a user provided buffer. You don't know where the user allocated the space (in your module, in another DLL) so you cannot use any of the allocation functions on a user buffer.
Provided that you now cannot do any reallocation within your function, you should change its behavior a little, like doing only one replacement, so the user will be able to compute the resulting string max length and provide you with a buffer long enough for this one replacement to occur.
Then you could create another function to do the multiple replacements, but you will have to allocate the whole space for the resulting string and copy the user input string. Then you must provide a way to delete the string you allocated.
Resulting in:
void strrep(char *input, char *search, char *replace);
char* strrepm(char *input, char *search, char *replace);
void strrepmfree(char *input);
First off, sorry I'm late to the party. This is my first stackoverflow answer. :)
As has been pointed out, when realloc() is called, you can potentially change the pointer to the memory being reallocated. When this happens, the argument "string" becomes invalid. Even if you reassign it, the change goes out of scope once the function ends.
To answer the OP, realloc() returns a pointer to the newly-reallocated memory. The return value needs to be stored somewhere. Generally, you would do this:
data *foo = malloc(SIZE * sizeof(data));
data *bar = realloc(foo, NEWSIZE * sizeof(data));
/* Test bar for safety before blowing away foo */
if (bar != NULL)
{
foo = bar;
bar = NULL;
}
else
{
fprintf(stderr, "Crap. Memory error.\n");
free(foo);
exit(-1);
}
As TyBoer points out, you guys can't change the value of the pointer being passed in as the input to this function. You can assign whatever you want, but the change will go out of scope at the end of the function. In the following block, "input" may or may not be an invalid pointer once the function completes:
void foobar(char *input, int newlength)
{
/* Here, I ignore my own advice to save space. Check your return values! */
input = realloc(input, newlength * sizeof(char));
}
Mark tries to work around this by returning the new pointer as the output of the function. If you do that, the onus is on the caller to never again use the pointer he used for input. If it matches the return value, then you have two pointers to the same spot and only need to call free() on one of them. If they don't match, the input pointer now points to memory that may or may not be owned by the process. Dereferencing it could cause a segmentation fault.
You could use a double pointer for the input, like this:
void foobar(char **input, int newlength)
{
*input = realloc(*input, newlength * sizeof(char));
}
If the caller has a duplicate of the input pointer somewhere, that duplicate still might be invalid now.
I think the cleanest solution here is to avoid using realloc() when trying to modify the function caller's input. Just malloc() a new buffer, return that, and let the caller decide whether or not to free the old text. This has the added benefit of letting the caller keep the original string!
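A minimal sketch of that approach, under a few stated assumptions: the function name replace_all is hypothetical, an empty search string is treated as "no matches", and size overflow for extremely long inputs is not checked. It counts the matches first, allocates the result once, and never touches the caller's buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'd string with every
 * non-overlapping occurrence of search replaced, or NULL on allocation
 * failure. The input is left untouched; the caller frees the result. */
char *replace_all(const char *input, const char *search, const char *replace)
{
    size_t search_len = strlen(search);
    size_t replace_len = strlen(replace);
    size_t count = 0;

    /* First pass: count matches (an empty search matches nothing here). */
    if (search_len > 0) {
        for (const char *p = strstr(input, search); p != NULL;
             p = strstr(p + search_len, search))
            count++;
    }

    size_t out_len = strlen(input) - count * search_len + count * replace_len;
    char *out = malloc(out_len + 1);             /* +1 for '\0' */
    if (out == NULL)
        return NULL;

    /* Second pass: copy the text before each match, then the replacement. */
    char *dst = out;
    const char *src = input;
    for (size_t i = 0; i < count; i++) {
        const char *hit = strstr(src, search);   /* found in the first pass */
        size_t prefix = (size_t)(hit - src);
        memcpy(dst, src, prefix);
        dst += prefix;
        memcpy(dst, replace, replace_len);
        dst += replace_len;
        src = hit + search_len;                  /* resume after the match */
    }
    strcpy(dst, src);                            /* tail, including '\0' */
    return out;
}
```

Because the search always resumes in the original input after each match, a replacement that contains the search string is not expanded again.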
Just a shot in the dark because I haven't tried it yet but when you realloc it returns the pointer much like malloc. Because realloc can move the pointer if needed you are most likely operating on an invalid pointer if you don't do the following:
input = realloc(input, strlen(input) + delta);
Someone else apologized for being late to the party - two and a half months ago. Oh well, I spend quite a lot of time doing software archaeology.
I'm interested that no-one has commented explicitly on the memory leak in the original design, or the off-by-one error. And it was observing the memory leak that tells me exactly why you are getting the double-free error (because, to be precise, you are freeing the same memory multiple times - and you are doing so after trampling over the already freed memory).
Before conducting the analysis, I'll agree with those who say your interface is less than stellar; however, if you dealt with the memory leak/trampling issues and documented the 'must be allocated memory' requirement, it could be 'OK'.
What are the problems? Well, you pass a buffer to realloc(), and realloc() returns you a new pointer to the area you should use - and you ignore that return value. Consequently, realloc() has probably freed the original memory, and then you pass it the same pointer again, and it complains that you're freeing the same memory twice because you pass the original value to it again. This not only leaks memory, but means that you are continuing to use the original space -- and John Downey's shot in the dark points out that you are misusing realloc(), but doesn't emphasize how severely you are doing so. There's also an off-by-one error because you do not allocate enough space for the NUL '\0' that terminates the string.
The memory leak occurs because you do not provide a mechanism to tell the caller about the last value of the string. Because you kept trampling over the original string plus the space after it, it looks like the code worked, but if your calling code freed the space, it too would get a double-free error, or it might get a core dump or equivalent because the memory control information is completely scrambled.
Your code also doesn't protect against indefinite growth -- consider replacing 'Noel' with 'Joyeux Noel'. Every time, you would add 7 characters, but you'd find another Noel in the replaced text, and expand it, and so on and so forth. My fixup (below) does not address this issue - the simple solution is probably to check whether the search string appears in the replace string; an alternative is to skip over the replace string and continue the search after it. The second has some non-trivial coding issues to address.
So, my suggested revision of your called function is:
char *strrep(char *input, char *search, char *replace) {
int searchLen = strlen(search);
int replaceLen = strlen(replace);
int delta = replaceLen - searchLen;
char *find = input;
while ((find = strstr(find, search)) != 0) {
if (delta > 0) {
input = realloc(input, strlen(input) + delta + 1);
find = strstr(input, search);
}
memmove(find + replaceLen, find + searchLen, strlen(find + searchLen) + 1); /* move the tail after the match, plus its '\0' */
memmove(find, replace, replaceLen);
}
return(input);
}
This code does not detect memory allocation errors - and probably crashes (but if not, leaks memory) if realloc() fails. See Steve Maguire's 'Writing Solid Code' book for an extensive discussion of memory management issues.
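For illustration, here is one way the two open issues (undetected realloc() failure and indefinite growth) might be handled. This is a sketch rather than a drop-in replacement: the name strrep_inplace and the char ** interface are assumptions of this example. It checks each realloc() and resumes the search after the text it just inserted.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: *input must point to malloc'd memory. Returns 0
 * on success, -1 if realloc fails; in both cases *input stays valid
 * (possibly with only some replacements made). */
int strrep_inplace(char **input, const char *search, const char *replace)
{
    size_t search_len = strlen(search);
    size_t replace_len = strlen(replace);
    size_t offset = 0;    /* an index survives realloc, a pointer may not */
    char *find;

    if (search_len == 0)
        return 0;         /* assumption: an empty search matches nothing */

    while ((find = strstr(*input + offset, search)) != NULL) {
        size_t pos = (size_t)(find - *input);
        size_t tail_len = strlen(find + search_len) + 1;   /* tail + '\0' */

        if (replace_len > search_len) {
            char *grown = realloc(*input, pos + replace_len + tail_len);
            if (grown == NULL)
                return -1;            /* old block is untouched */
            *input = grown;
        }
        memmove(*input + pos + replace_len, *input + pos + search_len, tail_len);
        memcpy(*input + pos, replace, replace_len);
        offset = pos + replace_len;   /* skip over the inserted text */
    }
    return 0;
}
```

The caller would pass the address of its pointer, for example strrep_inplace(&input, "Noel", "Joyeux Noel"), and keep using input afterwards.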
Note, try to edit your code to get rid of the html escape codes.
Well, though it has been a while since I used C/C++, realloc that grows only reuses the memory pointer value if there is room in memory after your original block.
For instance, consider this:
(xxxxxxxxxx..........)
If your pointer points to the first x, and . means free memory location, and you grow the memory size pointed to by your variable by 5 bytes, it'll succeed. This is of course a simplified example as blocks are rounded up to a certain size for alignment, but anyway.
However, if you subsequently try to grow it by another 10 bytes, and there is only 5 available, it will need to move the block in memory and update your pointer.
However, in your example you are passing the function a pointer to the character, not a pointer to your variable, and thus while the strrep function internally might be able to adjust the variable in use, it is a local variable to the strrep function and your calling code will be left with the original pointer variable value.
This pointer value, however, has been freed.
In your case, input is the culprit.
However, I would make another suggestion. In your case it looks like the input variable is indeed input, and if it is, it shouldn't be modified, at all.
I would thus try to find another way to do what you want to do, without changing input, as side-effects like this can be hard to track down.
This seems to work;
char *strrep(char *string, const char *search, const char *replace) {
char *p = strstr(string, search);
if (p) {
int occurrence = p - string;
int stringlength = strlen(string);
int searchlength = strlen(search);
int replacelength = strlen(replace);
if (replacelength > searchlength) {
string = (char *) realloc(string, strlen(string)
+ replacelength - searchlength + 1);
}
if (replacelength != searchlength) {
memmove(string + occurrence + replacelength,
string + occurrence + searchlength,
stringlength - occurrence - searchlength + 1);
}
strncpy(string + occurrence, replace, replacelength);
}
return string;
}
Sigh, is there anyway to post code without it sucking?
realloc is strange, complicated and should only be used when dealing with lots of memory lots of times per second. i.e. - where it actually makes your code faster.
I have seen code where
realloc(bytes, smallerSize);
was used and worked to resize the buffer, making it smaller. Worked about a million times, then for some reason realloc decided that even if you were shortening the buffer, it would give you a nice new copy. So you crash in a random place 1/2 a second after the bad stuff happened.
Always use the return value of realloc.
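A small wrapper can make that rule hard to forget. The sketch below uses a hypothetical name, resize_buffer, and assumes new_size is greater than zero. It stores the result of realloc() in a temporary and only updates the caller's pointer on success, so a failed call neither leaks nor invalidates the original block.

```c
#include <stdlib.h>

/* Hypothetical helper: resizes *buf to new_size bytes (assumed > 0).
 * Returns 0 on success and updates *buf, which may have moved even
 * when shrinking. Returns -1 on failure and leaves *buf untouched. */
int resize_buffer(char **buf, size_t new_size)
{
    char *moved = realloc(*buf, new_size);
    if (moved == NULL)
        return -1;      /* the original block is still valid */
    *buf = moved;
    return 0;
}
```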
My quick hints.
Instead of:
void strrep(char *input, char *search, char *replace)
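try the following (note that char *& is a C++ reference and will not compile as C; in plain C, pass a char ** and assign through *input instead):
void strrep(char *&input, char *search, char *replace)
and then in the body: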
input = realloc(input, strlen(input) + delta);
Generally read about passing function arguments as values/reference and realloc() description :).