I'm implementing a function which, given a string, a character and another string (from now on, the "substring"), inserts the substring in place of every occurrence of the character in the string.
To explain better, given these parameters this is what the function should return (pseudocode):
func ("aeiou", 'i', "hello") -> aehelloou
I'm using some functions from string.h lib. I have tested it with pretty good result:
char *somestring= "this$ is a tes$t wawawa$wa";
printf("%s", strcinsert(somestring, '$', "WHAT?!") );
Outputs: thisWHAT?! is a tesWHAT?!t wawawaWHAT?!wa
So far, everything is all right. The problem comes when I try to do the same with, for example, this string:
char *somestring= "this \"is a test\" wawawawa";
printf("%s", strcinsert(somestring, '"', "\\\"") );
since I want to replace every " with \". When I do this, the PC collapses. I don't know why, but it stops working and then shuts down. I've heard something about the bad behavior of some functions in the string.h library, but I couldn't find any information about this. I'd really appreciate any help.
My code:
#define salloc(size) (str)malloc(size+1) //i'm lazy
typedef char* str;
str strcinsert (str string, char flag, str substring)
{
int nflag= 0; //this is the number of times the character appears
for (int i= 0; i<strlen(string); i++)
if (string[i]==flag)
nflag++;
str new=string;
int pos;
while (strchr(string, flag)) //since when its not found returns NULL
{
new= salloc(strlen(string)+nflag*strlen(substring)-nflag);
pos= strlen(string)-strlen(strchr(string, flag));
strncpy(new, string, pos);
strcat(new, substring);
strcat(new, string+pos+1);
string= new;
}
return new;
}
Thanks for any help!
Some advice:
refrain from typedef char* str;. The char * type is common in C and masking it will just make your code harder to review
refrain from #define salloc(size) (str)malloc(size+1) for the exact same reason. In addition, don't cast malloc in C
each time you write a malloc (or calloc or realloc) there should be a corresponding free: C has no garbage collection
dynamic allocation is expensive, use it only when needed. Put differently, a malloc inside a loop should be looked at twice (especially if there is no corresponding free)
always test the result of allocation functions (and, unrelated, of I/O functions): malloc will simply return NULL when you exhaust memory. A nice error message is then easier to understand than a crash
learn to use a debugger: if you had executed your code under a debugger the error would have been evident
Next, the cause: if the replacement string contains the original character, you find it again and run into an endless loop.
A possible workaround: allocate the result string before the loop and advance both in the original one and the result. It will save you from unnecessary allocations and de-allocations, and will be immune to the original char being present in the replacement string.
Possible code:
// the result is an allocated string that must be freed by caller
str strcinsert(str string, char flag, str substring)
{
int nflag = 0; //this is the number of times the character appears
for (int i = 0; i<strlen(string); i++)
if (string[i] == flag)
nflag++;
str new_ = string;
int pos;
new_ = salloc(strlen(string) + nflag*strlen(substring) - nflag);
// should test new_ != NULL
char * cur = new_;
char *old = string;
while (NULL != (string = strchr(string, flag))) //since when its not found returns NULL
{
pos = string - old;
strncpy(cur, old, pos);
cur[pos] = '\0'; // strncpy does not null terminate the dest. string
strcat(cur, substring);
strcat(cur, string + 1);
cur += strlen(substring) + pos; // advance the result
old = ++string; // and the input string
}
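if (nflag == 0)
strcpy(new_, string); // flag never found: the loop did not run, so copy the input unchanged
return new_;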
}
Note: I have not removed the str typedef and the salloc macro, but you really should.
In your second loop, you always look for the first flag character in the string. In this case, that’ll be the one you just inserted from substring. The strchr function will always find that quote and never return NULL, so your loop will never terminate and just keep allocating memory (and not enough of it, since your string grows arbitrarily large).
Speaking of allocating memory, you need to be more careful with that. Unlike in Python, C doesn’t automatically notice when you’re no longer using memory; anything you malloc must be freed. You also allocate far more memory than you need: even in your working "this$ is a tes$t wawawa$wa" example, you allocate enough space for the full string on each iteration of the loop, and never free any of it. You should just run the allocation once, before the second loop.
This isn’t as important as the above stuff, but you should also pay attention to performance. Each call to strcat and strlen iterates over the entire string, meaning you look at it far more often than you need. You should instead save the result of strlen, and copy the new string directly to where you know the NUL terminator is. The same goes for strchr; you already replaced the beginning of the string and don’t want to waste time looking at it again, apart from the part where that’s causing your current bug.
In comparison to these issues, the style issues mentioned in the comments with your typedef and macro are relatively minor, but they are still worth mentioning. A char* in C is different from a str in Python; trying to typedef it to the same name just makes it more likely you’ll try to treat them as the same and run into these issues.
I don't know why but it stops working
strchr(string, flag) is looking over the whole string for flag. The search needs to be limited to the portion of the string not yet examined/updated. By re-searching the partially replaced string, the code finds the flag over and over.
The whole string management approach needs re-work. As OP reported a Python background, I've posted a very C approach as mimicking Python is not a good approach here. C is different especially in the management of memory.
Untested code
#include <stdlib.h>
#include <string.h>

// Look for needles in a haystack and replace them
// Note that replacement may be "" and result in a shorter string than haystack
char *strcinsert_alloc(const char *haystack, char needle, const char *replacement) {
    size_t n = 0;
    const char *s = haystack;
    while (*s) {
        if (*s == needle) n++; // Find needle count
        s++;
    }
    size_t replacement_len = strlen(replacement);
    // string length - needles + replacements + \0
    size_t new_size = (size_t)(s - haystack) - n*1 + n*replacement_len + 1;
    char *dest = malloc(new_size);
    if (dest) {
        char *d = dest;
        s = haystack;
        while (*s) {
            if (*s == needle) {
                // copy from the replacement, not from the haystack
                memcpy(d, replacement, replacement_len);
                d += replacement_len;
            } else {
                *d = *s;
                d++;
            }
            s++;
        }
        *d = '\0';
    }
    return dest; // NULL if the allocation failed; otherwise the caller must free() it
}
In your program, you are facing a problem for this input:
char *somestring= "this \"is a test\" wawawawa";
as you want to replace " with \".
The first problem is that whenever you replace " with \" in the string, in the next iteration strchr(string, flag) will find the " of the \" you just inserted. So, in subsequent iterations your string will look like this:
this \"is a test" wawawawa
this \\"is a test" wawawawa
this \\\"is a test" wawawawa
So, for the input string "this \"is a test\" wawawawa", your while loop will run forever, as every time strchr(string, flag) finds the " of the last inserted \".
The second problem is the memory allocation you do in every iteration of your while loop. There is no free() for the memory allocated to new. So when the while loop runs forever, it eats up all the memory, which leads to the PC collapsing.
To resolve this, in every iteration, you should search for flag only in the string starting from a character after the last inserted substring to the end of the string. Also, make sure to free() the dynamically allocated memory.
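The advice above can be sketched as follows. This is only one possible shape, not the original poster's code: the name replace_char is made up for illustration, and the function allocates once and resumes each strchr search just past the previous match, so text coming from the replacement is never scanned again.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of src in which every
   occurrence of flag is replaced by sub. The caller must free() the result.
   Returns NULL if the allocation fails. */
char *replace_char(const char *src, char flag, const char *sub)
{
    size_t src_len = strlen(src);
    size_t sub_len = strlen(sub);
    size_t count = 0;

    /* Count the flags so the result can be allocated exactly once. */
    for (const char *p = src; *p != '\0'; p++)
        if (*p == flag)
            count++;

    char *out = malloc(src_len - count + count * sub_len + 1);
    if (out == NULL)
        return NULL;

    char *dst = out;
    const char *in = src;   /* start of the part not yet processed */
    const char *hit;

    /* flag == '\0' is excluded because strchr would match the terminator. */
    while (flag != '\0' && (hit = strchr(in, flag)) != NULL) {
        size_t seg = (size_t)(hit - in);
        memcpy(dst, in, seg);       /* text before the flag */
        dst += seg;
        memcpy(dst, sub, sub_len);  /* the replacement */
        dst += sub_len;
        in = hit + 1;               /* resume the search after the flag */
    }
    strcpy(dst, in);                /* remaining tail and terminator */
    return out;
}
```

Because the search only ever runs over the input, replacing " with \" terminates even though the replacement contains the flag character.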
I've spotted the following piece of C code, marked as BAD (aka buffer overflow bad).
The problem is that I don't quite get why. The input string length is measured before the allocation, etc.
char *my_strdup(const char *s)
{
size_t len = strlen(s) + 1;
char *c = malloc(len);
if (c) {
strcpy(c, s); // BAD
}
return c;
}
Update from comments:
The 'BAD' marker is not precise: the code is not bad. It is not the most efficient, yes, and risky (see below), yes.
Why risky? The +1 after the strlen() call is required to safely allocate space on the heap that will also hold the string terminator ('\0').
There is no bug in your sample function.
However, to make it obvious to future readers (both human and mechanical) that there is no bug, you should replace the strcpy call with a memcpy:
char *my_strdup(const char *s)
{
size_t len = strlen(s) + 1;
char *c = malloc(len);
if (c) {
memcpy(c, s, len);
}
return c;
}
Either way, len bytes are allocated and len bytes are copied, but with memcpy that fact stands out much more clearly to the reader.
There's no problem with this code.
While it's possible that strcpy can cause undefined behavior if the destination buffer isn't large enough to hold the string in question, the buffer is allocated to be the correct size. This means there is no risk of overrunning the buffer.
You may see some guides recommend using strncpy instead, which allows you to specify the maximum number of characters to copy, but this has its own problems. If the source string is too long, only the specified number of characters will be copied, however this also means that the string isn't null terminated which requires the user to do so manually. For example:
char src[] = "test data";
char dest[5];
strncpy(dest, src, sizeof dest); // dest holds "test " with no null terminator
dest[sizeof(dest) - 1] = 0; // manually null terminate, dest holds "test"
I tend towards the use of strcpy if I know the source string will fit, otherwise I'll use strncpy and manually null-terminate.
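As a rough illustration of the "copy and terminate manually" pattern, here is a small sketch. The helper name copy_truncated and its return convention are assumptions made for this example; it uses memcpy with an explicit length rather than strncpy, so the terminator can never be forgotten.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity dst_size bytes),
   truncating if necessary, and always null-terminates when dst_size > 0.
   Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_truncated(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 0;                       /* no room even for the terminator */

    size_t len = strlen(src);
    size_t n = len < dst_size ? len : dst_size - 1;

    memcpy(dst, src, n);                /* copy at most dst_size - 1 bytes */
    dst[n] = '\0';                      /* terminate unconditionally */
    return len < dst_size;
}
```

With char dest[5] and the source "test data", this leaves "test" in dest and reports truncation through the return value.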
I cannot see any problem with the code when it comes to the use of strcpy
But you should be aware that it requires s to be a valid C string. That is a reasonable requirement, but it should be specified.
If you want, you could put in a simple check for NULL, but I would say that it's ok to do without it. If you're about to make a copy of a "string" pointed to by a null pointer, then you probably should check either the argument or the result. But if you want, just add this as the first line:
if(!s) return NULL;
But as I said, it does not add much. It just makes it possible to change
if(!str) {
// Handle error
} else {
new_str = my_strdup(str);
}
to:
new_str = my_strdup(str);
if(!new_str) {
// Handle error
}
Not really a huge gain
I'm somewhat new to C and am wondering about certain things about memory allocation. My function is as follows:
size_t count_nwords(const char* str) {
//char* copied_str = strdup(str); // because 'strtok()' alters the string it passes through
char copied_str[strlen(str)];
strcpy(copied_str, str);
size_t count = 1;
strtok(copied_str, " ");
while(strtok(NULL, " ") != 0) {
count++;
}
//free(copied_str);
return count;
}
This function counts the number of words in a string (the delimiter is a space, i.e. " "). I do not want the string passed as an argument to be modified.
I have two questions:
Should the strdup() way (which is the commented part in the code) be preferred over the strcpy() one? My understanding is that strcpy() is sufficient and faster, but I am not certain.
Since no memory is allocated for the size_t value to be returned (it's a local variable), should it be done in order to ensure the function is robust? Or is using size_t nwords = count_nwords(copied_input); completely safe and will always properly get the returned value?
Thank you!
EDIT: I've accepted the only answer that concerned my questions precisely, but I advise reading the other answers as they provide good insights regarding errors I had made in my code.
Failure to account for the null character
// char copied_str[strlen(str)];
char copied_str[strlen(str) + 1];
strcpy(copied_str, str);
Wrong algorithm
Even with above fix, code returns 1 with count_nwords(" ")
Unnecessary copying of string
strtok() not needed here. A copy of the string is not needed.
Alternative: walk the string.
size_t count_nwords(const char* str) {
size_t count = 0;
while (*str) {
while (isspace((unsigned char) *str)) {
str++;
}
if (*str) {
count++;
while (!isspace((unsigned char) *str) && *str) {
str++;
}
}
}
return count;
}
Another option is the state-loop approach where you continually loop over each character keeping track of the state of your count with a simple flag. (you are either in a word reading characters or you are reading spaces). The benefit being you have only a single loop involved. A short example would be:
size_t count_words (const char *str)
{
size_t words = 0;
int in_word = 0;
while (*str) {
if (isspace ((unsigned char)*str))
in_word = 0;
else {
if (!in_word)
words++;
in_word = 1;
}
str++;
}
return words;
}
It is worth understanding all techniques. isspace requires the inclusion of ctype.h.
Should the strdup() way (which is the commented part in the code) be preferred over the strcpy() one? My understanding is that strcpy()
is sufficient and faster, but I am not certain.
Your solution is clean and works well, so don't bother. The only caveat is that you are using a VLA, which is an optional feature since C11, so strdup may be the more portable choice. Regarding performance: since it is not specified how VLAs are implemented, performance may vary from one compiler/platform to another (gcc is known to use the stack for VLAs, but another compiler may use the heap). We only know that strdup allocates on the heap, that's all. I doubt that a performance problem will come from such a choice.
Note: your allocation size is wrong and should be at least strlen(str)+1.
Since no memory is allocated for the size_t value to be returned (it's a local variable), should it be done in order to ensure the
function is robust? Or is using size_t nwords =
count_nwords(copied_input); completely safe and will always properly
get the returned value?
Managing return values, and the memory that holds them, is the compiler's concern. Usually these values are passed back in a register or on the stack (read up on "stack frame"). When the stack is used, space is reserved for the value just before the call and released after the call, as soon as you discard or copy the returned value.
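To make the point about returned values concrete, here is a minimal sketch; the function count_spaces is made up for illustration. Returning a size_t by value copies the value to the caller, so it does not matter that the local variable goes out of scope.

```c
#include <stddef.h>

/* Hypothetical example: the local variable n disappears when the function
   returns, but its value has already been copied to the caller. */
size_t count_spaces(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == ' ')
            n++;
    return n;   /* returned by value: no allocation needed */
}
```

What would not be safe is returning the address of a local variable (for example a pointer to n), because that storage is gone once the function returns.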
I have a string pointer like below,
char *str = "This is cool stuff";
Now, I've references to this string pointer like below,
char* start = str + 1;
char* end = str + 6;
So, start and end point to different locations within str. How can I copy the characters that fall between start and end into a new string? An existing C++/C function is preferable.
Just create a new buffer called dest and use strncpy
char dest[end-start+1];
strncpy(dest,start,end-start);
dest[end-start] = '\0';
Use STL std::string:
#include <string>
const char *str = "This is cool stuff";
std::string part( str + 1, str + 6 );
This uses iterator range constructor, so the part of the C-string does not have to be zero-terminated.
It's best to do this with memcpy(), and terminate the result yourself. The standard strncpy() function has very strange semantics.
If you really want a "new string pointer", and be a bit safe with regard to lengths and static buffers, you need to dynamically allocate the new string:
char * ranged_copy(const char *start, const char *end)
{
    char *s;
    s = malloc(end - start + 1);
    if (s == NULL)
        return NULL; // allocation failed
    memcpy(s, start, end - start);
    s[end - start] = 0;
    return s;
}
If you want to do this with C++ STL:
#include <string>
...
std::string cppStr (str, 1, 6); // copy 6 characters starting at index 1 of str (the arguments are position and length)
const char *newStr = cppStr.c_str(); // pointer to the substring's data; valid only while cppStr is alive
char *newChar = new char[end-start+1]; // C++ only; release with delete[]
char *p = newChar;
while (start < end)
*p++ = *start++;
*p = '\0'; // terminate the copy
This is one of the rare cases when function strncpy can be used. Just calculate the number of characters you need to copy and specify that exact amount in the strncpy. Remember that strncpy will not zero-terminate the result in this case, so you'll have to do it yourself (which, BTW, means that it makes more sense to use memcpy instead of the virtually useless strncpy).
And please, do yourself a favor, start using const char * pointers with string literals.
Assuming that end follows the idiomatic semantics of pointing just past the last item you want copied (STL semantics are a useful idiom even if we're dealing with straight C) and that your destination buffer is known to have enough space:
memcpy( buf, start, end-start);
buf[end-start] = '\0';
I'd wrap this in a sub-string function that also took the destination buffer size as a parameter so it could perform a check and truncate the result or return an error to prevent overruns.
I'd avoid using strncpy() because too many programmers forget about the fact that it might not terminate the destination string, so the second line might be mistakenly dropped at some point by someone believing it unnecessary. That's less likely if memcpy() were used. (In general, just say no to using strncpy())
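The sub-string wrapper described above could look roughly like this. The name substring_copy and the choice to return an error instead of truncating are assumptions for the sake of the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the characters in [start, end) into buf,
   which has room for buf_size bytes, and null-terminates the result.
   Returns 0 on success, -1 if the range is invalid or does not fit. */
int substring_copy(char *buf, size_t buf_size,
                   const char *start, const char *end)
{
    if (end < start)
        return -1;                      /* invalid range */

    size_t len = (size_t)(end - start);
    if (len >= buf_size)
        return -1;                      /* no room for text plus terminator */

    memcpy(buf, start, len);
    buf[len] = '\0';
    return 0;
}
```

For the string from the question, substring_copy(buf, sizeof buf, str + 1, str + 6) would place "his i" in buf, since end is treated as pointing just past the last character to copy.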
I'm trying to pass a string to chdir(). But I always seem to have some trailing stuff that makes chdir() fail.
#define IN_LEN 128
int main(int argc, char** argv) {
int counter;
char command[IN_LEN];
char** tokens = (char**) malloc(sizeof(char)*IN_LEN);
size_t path_len; char path[IN_LEN];
...
fgets(command, IN_LEN, stdin)
counter = 0;
tmp = strtok(command, delim);
while(tmp != NULL) {
*(tokens+counter) = tmp;
tmp = strtok(NULL, delim);
counter++;
}
if(strncmp(*tokens, cd_command, strlen(cd_command)) == 0) {
path_len = strlen(*(tokens+1));
strncpy(path, *(tokens+1), path_len-1);
// this is where I try to remove the trailing junk...
// but it doesn't work on a second system
if(chdir(path) < 0) {
error_string = strerror(errno);
fprintf(stderr, "path: %s\n%s\n", path, error_string);
}
// just to check if the chdir worked
char buffer[1000];
printf("%s\n", getcwd(buffer, 1000));
}
return 0;
}
There must be a better way to do this. Can anyone help out? I've tried to use scanf, but when the program calls scanf, it just hangs.
Thanks
It looks like you've forgotten to append a null '\0' to path string after calling strncpy(). Without the null terminator chdir() doesn't know where the string ends and it just keeps looking until it finds one. This would make it appear like there are extra characters at the end of your path.
You have (at least) 2 problems in your example.
The first one (which is causing the immediately obvious problems) is the use of strncpy() which doesn't necessarily place a '\0' terminator at the end of the buffer it copies into. In your case there's no need to use strncpy() (which I consider dangerous for exactly the reason you ran into). Your tokens will be '\0' terminated by strtok(), and they are guaranteed to be smaller than the path buffer (since the tokens come from a buffer that's the same size as the path buffer). Just use strcpy(), or if you want the code to be resilient to someone coming along later and mucking with the buffer sizes, use something like the non-standard strlcpy().
As a rule of thumb don't use strncpy().
Another problem with your code is that the tokens allocation isn't right.
char** tokens = (char**) malloc(sizeof(char)*IN_LEN);
will allocate an area as large as your input string buffer, but you're storing pointers to strings in that allocation, not chars. You'll have fewer tokens than characters (by definition), but each token pointer is probably 4 times larger than a character (depending on the platform's pointer size). If your string has enough tokens, you'll overrun this buffer.
For example, assume IN_LEN is 14 and the input string is "a b c d e f g". If you use spaces as the delimiter, there will be 7 tokens, which will require a pointer array with 28 bytes. Quite a few more than the 14 allocated by the malloc() call.
A simple change to:
char** tokens = (char**) malloc((sizeof(char*) * IN_LEN) / 2);
should allocate enough space (is there an off-by-one error in there? Maybe a +1 is needed).
A third problem is that you potentially access *tokens and *(tokens+1) even if zero or only one token was added to the array. You'll need to add some checks of the counter variable before dereferencing those pointers.
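One way to address the token-array and counter issues together is sketched below. The helper split_tokens is hypothetical, and it assumes the caller passes an array of char * with max_tokens elements. Including '\n' in the delimiter set is an extra assumption on my part: fgets() keeps the trailing newline, which may well be the "trailing junk" in the path.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line in place with strtok() and stores up to
   max_tokens token pointers in tokens. Returns the number of tokens stored,
   so the caller can check it before touching tokens[0] or tokens[1]. */
size_t split_tokens(char *line, const char *delim,
                    char **tokens, size_t max_tokens)
{
    size_t count = 0;
    for (char *t = strtok(line, delim);
         t != NULL && count < max_tokens;
         t = strtok(NULL, delim)) {
        tokens[count++] = t;    /* points into line; no copy is made */
    }
    return count;
}
```

A caller would declare something like char *tokens[IN_LEN / 2 + 1], call split_tokens(command, " \n", tokens, IN_LEN / 2 + 1), and only pass tokens[1] to chdir() when the returned count is at least 2.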
I'm writing a function that gets the path environment variable of a system, splits up each path, then concats on some other extra characters onto the end of each path.
Everything works fine until I use the strcat() function (see code below).
char* prependPath( char* exeName )
{
char* path = getenv("PATH");
char* pathDeepCopy = (char *)malloc(strlen(path) + 1);
char* token[80];
int j, i=0; // used to iterate through array
strcpy(pathDeepCopy, path);
//parse and split
token[0] = strtok(pathDeepCopy, ":"); //get pointer to first token found and store in 0
//place in array
while(token[i]!= NULL) { //ensure a pointer was found
i++;
token[i] = strtok(NULL, ":"); //continue to tokenize the string
}
for(j = 0; j <= i-1; j++) {
strcat(token[j], "/");
//strcat(token[j], exeName);
printf("%s\n", token[j]); //print out all of the tokens
}
}
My shell output is like this (I'm concatenating "/which" onto everything):
...
/usr/local/applic/Maple/bin/which
which/which
/usr/local/applic/opnet/8.1.A.wdmguru/sys/unix/bin/which
which/which
Bus error (core dumped)
I'm wondering why strcat is displaying a new line and then repeating which/which.
I'm also wondering about the Bus error (core dumped) at the end.
Has anyone seen this before when using strcat()?
And if so, anyone know how to fix it?
Thanks
strtok() does not give you a new string.
It mutilates the input string by inserting the char '\0' where the split character was.
So your use of strcat(token[j],"/") will put the '/' character where the '\0' was.
Also the last token will start appending 'which' past the end of your allocated memory into uncharted memory.
You can use strtok() to split a string into chunks. But if you want to append anything onto a token, you need to make a copy of the token; otherwise what you're appending will spill over onto the next token.
Also, you need to take more care with your memory allocation: you are leaking memory all over the place :-)
PS. If you must use C strings, use strdup() to copy the string.
char* prependPath( char* exeName )
{
char* path = getenv("PATH");
char* pathDeepCopy = strdup(path);
char* token[80];
int j, i = 0; // used to iterate through array
token[0] = strtok(pathDeepCopy, ":");
while((token[i] != NULL) && (i < 79)) // stop before running off the end of token[]
{
++i;
token[i] = strtok(NULL, ":");
}
for(j = 0; j < i; ++j)
{
char* tmp = (char*)malloc(strlen(token[j]) + 1 + strlen(exeName) + 1);
strcpy(tmp,token[j]);
strcat(tmp,"/");
strcat(tmp,exeName);
printf("%s\n",tmp); //print out all of the tokens
free(tmp);
}
free(pathDeepCopy);
}
strtok does not duplicate the token but instead just points to it within the string. So when you cat '/' onto the end of a token, you're writing a '\0' either over the start of the next token, or past the end of the buffer.
Also note that even if strtok did return copies of the tokens instead of the originals (which it doesn't), it wouldn't allocate the additional space for you to append characters, so it'd still be a buffer overrun bug.
strtok() tokenizes in place. When you start appending characters to the tokens, you're overwriting the next token's data.
Also, in general it's not safe to simply concatenate to an existing string unless you know that the size of the buffer the string is in is large enough to hold the resulting string. This is a major cause of bugs in C programs (including the dreaded buffer overflow security bugs).
So even if strtok() returned brand-new strings unrelated to your original string (which it doesn't), you'd still be overrunning the string buffers when you concatenated to them.
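A minimal sketch of building each path in a buffer that is known to be large enough follows. The helper join_path is made up for illustration, and it uses the standard snprintf() with the exact size computed up front.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated "dir/name" string, or NULL
   if the allocation fails. The caller must free() the result. */
char *join_path(const char *dir, const char *name)
{
    /* dir + '/' + name + terminating '\0' */
    size_t size = strlen(dir) + 1 + strlen(name) + 1;
    char *out = malloc(size);
    if (out != NULL)
        snprintf(out, size, "%s/%s", dir, name);
    return out;
}
```

Each token returned by strtok() can be passed as dir without being modified, so neighbouring tokens in the tokenized buffer stay intact.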
Some safer alternatives to strcpy()/strcat() that you might want to look into (you may need to track down implementations for some of these - they're not all standard):
strncpy() - includes the target buffer size to avoid overruns. Has the drawback of not always terminating the result string
strncat()
strlcpy() - similar to strncpy(), but intended to be simpler to use and more robust (http://en.wikipedia.org/wiki/Strlcat)
strlcat()
strcpy_s() - Microsoft variants of these functions
strncat_s()
And the API you should strive to use if you can use C++: the std::string class. If you use the C++ std::string class, you pretty much do not have to worry about the buffer containing the string - the class manages all of that for you.
OK, first of all, be careful. You are losing memory.
strtok() returns a pointer to the next token and you are storing it in an array of chars.
Instead of char token[80] it should be char *token.
Be careful also when using strtok. strtok practically destroys the char array called pathDeepCopy because it will replace every occurrence of ":" with '\0', as Mike F told you above.
Be sure to initialize pathDeepCopy using memset or calloc.
So when you are coding token[i] there is no way of knowing what is being pointed at.
And as token has no valid data in it, it is likely to cause a core dump because you are trying to concatenate a string onto another that has no valid data (token).
Perhaps the thing you are looking for is an array of pointers to char in which to store all the pointers to the tokens that strtok is returning, in which case token will be like char *token[];
Hope this helps a bit.
If you're using C++, consider boost::tokenizer as discussed over here.
If you're stuck in C, consider using strtok_r because it's re-entrant and thread-safe. Not that you need it in this specific case, but it's a good habit to establish.
Oh, and use strdup to create your duplicate string in one step.
replace that with
strcpy(pathDeepCopy, path);
//parse and split
token[0] = strtok(pathDeepCopy, ":");//get pointer to first token found and store in 0
//place in array
while(token[i]!= NULL) { //ensure a pointer was found
i++;
token[i] = strtok(NULL, ":"); //continue to tokenize the string
}
// use new array for storing the new tokens
// pardon my C lang skills. IT's been a "while" since I wrote device drivers in C.
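// MAX_PATH is assumed to be defined elsewhere (it is not standard C)
char (*newTokens)[MAX_PATH] = malloc(i * sizeof *newTokens); // i buffers of MAX_PATH chars each
if (newTokens == NULL)
return NULL;
for (int k = 0; k < i; ++k) {
snprintf(newTokens[k], MAX_PATH, "%s%c", token[k], '/');
printf("%s\n", newTokens[k]); //print out all of the tokens
}
free(newTokens);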
This avoids overwriting the contents of the original buffer and prevents the core dump.
and don't forget to check if malloc returns NULL!