I typed up this block of code for an assignment:
char *tokens[10];
void parse(char* input);
void main(void)
{
char input[] = "Parse this please.";
parse(input);
for(int i = 2; i >= 0; i--) {
printf("%s ", tokens[i]);
}
}
void parse(char* input)
{
int i = 0;
tokens[i] = strtok(input, " ");
while(tokens[i] != NULL) {
i++;
tokens[i] = strtok(NULL, " ");
}
}
But, looking at it, I'm not sure how the memory allocation works. I didn't define the length of the individual strings as far as I know, just how many strings are in the string array tokens (10). Do I have this backwards? If not, then is the compiler allocating the length of each string dynamically? In need of some clarification.
strtok is a bad citizen.
For one thing, it retains state, as you've implicitly used when you call strtok(NULL,...) -- this state is stored in the private memory of the Standard C Library, which means only single threaded programs can use strtok. Note that there is a reentrant version called strtok_r in some libraries.
For another, and to answer your question, strtok modifies its input. It doesn't allocate space for the strings; it writes NUL characters in place of your delimiter in the input string, and returns a pointer into the input string.
You are correct that strtok can return more than 10 results. You should check for that in your code so you don't write beyond the end of tokens. A reliable program would either set an upper limit, like your 10, and check for it, reporting an error if it's exceeded, or dynamically allocate the tokens array with malloc, and realloc it if it gets too big. Then the error occurs when you fun out of memory.
Note that you can also work around the problem of strtok modifying your input string by strduping before passing it to strtok. Then you'll have to free the new string after both it and tokens, which points to it, are going out of scope.
tokens is an array of pointers.
The distinction between strings and pointers if often fuzzy. In some situations strings are better thought out as arrays, in other situations as pointers.
Anyway... in your example input is an array and tokens is an array of pointers to a place within input.
The data inside input is changed with each call to strtok()
So, step by step
// input[] = "foo bar baz";
tokens[0] = strtok(input, " ");
// input[] = "foo\0bar baz";
// ^-- tokens[0] points here
tokens[1] = strtok(NULL, " ");
// input[] = "foo\0bar\0baz";
// ^-- tokens[1] points here
tokens[2] = strtok(NULL, " ");
// input[] = "foo\0bar\0baz";
// ^-- tokens[2] points here
// next strtok returns NULL
I'm implementing a function which, given a string, a character and another string (since now we can call it the "substring"); puts the substring everywhere the character is in the string.
To explain me better, given these parameters this is what the function should return (pseudocode):
func ("aeiou", 'i', "hello") -> aehelloou
I'm using some functions from string.h lib. I have tested it with pretty good result:
char *somestring= "this$ is a tes$t wawawa$wa";
printf("%s", strcinsert(somestring, '$', "WHAT?!") );
Outputs: thisWHAT?! is a tesWHAT?!t wawawaWHAT?!wa
so for now everything is allright. The problem is when I try to do the same with, for example this string:
char *somestring= "this \"is a test\" wawawawa";
printf("%s", strcinsert(somestring, '"', "\\\"") );
since I want to change every " for a \" . When I do this, the PC collapses. I don't know why but it stops working and then shutdown. I've head some about the bad behavior of some functions of the string.h lib but I couldn't find any information about this, I really thank any help.
My code:
#define salloc(size) (str)malloc(size+1) //i'm lazy
typedef char* str;
str strcinsert (str string, char flag, str substring)
{
int nflag= 0; //this is the number of times the character appears
for (int i= 0; i<strlen(string); i++)
if (string[i]==flag)
nflag++;
str new=string;
int pos;
while (strchr(string, flag)) //since when its not found returns NULL
{
new= salloc(strlen(string)+nflag*strlen(substring)-nflag);
pos= strlen(string)-strlen(strchr(string, flag));
strncpy(new, string, pos);
strcat(new, substring);
strcat(new, string+pos+1);
string= new;
}
return new;
}
Thanks for any help!
Some advices:
refrain from typedef char* str;. The char * type is common in C and masking it will just make your code harder to be reviewed
refrain from #define salloc(size) (str)malloc(size+1) for the exact same reason. In addition don't cast malloc in C
each time you write a malloc (or calloc or realloc) there should be a corresponding free: C has no garbage collection
dynamic allocation is expensive, use it only when needed. Said differently a malloc inside a loop should be looked at twice (especially if there is no corresponding free)
always test allocation function (unrelated: and io) a malloc will simply return NULL when you exhaust memory. A nice error message is then easier to understand than a crash
learn to use a debugger: if you had executed your code under a debugger the error would have been evident
Next the cause: if the replacement string contains the original one, you fall again on it and run in an endless loop
A possible workaround: allocate the result string before the loop and advance both in the original one and the result. It will save you from unnecessary allocations and de-allocations, and will be immune to the original char being present in the replacement string.
Possible code:
// the result is an allocated string that must be freed by caller
str strcinsert(str string, char flag, str substring)
{
int nflag = 0; //this is the number of times the character appears
for (int i = 0; i<strlen(string); i++)
if (string[i] == flag)
nflag++;
str new_ = string;
int pos;
new_ = salloc(strlen(string) + nflag*strlen(substring) - nflag);
// should test new_ != NULL
char * cur = new_;
char *old = string;
while (NULL != (string = strchr(string, flag))) //since when its not found returns NULL
{
pos = string - old;
strncpy(cur, old, pos);
cur[pos] = '\0'; // strncpy does not null terminate the dest. string
strcat(cur, substring);
strcat(cur, string + 1);
cur += strlen(substring) + pos; // advance the result
old = ++string; // and the input string
}
return new_;
}
Note: I have not reverted the str and salloc but you really should do.
In your second loop, you always look for the first flag character in the string. In this case, that’ll be the one you just inserted from substring. The strchr function will always find that quote and never return NULL, so your loop will never terminate and just keep allocating memory (and not enough of it, since your string grows arbitrarily large).
Speaking of allocating memory, you need to be more careful with that. Unlike in Python, C doesn’t automatically notice when you’re no longer using memory; anything you malloc must be freed. You also allocate far more memory than you need: even in your working "this$ is a tes$t wawawa$wa" example, you allocate enough space for the full string on each iteration of the loop, and never free any of it. You should just run the allocation once, before the second loop.
This isn’t as important as the above stuff, but you should also pay attention to performance. Each call to strcat and strlen iterates over the entire string, meaning you look at it far more often than you need. You should instead save the result of strlen, and copy the new string directly to where you know the NUL terminator is. The same goes for strchr; you already replaced the beginning of the string and don’t want to waste time looking at it again, apart from the part where that’s causing your current bug.
In comparison to these issues, the style issues mentioned in the comments with your typedef and macro are relatively minor, but they are still worth mentioning. A char* in C is different from a str in Python; trying to typedef it to the same name just makes it more likely you’ll try to treat them as the same and run into these issues.
I don't know why but it stops working
strchr(string, flag) is looking over the whole string for flag. Search needs to be limited to the portion of the string not yet examined/updated. By re-searching the partially replaces string, code is finding the flag over and over.
The whole string management approach needs re-work. As OP reported a Python background, I've posted a very C approach as mimicking Python is not a good approach here. C is different especially in the management of memory.
Untested code
// Look for needles in a haystack and replace them
// Note that replacement may be "" and result in a shorter string than haystack
char *strcinsert_alloc(const char *haystack, char needle, const char *replacment) {
size_t n = 0;
const char *s = haystack;
while (*s) {
if (*s == needle) n++; // Find needle count
s++;
}
size_t replacemnet_len = strlen(replacment);
// string length - needles + replacements + \0
size_t new_size = (size_t)(s - haystack) - n*1 + n*replacemnet_len + 1;
char *dest = malloc(new_size);
if (dest) {
char *d = dest;
s = haystack;
while (*s) {
if (*s == needle) {
memcpy(d, s, replacemnet_len);
d += replacemnet_len;
} else {
*d = *s;
d++;
}
s++;
}
*d = '\0';
}
return dest;
}
In your program, you are facing problem for input -
char *somestring= "this \"is a test\" wawawawa";
as you want to replace " for a \".
The first problem is whenever you replace " for a \" in string, in next iteration strchr(string, flag) will find the last inserted " of \". So, in subsequent interations your string will form like this -
this \"is a test" wawawawa
this \\"is a test" wawawawa
this \\\"is a test" wawawawa
So, for input string "this \"is a test\" wawawawa" your while loop will run for infinite times as every time strchr(string, flag) finds the last inserted " of \".
The second problem is the memory allocation you are doing in your while loop in every iteration. There is no free() for the allocated memory to new. So when while loop run infinitely, it will eat up all the memory which will lead to - the PC collapses.
To resolve this, in every iteration, you should search for flag only in the string starting from a character after the last inserted substring to the end of the string. Also, make sure to free() the dynamically allocated memory.
i am trying to convert a string (example: "hey there mister") into a double pointer that's pointing to every word in the sentence.
so: split_string->|pointer1|pointer2|pointer3| where pointer1->"hey", pointer2->"there" and pointer3->"mister".
char **split(char *s) {
char **nystreng = malloc(strlen(s));
char str[strlen(s)];
int i;
for(i = 0; i < strlen(s); i++){
str[i] = s[i];
}
char *temp;
temp = strtok(str, " ");
int teller = 0;
while(temp != NULL){
printf("%s\n", temp);
nystreng[teller] = temp;
temp = strtok(NULL, " ");
}
nystreng[teller++] = NULL;
//free(nystreng);
return nystreng;
}
My question is, why isnt this working?
Your code has multiple problems. Among them:
char **nystreng = malloc(strlen(s)); is just wrong. The amount of space you need is the size of a char * times the number pieces into which the string will be split plus one (for the NULL pointer terminator).
You fill *nystreng with pointers obtained from strtok() operating on local array str. Those pointers are valid only for the lifetime of str, which ends when the function returns.
You do not allocate space for a string terminator in str, and you do not write one, yet you pass it to strtok() as if it were a terminated string.
You do not increment teller inside your tokenization loop, so each token pointer overwrites the previous one.
You have an essential problem here in that you do not know before splitting the string how many pieces there will be. You could nevertheless get an upper bound on that by counting the number of delimiter characters and adding 1. You could then allocate space for that many char pointers plus one. Alternatively, you could build a linked list to handle the pieces as you tokenize, then allocate the result array only after you know how many pieces there are.
As for str, if you want to return pointers into it, as apparently you do, then it needs to be dynamically allocated, too. If your platform provides strdup() then you could just use
char *str = strdup(s);
Otherwise, you'll need to check the length, allocate enough space with malloc() (including space for the terminator), and copy the input string into the allocated space, presumably with strcpy(). Normally you would want to free the string afterward, but you must not do that if you are returning pointers into that space.
On the other hand, you might consider returning an array of strings that can be individually freed. For that, you must allocate each substring individually (strdup() would again be your friend if you have it), and in that event you would want to free the working space (or allow it to be cleaned up automatically if you use a VLA).
There are two things you need to do -
char str[strlen(s)]; //size should be equal to strlen(s)+1
Extra 1 for '\0'. Right now you pass str (not terminated with '\0') to strtok which causes undefined behaviour .
And second thing ,you also need allocate memory to each pointer of nystring and then use strcpy instead of pointing to temp(don't forget space for nul terminator).
i Have the following code
char inputs []="3,0,23.30,3,30/55,55,55,55,55,55,55,55,55,55,55,55,55,64,64,64,100,100,100,100,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,55,55,70/1.5,0.5,0.2,0.2,0.3,0.1";
char parameters[18];
strcpy(parameters,strtok(inputs,"/"));
and then some code to transmit my characters through uart and see them at a monitor. when i transmit the inputs i see them fine but when i transmit the parameters i see nothing the string is empty.
i have seen examples for strtok and it uses that code to split strings. I have also tried this kind of code at visual studio and when i print them it shows me the strings fine. Is there any chance that strtok doesn't function well with a microprocessor????
While working with microcontrollers, you have to take care from which memory area you are working on. On software running on a PC, everything is stored and run from the RAM. But, on a flash microcontroller, code is run from flash (also called program memory) while data are processed from RAM (also called data memory).
In the case you are working on, the inputs variable is storing an hardcoded character array, which can be const, and we don't know in which area the compiler chose to put it. So, we could rewrite you small program just to make sure that all the data are stored in program data and we will use the "_P" functions to manipulate this data.
#include <avr/pgmspace.h > // to play in program space
const char inputs PROGMEM []="3,0,23.30,3,30/55,55,55,55,55,55,55,55,55,55,55,55,55,64,64,64,100,100,100,100,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,55,55,70/1.5,0.5,0.2,0.2,0.3,0.1"; // Now, we are sure this is in program memory space
char buffer[200]; // should be long enough to contain a copy of inputs
char parameters[18];
int length = strlen_P(inputs); // Should contains string length of 186 (just to debug)
strcpy_P(buffer,inputs); // Copy the PROGMEM data in data memory space
strcpy(parameters,strtok_P(buffer,"/")); // Parse the data now in data memory space
For more info on program space with avr gcc : http://www.nongnu.org/avr-libc/user-manual/pgmspace.html
I don't use strtok much, if at all, but this appears to be the correct way to store the result of strtok() in an char[]:
const int NUM_PARAMS = 18;
const int MAX_CHARS = 64;
char parameters[NUM_PARAMS][MAX_CHARS];
char delims[] = "/";
char *result = NULL;
int count = 0;
result = strtok(inputs, delims);
while(result != NULL && count < NUM_PARAMS){
strncpy(parameters[count++], result, MAX_CHARS);
result = strtok(NULL, delims);
}
or this if you don't want to allocate unnecessary memory for smaller tokens:
const int NUM_PARAMS = 18;
char* parameters[NUM_PARAMS];
char delims[] = "/";
char *result = NULL;
int count = 0;
result = strtok(inputs, delims);
while(result != NULL && count < NUM_PARAMS){
parameters[count] = malloc(strlen(result) + 1);
strncpy(parameters[count++], result, MAX_CHARS);
result = strtok(NULL, delims);
}
parameters should now contain all of your tokens.
strtok() intentionally modifies the source string, replacing tokens with terminators as it goes. There is no reason to store the content in auxiliary storage whatsoever so long as the source string is mutable.
Splitting the string into conjoined constants (which will be assembled by the compiler, so they ultimately will be a single terminated string):
char inputs []="3,0,23.30,3,30/"
"55,55,55,55,55,55,55,55,55,55,55,55,55,64,64,64,100,100,100,100,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,55,55,70/"
"1.5,0.5,0.2,0.2,0.3,0.1";
Clearly the second and third sequences delimited by '/' are nowhere near 17 chars wide (remember, your copy needs one more place for the terminator, thus only 17 chars can legally be copied).
Unless there is a compelling reason to do otherwise, I see no reason you can't simply do this:
char *param = strtok(inputs, "/");
while (param != NULL)
{
// TODO: do something with param...
// ... then advance to next param
param = strtok(NULL, "/");
}
What you do with // TODO do something with param... is up to you. The length of the parameter can be retrieved by strlen(param), for example. You can make a copy, providing you have enough storage space as the destination (in the second and third cases, you don't provide enough storage with only an 18-char buffer in your example).
Regardless, remember, strtok() modifies the source inputs[] array. If that is not acceptable an alternative would be something like:
char *tmp_inputs = strdup(inputs);
if (tmp_inputs != NULL)
{
char *param = strtok(tmp_inputs, "/");
while (param != NULL)
{
// TODO: do something with param...
// ... then advance to next param
param = strtok(NULL, "/");
}
// done with copy of inputs. free it.
free(tmp_inputs);
}
There are considerably threading decisions to make as well, the very reason the version of strtok() that requires the caller (you) to tote around context between calls was invented. See strtok_r() for more information.
You have to initialialize parameters that way :
char parameters[18] = {'\0'};
Thus, it will be initialized with null characters instead of variable values. This is important, because strcpy will try to find a null character to identify the end of the string.
Just to make it clear...
parameters after its initialisation : [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
parameters after strtok : ['3',',','0',',','2','3','.','3','0',',','3',',','3','0',0,0,0,0]
I'm trying to pass a string to chdir(). But I always seem to have some trailing stuff makes the chdir() fail.
#define IN_LEN 128
int main(int argc, char** argv) {
int counter;
char command[IN_LEN];
char** tokens = (char**) malloc(sizeof(char)*IN_LEN);
size_t path_len; char path[IN_LEN];
...
fgets(command, IN_LEN, stdin)
counter = 0;
tmp = strtok(command, delim);
while(tmp != NULL) {
*(tokens+counter) = tmp;
tmp = strtok(NULL, delim);
counter++;
}
if(strncmp(*tokens, cd_command, strlen(cd_command)) == 0) {
path_len = strlen(*(tokens+1));
strncpy(path, *(tokens+1), path_len-1);
// this is where I try to remove the trailing junk...
// but it doesn't work on a second system
if(chdir(path) < 0) {
error_string = strerror(errno);
fprintf(stderr, "path: %s\n%s\n", path, error_string);
}
// just to check if the chdir worked
char buffer[1000];
printf("%s\n", getcwd(buffer, 1000));
}
return 0;
}
There must be a better way to do this. Can any help out? I'vr tried to use scanf but when the program calls scanf, it just hangs.
Thanks
It looks like you've forgotten to append a null '\0' to path string after calling strncpy(). Without the null terminator chdir() doesn't know where the string ends and it just keeps looking until it finds one. This would make it appear like there are extra characters at the end of your path.
You have (at least) 2 problems in your example.
The first one (which is causing the immediately obvious problems) is the use of strncpy() which doesn't necessarily place a '\0' terminator at the end of the buffer it copies into. In your case there's no need to use strncpy() (which I consider dangerous for exactly the reason you ran into). Your tokens will be '\0' terminated by strtok(), and they are guaranteed to be smaller than the path buffer (since the tokens come from a buffer that's the same size as the path buffer). Just use strcpy(), or if you want the code to be resiliant of someone coming along later and mucking with the buffer sizes use something like the non-standard strlcpy().
As a rule of thumb don't use strncpy().
Another problem with your code is that the tokens allocation isn't right.
char** tokens = (char**) malloc(sizeof(char)*IN_LEN);
will allocate an area as large as your input string buffer, but you're storing pointers to strings in that allocation, not chars. You'll have fewer tokens than characters (by definition), but each token pointer is probably 4 times larger than a character (depending on the platform's pointer size). If your string has enough tokens, you'll overrun this buffer.
For example, assume IN_LEN is 14 and the input string is "a b c d e f g". If you use spaces as the delimiter, there will be 7 tokens, which will require a pointer array with 28 bytes. Quite a few more than the 14 allocated by the malloc() call.
A simple change to:
char** tokens = (char**) malloc((sizeof(char*) * IN_LEN) / 2);
should allocate enough space (is there an off-by-one error in there? Maybe a +1 is needed).
A third problem is that you potentially access *tokens and *(tokens+1) even if zero or only one token was added to the array. You'll need to add some checks of the counter variable before dereferencing those pointers.