Suppose I have a character array (the program asks the user to enter some text):
char sentences[] = "The first sentence.The second sentence.The third sentence";
I need to store each sentence in its own array, so that I can access any word, or store all the sentences as elements of a single array.
(sentences[0] = "The first sentence"; sentences[1] = "The second sentence";)
I know how to print out each sentence separately:
char* sentence_1 = strtok(sentences, ".");
char* sentence_2 = strtok(NULL, ".");
char* sentence_3 = strtok(NULL, ".");
printf("#1 %s\n", sentence_1);
printf("#2 %s\n", sentence_2);
printf("#3 %s\n", sentence_3);
But I have no idea how to make the program store those sentences in one or three arrays.
Please, help!
If you keep everything in main, the sentences array lives for as long as main runs (it is not freed while the token pointers are in use), so you can simply do this:
#include <string.h>
#include <stdio.h>
int main()
{
char sentences[] = "The first sentence.The second sentence.The third sentence";
char* sentence[3];
unsigned int i;
sentence[0] = strtok(sentences, ".");
for (i=1;i<sizeof(sentence)/sizeof(sentence[0]);i++)
{
sentence[i] = strtok(NULL, ".");
}
for (i=0;i<sizeof(sentence)/sizeof(sentence[0]);i++)
{
printf("%u: %s\n",i,sentence[i]); // i is unsigned, so %u rather than %d
}
return 0;
}
In the general case, you first have to duplicate your input string:
char *sentences_dup = strdup(sentences);
sentence[0] = strtok(sentences_dup, ".");
There are several reasons for that:
You don't know the lifespan/scope of the input, and it is generally a pointer or a parameter, so your sentences could become invalid as soon as the input memory is freed or goes out of scope.
The passed buffer may be const: you cannot modify its memory (strtok modifies the passed buffer).
If you change sentences[] to *sentences in the example above, you are pointing at a read-only string literal: you have to make a copy of the buffer.
Don't forget to store the duplicated pointer, because you may need to free it at some point.
Another alternative is to also duplicate each token:
for (i=1;i<sizeof(sentence)/sizeof(sentence[0]);i++)
{
// assumes strtok() finds a token here; passing NULL to strdup() is undefined
sentence[i] = strdup(strtok(NULL, "."));
}
so you can free your big tokenized string at once, and the sentences have their own, independent memory.
EDIT: the remaining problem here is that you still have to know in advance how many sentences there are in your input.
For that, you could count the dots, and then allocate the proper number of pointers.
int j,nb_dots=0;
char pathsep = '.';
int nb_sentences;
int len = strlen(sentences);
char** sentence;
// first count how many dots we have
for (j=0;j<len;j++)
{
if (sentences[j]==pathsep)
{
nb_dots++;
}
}
nb_sentences = nb_dots+1; // one more!!
// allocate the array of strings
sentence=malloc((nb_sentences) * sizeof(*sentence));
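Now that we have the number of strings, we can perform our strtok loop. Just be careful to use nb_sentences and not sizeof(sentence)/sizeof(sentence[0]), which is now irrelevant (it is worth 1) because sentence is a pointer and no longer an array. Also note that strtok skips empty tokens, so nb_sentences is an upper bound on the number of tokens actually returned.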
But at this point you could also get rid of strtok completely, as proposed in another answer of mine.
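The counting and tokenizing steps above can be put together roughly as follows. This is only a sketch: split_on and free_tokens are hypothetical helper names, and it copies the input with malloc and strcpy rather than strdup so that it stays within ISO C.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees an array produced by split_on(). */
void free_tokens(char **tokens, size_t count)
{
    size_t i;

    if (tokens == NULL)
        return;
    for (i = 0; i < count; i++)
        free(tokens[i]);
    free(tokens);
}

/* Hypothetical helper: splits input on delim and returns a malloc'ed array
   of malloc'ed copies. *count receives the number of tokens.
   Returns NULL if an allocation fails. The input is left untouched. */
char **split_on(const char *input, char delim, size_t *count)
{
    char delims[2];
    size_t capacity = 1, n = 0;
    const char *p;
    char *work, *tok, **tokens;

    delims[0] = delim;
    delims[1] = '\0';

    /* Upper bound on the token count: number of delimiters plus one. */
    for (p = input; *p != '\0'; p++)
        if (*p == delim)
            capacity++;

    work = malloc(strlen(input) + 1);          /* private, writable copy */
    tokens = malloc(capacity * sizeof *tokens);
    if (work == NULL || tokens == NULL) {
        free(work);
        free(tokens);
        return NULL;
    }
    strcpy(work, input);

    for (tok = strtok(work, delims); tok != NULL; tok = strtok(NULL, delims)) {
        size_t len = strlen(tok) + 1;

        tokens[n] = malloc(len);               /* each token owns its memory */
        if (tokens[n] == NULL) {
            free_tokens(tokens, n);
            free(work);
            return NULL;
        }
        memcpy(tokens[n], tok, len);
        n++;
    }

    free(work);                                /* the big copy can go at once */
    *count = n;
    return tokens;
}
```

A caller would do something like char **s = split_on(sentences, '.', &n); and later free_tokens(s, n);. Because every token is an independent copy, the original buffer can be const or short-lived.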
I have the following code:
char inputs []="3,0,23.30,3,30/55,55,55,55,55,55,55,55,55,55,55,55,55,64,64,64,100,100,100,100,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,55,55,70/1.5,0.5,0.2,0.2,0.3,0.1";
char parameters[18];
strcpy(parameters,strtok(inputs,"/"));
and then some code to transmit my characters through the UART and see them on a monitor. When I transmit inputs I see it fine, but when I transmit parameters I see nothing: the string is empty.
I have seen examples for strtok that use this kind of code to split strings. I have also tried this code in Visual Studio, and when I print the strings they show up fine. Is there any chance that strtok doesn't work well on a microcontroller?
When working with microcontrollers, you have to pay attention to which memory area you are working in. For software running on a PC, everything is stored in and run from RAM. On a flash microcontroller, however, code runs from flash (also called program memory) while data is processed in RAM (also called data memory).
In your case, the inputs variable holds a hardcoded character array, which could be treated as const, and we don't know in which area the compiler chose to put it. So we can rewrite your small program to make sure that the data is stored in program memory, and use the "_P" functions to copy it into RAM before manipulating it.
#include <avr/pgmspace.h> // to play in program space
#include <string.h> // strcpy, strtok
const char inputs[] PROGMEM ="3,0,23.30,3,30/55,55,55,55,55,55,55,55,55,55,55,55,55,64,64,64,100,100,100,100,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,55,55,70/1.5,0.5,0.2,0.2,0.3,0.1"; // Now, we are sure this is in program memory space
char buffer[200]; // should be long enough to contain a copy of inputs
char parameters[18];
int length = strlen_P(inputs); // Should contain a string length of 186 (just to debug)
strcpy_P(buffer,inputs); // Copy the PROGMEM data into data memory space
strcpy(parameters,strtok(buffer,"/")); // Parse the data now in data memory space; plain strtok, since both buffer and "/" are in RAM
For more info on program space with avr gcc : http://www.nongnu.org/avr-libc/user-manual/pgmspace.html
I don't use strtok much, if at all, but this appears to be the correct way to store the results of strtok() in a char[]:
const int NUM_PARAMS = 18;
const int MAX_CHARS = 64;
char parameters[NUM_PARAMS][MAX_CHARS];
char delims[] = "/";
char *result = NULL;
int count = 0;
result = strtok(inputs, delims);
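while(result != NULL && count < NUM_PARAMS){
strncpy(parameters[count], result, MAX_CHARS - 1);
parameters[count][MAX_CHARS - 1] = '\0'; // strncpy does not terminate when it truncates
count++;
result = strtok(NULL, delims);
}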
or this if you don't want to allocate unnecessary memory for smaller tokens:
const int NUM_PARAMS = 18;
char* parameters[NUM_PARAMS];
char delims[] = "/";
char *result = NULL;
int count = 0;
result = strtok(inputs, delims);
while(result != NULL && count < NUM_PARAMS){
parameters[count] = malloc(strlen(result) + 1);
if(parameters[count] == NULL) break; // allocation failed
strcpy(parameters[count++], result); // the buffer is exactly large enough for the token
result = strtok(NULL, delims);
}
parameters should now contain all of your tokens.
strtok() intentionally modifies the source string, replacing delimiters with terminators as it goes. There is no reason to store the content in auxiliary storage whatsoever so long as the source string is mutable.
Splitting the string into conjoined constants (which will be assembled by the compiler, so they ultimately will be a single terminated string):
char inputs []="3,0,23.30,3,30/"
"55,55,55,55,55,55,55,55,55,55,55,55,55,64,64,64,100,100,100,100,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,55,55,70/"
"1.5,0.5,0.2,0.2,0.3,0.1";
Clearly the second and third sequences delimited by '/' are longer than 17 chars (remember, your copy needs one more place for the terminator, thus only 17 chars can legally be copied into an 18-char buffer).
Unless there is a compelling reason to do otherwise, I see no reason you can't simply do this:
char *param = strtok(inputs, "/");
while (param != NULL)
{
// TODO: do something with param...
// ... then advance to next param
param = strtok(NULL, "/");
}
What you do with // TODO do something with param... is up to you. The length of the parameter can be retrieved by strlen(param), for example. You can make a copy, providing you have enough storage space as the destination (in the second and third cases, you don't provide enough storage with only an 18-char buffer in your example).
Regardless, remember, strtok() modifies the source inputs[] array. If that is not acceptable an alternative would be something like:
char *tmp_inputs = strdup(inputs);
if (tmp_inputs != NULL)
{
char *param = strtok(tmp_inputs, "/");
while (param != NULL)
{
// TODO: do something with param...
// ... then advance to next param
param = strtok(NULL, "/");
}
// done with copy of inputs. free it.
free(tmp_inputs);
}
There are considerable threading implications to think about as well, which is the very reason the version of strtok() that requires the caller (you) to carry context between calls was invented. See strtok_r() for more information.
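As a rough illustration of carrying that context yourself, here is a sketch that walks the '/'-separated groups and, inside each one, the ','-separated values. count_values is a hypothetical name, and strtok_r is a POSIX function rather than ISO C, so this assumes a POSIX-like toolchain.

```c
#define _POSIX_C_SOURCE 200809L  /* strtok_r is POSIX, not ISO C */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the '/'-separated groups and the
   ','-separated values inside them. Modifies input, as strtok does. */
size_t count_values(char *input, size_t *groups)
{
    char *outer_state, *inner_state;   /* one context per tokenizing loop */
    char *group, *value;
    size_t values = 0;

    *groups = 0;
    for (group = strtok_r(input, "/", &outer_state);
         group != NULL;
         group = strtok_r(NULL, "/", &outer_state)) {
        (*groups)++;
        /* The inner loop has its own state, so it does not disturb the outer one. */
        for (value = strtok_r(group, ",", &inner_state);
             value != NULL;
             value = strtok_r(NULL, ",", &inner_state))
            values++;
    }
    return values;
}
```

With plain strtok the inner loop would overwrite the single hidden position and the outer loop would stop after the first group.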
You should initialize parameters this way:
char parameters[18] = {'\0'};
Thus, it will be initialized with null characters instead of indeterminate values. Strictly speaking, strcpy looks for the null character in the source string and writes its own terminator to the destination, so the initialization is not what makes the copy work, but it does make the buffer's contents predictable if the copy never happens.
Just to make it clear...
parameters after its initialisation : [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
parameters after strcpy : ['3',',','0',',','2','3','.','3','0',',','3',',','3','0',0,0,0,0]
I am pretty confused by pointers in C. I am finding it hard to wrap my mind around creating them and passing things around. I get a "Segmentation Fault: 11" error after adding some code to a program that previously worked. This is part of the code:
char *token2;
char *line2;
char comma_loc = 0;
int num_of_commas = 0;
char *line2[1];
while(token != NULL) { //lets make sure token has a string token
//printf("Wats in token: %s\n", token);
if(key==true) {
//printf("This should be an identifier: %s\n", token);
if(comma != true) { //added if statement, just take away if it fails, the first case is the original
int len = strlen(token);
iden_holder[iden_holder_count] = (char *)malloc(sizeof(char) * (len +1));
memcpy(iden_holder[iden_holder_count], token, len +1);
iden_holder_count++;
key = false;
} else {
int len2 = strlen(token);
line2[0] = (char *)malloc(sizeof(char) * (len2 + 1));
memcpy(line2[0], token, len2 + 1);
token2 = strtok(line2[0],",");
while(token2 != NULL) {
int len = strlen(token2);
iden_holder[iden_holder_count] = (char *)malloc(sizeof(char) * (len +1));
memcpy(iden_holder[iden_holder_count], token, len +1);
iden_holder_count++;
token2 = strtok(line2[0],",");
}
key = false;
}
The point of this code is to take the string within token and copy it into another token, in my case token2. I decided to use memcpy, but I am confused about how to use it because of the pointers. I should also note that I used strtok before this, and the code here is inside that loop. Could it be that if I use it again it will override the other one?
Read this completely. It will help you with your basics. It helped me. :)
Pointers are exactly that: pointers. They're meant to point to something. The vast majority of problems people have with pointers is that they're not pointing anywhere intelligent :-)
Consider the following code:
char xyzzy[] = "hello";
char *pch;
In a stack-based C implementation, this will probably give you a stack containing the string and a pointer set to an arbitrary value.
The pointer exists on the stack like any other variable but it could point to anywhere.
If you then execute:
pch = xyzzy;
it's set to point to the first character of xyzzy (the h).
Arrays and pointers are very different beasts. For example, you cannot increment xyzzy to point to the second character of that string but you can increment pch.
The confusion arises because, in quite a lot of circumstances, arrays will decay to a pointer to the first element of that array.
That's basically the reason why you don't need [] for pointers, because they're not arrays. They do not know, and do not care, about how many things may exist at the memory they point at, their only concern is the one thing they currently point to.
Moving the pointer throughout the array, and ensuring you don't go off the ends, is extra management that you have to do as a programmer.
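As a small sketch of that kind of management, here is a pointer being moved along a string until it reaches the terminator. count_commas is a hypothetical name, chosen because the question keeps a num_of_commas counter.

```c
#include <stddef.h>

/* Hypothetical helper: walks a pointer along a string and counts the commas. */
size_t count_commas(const char *text)
{
    const char *p;        /* points at the character currently being examined */
    size_t commas = 0;

    /* Stop at the terminator; the pointer itself has no idea where the end is. */
    for (p = text; *p != '\0'; p++)
        if (*p == ',')
            commas++;
    return commas;
}
```

Passing an array such as char line[] = "a,b,,c"; works because the array decays to a pointer to its first element at the call; inside the function only the pointer is known, not the array's size.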
My application produces strings like the one below. I need to parse values between the separator into individual values.
2342|2sd45|dswer|2342||5523|||3654|Pswt
I am using strtok to do this in a loop. For the fifth token, I am getting 5523. However, I need to account for the empty value between the two separators || as well. 5523 should be the sixth token, as per my requirement.
token = (char *)strtok(strAccInfo, "|");
for (iLoop=1;iLoop<=106;iLoop++) {
token = (char *)strtok(NULL, "|");
}
Any suggestions?
In that case I often prefer a p2 = strchr(p1, '|') loop with a memcpy(s, p1, p2-p1) inside. It's fast, does not destroy the input buffer (so it can be used with const char *) and is really portable (even on embedded).
It's also reentrant; strtok isn't. (BTW: reentrant has nothing to do with multi-threading. strtok breaks already with nested loops. One can use strtok_r but it's not as portable.)
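A minimal sketch of such a strchr/memcpy loop, assuming fixed-size destination fields and a delimiter other than '\0'. split_fields and FIELD_MAX are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define FIELD_MAX 32   /* assumed maximum field size, including the terminator */

/* Hypothetical helper: copies each delim-separated field of input into
   fields[], keeping empty fields. Returns the number of fields stored
   (at most max_fields). Longer fields are truncated to FIELD_MAX - 1 chars.
   The input is not modified. */
size_t split_fields(const char *input, char delim,
                    char fields[][FIELD_MAX], size_t max_fields)
{
    const char *p1 = input;
    size_t n = 0;

    while (n < max_fields) {
        const char *p2 = strchr(p1, delim);   /* next separator, or NULL */
        size_t len = (p2 != NULL) ? (size_t)(p2 - p1) : strlen(p1);

        if (len > FIELD_MAX - 1)
            len = FIELD_MAX - 1;
        memcpy(fields[n], p1, len);
        fields[n][len] = '\0';
        n++;

        if (p2 == NULL)                       /* that was the last field */
            break;
        p1 = p2 + 1;                          /* continue after the separator */
    }
    return n;
}
```

For the string in the question this yields ten fields, with an empty string at index 4 and 5523 at index 5, which is the sixth token as required.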
That's a limitation of strtok. The designers had whitespace-separated tokens in mind. strtok doesn't do much anyway; just roll your own parser. The C FAQ has an example.
On a first call, the function expects a C string as argument for str, whose first character is used as the starting location to scan for tokens. In subsequent calls, the function expects a null pointer and uses the position right after the end of last token as the new starting location for scanning.
To determine the beginning and the end of a token, the function first scans from the starting location for the first character not contained in delimiters (which becomes the beginning of the token). And then scans starting from this beginning of the token for the first character contained in delimiters, which becomes the end of the token.
What this says is that it will skip any '|' characters at the beginning of a token, making 5523 the 5th token, which you already knew. I just thought I would explain why (I had to look it up myself). It also says that you will never get any empty tokens.
Since your data is set up this way, you have a couple of possible solutions:
1) find all occurrences of || and replace with | | (put a space in there)
2) do a strstr 5 times and find the beginning of the 5th element.
char *mystrtok(char **m,char *s,char c)
{
char *p=s?s:*m;
if( !*p )
return 0;
*m=strchr(p,c);
if( *m )
*(*m)++=0;
else
*m=p+strlen(p);
return p;
}
reentrant
thread-safe
strictly ANSI conforming
needs an otherwise unused helper pointer from the calling context
e.g.
char *p,*t,s[]="2342|2sd45|dswer|2342||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
puts(t);
e.g.
char *p,*t,s[]="2,3,4,2|2s,d4,5|dswer|23,42||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
{
char *p1,*t1;
for(t1=mystrtok(&p1,t,',');t1;t1=mystrtok(&p1,0,','))
puts(t1);
}
your work :)
implement char *c as parameter 3
Look into using strsep instead.
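A sketch of what that might look like. strsep is a BSD/glibc extension rather than ISO C, so this assumes a platform that provides it; split_with_strsep is a hypothetical name.

```c
#define _DEFAULT_SOURCE   /* strsep is a BSD/glibc extension, not ISO C */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores pointers to each field of input in out[],
   including empty fields. Returns the number of fields stored (at most
   max_out). Modifies input by writing terminators over the delimiters. */
size_t split_with_strsep(char *input, const char *delims,
                         char **out, size_t max_out)
{
    char *rest = input;   /* strsep advances this past each field */
    char *field;
    size_t n = 0;

    while (n < max_out && (field = strsep(&rest, delims)) != NULL)
        out[n++] = field;
    return n;
}
```

Unlike strtok, strsep returns an empty string for two adjacent separators, and the caller holds the position in rest, so there is no hidden state.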
Use something other than strtok. It's simply not intended to do what you're asking for. When I've needed this, I usually used strcspn or strpbrk and handled the rest of the tokenizing myself. If you don't mind it modifying the input string like strtok, it should be pretty simple. At least right off, something like this seems as if it should work:
// Warning: untested code. Should really use something with a less-ugly interface.
char *tokenize(char *input, char const *delim) {
static char *current; // just as ugly as strtok!
char *pos, *ret;
if (input != NULL)
current = input;
if (current == NULL)
return current;
ret = current;
pos = strpbrk(current, delim);
if (pos == NULL)
current = NULL;
else {
*pos = '\0';
current = pos+1;
}
return ret;
}
Inspired by Patrick Schlüter's answer, I made this function. It is supposed to be thread-safe, support empty tokens, and leave the original string unchanged:
char* strTok(char** newString, char* delimiter)
{
char* string = *newString;
char* delimiterFound = (char*) 0;
int tokLenght = 0;
char* tok = (char*) 0;
if(!string) return (char*) 0;
delimiterFound = strstr(string, delimiter);
if(delimiterFound){
tokLenght = delimiterFound-string;
}else{
tokLenght = strlen(string);
}
tok = malloc(tokLenght + 1);
memcpy(tok, string, tokLenght);
tok[tokLenght] = '\0';
*newString = delimiterFound ? delimiterFound + strlen(delimiter) : (char*)0;
return tok;
}
you can use it like
char* input = "1,2,3,,5,";
char** inputP = &input;
char* tok;
while( (tok=strTok(inputP, ",")) ){
printf("%s\n", tok);
free(tok); // each token is malloc'ed by strTok, so release it when done
}
This is supposed to output the following, with each empty token printed as an empty line:
1
2
3
(empty line)
5
(empty line)
I tested it with simple strings but haven't used it in production yet. I also posted it on Code Review, so you can see what others think about it.
Below is the solution that is working for me now. Thanks to all of you who responded.
I am using LoadRunner. Hence, some unfamiliar commands, but I believe the flow can be understood easily enough.
char strAccInfo[1024], *p2;
int iLoop;
Action() { //This value would come from the wrsp call in the actual script.
lr_save_string("323|90||95|95|null|80|50|105|100|45","test_Param");
//Store the parameter into a string - saves memory.
strcpy(strAccInfo,lr_eval_string("{test_Param}"));
//Get the first instance of the separator "|" in the string
p2 = (char *) strchr(strAccInfo,'|');
//Start a loop - Set the max loop value to more than max expected.
for (iLoop = 1;iLoop<200;iLoop++) {
//Save parameter names in sequence.
lr_param_sprintf("Param_Name","Parameter_%d",iLoop);
//Get the first instance of the separator "|" in the string (within the loop).
p2 = (char *) strchr(strAccInfo,'|');
//Save the value for the parameters in sequence.
lr_save_var(strAccInfo,p2 - strAccInfo,0,lr_eval_string("{Param_Name}"));
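//Save string after the first instance of p2, as strAccInfo - for looping.
//memmove is used because source and destination overlap (strcpy would be undefined here).
memmove(strAccInfo,p2+1,strlen(p2+1)+1);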
//Start conditional loop for checking for last value in the string.
if (strchr(strAccInfo,'|')==NULL) {
lr_param_sprintf("Param_Name","Parameter_%d",iLoop+1);
lr_save_string(strAccInfo,lr_eval_string("{Param_Name}"));
iLoop = 200;
}
}
}
I'm writing a function that gets the PATH environment variable of a system, splits it into individual paths, then concatenates some extra characters onto the end of each path.
Everything works fine until I use the strcat() function (see code below).
char* prependPath( char* exeName )
{
char* path = getenv("PATH");
char* pathDeepCopy = (char *)malloc(strlen(path) + 1);
char* token[80];
int j, i=0; // used to iterate through array
strcpy(pathDeepCopy, path);
//parse and split
token[0] = strtok(pathDeepCopy, ":"); //get pointer to first token found and store in 0
//place in array
while(token[i]!= NULL) { //ensure a pointer was found
i++;
token[i] = strtok(NULL, ":"); //continue to tokenize the string
}
for(j = 0; j <= i-1; j++) {
strcat(token[j], "/");
//strcat(token[j], exeName);
printf("%s\n", token[j]); //print out all of the tokens
}
}
My shell output is like this (I'm concatenating "/which" onto everything):
...
/usr/local/applic/Maple/bin/which
which/which
/usr/local/applic/opnet/8.1.A.wdmguru/sys/unix/bin/which
which/which
Bus error (core dumped)
I'm wondering why strcat is displaying a new line and then repeating which/which.
I'm also wondering about the Bus error (core dumped) at the end.
Has anyone seen this before when using strcat()?
And if so, anyone know how to fix it?
Thanks
strtok() does not give you a new string.
It mutilates the input string by inserting the char '\0' where the split character was.
So your use of strcat(token[j],"/") will put the '/' character where the '\0' was, and the new terminator then lands on the first character of the next token.
Also, for the last token, strcat will start appending 'which' past the end of your allocated memory into uncharted memory.
You can use strtok() to split a string into chunks. But if you want to append anything onto a token, you need to make a copy of the token; otherwise what you're appending will spill over onto the next token.
Also, you need to take more care with your memory allocation; you are leaking memory all over the place :-)
PS. If you must use C strings, use strdup() to copy the string.
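void prependPath( char* exeName )
{
char* path = getenv("PATH");
char* pathDeepCopy;
char* token[80];
char* tok;
int j, n = 0; // n counts the tokens stored
if (path == NULL) return; // PATH is not set
pathDeepCopy = strdup(path);
if (pathDeepCopy == NULL) return; // out of memory
// store each token pointer, never more than the array can hold
for(tok = strtok(pathDeepCopy, ":"); (tok != NULL) && (n < 80); tok = strtok(NULL, ":"))
{
token[n++] = tok;
}
for(j = 0; j < n; ++j)
{
// room for the token, the '/', the executable name and the terminator
char* tmp = (char*)malloc(strlen(token[j]) + 1 + strlen(exeName) + 1);
if (tmp == NULL) break;
strcpy(tmp,token[j]);
strcat(tmp,"/");
strcat(tmp,exeName);
printf("%s\n",tmp); //print out all of the tokens
free(tmp);
}
free(pathDeepCopy);
}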
strtok does not duplicate the token but instead just points to it within the string. So when you cat '/' onto the end of a token, you're writing a '\0' either over the start of the next token, or past the end of the buffer.
Also note that even if strtok did return copies of the tokens instead of the originals (which it doesn't), it wouldn't allocate the additional space for you to append characters, so it'd still be a buffer overrun bug.
strtok() tokenizes in place. When you start appending characters to the tokens, you're overwriting the next token's data.
Also, in general it's not safe to simply concatenate to an existing string unless you know that the size of the buffer the string is in is large enough to hold the resulting string. This is a major cause of bugs in C programs (including the dreaded buffer overflow security bugs).
So even if strtok() returned brand-new strings unrelated to your original string (which it doesn't), you'd still be overrunning the string buffers when you concatenated to them.
Some safer alternatives to strcpy()/strcat() that you might want to look into (you may need to track down implementations for some of these - they're not all standard):
strncpy() - includes the target buffer size to avoid overruns. Has the drawback of not always terminating the result string
strncat()
strlcpy() - similar to strncpy(), but intended to be simpler to use and more robust (http://en.wikipedia.org/wiki/Strlcat)
strlcat()
strcpy_s() - Microsoft variants of these functions
strncat_s()
And the API you should strive to use if you can use C++: the std::string class. If you use the C++ std::string class, you pretty much do not have to worry about the buffer containing the string - the class manages all of that for you.
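If you stay in C, one standard (C99) way to get a size-checked concatenation is snprintf. A minimal sketch, where join_path is a hypothetical helper name:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "dir/name" into dst.
   Returns 0 on success, -1 if the result would not fit in dst_size bytes. */
int join_path(char *dst, size_t dst_size, const char *dir, const char *name)
{
    /* snprintf never writes more than dst_size bytes and reports the
       length the full result would have needed. */
    int needed = snprintf(dst, dst_size, "%s/%s", dir, name);

    if (needed < 0 || (size_t)needed >= dst_size)
        return -1;   /* encoding error or truncated output */
    return 0;
}
```

The caller decides what to do on -1, for example skipping that path entry, instead of silently overrunning the buffer.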
OK, first of all, be careful. You are leaking memory.
strtok() returns a pointer to the next token and you are storing it in an array of chars.
Instead of char token[80] it should be char *token.
Be careful also when using strtok. strtok practically destroys the char array called pathDeepCopy because it will replace every occurrence of ":" with '\0', as Mike F told you above.
Be sure to initialize pathDeepCopy using memset or calloc.
So when you are coding token[i] there is no way of knowing what is being pointed at.
And as token has no valid data in it, it is likely to throw a core dump because you are trying to concatenate a string to another that has no valid data (token).
Perhaps the thing you are looking for is an array of pointers to char in which to store all the pointers to the tokens that strtok is returning, in which case token will be like char *token[];
Hope this helps a bit.
If you're using C++, consider boost::tokenizer as discussed over here.
If you're stuck in C, consider using strtok_r because it's re-entrant and thread-safe. Not that you need it in this specific case, but it's a good habit to establish.
Oh, and use strdup to create your duplicate string in one step.
replace that with
strcpy(pathDeepCopy, path);
//parse and split
token[0] = strtok(pathDeepCopy, ":");//get pointer to first token found and store in 0
//place in array
while(token[i]!= NULL) { //ensure a pointer was found
i++;
token[i] = strtok(NULL, ":"); //continue to tokenize the string
}
// use a new array for storing the new tokens
// pardon my C lang skills. It's been a "while" since I wrote device drivers in C.
char **newTokens = malloc(i * sizeof(*newTokens));
if (newTokens != NULL) {
for (int k = 0; k < i; ++k) {
// room for the token, the '/' and the terminator
newTokens[k] = malloc(strlen(token[k]) + 2);
if (newTokens[k] == NULL) break;
sprintf(newTokens[k], "%s%c", token[k], '/');
printf("%s\n", newTokens[k]); //print out all of the tokens
}
}
This writes into new buffers instead of overwriting the contents of the tokenized string, which prevents the core dump.
and don't forget to check if malloc returns NULL!