I have a character like ';' or ',' used as a delimiter in a raw string. I need to split the string and iterate over each substring.
Ex: If,
char* str = "apples, mangoes , orang; ,ad";
And the delimiter is ',' then I need something like
while(substr!='\0') {
func(substr);
//some operation maybe like substr=strstr(substr)+1;
}
The function should be called 4 times with strings: "apples"," mangoes "," orang; ","ad".
In your case str points to a string literal, and you cannot use strtok on it because string literals are typically stored in read-only memory.
strtok modifies the string it is given, so calling it on a literal is undefined behaviour and usually crashes at run time (segmentation fault).
To split the string you need a modifiable buffer, for example a char array filled from user input with fgets (the preferred way to read a line).
fgets (str, SIZE, stdin); // user input for str; str must be a writable char array of at least SIZE bytes
Use strtok only on a string held in writable memory such as this buffer.
char * strtok (char *string, const char *delimiter);
This is how you can use it.
char *buff = strtok (str, ",;");
while (buff != NULL)
{
printf ("%s\n", buff); /* never pass input data as the format string */
buff = strtok (NULL, ",;");
}
For more information on the string functions
man string
strtok is a handy tool for tokenizing a string in C. Note that strtok modifies the original string.
In your case
char* str = "apples, mangoes , orang; ,ad";
This is a string literal, which is read-only, and using strtok on it is undefined behaviour. It is better to use a modifiable array, or to copy the string to a temporary buffer and apply strtok to that buffer.
For example
#include <stdio.h>
#include <string.h>

int main(void)
{
char str[] = "apples, mangoes , orang; ,ad";
char *token = strtok (str, ",;");
while (token != NULL)
{
printf ("%s ",token);
token = strtok (NULL, ",;");
}
return 0;
}
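Note that strtok with ",;" also splits on ';' and skips empty fields, whereas the question expects exactly four fields split on ',' only. A minimal sketch of a non-destructive alternative built on strchr follows. next_field is a hypothetical helper name, not a library function, and it can be used on the string literal because it never writes to the string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: describes the next field of a string without
   modifying it. *cursor must initially point at the start of the string.
   Returns 1 and sets *field / *len, or returns 0 when no fields remain. */
int next_field(const char **cursor, char delim,
               const char **field, size_t *len)
{
    assert(cursor != NULL && field != NULL && len != NULL);
    assert(delim != '\0');

    if (*cursor == NULL)
        return 0;                        /* the last field was already returned */

    const char *end = strchr(*cursor, delim);
    *field = *cursor;
    if (end != NULL) {
        *len = (size_t)(end - *cursor);  /* field stops just before the delimiter */
        *cursor = end + 1;               /* next field starts after it */
    } else {
        *len = strlen(*cursor);          /* final field runs to the terminator */
        *cursor = NULL;                  /* mark the string as exhausted */
    }
    return 1;
}
```

The fields are not null-terminated, so each one would be passed on with its length, for example printf("%.*s\n", (int)len, field).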
For some reason, whether I try strchr or strrchr, I get the same value returned. I have no idea why.
Here is the segment of code giving me trouble:
printf("Enter a data point (-1 to stop input):\n");
fgets(input, 50, stdin);
char *name = strrchr(input, ',');
if(name)
{
printf("%s", name);
}
The input is Jane Austen, 6, and I am trying to separate it into two strings: one before the comma and one after the comma. However, my use of strrchr(input, ','); or strchr(input, ','); seems pointless, as my output is ALWAYS , 6. Can someone explain why?
It sounds like you want strtok instead:
char *name = strtok(input, ",");
char *value = strtok(NULL, ",");
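With the input Jane Austen, 6 there is only one comma, so strchr (first match) and strrchr (last match) return the same pointer, and that pointer refers to the comma itself, which is why printing it shows , 6. As a rough sketch, the two halves can also be separated by hand with strchr. split_once is a hypothetical helper name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: cuts `input` in two at the first `delim`.
   On success the delimiter is overwritten with '\0', so `input` becomes
   the left half, and a pointer to the right half is returned.
   Returns NULL (and leaves `input` untouched) if `delim` is absent. */
char *split_once(char *input, char delim)
{
    assert(input != NULL && delim != '\0');

    char *sep = strchr(input, delim);   /* points AT the delimiter, not past it */
    if (sep == NULL)
        return NULL;
    *sep = '\0';                        /* terminate the left half */
    return sep + 1;                     /* right half starts after the delimiter */
}
```

The right half keeps its leading space and the trailing newline left by fgets, so it may still need trimming before use.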
Some languages provide a string function split that takes a string or regular expression and splits the string into a list of substrings separated by the delimiter (Python, Ruby, Perl). It is not too difficult to construct such a split function, especially if you just split on a single character.
char** split(char* string, char delim);
You will also want a string join function, and a function to clean up the allocated space.
char* split_join(char** splitray, char* buffer);
void split_free(char** splitray);
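One possible way to fill in split and split_free is sketched below. It is not the answerer's implementation: it assumes the result is a NULL-terminated array of freshly allocated copies, that empty fields are kept, and it takes a const char * so that string literals can be passed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Frees an array returned by split(), including every substring. */
void split_free(char **parts)
{
    if (parts == NULL)
        return;
    for (char **p = parts; *p != NULL; p++)
        free(*p);
    free(parts);
}

/* Sketch of split(): returns a NULL-terminated array of newly allocated
   substrings of `string`, or NULL if an allocation fails. */
char **split(const char *string, char delim)
{
    assert(string != NULL && delim != '\0');

    size_t count = 1;                        /* n delimiters give n + 1 fields */
    for (const char *p = string; *p != '\0'; p++)
        if (*p == delim)
            count++;

    char **parts = malloc((count + 1) * sizeof *parts);  /* +1 for the sentinel */
    if (parts == NULL)
        return NULL;

    const char *start = string;
    for (size_t i = 0; i < count; i++) {
        const char *end = strchr(start, delim);
        size_t len = end ? (size_t)(end - start) : strlen(start);

        parts[i] = malloc(len + 1);
        if (parts[i] == NULL) {              /* NULL here also ends the array */
            split_free(parts);
            return NULL;
        }
        memcpy(parts[i], start, len);
        parts[i][len] = '\0';
        if (end != NULL)
            start = end + 1;                 /* continue after the delimiter */
    }
    parts[count] = NULL;
    return parts;
}
```

Because every substring is a copy, the input string is left untouched and the caller owns the result until it is passed to split_free.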
I am trying to tokenize a string, but I need to know exactly when no data is seen between two tokens. E.g. when tokenizing the string "a,b,c,,,d,e" I need to know about the two empty slots between 'c' and 'd', which I am unable to find out simply using strtok(). My attempt is shown below:
char arr_fields[num_of_fields];
char delim[]=",\n";
char *tok;
tok=strtok(line,delim);//line contains the data
for(i=0;i<num_of_fields;i++,tok=strtok(NULL,delim))
{
if(tok)
sprintf(arr_fields[i], "%s", tok);
else
sprintf(arr_fields[i], "%s", "-");
}
Executing the above code with the aforementioned example puts the characters a,b,c,d,e into the first five elements of arr_fields, which is not what I want. I need each character to land at a specific index of the array: if a field is missing between two delimiters, the gap should be recorded as such.
7.21.5.8 the strtok function
The standard says the following regarding strtok:
[#3] The first call in the sequence searches the string
pointed to by s1 for the first character that is not
contained in the current separator string pointed to by s2.
If no such character is found, then there are no tokens in
the string pointed to by s1 and the strtok function returns
a null pointer. If such a character is found, it is the
start of the first token.
The quote above shows that you cannot use strtok to solve your specific problem, since it treats any run of consecutive characters found in delims as a single delimiter.
Am I doomed to weep in silence, or can somebody help me out?
You can easily implement your own version of strtok that does what you want, see the snippets at the end of this post.
strtok_single makes use of strpbrk (char const* src, const char* delims) which will return a pointer to the first occurrence of any character in delims that is found in the null-terminated string src.
If no matching character is found the function will return NULL.
strtok_single
char *
strtok_single (char * str, char const * delims)
{
static char * src = NULL;
char * p, * ret = 0;
if (str != NULL)
src = str;
if (src == NULL)
return NULL;
if ((p = strpbrk (src, delims)) != NULL) {
*p = 0;
ret = src;
src = ++p;
} else if (*src) {
ret = src;
src = NULL;
}
return ret;
}
sample use
char delims[] = ",";
char data [] = "foo,bar,,baz,biz";
char * p = strtok_single (data, delims);
while (p) {
printf ("%s\n", *p ? p : "<empty>");
p = strtok_single (NULL, delims);
}
output
foo
bar
<empty>
baz
biz
You can't use strtok() if that's what you want. From the man page:
A sequence of two or more contiguous delimiter characters in the parsed
string is considered to be a single delimiter. Delimiter characters at
the start or end of the string are ignored. Put another way: the
tokens returned by strtok() are always nonempty strings.
Therefore it is just going to jump from c to d in your example.
You're going to have to parse the string manually or perhaps search for a CSV parsing library that would make your life easier.
Lately I was looking for a solution to the same problem and found this thread.
You can use strsep().
From the manual:
The strsep() function was introduced as a replacement for strtok(3),
since the latter cannot handle empty fields.
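A minimal sketch of the strsep() approach, assuming a platform that provides it (it is a BSD extension that glibc also ships, not part of standard C). split_fields is a hypothetical helper name, and like strtok this modifies the buffer it is given.

```c
#define _DEFAULT_SOURCE   /* asks glibc to declare strsep(); harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `line` in place and stores up to `max`
   field pointers in `fields`, keeping empty fields as "" entries.
   Returns the number of fields stored. */
size_t split_fields(char *line, const char *delims,
                    char **fields, size_t max)
{
    assert(line != NULL && delims != NULL && fields != NULL);

    size_t n = 0;
    char *rest = line;    /* strsep advances this; it becomes NULL at the end */
    char *tok;
    while (n < max && (tok = strsep(&rest, delims)) != NULL)
        fields[n++] = tok;
    return n;
}
```

For "a,b,c,,,d,e" this yields seven entries, with the fourth and fifth pointing at empty strings, so each value keeps its position.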
As mentioned in this answer, you'll want to implement something like strtok yourself. I prefer using strcspn (as opposed to strpbrk), as it allows for fewer if statements:
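char arr_fields[num_of_fields][MAX_FIELD_LEN]; /* MAX_FIELD_LEN is a hypothetical per-field capacity */
char delim[]=",\n";
size_t current_token = 0;
size_t token_length;
for (i = 0; i < num_of_fields; i++)
{
    /* length of the field starting at current_token; 0 means an empty field */
    token_length = strcspn(line + current_token, delim);
    if(token_length)
        snprintf(arr_fields[i], MAX_FIELD_LEN, "%.*s", (int)token_length, line + current_token);
    else
        snprintf(arr_fields[i], MAX_FIELD_LEN, "%s", "-");
    current_token += token_length;
    if (line[current_token] != '\0')
        current_token++; /* step over the delimiter itself */
}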
Parse (for example, strtok)
Sort
Insert
Rinse and repeat as needed :)
You could try using strchr to find the locations of the , symbols. Copy the text up to the comma you found by hand (using memcpy or strncpy), then call strchr again starting just past it. This way you can see when two or more commas are adjacent (the pointers returned by consecutive strchr calls differ by exactly 1) and write an if statement to handle that case.
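The strchr and memcpy idea could look roughly like this. copy_field is a hypothetical helper name, and the fixed-size output buffer is an assumption made for the sketch. An empty field shows up as a zero-length copy because the field start and the next comma are the same address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies field number `index` (0-based) of `s`
   into `out`. Returns 1 on success, 0 if the field does not exist or
   does not fit in `outsz` bytes. The source string is never modified. */
int copy_field(const char *s, char delim, size_t index,
               char *out, size_t outsz)
{
    assert(s != NULL && out != NULL && delim != '\0');

    const char *start = s;
    const char *end = strchr(start, delim);
    while (index > 0 && end != NULL) {   /* walk forward one field at a time */
        start = end + 1;
        end = strchr(start, delim);
        index--;
    }
    if (index > 0)
        return 0;                        /* ran out of delimiters first */

    size_t len = end ? (size_t)(end - start) : strlen(start);
    if (len >= outsz)
        return 0;                        /* no room for the field plus '\0' */
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

Looking up every field this way rescans the string from the start each time, so for long inputs a single pass that remembers the previous position would be preferable.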
I am using strtok to tokenise a string. Does strtok affect the original buffer? For example:
char buf[] = "This Is Start Of life";
char *pch = strtok(buf," ");
while(pch)
{
printf("%s \n", pch);
pch = strtok(NULL," ");
}
printf("Original Buffer:: %s ",buf);
Output is::
This
Is
Start
Of
life
Original Buffer:: This
I read that strtok returns a pointer to the next token, so how is buf getting affected? Is there a way to retain the original buffer (without extra copy overhead)?
Follow-on question: from the answers so far I gather there is no way to retain the buffer. So if I allocate the original buffer dynamically and strtok modifies it, will there be a memory leak when I free the original buffer, or does strtok take care of freeing the memory?
strtok() doesn't create a new string and return it; it returns a pointer to the token within the string you pass as argument to strtok(). Therefore the original string gets affected.
strtok() breaks the string up: it replaces the delimiter character with a null character ('\0') and returns a pointer to the beginning of the token. Therefore, after you run strtok(), the delimiter characters it consumed will have been replaced by null characters.
The output you are getting is as expected, since the delimiter character is replaced by strtok.
When you call strtok(NULL, "|"), strtok finds the next token and writes '\0' over the delimiter that ends it, which modifies the string. So you need to make a copy of the original string before tokenization. strtok never allocates or frees memory, so freeing a dynamically allocated buffer afterwards works as usual.
Please try following:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char buf[] = "This Is Start Of life";
    char *buf1;
    /* calloc() allocates the memory and zero-initializes it */
    buf1 = calloc(strlen(buf)+1, sizeof(char));
    if (buf1 == NULL)
        return 1;
    strcpy(buf1, buf); /* keep an untouched copy before tokenizing */
    char *pch = strtok(buf," ");
    while(pch)
    {
        printf("%s \n", pch);
        pch = strtok(NULL," ");
    }
    printf("Original Buffer:: %s ",buf1);
    free(buf1);
    return 0;
}
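If the copy is the concern, one option is to avoid strtok altogether and scan with strspn and strcspn, which only read the string. The sketch below assumes the caller can work with a pointer and a length instead of a null-terminated token. next_token is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds the next token at or after *pos without
   writing to the string. Returns the token start and stores its length
   in *len, or returns NULL when no tokens remain. Runs of delimiters
   are skipped, as strtok does. */
const char *next_token(const char **pos, const char *delims, size_t *len)
{
    assert(pos != NULL && *pos != NULL && delims != NULL && len != NULL);

    const char *start = *pos + strspn(*pos, delims);  /* skip leading delimiters */
    if (*start == '\0')
        return NULL;
    *len = strcspn(start, delims);                    /* characters up to next delimiter */
    *pos = start + *len;
    return start;
}
```

A caller would start with a cursor pointing at buf, call next_token(&cursor, " ", &len) in a loop, and print each token with printf("%.*s\n", (int)len, tok). The buffer is unchanged afterwards.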
Let's say I'm using strtok() like this..
char *token = strtok(input, ";-/");
Is there a way to figure out which delimiter actually gets used? For instance, if the input was something like:
Hello there; How are you? / I'm good - End
Can I figure out which delimiter was used for each token? I need to be able to output a specific message, depending on the delimiter that followed the token.
Important: strtok is not re-entrant, you should use strtok_r instead of it.
You can do it by saving a copy of the original string, and looking into offsets of the current token into that copy:
char str[] = "Hello there; How are you? / I'm good - End";
char *copy = strdup(str);
char *delim = ";-/";
char *res = strtok( str, delim );
while (res) {
printf("%c\n", copy[res-str+strlen(res)]);
res = strtok( NULL, delim );
}
free(copy);
This prints
;
/
-
Demo #1
EDIT: Handling multiple delimiters
If you need to handle multiple delimiters, determining the length of the current sequence of delimiters becomes slightly harder: now you need to find the next token before deciding how long is the sequence of delimiters. The math is not complicated, as long as you remember that NULL requires special treatment:
char str[] = "(20*(5+(7*2)))+((2+8)*(3+6*9))";
char *copy = strdup(str);
char *delim = "*+()";
char *res = strtok( str, delim );
while (res) {
int from = res-str+strlen(res);
res = strtok( NULL, delim );
int to = res != NULL ? res-str : strlen(copy);
printf("%.*s\n", to-from, copy+from);
}
free(copy);
Demo #2
You can't. strtok overwrites the next separator character with a nul character (in order to terminate the token that it's returning this time), and it doesn't store the previous value that it overwrites. The first time you call strtok on your example string, the ; is gone forever.
You could do something if you keep an unmodified copy of the string you're modifying with strtok - given the index of the nul terminator for your current token (relative to the start of the string), you can look at the same index in the copy and see what was there.
That might be worse than just writing your own code to separate the string, of course. You can use strpbrk or strcspn, if you can live with the resulting token not being nul-terminated for you.
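A minimal sketch of the strcspn variant, assuming the caller is fine with tokens described by a start pointer and a length. delimiter_after is a hypothetical helper name. Nothing is overwritten, so the delimiter can simply be read from the string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: measures the token starting at `token_start`
   and returns the delimiter character that ends it, or '\0' if the
   token runs to the end of the string. The length goes into *len. */
char delimiter_after(const char *token_start, const char *delims, size_t *len)
{
    assert(token_start != NULL && delims != NULL && len != NULL);

    *len = strcspn(token_start, delims);   /* characters before the first delimiter */
    return token_start[*len];              /* the delimiter itself, still intact */
}
```

To walk the whole string, the caller would advance by len + 1 after each call until the returned character is '\0'. Unlike strtok, this treats consecutive delimiters as separating empty tokens.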
man 3 strtok
The strtok() and strtok_r() functions return a pointer to the
beginning of each subsequent token in the string, after replacing the
token itself with a NUL character. When no
more tokens remain, a null pointer is returned.
But with a little pointer arithmetic you can do something like:
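char string[] = "Hello,World!"; /* must be a writable array, not a string literal */
char* dup = strdup(string);
char* hello = strtok(string, ","); /* first token: "Hello" */
char delim_used = dup[hello - string + strlen(hello)]; /* character that ended the token: ',' */
free(dup);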
My function foo(char *str) receives str that is a multiline string with new line characters that is null-terminated. I am trying to write a while loop that iterates through the string and operates on one line. What is a good way of achieving this?
void foo(char *str) {
while((line=getLine(str)) != NULL) {
// Process a line
}
}
Do I need to implement getLine myself or is there an in-built function to do this for me?
You will need to implement some kind of parsing based on the new line character yourself. strtok() with a delimiter of "\n" is a pretty good option that does something like what you're looking for but it has to be used slightly differently than your example. It would be more like:
char *tok;
char *delims = "\n";
tok = strtok(str, delims);
while (tok != NULL) {
// process the line
//advance the token
tok = strtok(NULL, delims);
}
You should note, however, that strtok() is both destructive and not threadsafe.
I think you might use strtok, which tokenizes a string into packets delimited by some specific characters, in your case the newline character:
void foo(char *str)
{
char *line = strtok(str, "\n");
while(line)
{
//work with line, which contains a single line without the trailing '\n'
...
//next line
line = strtok(NULL, "\n");
}
}
But keep in mind that this alters the contents of str (it actually replaces the '\n's by '\0's), so you may want to make a copy of it beforehand if you need it further.
Sooo... a bit late, but below is a re-entrant version of @debeer's and @Christian Rau's answer - notice strtok_r instead of strtok.
This can be called from multiple threads using different strings.
char *tok;
char *saveptr;
char *delims = "\n";
tok = strtok_r(str, delims, &saveptr);
while (tok != NULL) {
// process the line
//advance the token
tok = strtok_r(NULL, delims, &saveptr);
}
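The separate saveptr also makes nested tokenizing possible, which plain strtok cannot do because it keeps a single hidden state. As a rough illustration, the sketch below splits text into lines and each line into words. count_words is a hypothetical helper name, and strtok_r is assumed to be available (it is specified by POSIX, not by standard C).

```c
#define _POSIX_C_SOURCE 200809L   /* asks the C library to declare strtok_r() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts whitespace-separated words across all
   lines of `text`. The buffer is modified in place, as with strtok. */
size_t count_words(char *text)
{
    assert(text != NULL);

    size_t words = 0;
    char *line_state;   /* position of the outer (line) tokenizer */
    char *word_state;   /* position of the inner (word) tokenizer */

    for (char *line = strtok_r(text, "\n", &line_state);
         line != NULL;
         line = strtok_r(NULL, "\n", &line_state)) {
        for (char *word = strtok_r(line, " \t", &word_state);
             word != NULL;
             word = strtok_r(NULL, " \t", &word_state)) {
            words++;
        }
    }
    return words;
}
```

Empty lines produce no token at all here, so a caller that needs to see them would have to scan for '\n' by hand instead.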
Please note that it is still destructive as it modifies the string being tokenised.
You can use fgets to do the getLine work if the text comes from a stream (a FILE *) rather than from a string already in memory: http://linux.die.net/man/3/fgets