Parse comma separated string in C

Currently I'm trying this, which is printing out nothing (but there are no compilation problems):
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void parseGprmc(char* gprmc) {
printf("test");
char* ptr;
ptr = strtok(gprmc, ",");
while(ptr != NULL) {
printf("%s\n", ptr);
ptr = strtok(gprmc, ",");
}
}
int main() {
char* gprmc = "$GPRMC,081836,A,3751.65,S,14507.36,E,000.0,360.0,,,E*62";
printf("%s", gprmc);
parseGprmc(gprmc);
printf("%s", gprmc);
return 1;
}
What am I doing wrong?
Ideally, parseGprmc would print out:
$GPRMC
081836
A
3751.65
S
14507.36
E
000.0
360.0
E*62
Ideally the empty (null) fields would count as valid tokens too, but I don't think that strtok does that.

There are two serious issues in your code. First (as pointed out in the comments), the strtok function modifies the string given as its first argument – but you have declared gprmc as a pointer to a constant (immutable) string literal. To fix this, change the declaration to a char array that is initialized with a copy of the literal:
char gprmc[] = "$GPRMC,081836,A,3751.65,S,14507.36,E,000.0,360.0,,,E*62";
Second, only the first call to strtok for parsing a given string should have that string as its first argument; subsequent calls (on the same string) should use a NULL first argument (see this cppreference, particularly the part beginning with If str is not a null pointer, …).
Here is a 'fixed' version of your parseGprmc function:
void parseGprmc(char* gprmc)
{
printf("test");
char* ptr;
ptr = strtok(gprmc, ","); // First call, use string address
while (ptr != NULL) {
printf("%s\n", ptr);
ptr = strtok(NULL, ","); // Subsequent calls, use NULL
}
}
Note that it is not directly possible to extract empty (null) tokens using the strtok function, as the function searches for the first character which is not contained in delim (quoted from the same cppreference page). To do this, you have to add some 'tricks', comparing the returned address of each token with the address of the end of the previous token, and counting how many delimiter characters were in the 'gap'. Here is a possible modification of your function to do just that:
void parseGprmc(char* gprmc)
{
printf("test");
char* ptr;
char* lastend;
ptrdiff_t nBlanks; // ptrdiff_t is declared in <stddef.h>
ptr = strtok(gprmc, ","); // First call, use string address
while (ptr != NULL) {
lastend = ptr + strlen(ptr); // Address of the terminator just past the token
printf("%s\n", ptr);
ptr = strtok(NULL, ","); // Subsequent calls, use NULL
nBlanks = ptr ? ptr - lastend : -1; // Number of delimiters found
while (--nBlanks >= 1) { // One empty token for each gap > 1
printf("<null token>\n");
}
}
}

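One remark about strtok: you must pass NULL as the first argument on the second and later calls. strtok keeps its own pointer to where it stopped; passing gprmc again restarts the scan from the beginning, and because the first call replaced the first , with \0, you would get the first token back every time.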
ptr = strtok(gprmc, ","); // for first call before the loop
ptr = strtok(NULL, ",");// for the call following in the loop

homework? inside the loop you don't pass strtok a pointer, you pass NULL. See help on that function. But perhaps what you're wanting is to use strchr instead and change ',' to '\n' ?
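The strchr idea from the comment above could be sketched as follows. This is a minimal illustration rather than the commenter's own code: the name commas_to_newlines is made up for the example, and the buffer must be a writable array, not a string literal. Unlike strtok, empty fields survive as empty lines.

```c
#include <string.h>

/* Hypothetical helper: replace every ',' in s with '\n', in place.
   Returns the number of commas replaced. s must be writable. */
size_t commas_to_newlines(char *s)
{
    size_t count = 0;
    char *p = s;
    while ((p = strchr(p, ',')) != NULL) {
        *p++ = '\n';   /* overwrite the comma, then continue after it */
        count++;
    }
    return count;
}
```

After the call, a single printf("%s\n", buf) prints one field per line.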

Related

Wrong output after modifying an array in a function (in C)

I'm a C noob and I'm having problems with the following code:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
void split_string(char *conf, char *host_ip[]){
long unsigned int conf_len = sizeof(conf);
char line[50];
strcpy(line, conf);
int i = 0;
char* token;
char* rest = line;
while ((token = strtok_r(rest, "_", &rest))){
host_ip[i] = token;
printf("-----------\n");
printf("token: %s\n", token);
i=i+1;
}
}
int main(){
char *my_conf[1];
my_conf[0] = "conf01_192.168.10.1";
char *host_ip[2];
split_string(my_conf[0], host_ip);
printf("%s\n",host_ip[0]);
printf("%s\n",host_ip[1]);
}
I want to modify the host_ip array inside the split_string function and then print the 2 resulting strings in the main.
However, the 2 last printf() are only printing unknown/random characters (maybe an address?). Any help?
There are 2 problems:
First, you're returning pointers to local variables. You can avoid this by strduping the strings and freeing in the caller.
Second:
On the first call to strtok_r(), str should point to the string to be parsed, and the value of saveptr is ignored. In subsequent calls, str should be NULL, and saveptr should be unchanged since the previous call.
I.e. you must pass NULL for the first argument after the first iteration in the loop. Nowhere is it said that it is OK to use the same pointer for both arguments.
This is because the strtok_r is an almost drop-in replacement to the braindead strtok, with just one extra argument, so that you could even wrap it with a macro...
Thus we get
char *start = rest;
while ((token = strtok_r(start, "_", &rest))){
host_ip[i] = strdup(token);
printf("-----------\n");
printf("token: %s\n", token);
i++;
start = NULL;
}
and in the caller:
free(host_ip[0]);
free(host_ip[1]);
You are storing the address of a local variable (line), which lives on the stack. A function's stack memory holds valid data for its local variables only during the lifetime of that call; afterwards the same memory is reused for other functions' local variables. So the data stored in line[50] is invalid once split_string returns.
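One way to avoid the dangling pointers without strdup is to let the caller own the buffer that gets split. The sketch below assumes that approach; split_in_place is a hypothetical name, and it uses plain strchr instead of strtok_r, so (unlike strtok_r) it keeps empty fields.

```c
#include <string.h>

/* Hypothetical helper: split buf in place on delim and store up to max
   pointers in out[]. The pointers refer into buf, so they remain valid for
   as long as the caller's buffer does. Returns the number of fields stored. */
size_t split_in_place(char *buf, char delim, char *out[], size_t max)
{
    size_t n = 0;
    char *p = buf;
    while (n < max) {
        out[n++] = p;
        char *d = strchr(p, delim);
        if (d == NULL)
            break;          /* last field reached */
        *d = '\0';          /* terminate this field */
        p = d + 1;
    }
    return n;
}
```

In main, the caller would declare char line[50], copy the configuration string into it, and pass line together with host_ip; nothing needs to be freed afterwards.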

Weird output from strtok

I was having some issues dealing with char*'s from an array of char*'s and used this for reference: Splitting C char array into words
So what I'm trying to do is read in char arrays and split them with a space delimiter so I can do stuff with it. For example if the first token in my char* is "Dog" I would send it to a different function that dealt with dogs. My problem is that I'm getting a strange output.
For example:
INPUT: *cmd = "Dog needs a vet appointment."
OUTPUT: (from print statements) "Doneeds a vet appntment."
I've checked for memory leaks using valgrind and I have none of them or other errors.
void parseCmd(char* cmd){ //passing in an individual char* from a char**
char** p_args = calloc(100, sizeof(char*));
int i = 0;
char* token;
token = strtok(cmd, " ");
while (token != NULL){
p_args[i++] = token;
printf("%s",token); //trying to debug
token = strtok(NULL, cmd);
}
free(p_args);
}
Any advice? I am new to C so please bear with me if I did something stupid. Thank you.
In your case,
token = strtok(NULL, cmd);
is not what you should be doing. You instead need:
token = strtok(NULL, " ");
As per the ISO standard:
char *strtok(char * restrict s1, const char * restrict s2);
A sequence of calls to the strtok function breaks the string pointed to by s1 into a sequence of tokens, each of which is delimited by a character from the string pointed to by s2.
The only difference between the first and subsequent calls (assuming, as per this case, you want the same delimiters) should be using NULL as the input string rather than the actual string. By using the input string as the delimiter list in subsequent calls, you change the behaviour.
You can see exactly what's happening if you try the following code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void parseCmd(char* cmd) {
char* token = strtok(cmd, " ");
while (token != NULL) {
printf("[%s] [%s]\n", cmd, token);
token = strtok(NULL, cmd);
}
}
int main(void) {
char x[] = "Dog needs a vet appointment.";
parseCmd(x);
return 0;
}
which outputs (first column will be search string to use next iteration, second is result of this iteration):
[Dog] [Dog]
[Dog] [needs a vet app]
[Dog] [intment.]
The first step worked fine since you were using space as the delimiter and it modified the string by placing a \0 at the end of Dog.
That means the next attempt (with the wrong separator) would use one of the letters from {D,o,g} to split. The first matching letter for that set is the o in appointment which is why you see needs a vet app. The third attempt finds none of the candidate letters so you just get back the rest of the string, intment..
token = strtok(NULL, cmd); should be token = strtok(NULL, " ");.
The second argument is for delimiter.
http://man7.org/linux/man-pages/man3/strtok.3.html
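Because the delimiter set is read afresh on every call, changing it between calls is legitimate when done on purpose. A small sketch, assuming a "key=value;..." input format that is not part of the question (first_pair is a hypothetical name):

```c
#include <string.h>

/* Hypothetical helper: split the first "key=value" pair off s, in place.
   The first strtok call stops at '=', the second resumes and stops at ';'.
   Returns 0 on success, -1 if either part is missing. */
int first_pair(char *s, char **key, char **val)
{
    *key = strtok(s, "=");
    *val = (*key != NULL) ? strtok(NULL, ";") : NULL;
    return (*key != NULL && *val != NULL) ? 0 : -1;
}
```

The bug in the question is the accidental version of this: the delimiter set silently became the letters of the first token.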

How to break a string in C with /

I have a string with the following pattern :
char *str = "ai/aj/module_mat.mod";
and I want to select module_mat as my final string for the further logic. I have tried to use rindex() so that I can get the final part of the string. But I am not able to do this in C. What am I doing wrong?
The code I am trying is -
char *first = rindex(str, "/");
char *first = strtok(first, ".");
Your mistake is right here:
char *str = "ai/aj/module_mat.mod";
Since str points to a constant, this should be:
const char *str = "ai/aj/module_mat.mod";
Now your compiler should show you the other problems.
Similarly:
char *first = rindex(str, "/");
Since rindex is returning a pointer into the constant you passed it, that pointer should also be const.
char *first = strtok(first, ".");
Hmm, what do the docs for strtok say:
If a delimiter byte is found, it is overwritten with a null byte
to terminate the current token, and strtok() saves a pointer to the following byte; ...
So strtok modifies the thing the pointer points to, so passing it a pointer to a constant is bad! You can't modify a constant.
First off, the string literal is immutable, so it is very dangerous to bind it to a mutable char pointer. First fix your code:
const char* str = "ai/aj/module_mat.mod";
Next, use strrchr, which finds the last / in the string (strchr would stop at the first one):
#include <string.h>
const char* p = strrchr(str, '/');
if (p != NULL) {
++p;
printf("Last part: %s\n", p);
} else {
printf("No '/' found in string %s.\n", str);
}
If a / is found in the string, p will point to it, and hence p can be used as the suffix substring of the original string, and there's no need to modify the original string. We advance p by one to skip past the / and are left with the final part of the string.
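The question also wants the extension removed, leaving module_mat. A minimal sketch that does both steps without modifying the input, assuming the result is copied into a caller-supplied buffer (path_stem is a hypothetical name):

```c
#include <string.h>

/* Hypothetical helper: copy the last path component of path, without its
   final ".ext", into out (capacity outsz). Returns 0 on success, -1 if out
   is too small. path is only read, so a string literal is fine. */
int path_stem(const char *path, char *out, size_t outsz)
{
    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;      /* no '/': the whole string is the name */
    const char *dot = strrchr(base, '.');
    size_t len = dot ? (size_t)(dot - base) : strlen(base);
    if (len + 1 > outsz)
        return -1;
    memcpy(out, base, len);
    out[len] = '\0';
    return 0;
}
```

Note that a name starting with a dot and containing no other dot (for example ".profile") yields an empty result with this sketch.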

Need to know when no data appears between two token separators using strtok()

I am trying to tokenize a string but I need to know exactly when no data is seen between two tokens. e.g when tokenizing the following string "a,b,c,,,d,e" I need to know about the two empty slots between 'c' and 'd'... which I am unable to find out simply using strtok(). My attempt is shown below:
char arr_fields[num_of_fields];
char delim[]=",\n";
char *tok;
tok=strtok(line,delim);//line contains the data
for(i=0;i<num_of_fields;i++,tok=strtok(NULL,delim))
{
if(tok)
sprintf(arr_fields[i], "%s", tok);
else
sprintf(arr_fields[i], "%s", "-");
}
Executing the above code with the aforementioned example puts the characters a,b,c,d,e into the first five elements of arr_fields, which is not desirable. I need each character to go into a specific index of the array: i.e. if a character is missing between two characters, that gap should be recorded as is.
7.21.5.8 the strtok function
The standard says the following regarding strtok:
[#3] The first call in the sequence searches the string
pointed to by s1 for the first character that is not
contained in the current separator string pointed to by s2.
If no such character is found, then there are no tokens in
the string pointed to by s1 and the strtok function returns
a null pointer. If such a character is found, it is the
start of the first token.
From the above quote we can see that you cannot use strtok as a solution to your specific problem, since it treats any run of consecutive characters found in delims as a single delimiter.
Am I doomed to weep in silence, or can somebody help me out?
You can easily implement your own version of strtok that does what you want, see the snippets at the end of this post.
strtok_single makes use of strpbrk (char const* src, const char* delims) which will return a pointer to the first occurrence of any character in delims that is found in the null-terminated string src.
If no matching character is found the function will return NULL.
strtok_single
char *
strtok_single (char * str, char const * delims)
{
static char * src = NULL;
char * p, * ret = 0;
if (str != NULL)
src = str;
if (src == NULL)
return NULL;
if ((p = strpbrk (src, delims)) != NULL) {
*p = 0;
ret = src;
src = ++p;
} else if (*src) {
ret = src;
src = NULL;
}
return ret;
}
sample use
char delims[] = ",";
char data [] = "foo,bar,,baz,biz";
char * p = strtok_single (data, delims);
while (p) {
printf ("%s\n", *p ? p : "<empty>");
p = strtok_single (NULL, delims);
}
output
foo
bar
<empty>
baz
biz
You can't use strtok() if that's what you want. From the man page:
A sequence of two or more contiguous delimiter characters in the parsed
string is considered to be a single delimiter. Delimiter characters at
the start or end of the string are ignored. Put another way: the
tokens returned by strtok() are always nonempty strings.
Therefore it is just going to jump from c to d in your example.
You're going to have to parse the string manually or perhaps search for a CSV parsing library that would make your life easier.
Lately I was looking for a solution to the same problem and found this thread.
You can use strsep().
From the manual:
The strsep() function was introduced as a replacement for strtok(3),
since the latter cannot handle empty fields.
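A minimal sketch of the strsep approach follows. strsep is a BSD/glibc extension rather than ISO C, so the feature-test macro on the first line is an assumption about a glibc-style environment; split_keep_empty is a hypothetical name.

```c
#define _DEFAULT_SOURCE   /* assumed: exposes strsep() on glibc */
#include <string.h>

/* Hypothetical helper: split line in place on any character in delims,
   keeping empty fields. Stores up to max pointers in fields[] and returns
   how many were stored. */
size_t split_keep_empty(char *line, const char *delims, char *fields[], size_t max)
{
    size_t n = 0;
    char *rest = line;
    char *tok;
    while (n < max && (tok = strsep(&rest, delims)) != NULL)
        fields[n++] = tok;   /* an empty field is a pointer to "" */
    return n;
}
```

For "a,b,c,,,d,e" this stores seven fields, with fields[3] and fields[4] pointing at empty strings.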
As mentioned in this answer, you'll want to implement something like strtok yourself. I prefer using strcspn (as opposed to strpbrk), as it allows for fewer if statements:
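char arr_fields[num_of_fields][32]; // one buffer per field; the width 32 is an assumption
char delim[] = ",\n";
size_t current_token = 0; // offset of the current field within line
size_t token_length;
for (i = 0; i < num_of_fields; i++)
{
    token_length = strcspn(line + current_token, delim); // length up to the next delimiter
    if (token_length)
        snprintf(arr_fields[i], sizeof arr_fields[i], "%.*s", (int)token_length, line + current_token);
    else
        snprintf(arr_fields[i], sizeof arr_fields[i], "%s", "-");
    current_token += token_length;
    if (line[current_token] != '\0')
        current_token++; // step past the delimiter itself
}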
Parse (for example, strtok)
Sort
Insert
Rinse and repeat as needed :)
You could try using strchr to find the locations of the , symbols. Manually tokenize your string up to the comma you found (using memcpy or strncpy) and then call strchr again. This way you can see when two or more commas are next to each other (the pointers returned by consecutive strchr calls will differ by exactly 1), and you can write an if statement to handle that case.
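The strchr plus memcpy suggestion could look roughly like this. It is a sketch under assumptions: the field width FIELD_MAX and the name split_fields are invented for the example, and the "-" placeholder for an empty field is taken from the question.

```c
#include <string.h>

#define FIELD_MAX 16  /* assumed maximum field size, including the terminator */

/* Hypothetical helper: copy up to max comma-separated fields of line into
   out[], writing "-" for an empty field. Over-long fields are truncated.
   line is not modified. Returns the number of fields written. */
size_t split_fields(const char *line, char out[][FIELD_MAX], size_t max)
{
    size_t n = 0;
    const char *start = line;
    while (n < max) {
        const char *comma = strchr(start, ',');
        size_t len = comma ? (size_t)(comma - start) : strlen(start);
        if (len == 0) {
            strcpy(out[n], "-");        /* two delimiters were adjacent */
        } else {
            if (len > FIELD_MAX - 1)
                len = FIELD_MAX - 1;
            memcpy(out[n], start, len);
            out[n][len] = '\0';
        }
        n++;
        if (comma == NULL)
            break;                      /* that was the last field */
        start = comma + 1;
    }
    return n;
}
```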

C: Parse empty tokens from a string with strtok

My application produces strings like the one below. I need to parse values between the separator into individual values.
2342|2sd45|dswer|2342||5523|||3654|Pswt
I am using strtok to do this in a loop. For the fifth token, I am getting 5523. However, I need to account for the empty value between the two separators || as well. 5523 should be the sixth token, as per my requirement.
token = (char *)strtok(strAccInfo, "|");
for (iLoop=1;iLoop<=106;iLoop++) {
token = (char *)strtok(NULL, "|");
}
Any suggestions?
In that case I often prefer a p2 = strchr(p1, '|') loop with a memcpy(s, p1, p2-p1) inside. It's fast, does not destroy the input buffer (so it can be used with const char *) and is really portable (even on embedded).
It's also reentrant; strtok isn't. (BTW: reentrant has nothing to do with multi-threading. strtok breaks already with nested loops. One can use strtok_r but it's not as portable.)
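As a rough illustration of that p2 = strchr(p1, '|') loop, here is a sketch that fetches one field by index from a const string. The name nth_field and its return convention are assumptions made for the example.

```c
#include <string.h>

/* Hypothetical helper: copy field number n (0-based) of s, split on delim,
   into out. Returns the field length, or -1 if there is no such field or
   out is too small. s is never modified, and no static state is kept. */
long nth_field(const char *s, char delim, size_t n, char *out, size_t outsz)
{
    const char *p1 = s;
    const char *p2;
    while (n > 0) {                     /* skip the first n fields */
        p2 = strchr(p1, delim);
        if (p2 == NULL)
            return -1;
        p1 = p2 + 1;
        n--;
    }
    p2 = strchr(p1, delim);
    size_t len = p2 ? (size_t)(p2 - p1) : strlen(p1);
    if (len + 1 > outsz)
        return -1;
    memcpy(out, p1, len);
    out[len] = '\0';
    return (long)len;
}
```

With the string from the question, index 4 is the empty field and index 5 (the sixth token) is 5523.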
That's a limitation of strtok. The designers had whitespace-separated tokens in mind. strtok doesn't do much anyway; just roll your own parser. The C FAQ has an example.
On a first call, the function expects a C string as argument for str, whose first character is used as the starting location to scan for tokens. In subsequent calls, the function expects a null pointer and uses the position right after the end of last token as the new starting location for scanning.
To determine the beginning and the end of a token, the function first scans from the starting location for the first character not contained in delimiters (which becomes the beginning of the token). And then scans starting from this beginning of the token for the first character contained in delimiters, which becomes the end of the token.
What this says is that it will skip any '|' characters at the beginning of a token, making 5523 the 5th token, which you already knew. Just thought I would explain why (I had to look it up myself). This also says that you will not get any empty tokens.
Since your data is set up this way you have a couple of possible solutions:
1) find all occurrences of || and replace with | | (put a space in there)
2) do a strstr 5 times and find the beginning of the 5th element.
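Option 1 could be sketched as a copy that writes a single space into every empty field, so that a later strtok pass no longer skips it. The name pad_empty_fields and the separate output buffer are assumptions for the example; the caller still has to treat a lone space as "empty".

```c
#include <stddef.h>

/* Hypothetical helper: copy in to out, inserting one space into each empty
   field ("a||b" becomes "a| |b"). Returns 0 on success, -1 if out
   (capacity outsz) is too small. */
int pad_empty_fields(const char *in, char delim, char *out, size_t outsz)
{
    size_t o = 0;
    int at_field_start = 1;
    for (const char *p = in; ; p++) {
        int end = (*p == '\0');
        if ((*p == delim || end) && at_field_start) {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = ' ';             /* the field was empty: pad it */
        }
        if (end)
            break;
        if (o + 1 >= outsz)
            return -1;
        out[o++] = *p;
        at_field_start = (*p == delim);
    }
    out[o] = '\0';
    return 0;
}
```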
char *mystrtok(char **m,char *s,char c)
{
char *p=s?s:*m;
if( !*p )
return 0;
*m=strchr(p,c);
if( *m )
*(*m)++=0;
else
*m=p+strlen(p);
return p;
}
reentrant
threadsafe
strictly ANSI conform
needs an unused help-pointer from the calling context
e.g.
char *p,*t,s[]="2342|2sd45|dswer|2342||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
puts(t);
e.g.
char *p,*t,s[]="2,3,4,2|2s,d4,5|dswer|23,42||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
{
char *p1,*t1;
for(t1=mystrtok(&p1,t,',');t1;t1=mystrtok(&p1,0,','))
puts(t1);
}
your work :)
implement char *c as parameter 3
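One possible reading of that exercise is to accept a set of delimiter characters instead of a single one. The sketch below assumes that reading and swaps strchr for strpbrk; mystrtok_set is a hypothetical name, and the rest mirrors mystrtok above.

```c
#include <string.h>

/* Hypothetical variant of mystrtok taking a set of delimiters.
   *m is the caller's save pointer; pass the string on the first call
   and NULL afterwards. */
char *mystrtok_set(char **m, char *s, const char *delims)
{
    char *p = s ? s : *m;
    if (!*p)
        return NULL;                /* nothing left to scan */
    *m = strpbrk(p, delims);        /* first occurrence of any delimiter */
    if (*m)
        *(*m)++ = '\0';             /* terminate token, resume after it */
    else
        *m = p + strlen(p);         /* last token: park on the terminator */
    return p;
}
```

Like the original, it reports empty fields in the middle of the string but not a trailing empty field.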
Look into using strsep instead: strsep reference
Use something other than strtok. It's simply not intended to do what you're asking for. When I've needed this, I usually used strcspn or strpbrk and handled the rest of the tokenizing myself. If you don't mind it modifying the input string like strtok, it should be pretty simple. At least right off, something like this seems as if it should work:
// Warning: untested code. Should really use something with a less-ugly interface.
char *tokenize(char *input, char const *delim) {
static char *current; // just as ugly as strtok!
char *pos, *ret;
if (input != NULL)
current = input;
if (current == NULL)
return current;
ret = current;
pos = strpbrk(current, delim);
if (pos == NULL)
current = NULL;
else {
*pos = '\0';
current = pos+1;
}
return ret;
}
Inspired by Patrick Schlüter's answer, I made this function. It is supposed to be thread safe, supports empty tokens, and doesn't change the original string. Each returned token is a malloc'd copy, so the caller has to free it:
char* strTok(char** newString, char* delimiter)
{
char* string = *newString;
char* delimiterFound = (char*) 0;
int tokLenght = 0;
char* tok = (char*) 0;
if(!string) return (char*) 0;
delimiterFound = strstr(string, delimiter);
if(delimiterFound){
tokLenght = delimiterFound-string;
}else{
tokLenght = strlen(string);
}
tok = malloc(tokLenght + 1);
memcpy(tok, string, tokLenght);
tok[tokLenght] = '\0';
*newString = delimiterFound ? delimiterFound + strlen(delimiter) : (char*)0;
return tok;
}
you can use it like
char* input = "1,2,3,,5,";
char** inputP = &input;
char* tok;
while( (tok=strTok(inputP, ",")) ){
    printf("%s\n", *tok ? tok : "<empty>");
    free(tok); // every token was allocated by strTok
}
This is supposed to output
1
2
3
<empty>
5
<empty>
I tested it on simple strings but haven't used it in production yet. I also posted it on Code Review, so you can see what others think about it.
Below is the solution that is working for me now. Thanks to all of you who responded.
I am using LoadRunner. Hence, some unfamiliar commands, but I believe the flow can be understood easily enough.
char strAccInfo[1024], *p2;
int iLoop;
Action() { //This value would come from the wrsp call in the actual script.
lr_save_string("323|90||95|95|null|80|50|105|100|45","test_Param");
//Store the parameter into a string - saves memory.
strcpy(strAccInfo,lr_eval_string("{test_Param}"));
//Get the first instance of the separator "|" in the string
p2 = (char *) strchr(strAccInfo,'|');
//Start a loop - Set the max loop value to more than max expected.
for (iLoop = 1;iLoop<200;iLoop++) {
//Save parameter names in sequence.
lr_param_sprintf("Param_Name","Parameter_%d",iLoop);
//Get the first instance of the separator "|" in the string (within the loop).
p2 = (char *) strchr(strAccInfo,'|');
//Save the value for the parameters in sequence.
lr_save_var(strAccInfo,p2 - strAccInfo,0,lr_eval_string("{Param_Name}"));
//Save string after the first instance of p2, as strAccInfo - for looping.
memmove(strAccInfo,p2+1,strlen(p2+1)+1); //memmove, because source and destination overlap (strcpy would be undefined here)
//Start conditional loop for checking for last value in the string.
if (strchr(strAccInfo,'|')==NULL) {
lr_param_sprintf("Param_Name","Parameter_%d",iLoop+1);
lr_save_string(strAccInfo,lr_eval_string("{Param_Name}"));
iLoop = 200;
}
}
}
