Parsing a string - C

I have a string of the format "ABCDEFG,12:34:56:78:90:11". I want to split the two comma-separated values into two different strings. How do I do that in C with gcc?

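One possibility is something like this (sscanf parses a string you already have; scanf would read from standard input instead):
const char *str = "ABCDEFG,12:34:56:78:90:11";
char first[20], second[20];
if (sscanf(str, "%19[^,], %19[^\n]", first, second) == 2) {
    /* first is "ABCDEFG", second is "12:34:56:78:90:11" */
}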

So many people are suggesting strtok... Why? strtok is a leftover from the stone age of programming and is good only for 20-line utilities! From its documentation:
Each call to strtok modifies strToken by inserting a null character after the token returned by that call. [...]
[F]unction uses a static variable for parsing the string into tokens. [...] Interleaving calls to this function is highly likely to produce data corruption and inaccurate results.
scanf, as in Jerry Coffin's answer, is a much better alternative. Or, you can do it manually: find the separator with strchr, then copy parts to separate buffers.

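char str[] = "ABCDEFG,12:34:56:78:90:11"; //[1]
char *first = strtok(str, ","); //[2]
char *second = strtok(NULL, ""); //[3]
[1] ABCDEFG,12:34:56:78:90:11
[2] The comma is replaced with a null character ('\0'), so the buffer now holds "ABCDEFG", then '\0', then "12:34:56:78:90:11".
first points to 'A'.
[3] Subsequent calls to strtok pass NULL as the first argument. With an empty delimiter string, the call returns everything that remains, so second points to "12:34:56:78:90:11".
You can change the delimiter between calls, though.
Note: you cannot pass a string literal, because strtok modifies the string; use a char array as in [1].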

You can use strtok which will allow you to specify the separator and generate the tokens for you.

You could use strtok.
Example from cppreference.com (wrapped in a complete program):
#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[] = "now # is the time for all # good men to come to the # aid of their country";
    char delims[] = "#";
    char *result = strtok(str, delims);   /* first call: pass the string */
    while (result != NULL) {
        printf("result is \"%s\"\n", result);
        result = strtok(NULL, delims);    /* later calls: pass NULL */
    }
    return 0;
}

Try using the following regex; it will find a run of letters A-Z followed by a ",":
"[A-Z]+," and if you need lowercase letters too, try "[a-zA-Z]+,"
If you need to search for the second part instead, you could try the following (extended regex syntax, because of {2}):
",[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{2}"
There is an example of how to use regexes at
http://ddj.com/184404797

Related

Handling consecutive delimiters with strsep() in C

I am trying to read a string word by word in C using the strsep() function (this can also be done with strtok()). When there are consecutive delimiters (in my case, spaces), the function does not skip them. I am expected to use strsep() and couldn't figure out the solution. I'd appreciate any help.
#include <stdio.h>
#include <string.h>
int main(){
char newLine[256]= "scalar  i"; /* two spaces between the words */
char *q;
char *token;
q = strdup(newLine);
const char delim[] = " ";
token = strsep(&q, delim);
printf("The token is: \"%s\"\n", token);
token = strsep(&q, delim);
printf("The token is: \"%s\"\n", token);
return 0;
}
Actual output is:
The token is: "scalar"
The token is: ""
What I expected is:
The token is: "scalar"
The token is: "i"
To do that I also tried to write a while loop so that I could continue until the token is non-empty.
But I cannot equate tokens with "", " ", NULL or "\n". Somehow the token is not equal to any of these.
First, note that strsep(), while convenient, is not in the standard C library; it is available only where the C library provides the 4.4BSD extensions (the BSDs, macOS, glibc on Linux). That covers most Unix-like systems today, but still.
Anyway, strsep() supports empty fields. That means that if your string has consecutive delimiters, it will return an empty (length-0) token between each pair of them. For example, the tokens for the string "ab  cd" (two spaces) will be:
"ab"
""
"cd"
2 delimiters -> 3 tokens.
Now, you also said:
I cannot equate tokens with "", " ", NULL or "\n". Somehow the token is not equal to any of these.
I am guessing you tried a simple comparison, e.g. if (my_token == "") { ... }. That won't work, because it compares pointers, not the strings' contents. Two strings may have identical characters at different places in memory, and that is particularly likely in this case, since my_token points into your dynamically allocated buffer, not to the static-storage-duration string "" used in the comparison.
Instead, you will need to use strcmp(my_token, "") == 0, or better yet, just check whether the first char is '\0' (my_token[0] == '\0').

Tokenizing a string at newlines - newline is not getting recognized

I am trying to tokenize a string at each newline.
rest = strdup(value);
while ((token = strtok_r(rest,"\n", &rest))) {
snprintf(new_value, MAX_BANNER_LEN + 1, "%s\n", token);
}
where 'value' is a string say, "This is an example\nHere is a newline"
But the above code is not tokenizing 'value', and the 'new_value' variable comes out unchanged, i.e. "This is an example\nHere is a newline".
Any suggestions to overcome this?
Several things going on with your code:
strtok and strtok_r take the string to tokenize as the first parameter on the first call. Subsequent calls that continue tokenizing the same string should pass NULL. (It is okay to change the delimiters between calls on the same string.)
The second parameter is a string of possible separators. In your case you should pass "\n". (strtok_r treats a run of delimiter characters as a single break, so tokenizing "a\n\n\nb" produces two tokens.)
The third parameter to strtok_r holds the function's internal state. It marks where the next tokenization should start, but you need not inspect it. Just define a char * and pass its address.
In particular, don't repurpose the source string variable as the state. In your example, you lose the pointer to the strdup'ed string, so you cannot free it later, as you should.
It is not clear how you determined that your tokenization "doesn't work". You print each token into the same char buffer, so every call overwrites the previous one. Do you want to keep only the part after the last newline? In that case, use strrchr(str, '\n'). If the result isn't NULL, your "tail" starts at the character after it; if it is NULL, the whole string is your tail.
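Here's how tokenizing a string could work (wrapped in an illustrative function, print_lines, so it compiles on its own; strtok_r is POSIX):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void print_lines(const char *str)
{
    char *rest = strdup(str);   /* strtok_r modifies its input, so work on a copy */
    char *state;                /* strtok_r's internal position */
    if (rest == NULL)
        return;
    char *token = strtok_r(rest, "\n", &state);
    while (token) {
        printf("'%s'\n", token);
        token = strtok_r(NULL, "\n", &state);
    }
    free(rest);                 /* rest still points to the start of the copy */
}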

Using sscanf to read strings

I am trying to save one character and 2 strings into variables.
I use sscanf to read strings with the following form :
N "OldName" "NewName"
What I want : char character = 'N' , char* old_name = "OldName" , char* new_name = "NewName" .
This is how I am trying to do it :
sscanf(mystring,"%c %s %s",&character,old_name,new_name);
printf("%c %s %s",character,old_name,new_name);
The problem is, my program stops working without any output.
(I want to ignore the quotation marks too and save only their contents.)
When you do
char* new_name = "NewName";
you make the pointer new_name point to the read-only string array containing the constant string literal. The array contains exactly 8 characters (the letters of the string plus the terminator).
First of all, using that pointer as a destination for sscanf will make sscanf write to the read-only array, which leads to undefined behavior. And if the input is longer than 7 characters, sscanf will also write out of bounds, again leading to undefined behavior.
The simple solution is to use actual arrays, and not pointers, and to also tell scanf to not read more than can fit in the arrays. Like this:
char old_name[64]; // Space for 63 characters plus string terminator
char new_name[64];
sscanf(mystring,"%c %63s %63s",&character,old_name,new_name);
To skip the quotation marks you have a couple of choices: Either use pointers and pointer arithmetic to skip the leading quote, and then set the string terminator at the place of the last quote to "remove" it. Another solution is to move the string to overwrite the leading quote, and then do as the previous solution to remove the last quote.
Or you could rely on the limited pattern-matching capabilities of scanf (and family):
sscanf(mystring,"%c \"%63s\" \"%63s\"",&character,old_name,new_name);
Note that the above sscanf call can only match if the string actually includes the quotes.
Second note: as Cool Guy said in a comment, the above won't actually work, since scanf is greedy. %s reads until white-space or the end of the input, so it won't stop at the closing double quote: the quote ends up in the buffer, and the literal \" in the format then fails to match. The only working solution using scanf and family is the one below.
Also note that scanf and family, when reading string using "%s" stops reading on white-space, so if the string is "New Name" then it won't work either. If this is the case, then you either need to manually parse the string, or use the odd "%[" format, something like
sscanf(mystring,"%c \"%63[^\"]\" \"%63[^\"]\"",&character,old_name,new_name);
You must allocate space for your strings, e.g:
char* old_name = malloc(128);
char* new_name = malloc(128);
Or using arrays
char old_name[128] = {0};
char new_name[128] = {0};
If you use malloc, you also have to free the space when you no longer need it:
free(old_name);
free(new_name);
The other answers provide good methods of allocating memory as well as of reading the example input into buffers. There are two additional items that may help:
1) You said you want to ignore the quotation marks too.
2) Reading first and last names when they are separated by a space (the example input is not).
As @Joachim points out, because scanf and family stop scanning at a space with the %s format specifier, a name that includes a space, such as "firstname lastname", will not be read completely. There are several ways to address this. Here are two:
Method 1: tokenizing your input.
Tokenizing a string breaks it into sections separated by delimiters. Your example inputs, for instance, contain at least three usable delimiters: space (' '), double quote ('"'), and newline ('\n'). fgets() and strtok() can be used to read in the desired content while at the same time stripping off any undesired characters. If done correctly, this method can preserve the content (even spaces) while removing delimiters such as ". A very simple example of the concept below includes the following steps:
1) reading stdin into a line buffer with fgets(...)
2) parse the input using strtok(...).
Note: This is an illustrative, bare-bones implementation, sequentially coded to match your input examples (with spaces) and includes none of the error checking/handling that would normally be included.
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[128];
    const char delim[] = "\n\"";   //parse using only newline and double quote
    char *tok;
    char letter = '\0';
    char old_name[64] = "";        // Space for 63 characters plus string terminator
    char new_name[64] = "";
    if (fgets(line, sizeof line, stdin) == NULL)
        return 1;                  //no input
    tok = strtok(line, delim);     //consume 1st " and get token 1
    if(tok) letter = tok[0];       //assign letter
    tok = strtok(NULL, delim);     //consume 2nd " and get token 2
    if(tok) snprintf(old_name, sizeof old_name, "%s", tok); //copy (truncating) to old name
    tok = strtok(NULL, delim);     //consume 3rd " and throw away token 3
    tok = strtok(NULL, delim);     //consume 4th " and get token 4
    if(tok) snprintf(new_name, sizeof new_name, "%s", tok); //copy (truncating) to new name
    printf("%c %s %s\n", letter, old_name, new_name);
    return 0;
}
Note: as written, this example (like most strtok(...) implementations) requires very narrowly defined input. In this case the input must be no longer than 127 characters and consist of a single character followed by space(s), then a double-quoted string, more space(s), and another double-quoted string, as in your example:
N "OldName" "NewName"
The following input will also work in the above example:
N "old name" "new name"
Note also about this example, some consider strtok() broken, while others suggest avoiding its use. I suggest using it sparingly, and only in single threaded applications.
Method 2: walking the string.
A C string is just an array of char terminated by a null character ('\0'). By selectively copying some characters into another string while skipping the ones you do not want (such as "), you can effectively strip unwanted characters from your input. Here is an example function that does this:
#include <stdlib.h>
#include <string.h>

char * strip_ch(char *str, char ch)
{
    char *from, *to;
    char *dup = strdup(str);//make a copy of input
    if(dup)
    {
        from = to = dup;//both working pointers start at the copy
        for (; *from != '\0'; from++)//walk through the copied string
        {
            *to = *from;//copy the current character
            if (*to != ch) to++;//advance only if it is not the char to strip
            //otherwise, leave it so the next char overwrites it
        }
        *to = '\0';//terminate the stripped string
        strcpy(str, dup);//copy the result back into the caller's buffer
        free(dup);
    }
    return str;
}
Example use case:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[128] = {"start"};
    while(strstr(line, "quit") == NULL)
    {
        printf("Enter string (\"quit\" to leave) and hit <ENTER>:");
        if (fgets(line, sizeof line, stdin) == NULL)
            break;//stop at end of input
        strip_ch(line, '"');//strips in place; fgets already kept the '\n'
        printf("%s", line);
    }
    return 0;
}

Splitting a string into words

I have following problem:
// Basically I am reading from a file and storing in local array.
char myText[100] = "This is text of movie Jurassic Park";
// Here I want to store each word in a dictionary
st.insert(&myText[0]); // should insert "This", not the rest of the sentence.
// similarly for next word "is", "text" and so on.
How do I do that in C?
For this, you would use the strtok function:
char myText[100] = "This is text of movie Jurassic Park";
char *p;
for (p = strtok(myText," "); p != NULL; p = strtok(NULL," ")) {
st.insert(p);
}
Note that this function modifies the string it's parsing by adding NUL bytes where the delimiters are.
You could use strtok(). http://www.cplusplus.com/reference/cstring/strtok/
For C, you will need to include <string.h>.
If you just want to split on spaces, you basically want a split function (a strsplit-style helper, which is not part of standard C) or strtok.
Have a look at Split string with delimiters in C

I misunderstand win32 (and maybe libc) strtok( )

In some CGI code, I need to encode rarely-occurring '&', '<', and '>' chars. In the encoding function, I want to get out right away if there are no such chars in the input string. So, at entry, I try to use strtok( ) to find that out:
char *
encode_amp_lt_gt ( char *in ) {
...
if ( NULL == strtok( in, "&<>" )) {
return in;
}
...
}
But, even in the absence of any of the delimiters, strtok( ) returns a pointer to the first character of in.
I expected it to return NULL if no delims in the string.
Is my code wrong, or is my expectation wrong? I don't want to call strchr( ) three times just to eliminate the usual case.
Thanks!
You probably don't want strtok to begin with, as it leaves you no way of figuring out which character was replaced (unless you keep a spare copy of the string).
strtok is not a straightforward API and is easy to misunderstand.
Quoting the manpage:
The strtok() and strtok_r() functions return a pointer to the beginning of
each subsequent token in the string, after replacing the token itself with
a NUL character. When no more tokens remain, a null pointer is returned.
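You have probably been caught out by how the algorithm works. strtok returns NULL only when no tokens remain. If the string contains none of the delimiters, the whole string is a single token, so the first call returns a pointer to its first character. To see how the tokens come out, suppose this string (an array, because strtok writes into it):
char value[] = "foo < bar & baz > frob";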
The first time you call strtok:
char* ptr = strtok(value, "<>&");
strtok will return you the value pointer, except that it will have modified the string to this:
"foo \0 bar & baz > frob"
As you may notice, it changed the < to a NUL. Now, however, if you use value, you'll get "foo ", since there's a NUL in the middle of the string.
Subsequent calls to strtok with NULL will proceed through the string, until you've reached the end of the string, at which point you'll get NULL.
char str[] = "foo < bar & frob > nicate"; // an array: strtok must be able to write to it
printf("%s\n", strtok(str, "<>&")); // prints "foo "
printf("%s\n", strtok(NULL, "<>&")); // prints " bar "
printf("%s\n", strtok(NULL, "<>&")); // prints " frob "
printf("%s\n", strtok(NULL, "<>&")); // prints " nicate"
assert(strtok(NULL, "<>&") == NULL); // should be true
It would be fairly straightforward to write a function that replaces the contents without strtok, either dealing with the hard work yourself, or getting help from strpbrk and strcat.
The function you want is strpbrk, not strtok. The bigger question is - how is the string that is being returned being allocated when you're replacing things, and how does the calling function know if it should free it or not?
