function split which I want to include the delimiter " : " - c

I have this function which returns an array of strings in C which are delimited by delim. Every call to strtok returns the string between two delimiters, but it doesn't include the delimiter. I want it to return the string with the char of delim at the end only if this char==':'. For example, for " ici: papa mama", if I call it with delim== " \t\n:" I want it to return an array which contains:
ici:
papa
mama
(the ':' will be included in ici but not the spaces or tabulations).
need help
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char ** split(const char * str, const char * delim)
{
/* count words */
char * s = strdup(str);
char* inter=malloc(sizeof(char)*100);// for the label: do not forget to append a colon at the end before putting it in the
if (strtok(s, delim) == 0){
/* no word */
return NULL;}
int nw = 1;
while (strtok(NULL, delim) != 0)
nw += 1;
strcpy(s, str); /* restore initial string modified by strtok */
/* split */
char ** v = malloc((nw + 1) * sizeof(char *));
int i;
v[0] = strdup(strtok(s, delim));
for (i = 1; i != nw; ++i)
v[i] = strdup(strtok(NULL, delim));
v[i] = NULL; /* end mark */
free(s);
return v;
}
int main() {
char **words=split("ici: haha papa", " \t\n:");
printf("%s \n",words[0]);// with the two points
printf("%s \n",words[1]);// without the spaces
printf("%s \n",words[2]);// without the spaces
return 0;
}

For a generalized solution, the only way to preserve a single separator/delimiter would be to write your own tokenizer.
In your specific case, however, it would appear that the colon ‘:’ only appears attached to the end of the first token. strtok() makes this kind of issue particularly easy, since you can change the separator/delimiter list with each invocation.
char * first = strtok( s, " \t\n\r" ); // first token ends with whitespace
For the remaining tokens, get them in a loop with the colon included in the list of separators/delimiters.
char * tok = strtok( NULL, ": \t\n\r" );
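The two calls above can be combined into one loop. This is only a minimal sketch, assuming the colon never appears anywhere except at the end of the first token; the helper name split_label_line and its out/max parameters are made up for illustration and are not part of the question's code. The stored pointers refer into buf, which strtok modifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores up to max token pointers in out and returns
   how many were stored. buf is modified in place by strtok. */
size_t split_label_line(char *buf, char **out, size_t max)
{
    assert(buf != NULL && out != NULL);
    size_t n = 0;
    /* first token: whitespace only, so a trailing ':' stays attached */
    char *tok = strtok(buf, " \t\n\r");
    while (tok != NULL && n < max) {
        out[n++] = tok;
        /* remaining tokens: ':' is now a delimiter as well */
        tok = strtok(NULL, ": \t\n\r");
    }
    return n;
}
```

Called on a writable copy of "ici: papa mama", this should store "ici:", "papa" and "mama".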
As per your added comment:
For the case where there may be more than one space-delimited word before the first colon (or, as per your comment, where the colon does not have adjacent whitespace), you will have to use strchr() (or some other search function) to find the token and feed the two halves of the string into separate tokenization loops.
Error handling
Remember to look closely at the documentation when designing your code. C makes it look like you can chain stuff together nicely, but many times you cannot (without extra helper functions). For example, strtok() may return NULL, which is not a valid input to strdup() (causing UB).
In other words, make sure to check for error conditions with the return values from function invocations.
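As a rough illustration of that advice, here is one way to wrap the strtok-then-copy step so that a NULL token is never copied. The name copy_next_token is an assumption of this sketch, and it uses malloc and memcpy instead of strdup only to stay within standard C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: returns a heap copy of the next token, or NULL when
   strtok finds no token or malloc fails. Pass str on the first call and
   NULL afterwards, exactly as with strtok. The caller frees the result. */
char *copy_next_token(char *str, const char *delim)
{
    assert(delim != NULL);
    char *tok = strtok(str, delim);
    if (tok == NULL)
        return NULL;            /* nothing left: do not copy */
    size_t n = strlen(tok);
    char *copy = malloc(n + 1);
    if (copy == NULL)
        return NULL;            /* allocation failed */
    memcpy(copy, tok, n + 1);   /* include the terminating '\0' */
    return copy;
}
```

The caller then only has to test one return value per token before storing it.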
Replacing strtok
And back to the first idea, strtok() is likely implemented in terms of the strspn() and strcspn() functions. Rather than use strtok(), use those, and use strndup() to get desired substrings.
UPDATE
I was bored, so... Here is a generalized tokenizer that might do something like what you want...
#include <iso646.h>
#include <stdlib.h>
#include <string.h>
char * tok2( const char ** s, const char * ks, const char * ds )
{
// s may not be NULL
// *s may be NULL
if (!*s) return NULL;
// skip any leading ds
const char * first = *s += strspn( *s, ds );
if (!*first) return NULL;
// find the first ks|ds
while (**s and !strchr( ks, **s ) and !strchr( ds, **s )) *s += 1;
// skip any ks
*s += strspn( *s, ks );
// return substring from first (inclusive) to *s (exclusive)
size_t n = *s - first;
if (!n) return NULL;
char * result = malloc( n+1 );
if (!result) return NULL;
result[n] = '\0';
return strncpy( result, first, n );
}
It compiles cleanly with MSVC and Clang. On Windows don’t forget to add -D_CRT_SECURE_NO_WARNINGS to your command-line options to silence the stupid.
Here is the very simple test I used for it (just concatenate this to the above and compile):
#include <stdio.h>
int main(void)
{
const char * test_string = " one two: three four:five six : seven eight :nine ten ";
printf( "given: \"%s\"\n", test_string );
const char ** src = &test_string;
char * str;
while ((str = tok2( src, ":", " \t\r\n" )))
{
printf( " \"%s\"\n", str );
free( str );
}
}

Related

How to replace characters by strtok function - C?

I really want to change all spaces ' ' in my char array for NULL -
#include <string.h>
void ReplaceCharactersInString(char *pcString, char *cOldChar, char *cNewChar) {
char *p = strtok(pcString, cOldChar);
strcpy(pcString, p);
while (p != NULL) {
strcat(pcString, p);
p = strtok(cNewChar, cOldChar);
}
}
int main() {
char pcString[] = "I am testing";
ReplaceCharactersInString(pcString, " ", NULL);
printf(pcString);
}
OUTPUT: Iamtesting
If I simply put the printf(p) function before:
p = strtok(cNewChar, cOldChar);
In the result I have what I need - but the problem is how to store it in pcString (directly)?
Or is there perhaps a simpler way to do it?
While some functions expect a [single] string to be pre-parsed to: I\0am\0testing, that is rare.
And, if you have multiple spaces/delimiters, you'll get (e.g.) foo\0\0bar, which you probably don't want.
And, your printf in main will only print the first token in the string because it will stop on the first EOS (i.e. '\0').
In other words, you probably don't want strcpy/strcat.
More likely, you want to fill an array of char * pointers to the tokens you parse.
So, you'd want to pass down char **argv, then do: argv[argc++] = strtok(...); and then do: return argc
Here's how I would refactor your code:
#include <stdio.h>
#include <string.h>
#define ARGMAX 100
int
ReplaceCharactersInString(int argmax,char **argv,char *pcString,
const char *delim)
{
char *p;
int argc;
// allow space for NULL termination
--argmax;
for (argc = 0; argc < argmax; ++argc, ++argv) {
// get next token
p = strtok(pcString,delim);
if (p == NULL)
break;
// zap the buffer pointer
pcString = NULL;
// store the token in the [returned] array
*argv = p;
}
*argv = NULL;
return argc;
}
int
main(void)
{
char pcString[] = "I am testing";
int argc;
char **av;
char *argv[ARGMAX];
argc = ReplaceCharactersInString(ARGMAX,argv,pcString," ");
printf("argc: %d\n",argc);
for (av = argv; *av != NULL; ++av)
printf("'%s'\n",*av);
return 0;
}
Here's the output:
argc: 3
'I'
'am'
'testing'
strcat and strcpy should not be used when the source and destination overlap in memory.
Iterate through the array and replace the matching character with the desired character.
Since zeros are part of the string, printf will stop at the first zero and strlen can't be used for the length to print. sizeof can be used as pcString is defined in the same scope.
Note that ReplaceCharactersInString would not work a second time as it would stop at the first zero. The function could be written to accept a length parameter and loop using the length.
#include <stdio.h>
#include <stdlib.h>
void ReplaceCharactersInString(char *pcString, char cOldChar,char cNewChar){
while ( pcString && *pcString) {//not NULL and not zero
if ( *pcString == cOldChar) {//match
*pcString = cNewChar;//replace
}
++pcString;//advance to next character
}
}
int main ( void) {
char pcString[] = "I am testing";
ReplaceCharactersInString ( pcString, ' ', '\0');
for ( size_t each = 0; each < sizeof pcString; ++each) {
printf ( "pcString[%02zu] = int:%-4d char:%c\n", each, pcString[each], pcString[each]);
}
return 0;
}
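The length-based variant mentioned above could look something like this. It is a sketch only; the name replace_chars_n and its parameter order are assumptions, not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variant: visits exactly len bytes, so embedded '\0'
   characters do not end the loop early. */
void replace_chars_n(char *buf, size_t len, char old_ch, char new_ch)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] == old_ch)
            buf[i] = new_ch;
    }
}
```

Because the loop is driven by len, calling it a second time with '\0' as the old character and ' ' as the new one restores the spaces, provided len excludes the final terminator (for example sizeof pcString - 1).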
You want to split the string into individual tokens separated by spaces such as "I\0am\0testing\0". You can use strtok() for this but this function is error prone. I suggest you allocate an array of pointers and make them point to the words. Note that splitting the source string is sloppy and does not allow for tokens to be adjacent such as in 1+1. You could allocate the strings instead.
Here is an example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char **split_string(const char *str, const char *delim) {
size_t i, len, count;
const char *p;
/* count tokens */
p = str;
p += strspn(p, delim); // skip initial delimiters
count = 0;
while (*p) {
count++;
p += strcspn(p, delim); // skip token
p += strspn(p, delim); // skip delimiters
}
/* allocate token array */
char **array = calloc(count + 1, sizeof(*array));
if (array == NULL)
return NULL;
p = str;
p += strspn(p, delim); // skip initial delimiters
for (i = 0; i < count; i++) {
len = strcspn(p, delim); // token length
array[i] = strndup(p, len); // allocate a copy of the token
p += len; // skip token
p += strspn(p, delim); // skip delimiters
}
/* array ends with a null pointer */
array[count] = NULL;
return array;
}
int main() {
const char *pcString = "I am testing";
char **array = split_string(pcString, " \t\r\n");
for (size_t i = 0; array[i] != NULL; i++) {
printf("%zu: %s\n", i, array[i]);
}
return 0;
}
The strtok function pretty much does exactly what you want. It basically replaces the next delimiter with a '\0' character and returns the pointer to the current token. The next time you call strtok, you should pass a NULL argument (see the documentation for strtok) and it will point to the next token, which will again be delimited by '\0'. Read some more examples of correct strtok usage.
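To make that behavior concrete, here is a small sketch of the usual calling pattern. The helper name tokenize_in_place is hypothetical; it only counts the tokens, but afterwards the buffer itself holds the '\0'-separated form.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenizes buf in place and returns the token count. */
size_t tokenize_in_place(char *buf, const char *delim)
{
    assert(buf != NULL && delim != NULL);
    size_t count = 0;
    /* first call takes the buffer; later calls take NULL to continue */
    for (char *tok = strtok(buf, delim); tok != NULL; tok = strtok(NULL, delim))
        ++count;
    return count;
}
```

For a writable array holding "I am testing" and the delimiter " ", the buffer should end up as I\0am\0testing and the count as 3.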

How to find substring between quotation marks in C

If I have a string such as the string that is the command
echo 'foobar'|cat
Is there a good way for me to get the text between the quotation marks ("foobar")? I read that it was possible to use scanf to do it in a file, is it also possible in-memory?
My attempt:
char * concat2 = concat(cmd, token);
printf("concat:%s\n", concat2);
int res = scanf(in, " '%[^']'", concat2);
printf("result:%s\n", in);
Use strtok() once, to locate the first occurrence of delimiter you wish (' in your case), and then once more, to find the ending pair of it, like this:
#include <stdio.h>
#include <string.h>
int main(void) {
const char* lineConst = "echo 'foobar'|cat"; // the "input string"
char line[256]; // where we will put a copy of the input
char *subString; // the "result"
strcpy(line, lineConst);
subString = strtok(line, "'"); // text before the first single quote
subString=strtok(NULL, "'"); // text between the first and second single quotes
if(!subString)
printf("Not found\n");
else
printf("the thing in between quotes is '%s'\n", subString);
return 0;
}
Output:
the thing in between quotes is 'foobar'
I based this on: How to extract a substring from a string in C?
If your string is in this format -"echo 'foobar'|cat", sscanf can be used-
char a[20]={0};
const char *s="echo 'foobar'|cat";
if(sscanf(s,"%*[^']'%19[^']'",a)==1){ // the width 19 keeps the read inside a[20]
// do something with a
}
else{
// handle this condition
}
%*[^'] reads and discards everything up to the first single quote, the literal ' in the format then matches that quote, and the second conversion, %[^'], reads up to the next ' and stores the result in a.
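One caveat with the format above is that %*[^'] must match at least one character, so it fails when the string starts with the quote. A possible workaround, sketched here under the assumption that a 64-byte output buffer is enough, is to find the opening quote with strchr first. The helper name extract_quoted is made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the text after the first single quote, up to
   the next quote or the end of the string, into out (at most 63 characters).
   Returns 1 on success, 0 if there is no quote or nothing follows it. */
int extract_quoted(const char *s, char out[64])
{
    assert(s != NULL && out != NULL);
    const char *q = strchr(s, '\'');    /* locate the opening quote */
    if (q == NULL)
        return 0;
    /* the width 63 leaves room for the terminating '\0' */
    return sscanf(q, "'%63[^']", out) == 1;
}
```

Note that this sketch does not check that a closing quote exists, and it treats an empty pair of quotes as a failure.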
There are many ways to approach the problem, from walking a pair of pointers down the string to locate the delimiters, to using the string functions declared in string.h. You can use character search functions such as strchr, search functions such as strpbrk, tokenizing functions such as strtok, and so on.
Look over and learn from them all. Here is an implementation with strpbrk and a pointer difference. It is non-destructive, so you need not make a copy of the original string.
#include <stdio.h>
#include <string.h>
int main (void) {
const char *line = "'foobar'|cat";
const char *delim = "'"; /* delimiter, single quote */
char *p, *ep;
if (!(p = strpbrk (line, delim))) { /* find the first quote */
fprintf (stderr, "error: delimiter not found.\n");
return 1;
}
p++; /* advance to next char */
ep = strpbrk (p, delim); /* set end pointer to next delim */
if (!ep) { /* validate end pointer */
fprintf (stderr, "error: matching delimiters not found.\n");
return 1;
}
char substr[ep - p + 1]; /* storage for substring */
strncpy (substr, p, ep - p); /* copy the substring */
substr[ep - p] = 0; /* nul-terminate */
printf ("\n single-quoted string : %s\n\n", substr);
return 0;
}
Example Use/Output
$ ./bin/substr
single-quoted string : foobar
Without Using string.h
As mentioned above, you can also simply walk a pair of pointers down the string and locate your pairs of quotes in that manner as well. For completeness, here is an example finding multiple quoted strings within a single line:
#include <stdio.h>
int main (void) {
const char *line = "'foobar'|cat'mousebar'sum";
char delim = '\'';
char *p = (char *)line, *sp = NULL, *ep = NULL;
size_t i = 0;
for (; *p; p++) { /* for each char in line */
if (!sp && *p == delim) /* find 1st delim */
sp = p, sp++; /* set start ptr */
else if (!ep && *p == delim) /* find 2nd delim */
ep = p; /* set end ptr */
if (sp && ep) { /* if both set */
char substr[ep - sp + 1]; /* declare substr */
for (i = 0, p = sp; p < ep; p++)/* copy to substr */
substr[i++] = *p;
substr[ep - sp] = 0; /* nul-terminate */
printf ("single-quoted string : %s\n", substr);
sp = ep = NULL;
}
}
return 0;
}
Example Use/Output
$ ./bin/substrp
single-quoted string : foobar
single-quoted string : mousebar
Look all the answers over and let us know if you have any questions.

How to obtain a single word from a string in C?

In order to complete a program I am working on, I have to be able to put pieces of a string into a stack for later use. For example, say I had this string:
"22 15 - 2 +"
Ideally, I first want to extract 22 from the string, place it in a separate, temporary string, and then manipulate it as I would like. Here is the code that I'm using which I think would work, but it is very over-complicated.
void evaluatePostfix(char *exp){
stack *s = initStack();
char *temp_str;
char temp;
int temp_len, val, a, b, i=0, j;
int len = strlen(exp);
while(len > 0){
temp_str = malloc(sizeof(char)); //holds the string i am extracting
j=0; //first index in temp_str
temp = exp[i]; //current value in exp, incremented later on the function
temp_len = 1; //for reallocation purposes
while(!isspace(temp)){ //if a white space is hit, the full value is already scanned
if(ispunct(temp)) //punctuation will always be by itself
break; //break if it is encountered
temp_str = (char*)realloc(temp_str, temp_len+1); //or else reallocate the string to hold the new character
temp_str[j] = temp; //copy the character to the string
temp_len++; //increment for the length of temp_str
i++; //advance one value in exp
j++; //advance one value in temp_str
len--; //the number of characters left to scan is one less
temp = exp[i]; //prepare for the next loop
} //and so on, and so on...
} //more actions follow this, but are excluded
}
Like I said, overcomplicated. Is there a simpler way for me to extract this code? I can reliably depend upon there being white space between the values and characters I need to extract.
If you are fine with using a library function, strtok does exactly this:
#include <string.h>
#include <stdio.h>
int main()
{
char str[80] = "22 15 - 2 +";
const char s[2] = " ";
char *token;
/* get the first token */
token = strtok(str, s);
/* walk through other tokens */
while( token != NULL )
{
printf( " %s\n", token );
token = strtok(NULL, s);
}
return(0);
}
Reference
The limitation of strtok(char *str, const char *delim) is that it cannot tokenize several strings at the same time, because it keeps its position in a single static pointer between calls (which is sufficient if you only work on one string at a time). A safer alternative is strtok_r(char *str, const char *delim, char **saveptr), a POSIX function that takes an explicit third argument in which it saves its position.
#include <string.h>
#include <stdio.h>
int main()
{
char str[80] = "22 15 - 2 +";
const char s[2] = " ";
char *token, *saveptr;
/* get the first token */
token = strtok_r(str, s, &saveptr);
/* walk through other tokens */
while( token != NULL )
{
printf( " %s\n", token );
token = strtok_r(NULL, s, &saveptr);
}
return(0);
}
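Where the separate save pointer really matters is nested tokenizing, which plain strtok cannot do. The following is a sketch only: the record format (';' between records, spaces between words) and the name count_words_per_record are assumptions made for the example, and strtok_r itself requires a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r; assumes a POSIX system */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits buf into ';'-separated records and counts the
   space-separated words in each one. Returns the number of records stored. */
size_t count_words_per_record(char *buf, size_t *counts, size_t max)
{
    assert(buf != NULL && counts != NULL);
    char *outer_save, *inner_save;
    size_t n = 0;
    for (char *rec = strtok_r(buf, ";", &outer_save);
         rec != NULL && n < max;
         rec = strtok_r(NULL, ";", &outer_save)) {
        size_t words = 0;
        /* the inner loop has its own save pointer, so it cannot disturb
           the outer loop's position */
        for (char *w = strtok_r(rec, " ", &inner_save);
             w != NULL;
             w = strtok_r(NULL, " ", &inner_save))
            ++words;
        counts[n++] = words;
    }
    return n;
}
```

With plain strtok, the inner loop would overwrite the saved position of the outer loop, and the outer loop would stop after the first record.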
Take a look at the strtok function; I think it's what you're looking for.

Simpler way to extract occurrences without using strtok in C

I'm trying to extract the strings before and after the first comma from the given string. However, I feel there's got to be a better way than what I have below, perhaps I don't even need the strdup calls. Thanks
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int extract_names(const char *str)
{
char *name, *last, *p1, *p2, *p3;
name = strdup(str);
last = strdup(str);
p1 = strchr(name, ',');
if (p1)
{
*p1 = '\0';
printf("%s\n", name);
}
p2 = strchr(last, ',');
p2++;
if (p2)
{
p3 = strpbrk(p2 + 1, " \0");
if (p3)
*p3 = '\0';
printf("%s\n", p2);
}
free(name);
free(last);
return 0;
}
int main()
{
// strings should at least contain last,name.
// but can contain several words
const char *str1 = "jones,bob age,12";
extract_names(str1);
const char *str2 = "smith,peter";
extract_names(str2);
return 0;
}
Output
jones
bob
smith
peter
Use strchr to find the limits of the last and first names. Then you can use the precision specifier in printf to print just the part of the string you are interested in.
For example:
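#include <stdio.h>
#include <string.h>
int extract_names(const char *str)
{
    const char *comma = strchr(str, ',');
    if (!comma) {
        return -1; /* no comma: not in "last,first" form */
    }
    /* first name ends at the first space after the comma, or at end of string */
    const char *name_end = strchr(comma + 1, ' ');
    if (!name_end) {
        name_end = str + strlen(str);
    }
    /* print last name; the precision argument must be an int */
    printf("%.*s\n", (int)(comma - str), str);
    /* print first name, which starts just past the comma */
    printf("%.*s\n", (int)(name_end - comma - 1), comma + 1);
    return 0;
}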
Since you're obviously trying to learn, I'll only give you a few pointers and not a "better working solution".
strdup is a useful idea here. As you do, it allows you to overwrite the string (const char *str is "readonly").
This combination:
p2 = strchr(last, ',');
p2++;
if (p2) {
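is wrong. After the call, p2 equals the return value of strchr. If you advance it before the if, you are no longer testing anything: had strchr returned NULL, p2 would no longer be NULL by the time you test it (and incrementing a null pointer is undefined behavior anyway).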
You don't need two strdups, and you don't need to search twice. p1 already points to the right place in name, you can use that for the rest of your logic.

How to safely parse a tab-delimited string?

How can I safely parse a tab-delimited string? For example:
test\tbla-bla-bla\t2332
strtok() is a standard function for parsing strings with arbitrary delimiters. It is, however, not thread-safe. Your C library of choice might have a thread-safe variant.
Another standard-compliant way (just wrote this up, it is not tested):
#include <string.h>
#include <stdio.h>
int main()
{
char string[] = "foo\tbar\tbaz";
char * start = string;
char * end;
while ( ( end = strchr( start, '\t' ) ) != NULL )
{
// %.*s prints at most N characters; N is the int argument before the string
// (your token is not zero-terminated!)
printf( "%.*s\n", (int)(end - start), start );
start = end + 1;
}
// start points to last token, zero-terminated
printf( "%s", start );
return 0;
}
Use strtok_r instead of strtok (if it is available). It has similar usage, except that it is reentrant and thread-safe. [Edit: I originally claimed that it does not modify the string, which is wrong. As Christoph points out, strtok_r also replaces the delimiters with '\0', so operate on a copy if you want to preserve the original string.]
strtok will leave your original string modified. It replaces the delimiter with '\0'. And if your string happens to be a constant stored in read-only memory (some compilers will do that), you may actually get an access violation.
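A common way around both problems is to tokenize a private copy. The sketch below assumes you only need the number of non-empty fields; the name count_tab_fields and the (size_t)-1 error value are choices made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the non-empty tab-separated fields in s without
   modifying s. Returns (size_t)-1 if the working copy cannot be allocated. */
size_t count_tab_fields(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return (size_t)-1;
    memcpy(copy, s, len + 1);          /* strtok will write into the copy */
    size_t count = 0;
    for (char *tok = strtok(copy, "\t"); tok != NULL; tok = strtok(NULL, "\t"))
        ++count;
    free(copy);
    return count;
}
```

Since only the copy is written to, the function can safely be given a string literal.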
Using strtok() from string.h.
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] = "test\tbla-bla-bla\t2332";
char * pch;
pch = strtok (str,"\t");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, "\t");
}
return 0;
}
You can use any regex library or even the GLib GScanner.
Yet another version; this one separates the logic into a new function
#include <stdio.h>
static _Bool next_token(const char **start, const char **end)
{
if(!*end) *end = *start; // first call
else if(!**end) // check for terminating zero
return 0;
else *start = ++*end; // skip tab
// advance to terminating zero or next tab
while(**end && **end != '\t')
++*end;
return 1;
}
int main(void)
{
const char *string = "foo\tbar\tbaz";
const char *start = string;
const char *end = NULL; // NULL value indicates first call
while(next_token(&start, &end))
{
// print substring [start,end[
printf("%.*s\n", (int)(end - start), start);
}
return 0;
}
If you need a binary safe way to tokenize a given string:
#include <string.h>
#include <stdio.h>
void tokenize(const char *str, const char delim, const size_t size)
{
const char *start = str, *next;
const char *end = str + size;
while (start < end) {
if ((next = memchr(start, delim, end - start)) == NULL) {
next = end;
}
printf("%.*s\n", (int)(next - start), start);
start = next + 1;
}
}
int main(void)
{
char str[] = "test\tbla-bla-bla\t2332";
size_t len = strlen(str);
tokenize(str, '\t', len);
return 0;
}
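If the fields need to be stored and not just printed, the same memchr loop can record each field's start and length. This is a sketch under a few assumptions: struct field and split_fields are names invented here, and unlike the loop above this version also reports an empty field after a trailing delimiter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one field: a pointer into the buffer and a length.
   The field is not zero-terminated. */
struct field {
    const char *start;
    size_t len;
};

/* Hypothetical helper: records up to max fields of buf[0..size) in out and
   returns how many were recorded. Empty fields are kept, and bytes equal to
   '\0' are treated like any other data. */
size_t split_fields(const char *buf, size_t size, char delim,
                    struct field *out, size_t max)
{
    assert(buf != NULL && out != NULL);
    const char *start = buf;
    const char *end = buf + size;
    size_t n = 0;
    while (n < max) {
        const char *next = memchr(start, delim, (size_t)(end - start));
        if (next == NULL)
            next = end;                 /* last field runs to the end */
        out[n].start = start;
        out[n].len = (size_t)(next - start);
        ++n;
        if (next == end)
            break;
        start = next + 1;               /* step over the delimiter */
    }
    return n;
}
```

Keeping empty fields is the main behavioral difference from strtok, which treats a run of delimiters as a single separator.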

Resources