(C) strtok with multiple spaces/tabs, checking for null with pointer

I am trying to split a string into two tokens using strtok() that might have spaces and tabs mixed in the string.
So I made this:
struct strstr
{
char *str,
*one,
*two;
};
typedef struct strstr *STRSTR;
void split(STRSTR);
int main()
{
STRSTR str = malloc(sizeof(struct strstr));
str->str = malloc(256);
fgets(str->str, 256, stdin);
split(str);
printf("%s, %s\n", str->one, str->two);
free(str->str);
free(str);
return 0;
}
void split(STRSTR str)
{
int i;
char *temp = str->str;
while(isspace(*(str->str)))
str->str++;
str->one = strtok(str->str, " \t");
for(i = 0; i < strlen(str->one); i++)
{
if(!isspace(str->one[i]))
str->str++;
}
str->str++;
if(str->str != NULL)
{
puts("In null if");
str->two = strtok(str->str, "");
}
str->str = temp;
}
So for example if you input Hello Earth lingss, it will print out Hello, Earth lingss, which is perfect.
However, if I input Hello only, the split function goes inside the if(str->str != NULL) statement. How do I stop it from doing that with the code that I have?
EDIT: Also another problem, if someone doesn't mind checking it out. temp will only point to the first word in str->str. How can I make it point to the whole thing?

Add this statement before the last if block in the split function:
str->str = strtok(str->str," \t");
so that the end of the function reads:
str->str = strtok(str->str," \t");
if(str->str != NULL)
{
puts("In null if");
str->two = strtok(str->str, "");
}
You split the string using "\t" as the delimiter, but you never updated str->str itself. Use the snippet above and it should work fine.

strtok is a funny function that both modifies the string you pass it, and stores information about it internally. You should pass your string to strtok once, then pass in NULL on subsequent calls. For instance, if your goal is to simply break a string up into tokens (which is obviously what strtok is for), then something like:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define BUFFER_SIZE 256
int main(void) {
char *buffer = malloc(BUFFER_SIZE);
if (!buffer) {
return -1;
}
fgets(buffer, BUFFER_SIZE, stdin);
char *word;
char *ptr = buffer;
printf("Tokens: [");
while ((word = strtok(ptr, " \t\n"))) {
printf("%s, ", word);
ptr = NULL;
}
printf("]\n");
free(buffer);
}
will work. When I run the code like this:
./quick
when in the fun apple orange
I get the following result:
Tokens: [when, in, the, fun, apple, orange, ]
The important thing is that I only passed the buffer pointer to strtok on the first time through the loop. After that it is passed NULL.
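For the original two-token case (first word, then everything after it), a sketch that avoids strtok altogether may be easier to reason about. It assumes the line came from fgets and uses strspn/strcspn to step over whitespace; split_two is a hypothetical helper name, not something from the question. Because it never advances the caller's buffer pointer, the pointer passed to free stays intact, and a missing second token shows up as a plain NULL.

```c
#include <string.h>

/* Hypothetical helper: splits `line` in place into its first word and the
 * remainder. *first or *rest is set to NULL when that part is missing. */
void split_two(char *line, char **first, char **rest)
{
    const char *ws = " \t\n";

    *first = NULL;
    *rest = NULL;

    line += strspn(line, ws);            /* skip leading whitespace */
    if (*line == '\0')
        return;                          /* blank line: no tokens */
    *first = line;

    line += strcspn(line, ws);           /* advance to the end of word one */
    if (*line == '\0')
        return;                          /* only one word present */
    *line++ = '\0';                      /* terminate word one */

    line += strspn(line, ws);            /* skip the gap before the rest */
    if (*line == '\0')
        return;                          /* nothing but whitespace left */
    *rest = line;
    line[strcspn(line, "\n")] = '\0';    /* drop the newline fgets keeps */
}
```

The caller checks the second pointer against NULL before printing it, which is the test the question was trying to express with if(str->str != NULL).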

Related

Function that concatenates two pointer strings together

I am having issues concatenating these two strings through pointers. Below is my concatenation function; I am supposed to take string 1 and append it to string 2. Also, I cannot use any functions from the string library: the point of the exercise is to help us understand what code is actually in those functions by writing it ourselves.
char strconcat(char *user2p, char *user1p) {
while (*user2p) {
user2p++;
}
while (*user1p) {
*user2p = *user1p;
*user2p++;
*user1p++;
}
*user2p = '\0';
printf("test: %c", *user2p);
return *user2p;
}
And here is the part of my main that is relevant to the function.
int main() {
char userString1[21], userString2[21];
char *user1p, *user2p;
user1p = userString1;
user2p = userString2;
printf("Please enter the first string: ");
gets(userString1);
printf("Please enter the second string: ");
gets(userString2);
printf("String 1 after concatenation: ");
puts(userString1);
printf("String 2 after concatenation: %c\n", strconcat(user2p, user1p));
The terminal keeps giving me this, I didn't include the code for the length and alphabetical order. It gives me a null when I try to run the test printf in the function and it gives me nothing when I return the function. I'm at a loss and any help is much appreciated!
Please enter the first string: jackhammer
Please enter the second string: jacky
The length of string 1 is: 10
The length of string 2 is: 5
String 1 comes before string 2 alphabetically.
String 1 after concatenation: jackhammer
(null)
String 2 after concatenation:
Your concat algorithm is fine, but you have to return a pointer to the original [leftmost] value, so your function needs to save it before looping:
char *
strconcat(char *user2p, char *user1p)
{
char *orig2p = user2p;
while (*user2p) {
user2p++;
}
while (*user1p) {
*user2p = *user1p;
user2p++;
user1p++;
}
*user2p = '\0';
printf("test: %s\n", orig2p);
return orig2p;
}
UPDATE:
To come up with a completely bulletproof test program for the concat function, we can use [overly] large input buffers and clip the input length to a maximum of 1/2 of the target buffer.
gets strips the newline but fgets does not. So, I've created an xgets function that is similar to gets but uses fgets and strchr to get [nearly] the same effect.
Although I believe it's okay to use standard string functions as part of the test code, I've created a hand coded version of strchr [hope that's not your next assignment :-)].
Anyway, here's the full program:
#include <stdio.h>
char *
strconcat(char *user2p, char *user1p)
{
char *orig2p = user2p;
while (*user2p) {
user2p++;
}
while (*user1p) {
*user2p = *user1p;
user2p++;
user1p++;
}
*user2p = '\0';
printf("test: %s\n", orig2p);
return orig2p;
}
char *
xstrchr(char *buf,int chrwant)
{
int chrcur;
char *res = NULL;
for (chrcur = *buf++; chrcur != 0; chrcur = *buf++) {
if (chrcur == chrwant) {
res = buf - 1;
break;
}
}
return res;
}
char *
xgets(char *buf,int maxlen)
{
char *cp;
char *res;
res = fgets(buf,maxlen,stdin);
if (res != NULL) {
cp = xstrchr(buf,'\n');
if (cp != NULL)
*cp = 0;
}
return res;
}
#define MAXLEN 800
int
main(void)
{
char userString1[MAXLEN], userString2[MAXLEN + 1];
char *user1p, *user2p;
printf("Please enter the first string: ");
user1p = xgets(userString1,MAXLEN / 2);
printf("Please enter the second string: ");
user2p = xgets(userString2,MAXLEN / 2);
if ((user2p != NULL) && (user1p != NULL))
printf("String 2 after concatenation: %s\n",strconcat(user2p, user1p));
return 0;
}
There's a number of issues. First is this.
while (*user1p) {
*user2p = *user1p;
*user2p++;
*user1p++;
}
This is working by accident. If you have compiler warnings on you should get a warning...
test.c:13:9: warning: expression result unused [-Wunused-value]
*user2p++;
^~~~~
test.c:14:9: warning: expression result unused [-Wunused-value]
*user1p++;
^~~~~~~
The reason it's unused is because C is interpreting it like so:
*(user1p++)
The post-increment binds tighter than the dereference, so this increments the pointer, dereferences its old value, and throws the result away. You just want to increment the pointers, no dereferencing required.
while (*user1p) {
*user2p = *user1p;
user2p++;
user1p++;
}
Then down here.
printf("String 2 after concatenation: %c\n", strconcat(user2p, user1p));
%c prints an individual char. You want %s, which prints a string (a char *). This reveals that you have the wrong signature: strconcat should return a char * (i.e. what C uses for strings) and return user2p (a char *) rather than *user2p (a single char).
char *strconcat(char *user2p, const char *user1p) {
...
return user2p;
}
And since you're not changing the string being copied from, that parameter should be const char * to let the compiler know and warn you if it's accidentally changed.
Finally, when you return *user2p it's already been moved to the end of the string.
while (*user1p) {
*user2p = *user1p;
user2p++;
user1p++;
}
*user2p = '\0';
printf("test: %c", *user2p);
// This points to the null byte just set above
return user2p;
So printing the result of strconcat will print nothing. To get around this, store the original pointer for user2p and return that.
char *strconcat(char *user2p, const char *user1p) {
char *orig_user2p = user2p;
...
return orig_user2p;
}
And some tips. It's easier to follow the code with good variable names that describe what they're doing.
char *strconcat(char *orig_to, const char *from) {
char *to = orig_to;
...
}
An array declared as char foo[NN] already converts to a pointer to its first element when you pass it to a function. There's no need to declare separate char * variables and copy the pointer.
char from[21], to[21];
Never use gets. It has no way to limit how much it reads, so it can easily overflow your buffer; it was removed from the language in C11. Use fgets, which limits the read to the buffer size you give it.
printf("Please enter the string to concat from: ");
fgets(from, sizeof(from), stdin);
Though it's annoying that it keeps the newline, which you then have to strip yourself. You can use scanf with %s, which skips leading whitespace and stops at the next whitespace, but beware its many pitfalls.
printf("Please enter the string to concat from: ");
scanf("%20s", from);
printf("Please enter the string to concat to: ");
scanf("%20s", to);
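If staying with fgets is preferable, the trailing newline can be removed in one line with strcspn, which returns the index of the first '\n' or, when there is none, the index of the terminating '\0'. The sketch below wraps that idiom in a helper; strip_newline is a hypothetical name, and using the string library here is an assumption that applies to test scaffolding rather than to the assignment's concat function.

```c
#include <string.h>

/* Hypothetical helper: cuts the string at its first newline, if it has one.
 * When no '\n' is present, this rewrites the existing terminator. */
void strip_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}
```

Typical use is to call strip_newline(from) right after a successful fgets(from, sizeof(from), stdin).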
Finally, be sure the string you're concatenating to can hold its own contents and the new contents.
char from[21], to[41];
printf("Please enter the string to concat to: ");
// Be sure to leave enough room in `to` to fit `from`.
fgets(to, sizeof(to) - sizeof(from), stdin);
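Another way to guarantee the destination is never overrun is to pass its capacity to the concat function itself. The following is a minimal sketch in the same hand-written style (no string library calls); strconcat_bounded and its NULL-on-truncation convention are assumptions made for illustration, not part of the original assignment.

```c
#include <stddef.h>

/* Hypothetical bounded variant: appends src to dst without writing past
 * dst[dst_size - 1]. Returns dst on success, or NULL if src did not fit
 * (dst is still left null-terminated in that case). */
char *strconcat_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t i = 0;

    while (i < dst_size && dst[i] != '\0')    /* find the end of dst */
        i++;
    if (i == dst_size)
        return NULL;                          /* dst was not terminated */

    while (*src != '\0' && i + 1 < dst_size)  /* keep room for '\0' */
        dst[i++] = *src++;
    dst[i] = '\0';

    return *src == '\0' ? dst : NULL;
}
```

A call such as strconcat_bounded(to, sizeof(to), from) then works regardless of how the two input buffers were sized.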
I would have used a more dynamic memory model. This code is more generic: it concatenates the two strings into a newly allocated string containing both. Free the result when done :-)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *strconcat(char *string1, char *string2) {
int lenStr1=0,lenStr2=0;
char *tmpStr1=string1,*tmpStr2=string2,*returnStr;
while (*tmpStr1++)lenStr1++;
while (*tmpStr2++)lenStr2++;
if((returnStr=(char *)malloc(lenStr1+lenStr2+1))){
memcpy(returnStr,string1,lenStr1);
memcpy(&returnStr[lenStr1],string2,lenStr2);
returnStr[lenStr1+lenStr2]=0;
return returnStr;
} else {
return 0;
}
}
int main() {
char *string1="String 1 ",*string2="String 2 ",*result;
if((result=strconcat(string1, string2))) {
printf("-> %s \n",result);
free(result);
} else {
printf("Out of memory");
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}

Suggestion for parsing values from a particular format

I'm working on a client-server program, where initial step involves parsing the request from client on the server side. The input would look like this.
INSERT A->B B->C; QUERY A B; RESET;
So there are three different commands, separated by ';'. RESET has no parameters. INSERT may have any number of parameters (space separated, with the two values of each parameter joined by "->"). QUERY's parameters are again space separated. The server has to build an acyclic graph based on the input. So my problem is to parse this string into the individual requests. I planned on using strtok and, when the final value is reached (for example 'A'), creating a linked list of INSERTs (since the number of requests is unknown). But my code is too big and I'm looking for a more concise solution to this problem.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
typedef struct insert {
char event1;
char event2;
struct insert * next;
}insert,*insPtr;
typedef struct query {
char event1;
char event2;
struct query * next;
}query,*queryPtr;
typedef struct reset {
int status;
}reset,*rPtr;
void create_node(char *events) {
char event[2];
char *a,*str;
char *pch = strtok(events,"->");
while(pch != NULL) {
printf("%s\n",pch);
pch = strtok(NULL,"->");
}
}
int insert_parser(char *string) {
char *a, *b;
a = string;
b = "INSERT ";
while(*a == *b){
a++;
b++;
}
char *pch = strtok(a," ");
while(pch){
printf("%s\n",pch);
pch = strtok(NULL," ");
}
return(0);
}
int parse_for_values (char* command) {
int value;
if (strstr(command, "INSERT") !=NULL ) {
printf("%s\n",command);
printf ("Insert command found\n");
value = 1;
} else if(strstr(command, "QUERY") !=NULL) {
printf("%s\n",command);
printf ("Query command found\n");
value = 2;
} else if(strstr(command, "RESET") != NULL) {
printf ("Reset command found\n");
printf("%s\n",command);
value = 3;
} else {
printf("unknown command:%s:\n",command);
printf("Unknown command\n");
return(1);
}
switch(value) {
case 1:
insert_parser(command);
break;
case 2:
break;
case 3:
break;
}
return(0);
}
int parse_for_command(char *input) {
char *ptr;
input = strtok(input,"\n");// this strtok is to remove the trailing '\n' in the string
ptr = strtok(input,";");
// printf("%s\n",ptr);
char *cmdPtr;
while(ptr != NULL) {
cmdPtr = ptr;
parse_for_values(cmdPtr);
// printf("%s",ptr);
ptr = strtok(NULL, ";"); //NULL as first argument tells strtok to work on internally held value
}
return(0);
}
int main() {
char input[100];
printf("Enter the input\n");
fgets(input, 100, stdin);
//printf("%s",input);
char *inp = input;
inp = strtok(inp,"\n");
create_node(inp);
//parser(input,"INSERT ", "->");
//parse_for_command(input);
return(0);
}
You have to find individual tokens (lexical analysis) and analyse this sequence of tokens (syntactic analysis). Either you write these procedures manually, or you formally specify syntax of your language and employ existing tools to create the required C code automatically (see flex and bison).
I'm not 100% clear on what you are asking, but the input commands can be separated into individual strings with a single sequence of calls to strtok. Additionally, the time to remove the trailing newline left by fgets is immediately after the call to fgets, so that no newline is left dangling at the end of the string, e.g.:
fgets(input, 100, stdin);
len = strlen (input); /* get length of str */
if (len > 0 && input[len - 1] == '\n') /* trailing '\n' present */
input[--len] = 0; /* overwrite it with the terminator */
Separating the strings with strtok can be done in a for loop, which handles the initial call and the subsequent calls (with NULL) in a single statement. For purposes of the example, I've just used a statically declared character array to hold the separated strings, but you can just as well pass p to whatever type of list or abstraction you desire. If your intent was to further separate the individual components of each command, then a simple while loop walking a pointer (or two) down the separated INSERT or QUERY command will work. Below is the example; if your intent was different, please let me know.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define NCMD 10
#define LCMD 64
int main (void) {
char input[] = "INSERT A->B B->C; QUERY A B; RESET;";
char command[NCMD][LCMD] = {{0}};
char *p = NULL;
size_t i = 0;
size_t ncmd = 0;
for (p = strtok (input, ";"); p; p = strtok (NULL, ";")) {
while (*p == ' ') p++;
strncpy (command[ncmd++], p, LCMD - 1); /* array is zero-filled, so the copy stays terminated */
}
printf ("\n The separated commands are:\n\n");
for (i = 0; i < ncmd; i++)
printf (" command[%zu] : %s\n", i, command[i]);
printf ("\n");
return 0;
}
Output
$ ./bin/parseinput
The separated commands are:
command[0] : INSERT A->B B->C
command[1] : QUERY A B
command[2] : RESET
Note: the length validation on p as well as the limit check on ncmd were intentionally omitted for purposes of the example.
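To break an individual INSERT command further into its A->B pairs, one option is a small sscanf loop instead of another round of strtok. The sketch below assumes each event is a single character, as in the question's structs; parse_pairs is a hypothetical helper, and the caller is assumed to pass the text that follows the INSERT keyword.

```c
#include <stdio.h>

/* Hypothetical helper: extracts up to `max` "X->Y" pairs from `args`,
 * storing the left sides in from[] and the right sides in to[].
 * Returns the number of pairs found. */
int parse_pairs(const char *args, char from[], char to[], int max)
{
    int n = 0;
    int used = 0;

    /* " %c" skips leading blanks, "->" must match literally, and %n
     * records how many characters this pair consumed. */
    while (n < max &&
           sscanf(args, " %c->%c%n", &from[n], &to[n], &used) == 2) {
        args += used;
        n++;
    }
    return n;
}
```

For the command "INSERT A->B B->C", calling parse_pairs on the part after the keyword yields the pairs (A, B) and (B, C), which can then be appended to whatever list type the server uses.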

strtok() skips first token

Can't seem to work out why this code is not working.
It should be really straightforward.
From what I have found while troubleshooting, the id array is assigned inside the while(token) block, but when I print all the char arrays after the while(token) block, the id array displays nothing while all the other arrays display their contents.
int loadCatData(char* menuFile) {
char line[MAX_READ];
char id[ID_LEN];
char hotCold[MIN_DESC_LEN];
char name[MAX_NAME_LEN];
char description[MAX_DESC_LEN];
char delim[2] = "|";
char lineTemp[MAX_READ];
int count;
FILE *mfMenu = fopen(menuFile, "r");
while (fgets(line, sizeof(line), mfMenu)!=NULL) {
count = 0;
printf(line);
strcpy(lineTemp,line);
char* token = strtok(lineTemp, delim);
while (token) {
printf(token);
if (count == 0) {
strcpy(id, token);
}
if (count == 1) {
strcpy(hotCold, token);
}
if (count == 2) {
strcpy(name, token);
}
if (count == 3) {
strcpy(description, token);
}
printf("\n");
token = strtok(NULL, delim);
count = count + 1;
}
printf(id);
printf(hotCold);
printf(name);
printf(description);
}
fclose(mfMenu);
return true;
}
You are the victim of a buffer overflow error caused by strcpy.
What is happening is that the hotCold array is too small to hold the data you're filling it with, but strcpy doesn't care, nor does it know that there isn't enough room. So it keeps on writing data into hotCold and then runs out of room, then fills up the padding bytes, then fills up id. You just have the unfortunate luck of having the terminating null byte of hotCold sitting at the start of id.
Switch from using strcpy to strncpy or strncat (which I think is better than strncpy). If you're skeptical of what I'm saying, add a line of code at the end that goes like this:
assert (strlen (hotCold) < MIN_DESC_LEN);
The other alternative is that the id field is being interpreted as a special printf-format specifier. Just in case, replace printf(id) with printf("%s", id).
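int loadCatData(const char* menuFile) {
char line[MAX_READ];
char id[ID_LEN];
char hotCold[MIN_DESC_LEN];
char name[MAX_NAME_LEN];
char description[MAX_DESC_LEN];
FILE *mfMenu = fopen(menuFile, "r");
if (mfMenu == NULL)
return false;
while (fgets(line, sizeof(line), mfMenu) != NULL) {
/* In scanf formats '*' suppresses assignment rather than taking a width,
so split with strtok and let snprintf truncate each field to its buffer. */
char *f0 = strtok(line, "|\n");
char *f1 = strtok(NULL, "|\n");
char *f2 = strtok(NULL, "|\n");
char *f3 = strtok(NULL, "|\n");
if (!f0 || !f1 || !f2 || !f3)
continue; /* skip malformed lines */
snprintf(id, sizeof(id), "%s", f0);
snprintf(hotCold, sizeof(hotCold), "%s", f1);
snprintf(name, sizeof(name), "%s", f2);
snprintf(description, sizeof(description), "%s", f3);
printf("%s %s %s %s\n", id, hotCold, name, description);
}
fclose(mfMenu);
return true;
}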
You should never pass input from outside the program to printf as the first argument. Imagine if one of the tokens is "%s" and you say printf(token)--that's undefined behavior because you didn't pass a second string to print, and your program will crash if you're lucky.
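Both problems, the overflow and the format-string risk, can be avoided by copying with snprintf, which takes the destination size and treats the token purely as data. This is a minimal sketch; copy_field is a hypothetical helper name, and returning a fit/truncated flag is just one possible convention.

```c
#include <stdio.h>

/* Hypothetical helper: copies src into dst (capacity dst_size), truncating
 * if necessary. Returns 1 if the whole token fit, 0 if it was cut short. */
int copy_field(char *dst, size_t dst_size, const char *src)
{
    /* "%s" is the format; src is only ever an argument, never a format. */
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

In the question's loop, strcpy(hotCold, token) would become copy_field(hotCold, sizeof(hotCold), token), and a return value of 0 would flag a field that is longer than its buffer.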

How do you split a string in C?

If I have a string like:
const char* mystr = "Test Test Bla Bla \n Bla Bla Test \n Test Test \n";
How would I use the newline character '\n', to split the string into an array of strings?
I'm trying to accomplish in C, the thing string.Split() does in C# or boost's string algorithm split does in C++ .
Try to use the strtok function. Be aware that it modifies the source memory so you can't use it with a string literal.
char *copy = strdup(mystr);
char *tok;
tok = strtok(copy, "\n");
while (tok) {
/* Do something with tok. */
tok = strtok(NULL, "\n");
}
free(copy);
The simplest way to split a string in C is to use strtok(); however, that comes with a list of caveats as long as your arm:
It's destructive (destroys the input string), and you couldn't use it on the string you have above.
It's not reentrant (it keeps its state between calls, and you can only be using it to tokenize one string at a time... let alone if you wanted to use it with threads). Some systems provide a reentrant version, e.g. strtok_r(). Your example might be split up like:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main (void) {
char mystr[] = "Test Test Bla Bla \n Bla Bla Test \n Test Test \n";
char *word = strtok(mystr, " \n");
while (word) {
printf("word: %s\n", word);
word = strtok(NULL, " \n");
}
return 0;
}
Note the important change of your string declaration -- it's now an array and can be modified. It's possible to tokenize a string without destroying it, of course, but C does not provide a simple solution for doing so as part of the standard library.
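One way to tokenize without destroying the input is to walk the string with strspn and strcspn and hand back each token as a pointer plus a length, instead of writing terminators into it. The sketch below assumes that representation is acceptable to the caller; next_token is a hypothetical helper, not a standard function.

```c
#include <string.h>

/* Hypothetical helper: finds the next token at or after *cursor without
 * modifying the string. Returns the token start and stores its length in
 * *len, or returns NULL when no tokens remain. */
const char *next_token(const char **cursor, const char *delims, size_t *len)
{
    const char *start = *cursor + strspn(*cursor, delims); /* skip delimiters */

    if (*start == '\0')
        return NULL;
    *len = strcspn(start, delims);   /* length of this token */
    *cursor = start + *len;          /* the next call resumes from here */
    return start;
}
```

Since the tokens are not null-terminated, they can be printed with printf("%.*s\n", (int)len, tok) or copied out with memcpy. This works directly on a string literal such as the mystr in the question.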
Remember that C makes you do all the memory allocation by hand. Remember also that C doesn't really have strings, only arrays of characters. Also, string literals are immutable, so you're going to need to copy it. It will be easier to copy the whole thing first.
So, something like this (wholly untested):
char *copy = xstrdup(mystr);
char *p;
char **arry;
size_t count = 0;
size_t i;
for (p = copy; *p; p++)
if (*p == '\n')
count++;
arry = xmalloc((count + 1) * sizeof(char *));
i = 0;
p = copy;
arry[i] = p;
while (*p)
{
if (*p == '\n')
{
*p = '\0';
arry[++i] = p+1; /* pre-increment so arry[0] is not overwritten */
}
p++;
}
return arry; /* deallocating arry and arry[0] is
the responsibility of the caller */
In the above reactions, I see only while(){} loops, where IMHO for(){} loops are more compact.
cnicutar:
for(tok = strtok(copy, "\n");tok; tok = strtok(NULL, "\n")) {
/* ... */
}
FatalError:
char *word;
for ( word = strtok(mystr, " \n");word; word = strtok(NULL, " \n")) {
printf("word: %s\n", word);
}
Zack:
for (arry[i=0]=p=copy; *p ; p++)
{
if (*p == '\n')
{
*p = '\0';
arry[++i] = p+1; /* pre-increment so arry[0] is not overwritten */
}
}
[the clarity of this last example is disputable]
If C++ is an option, you can use the Boost Tokenizer library, which has many other useful functions:
http://www.boost.org/doc/libs/1_48_0/libs/tokenizer/index.html
Or, in plain C, you can use the strtok function.

How does strtok() split the string into tokens in C?

Please explain to me the working of strtok() function. The manual says it breaks the string into tokens. I am unable to understand from the manual what it actually does.
I added watches on str and *pch to check how it works. When the first iteration of the while loop ran, the contents of str were only "this". How was the output shown below printed on the screen?
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-");
}
return 0;
}
Output:
Splitting string "- This, a sample string." into tokens:
This
a
sample
string
The strtok runtime function works like this:
The first time you call strtok, you provide the string that you want to tokenize:
char s[] = "this is a string";
In the above string, a space seems to be a good delimiter between words, so let's use that:
char* p = strtok(s, " ");
What happens now is that s is searched until a space character is found. That space is overwritten with '\0', and the first token ("this") is returned, so p points to that token (string).
To get the next token and continue with the same string, NULL is passed as the first argument, since strtok maintains a static pointer to the string you passed previously:
p = strtok(NULL," ");
p now points to "is",
and so on until no more spaces can be found; then the rest of the string is returned as the last token, "string".
More conveniently, you could write it like this instead to print out all the tokens:
for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
puts(p);
}
EDIT:
If you want to keep the values returned from strtok, remember that each token points into the original string, which strtok modifies in place. Copy the token to another buffer (e.g. with strdup(p)) if the original buffer will be reused or freed while you still need the token.
strtok() divides the string into tokens: it skips any leading delimiter characters, and the token then runs up to the next delimiter. In your case, the leading "-" and " " are skipped, so the first token starts at "T" and ends at the ",", which gives "This" as output. Similarly, the rest of the string is split at each run of delimiters, and the last token, "string", ends at the ".".
strtok maintains a static, internal reference pointing to the next available token in the string; if you pass it a NULL pointer, it will work from that internal reference.
This is the reason strtok isn't re-entrant; as soon as you pass it a new pointer, that old internal reference gets clobbered.
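The internal reference can be made concrete with a simplified re-implementation. This is an illustrative sketch only (real library versions may differ in detail), and sketch_strtok is a hypothetical name; it shows the static pointer, the skipping of leading delimiters, and the '\0' written over the delimiter that ends each token.

```c
#include <string.h>

/* Illustrative sketch of strtok's behaviour, not the library's source. */
char *sketch_strtok(char *str, const char *delims)
{
    static char *saved;            /* where the next call resumes */
    char *token;

    if (str == NULL)
        str = saved;               /* continue the previous string */
    if (str == NULL)
        return NULL;               /* nothing to continue */

    str += strspn(str, delims);    /* skip leading delimiters */
    if (*str == '\0') {
        saved = NULL;
        return NULL;               /* no tokens left */
    }

    token = str;
    str += strcspn(str, delims);   /* find the end of the token */
    if (*str != '\0') {
        *str = '\0';               /* overwrite the delimiter */
        saved = str + 1;
    } else {
        saved = NULL;              /* token ran to the end of the string */
    }
    return token;
}
```

Passing a new non-NULL string simply overwrites saved, which is exactly why two interleaved tokenizing sessions interfere with each other.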
strtok doesn't change the parameter itself (str). It stores that pointer (in a local static variable). It can then change what that parameter points to in subsequent calls without having the parameter passed back. (And it can advance that pointer it has kept however it needs to perform its operations.)
From the POSIX strtok page:
This function uses static storage to keep track of the current string position between calls.
There is a thread-safe variant (strtok_r) that doesn't do this type of magic.
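With strtok_r the caller supplies the saved position, so two tokenizing sessions can be nested or run in different threads. The sketch below assumes a POSIX system (strtok_r is not part of ISO C); words_per_line is a hypothetical example function that splits text into lines and, inside that loop, each line into words.

```c
#define _POSIX_C_SOURCE 200809L   /* expose strtok_r on POSIX systems */
#include <string.h>

/* Hypothetical example: counts the words on each line of `text` (which is
 * modified in place), storing up to max_lines counts. Returns the number
 * of lines processed. */
size_t words_per_line(char *text, size_t counts[], size_t max_lines)
{
    char *line_state, *word_state;
    size_t n = 0;

    for (char *line = strtok_r(text, "\n", &line_state);
         line != NULL && n < max_lines;
         line = strtok_r(NULL, "\n", &line_state)) {
        counts[n] = 0;
        /* The inner loop has its own state, so it cannot disturb the outer one. */
        for (char *word = strtok_r(line, " \t", &word_state);
             word != NULL;
             word = strtok_r(NULL, " \t", &word_state))
            counts[n]++;
        n++;
    }
    return n;
}
```

Writing the same nesting with plain strtok would end the outer loop after the first line, because the inner calls replace the single static position.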
strtok will tokenize a string i.e. convert it into a series of substrings.
It does that by searching for delimiters that separate these tokens (or substrings). And you specify the delimiters. In your case, you want ' ' or ',' or '.' or '-' to be the delimiter.
The programming model to extract these tokens is that you hand strtok your main string and the set of delimiters. Then you call it repeatedly, and each time strtok returns the next token it finds, until it reaches the end of the main string, when it returns NULL. Another rule is that you pass the string in only the first time, and NULL for the subsequent times. This is how you tell strtok whether you are starting a new tokenizing session with a new string or retrieving tokens from a previous tokenizing session. Note that strtok remembers its state for the tokenizing session, and for this reason it is not reentrant or thread safe (you should use strtok_r instead). Another thing to know is that it actually modifies the original string: it writes '\0' over the delimiters that it finds.
One way to invoke strtok, succinctly, is as follows:
char str[] = "this, is the string - I want to parse";
char delim[] = " ,-";
char* token;
for (token = strtok(str, delim); token; token = strtok(NULL, delim))
{
printf("token=%s\n", token);
}
Result:
token=this
token=is
token=the
token=string
token=I
token=want
token=to
token=parse
The first time you call it, you provide the string to tokenize to strtok. And then, to get the following tokens, you just give NULL to that function, as long as it returns a non NULL pointer.
The strtok function records the string you first provided when you call it. (Which is really dangerous for multi-thread applications)
strtok modifies its input string. It places null characters ('\0') in it so that it will return bits of the original string as tokens. In fact strtok does not allocate memory. You may understand it better if you draw the string as a sequence of boxes.
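The "sequence of boxes" picture can also be produced by the program itself. The sketch below copies the buffer while showing every '\0' as an underscore, so the terminators that strtok wrote become visible; picture is a hypothetical helper and the underscore is an arbitrary choice of marker.

```c
#include <stddef.h>

/* Hypothetical helper: writes a printable picture of the n bytes in buf to
 * out (which must hold n + 1 chars), showing each '\0' as '_'. */
void picture(const char *buf, size_t n, char *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (buf[i] == '\0') ? '_' : buf[i];
    out[n] = '\0';
}
```

For example, after two strtok calls with " " as the delimiter on the array "this is a string", the picture of its 16 bytes reads this_is_a string: the first two spaces have become terminators and the rest is untouched, with no memory allocated anywhere.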
To understand how strtok() works, one first needs to know what a static variable is. This link explains it quite well....
The key to the operation of strtok() is preserving the location of the last separator between successive calls (that's why strtok() continues to parse the original string that was passed to it when it is invoked with a null pointer in successive calls).
Have a look at my own strtok() implementation, called zStrtok(), which has slightly different functionality from that provided by strtok():
char *zStrtok(char *str, const char *delim) {
static char *static_str=0; /* var to store last address */
int index=0, strlength=0; /* integers for indexes */
int found = 0; /* check if delim is found */
/* delimiter cannot be NULL
* if no more char left, return NULL as well
*/
if (delim==0 || (str == 0 && static_str == 0))
return 0;
if (str == 0)
str = static_str;
/* get length of string */
while(str[strlength])
strlength++;
/* find the first occurrence of delim */
for (index=0;index<strlength;index++)
if (str[index]==delim[0]) {
found=1;
break;
}
/* if delim is not contained in str, return str */
if (!found) {
static_str = 0;
return str;
}
/* check for consecutive delimiters
*if first char is delim, return delim
*/
if (str[0]==delim[0]) {
static_str = (str + 1);
return (char *)delim;
}
/* terminate the string
* this assignment requires char[], so str has to
* be char[] rather than char *
*/
str[index] = '\0';
/* save the rest of the string */
if ((str + index + 1)!=0)
static_str = (str + index + 1);
else
static_str = 0;
return str;
}
And here is an example usage.
Example Usage
char str[] = "A,B,,,C";
printf("1 %s\n",zStrtok(str,","));
printf("2 %s\n",zStrtok(NULL,","));
printf("3 %s\n",zStrtok(NULL,","));
printf("4 %s\n",zStrtok(NULL,","));
printf("5 %s\n",zStrtok(NULL,","));
printf("6 %s\n",zStrtok(NULL,","));
Example Output
1 A
2 B
3 ,
4 ,
5 C
6 (null)
The code is from a string processing library I maintain on Github, called zString. Have a look at the code, or even contribute :)
https://github.com/fnoyanisi/zString
This is how I implemented strtok. It's not that great, but after working on it for two hours I finally got it working. It does support multiple delimiters.
#include "stdafx.h"
#include <iostream>
using namespace std;
char* mystrtok(char str[],char filter[])
{
if(filter == NULL) {
return str;
}
static char *ptr = str;
static int flag = 0;
if(flag == 1) {
return NULL;
}
char* ptrReturn = ptr;
for(int j = 0; ptr != NULL; j++) {
for(int i=0 ; filter[i] != '\0' ; i++) {
if(ptr[j] == '\0') {
flag = 1;
return ptrReturn;
}
if( ptr[j] == filter[i]) {
ptr[j] = '\0';
ptr+=j+1;
return ptrReturn;
}
}
}
return NULL;
}
int _tmain(int argc, _TCHAR* argv[])
{
char str[200] = "This,is my,string.test";
char *ppt = mystrtok(str,", .");
while(ppt != NULL ) {
cout<< ppt << endl;
ppt = mystrtok(NULL,", .");
}
return 0;
}
For those who are still having hard time understanding this strtok() function, take a look at this pythontutor example, it is a great tool to visualize your C (or C++, Python ...) code.
In case the link got broken, paste in:
#include <stdio.h>
#include <string.h>
int main()
{
char s[] = "Hello, my name is? Matthew! Hey.";
for (char *p = strtok(s," ,?!."); p != NULL; p = strtok(NULL, " ,?!.")) {
puts(p);
}
return 0;
}
Credits go to Anders K.
Here is my implementation, which uses a hash table for the delimiters, which means it is O(n) instead of O(n^2):
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define DICT_LEN 256
int *create_delim_dict(char *delim)
{
int *d = calloc(DICT_LEN, sizeof(int)); /* zero-filled lookup table */
if (!d)
return NULL; /* the caller checks for NULL */
size_t i;
for(i=0; i< strlen(delim); i++) {
d[(unsigned char)delim[i]] = 1; /* cast avoids a negative index */
}
return d;
}
char *my_strtok(char *str, char *delim)
{
static char *last, *to_free;
int *deli_dict = create_delim_dict(delim);
if(!deli_dict) {
/*this check if we allocate and fail the second time with entering this function */
if(to_free) {
free(to_free);
}
return NULL;
}
if(str) {
last = (char*)malloc(strlen(str)+1);
if(!last) {
free(deli_dict);
return NULL;
}
to_free = last;
strcpy(last, str);
}
while(deli_dict[(unsigned char)*last] && *last != '\0') {
last++;
}
str = last;
if(*last == '\0') {
free(deli_dict);
free(to_free);
deli_dict = NULL;
to_free = NULL;
return NULL;
}
while (*last != '\0' && !deli_dict[(unsigned char)*last]) {
last++;
}
if (*last != '\0') { /* do not step past the end of the copy */
*last = '\0';
last++;
}
free(deli_dict);
return str;
}
int main()
{
char * str = "- This, a sample string.";
char *del = " ,.-";
char *s = my_strtok(str, del);
while(s) {
printf("%s\n", s);
s = my_strtok(NULL, del);
}
return 0;
}
strtok() stores, in a static variable, a pointer to where it left off last time, so on the second call, when we pass NULL, strtok() picks up from that saved pointer.
If you pass the same string again, it starts over from the beginning.
Moreover, strtok() is destructive, i.e. it changes the original string, so make sure you always keep a copy of the original.
One more problem with strtok() is that, because it stores the address in a static variable, calling it from more than one thread at a time in a multithreaded program will cause errors. For this, use strtok_r().
strtok replaces each delimiter it finds (the delimiters being the characters in the second argument) with a null character ('\0'), which is also what marks the end of a string.
http://www.cplusplus.com/reference/clibrary/cstring/strtok/
You can scan the char array looking for the delimiter: if you find it, print a newline; otherwise, print the character.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char *s;
s = malloc(1024 * sizeof(char));
scanf("%1023[^\n]", s); /* width keeps the read inside the 1024-byte buffer */
s = realloc(s, strlen(s) + 1);
int len = strlen(s);
char delim =' ';
for(int i = 0; i < len; i++) {
if(s[i] == delim) {
printf("\n");
}
else {
printf("%c", s[i]);
}
}
free(s);
return 0;
}
So, this is a code snippet to help better understand this topic.
Printing Tokens
Task: Given a sentence, s, print each word of the sentence in a new line.
char *s;
s = malloc(1024 * sizeof(char));
scanf("%1023[^\n]", s); /* width keeps the read inside the 1024-byte buffer */
s = realloc(s, strlen(s) + 1);
//logic to print the tokens of the sentence.
for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
printf("%s\n",p);
}
Input: How is that
Result:
How
is
that
Explanation: Here the strtok() function is called in a for loop to print the tokens on separate lines.
The function takes the string and the delimiter characters as parameters, breaks the string at those delimiters, and returns the tokens one at a time. Each token is pointed to by p and is then printed.
strtok replaces each delimiter with the null character '\0' in the given string.
CODE
#include<iostream>
#include<cstring>
int main()
{
char s[]="30/4/2021";
std::cout<<(void*)s<<"\n"; // e.g. 0x70fdf0
char *p1=s; // same address as s, without hard-coding it
std::cout<<p1<<"\n";
char *p2=strtok(s,"/");
std::cout<<(void*)p2<<"\n";
std::cout<<p2<<"\n";
char *p3=s; // address of s again
std::cout<<p3<<"\n";
for(int i=0;i<=9;i++)
{
std::cout<<*p1;
p1++;
}
}
OUTPUT
0x70fdf0 // 1. address of string s
30/4/2021 // 2. print string s through ptr p1
0x70fdf0 // 3. this address is return by strtok to ptr p2
30 // 4. print string which pointed by p2
30 // 5. again assign address of string s to ptr p3 try to print string
30 4/2021 // 6. print characters of string s one by one using loop
Before tokenizing the string:
I assigned the address of string s to a pointer (p1) and printed the string through that pointer; the whole string was printed.
After tokenizing:
strtok returned the address of string s to pointer p2, but printing through p2 shows only "30", not the whole string. So strtok is not just returning an address; it also places a '\0' character where the delimiter was.
Cross-check:
1. I again assigned the address of string s to a pointer (p3) and printed the string; it prints "30" because tokenizing updated the string with '\0' at the delimiter.
2. Printing string s character by character in a loop shows that the first delimiter was replaced by '\0', so a blank is printed rather than '/'.
