How to split a string into tokens and then save them in an array?
Specifically, I have a string "abc/qwe/jkh". I want to split it on "/" and then save the tokens into an array.
Output will be such that
array[0] = "abc"
array[1] = "qwe"
array[2] = "jkh"
please help me
#include <stdio.h>
#include <string.h>
int main ()
{
char buf[] ="abc/qwe/ccd";
int i = 0;
char *p = strtok (buf, "/");
char *array[3];
while (p != NULL)
{
array[i++] = p;
p = strtok (NULL, "/");
}
for (i = 0; i < 3; ++i)
printf("%s\n", array[i]);
return 0;
}
You can use strtok()
char string[] = "abc/qwe/jkh";
char *array[10];
int i = 0;
array[i] = strtok(string, "/");
while(array[i] != NULL)
array[++i] = strtok(NULL, "/");
Why strtok() is a bad idea
Do not use strtok() in normal code. strtok() keeps its position in a static variable, which causes problems. There are some use cases on embedded microcontrollers where static variables make sense, but avoid them in most other cases. strtok() behaves unexpectedly when more than one thread uses it, when it is used in an interrupt, or in any other circumstance where more than one input is processed between successive calls to strtok().
Consider this example:
#include <stdio.h>
#include <string.h>
//Splits the input by the / character and prints the content in between
//the / character. The input string will be changed
void printContent(char *input)
{
char *p = strtok(input, "/");
while(p)
{
printf("%s, ",p);
p = strtok(NULL, "/");
}
}
int main(void)
{
char buffer[] = "abc/def/ghi:ABC/DEF/GHI";
char *p = strtok(buffer, ":");
while(p)
{
printContent(p);
puts(""); //print newline
p = strtok(NULL, ":");
}
return 0;
}
You may expect the output:
abc, def, ghi,
ABC, DEF, GHI,
But you will get
abc, def, ghi,
This is because the call to strtok() in printContent() resets the internal state of strtok() that was set up in main(). After printContent() returns, that internal state is exhausted, and the next call to strtok() in main() returns NULL.
What you should do instead
You could use strtok_r() when you are on a POSIX system; this version does not need static variables. If your library does not provide strtok_r(), you can write your own version of it. This should not be hard and Stack Overflow is not a coding service, you can write it on your own.
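If strtok_r() is not available, a replacement can be built from strspn() and strcspn(). The following is only a sketch: the names my_strtok_r and split_nested are made up for this example, and the behaviour is meant to mirror POSIX strtok_r() rather than be an audited drop-in. Because the scan position is kept in a caller-supplied pointer, the outer ":" loop and the inner "/" loop no longer disturb each other.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for POSIX strtok_r(): the scan position lives in
 * *saveptr (owned by the caller) instead of in a hidden static variable. */
char *my_strtok_r(char *str, const char *delim, char **saveptr)
{
    assert(delim != NULL && saveptr != NULL);
    if (str == NULL)
        str = *saveptr;            /* continue where the last call stopped */
    if (str == NULL)
        return NULL;               /* nothing left from an earlier call */
    str += strspn(str, delim);     /* skip leading delimiters */
    if (*str == '\0') {
        *saveptr = NULL;
        return NULL;               /* only delimiters were left */
    }
    char *end = str + strcspn(str, delim);  /* first delimiter after token */
    if (*end != '\0') {
        *end = '\0';               /* terminate the token in place */
        *saveptr = end + 1;
    } else {
        *saveptr = NULL;           /* token ran to the end of the string */
    }
    return str;
}

/* Splits input on ':' and each part on '/', printing one line per part.
 * Returns the number of inner tokens found. The input is modified. */
int split_nested(char *input)
{
    int count = 0;
    char *outer_state, *inner_state;
    for (char *part = my_strtok_r(input, ":", &outer_state); part != NULL;
         part = my_strtok_r(NULL, ":", &outer_state)) {
        for (char *tok = my_strtok_r(part, "/", &inner_state); tok != NULL;
             tok = my_strtok_r(NULL, "/", &inner_state)) {
            printf("%s, ", tok);
            count++;
        }
        puts("");                  /* newline after each outer part */
    }
    return count;
}
```

Called on a writable buffer holding "abc/def/ghi:ABC/DEF/GHI", split_nested() prints both groups of tokens, since each loop has its own state pointer.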
I've been assigned a homework from my college professor and I seem to have found some strange behavior of strtok
Basically, we have to parse a CSV file for my class, where the number of tokens in the CSV is known and the last element may have extra "," characters.
An example of a line:
Hello,World,This,Is,A lot, of Text
Where the tokens should be output as
1. Hello
2. World
3. This
4. Is
5. A lot, of Text
For this assignment we MUST use strtok. Because of this I found on some other SOF post that using strtok with an empty string (or passing "\n" as the second argument) results in reading until the end of the line. This is perfect for my application since the extra commas always appear in the last element.
I've created this code which works:
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#define NUM_TOKENS 5
const char *line = "Hello,World,This,Is,Text";
char **split_line(const char *line, int num_tokens)
{
char *copy = strdup(line);
// Make an array the correct size to hold num_tokens
char **tokens = (char**) malloc(sizeof(char*) * num_tokens);
int i = 0;
for (char *token = strtok(copy, ",\n"); i < NUM_TOKENS; token = strtok(NULL, i < NUM_TOKENS - 1 ? ",\n" : "\n"))
{
tokens[i++] = strdup(token);
}
free(copy);
return tokens;
}
int main()
{
char **tokens = split_line(line, NUM_TOKENS);
for (int i = 0; i < NUM_TOKENS; i++)
{
printf("%s\n", tokens[i]);
free(tokens[i]);
}
}
Now this works and should get me full credit but I hate this ternary that shouldn't be needed:
token = strtok(NULL, i < NUM_TOKENS - 1 ? ",\n" : "\n");
I'd like to replace the method with this version:
char **split_line(const char *line, int num_tokens)
{
char *copy = strdup(line);
// Make an array the correct size to hold num_tokens
char **tokens = (char**) malloc(sizeof(char*) * num_tokens);
int i = 0;
for (char *token = strtok(copy, ",\n"); i < NUM_TOKENS - 1; token = strtok(NULL, ",\n"))
{
tokens[i++] = strdup(token);
}
tokens[i] = strdup(strtok(NULL, "\n"));
free(copy);
return tokens;
}
This tickles my fancy much nicer since it is much easier to see that there is a final case. You also get rid of the strange ternary operator.
Sadly though, this segfaults! I can't for the life of me figure out why.
Edit: Add some output examples:
[11:56:06] gravypod:test git:(master*) $ ./test_no_fault
Hello
World
This
Is
Text
[11:56:10] gravypod:test git:(master*) $ ./test_seg_fault
[1] 3718 segmentation fault (core dumped) ./test_seg_fault
[11:56:14] gravypod:test git:(master*) $
Please check the return value from strtok before you risk passing NULL to another function. Your loop is calling strtok one more time than you think.
It is more usual to use this return value to control your loop, then you are not at the mercy of your data. As for the delimiters, best to keep it simple and not try anything fancy.
char **split_line(const char *line, int num_tokens)
{
char *copy = strdup(line);
char **tokens = (char**) malloc(sizeof(char*) * num_tokens);
int i = 0;
char *token;
char delim1[] = ",\r\n";
char delim2[] = "\r\n";
char *delim = delim1; // start with a comma in the delimiter set
token = strtok(copy, delim);
while(token != NULL && i < num_tokens) { // strtok result controls the loop
tokens[i++] = strdup(token);
if(i == num_tokens - 1) {
delim = delim2; // last field is next: stop splitting on commas
}
token = strtok(NULL, delim);
}
free(copy);
return tokens;
}
Note you should also check the return values from malloc and strdup and free your memory properly
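A rough sketch of what that checking could look like. split_line_checked, free_tokens and copy_string are hypothetical names; copy_string is a small local replacement for strdup() so the sketch does not depend on POSIX. The function returns NULL on any failure, including a line with fewer fields than expected, and it switches the delimiter set before the last field so that extra commas stay inside it. Note that strtok skips empty fields, so a line such as "a,,b" counts as two fields here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local copy helper so the sketch does not depend on POSIX strdup(). */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Frees the first count strings and then the array itself. */
void free_tokens(char **tokens, int count)
{
    if (tokens == NULL)
        return;
    for (int i = 0; i < count; i++)
        free(tokens[i]);
    free(tokens);
}

/* Splits line into exactly num_tokens fields; the last field keeps any
 * extra commas. Returns NULL on allocation failure or too few fields. */
char **split_line_checked(const char *line, int num_tokens)
{
    assert(line != NULL && num_tokens > 0);
    char *copy = copy_string(line);
    char **tokens = calloc((size_t)num_tokens, sizeof *tokens);
    if (copy == NULL || tokens == NULL) {
        free(copy);
        free(tokens);
        return NULL;
    }
    int i = 0;
    const char *delim = (num_tokens == 1) ? "\r\n" : ",\r\n";
    char *token = strtok(copy, delim);
    while (token != NULL && i < num_tokens) {
        tokens[i] = copy_string(token);
        if (tokens[i] == NULL)
            break;                      /* out of memory */
        i++;
        if (i == num_tokens - 1)
            delim = "\r\n";             /* last field: stop splitting on ',' */
        token = strtok(NULL, delim);
    }
    free(copy);
    if (i != num_tokens) {              /* failure: undo partial work */
        free_tokens(tokens, i);
        return NULL;
    }
    return tokens;
}
```

The caller owns the result and releases it with free_tokens(tokens, num_tokens) once it is done with the strings.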
When you get to the last loop, you'll get
for (char *token = strtok(copy, ",\n"); i < NUM_TOKENS - 1; token = strtok(NULL, ",\n"))
loop body
loop increment step, i.e. token = strtok(NULL, ",\n") (with the wrong second arg)
loop continuation check i < NUM_TOKENS - 1
i.e. it has still called strtok even though you're now out-of-range. You've also got an off-by-one on your array indices here: you'd want to initialise i=0 not 1.
You could avoid this by e.g.
making the initial strtok a special case outside the loop, e.g.
int i = 0;
tokens[i++] = strdup(strtok(copy, ",\n"));
then moving the strtok(NULL, ",\n") inside the loop
I'm also surprised you want the \n there at all, or even need to call the last strtok (wouldn't that already just point to the rest of the string? If you're just trying to chop a trailing newline, there are easier ways), but I haven't used strtok in years.
(As an aside you're also not freeing the malloced array you store the string pointers in. That said since it's the end of the program at that point that doesn't matter so much.)
Remember that strtok identifies a token when it finds any of the characters in the delimiter string (the second argument to strtok()) - it doesn't try to match the entire delimiter string itself.
Thus, the ternary operator was never needed in the first place - the string will be tokenized based on the occurrence of , OR \n in the input string, so the following works:
for (token = strtok(copy, ",\n"); i < NUM_TOKENS; token = strtok(NULL, ",\n"))
{
tokens[i++] = strdup(token);
}
The second example segfaults because it's already tokenized the input to the end of the string by the time it exits the for loop. Calling strtok() again sets token to NULL, and the segfault is generated when strdup() is called on the NULL pointer. Removing the extra call to strtok gives the expected results:
for (token = strtok(copy, ",\n"); i < NUM_TOKENS - 1; token = strtok(NULL, ",\n"))
{
tokens[i++] = strdup(token);
}
tokens[i] = strdup(token);
I want to delete the last character of each word in a string.
First, I use the strtok function.
My input is: "Hello World Yaho"
I use " " as my delimiter.
My expectation is this
Hell
Worl
Yah
But the actual output is this
Hello
Worl
Yaho
How can I solve this problem? I can't understand this output
this is my code
int main(int argc, char*argv[])
{
char *string;
char *ptr;
string = (char*)malloc(100);
puts("Input a String");
fgets(string,100,stdin);
printf("Before calling a function: %s\n", string);
ptr = strtok(string," ");
printf("%s\n", ptr);
while(ptr=strtok(NULL, " "))
{
ptr[strlen(ptr)-1]=0;
printf("%s\n", ptr);
}
return 0;
}
This program deletes the last character of every word.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int main(int argc, char*argv[]){
char *string;
char *ptr;
string = (char*)malloc(100);
puts("Input a String");
fgets(string,100,stdin);
printf("Before calling a function: %s\n", string);
string[strlen(string)-1]=0;
ptr = strtok(string," ");
printf("%s\n", ptr);
while(ptr){
ptr[strlen(ptr)-1]=0;
printf("%s\n", ptr);
ptr = strtok(0, " ");
}
return 0;
}
You must remember to
Trim the string from trailing newline
Use strtok properly
Test
Input a String
Hello World Yaho
Before calling a function: Hello World Yaho
Hello
Hell
Worl
Yah
Your problem is best solved by splitting it in 2 phases: parsing the phrase into words on one hand, with strtok if you wish, and printing the words with their last character omitted in a separate function:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static void print_truncated_word(const char *ptr) {
int len = strlen(ptr);
if (len > 0) len -= 1;
printf("%.*s\n", len, ptr);
}
int main(int argc, char*argv[]) {
char buf[128];
char *ptr;
puts("Input a string: ");
if (fgets(buf, sizeof buf, stdin) == NULL) {
/* premature end of file */
exit(1);
}
printf("Before calling a function: %s\n", buf);
ptr = strtok(buf, " \n");
while (ptr) {
print_truncated_word(ptr);
ptr = strtok(NULL, " \n");
}
return 0;
}
Note that the print_truncated_word function does not modify the buffer. Side effects on input arguments should be avoided, unless they are the explicit goal of the function. strtok is ill behaved to this regard, among other shortcomings such as its hidden state that prevents nested use.
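One way to avoid those side effects altogether is to scan the buffer with strspn() and strcspn(), which only read it. The sketch below is an illustration and not part of the answer above: print_truncated_words is a hypothetical name, and it assumes that spaces and newlines are the only separators. It prints every word without its last character and returns how many words it saw.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Prints each word of text without its last character, one per line.
 * text is never written to, so a string literal is fine as input.
 * Returns the number of words found. */
int print_truncated_words(const char *text)
{
    assert(text != NULL);
    const char *seps = " \n";
    int words = 0;
    const char *p = text + strspn(text, seps);   /* skip leading separators */
    while (*p != '\0') {
        size_t len = strcspn(p, seps);           /* length of this word */
        printf("%.*s\n", (int)(len - 1), p);     /* len >= 1 here */
        words++;
        p += len;
        p += strspn(p, seps);                    /* move to the next word */
    }
    return words;
}
```

Since nothing is overwritten and no hidden state is kept, the function can be called from inside another tokenizing loop without interfering with it.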
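Since you use a space as the delimiter, strtok creates a separate token for each space-separated word in your string. C-style strings end with a '\0' (null) character, but strlen does not count it, so ptr[strlen(ptr)-1] is the last character before the terminator. For the last token that character is the newline kept by fgets, so the newline is deleted and not the last letter of your text.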
check this out
http://www.cprogramming.com/tutorial/c/lesson9.html
it turns out that C-style strings are always terminated with a null character, literally a '\0' character (with the value of 0),
I'm not good at using C language. Here is my dumb question. Now I am trying to get input from users, which may have spaces. And what I need to do is to split this sentence using space as delimiter and then put each fragment into char* array. Ex:
Assuming I have char* result[10]; and the input is: Good morning John. The output should be result[0]="Good"; result[1]="morning"; result[2]="John"; I have already tried scanf("%[^\n]",input); and gets(input); yet it is still hard to deal with strings in C. I have also tried strtok, but it seems that it only replaces the space with NULL, so the result will be GoodNULLmorningNULLJohn. Obviously that's not what I want. Please help with my dumb question. Thanks.
Edit:
This is what I don't understand when using strtok. Here is a test code.
The substr still displayed Hello there. It seems strtok only puts a null at the space position. Thus, I can't use the substr in an if statement.
int main()
{
int i=0;
char* substr;
char str[] = "Hello there";
substr = strtok(str," ");
if(substr=="Hello"){
printf("YES!!!!!!!!!!");
}
printf("%s\n",substr);
for(i=0;i<11;i++){
printf("%c", substr[i]);
}
printf("\n");
system("pause");
return 0;
}
Never use gets; it is deprecated in C99 and was removed in C11.
IMO, scanf is not a good function to use when you don't know the number of elements before-hand, I suggest fgets:
#include <stdio.h>
#include <string.h>
int main(void)
{
char str[128];
char *ptr;
fgets(str, sizeof str, stdin);
/* Remove trailing newline */
ptr = strchr(str, '\n');
if (ptr != NULL) {
*ptr = '\0';
}
/* Tokens */
ptr = strtok(str, " ");
while (ptr != NULL) {
printf("%s\n", ptr);
ptr = strtok(NULL, " ");
}
return 0;
}
gets is not recommended, as there is no way to tell it the size of the buffer. fgets is OK here because it takes the buffer size and stops reading when the first newline is encountered. You could use strtok to store all the split words in an array of strings, for example:
#include <stdio.h>
#include <string.h>
int main(void) {
char s[256];
char *result[10];
fgets(s, sizeof(s), stdin);
char *p = strtok(s, " \n");
int cnt = 0;
while (cnt < (sizeof result / sizeof result[0]) && p) {
result[cnt++] = p;
p = strtok(NULL, " \n");
}
for (int i = 0; i < cnt; i++)
printf("%s\n", result[i]);
return 0;
}
As most of the other answers haven't covered another thing you were asking:
strtok will not allocate temporary memory; it works on your given string and replaces every separator with a zero terminator. This is why Good morning John becomes GoodNULLmorningNULLJohn. If it didn't do this, each token would print the whole rest of the string on its tail, like:
result[0] = Good morning John
result[1] = morning John
result[2] = John
So if you want to keep your original input and an array of char* per word, you need 2 buffers. There is no other way around that. You also need the token buffer to stay in scope as long as you use the result array of char* pointers, else that one points to invalid memory and will cause undefined behavior.
So this would be a possible solution:
#include <stdio.h>
#include <string.h>
int main(void)
{
const unsigned int resultLength = 10;
char* result[resultLength];
memset(result, 0, sizeof result); // we should also zero the result array to avoid access violations later on
// Read the input from the console
char input[256];
fgets(input, sizeof input, stdin);
// Get rid of the newline char, if there is one
input[strcspn(input, "\n")] = 0;
// Copy the input string to another buffer for your tokens to work as expected
char tokenBuffer[256];
strcpy(tokenBuffer, input);
// Setting of the pointers per word
char* token = strtok(tokenBuffer, " ");
for (unsigned int i = 0; token != NULL && i < resultLength; i++)
{
result[i] = token;
token = strtok(NULL, " ");
}
// Print the result
for (unsigned int i = 0; i < resultLength; i++)
{
printf("result[%u] = %s\n", i, result[i] != NULL ? result[i] : "NULL");
}
printf("The input is: %s\n", input);
return 0;
}
It prints:
result[0] = Good
result[1] = morning
result[2] = John
result[3] = NULL
result[4] = NULL
result[5] = NULL
result[6] = NULL
result[7] = NULL
result[8] = NULL
result[9] = NULL
The input is: Good morning John
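The test code in the question also compares substr == "Hello". That compares two addresses and not the characters, so the branch is not taken there even though strtok did return "Hello". A minimal sketch using strcmp() instead; first_word_is is a hypothetical helper name, and it modifies its input because it calls strtok:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the first space-separated word of text equals expected,
 * 0 otherwise. text is modified in place by strtok. */
int first_word_is(char *text, const char *expected)
{
    assert(text != NULL && expected != NULL);
    char *word = strtok(text, " \n");
    if (word == NULL)
        return 0;                         /* text had no words at all */
    return strcmp(word, expected) == 0;   /* compare contents, not pointers */
}
```

After the call on "Hello there", the buffer holds "Hello" followed by '\0'. The loop in the question prints 11 characters regardless of that terminator, which is why "there" still shows up in its output.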
I was wondering how you could take 1 string, split it into 2 with a delimiter, such as space, and assign the 2 parts to 2 separate strings. I've tried using strtok() but to no avail.
#include <string.h>
char *token;
char line[] = "SEVERAL WORDS";
char *search = " ";
// Token will point to "SEVERAL".
token = strtok(line, search);
// Token will point to "WORDS".
token = strtok(NULL, search);
Update
Note that on some operating systems, strtok man page mentions:
This interface is obsoleted by strsep(3).
An example with strsep is shown below:
char* token;
char* string;
char* tofree;
string = strdup("abc,def,ghi");
if (string != NULL) {
tofree = string;
while ((token = strsep(&string, ",")) != NULL)
{
printf("%s\n", token);
}
free(tofree);
}
For purposes such as this, I tend to use strtok_r() instead of strtok().
For example ...
#include <stdio.h>
#include <string.h>
int main (void) {
char str[128];
char *ptr;
strcpy (str, "123456 789asdf");
strtok_r (str, " ", &ptr);
printf ("'%s' '%s'\n", str, ptr);
return 0;
}
This will output ...
'123456' '789asdf'
If more delimiters are needed, then loop.
Hope this helps.
char *line = strdup("user name"); // don't do char *line = "user name"; see Note
char *first_part = strtok(line, " "); //first_part points to "user"
char *sec_part = strtok(NULL, " "); //sec_part points to "name"
Note: strtok modifies the string, so don't hand it a pointer to a string literal.
You can use strtok() for that
Example: it works for me
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-");
}
return 0;
}
If you have a char array allocated you can simply put a '\0' wherever you want.
Then point a new char * pointer to the location just after the newly inserted '\0'.
This will destroy your original string though depending on where you put the '\0'
If you're open to changing the original string, you can simply replace the delimiter with \0. The original pointer will point to the first string and the pointer to the character after the delimiter will point to the second string. The good thing is you can use both pointers at the same time without allocating any new string buffers.
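A minimal sketch of that idea using strchr(). The name split_once is hypothetical, and the buffer is assumed to be writable (an array or allocated memory, not a string literal):

```c
#include <assert.h>
#include <string.h>

/* Cuts s at the first occurrence of delim by overwriting it with '\0'.
 * Returns a pointer to the text after the delimiter, or NULL if delim
 * does not occur (s is then left unchanged). */
char *split_once(char *s, char delim)
{
    assert(s != NULL && delim != '\0');
    char *pos = strchr(s, delim);
    if (pos == NULL)
        return NULL;
    *pos = '\0';        /* s now ends where the delimiter was */
    return pos + 1;     /* second part starts right after it */
}
```

With char line[] = "SEVERAL WORDS", calling split_once(line, ' ') leaves line reading "SEVERAL" and returns a pointer to "WORDS". Both pointers refer to the same buffer, so no copies are made.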
You can do:
char str[] ="Stackoverflow Serverfault";
char piece1[20] = ""
,piece2[20] = "";
char * p;
p = strtok (str," "); // call the strtok with str as 1st arg for the 1st time.
if (p != NULL) // check if we got a token.
{
strcpy(piece1,p); // save the token.
p = strtok (NULL, " "); // subsequent call should have NULL as 1st arg.
if (p != NULL) // check if we got a token.
strcpy(piece2,p); // save the token.
}
printf("%s :: %s\n",piece1,piece2); // prints Stackoverflow :: Serverfault
If you expect more than one token, it's better to make the 2nd and subsequent calls to strtok in a while loop until the return value of strtok becomes NULL.
This is how you implement a strtok() like function (taken from a BSD licensed string processing library for C, called zString).
The function below differs from the standard strtok() in the way it treats consecutive delimiters: it reports them, whereas the standard strtok() skips them.
char *zstring_strtok(char *str, const char *delim) {
static char *static_str=0; /* var to store last address */
int index=0, strlength=0; /* integers for indexes */
int found = 0; /* check if delim is found */
/* delimiter cannot be NULL
* if no more char left, return NULL as well
*/
if (delim==0 || (str == 0 && static_str == 0))
return 0;
if (str == 0)
str = static_str;
/* get length of string */
while(str[strlength])
strlength++;
/* find the first occurrence of delim */
for (index=0;index<strlength;index++)
if (str[index]==delim[0]) {
found=1;
break;
}
/* if delim is not contained in str, return str */
if (!found) {
static_str = 0;
return str;
}
/* check for consecutive delimiters
*if first char is delim, return delim
*/
if (str[0]==delim[0]) {
static_str = (str + 1);
return (char *)delim;
}
/* terminate the string
* this assignment requires char[], so str has to
* be char[] rather than char*
*/
str[index] = '\0';
/* save the rest of the string */
if ((str + index + 1)!=0)
static_str = (str + index + 1);
else
static_str = 0;
return str;
}
Below is an example code that demonstrates the usage
Example Usage
char s[] = "A,B,,,C";
printf("1 %s\n",zstring_strtok(s,","));
printf("2 %s\n",zstring_strtok(NULL,","));
printf("3 %s\n",zstring_strtok(NULL,","));
printf("4 %s\n",zstring_strtok(NULL,","));
printf("5 %s\n",zstring_strtok(NULL,","));
printf("6 %s\n",zstring_strtok(NULL,","));
Example Output
1 A
2 B
3 ,
4 ,
5 C
6 (null)
You can even use a while loop (standard library's strtok() would give the same result here)
char s[]="some text here";
char *tok = zstring_strtok(s," ");
while(tok) {
printf("%s\n",tok);
tok = zstring_strtok(NULL," ");
}