How to cut a string using 2 delimiters - c

How to cut a string using 2 delimiters in C?
I'm getting a string from the user in this format:
cp <path1> <path2>
I need to extract the paths into new strings (one string per path).
I tried strstr and strtok, but I couldn't get it to work.
I don't know the length of the paths. I only know that each one starts with " \" (so the delimiters I have are a space followed by \).
This is what I tried:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char *c;
char *ch = calloc(1024, 1); /* zeroed, so the first strcmp reads a valid empty string */
while (strcmp(ch, "exit"))
{
scanf("%[^\n]%*c", ch); //what was the input (cp /dor/arthur /king/apple)
c = malloc(strlen(ch) + 1); /* sizeof(strlen(ch) + 1) would allocate only sizeof(size_t) bytes */
strcpy(c, ch);
char *pch = strtok(c, " //");
printf("this is : %s \n", pch); //printed "this is: cp"
}
}

Use strtok().
You can use the two delimiters (space and \) with strtok() in this way:
str = strtok(str, " \\");

Is this in the main function? If the paths are passed on the command line, main has argc (int) and *argv[] (array of strings) parameters, which already give you each argument as a separate string.

Related

How to replace a single character in one string with another string?

Looking to replace a character of a string with another string. I'd like it to work so that it can be used in the middle of a string and keeps the subsequent characters.
e.g. below would like to alter the string 'AND THE' to be 'ANDSPACETHE', currently outputting 'ANDSPACE' using strstr(). Ideally this could work multiple times in the same string e.g. 'AND THE CAT' --> 'ANDSPACETHESPACECAT'
#include<string.h>
#include<stdlib.h>
#include<stdio.h>
int main ()
{
char str[30] ="AND THE";
char * pch;
pch = strstr(str," ");
if (pch != NULL){
strncpy (pch,"SPACE",6);
}
printf("%s\n",str);
return 0;
}
OUTPUT = ANDSPACE
DESIREDOUTPUT = ANDSPACETHE
You could start with something like this:
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
int main()
{
char str[30] = "AND THE";
char strtoinsert[] = "SPACE";
char* pch = strstr(str, " ");
if (pch != NULL) {
// move remaining part of string further in order to make room
memmove(pch + strlen(strtoinsert), pch + 1, strlen(pch));
// copy the string to insert
memcpy(pch, strtoinsert, strlen(strtoinsert));
}
printf("%s\n", str);
return 0;
}
This will replace the first space encountered. With that base you should be able to figure out how to replace all spaces in a string.
Hint 1: repeat the whole thing until no more space is found.
Hint 2: if the string to insert contains spaces, then it's slightly more complicated as you cannot apply simply hint 1.
Hint 3: if you want to replace not just a single character but a whole word, this doesn't work either, you should be able to figure it out.

The difference between using strtok() on an input string and on a declared string

To understand the behavior of strtok() in ANSI C, I wrote two programs.
#include <stdio.h>
#include <string.h>
int main()
{
char str[101] = "This is";
char *pch;
printf("Splitting string %s into tokens : \n",str);
pch = strtok(str," ");
while(pch != NULL)
{
printf("%s\n",pch);
pch = strtok(NULL, " ");
}
return 0;
}
The result of this program is
Splitting string "This is " into tokens:
This
is
Next, I changed it a little bit.
#include <stdio.h>
#include <string.h>
int main()
{
char str[101] = "";
char *pch;
scanf("%s",str); //After launch program, I typed "This is "
str[strcspn(str,"\n")] = '\0';
printf("Splitting string %s into tokens : \n",str);
pch = strtok(str," ");
while(pch != NULL)
{
printf("%s\n",pch);
pch = strtok(NULL, " ");
}
return 0;
}
It prints
Splitting string "This" into tokens:
This
I can't understand why the second word is gone when I use stdin.
The problem isn't with strtok, but with your use of scanf and the "%s" format specifier. That format specifier reads whitespace-delimited strings, i.e. you cannot use "%s" to read anything with a space in it.
The natural solution is to use fgets instead, which you have already prepared for by "removing the newline" (which scanf would not usually read anyway).
It should have been pretty obvious that the strtok can't be involved, since you print the input string before even calling strtok.

Why does strtok() not tokenize the string a certain way?

I'm trying to tokenize a string using brackets [] as delimiters. I can tokenize a string exactly how I want it with one input, but it has an error other times. For example, I have a string with characters before the delimiter and it works fine, but if nothing is before the delimiter then I run into errors.
This one gives me an error. The token2 ends up being NULL and token is "name]" with the bracket still on there.
char name[] = "[name]";
char *token = strtok(name, "[");
char *token2 = strtok(NULL, "]");
Output:
token = name]
token2 = NULL
However, if I have the following, then it works just fine.
char line[] = "Hello [name]";
char *tok = strtok(line, "[");
char *tok2 = strtok(NULL, "]");
Output:
tok = Hello
tok2 = name
I don't understand what I'm doing wrong when the input is simply something like "[name]". I want just what's inside the brackets only.
Edit:
Thanks for the input, everyone. I found a solution to what I'm trying to do. Per @Ryan and @StoryTeller's advice, I first checked whether the input began with [ and, if so, delimited with []. Here's what I tried, and it worked for the input:
char name[] = "[name]", *token = NULL, *token2 = NULL;
if (name[0] == '[')
{
token = strtok(name, "[]");
}
else
{
token = strtok(name, "[");
token2 = strtok(NULL, "]");
}
In short: the 2nd time you called strtok() in your first example is the same as calling it on an empty string and this is why you get NULL.
Each call to strtok gives you the token based on your chosen delimiter. In your 1st try:
char name[] = "[name]";
char *token = strtok(name, "[");
char *token2 = strtok(NULL, "]");
The delimiter you chose is "[", so the 1st call to strtok returns "name]": strtok skips the leading delimiter, and since no other "[" follows, the token runs to the end of the string. The second call returns NULL, because "name]" reached the end of your original string and invoking strtok() now is like invoking it on an empty string.
strtok() keeps a static pointer into your original string, and each invocation consumes another part of that string. After your 1st call, the function had consumed the entire string.
In your 2nd try:
char line[] = "Hello [name]";
char *tok = strtok(line, "[");
char *tok2 = strtok(NULL, "]");
You call strtok on a string with the delimiter in the middle of it, so you get a token AND part of the string is still left after the position that strtok saved. That enables the 2nd call of strtok() to return a valid token instead of NULL.
If you are simply trying to extract the contents between a single pair of brackets [...], then strchr provides a more straightforward way to accomplish the task. When you call strtok with a single delimiter ('[' and then ']'), you are essentially doing what two successive calls to strchr would do with the same characters, '[' and then ']'.
For example, the following will parse the string given on the command line for the characters between brackets ("[some name]" by default if no argument is given), up to a maximum of MAXNM characters (including the nul-terminating character):
#include <stdio.h>
#include <string.h>
#define MAXNM 128
int main (int argc, char **argv) {
char *s = argc > 1 ? argv[1] : "[some name]", /* input */
*p = s, /* pointer */
*ep, /* end pointer */
buf[MAXNM] = ""; /* buffer for result */
/* if starting and ending bracket are present in input */
if ((p = strchr (s, '[')) && (ep = strchr (p, ']'))) {
if (ep - p > MAXNM) { /* length + 1 > MAXNM ? */
fprintf (stderr, "error: result too long.\n");
return 1;
}
/* copy between brackets to buf (+1 for char after `[`) */
strncpy (buf, p + 1, ep - p - 1); /* ep - p - 1 for length */
buf[ep - p - 1] = 0; /* nul terminate, also done via initialization */
printf ("name : '%s'\n", buf); /* output the name */
}
else
fprintf (stderr, "error: no enclosing brackets found in input.\n");
return 0;
}
note: the benefit of using strchr and strncpy for parsing between fixed delimiters is that you do not modify the original string (as strtok does). So this method is safe for use with string literals or other constant strings.
Example Use/Output
$ ./bin/brackets
name : 'some name'
$ ./bin/brackets "this [is the name] and more"
name : 'is the name'

How to scan multiple words using sscanf in C?

I'm trying to scan a line that contains multiple words in C. Is there a way to scan it word by word and store each word as a different variable?
For example, I have the following types of lines:
A is the 1 letter;
B is the 2 letter;
C is the 3 letter;
If I'm parsing through the first line: "A is the 1 letter" and I have the following code, what do I put in each case so I can get the individual tokens and store them as variables. To clarify, by the end of this code, I want "is," "the," "1," "letter" in different variables.
I have the following code:
while (feof(theFile) != 1) {
string = "A is the 1 letter"
first_word = sscanf(string);
switch(first_word):
case "A":
what to put here?
case "B":
what to put here?
...
You shouldn't use feof() like that. You should use fgets() or equivalent. You probably need to use the little-known (but present in standard C89) conversion specifier %n.
#include <stdio.h>
int main(void)
{
char buffer[1024];
while (fgets(buffer, sizeof(buffer), stdin) != 0)
{
char *str = buffer;
char word[256];
int posn;
while (sscanf(str, "%255s%n", word, &posn) == 1)
{
printf("Word: <<%s>>\n", word);
str += posn;
}
}
return(0);
}
This reads a line, then uses sscanf() iteratively to fetch words from the line. The %n format specifier doesn't count towards the successful conversions, hence the comparison with 1. Note the use of %255s to prevent overflows in word. Note too that sscanf() could write a null after the 255 count specified in the conversion specification, hence the difference of one between the declaration of char word[256]; and the conversion specifier %255s.
Clearly, it is up to you to decide what to do with each word as it is extracted; the code here simply prints it.
One advantage of this technique over any solution based on strtok() is that sscanf() does not modify the input string so if you need to report an error, you have the original input line to use in the error report.
After editing the question, it seems that punctuation such as the semicolon is not wanted in a word; the code above would include punctuation as part of the word. In that case, you have to think a bit harder about what to do. The starting point might well be using an alphanumeric scan-set as the conversion specification in place of %255s:
"%255[a-zA-Z_0-9]%n"
You probably then have to look at what's in the character at the start of the next component and skip it if it is not alphanumeric:
if (!isalnum((unsigned char)*str))
{
if (sscanf(str, "%*[^a-zA-Z_0-9]%n", &posn) == 0)
str += posn;
}
Leading to:
#include <stdio.h>
#include <ctype.h>
int main(void)
{
char buffer[1024];
while (fgets(buffer, sizeof(buffer), stdin) != 0)
{
char *str = buffer;
char word[256];
int posn;
while (sscanf(str, "%255[a-zA-Z_0-9]%n", word, &posn) == 1)
{
printf("Word: <<%s>>\n", word);
str += posn;
if (!isalnum((unsigned char)*str))
{
if (sscanf(str, "%*[^a-zA-Z_0-9]%n", &posn) == 0)
str += posn;
}
}
}
return(0);
}
You'll need to consider the I18N and L10N aspects of the alphanumeric ranges chosen; what's available may depend on your implementation (POSIX doesn't specify support in scanf() scan-sets for the notations such as [[:alnum:]], unfortunately).
You can use strtok() to tokenize or split strings. Please refer the following link for an example: http://www.cplusplus.com/reference/cstring/strtok/
You can take array of character pointers and assign tokens to them.
Example:
char *tokens[100];
int i = 0;
char *token = strtok(string, " ");
while (token != NULL && i < 100) { /* stop before overflowing tokens[] */
tokens[i] = token;
token = strtok(NULL, " ");
i++;
}
printf("Total Tokens: %d", i);
Note that the %s specifier skips leading whitespace. So you can write:
std::string s = "A is the 1 letter";
typedef char Word[128];
Word words[6];
int wordsRead = sscanf(s.c_str(), "%127s%127s%127s%127s%127s%127s", words[0], words[1], words[2], words[3], words[4], words[5] ); // width 127 leaves room for the terminating '\0' in each 128-byte Word
std::cout << wordsRead << " words read" << std::endl;
for(int i = 0;
i != wordsRead;
++i)
std::cout << "'" << words[i] << "'" << std::endl;
Note how this approach (unlike strtok) effectively requires an assumption about the maximum number of words to read, as well as their lengths.
I would recommend using strtok().
Here is the example from http://www.cplusplus.com/reference/cstring/strtok/
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-");
}
return 0;
}
Output will be:
Splitting string "- This, a sample string." into tokens:
This
a
sample
string

How does strtok() split the string into tokens in C?

Please explain to me the working of strtok() function. The manual says it breaks the string into tokens. I am unable to understand from the manual what it actually does.
I added watches on str and *pch to check how it works. When the first iteration of the while loop ran, the contents of str were only "this". How was the output shown below printed on the screen?
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-");
}
return 0;
}
Output:
Splitting string "- This, a sample string." into tokens:
This
a
sample
string
The strtok runtime function works like this.
The first time you call strtok, you provide the string that you want to tokenize:
char s[] = "this is a string";
In the above string, space seems to be a good delimiter between words, so let's use that:
char* p = strtok(s, " ");
What happens now is that 's' is searched until the space character is found, the first token is returned ('this') and p points to that token (string).
To get the next token and continue with the same string, NULL is passed as the first argument, since strtok maintains a static pointer to the string you passed previously:
p = strtok(NULL," ");
p now points to 'is',
and so on until no more spaces can be found; then the last string is returned as the last token, 'string'.
More conveniently, you could write it like this instead to print out all tokens:
for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
puts(p);
}
EDIT:
If you want to keep the values returned by strtok after the original buffer is reused or goes out of scope, copy each token to another buffer, e.g. with strdup(p); the returned pointers point into the original string, which strtok modifies in place in order to return each token.
strtok() divides the string into tokens. A token is the run of characters between delimiters, and any delimiters in front of a token are skipped. In your case, the leading "- " is skipped, the first token starts at "T" and ends at the next delimiter ",", so you get "This" as output. The rest of the string is split the same way, with the last token ending at ".".
strtok maintains a static, internal reference pointing to the next available token in the string; if you pass it a NULL pointer, it will work from that internal reference.
This is the reason strtok isn't re-entrant; as soon as you pass it a new pointer, that old internal reference gets clobbered.
strtok doesn't change the parameter itself (str). It stores that pointer (in a local static variable). It can then change what that parameter points to in subsequent calls without having the parameter passed back. (And it can advance that pointer it has kept however it needs to perform its operations.)
From the POSIX strtok page:
This function uses static storage to keep track of the current string position between calls.
There is a thread-safe variant (strtok_r) that doesn't do this type of magic.
strtok will tokenize a string i.e. convert it into a series of substrings.
It does that by searching for delimiters that separate these tokens (or substrings). And you specify the delimiters. In your case, you want ' ' or ',' or '.' or '-' to be the delimiter.
The programming model for extracting these tokens is that you hand strtok your main string and the set of delimiters. Then you call it repeatedly, and each time strtok returns the next token it finds, until it reaches the end of the main string, when it returns a null pointer. Another rule is that you pass the string in only the first time, and NULL for the subsequent calls. This is how you tell strtok whether you are starting a new tokenizing session with a new string or retrieving tokens from the previous session. Note that strtok remembers its state for the tokenizing session, and for this reason it is not reentrant or thread-safe (use strtok_r instead where that matters). Another thing to know is that it actually modifies the original string: it writes '\0' over the delimiters that it finds.
One way to invoke strtok, succinctly, is as follows:
char str[] = "this, is the string - I want to parse";
char delim[] = " ,-";
char* token;
for (token = strtok(str, delim); token; token = strtok(NULL, delim))
{
printf("token=%s\n", token);
}
Result:
token=this
token=is
token=the
token=string
token=I
token=want
token=to
token=parse
The first time you call it, you provide the string to tokenize to strtok. And then, to get the following tokens, you just give NULL to that function, as long as it returns a non NULL pointer.
The strtok function records the string you first provided when you call it. (Which is really dangerous for multi-thread applications)
strtok modifies its input string. It places null characters ('\0') in it so that it will return bits of the original string as tokens. In fact strtok does not allocate memory. You may understand it better if you draw the string as a sequence of boxes.
To understand how strtok() works, one first needs to know what a static variable is.
The key to the operation of strtok() is preserving the location of the last separator between successive calls (that's why strtok() continues to parse the original string passed to it when it is invoked with a null pointer in successive calls).
Have a look at my own strtok() implementation, called zStrtok(), which has slightly different functionality from the one provided by strtok():
char *zStrtok(char *str, const char *delim) {
static char *static_str=0; /* var to store last address */
int index=0, strlength=0; /* integers for indexes */
int found = 0; /* check if delim is found */
/* delimiter cannot be NULL
* if no more char left, return NULL as well
*/
if (delim==0 || (str == 0 && static_str == 0))
return 0;
if (str == 0)
str = static_str;
/* get length of string */
while(str[strlength])
strlength++;
/* find the first occurrence of delim */
for (index=0;index<strlength;index++)
if (str[index]==delim[0]) {
found=1;
break;
}
/* if delim is not contained in str, return str */
if (!found) {
static_str = 0;
return str;
}
/* check for consecutive delimiters
*if first char is delim, return delim
*/
if (str[0]==delim[0]) {
static_str = (str + 1);
return (char *)delim;
}
/* terminate the string
* this assignment requires char[], so str has to
* be char[] rather than char *
*/
str[index] = '\0';
/* save the rest of the string */
if ((str + index + 1)!=0)
static_str = (str + index + 1);
else
static_str = 0;
return str;
}
And here is an example usage:
char str[] = "A,B,,,C";
printf("1 %s\n",zStrtok(str,","));
printf("2 %s\n",zStrtok(NULL,","));
printf("3 %s\n",zStrtok(NULL,","));
printf("4 %s\n",zStrtok(NULL,","));
printf("5 %s\n",zStrtok(NULL,","));
printf("6 %s\n",zStrtok(NULL,","));
Example Output
1 A
2 B
3 ,
4 ,
5 C
6 (null)
The code is from a string processing library I maintain on Github, called zString. Have a look at the code, or even contribute :)
https://github.com/fnoyanisi/zString
This is how I implemented strtok. It's not that great, but after working on it for 2 hours I finally got it to work. It does support multiple delimiters.
#include "stdafx.h"
#include <iostream>
using namespace std;
char* mystrtok(char str[],char filter[])
{
if(filter == NULL) {
return str;
}
static char *ptr = str;
static int flag = 0;
if(flag == 1) {
return NULL;
}
char* ptrReturn = ptr;
for(int j = 0; ; j++) { // no loop condition: the checks below return at the end of the string
for(int i=0 ; filter[i] != '\0' ; i++) {
if(ptr[j] == '\0') {
flag = 1;
return ptrReturn;
}
if( ptr[j] == filter[i]) {
ptr[j] = '\0';
ptr+=j+1;
return ptrReturn;
}
}
}
return NULL;
}
int _tmain(int argc, _TCHAR* argv[])
{
char str[200] = "This,is my,string.test";
char *ppt = mystrtok(str,", .");
while(ppt != NULL ) {
cout<< ppt << endl;
ppt = mystrtok(NULL,", .");
}
return 0;
}
For those who are still having hard time understanding this strtok() function, take a look at this pythontutor example, it is a great tool to visualize your C (or C++, Python ...) code.
In case the link got broken, paste in:
#include <stdio.h>
#include <string.h>
int main()
{
char s[] = "Hello, my name is? Matthew! Hey.";
char* p;
for (char *p = strtok(s," ,?!."); p != NULL; p = strtok(NULL, " ,?!.")) {
puts(p);
}
return 0;
}
Credits go to Anders K.
Here is my implementation, which uses a lookup table for the delimiters, so each character is classified in constant time and the scan is O(n) rather than O(n*m) for m delimiters:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define DICT_LEN 256
int *create_delim_dict(char *delim)
{
int *d = calloc(DICT_LEN, sizeof(int)); /* zeroed table, NULL on failure */
if(!d) {
return NULL;
}
size_t i;
for(i=0; i< strlen(delim); i++) {
d[(unsigned char)delim[i]] = 1; /* cast keeps the index non-negative */
}
return d;
}
char *my_strtok(char *str, char *delim)
{
static char *last, *to_free;
int *deli_dict = create_delim_dict(delim);
if(!deli_dict) {
/*this check if we allocate and fail the second time with entering this function */
if(to_free) {
free(to_free);
}
return NULL;
}
if(str) {
last = (char*)malloc(strlen(str)+1);
if(!last) {
free(deli_dict);
return NULL;
}
to_free = last;
strcpy(last, str);
}
while(deli_dict[(unsigned char)*last] && *last != '\0') {
last++;
}
str = last;
if(*last == '\0') {
free(deli_dict);
free(to_free);
deli_dict = NULL;
to_free = NULL;
return NULL;
}
while (*last != '\0' && !deli_dict[(unsigned char)*last]) {
last++;
}
if (*last != '\0') { /* do not step past the terminator */
*last = '\0';
last++;
}
free(deli_dict);
return str;
}
int main()
{
char * str = "- This, a sample string.";
char *del = " ,.-";
char *s = my_strtok(str, del);
while(s) {
printf("%s\n", s);
s = my_strtok(NULL, del);
}
return 0;
}
strtok() stores a pointer to where it last left off in a static variable, so on the second call, when we pass NULL, strtok() picks up from that saved pointer.
If you pass the string again instead of NULL, it starts again from the beginning.
Moreover, strtok() is destructive, i.e. it changes the original string, so make sure you keep a copy of the original.
One more problem with strtok() is that, because it stores the address in a static variable, calling it from more than one thread at a time (or interleaving the tokenizing of two strings) corrupts that saved state. For this, use strtok_r().
strtok replaces each delimiter that ends a token with a null character ('\0'), and a null character is also what marks the end of a string.
http://www.cplusplus.com/reference/clibrary/cstring/strtok/
You can scan the char array looking for the delimiter: if you find it, print a newline; otherwise print the character.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char *s;
s = malloc(1024 * sizeof(char));
scanf("%1023[^\n]", s); /* width keeps the read inside the 1024-byte buffer */
s = realloc(s, strlen(s) + 1);
int len = strlen(s);
char delim =' ';
for(int i = 0; i < len; i++) {
if(s[i] == delim) {
printf("\n");
}
else {
printf("%c", s[i]);
}
}
free(s);
return 0;
}
So, here is a code snippet to help better understand this topic.
Printing Tokens
Task: Given a sentence, s, print each word of the sentence on a new line.
char *s;
s = malloc(1024 * sizeof(char));
scanf("%1023[^\n]", s); /* width keeps the read inside the 1024-byte buffer */
s = realloc(s, strlen(s) + 1);
//logic to print the tokens of the sentence.
for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
printf("%s\n",p);
}
Input: How is that
Result:
How
is
that
Explanation: Here the strtok() function is called in a for loop to print the tokens on separate lines.
The function takes the string and the delimiters (the 'break-points') as parameters, breaks the string at those delimiters and forms tokens. Each token is stored in 'p' in turn and then printed.
strtok replaces the delimiter with the '\0' (null) character in the given string.
CODE
#include<iostream>
#include<cstring>
int main()
{
char s[]="30/4/2021";
std::cout<<(void*)s<<"\n"; // e.g. 0x70fdf0 on the author's machine
char *p1=s; // same address as printed above, without hard-coding it
std::cout<<p1<<"\n";
char *p2=strtok(s,"/");
std::cout<<(void*)p2<<"\n";
std::cout<<p2<<"\n";
char *p3=s; // the address of s again
std::cout<<p3<<"\n";
for(int i=0;i<=9;i++)
{
std::cout<<*p1;
p1++;
}
}
OUTPUT
0x70fdf0 // 1. address of string s
30/4/2021 // 2. print string s through ptr p1
0x70fdf0 // 3. this address is return by strtok to ptr p2
30 // 4. print string which pointed by p2
30 // 5. again assign address of string s to ptr p3 try to print string
30 4/2021 // 6. print characters of string s one by one using loop
Before tokenizing the string
I assigned the address of string s to a pointer (p1) and printed the string through it; the whole string was printed.
After tokenizing
strtok returned the address of string s to p2, but printing through that pointer shows only "30", not the whole string. So strtok does not just return an address; it also writes a '\0' character where the delimiter was.
Cross-check
1.
I assigned the address of string s to another pointer (p3) and printed the string again. It prints "30", because tokenizing overwrote the delimiter with '\0'.
2.
Printing string s character by character in a loop shows that the first delimiter has been replaced by '\0', so a blank appears where the '/' used to be.

Resources