Issues splitting a string into 2 halves in C - c

I have a string that contains spaces, such as "print 2" or "print 3 test". I'm trying to remove the first argument - in these examples, the print.
I tried strtok():
char *test;
test = strtok(COMMAND, " ");
printf("%s\n", test);
However printing test will segfault. I tried making a function, and it works fine from main() but when called from the function I need it in, it also segfaults.
char* split(char S[], int N) {
printf("Running split() on %s\n", S);
int Spaces = 1;
int i = 0;
for (i; i<strlen(S) && Spaces <=N; i++) {
if (S[i] == ' ') {
Spaces++;
}
}
printf("split: %s\n", &S[i]);
//return "0";
return &S[i];
}
I'm guessing it's some kind of pointer problem. Command is being passed into the print function like so:
Print(File, Lines, COMMAND);

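I don't know what COMMAND is in your test sample, but you should test whether strtok returns NULL (which it does when it can't find a token). Also make sure COMMAND is a writable array rather than a string literal, because strtok modifies its input.
Passing a null pointer to printf for %s is undefined behavior and will typically give you a segfault.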
Normally you call strtok from a loop:
http://www.cplusplus.com/reference/clibrary/cstring/strtok/

Always test the return value of strtok()!
If no byte that is outside the delimiter set (2nd parameter) is found, i.e. no tokens exist in the string pointed to by the 1st parameter, a null pointer is returned.

Related

string input parameter for c program

I wrote a function to replace the blank spaces with tab character. But when I tried to implement it using function. I am quite not understanding how to call this function. I need functions
Which takes string from user as input,
Second function which replaces the blank space with tab character,
Function to print the modified string.
I achieved second one:
void SecondFunction()
{
char string[] = "I am new to c";
char *p = string;
for (; *p; ++p)
{
if (*p == ' ')
*p = '\t';
}
printf(string);
}
And when I tried to call this function like:
int main()
{
SecondFunction("Hi s");
}
By changing my function to:
void SecondFunction(char* str)
{
char string[] = str;
char *p = string;
....
...etc
}
I get the following error:
error: invalid initializer
char string[] = str;
^
Please, can anybody help me to write the 3 functions of my requirement?
Reading user input
To read input from the user you can use scanf. You need to pass it the memory address of the variable where you want to store the input:
char userinput[256]; // make it large enough to hold what the user inputs
scanf("%s", userinput); // array decays to pointer so no '&' here
The %s means we're reading string input; note that scanf("%s") stops at the first whitespace character, so it reads a single word. We could also read an int using %d, like this:
int i;
scanf("%d", &i); // note the address-of operator '&' to get the address of i
Printing variables
Your SecondFunction is almost correct. To printf a C-string you need to use a syntax similar to when you scanf to a variable:
printf("%s", string);
Similarly, you could print the int i like this:
printf("The number is: %d", i);
Copying C-strings
Writing char string[] = str is not possible: a char array can be initialized from a string literal but not from a pointer, and arrays cannot be assigned to.
When you want to copy a C-string, you need to use strcpy:
char string[256]; // again, must be large enough to hold the new contents
strcpy(string, str); // copies from str to string
So in conclusion, your function could look something like this:
void SecondFunction(char* str)
{
char string[256];
strcpy(string, str);
char *p = string;
for (; *p; ++p)
{
if (*p == ' ')
*p = '\t';
}
printf("%s", string);
}
Bonus: Why you can't write to the str parameter directly
When you write SecondFunction("Hi s"), the string literal "Hi s" is typically stored in a read-only memory segment.
If you then try to modify it through the parameter inside SecondFunction, you get undefined behavior, possibly a segmentation fault.

How to extract a substring from a string in C?

I tried using strncmp but it only works if I give it a specific number of bytes I want to extract.
char line[256] = This "is" an example. //I want to extract "is"
char line[256] = This is "also" an example. // I want to extract "also"
char line[256] = This is the final "example". // I want to extract "example"
char substring[256]
How would I extract all the elements in between the ""? and put it in the variable substring?
Note: I edited this answer after I realized that as written the code would cause a problem as strtok doesn't like to operate on const char* variables. This was more an artifact of how I wrote the example than a problem with the underlying principle - but apparently it deserved a double downvote. So I fixed it.
The following works (tested on Mac OS 10.7 using gcc):
#include <stdio.h>
#include <string.h>
int main(void) {
const char* lineConst = "This \"is\" an example"; // the "input string"
char line[256]; // where we will put a copy of the input
char *subString; // the "result"
strcpy(line, lineConst);
subString = strtok(line,"\""); // find the first double quote
subString=strtok(NULL,"\""); // find the second double quote
printf("the thing in between quotes is '%s'\n", subString);
}
Here is how it works: strtok looks for "delimiters" (second argument) - in this case, the first ". Internally, it knows "how far it got", and if you call it again with NULL as the first argument (instead of a char*), it will start again from there. Thus, on the second call it returns "exactly the string between the first and second double quote". Which is what you wanted.
Warning: strtok typically replaces delimiters with '\0' as it "eats" the input. You must therefore count on your input string getting modified by this approach. If that is not acceptable you have to make a local copy first. In essence I do that in the above when I copy the string constant to a variable. It would be cleaner to do this with a call to line=malloc(strlen(lineConst)+1); and a free(line); afterwards - but if you intend to wrap this inside a function you have to consider that the return value has to remain valid after the function returns... Because strtok returns a pointer to the right place inside the string, it doesn't make a copy of the token. Passing a pointer to the space where you want the result to end up, and creating that space inside the function (with the correct size), then copying the result into it, would be the right thing to do. All this is quite subtle. Let me know if this is not clear!
If you want to do it with no library support:
void extract_between_quotes(char* s, char* dest)
{
int in_quotes = 0;
*dest = 0;
while(*s != 0)
{
if(in_quotes)
{
if(*s == '"') return;
dest[0]=*s;
dest[1]=0;
dest++;
}
else if(*s == '"') in_quotes=1;
s++;
}
}
then call it
extract_between_quotes(line, substring);
#include <string.h>
...
substring[0] = '\0';
const char *start = strchr(line, '"') + 1;
strncat(substring, start, strcspn(start, "\""));
Bounds and error checking omitted. Avoid strtok because it has side effects.
Here is a long way to do this: Assuming string to be extracted will be in quotation marks
(Fixed for error check suggested by kieth in comments below)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(){
char input[100];
char extract[100];
int i=0,j=0,k=0,endFlag=0;
printf("Input string: ");
fgets(input,sizeof(input),stdin);
input[strcspn(input, "\n")] = '\0'; /* strip the newline only if fgets stored one */
for(i=0;i<strlen(input);i++){
if(input[i] == '"'){
j =i+1;
while(input[j]!='"'){
if(input[j] == '\0'){
endFlag++;
break;
}
extract[k] = input[j];
k++;
j++;
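}
break; /* stop after the first quoted string, otherwise the closing quote starts a second extraction */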
}
}
extract[k] = '\0';
if(endFlag==1){
printf("1.Your code only had one quotation mark.\n");
printf("2.So the code extracted everything after that quotation mark\n");
printf("3.To make sure buffer overflow doesn't happen in this case:\n");
printf("4.Modify the extract buffer size to be the same as input buffer size\n");
printf("\nextracted string: %s\n",extract);
}else{
printf("Extract = %s\n",extract);
}
return 0;
}
Output(1):
$ ./test
Input string: extract "this" from this string
Extract = this
Output(2):
$ ./test
Input string: Another example to extract "this gibberish" from this string
Extract = this gibberish
Output(3):(Error check suggested by Kieth)
$ ./test
Input string: are you "happy now Kieth ?
1.Your code only had one quotation mark.
2.So the code extracted everything after that quotation mark
3.To make sure buffer overflow doesn't happen in this case:
4.Modify the extract buffer size to be the same as input buffer size
extracted string: happy now Kieth ?
--------------------------------------------------------------------------------------------------------------------------------
Although not asked for it -- The following code extracts multiple words from input string as long as they are in quotation marks:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(){
char input[100];
char extract[50];
int i=0,j=0,k=0,endFlag=0;
printf("Input string: ");
fgets(input,sizeof(input),stdin);
input[strcspn(input, "\n")] = '\0'; /* strip the newline only if fgets stored one */
for(i=0;i<strlen(input);i++){
if(input[i] == '"'){
if(endFlag==0){
j =i+1;
while(input[j]!='"' && input[j]!='\0' && k<(int)sizeof(extract)-1){ /* stop at an unmatched quote or a full buffer */
extract[k] = input[j];
k++;
j++;
}
endFlag = 1;
}else{
endFlag =0;
}
//break;
}
}
extract[k] = '\0';
printf("Extract = %s\n",extract);
return 0;
}
Output:
$ ./test
Input string: extract "multiple" words "from" this "string"
Extract = multiplefromstring
Have you tried looking at the strchr function? You should be able to call that function twice to get pointers to the first and second instances of the " character and use a combination of memcpy and pointer arithmetic to get what you want.

segmentation fault while running the programme

I have written code for parsing a string into words. Here is code. Can any one help here to fix the segmentation fault error during run time?
Calling fun :
int main()
{
int count = 0, i; // count to hold numbr of words in the string line.
char buf[MAX_LENTHS]; // buffer to hold the string
char *options[MAX_ORGS]; // options to hold the words that we got after parsing.
printf("enter string");
scanf("%s",buf);
count = parser(buf,options); // calling parser
for(i = 0; i < count; ++i)
printf("option %d is %s", i, options[i]);
return 0;
}
Called function:
int parser(char str[], char *orgs[])
{
char temp[1000];//(char *)malloc(strlen(str)*sizeof(char));
int list = 0;
strcpy(temp, str);
*orgs[list]=strtok(str, " \t ");
while(((*orgs[list++]=strtok(str," \t"))!=NULL)&&MAX_ORGS>list)
list++;
printf("count =%d",list);
return list;
}
Note : I'm trying to learn C these days, can any one help to get a good tutorial (pdf) or site to learn these strings with pointers, and sending string to functions as arguments?
You are using strtok wrong.
(It is generally best to not use strtok at all, for all its problems and pitfalls.)
If you must use it, the proper way to use strtok is to call it ONCE with the string you want to "tokenize",
then call it again and again with NULL as an indication to continue parsing the original string.
I also think you're using the orgs array wrong.
Change this assignment
*orgs[list++]=strtok(str, " \t ");
to this:
orgs[list++]=strtok(str, " \t ");
Because orgs is an array of character-pointers.
orgs[x] is a character-pointer, which matches the return-type of strtok
Instead, you are referring to *orgs[x], which is just a character.
So you are trying to do:
[character] = [character-pointer];
which will result in "very-bad-things™".
Finally, note that you are incrementing list twice each time through your loop.
So basically you're only filling in the even-elements, leaving the odd-elements of orgs uninitialized.
Only increment list once per loop.
Basically, you want this:
orgs[list++] = strtok(str, " \t ");
while(( (orgs[list++] = strtok(NULL," \t")) !=NULL) && MAX_ORGS > list)
/* do nothing */;
PS You allocate space for temp, and strcpy into it.
But then it looks like you never use it. Explain what temp is for, or remove it.
char buf[MAX_LENTHS];
The code you posted does not show a definition for the array size; MAX_LENTHS must be defined, for example:
#define MAX_LENTHS 25
And as Paul R says in his comment, the entries of your array of character pointers
char *options[MAX_ORGS];
must be set to something valid before they are dereferenced.
int parser(char str[], char *orgs[]){
int list=0;
orgs[list]=strtok(str, " \t\n");
while(orgs[list]!=NULL && ++list < MAX_ORGS)
orgs[list]=strtok(NULL," \t\n");
printf("count = %d\n",list);
return list;
}
int main(){
int count=0,i;
char buf[MAX_LENTHS];
char *options[MAX_ORGS];
printf("enter string: ");
fgets(buf, sizeof(buf), stdin);//input include space character
count=parser(buf,options);
for(i=0;i<count;++i)
printf("option %d is %s\n",i,options[i]);
return 0;
}

C program to find individual words in a string using strtok

I am writing a program where I use strtok in order to find each word in a string that I type into the command line, in my example, my code is called command.c so when I type:
./command.out "Hi, there"
I should get as my result:
Arg = "Hi, there"
Next word "Hi,"
Next word "there"
So far my code completes the Arg part of the output, but it does not execute the latter part that separates the string. My code currently is as follows:
#include <stdio.h>
#include <string.h>
void main (int argc, char *argv[]) {
int i;
for(i =1;i< argc; i++)
printf("Arg = %s\n", argv[i]);
char delims[] = " ";
char *word = NULL;
word = strtok(argv[i], delims);
while(word != NULL) {
printf("Next word \"%s\"\n", word);
word = strtok(NULL, delims);
}
}
Where am I going wrong and how can I fix this code? Thanks for all the help
You are missing the curly braces around the for block:
for(i =1;i< argc; i++)
{
printf /* ... and so forth */
}
Your indentation hides the problem: without braces, the for statement controls only the next statement, the printf.
By the time the loop finishes, i equals argc (2 when you pass one argument), so argv[i] is argv[argc], which is a null pointer, and strtok has nothing to tokenize. Either put the strtok code inside the loop body with braces, or call strtok on argv[1].

How does strtok() split the string into tokens in C?

Please explain to me the working of strtok() function. The manual says it breaks the string into tokens. I am unable to understand from the manual what it actually does.
I added watches on str and *pch to check its working when the first while loop occurred, the contents of str were only "this". How did the output shown below printed on the screen?
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-");
}
return 0;
}
Output:
Splitting string "- This, a sample string." into tokens:
This
a
sample
string
The strtok runtime function works like this:
The first time you call strtok, you provide the string that you want to tokenize:
char s[] = "this is a string";
In the above string, a space seems to be a good delimiter between words, so let's use that:
char* p = strtok(s, " ");
What happens now is that s is searched until a space character is found; that space is overwritten with '\0', and the first token ('this') is returned, so p points to that token (string).
To get the next token and continue with the same string, NULL is passed as the first argument, since strtok maintains a static pointer to your previously passed string:
p = strtok(NULL," ");
p now points to 'is',
and so on until no more spaces can be found; then the rest of the string is returned as the last token, 'string'.
More conveniently, you could write it like this to print out all tokens:
for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
puts(p);
}
EDIT:
If you want to store the returned values from strtok you need to copy the token to another buffer e.g. strdup(p); since the original string (pointed to by the static pointer inside strtok) is modified between iterations in order to return the token.
strtok() divides the string into tokens, where a token is a run of characters that contains none of the delimiters. Leading delimiters are skipped, so in your case the "-" and the space at the start are passed over and the first token starts at "T" and ends at the "," that follows it. Here you get "This" as output. Similarly, the rest of the string is split at each run of delimiters, and the last token, "string", ends at the ".".
strtok maintains a static, internal reference pointing to the next available token in the string; if you pass it a NULL pointer, it will work from that internal reference.
This is the reason strtok isn't re-entrant; as soon as you pass it a new pointer, that old internal reference gets clobbered.
strtok doesn't change the parameter itself (str). It stores that pointer (in a local static variable). It can then change what that parameter points to in subsequent calls without having the parameter passed back. (And it can advance that pointer it has kept however it needs to perform its operations.)
From the POSIX strtok page:
This function uses static storage to keep track of the current string position between calls.
There is a thread-safe variant (strtok_r) that doesn't do this type of magic.
strtok will tokenize a string i.e. convert it into a series of substrings.
It does that by searching for delimiters that separate these tokens (or substrings). And you specify the delimiters. In your case, you want ' ' or ',' or '.' or '-' to be the delimiter.
The programming model to extract these tokens is that you hand strtok your main string and the set of delimiters. Then you call it repeatedly, and each time strtok returns the next token it finds, until it reaches the end of the main string, when it returns a null pointer. Another rule is that you pass the string in only the first time, and NULL for the subsequent times. This is how you tell strtok whether you are starting a new tokenizing session with a new string or retrieving tokens from the previous session. Note that strtok remembers its state for the tokenizing session, and for this reason it is not reentrant or thread safe (use strtok_r instead where that matters). Another thing to know is that it actually modifies the original string: it writes '\0' over the delimiters that it finds.
One way to invoke strtok, succinctly, is as follows:
char str[] = "this, is the string - I want to parse";
char delim[] = " ,-";
char* token;
for (token = strtok(str, delim); token; token = strtok(NULL, delim))
{
printf("token=%s\n", token);
}
Result:
token=this
token=is
token=the
token=string
token=I
token=want
token=to
token=parse
The first time you call it, you provide the string to tokenize to strtok. And then, to get the following tokens, you just give NULL to that function, as long as it returns a non NULL pointer.
The strtok function records the string you first provided when you call it. (Which is really dangerous for multi-thread applications)
strtok modifies its input string. It places null characters ('\0') in it so that it will return bits of the original string as tokens. In fact strtok does not allocate memory. You may understand it better if you draw the string as a sequence of boxes.
To understand how strtok() works, one first needs to know what a static variable is. This link explains it quite well....
The key to the operation of strtok() is preserving the location of the last separator between successive calls (that's why strtok() continues to parse the original string passed to it when it is invoked with a null pointer in later calls).
Have a look at my own strtok() implementation, called zStrtok(), which has slightly different functionality from the one provided by strtok():
char *zStrtok(char *str, const char *delim) {
static char *static_str=0; /* var to store last address */
int index=0, strlength=0; /* integers for indexes */
int found = 0; /* check if delim is found */
/* delimiter cannot be NULL
* if no more char left, return NULL as well
*/
if (delim==0 || (str == 0 && static_str == 0))
return 0;
if (str == 0)
str = static_str;
/* get length of string */
while(str[strlength])
strlength++;
/* find the first occurance of delim */
for (index=0;index<strlength;index++)
if (str[index]==delim[0]) {
found=1;
break;
}
/* if delim is not contained in str, return str */
if (!found) {
static_str = 0;
return str;
}
/* check for consecutive delimiters
*if first char is delim, return delim
*/
if (str[0]==delim[0]) {
static_str = (str + 1);
return (char *)delim;
}
/* terminate the string
* this assignmetn requires char[], so str has to
* be char[] rather than *char
*/
str[index] = '\0';
/* save the rest of the string */
if ((str + index + 1)!=0)
static_str = (str + index + 1);
else
static_str = 0;
return str;
}
And here is an example usage
Example Usage
char str[] = "A,B,,,C";
printf("1 %s\n",zStrtok(str,","));
printf("2 %s\n",zStrtok(NULL,","));
printf("3 %s\n",zStrtok(NULL,","));
printf("4 %s\n",zStrtok(NULL,","));
printf("5 %s\n",zStrtok(NULL,","));
printf("6 %s\n",zStrtok(NULL,","));
Example Output
1 A
2 B
3 ,
4 ,
5 C
6 (null)
The code is from a string processing library I maintain on Github, called zString. Have a look at the code, or even contribute :)
https://github.com/fnoyanisi/zString
This is how I implemented strtok. It is not that great, but after working on it for two hours I finally got it to work. It supports multiple delimiters.
#include "stdafx.h"
#include <iostream>
using namespace std;
char* mystrtok(char str[],char filter[])
{
if(filter == NULL) {
return str;
}
static char *ptr = str;
static int flag = 0;
if(flag == 1) {
return NULL;
}
char* ptrReturn = ptr;
for(int j = 0; ptr != '\0'; j++) {
for(int i=0 ; filter[i] != '\0' ; i++) {
if(ptr[j] == '\0') {
flag = 1;
return ptrReturn;
}
if( ptr[j] == filter[i]) {
ptr[j] = '\0';
ptr+=j+1;
return ptrReturn;
}
}
}
return NULL;
}
int _tmain(int argc, _TCHAR* argv[])
{
char str[200] = "This,is my,string.test";
char *ppt = mystrtok(str,", .");
while(ppt != NULL ) {
cout<< ppt << endl;
ppt = mystrtok(NULL,", .");
}
return 0;
}
For those who are still having a hard time understanding the strtok() function, take a look at this pythontutor example; it is a great tool to visualize your C (or C++, Python ...) code.
In case the link is broken, paste in:
#include <stdio.h>
#include <string.h>
int main()
{
char s[] = "Hello, my name is? Matthew! Hey.";
for (char *p = strtok(s," ,?!."); p != NULL; p = strtok(NULL, " ,?!.")) {
puts(p);
}
return 0;
}
Credits go to Anders K.
Here is my implementation, which uses a hash table (a 256-entry lookup table) for the delimiters, which means it is O(n) instead of O(n^2) (here is a link to the code):
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define DICT_LEN 256
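int *create_delim_dict(char *delim)
{
/* calloc zero-fills the table; return NULL if the allocation fails */
int *d = calloc(DICT_LEN, sizeof(int));
if (!d) {
return NULL;
}
size_t i;
for(i=0; i< strlen(delim); i++) {
/* index with unsigned char so a negative char value cannot index before the table */
d[(unsigned char)delim[i]] = 1;
}
return d;
}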
char *my_strtok(char *str, char *delim)
{
static char *last, *to_free;
int *deli_dict = create_delim_dict(delim);
if(!deli_dict) {
/*this check if we allocate and fail the second time with entering this function */
if(to_free) {
free(to_free);
}
return NULL;
}
if(str) {
last = (char*)malloc(strlen(str)+1);
if(!last) {
free(deli_dict);
return NULL;
}
to_free = last;
strcpy(last, str);
}
while(deli_dict[(unsigned char)*last] && *last != '\0') {
last++;
}
str = last;
if(*last == '\0') {
free(deli_dict);
free(to_free);
deli_dict = NULL;
to_free = NULL;
return NULL;
}
while (*last != '\0' && !deli_dict[(unsigned char)*last]) {
last++;
}
/* only step past a delimiter; do not move beyond the end of the string */
if (*last != '\0') {
*last = '\0';
last++;
}
free(deli_dict);
return str;
}
int main()
{
char * str = "- This, a sample string.";
char *del = " ,.-";
char *s = my_strtok(str, del);
while(s) {
printf("%s\n", s);
s = my_strtok(NULL, del);
}
return 0;
}
strtok() keeps a pointer to where it last left off in a static variable, so on the second call, when you pass NULL, it resumes from that saved pointer.
If you pass a string again (even the same one), it starts over from the beginning of that string.
Moreover, strtok() is destructive, i.e. it modifies the original string, so keep a copy if you still need the original.
One more problem with strtok() is that, because it stores the position in a static variable, calling it from more than one thread at a time (or interleaving two tokenizing sessions) gives wrong results. For this use strtok_r().
strtok replaces each delimiter that ends a token with a null character ('\0'), which is also what marks the end of a string.
http://www.cplusplus.com/reference/clibrary/cstring/strtok/
You can scan the char array looking for the delimiter; if you find it, print a newline, otherwise print the character.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char *s;
s = malloc(1024 * sizeof(char));
scanf("%[^\n]", s);
s = realloc(s, strlen(s) + 1);
int len = strlen(s);
char delim =' ';
for(int i = 0; i < len; i++) {
if(s[i] == delim) {
printf("\n");
}
else {
printf("%c", s[i]);
}
}
free(s);
return 0;
}
So, this is a code snippet to help better understand this topic.
Printing Tokens
Task: Given a sentence, s, print each word of the sentence in a new line.
char *s;
s = malloc(1024 * sizeof(char));
scanf("%[^\n]", s);
s = realloc(s, strlen(s) + 1);
//logic to print the tokens of the sentence.
for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
printf("%s\n",p);
}
Input: How is that
Result:
How
is
that
Explanation: Here the strtok() function is called in a for loop to print the tokens on separate lines.
The function takes the string and the set of delimiters as parameters, breaks the string at those delimiters, and forms tokens. Each token is stored in p in turn and then printed.
strtok replaces each delimiter it finds with the null character '\0' in the given string.
CODE
#include<iostream>
#include<cstring>
int main()
{
char s[]="30/4/2021";
std::cout<<(void*)s<<"\n"; // 0x70fdf0
char *p1=s; // same address as printed above; a hard-coded address is not portable
std::cout<<p1<<"\n";
char *p2=strtok(s,"/");
std::cout<<(void*)p2<<"\n";
std::cout<<p2<<"\n";
char *p3=s; // the address of s again, taken after tokenizing
std::cout<<p3<<"\n";
for(int i=0;i<=9;i++)
{
std::cout<<*p1;
p1++;
}
}
OUTPUT
0x70fdf0 // 1. address of string s
30/4/2021 // 2. print string s through ptr p1
0x70fdf0 // 3. this address is return by strtok to ptr p2
30 // 4. print string which pointed by p2
30 // 5. again assign address of string s to ptr p3 try to print string
30 4/2021 // 6. print characters of string s one by one using loop
Before tokenizing the string
I assigned the address of string s to a pointer (p1) and printed the string through it; the whole string was printed.
After tokenizing
strtok returned the address of string s to p2, but printing through p2 shows only "30", not the whole string. So strtok is not just returning an address; it is also placing a '\0' character where the delimiter was.
Cross-check
1.
I assigned the address of string s to another pointer (p3) and printed through it; it prints "30", because tokenizing overwrote the delimiter with '\0'.
2.
Printing string s character by character in a loop shows that the first delimiter was replaced by '\0', so a blank is printed rather than '/'.
