I'm having a hard time splitting a sentence read from a file in C using the strtok function. I read the line from a file and stored it in a variable named info, from which I need to separate the words. I tried many things and eventually copied some code from the net and changed it a little. The code extracts the first token correctly, but then it prints nonsense.
#include <stdio.h>
#include <string.h>
void main()
{
//int i; // counter
char info[]=""; // all the information; later this should live in a struct
FILE *pok;
pok=fopen("C:/Users/Trajkovici/Desktop/OsobeFajl.txt","r");
if(pok==NULL)
{
printf("Greška prilikom otvaranja datoteke!");
}
fscanf(pok,"%[^\n]",&info);
puts("INFO: ");
puts(info);
//fclose(pok);
char * token = strtok(info, " ");
// loop through the string to extract all other tokens
while( token != NULL )
{
puts("\nTOKEN:");
printf( " %s\n", token ); //printing each token
token = strtok(NULL, " ");
}
}
This is the file and the result:
The result
The file
BTW, I wrote the same code, without extracting a sentence from a file, but instead declaring it manually. It works perfectly fine.
#include<stdio.h>
#include <string.h>
int main() {
char string[] = "Sladjan Jankovic 46 Vranje";
// Extract the first token
puts(string);
char * token = strtok(string, " ");
// loop through the string to extract all other tokens
while( token != NULL )
{
printf( " %s\n", token ); //printing each token
token = strtok(NULL, " ");
}
return 0;
}
And this is the result of the above code:
The result
So, the problem is that I have two programs with literally the same variables, but one of them splits the string into tokens fine, while the other one doesn't. Any help with the first program?
P.S. Sorry for possible bad indentation, this is my first time posting on Stack Overflow. Also, some comments and lines from the file are in Serbian.
char info[]="";
will allocate only one element. Using it in
fscanf(pok,"%[^\n]",&info);
is dangerous because it will write out of bounds when a string of positive length is read. (Even a one-character string is too long, because there must be room for the terminating null character.)
Allocate enough elements, for example:
char info[102400]="";
and specify the maximum length to read (the limit has to be at most the size of the buffer minus one, for the terminating null character) to prevent a buffer overrun, like this:
fscanf(pok,"%102399[^\n]",info);
Also note that you should remove the & before info. Arrays in expressions are (with a few exceptions) automatically converted to pointers to their first elements. Adding & passes a pointer to the array, while %[ expects a pointer to a character. Passing data of the wrong type to fscanf() invokes undefined behavior.
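Putting both fixes together, here is a minimal sketch of reading one line with a bounded fscanf() and splitting it with strtok(). The helper name split_first_line and the 256-byte buffer size are assumptions made for illustration; they are not from the question.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the question): reads the first line of fp
 * into line, which must hold at least 256 chars, and splits it on spaces.
 * Pointers to the words are stored in tokens; returns the word count. */
size_t split_first_line(FILE *fp, char *line, char *tokens[], size_t max_tokens)
{
    size_t count = 0;

    line[0] = '\0';
    /* The width 255 leaves room for the terminating null character.
     * line is passed without &, because it is already a char pointer. */
    if (fscanf(fp, "%255[^\n]", line) != 1)
        return 0;               /* empty line, read error, or end of file */

    for (char *tok = strtok(line, " ");
         tok != NULL && count < max_tokens;
         tok = strtok(NULL, " "))
        tokens[count++] = tok;

    return count;
}
```

You would call it with something like char line[256]; char *tokens[16]; and the FILE * returned by fopen(), after checking that fopen() did not return NULL. The token pointers point into line, so line has to stay alive while you use them.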
Related
I am trying to read a string word by word in C using the strsep() function, which can also be done using strtok(). When there are consecutive delimiters (in my case, spaces), the function does not ignore them. I am expected to use strsep() and couldn't figure out the solution. I'd appreciate it if one of you can help me.
#include <stdio.h>
#include <string.h>
int main(){
char newLine[256]= "scalar  i";
char *q;
char *token;
q = strdup(newLine);
const char delim[] = " ";
token = strsep(&q, delim);
printf("The token is: \"%s\"\n", token);
token = strsep(&q, delim);
printf("The token is: \"%s\"\n", token);
return 0;
}
Actual output is:
The token is: "scalar"
The token is: ""
What I expected is:
The token is: "scalar"
The token is: "i"
To do that I also tried to write a while loop so that I could continue until the token is non-empty.
But I cannot equate tokens with "", " ", NULL or "\n". Somehow the token is not equal to any of these.
First, note that strsep(), while convenient, is not in the standard C library, and will only be available on systems whose C library carries this 4.4BSD function. That's most Unix'ish systems today, but still.
Anyway, strsep() supports empty fields. That means that if your string has consecutive delimiters, it will find empty, length-0, tokens between each of these delimiters. For example, the tokens for string "ab  cd" (two spaces) will be:
"ab"
""
"cd"
2 delimiters -> 3 tokens.
Now, you also said:
I cannot equate tokens with "", " ", NULL or "\n". Somehow the token is not equal to any of these.
I am guessing what you were trying to perform is a simple comparison, e.g. if (my_token == "") { ... }. That won't work, because that is a comparison of pointers, not of the strings' contents. Two strings may have identical characters at different places in memory, and that is particularly likely with the example I just gave, since my_token will be dynamic, and will not be pointing to the static-storage-duration string "" used in the comparison.
Instead, you will need to use strcmp(my_token, "") == 0 or, better yet, just check manually whether the first char is '\0'.
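As a rough sketch of that idea, the loop below keeps calling strsep() until it gets a token whose first character is not '\0'. The helper name next_nonempty_token is made up for this example, and the _DEFAULT_SOURCE line is an assumption about glibc; other C libraries may need a different feature macro or none at all.

```c
#define _DEFAULT_SOURCE   /* asks glibc to declare strsep(); other libcs may differ */
#include <string.h>

/* Hypothetical helper: returns the next non-empty token, or NULL when
 * the input is exhausted. *cursor is advanced by strsep() on each call. */
char *next_nonempty_token(char **cursor, const char *delim)
{
    char *tok;

    while ((tok = strsep(cursor, delim)) != NULL) {
        if (tok[0] != '\0')     /* look at the contents, not the pointer */
            return tok;
    }
    return NULL;
}
```

With char *q pointing at a writable copy of the string, calling next_nonempty_token(&q, " ") repeatedly should hand back only the real words and skip the empty fields between consecutive spaces.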
I am having a struggle with the following exercise in my book:
Write a program that prompts the user to enter a series of words separated by single spaces, then prints the words in reverse order. Read the input as a string, and then use strtok to break it into words.
Input: hi there you are cool
Output: none, the program just closes.
Expected: cool are you there hi
My program only gets the string and waits and shuts after a couple of seconds. Here's the code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(void){
int ch ;
char * str , * str2;
char * p;
str = (char*)malloc(sizeof(char) * 100);
str2 =(char*)malloc(sizeof(char) * 100);
if((fgets(str , sizeof(str) , stdin)) != NULL){
str = strtok(str ," \t");
p = strrchr(str , '\0');
strcat(str2,p);
printf("%s",p);
while(str != NULL){
str = strtok(NULL ," \t");
p = strrchr(str + 1, '\0');
strcat(str2,p);
printf("%s",p);
}
}
return 0;
}
I know this question has been asked here. I get the idea there but my problem is implementation and carrying out. This is more of a beginner question.
Since you yourself stated that this is for an exercise I will not provide a working solution but an outline of what you might want to do.
Functions you want to use:
getline - for an easy read of an input line (notice that the newline character will not be removed)
strtok_r - to get the tokens (i.e. the words) from the input string
The _r means that this function is re-entrant, which means that it can safely be called by multiple threads at the same time. The normal version keeps internal state, and strtok_r lets you manage that state via a parameter.
(Please also read the docs for these functions if you have further questions)
For the algorithm:
Use getline to read a single line of input and replace the newline character with the '\0' character. Then extract the tokens one after another from the input and store them in a stack-like fashion. After you have tokenized the input, just pop the tokens from the stack and print them to stdout.
Another approach would be:
Write a function that simply reverses a string. Use this function to reverse the whole input string, then read each token from the reversed string and print that token, reversed once more, to stdout.
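If you want something to compare your own attempt against, here is one possible sketch of the stack approach. It uses plain strtok() (which the exercise asks for) instead of strtok_r, and the names reverse_words and MAX_WORDS are invented for this example.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 64   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: writes the words of line into out in reverse order,
 * separated by single spaces. line is modified by strtok(). */
void reverse_words(char *line, char *out, size_t out_size)
{
    char *stack[MAX_WORDS];
    size_t top = 0;
    size_t used = 0;

    /* Push every word; '\n' is a delimiter so fgets()/getline() input works. */
    for (char *tok = strtok(line, " \t\n");
         tok != NULL && top < MAX_WORDS;
         tok = strtok(NULL, " \t\n"))
        stack[top++] = tok;

    if (out_size > 0)
        out[0] = '\0';

    /* Pop the words, appending each one (plus a space if more follow). */
    while (top > 0 && used < out_size) {
        const char *word = stack[--top];
        int n = snprintf(out + used, out_size - used, "%s%s",
                         word, top > 0 ? " " : "");
        if (n < 0)
            break;
        used += (size_t)n;
    }
}
```

The array of pointers plays the role of the stack: words are pushed in reading order and popped in the opposite order. If out is too small, the result is simply cut short.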
I'm currently learning C and having some trouble passing C-string tokens into an array. Lines come in from standard input, strtok is used to split each line up, and I want to store each token in an array properly. An EOF check is required for exiting the input stream. Here's what I have, set up so that it will print the tokens back to me (these tokens will be converted to ASCII in a different code segment, I'm just trying to get this part to work first).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char string[1024]; //Initialize a char array of 1024 (input limit)
char *token;
char *token_arr[1024]; //array to store tokens.
char *out; //used
int count = 0;
while(fgets(string, 1023, stdin) != NULL) //Read lines from standard input until EOF is detected.
{
if (count == 0)
token = strtok(string, " \n"); //If first loop, Get the first token of current input
while (token != NULL) //read tokens into the array and increment the counter until all tokens are stored
{
token_arr[count] = token;
count++;
token = strtok(NULL, " \n");
}
}
for (int i = 0; i < count; i++)
printf("%s\n", token_arr[i]);
return 0;
}
This seems like proper logic to me, but then I'm still learning. The issue seems to be with streaming in multiple lines before sending the EOF signal with Ctrl-D.
For example, given an input of:
this line will be fine
the program returns:
this
line
will
be
fine
But if given:
none of this
is going to work
It returns:
is going to work
ing to work
to work
Any help is greatly appreciated. I'll keep working at it in the meantime.
There are a couple of issues here:
You never call token = strtok(string, " \n"); again once the string is "reset" to a new value, so strtok() still thinks it is tokenizing your original string.
strtok is returning pointers to "substrings" inside string. You are changing the contents of what is in string and so your second line effectively corrupts your first (since the original contents of string are overwritten).
To do what you want, you need to either read each line into a different buffer or duplicate the strings returned by strtok (strdup() is one way; just remember to free() each copy).
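A minimal sketch of the "duplicate each token" option might look like this. The helper name collect_tokens is hypothetical, and it copies with malloc() and strcpy() rather than strdup() only to stay within standard C.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads every line from in, splits it on spaces and
 * newlines, and stores a heap copy of each token in tokens.
 * Returns the number of tokens stored; the caller must free() each one. */
size_t collect_tokens(FILE *in, char *tokens[], size_t max_tokens)
{
    char line[1024];
    size_t count = 0;

    while (count < max_tokens && fgets(line, sizeof line, in) != NULL) {
        /* Restart strtok() on every new line. */
        for (char *tok = strtok(line, " \n");
             tok != NULL && count < max_tokens;
             tok = strtok(NULL, " \n")) {
            /* Copy the token: line is overwritten by the next fgets(). */
            char *copy = malloc(strlen(tok) + 1);
            if (copy == NULL)
                return count;   /* out of memory: keep what we have */
            strcpy(copy, tok);
            tokens[count++] = copy;
        }
    }
    return count;
}
```

Called as collect_tokens(stdin, token_arr, 1024), it should address both issues listed above: strtok() is restarted for each line, and the stored pointers no longer point into the buffer that fgets() reuses.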
I am trying to save one character and 2 strings into variables.
I use sscanf to read strings with the following form:
N "OldName" "NewName"
What I want: char character = 'N', char* old_name = "OldName", char* new_name = "NewName".
This is how I am trying to do it:
sscanf(mystring,"%c %s %s",&character,old_name,new_name);
printf("%c %s %s",character,old_name,new_name);
The problem is, my program stops working without any output.
(I want to ignore the quotation marks too and save only their content.)
When you do
char* new_name = "NewName";
you make the pointer new_name point to the read-only string array containing the constant string literal. The array contains exactly 8 characters (the letters of the string plus the terminator).
First of all, using that pointer as a destination for scanf will cause scanf to write to the read-only array, which leads to undefined behavior. And if you give a string longer than 7 characters, then scanf will also attempt to write out of bounds, again leading to undefined behavior.
The simple solution is to use actual arrays, and not pointers, and to also tell scanf to not read more than can fit in the arrays. Like this:
char old_name[64]; // Space for 63 characters plus string terminator
char new_name[64];
sscanf(mystring,"%c %63s %63s",&character,old_name,new_name);
To skip the quotation marks you have a couple of choices: Either use pointers and pointer arithmetic to skip the leading quote, and then set the string terminator at the place of the last quote to "remove" it. Another solution is to move the string to overwrite the leading quote, and then do as the previous solution to remove the last quote.
Or you could rely on the limited pattern-matching capabilities of scanf (and family):
sscanf(mystring,"%c \"%63s\" \"%63s\"",&character,old_name,new_name);
Note that the above sscanf call will work iff the string actually includes the quotes.
Second note: As said in the comment by Cool Guy, the above won't actually work since scanf is greedy. It will read until the end of the file/string or a white-space, so it won't actually stop reading at the closing double quote. The only working solution using scanf and family is the one below.
Also note that scanf and family, when reading a string using "%s", stop reading at white-space, so if the string is "New Name" then it won't work either. If this is the case, then you either need to manually parse the string, or use the odd "%[" format, something like
sscanf(mystring,"%c \"%63[^\"]\" \"%63[^\"]\"",&character,old_name,new_name);
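Wrapped in a function that checks the return value of sscanf(), the "%[" approach could be sketched like this. The function name parse_rename is an assumption for illustration, and the 64-byte buffers follow the sizes used above.

```c
#include <stdio.h>

/* Hypothetical helper: parses a line of the form  N "OldName" "NewName".
 * old_name and new_name must each hold at least 64 chars.
 * Returns 1 if all three fields were read, 0 otherwise. */
int parse_rename(const char *s, char *letter, char *old_name, char *new_name)
{
    /* %63[^\"] reads up to 63 characters that are not a double quote,
     * so names containing spaces are accepted. */
    int fields = sscanf(s, "%c \"%63[^\"]\" \"%63[^\"]\"",
                        letter, old_name, new_name);
    return fields == 3;
}
```

Checking that sscanf() returned 3 tells you whether the input really had the quoted form. Note that "%[" needs at least one matching character, so an empty name written as "" is reported as a failure by this sketch.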
You must allocate space for your strings, e.g.:
char* old_name = malloc(128);
char* new_name = malloc(128);
Or using arrays
char old_name[128] = {0};
char new_name[128] = {0};
In case of malloc you also have to free the space before the end of your program.
free(old_name);
free(new_name);
The other answers provide good methods of creating memory as well as how to read the example input into buffers. There are two additional items that may help:
1) You expressed that you want to ignore the quotation marks too.
2) Reading first and last names when they are separated by a space. (Your example input is not.)
As @Joachim points out, because scanf and family stop scanning on a space with the %s format specifier, a name that includes a space such as "firstname lastname" will not be read in completely. There are several ways to address this. Here are two:
Method 1: tokenizing your input.
Tokenizing a string breaks it into sections separated by delimiters. Your string input examples for instance are separated by at least 3 usable delimiters: space: " ", double quote: ", and newline: \n characters. fgets() and strtok() can be used to read in the desired content while at the same time strip off any undesired characters. If done correctly, this method can preserve the content (even spaces) while removing delimiters such as ". A very simple example of the concept below includes the following steps:
1) reading stdin into a line buffer with fgets(...)
2) parse the input using strtok(...).
Note: This is an illustrative, bare-bones implementation, sequentially coded to match your input examples (with spaces) and includes none of the error checking/handling that would normally be included.
#include <stdio.h>
#include <string.h>
int main(void)
{
char line[128];
char delim[] = {"\n\""};//parse using only newline and double quote
char *tok;
char letter;
char old_name[64]; // Space for 63 characters plus string terminator
char new_name[64];
fgets(line, 128, stdin);
tok = strtok(line, delim); //consume 1st " and get token 1
if(tok) letter = tok[0]; //assign letter
tok = strtok(NULL, delim); //consume 2nd " and get token 2
if(tok) strcpy(old_name, tok); //copy tok to old name
tok = strtok(NULL, delim); //consume 3rd " throw away token 3
tok = strtok(NULL, delim); //consume 4th " and get token 4
if(tok) strcpy(new_name, tok); //copy tok to new name
printf("%c %s %s\n", letter, old_name, new_name);
return 0;
}
Note: as written, this example (like most strtok(...) implementations) requires very narrowly defined input. In this case the input must be no longer than 127 characters and consist of a single character followed by space(s), then a double-quoted string followed by more space(s), then another double-quoted string, as defined by your example:
N "OldName" "NewName"
The following input will also work in the above example:
N "old name" "new name"
N     "old name"     "new name"
Note also about this example, some consider strtok() broken, while others suggest avoiding its use. I suggest using it sparingly, and only in single threaded applications.
Method 2: walking the string.
A C string is just an array of char terminated with a NUL ('\0') character. By selectively copying some characters into another string, while bypassing the ones you do not want (such as the "), you can effectively strip unwanted characters from your input. Here is an example function that will do this:
#include <stdlib.h>
#include <string.h>
char * strip_ch(char *str, char ch)
{
    char *from, *to;
    char *dup = strdup(str);//work on a copy of the input
    if(dup)
    {
        from = to = dup;//both working pointers start at the copy
        for (; *from != '\0'; from++)//walk through the copy
        {
            *to = *from;//copy the current character to the write position
            if (*to != ch) to++;//advance only if it is not the char to strip
            //otherwise, leave it so the next char will overwrite it
        }
        *to = '\0';//write the NUL terminator
        strcpy(str, dup);
        free(dup);
    }
    return str;
}
Example use case:
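int main(void)
{
    char line[128] = {"start"};
    while(strstr(line, "quit") == NULL)
    {
        printf("Enter string (\"quit\" to leave) and hit <ENTER>:");
        if(fgets(line, 128, stdin) == NULL) break;//stop on EOF or read error
        strip_ch(line, '"');//strips in place; passing line to sprintf as both source and destination would be undefined
        printf("%s", line);//fgets() already kept the newline
    }
    return 0;
}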
I'm trying to tokenize a phone number and split it into two arrays. It starts out in a string in the form of "(515) 555-5555". I'm looking to tokenize the area code, the first 3 digits, and the last 4 digits. The area code I would store in one array, and the other 7 digits in another one. Both arrays are to hold just the numbers themselves.
My code seems to work... sort of. The issue is that when I print the two storage arrays, I find some quirks:
My array aCode: it stores the first 3 digits as I ask it to, but then it also prints some garbage values tacked on at the end. I walked through it in the debugger, and the array only stores what I'm asking it to store, the 515. So how come it's printing those garbage values? What gives?
My array aNum: I can append the tokens I need to the end of it, the only problem is I end up with an extra space at the front (which makes sense; I'm adding on to an empty array, i.e. adding on to empty space). I modified the code to only hold 7 variables just to mess around, stepped into the debugger, and it tells me that the array holds an empty space and 6 of the digits I need, with no room for the last one. Yet when I print it, the space AND all 7 digits are printed. How does that happen?
And how could I set up my strtok function so that it first copies the 3 digits before the "-", then appends to that the last 4 I need? All examples of tokenization I've seen utilize a while loop, which would mean I'd have to choose either strcat or strcpy to complete my task. I can set up an "if" statement to check for the size of the current token each time, but that seems too crude to me and I feel like there's a simpler method to this. Thanks all!
int main() {
char phoneNum[]= "(515) 555-5555";
char aCode[3];
char aNum[7];
char *numPtr;
numPtr = strtok(phoneNum, " ");
strncpy(aCode, &numPtr[1], 3);
printf("%s\n", aCode);
numPtr = strtok(&phoneNum[6], "-");
while (numPtr != NULL) {
strcat(aNum, numPtr);
numPtr = strtok(NULL, "-");
}
printf("%s", aNum);
}
I can see two main errors:
Being an array of 3 chars, aCode is not null-terminated here. Using it as an argument to the %s format specifier in printf() invokes undefined behaviour. The same applies, in a different way, to aNum.
strcat() expects null-terminated arrays for both arguments. aNum is not null-terminated when it is used for the first time, which results in UB, too. Always initialize your local variables.
Also, see the other answers for complete, bug-free code.
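To illustrate just the first point, here is a small sketch of copying the area code with strncpy() into an array that has room for the terminator. The helper name copy_area_code and its input checks are assumptions added for this example.

```c
#include <string.h>

/* Hypothetical helper: copies the three area-code characters out of a
 * string shaped like "(515) 555-5555". out must hold at least 4 chars.
 * Returns 1 on success, 0 if the input is too short or malformed. */
int copy_area_code(const char *phone, char *out)
{
    if (phone[0] != '(' || strlen(phone) < 5 || phone[4] != ')')
        return 0;
    strncpy(out, phone + 1, 3);
    out[3] = '\0';   /* strncpy() adds no terminator when it copies exactly 3 chars */
    return 1;
}
```

With char aCode[4]; the result can be passed to printf("%s\n", aCode) safely, because the fourth element always holds the terminating null character.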
The biggest problem in your code is undefined behavior: since you are reading a three-character constant into a three-character array, you have left no space for null terminator.
Since you are tokenizing a value in a very specific format of fixed length, you could get away with a very concise implementation that employs sscanf:
char *phoneNum = "(515) 555-5555";
char aCode[3+1];
char aNum[7+1];
sscanf(phoneNum, "(%3[0-9]) %3[0-9]-%4[0-9]", aCode, aNum, &aNum[3]);
printf("%s %s", aCode, aNum);
This solution passes the format (###) ###-#### directly to sscanf, and tells the function where each value needs to be placed. The only "trick" used above is passing &aNum[3] for the last argument, instructing sscanf to place data for the third segment into the same storage as the second segment, but starting at position 3.
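If you want to know whether the input actually matched, you can check how many fields sscanf() converted. Below is a sketch of the same call wrapped in a function; the name split_phone is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: splits "(###) ###-####" into an area code and a
 * 7-digit number. area needs at least 4 chars, number at least 8.
 * Returns 1 if all three groups were converted, 0 otherwise. */
int split_phone(const char *phone, char *area, char *number)
{
    /* The third conversion writes at number + 3, directly after the
     * three digits stored by the second conversion. */
    int fields = sscanf(phone, "(%3[0-9]) %3[0-9]-%4[0-9]",
                        area, number, number + 3);
    return fields == 3;
}
```

This only checks that each group contained at least one digit; it does not verify that every group has its full length, so treat it as a starting point rather than a validator.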
Your code has multiple issues:
You allocate the wrong size for aCode; you should add 1 for the nul terminator byte and initialize the whole array to '\0' so that the string is always terminated.
char aCode[4] = {'\0'};
You don't check if strtok() returns NULL.
numPtr = strtok(phoneNum, " ");
strncpy(aCode, &numPtr[1], 3);
Point 1 also applies to aNum in strcat(aNum, numPtr), which will fail as well because aNum is not yet initialized at the first call.
Subsequent calls to strtok() must have NULL as the first parameter, hence
numPtr = strtok(&phoneNum[6], "-");
is wrong, it should be
numPtr = strtok(NULL, "-");
Other answers have already mentioned the major issue, which is insufficient space in aCode and aNum for the terminating NUL character. The sscanf answer is also the cleanest for solving the problem, but given the restriction of using strtok, here's one possible solution to consider:
char phone_number[]= "(515) 555-1234";
char area[3+1] = "";
char digits[7+1] = "";
const char *separators = " (-)";
char *p = strtok(phone_number, separators);
if (p) {
int len = 0;
(void) snprintf(area, sizeof(area), "%s", p);
while (len < sizeof(digits) && (p = strtok(NULL, separators))) {
len += snprintf(digits + len, sizeof(digits) - len, "%s", p);
}
}
(void) printf("(%s) %s\n", area, digits);