Reading a line of chars one word at a time - c

I'm new to C and I see plenty of examples of reading a file one word at a time, but I'm trying to make a function that is given a line of text (actually a list of filenames) and needs to read one word (filename) at a time.
Eg. I call the function, words("file1.c file2.c file3.txt");
And the function needs to read each word(filename) and put it through another function.
So far I've got:
void words(char* line) {
char buf[100];
while (!feof(line)) {
fscanf(line,"%s",buf);
printf("current word %s \n", buf);
}
}
But this won't compile. I get "passing argument 1 of ‘feof’ from incompatible pointer type"
edit So this is the code I've come up with. It seems to work fine if I call it with words("test1 test2 test3 test4 "); but if the last character is not a space then there is an error in the output, e.g. words("test1 test2 test3 test4");
char buf[100];
int word_length = 0;
int n;
while((sscanf(line + word_length,"%s",buf, &n)) == 1) {
printf("current word %s \n", buf);
word_length = word_length + strlen(buf) + 1;
}
What am I doing wrong?

The fscanf and feof functions work on files.
The corresponding function for strings is sscanf.
The return value from sscanf can be used to check whether you managed to scan anything from the string and how far into the string you should look for the next word.
Edit:
Good effort. There are two problems left. First, if there are multiple spaces between words your code will fail. Also, the + 1 will move you past the null terminator if there is no space after the last word.
The second problem can be solved by not adding a +1. That means that the next item will be scanned right after the previous one ends. This is not a problem because the %s conversion in sscanf skips leading whitespace.
The problem with multiple spaces can be solved by finding how far into the string the next token starts using strstr.
Because strstr returns a pointer I switched to using a pointer instead of an index to keep track of progress through the string.
char *ptr = line;
while((sscanf(ptr,"%s",buf)) == 1) {
printf("current word %s \n", buf);
ptr = strstr(ptr, buf); // Find where the current word starts.
ptr += strlen(buf); // Skip past the current word.
}
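For reference, here is a minimal sketch of the loop above wrapped in a complete function. The name for_each_word and the callback parameter are made up for illustration; the %99s field width is an addition that keeps sscanf from writing past buf. A word longer than 99 characters would be split into pieces by this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: calls handle() once per whitespace-separated word
 * in line (if handle is not NULL) and returns the number of words seen. */
int for_each_word(const char *line, void (*handle)(const char *word))
{
    char buf[100];
    const char *ptr = line;
    int count = 0;

    /* %99s leaves room for the terminator in buf[100]. */
    while (sscanf(ptr, "%99s", buf) == 1) {
        if (handle != NULL)
            handle(buf);
        count++;
        ptr = strstr(ptr, buf);  /* Find where the current word starts. */
        ptr += strlen(buf);      /* Skip past the current word. */
    }
    return count;
}
```

Calling for_each_word("file1.c file2.c file3.txt", some_function) would pass each filename to some_function in turn, with or without a trailing space.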

Related

How to get each string within a buffer fetched with "getline" from a file in C

I'm trying to read every string separated with commas, dots or whitespaces from every line of a text from a file (I'm just receiving alphanumeric characters with scanf for simplicity). I'm using the getline function from <stdio.h> library and it reads the line just fine. But when I try to "iterate" over the buffer that was fetched with it, it always returns the first string read from the file. Let's suppose I have a file called "entry.txt" with the following content:
test1234 test hello
another test2
And my "main.c" contains the following:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_WORD 500
int main()
{
FILE *fp;
int currentLine = 1;
size_t characters, maxLine = MAX_WORD * 500;
/* Buffer can keep up to 500 words of 500 characters each */
char *word = (char *)malloc(MAX_WORD * sizeof(char)), *buffer = (char *)malloc((int)maxLine * sizeof(char));
fp = fopen("entry.txt", "r");
if (fp == NULL) {
return 1;
}
for (currentLine = 1; (characters = getline(&buffer, &maxLine, fp)) != -1; currentLine++)
{
/* This line gets "test1234" onto "word" variable, as expected */
sscanf(buffer, "%[a-zA-Z_0-9]", word);
printf("%s", word); // As expected
/* This line should get "test" string, but again it obtains "test1234" from the buffer */
sscanf(buffer, "%[a-zA-Z_0-9]", word);
printf("%s", word); // Not intended...
// Do some stuff with the "word" and "currentLine" variables...
}
return 0;
}
What happens is that I'm trying to get every alphanumeric string (namely word from now on) in sequence from the buffer, when the sscanf function just gives me the first occurrence of a word within the specified buffer string. Also, every line on the entry file can contain an unknown amount of words separated by either whitespaces, commas, dots, special characters, etc.
I'm obtaining every line from the file separately with "getline" because I need to get every word from every line and store it in other place with the "currentLine" variable, so I'll know from which line a given word would've come. Any ideas of how to do that?
fscanf has an input stream argument. A stream can change its state, so that the second call to fscanf reads a different thing. For example:
fscanf(stdin, "%s", str1); // str1 contains some string; stdin advances
fscanf(stdin, "%s", str2); // str2 contains some other string
scanf does not have a stream argument, but it has a global stream to work with, so it works exactly like fscanf(stdin, ...).
sscanf does not have a stream argument, nor is there any global state to keep track of what was read. There is an input string. You scan it, some characters get converted, and... nothing else changes. The string remains the same string (how could it possibly be otherwise?) and no information about how far the scan has advanced is stored anywhere.
sscanf(buffer, "%s", str1); // str1 contains some string; nothing else changes
sscanf(buffer, "%s", str2); // str2 contains the same string
So what does a poor programmer do?
Well I lied. No information about how far the scan has advanced is stored anywhere only if you don't request it.
int nchars;
sscanf(buffer, "%s%n", str1, &nchars); // str1 contains some string;
// nchars contains number of characters consumed
sscanf(buffer+nchars, "%s", str2); // str2 contains some other string
Error handling and %s field widths omitted for brevity. You should never omit them in real code.
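As a rough sketch of what the version with error handling and field widths could look like for the question's alphanumeric words: the helper name count_words is hypothetical, and the 499 in the format string is hard-coded to match MAX_WORD from the question. The return value of sscanf tells us whether a word was converted; %n does not count towards it.

```c
#include <stdio.h>

#define MAX_WORD 500

/* Hypothetical helper: counts the alphanumeric words in line. */
int count_words(const char *line)
{
    char word[MAX_WORD];
    const char *p = line;
    int count = 0;

    while (*p != '\0') {
        int nchars = 0;
        /* %499[...] stores at most 499 characters plus the terminator. */
        if (sscanf(p, "%499[a-zA-Z_0-9]%n", word, &nchars) == 1) {
            count++;       /* word holds the current token; use it here */
            p += nchars;   /* advance by the characters just consumed */
        } else {
            p++;           /* separator (space, comma, dot, ...): skip it */
        }
    }
    return count;
}
```

In the question's loop, the body of the if branch is where each word and currentLine would be stored.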

Using strtok_s, the second item is always NULL, even though the first works right. How can I get both values?

I am working in C and the strtok_s function isn't working as expected. I want to separate 2 halves of user input, delimited by a space character between them. I've been reading the manual but I cannot figure it out. Below is the function I wrote. Its goal is to separate the first and second half of user input delimited by a space and return the value to 2 pointers. The print statement has only been used for my debugging.
void argGetter(char* commandDesired, char** firstArg, char** secondArg) {
// this char holds the first part of the command before the " "
char* commandCleanDesired;
// this char array holds the part after the " "
char *nextToken;
char *argument;
commandCleanDesired = strtok_s(commandDesired, " ", &nextToken);
argument = strtok_s(NULL, " ", &nextToken);
printf("\n\nCMD 1 is %s\n\nCMD 2 is %s\n\n\n", commandCleanDesired, argument);
*firstArg = commandCleanDesired;
*secondArg = argument;
}
//this shows how argGetter is called.
void main() {
// these hold the return values from argGetter()
char* secondArg = NULL;
char* firstArg = NULL;
//This holds user input
char commandDesired[255];
//This line prints the prompt
printf("\n\tSanity$hell> ");
//Then we get user input
scanf_s("%s", commandDesired, 255);
//split the command from args using argGetter
argGetter(commandDesired, &firstArg, &secondArg);
printf("\n First Arg is %s\n", firstArg);
printf("\nYour second arg is %s\n\n", secondArg);
}
It gets commandCleanDesired fine, but the second variable, (named 'argument') is ALWAYS null.
I have tried the things below to get the value after the space and store it in argument (unsuccessfully). These little code snippets show how I modified the above code during my attempts to solve the issue.
commandCleanDesired = strtok_s(commandDesired, " ", &commandDesired);
argument = strtok_s(commandDesired, " ", &commandDesired);
//the above resulted in NULL for the second value argument as well.
// Below is the next thing i tried.
char * nextToken;
commandCleanDesired = strtok_s(commandDesired, " ", &nextToken);
argument = strtok_s(NULL, " ", &nextToken);
//both result in argument being NULL.
//I tried the above after reading the manual more.
I have been reading the manual at https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/strtok-s-strtok-s-l-wcstok-s-wcstok-s-l-mbstok-s-mbstok-s-l?view=msvc-170.
I used NULL for the string argument the second time because the above manual led me to believe that was necessary for all subsequent calls after the first call. An example input of commandDesired would be "cd C://"
For the above input, I would like this function to have commandCleanDesired = 'cd' and argument = 'C://'
Currently, with the misbehavior of the above function for the above input, the function gives commandCleanDesired = 'cd' and argument = (NULL)
TLDR, How am I misusing the strtok_s function in C, how can I get the second value after the space to be stored in the "argument" pointer?
Thank you in advance.
The issue is that I used scanf_s or scanf to get the user input in main. This tokenizes the input, which is not what I want.
If you want to read a whole line, use fgets. When I use fgets instead, the issue is solved!
If you want to separate strings at the space characters, don't use scanf() (or friends) with the %s format specifier: it stops reading at the first whitespace character, so the string that finally reaches strtok (or friends) doesn't have any spaces in it. This is the most probable reason (I have not looked in detail at your code, sorry) that you get the first word the first time, and NULL later.
A good alternative, is to use fgets(), in something like:
char line[1024];
/* the following call to fgets() reads a complete line (incl. the
* \n char) into line. */
while (fgets(line, sizeof line, stdin)) { /* != NULL means not eof */
for ( char *arg = strtok(line, " \t\n");
arg != NULL;
arg = strtok(NULL, " \t\n"))
{
/*process argument in arg here */
}
}
Or, if you want to first strip the trailing \n char, and then process the whole line to tokenize the arguments...
char line[1024];
/* the following call to fgets() reads a complete line (incl. the
* \n char) into line. */
while (fgets(line, sizeof line, stdin)) { /* != NULL means not eof */
process_line(strtok(line, "\n")); /* only one \n at end can be, at most */
}
Then, inside the process_line() function you need to check the parameter for NULL (for the case the string only has a single \n on it, that will result in a null output from strtok())
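A common alternative for removing the newline, which is not what the snippet above does, is to overwrite it in place using strcspn(). It never produces a NULL pointer, so the empty-line case needs no special check. The helper name strip_newline is made up for this sketch.

```c
#include <string.h>

/* Hypothetical helper: removes the trailing newline left by fgets(), if any. */
void strip_newline(char *line)
{
    /* strcspn returns the index of the first '\n', or the string length
     * if there is none, so the assignment is safe for every input. */
    line[strcspn(line, "\n")] = '\0';
}
```

After strip_newline(line), an input line that contained only "\n" becomes the empty string rather than NULL.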
IMPORTANT WARNING: strtok() is not reentrant, and also it cannot be nested. It uses an internal, global iterator that is initialized each time you provide a first non-null parameter. If you need to run several levels of scanning, you have two options:
run the outer loop in full, appending work to do to a second level set of jobs (or similar) to be able to run strtok() on each separate level when the first loop is finished.
run the reentrant version of strtok(), e.g. strtok_r(). This will allow reentrancy and nesting, you just need to provide a different state buffer (where strtok stores the iterator state) for each nesting level (or thread)
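To illustrate the second option, here is a small sketch of two nested strtok_r() loops, each with its own state pointer. The input format (commands separated by ';', arguments separated by whitespace) and the name count_args are assumptions made up for the example. strtok_r is POSIX rather than ISO C, so a feature-test macro may be needed depending on the compiler flags.

```c
#define _POSIX_C_SOURCE 200809L  /* strtok_r is POSIX, not ISO C */
#include <string.h>

/* Hypothetical helper: text holds commands separated by ';', and each
 * command holds arguments separated by whitespace. Returns the total
 * number of arguments. text is modified in place, as with strtok(). */
int count_args(char *text)
{
    char *outer_state, *inner_state;
    int total = 0;

    for (char *cmd = strtok_r(text, ";", &outer_state);
         cmd != NULL;
         cmd = strtok_r(NULL, ";", &outer_state)) {
        /* The inner loop keeps its own state, so the outer scan survives. */
        for (char *arg = strtok_r(cmd, " \t\n", &inner_state);
             arg != NULL;
             arg = strtok_r(NULL, " \t\n", &inner_state)) {
            total++;
        }
    }
    return total;
}
```

With plain strtok() the inner loop would overwrite the single global iterator and the outer loop would stop after the first command.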

Understanding this program which prints a sentence in reverse but keeps the words unchanged

I don't quite understand this program. I don't understand what is happening in the for loop. Can someone explain to me in simple words. And the site also didn't explain it well-enough. This is the link to the site. https://www.geeksforgeeks.org/print-words-string-reverse-order/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void printReverse(char str[])
{
int length = strlen(str);
FILE *fptr;
if((fptr=fopen("Question1.txt","w"))==NULL)
{
printf("Invalid file");
exit(0);
}
int i;
for (i = length - 1; i >= 0; i--) {
if (str[i] == ' ')
{
str[i] = '\0';
printf("%s ", &(str[i]) + 1);
fprintf(fptr,"%s ", &(str[i]) + 1);
}
}
fprintf(fptr,"%s",str);
printf("%s.", str);
fclose(fptr);
}
int main()
{
char str[1000];
//clrscr();
printf("Enter string: ");
scanf("%[^\n]s", str);
printReverse(str);
//getch();
return 0;
}
In the for loop, why put &(str[i])+1? And also in printf("%s.", str)--this only has the first word; how?
Okay, let's see if I can help. I'll go through the code carefully.
I suspect you already understand this. It's just the start of the function definition.
void printReverse(char str[])
{
strlen is a standard method that returns the length of a null-terminated string. That means that str might contain Hello (5 characters), but there's one more byte with a 0 in it, which is how C has always marked the end of the string. In this case, str itself takes 6 bytes, but length will be 5.
int length = strlen(str);
This is how you open a file in C. C++ has better ways. The file is opened for writing.
FILE *fptr;
if((fptr=fopen("Question1.txt","w"))==NULL)
{
printf("Invalid file");
exit(0);
}
Here's your for-loop. Let's assume str contains Hello, so length is 5, but the indexes into string are str[0..4]. C uses the index as "offset from the beginning", so the first element is 0, not 1. Thus, when this loop starts, str[i] == o (using Hello as our example string). We then loop, decrementing i each time. Once i goes below 0, the loop ends.
int i;
for (i = length - 1; i >= 0; i--) {
Okay, remember we're printing the words in reverse order, but the words themselves stay in normal order. So this looks for a space between words. If we use Hello there as our input text, this if-statement is true when i is pointing to the space between the two words.
Now here's the trick. Remember what I said earlier about null-terminated strings? What this does is to step on that space and replace it with a 0. That makes the rest of this magic work.
if (str[i] == ' ')
{
str[i] = '\0';
And here's the magic. Now, this is a strange way to do it. I would have done it with &str[i+1], but this works. What this is doing is saying "Print the string that begins after the space we just clobbered." We do it to the terminal and the file.
printf("%s ", &(str[i]) + 1);
fprintf(fptr,"%s ", &(str[i]) + 1);
}
}
This writes whatever is left at the start of the string (the first word) to the file that was opened as well as to your terminal, then makes sure the file is closed.
fprintf(fptr,"%s",str);
printf("%s.", str);
fclose(fptr);
}
This all works because we step on the spaces with a zero. For Hello world, we:
Start from the tail
Find the space and stick a zero in it
Print world
Keep backing up to the beginning of the data.
Drop out of the for-loop and print whatever is left: Hello
Answer to your specific questions
In the for loop why put? &(str[i])+1?
&str[i] is the address of the character at index i where a space has been replaced with a NUL character. With +1 you get the address of the character after it, i.e. the beginning of the word that follows the space that was just replaced. (In case of double spaces this would result in an empty string.)
And also in printf("%s.", str); this only has the first word how?
Assuming the first word is not preceded by a space, the loop will not print it.
This printf("%s.", str); will print the string from the beginning until the first NUL character that replaces a former space character, hence resulting in the first word.
Additional question from comment
So... for example if I input Hello World does the W in that get the index 0?
The W is at index 6. (H is 0, e is 1 etc.)
When i has been counted down to 5, the space at this position will be replaced with a NUL ('\0') character, and it will print the remaining string from the W up to the end of the string which is also marked by a NUL character. (As defined by the C standard.)
And what if the character is not a NULL character? Then it won't go execute if right? It'll just increment i again till it encounters another NULL right?
I don't fully understand these questions. In case there was no NUL character at the end of the string printf would read past the end of the string leading to undefined behavior.
In case of an input string Hello World and Universe, all spaces after World would have been replaced with NUL characters before, so when the program reaches the position of the space before World, the string will be
Hello World\0and\0Universe\0
before the replacement and
Hello\0World\0and\0Universe\0
after the replacement.
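To see the effect without a file or terminal, here is a sketch of the same loop that collects the output in a second buffer instead of printing it. The name reverse_words and the out parameter are hypothetical. The output has the same length as the input, so out must be at least strlen(str) + 1 bytes.

```c
#include <string.h>

/* Hypothetical variant of printReverse(): writes the words of str in
 * reverse order into out. str is modified in place (spaces become NULs).
 * out must have room for strlen(str) + 1 bytes. */
void reverse_words(char *str, char *out)
{
    out[0] = '\0';
    for (int i = (int)strlen(str) - 1; i >= 0; i--) {
        if (str[i] == ' ') {
            str[i] = '\0';             /* terminate the word before the space */
            strcat(out, &str[i + 1]);  /* append the word after the space */
            strcat(out, " ");
        }
    }
    strcat(out, str);  /* what is left is the first word */
}
```

For the input Hello World and Universe, out ends up as Universe and World Hello, and str itself reads as just Hello because of the NUL at index 5.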

Tokenizing a phone number in C

I'm trying to tokenize a phone number and split it into two arrays. It starts out in a string in the form of "(515) 555-5555". I'm looking to tokenize the area code, the first 3 digits, and the last 4 digits. The area code I would store in one array, and the other 7 digits in another one. Both arrays are to hold just the numbers themselves.
My code seems to work... sort of. The issue is when I print the two storage arrays, I find some quirks;
My array aCode; it stores the first 3 digits as I ask it to, but then it also prints some garbage values notched at the end. I walked through it in the debugger, and the array only stores what I'm asking it to store- the 515. So how come it's printing those garbage values? What gives?
My array aNum; I can append the tokens I need to the end of it, the only problem is I end up with an extra space at the front (which makes sense; I'm adding on to an empty array, ie adding on to empty space). I modify the code to only hold 7 variables just to mess around, I step into the debugger, and it tells me that the array holds and empty space and 6 of the digits I need- there's no room for the last one. Yet when I print it, the space AND all 7 digits are printed. How does that happen?
And how could I set up my strtok function so that it first copies the 3 digits before the "-", then appends to that the last 4 I need? All examples of tokenization I've seen utilize a while loop, which would mean I'd have to choose either strcat or strcpy to complete my task. I can set up an "if" statement to check for the size of the current token each time, but that seems too crude to me and I feel like there's a simpler method to this. Thanks all!
int main() {
char phoneNum[]= "(515) 555-5555";
char aCode[3];
char aNum[7];
char *numPtr;
numPtr = strtok(phoneNum, " ");
strncpy(aCode, &numPtr[1], 3);
printf("%s\n", aCode);
numPtr = strtok(&phoneNum[6], "-");
while (numPtr != NULL) {
strcat(aNum, numPtr);
numPtr = strtok(NULL, "-");
}
printf("%s", aNum);
}
I can primarily see two errors,
Being an array of 3 chars, aCode is not null-terminated here. Using it as an argument to %s format specifier in printf() invokes undefined behaviour. Same thing in a different way for aNum, too.
strcat() expects a null-terminated array for both the arguments. aNum is not null-terminated, when used for the first time, will result in UB, too. Always initialize your local variables.
Also, see other answers for a complete bug-free code.
The biggest problem in your code is undefined behavior: since you are reading a three-character constant into a three-character array, you have left no space for null terminator.
Since you are tokenizing a value in a very specific format of fixed length, you could get away with a very concise implementation that employs sscanf:
char *phoneNum = "(515) 555-5555";
char aCode[3+1];
char aNum[7+1];
sscanf(phoneNum, "(%3[0-9]) %3[0-9]-%4[0-9]", aCode, aNum, &aNum[3]);
printf("%s %s", aCode, aNum);
This solution passes the format (###) ###-#### directly to sscanf, and tells the function where each value needs to be placed. The only "trick" used above is passing &aNum[3] for the last argument, instructing sscanf to place data for the third segment into the same storage as the second segment, but starting at position 3.
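A sketch of the same call with its return value checked is shown here; the wrapper name split_phone is made up. sscanf returns the number of fields converted, so anything other than 3 means the input did not follow the pattern. Note that this is not full validation: a group with fewer digits than expected would still be accepted.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 and fills aCode (4 bytes) and aNum
 * (8 bytes) if all three digit groups were converted, otherwise 0. */
int split_phone(const char *phoneNum, char aCode[4], char aNum[8])
{
    int fields = sscanf(phoneNum, "(%3[0-9]) %3[0-9]-%4[0-9]",
                        aCode, aNum, &aNum[3]);
    return fields == 3;
}
```

For "(515) 555-5555" this leaves "515" in aCode and "5555555" in aNum.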
Your code has multiple issues
You allocate the wrong size for aCode; you should add 1 for the nul terminator byte and initialize the whole array to '\0' to ensure the string is always terminated.
char aCode[4] = {'\0'};
You don't check if strtok() returns NULL.
numPtr = strtok(phoneNum, " ");
strncpy(aCode, &numPtr[1], 3);
Point 1, applies to aNum in strcat(aNum, numPtr) which will also fail because aNum is not yet initialized at the first call.
Subsequent calls to strtok() must have NULL as the first parameter, hence
numPtr = strtok(&phoneNum[6], "-");
is wrong, it should be
numPtr = strtok(NULL, "-");
Other answers have already mentioned the major issue, which is insufficient space in aCode and aNum for the terminating NUL character. The sscanf answer is also the cleanest for solving the problem, but given the restriction of using strtok, here's one possible solution to consider:
char phone_number[]= "(515) 555-1234";
char area[3+1] = "";
char digits[7+1] = "";
const char *separators = " (-)";
char *p = strtok(phone_number, separators);
if (p) {
int len = 0;
(void) snprintf(area, sizeof(area), "%s", p);
while (len < sizeof(digits) && (p = strtok(NULL, separators))) {
len += snprintf(digits + len, sizeof(digits) - len, "%s", p);
}
}
(void) printf("(%s) %s\n", area, digits);

How would I compare a string (entered by the user) to the first word of a line in a file?

I am really struggling to understand how character arrays work in C. This seems like something that should be really simple, but I do not know what function to use, or how to use it.
I want the user to enter a string, and I want to iterate through a text file, comparing this string to the first word of each line in the file.
By "word" here, I mean substring that consists of characters that aren't blanks.
Help is greatly appreciated!
Edit:
To be more clear, I want to take a single input and search for it in a database in the form of a text file. I know that if it is in the database, it will be the first word of a line, since that is how the database is formatted. I suppose I COULD iterate through every single word of the database, but this seems less efficient.
After finding the input in the database, I need to access the two words that follow it (on the same line) to achieve the program's ultimate goal (which is computational in nature)
Here is some code that will do what you are asking. I think it will help you understand how string functions work a little better. Note - I did not make many assumptions about how well conditioned the input and text file are, so there is a fair bit of code for removing whitespace from the input, and for checking that the match is truly "the first word", and not "the first part of the first word". So this code will not match the input "hello" to the line "helloworld 123 234" but it will match to "hello world 123 234". Note also that it is currently case sensitive.
#include <ctype.h>
#include <stdio.h>
#include <string.h>
int main(void) {
char buf[100]; // declare space for the input string
FILE *fp; // pointer to the text file
char fileBuf[256]; // space to keep a line from the file
int ii, ll;
printf("give a word to check:\n");
fgets(buf, 100, stdin); // fgets prevents you reading in a string longer than buffer
printf("you entered: %s\n", buf); // check we read correctly
// see (for debug) if there are any odd characters:
printf("In hex, that is ");
ll = strlen(buf);
for(ii = 0; ii < ll; ii++) printf("%2X ", buf[ii]);
printf("\n");
// probably see a trailing newline (plus a carriage return on some systems). Get rid of it!
// note I could have used the result that ii is strlen(buf) but
// that makes the code harder to understand
for(ii = strlen(buf) - 1; ii >=0; ii--) {
if (isspace(buf[ii])) buf[ii]='\0';
}
// open the file:
if((fp=fopen("myFile.txt", "r"))==NULL) {
printf("cannot open file!\n");
return 0;
}
while( fgets(fileBuf, 256, fp) ) { // read in one line at a time until eof
printf("line read: %s", fileBuf); // show we read it correctly
// find whitespace: we need to keep only the first word.
ii = 0;
while(!isspace(fileBuf[ii]) && ii < 255) ii++;
// now compare input string with first word from input file:
if (strlen(buf)==ii && strstr(fileBuf, buf) == fileBuf) {
printf("found a matching line: %s\n", fileBuf);
break;
}
}
// when you get here, fileBuf will contain the line you are interested in
// the second and third word of the line are what you are really after.
}
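Once the matching line is in fileBuf, one possible way to pull out the two words that follow is sscanf with an assignment-suppressed first field. This is only a sketch: the helper name words_after_first and the 64-byte buffer size are assumptions, not part of the code above.

```c
#include <stdio.h>

/* Hypothetical helper: copies the second and third whitespace-separated
 * words of line into second and third (each at least 64 bytes).
 * Returns 1 on success, 0 if the line has fewer than three words. */
int words_after_first(const char *line, char *second, char *third)
{
    /* %*s reads and discards the first word; %63s caps each copy. */
    return sscanf(line, "%*s %63s %63s", second, third) == 2;
}
```

It would be called as words_after_first(fileBuf, w2, w3) right after the loop, with w2 and w3 declared as char arrays of 64 bytes.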
Your recent update states that the file is really a database, in which you are looking for a word. This is very important.
If you have enough memory to hold the whole database, you should do just that (read the whole database and arrange it for efficient searching), so you should probably not ask about searching in a file.
Good database designs involve data structures like trie and hash table. But for a start, you could use the most basic improvement of the database - holding the words in alphabetical order (use the somewhat tricky qsort function to achieve that).
struct Database
{
size_t count;
struct Entry // not sure about C syntax here; I usually code in C++; sorry
{
char *word;
char *explanation;
} *entries;
};
char *find_explanation_of_word(struct Database* db, char *word)
{
for (size_t i = 0; i < db->count; i++)
{
int result = strcmp(db->entries[i].word, word);
if (result == 0)
return db->entries[i].explanation;
else if (result > 0)
break; // if the database is sorted, this means word is not found
}
return NULL; // not found
}
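If the entries are kept sorted, the standard library's bsearch() can replace the linear scan. The following is a sketch under assumptions: struct Entry here is a simplified stand-alone version of the nested struct above, and the names compare_entries, sort_entries and lookup are made up.

```c
#include <stdlib.h>
#include <string.h>

struct Entry {
    const char *word;
    const char *explanation;
};

/* Comparison function shared by qsort() and bsearch(). */
int compare_entries(const void *a, const void *b)
{
    const struct Entry *ea = a;
    const struct Entry *eb = b;
    return strcmp(ea->word, eb->word);
}

/* Sort once after loading the database. */
void sort_entries(struct Entry *entries, size_t count)
{
    qsort(entries, count, sizeof entries[0], compare_entries);
}

/* Binary search; entries must already be sorted. Returns NULL if absent. */
const char *lookup(const struct Entry *entries, size_t count, const char *word)
{
    struct Entry key = { word, NULL };
    const struct Entry *hit =
        bsearch(&key, entries, count, sizeof entries[0], compare_entries);
    return hit ? hit->explanation : NULL;
}
```

The same comparison function is used for sorting and searching, which is what guarantees that bsearch() sees the order it expects.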
If your database is too big to hold in memory, you should use a trie that holds just the beginnings of the words in the database; for each beginning of a word, have a file offset at which to start scanning the file.
char* find_explanation_in_file(FILE *f, long offset, char *word)
{
fseek(f, offset, SEEK_SET);
static char line[100]; // static so the returned pointer stays valid; 100 should be greater than max line in file
while (fgets(line, sizeof(line), f))
{
char *word_in_file = strtok(line, " ");
char *explanation = strtok(NULL, "");
int result = strcmp(word_in_file, word);
if (result == 0)
return explanation;
else if (result > 0)
break;
}
return NULL; // not found
}
I think what you need is fseek().
1) Pre-process the database file as follows. Find the positions of all the '\n' (newline) characters and store the offset that follows each one in an array, say a, so that you know the ith line starts at offset a[i] from the beginning of the file.
2) fseek() is a library function declared in stdio.h. So, when you need to process an input string, just start from the start of the file, and check the first word, only at the stored positions in the array a. To do that:
fseek(inFile , a[i] , SEEK_SET);
and then
fscanf(inFile, "%s %s %s", yourFirstWordHere, secondWord, thirdWord);
for checking the ith line.
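Or you could seek relative to the current position, which only works if nothing has been read since the previous seek (so the position is still a[i-1]):
fseek(inFile, a[i]-a[i-1], SEEK_CUR);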
Explanation: What fseek() does is, it sets the read/write position indicator associated with the file at the desired position. So, if you know at which point you need to read or write, you can just go there and read directly or write directly. This way, you won't need to read whole lines just to get first three words.
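The pre-processing step in 1) could be sketched as follows. The function name index_lines and its parameters are hypothetical. It uses ftell() to record the offset at the start of each line, so the stored values are exactly what fseek() with SEEK_SET expects later.

```c
#include <stdio.h>

/* Hypothetical helper: records the starting offset of each line of f in
 * a[0..max-1] and returns the number of offsets stored (at most max). */
size_t index_lines(FILE *f, long *a, size_t max)
{
    size_t n = 0;
    int at_line_start = 1;

    rewind(f);
    for (;;) {
        /* Only ask for the position when a new line is about to begin. */
        long pos = at_line_start ? ftell(f) : -1L;
        int c = getc(f);
        if (c == EOF)
            break;
        if (at_line_start && n < max)
            a[n++] = pos;
        at_line_start = (c == '\n');
    }
    return n;
}
```

After n = index_lines(inFile, a, max), the call fseek(inFile, a[i], SEEK_SET) positions the file at the first word of line i for any i below n.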
