Strtok strange behaviour


I'm having some troubles using strtok function.
As an exercise I have to deal with a text file by ruling out white spaces, transforming initials into capital letters and printing no more than 20 characters in a line.
Here is a fragment of my code:
fgets(sentence, SIZE, f1_ptr);
char *tok_ptr = strtok(sentence, " \n"); //tokenizing each line read
tok_ptr[0] = toupper(tok_ptr[0]); //initials to capital letters
int num = 0, i;
while (!feof(f1_ptr)) {
while (tok_ptr != NULL) {
for (i = num; i < strlen(tok_ptr) + num; i++) {
if (i % 20 == 0 && i != 0) //maximum of 20 char per line
fputc('\n', stdout);
fputc(tok_ptr[i - num], stdout);
}
num = i;
tok_ptr = strtok(NULL, " \n");
if (tok_ptr != NULL)
tok_ptr[0] = toupper(tok_ptr[0]);
}
fgets(sentence, SIZE + 1, f1_ptr);
tok_ptr = strtok(sentence, " \n");
if (tok_ptr != NULL)
tok_ptr[0] = toupper(tok_ptr[0]);
}
The text is just a bunch of lines I just show as a reference:
Watch your thoughts ; they become words .
Watch your words ; they become actions .
Watch your actions ; they become habits .
Watch your habits ; they become character .
Watch your character ; it becomes your destiny .
Here is what I obtain in the end:
WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;THeyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacteR.Wat
chYourCharacter;ItBe
comesYourDEstiny.Lao
-Tze
The final result is mostly correct, but sometimes (for example "they" in they become (and only in that case) or "destiny") words are not correctly tokenized. So for example "they" is split into "t" and "hey" resulting in THey (DEstiny in the other instance) after the manipulations I made.
Is it some bug or am I missing something? Probably my code is not that efficient and some condition may end up being critical...
Thank you for the help, it's not that big of a deal, I just don't understand why such a behaviour is occurring.

Your code has a number of errors, and you are over-complicating the problem. The most pressing error is the one explained in "Why is while ( !feof (file) ) always wrong?". Trace the execution path within your loop: you attempt to read with fgets(), and then you use sentence without knowing whether EOF was reached, calling tok_ptr = strtok(sentence, " \n"); before you ever get around to checking feof(f1_ptr).
What happens when you actually reach EOF? The final fgets() fails and leaves sentence holding stale data, which you then tokenize anyway. That is why while ( !feof (file) ) is always wrong. Instead, always control your read-loop with the return of the read function you are using, e.g. while (fgets(sentence, SIZE, f1_ptr) != NULL)
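As a minimal sketch of that pattern (count_lines is a hypothetical helper name, and the 1024-byte buffer is an arbitrary assumption), the loop body below runs only after a read has actually succeeded, so no separate feof() test is needed:

```c
#include <stdio.h>

/* Count lines by letting the return of fgets() control the loop.
 * The body runs only when a read actually succeeded.
 * Note: a line longer than the buffer is counted once per chunk. */
size_t count_lines (FILE *fp)
{
    char buf[1024];
    size_t n = 0;

    while (fgets (buf, sizeof buf, fp) != NULL)
        n++;

    return n;
}
```

When fgets() returns NULL the loop simply ends; feof() or ferror() can then be used afterwards to tell end-of-file from a read error if that distinction matters.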
What is it you actually need your code to do?
The larger question is why are you over-complicating the problem with strtok, and arrays (and fgets() for that matter)? Think about what you need to do:
read each character in the file,
if it is whitespace, ignore it, set the in-word flag false,
if a non-whitespace, if 1st char in word, capitalize it, output the char, set the in-word flag true and increment the number of chars output to the current line, and finally
if it is the 20th character output, output a newline and reset the counter zero.
The bare-minimum tools you need from your C-toolbox are fgetc(), isspace() and toupper() from ctype.h, a counter for the number of characters output, and a flag to know if the character is the first non-whitespace character after a whitespace.
Implementing the logic
That makes the problem very simple. Read a character. Is it whitespace? Set your in-word flag false. Otherwise, if your in-word flag is false, capitalize it; output the character, set your in-word flag true, and increment your character count. The last thing you need to do is check whether your character count has reached the limit; if so, output a '\n' and reset your character count to zero. Repeat until you run out of characters.
You can turn that into code with something similar to the following:
#include <stdio.h>
#include <ctype.h>
#define CPL 20 /* chars per-line, if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
int c, in = 0, n = 0; /* char, in-word flag, no. of chars output in line */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while ((c = fgetc(fp)) != EOF) { /* read / validate each char in file */
if (isspace(c)) /* char is whitespace? */
in = 0; /* set in-word flag false */
else { /* otherwise, not whitespace */
putchar (in ? c : toupper(c)); /* output char, capitalize 1st in word */
in = 1; /* set in-word flag true */
n++; /* increment character count */
}
if (n == CPL) { /* CPL limit reached? */
putchar ('\n'); /* output newline */
n = 0; /* reset cpl counter */
}
}
putchar ('\n'); /* tidy up with newline */
if (fp != stdin) /* close file if not stdin */
fclose (fp);
}
Example Use/Output
Given your input file stored on my computer in dat/text220.txt, you can produce the output you are looking for with:
$ ./bin/text220 dat/text220.txt
WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;TheyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacter.Wat
chYourCharacter;ItBe
comesYourDestiny.
(the executable for the code was compiled to bin/text220, I usually keep separate dat, obj, and bin directories for data, object files and executables to keep my source code directory clean)
note: by reading from stdin by default if no filename is provided as the first argument to the program, you can use your program to read input directly, e.g.
$ echo "my dog has fleas - bummer!" | ./bin/text220
MyDogHasFleas-Bummer
!
No fancy string functions required, just a loop, a character, a flag and a counter -- the rest is just arithmetic. It's always worth trying to boil your programming problems down to basic steps and then look around your C-toolbox and find the right tool for each basic step.
Using strtok
Don't get me wrong, there is nothing wrong with using strtok and it makes a fairly simple solution in this case -- the point I was making is that for simple character-oriented string-processing, it's often just as simple to loop over the characters in the line. You don't gain any efficiencies using fgets() with an array and strtok(); the read from the file is already placed into a buffer of BUFSIZ (see footnote 1).
If you did want to use strtok(), you should control your read-loop with the return from fgets() and then tokenize with strtok(), also checking its return at each point: a read-loop with fgets() and a tokenization loop with strtok(). Then you handle first-character capitalization and limit your output to 20 chars per line.
You could do something like the following:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define CPL 20 /* chars per-line, if you need a constant, #define one (or more) */
#define MAXC 1024
#define DELIM " \t\r\n"
void putcharCPL (int c, int *n)
{
if (*n == CPL) { /* if n == limit */
putchar ('\n'); /* output '\n' */
*n = 0; /* reset value at mem address to 0 */
}
putchar (c); /* output character */
(*n)++; /* increment value at mem address */
}
int main (int argc, char **argv) {
char line[MAXC]; /* buffer to hold each line */
int n = 0; /* no. of chars output in line */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (fgets (line, MAXC, fp)) /* read each line and tokenize line */
for (char *tok = strtok (line, DELIM); tok; tok = strtok (NULL, DELIM)) {
putcharCPL (toupper(*tok), &n); /* convert 1st char to upper */
for (int i = 1; tok[i]; i++) /* output rest unchanged */
putcharCPL (tok[i], &n);
}
putchar ('\n'); /* tidy up with newline */
if (fp != stdin) /* close file if not stdin */
fclose (fp);
}
(same output)
The putcharCPL() function is just a helper that checks if 20 characters have been output and if so outputs a '\n' and resets the counter. It then outputs the current character and increments the counter by one. A pointer to the counter is passed so it can be updated within the function making the updated value available back in main().
Look things over and let me know if you have further questions.
footnotes:
1. Depending on your version of glibc, the constant in the source setting the read-buffer size may be _IO_BUFSIZ. _IO_BUFSIZ was changed to BUFSIZ here: glibc commit 9964a14579e5eef9. For Linux BUFSIZ is defined as 8192 (512 on Windows).

This is actually a much more interesting OP from a professional point of view than some of the comments may suggest, despite the 'newcomer' aspect of the question, which may sometimes raise fairly deep, underestimated issues.
The fun thing is that on my platform (W10, MSYS2, gcc v.10.2), your code runs fine with correct results:
WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;TheyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacter.Wat
chYourCharacter;ItBe
comesYourDestiny.
So first, congratulations, newcomer: your coding is not that bad.
This points to how different compilers may or may not protect against limited inappropriate coding or specification misuse, may or may not protect stacks or heaps.
That said, the comment by @Andrew Henle pointing to an illuminating answer about feof is quite relevant.
If you follow it and move your feof test down so that it comes after the read checks rather than before them (as below), your code should yield better results (note: I will alter your code only minimally, deliberately ignoring lesser issues):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <ctype.h>
#define SIZE 100 // add some leeway to avoid off-by-one issues
int main()
{
FILE* f1_ptr = fopen("C:\\Users\\Public\\Dev\\test_strtok", "r");
if (! f1_ptr)
{
perror("Open issue");
exit(EXIT_FAILURE);
}
char sentence[SIZE] = {0};
if (NULL == fgets(sentence, SIZE, f1_ptr))
{
perror("fgets issue"); // implementation-dependent
exit(EXIT_FAILURE);
}
errno = 0;
char *tok_ptr = strtok(sentence, " \n"); //tokenizing each line read
if (tok_ptr == NULL || errno)
{
perror("first strtok parse issue");
exit(EXIT_FAILURE);
}
tok_ptr[0] = toupper(tok_ptr[0]); //initials to capital letters
int num = 0;
size_t i = 0;
while (1) {
while (1) {
for (i = num; i < strlen(tok_ptr) + num; i++) {
if (i % 20 == 0 && i != 0) //maximum of 20 char per line
fputc('\n', stdout);
fputc(tok_ptr[i - num], stdout);
}
num = i;
tok_ptr = strtok(NULL, " \n");
if (tok_ptr == NULL) break;
tok_ptr[0] = toupper(tok_ptr[0]);
}
if (NULL == fgets(sentence, SIZE, f1_ptr)) // let's do away with the annoying +1,
// we have enough headroom
{
if (feof(f1_ptr))
{
fprintf(stderr, "\n%s\n", "Found EOF");
break;
}
else
{
perror("Unexpected fgets issue in loop"); // implementation-dependent
exit(EXIT_FAILURE);
}
}
errno = 0;
tok_ptr = strtok(sentence, " \n");
if (tok_ptr == NULL)
{
if (errno)
{
perror("strtok issue in loop");
exit(EXIT_FAILURE);
}
break;
}
tok_ptr[0] = toupper(tok_ptr[0]);
}
return 0;
}
$ ./test
WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;TheyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacter.Wat
chYourCharacter;ItBe
comesYourDestiny.
Found EOF
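A note on why the extra headroom in SIZE matters, offered as an inference rather than something the question confirms (the original SIZE is not shown): when the buffer passed to fgets() is shorter than a line, fgets() returns the line in pieces, and a piece boundary can fall in the middle of a word. strtok() then sees the two halves as separate tokens and each half gets its first letter capitalized, which would produce output such as THey or DEstiny. The sketch below uses a hypothetical helper, count_reads(), to show how many fgets() calls one stream takes with a given buffer size.

```c
#include <stdio.h>

/* Count how many fgets() calls are needed to consume the stream when
 * each call may store at most size - 1 characters.
 * Returns -1 if size does not fit the local buffer. */
int count_reads (FILE *fp, int size)
{
    char buf[128];
    int reads = 0;

    if (size < 2 || size > (int)sizeof buf)
        return -1;

    while (fgets (buf, size, fp))
        reads++;

    return reads;
}
```

For the line "Watch your words ; they become actions ." a size of 100 needs one read, while a size of 21 needs three, and the first piece ends at "Watch your words ; t", leaving "hey" to start the next piece.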

Related

C program to read file reading an extra line

The code I'm working on involves reading a file w/ input structured as the following:
(spaces)name(spaces) val (whatever) \n
(spaces)name(spaces) val (whatever) \n
(spaces)name(spaces) val (whatever) \n
Where spaces denotes an arbitrary amount of white space. My code is supposed to give both the name and the value. There is another condition, where everything on the line after a '#' is ignored (treated like a comment). The output is supposed to be:
"name: (name) value: val \n"
For the most part the code is working, except that it adds an extra line where it sets name to null and val to whatever the last number read was. For example, my test file:
a 12
b 33
#c 15
nice 6#9
The output is:
Line after: a 12
name: a value: 12 :
Line after: b 33
name: b value: 33 :
Line after: # c 15
Line after: nice 6#9
name: nice value: 6 :
Line after:
name: value: 6 : //why is this happening
The code is here.
void readLine(char *filename)
{
FILE *pf;
char name[10000];
char value[20];
pf = fopen(filename, "r");
char line[10000];
if (pf){
while (fgets(line, sizeof(line), pf) != NULL) {
//printf("Line: %s\n",line);
printf("Line after: %s\n",line);
while(true){
int i=0;
char c=line[i]; //parse every char of the line
//assert(c==' ');
int locationS=0; //index in the name
int locationV=0; //index in the value
while((c==' ')&& i<sizeof(line)){
//look for next sequence of chars
++i;
c=line[i];
if(c=='#'){
break;
}
}
if(c=='#'){ break;}
assert(c!=' ');
while (c!=' '&&i<sizeof(line))
{
name[locationS]=c;
locationS++;
//printf("%d",locationS);
++i;
c=line[i];if(c=='#'){
break;
}
}
if(c=='#'){ break;}
assert(c==' ');
while(c==' '&&i<sizeof(line)){
//look for next sequence of chars
++i;
c=line[i];
if(c=='#'){
break;
}
}
if(c=='#'){ break;}
assert(c!=' ');
printf("\n");
while ((c!=' '&& c!='\n')&&i<sizeof(line))
{
value[locationV]=c;
locationV++;
++i;
c=line[i];if(c=='#'){
break;
}
}
printf("name: %s value: %s : \n",name, value);
memset(&name[0], 0, sizeof(name));
memset(&value[0], 0, sizeof(value));
break; //nothing interesting left
}
}
fclose(pf);
}else{
printf("Error in file\n");
exit(EXIT_FAILURE);
}
}

Pasha, you are doing some things correctly, but then you are making what you are trying to do much more difficult than need be. First, avoid using magic-numbers in your code, such as char name[10000];. Instead:
...
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
char line[MAXC];
...
(you did very good following the rule Don't skimp on Buffer Size :)
Likewise you have done well in opening the file and validating the file is open for reading before attempting to read from it with fgets(). You can do that validation in a single block and handle the error at that time -- which will have the effect of reducing one-level of indention throughout the rest of your code, e.g.
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
Now with the file open and validated that it is open for reading and any error handled, you can proceed to reading each line in your file. Unless you are storing the names in an array that needs to survive your read loop, you can simply declare name[MAXC]; within the read-loop block, e.g.
while (fgets (line, MAXC, fp)) { /* read each line of input */
char name[MAXC]; /* storage for name */
int val; /* integer value for val */
(note: rather than declare another array to hold value, we have simply declared val as an int and will use sscanf to parse name and val converting the value directly to int at that time)
Anytime you are using a line-oriented input function (like fgets() or POSIX getline()), you will want to trim the '\n' read and included in the buffer that is filled. You can do that easily with strcspn, see strspn(3) - Linux manual page. It is a simple, single call where you use the return from strcspn as the index for the '\n' in order to overwrite the '\n' with the nul-terminating character (which is '\0', or simply 0). If no '\n' is present, strcspn returns the length of the string, so the assignment harmlessly rewrites the existing terminator.
line[strcspn (line, "\n")] = 0; /* trim '\n' from end of line */
Now all you need to do is check for the presence of the first '#' in line that marks the beginning of a comment. If found, you will simply overwrite '#' with the nul-terminating character as you did for the '\n', e.g.
line[strcspn (line, "#")] = 0; /* overwrite '#' with nul-char */
Now that you have your line and have removed the '\n' and any comment that may be present, you can check that line isn't empty (meaning it began with a '#' or was simply an empty line containing only a '\n')
if (!*line) /* if empty-string */
continue; /* get next line */
(note: if (!*line) is simply shorthand for if (line[0] == 0). When you dereference your buffer, e.g. *line, you are simply returning the first element (first char), as *line == *(line + 0) in pointer notation, which is equivalent to *(line + 0) == line[0] in array-index notation. The [] operates as a dereference as well.)
Now simply parse for the name and val directly from line using sscanf. Both the "%s" and "%d" conversion specifiers will ignore all leading whitespace before the conversion specifier. You can use this simple method so long as name itself does not contain whitespace. Just as you validate the return of your file opening, you will validate the return of sscanf to determine if the number of conversions you specified successfully took place. For example:
if (sscanf (line, "%1023s %d", name, &val) == 2) /* have name/value? */
printf ("\nline: %s\nname: %s\nval : %d\n", line, name, val);
else
printf ("\nline: %s (doesn't contain name/value)\n", line);
(note: by using the field-width modifier for your string, e.g. "%1023s" you protect your array-bounds for name. The field width limits sscanf from writing more than 1023 char + \0 to name. This cannot be provided by a variable or by a macro and is one of the occasions where you must stick a magic-number in your code... For every rule there is generally a caveat or two...)
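As a side note beyond the original answer: the width cannot be a runtime variable (scanf has no counterpart to printf's * width argument), but a common preprocessor idiom can derive it from a constant by stringification. The macro names STR_, STR and NAMEMAX and the helper parse_name_val() below are illustrative assumptions, not part of the answer's code.

```c
#include <stdio.h>

#define NAMEMAX 31            /* assumed max name length, excluding '\0' */
#define STR_(x) #x            /* turn the argument into a string literal */
#define STR(x)  STR_(x)       /* expand the macro first, then stringify  */

/* Parse "name value" from line. name must have room for NAMEMAX + 1
 * chars. Returns 1 if both conversions succeeded, 0 otherwise. */
int parse_name_val (const char *line, char *name, int *val)
{
    /* the format below is assembled at compile time as "%31s %d" */
    return sscanf (line, "%" STR(NAMEMAX) "s %d", name, val) == 2;
}
```

The trade-off is that the constant must then be the maximum string length rather than the array size, so the array is declared as char name[NAMEMAX + 1].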
If you asked for 2 conversions, and sscanf returned 2, then you know that both the requested conversions were successful. Further, since for val you have specified an integer conversion, you are guaranteed that val will contain an integer.
That's all there is to it. All that remains is closing the file (if not reading from stdin) and you are done. A full example could be:
#include <stdio.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
char line[MAXC];
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (fgets (line, MAXC, fp)) { /* read each line of input */
char name[MAXC]; /* storage for name */
int val; /* integer value for val */
line[strcspn (line, "\n")] = 0; /* trim '\n' from end of line */
line[strcspn (line, "#")] = 0; /* overwrite '#' with nul-char */
if (!*line) /* if empty-string */
continue; /* get next line */
if (sscanf (line, "%1023s %d", name, &val) == 2) /* have name/value? */
printf ("\nline: %s\nname: %s\nval : %d\n", line, name, val);
else
printf ("\nline: %s (doesn't contain name/value)\n", line);
}
if (fp != stdin) /* close file if not stdin */
fclose (fp);
}
(note: if you want to print the raw line before trimming the '\n' and comments, just move the printing of line before the calls to strcspn. Above line is printed showing the final state of line before the call to sscanf)
Example Use/Output
Using your input file stored in dat/nameval.txt on my system, you could simply do the following to read values redirected from stdin:
$ ./bin/parsenameval <dat/nameval.txt
line: a 12
name: a
val : 12
line: b 33
name: b
val : 33
line: nice 6
name: nice
val : 6
(note: just remove the redirection < to actually open and read from the file rather than having the shell do it for you. Six-to-one, half-dozen to another.)
Look things over and let me know if you have further questions. If for some reason you cannot use any function to help you parse the line and must use only pointers or array-indexing, let me know. Following the approach above, it takes only a little effort to replace each of the operations with its manual equivalent.
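For reference, a manual equivalent might look like the sketch below. It is only one possible arrangement: parse_manual() is a hypothetical name, the line is modified in place, and strtol() is still used for the numeric conversion (an assumption that at least that library call is allowed).

```c
#include <ctype.h>
#include <stdlib.h>

/* Manually split "  name   val # comment" in place.
 * On success stores the start of the name in *name and the number in
 * *val, and returns 1. Returns 0 for empty, comment-only, or
 * incomplete lines. */
int parse_manual (char *line, char **name, long *val)
{
    char *p = line, *end;

    for (char *q = p; *q; q++)                 /* cut at '#' or '\n' */
        if (*q == '#' || *q == '\n') {
            *q = 0;
            break;
        }

    while (isspace ((unsigned char)*p))        /* skip leading whitespace */
        p++;
    if (!*p)                                   /* nothing left on the line */
        return 0;

    *name = p;                                 /* name starts here */
    while (*p && !isspace ((unsigned char)*p)) /* find end of name */
        p++;
    if (!*p)                                   /* name only, no value */
        return 0;
    *p++ = 0;                                  /* nul-terminate the name */

    *val = strtol (p, &end, 10);               /* skips whitespace itself */
    return end != p;                           /* 1 if digits were converted */
}
```

Because the empty and comment-only cases return 0 before anything is printed, a caller that checks the return value avoids the stray "name: value: 6" line from the question.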

How to fscanf word by word in a file?

I have a file with a series of words separated by a white space. For example file.txt contains this: "this is the file". How can I use fscanf to take word by word and put each word in an array of strings?
Then I did this but I don't know if it's correct:
char *words[100];
int i=0;
while(!feof(file)){
fscanf(file, "%s", words[i]);
i++;
fscanf(file, " ");
}

When reading repeated input, you control the input loop with the input function itself (fscanf in your case). While you can also loop continually (e.g. for (;;) { ... }) and check independently whether the return is EOF, whether a matching failure occurred, or whether the return matches the number of conversion specifiers (success), in your case simply checking that the return matches the single "%s" conversion specifier is fine (e.g. that the return is 1).
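The continual-loop variant with separate checks can be sketched as follows. The example reads integers rather than words, because "%s" cannot produce a matching failure and so would not show the middle case; sum_ints() and its parameters are hypothetical names used only for illustration.

```c
#include <stdio.h>

/* Read whitespace-separated integers until EOF or a matching failure.
 * Returns the number of values read and stores their total in *sum.
 * *failed is set to 1 if reading stopped on input that is not an
 * integer, and to 0 if it stopped at end of input. */
size_t sum_ints (FILE *fp, long *sum, int *failed)
{
    size_t n = 0;

    *sum = 0;
    *failed = 0;

    for (;;) {
        long v;
        int rtn = fscanf (fp, "%ld", &v);

        if (rtn == EOF)         /* end of input (or read error) */
            break;
        if (rtn != 1) {         /* matching failure, bad input left unread */
            *failed = 1;
            break;
        }
        *sum += v;              /* success: one conversion took place */
        n++;
    }

    return n;
}
```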
Storing each word in an array, you have several options. The most simple is using a 2D array of char with automatic storage. Since the longest non-medical word in the Unabridged Dictionary is 29-characters (requiring a total of 30-characters with the nul-terminating character), a 2D array with a fixed number of rows and fixed number of columns of at least 30 is fine. (dynamically allocating allows you to read and allocate memory for as many words as may be required -- but that is left for later.)
So to set up storage for 128 words, you could do something similar to the following:
#include <stdio.h>
#define MAXW 32 /* if you need a constant, #define one (or more) */
#define MAXA 128
int main (int argc, char **argv) {
char array[MAXA][MAXW] = {{""}}; /* array to store up to 128 words */
size_t n = 0; /* word index */
Now simply open your filename provided as the first argument to the program (or read from stdin by default if no argument is given), and then validate that your file is open for reading, e.g.
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
Now to the crux of your read-loop. Simply loop checking the return of fscanf to determine success/failure of the read, adding words to your array and incrementing your index on each successful read. You must also include in your loop-control a check of your index against your array bounds to ensure you do not attempt to write more words to your array than it can hold, e.g.
while (n < MAXA && fscanf (fp, "%31s", array[n]) == 1) /* %31s protects the MAXW bound */
n++;
That's it, now just close the file and use your words stored in your array as needed. For example just printing the stored words you could do:
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (size_t i = 0; i < n; i++)
printf ("array[%3zu] : %s\n", i, array[i]);
return 0;
}
Now just compile it with warnings enabled (e.g. -Wall -Wextra -pedantic for gcc/clang, or /W3 for VS cl.exe) and then test on your file. The full code is:
#include <stdio.h>
#define MAXW 32 /* if you need a constant, #define one (or more) */
#define MAXA 128
int main (int argc, char **argv) {
char array[MAXA][MAXW] = {{""}}; /* array to store up to 128 words */
size_t n = 0; /* word index */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (n < MAXA && fscanf (fp, "%31s", array[n]) == 1) /* %31s protects the MAXW bound */
n++;
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (size_t i = 0; i < n; i++)
printf ("array[%3zu] : %s\n", i, array[i]);
return 0;
}
Example Input File
$ cat dat/thefile.txt
this is the file
Example Use/Output
$ ./bin/fscanfsimple dat/thefile.txt
array[ 0] : this
array[ 1] : is
array[ 2] : the
array[ 3] : file
Look things over and let me know if you have further questions.

strtok() might be a function that can help you here.
If you know that the words will be separated by whitespace, then calling strtok will return the char pointer to the start of the next word.
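Sample code adapted from https://www.systutorials.com/docs/linux/man/3p-strtok/ (line is declared as an array here, because strtok writes to the string and a string literal must not be modified):
#include <string.h>
...
char *token;
char line[] = "LINE TO BE SEPARATED";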
char *search = " ";
/* Token will point to "LINE". */
token = strtok(line, search);
/* Token will point to "TO". */
token = strtok(NULL, search);
In your case, the space character would also act as a delimiter in the line.
Note that strtok modifies the string passed in (it overwrites each delimiter it finds with a nul-character), so if you need the original afterwards, make a copy first, e.g. with malloc and strcpy.
It might also be easier to use fread() to read a block from a file.
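A minimal sketch of the copy-then-tokenize idea, assuming you only need to count the tokens (count_tokens() is a hypothetical helper name):

```c
#include <stdlib.h>
#include <string.h>

/* Count the tokens in s without modifying s: strtok() works on a
 * private heap copy, which is freed before returning.
 * Returns 0 if the copy cannot be allocated. */
size_t count_tokens (const char *s, const char *delim)
{
    size_t n = 0;
    char *copy = malloc (strlen (s) + 1);

    if (!copy)
        return 0;
    strcpy (copy, s);

    for (char *tok = strtok (copy, delim); tok; tok = strtok (NULL, delim))
        n++;

    free (copy);
    return n;
}
```

Because the function tokenizes its own copy, it can safely be given a string literal, e.g. count_tokens ("LINE TO BE SEPARATED", " ").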

As mentioned in comments, using feof() does not work as would be expected. And, as described in this answer unless the content of the file is formatted with very predictable content, using any of the scanf family to parse out the words is overly complicated. I do not recommend using it for that purpose.
There are many other, better ways to read content of a file, word by word. My preference is to read each line into a buffer, then parse the buffer to extract the words. This requires determining those characters that may be in the file, but would not be considered part of a word. Characters such as \n,\t, (space), -, etc. should be considered delimiters, and can be used to extract the words. The following is a recipe for extracting words from a file: (example code for a few of the items is included below these steps.)
Read file to count words, and get the length of the longest word.
Use count, and longest values from 1st step to allocate memory for words.
Rewind the file.
Read file line by line into a line buffer using while(fgets(line, size, fp))
Parse each new line into words using delimiters and store each word into arrays of step 2.
Use resulting array of words as necessary.
free all memory allocated when finished with arrays
Some example of code to do some of these tasks:
// Get count of words, and longest word in file (needs <stdio.h> and <ctype.h>)
int longestWord(char *file, int *nWords)
{
    FILE *fp=0;
    int cnt=0, longest=0, numWords=0;
    int c;
    fp = fopen(file, "r");
    if(fp)
    {
        while ( (c = fgetc(fp) ) != EOF )
        {
            if ( isalnum (c) ) cnt++;
            else if ( cnt > 0 )   // a delimiter ends the current word;
            {                     // runs of delimiters are counted only once
                if (cnt > longest) longest = cnt;
                cnt = 0;
                numWords++;
            }
        }
        if ( cnt > 0 )            // final word not followed by a delimiter
        {
            if (cnt > longest) longest = cnt;
            numWords++;
        }
        *nWords = numWords;
        fclose(fp);
    }
    else return -1;
    return longest;
}
// Create indexable memory for word arrays
char ** Create2DStr(ssize_t numStrings, ssize_t maxStrLen)
{
int i;
char **a = {0};
a = calloc(numStrings, sizeof(char *));
for(i=0;i<numStrings; i++)
{
a[i] = calloc(maxStrLen + 1, 1);
}
return a;
}
Usage: For a file with 25 words, the longest being 80 bytes:
char **strArray = Create2DStr(25, 80+1);//creates 25 array locations
//each 80+1 characters long
//(+1 is room for null terminator.)
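Steps 4 and 5 of the recipe could be sketched roughly as follows. The delimiter set, the 1024-byte line buffer and the name fill_words() are assumptions made for illustration; words is expected to be an array such as the one returned by Create2DStr, with each entry holding at least max_len + 1 bytes.

```c
#include <stdio.h>
#include <string.h>

#define WORD_DELIMS " \t\r\n-.,;:!?"   /* assumed delimiters, adjust to your data */

/* Read fp line by line and copy up to max_words words into words[],
 * each truncated to max_len characters and nul-terminated.
 * Returns the number of words stored. A line longer than the local
 * buffer is read in pieces, so a word at a piece boundary may split. */
size_t fill_words (FILE *fp, char **words, size_t max_words, size_t max_len)
{
    char line[1024];
    size_t n = 0;

    while (n < max_words && fgets (line, sizeof line, fp))
        for (char *tok = strtok (line, WORD_DELIMS);
             tok && n < max_words;
             tok = strtok (NULL, WORD_DELIMS)) {
            strncpy (words[n], tok, max_len);   /* copy at most max_len chars */
            words[n][max_len] = '\0';           /* always terminate */
            n++;
        }

    return n;
}
```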

int i=0;
char words[50][50];
while(i < 50 && fscanf(file, "%49s", words[i]) == 1)
i++;
I wouldn't entirely recommend doing it this way, because of the unknown amount of words in the file, and the unknown length of a "word". Either can be over the size of '50'. Just do it dynamically, instead. Still, this should show you how it works.

How can I use fscanf to take word by word and put each word in an array of strings?
Read each word twice: first to find length via "%n". 2nd time, save it. (Inefficient yet simple)
Re-size strings as you go. Again inefficient, yet simple.
// Rough untested sample code - still need to add error checking.
size_t string_count = 0;
char **strings = NULL;
for (;;) {
long pos = ftell(file);
int n = 0;
fscanf(file, "%*s%n", &n); // record where scanning a "word" stopped
if (n == 0) break;
fseek(file, pos, SEEK_SET); // go back;
strings = realloc(strings, sizeof *strings * (string_count+1));// increase array size
strings[string_count] = malloc(n + 1u); // Get enough memory for the word
fscanf(file, "%s ", strings[string_count] ); // read/save word
string_count++; // one more word stored
}
// use strings[], string_count
// When done, free each strings[] and then strings

Confused about task with checking and compering last three words in sentence

I have a problem which I can't fix: I need to compare the last three words of the first sentence with the last three words of the fourth sentence.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main() {
char firstRow[256];
char secondRow[256];
char thirdRow[256];
char fourthRow[256];
printf("Enter four row of lyrcis:\n");
gets(firstRow);
gets(secondRow);
gets(thirdRow);
gets(fourthRow);
if ( strcmp(a1+strlen(a1)-1, a4+strlen(a4)-1) &&
strcmp(a1+strlen(a1)-2, a4+strlen(a4)-2) &&
strcmp(a1+strlen(a1)-3, a4+strlen(a4)-3) == 0 ){
printf("Good job last three words of first and fourth sentence are same");
}
else {
printf("nothing");
}
return 0;
}
This is what I tried, but obviously the problem is that I can't use if like that; with only one strcmp it works. Maybe I need the strcpy command? Help!

First -- Do not use 'gets'. It is horribly insecure. There is no limitation on the number of characters it will read or whether the size of the buffer you provide has adequate storage. That allows for buffer overrun exploits and is the primary reason it has been dropped from the C library. If your professor insists on using it -- find a new professor.
The other problem you have is failing to validate each step in your process. You fail to check if gets actually read anything before passing the pointers to strcmp or strlen.
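A minimal sketch of a bounded, validated replacement for each gets() call, assuming the trailing newline should be dropped (read_line() is a hypothetical helper name):

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf, storing at most size - 1 characters, and
 * strip the trailing '\n' if one was read.
 * Returns 1 on success, 0 on EOF or read error. */
int read_line (char *buf, int size, FILE *fp)
{
    if (!fgets (buf, size, fp))        /* validate the read */
        return 0;

    buf[strcspn (buf, "\n")] = '\0';   /* trim '\n' (no-op if absent) */
    return 1;
}
```

A caller would then write something like if (!read_line (firstRow, sizeof firstRow, stdin)) and handle the failure before touching the buffer.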
Further, your indexing is nonsense. strlen(x) - n doesn't index the end - n word in the buffer. For that you have to tokenize the string (split it into words). There are a number of ways to do it.
One method that works no matter what is simply finding the end of the string (e.g. strlen(line) - 1) and using a pointer to iterate from the end of the string towards the start until your first whitespace is found (or you reach the beginning).
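That backward walk might be sketched like this, assuming words are separated by plain spaces only (last_word() is a hypothetical name):

```c
#include <string.h>

/* Return a pointer to the start of the last space-separated word in s,
 * found by walking backwards from the end of the string. Trailing
 * spaces are skipped, so the returned word may be followed by spaces.
 * For an empty or all-space string the result points at s itself. */
const char *last_word (const char *s)
{
    const char *p = s + strlen (s);     /* one past the last character */

    while (p > s && p[-1] == ' ')       /* skip trailing spaces */
        p--;
    while (p > s && p[-1] != ' ')       /* back up to the start of the word */
        p--;

    return p;
}
```

Repeating the walk after writing a nul-character over the space in front of the word found would yield the second-to-last word, and so on.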
The C library (in string.h) provides strrchr, which automates that process for you. It will start at the end of a string and iterate backwards until it finds the first occurrence of the character you tell it to find, returning a pointer to that character (or returning NULL if that character is not found). The only downside here is that you are limited to searching for a single character.
The C library (in string.h) provides strtok, which does not provide for a reverse search, but does provide the ability to split a string based on a set of delimiters you provide (e.g. it could handle splitting on any one of space, tab, '.', etc..). Here you simply store the pointers to each of the words (or a copy of the words) and take the last 3 indexes for comparison.
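A sketch of that strtok approach, keeping only a sliding window of the most recent n word pointers (last_n_words() and its parameters are hypothetical names, and the line is modified in place):

```c
#include <string.h>

/* Tokenize line in place and store pointers to its last n words in
 * last[0..n-1], earliest first. Returns the number of words stored,
 * which is less than n when the line has fewer than n words. */
size_t last_n_words (char *line, const char *delims, char *last[], size_t n)
{
    size_t count = 0;

    if (n == 0)
        return 0;

    for (char *tok = strtok (line, delims); tok; tok = strtok (NULL, delims)) {
        if (count == n) {               /* window full: drop the oldest word */
            memmove (last, last + 1, (n - 1) * sizeof *last);
            count--;
        }
        last[count++] = tok;            /* append the newest word */
    }

    return count;
}
```

Calling it once for each line and comparing the two pointer arrays element by element with strcmp then answers whether the last three words match.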
The following provides an example that uses strrchr, presuming your words are separated by one (or more) spaces. Both the method used below and strtok modify the original string, so make a copy of a string before parsing if you need the original afterwards or if it is stored in read-only memory (e.g. a string literal).
The program expects the filename to read to be provided as the first argument (or it will read from stdin if no argument is provided). Additional comments are provided in the code, in-line, below:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#ifndef BUF_SIZ /* if you need constants... define them */
#define BUF_SIZ 8192 /* don't put 'magic' numbers in code */
#endif
#define NLAST 3
int main (int argc, char **argv) {
size_t len = 0;
char line1[BUF_SIZ] = "",
line4[BUF_SIZ] = "",
*last[NLAST] = { NULL };
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
if (fgets (line1, BUF_SIZ, fp) == NULL) { /* read 1st line */
fprintf (stderr, "error: failed to read line1.\n");
return 1;
}
for (int i = 0; i < NLAST; i++) /* read/discard lines 2,3 read line 4 */
if (fgets (line4, BUF_SIZ, fp) == NULL) {
fprintf (stderr, "error: failed to read line4.\n");
return 1;
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
len = strlen (line1); /* get length of line1 */
if (len && line1[len-1] == '\n') /* validate last is '\n' */
line1[--len] = 0; /* overwrite with nul-character */
else { /* error: handle line too long or no POSIX EOL */
fprintf (stderr, "error: line1 too long or no POSIX EOL.\n");
return 1;
}
len = strlen (line4); /* same thing for line4 */
if (len && line4[len-1] == '\n')
line4[--len] = 0;
else {
fprintf (stderr, "error: line4 too long or no POSIX EOL.\n");
return 1;
}
if (!*line1 || !*line4) { /* test if either is empty-string */
fprintf (stderr, "error: one or both line(s) empty.\n");
return 1;
}
for (int i = 0; i < NLAST; i++) { /* loop NLAST times */
char *p1 = strrchr (line1, ' '), /* get pointer to last ' ' */
*p4 = strrchr (line4, ' ');
if (!p1) { /* validate result of strrchr */
if (i < NLAST - 1) { /* if not last iteration - handle error */
fprintf (stderr, "error: only '%d' words in line1.\n", i+1);
return 1;
}
else /* if last iteration, assign line to pointer */
p1 = line1;
}
if (!p4) { /* same for line4 */
if (i < NLAST - 1) {
fprintf (stderr, "error: only '%d' words in line4.\n", i+1);
return 1;
}
else
p4 = line4;
}
/* copy to last array in order - checking if p1 is beginning of line */
last[NLAST - 1 - i] = p1 == line1 ? p1 : p1 + 1;
while (p1 > line1 && *p1 == ' ') /* nul-terminate at space */
*p1-- = 0;
while (p4 > line4 && *p4 == ' ')
*p4-- = 0;
}
printf ("\nthe last %d words in lines 1 & 4 are the same:\n", NLAST);
for (int i = 0; i < NLAST; i++)
printf (" %s\n", last[i]);
return 0;
}
Example Input File
$ cat dat/last14-3.txt
My dog has fleas
My snake has none
The cats have none
Cats are lucky because the dog has fleas
Example Use/Output
$ ./bin/lines1_4_last3 < dat/last14-3.txt
the last 3 words in lines 1 & 4 are the same:
dog
has
fleas
Regardless of which method you choose to tokenize the lines, you must validate each step along the way. Look things over and make sure you understand why each validation was necessary. If not, just ask and I'm happy to help further.

How to get the strings from a file and store in a 2D char array and compare that 2D char array with a string in C?

I have a text file containing values (I usually call them upc_values) of
01080006210
69685932764
40000114485
40000114724
07410855329
72908100004
66484101000
04000049163
43701256600
99999909001
07726009493
78732510053
78732510063
78732510073
78732510093
02842010109
02842010132
78732510213
02410011035
73999911110
char *UPC_val = "99999909001";
char upcbuf[100][12];
char buf[12];
memset(buf,0,sizeof(buf));
memset(upcbuf,0,sizeof(upcbuf));
When I tried to fgets, I stored that in a 2D buffer.
while ( fgets(buf, sizeof(buf), f) != NULL ) {
strncpy(upcbuf[i], buf, 11);
i++;
}
I tried to print the data in the buffer.
puts(upcbuf[0]);
upcbuf[0] has the whole data in a continuous stream,
0108000621069685932764400001144854000011472407410855329729081000046648410100004000049163437012566009999990900107726009493787325100537873251006378732510073787325100930284201010902842010132787325102130241001103573999911110
and I want to compare these upc values (11 digits) with another string (11 digits). I used,
if(strncmp(UPC_val,upcbuf[i],11) == 0)
{
//do stuff here
}
It didn't work properly. I also tried strstr(), like this:
if(strstr(upcbuf[0],UPC_val) != NULL)
{
//do stuff here
}
I am totally unaware of what it is doing, am I doing the comparison properly?
How to do this, any help please?
Thanks in advance.

Reading a line of 11 digits plus a '\n' into a string requires an array of at least 13 chars (11 digits, the '\n', and the terminating null character). There is little reason to be so tight; I suggest a buffer 2x the expected maximum size.
char upcbuf[100][12]; // large enough for 100 * (11 digits and a \0)
...
#define BUF_SIZE (13*2)
char buf[BUF_SIZE];
while (i < 100 && fgets(buf, sizeof buf, f) != NULL ) {
Lop off the potential trailing '\n'
size_t len = strlen(buf);
if (len && buf[len-1] == '\n') buf[--len] = '\0';
Check length and handle that somehow.
if (len != 11) exit(EXIT_FAILURE);
Save/print the data
// strncpy(upcbuf[i], buf, 11); // fails to ensure a null character at the end
strcpy(upcbuf[i], buf);
puts(upcbuf[i]); // print the entry just stored, before incrementing i
i++;
}
To compare strings
if(strcmp(UPC_val,upcbuf[i]) == 0) {
// strings match
}

If you are still having trouble getting the logic to work after @chux's answer, here is a short example implementing his suggestions. It takes the filename to read as the first argument and, optionally, the upc to search for as the second argument (it searches for "99999909001" by default, and in that case you can simply read the file on stdin).
Note the use of an enum to define global constants for your row and column values. (you can use independent #define ROW 128 and #define COL 32 if you like) If you need constants in your code, define them once, at the top, so if they ever need to change, you have a single convenient place to change the values, rather than having to pick through your code, or perform a global search/replace to change them.
For example, you could put the logic together as follows:
#include <stdio.h>
#include <string.h>
enum { COL = 32, ROW = 128 }; /* an enum is convenient for constants */
int main (int argc, char **argv) {
char buf[COL] = "", /* buffer to read each line */
upcbuf[ROW][COL] = { "" }, /* 2D array of ROW x COL chars */
*upcval = argc > 2 ? argv[2] : "99999909001";
size_t n = 0; /* index/counter */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin; /* file */
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
/* fill upcbuf (you could search at same time, but let's fill) */
while (n < ROW && fgets (buf, COL, fp)) {
size_t len = strlen (buf); /* get length */
/* test last char '\n', overwrite w/nul-terminating char */
if (len && buf[len - 1] == '\n')
buf[--len] = 0;
strcpy (upcbuf[n++], buf); /* copy to upcbuf */
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
/* step through upcbuf - search for upcval */
for (size_t i = 0; i < n; i++)
if (strcmp (upcbuf[i], upcval) == 0) {
printf ("upcval: '%s' found at line '%zu'.\n", upcval, i + 1);
return 0;
}
printf ("upcval: '%s' not found in file.\n", upcval);
return 0;
}
Example Use/Output
$ ./bin/upcbuf dat/upcfile.txt
upcval: '99999909001' found at line '10'.
$ ./bin/upcbuf dat/upcfile.txt 01080006210
upcval: '01080006210' found at line '1'.
$ ./bin/upcbuf dat/upcfile.txt 02410011035
upcval: '02410011035' found at line '19'.
$ ./bin/upcbuf dat/upcfile.txt "not there!"
upcval: 'not there!' not found in file.
Also note that if you were simply searching for a single upc, you could combine the read and the search in a single loop. Since you often read in a separate function and then operate on the data elsewhere in your code, this example reads all upc values from the file into your array and then searches through the array in a separate loop. Look things over, look at all answers, and let us know if you have any further questions.
As a final note, you have checked if the last char is '\n', but what happens if it isn't? You should check if the length is COL-1 indicating that additional characters remain unread in that line and handle the error (or just read and discard the remaining chars). You can do that with an addition similar to the following:
/* test last char '\n', overwrite w/nul-terminating char */
if (len && buf[len - 1] == '\n')
buf[--len] = 0;
else if (len == COL - 1) { /* if no '\n' & len == COL - 1 */
fprintf (stderr, "error: line exceeds %d chars.\n", COL - 1);
return 1;
}
You need the else if with the COL - 1 check there, rather than a plain else, because you may be reading from a file that does not have a POSIX end-of-line (i.e. a newline character) after the final line of the file. fgets properly reads that final line, but there will be no '\n' in buf. So even without the POSIX line ending the line can be a valid line, and you are guaranteed to have a complete read so long as the number of characters read (+ the nul-terminating char) does not equal your buffer size.

Compare 2 files

I have a problem, I need to make a program which will compare two files.
If in first file I have:
Milk
Sugar
Eggs
and in the second file I have
Vanilla
Soda
Sugar
I want to show the lines which appear in both files.
I don't have a lot of experience with c, but I tried something.
But my question is how I will show Sugar as output if they are not on the same line?
#include <stdio.h>
#include <stdlib.h>
#include<string.h>
#define MAX 100
void equal (char*lineone,char*linetwo){
if(strcmp(lineone,linetwo)==0){
printf("%s",lineone);
}
}
int main(){
FILE *fp1,*fp2;
fp1=fopen("D:/aici/file1.txt","r");
fp2=fopen("D:/aici/file2.txt","r");
char buff[MAX],buff1[MAX];
int i=0;
while((fgets(buff,MAX,fp1)!=NULL)&&(fgets(buff1,MAX,fp2))!=NULL){
//i++;
equal(buff,buff1);
}
}

What you should do (for performance reasons) is save all the words into two buffers and then compare them.
But you can also do it with a small change to your implementation.
You just need to separate the loop into one main loop and one inner loop, so that each word in file 1 is compared against all the words in file 2. Again, this is a very slow method compared to saving all the words first and only then comparing them.
void equal (char*lineone,char*linetwo){
if(strcmp(lione,linetwo)==0){
printf("%s",lineone);
}
}
int main(){
FILE *fp1,*fp2;
fp1=fopen("D:/aici/file1.txt","r");
fp2=fopen("D:/aici/file2.txt","r");
char buff[MAX],buff1[MAX];
int i=0;
while(fgets(buff,MAX,fp1)!=NULL) {
while(fgets(buff1,MAX,fp2)!=NULL){
//i++;
equal(buff,buff1);
}
rewind(fp2);
}
}

Continuing from the comment, whether you continue using fgets (recommended), or you recognize that you can also use fscanf and not worry about removing the '\n' from each word, you need to validate each step of your program. While fscanf may appear easier at first, you may want to brush up on man fscanf and determine how you will control the '\n' that will be left, unread, in each of your file streams.
The following is a short example, continuing with fgets, showing how you can test for, and remove, each of the trailing '\n' read and included in your buff by fgets. (as well as reasonable validations for each step). (note: I'm presuming that since your input is a single word, a 256-char buffer is sufficient -- given the longest word in the unabridged dictionary is 28 characters, but you can also validate whether fgets has made a complete read of each line, or if additional characters remain unread)
The following code expects the filenames for each of the files to be given as the first two arguments to the program.
#include <stdio.h>
#include <string.h>
#define MAXC 256
int main (int argc, char **argv) {
if (argc < 3) { /* validate 2 arguments given */
fprintf (stderr, "error: insufficient input.\n"
"usage: %s file1 file2\n", argv[0]);
return 1;
}
char buf1[MAXC] = "", /* declare buf1 */
buf2[MAXC] = ""; /* declare buf2 */
FILE *f1 = fopen (argv[1], "r"), /* open file 1 */
*f2 = fopen (argv[2], "r"); /* open file 2 */
if (!f1) { /* validate file 1 open for reading */
fprintf (stderr, "file open failed '%s'\n", argv[1]);
return 1;
}
if (!f2) { /* validate file 2 open for reading */
fprintf (stderr, "file open failed '%s'\n", argv[2]);
return 1;
}
while (fgets (buf1, MAXC, f1)) { /* read each word in file 1 */
size_t len1 = strlen (buf1); /* get length */
if (len1 && buf1[len1 - 1] == '\n')
buf1[--len1] = 0; /* overwrite '\n' with nul-byte */
while (fgets (buf2, MAXC, f2)) { /* read each in file 2 */
size_t len2 = strlen (buf2);
if (len2 && buf2[len2 - 1] == '\n')
buf2[--len2] = 0; /* overwrite '\n' with nul-byte */
if (len1 != len2) /* if lengths differ, not equal */
continue; /* get next word from file 2 */
if (strcmp (buf1, buf2) == 0) /* compare strings */
printf ("%s\n", buf1); /* print if equal */
}
rewind (f2); /* rewind f2, clear EOF */
}
fclose (f1); /* close f1 */
fclose (f2); /* close f2 */
return 0;
}
(note: the length check if (len1 != len2) is just an efficiency check that avoids calling strcmp unless the words are equal in length. A simple comparison of the lengths, which you already have, is much less expensive than a full function call to strcmp every time. This is a really small savings that you can remove if you like.)
Input Files (intentionally no POSIX-eol)
The datafiles were intentionally created without POSIX end-of-lines to demonstrate it makes no difference to the outcome if you properly handle the newline removal.
$ cat dat/f1cmp.txt
Milk
Sugar
Eggs
$ cat dat/f2cmp.txt
Vanilla
Soda
Sugar
Example Use/Output
$ ./bin/fgets_cmp_words dat/f1cmp.txt dat/f2cmp.txt
Sugar
Look things over and concentrate on the validations. Let me know if you have any further questions.
Showing Where Words Differ
To show where the words differ, you only need to modify the inner loop. You can do a simple comparison by looping over the characters in buf1 and buf2 and stopping when the first difference is located. You can continue for the two cases above (1) where the lengths differ; and (2) where the return of strcmp != 0, or you can just do a single test following a non-zero return from strcmp.
The modifications to the inner loop above are shown below. I don't know what output format you are looking for, so I have just output the words that differ and shown the character at which the words begin to differ (zero-based indexing):
while (fgets (buf2, MAXC, f2)) { /* read each in file 2 */
size_t len2 = strlen (buf2);
int i = 0;
if (len2 && buf2[len2 - 1] == '\n')
buf2[--len2] = 0; /* overwrite '\n' with nul-byte */
if (len1 != len2) { /* if lengths differ, not equal */
/* locate & output difference */
for (i = 0; buf1[i] == buf2[i]; i++) {}
printf ("%s & %s differ at char %d (%c != %c)\n",
buf1, buf2, i, buf1[i], buf2[i]);
continue; /* get next word from file 2 */
}
if (strcmp (buf1, buf2) == 0) /* compare strings */
printf ("%s\n", buf1); /* print if equal */
else { /* locate & output difference */
for (i = 0; buf1[i] == buf2[i]; i++) {}
printf ("%s & %s differ at char %d (%c != %c)\n",
buf1, buf2, i, buf1[i], buf2[i]);
}
}
Example Use/Output
$ ./bin/fgets_cmp_wrds dat/f1cmp.txt dat/f2cmp.txt
Milk & Vanilla differ at char 0 (M != V)
Milk & Soda differ at char 0 (M != S)
Milk & Sugar differ at char 0 (M != S)
Sugar & Vanilla differ at char 0 (S != V)
Sugar & Soda differ at char 1 (u != o)
Sugar
Eggs & Vanilla differ at char 0 (E != V)
Eggs & Soda differ at char 0 (E != S)
Eggs & Sugar differ at char 0 (E != S)
Look it over and let me know if you have further questions.