I have a file with a series of words separated by a white space. For example file.txt contains this: "this is the file". How can I use fscanf to take word by word and put each word in an array of strings?
Then I did this but I don't know if it's correct:
char *words[100];
int i=0;
while(!feof(file)){
fscanf(file, "%s", words[i]);
i++;
fscanf(file, " ");
}
When reading repeated input, you control the input loop with the input function itself (fscanf in your case). While you can also loop continually (e.g. for (;;) { ... }) and check independently whether the return is EOF, whether a matching failure occurred, or whether the return matches the number of conversion specifiers (success), in your case simply checking that the return matches the single "%s" conversion specifier is fine (e.g. that the return is 1).
To store each word in an array, you have several options. The simplest is a 2D array of char with automatic storage. Since the longest non-medical word in the Unabridged Dictionary is 29 characters (requiring a total of 30 characters with the nul-terminating character), a 2D array with a fixed number of rows and a fixed number of columns of at least 30 is fine. (Dynamically allocating allows you to read and allocate memory for as many words as may be required, but that is left for later.)
So to set up storage for 128 words, you could do something similar to the following:
#include <stdio.h>
#define MAXW 32 /* if you need a constant, #define one (or more) */
#define MAXA 128
int main (int argc, char **argv) {
char array[MAXA][MAXW] = {{""}}; /* array to store up to 128 words */
size_t n = 0; /* word index */
Now simply open your filename provided as the first argument to the program (or read from stdin by default if no argument is given), and then validate that your file is open for reading, e.g.
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
Now to the crux of your read-loop. Simply loop checking the return of fscanf to determine success/failure of the read, adding words to your array and incrementing your index on each successful read. You must also include in your loop-control a check of your index against your array bounds to ensure you do not attempt to write more words to your array than it can hold, e.g.
while (n < MAXA && fscanf (fp, "%31s", array[n]) == 1) /* %31s: at most MAXW - 1 chars per word */
n++;
That's it, now just close the file and use your words stored in your array as needed. For example just printing the stored words you could do:
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (size_t i = 0; i < n; i++)
printf ("array[%3zu] : %s\n", i, array[i]);
return 0;
}
Now just compile it with warnings enabled (e.g. -Wall -Wextra -pedantic for gcc/clang, or /W3 for VS cl.exe) and then test on your file. The full code is:
#include <stdio.h>
#define MAXW 32 /* if you need a constant, #define one (or more) */
#define MAXA 128
int main (int argc, char **argv) {
char array[MAXA][MAXW] = {{""}}; /* array to store up to 128 words */
size_t n = 0; /* word index */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (n < MAXA && fscanf (fp, "%31s", array[n]) == 1) /* %31s: at most MAXW - 1 chars per word */
n++;
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (size_t i = 0; i < n; i++)
printf ("array[%3zu] : %s\n", i, array[i]);
return 0;
}
Example Input File
$ cat dat/thefile.txt
this is the file
Example Use/Output
$ ./bin/fscanfsimple dat/thefile.txt
array[ 0] : this
array[ 1] : is
array[ 2] : the
array[ 3] : file
Look things over and let me know if you have further questions.
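The dynamic allocation mentioned earlier as "left for later" could be sketched roughly as follows. This is only one possible approach, not part of the program above: the function name read_words and the starting capacity of 8 are assumptions made for illustration. Each word is still read through a fixed MAXW buffer, so a longer word would be split across several entries.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXW 32   /* per-word buffer size, as in the example above */

/* Hypothetical helper: read whitespace-separated words from fp into a
 * growing array of strings. *n receives the number of words stored.
 * The caller frees each word and then the array. The return value may
 * be NULL when *n is 0. */
char **read_words (FILE *fp, size_t *n)
{
    char word[MAXW];
    char **words = NULL;
    size_t used = 0, avail = 0;

    while (fscanf (fp, "%31s", word) == 1) {     /* 31 == MAXW - 1 */
        if (used == avail) {                     /* array full, grow it */
            size_t newavail = avail ? avail * 2 : 8;
            char **tmp = realloc (words, newavail * sizeof *tmp);
            if (!tmp)                            /* keep what was stored */
                break;
            words = tmp;
            avail = newavail;
        }
        words[used] = malloc (strlen (word) + 1);
        if (!words[used])
            break;
        strcpy (words[used++], word);
    }
    *n = used;
    return words;
}
```

Calling read_words (fp, &n) on the example file would be expected to give n == 4, after which words[0] through words[3] can be used exactly like the rows of the 2D array.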
strtok() might be a function that can help you here.
If you know that the words will be separated by whitespace, then calling strtok will return the char pointer to the start of the next word.
Sample code from https://www.systutorials.com/docs/linux/man/3p-strtok/
#include <string.h>
...
char *token;
char line[] = "LINE TO BE SEPARATED"; /* a writable array: strtok() modifies it */
char *search = " ";
/* Token will point to "LINE". */
token = strtok(line, search);
/* Token will point to "TO". */
token = strtok(NULL, search);
In your case, the space character would also act as a delimiter in the line.
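Note that strtok modifies the string passed in (it overwrites each delimiter it finds with '\0'), so the string must be writable, and if you need the original intact you should tokenize a copy (e.g. one made with malloc and strcpy).
It might also be easier to use fread() to read a block from the file.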
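As a rough sketch of the "tokenize a copy" idea, the helper below duplicates the text before handing it to strtok, so the original string is left untouched. The name split_copy and its parameters are made up for this illustration, and the delimiter set is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split a copy of text on whitespace and store up
 * to max token pointers in out. The tokens point into the returned copy,
 * which the caller must free() once the tokens are no longer needed.
 * Returns NULL (and sets *count to 0) if the copy cannot be allocated. */
char *split_copy (const char *text, char *out[], size_t max, size_t *count)
{
    const char *delim = " \t\r\n";
    size_t n = 0;
    char *copy = malloc (strlen (text) + 1);

    if (!copy) {
        *count = 0;
        return NULL;
    }
    strcpy (copy, text);

    /* strtok() writes '\0' over delimiters, but only inside copy */
    for (char *tok = strtok (copy, delim); tok && n < max;
         tok = strtok (NULL, delim))
        out[n++] = tok;

    *count = n;
    return copy;
}
```

Because the tokens live inside the copy, freeing the copy invalidates all of them at once.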
As mentioned in comments, using feof() does not work as would be expected. And, as described in this answer unless the content of the file is formatted with very predictable content, using any of the scanf family to parse out the words is overly complicated. I do not recommend using it for that purpose.
There are many other, better ways to read content of a file, word by word. My preference is to read each line into a buffer, then parse the buffer to extract the words. This requires determining those characters that may be in the file, but would not be considered part of a word. Characters such as \n,\t, (space), -, etc. should be considered delimiters, and can be used to extract the words. The following is a recipe for extracting words from a file: (example code for a few of the items is included below these steps.)
1. Read file to count words, and get the length of the longest word.
2. Use count, and longest values from 1st step to allocate memory for words.
3. Rewind the file.
4. Read file line by line into a line buffer using while(fgets(line, size, fp))
5. Parse each new line into words using delimiters and store each word into arrays of step 2.
6. Use resulting array of words as necessary.
7. free all memory allocated when finished with arrays
Some example code for a few of these tasks:
// Get count of words, and longest word in file (needs <stdio.h> and <ctype.h>)
int longestWord(char *file, int *nWords)
{
    FILE *fp = fopen(file, "r");
    int cnt=0, longest=0, numWords=0;
    int c;
    if(!fp) return -1;
    while ( (c = fgetc(fp) ) != EOF )
    {
        if ( isalnum (c) ) cnt++;
        else if ( cnt > 0 ) // a delimiter ends the current word
        {
            if (cnt > longest) longest = cnt;
            cnt = 0;
            numWords++; // count once per word, not once per delimiter
        }
    }
    if ( cnt > 0 ) // file ended in the middle of a word
    {
        if (cnt > longest) longest = cnt;
        numWords++;
    }
    *nWords = numWords;
    fclose(fp);
    return longest;
}
// Create indexable memory for word arrays
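char ** Create2DStr(size_t numStrings, size_t maxStrLen) // needs <stdlib.h>
{
    size_t i;
    char **a = calloc(numStrings, sizeof(char *));
    if(!a) return NULL; // allocation of the pointer array failed
    for(i=0;i<numStrings; i++)
    {
        a[i] = calloc(maxStrLen + 1, 1); // +1 for the null terminator; may be NULL on failure
    }
    return a;
}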
Usage: For a file with 25 words, the longest being 80 bytes:
char **strArray = Create2DStr(25, 80);//creates 25 array locations
//each 80+1 characters long
//(Create2DStr adds the +1 as room for the null terminator.)
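Steps 4 and 5 of the recipe (read each line with fgets, then split it into words) might look roughly like the sketch below. It assumes the same notion of a word as longestWord (a run of alphanumeric characters) and an array that has already been allocated, for example by Create2DStr. The name fill_words and the LINE_BUF size are assumptions for illustration. A word that straddles the end of the line buffer would be split in two.

```c
#include <ctype.h>
#include <stdio.h>

#define LINE_BUF 1024   /* assumed line-buffer size */

/* Hypothetical helper: copy each run of alphanumeric characters in fp
 * into words[], which must hold nwords strings of at least maxlen + 1
 * bytes each. Returns the number of words stored. */
size_t fill_words (FILE *fp, char **words, size_t nwords, size_t maxlen)
{
    char line[LINE_BUF];
    size_t n = 0;

    while (n < nwords && fgets (line, sizeof line, fp)) {
        size_t len = 0;                       /* length of current word */
        for (const char *p = line; ; p++) {
            if (isalnum ((unsigned char)*p)) {
                if (len < maxlen)             /* truncate over-long words */
                    words[n][len++] = *p;
            }
            else {
                if (len) {                    /* a delimiter ends a word */
                    words[n++][len] = '\0';
                    len = 0;
                }
                if (*p == '\0' || n == nwords)
                    break;                    /* end of line, or array full */
            }
        }
    }
    return n;
}
```

After rewinding the file, a call such as fill_words (fp, strArray, 25, 80) would be expected to fill the array created in the usage line above.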
int i=0;
char words[50][50];
while(i < 50 && fscanf(file, "%49s", words[i]) == 1) // stay within both array bounds
i++;
I wouldn't entirely recommend doing it this way, because of the unknown amount of words in the file, and the unknown length of a "word". Either can be over the size of '50'. Just do it dynamically, instead. Still, this should show you how it works.
How can I use fscanf to take word by word and put each word in an array of strings?
Read each word twice: first to find length via "%n". 2nd time, save it. (Inefficient yet simple)
Re-size strings as you go. Again inefficient, yet simple.
// Rough untested sample code - still need to add error checking.
size_t string_count = 0;
char **strings = NULL;
for (;;) {
long pos = ftell(file);
int n = 0;
fscanf(file, "%*s%n", &n); // record where scanning a "word" stopped
if (n == 0) break;
fseek(file, pos, SEEK_SET); // go back;
strings = realloc(strings, sizeof *strings * (string_count+1));// increase array size
strings[string_count] = malloc(n + 1u); // Get enough memory for the word
fscanf(file, "%s ", strings[string_count] ); // read/save word
string_count++; // count the word just stored
}
// use strings[], string_count
// When done, free each strings[] and then strings
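One way the rough sample above might look with the missing error checks added, plus the cleanup described in its last comment. This is a sketch under assumptions, not the answer's own code: the names read_words_two_pass and free_strings are invented here, and the approach only works on seekable streams because it relies on ftell and fseek.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read every word twice, first with "%*s%n" to
 * learn how many characters the scan consumes, then with "%s" to store
 * it. *count receives the number of words stored. */
char **read_words_two_pass (FILE *file, size_t *count)
{
    size_t string_count = 0;
    char **strings = NULL;

    for (;;) {
        long pos = ftell (file);
        int n = 0;

        if (pos < 0)
            break;                                /* stream not seekable */
        if (fscanf (file, "%*s%n", &n) == EOF || n == 0)
            break;                                /* no more words */
        if (fseek (file, pos, SEEK_SET) != 0)
            break;

        char **tmp = realloc (strings, sizeof *strings * (string_count + 1));
        if (!tmp)
            break;
        strings = tmp;

        /* n counts any leading whitespace too, so n + 1 bytes is enough */
        strings[string_count] = malloc ((size_t)n + 1);
        if (!strings[string_count])
            break;
        if (fscanf (file, "%s", strings[string_count]) != 1) {
            free (strings[string_count]);
            break;
        }
        string_count++;
    }
    *count = string_count;
    return strings;
}

/* Free each word, then the array itself. */
void free_strings (char **strings, size_t string_count)
{
    for (size_t i = 0; i < string_count; i++)
        free (strings[i]);
    free (strings);
}
```

On any failure the function stops and returns whatever it managed to store, so the caller can still use and free the partial result.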
I'm having some troubles using strtok function.
As an exercise I have to deal with a text file by ruling out white spaces, transforming initials into capital letters and printing no more than 20 characters in a line.
Here is a fragment of my code:
fgets(sentence, SIZE, f1_ptr);
char *tok_ptr = strtok(sentence, " \n"); //tokenazing each line read
tok_ptr[0] = toupper(tok_ptr[0]); //initials to capital letters
int num = 0, i;
while (!feof(f1_ptr)) {
while (tok_ptr != NULL) {
for (i = num; i < strlen(tok_ptr) + num; i++) {
if (i % 20 == 0 && i != 0) //maximum of 20 char per line
fputc('\n', stdout);
fputc(tok_ptr[i - num], stdout);
}
num = i;
tok_ptr = strtok(NULL, " \n");
if (tok_ptr != NULL)
tok_ptr[0] = toupper(tok_ptr[0]);
}
fgets(sentence, SIZE + 1, f1_ptr);
tok_ptr = strtok(sentence, " \n");
if (tok_ptr != NULL)
tok_ptr[0] = toupper(tok_ptr[0]);
}
The text is just a bunch of lines I just show as a reference:
Watch your thoughts ; they become words .
Watch your words ; they become actions .
Watch your actions ; they become habits .
Watch your habits ; they become character .
Watch your character ; it becomes your destiny .
Here is what I obtain in the end:
WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;THeyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacteR.Wat
chYourCharacter;ItBe
comesYourDEstiny.Lao
-Tze
The final result is mostly correct, but sometimes (for example "they" in they become (and only in that case) or "destiny") words are not correctly tokenized. So for example "they" is split into "t" and "hey" resulting in THey (DEstiny in the other instance) after the manipulations I made.
Is it some bug or am I missing something? Probably my code is not that efficient and some condition may end up being critical...
Thank you for the help, it's not that big of a deal, I just don't understand why such a behaviour is occurring.
You have a large number of errors in your code and you are over-complicating the problem. The most pressing error is the one covered in "Why is while ( !feof (file) ) always wrong?". Why? Trace the execution path within your loop. You attempt to read with fgets(), and then you use sentence without knowing whether EOF was reached, calling tok_ptr = strtok(sentence, " \n"); before you ever get around to checking feof(f1_ptr).
What happens when you actually reach EOF? That IS why while ( !feof (file) ) is always wrong. Instead, you always want to control your read-loop with the return of the read function you are using, e.g. while (fgets(sentence, SIZE, f1_ptr) != NULL)
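A minimal sketch of that loop shape, assuming a buffer SIZE of 100 (the question does not show its value) and using a hypothetical count_tokens function as the loop body. The point is only that strtok never sees sentence unless fgets has just filled it:

```c
#include <stdio.h>
#include <string.h>

#define SIZE 100   /* assumed buffer size */

/* Hypothetical example: count the tokens in a file, line by line.
 * The body runs only after fgets() has succeeded, so sentence always
 * holds freshly read data when strtok() is called. */
size_t count_tokens (FILE *f1_ptr)
{
    char sentence[SIZE];
    size_t count = 0;

    while (fgets (sentence, SIZE, f1_ptr) != NULL)
        for (char *tok_ptr = strtok (sentence, " \n"); tok_ptr != NULL;
             tok_ptr = strtok (NULL, " \n"))
            count++;

    return count;
}
```

When fgets returns NULL the loop simply ends. feof(f1_ptr) or ferror(f1_ptr) can then be consulted afterwards if the reason matters.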
What is it you actually need your code to do?
The larger question is why are you over-complicating the problem with strtok, and arrays (and fgets() for that matter)? Think about what you need to do:
read each character in the file,
if it is whitespace, ignore it, set the in-word flag false,
if a non-whitespace, if 1st char in word, capitalize it, output the char, set the in-word flag true and increment the number of chars output to the current line, and finally
if it is the 20th character output, output a newline and reset the counter zero.
The bare-minimum tools you need from your C-toolbox are fgetc(), isspace() and toupper() from ctype.h, a counter for the number of characters output, and a flag to know if the character is the first non-whitespace character after a whitespace.
Implementing the logic
That makes the problem very simple. Read a character. If it is whitespace, set your in-word flag false. Otherwise, if your in-word flag is false, capitalize the character; then output it, set your in-word flag true, and increment your character count. The last thing you need to do is check whether your character count has reached the limit; if so, output a '\n' and reset your character count to zero. Repeat until you run out of characters.
You can turn that into code with something similar to the following:
#include <stdio.h>
#include <ctype.h>
#define CPL 20 /* chars per-line, if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
int c, in = 0, n = 0; /* char, in-word flag, no. of chars output in line */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while ((c = fgetc(fp)) != EOF) { /* read / validate each char in file */
if (isspace(c)) /* char is whitespace? */
in = 0; /* set in-word flag false */
else { /* otherwise, not whitespace */
putchar (in ? c : toupper(c)); /* output char, capitalize 1st in word */
in = 1; /* set in-word flag true */
n++; /* increment character count */
}
if (n == CPL) { /* CPL limit reached? */
putchar ('\n'); /* output newline */
n = 0; /* reset cpl counter */
}
}
putchar ('\n'); /* tidy up with newline */
if (fp != stdin) /* close file if not stdin */
fclose (fp);
}
Example Use/Output
Given your input file stored on my computer in dat/text220.txt, you can produce the output you are looking for with:
$ ./bin/text220 dat/text220.txt
WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;TheyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacter.Wat
chYourCharacter;ItBe
comesYourDestiny.
(the executable for the code was compiled to bin/text220, I usually keep separate dat, obj, and bin directories for data, object files and executables to keep my source code directory clean)
note: by reading from stdin by default if no filename is provided as the first argument to the program, you can use your program to read input directly, e.g.
$ echo "my dog has fleas - bummer!" | ./bin/text220
MyDogHasFleas-Bummer
!
No fancy string functions required, just a loop, a character, a flag and a counter -- the rest is just arithmetic. It's always worth trying to boil your programming problems down to basic steps and then look around your C-toolbox and find the right tool for each basic step.
Using strtok
Don't get me wrong, there is nothing wrong with using strtok and it makes a fairly simple solution in this case -- the point I was making is that for simple character-oriented string processing, it's often just as simple to loop over the characters in the line. You don't gain any efficiency using fgets() with an array and strtok(); the read from the file is already placed into a buffer of BUFSIZ bytes (see footnote 1).
If you did want to use strtok(), you should control your read-loop with the return from fgets() and then tokenize with strtok(), also checking its return at each point: a read-loop with fgets() and a tokenization loop with strtok(). Then you handle first-character capitalization and limit your output to 20 chars per line.
You could do something like the following:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define CPL 20 /* chars per-line, if you need a constant, #define one (or more) */
#define MAXC 1024
#define DELIM " \t\r\n"
void putcharCPL (int c, int *n)
{
if (*n == CPL) { /* if n == limit */
putchar ('\n'); /* output '\n' */
*n = 0; /* reset value at mem address to 0 */
}
putchar (c); /* output character */
(*n)++; /* increment value at mem address */
}
int main (int argc, char **argv) {
char line[MAXC]; /* buffer to hold each line */
int n = 0; /* no. of chars output in line */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (fgets (line, MAXC, fp)) /* read each line and tokenize line */
for (char *tok = strtok (line, DELIM); tok; tok = strtok (NULL, DELIM)) {
putcharCPL (toupper((unsigned char)*tok), &n); /* convert 1st char to upper */
for (int i = 1; tok[i]; i++) /* output rest unchanged */
putcharCPL (tok[i], &n);
}
putchar ('\n'); /* tidy up with newline */
if (fp != stdin) /* close file if not stdin */
fclose (fp);
}
(same output)
The putcharCPL() function is just a helper that checks if 20 characters have been output and if so outputs a '\n' and resets the counter. It then outputs the current character and increments the counter by one. A pointer to the counter is passed so it can be updated within the function making the updated value available back in main().
Look things over and let me know if you have further questions.
footnotes:
1. Depending on your version of glibc, the constant in the source setting the read-buffer size may be _IO_BUFSIZ. _IO_BUFSIZ was changed to BUFSIZ here: glibc commit 9964a14579e5eef9. For Linux BUFSIZ is defined as 8192 (512 on Windows).
This is actually a much more interesting OP from a professional point of view than some of the comments may suggest, despite the 'newcomer' aspect of the question, which may sometimes raise fairly deep, underestimated issues.
The fun thing is that on my platform (W10, MSYS2, gcc v.10.2), your code runs fine with correct results:
WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;TheyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacter.Wat
chYourCharacter;ItBe
comesYourDestiny.
So first, congratulations, newcomer: your coding is not that bad.
This points to how different compilers may or may not protect against limited inappropriate coding or specification misuse, may or may not protect stacks or heaps.
That said, the comment by @Andrew Henle pointing to an illuminating answer about feof is quite relevant.
If you follow it and rework your feof test, moving it down so that it comes after the read checks rather than before (as below), your code should yield better results (note: I will alter your code only minimally, deliberately ignoring lesser issues):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <ctype.h>
#define SIZE 100 // add some leeway to avoid off-by-one issues
int main()
{
FILE* f1_ptr = fopen("C:\\Users\\Public\\Dev\\test_strtok", "r");
if (! f1_ptr)
{
perror("Open issue");
exit(EXIT_FAILURE);
}
char sentence[SIZE] = {0};
if (NULL == fgets(sentence, SIZE, f1_ptr))
{
perror("fgets issue"); // implementation-dependent
exit(EXIT_FAILURE);
}
errno = 0;
char *tok_ptr = strtok(sentence, " \n"); //tokenizing each line read
if (tok_ptr == NULL || errno)
{
perror("first strtok parse issue");
exit(EXIT_FAILURE);
}
tok_ptr[0] = toupper(tok_ptr[0]); //initials to capital letters
int num = 0;
size_t i = 0;
while (1) {
while (1) {
for (i = num; i < strlen(tok_ptr) + num; i++) {
if (i % 20 == 0 && i != 0) //maximum of 20 char per line
fputc('\n', stdout);
fputc(tok_ptr[i - num], stdout);
}
num = i;
tok_ptr = strtok(NULL, " \n");
if (tok_ptr == NULL) break;
tok_ptr[0] = toupper(tok_ptr[0]);
}
if (NULL == fgets(sentence, SIZE, f1_ptr)) // let's do away with the annoying +1,
// we have enough headroom
{
if (feof(f1_ptr))
{
fprintf(stderr, "\n%s\n", "Found EOF");
break;
}
else
{
perror("Unexpected fgets issue in loop"); // implementation-dependent
exit(EXIT_FAILURE);
}
}
errno = 0;
tok_ptr = strtok(sentence, " \n");
if (tok_ptr == NULL)
{
if (errno)
{
perror("strtok issue in loop");
exit(EXIT_FAILURE);
}
break;
}
tok_ptr[0] = toupper(tok_ptr[0]);
}
return 0;
}
$ ./test
WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;TheyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacter.Wat
chYourCharacter;ItBe
comesYourDestiny.
Found EOF
This is my CSV file. I want to get only those rows which start with the character "A". I got my output, but with an additional column of '0'. Please help me find where I went wrong.
One more thing: I want to remove specific columns such as bread, anName and Ot.
Name,id,bread,anName,Ot,number
A,1,animal,tiger,op,8.1
M,2,animal,toper,ip,9.1
A1,7,animal,dog,cp,Na11
A2,9,animal,mouse,ap,0
A23,9,animal,pouch,gp,Na11
#include <stdio.h>
#include <stdlib.h>
#define NUMLETTERS 100
typedef struct {
char Name[100];
int id;
char number[100];
} record_t;
int main(void) {
FILE *fp;
record_t records[NUMLETTERS];
int count = 0, i;
fp = fopen("letter.csv", "r");
if (fp == NULL) {
fprintf(stderr, "Error reading file\n");
return 1;
}
while (fscanf(fp, "%s,%d,%s", records[count].Name, &records[count].id, records[count].number) == 1)
count++;
for (i = 0; i < count; i++) {
if(records[i].Name[0] == 'A'){
printf("%s,%d,%s\n", records[i].Name, records[i].id, records[i].number);
}
}
fclose(fp);
return 0;
}
i want output as:
A,1,8.1
A1,7,Na11
A2,9,0
A23,9,Na11
You have two problems:
The %s format specifier tells fscanf to read a space-delimited string. Since the records aren't space-delimited, the first %s will read the whole line.
The fscanf function returns the number of successfully parsed elements it handled. Since you attempt to read three values you should compare with 3 instead of 1.
Now for one way how to solve the first problem: Use the %[ format specifier. It can handle simple patterns and, most importantly, negative patterns (read while input does not match).
So you could tell fscanf to read a string until it finds a comma by using %[^,]:
fscanf(fp, " %[^,],%d,%s", records[count].Refdes, &records[count].pin, records[count].NetName)
The use of the %[ specifier is only needed for the first string, as the second will be space-delimited (the newline).
Also note that there's a space before the %[ format, to read and ignore leading white-space, like for example the newline from the previous line.
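For the six-column rows shown in the question, the unwanted columns can also be skipped during parsing with the assignment-suppression form %*[^,], which reads a field and discards it. The sketch below is only an illustration under assumptions: parse_record is a made-up name, the struct is the one from the question, and empty fields are not handled because %[ needs at least one character to match.

```c
#include <stdio.h>

typedef struct {
    char Name[100];
    int id;
    char number[100];
} record_t;

/* Hypothetical helper: parse "Name,id,bread,anName,Ot,number", keeping
 * only Name, id and number. Returns 1 on success, 0 if the line does
 * not match (e.g. the header row, where id is not a number). */
int parse_record (const char *line, record_t *rec)
{
    /* %*[^,] reads up to the next comma without storing anything */
    return sscanf (line, " %99[^,],%d,%*[^,],%*[^,],%*[^,],%99[^,\n]",
                   rec->Name, &rec->id, rec->number) == 3;
}
```

Note that the return value is compared with 3, the number of fields actually stored. Suppressed conversions are not counted.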
i want to get only those row which start with character "A"
i want to remove the number which coming between A and tiger,
If I understand you correctly and you only want to store rows beginning with 'A', then I would adjust your approach to read each line with fgets() and then check whether the first character in the buffer is 'A'; if not, continue and get the next line. Then, for those lines that do start with 'A', simply use sscanf to parse the data into your array of struct records.
For your second part of removing the number between 'A' and "tiger", there is a difference between what you store and what you output (this comes into play in storing only records beginning with 'A' as well), but for those structs stored where the line starts with 'A', you can simply not-output the pin struct member to get the output you want.
The approach to reading a line at a time will simply require that you declare an additional character array (buffer), called buf below, to read each line into with fgets(), e.g.
char buf[3 * NUMLETTERS] = "";
...
/* read each line into buf until a max of NUMLETTERS struct filled */
while (count < NUMLETTERS && fgets (buf, sizeof buf, fp)) {
record_t tmp = { .Refdes = "" }; /* temporary struct to read into */
if (*buf != 'A') /* if doesn't start with A get next */
continue;
/* separate lines beginning with 'A' into struct members */
if (sscanf (buf, " %99[^,],%d,%99[^\n]",
tmp.Refdes, &tmp.pin, tmp.NetName) == 3)
records[count++] = tmp; /* assign tmp, increment count */
else
fprintf (stderr, "%d A record - invalid format.\n", count + 1);
}
The following short example puts that to use. Since we are not sure what "remove" is intended to mean, it includes a pre-processor conditional that outputs only the .Refdes and .NetName members by default, but if you either #define WITHPIN or include the define in your compile string (e.g. -DWITHPIN) it will output the .pin member as well.
#include <stdio.h>
#include <stdlib.h>
#define NUMLETTERS 100
typedef struct {
char Refdes[NUMLETTERS];
int pin;
char NetName[NUMLETTERS];
} record_t;
int main (int argc, char **argv) {
record_t records[NUMLETTERS];
char buf[3 * NUMLETTERS] = "";
int count = 0, i;
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
/* read each line into buf until a max of NUMLETTERS struct filled */
while (count < NUMLETTERS && fgets (buf, sizeof buf, fp)) {
record_t tmp = { .Refdes = "" }; /* temporary struct to read into */
if (*buf != 'A') /* if doesn't start with A get next */
continue;
/* separate lines beginning with 'A' into struct members */
if (sscanf (buf, " %99[^,],%d,%99[^\n]",
tmp.Refdes, &tmp.pin, tmp.NetName) == 3)
records[count++] = tmp; /* assign tmp, increment count */
else
fprintf (stderr, "%d A record - invalid format.\n", count + 1);
}
if (fp != stdin) /* close file if not stdin */
fclose (fp);
for (i = 0; i < count; i++)
#ifdef WITHPIN
printf ("%-8s %2d %s\n",
records[i].Refdes, records[i].pin, records[i].NetName);
#else
printf ("%-8s %s\n", records[i].Refdes, records[i].NetName);
#endif
}
Example Use/Output
$ ./bin/getaonly dat/getasonly.txt
A tiger
A1 dog
A2 mouse
A23 pouch
If you define -DWITHPIN in your compile string, then you will get all three outputs:
$ ./bin/getaonly dat/getasonly.txt
A 1 tiger
A1 7 dog
A2 9 mouse
A23 9 pouch
(note: with the data stored in your array, you can adjust the output format to anything you need)
Since there is some uncertainty whether you want to store all and output only records beginning with 'A' or only want to store records beginning with 'A' -- let me know if I need to make changes and I'm happy to help further.
I have a text file, it has values(I usually call them as upc_values) of
01080006210
69685932764
40000114485
40000114724
07410855329
72908100004
66484101000
04000049163
43701256600
99999909001
07726009493
78732510053
78732510063
78732510073
78732510093
02842010109
02842010132
78732510213
02410011035
73999911110
char *UPC_val = "99999909001";
char upcbuf[100][12];
char buf[12];
memset(buf,0,sizeof(buf));
memset(upcbuf,0,sizeof(upcbuf));
When I tried to fgets, I stored that in a 2D buffer.
while ( fgets(buf, sizeof(buf), f) != NULL ) {
strncpy(upcbuf[i], buf, 11);
i++;
}
I tried to print the data in the buffer.
puts(upcbuf[0]);
upcbuf[0] has the whole data in a continuous stream,
0108000621069685932764400001144854000011472407410855329729081000046648410100004000049163437012566009999990900107726009493787325100537873251006378732510073787325100930284201010902842010132787325102130241001103573999911110
and I want to compare this upc values(11 digit) with another string(11 digit). I used,
if(strncmp(UPC_Val,upcbuf[i],11) == 0)
{
//do stuff here
}
It didn't work properly, I used strstr() too like,
if(strstr(upcbuf[0],UPC_val) != NULL)
{
//do stuff here
}
I am totally unaware of what it is doing, am I doing the comparison properly?
How to do this, any help please?
Thanks in advance.
Reading a line of text of 11 digits and a '\n' into a string needs an array of at least 13 chars to store the string. There is little reason to be so tight. Suggest 2x the expected max size.
char upcbuf[100][12]; // large enough for 100 * (11 digits and a \0)
...
#define BUF_SIZE (13*2)
char buf[BUF_SIZE];
while (i < 100 && fgets(buf, sizeof buf, f) != NULL ) {
Lop off the potential trailing '\n'
size_t len = strlen(buf);
if (len && buf[len-1] == '\n') buf[--len] = '\0';
Check length and handle that somehow.
if (len != 11) exit(EXIT_FAILURE);
Save/print the data
// strncpy(upcbuf[i], buf, 11); // fails to ensure a null character at the end
strcpy(upcbuf[i], buf);
puts(upcbuf[i]); // print the row just stored, before advancing the index
i++;
To compare strings
if(strcmp(UPC_val,upcbuf[i]) == 0) {
// strings match
}
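The fragments above could be gathered into a single helper along these lines. It is a sketch, not the answer's own code: read_upc and UPC_LEN are names invented here, and it strips the line ending with strcspn (which also removes a stray '\r') instead of the strlen test shown above. A line longer than the buffer is reported as invalid and its remainder is left in the stream.

```c
#include <stdio.h>
#include <string.h>

#define UPC_LEN  11          /* digits per value, as in the question */
#define BUF_SIZE (13 * 2)    /* generous line buffer, as suggested above */

/* Hypothetical helper: read one line from f into dst.
 * Returns 1 on success, 0 at end of file, and -1 if the line is not
 * exactly UPC_LEN characters long. */
int read_upc (FILE *f, char dst[UPC_LEN + 1])
{
    char buf[BUF_SIZE];

    if (fgets (buf, sizeof buf, f) == NULL)
        return 0;
    buf[strcspn (buf, "\r\n")] = '\0';    /* lop off the line ending, if any */
    if (strlen (buf) != UPC_LEN)
        return -1;
    strcpy (dst, buf);                    /* safe: length was just checked */
    return 1;
}
```

A loop such as while (i < 100 && read_upc (f, upcbuf[i]) == 1) i++; would then fill the 2D array, and strcmp can compare each row with the value being searched for.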
If you are still having trouble getting the logic to work after @chux's answer, then here is a short example implementing his suggestions that takes the filename to read as the first argument, and optionally the upc to search for as the second argument (it will search for "99999909001" by default [and in that case you can just read the file in on stdin]).
Note the use of an enum to define global constants for your row and column values. (you can use independent #define ROW 128 and #define COL 32 if you like) If you need constants in your code, define them once, at the top, so if they ever need to change, you have a single convenient place to change the values, rather than having to pick through your code, or perform a global search/replace to change them.
For example, you could put the logic together as follows:
#include <stdio.h>
#include <string.h>
enum { COL = 32, ROW = 128 }; /* an enum is convenient for constants */
int main (int argc, char **argv) {
char buf[COL] = "", /* buffer to read each line */
upcbuf[ROW][COL] = { "" }, /* 2D array of ROW x COL chars */
*upcval = argc > 2 ? argv[2] : "99999909001";
size_t n = 0; /* index/counter */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin; /* file */
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
/* fill upcbuf (you could search at same time, but let's fill) */
while (n < ROW && fgets (buf, COL, fp)) {
size_t len = strlen (buf); /* get length */
/* test last char '\n', overwrite w/nul-terminating char */
if (len && buf[len - 1] == '\n')
buf[--len] = 0;
strcpy (upcbuf[n++], buf); /* copy to upcbuf */
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
/* step through upcbuf - search for upcval */
for (size_t i = 0; i < n; i++)
if (strcmp (upcbuf[i], upcval) == 0) {
printf ("upcval: '%s' found at line '%zu'.\n", upcval, i + 1);
return 0;
}
printf ("upcval: '%s' not found in file.\n", upcval);
return 0;
}
Example Use/Output
$ ./bin/upcbuf dat/upcfile.txt
upcval: '99999909001' found at line '10'.
$ ./bin/upcbuf dat/upcfile.txt 01080006210
upcval: '01080006210' found at line '1'.
$ ./bin/upcbuf dat/upcfile.txt 02410011035
upcval: '02410011035' found at line '19'.
$ ./bin/upcbuf dat/upcfile.txt "not there!"
upcval: 'not there!' not found in file.
Also note that if you were simply searching for a single upc, then you could combine read and search in a single loop, but since you often read in a separate function, and then operate on the data elsewhere in your code, this example simply reads all upc values from the file into your array, and then searches through the array in a separate loop. Look things over, look at all answers, and let us know if you have any further questions.
As a final note, you have checked if the last char is '\n', but what happens if it isn't? You should check if the length is COL-1 indicating that additional characters remain unread in that line and handle the error (or just read and discard the remaining chars). You can do that with an addition similar to the following:
/* test last char '\n', overwrite w/nul-terminating char */
if (len && buf[len - 1] == '\n')
buf[--len] = 0;
else if (len == COL - 1) { /* if no '\n' & len == COL - 1 */
fprintf (stderr, "error: line exceeds %d chars.\n", COL - 1);
return 1;
}
And, you need to use the else if and check the COL - 1 and not simply use an else there because you may be reading from a file that does not have a POSIX end-of-line (e.g. a new-line character) after the final line of the file. fgets properly reads the final line, even without a POSIX line ending, but there will be no '\n' in buf. So even without the POSIX line ending, the line can be a valid line, and you are guaranteed to have a complete read, so long as the number of characters read (+ the nul-terminating char) does not equal your buffer size.
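The "read and discard the remaining chars" alternative mentioned above might be sketched as follows. The function name read_line and its return codes are assumptions for illustration. Unlike the snippet above, a line that exactly fills the buffer and is immediately followed by '\n' or end of file is treated as complete.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (size bytes) and strip
 * the '\n'. Returns 1 for a complete line, 0 at end of file, and -1 if
 * the line was too long, in which case its unread remainder is
 * discarded so the next call starts on a fresh line. */
int read_line (char *buf, size_t size, FILE *fp)
{
    if (!fgets (buf, (int)size, fp))
        return 0;

    size_t len = strlen (buf);
    if (len && buf[len - 1] == '\n') {
        buf[len - 1] = 0;
        return 1;
    }
    if (len == size - 1) {                 /* buffer full, no '\n' seen */
        int c = fgetc (fp);
        if (c == EOF || c == '\n')         /* the line exactly filled buf */
            return 1;
        while (c != '\n' && c != EOF)      /* discard the rest of the line */
            c = fgetc (fp);
        return -1;
    }
    return 1;                              /* final line without a '\n' */
}
```

In the example program this could replace the fgets call and the newline test, with a -1 return either skipped or reported as an error.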
I have a problem, I need to make a program which will compare two files.
If in first file I have:
Milk
Sugar
Eggs
and in the second file I have
Vanilla
Soda
Sugar
I want to show the lines which appear in both files.
I don't have a lot of experience with c, but I tried something.
But my question is how I will show Sugar as output if they are not on the same line?
#include <stdio.h>
#include <stdlib.h>
#include<string.h>
#define MAX 100
void equal (char*lineone,char*linetwo){
if(strcmp(lineone,linetwo)==0){
printf("%s",lineone);
}
}
int main(){
FILE *fp1,*fp2;
fp1=fopen("D:/aici/file1.txt","r");
fp2=fopen("D:/aici/file2.txt","r");
char buff[MAX],buff1[MAX];
int i=0;
while((fgets(buff,MAX,fp1)!=NULL)&&(fgets(buff1,MAX,fp2))!=NULL){
//i++;
equal(buff,buff1);
}
}
What you should do (for performance reasons) is to save all the words from both files into two buffers and then compare them.
But you can also do it with a small change to your implementation.
You just need to separate the loop into one main loop and one inner loop, so that each word in file 1 is compared against all words in file 2. Again, this is a very slow method compared to saving all the words first and only then comparing them.
void equal (char*lineone,char*linetwo){
if(strcmp(lineone,linetwo)==0){
printf("%s",lineone);
}
}
int main(){
FILE *fp1,*fp2;
fp1=fopen("D:/aici/file1.txt","r");
fp2=fopen("D:/aici/file2.txt","r");
char buff[MAX],buff1[MAX];
int i=0;
while(fgets(buff,MAX,fp1)!=NULL) {
while(fgets(buff1,MAX,fp2)!=NULL){
//i++;
equal(buff,buff1);
}
rewind(fp2);
}
}
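The first suggestion in this answer, reading both files into memory before comparing, is not shown in code. A rough sketch of it follows. The limits MAXL and MAXC and the function names load_lines and print_common are assumptions for illustration. The comparison is still a nested loop, but each file is read from disk only once.

```c
#include <stdio.h>
#include <string.h>

#define MAXL 128    /* assumed maximum number of lines per file */
#define MAXC 100    /* assumed maximum characters per line */

/* Read up to MAXL lines from fp into lines[], stripping each '\n'.
 * Returns the number of lines stored.
 */
size_t load_lines (FILE *fp, char lines[][MAXC])
{
    size_t n = 0;

    while (n < MAXL && fgets (lines[n], MAXC, fp)) {
        lines[n][strcspn (lines[n], "\n")] = 0;    /* trim trailing '\n' */
        n++;
    }
    return n;
}

/* Print every line of a[] that also appears in b[]; return the match count. */
size_t print_common (char a[][MAXC], size_t na, char b[][MAXC], size_t nb)
{
    size_t matches = 0;

    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (strcmp (a[i], b[j]) == 0) {
                puts (a[i]);
                matches++;
                break;      /* report each line of a[] at most once */
            }
    return matches;
}
```

Because the newline is trimmed when the lines are loaded, a final line without '\n' (such as Sugar in the second file) still compares equal.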
Continuing from the comment, whether you continue using fgets (recommended), or you recognize that you can also use fscanf and not worry about removing the '\n' from each word, you need to validate each step of your program. While fscanf may appear easier at first, you may want to brush up on man fscanf and determine how you will control the '\n' that will be left, unread, in each of your file streams.
The following is a short example, continuing with fgets, showing how you can test for, and remove, the trailing '\n' that fgets reads and includes in your buffer, along with reasonable validations for each step. (note: I'm presuming that since your input is a single word, a 256-char buffer is sufficient, given that the longest word in the unabridged dictionary is 28 characters, but you can also validate whether fgets has made a complete read of each line, or whether additional characters remain unread.)
The following code expects the filenames for each of the files to be given as the first two arguments to the program.
#include <stdio.h>
#include <string.h>
#define MAXC 256
int main (int argc, char **argv) {
if (argc < 3) { /* validate 2 arguments given */
fprintf (stderr, "error: insufficient input.\n"
"usage: %s file1 file2\n", argv[0]);
return 1;
}
char buf1[MAXC] = "", /* declare buf1 */
buf2[MAXC] = ""; /* declare buf2 */
FILE *f1 = fopen (argv[1], "r"), /* open file 1 */
*f2 = fopen (argv[2], "r"); /* open file 2 */
if (!f1) { /* validate file 1 open for reading */
fprintf (stderr, "file open failed '%s'\n", argv[1]);
return 1;
}
if (!f2) { /* validate file 2 open for reading */
fprintf (stderr, "file open failed '%s'\n", argv[2]);
return 1;
}
while (fgets (buf1, MAXC, f1)) { /* read each word in file 1 */
size_t len1 = strlen (buf1); /* get length */
if (len1 && buf1[len1 - 1] == '\n')
buf1[--len1] = 0; /* overwrite '\n' with nul-byte */
while (fgets (buf2, MAXC, f2)) { /* read each in file 2 */
size_t len2 = strlen (buf2);
if (len2 && buf2[len2 - 1] == '\n')
buf2[--len2] = 0; /* overwrite '\n' with nul-byte */
if (len1 != len2) /* if lengths differ, not equal */
continue; /* get next word from file 2 */
if (strcmp (buf1, buf2) == 0) /* compare strings */
printf ("%s\n", buf1); /* print if equal */
}
rewind (f2); /* rewind f2, clear EOF */
}
fclose (f1); /* close f1 */
fclose (f2); /* close f2 */
return 0;
}
(note: the length check if (len1 != len2) is just an efficiency check that avoids calling strcmp unless the words are equal in length. A simple comparison of the lengths, which you already have, is cheaper than a function call to strcmp every time. The savings are small, so you can remove the check if you like.)
Input Files (intentionally no POSIX-eol)
The datafiles were intentionally created without POSIX end-of-lines to demonstrate it makes no difference to the outcome if you properly handle the newline removal.
$ cat dat/f1cmp.txt
Milk
Sugar
Eggs
$ cat dat/f2cmp.txt
Vanilla
Soda
Sugar
Example Use/Output
$ ./bin/fgets_cmp_words dat/f1cmp.txt dat/f2cmp.txt
Sugar
Look things over and concentrate on the validations. Let me know if you have any further questions.
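For comparison, the fscanf route mentioned at the start of this answer might look like the sketch below. The function names file_has_word and print_common_words are hypothetical. Since "%s" skips leading whitespace (including '\n'), there is no newline to remove, but note that a line containing several words is split into separate tokens, which differs from the fgets version.

```c
#include <stdio.h>
#include <string.h>

#define MAXC 256

/* Return 1 if word occurs as a whitespace-separated token in fp, else 0.
 * The stream is rewound first so it can be searched repeatedly.
 */
int file_has_word (FILE *fp, const char *word)
{
    char buf[MAXC];

    rewind (fp);
    while (fscanf (fp, "%255s", buf) == 1)     /* field width = MAXC - 1 */
        if (strcmp (buf, word) == 0)
            return 1;
    return 0;
}

/* Print each word of f1 that also occurs in f2; return the match count. */
size_t print_common_words (FILE *f1, FILE *f2)
{
    char buf[MAXC];
    size_t matches = 0;

    while (fscanf (f1, "%255s", buf) == 1)     /* validate every read */
        if (file_has_word (f2, buf)) {
            puts (buf);
            matches++;
        }
    return matches;
}
```

The field width in "%255s" keeps fscanf from writing past the end of buf, which a bare "%s" would not do.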
Showing Where Words Differ
To show where the words differ, you only need to modify the inner loop. You can do a simple comparison by looping over the characters in buf1 and buf2 and stopping when the first difference is located. You can continue for the two cases above (1) where the lengths differ; and (2) where the return of strcmp != 0, or you can just do a single test following a non-zero return from strcmp.
The modification to the inner loop above is shown below. I don't know what output format you are looking for, so I have just output the words that differ and shown the character at which the words begin to differ (zero-based indexing):
while (fgets (buf2, MAXC, f2)) { /* read each in file 2 */
size_t len2 = strlen (buf2);
int i = 0;
if (len2 && buf2[len2 - 1] == '\n')
buf2[--len2] = 0; /* overwrite '\n' with nul-byte */
if (len1 != len2) { /* if lengths differ, not equal */
/* locate & output difference */
for (i = 0; buf1[i] == buf2[i]; i++) {}
printf ("%s & %s differ at char %d (%c != %c)\n",
buf1, buf2, i, buf1[i], buf2[i]);
continue; /* get next word from file 2 */
}
if (strcmp (buf1, buf2) == 0) /* compare strings */
printf ("%s\n", buf1); /* print if equal */
else { /* locate & output difference */
for (i = 0; buf1[i] == buf2[i]; i++) {}
printf ("%s & %s differ at char %d (%c != %c)\n",
buf1, buf2, i, buf1[i], buf2[i]);
}
}
Example Use/Output
$ ./bin/fgets_cmp_wrds dat/f1cmp.txt dat/f2cmp.txt
Milk & Vanilla differ at char 0 (M != V)
Milk & Soda differ at char 0 (M != S)
Milk & Sugar differ at char 0 (M != S)
Sugar & Vanilla differ at char 0 (S != V)
Sugar & Soda differ at char 1 (u != o)
Sugar
Eggs & Vanilla differ at char 0 (E != V)
Eggs & Soda differ at char 0 (E != S)
Eggs & Sugar differ at char 0 (E != S)
Look it over and let me know if you have further questions.
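Since the loop that locates the difference appears twice in the modified inner loop, it could be factored into a small helper. This is only a sketch, and the name first_diff and its -1 return for identical strings are assumptions for illustration.

```c
/* Hypothetical helper: index of the first character at which a and b
 * differ, or -1 if the two strings are identical.
 */
int first_diff (const char *a, const char *b)
{
    int i = 0;

    while (a[i] && a[i] == b[i])    /* stop at a mismatch or at the end of a */
        i++;

    return a[i] == b[i] ? -1 : i;   /* equal here means both are at '\0' */
}
```

With it, a single test such as if (first_diff (buf1, buf2) == -1) could replace both the length check and the strcmp call. When one word is a prefix of the other, the returned index is the length of the shorter word.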
I wanted to only count the number of strings in a text file, containing numbers as well. But the code below, counts even the numbers in the file as strings. How do I rectify the problem?
int count;
char *temp;
FILE *fp;
fp = fopen("multiplexyz.txt" ,"r" );
while(fscanf(fp,"%s",temp) != EOF )
{
count++;
}
printf("%d ",count);
return 0;
}
Well, first up, using the temp pointer without having backing storage for it is going to cause you a world of pain.
I'd suggest, as a start, using something like char temp[1000] instead, keeping in mind that's still a bit risky if you have words more than a thousand or so characters long (that's a different issue to the one you're asking about so I'll mention it but not spend too much time on fixing it).
Secondly, it appears you want to count words with numbers (like alpha7 or pi/2). If that's the case, you simply need to check temp after reading the "word" and increment count only if it matches a "non-numeric" pattern.
That could be as simple as just not incrementing if the word consists only of digits, or it could be complicated if you want to handle decimals, exponential formats and so on.
But the bottom line remains the same:
while(fscanf(fp,"%s",temp) != EOF )
{
if (! isANumber(temp))
count++;
}
with a suitable definition of isANumber. For example, for unsigned integers only, something like this would be a good start:
int isANumber (char *str) {
// Empty string is not a number.
if (*str == '\0')
return 0;
// Check every character.
while (*str != '\0') {
// If non-digit, it's not a number (isdigit is declared in <ctype.h>).
if (! isdigit ((unsigned char)*str))
return 0;
str++;
}
// If all characters were digits, it was a number.
return 1;
}
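Putting the pieces so far together, a counting routine might look like the following sketch. It assumes the 1000-byte buffer suggested above and adds a matching field width to the format string; the name countWords and the constant MAXWORD are made up for illustration.

```c
#include <ctype.h>
#include <stdio.h>

#define MAXWORD 1000    /* assumed buffer size, as suggested above */

/* Return 1 if str is a non-empty run of decimal digits, else 0. */
static int isANumber (const char *str)
{
    if (*str == '\0')
        return 0;
    for (; *str != '\0'; str++)
        if (!isdigit ((unsigned char)*str))
            return 0;
    return 1;
}

/* Count the whitespace-separated words in fp that are not all digits. */
size_t countWords (FILE *fp)
{
    char temp[MAXWORD];     /* real storage instead of a bare pointer */
    size_t count = 0;       /* initialized, unlike in the question */

    /* "%999s" stores at most MAXWORD - 1 characters plus the nul byte */
    while (fscanf (fp, "%999s", temp) == 1)
        if (!isANumber (temp))
            count++;

    return count;
}
```

One caveat of the width limit: a word longer than 999 characters is read in several pieces, so it would be counted more than once.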
For more complex checking, you can use the strto* calls in C, giving them the temp buffer and ensuring you use the endptr method to ensure the entire string is scanned. Off the top of my head, so not well tested, that would go something like:
int isANumber (char *str) {
// Empty string is not a number.
if (*str == '\0')
return 0;
// Use strtold to get a long double.
char *endPtr;
long double d = strtold (str, &endPtr);
// Characters unconsumed, not number (things like 42b).
if (*endPtr != '\0')
return 0;
// Was a long double, so number.
return 1;
}
The only thing you need to watch out for there is that certain strings like NaN or +Inf are considered a number by strtold so you may need extra checks for that.
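One possible form of that extra check, offered as a sketch rather than a tested solution, is to reject non-finite results with isfinite from <math.h>. The name isFiniteNumber is hypothetical.

```c
#include <math.h>
#include <stdlib.h>

/* Sketch: return 1 if str is consumed entirely by strtold and the result
 * is finite, else 0.
 */
int isFiniteNumber (const char *str)
{
    if (*str == '\0')               /* empty string is not a number */
        return 0;

    char *endPtr;
    long double d = strtold (str, &endPtr);

    /* nothing parsed at all, or characters left over (things like 42b) */
    if (endPtr == str || *endPtr != '\0')
        return 0;

    return isfinite (d) ? 1 : 0;    /* rejects NaN, +Inf, -Inf and overflow */
}
```

Keep in mind that strtold also accepts hexadecimal forms such as 0x1A, so whether those should count as numbers is a separate decision.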
Inside your while loop, loop through the string to check whether any of its characters are digits. Something like:
for (char *p = temp; *p != '\0'; p++) {
if (isdigit ((unsigned char)*p))
break; /* p points at the first digit */
}
[don't copy this code exactly]
I find strpbrk to be one of the most helpful functions for searching for several needles in a haystack. Your set of needles is the numeric characters "0123456789"; any line read from your file that contains one of them is counted. I also prefer POSIX getline for reading lines, because it handles a last line that lacks a POSIX line ending ('\n') and allocates whatever storage each line needs. (wc -l, by contrast, counts newline characters, so it omits a final line that does not end in '\n' from its count.) That said, a small function that searches each line for any of the characters in a string trm passed as a parameter could be written as:
/** open and read each line in 'fn' returning the number of lines
* containing any of the characters in 'trm'.
*/
size_t nlines (char *fn, char *trm)
{
if (!fn) return 0;
size_t lines = 0, n = 0;
char *buf = NULL;
FILE *fp = fopen (fn, "r");
if (!fp) return 0;
while (getline (&buf, &n, fp) != -1)
if (strpbrk (buf, trm))
lines++;
fclose (fp);
free (buf);
return lines;
}
Simply pass the filename of interest and the terms to search for in each line. A short test code with a default term of "0123456789" that takes the filename as the first parameter and the term as the second could be written as follows:
#include <stdio.h> /* printf */
#include <stdlib.h> /* free */
#include <string.h> /* strpbrk */
size_t nlines (char *fn, char *trm);
int main (int argc, char **argv) {
char *fn = argc > 1 ? argv[1] : NULL;
char *srch = argc > 2 ? argv[2] : "0123456789";
if (!fn) return 1;
printf ("%zu %s\n", nlines (fn, srch), fn);
return 0;
}
/** open and read each line in 'fn' returning the number of lines
* containing any of the characters in 'trm'.
*/
size_t nlines (char *fn, char *trm)
{
if (!fn) return 0;
size_t lines = 0, n = 0;
char *buf = NULL;
FILE *fp = fopen (fn, "r");
if (!fp) return 0;
while (getline (&buf, &n, fp) != -1)
if (strpbrk (buf, trm))
lines++;
fclose (fp);
free (buf);
return lines;
}
Give it a try and see if this is what you are expecting, if not, just let me know and I am glad to help further.
Example Input File
$ cat dat/linewno.txt
The quick brown fox
jumps over 3 lazy dogs
who sleep in the sun
with a temp of 101
Example Use/Output
$ ./bin/getline_nlines_nums dat/linewno.txt
2 dat/linewno.txt
$ wc -l dat/linewno.txt
4 dat/linewno.txt