Extract numerical values from a string and average them

Extract numerical values from a string and average them - c

I have a .txt file that contains data in this format:
xxxx: 0.9467,
yyyy: 0.9489,
zzzz: 0.78973,
hhhh: 0.8874,
yyyy: 0.64351,
xxxx: 0.8743,
and so on...
Let's say that my C program receives, as input, the string yyyy. The program should, simply, return all the instances of yyyy in the .txt file and the average of all their numerical values.
int main() {
FILE *filePTR;
char fileRow[100000];
if (fopen_s(&filePTR, "file.txt", "r") == 0) {
while (fgets(fileRow, sizeof fileRow, filePTR) != NULL) {
if (strstr(fileRow, "yyyy") != NULL) { // Input parameter
printf("%s", fileRow);
}
}
fclose(filePTR);
printf("\nEnd of the file.\n");
} else {
printf("ERROR! Impossible to read the file.");
}
return 0;
}
This is my code right now. I don't know how to:
Isolate the numerical values
actually convert them to double type
average them
I read something about the strtok function (just to start), but I would need some help...

You have started off on the right track and should be commended for using fgets() to read a complete line from the file on each iteration, but your choice of strstr does not ensure the prefix you are looking for is found at the beginning of the line.
Further, you want to avoid hardcoding your search string as well as the file to open. main() takes arguments through argc and argv that let you pass information into your program on startup. See: C11 Standard - §5.1.2.2.1 Program startup(p1). Using the parameters eliminates your need to hardcode values by letting you pass the filename to open and the prefix to search for as arguments to your program. (which also eliminates the need to recompile your code simply to read from another filename or search for another string)
For example, instead of hardcoding values, you can use the parameters to main() to open any file and search for any prefix simply using something similar to:
#include <stdio.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
char buf[MAXC] = "", *str = NULL; /* buffer for line and ptr to search str */
size_t n = 0, len = 0; /* counter and search string length */
double sum = 0; /* sum of matching lines */
FILE *fp = NULL; /* file pointer */
if (argc < 3) { /* validate 2 arguments given - filename, search_string */
fprintf (stderr, "error: insufficient number of arguments\n"
"usage: %s filename search_string\n", argv[0]);
return 1;
}
if (!(fp = fopen (argv[1], "r"))) { /* open/validate file open for reading */
perror ("fopen-filename");
return 1;
}
str = argv[2]; /* set pointer to search string */
len = strlen (str); /* get length of search string */
...
At this point in your program, you have opened the file passed as the first argument and have validated that it is open for reading through the file-stream pointer fp. You have passed in the prefix to search for as the second argument, assigned it to the pointer str and have obtained the length of the prefix and have stored in in len.
Next you want to read each line from your file into buf, but instead of attempting to match the prefix with strstr(), you can use strncmp() with len to compare the beginning of the line read from your file. If the prefix is found, you can then use sscanf to parse the double value from the file and add it to sum and increment the number of values stored in n, e.g.
while (fgets (buf, MAXC, fp)) { /* read each line into buf */
if (strncmp (buf, str, len) == 0) { /* if prefix matches */
double tmp; /* temporary double for parse */
/* parse with scanf, discarding prefix with assignment suppression */
if (sscanf (buf, "%*1023[^:]: %lf", &tmp) == 1) {
sum += tmp; /* add value to sum */
n++; /* increment count of values */
}
}
}
(note: above the assignment suppression operator for sscanf(), '*' allows you to read and discard the prefix and ':' without having to store the prefix in a second string)
All that remains is checking if values are contained in sum by checking your count n and if so, output the average for the prefix. Or, if n == 0 the prefix was not found in the file, e.g.:
if (n) /* if values found, output average */
printf ("prefix '%s' avg: %.4f\n", str, sum / n);
else /* output not found */
printf ("prefix '%s' -- not found in file.\n", str);
}
That is basically all you need. With it, you can read from any file you like and search for any prefix simply passing the filename and prefix as the first two arguments to your program. The complete example would be:
#include <stdio.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
char buf[MAXC] = "", *str = NULL; /* buffer for line and ptr to search str */
size_t n = 0, len = 0; /* counter and search string length */
double sum = 0; /* sum of matching lines */
FILE *fp = NULL; /* file pointer */
if (argc < 3) { /* validate 2 arguments given - filename, search_string */
fprintf (stderr, "error: insufficient number of arguments\n"
"usage: %s filename search_string\n", argv[0]);
return 1;
}
if (!(fp = fopen (argv[1], "r"))) { /* open/validate file open for reading */
perror ("fopen-filename");
return 1;
}
str = argv[2]; /* set pointer to search string */
len = strlen (str); /* get length of search string */
while (fgets (buf, MAXC, fp)) { /* read each line into buf */
if (strncmp (buf, str, len) == 0) { /* if prefix matches */
double tmp; /* temporary double for parse */
/* parse with scanf, discarding prefix with assignment suppression */
if (sscanf (buf, "%*1023[^:]: %lf", &tmp) == 1) {
sum += tmp; /* add value to sum */
n++; /* increment count of values */
}
}
}
if (n) /* if values found, output average */
printf ("prefix '%s' avg: %.4f\n", str, sum / n);
else /* output not found */
printf ("prefix '%s' -- not found in file.\n", str);
}
Example Use/Output
Using your data file stored in dat/prefixdouble.txt, you can search for each prefix in the file and obtain the average, e.g.
$ ./bin/prefixaverage dat/prefixdouble.txt hhhh
prefix 'hhhh' avg: 0.8874
$ ./bin/prefixaverage dat/prefixdouble.txt xxxx
prefix 'xxxx' avg: 0.9105
$ ./bin/prefixaverage dat/prefixdouble.txt yyyy
prefix 'yyyy' avg: 0.7962
$ ./bin/prefixaverage dat/prefixdouble.txt zzzz
prefix 'zzzz' avg: 0.7897
$ ./bin/prefixaverage dat/prefixdouble.txt foo
prefix 'foo' -- not found in file.
Much easier than having to recompile each time you want to search for another prefix. Look things over and let me know if you have further questions.

Related

Reading a text file to extract coordinates in c

i am trying to read a text file and extract the cordinates of 'X' so that i can place then on a map
the text file is
10 20
9 8 X
2 3 P
4 5 G
5 6 X
7 8 X
12 13 X
14 15 X
I tried multiple times but I am unable to extract the relevant data and place it in separate variables to plot
I am quite new to c and am trying to learn things so any help is appreciated
thanks in advance

From my top comments, I suggested an array of point structs.
Here is your code refactored to do that.
I changed the scanf to use %s instead of %c for the point name. It generalizes the point name and [probably] works better with the input line because [I think] the %c would not match up correctly.
It compiles but is untested:
#include <stdio.h>
#include <stdlib.h>
struct point {
int x;
int y;
char name[8];
};
struct point *points;
int count;
int map_row;
int map_col;
void
read_data(const char *file_name)
{
FILE *fp = fopen(file_name, "r");
if (fp == NULL) {
/* if the file opened is empty or has any issues, then show the error */
perror("File Error");
return;
}
/* get the dimensions from the file */
fscanf(fp, "%d %d", &map_row, &map_col);
map_row = map_row + 2;
map_col = map_col + 2;
while (1) {
// enlarge dynamic array
++count;
points = realloc(points,sizeof(*points) * count);
// point to place to store data
struct point *cur = &points[count - 1];
if (fscanf(fp, "%d %d %s", &cur->x, &cur->y, cur->name) != 3)
break;
}
// trim to amount used
--count;
points = realloc(points,sizeof(*points) * count);
fclose(fp);
}

There are a number of way to approach this. Craig has some very good points on the convenience of using a struct to coordinate data of different types. This approach reads with fgets() and parses the data you need with sscanf(). The benefit eliminates the risk of a matching-failure leaving characters unread in your input stream that will corrupt the remainder of your read from the point of matching-failure forward. Reading with fgets() you consume a line of input at a time, and that read is independent of the parsing of values with sscanf().
Putting it altogether and allowing the filename to be provided by the first argument to the program (or reading from stdin by default if no argument is provided), you can do:
#include <stdio.h>
#define MAXC 1024 /* if you need a constand, #define one (or more) */
int main (int argc, char **argv) {
char buf[MAXC]; /* buffer to hold each line */
int map_row, map_col; /* map row/col variables */
/* use filename provided as 1st argument (stdin if none provided) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open");
return 1;
}
/* read/validate first line saving into map_row, map_col */
if (!fgets (buf, MAXC, fp) ||
sscanf (buf, "%d %d", &map_row, &map_col) != 2) {
fputs ("error: EOF or invalid map row/col data.\n", stderr);
return 1;
}
/* loop reading remaining lines, for used as line counter */
for (size_t i = 2; fgets (buf, MAXC, fp); i++) {
char suffix;
int x, y;
/* validate parsing x, y, suffix from buf */
if (sscanf (buf, "%d %d %c", &x, &y, &suffix) != 3) {
fprintf (stderr, "error: invalid format line %zu.\n", i);
continue;
}
if (suffix == 'X') { /* check if line suffix is 'X' */
printf ("%2d %2d %c\n", x, y, suffix);
}
}
if (fp != stdin) { /* close file if not stdin */
fclose (fp);
}
}
(note: this just illustrates the read and isolation of the values from lines with a suffix of 'X'. Data handling, and calculations are left to you)
Example Use/Output
With your data in dat/coordinates.txt you could do:
$ ./bin/readcoordinates dat/coordinates.txt
9 8 X
5 6 X
7 8 X
12 13 X
14 15 X
As Craig indicates, if you need to store your matching data, then an array of struct provides a great solution.

C copy substring from text file

Say I have the following text file -
name:asdfg
address:zcvxz
,
name:qwerwer
address:zxcvzxcvxz
,
And I wanna copy the name (without "name:") to a certain string variable, the address to another and so on.
How do I do so without corrupting memory?
Tried using (example) -
char buf[50];
while (fgets(buf, 50, file) != NULL) {
if (!strncmp(buf, "name", 4))
strncpy(somestring, buf + 5, 20)
//do the same for address, continue looping
but the text lines differ in length, so it seems to copy all sorts of crap from the buffer, as the strings arent null terminated so it copies "asdfgcrapcrapcrap".

You are to be commended for using fgets to handle your file I/O as it provides a much more flexible and robust way to read, validate and prepare to parse the lines of data you read. It is generally the recommended way to do line-oriented input (either from a file or from the user). However, this is one of those circumstances where treating multiple records as formatted input does have some advantages.
Let's start with an example reading your data file and capturing the name:.... and address:... data in a simple data structure to hold both the name and address data values in a 20-char array for each. Each line is read, the length is validated, the trailing '\n' is removed and then strchr is used to locate the ':' in the line. (we don't care about lines without ':'). The label before ':' is copied to tmp and then compare against "name" or "address" to determine which value to read. Once the address data is read, both name and addr values are printed to stdout,
#include <stdio.h>
#include <string.h>
enum { MAXC = 20, MAXS = 256 };
typedef struct {
char name[MAXC],
addr[MAXC];
} data;
int main (int argc, char **argv) {
char buf[MAXS] = "",
*name = "name", /* name/address literals for comparison */
*addr = "address";
data mydata = { .name = "" };
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
while (fgets (buf, MAXS, fp)) { /* read each line */
char *p = buf, /* pointer to use with strchr */
tmp[MAXC] = ""; /* storage for labels */
size_t len = strlen (buf); /* get buf len */
if (len && buf[len - 1] == '\n') /* validate last char is '\n' */
buf[--len] = 0; /* overwrite with nul-character */
else if (len + 1 == MAXS) { /* handle string too long */
fprintf (stderr, "error: line too long or no '\n'\n");
return 1;
}
if ((p = strchr (buf, ':'))) { /* find ':' in buf */
size_t labellen = p - buf, /* get length of label */
datalen = strlen (p + 1); /* get length of data */
if (labellen + 1 > MAXC) { /* validate both lengths */
fprintf (stderr, "error: label exceeds '%d' chars.\n", MAXC);
return 1;
}
if (datalen + 1 > MAXC) {
fprintf (stderr, "error: data exceeds '%d' chars.\n", MAXC);
return 1;
}
strncpy (tmp, buf, labellen); /* copy label to temp */
tmp[labellen] = 0; /* nul-terminate */
if (strcmp (name, tmp) == 0) /* is the label "name" ? */
strcpy (mydata.name, p + 1);
else if (strcmp (addr, tmp) == 0) { /* is the label "address" ? */
strcpy (mydata.addr, p + 1);
/* record complete -- output results */
printf ("\nname : %s\naddr : %s\n", mydata.name, mydata.addr);
}
}
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
return 0;
}
(note: there are many ways to structure this logic. The example above just represents a semi-standard method)
Example Use/Output
$./bin/nameaddr <dat/nameaddr.txt
name : asdfg
addr : zcvxz
name : qwerwer
addr : zxcvzxcvxz
Here is where I will have a tough time convincing you that fgets was the way to go for this problem. Why? Here we are essentially reading formatted input that is comprised of 3-lines of data. The format string for fscanf doesn't care how many lines are involved, and can easily be constructed to skip '\n' within the formatted input. This can provide (a more fragile), but attractive alternative for the right input files.
For example, the code above can be reduced to the following using fscanf for a formatted read:
#include <stdio.h>
#define MAXC 20
typedef struct {
char name[MAXC],
addr[MAXC];
} data;
int main (int argc, char **argv) {
data mydata = { .name = "" };
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
/* read 3-lines at a time separating name and address at once */
while (fscanf (fp, " name:%19s address:%19s ,",
mydata.name, mydata.addr) == 2)
printf ("\nname : %s\naddr : %s\n", mydata.name, mydata.addr);
if (fp != stdin) fclose (fp); /* close file if not stdin */
return 0;
}
(the output is the same)
In the rare case, for the correct data file, fscanf can provide a viable alternative to a line-oriented read with fgets. However, your first choice should remain a line-oriented approach using either fgets or POSIX getline.
Look both over and let me know if you have further questions.

If the name is 20 characters or longer, strncpy() won't copy the null terminator to the destination string, so you need to add it yourself.
strncpy(somestring, buf + 5, 19);
somestring[19] = '\0';

C program to copy .csv of integers copies one less element unless element size is set to +1

I'm new to learning the C language and I wanted to write a simple program that would copy an array integers from one .csv file to a new .csv file. My code works as intended, however when my array size for fread/fwrite is set to the exact number of elements in the .csv array (10 in this case), it only copies nine of the elements.
When the array size is set to +1, it copies all the elements.
#include <stdio.h>
#include <stdlib.h>
#define LISTSIZE 11
//program that copies an array of integers from one .csv to another .csv
int main(int argc, char * argv[])
{
if (argc != 2)
{
fprintf(stderr, "Usage ./file_sort file.csv\n");
return 1;
}
char * csvfile = argv[1];
FILE * input_csvile = fopen(csvfile, "r"); //open .csv file and create file pointer input_csvile
if(input_csvile == NULL)
{
fprintf(stderr, "Error, Could not open\n");
return 2;
}
unsigned int giving_total[LISTSIZE];
if(input_csvile != NULL) //after file opens, read array from .csv input file
{
fread(giving_total, sizeof(int), LISTSIZE, input_csvile);
}
else
fprintf(stderr, "Error\n");
FILE * printed_file = fopen("school_currentfy1.csv", "w");
if (printed_file != NULL)
{
fwrite(giving_total, sizeof(int), LISTSIZE, printed_file); //copy array of LISTSIZE integers to new file
}
else
fprintf(stderr, "Error\n");
fclose(printed_file);
fclose(input_csvile);
return 0;
}
Does this have something to do with the array being 0-indexed and the .csv file being 1-indexed? I also had an output with the LISTSIZE of 11 which had the last (10) element being displayed incorrectly; 480 instead of 4800.
http://imgur.com/lLOozrc Output/input with LISTSIZE of 10
http://imgur.com/IZPGwsA Input/Output with LISTSIZE of 11

Note: as noted in the comment, fread and fwrite are for reading and writing binary data, not text. If you are dealing with a .csv (comma separated values -- e.g. as exported from MS Excel or Open/LibreOffice calc) You will need to use fgets (or any other character/string oriented function) followed by sscanf (or strtol, strtoul) to read the values as text and perform the conversion to int values. To write the values to your output file, use fprintf. (fscanf is also available for input text processing and conversion, but you lose flexibility in handling variations in input format)
However, if your goal was to read binary data for 10 integers (e.g. 40-bytes of data), then fread and fwrite are fine, but as with all input/output routines, you need to validate the number of bytes read and written to insure you are dealing with valid data within your code. (and that you have a valid output data file when you are done)
There are many ways to read a .csv file, depending on the format. One generic way is to simply read each line of text with fgets and then repeatedly call sscanf to convert each value. (this has a number of advantages in handling different spacing around the ',' compared to fscanf) You simply read each line, assign a pointer to the beginning of the buffer read by fgets, and then call sscanf (with %n to return the number of character processed by each call) and then advance the pointer by that number and scan forward in the buffer until your next '-' (for negative values) or a digit is encountered. (using %n and scanning forward can allow fscanf to be used in a similar manner) For example:
/* read each line until LISTSIZE integers read or EOF */
while (numread < LISTSIZE && fgets (buf, MAXC, fp)) {
int nchars = 0; /* number of characters processed by sscanf */
char *p = buf; /* pointer to line */
/* (you should check a whole line is read here) */
/* while chars remain in buf, less than LISTSIZE ints read
* and a valid conversion to int perfomed by sscanf, update p
* to point to start of next number.
*/
while (*p && numread < LISTSIZE &&
sscanf (p, "%d%n", &giving_total[numread], &nchars) == 1) {
numread++; /* increment the number read */
p += nchars; /* move p nchars forward in buf */
/* find next digit in buf */
while (*p && *p != '-' && (*p < '0' || *p > '9'))
p++;
}
}
Now to create your output file, you simply write numread values back out in comma separated value format. (you can adjust how many your write per line as required)
for (i = 0; i < numread; i++) /* write in csv format */
fprintf (fp, i ? ",%d" : "%d", giving_total[i]);
fputc ('\n', fp); /* tidy up -- make sure file ends with '\n' */
Then it is just a matter of closing your output file and checking for any stream errors (always check on close when writing values to a file)
if (fclose (fp)) /* always validate close after write to */
perror("error"); /* validate no stream errors occurred */
Putting it altogether, you could do something similar to the following:
#include <stdio.h>
#include <stdlib.h>
#define LISTSIZE 10
#define MAXC 256
int main(int argc, char *argv[])
{
if (argc < 3) {
fprintf(stderr, "Usage ./file_sort file.csv [outfile]\n");
return 1;
}
int giving_total[LISTSIZE]; /* change to int to handle negative values */
size_t i, numread = 0; /* generic i and number of integers read */
char *csvfile = argv[1],
buf[MAXC] = ""; /* buffer to hold MAXC chars of text */
FILE *fp = fopen (csvfile, "r");
if (fp == NULL) { /* validate csvfile open for reading */
fprintf(stderr, "Error, Could not open input file.\n");
return 2;
}
/* read each line until LISTSIZE integers read or EOF */
while (numread < LISTSIZE && fgets (buf, MAXC, fp)) {
int nchars = 0; /* number of characters processed by sscanf */
char *p = buf; /* pointer to line */
/* (you should check a whole line is read here) */
/* while chars remain in buf, less than LISTSIZE ints read
* and a valid conversion to int perfomed by sscanf, update p
* to point to start of next number.
*/
while (*p && numread < LISTSIZE &&
sscanf (p, "%d%n", &giving_total[numread], &nchars) == 1) {
numread++; /* increment the number read */
p += nchars; /* move p nchars forward in buf */
/* find next digit in buf */
while (*p && *p != '-' && (*p < '0' || *p > '9'))
p++;
}
}
if (numread < LISTSIZE) /* warn if less than LISTSIZE integers read */
fprintf (stderr, "Warning: only '%zu' integers read from file", numread);
fclose (fp); /* close input file */
fp = fopen (argc > 2 ? argv[2] : "outfile.csv", "w"); /* open output file */
if (fp == NULL) { /* validate output file open for writing */
fprintf(stderr, "Error, Could not open output file.\n");
return 3;
}
for (i = 0; i < numread; i++) /* write in csv format */
fprintf (fp, i ? ",%d" : "%d", giving_total[i]);
fputc ('\n', fp); /* tidy up -- make sure file ends with '\n' */
if (fclose (fp)) /* always validate close after write to */
perror("error"); /* validate no stream errors occurred */
return 0;
}
Like I said, there are many, many ways to approach this. The idea is to build in as much flexibility to your read as possible so it can handle any variations in the input format without choking. Another very robust way to approach the read is using strtol (or strtoul for unsigned values). Both allow will advance a pointer for you to the next character following the integer converted so you can start your scan for the next digit from there.
An example of the read flexibility provide in either of these approaches is shown below. Reading a file of any number of lines, with values separate by any separator and converting each integer encountered to a value in your array, e.g.
Example Input
$ cat ../dat/10int.csv
8572, -2213, 6434, 16330, 3034
12346, 4855, 16985, 11250, 1495
Example Program Use
$ ./bin/fgetscsv ../dat/10int.csv dat/outfile.csv
Example Output File
$ cat dat/outfile.csv
8572,-2213,6434,16330,3034,12346,4855,16985,11250,1495
Look things over and let me know if you have questions. If your intent was to read 40-bytes in binary form, just let me know and I'm happy to help with an example there.
If you want a truly generic read of values in a file, you can tweak the code that finds the number in the input file to scan forward in the file and validate that any '-' is followed by a digit. This allows reading any format and simply picking the integers from the file. For example with the following minor change:
while (*p && numread < LISTSIZE) {
if (sscanf (p, "%d%n", &giving_total[numread], &nchars) == 1)
numread++; /* increment the number read */
p += nchars; /* move p nchars forward in buf */
/* find next number in buf */
for (; *p; p++) {
if (*p >= '0' && *p <= '9') /* positive value */
break;
if (*p == '-' && *(p+1) >= '0' && *(p+1) <= '9') /* negative */
break;
}
}
You can easily process the following file and obtain the same results:
$ cat ../dat/10intmess.txt
8572,;a -2213,;--a 6434,;
a- 16330,;a
- The Quick
Brown%3034 Fox
12346Jumps Over
A
4855,;*;Lazy 16985/,;a
Dog.
11250
1495
Example Program Use
$ ./bin/fgetscsv ../dat/10intmess.txt dat/outfile2.csv
Example Output File
$ cat dat/outfile2.csv
8572,-2213,6434,16330,3034,12346,4855,16985,11250,1495

Check if all char values are present in string

I'm currently working on this assignment and I'm stuck. The objective is to read a file and find if these char values exist in the String from the file. I have to compare a String from a file to another String I put in as an argument. However, just as long as each char value is in the String from the file then it "matches".
Example (input and output):
./a.out file1 done
done is in bonehead
done is not in doggie
Example (file1):
bonehead
doggie
As you can see the order in which is compares Strings does not matter and the file also follows one word per line. I've put together a program that finds if the char value is present in the other String but that is only part of the problem. Any idea how to go about this?
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv){
FILE *f = fopen(argv[1], "r");
char *line = NULL;
size_t len = 0;
ssize_t read;
char *word = argv[2];
if(argc != 3){
printf("./a.out <file> <word>\n");
exit(EXIT_SUCCESS);
}
if(f == NULL){
printf("file empty\n");
exit(EXIT_SUCCESS);
}
// confused what this loop does too
while((read = getline(&line, &len, f)) != -1){
char *c = line;
while(*c){
if(strchr(word, *c))
printf("can't spell \"%s\" without \"%s\"!\n", line, word);
else
printf("no \"%s\" in \"%s\".\n", word, line);
c++;
}
}
fclose(f);
exit(EXIT_SUCCESS);
}

Another approach would simply keep a sum of each character matched in the line read from the file, adding one for each unique character in the word supplied to test, and if the sum is equal to the length of the string made up by the unique characters is the search term, then each of the unique characters in the search term are included in the line read from the file.
#include <stdio.h>
#include <string.h>
#define MAXC 256
int main (int argc, char **argv) {
if (argc < 3 ) { /* validate required arguments */
fprintf (stderr, "error: insufficient input, usage: %s file string\n",
argv[0]);
return 1;
}
FILE *fp = fopen (argv[1], "r");
char line[MAXC] = "";
char *s = argv[2]; /* string holding search string */
size_t slen = strlen(s), sum = 0, ulen;
char uniq[slen+1]; /* unique characters in s */
if (!fp) { /* validate file open */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
memset (uniq, 0, slen+1); /* zero the VLA */
/* fill uniq with unique characters from s */
for (; *s; s++) if (!strchr (uniq, *s)) uniq[sum++] = *s;
ulen = strlen (uniq);
s = argv[2]; /* reset s */
while (fgets (line, MAXC, fp)) { /* for each line in file */
if (strlen (line) - 1 < ulen) { /* short line, continue */
printf ("%s is not in %s", s, line);
continue;
}
char *up = uniq; /* ptr to uniq */
sum = 0; /* reset sum */
while (*up) if (strchr (line, *up++)) sum++; /* count chars */
if (sum < ulen) /* validate sum */
printf ("%s is not in %s", s, line);
else
printf ("%s is in %s", s, line);
}
fclose (fp); /* close file */
return 0;
}
Example Use/Output
$ ./bin/strallcinc dat/words.txt done
done is in bonehead
done is not in doggie
which would work equally well for duplicate characters in the search string. e.g.
$ ./bin/strallcinc dat/words.txt doneddd
doneddd is in bonehead
doneddd is not in doggie
You can decide if you would handle duplicate characters differently, but you should make some determination on how that contingency will be addressed.
Let me know if you have any questions.

confused what this loop does
The while (read ... line obviously reads in lines from your file, placing them in the line variable
*c is a pointer to the start of the variable line and this pointer is incremented by c++, so that each letter in the word from the file is accessed. The while loop will be terminated when *c points to the null terminator (0).
The if (strchr(word ... line is testing if the test word contains one of the letters from the word in the file.
This seems to be the reverse of what you are trying to do - finding if all the letters in the test word can be found in the word from the file.
The printf lines are not sensible because there is no either/or - you need one line to print 'yes' our letters are present and one line to print 'no' at least one letter is not present.
The printf statements should be outside the comparison loop, so that you don't get multiple lines of output for each word. Add a flag to show if any letter does not exist in the word. Set flag to 1 at start, and only change it to 0 when a letter is not present, then use the flag to print one of the two outcome statements.
This code snippet may help
/* set flag to 'letters all present' */
int flag = 1;
/* set pointer c to start of input line */
c = word;
/* test word from file for each letter in test word */
while(*c) {
if(strchr(line, *c) == NULL) {
/* set flag to letter not present */
flag = 0;
break;
}
c++;
}

For every possible pair of two unique words in the file, print out the count of occurrences of that pair [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
this code works for single word counts and it differentiate between words with punctuation words with upper lower case. Is there an easy way around to make this code work for pairs as well instead of single words? like I need to print the occurrence of every pair of words in a text file.
Your help is much appreciated,
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
FILE* f = fopen (argv[1], "r");
char buffer[10000];
if (argc != 2)
{
fprintf(stderr, "Usage: %s file\n", argv[0]);
}
fclose(f);
snprintf(buffer, sizeof(buffer), "tr -cs '[:punct:][a-z][A-Z]' '[\\n*]' < %s |"
" sort | uniq -c | sort -n", argv[1]);
return(system(buffer));
}
Example input
The Cat Sat On The Mat
Output
(The Cat, The Sat, The On, The The, The Mat, Cat The, Cat Sat, Cat On, for 30 pairs)

It seems inconceivable that the purpose of your assignment determining the frequency of word-pairs in a file would be to have you wrap a piped-string of shell utilities in a system call. What does that possibly teach you about C? That a system function exists that allows shell access? Well, it does, and you can, lesson done, nothing learned.
It seems far more likely that the intent was for you to understand the use of structures to hold collections of related data in a single object, or at the minimum array or pointer indexing to check for pairs in adjacent words within a file. Of the 2 normal approaches, use of a struct, or index arithmetic, the use of a struct is far more beneficial. Something simple to hold a pair of words and the frequency that pair is seen is all you need. e.g.:
enum { MAXC = 32, MAXP = 100 };
typedef struct {
char w1[MAXC];
char w2[MAXC];
size_t freq;
} wordpair;
(note, the enum simply defines the constants MAXC (32) and MAXP (100) for maximum characters per-word, and maximum pairs to record. You could use two #define statements to the same end)
You can declare an array of the wordpair struct which will hold a pair or words w1 and w2 and how many time that pair is seen in freq. The array of struct can be treated like any other array, sorted, etc..
To analyze the file, you simply need to read the first two words into the first struct, save a pointer to the second word, and then read each remaining word that remains in the file comparing whether the pair formed by the pointer and the new word read already exists (if so simply update the number of times seen), and if it doesn't exist, add a new pair updating the pointer to point to the new word read, and repeat.
Below is a short example that will check the pair occurrence for the words in all filenames given as arguments on the command line (e.g. ./progname file1 file2 ...). If no file is given, the code will read from stdin by default.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
enum { MAXC = 32, MAXP = 100 };
typedef struct {
char w1[MAXC];
char w2[MAXC];
size_t freq;
} wordpair;
size_t get_pair_freq (wordpair *words, FILE *fp);
int compare (const void *a, const void *b);
int main (int argc, char **argv) {
/* initialize variables & open file or stdin for seening */
wordpair words[MAXP] = {{"", "", 0}};
size_t i, idx = 0;
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) {
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
/* read from file given, or from stdin (default) */
idx = get_pair_freq (words, stdin);
/* read each remaining file given on command line */
for (i = 2; i < (size_t)argc; i++)
{ if (fp && fp != stdin) { fclose (fp); fp = NULL; }
/* open file for reading */
if (!(fp = fopen (argv[i], "r"))) {
fprintf (stderr, "error: file open failed '%s'.\n",
argv[i]);
continue;
}
/* check 'idx' against MAXP */
if ((idx += get_pair_freq (words, fp)) == MAXP)
break;
}
if (fp && fp != stdin) fclose (fp);
/* sort words alphabetically */
qsort (words, idx, sizeof *words, compare);
/* output the frequency of word pairs */
printf ("\nthe occurrence of words pairs are:\n\n");
for (i = 0; i < idx; i++) {
char pair[MAXC * 2] = "";
sprintf (pair, "%s:%s", words[i].w1, words[i].w2);
printf (" %-32s : %zu\n", pair, words[i].freq);
}
return 0;
}
size_t get_pair_freq (wordpair *pairs, FILE *fp)
{
char w1[MAXC] = "", w2[MAXC] = "";
char *fmt1 = " %32[^ ,.\t\n]%*c";
char *fmt2 = " %32[^ ,.\t\n]%*[^A-Za-z0-9]%32[^ ,.\t\n]%*c";
char *w1p;
int nw = 0;
size_t i, idx = 0;
/* read 1st 2 words into pair, update index 'idx' */
if (idx == 0) {
if ((nw = fscanf (fp, fmt2, w1, w2)) == 2) {
strcpy (pairs[idx].w1, w1);
strcpy (pairs[idx].w2, w2);
pairs[idx].freq++;
w1p = pairs[idx].w2; /* save pointer to w2 for next w1 */
idx++;
}
else {
if (!nw) fprintf (stderr, "error: file read error.\n");
return idx;
}
}
/* read each word in file into w2 */
while (fscanf (fp, fmt1, w2) == 1) {
/* check against all pairs in struct */
for (i = 0; i < idx; i++) {
/* check if pair already exists */
if (strcmp (pairs[i].w1, w1p) == 0 &&
strcmp (pairs[i].w2, w2) == 0) {
pairs[i].freq++; /* update frequency for pair */
goto skipdup; /* skip adding duplicate pair */
}
} /* add new pair, update pairs[*idx].freq */
strcpy (pairs[idx].w1, w1p);
strcpy (pairs[idx].w2, w2);
pairs[idx].freq++;
w1p = pairs[idx].w2;
idx++;
skipdup:
if (idx == MAXP) { /* check 'idx' against MAXP */
fprintf (stderr, "warning: MAXP words exceeded.\n");
break;
}
}
return idx;
}
/* qsort compare funciton */
int compare (const void *a, const void *b)
{
return (strcmp (((wordpair *)a)->w1, ((wordpair *)b)->w1));
}
Use/Output
Given your example of "Hi how are you are you.", it produces the desired results (in sorted order according to your LOCALE).
$ echo "Hi how are you are you." | ./bin/file_word_pairs
the occurrence of words pairs are:
Hi:how : 1
are:you : 2
how:are : 1
you:are : 1
(there is no requirement that you sort the results, but it makes lookup/confirmation a lot easier with longer files)
Removing qsort
$ echo "Hi how are you are you." | ./bin/file_word_pairs
the occurrence of words pairs are:
Hi:how : 1
how:are : 1
are:you : 2
you:are : 1
While you are free to attempt to use your system version, why not take the time to learn how to approach the problem in C. If you want to learn how to do it through a system call, take a Linux course, as doing it in that manner has very little to do with C.
Look it over, lookup the functions that are new to you in the man pages and then ask about anything you don't understand thereafter.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Extract numerical values from a string and average them - c

Related

Reading a text file to extract coordinates in c

C copy substring from text file

C program to copy .csv of integers copies one less element unless element size is set to +1

Check if all char values are present in string

For every possible pair of two unique words in the file, print out the count of occurrences of that pair [closed]

Categories

Resources