C Programming - File io parsing strings using sscanf


I am trying to do the following in the C programming language. Any help, or a finished version of the code, would be greatly appreciated:
I am trying to write a program in C that uses file I/O, parses the words with the sscanf function, and outputs each word of the sentences in a txt document (bar.txt). Here are the instructions.
Write a program that opens the file bar.txt; name the program "reader". Pass a parameter to indicate lines to read. Read all the lines in the file based on the parameter 'lines' into a buffer and using sscanf parse all the words of the sentences into different string* variables. Print each of the words to the screen followed by a carriage return. You can hardwire filename (path of bar.txt) or use option to enter filename.
This is the txt file (bar.txt) I am working with:
bar.txt
this is the first sentence
this is the 2nd sentence
this is the 3rd sentence
this is the 4th sentence
this is the 5th sentence
end of file: bar.txt
usage of argv: Usage: updater [-f "filename"] 'lines'
-f is optional (if not provided have a hardwired name from previous program 2 (bar.txt))
'lines' integer from 1 to 10 (remember the files has 5-10 strings from previous program)
a sample input example for the input into the program is:
./reader -f bar.txt 1
OUTPUT:
Opening file "bar.txt"
File Sentence 1 word 1 = this
File Sentence 1 word 2 = is
File Sentence 1 word 3 = the
File Sentence 1 word 4 = first
File Sentence 1 word 5 = sentence
another example
./reader -f bar.txt 5
OUTPUT:
File Sentence 5 word 1 = this
File Sentence 5 word 2 = is
File Sentence 5 word 3 = the
File Sentence 5 word 4 = 5th
File Sentence 5 word 5 = sentence
Examples of commands:
./reader -f bar.txt 5
./reader -f bar.txt 2
./reader -f bar.txt 7
./reader 2
./reader 5
./reader 8
./reader 11
This is the code that I have so far. Please fix the code to show the desired output:
#include <stdlib.h>
#include <stdio.h>
#define MAXCHAR 1000
int main(int argc, char *argv[]) {
FILE *file;
char string[MAXCHAR];
char* filename = "c:\\cprogram\\fileio-activity\\bar.txt";
int integer = argv[3][0] - 48;
int i; //for loops
if (argv[1][0] == '-' && argv[1][1] == 'f')
{
file = fopen(filename, "r");
if (file == NULL){
printf("Could not open file %s",filename);
return 1;
}
while (fgets(string, MAXCHAR, file) != NULL)
printf("%s", string);
fclose(file);
return 0;
}
}

You need to get the filename from argv if they use the -f option. And you need to get the number of lines from a different argument depending on whether this option was supplied.
Use strcmp() to compare strings, rather than testing each character separately. And use atoi() to convert the lines argument to an integer, since your method only works for single-digit numbers.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define MAXCHAR 1000
/* Print the usage message and exit */
void usage(void) {
    fprintf(stderr, "Usage: reader [-f filename] lines\n");
    exit(1);
}
int main(int argc, char *argv[]) {
    FILE *file;
    char string[MAXCHAR];
    char* filename = "c:\\cprogram\\fileio-activity\\bar.txt";
    int integer;
    int i; //for loops
    if (argc < 2) {
        usage();
    }
    /* Process arguments */
    if (strcmp(argv[1], "-f") == 0)
    {
        if (argc < 4) {
            usage();
        }
        filename = argv[2];
        integer = atoi(argv[3]);
    } else {
        integer = atoi(argv[1]);
    }
    file = fopen(filename, "r");
    if (file == NULL){
        fprintf(stderr, "Could not open file %s\n",filename);
        return 1;
    }
    while (fgets(string, MAXCHAR, file) != NULL)
        printf("%s", string);
    fclose(file);
    return 0;
}
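The program above still prints every line. To pick out only the sentence named by the lines argument, one option is to count lines while reading. This is a minimal sketch, assuming sentences are numbered from 1 and each line fits in the buffer; read_nth_line is a hypothetical helper name, not something the assignment prescribes:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy line number `wanted` (1-based) of `file` into buf, without its newline.
 * Assumes every line fits in buf; a longer line would be counted twice.
 * Returns 1 on success, 0 if the file has fewer than `wanted` lines. */
int read_nth_line(FILE *file, int wanted, char *buf, size_t size)
{
    assert(file != NULL && buf != NULL && size > 1);
    for (int current = 1; fgets(buf, (int)size, file) != NULL; current++) {
        if (current == wanted) {
            buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
            return 1;
        }
    }
    return 0;
}
```

In main() this could replace the printing loop, for example read_nth_line(file, integer, string, MAXCHAR), with a 0 return treated as "no such sentence" (which covers inputs such as ./reader 11).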

To add to what Barmar already answered, for the further steps in completing the assignment:
Splitting a string into separate words is usually called tokenization, and we normally use strtok() for this. There are also several ways to use sscanf() to do it. For example:
Use sscanf(string, "%s %s %s", word1, word2, word3) with however many word buffers you might need. (If you use e.g. char word1[100], then use %99s, to avoid buffer overrun bugs. One character must be reserved for the end-of-string character \0.)
The return value of sscanf() tells you how many words it copied to the word buffers. However, if string contains more than the number of words you specified, the extra ones are lost.
If the exercise specifies the maximum length of strings, say N, then you know there can be at most N/2+1 words, each of maximum length N, because each consecutive pair of words must be separated by at least one space or other whitespace character.
  
Use sscanf(string + off, " %s%n", word, &len) to obtain each word in a loop. It will return 1 (with int len set to a positive number) for each new word, and 0 or EOF when string starting at off does not contain any more words.
The idea is that for each new word, you increment off by len, thus examining the rest of string in each iteration.
  
Use sscanf(string + off, " %n%*s%n", &start, &end) with int start, end to obtain the range of positions containing the next word. Set start = -1 and end = -1 before the call, and repeat as long as end > start after the call. Advance to next word by adding end to off.
The beginning of the next word (when start >= 0) is then string + off + start, and it has end - start characters.
To "emulate" strtok() behaviour, one can temporarily save the terminating character (which can be whitespace or the end of string character) by using e.g. char saved = string[off + end];, then replace it with an end-of-string character, string[off + end] = '\0';, so that (string + off + start) is a pointer to the word, just like strtok() returns. Before the next scan, one does string[off + end] = saved; to restore the saved character, and off += end; to advance to the next word.
The first one is the easiest, but is the least useful in practical programs. (It works fine, but we do not usually know beforehand the string length and word count limitations.)
The second one is very useful when you have alternate patterns you can try for the next "word" or item; for example, when reading 2D or 3D vectors (points in a plane, or in three-dimensional space), you can support multiple different formats from <1 2 3> to [1,2,3] to 1 2 3, by trying to parse the most complicated/longest first, and trying the next one, until one of them works. (If none of them work, then the input is in error, of course.)
The third one is most useful in that it describes essentially how strtok() works, and what its side effects are. (Its saved state is hidden internally as a static variable.)
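As an illustration of the second approach, here is a minimal sketch of the sscanf() plus %n loop, printing in the format the assignment shows. The limits MAX_WORDS and MAX_WORD_LEN and the helper names split_words and print_sentence are assumptions made for this example only:

```c
#include <assert.h>
#include <stdio.h>

#define MAX_WORDS 32       /* assumed limit, not from the assignment */
#define MAX_WORD_LEN 100   /* keep in sync with %99s below */

/* Split line into whitespace-separated words using sscanf() and %n.
 * Returns the number of words stored in words[]. */
int split_words(const char *line, char words[][MAX_WORD_LEN], int max_words)
{
    int count = 0;
    int off = 0;   /* offset of the not yet parsed rest of line */
    int len = 0;   /* characters consumed by the last sscanf() call */

    assert(line != NULL);
    while (count < max_words &&
           sscanf(line + off, " %99s%n", words[count], &len) == 1) {
        off += len;   /* continue right after the word just read */
        count++;
    }
    return count;
}

/* Print the words of one sentence, one per line. */
void print_sentence(int sentence, const char *line)
{
    char words[MAX_WORDS][MAX_WORD_LEN];
    int n = split_words(line, words, MAX_WORDS);

    for (int i = 0; i < n; i++)
        printf("File Sentence %d word %d = %s\n", sentence, i + 1, words[i]);
}
```

Calling print_sentence(1, "this is the first sentence\n") would print the five "File Sentence 1 word ..." lines from the question. A word longer than 99 characters is split into several pieces rather than overflowing its buffer.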

Related

Using fread() to parse data through a file and stdin

I am trying to write an algorithm that takes input from a file and builds what is called an "s1 record". (The functionality of this function is not important) Depending on the command line arguments, the program will set the inputFile to the specified file, or stdin if no file is provided.
The algorithm needs to be structured in a way that can handle both file patterns.
The idea of this is to take FILE* data and read it into a buffer of size 16 bytes. Every 16 bytes of data, an s1 record will be built. As long as there are 16 bytes to read then it works fine and dandy. Once there is a line with less than 16 bytes, it doesn't create an s1 record.
I've tested the output and these are some of the things I noticed:
When I run the program using "stdin", I am prompted for user input. I enter 20 characters (which should print 16 in 1 srecord, and 4 in another) and my output is as follows:
12345678901234567890
Buffer: 1234567890123456
S113000031323334353637383930313233343536AA
When I run the program using a file (record.dat) with one single line with the characters of the alphabet on it, I get this:
Buffer: ABCDEFGHIJKLMNOP
Buffer: QRSTUVWXYZKLMNOP
This is not valid either, as it prints the "KLMNOP" at the end of the line as well.
My question is: How can I structure this to accept the input from either a file or stdin using the same algorithm, and what exactly am I doing wrong in my algorithm? I have tried providing all the useful information I can, and can specify more detail if requested. Below is the code for the algorithm I am trying to write.
inputFile is set to stdin if no file is specified
char buffer[kMaxLineSize + 1] = { '\0' };
char byte = 0;
int count = 1;
while((fread(buffer, 1, kMaxLineSize, inputFile)))
{
printf("%c", byte);
clearCRLF(buffer);
printf("Buffer: %s\n", buffer);
if(outputFormat == 1)
{
char s1Record[kMaxSRecordSize] = { 0 };
buildS1Record(addressField, s1Record, buffer);
fprintf(outputFile, "%s\n", s1Record);
addressField += strlen(buffer);
s1Count++;
}
else
{
char asmRecord[kMaxASMRecordSize] = { 0 };
buildAssemblyRecord(asmRecord, buffer);
fprintf(outputFile, "%s\n", asmRecord);
}
}

I'll try to combine the comments in this answer.
But first, what you call an s1 "record" is not a record. It is a string of maximum 16 characters and a terminating null character. A record, in my understanding, is a struct with fields of possibly different types, one of which could be a string.
The code fixes are as follows:
char buffer[kMaxLineSize + 1] = { '\0' };
size_t len; /* fread() returns a size_t */
while ((len = fread(buffer, 1, kMaxLineSize, inputFile)) > 0)
{
...
char s1Record[kMaxSRecordSize] = { 0 };
buildS1Record(addressField, s1Record, buffer, len);
This passes the length that was read to the function, so it can copy exactly the characters read and terminate them with a '\0' character. Note also that there can be a discrepancy between kMaxLineSize and kMaxSRecordSize: they must be the same size (plus 1 for \0), so better use a single variable.
I hope this late answer can still be of use to you.
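To make the point about the read length concrete, here is a minimal sketch that terminates the buffer at exactly the number of bytes fread() returned, so a short final chunk no longer shows leftovers such as "KLMNOP". CHUNK_SIZE stands in for kMaxLineSize and read_chunk is a hypothetical helper, both assumptions for this example:

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK_SIZE 16   /* assumed to match kMaxLineSize in the question */

/* Read up to CHUNK_SIZE bytes into buffer and terminate exactly what was read.
 * buffer must hold CHUNK_SIZE + 1 bytes.
 * Returns the number of bytes read, or 0 at end of input. */
size_t read_chunk(FILE *in, char *buffer)
{
    assert(in != NULL && buffer != NULL);
    size_t len = fread(buffer, 1, CHUNK_SIZE, in);
    buffer[len] = '\0';   /* cut off leftovers from a previous, longer chunk */
    return len;
}
```

Because it only takes a FILE *, the same function works whether inputFile is a file opened with fopen() or stdin. With interactive stdin, fread() keeps waiting until it has 16 bytes or sees end of input, which is probably why the last 4 characters did not appear until the input was closed.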

fscanf() how to go in the next line?

So I have a wall of text in a file and I need to recognize some words that are between the $ sign and call them as numbers then print the modified text in another file along with what the numbers correspond to.
Also lines are not defined and columns should be max 80 characters.
Ex:
I $like$ cats.
I [1] cats.
[1] --> like
That's what I did:
#include <stdio.h>
#include <stdlib.h>
#define N 80
#define MAX 9999
int main()
{
FILE *fp;
int i=0,count=0;
char matr[MAX][N];
if((fp = fopen("text.txt","r")) == NULL){
printf("Error.");
exit(EXIT_FAILURE);
}
while((fscanf(fp,"%s",matr[i])) != EOF){
printf("%s ",matr[i]);
if(matr[i] == '\0')
printf("\n");
//I was thinking maybe to find two $ but Idk how to replace the entire word
/*
if(matr[i] == '$')
count++;
if(count == 2){
...code...
}
*/
i++;
}
fclose(fp);
return 0;
}
My problem is that fscanf doesn't recognize '\0' so it doesn't go in the next line when I print the array..also I don't know how to replace $word$ with a number.

Not only will fscanf("%s") read one whitespace-delimited string at a time, it will also eat all whitespace between those strings, including line terminators. If you want to reproduce the input whitespace in the output, as your example suggests you do, then you need a different approach.
Also lines are not defined and columns should be max 80 characters.
I take that to mean the number of lines is not known in advance, and that it is acceptable to assume that no line will contain more than 80 characters (not counting any line terminator).
When you say
My problem is that fscanf doesn't recognize '\0' so it doesn't go in the next line when I print the array
I suppose you're talking about this code:
char matr[MAX][N];
/* ... */
if(matr[i] == '\0')
Given that declaration for matr, the given condition will always evaluate to false, regardless of any other consideration. fscanf() does not factor in at all. The type of matr[i] is char[N], an array of N elements of type char. That evaluates to a pointer to the first element of the array, which pointer will never be NULL. It looks like you're trying to determine when to write a newline, but nothing remotely resembling this approach can do that.
I suggest you start by taking @Barmar's advice to read line-by-line via fgets(). That might look like so:
char line[N+2]; /* N + 2 leaves space for both newline and string terminator */
if (fgets(line, sizeof(line), fp) != NULL) {
/* one line read; handle it ... */
} else {
/* handle end-of-file or I/O error */
}
Then for each line you read, parse out the "$word$" tokens by whatever means you like, and output the needed results (everything but the $-delimited tokens verbatim; the bracket substitution number for each token). Of course, you'll need to memorialize the substitution tokens for later output. Remember to make copies of those, as the buffer will be overwritten on each read (if done as I suggest above).
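One way to do the per-line step is sketched below, under stated assumptions: tokens never span lines, and the limits MAX_TOKENS and MAX_TOKEN_LEN as well as the name substitute_line are invented for this example. It copies everything verbatim except the $-delimited tokens, which it saves and replaces with a bracketed number:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 100     /* assumed limits, not from the question */
#define MAX_TOKEN_LEN 81

/* Write line to out, replacing each $word$ with [k] and saving a copy of
 * word in tokens[k-1]. *count is the number of tokens saved so far. */
void substitute_line(const char *line, FILE *out,
                     char tokens[][MAX_TOKEN_LEN], int *count)
{
    assert(line != NULL && out != NULL && count != NULL);
    const char *p = line;
    while (*p != '\0') {
        const char *open = strchr(p, '$');
        const char *close = open ? strchr(open + 1, '$') : NULL;
        size_t len = close ? (size_t)(close - open - 1) : 0;
        if (close == NULL || *count >= MAX_TOKENS || len >= MAX_TOKEN_LEN) {
            fputs(p, out);   /* no complete token left: copy the rest as-is */
            return;
        }
        fwrite(p, 1, (size_t)(open - p), out);   /* text before the token */
        memcpy(tokens[*count], open + 1, len);   /* keep a copy of the word */
        tokens[*count][len] = '\0';
        fprintf(out, "[%d]", ++*count);
        p = close + 1;
    }
}
```

After the last line has been processed, a loop over tokens[0] to tokens[count - 1] can print the "[1] --> like" legend.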

fscanf() does recognize '\0', under select circumstances, but that is not the issue here.
The code needs to detect '\n'. fscanf(fp,"%s"... will not do that. The first thing "%s" directs is to consume (and not save) any leading white-space, including '\n'. Read a line of text with fgets() instead.
Simply read one line at a time. Then march down the buffer looking for words.
The following uses "%n" to track how far in the buffer scanning stopped.
// more room for \n \0
#define BUF_SIZE (N + 1 + 1)
char buffer[BUF_SIZE];
while (fgets(buffer, sizeof buffer, stdin) != NULL) {
char *p = buffer;
char word[sizeof buffer];
int n;
while (sscanf(p, "%s%n", word, &n) == 1) {
// do something with word
if (strcmp(word, "$zero$") == 0) fputs("0", stdout);
else if (strcmp(word, "$one$") == 0) fputs("1", stdout);
else fputs(word, stdout);
fputc(' ', stdout);
p += n;
}
fputc('\n', stdout);
}

Use fread() to read the file contents into a char[] buffer. Then iterate through this buffer, and whenever you find a $, perform a strncmp to detect which value to replace it with (keep in mind that there is a second $ at the end of the word). To replace $word$ with a number, you need to either shrink or extend the buffer at the position of the word, depending on the length of the number in ASCII format (normally you should be able to use memmove for this). Then you can write the number into the gap that results, overwriting the $word$ as well.
Then write the buffer to the file, overwriting all its previous contents.
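The shrink-or-extend step could look like the following minimal sketch. replace_span is a hypothetical helper, and the caller is assumed to have already located the positions of the two $ characters:

```c
#include <assert.h>
#include <string.h>

/* Replace buf[start..end) with repl, shifting the tail with memmove().
 * cap is the total size of buf in bytes.
 * Returns 0 on success, -1 if the span is invalid or the result would not fit. */
int replace_span(char *buf, size_t cap, size_t start, size_t end, const char *repl)
{
    assert(buf != NULL && repl != NULL && start <= end);
    size_t len = strlen(buf);
    size_t rlen = strlen(repl);

    if (end > len || len - (end - start) + rlen + 1 > cap)
        return -1;
    /* Move the tail, including the terminating '\0', to its new position.
     * memmove() is required because source and destination may overlap. */
    memmove(buf + start + rlen, buf + end, len - end + 1);
    memcpy(buf + start, repl, rlen);   /* write the replacement into the gap */
    return 0;
}
```

For the example line, start is the index of the first $ and end is one past the second $, so replace_span(buf, sizeof buf, 2, 8, "[1]") turns "I $like$ cats." into "I [1] cats.".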

C Resetting data counters in FOR loop

I've got a very large text file that I'm trying to do word analysis on. Besides word count, I might be looking for other information as well, but I left that out for simplicity.
In this text file I have blocks of text separated by asterisks '*'. The code I have below scans the text file and prints out # of characters and words as it should, but I'd like to reset the counter after an asterisk is met, and store all information in a table of some sort. I'm not so worried on how I'll make the table as much as I am unsure of how to loop the same counting code for each text block between asterisks.
Maybe a for loop like
for (arr = strstr(arr, "*"); arr; arr = strstr(arr + strlen("*"), "*"))
Example text file:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-
I have a sentence. I have two sentences now.
*
I have another sentence. And another.
*
I'd like to count the amount of words and characters from the asterisk above this
one until the next asterkisk, not including the count from the last one.
*
...
...
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
(EOF)
Desired output:
*# #words #alphaChar
----------------------------
1 9 34
-----------------------------
2 5 30
-----------------------------
3 28 124
...
...
I have tried
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    int characterCount = 0;
    int counterPosition, wordCount = 0, alphaCount = 0;
    int i = 0;
    //input file
    FILE *file = fopen("test.txt", "r");
    if (file == NULL) {
        printf("Cannot find the file.\n");
        return 1;
    }
    //Count total number of characters in file
    while (1)
    {
        counterPosition = fgetc(file);
        if (counterPosition == EOF)
            break;
        ++characterCount;
    }
    rewind(file); // Sends the pointer to the beginning of the file
    //Dynamically allocate since array size cant be variable
    char *arr = malloc(characterCount + 1); // +1 for the terminating '\0'
    if (arr == NULL)
        return 1;
    while (i < characterCount && fscanf(file, "%c", &arr[i]) == 1) //Scan until the end of file.
        i++; //increment, storing each character in a unique position
    arr[i] = '\0'; // terminate so string functions can be used on arr
    characterCount = i;
    fclose(file);
    for (i = 0; i < characterCount; i++)
    {
        if (arr[i] == ' ') //count words
            wordCount++;
        if (isalpha((unsigned char)arr[i])) //count letters only
            alphaCount++;
    }//end for loop
    printf("word count is %d and alpha count is %d", wordCount, alphaCount);
    free(arr);
}

Since you have the full text of the file in the array arr[], you need to divide that string using * as the delimiter. You can use strtok() to do this, then perform the word count and character count on each token. See the strtok() documentation for details.
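A minimal sketch of that idea follows. It assumes the text is a '\0'-terminated, writable buffer (strtok() modifies it), and it counts words as runs of non-whitespace characters rather than counting spaces, so the word numbers can differ slightly from the table in the question. struct block_stats, count_block and count_blocks are names invented for this example:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

struct block_stats { int words; int letters; };

/* Count whitespace-separated words and alphabetic characters in one block. */
static struct block_stats count_block(const char *block)
{
    struct block_stats s = {0, 0};
    int in_word = 0;

    for (const char *p = block; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (isalpha(c))
            s.letters++;
        if (isspace(c)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;   /* first character of a new word */
            s.words++;
        }
    }
    return s;
}

/* Split text at '*' with strtok() and fill stats[] with one entry per block.
 * The counters start from zero for every block.
 * Returns the number of blocks found (at most max_blocks). */
int count_blocks(char *text, struct block_stats stats[], int max_blocks)
{
    int n = 0;

    assert(text != NULL && stats != NULL);
    for (char *block = strtok(text, "*"); block != NULL && n < max_blocks;
         block = strtok(NULL, "*"))
        stats[n++] = count_block(block);
    return n;
}
```

Printing the table is then a loop over stats[0] to stats[n - 1]. Note that strtok() skips empty blocks, so two adjacent asterisks count as a single separator.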

Reading from two columns into variables

I'm writing a C program that takes an input file and stores it. The input file has two columns, with an integer in the first and a string in the second, like so:
12 apple
17 frog
20 grass
I've tried using fgets to take an entire line as a string then break it apart using scanf but I'm getting lots of issues. I have searched quite a lot but haven't found anything that answers my question, but sorry if I missed something obvious.
This is the code that I've been trying:
while(fgets(line, sizeof(line), fp))
{
scanf(line, "%d\t%s", &key, value);
insert(key, value, newdict);
}

Let's have a quick go at doing this with strtok, since someone mentioned it. Let's imagine your file is called file.txt and has the following contents:
10 aaa
20 bbb
30 ccc
This is how we can parse it:
#include <stdio.h>
#include <string.h>
#define MAX_NUMBER_OF_LINES 10 // parse a maximum of 10 lines
#define MAX_LINE_SIZE 50 // parse a maximum of 50 chars per line
int main ()
{
FILE* fh = fopen("file.txt", "r"); // open the file
if (fh == NULL) { perror("file.txt"); return 1; } // stop if it cannot be opened
char temp[MAX_LINE_SIZE]; // some buffer storage for each line
// storage for MAX_NUMBER_OF_LINES integers
int d_out[MAX_NUMBER_OF_LINES];
// storage for MAX_NUMBER_OF_LINES strings each MAX_LINE_SIZE chars long
char s_out[MAX_NUMBER_OF_LINES][MAX_LINE_SIZE];
// i is a special variable that tells us if we're parsing a number or a string (0 for num, 1 for string)
// di and si are indices to keep track of which line we're currently handling
int i = 0, di = 0, si = 0;
while (fgets(temp, MAX_LINE_SIZE, fh) && di < MAX_NUMBER_OF_LINES) // read the input file and parse the string
{
temp[strcspn(temp, "\n")] = '\0'; // get rid of the newline in the buffer, if there is one
char* c = strtok(temp, " "); // set the delimiters
while(c != NULL)
{
if (i == 0) // i equal to 0 means we're parsing a number
{
i = 1; // next we'll parse a string, let's indicate that
sscanf(c, "%d", &d_out[di++]);
}
else // i must be 1 parsing a string
{
i = 0; // next we'll parse a number
sprintf(s_out[si++], "%s", c);
}
c = strtok(NULL, " ");
}
printf("%d %s\n", d_out[di -1], s_out[si - 1]); // print what we've extracted
}
fclose(fh);
return 0;
}
This will extract the contents from the file and store them in respective arrays, we then print them and get back our original contents:
$ ./a.out
10 aaa
20 bbb
30 ccc

Use:
fgets (name, 100, stdin);
100 is the max length of the buffer. You should adjust it as per your need.
Or use:
scanf ("%99[^\n]%*c", name);
[] is the scanset conversion. [^\n] reads characters for as long as the input is not a newline ('\n'), and the width 99 keeps the read within a 100-byte buffer. The %*c then reads the newline that is left in the input buffer, and the * discards it (assignment suppression), since you do not need it. This way the leftover newline does not cause problems for later reads.

The problem here seems to be that you are reading input twice: first from the file with fgets, and then from stdin with scanf. You will probably not get any errors from the compiler for your use of scanf, but you should be getting warnings, since you use line as the format string and the other arguments do not match that format. It would also be pretty obvious if you checked the return value from scanf, as it returns the number of successfully scanned items. Your call would most likely return zero (or minus one when you have hit end of file).
You should be using sscanf instead to parse the line you read with fgets.
See e.g. this reference for the different scanf variants.
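A minimal sketch of that fix, with the return value checked. VALUE_LEN and parse_pair are names assumed for this example:

```c
#include <assert.h>
#include <stdio.h>

#define VALUE_LEN 64   /* assumed size of the string column; keep in sync with %63s */

/* Parse one "integer<whitespace>string" line that was read with fgets().
 * Returns 1 if both fields were converted, 0 otherwise. */
int parse_pair(const char *line, int *key, char value[VALUE_LEN])
{
    assert(line != NULL && key != NULL && value != NULL);
    /* sscanf() returns the number of items it converted; both must succeed. */
    return sscanf(line, "%d %63s", key, value) == 2;
}
```

Inside the question's loop, the call would be something like if (parse_pair(line, &key, value)) insert(key, value, newdict);, so malformed lines are skipped instead of inserting stale values. The single space in the format matches any run of whitespace, including a tab.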

Your problem can be solved by using sscanf (with the support of getline) like below:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
FILE *fp;
char *line = NULL;
size_t len = 0;
ssize_t read;
/* tokens bags */
char tok_str[255];
int tok_int;
fp = fopen("./file.txt", "r");
if (fp == NULL)
exit(EXIT_FAILURE);
/* Reads the line from the stream. */
while ((read = getline(&line, &len, fp)) != -1) {
/* Scans the character string pointed by line, according to given format. */
/* Only print when both fields were converted; %254s keeps tok_str in bounds. */
if (sscanf(line, "%d %254s", &tok_int, tok_str) == 2)
printf("%d-%s\n", tok_int, tok_str);
}
free(line);
fclose(fp);
exit(EXIT_SUCCESS);
}
Or, even simpler: you could use fscanf directly and replace the while loop shown above (along with some other redundant code cleanups) with the following one. Testing the return value of fscanf, rather than feof, ends the loop cleanly both at end of file and on malformed input:
/* Scans input from the file stream pointer until a line fails to match. */
while (fscanf(fp, "%d %254s", &tok_int, tok_str) == 2) {
printf("%d-%s\n", tok_int, tok_str);
}
Assuming that your file contains following lines (where single line format is number[tab]string[newline]):
12 apple
17 frog
20 grass
the output will be:
12-apple
17-frog
20-grass

How to find number of lines of a file?

For example:
file_ptr=fopen("data_1.txt", "r");
How do I find the number of lines in the file?

You read every single character in the file and add up those that are newline characters.
You should look into fgetc() for reading a character and remember that it will return EOF at the end of the file and \n for a line-end character.
Then you just have to decide whether a final incomplete line (i.e., file has no newline at the end) is a line or not. I would say yes, myself.
Here's how I'd do it, in pseudo-code of course since this is homework:
open file
set line count to 0
read character from file
while character is not end-of-file:
    if character is newline:
        add 1 to line count
    read character from file
Extending that to handle an incomplete last line may not be necessary for this level of question. If it is (or you want to try for extra credits), you could look at:
open file
set line count to 0
set last character to end-of-file
read character from file
while character is not end-of-file:
    if character is newline:
        add 1 to line count
    set last character to character
    read character from file
if last character is not end-of-file and not newline:
    add 1 to line count
No guarantees that either of those will work since they're just off the top of my head, but I'd be surprised if they didn't (it wouldn't be the first or last surprise I've seen however - test it well).
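For readers who want to compare against real code, the second pseudo-code version might translate to C roughly as follows. This is a sketch, with count_lines as an assumed function name, and it treats an empty file as having zero lines:

```c
#include <assert.h>
#include <stdio.h>

/* Count the lines in an open stream.
 * A final line without a terminating '\n' still counts as a line. */
long count_lines(FILE *fp)
{
    long lines = 0;
    int last = EOF;   /* EOF here means "nothing read yet" */
    int c;            /* int, not char, so that EOF can be told apart */

    assert(fp != NULL);
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            lines++;
        last = c;
    }
    if (last != EOF && last != '\n')
        lines++;      /* incomplete last line */
    return lines;
}
```

With the question's file it would be called as count_lines(file_ptr) right after a successful fopen().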

Here's a different way:
#include <stdio.h>
#include <stdlib.h>
int main (void) {
    int lineCount;
    FILE *outputPtr;
    /* popen() is POSIX, not standard C; wc prints the line count first */
    outputPtr = popen("wc -l data_1.txt", "r");
    if (!outputPtr) {
        fprintf (stderr, "Could not run wc.\n");
        return EXIT_FAILURE;
    }
    /* Parse the leading number directly instead of filling a small buffer */
    if (fscanf (outputPtr, "%d", &lineCount) != 1) {
        fprintf (stderr, "Wrong filename or other error.\n");
        pclose (outputPtr);
        return EXIT_FAILURE;
    }
    if (pclose (outputPtr) != 0) {
        fprintf (stderr, "Unknown error.\n");
        return EXIT_FAILURE;
    }
    fprintf (stdout, "Line count: %d\n", lineCount);
    return EXIT_SUCCESS;
}

Is finding the line count the first step of some more complex operation? If so, I suggest you find a way to operate on the file without knowing the number of lines in advance.
If your only purpose is to count the lines, then you must read them and... count!