I am trying to create a program that does the following:
Open a file and read one line.
Open another file and read another line.
Compare the two lines and print a message.
This is my code:
#include <stdio.h>
#include <string.h>

int findWord(char sizeLineInput2[512]);

int main()
{
    FILE *cfPtr2, *cfPtr1;
    int i;
    char sizeLineInput1[512], sizeLineInput2[512];

    cfPtr2 = fopen("mike2.txt", "r");   // I open the first file
    while (fgets(sizeLineInput2, 512, cfPtr2) != NULL)   // I read one line from the first file
    {
        if (sizeLineInput2[strlen(sizeLineInput2)-1] == '\n')
            sizeLineInput2[strlen(sizeLineInput2)-1] = '\0';
        printf("%s \n", sizeLineInput2);
        i = findWord(sizeLineInput2);   // I call the procedure that compares the two lines
    }
    getchar();
    return 0;
}
int findWord(char sizeLineInput2[512])
{
    int x;
    char sizeLineInput1[512];
    FILE *cfPtr1;

    cfPtr1 = fopen("mike1.txt", "r");   // here I open the second file
    while (fgets(sizeLineInput1, 512, cfPtr1) != NULL)
    {
        if (sizeLineInput1[strlen(sizeLineInput1)-1] == '\n')
            sizeLineInput1[strlen(sizeLineInput1)-1] = '\0';
        if (strcmp(sizeLineInput1, sizeLineInput2) == 0)   // here I compare the two lines
            printf("the words %s and %s are equal!\n", sizeLineInput1, sizeLineInput2);
        else
            printf("the words %s and %s are not equal!\n", sizeLineInput1, sizeLineInput2);
    }
    fclose(cfPtr1);
    return 0;
}
It seems to have some problem with the file pointer handling. Could someone check it and tell me what corrections I need to make?
Deconstruction and Reconstruction
The current code structure is, to be polite about it, cock-eyed.
You should open the files in the same function - probably main(). There should be two parallel blocks of code. In fact, ideally, you'd do your opening and error handling in a function so that main() simply contains:
FILE *cfPtr1 = file_open("mike1.txt");
FILE *cfPtr2 = file_open("mike2.txt");
If control returns to main(), the files are open, ready for use.
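A minimal sketch of such a file_open() helper might look like this (the exact message and the exit-on-failure behaviour are my assumptions, not a fixed requirement):

#include <stdio.h>
#include <stdlib.h>

static FILE *file_open(const char *name)
{
    FILE *fp = fopen(name, "r");
    if (fp == NULL)
    {
        fprintf(stderr, "failed to open file %s for reading\n", name);
        exit(EXIT_FAILURE);
    }
    return fp;
}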
You then need to read a line from each file - in main() again. If either file does not contain a line, then you can bail out with an appropriate error:
if (fgets(buffer1, sizeof(buffer1), cfPtr1) == 0)
...error: failed to read file1...
if (fgets(buffer2, sizeof(buffer2), cfPtr2) == 0)
...error: failed to read file2...
Then you call your comparison code with the two lines:
findWord(buffer1, buffer2);
You need to carefully segregate the I/O operations from the actual processing of data; if you interleave them as in your first attempt, it makes everything very messy. I/O tends to be messy, simply because you have error conditions to deal with - that's why I shunted the open operation into a separate function (doubly so since you need to do it twice).
You could decide to wrap the fgets() call and error handling up in a function, too:
const char *file1 = "mike1.txt";
const char *file2 = "mike2.txt";
read_line(cfPtr1, file1, buffer1, sizeof(buffer1));
read_line(cfPtr2, file2, buffer2, sizeof(buffer2));
That function can trim the newline off the end of the string and deal with anything else that you want it to do - and report an accurate error, including the file name, if anything goes wrong. Clearly, with the variables 'file1' and 'file2' on hand, you'd use those instead of literal strings in the file_open() calls. Note, too, that making them into variables means it is trivial to take the file names from the command line; you simply set 'file1' and 'file2' to point to the argument list instead of the hard-wired defaults. (I actually wrote: const char file1[] = "mike1.txt"; briefly - but then realized that if you handle the file names via the command line, then you need pointers, not arrays.)
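A sketch of what such a read_line() wrapper might look like (again, exiting on failure is just one reasonable choice, not the only one):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void read_line(FILE *fp, const char *name, char *buffer, size_t size)
{
    if (fgets(buffer, (int)size, fp) == NULL)
    {
        fprintf(stderr, "failed to read a line from file %s\n", name);
        exit(EXIT_FAILURE);
    }
    buffer[strcspn(buffer, "\n")] = '\0';   /* trim the trailing newline, if any */
}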
Also, if you open a file, you should close the file too. Granted, if your program exits, the o/s cleans up behind you, but it is a good discipline to get into. One reason is that not every program exits (think of the daemons running services on your computer). Another is that you quite often use a resource (file, in the current discussion) briefly and do not need it again. You should not hold resources in your program for longer than you need them.
Philosophy
Polya, in his 1957 book "How To Solve It", has a dictum:
Try to treat symmetrically what is symmetrical, and do not destroy wantonly any natural symmetry.
That is as valid advice in programming as it is in mathematics. And in their classic 1978 book 'The Elements of Programming Style', Kernighan and Plauger make the telling statements:
[The] subroutine call permits us to summarize the irregularities in the argument list [...]
The subroutine itself summarizes the regularities of the code.
In more modern books such as 'The Pragmatic Programmer' by Hunt & Thomas (1999), the dictum is translated into a snappy TLA:
DRY - Don't Repeat Yourself.
If you find your code doing the 'same' lines of code repeated several times, write a subroutine to do it once and call the subroutine several times.
That is what my suggested rewrite is aiming at.
In both main() and findWord(), be careful with sizeLineInputX[strlen(sizeLineInputX) - 1] right after fgets(): if the line read happens to be empty (for example, if it starts with a '\0' byte from the file), strlen() returns 0 and the index becomes -1, which is out of bounds. Check the length before indexing.
Alternatively, instead of using fgets(), use fgetc() to read character by character and check for the newline character (and for EOF, too).
UPD to your UPD: you compare each line of mike2.txt with each line of mike1.txt - I guess that's not what you want. Open both files outside the while loop in main(), use one loop for both files, and check for the newline and EOF on both of them in that loop, as sketched below.
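Something along these lines, as a sketch only (file and buffer names are taken from the question; real code would want better error reporting):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char sizeLineInput1[512], sizeLineInput2[512];
    FILE *cfPtr1 = fopen("mike1.txt", "r");
    FILE *cfPtr2 = fopen("mike2.txt", "r");

    if (cfPtr1 == NULL || cfPtr2 == NULL)
    {
        fprintf(stderr, "could not open mike1.txt or mike2.txt\n");
        return 1;
    }

    /* One loop: one line from each file per iteration, stopping at the first EOF. */
    while (fgets(sizeLineInput1, sizeof sizeLineInput1, cfPtr1) != NULL &&
           fgets(sizeLineInput2, sizeof sizeLineInput2, cfPtr2) != NULL)
    {
        sizeLineInput1[strcspn(sizeLineInput1, "\n")] = '\0';
        sizeLineInput2[strcspn(sizeLineInput2, "\n")] = '\0';
        if (strcmp(sizeLineInput1, sizeLineInput2) == 0)
            printf("the words %s and %s are equal!\n", sizeLineInput1, sizeLineInput2);
        else
            printf("the words %s and %s are not equal!\n", sizeLineInput1, sizeLineInput2);
    }

    fclose(cfPtr1);
    fclose(cfPtr2);
    return 0;
}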
Related
I am trying to read each line from stdin after I have finished reading from a given file, or if the given file does not exist. Currently I am using the format below.
while (fgets(buf, sizeof(buf), fp) != NULL) {
    /* ...main process... */
}
while (fgets(buf, sizeof(buf), stdin) != NULL) {
    /* ...main process... */
}
This format does work as I intended.
However, the main process is quite a chunky piece of code; would there be a way to shorten this so that I write the while loop only once? Thank you.
If your problem is that the 'main process' consists of a lot of lines of code that you do not want to duplicate, the most straightforward solution is to make a function that implements the main process.
Since the while loops are identical, save for the file pointer, you could also include the while loop in the function, with the file pointer as a parameter (as in David's remark).
Then you should add a function like this:
void process_input(FILE *input_handle) {
    char buf[1024];
    while (fgets(buf, sizeof(buf), input_handle) != NULL) {
        /* ...main process... */
    }
}
Your original code should then be replaced with:
process_input(fp);
process_input(stdin);
would there be a way to shorten this, so that I can write while loop only once?
There isn't.
You can of course abstract the code into a function which takes a FILE* as a parameter, or extend the stdio interfaces yourself, but the long and short of it is that neither standard C nor any popular libc implementation has anything like the ARGV file handle from Perl, or anything that lets you open a list of files as a single stream.
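If all you want is to run the same code over a list of command-line files (falling back to stdin when none are given), a plain loop over argv is enough. A rough sketch, assuming a per-stream helper like the process_input() shown in the previous answer:

#include <stdio.h>

/* Assumed helper: same idea as the process_input() in the previous answer. */
void process_input(FILE *input_handle);

int main(int argc, char *argv[])
{
    if (argc < 2)
    {
        process_input(stdin);      /* no file names given: read stdin only */
        return 0;
    }
    for (int i = 1; i < argc; i++)
    {
        FILE *fp = fopen(argv[i], "r");
        if (fp == NULL)
        {
            perror(argv[i]);       /* report the failure and move on */
            continue;
        }
        process_input(fp);
        fclose(fp);
    }
    return 0;
}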
I am completely new to flex, and my experience in programming is rather limited. I need to create a scanner using flex that will eventually output a stream of tokens. For the moment, I just need to get the absolute basics up and running. I want the compiled output file "a.exe" to read its input from the text of a SINGLE file and not from user input. The output should also go to a file. The assignment asks that the program can be run like so in a cmd/PS window:
.\a.exe inputfile.txt outputfile.txt
where the input and output files are whatever file names are given, in that order.
As it stands currently, my program creates the output file I designate, but nothing is written to it. When I try to read the Flex manual, I get very confused, as I am still very new to computer science in general.
For the moment, I just want to get an executable file that adheres to the rules section and produces output properly. To that end, I am just counting the characters in the input file and trying to write the count to an output file. I am also trying to give the others in my class a place to begin (as none of us were formally taught in this area), so I am taking the time to write this file generically (with installation and usage instructions) so that I can give them a starting point for the actual assignment of writing the scanner.
I installed Flex 2.5.4a from http://gnuwin32.sourceforge.net/packages.html and edited my Path to include the bin directory after installation.
I build the file using the command "flex tokenout.l" and then "gcc lex.yy.c", which generates an a.exe file. The file does not seem to do much at all beyond creating the output file.
code:
int num_lines = 0;
int num_chars = 0;
FILE *yyin;
FILE *yyout;
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int yywrap(void) {
    return 0;
}

int main(int argc, char *argv[])
{
    yyin = fopen(argv[1],"r");
    yyout = fopen(argv[2],"w");
    yyparse();
    yylex();
    fprintf(yyout, "# of lines = %d, # of chars = %d\n", num_lines, num_chars);
    fclose(yyin);
    fclose(yyout);
    return 0;
}
The result should be that the line "# of lines = <the actual # of lines>, # of chars = <the actual # of characters>" is written to the file designated as the second argument.
Currently the file designated by the second argument is created but remains blank.
Lex (flex) calls (or more precisely, generates code that calls) yywrap upon reaching the end of its input stream (in yyin). The job of this function is to:
Take care of closing the input file if needed / appropriate.
Switch to the next input file, if there is a next file.
Return nonzero (1, preferably) if flex should finish up, 0 if yyin is now re-opened to the next file.
Or, as the manual puts it:
When the scanner receives an end-of-file indication from YY_INPUT, it then checks the ‘yywrap()’ function. If ‘yywrap()’ returns false (zero), then it is assumed that the function has gone ahead and set up yyin to point to another input file, and scanning continues. If it returns true (non-zero), then the scanner terminates, returning 0 to its caller. Note that in either case, the start condition remains unchanged; it does not revert to INITIAL.
If you do not supply your own version of ‘yywrap()’, then you must either use ‘%option noyywrap’ (in which case the scanner behaves as though ‘yywrap()’ returned 1), or you must link with ‘-lfl’ to obtain the default version of the routine, which always returns 1.
(Modern flex has <<EOF>> rules which are generally a better way to deal with stacked input files, since transitions between files should almost always force a token boundary.)
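For illustration, a yywrap() that steps through a list of file names might look roughly like this; it belongs in the user-code section of the .l file, and the next_file pointer plus its initialization (e.g. next_file = argv + 2 in main()) are assumptions, not part of the question's code:

#include <stdio.h>

extern FILE *yyin;          /* provided by the flex-generated scanner */
static char **next_file;    /* set up in main(), e.g. next_file = argv + 2; */

int yywrap(void)
{
    if (yyin != NULL)
        fclose(yyin);
    if (next_file == NULL || *next_file == NULL)
        return 1;                          /* no more input: tell flex to stop */
    yyin = fopen(*next_file++, "r");
    return (yyin == NULL) ? 1 : 0;         /* 0 means: keep scanning the new file */
}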
yyin = fopen(argv[1],"r");
yyout = fopen(argv[2],"w");
yyparse();
yylex();
As it stands currently, my program creates the output file I designate, but nothing is written to it.
You're confused because you don't know what your program is doing, and you don't know what it's doing because it's not telling you. What you need is feedback. In particular, you need to check for errors.
For example, what if the first fopen(3) fails? What if yyparse fails, or doesn't return? (It won't.) Check for errors, and have the program tell you what's happening.
#include <err.h>
#include <stdlib.h>

if( argc < 3 ) {
    errx(EXIT_FAILURE, "syntax: foo in out");
}
if( (yyin = fopen(argv[1], "r")) == NULL ) {
    err(EXIT_FAILURE, "could not read '%s'", argv[1]);
}
if( (yyout = fopen(argv[2], "w")) == NULL ) {
    err(EXIT_FAILURE, "could not write '%s'", argv[2]);
}

printf("starting yyparse\n");
if( 0 != yyparse() ) {
    errx(EXIT_FAILURE, "parse error");
}
printf("starting yylex\n");
if( 0 != yylex() ) {
    errx(EXIT_FAILURE, "lex error");
}
The above ensures the program is started with sufficient arguments, ensures both files are open successfully, and checks for errors parsing and lexing. That's just an example, though. As John Bollinger advised, you don't need yyparse because you're not using bison, and yyout controls only the file used by the flex ECHO statement. You can use your own global FILE * handle, and fprintf(3) to it in your flex actions.
What I think you will find is that you never see "starting yylex" on the screen, because yyparse -- if it is being generated somewhere -- never returns: it calls yylex, which never returns anything to it.
I would delete those lines, and set flex debugging on with
yy_flex_debug = 1;
before calling yylex. I think you'll find it makes more sense then.
You appear to be starting by adapting an example program from the Flex manual. That's fine, but maybe your very first step should be getting the exact example program working. After that, take it one step at a time. For example, the next step might be to get it to use the first argument as the name of the input file (and no other changes).
With respect to the partial program you have presented, I see two semantic issues:
When you use flex with bison (or yacc), it is the generated parser (accessed via yyparse()) that calls yylex(), and generally it will do so repeatedly until the input is exhausted. It is not useful in that case for the main program to call the lexer directly.
yyout is the file to which flex will direct the output of ECHO statements, nothing more, nothing less. It is not particularly useful to you, and I would ignore it for now.
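Putting those two points together, a minimal sketch of the .l file might look like this (using %option noyywrap so no yywrap() definition or -lfl is needed; the usage message and the name of the output handle are my own choices, not part of the assignment):

%option noyywrap

%{
#include <stdio.h>

int num_lines = 0;
int num_chars = 0;
FILE *out;                      /* our own output handle; not flex's yyout */
%}

%%
\n      { ++num_lines; ++num_chars; }
.       { ++num_chars; }
%%

int main(int argc, char *argv[])
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s inputfile outputfile\n", argv[0]);
        return 1;
    }
    yyin = fopen(argv[1], "r");          /* yyin is declared by flex itself */
    if (yyin == NULL) {
        perror(argv[1]);
        return 1;
    }
    out = fopen(argv[2], "w");
    if (out == NULL) {
        perror(argv[2]);
        return 1;
    }
    yylex();                             /* no yyparse(): there is no bison parser here */
    fprintf(out, "# of lines = %d, # of chars = %d\n", num_lines, num_chars);
    fclose(yyin);
    fclose(out);
    return 0;
}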
I'm curious about the function signature for yyrestart - namely in the lexer file I see that the signature is:
void yyrestart (FILE * input_file )
In my code I use yyrestart to flush the buffer, but I haven't been passing it any argument, it's just been empty:
yyrestart();
This currently works on every system we test on except for the latest version of OS X. Stepping through with GDB on my RHEL machine, it's clear that just calling it with no argument sets the file pointer to NULL:
yyrestart (input_file=0x0) at reglexer.c:1489
Whereas on El Capitan it comes through as garbage, which causes the memory error later in the generated code:
yyrestart (input_file=0x100001d0d) at reglexer.c:1489
I can't for the life of me figure out where yyrestart() is defined. Is there some macro in yacc/flex that defines the behavior for calling yyrestart with no arguments? If not, how is this even compiling?
*********** EDIT to Clarify the Compiling Question ************
As a small snippet to show what I'm talking about - this is what I have in my .y file, which runs the parser (it is a SLIGHT modification of this example):
int main() {
    FILE *myfile = fopen("infile.txt", "r");
    if (!myfile) {
        fprintf(stderr, "can't open infile.txt\n");
        return 1;
    }
    calcYYin = myfile;
    do {
        calcYYparse();
    } while (!feof(calcYYin));
    calcYYrestart();
    return 0;
}
I can build that repository with whatever I want passed in as arguments to calcYYrestart() on that line. Substituting
calcYYrestart('a', 1, 5, 'a string');
still lets me compile the entire program using make (but I get a segfault with bad input). But looking through the generated parcalc.c file, I don't see anything that would allow me to call calcYYrestart with anything except a file pointer. I only see this prototype:
void calcYYrestart (FILE * input_file );
Where's the magic happening with the compiler that lets me put whatever I want as arguments to that generated function?
You are expecting C to gently lead you through the maze, holding your hand, chiding you when you err and applauding your successes.
These may not be unreasonable expectations for a language, but C is not that language. C does what you tell it to do, nothing more, and when your instructions fall short of clarity, it simply lets you fall.
Although, in its defense, you can ask it to be a bit more verbose. If you specify -Wall on the command line (at least with gcc and clang), the compiler will provide you with some warnings. [See note 1.]
In this case, it probably would have warned you that calcYYrestart was not declared, which would make it your responsibility to get the arguments right. The function is declared and defined in the lexer, but here you are using it in the parser, which is a separate compilation unit. You really should declare it in the parser prologue, but nothing will enforce the correctness of that declaration. (C++ would fail to link in that case, but C does not record argument types in the formal function name.)
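For example, a declaration like this in the %{ ... %} prologue of the .y file (matching the prototype shown in the generated lexer) gives the compiler enough information to check the call site:

%{
#include <stdio.h>

/* Matches the prototype in the generated lexer file (reglexer.c above). */
void calcYYrestart(FILE *input_file);
%}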
It's worth noting that there are many problems with the sample code you are basing your work on. I'd suggest looking for a better bison/flex tutorial, or at least reading through the sections in the flex manual about how input is handled.
Here, I've added some annotations to the original example, which shows the calc.y bison input file:
/* This is unnecessary, since `calcYYparse` is defined in this file.
extern int calcYYparse();
*/
extern FILE *calcYYin;

/* Command line arguments are always good */
int main(int argc, char** argv) {
    /* If there is an argument, use it. Otherwise, stick with stdin. */
    /* There is no need for a local variable. We can just use yyin. */
    if (argc > 1) {
        calcYYin = fopen(argv[1], "r");
        if (!calcYYin) {
            fprintf(stderr, "can't open %s\n", argv[1]);
            return 1;
        }
    }
    /* calcYYin = myfile; */

    /* This loop is unnecessary, since yyparse parses input until it
     * reaches EOF, unless it hits an error. And if it hits an error, it
     * will call calcYYerror (below), which in turn calls exit(1), so it
     * never returns.
     */
    /* do { */
    calcYYparse();
    /* } while (!feof(calcYYin)); */
    return 0;
}

void calcYYerror(const char* s) {
    fprintf(stderr, "Error! %s\n", s);
    /* Valid arguments to `exit` are 0 and small positive integers. */
    exit(EXIT_FAILURE);
}
Of course, you probably don't want to just blow up the world if you hit a syntax error. The intention was probably to discard the rest of the line and then continue the parse. In that case, for obvious reasons, calcYYerror should not call exit().
By default, after yyerror is called, yyparse returns immediately (after cleaning up its local storage) with an error indication. If you want it to instead continue, then you need to use an error production, which would be the best solution.
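For illustration only, an error production might look something like this (the rule name, token, and action are invented, not taken from the question's grammar):

line
    : expr '\n'        { printf("= %d\n", $1); }
    | error '\n'       { yyerrok; /* discard tokens up to the newline, then resume */ }
    ;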
You could also simply call yyparse again, as in the example. However, that leaves an unknown amount of the input file in the flex buffer. There is no reason to believe that the buffer contains exactly the rest of the line in error. Since flex scanners typically read their input in large chunks (except for interactive input), resetting the input file with yyrestart will discard a random amount of input, leaving the input file pointer at a random position in the file, which probably does not correspond with the beginning of a new line.
Even if that were not the case, as with unbuffered (interactive) input, it is entirely possible that the error was detected at the end of a line, in which case the new line will already have been consumed. So discarding to the end of the current line will result in discarding the line following the error.
Finally, the use of feof(input) to terminate input loops is a well-known antipattern, and should be avoided in favour of terminating when an EOF is encountered while reading input. In the case of flex-generated scanners, when EOF is detected, the current input is discarded, and then (if yywrap doesn't succeed in creating a new input) the end-of-input indication is returned to the parser. By then, yyin is no longer valid (because it was discarded), and calling feof on it is undefined behaviour.
Notes
You get even more warnings by also specifying -Wextra. And you can make the compiler a little stricter by telling it to use the latest standard, -std=c11, instead of the 1989 version augmented with various gcc extensions, mostly now outdated.
Disclaimer: this is for an assignment. I am not asking for explicit code. Rather, I only ask for enough help that I may understand my problem and correct it myself.
I am attempting to recreate the Unix ar utility as per a homework assignment. The majority of this assignment deals with file I/O in C, and other parts deal with system calls, etc.
In this instance, I intend to create a simple listing of all the files within the archive. I have not gotten far, as you may notice. The plan is relatively simple: read each file header from an archive file and print only the value held in ar_hdr.ar_name. The rest of the fields will be skipped over via fseek(), including the file data, until another file is reached, at which point the process begins again. If EOF is reached, the function simply terminates.
I have little experience with file I/O, so I am already at a disadvantage with this assignment. I have done my best to research proper ways of achieving my goals, and I believe I have implemented them to the best of my ability. That said, there appears to be something wrong with my implementation. The data from the archive file does not seem to be read, or at least not stored in a variable. Here's my code:
struct ar_hdr
{
    char ar_name[16];   /* name */
    char ar_date[12];   /* modification time */
    char ar_uid[6];     /* user id */
    char ar_gid[6];     /* group id */
    char ar_mode[8];    /* octal file permissions */
    char ar_size[10];   /* size in bytes */
};

void table()
{
    FILE *stream;
    char str[sizeof(struct ar_hdr)];
    struct ar_hdr temp;

    stream = fopen("archive.txt", "r");
    if (stream == 0)
    {
        perror("error");
        exit(0);
    }

    while (fgets(str, sizeof(str), stream) != NULL)
    {
        fscanf(stream, "%[^\t]", temp.ar_name);
        printf("%s\n", temp.ar_name);
    }

    if (feof(stream))
    {
        // hit end of file
        printf("End of file reached\n");
    }
    else
    {
        // other error interrupted the read
        printf("Error: feed interrupted unexpectedly\n");
    }
    fclose(stream);
}
At this point, I only want to be able to read the data correctly. I will work on seeking the next file after that has been finished. I would like to reiterate my point, however, that I'm not asking for explicit code - I need to learn this stuff and having someone provide me with working code won't do that.
You've defined a char buffer named str to hold your data, but you are then accessing a separate, never-filled ar_hdr structure named temp. As well, you are reading binary data as a string, which will break because of embedded nulls.
You need to read it as binary data and either change temp to be a pointer to str or read directly into temp using something like:
ret = fread(&temp, sizeof(temp), 1, stream);
(look at the doco for fread - my C is too rusty to be sure of that). Make sure you check and use the return value.
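For what it's worth, here is roughly what that call looks like with its return value checked - only the read-a-header step, not a complete solution, since the question explicitly asks not to be handed one. It reuses the struct ar_hdr and the open stream from the question and ignores the archive's global header and padding:

struct ar_hdr temp;
size_t ret = fread(&temp, sizeof temp, 1, stream);   /* one raw header, as bytes */

if (ret != 1)
{
    /* short read: check feof(stream) / ferror(stream) and stop */
}
else
{
    printf("%.16s\n", temp.ar_name);   /* ar_name is fixed-width and not NUL-terminated */
}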
I'm trying to read a specific line from a file. I can get the line number, but I'm not sure how to go about doing it. This is what I have so far:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    FILE *file;
    file = fopen("temp.txt", "r");
    char tmp[256] = {0x0};
    char *tmpline;
    int numline = 1;

    while (file != NULL && fgets(tmp, sizeof(tmp), file) != NULL)
    {
        tmpline = strstr(tmp, "status:green");
        if (tmpline) {
            printf("%d - %s", numline, tmpline);
        }
        numline++;
    }
    if (file != NULL) fclose(file);
    return 0;
}
The test file looks like:
s1.server.com
127.0.0.1
status:green
s2.server.com
127.0.0.1
status:red
s3.server.com
127.0.0.1
status:red
s4.server.com
127.0.0.1
status:green
The output that I have is:
3 - status:green
15 - status:green
But what I really want it to show is:
s1.server.com
s4.server.com
I want it to search for "status:green" and then go back a few lines to show which server it belongs to.
It sounds as if you need to do one of two things.
Simpler option: keep a little circular buffer of lines. Read into line 0, line 1, line 2, ..., line n-1, line 0, line 1, etc. Then, when you see the text you want, look in entry (current_index - 2) mod buffer_size. (Here it sounds as if a buffer size of 3 will suffice.)
More sophisticated option: actually parse the input so that for each block you work out the server name, its IP address and its status, and then display the information you need using that.
The "more sophisticated option" would be substantially more work, but more robust if the syntax of your input ever changes (e.g., with optional extra lines with more information about the server -- multiple IP addresses or multiple names, perhaps).
There are some other things you could do that I think are worse. (1) Call ftell on each line and put the results of that in a circular buffer, and then use fseek when you see "status:green". (2) Read the whole file using code like you currently have, building up a list of "good" servers' line numbers. Then go through the file again and report the good ones. I think these are both uglier and less efficient than the approaches I listed above. There's one possible advantage: you can adapt them to count in "stanzas" separated by blank lines, without needing to parse things properly. That would get you part of the flexibility of the "more sophisticated" approach I mentioned, without needing a proper parser.
And here's a hybrid possibility: don't use a circular buffer, but one whose size can increase if need be. Start at the first entry in the buffer each time you see a blank line. Let the buffer grow if there are "long" stanzas. Then when you see "status:green", do whatever processing you need to on the (presumably complete) stanza now held in your buffer.
None of the above is necessary, of course, if you're sure that the file format will never change.
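A bare-bones sketch of the "simpler option", assuming the three-line groups shown in the question (buffer size 3, so the server name sits two slots behind the status line):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char lines[3][256] = {{0}};
    int idx = 0;
    FILE *file = fopen("temp.txt", "r");

    if (file == NULL)
        return 1;
    while (fgets(lines[idx], sizeof lines[idx], file) != NULL)
    {
        if (strstr(lines[idx], "status:green") != NULL)
            printf("%s", lines[(idx + 1) % 3]);   /* (idx - 2) mod 3 == (idx + 1) mod 3 */
        idx = (idx + 1) % 3;
    }
    fclose(file);
    return 0;
}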
If the test file (and production file) is well-formed then you can do something like the following (error checking left out for brevity!):
typedef struct _SERVERSTATUS
{
    char* name;
    char* ip;
    char* status;
} SERVERSTATUS;

SERVERSTATUS ss;
ss.name = calloc(256, 1);
ss.ip = calloc(256, 1);
ss.status = calloc(256, 1);

while (!feof(file))
{
    fgets(ss.name, 256, file);
    fgets(ss.ip, 256, file);
    fgets(ss.status, 256, file);
    if (!strcmp(ss.status, "status:green"))
        printf("%s\n", ss.name);
}

free(ss.name);
free(ss.ip);
free(ss.status);
Edit: You also have to handle the whitespace between the file entries! That's, um, left as an exercise for the questioner.
Read the first and third lines in each group. Search for status:green and, if found, print the server name.