I made a minimal example for the packcc parser generator.
Here, the parser has to recognize float or integer numbers.
I try to print the location of the detected numbers. For simplicity there is no
line/column counting, just the offset from "ftell".
%auxil "FILE*" # The type sent to "pcc_create" for access in "ftell".
test <- line+
/
_ EOL+
line <- num _ EOL
num <- [0-9]+'.'[0-9]+ {printf("Float at %li\n", ftell(auxil));}
/
[0-9]+ {printf("Integer at %li\n", ftell(auxil));}
_ <- [ \t]*
EOL <- '\n' / '\r\n' / '\r'
%%
int main()
{
    FILE* file = fopen("test.txt", "r");
    if (file == NULL) {
        puts("File not found");
    }
    else {
        // The generated parser reads from stdin, so point stdin at the file.
        stdin = file;
        // Parse until the input is exhausted.
        pcc_context_t *ctx = pcc_create(file);
        while (pcc_parse(ctx, NULL));
        pcc_destroy(ctx);
        fclose(file);
    }
    return 0;
}
The file to parse can be
2.0
42
The command can be
packcc test.peg && cc test.c && ./a.out
The problem is that the printed position is always the end of the file, whatever the
number's position.
Positions can be retrieved with special variables.
In the example above, "ftell(auxil)" must be replaced by "$0s" or "$0e".
$0s is the beginning of the matched pattern and $0e is the end of the matched pattern.
https://github.com/arithy/packcc/blob/master/README.md
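For example, the num rule from the question could become something like the following (a sketch only; $0s and $0e are byte offsets into the input, so the printf format or a cast may need adjusting for your packcc version):
num <- [0-9]+'.'[0-9]+ {printf("Float at %lu\n", (unsigned long)$0s);}
/
[0-9]+ {printf("Integer at %lu\n", (unsigned long)$0s);}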
Without looking more closely at the generated code, it would seem that the parser insists on reading the entire text into memory before executing any of the actions. That seems unnecessary for this grammar, and it is certainly not the way a typical generated lexical scanner would work. It's particularly odd since it seems like the generated scanner uses getchar to read one byte at a time, which is not very efficient if you are planning to read the entire file.
To be fair, you wouldn't be able to use ftell in a flex-generated scanner either, unless you forced the scanner into interactive mode. (The original AT&T lex, which also reads one character at a time, would give you a reasonable value from ftell. But you're unlikely to find a scanner built with it anymore.)
Flex would give you the wrong answer because it deliberately reads its input in chunks the size of its buffer, usually 8k. That's a lot more efficient than character-at-a-time reading. But it doesn't work for interactive environments -- for example, where you are parsing directly from user input -- because you don't want to read beyond the end of the line the user typed.
You'll have to ask whoever maintains packcc what their intended approach for maintaining source position is. It's possible that they have something built in.
I am completely new to flex, and my programming experience is rather limited. I need to create a scanner using flex that will eventually output a stream of tokens. For the moment, I just need to get the absolute basics up and running. I want the compiled output file "a.exe" to read its text from a SINGLE file rather than from user input. The output should also go to a file. The assignment asks that the program be able to run like so in a cmd/PS window:
.\a.exe inputfile.txt outputfile.txt
Where the input and output files are whatever file names are given, in that order.
As it stands, my program creates the output file I designate, but nothing is written to it. When trying to read the flex manual, I am very confused, as I am still very new to computer science in general.
For the moment, I just want to get an executable that adheres to the rules section and produces output properly. That said, I am just counting the characters in the input file and trying to write the counts to an output file. I am also trying to give the others in my class a place to begin (as none of us has been formally taught any of this), so I am taking the time to write this file generically (with installation and usage instructions) so that I can give them a starting point for the actual assignment of writing the scanner.
I installed Flex 2.5.4a from http://gnuwin32.sourceforge.net/packages.html. I edited my Path to include the bin directory after installation.
I build the program with "flex tokenout.l" and then "gcc lex.yy.c", which generates an a.exe file. The program does not seem to do much at all beyond creating the output file.
code:
int num_lines = 0;
int num_chars = 0;
FILE *yyin;
FILE *yyout;
%%
\n ++num_lines; ++num_chars;
. ++num_chars;
%%
int yywrap(void) {
return 0;
}
int main(int argc, char *argv[])
{
yyin = fopen(argv[1],"r");
yyout = fopen(argv[2],"w");
yyparse();
yylex();
fprintf(yyout,"# of lines = %d, # of chars = %d\n", num_lines, num_chars);
fclose(yyin);
fclose(yyout);
return 0;
}
The result should be that the line "# of lines = (the actual # of lines), # of chars = (the actual # of characters)" is written to the file designated by the second argument.
Currently the file designated by the second argument is created but remains blank.
Lex (flex) calls (or more precisely, generates code that calls) yywrap upon reaching the end of its input stream (in yyin). The job of this function is to:
Take care of closing the input file if needed / appropriate.
Switch to the next input file, if there is a next file.
Return nonzero (1, preferably) if flex should finish up, 0 if yyin is now re-opened to the next file.
Or, as the manual puts it:
When the scanner receives an end-of-file indication from YY_INPUT, it then checks the ‘yywrap()’ function. If ‘yywrap()’ returns false (zero), then it is assumed that the function has gone ahead and set up yyin to point to another input file, and scanning continues. If it returns true (non-zero), then the scanner terminates, returning 0 to its caller. Note that in either case, the start condition remains unchanged; it does not revert to INITIAL.
If you do not supply your own version of ‘yywrap()’, then you must either use ‘%option noyywrap’ (in which case the scanner behaves as though ‘yywrap()’ returned 1), or you must link with ‘-lfl’ to obtain the default version of the routine, which always returns 1.
(Modern flex has <<EOF>> rules which are generally a better way to deal with stacked input files, since transitions between files should almost always force a token boundary.)
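Since this assignment only ever has one input file, a minimal sketch of the fix is simply:
int yywrap(void) {
    return 1;   /* no more input files: stop scanning at end of file */
}
or, equivalently, put this in the definitions section of the .l file:
%option noyywrap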
yyin = fopen(argv[1],"r");
yyout = fopen(argv[2],"w");
yyparse();
yylex();
As it stands currently, my program creates the output file I designate, but nothing is written to it.
You're confused because you don't know what your program is doing, and you don't know what it's doing because it's not telling you. What you need is feedback. In particular, you need to check for errors.
For example, what if the first fopen(3) fails? What if yyparse fails, or doesn't return? (It won't.) Check for errors, and have the program tell you what's happening.
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
if( argc < 3 ) {
errx(EXIT_FAILURE, "syntax: foo in out");
}
if( (yyin = fopen(argv[1],"r")) == NULL ) {
err(EXIT_FAILURE, "could not read '%s'", argv[1]);
}
if( (yyout = fopen(argv[2],"w")) == NULL ) {
err(EXIT_FAILURE, "could not write '%s'", argv[2]);
}
printf("starting yyparse\n");
if( 0 != yyparse() ) {
errx(EXIT_FAILURE, "parse error");
}
printf("starting yylex\n");
if( 0 != yylex() ) {
errx(EXIT_FAILURE, "lex error");
}
The above ensures the program is started with sufficient arguments, ensures both files are open successfully, and checks for errors parsing and lexing. That's just an example, though. As John Bollinger advised, you don't need yyparse because you're not using bison, and yyout controls only the file used by the flex ECHO statement. You can use your own global FILE * handle, and fprintf(3) to it in your flex actions.
What I think you will find is that you never see "starting yylex" on the screen, because yyparse never returns: if it is being generated somewhere, it is calling yylex, which never returns anything to it.
I would delete those lines, and set flex debugging on with
yy_flex_debug = 1;
before calling yylex. I think you'll find it makes more sense then.
You appear to be starting by adapting an example program from the Flex manual. That's fine, but maybe your very first step should be getting the exact example program working. After that, take it one step at a time. For example, the next step might be to get it to use the first argument as the name of the input file (and no other changes).
With respect to the partial program you have presented, I see two semantic issues:
When you use flex with bison (or yacc), it is the generated parser (accessed via yyparse()) that calls yylex(), and generally it will do so repeatedly until the input is exhausted. It is not useful in that case for the main program to call the lexer directly.
yyout is the file to which flex will direct the output of ECHO statements, nothing more, nothing less. It is not particularly useful to you, and I would ignore it for now.
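Putting the advice from these answers together, here is a sketch of what the whole .l file might look like (untested; it keeps the counting rules from the question, drops yyparse, and writes to its own FILE * handle instead of yyout):
%{
#include <stdio.h>

int num_lines = 0;
int num_chars = 0;
%}
%option noyywrap
%%
\n      { ++num_lines; ++num_chars; }
.       { ++num_chars; }
%%
int main(int argc, char *argv[])
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s inputfile outputfile\n", argv[0]);
        return 1;
    }
    yyin = fopen(argv[1], "r");
    if (yyin == NULL) {
        perror(argv[1]);
        return 1;
    }
    FILE *out = fopen(argv[2], "w");   /* our own handle; yyout is only used by ECHO */
    if (out == NULL) {
        perror(argv[2]);
        return 1;
    }
    yylex();                           /* no yyparse(): there is no bison parser here */
    fprintf(out, "# of lines = %d, # of chars = %d\n", num_lines, num_chars);
    fclose(yyin);
    fclose(out);
    return 0;
}
Build and run it the same way as before: flex tokenout.l, then gcc lex.yy.c, then .\a.exe inputfile.txt outputfile.txt.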
I would like to know how to get the cursor position (x, y) in my program, without writing anything to the screen and without tracking it all the time.
I found a way to get its position with this function (I don't check the return values of read, write, etc. here to keep the code short, but I do in my program):
void get_cursor_position(int *col, int *rows)
{
int a = 0;
int i = 0;
char buf[4];
write(1, "\033[6n", 4); // string asking for the cursor position
read(1, buf, 4);
while (buf[i])
{
if (buf[i] >= 48 && buf[i] <= 57)
{
if (a == 0)
*rows = atoi(&buf[i]) - 1;
else
*col = atoi(&buf[i]) - 1;
a++;
}
i++;
}
}
This function gives me the exact cursor position (*rows = y, *col = x), but it writes on the screen.
How can I get the cursor position without writing anything on the screen?
(If the cursor is on one of the printed characters, it will overwrite it.)
Should echo be toggled before and after sending the escape sequence?
This is a school project, so I can only use termcap; I can't use ncurses functions, and the only allowed functions are tputs, tgoto, tgetstr, tgetnum and tgetflag.
There are several problems:
canonical mode is buffered (see below)
the read is done on the file-descriptor for standard output (that may happen to work — sometimes — but don't count on it)
the read does not read enough characters to get a typical response
the response would have two decimal integers, separated by semicolon ;
the response would have a final character (which would become an issue if the read actually asked for enough characters...)
Further reading:
General Terminal Interface, The Single UNIX® Specification, Version 2
In canonical mode input processing, terminal input is processed in units of lines. A line is delimited by a newline character (NL), an end-of-file character (EOF), or an end-of-line (EOL) character. See Special Characters for more information on EOF and EOL. This means that a read request will not return until an entire line has been typed or a signal has been received. Also, no matter how many bytes are requested in the read() call, at most one line will be returned. It is not, however, necessary to read a whole line at once; any number of bytes, even one, may be requested in a read() without losing information.
XTerm Control Sequences
CSI Ps n Device Status Report (DSR).
Ps = 5 -> Status Report.
Result ("OK") is CSI 0 n
Ps = 6 -> Report Cursor Position (CPR) [row;column].
Result is CSI r ; c R
That is, your program should be prepared to read Escape [ followed by two decimal integers separated by ; (with no fixed limit on their length), and a final character R.
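For illustration, here is a sketch of the whole exchange based on that description. Note that it relies on POSIX termios (tcgetattr/tcsetattr) to switch off canonical mode and echo around the request, which goes beyond the five termcap functions you listed, so treat it as a reference rather than a drop-in solution:
/* Sketch only: assumes a VT100/xterm-compatible terminal and POSIX termios. */
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

static int get_cursor_position(int *row, int *col)
{
    struct termios saved, raw;
    char buf[32];
    unsigned int i = 0;

    if (tcgetattr(STDIN_FILENO, &saved) == -1)
        return -1;
    raw = saved;
    raw.c_lflag &= ~(ICANON | ECHO);    /* no line buffering, no echo */
    raw.c_cc[VMIN] = 1;                 /* read returns after one byte */
    raw.c_cc[VTIME] = 0;
    if (tcsetattr(STDIN_FILENO, TCSANOW, &raw) == -1)
        return -1;

    write(STDOUT_FILENO, "\033[6n", 4); /* DSR 6: ask for the cursor position */

    /* The reply is ESC [ row ; col R, so read byte by byte until the 'R'. */
    while (i < sizeof(buf) - 1) {
        if (read(STDIN_FILENO, &buf[i], 1) != 1)
            break;
        if (buf[i] == 'R')
            break;
        i++;
    }
    buf[i] = '\0';

    tcsetattr(STDIN_FILENO, TCSANOW, &saved);  /* restore the terminal state */

    if (buf[0] != '\033' || buf[1] != '[')
        return -1;
    if (sscanf(&buf[2], "%d;%d", row, col) != 2)
        return -1;
    return 0;
}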
By the way, termcap by itself will do little for your solution. While ncurses has some relevant capabilities defined in the terminal database:
# u9 terminal enquire string (equiv. to ANSI/ECMA-48 DA)
# u8 terminal answerback description
# u7 cursor position request (equiv. to VT100/ANSI/ECMA-48 DSR 6)
# u6 cursor position report (equiv. to ANSI/ECMA-48 CPR)
few programs use those, and in any case you would find it difficult to use the cursor position report in a termcap application.
This function prints the length of each word as a row of '*' characters (a histogram). How can I save the results into a text file? I tried, but the program does not save the results (no errors).
void histogram(FILE *myinput)
{
FILE *ptr;
printf("\nsaving results...\n");
ptr=fopen("results1.txt","wt");
int j, n = 1, i = 0;
size_t ln;
char arr[100][10];
while(n > 0)
{
n = fscanf(myinput, "%s",arr[i]);
i++;
}
n = i;
for(i = 0; i < n - 1; i++)
{
ln=strlen(arr[i]);
fprintf(ptr,"%s \t",arr[i]);
for(j=0;j<ln;j++)
fprintf(ptr, "*");
fprintf(ptr, "\n");
}
fclose(myinput);
fclose(ptr);
}
I see two ways to take care of this issue:
Open a file in the program and write to it.
If running from the command line, change where standard output goes:
$> ./histogram > outfile.txt
Using the '>' will change where standard out will write to. The issue with '>' is that it will truncate a file and then write to the file. This means that if there was any data in that file before, it is gone. Only the new data written by the program will be there.
If you need to keep the data in the file, you can change the standard out to append the file with '>>' as in the following example:
$> ./histogram >> outfile.txt
Also, there does not have to be a space between '>' and the file name. I just do that for preference. It could look like this:
$> ./histogram >outfile.txt
If writing to a file is going to be a one-time thing, redirecting standard output is probably the best way to go. If you are going to do it every time, then add it to the code.
You will need to open another FILE. You can do this in the function or pass it in like you did the file being read from.
Use 'fprintf' to write to the file:
int fprintf(FILE *restrict stream, const char *restrict format, ...);
Your program may have these lines added to write to a file:
FILE *myoutput = fopen("output.txt", "w"); // or "a" if you want to append
fprintf(myoutput, "%s \t",arr[i]);
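Put together, a sketch of the function writing to its own output file might look like this (assuming <stdio.h> and <string.h> are included; the file name "output.txt" is just an example):
void histogram(FILE *myinput)
{
    char word[100];
    size_t len, j;
    FILE *myoutput = fopen("output.txt", "w");   /* or "a" if you want to append */
    if (myoutput == NULL) {
        perror("output.txt");
        return;
    }
    while (fscanf(myinput, "%99s", word) == 1) { /* field width prevents overflow */
        len = strlen(word);
        fprintf(myoutput, "%s \t", word);
        for (j = 0; j < len; j++)
            fprintf(myoutput, "*");
        fprintf(myoutput, "\n");
    }
    fclose(myoutput);
}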
Answer Complete
There may be some other issues as well that I will discuss now.
Your histogram function (as originally posted) does not have a return type. C will treat it as 'int' automatically and then complain that you do not return a value from the function. From what you have provided, I would add 'void' before the function name:
void histogram(FILE *myinput)
The size of arr's second dimension may be too small. One has to assume that no token in the file you are reading exceeds 10 characters, including the null terminator '\0' at the end of the string. That means there can be at most 9 characters in a string; otherwise you will overflow the buffer and potentially corrupt your data.
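One way to guard against that overflow, as a sketch, is to give fscanf a field width so it never stores more than the buffer can hold:
n = fscanf(myinput, "%9s", arr[i]);  /* at most 9 characters plus the '\0' fit in arr[i] */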
Edit
The above was written before a change to the provided code that now includes a second file and fprintf statements.
I will point to the line that opens the output file:
ptr=fopen("results1.txt","wt");
I am wondering if you mean to put "w+" where the second character is a plus symbol. According to the man page there are six possibilities:
The argument mode points to a string beginning with one of the
following sequences (possibly followed by additional characters, as
described below):
r Open text file for reading. The stream is positioned at the
beginning of the file.
r+ Open for reading and writing. The stream is positioned at the
beginning of the file.
w Truncate file to zero length or create text file for writing.
The stream is positioned at the beginning of the file.
w+ Open for reading and writing. The file is created if it does
not exist, otherwise it is truncated. The stream is
positioned at the beginning of the file.
a Open for appending (writing at end of file). The file is
created if it does not exist. The stream is positioned at the
end of the file.
a+ Open for reading and appending (writing at end of file). The
file is created if it does not exist. The initial file
position for reading is at the beginning of the file, but
output is always appended to the end of the file.
As such, it appears you are attempting to open the file for reading and writing.
A friend of mine needs to use MATLAB for one of his classes, so he called me up (I'm a Computer Science major) and asked if I could teach him C. I am familiar with C++, so I am also familiar with the general syntax, but I had to read up on the I/O library for C.
I was creating some simple I/O programs to show my friend, but my third program is causing me trouble. When I run the program on my machine using Eclipse (with the CDT), Eclipse's console produces glitchy output: instead of prompting me for the data, it collects all the input and then prints everything at once, along with FAILURE.
The program is supposed to get a filename from user, create the file, and write to it until the user enters a blank line.
When I compile/run it on my machine via console (g++ files2.c) I am prompted for the data properly, but FAILURE shows up, and there is no output file.
I think the error lies in how I am using the char arrays, since using scanf to get the filename will create a working file (probably because it skips whitespace), but the program then does not enter the while loop.
#include <stdio.h>
#define name_length 20
#define line_size 80
int main() {
FILE * write_file; // pointer to file you will write to
char filename[name_length]; // variable to hold the name of file
char string_buffer[line_size]; // buffer to hold your text
printf("Filename: "); // prompt for filename
fgets(filename, name_length, stdin); // get filename from user
if (filename[name_length-1] == '\n') // if last char in stream is newline,
{filename[name_length-1] = '\0';} // remove it
write_file = fopen(filename, "w"); // create/overwrite file user named
if (!write_file) {printf("FAILURE");} // failed to create FILE *
// inform user how to exit
printf("To exit, enter a blank line (no spaces)\n");
// while getting input, print to file
while (fgets(string_buffer, line_size, stdin) != NULL) {
fputs(string_buffer, write_file);
if (string_buffer[0] == '\n') {break;}
}
fclose(write_file);
return 0;
}
How should I go about fixing the program? I have found next to nothing on user-terminated input being written to file.
Now if you will excuse me, I have a couple of files to delete off of my University's UNIX server, and I cannot specify them by name since they were created with convoluted filenames...
EDIT------
Like I said, I was able to use
scanf("%s", filename);
to get a working filename (without the newline char). But regardless of whether I use scanf or fgets for my while loop, if I use either in conjunction with scanf for the filename, I am not able to write anything to the file, as the program does not enter the while loop.
How should I restructure my writing to file and my while loop?
Your check for the newline is wrong; you're looking at the last character in filename but it may be before that if the user enters a filename that's shorter than the maximum. You're then trying to open a file that has a newline in its name.
These lines seem to be incorrect:
if (filename[name_length-1] == '\n') // if last char in stream is newline,
{filename[name_length-1] = '\0';} // remove it
You check the character at index name_length - 1, which is 19 in your case, without any regard for the entered filename's length. So if your filename's length is less than 18 you won't replace the '\n' character at the end of your string. Obviously the filename can't contain a '\n' character.
You need to get the length of your filename first, with strlen() for example.
if (filename[strlen(filename) - 1] == '\n')
{
filename[strlen(filename) - 1] = '\0';
}
(Don't forget to include the string.h header)
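An equivalent sketch that also copes with the case where no newline is present at all is the strcspn idiom:
filename[strcspn(filename, "\n")] = '\0';   /* cut the string at the first '\n', if any */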
I hope I was able to help despite my weak English.
I am trying to create a program that does the following actions:
Open a file and read one line.
Open another file and read another line.
Compare the two lines and print a message.
This is my code:
#include <stdio.h>
#include <string.h>
int findWord(char sizeLineInput2[512]);
int main()
{
FILE*cfPtr2,*cfPtr1;
int i;
char sizeLineInput1[512],sizeLineInput2[512];
cfPtr2=fopen("mike2.txt","r");
// I open the first file
while (fgets(sizeLineInput2, 512, cfPtr2)!=NULL)
// I read from the first 1 file one line
{
if (sizeLineInput2[strlen(sizeLineInput2)-1]=='\n')
sizeLineInput2[strlen(sizeLineInput2)-1]='\0';
printf("%s \n",sizeLineInput2);
i=findWord(sizeLineInput2);
//I call the procedure that compares the two lines
}
getchar();
return 0;
}
int findWord(char sizeLineInput2[512])
{
int x;
char sizeLineInput1[512];
File *cfPtr1;
cfPtr1=fopen("mike1.txt","r");
// here I open the second file
while (fgets(sizeLineInput1, 512,cfPtr1)!=NULL)
{
if (sizeLineInput1[strlen(sizeLineInput1)-1]=='\n')
sizeLineInput1[strlen(sizeLineInput1)-1]='\0';
if (strcmp(sizeLineInput1,sizeLineInput2)==0)
//Here, I compare the two lines
printf("the words %s and %s are equal!\n",sizeLineInput1,sizeLineInput2);
else
printf("the words %s and %s are not equal!\n",sizeLineInput1,sizeLineInput2);
}
fclose(cfPtr1);
return 0;
}
It seems to have some problem with the file pointer handling. Could someone check it and tell me what corrections I need to make?
Deconstruction and Reconstruction
The current code structure is, to be polite about it, cock-eyed.
You should open the files in the same function - probably main(). There should be two parallel blocks of code. In fact, ideally, you'd do your opening and error handling in a function so that main() simply contains:
FILE *cfPtr1 = file_open("mike1.txt");
FILE *cfPtr2 = file_open("mike2.txt");
If control returns to main(), the files are open, ready for use.
You then need to read a line from each file - in main() again. If either file does not contain a line, then you can bail out with an appropriate error:
if (fgets(buffer1, sizeof(buffer1), cfPtr1) == 0)
...error: failed to read file1...
if (fgets(buffer2, sizeof(buffer2), cfPtr2) == 0)
...error: failed to read file2...
Then you call your comparison code with the two lines:
findWord(buffer1, buffer2);
You need to carefully segregate the I/O operations from the actual processing of data; if you interleave them as in your first attempt, it makes everything very messy. I/O tends to be messy, simply because you have error conditions to deal with - that's why I shunted the open operation into a separate function (doubly so since you need to do it twice).
You could decide to wrap the fgets() call and error handling up in a function, too:
const char *file1 = "mike1.txt";
const char *file2 = "mike2.txt";
read_line(cfPtr1, file1, buffer1, sizeof(buffer1));
read_line(cfPtr2, file2, buffer2, sizeof(buffer2));
That function can trim the newline off the end of the string and deal with anything else that you want it to do - and report an accurate error, including the file name, if anything goes wrong. Clearly, with the variables 'file1' and 'file2' on hand, you'd use those instead of literal strings in the file_open() calls. Note, too, that making them into variables means it is trivial to take the file names from the command line; you simply set 'file1' and 'file2' to point to the argument list instead of the hard-wired defaults. (I actually wrote: const char file1[] = "mike1.txt"; briefly - but then realized that if you handle the file names via the command line, then you need pointers, not arrays.)
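A sketch of those two helpers, matching the calls shown above (assuming <stdio.h>, <stdlib.h> and <string.h> are included; the exact error handling is up to you):
static FILE *file_open(const char *name)
{
    FILE *fp = fopen(name, "r");
    if (fp == NULL)
    {
        fprintf(stderr, "failed to open file %s\n", name);
        exit(EXIT_FAILURE);
    }
    return fp;
}

static void read_line(FILE *fp, const char *name, char *buffer, size_t size)
{
    if (fgets(buffer, (int)size, fp) == NULL)
    {
        fprintf(stderr, "failed to read a line from %s\n", name);
        exit(EXIT_FAILURE);
    }
    buffer[strcspn(buffer, "\n")] = '\0';   /* trim the trailing newline, if any */
}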
Also, if you open a file, you should close the file too. Granted, if your program exits, the o/s cleans up behind you, but it is a good discipline to get into. One reason is that not every program exits (think of the daemons running services on your computer). Another is that you quite often use a resource (file, in the current discussion) briefly and do not need it again. You should not hold resources in your program for longer than you need them.
Philosophy
Polya, in his 1957 book "How To Solve It", has a dictum:
Try to treat symmetrically what is symmetrical, and do not destroy wantonly any natural symmetry.
That is as valid advice in programming as it is in mathematics. And in their classic 1978 book 'The Elements of Programming Style', Kernighan and Plauger make the telling statements:
[The] subroutine call permits us to summarize the irregularities in the argument list [...]
The subroutine itself summarizes the regularities of the code.
In more modern books such as 'The Pragmatic Programmer' by Hunt & Thomas (1999), the dictum is translated into a snappy TLA:
DRY - Don't Repeat Yourself.
If you find your code doing the 'same' lines of code repeated several times, write a subroutine to do it once and call the subroutine several times.
That is what my suggested rewrite is aiming at.
In both main() and findWord() you should not use strlen(sizeLineInputX) right after reading the file with fgets() - there may be no '\0' in sizeLineInput2 and you will have strlen() read beyond the 512 bytes you have.
Instead of using fgets use fgetc to read char by char and check for a newline character (and for EOF too).
UPD to your UPD: you compare each line of mike2.txt with each line of mike1.txt - I guess that's not what you want. Open both files outside the while loop in main(), use one loop for both files, and check for newline and EOF on both of them in that loop (see the sketch below).
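A sketch of that structure, with the file names from the question (treat it as a starting point rather than a finished program):
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line1[512], line2[512];
    FILE *cfPtr1 = fopen("mike1.txt", "r");
    FILE *cfPtr2 = fopen("mike2.txt", "r");

    if (cfPtr1 == NULL || cfPtr2 == NULL) {
        perror("fopen");
        return 1;
    }

    /* One loop reads a line from each file and stops at the first end of file. */
    while (fgets(line1, sizeof line1, cfPtr1) != NULL &&
           fgets(line2, sizeof line2, cfPtr2) != NULL) {
        line1[strcspn(line1, "\n")] = '\0';
        line2[strcspn(line2, "\n")] = '\0';
        if (strcmp(line1, line2) == 0)
            printf("the lines \"%s\" and \"%s\" are equal!\n", line1, line2);
        else
            printf("the lines \"%s\" and \"%s\" are not equal!\n", line1, line2);
    }

    fclose(cfPtr1);
    fclose(cfPtr2);
    return 0;
}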