Can anyone explain to me the purpose of ungetch?
This is from K&R chapter 4 where you create a Reverse Polish Calculator.
I've run the program without the call to ungetch and in my tests it still works the same.
int getch(void) /* get a (possibly pushed back) character */
{
if (bufp > 0)
{
return buf[--bufp];
}
else
{
return getchar();
}
}
void ungetch(int c) /* push character back on input */
{
if (bufp >= BUFSIZE)
{
printf("ungetch: too many characters\n");
}
else
{
buf[bufp++] = c;
}
}
(I've removed the ternary operator in getch to make it clearer.)
I don't know about the specific example you're referring to (it's probably 23 years since I read K&R, and that was the first edition), but often when parsing it's convenient to 'peek' at the next character to see if it is part of what you're currently parsing. For instance, if you're reading a number you want to keep reading digits until you come to a non-digit. ungetc lets the number reader look at the next character without consuming it so that someone else can read it. In Greg Hewgill's example of "2 3+", the number reader would read the digit 3, then read the plus sign and know the number is finished, then ungetc the plus sign so that it can be read later.
Try running the program without spaces around operators. I don't recall precisely the format of that example and I don't have K&R handy, but instead of using "2 3 +" try "2 3+". The ungetch() is probably used when parsing numbers, as the number parser will read digits until it gets something that is a non-digit. If the non-digit is a space, then the next getch() will read the + and all is well. However, if the next non-digit is a +, then it will need to push that back onto the input stream so the main read loop can find it again.
Hope I'm remembering the example correctly.
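To make the "2 3+" case concrete, here is a minimal sketch of a number reader built on the getch/ungetch pair from the question. The helper name read_number is made up for illustration and is not the book's code; it only handles unsigned decimal integers.

```c
#include <ctype.h>
#include <stdio.h>

#define BUFSIZE 100

static int buf[BUFSIZE];   /* pushed-back characters */
static int bufp = 0;       /* next free position in buf */

int getch(void)            /* get a (possibly pushed back) character */
{
    return bufp > 0 ? buf[--bufp] : getchar();
}

void ungetch(int c)        /* push character back on input */
{
    if (bufp >= BUFSIZE)
        printf("ungetch: too many characters\n");
    else
        buf[bufp++] = c;
}

/* Hypothetical helper: reads digits until the first non-digit,
 * then pushes that non-digit back so the caller can still see it. */
int read_number(void)
{
    int c;
    int value = 0;

    while (isdigit(c = getch()))
        value = 10 * value + (c - '0');
    if (c != EOF)
        ungetch(c);        /* one character too many was read: return it */
    return value;
}
```

With the input 23+ the call returns 23 and the next getch() yields '+'. Without the ungetch call the '+' would have been swallowed by the number reader.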
It's used a lot for lexical scanners (the part of the compiler that breaks your text into chunks like variable names, constants, operators, etc.). The function isn't necessary for the scanner, it's just very convenient.
When you're reading a variable name, for example, you don't know when you're done until you read a character that can't be part of the variable name. But then you have to remember that character and find a way to communicate it to the next chunk of the lexer. You could create a global variable or something, or pass it to the caller--but then how do you return other things, like error codes? Instead, you ungetch() the character to put it back into the input stream, do whatever you need to with your variable name and return. Then when the lexer starts reading the next chunk, it doesn't have to look around for extra characters lying around.
Take a look at this code, you'll understand:
#include <conio.h>
#include <stdio.h>
int main()
{
int y=0;
char t[10];
int u=0;
ungetch('a');
t[y++]=getch();
ungetch('m');
t[y++]=getch();
ungetch('a');
t[y++]=getch();
ungetch('z');
t[y++]=getch();
ungetch('z');
t[y++]=getch();
ungetch('a');
t[y++]=getch();
ungetch('l');
t[y++]=getch();
ungetch('\0');
t[y++]=getch();
ungetch('\0');
t[y++]=getch();
ungetch('\0');
t[y++]=getch();
printf("%s",t);
return 0;
}
I want to ask what's the best way of reading a file (preferably using a buffer) if I need to throw out words whose middle symbol is a number. (There can be more than one space between words.) The text file could look like this: "asd4ggt gklk6k k77k 345k ll4l 7", so I need to throw out "asd4ggt" and "7" (I don't need to throw out "k77k" because it has an even number of symbols, so there is no middle symbol). The symbols in words can be 0 to 9, A to Z, and a to z (only the basic English alphabet).
I am thinking of reading the text file word by word: read one word into a buffer; if it has an even number of symbols, write it to the file; if it has an odd number of symbols, check whether its middle symbol is a number, and if it is, skip this word and go on to the next one.
Is this a right way of thinking how to complete this task?
Based on your comment we came to this:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int evenCheck(const char *ptr);
size_t middleCheck(const char *ptr);
int main(void){
const char *ptr = "t4k4k";
size_t middle = middleCheck(ptr);
if( evenCheck(ptr) == 0){
printf("Output to file the word %s\n",ptr);
}else{
if ( isdigit(ptr[middle]) ){
printf("Ignoring the word %s, because has the number %c in the middle\n",ptr, ptr[middle]);
}else{
printf("Output to file the word %s, because the middle is %c which is a Letter\n",ptr, ptr[middle]);
}
}
}
int evenCheck(const char *ptr){
size_t len = strlen(ptr);
if ( (len % 2) ){
return 1;
}
return 0;
}
size_t middleCheck(const char *ptr){
size_t middle = strlen(ptr) / 2;
return middle;
}
Output:
Output to file the word t4k4k, because the middle is k which is a Letter
Now you were asking about how to do this if the file has more than one word.
Well one option will be to save the file in a Multi-Dimensional array or read the whole file.
I'm sure you can do it, if not come back with another Question.
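For the multi-word case, one possible sketch reads whitespace-separated words with fscanf and applies the same middle-symbol test to each. The names keep_word and filter_words are hypothetical, and the 255-character word limit is an assumption, not something stated in the question.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the word should be kept,
 * 0 if it has an odd length and a digit as its middle symbol. */
int keep_word(const char *word)
{
    size_t len = strlen(word);

    if (len % 2 == 0)
        return 1;                       /* even length: no middle symbol */
    return !isdigit((unsigned char)word[len / 2]);
}

/* Copies the kept words from in to out, separated by single spaces.
 * Assumes no word is longer than 255 characters.
 * Returns the number of words kept. */
int filter_words(FILE *in, FILE *out)
{
    char word[256];
    int kept = 0;

    /* %255s skips any amount of whitespace, then reads one word */
    while (fscanf(in, "%255s", word) == 1) {
        if (keep_word(word)) {
            fprintf(out, "%s%s", kept ? " " : "", word);
            kept++;
        }
    }
    return kept;
}
```

Because %s skips runs of whitespace, several spaces between words need no special handling.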
Depends on the type of data, and what you plan to do with it. If it's a small enough file that sits in a single buffer, just load the file and then throw out whichever parts you don't want from the buffer.
If the data needs to be loaded into a data structure other than a flat buffer, then you'll need to process the input, probably line-by-line, building the structure and throwing out what you don't need as you go.
Note that the standard file routines can read a byte or line of text efficiently (they still use a larger buffer internally).
Other than that, your question really isn't that clear.
First, you should define which encoding your text file is using. On most operating systems, it would be UTF-8 (so Unicode characters can take several bytes each).
Then, the notion of word is probably (human) language specific. I'm not sure it would be the same in English (BTW, gklk6k is not an English word, it does not appear in any English dictionary), in ancient Greek, in Russian, in Japanese, in Chinese. Notice that the notion of letter is not as simple as you would imagine (Unicode has many more letters than A ... Z & a ... z; my family name in Russian is Старынкевич and all these letters are outside of A ... Z & a ... z, and they need more than one byte each). And what about combining characters? Greek diacritics? What exactly are words for you, and how can they be separated? What about punctuation?
If your system provides it, I would use getline(3) to read a line and have a loop around that. Then process every line. Splitting it into words is itself interesting. You could use some UTF-8 library like ICU, Glib (from GTK) etc etc...
In other words, you should define first what your input can be (and a few examples don't constitute a definition). Perhaps you could specify the possible valid input using EBNF notation. Maybe reading more about lexing & parsing techniques is relevant.
I am having a lot of trouble starting my project. Here are the directions:
"Complete counts.c as follows:
Read characters from standard input until EOF (the end-of-file mark) is read. Do not prompt the user to enter text - just read data as soon as the program starts.
Keep a running count of each different character encountered in the input, and keep count of the total number of characters input (excluding EOF)."
The format my professor gave me to start is:
#include <stdio.h>
int main(int argc, char *argv[]) {
return 0;
}
In addition to how to start the problem, I'm also confused as to why the two parameters are given in the main function when nothing is going to be passed to it. Help would be much appreciated! Thank you!
Slightly tricky to see what you're having trouble with here. The title doesn't form a complete question, nor is there one in the body; and they seem to be hinting at entirely different questions.
The assignment tells you to read characters - not store them. You could have a loop that only reads them one at a time if you wish (for instance, using getchar). You're also asked to report counts of each character, which would make sense to store in an array. Given that this is of "each different character", the simplest way would be to size the array for all possible characters (limits.h defines UCHAR_MAX, which would help with this). Remember to initialize the array if it's automatically allocated (the default for function local variables).
Regarding the arguments to main, this program does not need them, and the C standard does allow you to leave them out. They're likely included as this is a template of a basic C program, to make it usable if command line arguments will be used also.
For more reference code you might want to compare the word count utility (wc); the character counting you want is the basis of a frequency analysis or histogram.
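As a rough sketch of the array-sized-by-UCHAR_MAX idea described above (the function name count_chars is made up for illustration, and taking a FILE * rather than always using stdin is my own choice so the routine can be tried on a file):

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: tallies every character read from in until EOF.
 * counts must have UCHAR_MAX + 1 elements; it is zeroed here, so the
 * caller does not have to initialize it.
 * Returns the total number of characters read (EOF is not counted). */
unsigned long count_chars(FILE *in, unsigned long counts[])
{
    unsigned long total = 0;
    int c;
    int i;

    for (i = 0; i <= UCHAR_MAX; i++)
        counts[i] = 0;

    /* getc returns each character as an unsigned char converted to int,
     * so every non-EOF value is a valid index into counts */
    while ((c = getc(in)) != EOF) {
        counts[c]++;
        total++;
    }
    return total;
}
```

For the assignment, the caller would declare unsigned long counts[UCHAR_MAX + 1]; and call count_chars(stdin, counts).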
This should give you a start to investigate what you need to learn to complete your task,
Initially declare a character input buffer of sufficient size to read chars as,
char input[SIZE];
Use fgets() to read the characters from stdin as,
if (fgets(input, sizeof input, stdin) == NULL) {
; // handle EOF
}
Now the input array holds your string of characters, in which you need to count occurrences. I did not understand what you mean by counting different characters, but you can traverse the array completely and count the characters you need.
Firstly, luckily for you we will not need dynamic memory allocation at all here as we are not asked to store the input strings, instead we simply need to record how many of each ascii code is input during program run, as there a constant and finite number of those we can simply store them in a fixed size array.
The functions we are looking at here (assuming we are using standard libs) are as follows:
getchar, to read chars from standard input
printf, to print the outputs back to stdout
The constructs we will need are:
do {} while, to loop around until a condition is false
The rest just needs simple mathematical operators, here is a short example which basically shows a sample solution:
#include <stdio.h>
int main(int argc, char *argv[])
{
/* Create an array with entries for each char,
* then init it to zeros */
int AsciiCounts[256] = {0};
int ReadChar;
int TotalChars = 0;
int Iterator = 0;
do
{
/* Read a char from stdin */
ReadChar = getchar();
/* Count it unless it is EOF, which is negative and must not be used as an index */
if (ReadChar != EOF)
{
AsciiCounts[ReadChar]++;
TotalChars++;
}
} while (ReadChar != EOF);
/* Stop if we read an EOF */
do
{
/* Print each char code and how many times it occurred */
printf("Char code %#x occurred %d times\n", Iterator, AsciiCounts[Iterator]);
Iterator++;
} while (Iterator <= 255);
/* Print the total length read in */
printf("Total chars read (excluding EOF): %d\n", TotalChars);
return 0;
}
Which should achieve the basic goal, however a couple of extension exercises which would likely benefit your understanding of C. First you could try to convert the second do while loop to a for loop, which is more appropriate for the situation but I did not use for simplicity's sake. Second you could add a condition so the output phase skips codes which never occurred. Finally it could be interesting to check which chars are printable and print their value instead of their hex code.
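If you want to check your attempt at those three extensions, one possible shape for the output phase is sketched below. The function name print_counts and its return value are my own invention for illustration.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: prints one line per character code that occurred
 * at least once. counts has n entries, indexed by character code (0..n-1).
 * Returns the number of lines printed. */
int print_counts(const int counts[], int n)
{
    int code;
    int printed = 0;

    for (code = 0; code < n; code++) {
        if (counts[code] == 0)
            continue;                   /* skip codes that never occurred */
        if (isprint(code))
            printf("Char '%c' occurred %d times\n", code, counts[code]);
        else
            printf("Char code %#x occurred %d times\n", code, counts[code]);
        printed++;
    }
    return printed;
}
```

Calling print_counts(AsciiCounts, 256) would replace the second do while loop.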
On the second part of the question, the reason those arguments are passed to main even though they are ignored is due to the standard calling convention of c programs under most OSes, they pass the number of command line arguments and values of each command line argument respectively in case the program wishes to check them. However if you really will not use them you can in most compilers just use main() instead however this makes things more difficult later if you choose to add command line options and has no performance benefit.
This is my first question on this site, and I've just started programming, so please be patient with me.
I'm having some trouble with this code to read strings and integers from a file. They are separated by a semicolon ";" and the file starts with the number of lines. The file is something like this:
13;
A;15;B;1;0;0;0;
A;9;C;0;3;2;0;
A;9;D;0;4;0;2;
A;3;E;2;3;2;0;
A;7;F;5;5;3;1;
A;5;G;5;7;6;0;
A;13;H;0;0;0;0;
A;1;I;8;1;0;0;
A;1;J;2;2;1;0;
A;6;K;7;3;2;0;
A;5;L;2;4;3;0;
A;12;AA;0;3;2;0;
A;9;BA;0;1;0;0;
What I tried to do was to create a function that would receive a file pointer (fp) and the number of lines that was read in the main function. It would read the file and save the integers and strings in matrices:
#include<stdio.h>
#include<stdlib.h>
char timesjogos[100][2][100];
int golsjogos[100][3];
int faltasjogos[100][3];
int camajogos[100][3];
int cverjogos[100][3];
int ReadGames(FILE *caminho,int njogos){
printf("starting to read jogos.\n");
int i=0;
while(fscanf(caminho, " %[^;];%d[^;];%[^;];%d[^;];%d[^;];%d[^;];%d[^;];",
timesjogos[i][0], &golsjogos[i][0], timesjogos[i][1], &golsjogos[i][1],
&faltasjogos[i][0], &camajogos[i][0], &cverjogos[i][0]) == 7)
{
if(i < njogos)
i++;
else
break;
}
}
int main()
{
FILE *fp;
int nbets;
fp = fopen("jogos.txt", "r");
if (!fp){
printf ("Error trying to open file.");
}
fscanf(fp, " %d[^;];", &nbets);
ReadGames(fp, nbets);
}
My doubts are about the %[^;]; I used to read each string up to the ";". Should I use %d[^;] for the integers? What is the correct way to do it?
Also, I'm using global variables to save the information read, the problem is that they can be not large enough to save huge amounts of lines (my professor made a 24180 lines file to test our codes). I was thinking about using the number of lines it gives in the first line to make pre-sized matrices inside the function, but how can I return or save it after the function ends?
I'm sorry for the huge code, but I wanted to show all the details. I would be very thankful for your more experienced help :D
The %[^;] notation reads a string consisting of any number of non-semicolons. The parsing stops when a semicolon is encountered. With numbers, the parsing stops at a semicolon anyway; the semicolon is not a part of the representation of a number.
Your use of %d[^;] means that fscanf() is looking for an integer (%d), then an open square bracket, caret, semicolon and close square bracket. These don't appear in the input, of course, so the scanning fails.
Therefore, your input loop should probably be:
while (fscanf(caminho, " %[^;];%d;%[^;];%d;%d;%d;%d;",
timesjogos[i][0], &golsjogos[i][0], timesjogos[i][1],
&golsjogos[i][1], &faltasjogos[i][0], &camajogos[i][0],
&cverjogos[i][0]) == 7)
{
...
}
You might prefer to specify a maximum length for the %[^;] conversion specifications; %99[^;] would be appropriate since the third dimension of timesjogos is 100. There's an off-by-one difference between the length specified and the length used (enshrined because of ancient history; it was that way before the first C standard, and the C standard codified existing practice).
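A small way to try the corrected format, with the width limit included, is to run it through sscanf on a single line. The struct and its field names below are hypothetical (the question does not say what each column means); only the format string follows the answer.

```c
#include <stdio.h>

/* Hypothetical record for one line; the field names are illustrative. */
struct game {
    char team_a[100];
    char team_b[100];
    int  score_a;
    int  score_b;
    int  stats[3];
};

/* Parses one "name;int;name;int;int;int;int;" record.
 * Returns 1 if all seven fields were converted, 0 otherwise. */
int parse_game(const char *line, struct game *g)
{
    /* %99[^;] stores at most 99 characters plus the terminating '\0',
     * which fits the 100-byte arrays */
    return sscanf(line, " %99[^;];%d;%99[^;];%d;%d;%d;%d;",
                  g->team_a, &g->score_a, g->team_b, &g->score_b,
                  &g->stats[0], &g->stats[1], &g->stats[2]) == 7;
}
```

For the line A;12;AA;0;3;2;0; this fills team_a with "A", score_a with 12 and team_b with "AA". A line whose second field is not a number makes the function return 0.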
I'm learning C from K&R's "The C Programming Language" book. I'm doing the exercises specified in the book. I'm on exercise number 1.16, but I don't understand it.
Exercise 1.16:
Revise the main routine of the longest-line program so it will
correctly print the length of arbitrarily long input lines, and as
much as possible of the text.
My questions:
"...as much as possible of the text..." - is there some limitation on string length? Maybe in standard headers there's a variable with the max allowed value of string length?
"...the length of arbitrarily long input lines..." - but in the code MAXLINE is defined as 1000. It is a limited size too. I have seen some solutions here, but in my opinion they do not solve the problem, since they still restrict the length of a line (1000 characters).
Maybe I have misunderstood the task. My understanding is that I must remove the 1000-character limitation.
It's a pretty early exercise in K&R, you're just supposed to do some minor changes to the code, not a total redesign of the code.
"...as much as possible of the text..."
is up to you to interpret. I'd do it by printing what's stored in the longest buffer. i.e. print out up to 1000 characters of the line. Again, it's an early exercise, with little introduction to dynamically allocated memory yet. And at the time K&R was written, storing away arbitrarily long text lines wasn't as feasible as it is today.
"...the length of arbitrarily long input lines..."
Is a hard requirement. You're supposed to find the correct length no matter how long it is (at least within the bounds of an int. )
One way to solve this problem is:
After the call to getline(), check if the last character read into the line buffer is a newline ('\n')
If it is, you read a complete line. The len variable is the correct length of the line (the return value of getline()), and no special consideration is needed compared to the original code.
If it is not, you did not read the entire line, and need to hunt for the end of this line. You add a while loop, calling getchar() until it returns a newline (or EOF), and count the number of characters you read in that loop. Just do len++ to count.
When the while loop is done, the new len is now the actual length of the line, but our buffer just has the first 999 characters of it.
As before, you store away (the copy() function call) the current line buffer (max 1000 chars) if this line is the longest so far.
When you're done, you print out the stored line as before (the longest buffer) and the max variable for the length.
Due to the above mentioned while loop that max length is now correct.
If the longest line indeed was longer than 1000 chars. you at least print out those first 999 chars - which is "as much as possible".
I'll not spoil it and post the code you need to accomplish this, but it is just 6 lines of code that you need to add to the longest-line program of exercise 1-16.
On modern machines "as much as possible of the text" is likely to be all of the text, thanks to automatically line-wrapping terminal programs. That book was written when teletype terminals were still in use. There is no limitation on string length other than perhaps memory limitations of the machine you're working on.
They're expecting you to add some kind of loop to read characters and look for newlines rather than assuming that a read into the MAXLINE sized buffer is going to contain a newline for sure.
here is my version:
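int getline(char s[], int lim)
{
    int c = EOF, i, len;

    /* copy at most lim-1 characters of the line into s */
    for (i = 0; i < lim-1 && (c = getchar()) != EOF && c != '\n'; ++i)
        s[i] = c;
    if (c == '\n')
    {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';
    len = i;
    /* buffer filled before the end of the line: keep counting, stop copying */
    if (i == lim-1 && c != '\n' && c != EOF)
    {
        while ((c = getchar()) != EOF && c != '\n')
            ++len;
        if (c == '\n')
            ++len;
    }
    return len;
}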
#define MAXLINE 1000
int len;
int max;
char line[MAXLINE];
char longest[MAXLINE];
max=0;
while((len=getline(line,MAXLINE))>1)
{
if(len>max)
{
max=len;
copy(longest,line);
}
}
if(max>0)
{
printf("%d:%s",max,longest);
}
return 0;
For some reason the example code doesn't work on my PC.
In particular, when the condition is 'len>0', the loop never ends.
I think the main reason is that even when you type nothing you still have to press Enter, which is received as '\n', so len is 1.
I think this satisfies the requirement to print the length of arbitrarily long input lines, and as much as possible of the text.
And it works like this
#include <stdio.h>
int main(void)
{
long tlength = 0;
short input, llength = 1;
while (llength > 0) {
llength = 0;
while ((input = getchar()) != EOF) {
++llength;
if (input == '\n')
break;
}
tlength = tlength + llength;
printf("\nLength of just above line : %5d\n\n", llength);
}
printf("\n\tLength of entire text : %8ld\n", tlength);
return 0;
}
In my opinion, this question only asks for the length of each arbitrarily long line, plus the length of the entire text at the end.
Try running this code and tell me whether it is correct according to the question, because I am confused by this problem too.
I want to offer that this exercise actually makes more sense if you imagine that the limit of the number of characters you can copy is very small -- say, 100 characters -- and that your program is supposed to judge between lines that are longer than that limit.
(If you actually change the limit so that it's very small, the code becomes easier to test: if it picks out the first line that hits that small limit, you'll know your code isn't working, whereas if it returns the first however-many characters of the longest line, it's working.)
Keep the part of the code that copies and counts characters until it hits a newline or EOF or the line-size-limit. Add code that picks up where this counting and copying leaves off, and which will keep counting even after the copying has stopped, so long as getchar() still hasn't returned an EOF or a newline.
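A sketch of that copy-then-keep-counting idea in a single routine, under my own assumptions: the name read_line is hypothetical, it takes a FILE * so it can be tried on a file as well as stdin, and it returns a long so very long lines do not overflow the count.

```c
#include <stdio.h>

/* Hypothetical helper: reads one line from in.
 * Copies at most lim-1 characters into s, but keeps counting after the
 * buffer is full, until a newline or EOF is seen.
 * Returns the full length of the line, including the newline if any. */
long read_line(FILE *in, char s[], int lim)
{
    int c;
    int i = 0;
    long len = 0;

    while ((c = getc(in)) != EOF) {
        if (i < lim - 1)
            s[i++] = (char)c;   /* still room: copy */
        ++len;                  /* always count */
        if (c == '\n')
            break;
    }
    if (lim > 0)
        s[i] = '\0';
    return len;
}
```

With lim set to 5 and the input line abcdefghij, the buffer ends up holding "abcd" while the returned length is 11 (ten letters plus the newline).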
My solution: just below the call to getLine
if ( line[len-1] != '\n' && line[len-1] != EOF) //if end of line or file wasnt found after max length
{
int c;
while ( ( c = getchar() ) != '\n' && c != EOF )
len++; //keep counting length until end of line or file is found
}
to test it, change MAXLINE to 25
This code comes from K&R. I have read it several times, but it still seems to escape my grasp.
#define BUFSIZE 100
char buf[BUFSIZE];
int bufp = 0;
int getch(void)
{
return(bufp>0)?buf[--bufp]:getchar();
}
void ungetch(int c)
{
if(bufp>=BUFSIZE)
printf("too many characters");
else buf[bufp++]=c;
}
The purpose of these two functions, so K&R says, is to prevent a program from reading too much input. i.e. without this code a function might not be able to determine it has read enough data without first reading too much. But I don't understand how it works.
For example, consider getch().
As far as I can see this is the steps it takes:
check if bufp is greater than 0.
if so then return the char value of buf[--bufp].
else return getchar().
I would like to ask a more specific question, but I literally dont know how this code achieves what it is intended to achieve, so my question is: What is (a) the purpose and (b) the reasoning of this code?
Thanks in advance.
NOTE: For any K&R fans, this code can be found on page 79 (depending on your edition, I suppose)
(a) The purpose of this code is to be able to read a character and then "un-read" it if it turns out you accidentally read a character too many (with a max. of 100 characters to be "un-read"). This is useful in parsers with lookahead.
(b) getch reads from buf if it has contents, indicated by bufp>0. If buf is empty, it calls getchar. Note that it uses buf as a stack: it reads it from right-to-left.
ungetch pushes a character onto the stack buf after doing a check to see if the stack isn't full.
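One consequence of the stack behaviour is that several characters have to be pushed back last-first if they are to come out in their original order. A minimal sketch, where unread_string is a hypothetical helper and not part of the code in the question:

```c
#include <stdio.h>
#include <string.h>

#define BUFSIZE 100

char buf[BUFSIZE];
int bufp = 0;

int getch(void)
{
    return (bufp > 0) ? buf[--bufp] : getchar();
}

void ungetch(int c)
{
    if (bufp >= BUFSIZE)
        printf("too many characters");
    else
        buf[bufp++] = c;
}

/* Hypothetical helper: pushes a whole string back so that getch()
 * returns its characters in their original order. Because buf is a
 * stack, the last character has to be pushed first. */
void unread_string(const char *s)
{
    size_t n = strlen(s);

    while (n > 0)
        ungetch(s[--n]);
}
```

After unread_string("ab"), two calls to getch() return 'a' and then 'b'. Pushing 'a' and then 'b' with two plain ungetch calls would return them as 'b', 'a'.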
The code is not really for "reading too much input"; instead it is so you can put back characters already read.
For example, you read one character with getch, see if it is a letter, put it back with ungetch and read all letters in a loop. This is a way of predicting what the next character will be.
This block of code is intended for use by programs that make decisions based on what they read from the stream. Sometimes such programs need to look at a few character from the stream without actually consuming the input. For example, if your input looks like abcde12xy789 and you must split it into abcde, 12, xy, 789 (i.e. separate groups of consecutive letters from groups of consecutive digits) you do not know that you have reached the end of a group of letters until you see a digit. However, you do not want to consume that digit at the time you see it: all you need is to know that the group of letters is ending; you need a way to "put back" that digit. An ungetch comes in handy in this situation: once you see a digit after a group of letters, you put the digit back by calling ungetch. Your next iteration will pick that digit back up through the same getch mechanism, sparing you the need to preserve the character that you read but did not consume.
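The abcde12xy789 split can be sketched with the standard library's own ungetc, which guarantees one character of pushback on a stream, in place of the hand-written getch/ungetch pair. The function name next_group is made up for illustration, and any non-digit character is treated as part of a letter group for simplicity.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: copies the next run of digits, or the next run of
 * non-digits, from in into out (at most cap-1 characters are stored).
 * Returns 1 if a group was read, 0 at end of input. */
int next_group(FILE *in, char *out, size_t cap)
{
    size_t n = 0;
    int want_digit;
    int c;

    if (cap == 0)
        return 0;
    c = getc(in);
    if (c == EOF)
        return 0;

    want_digit = isdigit(c) != 0;
    while (c != EOF && (isdigit(c) != 0) == want_digit) {
        if (n + 1 < cap)
            out[n++] = (char)c;
        c = getc(in);
    }
    if (c != EOF)
        ungetc(c, in);      /* first character of the next group: put it back */
    out[n] = '\0';
    return 1;
}
```

Called repeatedly on abcde12xy789, it produces abcde, 12, xy and 789, then returns 0.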
1. The idea shown here can also be seen as a very primitive I/O stack management system, implemented by the functions getch() and ungetch().
2. To go a step further, suppose you want to design an operating system: how would you handle the memory that stores all the keystrokes?
This is solved by the above code snippet. An extension of this concept is used in file handling, especially when editing files. In that case a file is used as the source of input instead of getchar(), which reads from standard input.
I have a problem with the code given in the question. Using the buffer as a stack is not correct: when more than one extra character is pushed back, later processing (reading input from the buffer) is affected in an undesired way.
This is because, when later processing reads input, the buffer (stack) returns the extra characters in reverse order (the last one pushed back comes out first).
Because of the LIFO (last in, first out) property of a stack, the buffer in this code should be a queue, which works better when more than one extra character is pushed back.
This mistake in the code confused me; in the end the buffer should be a queue, as shown below.
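#define BUFSIZE 100
char buf[BUFSIZE];
int bufr = 0; /* index of the next character to read */
int buff = 0; /* index of the next free slot */
int bufn = 0; /* number of characters currently queued */
int getch(void)
{
    int c;
    if (bufn == 0) /* nothing pushed back: read fresh input */
        return getchar();
    c = buf[bufr];
    bufr = (bufr + 1) % BUFSIZE; /* wrap around at the end of the array */
    bufn--;
    return c;
}
void ungetch(int c)
{
    if (bufn >= BUFSIZE)
        printf("too many characters");
    else
    {
        buf[buff] = c;
        buff = (buff + 1) % BUFSIZE;
        bufn++;
    }
}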