I have a few questions on this exercise. Here is the code I'm dealing with:
#include <stdio.h>
int main (void)
{
    int c;
    int inspace;
    inspace = 0;
    while((c = getchar()) != EOF)
    {
        if(c == ' ')
        {
            if(inspace == 0)
            {
                inspace = 1;
                putchar(c);
            }
        }
        if(c != ' ')
        {
            inspace = 0;
            putchar(c);
        }
    }
    return 0;
}
(Sorry I'm having a lot of trouble comprehending how these programs work because they're so simple and lack description on how they actually work)
First of all, how does putchar(c) not output the same exact data that came in. Despite it checking for a blank or != blank, it still says to output "c" which is just getchar(c) meaning whatever was inputted. I see no code that specifies to delete extra spaces and output just one space. Where does the code specify that that is what must take place? I'm having trouble understanding how getchar/putchar works it seems to me.
Also, what importance does inspace == 1 or 0 have? If inspace is == 1 then it just outputs the characters inputted back out. There's nothing saying that the extra blanks are deleted and inspace isn't defined as anything except 0 or 1, there's nothing defining it as a space so how can it possibly have any real meaning as to what the program is doing?
I'm really confused, where is the code that's replacing the spaces and how does it work? Is there a simpler book I should be learning from that explains the solutions?
First of all, how does putchar(c) not output the same exact data that came in. Despite it checking for a blank or != blank, it still says to output "c" which is just getchar(c) meaning whatever was inputted. I see no code that specifies to delete extra spaces and output just one space. Where does the code specify that that is what must take place? I'm having trouble understanding how getchar/putchar works it seems to me.
You are correct that if putchar is called, it just outputs the input character. The key to this program is that putchar isn't called on every input character. The various if statements control when it is called. At a high level, the program avoids calling putchar on the second, third, fourth, etc., spaces if there are multiple spaces in a row. It's only called on the first space.
Also, what importance does inspace == 1 or 0 have? If inspace is == 1 then it just outputs the characters inputted back out. There's nothing saying that the extra blanks are deleted and inspace isn't defined as anything except 0 or 1, there's nothing defining it as a space so how can it possibly have any real meaning as to what the program is doing?
Don't think of it as spaces being deleted. Think of it as them being omitted. Sometimes putchar is called, sometimes it isn't. Look at the loop and try to figure out what conditions would cause putchar not to be called.
Importantly, look at what happens if you start a loop iteration, inspace == 1, and c == ' '. What happens?
It might help to put together a table showing when putchar is and isn't called.
Is putchar(c) called?
=====================
             | c == ' ' | c != ' '
-------------+----------+---------
inspace == 0 |    Y     |    Y
inspace == 1 |    N     |    Y
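The table can also be read as a small predicate. This is only an illustrative sketch: should_print is a hypothetical helper name that does not appear in the original program, and it returns 1 exactly where the table says Y.

```c
/* Hypothetical helper mirroring the table: returns 1 when the original
   loop would call putchar(c), 0 when it would skip the character. */
int should_print(int inspace, int c)
{
    if (c == ' ')
        return inspace == 0;   /* only the first space of a run is printed */
    return 1;                  /* non-space characters are always printed */
}
```

The only combination that yields 0 is a space arriving while inspace is already 1, which is the case the question is asking about.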
Think about the logic in this block when there are two or more consecutive ' ' characters in the input.
if(c == ' ')
{
    if(inspace == 0)
    {
        inspace = 1;
        putchar(c);
    }
}
When the first space character is encountered, the code enters the nested if block and prints the character.
When the second space character is encountered, the code does not enter the nested if block and the character is not printed.
If you follow this logic, you'll notice that if there are two or more consecutive space characters in the input, only one is printed.
how does putchar(c) not output the same exact data that came in.
When the code reaches putchar(c), it outputs the same exact character that came in. However, the code may not be reaching putchar(c) on some of the iterations.
what importance does inspace == 1 or 0 have?
Once the program sets inspace to 1, it stops printing further space characters, because the code will not reach putchar(c) on second and subsequent iterations of the loop.
inspace is set to 1 after printing the first space in a sequence of one or more spaces. If inspace is set to zero coming into the first conditional, a space would be printed; otherwise, no space would be printed.
Here is a diagram that explains what is happening:
The program starts in the black circle, and proceeds to one of two states, depending on the input character:
If the character is a space, the state on the left is entered, where the first space is printed and the rest of the spaces are ignored (i.e. inspace is set to 1).
If the character is a non-space, the state on the right is entered, where each character is printed.
Each time a new character is read the program decides if it wants to switch the state, or to remain in the current state.
Note: the diagram is not showing the EOF to save some space. When EOF is reached, the program exits.
The first if block along with the inner one is saying, "If the input is a space and the inspace flag is zero, print it, and also set the flag to one". I.e. print space if it is the first one, and indicate the next one won't be the first. The second block is saying "If the input is not space, print it and reset the previous spaces flag so the next encountered space will be considered the first one.". That's all.
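To watch the flag at work without typing input interactively, the same logic can be sketched as a function over a string. The name squeeze_spaces and the use of an output buffer instead of putchar are assumptions made for this illustration; the branch structure is the one from the program in the question.

```c
#include <stddef.h>

/* Copies src to dst, keeping only the first space of each run of spaces.
   dst must have room for at least strlen(src) + 1 bytes.
   Returns the number of characters written (not counting the '\0'). */
size_t squeeze_spaces(const char *src, char *dst)
{
    size_t n = 0;
    int inspace = 0;               /* 1 while we are inside a run of spaces */

    for (; *src != '\0'; ++src) {
        if (*src == ' ') {
            if (inspace == 0) {    /* first space of the run: keep it */
                inspace = 1;
                dst[n++] = *src;
            }
            /* later spaces of the run: nothing is written */
        } else {
            inspace = 0;           /* run is over; next space counts as first */
            dst[n++] = *src;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Given the input "a   b  c", the buffer ends up holding "a b c": nothing is deleted after the fact, the extra spaces are simply never written.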
I would like to count the number of lines in an ASCII text file.
I thought the best way to do this would be by counting the newlines in the file:
for (int c = fgetc(fp); c != EOF; c = fgetc(fp)) { /* Count word line endings. */
    if (c == '\n') ++lines;
}
However, I'm not sure if this would account for the last line on both MS Windows and Linux. That is, if my text file finishes as below, without an explicit newline, is there one encoded there anyway, or should I add an extra ++lines; after the for loop?
cat
dog
Then what about if there is an explicit newline at the end of the file? Or do I just need to test for this case by keeping track of the previously read value?
If there is no newline, one won't be generated. C tells you exactly what's there.
Text files are always expected to end with a line feed. There's no canonical way of handling files that don't.
Here's how some tools choose to deal with characters after the last line feed:
wc doesn't count it as a line (so you have good precedent for that)
Vim marks the file as [noeol], and saves the file without a trailing line feed
GNU sed treats the file as if it had a last line feed
sh's read exits with error, but still returns the data
Since behaviour is pretty much undefined, you can just do whatever's convenient or useful to you.
First, there will not be any implicitly encoded newline at the end of the last line. The only way there will be a newline is if the software or person that produced the file put it there. Putting it there is generally considered good practice, however.
The ultimate answer for what you should report as the line count depends on the convention that you need to follow for the software or people that will be using this line count, and probably what you can assume about the behavior of the input source as well.
Most command-line tools will terminate their output with a newline character. In this case, the sensible answer may be to report the number of newline characters as the number of actual lines.
On the other hand, when a text editor is displaying a file, you will see that the line numbering in the margin (if supported) contains a number for the last line whether it is empty or not. This is in part to tell the user that there is a blank line there, but if you want to count the number of lines displayed in the margin, it is one plus the number of newline characters in the file. It is typical for some coders to not terminate their last lines with a newline character (sometimes due to sloppiness), so in this case this convention would actually be the right answer.
I'm not sure any other conventions make much sense. For example, if you choose not to count the last line unless it is non-empty, then what counts as non-empty? The file ending after newline? What if there is whitespace on that line? What if there are several empty lines at the end of the file?
If you're going to use this method, you could always keep a separate counter for how many characters you have seen on the current line. If the count at the end is greater than 0, then you know there is stuff on the last line that wasn't counted.
int letters = 0;
for (int c = fgetc(fp); c != EOF; c = fgetc(fp)) { /* Count word line endings. */
    letters++; // Increase count on character
    if (c == '\n')
    {
        ++lines;
        letters = 0; // Set back to 0 after new line
    }
}
if (letters > 0)
{
    ++lines;
}
Your concern is real: the last line in the file may be missing the final end-of-line marker. The end-of-line marker is a single '\n' in Linux, and a CR LF pair in Windows that the C runtime converts automatically into a '\n' when the file is opened in text mode.
You can simplify your code and handle the special case of the last line missing a linefeed this way:
int c, last = '\n', lines = 0;
while ((c = getc(fp)) != EOF) { /* Count word line endings. */
    if (c == '\n')
        lines += 1;
    last = c;
}
if (last != '\n')
    lines += 1;
Since you are concerned with speed, using getc instead of fgetc will help on platforms where it is defined as a macro that handles the stream structures directly and calls a function only to refill the buffer, every BUFSIZ characters or so, unless the stream is unbuffered.
How about this:
Create a flag for yourself to keep track of any non \n characters following a \n that is reset when c=='\n'.
After the EOF, check to see if the flag is true and increment if yes.
bool more_chars = false; /* bool requires <stdbool.h> */
for (int c = fgetc(fp); c != EOF; c = fgetc(fp)) { /* Count word line endings. */
    if (c == '\n') {
        more_chars = false;
        ++lines;
    } else more_chars = true;
}
if(more_chars) lines++;
Windows and UNIX/Linux style line breaks make no difference here. On either system a text file may or may not have a newline at the end of the last line.
If you always add 1 to the line count, this effectively counts the empty line at the end of the file when there is a newline at the end (i.e., file "foo\n" will count as having two lines: "foo" and ""). This may be an entirely reasonable solution, depending on how you want to define a line.
Another definition of a "line" is that it always ends in a newline, i.e., the file "foo\nbar" would only have one line ("foo") by this definition. This definition is used by wc.
Of course you could keep track of whether the newline was the last character in file and only add 1 to the count in case it wasn't. Then a "line" would be defined as either ending in a newline or being non-empty at the end of the file, which sounds quite complex to me.
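The three definitions discussed in these answers can be compared side by side. The sketch below works on an in-memory buffer rather than a FILE * purely so that it is easy to try out; the function names are made up for this illustration and do not come from any library.

```c
#include <stddef.h>

/* wc-style: a line is something terminated by '\n'. */
size_t count_newlines(const char *text, size_t len)
{
    size_t lines = 0;
    for (size_t i = 0; i < len; ++i)
        if (text[i] == '\n')
            ++lines;
    return lines;
}

/* Also counts a final line that has characters but no terminating '\n'. */
size_t count_lines_with_partial(const char *text, size_t len)
{
    size_t lines = count_newlines(text, len);
    if (len > 0 && text[len - 1] != '\n')
        ++lines;
    return lines;
}

/* Editor-margin style: always one more than the number of newlines. */
size_t count_lines_editor(const char *text, size_t len)
{
    return count_newlines(text, len) + 1;
}
```

For "cat\ndog" the three functions give 1, 2 and 2; for "cat\ndog\n" they give 2, 2 and 3. Which of these is right depends on the convention the consumers of the count expect.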
I am trying to write an indexing program where it will take input from the user and store it into an array then keep counting the occurrence of words for example.
user enters: hello##world I,I,I am##!stuck201
hello occurred 1 time
world occurred 1 time
I occurred 3 times
am occurred 1 time
stuck occurred 1 time
So as you can see it will count anything that contains letter(s) separated by anything as a word.
(I am confused on how to go about checking the input for anything other than letters, I was thinking of using ASCII codes but there has to be a better way, if you could just set me in the correct direction for this, Thank you much.)
Before I began the program I was trying to get I/O working and I am having difficulty. The actual program will require me to use 2 dimensional arrays, but if you could help me with this snippet of code that will be appreciated thanks.
#include <stdio.h>
int main()
{
    char array[64];
    int i=0, j, input;
    printf("Please enter an input:");
    input==fgetc(stdin);
    while(input != " ")
    {
        array[i]==input;
        i++;
        input==fgetc(stdin);
    }
    for(j=0;j<10;j++)
    {
        printf("You entered:%c",array[j]);
    }
}
Upon compilation it gives me a warning "12:14 warning: comparison between pointer and integer"
Output of this code:::
Please enter an input: (I type input) ehehasd world hello (enter)
then it just sits at blank cursor and I have to exit using CTRL C
I want this snippet of code to just take input from user that is separated by a space store it into an array then print out what the user entered. What am I doing wrong?
Check isalpha; it has some fine print about what it will consider a letter, but it may work for your case.
Another way to do it, if you don't want to do the loop yourself is to use regular expressions. It is fairly easy to make a regex that returns only sequences of letters.
The line (which appears twice in the code):
input==fgetc(stdin);
makes a comparison, not an assignment. Use:
input = fgetc(stdin);
Your line:
while(input != " ")
is incorrect and is the source of the compiler warning. You are comparing a string with a character. You probably intended to use:
while (input != ' ')
and since you could encounter EOF, you probably should use:
while (input != EOF && input != ' ')
You could sensibly use #include <ctype.h> and then:
while (isalpha(input))
which automatically handles EOF (it's a valid input to isalpha(), but returns false; EOF is not an alphabetic character).
Your final loop should probably be:
for (int j = 0; j < i; j++)
(assuming you have a C99 or more recent compiler; if not, declare j outside the loop as now). This only outputs the characters that have actually been entered. Otherwise, you'll be printing undefined gibberish.
You'll need to upgrade the code to handle multiple words in the input. At the moment, it stops at the end of the first word (assuming you fix the other problems that I've identified).
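One possible shape for that upgrade, sketched here over a string instead of stdin so it is easy to experiment with: treat every run of letters as a word and tally it in a two-dimensional array. The function name tally_words and the limits MAX_WORDS and MAX_LEN are assumptions made for this example, and the comparison is case-sensitive.

```c
#include <ctype.h>
#include <string.h>

#define MAX_WORDS 64   /* assumed limits, chosen only for this sketch */
#define MAX_LEN   32

/* Splits text into runs of letters and tallies each distinct word.
   words[i] holds the i-th distinct word, counts[i] how often it occurred.
   Returns the number of distinct words found. */
int tally_words(const char *text, char words[MAX_WORDS][MAX_LEN],
                int counts[MAX_WORDS])
{
    int nwords = 0;
    size_t i = 0;

    while (text[i] != '\0') {
        char word[MAX_LEN];
        size_t len = 0;

        /* skip everything that is not a letter */
        while (text[i] != '\0' && !isalpha((unsigned char)text[i]))
            ++i;
        /* collect one run of letters, truncating overly long words */
        while (isalpha((unsigned char)text[i])) {
            if (len < MAX_LEN - 1)
                word[len++] = text[i];
            ++i;
        }
        if (len == 0)
            break;                 /* reached the end without another word */
        word[len] = '\0';

        /* linear search for an existing entry */
        int k;
        for (k = 0; k < nwords; ++k)
            if (strcmp(words[k], word) == 0)
                break;
        if (k < nwords) {
            ++counts[k];
        } else if (nwords < MAX_WORDS) {
            strcpy(words[nwords], word);
            counts[nwords++] = 1;
        }
    }
    return nwords;
}
```

With the sample input from the question, hello##world I,I,I am##!stuck201, this finds five distinct words, with I counted three times and the others once each.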
Use isalpha() from <ctype.h> to test whether it is a letter.
Alright, so I'm doing exercise 8 in K&R second edition. Upon looking up the answer after my attempt at doing the exercise didn't print anything but the newlines (the other ints for tabs and empty spaces remained 0 despite running loops to count - I later found out that I used the wrong character for blank space which is just a blank space but it still neglected to count '\t' correctly), I found this:
#include <stdio.h>
int main(void)
{
    int blanks, tabs, newlines;
    int c;
    int done = 0;
    int lastchar = 0;
    blanks = 0;
    tabs = 0;
    newlines = 0;
    while(done == 0)
    {
        c = getchar();
        if(c == ' ')
            ++blanks;
        if(c == '\t')
            ++tabs;
        if(c == '\n')
            ++newlines;
        if(c == EOF)
        {
            if(lastchar != '\n')
            {
                ++newlines;
            }
            done = 1;
        }
        lastchar = c;
    }
    printf("Blanks: %d\nTabs: %d\nLines: %d\n", blanks, tabs, newlines);
    return 0;
}
Now this works fine. K&R is interesting in that it uses ideas not taught to you in the actual text, for instance I tried to run my "while" loop with multiple IFs the same way this one does, except my WHILE loop ran only when getchar was != EOF. I want to know why it didn't work that way.
I found that what they did is a much better idea, creating the int done and then assigning it a 1 instead of 0 at the end of the program was a much better idea, but mine still ran somewhat correctly. (sorry I don't have my own original code this time).
Where I am stumped is what is the purpose of main(void) and return 0;? Before starting this book I found criticism on this but readers claimed it was only in the 1st edition. Here I find that the 2nd edition doesn't teach that but then puts it in the solutions text.
Also, what is the purpose of the int "lastchar"? If getchar(c) is the input and lastchar is always defined as 0, then how could lastchar possibly be changed by any input whatsoever to make it meaningful to the program at all by running a loop to count newlines with it? I see that lastchar is defined as 'c' at the end of the program, but how does that pertain to it being called previously?
Sorry if any of my questions are complicated. Please just answer whatever you can and let me know if you need any further clarification. Just to reiterate I'm very curious why the program can't run a while loop using getchar(c) != EOF, with the same IF statements. Rather than using while done == 0. I feel as if it could be a little shorter/concise (definitely can't say simpler) that way.
Where I am stumped is what is the purpose of main(void) and return 0;?
In standard C programs, main() should return an int, and 0 indicates successful program completion. The void in main(void) says that main takes no parameters. One could argue that main should have two parameters (the command-line argument count and an array of arguments), but if your program doesn't make use of arguments then it isn't necessary.
Also, what is the purpose of the int "lastchar"?
At the end of the while loop, the program stores a copy of the current character in the lastchar variable. As you can see in the EOF-handling code, it makes use of lastchar when determining whether the input text ended in a partial line.
I'm very curious why the program can't run a while loop using getchar(c) != EOF, with the same IF statements.
You could code it that way, but the conditional for the while can appear confusing to someone who doesn't have a lot of experience with C: while ((c = getchar()) != EOF). You would also have to move the if (lastchar != '\n') ++newlines; to just outside of the while loop.
Maybe you should make that change to the program and compare its output to the original for various types of input (empty file, file ending with a newline, file not ending with a newline). Do both programs show the same output? If not, why? Does the modified version still seem more concise? Which would be easier to make changes to in the future?
Many decisions go into a choice of how to structure a program. Even one as simple as this K&R example.
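As a starting point for that comparison, here is one way the restructured loop might look. It is a sketch, not the book's solution: it reads from a string, with the terminating '\0' standing in for EOF, and count_whitespace is a hypothetical name. With getchar the loop condition would be (c = getchar()) != EOF instead.

```c
/* Counts blanks, tabs and lines in a string, using the read-in-the-condition
   loop shape with the partial-line check moved after the loop. */
void count_whitespace(const char *text, int *blanks, int *tabs, int *newlines)
{
    int c;
    int lastchar = 0;              /* same initial value as the original */

    *blanks = *tabs = *newlines = 0;
    while ((c = (unsigned char)*text++) != '\0') {
        if (c == ' ')
            ++*blanks;
        if (c == '\t')
            ++*tabs;
        if (c == '\n')
            ++*newlines;
        lastchar = c;              /* remember what we just saw */
    }
    if (lastchar != '\n')          /* input ended in a partial line */
        ++*newlines;
}
```

Because lastchar starts at 0, an empty input is reported as one line here; working out whether the original program does the same is part of the comparison suggested above.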
This question is in K&R, exercise 1.9. I wrote the following code:
#include<stdio.h>
main()
{
    int c,i=0,n=0;
    while((c=getchar())!=EOF)
    {
        if(c!=' '||c!='\t')
        {
            i=0;
            putchar(c);
        }
        else if(c==' '||c=='\t')
        {
            i++;
        }
        if((c+1)!=' '||(c+1)!='\t')
            n=i;
        if(n!=0)
        {
            c=' ';
            putchar(c);
        }
    }
}
but I could not get the desired output. I am using gcc on Ubuntu. When I enter something like hello\t\ta as input, my output is hello__a (writing _ for a space), i.e. each tab is replaced by a space, and when I enter hello__a my output is the same as the input.
Please help me with it or suggest me something new to get the desired output.
Instead of giving you the full working program, I prefer to guide you in the right direction.
First of all, c+1 does not mean "next character in the input". It only adds 1 to the value of c, which effectively converts c to the next character in the ASCII table.
For example, if c is 'a', c+1 means 'b', which is the next character in the ASCII table, and if c is ' ' (a single space), which has code 32 in the table, c+1 is '!', which has code 33.
Well, to get the next character, you need to read it! In the same way you read the first character. The best way to achieve this, is to always hold the previous read character, and check that with the currently read character.
So you need two variables, for example c and pc. You read the character and store it in c. At first, pc is '\0'. If the read character is not space or tab, you write it to the output. If it is tab, you change it to space. And if it is space, you check the previous character (pc). If it is not space, print c. At the end of the loop, you should store the value of c into pc, which means you are holding the previous character in pc.
I guess I told you the complete solution!
The problem is: you want to check the NEXT character, but you check the current character's value incremented by one.
The approach is slightly wrong, here is a hint, keep the last character as state, if the newly entered character is a space and the last character was a space, then don't output, simply go back round the loop and wait for the next character.
If the current character is not a space, output and update the state...
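The previous-character idea from these answers might be sketched as follows. It writes into a buffer instead of calling putchar so the result is easy to inspect; squeeze_blanks is a name invented for this example, and pc follows the naming used in the answer above.

```c
#include <stddef.h>

/* Copies src to dst, turning tabs into spaces and dropping any space
   that directly follows another space. dst needs strlen(src) + 1 bytes.
   Returns the number of characters written (not counting the '\0'). */
size_t squeeze_blanks(const char *src, char *dst)
{
    size_t n = 0;
    int pc = '\0';                 /* previously read character */

    for (; *src != '\0'; ++src) {
        int c = (unsigned char)*src;
        if (c == '\t')
            c = ' ';               /* treat a tab like a space */
        if (c != ' ' || pc != ' ')
            dst[n++] = (char)c;    /* skip only a space that follows a space */
        pc = c;                    /* current becomes previous */
    }
    dst[n] = '\0';
    return n;
}
```

Both hello\t\ta and hello followed by two spaces and a come out as "hello a". Note that nothing here looks ahead: each decision uses only the current character and the one before it.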
I'm writing a program that deciphers sentences, syllables, and words given in a basic text file.
The program cycles through the file character by character.
It first looks if it is some kind of end-of-sentence marker, like ! ? : ; or ..
Then if the character is not a space or tab, it assumes it is a character.
Finally, it identifies that if it is a space or tab, and the last character before it was a valid letter/character (e.g. not an end-of-sentence marker), it is a word.
I was a bit light on the details, but here is the problem I have.
My word count is equal to my sentence count. What this interprets to, is it realizes that a word stops when there is an end of sentence marker, BUT the real problem is the spaces are considered valid letters.
Here's my if statement, to decide if the character in question is a valid letter in a word:
else if(character != ' ' || character != '\t')
I've already ruled out end-of-sentence markers by that point in the program. (In the original if actually). From reading off an Ascii table, 32 should be the space character.
However, when i output all of the characters that make it into that block of code, spaces are in there.
So what am I doing wrong? How can i stop spaces from getting through this if?
Thanks in advance, and I have a feeling the question may be a bit vague, or poorly worded. If you have any questions or need clarification, let me know.
You should not rely on actual numbers for characters: that depends upon the encoding your platform uses, and may not be ASCII. You can check for any particular character by simply testing against it. For example, to test if c is a space character:
if (c == ' ')
will work, is easier to read, and is portable.
If you want to skip all white-space, you should use #include <ctype.h> and then use isspace():
if (isspace((unsigned char)c))
Edit: As others said, your condition to check for "not a space" is wrong, but the above point still applies. So, your condition can be replaced by:
if (!isspace((unsigned char)c))
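As a rough illustration of where such a test fits, the sketch below counts words as runs of non-whitespace characters in a string. It deliberately ignores the end-of-sentence markers mentioned in the question, and count_words is a hypothetical name chosen for the example.

```c
#include <ctype.h>

/* Counts words, where a word is any run of non-whitespace characters. */
int count_words(const char *text)
{
    int words = 0;
    int inword = 0;                /* 1 while inside a word */

    for (; *text != '\0'; ++text) {
        if (isspace((unsigned char)*text)) {
            inword = 0;            /* whitespace ends the current word */
        } else if (!inword) {
            inword = 1;            /* first character of a new word */
            ++words;
        }
    }
    return words;
}
```

The cast to unsigned char matters because passing a negative char value (other than EOF) to isspace is undefined behaviour.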
I note that
(character != 32 || character != 9)
is always true, because if the character is 32 it is not 9, and true OR false is true.
You probably mean
(character != ' ' && character != '\t')
It would probably be better to just compare against the specific characters you consider whitespace, also use an &&:
if ((character != ' ') &&
(character != '\t'))
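The difference between the two conditions is easy to check by putting each one in its own function. The function names below are made up for this comparison.

```c
/* The condition from the question: true for every possible character,
   because no character can equal both ' ' and '\t' at once. */
int is_letter_buggy(int character)
{
    return character != ' ' || character != '\t';
}

/* The corrected condition: true only when the character is neither
   a space nor a tab. */
int is_letter_fixed(int character)
{
    return character != ' ' && character != '\t';
}
```

Calling is_letter_buggy(' ') returns 1, which is exactly how spaces slipped into the block, while is_letter_fixed(' ') returns 0. Some compilers may warn that the first condition is always true.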