So I was wondering if this is right; I have to count the comments longer than one line in a file:
void commentsLongerThanOneLine(FILE* inputStream, FILE* outputStream) {
char c;
int i = 0;
while ((c = fgetc(inputStream) != EOF)) {
if (c == '/' && '*' && '\n') i++;
}
printf("Number of comments longer than one line is : %d\n", i);
return 0;
}
&& is the boolean AND operator, and the test for equality is ==, not =.
You would need to test each character individually:
if ((c == '/') && (c == '*') && (c == '\n'))
Of course this would always generate FALSE as it is an impossible statement.
Even if the line worked the way you think, it would always be false, because you test c against a slash AND an asterisk AND a newline at the same time, which is impossible. You need to check for a slash, set a flag, and check whether the next character is an asterisk. Then verify that there is a newline before the end of the comment (*/).
// can span multiple lines and you need to check that the /* is not inside a // and vice versa.
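The flag-based approach described above could be sketched as a small state machine. This is only an illustration under stated assumptions: the function name count_multiline_comments and its signature are hypothetical, and string and character literals are deliberately not tracked, so a /* inside a string would be miscounted.

```c
#include <stdio.h>

/* Hypothetical helper: counts block comments that contain at least one
   newline. Simplification: string and character literals are not tracked. */
int count_multiline_comments(FILE *in)
{
    enum { CODE, SLASH, LINE_COMMENT, BLOCK, BLOCK_STAR } state = CODE;
    int c;               /* int, so EOF can be told apart from any char */
    int count = 0;
    int saw_newline = 0; /* newline seen inside the current block comment */

    while ((c = fgetc(in)) != EOF) {
        switch (state) {
        case CODE:
            if (c == '/') state = SLASH;
            break;
        case SLASH:          /* previous character was '/' */
            if (c == '*') { state = BLOCK; saw_newline = 0; }
            else if (c == '/') state = LINE_COMMENT;
            else state = CODE;
            break;
        case LINE_COMMENT:   /* a block-comment opener in here is ignored */
            if (c == '\n') state = CODE;
            break;
        case BLOCK:
            if (c == '\n') saw_newline = 1;
            else if (c == '*') state = BLOCK_STAR;
            break;
        case BLOCK_STAR:     /* previous character was '*' inside a block */
            if (c == '/') { if (saw_newline) count++; state = CODE; }
            else if (c == '\n') { saw_newline = 1; state = BLOCK; }
            else if (c != '*') state = BLOCK;
            break;
        }
    }
    return count;
}
```

Reading into an int rather than a char keeps the EOF comparison reliable, and returning the count leaves it to the caller to print it.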
I am asked to squeeze two or more consecutive blank lines in the input into one blank line in the output. I have to use Cygwin to do the I/O and test it.
Example: ./Lab < test1.txt > test2.txt
my code is:
int main(void){
format();
printf("\n");
return 0;
}
void format(){
int c;
size_t nlines = 1;
size_t nspace = 0;
int spaceCheck = ' ';
while (( c= getchar()) != EOF ){
/*TABS*/
if(c == '\t'){
c = ' ';
}
/*SPACES*/
if (c ==' '){/*changed from isspace(c) to c==' ' because isspace is true for spaces/tabs/newlines*/
/* while (isspace(c = getchar())); it counts while there is space we will put one space only */
if(nspace > 0){
continue;
}
else{
putchar(c);
nspace++;
nlines = 0;
}
}
/*NEW LINE*/
else if(c == '\n'){
if(nlines >0){
continue;
}
else{
putchar(c);
nlines++;
nspace = 0;
}
}
else{
putchar(c);
nspace = 0;
nlines = 0;
}
}
}
However my test2.txt doesn't have the result I want. Is there something wrong in my logic/code?
You provide too little code, the interesting part would be the loop around the code you posted...
What you actually have to do there is skipping the output:
FILE* file = ...;
int c, prev = 0; /* int, so that EOF can be told apart from every character */
while((c = fgetc(file)) != EOF)
{
if(c != '\n' || prev != '\n')
putchar(c);
prev = c;
}
If one empty line follows another, we encounter two consecutive newline characters, so both c and prev are equal to '\n'. That is the situation in which we do not want to output c (the subsequent newline). The inverse situation is that at least one of the two is not '\n', which is the condition above, and only then do you output the character.
Side note: prev = 0 only needs to be initialised to anything other than a newline; it could just as well have been 's'. If you want to skip an initial empty line too, initialise it with '\n' instead.
Edit, referring to your modified code: Edit2 (removed references to code as it changed again)
As your modified code shows that you do not only want to condense blank lines, but whitespace, too, you first have to consider that you have two classes of white space, on one hand, the newlines, on the other, any others. So you have to differentiate appropriately.
I recommend now using some kind of state machine:
#define OTH 0
#define WS 1
#define NL1 2
#define NL2 3
int state = OTH;
while (( c= getchar()) != EOF )
{
// first, the new lines:
if(c == '\n')
{
if(state != NL2)
{
putchar('\n');
state = state == NL1 ? NL2 : NL1;
}
}
// then, any other whitespace
else if(isspace(c))
{
if(state != WS)
{
putchar(' ');
state = WS;
}
}
// finally, all remaining characters
else
{
putchar(c);
state = OTH;
}
}
The first distinction is the current character's own class (newline, other whitespace, or anything else); the second is the previous character's class, which defines the current state. A non-whitespace character is always output. A whitespace character is output only if it differs in class from the one before it. Newlines are a special case that needs two states: since we want to keep one blank line, we have to let two consecutive newline characters through.
Be aware: whitespace-only lines do not count as blank lines in the above algorithm, so they won't be eliminated (but reduced to a line containing one single space). From the code you posted, I assume this is intended...
For completeness: This is a variant removing leading and trailing whitespace entirely and counting whitespace-only lines as empty lines:
if(c == '\n')
{
if(state != NL2)
{
putchar('\n');
state = state == NL1 ? NL2 : NL1;
}
}
else if(isspace(c))
{
if(state == OTH)
state = WS;
}
else
{
if(state == WS)
{
putchar(' ');
}
putchar(c);
state = OTH;
}
Secret: Only enter the whitespace state, if there was a non-ws character before, but print the space character not before you encounter the next non-whitespace.
Coming to the newlines - well, if there was a normal character, we are either in state OTH or WS, but none of the two NL states. If there was only whitespace on the line, the state is not modified, thus we remain in the corresponding NL state (1 or 2) and skip the line correspondingly...
To dissect this:
if(c == '\n') {
nlines++;
is nlines ever reset to zero?
if(nlines > 1){
c = '\n';
And what happens on the third \n in sequence? will nlines > 1 be true? Think about it!
}
}
putchar(c);
I don't get this: You unconditionally output your character anyways, defeating the whole purpose of checking whether it's a newline.
A correct solution would set a flag when c is a newline and not output anything. Then, when c is NOT a newline (else branch), output ONE newline if your flag is set and reset the flag. I leave the code writing to you now :)
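For reference, the flag idea described above might look like the following sketch. The function name collapse_newlines and the use of explicit FILE* parameters are assumptions made for illustration. Note that, taken literally, this approach turns every run of newlines into a single newline, so blank lines disappear entirely. The final check after the loop is an addition that keeps a trailing newline.

```c
#include <stdio.h>

/* Hypothetical helper: copies in to out, replacing each run of
   consecutive newlines with a single newline. */
void collapse_newlines(FILE *in, FILE *out)
{
    int c;
    int pending = 0; /* flag: at least one newline seen, none written yet */

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') {
            pending = 1;          /* remember it, output nothing for now */
        } else {
            if (pending) {        /* emit ONE newline for the whole run */
                fputc('\n', out);
                pending = 0;
            }
            fputc(c, out);
        }
    }
    if (pending)                  /* addition: keep a trailing newline */
        fputc('\n', out);
}
```

To keep one blank line instead of none, the flag would have to become a counter so that up to two newlines can be written per run.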
Basically I have to write a program that counts all kinds of different symbols in a .c file. I got it to work with all of the needed symbols except the vertical line '|'. For some reason it just won't count them.
Here's the method I'm using:
int countGreaterLesserEquals(char filename[])
{
FILE *fp = fopen(filename,"r");
FILE *f;
int temp = 0; // ASCII code of the character
int capital = 0;
int lesser = 0;
int numbers = 0;
int comments = 0;
int lines = 0;
int spc = 0;
if (fp == NULL) {
printf("File is invalid\\empty.\n");
return 0;
}
while ((temp = fgetc(fp)) != EOF) {
if (temp >= 'a' && temp <= 'z') {
capital++;
}
else if (temp >= 'A' && temp <= 'Z') {
lesser++;
}
else if( temp == '/') temp = fgetc(fp); {
if(temp == '/')
comments++;
}
if (temp >= '0' && temp <= '9') {
numbers++;
}
if (temp == '|') {
spc++;
}
if (temp == '\n') {
lines++;
}
}
}
On this line:
else if( temp == '/') temp = fgetc(fp); {
I believe you have a misplaced {. As I understand it, it should come before temp = fgetc(fp);.
You can easily avoid such errors by following coding style guidelines: place each statement on its own line and indent the code properly.
Update: And this fgetc is a corner case. What if you read past EOF here? You are not checking this error.
Firstly, some compiler warnings:
'f' : unreferenced local variable
not all control paths return a value
So, f can be removed, and the function should return a value on success too. It's always a good idea to set compiler warnings at the highest level.
Then, there is a problem with:
else if( temp == '/') temp = fgetc(fp); {
if(temp == '/')
comments++;
}
Check the ; after temp = fgetc(fp) on the else if line. It ends the else if statement, so the block following it is always executed. Also, for this fgetc() there is no check for EOF or an error.
Also, if temp is a /, but the following character is not, it will be skipped, so we need to put the character back into the stream (easiest solution in this case).
Here is a full example:
int countGreaterLesserEquals(char filename[])
{
FILE *fp = fopen(filename, "r");
int temp = 0; // ASCII code of the character
int capital = 0;
int lesser = 0;
int numbers = 0;
int comments = 0;
int lines = 0;
int spc = 0;
if (fp == NULL) {
printf("File is invalid\\empty.\n");
return 0;
}
while ((temp = fgetc(fp)) != EOF) {
// check characters - check most common first
if (temp >= 'a' && temp <= 'z') lesser++;
else if (temp >= 'A' && temp <= 'Z') capital++;
else if (temp >= '0' && temp <= '9') numbers++;
else if (temp == '|') spc++;
else if (temp == '\n') lines++;
else if( temp == '/')
if ((temp = fgetc(fp)) == EOF)
break; // handle error/eof
else
if(temp == '/') comments++;
else ungetc(temp, fp); // put character back into the stream
}
fclose (fp); // close as soon as possible
printf("capital: %d\nlesser: %d\ncomments: %d\n"
"numbers: %d\nspc: %d\nlines: %d\n",
capital, lesser, comments, numbers, spc, lines
);
return 1;
}
While it is usually recommended to put if statements inside curly braces, I think in this case we can place them on the same line for clarity.
Each if can be preceded with an else in this case. That way the program doesn't have to check the remaining cases when one is already found. The checks for the most common characters are best placed first for the same reason (but that was the case).
As an alternative you could use islower(temp), isupper(temp) and isdigit(temp) for the first three cases.
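A minimal sketch of that alternative, limited to the first three cases. The struct char_counts type and the function name count_char_classes are hypothetical and only serve the illustration. The values returned by fgetc (other than EOF) are in the unsigned char range, so they can be passed to the <ctype.h> functions directly. In the default "C" locale these functions match exactly a-z, A-Z and 0-9; other locales may classify additional characters.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical result type; not part of the original function. */
struct char_counts {
    int lower;
    int upper;
    int digits;
};

/* Counts lowercase letters, uppercase letters and digits in a stream. */
struct char_counts count_char_classes(FILE *fp)
{
    struct char_counts n = {0, 0, 0};
    int c;

    while ((c = fgetc(fp)) != EOF) {
        if (islower(c)) n.lower++;
        else if (isupper(c)) n.upper++;
        else if (isdigit(c)) n.digits++;
    }
    return n;
}
```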
Performance:
For the sake of completeness: while this is probably an exercise on small files, for larger files the data should be read in buffers for better performance (or even using memory mapping on the file).
Update, @SteveSummit's comment on fgetc performance:
Good answer, but I disagree with your note about performance at the
end. fgetc already is buffered! So the performance of straightforward
code like this should be fine even for large inputs; there's typically
no need to complicate the code due to concerns about "efficiency".
Whilst this comment seemed to be valid at first, I really wanted to know what the real difference in performance would be (since I never use fgetc I hadn't tested this before), so I wrote a little test program:
Open a large file and sum every byte into a uint32_t, which is comparable to scanning for certain chars as above. Data was already cached by the OS disk cache (since we're testing performance of the functions/scans, not the reading speed of the hard disk). While the example code above was most likely for small files, I thought I might put the test results for larger files here as well.
Those were the average results:
- using fgetc : 8770
- using a buffer and scan the chars using a pointer : 188
- use memory mapping and scan chars using a pointer : 118
I was pretty sure that buffers and memory mapping would be faster (I use them all the time for larger data), but the difference in speed is even bigger than expected. Ok, there might be some possible optimizations for fgetc, but even if those doubled the speed, the difference would still be high.
Bottom line: Yes, it is worth the effort to optimize this for larger files. For example, if processing the data of a file takes 1 second with buffers/mmap, it would take over a minute with fgetc!
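The test program itself is not shown, so the following is only a guess at what the "buffer plus pointer scan" variant might look like: it reads fixed-size chunks with fread and sums the bytes into a uint32_t. The function name sum_bytes_buffered and the 64 KiB buffer size are assumptions, not the values used in the measurements above.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: sums every byte of a stream, reading it in chunks
   with fread and scanning each chunk with a pointer. */
uint32_t sum_bytes_buffered(FILE *in)
{
    unsigned char buf[65536];  /* assumed chunk size */
    uint32_t sum = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        const unsigned char *p = buf;
        const unsigned char *end = buf + n;
        while (p < end)
            sum += *p++;       /* wraps around on overflow, which is fine for a checksum */
    }
    return sum;
}
```

The same inner loop could be reused for counting specific characters; only the body of the scan changes.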
while((c = getc(file)) != -1)
{
if (c == ';')
{
//here I want to skip the line that starts with ;
//I don't want to read any more characters on this line
}
else
{
do
{
//Here I do my stuff
}while (c != -1 && c != '\n');//until end of file
}
}
Can I completely skip a line using getc if first character of line is a semicolon?
Your code contains a couple of references to -1. I suspect that you're assuming that EOF is -1. That's a common value, but it is simply required to be a negative value — any negative value that will fit in an int. Do not get into bad habits at the start of your career. Write EOF where you are checking for EOF (and don't write EOF where you are checking for -1).
int c;
while ((c = getc(file)) != EOF)
{
if (c == ';')
{
// Gobble the rest of the line, or up until EOF
while ((c = getc(file)) != EOF && c != '\n')
;
}
else
{
do
{
//Here I do my stuff
…
} while ((c = getc(file)) != EOF && c != '\n');
}
}
Note that getc() returns an int so c is declared as an int.
Let's assume that by "line" you mean a string of characters until you hit a designated end-of-line character (here assumed as \n, different systems use different characters or character sequences like \r\n). Then whether the current character c is in a semicolon-started line or not becomes a state information which you need to maintain across different iterations of the while-loop. For example:
bool is_new_line = true;
bool starts_with_semicolon = false;
int c;
while ((c = getc(file)) != EOF) {
if (is_new_line) {
starts_with_semicolon = c == ';';
}
if (!starts_with_semicolon) {
// Process the character.
}
// If c is '\n', then next letter starts a new line.
is_new_line = c == '\n';
}
The code is just to illustrate the principle -- it's not tested or anything.
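As a concrete instance of the same principle, here is a small sketch where "process the character" simply means copying it to an output stream. The function name copy_without_semicolon_lines and the output parameter are assumptions added for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: copies in to out, dropping every line whose
   first character is ';' (including that line's newline). */
void copy_without_semicolon_lines(FILE *in, FILE *out)
{
    int c;
    int at_line_start = 1; /* the next character begins a new line */
    int skipping = 0;      /* the current line started with ';' */

    while ((c = getc(in)) != EOF) {
        if (at_line_start)
            skipping = (c == ';');
        if (!skipping)
            putc(c, out);
        /* After a newline, the next character starts a new line. */
        at_line_start = (c == '\n');
    }
}
```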
This program is supposed to remove all comments from C source code (in this case, a comment is either a double slash '//' and a newline character '\n' with anything in between them, or anything between '/*' and '*/').
The program:
#include <stdio.h>
/* This is a multi line comment
testing */
int main() {
int c;
while ((c = getchar()) != EOF)
{
if (c == '/') //Possible comment
{
c = getchar();
if (c == '/') // Single line comment
while (c = getchar()) //While there is a character and is not EOF
if (c == '\n') //If a space character is found, end of comment reached, end loop
break;
else if (c == '*') //Multi line comment
{
while (c = getchar()) //While there is a character and it is not EOF
{
if (c == '*' && getchar() == '/') //If c equals '*' and the next character equals '/', end of comment reached, end loop
break;
}
}
else putchar('/'); putchar(c); //If not comment, print '/' and the character next to it
}
else putchar(c); //if not comment, print character
}
}
After I use this source code as its own input, this is the output I get:
#include <stdio.h>
* This is a multi line comment
testing *
int main() {
int c;
while ((c = getchar()) != EOF)
{
if (c == '') ////////////////
{
c = getchar();
if (c == '') ////////////////////
while (c = getchar()) /////////////////////////////////////////
if (c == '\n') ///////////////////////////////////////////////////////////////
break;
else if (c == '*') ///////////////////
{
while (c = getchar()) ////////////////////////////////////////////
{
No more beyond this point. I'm compiling it using g++ on the ubuntu terminal.
As you can see, multi lines comments had only their '/' characters removed, while single line ones, had all their characters replaced by '/'. Apart from that, any '/' characters that were NOT the beginning of a new comment were also removed, as in the line if (c == ''), which was supposed to be if (c == '/').
Does anybody know why? thanks.
C does not take notice of the way you indent your code. It only cares about its own grammar.
Look carefully at your elses and think about which if they attach to (hint: the closest open one).
There are other bugs, as well. EOF is not 0, so only the first while is correct. And what happens if the comment looks like this: /* something **/?
You have some (apparent) logic errors...
1.
while (c = getchar()) //While there is a character and is not EOF
You're assuming that EOF == 0. Why not be explicit and change the preceding line to:
while((c = getchar()) != EOF)
2.
else putchar('/'); putchar(c);
Are both of the putchars supposed to be part of the else clause? If so, you need braces {} around the two putchar statements. Also, give each putchar its own line; it not only looks nicer but it's more readable.
Conclusion
Other than what I've mentioned, your logic looks sound.
As already mentioned, the if/else matching is incorrect. One additional piece of missing functionality is that you must make it more stateful to keep track of whether you are inside a string or not, e.g.
printf("This is not // a comment\n");
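One way to make the program stateful in that sense is an explicit state machine. The sketch below is an illustration, not a complete C preprocessor: the function name strip_comments and the FILE* parameters are assumptions, comments are removed outright rather than replaced by a space, and backslash line continuations inside // comments are not handled. It does track string and character literals, and it copes with comment endings such as **/.

```c
#include <stdio.h>

/* Hypothetical helper: copies in to out without C comments, leaving the
   contents of string and character literals untouched. */
void strip_comments(FILE *in, FILE *out)
{
    enum { CODE, SLASH, LINE, BLOCK, BLOCK_STAR, QUOTE, QUOTE_ESC } state = CODE;
    int c;
    int quote = 0; /* the quote character that opened the current literal */

    while ((c = getc(in)) != EOF) {
        switch (state) {
        case CODE:
            if (c == '/') { state = SLASH; break; } /* hold the '/' back */
            if (c == '"' || c == '\'') { quote = c; state = QUOTE; }
            putc(c, out);
            break;
        case SLASH:                 /* previous character was '/' */
            if (c == '/') state = LINE;
            else if (c == '*') state = BLOCK;
            else {                  /* not a comment: emit the held '/' */
                putc('/', out);
                if (c == '"' || c == '\'') { quote = c; state = QUOTE; }
                else state = CODE;
                putc(c, out);
            }
            break;
        case LINE:                  /* inside a // comment */
            if (c == '\n') { putc(c, out); state = CODE; }
            break;
        case BLOCK:                 /* inside a block comment */
            if (c == '*') state = BLOCK_STAR;
            break;
        case BLOCK_STAR:            /* previous character was '*' in a block */
            if (c == '/') state = CODE;
            else if (c != '*') state = BLOCK;
            break;
        case QUOTE:                 /* inside "..." or '...' */
            putc(c, out);
            if (c == '\\') state = QUOTE_ESC;
            else if (c == quote) state = CODE;
            break;
        case QUOTE_ESC:             /* character after a backslash */
            putc(c, out);
            state = QUOTE;
            break;
        }
    }
    if (state == SLASH)             /* lone '/' at the very end of input */
        putc('/', out);
}
```

Every loop condition compares against EOF explicitly, and each branch has its own braces or case label, which avoids the dangling else problem from the original program.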
I have a configuration file that is supposed to have \r\n line ending style, and I want to include the code in my program to check and correct the format.
Existing code:
int convert_line_endings(FILE *fp)
{
char c = 0, lastc = 0, cnt = 0;
while((c = fgetc(fp)) != EOF)
{
if((c == '\n') && (lastc != '\r'))
{
cnt++;
//somehow "insert" a '\r' in here, after the previous char and before the '\n'
}
lastc = c;
}
return cnt;
}
And in C programming, you can't "insert" a char (or can you?!), just overwrite one or the other. Any suggestions?
No, you can't delete or insert in place. What you can do is copy everything to a new temporary file, writing \r\n for each line ending, and then overwrite the original file with it.
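A minimal sketch of the copying step, assuming both streams were opened in binary mode ("rb" and "wb") so that the C library does not translate line endings itself. The function name copy_with_crlf is hypothetical; replacing the original file afterwards (for example with remove and rename) is left out.

```c
#include <stdio.h>

/* Hypothetical helper: copies in to out, inserting '\r' before every '\n'
   that is not already preceded by one. Returns the number of line
   endings that were fixed. */
int copy_with_crlf(FILE *in, FILE *out)
{
    int c;          /* int, so EOF can be told apart from any char */
    int lastc = 0;
    int fixed = 0;

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n' && lastc != '\r') {
            fputc('\r', out);   /* the "inserted" character */
            fixed++;
        }
        fputc(c, out);
        lastc = c;
    }
    return fixed;
}
```

Using int for the counter as well avoids the overflow that a char counter would hit on files with many bare \n endings.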