Problems when trying to skip '\n' in reading txt files - c

I wrote a fairly small program to help with formatting txt files, but when I tried to read from the input file and skip unwanted '\n' characters, I actually skipped the character after each '\n' instead.
The sample file I work on looks like this:
abcde
abc

ab
abcd
And my code looks like this:
while (!feof(fp1)) {
ch = fgetc(fp1);
if (ch != '\n') {
printf("%c",ch);
}
else {
ch = fgetc(fp1); // move to the next character
if (ch == '\n') {
printf("%c",ch);
}
}
}
The expected result is
abcdeabc
ababcd
But I actually got
abcdebc
abbcd
I guess the problem is in ch = fgetc(fp1); // move to the next character, but I just can't find a correct way to implement this idea.

Think of the flow of your code (lines numbered below):
1: while (!feof(fp1)) {
2: ch = fgetc(fp1);
3: if (ch != '\n') {
4: printf("%c",ch);
5: }
6: else {
7: ch = fgetc(fp1); // move to the next character
8: if (ch == '\n') {
9: printf("%c",ch);
10: }
11: }
12: }
When you get a newline followed by non-newline, the flow is (starting at the else line): 6, 7, 8, 10, 11, 12, 1, 2.
It's that execution of the final 2 in that sequence that effectively throws away the non-newline character that you had read at 7.
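If you want to keep the look-ahead structure of the original loop, one possible fix is to push the look-ahead character back onto the stream with the standard ungetc function whenever it turns out not to be a newline. The following is only a minimal sketch of that idea; it is a different technique from the newline-counting approach described in this answer, and the function name join_lines and the choice to pass both streams as parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Copy in to out, dropping single newlines and keeping one newline
 * for each pair of consecutive newlines. */
void join_lines(FILE *in, FILE *out)
{
    int ch;

    assert(in != NULL && out != NULL);
    while ((ch = fgetc(in)) != EOF) {
        if (ch != '\n') {
            fputc(ch, out);
            continue;
        }
        /* Look ahead one character after a newline. */
        int next = fgetc(in);
        if (next == '\n') {
            fputc('\n', out);   /* blank line: keep one newline */
        } else if (next != EOF) {
            ungetc(next, in);   /* not a newline: the next iteration reads it again */
        }
    }
}
```

Calling join_lines(fp1, stdout) on the sample file should give the expected two lines. Note that this sketch treats newlines in pairs, so a run of three newlines yields one newline and a run of four yields two.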
If your intent is to basically throw away single newlines and convert sequences of newlines (two or more) to a single one(a), you can use something like the following pseudo-code:
set numNewlines to zero
while not end-file:
    get thisChar
    if numNewlines is one or thisChar is not newline:
        output thisChar
    if thisChar is newline:
        increment numNewlines
    else:
        set numNewlines to zero
This reads the character in one place, making it less likely that you'll inadvertently skip one due to confused flow.
It also uses the newline history to decide what gets printed. It only outputs a newline on the second occurrence in a sequence of newlines, ignoring the first and any after the second.
That means that a single newline will never be echoed and any group of two or more will be transformed into one.
Some actual C code that demonstrates this(b) follows:
#include <stdio.h>
#include <stdbool.h>
int main(void) {
    // Open file.
    FILE *fp = fopen("testprog.in", "r");
    if (fp == NULL) {
        fprintf(stderr, "Cannot open input file\n");
        return 1;
    }
    // Process character by character.
    int numNewlines = 0;
    while (true) {
        // Get next character, stop if none left.
        int ch = fgetc(fp);
        if (ch == EOF) break;
        // Output only second newline in a sequence of newlines,
        // or any non-newline.
        if (numNewlines == 1 || ch != '\n') {
            putchar(ch);
        }
        // Manage sequence information.
        if (ch == '\n') {
            ++numNewlines;
        } else {
            numNewlines = 0;
        }
    }
    // Finish up cleanly.
    fclose(fp);
    return 0;
}
(a) It's unclear from your question how you want to handle sequences of three or more newlines so I've had to make an assumption.
(b) Of course, you shouldn't use this if your intent is to learn, because:
You'll learn more if you try yourself and have to fix any issues.
Educational institutions will almost certainly check submitted code against a web search, and you'll probably be pinged for plagiarism.
I'm just providing it for completeness.

Related

Multiple blank lines are not squeezed in one blank line(C) using I/O redirection

I am asked to squeeze two or more consecutive blank lines in the input into one blank line in the output. I have to use Cygwin with I/O redirection to test it.
Example: ./Lab < test1.txt > test2.txt
My code is:
int main(void){
format();
printf("\n");
return 0;
}
void format(){
int c;
size_t nlines = 1;
size_t nspace = 0;
int spaceCheck = ' ';
while (( c= getchar()) != EOF ){
/*TABS*/
if(c == '\t'){
c = ' ';
}
/*SPACES*/
if (c ==' '){/*changed from isspace(c) to c==' ' because isspace is true for spaces/tabs/newlines*/
/* while (isspace(c = getchar())); it counts while there is space we will put one space only */
if(nspace > 0){
continue;
}
else{
putchar(c);
nspace++;
nlines = 0;
}
}
/*NEW LINE*/
else if(c == '\n'){
if(nlines >0){
continue;
}
else{
putchar(c);
nlines++;
nspace = 0;
}
}
else{
putchar(c);
nspace = 0;
nlines = 0;
}
}
}
However my test2.txt doesn't have the result I want. Is there something wrong in my logic/code?
You provide too little code, the interesting part would be the loop around the code you posted...
What you actually have to do is skip the output of a repeated newline:
FILE* file = ...;
int c, prev = 0; /* int, not char, so that EOF can be told apart from data */
while((c = fgetc(file)) != EOF)
{
    if(c != '\n' || prev != '\n')
        putchar(c);
    prev = c;
}
If an empty line follows another line, we encounter two consecutive newline characters, so both c and prev are equal to '\n'. That is the one situation in which we do not want to output c (the second newline). In the inverse situation, where at least one of the two differs from '\n', we do output the character, which is exactly the condition in the code above.
Side note on prev = 0: it has to be initialised to anything other than a newline and could just as well have been 's'. If, however, you also want to skip an initial empty line, initialise it with '\n' instead.
Edit, referring to your modified code: Edit2 (removed references to code as it changed again)
As your modified code shows that you do not only want to condense blank lines, but whitespace, too, you first have to consider that you have two classes of white space, on one hand, the newlines, on the other, any others. So you have to differentiate appropriately.
I recommend now using some kind of state machine:
#define OTH 0
#define WS 1
#define NL1 2
#define NL2 3
int state = OTH;
while (( c= getchar()) != EOF )
{
    // first, the new lines:
    if(c == '\n')
    {
        if(state != NL2)
        {
            putchar('\n');
            state = state == NL1 ? NL2 : NL1;
        }
    }
    // then, any other whitespace
    else if(isspace(c))
    {
        if(state != WS)
        {
            putchar(' ');
            state = WS;
        }
    }
    // finally, all remaining characters
    else
    {
        putchar(c);
        state = OTH;
    }
}
The first distinction is made on the current character's own class (newline, other whitespace, or anything else); the second on the previous character's class, which is what the current state records. A non-whitespace character is always output. A whitespace character is output only if the character before it was of a different class. Newlines are a special case that needs two states, because we want to keep one blank line, and that takes two consecutive newline characters.
Be aware: whitespace-only lines do not count as blank lines in the above algorithm, so they won't be eliminated (but they are reduced to a line containing one single space). From the code you posted, I assume this is intended.
For completeness: This is a variant removing leading and trailing whitespace entirely and counting whitespace-only lines as empty lines:
if(c == '\n')
{
    if(state != NL2)
    {
        putchar('\n');
        state = state == NL1 ? NL2 : NL1;
    }
}
else if(isspace(c))
{
    if(state == OTH)
        state = WS;
}
else
{
    if(state == WS)
    {
        putchar(' ');
    }
    putchar(c);
    state = OTH;
}
The trick: only enter the whitespace state if there was a non-whitespace character before, and do not print the space character until you encounter the next non-whitespace character.
Coming to the newlines: if there was a normal character on the line, we are in state OTH or WS, not in either of the two NL states. If there was only whitespace on the line, the state is not modified, so we remain in the corresponding NL state (1 or 2) and handle the line as if it were empty.
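To make the variant above easier to try out, here is a sketch that wraps it in a complete function. It uses an enum instead of the #define constants, reads from and writes to streams passed as parameters, and starts in state NL1 rather than OTH; the function name squeeze and all three of those choices are assumptions for illustration, not part of the answer's code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

enum state { OTH, WS, NL1, NL2 };

/* Copy in to out, squeezing whitespace runs to one space, trimming
 * whitespace at line ends, and keeping at most one blank line. */
void squeeze(FILE *in, FILE *out)
{
    /* Assumption: start as if one newline had just been written, so that
     * leading whitespace is dropped and one leading blank line survives. */
    enum state state = NL1;
    int c;

    assert(in != NULL && out != NULL);
    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') {
            if (state != NL2) {
                fputc('\n', out);
                state = (state == NL1) ? NL2 : NL1;
            }
        } else if (isspace(c)) {
            if (state == OTH) {
                state = WS;         /* remember the gap, print nothing yet */
            }
        } else {
            if (state == WS) {
                fputc(' ', out);    /* the gap was followed by real text */
            }
            fputc(c, out);
            state = OTH;
        }
    }
}
```

With ./Lab < test1.txt > test2.txt style redirection, this would be called as squeeze(stdin, stdout).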
To dissect this:
if(c == '\n') {
nlines++;
is nlines ever reset to zero?
if(nlines > 1){
c = '\n';
And what happens on the third \n in sequence? will nlines > 1 be true? Think about it!
}
}
putchar(c);
I don't get this: You unconditionally output your character anyways, defeating the whole purpose of checking whether it's a newline.
A correct solution would set a flag when c is a newline and not output anything. Then, when c is NOT a newline (else branch), output ONE newline if your flag is set and reset the flag. I leave the code writing to you now :)
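For readers who want to check their own attempt, the flag idea described above could be sketched roughly as follows. The function name collapse_newlines, the stream parameters, and the handling of a trailing newline at end of input are assumptions added for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Copy in to out, replacing every run of newlines with a single newline. */
void collapse_newlines(FILE *in, FILE *out)
{
    bool pending_newline = false;
    int c;

    assert(in != NULL && out != NULL);
    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') {
            pending_newline = true;     /* remember it, output nothing yet */
        } else {
            if (pending_newline) {
                fputc('\n', out);       /* one newline for the whole run */
                pending_newline = false;
            }
            fputc(c, out);
        }
    }
    /* Assumption: keep a final newline if the input ended with one. */
    if (pending_newline) {
        fputc('\n', out);
    }
}
```

As written, this leaves no blank lines at all in the output, because a run of any length becomes exactly one newline. Keeping one blank line would need a counter instead of a flag.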

Count number of word, line, character in a file

Thanks for the attention.
I wrote a piece of code in C to count the number of words, lines, and characters in a file.
while((c = fgetc(fp)) != EOF)
{
if((char)(c) == ' ' || (char)(c) == '\t'){
num_word++;
num_char++;
}
else if((char)(c) == '\n'){
num_line++;
num_word++;
num_char++;
}
else{
num_char++;
}
}
Everything works fine except for num_word. For example, if the test case has a blank line, it counts one word too many.
example for test

case
My program counts 5 instead of 4. Any hints for solving this problem?
You are using two branches (with space, \t and \n as delimiters) to detect new words. That fails for consecutive whitespace characters.
To solve the problem, you can either skip consecutive whitespace characters (by keeping a counter or flag for them), or treat the first non-whitespace character after whitespace as the beginning of a new word.
The C Programming Language by K&R provides very nice examples.
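The second option, counting a word whenever a non-whitespace character follows whitespace, can be sketched as below. This is only an illustration in the spirit of that suggestion: the struct counts type and the function name count_text are made up here, and using isspace means that any whitespace character (not just space, tab and newline) separates words.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

struct counts {
    long chars;
    long words;
    long lines;
};

/* Count characters, words and lines in a stream. */
struct counts count_text(FILE *in)
{
    struct counts n = {0, 0, 0};
    bool in_word = false;
    int c;

    assert(in != NULL);
    while ((c = fgetc(in)) != EOF) {
        n.chars++;
        if (c == '\n') {
            n.lines++;
        }
        if (isspace(c)) {
            in_word = false;        /* any whitespace ends the current word */
        } else if (!in_word) {
            in_word = true;         /* first character of a new word */
            n.words++;
        }
    }
    return n;
}
```

Because a word is counted only when it starts, blank lines and repeated spaces no longer inflate the word count.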
Your word count is 1 more because you have 1 blank line. It will be 2 more if you have 2 blank lines and so on. This is because of the way you are counting the words on line break. The easiest thing you can do is NOT increment the word count if there are successive line breaks indicating blank lines.
char lastChar = ' ';
while((c = fgetc(fp)) != EOF)
{
if((char)(c) == ' ' || (char)(c) == '\t'){
num_word++;
num_char++;
}
else if((char)(c) == '\n'){
num_line++;
if(lastChar != '\n')
num_word++;
num_char++;
}
else{
num_char++;
}
lastChar = (char)c;
}

C - Reading from a file issue with the last line

The expected input to my program for my assignment is something like
./program "hello" < helloworld.txt. The trouble with this, however, is that I must analyse every line of the input, so I have used the following guard for the end of a line:
while((c = getchar()) != EOF) {
if (c == '\n') {
/*stuff will be done*/
However, my problem with this is that if the helloworld.txt file contains:
hello
world
It will only process the first line (or, if there were more lines, everything up to the second-to-last line).
To fix this, I have to add a trailing newline so that helloworld.txt looks something like:
hello
world
//
Is there another way around this?
Fix your algorithm. Instead of:
while((c = getchar()) != EOF) {
if (c == '\n') {
/* stuff will be done */
} else {
/* buffer the c character */
}
}
Do:
do {
c = getchar();
if (c == '\n' || c == EOF) {
/* do stuff with the buffered line */
/* clear the buffered line */
} else {
/* add the c character to the buffered line */
}
} while (c != EOF);
But please note that you shouldn't use the value of the c variable if it is EOF.
You need to re-structure your program so it can "do stuff" on EOF, if it has read any characters since the previous linefeed. That way, a non-terminated final line will still be processed.
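A concrete version of that restructuring might look like the sketch below. Here "doing stuff" is stood in for by writing each line in brackets, and the buffer size MAX_LINE, the function name process_lines and the stream parameters are assumptions for illustration only.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LINE 256   /* assumed limit; longer lines are truncated */

/* Write each input line to out wrapped in brackets and return the
 * number of lines handled, including a final line without '\n'. */
int process_lines(FILE *in, FILE *out)
{
    char line[MAX_LINE];
    size_t len = 0;
    int count = 0;
    int c;

    assert(in != NULL && out != NULL);
    do {
        c = fgetc(in);
        if (c == '\n' || (c == EOF && len > 0)) {
            /* End of a line, or end of input with buffered characters. */
            line[len] = '\0';
            fprintf(out, "[%s]", line);
            count++;
            len = 0;
        } else if (c != EOF && len < MAX_LINE - 1) {
            line[len++] = (char)c;  /* c is only stored when it is not EOF */
        }
    } while (c != EOF);
    return count;
}
```

With this structure, the input gives the same result whether or not the last line ends in a newline.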

reading a file using getc and skipping a line if it starts with semicolon

while((c = getc(file)) != -1)
{
if (c == ';')
{
//here I want to skip the line that starts with ;
//I don't want to read any more characters on this line
}
else
{
do
{
//Here I do my stuff
}while (c != -1 && c != '\n');//until end of file
}
}
Can I completely skip a line using getc if first character of line is a semicolon?
Your code contains a couple of references to -1. I suspect that you're assuming that EOF is -1. That's a common value, but it is simply required to be a negative value — any negative value that will fit in an int. Do not get into bad habits at the start of your career. Write EOF where you are checking for EOF (and don't write EOF where you are checking for -1).
int c;
while ((c = getc(file)) != EOF)
{
if (c == ';')
{
// Gobble the rest of the line, or up until EOF
while ((c = getc(file)) != EOF && c != '\n')
;
}
else
{
do
{
//Here I do my stuff
…
} while ((c = getc(file)) != EOF && c != '\n');
}
}
Note that getc() returns an int so c is declared as an int.
Let's assume that by "line" you mean a string of characters until you hit a designated end-of-line character (here assumed as \n, different systems use different characters or character sequences like \r\n). Then whether the current character c is in a semicolon-started line or not becomes a state information which you need to maintain across different iterations of the while-loop. For example:
bool is_new_line = true;
bool starts_with_semicolon = false;
int c;
while ((c = getc(file)) != EOF) {
    if (is_new_line) {
        starts_with_semicolon = c == ';';
    }
    if (!starts_with_semicolon) {
        // Process the character.
    }
    // If c is '\n', then next letter starts a new line.
    is_new_line = c == '\n';
}
The code is just to illustrate the principle -- it's not tested or anything.
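For anyone who wants to try the state-flag principle, here is a small sketch in which "processing" a character simply means copying it to an output stream. The function name copy_without_semicolon_lines and the stream parameters are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Copy in to out, leaving out every line whose first character is ';'. */
void copy_without_semicolon_lines(FILE *in, FILE *out)
{
    bool is_new_line = true;
    bool starts_with_semicolon = false;
    int c;

    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF) {
        if (is_new_line) {
            starts_with_semicolon = (c == ';');
        }
        if (!starts_with_semicolon) {
            fputc(c, out);          /* stands in for the real processing */
        }
        /* The character after a '\n' starts a new line. */
        is_new_line = (c == '\n');
    }
}
```

The newline that ends a skipped line is skipped too, while a semicolon in the middle of a line is copied like any other character.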

How do I add a newline character after every 3 characters in C?

I have a text file "123.txt" with this content:
123456789
I want the output to be:
123
456
789
This means, a newline character must be inserted after every 3 characters.
void convert1 (){
FILE *fp, *fq;
int i,c = 0;
fp = fopen("~/123.txt","r");
fq = fopen("~/file2.txt","w");
if(fp == NULL)
printf("Error in opening 123.txt");
if(fq == NULL)
printf("Error in opening file2.txt");
while (!feof(fp)){
for (i=0; i<3; i++){
c = fgetc(fp);
if(c == 10)
i=3;
fprintf(fq, "%c", c);
}
if(i==4)
break;
fprintf (fq, "\n");
}
fclose(fp);
fclose(fq);
}
My code works fine, but prints a newline character also at the end of file, which is not desired. This means, a newline character is added after 789 in the above example. How can I prevent my program from adding a spurious newline character at the end of the output file?
As indicated in the comments, your while loop is not correct. Please try to exchange your while loop with the following code:
i = 0;
while(1)
{
// Read a character and stop if reading fails.
c = fgetc(fp);
if(feof(fp))
break;
// When a line ends, then start over counting (similar as you did it).
if(c == '\n')
i = -1;
// Just before a "fourth" character is written, write an additional newline character.
// This solves your main problem of a newline character at the end of the file.
if(i == 3)
{
fprintf(fq, "\n");
i = 0;
}
// Write the character that was read and count it.
fprintf(fq, "%c", c);
i++;
}
Example: A file containing:
12345
123456789
is turned into a file containing:
123
45
123
456
789
I think you should write the newline at the beginning of the loop:
i = 0;
// fgetc returns EOF at end of file, so I usually loop like this
while((c = fgetc(fp)) != EOF)
{
    // i % 3 == 0 means i is an exact multiple of 3, which is true on every
    // third iteration. There is no need to reset i, but the first iteration
    // (i == 0) has to be skipped.
    if(i%3 == 0 && i != 0)
    {
        fprintf (fq, "\n");
    }
    // Write the character
    fprintf(fq, "%c", c);
    // and increase i
    i++;
}
I can't test it right now, so there may be some mistakes, but you see what I mean.
