Removing neighboring duplicate lines from a file using C - c

Empty lines should also be removed if they are duplicates. A line that contains only whitespace characters (like \t) is different from an empty line. The code below deletes too many lines, or sometimes leaves duplicates. How do I fix this?
#include <stdio.h>
#include <stdlib.h>
int main()
{
char a[6000];
char b[6000];
int test = 0;
fgets(a, 6000, stdin);
while (fgets(b, 6000, stdin) != NULL) {
for (int i = 0; i < 6000; i++) {
if (a[i] != b[i]) {
test = 1;
}
}
if (test == 0) {
fgets(b, 6000, stdin);
} else {
printf("%s", a);
}
int j = 0;
while (j < 6000) {
a[j] = b[j];
j++;
}
test = 0;
}
return 0;
}

Your logic is mostly sound. You are on the right track with your train of thought:
1. Read a line into previous (a).
2. Read another line into current (b).
3. If previous and current have the same contents, go to step 2.
4. Print previous.
5. Move current to previous.
6. Go to step 2.
This still has some problems, however.
Unnecessary line-read
To start, consider this bit of code:
while(fgets(b,6000,stdin)!=NULL) {
...
if(test==0) {
fgets(b,6000,stdin);
}
else {
printf("%s",a);
}
...
}
If a and b have the same contents (test==0), you call fgets a second time without checking its result, and then the loop condition fgets(b,6000,stdin)!=NULL reads yet another line. The line read by that extra call is never compared with the previous one; it is simply moved from b to a, and the duplicated line itself is never printed. Since the loop condition already reads the next line and checks for failure, let the loop do the reading: drop the extra fgets and invert the test so that a is printed only if test!=0.
Where's the last line?
Your logic also will not print the last line. Consider a file with 1 line. You read it, then fgets in the loop condition attempts to read another line, which fails because you're at the end of the file. There is no print statement outside the loop, so you never print the line.
Now what about a file with 2 lines that differ? You read the first line, then the last line, see they're different, and print the first line. Then you overwrite the first line's buffer with the last line. You fail to read another line because there aren't any more, and the last line is, again, not printed.
You can fix this by replacing the first (unchecked) fgets with a[0] = 0. That makes the first byte of a a null byte, which means the end of the string. It won't compare equal to a line you read, so test==1, meaning a will be printed. Since there is no string in a to print, nothing is printed. Things then continue as normal, with the contents of b being moved into a and another line being read.
Unique last line problem
This leaves one problem: the last line won't be printed if it's not a duplicate. To fix this, just print b instead of a.
The final recipe
1. Assign 0 to the first byte of previous (a[0]).
2. Read a line into current (b).
3. If previous and current have the same contents, go to step 2.
4. Print current.
5. Move current to previous.
6. Go to step 2.
As you can see, it's not much different from your existing logic; only steps 1 and 4 differ. It also ensures that all fgets calls are checked. If there are no lines in a file, nothing is printed. If there is only 1 line in a file, it is printed. If 2 lines differ, both are printed. If 2 lines are the same, the first is printed.
Optional: optimizations
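Instead of checking all 6000 bytes, you only need to check up to the first null byte in either string, since fgets always adds one to mark the end of the string. The bytes after it are uninitialized or left over from earlier lines, so comparing them can make two identical lines look different.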
Faster still would be to add a break statement inside the if statement of your for loop. If a single byte doesn't match, the entire line is not a duplicate, so you can stop comparing early—a lot faster if only byte 10 differs in two 1000-byte lines!

#include <stdio.h>
#include <string.h>
int main(void)
{
char buff[2][6000];
unsigned count=0;
char *prev=NULL;
char *this=buff[count%2];
while( fgets(this, sizeof buff[0] , stdin)) {
if(!prev || strcmp(prev, this) ) { // first or different
fputs(this, stdout);
prev=this;
count++;
this=buff[count%2];
}
}
fprintf(stderr, "Number of lines written: %u\n", count);
return 0;
}

There are a few problems in your code, such as:
for(int i=0; i<6000; i++) {
if(a[i]!=b[i]) {
test=1;
}
}
This loop always compares the whole buffer character by character, even after it has found a[i]!=b[i] for some value of i. You should break out of the loop after setting test=1.
Your logic also fails for a file with just 1 line, because you never print a line outside the loop.
Another problem is the fixed-length buffer of 6000 chars.
You could use getline() to solve your problem, for example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char * line = NULL;
char * comparewith = NULL;
int notduplicate;
size_t len = 0;
ssize_t read;
while ((read = getline(&line, &len, stdin)) != -1) {
/* the first line, or a line that differs from the previous one */
notduplicate = (comparewith == NULL) || (strcmp(line, comparewith) != 0);
if (notduplicate) {
printf ("%s", line); /* getline() keeps the trailing newline */
if (comparewith != NULL)
free(comparewith);
comparewith = line;
line = NULL;
}
}
if (line)
free (line);
if (comparewith)
free (comparewith);
return 0;
}
An important point to note:
getline() is not in the C standard library. It was originally a GNU extension and was standardized in POSIX.1-2008, so this code may not be portable. To make it portable, you'll need to roll your own getline().

Here is a much simpler solution that has no limitation on line length:
#include <stdio.h>
int main(void) {
int c, last1 = 0, last2 = 0;
while ((c = getchar()) != EOF) {
if (c != '\n' || last1 != '\n' || last2 != '\n')
putchar(c);
last2 = last1;
last1 = c;
}
return 0;
}
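The code drops every newline that directly follows two other newlines, so a run of blank lines after a non-empty line is reduced to a single blank line. Note that it only handles blank lines; identical non-empty neighboring lines are left untouched.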

Related

Comparing two files char by char - end of file error in C

I have a specific problem. I have to read strings from two files, compare them char by char, and report in which row they differ and by how many chars. I have set up the file pointers and everything else, but the problem comes at the end of the first file (I only need to read for the length of the shorter file): I am unable to compare the last string, because my for loop runs until '\n', and the last row has no '\n'. On the other hand, if I use '\0' in the for loop, it gives me a wrong result, because it counts '\n' as a char as well. How do I handle this problem? Any suggestions? I don't want to use strcmp(), since I need to count the differing chars, and there are some other conditions to be fulfilled. Here is my code:
while(!feof(fPointer)){
zeichnen = 0;
fgets(singleLine, 150, fPointer);
fgets(singleLine2, 150, fPointer2);
length = strlen(singleLine); // save the length of each row, in order to compare them
length2 = strlen(singleLine2);
if(length <= length2){
for(i=0; singleLine[i] != '\n'; i++){
singleLine[i] = tolower(singleLine[i]);
singleLine2[i] = tolower(singleLine2[i]);
if(singleLine[i] != singleLine2[i]){
zeichnen++;
}
}
printf("Zeile: %d \t Zeichnen: %d\n", zeile, zeichnen); // row, char
for(i=0; i < min(length, length2); i++) {
if ( singleLine[i] == '\n' || singleLine2[i] == '\n')
break;
/* Do work */
}
An alternative is to use the logical AND operator (&&) to test two or more conditions in the conditional part of the for loop.
From fgets manual page:
char *fgets(char *s, int size, FILE *stream);
fgets() returns s on success, and NULL on error or when end of file
occurs while no characters have been read.
So you could have a while(true) loop and check the return values of both fgets() calls for end of input.

Print the 5th letter of each word read from a text file

I wanted to find the 5th letter from every word after reading from the text file. I do not know where I am wrong.
After entering the file path, I am getting a pop up window reading
read.c has stopped working.
The code:
#include<stdio.h>
#include<conio.h>
int main(void){
char fname[30];
char ch[]={'\0'};
int i=0;
FILE *fp;
printf("enter file name with path\n");
gets(fname);
fp=fopen(fname,"r");
if(fp==0)
printf("file doesnot exist\n");
else{
printf("File read successfully\n");
do{
ch[i]=getc(fp);
if(feof(fp)){
printf("end of file");
break;
}
else if(ch[i]=='\n'){
putc(ch[4],stdout);
}
i++;
}while(1);
fclose(fp);
}
return 0;
}
I wanted to find the 5th letter from every word
That's not something your code is doing now. It is wrong for various reasons, like
char ch[]={'\0'}; is an array of length 1. Indexing it with an unbounded ch[i] overruns the allocated memory, which is undefined behaviour.
gets() is very dangerous, it can cause buffer overflow.
getc() reads character-by-character, not word-by-word, so you need to take care of space character (' ') as a delimiter also.
etc.
My suggestion, rewrite your code using the following algorithm.
1. Allocate a buffer long enough to hold a complete line from the file.
2. Open the file, check for success.
3. Read a whole line into the buffer using fgets().
3.1. If fgets() returns NULL, you've most probably reached the end of the file. End.
3.2. Otherwise, continue to the next step.
4. Tokenize the line using strtok(), with space ' ' as the delimiter. Check the returned token against NULL.
4.1. If the token is NULL, go to step 3.
4.2. If the token is not NULL, proceed to the next step.
5. Check strlen() of the returned token (which is the word). If it is more than 4, print the character at index 4 of the token. (Remember, array indexes in C are 0-based.)
6. Continue to step 4.
You can use the following snippet. With this code you need to know the maximum length of a line. It prints the 5th character of every word on every line, provided the word has 5 or more characters. Hope this works for you.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LINE 1000
int main()
{
char fname[30];
char myCurLine[MAX_LINE + 1];
char myCurWord[MAX_LINE + 1];
int index,loop;
FILE *fp;
printf("enter file name with path\n");
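if(fgets(fname, sizeof fname, stdin) == NULL) /* gets() cannot limit the input length */
return 1;
fname[strcspn(fname, "\n")] = '\0'; /* drop the trailing newline */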
fp=fopen(fname,"r");
if(fp==0)
printf("file doesnot exist\n");
else
{
printf("File read successfully\n");
do
{
if(fgets(myCurLine,MAX_LINE,fp) != NULL)
{
index = 0;
for(loop = 0;loop < MAX_LINE; loop++)
{
char c = myCurLine[loop];
/* a space, a newline or the end of the string ends the current word */
if((c == ' ') || (c == '\n') || (c == '\0'))
{
myCurWord[index] = '\0'; /* the delimiter is not part of the word */
if(strlen(myCurWord) > 4)
{
putchar(myCurWord[4]);
}
index = 0;
if(c != ' ')
break; /* end of the line reached */
}
else
{
myCurWord[index] = c;
index++;
}
}
}
if(feof(fp))
{
break;
}
}while(1);
fclose(fp);
}
return 0;
}
I edited your code because of these reasons:
You don't need a char array at all: you can count the characters of each word as you read them (words are separated by spaces or newlines) and print the character at which the count reaches 4 (counting starts at 0).
Since gets() has no overflow protection, fgets() is preferred.
fgets(fname, sizeof(fname), stdin);
Another point is that you can simplify your do-while loop into a single while loop whose condition stops at EOF: your do-while is simply an infinite loop (its condition is always true, 1) that breaks at EOF, which is checked by a separate if inside it.
while (!feof)
An alternative to char array is to loop until a space ' ' or newline '\n' is found.
I also removed the else from if (fp==0) to avoid too many indents.
I also added ctype.h to check if the 5th letter is really a letter using isalpha().
This is how the word's 5th letter search works:
Loop (outer loop) until end-of-file (EOF).
In each iteration of outer loop, loop (inner loop) until a space ' ' or newline '\n' is found.
If the counter in inner loop reaches 4 (which means 5th letter is reached),
print the current letter,
reset counter to zero,
then break the inner loop.
Applying those edits to your code,
#include<stdio.h>
#include<string.h> // for strtok()
#include<ctype.h>
int main(void){
char fname[30];
int ch = 0; // changed from array to a single int, so that EOF can be detected
int i=0;
FILE *fp;
printf("enter file name with path\n");
fgets(fname, sizeof(fname), stdin); // changed from gets()
// removes the \n from the input, or else the file won't be located
// I never encountered a file with newlines in its name.
strtok(fname, "\n");
fp=fopen(fname,"r");
// if file open failed,
// it tells the user that file doesn't exist,
// then ends the program
if (!fp) {
printf("file does not exist\n");
return -1;
}
// loops until end-of-file (or a read error)
while (!feof(fp) && !ferror(fp))
// loops until space, newline or end-of-file, or when 5th letter is found
for (i = 0; (ch=getc(fp)) != EOF && ch != ' ' && ch != '\n'; i++)
// if 5th character is reached and it is a letter
if (i == 4 && isalpha(ch)) {
putc(ch ,stdout);
// resets letter counter, ends the loop
i = 0;
break;
}
fclose(fp);
return 0;
}
*Note: Words with less than 5 letters will not be included in the output, but you can specify a character or number to indicate that a word has less than 5 letters. (such as 0, -1)
sample read.txt:
reading write love
coder heart
stack overflow
output:
enter file name with path
read.txt
iertkf

Counting lines in a file excluding the empty lines in C

We have a program that will take a file as input, and then count the lines in that file, but without counting the empty lines.
There is already a post in Stack Overflow with this question, but the answer to that doesn't cover me.
Let's take a simple example.
File:
I am John\n
I am 22 years old\n
I live in England\n
If the last '\n' didn't exist, then the counting would be easy. We actually already had a function that did this here:
/* Reads a file and returns the number of lines in this file. */
uint32_t countLines(FILE *file) {
uint32_t lines = 0;
int32_t c;
while (EOF != (c = fgetc(file))) {
if (c == '\n') {
++lines;
}
}
/* Reset the file pointer to the start of the file */
rewind(file);
return lines;
}
This function, when taking as input the file above, counted 4 lines. But I only want 3 lines.
I tried to fix this in many ways.
First I tried by doing fgets in every line and comparing that line with the string "\0". If a line was just "\0" with nothing else, then I thought that would solve the problem.
I also tried some other solutions but I can't really find any.
What I basically want is to check the last character in the file (excluding '\0') and checking if it is '\n'. If it is, then subtract 1 from the number of lines it previously counted (with the original function). I don't really know how to do this though. Are there any other easier ways to do this?
I would appreciate any type of help.
Thanks.
You can actually very efficiently amend this issue by keeping track of just the last character as well.
This works because empty lines have the property that the previous character must have been an \n.
/* Reads a file and returns the number of lines in this file. */
uint32_t countLines(FILE *file) {
uint32_t lines = 0;
int32_t c;
int32_t last = '\n';
while (EOF != (c = fgetc(file))) {
if (c == '\n' && last != '\n') {
++lines;
}
last = c;
}
/* Reset the file pointer to the start of the file */
rewind(file);
return lines;
}
Here is a slightly better algorithm.
#include <stdio.h>
// Reads a file and returns the number of lines in it, ignoring empty lines
unsigned int countLines(FILE *file)
{
unsigned int lines = 0;
int c = '\0';
int pc = '\n';
while (c = fgetc(file), c != EOF)
{
if (c == '\n' && pc != '\n')
lines++;
pc = c;
}
if (pc != '\n')
lines++;
return lines;
}
Only the first newline in any sequence of newlines is counted, since all but the first newline indicate blank lines.
Note that if the file does not end with a '\n' newline character, any characters encountered (beyond the last newline) are considered a partial last line. This means that reading a file with no newlines at all returns 1.
Reading an empty file will return 0.
Reading a file ending with a single newline will return 1.
(I removed the rewind() since it is not necessary.)
Firstly, detect lines that only consist of whitespace. So let's create a function to do that.
bool stringIsOnlyWhitespace(const char * line) {
int i;
for (i=0; line[i] != '\0'; ++i)
if (!isspace((unsigned char)line[i])) /* the cast avoids undefined behaviour for negative char values */
return false;
return true;
}
Now that we have a test function, let's build a loop around it.
while (fgets(line, sizeof line, fp)) {
if (! (stringIsOnlyWhitespace(line)))
notemptyline++;
}
printf("\n The number of nonempty lines is: %d\n", notemptyline);
The source is Bill Lynch's answer; I've changed it a little bit.
I think your approach using fgets() is totally fine. Try something like this:
char line[200];
while(fgets(line, 200, file) != NULL) {
if(strlen(line) > 1) { /* more than just the '\n', so the line is not empty */
lines++;
}
}
If you don't know how long the lines in your files can be, you may want to check whether line actually contains a whole line (that is, whether it ends with '\n').
Edit:
Of course this depends on how you define an empty line. If you consider a line that contains only whitespace to be empty, the above code will not work, because strlen() counts whitespace characters as well.

Reading a text file in C, stopping at multiple points, breaking it into sections

I have a program that has a text file that is variable in length. It must be capable of being printed in the terminal. My problem is that if the code is too large, part of it becomes inaccessible due to the limited scroll of terminal. I was thinking of having a command executed by a character to continue the lines after a certain point, allowing the user to see what they needed, and scroll if they needed. However the closest I have come is what you see here, which prints the text file one line at a time as you press enter. This is extremely slow and cumbersome. Is there another solution?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
int main()
{
FILE *audit;
audit = fopen("checkout_audit.txt", "r");
char length_of_code[60000];
int ch;
while ((ch = fgetc(audit)) != EOF)
{
fgets(length_of_code, sizeof length_of_code, audit);
fprintf(stdout, length_of_code, audit);
getch();
if (ferror(audit))
{
printf("This is an error message!");
return 13;
}
}
fclose(audit);
return 0;
}
The libraries are included as I tried various methods. Perhaps there is something obvious I am missing, however after looking around I found nothing that suited my needs in C.
You can keep a count of the lines printed (say num_of_lines), increment it for every line, and wait for a key press only when it reaches some number (say 20 lines) instead of after every line.
Make sure you don't use feof() to control the loop, as already suggested; test the return value of fgets() instead. The snippet below only shows the counting idea.
int num_of_lines = 0;
while(fgets(line, sizeof line, fp) != NULL) // line is a char array
{
fputs(line, stdout);
num_of_lines++;
if(num_of_lines == 20)
{
num_of_lines = 0;
getch();
}
}
Putting the same thing in your code:
int main()
{
FILE *audit;
audit = fopen("checkout_audit.txt", "r");
char length_of_code[60000];
int num_of_lines = 0;
int ch;
while (fgets(length_of_code, sizeof length_of_code, audit) != NULL)
{
fprintf(stdout, "%s", length_of_code); /* never pass file contents as the format string */
if (ferror(audit))
{
printf("This is an error message!");
return 13;
}
num_of_lines++;
if(num_of_lines == 20)
{
num_of_lines = 0;
getch();
}
}
fclose(audit);
return 0;
}
From the man page of fgets()
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s.
Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte is stored after the last character in the buffer.
So char length_of_code[60000]; is not a good choice.
A much smaller array is enough; a size such as 80, a traditional terminal width, covers most lines, and longer lines are simply read in several pieces.
Also, as fgets() fetches the input line by line, you have to output it line by line until EOF.
EDIT:
1. The 2nd argument to fprintf should be the format string, not the text to print
2. The 3rd argument should be the string, not the file pointer
fprintf(stdout, "%s", length_of_code);
Code Snippet:
while (fgets(length_of_code, sizeof(length_of_code), audit))
{
fprintf(stdout, "%s", length_of_code);
getch();
if (ferror(audit))
{
printf("This is an error message!");
return 13;
}
}

C - Malloc issue (maybe something else)

Update edition:
So, I'm trying to get this code to work without using scanf/fgets. Gets chars from the user, puts it into a pointer array using a while loop nested in a for loop.
#define WORDLENGTH 15
#define MAXLINE 1000
int main()
{
char *line[MAXLINE];
int i = 0;
int j;
int n;
char c;
for (n=0; c!=EOF; n){
char *tmp = (char *) malloc(256);
while ((c=getchar())!=' '){
tmp[i]=c; // This is no longer updating for some reason.
i++;
}
line[n++]=tmp; //
i=0;
printf("\n%s\n",line[n]); //Seg fault here
}
for(j = 0; j < n; j++){
printf("\n%s\n", line[j]);
free (line[j]);
}
return 0;
So, now I'm getting a seg fault. Not sure why tmp[i] is not updating properly. Still working on it.
I've never learned this much about programming during the entire semester so far. Please keep helping me learn. I'm loving it.
You print line[i] and just before that, you set i to 0. Print line[n] instead.
Also, you forgot the terminating 0 character. And your code will become easier if you make tmp a char array and then strdup before assigning to line[n].
sizeof(WORLDLENGTH), for one, is wrong. malloc takes an integer, and WORLDLENGTH is an integer. sizeof(WORLDLENGTH) will give you the size of an integer, which is 4 if you compile for a 32-bit system, so you're allocating 4 bytes.
Btw - while ((c=getchar())!=' '||c!=EOF) - what's your intent here? A condition like (a!=b || a!=c) will always return true if b!=c because there is no way a can be both b and c.
And, as others pointed out, you're printing out line[i], where i is always 0. You probably meant line[n]. And you don't terminate the tmp string.
And there's no overflow checking, so you'll run into evil bugs if a word is longer than WORDLENGTH.
Others have already told you some specific problems with your code but one thing they seem to have missed is that c should be an int, not a char. Otherwise the comparison to EOF will not work as expected.
In addition, the segfault you're getting is because of this sequence:
line[n++]=tmp;
printf("\n%s\n",line[n]);
You have already incremented n to the next array element then you try to print it. That second line should be:
printf("\n%s\n",line[n-1]);
If you just want some code that works (with a free "do what you darn well want to" licence), here's a useful snippet from my code library.
I'm not sure why you think fgets is to be avoided, it's actually very handy and very safe. I'm assuming you meant gets which is less handy and totally unsafe. Your code is also prone to buffer overruns as well, since it will happily write beyond the end of your allocated area if it gets a lot of characters that are neither space nor end of file.
By all means, write your own code if you're educating yourself but part of that should be examining production-tested bullet-proof code to see how it can be done. And, if you're not educating yourself, you're doing yourself a disservice by not using freely available code.
The snippet follows:
#include <stdio.h>
#include <string.h>
#define OK 0
#define NO_INPUT 1
#define TOO_LONG 2
static int getLine (char *prmpt, char *buff, size_t sz) {
int ch, extra;
// Get line with buffer overrun protection.
if (prmpt != NULL) {
printf ("%s", prmpt);
fflush (stdout);
}
if (fgets (buff, sz, stdin) == NULL)
return NO_INPUT;
// If it was too long, there'll be no newline. In that case, we flush
// to end of line so that excess doesn't affect the next call.
if (buff[strlen(buff)-1] != '\n') {
extra = 0;
while (((ch = getchar()) != '\n') && (ch != EOF))
extra = 1;
return (extra == 1) ? TOO_LONG : OK;
}
// Otherwise remove newline and give string back to caller.
buff[strlen(buff)-1] = '\0';
return OK;
}
// Test program for getLine().
int main (void) {
int rc;
char buff[10];
rc = getLine ("Enter string> ", buff, sizeof(buff));
if (rc == NO_INPUT) {
printf ("No input\n");
return 1;
}
if (rc == TOO_LONG) {
printf ("Input too long\n");
return 1;
}
printf ("OK [%s]\n", buff);
return 0;
}
It's a useful line input function that has the same buffer overflow protection as fgets and can also detect lines entered by the user that are too long. It also throws away the rest of the too-long line so that it doesn't affect the next input operation.
Sample runs with 'hello', CTRL-D, and a string that's too big:
pax> ./qq
Enter string> hello
OK [hello]
pax> ./qq
Enter string>
No input
pax> ./qq
Enter string> dfgdfgjdjgdfhggh
Input too long
pax> _
For what it's worth (and don't hand this in as your own work since you'll almost certainly be caught out for plagiarism - any half-decent educator will search for your code on the net as the first thing they do), this is how I'd approach it.
#include <stdio.h>
#include <stdlib.h>
#define WORDLENGTH 15
#define MAXWORDS 1000
int main (void) {
char *line[MAXWORDS];
int numwords = 0; // Use decent variable names.
int chr, i;
// Code to run until end of file.
for (chr = getchar(); chr != EOF;) { // First char.
// This bit gets a word.
char *tmp = malloc(WORDLENGTH + 1); // Allocate space for word/NUL
i = 0;
while ((chr != ' ') && (chr != EOF)) { // Read until space/EOF
if (i < WORDLENGTH) { // If space left in word,
tmp[i++] = chr; // add it
tmp[i] = '\0'; // and null-terminate.
}
chr = getchar(); // Get next character.
}
line[numwords++] = tmp; // Store.
// This bit skips space at end of word.
while ((chr == ' ') && (chr != EOF)) {
chr = getchar();
}
}
// Now we have all our words, print them.
for (i = 0; i < numwords; i++){
printf ("%s\n", line[i]);
free (line[i]);
}
return 0;
}
I suggest you read that and study the comments so that you know how it's working. Feel free to ask any questions in the comments section and I'll answer or clarify.
Here's a sample run:
pax$ echo 'hello my name is pax andthisisaverylongword here' | ./testprog
hello
my
name
is
pax
andthisisaveryl
here
Change your printf line - you need to print line[n] rather than line[i].
First, your malloc formula is wrong. It should be:
malloc(sizeof(char)*WORDLENGTH);
You need to allocate the size of a char enough times for the length of your word (plus 1 if you also need room for the terminating '\0'). Also, 15 seems a bit small: you're not accounting for the longest word in the dictionary or the "iforgettoputspacesinmyphrasestoscrewtheprogrammer" cases.
Don't be shy, char is small; you can go to 256 or 512 easily.
Also,
printf("\n%s\n",line[i]);
needs to be changed to
int j = 0;
for(j=0;j<n;j++){ /* n is the number of words stored so far */
printf("\n%s\n",line[j]);
}
Your i never changes, so you always print the same line.
