Calculate the mean length of a line using C

I have a text file that I have filled with a number of lines from different texts, so the lines have different lengths.
What I want to do is calculate the average number of characters per line, which matters to me in my job. I wrote the following code in C to achieve this. However, I cannot run the program once it is compiled.
#include<stdio.h>
#include<stdlib.h>
#define LENGTH 10000000
int main()
{
char c;
int i;
int line_length;
int j;
int char_count;
char char_a[LENGTH];
int line_a[LENGTH];
int line_count;
long sum;
float avg_char_count;
FILE *fp=fopen("input.txt","r");
if(!fp){
fprintf(stderr,"cannot open file");
exit(1);
}
/*read into file*/
i=0;
sum=0;
while(char_a[i++]=fgetc(fp))
sum++;
printf("chars count: %d \n",sum);
/*process array*/
char_count=i;
j=0;
line_count=0;
while(j++<char_count){
if(char_a[j]=='\n'){
sum--;
line_count++;
}
}
/* calculate the average*/
avg_char_count=sum/(float)line_count;
printf("\naverage # of chars in a line is: %f\n ",avg_char_count);
return EXIT_SUCCESS;
}
By the way, I am using the Borland C++ command-line tool BCC32, running on Windows 7 SP1.
What's wrong with my code?

Try declaring char_a and line_a as pointers to char and int as:
char *char_a;
int *line_a;
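And then allocate the memory dynamically using malloc, checking that it succeeded (line_a is never used in your program, so you could also drop it entirely):
char_a=malloc(LENGTH*sizeof *char_a);
line_a=malloc(LENGTH*sizeof *line_a);
if(char_a==NULL || line_a==NULL){
    fprintf(stderr,"out of memory\n");
    exit(1);
}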
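Secondly, your while loop should end when you reach the end of the file, i.e. when fgetc() returns EOF. Store the result in an int first: a char cannot reliably tell EOF apart from a real byte, and your current condition also stops at the first '\0' byte. Also stop before you run past the end of the buffer:
int ch; /* declare this with your other variables */
while(i<LENGTH && (ch=fgetc(fp))!=EOF){
    char_a[i++]=(char)ch;
    sum++;
}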
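And line_count has to account for a last line that does not end in '\n': when there is no '\n' in the text file, there can still be one line, and if there is one '\n' followed by more text, there are two lines (say, you are in line 1, then you hit Enter, which is '\n', and you get to the new line). Simply initializing line_count to 1 instead of 0 handles that case, but it counts one line too many when the file ends with '\n', as most text files do, so add the extra line only when the last character is not '\n'. Note also that j++<char_count increments j before it is used as an index, so char_a[0] is never examined and char_a[char_count], one past the data, is:
/*process array*/
char_count=sum; /* number of characters actually read */
line_count=0;
for(j=0;j<char_count;j++){ /* indices 0 .. char_count-1 */
    if(char_a[j]=='\n'){
        sum--; /* don't count the newline itself */
        line_count++;
    }
}
if(char_count>0 && char_a[char_count-1]!='\n')
    line_count++; /* last line has no trailing '\n' */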
NOTE: The "chars count" line currently prints a total that includes the newlines ('\n'). Print it at the end instead, because by then you have already excluded the newlines by decrementing sum in the if statement of the second loop. Also, sum is a long, so print it with %ld rather than %d.

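The most probable cause is that you allocate about 50 MB of local variables on the stack (10,000,000 bytes for char_a plus 10,000,000 ints for line_a, which is 40 MB with 4-byte ints). That is far more than the default stack size, so the program most likely crashes as soon as main is entered.
I would change the program so that it reads the file one line at a time (or even one character at a time).
That way you only need to allocate space for one line and not for the entire file.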

Related

A program that prints the length of a line in C

Here's the program:
#include <stdio.h>
#define BUF_LEN 200
#define LINE_NUMBER 3
int line_len(char* filename, int n)
{
FILE* f;
char buf[BUF_LEN];
int j, i = 0;
if ((f = fopen(filename, "r")))
{
for (j = 0; j < n; j++)
fgets(buf, BUF_LEN, f);
for (i = 0; buf[i]; i++) /* find end of buf */ ;
fclose(f);
}
return i;
}
int main()
{
printf("%d\n", line_len("test.txt", LINE_NUMBER));
return 0;
}
From what I understand, the function line_len receives the name of the file and the number of the line we are interested in. It then opens the file in read-only mode and iterates until it reaches line n, in each iteration reading BUF_LEN-1 characters from the file f and storing them in buf. So when the first for loop finishes, buf will contain all the characters of the first n lines.
I do not understand the need for the second loop. When does it terminate?
How does this function work? If at the end of the first for loop buf will contain the characters of the first n lines, then how come this function returns the length of the line n?
Thanks in advance!
I do not understand the need for the second loop. When does it terminate?
The second loop has buf[i] as its loop condition. It keeps executing as long as buf[i] is true, i.e. nonzero. So when it reaches the null character ('\0') that fgets() places after the characters it read, the loop terminates.
How does this function work?
Simplistically, it reads n lines. Each line is put into the buffer, overwriting the previous line. After it has read n lines, it counts the characters that the nth read left in the buffer.
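It's a poor piece of code, though. There's no error checking on the fgets calls, and if the nth line has more than 199 characters in it, it will give the wrong answer; any earlier line longer than 198 characters is read in more than one piece, which throws off the line count as well. In fact, if you consider the length of a line to exclude the line feed, it gets the wrong answer for every line that ends in one, because fgets keeps the '\n' in buf.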
It also returns zero if it was unable to open the file. If an error occurs reading a line, or if the file is empty or n is less than 1, it returns an indeterminate number, because buf may never have been filled. And if it runs off the end of the file (the file has fewer than n lines), the length of the last line will be returned.
An error result would be better in those cases.

Reading first 5 characters from a file using fread function in C

How do I read the first 5 to 10 characters from a sample .txt file using the fread function?
I have the following code:
#include <stdio.h>
main()
{
char ch,fname[20];
FILE *fp;
printf("enter the name of the file:\t");
gets(fname);
fp=fopen(fname,"r");
while(fread(&ch,1,1,fp)!=0)
fwrite(&ch,1,1,stdout);
fclose(fp);
}
When I enter any sample file name, it prints all the data in the file.
My question is how to print only the first 5 to 10 characters from the sample file.
Your while loop runs until fread reaches the end of the file (returns 0 for the first time).
You will want to change the condition by using a for loop or a counter,
e.g. (these are suggestions, not the full working code):
int counter = 10;
while(counter-- > 0 && fread(&ch,1,1,fp)==1) /* test the counter first: exactly 10 bytes are read and written */
fwrite(&ch,1,1,stdout);
or
int i;
for(i=0; i < 10 && fread(&ch,1,1,fp) > 0 ; i++)
fwrite(&ch,1,1,stdout);
Good luck!
P.S.
To answer your question in the comments, fread reads data in units of a given size ("elements") and returns the number of complete elements it read. If only part of an element is available at the end of the file, that element is not counted (and its contents are indeterminate).
A single byte is the smallest unit (size 1), and you are reading one such unit; this is the 1,1 part in fread(&ch,1,1,fp).
You could read 10 units into a buffer of at least 10 chars using char buf[10]; fread(buf,1,10,fp); (not into &ch, which has room for only one char), or read all the bytes required for a single binary int (this won't be portable; it's just a demo) using int i; fread(&i,sizeof(int),1,fp);
Here is a modified version of your code. Check the comments on the lines that were modified:
#include <stdio.h>
#define N_CHARS 10 // define the desired buffer size once for code maintainability
int main(void) // main function should return int
{
    char ch[N_CHARS + 1], fname[20]; // a buffer for N_CHARS chars plus the null terminator
    FILE *fp;
    printf("enter the name of the file:\t");
    if (scanf("%19s", fname) != 1) // at most 19 chars: fname also needs room for the '\0'
        return 1;
    fp = fopen(fname, "r");
    if (fp == NULL) { // fopen returns NULL if the file can't be opened
        perror(fname);
        return 1;
    }
    if (fread(ch, 1, N_CHARS, fp) == N_CHARS) { // check that the desired number of chars was read
        ch[N_CHARS] = '\0'; // null terminate before printing
        puts(ch); // print a string to stdout and a line feed after
    }
    fclose(fp);
    return 0;
}

Reading line by line in C

I have a txt file with some file names and their sizes.
This is how I wrote the txt file:
banana //file name
3 //the size of file banana
programs
12
music
524
I have to find a file name entered from the keyboard and display its size.
This is my code:
FILE *text;
text=fopen("text.txt","r");
printf("Scan the number of letters of your file name");
int n;
scanf("%d",&n);
char s[++n];
printf("Scan the file name you are looking for: ");
int i;
for(i=0;i<=n;i++)
{
scanf("%c",&s[i]);
}
int l=0;
char c[n];
char g;
while(!feof(text))
{
if(l%2==1) {fgetc(text); fgetc(text); l++;}
if(l%2==0)
{
fgets(c,n,text);
fgetc(text);
for(i=0;i<n;i++)
{
printf("%c",c[i]);
}
l++;
}
}
Obviously, it's not correct. Can you help me? I'm a little bit confused.
Ugh! Please learn more about basic input. Your program has various flaws:
fgetc reads single characters. This can be useful at times, but obviously you want to read whole lines. fgets does this. You use it once, but it is not advisable to mix these. Decide up front which input paradigm you want to use: char-wise (fgetc), line-wise (fgets) or token-wise (fscanf).
Please don't make the user enter the number of characters in the filename. Quick, how many characters are there in MySpiffyDocument.txt? That's work that the computer should do.
Don't use feof to control your input. All input functions have special return values that indicate that either the end of the file was reached or an error occurred. For fgets, this return value is NULL; for fgetc, it is the special constant EOF. The functions feof and ferror are useful after you have encountered these special return values, for a post-mortem analysis of the two end conditions.
Your inner loop, which is responsible for the core program logic, doesn't make sense at all. For example, for an odd l, you increment l and then test for an even l, which will be true because you have just incremented an odd l. Use else in such cases. And don't place things that happen anyway in conditional blocks: increment l once, after the if/else blocks.
Here's an example implementation:
#include <stdlib.h>
#include <stdio.h>
int process(const char *filename)
{
char line[80];
char name[80];
int size;
int count = 0;
FILE *f = fopen(filename, "r");
if (f == NULL) return -1;
while (fgets(line, sizeof(line), f)) {
if (count % 2 == 0) {
if (sscanf(line, "%s", name) < 1) continue;
} else {
if (sscanf(line, "%d", &size) < 1) continue;
printf("%12d %s\n", size, name);
}
count++;
}
fclose(f);
return 0;
}
int main()
{
char line[80];
char name[80];
puts("Please enter filename:");
while (fgets(line, sizeof(line), stdin)) {
if (sscanf(line, "%s", name) == 1) {
process(name);
break;
}
}
return 0;
}
Things to note:
The program uses 80 characters as the maximum buffer size; that means your lines can be up to 78 characters long: line content plus new-line '\n' plus null terminator '\0'. That should be okay for many cases, but a longer line will be split across several fgets calls, which throws off the odd/even count. (So your file-name letter count has some merit, but the real solution here is to allocate memory dynamically. I won't open that can of worms now.)
The code uses a double strategy: Read lines first, then scan into these lines with sscanf, so that only the first word on each line is read.
Empty lines are skipped. Lines that don't hold a valid number where a size is expected are skipped, too. This is sloppy error handling and may trip the odd/even count.
Reading stuff interactively from the keyboard isn't very easy in C. The awkward fgets/sscanf construct in main tries to handle the cases where the user enters an empty line or signals end of file with Ctrl-D (Unix) or Ctrl-Z (Windows). A better and easier way is to pass the file name as a command-line argument and read it via argc and argv.
I've moved the file reading into a separate function.

File I/O in C Segmentation Fault

I know what a segmentation fault is; I don't need to know its definition :)
I just need to know where it's coming from in my code. This program is meant to get words as input, read from a text file, write to a separate text file and then print all the words from the read file and the input.
#include<stdio.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char*argv[]){
FILE *read;
FILE *write;
char **list=malloc(sizeof(char*));
char **oldlist=malloc(sizeof(char*));
char *oldword=malloc(sizeof(char));
char exit[]="end";
int a, c, r=0, w=0, n=0, z= 0, y=0, d=0, g=0;
//check arg
for(a=0; a<argc; a++){
if(strcmp(argv[a], "-r")==0){
r =1;
read=fopen("read.txt", "r");
}
else if(strcmp(argv[a], "-w")==0){
w =1;
write=fopen("write.txt", "w");
}
}
if(r==0 && w==0){
printf("Error: Invalid Command.\n");
}
printf("Read = %d | Write = %d\n", r, w);
//getwords
printf("Enter your words: ");
while(1){
char *word=malloc(sizeof(char));
list=realloc(list, sizeof(char*)*(z+10));
word=realloc(word, sizeof(char)*(z+10));
scanf("%s", word);
if (strcmp(word,exit)==0){
break;
}
else{
*(list+z) = word;
z++;
}
}
//read
if (r==1){
do{
while(1){
*(oldword+d)=fgetc(read);
d++;
}
}while(feof(read) != 0);
}
*(oldword+(d-1))="\0";
printf("Your words are:\n");
puts(oldword);
for(c=0; c<n; c++){
puts(*(list+c));
}
//write
if (w ==1){
if(w==1){
fputs(*(oldlist+c),write);
}
for(c=0; c<n; c++){
fputs(*(list+c),write);
}
}
//end
free(list);
fclose(read);
fclose(write);
return 0;
}
You allocate 1 byte for the word:
char *word=malloc(sizeof(char));
You then read a string into it. This is a buffer overflow, and leads to great unhappiness (and questions like this on Stack Overflow). Specifically, reading long words will trample over control information in the 'heap' (the data space controlled by malloc() et al) and scramble data that is used to determine which space is available for use and which is not. What constitutes a 'long word' depends on the system; technically, any string except the empty string (just a terminal '\0') is too long, but you might get away with blue murder if the words are 7 bytes or less.
But shouldn't the realloc take care of the memory problem? I'm trying to make the string input unlimited.
Hmmm...it's a little odd as a way of doing business, but the realloc() of word before you actually use it gets you around some of the problem.
Specifically, you can read up to and including 9 characters before you overflow the memory allocation of word on the first time around the loop. However, that is a long way from making it 'unlimited', and making it unlimited is non-trivial. One issue is that %s stops scanning at a white-space character; that works in your favour, on the whole. Another is that you seem to be using z both as a count of the number of strings you've entered and as a length for the strings you can enter, so the space available for each new word depends on how many words came before it, not on how long the word is.
You can still run into various problems. Your handling of oldword doesn't do the extra realloc(), for example; it just allocates one byte. Then you have an infinite loop that is completely unbounded (if it is entered at all). This loop is a disaster:
while(1){
*(oldword+d)=fgetc(read);
d++;
}
If you compile with debugging on (the -g flag) and run under Valgrind, it should give you a pretty good indication of why it seg faults.

Trouble using fread in C

I'm having some trouble figuring out how to properly format fread statements. The code below is just some random stuff I'm practicing with. Basically, it fills information into the first array (s), writes s to a file, and then reads the file into the second array (s2). However, I can't seem to get the fread statement formatted in a way that doesn't give an error or return garbage. The arrays are of type char because, if my understanding is correct, char uses less memory than other data types. The eventual application of this practice code is a data compression project.
#include<stdio.h>
#include<string.h>
FILE *fp;
//file pointer
char s[56];
//first string
char s2[56];
//target string for the fread
int n=0;
//counting variable
int m=0;
int main (void)
{
fp=fopen("test.bin", "w+");
//open a file for reading and writing
strcpy(s, "101010001101010");
//input for the string
for(n=0;n<56;n++)
{
if(s[n]==1)
m=n;
else if(s[n]==0)
m=n;
}
printf("%d\n", m);
//the above for loop finds how many elements in 's' are filled with 1's and 0's
for(n=0;n<m;n++)
{
printf("%c", s[n]);
}
//for loop to print 's'
fwrite(s, m, 1, fp);
//writes 's' to the first file
s2=fread(&s2, m, 1, fp);
//an attempt to use fread...
printf("\n\ns2\n\n");
for(n=0;n<m;n++)
{
printf("%c", s2[n]);
}
printf("\n");
//for loop to print 's2'
fclose(fp);
printf("\n\n");
printf("press any number to close program\n");
scanf("%d", &m);
}
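A FILE stream has an implicit position within the file, and you read and write at that position. If you want to read what you have written, you need to move the position back to the beginning of the file, for example with fseek(fp, 0, SEEK_SET) or rewind(fp). In fact, for a file opened for both reading and writing, the C standard requires a call to fflush() or to a file-positioning function such as fseek() when you switch from writing to reading, and a file-positioning function when you switch from reading to writing (unless the read hit the end of the file).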
The return value of the fread function is of type size_t. It is the number of elements successfully read. (reference: http://www.cplusplus.com/reference/cstdio/fread/)
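Don't assign it to s2: an array can't be assigned to in C, so that line won't even compile. Simply call fread(s2, m, 1, fp); and, if you want to know whether it worked, check that it returns 1.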
