Finding line size of each row in a text file - c

How can you count the number of characters or numbers in each line? Is there something like EOF, but for the end of a line?

You can iterate through each character in the line and keep incrementing a counter until the end-of-line ('\n') is encountered. Make sure to open the file in text mode ("r") and not binary mode ("rb"). Otherwise the stream won't automatically convert different platforms' line ending sequences into '\n' characters.
Here is an example:
int charcount( FILE *const fin )
{
int c, count;
count = 0;
for( ;; )
{
c = fgetc( fin );
if( c == EOF || c == '\n' )
break;
++count;
}
return count;
}
Here's an example program to test the above function:
#include <stdio.h>
int main( int argc, char **argv )
{
FILE *fin;
fin = fopen( "test.txt", "r" );
if( fin == NULL )
return 1;
printf( "Character count: %d.\n", charcount( fin ) );
fclose( fin );
return 0;
}
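The function above measures a single line, while the question asks about each line. A minimal sketch of one way to extend it follows; `line_lengths` and its parameters are hypothetical names, not part of the original answer, and the caller is assumed to supply an array large enough for the lines it cares about.

```c
#include <stdio.h>

/* Hypothetical helper: stores the length of each line of fin in lengths[],
 * up to max entries, and returns the total number of lines found. */
size_t line_lengths(FILE *fin, size_t *lengths, size_t max)
{
    size_t lines = 0, count = 0;
    int c;

    while ((c = fgetc(fin)) != EOF) {
        if (c == '\n') {
            if (lines < max)
                lengths[lines] = count;
            ++lines;
            count = 0;
        } else {
            ++count;
        }
    }
    /* A final line without a trailing '\n' still counts as a line. */
    if (count > 0) {
        if (lines < max)
            lengths[lines] = count;
        ++lines;
    }
    return lines;
}
```

Because the return value is the total line count even when it exceeds max, a caller can detect that the array was too small and retry with a larger one.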

Regarding reading a file line by line, look at fgets.
char *fgets(char *restrict s, int n, FILE *restrict stream);
The fgets() function shall read bytes
from stream into the array pointed to
by s, until n-1 bytes are read, or a
<newline> is read and transferred to
s, or an end-of-file condition is
encountered. The string is then
terminated with a null byte.
The only problem here may be if you can't guarantee a maximum line size in your file. If that is the case, you can iterate over characters until you see a line feed.
Regarding end of line:
Short answer: \n is the newline character (also called a line feed).
Long answer, from Wikipedia:
Systems based on ASCII or a compatible
character set use either LF (Line
feed, 0x0A, 10 in decimal) or CR
(Carriage return, 0x0D, 13 in decimal)
individually, or CR followed by LF
(CR+LF, 0x0D 0x0A); see below for the
historical reason for the CR+LF
convention. These characters are based
on printer commands: The line feed
indicated that one line of paper
should feed out of the printer, and a
carriage return indicated that the
printer carriage should return to the
beginning of the current line.
* LF: Multics, Unix and Unix-like systems (GNU/Linux, AIX, Xenix, Mac OS X, FreeBSD, etc.), BeOS, Amiga, RISC OS, and others
* CR+LF: DEC RT-11 and most other early non-Unix, non-IBM OSes, CP/M, MP/M, DOS, OS/2, Microsoft Windows, Symbian OS
* CR: Commodore 8-bit machines, Apple II family, Mac OS up to version 9 and OS-9
But since you are not likely to be working with a representation that uses carriage return only, looking for a line feed should be fine.

If you open a file in text mode, i.e., without a b in the second argument to fopen(), you can read characters one-by-one until you hit a '\n' to determine the line size. The underlying system should take care of translating the end of line terminators to just one character, '\n'. The last line of a text file, on some systems, may not end with a '\n', so that is a special case.
Pseudocode:
count := 0
c := next()
while c != EOF and c != '\n'
    count := count + 1
    c := next()
The above will count the number of characters in a given line. next() is a function to return the next character from your file.
Alternatively, you can use fgets() with a buffer:
char buf[SIZE];
count = 0;
while (fgets(buf, sizeof buf, fp) != NULL) {
/* see if the string represented by buf has a '\n' in it,
if yes, add the index of that '\n' to count, and that's
the number of characters on that line, which you can
return to the caller. If not, add sizeof buf - 1 to count */
}
/* If count is non-zero here, the last line ended without a newline */
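The comment in the loop above can be filled in roughly as follows. This is only a sketch: `next_line_length` is a hypothetical name, the 16-byte buffer is an arbitrary choice, and it adds strlen(buf) rather than sizeof buf - 1 for chunks without a newline, so that a short final chunk at end-of-file is still counted correctly.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp in small chunks and stores its
 * length (without the '\n') in *len. Returns 1 if a line was read, 0 at
 * end-of-file. */
int next_line_length(FILE *fp, size_t *len)
{
    char buf[16];   /* deliberately small; any size >= 2 works */
    size_t count = 0;
    int got_any = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        char *nl = strchr(buf, '\n');
        got_any = 1;
        if (nl != NULL) {
            count += (size_t)(nl - buf);  /* index of '\n' in this chunk */
            break;
        }
        count += strlen(buf);             /* partial chunk, keep reading */
    }
    *len = count;
    return got_any;
}
```

Calling it in a loop until it returns 0 yields the length of every line, including a last line that has no terminating newline.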

The original question was how to get the number of characters in "each line" (given a line? or the current line?), while the answers have mostly given solutions how to determine the length of the first line in a file. One can easily apply some of them to determine length of current line (without guessing beforehand maximum length for a buffer).
However, what one often needs in practice is the maximum length of any line in a file. Then one can reserve a buffer and use fgets to read the file line by line and use some nice functions (strtok, strtod etc.) to parse lines. In practice, you can use any of the previous solutions to determine length of one line, and just scan through all lines and take the maximum.
An easy script that reads the file character by character:
max=0; i=0;
do
if ((c=fgetc(f))!= EOF && c!='\n') i++;
else {
if (i>max) max=i;
i=0;
}
while (c!=EOF);
return max;
Note: In practice, it would suffice to have an upper bound for the maximum length. A dirty solution would be to use the file size as an upper bound for the maximum length of lines.

\n is the newline character in C. In other languages, such as C#, you may use something like C#'s Environment.NewLine to overcome platform difficulties.
If you already know that your string is one line (let's call it line), use strlen(line) to get the number of characters in it. Subtract 1 if it ends with the '\n'.
If the string has new line characters in it, you'll need to split it around the new line characters and then call strlen() on each substring.
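For the multi-line case, one possible way to do the splitting without modifying the string is to walk it with strcspn(). The sketch below assumes a hypothetical helper name, `print_piece_lengths`, and simply prints each length.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints the length of every '\n'-separated piece of
 * text and returns how many pieces it found. */
size_t print_piece_lengths(const char *text)
{
    size_t pieces = 0;

    while (*text != '\0') {
        size_t len = strcspn(text, "\n");  /* chars before '\n' or '\0' */
        printf("%zu\n", len);
        ++pieces;
        text += len;
        if (*text == '\n')
            ++text;                        /* step over the separator */
    }
    return pieces;
}
```

Unlike strtok(), this leaves the input untouched and reports empty lines as pieces of length 0.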

Here is a Simple Algorithm :
You require
File Stream (FILE),
Line Number , which you want size of (int)
Returns
Total Characters in given line
Function :
#include <stdio.h>
#include <string.h>
int getLengthOfLine(FILE* df,int Ofline){
int cchar; /* int, not char, so that EOF can be told apart from valid characters */
int line=1;
int total =1;
int atLine=0;
int afterLine=0;
while ((cchar=fgetc(df))!=EOF)
{
if (feof(df)){
break ;
}
if (cchar == '\n' || cchar == '\0'){
if(line==Ofline){
// printf(" before %d ",total);
atLine = total;
}
if(line==(Ofline+1)){
// printf(" after %d ",total);
afterLine = total-atLine;
}
// printf(" line is %d ",line);
line++;
}
total++;
}
fseek(df, 0L, SEEK_SET);
if(afterLine==0){
return (total-atLine-1);
}
else
{
return (afterLine-1);
}
}
Uses :
FILE* fp = fopen("path-to-file" , "r");
if(fp!=NULL){
printf(" %d",getLengthOfLine(fp,5));
}

Related

Is there a quick way to get the last element that was put in an array?

I use fgets to read a line from stdin and save it in a char array. I would like to get the last letter of the line I wrote, which should be in the array before \n and \0.
For example if i have a char line[10] and write on the terminal 1stLine, is there a fast way to get the letter e rather than just cycling to it?
I saw this post How do I print the last element of an array in c but I think it doesn't work for me, even if I just create the array without filling it with fgets , sizeof line is already 10 because the array already has something in it
I know it's not java and I can't just .giveMeLastItem(), but I wonder if there is a smarter way than to cycle until the char before the \n to get the last letter I wrote
code is something like
char command[6];
fgets(command,6,stdin);
If you know the sentinel value, ex: \0 (or \n ,or any value for that matter), and you want the value of the element immediately preceding to that, you can
use strchr() to find out the position of the sentinel and
get the address of retPtr-1 and dereference to get the value you want.
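A minimal sketch of those two steps, with a guard for the cases where the sentinel is missing or is the very first character. The name `char_before` is hypothetical and not from the answer.

```c
#include <string.h>

/* Hypothetical helper: returns the character just before the first
 * occurrence of sentinel in s, or 0 if there is no such character. */
char char_before(const char *s, int sentinel)
{
    const char *p = strchr(s, sentinel);

    if (p == NULL || p == s)
        return 0;   /* sentinel missing, or nothing precedes it */
    return p[-1];
}
```

With a line read by fgets(), char_before(line, '\n') gives the last typed character; passing '\0' as the sentinel also works, because strchr() treats the terminator as part of the string.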
There are many different ways to inspect the line read by fgets():
first you should check the return value of fgets(): a return value of NULL means either the end of file was reached or some sort of error occurred and the contents of the target array is undefined. It is also advisable to use a longer array.
char command[80];
if (fgets(command, sizeof command, stdin) == NULL) {
// end of file or read error
return -1;
}
you can count the number of characters with len = strlen(command) and if this length is not zero(*), command[len - 1] is the last character read from the file, which should be a '\n' if the whole line, newline included, fit in the array. Stripping the newline requires a test:
size_t len = strlen(command);
if (len > 0 && command[len - 1] == '\n')
command[--len] = '\0';
you can use strchr() to locate the newline, if present, with char *p = strchr(command, '\n'); If a newline is present, you can strip it this way:
char *p = strchr(command, '\n');
if (p != NULL)
*p = '\0';
you can also count the number of characters not in the set "\n" with pos = strcspn(command, "\n"). pos will be the index of the newline or of the null terminator. Hence you can strip the trailing newline with:
command[strcspn(command, "\n")] = '\0'; // strip the newline if any
you can also write a simple loop:
char *p = command;
while (*p && *p != '\n')
p++;
*p = '\0'; // strip the newline if any
(*) strlen(command) can return 0 if the file contains an embedded null character at the beginning of a line. The null byte is treated like an ordinary character by fgets(), which continues reading bytes into the array until either size - 1 bytes have been read or a newline has been read.
Once you have only the array, there is no other way to do this. You could use strlen(line) and then get the last characters position based on this index, but this basically does exactly the same (loop over the array).
char lastChar = line[strlen(line)-1];
This has time-complexity of O(n), where n is the input length.
You can change the input method to a char by char input and count the length or store the last input. Every O(1) method like this uses O(n) time before (like n times O(1) for every character you read). But unless you really have to optimize for speed (and you don't, when you work with user input), you should just loop over the array by using a function like strlen(line) (and store the result, when you use it multiple times).
EDIT:
The strchr() function Sourav Ghosh mentioned, does exactly the same, but you can/must specify the termination character.
A straightforward approach can look the following way
char last_letter = command[ strcspn( command, "\n" ) - 1 ];
provided that the string is not empty or contains just the new line character '\n'.
Here is a demonstrative program.
#include <stdio.h>
#include <string.h>
int main(void)
{
enum { N = 10 };
char command[N];
while ( fgets( command, N, stdin ) && command[0] != '\n' )
{
char last_letter = command[ strcspn( command, "\n" ) - 1 ];
printf( "%c ", last_letter );
}
putchar( '\n' );
return 0;
}
If you enter the following sequence of strings
Is
there
a
quick
way
to
get
the
last
element
that
was
put
in
an
array?
then the output will be
s e a k y o t e t t t s t n n ?
The fastest way is to keep an array of references like this:
long ref[]
and ref[x] to contain the file offset of the last character of the xth line. Having this reference saved at the beginning of the file you will do something like:
fseek(n*sizeof(long))
long ref = read_long()
fseek(ref)
read_char()
I think this is the fastest way to read the last character at the end of the nth line.
I did a quick test of the three mentioned methods of reading a line from a stream and measuring its length. I read /usr/share/dict/words 100 times and measured with clock()/1000:
fgets + strlen = 420
getc = 510
fscanf with " %100[^\n]%n" = 940
This makes sense as fgets and strlen just do 2 calls, getc does a call per character, and fscanf may do one call but has a lot of machinery to set up for processing complex formats, so a lot more overhead. Note the added space in the fscanf format to skip the newline left from the previous line.
Beside the other good examples.
Another way is using fscanf()/scanf() and the %n format specifier to write to an argument the amount of read characters so far after you have input the string.
Then you subtract this number by one and use it as an index to command:
char command[6];
int n = 0;
if (fscanf(stdin, "%5[^\n]" "%n", command, &n) != 1)
{
fputs("Error at input!", stderr);
// error routine.
}
getchar();
if (n != 0)
{
char last_letter = command[n-1];
}
#include <stdio.h>
int main (void)
{
char command[6];
int n = 0;
if (fscanf(stdin, "%5[^\n]" "%n", command, &n) != 1)
{
fputs("Error at input!", stderr);
// error routine.
}
getchar();
if (n != 0)
{
char last_letter = command[n-1];
putchar(last_letter);
}
return 0;
}
Execution:
./a.out
hello
o

fscanf() how to go in the next line?

So I have a wall of text in a file and I need to recognize some words that are between the $ sign and call them as numbers then print the modified text in another file along with what the numbers correspond to.
Also lines are not defined and columns should be max 80 characters.
Ex:
I $like$ cats.
I [1] cats.
[1] --> like
That's what I did:
#include <stdio.h>
#include <stdlib.h>
#define N 80
#define MAX 9999
int main()
{
FILE *fp;
int i=0,count=0;
char matr[MAX][N];
if((fp = fopen("text.txt","r")) == NULL){
printf("Error.");
exit(EXIT_FAILURE);
}
while((fscanf(fp,"%s",matr[i])) != EOF){
printf("%s ",matr[i]);
if(matr[i] == '\0')
printf("\n");
//I was thinking maybe to find two $ but Idk how to replace the entire word
/*
if(matr[i] == '$')
count++;
if(count == 2){
...code...
}
*/
i++;
}
fclose(fp);
return 0;
}
My problem is that fscanf doesn't recognize '\0' so it doesn't go in the next line when I print the array..also I don't know how to replace $word$ with a number.
Not only will fscanf("%s") read one whitespace-delimited string at a time, it will also eat all whitespace between those strings, including line terminators. If you want to reproduce the input whitespace in the output, as your example suggests you do, then you need a different approach.
Also lines are not defined and columns should be max 80 characters.
I take that to mean the number of lines is not known in advance, and that it is acceptable to assume that no line will contain more than 80 characters (not counting any line terminator).
When you say
My problem is that fscanf doesn't recognize '\0' so it doesn't go in the next line when I print the array
I suppose you're talking about this code:
char matr[MAX][N];
/* ... */
if(matr[i] == '\0')
Given that declaration for matr, the given condition will always evaluate to false, regardless of any other consideration. fscanf() does not factor in at all. The type of matr[i] is char[N], an array of N elements of type char. That evaluates to a pointer to the first element of the array, which pointer will never be NULL. It looks like you're trying to determine when to write a newline, but nothing remotely resembling this approach can do that.
I suggest you start by taking @Barmar's advice to read line-by-line via fgets(). That might look like so:
char line[N+2]; /* N + 2 leaves space for both newline and string terminator */
if (fgets(line, sizeof(line), fp) != NULL) {
/* one line read; handle it ... */
} else {
/* handle end-of-file or I/O error */
}
Then for each line you read, parse out the "$word$" tokens by whatever means you like, and output the needed results (everything but the $-delimited tokens verbatim; the bracket substitution number for each token). Of course, you'll need to memorialize the substitution tokens for later output. Remember to make copies of those, as the buffer will be overwritten on each read (if done as I suggest above).
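One way the parsing step might look is sketched below. Everything here is an assumption made for illustration: the names `substitute_line`, `tokens` and `token_count`, the fixed table sizes, and the simplification that each `$word$` pair opens and closes on the same line.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 100
#define TOKEN_LEN  81

/* Hypothetical storage for the substituted words, in order of appearance. */
static char tokens[MAX_TOKENS][TOKEN_LEN];
static int  token_count = 0;

/* Writes line to out, replacing each $word$ with [k] and remembering word.
 * Assumes every '$' is paired within the line (an assumption of this sketch). */
void substitute_line(const char *line, FILE *out)
{
    const char *p = line;

    while (*p != '\0') {
        const char *open = strchr(p, '$');
        const char *close = open ? strchr(open + 1, '$') : NULL;

        if (close == NULL || token_count == MAX_TOKENS) {
            fputs(p, out);          /* no complete token left: copy the rest */
            return;
        }
        fwrite(p, 1, (size_t)(open - p), out);     /* text before the token */
        snprintf(tokens[token_count], TOKEN_LEN, "%.*s",
                 (int)(close - open - 1), open + 1);  /* copy the word */
        fprintf(out, "[%d]", ++token_count);
        p = close + 1;
    }
}
```

After the last line has been processed, a loop over tokens[0] to tokens[token_count - 1] can print the "[k] --> word" legend. The words are copied out of the line because the read buffer is overwritten by the next fgets() call.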
fscanf() does recognize '\0', under select circumstances, but that is not the issue here.
Code needs to detect '\n'. fscanf(fp,"%s"... will not do that. The first thing "%s" directs is to consume (and not save) any leading white-space including '\n'. Read a line of text with fgets().
Simple read 1 line at a time. Then march down the buffer looking for words.
Following uses "%n" to track how far in the buffer scanning stopped.
// more room for \n \0
#define BUF_SIZE (N + 1 + 1)
char buffer[BUF_SIZE];
while (fgets(buffer, sizeof buffer, stdin) != NULL) {
char *p = buffer;
char word[sizeof buffer];
int n;
while (sscanf(p, "%s%n", word, &n) == 1) {
// do something with word
if (strcmp(word, "$zero$") == 0) fputs("0", stdout);
else if (strcmp(word, "$one$") == 0) fputs("1", stdout);
else fputs(word, stdout);
fputc(' ', stdout);
p += n;
}
fputc('\n', stdout);
}
Use fread() to read the file contents to a char[] buffer. Then iterate through this buffer and whenever you find a $ you perform a strncmp to detect with which value to replace it (keep in mind that there is a 2nd $ at the end of the word). To replace $word$ with a number you need to either shrink or extend the buffer at the position of the word; this depends on the string size of the number in ASCII format (look solutions up on Google, normally you should be able to use memmove). Then you can write the number into the gap that results from resizing the buffer (just overwrite the $word$ as well).
Then write the buffer to the file, overwriting all its previous contents.

getchar() function in C: why it won't print a character right after I use getchar?

I am just a beginner in programming. And I am learning the K&R's book, the C Programming Language. While I am reading, I become more and more curious about this question --
When there is a loop that gets characters one by one from the input and I put an output function in the loop, I expected each character to be printed right after it was entered. However, the computer seems to print a whole batch of characters only after I press a key.
Such as the answer of exercise 1-22 from K&R's book:
/* K&R Exercise 1-22 p.34
*
* Write a program to "fold" long input lines into two or more
* shorter lines after the last non-blank character that occurs before the n-th
* column of input. Make sure your program does something intelligent with very
* long lines, and if there are no blanks or tabs before the specified column.
*/
#include <stdio.h>
#define LINE_LENGTH 80
#define TAB '\t'
#define SPACE ' '
#define NEWLINE '\n'
void entab(int);
int main()
{
int i, j, c;
int n = -1; /* The last column with a space. */
char buff[LINE_LENGTH + 1];
for ( i=0; (c = getchar()) != EOF; ++i )
{
/* Save the SPACE to the buffer. */
if ( c == SPACE )
{
buff[i] = c;
}
/* Save the character to the buffer and note its position. */
else
{
n = i;
buff[i] = c;
}
/* Print the line and reset counts if a NEWLINE is encountered. */
if ( c == NEWLINE )
{
buff[i+1] = '\0';
printf("%s", buff);
n = -1;
i = -1;
}
/* If the LINE_LENGTH was reached instead, then print up to the last
* non-space character. */
else if ( i == LINE_LENGTH - 1 )
{
buff[n+1] = '\0';
printf("%s\n", buff);
n = -1;
i = -1;
}
}
}
I supposed the program would turn out to be like, it would print out only one line of characters, whose length is 80, right after I entered just 80 characters (and I haven't tapped an ENTER key yet). However, it doesn't show up that way! I can totally enter the whole string no matter how many characters there are. When I finally decide to finish the line, I just tap ENTER key, and it will give me the right outputs: the long string is cut into several short pieces/lines, which have 80 characters (and of course the last one may contain less than 80 characters).
I wonder why that happens.
Usually (and in your case), stdin is line-buffered, so your programme doesn't receive the characters as they are typed, but in chunks, when the user enters a newline (Return), or, possibly, when the system buffer is full.
So when the user input is finally sent to your programme, it is copied into the programme's input buffer. That's where getchar() reads the characters from to fill buff.
If the input is long enough, buff will be filled with LINE_LENGTH characters from the input buffer and then printed several times (until the entire contents of the input buffer have been consumed).
On Linux (methinks generally on Unix-ish systems, but I'm not sure), you can also send the input to the programme without entering a newline by typing Ctrl+D on a non-empty line (as the first input on a line, that closes stdin; at a later point in an input line, you can close stdin by typing it twice), so if you type Ctrl+D after entering LINE_LENGTH [or more] characters without a newline, [at least the initial part of] the input is printed immediately then.
I'm very glad to see you here because I also come from China. Actually, my English is poor. And, I am a beginner in programming, too. So, I have not understand your purpose very well.
However, I found a problem in your codes. The loop may not stop. EOF is end of file, but you didn't open any files.
What I know about getchar() is that it reads standard input through a buffer. getchar() is often used to stop the input (or a loop).
I hope my answer could help you.

I don't understand the behavior of fgets in this example

While I could use strings, I would like to understand why this small example I'm working on behaves in this way, and how can I fix it ?
int ReadInput() {
char buffer [5];
printf("Number: ");
fgets(buffer,5,stdin);
return atoi(buffer);
}
void RunClient() {
int number;
int i = 5;
while (i != 0) {
number = ReadInput();
printf("Number is: %d\n",number);
i--;
}
}
This should, in theory or at least in my head, let me read 5 numbers from input (albeit overwriting them).
However this is not the case, it reads 0, no matter what.
I understand printf puts a \0 null terminator ... but I still think I should be able to either read the first number, not just have it by default 0. And I don't understand why the rest of the numbers are OK (not all 0).
CLARIFICATION: I can only read 4/5 numbers, first is always 0.
EDIT:
I've tested and it seems that this was causing the problem:
main.cpp
scanf("%s",&cmd);
if (strcmp(cmd, "client") == 0 || strcmp(cmd, "Client") == 0)
RunClient();
somehow.
EDIT:
Here is the code if someone wishes to compile. I still don't know how to fix
http://pastebin.com/8t8j63vj
FINAL EDIT:
Could not get rid of the error. Decided to simply add #ReadInput
int ReadInput(BOOL check) {
...
if (check)
printf ("Number: ");
...
# RunClient()
void RunClient() {
...
ReadInput(FALSE); // a pseudo - buffer flush. Not really but I ignore
while (...) { // line with garbage data
number = ReadInput(TRUE);
...
}
And call it a day.
fgets reads the input as well as the newline character. So when you input a number, it's like: 123\n.
atoi doesn't report errors when the conversion fails.
Remove the newline character from the buffer:
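char buffer[5];
/* ... fgets(buffer, sizeof buffer, stdin) ... */
size_t length = strlen(buffer);
if (length > 0 && buffer[length - 1] == '\n')
    buffer[length - 1] = 0;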
Then use strtol to convert the string into number which provides better error detection when the conversion fails.
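A minimal sketch of what that error detection could look like. The wrapper name `parse_int` and its return convention are assumptions for illustration; only strtol() and errno are standard.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts s to int; returns 1 on success, 0 on failure.
 * A trailing '\n' (as left by fgets) is accepted. */
int parse_int(const char *s, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(s, &end, 10);
    if (end == s)
        return 0;                       /* no digits at all */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                       /* out of range */
    if (*end != '\0' && *end != '\n')
        return 0;                       /* trailing garbage */
    *out = (int)value;
    return 1;
}
```

Unlike atoi(), this lets the caller distinguish an input of "0" from an input that was not a number at all.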
char * fgets ( char * str, int num, FILE * stream );
Get string from stream.
Reads characters from stream and stores them as a C string into str until (num-1) characters have been read or either a newline or the end-of-file is reached, whichever happens first.
A newline character makes fgets stop reading, but it is considered a valid character by the function and included in the string copied to str. (This means that you carry \n)
A terminating null character is automatically appended after the characters copied to str.
Notice that fgets is quite different from gets: not only fgets accepts a stream argument, but also allows to specify the maximum size of str and includes in the string any ending newline character.
P.S.: Try to have a larger buffer.

How to find number of lines of a file?

for example:
file_ptr=fopen("data_1.txt", "r");
how do i find number of lines in the file?
You read every single character in the file and add up those that are newline characters.
You should look into fgetc() for reading a character and remember that it will return EOF at the end of the file and \n for a line-end character.
Then you just have to decide whether a final incomplete line (i.e., file has no newline at the end) is a line or not. I would say yes, myself.
Here's how I'd do it, in pseudo-code of course since this is homework:
open file
set line count to 0
read character from file
while character is not end-of-file:
    if character is newline:
        add 1 to line count
    read character from file
Extending that to handle a incomplete last line may not be necessary for this level of question. If it is (or you want to try for extra credits), you could look at:
open file
set line count to 0
set last character to end-of-file
read character from file
while character is not end-of-file:
    if character is newline:
        add 1 to line count
    set last character to character
    read character from file
if last character is not newline:
    add 1 to line count
No guarantees that either of those will work since they're just off the top of my head, but I'd be surprised if they didn't (it wouldn't be the first or last surprise I've seen however - test it well).
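For readers who want to check the pseudo-code against something compilable, here is one possible C rendering of the second version. The function name `count_lines` is hypothetical, and there is one deliberate difference: the "last character" starts out as '\n' rather than end-of-file, so that an empty file is reported as 0 lines instead of 1.

```c
#include <stdio.h>

/* Counts lines in fp; a final line without a trailing '\n' is counted too. */
long count_lines(FILE *fp)
{
    long lines = 0;
    int c, last = '\n';   /* '\n' so that an empty file yields 0 lines */

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            ++lines;
        last = c;
    }
    if (last != '\n')
        ++lines;          /* incomplete final line */
    return lines;
}
```

It would be called with the stream from the question, for example count_lines(file_ptr) after a successful fopen().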
Here's a different way:
#include <stdio.h>
#include <stdlib.h>
#define CHARBUFLEN 8
int main (int argc, char **argv) {
int c, lineCount, cIdx = 0;
char buf[CHARBUFLEN];
FILE *outputPtr;
outputPtr = popen("wc -l data_1.txt", "r");
if (!outputPtr) {
fprintf (stderr, "Wrong filename or other error.\n");
return EXIT_FAILURE;
}
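/* read the leading count, stopping at EOF or when buf is full */
while ((c = getc(outputPtr)) != EOF && c != ' ' && cIdx < CHARBUFLEN - 1)
    buf[cIdx++] = (char)c;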
buf[cIdx] = '\0';
lineCount = atoi((const char *)buf);
if (pclose (outputPtr) != 0) {
fprintf (stderr, "Unknown error.\n");
return EXIT_FAILURE;
}
fprintf (stdout, "Line count: %d\n", lineCount);
return EXIT_SUCCESS;
}
Is finding the line count the first step of some more complex operation? If so, I suggest you find a way to operate on the file without knowing the number of lines in advance.
If your only purpose is to count the lines, then you must read them and... count!

Resources