How to find number of lines of a file? - c

For example:
file_ptr = fopen("data_1.txt", "r");
How do I find the number of lines in the file?

You read every single character in the file and add up those that are newline characters.
You should look into fgetc() for reading a character and remember that it will return EOF at the end of the file and \n for a line-end character.
Then you just have to decide whether a final incomplete line (i.e., file has no newline at the end) is a line or not. I would say yes, myself.
Here's how I'd do it, in pseudo-code of course since this is homework:
open file
set line count to 0
read character from file
while character is not end-of-file:
    if character is newline:
        add 1 to line count
    read character from file
Extending that to handle an incomplete last line may not be necessary for this level of question. If it is (or you want to try for extra credit), you could look at:
open file
set line count to 0
set last character to end-of-file
read character from file
while character is not end-of-file:
    if character is newline:
        add 1 to line count
    set last character to character
    read character from file
if last character is not end-of-file and not newline:
    add 1 to line count
No guarantees that either of those will work since they're just off the top of my head, but I'd be surprised if they didn't (it wouldn't be the first or last surprise I've seen however - test it well).
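For reference, the second pseudo-code variant could look roughly like this in C. This is a minimal sketch, not a definitive implementation: the function name count_lines is made up for illustration, and it assumes an empty file should count as zero lines.

```c
#include <stdio.h>

/* Count the lines in an open stream. A final line without a trailing
   newline is counted as a line too. count_lines is a name chosen for
   this sketch, not a library function. */
long count_lines(FILE *fp)
{
    long lines = 0;
    int c;              /* int, not char, so EOF can be told apart */
    int last = EOF;     /* previous character; EOF means nothing read yet */

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            lines++;
        last = c;
    }
    if (last != EOF && last != '\n')
        lines++;        /* incomplete last line */
    return lines;
}
```

Call it with the stream returned by fopen("data_1.txt", "r") after checking that the stream is not NULL.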

Here's a different way:
#include <stdio.h>
#include <stdlib.h>
/* popen() and pclose() are POSIX functions, not standard C. */
int main (int argc, char **argv) {
    int lineCount;
    FILE *outputPtr;
    outputPtr = popen("wc -l data_1.txt", "r");
    if (!outputPtr) {
        fprintf (stderr, "Could not run wc.\n");
        return EXIT_FAILURE;
    }
    /* fscanf() skips any leading blanks in wc's output and parses the count;
       it fails if wc printed nothing, e.g. because the file does not exist. */
    if (fscanf (outputPtr, "%d", &lineCount) != 1) {
        fprintf (stderr, "Wrong filename or other error.\n");
        pclose (outputPtr);
        return EXIT_FAILURE;
    }
    if (pclose (outputPtr) != 0) {
        fprintf (stderr, "Unknown error.\n");
        return EXIT_FAILURE;
    }
    fprintf (stdout, "Line count: %d\n", lineCount);
    return EXIT_SUCCESS;
}

Is finding the line count the first step of some more complex operation? If so, I suggest you find a way to operate on the file without knowing the number of lines in advance.
If your only purpose is to count the lines, then you must read them and... count!

Related

C Programming - File io parsing strings using sscanf

I am trying to do the following in the C programming language. Any help, or finished code, would be greatly appreciated.
I am trying to write a program in C that uses file I/O to parse words with the sscanf function and output each word of the sentences in a text document (bar.txt). Here are the instructions:
Write a program named "reader" that opens the file bar.txt. Pass a parameter to indicate which lines to read. Read all the lines in the file based on the parameter 'lines' into a buffer and, using sscanf, parse all the words of the sentences into different string* variables. Print each of the words to the screen followed by a carriage return. You can hardwire the filename (path of bar.txt) or use an option to enter the filename.
This is the text file (bar.txt) I am working with:
bar.txt
this is the first sentence
this is the 2nd sentence
this is the 3rd sentence
this is the 4th sentence
this is the 5th sentence
end of file: bar.txt
usage of argv: Usage: updater [-f "filename"] 'lines'
-f is optional (if not provided have a hardwired name from previous program 2 (bar.txt))
'lines' integer from 1 to 10 (remember the files has 5-10 strings from previous program)
a sample input example for the input into the program is:
./reader -f bar.txt 1
OUTPUT:
Opening file "bar.txt"
File Sentence 1 word 1 = this
File Sentence 1 word 2 = is
File Sentence 1 word 3 = the
File Sentence 1 word 4 = first
File Sentence 1 word 5 = sentence
another example
./reader -f bar.txt 5
OUTPUT:
File Sentence 5 word 1 = this
File Sentence 5 word 2 = is
File Sentence 5 word 3 = the
File Sentence 5 word 4 = 5th
File Sentence 5 word 5 = sentence
Examples of commands:
./reader -f bar.txt 5
./reader -f bar.txt 2
./reader -f bar.txt 7
./reader 2
./reader 5
./reader 8
./reader 11
This is the code that I have so far. Please fix the code to show the desired output:
#include <stdlib.h>
#include <stdio.h>
#define MAXCHAR 1000
int main(int argc, char *argv[]) {
FILE *file;
char string[MAXCHAR];
char* filename = "c:\\cprogram\\fileio-activity\\bar.txt";
int integer = argv[3][0] - 48;
int i; //for loops
if (argv[1][0] == '-' && argv[1][1] == 'f')
{
file = fopen(filename, "r");
if (file == NULL){
printf("Could not open file %s",filename);
return 1;
}
while (fgets(string, MAXCHAR, file) != NULL)
printf("%s", string);
fclose(file);
return 0;
}
}
You need to get the filename from argv if they use the -f option. And you need to get the number of lines from a different argument depending on whether this option was supplied.
Use strcmp() to compare strings, rather than testing each character separately. And use atoi() to convert the lines argument to an integer, since your method only works for single-digit numbers.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define MAXCHAR 1000
void usage(void) {
    fprintf(stderr, "Usage: reader [-f filename] lines\n");
    exit(1);
}
int main(int argc, char *argv[]) {
    FILE *file;
    char string[MAXCHAR];
    char* filename = "c:\\cprogram\\fileio-activity\\bar.txt";
    int integer;
    int i; //for loops
    if (argc < 2) {
        usage();
    }
    /* Process arguments */
    if (strcmp(argv[1], "-f") == 0)
    {
        if (argc < 4) {
            usage();
        }
        filename = argv[2];
        integer = atoi(argv[3]);
    } else {
        integer = atoi(argv[1]);
    }
    file = fopen(filename, "r");
    if (file == NULL){
        fprintf(stderr, "Could not open file %s\n",filename);
        return 1;
    }
    while (fgets(string, MAXCHAR, file) != NULL)
        printf("%s", string);
    fclose(file);
    return 0;
}
To add to what Barmar already answered, for the further steps in completing the assignment:
Splitting a string into separate words is usually called tokenization, and we normally use strtok() for this. There are several ways to use sscanf() to do it. For example:
1. Use sscanf(string, "%s %s %s", word1, word2, word3) with however many word buffers you might need. (If you use e.g. char word1[100], then use %99s to avoid buffer overrun bugs. One character must be reserved for the end-of-string character \0.)
The return value of sscanf() tells you how many words it copied to the word buffers. However, if string contains more than the number of words you specified, the extra ones are lost.
If the exercise specifies the maximum length of strings, say N, then you know there can be at most N/2+1 words, each of maximum length N, because each consecutive pair of words must be separated by at least one space or other whitespace character.
2. Use sscanf(string + off, " %s%n", word, &len) to obtain each word in a loop. It will return 1 (with int len set to a positive number) for each new word, and 0 or EOF when string starting at off does not contain any more words.
The idea is that for each new word, you increment off by len, thus examining the rest of string in each iteration.
3. Use sscanf(string + off, " %n%*s%n", &start, &end) with int start, end to obtain the range of positions containing the next word. Set start = -1 and end = -1 before the call, and repeat as long as end > start after the call. Advance to the next word by adding end to off.
The beginning of the next word (when start >= 0) is then string + off + start, and it has end - start characters.
To "emulate" strtok() behaviour, one can temporarily save the terminating character (which can be whitespace or the end-of-string character) by using e.g. char saved = string[off + end];, then replace it with an end-of-string character, string[off + end] = '\0';, so that (string + off + start) is a pointer to the word, just like strtok() returns. Before the next scan, one does string[off + end] = saved; to restore the saved character, and off += end; to advance to the next word.
The first one is the easiest, but is the least useful in practical programs. (It works fine, but we do not usually know beforehand the string length and word count limitations.)
The second one is very useful when you have alternate patterns you can try for the next "word" or item; for example, when reading 2D or 3D vectors (points in a plane, or in three-dimensional space), you can support multiple different formats from <1 2 3> to [1,2,3] to 1 2 3, by trying to parse the most complicated/longest first, and trying the next one, until one of them works. (If none of them work, then the input is in error, of course.)
The third one is most useful in that it describes essentially how strtok() works, and what its side effects are. (Its saved character is hidden internally as a static variable.)
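As an illustration of the second approach, here is a minimal sketch that walks one line with sscanf() and %n and prints each word in the format the assignment asks for. The helper name print_words and the 100-character word buffer are assumptions made for this example only.

```c
#include <stdio.h>

/* Print every whitespace-separated word in `line`, labelled with the
   sentence number; returns the number of words found.
   print_words is a hypothetical helper name. */
int print_words(const char *line, int sentence)
{
    char word[100];
    int off = 0, len = 0, count = 0;

    /* " %99s%n": skip whitespace, read one word (at most 99 characters),
       then store the number of characters consumed so far in len */
    while (sscanf(line + off, " %99s%n", word, &len) == 1) {
        count++;
        printf("File Sentence %d word %d = %s\n", sentence, count, word);
        off += len;     /* continue after the word just read */
    }
    return count;
}
```

In the reader program this would be called on the buffer filled by fgets() once the requested line number has been reached.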

Simple XOR encryption program not working as intended

I'm writing a simple XOR encryption program in C.
It is supposed to read a text file line by line
and create a new encrypted file.
The problem is it is only encrypting the first line in the file, skipping the rest of file.
This is the code I've come up with.
From my analysis it should encrypt the file line by line.
But obviously I'm wrong somewhere, that's why it's misbehaving.
It would be really helpful if someone wiser could point out my mistake.
#include <stdio.h>
void encrypt(char *msg);
int main()
{
char line[100]; //all lines in file are shorter than 100 characters
FILE *in=fopen("msg.txt","r");
FILE *out=fopen("encrypted_msg.txt","a");
while(fscanf(in,"%99[^\n]",line)==1) // I included scanset to read new line in every iteration
{
encrypt(line);
}
fprintf(out,"%s\n",line);
return 0;
}
void encrypt(char *msg)
{
while(*msg)
{
*msg=*msg^31;
msg++;
}
}
It's simple. Just add:
char c = fgetc(in);
inside your while to read the endline \n.
Like this:
while(fscanf(in,"%99[^\n]",line)==1)
{
encrypt(line);
char c = fgetc(in);
}
You told fscanf to read all characters except the newline \n, so after the first iteration a \n remains in the input. On the second iteration fscanf cannot read anything, because you excluded \n from the permitted character set.
I think the answer from Edgar Rokyan works because fscanf does not remove the \n from the input, so the next time around the first character is a \n, which causes nothing to be read. My solution is to use fgets instead, which reads the \n into the line buffer (if you don't want it there, just check for it with an if statement and change the last char to a \0).
while(fgets(line, sizeof line, in)) // fgets reads up to and including the newline
{
    encrypt(line);
}
Or, to get rid of the \n at the end of the line:
while(fgets(line, sizeof line, in))
{
    size_t len = strlen(line); // needs #include <string.h>
    if(len > 0 && line[len - 1] == '\n')
        line[len - 1] = '\0';
    encrypt(line);
}
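Putting the pieces together, one possible shape for the whole read/encrypt/write loop is sketched below. Note that the fprintf call sits inside the loop, so every line is written rather than only the last one. The helper name encrypt_stream is hypothetical, and the sketch assumes the text never contains the byte 31, which would XOR to '\0' and cut the line short.

```c
#include <stdio.h>
#include <string.h>

/* XOR every character of msg with the key 31, in place. */
static void encrypt(char *msg)
{
    while (*msg) {
        *msg = *msg ^ 31;
        msg++;
    }
}

/* Encrypt `in` line by line into `out`; returns the number of lines
   written. encrypt_stream is a hypothetical helper name. */
int encrypt_stream(FILE *in, FILE *out)
{
    char line[101];     /* up to 99 characters + '\n' + '\0' */
    int lines = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* drop the newline, if any */
        encrypt(line);
        fprintf(out, "%s\n", line);         /* write inside the loop */
        lines++;
    }
    return lines;
}
```

The caller would open msg.txt and encrypted_msg.txt, check both streams for NULL, pass them in, and fclose() them afterwards.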

fscanf() how to go in the next line?

So I have a wall of text in a file and I need to recognize some words that are between the $ sign and call them as numbers then print the modified text in another file along with what the numbers correspond to.
Also lines are not defined and columns should be max 80 characters.
Ex:
I $like$ cats.
I [1] cats.
[1] --> like
That's what I did:
#include <stdio.h>
#include <stdlib.h>
#define N 80
#define MAX 9999
int main()
{
FILE *fp;
int i=0,count=0;
char matr[MAX][N];
if((fp = fopen("text.txt","r")) == NULL){
printf("Error.");
exit(EXIT_FAILURE);
}
while((fscanf(fp,"%s",matr[i])) != EOF){
printf("%s ",matr[i]);
if(matr[i] == '\0')
printf("\n");
//I was thinking maybe to find two $ but Idk how to replace the entire word
/*
if(matr[i] == '$')
count++;
if(count == 2){
...code...
}
*/
i++;
}
fclose(fp);
return 0;
}
My problem is that fscanf doesn't recognize '\0' so it doesn't go in the next line when I print the array..also I don't know how to replace $word$ with a number.
Not only will fscanf("%s") read one whitespace-delimited string at a time, it will also eat all whitespace between those strings, including line terminators. If you want to reproduce the input whitespace in the output, as your example suggests you do, then you need a different approach.
Also lines are not defined and columns should be max 80 characters.
I take that to mean the number of lines is not known in advance, and that it is acceptable to assume that no line will contain more than 80 characters (not counting any line terminator).
When you say
My problem is that fscanf doesn't recognize '\0' so it doesn't go in the next line when I print the array
I suppose you're talking about this code:
char matr[MAX][N];
/* ... */
if(matr[i] == '\0')
Given that declaration for matr, the given condition will always evaluate to false, regardless of any other consideration. fscanf() does not factor in at all. The type of matr[i] is char[N], an array of N elements of type char. That evaluates to a pointer to the first element of the array, which pointer will never be NULL. It looks like you're trying to determine when to write a newline, but nothing remotely resembling this approach can do that.
I suggest you start by taking @Barmar's advice to read line-by-line via fgets(). That might look like so:
char line[N+2]; /* N + 2 leaves space for both newline and string terminator */
if (fgets(line, sizeof(line), fp) != NULL) {
/* one line read; handle it ... */
} else {
/* handle end-of-file or I/O error */
}
Then for each line you read, parse out the "$word$" tokens by whatever means you like, and output the needed results (everything but the $-delimited tokens verbatim; the bracket substitution number for each token). Of course, you'll need to memorialize the substitution tokens for later output. Remember to make copies of those, as the buffer will be overwritten on each read (if done as I suggest above).
fscanf() does recognize '\0', under select circumstances, but that is not the issue here.
Code needs to detect '\n'. fscanf(fp,"%s"... will not do that. The first thing "%s" directs is to consume (and not save) any leading white-space including '\n'. Read a line of text with fgets().
Simply read one line at a time, then march down the buffer looking for words.
The following uses "%n" to track how far into the buffer scanning got.
// more room for \n \0
#define BUF_SIZE (N + 1 + 1)
char buffer[BUF_SIZE];
while (fgets(buffer, sizeof buffer, stdin) != NULL) {
char *p = buffer;
char word[sizeof buffer];
int n;
while (sscanf(p, "%s%n", word, &n) == 1) {
// do something with word
if (strcmp(word, "$zero$") == 0) fputs("0", stdout);
else if (strcmp(word, "$one$") == 0) fputs("1", stdout);
else fputs(word, stdout);
fputc(' ', stdout);
p += n;
}
fputc('\n', stdout);
}
Use fread() to read the file contents into a char[] buffer. Then iterate through this buffer, and whenever you find a $, use strncmp to detect which value to replace it with (keep in mind that there is a second $ at the end of the word). To replace $word$ with a number, you need to either shrink or extend the buffer at the position of the word, depending on the length of the number in ASCII form (memmove is the usual tool for shifting the rest of the buffer). Then you can write the number into the gap that opened up at that position (overwriting the $word$ as well).
Then write the buffer to the file, overwriting all its previous contents.
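For the substitution step itself, a line-at-a-time variant might look like the sketch below: it copies a line to the output, replaces each $word$ with [k], and keeps a copy of every word so the "[k] --> word" legend can be printed afterwards. The names substitute_line, MAX_WORDS and WORD_LEN are invented for this example, and the limit of 100 substitutions is an assumption.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 100   /* assumed limit on the number of substitutions */
#define WORD_LEN  81    /* up to 80 characters plus '\0' */

/* Copy `line` to `out`, replacing each $word$ with [k] and saving the word
   in words[k-1]. `count` is the number of words saved so far; the updated
   count is returned. All names here are hypothetical. */
int substitute_line(const char *line, FILE *out,
                    char words[][WORD_LEN], int count)
{
    const char *p = line;

    while (*p != '\0') {
        const char *open = strchr(p, '$');
        const char *close = open ? strchr(open + 1, '$') : NULL;

        if (close == NULL || count >= MAX_WORDS) {
            fputs(p, out);          /* no complete $...$ pair left */
            break;
        }
        fwrite(p, 1, (size_t)(open - p), out);  /* text before the token */
        size_t len = (size_t)(close - open - 1);
        if (len >= WORD_LEN)
            len = WORD_LEN - 1;
        memcpy(words[count], open + 1, len);    /* keep a copy of the word */
        words[count][len] = '\0';
        count++;
        fprintf(out, "[%d]", count);
        p = close + 1;
    }
    return count;
}
```

After the last line has been processed, a loop over words[0] to words[count - 1] can print the legend lines.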

Reading from two columns into variables

I'm writing a C program that takes an input file and stores it. The input file has two columns, with an integer in the first and a string in the second, like so:
12 apple
17 frog
20 grass
I've tried using fgets to take an entire line as a string then break it apart using scanf but I'm getting lots of issues. I have searched quite a lot but haven't found anything that answers my question, but sorry if I missed something obvious.
This is the code that I've been trying:
while(fgets(line, sizeof(line), fp))
{
scanf(line, "%d\t%s", &key, value);
insert(key, value, newdict);
}
Let's have a quick go at doing with strtok since someone mentioned it. Let's imagine your file is called file.txt and has the following contents:
10 aaa
20 bbb
30 ccc
This is how we can parse it:
#include <stdio.h>
#include <string.h>
#define MAX_NUMBER_OF_LINES 10 // parse a maximum of 10 lines
#define MAX_LINE_SIZE 50 // parse a maximum of 50 chars per line
int main ()
{
    FILE* fh = fopen("file.txt", "r"); // open the file
    if (fh == NULL) // stop if the file could not be opened
        return 1;
    char temp[MAX_LINE_SIZE]; // some buffer storage for each line
    // storage for MAX_NUMBER_OF_LINES integers
    int d_out[MAX_NUMBER_OF_LINES];
    // storage for MAX_NUMBER_OF_LINES strings each MAX_LINE_SIZE chars long
    char s_out[MAX_NUMBER_OF_LINES][MAX_LINE_SIZE];
    // i is a special variable that tells us if we're parsing a number or a string (0 for num, 1 for string)
    // di and si are indices to keep track of which line we're currently handling
    int i = 0, di = 0, si = 0;
    while (di < MAX_NUMBER_OF_LINES && fgets(temp, MAX_LINE_SIZE, fh)) // read the input file and parse the string
    {
        temp[strcspn(temp, "\n")] = '\0'; // get rid of the newline in the buffer, if there is one
        char* c = strtok(temp, " \t"); // split on spaces and tabs
        while(c != NULL)
        {
            if (i == 0) // i equal to 0 means we're parsing a number
            {
                i = 1; // next we'll parse a string, let's indicate that
                sscanf(c, "%d", &d_out[di++]);
            }
            else // i must be 1 parsing a string
            {
                i = 0; // next we'll parse a number
                sprintf(s_out[si++], "%s", c);
            }
            c = strtok(NULL, " \t");
        }
        printf("%d %s\n", d_out[di -1], s_out[si - 1]); // print what we've extracted
    }
    fclose(fh);
    return 0;
}
This will extract the contents from the file and store them in respective arrays, we then print them and get back our original contents:
$ ./a.out
10 aaa
20 bbb
30 ccc
Use:
fgets (name, 100, stdin);
100 is the max length of the buffer. You should adjust it as per your need.
Use:
scanf ("%[^\n]%*c", name);
The [] is the scanset character. [^\n] tells that while the input is not a newline ('\n') take input. Then with the %*c it reads the newline character from the input buffer (which is not read), and the * indicates that this read in input is discarded (assignment suppression), as you do not need it, and this newline in the buffer does not create any problem for next inputs that you might take.
The problem here seems to be that you are reading input twice: first with fgets and then with scanf. You will probably not get any errors from the compiler for your use of scanf, but you should be getting warnings, as you use line as the format string and the other arguments do not match the format. It would also be pretty obvious if you checked the return value from scanf, as it returns the number of successfully scanned items. Your call would most likely return zero (or minus one when you have hit end of file).
You should be using sscanf instead to parse the line you read with fgets.
See e.g. this reference for the different scanf variants.
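A minimal sketch of that fix, assuming each line holds an integer followed by a single word of at most 99 characters. The helper name parse_entry is made up for illustration; the caller is assumed to provide a value buffer of at least 100 characters.

```c
#include <stdio.h>

/* Parse one "number<whitespace>word" line. Returns 1 on success and 0 if
   the line does not match. parse_entry is a hypothetical helper; `value`
   must have room for at least 100 characters. */
int parse_entry(const char *line, int *key, char *value)
{
    /* sscanf returns the number of items converted; both must succeed.
       The space in the format matches any run of spaces or tabs. */
    return sscanf(line, "%d %99s", key, value) == 2;
}
```

Inside the existing while (fgets(line, sizeof(line), fp)) loop, insert(key, value, newdict) would then be called only when parse_entry(line, &key, value) returns 1.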
Your problem can be solved by using sscanf (with the support of getline) like below:
#define _POSIX_C_SOURCE 200809L /* for getline(), a POSIX function */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h> /* ssize_t */
int main(void)
{
    FILE *fp;
    char *line = NULL;
    size_t len = 0;
    ssize_t read;
    /* token bags */
    char tok_str[255];
    int tok_int;
    fp = fopen("./file.txt", "r");
    if (fp == NULL)
        exit(EXIT_FAILURE);
    /* Reads the line from the stream. */
    while ((read = getline(&line, &len, fp)) != -1) {
        /* Scans the line according to the given format; lines that do not match are skipped. */
        if (sscanf(line, "%d\t%254s", &tok_int, tok_str) == 2)
            printf("%d-%s\n", tok_int, tok_str);
    }
    free(line);
    fclose(fp);
    exit(EXIT_SUCCESS);
}
Or, even simpler: you could use fscanf directly and replace the while loop shown above (along with some other redundant code cleanups) with the following one. Testing the return value of fscanf, rather than looping on feof, stops the loop cleanly at end of file or on malformed input:
/* Scans input from the file stream until a record fails to match. */
while (fscanf(fp, "%d\t%254s\n", &tok_int, tok_str) == 2) {
    printf("%d-%s\n", tok_int, tok_str);
}
Assuming that your file contains following lines (where single line format is number[tab]string[newline]):
12 apple
17 frog
20 grass
the output will be:
12-apple
17-frog
20-grass

Finding line size of each row in a text file

How can you count the number of characters or numbers in each line? Is there something like an EOF that's more like an End of Line?
You can iterate through each character in the line and keep incrementing a counter until the end-of-line ('\n') is encountered. Make sure to open the file in text mode ("r") and not binary mode ("rb"). Otherwise the stream won't automatically convert different platforms' line ending sequences into '\n' characters.
Here is an example:
int charcount( FILE *const fin )
{
int c, count;
count = 0;
for( ;; )
{
c = fgetc( fin );
if( c == EOF || c == '\n' )
break;
++count;
}
return count;
}
Here's an example program to test the above function:
#include <stdio.h>
int main( int argc, char **argv )
{
FILE *fin;
fin = fopen( "test.txt", "r" );
if( fin == NULL )
return 1;
printf( "Character count: %d.\n", charcount( fin ) );
fclose( fin );
return 0;
}
Regarding reading a file line by line, look at fgets.
char *fgets(char *restrict s, int n, FILE *restrict stream);
The fgets() function shall read bytes
from stream into the array pointed to
by s, until n-1 bytes are read, or a
<newline> is read and transferred to
s, or an end-of-file condition is
encountered. The string is then
terminated with a null byte.
The only problem here may be if you can't guarantee a maximum line size in your file. If that is the case, you can iterate over characters until you see a line feed.
Regarding end of line:
Short answer: \n is the newline character (also called a line feed).
Long answer, from Wikipedia:
Systems based on ASCII or a compatible
character set use either LF (Line
feed, 0x0A, 10 in decimal) or CR
(Carriage return, 0x0D, 13 in decimal)
individually, or CR followed by LF
(CR+LF, 0x0D 0x0A); see below for the
historical reason for the CR+LF
convention. These characters are based
on printer commands: The line feed
indicated that one line of paper
should feed out of the printer, and a
carriage return indicated that the
printer carriage should return to the
beginning of the current line.
* LF: Multics, Unix and Unix-like systems (GNU/Linux, AIX, Xenix, Mac OS X, FreeBSD, etc.), BeOS, Amiga, RISC OS, and others
* CR+LF: DEC RT-11 and most other early non-Unix, non-IBM OSes, CP/M, MP/M, DOS, OS/2, Microsoft Windows, Symbian OS
* CR: Commodore 8-bit machines, Apple II family, Mac OS up to version 9 and OS-9
But since you are not likely to be working with a representation that uses carriage return only, looking for a line feed should be fine.
If you open a file in text mode, i.e., without a b in the second argument to fopen(), you can read characters one-by-one until you hit a '\n' to determine the line size. The underlying system should take care of translating the end of line terminators to just one character, '\n'. The last line of a text file, on some systems, may not end with a '\n', so that is a special case.
Pseudocode:
count := 0
c := next()
while c != EOF and c != '\n':
    count := count + 1
    c := next()
The above will count the number of characters in a given line. next() is a function to return the next character from your file.
Alternatively, you can use fgets() with a buffer:
char buf[SIZE];
count = 0;
while (fgets(buf, sizeof buf, fp) != NULL) {
/* see if the string represented by buf has a '\n' in it,
if yes, add the index of that '\n' to count, and that's
the number of characters on that line, which you can
return to the caller. If not, add sizeof buf - 1 to count */
}
/* If count is non-zero here, the last line ended without a newline */
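One way to flesh out the fgets() idea is sketched below. It is only an illustration: the function name line_length and the 64-byte buffer are arbitrary choices, and a line longer than the buffer is simply handled over several reads.

```c
#include <stdio.h>
#include <string.h>

/* Return the length of the next line in fp, excluding the newline, or -1
   if nothing is left to read. line_length is a hypothetical name. */
long line_length(FILE *fp)
{
    char buf[64];       /* deliberately small: long lines take several reads */
    long count = 0;
    int got_any = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t n = strcspn(buf, "\n");  /* index of '\n', or strlen(buf) */
        got_any = 1;
        count += (long)n;
        if (buf[n] == '\n')
            return count;               /* reached the end of the line */
    }
    return got_any ? count : -1;        /* last line had no newline, or EOF */
}
```

Calling it repeatedly until it returns -1 yields the length of every line in turn.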
The original question was how to get the number of characters in "each line" (given a line? or the current line?), while the answers have mostly given solutions how to determine the length of the first line in a file. One can easily apply some of them to determine length of current line (without guessing beforehand maximum length for a buffer).
However, what one often needs in practice is the maximum length of any line in a file. Then one can reserve a buffer and use fgets to read the file line by line and use some nice functions (strtok, strtod etc.) to parse lines. In practice, you can use any of the previous solutions to determine length of one line, and just scan through all lines and take the maximum.
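A simple function that reads the file character by character:
/* Returns the length of the longest line in f, not counting newlines. */
int max_line_length(FILE *f)
{
    int c, max = 0, i = 0;
    do {
        c = fgetc(f);
        if (c != EOF && c != '\n') {
            i++;
        } else {
            if (i > max) max = i;
            i = 0;
        }
    } while (c != EOF);
    return max;
}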
Note: In practice, it would suffice to have an upper bound for the maximum length. A dirty solution would be to use the file size as an upper bound for the maximum length of lines.
\n is the newline character in C. In other languages, such as C#, you may use something like C#'s Environment.NewLine to overcome platform differences.
If you already know that your string is one line (let's call it line), use strlen(line) to get the number of characters in it. Subtract 1 if it ends with the '\n'.
If the string has new line characters in it, you'll need to split it around the new line characters and then call strlen() on each substring.
Here is a simple algorithm.
It requires:
a file stream (FILE *),
the index of the line whose size you want (int); as written, the code counts from 0, so 0 is the first line.
It returns:
the total number of characters in the given line.
Function:
#include <stdio.h>
#include <string.h>
int getLengthOfLine(FILE* df,int Ofline){
int cchar; /* int rather than char, so EOF can be told apart from valid characters */
int line=1;
int total =1;
int atLine=0;
int afterLine=0;
while ((cchar=fgetc(df))!=EOF)
{
if (feof(df)){
break ;
}
if (cchar == '\n' || cchar == '\0'){
if(line==Ofline){
// printf(" before %d ",total);
atLine = total;
}
if(line==(Ofline+1)){
// printf(" after %d ",total);
afterLine = total-atLine;
}
// printf(" line is %d ",line);
line++;
}
total++;
}
fseek(df, 0L, SEEK_SET);
if(afterLine==0){
return (total-atLine-1);
}
else
{
return (afterLine-1);
}
}
Uses :
FILE* fp = fopen("path-to-file" , "r");
if(fp!=NULL){
printf(" %d",getLengthOfLine(fp,5));
}

Resources