strlen problems, returns the right length + 2 - c

I have a weird bug.
I wrote a function that gets a file and returns the length of each line:
void readFile1(char *argv[], int fileNumber,
char *array_of_lines[NUMBER_OF_LINES+1], int *currNumOfLines)
{
FILE* fp;
fp = fopen(argv[fileNumber], "r");
if (fp == NULL)
{
fprintf(stderr, MSG_ERROR_OPEN_FILE, argv[fileNumber]);
exit(EXIT_FAILURE);
}
char line[256];
while (fgets(line, sizeof(line), fp))
{
printf("\n line contains : %s size is : %lu\n",line,strlen(line));
}
}
The function always prints the correct length plus 2.
For example, if file.txt contains only one line with "AAAAA", the function prints that the length is 7 instead of 5:
line contains : AAAAA
size is : 7
Does anyone know where the bug is?

Don't forget that fgets leaves the newline in the buffer.
It seems you're reading a file created in Windows (where a newline is the two characters "\r\n") on a system where newline is only "\n". Those two characters are also part of the string and will be counted by strlen.
I'm guessing that you're reading a Windows-created file on a non-Windows system because, for files opened in text mode (the default), the functions that read and write strings translate newlines from and to the operating system's own format.
For example on Windows, when writing plain "\n" it will be translated and actually written as "\r\n". When reading the opposite translation happens ("\r\n" becomes "\n").
On a system with plain "\n" line endings (like Linux or macOS), no translation is needed, and the "\r" part will be treated as any other character.
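A minimal sketch of one way to get the expected length, assuming the goal is to count only the visible characters: cut the string at the first '\r' or '\n' before measuring it. The helper name strip_eol is hypothetical and not part of the original code.

```c
#include <string.h>

/* Hypothetical helper: truncate `line` at the first '\r' or '\n'
   and return the length of what remains. */
size_t strip_eol(char *line)
{
    /* strcspn() returns the index of the first CR or LF,
       or the index of the terminating '\0' if there is none */
    size_t len = strcspn(line, "\r\n");

    line[len] = '\0';
    return len;
}
```

Calling strip_eol(line) right after fgets() and printing the result with %zu should report 5 for a line that was read as "AAAAA\r\n".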

printf("\n line contains : %s size is : %lu\n",line,strlen(line));
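Here's a giveaway: the format string has no newline between %s and "size is", yet the output breaks there, so the line break must come from the contents of line.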
Obtained output
line contains : AAAAA
size is : 7
Expected output
line contains : AAAAA size is : 7

Related

C Programming - File io parsing strings using sscanf

I am trying to do the following in the C programming language. Any help, or a completed version of the code, would be greatly appreciated:
I am trying to write a C program that uses file I/O, parses the words using the sscanf function, and outputs each word of every sentence in a text document (bar.txt). Here are the instructions.
Write a program that opens the file bar.txt; name the program "reader". Pass a parameter to indicate lines to read. Read all the lines in the file based on the parameter 'lines' into a buffer and, using sscanf, parse all the words of the sentences into different string* variables. Print each of the words to the screen followed by a carriage return. You can hardwire the filename (path of bar.txt) or use an option to enter the filename.
This is the txt file (bar.txt) i am working with:
bar.txt
this is the first sentence
this is the 2nd sentence
this is the 3rd sentence
this is the 4th sentence
this is the 5th sentence
end of file: bar.txt
usage of argv: Usage: updater [-f "filename"] 'lines'
-f is optional (if not provided have a hardwired name from previous program 2 (bar.txt))
'lines' integer from 1 to 10 (remember the files has 5-10 strings from previous program)
a sample input example for the input into the program is:
./reader -f bar.txt 1
OUTPUT:
Opening file "bar.txt"
File Sentence 1 word 1 = this
File Sentence 1 word 2 = is
File Sentence 1 word 3 = the
File Sentence 1 word 4 = first
File Sentence 1 word 5 = sentence
another example
./reader -f bar.txt 5
OUTPUT:
File Sentence 5 word 1 = this
File Sentence 5 word 2 = is
File Sentence 5 word 3 = the
File Sentence 5 word 4 = 5th
File Sentence 5 word 5 = sentence
Examples of commands:
./reader -f bar.txt 5
./reader -f bar.txt 2
./reader -f bar.txt 7
./reader 2
./reader 5
./reader 8
./reader 11
this is the code that I have so far please fix the code to show the desired output:
#include <stdlib.h>
#include <stdio.h>
#define MAXCHAR 1000
int main(int argc, char *argv[]) {
FILE *file;
char string[MAXCHAR];
char* filename = "c:\\cprogram\\fileio-activity\\bar.txt";
int integer = argv[3][0] - 48;
int i; //for loops
if (argv[1][0] == '-' && argv[1][1] == 'f')
{
file = fopen(filename, "r");
if (file == NULL){
printf("Could not open file %s",filename);
return 1;
}
while (fgets(string, MAXCHAR, file) != NULL)
printf("%s", string);
fclose(file);
return 0;
}
}
You need to get the filename from argv if they use the -f option. And you need to get the number of lines from a different argument depending on whether this option was supplied.
Use strcmp() to compare strings, rather than testing each character separately. And use atoi() to convert the lines argument to an integer, since your method only works for single-digit numbers.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define MAXCHAR 1000
void usage(void) {
    fprintf(stderr, "Usage: reader [-f filename] lines\n");
    exit(1);
}
int main(int argc, char *argv[]) {
    FILE *file;
    char string[MAXCHAR];
    char* filename = "c:\\cprogram\\fileio-activity\\bar.txt";
    int integer; // value of the 'lines' argument
    int i; //for loops
    if (argc < 2) {
        usage();
    }
    // Process arguments
    if (strcmp(argv[1], "-f") == 0)
    {
        if (argc < 4) {
            usage();
        }
        filename = argv[2];
        integer = atoi(argv[3]);
    } else {
        integer = atoi(argv[1]);
    }
    file = fopen(filename, "r");
    if (file == NULL){
        fprintf(stderr, "Could not open file %s\n",filename);
        return 1;
    }
    while (fgets(string, MAXCHAR, file) != NULL)
        printf("%s", string);
    fclose(file);
    return 0;
}
To add to what Barmar already answered, for the further steps in completing the assignment:
Splitting a string into separate words is usually called tokenization, and we normally use strtok() for it. There are also several ways to do it with sscanf(). For example:
1. Use sscanf(string, "%s %s %s", word1, word2, word3) with however many word buffers you might need. (If you use e.g. char word1[100], then use %99s, to avoid buffer overrun bugs. One character must be reserved for the end-of-string character \0.)
   The return value of sscanf() tells you how many words it copied to the word buffers. However, if string contains more than the number of words you specified, the extra ones are lost.
   If the exercise specifies the maximum length of strings, say N, then you know there can be at most N/2+1 words, each of maximum length N, because each consecutive pair of words must be separated by at least one space or other whitespace character.

2. Use sscanf(string + off, " %s%n", word, &len) to obtain each word in a loop. It will return 1 (with int len set to a positive number) for each new word, and 0 or EOF when string starting at off does not contain any more words.
   The idea is that for each new word, you increment off by len, thus examining the rest of string in each iteration.

3. Use sscanf(string + off, " %n%*s%n", &start, &end) with int start, end to obtain the range of positions containing the next word. Set start = -1 and end = -1 before the call, and repeat as long as end > start after the call. Advance to the next word by adding end to off.
   The beginning of the next word (when start >= 0) is then string + off + start, and it has end - start characters.
   To "emulate" strtok() behaviour, one can temporarily save the terminating character (which can be whitespace or the end-of-string character) by using e.g. char saved = string[off + end];, then replace it with an end-of-string character, string[off + end] = '\0';, so that (string + off + start) is a pointer to the word, just like strtok() returns. Before the next scan, one does string[off + end] = saved; to restore the saved character, and off += end; to advance to the next word.

The first one is the easiest, but is the least useful in practical programs. (It works fine, but we do not usually know beforehand the string length and word count limitations.)
The second one is very useful when you have alternate patterns you can try for the next "word" or item; for example, when reading 2D or 3D vectors (points in a plane, or in three-dimensional space), you can support multiple different formats from <1 2 3> to [1,2,3] to 1 2 3, by trying to parse the most complicated/longest first, and trying the next one, until one of them works. (If none of them work, then the input is in error, of course.)
The third one is most useful in that it describes essentially how strtok() works, and what its side effects are. (Its state is hidden internally in a static variable.)
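The second approach can be sketched as follows. This is only an illustration under assumptions: the helper name print_words, the 100-byte word buffer, and the output format (copied from the question's sample output) are not part of the answer above.

```c
#include <stdio.h>

#define MAX_WORD 100

/* Hypothetical helper: print each whitespace-separated word of `string`,
   labelled with the sentence number, and return how many words were found. */
int print_words(const char *string, int sentence)
{
    char word[MAX_WORD];
    int off = 0;    /* offset of the not-yet-parsed rest of the string */
    int len = 0;    /* characters consumed by the latest sscanf() call */
    int count = 0;

    /* %99s leaves room for the '\0'; %n stores how many characters
       were consumed so far and does not count towards the return value */
    while (sscanf(string + off, " %99s%n", word, &len) == 1) {
        count++;
        printf("File Sentence %d word %d = %s\n", sentence, count, word);
        off += len;
    }
    return count;
}
```

Calling print_words(string, n) on the n-th line read by fgets() would print that line's words one per line; the trailing newline left by fgets() is skipped as whitespace.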

Checking if a word exists in a file in C

In order to check if a word exists in a .txt file I've written the following function:
void checkWord (char* word){
FILE* file = fopen("file.txt", "r");
char line[1000] = "";
if(file != NULL){
while(fgets(line, 1000, file) != NULL){
if (strcmp(line, word) == 0){
printf("hello"); }
}
}
fclose(file);
}
Yet this code doesn't print hello when I call : checkWord("hello"); on the file :
aaaaa
hello
aaaaa
but does print hello when I call : checkWord("hello"); on the file :
aaaa
hello
and I don't understand why.
In the first file the string hello is not on the last line, so it is followed by a '\n' character. In the second file the same string is at the end of the file (so there is no trailing '\n').
The core of the issue is that fgets() stops reading when it finds a newline character and does not remove that character from the returned string. For this reason no exact match is found in the first file, where the string actually read is "hello\n".
To solve it you can either remove the trailing '\n' from the string read by fgets() or, if the requirements allow it, just search for the substring with strstr().
Although you can't see it, every line in your file except the last one ends with a newline character, written \n. When comparing line by line, you have to account for the fact that most lines will have this character at the end.
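A hedged sketch of the first fix, assuming an exact whole-line match is what is wanted. The name file_has_line and its return convention are hypothetical; unlike the original function it takes the path as a parameter and only calls fclose() when fopen() succeeded.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if some line of the file at `path`
   equals `word` exactly, 0 if not, -1 if the file cannot be opened. */
int file_has_line(const char *path, const char *word)
{
    char line[1000];
    int found = 0;
    FILE *file = fopen(path, "r");

    if (file == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, file) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';  /* drop the line ending, if any */
        if (strcmp(line, word) == 0)
            found = 1;
    }
    fclose(file);  /* only reached when fopen() succeeded */
    return found;
}
```

With this version, file_has_line("file.txt", "hello") should report a match whether hello is in the middle of the file or on the last line.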

Erroneous output while reading File in C

I am trying to write a string to a file and then read the string and output the string written into the file.
For example
INPUT (Input Name)
FalconHawk
OUTPUT
Hi FalconHawk! Have a great day!
My code is:
#include<stdio.h>
void main(){
char n[10],r[1000];
FILE *fptr,*fpt;
scanf("%s",n); //Input name
fptr=fopen("welcome.txt","w");
fprintf(fptr,"%s",n); //Write to file
fclose(fptr);
fpt=fopen("welcome.txt","r");
fscanf(fpt,"%s",r);
printf("Hi %s! Have a good day.",r); //Output file content
fclose(fpt);
}
But for some reason I am getting output like
INPUT (Input Name)
FalconHawk
OUTPUT
HiHi FalconHawk! Have a great day! //"Hi" is getting printed two times
On replacing "Hi" with "Welcome" I am getting an output like
OUTPUT
WelcomeWelcome FalconHawk! Have a great day! //"Welcome" is getting printed two times.
What is causing this issue?
Your buffer is too small and there's no room for the terminating null byte, therefore, your code invokes undefined behavior. If you want to read 10 characters, then this is how you should do it
char input[11];
if (scanf("%10s", input) == 1) {
// Safely use `input' here
}
And if you want to read an entire line of text from stdin then use fgets() instead
if (fgets(input, sizeof input, stdin) != NULL) {
// Safely use `input' here
}
Strings in C always need an extra byte of space to store the terminating '\0'. Read a basic tutorial on strings in C to learn how they work and how to treat them.

How to find number of lines of a file?

for example:
file_ptr=fopen("data_1.txt", "r");
How do I find the number of lines in the file?
You read every single character in the file and add up those that are newline characters.
You should look into fgetc() for reading a character and remember that it will return EOF at the end of the file and \n for a line-end character.
Then you just have to decide whether a final incomplete line (i.e., file has no newline at the end) is a line or not. I would say yes, myself.
Here's how I'd do it, in pseudo-code of course since this is homework:
open file
set line count to 0
read character from file
while character is not end-of-file:
    if character is newline:
        add 1 to line count
    read character from file
Extending that to handle an incomplete last line may not be necessary for this level of question. If it is (or you want to try for extra credits), you could look at:
open file
set line count to 0
set last character to end-of-file
read character from file
while character is not end-of-file:
    if character is newline:
        add 1 to line count
    set last character to character
    read character from file
if last character is neither end-of-file nor newline:
    add 1 to line count
No guarantees that either of those will work since they're just off the top of my head, but I'd be surprised if they didn't (it wouldn't be the first or last surprise I've seen however - test it well).
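For reference, the second variant of the pseudo-code might look like this in C. It is a sketch, not a tested library routine: the name count_lines is made up here, and it assumes an empty file should count as zero lines.

```c
#include <stdio.h>

/* Count the lines in an open stream. A final line that has no
   trailing newline is counted as well; an empty stream gives 0. */
long count_lines(FILE *fp)
{
    long lines = 0;
    int c;            /* int, not char, so EOF can be told apart from data */
    int last = EOF;   /* last character read, EOF if nothing was read */

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            lines++;
        last = c;
    }
    if (last != EOF && last != '\n')
        lines++;      /* incomplete last line */
    return lines;
}
```

It would be called as count_lines(file_ptr) right after a successful fopen().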
Here's a different way:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char **argv) {
    int lineCount;
    FILE *outputPtr;
    /* popen() is POSIX, not standard C; wc prints the count first */
    outputPtr = popen("wc -l data_1.txt", "r");
    if (!outputPtr) {
        fprintf (stderr, "Could not run wc.\n");
        return EXIT_FAILURE;
    }
    /* %d skips any leading whitespace and stops after the number */
    if (fscanf (outputPtr, "%d", &lineCount) != 1) {
        fprintf (stderr, "Wrong filename or other error.\n");
        pclose (outputPtr);
        return EXIT_FAILURE;
    }
    if (pclose (outputPtr) != 0) {
        fprintf (stderr, "Unknown error.\n");
        return EXIT_FAILURE;
    }
    fprintf (stdout, "Line count: %d\n", lineCount);
    return EXIT_SUCCESS;
}
Is finding the line count the first step of some more complex operation? If so, I suggest you find a way to operate on the file without knowing the number of lines in advance.
If your only purpose is to count the lines, then you must read them and... count!

Finding line size of each row in a text file

How can you count the number of characters or numbers in each line? Is there something like an EOF that's more like an End of Line?
You can iterate through each character in the line and keep incrementing a counter until the end-of-line ('\n') is encountered. Make sure to open the file in text mode ("r") and not binary mode ("rb"). Otherwise the stream won't automatically convert different platforms' line ending sequences into '\n' characters.
Here is an example:
int charcount( FILE *const fin )
{
int c, count;
count = 0;
for( ;; )
{
c = fgetc( fin );
if( c == EOF || c == '\n' )
break;
++count;
}
return count;
}
Here's an example program to test the above function:
#include <stdio.h>
int main( int argc, char **argv )
{
FILE *fin;
fin = fopen( "test.txt", "r" );
if( fin == NULL )
return 1;
printf( "Character count: %d.\n", charcount( fin ) );
fclose( fin );
return 0;
}
Regarding reading a file line by line, look at fgets.
char *fgets(char *restrict s, int n, FILE *restrict stream);
The fgets() function shall read bytes
from stream into the array pointed to
by s, until n-1 bytes are read, or a
<newline> is read and transferred to
s, or an end-of-file condition is
encountered. The string is then
terminated with a null byte.
The only problem here may be if you can't guarantee a maximum line size in your file. If that is the case, you can iterate over characters until you see a line feed.
Regarding end of line:
Short answer: \n is the newline character (also called a line feed).
Long answer, from Wikipedia:
Systems based on ASCII or a compatible
character set use either LF (Line
feed, 0x0A, 10 in decimal) or CR
(Carriage return, 0x0D, 13 in decimal)
individually, or CR followed by LF
(CR+LF, 0x0D 0x0A); see below for the
historical reason for the CR+LF
convention. These characters are based
on printer commands: The line feed
indicated that one line of paper
should feed out of the printer, and a
carriage return indicated that the
printer carriage should return to the
beginning of the current line.
* LF: Multics, Unix and Unix-like systems (GNU/Linux, AIX, Xenix, Mac OS X, FreeBSD, etc.), BeOS, Amiga, RISC OS, and others
* CR+LF: DEC RT-11 and most other early non-Unix, non-IBM OSes, CP/M, MP/M, DOS, OS/2, Microsoft Windows, Symbian OS
* CR: Commodore 8-bit machines, Apple II family, Mac OS up to version 9 and OS-9
But since you are not likely to be working with a representation that uses carriage return only, looking for a line feed should be fine.
If you open a file in text mode, i.e., without a b in the second argument to fopen(), you can read characters one-by-one until you hit a '\n' to determine the line size. The underlying system should take care of translating the end of line terminators to just one character, '\n'. The last line of a text file, on some systems, may not end with a '\n', so that is a special case.
Pseudocode:
count := 0
c := next()
while c != EOF and c != '\n':
    count := count + 1
    c := next()
The above counts the number of characters in a given line. next() is a function that returns the next character from your file.
Alternatively, you can use fgets() with a buffer:
char buf[SIZE];
count = 0;
while (fgets(buf, sizeof buf, fp) != NULL) {
/* see if the string represented by buf has a '\n' in it,
if yes, add the index of that '\n' to count, and that's
the number of characters on that line, which you can
return to the caller. If not, add sizeof buf - 1 to count */
}
/* If count is non-zero here, the last line ended without a newline */
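One possible way to fill in the comment above, as a sketch only: the function name line_length, the 16-byte buffer, and the -1 return value at end of file are assumptions, and the code counts the characters actually read in each chunk with strcspn().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from `fp` in small chunks and return
   its length without the newline, or -1 if nothing is left to read. */
long line_length(FILE *fp)
{
    char buf[16];     /* deliberately small; longer lines span several chunks */
    long count = 0;
    int got_any = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t n = strcspn(buf, "\n");  /* characters before the newline, if any */

        got_any = 1;
        count += (long)n;
        if (buf[n] == '\n')
            return count;               /* the whole line has been seen */
    }
    /* Either the last line ended without a newline, or there was no line */
    return got_any ? count : -1;
}
```

Calling it repeatedly on the same stream yields the length of each successive line until it returns -1.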
The original question was how to get the number of characters in "each line" (given a line? or the current line?), while the answers have mostly given solutions how to determine the length of the first line in a file. One can easily apply some of them to determine length of current line (without guessing beforehand maximum length for a buffer).
However, what one often needs in practice is the maximum length of any line in a file. Then one can reserve a buffer and use fgets to read the file line by line and use some nice functions (strtok, strtod etc.) to parse lines. In practice, you can use any of the previous solutions to determine length of one line, and just scan through all lines and take the maximum.
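A simple function that reads the file character by character (the function name is only illustrative):
int maxLineLength(FILE *f)
{
    int c, i = 0, max = 0;
    do {
        c = fgetc(f);
        if (c != EOF && c != '\n') i++;
        else {
            /* end of a line (or of the file): keep the longest so far */
            if (i > max) max = i;
            i = 0;
        }
    } while (c != EOF);
    return max;
}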
Note: In practice, it would suffice to have an upper bound for the maximum length. A dirty solution would be to use the file size as an upper bound for the maximum length of lines.
\n is the newline character in C. In other languages, such as C#, you may use something like Environment.NewLine to handle platform differences.
If you already know that your string is one line (let's call it line), use strlen(line) to get the number of characters in it. Subtract 1 if it ends with the '\n'.
If the string has new line characters in it, you'll need to split it around the new line characters and then call strlen() on each substring.
Here is a simple algorithm.
It requires:
File stream (FILE *)
Line number whose size you want (int), counting from 0
It returns:
Total characters in the given line
Function :
#include <stdio.h>
#include <string.h>
int getLengthOfLine(FILE* df,int Ofline){
int cchar; /* int, not char, so that EOF can be distinguished from data */
int line=1;
int total =1;
int atLine=0;
int afterLine=0;
while ((cchar=fgetc(df))!=EOF)
{
if (feof(df)){
break ;
}
if (cchar == '\n' || cchar == '\0'){
if(line==Ofline){
// printf(" before %d ",total);
atLine = total;
}
if(line==(Ofline+1)){
// printf(" after %d ",total);
afterLine = total-atLine;
}
// printf(" line is %d ",line);
line++;
}
total++;
}
fseek(df, 0L, SEEK_SET);
if(afterLine==0){
return (total-atLine-1);
}
else
{
return (afterLine-1);
}
}
Uses :
FILE* fp = fopen("path-to-file" , "r");
if(fp!=NULL){
printf(" %d",getLengthOfLine(fp,5));
}

Resources