strlen in array


I'm debugging code in which a .ini file is read for the value of a string called Timeout (which is stored in a variable called rbuf). Please tell me what the content of the .ini file must be for the following condition to be true:
if((strlen(rbuf) > 0) && (rbuf[strlen(rbuf)-1] == '\n')){
rbuf[strlen(rbuf)-1] = '\0';
}
When will the debugger go into the above if block?
Please specify the exact content of the rbuf value (buffer value).

When the line has a nonzero string length (anything greater than 0, not counting the null terminator) and the final char before the null terminator is a newline, execution enters the conditional block and replaces that newline with a null terminator.
In order to tell you the exact contents of rbuf, I would need to know the contents of the ini file. But, for example, if you had a line of text in it like:
i love programming
And let's assume there is an undisplayed newline at the end of it.
Then rbuf would start off containing:
i love programming\n\0
That's 20 bytes. strlen will return 19 (not including the null terminator at the end).
rbuf[strlen(rbuf)-1] will be the '\n' character (at index 18 in the buffer).
So your code would see that a newline is at index 18, and set it to '\0', so you end up with:
i love programming\0\0
in your buffer.

Hard to say with the information you have given, but:
(strlen(rbuf) > 0) : rbuf contains a non-empty string
(rbuf[strlen(rbuf)-1] == '\n') : rbuf contains a string that ends with a line break.
Other than that, rbuf might contain only a line break, or it might contain a series of characters followed by a line break.
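
To try the condition from the question against a few candidate buffer values, a small predicate can help. This is only a sketch: the function name would_enter_if and the sample strings such as "Timeout=30\n" are made up for illustration, since the real .ini content is not shown in the question.

```c
#include <string.h>

/* Returns 1 when the condition from the question holds for buf, else 0.
   The name would_enter_if is hypothetical, chosen for this sketch. */
int would_enter_if(const char *buf)
{
    size_t len = strlen(buf);   /* computed once instead of three times */

    /* Non-empty string whose last character is a newline. */
    return len > 0 && buf[len - 1] == '\n';
}
```

Under these assumptions, a buffer holding "Timeout=30\n" or just "\n" satisfies the condition, while "Timeout=30" (no newline) and the empty string do not.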

Related

Count lines in ASCII file using C

I would like to count the number of lines in an ASCII text file.
I thought the best way to do this would be by counting the newlines in the file:
for (int c = fgetc(fp); c != EOF; c = fgetc(fp)) { /* Count word line endings. */
if (c == '\n') ++lines;
}
However, I'm not sure if this would account for the last line on both MS Windows and Linux. That is, if my text file finishes as below, without an explicit newline, is there one encoded there anyway, or should I add an extra ++lines; after the for loop?
cat
dog
Then what about if there is an explicit newline at the end of the file? Or do I just need to test for this case by keeping track of the previously read value?

If there is no newline, one won't be generated. C tells you exactly what's there.

Text files are always expected to end with a line feed. There's no canonical way of handling files that don't.
Here's how some tools choose to deal with characters after the last line feed:
wc doesn't count it as a line (so you have good precedent for that)
Vim marks the file as [noeol], and saves the file without a trailing line feed
GNU sed treats the file as if it had a last line feed
sh's read exits with error, but still returns the data
Since behaviour is pretty much undefined, you can just do whatever's convenient or useful to you.

First, there will not be any implicitly encoded newline at the end of the last line. The only way there will be a newline is if the software or person that produced the file put it there. Putting it there is generally considered good practice, however.
The ultimate answer for what you should report as the line count depends on the convention that you need to follow for the software or people that will be using this line count, and probably what you can assume about the behavior of the input source as well.
Most command-line tools will terminate their output with a newline character. In this case, the sensible answer may be to report the number of newline characters as the number of actual lines.
On the other hand, when a text editor is displaying a file, you will see that the line numbering in the margin (if supported) contains a number for the last line whether it is empty or not. This is in part to tell the user that there is a blank line there, but if you want to count the number of lines displayed in the margin, it is one plus the number of newline characters in the file. It is typical for some coders to not terminate their last lines with a newline character (sometimes due to sloppiness), so in this case this convention would actually be the right answer.
I'm not sure any other conventions make much sense. For example, if you choose not to count the last line unless it is non-empty, then what counts as non-empty? The file ending after newline? What if there is whitespace on that line? What if there are several empty lines at the end of the file?

If you're going to use this method, you could always keep a separate counter for how many characters you have seen on the current line. If the count at the end is greater than 0, then you know there is stuff on the last line that wasn't counted.
int letters = 0;
for (int c = fgetc(fp); c != EOF; c = fgetc(fp)) { /* Count line endings. */
letters++; // Increase count on character
if (c == '\n')
{
++lines;
letters = 0; // Set back to 0 after new line
}
}
if (letters > 0)
{
++lines; // Last line had no trailing newline
}

Your concern is real: the last line in the file may be missing the final end-of-line marker. The end-of-line marker is a single '\n' on Linux, and a CR LF pair on Windows that the C runtime converts automatically into a '\n' when the file is opened in text mode.
You can simplify your code and handle the special case of the last line missing a linefeed this way:
int c, last = '\n', lines = 0;
while ((c = getc(fp)) != EOF) { /* Count word line endings. */
if (c == '\n')
lines += 1;
last = c;
}
if (last != '\n')
lines += 1;
Since you are concerned with speed, using getc instead of fgetc will help on platforms where it is defined as a macro that handles the stream structures directly and calls a function only to refill the buffer, every BUFSIZ characters or so, unless the stream is unbuffered.

How about this:
Create a flag for yourself to keep track of any non \n characters following a \n that is reset when c=='\n'.
After the EOF, check to see if the flag is true and increment if yes.
bool more_chars = false;
for (int c = fgetc(fp); c != EOF; c = fgetc(fp)) { /* Count word line endings. */
if (c == '\n') {
more_chars = false;
++words;
} else more_chars = true;
}
if(more_chars) words++;

Windows and UNIX/Linux style line breaks make no difference here. On either system a text file may or may not have a newline at the end of the last line.
If you always add 1 to the line count, this effectively counts the empty line at the end of the file when there is a newline at the end (i.e., file "foo\n" will count as having two lines: "foo" and ""). This may be an entirely reasonable solution, depending on how you want to define a line.
Another definition of a "line" is that it always ends in a newline, i.e., the file "foo\nbar" would only have one line ("foo") by this definition. This definition is used by wc.
Of course you could keep track of whether the newline was the last character in file and only add 1 to the count in case it wasn't. Then a "line" would be defined as either ending in a newline or being non-empty at the end of the file, which sounds quite complex to me.
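
The three definitions of a "line" discussed in this thread can be put side by side in one function. The following is a sketch only: the enum line_rule and the function count_lines are names invented here, and the function works on an in-memory buffer rather than a FILE * purely to keep the example short. The same logic carries over to a getc() loop.

```c
#include <stddef.h>

/* Hypothetical names for the three conventions described above. */
enum line_rule {
    NEWLINES_ONLY,      /* like wc: count '\n' characters only */
    NEWLINES_PLUS_ONE,  /* editor margin: always newline count + 1 */
    COUNT_PARTIAL_LAST  /* add one only if data follows the last '\n' */
};

size_t count_lines(const char *data, size_t len, enum line_rule rule)
{
    size_t lines = 0;
    char last = '\n';   /* so an empty buffer has no partial last line */

    for (size_t i = 0; i < len; i++) {
        if (data[i] == '\n')
            lines++;
        last = data[i];
    }

    if (rule == NEWLINES_PLUS_ONE)
        lines++;
    else if (rule == COUNT_PARTIAL_LAST && last != '\n')
        lines++;

    return lines;
}
```

For the "cat\ndog" file from the question (no final newline), the first rule reports 1 line and the other two report 2. For "cat\ndog\n", the rules report 2, 3 and 2 respectively.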

Searching for strings that are NULL terminated within a file where they are not NULL terminated

I am writing a program that opens two files for reading: the first file contains 20 names which I store in an array of the form Names[0] = John\0. The second file is a large text file that contains many occurrences of each of the 20 names.
I need my program to scan the entirety of the second file and, each time it finds one of the names, increment a variable Count, so that on completion of the program the total number of all the names appearing in the text is stored in Count.
Here is my loop which searches for and counts the number of name occurrences:
char LineOfText[85];
char *TempName;
while(fgets(LineOfText, sizeof(LineOfText), fpn)){
for(a = 0; a<NumOfNames; a++){
TempName = strstr(LineOfText, Names[a]);
if(TempName != NULL){
Count++;
}
}
}
No matter what I do, this loop doesn't work as I would expect it to, but I have discovered what is wrong (I think!). My problem is that each name in the array is NULL terminated, but when a name appears in the text file it is not NULL terminated, unless it occurs as the last word of a line. Therefore, this while loop is only counting the number of times any of the names appear at the end of a line, rather than the number of appearances of any of the names anywhere in the text file. How can I adjust this loop to combat this problem?
Thank you for any advice in advance.

The issue here is probably your use of fgets, which does not trim the newline from the line it reads.
If you are creating your names array by reading lines with fgets, then all the names will be terminated with a newline character. The lines in the file being read with fgets will also be terminated with a newline character, so the names will only match at the end of the lines.
strstr does not compare the NUL byte which terminates the pattern string, for obvious reasons. If it did, it would only match suffix strings, which would make it a very different function.
Also, you will only find a maximum of one instance of each name in each line. If you think that a name might appear more than once in the same line, you should replace:
TempName = strstr(LineOfText, Names[a]);
if(TempName != NULL){
Count++;
}
with something like:
for (TempName = LineOfText;
(TempName = strstr(TempName, Names[a])) != NULL;
++Count, ++TempName) {
}
For reference, here is the definition of fgets from the C standard (emphasis added):
The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
This is different from gets, which does not retain the new-line character.

I think the NULL termination of the names array is not an issue (See strstr function reference). The strstr function is not going to compare the terminator. You do have the possibility of missing additional names on each line. See my adjustment below for an example of how you could count multiple names on each line.
char LineOfText[85];
char *TempName;
while(fgets(LineOfText, sizeof(LineOfText), fpn)){
for(a = 0; a<NumOfNames; a++){
TempName = strstr(LineOfText, Names[a]);
/* Iterate through line for multiple occurrences of each name */
while(TempName != NULL){
Count++;
/* Get next occurrence of name on line. fgets is going to
leave a newline at the end of the LineOfText string so
unless some of your names contain a newline, it shouldn't
move past the end of the buffer */
TempName = strstr(TempName + 1, Names[a]);
}
}
}
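
The per-line counting described in the answers above can be isolated into a small helper, which also makes the newline problem easy to reproduce. This is a sketch under assumptions: count_occurrences is a name made up here, and it counts overlapping matches because it advances one character at a time, as both answers do.

```c
#include <string.h>

/* Count occurrences of name in line, advancing one character after
   each match (so overlapping matches are counted too).
   count_occurrences is a hypothetical helper for this sketch. */
size_t count_occurrences(const char *line, const char *name)
{
    size_t count = 0;

    if (*name == '\0')
        return 0;   /* an empty pattern would match at every position */

    for (const char *p = strstr(line, name);
         p != NULL;
         p = strstr(p + 1, name))
        count++;

    return count;
}
```

With the line "John met John and Johnny\n", the pattern "John" is found three times. If the name still carries the newline kept by fgets, as in "John\n", it can only match at the very end of a line, which is the symptom described in the question.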

Reading CR terminated keyword text file

I want to read a list of keywords from a (text) file and then add those to a CString array in C. The trouble is that I am reading the file line by line, and the file contains one word on every line. I can successfully populate the array, but when I try to look up these keywords in another string, it returns false, I am guessing because the keyword has \n at the end.
Another way I could read the file would be to make the text file a comma-separated file, and read one line and tokenize it. But then I won't know how to read a line whose size can be VERY large, as the list of keywords is ever expanding.
Saad Rehman

If your problem is that a string may have a rogue newline at the end, you can use:
size_t len = strlen (mystring);
if (len > 0)
if (mystring[len-1] == '\n')
mystring[--len] = '\0';
Do this to mystring after you've read it in but before you use it.
It simply checks if the last character is a newline and, if so, replaces it with a string terminator.
The first check is to ensure you don't try this on an empty string where mystring[-1] would invoke the dreaded undefined behaviour.
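
A common alternative to the length check above is strcspn, which returns the number of leading characters that are not in the given set. The sketch below uses it to cut the string at the first carriage return or line feed, so it also copes with files that use CR LF endings. The name trim_line_ending is invented for this example, and the approach assumes the keyword itself never contains a CR or LF in the middle.

```c
#include <string.h>

/* Truncate s at the first '\r' or '\n', if any.
   trim_line_ending is a hypothetical name used for this sketch. */
void trim_line_ending(char *s)
{
    /* strcspn returns strlen(s) when neither character is present,
       in which case this just rewrites the existing terminator. */
    s[strcspn(s, "\r\n")] = '\0';
}
```

Because strcspn returns 0 for an empty string, no separate check for a zero length is needed here.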

Using fgets to read strings from file in C

I am trying to read strings from a file that has each string on a new line but I think it reads a newline character once instead of a string and I don't know why. If I'm going about reading strings the wrong way please correct me.
i=0;
F1 = fopen("alg.txt", "r");
F2 = fopen("tul.txt", "w");
if(!feof(F1)) {
do{ //start scanning file
fgets(inimene[i].Enimi, 20, F1);
fgets(inimene[i].Pnimi, 20, F1);
fgets(inimene[i].Kood, 12, F1);
printf("i=%d\nEnimi=%s\nPnimi=%s\nKaad=%s",i,inimene[i].Enimi,inimene[i].Pnimi,inimene[i].Kood);
i++;}
while(!feof(F1));};
/*finish getting structs*/
The printf is there to let me see what was read into what and here is the result
i=0
Enimi=peter
Pnimi=pupkin
Kood=223456iatb i=1
Enimi=
Pnimi=masha
Kaad=gubkina
i=2
Enimi=234567iasb
Pnimi=sasha
Kood=dudkina
As you can see, after the first struct is read there is a blank (a newline?) once, and then everything is shifted. I suppose I could read a dummy string to absorb that extra blank and then nothing would be shifted, but that doesn't help me understand the problem and avoid it in the future.
Edit 1: I know that it stops at a newline character but still reads it. I'm wondering why it doesn't read it during the third string and transfers to the fourth string instead of giving the fourth string the fourth line of the source but it happens just once.
The file is formatted like this by the way
peter
pupkin
223456iatb
masha
gubkina
234567iasb
sasha
dudkina
123456iasb

fgets stops reading when it reads a newline, but the newline is considered a valid character and is included in the returned string.
If you want to remove it, you'll need to trim it yourself:
length = strlen(str);
if (length > 0 && str[length - 1] == '\n')
str[length - 1] = '\0';
Where str is the string into which you read the data from the file, and length is of type size_t.
To answer the edit to the question: the reason the newline is not read during the third read is because you are not reading enough characters. You give fgets a limit of 12 characters, which means it can actually read a maximum of 11 characters since it has to add the null terminator to the end.
The line you read is 11 characters in length before the newline, counting a trailing space, so the newline itself is left unread for the next fgets call. Note that there is a space at the end of that line when you output it:
Kood=223456iatb i=1
               ^

As already stated, if there's enough room in the buffer, then fgets() reads the data including the newline into the buffer and null terminates the line. If there isn't enough room in the buffer before coming across the newline, fgets() copies what it can (the length of the buffer minus one byte) and null terminates the string. The library resumes reading from where fgets() left off on the next iteration.
Don't mess with buffers smaller than 2 bytes long.
Note that gets() removes the newline (but does not protect you from buffer overflows, so do not use it). gets() was removed from the C standard in C11; it will be a long time before it is removed from C libraries (it has simply become a non-standard, or ex-standard, additional function available for abuse).
Your code should check each of the fgets() function calls:
while (fgets(inimene[i].Enimi, 20, F1) != 0 &&
fgets(inimene[i].Pnimi, 20, F1) != 0 &&
fgets(inimene[i].Kood, 12, F1) != 0)
{
printf("i=%d\nEnimi=%s\nPnimi=%s\nKaad=%s", i, inimene[i].Enimi, inimene[i].Pnimi, inimene[i].Kood);
i++;
}
There are places for do/while loops; they are not used very often, though.

The fgets function reads the newline character as part of the string it reads.
From the description of fgets:
The fgets() function shall read bytes from stream into the array pointed to by s, until n-1 bytes are read, or a newline is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte.

If Enimi/Pnimi/Kood are arrays, not pointers:
while( fgets(inimene[i].Enimi,sizeof inimene[i].Enimi,F1) &&
fgets(inimene[i].Pnimi,sizeof inimene[i].Pnimi,F1) &&
fgets(inimene[i].Kood,sizeof inimene[i].Kood,F1) )
{
if( strchr(inimene[i].Enimi,'\n') ) *strchr(inimene[i].Enimi,'\n')=0;
if( strchr(inimene[i].Pnimi,'\n') ) *strchr(inimene[i].Pnimi,'\n')=0;
if( strchr(inimene[i].Kood,'\n') ) *strchr(inimene[i].Kood,'\n')=0;
printf("i=%d\nEnimi=%s\nPnimi=%s\nKaad=%s", i, inimene[i].Enimi, inimene[i].Pnimi,inimene[i].Kood);
i++;
}
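
The short-buffer behaviour described in the answers above (an 11-character line read into a 12-byte buffer leaves its newline behind) can be handled in one place. The helper below is a sketch, not the asker's code: read_field is a made-up name, and it chooses to silently discard whatever part of a line does not fit, which may or may not be what a real program wants.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (at most size-1 characters), drop its newline,
   and discard any part of the line that did not fit.
   Returns 1 on success, 0 on end-of-file, error, or a buffer too small
   to hold anything. read_field is a hypothetical helper. */
int read_field(char *buf, size_t size, FILE *fp)
{
    if (size < 2 || fgets(buf, (int)size, fp) == NULL)
        return 0;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';    /* the whole line fit */
    } else {
        int c;                  /* line was cut short: skip the rest */
        while ((c = getc(fp)) != EOF && c != '\n')
            ;
    }
    return 1;
}
```

Called as read_field(inimene[i].Kood, sizeof inimene[i].Kood, F1), this would consume the leftover newline after "223456iatb " so that the next call starts at "masha".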

insert text inside a line

I have a file pointer which I am using with fgets() to give me a complete line along with the new line in the buffer.
I want to replace 1 char and add another character before the new line. Is that possible?
For example:
buffer is "12345;\n"
output buffer is "12345xy\n"
This is the code:
buff = fgets((char *)newbuff, IO_BufferSize , IO_handle[i_inx]->fp);
nptr = IO_handle[i_inx]->fp;
if(feof(nptr))
{
memcpy((char *)o_rec_buf+(strlen((char *)newbuff)-1),"E",1);
}
else
{
memcpy((char *)o_rec_buf+(strlen((char *)newbuff)-1),"R",1);
}
As you can see I am replacing the new line here (example line is shown above).
I want to insert the text and retain the new line instead of what I am doing above.

You can't insert one character the way you want to. If you are sure the o_rec_buf has enough space, and that the line will always end in ";\n", then you can do something like:
size_t n = strlen(newbuff);
if (n >= 2)
strcpy(o_rec_buf + n - 1, "E\n");
/* memcpy(o_rec_buf+n-1, "E\n", 3); works too */
Note that using feof() the way you do is an error most of the time. feof() tells you that you hit the end-of-file condition only after a read has already hit it. If you are running the above code in a loop, when feof() returns 'true', no line will have been read by fgets, and buff will be NULL, but newbuff will be unchanged. In other words, newbuff will contain data from the last successful fgets call, and you will process the last line twice. See CLC FAQ 12.2 for more, and a solution.
Finally, why all the casts? Are o_rec_buf and newbuff not of type char *?
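
One way to tell whether the line just read is the last one, without relying on feof() before a read, is to look ahead one character with getc() and push it back with ungetc(). The sketch below records an 'R' for every record that is followed by more data and an 'E' for the final one. The function name tag_records and the 256-byte line buffer are assumptions made for this example; a line longer than the buffer would be seen as more than one record.

```c
#include <stdio.h>

/* For each line in fp, store 'R' if more input follows it or 'E' if it
   is the last line. Returns the number of tags written (at most
   max_tags). tag_records is a hypothetical helper for this sketch. */
size_t tag_records(FILE *fp, char *tags, size_t max_tags)
{
    char line[256];     /* assumed large enough for one record */
    size_t n = 0;

    /* Loop on the result of fgets() itself, not on feof(). */
    while (n < max_tags && fgets(line, sizeof line, fp) != NULL) {
        int c = getc(fp);       /* look ahead one character */
        if (c == EOF) {
            tags[n++] = 'E';    /* nothing follows: last record */
        } else {
            ungetc(c, fp);      /* put it back for the next fgets() */
            tags[n++] = 'R';
        }
    }
    return n;
}
```

For a file containing three lines, this would produce the tags R, R, E.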

If the buffer has enough space, you'll need to move the trailer 1 character further, using memmove and update the char you need.
Make sure you do not forget to memmove the trailing '\0'.
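
The memmove approach described above might look like the following. It is a minimal sketch: insert_char is a name invented here, and the caller is assumed to know the real capacity of the buffer, which the question does not show.

```c
#include <string.h>

/* Insert ch at index pos of the string in buf, whose total capacity is
   cap bytes. Returns 1 on success, 0 if pos is past the end of the
   string or there is no room for one more character.
   insert_char is a hypothetical helper for this sketch. */
int insert_char(char *buf, size_t cap, size_t pos, char ch)
{
    size_t len = strlen(buf);

    if (pos > len || len + 2 > cap)
        return 0;

    /* Shift the tail one place right; the + 1 carries the '\0' along. */
    memmove(buf + pos + 1, buf + pos, len - pos + 1);
    buf[pos] = ch;
    return 1;
}
```

For the example in the question, with "12345;\n" in a sufficiently large buffer, setting buf[5] = 'x' and then calling insert_char(buf, sizeof buf, 6, 'y') leaves "12345xy\n" in the buffer.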