Checking For a New Line in a Text File - c

I'm trying to write code that reads from a file and works with the characters it reads. The gist is that it has to correct the capitalization errors present in the file it reads.
One particular requirement is that I have to number each line, so I wrote a bit to determine whether or not each character read is a line break.
int fix_caps(char* ch, int* char_in_word, int* line_num){
char a;
ch = &a;
if(a != '\n'){
return 0;
}else{
return 1;
}
if(a == ' ')
*char_in_word = 0;
if(*char_in_word == 1)
a = toupper(a);
if(*char_in_word > 1)
a = tolower(a);
char_in_word++;
}
However, the function this is in always returns 0, when it should return 1 at the end of each line. What am I doing wrong?

Execution never gets beyond this if block:
char a;
ch = &a;
if(a != '\n'){
return 0;
}else{
return 1;
}
There are a few reasons it 'always' returns 0:
1) 'a' is an uninitialized local variable on the stack, so it could contain anything. The assignment ch = &a also discards the pointer the caller passed in, so the character that was actually read is never examined.
2) The odds are about 255:1 against the 'trash' that is on the stack where 'a' is located happening to be a newline character.
Nothing beyond the if block is ever executed, because an if/else block has only two execution paths and both paths contain a return statement.
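A minimal sketch of how the function could be restructured, assuming the caller passes a pointer to the character it has just read and wants it corrected in place. The name and parameters are taken from the question; the behavior (capitalize the first letter of each word, lower-case the rest, count a line on every newline) is my reading of the intent, not something the question confirms.

```c
#include <ctype.h>

/* Hypothetical rewrite of fix_caps: the character is read and written
   through ch instead of through an uninitialized local.
   Returns 1 if *ch is a line break, 0 otherwise. */
int fix_caps(char *ch, int *char_in_word, int *line_num)
{
    unsigned char c = (unsigned char)*ch;

    if (c == '\n') {
        (*line_num)++;          /* a new line starts after this character */
        *char_in_word = 0;
        return 1;
    }
    if (isspace(c)) {
        *char_in_word = 0;      /* the next letter starts a new word */
        return 0;
    }
    (*char_in_word)++;          /* increment the value, not the pointer */
    *ch = (char)(*char_in_word == 1 ? toupper(c) : tolower(c));
    return 0;
}
```

Note that the question's char_in_word++ advanced the pointer itself; (*char_in_word)++ is what increments the counter. The only return statements that end the function early are the ones for whitespace, so the capitalization code is reachable for every other character.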

Related

Control flow understanding

I'm having a lot of trouble understanding the following program flow:
#include <stdio.h>
#include <ctype.h>
int main ()
{
FILE *fp;
int index = 0;
char word[45];
int words = 0;
fp = fopen("names.txt","r");
if(fp == NULL)
{
perror("Error in opening file");
return(-1);
}
for (int c = fgetc(fp); c != EOF; c = fgetc(fp))
{
// allow only alphabetical characters and apostrophes
if (isalpha(c) || (c == '\'' && index > 0))
{
// append character to word
word[index] = c;
index++;
// ignore alphabetical strings too long to be words
if (index > 45)
{
// consume remainder of alphabetical string
while ((c = fgetc(fp)) != EOF && isalpha(c));
// prepare for new word
index = 0;
}
}
// ignore words with numbers (like MS Word can)
else if (isdigit(c))
{
// consume remainder of alphanumeric string
while ((c = fgetc(fp)) != EOF && isalnum(c));
// prepare for new word
index = 0;
}
// we must have found a whole word
else if (index > 0)
{
// terminate current word
word[index] = '\0';
// update counter
words++;
//prepare for next word
printf("%s\n", word);
index = 0;
}
//printf("%s\n", word);
}
printf("%s\n", word);
fclose(fp);
return(0);
}
As you can see, it's just a plain program that stores characters from words into an array, back-to-back, from a file called 'names.txt'.
My problem resides in the else if(index > 0) condition.
I've run a debugger and, obviously, the program works correctly.
Here's my question:
On the first for-loop iteration, index becomes 1. Otherwise, we wouldn't be able to store a whole word in an array.
If so, how is it possible that, when the program flow reaches the else if (index > 0) condition, it doesn't set word[1] to 0? (Or the subsequent values of index).
It just finishes the whole word and, once it has reached the end of the word, then it proceeds to give word[index] the value of 0 and proceed to the next word.
I've tried reading the documentation, running half of the program and asking with echo, and running a debugger. As it should be, everything runs perfectly, there's no problem with the code (as far as I know). I'm the problem. I just can't get how it works.
PS: sorry if this is trivial for some of you. I'm just starting to learn programming, and I sometimes find it really hard to understand apparently simple concepts.
Thank you very much for your time, guys.
As soon as one branch of an if...else if chain is executed, control leaves the whole chain. So if the first if condition is satisfied, the else if conditions are not even checked. Here the first condition is true when c is a letter, or when c is an apostrophe and index > 0; only when it is false does execution move on to the else if parts of the chain.
Note the else in the beginning of the else if (index > 0) condition.
This means that it will only execute if none of the previous if() and else if() executed.
The previous if() and else if() branches keep executing while the character is alphanumeric or a non-leading apostrophe, so the last else if() only executes once some other character (for example a space, or a leading apostrophe) is encountered and index > 0.
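To see the chain in isolation, here is a small sketch that reports which branch would run for a given character and index. The enum and the function name which_branch are made up for illustration; the three conditions are copied from the program in the question.

```c
#include <ctype.h>

/* Hypothetical labels for the branches of the question's if / else if chain. */
enum branch { APPEND, SKIP_NUMBER, END_WORD, NOTHING };

/* Returns the one branch that would execute for character c when
   `index` characters of the current word are already stored. */
enum branch which_branch(int c, int index)
{
    if (isalpha(c) || (c == '\'' && index > 0))
        return APPEND;          /* the later conditions are never evaluated */
    else if (isdigit(c))
        return SKIP_NUMBER;
    else if (index > 0)
        return END_WORD;        /* reached only if both tests above failed */
    return NOTHING;             /* e.g. a space between two words */
}
```

For a letter with index 1, the result is APPEND, even though index > 0 is also true: the third test is simply never reached. Only a character such as a space, seen while index > 0, gets as far as END_WORD.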

Infinite Loop on Get_Next_Line in C

I have to create a C function that returns a line read from a file descriptor. I have to define a macro READ_SIZE (that can be editable). This READ_SIZE indicates the number of characters to read at each call of read(). The number can only be positive.
I also have to use one or several static variables to save the characters that were read but not sent back to the calling function. One .C file (5 functions max, 25 lines max per function) and one .h file only.
My function get_next_line shall return the line without the '\n'. If there is nothing more to read on the file descriptor, or if an error occurs while reading, the function returns NULL.
Here is the prototype of the function:
char *get_next_line(const int fd)
FUNCTIONS ALLOWED: malloc, free, read, write (to use with my_putchar, my_putstr, etc).
Here is what I have, but it doesn't work. It goes into an infinite loop, and I am trying to find out why.
char *my_strcat(char *str1, char *str2)
{
int i;
int j;
int s;
char *strfinal;
i = 0;
j = 0;
s = 0;
if ((strfinal = malloc(sizeof(char) * (my_strlen(str1) + my_strlen(str2)
+ 1))) == NULL)
return (NULL);
while (str1[i] != '\0')
{
strfinal[j] = str1[i];
i++;
j++;
}
while (str2[s] != '\0')
{
strfinal[j] = str2[s];
s++;
j++;
}
free(str1);
strfinal[j] = '\0';
return (strfinal);
}
char *get_next_line(const int fd)
{
int n;
int i;
char *str_to_return;
static char buff[READ_SIZE] = {'\0'};
n = 1;
i = 0;
str_to_return = NULL;
while (n)
{
if (i == 0 && buff == '\0')
{
if ((read(fd, buff, READ_SIZE)) <= 0)
return(str_to_return);
if (i == READ_SIZE - 1 || buff[i] == '\n')
{
n = 0;
str_to_return = my_strcat(buff, str_to_return);
i = -1;
}
}
i++;
}
printf("%s\n", str_to_return);
return (str_to_return);
}
In this code:
while (str1[i] != '\0')
{
strfinal[j] = str1[i];
i++;
j++;
}
What guarantee do you have that there will be a null character \0 somewhere in str1[]?
The same goes for the str2 while loop.
If no null character is encountered, the loop runs past the end of the array, which is undefined behavior and can show up as an endless loop or a crash.
Verify that the functions you use to put characters into the memory behind str1[] and str2[] also write the null character. Since the only function you call beforehand is read(), which never adds a terminator, the answer is no.
The problem with your two while loops for str1[] and str2[] is that you are relying on the null character already being there in memory. That raises the question: who put that data in memory, and were they required to terminate the character data with a null character?
You therefore need some control over any loop you write so that it cannot run away; in this case, maybe pass a length or use a counter, and stop after that many accesses of str1[i] even if you have not yet seen a null character.
For example, the fgets() function reads at most a given number of characters from a FILE stream into an array and, when it succeeds, always terminates the array with the null character.
if (i == 0 && buff == '\0')
is always false, because your definition of buff is
static char buff[READ_SIZE] = {'\0'};
You are attempting to test whether buff is empty when i is 0. However, buff is an array; in this comparison it decays to a pointer to its first element, which is an address and is never null. You mean to write the if as
if (i == 0 && buff[0] == '\0')
in order to check whether the first character is the null character.
However, once i is incremented, the condition always fails, even if you test against
if (i == 0 && buff[i] == '\0')
in order to find the null character within the buffer. Since you enter the while with i = 0 and are checking whether buff is empty, you do not need the while.
If you want to just fill the buffer and keep reading until it is full, you need a different type of test. You also need a way to exit the while loop when the if fails (put in an else to determine what to do).
You also do not need to check each character in buff against '\0' as long as the buffer is guaranteed to end with one (the initialization provides that); in that case strlen(buff) would be valid. Keep in mind that read() does not add a terminator, so the guarantee is lost if read() fills all READ_SIZE bytes.
Another point is that when you call my_strcat() you have already verified that the buffer is empty.
Also, since one of the strings in the call holds what you just read in, my_strcat() will not always find a '\0' at the end of it. You should pass the number of characters to use instead.
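Putting these points together, one possible shape for the function is sketched below. It is only a sketch under a few assumptions: READ_SIZE is given a default of 16 here, the helper append_char is invented for the example, the static state serves a single file descriptor at a time, and the one-character-at-a-time growth is chosen for brevity rather than speed. It tracks how many bytes read() returned instead of looking for a '\0' in the buffer.

```c
#include <stdlib.h>
#include <unistd.h>

#ifndef READ_SIZE
#define READ_SIZE 16            /* assumed default; the assignment lets you change it */
#endif

/* Hypothetical helper: returns a new string holding line[0..len) plus c,
   and frees the old one. Returns NULL if malloc fails. */
static char *append_char(char *line, size_t len, char c)
{
    char *bigger = malloc(len + 2);
    size_t i;

    if (bigger != NULL) {
        for (i = 0; i < len; i++)
            bigger[i] = line[i];
        bigger[len] = c;
        bigger[len + 1] = '\0';
    }
    free(line);
    return bigger;
}

char *get_next_line(const int fd)
{
    static char buff[READ_SIZE];
    static ssize_t filled = 0;  /* bytes currently held in buff */
    static ssize_t pos = 0;     /* next unread byte in buff */
    char *line = NULL;
    size_t len = 0;

    for (;;) {
        if (pos >= filled) {    /* buffer used up: refill it */
            filled = read(fd, buff, READ_SIZE);
            pos = 0;
            if (filled <= 0) {  /* 0 means end of file, -1 means error */
                if (filled < 0) {
                    free(line);
                    line = NULL;
                }
                filled = 0;
                return line;    /* last unterminated line, or NULL */
            }
        }
        if (buff[pos] == '\n') {
            pos++;
            /* an empty line is returned as "" rather than NULL */
            return line != NULL ? line : append_char(NULL, 0, '\0');
        }
        line = append_char(line, len++, buff[pos++]);
        if (line == NULL)
            return NULL;
    }
}
```

The loop ends on every path: either a newline is found, read() reports end of file or an error, or malloc fails. The caller owns each returned string and should free() it.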

Can't count '|' symbols in a .c file

Basically I have to write a program that counts all kinds of different symbols in a .c file. I got it to work with all of the needed symbols except the vertical line '|'. For some reason it just won't count them.
Here's the method I'm using:
int countGreaterLesserEquals(char filename[])
{
FILE *fp = fopen(filename,"r");
FILE *f;
int temp = 0; // ASCII code of the character
int capital = 0;
int lesser = 0;
int numbers = 0;
int comments = 0;
int lines = 0;
int spc = 0;
if (fp == NULL) {
printf("File is invalid\\empty.\n");
return 0;
}
while ((temp = fgetc(fp)) != EOF) {
if (temp >= 'a' && temp <= 'z') {
capital++;
}
else if (temp >= 'A' && temp <= 'Z') {
lesser++;
}
else if( temp == '/') temp = fgetc(fp); {
if(temp == '/')
comments++;
}
if (temp >= '0' && temp <= '9') {
numbers++;
}
if (temp == '|') {
spc++;
}
if (temp == '\n') {
lines++;
}
}
}
On this line:
else if( temp == '/') temp = fgetc(fp); {
I believe you have a misplaced {. As I understand it, it should come before temp = fgetc(fp);.
You can easily avoid such errors by following coding style guidelines: place each statement on its own line and indent the code properly.
Update: This fgetc is also a corner case. What if you hit EOF here? You are not checking for it.
Firstly, some compiler warnings:
'f' : unreferenced local variable
not all control paths return a value
So, f can be removed, and the function should return a value on success too. It's always a good idea to set compiler warnings at the highest level.
Then, there is a problem with:
else if( temp == '/') temp = fgetc(fp); {
if(temp == '/')
comments++;
}
Check the ; after temp = fgetc(fp): it ends the else if statement, which means the block following it is always executed. Also, for this fgetc() there is no check for EOF or an error.
Also, if temp is a /, but the following character is not, it will be skipped, so we need to put the character back into the stream (easiest solution in this case).
Here is a full example:
int countGreaterLesserEquals(char filename[])
{
FILE *fp = fopen(filename, "r");
int temp = 0; // ASCII code of the character
int capital = 0;
int lesser = 0;
int numbers = 0;
int comments = 0;
int lines = 0;
int spc = 0;
if (fp == NULL) {
printf("File is invalid\\empty.\n");
return 0;
}
while ((temp = fgetc(fp)) != EOF) {
// check characters - check most common first
if (temp >= 'a' && temp <= 'z') lesser++;
else if (temp >= 'A' && temp <= 'Z') capital++;
else if (temp >= '0' && temp <= '9') numbers++;
else if (temp == '|') spc++;
else if (temp == '\n') lines++;
else if (temp == '/') {
if ((temp = fgetc(fp)) == EOF)
break; // handle error/eof
if (temp == '/') comments++;
else ungetc(temp, fp); // put character back into the stream
}
}
fclose (fp); // close as soon as possible
printf("capital: %d\nlesser: %d\ncomments: %d\n"
"numbers: %d\nspc: %d\nlines: %d\n",
capital, lesser, comments, numbers, spc, lines
);
return 1;
}
While it is usually recommended to put if statements inside curly braces, I think in this case we can place them on the same line for clarity.
Each if can be preceded with an else in this case. That way the program doesn't have to check the remaining cases when one is already found. The checks for the most common characters are best placed first for the same reason (but that was the case).
As an alternative you could use islower(temp), isupper(temp) and isdigit(temp) for the first three cases.
Performance:
For the sake of completeness: while this is probably an exercise on small files, for larger files the data should be read in buffers for better performance (or even using memory mapping on the file).
Update, @SteveSummit's comment on fgetc performance:
Good answer, but I disagree with your note about performance at the
end. fgetc already is buffered! So the performance of straightforward
code like this should be fine even for large inputs; there's typically
no need to complicate the code due to concerns about "efficiency".
Whilst this comment seemed to be valid at first, I really wanted to know what the real difference in performance would be (since I never use fgetc I hadn't tested this before), so I wrote a little test program:
Open a large file and sum every byte into a uint32_t, which is comparable to scanning for certain chars as above. Data was already cached by the OS disk cache (since we're testing performance of the functions/scans, not the reading speed of the hard disk). While the example code above was most likely for small files, I thought I might put the test results for larger files here as well.
Those were the average results:
- using fgetc : 8770
- using a buffer and scan the chars using a pointer : 188
- use memory mapping and scan chars using a pointer : 118
Now, I was pretty sure using buffers and memory mapping would be faster (I use those all the time for larger data), but the difference in speed is even bigger than expected. OK, there might be some possible optimizations for fgetc, but even if those doubled the speed, the difference would still be large.
Bottom line: Yes, it is worth the effort to optimize this for larger files. For example, if processing the data of a file takes 1 second with buffers/mmap, it would take over a minute with fgetc!
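For reference, the buffered variant can look roughly like the sketch below. The function name count_char_buffered and the 64 KiB block size are arbitrary choices for the example, not the code used for the measurements above.

```c
#include <stdio.h>

/* Hypothetical helper: counts how often `target` occurs in an open stream,
   reading blocks with fread() instead of calling fgetc() once per byte. */
long count_char_buffered(FILE *fp, int target)
{
    unsigned char buf[65536];
    size_t n;
    long count = 0;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (buf[i] == target)
                count++;
        }
    }
    return count;
}
```

Calling count_char_buffered(fp, '|') on a file opened with fopen() would give the spc count from the example; the other counters could be updated in the same inner loop. Note that a check like the // comment detection needs extra care here, because the two slashes can be split across two blocks.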

Issues with 'read' system function

I was writing a program which must read a word, separated from others by a ' ' or \n, from a file (I decided to use the 'read' system function for it), and I ran into an issue with it.
In particular, as the manual says, 'read' must return 0 when there is nothing else to read (EOF is reached), but in my case the last thing I get is \n (I checked the ASCII code of the symbol and it's 10, which is \n; I ran my program a number of times and it always gives the same result). Here's the code:
char *read_word_from_file(int fd, int *flag)
{
int i = 0;
char ch, *buf = NULL;
if (!read(fd, &ch, 1)) { //file is empty
*flag = 1;
return NULL;
}
while (ch != ' ' && ch != '\n') {
if (!(buf = (char *) realloc(buf, i + 1))) goto mem_err;
buf[i++] = ch;
if (!(read(fd, &ch, 1))) {
*flag = 1;
break;
}
}
buf[i] = '\0';
return buf;
mem_err:
perror("realloc");
exit(1);
}
(The flag variable is used to indicate EOF to the outer function, which calls this one.) So, my question is: is such behavior normal, or did I make a mistake somewhere?
P.S. Off-topic question: how do you make part of the text (a single word) shaded like the code samples?
gedit simply doesn't show newlines in files. Why is there a newline in your file? Depends on how you created that file. If you used for example puts("M"); then you should understand that puts() adds a newline. If you created it with an editor, you should understand that editors usually write complete lines ending in a newline.
– Jens
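It may also help to keep the two values apart: read() returns the number of bytes it stored, while the byte itself ends up in ch. A small sketch, assuming a wrapper named read_byte (made up for this example):

```c
#include <unistd.h>

/* Hypothetical wrapper around read() for a single byte.
   Returns 1 if a byte was stored in *out, 0 at end of file, -1 on error. */
int read_byte(int fd, char *out)
{
    ssize_t n = read(fd, out, 1);

    if (n == 1)
        return 1;
    return n == 0 ? 0 : -1;
}
```

With a file containing "M\n", the first call stores 'M', the second stores '\n' (code 10), and only the third returns 0. So seeing 10 as the last character is consistent with a file that ends in a newline. Also note that a test like !read(fd, &ch, 1) treats an error (-1) the same as a successful read, which the wrapper above avoids.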

Write a program that reads input up to # and reports the number of times that the sequence ei occurs

I have this exercise and I found some code for it here, but I'm unable to figure out what the int c1 part does. Here's the code:
#include <stdio.h>
int main(void) {
int c;
int ei_count = 0;
while ((c = getchar())!= '#') {
if (c == 'e') {
int c1 = getchar();
if (c1 == 'i')
ei_count++;
}
}
printf("ei appeared %d times\n", ei_count);
return(0);
}
My question is, how does the if condition work? Can someone please explain ?
I'm new at C
The c1 part is a flawed attempt at reading the second character of ei; they could have reused c without introducing more errors.
Better alternative:
#include <stdio.h>
int main(void) {
int c, last = 0, ei_count = 0;
while ((c = getchar()) >= 0 && c != '#') {
ei_count += last && c == 'i';
last = c == 'e';
}
printf("ei appeared %d times\n", ei_count);
}
Corrected errors:
After an e, a following e (as in eei) or a # was consumed without being recognized.
Infinite loop on EOF / input error.
Random facts:
main has an implicit return 0; just before the closing brace.
getchar() returns an int, so it can return EOF (a negative value, commonly -1) on failure and an unsigned char converted to int on success. Always check for failure.
logical and comparison operators always return 0 or 1.
0 is logical false, all else is logical true.
return is not a function call: Use return 0; without parentheses.
The part you mentioned is a way to find a pattern like "ei". First the code looks for the 'e' character in a loop, and once it is found, the code checks whether the next char is the letter 'i'. If not, it starts the loop again to find another 'e' char.
This is not a good approach, since the result of getchar() is never checked for errors and you can end up in an infinite loop.
Oversimplified, it's just a state machine.
Stepping through line-by-line:
while ((c = getchar())!= '#') {
Read input and assign it to the variable c. If that read in char is anything but #, execute the body of the while, otherwise jump over it.
if (c == 'e') {
If the read in character is an e, then we want to execute the internal block. If it's not, skip to the end of this block.
int c1 = getchar();
Read another character.
if (c1 == 'i') ei_count++;
If the new character is an i, then increment the counter of found items.
} close if (if e was found)
} close while.
It's worth pointing out there is a very clear flaw in the logic flow, however. Think about what happens if you have the input "eei".
In C, char and int are both integer types, and in an assignment the result of the expression is converted to the type of the variable being assigned to. Probably the author of this code assumed that the value returned by getchar(), which is of type int, cannot be assigned to a char variable. It can: the value is converted to char during assignment, and c1 could then still be compared with the character i. However, getchar() returns an int precisely so that EOF can be distinguished from every valid character, so keeping c1 (and c) as int is the safer choice.
This code is a little bit buggy: if you enter e# as input, the program does not stop. The reason is that the c1 = getchar(); statement in the if block consumes the '#', the comparison with 'i' fails, and on the next iteration of the while loop getchar() reads whatever follows the '#' in the input stream. The loop then only ends if another '#' is entered; at end of input, getchar() keeps returning EOF, which never equals '#', so the loop never terminates.
Here's my code with a few fixes:
#include <stdio.h>
int main(void) {
int ch, next; // int, so that EOF can be told apart from real characters
int ei_cnt = 0;
while((ch = getchar()) != EOF && ch != '#') {
if(ch == 'e') {
next = getchar();
if(next == EOF || next == '#')
break;
if(next == 'i')
ei_cnt++;
else if(next == 'e')
ungetc(next, stdin); // re-read this e, so "eei" still counts
}
}
printf("ei substring occurred %d %s.\n", ei_cnt,
ei_cnt == 1 ? "time" : "times");
return 0;
}

Resources