Control flow understanding - c

I'm having a lot of trouble understanding the following program flow:
#include <stdio.h>
#include <ctype.h>
int main ()
{
FILE *fp;
int index = 0;
char word[46]; // room for 45 characters plus the terminating '\0'
int words = 0;
fp = fopen("names.txt","r");
if(fp == NULL)
{
perror("Error in opening file");
return(-1);
}
for (int c = fgetc(fp); c != EOF; c = fgetc(fp))
{
// allow only alphabetical characters and apostrophes
if (isalpha(c) || (c == '\'' && index > 0))
{
// append character to word
word[index] = c;
index++;
// ignore alphabetical strings too long to be words
if (index > 45)
{
// consume remainder of alphabetical string
while ((c = fgetc(fp)) != EOF && isalpha(c));
// prepare for new word
index = 0;
}
}
// ignore words with numbers (like MS Word can)
else if (isdigit(c))
{
// consume remainder of alphanumeric string
while ((c = fgetc(fp)) != EOF && isalnum(c));
// prepare for new word
index = 0;
}
// we must have found a whole word
else if (index > 0)
{
// terminate current word
word[index] = '\0';
// update counter
words++;
//prepare for next word
printf("%s\n", word);
index = 0;
}
//printf("%s\n", word);
}
printf("%s\n", word);
fclose(fp);
return(0);
}
As you can see, it's just a plain program that stores characters from words into an array, back-to-back, from a file called 'names.txt'.
My problem resides in the else if(index > 0) condition.
I've run a debugger and, obviously, the program works correctly.
Here's my question:
On the first for-loop iteration, index becomes 1. Otherwise, we wouldn't be able to store a whole word in an array.
If so, how is it possible that, when the program flow reaches the else if (index > 0) condition, it doesn't set word[1] to 0? (Or the subsequent values of index).
It just finishes the whole word and, once it has reached the end of the word, then it proceeds to give word[index] the value of 0 and proceed to the next word.
I've tried reading the documentation, running half of the program and asking with echo, and running a debugger. As it should be, everything runs perfectly, there's no problem with the code (as far as I know). I'm the problem. I just can't get how it works.
PS: Sorry if this is trivial for some of you. I'm just starting to learn programming, and I sometimes find it really hard to understand apparently simple concepts.
Thank you very much for your time, guys.

As soon as one branch of an if...else if chain is executed, control leaves the chain. So if the first if condition is satisfied, the else if conditions are not even checked. The first condition is true when c is a letter, or when c is an apostrophe and index > 0. Only when it is false does the program move on to the else if portions of the chain.

Note the else in the beginning of the else if (index > 0) condition.
This means that it will only execute if none of the previous if() and else if() executed.
The previous if() and else if() branches keep executing as long as the character is a letter, a non-leading apostrophe, or a digit, so the last else if() only runs once some other character (such as a space or a newline), or a leading apostrophe, is encountered.
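To make this concrete, here is a small sketch, not the original program: it runs the same if / else if / else if chain over a string instead of a file and records which branch ran for each character. The function name trace_branches and the letter codes are made up for illustration, and the inner "consume the rest" loops of the original are left out.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: for each character of `text`, store one letter in
   `trace` saying which branch of the chain ran:
   'A' = append branch, 'D' = digit branch, 'W' = word-finished branch,
   '.' = no branch ran. Returns the number of words found.
   `trace` must have room for strlen(text) + 1 chars. */
int trace_branches(const char *text, char *trace)
{
    int index = 0;   /* characters collected for the current word */
    int words = 0;
    size_t i;

    for (i = 0; text[i] != '\0'; i++) {
        unsigned char c = (unsigned char)text[i];

        if (isalpha(c) || (c == '\'' && index > 0)) {
            index++;            /* still inside a word */
            trace[i] = 'A';
        } else if (isdigit(c)) {
            index = 0;          /* discard the current word */
            trace[i] = 'D';
        } else if (index > 0) {
            words++;            /* only reached when c is not a letter,
                                   apostrophe, or digit */
            index = 0;
            trace[i] = 'W';
        } else {
            trace[i] = '.';
        }
    }
    trace[i] = '\0';
    return words;
}
```

For the input "hi you\n" the trace is "AAWAAAW": the word-finished branch runs only at the space and at the newline, never while letters are still arriving.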

Related

Sorting duplicate lines in C

I am trying to write a C program that can filter through lines. It is supposed to print only one line when there are consecutive duplicate lines. I have to use arrays of chars to compare the lines. The size of the arrays are inconsequential (set at 79 chars for the project). I have initialized the arrays as such:
char newArray [MAXCHARS];
char oldArray [MAXCHARS];
and have filled the array by using this for loop, to check for newlines and the end of file:
for(i = 0; i<MAXCHARS;i++){
if((newChar = getc(ifp)) != EOF){
if(newChar != '/n'){
oldArray[i] = newChar;
oldCount++;
}
else if(newChar == '/n'){
oldArray[i] = newChar;
oldCount++;
break;
}
}
else{
endOf = true;
break;
}
}
To cycle through the next line(s) and search for duplicates, I am using a while loop that is initially set to true. It fills the next array up to the newline and tests for EOF as well. Then, I use two for loops to test the arrays. If they are the same at each position in the arrays, duplicate remains unchanged and nothing is printed. If they are not the same, duplicate is set to false and a function (testArrays) is called to print the contents of each array.
while(duplicate){
newCount = 0;
/* fill second array, test for newlines and EOF*/
for(i =0; i< MAXCHARS; i++){
if((newChar = getc(ifp)) != EOF){
if(newChar != '/n'){
newArray[i] = newChar;
newCount++;
}
else if(newChar == '/n'){
newArray[i] = newChar;
newCount++;
break;
}
}
else{
endOf = true;
break;
}
}
/* test arrays against each other to spot duplicate lines*
if they are duplicates, continue the while loop getting new
arrays of characters in newArray until these tests fail*/
for(i =0; i< oldCount; i++){
if(oldArray[i] == newArray[i]){
continue;
}
else{
duplicate = false;
break;
}
}
for(i =0; i <newCount; i++){
if(oldArray[i] == newArray[i]){
continue;
}
else{
duplicate = false;
break;
}
}
if(endOf && duplicate){
testArray(oldArray);
break;
}
}
if((endOf && !duplicate) || (!endOf && !duplicate)){
testArray(oldArray);
testArray(newArray);
}
I find that this does not work and consecutive identical lines are being printed anyway. I cannot figure out how this could be happening. I know this is a lot of code to wade through, but it is pretty straightforward, and I think that another set of eyes on this will spot the problem easily. Thanks for the help.
Is there a reason why you read a character at a time instead of calling fgets() to read a line?
char instr[MAXCHARS];
for( iline = 0; ( fgets( instr, sizeof instr, ifp ) ); iline++ ) {
. . .<strcmp() current line to previous line here>. . .
}
EDIT:
You might want to declare two character arrays and three char pointers: one pointing to the current line, one to the previous line, and a third used as a temporary to swap the first two.
You need to use a function to read lines — either fgets() or one you write (or POSIX getline() if you are familiar with dynamic memory allocation).
You then need to use an algorithm equivalent to:
1. Read first line into old.
2. If there is no line (EOF), stop.
3. Print the first line.
4. For every extra line read into new.
5. If there is no line (EOF), stop.
6. If new is the same as old, go to step 4.
7. Print new.
8. Copy new to old.
9. Go to step 4.
Those 'go to' steps would be part of normal loop controls, not actual goto statements.
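The steps above could be sketched with fgets() as follows. The function name print_unique_lines and the LINE_CAP limit are assumptions made for this illustration; a line longer than LINE_CAP - 1 characters would be read in several pieces, which this sketch does not handle specially.

```c
#include <stdio.h>
#include <string.h>

#define LINE_CAP 256  /* assumed upper bound on line length, including '\n' */

/* Copy `in` to `out`, writing each run of identical consecutive lines once.
   Returns the number of lines written. */
int print_unique_lines(FILE *in, FILE *out)
{
    char old[LINE_CAP];
    char new_line[LINE_CAP];
    int written = 0;

    /* Steps 1-3: read the first line; stop at EOF, otherwise print it. */
    if (fgets(old, sizeof old, in) == NULL)
        return 0;
    fputs(old, out);
    written++;

    /* Steps 4-9: the loop condition plays the role of the "go to" steps. */
    while (fgets(new_line, sizeof new_line, in) != NULL) {
        if (strcmp(new_line, old) == 0)
            continue;               /* same as previous line: skip it */
        fputs(new_line, out);
        written++;
        strcpy(old, new_line);      /* new becomes old for the next round */
    }
    return written;
}
```

Called as print_unique_lines(ifp, stdout), it would replace both the character-filling loops and the comparison loops from the question.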
I would do it by strings instead of char by char. I would use fgets(str, MAX_CHARS, stdin) to read the full input line and strcmp it to the previous string. Avoid gets(): it cannot limit how much it reads and was removed from the language in C11. strcmp assumes your strings are NUL-terminated, and you may need special EOF handling, but something like what's below should work:
#include <stdio.h>
#include <string.h>
#define MAX_CHARS 80 // assumed size; the question uses lines of up to 79 chars
int main(void){
char newStr[MAX_CHARS] = {0}; // most recently read line
char oldStr[MAX_CHARS] = {0}; // previous line
// Loop over input as long as there is something to read
while(fgets(newStr, sizeof newStr, stdin) != NULL){
if(strcmp(newStr,oldStr) != 0){
printf("%s", newStr); // fgets keeps the '\n', so none is added here
}
// else: duplicate of the previous line, so don't print it
strcpy(oldStr, newStr); // copy new string into old string for future compare
}
return 0;
}
In the part where you test for duplicates, maybe you could test whether oldCount == newCount first? My reasoning is that, if it is a duplicate line, oldCount will be equal to newCount. Only if that is true would you proceed to compare the two arrays.

Multiple blank lines are not squeezed in one blank line(C) using I/O redirection

I am asked to squeeze two or more consecutive blank lines in the input into one blank line in the output. I have to use Cygwin to do the I/O redirection and test it.
Example: ./Lab < test1.txt > test2.txt
my code is:
int main(void){
format();
printf("\n");
return 0;
}
void format(){
int c;
size_t nlines = 1;
size_t nspace = 0;
int spaceCheck = ' ';
while (( c= getchar()) != EOF ){
/*TABS*/
if(c == '\t'){
c = ' ';
}
/*SPACES*/
if (c ==' '){/*changed from isspace(c) to c==' ' because isspace is true for spaces/tabs/newlines*/
/* while (isspace(c = getchar())); it counts while there is space we will put one space only */
if(nspace > 0){
continue;
}
else{
putchar(c);
nspace++;
nlines = 0;
}
}
/*NEW LINE*/
else if(c == '\n'){
if(nlines >0){
continue;
}
else{
putchar(c);
nlines++;
nspace = 0;
}
}
else{
putchar(c);
nspace = 0;
nlines = 0;
}
}
}
However my test2.txt doesn't have the result I want. Is there something wrong in my logic/code?
You provide too little code, the interesting part would be the loop around the code you posted...
What you actually have to do there is skipping the output:
FILE* file = ...;
int c, prev = 0; // int, so that EOF can be told apart from real characters
while((c = fgetc(file)) != EOF)
{
if(c != '\n' || prev != '\n')
putchar(c);
prev = c;
}
If an empty line follows another line, we encounter two consecutive newline characters, so both c and prev are equal to '\n'. That is exactly the situation in which we do not want to output c (the second newline). The inverse situation is that at least one of the two is not '\n', which is the condition above, and only then is the character output.
Side note: prev = 0 is there because prev must be initialised to something other than a newline; it could just as well have been 's'. If, however, you want to skip an initial empty line too, initialise it with '\n'.
Edit, referring to your modified code: Edit2 (removed references to code as it changed again)
As your modified code shows that you do not only want to condense blank lines, but whitespace, too, you first have to consider that you have two classes of white space, on one hand, the newlines, on the other, any others. So you have to differentiate appropriately.
I recommend now using some kind of state machine:
#define OTH 0
#define WS 1
#define NL1 2
#define NL2 3
int state = OTH;
while (( c= getchar()) != EOF )
{
// first, the new lines:
if(c == '\n')
{
if(state != NL2)
{
putchar('\n');
state = state == NL1 ? NL2 : NL1;
}
}
// then, any other whitespace
else if(isspace(c))
{
if(state != WS)
{
putchar(' ');
state = WS;
}
}
// finally, all remaining characters
else
{
putchar(c);
state = OTH;
}
}
The first differentiation is by the current character's own class (newline, other whitespace, or anything else); the second is by the previous character's class, which is what the current state records. Non-whitespace characters are always output. A whitespace character is output only if the previous character was of a different class. Newlines are the one exception: they need two states, because we want to keep one blank line, which means letting two consecutive newline characters through.
Be aware: whitespace-only lines do not count as blank lines in the above algorithm, so they won't be eliminated (but reduced to a line containing one single space). From the code you posted, I assume this is intended...
For completeness: This is a variant removing leading and trailing whitespace entirely and counting whitespace-only lines as empty lines:
if(c == '\n')
{
if(state != NL2)
{
putchar('\n');
state = state == NL1 ? NL2 : NL1;
}
}
else if(isspace(c))
{
if(state == OTH)
state = WS;
}
else
{
if(state == WS)
{
putchar(' ');
}
putchar(c);
state = OTH;
}
The trick: only enter the whitespace state if there was a non-whitespace character before, and do not print the space until the next non-whitespace character is encountered.
Coming to the newlines - well, if there was a normal character, we are either in state OTH or WS, but none of the two NL states. If there was only whitespace on the line, the state is not modified, thus we remain in the corresponding NL state (1 or 2) and skip the line correspondingly...
To dissect this:
if(c == '\n') {
nlines++;
is nlines ever reset to zero?
if(nlines > 1){
c = '\n';
And what happens on the third \n in sequence? will nlines > 1 be true? Think about it!
}
}
putchar(c);
I don't get this: You unconditionally output your character anyways, defeating the whole purpose of checking whether it's a newline.
A correct solution would set a flag when c is a newline and not output anything. Then, when c is NOT a newline (else branch), output ONE newline if your flag is set and reset the flag. I leave the code writing to you now :)

Can't count '|' symbols in a .c file

Basically I have to write a program that counts all kinds of different symbols in a .c file. I got it to work with all of the needed symbols except the vertical line '|'. For some reason it just won't count them.
Here's the method I'm using:
int countGreaterLesserEquals(char filename[])
{
FILE *fp = fopen(filename,"r");
FILE *f;
int temp = 0; // ASCII code of the character
int capital = 0;
int lesser = 0;
int numbers = 0;
int comments = 0;
int lines = 0;
int spc = 0;
if (fp == NULL) {
printf("File is invalid\\empty.\n");
return 0;
}
while ((temp = fgetc(fp)) != EOF) {
if (temp >= 'a' && temp <= 'z') {
capital++;
}
else if (temp >= 'A' && temp <= 'Z') {
lesser++;
}
else if( temp == '/') temp = fgetc(fp); {
if(temp == '/')
comments++;
}
if (temp >= '0' && temp <= '9') {
numbers++;
}
if (temp == '|') {
spc++;
}
if (temp == '\n') {
lines++;
}
}
}
On this line:
else if( temp == '/') temp = fgetc(fp); {
I believe you have a misplaced {. As I understand it, it should come before temp = fgetc(fp);.
You can easily avoid such errors by following coding style guidelines: place each statement on its own line and indent the code properly.
Update: This fgetc is also a corner case. What if it hits EOF here? You are not checking for that.
Firstly, some compiler warnings:
'f' : unreferenced local variable
not all control paths return a value
So, f can be removed, and the function should return a value on success too. It's always a good idea to set compiler warnings at the highest level.
Then, there is a problem with:
else if( temp == '/') temp = fgetc(fp); {
if(temp == '/')
comments++;
}
Note the ; after temp = fgetc(fp): it ends the else if statement, so the { ... } block that follows is executed on every iteration. Also, for this fgetc() there is no check for EOF or an error.
Also, if temp is a /, but the following character is not, it will be skipped, so we need to put the character back into the stream (easiest solution in this case).
Here is a full example:
int countGreaterLesserEquals(char filename[])
{
FILE *fp = fopen(filename, "r");
int temp = 0; // ASCII code of the character
int capital = 0;
int lesser = 0;
int numbers = 0;
int comments = 0;
int lines = 0;
int spc = 0;
if (fp == NULL) {
printf("File is invalid\\empty.\n");
return 0;
}
while ((temp = fgetc(fp)) != EOF) {
// check characters - check most common first
if (temp >= 'a' && temp <= 'z') lesser++;
else if (temp >= 'A' && temp <= 'Z') capital++;
else if (temp >= '0' && temp <= '9') numbers++;
else if (temp == '|') spc++;
else if (temp == '\n') lines++;
else if( temp == '/')
if ((temp = fgetc(fp)) == EOF)
break; // handle error/eof
else
if(temp == '/') comments++;
else ungetc(temp, fp); // put character back into the stream
}
fclose (fp); // close as soon as possible
printf("capital: %d\nlesser: %d\ncomments: %d\n"
"numbers: %d\nspc: %d\nlines: %d\n",
capital, lesser, comments, numbers, spc, lines
);
return 1;
}
While it is usually recommended to put if statements inside curly braces, I think in this case we can place them on the same line for clarity.
Each if can be preceded with an else in this case. That way the program doesn't have to check the remaining cases when one is already found. The checks for the most common characters are best placed first for the same reason (but that was the case).
As an alternative you could use islower(temp), isupper(temp) and isdigit(temp) for the first three cases.
Performance:
For the sake of completeness: while this is probably an exercise on small files, for larger files the data should be read in buffers for better performance (or even using memory mapping on the file).
Update, @SteveSummit's comment on fgetc performance:
Good answer, but I disagree with your note about performance at the
end. fgetc already is buffered! So the performance of straightforward
code like this should be fine even for large inputs; there's typically
no need to complicate the code due to concerns about "efficiency".
Whilst this comment seemed to be valid at first, I really wanted to know what the real difference in performance would be (since I never use fgetc I hadn't tested this before), so I wrote a little test program:
Open a large file and sum every byte into a uint32_t, which is comparable to scanning for certain chars as above. Data was already cached by the OS disk cache (since we're testing performance of the functions/scans, not the reading speed of the hard disk). While the example code above was most likely for small files, I thought I might put the test results for larger files here as well.
Those were the average results:
- using fgetc : 8770
- using a buffer and scan the chars using a pointer : 188
- use memory mapping and scan chars using a pointer : 118
Now, I was pretty sure that using buffers and memory mapping would be faster (I use those all the time for larger data), but the difference in speed is even bigger than expected. OK, there might be some possible optimizations for fgetc, but even if those doubled the speed, the difference would still be high.
Bottom line: Yes, it is worth the effort to optimize this for larger files. Going by the numbers above, processing that takes 1 second with buffers or mmap would take roughly 45 to 75 seconds with fgetc!
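The test program itself is not shown, so the following is only a guess at what the two stdio variants might look like: one sums bytes with fgetc, the other reads blocks with fread and scans them with a pointer. The function names and the 64 KiB block size are assumptions; the timing code and the memory-mapped variant are left out.

```c
#include <stdint.h>
#include <stdio.h>

/* Sum all remaining bytes of `fp`, one fgetc call per byte. */
uint32_t sum_fgetc(FILE *fp)
{
    uint32_t sum = 0;
    int c;

    while ((c = fgetc(fp)) != EOF)
        sum += (uint32_t)c;
    return sum;
}

/* Sum all remaining bytes of `fp` by reading blocks with fread and
   scanning each block with a pointer. */
uint32_t sum_buffered(FILE *fp)
{
    unsigned char buf[65536];   /* block size is an arbitrary choice */
    uint32_t sum = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        const unsigned char *p = buf;
        const unsigned char *end = buf + n;

        while (p < end)
            sum += *p++;
    }
    return sum;
}
```

Both functions return the same sum for the same file, so timing them against each other on a large, already cached file would give a comparison of the kind described above.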

How to store the even lines of a file to one array and the odd lines to another

I am given a file of DNA sequences and asked to compare all of the sequences with each other and delete the sequences that are not unique. The file I am working with is in fasta format so the odd lines are the headers and the even lines are the sequences that I want to compare. So I am trying to store the even lines in one array and the odd lines in another. I am very new to C so I'm not sure where to begin. I figured out how to store the whole file in one array like this:
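#include <stdio.h>
#include <string.h>
int main(){
int total_seq = 50;
int i = 0; // index of the next free row in line
char seq[100];
char line[total_seq][100];
FILE *dna_file;
dna_file = fopen("inabc.fasta", "r");
if (dna_file==NULL){
printf("Error");
return 1; // nothing to read if the file failed to open
}
// stop at end of file or when the array is full
while(i < total_seq && fgets(seq, sizeof seq, dna_file)){
strcpy(line[i], seq);
printf("%s", seq);
i++;
}
fclose(dna_file);
return 0;
}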
I was thinking I would have to incorporate some sort of code that looked like this:
for (i = 0; i < rows; i++){
if (i % 2 == 0) header[i/2] = getline();
else seq[i/2] = getline();
but I'm not sure how to implement it.
Any help would be greatly appreciated!
To store the even lines of a file to one array and the odd lines to another,
read each char and swap output files when '\n' encountered.
void Split(FILE *even, FILE* odd, FILE *source) {
int evenflag = 1;
int ch;
while ((ch = fgetc(source)) != EOF) {
if (evenflag) {
fputc(ch, even);
} else {
fputc(ch, odd);
}
if (ch == '\n') {
evenflag = !evenflag;
}
}
}
It is not clear if this post also requires code to do the unique filtering step.
Could you please give me an example of the data in the file?
Am I right in thinking it'd be something like:
Header
Sequence
Header
Sequence
And so on
Perhaps you could do something like this:
int main(){
int total_seq = 50;
char seq[100];
char line[total_seq][100];
FILE *dna_file;
dna_file = fopen("inabc.fasta", "r");
if (dna_file==NULL){
printf("Error");
}
// Put this in an else statement
int counter = 1;
while(fgets(seq, sizeof seq, dna_file)){
// If counter is odd
// Place next line read in headers array
// If counter is even
// Place next line read in sequence array
// Increment counter
}
// Now you have all the sequences & headers. Remove any duplicates
// Foreach number of elements in 'sequence' array - referenced by, e.g. 'j' where 'j' starts at 0
// Foreach number of elements in 'sequence' array - referenced by 'k' - Where 'k' Starts at 'j + 1'
// IF (sequence[j] != '~') So if its not our chosen escape character
// IF (sequence[j] == sequence[k]) (I think you'd have to use strcmp for this?)
// SET sequence[k] = '~';
// SET header[k] = '~';
// END IF
// END IF
// END FOR
// END FOR
}
// You'd then need an algorithm to run through the arrays. If a '~' is found. Move the following non tilda/sequence down to its position, and so on.
// EDIT: Infact. It would probably be easier if when writing back to file, just ignore/don't write if sequence[x] == '~' (where 'x' iterates through all)
// Finally write back to file
fclose(dna_file);
return 0;
}
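One way to turn that outline into code is sketched below. It departs from the outline in one respect: instead of marking duplicates with '~' afterwards, it skips a record at read time if its sequence has already been stored, so only the first occurrence is kept. The function name read_unique_records and both size limits are assumptions for this illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAX_RECORDS 50   /* assumed, mirrors total_seq in the question */
#define MAX_LINE 100     /* assumed, mirrors the question's buffer size */

/* Read header/sequence line pairs from `fp`: odd lines go to `headers`,
   even lines to `sequences`. A pair is stored only if its sequence has not
   been seen before. Returns the number of pairs stored. */
int read_unique_records(FILE *fp,
                        char headers[][MAX_LINE],
                        char sequences[][MAX_LINE])
{
    char header[MAX_LINE];
    char sequence[MAX_LINE];
    int count = 0;

    while (count < MAX_RECORDS &&
           fgets(header, sizeof header, fp) != NULL &&
           fgets(sequence, sizeof sequence, fp) != NULL) {
        int duplicate = 0;

        /* compare against every sequence stored so far */
        for (int j = 0; j < count; j++) {
            if (strcmp(sequences[j], sequence) == 0) {
                duplicate = 1;
                break;
            }
        }
        if (!duplicate) {
            strcpy(headers[count], header);
            strcpy(sequences[count], sequence);
            count++;
        }
    }
    return count;
}
```

Because fgets keeps the trailing '\n', a final sequence line without a newline would not compare equal to an otherwise identical earlier line; stripping the newline before comparing would avoid that.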
First: write a function that counts the number of newline (\n) characters in the file.
Then write a function that searches for the n-th newline
Last, write a function to go through and read from one '\n' to the next.
Alternately, you could just go online and read about string parsing.

Reading data from a file, only alpha characters

I'm working on a program for school right now in c and I'm having trouble reading text from a file. I've only ever worked in Java before so I'm not completely familiar with c yet and this has got me thoroughly stumped even though I'm sure it's pretty simple.
Here's an example of how the text can be formatted in the file we have to read:
boo22$Book5555bOoKiNg#bOo#TeX123tEXT(JOHN)
I have to take in each word and store it in a data structure, and a word is only alpha characters, so no numbers or special characters. I already have the data structure working properly so I just need to get each word into a char array and then add it to my structure. It has to keep reading each char until it gets to a non-alpha char value. I've tried looking into the different ways to scan in from a file and I'm not sure what would be best for my scenario.
Here's the code I have right now for my input:
char str[MAX_WORD_SIZE];
char c;
int index = 0;
while (fscanf(dictionaryInputFile, "%c", c) != EOF) //while not at end of file
{
if (isalpha(c)) //if current character is a letter
{
tolower(c); //ignores case in word
str[index] = c; //add char to string
index++;
}
else if (str[0] != '\0') //If a word
{
str[index] = '\0'; //Make sure no left over characters in String
dictionaryRoot = insertNode(str, dictionaryRoot); //insert word to dictionary
index = 0; //reset index
str[index] = '\0'; //Set first character to null since word has been added
}
}
My thinking was that if it doesn't hit that first if statement then I have to check if str is a word or not, that's why it checks if the 0 index of str is null or not. I'm guessing the else if statement I have is not right though, but I can't figure out a way to end the current word I'm building and then reset str to null when it's added to my data structure. Right now when I run this I get a segmentation fault if I pass the txt file as an argument.
I'd just like to know if I'm on the right track and if not maybe some help on how I should be reading this data.
This is my first time posting here so I hope I included everything you'll need to help me, if not just let me know and I'd be happy to add more information.
Biggest problem: Incorrect use of fscanf(). @BLUEPIXY
// while (fscanf(dictionaryInputFile, "%c", c) != EOF)
while (fscanf(dictionaryInputFile, "%c", &c) != EOF)
No protection against overflow.
// str[index] = c; //add char to string
if (index >= MAX_WORD_SIZE - 1) Handle_TooManySomehow();
Not sure why testing against '\0' when '\0' is also a non-alpha.
Pedantically, isalpha() is problematic when a char with a negative value is passed. Better to pass the unsigned char value: is...((unsigned char) c), when code knows it is not EOF. Alternatively, save the input using int ch = fgetc(stream) and use is...(ch).
Minor: Better to use size_t for array indexes than int, but be careful as size_t is unsigned. size_t is important should the array become large, unlike in this case.
Also, when EOF is received, any data in str is ignored, even if it contained a word. @BLUEPIXY.
For the most part, OP is on the right track.
What follows is an untested sample approach that illustrates how to avoid overflowing the buffer.
Test for a full buffer, then read in a char if needed. If a non-alpha is found, add the word to the dictionary if a non-zero-length word was accumulated.
char str[MAX_WORD_SIZE];
int ch;
size_t index = 0;
for (;;) {
if ((index >= sizeof str - 1) ||
((ch = fgetc(dictionaryInputFile)) == EOF) ||
(!isalpha(ch))) {
if (index > 0) {
str[index] = '\0';
dictionaryRoot = insertNode(str, dictionaryRoot);
index = 0;
}
if (ch == EOF) break;
}
else {
str[index++] = tolower(ch);
}
}
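The same idea can also be packaged as a function that returns one word per call, which makes it easy to try on the sample text from the question. This is a sketch with a different structure from the loop above; the name next_word is hypothetical, and the caller would pass each returned word to insertNode().

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: read the next run of letters from `fp` into `buf`
   (lower-cased, NUL-terminated, at most size - 1 letters; size must be at
   least 2). Returns 1 if a word was stored, 0 if EOF was reached first. */
int next_word(FILE *fp, char *buf, size_t size)
{
    size_t index = 0;
    int ch;   /* int, so EOF can be told apart from real characters */

    while ((ch = fgetc(fp)) != EOF) {
        if (isalpha(ch)) {
            buf[index++] = (char)tolower(ch);
            if (index == size - 1)
                break;          /* buffer full: end the word here */
        } else if (index > 0) {
            break;              /* a non-letter after letters ends the word */
        }
        /* a non-letter before any letter is simply skipped */
    }
    buf[index] = '\0';
    return index > 0;           /* also true for a word cut off by EOF */
}
```

On the sample line boo22$Book5555bOoKiNg#bOo#TeX123tEXT(JOHN), successive calls yield boo, book, booking, boo, tex, text and john, and the next call returns 0.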

Resources