Basically I have to write a program that counts all kinds of different symbols in a .c file. I got it to work with all of the needed symbols except the vertical bar '|'. For some reason it just won't count them.
Here's the method I'm using:
int countGreaterLesserEquals(char filename[])
{
    FILE *fp = fopen(filename,"r");
    FILE *f;
    int temp = 0; // ASCII code of the character
    int capital = 0;
    int lesser = 0;
    int numbers = 0;
    int comments = 0;
    int lines = 0;
    int spc = 0;

    if (fp == NULL) {
        printf("File is invalid\\empty.\n");
        return 0;
    }
    while ((temp = fgetc(fp)) != EOF) {
        if (temp >= 'a' && temp <= 'z') {
            capital++;
        }
        else if (temp >= 'A' && temp <= 'Z') {
            lesser++;
        }
        else if( temp == '/') temp = fgetc(fp); {
            if(temp == '/')
                comments++;
        }
        if (temp >= '0' && temp <= '9') {
            numbers++;
        }
        if (temp == '|') {
            spc++;
        }
        if (temp == '\n') {
            lines++;
        }
    }
}
On this line:
else if( temp == '/') temp = fgetc(fp); {
I believe you have a misplaced {. As I understand it, the brace should come before temp = fgetc(fp);.
You can easily avoid such errors by following coding style guidelines: place each expression on its own line and indent the code properly.
Update: And this fgetc is a corner case. What if you read EOF here? You are not checking for that error.
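For illustration, the brace placement described above would look like this: a sketch of just that branch, meant as a drop-in for the loop body; note the fgetc result is still unchecked for EOF here, as the update points out:

else if (temp == '/') {   // '{' now opens before the fgetc call
    temp = fgetc(fp);     // still not checked for EOF (see update)
    if (temp == '/')
        comments++;
}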
Firstly, some compiler warnings:
'f' : unreferenced local variable
not all control paths return a value
So, f can be removed, and the function should return a value on success too. It's always a good idea to set compiler warnings to the highest level.
Then, there is a problem with:
else if( temp == '/') temp = fgetc(fp); {
    if(temp == '/')
        comments++;
}
Check the ; at the end of the else if line: it ends the statement there, which means the block following it is always executed. Also, for this fgetc() there is no check for EOF or an error.
Also, if temp is a / but the following character is not, that character would be skipped, so we need to put it back into the stream (ungetc() is the easiest solution in this case).
Here is a full example:
int countGreaterLesserEquals(char filename[])
{
    FILE *fp = fopen(filename, "r");
    int temp = 0; // ASCII code of the character
    int capital = 0;
    int lesser = 0;
    int numbers = 0;
    int comments = 0;
    int lines = 0;
    int spc = 0;

    if (fp == NULL) {
        printf("File is invalid\\empty.\n");
        return 0;
    }
    while ((temp = fgetc(fp)) != EOF) {
        // check characters - check most common first
        if      (temp >= 'a' && temp <= 'z') lesser++;
        else if (temp >= 'A' && temp <= 'Z') capital++;
        else if (temp >= '0' && temp <= '9') numbers++;
        else if (temp == '|')  spc++;
        else if (temp == '\n') lines++;
        else if (temp == '/') {
            if ((temp = fgetc(fp)) == EOF)
                break;             // handle error/eof
            else if (temp == '/')
                comments++;
            else
                ungetc(temp, fp);  // put character back into the stream
        }
    }
    fclose(fp); // close as soon as possible

    printf("capital: %d\nlesser: %d\ncomments: %d\n"
           "numbers: %d\nspc: %d\nlines: %d\n",
           capital, lesser, comments, numbers, spc, lines);
    return 1;
}
While it is usually recommended to put if statements inside curly braces, I think in this case we can place them on the same line for clarity.
Each if can be preceded with an else in this case. That way the program doesn't have to check the remaining cases once one has already matched. The checks for the most common characters are best placed first for the same reason (which was already the case here).
As an alternative you could use islower(temp), isupper(temp) and isdigit(temp) from <ctype.h> for the first three cases.
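For instance, as a drop-in replacement for those three range checks (a sketch only; temp is safe to pass directly here because fgetc() returns a non-negative value or EOF, and the loop already filters out EOF):

// requires #include <ctype.h>
if (islower(temp))      lesser++;   // 'a'..'z'
else if (isupper(temp)) capital++;  // 'A'..'Z'
else if (isdigit(temp)) numbers++;  // '0'..'9'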
Performance:
For the sake of completeness: while this is probably an exercise on small files, for larger files the data should be read in buffers for better performance (or even using memory mapping on the file).
Update: @SteveSummit commented on fgetc performance:

"Good answer, but I disagree with your note about performance at the end. fgetc already is buffered! So the performance of straightforward code like this should be fine even for large inputs; there's typically no need to complicate the code due to concerns about 'efficiency'."
While this comment seemed valid at first, I really wanted to know what the real difference in performance would be (since I never use fgetc, I hadn't tested this before), so I wrote a little test program:
Open a large file and sum every byte into a uint32_t, which is comparable to scanning for certain chars as above. Data was already cached by the OS disk cache (since we're testing performance of the functions/scans, not the reading speed of the hard disk). While the example code above was most likely for small files, I thought I might put the test results for larger files here as well.
These were the average results:
- using fgetc: 8770
- using a buffer and scanning the chars with a pointer: 188
- using memory mapping and scanning the chars with a pointer: 118
Now, I was pretty sure that using buffers and memory mapping would be faster (I use them all the time for larger data), but the difference in speed was even bigger than expected. Sure, there might be some possible optimizations for fgetc, but even if those doubled its speed, the difference would still be large.
Bottom line: Yes, it is worth the effort to optimize this for larger files. For example, if processing the data of a file takes 1 second with buffers/mmap, it would take over a minute with fgetc!
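For reference, here is a minimal sketch of the "read into a buffer, scan with a pointer" variant (my own illustration, not the test program above; the function name and the 64 KiB buffer size are arbitrary choices):

#include <stdio.h>

// Count occurrences of one character by reading the file in large
// chunks with fread() and scanning each chunk with a pointer.
long count_char_buffered(const char *filename, char target)
{
    static char buf[64 * 1024];   // static: too large for the stack
    FILE *fp = fopen(filename, "rb");
    if (fp == NULL)
        return -1;

    long count = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (const char *p = buf; p < buf + n; ++p)
            if (*p == target)
                count++;
    }
    fclose(fp);
    return count;
}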
Related
I'm having a lot of trouble understanding the following program flow:
#include <stdio.h>
#include <ctype.h>

int main ()
{
    FILE *fp;
    int index = 0;
    char word[45];
    int words = 0;

    fp = fopen("names.txt","r");
    if(fp == NULL)
    {
        perror("Error in opening file");
        return(-1);
    }
    for (int c = fgetc(fp); c != EOF; c = fgetc(fp))
    {
        // allow only alphabetical characters and apostrophes
        if (isalpha(c) || (c == '\'' && index > 0))
        {
            // append character to word
            word[index] = c;
            index++;
            // ignore alphabetical strings too long to be words
            if (index > 45)
            {
                // consume remainder of alphabetical string
                while ((c = fgetc(fp)) != EOF && isalpha(c));
                // prepare for new word
                index = 0;
            }
        }
        // ignore words with numbers (like MS Word can)
        else if (isdigit(c))
        {
            // consume remainder of alphanumeric string
            while ((c = fgetc(fp)) != EOF && isalnum(c));
            // prepare for new word
            index = 0;
        }
        // we must have found a whole word
        else if (index > 0)
        {
            // terminate current word
            word[index] = '\0';
            // update counter
            words++;
            //prepare for next word
            printf("%s\n", word);
            index = 0;
        }
        //printf("%s\n", word);
    }
    printf("%s\n", word);
    fclose(fp);
    return(0);
}
As you can see, it's just a plain program that stores characters from words into an array, back-to-back, from a file called 'names.txt'.
My problem resides in the else if(index > 0) condition.
I've run a debugger and, obviously, the program works correctly.
Here's my question:
On the first for-loop iteration, index becomes 1 (otherwise, we wouldn't be able to store a whole word in the array).
If so, how is it possible that, when the program flow reaches the else if (index > 0) condition, it doesn't set word[1] to 0 (or word at whatever the subsequent value of index is)?
It just finishes the whole word and, only once it has reached the end of the word, gives word[index] the value of 0 and proceeds to the next word.
I've tried reading the documentation, running parts of the program and checking values with echo, and running a debugger. As expected, everything runs perfectly; there's no problem with the code (as far as I know). I'm the problem: I just can't get how it works.
PS: sorry if this is trivial for some of you; I'm really just starting to learn programming, and I sometimes find it hard to understand apparently simple concepts.
Thank you very much for your time, guys.
As soon as one branch in an if...else chain is executed, control moves out of the whole chain. So if the first if condition is satisfied, the else if conditions are not even checked. Here, if c is a letter, or index > 0 and c is an apostrophe, the if branch runs; only when neither of those holds does control move on to the else if parts of the chain.
Note the else at the beginning of the else if (index > 0) condition.
This means that it will only execute if none of the previous if() and else if() executed.
The previous if() and else if() branches keep matching as long as the character is alphanumeric or a non-leading apostrophe, so the last else if() only executes once some other character is encountered while a word has been started (index > 0).
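To make the flow concrete, here is a hypothetical trace (my own illustration; the input "Ann\n" is not from the original post):

// c = 'A' : isalpha -> word[0] = 'A', index = 1 (else ifs never checked)
// c = 'n' : isalpha -> word[1] = 'n', index = 2
// c = 'n' : isalpha -> word[2] = 'n', index = 3
// c = '\n': not a letter, not a digit, index == 3 > 0
//           -> word[3] = '\0', words++, prints "Ann", index = 0
//
// So word[index] = '\0' runs only on the first non-word character,
// at which point index already equals the length of the whole word.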
I am writing a routine to find a string within a specified block of memory in an embedded (ARM Cortex-M0 @ 16 MHz) application and am wondering why the two different versions I have written run at different speeds.
char* memstr(char* mem, uint32_t n, char* str) {
    if( (str[0] == '\0') || (n == 0) ) return NULL;
    uint32_t i = 0;
    char* max_mem;
    max_mem = mem + n;
    while( mem < max_mem ) {
        if( *mem != str[i] ) {
            mem -= i;
            i = 0;
        } else {
            if(str[i+1] == '\0') return mem - i;
            i++;
        }
        mem++;
    }
    return NULL;
}
char* memstr2(char* mem, uint32_t n, char* str) {
    if( (str[0] == '\0') || (n == 0) ) return NULL;
    uint32_t c = 0;
    uint32_t i = 0;
    while( c < n ) {
        if( mem[c] != str[i] ) {
            c -= i;
            i = 0;
        } else {
            i++;
            if(str[i] == '\0') return &mem[c - i + 1];
        }
        c++;
    }
    return NULL;
}
memstr is consistently about 1 µs faster than memstr2 when finding a 7-character string in between 20 and 200 bytes of memory. For example, finding a 7-character string in 110 bytes takes memstr 106 µs and memstr2 107 µs. One microsecond may not sound like a big deal, but in an embedded application where every tick matters it's a drawback.
Kind of a bonus question: this also prompted me to write my own strstr, which is faster than the stock strstr (e.g. finding a 7-character string in a 207-character string takes my_strstr 236 µs and strstr 274 µs). What's wrong with this, though, as strstr must be pretty optimised?
char* my_strstr(char* str1, char* str2) {
    uint32_t i = 0;
    if( str2[0] == '\0' ) return NULL;
    while( *str1 != '\0' ) {
        if( *str1 != str2[i] ) {
            str1 -= i;
            i = 0;
        } else {
            i++;
            if(str2[i] == '\0') return (str1 - i - 1);
        }
        str1++;
    }
    return NULL;
}
First, both functions don't work if you search for a string starting with two equal characters: if you search for xxabcde and the memory contains xxxabcde, then by the time you notice that the a of xxabcde doesn't match the third x, you have already skipped two x's and won't match the string.
You also don't check whether you are searching for an empty string, in which case your code produces undefined behaviour.
You compare memory with memory. But you can do an awful lot of the work just comparing memory with a single character. If you search for "abcde", you first have to find the letter a. So I'd check for an empty string first, then read the first character, and then loop checking for that character first.
char first = str2[0];
if (first == '\0') return mem;
for (; mem < maxmem; ++mem) {
    if (*mem == first) {
        // ... check whether there is a match
    }
}
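To make the idea concrete, here is one way that sketch could be completed (my own illustration, keeping the poster's memstr() signature and its convention of returning NULL for an empty needle; memchr(), memcmp() and strlen() are standard <string.h> functions):

#include <stdint.h>
#include <string.h>

// Find str inside the first n bytes of mem: memchr() locates candidate
// positions of the first character quickly, and memcmp() verifies the
// full match only at those candidates.
char* memstr_first(char* mem, uint32_t n, char* str) {
    size_t len = strlen(str);
    if (len == 0 || len > n) return NULL;
    char* end = mem + n;
    while ((size_t)(end - mem) >= len) {
        // only start positions where a full match could still fit
        char* p = memchr(mem, str[0], (size_t)(end - mem) - len + 1);
        if (p == NULL) return NULL;              // first char not present
        if (memcmp(p, str, len) == 0) return p;  // full match confirmed
        mem = p + 1;                             // resume after candidate
    }
    return NULL;
}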
You should check your data. You would write different code if you expect the search string to come up early vs. if you expect it usually not to be there at all.
In memstr, 'mem' is used as a pointer that is incremented. In memstr2, 'mem' is indexed as an array ('mem[c]'). Depending on optimization, a compiler might emit a multiply by 1 for the indexing.
For instance in the statement:
if( mem[c] != str[i] ) {
mem[c] is calculated each time through the loop as
*( &mem[0] + c * sizeof(mem[0]) )
A decent compiler would figure out that for a char, sizeof(mem[0]) == 1, so the multiply can be skipped. If optimization were intentionally disabled, as is typically done for debug builds, I can imagine an extra multiply operation per loop iteration. There would be a little extra calculation time in the memstr2 version even without the multiply, but it would surprise me if that were measurable.
It seems to me that the difference could be due to the fact that in the second version you dereference the pointer (return &mem[c - i + 1];) when you return from the function, which can lead to a memory access, which is costly; that cannot happen in your first function (mem - i).
But the only way to be sure is to look at the assembly being generated for each case.
I don't think this is about C, but about the compiler and the platform.
I'm trying to write a program that reads from a file and works with the characters it reads. The gist is that it has to correct the capitalization errors present in the file.
One particular requirement is that I have to number each line, so I wrote a bit of code to determine whether or not each character read is a line break.
int fix_caps(char* ch, int* char_in_word, int* line_num){
    char a;
    ch = &a;
    if(a != '\n'){
        return 0;
    }else{
        return 1;
    }
    if(a == ' ')
        *char_in_word = 0;
    if(*char_in_word == 1)
        a = toupper(a);
    if(*char_in_word > 1)
        a = tolower(a);
    char_in_word++;
}
However, the function always returns 0, even when it should return 1 at the end of a line. What am I doing wrong?
The execution will never get beyond this if/else block:

char a;
ch = &a;
if(a != '\n'){
    return 0;
}else{
    return 1;
}

There are a few reasons it 'always' returns 0:
1) 'a' is on the stack and is never initialized, so it could contain anything.
2) the odds are 255:1 against the garbage that happens to be on the stack where 'a' is located containing a newline character.

Nothing beyond the if/else block is ever executed, because an if/else block only has two execution paths, and both paths here contain a return statement.
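Since the intended behavior is only partly specified, here is a guess at what the function was meant to do (a sketch only: it reads the character through ch instead of overwriting the pointer, does the case fixing first, and only then decides the return value; the exact semantics of char_in_word and line_num are assumptions):

#include <ctype.h>

int fix_caps(char *ch, int *char_in_word, int *line_num)
{
    char a = *ch;                 // read the character the caller passed

    if (a == ' ')
        *char_in_word = 0;        // a space starts a new word
    else if (*char_in_word == 1)
        a = (char)toupper((unsigned char)a);  // first letter: capitalize
    else if (*char_in_word > 1)
        a = (char)tolower((unsigned char)a);  // rest of word: lowercase

    (*char_in_word)++;            // the original incremented the pointer
                                  // itself, almost certainly a bug
    *ch = a;                      // write the fixed character back

    if (a == '\n') {
        (*line_num)++;            // count the line break
        return 1;
    }
    return 0;
}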
I'm working on a program for school right now in C and I'm having trouble reading text from a file. I've only ever worked in Java before, so I'm not completely familiar with C yet, and this has me thoroughly stumped even though I'm sure it's pretty simple.
Here's an example of how the text can be formatted in the file we have to read:
boo22$Book5555bOoKiNg#bOo#TeX123tEXT(JOHN)
I have to take in each word and store it in a data structure, and a word is only alpha characters, so no numbers or special characters. I already have the data structure working properly, so I just need to get each word into a char array and then add it to my structure. It has to keep reading chars until it gets to a non-alpha character. I've looked into the different ways to scan in from a file, and I'm not sure which would be best for my scenario.
Here's the code I have right now for my input:
char str[MAX_WORD_SIZE];
char c;
int index = 0;

while (fscanf(dictionaryInputFile, "%c", c) != EOF) //while not at end of file
{
    if (isalpha(c)) //if current character is a letter
    {
        tolower(c); //ignores case in word
        str[index] = c; //add char to string
        index++;
    }
    else if (str[0] != '\0') //If a word
    {
        str[index] = '\0'; //Make sure no left over characters in String
        dictionaryRoot = insertNode(str, dictionaryRoot); //insert word to dictionary
        index = 0; //reset index
        str[index] = '\0'; //Set first character to null since word has been added
    }
}
My thinking was that if it doesn't hit that first if statement, then I have to check whether str is a word or not; that's why it checks whether index 0 of str is null. I'm guessing the else if statement I have is not right, though, but I can't figure out a way to end the current word I'm building and then reset str to null once it's added to my data structure. Right now, when I run this, I get a segmentation fault if I pass the txt file as an argument.
I'd just like to know if I'm on the right track and, if not, maybe get some help on how I should be reading this data.
This is my first time posting here, so I hope I included everything you'll need to help me; if not, just let me know and I'd be happy to add more information.
Biggest problem: incorrect use of fscanf(). @BLUEPIXY
// while (fscanf(dictionaryInputFile, "%c", c) != EOF)
while (fscanf(dictionaryInputFile, "%c", &c) != EOF)
No protection against overflow.
// str[index] = c; //add char to string
if (index >= MAX_WORD_SIZE - 1) Handle_TooManySomehow();
Not sure why you test against '\0', when '\0' is also a non-alpha character.
Pedantically, isalpha() is problematic when a signed char is passed. Better to pass the unsigned char value, is...((unsigned char) c), when code knows it is not EOF. Alternatively, save the input using int ch = fgetc(stream) and use is...(ch).
Minor: better to use size_t than int for array indexes, but be careful, as size_t is unsigned. size_t becomes important should the array grow large, unlike in this case.
Also, when EOF is received, any data in str is ignored, even if it contains a word. @BLUEPIXY
For the most part, OP is on the right track.
What follows is a sample, untested approach that illustrates not overflowing the buffer.
Test for a full buffer, then read in a char if needed. If a non-alpha is found, add the word to the dictionary, provided a non-zero-length word was accumulated.
char str[MAX_WORD_SIZE];
int ch;
size_t index = 0;

for (;;) {
    if ((index >= sizeof str - 1) ||
        ((ch = fgetc(dictionaryInputFile)) == EOF) ||
        (!isalpha(ch))) {
        if (index > 0) {
            str[index] = '\0';
            dictionaryRoot = insertNode(str, dictionaryRoot);
            index = 0;
        }
        if (ch == EOF) break;
    }
    else {
        str[index++] = tolower(ch);
    }
}
Here is the code. The problem I am facing is that I am not able to write anything to any of the files.
Kindly help me resolve this.
#include <stdio.h>
#include <string.h>

int main()
{
    FILE *fe;
    FILE *fo;
    FILE *fg;
    int i;
    int j,l;
    char ch;
    char tmp[100];

    fo = fopen("oddfile.txt","a");
    if (fo == NULL)
    {
        perror("ODDFILE");
    }
    fclose(fo);

    fe = fopen("evenfile.txt","a");
    if (fe == NULL)
    {
        perror("EVENFILE");
    }
    fclose(fe);

    fg = fopen("generalfile","r");
    if (fg == NULL)
    {
        perror("GENERALFILE");
    }

    while(ch = fgetc(fg)!=EOF)
    {
        if (ch != '\n' && ch != 't' && ch != ' ')
        {
            tmp[i] = ch;
            i++;
        }
        else
        {
            printf("%s",tmp);
            l = strlen(tmp);
            j = l % 2;
            if (j == 0)
            {
                fe = fopen("evenfile.txt","ab");
                if (fe == NULL)
                {
                    perror("EVENFILE");
                }
                fwrite(&tmp,sizeof(tmp),1,fe);
                fclose(fe);
            }
            else
            {
                fo = fopen("oddfile.txt","ab");
                if (fo == NULL)
                {
                    perror("ODDFILE");
                }
                fwrite(&tmp,sizeof(tmp),1,fo);
                fclose(fo);
            }
        }
    }
}
My code compiles successfully, but I am not able to get the desired output.
Change:
while(ch = fgetc(fg)!=EOF)
to
while( (ch = fgetc(fg)) !=EOF)
The precedence rules make the two behave differently: != binds more tightly than =, so your version assigns the result of the comparison, and ch is always set to either 0 or 1, neither of which is a printable character.
There's no reason to open the files multiple times; just open each output file once before the loop. You would be wise to use fprintf rather than fwrite, and don't use the size of tmp as the length: you already calculated the length of the string, so use it! You might also want to write a newline or some separator after each word.

You also really need to check for buffer overflow and handle it if the input contains large words. Blindly assuming that your input words will fit into a 100-character array is a disaster waiting to happen, and only slightly worse than failing to put a null character at the end of the input and then calling strlen on a character array that may not be a string. (Which is another bug you have: you need to write a '\0' into tmp at the end of the word you read.)
Also note that it is quite annoying to get an error message which reads:
ODDFILE: no such file or directory
if the file that the program actually cares about is not called "ODDFILE".
And, finally (although it is entirely possible there are more bugs in this short program), it would be a great idea to stop relying on behavior that exists in compilers merely for conformance with standard practice from 1983. Declare the main function properly as int main(void), or (even better) use int main(int argc, char **argv) and allow the caller to specify the input file names. And return a value from main! Turn up your compiler warnings as well, since they will give you useful information.
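Putting that advice together, here is a minimal sketch (my own illustration, not the poster's program fixed line by line; it also repairs two bugs not called out above: the comparison against 't' instead of '\t', and the uninitialized index i):

#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *fg = fopen("generalfile", "r");
    FILE *fe = fopen("evenfile.txt", "a");   // opened once, before the loop
    FILE *fo = fopen("oddfile.txt", "a");
    if (fg == NULL || fe == NULL || fo == NULL) {
        perror("fopen");
        return 1;
    }

    char tmp[100];
    size_t i = 0;
    int ch;

    while ((ch = fgetc(fg)) != EOF) {        // note the parentheses
        if (ch != '\n' && ch != '\t' && ch != ' ') {
            if (i < sizeof tmp - 1)          // guard against overflowing tmp
                tmp[i++] = (char)ch;
        } else if (i > 0) {                  // a separator ends the word
            tmp[i] = '\0';                   // terminate before strlen()
            fprintf((strlen(tmp) % 2 == 0) ? fe : fo, "%s\n", tmp);
            i = 0;
        }
    }
    if (i > 0) {                             // flush a final word with no
        tmp[i] = '\0';                       // trailing separator
        fprintf((strlen(tmp) % 2 == 0) ? fe : fo, "%s\n", tmp);
    }

    fclose(fg);
    fclose(fe);
    fclose(fo);
    return 0;
}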