C program to count specific words in a file

The objective of the program is to rate a person's resume. The program should open and read two .txt files: one contains the keywords and the other is the resume itself. The program loops through keywords.txt and, for each keyword, tries to find a similar word in resume.txt. I got it almost working, but the program seems to treat the first space in the keywords file as the end of the file.
This is what I have (I tried switching the first word in the keywords file, and the count seems to work for whichever word comes first). It would be good to scan only letters, without symbols, and it is necessary to count the occurrences of every single keyword:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
int main(){
FILE* txtKey;
FILE* txtResume;
char keyWords[1000];
char word[10000];
int count;
txtKey=fopen("keywords.txt", "r");
if(txtKey == NULL){
printf("Failed to open txtKey file \n");
return 1;
}
txtResume=fopen("resume.txt", "r");
if(txtResume == NULL){
printf("Failed to open txtResume file \n");
return 1;
}
while (fscanf(txtKey, "%s", keyWords) != EOF)
{
while (fscanf(txtResume, "%s", word) != EOF)
{
if (strstr(word, keyWords) != NULL)
{
count++;
}
}
}
printf("The keywords were found %d times in your resume!", count);
fclose(txtResume);
fclose(txtKey);
return 0;
}//END MAIN

Note: This is prefaced by my top comments.
I've created a word list struct that holds a list of words. It is used twice: once to store the list of keywords, and a second time to hold the words of the current line of the resume file.
I coded it from scratch, because it's somewhat different than what you had:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#ifdef DEBUG
// standard C99 variadic macro (the named "_fmt..." form is a GNU extension)
#define dbgprt(...) \
do { \
printf(__VA_ARGS__); \
} while (0)
#else
#define dbgprt(...) \
do { \
} while (0)
#endif
typedef struct {
int list_max;
int list_cnt;
char **list_words;
} list_t;
list_t keywords;
list_t linewords;
char buf[10000];
int
wordsplit(FILE *xf,list_t *list,int storeflg)
{
char *cp;
char *bp;
int valid;
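if (! storeflg)
list->list_cnt = 0;
// make sure the list exists and is NULL-terminated even when no words are
// added (an empty file or a blank line would otherwise leave it NULL or stale)
if (list->list_words == NULL) {
list->list_max = 100;
list->list_words = calloc(list->list_max + 1,sizeof(char *));
if (list->list_words == NULL)
exit(1);
}
list->list_words[list->list_cnt] = NULL;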
do {
cp = fgets(buf,sizeof(buf),xf);
valid = (cp != NULL);
if (! valid)
break;
bp = buf;
while (1) {
cp = strtok(bp," \t\n");
bp = NULL;
if (cp == NULL)
break;
// grow the list
if (list->list_cnt >= list->list_max) {
char **tmp;
list->list_max += 100;
tmp = realloc(list->list_words,
sizeof(char *) * (list->list_max + 1));
if (tmp == NULL) {
perror("realloc");
exit(1);
}
list->list_words = tmp;
}
if (storeflg)
cp = strdup(cp);
list->list_words[list->list_cnt++] = cp;
list->list_words[list->list_cnt] = NULL;
}
} while (0);
return valid;
}
void
listdump(list_t *list,const char *tag)
{
char **cur;
dbgprt("DUMP: %s",tag);
for (cur = list->list_words; *cur != NULL; ++cur) {
dbgprt(" '%s'",*cur);
}
dbgprt("\n");
}
int
main(void)
{
FILE *xf;
int count;
xf = fopen("keywords.txt","r");
if (xf == NULL)
return 1;
while (1) {
if (! wordsplit(xf,&keywords,1))
break;
}
fclose(xf);
listdump(&keywords,"KEY");
count = 0;
xf = fopen("resume.txt","r");
if (xf == NULL)
return 2;
while (1) {
if (! wordsplit(xf,&linewords,0))
break;
listdump(&linewords,"CUR");
for (char **str = linewords.list_words; *str != NULL; ++str) {
dbgprt("TRYCUR: '%s'\n",*str);
for (char **key = keywords.list_words; *key != NULL; ++key) {
dbgprt("TRYKEY: '%s'\n",*key);
if (strcmp(*str,*key) == 0) {
count += 1;
break;
}
}
}
}
fclose(xf);
printf("keywords found %d times\n",count);
return 0;
}
UPDATE:
Is there any option to make it simpler? I don't think I know all the concepts in this answer, although the result is perfect.
Yes, based on your code, I realized that what I did was a bit advanced. But reusing the list as I did actually saved some replicated code (e.g. why have separate parsing code for the keywords and the resume data when they are so similar?).
There's standard documentation for all the libc functions (e.g. fgets, strtok, strcmp).
If you know the [maximum] number of keywords beforehand [this is possible to do], you could use a fixed size char ** array [similar to what you had].
Or, you could just do a realloc on a char **keywords array on every new keyword (e.g. cp). And, maintain a separate count variable (e.g. int keycnt). This would be fine if we only needed one list (i.e. we could forego the list_t struct).
We could replicate some of the keyword code for the second loop in main, and again, use different variables for the array and its count.
But, this is wasteful. list_t is an example of using realloc efficiently (i.e. calling it less often). This is a standard technique.
If you do a web search on "dynamic resize array realloc", one of the entries you'll find is: https://newton.ex.ac.uk/teaching/resources/jmr/appendix-growable.html
Note the use of strdup to preserve the word values for the keyword list beyond the next call to fgets.
Hopefully, that covers enough for you to study it a bit. "How do I implement a dynamically resizing array using realloc?" comes up quite frequently as a question on SO, so you could also search here for one.
Also, how could it work if the keywords.txt list has words separated by ","?
To parse by ",", just change the second arg to strtok to include it (e.g. " \t,\n"). That will work for abc def, abc,def, or abc, def.

Related

Illegal instruction 4 when placing a function outside int main

I've just begun learning the C language and I ran into an issue with one of my programs.
I am getting an error: "Illegal instruction 4" when executing: ./dictionary large.txt
Large.txt is a file with 143091 alphabetically sorted words, with each word starting on a new line. I am trying to load all of them into a hash table and return true if all the words are loaded successfully.
This code works for me if I put the body of bool load() directly inside int main and remove load() entirely. However, once I move it into the load() function and call it from main, I get the error.
I would appreciate help on this, as there are not many threads on Illegal instruction.
This is my code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <stdbool.h>
// Maximum length for a word
// (e.g., pneumonoultramicroscopicsilicovolcanoconiosis)
#define LENGTH 45
// Number of letters in the english alphabet
#define ALPHABET_LENGTH 26
// Default dictionary
#define DICTIONARY "large.txt"
// Represents a node in a hash table
typedef struct node
{
char word[LENGTH + 1];
struct node *next;
} node;
// Number of buckets in hash table
const unsigned int N = ALPHABET_LENGTH;
// Hash table
node *table[N];
// Load function
bool load(char *dictionary);
// Hash function
int hash(char *word);
int main(int argc, char *argv[])
{
// Check for correct number of args
if (argc != 2 && argc != 3)
{
printf("Usage: ./speller [DICTIONARY] text\n");
exit(1);
}
// Determine which dictionary to use
char *dictionary = (argc == 3) ? argv[1] : DICTIONARY;
bool loaded = load(dictionary);
// TODO: free hashtable from memory
return 0;
}
bool load(char *dictionary)
{
// Open dictionary for reading
FILE *file = fopen(dictionary, "r");
if (file == NULL)
{
printf("Error 2: could not open %s. Please call customer service.\n", dictionary);
exit(2);
}
// Initialize array to NULL
for (int i = 0; i < N; i++)
table[i] = NULL;
// Declare and initialize variables
unsigned int char_count = 0;
unsigned int word_count = 0;
char char_buffer;
char word_buffer[LENGTH + 1];
int hash_code = 0;
int previous_hash_code = 0;
// Declare pointers
struct node *first_item;
struct node *current_item;
struct node *new_item;
// Is true the first time the while loop is ran to be able to distinguish between hash_code and previous_hash_code after one loop
bool first_loop = true;
// Count the number of words in dictionary
while (fread(&char_buffer, sizeof(char), 1, file))
{
// Builds the word_buffer by scanning characters
if (char_buffer != '\n')
{
word_buffer[char_count] = char_buffer;
char_count++;
}
else
{
// Increases word count each time char_buffer == '\n'
word_count += 1;
// Calls the hash function and stores its value in hash_code
hash_code = hash(&word_buffer[0]);
// Creates and initializes first node in a given table index
if (hash_code != previous_hash_code || first_loop == true)
{
first_item = table[hash_code] = (struct node *)malloc(sizeof(node));
if (first_item == NULL)
{
printf("Error 3: memory not allocated. Please call customer service.\n");
return false;
}
current_item = first_item;
strcpy(current_item->word, word_buffer);
current_item->next = NULL;
}
else
{
new_item = current_item->next = (struct node *)malloc(sizeof(node));
if (new_item == NULL)
{
printf("Error 4: memory not allocated. Please call customer service.\n");
return false;
}
current_item = new_item;
strcpy(current_item->word, word_buffer);
current_item->next = NULL;
}
// Fills word buffer elements with '\0'
for (int i = 0; i < char_count; i++)
{
word_buffer[i] = '\0';
}
// Signals the first loop has finished.
first_loop = false;
// Clears character buffer to keep track of next word
char_count = 0;
// Keeps track if a new table index should be initialized
previous_hash_code = hash_code;
}
}
return true;
}
// Hash in order of: 'a' is 0 and 'z' is 25
int hash(char *word_buffer)
{
int hash = word_buffer[0] - 97;
return hash;
}
Thank you in advance!
Chris
You should use node *table[ALPHABET_LENGTH]; for the table declaration instead of node *table[N];
There is a difference between macro constants and const variables: a macro expands to a constant expression, which can be used wherever one is required, such as the bound of a global array as in your case, whereas a const variable is not a constant expression in C and cannot be used there.
The compiler you say you are using, gcc, rejects this declaration even with no compiler flags, issuing the error message:
error: variably modified 'table' at file scope
You can read more about these differences and their use cases in "static const" vs "#define" vs "enum". It also covers static and enum, and is a nice read for grasping the differences between these concepts.

Creating a Dictionary in C

I am currently working on creating a dictionary using a binary search tree-like structure we designed in class.
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
struct entry
{
char* word;
unsigned int n; /* n is the number of times the word appears in the source. */
struct entry *left;
struct entry *right;
};
/*input_from_args: if no additional argument is given, return stdin. Else, open the text file and read it.*/
FILE*
input_from_args(int argc, const char *argv[]){
if(argc==1){
return stdin;
}else{
return fopen(argv[1],"r");
}
}
Below is the insert function that we also wrote in my class. Given the new word we are looking at, if it is
struct entry*
insert(struct entry *table, char* str)
{
if(table == NULL){
table = (struct entry *)malloc(sizeof(struct entry));
strcpy(table->word,str);
table -> n = 1;
table -> left = NULL;
table -> right = NULL;
}else if(strcmp(str, table->word) == 0){
table -> n = (table ->n)+1;
}else if(strcmp(str, table->word) <0){
table->left = insert(table->left, str);
}else if(strcmp(str, table->word) >0){
table ->right = insert(table->right, str);
}
return table;
}
Below is a print function, which I wrote myself, that prints every word in the table and n, the number of times it occurs.
void
print_table(struct entry *table){
if(table!=NULL){
print_table(table->left);
printf("%s ", table->word);
printf("%d \n", table->n);
print_table(table->right);
}
}
And finally, below is the main function.
int
main(int argc, const char *argv[])
{
FILE *src = input_from_args(argc, argv);
if(src == NULL){
fprintf(stderr, "%s: unable to open %s\n", argv[0], argv[1]);
exit(EXIT_FAILURE);
}
char str[1024];
struct entry *table;
int c;
while((fscanf(src, "%s", str))!= EOF){
table = insert(table, str);
}
print_table(table);
return 0;
}
I'm seeing some very odd behavior when I run this program. It only seems to happen with longer input.
When I run it with this input(in a .txt file):
This is a test.
This is a test.
This is a test.
I get the following output:
This 3
a 3
is 3
test 3
This is what I should be getting. However, when I give it slightly longer input, such as:
Apple Apple
Blue Blue
Cat Cat
Dog Dog
Elder Elder
Funions Funions
Gosh Gosh
Hairy Hairy
I get the following output:
Appme 2
Blue 2
Cat 2
Dog 2
Elder 2
Funions 2
Gosi 2
Hairy 2
Which is clearly correct as far as the numbers go, but why is it changing some of the letters in my words? I gave it Apple, it returned Appme. I gave it Gosh, it gave me Gosi. What's going on with my code that I am missing?
This line in the insert function is very problematic:
strcpy(table->word,str);
It's problematic because you don't actually allocate memory for the string. That means that table->word is uninitialized and its value will be indeterminate, so the strcpy call will lead to undefined behavior.
The simple solution? Use strdup to duplicate the string:
table->word = strdup(str);
The strdup function is not part of standard C before C23 (it comes from POSIX), but just about all platforms have it.
In your insert function, you do not allocate/malloc() space for the word pointer you are trying to strcpy() to:
if(table == NULL){
table = (struct entry *)malloc(sizeof(struct entry));
strcpy(table->word,str);
table -> n = 1;
table -> left = NULL;
table -> right = NULL;
}
This code copies data to memory you don't own, which is undefined behavior: it often crashes with a segmentation fault, and in your case it silently corrupts data instead. It is easy to fix:
table->word = malloc(strlen(str) + 1);
strcpy(table->word, str);
You'll want to allocate one extra byte above the string length, to allow for the null terminator.
You do not need or want to cast the result of malloc(). In other words, this is fine:
table = malloc(sizeof(struct entry));
Get into the habit of using free() on any pointers you have malloc()-ed, when you are done with them. Otherwise, you end up with a memory leak.
Also, compile with warnings enabled: -Wall -Wextra with gcc or clang, or -Weverything with clang to enable every warning.
Note: if strdup() is not available on your platform, it is easy to write a custom replacement:
char* my_very_own_strdup(const char* src)
{
char* dest = NULL;
if (!src)
return dest;
size_t src_len = strlen(src) + 1;
dest = malloc(src_len);
if (!dest) {
perror("Error: Could not allocate space for string copy"); // perror adds ": <reason>\n" itself
exit(EXIT_FAILURE);
}
memcpy(dest, src, src_len);
return dest;
}
On the line strcpy(table->word,str); where is table->word allocated?
It isn't: table->word is an uninitialized pointer, so strcpy writes the string to whatever address it happens to hold, which is undefined behavior. So be careful: you must allocate table->word there.
I would use this instead: table->word = strdup(str);

Pointer being freed was not allocated, Abort trap: 6

I'm not proficient in C programming, so please excuse me if this isn't a strong question. In the following code, I can only allocate memory to samplesVec after obtaining the value of nsamplepts, but I need to return the vector samplesVec to main for further use (not yet coded). However, I'm getting the following error:
Error in Terminal Window:
ImportSweeps(3497,0x7fff7b129310) malloc: *** error for object 0x7fdaa0c03af8: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
I'm using Mac OS X Mavericks with the gcc compiler. Thanks for any help.
EDIT: After valuable input from commenters, the following represents a solution to the original problem (the original code is no longer available).
The following code modification seemed to solve my original questions. Thanks for the valuable inputs everyone!
/* Header Files */
#define LIBAIFF_NOCOMPAT 1 // do not use LibAiff 2 API compatibility
#include <libaiff/libaiff.h>
#include <unistd.h>
#include <stdio.h>
#include <dirent.h>
#include <string.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <math.h>
/* Function Declarations */
void FileSearch(char*, char*, char*, char*, char*);
int32_t *ImportSweeps(char*);
/* Main */
int main()
{
char flag1[2] = "N";
char binname[20] = "bin1"; // dummy assignment
char buildfilename[40] = "SweepR";
char skeletonpath[100] = "/Users/.../Folder name/";
int k, len;
/* Find the sweep to be imported in the directory given by filepath */
FileSearch(skeletonpath, binname, buildfilename, skeletonpath, flag1);
if (strcmp(flag1,"Y")) {
printf("No file found. End of program.\n");
} else {
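len = (int) strlen(skeletonpath);
char *filepath = malloc(len + 1); // +1 for the terminating '\0'
if (filepath == NULL)
return 1;
for (k = 0; k <= len; k++) { // <= also copies the '\0', so printf("%s") stops in bounds
filepath[k] = skeletonpath[k];
}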
printf("File found! Filepath: %s\n", filepath);
// Proceed to import sweep
int32_t *sweepRfile = ImportSweeps(filepath);
if (sweepRfile) {
printf("Success!\n");
// Do other things with sweepRfile
free(sweepRfile);
}
free(filepath);
}
return 0;
}
/* Sub-Routines */
void FileSearch(char *dir, char *binname, char *buildfilename, char* filepath, char* flag1)
{
DIR *dp;
struct dirent *entry;
struct stat statbuf;
if((dp = opendir(dir)) == NULL) {
fprintf(stderr,"Cannot open directory: %s\n", dir);
return;
}
chdir(dir);
while((entry = readdir(dp)) != NULL) {
lstat(entry->d_name, &statbuf);
if(S_ISDIR(statbuf.st_mode)) {
/* Found a directory, but ignore . and .. */
if(strcmp(".",entry->d_name) == 0 || strcmp("..",entry->d_name) == 0)
continue;
strcpy(binname,entry->d_name);
strcpy(buildfilename,"SweepR");
/* Recurse at a new indent level */
FileSearch(entry->d_name, binname, buildfilename, filepath, flag1);
}
else {
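// sprintf with buildfilename as both destination and source is undefined behavior; append instead
strcat(buildfilename, binname);
strcat(buildfilename, ".aiff");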
if (strcmp(entry->d_name,buildfilename)) {
strcpy(buildfilename,"SweepR");
} else {
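// same overlap problem as above: append the pieces instead of sprintf'ing filepath into itself
strcat(filepath, binname);
strcat(filepath, "/");
strcat(filepath, buildfilename);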
strcpy(flag1,"Y");
break;
}
}
}
chdir("..");
closedir(dp);
}
int32_t *ImportSweeps(char *filepath)
{
char *filepathread = filepath;
/* Initialize files for importing */
AIFF_Ref fileref;
/* Intialize files for getting information about AIFF file */
uint64_t nSamples;
int32_t *samples = NULL;
int32_t *samplesVec = NULL;
int channels, bitsPerSample, segmentSize, ghost, nsamplepts;
double samplingRate;
/* Import Routine */
fileref = AIFF_OpenFile(filepathread, F_RDONLY) ;
if(fileref)
{
// File opened successfully. Proceed.
ghost = AIFF_GetAudioFormat(fileref, &nSamples, &channels, &samplingRate, &bitsPerSample, &segmentSize);
if (ghost < 1)
{
printf("Error getting audio format.\n");
AIFF_CloseFile(fileref);
return NULL;
}
nsamplepts = ((int) nSamples)*channels;
samples = malloc(nsamplepts * sizeof(int32_t));
samplesVec = malloc(nsamplepts * sizeof(int32_t));
ghost = AIFF_ReadSamples32Bit(fileref, samples, nsamplepts);
if (ghost) {
for (int k = 0; k < nsamplepts; k++) {
samplesVec[k] = *(samples+k);
}
}
free(samples);
AIFF_CloseFile(fileref);
}
return samplesVec;
}
So... as far as I can see... :-)
samplesVec, the return value of ImportSweeps, is not initialized if fileref is false. Automatic (i.e. local) variables have no guaranteed value unless they are explicitly initialized; in other words, samplesVec could hold any address. If samplesVec happens not to be NULL (which may often be the case), you try to free memory that was never allocated or, by very bad luck, memory that was allocated somewhere else.
If my guess is correct, you can easily fix this with:
int32_t *samples;
int32_t *samplesVec = NULL;
It is a good idea anyway to initialize every variable as early as possible with some meaningful error or dummy value if you don't use it on the very next line. As pointers are horrible beasts, I always set them to NULL if I can't initialize them with a useful value when declaring them.
Edit: Several minor small changes for a readable approximation to English. :-)
If AIFF_OpenFile fails, ImportSweeps returns an undefined value because samplesVec wasn't initialized. If that value is non-NULL, main will try to free it. You can either initialize samplesVec = NULL, or you can reorganize the code as
fileref = AIFF_OpenFile(filepathread, F_RDONLY) ;
if(!fileref)
{
// print error message here
return NULL;
}
// File opened successfully. Proceed.
...
There are people who will insist that a function should have only one exit; they are poorly informed and voicing a faulty dogma handed down from others who are likewise uninformed and dogmatic. The error check and early return above is known as a guard clause. The alternative style, indenting every time a test succeeds, yields the arrow anti-pattern, which is harder to read, harder to modify, and more error-prone. See http://blog.codinghorror.com/flattening-arrow-code/ and http://c2.com/cgi/wiki?ArrowAntiPattern for some discussion.

C, looping over an array of char* (strings) doesn't work. Why?

I have a problem with my array of char*:
char *original_file_name_list[500];
while((dp=readdir(dir)) != NULL) {
original_file_name = dp->d_name;
original_file_name_list[counter] = original_file_name;
printf("%s\n",original_file_name_list[0]);
printf("%d\n",counter);
counter++;
}
The problem is that it prints every file name, even though it should print only the first file, right?
And if I try printf("%s\n",original_file_name_list[1]); it doesn't work, which suggests that it is writing only to the first string. Any idea why?
Edit: The compiler reports no syntax errors.
You're not copying the string at all. Also, your original_file_name_list array doesn't have space for a list of file names, only for a list of pointers. dp->d_name refers to storage inside the struct dirent returned by readdir(), and that data may be overwritten by the next readdir() call on the same directory stream, so you can't rely on the memory behind the pointer staying valid. Because of that, you have to make a copy for yourself.
#include <stdio.h>
#include <string.h>
#include <dirent.h>
int main(int argc, char** argv){
char original_file_name_list[50][50];
size_t counter = 0;
struct dirent *dp;
DIR *dir = opendir("."); // the directory to list; "." is just an example
if(dir == NULL) return 1;
while(counter < 50 && (dp = readdir(dir)) != NULL) // ordinary reading of files from dir, at most 50 names
{
size_t len = strlen(dp->d_name);
if(len >= 50) len = 49; // truncate names that don't fit
strncpy(original_file_name_list[counter], dp->d_name, len);
original_file_name_list[counter][len] = '\0';
printf("%zu\n",counter);
counter++;
}
closedir(dir);
if(counter >= 2)
printf("%s\n",original_file_name_list[1]); // only valid if the directory has at least 2 entries
return 0;
}
I'm not sure about the purpose of counter2 (I have replaced it with counter), but I can propose the following code, which calls strdup() to store the file names:
char *original_file_name_list[500] = {0}; // it is better to init it here
while(counter < 500 && (dp=readdir(dir)) != NULL) { // stop before overflowing the 500-slot array
original_file_name_list[counter] = strdup(dp->d_name); // strdup() is ok to use
// here, see the comments
printf("%s\n%d\n",original_file_name_list[counter], counter);
counter++;
}
/* some useful code */
/* don't forget to free the items of list (allocated by strdup(..) )*/
for (int i = 0; i < 500; ++i) {
free(original_file_name_list[i]);
}

Reallocating memory for a struct array in C

I am having trouble with a struct array. I need to read in a text file line by line and count pairs of adjacent characters. For example, "Mama" would return 2 ma and 1 am, because the adjacent pairs are ma, am, ma. I have a struct:
typedef struct{
char first, second;
int count;
} pair;
I need to create an array of structs for the entire string, and then compare those structs. We were also introduced to memory allocation, so the program has to work for a file of any size. That is where my trouble really comes in. How do I reallocate the memory properly for an array of structs? This is my main as of now (it doesn't compile; I'm obviously having trouble with this part).
int main(int argc, char *argv[]){
//allocate memory for struct
pair *p = (pair*) malloc(sizeof(pair));
//if memory allocated
if(p != NULL){
//Attempt to open io files
for(int i = 1; i<= argc; i++){
FILE * fileIn = fopen(argv[i],"r");
if(fileIn != NULL){
//Read in file to string
char lineString[137];
while(fgets(lineString,137,fileIn) != NULL){
//Need to reallocate here, sizeof returning error on following line
//having trouble seeing how much memory I need
pair *realloc(pair *p, sizeof(pair)+strlen(linestring));
int structPos = 0;
for(i = 0; i<strlen(lineString)-1; i++){
for(int j = 1; j<strlen(lineSTring);j++){
p[structPos]->first = lineString[i];
p[structPos]->last = lineString[j];
structPos++;
}
}
}
}
}
}
else{
printf("pair pointer length is null\n");
}
}
I am happy to change things around obviously if there is a better method for this. I HAVE to use the above struct, have to have an array of structs, and have to work with memory allocation. Those are the only restrictions.
Allocating memory for an array of struct is as simple as allocating for one struct:
pair *array = malloc(sizeof(pair) * count);
Then you can access each item by subscripting "array":
array[0] => first item
array[1] => second item
etc
Regarding the realloc part, instead of:
pair *realloc(pair *p, sizeof(pair)+strlen(linestring));
(which is not syntactically valid, looks like a mix of realloc function prototype and its invocation at the same time), you should use:
p=realloc(p,[new size]);
In fact, you should use a different variable to store the result of realloc: if the allocation fails, realloc returns NULL but leaves the already allocated memory in place, and by overwriting p you would lose its address. That said, on most Unix systems doing casual processing (not some heavy-duty task), reaching the point where malloc/realloc returns NULL is rare (you must have exhausted all free virtual memory). Still, it's better to write:
pair*newp=realloc(p,[new size]);
if(newp != NULL) p=newp;
else { ... last resort error handling, screaming for help ... }
So if I get this right, you're counting how many times pairs of characters occur? Why all the mucking about with nested loops and that pair struct when you can just keep a frequency table with one counter for each of the 65,536 possible byte pairs (a 64K-entry array)? That is much simpler, and much faster than searching a list of pairs for every character.
Here's roughly what I would do (SPOILER ALERT: especially if this is homework, please don't just copy/paste):
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
void count_frequencies(size_t* freq_tbl, FILE* pFile)
{
int first, second;
first = fgetc(pFile);
while( (second = fgetc(pFile)) != EOF)
{
/* Only consider printable characters */
if(isprint(first) && isprint(second))
++freq_tbl[(first << 8) | second];
/* Proceed to next character */
first = second;
}
}
int main(int argc, char*argv[])
{
size_t* freq_tbl;
FILE* pFile;
size_t i;
/* Handle some I/O errors */
if(argc < 2)
{
fprintf(stderr, "No file given\n"); /* errno isn't set here, so perror() would print a misleading reason */
return EXIT_FAILURE;
}
if(! (pFile = fopen(argv[1],"r")))
{
perror ("Error opening file");
return EXIT_FAILURE;
}
/* One counter for every possible pair of byte values (2^16 entries) */
if(! (freq_tbl = calloc(1 << 16, sizeof(size_t))))
{
perror ("Error allocating frequency table");
fclose(pFile);
return EXIT_FAILURE;
}
/* No feof() check is needed: an empty file simply yields no pairs */
count_frequencies(freq_tbl, pFile);
fclose(pFile);
/* Print frequencies */
for(i = 0; i <= 0xffff; ++i)
if(freq_tbl[i] > 0)
printf("%c%c : %zu\n", (char) (i >> 8), (char) (i & 0xff), freq_tbl[i]);
free(freq_tbl);
return EXIT_SUCCESS;
}
Sorry for the bit operations and hex notation. I just happen to like them in this context of char tables, but they can be replaced with multiplications and additions, etc., for clarity.
