Trying to debug program - c

You are tasked to create a program that will read in a text file and compute statistics on the contents. Your program will count the number of alphabetic characters using isalpha(), the number of digits using isdigit() and the number of punctuation using ispunct(). (Zybook module 11). Your program will provide an appropriate report to display the analysis results to the user. Your program should work with any text file and the user should be able to select the file for input. You may create a test text file for testing and development. I suggest a short paragraph with punctuation and numbers that you can easily test the values with. Your array may be a VLA or FLA array. I suggest using an array of pointers and for each line of the text file allocate an array using calloc(). You may declare a static array, so long as it will hold 100 lines of 80 characters. A line length of 80 characters will be assumed.
I've written most of the code, but I am getting a couple of warnings, and the program terminates abruptly with a segmentation fault.
#include <stdio.h>
#include <ctype.h>
#include <string.h>
int main(void) {
//Declarations
char paragraphArray[100][80], filename[50];
int i = 0, puncts = 0, alphas = 0, nums = 0, line = 0;
//Ask user what file to load and assign it to filename
printf("Enter the filename to wish to load: ");
scanf("%s", filename);
//FIle pointer is tagged
FILE *fPoint;
//Opens file pointer as the user-named file in read mode
fPoint = fopen(filename, "r");
//If file not found, a NULL value is assigned and prints to screen.
if(fPoint == NULL)
printf("Cannot open file");
//Else statement that reads the file line by line starting at index i=0
else
{ i=0;
while(fgets(paragraphArray[i], 80, fPoint))
{ paragraphArray[i][(strlen(paragraphArray[i]))-1]='\0';
i++;
}
//After while loop is finished, the value of i is stored as number of lines.
line = i;
}
//Closing the file
fclose(fPoint);
//Function Calls
alphas = IsAlpha(paragraphArray, i);
nums = IsDigit(paragraphArray, i);
puncts = IsPunct(paragraphArray, i);
//Display statistics to screen.
printf("There are %d alphabet characters.\n", alphas);
printf("There are %d numerical digits.\n" , nums);
printf("There are %d punctuation marks.\n" , puncts);
}
//Function Definitions
int IsAlpha(char paragraphArray[100][80], int line){
int alphaCount = 0, i = 0, j = 0, asciiValue = 0;
//Outer loop that iterates through each line of the paragraph.
for(i = 0; i < line; i++){
//Inner loop that compares elements of the array to ASCII values
for(j = 0; j < 80; j++){
asciiValue = paragraphArray[i][j];
//if statement that does the comparison and adds to count value
if((asciiValue <= 90 && asciiValue >= 65) || (asciiValue >= 97 && asciiValue <= 122))
alphaCount++;
}
}
//Returns count of alphabet characters after all iterations.
return alphaCount;
}
int IsDigit(char paragraphArray[100][80], int line){
int digitCount = 0, i = 0, j = 0, asciiValue = 0;
//Outer loop that iterates through each line of paragraph
for(i = 0; i < line; i++){
//Inner loop that compares elements of array to ASCII values.
for(j = 0; j < 80; j++){
asciiValue = paragraphArray[i][j];
//If statement that does the comparison and adds count value.
if(asciiValue >= 48 && asciiValue <= 57)
digitCount++;
}
}
//Returns count of numbers after all iterations.
return digitCount;
}
int IsPunct(char paragraphArray[100][80], int line){
int punctCount = 0, i = 0, j = 0, asciiValue = 0;
//Outer Loop that iterates through each line of paragraph.
for(i = 0; i < line; i++){
//Inner loop that compares elements of array to ASCII Values
for(j = 0; j < 80; j++){
asciiValue = paragraphArray[i][j];
//If statement that does comparison and adds count value.
if((asciiValue >= 33 && asciiValue <= 47) || (asciiValue >= 58 && asciiValue <= 64) || (asciiValue >= 91 && asciiValue <= 96) ||(asciiValue >= 123 && asciiValue <= 126))
punctCount++;
}
}
return punctCount;
}

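Your problem is in the inner for loop condition.
You iterate with the condition j < 80, but if a line is shorter than 80 characters,
you compare the indeterminate bytes that follow its terminator.
Instead of 80, use strlen(paragraphArray[i]).
Also note that fclose(fPoint) is called even when fopen() failed; passing NULL to fclose() is undefined behavior and a likely cause of the crash. The warnings most likely come from calling IsAlpha(), IsDigit() and IsPunct() before they are declared.
You can also use valgrind or gdb to debug your program. :)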
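A minimal sketch of the suggestion above, assuming a hypothetical helper count_line() and a struct counts that are not part of the original program. It walks a line only up to strlen() and uses the isalpha(), isdigit() and ispunct() calls the assignment asks for:

```c
#include <ctype.h>
#include <string.h>

/* Running totals for the three character classes (hypothetical type). */
struct counts {
    int alphas;
    int digits;
    int puncts;
};

/* Adds the characters of one NUL-terminated line to the totals in *c. */
void count_line(const char *line, struct counts *c)
{
    size_t len = strlen(line);   /* stop at the terminator, not at column 80 */

    for (size_t j = 0; j < len; j++) {
        /* The <ctype.h> functions expect an unsigned char value (or EOF). */
        unsigned char ch = (unsigned char)line[j];

        if (isalpha(ch))
            c->alphas++;
        else if (isdigit(ch))
            c->digits++;
        else if (ispunct(ch))
            c->puncts++;
    }
}
```

Calling count_line(paragraphArray[i], &totals) for each i below line would replace the three hand-written ASCII-range functions.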

Related

random numbers being read into array instead of the text file values

I am trying to make a function that reads all the numbers from a text file into an array, where each line of the file has a number, ex:
57346
40963
24580
98307
98312
32777
10
16392
16396
...
My function does allocate the necessary size to store the values, but the values being stored are random ones and 0's that aren't in my text file. Output ex:
0
0
296386
0
-485579776
-653048057
584
0
2095946880
...
This is my code:
typedef struct set{
void** values;
int size;
}Set;
int checkSize(FILE* file) {
int counter = 0;
char chr;
chr = getc(file);
while (chr != EOF) {
if (chr == '\n') {
counter = counter + 1;
}
chr = getc(file);
}
return counter;
}
Set* readSet(FILE* file){
Set* new = malloc(sizeof(Set));
new->size = checkSize(file);
new->values = malloc(sizeof(void*)*new->size);
int arrayAux[new->size];
int i = 0, n;
while(i < new->size) {
fscanf(file, "%ld", &arrayAux[i]);
new->values[i] = arrayAux[i];
i++;
}
//loop to remove the first three lines of the file, which are the number of values in the file,
//the biggest value of the file and the division between the number of values and the biggest value
for(i = 0; i < 3; i++) {
new->values[i] = new->values[i + 1];
new->size--;
}
for (i = 0; i <= new->size; i++) {
printf("%d\n", new->values[i]);
}
return new;
}
How can I fix this? Thanks in advance for any help.
Why void * and not long?
int arrayAux[new->size]; is a variable-length array: it is valid since C99 (optional since C11), but it is allocated on the stack and is not needed here.
Read the value from the file into a long and assign it to the proper space in your list.
Why have the size in every row? Use a global int.
Why loop to step over the first three in the list?
size -= 3
i += 3
works just as well.
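One further point worth checking, not raised in the comments above: checkSize() reads the file to the end, so the later fscanf() calls start at end-of-file and store nothing, which would leave arrayAux holding indeterminate values. A rough sketch under that assumption, using the hypothetical names count_lines() and read_numbers(), rewinds the stream after counting and reads into long so the variable matches %ld:

```c
#include <stdio.h>

/* Counts newline characters, then rewinds so the caller can read the
   file again from the beginning. */
int count_lines(FILE *file)
{
    int counter = 0;
    int chr;   /* int, not char, so EOF can be told apart from a real byte */

    while ((chr = getc(file)) != EOF) {
        if (chr == '\n')
            counter++;
    }
    rewind(file);   /* without this, later reads start at end-of-file */
    return counter;
}

/* Reads up to max_count integers into values[]; returns how many were
   actually read, so the caller never uses unset elements. */
int read_numbers(FILE *file, long *values, int max_count)
{
    int n = 0;

    while (n < max_count && fscanf(file, "%ld", &values[n]) == 1)
        n++;
    return n;
}
```

The caller would allocate count_lines(file) elements of type long with malloc() and print only as many values as read_numbers() reports.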

Why I am getting a space character in my program in the place of third last character?

Even if I change the string str variable I get the same result.
#include <stdio.h>
#include <string.h>
void parser(char array[])
{
int a, b;
for (int i = 0; i < strlen(array); i++) {
if (array[i] == '>') {
a = i;
break;
}
}
for (int j = a; j < strlen(array); j++) {
if (array[j] == '<') {
b = j;
}
}
for (int p = 0, q = a + 1; p < b - a - 1, q < b; p++, q++) {
array[p] = array[q];
array[b - a] = '\0';
printf("%c", array[p]);
}
}
int main()
{
char str[] = "<h1>hello there i am programmer.</h1>";
parser(str);
return 0;
}
There are many things that could be written better in the code but they do not affect the result.
The line that produces the unexpected outcome is:
array[b-a]='\0';
When this for loop starts...
for(int p=0,q=a+1;p<b-a-1,q<b;p++,q++){
array[p]=array[q];
array[b-a]='\0';
printf("%c",array[p]);
}
... the values of a and b are 3 and 32.
The statement array[b-a]='\0'; puts the NUL terminator character at position 29 in array.
The loop starts with p=0, q=4 (a+1) and repeats while q<b*; the last iteration has p=27 and q=31.
When p is 25, q is 29 and array[29] has been repeatedly set to '\0' on the previous iterations, therefore '\0' is copied at position 25 and printed on screen.
You should set the NUL terminator only once, after the loop. And the right position for it is b-a-1, not b-a; you expressed this correctly in the for initialization (p=0) and exit condition (p<b-a-1).
All in all, the code around the last for loop should be like this:
for(int p=0, q=a+1;q<b;p++,q++){
array[p]=array[q];
printf("%c",array[p]);
}
array[b-a-1]='\0';
*The condition p<b-a-1 is ignored because of the comma operator: only the value of the last expression, q<b, is used. You probably wanted && between the conditions, but they are equivalent here, so one of them is enough.
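For comparison, a sketch of the same extraction written with strchr() and memmove() rather than hand-written loops. The function name extract_inner() is made up for this example, and unlike the original it stops at the first '<' after the '>' rather than the last one:

```c
#include <stdio.h>
#include <string.h>

/* Moves the text between the first '>' and the next '<' to the start of
   array and terminates it there. Returns the new length, or -1 if either
   delimiter is missing (array is then left unchanged). */
int extract_inner(char array[])
{
    char *open = strchr(array, '>');
    if (open == NULL)
        return -1;

    char *close = strchr(open + 1, '<');
    if (close == NULL)
        return -1;

    size_t len = (size_t)(close - (open + 1));

    /* Source and destination overlap, so memmove() is used, not memcpy(). */
    memmove(array, open + 1, len);
    array[len] = '\0';   /* terminator written once, after the copy */
    return (int)len;
}
```

With the string from the question, the terminator lands at index 28, which matches the b-a-1 position given above.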

Why is this array being initialized in an odd way?

I am reading K&R 2nd Edition and I am having trouble understanding exercise 1-13. The answer is this code
#include <stdio.h>
#define MAXHIST 15
#define MAXWORD 11
#define IN 1
#define OUT 0
main()
{
int c, i, nc, state;
int len;
int maxvalue;
int ovflow;
int wl[MAXWORD];
state = OUT;
nc = 0;
ovflow = 0;
for (i = 0; i < MAXWORD; i++)
wl[i] = 0;
while ((c = getchar()) != EOF)
{
if(c == ' ' || c == '\n' || c == '\t')
{
state = OUT;
if (nc > 0)
{
if (nc < MAXWORD)
++wl[nc];
else
++ovflow;
}
nc = 0;
}
else if (state == OUT)
{
state = IN;
nc = 1;
}
else
++nc;
}
maxvalue = 0;
for (i = 1; i < MAXWORD; ++i)
{
if(wl[i] > maxvalue)
maxvalue = wl[i];
}
for(i = 1; i < MAXWORD; ++i)
{
printf("%5d - %5d : ", i, wl[i]);
if(wl[i] > 0)
{
if((len = wl[i] * MAXHIST / maxvalue) <= 0)
len = 1;
}
else
len = 0;
while(len > 0)
{
putchar('*');
--len;
}
putchar('\n');
}
if (ovflow > 0)
printf("There are %d words >= %d\n", ovflow, MAXWORD);
return 0;
}
At the top, wl is being declared and initialized. What I don't understand is why is it looping through it and setting everything to zero if it just counts the length of words? It doesn't keep track of how many words there are, it just keeps track of the word length so why is everything set to 0?
I know this is unclear it's just been stressing me out for the past 20 minutes and I don't know why.
The ith element of the array wl[] is the number of words of length i that have been found in an input file. The wl[] array needs to be zero-initialized first so that ++wl[nc]; does not cause undefined behavior by attempting to use an uninitialized variable, and so that array elements that represent word lengths that are not present reflect that no such word lengths were found.
Note that ++wl[nc] increments the value wl[nc] when a word of length nc is encountered. If the array were not initialized, the first time the code attempts to increment an array element, it would be attempting to increment an indeterminate value. This attempt would cause undefined behavior.
Further, array indices that represent counts of word lengths that are not found in the input should hold values of zero, but without the zero-initialization, these values would be indeterminate. Even attempting to print these indeterminate values would cause undefined behavior.
The moral: initialize variables to sensible values, or store values in them, before attempting to use them.
It would seem simpler and be more clear to use an array initializer to zero-initialize the wl[] array:
int wl[MAXWORD] = { 0 };
After this, there is no need for the loop that sets the array values to zero (unless the array is used again for another file). But the posted code is from The C Answer Book by Tondo and Gimpel. This book provides solutions to the exercises found in the second edition of K&R in the style of K&R, using only ideas that have been introduced in the book before each exercise. This exercise, 1.13, occurs in "Chapter 1 - A Tutorial Introduction". This is a brief tour of the language lacking many details to be found later in the book. At this point, assignment and arrays have been introduced, but array initializers have not (this has to wait until Chapter 4), and the K&R code that uses arrays has initialized arrays using loops thus far. Don't read too much into code style from the introductory chapter of a book that is 30+ years old.
Much has changed in C since K&R was published, e.g., main() is no longer a valid function signature for the main() function. Note that the function signature must be one of int main(void) or int main(int argc, char *argv[]) (or alternatively int main(int argc, char **argv)), with a caveat for implementation-defined signatures for main().
Everything is set to 0 because if you don't initialize the array, its elements hold indeterminate (effectively random) values, and incrementing those gives wrong results. Instead of looping over every position of the array, you could write int wl[MAXWORD] = {0}; in place of int wl[MAXWORD]; this puts 0 at every position, so you don't have to write the loop.
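A small sketch of the initializer form suggested above. To keep it easy to test, it counts word lengths in a string rather than reading from getchar(); the helper name tally_word_lengths() is an assumption for this example and is not part of the K&R solution:

```c
#include <stdio.h>

#define MAXWORD 11

/* Adds the lengths of the whitespace-separated words in text to wl[].
   wl[] must already be zeroed by the caller. Returns the number of
   words whose length is >= MAXWORD. */
int tally_word_lengths(const char *text, int wl[MAXWORD])
{
    int nc = 0;       /* characters seen in the current word */
    int ovflow = 0;   /* words too long to fit in wl[] */

    for (;; text++) {
        char c = *text;

        if (c == ' ' || c == '\n' || c == '\t' || c == '\0') {
            if (nc > 0) {
                if (nc < MAXWORD)
                    ++wl[nc];
                else
                    ++ovflow;
            }
            nc = 0;
            if (c == '\0')
                break;   /* end of input also ends the last word */
        } else {
            ++nc;
        }
    }
    return ovflow;
}
```

A caller would declare int wl[MAXWORD] = { 0 }; and pass it in; the initializer zeroes every element, so no separate loop is needed before counting.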
I edited your code and put some comments in as I was working through it, to explain what's going on. I also changed some of your histogram calculations because they didn't seem to make sense to me.
Bottom line: It's using a primitive "state machine" to count up the letters in each group of characters that isn't white space. It stores this in wl[] such that wl[i] contains an integer that tells you how many groups of characters (sometimes called "tokens") have a word length of i. Because this is done by incrementing the appropriate element of wl[], each element must be initialized to zero. Failing to do so would lead to undefined behavior, but probably would result in nonsensical and absurdly large counts in each element of wl[].
Additionally, any token with a length that can't be reflected in wl[] will be tallied in the ovflow variable, so at the end there will be an accounting of every token.
#include <stdio.h>
#define MAXHIST 15
#define MAXWORD 11
#define IN 1
#define OUT 0
int main(void) {
int c, i, nc, state;
int len;
int maxvalue;
int ovflow;
int wl[MAXWORD];
// Initializations
state = OUT; //Start off not assuming we're IN a word
nc = 0; //Start off with a character count of 0 for current word
ovflow = 0; //Start off not assuming any words > MAXWORD length
// Start off with our counters of words at each length at zero
for (i = 0; i < MAXWORD; i++) {
wl[i] = 0;
}
// Main loop to count characters in each 'word'
// state keeps track of whether we are IN a word or OUTside of one
// For each character in the input stream...
// - If it's whitespace, set our state to being OUTside of a word
// and, if we have a character count in nc (meaning we've just left
// a word), increment the counter in the wl (word length) array.
// For example, if we've just counted five characters, increment
// wl[5], to reflect that we now know there is one more word with
// a length of five. If we've exceeded the maximum word length,
// then increment our overflow counter. Either way, since we're
// currently looking at a whitespace character, reset the character
// counter so that we can start counting characters with our next
// word.
// - If we encounter something other than whitespace, and we were
// until now OUTside of a word, change our state to being IN a word
// and start the character counter off at 1.
// - If we encounter something other than whitespace, and we are
// still in a word (not OUTside of a word), then just increment
// the character counter.
while ((c = getchar()) != EOF) {
if (c == ' ' || c == '\n' || c == '\t') {
state = OUT;
if (nc > 0) {
if (nc < MAXWORD) ++wl[nc];
else ++ovflow;
}
nc = 0;
} else if (state == OUT) {
state = IN;
nc = 1;
} else {
++nc;
}
}
// Find out which length has the most number of words in it by looping
// through the word length array.
maxvalue = 0;
for (i = 1; i < MAXWORD; ++i) {
if(wl[i] > maxvalue) maxvalue = wl[i];
}
// Print out our histogram
for (i = 1; i < MAXWORD; ++i) {
// Print the word length - then the number of words with that length
printf("%5d - %5d : ", i, wl[i]);
if (wl[i] > 0) {
len = wl[i] * MAXHIST / maxvalue;
if (len <= 0) len = 1;
} else {
len = 0;
}
// This is confusing and unnecessary. It's integer division, with no
// negative numbers. What we want to have happen is that the length
// of the bar will be 0 if wl[i] is zero; that the bar will have length
// 1 if the bar is otherwise too small to represent; and that it will be
// expressed as some fraction of MAXHIST otherwise.
//if(wl[i] > 0)
// {
// if((len = wl[i] * MAXHIST / maxvalue) <= 0)
// len = 1;
// }
// else
// len = 0;
// Multiply MAXHIST (our histogram maximum length) times the relative
// fraction, i.e., we're using a histogram bar length of MAXHIST for
// our statistical mode, and interpolating everything else.
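// Guard against maxvalue == 0 (empty input), which would divide by zero.
len = (maxvalue > 0) ? (int)(((double)wl[i] / maxvalue) * MAXHIST) : 0;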
// Our one special case might be if maxvalue is huge, a word length
// with just one occurrence might be rounded down to zero. We can fix
// that manually instead of using a weird logic structure.
if ((len == 0) && (wl[i] > 0)) len = 1;
while (len > 0) {
putchar('*');
--len;
}
putchar('\n');
}
// If any words exceeded the maximum word length, say how many there were.
if (ovflow > 0) printf("There are %d words >= %d\n", ovflow, MAXWORD);
return 0;
}

Error printing a string

I'm creating an algorithm that generates numbers (as strings) from 0 to 9999 and searches for the frequency of each one in an array a[50000].
char key[4];
int freq;
for (int i = 0; i < 10000; i++) {
sprintf(key,"%04i",i); // save 4 digits in key, if i <1000 save leading 0's
freq = BruteForceStringMatch(key,a,n); //n length of a.
printf("%s-%i\n",key,freq);
}
free(a);
But when I run the program, I get this:
.
.
.
9845-7
9846
-10
9847-4
9848-5
-139
9850-3
9851-6
9852-5
9853-4
9854-2
9855-7
9856-5
9857-4
9858-5
9859 -9
9860-3
.
.
.
9968-6
9969 -9
9970-5
9971-4
9972-7
9973-6
9974-6
9975-2
9976-7
9977-4
9978-2
9979-7
9980-3
9981-4
9982-3
9983 -9
9984-6
9985-7
998-8
9987 -9
9988-3
9989 -9
9990-4
9991-3
9992-5
9993-2
9994 -9
9995-5
9996-6
9997-7
9998-7
There are tabs in random positions, sometimes the last digit of key is missing, and there are values like 139, 113, etc. that I have no idea where they come from. I'm using gcc version 5.4.0 (GCC), compiling on Windows 10 in the babun terminal.
More information:
BruteForceStringMatch searches for the frequency of key in a.
int BruteForceStringMatch(char key[4], char* a, int length ){
int freq=0;
int k;
for (int j = 0; j < length -4; j++) {
k =0;
while(k <4 && key[k] == a[j+k])
k=k+1;
if(k == 4)
freq++;
}
return freq;
}
I get a from a file with 5000 digits.
FILE *inputfile;
char c;
int largo = 0;
char *a = (char *)malloc(50000*sizeof(char *));;
char *b = (char *)malloc(50000*sizeof(char *));;
inputfile = fopen("archivo_1.tex", "r");
if (inputfile == NULL) {
fprintf(stderr, "Failed to open the file.\n");
exit(1);
}
if (inputfile) {
for ( int i=0; (c = getc(inputfile)) != EOF; i++){
a[i] = c;
//putchar(a[i]);
largo++;
}
fclose(inputfile);
}
It seems to me your problem is that you defined "key" to be only four chars, when it should be five: four digits plus the terminating null. sprintf writes that null one byte past the end of "key", which here happens to be the first byte of "freq" (strictly speaking, this is undefined behavior). Then, when you set "freq" in line 5, that value gets seen by printf (on line 6) as part of the "key" string. In particular, you can see this in the output for the values 9859 and 8859, where the value of "freq" happens to be 9, which is the ASCII code for a tab. Likewise, for 9846 "freq" is 10, which is the ASCII value for linefeed (i.e., newline), and for 9849 "freq" is 13, which is a carriage return, so "-13" prints over the first three characters of 9849.
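The fix can be sketched as follows, assuming a hypothetical wrapper format_key(). The points that matter are the five-byte buffer and the use of snprintf(), which never writes more bytes than the size it is given:

```c
#include <stdio.h>

/* Formats i as a zero-padded, four-digit string in key, which must have
   room for five bytes (four digits plus the terminating null).
   Returns 1 if the value fit exactly, 0 if it was truncated or failed. */
int format_key(char key[5], int i)
{
    /* snprintf() returns the length the full output would have had. */
    int n = snprintf(key, 5, "%04d", i);

    return n == 4;
}
```

In the loop from the question, key would be declared as char key[5]; and filled with format_key(key, i) before the call to BruteForceStringMatch().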

Finding the most frequent character in a file in C

I'm writing a function that finds the most common alphabetic character in a file. The function should ignore all characters other than alphabetic.
At the moment I have the following:
int most_common(const char *filename)
{
char frequency[26];
int ch = 0;
FILE *fileHandle;
if((fileHandle = fopen(filename, "r")) == NULL){
return -1;
}
for (ch = 0; ch < 26; ch++)
frequency[ch] = 0;
while(1){
ch = fgetc(fileHandle);
if (ch == EOF) break;
if ('a' <= ch && ch <= 'z')
frequency[ch - 'a']++;
else if ('A' <= ch && ch <= 'Z')
frequency[ch - 'A']++;
}
int max = 0;
for (int i = 1; i < 26; ++i)
if (frequency[i] > frequency[max])
max = i;
return max;
}
Now the function returns how many times the most frequent letter occurred, not the character itself. I'm a bit lost, as I'm not sure if that's the way this function should look like at all. Does it make sense and how possibly can I fix the problem?
I would really appreciate your help.
The variable frequency is indexed by the character code. So frequency[0] is 5, if there have been 5 'a's.
In your code, max holds the index of the most frequent letter (0 for 'a', 1 for 'b', and so on), so the function returns that index rather than the actual character.
You need to store both the maximum frequency count and the character code that it referred to.
I would fix this with:
int maxCount = 0;
int maxChar = 0;
// i = 0 ('A') to 25 ('Z')
for (int i = 0; i < 26; ++i)
{
// if freq of this char is greater than the previous max freq
if (frequency[i] > maxCount)
{
// store the value of the max freq
maxCount = frequency[i];
// store the char that had the max freq
maxChar = i;
}
}
// character codes are zero-based alphabet.
// Add ASCII value of 'A' to turn back into a char code.
return maxChar + 'A';
Note that I changed int i = 1 to int i = 0. Because maxCount now starts at 0 instead of at frequency[0], the loop has to examine index 0 ('A') as well. The loop must stop at i < 26: frequency has 26 elements (indices 0 to 25), so reading frequency[26] would be out of bounds.
Note the braces. Your brace style (no braces for single-statement blocks) is widely discouraged.
Also, i++ is more common than ++i in cases like this. It makes no difference in this context, so I would advise i++.
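Putting the pieces together, one possible version of the whole function. The helpers tally_char() and most_common_letter() are names invented for this sketch, the counters are long rather than char so they do not overflow on large files, and, like the original, it assumes an ASCII-style character set in which the letters are contiguous. It returns the lowercase letter, 0 if the file contains no letters, or -1 if the file cannot be opened:

```c
#include <stdio.h>

/* Counts ch in frequency[] if it is a letter; case is ignored. */
void tally_char(int ch, long frequency[26])
{
    if ('a' <= ch && ch <= 'z')
        frequency[ch - 'a']++;
    else if ('A' <= ch && ch <= 'Z')
        frequency[ch - 'A']++;
}

/* Returns the most frequent letter as a lowercase character,
   or 0 if every count is zero. Ties go to the earlier letter. */
int most_common_letter(const long frequency[26])
{
    int max = 0;

    for (int i = 1; i < 26; i++) {
        if (frequency[i] > frequency[max])
            max = i;
    }
    return (frequency[max] > 0) ? 'a' + max : 0;
}

/* Returns the most common letter in the named file, 0 if it has no
   letters, or -1 if the file cannot be opened. */
int most_common(const char *filename)
{
    long frequency[26] = { 0 };
    int ch;
    FILE *fileHandle = fopen(filename, "r");

    if (fileHandle == NULL)
        return -1;

    while ((ch = fgetc(fileHandle)) != EOF)
        tally_char(ch, frequency);

    fclose(fileHandle);
    return most_common_letter(frequency);
}
```

Keeping the index in max and converting it with 'a' + max at the end avoids tracking the count and the character separately.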
