Huffman encoding in C

I am trying to write a module that assigns Huffman code words to the input symbols, but the codes it produces differ from what I expected.
For example, if I run it with the following symbol probabilities:
(1st column: probabilities; 2nd column: my Huffman codes; 3rd column: correct Huffman codes)
0,25 --> 01 --> 10
0,15 --> 101 --> 111
0,15 --> 110 --> 110
0,1 --> 1111 --> 010
0,1 --> 000 --> 001
0,05 --> 0010 --> 0110
0,05 --> 0011 --> 0001
0,05 --> 1000 --> 0000
0,05 --> 1001 --> 01111
0,05 --> 1110 --> 01110
I think the problem might be in my function for generating the Huffman codes: strcat() on its own did not behave the way I needed, so I combined it with strcpy(). I am not sure whether that is a good approach, though.
Below are the two functions responsible for assigning codes, build_huffman_tree() and generate_huffman_tree(). Hopefully you can help me out and point out where the problem could be.
Generate Huffman tree:
void generate_huffman_tree(node *n, char *code){
if(n->left== NULL && n->right== NULL){
SYMBOLS[code_counter] = n->symbol; // these 3 lines just store the current code, not important
CODES[code_counter] = strdup(code);
code_counter += 1;
}
if(n->left!= NULL){
char temp[100];
strcpy(temp, code);
strcat(temp, "0");
generate_huffman_tree(n->left, temp);
}
if(n->right!= NULL){
char temp[100];
strcpy(temp, code);
strcat(temp, "1");
generate_huffman_tree(n->right, temp);
}
}
Build Huffman tree:
node *build_huffman_tree(double *probabilities){
int num_of_nodes = NUM_SYMBOLS;
int num = NUM_SYMBOLS;
// 1) Initialization: Create new node for every probability
node *leafs = (node*) malloc(num_of_nodes*sizeof(node));
int i;
for(i=0; i<num_of_nodes; i+=1){
node c;
c.probability= *(probabilities + i);
c.symbol= *(SYMBOLS + i);
c.left= NULL;
c.right= NULL;
*(leafs+i) = c;
}
node *root= (node*) malloc(sizeof(node)); // Root node which will be returned
while(num_of_nodes> 1){
// 2) Find 2 nodes with lowest probabilities
node *min_n1= (node*)malloc(sizeof(node));
node *min_n2 = (node*)malloc(sizeof(node));
*min_n1 = *find_min_node(leafs, num, min_n1);
leafs = remove_node(leafs, min_n1, num);
num -= 1;
*min_n2= *find_min_node(leafs, num, min_n2);
leafs = remove_node(leafs, min_n2, num);
num -= 1;
// 3) Create parent node, and assign 2 min nodes as its children
// add parent node to leafs, while its children have been removed from leafs
node *new_node = (node*) malloc(sizeof(node));
new_node->probability= min_n1->probability + min_n2->probability;
new_node->left= min_n1;
new_node->right= min_n2;
leafs = add_node(leafs, new_node, num);
num += 1;
num_of_nodes -= 1;
root = new_node;
}
return root;
}
I have tested the functions for finding the 2 minimum nodes and for removing and adding nodes to the leafs structure, and they work fine, so I guess the problem is somewhere in the code shown here.

I didn't look at your source code, but there's nothing wrong with the Huffman code you generated. There is also nothing wrong with what you are calling "correct huffman codes". There is more than one valid Huffman code possible with that set of probabilities. If you take the sum of the probabilities times the bit lengths for both Huffman codes, you will find that those sums are exactly the same. Both Huffman codes are optimal, even though they're different.
The way this happens is that when you look for the two lowest frequencies, there is more than one choice. Depending on which choice you make, you will get a different tree.
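To make the "sum of the probabilities times the bit lengths" check concrete, here is a small sketch that computes it for both columns of the table in the question. The names expected_length, PROB, MINE and REFERENCE are made up for this illustration; the data is copied from the question.

```c
#include <assert.h>
#include <string.h>

/* Expected code length: sum over all symbols of probability * code length in bits. */
static double expected_length(const double *p, const char *const *codes, int n)
{
    double sum = 0.0;
    assert(n > 0);
    for (int i = 0; i < n; i++)
        sum += p[i] * (double)strlen(codes[i]);
    return sum;
}

/* Data copied from the table in the question. */
static const double PROB[10] = {0.25, 0.15, 0.15, 0.1, 0.1,
                                0.05, 0.05, 0.05, 0.05, 0.05};
static const char *const MINE[10] = {"01", "101", "110", "1111", "000",
                                     "0010", "0011", "1000", "1001", "1110"};
static const char *const REFERENCE[10] = {"10", "111", "110", "010", "001",
                                          "0110", "0001", "0000", "01111", "01110"};
```

Calling expected_length(PROB, MINE, 10) and expected_length(PROB, REFERENCE, 10) gives 3.1 bits per symbol in both cases (up to floating-point rounding), which is why neither code is "more correct" than the other.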

The code below is an implementation of Mark Allen Weiss's algorithm. Give it a try!
It offers routines similar to yours, plus a function that decodes the output using the codes previously built for each letter.
The compiler used is MinGW 2.95 (C-Free 4.0).
Prerequisites:
An input file containing text (any text, but note that only alphabetic characters are processed: no punctuation, spaces, or digits).
Modify the constant IN_PATH to point to the location of your text file so that the program runs successfully.
The image shows a sample text, the letter proportions and the result of the Huffman code interpretation (letters separated by one space).
Good luck!
//*******************************************************************
// File: HuffmanEncoding - Tree.c
// Author(s): Mohamed Ennahdi El Idrissi
// Date: 14-Aug-2012
//
// Input Files: in.txt
// Output Files: out.txt
// Description: CSC 2302 - <Data Structures>
// <Struct, Array, File I/O, Recursion, Pointers, Binary Tree>
// This program covers the Huffman Encoding concept.
// We first read a file, from which we count the number of characters, and then reckon the frequency
// of each letter individually. Each letter's frequency is stored in a node with its respective character.
// This node is stored in an array of 26 elements (element 0 -> 'A', element 1 -> 'B'...element 25 -> 'Z').
// Each element is a pointer, and each pointer is supposed to be a root of a tree (sub tree).
// After processing all characters of the text (read from a file), we end up with an array with
// 25 NULL elements. The only element that is not NULL is the root of the tree that gathers the different
// nodes of each letter.
// The encoding of each letter is deduced by means of a prefix (pre-order) traversal.
// To summarize, the pseudo-code is:
// - Initialize the letters array
// - Read file
// - Increment each letter frequency + compute the number of characters in the file
// - Store in the array's node the frequency of each letter
// - Compute the number (N) of involved characters (Sometimes, texts don't include all letters. In our case 'Q' and 'Z' are absent).
// - Loop N times
// - find Minimum and second minimum
// - create a new node, its left child contains the minimum and the right child contains the second minimum
// - minimum position points on the new node, and the second minimum's array position points on NULL
// - Browse the array till the unique non NULL element is encountered
// - invoke prefix traversal function
// - build the encoding of each character
// - display the letter and its characteristics when found.
// - Finally, read the output file to interpret its content
// - if root contains a character (A - Z), display character
// - else, if the current character is '0', browse the left leaf
// - else, if the current character is '1', browse the right leaf
//
//*******************************************************************
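#include <stdio.h>
#include <stdlib.h> // malloc
#include <string.h> // strlen, strcpy, strcat
#include <ctype.h> // toupper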
#define NBR_OF_LETTERS 26
#define LEFT 'L'
#define RIGHT 'R'
#define CODE_SIZE 128
#define TYPED_ALLOC(type) (type *) malloc( sizeof(type) )
#define BYTE_SIZE 8
#define IN_PATH "./files/in.txt"
#define OUT_PATH "./files/out.txt"
typedef struct tree_node_s {
float frequency;
char c;
char code[CODE_SIZE];
struct tree_node_s *left;
struct tree_node_s *right;
} tree_node_t;
tree_node_t *arr[NBR_OF_LETTERS], *letters[NBR_OF_LETTERS];
void findMinAndSecondMin(tree_node_t **, float *, int *, float *, int *);
void printTree(tree_node_t *);
void interpret(char *, int *, tree_node_t *);
void encode(tree_node_t *, tree_node_t **, char, short, char*);
/*
*
*/
int main() {
char str[CODE_SIZE];
int fileReadingVerdict;
int i, j, k, index, n;
float min, secondMin;
int minIndex, secondMinIndex;
int numberOfCharacters = 0;
tree_node_t *tree;
FILE *in = fopen(IN_PATH, "r");
FILE *out;
if ( in == NULL ) {
printf("\nFile not found");
return 0;
} else {
/*
* Begin: Array Initialization
*/
for (i = 'A'; i <= 'Z'; i++) {
index = i - 'A';
arr[index] = NULL;
}
/*
* End: Array Initialization
*/
numberOfCharacters = 0;
fileReadingVerdict = fgets(str, CODE_SIZE, in) != NULL;
while(!feof(in) || fileReadingVerdict) {
n = strlen(str);
printf("\n%s", str);
for (i = 0; i < n ; i ++ ) {
str[i] = toupper(str[i]);
if (str[i] >= 'A' && str[i] <= 'Z') {
numberOfCharacters ++;
index = str[i] - 'A';
if (arr[index] == NULL) {
arr[index] = TYPED_ALLOC(tree_node_t);// malloc(sizeof(tree_node_t));
arr[index]->c = str[i];
arr[index]->frequency = 1;
arr[index]->left = arr[index]->right = NULL;
} else {
arr[index]->frequency += 1;
}
}
}
if (fileReadingVerdict) {
fileReadingVerdict = fgets(str, CODE_SIZE, in) != NULL;
}
}
}
fclose(in);
for ( i = 0, n = 0 ; i < NBR_OF_LETTERS ; i ++ ) {
letters[i] = arr[i];
if (arr[i] != NULL) {
arr[i]->frequency /= numberOfCharacters; // Computing the frequency.
n ++; // n is the number of involved letters which is going to be consumed in the do while loop's condition
}
}
j = 1;
do {
findMinAndSecondMin(arr, &min, &minIndex, &secondMin, &secondMinIndex);
if (minIndex != -1 && secondMinIndex != -1 && minIndex != secondMinIndex) {
tree_node_t *temp;
tree = TYPED_ALLOC(tree_node_t);// malloc(sizeof(tree_node_t));
tree->frequency = arr[minIndex]->frequency + arr[secondMinIndex]->frequency;
tree->c = j;
tree->left = arr[minIndex];
temp = TYPED_ALLOC(tree_node_t);// malloc(sizeof(tree_node_t));
temp->c = arr[secondMinIndex]->c;
temp->frequency = arr[secondMinIndex]->frequency;
temp->left = arr[secondMinIndex]->left;
temp->right = arr[secondMinIndex]->right;
tree->right = temp;
arr[minIndex] = tree;
arr[secondMinIndex] = NULL;
}
j ++;
} while( j < n );
for ( i = 0 ; i < NBR_OF_LETTERS ; i ++ ) {
if (arr[i] != NULL) {
char code[CODE_SIZE];
strcpy(code, "");
encode(tree = arr[i], letters, 0, 0, code);
puts("\nSuccessful encoding");
printTree(arr[i]);
break;
}
}
in = fopen(IN_PATH, "r");
out = fopen(OUT_PATH, "w");
fileReadingVerdict = fgets(str, CODE_SIZE, in) != NULL;
while(!feof(in) || fileReadingVerdict) {
n = strlen(str);
for (i = 0; i < n ; i ++ ) {
str[i] = toupper(str[i]);
if (str[i] >= 'A' && str[i] <= 'Z') {
index = str[i] - 'A';
fputs(letters[index]->code, out);
}
}
if (fileReadingVerdict) {
fileReadingVerdict = fgets(str, CODE_SIZE, in) != NULL;
}
}
fclose(in);
fclose(out);
printf("\nFile size (only letters) of the input file: %d bits", numberOfCharacters * BYTE_SIZE);
out = fopen(OUT_PATH, "r");
fileReadingVerdict = fgets(str, CODE_SIZE, out) != NULL;
numberOfCharacters = 0;
while(!feof(out) || fileReadingVerdict) {
numberOfCharacters += strlen(str);
if (fileReadingVerdict) {
fileReadingVerdict = fgets(str, CODE_SIZE, out) != NULL;
}
}
fclose(out);
printf("\nFile size of the output file: %d bits", numberOfCharacters);
printf("\nInterpreting output file:\n");
out = fopen(OUT_PATH, "r");
fileReadingVerdict = fgets(str, CODE_SIZE, out) != NULL;
while(!feof(out) || fileReadingVerdict) {
n = strlen(str);
i = 0 ;
while(i < n) {
interpret(str, &i, tree);
}
if (fileReadingVerdict) {
fileReadingVerdict = fgets(str, CODE_SIZE, out) != NULL;
}
}
fclose(out);
puts("\n");
return 0;
}
/*
*
*/
void encode(tree_node_t *node, tree_node_t **letters, char direction, short level, char* code) {
int n;
if ( node != NULL ) {
if (level > 0) {
// Truncate the shared buffer to the parent's prefix, then append this branch's bit.
// The level > 0 guard keeps the root call from writing to code[-1].
code[level - 1] = '\0';
strcat(code, direction == RIGHT ? "1" : "0");
}
if (node->c >= 'A' && node->c <= 'Z') {
strcpy(node->code, code);
strcpy(letters[node->c - 'A']->code, code);
}
encode(node->left, letters, LEFT, level + 1, code);
encode(node->right, letters, RIGHT, level + 1, code);
}
}
void printTree(tree_node_t *node) {
int n;
if ( node != NULL ) {
if (node->c >= 'A' && node->c <= 'Z') {
printf("\t%c - frequency: %.10f\tencoding: %s\n", node->c, node->frequency, node->code);
}
printTree(node->left);
printTree(node->right);
}
}
/*
* Begin: Minimum and second minimum
*/
void findMinAndSecondMin(tree_node_t *arr[], float *min, int *minIndex, float *secondMin, int *secondMinIndex) {
int i, k;
k = 0;
*minIndex = -1;
/*
* Skipping all the NULL elements.
*/
while (k < NBR_OF_LETTERS && arr[k] == NULL) k++;
*minIndex = k;
*min = arr[k]->frequency;
for ( i = k ; i < NBR_OF_LETTERS; i ++ ) {
if ( arr[i] != NULL && arr[i]->frequency < *min ) {
*min = arr[i]->frequency;
*minIndex = i;
}
}
k = 0;
*secondMinIndex = -1;
/*
* Skipping all the NULL elements.
*/
while ((k < NBR_OF_LETTERS && arr[k] == NULL) || (k == *minIndex && arr[k] != NULL)) k++;
*secondMin = arr[k]->frequency;
*secondMinIndex = k;
if (k == *minIndex) k ++;
for ( i = k ; i < NBR_OF_LETTERS; i ++ ) {
if ( arr[i] != NULL && arr[i]->frequency < *secondMin && i != *minIndex ) {
*secondMin = arr[i]->frequency;
*secondMinIndex = i;
}
}
/*
* End: Minimum and second minimum
*/
}
void interpret(char *str, int *index, tree_node_t *tree) {
int n = strlen(str);
if (tree->c >= 'A' && tree->c <= 'Z') {
printf("%c ", tree->c);
return ;
} else {
if ( *index < n ) {
if (str[*index] == '0') {
(*index) ++;
interpret(str, index, tree->left);
} else {
if (str[*index] == '1') {
(*index) ++;
interpret(str, index, tree->right);
}
}
}
}
}

Related

Why does the last string not properly enter a multidimensional array which shows character frequency?

I have been trying to find a solution for a long time, but it seems like I can't.
I am trying to write a program that reads sentences from a file, stores them as strings in a multidimensional array, and quicksorts the sentences alphabetically. It then fills a second array with the letter frequencies of each sentence: the row is the sentence number and the column is the letter.
For example, sentence 7, "CUD", would be row 7 and have a 1 in each of columns 3 (C), 4 (D) and 21 (U). This works perfectly except for the last sentence, which for some reason isn't counted at all and shows random numbers.
void isAnAnagram(char anagramTester[][MAX_CHAR]) {
int countLetters[MAX_LINES][26] = {0};
int x;
for (int i =1; i <=MAX_LINES; i++) {
for (int j = 0; (anagramTester[i][j] != '\0'); j++) {
if (anagramTester[i][j] >= 'a' && anagramTester[i][j] <= 'z') {
x = anagramTester[i][j] - 'a' +1;
countLetters[i][x]++;
} else if (anagramTester[i][j] >= 'A' && anagramTester[i][j] <= 'Z') {
x = anagramTester[i][j] - 'A' +1;
countLetters[i][x]++;
}
}
}
for(int i=1;i<=MAX_LINES;i++){ //Rows
printf("row:%d ", i);
for(int j=1;j<=26;j++){ //Cols
printf("%d ",countLetters[i][j]);
}
printf("\n");
}}
This is the function, which accepts a two-dimensional array of strings.
I have tried changing the loops, but fundamentally I don't know why this error occurs only in the last sentence.
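One thing that may be worth checking here (an observation about C arrays in general, not a confirmed diagnosis): arrays are indexed from 0, so an array declared with MAX_LINES rows has valid rows 0 to MAX_LINES-1, and 26 columns have valid indices 0 to 25. The loops above run from 1 to MAX_LINES and use column letter + 1, so the last row and the column for 'z' fall outside the array. A minimal zero-based sketch of the per-sentence count could look like this; count_letters and LETTERS are names invented for the illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define LETTERS 26

/* Fill counts[0..25] with the number of occurrences of each letter in s,
 * ignoring case and skipping anything that is not a letter a-z. */
static void count_letters(const char *s, int counts[LETTERS])
{
    assert(s != NULL);
    memset(counts, 0, LETTERS * sizeof counts[0]);
    for (; *s != '\0'; s++) {
        int c = tolower((unsigned char)*s);
        if (c >= 'a' && c <= 'z')
            counts[c - 'a']++;      /* 'a' -> column 0, ..., 'z' -> column 25 */
    }
}
```

With this layout, "CUD" sets columns 2 (C), 3 (D) and 20 (U) to 1. The same zero-based convention would apply to the row index when looping over the sentences.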
Here are the file input and output functions:
void readSentences(char inputSentences[][MAX_CHAR]){
int lineNum = 0;
FILE *fp = fopen("YOurDETAILS/input.txt", "r+");
fseek(fp, 0, SEEK_SET);
if(fp== NULL ){
/* check whether the file exists etc. */
perror("Error opening file");
lineNum = -1;
/* use this as a file not found code */
}
else {
// fgets returns NULL when it gets to the end of the file
for(lineNum = 1; lineNum <= MAX_LINES; lineNum++){
if(fgets(inputSentences[lineNum], MAX_CHAR, fp) != NULL){
inputSentences[lineNum][MAX_CHAR] = '\0';
}
else {
inputSentences[lineNum][MAX_CHAR] = '\0';
fclose(fp);
}
inputSentences[MAX_LINES][MAX_CHAR] = '\0';
}
}
}
void writeAnswer(char output[][MAX_CHAR]){
FILE *fp = fopen("YOurDETAILS/output.txt", "w");
fprintf(fp,"Sorted Array:\n");
for (int i =1; i <= MAX_LINES; i++) {
fprintf(fp,"sentence %d:%s \n ", i, output[i]);
}
fclose(fp);
}
The contents of the input.txt file are:
cat
O, Draconian devil! Oh, lame saint!
tac
Tom Marvolo Riddle
Software engineering
Leonardo da Vinci! The Mona Lisa!
Computer science
CUD
Act
cuddle
Hey there!
Old Immortal dovers
I am Lord Voldemort
duck
The sorting file is:
void swap(char sentencesToSwap[][MAX_CHAR], int i, int j){
for(int x = 0;x < MAX_CHAR; x++ ) {
char temp = sentencesToSwap[i][x];
sentencesToSwap[i][x] = sentencesToSwap[j][x];
sentencesToSwap[j][x] = temp;
}
}
void quicksort(char sentencesToSort[][MAX_CHAR], int first, int last){
if(first < last){
char pivotindex = partition(sentencesToSort, first, last);
quicksort(sentencesToSort, first, pivotindex-1);
quicksort(sentencesToSort, pivotindex+1, last);
}
}
int partition(char sentencesToSort[][MAX_CHAR], int first, int last){
swap(sentencesToSort, first, (first + last) / 2);
char *pivot;
pivot = sentencesToSort[first]; // remember pivot
int index1 = first + 1; // index of first unknown value
int index2 = last; // index of last unknown value
while (index1 <= index2) { // while some values still unknown
if (strcasecmp(sentencesToSort[index1],pivot) <= 0)
index1++;
else if (strcasecmp(sentencesToSort[index2],pivot) > 0)
index2--;
else {
swap(sentencesToSort, index1, index2);
index1++;
index2--;
}
}
swap(sentencesToSort, first, index2); // put the pivot value between the two
// sublists and return its index
return index2;
}
The main file is below
char sentences[MAX_LINES][MAX_CHAR];
readSentences(sentences);
isAnAnagram(sentences);
quicksort(sentences, 0, MAX_LINES );
writeAnswer(sentences);

How can I multiply two strings containing 'huge numbers' (over 30 digits)?

I'm doing a school project in which I first need to read 2 huge numbers (unlimited size; for the sake of example, let's say over 30 digits). The second step is to create a new number that is the product of the two inputs, which is where I'm really struggling.
My code so far:
Type definition to make sure I'm handling the right variables:
typedef char* verylong;
#define MAX_SIZE 100
Input method:
verylong input_long() {
int i, len; //i for loop, len for strlen - using integer for it to avoid invoking the method more than 1 time
verylong number;
char temp_str[MAX_SIZE]; //the input from user - limited to 100
gets(temp_str); //user input
len = strlen(temp_str); //saving the length of the input
number = (char*)calloc(len + 1, sizeof(char)); //allocating memory for the verylong and saving space for \0
for (i = 0; i < len; i++) {
if (temp_str[i] - '0' < 0 || temp_str[i] - '0' > 9) { //the input is not a digit
printf("\nBad input!\n");
return NULL;
}
number[i] = temp_str[i]; //all is good -> add to the verylong number
}
number[i] = '\0'; //setting last spot
return number;
}
My sad attempt at completing the task:
verylong multiply_verylong(verylong vl1, verylong vl2) {
verylong mult;
int cur, i, j, k, lrg, sml, temp_size;
char *temp;
j = 1;
temp = (char*)calloc(lrg + sml + 1, sizeof(char)); //maximum amount of digits
if (strlen(vl1) > strlen(vl2)) {
lrg = strlen(vl1);
sml = strlen(vl2);
}
else {
lrg = strlen(vl2);
sml = strlen(vl1);
}
cur = 0;
for (i = sml-1; i >= 0; i--) {
k = 0;
temp_size = 0;
cur = (vl1[i] - '0')*(vl2[i] - '0');
printf("\ncur=%d", cur);
if (cur > 9)
temp_size = 2;
else
temp_size = 1;
while (k < temp_size) {
if (cur > 9)
temp[j++] = (cur % 10) + '0';
else
temp[j++] = cur + '0';
cur /= 10;
k++;
}
}
mult = (char*)calloc(j + 1, sizeof(char));
for (i = 0; i < j; i++) {
mult[i] = temp[i];
}
mult[i] = '\0';
free(temp);
return mult;
}
Long story short, I know my multiplication method is wrong, since I simply multiply two digits at a time and append the result; beyond that I am truly lost.
Thanks.

My advice would be to break the task into a number of simpler tasks.
How would you do the multiplication on paper?
123 * 456 -> 1 * (456 * 100) + 2 * (456 * 10) + 3 * (456 * 1)
or written differently
3 * ( 1 * 456)
+ 2 * ( 10 * 456)
+ 1 * (100 * 456)
---------------
SUM TO GET RESULT
or
3 * 456
+ 2 * 4560
+ 1 * 45600
---------------
SUM TO GET RESULT
From this you can identify 3 tasks
Multiplying with powers of 10, i.e. 1, 10, 100, etc. (i.e. add zeros to the end)
Multiplying a string-number with a single digit
Adding two string-numbers.
Write simple functions for each of these steps.
char* mulPowerOf10(char* sn, unsigned power)
{
...
}
char* mulDigit(char* sn, char digit)
{
...
}
char* addNumbers(char* snA, char* snB)
{
...
}
Using these 3 simple functions you can put the real multiplication together. In pseudo-code:
char* mulNumbers(char* snA, char* snB)
{
char* result = malloc(2);
strcpy(result, "0");
unsigned power = 0;
for_each_digit D in snA, from the last (least significant) digit to the first
{
char* t1 = mulPowerOf10(snB, power)
char* t2 = mulDigit(t1, D)
result = addNumbers(result, t2)
++power;
}
free(.. what needs to be freed ..);
return result;
}
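As a rough sketch of how the three helpers and the pseudo-code could be filled in (one possible way, not the only one): the version below assumes the inputs are non-empty strings of ASCII digits, uses const char* parameters instead of the char* in the stubs, and lets intermediate results carry leading zeros that are stripped once at the end.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a new string equal to sn followed by `power` zeros (sn * 10^power). */
char *mulPowerOf10(const char *sn, unsigned power)
{
    size_t n = strlen(sn);
    char *out = malloc(n + power + 1);
    if (out == NULL) return NULL;
    memcpy(out, sn, n);
    memset(out + n, '0', power);
    out[n + power] = '\0';
    return out;
}

/* Return a new string holding sn times one decimal digit ('0'..'9'). */
char *mulDigit(const char *sn, char digit)
{
    size_t n = strlen(sn);
    int d = digit - '0', carry = 0;
    assert(d >= 0 && d <= 9);
    char *out = malloc(n + 2);               /* at most one extra digit */
    if (out == NULL) return NULL;
    out[n + 1] = '\0';
    for (size_t i = n; i > 0; i--) {         /* right to left */
        int x = (sn[i - 1] - '0') * d + carry;
        out[i] = (char)('0' + x % 10);
        carry = x / 10;
    }
    out[0] = (char)('0' + carry);            /* may be a leading zero */
    return out;
}

/* Return a new string holding a + b (possibly with a leading zero). */
char *addNumbers(const char *a, const char *b)
{
    size_t na = strlen(a), nb = strlen(b);
    size_t n = (na > nb ? na : nb) + 1;      /* room for a final carry */
    int carry = 0;
    char *out = malloc(n + 1);
    if (out == NULL) return NULL;
    out[n] = '\0';
    for (size_t k = 0; k < n; k++) {         /* k-th digit from the right */
        int x = carry;
        if (k < na) x += a[na - 1 - k] - '0';
        if (k < nb) x += b[nb - 1 - k] - '0';
        out[n - 1 - k] = (char)('0' + x % 10);
        carry = x / 10;
    }
    return out;
}

/* Schoolbook multiplication built from the three helpers above. */
char *mulNumbers(const char *snA, const char *snB)
{
    size_t na = strlen(snA);
    char *result = malloc(2);
    if (result == NULL) return NULL;
    strcpy(result, "0");
    for (size_t k = 0; k < na; k++) {        /* digits of snA, right to left */
        char *t1 = mulPowerOf10(snB, (unsigned)k);
        char *t2 = t1 ? mulDigit(t1, snA[na - 1 - k]) : NULL;
        char *sum = t2 ? addNumbers(result, t2) : NULL;
        free(t1);
        free(t2);
        free(result);
        if (sum == NULL) return NULL;
        result = sum;
    }
    /* Strip leading zeros but keep a single "0". */
    size_t skip = strspn(result, "0");
    if (result[skip] == '\0' && skip > 0) skip--;
    memmove(result, result + skip, strlen(result + skip) + 1);
    return result;
}
```

For example, mulNumbers("123", "456") returns a heap-allocated "56088" that the caller must free. Allocating a fresh string at every step keeps each helper simple at the cost of extra allocations, which seems like a reasonable trade-off for a first version.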

Here is a code example.
I found it simpler to store the number as a sequence of digits along with the length in a struct. The number may have leading zeros.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_SIZE 1024
typedef struct Number {
int len;
char digits[];
} Number;
// Instantiate a number with room for len digits.
Number *newNumber(int len) {
Number *n = malloc(sizeof(Number)+len);
n->len = len;
memset(n->digits, 0, len);
return n;
}
// inputNumber reads a number from stdin. It returns NULL if the input
// is invalid, otherwise it returns a Number containing the given digits.
Number *inputNumber() {
char temp[MAX_SIZE];
if (fgets(temp, sizeof temp, stdin) == NULL)
return NULL; // use fgets because gets was removed in C11
// remove trailing \n if any
int len = strlen(temp);
if (len > 0 && temp[len-1] == '\n')
temp[--len] = '\0';
// check input validity
if (len == 0)
return NULL;
for (int i = 0; temp[i] != '\0'; i++)
if (temp[i] < '0' || temp[i] > '9')
return NULL;
Number *n = newNumber(len);
for (int i = 0; temp[i] != '\0'; i++)
n->digits[i] = temp[i] - '0';
return n;
}
To multiply two numbers n1 and n2, we multiply n1 with each digit of n2, and accumulate the result shifted on the left by the position of the n2 digit in the final result.
For instance, to multiply 123*456, we compute 123*6 + 123*5*10 + 123*4*100. Note that *10 and *100 are simply left shifts.
We thus need a function that multiplies a number with a digit, and another function that accumulates a number with a left shift in a result number.
// multiplyNumber stores the result of n times the digit d in r.
// Requires the len of r to be the len of n + 1.
void multiplyNumber(Number *n, char d, Number *r) {
char carry = 0;
for (int i = r->len-1, j = n->len-1; i > 0; i--, j--) {
char x = n->digits[j] * d + carry;
r->digits[i] = x%10;
carry = x/10;
}
r->digits[0] = carry;
}
// accumulateNumber adds n with the left shift s to the number r.
// Requires the len of r is at least len of n + s + 1.
void accumulateNumber(Number *n, int s, Number *r) {
char carry = 0;
for (int i = r->len-1-s, j = n->len-1; j >= 0; i--, j--) {
char x = r->digits[i] + n->digits[j] + carry;
r->digits[i] = x%10;
carry = x/10;
}
r->digits[r->len-1-s-n->len] = carry;
}
Finally, we also need a function to print the number
void printNumber(Number *n) {
int i = 0;
// skip 0 at the front
while (i < n->len && n->digits[i] == 0)
i++;
if (i == n->len) {
printf("0\n");
return;
}
while (i < n->len)
putchar(n->digits[i++] + '0');
putchar('\n');
}
And this is it. We can now write the main function with the input of the numbers, the multiplication of number 1 with each digit of number 2 and accumulate the result with a shift to get the final result.
int main() {
printf("number 1: ");
Number *n1 = inputNumber();
if (n1 == NULL) {
printf("number 1 is invalid\n");
return 1;
}
printf("number 2: ");
Number *n2 = inputNumber();
if (n2 == NULL) {
printf("number 2 is invalid\n");
return 1;
}
Number *r = newNumber(n1->len+n2->len+1); // +1 satisfies accumulateNumber's length requirement for the last shift
Number *tmp = newNumber(n1->len+1);
for (int i = 0; i < n2->len; i++) {
multiplyNumber(n1, n2->digits[n2->len-1-i], tmp);
accumulateNumber(tmp, i, r);
}
printf("result: ");
printNumber(r);
return 0;
}

Here you may have a look at a 'string only' version, multiplying like you would do with a pencil.
It works with 2 loops. The outer loop takes the digits of value2 from the right and multiplies in the inner loop with every digit of value1 from right. The right digit of the multiplication is stored in result, the rest goes in carry for the next inner loop.
At the end of the inner loop, carry is added to result.
After the first outer loop, we have to add previous results to our multiplication.
This is done in if(!first && *lresp) r += toI(*lresp)
The final loop moves the result to the start of the char array.
#include <stdio.h>
#include <stdlib.h>
#define toI(x) ((x)-'0')
#define toC(x) ((x)+'0')
#define max(a,b) (((a)>(b)) ? (a):(b))
char *mul(char *buf1, char *buf2) {
int size, v1, v2, r, carry=0, first=1;
char *startp1, *endp1, *lendp1, *startp2, *endp2;
char *startres, *endres, *resp, *lresp, *result;
for(endp1 = startp1 = buf1; *endp1; endp1++); // start and endpointer 1st value
for(endp2 = startp2 = buf2; *endp2; endp2++); // start and end pointer 2nd value
size = endp2-startp2 + endp1-startp1; // result size
startres = endres = resp = result = malloc(size+10); // some reserve
endres += size+10-1; // result end pointer
for(resp = startres; resp <= endres; resp++) *resp = '\0'; // init result
for(endp1--, endp2--, resp-=2; endp2>=startp2; endp2--, resp--, first=0) {
v2 = toI(*endp2); // current digit of value2
for(lresp = resp, lendp1 = endp1; lendp1 >= startp1; lendp1--, lresp--) {
v1 = toI(*lendp1); // current digit of value1
r = v1 * v2 + carry; // multiply + carry
if(!first && *lresp) r += toI(*lresp); // add result of previous loops
*lresp = toC(r%10); // store last digit
carry = r/10;
}
for( ; carry != 0; carry /= 10)
*lresp-- = toC(carry%10);
}
// we began right with reserve, now move to start of result
for(lresp++; lresp < endres; lresp++, startres++)
*startres=*lresp;
*startres = '\0';
return result;
}
int main() {
char *result = mul("123456789", "12345678");
printf("\n%s\n", result);
free(result);
}

Problem with stack algorithm to obtain the greatest number after remove n digits

I am working on a stack algorithm that obtains the greatest number after removing N digits from any given number.
Given N and a number with k digits, obtain the greatest number that remains after removing N digits from the given k-digit number.
For example: 38271 and N=2 => The greatest number is 871
In my approach, I'm trying to read a sequence of numbers and digits to remove and get the greatest element of each one of them.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "pilha.h"
int main()
{
int m = 0, j;
scanf("%d", &m); /* reads the number of sequences */
getchar();
for(j = 0; j < m; j++) {
char *str = NULL;
int ch;
size_t size = 0, len = 0;
/* Read char by char, allocating memory space when needed */
while ((ch=getchar()) != EOF && ch != '\n') {
if (len + 1 >= size)
{
size = size * 2 + 1;
str = realloc(str, sizeof(char)*size);
if (str == NULL) {
return EXIT_FAILURE;
}
}
str[len++] = ch;
}
str[len] = '\0';
printf("%s\n", str);
char delim[] = " ";
char *ptr = strtok(str, delim); /* split the string number from the number of removing digits */
char *aux[2];
int x = 0;
while (ptr != NULL) {
if (x > 1) {
break;
}
aux[x] = ptr; /* stores into a variable to use later */
ptr = strtok(NULL, delim);
x++;
}
int tam = strlen(aux[0]);
printf("\nThe tam is : %d\n", tam);
char number[tam];
strcpy(number, aux[0]); /* copy to a new variable called number */
printf("\nThe number string is : %s\n", number);
int n = atoi(aux[1]); /* converting second parameter (number of removing digits) to integer */
printf("\nThe splitted number is : %d\n", n);
free(str);
/* Converts the string number to integer */
int arrayOfNumberDigits[tam];
for (int l = 0 ; l < tam ; l++) {
int data = number[l] - '0';
arrayOfNumberDigits[l] = data;
}
int limit = tam - n; /* variable that holds the limit of digits of future greatest number */
struct Stack* stack = createStack(limit);
// push(stack, arrayOfNumberDigits[0]); /* push the first element to stack */
int i = 0, stackSize = 0, helper = 0;
for (i = 0; i < tam; i++) {
while (!isEmpty(stack) && (arrayOfNumberDigits[i] > peek(stack)) && ((tam - i) >= limit)) {
printf("%d popped from stack\n", pop(stack));
helper++;
}
stackSize = stackSize - helper;
if (stackSize < limit) {
push(stack, arrayOfNumberDigits[i]);
stackSize++;
}
}
if (!isFull(stack)) {
int x = 0;
for(x = (tam - limit); x < tam; x++) {
pop(stack);
}
}
reverse(stack);
while (!isEmpty(stack)) {
printf("%d", pop(stack));
}
printf("\n");
free(stack);
}
return EXIT_SUCCESS;
}
Sadly, my current implementation does not work for cases like 192340623 with N=2. The problem is inside this piece of code, but I couldn't find what it is:
int i = 0, stackSize = 0, helper = 0;
for (i = 0; i < tam; i++) {
while (!isEmpty(stack) && (arrayOfNumberDigits[i] > peek(stack)) && ((tam - i) >= limit)) {
printf("%d popped from stack\n", pop(stack));
helper++;
}
stackSize = stackSize - helper;
if (stackSize < limit) {
push(stack, arrayOfNumberDigits[i]);
stackSize++;
}
}
Can somebody help find the issue in my current implementation, please?
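For comparison, here is a sketch of the usual greedy approach to the problem as stated above, using a plain char array as the stack instead of the Stack type from pilha.h (whose interface is not shown). The function name largest_after_removal is invented for this illustration. One thing that may be worth checking in the loop above: helper is never reset, so stackSize = stackSize - helper subtracts every earlier pop again on each iteration. The sketch avoids that by tracking the number of removals still available.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated string holding the largest number obtainable by
 * deleting exactly n digits from `digits`, keeping the rest in order.
 * The output buffer doubles as the stack. Returns NULL if allocation fails. */
char *largest_after_removal(const char *digits, size_t n)
{
    size_t len = strlen(digits);
    assert(n <= len);
    size_t keep = len - n;       /* number of digits in the answer */
    size_t removals = n;         /* deletions still available */
    size_t top = 0;              /* current stack height */
    char *stack = malloc(len + 1);
    if (stack == NULL) return NULL;

    for (size_t i = 0; i < len; i++) {
        /* Pop smaller digits while we can still afford to delete them. */
        while (top > 0 && removals > 0 && stack[top - 1] < digits[i]) {
            top--;
            removals--;
        }
        stack[top++] = digits[i];
    }
    stack[keep] = '\0';          /* unused removals drop digits from the tail */
    return stack;
}
```

With this sketch, "38271" with N=2 yields "871" and "192340623" with N=2 yields "9340623". Because every digit is pushed and the result is truncated only at the end, no separate stackSize bookkeeping is needed.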

Count occurrences and associate with given array in C

I'm having trouble correcting my code so that it works the way I want.
I have three arrays given in this example:
char arr[MAX_ELEMENTS][MAX_LENGTH] = {"ABS","ABS","ABS","ACT","ACT","PPB","PPB","QQQ","QQQ"};
char race[MAX_ELEMENTS][MAX_LENGTH] = {"PARI", "PARI", "LOND", "PARI", "PARI", "CYKA", "LOND", "CYKA", "PARI"};
int freq[MAX_ELEMENTS];
I wish to create a function that counts the occurrences of the string elements in arr[] and stores them in freq[]. Apart from that, I also wish to know which element of arr[] occurs most often in each race[].
To demonstrate this here is an example of what output I wish to receive when the function works:
In Race [PARI] the highest occurence was [ABS] with 3 occurences!
In Race [LOND] the highest occurence was [ACT] with 1 occurences!
.....
Currently, I am able to count the occurrences of arr[] in freq[], but I can't associate them with their respective race[] and produce that output.
for(i=0; i<size; i++)
{
count = 1;
for(j=i+1; j<size; j++)
{
/* If duplicate element is found */
if(strcmp(arr[i], arr[j])==0)
{
count++;
/* Make sure not to count frequency of same element again */
freq[j] = 0;
}
}
/* If frequency of current element is not counted */
if(freq[i] != 0)
{
freq[i] = count;
}
}
Giving me currently :
ABS occurs 3 times.
ACT occurs 2 times.
etc. etc...
But I don't know how to associate them with race[] and count them only within a given race.
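One straightforward way to tie the count to a race, sketched here under the assumption that arr[i] and race[i] describe the same entry, is to count a match only when both strings agree. The helper names count_in_race and most_frequent_in_race and the value of MAX_LENGTH are made up for this illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_LENGTH 16   /* assumed; the question does not show its value */

/* Count the rows whose race equals r and whose name equals arr[i]. */
static int count_in_race(char arr[][MAX_LENGTH], char race[][MAX_LENGTH],
                         int size, int i, const char *r)
{
    int count = 0;
    for (int j = 0; j < size; j++)
        if (strcmp(race[j], r) == 0 && strcmp(arr[j], arr[i]) == 0)
            count++;
    return count;
}

/* Return the index of a row holding the most frequent name within race r
 * (ties go to the earliest row) and store its count in *best_count.
 * Returns -1 when r does not occur at all. */
static int most_frequent_in_race(char arr[][MAX_LENGTH], char race[][MAX_LENGTH],
                                 int size, const char *r, int *best_count)
{
    int best = -1;
    assert(best_count != NULL);
    *best_count = 0;
    for (int i = 0; i < size; i++) {
        if (strcmp(race[i], r) != 0)
            continue;                       /* row belongs to another race */
        int c = count_in_race(arr, race, size, i, r);
        if (c > *best_count) {
            *best_count = c;
            best = i;
        }
    }
    return best;
}
```

With the arrays from the question, most_frequent_in_race(arr, race, 9, "PARI", &n) returns index 0 ("ABS") with n == 2, because the third "ABS" entry belongs to LOND. The nested loops are quadratic in the number of entries, which is fine for small arrays like these.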

You probably have to use a struct here to structure your data.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#define true 1
#define len 100
#define elms 10
struct NODE;
#define Node struct NODE
struct NODE {
unsigned long int val;
int count;
char name[len];
Node *left;
Node *right;
};
Node * makeNode(char * str, unsigned long int val){
Node * tmp = (Node *)malloc(sizeof(Node));
strcpy(tmp->name, str);
tmp->val = val;
tmp->left = NULL;
tmp->right = NULL;
tmp->count = 1;
return tmp;
}
unsigned long int getHash(char * name){
int prime = 19;
int i = 0;
unsigned long int val = 0;
while(name[i]!='\0'){
val += (name[i] * pow(prime, i) );
++i;
}
return val;
}
void insert(Node * root, char * name){
Node * newnode;
int val = getHash(name);
Node * tmp = root;
while(tmp != NULL) {
if ( tmp->val == val){
tmp->count += 1;
break;
}
if (val > tmp->val){
if( tmp->right != NULL)
tmp = tmp->right;
else{
tmp->right = makeNode(name, val);
break;
}
}else {
if( tmp->left != NULL)
tmp = tmp->left;
else{
tmp -> left = makeNode(name, val);
break;
}
}
}
}
Node * find(Node * root, char * name){
int val = getHash(name);
Node * tmp = root;
while(tmp != NULL){
if(tmp -> val == val){
return tmp;
}else if (val > tmp->val){
tmp = tmp->right;
}else{
tmp = tmp->left;
}
}
return NULL;
}
struct Race {
char name[len];
char elements[elms][len];
};
char arr[elms][len] = {"ABS","ABS","ABS","ACT","ACT","PPB","PPB","QQQ","QQQ"};
char race[elms][len] = {"PARI", "PARI", "LOND", "PARI", "PARI", "CYKA", "LOND", "CYKA", "PARI"};
int freq[elms];
void copyArray(char dest[elms][len], char src[elms][len] ){
int i = 0;
while(strlen(src[i]) > 0){
strcpy(dest[i],src[i]);
++i;
}
}
int main(){
Node * root = makeNode("root", 0);
int i = 0;
while(strlen(arr[i]) > 0){
insert(root,arr[i]);
++i;
}
i = 0;
while(strlen(arr[i]) > 0){
Node * r = find(root,arr[i]);
printf("found %s, count = %d\n", r->name, r->count);
++i;
}
// make representation of race
struct Race r1, r2;
strcpy(r1.name, "PARI");
{
char tmp[elms][len] = { "ABS", "PPB", "QQQ" };
copyArray(r1.elements, tmp);
}
strcpy(r2.name, "LOND");
{
char tmp[elms][len] = { "ACT" };
copyArray(r2.elements, tmp);
}
struct Race races[2] = {r1, r2};
i = 0;
while(i < 2){
struct Race * current = &races[i];
printf("for %s", current->name);
Node * max = NULL;
int m = -1;
int j = 0;
while(strlen(current->elements[j]) > 0){
Node * tmp = find(root, current->elements[j]);
if( tmp != NULL && tmp->count > m) {
max = tmp;
m = tmp->count;
}
++j;
}
if (max != NULL){
printf(" max is %s : %d\n", max->name, max->count);
}else{
printf(" max is None\n");
}
++i;
}
return 0;
}
Basically, you have to structure your data and specify the links between the items. Here I used a binary search tree keyed on a Rabin-Karp-style polynomial hash of each name to store the counts.
A binary search tree suits this counting problem because lookups are cheap (logarithmic on average while the tree stays reasonably balanced), and comparing hash values avoids a string comparison at every node. Note that the tree compares hashes only, so two different names with the same hash would share one count.
I also created a struct called Race to store all the elements related to a race. The algorithm is then:
let arr be array of elements
let races be array of races
for each race in races
    define related elements
# find occurrences now
# the binary tree will increment the count if the element already exists
let binary_tree be a Binary Tree
for each element in arr
    add element to binary_tree
# now we have all the elements with their counts
# let's iterate through races now
for each race in races
    m = null
    for element in race.elements
        node = find_element_in_binary_tree(element)
        if node is not null
            m = max(m, node)
    if m is not null then
        print m
    else
        print not found
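
The getHash function above goes through pow(), which works in floating point. As a minimal sketch (the names poly_hash and HASH_BASE are illustrative and not part of the code above), the same polynomial can be accumulated with integer arithmetic only, assuming that wraparound on overflow is acceptable for a hash:

```c
#include <assert.h>
#include <stddef.h>

#define HASH_BASE 19UL /* same prime as in getHash; an assumption, any small prime would do */

/* Computes sum(name[i] * HASH_BASE^i) with unsigned integer arithmetic.
   Unsigned overflow wraps around, which is well defined in C. */
unsigned long poly_hash(const char *name) {
    assert(name != NULL);
    unsigned long val = 0;
    unsigned long power = 1; /* HASH_BASE^i for the current position */
    for (size_t i = 0; name[i] != '\0'; ++i) {
        val += (unsigned long)(unsigned char)name[i] * power;
        power *= HASH_BASE;
    }
    return val;
}
```

On an ASCII system, poly_hash("AB") is 65 + 66*19 = 1319. Distinct names can still collide, so a more robust tree would probably fall back to strcmp when two hashes are equal.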

First, the initializations. Note the empty []s, which let the compiler derive the number of rows from the initializer:
char arr[][MAX_LENGTH] = {"ABS","ABS","ABS","ACT","ACT","PPB","PPB","QQQ","QQQ"};
char race[][MAX_LENGTH] = {"PARI","PARI","LOND","PARI","PARI","CYKA","LOND","CYKA","PARI"};
int freq[MAX_ELEMENTS];
int n = sizeof(arr)/sizeof(*arr); // get actual number of used items
int i,j;
int max = 0; // init max to 0
The main loop goes through arr and race. Whenever a duplicate of item [i] is found at [j] (with j after i), that duplicate is "invalidated" (marked as already processed) by setting its first char to 0, which turns it into an empty string.
Note that j starts from i and not i+1, so that freq[i] is at least 1 for every first occurrence, even for items that have no duplicates.
for(i=0 ; i<n ; i++) {
freq[i] = 0; // ensure freq is 0 for any item
if ( ! *arr[i]) continue; // skip already processed items
for(j=i ; j<n ; j++) { // j=i, not i+1!
if (!strcmp(arr[i],arr[j]) && !strcmp(race[i],race[j])) {
freq[i]++;
if (freq[i] > max) max = freq[i]; // update max if necessary
if (j > i) *arr[j] = 0; // invalidate that arr element
}
}
}
Finally, display the items with the maximum number of appearances, including ties:
printf("Items at max=%d:\n", max);
for(i=0 ; i<n ; i++) {
if (freq[i] == max) { // skipped items are never displayed (max cannot be 0)
printf("%s / %s\n", arr[i],race[i]);
}
}
(There is no need to check for "invalidation" here, as max will be > 0 and all invalidated items have freq[i] == 0.)
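
The loop above overwrites arr to mark duplicates. If the input has to stay intact, one possible variant (a sketch only; the function name count_pairs and the use of -1 as a "not processed yet" marker are assumptions of this example) keeps the bookkeeping in freq itself:

```c
#include <assert.h>
#include <string.h>

/* Counts how often each (item, race) pair occurs without modifying the inputs.
   freq[i] receives the count at the first occurrence of a pair and 0 at
   later duplicates. Returns the highest count (0 when n == 0). */
int count_pairs(const char *items[], const char *races[], int n, int freq[]) {
    assert(n >= 0);
    int max = 0;
    for (int i = 0; i < n; i++) freq[i] = -1; /* -1 means "not processed yet" */
    for (int i = 0; i < n; i++) {
        if (freq[i] == 0) continue; /* duplicate of an earlier pair */
        freq[i] = 0;
        for (int j = i; j < n; j++) { /* j starts at i so the pair counts itself */
            if (strcmp(items[i], items[j]) == 0 && strcmp(races[i], races[j]) == 0) {
                freq[i]++;
                if (j > i) freq[j] = 0; /* mark the duplicate as processed */
            }
        }
        if (freq[i] > max) max = freq[i];
    }
    return max;
}
```

The display loop stays the same: every index with freq[i] == max is a first occurrence of a most frequent pair.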

Insert function hash table C

I am having trouble implementing my insert function for my hash table.
I wrote some test calls in which I call the function directly. In actual use, I call the function inside a while loop; for testing purposes, I run the loop only 4 times.
I have posted some output below. The table looks odd because of my hash function: it hashes words such that A = 1, B = 2, C = 3, and so on. The position of a letter in the word is irrelevant, since I will be considering permutations of the word. The case of a letter is irrelevant as well, so a and A both have the value 1.
And for strings, abc = 1 + 2 + 3 = 6, bc = 2 + 3 = 5, etc.
Overall, the hash function is fine. The problem is the insert function.
The first 4 words of my local dictionary are A, A's, AA's, AB's.
My expected output is as follows (this is also what I get when I run the test calls):
0:
1: [W: A, Len:1]
2:
3:
...
18:
19:
20: [W: A's, Len:3]
21: [W: AA's, Len:4]
22: [W: AB's, Len:4]
But when I call the function inside a loop, whatever is last on the list will overwrite other entries. If I run the loop 100 times, then the last entry still replaces the previous ones (Notice how the lengths of the words are unchanged, but only the words are replaced):
0:
1: [W: AB's, Len:1]
2:
3:
...
18:
19:
20: [W: AB's, Len:3]
21: [W: AB's, Len:4]
22: [W: AB's, Len:4]
Below is my code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int hash(char *word)
{
int h = 0;
while(*word != '\0')
{
if(*word >='A' && *word < 'A'+26) {
h=h+(*word -'A' + 1);
}
else if(*word >='a' && *word < 'a'+26) {
h=h+(*word -'a' + 1);
}
//else { // special characters
// return -1;
//}
word++;
}
return h;
}
typedef struct Entry {
char *word;
int len;
struct Entry *next;
} Entry;
#define TABLE_SIZE 1000 // random numbers for testing
Entry *table[TABLE_SIZE] = { NULL }; // an array of elements
void init() {
int i;
for (i = 0; i < TABLE_SIZE; i++) {
// initialize values
struct Entry *en = (struct Entry *)malloc(sizeof(struct Entry));
en->word = "";
en->len = 0;
en->next = table[i];
table[i] = en;
}
}
//Insert element
void insertElement(char *word, int len) {
int h = hash(word);
int i;
// because all words are different so there is no need to check for duplicates
struct Entry *en = (struct Entry *)malloc(sizeof(struct Entry));
en->word = word;
en->len = len;
en->next = table[h];
table[h] = en;
}
void cleanTable()
{
struct Entry *p, *q;
int i;
for( i=0; i<TABLE_SIZE; ++i )
{
for( p=table[i]; p!=NULL; p=q )
{
q = p->next;
free( p );
}
} // for each entry
}
int main() {
init(); // create hash table
// test calls produce correct output
//insertElement("A", (int)strlen("A"));
//insertElement("A's", (int)strlen("A's"));
//insertElement("AA's", (int)strlen("AA's"));
//insertElement("AB's", (int)strlen("AB's"));
int i;
i = 0;
FILE* dict = fopen("/usr/share/dict/words", "r"); //open the dictionary for read-only access
if(dict == NULL) {
return;
}
// Read each line of the file, and insert the word in hash table
char word[128];
while(i < 4 && fgets(word, sizeof(word), dict) != NULL) {
size_t len = strlen(word);
if (len > 0 && word[len - 1] == '\n') {
word[len - 1] = '\0'; // trim the \n
}
insertElement(word, (int)strlen(word));
i++;
}
for ( i=0; i < 50; i++)
{
printf("%d: ", i);
struct Entry *enTemp = table[i];
while (enTemp->next != NULL)
{
printf("[W: %s, Len:%d] ", enTemp->word, enTemp->len);
enTemp = enTemp->next;
}
printf("\n");
}
cleanTable();
return 0;
}

Try allocating fresh memory for each word inside the loop, in this part of the code:
char *word = malloc(128);
while(i < 4 && fgets(word, 128, dict) != NULL) { // sizeof(word) would now be the size of a pointer, not 128
size_t len = strlen(word);
if (len > 0 && word[len - 1] == '\n') {
word[len - 1] = '\0'; // trim the \n
}
insertElement(word, (int)strlen(word));
word = malloc(128); // fresh buffer for the next word
i++;
}
free(word); // the last buffer was never inserted
You never allocated separate memory for each string, so the word pointer of every entry ends up pointing at the same buffer.
Note: Not tested
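
As a hedged alternative to allocating 128 bytes per word, the read buffer can stay a plain char word[128] in main while each inserted word gets an exactly sized copy. The helper below is a sketch; copy_trimmed is a name invented for this example and does not appear in the question's code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of line without a trailing '\n',
   or NULL if the allocation fails. The caller owns the result and must free() it. */
char *copy_trimmed(const char *line) {
    assert(line != NULL);
    size_t n = strlen(line);
    if (n > 0 && line[n - 1] == '\n') n--; /* drop the newline kept by fgets */
    char *copy = malloc(n + 1);
    if (copy == NULL) return NULL;
    memcpy(copy, line, n);
    copy[n] = '\0';
    return copy;
}
```

Inside the reading loop, the result of copy_trimmed(word) would be passed to insertElement instead of word itself, after checking it for NULL.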

Notice that your insertElement receives a pointer to a string and stores that same pointer in the new Entry. In main, however, the word argument points to a stack-allocated buffer whose contents change every time a word is read, so all entries end up showing the last word. You must use malloc (or otherwise copy the string) so that each entry's word points to its own memory.
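
One way to apply this is to let the insert function make the copy itself, so that callers can reuse their buffer freely. The following is a minimal sketch under assumptions: SketchEntry, sketch_table, SKETCH_SIZE, insert_copy and free_sketch_table are names made up for this example, and the bucket index h is passed in rather than computed by the question's hash().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_SIZE 64 /* hypothetical table size for this sketch */

typedef struct SketchEntry {
    char *word; /* owned by the entry */
    int len;
    struct SketchEntry *next;
} SketchEntry;

static SketchEntry *sketch_table[SKETCH_SIZE]; /* zero-initialized: all buckets empty */

/* Inserts a private copy of word at the head of bucket h % SKETCH_SIZE.
   Returns 0 on success, -1 if memory could not be allocated. */
int insert_copy(unsigned h, const char *word) {
    assert(word != NULL);
    size_t n = strlen(word);
    SketchEntry *en = malloc(sizeof *en);
    if (en == NULL) return -1;
    en->word = malloc(n + 1);
    if (en->word == NULL) { free(en); return -1; }
    memcpy(en->word, word, n + 1); /* includes the terminating '\0' */
    en->len = (int)n;
    en->next = sketch_table[h % SKETCH_SIZE];
    sketch_table[h % SKETCH_SIZE] = en;
    return 0;
}

/* Frees every entry together with the string it owns. */
void free_sketch_table(void) {
    for (int i = 0; i < SKETCH_SIZE; i++) {
        SketchEntry *p = sketch_table[i];
        while (p != NULL) {
            SketchEntry *next = p->next;
            free(p->word);
            free(p);
            p = next;
        }
        sketch_table[i] = NULL;
    }
}
```

Because each entry owns its string, the cleanup can free the word together with the entry. That would not be safe in the question's code as posted, where init() stores the string literal "" in every bucket.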