Comparing contents of files for groupings of words - c

Background:
I am currently working on a project. The main objective is to read files and compare groupings of words. Only user interaction will be to specify group length. My programs are placed into a directory. Inside that directory, there will be multiple textfiles(Up to 30). I use
system("ls /home/..... > inputfile.txt");
system("ls /home/..... > inputfile.txt");
From there, I open the files from inputfile.txt to read for their contents.
Now to the actual question/problem part.
The method I am using for this is a queue because FIFO. (Code "link.c":http://pastebin.com/rLpVGC00
link.c
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include "linkedlist.h"
struct linkedList
{
char *data;
int key;
int left;
int right;
int size;
};
LinkedList createLinkedList(int size)
{
LinkedList newLL = malloc(sizeof *newLL);
newLL->data = malloc(sizeof(int) * (size+1));
newLL->size = size;
newLL->left = 0;
newLL->right = 0;
return newLL;
}
bool isFull(LinkedList LL)
{
return abs(abs(LL->left)- abs(LL->right)) == LL->size;
}
void insertFront(LinkedList LL, char *newInfo)
{
if(isFull(LL))
{
printf("FULL");
exit(1);
}
LL->data[((--(LL->left) % LL->size) + LL->size) % LL->size] = newInfo;
}
bool isEmpty(LinkedList LL)
{
return LL->left == LL->right;
}
const char * removeEnd(LinkedList LL)
{
if(isEmpty(LL))
{
return "EMPTY";
//exit(1);
}
return LL->data[((--(LL->right) % LL->size) + LL->size) % LL->size];
}
I get two warnings when I compile with link.c and my main (Start11.c)
link.c: In function ‘insertFront’:
link.c:39:64: warning: assignment makes integer from pointer without a cast [enabled by default]
LL->data[((--(LL->left) % LL->size) + LL->size) % LL->size] = newInfo;
^
link.c: In function ‘removeEnd’:
link.c:54:5: warning: return makes pointer from integer without a cast [enabled by default]
return LL->data[((--(LL->right) % LL->size) + LL->size) % LL->size];
^
FULL start11.c code: http://pastebin.com/eskn5yxm .
From bulk of read() function that I have questions about:
fp = fopen(filename, "r");
//We want two word or three word or four word PHRASES
for (i = 0; fgets(name, 100, fp) != NULL && i < 31; i++)
{
char *token = NULL; //setting to null before using it to strtok
token = strtok(name, ":");
strtok(token, "\n");//Getting rid of that dirty \n that I hate
strcat(fnames[i], token);
char location[350];
//Copying it back to a static array to avoid erros with fopen()
strcpy(location, fnames[i]);
//Opening the files for their contents
fpp = fopen(location, "r");
printf("\nFile %d:[%s] \n", i+1, fnames[i]);
char* stringArray[400];
//Reading the actual contents
int y;
for(j = 0; fgets(info,1600,fpp) != NULL && j < 1600; j++)
{
for( char *token2 = strtok(info," "); token2 != NULL; token2 = strtok(NULL, " ") )
{
puts(token2);
++y;
stringArray[y] = strdup(token2);
insertFront(index[i],stringArray[y]);
}
}
}
//Comparisons
char take[20010],take2[200100], take3[200100],take4[200100];
int x,z;
int count, count2;
int groupV,groupV2;
for(x = 0; x < 10000; ++x)
{
if(removeEnd(index[0])!= "EMPTY")
{
take[x] = removeEnd(index[0]);
}
if(removeEnd(index[1])!= "EMPTY")
{
take2[x] = removeEnd(index[1]);
}
if(removeEnd(index[2])!= "EMPTY")
{
take3[x] = removeEnd(index[2]);
}
}
for(z = 0; z < 10; z++)
{
if(take[z] == take2[z])
{
printf("File 1 and File 2 are similar\n");
++count;
if(count == groupL)
{
++groupV;
}
}
if(take[z] == take3[z])
{
printf("File 1 and File 3 are similar\n");
++count2;
if(count == groupL)
{
++groupV2;
}
}
}
Are those two warnings before the reason why when I try to compare the files it'll not be correct? (Yes I realize I "hardcoded" the comparisons. That is just temporary till I get some of this down...)
I'll post header files as a comment. Won't let me post more than two links.
Additional notes:
removeEnd() returns "EMPTY" if there is if there is nothing left to remove.
insertFront() is a void function.
Before I created this account so I can post, I read a previous question regarding strttok and how if I want to insert something I have to strdup() it.
I have not added my free functions to my read() function. I will do that last too.
start.h (pastebin.com/NTnEAPYE)
#ifndef START_H
#define START_H
#include "linkedlist.h"
void read(LinkedList LL,char* filename, int lineL);
#endif
linkedlist.h (pastebin.com/ykzbnCTV)
#include <stdlib.h>
#include <stdio.h>
#ifndef LINKEDLIST_H
#define LINKEDLIST_H
typedef int bool;
typedef struct linkedList *LinkedList;
LinkedList createLinkedList(int size);
bool isFull(LinkedList LL);
void insertFront(LinkedList LL, char *newInfo);
const char * removeEnd(LinkedList LL);
bool isEmpty(LinkedList LL);
#endif

The main problem is around the removeEnd (resp. insertFrom) function:
const char * removeEnd(LinkedList LL)
{
if (...)
return "EMPTY";
else
return LL->data[xxx];
you return a const char * in the first return but a char in the second return, hence the warning, which is a serious one.
And when you compare return value to "EMPTY" in the caller, it's just wrong: you should use strcmp instead of comparing arrays which may be the same depending on compilers which group same strings in the same location, but only by chance (and not portable!)

Related

Sorting structures from files

I recently got an assignment to sort members in a struct by last name and if they are the same to sort by first name. What i have so far only reads their name and age from the file but I am not properly grapsing how I would be able to sort it. So far I gathered the data from the file but im at a loss from there. I followed a code I saw but i didnt get a proper grasping of the process so i reverted back to step one.
struct Members{
int id;
char fname[50];
char lname[50];
int age;
}bio;
int main(){
int i=0;
FILE *fptr;
file = fopen("Members Bio.txt", "r");
while ( fscanf(file, "%d%s%s%d", &bio[i].id,bio[i].fname,bio[i].lname,&bio[i].age) != EOF)
{
printf("%d %s %s %d %d\n", bio[i].id,bio[i].fname, bio[i].lname, bio[i].age);
i++;
}
fclose(fptr);
}
Can anyone help me out on this one?
Code goes something like this for your case.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct Members{
int id;
char fname[50];
char lname[50];
int age;
};
typedef int (*compare_func)(void*, void*);
int struct_cmp(void* s1, void* s2)
{
int l_result = strcmp(((struct Members*) s1)->lname, \
((struct Members*) s2)->lname);
if (l_result < 0)
return 1;
else if (l_result > 0)
return 0;
else
return (strcmp(((struct Members*) s1)->fname, \
((struct Members*) s2)->fname) < 0 ? 1 : 0);
}
void sort(void* arr,long ele_size,long start,long end,compare_func compare)
{
// Generic Recursive Quick Sort Algorithm
if (start < end)
{
/* Partitioning index */
void* x = arr+end*ele_size;
long i = (start - 1);
void* tmp=malloc(ele_size);
for (long j = start; j <= end - 1; j++)
{
if ((*compare)(arr+j*ele_size,x))
{
i++;
// Swap is done by copying memory areas
memcpy(tmp,arr+i*ele_size,ele_size);
memcpy(arr+i*ele_size,arr+j*ele_size,ele_size);
memcpy(arr+j*ele_size,tmp,ele_size);
}
}
memcpy(tmp,arr+(i+1)*ele_size,ele_size);
memcpy(arr+(i+1)*ele_size,arr+end*ele_size,ele_size);
memcpy(arr+end*ele_size,tmp,ele_size);
i= (i + 1);
sort(arr,ele_size,start, i - 1,compare);
sort(arr,ele_size,i + 1, end,compare);
}
}
int main()
{
FILE* fp;
int bio_max = 3;
struct Members bio[bio_max]; // Define bio to be large enough.
/* Open FILE and setup bio matrix */
/* For testing */
bio[0].id = 0;
strcpy(bio[0].fname, "");
strcpy(bio[0].lname, "Apple");
bio[0].age = 0;
bio[1].id = 1;
strcpy(bio[1].fname, "");
strcpy(bio[1].lname, "Cat");
bio[1].age = 1;
bio[2].id = 2;
strcpy(bio[2].fname, "");
strcpy(bio[2].lname, "Bat");
bio[2].age = 2;
/* Sort the structure */
sort(bio, sizeof(struct Members), 0, bio_max - 1, struct_cmp);
/* Print the sorted structure */
for (int i = 0; i < bio_max; i++) {
printf("%d %s %s %d\n", bio[i].id, bio[i].fname, \
bio[i].lname, bio[i].age);
}
}
Output
0 Apple 0
2 Bat 2
1 Cat 1
If the strings are not sorting in the way you want, you can redefine the struct_cmp function. Code is self explanatory, the base logic in the code is pass an array and swap elements using memcpy functions. You cant use simple assignment operator if you want to be generic, so that is why the element size is explicitly passed.
Edit
The code was not handling the condition, if lname are same. I missed it thanks for #4386427 for pointing this out.
I think you should define bio to be an array. And google sort algorithms please. Also recommend you google how to use libc function qsort.

Whats wrong when copying my nested struct?

I code C to get better in programming and study... and have program that should generate a static web-page. It also saves the project as a text-file. I have separate functions to make object (realloc and put a new struct...), and have extracted the problem-code to a short program for this occasion... It's just for reading the 'project'. When I run it says:
Segmentation fault (core dumped)
in the middle of the print_1_content
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define SELL_ITEM 1
#define PARAGRAPH_ITEM 2
struct SellItem {
char title[50];
int nr_of_img;
char ** image_files;//array of strings
};
struct ParagraphItem{
char * text;
};
union ContentItem{//one of the following only
struct SellItem s_item;
struct ParagraphItem p_item;
};
struct Content{
int type;//1=sellitem 2=paragraph
union ContentItem c_item;
};
int open_items_file(struct Content **, int *, char *);
int free_1_item(struct Content *);
struct Content import_1_content(char *);
void increase(struct Content**, int *);
void print_1_content(struct Content *);
struct Content import_1_content(char *);
int free_1_item(struct Content *);
int main (void)
{
struct Content * content;
int content_count=0;
open_items_file(&content, &content_count, "all_items.txt");
return 0;
}
int open_items_file(struct Content ** content, int * number_of_content, char * filename){
printf("open_items_file %s\n", filename);
FILE *fp = fopen(filename, "r");
char * line = NULL;
size_t len = 0;
ssize_t read;
int counter=0;
if(fp==NULL){
return 0;
}
//for each row
while ((read = getline(&line, &len, fp)) != -1) {
if((line[0]=='S' || line[0]=='P') && line[1]=='-'){
if(line[3]==':'){
if(line[2]=='I'){
increase(content, number_of_content);
*content[(*number_of_content)-1] = import_1_content(line);
}
else{
//not sell/paragraph item
}
}//end if line[3]==':'
}//end if line[0] =='S' eller 'P'
counter++;
}
free(line);
fclose(fp);
return counter;
}
void increase(struct Content** content, int *nr_of_content){
if((*nr_of_content)==0){
*content = malloc(sizeof(struct Content));
}
else{
*content = realloc(*content, (*nr_of_content+1) * sizeof(struct Content));
}
(*nr_of_content)++;
}
void print_1_content(struct Content * content){
//Print info
}
struct Content import_1_content(char * text_line){
struct Content temp_content_item;
char * line_pointer = text_line;
char c;
line_pointer += 4;
if(text_line[0]=='S'){
temp_content_item.type = SELL_ITEM;
temp_content_item.c_item.s_item.nr_of_img=0;
int i=0;
char * temp_text;
while(*line_pointer != '|' && *line_pointer != '\n' && i < sizeof(temp_content_item.c_item.s_item.title)-1){
temp_content_item.c_item.s_item.title[i] = *line_pointer;
i++;//target index
line_pointer++;
}
temp_content_item.c_item.s_item.title[i]='\0';
i=0;
//maybe images?
short read_img_counter=0;
if(*line_pointer == '|'){
line_pointer++; //jump over '|'
//img-file-name separ. by ';', row ends by '\n'
while(*line_pointer != '\n'){//outer image filename -loop
i=0;
while(*line_pointer != ';' && *line_pointer != '\n'){//steps thr lett
c = *line_pointer;
if(i==0){//first letter
temp_text = malloc(2);
}
else if(i>0){
temp_text = realloc(temp_text, i+2);//extra for '\0'
}
temp_text[i] = c;
line_pointer++;
i++;
}
if(*line_pointer==';'){//another image
line_pointer++;//jump over ';'
}
else{
}
temp_text[i]='\0';
//allocate
if(read_img_counter==0){//create array
temp_content_item.c_item.s_item.image_files = malloc(sizeof(char*));
}
else{//extend array
temp_content_item.c_item.s_item.image_files = realloc(temp_content_item.c_item.s_item.image_files, sizeof(char*) * (read_img_counter+1));
}
//allocate
temp_content_item.c_item.s_item.image_files[read_img_counter] = calloc(i+1, 1);
//copy
strncpy(temp_content_item.c_item.s_item.image_files[read_img_counter], temp_text, strlen(temp_text));
read_img_counter++;
temp_content_item.c_item.s_item.nr_of_img = read_img_counter;
}
}
else{
printf("Item had no img-files\n");
}
}
else{ // text_line[0]=='P'
temp_content_item.type = PARAGRAPH_ITEM;
temp_content_item.c_item.p_item.text = calloc(strlen(text_line)-4,1);
int i=0;
while(*line_pointer != '\0' && *line_pointer != '\n'){
temp_content_item.c_item.p_item.text[i] = *line_pointer;
i++;
line_pointer++;
}
}
print_1_content(&temp_content_item);
return temp_content_item;
}
int free_1_item(struct Content * item){
if(item->type==SELL_ITEM){
if(item->c_item.s_item.nr_of_img > 0){
//Freeing img-names
for(int i=0; i<item->c_item.s_item.nr_of_img; i++){
free(item->c_item.s_item.image_files[i]);
}
}
return 1;
}
else if(item->type==PARAGRAPH_ITEM){
//freeing p_item
free(item->c_item.p_item.text);
return 1;
}
else{
printf("error: unknown item\n");
}
return 0;
}
The text file to read (all_items.txt) is like this, ends with a new-line, for the two content types of "sell-item" and "paragraph-item":
S-I:Shirt of cotton|image1.jpg;image2.jpg;image3.jpg
P-I:A paragraph, as they are called.
S-I:Trousers, loose style|image4.jpg
So the problem as you've discovered is on this line:
*content[(*number_of_content)-1] = temp_content_item2;
It's because of operate precedence because *content[(*number_of_content)-1] is not the same as (*content)[(*number_of_content)-1], it's actually doing *(content[(*number_of_content)-1]). So your code is doing the array indexing and then de-referencing which is pointing at some random place in memory. Replace that line with this and that will fix the current problem.
(*content)[(*number_of_content)-1] = temp_content_item2;

calls to printf seem to overwrite array of strings

Hello I am developing a program for the Raspberry in C (the in-progress project can be found here).
I noted there are some errors in the task1 function so I created an equivalent program in my Desktop (running Ubuntu) to find the error, where the task1 was readapted as below:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "stub.h"
#include "globals.h"
#include "utils.h"
#include <pthread.h>
#define INPIN 25
void takePic(char picname[24]);
//This thread reads the PIR sensor output
void task1()
{
unsigned char val_read = 0;
static unsigned char alarm_on = FALSE;
static const unsigned int maxNumPics=10;
static char folderPath[] = "/home/usr/Documents/alarmSys_rasp/alarm_new/pics/";
char fileNameTodelete[73];
fileNameTodelete[0] = '\0';
strcat(fileNameTodelete, folderPath);
//INITIALIZING
pinMode(INPIN, INPUT);
//create folder where to save pics
createFolder(folderPath);
char* names[24];
char picname[24];
int res = 0;
picname[0] = '\0';
static unsigned int numPicsInFolder;
//delete if more than 10 files
while((numPicsInFolder = filesByName(names, folderPath))>maxNumPics)
{
fileNameTodelete[0] = '\0';
strcat(fileNameTodelete, folderPath);
strcat(fileNameTodelete, names[0]);
printf("%s\n", fileNameTodelete);
remove(fileNameTodelete);
}
static unsigned int nexEl;
nexEl = numPicsInFolder % maxNumPics;
printf("Entering while\n");
while(1)
{
//static const unsigned int del = 300;
val_read = digitalRead(INPIN);
if (val_read && (!alarm_on)) //motion detected
{
printf("\nDetected movement\n");
if (numPicsInFolder >= maxNumPics)
{
printf("\nMax num pics\n");
fileNameTodelete[0] = '\0';
strcat(fileNameTodelete, folderPath);
strcat(fileNameTodelete, names[nexEl]);
printFiles(names, numPicsInFolder);
printf("File to be deleted %d: %s, ", nexEl, names[nexEl]);
//printf("%s\n", fileNameTodelete);
if ((res = remove(fileNameTodelete))!=0)
{
printf("Error deleting file: %d\n", res);
}
}
else
{
printf("\nNot reached max num pics\n");
numPicsInFolder++;
}
//update buffer
takePic(picname);
printf("value returned by takePic: %s\n", picname);
//names[nexEl] = picname;
strcpy(names[nexEl], picname); //ERROR HERE
printFiles(names, numPicsInFolder);
printf("curr element %d: %s\n",nexEl, names[nexEl]);
nexEl++;
nexEl %= maxNumPics;
printf("\nDetected movement: alarm tripped\n\n");
alarm_on = TRUE;
/*Give some time before another pic*/
}
else if (alarm_on && !val_read)
{
alarm_on = FALSE;
printf("\nAlarm backed off\n\n");
}
}
}
void takePic(char picname[24])
{
/*Build string to take picture*/
int err;
//finalcmd is very long
char finalcmd[150];
finalcmd[0] = '\0';
getDateStr(picname);
char cmd1[] = "touch /home/usr/Documents/alarmSys_rasp/alarm_new/pics/";
char cmdlast[] = "";
strcat(finalcmd, cmd1);
strcat(picname, ".jpg");
strcat(finalcmd, picname);
strcat(finalcmd, cmdlast);
system(finalcmd);
if ((err=remove("/var/www/html/*.jpg"))!=0)
{
printf("Error deleting /var/www/html/*.jpg, maybe not existing\n" );
}
//system(finalcmd_ln);
//pthread_mutex_lock(&g_new_pic_m);
g_new_pic_flag = TRUE;
printf("\nPicture taken\n\n");
}
DESCRIPTION
The main function calls the task1 function defined in the file task1.c. The function creates a file in the folder ./pics/ every time the condition (val_read && (!alarm_on)) is verified (in the simulation this condition is satisfied every 2 loops). The function allows only 10 files in the folder. If there are already 10, it deletes the oldest one and creates the new file by calling the function takePic.
The name of files are stored in a array of strings char* names[24]; and the variable nexEl points to the element of this array having the name of the oldest file so that it is replaced with the name of the new file just created.
PROBLEM
The problem is the following: the array char* names[24] is correctly populated at the first iteration but already in the second iteration some elements are overwritten. The problem arises when the folder has the maximum number of files (10) maybe on the update of the array.
It seems the calls to printf overwrite some of its elements so that for example one of them contains the string "Error deleting /var/www/html/*.jpg, maybe not existing\n" printed inside the funtion takePic.
What am I missing or doing wrong with the management of arrays of strings?
UTILITIES FUNCTIONS
To be complete here are shortly described and reported other functions used in the program.
The function getDateStr builds a string representing the current date in the format yyyy_mm_dd_hh_mm_ss.
The function filesByName builds an array of strings where each string is the name of a file in the folder ./ ordered from the last file created to the newest.
The function printFiles prints the previous array.
void getDateStr(char str[20])
{
char year[5], common[3];
time_t t = time(NULL);
struct tm tm = *localtime(&t);
str[0]='\0';
sprintf(year, "%04d", tm.tm_year+1900);
strcat(str, year);
sprintf(common, "_%02d", tm.tm_mon + 1);
strcat(str, common);
sprintf(common, "_%02d", tm.tm_mday);
strcat(str, common);
sprintf(common, "_%02d", tm.tm_hour);
strcat(str, common);
sprintf(common, "_%02d", tm.tm_min);
strcat(str, common);
sprintf(common, "_%02d", tm.tm_sec);
strcat(str, common);
//printf("%s\n", str);
}
unsigned int countFiles(char* dir)
{
unsigned int file_count = 0;
DIR * dirp;
struct dirent * entry;
dirp = opendir(dir); /* There should be error handling after this */
while ((entry = readdir(dirp)) != NULL) {
if (entry->d_type == DT_REG) { /* If the entry is a regular file */
file_count++;
}
}
return file_count;
}
void printFiles(char* names[24], unsigned int file_count)
{
for (int i=0; i<file_count; i++)
{
printf("%s\n", names[i]);
}
}
unsigned int filesByName(char* names[24], char* dir)
{
unsigned int file_count = 0;
DIR * dirp;
struct dirent * entry;
dirp = opendir(dir); /* There should be error handling after this */
while ((entry = readdir(dirp)) != NULL) {
if (entry->d_type == DT_REG) { /* If the entry is a regular file */
//strncpy(names[file_count], entry->d_name,20);
//names[file_count] = malloc(24*sizeof(char));
names[file_count] = entry->d_name;
file_count++;
}
}
closedir(dirp);
char temp[24];
if (file_count>0)
{
for (int i=0; i<file_count-1; i++)
{
for (int j=i; j<file_count; j++)
{
if (strcmp(names[i], names[j])>0)
{
strncpy(temp, names[i],24);
strncpy(names[i], names[j],24);
strncpy(names[j], temp, 24);
}
}
}
}
return file_count;
}
For the simulation I created also the following function (digitalRead is actually a function of the wiringPi C library for the Raspberry):
int digitalRead(int INPIN)
{
static int res = 0;
res = !res;
return res;
}
In task1, you have char *names[24]. This is an array of char pointers.
In filesByName, you do
names[file_count] = entry->d_name;
but should be doing
names[file_count] = strdup(entry->d_name);
because you can't guarantee that d_name persists or is unique after the function returns or even within the loop. You were already close with the commented out malloc call.
Because you call filesByName [possibly] multiple times, it needs to check for names[file_count] being non-null so it can do a free on it [to free the old/stale value from a previous invocation] before doing the strdup to prevent a memory leak.
Likewise, in task1,
strcpy(names[nexEl], picname); //ERROR HERE
will have similar problems and should be replaced with:
if (names[nexEl] != NULL)
free(names[nexEl]);
names[nexEl] = strdup(picname);
There may be other places that need similar adjustments. And, note that in task1, names should be pre-inited will NULL
Another way to solve this is to change the definition of names [everywhere] from:
char *names[24];
to:
char names[24][256];
This avoids some of the malloc/free actions.
getDateStr() uses a char buffer that is always too small. Perhaps other problems exists too.
void getDateStr(char str[20]) {
char year[5], common[3];
....
sprintf(common, "_%02d", tm.tm_mon + 1); // BAD, common[] needs at least 4
Alternative with more error checking
char *getDateStr(char *str, size_t sz) {
if (str == NULL || sz < 1) {
return NULL;
}
str[0] = '\0';
time_t t = time(NULL);
struct tm *tm_ptr = localtime(&t);
if (tm_ptr == NULL) {
return NULL;
}
struct tm tm = *tm_ptr;
int cnt = snprintf(year, sz, "%04d_%02d_%02d_%02d_%02d_%02d",
tm.tm_year+1900, tm.tm_mon + 1, tm.tm_mday,
tm.tm_hour, tm.tm_min, tm.tm_sec);
if (cnt < 0 || cnt >= sz) {
return NULL;
}
return str;
}

Reading string from array of pointers

How can I read each individual character from a string that is accessed through an array of pointers? In the below code I currently have generated an array of pointers to strings called, symCodes, in my makeCodes function. I want to read the strings 8 characters at a time, I thought about concatenating each string together, then looping through that char by char but the strings in symCodes could be up to 255 characters each, so I feel like that could possibly be too much all to handle at once. Instead, I thought I could read each character from the strings, character by character.
I've tried scanf or just looping through and always end up with seg faults. At the end of headerEncode(), it's near the bottom. I malloc enough memory for each individual string, I try to loop through the array of pointers and print out each individual character but am ending up with a seg fault.
Any suggestions of a different way to read an array of pointers to strings, character by character, up to n amount of characters is appreciated.
EDIT 1: I've updated the program to no longer output warnings when using the -Wall and -W flags. I'm no longer getting a seg fault(yay!) but I'm still unsure of how to go about my question, how can I read an array of pointers to strings, character by character, up to n amount of characters?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "huffman.h"
#define FAIL 0
#define SUCCESS 1
/* global 1 day arrays that hold chars and their freqs from file */
unsigned long globalFreqs[256] = {0};
unsigned char globalUsedCh[256] = {0};
char globalCodes[256] = {0};
unsigned char globalUniqueSymbols;
unsigned long totalCount = 0;
typedef struct HuffmanTreeNode* HTNode;
struct HuffmanTreeNode* globalSortedLL;
/*
struct has the input letter, the letters frequency, and the left and irght childs
*/
struct HuffmanTreeNode
{
char symbol;
unsigned long freq;
char *code;
struct HuffmanTreeNode *left, *right;
struct HuffmanTreeNode* next;
};
/* does it make sense to have a struct for the entire huffman tree to see its size? */
struct HuffmanTree
{
unsigned size;
};
/*generate new node with given symbol and freq */
struct HuffmanTreeNode* newNode(char symbol, int freq)
{
struct HuffmanTreeNode* newNode = malloc(sizeof(struct HuffmanTreeNode));
newNode->symbol = symbol;
newNode->freq = freq;
newNode->left = newNode->right = NULL;
return newNode;
}
/*current work in progress, i believe this is the way to insert it for a BST
/* will change for HuffmanTreenode once working
/*
*/
struct HuffmanTreeNode* insert(struct HuffmanTreeNode* node, struct HuffmanTreeNode* htnNew)
{
struct HuffmanTreeNode* currentNode = node;
if(currentNode == NULL || compareTwoNodes(htnNew, currentNode))
{
htnNew->next = currentNode;
return htnNew;
}
else
{
while(currentNode->next != NULL && compareTwoNodes(currentNode->next, htnNew))
{
currentNode = currentNode->next;
}
htnNew->next = currentNode->next;
currentNode->next = htnNew;
return node;
}
}
int compareTwoNodes(struct HuffmanTreeNode* a, struct HuffmanTreeNode* b)
{
if(b->freq < a->freq)
{
return 0;
}
if(a->freq == b->freq)
{
if(a->symbol > b->symbol)
return 1;
return 0;
}
if(b->freq > a->freq)
return 1;
}
struct HuffmanTreeNode* popNode(struct HuffmanTreeNode** head)
{
struct HuffmanTreeNode* node = *head;
*head = (*head)->next;
return node;
}
/*convert output to bytes from bits*/
/*use binary fileio to output */
/*put c for individual character byte*/
/*fwrite each individual byte for frequency of symbol(look at fileio slides) */
/*
#function:
#param:
#return:
*/
int listLength(struct HuffmanTreeNode* node)
{
struct HuffmanTreeNode* current = node;
int length = 0;
while(current != NULL)
{
length++;
current = current->next;
}
return length;
}
/*
#function:
#param:
#return:
*/
void printList(struct HuffmanTreeNode* node)
{
struct HuffmanTreeNode* currentNode = node;
while(currentNode != NULL)
{
if(currentNode->symbol <= ' ' || currentNode->symbol > '~')
printf("=%d", currentNode->symbol);
else
printf("%c", currentNode->symbol);
printf("%lu ", currentNode->freq);
currentNode = currentNode->next;
}
printf("\n");
}
/*
#function:
#param:
#return:
*/
void buildSortedList()
{
int i;
for(i = 0; i < 256; i++)
{
if(!globalFreqs[i] == 0)
{
globalSortedLL = insert(globalSortedLL, newNode(i, globalFreqs[i]));
}
}
printf("Sorted freqs: ");
printList(globalSortedLL);
printf("listL: %d\n", listLength(globalSortedLL));
}
/*
#function: isLeaf()
will test to see if the current node is a leaf or not
#param:
#return
*/
int isLeaf(struct HuffmanTreeNode* node)
{
if((node->left == NULL) && (node->right == NULL))
return SUCCESS;
else
return FAIL;
}
/*where I plan to build the actual huffmantree */
/*
#function:
#param:
#return:
*/
struct HuffmanTreeNode* buildHuffmanTree(struct HuffmanTreeNode* node)
{
int top = 0;
struct HuffmanTreeNode *left, *right, *topNode, *huffmanTree;
struct HuffmanTreeNode* head = node;
struct HuffmanTreeNode *newChildNode, *firstNode, *secondNode;
while(head->next != NULL)
{
/*grab first two items from linkedL, and remove two items*/
firstNode = popNode(&head);
secondNode = popNode(&head);
/*combine sums, use higher symbol, create new node*/
newChildNode = newNode(secondNode->symbol, (firstNode->freq + secondNode->freq));
newChildNode->left = firstNode;
newChildNode->right = secondNode;
/*insert new node, decrement total symbols in use */
head = insert(head, newChildNode);
}
return head;
}
void printTable(char *codesArray[])
{
int i;
printf("Symbol\tFreq\tCode\n");
for(i = 0; i < 256; i++)
{
if(globalFreqs[i] != 0)
{
if(i <= ' ' || i > '~')
{
printf("=%d\t%lu\t%s\n", i, globalFreqs[i], codesArray[i]);
}
else
{
printf("%c\t%lu\t%s\n", i, globalFreqs[i], codesArray[i]);
}
}
}
printf("Total chars = %lu\n", totalCount);
}
void makeCodes(
struct HuffmanTreeNode *node, /* Pointer to some tree node */
char *code, /* The *current* code in progress */
char *symCodes[256], /* The array to hold the codes for all the symbols */
int depth) /* How deep in the tree we are (code length) */
{
char *copiedCode;
int i = 0;
if(isLeaf(node))
{
code[depth] = '\0';
symCodes[node->symbol] = code;
return;
}
copiedCode = malloc(255*sizeof(char));
memcpy(copiedCode, code, 255*sizeof(char));
code[depth] = '0';
copiedCode[depth] = '1';
makeCodes(node->left, code, symCodes, depth+1);
makeCodes(node->right, copiedCode, symCodes, depth+1);
}
/*
#function: getFileFreq()
gets the frequencies of each character in the given
file from the command line, this function will also
create two global 1d arrays, one for the currently
used characters in the file, and then one with those
characters frequencies, the two arrays will line up
parallel
#param: FILE* in, FILE* out,
the current file being processed
#return: void
*/
void getFileFreq(FILE* in, FILE* out)
{
unsigned long freqs[256] = {0};
int i, t, fileCh;
while((fileCh = fgetc(in)) != EOF)
{
freqs[fileCh]++;
totalCount++;
}
for(i = 0; i < 256; i++)
{
if(freqs[i] != 0)
{
globalUsedCh[i] = i;
globalFreqs[i] = freqs[i];
if(i <= ' ' || i > '~')
{
globalUniqueSymbols++;
}
else
{
globalUniqueSymbols++;
}
}
}
/* below code until total count is for debugging purposes */
printf("Used Ch: ");
for(t = 0; t < 256; t++)
{
if(globalUsedCh[t] != 0)
{
if(t <= ' ' || t > '~')
{
printf("%d ", globalUsedCh[t]);
}
else
printf("%c ", globalUsedCh[t]);
}
}
printf("\n");
printf("Freq Ch: ");
for(t = 0; t < 256; t++)
{
if(globalFreqs[t] != 0)
{
printf("%lu ", globalFreqs[t]);
}
}
printf("\n");
/* end of code for debugging/vizualazation of arrays*/
printf("Total Count %lu\n", totalCount);
printf("globalArrayLength: %d\n", globalUniqueSymbols);
}
void headerEncode(FILE* in, FILE* out, char *symCodes[256])
{
char c;
int i, ch, t, q, b, z;
char *a;
char *fileIn;
unsigned char *uniqueSymbols;
unsigned char *byteStream;
unsigned char *tooManySym = 0;
unsigned long totalEncodedSym;
*uniqueSymbols = globalUniqueSymbols;
totalEncodedSym = ftell(in);
rewind(in);
fileIn = malloc((totalEncodedSym+1)*sizeof(char));
fread(fileIn, totalEncodedSym, 1, in);
if(globalUniqueSymbols == 256)
{
fwrite(tooManySym, 1, sizeof(char), out);
}
else
{
fwrite(uniqueSymbols, 1, sizeof(uniqueSymbols)-7, out);
}
for(i = 0; i < 256; i++)
{
if(globalFreqs[i] != 0)
{
fwrite(globalUsedCh+i, 1, sizeof(char), out);
fwrite(globalFreqs+i, 8, sizeof(char), out);
}
}
for(t = 0; t < totalEncodedSym; t++)
{
fwrite(symCodes[fileIn[t]], 8, sizeof(char), out);
}
for(q = 0; q < totalEncodedSym; q++)
{
symCodes[q] = malloc(255*sizeof(char));
a = symCodes[q];
while(*a != '\0')
printf("%c\n", *(a++));
}
printf("Total encoded symbols: %lu\n", totalEncodedSym);
printf("%s\n", fileIn);
}
void encodeFile(FILE* in, FILE* out)
{
int top = 0;
int i;
char *code;
char *symCodes[256] = {0};
int depth = 0;
code = malloc(255*sizeof(char));
getFileFreq(in, out);
buildSortedList();
makeCodes(buildHuffmanTree(globalSortedLL), code, symCodes, depth);
printTable(symCodes);
headerEncode(in, out, symCodes);
free(code);
}
/*
void decodeFile(FILE* in, FILE* out)
{
}*/
There are many problems in your code:
[major] function compareTwoNodes does not always return a value. The compiler can detect such problems if instructed to output more warnings.
[major] the member symbol in the HuffmanTreeNode should have type int. Type char is problematic as an index value because it can be signed or unsigned depending on compiler configuration and platform specificities. You assume that char has values from 0 to 255, which is incorrect for most platforms where char actually has a range of -128 .. 127. Use unsigned char or int but cast the char values to unsigned char to ensure proper promotion.
[major] comparison if (globalUniqueSymbols == 256) is always false because globalUniqueSymbols is an unsigned char. The maximum number of possible byte values is indeed 256 for 8-bit bytes, but it does not fit in an unsigned char, make globalUniqueSymbols an int.
[major] *uniqueSymbols = globalUniqueSymbols; in function headerEncode stores globalUniqueSymbols into an uninitialized pointer, definitely undefined behavior, probable segmentation fault.
[major] sizeof(uniqueSymbols) is the size of a pointer, not the size of the array not the size of the type. Instead of hacking it as sizeof(uniqueSymbols)-7, fputc(globalUniqueSymbols, out);
[major] fwrite(tooManySym, 1, sizeof(char), out); is incorrect too, since tooManySym is initialized to 0, ie: it is a NULL pointer. You need a special value to tell that all bytes values are used in the source stream, use 0 for that and write it with fputc(0, out);.
You have nested C style comments before function insert, this is not a bug but error prone and considered bad style.
function newNode should take type unsigned long for freq for consistency.
function buildHuffmanTree has unused local variables: right, top and topNode.
variable i is unused in function makeCodes.
many unused variables in headerEncode: byteStream, c, ch, b...
totalEncodedSym is an unsigned long, use an index of the proper type in the loops where you stop at totalEncodedSym.
unused variables un encodeFile: i, top...
Most of these can be detected by the compiler with the proper warning level: gcc -Wall -W or clang -Weverything...
There are probably also errors in the program logic, but you cannot see these until you fix the major problems above.

Program Segmentation Faults on return 0/fclose/free. I think I have memory leaks but can't find them. Please help!

I am trying to write a Huffman encoding program to compress a text file. Upon completetion, the program will terminate at the return statement, or when I attempt to close a file I was reading from. I assume I have memory leaks, but I cannot find them. If you can spot them, let me know (and a method for fixing them would be appreciated!).
(note: small1.txt is any standard text file)
Here is the main program
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#define ASCII 255
struct link {
int freq;
char ch[ASCII];
struct link* right;
struct link* left;
};
typedef struct link node;
typedef char * string;
FILE * ofp;
FILE * ifp;
int writebit(unsigned char);
void sort(node *[], int);
node* create(char[], int);
void sright(node *[], int);
void Assign_Code(node*, int[], int, string *);
void Delete_Tree(node *);
int main(int argc, char *argv[]) {
//Hard-coded variables
//Counters
int a, b, c = 0;
//Arrays
char *key = (char*) malloc(ASCII * sizeof(char*));
int *value = (int*) malloc(ASCII * sizeof(int*));
//File pointers
FILE *fp = fopen(argv[1], "r");
if (fp == NULL) {
fprintf(stderr, "can't open %s\n", argv[1]);
return 0;
}
//Nodes
node* ptr;//, *head;
node* array[ASCII];
//
int u, carray[ASCII];
char str[ASCII];
//Variables
char car = 0;
int inList = 0;
int placeinList = -1;
int numofKeys;
if (argc < 2) {
printf("Usage: huff <.txt file> \n");
return 0;
}
for (a = 0; a < ASCII; a++) {
key[a] = -1;
value[a] = 0;
}
car = fgetc(fp);
while (!feof(fp)) {
for (a = 0; a < ASCII; a++) {
if (key[a] == car) {
inList = 1;
placeinList = a;
}
}
if (inList) {
//increment value array
value[placeinList]++;
inList = 0;
} else {
for (b = 0; b < ASCII; b++) {
if (key[b] == -1) {
key[b] = car;
break;
}
}
}
car = fgetc(fp);
}
fclose(fp);
c = 0;
for (a = 0; a < ASCII; a++) {
if (key[a] != -1) {
array[c] = create(&key[a], value[a]);
numofKeys = c;
c++;
}
}
string code_string[numofKeys];
while (numofKeys > 1) {
sort(array, numofKeys);
u = array[0]->freq + array[1]->freq;
strcpy(str, array[0]->ch);
strcat(str, array[1]->ch);
ptr = create(str, u);
ptr->right = array[1];
ptr->left = array[0];
array[0] = ptr;
sright(array, numofKeys);
numofKeys--;
}
Assign_Code(array[0], carray, 0, code_string);
ofp = fopen("small1.txt.huff", "w");
ifp = fopen("small1.txt", "r");
car = fgetc(ifp);
while (!feof(ifp)) {
for (a = 0; a < ASCII; a++) {
if (key[a] == car) {
for (b = 0; b < strlen(code_string[a]); b++) {
if (code_string[a][b] == 48) {
writebit(0);
} else if (code_string[a][b] == 49) {
writebit(1);
}
}
}
}
car = fgetc(ifp);
}
writebit(255);
fclose(ofp);
ifp = fopen("small1.txt", "r");
fclose(ifp);
free(key);
//free(value);
//free(code_string);
printf("here1\n");
return 0;
}
int writebit(unsigned char bitval) {
static unsigned char bitstogo = 8;
static unsigned char x = 0;
if ((bitval == 0) || (bitval == 1)) {
if (bitstogo == 0) {
fputc(x, ofp);
x = 0;
bitstogo = 8;
}
x = (x << 1) | bitval;
bitstogo--;
} else {
x = (x << bitstogo);
fputc(x, ofp);
}
return 0;
}
void Assign_Code(node* tree, int c[], int n, string * s) {
int i;
static int cnt = 0;
string buf = malloc(ASCII);
if ((tree->left == NULL) && (tree->right == NULL)) {
for (i = 0; i < n; i++) {
sprintf(buf, "%s%d", buf, c[i]);
}
s[cnt] = buf;
cnt++;
} else {
c[n] = 1;
n++;
Assign_Code(tree->left, c, n, s);
c[n - 1] = 0;
Assign_Code(tree->right, c, n, s);
}
}
node* create(char a[], int x) {
node* ptr;
ptr = (node *) malloc(sizeof(node));
ptr->freq = x;
strcpy(ptr->ch, a);
ptr->right = ptr->left = NULL;
return (ptr);
}
void sort(node* a[], int n) {
int i, j;
node* temp;
for (i = 0; i < n - 1; i++)
for (j = i; j < n; j++)
if (a[i]->freq > a[j]->freq) {
temp = a[i];
a[i] = a[j];
a[j] = temp;
}
}
void sright(node* a[], int n) {
int i;
for (i = 1; i < n - 1; i++)
a[i] = a[i + 1];
}
If your program is crashing on what is otherwise a valid operation (like returning from a function or closing a file), I'll near-guarantee it's a buffer overflow problem rather than a memory leak.
Memory leaks just generally mean your mallocs will eventually fail, they do not mean that other operations will be affected. A buffer overflow of an item on the stack (for example) will most likely corrupt other items on the stack near it (such as a file handle variable or the return address from main).
Probably your best bet initially is to set up a conditional breakpoint on writes to the file handles. This should happen in the calls to fopen and nowhere else. If you detect a write after the fopen calls are finished, that will be where your problem occurred, so just examine the stack and the executing line to find out why.
Your first problem (this is not necessarily the only one) lies here:
c = 0;
for (a = 0; a < ASCII; a++) {
if (key[a] != -1) {
array[c] = create(&key[a], value[a]);
numofKeys = c; // DANGER,
c++; // WILL ROBINSON !!
}
}
string code_string[numofKeys];
You can see that you set the number of keys before you increment c. That means the number of keys is one less than you actually need so that, when you access the last element of code_string, you're actually accessing something else (which is unlikely to be a valid pointer).
Swap the numofKeys = c; and c++; around. When I do that, I at least get to the bit printing here1 and exit without a core dump. I can't vouch for the correctness of the rest of your code but this solves the segmentation violation so anything else should probably go in your next question (if need be).
I can see one problem:
strcpy(str, array[0]->ch);
strcat(str, array[1]->ch);
the ch field of struct link is a char array of size 255. It is not NUL terminated. So you cannot copy it using strcpy.
Also you have:
ofp = fopen("small1.txt.huff", "w");
ifp = fopen("small1.txt", "r");
If small1.txt.huff does not exist, it will be created. But if small1.txt it will not be created and fopen will return NULL, you must check the return value of fopen before you go and read from the file.
Just from counting, you have 4 separate malloc calls, but only one free call.
I would also be wary of your sprintf call, and how you are actually mallocing.
You do an sprintf(buf, "%s%d", buf, c[i]) but that can potentially be a buffer overflow if your final string is longer than ASCII bytes.
I advise you to step through with a debugger to see where it's throwing a segmentation fault, and then debug from there.
i compiled the program and ran it with it's source as that small1.txt file and got "can't open (null)" if the file doesn't exist or the file exist and you give it on the command like ./huf small1.txt the program crashes with:
Program terminated with signal 11, Segmentation fault.
#0 0x08048e47 in sort (a=0xbfd79688, n=68) at huf.c:195
195 if (a[i]->freq > a[j]->freq) {
(gdb) backtrace
#0 0x08048e47 in sort (a=0xbfd79688, n=68) at huf.c:195
#1 0x080489ba in main (argc=2, argv=0xbfd79b64) at huf.c:99
to get this from gdb you run
ulimit -c 100000000
./huf
gdb --core=./core ./huf
and type backtrace
You have various problems in your Code:
1.- mallocs (must be):
//Arrays
char *key = (char*) malloc(ASCII * sizeof(char));
int *value = (int*) malloc(ASCII * sizeof(int));
sizeof(char) == 1, sizeof(char *) == 4 or 8 (if 64 bits compiler is used).
2.- Buffer sizes 255 (ASCII) is too short to receive the contents of array[0]->ch + array[1]->ch + '\0'.
3.- Use strncpy instead of strcpy and strncat instead of strcat.
4.- key is an array of individuals chars or is a null terminated string ?, because you are using this variable in both ways in your code. In the characters counting loop you are using this variables as array of individuals chars, but in the creation of nodes you are passing the pointer of the array and copying as null terminated array.
5.- Finally always check your parameters before used it, you are checking if argc < 2 after trying to open argv[1].

Resources