Creating an array of arrays of structures in C

Objective
To create an array of arrays of structs, with the dimensions based on a file, and then store that in a linked list.
I am reading a file in the following form:
name (string) country (string) age (int)
john USA 54
Sam Aus 18
etc.
I don't know how many rows and columns the file will have, nor do I know what variable type each column will be.
So in theory the first array of structs will contain [NUMBER OF COLUMNS] structs that will store each variable along the line, using a void pointer and typecasting (so strucArrayCol[0] = john, strucArrayCol[1] = USA, etc.).
Each of these arrays of structs will be stored in another array of structs, which will have [NUMBER OF ROWS] elements, so strucArrayRow[0] = strucArrayCol (which contains john, USA and 54) and strucArrayRow[1] will contain another strucArrayCol, which contains (Sam, Aus, 18).
So right now I can read the file and find the number of rows, the number of columns, and the variable type of each column.
This is where I start having trouble, as I'm not sure:
1. How to create this array within an array (I know I need to use malloc).
2. How I would store the variables in the first array of structs. If I wanted to store the age, could I just do:
void *data = malloc(sizeof(int));
*((int*)data) = TEMP_AGE;
with data being the void pointer in a struct in strucArrayCol? (In the example, if I wanted to store John's age, void *data would be in strucArrayCol[2], which is inside strucArrayRow[0], since age is the 3rd column of the first line.)
Sorry if this doesn't make sense.
Thanks
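A minimal sketch of the array-of-arrays layout the question describes, under a few assumptions: a hypothetical CellType tag is stored next to each void pointer (a bare void* does not remember what it points to), every value gets its own heap allocation, and the names Cell, grid_create, cell_set_int, cell_set_string and grid_free are illustrative only. The outer array holds one pointer per row; each row is an array of Cell structs, so John's age from the example would sit in grid[0][2].

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type tag; the question already detects each column's type. */
typedef enum { CELL_STRING, CELL_INT } CellType;

/* One field of one line: a type tag plus a pointer to heap data. */
typedef struct {
    CellType type;
    void *data;   /* char* for CELL_STRING, int* for CELL_INT */
} Cell;

/* Allocate rows x cols cells: an array of row pointers, each pointing
   to an array of cols cells. Returns NULL on allocation failure. */
Cell **grid_create(size_t rows, size_t cols)
{
    Cell **grid = malloc(rows * sizeof *grid);
    if (!grid) return NULL;
    for (size_t r = 0; r < rows; r++) {
        grid[r] = calloc(cols, sizeof **grid);  /* cells start zero-filled */
        if (!grid[r]) {
            while (r-- > 0) free(grid[r]);      /* undo the rows made so far */
            free(grid);
            return NULL;
        }
    }
    return grid;
}

/* Store an int: allocate room for it and copy the value in. */
int cell_set_int(Cell *c, int value)
{
    int *p = malloc(sizeof *p);
    if (!p) return -1;
    *p = value;
    c->type = CELL_INT;
    c->data = p;
    return 0;
}

/* Store a private copy of a string. */
int cell_set_string(Cell *c, const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (!p) return -1;
    strcpy(p, s);
    c->type = CELL_STRING;
    c->data = p;
    return 0;
}

/* Free every cell's data, then each row, then the row array. */
void grid_free(Cell **grid, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++)
            free(grid[r][c].data);
        free(grid[r]);
    }
    free(grid);
}
```

Reading the age back is then *(int *)grid[0][2].data, after checking that grid[0][2].type is CELL_INT.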

You can create a linked list within a linked list, assuming there is an aversion to anything that is not a linked list! Declare two linked-list node structures: one for rows in the file, and one for columns within each row:
struct column
{
    char *buf;
    struct column *next;
};
struct row
{
    struct column *head;
    struct row *next;
};
Read the file one line at a time and add one row node for each line. Each row has its own linked list of columns, built by parsing the line into tokens:
#include <stdlib.h>
#include <string.h>

/* Create a column node holding a copy of the token; append it after cursor. */
struct column* column_create(struct column* cursor, char *line)
{
    struct column *node = malloc(sizeof(struct column));
    node->next = 0;
    node->buf = malloc(strlen(line) + 1);
    strcpy(node->buf, line);
    if (cursor)
        cursor->next = node;
    return node;
}
/* Create a row node, split the line into column nodes, append after cursor. */
struct row* row_create(struct row* cursor, char *line)
{
    struct row *node = malloc(sizeof(struct row));
    node->next = 0;
    node->head = 0;
    //parse the line into columns (strtok modifies line)
    struct column *col = 0;
    char *token = strtok(line, " \n");
    while (token)
    {
        col = column_create(col, token);
        if (!node->head)
            node->head = col;
        token = strtok(NULL, " \n");
    }
    if (cursor)
        cursor->next = node;
    return node;
}
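The helpers above build one row at a time; the loop that drives them and the cleanup are left out. A sketch of both, repeating the two helpers so the block compiles on its own (rows_read and rows_free are hypothetical names, and malloc failures go unchecked, as in the original helpers). Because column_create copies each token, reusing the same line buffer for every fgets call is safe here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct column { char *buf; struct column *next; };
struct row { struct column *head; struct row *next; };

/* Same helpers as above, repeated so this block stands alone. */
struct column *column_create(struct column *cursor, char *line)
{
    struct column *node = malloc(sizeof(struct column));
    node->next = NULL;
    node->buf = malloc(strlen(line) + 1);
    strcpy(node->buf, line);
    if (cursor) cursor->next = node;
    return node;
}

struct row *row_create(struct row *cursor, char *line)
{
    struct row *node = malloc(sizeof(struct row));
    struct column *col = NULL;
    char *token;
    node->next = NULL;
    node->head = NULL;
    for (token = strtok(line, " \n"); token; token = strtok(NULL, " \n")) {
        col = column_create(col, token);
        if (!node->head) node->head = col;
    }
    if (cursor) cursor->next = node;
    return node;
}

/* Read every line of fp into a list of rows; returns the first row. */
struct row *rows_read(FILE *fp)
{
    char line[1024];
    struct row *first = NULL, *last = NULL;
    while (fgets(line, sizeof line, fp)) {
        last = row_create(last, line);
        if (!first) first = last;
    }
    return first;
}

/* Free every column of every row, then the rows themselves. */
void rows_free(struct row *r)
{
    while (r) {
        struct row *next_row = r->next;
        struct column *c = r->head;
        while (c) {
            struct column *next_col = c->next;
            free(c->buf);
            free(c);
            c = next_col;
        }
        free(r);
        r = next_row;
    }
}
```

Typical use would be struct row *rows = rows_read(fp); followed by rows_free(rows) once the data is no longer needed.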
Or you can do this with a 2-dimensional array of text (which would be a 3-dimensional array of characters). Or use an array of strings to hold all the lines in the file, then parse each line into columns. From there, you can test each column to see whether it is an integer.
If you don't know the number of lines in the file, use realloc to allocate as much memory as is needed at run time. This example reads all the lines in a file and copies them to an array of lines:
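#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    FILE *f = fopen("test.txt", "r");
    if (!f)
        return 1; /* could not open the file */
    char **lines = 0;
    int lines_size = 0;
    int lines_capacity = 0;
    char buf[1024];
    while (fgets(buf, sizeof(buf), f))
    {
        int len = strlen(buf);
        if (!len) continue;
        if (lines_size == lines_capacity)
        {
            /* grow in steps of 16; keep the old block if realloc fails */
            int new_capacity = lines_capacity + 16;
            char **tmp = realloc(lines, new_capacity * sizeof(char*));
            if (!tmp) break;
            lines = tmp;
            lines_capacity = new_capacity;
        }
        lines[lines_size] = malloc(len + 1);
        if (!lines[lines_size]) break;
        strcpy(lines[lines_size], buf);
        lines_size++;
    }
    fclose(f);
    int i;
    for (i = 0; i < lines_size; i++)
        printf("%s", lines[i]);
    printf("\n");
    for (i = 0; i < lines_size; i++)
        free(lines[i]);
    free(lines);
    return 0;
}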
This works as long as no line in the file is longer than 1022 characters (not counting the newline); a longer line is split across several entries. Separately, you can parse each line using strtok:
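#include <stdio.h>
#include <string.h>

void parseline(char *line)
{
    /* strtok writes '\0' into the string it scans, so work on a copy;
       line must be shorter than 1024 bytes, as it is when it comes from buf above */
    char copy[1024];
    strcpy(copy, line);
    char *token = strtok(copy, " \n");
    while (token)
    {
        printf("[%s], ", token);
        token = strtok(NULL, " \n");
    }
    printf("\n");
}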
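The answer suggests testing each column to see whether it is an integer. One way to do that is a sketch built on the standard strtol, where parse_int is a hypothetical name: a token counts as an integer only if strtol consumes all of it and the value fits in an int. atoi is avoided because it returns 0 both for "0" and for text that is not a number.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Return 1 and store the value if s is entirely a base-10 int, else 0. */
int parse_int(const char *s, int *out)
{
    char *end;
    long v;
    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0')   /* no digits, or trailing text such as "54abc" */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)   /* out of int range */
        return 0;
    *out = (int)v;
    return 1;
}
```

A column could then be classified as an int column if every token in it passes parse_int.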

You need a linked list within a linked list.
Barmak's answer shows how to read the data, so here is how to build the linked list.
struct sString{
    char* str;
    void* next_hor;
    void* next_ver;
};
struct sInt{
    int Int;
    void* next_hor;
    void* next_ver;
};
for the first column:
    check the type of the column
    for each row:
        generate the corresponding struct and link it to the previous element (add function)
for each other column:
    check the type of the column
    for each row:
        generate the corresponding struct and link it to the previous element (add function)
        also iterate through the linked list and insert the horizontal link
It's very cluttered, has tons of overhead, and is hard to manage, but it should work.
The vertical pointer could also be of the specific struct type, since the type doesn't change within a column.
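As an alternative to two struct types linked through void pointers, a single node type with a type tag and a union lets both links be typed. This swaps in a tagged union rather than following the answer's layout exactly; the names Cell, ValType and the cell_* functions are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { VAL_STRING, VAL_INT } ValType;

/* One node in a grid of linked lists. */
typedef struct Cell {
    ValType type;
    union {
        char *str;   /* valid when type == VAL_STRING; owned by the cell */
        int num;     /* valid when type == VAL_INT */
    } v;
    struct Cell *next_hor;   /* next column in the same row */
    struct Cell *next_ver;   /* same column in the next row */
} Cell;

/* Allocate a cell with the given tag and both links set to NULL. */
static Cell *cell_alloc(ValType type)
{
    Cell *c = malloc(sizeof *c);
    if (c) {
        c->type = type;
        c->next_hor = NULL;
        c->next_ver = NULL;
    }
    return c;
}

/* Create a string cell holding its own copy of s. */
Cell *cell_new_string(const char *s)
{
    Cell *c = cell_alloc(VAL_STRING);
    if (!c) return NULL;
    c->v.str = malloc(strlen(s) + 1);
    if (!c->v.str) { free(c); return NULL; }
    strcpy(c->v.str, s);
    return c;
}

/* Create an int cell. */
Cell *cell_new_int(int n)
{
    Cell *c = cell_alloc(VAL_INT);
    if (c) c->v.num = n;
    return c;
}

/* Free one cell and, for string cells, the string it owns. */
void cell_free(Cell *c)
{
    if (c && c->type == VAL_STRING) free(c->v.str);
    free(c);
}
```

The code that walks the grid then switches on type instead of casting a void pointer to a guessed struct type.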

Related

Working with Linked Lists and .csv files in C, and one variable seems to be affecting the other for no discernible reason

So I'm working on an assignment that requires me to import a .csv file into a C program. The data (locations and details of picnic tables in an area) is imported into structs, and some of the elements need to be replaced with an index value that represents the material involved.
I've managed to get each line of data and parse it into a list of variables that I'll be populating the tables and database with. The first table that I'm creating contains the 'type' of picnic table that it is, and the code value that will be associated with it.
It checks whether the passed value is present in the list and creates a new entry for it if it is not. Part of this functionality has been implemented so far, but I've found a really odd error during my testing. The table IDs range from three to five digits, and I've found that when the ID is three digits long, two of the characters of the tableType are sliced off. When it's four digits, one of them is sliced off. When it's five digits, the code works fine. Obviously this is a big problem, because it's comparing these values to ones already present in the list. I've scoured every line of my code and can't figure out what exactly is causing this, especially since the two variables aren't supposed to interact with each other.
The .h file:
/*
* You may add members to the struct below,
* only keep it typedef'ed to DataBase
*/
/*
Define the structs typedef'ed to:
Table, NeighbourhoodTable, PicnicTable
as per the assignment specs
*/
typedef struct tableNode {
int code;
char *name;
struct tableNode *next;
} Table;
typedef struct neighbourhoodNode {
int code;
char *neighbourhood;
struct neighbourhoodNode *next;
} NeighbourhoodTable;
typedef struct node {
int tableId;
int tableTypeId;
int surfaceMaterialId;
int structuralMaterialId;
char *streetAvenue;
int neighbhdId;
char *ward;
double latitude;
double longitude;
char *location;
struct node *next;
} PicnicTable;
typedef struct {
Table *tableTypeTable;
Table *surfaceMaterialTable;
Table *structuralMaterialTable;
NeighbourhoodTable *neighbourhoodTable;
PicnicTable *picnicTableTable;
} DataBase;
/*
* Take the name of a .csv file as parameter and create and populate the Database
* with the corresponding data set information. Return a Database pointer.
*/
DataBase* importDB(char *fileName);
/*
* Take the Database pointer and the name of a .csv file as parameter and create a .csv file containing the
* information of the Database.
* NOTE: the exported .csv file must be exactly the same as the original .csv file
* from which the Database was created if none of editTableEntry or editTable were called.
*/
void exportDB(DataBase*,char *fileName );
/*
* Take the Database pointer, the name of a member of the picnicTable entry and a value for that member
* as parameters and return the number of entries in the picnicTable which have
* that value in the memberName.
Pre: memberName is one of
Structural Material
Surface Material
Street/Avenue
*/
int countEntries(DataBase*,char *memberName, char * value );
/*
* Take the Database pointer and the name of a member of the picnicTable entry as an argument
* and sort the table in ascending order of the entry values of that member.
Pre: memberName is one of
Neighbourhood Id
Ward
*/
void sortByMember(DataBase*,char *memberName);
/*
* Take the Database pointer and, a tableID, the name of a member of the picnicTable entry and a value for that
* member as parameters, and find the entry which has that tableID and
* change its memberName value to newValue.
Pre: memberName is one of
Structural Material
Surface Material
Table Type
*/
void editTableEntry(DataBase*, int tableID, char *memberName, char * value);
/*
Add a PicnicTable to picnicTableTable as a new last entry
Param:
site is siteId
t_type is table type code
surf_type is surface material code
stru_type is structural material code
st_av is street/avenue
neighId is neighbourhood id
w is ward
lat is latitude
longt is longitude
loc is location
Pre: if t_type, surf_type, stru_type, neighId do not exist in their tables, function returns silently without modifying picnicTableTable
Side Effects: picnicTableTable grown by one entry
*/
void addTable(DataBase *Db,int site, int t_type, int surf_type, int stru_type, char * st_av, int neighId, char * w, char * lat, char * longt, char * loc);
/*
* print a listing of picnic tables grouped by wards in ascending order.
*/
void reportByWard(DataBase*);
/*
* print a listing of picnic tables grouped by neighbourhoods in ascending
* alphabetical order.
*/
void reportByNeighbourhood(DataBase*);
/*
free all dynamically allocated memory
*/
void destroyDB (DataBase*);
The .c file:
#include "DB.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int checkTable(Table *table, char *value){
Table *current = table;
/*Check each existing value to see if it matches.*/
while (current != NULL){
printf("Current Address: %p\n", current);
printf("Current Name: %s Value: %s\n", current->name, value);
if(current->name == value){
printf("Match found.\n");
return current->code;
}
else{
printf("Moving on.\n");
current = current->next;
}
}
/*If the code gets to this point, it encountered the end of the
list without finding a match. Therefore, create a new entry. */
/* current->next = malloc(sizeof(Table));
lastIndex = current->code;
current = current->next;
current->name = value;
current->code = lastIndex + 1;
current->next = NULL;
return current->code;
*/}
DataBase* importDB(char *fileName){
char *newTableType, *newSurfaceMaterial, *newStructuralMaterial,
*newStreetAvenue, *newNeighbhd, *newWard, *newLocation;
int newTableID, tableTypeID, surfaceMaterialID, structuralMaterialID, newNeighbhdID;
double newLatitude, newLongitude;
FILE *fp = fopen(fileName, "r");
//If file not found:
if(fp == NULL){
printf("File '%s' Not Found", fileName);
}
//Create the tables and Database
else{
Table *tableType = NULL;
/*
Table *surfaceMaterial = NULL;
Table *structuralMaterial = NULL;
NeighbourhoodTable *neighbourhood = NULL;
PicnicTable *picnicTable = NULL;
DataBase *DB;
DB = malloc(sizeof(DataBase));
DB->tableTypeTable = tableType;
DB->surfaceMaterialTable = surfaceMaterial;
DB->structuralMaterialTable = structuralMaterial;
DB->neighbourhoodTable = neighbourhood;
DB->picnicTableTable = picnicTable;
*/
char str[500];
char *token;
int count = 0;
/*The first line of the csv files are the headers, so read them
but don't do anything to them. */
fgets(str, 500, fp);
/*Run fgets until it encounters NULL*/
while (fgets(str, 500, fp) != NULL){
/*Parse the line of data into tokens, and assign each token to
the table variables.*/
token = strtok(str, ",");
newTableID = atoi(token);
token = strtok(NULL, ",");
newTableType = token;
token = strtok(NULL, ",");
newSurfaceMaterial = token;
token = strtok(NULL, ",");
newStructuralMaterial = token;
token = strtok(NULL, ",");
newStreetAvenue = token;
token = strtok(NULL, ",");
newNeighbhdID = atoi(token);
token = strtok(NULL, ",");
newNeighbhd = token;
token = strtok(NULL, ",");
newWard = token;
token = strtok(NULL, ",");
newLatitude = atof(token);
token = strtok(NULL, ",");
newLongitude = atof(token);
token = strtok(NULL, ",");
newLocation = token;
/*Check for the table type in the tableType Table.*/
/*First check if the table is empty and equals NULL, if so
then create the first entry. Otherwise, check the existing values*/
if (tableType == NULL){
printf("Created Table.\n");
tableType = malloc(sizeof(Table)+8);
tableType->code = 0;
tableType->name = newTableType;
tableType->next = NULL;
printf("Address: %p\n", tableType);
}
else{
printf("Called the function on Table ID %d with type %s\n", newTableID, newTableType);
printf("Returned %d\n\n",checkTable(tableType,newTableType));
}
}
free(tableType);
}
fclose(fp);
}
void destroyDB(DataBase* DB){
free(DB);
}
int main(){
importDB("example.csv");
return 0;
}
The makefile:
CC = gcc
CFLAGS = -Wall -ansi -pedantic -std=c99
DB: DB.c
$(CC) $(CFLAGS) -o DB DB.c
I'm not particularly sure if there's a way to attach a .csv file here, so I'll simply paste the contents:
The .csv file
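A likely cause, consistent with the symptoms described: tableType->name = newTableType stores a pointer into str, and the next fgets call overwrites str. Because the table ID comes first on each line, a shorter ID shifts the type string left within the buffer; if the first data line has a five-digit ID, the saved pointer lands two characters into the new type for a three-digit ID and one character in for a four-digit ID. Separately, current->name == value compares addresses, not string contents. A sketch of a lookup-or-insert that copies the name and compares with strcmp, reusing the Table struct from the header (findOrAddTableType is a hypothetical name):

```c
#include <stdlib.h>
#include <string.h>

typedef struct tableNode {
    int code;
    char *name;
    struct tableNode *next;
} Table;

/* Return the code for value, appending a new entry if it is not present.
   *head may be NULL for an empty table. Returns -1 on allocation failure. */
int findOrAddTableType(Table **head, const char *value)
{
    Table **link = head;
    int nextCode = 0;
    while (*link) {
        if (strcmp((*link)->name, value) == 0)   /* compare contents, not addresses */
            return (*link)->code;
        nextCode = (*link)->code + 1;
        link = &(*link)->next;
    }
    Table *node = malloc(sizeof *node);
    if (!node) return -1;
    node->name = malloc(strlen(value) + 1);      /* own copy: str is reused by fgets */
    if (!node->name) { free(node); return -1; }
    strcpy(node->name, value);
    node->code = nextCode;
    node->next = NULL;
    *link = node;                                /* works for the empty list too */
    return node->code;
}
```

With names copied like this, destroyDB would also need to free each name along with each node.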

Array of structure overwritten while reading from a file in C

I am trying to build an array of structures, each holding a string and the starting address of a linked list of another structure. Several strings from each line of a file are to be filled into this data structure. But whenever I move on to the next line, all the variables of the array and linked list that I have filled so far are changed to the values from the current line. As a result, every element of the array, along with its linked list, shows the same values. Here is the code.
struct node
{
char* nodedata;
struct node* link;
};
struct motief
{
char* motiefname;
struct node* link;
};
void add_unique_nodes(struct motief*,int,char *);
int unique_motief(struct motief*,char*);
void add_unique_nodes(struct motief* motiefs,int motief_no,char * token)
{
struct node *temp,*r;
r = malloc(sizeof(struct node));
r->nodedata = token;
//for the first node
if(motiefs[motief_no].link == NULL)
{
motiefs[motief_no].link = r;
}
else
{
temp = motiefs[motief_no].link;
while(temp->link != NULL && token != temp->nodedata)
temp = temp->link;
if(token != temp->nodedata)
temp->link = r;
}
r->link = NULL;
}
void main()
{
struct motief motiefs[100];
FILE *fp, *ft;
fp = fopen("dump.txt","r");
ft = fopen("motief_nodes","w");
char line[100] ={0};
char seps[] = ",";
char* token;
int motief_no = 0;
int i,j;//loop variable
//read the database
while(!feof(fp))
{
if( fgets(line, sizeof(line), fp ))
{
if(motief_no == 1)
printf("for start of 2nd step %s\t",motiefs[0].motiefname);//????
printf("for line %d\t",motief_no);
//get the motief from each input line as a token
token = strtok (line, seps);
//store it in motief array
motiefs[motief_no].motiefname = token;
printf("%s\n",motiefs[motief_no].motiefname);
if(motief_no == 1)
printf("for zero %s\n",motiefs[0].motiefname);
motiefs[motief_no].link = NULL;
//get and store all the nodes
while (token != NULL)
{
//get the node
token = strtok (NULL, seps);
if(token != NULL)
add_unique_nodes(motiefs,motief_no,token);
}
if(motief_no == 0)
printf("for end of 1st step %s\n",motiefs[0].motiefname);
motief_no++;
if(motief_no == 2)//break at 2nd loop, at 1
break;
}
I am new to C programming, and I cannot find why this is happening. Please help me find where I am going wrong and why the file is read into the array, besides the variable I specified for that purpose in my code. Thanks in advance. Here are a few lines from the file to be read.
000100110,1,95,89
000100111,1,95,87
000100110,1,95,74
000100110,1,95,51
I am displaying the structure with the following code
struct node* temp;
for(j=0;j<2;j++)
{
printf("turn %d\t",j);
printf("%s\t",motiefs[j].motiefname);
temp = motiefs[j].link;
printf("%s\t",temp->nodedata);
do
{
temp = temp->link;
printf("%s\t",temp->nodedata);
}while(temp->link != NULL);
printf("\n");
}
And it shows the following overall result:
for line 0 000100110
for end of 1st step 000100110
for start of 2nd step 000100111,1,95,87
for line 1 000100111
for zero 000100111
turn 0 000100111 1 95 87
turn 1 000100111 1 95 87
You keep changing the same memory space when you read into 'line'. You need to allocate new memory each time you want a different string.
Picture 'line' as pointing to a specific chunk of 100 bytes in a row. You keep telling fgets to write to that location, and you also keep copying the address of that location into your structs when you assign 'token' to motiefname.
So when you change what's at that address, of course it overwrites what the struct points to as well!
You need to either allocate the space inside each struct instead of storing a pointer, or dynamically allocate space on each iteration using malloc and then free() all the pointers at the end.
https://www.codingunit.com/c-tutorial-the-functions-malloc-and-free
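Applied to the question's code, a sketch of the malloc approach: copy each token into its own allocation (copy_string is a hypothetical helper doing the same job as POSIX strdup), and in main store motiefs[motief_no].motiefname = copy_string(token) in the same way. The duplicate check also needs strcmp, since token != temp->nodedata compares addresses. This version returns int instead of void so callers can detect allocation failure, and it no longer leaks a node when the token is a duplicate.

```c
#include <stdlib.h>
#include <string.h>

struct node { char *nodedata; struct node *link; };
struct motief { char *motiefname; struct node *link; };

/* Return a heap copy of s, or NULL if malloc fails. */
char *copy_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p) strcpy(p, s);
    return p;
}

/* Append a copy of token to the motief's list unless an equal string
   is already there. Returns 0 on success, -1 on allocation failure. */
int add_unique_nodes(struct motief *motiefs, int motief_no, const char *token)
{
    struct node **link = &motiefs[motief_no].link;
    struct node *r;
    while (*link) {
        if (strcmp((*link)->nodedata, token) == 0)   /* compare text, not pointers */
            return 0;
        link = &(*link)->link;
    }
    r = malloc(sizeof *r);
    if (!r) return -1;
    r->nodedata = copy_string(token);   /* survives the next fgets into line */
    if (!r->nodedata) { free(r); return -1; }
    r->link = NULL;
    *link = r;
    return 0;
}
```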

Read space delimited file with variable number of columns in C

So I have files formatted as follows:
2
4 8 4 10 6
9 6 74
The first line is actually the number of rows that the file will have after it. I want to read the files line by line (note that the lines have different numbers of tokens, but all have the format: one token followed by an unspecified number of pairs of tokens) and do two things for each line:
1) Know how many tokens are in this line.
2) Assign each token to a variable, using structures similar to:
typedef struct {
unsigned start; //start node of a graph
unsigned end; // end node of a graph
double weight; //weight of the edge going from start to end
} edge ;
typedef struct {
unsigned id; // id of the node
unsigned ne; // number of edges adjacent to node
edge *edges; // array of edge to store adjacent edges of this node
} node;
Some code:
FILE *fin;
unsigned nn;
node *nodes;
fin = fopen ("input.txt", "r");
fscanf(fin,"%u\n", &nn);
nodes = malloc(nn*sizeof(node));
for(i=0; i < nn; i++) { //loop through all the rows
/*grab the row and split in parts, let's say they are part[0], part[1]... */
/*and there are N tokens in the row*/
nodes[i].id=part[0];
nodes[i].ne=(N-1)/2; //number of pairs excluding first element
nodes[i].edges=malloc( ((N-1)/2)*sizeof(edge) );
for(j=0; j< (N-1)/2; j++){
nodes[i].edges[j].start=part[0];
nodes[i].edges[j].end=part[2*j+1];
nodes[i].edges[j].weight=part[2*j+2];
}
}
I need to figure out how to do the commented part inside the first for loop: get the number of tokens, and get each one of them as a single token to assign. Any ideas?
EDIT: to make things clear, each line will have first one integer, and then a variable number of pairs. I want to store data as follows:
if the file reads
2
4 8 4 10 6 //(2 pairs)
9 6 74 //(1 pair)
then
nn=2;
node[0].id=4;
node[0].ne=2; //(2 pairs)
node[0].(*edges) //should be a vector of dimension ne=2 containing elements of type edge
node[0].edges[0].start=4; //same as node[0].id
node[0].edges[0].end=8;
node[0].edges[0].weight=4;
node[0].edges[1].start=4; //same as node[0].id
node[0].edges[1].end=10;
node[0].edges[1].weight=6;
node[1].id=9;
node[1].ne=1; //(1 pair)
node[1].(*edges) //should be a vector of dimension ne=1 containing elements of type edge
node[1].edges[0].start=9; //same as node[1].id
node[1].edges[0].end=6;
node[1].edges[0].weight=74;
This code produces the results you described. It initializes your nested struct member edges and uses strtok. For strtok(), I included \n in the delimiter string along with a space (" \n") to keep the trailing newline from giving us trouble (see other comments on that below).
Note: you have to free memory where I have indicated, but before you do, preserve the intermediate results (in the structs) or they will be lost.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned start;
    unsigned end;
    double weight;
} edge;

typedef struct {
    unsigned id;
    unsigned ne;
    edge *edges;
} node;

int GetNumPairs(char *buf);

int main(void)
{
    FILE *fp;
    char *tok;
    char lineBuf[260];
    int i = 0, j = 0;
    int nn; //number of nodes
    int numPairs;
    char countPairsBuf[260];

    fp = fopen("C:\\dev\\play\\numbers.txt", "r");
    if (!fp)
        return 1;
    //get first line of file for nn:
    if (!fgets(lineBuf, sizeof(lineBuf), fp))
    {
        fclose(fp);
        return 1;
    }
    nn = atoi(lineBuf);
    if (nn <= 0)
    {
        fclose(fp);
        return 1;
    }
    //create array of node with [nn] elements (C99 variable-length array)
    node n[nn], *pN;
    pN = &n[0];
    //read rest of lines (2 through end), but never more than nn of them
    i = -1;
    while (i + 1 < nn && fgets(lineBuf, sizeof(lineBuf), fp))
    {
        i++;
        //get number of items in a line (GetNumPairs works on its own copy)
        strcpy(countPairsBuf, lineBuf);
        numPairs = GetNumPairs(countPairsBuf);
        if (numPairs > 0)
        {
            pN[i].ne = numPairs; //number of edges (pairs)
            //allocate *edges struct element
            pN[i].edges = malloc(pN[i].ne * sizeof(edge));
            //first item in the line is the node id and the "start" of every edge
            tok = strtok(lineBuf, " \n");
            pN[i].id = atoi(tok);
            //now get the rest of the pairs
            for (j = 0; j < (int)pN[i].ne; j++)
            {
                pN[i].edges[j].start = pN[i].id;
                tok = strtok(NULL, " \n");
                pN[i].edges[j].end = atoi(tok);
                tok = strtok(NULL, " \n");
                pN[i].edges[j].weight = atof(tok); //weight is a double
            }
        }
        else //numPairs is 0 (id only) or -1 (malformed line)
        {
            //no edges stored; a malformed line did not contain an odd number of elements
            pN[i].ne = 0;
            pN[i].edges = NULL;
        }
    }
    //you have to free memory here
    //but I will leave that to you
    fclose(fp);
    return 0;
}

//GetNumPairs: count the space-separated tokens (an id plus N pairs is 2N+1 tokens)
int GetNumPairs(char *buf)
{
    int numWords = 0;
    char *tok = strtok(buf, " \n");
    while (tok)
    {
        numWords++;
        tok = strtok(NULL, " \n");
    }
    //if odd number of "words", return number of pairs, else error
    return (numWords > 0 && (numWords - 1) % 2 == 0) ? (numWords - 1) / 2 : -1;
}
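A variation that avoids counting tokens in a separate pass: strtoul and strtod report where each number ends, so a line can be walked once and the edge array grown with realloc. It also copes with a last line that has no trailing newline. A sketch that reuses the edge and node types from the question; parse_node_line is a hypothetical name.

```c
#include <stdlib.h>

typedef struct { unsigned start; unsigned end; double weight; } edge;
typedef struct { unsigned id; unsigned ne; edge *edges; } node;

/* Parse one line of the form "id end weight end weight ..." into *nd.
   Returns 0 on success, -1 if the line is malformed or malloc fails. */
int parse_node_line(const char *line, node *nd)
{
    char *p;
    unsigned long id = strtoul(line, &p, 10);
    size_t cap = 4, count = 0;
    if (p == line) return -1;                       /* no id on the line */
    nd->id = (unsigned)id;
    nd->edges = malloc(cap * sizeof *nd->edges);
    if (!nd->edges) return -1;
    for (;;) {
        char *q;
        unsigned long end = strtoul(p, &q, 10);
        if (q == p) break;                          /* no more numbers */
        p = q;
        double w = strtod(p, &q);
        if (q == p) { free(nd->edges); return -1; } /* "end" without a weight */
        p = q;
        if (count == cap) {                         /* grow the edge array */
            edge *tmp = realloc(nd->edges, 2 * cap * sizeof *tmp);
            if (!tmp) { free(nd->edges); return -1; }
            nd->edges = tmp;
            cap *= 2;
        }
        nd->edges[count].start = nd->id;
        nd->edges[count].end = (unsigned)end;
        nd->edges[count].weight = w;
        count++;
    }
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n') p++;
    if (*p != '\0') { free(nd->edges); return -1; } /* trailing junk */
    nd->ne = (unsigned)count;
    return 0;
}
```

Inside the question's loop this would be called as parse_node_line(lineBuf, &nodes[i]) after each fgets, with nodes[i].edges freed later.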

Hashed array linked list collision resolution error

This block of code reads a dictionary file and stores it in a hashed array. The hashed array uses linked-list collision resolution. But for some incomprehensible reason, the reading stops in the middle (I'm assuming some problem occurs when the linked list is created). Everything works fine when data is stored in an empty hashed array element.
#define SIZE_OF_ARRAY 350
typedef struct {
char* key;
int status; // (+1) filled, (-1) deleted, 0 empty
LIST* list;
}HASHED_ARRAY;
void insertDictionary (HASHED_ARRAY hashed_array[])
{
//Local Declaration
FILE* data;
char word[30];
char* pWord;
int index;
int length;
int countWord = 0;
//Statement
if (!(data = fopen("dictionaryWords.txt", "r")))
{
printf("Error Opening File");
exit(1);
}
SetStatusToNew (hashed_array); //initialize all status to 'empty'
while(fscanf(data, "%s\n", word) != EOF)
{
length = strlen(word) + 1;
index = hashing_function(word);
if (hashed_array[index].status == 0)//empty
{
hashed_array[index].key = (char*) malloc(length * sizeof(char));//allocate word.
if(!hashed_array[index].key)//check error
{
printf("\nMemory Leak\n");
exit(1);
}
strcpy(hashed_array[index].key, word); //insert the data into hashed array.
hashed_array[index].status = 1;//change hashed array node to filled.
}
else
{
//collision resolution (linked list)
pWord = (char*) malloc(length * sizeof(char));
strcpy (pWord, word);
if (hashed_array[index].list == NULL) // <====== program doesn't enter
//this if statement although the list is NULL.
//So I'm assuming this is where the program stops reading.
{
hashed_array[index].list = createList(compare);
}
addNode(hashed_array[index].list, pWord);
}
countWord++;
//memory allocation for key
}
printStatLinkedList(hashed_array, countWord);
fclose(data);
return;
}
createList and addNode are both ADT functions. The former takes a function pointer as a parameter (compare is a function that I built inside the main function), and the latter takes the list name and void-typed data as parameters. compare sorts the linked list. Please help me spot the problem.
Depending on where you declare the hashed_array you pass to this function, its contents may not be initialized. This means that the contents of all entries are indeterminate (effectively random), and that includes the list pointer.
You need to initialize this array properly first. The easiest way is to simply use memset (from <string.h>):
memset(hashed_array, 0, sizeof(HASHED_ARRAY) * whatever_size_it_is);
This will set all members to zero, i.e. NULL for pointers.
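memset writes all-bits-zero, which is a null pointer on mainstream platforms, although the C standard does not guarantee that. A portable alternative is to assign each member explicitly; an array declared as HASHED_ARRAY table[SIZE_OF_ARRAY] = {0}; (or at file scope) is also zero-initialized with proper null pointers. A sketch of an explicit initializer that could replace or extend the question's SetStatusToNew; initHashedArray is a hypothetical name, and LIST is declared as an incomplete type here because the question's ADT is not shown.

```c
#include <stddef.h>

#define SIZE_OF_ARRAY 350

typedef struct list LIST;   /* stand-in for the question's LIST ADT */

typedef struct {
    char *key;
    int status; /* (+1) filled, (-1) deleted, 0 empty */
    LIST *list;
} HASHED_ARRAY;

/* Put every slot into a known empty state before the first insert. */
void initHashedArray(HASHED_ARRAY hashed_array[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        hashed_array[i].key = NULL;
        hashed_array[i].status = 0;
        hashed_array[i].list = NULL;   /* makes the list == NULL check meaningful */
    }
}
```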

Hash table sorting and execution time

I wrote a program that counts word frequencies using a hash table, but I don't know how to sort the results.
I use a struct to store each word and its count.
My hash code function uses modulo, and my hash table uses linked lists for collisions.
1. How do I sort them by frequency?
2. Why is my printed execution time always zero? I have checked it many times. What am I doing wrong?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <ctype.h>
#define HASHSIZE 29989
#define FACTOR 31
#define VOCABULARYSIZE 30
typedef struct HashNode HashNode;
struct HashNode{
char* voc;//vocabulary
int freq;//frequency
struct HashNode *next;//pointed to the same hashcode
//but actually are different numbers
};
HashNode *HashTable[HASHSIZE] = {NULL,0,NULL};//an array of pointers
unsigned int HashCode(const char *pVoc){//generate hashcode
unsigned int index = 0;
int n = strlen(pVoc);
int i = 0;
for(; i < n; i ++)
index = FACTOR*index + pVoc[i];
return index % HASHSIZE;
}
void InsertVocabulary(const char *pVoc){//insert vocabulary to hash table
HashNode *ptr;
unsigned int index = HashCode(pVoc);
for(ptr = HashTable[index]; ptr != NULL; ptr = ptr -> next){//search if already exist
if(!strcmp (pVoc, ptr -> voc)){
(ptr->freq)++;
return;
}
}
ptr = (HashNode*)malloc(sizeof(HashNode));//if doesn't exist, create it
ptr -> freq = 1;
ptr -> voc = (char*)malloc(strlen(pVoc)+1);
strcpy(ptr -> voc, pVoc);
ptr -> next = HashTable[index];
HashTable[index] = ptr;
}
void ReadVocabularyTOHashTable(const char *path){
FILE *pFile;
char buffer[VOCABULARYSIZE];
pFile = fopen(path, "r");//open file for read
if(pFile == NULL)
perror("Fail to Read!\n");//error message
int ch; /* int, not char, so it can hold EOF as well as every byte value */
int i =0;
do{
ch = fgetc(pFile);
if(isalpha(ch))
buffer[i++] = tolower(ch);//all convert to lowercase
else{
buffer[i] = '\0';//c-style string
i = 0;
if(!isalpha(buffer[0]))
continue;//blank line
else //printf("%s\n",buffer);
InsertVocabulary(buffer);
}
}while(ch != EOF);
fclose(pFile);
}
void WriteVocabularyTOHashTable(const char *path){
FILE *pFile;
pFile = fopen(path, "w");
if(pFile == NULL)
perror("Fail to Write\n");
int i = 0;
for(; i < HASHSIZE; i++){
HashNode *ptr = HashTable[i];
for(; ptr != NULL; ptr = ptr -> next){
fprintf(pFile, "Vocabulary:%s,Count:%d\n", ptr -> voc, ptr -> freq);
if(ptr -> next == NULL)
fprintf(pFile,"\n");
}
}
fclose(pFile);
}
int main(void){
time_t start, end;
time(&start);
ReadVocabularyTOHashTable("test.txt");
WriteVocabularyTOHashTable("result.txt");
time(&end);
double diff = difftime(end,start);
printf("%.2f seconds.\n", diff);
system("pause");
return 0;
}
This is an answer to your first question, sorting by frequency. Every hash node in your table is a distinct vocabulary entry. Some hash to the same code (hence your collision chains), but in the end you have one HashNode for every unique entry. To sort them by frequency with minimal disruption to your existing code, you can use qsort() with a pointer list (or any other sort of your choice) with relative ease.
Note: the most efficient way to do this would be to maintain a sorted linked list during vocabulary insertion, and you may want to consider that. This code assumes you already have a populated hash table and need to get the frequencies out in order from highest to lowest.
First, keep a running tally of all unique insertions. That's simple enough: declare a counter at file scope (for example, size_t gVocabCount = 0;) and increment it in your allocation section:
gVocabCount++; // increment with each unique entry.
ptr = (HashNode*)malloc(sizeof(HashNode));//if doesn't exist, create it
ptr -> freq = 1;
ptr -> voc = (char*)malloc(strlen(pVoc)+1);
strcpy(ptr -> voc, pVoc);
ptr -> next = HashTable[index];
HashTable[index] = ptr;
Next, allocate a list of pointers to HashNodes as large as your total unique vocabulary count. Then walk your entire hash table, including collision chains, and put each node into a slot in this list. The number of nodes you store had better equal that count, or you did something wrong:
HashNode **nodeList = malloc(gVocabCount * sizeof(HashNode*));
int i;
int idx = 0;
for (i=0;i<HASHSIZE;++i)
{
HashNode* p = HashTable[i];
while (p)
{
nodeList[idx++] = p;
p = p->next;
}
}
So now we have a list of all unique node pointers. We need a comparison function to send to qsort(). We want the items with the largest numbers to be at the head of the list.
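int compare_nodeptr(const void* left, const void* right)
{
    /* qsort passes pointers to the array elements, which are HashNode pointers */
    return (*(HashNode* const*)right)->freq - (*(HashNode* const*)left)->freq;
}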
And finally, fire qsort() to sort your pointer list.
qsort(nodeList, gVocabCount, sizeof(HashNode*), compare_nodeptr);
The nodeList array of HashNode pointers will have all of your nodes sorted in descending frequency:
for (i=0; i<gVocabCount; ++i)
printf("Vocabulary:%s,Count:%d\n", nodeList[i]->voc, nodeList[i]->freq);
Finally, don't forget to free the list:
free(nodeList);
As I said at the beginning, the most efficient way to do this would be to keep a sorted linked list: whenever a word's count is incremented, an insertion step slips its node back into the right place (new entries, with a count of 1, can by definition go at the end). In the end that list will look virtually identical to what the above code creates, apart from the order of equal counts: if a->freq == 5 and b->freq == 5, either a-b or b-a can occur.
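If the order of equal counts matters, the comparator can fall back to comparing the words, which makes the output deterministic from run to run. A sketch of such a comparator, repeating the HashNode type so it compiles on its own; compare_freq_then_word is a hypothetical name. It also compares instead of subtracting, which is harmless for word counts but avoids overflow for arbitrary int values.

```c
#include <stdlib.h>
#include <string.h>

typedef struct HashNode {
    char *voc;
    int freq;
    struct HashNode *next;
} HashNode;

/* qsort comparator for an array of HashNode pointers:
   higher freq first; equal counts in alphabetical order of voc. */
int compare_freq_then_word(const void *left, const void *right)
{
    const HashNode *a = *(HashNode *const *)left;
    const HashNode *b = *(HashNode *const *)right;
    if (a->freq != b->freq)
        return (b->freq > a->freq) - (b->freq < a->freq);
    return strcmp(a->voc, b->voc);
}
```

It drops in where compare_nodeptr is passed to qsort(), with the same nodeList and count arguments.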
Hope this helps.
EDIT: Updated to show the OP what a Write function that outputs sorted data might look like:
static int compare_nodeptr(const void* left, const void* right)
{
    return (*(HashNode* const*)right)->freq - (*(HashNode* const*)left)->freq;
}
void WriteVocabularyTOHashTable(const char *path)
{
HashNode **nodeList = NULL;
size_t i=0;
size_t idx = 0;
FILE* pFile = fopen(path, "w");
if(pFile == NULL)
{
perror("Fail to Write\n");
return;
}
nodeList = malloc(gVocabCount * sizeof(HashNode*));
for (i=0,idx=0;i<HASHSIZE;++i)
{
HashNode* p = HashTable[i];
while (p)
{
nodeList[idx++] = p;
p = p->next;
}
}
// send to qsort()
qsort(nodeList, idx, sizeof(HashNode*), compare_nodeptr);
for(i=0; i < idx; i++)
fprintf(pFile, "Vocabulary:%s,Count:%d\n", nodeList[i]->voc, nodeList[i]->freq);
fflush(pFile);
fclose(pFile);
free(nodeList);
}
Something like that, anyway. From the OP's test file, these are the top few lines of output:
Vocabulary:the, Count:912
Vocabulary:of, Count:414
Vocabulary:to, Count:396
Vocabulary:a, Count:388
Vocabulary:that, Count:260
Vocabulary:in, Count:258
Vocabulary:and, Count:221
Vocabulary:is, Count:220
Vocabulary:it, Count:215
Vocabulary:unix, Count:176
Vocabulary:for, Count:142
Vocabulary:as, Count:121
Vocabulary:on, Count:111
Vocabulary:you, Count:107
Vocabulary:user, Count:102
Vocabulary:s, Count:102
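On the second question (the execution time always printing zero): time() typically counts whole seconds, so a run that finishes in under a second usually reports 0. clock() measures processor time in units of CLOCKS_PER_SEC and normally has much finer resolution. A sketch, where elapsed_seconds is a hypothetical helper; note that clock() reports CPU time rather than wall-clock time, and returns (clock_t)-1 if that time is unavailable.

```c
#include <time.h>

/* Convert two clock() readings into seconds of processor time. */
double elapsed_seconds(clock_t start, clock_t end)
{
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Possible use in the question's main():
       clock_t start = clock();
       ReadVocabularyTOHashTable("test.txt");
       WriteVocabularyTOHashTable("result.txt");
       printf("%.3f seconds.\n", elapsed_seconds(start, clock()));
*/
```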

Resources