Trouble finding frequency of words from a file in C

I need to write code that prints the frequency of each word in a given file. Words like "the" and "The" count as two different words. I've written some code so far, but the command prompt stops working when I try to run the program. I just need some guidance and to be pointed in the best direction for this code, or to be told that this code needs to be abandoned. I'm not very good at this, so any help would be very appreciated.
#include <stdio.h>
#include <string.h>
#define FILE_NAME "input.txt"
struct word {
char wordy[2000];
int frequency;
} words;
int word_freq(const char *text, struct word words[]);
int main (void)
{
char *text;
FILE *fp = fopen(FILE_NAME, "r");
fread(text, sizeof(text[0]), sizeof(text) / sizeof(text[0]), fp);
struct word words[2000];
int nword;
int i;
nword = word_freq(text, words);
puts("\nWord frequency:");
for(i = 0; i < nword; i++)
printf(" %s: %d\n", words[i].wordy, words[i].frequency);
return 0;
}
int word_freq(const char *text, struct word words[])
{
char punctuation[] =" .,;:!?'\"";
char *tempstr;
char *pword;
int nword;
int i;
nword = 0;
strcpy(tempstr, text);
while (pword != NULL) {
for(i = 0; i < nword; i++) {
if (strcmp(pword, words[i].wordy) == 0)
break;
}
if (i < nword)
words[i].frequency++;
else {
strcpy(words[nword].wordy, pword);
words[nword].frequency= 1;
nword++;
}
pword = strtok(NULL, punctuation);
}
return nword;
}

First of all:
char *text;
FILE *fp = fopen(FILE_NAME, "r");
fread(text, sizeof(text[0]), sizeof(text) / sizeof(text[0]), fp);
This probably reads only 4 bytes of your file, because sizeof(text[0]) is 1 and sizeof(text) is the size of a pointer (typically 4 or 8 bytes), not the size of the file. You need to use fseek() and ftell(), or some other means, to get the actual size of your data file in order to read it all into memory.
Next, you are storing this data through a pointer that has no memory allocated to it. text needs to be malloc'd or otherwise made to point at real memory. This alone is probably what is crashing your program.
There are so, so many further issues that it will take time to explain them all:
How you use strcpy to blow up memory: tempstr is an uninitialized pointer, so strcpy(tempstr, text) writes to an arbitrary address.
How, even if that weren't the case, strcpy copies up to the first '\0'. fread does not add one, so unless the data happens to contain a '\0', the copy runs past the end of what was read.
How you test pword in while (pword != NULL) and pass it to strcmp and strcpy even though it is uninitialized: the first call, strtok(tempstr, punctuation), is missing before the loop, so strtok(NULL, punctuation) at the bottom of the loop has nothing to continue from.
Please, get some help or ask your teacher about this because this code is seriously broken.
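For comparison, here is a minimal sketch of the same program with the problems above fixed, under a few assumptions: the file is seekable (so fseek/ftell give its size), words are split on the original punctuation set plus whitespace (the original set has no '\n', so words at line ends would be glued together), and the helper read_all and the new word_freq signature are illustrative, not from the question. It also moves the word table to the heap: struct word words[2000] with a 2000-byte array per entry is roughly 4 MB of stack, which on its own can overflow a default stack (1 MB on Windows) and stop the program before it does anything.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One distinct word and how often it occurs (case-sensitive, as asked). */
struct word {
    char *text;      /* points into the buffer returned by read_all() */
    int frequency;
};

/* Read the whole stream into a new malloc'd, '\0'-terminated buffer.
   Uses fseek/ftell to find the size, so fp must be a seekable file. */
static char *read_all(FILE *fp)
{
    if (fseek(fp, 0, SEEK_END) != 0) return NULL;
    long size = ftell(fp);
    if (size < 0 || fseek(fp, 0, SEEK_SET) != 0) return NULL;
    char *buf = malloc((size_t)size + 1);          /* +1 for the '\0' */
    if (buf == NULL) return NULL;
    size_t got = fread(buf, 1, (size_t)size, fp);
    buf[got] = '\0';                               /* fread never adds one */
    return buf;
}

/* Split text in place with strtok and count each distinct word.
   Returns the number of distinct words, or -1 if memory runs out.
   *out receives a malloc'd array (NULL if there are no words). */
static int word_freq(char *text, struct word **out)
{
    const char *delims = " .,;:!?'\"\t\r\n";       /* whitespace added */
    struct word *words = NULL;
    int nword = 0, cap = 0;

    /* The first strtok call gets the text, later calls get NULL. */
    for (char *p = strtok(text, delims); p != NULL; p = strtok(NULL, delims)) {
        int i;
        for (i = 0; i < nword; i++)
            if (strcmp(p, words[i].text) == 0)
                break;
        if (i < nword) {                           /* seen before */
            words[i].frequency++;
            continue;
        }
        if (nword == cap) {                        /* grow on the heap */
            int newcap = cap ? cap * 2 : 16;
            struct word *tmp = realloc(words, newcap * sizeof *words);
            if (tmp == NULL) { free(words); return -1; }
            words = tmp;
            cap = newcap;
        }
        words[nword].text = p;
        words[nword].frequency = 1;
        nword++;
    }
    *out = words;
    return nword;
}
```

In main() you would open the file, call read_all(), pass the buffer to word_freq(), print each words[i].text and words[i].frequency, then free both. The stored words point into the text buffer, so free it only after you are done with the results.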

Related

Is there an easy way to view and modify a .txt file?

I'm having a bit of trouble in C. In particular, I'm not able to save and modify a .txt file in an easy and efficient way.
The situation is this: from a file.txt, I have to save all the words in a struct, and after that I will have to do some operations on it, like modifying a specific word, a bubble sort, etc.
I'm having a problem with how to correctly save all the words in the struct, in the most generic possible way, even if a word is missing from a line of the file.
I mean:
line 1: word1 word2
line 2: word3
line 3: word4 word5
So even if a word is missing, I need to be able to save all these words, leaving something like an empty slot in the struct.
The code I'm posting is, at the moment, the best I can do on my own, because I don't have any more ideas about what I should do.
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#define MAX (10) //<- 10 because no single word in the file is longer than 10 letters
struct word{
char word1[MAX+1]; //<- here I'm defining a struct with 2 char arrays; it will hold the words read from the file.
char word2[MAX+1]; //<- MAX+1 to leave room for the terminating '\0'.
};
struct word *file_read(FILE *fp, int *count){
int dim = 2; //<- dim is the current capacity (number of elements) of the array
char buf[1024]; //<- a simple buffer
struct word *w;
int conv = 0; //<- another counter that i will use in sscanf
int i;
if(!(w = calloc(dim , sizeof(*w)))){
free(w);
}
while(fgets(buf, sizeof(buf),fp)!= NULL){
conv = sscanf(buf, "%s %s", w->word1, w->word2);
if(conv >= 1){ //if no word was converted, don't print anything
printf("\n%s ", w[*count].word1);
}
if(conv == 2){ //the same operation
printf("%s", w[*count].word2);
}
i++;
if(*count>= dim){
dim *= 2;
struct word* temp = realloc(w, sizeof(*w)*dim);
if(temp != NULL){
w = temp;
} else{
free(w);
return NULL;
}
(*count)++;
}
}
return w;
}
int main(int argc, char *argv[]){ //<- the file is passed as the argv[1] argument
FILE *fp; //<- declaring a FILE pointer
fp= fopen(argv[1], "r"); //<- opening the file passed in argv[1], in read mode
if(fp == 0){ //<- if the file could not be opened, the program has to stop.
printf("FILE IS NOT LOADED");
return 1;
}
struct word *w; //<- creating a struct pointer called w
int count= 0;
if(!(w = file_read(fp, &count))){ //<- calling the reading function
return 0;
}
//AFTER THE READING, I SHOULD BE ABLE TO SAVE ALL THE WORDS IN THE STRUCT
//AND I SHOULD BE ABLE TO DO SOME OPERATIONS, LIKE VISUALIZING IT IN DIFFERENT WAYS,
//DOING A BUBBLE SORT OR QSORT, MODIFYING THE WORDS IN THE STRUCT, ETC.
}
So please, how can I make it work? Thank you, everybody.
I hope I've been clear :)
user3386109's suggestion of adding int conv to the word structure is good. Still there are errors:
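(*count)++ is out of place: it sits inside if(*count >= dim), which is never true while *count is still 0, so it is never reached. Move it to the place of the pointless i++ (i is never initialized or used anyway).
In order not to store every line's words in the first structure, change all w-> to w[*count].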
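A minimal sketch of file_read() with both fixes applied, plus user3386109's int conv field. It assumes at most two words per line, none longer than MAX (10) characters; the blank-line handling and the "%10s" field widths are my additions. Each line gets its own slot w[*count], and a missing word is stored as an empty string, which gives the empty slot the question asks for.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX 10   /* longest word, in characters */

struct word {
    char word1[MAX + 1];   /* +1 for the terminating '\0' */
    char word2[MAX + 1];
    int conv;              /* how many words this line held: 0, 1 or 2 */
};

/* Read one struct word per line; a missing word is left as "". */
struct word *file_read(FILE *fp, int *count)
{
    int dim = 2;
    char buf[1024];
    struct word *w = calloc(dim, sizeof *w);
    if (w == NULL)
        return NULL;

    *count = 0;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (*count >= dim) {                 /* grow before writing */
            struct word *temp = realloc(w, sizeof *w * dim * 2);
            if (temp == NULL) {
                free(w);
                return NULL;
            }
            w = temp;
            dim *= 2;
        }
        struct word *cur = &w[*count];      /* this line's slot, not w[0] */
        cur->word1[0] = cur->word2[0] = '\0';
        /* %10s matches MAX, so a long word cannot overflow the array */
        int conv = sscanf(buf, "%10s %10s", cur->word1, cur->word2);
        cur->conv = conv < 0 ? 0 : conv;    /* EOF on a blank line -> 0 */
        (*count)++;                          /* one entry per line */
    }
    return w;
}
```

The caller in the question's main() can then loop over w[0] to w[count - 1], check conv (or an empty string) to see which words are missing, and free(w) at the end.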

Why does my C program crash when I add any statement to the main function?

I'm reasonably new to C - I've made a program that reads a text file and through several functions processes the text by using malloc() and realloc() and formats an output. At the moment I only have the following in my main():
1. Initializing the structs
Line_Array line_arr;
line_arr.max_size = 0;
line_arr.num_lines = 0;
line_arr.line = NULL;
Word_Array word_arr;
word_arr.max_size_ = 0;
word_arr.num_words = 0;
word_arr.word = NULL;
Word_Array ex_words;
ex_words.max_size_ = 0;
ex_words.num_words = 0;
ex_words.word = NULL;
Line_Array.line and Word_Array.word are both pointers to structs Line and Word, respectively. Line and Word are structs that hold a char array and an int that keeps track of what line number the line is or the word occurs in.
2. Calling the functions that process the input text files
get_lines(&line_arr, argv[1]);
get_words(&line_arr, &word_arr);
sort_keywords(&word_arr);
All of my functions return void.
All the output and formatting occurs within the functions, and any space I have malloc'd I have subsequently freed without any error. The program produces the desired output; however, if I add any statement to main(), even a print statement, I get a "Bus error: 10" or a "Segmentation fault: 11".
I have tried gdb but I haven't used it before, and it seems as though gdb hangs and freezes when I try to run the program anyway.
I'm wondering if anyone knows why this is the case. Is there some fundamental logic I'm not understanding about C or malloc()?
Thanks for your expertise!
Edit:
Structs:
typedef struct Line Line;
struct Line{
char line[MAX_LINE_LEN];
};
typedef struct Word{
char word[MAX_WORD_LEN];
int line_number;
}Word;
typedef struct Word_Array{
//will point to the base in memory where the list of words begins.
Word *word;
int max_size_;
int num_words;
}Word_Array;
typedef struct Line_Array{
//line is a variable that stores an address of a line object.
Line *line;
int max_size;
int num_lines;
}Line_Array;
get_lines():
void get_lines(Line_Array *la, char const *filename){
Line *line_ptr;
char *buffer;
size_t buffer_len;
FILE *fp = fopen(filename, "r");
if(fp == NULL){
printf("Can't read file. \n");
exit(1);
}
while (getline(&buffer, &buffer_len, fp) > 0) {
buffer[strlen(buffer)-1] = '\0';
if (la->line == NULL) {
la->line = (Line *) malloc(sizeof(Line));
if (la->line == NULL) {
exit(1);
}
la->max_size = 1;
la->num_lines = 0;
}
else if (la->num_lines >= la->max_size) {
line_ptr = (Line *) realloc(la->line, (2*la->max_size) * sizeof(Line));
if (line_ptr == NULL) {
exit(1);
}
la->max_size *= 2;
la->line = line_ptr;
}
strncpy(la->line[la->num_lines].line, buffer, MAX_LINE_LEN);
la->num_lines++;
}
fclose(fp);
}
I haven't freed the memory in this function since I use it later. But even when the other functions aren't run, the same problem occurs: if I add something to main() before or after calling get_lines(), I get "Bus error: 10" as my only output. However, if I only call get_lines() and the other functions, the program produces the right output.
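At least one problem:
Variables need to be initialized before getline() is used (credit: @Weather Vane). getline() treats a non-NULL *lineptr as a buffer previously obtained from malloc() with size *n and may realloc() it, so an uninitialized buffer pointer hands it a garbage address. That is undefined behavior, which is likely why unrelated changes to main() make the crash appear or disappear.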
//char *buffer;
//size_t buffer_len;
char *buffer = NULL;
size_t buffer_len = 0;
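Notes:
After the while (getline(...)) loop, the code should release the buffer with free(buffer);
strncpy(la->line[la->num_lines].line, buffer, MAX_LINE_LEN); does not ensure that la->line[la->num_lines].line is a string: if buffer holds MAX_LINE_LEN or more characters, no terminating '\0' is copied. See "Why is strncpy insecure?"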
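Putting the answer and the notes together, a sketch of get_lines() could look like the following. MAX_LINE_LEN is not shown in the question, so 256 here is an assumed value; the structs are copied from the question's edit (minus Word). It also strips the last character only when it really is '\n' (the last line of a file may not end with one) and uses snprintf so every stored line is terminated, even when truncated. getline() is POSIX, hence the _POSIX_C_SOURCE definition.

```c
#define _POSIX_C_SOURCE 200809L   /* for getline() and ssize_t */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE_LEN 256   /* assumed value; the question does not show it */

typedef struct Line {
    char line[MAX_LINE_LEN];
} Line;

typedef struct Line_Array {
    Line *line;
    int max_size;
    int num_lines;
} Line_Array;

void get_lines(Line_Array *la, char const *filename)
{
    char *buffer = NULL;      /* NULL + 0 tells getline() to malloc for us */
    size_t buffer_len = 0;
    ssize_t n;
    FILE *fp = fopen(filename, "r");
    if (fp == NULL) {
        printf("Can't read file.\n");
        exit(1);
    }
    while ((n = getline(&buffer, &buffer_len, fp)) > 0) {
        if (buffer[n - 1] == '\n')          /* last line may lack '\n' */
            buffer[n - 1] = '\0';
        if (la->num_lines >= la->max_size) {
            /* realloc(NULL, ...) behaves like malloc on the first call */
            int new_size = la->max_size ? 2 * la->max_size : 1;
            Line *line_ptr = realloc(la->line, new_size * sizeof(Line));
            if (line_ptr == NULL)
                exit(1);
            la->line = line_ptr;
            la->max_size = new_size;
        }
        /* copies at most MAX_LINE_LEN-1 chars and always terminates */
        snprintf(la->line[la->num_lines].line, MAX_LINE_LEN, "%s", buffer);
        la->num_lines++;
    }
    free(buffer);             /* getline() allocated it */
    fclose(fp);
}
```

This relies on main() initializing line to NULL and max_size to 0, as the question already does; the caller still frees la->line when finished.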

Do I need to use 2D arrays for an array of strings in C?

I want my program to read N words from a text file and save them in an array. My question is: do I need a 2D array, e.g. char **wordList, or is the 1D array in the example below sufficient? The output is correct except for the last string, which, as you can see, is weird. Also, am I allocating sufficient memory for the array, and why does the last output string come out wrong?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void populateWordsArray(int);
FILE *file;
char *word;
char **wordList;
/*
* Function populateWordsArray: reads N words from
* the given file and creates an array with them.
* Has one argument: int N - the number of words to read.
*/
void populateWordsArray(int N) {
int i = 0;
while(!feof(file) && i < N) {
fscanf(file,"%s",&word[i]);
printf("%s\n",&word[i]);
i++;
}
}
int main(int argc,char *argv[]) { // argv[1] = op argv[2] = name
int N = 0;
file = fopen(argv[2],"r");
if(file == (FILE *) NULL) { // check if the file opened successfully
fprintf(stderr,"Cannot open file\n");
}
fscanf(file,"%d",&N); // get the N number
word = malloc(N * sizeof(char));
populateWordsArray(N);
// write a switch method for the various ops
// call the appropriate function for each operation
free(word);
fclose(file);
return 0;
}
Output:
this
is
a
test!
with
files.
new
line,
here.
ere.
Text file content:
10 this is a test! with files.
new line, here.
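Your example is wrong. When executing the line fscanf(file,"%s",&word[i]);, the third argument should be the address where the function will write the read data. In your case, word[i] is the i-th element of the array and &word[i] is its address, so each word is stored starting at word[i], overwriting the previous word from its second character on. Because word holds only N bytes, the later words also run past the end of the buffer. Your code only prints something sensible because you print each word immediately after reading it, and you don't get a segfault by pure chance. The odd last line has the same cause: the file holds only 9 words, so the tenth fscanf fails (feof() only becomes true after a read has already failed), nothing new is written, and printf("%s\n", &word[9]) prints the tail of the previous word, "ere.", starting one byte later.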
If you want to read a string into a buffer, you first need to allocate the space for the buffer.
By using char **, you can make it into a 2D array: first allocate sufficient space for the array of pointers, then allocate sufficient space for each string those pointers will point to.
I have rewritten your program for you:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX_STRING_LENGTH 100
void populateWordsArray(int);
FILE *file;
char **wordList;
void populateWordsArray(int N)
{
int i = 0;
while(i < N && fscanf(file,"%s", wordList[i]) == 1) // fscanf returns the number of successfully read items. If it's not 1, the read failed. You can also check if the return value is not EOF but, in this situation, it's the same.
{
printf("%s\n", wordList[i]); // i-th element is an address to a buffer that contains 100 bytes
i++;
}
}
int main(int argc,char *argv[])
{
int N = 0, i;
file = fopen(argv[1],"r"); // Indexing starts from 0 in C. Thus, 0th argument is the executable name and 1st is what you want.
if(file == NULL) // No need to cast NULL into a specific type.
{
fprintf(stderr,"Cannot open file\n");
return 1; // You might want to end the program here, possibly with non-zero return value.
}
fscanf(file,"%d",&N);
wordList = malloc(N * sizeof(char*)); // Allocating space for pointers
for(i=0; i<N; i++)
{
wordList[i] = malloc(MAX_STRING_LENGTH); // Allocating space for individual strings
}
populateWordsArray(N);
for(i=0; i<N; i++)
{
free(wordList[i]);
}
free(wordList);
fclose(file);
return 0;
}
I'd also advise against using global variables here.
EDIT: As the comments may suggest, this code is not the best solution. First, a word might not fit into a 100-byte buffer. To alleviate this issue, allocate a large, fixed-size buffer, read every word into it, then allocate the corresponding number of bytes for wordList[i] (don't forget the terminating null byte) and copy the data from the fixed-size buffer into wordList[i].
Also, the code has some missing error checks. For instance, the file may exist but be empty, in which case fscanf(file,"%d",&N); will return EOF. Also, the number at the beginning of the file may not correspond to the number of words that follow, or N might be negative (the code allows for it by declaring N as int).
EDIT2: As @bruno suggested, I made a version that I think is more bulletproof than the previous one. It's possible that I omitted something; I'm in a bit of a hurry. If so, let me know below.
#define _POSIX_C_SOURCE 200809L // strdup() is POSIX, not ISO C (before C23)
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX_STRING_LENGTH 512
char line_buffer[MAX_STRING_LENGTH]; // A line of the maximum size that can occur.
char** populateWordsArray(unsigned wantedLines, FILE* file, unsigned *readLines);
char** populateWordsArray(unsigned wantedLines, FILE* file, unsigned *readLines)
{
*readLines=0;
char** wordList;
// Allocating space for pointers
wordList = malloc(wantedLines * sizeof(char*));
if(!wordList)
{
fprintf(stderr,"Cannot allocate sufficient space for the pointers.\n");
exit(EXIT_FAILURE); // You may return NULL here and check it afterwards. The same goes for all the error checking inside this function
}
while(*readLines < wantedLines && fscanf(file,"%511s", line_buffer) == 1) // width = MAX_STRING_LENGTH-1, so line_buffer cannot overflow
{
// strdup() allocates strlen(line_buffer)+1 bytes and copies the word, so no separate malloc() is needed (one here would leak)
if(NULL == (wordList[*readLines]=strdup(line_buffer)))
break;
(*readLines)++;
}
}
return wordList;
}
int main(int argc,char *argv[])
{
unsigned N = 0, i, M;
char **wordList;
FILE *file;
file = fopen(argv[1],"r"); // Indexing starts from 0 in C. Thus, 0th argument is the executable name and 1st is what you want.
if(file == NULL) // No need to cast NULL into a specific type.
{
fprintf(stderr,"Cannot open file\n");
return 1; // You might want to end the program here, possibly with non-zero return value.
}
if(fscanf(file,"%u",&N) != 1) // N is unsigned, so %u rather than %d
{
fprintf(stderr,"Cannot read the number of lines. Empty file?\n");
return 1;
}
wordList = populateWordsArray(N, file, &M);
printf("Printing the read lines:\n");
for(i=0; i<M; i++)
{
printf("%s\n", wordList[i]);
}
for(i=0; i<M; i++)
{
free(wordList[i]);
}
free(wordList);
fclose(file);
return 0;
}

Reading integers from txt file in C

I'm making a file reader that reads integers line by line from a file. The problem is that it is not working. I think I am using fscanf in the wrong way. Can someone help me?
I have already looked for answers in other questions, but I can't find anything that explains why my code is not working.
int read_from_txt(){
FILE *file;
file = fopen("random_numbers.txt", "r");
//Counting line numbers to allocate exact memory
int i;
unsigned int lines = 0;
unsigned int *num;
char ch;
while(!feof(file)){
ch = fgetc(file);
if (ch == '\n'){
lines++;
}
}
//array size will be lines+1
num = malloc(sizeof(int)*(lines+1));
//storing random_numbers in num vector
for(i=0;i<=lines;i++){
fscanf(file, "%d", &num[i]);
printf("%d", num[i]);
}
fclose(file);
}
The .txt file looks like this:
12
15
32
68
46
...
But the output of this code is always "0000000000000000000..."
You forgot to "rewind" the file:
fseek(file, 0, SEEK_SET);
Your process of reading goes through the file twice - once to count lines, and once more to read the data. You need to go back to the beginning of the file before the second pass.
Note that you can do this in a single pass by using realloc as you go: read numbers in a loop into a temporary int, and after each successful read expand the num array by one element by calling realloc. This grows the buffer as needed, and you don't need to rewind.
Be careful to check the results of realloc before re-assigning to num to avoid memory leaks.
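A sketch of the single-pass approach described above, written as a hypothetical helper read_ints() that takes an already opened FILE *. It grows the array by exactly one element per number, as the answer suggests, and assigns realloc's result through a temporary pointer so the old block can still be freed on failure. It returns NULL with *count set to 0 for a file with no numbers.

```c
#include <stdio.h>
#include <stdlib.h>

/* Single pass: read ints with fscanf and grow the array by one element
   per successful read. Returns NULL (with *count 0) if nothing was read
   or realloc fails; otherwise the caller frees the result. */
int *read_ints(FILE *fp, size_t *count)
{
    int *num = NULL;
    int value;                 /* temporary for each read */
    *count = 0;
    while (fscanf(fp, "%d", &value) == 1) {
        int *tmp = realloc(num, (*count + 1) * sizeof *num);
        if (tmp == NULL) {     /* num is still valid here, so free it */
            free(num);
            *count = 0;
            return NULL;
        }
        num = tmp;
        num[(*count)++] = value;
    }
    return num;
}
```

Growing by one element keeps the code short but may copy the array on every read; doubling the capacity, as the getline answer does in steps of 10, reduces the number of reallocations.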
You could try the POSIX getline function (it is not part of standard C stdio) and add the parsed numbers to the array using only one loop. See the code below. Please check https://linux.die.net/man/3/getline
Also, you can use the atoi or strtoul functions to convert the read line to an integer. Feel free to check https://linux.die.net/man/3/atoi or https://linux.die.net/man/3/strtoul
The code below reads a file containing a list of numbers and stores them in a dynamically allocated int array.
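#define _POSIX_C_SOURCE 200809L /* exposes getline() and ssize_t */
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char ** argv) {
FILE * file;
file = fopen("./file.txt", "r");
if (file == NULL) {
perror("fopen");
return 1;
}
ssize_t read; /* getline() returns ssize_t, so -1 can be detected */
char * line = NULL;
size_t line_len = 0;
size_t buffer_size = 10;
int * buffer = malloc(sizeof(int) * buffer_size);
if (buffer == NULL) {
fclose(file);
return 1;
}
size_t seek = 0;
while((read = getline(&line, &line_len, file)) != -1) {
buffer[seek++] = atoi(line);
if (seek == buffer_size) { /* array is full: grow it by 10 */
buffer_size += 10;
int * temp = realloc(buffer, sizeof(int) * buffer_size);
if (temp == NULL) {
break; /* keep the numbers read so far */
}
buffer = temp;
}
}
for (size_t i = 0; i < seek; i++) {
printf("%d\n", buffer[i]);
}
free(line); /* getline() allocated this buffer */
free(buffer);
fclose(file);
return 0;
}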
If you aren't sure which conversion function you should use, you can check the difference between atoi and sscanf at "What is the difference between sscanf or atoi to convert a string to an integer?"
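atoi() cannot report errors: it returns 0 both for "0" and for a line that is not a number at all. As a sketch of the strto* alternative mentioned above, this hypothetical helper uses strtol (the signed counterpart of strtoul) and rejects empty input, trailing junk and out-of-range values; the helper name and the choice to accept trailing whitespace such as the newline left by getline are my assumptions.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse one line holding a decimal integer. Returns 1 and stores the
   value on success, 0 otherwise (no digits, trailing junk, or a value
   outside the range of int). */
int parse_int_line(const char *line, int *out)
{
    char *end;
    errno = 0;
    long v = strtol(line, &end, 10);       /* skips leading whitespace */
    if (end == line)                       /* no digits at all */
        return 0;
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;                             /* allow trailing whitespace */
    if (*end != '\0')                      /* junk after the number */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;
    *out = (int)v;
    return 1;
}
```

In the getline loop above, you would call parse_int_line(line, &value) instead of atoi(line) and skip (or report) lines for which it returns 0.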

C: get file into array of chars

Hi, I have the code below, where I try to get all the lines of a file into an array. For example, if the file data.txt contains the following:
first line
second line
then in the code below I want the data array to contain:
data[0] = "first line";
data[1] = "second line";
My first question: currently I am getting "Segmentation fault". Why?
Specifically, I get the following output:
Number of lines is 7475613
Segmentation fault
My second question: is there any better way to do what I am trying to do?
Thanks!
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char* argv[])
{
FILE *f = fopen("data.txt", "rb");
fseek(f, 0, SEEK_END);
long pos = ftell(f);
fseek(f, 0, SEEK_SET);
char *bytes = malloc(pos);
fread(bytes, pos, 1, f);
int i =0;
int counter = 0;
for(; i<pos; i++)
{
if(*(bytes+i)=='\n') counter++;
}
printf("\nNumber of lines is %d\n", counter);
char* data[counter];
int start=0, end=0;
counter = 0;
int length;
for(i=0; i<pos; i++)
{
if(*(bytes+i)=='\n')
{
end = i;
length =end-start;
data[counter]=(char*)malloc(sizeof(char)*(length));
strncpy(data[counter],
bytes+start,
length);
counter = counter+1;
start = end+1;
}
}
free(bytes);
return 0;
}
The first line of data.txt in this case is not '\n'; it is: "23454555 6346346 3463463".
Thanks!
You need to malloc 1 more char for data[counter] for the terminating NUL.
After strncpy, you need to terminate the destination string.
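The two fixes above fit into one small helper. The name copy_line is hypothetical; it uses memcpy instead of strncpy because the length is already known, allocates length + 1 bytes, and always writes the terminating '\0'.

```c
#include <stdlib.h>
#include <string.h>

/* Copy `length` bytes starting at `start` into a new string.
   malloc(length + 1) leaves room for the terminating '\0',
   which is written explicitly after the copy. */
char *copy_line(const char *start, size_t length)
{
    char *s = malloc(length + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, start, length);
    s[length] = '\0';
    return s;
}
```

In the question's loop, data[counter] = copy_line(bytes + start, length); would replace the malloc/strncpy pair, with a NULL check on the result.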
Edit after edit of original question
Number of lines is 7475613
Whooooooaaaaaa, that's a bit too much for your computer!
If the size of a char * is 4, you are asking for 29902452 bytes (about 30 MB) of automatic memory (normally the stack) for the variable-length array data, far more than a typical default stack allows.
You can allocate that memory dynamically instead:
/* char *data[counter]; */
char **data = malloc(counter * sizeof *data);
/* don't forget to free the memory when you no longer need it */
Edit: second question
My second question: Is there any better way to do what i am trying do?
Not really; you're doing it right. But maybe you can code without the need to have all that data in memory at the same time.
Read and deal with a single line at a time.
You also need to free(data[counter]); in a loop ... and free(data); before the "you're doing it right" above is correct :)
And you need to check if each of the several malloc() calls succeeded LOL
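Here is a sketch of the "read and deal with a single line at a time" idea. The function name for_each_line and its callback interface are illustrative assumptions, and it relies on the POSIX getline() so that lines of any length are handled with one reusable buffer, which is freed once at the end.

```c
#define _POSIX_C_SOURCE 200809L   /* for getline() and ssize_t */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Call handle() once per line, reusing one buffer, so memory use depends
   on the longest line rather than on the size of the whole file.
   Returns the number of lines handled. */
size_t for_each_line(FILE *fp, void (*handle)(const char *line, void *ctx),
                     void *ctx)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t n;
    size_t count = 0;
    while ((n = getline(&line, &cap, fp)) != -1) {
        if (n > 0 && line[n - 1] == '\n')
            line[n - 1] = '\0';            /* drop the newline */
        handle(line, ctx);
        count++;
    }
    free(line);                            /* getline() allocated it */
    return count;
}
```

The ctx pointer lets the callback accumulate results (totals, counts, the longest line) without global variables.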
First of all you need to check if the file got opened correctly or not:
FILE *f = fopen("data.txt", "rb");
if(!f)
{
fprintf(stderr,"Error opening file");
exit (1);
}
If there is an error opening the file and you don't check for it, you'll likely get a segfault when you call fseek() with a NULL FILE pointer.
Apart from that, I see no errors. I tried running the program, printing the contents of the data array at the end, and it ran as expected.
One thing to note is that you're opening your file in binary mode, so line endings may not behave as you expect on your platform (UNIX uses LF, Windows uses CR-LF, and classic Mac OS used CR). On Windows, for example, each line stored in data would keep its trailing '\r'.
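If you keep binary mode, one way to get the same result everywhere is to strip any trailing '\r' and '\n' from each line yourself. A minimal sketch, with a hypothetical helper name:

```c
#include <string.h>

/* Remove a trailing "\n", "\r\n" or "\r" in place, so lines read from a
   file opened in binary mode ("rb") look the same on every platform. */
void strip_eol(char *s)
{
    size_t len = strlen(s);
    while (len > 0 && (s[len - 1] == '\n' || s[len - 1] == '\r'))
        s[--len] = '\0';
}
```

Alternatively, open the file with "r" (text mode), which on Windows translates CR-LF to '\n' when reading.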
