Print all words and their count - c

I'm trying to print every word in a text file along with its count. When the program reads the same word a second time, it prints zero as the count, and I can't figure out how to output the correct value. For example, the first time it finds "and" it prints "and: 1", but when it finds "and" again it prints "and: 0".
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "hashMap.h"
int main (int argc, const char * argv[]){
    char* word;
    int *value = 0;
    const char* filename;
    struct hashMap *hashTable;
    int tableSize = 10;
    clock_t timer;
    FILE *fileptr;
    if(argc == 2)
        filename = argv[1];
    else
        filename = "input1.txt"; /*specify your input text file here*/
    printf("opening file: %s\n", filename);
    fileptr = fopen(filename, "r");
    if(fileptr != 0){
        printf("Open Successfull!\n");
    }
    else{
        printf("Failed to open!\n");
    }
    timer = clock();
    hashTable = createMap(tableSize);
    /*... concordance code goes here ...*/
    while(1){
        word = getWord(fileptr);
        if(word == NULL){
            break;
        }
        value = (int*)atMap(hashTable, word);
        if(value != NULL){
            value++;
        }
        else{
            value = (int *) malloc(sizeof(int));
            *value = 1;
            insertMap(hashTable, word, value);
        }
        printf("%s:%d\n", word, *value);
    }
}

The value you're fetching from the hash table is a pointer to a block of memory holding a single integer. You then increment the pointer, which makes it point just past that block. Reading *value there is undefined behavior; in your case that memory just happened to contain a zero. You want to increment the integer value, not the pointer.

value is a pointer to an int. You don't want to increment the pointer, but rather the int it points to: change value++ to (*value)++.

Related

Find number of occurrences for the substring in a string using C programming

I am writing a C program that reads a text file containing a string, counts the occurrences of the substring "GLROX", and prints "Sequence Found" for each one. The file "inputGLORX.txt" contains the following string:
GLAAAROBBBBBBXGLROXGLROXGLROXGLROXGLCCCCCCCCCCCCCCROXGGLROXGLROXGLROXGLROXGLROXGLROXGLROXGLROXGLROXGLROXGLROX
But I am getting weird results. It would be great if someone experienced in C could help me solve this. Thanks in advance.
#include <stdio.h>
#include <conio.h>
#include <string.h>
#define NUMBER_OF_STRINGS 40
#define MAX_STRING_SIZE 7
void seqFound()
{
    printf("Sequence Found\n");
}
int main()
{
    FILE *fp;
    char buff[1000];
    char strptrArr[NUMBER_OF_STRINGS] [MAX_STRING_SIZE];
    const char *search = "GLROX";
    fp = fopen("D:/CandC++/inputGLORX.txt", "r");
    if(fp==NULL)
        printf("It is a null pointer");
    while(!feof(fp))
    {
        //fscanf(fp, "%s", buff);
        fgets(buff, 1000,fp);
    }
    int len = strlen(buff);
    printf("length is %d\n",len);
    int count = 0;
    char *store;
    while(store = strstr(buff, search))
    {
        printf("substring is %s \n",store);
        count++;
        search++;
    }
    printf("count is %d\n",count);
    while (count!=0) {
        seqFound();
        count--;
    }
    return 0;
}
As said in the comments, there are at least two problems in the code: your fgets() loop only keeps the last line it reads (if it reads one at all), which is not what you want, and you are incrementing the search string instead of advancing through buff.
Something like this should fix most of your problems, as long as no line in your file is longer than 999 characters. It will not work properly if your search string contains a \n or NUL character.
int count = 0;
while (fgets(buff, 1000, fp) != NULL)
{
    char *temp = buff;
    while ((temp = strstr(temp, search)))
    {
        printf("%d. %s\n", count + 1, temp);
        count++;
        temp++;
    }
}
Here is a main for testing. It takes the input file and the search string from argv:
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv)
{
    FILE *fp;
    char buff[1000];
    char *search;
    if (argc < 3)
        return (-1);
    search = argv[2];
    if (search[0] == '\0')
        return (-1);
    if ((fp = fopen(argv[1], "r")) == NULL)
        return (-1);
    int count = 0;
    while (fgets(buff, 1000, fp) != NULL)
    {
        char *temp = buff;
        while ((temp = strstr(temp, search)))
        {
            printf("%d. %s\n", count + 1, temp);
            count++;
            temp++;
        }
    }
    fclose(fp); /* release the file before exiting */
    printf("Match found: %d\n", count);
    return 0;
}
The way you search in buff is wrong, i.e. this code:
while(store = strstr(buff, search))
{
    printf("substring is %s \n",store);
    count++;
    search++; // <------- oops
}
When you have a hit, you change search, i.e. the string you are looking for. That's not what you want: the search string (the needle) should stay the same the whole time. Instead, you want to move forward in buff so that you search the remainder of the buffer.
That could be something like:
#include <stdio.h>
#include <string.h>
int main()
{
    const char* buff = "GLAAAROBBBBBBXGLROXGLROXGLROXGLROXGLCCCCCCCCCCCCCCROXGGLROXGLROXGLROXGLROXGLROXGLROXGLROXGLROXGLROXGLROXGLROX";
    const char* search = "GLROX";
    const char* remBuff = buff; // Pointer to the remainder of buff
                                // Initialized to be the whole buffer
    const char* hit;
    int cnt = 0;
    while((hit = strstr(remBuff, search))) // Search in the remainder of buff
    {
        ++cnt;
        remBuff = hit + 1; // Update the remainder pointer so it points just 1 char
                           // after the current hit
    }
    printf("Found substring %d times\n", cnt);
    return 0;
}
Output:
Found substring 15 times

Copy string to element of struct array

I'm attempting to copy a C-string, which is read in from a file, to an element of a struct array, but it is not copying. When I attempt to print, the word is not there. I'm kind of new to C. Below is my code. Many thanks for your help.
typedef struct Tree{
    int numTimes; //number of occurrences
    char* word; //the word buffer
}Node;
#include "proj2.h"
#include <ctype.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char* argv[]){
    FILE* readIn; //read file pointer
    FILE* writeOut; //write file pointer
    char buffer[18]; //allocate buffer ***please do not fuzz
    int length = 0;
    int count = 0;
    Node* array = (Node*) malloc(sizeof(Node));
    /*if(argc < 3){ //if the number of command line arguments is < 3, return EXIT_FAILURE
        return EXIT_FAILURE;
    }*/
    argv[1] = "/Users/magnificentbastard/Documents/workspaceCPP/proj2/Password.txt"; //****testing
    argv[2] = "outFile.txt"; //****testing
    readIn = fopen(argv[1], "r"); //opens the selected argument file for reading
    writeOut = fopen(argv[2], "w"); //opens the selected argument file for writing
    if(readIn == NULL){ //if there
        printf("ERROR: fopen fail.\n");
        return EXIT_FAILURE; //exits if the file opens
    }
    while(fscanf(readIn, "%18s", buffer) == 1){ //loop to read in the words to the buffer
        count++; //counts the words coming in
        modWord(buffer); //modifies the words coming in
        array = (Node*)realloc(array, sizeof(Node));
        for(int i = 0; i < count; i++){ //****not copying over...HELP
            strcpy(array[i].word, buffer);
        }
    }
    //Node array[count];
    fprintf(stderr, "%d ", count); //***for testing purposes only
    int elements = sizeof(array)/sizeof(array[0]); //***testing assigns num elements
    fprintf(stderr, "%d ", elements); //***testing prints num elements
    fclose(readIn); //closes the in-file
    fclose(writeOut); //closes the out-file
    return EXIT_SUCCESS;
}
array[count] doesn't allocate the memory. I believe what you're trying to implement here is a singly linked list of strings.
What you're trying to do can be achieved, but you need to allocate memory for array with the malloc/realloc/free combination. Note that realloc(array, sizeof(Node)) always asks for room for exactly one Node, so the array never grows. What's more, you should either make Node.word a fixed-size array, or keep it a pointer and allocate the memory for each word on a Node-by-Node basis; as written, strcpy copies into an uninitialized pointer.
The length of a dynamically allocated array cannot be retrieved with the sizeof operator: sizeof is evaluated at compile time, and since array is a pointer, sizeof(array) is always the size of a pointer on your platform.

strtok and storage in arrays: output not as expected

In the code below, the file test.txt has the following data:
192.168.1.1-90
192.168.2.2-80
The output of this is not as expected.
I expect the output to be
192.168.1.1
90
192.168.2.2
80
Any help would be much appreciated.
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
int main()
{
    FILE *fp;
    char *result[10][4];
    int i=0;
    const char s[2] = "-";
    char *value,str[128];
    fp = fopen("test.txt", "r");
    if (fp == NULL)
        printf("File doesn't exist\n");
    else{
        while(!feof(fp)){
            if(fgets(str,sizeof(str),fp)){
                /* get the first value */
                value = strtok(str, s);
                result[i][0]=value;
                printf("IP : %s\n",result[i][0]); //to be removed after testing
                /* get second value */
                value = strtok(NULL, s);
                result[i][1]=value;
                printf("PORT : %s\n",result[i][1]); //to be removed after testing
                i++;
            }
        }
        for (int k=0;k<2;k++){
            for (int j=0;j<2;j++){
                printf("\n%s\n",result[k][j]);
            }
        }
    }
    return(0);
}
You can try this solution. It uses dynamic memory instead, but does what you're after.
The code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define BUFFSIZE 128
void exit_if_null(void *ptr, const char *msg);
int
main(int argc, char const *argv[]) {
    FILE *filename;
    char buffer[BUFFSIZE];
    char *sequence;
    char **ipinfo;
    int str_size = 10, str_count = 0, i;
    filename = fopen("ips.txt", "r");
    if (filename == NULL) {
        fprintf(stderr, "%s\n", "Error Reading File!");
        exit(EXIT_FAILURE);
    }
    ipinfo = malloc(str_size * sizeof(*ipinfo));
    exit_if_null(ipinfo, "Initial Allocation");
    while (fgets(buffer, BUFFSIZE, filename) != NULL) {
        sequence = strtok(buffer, "-\n");
        while (sequence != NULL) {
            if (str_size == str_count) {
                str_size *= 2;
                ipinfo = realloc(ipinfo, str_size * sizeof(*ipinfo));
                exit_if_null(ipinfo, "Reallocation");
            }
            ipinfo[str_count] = malloc(strlen(sequence)+1);
            exit_if_null(ipinfo[str_count], "Initial Allocation");
            strcpy(ipinfo[str_count], sequence);
            str_count++;
            sequence = strtok(NULL, "-\n");
        }
    }
    for (i = 0; i < str_count; i++) {
        printf("%s\n", ipinfo[i]);
        free(ipinfo[i]);
        ipinfo[i] = NULL;
    }
    free(ipinfo);
    ipinfo = NULL;
    fclose(filename);
    return 0;
}
void
exit_if_null(void *ptr, const char *msg) {
    if (!ptr) {
        printf("Unexpected null pointer: %s\n", msg);
        exit(EXIT_FAILURE);
    }
}
The key thing to understand is that char *strtok(char *str, const char *delim) modifies the string pointed to by str in place: it overwrites each delimiter it finds with '\0' and returns a pointer into str. It does not copy anything, so the returned pointer actually points somewhere inside str.
In your code, the content of str is refreshed each time you read a new line from the file, but its address stays the same. So after your while loop, str holds the last line of the file, as modified by strtok. At that point result[0][0] and result[1][0] both point to the same address, the beginning of str, so you print the same thing twice at the end.
This is further illustrated in the comments added to your code.
int main()
{
    FILE *fp;
    char *result[10][4];
    int i=0;
    const char s[2] = "-";
    char *value,str[128];
    fp = fopen("test.txt", "r");
    if (fp == NULL)
        printf("File doesn't exist\n");
    else{
        while(!feof(fp)){
            if(fgets(str,sizeof(str),fp)){
                /* get the first value */
                value = strtok(str, s);
                // ADDED: value now points to somewhere in str
                result[i][0]=value;
                // ADDED: result[i][0] points to the same address for i = 0 and 1
                printf("IP : %s\n",result[i][0]); //to be removed after testing
                /* get second value */
                value = strtok(NULL, s);
                // ADDED: value now points to somewhere in str
                result[i][1]=value;
                // ADDED: result[i][1] points to the same address for i = 0 and 1
                printf("PORT : %s\n",result[i][1]); //to be removed after testing
                i++;
            }
        }
        // ADDED: now result[0][0]==result[1][0], result[0][1]==result[1][1], you can test that
        for (int k=0;k<2;k++){
            for (int j=0;j<2;j++){
                printf("\n%s\n",result[k][j]);
            }
        }
    }
    return(0);
}
To get the expected output, you should copy the string that the pointer returned by strtok points to somewhere else each time, rather than just copying the pointer itself.

struct pointers to same memory address producing different data?

I have this simple code to read the lines of a file and store them in a struct:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct filedata {
    char **items;
    int lines;
};
struct filedata *read_file(char *filename) {
    FILE* file = fopen(filename, "r");
    if (file == NULL) {
        printf("Can't read %s \n", filename);
        exit(1);
    }
    char rbuff;
    int nlines = 0; // amount of lines
    int chr = 0; // character count
    int maxlen = 0; // max line length (to create optimal buffer)
    int minlen = 2; // min line length (ignores empty lines with just \n, etc)
    while ((rbuff = fgetc(file) - 0) != EOF) {
        if (rbuff == '\n') {
            if (chr > maxlen) {
                maxlen = chr + 1;
            }
            if (chr > minlen) {
                nlines++;
            }
            chr = 0;
        }
        else {
            chr++;
        }
    }
    struct filedata *rdata = malloc(sizeof(struct filedata));
    rdata->lines = nlines;
    printf("lines: %d\nmax string len: %d\n\n", nlines, maxlen);
    rewind(file);
    char *list[nlines];
    int buffsize = maxlen * sizeof(char);
    char buff[buffsize];
    int i = 0;
    while (fgets(buff, buffsize, file)) {
        if (strlen(buff) > minlen) {
            list[i] = malloc(strlen(buff) * sizeof(char) + 1);
            strcpy(list[i], buff);
            i++;
        }
    }
    rdata->items = (char **)list;
    fclose(file);
    int c = 0;
    for (c; c < rdata->lines; c++) {
        printf("line %d: %s\n", c + 1, rdata->items[c]);
    }
    printf("\n");
    return rdata;
}
int main(void) {
    char fname[] = "test.txt";
    struct filedata *ptr = read_file(fname);
    int c = 0;
    for (c; c < ptr->lines; c++) {
        printf("line %d: %s\n", c + 1, ptr->items[c]);
    }
    return 0;
}
This is the output when I run it:
lines: 2
max string len: 6
line 1: hello
line 2: world
line 1: hello
line 2: H��
For some reason, when it reaches the second index in ptr->items, it prints gibberish. Yet if I add some printf() calls to show the pointer addresses, they're exactly the same.
Valgrind also prints this when iterating over the char array the second time:
==3777== Invalid read of size 8
==3777== at 0x400AB3: main (test.c:81)
==3777== Address 0xfff000540 is on thread 1's stack
==3777== 240 bytes below stack pointer
But that really doesn't give me any clues in this case.
I'm using gcc 4.9.4 with glibc-2.24 if that matters.
list is a non-static local variable, and using it after its lifetime ends (here, after read_file returns) invokes undefined behavior, because it vanishes when its scope is exited. Allocate it dynamically (typically on the heap) instead, like
char **list = malloc(sizeof(char*) * nlines);
Adding code to check whether the malloc() calls succeed will make your code better.
The variable list is local to read_file, but you store a pointer to list in rdata->items. When read_file returns, rdata->items is a dangling pointer, and accessing it is undefined behavior.

How to include the spaces?

I am trying to write a program that takes the words from a file and puts them in a dynamic array. However, when I run my code, the program copies everything except the spaces. How do I fix this? For example, I expect:
This is a test does it work?
But I get the following:
Thisisatestdoesitwork?
char** getWords(char* filename, int* pn){
    char** tmp = (char**)malloc( 1000*sizeof(char));
    int *temp=(int*)malloc(1000*sizeof(int);
    int c;
    int counter = 0;
    FILE* fileInput = fopen(filename, "r");
    if(fileInput == NULL){
        return tmp; // return if file open fails
    }
    while((c=fgetc(fileInput)) != EOF){
        result = fscanf(fileInput, "%c", &c); //try to read a character
        if(isalpha(c)){ //chararect alphabetical
            tmp[counter] = c; // safe int to array
            counter ++;
            printf("%c", c); fflush(stdout);
        }
        else{ // if read not succesfull
            fscanf(fileInput, ""); // needs to scan anything not a character
        }
        if(counter > 100){ // to not exceed the array
            break;
        }
        if(feof(fileInput)){ // to check if at the end of the file
            break;
        }
    }
    fclose(fileInput); // closing file
    *pn = counter;
    return tmp;
}
My main Function:
int main(int argc, char* argv[]){
    int n;
    char** a = getWords("opdracht_4_5.c", &n);
    if (a != NULL){
        puts("gevonden woorden:");
        for (int i = 0;i < n; i++){
            printf("%3d %s\n",i,a[i]);
        }
        for (int i = 0;i < n; i++){
            free(a);
        }
        free(a);
    }
    return (EXIT_SUCCESS);
}
There are quite a few problems with your code. Here's a start:
You don't test the return value of fopen().
You don't test the return value of malloc().
You assign the return value of fgetc() to a variable of type char. Plain char is compatible with either signed char or unsigned char. In order to distinguish a character from EOF (which is negative), fgetc() returns the character as an unsigned char converted to int, or EOF. You need to test for EOF first and only then convert the value to a plain char.
The is...() functions expect an int argument whose value is in the range of an unsigned char or is EOF. If you have a plain char, you first have to cast it to unsigned char, or you can pass the return value of fgetc() straight to isalpha().
You attempt to append a zero-length char array (temp) to an uninitialized char array (s), and you do not test whether there is enough room in the target array. This is broken for more reasons than I care to enumerate.
You allocate memory for an array of 1000 pointers to char, but you never allocate memory for the char pointers themselves.
You try to append your buffer (s) to an uninitialized pointer (*tmp).
You call strlen() on something that is not null-terminated.
You never return the length of the array.
You call a number of functions that have not been declared.
This reads the file and puts each word in an array:
#include <ctype.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
char** getWords(char* filename, int* pn){
    char input[100]; // array to hold each word
    char** tmp; // double pointer
    int counter = 0;
    int words = 0;
    int c;
    tmp = malloc( (*pn)*sizeof(char*)); // allocate pointers for number of words
    if ( tmp == NULL) {
        printf ( "malloc failed\n");
        exit (1);
    }
    FILE* fileInput = fopen(filename, "r");
    if(fileInput == NULL){
        printf ( "file open failed\n");
        *pn = 0; // no words entered
        return tmp; // return if file open fails
    }
    while(( c = fgetc(fileInput)) != EOF){
        if( isalnum(c)){ // is alpha or number
            input[counter] = c; // save to array
            input[counter + 1] = '\0'; // save a \0 to the end to make a string
            counter ++;
        }
        else{ // not alpha or number
            if ( counter > 0) { // if there are characters, save the word
                tmp[words] = malloc ( strlen ( input) + 1); // memory for this word
                strcpy ( tmp[words], input); // copy the word to the array
                words++;
                counter = 0;
                if ( words >= *pn) { // got all the words wanted
                    break;
                }
            }
        }
        if(counter > 98){ // too many characters for input, start a new word
            tmp[words] = malloc ( strlen ( input) + 1);
            strcpy ( tmp[words], input);
            words++;
            counter = 0;
            if ( words >= *pn) {
                break;
            }
        }
    }
    if ( counter > 0 && words < *pn) { // save a last word that runs up to EOF
        tmp[words] = malloc ( strlen ( input) + 1);
        strcpy ( tmp[words], input);
        words++;
    }
    fclose(fileInput); // closing file
    *pn = words; // save number of words
    return tmp;
}
int main(int argc, char* argv[]){
    int n;
    int i;
    printf ( "enter the number of words to obtain\n");
    if ( scanf ( "%d", &n) != 1 || n <= 0) { // need a positive word count
        return (EXIT_FAILURE);
    }
    char** a = getWords("opdracht_4_5.c", &n);
    if (a != NULL){
        puts("gevonden woorden:");
        for ( i = 0;i < n; i++){
            printf("%3d %s\n",i,a[i]);
        }
        for ( i = 0;i < n; i++){
            free(a[i]); // free each word
        }
        free(a); // free the pointer to the words
    }
    return (EXIT_SUCCESS);
}
The input file I used had these as its first two lines:
#include<stdio.h>
#include<string.h>
I get this output:
enter the number of words to obtain
6
gevonden woorden:
0 include
1 stdio
2 h
3 include
4 string
5 h
This answer is as yet incomplete
Please allow me to finish this before commenting on it -- Thank you
There are a lot of issues with your code, and I won't clean it up for you. However, I would like to give you some hints on how your program SHOULD be coded:
Your main objective is to read a file and load its content word by word into an array.
Sorting is the wrong term here, because it implies that you want to sort the words alphabetically, or in some other order, after loading them into the array.
Okay, so first things first, let's figure out the overall operation of our program. We'll call our program kitten, because it's not quite as powerful as cat.
To run our program we will assume that we give it the filename we want to read on the command-line as follows:
$ ./kitten somefile.txt
and expect the output to be:
word1
word2
word3
.
.
.
wordN
Total words: N
So, let's get started. First, we make sure that our user specifies a filename:
#include <stdio.h>
int usage(const char *progname);
int main(int argc, char *argv[])
{
    if (argc < 2) {
        usage(argv[0]);
        return -1;
    }
    return 0;
}
int usage(const char *progname)
{
    fprintf(stderr, "Usage is:\n\t%s filename\n", progname);
    return 0;
}
Now that our program can get a filename, let's try to open the text file. If there is an issue with it, we use perror to display the error and exit the program; otherwise we are ready to use the file:
#include <stdio.h>
#include <errno.h>
int usage(const char *progname);
int main(int argc, char *argv[])
{
    FILE *fp;
    if (argc < 2) {
        usage(argv[0]);
        return -1;
    }
    fp = fopen(argv[1], "r");
    if (!fp) {
        perror(argv[1]); /* display system error, with the filename */
        return -1;
    }
    /* TODO: file manipulation goes here */
    fclose(fp); /* close the file */
    return 0;
}
int usage(const char *progname)
{
    fprintf(stderr, "Usage is:\n\t%s filename\n", progname);
    return 0;
}
In C, each function should ideally perform just one task, and that task should make human sense. For example, if the function is supposed to read words into an array, then that's all it should do: it should not open or close a file, which is WHY the code above does not create a function for opening the file the way you did. Your function should take a FILE * as the file to read.
Because we use a FILE * as input, we'll start the function name with an f to keep with the stdio convention. The function also needs a way to hand the list of words (a char **) back to the caller; since it has to modify the caller's char ** variable, the parameter is a char ***. Because size_t is unsigned, failure is signalled with (size_t)-1 rather than a negative value.
#include <stdio.h>
#include <errno.h>
int usage(const char *progname);
size_t fload(FILE *fp, char ***wordlist_p);
int main(int argc, char *argv[])
{
    FILE *fp;
    char **wordlist = NULL; /* fload will fill this in */
    if (argc < 2) {
        usage(argv[0]);
        return -1;
    }
    fp = fopen(argv[1], "r");
    if (!fp) {
        perror(argv[1]); /* display system error, with the filename */
        return -1;
    }
    if (fload(fp, &wordlist) == (size_t)-1) {
        fprintf(stderr, "Something went wrong\n");
    }
    fclose(fp); /* close the file */
    return 0;
}
int usage(const char *progname)
{
    fprintf(stderr, "Usage is:\n\t%s filename\n", progname);
    return 0;
}
size_t fload(FILE *fp, char ***wordlist_p)
{
    size_t rv = (size_t)-1; /* return value: "failed" until implemented */
    (void)fp;               /* not used yet */
    *wordlist_p = NULL;
    return rv;
}
Now we run into a conceptual problem: how do we allocate memory for wordlist_p? We don't know how big the file is, and we don't know how long the longest word in the file is.
Crude approach
Let's first try to think about it the simple way:
Point to the beginning of the `wordlist_p` with a `tail_pointer`
Read the file line by line (we assume no hyphenation)
    For each line, split the line up along white space
    Allocate space for the number of words in the `wordlist_p` array
    For each word in the split line
        Allocate space for the word itself
        Save the pointer to the word at the tail_pointer
        Advance the tail_pointer
    Next word
Next line
Let's look at what the fload function would look like with the steps above.
More to come.

Resources