I am trying to write a program which opens up a text file, reads from the file, changes upper case to lower case, and then counts how many times that word has occurred in the file and prints results into a new text file.
My code so far is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <ctype.h>
#include <string.h>
int main()
{
FILE *fileIN;
FILE *fileOUT;
char str[255];
char c;
int i = 0;
fileIN = fopen ("input.txt", "r");
fileOUT = fopen ("output.txt", "w");
if (fileIN == NULL || fileOUT == NULL)
{
printf("Error opening files\n");
}
else
{
while(! feof(fileIN)) //reading and writing loop
{
fscanf(fileIN, "%s", str); //reading file
i = 0;
c = str[i];
if (isupper(c)) //changing any upper case to lower case
{
c =(tolower(c));
str[i] = putchar(c);
}
printf("%s ", str); //printing output
fprintf(fileOUT, "%s\n", str); //printing into file
}
fclose(fileIN);
fclose(fileOUT);
}
getch();
}
the input.txt file contains the following "The rain in Spain falls mainly in the plane"
Don't ask why.
Running the program as is produces the following output:
the
rain
in
spain
falls
mainly
in
the
plane
I have managed to convert the upper-case words to lower case. I am now having trouble understanding how to count the occurrences of each word. For example, in the output I would want it to say "the 2", meaning "the" had appeared twice; this would also mean that I do not want any further "the" entries stored in that file.
I am thinking of strcmp and strcpy, but I am unsure how to use them the way I want.
Help would be much appreciated
(Sorry if formatting bad)
You may want to create a hash table with the words as keys and frequencies as values.
Sketch ideas:
recognize words, i.e. alphanumeric string separated by white space, try using strtok()
for each word
search for the word in the hash table based dictionary
if found: increment the frequency
if not found: insert a new entry in the dictionary as (word, 1)
At the end, print the contents of the dictionary, i.e. for all entries, entry.word and entry.frequency
See this question and answer for details: "Quick Way to Implement Dictionary in C". It is based on Section 6.6 of the bible, "The C Programming Language".
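The sketch above could be fleshed out roughly as follows. This is only a minimal illustration, not the code from the linked answer: the names NBUCKETS, entry, hash, lookup and count_word are made up for this example, and the table size of 101 is an arbitrary assumption.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 101            /* assumed table size; any small prime works */

struct entry {
    struct entry *next;         /* next entry in the same bucket */
    char *word;                 /* heap copy of the word (the key) */
    int freq;                   /* how many times the word was seen (the value) */
};

static struct entry *buckets[NBUCKETS];

/* Simple multiplicative string hash, reduced to a bucket index. */
static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Returns the entry for word, or NULL if it is not in the table. */
struct entry *lookup(const char *word)
{
    struct entry *e;
    for (e = buckets[hash(word)]; e != NULL; e = e->next)
        if (strcmp(e->word, word) == 0)
            return e;
    return NULL;
}

/* Counts one occurrence of word.
   Returns the new frequency, or -1 if memory runs out. */
int count_word(const char *word)
{
    struct entry *e = lookup(word);
    if (e == NULL) {                        /* not found: insert (word, 0) */
        unsigned h = hash(word);
        e = malloc(sizeof *e);
        if (e == NULL)
            return -1;
        e->word = malloc(strlen(word) + 1);
        if (e->word == NULL) {
            free(e);
            return -1;
        }
        strcpy(e->word, word);
        e->freq = 0;
        e->next = buckets[h];
        buckets[h] = e;
    }
    return ++e->freq;                       /* found or just inserted: increment */
}
```

Calling count_word once per token and then walking every bucket list at the end gives the (word, frequency) pairs to print.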
UPDATE based on OP's comment:
A hash table is just an efficient table; if you do not want to use one, you can still use a plain table. Here are some ideas.
typedef struct WordFreq {
char word[ N ];
int freq;
} WordFreq;
WordFreq wordFreqTable[ T ];
(N is the maximum length of a single word, T is the maximum number of unique words)
For searching and inserting, you can do a linear search in the table:
for( int i = 0; i != T; ++i ) {
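One way to build the linear search and insert around such a table, as a minimal sketch. The values chosen for N and T, the usedEntries counter and the addWord function are assumptions made for this illustration only.

```c
#include <string.h>

#define N 32    /* assumed maximum word length, including the terminator */
#define T 1000  /* assumed maximum number of unique words */

typedef struct WordFreq {
    char word[N];
    int freq;
} WordFreq;

static WordFreq wordFreqTable[T];
static int usedEntries = 0;     /* hypothetical counter of filled slots */

/* Returns the updated frequency of word,
   or -1 if the word is too long or the table is full. */
int addWord(const char *word)
{
    int i;
    if (strlen(word) >= N)
        return -1;
    /* linear search over the slots filled so far */
    for (i = 0; i != usedEntries; ++i)
        if (strcmp(wordFreqTable[i].word, word) == 0)
            return ++wordFreqTable[i].freq;
    /* not found: append a new (word, 1) entry */
    if (usedEntries == T)
        return -1;
    strcpy(wordFreqTable[usedEntries].word, word);
    wordFreqTable[usedEntries].freq = 1;
    ++usedEntries;
    return 1;
}
```

The search only visits the filled slots, so the cost per word grows with the number of unique words, which is usually acceptable for small inputs.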
A simple sample (it still needs error handling, freeing of the allocated memory, sorting with qsort, etc.):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define BUFFSIZE 1024
typedef struct _wc {
char *word;
int count;
} WordCounter;
WordCounter *WordCounters = NULL;
int WordCounters_size = 0;
void WordCount(char *word){
static int size = 0;
WordCounter *p=NULL;
int i;
if(NULL==WordCounters){
size = 4;
WordCounters = (WordCounter*)calloc(size, sizeof(WordCounter));
}
for(i=0;i<WordCounters_size;++i){
if(0==strcmp(WordCounters[i].word, word)){
p=WordCounters + i;
break;
}
}
if(p){
p->count += 1;
} else {
if(WordCounters_size == size){
size += 4;
WordCounters = (WordCounter*)realloc(WordCounters, sizeof(WordCounter)*size);
}
if(WordCounters_size < size){
p = WordCounters + WordCounters_size++;
p->word = strdup(word);
p->count = 1;
}
}
}
int main(void){
char buff[BUFFSIZE];
char *wordp;
int i;
while(fgets(buff, BUFFSIZE, stdin)){
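/* portable replacement for the non-standard strlwr() */
for(wordp=buff; *wordp; ++wordp) *wordp = tolower((unsigned char)*wordp);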
for(wordp=buff; NULL!=(wordp=strtok(wordp, ".,!?\"'#$%&()=# \t\n\\;:[]/*-+<>"));wordp=NULL){
if(!isdigit(*wordp) && isalpha(*wordp)){
WordCount(wordp);
}
}
}
for(i=0;i<WordCounters_size;++i){
printf("%s:%d\n", WordCounters[i].word, WordCounters[i].count);
}
return 0;
}
demo
>WordCount.exe
The rain in Spain falls mainly in the plane
^Z
the:2
rain:1
in:2
spain:1
falls:1
mainly:1
plane:1
I am making a language translator, and want to read from the buffer word by word and store them in a key-value struct.
The buffer contains such a file:
hola:hello
que:what
and so on. I have already tried everything I could think of, and I keep getting errors such as segmentation fault: 11, or the same line being read again and again.
struct key_value{
char *key;
char *value;
};
...
struct key_value *kv = malloc(sizeof(struct key_value) * count);
char k[20]; //key
char v[20]; //value
int x = 0;
for(i = 0; i < numbytes; i++){
sscanf(buffer,"%21[^:]:%21[^\n]\n",k,v);
(kv + i)->key = k;
(kv + i)->value = v;
}
for(i = 0; i < count; i++){
printf("key: %s, value: %s\n",(kv + i)->key,(kv + i)->value);
}
free(buffer);
free(kv);
I expect the output to be key: hola, value: hello key: que, value: what,
but the actual output is just key: hola, value: hello again and again.
Which is the right way to do it?
There are multiple problems with your code, among them:
On each loop iteration, you read from the beginning of the buffer. It is natural, then, that each iteration extracts the same key and value.
More generally, your read loop iteration variable seems to have no relationship with the data read. It appears to be a per-byte iteration, but you seem to want a per-line iteration. You might want to look into scanf's %n directive to help you track progress through the buffer.
You are scanning each key / value pair into the same local k and v variables, then you are assigning pointers to those variables to your structures. The resulting pointers are all the same, and they will become invalid when the function returns. I suggest giving struct key_value arrays for its members instead of pointers, and copying the data into them.
Your sscanf format reads up to 21 characters each for key and value, but the provided destination arrays are not long enough for that. You need them to be dimensioned for at least 22 characters to hold 21 plus a string terminator.
Your sscanf() format and usage do not support recognition of malformed input, especially overlength keys or values. You need to check the return value, and you probably need to match the trailing newline with a %c field (the literal newline in the format does not mean what you think it means).
Tokenizing (the whole buffer) with strtok_r or strtok or even strchr instead of sscanf() might be easier for you.
Also, style note: your expressions of the form (kv + i)->key are valid, but it would be more idiomatic to write kv[i].key.
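As a rough illustration of the tokenizing suggestion, here is a minimal sketch that combines strtok and strchr with array members in the struct. The name parse_pairs and the constant FIELD_LEN are invented for this example, and skipping malformed lines is just one possible policy.

```c
#include <string.h>

#define FIELD_LEN 22   /* room for 21 characters plus the terminator */

struct key_value {
    char key[FIELD_LEN];
    char value[FIELD_LEN];
};

/* Splits buffer (which is modified in place) into lines of the form key:value.
   Stores at most max pairs in kv and returns how many were stored.
   Lines without ':' or with overlong fields are skipped. */
int parse_pairs(char *buffer, struct key_value *kv, int max)
{
    int n = 0;
    char *line;
    for (line = strtok(buffer, "\r\n"); line != NULL && n < max;
         line = strtok(NULL, "\r\n")) {
        char *sep = strchr(line, ':');
        if (sep == NULL)
            continue;               /* malformed line: no separator */
        *sep++ = '\0';              /* line is now the key, sep the value */
        if (strlen(line) >= FIELD_LEN || strlen(sep) >= FIELD_LEN)
            continue;               /* field too long for the struct */
        strcpy(kv[n].key, line);    /* copy the data, do not store pointers */
        strcpy(kv[n].value, sep);
        ++n;
    }
    return n;
}
```

Because strtok writes terminators into the buffer, this only works on a writable buffer, not on a string literal.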
I've written a simple piece of code that may help you solve your problem. I've used the function fgets to read from a file named "file.txt" and the function strchr to locate the first occurrence of the separator ':'.
Here is the code:
#include <stdio.h>
#include <string.h>
#include <errno.h>
#define MAX_LINE_SIZE 256
#define MAX_DECODED_LINE 1024
struct decod {
char key[MAX_LINE_SIZE];
char value[MAX_DECODED_LINE];
};
static struct decod decod[1024];
int main(void)
{
FILE * fptr = NULL;
char fbuf[MAX_LINE_SIZE];
char * value;
int cnt=0,i;
if ( !(fptr=fopen("file.txt","r")) )
{
perror("");
return errno;
}
while( fgets(fbuf,MAX_LINE_SIZE,fptr)) {
// Eliminate UNIX/DOS line terminator
value=strrchr(fbuf,'\n');
if (value) *value=0;
value=strrchr(fbuf,'\r');
if (value) *value=0;
//Find first occurrence of the separator ':'
value=strchr(fbuf,':');
if (value) {
// Truncates fbuf string to first word
// and (++) points second word
*value++=0;
}
if (cnt<MAX_DECODED_LINE) {
strcpy(decod[cnt].key,fbuf);
if (value!=NULL) {
strcpy(decod[cnt].value,value);
} else {
decod[cnt].value[0]=0;
}
cnt++;
} else {
fprintf(stderr,
"Cannot read more than %d lines\n", MAX_DECODED_LINE);
break;
}
}
if (fptr)
fclose(fptr);
for(i=0;i<cnt;i++) {
printf("key:%s\tvalue:%s\n",decod[i].key,decod[i].value);
}
return 0;
}
This code reads all the lines (max 1024) that the file named file.txt contains, loads all the key/value pairs it finds (max 1024) into the struct array decod, and then prints the content of the array.
I wrote this code, and I think it does the job. I think it is simpler than the accepted answer, and it uses only as much memory as is needed.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
struct key_value{
char key[22];
char value[22];
};
void parse_str(char* str, struct key_value** kv_arr, int* num){
int n = 0;
int read = -1;
char k[22];
char v[22];
int current_pos = 0;
int consumed = 0;
/*counting number of key-value pairs*/
while (1){
if(current_pos >= strlen(str)){
break; /* reached the end of the input */
}
consumed = 0;
read = sscanf(str + current_pos, "%21[^:]:%21[^\n]\n%n", k, v, &consumed);
if(read != 2 || consumed == 0){
break; /* malformed pair: stop rather than advance by a stale offset */
}
current_pos += consumed;
++n;
}
printf("n = %d\n", n);
*kv_arr = malloc(sizeof(struct key_value) * n);
/*filling key_value array*/
int i = 0;
read = -1;
current_pos = 0;
consumed = 0;
while (i < n){ /* never store more pairs than were counted above */
consumed = 0;
read = sscanf(str + current_pos, "%21[^:]:%21[^\n]\n%n", k, v, &consumed);
if(read != 2 || consumed == 0){
break;
}
current_pos += consumed;
struct key_value* kv = &((*kv_arr)[i]);
strncpy(kv->key, k, 22);
strncpy(kv->value, v, 22);
++i;
}
*num = n;
}
int main(){
char* str = "hola:hello\n"
"que:what\n";
int n;
struct key_value* kv_arr;
parse_str(str, &kv_arr, &n);
for (int i = 0; i < n; ++i) {
printf("%s <---> %s\n", kv_arr[i].key, kv_arr[i].value);
}
free(kv_arr);
return 0;
}
output :
n = 2
hola <---> hello
que <---> what
Note: sscanf operates on a const char*, not an input stream from a file, so it will NOT store any information about what it has consumed.
Solution: I used %n in the format string to get the number of characters consumed so far (%n has been part of standard C since C89).
I am reading a variable number of lines of text using fgets. The end of each line is marked by a newline. I am trying to allocate an array of strings (using malloc). Each index of the array will point to one line of the text entered by the user. This was really tough to explain, so here is an example:
Using fgets and malloc, allow the user to enter multiple lines of strings. The user will signal the end of the strings with a '.' on a newline. Each line of the multiple strings will be stored as a string in a dynamically allocated array. The output of the program must print each line in reverse order.
Ex:
Enter string:
(this is sample input)
The sky is blue
The grass is green
I love life
.
(this should be the output)
I love life
The grass is green
The sky is blue
I have this so far:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char charsOnLine[1000];
char **poem;
int i;
int j;
fgets(charsOnLine, 1000, stdin); //runs only once
while (charsOnLine[0] != '.')
{
poem = malloc(sizeof(char*) * 3);
for(j = 0; j < strlen(charsOnLine); j++)
{
poem[j] = malloc(sizeof(strlen(charsOnLine)));
strcpy(poem[j], charsOnLine);
}
fgets(charsOnLine, 1000, stdin);
}
for (j = 0; j < strlen(*poem); j++) //test to print each line of the poem (not in reverse)
{
printf("%s\n",poem[j]);
}
return 0;
}
I have just started with double pointers, pointers, dynamically allocating memory and fgets(), and putting them all together is giving me some trouble.
In my code, I'm testing to see if it'll print each line I enter the same way I entered it, but it is printing the last entered string 4 times instead of each different line.
Once I figure out how to print each entered line, I will figure out how to print them backwards.
There's a rather simple solution to your "in reverse order" problem, which does not even require dynamic memory allocation but just recursion:
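void readRecursive(void) {
char charsOnLine[1000];
if (fgets(charsOnLine, 1000, stdin) && charsOnLine[0] != '.') {
readRecursive(); /* read and print all later lines first */
fputs(charsOnLine, stdout); /* then print this one; the "." line and a failed read are never printed */
}
}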
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char charsOnLine[1000];
char **poem = NULL;
size_t size = 0, current;
while (fgets(charsOnLine, 1000, stdin) != NULL && charsOnLine[0] != '.')
{
char **tmp;
tmp = realloc(poem, (size + 1) * sizeof(*poem));
if(tmp)
{
poem = tmp;
poem[size] = malloc(strlen(charsOnLine) + 1);
if(poem[size])
{
strcpy(poem[size++], charsOnLine);
}
}
}
current = size;
while(current--)
{
fputs(poem[current], stdout);
}
/* do something more with your read poem */
return 0;
}
I am writing a program (for a class assignment) to translate normal words into their pirate equivalents (hi = ahoy).
I have created the dictionary using two arrays of strings and am now trying to translate an input.txt file and put it into an output.txt file. I am able to write to the output file, but it only writes the translated first word over and over on a new line.
I've done a lot of reading/scouring and from what I can tell, using fscanf() to read my input file isn't ideal, but I cannot figure out what would be a better function to use. I need to read the file word by word (separated by space) and also read in each punctuation mark.
Input File:
Hi, excuse me sir, can you help
me find the nearest hotel? I
would like to take a nap and
use the restroom. Then I need
to find a nearby bank and make
a withdrawal.
Miss, how far is it to a local
restaurant or pub?
Output: ahoy (46 times, each on a separate line)
Translate Function:
void Translate(char inputFile[], char outputFile[], char eng[][20], char pir[][20]){
char currentWord[40] = {[0 ... 39] = '\0'};
char word;
FILE *inFile;
FILE *outFile;
int i = 0;
bool match = false;
//open input file
inFile = fopen(inputFile, "r");
//open output file
outFile = fopen(outputFile, "w");
while(fscanf(inFile, "%s1023", currentWord) == 1){
if( ispunct(currentWord) == 0){
while( match != true){
if( strcasecmp(currentWord, eng[i]) == 0 || i<28){ //Finds word in English array
fprintf(outFile, pir[i]); //Puts pirate word corresponding to English word in output file
match = true;
}
else {i++;}
}
match = false;
i=0;
}
else{
fprintf(outFile, &word);//Attempt to handle punctuation which should carry over to output
}
}
}
As you start matching against different english words, i<28 is initially true. Hence the expression <anything> || i<28 is also immediately true and correspondingly the code will behave as though a match was found on the first word in your dictionary.
To avoid this, you should handle the "found a match at index i" and the "no match found" conditions separately. This can be achieved as follows:
if (i >= dictionary_size) {
// No pirate equivalent, print English word
fprintf(outFile, "%s", currentWord);
break; // stop matching
}
else if (strcasecmp(currentWord, eng[i]) == 0){
...
}
else {i++;}
where dictionary_size would be 28 in your case (based on your attempt at a stop condition with i<28).
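Put together as a stand-alone lookup, the idea might look like the sketch below. The function names translate_word and equal_ignore_case are invented for this illustration, and the hand-written comparison is only a portable stand-in for the POSIX strcasecmp used in the question.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison; a portable stand-in for POSIX strcasecmp. */
static int equal_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;    /* equal only if both strings ended together */
}

/* Returns the pirate word for word, or word itself if there is no match. */
const char *translate_word(const char *word, char eng[][20], char pir[][20],
                           size_t dictionary_size)
{
    size_t i;
    for (i = 0; i < dictionary_size; ++i)   /* "match found at index i" */
        if (equal_ignore_case(word, eng[i]))
            return pir[i];
    return word;                            /* "no match found" */
}
```

The caller can then write the returned string with fprintf(outFile, "%s", ...) whether or not a translation exists, which also avoids passing file data as a format string.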
Here's a code snippet that I use to parse things out. Here's what it does:
Given this input:
hi, excuse me sir, how are you.
It puts each word into an array of strings based on the DELIMS constant, and deletes any char in the DELIMS const. This will destroy your original input string though. I simply print out the array of strings:
[hi][excuse][me][sir][how][are][you]
Now this is taking input from stdin, but you can change it around to take it from a file stream. You also might want to consider input limits and such.
#include <stdio.h>
#include <string.h>
#define CHAR_LENGTH 100
const char *DELIMS = " ,.\n";
char *p;
int i;
/* Splits inputLine in place and stores at most CHAR_LENGTH token pointers in arguments. */
int parse(char *inputLine, char *arguments[], const char *delimiters)
{
int count = 0;
for (p = strtok(inputLine, delimiters); p != NULL && count < CHAR_LENGTH; p = strtok(NULL, delimiters))
{
arguments[count] = p;
count++;
}
return count;
}
int main()
{
char line[1024];
size_t bufferSize = 1024;
char *args[CHAR_LENGTH];
if (fgets(line, bufferSize, stdin) == NULL)
return 1; /* no input */
int count = parse(line, args, DELIMS);
for (i = 0; i < count; i++){ /* only the first count entries of args are set */
printf("[%s]", args[i]);
}
}
I am a newbie in C and would be glad for any help with this program.
Task:
The user will enter 4-7 letters (for example 'ADFG').
I have a separate text file which contains several thousand words
(for example:
BDF
BGFK
JKLI
NGJKL
POIUE
etc.)
(the file is written as a plain list, without those marks).
I want to make a program which finds the words in this text file that contain any of the letters the user entered (in this case, when I enter ADFG, it will find and display BDF, BGFK, NGJKL).
This is my code so far:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char enter[8],row[80];
printf("4-7 different letteres: ");
gets(enter);
if (strlen(enter)>7 || strlen(enter)<4)
{
puts("incorrect number of letters");
return 1;
}
typedef struct structure
{
char row[50];
}Row;
Row textrow[40000];
FILE *file;
file = fopen("words.txt","r");
if (file==NULL)
{
printf("Error! File %s can not be opened.","words.txt");
return 1;
}
int i=0;
char words[30];
while (!feof(file))
{
fscanf(file,"%s",&textrow[i].row[0]);
for(int j=0;j<strlen(enter);j++)
{
for(int k=0;k<strlen(textrow[i].row);k++)
{
words=strchr(textrow[i].row[k],enter[j]);
printf("%s",words);
}
}
i++;
}
fclose(file);
return 0;
}
Thanks for any help.
e.g. use strpbrk
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(){
char *words[] = {
"BDF", "BGFK", "JKLI", "NGJKL", "POIUE", NULL
};
char enter[8] = "ADFG";
char **p;
for(p = words; *p ; ++p){
char *word = *p;
if(strpbrk(word, enter)!=NULL){
printf("%s\n", word);
}
}
/*
BDF
BGFK
NGJKL
*/
return 0;
}
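To apply the same strpbrk test to words read from a file instead of a fixed array, something like the following sketch could be used. The function name print_matches and the 80-character row buffer are assumptions for this example; it expects one word per line.

```c
#include <stdio.h>
#include <string.h>

/* Reads one word per line from in and writes every word that contains at
   least one character of letters to out. Returns the number of matches. */
int print_matches(FILE *in, const char *letters, FILE *out)
{
    char row[80];
    int matches = 0;
    while (fgets(row, sizeof row, in) != NULL) {
        row[strcspn(row, "\r\n")] = '\0';   /* strip the line terminator */
        if (row[0] != '\0' && strpbrk(row, letters) != NULL) {
            fprintf(out, "%s\n", row);
            ++matches;
        }
    }
    return matches;
}
```

A call such as print_matches(file, enter, stdout), made after a successful fopen of words.txt, would replace the feof/fscanf loop from the question.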
As others have commented, even if your code can be made to compile and build, the real problem is with the approach: you need to identify the steps necessary to accomplish what you want, i.e. create the algorithm first. Here are some steps to do what I think you want to do:
Problem statement: Given a text file with many words, and user input of 1 or more characters, find all words in file that contain one or more of the characters user entered.
Steps: (approach)
1) Read all words of text file into a string array
2) Get user input string into char array
3) In Loop: Check each char of user input to each char of each string
len = strlen(userInputStr);
for(i=0;i<numWords;i++) //numWords is number of words in file
{
len2 = strlen(word[i]);//length of next word in file
for(j=0;j<len;j++)
{
for(k=0;k<len2;k++)
{
//check each char of user input against each char of current word
}
}
}
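The innermost check for a single word could be written as in this minimal sketch. The function name sharesLetter is made up for the illustration; it would be called once per word[i] from the outer loop.

```c
#include <string.h>

/* Returns 1 if word contains at least one character of userInputStr, else 0. */
int sharesLetter(const char *word, const char *userInputStr)
{
    size_t len = strlen(userInputStr);  /* length of the user input */
    size_t len2 = strlen(word);         /* length of the current word */
    size_t j, k;
    for (j = 0; j < len; j++)
        for (k = 0; k < len2; k++)
            if (userInputStr[j] == word[k])
                return 1;               /* one shared character is enough */
    return 0;
}
```

Returning as soon as one character matches avoids printing the same word several times when it shares more than one letter with the input.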
It's not something trivial but I would like to know the best way to process multiple outputs, for example:
Input
First line of input will contain a number T = number of test cases. Following lines will contain a string each.
Output
For each string, print on a single line, "UNIQUE" - if the characters are all unique, else print "NOT UNIQUE"
Sample Input
3
DELHI
london
#include<iostream>
Sample Output
UNIQUE
NOT UNIQUE
NOT UNIQUE
So how can I accomplish outputs like that? My code so far is:
int main(int argc, char *argv[])
{
int inputs, count=0;
char str[100];
char *ptr;
scanf("%d",&inputs);
while(inputs-- >0)
{
scanf("%s",str);
for(ptr=str; *ptr!='\0';ptr++)
{
if( *ptr== *(ptr+1))
{
count++;
}
}
if(count>0)
{
printf("NOT UNIQUE");
}
else
{
printf("UNIQUE");
}
}
}
The above obviously prints the output after each input, but I want the output only after all the inputs have been entered: if the user enters 3, then the user has to give 3 strings, and only afterwards should the output say whether each string is unique or not. So I want to know how I can achieve the result given in the problem. Another thing I want to know: I am using an array of 100 char, which can hold a string of up to 99 characters plus the terminator, but what do I have to do if I want to handle strings with no limit? Just declaring char *str is no good, so what should I do?
Hope this helps:
#include <stdio.h>
int main(int argc, char *argv[])
{
int inputs,count=0;
char str[20];
scanf("%d",&inputs);
char *ptr;
char *dummy;
while(inputs-- >0)
{
count=0; /* reset the flag for each test case */
scanf("%19s",str); /* limit the input to the size of str */
for(ptr=str; *ptr!='\0';ptr++)
{
for(dummy=ptr+1; *dummy != '\0';dummy++)
{
if( *ptr== *dummy)
{
count=1;
}
}
if(count == 1)
break;
}
if(count>0)
{
printf("NOT UNIQUE\n");
}
else
{
printf("UNIQUE\n");
}
}
}
If you want to save stuff for later use, you must store it somewhere. The example below stores up to 10 lines in buf and then points str to the current line:
#include <stdlib.h>
#include <stdio.h>
#include <string.h> /* for strlen */
#include <ctype.h> /* for isspace */
int main(int argc, char *argv[])
{
int ninput = 0;
char buf[10][100]; /* storage for 10 strings */
char *str; /* pointer to current string */
int i;
printf("Enter up to 10 strings, blank to end input:\n");
for (i = 0; i < 10; i++) {
int l;
str = buf[i];
/* read line and break on end-of-file (^D) */
if (fgets(str, 100, stdin) == NULL) break;
/* delete trailing newline & spaces */
l = strlen(str);
while (l > 0 && isspace(str[l - 1])) l--;
str[l] = '\0';
/* break loop on empty input */
if (l == 0) break;
ninput++;
}
printf("Your input:\n");
for (i = 0; i < ninput; i++) {
str = buf[i];
printf("[%d] '%s'\n", i + 1, str);
}
return 0;
}
Note the two separate loops for input and output.
I've also rejigged your input. I'm not very fond of fscanf; I prefer to read input line-wise with fgets and then analyse the line with strtok or sscanf. The advantage over fscanf is that your strings may contain white-space. The drawback is that you have a newline at the end which you usually don't want and have to "chomp".
If you want to allow for longer strings, you should use dynamic allocation with malloc, although I'm not sure if it is useful when reading user input from the console. Tackle that when you have understood the basics of fixed-size allocation on the stack.
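For reference, one common way to read a line of arbitrary length is to grow the buffer with realloc as characters arrive. The sketch below is only an illustration; the function name read_line and the starting capacity of 16 are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads one line of any length from in. Returns a malloc'ed string without
   the trailing newline, or NULL on end-of-file or allocation failure.
   The caller must free the result. */
char *read_line(FILE *in)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int c;
    if (buf == NULL)
        return NULL;
    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (len + 1 == cap) {               /* keep room for the terminator */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {             /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Doubling the capacity keeps the number of realloc calls small even for very long lines.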
Other people have already pointed you to the error in your check for uniqueness.
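For completeness, a uniqueness check that does not depend on repeated characters being adjacent could look like this sketch. The function name all_unique is invented for the example; it treats upper and lower case as different characters, which is an assumption about the task.

```c
#include <limits.h>

/* Returns 1 if no character occurs twice in s, else 0. */
int all_unique(const char *s)
{
    unsigned char seen[UCHAR_MAX + 1] = {0};    /* one flag per char value */
    for (; *s != '\0'; ++s) {
        unsigned char c = (unsigned char)*s;
        if (seen[c])
            return 0;                           /* c was already seen */
        seen[c] = 1;
    }
    return 1;
}
```

To print everything only after all the input has been read, the result of all_unique for each test case can be stored in an int array in the input loop and printed in a second loop.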