to find the frequency of words in a text file using java - hashset

Ive managed to parse the entire contents of a given input text file and store each word in a hash set. But now i need to find the frequenct of each of these words in this input file, any suggestions as to how I can go about? :)

Use a HashMap instead of a HashSet and this class as the value:
class Counter {
public int frequency;
}
addWord() then looks like this:
public void addWord (String word) {
Counter c = map.get (word);
if (c == null) {
c = new Counter ();
map.put(word, c);
}
c.frequency ++;
}

Related

Assertion error/exception on basic while loop

Hello everyone I'm currently trying to learn java using University of Helsinki's MOOC Java course. I'm currently working on this problem:
Write a program that reads values from the user until they input a 0. After this, the program prints the total number of inputted values that are negative. The zero that's used to exit the loop should not be included in the total number count.
I'm running into this error: java.lang.AssertionError: Something strange happened. It may be that 'class NumberOfNegativeNumbers'class's public static void main(String[] args) method is missing
or your program crashed due to an exception. More information java.util.NoSuchElementException: No line found
I'd like to solve this without using assertions or catch's because it hasn't been covered, and just understand why this is happening. I've tried replacing the nextInt() with nextLine() and converting the string into an int using Integer.valueOf, as well as adding parameters for the else statement so that if num == 0 it would break, though it shouldn't be needed, as well as moving around my conditional statements.
import java.util.Scanner;
public class NumberOfNegativeNumbers {
public static void main(String[] args) {
Scanner scanner = new Scanner(System.in);
int count = 0;
while (true) {
System.out.println("Give a number: ");
int num = scanner.nextInt();
if (num <= 0){
count = count + 1;
continue;
} else if (num >= 0) {
continue;
} else {
break;
}
}
System.out.println("Number of negative numbers: " + count);
}
}
Looks like, according to an example I found on the internet for Scanner.nextInt(), it's expected to be used with Scanner.hasNextInt(), eg this example I copied from a tutorial site:
package com.tutorialspoint;
import java.util.*;
public class ScannerDemo {
public static void main(String[] args) {
String s = "Hello World! 3 + 3.0 = 6.0 true ";
// create a new scanner with the specified String Object
Scanner scanner = new Scanner(s);
// find the next int token and print it
// loop for the whole scanner
while (scanner.hasNext()) {
// if the next is a int, print found and the int
if (scanner.hasNextInt()) {
System.out.println("Found :" + scanner.nextInt());
}
// if no int is found, print "Not Found:" and the token
System.out.println("Not Found :" + scanner.next());
}
// close the scanner
scanner.close();
}
}
So, likely you need to wrap your code in that hasNextInt() check.
Also, you might need to double check the control flow that will occur if the user enters a zero. A prompt: if scanner.nextInt() returns a 0, what happens in the code you posted? Another hint might be that only two characters need to change to fix the issue I'm pointing out obliquely here.

Strcmp generate a core dump

So i have a std::unordered_map, i want to acces to strings stored intro this map. I want to search intro intro all words inside the map and compare with a given word. If the strings are same then continue execution of the if statement.
{
public:
bool CheckFoo(const char* word);
protected:
typedef std::unordered_map<std::string, bool> word_map;
word_map words_map;
};
bool CheckFoo(const char* word)
{
if (words_map.empty())
{
return false;
}
auto it = words_map.begin();
while (it != words_map.end())
{
const std::string &r = it->first;
const char* tmp = word;
if (strcmp(tmp, r.c_str() ) == 0)
{
return true;
}
}
return false;
}
if ( CheckFoo("wordFoo") )
{
// bla bla
}
The problem is that those codes generate a .core dump file..
Do you see any mistakes in my codes?
The crash core analyze point me to strcmp line
Can't write comments yet but,
Like Nunchy wrote, tmp is not defined in that context.
I also noticed that your code never increments the map iterator, which would result in a never ending loop.
I'm assuming you did not copy your actual code into your post but instead rewrote it hastily which resulted in some typos, but if not, try making sure you're using temp and not tmp in your call to strcmp, and make sure the loop actually increments the iterator.
Like one of the comments on your post points out as well, make sure you actually have data in the map, and the function parameter.
You are declaring temp then referencing tmp which doesn't exist:
const char* temp = word;
if (strcmp(tmp, r.c_str() ) == 0)
Does this compile? Surely it should be:
const char* temp = word;
if (strcmp(temp, r.c_str() ) == 0)
?

How to Convert Binary from a TXT file in C /txt including only int/

The part of my code that I'm asking about looks like this.
My TXT is containing number from 1-20 divided by . I want to make a BINARY file
from this txt, that's what the program supposed to do, but it is only feeling it up with memory dirt. Can you tell me if my code has mistakes.
void txt_to_bin (void)
{
FILE *ft,*fb;
int a;
ft = fopen("binadatok.txt","rt");
fb = fopen("versenyazonosito.dat","wb");
while (fscanf(ft,"%d\n",&a) != EOF)
{
fprintf(fb,"%d\n");
}
}
You need to use fwrite when writing to a binary file, not fprintf:
fwrite(&a, sizeof(a), 1, fb);
you are not providing any value in fprintf(fb,"%d\n") you should provide input of a in this statement .
void txt_to_bin (void)
{
FILE *ft,*fb;
int a;
ft = fopen("binadatok.txt","rt");
fb = fopen("versenyazonosito.dat","wb");
while (fscanf(ft,"%d\n",&a) != EOF)
{
fprintf(fb,"%d\n",a);
}
}
now it will work.

bilingual program in console application in C

I have been trying to implement a way to make my program bilingual : the user could chose if the program should display French or English (in my case).
I have made lots of researches and googling but I still cannot find a good example on how to do that :/
I read about gettext, but since this is for a school's project we are not allowed to use external libraries (and I must admit I have nooo idea how to make it work even though I tried !)
Someone also suggested to me the use of arrays one for each language, I could definitely make this work but I find the solution super ugly.
Another way I thought of is to have to different files, with sentences on each line and I would be able to retrieve the right line for the right language when I need to. I think I could make this work but it also doesn't seem like the most elegant solution.
At last, a friend said I could use DLL for that. I have looked up into that and it indeed seems to be one of the best ways I could find... the problem is that most resources I could find on that matter were coded for C# and C++ and I still have no idea how I would do to implement in C :/
I can grasp the idea behind it, but have no idea how to handle it in C (at all ! I do not know how to create the DLL, call it, retrieve the right stuff from it or anything >_<)
Could someone point me to some useful resources that I could use, or write a piece of code to explain the way things work or should be done ?
It would be seriously awesome !
Thanks a lot in advance !
(Btw, I use visual studio 2012 and code in C) ^^
If you can't use a third party lib then write your own one! No need for a dll.
The basic idea is the have a file for each locale witch contains a mapping (key=value) for text resources.
The name of the file could be something like
resources_<locale>.txt
where <locale> could be something like en, fr, de etc.
When your program stars it reads first the resource file for specified locale.
Preferably you will have to store each key/value pair in a simple struct.
Your read function reads all key/value pair into a hash table witch offers a very good access speed. An alternative would be to sort the array containing the key/value pairs by key and then use binary search on lookup (not the best option, but far better than iterating over all entries each time).
Then you'll have to write a function get_text witch takes as argument the key of the text resource to be looked up an return the corresponding text in as read for the specified locale. You have to handle keys witch have no mapping, the simplest way would be to return key back.
Here is some sample code (using qsort and bsearch):
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define DEFAULT_LOCALE "en"
#define NULL_ARG "[NULL]"
typedef struct localized_text {
char* key;
char* value;
} localized_text_t;
localized_text_t* localized_text_resources = NULL;
int counter = 0;
char* get_text(char*);
void read_localized_text_resources(char*);
char* read_line(FILE*);
void free_localized_text_resources();
int compare_keys(const void*, const void*);
void print_localized_text_resources();
int main(int argc, char** argv)
{
argv++;
argc--;
char* locale = DEFAULT_LOCALE;
if(! *argv) {
printf("No locale provided, default to %s\n", locale);
} else {
locale = *argv;
printf("Locale provided is %s\n", locale);
}
read_localized_text_resources(locale);
printf("\n%s, %s!\n", get_text("HELLO"), get_text("WORLD"));
printf("\n%s\n", get_text("foo"));
free_localized_text_resources();
return 0;
}
char* get_text(char* key)
{
char* text = NULL_ARG;
if(key) {
text = key;
localized_text_t tmp;
tmp.key = key;
localized_text_t* result = bsearch(&tmp, localized_text_resources, counter, sizeof(localized_text_t), compare_keys);
if(result) {
text = result->value;
}
}
return text;
}
void read_localized_text_resources(char* locale)
{
if(locale) {
char localized_text_resources_file_name[64];
sprintf(localized_text_resources_file_name, "resources_%s.txt", locale);
printf("Read localized text resources from file %s\n", localized_text_resources_file_name);
FILE* localized_text_resources_file = fopen(localized_text_resources_file_name, "r");
if(! localized_text_resources_file) {
perror(localized_text_resources_file_name);
exit(1);
}
int size = 10;
localized_text_resources = malloc(size * sizeof(localized_text_t));
if(! localized_text_resources) {
perror("Unable to allocate memory for text resources");
}
char* line;
while((line = read_line(localized_text_resources_file))) {
if(strlen(line) > 0) {
if(counter == size) {
size += 10;
localized_text_resources = realloc(localized_text_resources, size * sizeof(localized_text_t));
}
localized_text_resources[counter].key = line;
while(*line != '=') {
line++;
}
*line = '\0';
line++;
localized_text_resources[counter].value = line;
counter++;
}
}
qsort(localized_text_resources, counter, sizeof(localized_text_t), compare_keys);
// print_localized_text_resources();
printf("%d text resource(s) found in file %s\n", counter, localized_text_resources_file_name);
}
}
char* read_line(FILE* p_file)
{
int len = 10, i = 0, c = 0;
char* line = NULL;
if(p_file) {
line = malloc(len * sizeof(char));
c = fgetc(p_file);
while(c != EOF) {
if(i == len) {
len += 10;
line = realloc(line, len * sizeof(char));
}
line[i++] = c;
c = fgetc(p_file);
if(c == '\n' || c == '\r') {
break;
}
}
line[i] = '\0';
while(c == '\n' || c == '\r') {
c = fgetc(p_file);
}
if(c != EOF) {
ungetc(c, p_file);
}
if(strlen(line) == 0 && c == EOF) {
free(line);
line = NULL;
}
}
return line;
}
void free_localized_text_resources()
{
if(localized_text_resources) {
while(counter--) {
free(localized_text_resources[counter].key);
}
free(localized_text_resources);
}
}
int compare_keys(const void* e1, const void* e2)
{
return strcmp(((localized_text_t*) e1)->key, ((localized_text_t*) e2)->key);
}
void print_localized_text_resources()
{
int i = 0;
for(; i < counter; i++) {
printf("Key=%s value=%s\n", localized_text_resources[i].key, localized_text_resources[i].value);
}
}
Used with the following resource files
resources_en.txt
WORLD=World
HELLO=Hello
resources_de.txt
HELLO=Hallo
WORLD=Welt
resources_fr.txt
HELLO=Hello
WORLD=Monde
run
(1) out.exe /* default */
(2) out.exe en
(3) out.exe de
(4) out.exe fr
output
(1) Hello, World!
(2) Hello, World!
(3) Hallo, Welt!
(4) Hello, Monde!
gettext is the obvious answer but it seems it's not possible in your case. Hmmm. If you really, really need a custom solution... throwing out a wild idea here...
1: Create a custom multilingual string type. The upside is that you can easily add new languages afterwards, if you want. The downside you'll see in #4.
//Terrible name, change it
typedef struct
{
char *french;
char *english;
} MyString;
2: Define your strings as needed.
MyString s;
s.french = "Bonjour!";
s.english = "Hello!";
3: Utility enum and function
enum
{
ENGLISH,
FRENCH
};
char* getLanguageString(MyString *myStr, int language)
{
switch(language)
{
case ENGLISH:
return myStr->english;
break;
case FRENCH:
return myStr->french;
break;
default:
//How you handle other values is up to you. You could decide on a default, for instance
//TODO
}
}
4: Create wrapper functions instead of using plain old C standard functions. For instance, instead of printf :
//Function should use the variable arguments and allow a custom format, too
int myPrintf(const char *format, MyString *myStr, int language, ...)
{
return printf(format, getLanguageString(myStr, language));
}
That part is the painful one : you'll need to override every function you use strings with to handle custom strings. You could also specify a global, default language variable to use when one isn't specified.
Again : gettext is much, much better. Implement this only if you really need to.
the main idea of making programs translatable is using in all places you use texts any kind of id. Then before displaying the test you get the text using the id form the appropriate language-table.
Example:
instead of writing
printf("%s","Hello world");
You write
printf("%s",myGetText(HELLO_WORLD));
Often instead of id the native-language string itself is used. e.g.:
printf("%s",myGetText("Hello world"));
Finally, the myGetText function is usually implemented as a Macro, e.g.:
printf("%s", tr("Hello world"));
This macro could be used by an external parser (like in gettext) for identifying texts to be translated in source code and store them as list in a file.
The myGetText could be implemented as follows:
std::map<std::string, std::map<std::string, std::string> > LangTextTab;
std::string GlobalVarLang="en"; //change to de for obtaining texts in German
void readLanguagesFromFile()
{
LangTextTab["de"]["Hello"]="Hallo";
LangTextTab["de"]["Bye"]="Auf Wiedersehen";
LangTextTab["en"]["Hello"]="Hello";
LangTextTab["en"]["Bye"]="Bye";
}
const char * myGetText( const char* origText )
{
return LangTextTab[GlobalVarLang][origText ].c_str();
}
Please consider the code as pseudo-code. I haven't compiled it. Many issues are still to mention: unicode, thread-safety, etc...
I hope however the example will give you the idea how to start.

Splitting a comma-delimited string of integers

My background is not in C (it's in Real Studio - similar to VB) and I'm really struggling to split a comma-delimited string since I'm not used to low-level string handling.
I'm sending strings to an Arduino over serial. These strings are commands in a certain format. For instance:
#20,2000,5!
#10,423,0!
'#' is the header indicating a new command and '!' is the terminating footer marking the end of a command. The first integer after '#' is the command id and the remaining integers are data (the number of integers passed as data may be anywhere from 0 - 10 integers).
I've written a sketch that gets the command (stripped of the '#' and '!') and calls a function called handleCommand() when there is a command to handle. The problem is, I really don't know how to split this command up to handle it!
Here's the sketch code:
String command; // a string to hold the incoming command
boolean commandReceived = false; // whether the command has been received in full
void setup() {
// put your setup code here, to run once:
Serial.begin(9600);
}
void loop() {
// main loop
handleCommand();
}
void serialEvent(){
while (Serial.available()) {
// all we do is construct the incoming command to be handled in the main loop
// get the incoming byte from the serial stream
char incomingByte = (char)Serial.read();
if (incomingByte == '!')
{
// marks the end of a command
commandReceived = true;
return;
}
else if (incomingByte == '#')
{
// marks the start of a new command
command = "";
commandReceived = false;
return;
}
else
{
command += incomingByte;
return;
}
}
}
void handleCommand() {
if (!commandReceived) return; // no command to handle
// variables to hold the command id and the command data
int id;
int data[9];
// NOT SURE WHAT TO DO HERE!!
// flag that we've handled the command
commandReceived = false;
}
Say my PC sends the Arduino the string "#20,2000,5!". My sketch ends up with a String variable (called command) that contains "20,2000,5" and the commandRecieved boolean variable is set to True so the handleCommand() function is called.
What I would like to do in the (currently useless) handleCommand() function is assign 20 to a variable called id and 2000 and 5 to an array of integers called data, i.e: data[0] = 2000, data[1] = 5, etc.
I've read about strtok() and atoi() but frankly I just can't get my head around them and the concept of pointers. I'm sure my Arduino sketch could be optimised too.
Since you're using the Arduino core String type, strtok and other string.h functions aren't appropriate. Note that you can change your code to use standard C null-terminated strings instead, but using Arduino String will let you do this without using pointers.
The String type gives you indexOf and substring.
Assuming a String with the # and ! stripped off, finding your command and arguments would look something like this:
// given: String command
int data[MAX_ARGS];
int numArgs = 0;
int beginIdx = 0;
int idx = command.indexOf(",");
String arg;
char charBuffer[16];
while (idx != -1)
{
arg = command.substring(beginIdx, idx);
arg.toCharArray(charBuffer, 16);
// add error handling for atoi:
data[numArgs++] = atoi(charBuffer);
beginIdx = idx + 1;
idx = command.indexOf(",", beginIdx);
}
data[numArgs++] = command.substring(beginIdx);
This will give you your entire command in the data array, including the command number at data[0], while you've specified that only the args should be in data. But the necessary changes are minor.
seems to work, could be buggy:
#include<stdio.h>
#include <string.h>
int main(){
char string[]="20,2000,5";
int a,b,c;
sscanf(string,"%i,%i,%i",&a,&b,&c);
printf("%i %i %i\n",a,b,c);
a=b=c=0;
a=atoi(strtok(string,","));
b=atoi(strtok(0,","));
c=atoi(strtok(0,","));
printf("%i %i %i\n",a,b,c);
return 0;
}

Resources