Is it possible, in any language (it doesn't matter), to have a hash function which uses an array of strings as keys?
I mean something like this:
hash(["word1", "word2", ...]) = "element"
instead of the classical:
hash("word") = "element"
I need something like that because each word I would like to use as key could change the output element of the function. I've got a sequence of words and I want a specific output for that sequence (also the order may change the result).
Sure. Any data structure at all can be hashed. You only need to come up with a strict definition of equality and then ensure that hash(A) == hash(B) if A == B. Suppose your definition is that [s1, s2, ..., sm] == [t1, t2, ..., tn] if and only if m == n and si == ti for i = 1..m, and further that string s == t if and only if |s| == |t| and s[i] == t[i] for 0 <= i < |s|. You can build a hash in many, many ways:
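Concatenate all the strings in the list and hash the result with any string hash function. (Note that ["ab", "c"] and ["a", "bc"] then always collide.)
Do the same, adding separators such as commas (,) between the strings.
Hash each string individually and xor the results. (This ignores order, so it only fits an order-insensitive definition of equality.)
Hash each string individually, shift the previous hash value, and xor the new value into the hash. (This makes the result depend on order.)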
Infinitely many more possibilities...
A rigorous definition of equality is important. If, for example, order doesn't matter in the lists or the string comparison is case-insensitive, then the hash function must still be designed to ensure hash(A) == hash(B) if A == B. Getting this wrong will cause lookups to fail.
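The "hash each string, shift, then xor" option can be sketched in C as follows. This is only an illustration: the function names hash_string and hash_string_array are made up for this example, FNV-1a is just one possible per-string hash, and rotating by 5 bits is an arbitrary choice of shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a hash of one NUL-terminated string (one of many possible string hashes). */
static uint32_t hash_string(const char *s)
{
    uint32_t h = 2166136261u;            /* FNV offset basis */
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* Order-sensitive hash of an array of n strings (hypothetical helper):
   rotate the running hash, then xor in the hash of the next string. */
static uint32_t hash_string_array(const char *const *words, size_t n)
{
    uint32_t h = 0;
    assert(words != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        h = (h << 5) | (h >> 27);        /* rotate left by 5 bits */
        h ^= hash_string(words[i]);
    }
    return h;
}
```

Equal arrays always get the same hash, which is the property a hash table needs. Because of the rotation, swapping two words will normally change the result, although collisions remain possible as with any hash.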
Java is one language that lets you define a hash function for any data type. And in fact a standard library list of strings, such as ArrayList<String>, will work just fine as a key using its default hash function.
HashMap<ArrayList<String>, String> map = new HashMap<ArrayList<String>, String>();
ArrayList<String> key = new ArrayList<String>();
key.add("Hello");
key.add("World");
map.put(key, "It's me.");
// map now contains mapping ["Hello", "World"] -> "It's me."
Yes, it is possible, but in most cases you will have to define your own hash function that translates an array into a hash key. For example, in Java, array.hashCode() is based on the Object.hashCode() function, which is based on the reference itself and not on the contents of the object.
You may also have a look at the Arrays.deepHashCode() function in Java if you are interested in an implementation of a hashing function built on top of an array.
I'm new to Scala/Java and I'm having trouble understanding the difference between Array and ArrayBuffer.
By reading the Scala docs I understood that ArrayBuffers are made to be interactive (append, insert, prepend, etc.).
1) What are the fundamental implementation differences?
2) Is there performance variation between those two?
Both Array and ArrayBuffer are mutable, which means that you can modify elements at particular indexes: a(i) = e
ArrayBuffer is resizable, Array isn't. If you append an element to an ArrayBuffer, it gets larger. If you try to append an element to an Array, you get a new array. Therefore, to use Arrays efficiently, you must know their size beforehand.
Arrays are implemented on JVM level and are the only non-erased generic type. This means that they are the most efficient way to store sequences of objects – no extra memory overhead, and some operations are implemented as single JVM opcodes.
ArrayBuffer is implemented by having an Array internally, and allocating a new one if needed. Appending is usually fast, unless it hits the capacity limit and resizes the array, but the resizing is done in such a way that appending takes amortized constant time, so don't worry. Prepending is implemented as moving all elements to the right and setting the new one as the 0th element, and it's therefore slow. Appending n elements in a loop is efficient (O(n)), prepending them is not (O(n²)).
Arrays are specialized for built-in value types (except Unit), so Array[Int] is going to be much more efficient than ArrayBuffer[Int]: the values won't have to be boxed, therefore using less memory and less indirection. Note that the specialization, as always, works only if the type is monomorphic: Array[T] will always be boxed.
One other difference: an Array's elements exist as soon as it is created, whereas an ArrayBuffer starts out empty and has no elements until you add them.
For example, on a freshly created collection you can write Array1(0)="Stackoverflow", but ArrayBuffer1(0)="Stackoverflow" fails because index 0 does not exist yet.
(Array1 = Array variable & ArrayBuffer1 = ArrayBuffer variable)
Because ArrayBuffers are resizable, an element is created when you first insert or append a value; after that you can modify or reassign it by index.
Array:
Declaring and assigning values to Int Array.
val favNums= new Array[Int](20)
for(i<-0 to 19){
favNums(i)=i*2
}
favNums.foreach(println)
ArrayBuffer:
Declaring and assigning values to Int ArrayBuffer.
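import scala.collection.mutable.ArrayBuffer

val favNumsArrayBuffer = new ArrayBuffer[Int]
for (j <- 0 to 19) {
  favNumsArrayBuffer.insert(j, j * 2) // j equals the current size here, so this appends
  // favNumsArrayBuffer ++= Array(j * 3)
}
favNumsArrayBuffer.foreach(println)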
If you put favNumsArrayBuffer(j)=j*2 as the first line of the for loop, it doesn't work, because index j does not exist yet. It works fine as the 2nd or 3rd line of the loop: the insert on the first line has already created the element, so you can now modify it by index.
This simple one-hour video tutorial explains a lot.
https://youtu.be/DzFt0YkZo8M?t=2005
Use an Array if the length of Array is fixed, and an ArrayBuffer if the length can vary.
Another difference is in terms of reference and value equality:
Array(1,2) == Array(1,2) // res0: Boolean = false
ArrayBuffer(1, 2) == ArrayBuffer(1,2) // res1: Boolean = true
The reason for the difference is that == routes to .equals, and Array.equals is the one inherited from java.lang.Object, which uses Java's == to compare references:
public boolean equals(Object obj) {
return (this == obj);
}
whilst ArrayBuffer.equals compares elements contained by ArrayBuffer using sameElements method
override def equals(o: scala.Any): Boolean = this.eq(o.asInstanceOf[AnyRef]) || (
o match {
case it: Seq[A] => (it eq this) || (it canEqual this) && sameElements(it)
case _ => false
}
)
Similarly, contains behaves differently
Array(Array(1,2)).contains(Array(1,2)) // res0: Boolean = false
ArrayBuffer(ArrayBuffer(1,2)).contains(ArrayBuffer(1,2)) // res1: Boolean = true
I am trying to read a file of names, and hash those names to a spot based on the value of their letters. I have succeeded in getting the value for each name in the file, but I am having trouble creating a hash table. Can I just use a regular array and write a function to put the words at their value's index?
while (fscanf(ptr_file, "%99s", name) == 1) // read one name per pass until EOF (assumes char name[100])
{
    printf("%s ", name); // prints the name
    size_t length = strlen(name); // gets the length of the name
    size_t i;
    value = 0; // start from zero for every name
    for (i = 0; i < length; i++)
    // for the length of the string add each letter's value up
    {
        value = value + (unsigned char)name[i];
        value = value % 50;
    }
    printf("value= %d\n", value);
}
No, you can't, because you'll have collisions, and you'll thus have to account for multiple values mapping to a single hash. Generally, your hashing is not really a wise implementation: why do you limit the range of values to 50? Is memory really so scarce that you can't have a dictionary bigger than 50 pointers?
I recommend using an existing C string hash table implementation, like this one from 2001.
In general, you will have hash collisions and you will need to find a way to handle that. The two main ways of handling that situation are as follows:
Open addressing with linear probing: if the hash table entry is taken, you use the next entry, and iterate until you find an empty entry. When looking things up, you need to probe in the same way. This is nice and simple for adding things to the hash table, but makes removing things tricky.
Hash buckets (separate chaining): each hash table entry is a linked list of entries with the same hash value. First you find your hash bucket, then you find the entry on the list. When adding a new entry, you merely add it to the list.
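The hash bucket approach could look roughly like the sketch below. It reuses the question's idea of summing the letters modulo 50; the names struct entry, bucket_of, table_find and table_insert are invented for this example, and a global table is assumed only to keep the code short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NUM_BUCKETS 50   /* same range as the question's value % 50 */

struct entry {
    char *name;
    struct entry *next;   /* next entry in the same bucket */
};

static struct entry *buckets[NUM_BUCKETS];   /* all buckets start out empty (NULL) */

/* Same idea as the question's loop: sum the letters modulo the table size. */
static unsigned bucket_of(const char *name)
{
    unsigned value = 0;
    for (; *name != '\0'; name++)
        value = (value + (unsigned char)*name) % NUM_BUCKETS;
    return value;
}

/* Returns the stored entry for name, or NULL if it is not in the table. */
static struct entry *table_find(const char *name)
{
    assert(name != NULL);
    for (struct entry *e = buckets[bucket_of(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* Adds name unless it is already present; returns 0 on success, -1 if out of memory. */
static int table_insert(const char *name)
{
    if (table_find(name) != NULL)
        return 0;
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->name = malloc(strlen(name) + 1);
    if (e->name == NULL) {
        free(e);
        return -1;
    }
    strcpy(e->name, name);
    unsigned b = bucket_of(name);
    e->next = buckets[b];    /* push onto the front of the bucket's list */
    buckets[b] = e;
    return 0;
}
```

With this hash, "ab" and "ba" land in the same bucket, yet both can be stored and found because the lookup compares the actual strings inside the bucket.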
This is my demo program, which reads strings (words) from stdin and deposits them into a hash table. Then, at EOF, it iterates over the hash table, computes the number of words and of distinct words, and prints the most frequent word:
http://olegh.cc.st/src/words.c.txt
The hash table uses the double hashing algorithm.
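For readers who have not seen double hashing, here is a minimal sketch of the idea. It is not taken from the linked program: the table size, the djb2 string hash, and the names struct slot and count_word are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <string.h>

#define TABLE_SIZE 101          /* prime, so every step size visits all slots */
#define MAX_WORD   32

struct slot {
    char word[MAX_WORD];        /* an empty string marks a free slot */
    unsigned count;
};

static struct slot table[TABLE_SIZE];

/* djb2 string hash (one common choice, assumed here). */
static unsigned long djb2(const char *s)
{
    unsigned long h = 5381;
    for (; *s != '\0'; s++)
        h = h * 33 + (unsigned char)*s;
    return h;
}

/* Counts one occurrence of word. Returns its slot, or NULL if the table is
   full or the word is empty or too long. */
static struct slot *count_word(const char *word)
{
    assert(word != NULL);
    size_t len = strlen(word);
    if (len == 0 || len >= MAX_WORD)
        return NULL;
    unsigned long h = djb2(word);
    unsigned start = (unsigned)(h % TABLE_SIZE);                         /* first hash: start slot */
    unsigned step = 1 + (unsigned)((h / TABLE_SIZE) % (TABLE_SIZE - 1)); /* second hash: step, never 0 */
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        struct slot *s = &table[(start + i * step) % TABLE_SIZE];
        if (s->word[0] == '\0')
            strcpy(s->word, word);          /* claim the free slot */
        if (strcmp(s->word, word) == 0) {
            s->count++;
            return s;
        }
    }
    return NULL;                            /* every slot probed: table is full */
}
```

The difference from linear probing is that the distance between probes depends on the key, so keys that start at the same slot usually follow different probe sequences.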
I'd like a suggestion in C for the following problem:
I need an association between strings and integers like this:
"foo" => 45,
"bar" => 1023,
etc...
and be able to find the string using the associated integer and the integer using the associated string.
For string to integer I can use hash tables but I'll lose the way back.
The simple solution that I'm using but which is very slow is to create a table:
static param_t params [] = {
{ "foo", 45 },
{ "bar", 1023 },
...
};
and use two functions that compare each entry (by string or by integer) to get the string or the integer.
This works perfectly, but it is a linear search, which is very slow.
What could I use to have a search algorithm in O(1) to find a string and O(size of string) to find the integer?
Any ideas?
The easiest way is to implement lookup tables, preferably sorted by the integer value ("primary key").
typedef enum
{
FOO_INDEX,
BAR_INDEX,
...
N
} some_t;
const int int_array [] = // values sorted in ascending order, smallest first
{
45,
1023,
...
};
const char* str_array [] =
{
"foo",
"bar",
...
};
Now you can use int_array[FOO_INDEX] and str_array[FOO_INDEX] to get the desired data.
Since these are constant tables set at compile-time, you can sort the data. All lookups can then be done with binary search, O(log n). If you have the integer value but need to know the index, perform a binary search on the int_array. And once you have found the index, you get instant lookup from there.
For this to work, both arrays must have the exact size N. To ensure array sizes and data integrity inside those arrays, use a compile-time assert:
static_assert(sizeof(int_array)/sizeof(*int_array) == N, "Bad int_array");
static_assert(sizeof(str_array)/sizeof(*str_array) == N, "Bad str_array");
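The integer-to-string direction described above could be sketched like this. Only the two entries from the question are filled in, and string_for is a hypothetical helper name; a C11 compiler is assumed for static_assert.

```c
#include <assert.h>
#include <stddef.h>

enum { FOO_INDEX, BAR_INDEX, N };   /* only the two entries shown in the question */

static const int int_array[] = { 45, 1023 };              /* ascending order */
static const char *const str_array[] = { "foo", "bar" };  /* same order as int_array */

static_assert(sizeof int_array / sizeof *int_array == N, "Bad int_array");
static_assert(sizeof str_array / sizeof *str_array == N, "Bad str_array");

/* Binary search over int_array: returns the matching string,
   or NULL if value is not in the table. */
static const char *string_for(int value)
{
    size_t lo = 0, hi = N;          /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (int_array[mid] == value)
            return str_array[mid];
        if (int_array[mid] < value)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}
```

Going from string to integer with the same tables would need either a linear scan of str_array or a second index sorted by string.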
Sort your list with qsort first, and then use bsearch to find items. It's not O(1), but at least it is O(log(n)).
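A sketch of the qsort/bsearch idea, assuming param_t has the two fields shown below (the question does not show its definition, so the field names are guesses). To search in both directions, this example keeps two copies of the table, one sorted by name and one by value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *name;   /* assumed field names */
    int value;
} param_t;

static param_t by_name[]  = { { "foo", 45 }, { "bar", 1023 } };
static param_t by_value[] = { { "foo", 45 }, { "bar", 1023 } };
enum { PARAM_COUNT = sizeof by_name / sizeof by_name[0] };

static int cmp_name(const void *a, const void *b)
{
    return strcmp(((const param_t *)a)->name, ((const param_t *)b)->name);
}

static int cmp_value(const void *a, const void *b)
{
    int x = ((const param_t *)a)->value;
    int y = ((const param_t *)b)->value;
    return (x > y) - (x < y);       /* avoids the overflow risk of x - y */
}

/* Sorts both copies; call once before any lookup. */
static void params_init(void)
{
    qsort(by_name, PARAM_COUNT, sizeof by_name[0], cmp_name);
    qsort(by_value, PARAM_COUNT, sizeof by_value[0], cmp_value);
}

/* Each lookup returns the matching entry, or NULL if there is none. */
static const param_t *find_by_name(const char *name)
{
    param_t key = { name, 0 };
    assert(name != NULL);
    return bsearch(&key, by_name, PARAM_COUNT, sizeof by_name[0], cmp_name);
}

static const param_t *find_by_value(int value)
{
    param_t key = { NULL, value };
    return bsearch(&key, by_value, PARAM_COUNT, sizeof by_value[0], cmp_value);
}
```

Both lookups are O(log n) comparisons; the cost is keeping two sorted copies of the data.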
Use two hashmaps. One for the association from integer to string and another one for the association from string to integer.
An inefficient way would be to convert the string to a base-256 number: the ASCII code of the first letter times 256 to the power of 0 (i.e. 1), plus the ASCII code of the second letter times 256 to the power of 1, and so on. This is very inefficient, because a long won't be big enough, so you would have to either store the number in another string or use a C math library for large numbers. I know there are hashes in Ruby and Perl, which are basically arrays that you index with a key (which can be a string), but I don't know how they work.
I have an assignment to make a program in C that displays a number (n < 50) of valid, context-free grammar strings using the following context-free grammar:
S -> AA|0
A -> SS|1
I had a few ideas about how to do it, but after analyzing them further, none of them were right.
For now, I'm planning to make an array and randomly change [..., A, ...] for [..., S, S, ...] or [..., 1, ...] until there are only 0s and 1s and then check whether the same thing was already randomly generated.
I'm still not convinced if that is the right approach, and I still don't know exactly how to do that or where to keep the final words because the basic form will be an array of chars of different length. Also, in C, is a two dimensional array of chars equal to an array of strings?
Does this make any sense, and is it a proper way to do it? Or am I missing something?
You can simply make a random decision every time you need to decide on something. For example:
function A():
if (50% random chance)
return "1"
else
return concat(S(), S())
function S():
if (50% random chance)
return "0"
else
return concat(A(), A())
Calling S() multiple times gave me these outputs:
"0"
"00110110100100101111010111111111001111101011100100011000000110101110000110101110
10001000110001111100011000101011000001101111000110110011101010111111111011010011
10000000101111100100011011010000000101000111110010001000101001100110100111111111
1001010011"
"11"
"10010010101111010111101"
These are all valid strings for your grammar. Note that you may need to tweak the random chances a little. This sample has a high probability of generating very small strings like "11".
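In C, the two mutually recursive functions above might be written as in the following sketch. The depth cap MAX_DEPTH is an addition not present in the pseudocode: it forces a terminal symbol once the recursion gets deep, so the result always fits a fixed buffer. The names gen_S, gen_A and generate_word are chosen for this example, and the caller is expected to seed rand() with srand().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_DEPTH 10                    /* assumption: cap on recursion depth */
#define MAX_LEN   (1u << MAX_DEPTH)     /* longest possible word: 2^MAX_DEPTH symbols */

static void gen_A(char *out, size_t *len, int depth);

/* S -> AA | 0 */
static void gen_S(char *out, size_t *len, int depth)
{
    if (depth >= MAX_DEPTH || rand() % 2 == 0) {
        out[(*len)++] = '0';
    } else {
        gen_A(out, len, depth + 1);
        gen_A(out, len, depth + 1);
    }
}

/* A -> SS | 1 */
static void gen_A(char *out, size_t *len, int depth)
{
    if (depth >= MAX_DEPTH || rand() % 2 == 0) {
        out[(*len)++] = '1';
    } else {
        gen_S(out, len, depth + 1);
        gen_S(out, len, depth + 1);
    }
}

/* Writes one random word derived from S into out, which must hold at least
   MAX_LEN + 1 bytes, and returns the word's length. */
static size_t generate_word(char *out)
{
    size_t len = 0;
    assert(out != NULL);
    gen_S(out, &len, 0);
    out[len] = '\0';
    return len;
}
```

Appending each symbol to a shared buffer avoids allocating and concatenating temporary strings at every level of the recursion.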
Try to think of the context-free grammar as a set of rules that allow you to generate new strings in a language. For example, the first rule:
S -> AA | 0
How could you generate a word S in this language? One way is with a function that generates, at random, either the string "0" or two A words, concatenated.
Similarly, to implement the second rule:
A -> SS | 1
write a function that generates, at random, either "1" or two S words concatenated.
You asked several questions...
Regarding the question: in C, is a two dimensional array of chars equal to an array of strings?
Yes, a two dimensional array of chars is one way to store an array of strings.
Here are ways to declare arrays of strings, each example shows varying flexibility in terms of usage:
char **ArrayOfStrings; //most flexible declaration -
//pointer to pointer, can use `calloc()` or `malloc()` to create memory for
//any number of strings, each of any length
or
char *ArrayOfStrings[10]; //somewhat flexible -
//array of 10 pointers to char, again can use `c(m)alloc()` to allocate memory for
//each string to have any length (the number of strings is fixed at 10)
or
char ArrayOfStrings[5][10]; //Not flexible - (but still very useful)
//2 dimensional array of 5 strings, each with space for up to 9 chars + '\0'
//Note: In C, by definition, strings must always be null-terminated ('\0').
Note: Although each of these forms is valid, and very useful when used correctly, it is good to be aware that there are differences in the way each will behave in practice. (read the link for a good discussion on that)
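As an illustration of the pointer-to-pointer form, the sketch below allocates an array of strings in which every string gets exactly the space it needs. The helper names copy_strings and free_strings are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated array holding independent copies of the n
   strings in src, or NULL if an allocation fails. */
static char **copy_strings(const char *const *src, size_t n)
{
    assert(src != NULL && n > 0);
    char **dst = calloc(n, sizeof *dst);          /* n pointers, all initially NULL */
    if (dst == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        dst[i] = malloc(strlen(src[i]) + 1);      /* room for this string plus '\0' */
        if (dst[i] == NULL) {
            while (i > 0)
                free(dst[--i]);                   /* undo the copies made so far */
            free(dst);
            return NULL;
        }
        strcpy(dst[i], src[i]);
    }
    return dst;
}

/* Releases an array created by copy_strings. */
static void free_strings(char **strings, size_t n)
{
    if (strings == NULL)
        return;
    for (size_t i = 0; i < n; i++)
        free(strings[i]);
    free(strings);
}
```

This layout suits generated words of differing lengths, since no space is wasted on padding shorter strings to a common width.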
In Lua, using the C interface, given a table, how do I iterate through the table's key/value pairs?
Also, if some table members are added using arrays, do I need a separate loop to iterate through those as well, or is there a single way to iterate through those members at the same time as the key/value pairs?
As Javier says, you want the lua_next() function. I thought a code sample might help make things clearer since this can be a little tricky to use at first glance.
Quoting from the manual:
A typical traversal looks like this:
/* table is in the stack at index 't' */
lua_pushnil(L); /* first key */
while (lua_next(L, t) != 0) {
/* uses 'key' (at index -2) and 'value' (at index -1) */
printf("%s - %s\n",
lua_typename(L, lua_type(L, -2)),
lua_typename(L, lua_type(L, -1)));
/* removes 'value'; keeps 'key' for next iteration */
lua_pop(L, 1);
}
Be aware that lua_next() is very sensitive to the key value left on the stack. Do not call lua_tolstring() on the key unless it really is already a string because that function will replace the value it converts.
lua_next() is just the same as Lua's next() function, which is used by the pairs() function. It iterates all the members, both in the array part and the hash part.
If you want the analogue of ipairs(), lua_objlen() gives you the same functionality as # (in Lua 5.2 and later it is called lua_rawlen()). Use it and lua_rawgeti() to numerically iterate over the array part.