string/integer associations in C

I'd like a suggestion in C for the following problem:
I need an association between strings and integers like this:
"foo" => 45,
"bar" => 1023,
etc...
and be able to find the string using the associated integer and the integer using the associated string.
For string to integer I can use hash tables, but I'll lose the way back.
The simple solution that I'm using but which is very slow is to create a table:
static param_t params [] = {
{ "foo", 45 },
{ "bar", 1023 },
...
};
and use two functions that compare each entry (string or integer) to get the string or the integer.
This works perfectly, but it is a linear search, which is very slow.
What could I use to have a search algorithm in O(1) to find a string and O(size of string) to find the integer?
Any ideas?

The easiest way is to implement lookup tables, preferably sorted by the integer value ("primary key").
typedef enum
{
FOO_INDEX,
BAR_INDEX,
...
N
} some_t;
const int int_array [] = // values sorted in ascending order, smallest first
{
45,
1023,
...
};
const char* str_array [] =
{
"foo",
"bar",
...
};
Now you can use int_array[FOO_INDEX] and str_array[FOO_INDEX] to get the desired data.
Since these are constant tables set at compile-time, you can sort the data. All lookups can then be done with binary search, O(log n). If you have the integer value but need to know the index, perform a binary search on the int_array. And once you have found the index, you get instant lookup from there.
For this to work, both arrays must have the exact size N. To ensure array sizes and data integrity inside those arrays, use a compile-time assert:
static_assert(sizeof(int_array)/sizeof(*int_array) == N, "Bad int_array");
static_assert(sizeof(str_array)/sizeof(*str_array) == N, "Bad str_array");
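The integer-to-string direction described above could be sketched as follows. This is only an illustration under assumptions: the third entry ("baz", 4096) is made up, and the helper names index_of_value and name_of_value are hypothetical, not part of the answer.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative data only; the real tables would list every entry. */
enum { FOO_INDEX, BAR_INDEX, BAZ_INDEX, N };

static const int int_array[] = { 45, 1023, 4096 };   /* ascending order */
static const char *const str_array[] = { "foo", "bar", "baz" };

static_assert(sizeof int_array / sizeof *int_array == N, "Bad int_array");
static_assert(sizeof str_array / sizeof *str_array == N, "Bad str_array");

/* Binary search over the sorted int_array.
   Returns the index of value, or -1 if it is not present. */
static int index_of_value(int value)
{
    size_t lo = 0, hi = N;              /* half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (int_array[mid] == value)
            return (int)mid;
        if (int_array[mid] < value)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}

/* Integer -> string: find the index, then index str_array directly.
   Returns NULL for an unknown value. */
static const char *name_of_value(int value)
{
    int i = index_of_value(value);
    return i < 0 ? NULL : str_array[i];
}
```

With these tables, name_of_value(1023) yields "bar" after at most about log2(N) comparisons. The string-to-integer direction would need either a second table sorted by name or a hash table.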

Sort your list with qsort first, and then use bsearch to find items. It's not O(1), but at least it is O(log(n)).
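A rough sketch of that suggestion for the string-to-integer direction, using the standard qsort and bsearch. The param_t layout is assumed from the question, the "baz" entry is invented for the example, and params_sort and value_of_name are hypothetical helper names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *name;
    int value;
} param_t;

/* Illustrative entries; "baz" is made up for the example. */
static param_t params[] = {
    { "foo", 45 },
    { "bar", 1023 },
    { "baz", 7 },
};
enum { PARAM_COUNT = sizeof params / sizeof params[0] };

/* Orders two param_t entries by name. */
static int cmp_by_name(const void *a, const void *b)
{
    const param_t *pa = a, *pb = b;
    return strcmp(pa->name, pb->name);
}

/* Call once at start-up, before any lookup. */
static void params_sort(void)
{
    qsort(params, PARAM_COUNT, sizeof params[0], cmp_by_name);
}

/* String -> integer in O(log n) string comparisons.
   Returns 1 and stores the value in *out, or 0 if name is unknown. */
static int value_of_name(const char *name, int *out)
{
    param_t key = { name, 0 };
    const param_t *hit;

    assert(name != NULL && out != NULL);
    hit = bsearch(&key, params, PARAM_COUNT, sizeof params[0], cmp_by_name);
    if (hit == NULL)
        return 0;
    *out = hit->value;
    return 1;
}
```

For the reverse direction, a second array (or an array of pointers into params) sorted by value with its own comparison function would work the same way.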

Use two hashmaps. One for the association from integer to string and another one for the association from string to integer.

An inefficient way would be to treat the string as a base-256 number: the ASCII code of the first letter times 256 to the power of 0 (that is, 1), plus the ASCII code of the second letter times 256 to the power of 1, and so on. This is very inefficient, because a long won't be big enough, so you would either have to keep the number in another string or use an arbitrary-precision math library for C. I know there are hashes in Ruby and Perl, which are basically arrays that you index with a certain key (which can be a string), but I don't know how they work.

Related

Is an initialised 2d array fine for mapping consecutive integers to strings in C?

I have to map consecutive integer codes from 1 to 100 to strings in C. Normally, for a mapping of number to string, I would have something like this:
#define code1 1
#define code2 2
.
.
#define code100 100
struct map
{
int code;
char *msg;
}objs[100];
I would then loop over the objs and if the number matches, I would use the corresponding string of the obj array. Since I know that the numbers to be mapped are consecutive, I can just do this:
const char *arr[100] = { "abc", "def", ....... "100th msg"};
I can then forget the looping and just print arr[code]. Is this a bad approach? The only disadvantage I see is that when somebody else adds a code in the middle, they have to be careful about it. The advantage is obviously that I don't need to loop over the struct array.
Using a direct indexed array is a commonly used approach that works fine if the data never (rarely) changes, and there are not too many gaps, because you spend a record for every gap. At some point the management or the storage cost of the gaps may become an issue.
If you need to cope with more dynamic compile-time updates to the data then the next best thing is a sorted array. If you can guarantee that your entries are always in order but perhaps there are gaps, or new entries added to the end, then you can binary chop your ordered array to quickly find the entry you want. You may want to do a start-up pass that checks the array is correctly ordered, but you only have to do that once.
If you need to worry about runtime updates, then you seriously need to consider higher-level container abstractions such as mapping trees or hashmaps.
Suppose the array has error messages. Then a common approach is to define constants for each error and print the message associated with it, for example:
#define ERR_NONE 0
#define ERR_NOMEM 1
#define ERR_BADNUM 2
// etc
and define the array as:
const char *msgs[] = {
"No error",
"Out of memory",
"Bad number",
// etc
};
and have a function to print the message, for example:
void printmsg(int code)
{
printf("%s\n",msgs[code]);
}
which can be called as
printmsg(ERR_NOMEM);
For modularity, the #defines can be in e.g. errors.h, together with the prototype of printmsg, and the array can be in errors.c.
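One possible hardening of this scheme, shown only as a sketch: ERR_COUNT and msg_for are hypothetical names that do not appear in the answer. A compile-time check keeps the table in step with the codes, and the lookup rejects codes outside the table instead of reading past its end.

```c
#include <assert.h>

#define ERR_NONE   0
#define ERR_NOMEM  1
#define ERR_BADNUM 2
#define ERR_COUNT  3   /* hypothetical: one past the last code */

static const char *const msgs[] = {
    "No error",
    "Out of memory",
    "Bad number",
};

/* Fails to compile if a code is added without a matching message. */
static_assert(sizeof msgs / sizeof msgs[0] == ERR_COUNT, "msgs out of sync");

/* Returns the message for code, or a fallback for out-of-range codes. */
static const char *msg_for(int code)
{
    if (code < 0 || code >= ERR_COUNT)
        return "Unknown error";
    return msgs[code];
}
```

A printing function could then call printf("%s\n", msg_for(code)) without risking an out-of-bounds read.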
The only problem with your approach is that the codes can never change. You can't add intermediate codes without changing the entire code. But it should work. Also the first code should be zero or you'll have to either pad the array or shift the codes when accessing.
Essentially what you have is an immutable hash table.
#define BASE_CODE 5
#define CODE_BLUE 5
#define CODE_GREEN 6
const char *responses[] = {"blue", "green"};
printf("%s\n", responses[code - BASE_CODE]);
If you want to be able to change the codes (add, remove, insert codes in the middle of the sequence, verify if a code was properly referenced), then you should stick with the first approach, but add a hash function so you don't need to loop sequentially over the array.

Array of strings as hash function key?

Is it possible, in any language (it doesn't matter), to have a hash function which uses an array of strings as keys?
I mean something like this:
hash(["word1", "word2", ...]) = "element"
instead of the classical:
hash("word") = "element"
I need something like that because each word I would like to use as key could change the output element of the function. I've got a sequence of words and I want a specific output for that sequence (also the order may change the result).
Sure. Any data structure at all can be hashed. You only need to come up with a strict definition of equality and then ensure that hash(A) == hash(B) if A == B. Suppose your definition is that [s1, s2, ..., sm] == [t1, t2, ..., tn] if and only if m == n and si == ti for i = 1..m, and further that string s == t if and only if |s|==|t| and s[i]==t[i] for 0<=i<|s|. You can build a hash in many, many ways:
Concatenate all the strings in the list and hash the result with any string hash function.
Do the same, adding separators such as commas (,)
Hash each string individually and xor the results.
Hash each string individually, shift the previous hash value, and xor the new value into the hash.
Infinitely many more possibilities...
A rigorous definition of equality is important. If, for example, order doesn't matter in the lists or the string comparison is case-insensitive, then the hash function must still be designed to ensure hash(A) == hash(B) if A == B. Getting this wrong will cause lookups to fail.
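As an illustration of the "hash each string, then combine" options listed above, here is a small C sketch. It is an assumption-laden example, not the answer's code: it picks FNV-1a as the per-string hash and uses a 5-bit rotate rather than a plain shift, so that the earliest strings are not shifted out of the result for long lists. The names hash_string and hash_string_list are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over one string; any reasonable string hash would do here. */
static uint32_t hash_string(const char *s)
{
    uint32_t h = 2166136261u;              /* FNV offset basis */
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;                    /* FNV prime */
    }
    return h;
}

/* Order-sensitive hash of a list of strings: rotate the running hash,
   then xor in the hash of the next string. */
static uint32_t hash_string_list(const char *const *words, size_t count)
{
    uint32_t h = 0;
    for (size_t i = 0; i < count; i++) {
        h = (h << 5) | (h >> 27);          /* rotate left by 5 bits */
        h ^= hash_string(words[i]);
    }
    return h;
}
```

Because of the rotation, {"a", "b"} and {"b", "a"} hash differently, which matches an equality definition where order matters. If order should not matter, a plain xor of the per-string hashes (no rotation) would satisfy hash(A) == hash(B) for reordered lists.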
Java is one language that lets you define a hash function for any data type. And in fact a library list of strings will work just fine as a key using the default hash function.
HashMap<ArrayList<String>, String> map = new HashMap<ArrayList<String>, String>();
ArrayList<String> key = new ArrayList<String>();
key.add("Hello");
key.add("World");
map.put(key, "It's me.");
// map now contains mapping ["Hello", "World"] -> "It's me."
Yes it is possible, but in most cases you will have to define your own hash function that translates an array into a hash-key. For example, in java, the array.hashCode() is based on the Object.hashCode() function which is based on the Reference itself and not the contents of the Object.
You may also have a look at Arrays.deepHashCode() function in java if you are interested in an implementation of a hashing function built on top of an array.

Find a single integer that occurs with even frequency in a given array of ints when all others occur with odd frequency

This is an interview question.
Given an array of integers, find the single integer value in the array which occurs with even frequency. All integers will be positive. All other numbers occur with odd frequency. The max number in the array can be INT_MAX.
For example, [2, 8, 6, 2] should return 2.
The original array can be modified if that allows a better solution, such as O(1) space with O(n) time.
I know how to solve it by hashtable (traverse and count freq). It is O(n) time and space.
Is it possible to solve it by O(1) space or better time?
Given this is an interview question, the answer is: O(1) space is achievable "for very big values of 1":
Prepare a matcharray 1..INT_MAX of all 0
When traversing the array, use the integer as an index into the matcharray, adding 1
When done, traverse the match array to find the one entry with a positive even value
The space for this is large, but independent of the size of the input array, so O(1) space. For really big data sets (say small value range, but enormous array length), this might even be a practically valid solution.
If you are allowed to sort the original array, I believe that you can do this in O(n lg U) time and O(lg U) space, where U is the maximum element of the array. The idea is as follows - using in-place MSD radix sort, sort the array in O(n lg U) time and O(lg U) space. Then, iterate across the array. Since all equal values are consecutive, you can then count how many times each value appears. Once you find the value that appears an even number of times, you can output the answer. This second scan requires O(n) time and O(1) space.
If we assume that U is a fixed constant, this gives an O(n)-time, O(1)-space algorithm. If you don't assume this, then the memory usage is still better than the O(n) algorithm provided that lg U = O(n), which should be true on most machines. Moreover, the space usage is only logarithmically as large as the largest element, meaning that the practical space usage is quite good. For example, on a 64-bit machine, we'd need only space sufficient to hold 64 recursive calls. This is much better than allocating a gigantic array up-front. Moreover, it means that the algorithm is a weakly-polynomial time algorithm as a function of U.
That said, this does rearrange the original array, and thus does destructively modify the input. In a sense, it's cheating because it uses the array itself for the O(n) storage space.
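The sort-then-scan idea can be sketched in C as below. Note the substitution: this uses the standard library's qsort instead of an in-place MSD radix sort, so the time is O(n log n) with typical implementations and the auxiliary space depends on the library, not the O(lg U) bound discussed above. The function name find_even_frequency is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Three-way comparison of two ints for qsort. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);      /* avoids the overflow of x - y */
}

/* Sorts arr in place, then scans runs of equal values.
   Returns the value whose run length is even, or -1 if there is none
   (all inputs are positive, so -1 cannot be a real answer). */
static int find_even_frequency(int *arr, size_t n)
{
    size_t i = 0;

    if (n == 0)
        return -1;
    assert(arr != NULL);
    qsort(arr, n, sizeof arr[0], cmp_int);
    while (i < n) {
        size_t run = 1;            /* length of the run starting at i */
        while (i + run < n && arr[i + run] == arr[i])
            run++;
        if (run % 2 == 0)
            return arr[i];
        i += run;
    }
    return -1;
}
```

For the example input [2, 8, 6, 2] the array is sorted to [2, 2, 6, 8] and the first run has length 2, so the function returns 2.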
Hope this helps!
Scan through the list maintaining two sets, the 'Even' set and the 'Odd' set. If an element hasn't been seen before (i.e. if it's in neither set), place it in the 'Odd' set. If an element is in one set, move it to the other set. At the end, there should be only one item in the 'Even' set. This probably won't be fast, but the memory usage should be reasonable for large lists.
-Make a hash table containing ints. Call it is_odd or something. Since you might have to look through an array of size INT_MAX, just make it an array of size INT_MAX. Initialize to 0.
-Traverse through the whole array. You have to do this. There's no way to beat O(n).
for each number:
if it's not in the hash table, mark its spot in the table as 1.
if it is in the hash table then:
if its value is '1', make it '2'
if its value is '2', make it '1'.
Now you have to traverse through the hash table. Pull out the sole entry with "2" as the value.
Time:
You traverse the array once and the hash table once, so O(n).
Space:
Just an array of size INT_MAX. Or if you know the range of your array you can restrict your memory use to that.
edit: I just saw that you already had this method. Sorry about that!
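I guess we read the task improperly. It asks us to "find the single integer value in the array which occurs with even frequency". XOR-ing all elements cancels every value that occurs an even number of times, so the code below only solves the mirrored task, where exactly ONE value occurs an odd number of times and all others occur an even number of times: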
public static void main(String[] args) {
int[] array = { 2, 1, 2, 4, 4 };
int count = 0;
for (int i : array) {
count^=i;
}
System.out.println(count); // Prints 1
}

Hashed Array Tips in C Language

I need some ideas to develop a good hashing function for my assignment. I have a list of all the countries in the world (around 190 in total). The name of each country is the key for the hashing function. Is there a specific kind of hashing function anyone would recommend to store this data in a hash table without many collisions? Also, can you perhaps give an example of how to implement it?
Use GNU gperf. For inputs like yours, it will generate C code for you which implements a perfect hash function (for the given inputs). No collisions, no worries.
You can use a generated perfect hash for that (GNU gperf).
Or, if the set of strings is dynamic, you can use a ternary trie.
For N unique strings it will give you unique number [1..N]. For your case it will be faster than with hash tables.
Here is my implementation of such thing:
http://code.google.com/p/tiscript/source/browse/trunk/tool/tl_ternary_tree.h
The simplest approach I can think of is for each country's name to compute the sum of the ASCII values in its representation and use this as the hash value:
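unsigned hash(const char *s)
{
    unsigned h = 0; /* unsigned, so hash(...) % N can never be negative */
    while (s && *s)
        h += (unsigned char)*s++; /* add the code of each character */
    return h;
}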
If your hash map has size N, you store country names with map[hash(my_country) % N] = my_country. Conceptually.
Just try this approach and see whether the resulting hash values are sufficiently uniformly distributed. Note that the quality of the distribution may also depend on N.
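One way to "try and see" is to count bucket collisions for a sample of keys. The sketch below is an assumption-based illustration: it compares the additive hash with djb2 (a well-known multiply-by-33 string hash that the answer does not mention), and the names hash_sum, hash_djb2, and count_collisions are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* The additive hash from the text: anagrams always collide. */
static unsigned long hash_sum(const char *s)
{
    unsigned long h = 0;
    assert(s != NULL);
    while (*s)
        h += (unsigned char)*s++;
    return h;
}

/* djb2: multiply the running hash by 33 before adding each character,
   so the position of a character affects the result. */
static unsigned long hash_djb2(const char *s)
{
    unsigned long h = 5381;
    assert(s != NULL);
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Counts how many keys land in a bucket that is already occupied.
   Assumes nbuckets <= 1024; larger tables would need a heap array. */
static size_t count_collisions(unsigned long (*hash)(const char *),
                               const char *const *keys, size_t nkeys,
                               size_t nbuckets)
{
    unsigned char used[1024] = { 0 };
    size_t collisions = 0;

    assert(nbuckets > 0 && nbuckets <= sizeof used);
    for (size_t i = 0; i < nkeys; i++) {
        size_t b = hash(keys[i]) % nbuckets;
        if (used[b])
            collisions++;
        used[b] = 1;
    }
    return collisions;
}
```

Running count_collisions over the real country list with each hash function and a few candidate table sizes N (prime sizes are a common choice) gives a direct comparison.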

(Algorithm) Find if two unsorted arrays have any common elements in O(n) time without sorting?

We have two unsorted arrays and each array has a length of n. These arrays contain random integers in the range of 0-n^100. How to find if these two arrays have any common elements in O(n)/linear time? Sorting is not allowed.
A hash table will save you. Really, it's like a Swiss Army knife for algorithms.
Just put in it all values from the first array and then check if any value from the second array is present.
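A minimal C sketch of that idea, under a loud assumption: the values are taken to fit in uint64_t, which is not true for the question's range of 0-n^100 in general (those would need a big-integer key type). The hash set uses open addressing with linear probing, and slot_for and have_common_element are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Multiplicative (Fibonacci) hashing; cap must be a power of two. */
static size_t slot_for(uint64_t v, size_t cap)
{
    return (size_t)((v * UINT64_C(0x9E3779B97F4A7C15)) >> 32) & (cap - 1);
}

/* Returns 1 if a and b (n elements each) share a value, 0 if not,
   and -1 if the table could not be allocated. Expected time is O(n). */
static int have_common_element(const uint64_t *a, const uint64_t *b, size_t n)
{
    size_t cap = 1;
    uint64_t *slots;
    bool *used;
    bool found = false;

    if (n == 0)
        return 0;
    assert(a != NULL && b != NULL);
    while (cap < 2 * n)                 /* keeps the load factor <= 0.5 */
        cap *= 2;
    slots = malloc(cap * sizeof *slots);
    used = calloc(cap, sizeof *used);
    if (slots == NULL || used == NULL) {
        free(slots);
        free(used);
        return -1;
    }
    for (size_t i = 0; i < n; i++) {    /* insert every value of a */
        size_t h = slot_for(a[i], cap);
        while (used[h] && slots[h] != a[i])
            h = (h + 1) & (cap - 1);
        used[h] = true;
        slots[h] = a[i];
    }
    for (size_t i = 0; i < n && !found; i++) {  /* probe with values of b */
        size_t h = slot_for(b[i], cap);
        while (used[h] && slots[h] != b[i])
            h = (h + 1) & (cap - 1);
        found = used[h];                /* stopped on a match or an empty slot */
    }
    free(slots);
    free(used);
    return found ? 1 : 0;
}
```

The O(n) bound here is an expected, not a worst-case, bound: an adversarial input can force long probe sequences.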
You have not defined the model of computation. Assuming you can only read O(1) bits in O(1) time (anything else would be a rather exotic model of computation), there can be no algorithm solving the problem in O(n) worst case time complexity.
Proof Sketch:
Each number in the input takes O(log(n ^ 100)) = O(100 log n) = O(log n) bits. The entire input is therefore O(n log n) bits, which cannot be read in O(n) time. Any O(n) algorithm therefore cannot read the entire input, and hence cannot react if the unread bits matter.
Answering Neil:
Since you know at start what is your N (two arrays of size N), you can create a hash with array size of 2*N*some_ratio (for example: some_ratio= 1.5). With this size, almost all simple hash functions will provide you good spread of the entities.
You can also implement find_or_insert to search for an existing item or insert a new one in the same action, which reduces the number of hash function and comparison calls. (C++ STL find_or_insert is not good enough since it doesn't tell you whether the item was there before or not.)
Linearity Test
Using Mathematica hash function and arbitrary length integers.
Tested until n=2^20, generating random numbers till (2^20)^100 = (approx 10^602)
Just in case ... the program is:
k = {};
For[t = 1, t < 21, t++,
i = 2^t;
Clear[a, b];
Table[a[RandomInteger[i^100]] = 1, {i}];
b = Table[RandomInteger[i^100], {i}];
Contains = False;
AppendTo[k,
{i, First@Timing@For[j = 2, j <= i, j++,
Contains = Contains || (NumericQ[a[b[[j]]]]);
]}]];
ListLinePlot[k, PlotRange -> All, AxesLabel -> {"n", "Time(secs)"}]
Put the elements of the first array in a hash table, and check for existence while scanning the second array. This gives you a solution in O(N) average case.
If you want a truly O(N) worst case solution, then instead of using a hash table use a linear array in the range 0-n^100. Note that you need to use just a single bit per entry.
If storage is not important, then scrap the hash table in favor of an array with one flag per possible value. Set the flag to true when you come across that number in the first array. In the pass through the second array, if you find any flag that is already true, you have your answer. O(n).
Define largeArray(maxValue + 1) // one flag per possible value, all initially false
// First pass
for(element i in firstArray)
    largeArray[i] = true;
// Second pass
Define hasFound = false;
for(element i in secondArray) {
    if(largeArray[i] == true) {
        hasFound = true;
        break;
    }
}
Have you tried a counting sort? It is simple to implement, uses O(n) space and also has a Θ(n) time complexity.
Based on the ideas posted so far, we can store the integer elements of one array in a hash map. The hash map holds only unique integer values (duplicates are ignored), so even the maximum number of different integers can be stored in RAM.
Here is the implementation in Perl.
#!/usr/bin/perl
use strict;
use warnings;
sub find_common_elements { # prints the elements common to two unsorted arrays
    my ($arr1, $arr2) = @_; # two array references; plain arrays would be flattened into one list
    my %seen; # hash map holding the values of the first array
    # Building the hash map takes O(n).
    foreach my $ele (@$arr1) {
        $seen{$ele} = 1; # key is the integer, value is a true flag; duplicates overwrite the same key
        # the hash fits in memory because duplicates are ignored (at most 2^32 - 1 or 2^64 - 1 distinct keys, depending on the platform)
    }
    # O(n) to traverse the second array and find the common elements
    foreach my $ele2 (@$arr2) {
        # a hash lookup is O(1) on average, even if the hash holds only one entry
        if (exists $seen{$ele2}) {
            print "\n $ele2 is common in both arrays \n";
        }
    }
}
I hope it helps.

Resources