Finding the number of different elements in an array - arrays

We have an array of size n. How can we find how many different elements the array contains, and how many times each one occurs?
For example: in {1,-5,2,-5,2,7,-5,-5} we have 4 different elements, and the array of the counts will be: {1,2,1,4}.
So my questions are:
How can we find how many different elements there are in the array?
How can we count the occurrences of each one?
I am trying to solve it in O(n). I have tried a lot but didn't find a way. I tried to solve it with hash tables.

You are trying to get the frequency of each element in an array.
Initialize a hash where every new key starts with the value 0.
Loop through the array, add each element as a key to the hash and increment its value.
In JavaScript:
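var hash = {};
var a = [1, -5, 2, -5, 2, 7, -5, -5];
for (var i = 0; i < a.length; ++i) {
    // Start the count at 0 the first time a value is seen.
    if (hash[a[i]] === undefined)
        hash[a[i]] = 0;
    hash[a[i]] = hash[a[i]] + 1;
}
// Object.keys(hash).length is the number of distinct elements.
console.log(hash);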

The syntax and specific data structures you use will vary between languages, but the basic idea would be to store a running count of the number of instances of each value in an associative data structure (HashMap, Dictionary, whatever your language calls it).
Here is an example that will work in Java (I took a guess at the language you were using).
It's probably bad Java, but it illustrates the idea.
int[] myArray = {1,-5,2,-5,2,7,-5,-5};
HashMap<Integer,Integer> occurrences = new HashMap<Integer,Integer>();
for (int i=0;i<myArray.length;i++)
{
    if (occurrences.get(myArray[i]) == null)
    {
        occurrences.put(myArray[i],1);
    }
    else
    {
        occurrences.put(myArray[i],occurrences.get(myArray[i])+1);
    }
}
You can then use your HashMap to look up the distinct elements of the array like this
occurrences.keySet()
Other languages have their own hash map implementations (dictionaries in .NET and Python, hashes in Ruby).

There are different approaches to solving this problem, and the question might be asked in different ways. Here is a simple way to do it with std::map, which is available in the STL. But remember that it will always be sorted by key.
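#include <iostream>
#include <map>
using namespace std;
int main()
{
    int arr[]={1,-5,2,-5,2,7,-5,-5};
    int n=sizeof(arr)/sizeof(arr[0]);
    map<int,int>v;
    for(int i=0;i<n;i++)
        v[arr[i]]++;   // operator[] creates a missing key with the value 0
    // v.size() is the number of distinct elements
    map<int,int>::iterator it;
    for(it=v.begin();it!=v.end();++it)
        cout<<it->first<<" "<<it->second<<endl;
    return 0;
}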
it will show output like
-5 4
1 1
2 2
7 1

I suggest you read about counting sort.
I am not sure I understood correctly what you want to ask, but I think you want to:
1.) Scan an array and come up with the frequency of each unique element in that array.
2.) Find the total number of unique elements.
3.) Do all of that in linear time.
I think what you need is counting sort. See the algorithm on Wikipedia.
You can obviously skip the sorting part, but you should look at how it does the sorting, because its first step is the useful part for your problem. It first calculates a histogram of the frequency of each key: an array whose size equals the range of key values (max - min + 1), not the number of unique elements in your original array. This works for integers only (although you can always handle other types through integer keys or pointers).
So every index of this histogram array corresponds to a possible value in your original array, and the value at this index is the frequency of that value in the original array.
For example:
your array x = {3, 4, 3, 3, 1, 0, 1, 3}
// after calculation, you will get
your histogram array h[0 to 4] = {1, 2, 0, 4, 1}
I hope that is what you asked.
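The histogram step described above can be sketched roughly as follows. This is a minimal sketch in Python, assuming a non-empty list of integers with a small value range; the name value_histogram and the offset by the lowest value (so that negative numbers work too) are assumptions added for illustration.

```python
def value_histogram(values):
    """Return (lowest, counts) where counts[v - lowest] is the frequency of v."""
    lowest, highest = min(values), max(values)
    # One slot per possible value in the range lowest..highest.
    counts = [0] * (highest - lowest + 1)
    for v in values:
        counts[v - lowest] += 1
    return lowest, counts


lowest, counts = value_histogram([3, 4, 3, 3, 1, 0, 1, 3])
distinct = sum(1 for c in counts if c > 0)  # slots that were hit at least once
print(counts)    # [1, 2, 0, 4, 1]
print(distinct)  # 4
```

The memory cost grows with the value range rather than with the array length, so for widely spread values a hash map is the more practical choice.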

Related

Implementing Radix sort in java - quite a few questions

Although it is not clearly stated in my exercise, I am supposed to implement radix sort recursively. I've been working on the task for days, but so far I have only managed to produce garbage. We are required to work with two methods. The sort method receives an array with numbers ranging from 0 to 999 and the digit we are looking at. We are supposed to generate a two-dimensional matrix here in order to distribute the numbers inside the array. So, for example, 523 is positioned in the fifth row and 27 is positioned in the 0th row, since it is interpreted as 027.
I tried to do this with the help of a switch-case construct, dividing the numbers inside the array by 100, checking for the remainder and then positioning the number with respect to the remainder. Then I somehow tried to build buckets that include only the numbers with the same digit, so for example, 237 and 247 would be thrown in the same bucket in the first "round". I tried to do this by taking the whole row of the "fields" matrix where we put in the values before.
In the putInBucket method, I am required to extend the bucket (which I managed to do right, I guess) and then return it.
I am sorry, I know that the code is total garbage, but maybe there's someone out there who understands what I am up to and can help me a little bit.
I simply don't see how I need to work with the buckets here, I don't even understand why I have to extend them, and I don't see any way of returning them to the sort method (which, I think, I am required to do).
Further description:
The whole thing is meant to work as follows: We take an array with integers ranging from 0 to 999. Every number is then sorted by its first digit, as mentioned above. Imagine you have buckets labelled with the numbers 0 to 9. You start the sorting by putting 523 in bucket 5, 672 in bucket 6 and so on. This is easy when there is only one number (or no number at all) in each bucket. But it gets harder (and that's where recursion might come in handy) when you want to put more than one number in one bucket. The mechanism now goes as follows: We put two numbers with the same first digit in one bucket, for example 237 and 245. Now we want to sort these numbers again by the same algorithm, meaning we call the sort method (somehow) again with an array that contains only these two numbers, but this time we look at the second digit, so we would compare 3 and 4. We sort every number inside the array like this, and at the end, in order to get a sorted array, we start at the end, meaning at bucket 9, and then just put everything together. When we reach bucket 2, the algorithm would go into the recursive step, receive the already sorted array [237, 245] and deliver it in order to complete the whole thing.
My own problems:
I don't understand why we need to extend a bucket, and I can't figure it out from the description. It is simply stated that we are supposed to do so. I'd imagine that we do it in order to copy another element into it, because if we have the buckets from 0 to 9, putting two numbers into the same bucket would otherwise mean overwriting the first value. This might be the reason why we need to return the new, extended bucket, but I am not sure about that. Plus, I don't know how to go on from there. Even if I have an extended bucket now, it's not like I can simply attach it to the old matrix and copy another element into it again.
public static int[] sort(int[] array, int digit) {
    if (array.length == 0)
        return array;
    int[][] fields = new int[10][array.length];
    int[] bucket = new int[array.length];
    int i = 0;
    for (int j = 0; j < array.length; j++) {
        switch (array[j] / 100) {
            case 0: i = 0; break;
            case 1: i = 1; break;
            ...
        }
        fields[i][j] = array[j];
        bucket[i] = fields[i][j];
    }
    return bucket;
}
private static int[] putInBucket(int [] bucket, int number) {
    int[] bucket_new = new int[bucket.length+1];
    for (int i = 1; i < bucket_new.length; i++) {
        bucket_new[i] = bucket[i-1];
    }
    return bucket_new;
}
public static void main (String [] argv) {
    int[] array = readInts("Please type in the numbers: ");
    int digit = 0;
    int[] bucket = sort(array, digit);
}
You don't use digit in sort; that's quite suspicious.
The switch/case looks like a quite convoluted way to write i = array[j] / 100.
I'd recommend reading the Wikipedia description of radix sort.
The expression to extract a digit from a base-10 number is (number / (int) Math.pow(10, digit)) % 10. The cast matters: Math.pow returns a double, and without it the division would not be an integer division.
Note that you can count digits from left to right or right to left; make sure you get this right.
I suppose you first want to sort for digit 0, then for digit 1, then for digit 2. So there should be a recursive call at the end of sort that does this.
Your buckets array needs to be 2-dimensional. You'll need to call it this way: buckets[i] = putInBucket(buckets[i], array[j]). If you handle null in putInBucket, you don't need to initialize it.
The reason why you need a 2D bucket array and putInBucket (instead of your fixed-size field) is that you don't know how many numbers will end up in each bucket.
The second phase (reading back from the buckets to the array) is missing before the recursive call.
Make sure to stop the recursion after 3 digits.
Good luck
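The points above can be sketched roughly as follows. This is a minimal sketch in Python rather than Java, assuming non-negative integers below 1000 and counting digits from the left (digit 0 is the hundreds place); the name radix_sort_msd and its parameters are made up for illustration, and Python lists stand in for the growable buckets that putInBucket would provide.

```python
def radix_sort_msd(numbers, digit=0, num_digits=3):
    """Recursively sort non-negative ints below 10**num_digits, leftmost digit first."""
    # Nothing left to separate: at most one number, or all digits already used.
    if len(numbers) <= 1 or digit >= num_digits:
        return list(numbers)
    buckets = [[] for _ in range(10)]
    divisor = 10 ** (num_digits - 1 - digit)
    for n in numbers:
        # Integer division, then % 10, extracts the current digit.
        buckets[(n // divisor) % 10].append(n)
    result = []
    for bucket in buckets:
        # Each bucket is sorted on the next digit, then read back in order.
        result.extend(radix_sort_msd(bucket, digit + 1, num_digits))
    return result


print(radix_sort_msd([523, 27, 672, 237, 245, 0, 999]))
```

Reading the buckets back from 0 to 9 gives ascending order; the recursion stops after three digits or as soon as a bucket holds at most one number.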

How do I generate random numbers from an array without repetition?

I know similar questions have been asked before, but bear with me.
I have an array:
int [] arr = {1,2,3,4,5,6,7,8,9};
I want numbers to be generated randomly 10 times. Something like this:
4,6,8,2,4,9,3,8,7
Although some numbers are repeated, no number is generated more than once in a row. So not like this:
7,3,1,8,8,2,4,9,5,6
As you can see, the number 8 was repeated immediately after it was generated. This is not the desired effect.
So basically, I'm ok with a number being repeated as long as it doesn't appear more than once in a row.
1.) Generate a random number.
2.) Compare it to the last number you generated.
3.) If it is the same, discard it.
4.) If it is different, add it to the array.
5.) Return to step 1 until you have enough numbers.
1.) Generate a random index into the array.
2.) Repeat until it's different from the last index used.
3.) Pull the value corresponding to that index out of the array.
4.) Repeat from the beginning until you have as many numbers as you need.
While the answers posted are not bad and would work well, someone might not be pleased with the solution, as it is possible (though incredibly unlikely) for it to hang if you keep generating the same number for long enough.
An algorithm that deals with this "problem" while preserving the distribution of numbers would be:
Pick a random number from the original array, let's call it n, and output it.
Make an array of all elements but n.
Generate a random index into the shorter array. Swap the element at that index with n. Output n.
Repeat the last step until enough numbers have been output.
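The swap-based steps above can be sketched roughly like this. It is a minimal sketch in Python, assuming the source values are distinct and there are at least two of them; the function name no_immediate_repeats is made up, and instead of a separate shorter array it keeps the last output in the final slot of one working copy, which is an implementation choice of this sketch.

```python
import random


def no_immediate_repeats(values, count):
    """Return `count` random picks from `values` with no value twice in a row."""
    if count <= 0:
        return []
    pool = list(values)  # working copy; the last slot always holds the previous output
    first = random.randrange(len(pool))
    pool[first], pool[-1] = pool[-1], pool[first]
    result = [pool[-1]]
    while len(result) < count:
        # Pick among all slots except the last one, then swap the pick into it.
        idx = random.randrange(len(pool) - 1)
        pool[idx], pool[-1] = pool[-1], pool[idx]
        result.append(pool[-1])
    return result


print(no_immediate_repeats([1, 2, 3, 4, 5, 6, 7, 8, 9], 10))
```

Each output costs exactly one random number and one swap, so the loop cannot stall the way a retry loop can.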
int[] arr = {1, 2, 3, 4, 5, 6, 7, 8, 9};
int[] result = new int[10];
int previousChoice = -1;
int i = 0;
while (i < 10) {
    int randomIndex = (int) (Math.random() * arr.length);
    // Keep the pick only if it differs from the previous one.
    if (arr[randomIndex] != previousChoice) {
        result[i] = arr[randomIndex];
        previousChoice = arr[randomIndex];
        i++;
    }
}
The solutions given so far all involve non-constant work per generation; if you repeatedly generate indices and test for repetition, you could conceivably generate the same index many times before finally getting a new index. (An exception is Kiraa's answer, but that one involves high constant overhead to make copies of partial arrays)
The best solution here (assuming you want unique indices, not unique values, and/or that the source array has unique values) is to cycle the indices so you always generate a new index in (low) constant time.
Basically, you'd have a loop like this (using Python mostly for brevity):
# randrange(x, y) generates an int in range x to y-1 inclusive
from random import randrange
arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
result = []
selectidx = 0
randstart = 0
for _ in range(10):  # Runs loop body 10 times
    # Generate offset from last selected index (randstart is initially 0,
    # allowing any index to be selected; on subsequent loops, it's 1, preventing
    # repeated selection of last index)
    offset = randrange(randstart, len(arr))
    randstart = 1
    # Add offset to last selected index and wrap so we cycle around the array
    selectidx = (selectidx + offset) % len(arr)
    # Append element at newly selected index
    result.append(arr[selectidx])
This way, each generation step is guaranteed to require no more than one new random number, with the only constant additional work being a single addition and remainder operation.

What's an efficient way to filter an array?

I am programming in C on Linux and I have a big integer array. How can I filter it, i.e. find the values that fit some condition, e.g. value > 1789 && value < 2031? What's an efficient way to do this? Do I need to sort the array first?
I've read the answers, and thank you all, but I need to do such a filtering operation many times on this big array, not just once. So is iterating over it one by one every time the best way?
If the only thing you want to do with the array is to get the values that match this criterion, it would be faster just to iterate over the array and check each value for the condition (O(n) vs. O(n log n)). If, however, you are going to perform multiple operations on this array, then it's better to sort it.
Sort the array first. Then on each query do 2 binary searches. I'm assuming queries will be like:
Find integers x such that a < x < b
The first binary search would find the index i of the element such that Array[i-1] <= a < Array[i], and the second binary search would find the index j such that Array[j] < b <= Array[j+1]. Then your desired range would be [i, j].
This algorithm's complexity is O(N log N) for preprocessing, O(N) per query if you want to iterate over all the matching elements, and O(log N) per query if you just want to count the number of matching elements.
Let me know if you need help implementing binary search in C. The C standard library has bsearch(), but it only finds an exact match, so for these bounds you would write the search yourself; the C++ STL has binary_search(), lower_bound() and upper_bound().
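The two binary searches can be sketched as follows. This is a minimal sketch in Python using the standard bisect module rather than C; the function name range_query and the sample data are made up for illustration, and the bounds are exclusive, matching a < x < b.

```python
from bisect import bisect_left, bisect_right


def range_query(sorted_values, low, high):
    """Return all x in sorted_values with low < x < high."""
    i = bisect_right(sorted_values, low)   # first index whose value is > low
    j = bisect_left(sorted_values, high)   # first index whose value is >= high
    return sorted_values[i:j]


data = sorted([2050, 1800, 1789, 1500, 2031, 1900, 1790, 2030])  # sort once
print(range_query(data, 1789, 2031))  # [1790, 1800, 1900, 2030]
```

If only the number of matches is needed, j - i gives it without touching the elements in between.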
You could use a max heap implemented as an array of the same size as the source array. Initialize it with min-1 value and insert values into the max-heap as the numbers come in. The first check would be to see if the number to be inserted is greater than the first element, if it's not, discard it, if it is larger then insert it into the array. To get the list of numbers back, read all numbers in the new array till min-1.
To filter the array, you'll have to look at each element once. There's no need to look at any element more than once, so a simple linear search of the array for items matching your criteria is going to be as efficient as you can get.
Sorting the array would end up looking at some elements more than once, which is not necessary for your purpose.
If you can spare some more memory, then you can scan your array once, get the indices of the matching values and store them in another array. This new array will be significantly shorter, since it holds only the indices of values that match the condition. Something like this:
int original_array[SOME_SIZE];
int new_array[LESS_THAN_SOME_SIZE];
int match_count = 0;   /* how many indices were stored */
for ( int i=0; i<SOME_SIZE; i++)
{
    if ( original_array[i]> LOWER_LIMIT && original_array[i]< HIGHER_LIMIT )
    {
        new_array[match_count++] = i;
    }
}
You need to do the above once, and from now on:
for ( int i=0; i<match_count; i++ )
{
    /* every stored index already satisfies the condition */
    printf("Success! Found Value %d\n", original_array[new_array[i]] );
}
So at the cost of some memory, you can save a considerable amount of time. Even if you invest some time in sorting, you have to parse the sorted array every time. This method minimizes the array length as well as the sorting time (at the cost of extra memory, of course :) )
Try this library: http://code.google.com/p/boolinq/
It is iterator-based and as fast as can be, with no overhead, but it needs the C++11 standard. Your code will be written in a declarative way:
int arr[] = {1,2,3,4,5,6,7,8,9};
auto items = boolinq::from(arr).where([](int a){return a>3 && a<6;});
while (!items.empty())
{
int item = items.front();
...
}
Only a multithreaded scan can be faster than an iterator-based scan...

How to calculate the mode of an unsorted array of integers in O(N)?

...using an iterative procedure (no hash table)?
It's not homework. And by mode I mean the most frequent number (statistical mode). I don't want to use a hash table because I want to know how it can be done iteratively.
OK Fantius, how bout this?
Sort the list with a RadixSort (BucketSort) algorithm (technically O(N) time; the numbers must be integers). Start at the first element, remember its value and start a count at 1. Iterate through the list, incrementing the count, until you reach a different value. If the count for that value is higher than the current high count, remember that value and count as the mode. If you get a tie with the high count, remember both (or all) numbers.
... yeah, yeah, the RadixSort is not an in-place sort, and thus involves something you could call a hashtable (a collection of collections indexed by the current digit). However, the hashtable is used to sort, not to calculate the mode.
I'm going to say that on an unsorted list, it would be impossible to compute the mode in linear time without involving a hashtable SOMEWHERE. On a sorted list, the second half of this algorithm works by just keeping track of the current max count.
Definitely sounds like homework. But, try this: go through the list once, and find the largest number. Create an array of integers with that many elements, all initialized to zero. Then, go through the list again, and for each number, increment the equivalent index of the array by 1. Finally, scan your array and return the index that has the highest value. This will execute in roughly linear time, whereas any algorithm that includes a sort will probably take NlogN time or worse. However, this solution is a memory hog; it'll basically create a bell plot just to give you one number from it.
Remember that many (but not all) languages use arrays that are zero-based, so when converting from a "natural" number to an index, subtract one, and then add one to go from index to natural number.
If you don't want to use a hash, use a modified binary search trie (with a counter per node). For each element in the array insert into the trie. If it already exists in the trie, increment the counter. At the end, find the node with the highest counter.
Of course you can also use a hashmap that maps to a counter variable and will work the same way. I don't understand your complaint about it not being iterative... You iterate through the array, and then you iterate through the members of the hashmap to find the highest counter.
Just use counting sort and look into the array which stores the number of occurrences of each entity.
I prepared two implementations in Python with different space and time complexity:
The first one uses an "occurrence array". It takes O(n + k) time and needs k+1 array slots of space, where n is the size of the input and k is the greatest number in the input.
input =[1,2,3,8,4,6,1,3,7,9,6,1,9]
def find_max(tab):
    max=tab[0]
    for i in range(0,len(tab)):
        if tab[i] > max:
            max=tab[i]
    return max
C = [0]*(find_max(input)+1)
print(len(C))
def count_occurences(tab):
    max_occurence=C[0]
    max_occurence_index=0
    for i in range(0,len(tab)):
        C[tab[i]]=C[tab[i]]+1
        if C[tab[i]]>max_occurence:
            max_occurence = C[tab[i]]
            max_occurence_index=tab[i]
    return max_occurence_index
print(count_occurences(input))
NOTE: Imagine a pathological input such as the array [1, 10^8, 1, 1, 1]: an array of length k+1=100000001 will be needed.
The second solution assumes that we sort our input before searching for the mode. I used radix sort, which has time complexity O(kn), where k is the number of digits of the longest number and n is the size of the input array. Then we have to iterate over the whole sorted array of size n to find the longest run of equal numbers, which is the mode.
input =[1,2,3,8,4,6,1,3,7,9,6,1,9]
def radix_sort(A):
    # least-significant-digit radix sort for non-negative integers
    len_A = len(A)
    mod = 10 #base-10 digits, one bucket per digit
    div = 1
    while True:
        the_buckets = [[], [], [], [], [], [], [], [], [], []]
        for value in A:
            ldigit = value % mod
            ldigit = ldigit // div
            the_buckets[ldigit].append(value)
        mod = mod * 10
        div = div * 10
        # every value has run out of digits, so the previous pass left A sorted
        if len(the_buckets[0]) == len_A:
            return the_buckets[0]
        A = []
        rd_list_append = A.append
        for b in the_buckets:
            for i in b:
                rd_list_append(i)
def find_mode_in_sorted(A):
    mode=A[0]
    number_of_occurences=0
    run_length=0
    for i in range(len(A)):
        # extend the current run of equal values, or start a new one
        if i > 0 and A[i] == A[i-1]:
            run_length=run_length+1
        else:
            run_length=1
        if run_length > number_of_occurences:
            mode=A[i]
            number_of_occurences=run_length
    return mode#,number_of_occurences
s_input=radix_sort(input)
print(find_mode_in_sorted(s_input))
Using JavaScript:
const mode = (arr) => {
    let numMapping = {};
    let mode
    let greatestFreq = 0;
    for(var i = 0; i < arr.length; i++){
        if(numMapping[arr[i]] === undefined){
            numMapping[arr[i]] = 0;
        }
        numMapping[arr[i]] += 1;
        if (numMapping[arr[i]] > greatestFreq){
            greatestFreq = numMapping[arr[i]]
            mode = arr[i]
        }
    }
    return parseInt(mode)
}

Compare two integer arrays with same length

[Description] Given two integer arrays with the same length. Design an algorithm which can judge whether they're the same. The definition of "same" is that, if these two arrays were in sorted order, the elements in corresponding position should be the same.
[Example]
<1 2 3 4> = <3 1 2 4>
<1 2 3 4> != <3 4 1 1>
[Limitation] The algorithm should require constant extra space, and O(n) running time.
(Probably too complex for an interview question.)
(You can use O(N) time to check the min, max, sum, sumsq, etc. are equal first.)
Use no-extra-space radix sort to sort the two arrays in-place. O(N) time complexity, O(1) space.
Then compare them using the usual algorithm. O(N) time complexity, O(1) space.
(Provided (max − min) of the arrays is of O(N^k) with a finite k.)
You can try a probabilistic approach: convert each array into a number in some huge base B and reduce it modulo some prime P, for example the sum of B^(a_i) over all i, mod some big-ish P. If both arrays come out to the same number, try again with as many primes as you want. If the results differ on any attempt, the arrays are not the same. If they pass enough challenges, then they are equal with high probability.
There's a trivial proof for B > N, P > biggest number. So there must be a challenge that cannot be met. This is actually the deterministic approach, though the complexity analysis might be more difficult, depending on how people view the complexity in terms of the size of the input (as opposed to just the number of elements).
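A rough sketch of the probabilistic check might look like this. It is a minimal sketch in Python under several assumptions that are not in the answer: it keeps one fixed prime (the Mersenne prime 2^61 - 1) and redraws the base B on each round instead of switching primes, and it shifts all values by the smallest one so that exponents are never negative. The function name and the rounds parameter are made up.

```python
import random


def probably_same_multiset(a, b, rounds=5, prime=(1 << 61) - 1):
    """Compare sum(B**a_i) mod prime for both arrays with random bases B."""
    if len(a) != len(b):
        return False
    # Shift values so every exponent is non-negative.
    offset = min(min(a, default=0), min(b, default=0))
    for _ in range(rounds):
        base = random.randrange(2, prime - 1)
        hash_a = sum(pow(base, x - offset, prime) for x in a) % prime
        hash_b = sum(pow(base, x - offset, prime) for x in b) % prime
        if hash_a != hash_b:
            return False  # a mismatch is conclusive
    return True  # equal on every round: the same with high probability


print(probably_same_multiset([1, 2, 3, 4], [3, 1, 2, 4]))
print(probably_same_multiset([1, 2, 3, 4], [3, 4, 1, 1]))
```

Only a few integers are kept besides the inputs, but each pow call costs time logarithmic in the exponent, so the running time depends on the size of the values as well as on n.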
I claim that: unless the range of the input is specified, it is IMPOSSIBLE to solve in constant extra space and O(n) running time.
I will be happy to be proven wrong, so that I can learn something new.
Insert all elements from the first array into a hashtable
Try to insert all elements from the second array into the same hashtable - for each insert the element should already be there
Ok, this is not with constant extra space, but it's the best I could come up with at the moment :-). Are there any other constraints imposed on the question, for example on the biggest integer that may be included in the array?
A few answers are basically correct, even though they don't look like it. The hash table approach (for one example) has an upper limit based on the range of the type involved rather than the number of elements in the arrays. At least by most definitions, that makes the (upper limit on the) space a constant, although the constant may be quite large.
In theory, you could change that from an upper limit to a true constant amount of space. Just for example, if you were working in C or C++, and it was an array of char, you could use something like:
size_t counts[UCHAR_MAX + 1];
Since UCHAR_MAX is a constant, the amount of space used by the array is also a constant.
Edit: I'd note for the record that a bound on the ranges/sizes of items involved is implicit in nearly all descriptions of algorithmic complexity. Just for example, we all "know" that Quicksort is an O(N log N) algorithm. That's only true, however, if we assume that comparing and swapping the items being sorted takes constant time, which can only be true if we bound the range. If the range of items involved is large enough that we can no longer treat a comparison or a swap as taking constant time, then its complexity would become something like O(N log N log R), where R is the range, so log R approximates the number of bits necessary to represent an item.
Is this a trick question? If the authors assumed integers to be within a given range (2^32 etc.) then "extra constant space" might simply be an array of size 2^32 in which you count the occurrences in both lists.
If the integers are unranged, it cannot be done.
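For a small known range, the counting idea can be sketched like this. It is a minimal sketch in Python; the function name and the low/high parameters are made up, and the range is passed in explicitly because a 2^32-slot table is impractical as an example.

```python
def same_elements_bounded(a, b, low, high):
    """Compare two arrays whose values are all known to lie in low..high."""
    counts = [0] * (high - low + 1)  # size depends on the range, not on len(a)
    for x in a:
        counts[x - low] += 1
    for x in b:
        counts[x - low] -= 1
    # Every count returns to zero only if both arrays hold the same values
    # the same number of times.
    return not any(counts)


print(same_elements_bounded([1, 2, 3, 4], [3, 1, 2, 4], 1, 4))  # True
print(same_elements_bounded([1, 2, 3, 4], [3, 4, 1, 1], 1, 4))  # False
```

The table size is fixed by the value range alone, which is the sense in which the extra space can be called constant.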
You could add each element into a hashmap<Integer, Integer> with the following rules: array A is the adder, array B is the remover. When inserting from array A, if the key does not exist, insert it with a value of 1. If the key exists, increment the value (keep a count). When removing, if the key exists and its value is greater than 1, reduce it by 1. If the key exists and its value is 1, remove the element.
Run through array A followed by array B using the rules above. If at any time during the removal phase an element of array B is not found, you can immediately return false. If the hashmap is empty after both the adder and the remover are finished, the arrays are equivalent.
Edit: The size of the hashtable will be equal to the number of distinct values in the array. Does this fit the definition of constant space?
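The adder/remover rules can be sketched as follows. This is a minimal sketch in Python, with a dict standing in for the hashmap<Integer, Integer>; the function name same_elements is made up.

```python
def same_elements(a, b):
    """Return True if a and b contain the same values with the same counts."""
    if len(a) != len(b):
        return False
    counts = {}
    for x in a:  # adder phase
        counts[x] = counts.get(x, 0) + 1
    for x in b:  # remover phase
        remaining = counts.get(x, 0)
        if remaining == 0:
            return False  # b holds a value that a has run out of
        if remaining == 1:
            del counts[x]
        else:
            counts[x] = remaining - 1
    return not counts  # an empty map means everything was matched


print(same_elements([1, 2, 3, 4], [3, 1, 2, 4]))  # True
print(same_elements([1, 2, 3, 4], [3, 4, 1, 1]))  # False
```

This runs in expected O(n) time, but the map holds one entry per distinct value of the first array, so the extra space is O(n) in the worst case.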
I imagine the solution will require some sort of transformation that is both associative and commutative and guarantees a unique result for a unique set of inputs. However I'm not sure if that even exists.
public static boolean match(int[] array1, int[] array2) {
    int x, y = 0;
    for(x = 0; x < array1.length; x++) {
        y = x;
        while(array1[x] != array2[y]) {
            if (y + 1 == array1.length)
                return false;
            y++;
        }
        int swap = array2[x];
        array2[x] = array2[y];
        array2[y] = swap;
    }
    return true;
}
For each array, use the counting sort technique to build the count of the number of elements less than or equal to each element. Then compare the two auxiliary arrays at every index: if they are equal, the arrays are equal; otherwise they are not. Counting sort requires O(n) (for a bounded range of values) and the array comparison at every index is again O(n), so in total it is O(n), and the space required is equal to the size of the two arrays. Here is a link to counting sort: http://en.wikipedia.org/wiki/Counting_sort.
Given that the ints are in the range -n..+n, a simple way to check for equality may be the following (pseudo code):
// a & b are the arrays
accumulator = 0
arraysize = size(a)
for(i=0 ; i < arraysize; ++i) {
    accumulator = accumulator + a[i] - b[i]
    if abs(accumulator) > ((arraysize - i) * n) { return FALSE }
}
return (accumulator == 0)
The accumulator must be able to store integers in the range +- arraysize * n. Note that this compares sums only, so it can reject arrays but cannot prove them equal: {1, 4} and {2, 3} also pass.
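How 'bout this: XOR all the numbers in both arrays. If the result is non-zero, the arrays differ. A result of 0 is necessary for a match but does not prove one: {1, 1} and {2, 2} also XOR to 0.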

Resources