Detecting the most frequently recurring symbol from ASCII characters in C

How do I write an implementation for a function that takes a sequence of ASCII characters as input and returns the most frequently recurring symbol? I need to do this in C. Where is my mistake?
char mostFrequentCharacter(char* str, int size);

char value;
int valueCount = 0;
for (int i = 0; i < strlen(str); i++)
{
    char oneChar = str[i];
    var totalCount = source.Split(oneChar).Length - 1;;
    if (totalCount >= valueCount)
    {
        valueCount = totalCount;
        value = oneChar;
    }
}
return value;
The function needs to be optimized to run on a device with a dual-core ARM-based processor and an infinite amount of memory.

If memory is not an issue, as you noted, then you should create a lookup table where you store the number of occurrences of each character. Since the input is a sequence of ASCII characters, the size of the table should be 256. After checking the input and initializing the lookup table, in the main for loop increment the count in the corresponding slot of the lookup table and check whether that count exceeds the current maximal count; if so, update the current maximal count and the current most frequent character. At the end, just return the most frequent character. The time complexity of this solution is O(N) and the space complexity is O(1).
char mostFrequentCharacter(char* str, int size) {
    char mostFrequent = '\0';
    int counts[256], i, maxCount = 0;

    // in the case of invalid input, return some invalid character
    if (!str || size < 1)
        return '\0';

    for (i = 0; i < 256; i++)
        counts[i] = 0;

    for (i = 0; i < size; i++)
    {
        unsigned char c = (unsigned char)str[i];
        counts[c]++;
        if (counts[c] > maxCount) {
            maxCount = counts[c];
            mostFrequent = str[i];
        }
    }
    return mostFrequent;
}

Here is an algorithm outline:
1. Declare a 256-element array of integers (pick your size), zero it.
2. Loop over the string:
2a. Use each char as an index into your array and increment that element.
2b. If the incremented element is the largest so far, record the index.
All done in one pass over the string. Storage is 256 * sizeof(int) bytes, but you have "infinite memory" and that's minuscule in comparison ;-)

Related

C - Counting the occurrence of same number in an array

I have an array in C where:
int buf[4];
buf[0] = 1;
buf[1] = 2;
buf[2] = 5;
buf[3] = 2;
and I want to count, using a counter, how many elements in the array have the same value.
In the above example, the number of elements of similar value is 2 since there are two 2s in the array.
I tried:
#include <stdio.h>

int main() {
    int buf[4];
    int i = 0;
    int count = 0;

    buf[0] = 1;
    buf[1] = 2;
    buf[2] = 5;
    buf[3] = 2;

    int length = sizeof(buf) / sizeof(int);

    for (i = 0; i < length; i++) {
        if (buf[i] == buf[i+1]) {
            count++;
        }
    }

    printf("count = %d", count);
    return 0;
}
but I'm getting 0 as the output. Would appreciate some help on this.
Update
Apologies for not being clear.
First:
the array is limited to only of size 4 since it involves 4 directions, left, bottom, top and right.
Second:
if there are at least 2 elements in the array that have the same value, the count is accepted. Anything less will simply not register.
Example:
1,2,5,2
count = 2 since there are two '2's in the array.
1,2,2,2
count = 3 since there are three '2's in the array
1,2,3,4
count = 0 since there are no similarities in the array. Hence this is not accepted.
Anything less than the count = 2 is invalid.
You are really rather hamstrung by the order in which the values appear within buf. The only rudimentary way to handle this when limited to 4 values is to make a pass with nested loops to determine what the matching value is, and then make a single pass over buf counting how many times it occurs (and since you are limited to 4 values, even with a pair of matches your count is limited to 2 -- so it doesn't make a difference which pair you count).
A short example would be:
#include <stdio.h>

int main (void) {

    int buf[] = {1, 2, 5, 2},
        length = sizeof(buf) / sizeof(int),
        count = 0,
        same = 0;

    for (int i = 0; i < length - 1; i++)    /* identify what value matches */
        for (int j = i + 1; j < length; j++)
            if (buf[i] == buf[j]) {
                same = buf[i];
                goto saved;     /* jump out of both loops when same found */
            }

saved:; /* the lowly, but very useful 'goto' saves the day - again */

    for (int i = 0; i < length; i++)        /* count matching numbers */
        if (buf[i] == same)
            count++;

    printf ("count = %d\n", count);

    return 0;
}
Example Use/Output
$ ./bin/arr_freq_count
count = 2
While making that many passes over the values, it takes little more effort to use an actual frequency array to fully determine how often each value occurs, e.g.
#include <stdio.h>
#include <string.h>
#include <limits.h>

int main (void) {

    int buf[] = {1, 2, 3, 4, 5, 2, 5, 6},
        n = sizeof buf / sizeof *buf,
        max = INT_MIN,
        min = INT_MAX;

    for (int i = 0; i < n; i++) {   /* find max/min for range */
        if (buf[i] > max)
            max = buf[i];
        if (buf[i] < min)
            min = buf[i];
    }

    int range = max - min + 1;      /* max-min elements (inclusive) */
    int freq[range];                /* declare VLA */
    memset (freq, 0, range * sizeof *freq);     /* initialize VLA zero */

    for (int i = 0; i < n; i++)     /* loop over buf setting count in freq */
        freq[buf[i]-min]++;

    for (int i = 0; i < range; i++) /* output frequency of values */
        printf ("%d occurs %d times\n", i + min, freq[i]);

    return 0;
}
(note: add a sanity check on the range to prevent being surprised by the amount of storage required if min is actually close to INT_MIN and your max is close to INT_MAX -- things could come to a quick stop depending on the amount of memory available)
Example Use/Output
$ ./bin/freq_arr
1 occurs 1 times
2 occurs 2 times
3 occurs 1 times
4 occurs 1 times
5 occurs 2 times
6 occurs 1 times
After your edit and explanation that you are limited to 4 values, the compiler should optimize the first rudimentary approach just fine. However, for any more than 4 values, or whenever you need the frequency of anything (characters in a file, duplicates in an array, etc.), think frequency array.
The first thing that's wrong is that you are only comparing adjacent values in the buf array. You have to compare all the values to each other.
How to do this is an architectural question. The approach suggested by David Rankin in the comments is one, using an array of structs with the value and count is a second, and using a hash table is a third option. You've got some coding to do! Good luck. Ask for more help as you need it.
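For illustration, here is a minimal sketch of the second option, an array of value/count structs; the names (struct pair, count_values) are mine, not from the original post, and a hash table would scale better for large inputs.

#include <stdio.h>

/* One slot per distinct value seen so far, each holding the value and how
   often it occurred. */
struct pair { int value; int count; };

int count_values(const int *buf, int len, struct pair *out)
{
    int distinct = 0;
    for (int i = 0; i < len; i++) {
        int j;
        for (j = 0; j < distinct; j++) {
            if (out[j].value == buf[i]) { out[j].count++; break; }
        }
        if (j == distinct) {               /* first time we see this value */
            out[distinct].value = buf[i];
            out[distinct].count = 1;
            distinct++;
        }
    }
    return distinct;                       /* number of distinct values */
}

int main(void)
{
    int buf[] = {1, 2, 5, 2};
    struct pair counts[4];                 /* at most 4 distinct values here */
    int distinct = count_values(buf, 4, counts);
    for (int i = 0; i < distinct; i++)
        printf("%d occurs %d time(s)\n", counts[i].value, counts[i].count);
    return 0;
}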
You are comparing values of buf[i] and buf[i+1]. i.e. You are comparing buf[0] with buf[1], buf[1] with buf[2] etc.
What you need is a nested for loop to compare all buf values with each other.
count = 0;
for (i = 0; i < 4; i++)
{
    for (j = i + 1; j < 4; j++)
    {
        if (buf[i] == buf[j])
        {
            count++;
        }
    }
}
As pointed out by Jonathan Leffler, there is an issue with the above algorithm when the input is {1,1,1,1}: it gives a value of 6 when the expected value is 4.
I am keeping it up, as the OP has mentioned that he only wants to check for counts of 2 or more, so this method may still be useful.

Number of valid sub-string algorithm

I came across a question for which I couldn't find the algorithm. Can you help me?
Question: A valid substring is one which contains the letter a or z. You will get a string and you have to calculate the number of valid sub-strings of that string. For example, the string 'abcd' contains 4 valid substrings. The string 'azazaz' contains 21 valid substrings and similarly 'abbzbba' contains 22 valid substrings.
I just want to know the algorithm.
Define D[i] - number of valid substrings ending at index i.
Assuming you have this D[i], the solution is simply D[0]+D[1]+...+D[n-1].
Calculating D is fairly simple, by iterating the string and, for each character:
if it is "valid", all substrings ending with this character are valid.
Otherwise, a substring ending here is valid only if it extends a valid substring that ended at the previous character.
C code:
#include <string.h>

int NumValidSubstrings(char* s) {
    int n = strlen(s);
    if (n == 0)
        return 0;
    int D[n];               // VLA; if that's an issue, just use dynamic allocation
    memset(D, 0, n * sizeof D[0]);
    for (int i = 0; i < n; i++) {
        if (s[i] == 'z' || s[i] == 'a') {
            // if the character is valid, each substring ending with it is also valid.
            D[i] = i + 1;
        } else if (i > 0) {
            // Else, only the valid substrings ending at the previous character, extended by 1
            D[i] = D[i-1];
        }
    }
    int count = 0;
    for (int i = 0; i < n; i++) count += D[i];
    return count;
}
Notes:
This technique is called Dynamic Programming.
This solution is O(n) time + space.
You can save some space by not storing the entire D array - but only the last value and calculate count on the fly, making this solution O(1) space and O(n) time.
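As a rough sketch of that last note (my code, not from the answer), keep only the previous D value and a running total:

#include <string.h>

long NumValidSubstringsO1(const char *s)
{
    long last = 0;          /* valid substrings ending at the previous index */
    long total = 0;
    size_t n = strlen(s);
    for (size_t i = 0; i < n; i++) {
        if (s[i] == 'a' || s[i] == 'z')
            last = (long)i + 1;     /* every substring ending here is valid */
        /* otherwise last stays the same, i.e. D[i] = D[i-1] */
        total += last;
    }
    return total;           /* e.g. 4 for "abcd", 21 for "azazaz" */
}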

Hash function is not giving desire results

I am implementing a hash function in order to check for anagrams, but I am not getting the desired output. Could you suggest what went wrong?
Output:
key[148]:val[joy]
key[174]:val[jam]
key[294]:val[paula]
key[13]:val[ulrich]
key[174]:val[cat]
key[174]:val[act]
key[148]:val[yoj]
key[265]:val[vij]
key[265]:val[jiv]
Here the key value 174 is fine for the strings act and cat (anagrams), but the same can't be expected for jam.
Below is the code snippet.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

unsigned long hash(char *str, size_t size) {
    unsigned long hash_val = 5381;
    unsigned long sum = 0;
    char *val;
    int i, j;
    for (j = 0; j < 9; j++) {
        val = malloc(strlen(str) + 1);
        memset(val, '\0', strlen(str) + 1);
        strcpy(val, str);
        for (i = 0; val[i] != '\0'; i++) {
            sum = sum + val[i];
        }
        return size % sum;
    }
}

int main() {
    int i;
    char *str[9] = { "joy", "jam", "paula", "ulrich", "cat", "act", "yoj", "vij", "jiv" };
    unsigned long key;
    size_t size = 4542; // it may be anything just for test it is being used
    for (i = 0; i < 9; i++) {
        key = hash(str[i], size);
        printf("\nkey[%ld]:val[%s]", key, str[i]);
    }
    return 1;
}
Yes, it can, because your hash function is very poorly written - it returns your constant 'size' variable modulo the sum of all the string's characters.
The problem is that the sum of the ASCII codes 'c' + 'a' + 't' is equal to that of 'j' + 'a' + 'm' (both 312), so you are getting the same value for your 'hash'.
You could use a 'normal' (e.g. polynomial) hash function for your anagram table, but with sorted strings - that would be the easiest approach.
For another method, you can calculate a number of appearances of each letter in the string (a histogram) and hash (or just store as is) them instead.
I recommend you to do some research on this topic as it's a very common task.
Also, you could just sort the strings and let unordered_set<string> do the job for you.
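A minimal sketch of the "hash the sorted string" idea from this answer; the function names and the djb2-style polynomial hash are my own choices, not part of the original post:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sort a copy of the word, then apply an ordinary polynomial hash to the
   sorted copy. Anagrams sort to the same character sequence and therefore
   get the same key. */
static int cmp_char(const void *a, const void *b)
{
    return *(const unsigned char *)a - *(const unsigned char *)b;
}

unsigned long anagram_hash(const char *str, size_t size)
{
    size_t len = strlen(str);
    char *sorted = malloc(len + 1);
    if (!sorted)
        return 0;
    memcpy(sorted, str, len + 1);
    qsort(sorted, len, 1, cmp_char);

    unsigned long h = 5381;
    for (size_t i = 0; i < len; i++)
        h = h * 33 + (unsigned char)sorted[i];
    free(sorted);
    return h % size;
}

int main(void)
{
    const char *words[] = { "cat", "act", "jam" };
    for (int i = 0; i < 3; i++)
        printf("key[%lu]: val[%s]\n", anagram_hash(words[i], 4542), words[i]);
    return 0;
}

Anagrams such as "cat" and "act" get the same key this way, while words like "jam" that merely share the character sum generally do not.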
but same can't be expected with jam.
Well, there you go wrong. Let's look at your algorithm. What you're doing is basically summing up the ASCII values of the elements of the strings, and returning a fixed value modulo that sum.
To elaborate, as per the ASCII table,
j == 106
a == 97
m == 109
and
c == 99
a == 97
t == 116
Both the words end up having a sum result of 312.
Now as per your algo,
4542 % 312
is supposed to give a constant value, right? That is what it is giving.
Now, don't be "sad", as
s == 115
a == 97
d == 100
that also comes up with 312.
That said, I see you have a local variable unsigned long hash_val = 5381; defined inside your function, but used nowhere.
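For reference, a quick check of the sums quoted above (my own throwaway snippet, not part of the answer):

#include <stdio.h>

int main(void)
{
    const char *words[] = { "jam", "cat", "sad" };
    for (int i = 0; i < 3; i++) {
        int sum = 0;
        for (const char *p = words[i]; *p; p++)
            sum += *p;
        printf("%s -> sum %d, 4542 %% sum = %d\n", words[i], sum, 4542 % sum);
    }
    return 0;
}

All three words sum to 312, so 4542 % sum yields the same key, 174, every time.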
Your hash function has many problems:
The for (j = 0; j < 9; j++) loop is completely useless.
It is utterly inadequate to allocate memory for a copy of the string, and to forget to free it! Just use the string directly.
Your summing method has too many easy collisions, as you diagnosed: anagrams produce the same sum, but so do many simple words. You should shuffle the sum before each character value is added.
return size % sum; should really be return sum % size; so the return value can be used as an index into the hash table of size size. As a matter of fact, size % sum would invoke undefined behavior if sum happened to compute to 0, which would require a very long string (>16MB) but is possible.
Here is an improved hash function:
#include <limits.h>

// constraints: str != NULL, size > 0
size_t hash(const char *str, size_t size) {
    size_t sum = 5381;  // initial salt
    while (*str != '\0') {
        // rotate the current sum 2 places to the left
        sum = (sum << 2) | (sum >> (CHAR_BIT * sizeof(sum) - 2));
        // add the next character value
        sum += (unsigned char)*str++;
    }
    return sum % size;
}
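As a quick usage example (mine, assuming the function above is pasted into the same file): running the OP's test words through it shows that equal-sum words such as "jam" and "cat" no longer necessarily share a bucket. Note that anagrams now get different keys too, so for the anagram use case you would still hash a sorted copy of the string, as the previous answer suggests.

#include <stdio.h>

int main(void)
{
    const char *words[] = { "joy", "jam", "cat", "act", "sad" };
    for (int i = 0; i < 5; i++)
        printf("key[%zu]: val[%s]\n", hash(words[i], 4542), words[i]);
    return 0;
}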

Algorithm: efficient way to remove duplicate integers from an array

I got this problem from an interview with Microsoft.
Given an array of random integers,
write an algorithm in C that removes
duplicated numbers and return the unique numbers in the original
array.
E.g Input: {4, 8, 4, 1, 1, 2, 9} Output: {4, 8, 1, 2, 9, ?, ?}
One caveat is that the expected algorithm should not require the array to be sorted first. And when an element has been removed, the following elements must be shifted forward as well. Anyway, the values of the elements at the tail of the array, where elements were shifted forward, are negligible.
Update: The result must be returned in the original array and a helper data structure (e.g. a hashtable) should not be used. However, I guess order preservation is not necessary.
Update2: For those who wonder why these impractical constraints, this was an interview question and all these constraints are discussed during the thinking process to see how I can come up with different ideas.
A solution suggested by my girlfriend is a variation of merge sort. The only modification is that during the merge step, duplicated values are simply disregarded. This solution would be O(n log n) as well. In this approach, the sorting and the duplicate removal are combined. However, I'm not sure whether that makes any difference.
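To make that suggestion concrete, here is a rough sketch (my code, not the poster's) of a merge sort whose merge step emits equal elements only once; like an ordinary merge sort it uses a scratch buffer, and the output ends up sorted rather than in the original order:

#include <stdio.h>
#include <string.h>

static size_t sort_unique(int *a, size_t n, int *tmp)
{
    if (n < 2)
        return n;
    size_t mid  = n / 2;
    size_t llen = sort_unique(a, mid, tmp);           /* left half, deduped  */
    size_t rlen = sort_unique(a + mid, n - mid, tmp); /* right half, deduped */
    size_t i = 0, j = mid, rend = mid + rlen, k = 0;
    while (i < llen && j < rend) {
        if (a[i] < a[j])          tmp[k++] = a[i++];
        else if (a[j] < a[i])     tmp[k++] = a[j++];
        else { tmp[k++] = a[i]; i++; j++; }           /* duplicate: keep one copy */
    }
    while (i < llen)  tmp[k++] = a[i++];
    while (j < rend)  tmp[k++] = a[j++];
    memcpy(a, tmp, k * sizeof *a);
    return k;                                         /* new length */
}

int main(void)
{
    int arr[] = {4, 8, 4, 1, 1, 2, 9};
    int tmp[sizeof arr / sizeof *arr];
    size_t len = sort_unique(arr, sizeof arr / sizeof *arr, tmp);
    for (size_t i = 0; i < len; i++)
        printf("%d ", arr[i]);                        /* prints: 1 2 4 8 9 */
    printf("\n");
    return 0;
}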
I've posted this once before on SO, but I'll reproduce it here because it's pretty cool. It uses hashing, building something like a hash set in place. It's guaranteed to be O(1) in auxiliary space (the recursion is a tail call), and is typically O(N) time complexity. The algorithm is as follows:
Take the first element of the array, this will be the sentinel.
Reorder the rest of the array, as much as possible, such that each element is in the position corresponding to its hash. As this step is completed, duplicates will be discovered. Set them equal to sentinel.
Move all elements for which the index is equal to the hash to the beginning of the array.
Move all elements that are equal to sentinel, except the first element of the array, to the end of the array.
What's left between the properly hashed elements and the duplicate elements will be the elements that couldn't be placed in the index corresponding to their hash because of a collision. Recurse to deal with these elements.
This can be shown to be O(N) provided there is no pathological scenario in the hashing: even if there are no duplicates, approximately 2/3 of the elements will be eliminated at each recursion. Each level of recursion is O(n), where small n is the number of elements left. The only problem is that, in practice, it's slower than a quick sort when there are few duplicates, i.e. lots of collisions. However, when there are huge amounts of duplicates, it's amazingly fast.
Edit: In current implementations of D, hash_t is 32 bits. Everything about this algorithm assumes that there will be very few, if any, hash collisions in full 32-bit space. Collisions may, however, occur frequently in the modulus space. However, this assumption will in all likelihood be true for any reasonably sized data set. If the key is less than or equal to 32 bits, it can be its own hash, meaning that a collision in full 32-bit space is impossible. If it is larger, you simply can't fit enough of them into 32-bit memory address space for it to be a problem. I assume hash_t will be increased to 64 bits in 64-bit implementations of D, where datasets can be larger. Furthermore, if this ever did prove to be a problem, one could change the hash function at each level of recursion.
Here's an implementation in the D programming language:
void uniqueInPlace(T)(ref T[] dataIn) {
    uniqueInPlaceImpl(dataIn, 0);
}

void uniqueInPlaceImpl(T)(ref T[] dataIn, size_t start) {
    if(dataIn.length - start < 2)
        return;

    invariant T sentinel = dataIn[start];
    T[] data = dataIn[start + 1..$];

    static hash_t getHash(T elem) {
        static if(is(T == uint) || is(T == int)) {
            return cast(hash_t) elem;
        } else static if(__traits(compiles, elem.toHash)) {
            return elem.toHash;
        } else {
            static auto ti = typeid(typeof(elem));
            return ti.getHash(&elem);
        }
    }

    for(size_t index = 0; index < data.length;) {
        if(data[index] == sentinel) {
            index++;
            continue;
        }

        auto hash = getHash(data[index]) % data.length;
        if(index == hash) {
            index++;
            continue;
        }

        if(data[index] == data[hash]) {
            data[index] = sentinel;
            index++;
            continue;
        }

        if(data[hash] == sentinel) {
            swap(data[hash], data[index]);
            index++;
            continue;
        }

        auto hashHash = getHash(data[hash]) % data.length;
        if(hashHash != hash) {
            swap(data[index], data[hash]);
            if(hash < index)
                index++;
        } else {
            index++;
        }
    }

    size_t swapPos = 0;
    foreach(i; 0..data.length) {
        if(data[i] != sentinel && i == getHash(data[i]) % data.length) {
            swap(data[i], data[swapPos++]);
        }
    }

    size_t sentinelPos = data.length;
    for(size_t i = swapPos; i < sentinelPos;) {
        if(data[i] == sentinel) {
            swap(data[i], data[--sentinelPos]);
        } else {
            i++;
        }
    }

    dataIn = dataIn[0..sentinelPos + start + 1];
    uniqueInPlaceImpl(dataIn, start + swapPos + 1);
}
How about:
void rmdup(int *array, int length)
{
    int *current, *end = array + length - 1;

    for (current = array + 1; array < end; array++, current = array + 1)
    {
        while (current <= end)
        {
            if (*current == *array)
            {
                *current = *end--;
            }
            else
            {
                current++;
            }
        }
    }
}
Should be O(n^2) or less.
If you are looking for the superior O-notation, then sorting the array with an O(n log n) sort then doing a O(n) traversal may be the best route. Without sorting, you are looking at O(n^2).
Edit: if you are just doing integers, then you can also do radix sort to get O(n).
One more efficient implementation
int i, j;

/* new length of modified array */
int NewLength = 1;

for (i = 1; i < Length; i++) {
    for (j = 0; j < NewLength; j++)
    {
        if (array[i] == array[j])
            break;
    }

    /* if none of the values in array[0..NewLength-1] is the same as array[i],
       then copy the current value to the corresponding new position in the array */
    if (j == NewLength)
        array[NewLength++] = array[i];
}
In this implementation there is no need for sorting the array.
Also if a duplicate element is found, there is no need for shifting all elements after this by one position.
The output of this code is array[] with size NewLength
Here we are starting from the 2nd element in the array and comparing it with all the elements before it.
We are holding an extra index variable 'NewLength' for modifying the input array.
The NewLength variable is initialized to 1.
The element in array[1] will be compared with array[0].
If they are different, then the value at array[NewLength] will be overwritten with array[1] and NewLength is incremented.
If they are the same, NewLength will not be modified.
So if we have an array [1 2 1 3 1],
then
In the first pass of the 'j' loop, array[1] (2) will be compared with array[0]; since they differ, 2 will be written to array[NewLength] = array[1],
so the array will be [1 2] with NewLength = 2.
In the second pass of the 'j' loop, array[2] (1) will be compared with array[0] and array[1]. Here, since array[2] (1) and array[0] are the same, the loop will break,
so the array stays [1 2] with NewLength = 2,
and so on
1. Using O(1) extra space, in O(n log n) time
This is possible, for instance:
first do an in-place O(n log n) sort
then walk through the list once, writing the first instance of every value back to the beginning of the list
I believe ejel's partner is correct that the best way to do this would be an in-place merge sort with a simplified merge step, and that that is probably the intent of the question, if you were eg. writing a new library function to do this as efficiently as possible with no ability to improve the inputs, and there would be cases it would be useful to do so without a hash-table, depending on the sorts of inputs. But I haven't actually checked this.
2. Using O(lots) extra space, in O(n) time
declare a zero'd array big enough to hold all integers
walk through the array once
set the corresponding array element to 1 for each integer.
If it was already 1, skip that integer.
This only works if several questionable assumptions hold:
it's possible to zero memory cheaply, or the size of the ints is small compared to the number of them
you're happy to ask your OS for 256^sizeof(int) memory
and it will cache it for you really really efficiently if it's gigantic
It's a bad answer, but if you have LOTS of input elements, but they're all 8-bit integers (or maybe even 16-bit integers) it could be the best way.
3. O(little)-ish extra space, O(n)-ish time
As #2, but use a hash table.
4. The clear way
If the number of elements is small, writing an appropriate algorithm is not useful if other code is quicker to write and quicker to read.
Eg. Walk through the array for each unique element (ie. the first element, the second element (duplicates of the first having been removed), etc.), removing all identical elements. O(1) extra space, O(n^2) time.
Eg. Use library functions which do this. efficiency depends which you have easily available.
Well, its basic implementation is quite simple. Go through all elements, check whether there are duplicates among the remaining ones, and shift the rest over them.
It's terribly inefficient, and you could speed it up with a helper array for the output or with sorting/binary trees, but this doesn't seem to be allowed.
If you are allowed to use C++, a call to std::sort followed by a call to std::unique will give you the answer. The time complexity is O(N log N) for the sort and O(N) for the unique traversal.
And if C++ is off the table there isn't anything that keeps these same algorithms from being written in C.
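For example, a sketch of the same approach in plain C, with qsort() standing in for std::sort and a single compaction pass standing in for std::unique (the names are mine, and the result comes back sorted rather than in the original order):

#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

size_t sort_and_unique(int *arr, size_t len)
{
    if (len < 2)
        return len;
    qsort(arr, len, sizeof *arr, cmp_int);        /* O(n log n) sort */
    size_t dst = 1;
    for (size_t src = 1; src < len; src++)        /* O(n) "std::unique" pass */
        if (arr[src] != arr[dst - 1])
            arr[dst++] = arr[src];
    return dst;                                   /* number of unique values */
}

int main(void)
{
    int arr[] = {4, 8, 4, 1, 1, 2, 9};
    size_t len = sort_and_unique(arr, sizeof arr / sizeof *arr);
    for (size_t i = 0; i < len; i++)
        printf("%d ", arr[i]);                    /* prints: 1 2 4 8 9 */
    printf("\n");
    return 0;
}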
You could do this in a single traversal, if you are willing to sacrifice memory. You can simply tally whether you have seen an integer or not in a hash/associative array. If you have already seen a number, remove it as you go, or better yet, move numbers you have not seen into a new array, avoiding any shifting in the original array.
In Perl:
foreach $i (@myary) {
    if(!defined $seen{$i}) {
        $seen{$i} = 1;
        push @newary, $i;
    }
}
The return value of the function should be the number of unique elements and they are all stored at the front of the array. Without this additional information, you won't even know if there were any duplicates.
Each iteration of the outer loop processes one element of the array. If it is unique, it stays in the front of the array and if it is a duplicate, it is overwritten by the last unprocessed element in the array. This solution runs in O(n^2) time.
#include <stdio.h>
#include <stdlib.h>

size_t rmdup(int *arr, size_t len)
{
    size_t prev = 0;
    size_t curr = 1;
    size_t last = len - 1;

    while (curr <= last) {
        for (prev = 0; prev < curr && arr[curr] != arr[prev]; ++prev);
        if (prev == curr) {
            ++curr;
        } else {
            arr[curr] = arr[last];
            --last;
        }
    }
    return curr;
}

void print_array(int *arr, size_t len)
{
    printf("{");
    size_t curr = 0;
    for (curr = 0; curr < len; ++curr) {
        if (curr > 0) printf(", ");
        printf("%d", arr[curr]);
    }
    printf("}");
}

int main()
{
    int arr[] = {4, 8, 4, 1, 1, 2, 9};
    printf("Before: ");
    size_t len = sizeof (arr) / sizeof (arr[0]);
    print_array(arr, len);

    len = rmdup(arr, len);

    printf("\nAfter: ");
    print_array(arr, len);
    printf("\n");
    return 0;
}
Here is a Java Version.
int[] removeDuplicate(int[] input){
    int arrayLen = input.length;
    for(int i = 0; i < arrayLen; i++){
        for(int j = i+1; j < arrayLen; j++){
            if(((input[i]^input[j]) == 0)){
                input[j] = 0;
            }
            if((input[j] == 0) && j < arrayLen-1){
                input[j] = input[j+1];
                input[j+1] = 0;
            }
        }
    }
    return input;
}
Here is my solution.
///// find duplicates in an array and remove them
void unique(int* input, int n)
{
    merge_sort(input, 0, n);

    int prev = 0;
    for (int i = 1; i < n; i++)
    {
        if (input[i] != input[prev])
            input[++prev] = input[i];
    }
    /* the unique values are now input[0..prev] */
}
An array should obviously be "traversed" right-to-left to avoid unnecessary copying of values back and forth.
If you have unlimited memory, you can allocate a bit array of 2^(8 * sizeof(type-of-element-in-array)) / 8 bytes, so that each bit signifies whether you've already encountered the corresponding value or not.
If you don't, I can't think of anything better than traversing the array and comparing each value with the values that follow it and then, if a duplicate is found, removing those values altogether. This is somewhere near O(n^2) (or O((n^2-n)/2)).
IBM has an article on kinda close subject.
Let's see:
1. O(N) pass to find min/max.
2. Allocate a bit-array for "found".
3. O(N) pass swapping duplicates to the end.
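A rough sketch of that outline in C (my code, with names of my own choosing); note the bit array is sized by the value range, so a huge range means a huge allocation:

#include <stdlib.h>

size_t dedup_bitset(int *a, size_t n)
{
    if (n < 2)
        return n;
    int min = a[0], max = a[0];
    for (size_t i = 1; i < n; i++) {                    /* pass 1: value range */
        if (a[i] < min) min = a[i];
        if (a[i] > max) max = a[i];
    }
    size_t range = (size_t)((long long)max - min) + 1;
    unsigned char *seen = calloc((range + 7) / 8, 1);   /* the bit array */
    if (!seen)
        return n;                                       /* give up, array unchanged */
    size_t end = n;                                     /* a[end..n) holds duplicates */
    for (size_t i = 0; i < end; ) {                     /* pass 2: swap dups to end */
        size_t bit = (size_t)((long long)a[i] - min);
        if (seen[bit / 8] & (1u << (bit % 8))) {        /* already seen: duplicate */
            int t = a[i]; a[i] = a[--end]; a[end] = t;  /* swap it out, re-test a[i] */
        } else {
            seen[bit / 8] |= 1u << (bit % 8);
            i++;
        }
    }
    free(seen);
    return end;   /* e.g. 5 for {4, 8, 4, 1, 1, 2, 9}; a[0..end) are the unique values */
}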
This can be done in one pass with an O(N log N) algorithm and no extra storage.
Proceed from element a[1] to a[N]. At each stage i, all of the elements to the left of a[i] comprise a sorted heap of elements a[0] through a[j]. Meanwhile, a second index j, initially 0, keeps track of the size of the heap.
Examine a[i] and insert it into the heap, which now occupies elements a[0] to a[j+1]. As the element is inserted, if a duplicate element a[k] is encountered having the same value, do not insert a[i] into the heap (i.e., discard it); otherwise insert it into the heap, which now grows by one element and now comprises a[0] to a[j+1], and increment j.
Continue in this manner, incrementing i until all of the array elements have been examined and inserted into the heap, which ends up occupying a[0] to a[j]. j is the index of the last element of the heap, and the heap contains only unique element values.
int algorithm(int[] a, int n)
{
    int i, j;

    for (j = 0, i = 1; i < n; i++)
    {
        // Insert a[i] into the heap a[0...j]
        if (heapInsert(a, j, a[i]))
            j++;
    }
    return j;
}

bool heapInsert(a[], int n, int val)
{
    // Insert val into heap a[0...n]
    ...code omitted for brevity...
    if (duplicate element a[k] == val)
        return false;
    a[k] = val;
    return true;
}
Looking at the example, this is not exactly what was asked for since the resulting array preserves the original element order. But if this requirement is relaxed, the algorithm above should do the trick.
In Java I would solve it like this. Don't know how to write this in C.
int length = array.length;
for (int i = 0; i < length; i++)
{
    for (int j = i + 1; j < length; j++)
    {
        if (array[i] == array[j])
        {
            int k, l;
            for (k = j + 1, l = j; k < length; k++, l++)
            {
                if (array[k] != array[i])
                {
                    array[l] = array[k];
                }
                else
                {
                    l--;
                }
            }
            length = l;
        }
    }
}
How about the following?
int* temp = malloc(sizeof(int) * len);
int count = 0;
int x = 0;
int y = 0;

for (x = 0; x < len; x++)
{
    for (y = 0; y < count; y++)
    {
        if (*(temp + y) == *(array + x))
        {
            break;
        }
    }
    if (y == count)
    {
        *(temp + count) = *(array + x);
        count++;
    }
}
memcpy(array, temp, sizeof(int) * count);   /* only the first 'count' values are meaningful */
free(temp);                                 /* don't leak the scratch buffer */
I try to declare a temp array and put the elements into that before copying everything back to the original array.
After reviewing the problem, here is my Delphi way; it may help.
var
  A: Array of Integer;
  I, J, C, K, P: Integer;
begin
  C := 10;
  SetLength(A, 10);
  A[0]:=1; A[1]:=4; A[2]:=2; A[3]:=6; A[4]:=3; A[5]:=4;
  A[6]:=3; A[7]:=4; A[8]:=2; A[9]:=5;

  for I := 0 to C-1 do
  begin
    for J := I+1 to C-1 do
      if A[I] = A[J] then
      begin
        for K := C-1 downto J do
          if A[J] <> A[K] then
          begin
            P := A[K];
            A[K] := 0;
            A[J] := P;
            C := K;
            break;
          end
          else
          begin
            A[K] := 0;
            C := K;
          end;
      end;
  end;
  // truncate array
  SetLength(A, C);
end;
The following example should solve your problem:
t = []

def check_dump(x):
    if not x in t:
        t.append(x)
        return True

output = filter(check_dump, input)
print(output)
import java.util.ArrayList;

public class C {

    public static void main(String[] args) {

        int arr[] = {2,5,5,5,9,11,11,23,34,34,34,45,45};
        ArrayList<Integer> arr1 = new ArrayList<Integer>();

        for(int i = 0; i < arr.length-1; i++){
            if(arr[i] == arr[i+1]){
                arr[i] = 99999;
            }
        }
        for(int i = 0; i < arr.length; i++){
            if(arr[i] != 99999){
                arr1.add(arr[i]);
            }
        }
        System.out.println(arr1);
    }
}
This is the naive (N*(N-1)/2) solution. It uses constant additional space and maintains the original order. It is similar to the solution by @Byju, but uses no if(){} blocks. It also avoids copying an element onto itself.
#include <stdio.h>
#include <stdlib.h>

int numbers[] = {4, 8, 4, 1, 1, 2, 9};
#define COUNT (sizeof numbers / sizeof numbers[0])

size_t undup_it(int array[], size_t len)
{
    size_t src, dst;

    /* an array of size<2 cannot contain duplicate values */
    if (len < 2) return len;

    /* an array of size>1 will contain at least one unique value */
    for (src = dst = 1; src < len; src++) {
        size_t cur;
        for (cur = 0; cur < dst; cur++) {
            if (array[cur] == array[src]) break;
        }
        if (cur != dst) continue;       /* found a duplicate */

        /* array[src] must be new: add it to the list of non-duplicates */
        if (dst < src) array[dst] = array[src];     /* avoid copy-to-self */
        dst++;
    }
    return dst; /* number of valid elements in the new array */
}

void print_it(int array[], size_t len)
{
    size_t idx;

    for (idx = 0; idx < len; idx++) {
        printf("%c %d", (idx) ? ',' : '{', array[idx]);
    }
    printf("}\n");
}

int main(void) {
    size_t cnt = COUNT;

    printf("Before undup:");
    print_it(numbers, cnt);

    cnt = undup_it(numbers, cnt);

    printf("After undup:");
    print_it(numbers, cnt);

    return 0;
}
This can be done in a single pass, in O(N) time in the number of integers in the input
list, and O(N) storage in the number of unique integers.
Walk through the list from front to back, with two pointers "dst" and
"src" initialized to the first item. Start with an empty hash table
of "integers seen". If the integer at src is not present in the hash,
write it to the slot at dst and increment dst. Add the integer at src
to the hash, then increment src. Repeat until src passes the end of
the input list.
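A sketch of that scheme in C, using a small open-addressing hash set in place of the unspecified "hash table of integers seen"; all names here are mine:

#include <stdlib.h>

size_t dedup_hash(int *a, size_t n)
{
    if (n < 2)
        return n;
    size_t cap = 16;
    while (cap < 2 * n)                 /* load factor <= 0.5, power of two */
        cap *= 2;
    int *slot = malloc(cap * sizeof *slot);
    unsigned char *used = calloc(cap, 1);
    if (!slot || !used) {
        free(slot); free(used);
        return n;                       /* out of memory: leave array as-is */
    }
    size_t dst = 0;
    for (size_t src = 0; src < n; src++) {
        size_t h = ((size_t)(unsigned)a[src] * 2654435761u) & (cap - 1);
        while (used[h] && slot[h] != a[src])    /* linear probing */
            h = (h + 1) & (cap - 1);
        if (!used[h]) {                         /* first occurrence */
            used[h] = 1;
            slot[h] = a[src];
            a[dst++] = a[src];                  /* keep it, order preserved */
        }                                       /* else: duplicate, skip it */
    }
    free(slot);
    free(used);
    return dst;   /* e.g. {4, 8, 4, 1, 1, 2, 9} becomes {4, 8, 1, 2, 9} and returns 5 */
}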
Insert all the elements into a binary tree that disregards duplicates - O(nlog(n)). Then extract all of them back into the array by doing a traversal - O(n). I am assuming that you don't need order preservation.
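A rough C sketch of this tree idea (my code, allocation checks omitted for brevity): a plain unbalanced BST that ignores duplicates, written back with an in-order traversal, so the result is sorted rather than in the original order and the O(n log n) bound holds only on average:

#include <stdlib.h>

struct node { int val; struct node *left, *right; };

static struct node *insert(struct node *t, int val)
{
    if (!t) {
        t = malloc(sizeof *t);
        t->val = val;
        t->left = t->right = NULL;
    } else if (val < t->val) {
        t->left = insert(t->left, val);
    } else if (val > t->val) {
        t->right = insert(t->right, val);
    }                                   /* val == t->val: duplicate, ignore */
    return t;
}

static size_t write_inorder(struct node *t, int *out, size_t pos)
{
    if (!t)
        return pos;
    pos = write_inorder(t->left, out, pos);
    out[pos++] = t->val;
    pos = write_inorder(t->right, out, pos);
    free(t);                            /* free each node once it is written out */
    return pos;
}

size_t dedup_bst(int *a, size_t n)
{
    struct node *root = NULL;
    for (size_t i = 0; i < n; i++)
        root = insert(root, a[i]);
    return write_inorder(root, a, 0);   /* new length of the array */
}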
Use bloom filter for hashing. This will reduce the memory overhead very significantly.
In JAVA,
Integer[] arrayInteger = {1,2,3,4,3,2,4,6,7,8,9,9,10};

String value = "";

for(Integer i : arrayInteger)
{
    if(!value.contains(Integer.toString(i))){
        value += Integer.toString(i) + ",";
    }
}

String[] arraySplitToString = value.split(",");
Integer[] arrayIntResult = new Integer[arraySplitToString.length];
for(int i = 0; i < arraySplitToString.length; i++){
    arrayIntResult[i] = Integer.parseInt(arraySplitToString[i]);
}
output:
{ 1, 2, 3, 4, 6, 7, 8, 9, 10}
hope this will help
Create a BinarySearchTree which has O(n) complexity.
First, you should create an array check[n], where n is the number of elements of the array you want to make duplicate-free, and set the value of every element (of the check array) to 1. Using a for loop, traverse the array with the duplicates, say its name is arr, and in the for loop write this:
{
    if (check[arr[i]] != 1) {
        arr[i] = 0;
    }
    else {
        check[arr[i]] = 0;
    }
}
With that, you set every duplicate equal to zero. So the only thing left to do is to traverse the arr array and print everything that is not equal to zero. The order stays, and it takes linear time (3*n).
Given an array of n elements, write an algorithm to remove all duplicates from the array in time O(nlogn)
Algorithm delete_duplicates (a[1....n])
// Remove duplicates from the given array
// input parameters: a[1:n], an array of n elements
{
    temp[1:n];               // an array of n elements
    for i = 1 to n:
        temp[i].value = a[i]
        temp[i].key   = i
    // based on 'value' sort the array temp.
    // based on 'value' delete duplicate elements from temp.
    // based on 'key' sort the array temp.
    // construct an array p using temp.
    p[i] = temp[i].value
    return p
}
The order of elements is maintained in the output array using the 'key'. Considering the key is of length O(n), the time taken for sorting on the key and the value is O(nlogn). So the time taken to delete all duplicates from the array is O(nlogn).
This is what I've got; though it misplaces the order, we can sort ascending or descending to fix that up.
#include <stdio.h>
int main(void){
int x,n,myvar=0;
printf("Enter a number: \t");
scanf("%d",&n);
int arr[n],changedarr[n];
for(x=0;x<n;x++){
printf("Enter a number for array[%d]: ",x);
scanf("%d",&arr[x]);
}
printf("\nOriginal Number in an array\n");
for(x=0;x<n;x++){
printf("%d\t",arr[x]);
}
int i=0,j=0;
// printf("i\tj\tarr\tchanged\n");
for (int i = 0; i < n; i++)
{
// printf("%d\t%d\t%d\t%d\n",i,j,arr[i],changedarr[i] );
for (int j = 0; j <n; j++)
{
if (i==j)
{
continue;
}
else if(arr[i]==arr[j]){
changedarr[j]=0;
}
else{
changedarr[i]=arr[i];
}
// printf("%d\t%d\t%d\t%d\n",i,j,arr[i],changedarr[i] );
}
myvar+=1;
}
// printf("\n\nmyvar=%d\n",myvar);
int count=0;
printf("\nThe unique items:\n");
for (int i = 0; i < myvar; i++)
{
if(changedarr[i]!=0){
count+=1;
printf("%d\t",changedarr[i]);
}
}
printf("\n");
}
It'd be cool if you had a good DataStructure that could quickly tell if it contains an integer. Perhaps a tree of some sort.
DataStructure elementsSeen = new DataStructure();
int elementsRemoved = 0;
for (int i = 0; i < array.Length; i++) {
    if (elementsSeen.Contains(array[i]))
        elementsRemoved++;
    else
        array[i - elementsRemoved] = array[i];
}
array.Length = array.Length - elementsRemoved;

Find integer not occurring twice in an array

I am trying to solve this problem:
In an integer array all numbers occur exactly twice, except for a single number which occurs exactly once.
A simple solution is to sort the array and then test for non-repetition. But I am looking for a better solution that has a time complexity of O(n).
You can use "xor" operation on the entire array. Each pair of numbers will cancel each other, leaving you with the sought value.
int get_orphan(int const *a, int len)
{
    int value = 0;
    for (int i = 0; i < len; ++i)
        value ^= a[i];

    // `value` now contains the number that occurred an odd number of times.
    // Retrieve its index in the array.
    for (int i = 0; i < len; ++i)
    {
        if (a[i] == value)
            return i;
    }
    return -1;
}
