I have an array [a0, a1, ..., an]. I want to calculate the sum of the distances between every pair of equal elements.
1) The first element of the array will always be zero.
2) The second element of the array will be greater than zero.
3) No two consecutive elements can be the same.
4) The size of the array can be up to 10^5+1, and the elements can range from 0 to 10^7.
For example, if the array is [0,2,5,0,5,7,0], then the distance between the first 0 and the second 0 is 2 (the number of elements between them), the distance between the first 0 and the third 0 is 5, and the distance between the second 0 and the third 0 is 2. The distance between the first 5 and the second 5 is 1. Hence the sum of distances between equal elements is 2 + 5 + 2 + 1 = 10.
For this I tried to build a formula: for every element that occurs more than once (0-based indexing, and the first element is always zero), sum = sum + (lastIndex - firstIndex - 1) * (NumberOfOccurrences - 1).
If the number of occurrences of the element is odd, subtract 1 from the sum; otherwise leave it as it is. But this approach does not work in every case.
It does work if the array is [0,5,7,0] or [0,2,5,0,5,7,0,1,2,3,0].
Can you suggest another efficient approach or formula?
Edit :- This problem is not a part of any coding contest, it's just a little part of a bigger problem
My method requires space that scales with the number of possible values for elements, but has O(n) time complexity.
I've made no effort to check that the sum doesn't overflow an unsigned long; I just assume that it won't. The same goes for checking that input values are in fact no more than max_val. These are details that would have to be addressed.
For each possible value, total_distance holds the sum of the distances from the most recent occurrence of that value back to all of its earlier occurrences, which is exactly what that occurrence contributed to the overall sum. instances_so_far holds how many instances of the value have already been seen; this is how much total_distance grows for every index we move forward. To make this efficient, the last index at which each value was encountered is tracked, so that total_distance only needs to be updated when that particular value is encountered again, instead of having nested loops that update every value at every step.
#include <stdio.h>
#include <stddef.h>

// An enum constant (unlike a const variable) is a constant expression in C,
// so it can be used to size the file-scope arrays below.
// enum { max_val = 15 };
enum { max_val = 10000000 };

unsigned long instances_so_far[max_val + 1] = {0};
unsigned long total_distance[max_val + 1] = {0};
unsigned long last_index_encountered[max_val + 1];

// void print_array(unsigned long *array, size_t len) {
//     printf("{");
//     for (size_t i = 0; i < len; ++i) {
//         printf("%lu,", array[i]);
//     }
//     printf("}\n");
// }

unsigned long get_sum(unsigned long *array, size_t len) {
    unsigned long sum = 0;
    for (size_t i = 0; i < len; ++i) {
        if (instances_so_far[array[i]] >= 1) {
            // Every earlier occurrence is (i - last index) further away than it
            // was from the previous occurrence; the "- 1" accounts for the new
            // pair formed with the previous occurrence itself.
            total_distance[array[i]] += (i - last_index_encountered[array[i]]) * instances_so_far[array[i]] - 1;
        }
        sum += total_distance[array[i]];
        instances_so_far[array[i]] += 1;
        last_index_encountered[array[i]] = i;
        // printf("inst ");
        // print_array(instances_so_far, max_val + 1);
        // printf("totd ");
        // print_array(total_distance, max_val + 1);
        // printf("encn ");
        // print_array(last_index_encountered, max_val + 1);
        // printf("sums %lu\n", sum);
        // printf("\n");
    }
    return sum;
}

unsigned long test[] = {0,1,0,2,0,3,0,4,5,6,7,8,9,10,0};

int main(void) {
    printf("%lu\n", get_sum(test, sizeof(test) / sizeof(test[0])));
    return 0;
}
I've tested it with a few of the examples here, and gotten the answers I expected.
I had to use static storage for the arrays because they overflowed the stack if put there.
I've left in the commented-out code I used for debugging; it's helpful for understanding what's going on if you reduce max_val to a smaller number.
Please let me know if you find a counter-example that fails.
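One way to hunt for counter-examples is to compare the running-total idea against a direct O(n^2) count on small random arrays. The sketch below is only an illustration in Python, not a port of the C program: the names brute_force_sum and running_sum are made up here, and dictionaries stand in for the three static arrays.

```python
import random


def brute_force_sum(arr):
    """Reference answer: check every pair directly (O(n^2))."""
    total = 0
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            if arr[i] == arr[j]:
                total += j - i - 1  # number of elements strictly between
    return total


def running_sum(arr):
    """Single pass using the same bookkeeping idea as get_sum()."""
    count = {}  # occurrences seen so far, per value
    dist = {}   # distance from the latest occurrence to all earlier ones
    last = {}   # index of the latest occurrence
    total = 0
    for i, v in enumerate(arr):
        if v in count:
            dist[v] += (i - last[v]) * count[v] - 1
            total += dist[v]
        else:
            count[v] = 0
            dist[v] = 0
        count[v] += 1
        last[v] = i
    return total


# Compare both versions on many small random arrays.
rng = random.Random(12345)
for _ in range(1000):
    arr = [rng.randint(0, 4) for _ in range(rng.randint(0, 12))]
    assert running_sum(arr) == brute_force_sum(arr), arr
```

Keeping the value range small (0 to 4 here) makes repeats frequent, which is where a mistake in the bookkeeping would most likely show up.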
Here is Python 3 code for your problem. This works on all the examples given in your question and in the comments--I included the test code.
This works by looking at how each consecutive pair of repeated elements adds to the overall sum of distances. If an element occurs 6 times in the array, the pairs of its occurrences are:
x  x  x  x  x  x    The repeated element's locations in the array
--                  First, consecutive pairs
   --
      --
         --
            --
-----               Now, pairs that have one element inside
   -----
      -----
         -----
--------            Now, pairs that have two elements inside
   --------
      --------
-----------         Now, pairs that have three elements inside
   -----------
--------------      Now, pairs that have four elements inside
If we look down the column between each consecutive pair and count the bars that cross it, we see how many times that consecutive gap is added into the overall sum over all pairs:
5 8 9 8 5
And if we look at the differences between those values we get
3 1 -1 -3
Now if we use my preferred definition of "distance" for a pair, namely the difference of their indices, we can use those multiplicities for consecutive pairs to calculate the overall sum of distances for all pairs. But since your definition is not mine (yours is one less per pair), we calculate the sum for my definition and then adjust it for your definition.
This code makes one pass through the original array to get the occurrences for each element value in the array, then another pass through those distinct element values. (I used the pairwise routine to avoid another pass through the array.) That makes my algorithm O(n) in time complexity, where n is the length of the array. This is much better than the naive O(n^2). Since my code builds an array of the repeated elements, once per unique element value, this has space complexity of at worst O(n).
import collections
import itertools

def pairwise(iterable):
    """s -> (s0,s1), (s1,s2), (s2, s3), ..."""
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)

def sum_distances_of_pairs(alist):
    # Make a dictionary giving the indices for each element of the list.
    element_ndxs = collections.defaultdict(list)
    for ndx, element in enumerate(alist):
        element_ndxs[element].append(ndx)

    # Sum the distances of pairs for each element, using my def of distance
    sum_of_all_pair_distances = 0
    for element, ndx_list in element_ndxs.items():
        # Filter out elements not occurring more than once and count the rest
        if len(ndx_list) < 2:
            continue
        # Sum the distances of pairs for this element, using my def of distance
        sum_of_pair_distances = 0
        multiplicity = len(ndx_list) - 1
        delta_multiplicity = multiplicity - 2
        for ndx1, ndx2 in pairwise(ndx_list):
            # Update the contribution of this consecutive pair to the sum
            sum_of_pair_distances += multiplicity * (ndx2 - ndx1)
            # Prepare for the next consecutive pair
            multiplicity += delta_multiplicity
            delta_multiplicity -= 2
        # Adjust that sum of distances for the desired definition of distance
        cnt_all_pairs = len(ndx_list) * (len(ndx_list) - 1) // 2
        sum_of_pair_distances -= cnt_all_pairs
        # Add that sum for this element into the overall sum
        sum_of_all_pair_distances += sum_of_pair_distances
    return sum_of_all_pair_distances

assert sum_distances_of_pairs([0, 2, 5, 0, 5, 7, 0]) == 10
assert sum_distances_of_pairs([0, 5, 7, 0]) == 2
assert sum_distances_of_pairs([0, 2, 5, 0, 5, 7, 0, 1, 2, 3, 0]) == 34
assert sum_distances_of_pairs([0, 0, 0, 0, 1, 2, 0]) == 18
assert sum_distances_of_pairs([0, 1, 0, 2, 0, 3, 4, 5, 6, 7, 8, 9, 0, 10, 0]) == 66
assert sum_distances_of_pairs([0, 1, 0, 2, 0, 3, 0, 4, 5, 6, 7, 8, 9, 10, 0]) == 54
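The multiplicities 5 8 9 8 5 from the diagram can also be checked by brute force. The sketch below is an illustrative addition: gap_multiplicities is a made-up helper, and the closed form (k + 1) * (m - 1 - k) for the k-th gap (0-based) among m occurrences is my own reading of the pattern rather than something stated in the answer. It counts pairs with an occurrence at or left of the gap times pairs with an occurrence right of it.

```python
from itertools import combinations


def gap_multiplicities(m):
    """For m occurrences, count how many of the pairs span each consecutive gap."""
    counts = [0] * (m - 1)
    for a, b in combinations(range(m), 2):
        # The pair (a, b) covers the gaps a, a+1, ..., b-1.
        for gap in range(a, b):
            counts[gap] += 1
    return counts


def closed_form(m):
    """Same multiplicities without enumerating pairs."""
    return [(k + 1) * (m - 1 - k) for k in range(m - 1)]


print(gap_multiplicities(6))  # [5, 8, 9, 8, 5]
```

The successive differences of the closed form drop by 2 each step, which matches the 3 1 -1 -3 row and the way multiplicity and delta_multiplicity are updated in the loop.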
The Minimum Size Subarray Sum problem:
given an array of n positive integers and a positive integer s, find the minimal length of a subarray of which the sum ≥ s. If there isn't one, return 0 instead.
For example, given the array [2,3,1,2,4,3] and s = 7,
the subarray [4,3] has the minimal length under the problem constraint.
The following is my solution:
public int minSubArrayLen(int s, int[] nums) {
    long sum = 0;
    int a = 0;
    if (nums.length < 1)
        return 0;
    Arrays.sort(nums);
    for (int i = nums.length-1; i >= 0; i--) {
        sum += nums[i];
        a++;
        if (sum>=s)
            break;
    }
    if (sum < s) {
        return 0;
    }
    return a;
}
This solution was not accepted because it did not pass the following test case:
697439
[5334,6299,4199,9663,8945,3566,9509,3124,6026,6250,7475,5420,9201,9501,38,5897,4411,6638,9845,161,9563,8854,3731,5564,5331,4294,3275,1972,1521,2377,3701,6462,6778,187,9778,758,550,7510,6225,8691,3666,4622,9722,8011,7247,575,5431,4777,4032,8682,5888,8047,3562,9462,6501,7855,505,4675,6973,493,1374,3227,1244,7364,2298,3244,8627,5102,6375,8653,1820,3857,7195,7830,4461,7821,5037,2918,4279,2791,1500,9858,6915,5156,970,1471,5296,1688,578,7266,4182,1430,4985,5730,7941,3880,607,8776,1348,2974,1094,6733,5177,4975,5421,8190,8255,9112,8651,2797,335,8677,3754,893,1818,8479,5875,1695,8295,7993,7037,8546,7906,4102,7279,1407,2462,4425,2148,2925,3903,5447,5893,3534,3663,8307,8679,8474,1202,3474,2961,1149,7451,4279,7875,5692,6186,8109,7763,7798,2250,2969,7974,9781,7741,4914,5446,1861,8914,2544,5683,8952,6745,4870,1848,7887,6448,7873,128,3281,794,1965,7036,8094,1211,9450,6981,4244,2418,8610,8681,2402,2904,7712,3252,5029,3004,5526,6965,8866,2764,600,631,9075,2631,3411,2737,2328,652,494,6556,9391,4517,8934,8892,4561,9331,1386,4636,9627,5435,9272,110,413,9706,5470,5008,1706,7045,9648,7505,6968,7509,3120,7869,6776,6434,7994,5441,288,492,1617,3274,7019,5575,6664,6056,7069,1996,9581,3103,9266,2554,7471,4251,4320,4749,649,2617,3018,4332,415,2243,1924,69,5902,3602,2925,6542,345,4657,9034,8977,6799,8397,1187,3678,4921,6518,851,6941,6920,259,4503,2637,7438,3893,5042,8552,6661,5043,9555,9095,4123,142,1446,8047,6234,1199,8848,5656,1910,3430,2843,8043,9156,7838,2332,9634,2410,2958,3431,4270,1420,4227,7712,6648,1607,1575,3741,1493,7770,3018,5398,6215,8601,6244,7551,2587,2254,3607,1147,5184,9173,8680,8610,1597,1763,7914,3441,7006,1318,7044,7267,8206,9684,4814,9748,4497,2239]
The expected answer is 132 but my output was 80.
Does anyone have any idea what went wrong with my algorithm/code?
I will simply explain the flaw in the logic rather than give the correct logic for the problem statement.
You are taking the numbers in one specific order (sorted) and then adding them up for comparison. It can easily happen that a different selection of numbers is what actually produces the required sum.
For example, take [2,3,1,2,4,3] and s = 7.
Based on your logic:
Step 1 -> Sort the numbers and you get [1,2,2,3,3,4].
Step 2 -> You pick the last 2 numbers (3, 4) to get your sum of 7.
Now let's change the sum to 8.
From Step 2 -> You get 4+3+3 = 10, so you break out of the loop. After this step you return a = 3.
The flaw here is that 4+3+1 also makes 8, which is something your logic skips.
In the same way, 3+3+2 is also a possible way to reach 8.
Sorting the array is the first flaw in the logic. The problem asks about a subarray of the existing arrangement, and sorting changes that arrangement, so the numbers you add up are no longer guaranteed to be adjacent in the original array and you cannot rely on getting the expected answer.
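For comparison, here is a minimal sketch of an approach that keeps the original arrangement intact: a sliding window over contiguous elements. It is written in Python rather than Java, the name min_subarray_len is made up for this illustration, and it relies on the problem's guarantee that all numbers are positive (otherwise shrinking the window from the left would not be safe).

```python
def min_subarray_len(s, nums):
    """Length of the shortest contiguous run whose sum is >= s, or 0 if none."""
    best = len(nums) + 1  # sentinel meaning "no window found yet"
    window_sum = 0
    left = 0
    for right, value in enumerate(nums):
        window_sum += value
        # Shrink from the left while the window still satisfies the target.
        while window_sum >= s:
            best = min(best, right - left + 1)
            window_sum -= nums[left]
            left += 1
    return best if best <= len(nums) else 0


print(min_subarray_len(7, [2, 3, 1, 2, 4, 3]))  # 2, from the subarray [4, 3]
print(min_subarray_len(8, [2, 3, 1, 2, 4, 3]))  # 3, from the subarray [2, 4, 3]
```

Each index enters and leaves the window at most once, so the loop does O(n) work and no sorting is needed.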
I want to make a C program that will give me all the possible ways to add three numbers (without using zero) to equal whatever the user entered. For example, if the user entered 4, the solution would be 1+1+2. If the user entered 3, the only solution would be 1+1+1. From what I noticed, after 4 there is always another solution. So, if 5 was entered there would be 2 solutions (1+1+3 and 2+2+1), and if 6 was entered there would be three solutions, and the number of solutions would always increase by 1. The thing I can't figure out is the logic for getting all the answers every time. My current code is kind of brute force: it just gives me the 1+1+(whatever number is left) solution, and the other part only works for one solution and only if the number is odd. My current code is as follows:
#include <stdio.h>

int main(void)
{
    int num;
    printf("Enter a number: ");
    scanf("%d",&num);
    if(num < 3)
        printf("No solution.\n");
    int p = num/2;
    int q = num%2;
    int v = num - (p+q);
    printf("%d+%d+%d\n",p,q,v);//prints a single solution but only works for odd numbers
    int k = num - 2;
    printf("1+1+%d\n",k);
    //prints a single solution but only 1+1+whatever is left
    return 0;
}
Any advice or different way to approach this would be very helpful. I was told I could do 3 nested for loops but I was looking for a different approach.
This solution uses 3 nested for loops (although I see that something else would be preferred). There is at least one other way:
Using recursion
There might be another one: solving the problem mathematically (this would be the most elegant).
It is still brute force, but it doesn't try some combinations that we know for sure won't yield a (good) result.
Since the order of the numbers is not important, meaning that, for example:
6 =
1 + 2 + 3
1 + 3 + 2
2 + 1 + 3
2 + 3 + 1
3 + 1 + 2
3 + 2 + 1
all 6 permutations of (1, 2, 3) constitute a single solution, only one ordering of each set of 3 numbers matters. We can use that to our advantage: we choose the ordering where No1 <= No2 <= No3 (with No1 + No2 + No3 = N, the number entered by the user).
That translates into fewer operations to compute. Instead of each of the 3 indexes (i, j, k), which correspond to (No1, No2, No3), sweeping across the whole [1 .. n] interval:
The outer index (i) only iterates over [1 .. n / 3] (there is no point in going higher, since the other 2 numbers are greater than or equal to it, and if it did the sum would be greater than n). This alone cuts the outer loop's iterations to one third.
The mid index (j) only iterates over [i .. n / 2] (it doesn't go below the previous one since i <= j, and it makes no sense to go higher than n / 2 since the other number will be greater than or equal to it, and again the sum would be greater than n).
The inner index (k) only iterates over [j .. n - 2] (the other two numbers are each at least 1, so k can be at most n - 2).
Notes:
It might be possible (actually, I'm pretty sure) that the number of operations can be reduced even more.
The last 3 variables declared (the ones with the bogus names) are for speeding things up: they are calculated once, at the beginning (although I might be reinventing the wheel here, since I'm pretty sure that the compiler optimizes this kind of situation). But regardless of the optimizations, the algorithm is still O(n ** 3), which is highly inefficient. I feel that I'm missing something obvious, but I can't put my finger on it.
I checked (not very thoroughly though), and it doesn't seem to skip solutions.
code.c:
#include <stdio.h>

void generate(int n) {
    int i, j, k, count = 0, n_div_3 = n / 3, n_div_2 = n / 2, n_minus_1 = n - 1;
    for (i = 1; i <= n_div_3; i++)
        for (j = i; j <= n_div_2; j++)
            for (k = j; k < n_minus_1; k++)
                if (i + j + k == n) {
                    /* Pre-increment so the printed numbering starts at 1 */
                    printf("Solution %d: %d %d %d\n", ++count, i, j, k);
                    break;
                }
    printf("\n%d solutions\n", count);
}

int main () {
    int num;
    printf("Enter a number: ");
    scanf("%d", &num);
    generate(num);
    return 0;
}
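One possible further reduction, offered only as a sketch: once i and j are fixed, the third number is forced to be n - i - j, so the innermost loop is not needed. The Python below illustrates that idea under the same No1 <= No2 <= No3 ordering; the name three_part_sums and the tighter upper bound (n - i) // 2 for j are my own choices, not part of the C program above.

```python
def three_part_sums(n):
    """All (i, j, k) with 1 <= i <= j <= k and i + j + k == n."""
    solutions = []
    for i in range(1, n // 3 + 1):
        # j can be at most half of what is left, otherwise k would drop below j.
        for j in range(i, (n - i) // 2 + 1):
            k = n - i - j  # forced by the sum, and k >= j by the bound on j
            solutions.append((i, j, k))
    return solutions


for number, triple in enumerate(three_part_sums(6), start=1):
    print(f"Solution {number}: {triple[0]} {triple[1]} {triple[2]}")
```

This does O(n ** 2) work instead of O(n ** 3), and every pair (i, j) it visits produces a solution, so no iteration is wasted.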
I would like to generate weighted random numbers in an exact manner. I can explain "exact" with an example: my input array is [1, 2, 3] and the weights are also [1, 2, 3]. In that case I expect to see 1 once, 2 twice and 3 three times, like 3 -> 2 -> 3 -> 1 -> 3 -> 2...
I am implementing random number generation with rand() to get a number in the range [0, sum_of_weights). For the example above, sum_of_weights = 1 + 2 + 3 = 6. I searched for existing solutions on the Internet, but the result is not what I want. Sometimes I got 2 more than twice and no 1 in the sequence. It's still weighted, but it does not give exactly the number of occurrences I expected.
I am not sure what's wrong with my code below. Am I doing something wrong, or should I try something totally different? Thanks for your answers.
int random_t (int items[], int items_weight[], int number_of_items)
{
    double random_weight;
    double sum_of_weight = 0;
    int i;

    /* Calculate the sum of weights */
    for (i = 0; i < number_of_items; i++) {
        sum_of_weight += items_weight[i];
    }

    /* Choose a random number in the range [0,1) */
    srand(time(NULL));
    double g = rand() / ( (double) RAND_MAX + 1.0 );
    random_weight = g * sum_of_weight;

    /* Find a random number wrt its weight */
    int temp_total = 0;
    for (i = 0; i < number_of_items; i++)
    {
        temp_total += items_weight[i];
        if (random_weight < temp_total)
        {
            return items[i];
        }
    }
    return -1; /* Oops, we could not find a random number */
}
I also tried something different (the code is below). It worked for my case, but integer overflow and the extensive use of static variables make it problematic.
Once you have passed in an input array, pass NULL on later calls to continue working with it. This is a little bit like strtok() usage.
int random_w(int *arr, int weights[], int size)
{
    int selected, i;
    int totalWeight;
    double ratio;
    static long int total;
    static long int *eachTotal = NULL;
    static int *local_arr = NULL;
    static double *weight = NULL;

    if (arr != NULL)
    {
        free(eachTotal);
        free(weight);
        eachTotal = (long int*) calloc(size, sizeof(long));
        weight = (double*) calloc(size, sizeof(double));
        total = 0;
        totalWeight = 0;
        local_arr = arr;
        for (i = 0; i < size; i++)
        {
            totalWeight += weights[i];
        }
        for (i = 0; i < size; i++)
        {
            weight[i] = (double)weights[i] / totalWeight;
        }
        srand(time(NULL));
    }

    while (1)
    {
        selected = rand() % size;
        ratio = (double)(eachTotal[selected])/(double)(total+1);
        if (ratio < weight[selected])
        {
            total++;
            eachTotal[selected]++;
            return local_arr[selected];
        }
    }
}
Is this what you want?
# Weights: one 1, two 2s, three 3s
>>> import random
>>> vals = [1] * 1 + [2] * 2 + [3] * 3
>>> random.shuffle(vals)
>>> vals
[2, 3, 1, 2, 3, 3]
Edit: Whoops, for some reason my mind replaced the C tag with the Python one. Regardless, I think what you want is not "weighted" random number generators, but a shuffle. This ought to help.
When you say you didn't get "exactly" the number of values you expected for each weighted value, how many runs are you talking? If you only did six runs of any random process, I wouldn't expect you to be able to definitively say anything was working or not. Your code may work fine. Try running it a million times and check the results then. Or maybe you actually want what Nathon is talking about, a preweighted list of values, which you can then randomly shuffle and still have the exact weights you're looking for.
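To make the "run it many times and check" suggestion concrete, here is a small Python sketch (the helper name weighted_frequencies is made up for this illustration). It draws independent weighted samples with the standard library's random.choices and reports the observed fraction of each value, which should approach 1/6, 2/6 and 3/6 as the number of draws grows but will rarely match exactly for a handful of draws.

```python
import random
from collections import Counter


def weighted_frequencies(items, weights, draws, seed=None):
    """Observed fraction of each item over a number of independent weighted draws."""
    rng = random.Random(seed)
    counts = Counter(rng.choices(items, weights=weights, k=draws))
    return {item: counts[item] / draws for item in items}


# Six draws are usually far from the exact 1:2:3 split...
print(weighted_frequencies([1, 2, 3], [1, 2, 3], draws=6))
# ...while a large number of draws gets close to it.
print(weighted_frequencies([1, 2, 3], [1, 2, 3], draws=1_000_000))
```

If the exact counts are required in every block of six, independent draws like these are the wrong tool and a shuffled, pre-weighted list is the better fit.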
You can sample from a multinomial distribution. Your universe of random samples (or "urn of balls in a bucket") is {1, 2, 3} and the probabilities ("weights") of observing each is, respectively, {1/6, 2/6, 3/6}.
For demonstration purposes, a Perl script can give you a list of observations of labeled balls with these probabilities:
#!/usr/bin/perl
use strict;
use warnings;
use Math::Random qw(random_multinomial);
use Data::Dumper;
my $events = 10;
my @probabilities = qw(0.167 0.333 0.5);
my @observations = random_multinomial($events, @probabilities);
print Dumper \@observations;
For 10 events, a single trial will return something like:
$VAR1 = 1;
$VAR2 = 2;
$VAR3 = 7;
This means you have (from this single trial) one 1-labeled event, two 2-labeled events, and seven 3-labeled events.
If you repeat the trial, you may get a different distribution of 1, 2 and 3-labeled events.
You can trivially build a list from this to the equivalent {1, 2, 2, 3, 3, 3, 3, 3, 3, 3} list.
Just randomly shuffle this second list to get your weighted, observed list of random numbers.
If you want to have the sample frequencies be completely deterministic, I think
the way to go is generate an array that has the proper number of occurrences
for each value, then do a random shuffle (which preserves the frequencies)
and take successive elements of the shuffled array as your random sequence.
OK, my answer will sound like a hack, but short of writing your own distribution, maybe you can map a uniform distribution and leverage Boost (check out http://www.boost.org/doc/libs/1_44_0/doc/html/boost_random/reference.html#boost_random.reference.distributions).
So, following your example:
1 -> 1
2,3 -> 2
4,5,6 -> 3
7,8,9,10 -> 4 (etc...)
Then generate a random number between 1 and 10 with Boost's uniform_int distribution and return the mapped element.
Here is an example of generating the numbers; you would then need to map the results:
#include <iostream>
#include <boost/random.hpp>
#include <time.h>

using namespace std;
using namespace boost;

int main ( ) {
    // Inclusive range 1..10, matching the mapping table above
    uniform_int<> distribution(1, 10) ;
    mt19937 engine;
    engine.seed(time(NULL));
    variate_generator<mt19937, uniform_int<> > myrandom (engine, distribution);
    cout << myrandom() << endl;
}
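The mapping step that is left as an exercise above can be sketched with cumulative weights and a binary search. This is an illustration in Python rather than Boost/C++, and build_cumulative, pick_by_ticket and weighted_pick are hypothetical helper names. A "ticket" is the uniform integer in 1..sum_of_weights; each item owns as many consecutive tickets as its weight.

```python
import bisect
import itertools
import random


def build_cumulative(weights):
    """Running totals, e.g. [1, 2, 3] -> [1, 3, 6]."""
    return list(itertools.accumulate(weights))


def pick_by_ticket(items, cumulative, ticket):
    """Map a ticket in 1..cumulative[-1] to the item that owns it."""
    return items[bisect.bisect_left(cumulative, ticket)]


def weighted_pick(items, weights, rng=random):
    """Draw one uniform ticket and return the mapped item."""
    cumulative = build_cumulative(weights)
    ticket = rng.randint(1, cumulative[-1])  # inclusive on both ends
    return pick_by_ticket(items, cumulative, ticket)


items, weights = [1, 2, 3], [1, 2, 3]
cumulative = build_cumulative(weights)
print([pick_by_ticket(items, cumulative, t) for t in range(1, 7)])
# [1, 2, 2, 3, 3, 3]
```

Note that this still produces independent weighted draws, so the counts match the weights only on average, not exactly in every run of six.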