Removing numbers from a large range of numbers - c

I've got the following problem that I'm trying to find a more optimal solution for.
Let's say you have a range of numbers between 0 and 9:
Values: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Index: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Now, let's say you "remove" 1, 4, 5, and 7:
Values: 0, -, 2, 3, -, -, 6, -, 8, 9
Index: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Where there is no value, all subsequent values are shifted to the left:
Values: 0, 2, 3, 6, 8, 9
Index: 0, 1, 2, 3, 4, 5
The value at index 1 has now become 2 (was 1), the value at index 2 is now 3 (was 2), the value at index 3 is now 6 (was 3), etc.
Here's the problem. I need to manage this on a larger scale, up to tens of thousands of values. A random number of those values will be removed from the original contiguous range, and potentially added back afterwards (but not in the same order they were removed). The starting state will always be a complete sequence of numbers between 0 and MAX_VAL.
Things I've tried:
1) Maintaining an array of values, removing values from that array, and shifting everything over by one. This fails because you're iterating through all the values after the one you've just removed, and it's too slow as a result. Getting the value for a given index afterwards is really fast though.
2) Maintaining a linked list of values, and removing the value by pulling it out of the list. This is slow both for adding/removing values and for getting the value at a given index, since I need to walk through the list first.
3) Keeping track of the "removed" values, rather than maintaining a giant array/list/etc. of values from 0 to MAX_VAL. If the removed values are stored in an ordered array, then it becomes trivial to calculate how many values have been removed before and after a given index, and just return an offset index instead. This kinda works, except it's slow to maintain the ordered array of removed values and iterate through it instead, especially if the number of removed values approaches MAX_VAL.
Is there some sort of algorithm or technique that can handle this kind of problem more quickly and efficiently?

Is there some sort of algorithm or technique that can handle this kind of problem more quickly and efficiently?
The answer very much depends on typical use cases:
Is the set of numbers typically sparse or dense?
How often do you do insertions vs. removals vs. lookups?
In which patterns are numbers inserted or removed (random, continuous, from the end or start)?
Are there any memory constraints?
Here are some ideas for a generic solution:
Create a structure that stores ranges instead of numbers.
Start with a single entry: 0 - MAX_VAL.
A range can have subranges. The resulting graph of ranges forms a tree.
Removing a number splits a leaf range into two, creating two new leaves.
This algorithm would perform quite well when the set is dense (because there are few ranges). It would still perform reasonably fast as the graph grows (O(log n) for lookups), provided you keep the tree balanced.
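For comparison, here is a minimal sketch in C of a closely related idea: instead of the range tree described above, keep a Fenwick tree (binary indexed tree) of "still present" counts, which effectively makes option 3 from the question fast. The names (MAX_VAL, value_at, and so on) are illustrative only; removal, re-insertion, and looking up the value at a shifted index are each O(log MAX_VAL), no matter how many values have been removed.

#include <stdio.h>

#define MAX_VAL 10                 /* values are 0 .. MAX_VAL-1; adjust as needed */

static int tree[MAX_VAL + 1];      /* 1-based Fenwick tree of presence counts */

static void update(int value, int delta)
{
    for (int i = value + 1; i <= MAX_VAL; i += i & -i)
        tree[i] += delta;
}

static int count_present(int hi)   /* how many present values are < hi */
{
    int sum = 0;
    for (int i = hi; i > 0; i -= i & -i)
        sum += tree[i];
    return sum;
}

static void remove_value(int v)  { update(v, -1); }
static void restore_value(int v) { update(v, +1); }

/* Value currently sitting at the given (shifted) index, or -1 if out of range. */
static int value_at(int index)
{
    int k = index + 1;                         /* we want the k-th present value */
    if (k < 1 || k > count_present(MAX_VAL))
        return -1;
    int pos = 0, step = 1;
    while (step * 2 <= MAX_VAL)
        step *= 2;
    for (; step > 0; step /= 2) {              /* descend to the k-th present slot */
        if (pos + step <= MAX_VAL && tree[pos + step] < k) {
            pos += step;
            k -= tree[pos];
        }
    }
    return pos;                                /* 0-based value */
}

int main(void)
{
    for (int v = 0; v < MAX_VAL; v++)          /* start fully populated */
        restore_value(v);
    remove_value(1); remove_value(4); remove_value(5); remove_value(7);
    for (int i = 0; i < 6; i++)
        printf("index %d -> value %d\n", i, value_at(i));  /* 0 2 3 6 8 9 */
    return 0;
}

The memory cost is one counter per possible value, which is fine for tens of thousands of values.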

Now, let's say you "remove" 1, 4, 5, and 7:
Values: 0, -100, 2, 3, -100, -100, 6, -100, 8, 9   // use a sentinel value that does not occur in the array
Index: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9


Permutations with predicate Scala

I am trying to solve a combinations task in Scala. I have an array with repeated elements, and I have to count the number of combinations that satisfy the condition a + b + c = 0. Numbers should not be repeated; if the same numbers appear in different positions, that doesn't count as a distinct combination.
So I turned my array into a Set so that the elements would not repeat. I have also found the combinations method for sequences, but I am not really sure how to use it in this case, and I do not know where to put the condition.
Here is what I have for now:
var arr = Array(-1, -1, -2, -2, 1, -5, 1, 0, 1, 14, -8, 4, 5, -11, 13, 5, 7, -10, -4, 3, -6, 8, 6, 2, -9, -1, -4, 0)
val arrSet = Set(arr)
arrSet.toSeq.combinations(n)
I am new to Scala, so I would be really grateful for any advice!
Here's what you need:
arr.distinct.combinations(3).filter(_.sum == 0).size
where:
distinct removes the duplicates
combinations(n) produces combinations of n elements
filter filters them by keeping only those whose sum is 0
size returns the total number of such combinations
P.S.: arr doesn't need to be a var. You should strive to avoid var in Scala and stick to val whenever possible.

Efficient way of finding sequential numbers across multiple arrays?

I'm not looking for any code or having anything being done for me. I need some help to get started in the right direction but do not know how to go about it. If someone could provide some resources on how to go about solving these problems I would very much appreciate it. I've sat with my notebook and am having trouble designing an algorithm that can do what I'm trying to do.
I can probably do:
foreach element in array1
foreach element in array2
check if array1[i] == array2[j]+x
I believe this would work for both forward and backward sequences, and for the multiples just check array1[i] % array2[j] == 0. I have a list which contains int arrays and am getting list[index] (for array1) and list[index+1] for array2, but this solution can get complex and lengthy fast, especially with large arrays and a large list of those arrays. Thus, I'm searching for a better solution.
I'm trying to come up with an algorithm for finding sequential numbers in different arrays.
For example:
[1, 5, 7] and [9, 2, 11] would find that 1 and 2 are sequential.
This should also work for multiple sequences in multiple arrays. So if there is a third array of [24, 3, 15], it will also include 3 in that sequence, and continue on to the next array until there isn't a number that matches the last sequential element + 1.
It also should be able to find more than one sequence between arrays.
For example:
[1, 5, 7] and [6, 3, 8] would find that 5 and 6 are sequential and also 7 and 8 are sequential.
I'm also interested in finding reverse sequences.
For example:
[1, 5, 7] and [9, 4, 11] would return that 5 and 4 are reverse sequential.
Example with all:
[1, 5, 8, 11] and [2, 6, 7, 10] would return 1 and 2 are sequential, 5 and 6 are sequential, 8 and 7 are reverse sequential, 11 and 10 are reverse sequential.
It can also overlap:
[1, 5, 7, 9] and [2, 6, 11, 13] would return 1 and 2 sequential, 5 and 6 sequential and also 7 and 6 reverse sequential.
I also want to expand this to check numbers with a difference of x (above examples check with a difference of 1).
In addition to all of that (although this might be a different question), I also want to check for multiples,
Example:
[5, 7, 9] and [10, 27, 8] would return 5 and 10 as multiples, 9 and 27 as multiples.
and numbers with the same ones place.
Example:
[3, 5, 7] and [13, 23, 25] would return that 3, 13, and 23 have the same ones digit.
Use a dictionary (set or hashmap)
dictionary1 = {}
Go through each item in the first array and add it to the dictionary.
[1, 5, 7]
Now dictionary1 = {1:true, 5:true, 7:true}
dictionary2 = {}
Now go through each item in [6, 3, 8] and lookup if it's part of a sequence.
6 is part of a sequence because dictionary1[6-1] == true (and dictionary1[6+1] == true as well, which marks a reverse sequence)
so dictionary2[6] = true
We get dictionary2 = {6:true, 8:true}
Now set dictionary1 = dictionary2 and dictionary2 = {}, and go to the third array, and so on.
We only keep track of sequences.
Since each lookup is O(1), and we do 2 lookups per number (e.g. 6-1 and 6+1), the total is N*O(1), which is O(N) (where N is the number of numbers across all the arrays).
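To make that concrete, here is a hedged sketch in C. Since C has no built-in hash set, a plain boolean presence table stands in for the dictionary, which assumes the values are non-negative and below some known bound (MAX_VALUE below is a made-up constant for this sketch); with an actual hash set the same logic works for arbitrary values.

#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VALUE 1000   /* assumed bound on the values, purely for this sketch */

/* Report every forward (v-1 in prev, v in curr) and reverse (v+1 in prev,
   v in curr) sequence between two consecutive arrays. */
static void find_sequential(const int *prev, size_t prev_len,
                            const int *curr, size_t curr_len)
{
    bool present[MAX_VALUE] = { false };    /* the "dictionary" of the previous array */
    for (size_t i = 0; i < prev_len; i++)
        present[prev[i]] = true;

    for (size_t j = 0; j < curr_len; j++) {
        int v = curr[j];
        if (v - 1 >= 0 && present[v - 1])          /* lookup v-1: forward sequence */
            printf("%d and %d are sequential\n", v - 1, v);
        if (v + 1 < MAX_VALUE && present[v + 1])   /* lookup v+1: reverse sequence */
            printf("%d and %d are reverse sequential\n", v + 1, v);
    }
}

int main(void)
{
    int a[] = { 1, 5, 8, 11 };
    int b[] = { 2, 6, 7, 10 };
    find_sequential(a, 4, b, 4);   /* 1/2, 5/6 sequential; 8/7, 11/10 reverse */
    return 0;
}

To chain sequences across more than two arrays, you would rebuild the table from dictionary2 before moving on to the next array, exactly as described above.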
The brute force approach outlined in your pseudocode will be O(c^n) (exponential), where c is the average number of elements per array and n is the total number of arrays.
If the input space is sparse (meaning there will be more missing numbers on average than present numbers), then one way to speed up this process is to first create a single sorted set of all the unique numbers from all your different arrays. This "master" set will then allow you to exit early (i.e. break out of your loops) on any sequences which are not viable.
For example, if we have input arrays [1, 5, 7] and [6, 3, 8] and [9, 11, 2], the master ordered set would be {1, 2, 3, 5, 6, 7, 8, 9, 11}. If we are looking for n+1 type sequences, we could stop continuing any sequence that contains a 3, 9, or 11 (because the n+1 value is not present at the next index in the sorted set). While the speedups are not drastic in this particular example, if you have hundreds of input arrays and a very large range of values for n (sparsity), the speedups should be exponential because you will be able to exit early on many permutations. If the input space is not sparse (such as in this example, where we didn't have many holes), the speedups will be less than exponential.
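As a rough sketch of that early-exit check in C (pooling all values, sorting, de-duplicating, then flagging which values could possibly continue an n+1 sequence):

#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    return (*(const int *)a > *(const int *)b) - (*(const int *)a < *(const int *)b);
}

int main(void)
{
    /* all values from [1, 5, 7], [6, 3, 8] and [9, 11, 2] pooled together */
    int pool[] = { 1, 5, 7, 6, 3, 8, 9, 11, 2 };
    int n = sizeof pool / sizeof pool[0];

    qsort(pool, n, sizeof pool[0], cmp_int);

    int m = 0;                                   /* de-duplicate in place */
    for (int i = 0; i < n; i++)
        if (m == 0 || pool[i] != pool[m - 1])
            pool[m++] = pool[i];

    /* a value can only continue an n+1 sequence if its successor is adjacent */
    for (int i = 0; i < m; i++) {
        int viable = (i + 1 < m && pool[i + 1] == pool[i] + 1);
        printf("%2d: %s\n", pool[i],
               viable ? "may continue a sequence" : "can be skipped");
    }
    return 0;
}

In this example the master set comes out as {1, 2, 3, 5, 6, 7, 8, 9, 11}, and 3, 9, and 11 are immediately flagged as dead ends.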
A further improvement would be to store a "master" set of key-value pairs, where the key is the n value as shown in the example above, and the value portion of the pair is a list of the indices of any arrays that contain that value. The master set of the previous example would then be: {[1, 0], [2, 2], [3, 1], [5, 0], [6, 1], [7, 0], [8, 1], [9, 2], [11, 2]}. With this architecture, scan time could potentially be as low as O(c*n), because you could just traverse this single sorted master set looking for valid sequences instead of looping over all the sub-arrays. By also requiring the array indexes to increment, you can clearly see that the 1->2 sequence can be skipped because the arrays are not in the correct order, and the same with the 2->3 sequence, etc. Note this toy example is somewhat oversimplified because in practice you would need a list of indices for the value portions of the key-value pairs. This would be necessary if the same value of n ever appeared in multiple arrays (duplicate values).

Unlimited changes to get an equal array

To clarify at the outset, this is not homework; I was asked this question at a recent interview and drew a blank.
So I've the following array,
{1, 6, 3, 2, 9}
A change is a step which increments any one element by 1 and decrements any other element by 1. Thus one change could look like:
{2, 5, 3, 2, 9}
I'm allowed to make unlimited such changes, until I get the maximum number of equal elements; thus the given array could become
{3, 3, 3, 3, 7} or {3, 4, 4, 4, 4}
Beyond this point more changes will not get any more elements equal. The question is thus, making unlimited changes, what is the maximum number of elements that can be made equal.
Thus the answer for the above array is 4. (Note there are two cases, in either case though the answer is 4)
Another example will be the array,
{1, 4, 1}
In which case we can make changes to get to
{2, 2, 2}
Thus the answer in this case is 3.
Can someone help me with an approach to get started? I'm still drawing a blank.
This seems to be a mathematical problem rather than a computer-related one. Since every "change" increments one element and decrements another, the sum of all the values in the array is constant.
This means that you can get all n elements of the array identical if and only if the sum of all elements can be evenly divided by n. Otherwise one of the elements must take another value to get n-1 equal elements.
By the way, your answers {3, 3, 3, 3, 7} and {3, 4, 4, 4, 4} (sum of 19) are not solutions to your previous state of {1, 6, 3, 2, 9} (sum of 21).
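In code, the whole answer reduces to one divisibility test; here is a minimal sketch in C (max_equal is just an illustrative name):

#include <stdio.h>

/* Every change keeps the total constant, so all n elements can be made equal
   iff the sum is divisible by n; otherwise at most n - 1 can. */
static int max_equal(const int *a, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return (sum % n == 0) ? n : n - 1;
}

int main(void)
{
    int a[] = { 1, 6, 3, 2, 9 };   /* sum 21, not divisible by 5 -> 4 */
    int b[] = { 1, 4, 1 };         /* sum 6, divisible by 3 -> 3 */
    printf("%d %d\n", max_equal(a, 5), max_equal(b, 3));
    return 0;
}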

Vectorizing Matlab replace array values from start to end

I have an array in which I want to replace values at a known set of indices with the value immediately preceding it. As an example, my array might be
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0];
and the indices of values to be replaced by previous values might be
y = [2, 3, 8];
I want this replacement to occur from left to right, or else start to finish. That is, the value at index 2 should be replaced by the value at index 1, before the value at index 3 is replaced by the value at index 2. The result using the arrays above should be
[1, 1, 1, 4, 5, 6, 7, 7, 9, 0]
However, if I use the obvious method to achieve this in Matlab, my result is
>> x(y) = x(y-1)
x =
1 1 2 4 5 6 7 7 9 0
Hopefully you can see that this operation was performed right to left and the value at index 3 was replaced by the value at index 2, then 2 was replaced by 1.
My question is this: Is there some way of achieving my desired result in a simple way, without brute force looping over the arrays or doing something time consuming like reversing the arrays around?
Well, practically this is a loop, but the number of iterations is only the length of the longest run of consecutive indices in y:
while ~isequal(x(y), x(y-1))
    x(y) = x(y-1);
end
Using nancumsum you can achieve a fully vectorized version. Nevertheless, in most cases the solution karakfa provided is probably the one to prefer; only in extreme cases with long consecutive runs in y is this code faster.
c1=[0,diff(y)==1];
c1(c1==0)=nan;
shift=nancumsum(c1,2,4);
y(~isnan(shift))=y(~isnan(shift))-shift(~isnan(shift));
x(y)=x(y-1)

Calling Groups of Elements of Matlab Arrays

I'm dealing with long daily time series in Matlab, running over periods of 30-100+ years. I've been meaning to start looking at it by seasons, roughly approximating that by taking 91-day segments of each year over the time period (with some to-be-determined method of correcting for the odd number of days in the year).
Basically, what I want is an array indexing method that allows me to make a new array that takes 91 elements every 365 elements, starting at element 1. I've been looking for some normal array methods (some (:) or other), but I haven't been able to find one. I guess an alternative would be to kind of iterate over 365-day segments 91 times, but that seems needlessly complicated.
Is there a simpler way that I've missed?
Thanks in advance for the help!
So if I understand correctly, you want to extract elements 1-91, 366-456, 731-821, and so on? I'm not sure that there is a way to do this with basic matrix indexing, but you can do the following:
days = 1:365; %Create array ranging from 1 - 365
difference = length(data) - 365; %how much bigger is time series data?
padded = padarray(days, [0, difference], 'circular', 'post'); %extend to fit time series ('post' pads only at the end)
extracted = data(padded <= 91); %get every element in the range 1-91
Basically what I am doing is creating an array that is the same size as your time series data that repeats 1-365 over and over. I then perform logical indexing on data, such that the padded array is less than or equal to 91.
As a more approachable example, consider:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
days = 1:5;
difference = length(x) - 5;
padded = padarray(days, [0, difference], 'circular', 'post');
extracted = x(padded <= 2);
padded then is equal to [1, 2, 3, 4, 5, 1, 2, 3, 4, 5] and extracted is going to be [1, 2, 6, 7]
