Calling Groups of Elements of Matlab Arrays - arrays

I'm dealing with long daily time series in Matlab, running over periods of 30-100+ years. I've been meaning to start looking at them by season, roughly approximating that by taking 91-day segments of each year over the time period (with some TBD method of correcting for the odd number of days in the year).
Basically, what I want is an array indexing method that allows me to make a new array that takes 91 elements every 365 elements, starting at element 1. I've been looking for some normal array methods (some (:) or other), but I haven't been able to find one. I guess an alternative would be to kind of iterate over 365-day segments 91 times, but that seems needlessly complicated.
Is there a simpler way that I've missed?
Thanks in advance for the help!

So if I understand correctly, you want to extract elements 1-91, 366-456, 731-821, and so on? I'm not sure there is a way to do this with basic matrix indexing, but you can do the following:
days = 1:365; %Create array ranging from 1 - 365
difference = length(data) - 365; %how much bigger is time series data?
padded = padarray(days, [0, difference], 'circular', 'post'); %extend to fit time series ('post' pads only at the end; padarray pads both ends by default)
extracted = data(padded <= 91); %get every element in the range 1-91
Basically, I am creating an array that is the same size as your time series data and repeats 1-365 over and over. I then use logical indexing to keep only the elements of data for which padded is less than or equal to 91. Note that padarray is part of the Image Processing Toolbox.
As a more approachable example, consider:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
days = 1:5;
difference = length(x) - 5;
padded = padarray(days, [0, difference], 'circular', 'post');
extracted = x(padded <= 2);
padded then is equal to [1, 2, 3, 4, 5, 1, 2, 3, 4, 5] and extracted is going to be [1, 2, 6, 7]
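For comparison, the same selection can be made without padarray (and so without the Image Processing Toolbox) by computing each element's position within its 365-element block with a modulo; in MATLAB the mask could be built with mod on the element index. The sketch below shows the idea in Python/NumPy purely for illustration. The function name seasonal_slice, its start parameter, and the fixed 365-day period (leap days ignored) are assumptions, not part of the answer above.

```python
import numpy as np

def seasonal_slice(data, period=365, keep=91, start=0):
    """Keep `keep` consecutive elements out of every `period` elements.

    Assumes 1-D data, a fixed period (e.g. 365 days, leap days ignored),
    and that data[0] is day 0 of the first block."""
    data = np.asarray(data)
    # Position of each element within its block: 0, 1, ..., period-1, 0, 1, ...
    pos = np.arange(data.size) % period
    # start shifts the window, e.g. start=91 for the second "season"
    mask = (pos >= start) & (pos < start + keep)
    return data[mask]

print(seasonal_slice(np.arange(1, 11), period=5, keep=2).tolist())  # [1, 2, 6, 7]
```

With the defaults, elements 1-91, 366-456, 731-821, ... (1-based) are kept, matching the ranges discussed above.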

Related

What do these constraints mean?

Can anyone help me understand this coding problem assignment?
I have an array of numbers, where each number appears twice except for one, and I need to identify which is the number that only appears once.
E.g.
const num_list = [8, 6, 3, 2, 4, 2, 3, 4, 5, 8, 7, 7, 6]
Answer: 5
The thing I'm confused about though is the constraints given for the problem are:
2 <= num_list[i] <= 100000
3 <= i <= 10,000
In particular the second constraint given - What does 'i' refer to here? Is it just stating the minimum number of elements that will be in the array (there are multiple test cases with different arrays as input)? Or does it mean that if I iterate over the array I can only start iterating from index 3 of the array onwards?
Thanks in advance
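Whichever way the constraint on i is meant, the task itself has a well-known linear-time solution: XOR all the numbers together. Each value that appears twice cancels itself out (v ^ v == 0), so only the unpaired value is left. A minimal Python sketch, assuming the input really does contain exactly one unpaired value (the function name is illustrative):

```python
from functools import reduce
from operator import xor

def find_single(nums):
    """Return the value that appears once, assuming every other value
    appears exactly twice (pairs cancel under XOR)."""
    return reduce(xor, nums, 0)

print(find_single([8, 6, 3, 2, 4, 2, 3, 4, 5, 8, 7, 7, 6]))  # 5
```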

A question that involves permutations of pairs of row elements

Consider two numpy arrays of integers. U has 2 columns and shows all (p,q) where p<q. For this question, I'll restrict myself to 0<=p,q<=5. The cardinality of U is C(6,2) = 15.
U = [[0,1],
[0,2],
[0,3],
[0,4],
[0,5],
[1,2],
[1,3],
[1,4],
[1,5],
[2,3],
[2,4],
[2,5],
[3,4],
[3,5],
[4,5]]
The 2nd array, V, has 6 columns. I formed it by taking the Cartesian product UxUxU. So, the first row of V is [0,1,0,1,0,1], and the last row is [4,5,4,5,4,5]. The cardinality of V is C(6,2)^3 = 3375.
A SMALL SAMPLE of V, used in my question, is shown below. The elements of each row should be thought of as 3 pairs. The rationale follows.
V = [[0,1, 2,5, 2,4],
[0,1, 2,5, 2,5],
[0,1, 2,5, 3,4],
[0,1, 2,5, 3,5],
[0,1, 2,5, 4,0],
[0,1, 2,5, 4,1]]
Here's why the row elements should be thought of as a set of 3 pairs: later in my code, I will loop through each row of V, using the pair values to 'swap' columns of a matrix M. (M is not shown because it isn't needed for this question.) When we get to row [0,1, 2,5, 2,4], for example, we will swap the columns of M having indices 0 & 1, THEN swap the columns having indices 2 & 5, and finally, swap the columns having indices 2 & 4.
I'm currently wasting a lot of time because many of the rows of V could be eliminated.
The easiest case to understand involves V rows like [0,1, 2,5, 3,4] where all values are unique. This row has 6 pair permutations, but they all have the same net effect on M. Their values are unique, so none of the swaps will encounter 'interference' from another swap.
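This claim can be checked directly by applying every ordering of a row's pairs to a matrix and counting the distinct results. A minimal NumPy sketch, assuming the swaps are applied left to right as described; the helper names and the stand-in M (a hypothetical 1x6 matrix whose columns are all different) are illustrative only:

```python
import itertools
import numpy as np

def apply_swaps(M, row):
    """Return a copy of M with the column swaps in `row` applied in order.

    `row` is read as consecutive (i, j) pairs, e.g. [0,1, 2,5, 2,4]."""
    out = M.copy()
    for i, j in zip(row[0::2], row[1::2]):
        out[:, [i, j]] = out[:, [j, i]]
    return out

def distinct_outcomes(M, row):
    """Count how many different matrices the orderings of the pairs produce."""
    pairs = [tuple(row[k:k + 2]) for k in range(0, len(row), 2)]
    seen = set()
    for order in itertools.permutations(pairs):
        flat = [v for pair in order for v in pair]
        seen.add(apply_swaps(M, flat).tobytes())
    return len(seen)

M = np.arange(6).reshape(1, 6)   # hypothetical M with 6 distinct columns
print(distinct_outcomes(M, [0, 1, 2, 5, 3, 4]))  # 1
print(distinct_outcomes(M, [0, 1, 2, 5, 2, 4]))  # 2
```

The second result matches the 3 + 3 split described in Question 2: swapping 0 & 1 commutes with the other two swaps, so only the relative order of (2,5) and (2,4) matters.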
Question 1: How can I efficiently eliminate rows that have unique elements in unneeded permutations?
I would keep, say, [0,1, 2,5, 3,4], but drop:
[0,1, 3,4, 2,5],
[2,5, 0,1, 3,4],
[2,5, 3,4, 0,1],
[3,4, 0,1, 2,5],
[3,4, 2,5, 0,1]
I'm guessing a solution would involve np.sort and np.unique, but I'm struggling with getting a good result.
Question 2: (I don't think it's reasonable to expect an answer to this question, but I'd certainly appreciate any pointers or tips regarding resources that I could study.) The question involves rows of V having one or more common elements, like [0,1, 2,5, 2,4] or [0,5, 2,5, 2,4] or [0,5, 2,5, 3,5]. All of these have 6 pair permutations, but they don't all have the same effect on M. The row [0,1, 2,5, 2,4], for example, has 3 permutations that produce one M outcome, and 3 permutations that produce another. Ideally, I would like to keep two of the rows but eliminate the other four. The two other rows I showed are even more 'pathological'.
Does anyone see a path forward here that would allow more eliminations of V rows? If not, I'll continue what I'm currently doing even though it's really inefficient - screening the code's final outputs for doubles.
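One possible path, sketched under the assumption that two rows are interchangeable exactly when they leave M with the same column order: apply each row's swaps to the index vector 0..5 and keep one row per distinct resulting permutation (M[:, perm] then equals M after the swaps). This keeps one of the six orderings of [0,1, 2,5, 3,4] and two of the orderings of [0,1, 2,5, 2,4]. It also merges rows that are not reorderings of each other but have the same net effect, such as [0,1, 0,1, 2,3] and [2,3, 4,5, 4,5]; whether that is acceptable depends on how M is used later. The helper names are hypothetical.

```python
import itertools
import numpy as np

def net_permutations(V, n_cols=6):
    """Column order produced by applying each row's swaps, left to right.

    Row r of the result is the permutation p such that M[:, p] equals M
    after all swaps of V[r] have been applied."""
    V = np.asarray(V)
    rows = np.arange(len(V))
    perms = np.tile(np.arange(n_cols), (len(V), 1))
    for k in range(0, V.shape[1], 2):
        i, j = V[:, k], V[:, k + 1]
        left = perms[rows, i].copy()
        perms[rows, i] = perms[rows, j]
        perms[rows, j] = left
    return perms

def dedupe_by_net_effect(V, n_cols=6):
    """Keep the first row of V for each distinct net effect on M."""
    perms = net_permutations(V, n_cols)
    _, first_idx = np.unique(perms, axis=0, return_index=True)
    return np.asarray(V)[np.sort(first_idx)]

# Full V from the question: all (p, q) with p < q, then U x U x U.
U = list(itertools.combinations(range(6), 2))
V = np.array([a + b + c for a, b, c in itertools.product(U, repeat=3)])
print(V.shape)                      # (3375, 6)
print(len(dedupe_by_net_effect(V))) # 240
```

The 240 is expected: a product of three swaps is always an odd permutation, and 240 of the 360 odd permutations of six columns (all except the 6-cycles) can be written that way.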
To get rows of an array, without repetitions (in your sense), you can run:
VbyRows = V[np.lexsort(V[:, ::-1].T)]
sorted_data = np.sort(VbyRows, axis=1)
result = VbyRows[np.append([True], np.any(np.diff(sorted_data, axis=0), 1))]
Details:
VbyRows = V[np.lexsort(V[:, ::-1].T)] - sort rows by all columns.
I used ::-1 as the column index so that the first column becomes the primary sort key (np.lexsort sorts by the last key first),
then the second, and so on.
sorted_data = np.sort(VbyRows, axis=1) - sort each row from VbyRows
(and save it as a separate array).
np.diff(sorted_data, axis=0) - compute "vertical" differences between
previous and current row (in sorted_data).
np.any(...) - A bool vector - "cumulative difference indicator" for
each row from sorted_data but the first (does it differ from the
previous row on any position).
np.append([True], ...) - prepend the above result with True (an
indicator that the first row should be included in the result).
The result is also a bool vector, this time for all rows. Each element
of this vector answers the question: should the respective row from VbyRows
be included in the result?
result = VbyRows[np.append([True], np.any(np.diff(sorted_data, axis=0), 1))] -
the final result.
To test the above code I prepared V as follows:
array([[ 0, 1, 2, 5, 3, 4],
[ 0, 1, 3, 4, 2, 5],
[ 2, 5, 0, 1, 3, 4],
[ 2, 5, 3, 4, 0, 1],
[ 3, 4, 0, 1, 2, 5],
[13, 14, 12, 15, 10, 11],
[ 3, 4, 2, 5, 0, 1]])
(the second-to-last row is the "other" one; all remaining rows contain the same
numbers in various orders).
The result is:
array([[ 0, 1, 2, 5, 3, 4],
[13, 14, 12, 15, 10, 11]])
Note that because lexsort runs first, among rows with the same set of
numbers the returned row is the first one in column-by-column sorted
order.
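Two caveats on the code above, as far as I can tell: np.sort(..., axis=1) sorts individual numbers rather than pairs, so rows such as [0,1, 2,5, 3,4] and [0,5, 1,2, 3,4] (same numbers, different swaps) are treated as duplicates; and np.diff only compares neighboring rows, so two reorderings of the same pairs that are not adjacent after the lexsort are both kept. A minimal sketch that compares whole pairs with np.unique instead, and only collapses rows whose six values are all distinct (the Question 1 case). The function name is hypothetical and non-negative integer entries are assumed.

```python
import numpy as np

def dedupe_pair_orderings(V):
    """Collapse reorderings of the same pairs, for rows with distinct values.

    Rows are read as consecutive (p, q) pairs of non-negative integers.
    Rows containing a repeated value are left alone, because their pair
    order can change the result."""
    V = np.asarray(V)
    n_rows, n_cols = V.shape
    # Normalize each pair so (4, 0) and (0, 4) count as the same swap.
    pairs = np.sort(V.reshape(n_rows, n_cols // 2, 2), axis=2)
    # Encode each pair as one integer so that pairs sort as units.
    base = V.max() + 1
    codes = pairs[:, :, 0] * base + pairs[:, :, 1]
    # Only rows whose values are all distinct may have their pairs reordered.
    all_distinct = np.all(np.diff(np.sort(V, axis=1), axis=1) != 0, axis=1)
    canonical = np.where(all_distinct[:, None], np.sort(codes, axis=1), codes)
    # np.unique finds duplicates anywhere in the array, not only adjacent ones.
    _, first_idx = np.unique(canonical, axis=0, return_index=True)
    return V[np.sort(first_idx)]

V = np.array([[0, 1, 2, 5, 3, 4], [0, 1, 3, 4, 2, 5], [2, 5, 0, 1, 3, 4],
              [2, 5, 3, 4, 0, 1], [3, 4, 0, 1, 2, 5], [13, 14, 12, 15, 10, 11],
              [3, 4, 2, 5, 0, 1]])
print(dedupe_pair_orderings(V).tolist())
# [[0, 1, 2, 5, 3, 4], [13, 14, 12, 15, 10, 11]]
```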

Average between arrays of different length

I'm trying to develop a sort of very simple machine learning example to recognize similarity between arrays.
For this reason I'm trying to calculate the average of 2 arrays with different lengths.
For example if I have:
array_1 = [0, 4, 5];
array_2 = [4, 2, 7];
The average is:
average_array = [2, 3, 6];
But how can I manage to calculate the average if I have the following situation:
array_1 = [0, 4, 5, 10, 7];
array_2 = [4, 2, 7];
As you can see, the arrays have different lengths.
Is there an algorithm that I can apply to solve this problem?
Does anyone have an idea or some suggestion?
Of course I can consider the missing values of the second array as 0, and evaluate the average as, for example:
average_array = [2, 3, 6, 5, 3.5];
or consider the values as "null" and have:
average_array = [2, 3, 6, 10, 7];
But are these two approaches good?
Or is there something smarter?
Thanks for your help!!
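The two options from the question can be written down directly, which makes it easy to compare them on real data. A minimal Python sketch; the function names are illustrative:

```python
from itertools import zip_longest

def average_zero_fill(a, b):
    """Treat missing values in the shorter array as 0."""
    return [(x + y) / 2 for x, y in zip_longest(a, b, fillvalue=0)]

def average_skip_missing(a, b):
    """Treat missing values as "null": average only the values present."""
    out = []
    for x, y in zip_longest(a, b):
        present = [v for v in (x, y) if v is not None]
        out.append(sum(present) / len(present))
    return out

a, b = [0, 4, 5, 10, 7], [4, 2, 7]
print(average_zero_fill(a, b))     # [2.0, 3.0, 6.0, 5.0, 3.5]
print(average_skip_missing(a, b))  # [2.0, 3.0, 6.0, 10.0, 7.0]
```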
To answer your question, we really need more information on what you are trying to achieve.
I'm trying to develop a sort of very simple machine learning example
to recognize similarity between arrays. For this reason I'm trying to
calculate the average between 2 arrays with different length.
Depending on your use case, similarity might be defined completely differently.
For instance:
if the array encodes sound-information you might want to measure similarity as "does this sound clip occur in this one" or "are the main frequencies (which would correspond to chords) the same"
if the array encodes image information (properly DFT-ed and zig-zag-encoded) you might not care about the high frequencies (end of the array) and only measure the difference between the first few values of the array
if the array encodes some kind of composition of elements (e.g. this essay contains keyword "matrix" 40 times, and keyword "SVM" 27 times) the difference in values might be very important.
General advice:
Think about what you're measuring
Decide what's important
But in general, have a look at smoothing algorithms, for instance Kneser-Ney or Good-Turing smoothing. They explicitly deal with comparing a vector of probabilities that may differ in length (in other words, have explicit zero entries).
https://en.wikipedia.org/wiki/Good%E2%80%93Turing_frequency_estimation
If, after taking the average of the arrays, you intend to take the modulus (magnitude) of the difference between each array and the average array, then you are probably heading in the right direction, provided you measure dissimilarity by the magnitude of that difference.
But for arrays of different lengths, I propose that you also take the index of the extra elements into consideration.
For
array_1 = [0, 4, 5, 10, 7];
array_2 = [4, 2, 7];
the average should be average_array = [2, 3, 6, 6.5, 5.5];
6.5 = (10 + 3 (index) + 0 (missing element)) / 2
and
5.5 = (7 + 4 (index) + 0 (missing element)) / 2
The reason for taking the index into consideration is that this approach also accounts for the difference in length. However, this is just my 2 cents; maybe there are better algorithms out there.
You should also take a look at this post
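The index-based variant above amounts to filling each missing position of the shorter array with its 0-based index (instead of 0) before averaging. A minimal sketch of that reading; the function name is illustrative:

```python
def average_index_fill(a, b):
    """Average two arrays, filling missing positions with their 0-based index."""
    n = max(len(a), len(b))

    def padded(arr):
        # A missing element contributes 0 plus its index, as in the worked example.
        return list(arr) + list(range(len(arr), n))

    return [(x + y) / 2 for x, y in zip(padded(a), padded(b))]

print(average_index_fill([0, 4, 5, 10, 7], [4, 2, 7]))  # [2.0, 3.0, 6.0, 6.5, 5.5]
```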

Increase time complexity to overcome space complexity

So I have an array a_0 of size, let's say, 10^5, and I have to make some changes to this array. The i-th change can be calculated using a function f(a_{i-1}) that gives a_i in O(1) time, where a_j denotes the array a after the j-th change has been made to it. In other words, a_i can be calculated in constant time if we know a_{i-1}. I know beforehand that I have to make 10^5 changes.
Now the problem asks me to answer a large number of queries of the form a_i[p] - a_j[q], where a_x[y] is the y-th element of the array after the x-th change has been made to a_0.
If I had space on the order of 10^10, I could easily answer each query in O(1) by storing all 10^5 arrays beforehand, but I don't (generally) have that much space. I could also answer each query by regenerating a_i and a_j from scratch, but I can't afford that time complexity either, so I was wondering whether some data structure could handle this problem.
EDIT: Example:
We define an array B = {1,3,1,4,2,6}, and we define a_j as the array whose entry at position i holds the frequency of the number i+1 after the j-th element of B has been added. That is, a_0={0,0,0,0,0,0}, a_1={1,0,0,0,0,0}, a_2={1,0,1,0,0,0}, a_3={2,0,1,0,0,0}, a_4={2,0,1,1,0,0}, a_5={2,1,1,1,0,0} and a_6={2,1,1,1,0,1}.
f just adds the next element of B and updates a_{j-1} accordingly to give a_j.
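Assume the number of elements changed per iteration is much smaller than the total number of elements. Store an array of lists, one list per position, where each list element is a pair (i, new_value): the change i at which that position took on new_value (change 0 being the initial array). For example, if the full view is like this: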
a0 = [3, 5, 1, 9]
a1 = [3, 5, 1, 8]
a2 = [1, 5, 1, 0]
We will store this:
c0 = [(0, 3), (2, 1)]
c1 = [(0, 5)]
c2 = [(0, 1)]
c3 = [(0, 9), (1, 8), (2, 0)]
Then for the query a2[0] - a1[3], we need only consult c0 and c3 (the two columns in the query). In each list we binary-search for the last entry whose key is at most the requested state, which gives the entries with keys 2 and 1 (the keys for the binary search being the first elements of the tuples).
The query time is then O(log N) for the two binary searches, where N is the maximum number of changes to a single value in the array. The space is O(L + M), where L is the length of the original array and M is the total number of changes made.
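A minimal Python sketch of this change-log structure, using the standard bisect module for the binary search. It assumes change indices are recorded in increasing order, as they are when changes are applied one after another; keys and values are kept in separate lists so that bisect works on older Python versions. The class and method names are hypothetical.

```python
from bisect import bisect_right

class ChangeLog:
    """Per-position history: sorted change numbers and the values set then."""

    def __init__(self, initial):
        # Change 0 is the initial array a_0.
        self.steps = [[0] for _ in initial]
        self.values = [[v] for v in initial]
        self.current = 0

    def apply(self, updates):
        """Record the next change, given as a dict {position: new_value}."""
        self.current += 1
        for pos, val in updates.items():
            self.steps[pos].append(self.current)
            self.values[pos].append(val)

    def get(self, state, pos):
        """Return a_state[pos]: the last value recorded at or before `state`."""
        k = bisect_right(self.steps[pos], state) - 1
        return self.values[pos][k]

log = ChangeLog([3, 5, 1, 9])          # a0
log.apply({3: 8})                      # a1 = [3, 5, 1, 8]
log.apply({0: 1, 3: 0})                # a2 = [1, 5, 1, 0]
print(log.get(2, 0) - log.get(1, 3))   # 1 - 8 = -7
```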
If there is some maximum number of states N, then checkpoints are a good way to go. For instance, if N=100,000, you might have:
c0 = [3, 5, 7, 1, ...]
c100 = [1, 4, 9, 8, ...]
c200 = [9, 7, 1, 2, ...]
...
c99900 = [1, 1, 4, 6, ...]
Now you have 1000 checkpoints. You can find the nearest checkpoint to an arbitrary state x in O(1) time and reconstruct x in at most 99 operations.
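A sketch of the checkpoint idea, using the frequency-count example from the question as the update function f. The interval K, the function names, and the use of plain Python lists are assumptions for illustration. Memory is roughly (number of states / K) full copies of the array, and any state is rebuilt with at most K - 1 calls to f (plus one O(L) copy of the checkpoint).

```python
B = [1, 3, 1, 4, 2, 6]   # example sequence from the question
SIZE = 6                 # a_j has one counter per number 1..6

def f(state, j):
    """Turn a_{j-1} into a_j in place, in O(1): count the j-th element of B."""
    state[B[j - 1] - 1] += 1

def build_checkpoints(n_states, K):
    """Keep a full copy of a_0, a_K, a_2K, ... while applying f once per state."""
    checkpoints = {}
    state = [0] * SIZE               # a_0
    for j in range(n_states + 1):
        if j > 0:
            f(state, j)
        if j % K == 0:
            checkpoints[j] = list(state)
    return checkpoints

def reconstruct(x, checkpoints, K):
    """Rebuild a_x from the nearest checkpoint at or below x."""
    base = (x // K) * K
    state = list(checkpoints[base])  # copy, so the stored checkpoint stays intact
    for j in range(base + 1, x + 1):
        f(state, j)
    return state

cps = build_checkpoints(len(B), K=4)   # stores a_0 and a_4
print(reconstruct(6, cps, K=4))        # [2, 1, 1, 1, 0, 1]
```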
Riffing off of my comment on your question and John Zwinck's answer, if your mutating function f(*) is expensive and its effects are limited to only a few elements, then you could store the incremental changes. Doing so won't decrease the time complexity of the algorithm, but may reduce the run-time.
If you had unlimited space, you would just store all of the checkpoints. Since you do not, you'll have to balance the number of checkpoints against the incrementals appropriately. That will require some experimentation, probably centered around determining how expensive f(*) is and the extent of its effects.
Another option is to look at query behavior. If users tend to query the same or nearby locations repeatedly, you may be able to leverage an LRU (least-recently used) cache.
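If queries do cluster, Python's functools.lru_cache is one ready-made LRU cache. The sketch below caches rebuilt states, again using the question's frequency example; the cache size and the from-scratch rebuild inside `state` are placeholders for whatever reconstruction (checkpoints, change logs) is actually used, and the function names are hypothetical.

```python
from functools import lru_cache

B = [1, 3, 1, 4, 2, 6]   # example sequence from the question
SIZE = 6

@lru_cache(maxsize=128)
def state(x):
    """Return a_x as a tuple (hashable, so it can be cached).

    Placeholder reconstruction: replay the first x changes from a_0."""
    counts = [0] * SIZE
    for value in B[:x]:
        counts[value - 1] += 1
    return tuple(counts)

def query(i, p, j, q):
    """Answer a_i[p] - a_j[q], reusing cached states for repeated i or j."""
    return state(i)[p] - state(j)[q]

print(query(6, 0, 2, 2))        # 2 - 1 = 1
print(query(6, 5, 2, 0))        # 1 - 1 = 0; a_6 and a_2 come from the cache
print(state.cache_info().hits)  # 2
```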

Efficient way of finding sequential numbers across multiple arrays?

I'm not looking for any code or having anything being done for me. I need some help to get started in the right direction but do not know how to go about it. If someone could provide some resources on how to go about solving these problems I would very much appreciate it. I've sat with my notebook and am having trouble designing an algorithm that can do what I'm trying to do.
I can probably do:
for each index i of array1
    for each index j of array2
        check if array1[i] == array2[j] + x
I believe this would work for both forward and backward sequences, and for the multiples I would just check array2[j] % array1[i] == 0. I have a list which contains int arrays and am getting list[index] (for array1) and list[index+1] (for array2), but this solution can get complex and lengthy fast, especially with large arrays and a long list of those arrays. Thus, I'm searching for a better solution.
I'm trying to come up with an algorithm for finding sequential numbers in different arrays.
For example:
[1, 5, 7] and [9, 2, 11] would find that 1 and 2 are sequential.
This should also work for multiple sequences in multiple arrays. So if there is a third array of [24, 3, 15], it will also include 3 in that sequence, and continue on to the next array until there isn't a number that matches the last sequential element + 1.
It also should be able to find more than one sequence between arrays.
For example:
[1, 5, 7] and [6, 3, 8] would find that 5 and 6 are sequential and also 7 and 8 are sequential.
I'm also interested in finding reverse sequences.
For example:
[1, 5, 7] and [9, 4, 11] would return 5 and 4 are reverse sequential.
Example with all:
[1, 5, 8, 11] and [2, 6, 7, 10] would return 1 and 2 are sequential, 5 and 6 are sequential, 8 and 7 are reverse sequential, 11 and 10 are reverse sequential.
It can also overlap:
[1, 5, 7, 9] and [2, 6, 11, 13] would return 1 and 2 sequential, 5 and 6 sequential and also 7 and 6 reverse sequential.
I also want to expand this to check numbers with a difference of x (above examples check with a difference of 1).
In addition to all of that (although this might be a different question), I also want to check for multiples,
Example:
[5, 7, 9] and [10, 27, 8] would return 5 and 10 as multiples, 9 and 27 as multiples.
and numbers with the same ones place.
Example:
[3, 5, 7] and [13, 23, 25] would return that 3, 13 and 23 have the same ones digit (as do 5 and 25).
Use a dictionary (set or hashmap)
dictionary1 = {}
Go through each item in the first array and add it to the dictionary.
[1, 5, 7]
Now dictionary1 = {1:true, 5:true, 7:true}
dictionary2 = {}
Now go through each item in [6, 3, 8] and look up whether it's part of a sequence.
6 is part of a sequence because dictionary1[6+1] == true
so dictionary2[6] = true
We get dictionary2 = {6:true, 8:true}
Now set dictionary1 = dictionary2 and dictionary2 = {}, and go to the third array, and so on.
We only keep track of sequences.
Since each lookup is O(1) on average, and we do 2 lookups per number (e.g. 6-1 and 6+1), the total is N * O(1) = O(N), where N is the number of numbers across all the arrays.
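A minimal Python sketch of this approach, extended slightly to remember the chain that leads to each live number rather than just a true flag (copying a chain when it grows adds cost proportional to its length). Instead of checking v-1 and v+1 in one pass, forward and reverse chains are found by two calls with x and -x, which also covers the "difference of x" case. The function name and output format are assumptions.

```python
def find_chains(arrays, x=1):
    """Follow chains v, v+x, v+2x, ... through consecutive arrays.

    Returns every chain of length >= 2, in the order the chains end.
    Use x=1 for increasing sequences and x=-1 for reverse ones."""
    live = {v: [v] for v in arrays[0]}   # last value -> chain ending there
    finished = []
    for nxt in arrays[1:]:
        nxt_set = set(nxt)
        # Chains whose next value is missing from this array end here.
        for last, chain in live.items():
            if len(chain) >= 2 and last + x not in nxt_set:
                finished.append(chain)
        # One O(1) dictionary lookup per number: does v continue a live chain?
        live = {v: (live[v - x] + [v] if v - x in live else [v]) for v in nxt}
    finished.extend(chain for chain in live.values() if len(chain) >= 2)
    return finished

print(find_chains([[1, 5, 7], [9, 2, 11], [24, 3, 15]]))  # [[1, 2, 3]]
print(find_chains([[1, 5, 8, 11], [2, 6, 7, 10]]))        # [[1, 2], [5, 6]]
print(find_chains([[1, 5, 8, 11], [2, 6, 7, 10]], x=-1))  # [[8, 7], [11, 10]]
```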
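The brute-force approach outlined in your pseudocode costs O(c^2) per pair of adjacent arrays; enumerating every combination of elements across all the arrays would be O(c^n) (exponential), where c is the average number of elements per array and n is the total number of arrays.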
If the input space is sparse (meaning there will be more missing numbers on average than present ones), then one way to speed up this process is to first create a single sorted set of all the unique numbers from all your different arrays. This "master" set will then allow you to exit early (i.e. break out of your loops) on any sequences which are not viable.
For example, if we have input arrays [1, 5, 7] and [6, 3, 8] and [9, 11, 2], the master ordered set would be {1, 2, 3, 5, 6, 7, 8, 9, 11}. If we are looking for n+1 type sequences, we could stop extending any sequence that reaches 3, 9 or 11, because the n+1 value is not present as the next element of the sorted set. While the speedups are not drastic in this particular example, if you have hundreds of input arrays and a very large range of values for n (sparsity), then the speedups should be exponential because you will be able to exit early on many permutations. If the input space is not sparse (as in this example, where there are few holes), the speedups will be less than exponential.
A further improvement would be to store a "master" set of key-value pairs, where the key is the n value as shown in the example above, and the value portion of the pair is a list of the indices of any arrays that contain that value. The master set of the previous example would then be: {[1, 0], [2, 2], [3, 1], [5, 0], [6, 1], [7, 0], [8, 1], [9, 2], [11, 2]}. With this architecture, scan time could potentially be as low as O(c*n), because you could just traverse this single sorted master set looking for valid sequences instead of looping over all the sub-arrays. By also requiring the array indexes to increment, you can clearly see that the 1->2 sequence can be skipped because the arrays are not in the correct order, and the same with the 2->3 sequence, etc. Note this toy example is somewhat oversimplified because in practice you would need a list of indices for the value portions of the key-value pairs. This would be necessary if the same value of n ever appeared in multiple arrays (duplicate values).
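The master key-value set can be sketched as a dictionary from each value to the set of indices of the arrays containing it, which also covers the duplicate-value case mentioned above. Scanning its keys in sorted order, a link v -> v+x is kept only when v is in some array i and v+x is in array i+1. The function name and output format are assumptions.

```python
from collections import defaultdict

def find_links(arrays, x=1):
    """Return (v, i, v + x, i + 1) for every v in array i with v + x in array i+1."""
    where = defaultdict(set)          # value -> indices of arrays containing it
    for idx, arr in enumerate(arrays):
        for v in arr:
            where[v].add(idx)
    links = []
    for v in sorted(where):           # the sorted "master" set
        targets = where.get(v + x)
        if not targets:               # early exit: v + x appears nowhere
            continue
        for i in sorted(where[v]):
            if i + 1 in targets:      # require the array index to increment
                links.append((v, i, v + x, i + 1))
    return links

print(find_links([[1, 5, 7], [6, 3, 8], [9, 11, 2]]))
# [(5, 0, 6, 1), (7, 0, 8, 1), (8, 1, 9, 2)]
```

As described above, 1 -> 2 and 2 -> 3 are skipped because their array indices do not increase by one; chaining the remaining links gives 5 -> 6 and 7 -> 8 -> 9.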
