Filling missing data in a data set with constant values

I have a data set like the following:
x= [1, 4, 10]
y= [10, 20, 30]
(x and y are value pairs, i.e. (1,10), (4,20), (10,30))
I would like to fill the gaps in the x values and hold y constant until the next known value pair. This should be done between each pair of consecutive value pairs, i.e. between (1,10) and (4,20), and again between (4,20) and (10,30).
Input:
x=[1, 4, 10];
y=[10, 20, 30];
Output:
xi= [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
yi= [10,10, 10, 20, 20, 20, 20, 20, 20, 30];
How can Matlab solve this for me?

Assuming ascending order of elements in x, this could be one approach based on diff & cumsum -
%// Sample inputs
x=[1, 4, 10]
y=[-2, 5, -3]
xi = min(x):max(x)
yi = zeros(1,numel(xi))
yi(x - x(1) + 1) = diff([0 y]) %// offset so that x(1) maps to index 1
yi = cumsum(yi)
Sample run -
x =
1 4 10
y =
-2 5 -3
xi =
1 2 3 4 5 6 7 8 9 10
yi =
-2 -2 -2 5 5 5 5 5 5 -3
Customary bsxfun solution to get yi -
lens = [diff(x) 1];
yi = nonzeros(bsxfun(@times,bsxfun(@ge,lens,[1:max(lens)]'),y)).'

Assuming that x always starts with a 1 and finishes with the final length of xi, this will work:
xi=1:x(end)
yi=y(arrayfun(@(xi)find(x<=xi,1,'last'),xi))
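For readers who want to check the expected output outside MATLAB, here is a minimal Python sketch of the same "hold the previous value" idea. It assumes x is sorted ascending and holds integers; the function name fill_previous is made up for this illustration and uses only the standard-library bisect module.

```python
from bisect import bisect_right

def fill_previous(x, y):
    """Fill integer gaps in x and hold y at the most recent known value.

    Assumes x is sorted ascending and contains integers.
    """
    xi = list(range(x[0], x[-1] + 1))
    # bisect_right(x, v) - 1 is the index of the last known x that is <= v
    yi = [y[bisect_right(x, v) - 1] for v in xi]
    return xi, yi

xi, yi = fill_previous([1, 4, 10], [10, 20, 30])
print(xi)
print(yi)
```

With the inputs from the question this reproduces the xi and yi listed under "Output".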

Related

Find subsequence of length k with the largest product

Language: Python
Given an array of integers, return the subsequence of length k which has the largest possible product. If there is more than one valid subsequence that gives the same product, return the one with the largest sum of numbers.
Example 1
array = [-10, -3, 5, 6, -2]
k = 2
Output should be [-10, -3] ( (-10) * (-3) = 30, which is the largest product of the given numbers)
Example 2
array = [10, 3, 5, 6, 20]
k = 3
Output should be [6, 10, 20], (6 * 10 * 20 = 1200)
Example 3
array = [1, -4, 3, -6, 7, 0]
k = 4
Output should be [-6, -4, 3, 7] ( (-6) * (-4) * 3 * 7 = 504)
I've already tried the code
def find_k_prod(arr, k):
    arr = sorted(arr)
    current_prod = 1
    n = len(arr)
    for i in range(k):
        current_prod *= arr[n-1-i]
    max_prod = current_prod
    for i in range(k):
        current_prod = (current_prod/arr[-(k-i)])*arr[i]
        max_prod = max(current_prod, max_prod)
    return max_prod
However, I have no idea how to return the subsequence itself rather than the product.
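One way to get the subsequence back, sketched here as a brute-force reference rather than an efficient solution: enumerate every k-element combination and pick the one with the largest (product, sum) pair. The function name best_k_subsequence is made up for this illustration, and the approach is only practical for small arrays because the number of combinations grows quickly.

```python
from itertools import combinations
from math import prod

def best_k_subsequence(arr, k):
    """Return the k elements with the largest product, ties broken by largest sum.

    Brute force over all combinations, so suitable for small inputs only.
    """
    best = max(combinations(arr, k), key=lambda c: (prod(c), sum(c)))
    return sorted(best)

print(best_k_subsequence([10, 3, 5, 6, 20], 3))
print(best_k_subsequence([1, -4, 3, -6, 7, 0], 4))
```

This returns [6, 10, 20] and [-6, -4, 3, 7] for Examples 2 and 3. For Example 1, both [-10, -3] and [5, 6] have product 30, so the stated tie-break rule (largest sum) selects [5, 6] in this sketch.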

subtracting every nth array with every nth array

I have an array with the shape (10000,6). For example:
a = np.array([[5, 5, 5, 5, 5, 5], [10, 10, 10, 10, 10], [15, 15, 15, 15, 15], ...])
I want to take every 25th array and subtract its values from itself and from the next 25 arrays, until a new subtraction array is selected. For example, if the first array is:
[10, 10, 10, 10, 10]
then these values should be subtracted from the array itself and from the next 25 arrays, until a new subtraction array is selected, for example:
[2, 2, 2, 2, 2]
Then that array's values should be subtracted from the array itself and from the following 25 arrays.
This means that after the operation every 25th array will be:
[0, 0, 0, 0, 0]
because it has been subtracted by itself.

Here's what I would do:
import numpy as np
arr = np.random.randint(0, 10, (9, 3))
group_size = 3
# select the vectors you want to subtract and copy each of them {group_size} times
selected = arr[::group_size].repeat(group_size, axis = 0)
# subtract selected vectors from all vectors in the group
sub_arr = arr-selected
output:
arr =
[[9 6 3]
[8 3 3]
[2 0 4]
[0 3 9]
[3 9 9]
[0 8 6]
[4 0 0]
[6 1 9]
[2 6 4]]
selected =
[[9 6 3]
[9 6 3]
[9 6 3]
[0 3 9]
[0 3 9]
[0 3 9]
[4 0 0]
[4 0 0]
[4 0 0]]
sub_arr =
[[ 0 0 0]
[-1 -3 0]
[-7 -6 1]
[ 0 0 0]
[ 3 6 0]
[ 0 5 -3]
[ 0 0 0]
[ 2 1 9]
[-2 6 4]]

You can reshape your array so that each chunk has the right number of rows, and then simply subtract the first row of each chunk:
import numpy as np
a = np.arange(10000)[:, None] * np.ones(6)
a = a.reshape(-1, 25, 6)
a -= a[:, 0, :][:, None, :]
a = a.reshape(-1, 6)
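To make the indexing behind both answers explicit, here is a plain-Python sketch without NumPy. It assumes the rows are given as a list of lists and that each group starts at a multiple of group_size; the function name subtract_group_leader is made up for this illustration.

```python
def subtract_group_leader(rows, group_size):
    """Subtract the first row of each group from every row in that group."""
    out = []
    for i, row in enumerate(rows):
        # i - i % group_size is the index of the first row of the group containing row i
        leader = rows[i - i % group_size]
        out.append([v - r for v, r in zip(row, leader)])
    return out

rows = [[9, 6, 3], [8, 3, 3], [2, 0, 4],
        [0, 3, 9], [3, 9, 9], [0, 8, 6]]
for r in subtract_group_leader(rows, 3):
    print(r)
```

Every row whose index is a multiple of group_size comes out as all zeros, because it is subtracted from itself.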

Drop element in numpy array (or pandas series) if difference to previous element is <N

I have a numpy array that looks like that:
a = np.array([0,10,19,20,30,40,42,49,50,51])
I would like to drop all the elements whose difference from the previous element is <= 2, eventually keeping
a_filtered = np.array([0,10,19,30,40,49])
How can I do this in numpy? Optionally, special thanks for how to do this in a pandas series (e.g. drop all rows whose index difference is < N)

IIUC
s=pd.Series(a)
s[~(s.diff()<=2)]
Out[289]:
0 0
1 10
2 19
4 30
5 40
7 49
dtype: int32
s[~(s.diff()<=2)].to_numpy()
Out[292]: array([ 0, 10, 19, 30, 40, 49])

Here you go:
N = 2
s = pd.Series(a)
mask = ~s.diff().le(N)
s[mask]
# you can also do
# a[mask]
Output:
0 0
1 10
2 19
4 30
5 40
7 49
dtype: int32

In NumPy, you can use np.diff and np.insert to handle element 0 specially:
m = np.insert(np.diff(a, 1) > 2, 0, True)
a[m]
Out[526]: array([ 0, 10, 19, 30, 40, 49])
Or use np.roll and set element 0 of the mask to True:
m = (a - np.roll(a, 1)) > 2
m[0] = True
a[m]
Out[534]: array([ 0, 10, 19, 30, 40, 49])
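The rule that all of the masks above encode can also be written as a plain-Python sketch, which may help when checking edge cases by hand. It assumes the first element is always kept and that each element is compared with its immediate predecessor in the original array, not with the last element that was kept; the function name drop_close is made up for this illustration.

```python
def drop_close(values, n):
    """Keep the first element and every element that exceeds its predecessor by more than n."""
    if not values:
        return []
    kept = [values[0]]
    for prev, cur in zip(values, values[1:]):
        if cur - prev > n:
            kept.append(cur)
    return kept

a = [0, 10, 19, 20, 30, 40, 42, 49, 50, 51]
print(drop_close(a, 2))
```

For the array in the question this gives [0, 10, 19, 30, 40, 49], matching a_filtered.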

Frequency of non-increasing and non-decreasing subsequences

Given a sequence of numbers of length L, I need to count how many non-decreasing and non-increasing sub-sequences there are of each exact length. For example, if I have a sequence of length 15
2, 4, 11, 13, 3, 5, 5, 6, 3, 3, 2, 4, 2, 14, 15
I see that non-increasing sub-sequences are
13, 3
6, 3, 3 , 2
4, 2
and non-decreasing sub-sequences are
2, 4, 11, 13
3, 5, 5, 6
2, 4
2, 14, 15
So here I have
2 non-increasing sub-sequences of length 2
1 non-increasing sub-sequence of length 4
1 non-decreasing sub-sequence of length 2
1 non-decreasing sub-sequence of length 3
2 non-decreasing sub-sequences of length 4
Since the maximum length of a non-decreasing (or non-increasing) sub-sequence can be 15 in this case, I thought about representing frequencies through vectors x for non-increasing and y for non-decreasing sub-sequences:
x = (0,2,0,1,0,0,0,0,0,0,0,0,0,0,0)
y = (0,1,1,2,0,0,0,0,0,0,0,0,0,0,0)
Extending this to the general case of a sequence of length L, I want to go through the sequence and, using loops, count the frequencies of sub-sequences of each exact length. How would I do that? I would create zero vectors of length L and add 1 to the l-th element of the relevant vector every time I meet a sub-sequence of length l.
Since my sequence will have a few thousand elements, I wouldn't ask MATLAB to print the whole vectors, only the particular frequencies I need.
Is this a good approach?
Is there some function in Matlab that is doing this?

How about that lovely one-line solution?
%// vector
A = [2, 4, 11, 13, 3, 5, 5, 6, 3, 3, 2, 4, 2, 14, 15]
%// number of digits in output
nout = 15;
seqFreq = @(vec,x) histc(accumarray(cumsum(~(-x*sign([x*1; diff(vec(:))]) + 1 )), ...
vec(:),[],@(x) numel(x)*~all(x == x(1)) ),1:nout).' %'
%// non-increasing sequences -> input +1
x = seqFreq(A,+1)
%// non-decreasing sequences -> input -1
y = seqFreq(A,-1)
x = 0 2 0 1 0 0 0 0 0 0 0 0 0 0 0
y = 0 1 1 2 0 0 0 0 0 0 0 0 0 0 0
Explanation
%// example for non-increasing
q = +1;
%// detect sequences: value = -1
seq = sign([q*1; diff(A(:))]);
%// find subs for accumarray
subs = cumsum(~(-q*seq + 1));
%// count number of elements and check if elements are equal, if not, set count to zero
counts = accumarray(subs,A(:),[],@(p) numel(p)*~all(p == p(1)) );
%// count number of sequences
x = histc(counts,1:nout);

For non-decreasing sequences:
x = [2, 4, 11,13,3,5,5,6,3,3,2,4,2,14,15]; %// data
y = [inf x -inf]; %// terminate data properly
starts = find(diff(y(1:end-1))<0 & diff(y(2:end))>=0);
ends = find(diff(y(1:end-1))>=0 & diff(y(2:end))<0);
result = histc(ends-starts+1, 1:numel(x));
For non-increasing sequences, just change inequalities and sign of infs:
y = [-inf x inf]; %// terminate data properly
starts = find(diff(y(1:end-1))>0 & diff(y(2:end))<=0);
ends = find(diff(y(1:end-1))<=0 & diff(y(2:end))>0);
result = histc(ends-starts+1, 1:numel(x));
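The loop-based approach described in the question can be sketched in Python as follows. This is an illustration under stated assumptions, not a translation of either MATLAB answer: it counts maximal runs of length 2 or more and skips runs whose elements are all equal (such as 3, 3 or 5, 5), since that is what the expected vectors x and y in the question imply. The function name run_length_freq is made up for this sketch.

```python
def run_length_freq(seq, non_decreasing=True):
    """Count maximal monotone runs by length; freq[l - 1] is the number of runs of length l."""
    if non_decreasing:
        in_order = lambda a, b: a <= b
    else:
        in_order = lambda a, b: a >= b
    freq = [0] * len(seq)
    start = 0
    for i in range(1, len(seq) + 1):
        # A run ends at the end of the data or where the ordering is violated
        if i == len(seq) or not in_order(seq[i - 1], seq[i]):
            run = seq[start:i]
            # Ignore single elements and constant runs
            if len(run) >= 2 and any(v != run[0] for v in run):
                freq[len(run) - 1] += 1
            start = i
    return freq

A = [2, 4, 11, 13, 3, 5, 5, 6, 3, 3, 2, 4, 2, 14, 15]
print(run_length_freq(A, non_decreasing=False))
print(run_length_freq(A, non_decreasing=True))
```

For the example sequence this reproduces the vectors x and y given in the question. If constant runs should be counted as well, drop the any(...) condition.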

How to deal with circle degrees in Numpy?

I need to calculate some direction arrays in numpy. I divided 360 degrees into 16 groups, each group covers 22.5 degrees. I want the 0 degree in the middle of a group, i.e., get directions between -11.25 degrees and 11.25 degrees. But the problem is how can I get the group between 168.75 degrees and -168.75 degrees?
a[numpy.where(a<0)] = a[numpy.where(a<0)]+360
for m in range(0,3600,225):
    b = (a*10 > m)-(a*10 >= m+225).astype(float)
    c = numpy.apply_over_axes(numpy.sum,b,0)

If you want to divide data into 16 groups, having 0 degree in the middle, why are you writing for m in range (0,3600,225)?
>>> [x/10. for x in range(0,3600,225)]
[0.0, 22.5, 45.0, 67.5, 90.0, 112.5, 135.0, 157.5, 180.0, 202.5, 225.0, 247.5,
270.0, 292.5, 315.0, 337.5]
## these sectors are not the ones you want!
I would say you should start with for m in range (-1125,36000,2250) (note that now I am using a 100 factor instead of 10), that would give you the groups you want...
wind_sectors = [x/100.0 for x in range(-1125,36000,2250)]
for m in wind_sectors:
    pass  # DO THINGS
I have to say I don't really understand your script and the goal of it...
To deal with circle degrees, I would suggest something like:
a condition, where you put your problematic data, i.e., the one where you have to deal with the transition around zero;
a condition where you put all the other data.
For example, in this case, I am printing all the elements from my array that belong to each sector:
import numpy
def wind_sectors(a_array, nsect = 16):
    step = 360./nsect
    init = step/2
    # upper threshold of each sector in degrees (scaled by 100 so range() gets integers)
    sectores = [x/100.0 for x in range(int(init*100),36000,int(step*100))]
    # map negative angles into the 0-360 range
    a_array[a_array<0] = a_array[a_array<0]+360
    for i, m in enumerate(sectores):
        print('Sector'+str(i)+'(max_threshold = '+str(m)+')')
        if i == 0:
            # sector 0 wraps around zero degrees
            for b in a_array:
                if b <= m or b > sectores[-1]:
                    print(b)
        else:
            for b in a_array:
                if b <= m and b > sectores[i-1]:
                    print(b)
    return "it works!"
# TESTING IF THE FUNCTION IS WORKING:
a = numpy.array([2,67,89,3,245,359,46,342])
print(wind_sectors(a, 16))
# WITH NDARRAYS:
b = numpy.array([[250,31,27,306], [142,54,260,179], [86,93,109,311]])
print(wind_sectors(b.flat[:], 16))
About the flat and reshape functions:
>>> a = numpy.array([[0,1,2,3], [4,5,6,7], [8,9,10,11]])
>>> original = a.shape
>>> b = a.flat[:]
>>> c = b.reshape(original)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> b
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
>>> c
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
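An alternative to special-casing the sector that straddles the wrap-around is to shift every angle by half a sector and use modular arithmetic, so that each angle maps directly to a sector index. This is a different technique from the loop above, sketched in plain Python. It assumes half-open sectors [lower, upper), which differs slightly from the (lower, upper] convention used in wind_sectors; the names sector_index and sector_counts are made up for this illustration.

```python
from collections import Counter

def sector_index(angle_deg, nsect=16):
    """Map an angle in degrees (any sign) to a sector index, with sector 0 centred on 0 degrees."""
    width = 360.0 / nsect
    # Shift by half a sector so that sector 0 covers [-width/2, width/2), then wrap into [0, 360)
    shifted = (angle_deg + width / 2) % 360.0
    return int(shifted // width) % nsect

def sector_counts(angles, nsect=16):
    """Count how many angles fall into each sector."""
    counts = Counter(sector_index(a, nsect) for a in angles)
    return [counts.get(i, 0) for i in range(nsect)]

print(sector_index(-5))     # sector centred on 0 degrees
print(sector_index(175))    # sector centred on 180 degrees
print(sector_index(-175))   # same sector as 175
print(sector_counts([2, 67, 89, 3, 245, 359, 46, 342]))
```

With 16 sectors, the group between 168.75 and -168.75 degrees is simply index 8, and no separate branch is needed for negative angles.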
