Python NumPy: repeating an arange array

Say I do this:
x = np.arange(0, 3)
which gives
array([0, 1, 2])
What can I do, along the lines of
x = np.arange(0, 3)*repeat(N=3)times
to get
array([0, 1, 2, 0, 1, 2, 0, 1, 2])

I've seen several recent questions about resize. It isn't used often, but here's one case where it does just what you want:
In [66]: np.resize(np.arange(3),3*3)
Out[66]: array([0, 1, 2, 0, 1, 2, 0, 1, 2])
There are many other ways of doing this.
In [67]: np.tile(np.arange(3),3)
Out[67]: array([0, 1, 2, 0, 1, 2, 0, 1, 2])
In [68]: (np.arange(3)+np.zeros((3,1),int)).ravel()
Out[68]: array([0, 1, 2, 0, 1, 2, 0, 1, 2])
np.repeat doesn't repeat in the way we want
In [70]: np.repeat(np.arange(3),3)
Out[70]: array([0, 0, 0, 1, 1, 1, 2, 2, 2])
but even that can be reworked (this is a bit advanced):
In [73]: np.repeat(np.arange(3),3).reshape(3,3,order='F').ravel()
Out[73]: array([0, 1, 2, 0, 1, 2, 0, 1, 2])
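To see why the order='F' trick works, it may help to look at the intermediate arrays. The sketch below only re-runs the calls already shown above; the variable names (repeated, grid, result) are illustrative and not part of the original answer.

```python
import numpy as np

# Step 1: repeat each element three times in a row.
repeated = np.repeat(np.arange(3), 3)      # [0 0 0 1 1 1 2 2 2]

# Step 2: fill a 3x3 grid column by column (Fortran order), so each
# column holds one repeated value and each row reads 0, 1, 2.
grid = repeated.reshape(3, 3, order='F')

# Step 3: flatten row by row (the default C order).
result = grid.ravel()

print(grid)
print(result)
```

Filling by columns and reading by rows is what turns "each element repeated" into "the whole sequence repeated".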

EDIT: Refer to hpaulj's answer. It is frankly better.
The simplest way is to convert back into a list and use:
list(np.arange(0,3))*3
Which gives:
>> [0, 1, 2, 0, 1, 2, 0, 1, 2]
Or if you want it as a numpy array:
np.array(list(np.arange(0,3))*3)
Which gives:
>> array([0, 1, 2, 0, 1, 2, 0, 1, 2])

how about this one?
arr = np.arange(3)
res = np.hstack((arr, ) * 3)
Output
array([0, 1, 2, 0, 1, 2, 0, 1, 2])
Not much overhead I would say.

Related

resize array while keeping mask

I'm trying to figure out how to efficiently resize a 1-d array while keeping the mask it represents. I use this array to draw simple sprites, where each value in the array represents a specific color.
Anyway my goal is as follows, having the following "small" array with values:
0, 1, 2, 3,
0, 1, 2, 2,
0, 1, 1, 1,
0, 0, 1, 1,
0, 0, 0, 0
This obviously is going to be a sprite of size 4x5.
Now I want to resize it while keeping the values, so that I get the same sprite/shape but at a higher resolution.
With "scale-by-2" I would get an 8x10 sprite, and the 1-d array should then look as follows:
0, 0, 1, 1, 2, 2, 3, 3,
0, 0, 1, 1, 2, 2, 3, 3,
0, 0, 1, 1, 2, 2, 2, 2,
0, 0, 1, 1, 2, 2, 2, 2,
0, 0, 1, 1, 1, 1, 1, 1,
0, 0, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 1, 1, 1, 1,
0, 0, 0, 0, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0
My idea is to group the numbers row by row, then repeat each value within a row as many times as the scale factor (2), and finally duplicate each row by the scale factor as well. But I am still not sure whether this covers all cases.
Is there any other (more efficient) way to handle this?
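The row-by-row idea described above can be sketched with NumPy, assuming NumPy is acceptable here (the question does not name a language). The function name upscale and its parameters are hypothetical; the only library calls used are reshape, repeat and ravel.

```python
import numpy as np

def upscale(flat, width, scale):
    """Nearest-neighbour upscale of a flat (1-d) sprite by an integer factor."""
    grid = np.asarray(flat).reshape(-1, width)   # back to rows x columns
    # Duplicate every row, then duplicate every column.
    big = grid.repeat(scale, axis=0).repeat(scale, axis=1)
    return big.ravel()                           # flat again

sprite = [0, 1, 2, 3,
          0, 1, 2, 2,
          0, 1, 1, 1,
          0, 0, 1, 1,
          0, 0, 0, 0]

scaled = upscale(sprite, width=4, scale=2)
print(scaled.reshape(-1, 8))
```

Because every cell is simply copied into a scale x scale block, this should cover any integer scale factor; non-integer factors would need a different approach.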

python: vectorized cumulative counting

I have a numpy array and would like to count the number of occurrences of each value, but in a cumulative way:
in = [0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0, ...]
out = [0, 0, 1, 1, 0, 0, 2, 3, 1, 2, 3, 1, 2, 4, ...]
I'm wondering if it is best to create a (sparse) matrix with ones at col = i and row = in[i]
1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0
0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0
0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0
Then we could compute the cumsums along the rows and extract the numbers from the locations where the cumsums increment.
However, if we cumsum a sparse matrix, doesn't it become dense? Is there an efficient way of doing it?
Here's one vectorized approach using sorting -
def cumcount(a):
    # Store length of array
    n = len(a)
    # Get sorted indices (use later on too) and store the sorted array.
    # A stable sort keeps equal values in their original order, which the
    # running counts rely on.
    sidx = a.argsort(kind='stable')
    b = a[sidx]
    # Mask of shifts/groups
    m = b[1:] != b[:-1]
    # Get indices of those shifts
    idx = np.flatnonzero(m)
    # With at most one distinct value there are no shifts to index
    if idx.size == 0:
        return np.arange(n)
    # ID array that will store the cumulative nature at the very end
    id_arr = np.ones(n, dtype=int)
    id_arr[idx[1:]+1] = -np.diff(idx)+1
    id_arr[idx[0]+1] = -idx[0]
    id_arr[0] = 0
    c = id_arr.cumsum()
    # Finally re-arrange those cumulative values back to original order
    out = np.empty(n, dtype=int)
    out[sidx] = c
    return out
Sample run -
In [66]: a
Out[66]: array([0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0])
In [67]: cumcount(a)
Out[67]: array([0, 0, 1, 1, 0, 0, 2, 3, 1, 2, 3, 1, 2, 4])
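For checking a vectorized version on small inputs, a plain-Python reference can be handy. This is only a sketch and not part of the original answer; the name cumcount_reference is hypothetical, and it trades speed for readability.

```python
from collections import defaultdict

def cumcount_reference(values):
    """Return, for each element, how many times its value was seen before."""
    seen = defaultdict(int)   # value -> number of occurrences so far
    out = []
    for v in values:
        out.append(seen[v])
        seen[v] += 1
    return out

sample = [0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0]
print(cumcount_reference(sample))
```

On the sample input from the question this reproduces the expected out list.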

Numpy Number Patterns

Is there a function in NumPy that allows you to take 4 records at a time and see where they match a second dataset? Once there is a match, move on to the next 4 records of the first dataset. It won't always be every 4 records, but I am using this as an example.
So if dataset one had - 1,5,7,8,10,12,6,1,3,6,8,9
And the second dataset had - 1,5,7,8,11,15,6,1,3,6,10,6
My result will be: 1,5,7,8, 6,1,3,6
POST EDIT:
My second example datasets:
import numpy as np
a = np.array([15,15,0,0,10,10,0,0,2,1,8,8,42,2,4,4,3,1,1,3,5,6,0,9,47,1,1,7,7,0,0,45,12,17,45])
b = np.array([6,0,0,15,15,0,0,10,10,0,0,2,1,8,8,42,2,4,4,3,3,4,6,0,9,47,1,1,7,7,0,0,45,12,16,1,9,3,30])
Here's another snapshot of an example:
Thank you in advance for looking at my question!!
Update: for the more difficult and more interesting alignment problem it is probably best not to reinvent the wheel but to rely on python's difflib:
from difflib import SequenceMatcher
import numpy as np
k = 4
a = np.array([15,15,0,0,10,10,0,0,2,1,8,8,42,2,4,4,3,1,1,3,5,6,0,9,47,1,1,7,7,0,0,45,12,17,45])
b = np.array([6,0,0,15,15,0,0,10,10,0,0,2,1,8,8,42,2,4,4,3,3,4,6,0,9,47,1,1,7,7,0,0,45,12,16,1,9,3,30])
sm = SequenceMatcher(a=a, b=b)
matches = sm.get_matching_blocks()
matches = [m for m in matches if m.size >= k]
# [Match(a=0, b=3, size=17), Match(a=21, b=22, size=12)]
consensus = [a[m.a:m.a+m.size] for m in matches]
# [array([15, 15, 0, 0, 10, 10, 0, 0, 2, 1, 8, 8, 42, 2, 4, 4, 3]), array([ 6, 0, 9, 47, 1, 1, 7, 7, 0, 0, 45, 12])]
consfour = [a[m.a:m.a + m.size // k * k] for m in matches]
# [array([15, 15, 0, 0, 10, 10, 0, 0, 2, 1, 8, 8, 42, 2, 4, 4]), array([ 6, 0, 9, 47, 1, 1, 7, 7, 0, 0, 45, 12])]
summary = [np.c_[np.add.outer(np.arange(m.size // k * k), (m.a, m.b)), c]
           for m, c in zip(matches, consfour)]
merge = np.concatenate(summary, axis=0)
Below is my original solution assuming already aligned and same-length arrays:
Here is a hybrid solution using numpy to find consecutive matches and cutting them out and then list comp to apply length constraints:
import numpy as np
d1 = np.array([7,1,5,7,8,0,6,9,0,10,12,6,1,3,6,8,9])
d2 = np.array([8,1,5,7,8,0,6,9,0,11,15,6,1,3,6,10,6])
k = 4
# find matches
m = d1 == d2
# find switches between match, no match
sw = np.where(m[:-1] != m[1:])[0] + 1
# split
mnm = np.split(d1, sw)
# select matches
ones_ = mnm[1-m[0]::2]
# apply length constraint
res = [blck[i:i+k] for blck in ones_ for i in range(len(blck)-k+1)]
# [array([1, 5, 7, 8]), array([5, 7, 8, 0]), array([7, 8, 0, 6]), array([8, 0, 6, 9]), array([0, 6, 9, 0]), array([6, 1, 3, 6])]
res_no_ovlp = [blck[k*i:k*i+k] for blck in ones_ for i in range(len(blck)//k)]
# [array([1, 5, 7, 8]), array([0, 6, 9, 0]), array([6, 1, 3, 6])]
You can use matrix masking like,
import numpy as np
from scipy.sparse import dia_matrix
a = np.array([1,5,7,8,10,12,6,1,3,6,8,9])
b = np.array([1,5,7,8,11,15,6,1,3,6,10,6])
mask = dia_matrix((np.ones((1, a.size)).repeat(4, axis=0), np.arange(4)),
                  shape=(a.size, b.size), dtype=int)
print(mask.toarray())
matches = a[mask.T.dot(mask.dot(a == b) == 4).astype(bool)]
print(matches)
This will output,
array([[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])
[1 5 7 8 6 1 3 6]
You can think about how the matrix multiplication works to get this result.
Scaling
For scaling, I tested with 1e3, 1e5, and 1e7 elements and got,
1e3 - 0.019184964010491967
1e5 - 0.4330314120161347
1e7 - 144.54082221200224
See the gist. Not sure why such a hard jump at 1e7 elements.
This is an exercise in list comprehension. We have the data
data = [1,5,7,8,10,12,6,1,3,6,8,9]
search_data = [1,5,7,8,11,15,6,1,3,6,10,6]
First we can chunk the original data into blocks of length n
n = 4
chunks = [data[i:i + n] for i in range(len(data) - n + 1)]
search_chunks = [search_data[i:i + n] for i in range(len(search_data) - n + 1)]
Now we must select chunks from the first list that appear in the second list
hits = [c for c in chunks if c in search_chunks]
print(hits)
# [[1, 5, 7, 8], [6, 1, 3, 6]]
This may not be the optimal solution for long lists. It may improve performance to use sets if there are likely to be repeated chunks
chunks = set(tuple(data[i:i + n]) for i in range(len(data) - n + 1))
search_chunks = set(tuple(search_data[i:i + n]) for i in range(len(search_data) - n + 1))
This can be quite competitive with the numpy solution above, e.g.
import numpy as np
import time
# Generate data (lists, so they can be sliced repeatedly)
len_ = 10000
max_ = 10
data = list(map(int, np.random.rand(len_) * max_))
search_data = list(map(int, np.random.rand(len_) * max_))
# Time list comprehension
start = time.time()
n = 4
chunks = set(tuple(data[i:i + n]) for i in range(len(data) - n + 1))
search_chunks = set(tuple(search_data[i:i + n]) for i in range(len(search_data) - n + 1))
hits = [c for c in chunks if c in search_chunks]
print(time.time() - start)
# Time numpy
a = np.array(data)
b = np.array(search_data)
mask = 1 * (np.abs(np.arange(a.size).reshape((-1, 1)) - np.arange(a.size) - 0.5) < 2)
start = time.time()
matches = a[mask.T.dot(mask.dot(a == b) == 4).astype(bool)]
print(time.time() - start)
It's typically faster here, but it depends on number of repeated chunks etc.

Create an array containing values oscillating between two boundaries

I am wondering if there is a way to generate an array which, for example, would start from 0, increase by 1 until it reaches 3, and then decreases by 1 until it reaches 0 again, eg
[0,1,2,3,2,1,0]
and if I could specify the number of values in the array ahead of time, that would be great. For example, if I could set the lower bound(0), upper bound (3), increment(1), and length of array (9):
[].oscillate(0,3,1,9) would give me this:
[0,1,2,3,2,1,0,1,2]
As of now, the best thing I can come up with is this:
values = []
until values.count >= 9
values.pop
x=0
values << x && x+=1 while x < 3
values << x && x-=1 while x >= 0
end
Fun exercise!
You're looking for a triangle wave.
The formulas on Wikipedia are for the standard shape (between -1 and 1), but here's an adapted version for any wave position, period and amplitude :
def triangle_wave(min, max, increment, length, offset = 0)
  amplitude = max - min
  period = 2 * amplitude
  Array.new(length) do |i|
    min + ((increment * (i + offset) - amplitude) % period - amplitude).abs
  end
end
puts triangle_wave(0, 3, 1, 9) == [0, 1, 2, 3, 2, 1, 0, 1, 2]
# true
p triangle_wave(-3, 3, 1, 20)
# => [-3, -2, -1, 0, 1, 2, 3, 2, 1, 0, -1, -2, -3, -2, -1, 0, 1, 2, 3, 2]
p triangle_wave(5, 9, 2, 9)
# => [5, 7, 9, 7, 5, 7, 9, 7, 5]
p triangle_wave(0, 1, 0.25, 9)
# => [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25, 0.0]
p triangle_wave(-3, 0, 1, 9, 3)
# => [0, -1, -2, -3, -2, -1, 0, -1, -2]
p triangle_wave(0, 1, 1, 9)
# => [0, 1, 0, 1, 0, 1, 0, 1, 0]
min should be lower than max, increment should be positive, and max - min should be divisible by increment. Those are restrictions on the input, not on the output: any wave can be generated.
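For readers working in NumPy, the same formula can be sketched in vectorized form. This is a translation of the Ruby method above under the same input restrictions, not something from the original answer; the parameter names lo and hi stand in for min and max to avoid shadowing Python builtins.

```python
import numpy as np

def triangle_wave(lo, hi, increment, length, offset=0):
    """Triangle wave between lo and hi, sampled at `length` points."""
    amplitude = hi - lo
    period = 2 * amplitude
    i = np.arange(length) + offset
    # % follows the sign of the divisor in NumPy, as it does in Ruby,
    # so the result of the modulo is always in [0, period).
    return lo + np.abs((increment * i - amplitude) % period - amplitude)

print(triangle_wave(0, 3, 1, 9))
print(triangle_wave(5, 9, 2, 9))
```

The two calls correspond to the first and third Ruby examples above.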
This problem could be a textbook example of the use of Ruby's flip-flop operator.
As the question only makes sense when there is a non-negative integer steps such that high = low + steps * increment, I've replaced the method's argument high with steps.
def oscillate(low, steps, increment, length)
  high = low + steps * increment
  n = low
  length.times.each_with_object([]) do |_,a|
    a << n
    n += (n==low)..(n==high-increment) ? increment : -increment
  end
end
oscillate(0,3,1,9)
#=> [0, 1, 2, 3, 2, 1, 0, 1, 2]
oscillate(-1, 4, 2, 16)
#=> [-1, 1, 3, 5, 7, 5, 3, 1, -1, 1, 3, 5, 7, 5, 3, 1]
To show what's happening here I will modify the code a little and add some puts statements, then run it with the first example.
def oscillate(low, steps, increment, length)
  high = low + steps * increment
  puts "high = #{high}"
  n = low
  length.times.each_with_object([]) do |_,a|
    a << n
    diff = (n==low)..(n==high-increment) ? increment : -increment
    print "n=#{n}, a<<n=#{a}, diff=#{diff}, "
    n += diff
    puts "n+=diff=#{n}"
  end
end
oscillate(0,3,1,9)
high = 3
n=0, a<<n=[0], diff= 1, n+=diff=1
n=1, a<<n=[0, 1], diff= 1, n+=diff=2
n=2, a<<n=[0, 1, 2], diff= 1, n+=diff=3
n=3, a<<n=[0, 1, 2, 3], diff=-1, n+=diff=2
n=2, a<<n=[0, 1, 2, 3, 2], diff=-1, n+=diff=1
n=1, a<<n=[0, 1, 2, 3, 2, 1], diff=-1, n+=diff=0
n=0, a<<n=[0, 1, 2, 3, 2, 1, 0], diff= 1, n+=diff=1
n=1, a<<n=[0, 1, 2, 3, 2, 1, 0, 1], diff= 1, n+=diff=2
n=2, a<<n=[0, 1, 2, 3, 2, 1, 0, 1, 2], diff= 1, n+=diff=3
#=> [0, 1, 2, 3, 2, 1, 0, 1, 2]
Try this
def oscillate(a, b, step, num)
  ramp_up = a.step(b, step).entries
  ramp_down = ramp_up.drop(1).reverse.drop(1)
  ramp_up.concat(ramp_down).cycle.take(num)
end
How does this work?
It creates the ramp_up and ramp_down arrays.
It concatenates the two arrays.
cycle returns an ever-repeating enumerator.
take materializes num elements from that enumerator. Contrary to what a comment suggested, this does not recalculate anything; it just materializes entries from the enumerator.
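The same ramp-up, ramp-down, cycle and take steps can be sketched in Python with itertools. This is an illustrative translation rather than part of the original answer, and it assumes integer bounds and step because it relies on range.

```python
from itertools import cycle, islice

def oscillate(a, b, step, num):
    """Cycle a, a+step, ..., b, ..., a+step and return the first num values."""
    ramp_up = list(range(a, b + 1, step))
    # Walk back down, leaving out both endpoints so they are not doubled.
    ramp_down = ramp_up[-2:0:-1]
    # cycle repeats the combined ramp lazily; islice takes num items from it.
    return list(islice(cycle(ramp_up + ramp_down), num))

print(oscillate(0, 3, 1, 9))
```

As in the Ruby version, nothing is recomputed per element: the ramp is built once and then replayed.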

How to create a random mask array?

I've an array with 128 values, each value is 1:
length = 128
partials = Array.new length
partials.each_index do |i|
partials[i] = 1
end
I want to set the value to 0 at some random positions (for example, positions 1, 6, 50, 70, 100, 112, 120).
Of course, the number of positions could be different every time, and if I choose 7 positions, I want to end up with exactly 7 distinct positions changed.
What's the fastest way to do this in Ruby?
Assuming you want to have n elements with value 0, you can do the below:
n = 5
partials[0,n] = [0]*n
partials.shuffle
Alternatively, can also be written as:
partials.tap{|p| p[0,n] = [0]*n}.shuffle
You can incorporate the zeros into the array creation:
length = 128
zeros = 7
partials = Array.new(length) { |i| i < zeros ? 0 : 1 }.shuffle
#=> [1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
A way:
array = 128.times.map{1}
Or with randomly sprayed 0s:
array = 128.times.map{rand(2)}
or put a number of 0s in later (positions may repeat, so this can produce fewer than 10 distinct zeros):
10.times{array[rand(128)]=0}
etc... Play with it and see what you need
Another alternative:
length = 10
zeros = 2
([1]*(length-zeros)+[0]*zeros).shuffle
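For comparison with the NumPy threads above, here is a sketch of the same mask in Python. It is not from the original answers; the function name random_mask and its rng parameter are hypothetical. Sampling positions without replacement guarantees that exactly the requested number of distinct positions is zeroed.

```python
import numpy as np

def random_mask(length, zeros, rng=None):
    """Array of ones with exactly `zeros` distinct random positions set to 0."""
    rng = np.random.default_rng() if rng is None else rng
    mask = np.ones(length, dtype=int)
    # replace=False means no position can be drawn twice.
    positions = rng.choice(length, size=zeros, replace=False)
    mask[positions] = 0
    return mask

partials = random_mask(128, 7)
print(partials)
```

Passing a seeded generator, for example np.random.default_rng(0), makes the result reproducible.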
