I need to calculate some direction arrays in numpy. I divided 360 degrees into 16 groups, each group covers 22.5 degrees. I want the 0 degree in the middle of a group, i.e., get directions between -11.25 degrees and 11.25 degrees. But the problem is how can I get the group between 168.75 degrees and -168.75 degrees?
a[numpy.where(a<0)] = a[numpy.where(a<0)]+360
for m in range (0,3600,225):
b = (a*10 > m)-(a*10 >= m+225).astype(float)
c = numpy.apply_over_axes(numpy.sum,b,0)
If you want to divide data into 16 groups, having 0 degree in the middle, why are you writing for m in range (0,3600,225)?
>>> [x/10. for x in range(0,3600,225)]
[0.0, 22.5, 45.0, 67.5, 90.0, 112.5, 135.0, 157.5, 180.0, 202.5, 225.0, 247.5,
270.0, 292.5, 315.0, 337.5]
## these sectors are not the ones you want!
I would say you should start with for m in range (-1125,36000,2250) (note that now I am using a 100 factor instead of 10), that would give you the groups you want...
wind_sectors = [x/100.0 for x in range(-1125,36000,2250)]
for m in wind_sectors:
    pass  # DO THINGS
I have to say I don't really understand your script and the goal of it...
To deal with circle degrees, I would suggest something like:
a condition, where you put your problematic data, i.e., the one where you have to deal with the transition around zero;
a condition where you put all the other data.
For example, in this case, I am printing all the elements from my array that belong to each sector:
import numpy

def wind_sectors(a_array, nsect=16):
    step = 360. / nsect
    init = step / 2
    # upper threshold of each sector: 11.25, 33.75, ..., 348.75
    sectores = [x / 100.0 for x in range(int(init * 100), 36000, int(step * 100))]
    # map negative angles into [0, 360)
    a_array[a_array < 0] = a_array[a_array < 0] + 360
    for i, m in enumerate(sectores):
        print('Sector' + str(i) + '(max_threshold = ' + str(m) + ')')
        if i == 0:
            # sector 0 wraps around zero: above the last threshold or up to the first one
            for b in a_array:
                if b <= m or b > sectores[-1]:
                    print(b)
        else:
            for b in a_array:
                if b <= m and b > sectores[i - 1]:
                    print(b)
    return "it works!"

# TESTING IF THE FUNCTION IS WORKING:
a = numpy.array([2,67,89,3,245,359,46,342])
print(wind_sectors(a, 16))
# WITH NDARRAYS:
b = numpy.array([[250,31,27,306], [142,54,260,179], [86,93,109,311]])
print(wind_sectors(b.flat[:], 16))
About the flat and reshape functions:
>>> a = numpy.array([[0,1,2,3], [4,5,6,7], [8,9,10,11]])
>>> original = a.shape
>>> b = a.flat[:]
>>> c = b.reshape(original)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> b
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
>>> c
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
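The sector lookup can also be done without Python loops. The following is a minimal sketch, not the code from the answer above: the function name sector_index is made up for illustration, and it assumes half-open sectors [lower, upper), whereas the loop version uses (lower, upper]. Shifting every angle by half a sector before taking the modulo puts 0 degrees in the middle of sector 0, so the group between 168.75 and -168.75 degrees simply becomes sector 8.

```python
import numpy as np

def sector_index(angles_deg, nsect=16):
    """Return the sector number (0..nsect-1) for each angle in degrees.

    Sector 0 is centred on 0 degrees, i.e. it covers [-step/2, step/2).
    Hypothetical helper; negative angles are handled by the modulo.
    """
    step = 360.0 / nsect
    # Shift by half a sector so that sector boundaries fall on multiples of step.
    shifted = (np.asarray(angles_deg, dtype=float) + step / 2.0) % 360.0
    # The final modulo guards against a float result that rounds up to exactly 360.
    return np.floor(shifted / step).astype(int) % nsect

angles = np.array([2, 67, 89, 3, 245, 359, 46, 342, -170, 180])
sectors = sector_index(angles)
# Number of values falling in each of the 16 sectors.
counts = np.bincount(sectors, minlength=16)
print(sectors)
print(counts)
```

Because the function works elementwise, it accepts arrays of any shape; call .ravel() on the result before np.bincount if the input is multi-dimensional.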
The question itself is language-agnostic. I will use python for my example, mainly because I think it is nice to demonstrate the point.
I have an N-dimensional array of shape (n1, n2, ..., nN) that is contiguous in memory (C-order) and filled with numbers. Along each dimension, the numbers are in ascending order. A 2D example of such an array is:
>>> import numpy as np
>>> n1 = np.arange(5)[:, None]
>>> n2 = np.arange(7)[None, :]
>>> n1+n2
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 1, 2, 3, 4, 5, 6, 7],
[ 2, 3, 4, 5, 6, 7, 8],
[ 3, 4, 5, 6, 7, 8, 9],
[ 4, 5, 6, 7, 8, 9, 10]])
In this case, the values in each row are ascending, and the values in each column are ascending, too. A 1D example array is
>>> n1 = np.arange(10)
>>> n1*n1
array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
I would like to obtain a list/array containing the indices that would sort the flattened version of the nD array in ascending order. By the flattened array I mean that I interpret the nD array as a 1D array of equivalent size. The sort doesn't have to be stable, i.e., the order of indices pointing at equal numbers doesn't matter. For example:
>>> n1 = np.arange(5)[:, None]
>>> n2 = np.arange(7)[None, :]
>>> arr = n1*n2
>>> arr
array([[ 0, 0, 0, 0, 0, 0, 0],
[ 0, 1, 2, 3, 4, 5, 6],
[ 0, 2, 4, 6, 8, 10, 12],
[ 0, 3, 6, 9, 12, 15, 18],
[ 0, 4, 8, 12, 16, 20, 24]])
>>> np.argsort(arr.ravel())
array([ 0, 28, 14, 7, 6, 21, 4, 3, 2, 1, 5, 8, 9, 15, 22, 10, 11,
29, 16, 12, 23, 17, 13, 18, 30, 24, 19, 25, 31, 20, 26, 32, 27, 33,
34], dtype=int64)
Standard sorting on the flattened array can accomplish this; however, it doesn't exploit the fact that the array is already partially sorted, so I suspect there exists a more efficient solution. What is the most efficient way to do so?
A comment asked what my use-case is, and if I could provide some more realistic test data for benchmarking. Here is how I encountered this problem:
Given an image and a binary mask for that image (which selects pixels), find the largest sub-image which contains only selected pixels.
In my case, I applied a perspective transformation to an image, and want to crop it so that there is no black background while preserving as much of the image as possible.
from skimage import data
from skimage import transform
from skimage import img_as_float
tform = transform.EuclideanTransform(
rotation=np.pi / 12.,
translation = (10, -10)
)
img = img_as_float(data.chelsea())[50:100, 150:200]
tf_img = transform.warp(img, tform.inverse)
tf_mask = transform.warp(np.ones_like(img), tform.inverse)[..., 0]
y = np.arange(tf_mask.shape[0])
x = np.arange(tf_mask.shape[1])
y1 = y[:, None, None, None]
y2 = y[None, None, :, None]
x1 = x[None, :, None, None]
x2 = x[None, None, None, :]
y_padded, x_padded = np.where(tf_mask==0.0)
y_padded = y_padded[None, None, None, None, :]
x_padded = x_padded[None, None, None, None, :]
y_inside = np.logical_and(y1[..., None] <= y_padded, y_padded<= y2[..., None])
x_inside = np.logical_and(x1[..., None] <= x_padded, x_padded<= x2[..., None])
contains_padding = np.any(np.logical_and(y_inside, x_inside), axis=-1)
# size of the sub-image
height = np.clip(y2 - y1 + 1, 0, None)
width = np.clip(x2 - x1 + 1, 0, None)
img_size = width * height
# find all largest sub-images
img_size[contains_padding] = 0
y_low, x_low, y_high, x_high = np.where(img_size == np.max(img_size))
cropped_img = tf_img[y_low[0]:y_high[0]+1, x_low[0]:x_high[0]+1]
The algorithm is quite inefficient; I am aware. What is interesting for this question is img_size, which is a (50,50,50,50) 4D-array that is ordered as described above. Currently I do:
img_size[contains_padding] = 0
y_low, x_low, y_high, x_high = np.where(img_size == np.max(img_size))
but with a proper argsort algorithm (that I can interrupt early) this could potentially be made much better.
I would do it using parts of mergesort and a divide and conquer approach.
You start with the first two arrays.
[0, 1, 2, 3, 4, 5, 6],//<- This
[ 1, 2, 3, 4, 5, 6, 7],//<- This
....
Then you can merge them like this (Java-like syntax):
List<Integer> merged=new ArrayList<>();
List<Integer> firstRow=... //Same would work with arrays
List<Integer> secondRow=...
int firstCnter=0;
int secondCnter=0;
while(firstCnter<firstRow.size()||secondCnter<secondRow.size()){
    if(firstCnter==firstRow.size()){ //Unconditionally add all elements from the second, if we added all the elements from the first
        merged.add(secondRow.get(secondCnter++));
    }else if(secondCnter==secondRow.size()){
        merged.add(firstRow.get(firstCnter++));
    }else{ //Add the smaller value from both lists at the current index.
        int firstValue=firstRow.get(firstCnter);
        int secondValue=secondRow.get(secondCnter);
        merged.add(Math.min(firstValue,secondValue));
        if(firstValue<=secondValue)
            firstCnter++;
        else
            secondCnter++;
    }
}
After that you can merge the next two rows, until you have:
[0,1,1,2,2,3,3,4,4,5,5,6,6,7]
[2,3,3,4,4,5,5,6,6,7,7,8,8,9]
[4,5,6,7,8,9,10] //Not merged.
Continue to merge again.
[0,1,1,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,8,8,9]
[4,5,6,7,8,9,10]
After that, the last merge:
[0,1,1,2,2,2,3,3,3,3,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6,7,7,7,7,8,8,8,9,9,10]
I don't know the time complexity, but it should be a viable solution.
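For the NumPy setting of the question, the row-merging idea can be sketched in Python. This is an illustration under assumptions, not the answer's own code: it swaps the pairwise merge rounds for a single k-way merge via the standard library's heapq.merge, it carries flat indices along with the values so the result is an argsort, and the function name argsort_by_row_merge is made up here. It only relies on the last axis being sorted.

```python
import heapq
import numpy as np

def argsort_by_row_merge(arr):
    """Return flat indices that sort `arr`, merging its already-sorted rows.

    Assumes the values are non-decreasing along the last axis.
    """
    rows = arr.reshape(-1, arr.shape[-1])
    ncols = rows.shape[1]
    # One stream of (value, flat_index) pairs per row; each stream is sorted
    # because values are non-decreasing and flat indices increase along a row.
    streams = [
        zip(row.tolist(), range(r * ncols, (r + 1) * ncols))
        for r, row in enumerate(rows)
    ]
    # heapq.merge lazily merges any number of sorted inputs.
    return [flat for _, flat in heapq.merge(*streams)]

arr = np.arange(5)[:, None] * np.arange(7)[None, :]
order = argsort_by_row_merge(arr)
print(arr.ravel()[order])
```

With k rows and N elements in total, a k-way heap merge performs O(N log k) comparisons, compared with O(N log N) for a general sort. In pure Python this will still be slower than np.argsort for small arrays, so treat it as a description of the algorithm rather than a drop-in speedup.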
Another idea: Use a min-heap with just the current must-have candidates for being the next-smallest value. Start with the value at the origin (index 0 in all dimensions), as that's smallest. Then repeatedly take out the smallest value from the heap and add its neighbors not yet added.
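A minimal sketch of this heap idea, assuming the array is non-empty and non-decreasing along every axis. The name sorted_indices and the choice to expand a cell's successors (index + 1 along each axis) are illustrative assumptions. Because it is a generator, the caller can stop as soon as enough indices have been produced.

```python
import heapq
import itertools
import numpy as np

def sorted_indices(arr):
    """Yield flat indices of `arr` in ascending order of value.

    Assumes `arr` is non-empty and non-decreasing along every axis.
    """
    start = (0,) * arr.ndim
    heap = [(arr[start], start)]   # candidates for the next-smallest value
    seen = {start}                 # indices that have ever been pushed
    while heap:
        _, idx = heapq.heappop(heap)
        yield int(np.ravel_multi_index(idx, arr.shape))
        # Push the successor along each axis, once per cell.
        for axis in range(arr.ndim):
            if idx[axis] + 1 < arr.shape[axis]:
                nxt = idx[:axis] + (idx[axis] + 1,) + idx[axis + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (arr[nxt], nxt))

arr = np.arange(5)[:, None] * np.arange(7)[None, :]
full_order = list(sorted_indices(arr))
# Early interruption: take only the first three indices.
first_three = list(itertools.islice(sorted_indices(arr), 3))
print(first_three)
```

This is correct because every cell that has not been yielded yet either sits in the heap or has a predecessor chain leading back to a cell in the heap with a value that is no larger. For the use case in the question, which needs the largest entries first, the same scheme could be run from the opposite corner with negated values.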
I have the task of selecting p% of elements within a given numpy array. For example,
# Initialize 5 x 3 array-
x = np.random.randint(low = -10, high = 10, size = (5, 3))
x
'''
array([[-4, -8, 3],
[-9, -1, 5],
[ 9, 1, 1],
[-1, -1, -5],
[-1, -4, -1]])
'''
Now, I want to select say p = 30% of the numbers in x, so 30% of numbers in x is 5 (rounded up).
Is there a way to select these 30% of numbers in x? Where p can change and the dimensionality of numpy array x can be 3-D or maybe more.
I am using Python 3.7 and numpy 1.18.1
Thanks
You can use np.random.choice to sample without replacement from a 1d numpy array:
p = 0.3
np.random.choice(x.flatten(), int(x.size * p) , replace=False)
For large arrays, the performance of sampling without replacement can be pretty bad, but there are some workarounds.
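Note that int(x.size * p) truncates, giving 4 elements for the 5 x 3 example, while the question rounds up to 5. A sketch that rounds up and works for arrays of any dimensionality follows. The helper name sample_fraction and the seed parameter are assumptions made for illustration; it draws flat positions with the Generator API (available since NumPy 1.17) and converts them back to N-d indices, so the positions are available as well as the values.

```python
import math
import numpy as np

def sample_fraction(x, p, seed=None):
    """Pick ceil(p * x.size) distinct elements of `x` uniformly at random.

    Returns the selected values and the index tuple that selects them.
    """
    rng = np.random.default_rng(seed)
    k = math.ceil(x.size * p)
    # Sample distinct flat positions, then map them back to N-d indices.
    flat_idx = rng.choice(x.size, size=k, replace=False)
    idx = np.unravel_index(flat_idx, x.shape)
    return x[idx], idx

x = np.array([[-4, -8, 3], [-9, -1, 5], [9, 1, 1], [-1, -1, -5], [-1, -4, -1]])
values, idx = sample_fraction(x, 0.3, seed=0)
print(values.size)  # 5 elements for p = 0.3 on a 15-element array
```

Sampling positions rather than values also makes it easy to modify the selected entries in place, for example x[idx] = 0.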
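You can randomly choose 0 or 1 for each position and use np.nonzero for indexing. Each element is kept independently with probability 0.3, so the number of elements selected is only approximately 30%: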
np.random.seed(1)
x[np.nonzero(np.random.choice([1, 0], size=x.shape, p=[0.3,0.7]))]
Output:
array([ 3, -1, 5, 9, -1, -1])
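I found a way of selecting the p% of elements with the smallest values, using a percentile threshold (x_abs is not defined in this post; presumably it holds the absolute values of x, e.g. np.abs(x)):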
p = 20
# To select p% of elements-
x_abs[x_abs < np.percentile(x_abs, p)]
# To select p% of elements and set them to a value (in this case, zero)-
x_abs[x_abs < np.percentile(x_abs, p)] = 0
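With repeated values, a percentile threshold can select fewer or more than p% of the elements, since everything equal to the threshold is either all in or all out. If an exact count matters, one option is np.argpartition. The sketch below is an assumption-laden illustration, not the method from the post: the name smallest_fraction_mask is made up, p is taken as a percentage between 0 and 100, and magnitudes are compared via np.abs.

```python
import math
import numpy as np

def smallest_fraction_mask(x, p):
    """Boolean mask marking the ceil(p%) elements of `x` with smallest magnitude."""
    k = math.ceil(x.size * p / 100.0)
    flat = np.abs(x).ravel()
    mask = np.zeros(flat.size, dtype=bool)
    if k > 0:
        # argpartition places the indices of the k smallest values first.
        mask[np.argpartition(flat, k - 1)[:k]] = True
    return mask.reshape(x.shape)

x = np.array([[-4, -8, 3], [-9, -1, 5], [9, 1, 1], [-1, -1, -5], [-1, -4, -1]])
mask = smallest_fraction_mask(x, 20)
print(mask.sum())  # 3 elements for p = 20 on a 15-element array
```

The mask has the same shape as x, so x[mask] = 0 sets the selected elements to zero in the same way as the percentile version.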
I have an array (not sorted) of N elements. I'd like to keep the original order of N, but instead of the actual elements, I'd like them to have their bin numbers, where N is split into m bins of equal (if N is divisible by m) or nearly equal (N not divisible by m) values. I need a vectorized solution (since N is fairly large, so standard python methods won't be efficient). Is there anything in scipy or numpy that can do this?
e.g.
N = [0.2, 1.5, 0.3, 1.7, 0.5]
m = 2
Desired output: [0, 1, 0, 1, 0]
I've looked at numpy.histogram, but it doesn't give me unequally spaced bins.
Listed in this post is a NumPy based vectorized approach with the idea of creating equally spaced indices for the length of the input array using np.searchsorted -
Here's the implementation -
def equal_bin(N, m):
    sep = (N.size/float(m))*np.arange(1,m+1)
    idx = sep.searchsorted(np.arange(N.size))
    return idx[N.argsort().argsort()]
Sample runs with bin-counting for each bin to verify results -
In [442]: N = np.arange(1,94)
In [443]: np.bincount(equal_bin(N, 4))
Out[443]: array([24, 23, 23, 23])
In [444]: np.bincount(equal_bin(N, 5))
Out[444]: array([19, 19, 18, 19, 18])
In [445]: np.bincount(equal_bin(N, 10))
Out[445]: array([10, 9, 9, 10, 9, 9, 10, 9, 9, 9])
Here's another approach using linspace to create those equally spaced numbers that could be used as indices, like so -
def equal_bin_v2(N, m):
    # num must be an integer; recent NumPy versions reject a float such as N.size+0.5
    idx = np.linspace(0, m, N.size, endpoint=False).astype(int)
    return idx[N.argsort().argsort()]
Sample run -
In [689]: N
Out[689]: array([ 0.2, 1.5, 0.3, 1.7, 0.5])
In [690]: equal_bin_v2(N,2)
Out[690]: array([0, 1, 0, 1, 0])
In [691]: equal_bin_v2(N,3)
Out[691]: array([0, 1, 0, 2, 1])
In [692]: equal_bin_v2(N,4)
Out[692]: array([0, 2, 0, 3, 1])
In [693]: equal_bin_v2(N,5)
Out[693]: array([0, 3, 1, 4, 2])
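Both functions above share the same core: N.argsort().argsort() turns each value into its rank, and the ranks are then mapped to bins. A compact variant, offered only as a sketch (the name equal_bin_by_rank is made up), does the mapping with integer arithmetic, which sidesteps any floating-point truncation concerns:

```python
import numpy as np

def equal_bin_by_rank(values, m):
    """Assign each value a bin number 0..m-1 so that bins are (nearly) equally full."""
    values = np.asarray(values)
    # Double argsort: rank 0 for the smallest value, size-1 for the largest.
    ranks = values.argsort(kind="stable").argsort()
    # Map rank r to bin floor(r * m / size) using integers only.
    return ranks * m // values.size

N = np.array([0.2, 1.5, 0.3, 1.7, 0.5])
print(equal_bin_by_rank(N, 2))
print(np.bincount(equal_bin_by_rank(np.arange(1, 94), 4)))
```

For the five-element example this gives the same bin numbers as the sample run above, and for np.arange(1, 94) with four bins it gives the same counts as equal_bin.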
pandas.qcut
Another good alternative is the pd.qcut from pandas. For example:
In [6]: import pandas as pd
In [7]: N = [0.2, 1.5, 0.3, 1.7, 0.5]
...: m = 2
In [8]: pd.qcut(N, m, labels=False)
Out[8]: array([0, 1, 0, 1, 0], dtype=int64)
Tip for getting the bin middle points
If you want the bin intervals rather than integer codes, leave labels at its default (None); recent pandas versions reject labels=True. This will allow you to get the bin middle points with:
In [26]: intervals = pd.qcut(N, 2)
In [27]: [i.mid for i in intervals]
Out[27]: [0.34950000000000003, 1.1, 0.34950000000000003, 1.1, 0.34950000000000003]
Here intervals is a Categorical whose elements are pandas.Interval objects (when labels is left at its default).
See also: pd.cut, if you would like to make the bin width (not bin count) equal
Problem Question
Divisors of 42 are : 1, 2, 3, 6, 7, 14, 21, 42. These divisors squared are: 1, 4, 9, 36, 49, 196, 441, 1764. The sum of the squared divisors is 2500 which is 50 * 50, a square!
Given two integers m, n (1 <= m <= n) we want to find all integers between m and n whose sum of squared divisors is itself a square. 42 is such a number.
The result will be an array of arrays, each subarray having two elements, first the number whose squared divisors is a square and then the sum of the squared divisors.
Code below
How can I make this specific program run faster? My current code times out after n > 9999.
#returns the divisors of each number in an array of arrays
r = (m..n).to_a.map { |z| (1..z).select { |x| z % x == 0} }
#this finds all integers between m and n whose sum of squared divisors is itself a square
squarenumbers = r.map { |x| x.map { |c| c**2 }.inject(:+) }.select { |x| Math.sqrt(x) % 1 == 0 }
#returns an array of booleans.
booleans = r.map { |x| x.map { |c| c**2 }.inject(:+) }.map { |x| Math.sqrt(x) % 1 == 0 }
#returns the index of each of the true values in booleans as an array
indexer = booleans.map.with_index{|x, i| i if x == true }.compact
#returns the numbers whose squared divisors is a square in an array
unsqr = indexer.map { |x| (m..n).to_a[x] }
#merges the two arrays together, element for element and creates an array of arrays
unsqr.zip(squarenumbers)
# for m = 1 and n = 1000 the result would be
# [[1, 1], [42, 2500], [246, 84100], [287, 84100], [728, 722500]]
Brute-force calculation of factors
You begin by calculating:
m, n = 40, 42
r = (m..n).to_a.map { |z| (1..z).select { |x| z % x == 0} }
#=> [[1, 2, 4, 5, 8, 10, 20, 40], [1, 41], [1, 2, 3, 6, 7, 14, 21, 42]]
That's OK, but you don't need .to_a:
r = (m..n).map { |z| (1..z).select { |x| z % x == 0} }
#=> [[1, 2, 4, 5, 8, 10, 20, 40], [1, 41], [1, 2, 3, 6, 7, 14, 21, 42]]
This avoids an extra step, which is the creation of the temporary array (see notes 1 and 2 at the end):
(m..n).to_a #=> [40, 41, 42]
Structure of a solution
Let's work backwards to come up with our code. First, concentrate on determining, for any given number q, if the sum of squares of the factors of q is itself a perfect square. Suppose we construct a method magic_number? which takes q as its only argument and returns true if q satisfies the required property and false otherwise. Then we will compute:
(m..n).select { |q| magic_number?(q) }
to return an array of all numbers between m and n that satisfy the property. magic_number? can be written like this:
def magic_number?(q)
  return true if q == 1
  s = sum_of_squared_factors(q)
  s == Math.sqrt(s).round**2
end
Calculating sum of squared factors
So now we are left with writing the method sum_of_squared_factors. We can use your code to obtain the factors:
def factors(q)
  (1..q).select { |x| q % x == 0 }
end
factors(40) #=> [1, 2, 4, 5, 8, 10, 20, 40]
factors(41) #=> [1, 41]
factors(42) #=> [1, 2, 3, 6, 7, 14, 21, 42]
and then write:
def sum_of_squared_factors(q)
  factors(q).reduce(0) { |t,i| t + i*i }
end
sum_of_squared_factors(40) #=> 2210
sum_of_squared_factors(41) #=> 1682
sum_of_squared_factors(42) #=> 2500
Speeding the calculation of factors
There's something more we can do to speed up the calculation of factors. If f is a factor of n, then f and n/f are both factors of n. (For example, since 3 is a factor of 42, so is 42/3 #=> 14.) We therefore need only obtain the smaller of each pair.
There is one exception to this rule. If n is a perfect square and f == n**0.5, then f = n/f, so we only include f among the factors of n (not n/f as well).
It turns out that if f is the smaller of the pair, f <= (n**0.5).round (see note 3 at the end). We therefore need only check which of the numbers (1..(n**0.5).round) are factors and include their complements (unless n is a perfect square, in which case we do not double-count (n**0.5).round):
q = 42
arr = (1..Math.sqrt(q).round).select { |x| q % x == 0 }
#=> [1, 2, 3, 6]
arr = arr.flat_map { |n| [n, q/n] }
#=> [1, 42, 2, 21, 3, 14, 6, 7]
arr.pop if arr[-2] == arr[-1]
arr
#=> [1, 42, 2, 21, 3, 14, 6, 7]
q = 36
arr = (1..Math.sqrt(q).round).select { |x| q % x == 0 }
#=> [1, 2, 3, 4, 6]
arr = arr.flat_map { |n| [n, q/n] }
#=> [1, 36, 2, 18, 3, 12, 4, 9, 6, 6]
arr.pop if arr[-2] == arr[-1]
#=> 6
arr
#=> [1, 36, 2, 18, 3, 12, 4, 9, 6]
so we can write:
def factors(q)
  arr = (1..Math.sqrt(q)).select { |x| q % x == 0 }
  arr = arr.flat_map { |n| [n, q/n] }
  arr.pop if arr[-2] == arr[-1]
  arr
end
Substituting out arr ("chaining" expressions), we obtain a typical Ruby expression:
def factors(q)
  (1..Math.sqrt(q)).select { |x| q % x == 0 }.
    flat_map { |n| [n, q/n] }.
    tap { |a| a.pop if a[-2] == a[-1] }
end
factors(42)
#=> [1, 42, 2, 21, 3, 14, 6, 7]
factors(36)
#=> [1, 36, 2, 18, 3, 12, 4, 9, 6]
See Enumerable#flat_map and Object#tap. (There's no need for this array to be sorted. In applications where it needs to be sorted, just tack .sort onto the end of the chain, after the tap block, so that the duplicate check still sees the pair [f, q/f] last.)
Wrapping up
In sum, we are left with the following:
def magic_number?(q)
  return true if q == 1
  s = sum_of_squared_factors(q)
  s == Math.sqrt(s).round**2
end
def sum_of_squared_factors(q)
  factors(q).reduce(0) { |t,i| t + i*i }
end
def factors(q)
  (1..Math.sqrt(q)).select { |x| q % x == 0 }.
    flat_map { |n| [n, q/n] }.
    tap { |a| a.pop if a[-2] == a[-1] }
end
m, n = 1, 1000
(m..n).select { |q| magic_number?(q) }
#=> [1, 42, 246, 287, 728]
This calculation was completed in a blink of an eye.
Compute primes to further speed calculation of factors
Lastly, let me describe an even faster way to compute the factors of a number, using the method Prime::prime_division. That method decomposes any number into its prime components. Consider, for example, n = 360.
require 'prime'
Prime.prime_division(360)
#=> [[2, 3], [3, 2], [5, 1]]
This tells us that:
360 == 2**3 * 3**2 * 5**1
#=> true
It also tells us that every factor of 360 is the product of between 0 and 3 2's, multiplied by between 0 and 2 3's, multiplied by 0 or 1 5's. Therefore:
def factors(n)
  Prime.prime_division(n).reduce([1]) do |a,(prime,pow)|
    a.product((0..pow).map { |po| prime**po }).map { |x,y| x*y }
  end
end
a = factors(360).sort
#=> [ 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 18,
# 20, 24, 30, 36, 40, 45, 60, 72, 90, 120, 180, 360]
We can check that:
a == (1..360).select { |n| (360 % n).zero? }
#=> true
One other check:
factors(40).sort
#=> [1, 2, 4, 5, 8, 10, 20, 40]
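For readers who do not use Ruby, the same construction can be sketched in Python. Python has no built-in counterpart to Prime.prime_division, so the sketch below assumes a simple trial-division helper; both function names are made up for illustration. The divisor list is built exactly as described above: every divisor is a product of one power of each prime, with the exponent ranging from 0 to that prime's multiplicity.

```python
def prime_factorization(n):
    """Return [(prime, exponent), ...] for n >= 1, by trial division."""
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            k = 0
            while n % p == 0:
                n //= p
                k += 1
            factors.append((p, k))
        p += 1
    if n > 1:
        # Whatever remains after dividing out small primes is itself prime.
        factors.append((n, 1))
    return factors

def divisors(n):
    """Return all divisors of n (unsorted), built from its prime factorization."""
    result = [1]
    for p, k in prime_factorization(n):
        powers = [p ** e for e in range(k + 1)]
        # Combine every divisor found so far with every power of this prime.
        result = [d * q for d in result for q in powers]
    return result

print(prime_factorization(360))
print(sorted(divisors(360)))
```

As in the Ruby version, the result can be checked against the brute-force list of all d in 1..360 with 360 % d == 0.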
1. You could instead write that [*m..n] #=> [40, 41, 42].
2. Why is it not necessary to convert the range to an array? Enumerable#map, being an instance method of the module Enumerable, is available for use by every class that includes Enumerable. Array is one, but (m..n).class #=> Range is another. (See the second paragraph at Range).
3. Suppose f is smaller than n/f and f > n**0.5, then n/f < n/(n**0.5) = n**0.5 < f, a contradiction.
I don't know Ruby but the problem lies with the algorithm used in finding the divisors of a number (which is not specific to the language used, i.e. Ruby in this case).
r = (m..n).to_a.map { |z| (1..z).select { |x| z % x == 0} }
To find the divisors of an integer n you are dividing n by all positive integers up to n, which means the loop runs n times. However, it is enough to divide up to sqrt(n) to calculate the divisors. In pseudocode this looks like below:
for i = 1 to i <= sqrt(n)
    r = n % i
    if r == 0 then
        i is a divisor
        if n / i != i then
            n / i is another divisor
For example:
sqrt_42 = 6.48074069840786
i = 1 => 1 and 42 are two divisors
i = 2 => 2 and 21
i = 3 => 3 and 14
i = 4 => no divisor
i = 5 => no divisor
i = 6 => 6 and 7
And that's all.
This will improve the performance a lot, since now the loop runs only sqrt(n) times instead of n times, which is a big difference for large n.
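The pseudocode above translates directly into a runnable sketch. It is written in Python rather than Ruby and is only an illustration: the function names are made up, and math.isqrt (Python 3.8+) is used for an exact integer square root so that the perfect-square test does not depend on floating-point rounding.

```python
import math

def sum_of_squared_divisors(n):
    """Sum d*d over all divisors d of n, looping only up to sqrt(n)."""
    total = 0
    for i in range(1, math.isqrt(n) + 1):
        if n % i == 0:
            total += i * i
            j = n // i          # the paired divisor
            if j != i:          # avoid double-counting when n is a perfect square
                total += j * j
    return total

def list_squared(m, n):
    """Return [number, sum] pairs in m..n whose sum of squared divisors is a square."""
    result = []
    for q in range(m, n + 1):
        s = sum_of_squared_divisors(q)
        r = math.isqrt(s)
        if r * r == s:
            result.append([q, s])
    return result

print(sum_of_squared_divisors(42))  # 2500, as in the problem statement
print(list_squared(1, 1000))
```

For m = 1 and n = 1000 this should reproduce the result listed in the question. The total work is roughly the sum of sqrt(q) over the range instead of the sum of q.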
If I have a sorted array of numerical values such as Double, Integer, and Time, what is the general logic to finding a complement?
Over my CS career in college, I've gotten better at understanding complements and edge cases for ranges. As I help students whose skill levels and understanding match mine when I wrote this, I need help finding a generalized way to convey this concept to them for singular elements and ranges.
Try something like this:
def complement(l, universe=None):
    """
    Return the complement of a list of integers, as compared to
    a given "universe" set. If no universe is specified,
    consider the universe to be all integers between
    the minimum and maximum values of the given list.
    """
    if universe is not None:
        universe = set(universe)
    else:
        universe = set(range(min(l), max(l)+1))
    return sorted(universe - set(l))
then
l = [1,3,5,7,10]
complement(l)
yields:
[2, 4, 6, 8, 9]
Or you can specify your own universe:
complement(l, range(12))
yields:
[0, 2, 4, 6, 8, 9, 11]
To add another option - using a data type that is always useful to learn about, for these types of operations.
a = set([1, 3, 5, 7, 10])
b = set(range(1, 11))
c = sorted(list(b.symmetric_difference(a)))
print(c)
[2, 4, 6, 8, 9]
>>> nums = [1, 3, 5, 7, 10]
>>> [n + ((n&1)*2-1) for n in nums]
[2, 4, 6, 8, 9]
The easiest way is to iterate from the beginning of your list to the second to last element. Set j equal to the current element + 1. While j is less than the next number in your list, append it to your list of complements and increment it.
# find the skipped numbers in a list sorted in ascending order
def getSkippedNumbers(arr):
    complement = []
    for i in range(0, len(arr) - 1):
        j = arr[i] + 1
        while j < arr[i + 1]:
            complement.append(j)
            j += 1
    return complement

test = [1, 3, 5, 7, 10]
print(getSkippedNumbers(test))  # prints [2, 4, 6, 8, 9]
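The same gap-walking idea carries over to continuous types such as Double or Time, where the missing values cannot be enumerated and the complement has to be expressed as ranges. The following is a minimal sketch under stated assumptions: the input is a sorted list of non-overlapping half-open intervals [start, end), the universe is a single interval [lo, hi), and the function name complement_intervals is made up for illustration.

```python
def complement_intervals(intervals, lo, hi):
    """Return the gaps in [lo, hi) not covered by `intervals`.

    `intervals` is assumed to be sorted by start and non-overlapping,
    with each (start, end) pair meaning the half-open range [start, end).
    """
    gaps = []
    cursor = lo  # everything before the cursor has been accounted for
    for start, end in intervals:
        if start > cursor:
            gaps.append((cursor, min(start, hi)))
        cursor = max(cursor, end)
        if cursor >= hi:
            break
    if cursor < hi:
        gaps.append((cursor, hi))
    return gaps

print(complement_intervals([(1.0, 3.0), (5.0, 7.0)], 0.0, 10.0))
# prints [(0.0, 1.0), (3.0, 5.0), (7.0, 10.0)]
```

Any type that supports comparison works here, including datetime values. The edge cases to discuss with students are the same as in the integer version: what happens before the first element, after the last one, and whether the boundaries themselves belong to the set.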
You can find the complement of one list relative to another using a list comprehension. Here we take the elements of x that are not in y (the relative complement of y in x):
>>> x = [1, 3, 5, 7, 10]
>>> y = [1, 2, 3, 4, 8, 9, 20]
>>> z = [n for n in x if not n in y]
>>> z
[5, 7, 10]