I want to draw ROC curves with pRoC.
However for some reason there is extra empty space on either side of the x-axis and I cannot remove it with xlim. Some example code:
library(pROC)
n = c(4, 3, 5)
b = c(TRUE, FALSE, TRUE)
df = data.frame(n, b)
rocobj <- plot.roc(df$b, df$n, percent = TRUE, main="ROC", col="#1c61b6", add=FALSE)
I tried the pROC help file, but that doesn't really help me. Even more puzzling is to me that the Y-axis is OK looking...
I really appreciate your help!
Make sure the plotting device is square and adjust the margins so that top + bottom == left + right:
library(pROC)
png("test.png", width = 480, height = 480)
par(mar = c(4, 4, 4, 4)+.1)
n = c(4, 3, 5)
b = c(TRUE, FALSE, TRUE)
rocobj <- plot.roc(b, n, percent = TRUE, main="ROC", col="#1c61b6", add=FALSE)
dev.off()
An other answer, if you don't mind to have distorted axis, is to use the asp parameter. By default it is set to 1, ensuring both axis have the same scale and the ROC curve is squared*, but you can turn it off with asp = NA:
library(pROC)
par(mar = c(4, 4, 4, 4)+.1)
n = c(4, 3, 5)
b = c(TRUE, FALSE, TRUE)
rocobj <- plot.roc(b, n, percent = TRUE, main="ROC", col="#1c61b6", add=FALSE, asp = NA)
* Having a squared ROC curve is important if you want to interpret it visually. For instance, you may want to compare several local maximas by their distance to the diagonal: you can only do that if the two axis have the same scale. So if you want to do that make sure to follow my other answer.
There is yet a third answer, which takes the margins out of the plotting region, so it will automatically look squared, even when the device isnt. This is done by setting the graphical parameter pty to "s":
library(pROC)
par(pty = "s")
n = c(4, 3, 5)
b = c(TRUE, FALSE, TRUE)
rocobj <- plot.roc(b, n, percent = TRUE, main="ROC", col="#1c61b6", add=FALSE, asp = NA)
(I added a black frame to visualize what's going on)
Related
Suppose I have a set of pixel values, e.g.
> S[42]
6, 2, (0.1, 0, 0)
^ here the 42nd entry is for pixel location (6,2) with a dull red color.
How to efficiently plot S into a fresh numpy bitmap array bitmap = np.zeros((1024, 768, 3))?
Is there some vectorized solution (rather than a for loop)?
I can split S by columns into S_x, S_y and S_RGB if that helps.
this is how you do it, yes splitting up is helpful, and use the same datatypes I have below
bitmap = np.zeros((10, 10, 3))
s_x = (1,2,3) ## tuple
s_y = (0,1,2) ## tuple
pixal_val = np.array([[0,0,1],[1,0,0],[0,1,0]]) ## np
bitmap[s_y, s_x] = pixal_val
plt.imshow(bitmap)
output:
Edit:
it does work with using numpy arrays as coordinates but make sure they are type int
bitmap = np.zeros((10, 10, 3))
s_x = np.array([a for a in range(10)], dtype=int)
s_y = np.array([a for a in range(10)], dtype=int)
np.random.shuffle(s_x)
np.random.shuffle(s_y)
pixel_val = np.random.rand(10,3)
bitmap[s_y, s_x] = pixel_val
plt.imshow(bitmap)
final edit: s_x ans s_y where the wrong way round I have fixed above
It seems I am stuck on the following problem with numpy.
I have an array X with shape: X.shape = (nexp, ntime, ndim, npart)
I need to compute binned statistics on this array along npart dimension, according to the values in binvals (and some bins), but keeping all the other dimensions there, because I have to use the binned statistic to remove some bias in the original array X. Binning values have shape binvals.shape = (nexp, ntime, npart).
A complete, minimal example, to explain what I am trying to do. Note that, in reality, I am working on large arrays and with several hunderds of bins (so this implementation takes forever):
import numpy as np
np.random.seed(12345)
X = np.random.randn(24).reshape(1,2,3,4)
binvals = np.random.randn(8).reshape(1,2,4)
bins = [-np.inf, 0, np.inf]
nexp, ntime, ndim, npart = X.shape
cleanX = np.zeros_like(X)
for ne in range(nexp):
for nt in range(ntime):
indices = np.digitize(binvals[ne, nt, :], bins)
for nd in range(ndim):
for nb in range(1, len(bins)):
inds = indices==nb
cleanX[ne, nt, nd, inds] = X[ne, nt, nd, inds] - \
np.mean(X[ne, nt, nd, inds], axis = -1)
Looking at the results of this may make it clearer?
In [8]: X
Out[8]:
array([[[[-0.20470766, 0.47894334, -0.51943872, -0.5557303 ],
[ 1.96578057, 1.39340583, 0.09290788, 0.28174615],
[ 0.76902257, 1.24643474, 1.00718936, -1.29622111]],
[[ 0.27499163, 0.22891288, 1.35291684, 0.88642934],
[-2.00163731, -0.37184254, 1.66902531, -0.43856974],
[-0.53974145, 0.47698501, 3.24894392, -1.02122752]]]])
In [10]: cleanX
Out[10]:
array([[[[ 0. , 0.67768523, -0.32069682, -0.35698841],
[ 0. , 0.80405255, -0.49644541, -0.30760713],
[ 0. , 0.92730041, 0.68805503, -1.61535544]],
[[ 0.02303938, -0.02303938, 0.23324375, -0.23324375],
[-0.81489739, 0.81489739, 1.05379752, -1.05379752],
[-0.50836323, 0.50836323, 2.13508572, -2.13508572]]]])
In [12]: binvals
Out[12]:
array([[[ -5.77087303e-01, 1.24121276e-01, 3.02613562e-01,
5.23772068e-01],
[ 9.40277775e-04, 1.34380979e+00, -7.13543985e-01,
-8.31153539e-01]]])
Is there a vectorized solution? I thought of using scipy.stats.binned_statistic, but I seem to be unable to understand how to use it for this aim. Thanks!
import numpy as np
np.random.seed(100)
nexp = 3
ntime = 4
ndim = 5
npart = 100
nbins = 4
binvals = np.random.rand(nexp, ntime, npart)
X = np.random.rand(nexp, ntime, ndim, npart)
bins = np.linspace(0, 1, nbins + 1)
d = np.digitize(binvals, bins)[:, :, np.newaxis, :]
r = np.arange(1, len(bins)).reshape((-1, 1, 1, 1, 1))
m = d[np.newaxis, ...] == r
counts = np.sum(m, axis=-1, keepdims=True).clip(min=1)
means = np.sum(X[np.newaxis, ...] * m, axis=-1, keepdims=True) / counts
cleanX = X - np.choose(d - 1, means)
Ok, I think I got it, mainly based on the answer by #jdehesa.
clean2 = np.zeros_like(X)
d = np.digitize(binvals, bins)
for i in range(1, len(bins)):
m = d == i
minds = np.where(m)
sl = [*minds[:2], slice(None), minds[2]]
msum = m.sum(axis=-1)
clean2[sl] = (X - \
(np.sum(X * m[...,np.newaxis,:], axis=-1) /
msum[..., np.newaxis])[..., np.newaxis])[sl]
Which gives the same results as my original code.
On the small arrays I have in the example here, this solution is approximately three times as fast as the original code. I expect it to be way faster on larger arrays.
Update:
Indeed it's faster on larger arrays (didn't do any formal test), but despite this, it just reaches the level of acceptable in terms of performance... any further suggestion on extra vectoriztaions would be very welcome.
Spent a while this morning looking for a generalized question to point duplicates to for questions about as_strided and/or how to make generalized window functions. There seem to be a lot of questions on how to (safely) create patches, sliding windows, rolling windows, tiles, or views onto an array for machine learning, convolution, image processing and/or numerical integration.
I'm looking for a generalized function that can accept a window, step and axis parameter and return an as_strided view for over arbitrary dimensions. I will give my answer below, but I'm interested if anyone can make a more efficient method, as I'm not sure using np.squeeze() is the best method, I'm not sure my assert statements make the function safe enough to write to the resulting view, and I'm not sure how to handle the edge case of axis not being in ascending order.
DUE DILIGENCE
The most generalized function I can find is sklearn.feature_extraction.image.extract_patches written by #eickenberg (as well as the apparently equivalent skimage.util.view_as_windows), but those are not well documented on the net, and can't do windows over fewer axes than there are in the original array (for example, this question asks for a window of a certain size over just one axis). Also often questions want a numpy only answer.
#Divakar created a generalized numpy function for 1-d inputs here, but higher-dimension inputs require a bit more care. I've made a bare bones 2D window over 3d input method, but it's not very extensible.
EDIT JAN 2020: Changed the iterable return from a list to a generator to save memory.
EDIT OCT 2020: Put the generator in a separate function, since mixing generators and return statements doesn't work intiutively.
Here's the recipe I have so far:
def window_nd(a, window, steps = None, axis = None, gen_data = False):
"""
Create a windowed view over `n`-dimensional input that uses an
`m`-dimensional window, with `m <= n`
Parameters
-------------
a : Array-like
The array to create the view on
window : tuple or int
If int, the size of the window in `axis`, or in all dimensions if
`axis == None`
If tuple, the shape of the desired window. `window.size` must be:
equal to `len(axis)` if `axis != None`, else
equal to `len(a.shape)`, or
1
steps : tuple, int or None
The offset between consecutive windows in desired dimension
If None, offset is one in all dimensions
If int, the offset for all windows over `axis`
If tuple, the steps along each `axis`.
`len(steps)` must me equal to `len(axis)`
axis : tuple, int or None
The axes over which to apply the window
If None, apply over all dimensions
if tuple or int, the dimensions over which to apply the window
gen_data : boolean
returns data needed for a generator
Returns
-------
a_view : ndarray
A windowed view on the input array `a`, or `a, wshp`, where `whsp` is the window shape needed for creating the generator
"""
ashp = np.array(a.shape)
if axis != None:
axs = np.array(axis, ndmin = 1)
assert np.all(np.in1d(axs, np.arange(ashp.size))), "Axes out of range"
else:
axs = np.arange(ashp.size)
window = np.array(window, ndmin = 1)
assert (window.size == axs.size) | (window.size == 1), "Window dims and axes don't match"
wshp = ashp.copy()
wshp[axs] = window
assert np.all(wshp <= ashp), "Window is bigger than input array in axes"
stp = np.ones_like(ashp)
if steps:
steps = np.array(steps, ndmin = 1)
assert np.all(steps > 0), "Only positive steps allowed"
assert (steps.size == axs.size) | (steps.size == 1), "Steps and axes don't match"
stp[axs] = steps
astr = np.array(a.strides)
shape = tuple((ashp - wshp) // stp + 1) + tuple(wshp)
strides = tuple(astr * stp) + tuple(astr)
as_strided = np.lib.stride_tricks.as_strided
a_view = np.squeeze(as_strided(a,
shape = shape,
strides = strides))
if gen_data :
return a_view, shape[:-wshp.size]
else:
return a_view
def window_gen(a, window, **kwargs):
#Same docstring as above, returns a generator
_ = kwargs.pop(gen_data, False)
a_view, shp = window_nd(a, window, gen_data = True, **kwargs)
for idx in np.ndindex(shp):
yield a_view[idx]
Some test cases:
a = np.arange(1000).reshape(10,10,10)
window_nd(a, 4).shape # sliding (4x4x4) window
Out: (7, 7, 7, 4, 4, 4)
window_nd(a, 2, 2).shape # (2x2x2) blocks
Out: (5, 5, 5, 2, 2, 2)
window_nd(a, 2, 1, 0).shape # sliding window of width 2 over axis 0
Out: (9, 2, 10, 10)
window_nd(a, 2, 2, (0,1)).shape # tiled (2x2) windows over first and second axes
Out: (5, 5, 2, 2, 10)
window_nd(a,(4,3,2)).shape # arbitrary sliding window
Out: (7, 8, 9, 4, 3, 2)
window_nd(a,(4,3,2),(1,5,2),(0,2,1)).shape #arbitrary windows, steps and axis
Out: (7, 5, 2, 4, 2, 3) # note shape[-3:] != window as axes are out of order
I have several (named) vectors in a list:
data = list(a=runif(n = 50, min = 1, max = 10), b=runif(n = 50, min = 1, max = 10), c=runif(n = 50, min = 1, max = 10), d=runif(n = 50, min = 1, max = 10))
I want to play around with different combinations of them depending on what an array tells me, for example I want to sum across the different combinations in combs:
var <- letters[1:length(data)]
combs <- do.call(expand.grid, lapply(var, function(x) c("", x)))[-1,]
And get the sums for each row of these combinations. So the results for the first 8 rows would look like this:
res = rbind(a=sum(data[["a"]]), b=sum(data[["b"]]), ab = sum(c(data[["a"]], data[["b"]])), c = sum(data[["c"]]), ac = sum(c(data[["a"]], data[["c"]])), bc = sum(c(data[["b"]], data[["c"]])), abc = sum(c(data[["a"]], data[["b"]], data[["c"]])), d=sum(data[["d"]]))
I think it is possible by extracting the list of data, by looping through each rows and each columns (I would have a variable number of columns though), but this seems quite clunky and slow, is there a better way that I am not seeing?
Thanks so much!
Fra
After searching I find no native way or current solution to change efficiently the position of an element in a numpy array, which seems to me quite natural operation. For example if I want to move the 3th element in the 1st position it should be like this:
x = np.array([1,2,3,4,5])
f*(x, 3, 1)
print x
array([1,4,2,3,5])
Im looking for a f* function here. This is different of rolling every elements, also for moves in big array I want to avoid copying operation that could be used by using insert and delete operation
Not sure about the efficiency, but here's an approach using masking -
def change_pos(in_arr, pick_idx, put_idx ):
range_arr = np.arange(in_arr.size)
tmp = in_arr[pick_idx]
in_arr[range_arr != put_idx ] = in_arr[range_arr != pick_idx]
in_arr[put_idx] = tmp
This would support both forward and backward movement.
Sample runs
1) Element moving backward -
In [542]: in_arr
Out[542]: array([4, 9, 3, 6, 8, 0, 2, 1])
*
In [543]: change_pos(in_arr,6,1)
In [544]: in_arr
Out[544]: array([4, 2, 9, 3, 6, 8, 0, 1])
^
2) Element moving forward -
In [546]: in_arr
Out[546]: array([4, 9, 3, 6, 8, 0, 2, 1])
*
In [547]: change_pos(in_arr,1,6)
In [548]: in_arr
Out[548]: array([4, 3, 6, 8, 0, 2, 9, 1])
^
With the small example, this wholesale copy tests faster than #Divakar's masked in-place copy:
def foo4(arr, i,j):
L=arr.shape[0]
idx=np.concatenate((np.arange(j),[i],np.arange(j,i),np.arange(i+1,L)))
return arr[idx]
I didn't try to make it work for forward moves. An analogous inplace function runs at about the same speed as Divakar's.
def foo2(arr, i,j):
L=arr.shape[0]
tgt=np.arange(j,i+1)
src=np.concatenate([[i],np.arange(j,i)])
arr[tgt]=arr[src]
But timings could well be different if the array was much bigger and the swap involved a small block in the middle.
Since the data for an array is stored in a contiguous block of memory, elements cannot change place without some sort of copy. You'd have implement lists as a linked list to have a no-copy form of movement.
It just occurred to me that there are some masked copyto and place functions, that might make this sort of copy/movement faster. But I haven't worked with those much.
https://stackoverflow.com/a/40228699/901925
================
np.roll does
idx = np.concatenate((np.arange(2,5),np.arange(2)))
# array([2, 3, 4, 0, 1])
np.take(a, idx) # or a[idx]
In the past I have found the simple numpy indexing i.e. a[:-1]=a[1:] to be faster than most alternatives (including np.roll()). Comparing the two other answers with an 'in place' shift I get:
for shift from 40000 to 100
1.015ms divakar
1.078ms hpaulj
29.7micro s in place shift (34 x faster)
for shift from 40000 to 39900
0.975ms divakar
0.985ms hpaulj
3.47micro s in place shift (290 x faster)
timing comparison using:
import timeit
init = '''
import numpy as np
def divakar(in_arr, pick_idx, put_idx ):
range_arr = np.arange(in_arr.size)
tmp = in_arr[pick_idx]
in_arr[range_arr != put_idx ] = in_arr[range_arr != pick_idx]
in_arr[put_idx] = tmp
def hpaulj(arr, fr, to):
L = arr.shape[0]
idx = np.concatenate((np.arange(to), [fr], np.arange(to, fr), np.arange(fr+1, L)))
return arr[idx]
def paddyg(arr, fr, to):
if fr >= arr.size or to >= arr.size:
return None
tmp = arr[fr].copy()
if fr > to:
arr[to+1:fr+1] = arr[to:fr]
else:
arr[fr:to] = arr[fr+1:to+1]
arr[to] = tmp
return arr
a = np.random.randint(0, 1000, (100000))
'''
fns = ['''
divakar(a, 40000, 100)
''', '''
hpaulj(a, 40000, 100)
''', '''
paddyg(a, 40000, 100)
''']
for f in fns:
print(timeit.timeit(f, setup=init, number=1000))