Given an arbitrary double value f, a base value v and a multiplication factor p, how can I snap f to the nearest value of the form v * p^n (v multiplied by an integer power of p)?
Example:
f = 3150.0
v = 100.0
p = 2
the multiplications will go like this
100 (v)
200 (multiplied by p)
400
800
1600
3200
...
f is closest to 3200.0 so the function should return 3200.0
There was actually a name for this, which I seem to have forgotten and maybe this is why I couldn't find such a function.
Let k = floor(log_p(f/v)) where log_p(x) = log(x)/log(p) is the logarithm to base p function. It follows from the properties of floor and log that p^k v <= f < p^(k+1) v, which gives the two closest values to f of the form p^n v.
Which of those two values to choose depends on the exact definition of "nearest" in your use-case. If taken in the multiplicative sense (as would be natural on a log scale), that "nearest" value can be calculated directly as p^n v where n = round(log_p(f/v)) = round(log(f/v)/log(p)).
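As a rough sketch of both readings of "nearest" in Python (the function names are made up for illustration, and the code assumes f > 0, v > 0 and p > 1 so that the logarithms and the inequality above are well defined):

```python
import math


def snap_multiplicative(f, v, p):
    """Nearest v * p**n to f on a log scale: n = round(log_p(f / v))."""
    if f <= 0 or v <= 0 or p <= 1:
        raise ValueError("this sketch assumes f > 0, v > 0 and p > 1")
    n = round(math.log(f / v) / math.log(p))
    return v * p ** n


def snap_linear(f, v, p):
    """Nearest v * p**n to f by absolute difference."""
    if f <= 0 or v <= 0 or p <= 1:
        raise ValueError("this sketch assumes f > 0, v > 0 and p > 1")
    k = math.floor(math.log(f / v) / math.log(p))
    lower = v * p ** k          # p^k v <= f (up to rounding error)
    upper = v * p ** (k + 1)    # f < p^(k+1) v
    return lower if f - lower <= upper - f else upper
```

With the values from the question, both snap_multiplicative(3150.0, 100.0, 2) and snap_linear(3150.0, 100.0, 2) give 3200.0. The two definitions disagree for values between the geometric and arithmetic mean of two neighbours, for example f = 2300.0, which lies between 1600 and 3200.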
Related
I have a set of not-unique real numbers read from a file.
All these numbers were generated from a linear space, that is, the difference between numbers is always a multiple of a fixed value, the "step" or "grid size" of the linear space, so to say.
Each existing value will typically appear many times in the file.
My goal would be to find how the values are spaced, so that I could put each (unique) value in an array and access its value with an index.
You are looking for the greatest common divisor of those numbers. Here it is in Python:
def gcd( a, b ):
    "greatest common divisor"
    while True:
        c = a % b
        if c < 1e-5:
            return b
        a, b = b, c

def gcdset( a_set ):
    "use the pairwise gcd to find gcd of a set"
    x = a_set.pop()
    total = x
    for u in a_set:
        x = gcd( u, x )
        # the following step is optional,
        # some sort of stabilization just for improved accuracy
        total = total + u
        x = total / round(total/x)
    return x

# the list where we want to find the gcd
inputlist = [2239.864226650253, 1250.4096410911607, 1590.1948696485413,
             810.0479848807954, 2177.343744595695, 54.3656365691809, 2033.2748076873656,
             2074.049035114251, 108.7312731383618, 2188.216871909531]
# we turn it into a set to get rid of duplicates
aset = set(inputlist)
print(gcdset( aset ))
If you don't have Python around you can play with this code here: http://ideone.com/N9xDWA
I am currently looking at binomial option pricing. I have written the code below, which works fine when you enter the variables one at a time. However, entering each set of values is very tedious, and I need to be able to analyse a large set of data. I have created arrays for each of the variables, but I keep getting the error: A(I) = B, the number of elements in B must equal I. The function is shown below.
function C = BinC(S0,K,r,sig,T,N);
% PURPOSE:
% To return the value of a European call option using the Binomial method
%-------------------------------------------------------------------------
% INPUTS:
% S0 - The initial price of the underlying asset
% K - The strike price
% r - The risk free rate of return, expressed as a decimal
% sig - The volatility of the underlying asset, expressed as a decimal
% T - The time to maturity, expressed as a decimal
% N - The number of steps
%-------------------------------------------------------------------------
dt = T/N;
u = exp(sig*sqrt(dt));
d = 1/u;
p = (exp(r*dt) - d)/(u - d);
S = zeros(N+1,1);
% Price of underlying asset at time T
for n = 1:N+1
    S(n) = S0*(d^(N+1-n))*(u^(n-1));
end
% Price of Option at time T
for n = 1:N+1
    C(n) = max(S(n)- K, 0);
end
% Backtrack to get option price at time 0
for i = N:-1:1
    for n = 1:i
        C(n) = exp(-r*dt)*(p*C(n+1) + (1-p)*C(n));
    end
end
disp(C(1))
After importing my data, I entered this in to the command window.
for i=1:20
    w(i)= BinC(S0(i),K(i),r(i),sig(i),T(i),N(i));
end
When I enter w, all I get back is w = []. I have no idea how I can make A(I) = B. I apologise if this is a very silly question, but I am new to Matlab and in need of help. Thanks
Your function computes an entire vector C, but displays only C(1). This display is deceptive: it makes you think the function is returning a scalar, but it's not: it's returning the entire vector C, which you try to store into a scalar location.
The solution is simple: Change your function definition to this (rename the output variable):
function out = BinC(S0,K,r,sig,T,N);
Then at the last line of the function, remove the disp, and replace it with
out = C(1);
To verify all of this (compare with your non-working example), try calling it by itself at the command line, and examine the output.
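For comparison, here is a rough Python sketch of the same tree that returns a single number per call, so the results can be collected in a loop without any size mismatch. The name binomial_call and the sample inputs in the note below are illustrative only, not from the original post.

```python
import math


def binomial_call(s0, k, r, sig, t, n):
    """European call price on an n-step binomial tree; returns one float."""
    dt = t / n
    u = math.exp(sig * math.sqrt(dt))
    d = 1 / u
    p = (math.exp(r * dt) - d) / (u - d)
    discount = math.exp(-r * dt)
    # Option payoff at maturity; node j has j up-moves and n - j down-moves.
    c = [max(s0 * d ** (n - j) * u ** j - k, 0.0) for j in range(n + 1)]
    # Backtrack to time 0, overwriting the list in place.
    for i in range(n, 0, -1):
        for j in range(i):
            c[j] = discount * (p * c[j + 1] + (1 - p) * c[j])
    return c[0]  # only the time-0 value leaves the function
```

Because the function returns c[0] rather than the whole list, something like [binomial_call(*row) for row in zip(S0, K, r, sig, T, N)] produces one price per input set, which is the Python counterpart of the w(i) loop above.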
This question is related to matlab: find the index of common values at the same entry from two arrays.
Suppose that I have a 1000 by 10000 matrix that contains the values 0, 1 and 2. Each row is treated as a sample. I want to calculate the pairwise distance between those samples according to the formula d = 1 - 1/(2p) * sum(a/c + b/d), where a, b, c, d on the right-hand side can be treated as row vectors of length 10000 according to some definition and p = 10000. c and d are probabilities such that c + d = 1.
An example of how to find the values of a, b, c, d: suppose we want to find d between samples i and j, then I look at rows i and j.
If the kth entries of rows i and j have values 2 and 2, then a=2, b=0, c=1, d=0 (I guess I will assign 0/0 = 0 in this case).
If the kth entries of rows i and j have values 2 and 1, or vice versa, then a=1, b=0, c=3/4, d=1/4.
The remaining cases are assigned similarly:
2,0: a=0, b=0, c=1/2, d=1/2
1,1: a=1, b=1, c=1/2, d=1/2
1,0: a=0, b=1, c=1/4, d=3/4
0,0: a=0, b=2, c=0, d=1
The MATLAB code I have so far uses for loops over i and j, then finds the cases above using find, then creates two arrays for a/c and b/d. This is extremely slow. Is there a way that I can improve the efficiency?
Edit: the distance d is the formula given in this paper on page 13.
Provided those coefficients are fixed, then I think I've successfully vectorised the distance function. Figuring out the formulae was fun. I flipped things around a bit to minimise division, and since I wasn't aware of pdist until @horchler's comment, you get it wrapped in loops with the constants factored out:
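% m is the data
[n, p] = size(m);
distance = zeros(n);
for ii=1:n
    for jj=ii+1:n
        s = m(ii,:) + m(jj,:);
        a = min(m(ii,:), m(jj,:));
        b = 2 - max(m(ii,:), m(jj,:));
        c = 4 ./ s;            % reciprocal of the question's c = s/4
        c(c == Inf) = 0;       % 0/0 is treated as 0 (a is 0 there)
        d = 4 ./ (4 - s);      % reciprocal of the question's d = 1 - s/4
        d(d == Inf) = 0;       % 0/0 is treated as 0 (b is 0 there)
        distance(ii,jj) = sum(a.*c + b.*d);
        % distance(jj,ii) = distance(ii,jj); % optional for the full matrix
    end
end
distance = 1 - (1 / (2 * p)) * distance;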
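As a sanity check on the formulae, the closed forms a = min(x, y), b = 2 - max(x, y), c = (x + y)/4 and d = 1 - c can be compared against the table in the question. The sketch below does that in plain Python; the TABLE dictionary simply restates the question's cases, the function names are made up for illustration, and 0/0 is treated as 0 as the question suggests.

```python
# The six cases from the question: (x, y) -> (a, b, c, d)
TABLE = {
    (2, 2): (2, 0, 1.0, 0.0),
    (2, 1): (1, 0, 0.75, 0.25),
    (2, 0): (0, 0, 0.5, 0.5),
    (1, 1): (1, 1, 0.5, 0.5),
    (1, 0): (0, 1, 0.25, 0.75),
    (0, 0): (0, 2, 0.0, 1.0),
}


def coefficients(x, y):
    """Closed-form a, b, c, d for one pair of entries (each 0, 1 or 2)."""
    a = min(x, y)
    b = 2 - max(x, y)
    c = (x + y) / 4
    d = 1 - c
    return a, b, c, d


def term(x, y):
    """a/c + b/d for one pair of entries, with 0/0 treated as 0."""
    a, b, c, d = coefficients(x, y)
    total = 0.0
    if c > 0:
        total += a / c
    if d > 0:
        total += b / d
    return total


def distance(row_i, row_j):
    """1 - 1/(2p) * sum(a/c + b/d) over the p entries of two rows."""
    p = len(row_i)
    return 1 - sum(term(x, y) for x, y in zip(row_i, row_j)) / (2 * p)
```

Looping over TABLE and comparing each entry with coefficients(x, y), in both argument orders, confirms the closed forms for all six cases.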
I have a large data set with two arrays, say x and y. The arrays have over 1 million data points in size. Is there a simple way to do a scatter plot of only 2000 of these points but have it be representative of the entire set?
I'm thinking along the lines of creating another array r ; r = max(x)*rand(2000,1) to get a random sample of the x array. Is there a way to then find where a value in r is equal to, or close to a value in x ? They wouldn't have to be in the same indexed location but just throughout the whole matrix. We could then plot the y values associated with those found x values against r
I'm just not sure how to code this. Is there a better way than doing this?
I'm not sure how representative this procedure will be of your data, because it depends on what your data looks like, but you can certainly code up something like that. The easiest way to find the closest value is to take the min of the abs of the difference between your test vector and your desired value.
r = max(x)*rand(2000,1);
for i = 1:length(r)
    [~,z(i)] = min(abs(x-r(i)));
end
plot(x(z),y(z),'.')
Note that the [~,z(i)] in the min line means we want to store the index of the minimum value in vector z.
You might also try something like a moving average, see this video: http://blogs.mathworks.com/videos/2012/04/17/using-convolution-to-smooth-data-with-a-moving-average-in-matlab/
Or you can plot every n points, something like (I haven't tested this, so no guarantees):
n = 1000;
plot(x(1:n:end),y(1:n:end))
Or, if you know the number of points you want (again, untested):
npoints = 2000;
interval = round(length(x)/npoints);
plot(x(1:interval:end),y(1:interval:end))
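The same two ideas in a Python sketch, for anyone who wants to try them outside MATLAB. The function names are made up for illustration. The second function picks random indices instead of a fixed stride, which is an addition not in the answer above: a fixed stride can give a misleading picture if the data is periodic, whereas random indices do not have that problem.

```python
import random


def subsample_stride(x, y, npoints):
    """Keep roughly npoints (x, y) pairs by taking every interval-th point."""
    if npoints <= 0:
        raise ValueError("npoints must be positive")
    interval = max(1, round(len(x) / npoints))
    return x[::interval], y[::interval]


def subsample_random(x, y, npoints, seed=None):
    """Keep npoints (x, y) pairs chosen uniformly at random, without repeats."""
    rng = random.Random(seed)
    # The same indices are used for x and y, so the pairs stay matched.
    idx = rng.sample(range(len(x)), min(npoints, len(x)))
    return [x[i] for i in idx], [y[i] for i in idx]
```

Either result can be passed straight to a scatter plot call.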
Perhaps the easiest way is to use the round function and convert things to integers, then they can be compared. For example, if you want to find points that are within 0.1 of the values of r, multiply the values by 10 first, then round:
r = max(x) * rand(2000,1);
rr = round(r / 0.1);
xx = round(x / 0.1);
inRR = ismember(xx, rr);
plot(x(inRR), y(inRR));
By dividing by 0.1, any values that have the same integer value are within 0.1 of each other.
ismember returns a 1 for each value of xx if that value is in rr, otherwise a 0. These can be used to select entries to plot.
An RPC server is given which receives millions of requests a day. Each request i takes processing time Ti to get processed. We want to find the 65th percentile processing time (when processing times are sorted according to their values in increasing order) at any moment. We cannot store the processing times of all past requests, as the number of requests is very large. So the answer need not be the exact 65th percentile; you can give an approximate answer, i.e. a processing time which will be around the exact 65th percentile number.
Hint: It's something to do with how a histogram (i.e. an overview) is stored for very large data without storing all of the data.
Take one day's data. Use it to figure out what size to make your buckets. Say one day's data shows that the vast majority (95%?) of your data is within 0.5 seconds of 1 second (ridiculous values, but hang in).
To get the 65th percentile, you'll want at least 20 buckets in that range, but be generous, and make it 80. So you divide your 1 second window (from 0.5 seconds below the center to 0.5 seconds above it, i.e. 0.5 to 1.5 seconds) into 80 buckets by making each 1/80th of a second wide.
Each bucket is 1/80th of a second wide. Bucket 0 runs from (center - deviation) = (1 - 0.5) = 0.5 to 0.5 + 1/80th of a second. Bucket 1 runs from 0.5 + 1/80th to 0.5 + 2/80ths. Etc.
For every value, find out which bucket it falls in, and increment a counter for that bucket.
To find 65th percentile, get the total count, and walk the buckets from zero until you get to 65% of that total.
Whenever you want to reset, set the counters all to zero.
If you always want to have good data available, keep two of these, and alternate resetting them, using the one you reset least recently as having more useful data.
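The bucket scheme above could be sketched in Python roughly as follows. The class name, the choice to clamp out-of-range values into the first and last bucket, and the choice to report the upper edge of the bucket are all assumptions for illustration, not part of the description above.

```python
class HistogramPercentile:
    """Fixed-width bucket counters for approximate percentiles."""

    def __init__(self, low, high, nbuckets):
        self.low = low
        self.width = (high - low) / nbuckets
        self.counts = [0] * nbuckets

    def add(self, value):
        """Find the bucket the value falls in and increment its counter."""
        idx = int((value - self.low) / self.width)
        # Assumption: values outside [low, high) go into the end buckets.
        idx = min(max(idx, 0), len(self.counts) - 1)
        self.counts[idx] += 1

    def percentile(self, pct):
        """Walk the buckets from zero until pct percent of the total is reached."""
        total = sum(self.counts)
        if total == 0:
            raise ValueError("no data recorded yet")
        target = total * pct / 100.0
        running = 0
        for i, count in enumerate(self.counts):
            running += count
            if running >= target:
                # Report the upper edge of the bucket that crosses the target.
                return self.low + (i + 1) * self.width
        return self.low + len(self.counts) * self.width

    def reset(self):
        """Set all the counters back to zero."""
        self.counts = [0] * len(self.counts)
```

With the numbers from the example, HistogramPercentile(0.5, 1.5, 80) uses 80 counters regardless of how many requests arrive, and the answer is accurate to about one bucket width (1/80th of a second).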
Use an updown filter:
if q < x:
    q += .01 * (x - q) # up a little
else:
    q += .005 * (x - q) # down a little
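Here a quantile estimator q tracks the x stream,
moving a little towards each x.
If both factors were .01, it would be pulled up as strongly as down,
tracking roughly the 50 th percentile
(strictly, equal factors give an exponential moving average, which follows the mean and matches the median only for symmetric data).
With .01 up, .005 down, it floats up, to roughly the 67 th percentile;
in general, it tracks approximately the up / (up + down) th percentile.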
Bigger up/down factors track faster but noisier --
you'll have to experiment on your real data.
(I have no idea how to analyze updowns, would appreciate a link.)
The updown() below works on long vectors X, Q in order to plot them:
#!/usr/bin/env python
import sys
import numpy as np
import pylab as pl

def updown( X, Q, up=.01, down=.01 ):
    """ updown filter: running ~ up / (up + down) th percentile
        here vecs X in, Q out to plot
    """
    q = X[0]
    for j, x in np.ndenumerate(X):
        if q < x:
            q += up * (x - q)  # up a little
        else:
            q += down * (x - q)  # down a little
        Q[j] = q
    return q

#...............................................................................
if __name__ == "__main__":
    N = 1000
    up = .01
    down = .005
    plot = 0
    seed = 1
    exec( "\n".join( sys.argv[1:] ))  # python this.py N= up= down=
    np.random.seed(seed)
    np.set_printoptions( 2, threshold=100, suppress=True )  # .2f

    title = "updown random.exponential: N %d up %.2g down %.2g" % (N, up, down)
    print( title )
    X = np.random.exponential( size=N )
    Q = np.zeros(N)
    updown( X, Q, up=up, down=down )
    # M = np.zeros(N)
    # updown( X, M, up=up, down=up )
    print( "last 10 Q:", Q[-10:] )

    if plot:
        fig = pl.figure( figsize=(8,3) )
        pl.title(title)
        x = np.arange(N)
        pl.plot( x, X, "," )
        pl.plot( x, Q )
        pl.ylim( 0, 2 )
        png = "updown.png"
        print( "writing", png, file=sys.stderr )
        pl.savefig( png )
        pl.show()
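A related variant, sketched here as an alternative rather than as the updown filter itself: instead of stepping in proportion to (x - q), step by a fixed amount whose size depends only on the direction. At equilibrium the expected up and down moves cancel exactly when the fraction of samples below q equals the target quantile. The function names and the step size are assumptions for illustration, and the step has to be chosen relative to the scale of the data.

```python
import random


def track_quantile(stream, quantile, step=0.002):
    """Fixed-step quantile tracker; stores only the current estimate q."""
    it = iter(stream)
    q = next(it)  # start from the first observation
    for x in it:
        if x > q:
            q += step * quantile            # up a little
        elif x < q:
            q -= step * (1.0 - quantile)    # down a little
    return q


def exponential_stream(n, seed=1):
    """Illustrative test data: n draws from an exponential distribution."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.expovariate(1.0)
```

For example, track_quantile(exponential_stream(50000), 0.65) should settle near the true 65th percentile of the unit exponential distribution, -ln(0.35), which is about 1.05. Smaller steps give a less noisy estimate but react more slowly to changes.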
An easier way to get the value that represents a given percentile of a list or array is the scoreatpercentile function in the scipy.stats module.
>>> import scipy.stats as ss
>>> ss.scoreatpercentile(v, 65)
There's a sibling, percentileofscore, that returns the percentile given the value.
You will need to store a running sum and a total count.
Then check out standard deviation calculations.
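One way to read this suggestion, sketched in Python: keep a running count, mean and variance, and estimate the percentile from them. This only works under the assumption that the processing times are roughly normally distributed, which the question does not state, so treat it as an illustration. The class name is made up, and the variance is updated with Welford's method rather than a raw sum of squares to limit rounding error.

```python
from statistics import NormalDist


class RunningStats:
    """Running mean and standard deviation without storing the samples."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stdev(self):
        """Sample standard deviation (0.0 until there are two samples)."""
        return (self.m2 / (self.n - 1)) ** 0.5 if self.n > 1 else 0.0

    def percentile(self, pct):
        """Percentile estimate, ASSUMING the data is roughly normal."""
        z = NormalDist().inv_cdf(pct / 100.0)
        return self.mean + z * self.stdev()
```

For the 65th percentile this amounts to the mean plus roughly 0.385 standard deviations. If the real processing times are skewed, as latencies often are, the estimate can be noticeably off.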