I want to write a C program that will calculate a series:
1/x + 1/2*x^2 + 1/3*x^3 + 1/4*x^4 + ...
up to five decimal places.
The program will take x as input and print f(x) (the value of the series) to five decimal places. Can you help me?
For evaluating a polynomial, Horner form generally has better numerical stability than the expanded form. See http://reference.wolfram.com/legacy/v5/Add-onsLinks/StandardPackages/Algebra/Horner.html
If the first term was a typo, then try ((((1/4)*x + 1/3)*x + 1/2)*x + 1)*x
Else, if the first term is really 1/x: (((1/4)*x + 1/3)*x + 1/2)*x*x + 1/x
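In C, a direct transcription of the first Horner form might look like the sketch below (the function name is just illustrative; note the 1.0/4 etc., so the divisions are done in floating point rather than integer arithmetic):

double f4(double x)
{
    /* Horner evaluation of x + x^2/2 + x^3/3 + x^4/4, truncated after
     * four terms; 1.0/4 etc. keep the divisions in floating point. */
    return ((((1.0/4)*x + 1.0/3)*x + 1.0/2)*x + 1.0)*x;
}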
Of course, you still have to analyze convergence and numerical stability, as developed in Eric Postpischil's answer.
One last thing: does the series you submitted as an example really converge to a finite value for some x?
In order to know that the sum you have calculated is within a desired distance to the limit of the series, you need to demonstrate that the sources of error are less than the desired distance.
When evaluating a series numerically, there are two sources of error. One is the limitations of numerical calculation, such as floating-point rounding. The other is the sum of the remaining terms, which have not been added into the partial sum.
The numerical error depends on the calculations done. For each series you want to evaluate, a custom analysis of the error must be performed. For the sample series you show, a crude but sufficient bound on the numerical error could likely be calculated without too much effort. Is this the series you are primarily interested in, or are there others?
The sum of the remaining terms also requires a custom analysis. Often, given a series, we can find an expression that can be proven to be at least as large as the sum of all remaining terms but that is more easily calculated.
After you have established bounds on these two errors, you could sum terms of the series until the sum of the two bounds is less than the desired distance.
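As a rough illustration of that approach for the sample series (reading the first term as x, so the series is x + x^2/2 + x^3/3 + ..., which converges only for |x| < 1), here is one possible sketch in C. The stopping test uses the geometric bound |term| * |x| / (1 - |x|) on the remaining terms; floating-point rounding error is not accounted for, so this is not a complete analysis:

#include <math.h>
#include <stdio.h>

/* Sum x + x^2/2 + x^3/3 + ... until the bound on the remaining terms
 * drops below the tolerance.  Assumes |x| < 1 so the series converges. */
static double series_sum(double x, double tol)
{
    double sum = 0.0;
    double power = x;                    /* x^k, starting at k = 1 */
    for (int k = 1; ; k++) {
        double term = power / k;
        sum += term;
        /* remaining terms are bounded by |term| * |x| / (1 - |x|) */
        if (fabs(term) * fabs(x) / (1.0 - fabs(x)) < tol)
            break;
        power *= x;
    }
    return sum;
}

int main(void)
{
    double x;
    if (scanf("%lf", &x) != 1 || fabs(x) >= 1.0)
        return 1;                        /* only valid for |x| < 1 */
    printf("%.5f\n", series_sum(x, 1e-5));
    return 0;
}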
I have a very large NoSQL database. Each item in the database is assigned a uniformly distributed random value between 0 and 1. This database is so large that performing a COUNT on queries does not yield acceptable performance, but I'd like to use the random values to estimate COUNT.
The idea is this:
Run a query and order the query by the random value. Random values are indexed, so it's fast.
Grab the lowest N values, and see how big the largest value is, say R.
Estimate COUNT as N / R
The question is two-fold:
Is N / R the best way to estimate COUNT? Maybe it should be (N+1)/R? Maybe we could look at the other values (average, variance, etc), and not just the largest value to get a better estimate?
What is the error margin on this estimated value of COUNT?
Note: I thought about posting this in the math stack exchange, but given this is for databases, I thought it would be more appropriate here.
This actually would be better on math or statistics stack exchange.
The reasonable estimate is that if R is large and x is your order statistic, then R is approximately n / x - 1. (Note that the letters are used differently here than in the question: n is the index of the order statistic, x is the value of the n-th smallest random value, and R is the total count being estimated.) About 95% of the time the error will be within 2 R / sqrt(n) of this. So looking at the 100th element will estimate the right answer to within about 20%. Looking at the 10,000th element will estimate it to within about 2%. And the millionth element will get you the right answer to within about 0.2%.
To see this, start with the fact that the n-th order statistic has a Beta distribution with parameters α = n and β = R + 1 - n. This means that the mean value of the n-th smallest of R values is n/(R+1), and its variance is αβ / ((α + β)^2 (α + β + 1)). If we assume that R is much larger than n, then this is approximately n R / R^3 = n / R^2, which means that our standard deviation is sqrt(n) / R.
If x is our order statistic, this means that (n / x) - 1 is a reasonable estimate of R. And how much is it off by? Well, we can use the tangent line approximation. The function (n / x) - 1 has a derivative of -n / x^2. Its derivative at x = n/(R+1) is therefore (R + 1)^2 / n in magnitude, which for large R is roughly R^2 / n. Stick in our standard deviation of sqrt(n) / R and we come up with an error proportional to R / sqrt(n). Since a 95% confidence interval is about 2 standard deviations, you will probably have an error of around 2 R / sqrt(n).
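A minimal sketch of this estimate in C, using illustrative hard-coded values for n and x (in practice both would come from your query against the indexed random values):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double n = 10000.0;   /* index of the order statistic (example value) */
    double x = 0.0002;    /* value of the n-th smallest random key (example) */

    double estimate = n / x - 1.0;               /* estimated COUNT */
    double margin   = 2.0 * estimate / sqrt(n);  /* ~95% error margin */

    printf("COUNT is approximately %.0f (+/- %.0f)\n", estimate, margin);
    return 0;
}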
I am stuck (again) and looking for smart human beings of planet Earth to help me out.
Background
I have an application which distributes the amounts to some users in a given percentage. Say I have $35000 and it will distribute the amounts to 3 users (A, B and C) in some ratio. So the amount distributed will be
A - 5691.05459265518
B - 14654.473815207
C - 14654.4715921378
which totals up to $35000
The Problem
I have to provide the results on the front end with 2 decimal places instead of the raw float values. So I use the ROUND function of SQL Server with a precision value of 2 to round these to 2 decimal places. But the issue is that when I total these values the sum comes out to be $34999.9999 instead of $35000.
My Findings
I searched a bit and found
If the expression that you are rounding ends with a 5, the Round()
function will round the expression so that the last digit is an even
number. Here are some examples:
Round(34.55, 1) - Result: 34.6 (rounds up)
Round(34.65, 1) - Result: 34.6 (rounds down)
So technically the answer is correct, but I am looking for a function or a way to round the value to exactly what it should have been. One idea I found is to start rounding from the last digit after the decimal point (if the digit is less than 5, leave the previous digit as it is, otherwise increment the previous digit by 1) and keep backtracking like this until I am left with only 2 decimal places.
Please advise.
I'm facing the problem of computing values of a clothoid in C in real-time.
First I tried using the Matlab Coder to obtain auto-generated C code for the quadgk integrator for the Fresnel formulas. This essentially works great in my test scenarios. The only issue is that it runs incredibly slowly (in Matlab as well as in the auto-generated code).
Another option was interpolating a data table of the unit clothoid, connecting the sample points via straight lines (linear interpolation). I gave up after I found out that for only small changes in curvature (tiny steps along the clothoid) the results obviously degenerated into straight lines. What a surprise...
I know that circles may be plotted using a different formula, but small changes in curvature are often encountered in real-world scenarios, and 30k sampling points between the headings 0° and 360° didn't provide enough angular resolution for my problems.
Then I tried a Taylor approximation around the R = inf point, hoping that there would be significant curvature everywhere I wanted it to be. I soon realized I couldn't use more than 4 terms (power of 15), as the polynomial otherwise quickly becomes unstable (probably due to numerical inaccuracies in double-precision floating-point computation). Thus accuracy obviously degrades quickly for large t values. And by "large t values" I'm talking about every point on the clothoid that represents a curve of more than 90° w.r.t. the zero-curvature point.
For instance, when evaluating a road that goes from R=150m to R=125m while making a 90° turn, I'm way outside the region of valid approximation: I'm in the range of 204.5° to 294.5°, whereas my Taylor limit would be at around 90° of the unit clothoid.
I'm kinda done randomly trying out things now. I mean I could just try to spend time on the dozens of papers one finds on that topic. Or I could try to improve or combine some of the methods described above. Maybe there even exists an integrate function in Matlab that is compatible with the Coder and fast enough.
This problem is so fundamental that it feels to me I shouldn't have that much trouble solving it. Any suggestions?
About the 4 terms in the Taylor series: you should be able to use many more. A total theta of 2*pi is certainly doable with doubles.
You're probably calculating each term in isolation, according to the full formula, evaluating the full factorial and power values. That is the reason for losing precision so fast.
Instead, calculate the terms progressively, each one from the previous one. Find the formula for the ratio of the next term over the previous one in the series, and use it.
For increased precision, do not calculate in theta but rather in the distance s (so as not to lose precision in the scaling).
Your example is an extremely flat clothoid. If I made no mistake, it goes from (25/22) pi =~ 204.545° to (36/22) pi =~ 294.545° (why not include these details in your question?). Nevertheless it should be OK. Even 2 pi = 360°, the full circle (and twice that), should pose no problem.
Given: r goes from 150 to 125 over a 90-degree turn:

r * s = A^2:    150 * s = 125 * (s + x)
=> 1 + x/s = 150/125 = 1 + 25/125   =>   x/s = 1/5

theta  = s^2 / (2 A^2) = s^2 / (300 s) = s / 300          ( = (pi/2) * (25/11) = 204.545° )
theta2 = (s + x)^2 / (2 A^2) = (6/5)^2 * s / 300          ( = (pi/2) * (36/11) = 294.545° )
theta2 - theta = (36/25 - 1) * s / 300 = pi/2

=> s = 300 * (pi/2) * (25/11) = 1070.99749554
   x = s / 5 = 214.1994991
A^2 = 150 * s = 150 * 300 * (pi/2) * (25/11)
a = sqrt(2 A^2) = 300 * sqrt((pi/2) * (25/11)) = 566.83264608
The reference point is at r = Infinity, where theta = 0.
We have x = a * INT[u=0..(s/a)] cos(u^2) du, where a = sqrt(2 r s) and theta = (s/a)^2. Write out the Taylor series for cos and integrate it term by term to get your Taylor approximation for x as a function of the distance s along the curve from the zero point. That's all.
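A sketch of that progressive-term idea in C, under the convention above (term-by-term integration gives INT[0..t] cos(u^2) du = sum over k of (-1)^k t^(4k+1) / ((4k+1) (2k)!), so each term is the previous one times -t^4 (4k-3) / ((4k+1) (2k) (2k-1)); the function names and the convergence cutoff are my own choices):

#include <math.h>

/* C(t) = integral from 0 to t of cos(u^2) du, via its Taylor series,
 * with each term derived from the previous one instead of evaluating
 * factorials and powers from scratch. */
static double fresnel_cos(double t)
{
    double term = t;                     /* k = 0 term */
    double sum  = term;
    double t4   = t * t * t * t;
    for (int k = 1; k <= 60; k++) {
        /* ratio of term k to term k-1 */
        term *= -t4 * (4.0 * k - 3.0)
                / ((4.0 * k + 1.0) * (2.0 * k) * (2.0 * k - 1.0));
        sum += term;
        if (fabs(term) < 1e-15 * fabs(sum))
            break;                       /* converged to double precision */
    }
    return sum;
}

/* x on the clothoid at arc length s > 0 from the r = infinity point,
 * with a = sqrt(2 r s) as above. */
double clothoid_x(double r, double s)
{
    double a = sqrt(2.0 * r * s);
    return a * fresnel_cos(s / a);
}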
Next you have to decide with what density to calculate your points along the clothoid. You can find it from a desired tolerance value above the chord, for your minimal radius of 125; a small sketch follows below. These points then define the approximation of the curve by line segments drawn between consecutive points.
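For that spacing, the standard chord/sagitta relation gives a bound on the segment length; this helper is my own addition, not part of the original answer:

#include <math.h>

/* Longest line segment whose sagitta (maximum deviation of the chord
 * from the arc) stays below tol at the tightest radius r_min, from
 * tol = r - sqrt(r^2 - (L/2)^2). */
double max_segment_length(double r_min, double tol)
{
    return 2.0 * sqrt(2.0 * r_min * tol - tol * tol);
}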
I am doing my thesis in the same area right now.
My approach is the following.
At each point on your clothoid, calculate (change in heading) / (distance traveled along the clothoid); this simple formula gives you the curvature at each point.
Then plot each curvature value: the x-axis is the distance along the clothoid, the y-axis is the curvature. By plotting this and applying a very simple linear regression algorithm (search for a Peucker algorithm implementation in your language of choice),
you can easily identify which curve sections have a curvature of zero (a line has no curvature), a linearly increasing or decreasing curvature (an Euler spiral, CCW/CW), or a constant value != 0 (an arc has constant curvature at all points on it).
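A minimal sketch of that curvature computation in C (the array names and layout are my own assumptions; heading is in radians and dist[i] is the arc length between point i and point i+1):

/* Discrete curvature estimate at each sample point: change in heading
 * divided by distance traveled between consecutive points. */
void curvature_profile(const double heading[], const double dist[],
                       double kappa[], int n)
{
    for (int i = 0; i + 1 < n; i++)
        kappa[i] = (heading[i + 1] - heading[i]) / dist[i];
}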
I hope this will help you a little bit.
You can find my code on GitHub; I implemented some algorithms for such problems, like the Peucker algorithm.
The setup
I am writing code for dealing with polynomials of degree n in a d-dimensional variable x and ran into a problem that others have likely faced in the past. Such a polynomial can be characterized by coefficients c(alpha) corresponding to x^alpha, where alpha is a length-d multi-index specifying the powers to which the d variables must be raised.
The dimension and order are completely general, but known at compile time, and could be easily as high as n = 30 and d = 10, though probably not at the same time. The coefficients are dense, in the sense that most coefficients are non-zero.
The number of coefficients required to specify such a polynomial is (n + d) choose n, which in high dimensions is much less than the n^d coefficients that could fill a cube of side length n. As a result, in my situation I have to store the coefficients rather compactly. This comes at a price, because retrieving the coefficient for a given multi-index alpha requires knowing its location.
The question
Is there a (straightforward) function mapping a d-dimensional multi-index alpha to a position in an array of length (n + d) choose n?
Ordering combinations
A well-known way to order combinations can be found on this wikipedia page. Very briefly, you order the combinations lexicographically so you can easily count the number of lower combinations. An explanation can be found in the sections Ordering combinations and Place of a combination in the ordering.
Precomputing the binomial coefficients will speed up the index calculation.
Associating monomials with combinations
If we can now associate each monomial with a combination, we can effectively order them with the method above. Since each coefficient corresponds to such a monomial, this provides the answer you're looking for. Luckily, if
alpha = (a[1], a[2], ..., a[d])
then the combination you're looking for is
combination = (a[1] + 0, a[1] + a[2] + 1, ..., a[1] + a[2] + ... + a[d] + d - 1)
The index can then readily be calculated with the formula from the wikipedia page.
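A possible C sketch of that mapping (using 0-based indexing, so alpha[0..d-1] holds the multi-index; binom is a plain multiplicative binomial coefficient and the names are mine):

#include <stdint.h>

/* Binomial coefficient C(n, k); exact at each step because the running
 * product is always itself a binomial coefficient. */
static uint64_t binom(unsigned n, unsigned k)
{
    if (k > n) return 0;
    uint64_t r = 1;
    for (unsigned i = 1; i <= k; i++)
        r = r * (n - k + i) / i;
    return r;
}

/* Rank of the monomial x^alpha: build the strictly increasing combination
 *   c[i] = alpha[0] + ... + alpha[i] + i      (i = 0 .. d-1)
 * and sum C(c[i], i + 1), as in the combinatorial number system. */
uint64_t monomial_index(const unsigned alpha[], unsigned d)
{
    uint64_t idx = 0;
    unsigned c = 0;
    for (unsigned i = 0; i < d; i++) {
        c += alpha[i];               /* partial sum alpha[0] + ... + alpha[i] */
        idx += binom(c + i, i + 1);  /* combination element c[i] = c + i */
    }
    return idx;
}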
A better, more object-oriented solution would be to create Monomial and Polynomial classes. The Polynomial class would encapsulate a collection of Monomials. That way you can easily model a pathological case like
y(x) = 1.0 + x^50
using just two terms rather than 51.
Another solution would be a map/dictionary where the key is the exponent and the value is the coefficient. That would only require two entries for my pathological case. You're in business if you have a C/C++ hash map.
Personally, I don't think doing it the naive way with arrays is so terrible, even with a polynomial containing 1000 terms. RAM is cheap; that array won't make or break you.
Is there some advantage of writing
t = linspace(0,20,21)
over
t = 0:1:20
?
I understand the former produces a vector, just as the latter does.
Can anyone give me a situation where linspace is more useful than t = 0:1:20?
It's not just the usability. Though the documentation says:
The linspace function generates linearly spaced vectors. It is
similar to the colon operator :, but gives direct control over the
number of points.
they are not quite the same. The main difference and advantage of linspace is that it generates a vector of integers with the desired length (or 100 by default) and scales it afterwards to the desired range. The colon operator : creates the vector directly by adding up increments.
Imagine you need to define bin edges for a histogram, and in particular you need the bin edge 0.35 to land exactly in its right place:
edges = [0.05:0.10:.55];
X = edges == 0.35
edges = 0.0500 0.1500 0.2500 0.3500 0.4500 0.5500
X = 0 0 0 0 0 0
does not define the right bin edge, but:
edges = linspace(0.05,0.55,6); %// 6 = (0.55-0.05)/0.1+1
X = edges == 0.35
edges = 0.0500 0.1500 0.2500 0.3500 0.4500 0.5500
X = 0 0 0 1 0 0
does.
Well, it's basically a floating-point issue, which can be avoided by linspace, as a single division of an integer is not as delicate as the cumulative sum of floating-point numbers. But as Mark Dickinson pointed out in the comments:
You shouldn't rely on any of the computed values being exactly what you expect. That is not what linspace is for.
In my opinion it's a matter of how likely you are to get floating-point issues, how much you can reduce their probability, and how small you can set the tolerances. Using linspace can reduce the probability of these issues occurring, but it's not a guarantee.
That's the code of linspace:
n1 = n-1;
c = (d2 - d1).*(n1-1);   % opposite signs may cause overflow
if isinf(c)
    y = d1 + (d2/n1).*(0:n1) - (d1/n1).*(0:n1);
else
    y = d1 + (0:n1).*(d2 - d1)/n1;
end
To sum up: linspace and colon are reliable at doing different tasks. linspace tries to ensure (as the name suggests) linear spacing, whereas colon tries to ensure symmetry.
In your special case, as you create a vector of integers, there is no advantage to linspace (apart from usability), but when it comes to floating-point-delicate tasks, there may be.
Sam Roberts's answer provides some additional information and clarifies things further, including some statements from MathWorks regarding the colon operator.
linspace and the colon operator do different things.
linspace creates a vector of integers of the specified length, and then scales it down to the specified interval with a division. In this way it ensures that the output vector is as linearly spaced as possible.
The colon operator adds increments to the starting point, and subtracts decrements from the end point to reach a middle point. In this way, it ensures that the output vector is as symmetric as possible.
The two methods thus have different aims, and will often give very slightly different answers, e.g.
>> a = 0:pi/1000:10*pi;
>> b = linspace(0,10*pi,10001);
>> all(a==b)
ans =
0
>> max(a-b)
ans =
3.5527e-15
In practice, however, the differences will often have little impact unless you are interested in tiny numerical details. I find linspace more convenient when the number of gaps is easy to express, whereas I find the colon operator more convenient when the increment is easy to express.
See this MathWorks technical note for more detail on the algorithm behind the colon operator. For more detail on linspace, you can just type edit linspace to see exactly what it does.
linspace is useful where you know the number of elements you want rather than the size of the "step" between them. So if I said, as a contrived example, make a vector with 360 elements between 0 and 2*pi, it's either going to be
linspace(0, 2*pi, 360)
or if you just had the colon operator you would have to manually calculate the step size:
0:(2*pi - 0)/(360-1):2*pi
linspace is just more convenient.
For a simple real-world application, see this answer, where linspace is helpful in creating a custom colour map.