Difference between symbolic differentiation and automatic differentiation? - symbolic-math

I just cannot seem to understand the difference. For me it looks like both just go through an expression and apply the chain rule. What am I missing?

There are 3 popular methods to calculate the derivative:
Numerical differentiation
Symbolic differentiation
Automatic differentiation
Numerical differentiation relies on the definition of the derivative, $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$, where you pick a very small h and evaluate the function at two points. This is the most basic formula; in practice people use other formulas that give a smaller estimation error. This way of calculating a derivative is suitable mostly if you do not know your function and can only sample it. It also requires a lot of computation for a high-dimensional function, since each partial derivative needs at least one extra function evaluation.
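A minimal C sketch of the basic two-point formula (forward difference) next to the central difference, one of the more accurate formulas used in practice. The test function cube and the step sizes are illustrative assumptions, not tuned values.

```c
#include <math.h>

/* Forward difference: (f(x + h) - f(x)) / h.
   Two function evaluations; truncation error is O(h). */
double forward_diff(double (*f)(double), double x, double h)
{
    return (f(x + h) - f(x)) / h;
}

/* Central difference: (f(x + h) - f(x - h)) / (2h).
   Also two evaluations, but the truncation error is O(h^2). */
double central_diff(double (*f)(double), double x, double h)
{
    return (f(x + h) - f(x - h)) / (2.0 * h);
}

/* Illustrative test function: f(x) = x^3, so f'(2) = 12. */
double cube(double x)
{
    return x * x * x;
}
```

With h = 1e-3 the forward difference for cube at x = 2 is off by about 6e-3, the central difference by about 1e-6. Shrinking h much further eventually makes both worse, because f(x + h) - f(x) loses digits to rounding. For a function of n variables, a forward-difference gradient needs n + 1 evaluations (central: 2n).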
Symbolic differentiation manipulates mathematical expressions. If you have ever used MATLAB or Mathematica, you have seen this: you give it an expression and it returns an expression for the derivative.
Here the system knows the derivative of every elementary expression and uses rules (product rule, chain rule, ...) to combine them into the derivative of the whole expression. It then simplifies that result to obtain the final expression.
Automatic differentiation manipulates blocks of computer programs. A differentiator has a rule for taking the derivative of each element of a program (when you define any op in core TensorFlow (TF), you need to register a gradient for that op). It also uses the chain rule to break complex expressions into simpler ones. Here is a good example of how it works in real TF programs, with some explanation.
You might think that automatic differentiation is the same as symbolic differentiation (one operates on math expressions, the other on computer programs). And yes, they are sometimes very similar. But for control flow statements (if, while, loops) the results can be very different:
symbolic differentiation leads to inefficient code (unless carefully done) and faces the difficulty of converting a computer program into a single expression
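The control-flow case can be sketched in C with dual numbers (an assumed example program, not from the thread): newton_sqrt computes √a with a data-dependent branch and a loop whose exit depends on the data. AD simply differentiates the path that actually executes; there is no single closed-form expression to hand to a symbolic differentiator, and unrolling the loop into one would give an expression whose size grows with the iteration count. All names are hypothetical.

```c
#include <math.h>

typedef struct { double val, dot; } Dual;   /* value and d(value)/da */

static Dual dual(double v, double d) { Dual r = { v, d }; return r; }
static Dual d_add(Dual a, Dual b)    { return dual(a.val + b.val, a.dot + b.dot); }
static Dual d_scale(Dual a, double c) { return dual(c * a.val, c * a.dot); }
static Dual d_div(Dual a, Dual b)
{
    /* quotient rule: (a/b)' = (a' b - a b') / b^2 */
    return dual(a.val / b.val, (a.dot * b.val - a.val * b.dot) / (b.val * b.val));
}

/* sqrt(a) for a > 0 via Newton's iteration y <- (y + a/y) / 2.
   The branch and the loop are plain C; AD just follows the path taken. */
Dual newton_sqrt(Dual a)
{
    Dual y = (a.val > 1.0) ? a : dual(1.0, 0.0);   /* data-dependent branch */
    for (int i = 0; i < 100; ++i) {                 /* data-dependent loop exit */
        Dual next = d_scale(d_add(y, d_div(a, y)), 0.5);
        double step = fabs(next.val - y.val);
        y = next;
        /* Stop on a tiny step rather than a tiny residual: the last update
           then started from an already-converged y, so the propagated
           derivative is accurate as well. */
        if (step <= 1e-12 * y.val)
            break;
    }
    return y;
}
```

Since the iteration converges to √a, the propagated derivative approaches 1/(2√a). Strictly speaking, AD differentiates the computed approximation, not the mathematical function.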

It is a common claim that automatic differentiation and symbolic differentiation are different. However, this is not true: forward-mode automatic differentiation and symbolic differentiation are in fact equivalent. Please see this paper.
In short, they both apply the chain rule from the input variables to the output variables of an expression graph. It is often said that symbolic differentiation operates on mathematical expressions and automatic differentiation on computer programs. In the end, both are actually represented as expression graphs.
On the other hand, automatic differentiation also provides more modes. For instance, applying the chain rule from the output variables back to the input variables is called reverse-mode automatic differentiation.

"For me it looks like both just go through an expression and apply the chain rule. What am I missing?"
What you're missing is that AD works with numerical values, while symbolic differentiation works with symbols which represent those values. Let's look at a simple example to flesh this out.
Suppose I want to compute the derivative of the expression y = x^2.
If I were doing symbolic differentiation, I would start with the symbol x, square it to get y = x^2, and then apply the differentiation rules to find that dy/dx = 2x. Now, if I want the derivative at x = 5, I can plug that into my expression and get the derivative. But since I have the expression for the derivative, I can plug in any value of x and compute the derivative without having to repeat the differentiation.
If I were doing automatic differentiation, I would start with the value x = 5, and then compute y = 5^2 = 25, and compute the derivative as dy/dx = 2*5 = 10. I would have computed the value and the derivative. However, I know nothing about the value of the derivative at x=4. I would have to repeat the process with x=4 to get the derivative at x=4.

Related

How expressive can we be with arrays in Z3(Py)? An example

My first question is whether I can express the following formula in Z3Py:
Exists i::Integer s.t. (0<=i<|arr|) & (avg(arr)+t<arr[i])
This means: whether there is a position i with 0 <= i < |arr| in the array whose value arr[i] is greater than the average of the array, avg(arr), plus a given threshold t.
I know this kind of expressions can be queried in Dafny and (since Dafny uses Z3 below) I guess this can be done in Z3Py.
My second question is: how expressive is the decidable fragment involving arrays in Z3?
I read this paper showing that the full theory of arrays is not decidable (http://theory.stanford.edu/~arbrad/papers/arrays.pdf); only a specific fragment, the array property fragment, is.
Is there any interesting paper/tutorial on what can and cannot be done with arrays+quantifiers+functions in Z3?
You found the best paper to read regarding reasoning with arrays, so I doubt there's a better resource or tutorial out there for you.
I think the sequence logic (not yet officially supported by SMT-LIB, but z3 supports it) is the right logic to use for reasoning about these sorts of problems, see: https://microsoft.github.io/z3guide/docs/theories/Sequences/
Having said that, most properties about arrays/sequences of "arbitrary size" require inductive proofs. This is because most interesting functions on them are essentially recursive (or iterative), and induction is the only way to prove properties of such programs. While SMT solvers have improved significantly in their support for recursive definitions and induction, they still don't perform anywhere near as well as a traditional theorem prover. (This is, of course, to be expected.)
I'd recommend looking at the sequence logic and playing around with recursive definitions. You might get some mileage out of that, though don't expect proofs of anything that requires induction, especially if the inductive hypothesis needs some clever invariant to be specified.
Note that if you know the length of your array concretely (i.e., 10, 15, or some other, hopefully not too large, number), then it's best to allocate the elements symbolically yourself and not use arrays/sequences at all. (And you can repeat your proof for lengths 0, 1, 2, ... up to some fixed number.) But if you want proofs that work for arbitrary lengths, your best bet is to use sequences in z3, not arrays, with all the caveats I mentioned above.

What's the best way to read a mathematical function f(x,y) command line argument?

From main(), I want the user to input a mathematical function (e.g. 2xy) through the command line. From there, I initially thought to iterate through the string and parse out the different arithmetic operators, x, y, etc. However, this could become fairly complicated for more intricate functions (e.g. (2x^2)/5 + sqrt(x^4)). Is there a more general method to parse a mathematical function string like this one?
One of the most helpful ways to deal with parsing issues like that is to switch the input format from infix equations to RPN (Reverse Polish Notation), where the arguments come first and the operators come last.
Rewriting your complex equation would end up looking like:
2 x 2 ^ * 5 / x 4 ^ sqrt +
This is generally easier to implement, as you can do it with a simple stack: new arguments are pushed on, while each operator pops the pieces it requires off the stack and pushes the result back on. This greatly simplifies the parsing, but you still need to implement the functions.
What you need is an expression evaluator.
A while ago, I wrote a complete C expression evaluator (i.e. evaluated expressions written using C syntax) for a command line processor and scripting language on an embedded system. I used this description of the algorithm as a starting point. You could use the accompanying code directly, but I did not like the implementation, and wrote my own from the algorithm description.
It needed some work to support all C operators, function calls and variables, but it is a clear explanation and therefore a good starting point, especially if you don't need that level of completeness.
The basic principle is that expression evaluation is easier for a computer using a stack and 'Reverse Polish Notation', so the algorithm converts an in-fix notation expression with associated order of precedence and parentheses to RPN, and then evaluates it by popping operands, performing operations, and pushing results, until there are no operations left and one value left on the stack.
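The infix-to-RPN conversion step is Dijkstra's shunting-yard algorithm. A minimal C sketch, assuming numbers, single-letter variables, the binary operators + - * / ^ (with ^ right-associative) and parentheses, but no unary minus or function calls; to_rpn is a hypothetical name, and the caller must supply a large enough output buffer.

```c
#include <ctype.h>
#include <string.h>

/* Operator precedence; 0 means "not a binary operator". */
static int prec(char op)
{
    switch (op) {
    case '^': return 3;
    case '*': case '/': return 2;
    case '+': case '-': return 1;
    default: return 0;
    }
}

/* Append a token to out, separated by a space. */
static void emit(char *out, const char *tok)
{
    if (out[0] != '\0') strcat(out, " ");
    strcat(out, tok);
}

static void emit_char(char *out, char c)
{
    char tok[2] = { c, '\0' };
    emit(out, tok);
}

/* Convert infix `in` to space-separated RPN in `out`.
   Returns 0 on success, -1 on malformed input or stack overflow. */
int to_rpn(const char *in, char *out)
{
    char ops[64];                        /* operator stack */
    int top = 0;
    const char *p = in;
    out[0] = '\0';

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            ++p;
        } else if (isdigit((unsigned char)*p) || *p == '.') {
            char num[32];
            int n = 0;
            while ((isdigit((unsigned char)*p) || *p == '.') && n < 31)
                num[n++] = *p++;
            if (isdigit((unsigned char)*p) || *p == '.') return -1;  /* too long */
            num[n] = '\0';
            emit(out, num);              /* operands go straight to the output */
        } else if (isalpha((unsigned char)*p)) {
            emit_char(out, *p++);        /* single-letter variable */
        } else if (*p == '(') {
            if (top == 64) return -1;
            ops[top++] = *p++;
        } else if (*p == ')') {
            while (top > 0 && ops[top - 1] != '(')
                emit_char(out, ops[--top]);
            if (top == 0) return -1;     /* unbalanced ')' */
            --top;                       /* discard the '(' */
            ++p;
        } else if (prec(*p) > 0) {
            char o = *p++;
            /* Pop operators that bind tighter, or equally tight when o is
               left-associative (every operator here except '^'). */
            while (top > 0 && ops[top - 1] != '(' &&
                   (prec(ops[top - 1]) > prec(o) ||
                    (prec(ops[top - 1]) == prec(o) && o != '^')))
                emit_char(out, ops[--top]);
            if (top == 64) return -1;
            ops[top++] = o;
        } else {
            return -1;                   /* unexpected character */
        }
    }
    while (top > 0) {
        if (ops[top - 1] == '(') return -1;   /* unbalanced '(' */
        emit_char(out, ops[--top]);
    }
    return 0;
}
```

The output string can then be fed to a stack-based RPN evaluator; for instance "2*x^2/5" becomes "2 x 2 ^ * 5 /".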
It might get a bit more complicated if you choose to deal with implicit multiplication (2xy rather than 2 * x * y, for example), not least because you'd need to unambiguously distinguish the variables x and y from a single variable xy. That is probably only feasible if you only allow single-character variable names. I suggest you either do that and insert explicit multiply operators on the operator stack as part of the parse, or you disallow implicit multiplication.
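One simple way to do the "insert explicit multiply operators" step is a pre-pass over the string before parsing. A minimal C sketch, assuming the only variables are the single characters x and y, that there are no function names, and that whitespace can be dropped; insert_implicit_mul is a hypothetical name. A '*' is inserted whenever something that ends a value (digit, variable, ')') is directly followed by something that starts one (variable, '(').

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rewrite "2xy" as "2*x*y", "2(x+1)y" as "2*(x+1)*y", and so on.
   Whitespace is dropped. Returns 0 on success, -1 if `out` (of
   capacity `cap` bytes) is too small. */
int insert_implicit_mul(const char *in, char *out, size_t cap)
{
    size_t n = 0;
    char prev = '\0';                    /* last character copied */

    for (const char *p = in; *p != '\0'; ++p) {
        char c = *p;
        if (isspace((unsigned char)c)) continue;
        int prev_ends_value = isdigit((unsigned char)prev) || prev == '.' ||
                              prev == 'x' || prev == 'y' || prev == ')';
        int c_starts_value = (c == 'x' || c == 'y' || c == '(');
        if (prev_ends_value && c_starts_value) {
            if (n + 1 >= cap) return -1;
            out[n++] = '*';
        }
        if (n + 1 >= cap) return -1;     /* keep room for the terminator */
        out[n++] = c;
        prev = c;
    }
    out[n] = '\0';
    return 0;
}
```

Cases such as "x2" are left untouched here, since whether they mean x*2 or a variable named x2 is exactly the ambiguity described above.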

What is the best way to find an input for a function if you already know the output?

I'm working on a fairly complicated program here and unfortunately I've painted myself into a bit of a corner.
I have a function (let's call it f(x) for simplicity) that I know the output value of, and I need to find the input value that generates that output value (to within a certain threshold).
Unfortunately the equations behind f(x) are fairly complicated and I don't have all the information I need to simply run them in reverse, so I'm forced to perform some sort of brute-force search to find the right input value instead.
The outputs for f(x) are guaranteed to be ordered, in such a way that f(x - 1) < f(x) < f(x + 1) is always true.
What is the most efficient way to find the value of x? I'm not entirely sure if this is a "root finding" problem- it seems awfully close, but not quite. I figure there's gotta be some official name for this sort of algorithm, but I haven't been able to find anything on Google.
I'm assuming that x is an integer, so the condition f(x - 1) < f(x) < f(x + 1) means that the function is strictly monotonic.
I'll also assume your function is not pathological, such as
f(x) = x * cos(2 * pi * x)
which satisfies your property but has all sorts of nasties between integer values of x.
A linear bisection algorithm is appropriate and tractable here (and you could adapt it to functions that are badly behaved for non-integral x); Brent's method might recover the solution faster. Such algorithms may well return a non-integral value of x, but you can always check the integers on either side of it and return the best one (that will work if the function is monotonic in all real values of x). Furthermore, if you have an analytic first derivative of f(x), then an adaptation of Newton-Raphson might work well, constraining x to be integral (which might not make much sense, depending on your function; it would be disastrous to apply it to the pathological example above!). Newton-Raphson is cute since you only need one starting point, unlike linear bisection and Brent, which both require the root to be bracketed.
Do Google the method names mentioned above.
Reference: Brent's Method - Wikipedia
For a general function, I would do the following:
Evaluate at 0 and determine whether x is positive or negative.
Assuming it is positive, evaluate at powers of 2 (1, 2, 4, 8, ...) until you bound the value.
Once you have bounds, do repeated bisection until you reach the precision you are looking for.
If this is being called multiple times, I would cache the values along the way to reduce the time needed for subsequent operations.
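These steps are an exponential (galloping) search followed by a binary search. A minimal C sketch, assuming x is an integer, f is strictly increasing, and the answer fits in a long (no overflow handling); inverse_floor and example_cubic are hypothetical names.

```c
/* Largest integer x with f(x) <= target, for a strictly increasing f. */
long inverse_floor(double (*f)(long), double target)
{
    long lo, hi;   /* once bracketed: f(lo) <= target < f(hi) */

    /* Step 1: the sign of the answer, from f(0). */
    if (f(0) <= target) {
        /* Step 2: gallop up through 1, 2, 4, 8, ... until f(hi) > target. */
        lo = 0;
        hi = 1;
        while (f(hi) <= target) {
            lo = hi;
            hi *= 2;
        }
    } else {
        /* Same, going down through -1, -2, -4, ... */
        hi = 0;
        lo = -1;
        while (f(lo) > target) {
            hi = lo;
            lo *= 2;
        }
    }
    /* Step 3: bisect the bracket down to two adjacent integers. */
    while (hi - lo > 1) {
        long mid = lo + (hi - lo) / 2;
        if (f(mid) <= target)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

/* Illustrative strictly increasing function. */
double example_cubic(long x)
{
    return (double)x * (double)x * (double)x;
}
```

To get the nearest integer rather than the floor, compare |f(lo) - target| with |f(lo + 1) - target| on the returned lo. Wrapping f in a cache would avoid re-evaluating the same points across calls.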

Computational Efficiency of Forward Mode Automatic vs Numeric vs Symbolic Differentiation

I am trying to solve a problem of finding the roots of a function using the Newton-Raphson (NR) method in the C language. The functions in which I would like to find the roots are mostly polynomial functions but may also contain trigonometric and logarithmic.
The NR method requires finding the differential of the function. There are 3 ways to implement differentiation:
Symbolic
Numerical
Automatic (with sub types being forward mode and reverse mode. For this particular question, I would like to focus on forward mode)
I have thousands of these functions all requiring finding roots in the quickest time possible.
From the little that I do know, automatic differentiation is in general quicker than symbolic differentiation because it handles the problem of "expression swell" a lot more efficiently.
My question therefore is, all other things being equal, which method of differentiation is more computationally efficient: Automatic Differentiation (and more specifically, forward mode) or Numeric differentiation?
If your functions are truly all polynomials, then symbolic derivatives are dirt simple. Letting the coefficients of the polynomial be stored in an array with entries p[k] = a_k, where index k corresponds to the coefficient of x^k, the derivative is represented by the array with entries dp[k] = (k+1) p[k+1]. For multivariable polynomials, this extends straightforwardly to multidimensional arrays. If your polynomials are not in standard form, e.g. if they include terms like (x-a)^2 or ((x-a)^2-b)^3 or whatever, a little bit of work is needed to transform them into standard form, but this is something you probably should be doing anyway.
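A minimal C sketch of that array representation for a single-variable polynomial in standard form. Horner's rule for evaluation and the Newton loop around it are additions for illustration, and all names are hypothetical.

```c
#include <math.h>

/* p[k] is the coefficient of x^k, for k = 0..n (n = degree). */

/* Horner's rule: evaluates p at x with n multiplications. */
double poly_eval(const double *p, int n, double x)
{
    double r = p[n];
    for (int k = n - 1; k >= 0; --k)
        r = r * x + p[k];
    return r;
}

/* Derivative coefficients: dp[k] = (k+1) * p[k+1], for k = 0..n-1. */
void poly_deriv(const double *p, int n, double *dp)
{
    for (int k = 0; k < n; ++k)
        dp[k] = (k + 1) * p[k + 1];
}

/* Newton-Raphson with the exact derivative. Returns NAN if the degree
   is out of range, the derivative vanishes, or max_iter is exceeded. */
double poly_newton(const double *p, int n, double x0, double tol, int max_iter)
{
    double dp[64];                       /* sketch: supports degree 1..64 */
    if (n < 1 || n > 64) return NAN;
    poly_deriv(p, n, dp);

    double x = x0;
    for (int i = 0; i < max_iter; ++i) {
        double d = poly_eval(dp, n - 1, x);
        if (d == 0.0) return NAN;
        double step = poly_eval(p, n, x) / d;
        x -= step;
        if (fabs(step) <= tol) return x;
    }
    return NAN;
}
```

Evaluating the derivative array with Horner's rule costs about as much as evaluating p itself, so each Newton step here is roughly two polynomial evaluations.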
If the derivative is not available, you should consider using the secant or regula falsi methods. The secant method has a very decent convergence order (φ ≈ 1.618 instead of quadratic); plain regula falsi can converge only linearly, but modified versions such as the Illinois method are superlinear. An additional benefit of regula falsi is that the iterates remain confined to the initial interval, which allows reliable root separation (which Newton does not).
Also note that in the case of numerical evaluation of the derivatives, you will require several evaluations of the function per iteration, at best two. The effective convergence rate per function evaluation then drops to √2, which is outperformed by the derivative-free methods.
Also note that the symbolic expression of the derivatives is often more costly to evaluate than the functions themselves. So one iteration of Newton costs at least two function evaluations, spoiling the benefit of the convergence rate.
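The √2 figure can be made precise with Ostrowski's efficiency index, E = p^{1/m}, where p is the order of convergence and m the number of function (or derivative) evaluations per iteration:

$$
\begin{aligned}
E_{\text{Newton}} &= 2^{1/2} \approx 1.414 && (p = 2,\ m = 2) \\
E_{\text{secant}} &= \varphi^{1/1} = \tfrac{1 + \sqrt{5}}{2} \approx 1.618 && (p = \varphi,\ m = 1)
\end{aligned}
$$

So as soon as the derivative costs as much as another function evaluation, the secant method gains more accuracy per evaluation than Newton does.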

optimizing with IEEE floating point - guaranteed mathematical identities?

I am having some trouble with IEEE floating point rules preventing compiler optimizations that seem obvious. For example,
char foo(float x) {
    if (x == x)
        return 1;
    else
        return 0;
}
cannot be optimized to just return 1 because NaN == NaN is false. Okay, fine, I guess.
However, I want to write code such that the optimizer can actually fix stuff up for me. Are there mathematical identities that hold for all floats? For example, I would be willing to write !(x - x) if it meant the compiler could assume that it held all the time (though that also isn't the case).
I see some reference to such identities on the web, for example here, but I haven't found any organized information, including in a light scan of the IEEE 754 standard.
It'd also be fine if I could get the optimizer to assume isnormal(x) without generating additional code (in gcc or clang).
Clearly I'm not actually going to write (x == x) in my source code, but I have a function that's designed for inlining. The function may be declared as foo(float x, float y), but often x is 0, or y is 0, or x and y are both z, etc. The floats represent onscreen geometric coordinates. These are all cases where if I were coding by hand without use of the function I'd never distinguish between 0 and (x - x), I'd just hand-optimize stupid stuff away. So, I really don't care about the IEEE rules in what the compiler does after inlining my function, and I'd just as soon have the compiler ignore them. Rounding differences are also not very important since we're basically doing onscreen drawing.
I don't think -ffast-math is an option for me, because the function appears in a header file, and it is not appropriate that the .c files that use the function compile with -ffast-math.
Another reference that might be of some use to you is a really nice article on floating-point optimization in Game Programming Gems volume 2, by Yossarian King. You can read the article here. It discusses the IEEE format in considerable detail, taking implementations and architecture into account, and provides many optimization tricks.
I think that you are always going to struggle to make computer floating-point arithmetic behave like mathematical real-number arithmetic, and I suggest that you don't try. I think you are making a type error by comparing two fp numbers for equality. Since fp numbers are, in the overwhelming majority of cases, approximations, you should accept this and use approximate equality as your test.
Computer integers exist for equality testing of numerical values.
Well, that's what I think, you go ahead and fight the machine (well, all the machines actually) if you wish.
Now, to answer some parts of your question:
-- for every mathematical identity you are familiar with from real-number arithmetic, there are counterexamples in the domain of floating-point numbers, whether IEEE or otherwise;
-- 'clever' programming almost always makes it more difficult for a compiler to optimise code than straightforward programming;
-- it seems that you are doing some graphics programming: in the end, the coordinates of points in your conceptual space are going to be mapped to pixels on a screen; pixels always have integer coordinates; your translation from conceptual space to screen space defines your approximate-equality function.
Regards
Mark
If you can assume that floating-point numbers used in this module will not be Inf/NaN, you can compile it with -ffinite-math-only (in GCC). This may "improve" the codegen for examples like the one you posted.
You could compare for bitwise equality. Although you might get bitten for some values that are equivalent but bitwise different, it will catch all those cases where you have a true equality as you mentioned. And I am not sure the compiler will recognize what you do and remove it when inlining (which I believe is what you are after), but that can easily be checked.
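A minimal C sketch of a bitwise comparison, assuming a 32-bit IEEE float (checked with a static assertion); bitwise_equal and foo_bitwise are hypothetical names. Whether the optimizer actually folds foo_bitwise to return 1 after inlining is exactly what the generated assembly would tell you.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float is assumed to be 32 bits");

/* Compare two floats by bit pattern. memcpy is the portable way to
   reinterpret the bits (a pointer cast would break strict aliasing).
   Unlike ==, a NaN equals an identical NaN bit pattern here, while
   +0.0f and -0.0f differ (their sign bits differ). */
bool bitwise_equal(float a, float b)
{
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}

/* The question's foo(), using the bitwise comparison: returns 1 for
   every x, NaN included. */
char foo_bitwise(float x)
{
    return bitwise_equal(x, x) ? 1 : 0;
}
```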
What happened when you tried it the obvious way and profiled it? Or examined the generated asm?
If the function is inlined with values known at the call site, the optimizer has this information available. For example: foo(0, y).
You may be surprised at the work you don't have to do, but at the very least profiling or looking at what the compiler actually does with the code will give you more information and help you figure out where to proceed next.
That said, if you know certain things that the optimizer can't figure out itself, you can write multiple versions of the function, and specify the one you want to call. This is something of a hassle, but at least with inline functions they will all be specified together in one header. It's also quite a bit easier than the next step, which is using inline asm to do exactly what you want.
