From what I'm reading, building the constant-1 and constant-0 operations in a quantum computer involves building something like this, where two qubits are used. Why do we need two?
The bottom qubit in both examples is not being used at all, so it has no impact on the operation. Both operations seemingly only work if the top qubit's initial value is 0, so surely all this is saying is that this is an operation which either flips a 0 or leaves it alone. In that case, what is the second qubit needed for? Wouldn't a set-to-0 function set the input to 0 whatever it is, without needing one of its inputs to be predetermined?
Granted, the 'output' qubit is for output, but doesn't its value still need to be predetermined going into the operation?
Update: I've posted this on the quantum computing stack exchange with links to a couple of blogs/video where you can see the below being brought up.
I was going through the topic of induction variable elimination in the red dragon book, where I came across the following example.
Consider the control flow graph below :
Fig. 1 : Original Control Flow Graph
Now the authors apply strength reduction to the above graph to obtain the graph below:
Fig 2: Control flow graph after applying strength reduction
Example 10.4. After reduction in strength is applied to the inner loops around B2 and B3, the only use of i and j is to determine the outcome of the test in block B4. We know that the values of i and t2 satisfy the relationship t2 = 4*i, while those of j and t4 satisfy the relationship t4 = 4* j, so the test t2>=t4 is equivalent to i> = j. Once this replacement is made, i in block B2 and j in block B3 become dead variables and the assignments to them in these blocks become dead code that can be eliminated, resulting in the flow graph shown in Fig. 3 below. □
Fig 3: Flow graph after induction variable elimination
What I do not get is the claim that "i in block B2 and j in block B3 become dead variables". But if we consider the following graph along the green path in Fig. 2 :
The variables i and j are probably alive in the blocks B2 and B3 respectively, if we go along the path in green as shown and count for the use of i and j (in their respective blocks) on the right-hand side of their assignment. That particular use is a
The variables are no longer live because they have no observable effect.
They are incremented and decremented, but the values are never consulted for any purpose. They are not printed out. No control flow depends on them. No other variable is computed using their values. If they weren't incremented and decremented, nobody would notice.
Eliminating them will not affect program output in any way. So they should be eliminated.
As a more formal definition of liveness, we can start with the following:
A variable is live (at a point in the program) if that value of the variable will become observable (by being made visible outside of the execution of the program, see below).
A variable is also live if its current value is used in the computation of a live value.
That recursive definition excludes the use of a not-otherwise-used variable only for the computation of the value of itself or of other variables which are not live. It's simply a more precise way of saying what I said in the first part of the answer: an assignment is irrelevant if eliminating it would make no observable difference in the execution of the program.
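To make the recursive definition concrete, here is a minimal sketch in C. It is not the flow graph from the book: the loop, the function names and the `steps` counter are all made up for illustration. In the first function, `i` and `j` are only ever used to compute their own next values, so by the definition above they are not live, and deleting them (second function) changes nothing that a caller can observe.

```c
/* Hypothetical loop in the spirit of Fig. 2: t2 tracks 4*i and t4 tracks 4*j,
 * and the exit test has already been rewritten in terms of t2 and t4. */
int with_induction_vars(int n) {
    int i = 0, j = n;
    int t2 = 4 * i, t4 = 4 * j;
    int steps = 0;
    while (t2 < t4) {
        i = i + 1;      /* the only use of i is to compute the next i */
        t2 = t2 + 4;
        j = j - 1;      /* the only use of j is to compute the next j */
        t4 = t4 - 4;
        steps++;
    }
    return steps;       /* the only observable value */
}

/* Same loop after removing the dead assignments to i and j. */
int without_induction_vars(int n) {
    int t2 = 0, t4 = 4 * n;
    int steps = 0;
    while (t2 < t4) {
        t2 = t2 + 4;
        t4 = t4 - 4;
        steps++;
    }
    return steps;
}
```

Both functions return the same value for every `n`, which is exactly the sense in which the assignments to `i` and `j` are dead.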
The precise definition of "observable effect" will vary according to computation model, but it basically means that the value is in some way communicated to the world outside of the program execution. For example a value is live if it is printed on the console or written to a file (including being used as the name of a file to be created, because file directories are also files). It's live if it is stored in a database, or causes a light to blink. The C standard includes in the category of observable behaviour reading and writing volatile memory, which is a way of encapsulating CPUs which use loads and stores of specific memory addresses as a way of sending and receiving data from peripherals.
There's an old philosophical riddle: If a tree falls in an uninhabited forest, does it make a sound? If we ignore the anthropocentricity of that question, it seems reasonable to answer, "No", as did many 19th century scientists. "Sound", they said, is not just a vibration of the air, but rather the result of the atmospheric vibration causing a neural reaction in an ear. (Certainly, it is possible to imagine a forest without any animate life at all, not just human life, so the philosopher can take refuge in that defense.) And that's basically where this model of computational liveness ends up: a computation is observable if it could be observed by someone. [Note 1]
Now, that's still open to interpretation because someone might, for example, "observe" a computation by measuring the amount of time that the computation took. In that sense, all optimisations should be observable, because they are pointless if they don't shorten computation time.
If we take that to be part of observable behaviour, then no useful optimisation is possible. So in most cases, this is not a particularly useful definition of observability. But there are a very few use cases in which preserving the amount of time a computation uses is necessary. The classic such case is countering security attacks which deduce the value of what should be secret variables by timing various different uses of the value. If you were writing code designed to maintain a highly-confidential secret -- say, the password keys required to access a bank account -- then you might want to include loops in some control flows which have no computational purpose whatsoever, but rather are intended to take exactly the same amount of time as a different control flow which uses the same secret value.
For a more playful example, when I was much younger and computers used much more electricity to do much slower computations, we noticed that you could "listen" to the execution of a program by tuning a radio to pick up the electromagnetic vibrations being produced by the CPU. Then you could write different kinds of pointless loops to make different musical notes or rhythmic artefacts. This same kind of pointless loop can be used in a microcontroller in order to produce a blinking display, or even to directly drive an audio speaker. So there are definitely cases where you would want the compiler to not eliminate "useless" code.
Despite that, it is probably not a good idea to reject all optimisation techniques in order to enable predictable execution times. Most of the time, we would really prefer for our programs to work as fast as possible, or to consume the minimum amount of non-renewable energy; in other words, to avoid doing unnecessary work. But since there are use cases where optimisation can affect behaviour which is not normally considered observable, the compiler needs to provide the programmer with a mechanism to turn optimisation off in particular pieces of code. Those are not the cases being discussed by Aho&c, and with good reason.
Notes:
George Berkeley, writing in 1710:
… it seems no less evident that the various Sensations or Ideas imprinted on the Sense, however blended or combined together (that is, whatever Objects they compose) cannot exist otherwise than in a Mind perceiving them…
Some philosophers of the time posited the necessity of the existence of an omniscient God, in order to avoid the chaos which Berkeley summons up in which the objects in his writing studio suddenly cease to exist when he closes his eyes and are recreated in a blink when he opens them again. In this argument, God, who continually sees all, guarantees the continuity of existence of the objects in Bishop Berkeley's studio. That has always struck me as a peculiarly menial purpose for a deity. (Surely She could delegate such a mundane task to a subordinate.) But to each their own.
For more references and a little discussion, you can start here on Wikipedia. Or just listen to Bruce Cockburn's beautiful environmental anthem.
Context
The function BN_consttime_swap in OpenSSL is a thing of beauty. In this snippet, condition has been computed as 0 or (BN_ULONG)-1:
#define BN_CONSTTIME_SWAP(ind) \
do { \
t = (a->d[ind] ^ b->d[ind]) & condition; \
a->d[ind] ^= t; \
b->d[ind] ^= t; \
} while (0)
…
BN_CONSTTIME_SWAP(9);
…
BN_CONSTTIME_SWAP(8);
…
BN_CONSTTIME_SWAP(7);
The intention is that, to ensure that higher-level bignum operations take constant time, this function either swaps two bignums or leaves them in place in constant time. When it leaves them in place, it actually reads each word of each bignum, computes a new word that is identical to the old word, and writes that result back to the original location.
The intention is that this will take the same time as if the bignums had effectively been swapped.
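For readers who want to try the trick in isolation, here is a minimal self-contained sketch of the same masked XOR swap over plain word arrays. It is not OpenSSL's code: the function name `ct_cond_swap`, the `uint64_t` word type and the `swap_bit` parameter are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap a[0..n) with b[0..n) when swap_bit is 1; rewrite both unchanged when
 * it is 0. Every word is read and written in both cases.
 * a and b are assumed not to overlap. */
void ct_cond_swap(uint64_t *a, uint64_t *b, size_t n, uint64_t swap_bit) {
    /* condition is 0 or all ones, the analogue of (BN_ULONG)-1 */
    uint64_t condition = (uint64_t)0 - (swap_bit & 1);
    for (size_t k = 0; k < n; k++) {
        uint64_t t = (a[k] ^ b[k]) & condition;
        a[k] ^= t;
        b[k] ^= t;
    }
}
```

As in the question, this only says something about timing if the compiler translates it straightforwardly.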
In this question, I assume a modern, widespread architecture such as those described by Agner Fog in his optimization manuals. Straightforward translation of the C code to assembly (without the C compiler undoing the efforts of the programmer) is also assumed.
Question
I am trying to understand whether the construct above characterizes as a “best effort” sort of constant-time execution, or as perfect constant-time execution.
In particular, I am concerned about the scenario where bignum a is already in the L1 data cache when the function BN_consttime_swap is called, and the code just after the function returns starts working on the bignum a right away. On a modern processor, enough instructions can be in-flight at the same time for the copy not to be technically finished when the bignum a is used. The mechanism allowing the instructions after the call to BN_consttime_swap to work on a is memory dependence speculation. Let us assume naive memory dependence speculation for the sake of the argument.
What the question seems to boil down to is this:
When the processor finally detects that the code after BN_consttime_swap read from memory that had, contrary to speculation, been written to inside the function, does it cancel the speculative execution as soon as it detects that the address had been written to, or does it allow itself to keep it when it detects that the value that has been written is the same as the value that was already there?
In the first case, BN_consttime_swap looks like it implements perfect constant-time. In the second case, it is only best-effort constant-time: if the bignums were not swapped, execution of the code that comes after the call to BN_consttime_swap will be measurably faster than if they had been swapped.
Even in the second case, this is something that looks like it could be fixed for the foreseeable future (as long as processors remain naive enough) by, for each word of each of the two bignums, writing a value different from the two possible final values before writing either the old value again or the new value. The volatile type qualifier may need to be involved at some point to prevent an ordinary compiler from over-optimizing the sequence, but it still sounds possible.
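A rough sketch of what that workaround might look like follows. Everything here is an assumption made for illustration: the function names, the `uint64_t` word type, and the particular way of picking a scratch value. Whether the intermediate store actually changes anything at the microarchitectural level is exactly the open question, so treat this as a way of making the idea concrete, not as a fix.

```c
#include <stddef.h>
#include <stdint.h>

/* 1 if v == 0, else 0, computed without a branch. */
static uint64_t is_zero(uint64_t v) {
    return 1 ^ ((v | ((uint64_t)0 - v)) >> 63);
}

/* Conditional swap that first stores a scratch value different from both
 * possible final values. volatile asks the compiler to keep every store.
 * a and b are assumed not to overlap. */
void ct_cond_swap_scratch(volatile uint64_t *a, volatile uint64_t *b,
                          size_t n, uint64_t swap_bit) {
    uint64_t condition = (uint64_t)0 - (swap_bit & 1);
    for (size_t k = 0; k < n; k++) {
        uint64_t x = a[k];
        uint64_t y = b[k];
        uint64_t d = x ^ y;
        /* x ^ 1 differs from x, and from y unless d == 1; in that case
         * x ^ 2 is used, which differs from both. */
        uint64_t scratch = x ^ (1 + is_zero(d ^ 1));
        uint64_t t = d & condition;
        a[k] = scratch;
        b[k] = scratch;
        a[k] = x ^ t;   /* old value or swapped value */
        b[k] = y ^ t;
    }
}
```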
NOTE: I know about store forwarding, but store forwarding is only a shortcut. It does not prevent a read being executed before the write it is supposed to come after. And in some circumstances it fails, although one would not expect it to in this case.
Straightforward translation of the C code to assembly (without the C compiler undoing the efforts of the programmer) is also assumed.
I know it's not the thrust of your question, and I know that you know this, but I need to rant for a minute. This does not even qualify as a "best effort" attempt to provide constant-time execution. A compiler is licensed to check the value of condition, and skip the whole thing if condition is zero. Obfuscating the setting of condition makes this less likely to happen, but is no guarantee.
Purportedly "constant-time" code should not be written in C, full stop. Even if it is constant time today, on the compilers that you test, a smarter compiler will come along and defeat you. One of your users will use this compiler before you do, and they will not be aware of the risk to which you have exposed them. There are exactly three ways to achieve constant time that I am aware of: dedicated hardware, assembly, or a DSL that generates machine code plus a proof of constant-time execution.
Rant aside, on to the actual architecture question at hand: assuming a stupidly naive compiler, this code is constant time on the µarches with which I am familiar enough to evaluate the question, and I expect it to broadly be true for one simple reason: power. I expect that checking in a store queue or cache if a value being stored matches the value already present and conditionally short-circuiting the store or avoiding dirtying the cache line on every store consumes more energy than would be saved in the rare occasion that you get to avoid some work. However, I am not a CPU designer, and do not presume to speak on their behalf, so take this with several tablespoons of salt, and please consult one before assuming this to be true.
This blog post, and the comments made by the author, Henry, on the subject of this question should be considered as authoritative as anyone should be allowed to expect. I will reproduce the latter here for archival:
I didn’t think the case of overwriting a memory location with the same value had a practical use. I think the answer is that in current processors, the value of the store is irrelevant, only the address is important.
Out here in academia, I’ve heard of two approaches to doing memory disambiguation: Address-based, or value-based. As far as I know, current processors all do address-based disambiguation.
I think the current microbenchmark has some evidence that the value isn’t relevant. Many of the cases involve repeatedly storing the same value into the same location (particularly those with offset = 0). These were not abnormally fast.
Address-based schemes use a store queue and a load queue to track outstanding memory operations. Loads check the store queue for an address match (Should this load do store-to-load forwarding instead of reading from cache?), while stores check the load queue (Did this store clobber the location of a later load I allowed to execute early?). These checks are based entirely on addresses (where a store and load collided). One advantage of this scheme is that it’s a fairly straightforward extension on top of store-to-load forwarding, since the store queue search is also used there.
Value-based schemes get rid of the associative search (i.e., faster, lower power, etc.), but require a better predictor to do store-to-load forwarding (Now you have to guess whether and where to forward, rather than searching the SQ). These schemes check for ordering violations (and incorrect forwarding) by re-executing loads at commit time and checking whether their values are correct. In these schemes, if you have a conflicting store (or made some other mistake) that still resulted in the correct result value, it would not be detected as an ordering violation.
Could future processors move to value-based schemes? I suspect they might. They were proposed in the mid-2000s(?) to reduce the complexity of the memory execution hardware.
The idea behind constant-time implementation is not to actually perform everything in constant time. That will never happen on an out-of-order architecture.
The requirement is that no secret information can be revealed by timing analysis.
To prevent this there are basically two requirements:
a) Do not use anything secret as a stop condition for a loop, or as a predicate to a branch. Failing to do so will open you to a branch prediction attack https://eprint.iacr.org/2006/351.pdf
b) Do not use anything secret as an index to memory access. This leads to cache timing attacks http://www.daemonology.net/papers/htt.pdf
As for your code: assuming that your secret is "condition" and possibly the contents of a and b the code is perfectly constant time in the sense that its execution does not depend on the actual contents of a, b and condition. Of course the locality of a and b in memory will affect the execution time of the loop, but not the CONTENTS which are secret.
That is assuming of course condition was computed in a constant time manner.
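The two requirements above can be sketched in C as follows. These helpers are illustrative only; the names `ct_eq_mask`, `ct_select` and `ct_lookup` are made up for this example, and, as discussed elsewhere in this thread, C gives no guarantee that a compiler will keep them branch-free.

```c
#include <stddef.h>
#include <stdint.h>

/* All ones if x == y, else zero, with no branch on x or y. */
static uint32_t ct_eq_mask(uint32_t x, uint32_t y) {
    uint32_t d = x ^ y;
    return (uint32_t)(((d | (uint32_t)(0u - d)) >> 31) - 1u);
}

/* Requirement a): pick a or b from a secret bit without branching on it. */
uint32_t ct_select(uint32_t choose_a_bit, uint32_t a, uint32_t b) {
    uint32_t mask = (uint32_t)(0u - (choose_a_bit & 1u));  /* 0 or all ones */
    return (a & mask) | (b & ~mask);
}

/* Requirement b): read table[secret_index] by touching every entry, so the
 * sequence of addresses accessed does not depend on the secret. */
uint32_t ct_lookup(const uint32_t *table, size_t n, uint32_t secret_index) {
    uint32_t result = 0;
    for (size_t k = 0; k < n; k++) {
        result |= table[k] & ct_eq_mask((uint32_t)k, secret_index);
    }
    return result;
}
```

The lookup costs a full pass over the table on every call, which is the price of not using the secret as an address.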
As for C optimizations: the compiler can only optimize code based on information it knows. If "condition" is truly secret the compiler should not be able to discern its contents and optimize. If it can be deduced from your code then the compiler will most likely optimize for the 0 case.
I am making an expectimax AI, and the branching factor of this game is unpredictable, ranging from 6 - 20. I'm currently exploring the game tree for 1 second every turn, then making sure the whole game tree is explored to the same depth, but occasionally this results in a very large slowdown, if branching factor for a particular turn jumps up radically. Is it OK if I cut off exploration when parts of the game tree are not explored as deeply? Will this affect the mathematical properties of expectimax at all?
Short answer: I'm pretty sure you lose the mathematical guarantees, but the extent to which this affects your program's performance will probably depend on the game and your board evaluation function.
Here's an abstract scenario to give you some intuition for where having different branch lengths might create the most problems: say that, for player one, the best move is something that takes a few turns to set up. Let's say this set up is not something that your board evaluation function can pick up on. In this case, regardless of what player 2 does in the mean time, there will be a point a few moves in the future where the score of the board will swing in the direction that favors player 1. If one branch gets far enough to see that move and another doesn't, it will look like the first is a worse option for player 2, despite the fact that the same thing will happen on the other branch. If the move that player 2 made in the first branch was actually better than the move it made in the second branch, this would lead to a suboptimal choice.
On the other hand, a perfect board evaluator would make this impossible (it would recognize player 1 setting up their move). There are also games in which setting up moves in advance like this is not possible. But the existence of this case is a red flag.
Fundamentally, branches that didn't get evaluated as far have greater uncertainty in their estimations of how good a move is. This will sometimes result in them being chosen when they shouldn't be and other times it will result in them not being chosen when they should be. As a result, I would strongly suspect that you lose mathematical guarantees by doing this. That said, the practical impact of this problem on performance may or may not be substantial.
There might be some way around this if you incorporate the current turn number into the board evaluation function and adjust for it accordingly. Minimally, that would allow you to explicitly account for the added uncertainty in shorter branches.
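To see how the cutoff depth alone can change the decision, here is a small depth-limited expectimax sketch in C. The `Node` layout and the `heuristic` field are assumptions made for this example, not taken from any particular engine; `heuristic` plays the role of the board evaluation function, used whenever the search stops at a node.

```c
#include <stddef.h>

typedef enum { MAX_NODE, CHANCE_NODE } NodeKind;

typedef struct Node {
    NodeKind kind;
    double heuristic;                    /* evaluation used at a cutoff or leaf */
    size_t n_children;
    const struct Node *const *children;
    const double *probs;                 /* CHANCE_NODE only: one per child */
} Node;

/* Value of node when searching at most depth plies below it. */
double expectimax(const Node *node, int depth) {
    if (depth == 0 || node->n_children == 0) {
        return node->heuristic;
    }
    if (node->kind == MAX_NODE) {
        double best = expectimax(node->children[0], depth - 1);
        for (size_t k = 1; k < node->n_children; k++) {
            double v = expectimax(node->children[k], depth - 1);
            if (v > best) {
                best = v;
            }
        }
        return best;
    }
    /* Chance node: probability-weighted average of the children. */
    double expected = 0.0;
    for (size_t k = 0; k < node->n_children; k++) {
        expected += node->probs[k] * expectimax(node->children[k], depth - 1);
    }
    return expected;
}
```

If one child of the root is a chance node whose heuristic is 1.0 but whose two equally likely outcomes are worth 2.0 and 10.0, it scores 6.0 when searched one ply deeper and 1.0 when cut off. Next to a sibling worth 5.0, the move chosen therefore depends purely on where the cutoff fell, which is the effect described above.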
I was once talking to my programming teacher about quantum computers, and I remember him telling me that one limitation of this kind of machine would be that you can't actually do something like x = y. Why is quantum assignment impossible? Does anyone have a clear answer?
Your teacher was referring to the fact that all quantum operations are reversible, because they are unitary transformations. They can be undone. Since assignment can't be undone, it's not a unitary transformation and therefore can't be done by a quantum computer.
But! Our universe runs on quantum mechanics, so how can classical computers do assignment?
Well, if you have a bunch of qubits you know are zero, then you can swap them into your variable. This clears the variable, and now you can add in the value you want to be there. This process is reversible, and acts like assignment. It decreases your supply of known-to-be-zero qubits, but until the sun goes out we can make a steady supply of those.
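The swap-then-add trick can be imitated with ordinary bits, which at least shows why it is reversible. This is a classical analogue only, not a quantum simulation: the `Regs` struct and function names are invented for the example, `ancilla` is assumed to start at zero, and XOR stands in for a CNOT gate (which copies basis states, not arbitrary superpositions).

```c
#include <stdint.h>

/* Three small registers; ancilla is assumed to be known-zero on entry. */
typedef struct {
    uint8_t x;
    uint8_t y;
    uint8_t ancilla;
} Regs;

static void swap_regs(uint8_t *p, uint8_t *q) {
    uint8_t t = *p;
    *p = *q;
    *q = t;
}

/* Reversible "x = y": uses up one known-zero register. */
void assign_x_from_y(Regs *r) {
    swap_regs(&r->x, &r->ancilla);  /* x is now 0; the old x is kept as garbage */
    r->x ^= r->y;                   /* add y into the cleared x, so x == y */
}

/* Exact inverse: apply the same steps in reverse order. */
void undo_assign(Regs *r) {
    r->x ^= r->y;
    swap_regs(&r->x, &r->ancilla);
}
```

Nothing is erased: the old value of `x` survives in `ancilla`, which is why the operation can be undone and why the supply of known-zero registers shrinks by one.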
That being said, the garbage you are swapping away affects how things interfere with each other. So quantum algorithms often include steps to get rid of this garbage by undoing part of an operation while keeping the result, or else they won't work.
The impossibility of quantum assignment follows from the no-cloning theorem, which states that an arbitrary unknown quantum state cannot be copied. In other words, we cannot create two identical copies from one state.
You can find the no-cloning theorem in any quantum computing textbook or tutorial.
I am working on an application where I have to keep data sequenced. Every unit of data comes with a sequence number, and I check whether that sequence number is 1 greater than the previous one. If it is, I increase my received count by 1. My question is: is there a difference between:
1. in increasing my received count by one.
AND
2. assigning the last received sequence number to received count.
Thanks.
It sounds like a classic premature optimization question to me. Generally, increasing a value would mean "fetch original->change->store", while assigning would be "fetch other->store new". The "other" would probably be fetched already, thus saving even more clock cycles. Thus assigning would probably be faster.
BUT increment by 1 is usually very well optimized by compilers and CPUs, so that it wouldn't require any separate fetching or storing. It can very well be done in one CPU instruction, thus eliminating any difference, and in fact probably making increment by 1 the better option performance-wise.
Confused? Good.
The point is that this is the kind of optimization you should not be doing unless you have benchmarked a bottleneck. Then you benchmark the options and choose the best.