Challenges in Enabling Practical-Scale Quantum Computing? - quantum-computing

This question will be useful to many.
Can we collate a list of challenges in enabling Practical-Scale Quantum Computing?
Nathan Aw

The challenges depend on the physical architecture. Pick a physical implementation and look for which of DiVincenzo's criteria are missing; those are the chosen architecture's challenges.

First of all, I should mention that when people say they want a practical-scale quantum computer, what they probably mean is a device that can perform a computation and yield a good result with probability arbitrarily close to 100%. This can be achieved with imperfect qubits by using quantum-error-correction algorithms, but it still requires reasonably good qubits (and "reasonably good" by today's standards means the highest quality people can currently make). There is a qubit quality threshold above which using those error-correction algorithms becomes useful and reduces the logical error rate. Below that threshold, it is best to hope for no errors and not perform any error correction.
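To give a feel for what such a threshold means, here is a toy sketch (my own illustration, not part of the answer itself) that works out the logical error rate of a classical 3-bit repetition code against independent bit flips: majority voting only helps when the physical error probability p is below 0.5. Real quantum codes such as the surface code have much lower thresholds, on the order of a percent, but the shape of the argument is the same.

#include <stdio.h>

int main(void) {
    /* physical error probabilities to test */
    const double ps[] = {0.001, 0.01, 0.1, 0.4, 0.6};
    for (int i = 0; i < 5; i++) {
        double p = ps[i];
        /* majority vote over 3 copies fails if 2 or 3 bits flip:
           3 p^2 (1-p) + p^3, which is below p only when p < 0.5 */
        double logical = 3 * p * p * (1 - p) + p * p * p;
        printf("p = %.3f  ->  logical error = %.6f  (%s)\n",
               p, logical, logical < p ? "correction helps" : "correction hurts");
    }
    return 0;
}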
To answer the question:
""Are there physical implementations that fulfill the 5 DiVincenzo's criteria?""
Some architectures partially fulfill all of them, but not to the level needed to perform error correction and therefore enable practical-scale, reliable quantum computing. For example, IBM and Rigetti Computing (and possibly Google; to be verified) have small-scale quantum computers using superconducting circuit qubits, but each individual qubit is not that great (by error-correction standards) and several of the qubits' figures of merit are below the quantum-error-correction threshold. This means that performing a computation can yield an incorrect result. The probability of getting a wrong result depends on a number of factors, such as the coherence time of each individual qubit, single- and two-qubit gate fidelities, readout fidelity, etc.
Such a small-scale quantum computer does perform, in a sense, quantum computing. It is just not yet reliable enough for interesting applications in quantum chemistry or condensed matter (which are expected to be the killer apps of a quantum computer).

Related

Which code is consuming less power?

My goal is to develop and implement a green algorithm for a particular situation. I have developed two algorithms for this purpose.
One has a large number of memory accesses (loads and stores). The access pattern is sometimes coalesced and sometimes non-coalesced. I am assuming a worst case where most of the accesses result in cache misses. See code snippet a) below.
The other has a large number of calculations, roughly equivalent to code snippet b) below.
How do I estimate the power consumption in each case? Which one is more energy efficient, and why?
Platform: I will be running this code on an Intel i3 processor, with Windows 7, 4 GB of DRAM, and a 3 MB cache.
Note: I do not want to use any external power meter. Also, please ignore the fact that the code does not do anything constructive; it is only a fraction of the complete algorithm.
UPDATE:
It is difficult but not impossible. One can very well calculate the cost incurred in reading DRAM and doing multiplications in the CPU's ALU. The only thing is that one must have the required knowledge of the electronics of the DRAM and the CPU, which I am lacking at this point in time. At least for the worst case, I think this can be established. Worst case means no coalesced accesses and no compiler optimizations.
If you can estimate the cost of accessing DRAM and doing a float multiplication, then why is it impossible to estimate the current, and hence get a rough idea of the power, during these operations? Also, see my post: I am not asking how much power is consumed; rather, I am asking which code consumes less/more power, or which one is more energy efficient.
a)
for (i = 0; i < 1000000; i++)
{
    a[i] = b[i];   // a, b are float arrays in RAM
}

b)
float j = 1.0f;    // j has some value, which is used later in the program (not shown here)
for (i = 1; i < 1000000; i++)
{
    j = j * i;
}
To measure the actual power consumption, you should add an electricity meter to your power supply (remove the batteries if you are using a notebook).
Note that you will be measuring the power consumption of the entire system, so make sure to avoid nuisance parameters (any other system activity, e.g. anti-virus updates, the graphical desktop environment, indexing services, (internal) hardware devices), and perform measurements repeatedly, with and without your algorithms running, to cancel out the "background" consumption.
If possible, use an embedded system.
Concerning your algorithms, the actual energy efficiency depends not only on the C code but also on the performance of the compiler and on the runtime behavior in interaction with the surrounding system. That said, here are some resources on what you can do as a developer to help with this:
Energy-Efficient Software Checklist
Energy-Efficient Software Guidelines
In particular, take a look at the Tools paragraph in the "Checklist" above, as it lists some tools that may help you with rough estimates (based on application profiling). Among others, it lists:
Perfmon
PwrTest/Windows Driver Kit
Windows Event Viewer (Timer tick change events, Microsoft-Windows-Kernel-PowerDiagnostic log)
Intel PowerInformer
Windows ETW (performance monitoring framework)
Intel Application Energy Toolkit
As the commenters pointed out, try using a power meter. Estimating power usage, even from raw assembly code, is difficult if not impossible on modern superscalar architectures.
As long as only the CPU and memory are involved, you can assume that power consumption is roughly proportional to runtime.
This may not be 100% accurate, but it is as close as you can get without an actual measurement.
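Under that assumption, a rough way to compare the two snippets is simply to time them. The sketch below is my own illustration (the array size and loop bodies mirror snippets a) and b) from the question); it treats CPU time as a proxy for energy, which is only a crude approximation.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 1000000

int main(void) {
    float *a = malloc(N * sizeof *a);
    float *b = malloc(N * sizeof *b);
    for (int i = 0; i < N; i++) b[i] = (float)i;

    clock_t t0 = clock();
    for (int i = 0; i < N; i++) a[i] = b[i];   /* snippet a): mostly memory traffic */
    clock_t t1 = clock();

    volatile float j = 1.0f;                   /* volatile keeps the loop from being optimized away */
    for (int i = 1; i < N; i++) j = j * i;     /* snippet b): mostly arithmetic */
    clock_t t2 = clock();

    printf("a) copy loop:     %.4f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("b) multiply loop: %.4f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    printf("(checksum so the work is not optimized out: %g %g)\n", (double)a[N - 1], (double)j);
    free(a);
    free(b);
    return 0;
}

The longer-running loop is, under this proportionality assumption, the bigger energy consumer.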
You can try using some CPU monitoring tools to see which algorithm heats up your CPU more. They won't give you solid data, but they will show whether there is a significant difference between the two algorithms in terms of power consumption.
Here I assume that the major power consumer is the CPU and that the algorithms do not require heavy I/O.
Well, I am done with my preliminary research and discussions with electronics majors.
A rough idea can be obtained by considering two factors:
1. Currents involved: more current means more power dissipation.
2. Power dissipation due to clock rate: dynamic power dissipation grows rapidly with frequency (it is proportional to C·V^2·f, and higher clock rates typically also require higher supply voltages).
In snippet a), the DRAM and memory hardly draw much current, so the power dissipation will be very small during each
a[i] = b[i];
operation. The above operation is nothing but a data read and a write.
Also, the clock for memory is usually much lower than that of the CPU: while the CPU is clocked at 3 GHz, the memory is clocked at about 133 MHz or so (not all components run at their rated clock). Thus the power dissipation is lower because of the lower clock.
In snippet b), it can be seen that I am doing more calculations. This will involve more power dissipation because of the much higher clock frequency.
Another factor is that the multiplication itself comprises many more cycles than a data read/write (provided memory accesses are coalesced).
Also, it would be great to have an option for measuring, or getting a rough idea of, the power dissipation ("code energy") of a piece of code, with a color indicating how energy efficient it is: red being very poor, and green being highly energy efficient.
In short, given today's technologies, it should not be very difficult for software to estimate power like this (possibly taking many other parameters into account, in addition to the ones described above). This would be useful for faster development and evaluation of green algorithms.

What's differential evolution and how does it compare to a genetic algorithm?

From what I've read so far they seem very similar.
Differential evolution uses floating-point numbers instead, and the solutions are called vectors? I'm not quite sure what that means.
Could someone provide an overview, with a little bit about the advantages and disadvantages of both?
Well, both genetic algorithms and differential evolution are examples of evolutionary computation.
Genetic algorithms stay pretty close to the metaphor of genetic reproduction. Even the language is mostly the same: both talk of chromosomes, both talk of genes, the genes are drawn from discrete alphabets, both talk of crossover, and the crossover is fairly close to a low-level understanding of genetic reproduction, etc.
Differential evolution is in the same style, but the correspondences are not as exact. The first big change is that DE uses actual real numbers (in the strict mathematical sense -- they're implemented as floats, or doubles, or whatever, but in theory they're ranging over the field of reals). As a result, the ideas of mutation and crossover are substantially different. The mutation operator is modified so much that it's hard for me to even see why it's called mutation, as such, except that it serves the same purpose of breaking things out of local minima.
On the plus side, there are a handful of results showing DEs are often more effective and/or more efficient than genetic algorithms. And when working in numerical optimization, it's nice to be able to represent things as actual real numbers instead of having to work your way around to a chromosomal kind of representation, first. (Note: I've read about them, but I've not messed extensively with them so I can't really comment from first hand knowledge.)
On the negative side, I don't think there's been any proof of convergence for DEs, yet.
Differential evolution is actually a specific subset of the broader space of genetic algorithms, with the following restrictions:
The genotype is some form of real-valued vector
The mutation / crossover operations make use of the difference between two or more vectors in the population to create a new vector, typically by adding some random proportion of the difference to one of the existing vectors, plus a small amount of random noise (see the sketch after this answer)
DE performs well in certain situations because the vectors can be considered to form a "cloud" that explores the high-value areas of the solution space quite effectively. It's pretty closely related to particle swarm optimization in some senses.
It still has the usual GA problem of getting stuck in local minima, however.
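To make the vector-difference update concrete, here is a minimal sketch of DE/rand/1 with binomial crossover minimizing a simple sum-of-squares function. The population size and the F and CR parameters are typical textbook values chosen for illustration, not anything prescribed by the answers above.

#include <stdio.h>
#include <stdlib.h>

#define DIM  10     /* problem dimension */
#define NP   40     /* population size */
#define GENS 200    /* number of generations */
#define F    0.8    /* differential weight */
#define CR   0.9    /* crossover probability */

static double rnd(void) { return (double)rand() / RAND_MAX; }

/* objective to minimize: sum of squares, minimum 0 at the origin */
static double sphere(const double *x) {
    double s = 0.0;
    for (int d = 0; d < DIM; d++) s += x[d] * x[d];
    return s;
}

int main(void) {
    double pop[NP][DIM], cost[NP], trial[DIM];

    /* initialise the population uniformly in [-5, 5] */
    for (int i = 0; i < NP; i++) {
        for (int d = 0; d < DIM; d++) pop[i][d] = -5.0 + 10.0 * rnd();
        cost[i] = sphere(pop[i]);
    }

    for (int g = 0; g < GENS; g++) {
        for (int i = 0; i < NP; i++) {
            /* pick three distinct population members, all different from i */
            int r1, r2, r3;
            do { r1 = rand() % NP; } while (r1 == i);
            do { r2 = rand() % NP; } while (r2 == i || r2 == r1);
            do { r3 = rand() % NP; } while (r3 == i || r3 == r1 || r3 == r2);

            /* mutation + binomial crossover: perturb r1 by the scaled
               difference of r2 and r3, mixing genes with the current vector */
            int jrand = rand() % DIM;   /* guarantees at least one mutated component */
            for (int d = 0; d < DIM; d++) {
                if (d == jrand || rnd() < CR)
                    trial[d] = pop[r1][d] + F * (pop[r2][d] - pop[r3][d]);
                else
                    trial[d] = pop[i][d];
            }

            /* greedy selection: keep the trial vector only if it is no worse */
            double c = sphere(trial);
            if (c <= cost[i]) {
                for (int d = 0; d < DIM; d++) pop[i][d] = trial[d];
                cost[i] = c;
            }
        }
    }

    double best = cost[0];
    for (int i = 1; i < NP; i++) if (cost[i] < best) best = cost[i];
    printf("best cost after %d generations: %g\n", GENS, best);
    return 0;
}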

Is a Turing machine a real device or an imaginary concept?

When I was studying Turing machines and PDAs, I thought that the first computing device was the Turing machine.
Hence, I thought that there existed a practical machine called the Turing machine, whose states could be represented by some special devices (say, flip-flops) and which could accept magnetic tape as input.
I then asked how input strings are represented on magnetic tape, but from the answer and from the details given in my book, I came to understand that a Turing machine is somewhat hypothetical.
My question is: how would a Turing machine be implemented practically? For example, how would it be used to check spelling errors on our current processors?
Are Turing machines outdated? Or are they still being used?
When Turing first devised what are now called Turing machines, he was doing so for purely theoretical reasons (they were used to prove the existence of undecidable problems) and without having actually constructed one in the real world. Fast forward to the present, and with the exception of hobbyists building Turing machines for the fun of doing so, TMs are essentially confined to Theoryland.
Turing machines aren't used in practice for several reasons. For starters, it's impossible to build a true TM, since you'd need infinite resources to construct the infinite tape. (You could imagine building TMs with a limited amount of tape, then adding more tape as necessary, though.) Moreover, Turing machines are inherently slower than other models of computation because of the sequential nature of their data access. A Turing machine cannot, for example, jump into the middle of an array without first walking across all the elements of the array that it wants to skip. On top of that, Turing machines are extremely difficult to design. Try writing a Turing machine to sort a list of 32-bit integers, for example. (Actually, please don't. It's really hard!)
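To give a feel for what "programming" a Turing machine involves, here is a minimal sketch of simulating a trivial one in software: a single-state machine that walks right over a binary input, flipping each bit, and halts at the first blank. The tape is bounded (a real TM's would not be), and the state set, alphabet, and transition rules are made up purely for illustration.

#include <stdio.h>
#include <string.h>

#define TAPE_LEN 64   /* a real Turing machine would have an unbounded tape */

enum state { FLIP, HALT };

int main(void) {
    char tape[TAPE_LEN];
    memset(tape, '_', TAPE_LEN);          /* '_' is the blank symbol */
    memcpy(tape, "1011001", 7);           /* the input word, written at the left end */

    int head = 0;
    enum state st = FLIP;

    while (st != HALT && head < TAPE_LEN) {
        char sym = tape[head];
        if (st == FLIP) {                 /* transition rules for the one non-halting state */
            if (sym == '0')      { tape[head] = '1'; head++; }  /* write 1, move right */
            else if (sym == '1') { tape[head] = '0'; head++; }  /* write 0, move right */
            else                 { st = HALT; }                 /* blank: halt */
        }
    }

    printf("tape after halting: %.16s\n", tape);   /* prints 0100110 followed by blanks */
    return 0;
}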
This then raises the question: why study Turing machines at all? Fortunately, there are a huge number of reasons to do this:
To reason about the limits of what could possibly be computed. Because Turing machines are capable of simulating any computer on planet earth (or, according to the Church-Turing thesis, any physically realizable computing device), if we can show the limits of what Turing machines can compute, we can demonstrate the limits of what could ever hope to be accomplished on an actual computer.
To formalize the definition of an algorithm. Why is binary search an algorithm while the statement "guess the answer" is not? In order to answer this question, we have to have a formal model of what a computer is and what an algorithm means. Having the Turing machine as a model of computation allows us to rigorously define what an algorithm is. No one actually ever wants to translate algorithms into that format, but the ability to do so gives the field of algorithms and computability theory a firm mathematical grounding.
To formalize definitions of deterministic and nondeterministic algorithms. Probably the biggest open question in computer science right now is whether P = NP. This question only makes sense if you have formal definitions of P and NP, and these in turn require definitions of deterministic and nondeterministic computation (though technically they could be defined using second-order logic). Having the Turing machine then allows us to talk about important problems in NP, along with giving us a way to find NP-complete problems. For example, the proof that SAT is NP-complete uses the fact that SAT can be used to encode a Turing machine and its execution on an input.
Hope this helps!
It is a conceptual device that is not realizable (due to the requirement of an infinite tape). Some people have built physical realizations of a Turing machine, but these are not true Turing machines due to physical limitations.
Here's a video of one: http://www.youtube.com/watch?v=E3keLeMwfHY
Turing machines are not exactly physical machines; they are basically conceptual machines. The Turing machine is a hypothetical concept, and it is very difficult to implement in the real world, since we would require an infinite tape even for small and easy problems.
It's a theoretical machine; the following paragraph is from Wikipedia:
A Turing machine is a theoretical device that manipulates symbols on a strip of tape according to a table of rules. Despite its simplicity, a Turing machine can be adapted to simulate the logic of any computer algorithm, and is particularly useful in explaining the functions of a CPU inside a computer.
This machine, along with other machines like the non-deterministic Turing machine (which doesn't exist in the real world), is very useful for calculating complexity and for proving that one problem is harder than another, that a problem is not solvable, etc.
A Turing machine (TM) is a mathematical model of computing devices. It is one of the simplest models that can really compute. In fact, the computer you are using is, in essence, a very big (but finite) TM. TMs are not outdated. We have other models of computation, but this one informed the design of today's computers, and because of that we owe a lot to Alan Turing, who proposed the model in 1936.

Finding prime factors to large numbers using specially-crafted CPUs

My understanding is that many public key cryptographic algorithms these days depend on large prime numbers to make up the keys, and it is the difficulty in factoring the product of two primes that makes the encryption hard to break. It is also my understanding that one of the reasons that factoring such large numbers is so difficult, is that the sheer size of the numbers used means that no CPU can efficiently operate on the numbers, since our minuscule 32 and 64 bit CPUs are no match for 1024, 2048 or even 4096 bit numbers. Specialized Big Integer math libraries must be used in order to process those numbers, and those libraries are inherently slow since a CPU can only hold (and process) small chunks (like 32 or 64 bits) at one time.
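For context, here is a rough sketch (my own illustration, not a description of any particular library) of what such a big-integer library does internally: the wide number is split into machine-word "limbs" and carries are propagated from limb to limb, which is why each wide operation costs many narrow ones.

#include <stdint.h>
#include <stdio.h>

#define LIMBS 64   /* 64 limbs x 32 bits = one 2048-bit integer */

/* r = a + b; numbers are stored least-significant limb first */
static void big_add(uint32_t r[LIMBS], const uint32_t a[LIMBS], const uint32_t b[LIMBS]) {
    uint64_t carry = 0;
    for (int i = 0; i < LIMBS; i++) {
        uint64_t s = (uint64_t)a[i] + b[i] + carry;   /* widen so the carry is not lost */
        r[i] = (uint32_t)s;
        carry = s >> 32;
    }
    /* a carry out of the top limb would mean the fixed 2048-bit width overflowed */
}

int main(void) {
    uint32_t a[LIMBS] = {0xFFFFFFFFu, 1u};   /* low limb all ones, second limb 1, rest 0 */
    uint32_t b[LIMBS] = {1u};
    uint32_t r[LIMBS];
    big_add(r, a, b);
    printf("lowest limbs of the sum: %08x %08x %08x\n", r[2], r[1], r[0]);  /* 00000000 00000002 00000000 */
    return 0;
}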
So...
Why can't you build a highly specialized custom chip with 2048 bit registers, and giant arithmetic circuits, much in the same way that we scaled from 8 to 16 to 32 to 64-bit CPUs, just build one a LOT larger? This chip wouldn't need most of the circuitry on conventional CPUs, after all it wouldn't need to handle things like virtual memory, multithreading or I/O. It wouldn't even need to be a general-purpose processor supporting stored instructions. Just the bare minimum to perform the necessary arithmetical calculations on ginormous numbers.
I don't know a whole lot about IC design, but I do remember learning about how logic gates work, how to build a half adder, full adder, then link together a bunch of adders to do multi-bit arithmetic. Just scale up. A lot.
Now, I'm fairly certain that there is a very good reason (or 17) that the above won't work (since otherwise one of the many people smarter than I am would have already done it) but I am interested in knowing why it won't work.
(Note: This question may need some re-working, as I'm not even sure yet if the question makes sense)
What #cube said, plus the fact that a giant arithmetic logic unit would take more time for the logic signals to stabilize, and would introduce other complications in digital design. Digital logic design involves something that you take for granted in software, namely that signals through combinational logic take a small but nonzero time to propagate and settle. A 32x32 multiplier needs to be designed carefully. A 1024x1024 multiplier would not only take a huge amount of physical resources on a chip, but would also be slower than a 32x32 multiplier (though perhaps faster than a 32x32 multiplier computing all the partial products needed to perform a 1024x1024 multiply). Plus it's not only the multiplier that's the bottleneck: you've got memory pathways. You'd have to spend a bunch of time gathering the 1024 bits from a memory circuit that's only 32 bits wide, and storing the resulting 2048 bits back into the memory circuit.
Almost certainly it's better to get a bunch of "conventional" 32-bit or 64-bit systems working in parallel: you get the speedup w/o the hardware design complexity.
edit: if anyone has ACM access (I don't), perhaps take a look at this paper to see what it says.
It's because this speedup would only be O(n), while the complexity of factoring the number is something like O(2^n) (with respect to the number of bits). So if you made this überprocessor and factorized numbers 1000 times faster, I would only have to make the numbers 10 bits larger and we would be right back where we started.
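A quick back-of-the-envelope check of that point, assuming a cost that grows roughly like 2^n in the bit length n: a constant k-fold hardware speedup is cancelled by adding only about log2(k) bits.

#include <stdio.h>
#include <math.h>

int main(void) {
    /* assumed cost model: roughly 2^n, so a k-fold speedup buys ~log2(k) bits */
    double speedups[] = {1e3, 1e6, 1e9};
    for (int i = 0; i < 3; i++)
        printf("a %.0ex speedup is cancelled by about %.0f extra key bits\n",
               speedups[i], ceil(log2(speedups[i])));
    return 0;
}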
As indicated above, the primary problem is simply how many possibilities you have to go through to factor a number. That being said, specialized computers do exist to do this sort of thing.
The real progress for this sort of cryptography is improvements in number factoring algorithms. Currently, the fastest known general algorithm is the general number field sieve.
Historically, we seem to be able to factor numbers twice as large each decade. Part of that is faster hardware, and part of it is simply a better understanding of mathematics and how to perform factoring.
I can't comment on the feasibility of an approach exactly like the one you described, but people do similar things very frequently using FPGAs:
Crack DES keys
Crack GSM conversations
Open source graphics card
Shamir & Tromer suggest a similar approach, using a kind of grid computing:
This article discusses a new design for a custom hardware implementation of the sieving step, which reduces [the cost of sieving, relative to TWINKLE,] to about $10M. The new device, called TWIRL, can be seen as an extension of the TWINKLE device. However, unlike TWINKLE it does not have optoelectronic components, and can thus be manufactured using standard VLSI technology on silicon wafers. The underlying idea is to use a single copy of the input to solve many subproblems in parallel. Since input storage dominates cost, if the parallelization overhead is kept low then the resulting speedup is obtained essentially for free. Indeed, the main challenge lies in achieving this parallelism efficiently while allowing compact storage of the input. Addressing this involves myriad considerations, ranging from number theory to VLSI technology.
Why don't you try building an uber-quantum computer and running Shor's algorithm on it?
"... If a quantum computer with a sufficient number of qubits were to be constructed, Shor's algorithm could be used to break public-key cryptography schemes such as the widely used RSA scheme. RSA is based on the assumption that factoring large numbers is computationally infeasible. So far as is known, this assumption is valid for classical (non-quantum) computers; no classical algorithm is known that can factor in polynomial time. However, Shor's algorithm shows that factoring is efficient on a quantum computer, so a sufficiently large quantum computer can break RSA. ..." -Wikipedia

Kernel methods for large scale dataset

Kernel-based classifiers usually require O(n^3) training time because of the inner-product computations between pairs of instances. To speed up training, the inner-product values can be pre-computed and stored in a two-dimensional array. However, when the number of instances is very large, say over 100,000, there will not be sufficient memory to do so (a dense double-precision kernel matrix for 100,000 instances already takes about 80 GB).
So, any better ideas for this?
For modern implementations of support vector machines, the scaling of the training algorithm is dependent on lots of factors, such as the nature of the training data and kernel that you are using. The scaling factor of O(n^3) is an analytical result and isn't particularly useful in predicting how SVM training will scale in real-world situations. For example, empirical estimates of the training algorithm used by SVMLight put the scaling against training set size to be approximately O(n^2).
I would suggest you ask this question in the kernel machines forum. I think you're more likely to get a better answer than on Stack Overflow, which is more of a general-purpose programming site.
The Relevance Vector Machine has a sequential training mode in which you do not need to keep the entire kernel matrix in memory. You can basically calculate a column at a time, determine if it appears relevant, and throw it away otherwise. I have not had much luck with it myself, though, and the RVM has some other issues. There is most likely a better solution in the realm of Gaussian Processes. I haven't really sat down much with those, but I have seen mention of an online algorithm for it.
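To illustrate the column-at-a-time idea (this is my own sketch, not the RVM's actual update rule), the loop below computes one column of an RBF kernel matrix at a time, so memory stays O(n) instead of O(n^2); whatever the training algorithm does with that column would go where the placeholder comment is.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Gaussian (RBF) kernel between two dim-dimensional points */
static double rbf(const double *xi, const double *xj, int dim, double gamma) {
    double d2 = 0.0;
    for (int k = 0; k < dim; k++) {
        double diff = xi[k] - xj[k];
        d2 += diff * diff;
    }
    return exp(-gamma * d2);
}

int main(void) {
    const int n = 1000, dim = 10;     /* small numbers, just for the sketch */
    const double gamma = 0.1;
    double *X   = malloc((size_t)n * dim * sizeof *X);
    double *col = malloc((size_t)n * sizeof *col);   /* one column: O(n) memory, not O(n^2) */
    for (int i = 0; i < n * dim; i++) X[i] = (double)rand() / RAND_MAX;

    for (int j = 0; j < n; j++) {
        for (int i = 0; i < n; i++)
            col[i] = rbf(&X[i * dim], &X[j * dim], dim, gamma);
        /* ...use col here (e.g. decide whether basis function j looks relevant),
           then let it be overwritten by the next column... */
    }

    printf("K(0, n-1) from the last column computed: %f\n", col[0]);
    free(X);
    free(col);
    return 0;
}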
I am not a numerical analyst, but isn't the QR decomposition which you need to do ordinary least-squares linear regression also O(n^3)?
Anyways, you'll probably want to search the literature (since this is fairly new stuff) for online learning or active learning versions of the algorithm you're using. The general idea is to either discard data far from your decision boundary or to not include them in the first place. The danger is that you might get locked into a bad local maximum and then your online/active algorithm will ignore data that would help you get out.
