General proof of equivalence of two FSMs in finite time? - theory

Does a general proof exist for the equivalence of two (deterministic) finite state machines that always takes finite time? That is, given two FSMs, can you prove that given the same inputs they will always produce the same outputs without actually needing to execute the FSMs (which could be non-terminating?). If such a proof does exist, what is the time complexity?

There is a proof, though I don't know it. Look for Sipser's textbook on the subject, that's where I know of it from.
Scrounging my memory: basically, there is a unique minimal DFA for a given DFA, and there is a minimization algorithm that always terminates. Minimize both A and B, and see if they have the same minimal DFA. I don't know the complexity of the minimization, though its not too bad (I think its polynomial). Graph isomorphism is pretty hard to compute, but because there's a special starting node, it may hopefully be somewhat easier. You may not even require graph isomorphism, to be honest.
But no, you don't ever need to actually run the DFAs, just analyze their structure.

Suppose you have two FSMs with O(n) states. Then you can make an FSM of size O(n2) that recognizes only the symmetric difference of their accept languages. (Make an FSM that has states that correspond to a pair of states, one from each FSM. Then on each step, update each part of the pair simultaneously. A state in the new FSM is an accept state iff exactly one of the pair was an accept state.) Now minimize this FSM and see if it is the same as the trivial one-state FSM that rejects everything. Minimizing an FSM with m states takes time O(m log m), so overall you can do everything in time O(n2 log n).
#Agor correctly states that Sipser is a good reference for these sorts of things. The key point of my answer is that you can do this in polynomial time, even with a small exponent.

Related

Temporal complexity of primary instructions in C

I have a question about algorithmic complexity.
Do the basic instructions in C have an equivalent complexity, if not, in what order are they:
if, write/read a single cell of a matrix, a+b, a*b, a = b ...
Thanks
No. The basic instructions in C cannot be ordered by any kind of wall-time or theoretic complexity. This is not specified and probably cannot be specified by the Standard; rather, these properties arise from the interaction of the code, the OS, and the underlying architecture.
I think you're looking for information on cycles per instruction.
However, even this is not the whole story. Modern CPUs have hierarchical caches. If your algorithm operates on data which is primarily in a fast cache, then it will run much faster than a program which operates on data that must be repeatedly accessed from RAM, the hard drive, or over a network. The amount of calculation done per load is an application's arithmetic intensity. Roofline models provide a tool for thinking about this. You can achieve better cache utilization via blocking and other techniques, though the subfield of communication avoiding algorithms explores this in-depth.
Ultimately, the C language is a high-level abstraction of what a processor actually does. In standard cost models we think of all instructions as taking the same amount of time. In more accurate, but potentially more difficult to use, cache-aware cost models, data movement is treated as being more expensive.
Complexity is not about the time it takes to execute "basic" code lines like addition, multiplication, division and so on.
Even if these expressions have different execution time they all have complexity O(1).
Complexity is about what happens when some variable figure changes. That variable figure can be many different things. Some examples could be "the number of element in an array", "the number of elements in a linked list", "the size of a file", "the size of a matrix".
For instance - if you write code that has to find the largest value in an array of integers, the execution time depends on the number of elements in the array. The code will have to visit every array element to check if it's larger than the previous elements. Consequently, the complexity is O(N), where N is the number of elements. From that we can't say how much time it will take to find the largest element but we can say that it will take 10 times longer to execute on a 1000 element array than on a 100 element array.
Now if you did the same with a linked list (i.e. find largest element) the complexity would again be O(N). However, this does not say that a linked list perform just the same as an array. It only says that it scales in the same way as an array.
A simplified way to say it - if there is no loops involved the complexity is always
O(1).

Show a language is infinite

Given a DFA M with n states defined over an alphabet ∑, and a string x∈L(M) such that |x|>n, I must show that the L(M) is an infinite language.
Is there any way I can prove this using the pumping lemma?
I'm not immediately sure why the pumping lemma needs to be involved directly. The proof idea is the same one behind the proof of the pumping lemma, and we can explain the relationship. Possibly, with that explanation in hand, you can figure out a way to work out a proof that relies on the pumping lemma directly. I think that should in theory be possible to do without too much mental gymnastics.
Imagine a DFA with three states that accepts a string of length four. This is a concrete instance of the general claim you've been asked to prove. If we can understand why this is specifically true here, it will be easier to generalize to DFAs with an arbitrary number of states.
A DFA executes one transition for every symbol of input. Each transition takes the DFA to some state. Since the string of length four is accepted, we execute four transitions - beginning with the initial state - and we end up in some accepting state. Since each transition takes the DFA to some state, we move into states four times. Since there are only three states in the DFA, that means one of the states was transitioned into more than once: we cannot transition four times around three states without visiting some state more than once. The general principle here is referred to as the Pigeonhole principle: if there are n pigeons in m holes, and n > m, then some hole has more than one pigeon. This is the same reasoning behind why the pumping lemma works, by the way.
Now, consider what this observation means. We accepted a string and, while accepting that string, some state was visited twice. This means that there is a loop in our DFA somewhere - a way to get to an earlier state from a later state - and, furthermore, it is possible to accept strings after taking the loop. If the loop can be taken once:
it might not have been taken at all, so there is definitely a shorter string accepted by the DFA.
it might be taken twice, so there is definitely a longer string accepted by the DFA.
indeed, it might be taken over and over again an arbitrary number of times, so infinitely many longer strings must be in the language.
When using the pumping lemma, you typically assume a language is regular and conclude that what the pumping lemma says about regular languages - what we have said above - is true. Then, you provide a counterexample - a string of length greater than the pumping length p (more on what p is in a moment) - and conclude the assumption is false, that is, that the language is not regular.
The pumping length is essentially the number of states in some DFA for the language. If the language is regular, there are infinitely many DFAs that accept it. For the purposes of using the pumping lemma, it doesn't really matter which DFA for the language you use, since any DFA will visit some state more than once if it processes an input whose length exceeds the number of states in the DFA. Perhaps it is typical to imagine a minimal DFA for the language, which is guaranteed to exist and be unique up to isomorphism.

What is good measure to compare algorithms?

Well I was reading an article about comparing two algorithms by firstly analyzing them.
My teacher taught me that you can analyze algorithm by directly using number of steps for that algorithm.
for ex:
algo printArray(arr[n]){
for(int i=0;i<n;i++){
write arr[i];
}
}
will have complexity of O(N), where N is size of array. and it repeats the for loop for N times.
while
algo printMatrix(arr,m,n){
for(i=0;i<m;i++){
for(j=0;j<n;j++){
write arr[i][j];
}
}
}
will have complexity of O(MXN) ~ O(N^2) when M=N. statements inside for are executed MXN times.
similarly O(log N). if it divides input into 2 equal parts. and So on.
But according to that article:
The Measures Execution Time, Number of statements aren't good for analyzing the algorithm.
because:
Execution Time will be system Dependent and,
Number of statements will vary with the programming language used.
and It states that
Ideal Solution will be to express running time of algorithm as a function of input size N that is f(n).
That confused me a little, How can you calculate running time if you consider execution time as not good measure?
Can experts here please elaborate this?
Thanks in advance.
When you were saying "complexity of O(N)" that is referred to as "Big-O notation" which is the same as the "Ideal Solution" that you mentioned in your post. It is a way of expressing run time as a function of input size.
I think were you got confused was when it said "express running time" - it didn't mean express it in a numerical value (which is what execution time is), it meant express it in Big-O notation. I think you just got tripped up on the terminology.
Execution time is indeed system-dependent, but it also depends on the number of instructions the algorithm executes.
Also, I do not understand how the number of steps is irrelevant, given that algorithms are analyzed as language-agnostic and without paying any attention to whatever features and syntactic-sugars various languages imply.
The one measure of algorithm analysis I have always encountered since I started analyzing algorithms is the number of executed instructions and I fail to see how this metric may be irrelevant.
At the same time, complexity classes are meant as an "order of magnitude" indication of how fast or slow an algorithm is. They are dependent of the number of executed instructions and independent of the system the algorithm runs on, because by definition an elementary operation (such as addition of two numbers) should take constant time, however large or small this "constant" means in practice, therefore complexity classes do not change. The constants inside the expression for the exact complexity function may indeed vary from system to system, but what is actually relevant for algorithm comparison is the complexity class, as only by comparing those can you find out how an algorithm behaves on increasingly large inputs (asymptotically) compared to another algorithm.
Big-O notation waves away constants (both fixed cost and constant multipliers). So any function that takes kn+c operations to complete is (by definition!) O(n), regardless of k and c. This is why it's often better to take real-world measurements (profiling) of your algorithms in action with real data, to see how fast they effectively are.
But execution time, obviously, varies depending on the data set -- if you're trying to come up with a general measure of performance that's not based on a specific usage scenario, then execution time is less valuable (unless you're comparing all algorithms under the same conditions, and even then it's not necessarily fair unless you model the majority of possible scenarios, and not just one).
Big-O notation becomes more valuable as you move to larger data sets. It gives you a rough idea of the performance of an algorithm, assuming reasonable values for k and c. If you have a million numbers you want to sort, then it's safe to say you want to stay away from any O(n^2) algorithm, and try to find a better O(n lg n) algorithm. If you're sorting three numbers, the theoretical complexity bound doesn't matter anymore, because the constants dominate the resources taken.
Note also that while the number of statements a given algorithm can be expressed in varies wildly between programming languages, the number of constant-time steps that need to be executed (at the machine level for your target architecture, which is typically one where integer arithmetic and memory accesses take a fixed amount of time, or more precisely are bounded by a fixed amount of time). It is this bound on the maximum number of fixed-cost steps required by an algorithm that big-O measures, which has no direct relation to actual running time for a given input, yet still describes roughly how much work must be done for a given data set as the size of the set grows.
In comparing algorithms, execution speed is important as well mentioned by others, but other factors like memory space are crucial too.
Memory space also uses order of complexity notation.
Code could sort an array in place using a bubble sort needing only a handful of extra memory O(1). Other methods, though faster, may need O(ln N) memory.
Other more esoteric measures include code complexity like Cyclomatic complexity and Readability
Traditionally, computer science measures algorithm effectivity (speed) by the number of comparisons or sometimes data accesses, using "Big O notation". This is so, because the number of comparisons (and/or data accesses) is a good mathematical model to describe efficiency of certain algorithms, searching and sorting ones in particular, where O(log n) is considered the fastest possible in theory.
This theoretic model has always had several flaws though. It assumes that comparisons (and/or data accessing) are what takes time, and that the time for performing things like function calls and branching/looping is neglectible. This is of course nonsense in the real world.
In the real world, a recursive binary search algorithm might for example be extremely slow compared to a quick & dirty linear search implemented with a plain for loop, because on the given system, the function call overhead is what takes the most time, not the comparisons.
There are a whole lot of things that affect performance. As CPUs evolve, more such things are invented. Nowadays, you might have to consider things like data alignment, instruction pipe-lining, branch prediction, data cache memory, multiple CPU cores and so on. All these technologies make traditional algorithm theory rather irrelevant.
To write the most effective code possible, you need to have a specific system in mind and you need in-depth knowledge about said system. Fortunately, compilers have evolved a lot too, so a lot of the in-depth system knowledge can be left to the person who implements a compiler port for the specific system.
Generally, I think many programmers today spend far too much time pondering about program speed and coming up with "clever things" to get better performance. Back in the days when CPUs were slow and compilers were terrible, such things were very important. But today, a good, modern programmer focus on making the code bug-free, readable, maintainable, re-useable, secure, portable etc. It doesn't matter how fast your program is, if it is a buggy mess of unreadable crap. So deal with performance when the need arises.

What's differential evolution and how does it compare to a genetic algorithm?

From what I've read so far they seem very similar.
Differential evolution uses floating point numbers instead, and the solutions are called vectors? I'm not quite sure what that means.
If someone could provide an overview with a little bit about the advantages and disadvantages of both.
Well, both genetic algorithms and differential evolution are examples of evolutionary computation.
Genetic algorithms keep pretty closely to the metaphor of genetic reproduction. Even the language is mostly the same-- both talk of chromosomes, both talk of genes, the genes are distinct alphabets, both talk of crossover, and the crossover is fairly close to a low-level understanding of genetic reproduction, etc.
Differential evolution is in the same style, but the correspondences are not as exact. The first big change is that DE is using actual real numbers (in the strict mathematical sense-- they're implemented as floats, or doubles, or whatever, but in theory they're ranging over the field of reals.) As a result, the ideas of mutation and crossover are substantially different. The mutation operator is modified so far that it's hard for me to even see why it's called mutation, as such, except that it serves the same purpose of breaking things out of local minima.
On the plus side, there are a handful of results showing DEs are often more effective and/or more efficient than genetic algorithms. And when working in numerical optimization, it's nice to be able to represent things as actual real numbers instead of having to work your way around to a chromosomal kind of representation, first. (Note: I've read about them, but I've not messed extensively with them so I can't really comment from first hand knowledge.)
On the negative side, I don't think there's been any proof of convergence for DEs, yet.
Differential evolution is actually a specific subset of the broader space of genetic algorithms, with the following restrictions:
The genotype is some form of real-valued vector
The mutation / crossover operations make use of the difference between two or more vectors in the population to create a new vector (typically by adding some random proportion of the difference to one of the existing vectors, plus a small amount of random noise)
DE performs well for certain situations because the vectors can be considered to form a "cloud" that explores the high value areas of the solution solution space quite effectively. It's pretty closely related to particle swarm optimization in some senses.
It still has the usual GA problem of getting stuck in local minima however.

The limits of parallelism (job-interview question)

Is it possible to solve a problem of O(n!) complexity within a reasonable time given infinite number of processing units and infinite space?
The typical example of O(n!) problem is brute-force search: trying all permutations (ordered combinations).
It sure is. Consider the Traveling Salesman Problem in it's strict NP form: given this list of costs for traveling from each point to each other point, can you put together a tour with cost less than K? With the new infinite-core CPU from Intel, you just assign one core to each possible permutation, and add up the costs (this is fast), and see if any core flags a success.
More generally, a problem in NP is a decision problem such that a potential solution can be verified in polynomial time (i.e., efficiently), and so (since the potential solutions are enumerable) any such problem can be efficiently solved with sufficiently many CPUs.
It sounds like what you're really asking is whether a problem of O(n!) complexity can be reduced to O(n^a) on a non-deterministic machine; in other words, whether Not-P = NP. The answer to that question is no, there are some Not-P problems that are not NP. For example, a limited halting problem (that asks if a program halts in at most n! steps).
The problem would be distributing the work and collecting the results.
If all the CPUs can read the same piece of memory at once, and if each one has a unique CPU-ID that is known to it, then the ID may be used to select a permutation, and the distribution problem is solveable in constant time.
Gathering the results would be tricky, though. Each CPU could compare with its (numerical) neighbor, and then that result compared to the result of the two closest neighbors, etc. This will be a O(log(n!)) process. I don't know for sure, but I suspect that O(log(n!)) is hyperpolynomial, so I don't think that's a solution.
No, N! is even higher than NP. Thinking unlimited parallelism could solve NP problem in polynomial time, which is usually considered as a "reasonable" time complexity, N! problem is still higher than polynomial on such a setup.
You mentioned search as a "typical" problem, but were you actually asked specifically about a search problem? If so, then yes, search is typically parallelizable, but as far as I can tell O(n!) in principle does not imply the degree of concurrency available, does it? You could have a completely serial O(n!) problem, which means infinite computers won't help. I once had an unusual O(n^4) problem that actually was completely serial.
So, available concurrency is the first thing, and IMHO you should get points for bringing up Amdahl's law in an interview. Next potential pitfall is inter-processor communication, and in general the nature of the algorithm. Consider, for example, this list of application classes: http://view.eecs.berkeley.edu/wiki/Dwarf_Mine. FWIW the O(n^4) code I mentioned earlier sort of falls into the FSM category.
Another somewhat related anecdote: I've heard an engineer from a supercomputer vendor claim that if 10% of their CPU time were being spent in MPI libraries, they consider the parallelization a solid success (though that may have just been limited to codes in the computational chemistry domain).
If the problem is one of checking permutations/answers to a problem of complexity O(n!), then of course you can do it efficiently with an infinite number of processors.
The reason is that you can easily distribute atomic pieces of the problem (an atomic piece of the problem might, say, be one of the permutations to check) with logarithmic efficiency.
As a simple example, you could set up the processors as a 'binary tree', so to speak. You could be at the root, and have the processors deliver permutations of the problem (or whatever the smallest pieces of the problem might be) to the leaf processors to solve, and you'd end up solving the problem in log(n!) time.
Remember it's the delivery of the permutations to the processors that takes a long time. Each part of the problem itself will actually be solved instantly.
Edit: Fixed my post according to the comments below.
Sometimes the correct answer is, "How many times does this come up with your code base?" but in this case, there is a real answer.
The correct answer is no, because not all problems can be solved using perfect parallel processing. For example, a travelling salesman-like problem must commit to one path for the second leg of the journey to be considered.
Assuming a fully connected matrix of cities, should you want to display all possible non-cyclic routes for our weary salesman, you're stuck with a O(n!) problem, which can be decomposed to an O(n)*O((n-1)!) problem. The issue is that you need to commit to one path (on the O(n) side of the equation) before you can consider the remaining paths (on the O((n-1)!) side of the equation).
Since some of the computations must be performed prior to other computations, then there is no way to scatter the results perfectly in a single scatter / gather pass. That means the solution will be waiting on the results of calculations which must come before the "next" step can be started. This is the key, as the need for prior partial solutions provide a "bottle neck" in the ability to proceed with the computation.
Since we've proven we can make a number of these infinitely fast, infinitely numerous, CPUs wait (even if they are waiting on themselves), we know that the runtime cannot be O(1), and we only need to pick a very large N to guarantee an "unacceptable" run time.
This is like asking if an infinite number of monkeys typing on a monkey-destruction proof computer with a word-processor can come up with all the works of Shakespeare; given an infinite amount of time. The realist would say not since the conditions are no physically possible. The idealist will say yes; in theory it can happen. Since Software Engineering (Software Engineering, not Computer Science) focuses on real system we can see and touch, then the answer is no. If you doubt me, then go build it and prove me wrong! IMHO.
Disregarding the cost of setup (whatever that might be...assigning a range of values to a processing unit, for instance), then yes. In such a case, any value less than infinity could be solved in one concurrent iteration across an equal number of processing units.
Setup, however, is something significant to disregard.
Each problem could be solved by one CPU, but who would deliver these jobs to all infinite CPU's? In general, this task is centralized, so if we have infinite jobs to deliver to all infinite CPU's, we could take infinite time to do so.

Resources