Recognizing patterns in a number sequence - artificial-intelligence

I think this should be an AI problem.
Is there any algorithm that, given any number sequence, can find patterns?
And the patterns could be as abstract as can be...
For example:
12112111211112 ... ( increasing number of 1's separated by 2 )
1022033304440 ...
11114444333322221111444433332222.. (this can be either repetition of 1111444433332222
or
four 1's and 4's and 3's and 2's...)
and even some errors might be corrected:
1111111121111111111112111121111 (repetition of 1's with
intermittent 2's)

No, it's impossible; this is related to the halting problem and Gödel's incompleteness theorems.
Furthermore, some serious philosophical groundwork would need to be done to actually formalize the question. First, what exactly is meant by "recognizing a pattern"? We should assume the algorithm identifies:
The most expressive true pattern. So "some numbers" is invalid as it is not expressive enough
The argument would go something like this: assume the algorithm exists, and consider a sequence of numbers that encodes a sequence of programs. Suppose we feed it a sequence of halting programs; by the above it must recognize this. It cannot just say "some programs", as that is not maximally expressive, so it must say "halting programs". Now, given a program P, add it to the list: if the algorithm still says "halting programs", we conclude P halts; if P does not halt, the algorithm must say something else, like "some halting programs and one non-halting". The algorithm could therefore be used to build an algorithm that decides whether an arbitrary program halts.
Not a formal proof, but then it was not a formal question :) I suggest you look up Gödel, the halting problem, and Kolmogorov complexity.
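For intuition on the Kolmogorov-complexity angle: any compressor is a crude, incomplete pattern detector, and any fixed, computable one necessarily misses most patterns. A minimal sketch using run-length encoding (the function name `rle_length` is just for illustration):

```c
#include <stdio.h>

/* A crude illustration of the compression view of "pattern": run-length
   encode a digit string and report the encoded length (one symbol plus
   one repeat count per run). The shorter the result relative to the
   input, the more regularity this particular, very weak model captured.
   Kolmogorov complexity generalizes this to arbitrary programs, which is
   exactly where uncomputability strikes. */
int rle_length(const char *s) {
    int out = 0;
    for (int i = 0; s[i] != '\0'; ) {
        int j = i;
        while (s[j] == s[i]) j++;   /* scan to the end of the current run */
        out += 2;                   /* one symbol + one repeat count */
        i = j;
    }
    return out;
}
```

On "1111111" this yields 2 (a strong "pattern"); on the examples above it does much worse, because run-length coding only models one narrow kind of regularity.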


How expressive can we be with arrays in Z3(Py)? An example

My first question is whether I can express the following formula in Z3Py:
Exists i::Integer s.t. (0<=i<|arr|) & (avg(arr)+t<arr[i])
This means: is there a position i with 0<=i<|arr| in the array whose value arr[i] is greater than the average of the array, avg(arr), plus a given threshold t?
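For reference, the concrete meaning of the quantified formula on a fixed array can be written out as an executable check (this is C, purely as a restatement of the assertion, not Z3Py code; the cross-multiplication keeps everything in integers):

```c
#include <stdbool.h>

/* Concrete semantics of the formula in the question:
   Exists i. (0 <= i < |arr|) & (avg(arr) + t < arr[i]).
   To stay in integers, we test n*arr[i] > sum + n*t, which for n > 0
   is equivalent to arr[i] > sum/n + t with an exact (rational) average. */
bool exists_above_avg(const int *arr, int n, int t) {
    long long sum = 0;
    for (int i = 0; i < n; i++) sum += arr[i];
    for (int i = 0; i < n; i++)
        if ((long long)n * arr[i] > sum + (long long)n * t)
            return true;
    return false;
}
```

In Z3Py with a concretely known length you would mirror this shape with symbolic `Int` elements and a disjunction over the indices, which sidesteps the quantifier entirely.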
I know this kind of expression can be checked in Dafny, and (since Dafny uses Z3 underneath) I guess it can be done in Z3Py.
My second question is: how expressive is the decidable fragment involving arrays in Z3?
I read this paper on how the full theory of arrays is not decidable (http://theory.stanford.edu/~arbrad/papers/arrays.pdf), but only a concrete fragment, the array property fragment.
Is there any interesting paper/tutorial on what can and cannot be done with arrays+quantifiers+functions in Z3?
You found the best paper to read regarding reasoning about arrays, so I doubt there's a better resource or tutorial out there for you.
I think the sequence logic (not yet officially supported by SMTLib, but z3 supports it), is the right logic to use for reasoning about these sorts of problems, see: https://microsoft.github.io/z3guide/docs/theories/Sequences/
Having said that, most properties about arrays/sequences of "arbitrary size" require inductive proofs. This is because most interesting functions on them are essentially recursive (or iterative), and induction is the only way to prove properties for such programs. While SMT solvers have improved significantly regarding support for recursive definitions and induction, they still don't perform anywhere near as well as a traditional theorem prover. (This is, of course, to be expected.)
I'd recommend looking at the sequence logic and playing around with recursive definitions. You might get some mileage out of that, though don't expect proofs for anything that requires induction, especially if the inductive hypothesis needs some clever invariant to be specified.
Note that if you know the length of your array concretely (i.e., 10, 15, or some other hopefully not-too-large number), then it's best to allocate the elements symbolically yourself and not use arrays/sequences at all. (And you can repeat your proof for lengths 0, 1, 2, ... up to some fixed number.) But if you want proofs that work for arbitrary lengths, your best bet is to use sequences in z3, not arrays; with all the caveats I mentioned above.

Determine if a given integer number is element of the Fibonacci sequence in C without using float

I recently had an interview, where I failed and was finally told I did not have enough experience to work for them.
The position was embedded C software developer. The target platform was some kind of very simple 32-bit architecture whose processor does not support floating-point numbers or their operations. Therefore double and float cannot be used.
The task was to develop a C routine for this architecture that takes one integer and returns whether or not it is a Fibonacci number. However, only an additional 1K of temporary memory may be used during execution. That means even if I simulated very large integers, I couldn't just build up the sequence and iterate through it.
As far as I know, a positive integer n is a Fibonacci number exactly when one of
5n² + 4
or
5n² − 4
is a perfect square. Therefore I answered: it is simple, the routine just has to determine whether that is the case.
They then responded: on the target architecture no floating-point-like operations are supported, so no square roots can be computed using the stdlib's sqrt function. It was also mentioned that basic operations like division and modulus may not work either, because of the architecture's limitations.
Then I said: okay, we could build an array of the square numbers up to 256, iterate through it, and compare the entries to the numbers given by the formulas above. They said this was a bad approach, even if it would work, and did not accept that answer.
Finally I gave up, since I had no other ideas. I asked what the solution was; they said it would not be told, but advised me to look for it myself. My first approach (the two formulas) should be the key, but the square root would have to be computed differently.
I googled a lot at home, but never found any "alternative" square-root algorithms; everywhere, floating-point numbers were used.
For operations like division and modulus, so-called integer division may be used. But what can be used for the square root?
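One classic answer is the bit-by-bit integer square root, which needs no floating point, no division, and no multiplication; a sketch:

```c
#include <stdint.h>

/* Classic bit-by-bit (base-4 "abacus") integer square root: returns
   floor(sqrt(n)) using only shifts, additions, and comparisons; no
   floating point, no division, no multiplication. */
uint32_t isqrt(uint32_t n) {
    uint32_t res = 0;
    uint32_t bit = 1u << 30;        /* highest power of 4 in 32 bits */
    while (bit > n) bit >>= 2;      /* start at the largest useful bit */
    while (bit != 0) {
        if (n >= res + bit) {
            n -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return res;
}
```

A perfect-square test is then `isqrt(x) * isqrt(x) == x`, which is all the 5n²±4 criterion needs, aside from the width problem: 5n²+4 can exceed 32 bits, so a multi-word variant would be needed for large n.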
Even though I failed the interview test, this is a very interesting topic for me: working on architectures where no floating-point operations are available.
Therefore my questions:
How can floating-point numbers be simulated if only integers are available?
What would be a possible solution in C for the problem above? Code examples are welcome.
The point of this type of interview is to see how you approach new problems. If you happen to already know the answer, that is undoubtedly to your credit but it doesn't really answer the question. What's interesting to the interviewer is watching you grapple with the issues.
For this reason, it is common that an interviewer will add additional constraints, trying to take you out of your comfort zone and seeing how you cope.
I think it's great that you knew that fact about recognising Fibonacci numbers. I wouldn't have known it without consulting Wikipedia. It's an interesting fact but does it actually help solve the problem?
Apparently, it would be necessary to compute 5n²±4, compute the square roots, and then verify that one of them is an integer. With access to a floating point implementation with sufficient precision, this would not be too complicated. But how much precision is that? If n can be an arbitrary 32-bit signed number, then n² is obviously not going to fit into 32 bits. In fact, 5n²+4 could be as big as 65 bits, not including a sign bit. That's far beyond the precision of a double (normally 52 bits) and even of a long double, if available. So computing the precise square root will be problematic.
Of course, we don't actually need a precise computation. We can start with an approximation, square it, and see if it is either four more or four less than 5n². And it's easy to see how to compute a good guess: it will be very close to n×√5. By using a good precomputed approximation of √5, we can easily do this computation without the need for floating point, without division, and without a sqrt function. (If the approximation isn't accurate, we might need to adjust the result up or down, but that's easy to do using the identity (n+1)² = n²+2n+1; once we have n², we can compute (n+1)² with only addition.)
We still need to solve the problem of precision, so we'll need some way of dealing with 66-bit integers. But we only need to implement addition and multiplication of positive integers, which is considerably simpler than a full-fledged bignum package. Indeed, if we can prove that our square-root estimate is close enough, we could safely do the verification modulo 2³¹.
So the analytic solution can be made to work, but before diving into it, we should ask whether it's the best solution. One very common category of suboptimal programming is clinging desperately to the first idea you came up with, even as its complications become increasingly evident. That is one of the things the interviewer wants to know about you: how flexible are you when presented with new information or new requirements?
So what other ways are there to tell whether n is a Fibonacci number? One interesting fact is that if n is Fib(k), then k is the floor of logφ(n×√5 + 0.5). Since logφ is easily computed from log2, which in turn can be approximated by a simple bitwise operation, we could try finding an approximation of k and verifying it using the classic O(log k) recursion for computing Fib(k).
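The classic O(log k) recursion mentioned here is usually the fast-doubling method; a sketch (with 64-bit intermediates, since the squared terms overflow 32 bits well before Fib(47)):

```c
#include <stdint.h>

/* Fast-doubling recursion for Fibonacci numbers, an O(log k) scheme:
   Fib(2m)   = Fib(m) * (2*Fib(m+1) - Fib(m))
   Fib(2m+1) = Fib(m)^2 + Fib(m+1)^2
   Computes f = Fib(k) and g = Fib(k+1). */
void fib_pair(uint32_t k, uint64_t *f, uint64_t *g) {
    if (k == 0) { *f = 0; *g = 1; return; }
    uint64_t a, b;                   /* a = Fib(m), b = Fib(m+1), m = k/2 */
    fib_pair(k / 2, &a, &b);
    uint64_t c = a * (2 * b - a);    /* Fib(2m) */
    uint64_t d = a * a + b * b;      /* Fib(2m+1) */
    if (k % 2 == 0) { *f = c; *g = d; }
    else            { *f = d; *g = c + d; }
}
```

Given an estimated index k from the logarithm, one call to `fib_pair` confirms or refutes the guess.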
Even more simply, we could just run through the Fibonacci series in a loop, checking whether we hit the target number. Only 47 iterations are needed. Alternatively, these 47 numbers could be precalculated and searched with binary search, using far less than the 1K bytes you are allowed.
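That loop might look like this (a sketch; the wrap-around check handles inputs beyond Fib(47), the largest Fibonacci number that fits in 32 bits):

```c
#include <stdint.h>

/* Walk the Fibonacci sequence until it reaches or passes n; at most 47
   steps, since Fib(47) = 2971215073 is the largest 32-bit Fibonacci
   number. Integers only, constant memory. */
int is_fibonacci(uint32_t n) {
    uint32_t a = 0, b = 1;          /* consecutive Fibonacci numbers */
    while (a < n) {
        uint32_t next = a + b;
        if (next < b) break;        /* unsigned wrap: we passed Fib(47) */
        a = b;
        b = next;
    }
    return a == n || b == n;        /* b catches Fib(47) at the break */
}
```

This uses a handful of bytes, well under the 1K budget, and needs no division, modulus, or square roots at all.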
It is unlikely an interviewer for a programming position would be testing for knowledge of a specific property of the Fibonacci sequence. Thus, unless they present the property to be tested, they are examining the candidate's approaches to problems of this nature and their general knowledge of algorithms. Notably, the notion of iterating through a table of squares is a poor response on several fronts:
At a minimum, binary search should be the first thought for table look-up. Some calculated look-up approaches could also be proposed for discussion, such as using find-first-set-bit instruction to index into a table.
Hashing might be another idea worth considering, especially since an efficient customized hash might be constructed.
Once we have decided to use a table, it is likely a direct table of Fibonacci numbers would be more useful than a table of squares.

Can Turing machines halt and implicitly accept strings that the Turing machine cannot handle?

So I came across a problem that I was unsure about. For curiosity's sake, I wanted to ask:
I am pretty sure that a Turing machine can implicitly reject strings that it cannot handle, but can it do the complement of that? In other words, can it implicitly accept an input that it cannot handle? I apologize if this is a stupid question; I cannot seem to find an answer to it.
That's not a stupid question! I believe what is meant by "strings that it cannot handle" is actually, "strings which are not in a valid format", which I take to mean "strings that contain symbols that we don't know." (I'm going off of slide 14 of this presentation, which I found by just googling Turing 'implicitly reject').
So, if we do use that definition, then we need to simply create a Turing machine that accepts an input if it contains a symbol not in our valid set.
Yes, there are other possible interpretations of "strings that it cannot handle", but I'm fairly sure it means this. It obviously could not be a definition without constraints, or else we could define "strings that it cannot handle" as, say, "strings representing programs that halt", and we'd have solved the halting problem! (Or, if you're not familiar with the halting problem, you could substitute any other undecidable problem.)
I think the reason the idea of rejecting strings the Turing machine cannot handle was introduced in the first place is so that the machine is well defined on all inputs. So if you have a Turing machine that accepts a binary number if it's divisible by 3, but you pass in input that is not a binary number (like, say, "apple sauce"), we can still reason about the output of the program.

What is the logical difference between loops and recursive functions?

I came across this video discussing how most recursive functions can be written with for loops, but when I thought about it, I couldn't see the logical difference between the two. I found this topic here, but it only focuses on the practical difference, as do many other similar topics on the web. So what is the logical difference in the way a loop and a recursion are handled?
Bottom line up front -- recursion is more versatile but in practice is generally less efficient than looping.
A loop could in principle always be implemented as a recursion if you wished to do so. In practice the limits of stack resources put serious constraints on the size of the problems you can address. I can and have built loops that iterate a billion times, something I'd never try with recursion unless I was certain the compiler could and would convert the recursion into a loop. Because of the stack limits and efficiency, people often try to find a looping equivalent for recursions.
Tail recursions can always be converted to loops. However, there are recursions that can't be converted. As an example, I work with statistical design of experiments. Sometimes a large design is constructed by "crossing" several smaller sub-designs. Crossing is where you concatenate every row of a second design to each row of the first. For two sub-designs, all this needs is simple nested looping, but for three or more designs you need to increase the level of nesting, adding one level of nesting for each additional sub-design. So while this is nested looping in principle, in practice the amount of nesting is variable. If you tried to implement it with looping you'd have to revise your program to add/subtract nested loops every time you were dealing with a different number of sub-designs to be crossed, so you can't write an immutable loop-based version. This can easily be implemented with recursion. In this case, I'm happy to trade a slight amount of efficiency, because I wrote and debugged the code 6 years ago and haven't had to revise it since, despite creating lots of crossed designs of varying complexity since then.
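The variable-depth nesting can be sketched like this, with the recursion depth standing in for a variable number of nested loops (a toy version in which sub-design d is just sizes[d] rows and we only count the crossed rows; a real version would emit row[0..ndesigns-1] each time):

```c
/* "Crossing" a variable number of sub-designs: one recursion level per
   sub-design replaces one level of loop nesting, so the same code handles
   any number of sub-designs without being rewritten. */
long cross(const int *sizes, int ndesigns, int depth, int *row) {
    if (depth == ndesigns)
        return 1;                   /* one complete crossed row chosen */
    long count = 0;
    for (int i = 0; i < sizes[depth]; i++) {
        row[depth] = i;             /* pick a row of sub-design `depth` */
        count += cross(sizes, ndesigns, depth + 1, row);
    }
    return count;
}
```

Crossing three sub-designs of 2, 3, and 4 rows yields 2 × 3 × 4 = 24 rows, with no change to the code when a fourth sub-design is added.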
One way to think through this is that the choice for recursion or iteration depends on how you think about the problem being solved. Certain "ways of thinking" lead more naturally to recursive solutions, and other ways of thinking lead to more iterative solutions. For any problem, you can in principle think in a way that gives you a recursive solution or a way that gives you an iterative solution. (Sometimes the iterative solution will just end up simulating a recursion stack, but there is no actual recursion there.)
Here's an example. You have an array of integers (positive or negative), and you want to find the maximum segment sum. A segment is a piece of the array that is contiguous. So in the array [3, -4, 2, 1, -2, 4], the maximum segment sum is 5, and you get that from the segment [2, 1, -2, 4]; its sum is 5.
OK - so how might we solve this problem? One thing you might do is reason like this: "if I knew the maximum segment sum in the left half, and the maximum segment sum in the right half, then maybe I could somehow jam those together and figure out the maximum segment sum overall". This idea would require you to find the maximum segment sum on the two subhalves, and this is a smaller instance of the original problem. This is recursion, and a direct translation of this idea into code would therefore be recursive.
But the maximum segment sum problem isn't "recursive" or "iterative" -- it can be both, depending on how you think about the solution. I gave a recursive thought process above. Here is an iterative process: "well, if I add up the elements in each of the segments that start at some index i and end at some index j, I can just take the maximum of these to solve the problem". And directly trying to code this approach would give you triply nested loops (and a bad mark on an assignment because it's horribly inefficient!).
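For concreteness, the recursive "jam the halves together" reading translates fairly directly into code; a divide-and-conquer sketch:

```c
#include <limits.h>

static int max2(int a, int b) { return a > b ? a : b; }

/* Divide-and-conquer maximum segment sum over arr[lo..hi]: the best
   segment lies entirely in the left half, entirely in the right half,
   or crosses the middle, in which case it is the best suffix of the
   left half plus the best prefix of the right half. */
int max_segment_sum(const int *arr, int lo, int hi) {
    if (lo == hi) return arr[lo];
    int mid = lo + (hi - lo) / 2;
    int left  = max_segment_sum(arr, lo, mid);
    int right = max_segment_sum(arr, mid + 1, hi);
    int sum = 0, best_left = INT_MIN, best_right = INT_MIN;
    for (int i = mid; i >= lo; i--) {      /* best suffix of left half */
        sum += arr[i];
        best_left = max2(best_left, sum);
    }
    sum = 0;
    for (int i = mid + 1; i <= hi; i++) {  /* best prefix of right half */
        sum += arr[i];
        best_right = max2(best_right, sum);
    }
    return max2(max2(left, right), best_left + best_right);
}
```

On the array [3, -4, 2, 1, -2, 4] from the example, this returns 5.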
So, the same problem, depending on how the problem is conceptualized, can lead to a recursive or iterative solution. Now, I happened to choose a problem where there are many ways of solving it, and where there are reasonable recursive and iterative solutions. Some problems, however, admit only one type of solution, and that solution may be most naturally implemented using recursion or iteration. For example, if I asked you to write a function that keeps asking the user to enter a letter until they enter y or n, you might start thinking: "keep repeating the prompt and asking for input..." and before you know it you have some iterative code. Perhaps you might instead think recursively: "if the user enters y or n, I am done; otherwise ask the user for y or n"... in which case you'd generate a recursive algorithm. But the recursion here doesn't give you much: it unnecessarily uses a stack and doesn't make the program any faster. (Recursion sometimes makes it easier to prove correctness, in which case you might present something recursively even though you could alternately give a reasonable iterative solution.)

Halting in non-Turing-complete languages

The halting problem cannot be solved for Turing-complete languages, and it can be solved trivially for some non-TC languages, like regexes, where every program halts.
I was wondering if there are any languages that have both the ability to halt and not halt, but admit an algorithm that can determine whether a given program halts.
The halting problem does not act on languages. Rather, it acts on machines (i.e., programs): it asks whether a given program halts on a given input. Perhaps you meant to ask whether it can be solved for other models of computation (like regular expressions, which you mention, but also like push-down automata).
Halting can, in general, be detected in models with finite resources (like regular expressions or, equivalently, finite automata, which have a fixed number of states and no external storage). This is easily accomplished by enumerating all possible configurations and checking whether the machine enters the same configuration twice (indicating an infinite loop); with finite resources, we can put an upper bound on the amount of time before we must see a repeated configuration if the machine does not halt.
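This detection idea can be sketched for a toy machine whose configuration is just one of finitely many states (the transition table `next` and constants here are hypothetical examples, standing in for the full configuration space of a finite automaton):

```c
#include <stdbool.h>

#define NSTATES 6
#define HALT    (-1)

/* Halting check for a machine with finitely many configurations:
   simulate it, and if any configuration repeats before halting, the
   machine never halts. */
bool halts(const int next[NSTATES], int start) {
    bool seen[NSTATES] = { false };
    int s = start;
    while (s != HALT) {
        if (seen[s]) return false;  /* repeated configuration: loops forever */
        seen[s] = true;
        s = next[s];
    }
    return true;
}
```

With at most NSTATES distinct configurations, the loop itself is bounded: after NSTATES steps the machine must either have halted or repeated.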
Usually, models with infinite resources (unbounded TMs and PDAs, for instance) cannot be halt-checked, but it would be best to investigate the models and their open problems individually.
(Sorry for all the Wikipedia links, but it actually is a very good resource for this kind of question.)
Yes. One important class of this kind is the primitive recursive functions. This class includes all of the basic things you expect to be able to do with numbers (addition, multiplication, etc.), as well as some complex classes like the ones #adrian has mentioned (regular expressions/finite automata, context-free grammars/pushdown automata). There do, however, exist computable functions that are not primitive recursive, such as the Ackermann function.
It's actually pretty easy to understand primitive recursive functions. They're the functions that you could get in a programming language that had no true recursion (so a function f cannot call itself, whether directly or by calling another function g that then calls f, etc.) and has no while-loops, instead having bounded for-loops. A bounded for-loop is one like "for i from 1 to r" where r is a variable that has already been computed earlier in the program; also, i cannot be modified within the for-loop. The point of such a programming language is that every program halts.
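In C terms, a program in that bounded-loop style might look like this (no while-loops, no recursion, loop bound fixed before the loop runs, loop variable never modified inside, so termination is guaranteed):

```c
/* Factorial written in the bounded-for-loop style described above: the
   bound n is computed before the loop starts and i is never modified in
   the body. Any program built only from such loops always halts. */
unsigned long long factorial(unsigned int n) {
    unsigned long long result = 1;
    for (unsigned int i = 1; i <= n; i++)
        result *= i;
    return result;
}
```

By contrast, the Ackermann function cannot be written this way: its recursion depth cannot be bounded in advance by any primitive recursive function of its arguments.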
Most programs we write are actually primitive recursive (I mean, can be translated into such a language).
The short answer is yes, and such languages can even be extremely useful.
There was a discussion about it a few months ago on LtU:
http://lambda-the-ultimate.org/node/2846
