Can a Turing machine halt and implicitly accept strings that it cannot handle? - theory

So I came across a problem that I was unsure about. For curiosity's sake, I wanted to ask:
I am pretty sure that a Turing machine can implicitly reject strings that it cannot handle, but can it do the complement of that? In other words, can it implicitly accept an input that it cannot handle? I apologize if this is a stupid question; I cannot seem to find an answer to this.

That's not a stupid question! I believe what is meant by "strings that it cannot handle" is actually, "strings which are not in a valid format", which I take to mean "strings that contain symbols that we don't know." (I'm going off of slide 14 of this presentation, which I found by just googling Turing 'implicitly reject').
So, if we do use that definition, then we need to simply create a Turing machine that accepts an input if it contains a symbol not in our valid set.
Yes, there are other possible interpretations of "strings that it cannot handle", but I'm fairly sure it means this. It obviously could not be a definition without constraints, or else we could define "strings that it cannot handle" as, say, "strings representing programs that halt", and we'd have solved the halting problem! (Or, if you're not familiar with the halting problem, you could substitute in any other undecidable problem, really.)
I think the reason the idea of rejecting strings the Turing machine cannot handle was introduced in the first place is so that the machine is well defined on all input. So, say, if you have a Turing machine that accepts a binary number if it's divisible by 3, but you pass in input that is not a binary number (like, say, "apple sauce"), we can still reason about the output of the program.
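For concreteness, here is a rough C sketch of that divisible-by-3 example (it is not a formal Turing machine, and the names are made up): any symbol outside the alphabet {'0','1'} is implicitly rejected, so the behaviour is defined on every input, including "apple sauce". Flipping that test so an unknown symbol returns ACCEPT gives the "implicitly accept" variant the question asks about.

enum verdict { REJECT = 0, ACCEPT = 1 };

/* Accepts binary strings whose value is divisible by 3; any symbol that the
 * "machine" cannot handle causes an immediate, implicit reject. */
static enum verdict divisible_by_3(const char *input)
{
    int remainder = 0;
    for (const char *p = input; *p != '\0'; p++) {
        if (*p != '0' && *p != '1')
            return REJECT;                      /* symbol we cannot handle */
        remainder = (remainder * 2 + (*p - '0')) % 3;
    }
    return remainder == 0 ? ACCEPT : REJECT;
}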

Related

Recognizing patterns in a number sequence

I think this should be an AI problem.
Is there any algorithm that, given any number sequence, can find patterns?
And the patterns could be as abstract as can be...
For example:
12112111211112 ... ( increasing number of 1's separated by 2 )
1022033304440 ...
11114444333322221111444433332222... (this can be either a repetition of 1111444433332222, or four 1's, then four 4's, four 3's, and four 2's...)
And even some errors might be corrected:
1111111121111111111112111121111 (repetition of 1's with intermittent 2's)
No, it's impossible; it's related to the halting problem and Gödel's incompleteness theorem.
Furthermore, some serious philosophical groundwork would need to be done to actually formalize the question. Firstly, what exactly is meant by "recognizing a pattern"? We should assume the algorithm identifies:
The most expressive true pattern, so "some numbers" is invalid, as it is not expressive enough.
The argument would go something like this: assume the algorithm exists, and consider a sequence of numbers that is a code for a sequence of programs. Suppose we feed it a sequence of halting programs; by the rule above it cannot just say "some programs", as that is not maximally expressive, so it must say "halting programs". Now, given an arbitrary program P, we can add it to the halting list and run the algorithm: if P halts, the algorithm should still say "halting programs"; if P does not halt, the algorithm should say something else, like "some halting and one non-halting". Therefore the algorithm could be used to define an algorithm that decides whether a program halts.
Not a formal proof, but then it's not a formal question :) I suggest you look up Gödel, the halting problem, and Kolmogorov complexity.

C style guide tips for a <80 char line

I can't find many recommendations/style guides for C that mention how to split up lines in C so you have fewer than 80 characters per line.
About the only thing I can find is PEP 7, the style guide for the main Python implementation (CPython).
Does a link exist to a comprehensive C style guide which includes recommendations for wrapping? Or, failing that, at least some good personal advice on the matter?
P.S.: What do you do with really_long_variable_names_that_go_on_forever (besides shortening)? Do you put them on the left edge or let them spill?
Here is Linus' original article about the (Linux) kernel coding style. The document has probably evolved since; it is part of the source distribution.
You can also have a look at the GNU Coding Standards, which cover much more than coding style but are pretty interesting nonetheless.
The 80 characters per line "rule" is obsolete.
http://richarddingwall.name/2008/05/31/is-the-80-character-line-limit-still-relevant/
http://en.wikipedia.org/wiki/Characters_per_line
http://news.ycombinator.com/item?id=180949
We don't use punched cards much anymore. We have huge displays with great resolutions that will only get larger as time goes on (obviously hand-helds, tablets, and netbooks are a big part of modern computing, but I think most of us are coding on desktops and laptops, and even laptops have big displays these days).
Here are the rules that I feel we should consider:
One line of code does one thing.
One line of code is written as one line of code.
In other words, make each line as simple as possible and do not split a logical line into several physical lines. The first part of the rule helps to ensure reasonable brevity so that conforming to the second part is not burdensome.
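As a hedged illustration of those two rules (the function names and numbers here are invented for the example):

/* One logical step crammed together, hard to read at a glance: */
double crammed_total(double subtotal, double tax_rate, double discount)
{
    return (subtotal - (subtotal > 100.0 ? discount : 0.0)) * (1.0 + tax_rate);
}

/* One thing per line: each statement does a single step and stays short. */
double split_total(double subtotal, double tax_rate, double discount)
{
    double reduced = subtotal;
    if (subtotal > 100.0)
        reduced -= discount;
    return reduced * (1.0 + tax_rate);
}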
Some people believe that certain languages encourage complex "one-liners." Perl is an example of a language that is considered by some to be a "write once, read never" language, but you know what? If you don't write obfuscated Perl, if instead you do one thing per line, Perl code can be just as manageable as anything else... ok, maybe not APL ;)
Besides complex one-liners, another drawback that I see with conforming to some artificial character limit is the shortening of identifiers to conform to the rule. Descriptive identifiers that are devoid of abbreviations and acronyms are often clearer than shortened alternatives. Clear identifiers move us that much closer to literate programming.
Perhaps the best "modern" argument that I've heard for keeping the 80, or some other value, character limit is "side-by-side" comparison of code. Side-by-side comparison is useful for comparing different versions of the same source file, as in source code version control system merge operations. Personally, I've noticed that if I abide by the rules I've suggested, the majority of my lines of code are sufficiently short to view them in their entirety when two source files (or even three, for three-way merges) are viewed side-by-side on a modern display. Sure, some of them overrun the viewport. In such cases, I just scroll a little bit if I need to see more. Also, modern comparison tools can easily tell you which lines are different, so you know which lines you should be looking at. If your tooling tells you that there's no reason to scroll, then there's no reason to scroll.
I think the old recommendation of 80 chars per line comes from a time when monitors were 80x25; nowadays 128 or more should be fine.

How bad is it to abandon THE rule in C (aka: return 0 on success)?

In a current project I dared to do away with the old 0 rule, i.e. returning 0 on success of a function. How is this seen in the community? The logic that I am imposing on the code (and therefore on the co-workers and all subsequent maintenance programmers) is:
>0: for any kind of success/fulfillment, that is, a positive outcome
==0: for signalling no progress or busy or unfinished, which is zero information about the outcome
<0: for any kind of error/infeasibility, that is, a negative outcome
Sitting in between a lot of hardware units with unpredictable response times in a realtime system, many of the functions need to convey exactly this ternary logic, so I decided it was legitimate to throw the minimalistic standard return logic away, at the cost of a few WTFs on the programmers' side.
Opinions?
PS: on a side note, the Roman Empire collapsed because the Romans, with their number system lacking the 0, never knew when their C functions succeeded!
"Your program should follow an existing convention if an existing convention makes sense for it."
Source: The GNU C Library
By deviating from such a widely known convention, you are creating a high level of technical debt. Every single programmer that works on the code will have to ask the same questions, every consumer of a function will need to be aware of the deviation from the standard.
http://en.wikipedia.org/wiki/Exit_status
I think you're overstating the status of this mythical "rule". Much more often, it's that a function returns a nonnegative value on success indicating a result of some sort (number of bytes written/read/converted, current position, size, next character value, etc.), and that negative values, which otherwise would make no sense for the interface, are reserved for signalling error conditions. On the other hand, some functions need to return unsigned results, but zero never makes sense as a valid result, and then zero is used to signal errors.
In short, do whatever makes sense in the application or library you are developing, but aim for consistency. And I mean consistency with external code too, not just your own code. If you're using third-party or library code that follows a particular convention and your code is designed to be closely coupled to that third-party code, it might make sense to follow that code's conventions so that other programmers working on the project don't get unwanted surprises.
And finally, as others have said, whatever your convention, document it!
It is fine as long as you document it well.
I think it ultimately depends on the customers of your code.
In my last system we used more or less the same coding system as yours, with "0" meaning "I did nothing at all" (e.g. calling Init() twice on an object). This worked perfectly well and everybody who worked on that system knew this was the convention.
However, if you are writing an API that can be sold to external customers, or writing a module that will be plugged into an existing, "standard-RC" system, I would advise you to stick to the 0-on-success rule, in order to avoid future confusion and possible pitfalls for other developers.
And as per your PS: when in Rome, do as the Romans do :-)
I think you should follow the Principle Of Least Astonishment
The POLA states that, when two elements of an interface conflict, or are ambiguous, the behaviour should be that which will least surprise the user; in particular a programmer should try to think of the behavior that will least surprise someone who uses the program, rather than that behavior that is natural from knowing the inner workings of the program.
If your code is for internal consumption only, you may get away with it, though. So it really depends on the people your code will impact :)
There is nothing wrong with doing it that way, assuming you document it in a way that ensures others know what you're doing.
However, as an alternative, it might be worth exploring the option to return an enumerated type defining the codes. Something like:
enum returnCode {
    SUCCESS, FAILURE, NO_CHANGE
};
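A hypothetical call site might then read like this (updateRecord and the handler functions are made-up names, just to show the idea):

enum returnCode rc = updateRecord(record);
switch (rc) {
case SUCCESS:
    commitChanges();
    break;
case NO_CHANGE:
    /* nothing to do */
    break;
case FAILURE:
    reportError();
    break;
}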
That way, it's much more obvious what your code is doing, self-documenting even. But it might not be an option, depending on your code base.
It is a convention only. I have worked with many APIs that abandon the principle when they want to convey more information to the caller. As long as you're consistent with this approach, any experienced programmer will quickly pick up the standard. What is hard is when each function uses a different approach, e.g. the Win32 API.
In my opinion (and that's the opinion of someone who tends to do out-of-band error messaging thanks to working in Java), I'd say it is acceptable if your functions are of a kind that require strict return-value processing anyway.
So if the return value of your method has to be inspected at all points where it's called, then such a non-standard solution might be acceptable.
If, however, the return value might be ignored or just checked for success at some points, then the non-standard solution produces quite a few problems (for example, you can no longer use the if (!myFunction()) ohNoesError(); idiom).
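A hedged sketch of that pitfall, with a made-up pollDevice() that follows the ternary convention from the question (>0 success, ==0 busy, <0 error):

static int pollDevice(void)
{
    return -1;   /* pretend the hardware reported an error */
}

void example(void)
{
    /* With 0-on-success this idiom reads naturally; with the ternary scheme
     * it only catches the "busy" case and silently swallows negative errors: */
    if (!pollDevice()) {
        /* ohNoesError();  -- never reached for a return value of -1 */
    }

    /* Under the ternary convention every call site has to spell out the cases: */
    int rc = pollDevice();
    if (rc < 0) {
        /* handle the error */
    } else if (rc == 0) {
        /* still busy, try again later */
    } else {
        /* success */
    }
}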
What is your problem? It is just a convention, not a law. If your logic makes more sense for your application, then it is fine, as long as it is well documented and consistent.
On Unix, exit status is unsigned, so this approach won't work if you ever have to run your program there, and this will confuse all your Unix programmers to no end. (I looked it up just now to make sure, and discovered to my surprise that Windows uses a signed exit status.) So I guess it will probably only mostly confuse your Windows programmers. :-)
I'd find another method to pass status between processes. There are many to choose from, some quite simple. You say "at the cost of a few WTFs on the programmers' side" as if that's a small cost, but it sounds like a huge cost to me. Re-using an int in C is a minuscule benefit to be gained from confusing other programmers.
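For instance, on a typical Unix shell a negative exit status simply gets truncated to 8 unsigned bits, so the "negative means error" part of the scheme is lost at the process boundary:

#include <stdlib.h>

int main(void)
{
    return -1;   /* running this and then "echo $?" typically prints 255, not -1 */
}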
You need to go on a case by case basis. Think about the API and what you need to return. If your function only needs to return success or failure, I'd say give it an explicit type of bool (C99 has a bool type now) and return true for success and false for failure. That way things like:
if (!doSomething())
{
// failure processing
}
read naturally.
In many cases, however, you want to return some data value, in which case some specific unused or unlikely to be used value must be used as the failure case. For example the Unix system call open() has to return a file descriptor. 0 is a valid file descriptor as is theoretically any positive number (up to the maximum a process is allowed), so -1 is chosen as the failure case.
In other cases, you need to return a pointer. NULL is an obvious choice for failure of pointer returning functions. This is because it is highly unlikely to be valid and on most systems can't even be dereferenced.
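Both conventions in one short example (the file name is made up, and error handling is trimmed to the bare checks):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.txt", O_RDONLY);  /* 0 and every positive value are valid descriptors */
    if (fd == -1) {                       /* so -1 is reserved for failure */
        perror("open");
        return EXIT_FAILURE;
    }

    char *buffer = malloc(4096);          /* pointer-returning call */
    if (buffer == NULL) {                 /* NULL is the reserved failure value */
        perror("malloc");
        close(fd);
        return EXIT_FAILURE;
    }

    free(buffer);
    close(fd);
    return EXIT_SUCCESS;
}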
One of the most important considerations is whether the caller and the called function or program will be updated by the same person at any given time. If you are maintaining an API where a function will return the value to a caller written by someone who may not even have access to your source code, or when it is the return code from a program that will be called from a script, only violate conventions for very strong reasons.
You are talking about passing information across a boundary between different layers of abstraction. Violating the convention ties both the caller and the callee to a different protocol increasing the coupling between them. If the different convention is fundamental to what you are communicating, you can do it. If, on the other hand, it is exposing the internals of the callee to the caller, consider whether you can hide the information.

Reverse engineering C programs

Every C program is converted to machine code, and this binary is distributed. Since the instruction set of a computer is well known, is it possible to get back the original C program?
You can never get back to the exact same source, since no metadata about it is saved with the compiled code.
But you can re-create code from the assembly code.
Check out this book if you are interested in these things: Reversing: Secrets of Reverse Engineering.
Edit
Some compilers 101 here: if you were to describe a compiler with another, less technical word than "compiler", what would it be?
Answer: Translator
A compiler translates the syntax/phrases you have written into another language: a C compiler translates to assembly or even machine code, C# code is translated to IL, and so forth.
The executable you have is just a translation of your original text/syntax, and if you want to "reverse it", hence "translate it back", you will most likely not get the same structure as you had at the start.
A more real-life example: if you translate from English to German and then from German back to English, the sentence structure will most likely be different and other words might be used, but the meaning, the context, will most likely not have changed.
The same goes for a compiler/translator: if you go from C to ASM, the logic is the same, it's just a different way of reading it (and of course it's optimized).
It depends on what you mean by original C program. Things like local variable names, comments, etc... are not included in the binary, so there's no way to get the exact same source code as the one used to produce the binary. Tools such as IDA Pro might help you disassemble a binary.
I would guesstimate the conversion rate of a really skilled hacker at about 1 kilobyte of machine code per day. At common Western salaries, that puts the price of, say, a 100 KB executable at about $25,000. After spending that much money, all that's gained is a chunk of C code that does exactly what yours does, minus the benefit of comments and whatnot. It is in no way competitive with your version; you'll be able to deliver updates and improvements much quicker. Reverse engineering those updates is a non-trivial effort as well.
If that price tag doesn't impress you, you can arbitrarily raise the conversion cost by adding more code. Just keep in mind that skilled hackers that can tackle large programs like this have something much better to do. They write their own code.
One of the best works on this topic that I know about is:
Pigs from sausages? Reengineering from assembler to C via FermaT.
The claim is you get back a reasonable C program, even if the original asm code was not written in C! Lots of caveats apply.
The Hex-Rays decompiler (an extension to IDA Pro) can do exactly that. It's still fairly recent and up-and-coming, but showing great promise. It takes a little getting used to, but can potentially speed up the reversing process. It's not a "silver bullet" - no C decompiler is - but it's a great asset.
The common name for this procedure is "turning hamburger back into cows." It's possible to reverse engineer binary code into a functionally equivalent C program, but whether that C code bears a close resemblance to the original is an open question.
Working on tools that do this is a research activity. That is, it is possible to get something in the easy cases (you won't recover local variable names unless debug symbols are present, for instance). It's nearly impossible in practice for large programs or if the programmer has decided to make it difficult.
There is not a 1:1 mapping between a C program and the ASM/machine code it will produce: one C program can compile to a different result on different compilers or with different settings, and sometimes two different bits of C can produce the same machine code.
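As a small illustration (whether these two functions really produce byte-identical code depends on the compiler and optimization level, so treat it as a plausible example rather than a guarantee):

/* Two different pieces of C source that an optimizing compiler is likely to
 * turn into the same machine code, so a decompiler cannot tell which one was
 * originally written. */
unsigned doubled_by_multiply(unsigned x)
{
    return x * 2;
}

unsigned doubled_by_shift(unsigned x)
{
    return x << 1;
}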
You definitely can generate C code from a compiled EXE. You just can't know how similar in structure it will be to the original code - apart from variable/function names being lost, I assume it won't know the original way the code was split amongst many files.
You can try hex-rays.com; it has a really nice decompiler which can decompile assembly code into C with 99% accuracy.

Halting in non-Turing-complete languages

The halting problem cannot be solved for Turing complete languages and it can be solved trivially for some non-TC languages like regexes where it always halts.
I was wondering if there are any languages that have both the ability to halt and not to halt, but admit an algorithm that can determine whether a given program halts.
The halting problem does not act on languages. Rather, it acts on machines (i.e., programs): it asks whether a given program halts on a given input. Perhaps you meant to ask whether it can be solved for other models of computation (like regular expressions, which you mention, but also like push-down automata).
Halting can, in general, be detected in models with finite resources (like regular expressions or, equivalently, finite automata, which have a fixed number of states and no external storage). This is easily accomplished by enumerating all possible configurations and checking whether the machine enters the same configuration twice (indicating an infinite loop); with finite resources, we can put an upper bound on the amount of time before we must see a repeated configuration if the machine does not halt.
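A rough C sketch of that pigeonhole argument for a tiny deterministic machine (the transition table is invented for illustration; a real checker would track full configurations, not just states): if the machine has not halted after as many steps as it has states, it must have revisited a state, and being deterministic it will loop forever.

#include <stdbool.h>
#include <stdio.h>

enum { NUM_STATES = 5, HALT = -1 };

/* Made-up deterministic transition table: state i steps to next_state[i]. */
static const int next_state[NUM_STATES] = { 1, 2, 0, HALT, 3 };

static bool halts_from(int state)
{
    for (int steps = 0; steps <= NUM_STATES; steps++) {
        if (state == HALT)
            return true;               /* reached the halting state */
        state = next_state[state];     /* deterministic step */
    }
    return false;                      /* more steps than states: it must be looping */
}

int main(void)
{
    for (int s = 0; s < NUM_STATES; s++)
        printf("start state %d: %s\n", s, halts_from(s) ? "halts" : "loops forever");
    return 0;
}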
Usually, models with infinite resources (unbounded TMs and PDAs, for instance) cannot be halt-checked, but it would be best to investigate the models and their open problems individually.
(Sorry for all the Wikipedia links, but it actually is a very good resource for this kind of question.)
Yes. One important class of this kind are primitive recursive functions. This class includes all of the basic things you expect to be able to do with numbers (addition, multiplication, etc.), as well as some complex classes like #adrian has mentioned (regular expressions/finite automata, context-free grammars/pushdown automata). There do, however, exist functions that are not primitive recursive, such as the Ackermann function.
It's actually pretty easy to understand primitive recursive functions. They're the functions that you could get in a programming language that had no true recursion (so a function f cannot call itself, whether directly or by calling another function g that then calls f, etc.) and has no while-loops, instead having bounded for-loops. A bounded for-loop is one like "for i from 1 to r" where r is a variable that has already been computed earlier in the program; also, i cannot be modified within the for-loop. The point of such a programming language is that every program halts.
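A tiny C sketch of what that bounded-loop discipline looks like (C itself does not enforce it, so this is just an illustration of the style):

/* Factorial written in the "bounded for-loop only" style described above: the
 * loop bound n is fixed before the loop starts, the counter is never modified
 * inside the body, and there is no recursion or while-loop, so the function is
 * guaranteed to halt on every input. */
unsigned long factorial(unsigned int n)
{
    unsigned long result = 1;
    for (unsigned int i = 1; i <= n; i++)
        result *= i;
    return result;
}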
Most programs we write are actually primitive recursive (I mean, can be translated into such a language).
The short answer is yes, and such languages can even be extremely useful.
There was a discussion about it a few months ago on LtU:
http://lambda-the-ultimate.org/node/2846

Resources