Confusion between compiler and interpreter? - C

I read the following documentation about compilers and interpreters somewhere:
A compiler searches all the errors of a program and lists them. If the program is error
free then it converts the code of program into machine code and then the program can be
executed by separate commands.
An interpreter checks the errors of a program statement by statement. After checking
one statement, it converts that statement into machine code and then executes that
statement. The process continues until the last statement of program occurs.
My doubt came from the following code:
int main()
{
    printf("hello")
    scanf("%d",&j);
    return 0;
}
I am using MINGW GCC compiler. When I compile the above code following things happen:
Firstly I get the error
error: expected ';' before 'scanf()'
After I correct the above error then I get the second error
error: 'j' undeclared (first use in this function)
So I wanted to know: why are both errors not listed at the same time?

Compilers and interpreters are technically two different things, though the boundary between them can be pretty fluid.
A compiler is basically nothing more than a language translator. It takes a source language as input and generates a destination language as output.
An interpreter takes a language (be it high-level or low-level) and executes the code described by the language.
The confusion arises mostly because most modern scripting languages contain both a compiler and an interpreter: the compiler takes the script and creates a lower-level equivalent (similar to binary machine language) that the interpreter then reads and executes.
As for your problems with the compiler errors, it's most likely because the compiler can't continue parsing the scanf call due to the first error, and simply skips it (including the undeclared variable).
You should also know that in C some errors can cause further errors in code that is otherwise correct. For example:
int j
printf("Enter something: ");
scanf("%d", &j);
You will get an error because of the missing semicolon after the declaration of the variable j, but you will also get an error on the scanf line, because the compiler can't find the variable j, even though the scanf call is otherwise correct.
Another typical example of an error that produces follow-up errors in unrelated code is forgetting the terminating semicolon of a structure in a header file. If it's the last structure, you might not even get an error in the header file itself, just unrelated errors in the source file that includes it.

The documentation you are quoting is a bit misleading.
Both compilers and interpreters aim to report as many errors as possible, but finding "all the errors of a program" is impossible (cf. the halting problem).
So a compiler doesn't "search for errors"; rather, it parses your source into a tree representation (an AST) and then tries to transform that tree into an equivalent "tree" in another language (say, machine code).
An interpreter also parses your code but the transformation is done in parts at runtime.
So in your example, the missing semicolon makes the parser fail, so the compiler doesn't even get to the later stage, which is what would have reported the second error.
As others have said, the distinction between compilers and interpreters is not that clear anymore. Similar techniques are used, interpreters often compile to machine code, etc.

The compiler definition you are quoting is not the best one. It suggests that the most important characteristic of a compiler is that it finds errors. That is of course an important part of a compiler's job, but the main one is to translate the source code into some other form - not even necessarily machine code. In the old days, some compilers did not bother listing all the errors found; in at least one case, the entire message was that the compiler had found an error somewhere in the source and stopped. And even now it is sometimes not possible to find all errors in one go.

A common compiler behaviour when an error is detected is to try to recover from it and continue parsing, in order to report further errors.
When the compiler detects the missing semicolon, it usually tries to recover by skipping input until the next semicolon; for that reason the scanf("%d",&j) statement is never parsed, and the missing declaration of j is not detected.

The text you are quoting is problematic. While broadly true, it is misleading: a compiler usually doesn't have a separate "error check" phase.
What it really does is try to read your code right away; if your code has errors in it, it fails while trying to read it.
The difference between an interpreter and a compiler isn't when they check for errors, it's when the code actually runs. A compiler reads the whole program before any of it runs; an interpreter reads roughly one statement (or even just one sub-expression), runs it, then reads another and runs that.

The differences between a compiler and an interpreter are given below:
A compiler takes the entire program as input, whereas an interpreter takes a single statement at a time.
A compiler generates intermediate object code, whereas an interpreter does not generate intermediate object code.
A compiler processes the program as a whole, whereas an interpreter executes the program line by line.
Compiled programs are usually faster, whereas interpreted programs are usually slower.
A compiler gives fewer error diagnostics than an interpreter.

A compiler changes the source code directly into machine language, whereas an interpreter produces an intermediate code and then executes that code in order to form machine-understandable code.
A compiler reads the entire program for compilation; an interpreter reads a single statement at a time.
A compiler displays all errors and warnings together; an interpreter displays a single error at a time, since it reads a single instruction at a time.
A compiler requires more memory because of object-code generation: every time the program is compiled, intermediate code is generated. An interpreter needs less memory, as it does not generate any intermediate code.
With a compiler, debugging is comparatively difficult; with an interpreter, debugging is easier.
A compiler takes a large amount of time to analyse the source code, but the overall execution time is comparatively fast. An interpreter takes less time to analyse the source code, but the overall execution time is slower.
Once a program is compiled, its source code is not needed to run it. For interpreted programs, the source code is needed every time the program runs.
With a compiler, compilation is done before execution; with an interpreter, translation and execution take place simultaneously.
A compiler does not allow a program to run until it is completely error-free; an interpreter runs the program from the first line and stops execution only when it encounters an error.
Examples of programming languages that typically use compilers: C, C++, COBOL. Examples of programming languages that typically use interpreters: BASIC, Visual Basic, Python, Ruby, PHP, Perl, MATLAB, Lisp.

Related

If the interpreter does not convert the source code into machine code, then how does the machine execute the code?

Compiler vs. interpreter in programming-language translation: my question is that they say the interpreter does not convert the code into machine code. But if that's the case, then how does the machine execute the code?
I tried browsing around and found one answer saying that it does not actually convert the code into object code but merely simulates an illusion of doing so, converting it into instruction code for itself to execute. I am still unclear and lost.
All executed code on a computer is eventually machine code. Thus, the difference between a compiler and interpreter is not if but how/when the translation is done.
A compiler will read the entire source code and translate that into machine code which is stored such that the compiled program can be executed again and again without further involvement of the compiler.
An interpreter will read the source code statement by statement and directly execute machine code corresponding to each statement as it progresses. Therefore, the interpreter is involved each time the program is executed.

NAS Parallel Benchmarks - MPI error

I have a pretty special problem. I need to review the NAS Parallel Benchmarks for a school project, but it has turned out to be a very problematic task :-) At first, I tried to work with IS (integer sort), but the code wouldn't compile, and I found out that I needed to rewrite the make.def file. So I rewrote its variable for mpi.h and compiled the benchmark, but the program keeps giving me this error:
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(108): MPI_Comm_rank(comm=0x0, rank=0x6084e8) failed
PMPI_Comm_rank(66).: Invalid communicator
To be honest, I don't really know what I should do. I even tried changing the old compilers, like cc to gcc, etc., but it doesn't seem to have any effect. The last thing I tried to rewrite was the CMPI_LIB variable, but I have no idea how to do it correctly.
Thank you very much for all your responses ;-)
And I'm sorry for my bad English, I'm not a native speaker.
The whole benchmark is here for download (approx. 600 kB): uloz.to/xTSEzTX8/npb3-3-1-zip
File is.c which I'm trying to compile and launch: hostcode.sourceforge.net/view/2436
Makefile: hostcode.sourceforge.net/view/2437
File make.common - takes care of compiling files with special functions etc.: hostcode.sourceforge.net/view/2438
My make.def is here: #veeylg84-46196 - BASH - Sourcecode
Structure of files in my 'benchmark folder': http://www.sourcepod.com/sozaoh48-46200
The PMPI_Comm_rank(108): MPI_Comm_rank(comm=0x0, rank=0x6084e8) failed message tells a lot: your MPI_COMM_WORLD constant is 0, which is not the case with MPICH-based implementations (and the format of the error message suggests that an MPICH-based library is what you're running against). It is a compile-time constant in MPICH (#define MPI_COMM_WORLD ((MPI_Comm)0x44000000)) and a run-time constant reference in Open MPI. The #1 reason for such errors is mixing MPI implementations, i.e. including mpi.h from one implementation and linking against the libraries of another one.
As is evident from your make.def, you are including mpi.h from a dummy MPI implementation and linking against a real MPI library. This simply won't work. Since mpicc takes care of passing the right include paths and library options to the backend compiler, you don't have to set CMPI_LIB or CMPI_INC explicitly and should leave them empty instead. Those are reserved for the case where the MPI implementation does not provide compiler wrappers (mpicc, mpif90, etc.) and one has to specify all options explicitly, as is the case with e.g. MS-MPI.
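For what it's worth, a hypothetical make.def fragment along these lines might look as follows. The variable names (MPICC, CMPI_LIB, CMPI_INC) are the ones discussed above; that your MPI installation actually provides an mpicc wrapper is an assumption:

```make
# Hypothetical make.def sketch -- use the MPI compiler wrapper and let it
# supply the correct -I, -L and -l flags for the real MPI installation.
MPICC    = mpicc
# Leave these EMPTY. They are only for MPI installs without compiler
# wrappers (e.g. MS-MPI); setting them while using mpicc risks mixing
# one implementation's mpi.h with another implementation's libraries.
CMPI_LIB =
CMPI_INC =
```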

Are programs compiled with gcc optimised by default?

While at university I learned that the compiler optimises our code in order to make the executable faster. For example, when a variable is not used after a certain point, it will not be calculated.
So, as far as I understand, that means that if I have a program that calls a sorting algorithm, the algorithm will run if its results are printed. However, if nothing is printed (or used anywhere else), then there is no reason for the program to even make that call.
So, my question is:
Do these optimisations happen by default when compiling with gcc, or only when the code is compiled with the -O1, -O2, -O3 flags?
When you meet a new program for the first time, it is helpful to type man followed by the program name. When I did that for gcc, it showed me this:
Most optimizations are only enabled if an -O level is set on the command line. Otherwise they are disabled, even if individual optimization flags are specified.
...
-O0 Reduce compilation time and make debugging produce the expected results. This is the default.
To summarize: with -O0, all code that is on the execution path actually taken will really execute. (Program text that can never be on any execution path, such as if (false) { /* ... */ }, may not generate any machine code, but that is unobservable.) The executed code will behave "as expected", i.e. it'll do what you wrote. That's the goal, at least.

C dummy operations

I can't imagine what the compiler does when there is no lvalue, for instance:
number>>1;
My intuition tells me that the compiler will discard this line due to optimization; but what happens if optimization is disabled?
Does it use a register to do the manipulation? Or does it behave as if it were a function call, so the parameters are passed on the stack and the memory used is then marked as freed? Or does it transform it into a NOP operation?
Can I see what is happening using the Visual C++ debugger?
Thank you for your help.
In the example you give, it discards the operation. It knows the operation has no side effects and therefore doesn't need to emit the code to execute the statement in order to produce a correct program. If you disable optimizations, the compiler may still emit code. If you enable optimizations, the compiler may still emit code, too -- it's not perfect.
You can see the code the compiler emits using the /FAsc command line option of the Microsoft compiler. That option creates a listing file which has the object code output of the compiler interspersed with the related source code.
You can also use "view disassembly" in the debugger to see the code generated by the compiler.
Using either "view disassembly" or /FAsc on optimized code, I'd expect to see no emitted code from the compiler.
Assuming that number is a regular variable of integer type (not volatile) then any competent optimizing compiler (Microsoft, Intel, GNU, IBM, etc) will generate exactly NOTHING. Not a nop, no registers are used, etc.
If optimization is disabled (in a "debug build"), then the compiler may well "do what you asked for", because it doesn't realize the code has no side effects. In that case, the value will be loaded into a register and shifted right once, and the result will not be stored anywhere. The compiler performs "useless code elimination" as one of its optimization steps - I'm not sure which one, but for something this simple I'd expect the compiler to figure it out at fairly basic optimization settings. In some cases, where loops are concerned, the compiler may not optimize away the code until more advanced optimization settings are enabled.
As mentioned in the comments, if the variable is volatile, then the read of the memory represented by number must still be made, as the compiler MUST read volatile memory.
In Visual Studio, if you "view disassembly", it should show you the code that the compiler generated.
Finally, if this was C++, there is also the possibility that the variable is not a regular integer type, the function operator>> is being called when this code is seen by the compiler - this function may have side-effects besides returning a result, so may well have to be performed. But this can't be the case in C, since there is no operator overloading.

Find unused functions in a C project by static analysis

I am trying to run static analysis on a C project to identify dead code, i.e. functions or lines of code that are never called. I can build this project with Visual Studio .NET for Windows or using gcc for Linux. I have been trying to find a reasonable tool that can do this for me, but so far I have not succeeded. I have read related questions on Stack Overflow (i.e. this and this) and I have tried to use -Wunreachable-code with gcc, but the output from gcc is not very helpful. It is of the following format:
/home/adnan/my_socket.c: In function ‘my_sockNtoH32’:
/home/adnan/my_socket.c:666: warning: will never be executed
but when I look at line 666 in my_socket.c, it's actually inside another function that is called from my_sockNtoH32() and will not be executed for that specific call, but will be executed when called from other functions.
What I need is to find the code that will never be executed. Can someone please help with this?
PS: I can't convince management to buy a tool for this task, so please stick to free/open source tools.
If GCC isn't cutting it for you, try clang (or, more accurately, its static analyzer). It generally has much better static analysis than GCC (and produces much better output), though your mileage may vary, of course. It's used in Apple's Xcode, but it's open source and can be used separately.
When GCC says "will never be executed", it means it. You may have a bug that does, in fact, make that code dead. For example, something like:
if (a = 42) {
// some code
} else {
// warning: unreachable code
}
Without seeing the code it's not possible to be specific, of course.
Note that if there is a macro at line 666, it's possible GCC refers to a part of that macro as well.
GCC will help you find dead code within a single compilation unit. I'd be surprised if it can find dead code across multiple compilation units. A file-level declaration of a function or variable in a compilation unit means that some other compilation unit might reference it. So GCC can't eliminate anything declared at the top level of a file, as it only sees one compilation unit at a time.
The problem gets harder. Imagine that compilation unit A declares function a, and compilation unit B has a function b that calls a. Is a dead? On the face of it, no. But in fact, it depends: if b is dead, and the only reference to a is in b, then a is dead, too. We get the same problem if b merely takes &a and puts it into an array X. Now, to decide if a is dead, we need a points-to analysis across the entire system, to see if that pointer to a is used anywhere.
To get this kind of accurate "dead" information, you need a global view of the entire set of compilation units: compute a points-to analysis, then construct a call graph based on it. Function a is dead only if the call graph (as a tree, with main as the root) doesn't reference it somewhere.
(Some caveats are necessary: whatever the analysis is, as a practical matter it must be conservative, so even a full points-to analysis may not correctly identify a function as dead. You also have to worry about uses of a C artifact from outside the set of C functions, e.g., a call to a from some bit of assembler code.)
Threading makes this worse; each thread has some root function which is probably at the top of the call DAG. Since how a thread gets started isn't defined by C compilers, it should be clear that to determine if a multithreaded C application has dead code, somehow the analysis has to be told the thread root functions, or be told how to discover them by looking for thread-initialization primitives.
You aren't getting a lot of responses on how to get a correct answer. While it isn't open source, our DMS Software Reengineering Toolkit with its C Front End has all the machinery to do this, including C parsers, control- and data-flow analysis, local and global points-to analysis, and global call-graph construction. DMS is easily customized to include extra information, such as external calls from assembler and/or a list of thread roots or specific source patterns that are thread-initialization calls, and we've actually done that (easily) for some large embedded engine controllers with millions of lines of code. DMS has been applied to systems as large as 26 million lines of code (some 18,000 compilation units) for the purpose of building such call graphs.
[An interesting aside: in processing individual compilation units, DMS, for scaling reasons, in effect deletes symbols and related code that aren't used in that compilation unit. Remarkably, this gets rid of about 95% of the code by volume once you take into account the iceberg usually hiding in the include-file nest. It suggests that C software typically has poorly factored include files. I suspect you all know that already.]
Tools like GCC will remove dead code while compiling. That's helpful, but the dead code is still lying around in your source, using up developers' attention (they have to figure out that it is dead, too!). DMS, in its program-transformation mode, can be configured, modulo some preprocessor issues, to actually remove that dead code from the source. On very large software systems, you don't really want to do this by hand.
