If I have code fully written in C, using only libraries that are also written in C, and I have a compiler like GCC that supports many platforms, can I be sure that this code will run on any architecture supported by the compiler? For example, can I take Flex or CPython, compile it, and use it on, say, an AVR?
Edit:
compile and run, of course
no GUI
No. It's hard to say without seeing your code. For example, if you depend on long being 4 bytes, that assumption won't hold on a 64-bit machine.
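For instance, here is a minimal sketch (not from the original answer) of that trap and the usual way around it, using the fixed-width types from <stdint.h>:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Non-portable assumption: long is exactly 4 bytes. True on most
         * 32-bit targets and on 64-bit Windows, false on 64-bit Linux. */
        printf("sizeof(long)    = %zu\n", sizeof(long));

        /* Portable: int32_t is guaranteed to be exactly 32 bits wherever
         * it is provided. */
        printf("sizeof(int32_t) = %zu\n", sizeof(int32_t));
        return 0;
    }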
"fully written in C" doesn't guarantee in any way that the code is portable. A portable compiler like GCC abstracts away the CPU architecture details, but the moment you use a system call specific to a particular OS, your code becomes unportable unless you surround the fragment in an #ifdef WHATEVER_OS. This is why standards like POSIX have emerged to unify the system call interface across different operating systems.
Limiting your code to the POSIX-defined system calls and using a POSIX-compliant operating system should generally stop you from worrying, with few exceptions.
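Where a non-POSIX target (such as Windows) must also be supported, the usual pattern is the #ifdef mentioned above. A rough sketch, where the wrapper name sleep_seconds is made up for this example; Sleep() (Windows, milliseconds) and sleep() (POSIX, seconds) are the real calls being hidden:

    /* Isolate the OS-specific call behind one small wrapper
     * (sleep_seconds is a hypothetical name for this sketch). */
    #ifdef _WIN32
    #include <windows.h>
    static void sleep_seconds(unsigned s) { Sleep(s * 1000); }
    #else
    #include <unistd.h>
    static void sleep_seconds(unsigned s) { sleep(s); }
    #endif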
Source code portability and compilability should be a given in your scenario.
Things might change if you used external libraries or GUI frameworks that depend on a specific OS, but that is not your case, so you should be good to go.
The answer is "they'll probably compile, but there is a possibility they won't run". The problem is the resources available to you. Let's say you program for your 2 GB machine. Will your program run on a 256 MB machine? Could a DOS machine run CPython? But they'll probably compile :-) (Technically, you could have a program whose code is too big to fit in the address space of the target machine. If your .exe/.out is 18 MB and the target machine has an address space of 16 MB, you can't even compile.)
Simply using C doesn't guarantee that the code is portable to any platform that supports C. There's a boat-load of traps to step into, like dependence on type sizes, endianness or undefined behavior.
In reality, a non-trivial program is rarely portable to much except the platforms you have actually verified that it runs on. But you can certainly take measures to try and decrease the chance of problems, though.
Related
I recently read the dragon book of compiler design. It mentions that the compiler has intermediate code generation as one of its phases, which produces machine-independent code. Then why was C not developed as a platform-independent language like Java?
What the Dragon Book is describing is the following process:
Compile the source code into an intermediate machine-independent byte code format
Perform optimizations and analyses on that IR
Translate the IR to the target platform's actual machine code
The upside of this is that if you want to support additional systems, you just need to add a new code generator for step 3 without having to touch steps 1 and 2.
All common C compilers work this way. So if your question is "Why don't C compilers do what the Dragon Book describes?", the answer is: "They do".
Now you mentioned Java. What a Java compiler does is the following:
Compile the Java code into Java byte code. As far as the Java compiler is concerned, this is not an intermediate format, but the actual target language.
The end
Now to run this byte code you need a JVM, which interprets the byte code and/or JIT-compiles it. The optimizations and analyses usually happen during JIT-compilation. This is not the process described in the Dragon Book.
From the language implementers' point of view, this doesn't change the effort of supporting a new target system very much. You no longer have to change the compiler, but instead you have to change the JVM: Instead of having to add a new backend to the javac compiler, you instead add a new backend to the JIT-compiler. The effort remains basically the same.
The major difference is for the Java programmers: Instead of compiling the program for every target platform and distributing packages for each platform, you can now compile the code once and give the resulting package to everyone. The people running your code need to install a JVM to be able to use the package, so you have basically moved the effort from the programmer to the end user, but installing a JVM is something you need to do only once (not for every Java program you want to run).
So instead of "write once, compile everywhere", you now have "compile once, run everywhere".
So why didn't C do the same thing that Java does? Performance. Interpreting byte code is slow (compared to running compiled code) and JIT-compilation leads to increased start-up time.
C was initially designed for a particular use case, which involved a specific machine. Although it was loosely based on the language BCPL, which was implemented by way of a platform-independent virtual machine, the goal for C was to be able to write low-level code, such as an operating system, which meant that it needed to be able to take advantage of specific features of the target machine, particularly its ability to directly address individual bytes. By contrast, BCPL's underlying architecture is resolutely word-oriented.
The fact that Bell Labs was able to rapidly reimplement the Unix operating system in their new language (C) certainly contributed to its popularity. (At least, that's why I initially learned it.) To allow for a wider dissemination of the language, a version of the compiler was written that more closely followed the architecture outlined in the Dragon Book, with an initial generation of virtual machine code which is then used to produce code for a target machine. This Portable C Compiler was for many years a reference implementation, and continues to be available.
Other languages contemporary with C, notably Pascal, also used the tactic of targeting a platform-independent virtual machine, and it was once common to refer to virtual machine code as "P-code" because that's what Niklaus Wirth's Pascal project called their target architecture.
Although GCC does not use a virtual machine as such, it does start by generating a low-level, machine-independent internal representation, simplifying the task of porting the compiler to new architectures. And of course the Clang compiler produces LLVM (low-level virtual machine) code, which can be translated into various concrete machine codes or interpreted directly.
C was originally designed and written as a "Write-Once, Compile-Anywhere" language, which was as close as they could get at the time to a Universal Language.
Processors and Architectures were so radically different, and resources were so small that the idea of a Universal Virtual Machine (like Java has) was just impossible.
The idea that a single code-base could be run through a compiler, and then you have the same software on any target platform was pretty incredible.
The short answer: Because it was not feasible at that time.
The long answer: the Java platform is a language plus a virtual machine. Java code compiles to something called bytecode, and the virtual machine then takes this bytecode (which is similar to assembly language) and translates it at runtime into the relevant machine instructions understood by the local machine.
Every architecture has its own instruction set, so an ARM processor will not be able to understand code compiled for an x86 architecture, for example.
In C, the code is compiled directly to machine instructions, and these instructions are then executed by the local machine.
To get behaviour like Java's, you would need some kind of interpreter that reads C and translates it to machine code at runtime. That is no cheap task and was way too much for the computers of the time (C was invented in 1972). Of course, another way this could be implemented is to have the user compile your program before using it, which could be nice but probably involves making your source code visible to the client, which is unwanted.
Hopefully that clarifies things a bit.
Aside from leaving a number of things implementation-defined (in practice this is largely platform/ABI-defined, but strictly speaking doesn't have to be), C is mostly a platform-independent language. Indeed there are implementations of C (such as emscripten) that produce output in a form that can run on any machine platform with the right runtime environment for it. If software written in C makes assumptions about the implementation-defined (or worse, undefined) aspects of the language, then it might fail to work on some implementations/machines, but quite often the cause is more a matter of API/environment/library assumptions (like assuming POSIX, or Windows, or glibcisms) than making nonportable assumptions about the language itself.
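As a small, hedged illustration of "implementation-defined" (my example, not the original poster's): whether plain char is signed and how wide int is are both left to the implementation, and code that silently bakes in one particular answer is exactly the kind of code that breaks when moved:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* Both results below are implementation-defined: plain char may be
         * signed or unsigned, and int only has to be at least 16 bits. */
        printf("plain char is %s here\n", (CHAR_MIN < 0) ? "signed" : "unsigned");
        printf("int is %zu bits wide here\n", sizeof(int) * CHAR_BIT);
        return 0;
    }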
Is it possible to write code in C, statically build it into a binary like an ELF/PE, then remove its header and all unnecessary metadata to create a raw binary, and finally be able to wrap this raw binary in any other OS-specific format, like (ELF > PE) or (PE > ELF)?!
Have you done this before?
Is it possible?
What are the issues and concerns?
How would this be possible?!
And if not, just tell me why not?!
What are the pitfalls in my understanding of a static build?
Doesn't it mean that it removes any need for third-party and standard libraries, as well as OS libs and headers?!
Why can't we remove the metadata of, for example, an ELF and add the metadata and other specifics needed for a PE?
Mention:
I said cross-OS, not cross-hardware
[Read after reading below!]
As you can see, the best answer so far (!) just says to keep going and learn about cross-platform development issues!!! How crazy is this?! Thanks to philosophy!!!
I would say that it's possible, but the process is hampered by many, many details.
ABI compatibility
The first thing to think of is Application Binary Interface compatibility. Unless you're able to call your functions the same way, the code is broken. So I guess (though I can't check at the moment) that compiling code with gcc on Linux/OS X and MinGW gcc on Windows should give the same binary code as long as no external functions are called. The problem here is that executable metadata may rely on some ABI assumptions.
Standard libraries
That seems to be the largest hurdle. Partly because the C preprocessor can inline some procedures on some platforms while leaving them to run time on others. Also, cross-platform dynamic interoperation with standard libraries is close to impossible, though theoretically one can imagine code that uses a limited subset of the C standard library that is exposed through the same ABI on different platforms.
A static build mostly eliminates problems of interaction with other user-space code, but there is still a huge issue of interfacing with the kernel: it's int $0x80 calls on x86 Linux and a platform-specific set of syscall numbers that does not map to Windows in any direct way.
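To make that concrete, here is a hedged sketch (mine, not the original answer's) of what "int $0x80 calls" means on 32-bit x86 Linux; the syscall number (4 for write) and the register layout are part of that kernel's ABI and simply have no Windows counterpart:

    #include <stddef.h>

    /* write(2) invoked directly on 32-bit x86 Linux: eax = syscall number,
     * ebx/ecx/edx = arguments. This only works on that kernel and ABI. */
    static long raw_write(int fd, const void *buf, size_t len)
    {
        long ret;
        __asm__ volatile ("int $0x80"
                          : "=a"(ret)
                          : "0"(4L), "b"((long)fd), "c"(buf), "d"(len)
                          : "memory");
        return ret;
    }

    int main(void)
    {
        raw_write(1, "hello\n", 6);
        return 0;
    }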
OS-specific register use
As far as I know, Windows uses register %fs for storing some OS-wide exception-handling stuff, so a binary compiled on Linux should avoid cluttering it. There might be other similar issues. Also, C++ exceptions on Windows are mostly done with OS exceptions.
Virtual addresses
Again, AFAIK Windows DLLs have a predefined address they must be loaded at in the virtual address space of a process, whereas Linux uses position-independent code for shared libraries. So there might be issues with overlapping areas of an executable and ported code, unless the ported position-dependent code is recompiled to be position-independent.
So, while theoretically possible, such a transformation must be very fragile in real situations, and it's impossible to re-plant the whole statically built code - some parts may be transferred intact, but others must be relinked to system-specific code that interfaces with the kernel properly.
P.S. I think Wine is a good example of running binary code on a quite different system. It tricks a Windows program to think it's running in Windows environment and uses the same machine code - most of the time that works well (if a program does not use private system low-level routines or unavailable libraries).
What exactly do we mean when we say that a program is OS-independent? Do we mean that it can run on any OS as long as the processor is the same?
For example, OpenGL is a library which is OS-independent. The functions it contains must assume a specific processor. But aren't codes/programs/applications OS-specific?
What I learned is that:
OS is processor-specific.
Applications (programs/codes/routines/functions/libraries) are OS specific.
Source code is plain text.
Compiler (a program) is OS-specific, but it can compile source code for a different processor assuming the same OS.
OpenGL is a library.
Therefore, OpenGL has to be OS/processor-specific. How can it be OS-independent?
What can be OS independent is the source code. Is this correct?
How does it help to know if a source code is OS-independent or not?
What exactly do we mean when we say that a program is OS-independent? Do we mean that it can run on any OS as long as the processor is the same?
When a program uses only defined behaviour (no undefined, unspecified or implementation-defined behaviours), the program is guaranteed by the language standard (in your case the C language standard) to compile (using a standards-compliant compiler) and run uniformly on all operating systems.
Basically you have to understand that a language standard like C or a library standard like OpenGL gives a set of minimum assumable guarantees that a programmer can make and build upon. These won't change as long as the compiler is compliant with the standard (in the case of a library, as long as the implementation is standards-compliant) and the program does not tread into undefined behaviour land.
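A tiny, hedged illustration of that difference (my example): the first increment below has a result guaranteed by the standard on every conforming implementation, while the commented-out one does not.

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int u = UINT_MAX;
        u = u + 1;      /* defined: unsigned arithmetic wraps, so u == 0 */
        printf("%u\n", u);

        int i = INT_MAX;
        /* i = i + 1;      undefined behaviour: signed overflow -- the
         *                 standard gives no guarantee at all here */
        (void)i;
        return 0;
    }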
openGL has to be OS/processor specific. How can it be OS-independent?
No. OpenGL is platform-independent. An OpenGL implementation (the driver which implements the calls) is definitely platform- and GPU-specific. Likewise, the C standard is implemented by GCC, MSVC++, etc., which are all different compiler implementations that can compile C code.
what can be OS independent is the source code. Is this correct?
Source code (if written with portability in mind) is just one amongst many such platform-independent entities. Libraries (OpenGL, etc.), frameworks (.NET, etc.), and so on can be platform-independent too. For that matter, even hardware can be spec'd by someone and implemented by someone else: ARM processors are standards/specifications charted out by ARM and implemented by OEMs like Qualcomm, TI, etc.
Do we mean that it can run on any OS as long as the processor is the same?
Both the processor and the platform (OS) don't matter as long as you use only cross-platform components for building your program. Say you use C, a portable language; SDL, a cross-platform library for creating windows, handling events, framebuffers, etc.; and OpenGL, a cross-platform graphics library. Now your program will run on multiple platforms, but even then it depends on the weakest link. If SDL doesn't run on some J2ME-only phone then it won't have a library distribution for that platform, and thus your application won't run on that platform; so in a sense nothing is entirely independent. So it's wise to play around with the various libraries available for different architectures, platforms, compilers, etc. and then pick the required ones based on the platforms you're targeting.
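As a rough sketch (mine, and assuming the SDL2 flavour of the library), the window-creation code below is the same source on Windows, Linux and macOS; only the SDL2 development package you build and link against differs per platform:

    #include <SDL.h>

    int main(int argc, char *argv[])
    {
        /* SDL hides the OS-specific windowing APIs behind one interface. */
        if (SDL_Init(SDL_INIT_VIDEO) != 0)
            return 1;
        SDL_Window *win = SDL_CreateWindow("portable window",
                                           SDL_WINDOWPOS_CENTERED,
                                           SDL_WINDOWPOS_CENTERED,
                                           640, 480, 0);
        SDL_Delay(2000);                /* keep it on screen briefly */
        SDL_DestroyWindow(win);
        SDL_Quit();
        return 0;
    }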
What exactly do we mean when we say that a program is OS-independent?
It means that it has been written in a way that it can be compiled (if compilation is necessary for the language used) or run with little or no modification on several operating systems and/or processor architectures.
For example, openGL is a library which is OS independent.
OpenGL is not a library. OpenGL is an API specification, i.e. a lengthy volume of text that describes a set of tokens (= named numeric values) and entry points (= callable functions) and the effects they have on the system level.
What I learned is that:
OS is processor-specific.
Wrong!
Just like a program can be written in a way that it can be targeted to several operating systems (and processor architectures), operating systems can be written in a way that they can be compiled for and run on several processor architectures.
Linux for example supports so many architectures, that it's jokingly said, that it runs on everything that is capable of processing zeroes and ones and has a memory management unit.
Applications (programs/codes/routines/functions/libraries) are OS specific.
Wrong!
Program logic is independent from the OS. A calculation like x_square = x * x doesn't depend on the OS at all. Only a very small portion of a program, namely those parts that make use of operating system services actually depend on the OS. Such services are things like opening, reading and writing to files, creating windows, stuff like that. But you normally don't use those OS specific APIs directly.
Most low-level OS APIs have certain specifics which are easy to trip over and arcane to address. So you don't use them directly, but rather some standard, OS-independent library that hides the OS-specific stuff.
For example, the C language (which is already pretty low level) defines a standard set of functions for file access, the stdio functions: fopen, fread, fwrite, fclose, … C++ does something similar with its iostreams. But those just wrap the OS-specific APIs.
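For example, a sketch like the following uses only those standard stdio wrappers, so the same source builds unchanged on Windows, Linux or macOS, even though each C library translates the calls into very different system calls underneath:

    #include <stdio.h>

    int main(void)
    {
        /* fopen/fprintf/fclose are defined by the C standard; each libc
         * maps them onto CreateFile/WriteFile, open/write, and so on. */
        FILE *f = fopen("hello.txt", "w");
        if (f == NULL) {
            perror("fopen");
            return 1;
        }
        fprintf(f, "hello, portable world\n");
        fclose(f);
        return 0;
    }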
source code is plain text.
Usually it is, but not necessarily. There are also graphical, data flow programming environments, like LabVIEW, which can create native code as well. The source code those use is not plain text, but a diagram, which is stored in a custom binary format.
Compiler (a program) is OS specific, but it can compile a source code for a different processor assuming the same OS.
Wrong! and Wrong!
A compiler is language- and target-specific. But it's perfectly possible to have a compiler on your system that generates executables targeted at a different processor architecture and operating system than the system you're using it on (cross compilation). After all, a compiler is "just" a (mathematical) function mapping from source code to target binary.
In fact the compiler itself doesn't target an operating system at all; it only targets a processor architecture. The operating system specifics are introduced by the ABI (application binary interface) of the OS, which is addressed by the linked runtime environment and the target linker (yes, the linker must be able to target a specific OS).
openGL is a library.
Wrong!
OpenGL is a API specification.
Therefore, openGL has to be OS/processor specific.
Wrong!
And even if OpenGL was a library: Libraries can be written to be portable as well.
How can it be OS-independent?
Because OpenGL itself is just a lengthy document of text describing the API. Then each operating system with OpenGL support will implement that API conforming to the specification, so that a program written or compiled to run on said OS can use OpenGL as specified.
what can be OS independent is the source code.
Wrong!
It's perfectly possible to write program source code in a way that it will only compile and run for a specific operating system and/or a specific processor architecture. The pinnacle of OS/architecture dependence: writing things in assembler and using OS-specific low-level APIs directly.
How does it help to know if a source code is OS/window independent or not?
It gives you a ballpark figure of how hard it will be to target the program to a different operating system.
A very important thing to understand:
OS independence does not mean a program will run on all operating systems or architectures. It means that it is not tethered to a specific OS/CPU combination and that porting it to a different OS/CPU requires only little effort.
There are a couple of concepts here. A program can be OS-independent, that is, it can run/compile without changes on a range of OSes. Secondly, libraries can be made on a range of OSes which can be used by a platform-independent program.
Strictly OpenGL doesn't have to be OS-independent. OpenGL may actually have different source code on different OS's which interface with drivers in a platform specific way. What matters is that OpenGL's interface is OS-independent. Because the interface is OS-independent it can be used by code which is actually OS-independent and can be run/compiled without modification.
Libraries abstracting out OS-specific things is a wonderful way to allow your code to interface with the OS which normally would require OS-specific code.
One of those:
It compiles on any OS supported by the program's framework without changes to the source code (languages like C++ that compile directly into machine code).
The program is written in an interpreted language or in a language that compiles into platform-independent bytecode, and can actually run without modification on whatever platform its interpreter supports (languages like Java or Python).
The application relies on a cross-platform framework of some kind that abstracts the operating-system-specific calls away. It will run without modification on any OS supported by the framework.
Because you haven't added any language tag, it is either #1, #2 or #3, depending on your language.
--edit--
OS is processor-specific.
No. See Linux. Same code base, can be compiled for different architectures. Normally (well, it is reasonable to expect that) an OS kernel is written in a portable language (like C) that can be rebuilt for a different CPU. On a distribution like Gentoo, you can rebuild the entire OS from source as well.
Applications (programs/codes/routines/functions/libraries) are OS specific.
No. Applications like Java *.jar files can be made more or less OS-independent - as long as there is an interpreter, they'll run anywhere. There will be some OS-specific part (like the Java runtime environment in the case of Java), but your program will run anywhere that part is present.
Source code is plain text.
Not necessarily, although it is true in most cases.
Compiler (a program) is OS-specific, but it can compile source code for a different processor assuming the same OS.
Not quite. It is reasonable for a compiler to be written using (somewhat) portable code, so the compiler itself can be rebuilt for a different OS.
Also, while running on OS A it is possible (in some cases) to compile code for OS B. On Linux you can compile code for the Windows platform.
OpenGL is a library.
It is not. It is a specification (API) that describes a set of programming functions for working with 3D graphics. There are libraries that implement this specification. The specification itself is not a library.
Therefore, OpenGL has to be OS/processor-specific.
Incorrect conclusion.
How can it be OS-independent?
As long as the underlying platform has a standards-compliant OpenGL implementation, the rendering part of your program will work in the same way as on any other platform with a standards-compliant OpenGL implementation. That's portability. Of course, this is an ideal situation; in reality you might run into a driver bug or something.
Is it possible to run code or a library that only uses the standard C libraries on every platform?
For example:
Windows,
ARM Microprocessors,
PIC microprocessors,
They have their own compilers, and this difference is not important to me; I can compile with different compilers as needed. But do I have to change the code totally or partially to run on these platforms?
Note: For libraries, I will just use the default C libraries.
It depends. If your library uses only standard C and every platform you are about to port to has a compiler compatible with standard C, you can always write the library code once. But if your library has to call the native API of each platform, you have to encapsulate that code separately.
In embedded systems you will need to implement certain functions that an operating system would normally give you (like _sbrk, _read, etc.) for standard library functions like malloc and printf to work.
If you take care of that I don't see a reason for your code not to work so long as you take GREAT CARE in how you write it. By GREAT CARE, I mean be very careful with floating points, processor word size and any other things that are not common between your desired targets.
Short answer: possible, but not easy.
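For instance, a minimal sketch (assumptions: a newlib-style C library, and __heap_start/__heap_end symbols provided by your linker script - the names vary by toolchain) of the _sbrk stub mentioned above, which is what lets malloc work on a bare-metal target:

    #include <errno.h>
    #include <stddef.h>

    extern char __heap_start, __heap_end;   /* assumed linker-script symbols */

    /* Grow the heap for malloc; return (void *)-1 and set errno when the
     * reserved heap region is exhausted. */
    void *_sbrk(ptrdiff_t incr)
    {
        static char *brk = &__heap_start;

        if (brk + incr > &__heap_end) {
            errno = ENOMEM;
            return (void *)-1;
        }
        char *prev = brk;
        brk += incr;
        return prev;
    }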
Is there any difference in C that is written in Windows and Unix?
I teach C as well as C++, but some of my students have come back saying some of the sample programs do not run for them on Unix. Unix is alien to me. Unfortunately I have no experience with it whatsoever; all I know is how to spell it. If there are any differences, then I should be advising our department to invest in systems for Unix, as currently there are no Unix systems in our lab. I do not want my students to feel that they have been denied or kept away from something.
That kind of problem usually appears when you don't stick to the bare C standard and make assumptions about the environment that may not be true. These may include reliance on:
nonstandard, platform specific includes (<conio.h>, <windows.h>, <unistd.h>, ...);
undefined behavior (fflush(stdin), as someone else reported, is not required to do anything by the standard - it's actually undefined behavior to invoke fflush on anything but output streams; in general, older compilers were more lenient about violation of some subtle rules such as strict aliasing, so be careful with "clever" pointer tricks);
data type size (the short=16 bit, int=long=32 bit assumption doesn't hold everywhere - 64 bit Linux, for example, has 64 bit long);
in particular, pointer size (void * isn't always 32 bit, and can't always be cast safely to an unsigned long); in general you should be careful with conversions and comparisons that involve pointers, and you should always use the provided types for that kind of task instead of "normal" ints (see in particular size_t, ptrdiff_t, uintptr_t, and the sketch after this list);
data type "inner format" (the standard does not say that floats and doubles are in IEEE 754, although I've never seen platforms doing it differently);
nonstandard functions (__beginthread, MS safe strings functions; on the other side, POSIX/GNU extensions)
compiler extensions (__inline, __declspec, #pragmas, ...) and in general anything that begins with double underscore (or even with a single underscore, in old, nonstandard implementations);
console escape codes (this usually is a problem when you try to run Unix code on Windows);
newline format: in normal strings it's \n everywhere, but when written to a file it's \n on *NIX, \r\n on Windows, and \r on pre-OSX Macs; the conversion is handled automagically by the file streams, so be careful to open files in binary mode when you actually want to write binary data, and leave them in text mode when you want to write text.
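To make the type-size and pointer-size items concrete, a short sketch (mine, not the answerer's) of what to check and which types to reach for:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int x = 0;

        /* Sizes vary: long is 8 bytes on 64-bit Linux but 4 on 64-bit
         * Windows; pointers are 4 bytes on 32-bit targets, 8 on 64-bit. */
        printf("sizeof(long)   = %zu\n", sizeof(long));
        printf("sizeof(void *) = %zu\n", sizeof(void *));

        /* uintptr_t (from <stdint.h>, C99) is the type meant to hold a
         * pointer value round-tripped through an integer. */
        uintptr_t addr = (uintptr_t)&x;
        printf("address of x   = %#jx\n", (uintmax_t)addr);
        return 0;
    }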
Anyhow, an example of a program that does not compile on *NIX would be helpful; then we could give you more precise suggestions.
I am yet to get the details of the program. The students were from our previous batch; I have asked for it. Turbo C is what is being used currently.
As said in the comment, please drop Turbo C and (if you use it) Turbo C++, nowadays they are both pieces of history and have many incompatibilities with the current C and C++ standards (and if I remember well they both generate 16-bit executables, that won't even run on 64 bit OSes on x86_64).
There are a lot of free, working and standard-compliant alternatives (VC++ Express, MinGW, Pelles C, CygWin on Windows, and gcc/g++ is the de-facto standard on Linux, rivaled by clang), you just have to pick one.
The language is the same, but the libraries used to get anything platform-specific done are different. But if you are teaching C (and not systems programming) you should easily be able to write portable code. The fact that you are not doing so makes me wonder about the quality of your training materials.
The standard libraries that ship with MSVC and those that ship with a typical Linux or Unix compiler are different enough that you are likely to encounter compatibility issues. There may also be minor dialectic variations between MSVC and GCC.
The simplest way to test your examples in a unix-like environment would be to install Cygwin or MSYS on your existing Windows kit. These are based on GCC and common open-source libraries and will behave much more like the C compiler environment on a unix or linux system.
Cygwin is the most 'unix like', and is based on a cygwin.dll, which is an emulation layer that emulates unix system calls on top of the native Win32 API. Generally anything that would compile on Cygwin is very likely to compile on Linux, as Cygwin is based on gcc and glibc. However, native Win32 APIs are not available to applications compiled on Cygwin.
MSYS/MinGW32 is designed for producing native Win32 apps using GCC. However, most of the standard GNU and other OSS libraries are available, so it behaves more like a unix environment than VC does. In fact, if you are working with code that doesn't use Win32 or unix specific APIs it will probably port between MinGW32 and Linux more easily than it would between MinGW32 and MSVC.
While getting Linux installed in your lab is probably a useful thing to do (Use VMWare player or some other hypervisor if you can't get funding for new servers) you can use either of the above toolchains to get something that will probably be 'close enough' for your purposes. You can learn unix as takes your fancy, and both Cygwin and MSYS will give you a unix-like environment that could give you a bit of a gentle intro in the meantime.
C syntax must be the same if both Windows and Unix compilers adhere to the same C standard. I was told that MS compilers still don't support C99 in full, although Unix compilers are up to speed, so it seems C89 is a lowest common denominator.
However in Unix world you typically will use POSIX syscalls to do system stuff, like IPC etc. Windows isn't POSIX system so it has different API for it.
There is this thing called Ansi C. As long as you code purely Ansi C, there should be no difference. However, this is a rather academic assumption.
In real life, I have never encountered any of my code being portable from Linux to Windows and vice versa without any modification. Actually, these modificationS (definitely plural) turned into a vast amount of preprocessor directives, such as #ifdef WINDOWS ... #endif and #ifdef UNIX ... #endif ... even more if parallel libs, such as OpenMPI, were used.
As you may imagine, this is totally contrary to readable and debuggable code, but that was what worked ;-)
Besides, you have got to consider things already mentioned: UTF-8 will sometimes knock out Linux compilers...
There should be no difference in the C programming language under Windows or *nix, because the language is specified by the ISO standard.
The C language itself is portable from Windows to Unix. But operating system details are different, and sometimes those intrude into your code.
For instance, Unix systems typically use only "\n" to separate lines in a text file, while most Windows tools expect to see "\r\n". There are ways to deal with this sort of difference that get the C runtime to handle it for you, but if you aren't careful and don't know about them, it's pretty easy to write OS-specific C code.
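A minimal sketch of that point (my example): a text-mode stream lets the runtime do the newline translation, while the "b" flag asks for the bytes to be written exactly as given on every platform:

    #include <stdio.h>

    int main(void)
    {
        /* Text mode: on Windows the runtime expands '\n' to "\r\n" on
         * output; on *NIX nothing is translated. */
        FILE *t = fopen("out.txt", "w");
        if (t) { fputs("one line\n", t); fclose(t); }

        /* Binary mode: bytes go out exactly as given on every platform --
         * use this whenever the data is not plain text. */
        FILE *b = fopen("out.bin", "wb");
        if (b) {
            unsigned char raw[4] = {0x0d, 0x0a, 0x00, 0xff};
            fwrite(raw, 1, sizeof raw, b);
            fclose(b);
        }
        return 0;
    }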
I suggest that you run a Unix in a virtual machine and use that to test your code before you share it with your students.
I think it's critical that you familiarize yourself with Unix right now.
An excellent way to do this is with a Knoppix CD.
Try to compile your programs under Linux using gcc, and when they don't work, track down the problems (#include <windows.h>?) and make them work. Then return to Windows, and they'll likely compile OK.
In this way, you will discover your programs become cleaner and better teaching material, even for lab exercises on windows machines.
A common problem is that fflush(stdin) doesn't work on Unix.
Which is perfectly normal, since the standard doesn't define how the implementation should handle it.
The solution is to use something like this (untested):
    int c;
    do
    {
        c = getchar();   /* discard input until end of line or EOF */
    } while (c != '\n' && c != EOF);
Similarly, you need to avoid anything that causes undefined behavior.