C compiler producing lightweight executeables

C compiler producing lightweight executeables - c

I'm currently using MSVC for C++ but as I'm switching to C to write a very performance-intensive program (interpreter) I have to search for a fitting C compiler.
I've looked at some binaries produced by Turbo-C and even if its old they seem pretty straigthforward and optimized.
Now I don't know what the best compiler for building an interpreter is, but maybe you can help me.
I've considered GCC but as I don't know much about it, I can't be really sure.

99.9% a programs performance depends on the code you write and the language you choose.
you can safely ignore the performance of the the compiler.
Stick to MSVC...and dont waste time :)

If I were you, I would take the approach of worrying less about the compiler and worrying more about your own code. Write the code for the interpreter in a reasonable way. Then, profile it, and optimize spots based on how much time they take. That is more likely to produce a performance benefit than using a particular compiler.

If you want a lightweight program, it is not the compiler you need to worry about so much as the code you write and the libraries you use. Most compilers will produce similar results from the same source code.
For example, using C++ with MFC, a basic windows application used to start off at about 900kB and grow rapidly. Linking with the dynamic MFC dlls would get you down to a few hundred kB. But by dropping MFC totally - using Win32 APIs directly - and using a minimal C runtime it was relatively easy to implement the same thing in an .exe of about 25kB or less (IIRC - it's been a long time since I did this).
So ditch the libraries and get back to proper low level C (or even C++ if you don't use too many "clever" features), and you can easily write very compact applications.
edit
I've just realised I was confused by the question title into talking about lightweight applications as opposed to concentrating on performance, which appears to be the real thrust of the question. If you want performance, then there is no specific need to use C, or move to a painful development environment - just write good, high performance code. Fundamentally this is about using the correct designs and algorithms and then profiling and optimising the resulting code to eliminate bottlenecks and inefficiencies. Note that these days you may achieve a much bugger bang for your buck by switching to a multithreaded approach than just concentrating on raw code optimisation - make sure you utilise the hardware well.

You can use GCC, through MingW, Eclipse CDT, or one of the other Windows ports. You can optimize for executable size, speed of resulting executable, or speed of compilation.

C++ was designed to be backward compatible with C. So any C++ compiler should be able to compile pure C. You might want to tell it that it's C and not C++, so compiler doesn't do name mangling, etc. If the compiler is very good with C++, it should be equally good, or better with C, because C is much simpler.
I would suggest sticking with MSVC. It's a very decent system. Though if you are not convinced - compare your options. Build the same program with multiple compilers, look at the assembly they produce, measure the resulting executable performance, etc.

Related

What would be a simple to use JIT library?

I'm trying to write a language runtime (and a language itself) that is similar to .NET or to the JVM. It's got a form of bytecode that is custom.
What I want is a way to translate said bytecode to actual, runnable machine code. So, because I'm not wanting to write such a translator myself (this is more of a toy project/personal side project) I want to find a good JIT library to use.
Here's what I want the library to do:
The library should be as easy to use as possible (toy project and I don't really have much experience here)
The library should support at least x86_64 (development machine), though preferably it should cover other architectures as well
The library should preferably do some low level optimizations (register tracking and allocation, reducing memory accesses etc); those optimizations shouldn't be very expensive to do though (I will myself do other optimizations to e.g. remove virtual calls and convert them to direct ones, for example). I can accept a library with no optimization if it's easiest to use though.
The library must have an interface that is usable from C (preferred) or C++ (acceptable).
I will use Boehm GC for garbage collection, if it matters (probably doesn't, but just in case). Maybe a compacting GC would be nice, but I guess I shouldn't combine the questions...

I would suggest llvm. There are some great tutorials on how to implement your own language with it and basic stuff is not too complicated. You also get the option to do a lot of more advanced stuff later on. As a bonus not only can you use JIT but you can also statically compile and optimize your binaries. LLVM also does have a C interface and can target all common CPU architectures and even a lot of more obscure ones.

Why compilers are written in C/C++ instead of using CoffeeScript (JavaScript, Node JS)?

I am exposed to C because of embedded system programming, and I think it's one wonderful language in this field. However, why is it used to write compilers? If the reason why gcc is implemented in C/C++ is that there aren't many good languages at that time, there's no excuse for why clang is taking the same path (using C/C++).
Is it for performance reasons? Mostly interpreted languages are a bit slower compared with compiled languages, but I guess the difference is almost negligible in CoffeeScript (JavaScript), because of Node.js.
From the perspective of developers, I suppose it's much easier to write one compiler using high level languages. Unfortunately, most of compilers out there are written in C/C++. Is it just because of legacy code?
Response to comments:
Bootstrapping is just one way to illustrate that this language is powerful enough to write one compiler. It shouldn't the dominant reason why we choose the language to implement the compiler.
I agree with the guess given below, that "most compiler developers would answer because most of compiler related tools (bison, yacc) emit C code". However, neither GCC nor Clang use generated parser, they implemented one themselves. This front-end process is independent of targeting architecture, and should not be C/C++'s strength.
There's more or less consensus that performance is one key factor. Indeed, even for GCC and Clang, building a reasonable size of C project (Linux kernel) takes a lot of time. Is it because of the front-end or the back-end. I have to admit that I didn't have much experience on backe-end of compilers, as we finished the course on compiler with generated LLVM code.

I am exposed to C because of embedded system programming, and I think
it's one wonderful language in this field.
Yes. It's better than Java.
However, why is it used to write compilers?
This question can't be answered without asking the developers. I suspect that the majority of them will tell you that common compiler-writing software (yacc, flex, bison, etc) produce C code.
If the reason for gcc is that there aren't many good languages,
there's no excuse for clang.
GCC isn't a programming language, and neither is Clang. They're both implementations of the C programming language.
Is it for performance reasons?
Don't confuse implementation with specification. Speed is an attribute introduced by your compiler and your computer, not by the programming language. GCC happens to produce fairly efficient machine code, which might influence developers to use C as their primary programming language... but in ten years time, it could* be that node.js produces more efficient machine code than GCC. Don't forget, StackOverflow is forever.
* could, but most likely won't. See Ira Baxters comment below for more info.
Mostly interpreted languages are a bit slower compared with compiled
languages, but I guess the difference is almost negligible in
CoffeeScript (JavaScript), because of Node.js.
Similarly, interpretation or compilation isn't the choice of the language, but of the implementation of the language. For example, GCC and Clang choose to compile C to machine code. Ch and CINT are two interpreters that translate C code directly to behaviour, rather than machine code. Java was once predominantly translated using interpretation, too, but is now predominantly compiled into JVM bytecode. Javascript seems to be phasing towards predominant compilation, too. Who knows? Maybe you'll see compilers written predominantly in Javascript in ten years time...
From the perspective of developers, I suppose it's much easier to
write one compiler using high level languages.
All of these programming languages are technically high level. They're mostly defined in terms of an abstract machine; They're certainly not low level.
Unfortunately, most of compilers out there are written in C/C++.
I don't consider it unfortunate that C++ is used to write software; It's not a bad programming language.
Is it just because of legacy code?
I suppose legacy code might influence the decision of a programmer. In the end though, as I said, you'd have to ask the developers. They might just decide to use C or C++ because C or C++ is their favourite programming language... Why do you speak English?

Compilers are very complex software in general. The front end part is pretty simple (parsing), but the backend part (scheduling, code generation, optimizations, register allocations) involve NP-complete problems (of course compilers try to approximate solutions to these problems). Thus, implementing in C would help compile times. C is also very good at bitwise operators and other low level stuff, which is useful for writing a compiler.
Note that not all compilers are written in C though. For example, Haskell GHC compiler is written in Haskell using bootstrapping technique.
Javascript is async, which doesn't suit compiler writing.

I see many reasons:
There is no elegant way of handling bit-precise code in Javascript
You can't write binary files easily in Javascript, so the assembler part of the compiler would have to be in a more low-level language
Huge JS codebase are very heavy to load in memory (that's plain text, remember?)
Writing optimizing routines for compilers are heavily CPU-intensive, which is not yet very compatible with Javascript
You wouldn't be able to compile your compiler with it (bootstrap), because you need a Javascript interpreter behing your compiler. The bootstrap phase wouldn't be "pure":
JS Compiler compiles NodeJS -> NodeJS runs your new Compiler -> new JS Compiler

gcc is implemented primarily in C, but that is not true of all compilers, including some that are quite standard. It is a common pattern for a compiler to be implemented in the language that it compiles. ghc is written largely in Haskell. Recent versions of guile feature a compiler implemented mostly in Scheme.

nope, coffeescript et al are still much slower than natively-compiled (and optimised) C code. Even if you take the subset of javscript that is able to be optimised (asm.js) its still twice as slow as native C.
What you hear about when people say node.js is just as fast as C code means that its just as fast as part of an overall system that does other things like read from disk, wait for data off the network, etc. In these systems the CPU is underused (especially with today's superfast CPUs) so the performance problem is not the raw processing capability of the language. Hence, a node.js server is exactly as fast as a C server if they're both stuck waiting for a network call to return data. The type of system written in node.js does a lot of waiting for network which is why people use node.js. The type of system written in C does not suit being written in node.js

High-level system language that compiles to c?

I'm looking for a higher-level system language, if possible, suitable for formal verification, that compiles to standard C, so that it can be run cross-platform with (relatively) low overhead.
The two most promising such languages I've stumbled during the past few days are:
BitC - While the design goals of this language match my needs (it even supports the functional paradigm), it is in very unstable state, the documentation is out of date, and, generally, it seems like a very long shot for a real-world project.
Lisaac - It supports Design-by-contract, which is very cool and has a relatively low performance overhead. However the website is dead, there hasn't been a new release since '08 and generally it seems the language is dead.
I'd also like to note that it's not meant for a real-time system, so a GC or, generally, non-determinism (in the real-time sense), is not an issue.
The project involves mainly audio processing, though it has to be cross-platform.
I assume someone would point me to the obvious answer - "plain ol' C". While it is truly cross-platform and very effective, the code quantity would probably be greater.
EDIT: I should clarify that I mean cross-platform AND cross-architecture. That is why I consider only languages, compiled to C in the first place, but if you can point me to another example, I'd be grateful :)

I think you may become interested in ATS. It compiles to C (actually it expresses and explains many C idioms and patterns from a formal type-theoretic perspective, it has even been proposed to prepare a book of sorts to show this -- if only we had more time...).
The project involves mainly audio processing, though it has to be cross-platform.
I don't know much about audio processing, I've been mainly doing some computer graphics stuff (mostly the basic things, just to try it out).
Also, I am not sure if ATS works on Windows (never tried that).
(Disclaimer: I've been working with ATS for some time. It is bulky and large language, and sometimes hard to use, but I very much liked the quality of programs I've been able to produce with it, for instance, see TEST subdirectory in GLES2 bindings for some realistic programs)

The following doesn't strictly adhere to the requirements but I'd like to mention it anyway and it is too long for a comment:
Pypy's RPython can be translated to C. Here's a nice talk about it. It's been used to implement Smalltalk, JavaScript, Io, Scheme, Gameboy (with various degree of completeness), but you can write standalone programs in it. It is known mainly for its implementation of Python language that runs on Intel x86 (IA-32) and x86_64 platforms.
The translation process requires a capable C compiler. The toolchain provides means to infer various things about the code (used by the translation process itself) that you might repurpose for formal verification.
If you know both Python and C you could use cython that translates Python-like syntax to C. It is used to write CPython extensions.

What is wrong with using turbo C?

I always find that some people (a majority from India) are using turbo C.
I cannot find any reason to use such outdated compiler...
But I don't know what reasons to give when trying to tell them to use modern compiler(gcc,msvc,...).

Turbo C is a DOS only product. This means that it no longer runs "natively" on 64-bit versions of Windows, and must be run inside the XP compatibility penalty box.

While there are plenty of reasons not to use Turbo C (it's old, predates standards, generates 16-bit code, etc.), it's not valid to answer a question like "How do I do X in Turbo C?" with "Just use GCC". That would be like somebody asking "How do I do X with my 1992 Toyota?" and you saying "Just get a newer car".
People who are using Turbo C are probably doing so because it's a requirement, not because they don't know about anything better. Odds are it's for a programming class where the assignments they turn in have to work in that compiler. When I was grading C++ assignments, it didn't matter what compiler the students used, but they had to compile and run properly with the compiler I was using.

I would say support and standards compliance would be the two big issues for me.
Good luck even finding Borland/Inprise/Borland/Codegear/Embarcadero, or whatever they call themselves nowadays. Even more kudos if you can get them to admit these products exist (although I did at some point get them from the Borland museum on BDN).
Performance can be important but the vast majority of applications I write spend 90% of their time waiting for the user (I don't do genome sequencing, SETI analysis or protein folding - the market is pretty small).
Honestly, if I have the choice between two free products (where obviously money is not an issue), I'll always select the best (that would be GCC for me).

Turbo C generates 16-bit X86 code. Kiiinda nice when you're developing on a 16-bit x86 processor.
Been there. Done that.
The pragmatic reasons for changing are: gcc is under development, with bug-fixes. It deploys on modern operating systems and modern chips natively.

It was also my first compiler (4 yrs ago), though I switched to gcc soon enough when I learned it didn't follow latest standards and relied on features that are considered deprecated or bad practice. These were enough reasons for me to make the switch.

The most important reason you should use decent C compiler is performance. Since GCC optimizes the code aggressively, the compiled programs would have the performance tens of percents higher than before.

Turbo C is much simpler to configure & use, runs on old DOS machines. Also it is compact in size.I guess that is the reason.
However, it does take a very little advantage of modern processors.

We have to use C "for performance reasons" [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
Improve this question
In this age of many languages, there seems to be a great language for just about every task and I find myself professionally struggling against a mantra of "nothing but C is fast", where fast is really intended to mean "fast enough". I work with very rational open-minded people, who like to compare numbers, and all I have are thoughts and opinions. Could you help me find my way past subjective opinions and into the "real world"?
Would you help me find research as to what if any other languages could be used for embedded and (Linux) systems programming? I very well could be pushing a false hypothesis and would greatly appreciate research to show me this. Could you please link or include good numbers so as to help keep the "that's just his/her opinion" comments to a minimum.
So these are my particular requirements
memory is not a serious constraint
portability is not a serious concern
this is not a real time system

In my experience, using C for embedded and systems programming isn't necessarily a performance issue - it's often a portability issue. C tends to be the most portable, well supported language on just about every platform, especially on embedded systems platforms.
If you wish to use something else in an embedded system, it's often a matter of figuring out what options are available, then determining whether the performance, memory consumption, library support, etc, are "good enough" for your situation.

"Nothing but C is fast [enough]" is an early optimisation and wrong for all the reasons that early optimisations are wrong. If your system has enough complexity that something other than C is desirable, then there will be parts of the system that must be "fast enough" and parts with lighter constraints. If writing your code, for example, in Python will get the project finished faster, with fewer bugs, then you can follow up with some C or assembly code to speed up the time-critical parts.
Even if it turns out that the entire code must be written in C or assembly to meet the performance requirements, prototyping in a language like Python can have real benefits. You can take your working Python prototype and gradually replace parts with C code until you reach the necessary performance.
So, use the tools that let you get the development work done most correctly and most quickly, then use real data to determine where you need to optimize. It could be that C is the most appropriate tool to start with sometimes, but certainly not always, even in embedded systems.

Using C for embedded systems has got some very good reasons, of which "performance" is only one of the minor. Embedded is very close to the hardware, you need manual memory adressing to communicate with hardware. All the APIs and SDKs are available for C mostly.
There are only a few platforms that can run a VM for Java or Mono which is partially due to the performance implications but also due to expensive implementation costs.

Apart from performance, there is another consideration: you'll most likely be dealing with low-level APIs that were designed to be used in C or C++.
If you cannot use some SDK, you'll only get yourself in trouble instead of saving time with developing using a higher level language. At the very least, you'll end up redoing a bunch of function declarations and constant definitions.

For C:
C is often the only language that is supported by compilers for a processors.
Most of the libraries and example code is probability also in C.
Most embedded developers have years of C experience but very little experience in anything else.
Allows direct hardware interfacing and manual memory management.
Easy integration with assembly language.
C is going to be around for many years to come. In embedded development its a monopoly that smothers any attempt at change. A language that need a VM like Java or Lua is never going to go mainstream in the embedded environment. A compiled language might stand a chance if it provide compelling new features over C.

There are several benchmarks on the web between different languages. Most of them you will find a C or C++ implementation at the top as they give you more control to really optimize things.
Example: The Computer Language Benchmarks Game.

It's hard to argue against C (or other procedure languages like Pascal, Modula-2, Ada) and assembly for embedded. There is a large history of success with those languages. Generally, you want to remove the risk of the unknown. Trying to use anything other than C or assembly, in my opinion, is an unknown. Having said that, there's nothing wrong with a mixed model where you use one of the Schemes that go to C, or Python or Lua or JavaScript as a scripting language.
What you want is the ability to quickly and easily go to C when you have to.
If you convince the team to go with something that is unproven to them, the project is your cookie. If it crumbles, it'll likely be seen as your fault.

This article (by Michael Barr) talks about the use of C, C++, assembler and other languages in embedded systems, and includes a graph showing the relative usage of each.
And here's another article, fittingly entitled, Poor reasons for rejecting C++.

Ada is a high-level programming language that was designed for embedded systems and mission critical systems.
It is a fast secure language that has data checking built in everywhere. It is what the auto pilots in airplanes are programmed in.
At this link you have a comparison between Ada and C.

There are situations where you need real-time performance, especially in embedded systems. You also have severe memory constraints. A language like C gives you greater control over execution time and execution space.
So, depending on what you are doing, C may very well be "better" or more appropriate.
Check out the following articles
http://theunixgeek.blogspot.com/2008/09/c-vs-python-speed.html
http://wiki.python.org/moin/PythonSpeed/PerformanceTips (especially see Python is not C section)
http://scienceblogs.com/goodmath/2006/11/the_c_is_efficient_language_fa.php

C is ubiquitous, available for almost any architecture, usually from day-one of a processor's availability. C++ is a close second. If your system can support C++ and you have the necessary expertise, use it in preference to C - it is all that C is, and more, so there are few reasons for not using it.
C++ is a larger language, and there are constructs and techniques supported that may consume resources or behave in unacceptable ways in an embedded system, but that is not a reason not to use the language, but rather how to use it appropriately.
Java and C# (on Micro.Net or WinCE) may be viable alternatives for non-real-time.

You may want to look at the D programming language. It could use some performance tuning, as there are some areas Python can outperform it. I can't really point you to benchmarking comparisons since haven't been keeping a list, but as pointed to by Peter Olsson, Benchmarks & Language Implementations has D Digital Mars.
You will probably also want to look at these lovely questions:
Getting Embedded with D (the programming language)
How would you approach using D in a embedded real-time environment?

I'm not really a systems/embedded programmer, but it seems to me that embedded programs generally need deterministic performance - that immediately rules out many garbage collected languages, because they are not deterministic in general. However, there has been work on deterministic garbage collection (for example, Metronome for Java: http://www.ibm.com/developerworks/java/library/j-rtj4/index.html)
The issue is one of constraints - do the languages/runtimes meet the deterministic, memory usage, etc requirements.

C really is your best choice.
There is a difference for writing portable C code and getting too deep into the ghee whiz features of a specific compiler or corner cases of the language (all of which should be avoided). But portability across compilers and compiler versions. The number of employees that will be capable of developing for or maintaining the code. The compilers are going to have an easier time with it and produce better, cleaner, and more reliable code.
C is not going anywhere, with all the new languages being designed to fix the flaws in all the prior languages. C, with all the flaws these new languages are trying to fix, still stands strong.

Here are a couple articles that compare C# to C++ :
http://systematicgaming.wordpress.com/2009/01/03/performance-c-vs-c/
http://journal.stuffwithstuff.com/2009/01/03/debunking-c-vs-c-performance/
Not exactly what you asked for as it doesn't have a focus on embedded C programming. But it's interesting nonetheless. The first one demonstrates the performance of C++ and the benefits of using "unsafe" code for processor intensive tasks. The second one somewhat debunks the first one and shows that if you write the C# code a little differently then the performance is almost the same.
So I will say that C or C++ can be the clear winner in terms of performance in many cases. But often times the margin is slim. Whether to use C or not is another topic altogether. In my opinion it really should depend on the task at hand. But in embedded systems you often don't have much of a choice.

A couple people have mentioned Lua. People I know who have worked with embedded systems have said Lua is useful, but it's not really its own language per se but more of a library that can be embedded in C. It is targetted towards use in embedded systems and generally you'll want to call Lua code from C. But pure C makes for simpler (though not necessarily easier) maintenance, since everyone knows it.

Depending on the embedded platform, if memory constraints are an issue, you'll most likely need to use a non-garbage collected programming language.
C in this respect is likely the most well-known by the team and the most widely supported with available libraries and tools.

The truth is - not always.
It seems .NET runtime (but any other runtime can be taken as an example) imposes several MBs of runtime overhead. If this is all you have (in RAM), then you are out of luck. JavaME seems to be more compact, but it still all depends on resources you have at your disposal.

C compilers are much faster even on desktop systems, because of how few langage features there are compared to C++, so I'd imagine the difference is non-trivial on embedded systems. This translates to faster iteration times, although OTOH you don't have the conveniences of C++ (such as collections) which may slow you down in the long run.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight