How much faster is C than R in practice?

How much faster is C than R in practice? - c

I wrote a Gibbs sampler in R and decided to port it to C to see whether it would be faster. A lot of pages I have looked at claim that C will be up to 50 times faster, but every time I have used it, it's only about five or six times faster than R. My question is: is this to be expected, or are there tricks which I am not using which would make my C code significantly faster than this (like how using vectorization speeds up code in R)? I basically took the code and rewrote it in C, replacing matrix operations with for loops and making all the variables pointers.
Also, does anyone know of good resources for C from the point of view of an R programmer? There's an excellent book called The Art of R Programming by Matloff, but it seems to be written from the perspective of someone who already knows C.
Also, the screen tends to freeze when my C code is running in the standard R GUI for Windows. It doesn't crash; it unfreezes once the code has finished running, but it stops me from doing anything else in the GUI. Does anybody know how I could avoid this? I am calling the function using .C()

Many of the existing posts have explicit examples you can run, for example Darren Wilkinson has several posts on his blog analyzing this in different languages, and later even on different hardware (eg comparing his high-end laptop to his netbook and to a Raspberry Pi). Some of his posts are
the initial (then revised) post
another later post
and there are many more on his site -- these often compare C, Java, Python and more.
Now, I also turned this into a version using Rcpp -- see this blog post. We also used the same example in a comparison between Julia, Python and R/C++ at useR this summer so you should find plenty other examples and references. MCMC is widely used, and "easy pickings" for speedups.
Given these examples, allow me to add that I disagree with the two earlier comments your question received. The speed will not be the same, it is easy to do better in an example such as this, and your C/C++ skills will mostly determines how much better.
Finally, an often overlooked aspect is that the speed of the RNG matters a lot. Running down loops and adding things up is cheap -- doing "good" draws is not, and a lot of inter-system variation comes from that too.

About the GUI freezing, you might want to call R_CheckUserInterrupt and perhaps R_ProcessEvents every now and then.

I would say C, done properly, is much faster than R.
Some easy gains you could try:
Set the compiler to optimize for more speed.
Compiling with the -march flag.
Also if you're using VS, make sure you're compiling with release options, not debug.

Your observed performance difference will depend on a number of things: the type of operations that you are doing, how you write the C code, what type of compiler-level optimizations you use, your target CPU architecture, etc etc.
You can write basic, sloppy C and get something that works and runs with decent efficiency. You can also fine-tune your code for the unique characteristics of your target CPU - perhaps invoking specialized assembly instructions - and squeeze every last drop of performance that you can out of the code. You could even write code that runs significantly slower than the R version. C gives you a lot of flexibility. The limiting factor here is how much time that you want to put into writing and optimizing the C code.
The reverse is also true (duplicate the previous paragraph here, but swap "C" and "R").
I'm not trying to sound facetious, but there's really not a straightforward answer to your question. The only way to tell how much faster your C version would be is to write the code both ways and benchmark them.

Related

How come the mex code is running more slowly than the matlab code

I use matlab to write a program with many iterations. It cannot be vectorized since the data processing in each iteration is related to that in the previous iteration.
Then I transform the matlab code to mex using the build-in MATLAB coder and the resulting speed is even lower. I don't know whether I need to write the mex code by myself since it seems the mex code doesn't help.

I'd suggest that if you can, you get in touch with MathWorks to ask them for some advice. If you're not able to do that, then I would suggest really reading through the documentation and trying everything you find before giving up.
I've found that a few small changes to the way one implements the MATLAB code, and a few small changes to the project settings (such as disabling responsiveness to Ctrl-C, extrinsic calls back to MATLAB) can make give a speed difference of an order of magnitude or more in the generated code. There are not many people outside MathWorks who would be able to give good advice on exactly what changes might be worthwhile/sensible for you.
I should say that I've only used MATLAB Coder on one project, and I'm not at all an expert (actually not even a competent) C programmer. Nevertheless I've managed to produce C code that was about 10-15 times as fast as the original MATLAB code when mexed. I achieved that by a) just fiddling with all the different settings to see what happened and b) methodically going through the documentation, and seeing if there were places in my MATLAB code where I could apply any of the constructs I came across (such as coder.nullcopy, coder.unroll etc). Of course, your code may differ substantially.

how to incorporate C or C++ code into my R code to speed up a MCMC program, using a Metropolis-Hastings algorithm

I am seeking advice on how to incorporate C or C++ code into my R code to speed up a MCMC program, using a Metropolis-Hastings algorithm. I am using an MCMC approach to model the likelihood, given various covariates, that an individual will be assigned a particular rank in a social status hierarchy by a 3rd party (the judge): each judge (approx 80, across 4 villages) was asked to rank a group of individuals (approx 80, across 4 villages) based on their assessment of each individual's social status. Therefore, for each judge I have a vector of ranks corresponding to their judgement of each individual's position in the hierarchy.
To model this I assume that, when assigning ranks, judges are basing their decisions on the relative value of some latent measure of an individual's utility, u. Given this, it can then be assumed that a vector of ranks, r, produced by a given judge is a function of an unobserved vector, u, describing the utility of the individuals being ranked, where the individual with the kth highest value of u will be assigned the kth rank. I model u, using the covariates of interest, as a multivariate normally distributed variable and then determine the likelihood of the observed ranks, given the distribution of u generated by the model.
In addition to estimating the effect of, at most, 5 covariates, I also estimate hyperparameters describing variance between judges and items. Therefore, for every iteration of the chain I estimate a multivariate normal density approximately 8-10 times. As a result, 5000 iterations can take up to 14 hours. Obviously, I need to run it for much more than 5000 runs and so I need a means for dramatically speeding up the process. Given this, my questions are as follows:
(i) Am I right to assume that the best speed gains will be had by running some, if not all of my chain in C or C++?
(ii) assuming the answer to question 1 is yes, how do I go about this? For example, is there a way for me to retain all my R functions, but simply do the looping in C or C++: i.e. can I call my R functions from C and then do looping?
(iii) I guess what I really want to know is how best to approach the incorporation of C or C++ code into my program.

First make sure your slow R version is correct. Debugging R code might be easier than debugging C code. Done that? Great. You now have correct code you can compare against.
Next, find out what is taking the time. Use Rprof to run your code and see what is taking the time. I did this for some code I inherited once, and discovered it was spending 90% of the time in the t() function. This was because the programmer had a matrix, A, and was doing t(A) in a zillion places. I did one tA=t(A) at the start, and replaced every t(A) with tA. Massive speedup for no effort. Profile your code first.
Now, you've found your bottleneck. Is it code you can speed up in R? Is it a loop that you can vectorise? Do that. Check your results against your gold standard correct code. Always. Yes, I know its hard to compare algorithms that rely on random numbers, so set the seeds the same and try again.
Still not fast enough? Okay, now maybe you need to rewrite parts (the lowest level parts, generally, and those that were taking the most time in the profiling) in C or C++ or Fortran, or if you are really going for it, in GPU code.
Again, really check the code is giving the same answers as the correct R code. Really check it. If at this stage you find any bugs anywhere in the general method, fix them in what you thought was the correct R code and in your latest version, and rerun all your tests. Build lots of automatic tests. Run them often.
Read up about code refactoring. It's called refactoring because if you tell your boss you are rewriting your code, he or she will say 'why didn't you write it correctly first time?'. If you say you are refactoring your code, they'll say "hmmm... good". THIS ACTUALLY HAPPENS.
As others have said, Rcpp is made of win.

A complete example using R, C++ and Rcpp is provided by this blog post which was inspired by a this post on Darren Wilkinson's blog (and he has more follow-ups). The example is also included with recent releases of Rcpp in a directory RcppGibbs and should get you going.

I have a blog post which discusses exactly this topic which I suggest you take a look at:
http://darrenjw.wordpress.com/2011/07/31/faster-gibbs-sampling-mcmc-from-within-r/
(this post is more relevant than the post of mine that Dirk refers to).

I think the best method currently to integrate C or C++ is the Rcpp package of Dirk Eddelbuettel. You can find a lot of information at his website. There is also a talk at Google that is available through youtube that might be interesting.

Check out this project:
https://github.com/armstrtw/rcppbugs
Also, here is a link to the R/Fin 2012 talk:
https://github.com/downloads/armstrtw/rcppbugs/rcppbugs.pdf

I would suggest to benchmark each step of the MCMC sampler and identify the bottleneck. If you put each full conditional or M-H-step into a function, you can use the R compiler package which might give you 5%-10% speed gain. The next step is to use RCPP.
I think it would be really nice to have a general-purpose RCPP function which generates just one single draw using the M-H algorithm given a likelihood function.
However, with RCPP some things become difficult if you only know the R language: non-standard random distributions (especially truncated ones) and using arrays. You have to think more like a C programmer there.
Multivariate Normal is actually a big issue in R. Dmvnorm is very inefficient and slow. Dmnorm is faster, but it would give me NaNs quicker than dmvnorm in some models.
Neither does take an array of covariance matrices, so it is impossible to vectorize code in many instances. As long as you have a common covariance and means, however, you can vectorize, which is the R-ish strategy to speed up (and which is the oppositve of what you would do in C).

Why aren't programs written in Assembly more often? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
It seems to be a mainstream opinion that assembly programming takes longer and is more difficult to program in than a higher level language such as C. Therefore it seems to be recommend or assumed that it is better to write in a higher level language for these reasons and for the reason of better portability.
Recently I've been writing in x86 assembly and it has dawned on me that perhaps these reasons are not really true, except perhaps portability. Perhaps it is more of a matter of familiarity and knowing how to write assembly well. I also noticed that programming in assembly is quite different than programming in an HLL. Perhaps a good and experienced assembly programmer could write programs just as easily and as quickly as an experienced C programmer writing in C.
Perhaps it is because assembly programming is quite different than HLLs, and so requires different thinking, methods and ways, which makes it seem very awkward to program in for the unfamiliar, and so gives it its bad name for writing programs in.
If portability isn't an issue, then really, what would C have over a good assembler such as NASM?
Edit:
Just to point out. When you are writing in assembly, you don't have to write just in instruction codes. You can use macros and procedures and your own conventions to make various abstractions to make programs more modular, more maintainable and easier to read. This is where being familiar with how to write good assembly comes in.

Hellо, I am a compiler.
I just scanned thousands of lines of code while you were reading this sentence. I browsed through millions of possibilities of optimizing a single line of yours using hundreds of different optimization techniques based on a vast amount of academic research that you would spend years getting at. I won't feel any embarrassment, not even a slight ick, when I convert a three-line loop to thousands of instructions just to make it faster. I have no shame to go to great lengths of optimization or to do the dirtiest tricks. And if you don't want me to, maybe for a day or two, I'll behave and do it the way you like. I can transform the methods I'm using whenever you want, without even changing a single line of your code. I can even show you how your code would look in assembly, on different processor architectures and different operating systems and in different assembly conventions if you'd like. Yes, all in seconds. Because, you know, I can; and you know, you can't.
P.S. Oh, by the way you weren't using half of the code you wrote. I did you a favor and threw it away.

ASM has poor legibility and isn't really maintainable compared to higher-level languages.
Also, there are many fewer ASM developers than for other more popular languages, such as C.
Furthermore, if you use a higher-level language and new ASM instructions become available (SSE for example), you just need to update your compiler and your old code can easily make use of the new instructions.
What if the next CPU has twice as many registers?
The converse of this question would be: What functionality do compilers provide?
I doubt you can/want to/should optimize your ASM better than gcc -O3 can.

I've written shedloads of assembler for the 6502, Z80, 6809 and 8086 chips. I stopped doing so as soon as C compilers became available for the platforms I was addressing, and immediately became at least 10x more productive. Most good programmers use the tools they use for rational reasons.

I love programming in assembly language, but it takes more code to do the same thing as in a high-level languge, and there is a direct correlation between lines of code and bugs. (This was explained decades ago in The Mythical Man-Month.)
It's possible to think of C as 'high level assembly', but get a few steps above that and you're in a different world. In C# you don't think twice about writing this:
foreach (string s in listOfStrings) { /* do stuff */ }
This would be dozens, maybe hundreds of lines of code in assembly, each programmer implementing it would take a different approach, and the next person coming along would have to figure it out. So if you believe (as many do) that programs are written primarily for other people to read, assembly is less readable than the typical HLL.
Edit: I accumulated a personal library of code used for common tasks, and macros for implementing C-like control structures. But I hit the wall in the 90s, when GUIs became the norm. Too much time was being spent on things that were routine.
The last task I had where ASM was essential was a few years ago, writing code to combat malware. No user interface, so it was all the fun parts without the bloat.

In addition to other people's answers of readability, maintainability, shorter code and therefore fewer bugs, and being much easier, I'll add an additional reason:
program speed.
Yes, in assembly you can hand tune your code to make use of every last cycle and make it as fast as is physically possible. However who has the time? If you write a not-completely-stupid C program, the compiler will do a really good job of optimizing for you. Probably making at least 95% of the optimizations you'd do by hand, without you having to worry about keeping track of any of it. There's definitely a 90/10 kind of rule here, where that last 5% of optimizations will end up taking up 95% of your time. So why bother?

If an average production program has say 100k lines of code, and each line is about 8-12 assembler instructions, that would be 1 million of assembler instructions.
Even if you could write all this by hand at a decent speed (remember, its 8 times more code that you have to write), what happens if you want to change some of the functionality? Understanding something you wrote a few weeks ago out of those 1 million instructions is a nightmare! There's no modules, no classes, no object-oriented design, no frameworks, no nothing. And the amount of similar looking code you have to write for even the simplest things is daunting at best.
Besides, you can't optimize your code nearly as well as a high level language. Where C for example performs an insane number of optimizations because you describe your intent, not only your code, in assembler you only write code, the assembler can't really perform any note-worthy optimizations on your code. What you write is what you get, and trust me, you can't reliably optimize 1 million instructions that you patch and patch as you write it.

Well I have been writing a lot of assembly "in the old days", and I can assure you that I am much more productive when I write programs in a high level language.

A reasonable level of assembler competence is a useful skill, especially if you work at any sort of system level or embedded programming, not so much because you have to write that much assembler, but because sometimes it's important to understand what the box is really doing. If you don't have a low-level understanding of assembler concepts and issues, this can be very difficult.
However, as for actually writing much code in assembler, there are several reasons it's not much done.
There's simply no (almost) need. Except for something like the very early system initialization and perhaps a few assembler fragments hidden in C functions or macros, all very low-level code that might once have been written in assembler can be written in C or C++ with no difficulty.
Code in higher-level languages (even C and C++) condenses functionality into far fewer lines, and there is considerable research showing that the number of bugs correlates with the number of lines of source code. Ie, the same problem, solved in assembler and C, will have more bugs in assembler simply because its longer. The same argument motivates the move to higher level languages such as Perl, Python, etc.
Writing in assembler, you have to deal with every single aspect of the problem, from detailed memory layout, instruction selection, algorithm choices, stack management, etc. Higher level languages take all this away from you, which is why are so much denser in terms of LOC.
Essentially, all of the above are related to the level of abstraction available to you in assembler versus C or some other language. Assembler forces you to make all of your own abstractions, and to maintain them through your own self-discipline, where any mid-level language like C, and especially higher level languages, provide you with abstractions out of the box, as well as the ability to create new ones relatively easily.

As a developer who spends most of his time in the embedded programming world, I would argue that assembly is far from a dead/obsolete language. There is a certain close-to-the-metal level of coding (for example, in drivers) that sometimes cannot be expressed as accurately or efficiently in a higher-level language. We write nearly all of our hardware interface routines in assembler.
That being said, this assembly code is wrapped such that it can be called from C code and is treated like a library. We don't write the entire program in assembly for many reasons. First and foremost is portability; our code base is used on several products that use different architectures and we want to maximize the amount of code that can be shared between them. Second is developer familiarity. Simply put, schools don't teach assembly like they used to, and our developers are far more productive in C than in assembly. Also, we have a wide variety of "extras" (things like libraries, debuggers, static analysis tools, etc) available for our C code that aren't available for assembly language code. Even if we wanted to write a pure-assembly program, we would not be able to because several critical hardware libraries are only available as C libs. In one sense, it's a chicken/egg problem. People are driven away from assembly because there aren't as many libraries and development/debug tools available for it, but the libs/tools don't exist because not enough people use assembly to warrant the effort creating them.
In the end, there is a time and a place for just about any language. People use what they are most familiar and productive with. There will probably always be a place in a programmer's repertoire for assembly, but most programmers will find that they can write code in a higher-level language that is almost as efficient in far less time.

When you are writing in assembly, you don't have to write just in instruction codes. You can use macros and procedures and your own conventions to make various abstractions to make programs more modular, more maintainable and easier to read.
So what you're basically saying is, that with skilled use of a sophisticated assembler, you can make your ASM code closer and closer to C (or anyway another low-ish-level language of your own invention), until eventually you are just as productive as a C programmer.
Does that answer your question? ;-)
I don't say this idly: I have programmed using exactly such an assembler and system. Even better, the assembler could target a virtual processor, and a separate translator compiled the output of the assembler for a target platform. Much as happens with LLVM's IF, but in its early forms pre-dating it by about 10 years. So there was portability, plus the ability to write routines for a specific target asssembler where required for efficiency.
Writing using that assembler was about as productive as C, and with by comparison with GCC-3 (which was around by the time I was involved) the assembler/translator produced code that was roughly as fast and usually smaller. Size was really important, and the company had few programmers and was willing to teach new hires a new language before they could do anything useful. And we had the back-up that people who didn't know the assembler (e.g. customers) could write C and compile it for the same virtual processor, using the same calling convention and so on, so that it interfaced neatly. So it felt like a marginal win.
That was with multiple man-years of work in the bag developing the assembler technology, libraries, and so on. Admittedly much of which went into making it portable, if it had only ever been targeting one architecture then the all-singing all-dancing assembler would have been much easier.
In summary: you may not like C, but it doesn't mean that the effort of using C is greater than the effort of coming up with something better.

Assembly is not portable between different microprocessors.

The same reason we don't go to the bathroom outside anymore, or why we don't speak Latin or Aramaic.
Technology comes along and makes things easier and more accessible.
EDIT - to cease offending people, I've removed certain words.

Why? Simple.
Compare this :
for (var i = 1; i <= 100; i++)
{
if (i % 3 == 0)
Console.Write("Fizz");
if (i % 5 == 0)
Console.Write("Buzz");
if (i % 3 != 0 && i % 5 != 0)
Console.Write(i);
Console.WriteLine();
}
with
.locals init (
[0] int32 i)
L_0000: ldc.i4.1
L_0001: stloc.0
L_0002: br.s L_003b
L_0004: ldloc.0
L_0005: ldc.i4.3
L_0006: rem
L_0007: brtrue.s L_0013
L_0009: ldstr "Fizz"
L_000e: call void [mscorlib]System.Console::Write(string)
L_0013: ldloc.0
L_0014: ldc.i4.5
L_0015: rem
L_0016: brtrue.s L_0022
L_0018: ldstr "Buzz"
L_001d: call void [mscorlib]System.Console::Write(string)
L_0022: ldloc.0
L_0023: ldc.i4.3
L_0024: rem
L_0025: brfalse.s L_0032
L_0027: ldloc.0
L_0028: ldc.i4.5
L_0029: rem
L_002a: brfalse.s L_0032
L_002c: ldloc.0
L_002d: call void [mscorlib]System.Console::Write(int32)
L_0032: call void [mscorlib]System.Console::WriteLine()
L_0037: ldloc.0
L_0038: ldc.i4.1
L_0039: add
L_003a: stloc.0
L_003b: ldloc.0
L_003c: ldc.i4.s 100
L_003e: ble.s L_0004
L_0040: ret
They're identical feature-wise.
The second one isn't even assembler but .NET IL (Intermediary Language, similar to Java's bytecode). The second compilation transforms the IL into native code (i.e. almost assembler), making it even more cryptical.

I'd guess ASM on even x86(_64) makes sense in cases where you gain a lot by utilizing instructions that are difficult for a compiler to optimize for. x264 for example uses a lot of asm for its encoding, and the speed gains are huge.

I'm sure there are many reasons, but two quick reasons I can think of are
Assembly code is definitely harder to read (I'm positive its more time-consuming to write as well)
When you have a huge team of developers working on a product, it is helpful to have your code divided into logical blocks and protected by interfaces.

One of the early discoveries (you'll find it in Brooks' Mythical Man-Month, which is from experience in the 1960s) was that people were more or less as productive in one language as another, in debugged lines of code per day. This obviously isn't universally true, and can breaks when pushed too far, but it was generally true of the high-level languages of Brooks' time.
Therefore, the fastest way to get productivity would be to use languages where one individual line of code did more, and indeed this works, at least for languages of complexity like FORTRAN and COBOL, or to give a more modern example C.

Portability is always an issue -- if not now, at least eventually. The programming industry spends billions every year to port old software which, at the time it was written, had "obviously" no portability issue whatsoever.

There was a vicious cycle as assembly became less commonplace: as higher level languages matured, assembly language instruction sets were built less for programmer convenience and more for the convenience of compilers.
So now, realistically, it may be very hard to make the right decisions on, say, which registers you should use or which instructions are slightly more efficient. Compilers can use heuristics to figure out which tradeoffs are likely to have the best payoff. We can probably think through smaller problems and find local optimizations that might beat our now pretty sophisticated compilers, but odds are that in the average case, a good compiler will do a better job on the first try than a good programmer probably will. Eventually, like John Henry, we might beat the machine, but we might seriously burn ourselves out getting there.
Our problems are also now quite different. In 1986 I was trying to figure out how to get a little more speed out of small programs that involved putting a few hundred pixels on the screen; I wanted the animation to be less jerky. A fair case for assembly language. Now I'm trying to figure out how to represent abstractions around contract language and servicer policy for mortgages, and I'd rather read something that looks close to the language that the business folks speak. Unlike LISP macros, Assembly macros don't enforce much in the way of rules, so even though you might be able to get something reasonably close to a DSL in a good assembler, it'll be prone to all sorts of quirks that won't cause me problems if I wrote the same code in Ruby, Boo, Lisp, C# or even F#.
If your problems are easy to express in efficient assembly language, though, more power to you.

Ditto most of what others have said.
In the good old days before C was invented, when the only high level languages were things like COBOL and FORTRAN, there were lots of things that just weren't possible to do without resorting to assembler. It was the only way to get the full breadth of flexibility, to be able to access all the devices, etc. But then C was invented, and almost anything that was possible in assembly was possible in C. I have written very little assembly since then.
That said, I think it is a very useful exercise for new programmers to learn to write in assembler. Not because they would actually use it much, but because then you understand what is really happening inside the computer. I've seen lots of programming errors and inefficient code from programmers who clearly have no idea what's really happening with the bits and bytes and registers.

I've been programming in assembly now for about a month. I often write a piece of code in C and then compile it to assembly to assist me. Perhaps I am not utilizing the full optimizing power of the C compiler but it appears that my C asm source is including unnecessary operations. So I am beginning to see that the talk of a good C compiler outperforming a good assembly coder is not always true.
Anyways, my assembly programs are so fast. And the more I use assembly the less time it takes me to write out my code because it's really not that hard. Also the comment about assembly having poor legibility is not true. If you label your programs correctly and make comments when there is additional elaboration needed you should be all set. In fact in ways assembly is more clear to the programmer because they are seeing what is happening at the level of the processor. I don't know about other programmers but for me I like knowing what's happening, rather than things being in a sort of black box.
With that said the real advantage of compilers is that a compiler can understand patterns and relationships and then automatically code them in the appropriate locations in the source. One popular example are virtual functions in C++ which requires the compiler to optimally map function pointers. However a compiler is limited to doing what the maker of the compiler allows the compiler to do. This leads to programmers sometimes having to resort to doing bizarre things with their code , adding coding time, when they could have been done trivially with assembly.
Personally I think the marketplace heavily supports high level languages. If assembly language was the only language in existence today then their would be about 70% less people programming and who knows where our world would be, probably back in the 90's. Higher level languages appeal to a broader range of people. This allows a higher supply of programmers to build the needed infrastructure of our world. Developing nations like China and India benefit heavily from languages like Java. These countries will fast develop their IT infrastructure and people will become more interconnected. So my point is that high level languages are popular not because they produce superior code but because they help to meet demand in the world's marketplaces.

I'm learning assembly in comp org right now, and while it is interesting, it is also very inefficient to write in. You have to keep alot more details in your head to get things working, and its also slower to write the same things. For example, a simple 6 line for loop in C++ can equal 18 lines or more of assembly.
Personally, its alot of fun learning how things work down at the hardware level, and it gives me greater appreciation for how computing works.

What C has over a good macro assembler is the language C. Type checking. Loop constructs. Automatic stack management. (Nearly) automatic variable management. Dynamic memory techniques in assembler are a massive pain in the butt. Doing a linked list properly is just down right scary compared to C or better yet list foo.insert(). And debugging - well, there's no contest on what is easier to debug. HLLs win hands down there.
I've coded nearly half my career in assembler which makes it very easy for me to think in assmebler. it helps me to see what the C compiler is doing which again helps me write code that the C compiler can efficiently handle. A well thought out routine written in C can be written to output exactly what you want in assembler with a little work - and it's portable! I've already had to rewrite a few older asm routines back to C for cross platform reasons and it's no fun.
No, I'll stick with C and deal with the occasional slight slowdown in performance against the productivity time I gain with HLL.

I can only answer why I personally don't write programs in assembly more often, and the main reason is that it's more tedious to do. Also, I think that it is easier to get things subtly wrong without noticing immediately. E.g., you might change the way you use a register in one routine but forget to change this in one place. It'll assemble fine and you may not notice until much later.
That said, I do think there are still valid uses for assembly. For instance, I have a number of pretty optimised assembly routines for processing large amounts of data, using SIMD and following the paranoid "every bit is sacred"[quote V.Stob] approach. (But note that naive assembly implementations are often a lot worse than what a compiler would generate for you.)

C is a macro assembler! And it's the best one!
It can do nearly everything assembly can, it can be portable and in most of the rare cases where it can't do something you can still use embedded assembly code. This leaves only a small fraction of programs that you absolutely need to write in assembly and nothing but assembly.
And the higher level abstractions and the portability make it more worthwhile for most people to write system software in C. And although you might not need portability now if you invest a lot of time and money in writing some program you might not want to limit yourself in what you'll be able to use it for in the future.

People seem to forget that there is also the other direction.
Why are you writing in Assembler in the first place? Why not write the program in a truly low level language?
Instead of
mov eax, 0x123
add eax, 0x456
push eax
call printInt
you could just as well write
B823010000
0556040000
50
FF15.....
That has so many advantages, you know the exact size of your program, you can reuse the value of instructions as input for other instructions and you do not even need an assembler to write it, you can use any text editor...
And the reason you still prefer Assembler about this, is the reason other people prefer C...

Because it's always that way: time pass and good things pass away too :(
But when you write asm code it's totally different feeling than when you code high-level langs, though you know it's much less productive. It's like you're a painter: you are free to draw anything you like the way you like with absolutely no restrictions(well, only by CPU features)... That is why I love it. It's a pity this language goes away. But while somebody still remembers it and codes it, it will never die!

$$$
A company hires a developer to help turn code into $$$. The faster that useful code can be produced, the faster the company can turn that code into $$$.
Higher level languages are generally better at churning out larger volumes of useful code. This is not to say that assembly does not have its place, for there are times and places where nothing else will do.

The advantage of HLL's is even greater when you compare assembly to a higher level language than C, e.g. Java or Python or Ruby. For instance, these languages have garbage collection: no need to worry about when to free a chunk of memory, and no memory leaks or bugs due to freeing too early.

As others mentioned before, the reason for any tool to exist is how efficiently it can work. As HLLs can accomplish the same jobs as many lines of asm code I guess it's natural for assembly to be superseded by other languages. And for the close-to-hardware fiddling - there's inline assembly in C and other variants as per language.
Dr. Paul Carter in says in the PC Assembly Language
"...a better understanding of how
computers really work at a lower level
than in programming languages like
Pascal. By gaining a deeper
understanding of how computers work,
the reader can often be much more
productive developing software in
higher level languages such as C and
C++. Learning to program in assembly
language is an excellent way to
achieve this goal."
We've got introduction to assembly in my college courses. It'll help to clear concepts. However I doubt any of us would write 90% of code in assembly. How relevant is in-depth assembly knowledge today?

Flipping through these answers, I'd bet 9/10 of the responders have never worked with assembly.
This is an ages old question that comes up every so often and you get the same, mostly misinformed answers. If it weren't for portability, I'd still do everything in assembly myself. Even then, I code in C almost like I did in assembly.

Why do you program in assembly? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I have a question for all the hardcore low level hackers out there. I ran across this sentence in a blog. I don't really think the source matters (it's Haack if you really care) because it seems to be a common statement.
For example, many modern 3-D Games have their high performance core engine written in C++ and Assembly.
As far as the assembly goes - is the code written in assembly because you don't want a compiler emitting extra instructions or using excessive bytes, or are you using better algorithms that you can't express in C (or can't express without the compiler mussing them up)?
I completely get that it's important to understand the low-level stuff. I just want to understand the why program in assembly after you do understand it.

I think you're misreading this statement:
For example, many modern 3-D Games have their high performance core engine written in C++ and Assembly.
Games (and most programs these days) aren't "written in assembly" the same way they're "written in C++". That blog isn't saying that a significant fraction of the game is designed in assembly, or that a team of programmers sit around and develop in assembly as their primary language.
What this really means is that developers first write the game and get it working in C++. Then they profile it, figure out what the bottlenecks are, and if it's worthwhile they optimize the heck out of them in assembly. Or, if they're already experienced, they know which parts are going to be bottlenecks, and they've got optimized pieces sitting around from other games they've built.
The point of programming in assembly is the same as it always has been: speed. It would be ridiculous to write a lot of code in assembler, but there are some optimizations the compiler isn't aware of, and for a small enough window of code, a human is going to do better.
For example, for floating point, compilers tend to be pretty conservative and may not be aware of some of the more advanced features of your architecture. If you're willing to accept some error, you can usually do better than the compiler, and it's worth writing that little bit of code in assembly if you find that lots of time is spent on it.
Here are some more relevant examples:
Examples from Games
Article from Intel about optimizing a game engine using SSE intrinsics. The final code uses intrinsics (not inline assembler), so the amount of pure assembly is very small. But they look at the assembler output by the compiler to figure out exactly what to optimize.
Quake's fast inverse square root. Again, the routine doesn't have assembler in it, but you need to know something about architecture to do this kind of optimization. The authors know what operations are fast (multiply, shift) and which are slow (divide, sqrt). So they come up with a very tricky implementation of square root that avoids the slow operations entirely.
High-Performance Computing
Outside the domain of games, people in scientific computing frequently optimize the crap out of things to get them to run fast on the latest hardware. Think of this as games where you can't cheat on the physics.
A great recent example of this is Lattice Quantum Chromodynamics (Lattice QCD). This paper describes how the problem pretty much boils down to one very small computational kernel, which was optimized heavily for PowerPC 440's on an IBM Blue Gene/L. Each 440 has two FPUs, and they support some special ternary operations that are tricky for compilers to exploit. Without these optimizations, Lattice QCD would've run much slower, which is costly when your problem requires millions of CPU hours on expensive machines.
If you are wondering why this is important, check out the article in Science that came out of this work. Using Lattice QCD, these guys calculated the mass of a proton from first principles, and showed last year that 90% of the mass comes from strong force binding energy, and the rest from quarks. That's E=mc2 in action. Here's a summary.
For all of the above, the applications are not designed or written 100% in assembly -- not even close. But when people really need speed, they focus on writing the key parts of their code to fly on specific hardware.

I have not coded in assembly language for many years, but I can give several reasons that I frequently saw:
Not all compilers can make use of certain CPU optimizations and instruction set (e.g., the new instruction sets that Intel adds once in a while). Waiting for compiler writers to catch up means losing a competitive advantage.
Easier to match actual code to known CPU architecture and optimization. For example, things you know about the fetching mechanism, caching, etc. This is supposed to be transparent to the developer, but the fact is that it is not, that's why compiler writers can optimize.
Certain hardware level accesses are only possible/practical via assembly language (e.g., when writing device driver).
Formal reasoning is sometimes actually easier for the assembly language than for the high-level language since you already know what the final or almost final layout of the code is.
Programming certain 3D graphic cards (circa late 1990s) in the absence of APIs was often more practical and efficient in assembly language, and sometimes not possible in other languages. But again, this involved really expert-level games based on the accelerator architecture like manually moving data in and out in certain order.
I doubt many people use assembly language when a higher-level language would do, especially when that language is C. Hand-optimizing large amounts of general-purpose code is impractical.

There is one aspect of assembler programming which others have not mentioned - the feeling of satisfaction you get knowing that every single byte in an application is the result of your own effort, not the compiler's. I wouldn't for a second want to go back to writing whole apps in assembler as I used to do in the early 80s, but I do miss that feeling sometimes...

Usually, a layman's assembly is slower than C (due to C's optimization) but many games (I distinctly remember Doom) had to have specific sections of the game in Assembly so it would run smoothly on normal machines.
Here's the example to which I am referring.

I started professional programming in assembly language in my very first job (80's). For embedded systems the memory demands - RAM and EPROM - were low. You could write tight code that was easy on resources.
By the late 80's I had switched to C. The code was easier to write, debug and maintain. Very small snippets of code were written in assembler - for me it was when I was writing the context switching in an roll-your-own RTOS. (Something you shouldn't do anymore unless it is a "science project".)
You will see assembler snippets in some Linux kernel code. Most recently I've browsed it in spinlocks and other synchronization code. These pieces of code need to gain access to atomic test-and-set operations, manipulating caches, etc.
I think you would be hard pressed to out-optimize modern C compilers for most general programming.
I agree with #altCognito that your time is probably better spent thinking harder about the problem and doing things better. For some reason programmers often focus on micro-efficiencies and neglect the macro-efficiencies. Assembly language to improve performance is a micro-efficiency. Stepping back for a wider view of the system can expose the macro problems in a system. Solving the macro problems can often yield better performance gains.
Once the macro problems are solved then collapse to the micro level.
I guess micro problems are within the control of a single programmer and in a smaller domain. Altering behavior at the macro level requires communication with more people - a thing some programmers avoid. That whole cowboy vs the team thing.

"Yes". But, understand that for the most part the benefits of writing code in assembler are not worth the effort. The return received for writing it in assembly tends to be smaller than the simply focusing on thinking harder about the problem and spending your time thinking of a better way of doing thigns.
John Carmack and Michael Abrash who were largely responsible for writing Quake and all of the high performance code that went into IDs gaming engines go into this in length detail in this book.
I would also agree with Ólafur Waage that today, compilers are pretty smart and often employ many techniques which take advantage of hidden architectural boosts.

These days, for sequential codes at least, a decent compiler almost always beats even a highly seasoned assembly-language programmer. But for vector codes it's another story. Widely deployed compilers don't do such a great job exploiting the vector-parallel capabilities of the x86 SSE unit, for example. I'm a compiler writer, and exploiting SSE tops my list of reasons to go on your own instead of trusting the compiler.

SSE code works better in assembly than compiler intrinsics, at least in MSVC. (i.e. does not create extra copies of data )

I've three or four assembler routines (in about 20 MB source) in my sources at work. All of them are SSE(2), and are related to operations on (fairly large - think 2400x2048 and bigger) images.
For hobby, I work on a compiler, and there you have more assembler. Runtime libraries are quite often full of them, most of them have to do with stuff that defies the normal procedural regime (like helpers for exceptions etc.)
I don't have any assembler for my microcontroller. Most modern microcontrollers have so much peripheral hardware (interrupt controled counters, even entire quadrature encoders and serial building blocks) that using assembler to optimize the loops is often not needed anymore. With current flash prices, the same goes for code memory. Also there are often ranges of pin-compatible devices, so upscaling if you systematically run out of cpu power or flash space is often not a problem
Unless you really ship 100000 devices and programming assembler makes it possible to really make major savings by just fitting in a flash chip a category smaller. But I'm not in that category.
A lot of people think embedded is an excuse for assembler, but their controllers have more CPU power than the machines Unix was developed on. (Microchip coming
with 40 and 60 MIPS microcontrollers for under USD 10).
However a lot people are stuck with legacy, since changing microchip architecture is not easy. Also the HLL code is very architecture dependent (because it uses the hardware periphery, registers to control I/O, etc). So there are sometimes good reasons to keep maintaining a project in assembler (I was lucky to be able to setup affairs on a new architecture from scratch). But often people kid themselves that they really need the assembler.
I still like the answer a professor gave when we asked if we could use GOTO (but you could read that as ASSEMBLER too): "if you think it is worth writing a 3 page essay on why you need the feature, you can use it. Please submit the essay with your results. "
I've used that as a guiding principle for lowlevel features. Don't be too cramped to use it, but make sure you motivate it properly. Even throw up an artificial barrier or two (like the essay) to avoid convoluted reasoning as justification.

Some instructions/flags/control simply aren't there at the C level.
For example, checking for overflow on x86 is the simple overflow flag. This option is not available in C.

Defects tend to run per-line (statement, code point, etc.); while it's true that for most problems, assembly would use far more lines than higher level languages, there are occasionally cases where it's the best (most concise, fewest lines) map to the problem at hand. Most of these cases involve the usual suspects, such as drivers and bit-banging in embedded systems.

If you were around for all the Y2K remediation efforts, you could have made a lot of money if you knew Assembly. There's still plenty of legacy code around that was written in it, and that code occasionally needs maintenance.

Another reason could be when the available compiler just isn't good enough for an architecture and the amount of code needed in the program is not that long or complex as for the programmer to get lost in it. Try programming a microcontroller for an embedded system, usually assembly will be much easier.

Beside other mentioned things, all higher languages have certain limitations. Thats why some people choose to programm in ASM, to have full control over their code.
Others enjoy very small executables, in the range of 20-60KB, for instance check HiEditor, which is implemented by author of the HiEdit control, superb powerfull edit control for Windows with syntax highlighting and tabs in only ~50kb). In my collection I have more then 20 such gold controls from Excell like ssheets to html renders.

I think a lot of game developers would be surprised at this bit of information.
Most games I know of use as little assembly as at all possible. In some cases none at all, and at worst, one or two loops or functions.
That quote is over-generalized, and nowhere near as true as it was a decade ago.
But hey, mere facts shouldn't hinder a true hacker's crusade in favor of assembly. ;)

If you are programming a low end 8 bit microcontroller with 128 bytes of RAM and 4K of program memory you don't have much choice about using assembly. Sometimes though when using a more powerful microcontroller you need a certain action to take place at an exact time. Assembly language comes in useful then as you can count the instructions and so measure the clock cycles used by your code.

Games are pretty performance hungry and although in the meantime the optimizers are pretty good a "master programmer" is still able to squeeze out some more performance by hand coding the right parts in assembly.
Never ever start optimizing your program without profiling it first. After profiling should be able to identify bottlenecks and if finding better algorithms and the like don't cut it anymore you can try to hand code some stuff in assembly.

Aside from very small projects on very small CPUs, I would not set out to ever program an entire project in assembly. However, it is common to find that a performance bottleneck can be relieved with the strategic hand coding of some inner loops.
In some cases, all that is really required is to replace some language construct with an instruction that the optimizer cannot be expected to figure out how to use. A typical example is in DSP applications where vector operations and multiply-accumulate operations are difficult for an optimizer to discover, but easy to hand code.
For example certain models of the SH4 contain 4x4 matrix and 4 vector instructions. I saw a huge performance improvement in a color correction algorithm by replacing equivalent C operations on a 3x3 matrix with the appropriate instructions, at the tiny cost of enlarging the correction matrix to 4x4 to match the hardware assumption. That was achieved by writing no more than a dozen lines of assembly, and carrying matching adjustments to the related data types and storage into a handful of places in the surrounding C code.

It doesn't seem to be mentioned, so I thought I'd add it: in modern games development, I think at least some of the assembly being written isn't for the CPU at all. It's for the GPU, in the form of shader programs.
This might be needed for all sorts of reasons, sometimes simply because whatever higher-level shading language used doesn't allow the exact operation to be expressed in the exact number of instructions wanted, to fit some size-constraint, speed, or any combination. Just as usual with assembly-language programming, I guess.

Almost every medium-to-large game engine or library I've seen to date has some hand-optimized assembly versions available for matrix operations like 4x4 matrix concatenation. It seems that compilers inevitably miss some of the clever optimizations (reusing registers, unrolling loops in a maximally efficient way, taking advantage of machine-specific instructions, etc) when working with large matrices. These matrix manipulation functions are almost always "hotspots" on the profile, too.
I've also seen hand-coded assembly used a lot for custom dispatch -- things like FastDelegate, but compiler and machine specific.
Finally, if you have Interrupt Service Routines, asm can make all the difference in the world -- there are certain operations you just don't want occurring under interrupt, and you want your interrupt handlers to "get in and get out fast"... you know almost exactly what's going to happen in your ISR if it's in asm, and it encourages you to keep the bloody things short (which is good practice anyway).

I have only personally talked to one developer about his use of assembly.
He was working on the firmware that dealt with the controls for a portable mp3 player.
Doing the work in assembly had 2 purposes:
Speed: delays needed to be minimal.
Cost: by being minimal with the code, the hardware needed to run it could be slightly less powerful. When mass-producing millions of units, this can add up.

The only assembler coding I continue to do is for embedded hardware with scant resources. As leander mentions, assembly is still well suited to ISRs where the code needs to be fast and well understood.
A secondary reason for me is to keep my knowledge of assembly functional. Being able to examine and understand the steps which the CPU is taking to do my bidding just feels good.

Last time I wrote in assembler was when I could not convince the compiler to generate libc-free, position independent code.
Next time will probably be for the same reason.
Of course, I used to have other reasons.

A lot of people love to denigrate assembly language because they've never learned to code with it and have only vaguely encountered it and it has left them either aghast or somewhat intimidated. True talented programmers will understand that it is senseless to bash C or Assembly because they are complimentary. in fact the advantage of one is the disadvantage of the other. The organized syntaxic rules of C improves clarity but at the same gives up all the power assembly has from being free of any structural rules ! C code instruction are made to create non-blocking code which could be argued forces clarity of programming intent but this is a power loss. In C the compiler will not allow a jump inside an if/elseif/else/end. Or you are not allowed to write two for/end loops on diferent variables that overlap each other, you cannot write self modifying code (or cannot in an seamless easy way), etc.. conventional programmers are spooked by the above, and would have no idea how to even use the power of these approaches as they have been raised to follow conventional rules.
Here is the truth : Today we have machine with the computing power to do much more that the application we use them for but the human brain is too incapable to code them in a rule free coding environment (= assembly) and needs restrictive rules that greatly reduce the spectrum and simplifies coding.
I have myself written code that cannot be written in C code without becoming hugely inefficient because of the above mentionned limitations. And i have not yet talked about speed which most people think is the main reason for writting in assembly, well it is if you mind is limited to thinking in C then you are the slave of you compiler forever. I always thought chess players masters would be ideal assembly programmers while the C programmers just play "Dames".

No longer speed, but Control. Speed will sometimes come from control, but it is the only reason to code in assembly. Every other reason boils down to control (i.e. SSE and other hand optimization, device drivers and device dependent code, etc.).

If I am able to outperform GCC and Visual C++ 2008 (known also as Visual C++ 9.0) then people will be interested in interviewing me about how it is possible.
This is why for the moment I just read things in assembly and just write __asm int 3 when required.
I hope this help...

I've not written in assembly for a few years, but the two reasons I used to were:
The challenge of the thing! I went through a several-month period years
ago when I'd write everything in x86 assembly (the days of DOS and Windows
3.1). It basically taught me a chunk of low level operations, hardware I/O, etc.
For some things it kept size small (again DOS and Windows 3.1 when writing TSRs)
I keep looking at coding assembly again, and it's nothing more than the challenge and joy of the thing. I have no other reason to do so :-)

I once took over a DSP project which the previous programmer had written mostly in assembly code, except for the tone-detection logic which had been written in C, using floating-point (on a fixed-point DSP!). The tone detection logic ran at about 1/20 of real time.
I ended up rewriting almost everything from scratch. Almost everything was in C except for some small interrupt handlers and a few dozen lines of code related to interrupt handling and low-level frequency detection, which runs more than 100x as fast as the old code.
An important thing to bear in mind, I think, is that in many cases, there will be much greater opportunities for speed enhancement with small routines than large ones, especially if hand-written assembler can fit everything in registers but a compiler wouldn't quite manage. If a loop is large enough that it can't keep everything in registers anyway, there's far less opportunity for improvement.

The Dalvik VM that interprets the bytecode for Java applications on Android phones uses assembler for the dispatcher. This movie (about 31 minutes in, but its worth watching the whole movie!) explains how
"there are still cases where a human can do better than a compiler".

I don't, but I've made it a point to at least try, and try hard at some point in the furture (soon hopefully). It can't be a bad thing to get to know more of the low level stuff and how things work behind the scenes when I'm programming in a high level language. Unfortunately time is hard to come by with a full time job as a developer/consultant and a parent. But I will give at go in due time, that's for sure.

Why don't I see a significant speed-up when using the MATLAB compiler?

I have a lot of nice MATLAB code that runs too slowly and would be a pain to write over in C. The MATLAB compiler for C does not seem to help much, if at all. Should it be speeding execution up more? Am I screwed?

If you are using the MATLAB complier (on a recent version of MATLAB) then you will almost certainly not see any speedups at all. This is because all the compiler actually does is give you a way of packaging up your code so that it can be distributed to people who don't have MATLAB. It doesn't convert it to anything faster (such as machine code or C) - it merely wraps it in C so you can call it.
It does this by getting your code to run on the MATLAB Compiler Runtime (MCR) which is essentially the MATLAB computational kernel - your code is still being interpreted. Thanks to the penalty incurred by having to invoke the MCR you may find that compiled code runs more slowly than if you simply ran it on MATLAB.
Put another way - you might say that the compiler doesn't actually compile - in the traditional sense of the word at least.
Older versions of the compiler worked differently and speedups could occur in certain situations. For Mathwork's take on this go to
http://www.mathworks.com/support/solutions/data/1-1ARNS.html

In my experience slow MATLAB code usually comes from not vectorizing your code (i.e., writing for-loops instead of just multiplying arrays (simple example)).
If you are doing file I/O look out for reading data in one piece at a time. Look in the help files for the vectorized version of fscanf.
Don't forget that MATLAB includes a profiler, too!

I'll echo what dwj said: if your MATLAB code is slow, this is probably because it is not sufficiently vectorized. If you're doing explicit loops when you could be doing operations on whole arrays, that's the culprit.
This applies equally to all array-oriented dynamic languages: Perl Data Language, Numeric Python, MATLAB/Octave, etc. It's even true to some extent in compiled C and FORTRAN compiled code: specially-designed vectorization libraries generally use carefully hand-coded inner loops and SIMD instructions (e.g. MMX, SSE, AltiVec).

First, I second all the above comments about profiling and vectorizing.
For a historical perspective...
Older version of Matlab allowed the user to convert m files to mex functions by pre-parsing the m code and converting it to a set of matlab library calls. These calls have all the error checking that the interpreter did, but old versions of the interpreter and/or online parser were slow, so compiling the m file would sometimes help. Usually it helped when you had loops because Matlab was smart enough to inline some of that in C. If you have one of those versions of Matlab, you can try telling the mex script to save the .c file and you can see exactly what it's doing.
In more recent version (probably 2006a and later, but I don't remember), Mathworks started using a just-in-time compiler for the interpreter. In effect, this JIT compiler automatically compiles all mex functions, so explicitly doing it offline doesn't help at all. In each version since then, they've also put a lot of effort into making the interpreter much faster. I believe that newer versions of Matlab don't even let you automatically compile m files to mex files because it doesn't make sense any more.

The MATLAB compiler wraps up your m-code and dispatches it to a MATLAB runtime. So, the performance you see in MATLAB should be the performance you see with the compiler.
Per the other answers, vectorizing your code is helpful. But, the MATLAB JIT is pretty good these days and lots of things perform roughly as well vectorized or not. That'a not to say there aren't performance benefits to be gained from vectorization, it's just not the magic bullet it once was. The only way to really tell is to use the profiler to find out where your code is seeing bottlenecks. Often times there are some places where you can do local refactoring to really improve the performance of your code.
There are a couple of other hardware approaches you can take on performance. First, much of the linear algebra subsystem is multithreaded. You may want to make sure you have enabled that in your preferences if you are working on a multi-core or multi-processor platform. Second, you may be able to use the parallel computing toolbox to take more advantage of multiple processors. Finally, if you are a Simulink user, you may be able to use emlmex to compile m-code into c. This is particularly effective for fixed point work.

Have you tried profiling your code? You don't need to vectorize ALL your code, just the functions that dominate running time. The MATLAB profiler will give you some hints on where your code is spending the most time.
There are many other things you you should read up on the Tips For Improving Performance section in the MathWorks manual.

mcc won't speed up your code at all--it's not really a compiler.
Before you give up, you need to run the profiler and figure out where all your time is going (Tools->Open Profiler). Also, judicious use of "tic" and "toc" can help. Don't optimize your code until you know where the time is going (don't try to guess).
Keep in mind that in matlab:
bit-level operations are really slow
file I/O is slow
loops are generally slow, but vectorizing is fast (if you don't know the vector syntax, learn it)
core operations are really fast (e.g. matrix multiply, fft)
if you think you can do something faster in C/Fortran/etc, you can write a MEX file
there are commercial solutions to convert matlab to C (google "matlab to c") and they work

You could port your code to "Embedded Matlab" and then use the Realtime-Workshop to translate it to C.
Embedded Matlab is a subset of Matlab. It does not support Cell-Arrays, Graphics, Marices of dynamic size, or some Matrix addressing modes. It may take considerable effort to port to Embedded Matlab.
Realtime-Workshop is at the core of the Code Generation Products. It spits out generic C, or can optimize for a range of embedded Platforms. Most interresting to you is perhaps the xPC-Target, which treats general purpose hardware as embedded target.

I would vote for profiling + then look at what are the bottlenecks.
If the bottleneck is matrix math, you're probably not going to do any better... EXCEPT one big gotcha is array allocation. e.g. if you have a loop:
s = [];
for i = 1:50000
s(i) = 3;
end
This has to keep resizing the array; it's much faster to presize the array (start with zeros or NaN) & fill it from there:
s = zeros(50000,1);
for i = 1:50000
s(i) = 3;
end
If the bottleneck is repeated executions of a lot of function calls, that's a tough one.
If the bottleneck is stuff that MATLAB doesn't do quickly (certain types of parsing, XML, stuff like that) then I would use Java since MATLAB already runs on a JVM and it interfaces really easily to arbitrary JAR files. I looked at interfacing with C/C++ and it's REALLY ugly. Microsoft COM is ok (on Windows only) but after learning Java I don't think I'll ever go back to that.

As others has noted, slow Matlab code is often the result of insufficient vectorization.
However, sometimes even perfectly vectorized code is slow. Then, you have several more options:
See if there are any libraries / toolboxes you can use. These were usually written to be very optimized.
Profile your code, find the tight spots and rewrite those in plain C. Connecting C code (as DLLs for instance) to Matlab is easy and is covered in the documentation.

By Matlab compiler you probably mean the command mcc, which does speed the code a little bit by circumventing Matlab interpreter. What would speed the MAtlab code significantly (by a factor of 50-200) is use of actual C code compiled by the mex command.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight