Efficiency issues when using C99 and C11. - c

The other day I was converting a program written with C99 standard into C11. Basically the motive was to use the code with MSVC but It was written in Linux and was mostly compiled with default GCC behaviour. During the code conversion, I found out that you can not decalre variables of a function after any statement i.e. you must declare them at the top of the function.
But my question is that wouldn't it be against the efficient programming rule that variables should be declared near their use so that it maximizes the cache hits? For example, In a large function of say 200 LOC, I want to use some big static look up array at nearly the end of the function. Wouldn't declaring and initializing it just before the usage cause more cache hits? or am I simple missing some basic point of C11 C language standard?

You seem to have some confusion for which version of the standard you are compiling your program. AFAIK, MSVC doesn't support any of the more recent C standards.
But to come to the core of your question, no this is not an efficiency issue. The compiler is allowed to reorder statements to its liking, as long as the observable behavior of the program doesn't change. Thus a modern compiler will always touch a new variable the latest possible before its first use.

Where the variable declaration appears has no effect on cache behavior. Just having a declaration doesn't touch memory.
You may need to separate out initialization into a separate assignment, however, in order to make sure you don't have an initializer causing a memory access at (near) the beginning of the function.

Related

Which compiler should I trust?

This is going to be some what of a newbie question but I was trying to work on a small exercise in the C Language (not C++) and I was running into some issues.
Say I wanted to use an array within a method whose size depended on one of the arguments:
void someFunc(int arSize)
{
char charArray[arSize];
// DO STUFF
...
}
When I try to compile this as a .c file within Visual Studio 2013 I get an error saying that a non-constant array size is not allowed. However the same code works within CodeBlocks under a GNU Compiler. Which should I trust? Is this normal for compilers to behave so differently? I always thought that if you're doing something that a compiler doesn't like you shouldn't be doing it in the first place because it's not a standard.
Any input is useful! I come from a Background in Python and I am trying to get more heavily involved in programming with Data-Structures and Algorithms.
My platform is Windows as you can probably tell. Please let me know if this question needs more information before it can be answered.
Variable length arrays(VLA) are a C99 feature and Visual Studio until recently did not support C99 and I am not sure if it currently supports VLA in the lastest version. gcc on the other hand does support C99 although not fully. gcc supports VLA as an extension outside of C99 mode, even in C++.
From the draft C99 standard section 6.7.5.2 Array declarators paragraph 4:
[...] If the size is an integer constant expression and the element type has a known constant size, the array type is not a variable length array type; otherwise, the array type is a variable length array type.
You should trust the compilers that you're using and that you want to support.
On that particular issue: non-constant array sizes are valid in C99, which isn't fully supported by either gcc or by MSVC (Microsoft's C/C++ compiler). gcc, however, has this feature from the standard implemented even outside of C99 mode, while MSVC hasn't.
It depends upon the particular standard your C compiler is following.
The feature you want is called variable length array (VLA) and was introduced into the C99 standard.
Maybe your Visual Studio is supporting some earlier version of the standard. Perhaps you might configure it to support a later version.
Notice that using VLA with a huge size could be a bad habit: VLA are generally stack allocated, and a call frame stack should usually have a small size (a few kilobytes at most on current processors), especially for kernel code or for recursive or multithreaded functions. You may want to heap-allocate (e.g. with calloc) your array if it is has more than a thousand words. Then you'll need to free it later.
This is a GCC extension acting on you.

Declare variable as locally as possible

I'm new to Linux kernel development.
One thing that bothers me is a way a variables are declared and initialized.
I'm under impression that code uses variable declaration placement rules for C89/ANSI C (variables are declared at the beginning of block), while C99 relaxes the rule.
My background is C++ and there many advises from "very clever people" to declare variable as locally as possible - better declare and initialize in the same instruction:
Google C++ Style Guide
C++ Coding Standards: 101 Rules, Guidelines, and Best Practices - item 18
A good discussion about it here.
What is the accepted way to initialize variables in Linux kernel?
I couldn't find a relevant passage in the Linux kernel coding style. So, follow the convention used in existing code -- declare variables at beginning of block -- or run the risk of your code seeming out-of-place.
Reasons why variables at beginning of block is a Good Thing:
the target architecture may not have a C99 compiler
... can't think of more reasons
You should always try to declare variables as locally as possible. If you're using C++ or C99, that would usually be right before the first use.
In older C, doing that doesn't fall under "possible", and there the place to declare those variables would usually be the beginning of the current block.
(I say 'usually' because of some cases with functions and loops where it's better to make them a bit more global...)
In most normal cases, declare them in the beginning of the function where you are using them. There are exceptions, but they are rare.
if your function is short enough, the deceleration is far away from the first use anyway. If your function is longer then that - it's a good sign your function is too long.
The reason many C++ based coding standards recommend declaring close to use is that C++ data types can be much "fatter" (e.g. thing of class with multiple inheritances etc.) and so take up a lot more space. If you define such an instance at the beginning of a function but use it only much later (and maybe not at all) you are wasting a lot of RAM. This tends to be much less of an issue in C with it's native type only data types.
There is an oblique reference in the Coding Style document. It says:
Another measure of the function is the number of local variables. They
shouldn't exceed 5-10, or you're doing something wrong. Re-think the
function, and split it into smaller pieces. A human brain can
generally easily keep track of about 7 different things, anything more
and it gets confused. You know you're brilliant, but maybe you'd like
to understand what you did 2 weeks from now.
So while C99 style in-place initialisers are handy in certain cases the first thing you should probably be asking yourself is why it's hard to have them all at the top of the function. This doesn't prevent you from declaring stuff inside block markers, for example for in-loop calculations.
In older C it is possible to declare them locally by creating a block inside the function. Blocks can be added even without ifs/for/while:
int foo(void)
{
int a;
int b;
....
a = 5 + b;
{
int c;
....
}
}
Although it doesn't look very neat, it still is possible, even in older C.
I can't speak to why they have done things one way in the Linux kernel, but in the systems we develop, we tend to not use C99-specific features in the core code. Individual applications tend to have stuff written for C99, because they will typically be deployed to one known platform, and the gcc C99 implementation is known good.
But the core code has to be deployable on whatever platform the customer demands (within reason). We have supplied systems on AIX, Solaris, Informix, Linux, Tru-64, OpenVMS(!) and the presence of C99 compliant compilers isn't always guaranteed.
The Linux kernel needs to be substantially more portable again - and particularly down to small footprint embedded systems. I guess the feature just isn't important enough to override these sorts of considerations.

Using __thread in c99

I would like to define a few variables as thread-specific using the __thread storage class. But three questions make me hesitate:
Is it really standard in c99? Or more to the point, how good is the compiler support?
Will the variables be initialised in every thread?
Do non-multi threaded programs treat them as plain-old-globals?
To answer your specific questions:
No, it is not part of C99. You will not find it mentioned anywhere in the n1256.pdf (C99+TC1/2/3) or the original C99 standard.
Yes, __thread variables start out with their initialized value in every new thread.
From a standpoint of program behavior, thread-local storage class variables behave pretty much the same as plain globals in non-multi-threaded programs. However, they do incur a bit more runtime cost (memory and startup time), and there can be issues with limits on the size and number of thread-local variables. All this is rather complicated and varies depending on whether your program is static- or dynamic-linked and whether the variables reside in the main program or a shared library...
Outside of implementing C/POSIX (e.g. errno, etc.), thread-local storage class is actually not very useful, in my opinion. It's pretty much a crutch for avoiding cleanly passing around the necessary state in the form of a context pointer or similar. You might think it could be useful for getting around broken interfaces like qsort that don't take a context pointer, but unfortunately there is no guarantee that qsort will call the comparison function in the same thread that called qsort. It might break the job down and run it in multiple threads. Same goes for most other interfaces where this sort of workaround would be possible.
You probably want to read this:
http://www.akkadia.org/drepper/tls.pdf
1) MSVC doesn't support C99. GCC does and other compilers attempt GCC compatibility.
edit A breakdown of compiler support for __thread is available here:
http://chtekk.longitekk.com/index.php?/archives/2011/02/C8.html
2) Only C++ supports an initializer and it must be constant.
3) Non-multi-threaded applications are single-threaded applications.

Why doesn't GNOME use C99?

Looking at mutter source code and evince source code, both still use C89 style of declaring all variables at the very beginning of the function, instead of where it is first used (limited scope is good). Why don't they use C99? GNOME 3 was launch recently and mutter is quite new, so that could have been a good opportunity to switch, if the reason was compatibility with old code style.
Does that mean that contributing code to GNOME needs to be written in C89?
The rationale can be linked to the same rationale behind Glib and GTK+:
No C99 comments or declarations.
Rationale: we expect GLib and GTK+ to
be buildable on various compilers and
C99 support is still not yet
widespread.
Source: http://live.gnome.org/GTK+/BestPractices
Speaking of scope, I guess you can still do this:
if (condition)
{
int temporary = expression();
trigger_side_effect(temporary);
}
In other words, each actual brace-enclosed scope can contain new variable declarations, even in C89. Many people seem surprised by this; there's no difference from this perspective between a function's top-level scope and any other scope contained therein. Variables will be visible in all scopes descending from the one that declared them.
Note that I don't know if this is supported by the GNOME style guide, but it's at least supported by C89, and a recommended technique (by me) to keep things as local as possible.
Many people consider declaring variables all over the place, as opposed to at the beginning of the block, bad style. It makes it mildly more work to look for declarations, and makes it so you have to inspect the whole function to find them all. Also, for whatever reason, declarations after statements were one of the last C99 features GCC implemented, so for a long time, it was a major compatibility consideration.

Is ARPACK thread-safe?

Is it safe to use the ARPACK eigensolver from different threads at the same time from a program written in C? Or, if ARPACK itself is not thread-safe, is there an API-compatible thread-safe implementation out there? A quick Google search didn't turn up anything useful, but given the fact that ARPACK is used heavily in large scientific calculations, I'd find it highly surprising to be the first one who needs a thread-safe sparse eigensolver.
I'm not too familiar with Fortran, so I translated the ARPACK source code to C using f2c, and it seems that there are quite a few static variables. Basically, all the local variables in the translated routines seem to be static, implying that the library itself is not thread-safe.
Fortran 77 does not support recursion, and hence a standard conforming compiler can allocate all variables in the data section of the program; in principle, neither a stack nor a heap is needed [1].
It might be that this is what f2c is doing, and if so, it might be that it's the f2c step that makes the program non thread-safe, rather than the program itself. Of course, as others have mentioned, check out for COMMON blocks as well. EDIT: Also, check for explicit SAVE directives. SAVE means that the value of the variable should be retained between subsequent invocations of the procedure, similar to static in C. Now, allocating all procedure local data in the data section makes all variables implicitly SAVE, and unfortunately, there is a lot of old code that assumes this even though it's not guaranteed by the Fortran standard. Such code, obviously, is not thread-safe. Wrt. ARPACK specifically, I can't promise anything but ARPACK is generally well regarded and widely used so I'd be surprised if it suffered from these kinds of dusty-deck problems.
Most modern Fortran compilers do use stack allocation. You might have better luck compiling ARPACK with, say, gfortran and the -frecursive option.
EDIT:
[1] Not because it's more efficient, but because Fortran was originally designed before stacks and heaps were invented, and for some reason the standards committee wanted to retain the option to implement Fortran on hardware with neither stack nor heap support all the way up to Fortran 90. Actually, I'd guess that stacks are more efficient on todays heavily cache-dependent hardware rather than accessing procedure local data that is spread all over the data section.
I have converted ARPACK to C using f2c. Whenever you use f2c and you care about thread-safety you must use the -a switch. This makes local variables have automatic storage, i.e. be stack based locals rather than statics which is the default.
Even so, ARPACK itself is decidedly not threadsafe. It uses a lot of common blocks (i.e. global variables) to preserve state between different calls to its functions. If memory serves, it uses a reverse communication interface which tends to lead developers to using global variables. And of course ARPACK probably was written long before multi-threading was common.
I ended up re-working the converted C code to systematically remove all the global variables. I created a handful of C structs and gradually moved the global variables into these structs. Finally I passed pointers to these structs to each function that needed access to those variables. Although I could just have converted each global into a parameter wherever it was needed it was much cleaner to keep them all together, contained in structs.
Essentially the idea is to convert global variables into local variables.
ARPACK uses BLAC right? Then those libraries need to be thread safe too.
I believe your idea to check with f2c might not be a bullet proof way of telling if the Fortran code is thread safe, I would guess it also depends on the Fortran compiler and libraries.
I don't know what strategy f2c uses in translating Fortran. Since ARPACK is written in FORTRAN 77, the first thing to do is check for the presence of COMMON blocks. These are global variables, and if used, the code is most likely not thread safe. The ARPACK webpage, http://www.caam.rice.edu/software/ARPACK/, says that there is a parallel version -- it seems likely that that version is threadsafe.

Resources