How much speed gain if using __INLINE__? - c

In my understanding, inline can speed up code execution. Is that right?
How much speed can we gain from it?

Ripped from here:
Yes and no. Sometimes. Maybe.
There are no simple answers. inline functions might make the code faster, they might make it slower. They might make the executable larger, they might make it smaller. They might cause thrashing, they might prevent thrashing. And they might be, and often are, totally irrelevant to speed.
inline functions might make it faster: As shown above, procedural integration might remove a bunch of unnecessary instructions, which might make things run faster.
inline functions might make it slower: Too much inlining might cause code bloat, which might cause "thrashing" on demand-paged virtual-memory systems. In other words, if the executable size is too big, the system might spend most of its time going out to disk to fetch the next chunk of code.
inline functions might make it larger: This is the notion of code bloat, as described above. For example, if a system has 100 inline functions each of which expands to 100 bytes of executable code and is called in 100 places, that's an increase of 1MB. Is that 1MB going to cause problems? Who knows, but it is possible that that last 1MB could cause the system to "thrash," and that could slow things down.
inline functions might make it smaller: The compiler often generates more code to push/pop registers/parameters than it would by inline-expanding the function's body. This happens with very small functions, and it also happens with large functions when the optimizer is able to remove a lot of redundant code through procedural integration — that is, when the optimizer is able to make the large function small.
inline functions might cause thrashing: Inlining might increase the size of the binary executable, and that might cause thrashing.
inline functions might prevent thrashing: The working set size (number of pages that need to be in memory at once) might go down even if the executable size goes up. When f() calls g(), the code is often on two distinct pages; when the compiler procedurally integrates the code of g() into f(), the code is often on the same page.
inline functions might increase the number of cache misses: Inlining might cause an inner loop to span across multiple lines of the memory cache, and that might cause thrashing of the memory-cache.
inline functions might decrease the number of cache misses: Inlining usually improves locality of reference within the binary code, which might decrease the number of cache lines needed to store the code of an inner loop. This ultimately could cause a CPU-bound application to run faster.
inline functions might be irrelevant to speed: Most systems are not CPU-bound. Most systems are I/O-bound, database-bound or network-bound, meaning the bottleneck in the system's overall performance is the file system, the database or the network. Unless your "CPU meter" is pegged at 100%, inline functions probably won't make your system faster. (Even in CPU-bound systems, inline will help only when used within the bottleneck itself, and the bottleneck is typically in only a small percentage of the code.)
There are no simple answers: You have to play with it to see what is best. Do not settle for simplistic answers like, "Never use inline functions" or "Always use inline functions" or "Use inline functions if and only if the function is less than N lines of code." These one-size-fits-all rules may be easy to write down, but they will produce sub-optimal results.
Copyright (C) Marshall Cline

Using inline makes the compiler use the substitution model of evaluation, but this is not guaranteed to happen every time. When it does happen, the generated code will be longer and may be faster; but with some optimizations active, the substitution model is not always faster.

The reason I use the inline function specifier (specifically, static inline) is not "speed", but because
the static part tells the compiler the function is only visible in the current translation unit (the current file being compiled and the header files it includes);
the inline part tells the compiler it may substitute the implementation of the function at the call site, if it wants to;
static inline together tell the compiler that it can skip emitting the function completely if it is not used at all in the current translation unit.
(Specifically, the compiler that I use most with the options I use most, gcc -Wall, does issue a warning if a function marked static is unused; but will not issue a warning if a function marked static inline is unused.)
To us humans, static inline signals that the function is a macro-like helper, but with the type checking that a macro lacks.
Thus, in my opinion, the assumption that inline has anything to do with speed per se is incorrect. Answering the stated question with a straight number would be misleading.
In my code, you see them associated with some data structures, or occasionally global variables.
A typical example is when I want to implement a Xorshift pseudorandom number generator in my own C code:
#include <inttypes.h>
static uint64_t prng_state = 1; /* Any nonzero uint64_t seed is okay */
static inline uint64_t prng_u64(void)
{
uint64_t state;
state = prng_state;
state ^= state >> 12;
state ^= state << 25;
state ^= state >> 27;
prng_state = state;
return state * UINT64_C(2685821657736338717);
}
The static uint64_t prng_state = 1; means that prng_state is a variable of type uint64_t, visible only in the current compilation unit, and initialized to 1. The prng_u64() function returns an unsigned 64-bit pseudorandom integer. However, if you do not use prng_u64(), the compiler will not generate code for it either.
Another typical use case is when I have data structures, and they need accessor functions. For example,
#ifndef GRID_H
#define GRID_H
#include <stdlib.h>
#include <stdio.h> /* for FILE, used by grid_load() and grid_save() below */
typedef struct {
int rows;
int cols;
unsigned char *cell;
} grid;
#define GRID_INIT { 0, 0, NULL }
#define GRID_OUTSIDE -1
static inline int grid_get(grid *const g, const int row, const int col)
{
if (!g || row < 0 || col < 0 || row >= g->rows || col >= g->cols)
return GRID_OUTSIDE;
return g->cell[row * (size_t)(g->cols) + col];
}
static inline int grid_set(grid *const g, const int row, const int col,
const unsigned char value)
{
if (!g || row < 0 || col < 0 || row >= g->rows || col >= g->cols)
return GRID_OUTSIDE;
return g->cell[row * (size_t)(g->cols) + col] = value;
}
static inline void grid_init(grid *g)
{
g->rows = 0;
g->cols = 0;
g->cell = NULL;
}
static inline void grid_free(grid *g)
{
free(g->cell);
g->rows = 0;
g->cols = 0;
g->cell = NULL;
}
int grid_create(grid *g, const int rows, const int cols,
const unsigned char initial_value);
int grid_load(grid *g, FILE *handle);
int grid_save(grid *g, FILE *handle);
#endif /* GRID_H */
That header file defines some useful helper functions, and declares the functions grid_create(), grid_load(), and grid_save(), that would be implemented in a separate .c file.
(Yes, those three functions could be implemented in the header file just as well, but it would make the header file quite large. If you had a large project, spread over many translation units (.c source files), each one including the header file would get their own local copies of the functions. The accessor functions defined as static inline above are short and trivial, so it is perfectly okay for them to be copied here and there. The three functions I omitted are much larger.)

Related

Thread-safely initializing a pointer just once

I'm writing a library function, say, count_char(const char *str, int len, char ch) that detects the supported SIMD extensions of the CPU it's running on and dispatches the call to, say, an AVX2- or SSE4.2-optimized version. Since I'd like to avoid the penalty of doing a couple of cpuid instructions per each call, I'm trying to do this just once the first time the function is called (which might be called by different threads simultaneously).
In C++ land I'd just do something like
int count_char(const char *str, int len, char ch) {
static const auto fun_ptr = select_simd_function();
return (*fun_ptr)(str, len, ch);
}
and rely on C++ semantics of static to guarantee that it's called exactly once without any race conditions. But what's the best way to do this in pure C?
This is what I've come up with:
1. Using atomic variables (also present in C, since C11) — rather error-prone and a bit harder to maintain.
2. Using pthread_once — not sure what overhead it has, plus it might cause headaches on Windows.
3. Forcing the library user to call another library function to initialize the pointer — in short, it won't work in my case, since this is actually the C part of a library for another language.
4. Aligning the pointer to 8 bytes and relying on x86 word-sized accesses being atomic — unportable to other architectures (should I later implement some PowerPC- or ARM-specific SIMD versions, say), and technically UB (at least in C++).
5. Using thread-local storage, marking fun_ptr as thread_local and then doing something like
static thread_local fun_ptr_t fun_ptr = NULL;
if (!fun_ptr) {
fun_ptr = select_simd_function();
}
return (*fun_ptr)(str, len, ch);
The upside is that the code is very clear and apparently correct, but I'm not sure about the performance implications of TLS, plus every thread will have to call select_simd_function() once (but that's probably not a big deal).
For me personally, (5) is the winner so far, followed closely by (1) (I'd probably even go with (1) if it weren't somebody else's very foundational library and I didn't want to embarrass myself with a likely faulty implementation).
So, what'd be the best option? Did I miss anything else?
If you can use C11, this would work (assuming your implementation supports threads - it's an optional feature):
#include <threads.h>
static fun_ptr_t fun_ptr = NULL;
static void init_fun_ptr( void )
{
fun_ptr = select_simd_function();
}
fun_ptr_t get_simd_function( void )
{
static once_flag flag = ONCE_FLAG_INIT;
call_once( &flag, init_fun_ptr);
return ( fun_ptr );
}
Of course, you mentioned Windows. I doubt MSVC supports this.

Mode settings in C: enum vs constant vs defines

I have read a few questions on the topic:
Should I use #define, enum or const?
How does Enum allocate Memory on C?
What makes a better constant in C, a macro or an enum?
What is the size of an enum in C?
"static const" vs "#define" vs "enum"
static const vs. #define in c++ - differences in executable size
and I understand that enums are usually preferred over #define macros for better encapsulation and/or readability. Plus they allow the compiler to check types, preventing some errors.
const declarations are somewhat in between, allowing type checking and encapsulation, but are messier.
Now I work on embedded applications with very limited memory space (we often have to fight to save single bytes). My first idea would be that constants take more memory than enums. But I realised that I am not sure how constants will appear in the final firmware.
Example:
enum { standby, starting, active, stopping } state;
Question
In a resource-limited environment, how do enum vs #define vs static const compare in terms of execution speed and memory footprint?
To try to get some substantial elements to the answer, I made a simple test.
Code
I wrote a simple C program main.c:
#include <stdio.h>
#include "constants.h"
// Define states
#define STATE_STANDBY 0
#define STATE_START 1
#define STATE_RUN 2
#define STATE_STOP 3
// Common code
void wait(unsigned int n)
{
unsigned long int vLoop;
for ( vLoop=0 ; vLoop<n*LOOP_SIZE ; ++vLoop )
{
if ( (vLoop % LOOP_SIZE) == 0 ) printf(".");
}
printf("\n");
}
int main ( int argc, char *argv[] )
{
int state = 0;
int loop_state;
for ( loop_state=0 ; loop_state<MACHINE_LOOP ; ++loop_state)
{
if ( state == STATE_STANDBY )
{
printf("STANDBY ");
wait(10);
state = STATE_START;
}
else if ( state == STATE_START )
{
printf("START ");
wait(20);
state = STATE_RUN;
}
else if ( state == STATE_RUN )
{
printf("RUN ");
wait(30);
state = STATE_STOP;
}
else // ( state == STATE_STOP )
{
printf("STOP ");
wait(20);
state = STATE_STANDBY;
}
}
return 0;
}
while constants.h contains
#define LOOP_SIZE 10000000
#define MACHINE_LOOP 100
And I considered three variants to define the state constants. The macro as above, the enum:
enum {
STATE_STANDBY=0,
STATE_START,
STATE_RUN,
STATE_STOP
} possible_states;
and the const:
static const int STATE_STANDBY = 0;
static const int STATE_START = 1;
static const int STATE_RUN = 2;
static const int STATE_STOP = 3;
while the rest of the code was kept identical.
Tests and Results
Tests were made on a 64 bits linux machine and compiled with gcc
Global Size
gcc main.c -o main gives
macro: 7310 bytes
enum: 7349 bytes
const: 7501 bytes
gcc -O2 main.c -o main gives
macro: 7262 bytes
enum: 7301 bytes
const: 7262 bytes
gcc -Os main.c -o main gives
macro: 7198 bytes
enum: 7237 bytes
const: 7198 bytes
When optimization is turned on, both the const and the macro variants come to the same size; the enum is always slightly larger. Using gcc -S I can see that the difference is a possible_states,4,4 entry in .comm. So the enum is always larger than the macro, and the const can be larger but can also be optimized away.
Section size
I checked a few sections of the programs using objdump -h main: .text, .data, .rodata, .bss, .dynamic. In all cases, .bss has 8 bytes, .data 16 bytes, and .dynamic 480 bytes.
.rodata has 31 bytes, except for the non-optimized const version (47 bytes).
.text goes from 620 bytes up to 780 bytes, depending on the optimisation. The non-optimised const version is the only one that differs under the same flag.
Execution speed
I ran the program a few times, but I did not notice a substantial difference between the different versions. Without optimisation, it ran for about 50 seconds. Down to 20 seconds with -O2 and up to more than 3 minutes with -Os. I measured the time with /usr/bin/time.
RAM usage
Using time -f %M, I get about 450k in each case, and when using valgrind --tool=massif --pages-as-heap=yes I get 6242304 in all cases.
Conclusion
Whenever some optimisation has been activated, the only notable difference is about 40 bytes more for the enum case. But there is no RAM or speed difference.
Remains other arguments about scope, readability... personal preferences.
and I understand that enums are usually preferred over #define macros for better encapsulation and/or readability
Enums are preferred mainly for better readability, but also because they can be declared at local scope and they add a tiny bit more of type safety (particularly when static analysis tools are used).
const declarations are somewhat in between, allowing type checking and encapsulation, but are messier.
Not really, it depends on scope. "Global" const can be messy, but they aren't as bad practice as global read/write variables and can be justified in some cases. One major advantage of const over the other forms is that such variables tend to be allocated in .rodata and you can view them with a debugger, something that isn't always possible with macros and enums (depends on how good the debugger is).
Note that #define is always global, while an enum may or may not be.
My first ideas would be that constants take more memory than enums
This is incorrect. enum variables are usually of type int, though they can be of smaller types (since their size can vary, they are bad for portability). Enumeration constants, however (that is, the things inside the enum declaration), are always int, which is a type of at least 16 bits.
A const on the other hand, is exactly as large as the type you declared. Therefore const is preferred over enum if you need to save memory.
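A small illustration of that size point (the identifiers here are mine): an enumeration constant is an int expression, while a const object is exactly as large as its declared type.

```c
#include <stdint.h>

enum { LIMIT_ENUM = 100 };               /* enumeration constant: type int */
static const uint8_t LIMIT_CONST = 100;  /* object of exactly one byte */

/* sizeof(LIMIT_ENUM) == sizeof(int): the constant is an int expression.
   sizeof(LIMIT_CONST) == 1: the const takes exactly its type's size,
   so when a stored object is needed, the uint8_t const saves memory. */
```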
In a resource-limited environment, how do enum vs #define vs static const compare in terms of execution speed and memory footprint?
Execution speed will probably not differ - it is impossible to say, since it is so system-specific. However, since enum constants have type int (16 bits or more), enums are a bad idea when you need to save memory. And they are also a bad idea if you need an exact memory layout, as is often the case in embedded systems. The compiler may, of course, optimize them to a smaller size.
Misc advice:
Always use the stdint.h types in embedded systems, particularly when you need exact memory layout.
Enums are fine unless you need them as part of some memory layout, like a data protocol. Don't use enums for such cases.
const is ideal when you need something to be stored in flash. Such variables get their own address and are easier to debug.
In embedded systems, it usually doesn't make much sense to optimize code in order to reduce flash size, but rather to optimize to reduce RAM size.
#define will always end up in .text flash memory, while enum and const may end up either in RAM, .rodata flash or .text flash.
When optimizing for size (RAM or flash) on an embedded system, keep track of your variables in the map file (linker output file) and look for things that stand out there, rather than running around and manually optimizing random things at a whim. This way you can also detect if some variables that should be const have ended up in RAM by mistake (a bug).
enums, #defines and static const will generally give exactly the same code (and therefore the same speed and size), with certain assumptions.
When you declare an enum type and enumeration constants, these are just names for integer constants. They don't take any space or time. The same applies to #define'd values (though these are not limited to ints).
A "static const" declaration may take space, usually in a read-only section in flash. Typically this will only be the case when optimisation is not enabled, but it will also happen if you forget to use "static" and write a plain "const", or if you take the address of the static const object.
For code like the sample given, the results will be identical with all versions, as long as at least basic optimisation is enabled (so that the static const objects are optimised). However, there is a bug in the sample. This code:
enum {
STATE_STANDBY = 0,
STATE_START,
STATE_RUN,
STATE_STOP
} possible_states;
not only creates the enumeration constants (taking no space) and an anonymous enum type, it also creates an object of that type called "possible_states". This has external linkage, and has to be created by the compiler because other modules can refer to it - it is put in the ".comm" common section. What should have been written is one of these:
// Define just the enumeration constants
enum { STATE_STANDBY, ... };
// Define the enumeration constants and the
// type "enum possible_states"
enum possible_states { STATE_STANDBY, ... };
// Define the enumeration constants and the
// type "possible_states"
typedef enum { STATE_STANDBY, ... } possible_states;
All of these will give optimal code generation.
When comparing the sizes generated by the compiler here, be careful not to include the debug information! The examples given show a bigger object file for the enumeration version partly because of the error above, but mainly because it is a new type and leads to more debug information.
All three methods work for constants, but they all have their peculiarities. A "static const" can be any type, but you can't use it for things like case labels or array sizes, or for initialising other objects. An enum constant is limited to type "int". A #define macro contains no type information.
For this particular case, however, an enumerated type has a few big advantages. It collects the states together in one definition, which is clearer, and it lets you make them a type (once you get the syntax correct :-) ). When you use a debugger, you should see the actual enum constants, not just a number, for variables of the enum type. ("state" should be declared of this type.) And you can get better static error checking from your tools. Rather than using a series of "if" / "else if" statements, use a switch and use the "-Wswitch" warning in gcc (or equivalent for other compilers) to warn you if you have forgotten a case.
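A sketch of the switch-based version the answer recommends (names are mine): with gcc -Wall, which includes -Wswitch, leaving out a case of the enum type in a switch that has no default: triggers a warning.

```c
typedef enum { STANDBY, START, RUN, STOP } machine_state;

/* gcc -Wswitch warns here if any enumerator lacks a case
   (and there is no default:), catching forgotten states. */
static const char *state_name(machine_state s)
{
    switch (s) {
    case STANDBY: return "STANDBY";
    case START:   return "START";
    case RUN:     return "RUN";
    case STOP:    return "STOP";
    }
    return "?";   /* unreachable for valid enum values */
}
```

With the if/else chain from the question, the compiler has no way to tell that a state was forgotten; the switch over an enum type makes that checkable.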
I used an OpenGL library which defined most constants in an enum, while the default OpenGL header defines them as #defines. So as a user of the header/library, there is no big difference; it is just an aspect of design. When using plain C, I reach for a const object only when the value is something that changes or is large, like a string (e.g. extern const char version[];). For myself I avoid macros, so in special cases the identifier can be reused. When porting C code to C++, the enums even become scoped and type-checked.
Aspect: character usage & readability:
#define MY_CONST1 10
#define MY_CONST2 20
#define MY_CONST3 30
//...
#define MY_CONSTN N0
vs.
enum MyConsts {
MY_CONST1 = 10,
MY_CONST2 = 20,
MY_CONST3 = 30,
//...
MY_CONSTN = N0 // this gives already an error, which does not happen with (unused) #defines, because N0 is an identifier, which is probably not defined
};

Static functions declared in "C" header files

For me it's a rule to define and declare static functions inside source files, I mean .c files.
However in very rare situations I saw people declaring it in the header file.
Since static functions have internal linkage, we would need to define them in every file that includes the header where the function is declared. This looks pretty odd and far from what we usually want when declaring something static.
On the other hand, if someone naive tries to use that function without defining it, the compiler will complain. So in some sense it is not really unsafe to do this, even though it sounds strange.
My questions are:
What is the problem of declaring static functions in header files?
What are the risks?
What is the impact on compilation time?
Is there any risk at runtime?
First I'd like to clarify my understanding of the situation you describe: The header contains (only) a static function declaration while the C file contains the definition, i.e. the function's source code. For example
some.h:
static void f(void);
// potentially more declarations
some.c:
#include <stdio.h>
#include "some.h"
static void f(void) { printf("Hello world\n"); }
// more code, some of it potentially using f()
If this is the situation you describe, I take issue with your remark
Since static functions have internal linkage we need to define it in every file we include the header file where the function is declared.
If you declare the function but do not use it in a given translation unit, I don't think you have to define it. gcc accepts that with a warning; the standard does not seem to forbid it, unless I missed something. This may be important in your scenario because translation units which do not use the function but include the header with its declaration don't have to provide an unused definition.
Now let's examine the questions:
What is the problem of declaring static functions in header files?
It is somewhat unusual. Typically, static functions are functions needed in only one file. They are declared static to make that explicit by limiting their visibility. Declaring them in a header therefore is somewhat antithetical. If the function is indeed used in multiple files with identical definitions it should be made external, with a single definition. If only one translation unit actually uses it, the declaration does not belong in a header.
One possible scenario therefore is to ensure a uniform function signature for different implementations in the respective translation units. The common header leads to a compile-time error for different return types in C (and C++); different parameter types would cause a compile-time error only in C (but not in C++, because of function overloading).
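A single-file sketch of that scenario (the names backend_init and the "flags" semantics are hypothetical): the shared header pins down the signature, and each .c file supplies its own static definition, which the compiler checks against the declaration.

```c
/* What the shared header (say, backend.h) would declare: */
static int backend_init(int flags);

/* This translation unit's own definition. If the return type or the
   parameter types disagreed with the declaration above, the C compiler
   would reject the mismatch at compile time. */
static int backend_init(int flags)
{
    return flags | 1;   /* hypothetical: set an "initialized" bit */
}
```

Every other translation unit including the header would provide its own, possibly different, definition behind the same signature.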
What are the risks?
I do not see risks in your scenario. (As opposed to also including the function definition in a header which may violate the encapsulation principle.)
What is the impact on compilation time?
A function declaration is small and its complexity is low, so the overhead of having additional function declarations in a header is likely negligible. But if you create and include an additional header just for the declaration in many translation units, the file-handling overhead can become significant (i.e. the compiler idles a lot while waiting for the header I/O).
Is there any risk at runtime? I cannot see any.
This is not an answer to the stated questions, but hopefully shows why one might implement a static (or static inline) function in a header file.
I can personally only think of two good reasons to declare some functions static in a header file:
If the header file completely implements an interface that should only be visible in the current compilation unit
This is extremely rare, but might be useful in e.g. an educational context, at some point during the development of some example library; or perhaps when interfacing to another programming language with minimal code.
A developer might choose to do so if the library or interface implementation is trivial and very short, and ease of use (for the developer using the header file) is more important than code size. In these cases, the declarations in the header file often use preprocessor macros, allowing the same header file to be included more than once, providing some sort of crude polymorphism in C.
Here is a practical example: Shoot-yourself-in-the-foot playground for linear congruential pseudorandom number generators. Because the implementation is local to the compilation unit, each compilation unit will get their own copies of the PRNG. This example also shows how crude polymorphism can be implemented in C.
prng32.h:
#if defined(PRNG_NAME) && defined(PRNG_MULTIPLIER) && defined(PRNG_CONSTANT) && defined(PRNG_MODULUS)
#define MERGE3_(a,b,c) a ## b ## c
#define MERGE3(a,b,c) MERGE3_(a,b,c)
#define NAME(name) MERGE3(PRNG_NAME, _, name)
static uint32_t NAME(state) = 0U;
static uint32_t NAME(next)(void)
{
NAME(state) = ((uint64_t)PRNG_MULTIPLIER * (uint64_t)NAME(state) + (uint64_t)PRNG_CONSTANT) % (uint64_t)PRNG_MODULUS;
return NAME(state);
}
#undef NAME
#undef MERGE3
#endif
#undef PRNG_NAME
#undef PRNG_MULTIPLIER
#undef PRNG_CONSTANT
#undef PRNG_MODULUS
An example using the above, example-prng32.h:
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#define PRNG_NAME glibc
#define PRNG_MULTIPLIER 1103515245UL
#define PRNG_CONSTANT 12345UL
#define PRNG_MODULUS 2147483647UL
#include "prng32.h"
/* provides glibc_state and glibc_next() */
#define PRNG_NAME borland
#define PRNG_MULTIPLIER 22695477UL
#define PRNG_CONSTANT 1UL
#define PRNG_MODULUS 2147483647UL
#include "prng32.h"
/* provides borland_state and borland_next() */
int main(void)
{
int i;
glibc_state = 1U;
printf("glibc lcg: Seed %u\n", (unsigned int)glibc_state);
for (i = 0; i < 10; i++)
printf("%u, ", (unsigned int)glibc_next());
printf("%u\n", (unsigned int)glibc_next());
borland_state = 1U;
printf("Borland lcg: Seed %u\n", (unsigned int)borland_state);
for (i = 0; i < 10; i++)
printf("%u, ", (unsigned int)borland_next());
printf("%u\n", (unsigned int)borland_next());
return EXIT_SUCCESS;
}
The reason for marking both the _state variable and the _next() function static is that this way each compilation unit that includes the header file has their own copy of the variables and the functions -- here, their own copy of the PRNG. Each must be separately seeded, of course; and if seeded to the same value, will yield the same sequence.
One should generally shy away from such polymorphism attempts in C, because it leads to complicated preprocessor macro shenanigans, making the implementation much harder to understand, maintain, and modify than necessary.
However, when exploring the parameter space of some algorithm -- like here, the types of 32-bit linear congruential generators -- this lets us use a single implementation for each of the generators we examine, ensuring there are no implementation differences between them. Note that even this case is more like a development tool, and not something you ought to see in an implementation provided for others to use.
If the header implements simple static inline accessor functions
Preprocessor macros are commonly used to simplify code accessing complicated structure types. static inline functions are similar, except that they also provide type checking at compile time, and can refer to their parameters several times (with macros, that is problematic).
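The "refer to their parameters several times" point is the classic macro pitfall; a minimal sketch (my own example) of why the static inline form is safer:

```c
/* A macro evaluates each argument wherever it appears in the body... */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* ...while a static inline function evaluates each argument exactly
   once, and also type-checks them. */
static inline int max_func(int a, int b)
{
    return a > b ? a : b;
}

/* With i = 1:
   MAX_MACRO(i++, 0) expands to ((i++) > (0) ? (i++) : (0)),
   incrementing i twice and yielding 2;
   max_func(i++, 0) increments i once and yields 1. */
```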
One practical use case is a simple interface for reading files using low-level POSIX.1 I/O (using <unistd.h> and <fcntl.h> instead of <stdio.h>). I've done this myself when reading very large (dozens of megabytes to gigabytes range) text files containing real numbers (with a custom float/double parser), as the GNU C standard I/O is not particularly fast.
For example, inbuffer.h:
#ifndef INBUFFER_H
#define INBUFFER_H
#include <stddef.h> /* for size_t */
typedef struct {
unsigned char *head; /* Next buffered byte */
unsigned char *tail; /* Next byte to be buffered */
unsigned char *ends; /* data + size */
unsigned char *data;
size_t size;
int descriptor;
unsigned int status; /* Bit mask */
} inbuffer;
#define INBUFFER_INIT { NULL, NULL, NULL, NULL, 0, -1, 0 }
int inbuffer_open(inbuffer *, const char *);
int inbuffer_close(inbuffer *);
int inbuffer_skip_slow(inbuffer *, const size_t);
int inbuffer_getc_slow(inbuffer *);
static inline int inbuffer_skip(inbuffer *ib, const size_t n)
{
if (ib->head + n <= ib->tail) {
ib->head += n;
return 0;
} else
return inbuffer_skip_slow(ib, n);
}
static inline int inbuffer_getc(inbuffer *ib)
{
if (ib->head < ib->tail)
return *(ib->head++);
else
return inbuffer_getc_slow(ib);
}
#endif /* INBUFFER_H */
Note that the above inbuffer_skip() and inbuffer_getc() do not check if ib is non-NULL; this is typical for such functions. These accessor functions are assumed to be "in the fast path", i.e. called very often. In such cases, even the function call overhead matters (and is avoided with static inline functions, since they are duplicated in the code at the call site).
Trivial accessor functions, like the above inbuffer_skip() and inbuffer_getc(), may also let the compiler avoid the register moves involved in function calls, because functions expect their parameters to be located in specific registers or on the stack, whereas inlined functions can be adapted (wrt. register use) to the code surrounding the inlined function.
Personally, I do recommend writing a couple of test programs using the non-inlined functions first, and compare the performance and results to the inlined versions. Comparing the results ensure the inlined versions do not have bugs (off by one type is common here!), and comparing the performance and generated binaries (size, at least) tells you whether inlining is worth it in general.
Why would you want a function to be both global and static? In C, functions are global by default. You only use static functions if you want to limit access to a function to the file where it is declared. So you actively restrict access by declaring it static...
The only case that requires implementations in a header file is C++ template functions and template class member functions.

When is the "inline" keyword effective in C?

Well, the standard gives no guarantee that inline functions are actually inlined; one must use macros to get a 100% guarantee. The compiler always decides which functions are or are not inlined, based on its own rules, irrespective of the inline keyword.
Then when will the inline keyword actually have some effect to what the compiler does when using modern compilers such as the recent version of GCC?
It has a semantic effect. To simplify: a function marked inline may be defined multiple times in one program (though all definitions must be equivalent to each other), so the presence of inline is required for correctness when including the function definition in headers. That, in turn, makes the definition visible, so the compiler can inline it without LTO.
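A single-file sketch of that rule under C99 (and later) inline semantics; the function name is mine. The inline definition, as it would appear in a header, provides no external definition by itself; exactly one translation unit must supply one, for example via an extern declaration:

```c
/* As it would appear in a header: an inline definition.  In C99,
   by itself this does NOT emit an externally visible function. */
inline int twice(int x)
{
    return x * 2;
}

/* In exactly one .c file, this extern declaration forces the external
   definition to be emitted in this translation unit, so other files
   can link against twice() wherever the call is not inlined. */
extern inline int twice(int x);
```

Note that GNU89 inline semantics (gcc -std=gnu89) invert this meaning, which is a common source of confusion with older code.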
Other than that, for inlining-the-optimization, "never" is a perfectly safe approximation. It probably has some effect in some compilers, but nothing worth losing sleep over, especially not without actual hard data. For example, in the following code, using Clang 3.0 or GCC 4.7, maxArray contains the same code whether work is marked inline or not. The only difference is whether work remains as a stand-alone function for other translation units to link to, or is removed.
void work(double *a, double *b) {
    if (*b > *a) *a = *b;
}

void maxArray(double *x, double *y) {
    for (int i = 0; i < 65536; i++) {
        //if (y[i] > x[i]) x[i] = y[i];
        work(x + i, y + i);
    }
}
If you want to control inlining, stick to whatever pragmas or attributes your compiler provides for controlling that behaviour, for example __attribute__((always_inline)) on GCC and similar compilers. As you've mentioned, the inline keyword is often ignored, depending on optimization settings, etc.

Benefits of pure function

Today I was reading about pure functions and got confused about their use:
A function is said to be pure if it returns same set of values for same set of inputs and does not have any observable side effects.
e.g. strlen() is a pure function while rand() is an impure one.
#include <stdio.h>

__attribute__ ((pure)) int fun(int i)
{
    return i * i;
}

int main()
{
    int i = 10;
    printf("%d", fun(i)); // outputs 100
    return 0;
}
http://ideone.com/33XJU
The above program behaves the same way as it would without the pure declaration.
What are the benefits of declaring a function as pure [if the output does not change]?
pure lets the compiler know that it can make certain optimisations about the function: imagine a bit of code like
for (int i = 0; i < 1000; i++)
{
    printf("%d", fun(10));
}
With a pure function, the compiler can know that it needs to evaluate fun(10) once and once only, rather than 1000 times. For a complex function, that's a big win.
When you say a function is 'pure' you are guaranteeing that it has no externally visible side-effects (and as a comment says, if you lie, bad things can happen). Knowing that a function is 'pure' has benefits for the compiler, which can use this knowledge to do certain optimizations.
Here is what the GCC documentation says about the pure attribute:
pure
Many functions have no effects except the return value and their return
value depends only on the parameters and/or global variables.
Such a function can be subject to common subexpression elimination and
loop optimization just as an arithmetic operator would be. These
functions should be declared with the attribute pure. For example,
int square (int) __attribute__ ((pure));
Philip's answer already shows how knowing a function is 'pure' can help with loop optimizations.
Here is one for common sub-expression elimination (given foo is pure):
a = foo (99) * x + y;
b = foo (99) * x + z;
Can become:
_tmp = foo (99) * x;
a = _tmp + y;
b = _tmp + z;
In addition to possible run-time benefits, a pure function is much easier to reason about when reading code. Furthermore, it's much easier to test a pure function since you know that the return value only depends on the values of the parameters.
A non-pure function
int foo(int x, int y) // possible side-effects
is like an extension of a pure function
int bar(int x, int y) // guaranteed no side-effects
in which you have, besides the explicit function arguments x, y,
the rest of the universe (or anything your computer can communicate with) as an implicit potential input. Likewise, besides the explicit integer return value, anything your computer can write to is implicitly part of the return value.
It should be clear why it is much easier to reason about a pure function than a non-pure one.
Just as an add-on, I would like to mention that C++11 codifies things somewhat using the constexpr keyword. Example:
#include <iostream>
#include <cstring>

constexpr unsigned static_strlen(const char *str, unsigned offset = 0) {
    return (*str == '\0') ? offset : static_strlen(str + 1, offset + 1);
}

constexpr const char *str = "asdfjkl;";
constexpr unsigned len = static_strlen(str); // MUST be evaluated at compile time
// so, for example, this: int arr[len]; is legal, as len is a constant.

int main() {
    std::cout << len << std::endl << std::strlen(str) << std::endl;
    return 0;
}
The restrictions on the usage of constexpr make it so that the function is provably pure. This way, the compiler can more aggressively optimize (just make sure you use tail recursion, please!) and evaluate the function at compile time instead of run time.
So, to answer your question: if you're using C++ (I know you said C, but they are related), writing a pure function in the correct style allows the compiler to do all sorts of cool things with it :-)
In general, pure functions have three advantages over impure functions that the compiler can take advantage of:
Caching
Let's say you have a pure function f that is called 100,000 times. Since it is deterministic and depends only on its parameters, the compiler can compute its value once and reuse it wherever necessary.
Parallelism
Pure functions don't read from or write to any shared memory, and can therefore run in separate threads without any unexpected consequences.
Passing By Reference
A function f(struct t) gets its argument t by value. If f is declared pure, the compiler can instead pass t by reference, since purity guarantees that the value of t will not change, which can yield performance gains.
In addition to the compile-time considerations, pure functions are fairly easy to test: just call them.
No need to construct objects or mock connections to databases or the file system.

Resources