Is dividing the task into functions is beneficial or harmful ? [closed] - c

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I am working with embedded systems(Mostly ARM Cortex M3-M4) using C language.And was wondering what are the advantages/ disadvantages of dividing the task into many functions such as;
void Handle_Something(void)
{
// do Task-1
// do Task-2
// do Task-3
//etc.
}
to
void Hanlde_Something(void)
{
// Handle_Task1();
// Handle_Tasl2();
//etc.
}
How are these two approaches can be examined with respect to stack usage and overall processing speed, and which is safer/better for what reason ? (You can assume this is outside of ISR)
From what I know, Memory in the stack is allocated/deallocated for local variables in each call/return cycle, thus dividing the task seems reasonable in terms of memory usage but when doing this, I sometimes get Hardfaults from different sources(mostly Bus or Undefined Instruction errors.) that I couldn't figured out why.
Also, working speed is very crucial for many applications in my field, so ı do need to know which methode provides faster responses.
I would appreciate an enlightment. Thanks everybody in advance

This is what's known as "pre-mature optimization".
In the old days when compilers where horrible, they couldn't inline functions by themselves. So in the old days a keyword inline was added to C - similar non-standard versions also existed before the year 1999. This was used to tell a bad compiler how it should generate code better.
Nowadays this is mostly history. Compilers are better than programmers at determining what and when to inline. They may however struggle when a called function is located in a different "translation unit" (basically, in a different .c file). But in your case I take it this is not the case, but Handle_Task1() etc can be regarded as functions in the same file.
With the above in mind:
How are these two approaches can be examined with respect to stack usage and overall processing speed
They are to be regarded as identical. They use the same stack space and take the same time to execute.
Unless you have a bad, older compiler - in which case function calls always take extra space and execution time. Since you are working with modern MCU:s, this should not be the case, or you desperately need to get a better compiler.
As a rule of thumb, it is always better practice to split up larger functions in several smaller, for the sake of readability and maintenance. Even in hard real-time systems, there exist very few cases where function call overhead is an actual bottleneck even when bad compilers are used.

Memory on the stack isn't allocated/reallocated from some complex memory pool. The stack pointer is simply increased/decreased. An operation that is basically free in all but the tightest/smallest loops imaginable (and those will probably be optimized by the compiler).
Don't group together functions because they could reuse variables e.g. don't create a bunch of int tempInt; long tempLong; variables you use throughout your entire program. A variable should serve only a single purpose and its scope should be kept as tight as possible. Also see: is it good or bad to reuse the variables?
Expanding on that, keeping the scope of all variables as local as possible might even cause your compiler to keep the variables in a cpu register only. A shortly used variable might actually never be allocated!
Try to limit functions to only a singly purpose and try avoiding side effects: if you avoid global variables a function becomes easier to test, optimize and understand as each time you call it with the exact same set of arguments it will preform the exact same action. Have a look at: Why are global variables bad, in a single threaded, non-os, embedded application

Each solution has advantages and disadvantages.
The first approach allows to execute the code (a priori) faster, because the asm code won't have instructions related to jumps. However, you have to take the readability into account, in terms of mixing different kind of functionalities in the same function (or creating large functions, which is not a good idea from the guidelines point of view).
The second solution could be easier to understand, because each function contains a simple task, furthermore it is easier for documenting (that is, you dont have to explain different "purposes" in the same function). As I said, this
solution is slower, because your "scheduler" contains jumps, nevertheless you could declare the simple tasks as inline, given that you can split the code in several simple tasks with a proper documentation and the compiler will generate an assembler as the first approach, that is, avoiding the jumps.
Another point is the use of memory. If your simple tasks are being called from different parts of the code, the first solution and the second solution with inline are worse (in terms of memory) than the second solution without inline, because the function is added as many times as it is called from different parts of your code.

Working with modules is always more efficient in terms of error handling, debugging and re-reading. Considering some heavy working libraries (SLAM, PCL etc.) as functions, they are used as external functions and they don't cause a significant loss of performance(tbh sometimes it's almost impossible to embed such large functions into your code). You may face slightly higher stack use as #Colin commented.

Related

C- Why is for loop pointer indexing faster? [duplicate]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Some years ago I was on a panel that was interviewing candidates for a relatively senior embedded C programmer position.
One of the standard questions that I asked was about optimisation techniques. I was quite surprised that some of the candidates didn't have answers.
So, in the interests of putting together a list for posterity - what techniques and constructs do you normally use when optimising C programs?
Answers to optimisation for speed and size both accepted.
First things first - don't optimise too early. It's not uncommon to spend time carefully optimising a chunk of code only to find that it wasn't the bottleneck that you thought it was going to be. Or, to put it another way "Before you make it fast, make it work"
Investigate whether there's any option for optimising the algorithm before optimising the code. It'll be easier to find an improvement in performance by optimising a poor algorithm than it is to optimise the code, only then to throw it away when you change the algorithm anyway.
And work out why you need to optimise in the first place. What are you trying to achieve? If you're trying, say, to improve the response time to some event work out if there is an opportunity to change the order of execution to minimise the time critical areas. For example when trying to improve the response to some external interrupt can you do any preparation in the dead time between events?
Once you've decided that you need to optimise the code, which bit do you optimise? Use a profiler. Focus your attention (first) on the areas that are used most often.
So what can you do about those areas?
minimise condition checking. Checking conditions (eg. terminating conditions for loops) is time that isn't being spent on actual processing. Condition checking can be minimised with techniques like loop-unrolling.
In some circumstances condition checking can also be eliminated by using function pointers. For example if you are implementing a state machine you may find that implementing the handlers for individual states as small functions (with a uniform prototype) and storing the "next state" by storing the function pointer of the next handler is more efficient than using a large switch statement with the handler code implemented in the individual case statements. YMMV.
minimise function calls. Function calls usually carry a burden of context saving (eg. writing local variables contained in registers to the stack, saving the stack pointer), so if you don't have to make a call this is time saved. One option (if you're optimising for speed and not space) is to make use of inline functions.
If function calls are unavoidable minimise the data that is being passed to the functions. For example passing pointers is likely to be more efficient than passing structures.
When optimising for speed choose datatypes that are the native size for your platform. For example on a 32bit processor it is likely to be more efficient to manipulate 32bit values than 8 or 16 bit values. (side note - it is worth checking that the compiler is doing what you think it is. I've had situations where I've discovered that my compiler insisted on doing 16 bit arithmetic on 8 bit values with all of the to and from conversions to go with them)
Find data that can be precalculated, and either calculate during initialisation or (better yet) at compile time. For example when implementing a CRC you can either calculate your CRC values on the fly (using the polynomial directly) which is great for size (but dreadful for performance), or you can generate a table of all of the interim values - which is a much faster implementation, to the detriment of the size.
Localise your data. If you're manipulating a blob of data often your processor may be able to speed things up by storing it all in cache. And your compiler may be able to use shorter instructions that are suited to more localised data (eg. instructions that use 8 bit offsets instead of 32 bit)
In the same vein, localise your functions. For the same reasons.
Work out the assumptions that you can make about the operations that you're performing and find ways of exploiting them. For example, on an 8 bit platform if the only operation that at you're doing on a 32 bit value is an increment you may find that you can do better than the compiler by inlining (or creating a macro) specifically for this purpose, rather than using a normal arithmetic operation.
Avoid expensive instructions - division is a prime example.
The "register" keyword can be your friend (although hopefully your compiler has a pretty good idea about your register usage). If you're going to use "register" it's likely that you'll have to declare the local variables that you want "register"ed first.
Be consistent with your data types. If you are doing arithmetic on a mixture of data types (eg. shorts and ints, doubles and floats) then the compiler is adding implicit type conversions for each mismatch. This is wasted cpu cycles that may not be necessary.
Most of the options listed above can be used as part of normal practice without any ill effects. However if you're really trying to eke out the best performance:
- Investigate where you can (safely) disable error checking. It's not recommended, but it will save you some space and cycles.
- Hand craft portions of your code in assembler. This of course means that your code is no longer portable but where that's not an issue you may find savings here. Be aware though that there is potentially time lost moving data into and out of the registers that you have at your disposal (ie. to satisfy the register usage of your compiler). Also be aware that your compiler should be doing a pretty good job on its own. (of course there are exceptions)
As everybody else has said: profile, profile profile.
As for actual techniques, one that I don't think has been mentioned yet:
Hot & Cold Data Separation: Staying within the CPU's cache is incredibly important. One way of helping to do this is by splitting your data structures into frequently accessed ("hot") and rarely accessed ("cold") sections.
An example: Suppose you have a structure for a customer that looks something like this:
struct Customer
{
int ID;
int AccountNumber;
char Name[128];
char Address[256];
};
Customer customers[1000];
Now, lets assume that you want to access the ID and AccountNumber a lot, but not so much the name and address. What you'd do is to split it into two:
struct CustomerAccount
{
int ID;
int AccountNumber;
CustomerData *pData;
};
struct CustomerData
{
char Name[128];
char Address[256];
};
CustomerAccount customers[1000];
In this way, when you're looping through your "customers" array, each entry is only 12 bytes and so you can fit many more entries in the cache. This can be a huge win if you can apply it to situations like the inner loop of a rendering engine.
My favorite technique is to use a good profiler. Without a good profile telling you where the bottleneck lies, no tricks and techniques are going to help you.
most common techniques I encountered are:
loop unrolling
loop optimization for better cache prefetch
(i.e. do N operations in M cycles instead of NxM singular operations)
data aligning
inline functions
hand-crafted asm snippets
As for general recommendations, most of them are already sounded:
choose better algos
use profiler
don't optimize if it doesn't give 20-30% performance boost
For low-level optimization:
START_TIMER/STOP_TIMER macros from ffmpeg (clock-level accuracy for measurement of any code).
Oprofile, of course, for profiling.
Enormous amounts of hand-coded assembly (just do a wc -l on x264's /common/x86 directory, and then remember most of the code is templated).
Careful coding in general; shorter code is usually better.
Smart low-level algorithms, like the 64-bit bitstream writer I wrote that uses only a single if and no else.
Explicit write-combining.
Taking into account important weird aspects of processors, like Intel's cacheline split issue.
Finding cases where one can losslessly or near-losslessly make an early termination, where the early-termination check costs much less than the speed one gains from it.
Actually inlined assembly for tasks which are far more suited to the x86 SIMD unit, such as median calculations (requires compile-time check for MMX support).
First and foremost, use a better/faster algorithm. There is no point optimizing code that is slow by design.
When optimizing for speed, trade memory for speed: lookup tables of precomputed values, binary trees, write faster custom implementation of system calls...
When trading speed for memory: use in-memory compression
Avoid using the heap. Use obstacks or pool-allocator for identical sized objects. Put small things with short lifetime onto the stack. alloca still exists.
Pre-mature optimization is the root of all evil!
;)
As my applications usually don't need much CPU time by design, I focus on the size my binaries on disk and in memory. What I do mostly is looking out for statically sized arrays and replacing them with dynamically allocated memory where it's worth the additional effort of free'ing the memory later. To cut down the size of the binary, I look for big arrays that are initialized at compile time and put the initializiation to runtime.
char buf[1024] = { 0, };
/* becomes: */
char buf[1024];
memset(buf, 0, sizeof(buf));
This will remove the 1024 zero-bytes from the binaries .DATA section and will instead create the buffer on the stack at runtime and the fill it with zeros.
EDIT: Oh yeah, and I like to cache things. It's not C specific but depending on what you're caching, it can give you a huge boost in performance.
PS: Please let us know when your list is finished, I'm very curious. ;)
If possible, compare with 0, not with arbitrary numbers, especially in loops, because comparison with 0 is often implemented with separate, faster assembler commands.
For example, if possible, write
for (i=n; i!=0; --i) { ... }
instead of
for (i=0; i!=n; ++i) { ... }
Another thing that was not mentioned:
Know your requirements: don't optimize for situations that will unlikely or never happen, concentrate on the most bang for the buck
basics/general:
Do not optimize when you have no problem.
Know your platform/CPU...
...know it thoroughly
know your ABI
Let the compiler do the optimization, just help it with the job.
some things that have actually helped:
Opt for size/memory:
Use bitfields for storing bools
re-use big global arrays by overlaying with a union (be careful)
Opt for speed (be careful):
use precomputed tables where possible
place critical functions/data in fast memory
Use dedicated registers for often used globals
count to-zero, zero flag is free
Difficult to summarize ...
Data structures:
Splitting of a data structure depending on case of usage is extremely important. It is common to see a structure that holds data that is accessed based on a flow control. This situation can lower significantly the cache usage.
To take into account cache line size and prefetch rules.
To reorder the members of the structure to obtain a sequential access to them from your code
Algorithms:
Take time to think about your problem and to find the correct algorithm.
Know the limitations of the algorithm you choose (a radix-sort/quick-sort for 10 elements to be sorted might not be the best choice).
Low level:
As for the latest processors it is not recommended to unroll a loop that has a small body. The processor provides its own detection mechanism for this and will short-circuit whole section of its pipeline.
Trust the HW prefetcher. Of course if your data structures are well designed ;)
Care about your L2 cache line misses.
Try to reduce as much as possible the local working set of your application as the processors are leaning to smaller caches per cores (C2D enjoyed a 3MB per core max where iCore7 will provide a max of 256KB per core + 8MB shared to all cores for a quad core die.).
The most important of all: Measure early, Measure often and never ever makes assumptions, base your thinking and optimizations on data retrieved by a profiler (please use PTU).
Another hint, performance is key to the success of an application and should be considered at design time and you should have clear performance targets.
This is far from being exhaustive but should provide an interesting base.
These days, the most important things in optimzation are:
respecting the cache - try to access memory in simple patterns, and don't unroll loops just for fun. Use arrays instead of data structures with lots of pointer chasing and it'll probably be faster for small amounts of data. And don't make anything too big.
avoiding latency - try to avoid divisions and stuff that's slow if other calculations depend on them immediately. Memory accesses that depend on other memory accesses (ie, a[b[c]]) are bad.
avoiding unpredictabilty - a lot of if/elses with unpredictable conditions, or conditions that introduce more latency, will really mess you up. There's a lot of branchless math tricks that are useful here, but they increase latency and are only useful if you really need them. Otherwise, just write simple code and don't have crazy loop conditions.
Don't bother with optimizations that involve copy-and-pasting your code (like loop unrolling), or reordering loops by hand. The compiler usually does a better job than you at doing this, but most of them aren't smart enough to undo it.
Collecting profiles of code execution get you 50% of the way there. The other 50% deals with analyzing these reports.
Further, if you use GCC or VisualC++, you can use "profile guided optimization" where the compiler will take info from previous executions and reschedule instructions to make the CPU happier.
Inline functions! Inspired by the profiling fans here I profiled an application of mine and found a small function that does some bitshifting on MP3 frames. It makes about 90% of all function calls in my applcation, so I made it inline and voila - the program now uses half of the CPU time it did before.
On most of embedded system i worked there was no profiling tools, so it's nice to say use profiler but not very practical.
First rule in speed optimization is - find your critical path.
Usually you will find that this path is not so long and not so complex. It's hard to say in generic way how to optimize this it's depend on what are you doing and what is in your power to do. For example you want usually avoid memcpy on critical path, so ever you need to use DMA or optimize, but what if you hw does not have DMA ? check if memcpy implementation is a best one if not rewrite it.
Do not use dynamic allocation at all in embedded but if you do for some reason don't do it in critical path.
Organize your thread priorities correctly, what is correctly is real question and it's clearly system specific.
We use very simple tools to analyze the bottle-necks, simple macro that store the time-stamp and index. Few (2-3) runs in 90% of cases will find where you spend your time.
And the last one is code review a very important one. In most case we avoid performance problem during code review very effective way :)
Measure performance.
Use realistic and non-trivial benchmarks. Remember that "everything is fast for small N".
Use a profiler to find hotspots.
Reduce number of dynamic memory allocations, disk accesses, database accesses, network accesses, and user/kernel transitions, because these often tend to be hotspots.
Measure performance.
In addition, you should measure performance.
Sometimes you have to decide whether it is more space or more speed that you are after, which will lead to almost opposite optimizations. For example, to get the most out of you space, you pack structures e.g. #pragma pack(1) and use bit fields in structures. For more speed you pack to align with the processors preference and avoid bitfields.
Another trick is picking the right re-sizing algorithms for growing arrays via realloc, or better still writing your own heap manager based on your particular application. Don't assume the one that comes with the compiler is the best possible solution for every application.
If someone doesn't have an answer to that question, it could be they don't know much.
It could also be that they know a lot. I know a lot (IMHO :-), and if I were asked that question, I would be asking you back: Why do you think that's important?
The problem is, any a-priori notions about performance, if they are not informed by a specific situation, are guesses by definition.
I think it is important to know coding techniques for performance, but I think it is even more important to know not to use them, until diagnosis reveals that there is a problem and what it is.
Now I'm going to contradict myself and say, if you do that, you learn how to recognize the design approaches that lead to trouble so you can avoid them, and to a novice, that sounds like premature optimization.
To give you a concrete example, this is a C application that was optimized.
Great lists. I will just add one tip I didn't saw in the above lists that in some case can yield huge optimisation for minimal cost.
bypass linker
if you have some application divided in two files, say main.c and lib.c, in many cases you can just add a \#include "lib.c" in your main.c That will completely bypass linker and allow for much more efficient optimisation for compiler.
The same effect can be achieved optimizing dependencies between files, but the cost of changes is usually higher.
Sometimes Google is the best algorithm optimization tool. When I have a complex problem, a bit of searching reveals some guys with PhD's have found a mapping between this and a well-known problem and have already done most of the work.
I would recommend optimizing using more efficient algorithms and not do it as an afterthought but code it that way from the start. Let the compiler work out the details on the small things as it knows more about the target processor than you do.
For one, I rarely use loops to look things up, I add items to a hashtable and then use the hashtable to lookup the results.
For example you have a string to lookup and then 50 possible values. So instead of doing 50 strcmps, you add all 50 strings to a hashtable and give each a unique number ( you only have to do this once ). Then you lookup the target string in the hashtable and have one large switch with all 50 cases ( or have functions pointers ).
When looking up things with common sets of input ( like css rules ), I use fast code to keep track of the only possible solitions and then iterate thought those to find a match. Once I have a match I save the results into a hashtable ( as a cache ) and then use the cache results if I get that same input set later.
My main tools for faster code are:
hashtable - for quick lookups and for caching results
qsort - it's the only sort I use
bsp - for looking up things based on area ( map rendering etc )

How does function parameters and local variables affect on the performance of a C program [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 3 years ago.
Improve this question
is it worth to reduce the number of function parameters and local variables to enhance the performance of a C program?
in the provided code below, maybe the example doesn't make a difference if the function is called few times during the execution but maybe it make sense if it's called n times, so is there any performance benefits in this case?
int n[4];
// read numbers ...
do_sum1(n[0], n[1], n[2], n[4]);
do_sum2(n);
// Functions definition
// --------------------
void do_sum1(int a, int b, int c, int d)
{
printf("%d\n", a + b + c + d);
}
void do_sum2(int n[4])
{
printf("%d\n", n[0] + n[1] + n[2] + n[3]);
}
The question is trickier than it seems.
First of, let's assume the function is not inlined (as otherwise they are more than likely to be compiled to the same code) and let's analyze the effect.
From one hand, the number of parameters to function affects the performance. All other things being equal, the more parameters are passed, the worse the performance - since copying parameters to the place function expects to find them (be it a stack, a register or any other storage) takes non-0 time.
From the other hand, semantic matters, and in this particular case, things are not equal! In your first case, you are passing 4 integer parameters. Assuming AMD64 ABI, they will be passed in CPU registers - which are super-fast to be accessed for both reading and writing.
However, in the second case you are effectively passing a pointer to a memory location. Which means, accessing values through this pointer means indirection, and at best the values would be found in L1 CPU cache (most likely), but at worst will be read from main memory (super slow!). While L1 cache is fast, it is still much slower when compared to register access.
Bottom line:
I expect the second case to be slower, than the first one.
is it worth to reduce the number of function parameters and local variables to enhance the performance of a C program?
The first question to ask is, will it enhance the performance of your program? Will it make a measurable difference at all? Don't blindly assume that it will.
The second question to ask is, what are the tradeoffs of doing so? How will it affect your ability to debug and maintain your code? Yes, you could make everything global to eliminate passing parameters and using locals, but the effect of that will be to make your code much harder to understand and maintain.
When thinking about improving performance, you should start from the highest level and work your way down:
Are you using the right data structures and/or algorithms for the problem at hand? For example, an unoptimized quicksort will still beat the pants off an aggressively tuned bubble sort (in the average case), binary searches are usually faster than linear searches, etc.
Are you using the appropriate tools or libraries, or are you hand-hacking everything yourself? It's possible there's already a solution out there that's been tested and tuned, such that you don't have to write from scratch.
Have you implemented your design well? For example, do you have any invariants in your loop bodies?
If the answer to all of those question is "yes", then the next step is to let the compiler do some optimizing (such as using the -O2 flag with gcc). C compilers have gotten very good at optimizing code, and depending on the program you can see some significant speedup.
If at this point you still feel your code is too slow, then you need to do some analysis. Run the code through a profiler to find where the bottlenecks are. At this point, you can start to look at micro-optimizations like reducing the number of parameters being passed to a function. Just be aware that it may not make enough of a difference to be worth the effort.

In c: do internal states improve speed?

Lets say I have a function with two parameters that is repetitively called. Does it increase the memory usage when you have functions with arguments?
Would it be faster to generate a function for each repetitive case, and call that function with no parameters?
I believe this is sometimes refereed to as 'internal state', but my question is which of the two options will perform faster?
EDIT>>>>>>>>
Your answers are all enlightening, allow me to clarify all at once.
It seems logical that
x = x + 10
would be faster than:
x = x + y
And I'm not talking about the time it takes to define and initialize y, I am just talking about the operation itself. I'm logically, in the second case there must be some extra step in which the CPU must find Y before performing the operation. When you amplify this with functions and then multiply it over and over, I would assume this would make a significant difference.
And yes, what in my case it applies to physics and the speed will likely be felt.
PS I am very interested in compiler functionality and debating learning assembler.
Parameters are typically passed on the stack so they don't take up more memory.
Parameters may be "un-noticeably" slower because the values may be copied to the stack (depends on how good the compiler is at optimizing).
The compiler is way smarter than you are, so don't try to outsmart the compiler. Write clear code and let the compiler worry about performance.
re: your edit
"it depends"
Does your processor have a different instruction to add 10 to a variable?
What sort of addressing modes does it support?
Regardless of the answers to the above, does the compiler make use of all the processor's features which might squeeze out every drip of performance.
e.g. - The good old 68000 chips had an "INC" opcode to increment a register by 1. It was much faster than other methods. If you were hand rolling assembly the fastest way to do x = x + 10 might have been to call INC 10 times...
I've worked with time constrained real time embedded apps and never had to worry about this level of optimization. I'd write the code and worry about performance if/when it becomes an issue.
Is the repetitive call is made with compile-time parameters, then you can indeed improve performance by "instantiating" a special version of the same function for the given set of compile-time parameters. In such cases the function will not even have a "state": the parameter values will essentially be embedded into the function code. Some compilers can do it implicitly.
The amount of improvement will depend on the nature of the function. In each given version of the function the entire blocks of code might be easily recognized as unreachable and eliminated entirely. One can also say that function inlining by nature involves the same kind of optimization.
Obviously, using such optimizations thoughtlessly might easily lead to a combinatorial explosion of the number of different versions of the same function (i.e. to code bloat), so it should be used with care.
(BTW, this is very similar to what C++ function templates with non-type template parameters do.)
If the repetitive call is made with run-time parameters, then pre-saving them in a run-time state might not achieve any significant improvement. Retrieving parameter values from some "state" is not necessarily more efficient than retrieving them from the "regular" function parameters.
Of course, there are such classic techniques as packing multiple function parameters into a struct object and passing such struct object to the function (instead of passing a large number of independent parameters). If the parameters remain unchanged between multiple calls, then this does improve overall performance by saving time on parameter preparation. But whether to call such struct object a "state" or not is a different question. It is definitely a manual technique, not something done by the compiler and involving any "internal state".
Does it increase the memory usage when you have functions with arguments?
No, function arguments are passed on the stack (or in registers if x64 calling convention).
Would it be faster to generate a function for each repetitive case, and call that function with no parameters?
No, your compiler should optimize it for you, there's no need to make your code less readable

many small sized functions

in computer literature it is generally recommended to write short functions as much as possible. I understand it may increase readability (although not always), and such approach also provides more flexibility. But does it have something to do with optimization as well? I mean -- does it matter to a compiler to compile a bunch of small routines rather than a few large routines?
Thanks.
That depends on the compiler. Many older compilers only optimized a single function at a time, so writing larger functions (up to to some limit) could improve optimization -- but (with most of them) exceeding that limit turned optimization off completely.
Most reasonably current compilers can generate inline code for functions (and C99 added the ineline keyword to facilitate that) and do global (cross-function) optimization, in which case it normally makes no difference at all.
#twain249 and #Jerry are both correct; breaking a program into multiple functions can have a negative effect on performance, but it depends on whether or not the compiler can optimize the functions into inline code.
The only way to know for sure is to examine the assembler output of your program and do some profiling. For example, if you know a particular code path is causing a performance problem, you can look at the assembler, and see how many functions are getting called, how many times parameters are being pushed onto the stack, etc. In that case, you may want to consolidate small functions into one larger one.
This has been a concern for me in the past: doing very tight optimization for embedded projects, I have consciously tried to reduce the number of function calls, especially in tight loops. However, this does produce ungainly functions, sometimes several pages long. To mitigate the maintenance cost of this, you can use macros, which I have leveraged heavily and successfully to make sure there are no function calls while at the same time preserving readability.

Is it okay to use functions to stay organized in C?

I'm a relatively new C programmer, and I've noticed that many conventions from other higher-level OOP languages don't exactly hold true on C.
Is it okay to use short functions to have your coding stay organized (even though it will likely be called only once)? An example of this would be 10-15 lines in something like void init_file(void), then calling it first in main().
I would have to say, not only is it OK, but it's generally encouraged. Just don't overly fragment the train of thought by creating myriads of tiny functions. Try to ensure that each function performs a single cohesive, well... function, with a clean interface (too many parameters can be a hint that the function is performing work which is not sufficiently separate from it's caller).
Furthermore, well-named functions can serve to replace comments that would otherwise be needed. As well as providing re-use, functions can also (or instead) provide a means to organize the code and break it down into smaller units which can be more readily understood. Using functions in this way is very much like creating packages and classes/modules, though at a more fine-grained level.
Yes. Please. Don't write long functions. Write short ones that do one thing and do it well. The fact that they may only be called once is fine. One benefit is that if you name your function well, you can avoid writing comments that will get out of sync with the code over time.
If I can take the liberty to do some quoting from Code Complete:
(These reason details have been abbreviated and in spots paraphrased, for the full explanation see the complete text.)
Valid Reasons to Create a Routine
Note the reasons overlap and are not intended to be independent of each other.
Reduce complexity - The single most important reason to create a routine is to reduce a program's complexity (hide away details so you don't need to think about them).
Introduce an intermediate, understandable abstraction - Putting a section of code int o a well-named routine is one of the best ways to document its purpose.
Avoid duplicate code - The most popular reason for creating a routine. Saves space and is easier to maintain (only have to check and/or modify one place).
Hide sequences - It's a good idea to hide the order in which events happen to be processed.
Hide pointer operations - Pointer operations tend to be hard to read and error prone. Isolating them into routines shifts focus to the intent of the operation instead of the mechanics of pointer manipulation.
Improve portability - Use routines to isolate nonportable capabilities.
Simplify complicated boolean tests - Putting complicated boolean tests into a function makes the code more readable because the details of the test are out of the way and a descriptive function name summarizes the purpose of the tests.
Improve performance - You can optimize the code in one place instead of several.
To ensure all routines are small? - No. With so many good reasons for putting code into a routine, this one is unnecessary. (This is the one thrown into the list to make sure you are paying attention!)
And one final quote from the text (Chapter 7: High-Quality Routines)
One of the strongest mental blocks to
creating effective routines is a
reluctance to create a simple routine
for a simple purpose. Constructing a
whole routine to contain two or three
lines of code might seem like
overkill, but experience shows how
helpful a good small routine can be.
If a group of statements can be thought of as a thing - then make them a function
i think it is more than OK, I would recommend it! short easy to prove correct functions with well thought out names lead to code which is more self documenting than long complex functions.
Any compiler worth using will be able to inline these calls to generate efficient code if needed.
Functions are absolutely necessary to stay organized. You need to first design the problem, and then depending on the different functionality you need to split them into functions. Some segment of code which is used multiple times, probably needs to be written in a function.
I think first thinking about what problem you have in hand, break down the components and for each component try writing a function. When writing the function see if there are some code segment doing the same thing, then break it into a sub function, or if there is a sub module then it is also a candidate for another function. But at some time this breaking job should stop, and it depends on you. Generally, do not make many too big functions and not many too small functions.
When construction the function please consider the design to have high cohesion and low coupling.
EDIT1::
you might want to also consider separate modules. For example if you need to use a stack or queue for some application. Make it separate modules whose functions could be called from other functions. This way you can save re-coding commonly used modules by programming them as a group of functions stored separately.
Yes
I follow a few guidelines:
DRY (aka DIE)
Keep Cyclomatic Complexity low
Functions should fit in a Terminal window
Each one of these principles at some point will require that a function be broken up, although I suppose #2 could imply that two functions with straight-line code should be combined. It's somewhat more common to do what is called method extraction than actually splitting a function into a top and bottom half, because the usual reason is to extract common code to be called more than once.
#1 is quite useful as a decision aid. It's the same thing as saying, as I do, "never copy code".
#2 gives you a good reason to break up a function even if there is no repeated code. If the decision logic passes a certain complexity threshold, we break it up into more functions that make fewer decisions.
It is indeed a good practice to refactor code into functions, irrespective of the language being used. Even if your code is short, it will make it more readable.
If your function is quite short, you can consider inlining it.
IBM Publib article on inlining

Resources