Have you written very long functions? If so, why? - c

I am writing an academic project about extremely long functions in the Linux kernel.
For that purpose, I am looking for examples of real-life functions that are extremely long (a few hundred lines of code) that you don't consider bad programming (i.e., they would not benefit from decomposition or from a dispatch table).
Have you ever written or seen such code? Can you post or link to it, and explain why it is so long?
I have been getting amazing help from the community here - any idea that makes it into the project will be properly credited.
Thanks,
Udi

The longest functions that I have ever written all have one thing in common: a very large switch statement. There are times when you have to switch on a long list of items, and it would only make things harder to understand if you tried to refactor some of the options into separate functions. Having large switch statements sends the cyclomatic complexity through the roof, but it is often better than the alternative implementations.

It was the last one before I got fired.

A previous job: An extremely long case statement, IIRC 1000+ lines. This was long before objects. Each option was only a few lines long. Breaking it up would have made it less clear. There were actually a pair of such routines doing different things to the same underlying set of data types.
Sorry, I don't have the code anymore and it isn't mine to post, anyway.

The longest function that I didn't see as being horrible was the key method of a custom CPU VM. As with #epotter, this involved a big switch statement. In fact, I'd say a lot of the methods that I find resist being cleanly broken down or improved in readability involve switch statements.
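For illustration, here is a minimal sketch of that pattern: the dispatch loop of a tiny, made-up stack VM. The opcodes and their semantics are invented for the example; a real interpreter would have hundreds of cases in the same switch, and splitting each case into its own function would add little, since each case is already only a few lines and the switch is the natural shape of the dispatch.

#include <stdio.h>

enum op { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

void run(const int *code)
{
    int stack[64];
    int sp = 0;          /* stack pointer */
    int pc = 0;          /* program counter */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_PRINT:
            printf("%d\n", stack[sp - 1]);
            break;
        case OP_HALT:
            return;
        }
    }
}

int main(void)
{
    /* compute and print (2 + 3) * 4 */
    int program[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD,
                      OP_PUSH, 4, OP_MUL, OP_PRINT, OP_HALT };
    run(program);
    return 0;
}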

Unfortunately, you won't often find this type of subroutine checked in or posted somewhere if it's auto-generated during a build step using some sort of code generator.
So look for projects that have C generated from another language.

Besides performance, I think the size of the call stack in kernel space is 8K (please verify the size). Also, as far as I know, kernel code is fairly specific. If some code is unlikely to be reused in the future, why bother making it a function, considering the function call overhead?

I could imagine that when speed is important (such as when holding some sort of lock in the kernel) you would not want to break up a function because of the overhead of making a function call. When compiled, parameters have to be pushed onto the stack and data has to be popped off before returning. Therefore you may have a large function for efficiency reasons.


where can I find signal and alarm function definitions?

I have studied the signal and alarm functions, but I am not satisfied, so I thought their definitions might help me.
This might help. Good explanations:
1. http://www.gnu.org/software/libc/manual/html_node/Signal-Handling.html#Signal-Handling
2. http://www.cs.cf.ac.uk/Dave/C/node24.html
3. http://www.thegeekstuff.com/2012/03/catch-signals-sample-c-code/
The signal(3) C library function is usually a thin wrapper around a system call to the underlying kernel. alarm(3) does a bit more work, but again falls back on the kernel's idea of time handling and signal delivery.
If you really want to know how they work, you'd have to dig into the source of a Unix(y) kernel. Be warned: the code you'll find is probably very complex, because kernel programmers have to handle some very exotic corner cases and be wary of weird uses that might lead to security problems - all while keeping it as fast as possible (it's code that will be used hundreds or thousands of times a second on millions of machines).
Next best would be to check out a book on Unix internals.
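To see the user-space side of what the kernel is delivering, here is a minimal sketch that installs a SIGALRM handler and asks for an alarm; it uses only standard POSIX calls (signal, alarm, pause), with the kernel doing the actual timing and signal delivery.

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t got_alarm = 0;

static void on_alarm(int signo)
{
    (void)signo;
    got_alarm = 1;          /* only async-signal-safe work in the handler */
}

int main(void)
{
    signal(SIGALRM, on_alarm);   /* thin wrapper around the kernel facility */
    alarm(2);                    /* ask the kernel to deliver SIGALRM in ~2s */

    while (!got_alarm)
        pause();                 /* sleep until a signal arrives */

    printf("alarm fired\n");
    return 0;
}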
Man pages are the best place to learn about the behavior and usage of any system-defined function. From your question, I take it you actually mean the implementations of the functions.
But ask yourself why you need to look into the code. All that happens is that it raises more questions, since most of the variables and functions used will be defined somewhere else, and you will need to look into them in turn.
If you are still dissatisfied, try some tough problems involving these functions.

Is it good to use functions as much as possible?

When I read open source code (Linux C code), I see that a lot of functions are used instead of performing all operations in main(), for example:
void function1(void);
void function2(void);
void function3(void);
void function4(void);

int main(void) {
    function1();
    return 0;
}

void function1(void) {
    // do something
    function2();
}

void function2(void) {
    function3();
    // do something
    function4();
}

void function3(void) {
    // do something
}

void function4(void) {
    // do something
}
Could you tell me what are the pros and cons of using functions as much as possible?
easy to add/remove functions (or new operations)
readability of the code
source efficiency(?) as the variables in the functions will be destroyed (unless dynamic allocation is done)
would the nested function slow the code flow?
Easy to add/remove functions (or new operations)
Definitely - it's also easy to see where the context for an operation starts and finishes. It's much easier to see that way than from some arbitrary range of lines in the source.
Readability of the code
You can overdo it. There are cases where having a function or not makes no difference in line count but does in readability - and whether that is a net positive depends on the person.
For example, if you did lots of set-bit operations, would you make:
some_variable = some_variable | (1 << bit_position)
a function? Would it help?
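For what it's worth, a common middle ground here is a tiny static inline helper; a sketch (the name set_bit is just illustrative):

static inline unsigned set_bit(unsigned value, unsigned bit_position)
{
    return value | (1u << bit_position);
}

/* usage: some_variable = set_bit(some_variable, 3); */

The compiler will almost certainly inline it, so you get a name for the intent without any call overhead.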
Source efficiency(?) due to the variables in the functions being destroyed (unless dynamic allocation is done)
If the source is reasonable (as in, you're not reusing variable names past their real context), then it shouldn't matter. The compiler should know exactly where a value's usage stops and where it can be ignored or destroyed.
Would the nested function slow the code flow?
In some cases, where address aliasing cannot be properly determined, it could. But it shouldn't matter in practice in most programs. By the time it starts to matter, you will probably be going through your application with a profiler and spotting the problematic hotspots anyway.
Compilers are quite good these days at inlining functions, though. You can trust them to do at least a decent job of eliminating the cases where the calling overhead is comparable to the function body itself (and many other cases).
This practice of using functions becomes really important as the amount of code you write increases. Separating code out into functions improves code hygiene and makes it easier to read. I read somewhere that there is really no point to code that is readable only by you (in some situations that is okay, I'm assuming). If you want your code to live on, it must be maintainable, and maintainability is created by breaking code into functions in the simplest sense possible. Also imagine a code base that exceeds 100k lines - that is quite common - and imagine having it all in the main function. That would be an absolute nightmare to maintain. Dividing the code into functions creates degrees of separation so many developers can work on different parts of the code base. So basically, the short answer is yes, it is good to use functions where necessary.
Functions should help you structure your code. The basic idea is that when you identify some place in the code which does something that can be described in a coherent, self-contained way, you should think about putting it into a function.
Pros:
Code reuse. If you do many times some sequence of operations, why don't you write it once, use it many times?
Readability: it's much easier to understand strlen(st) than while (st[i++] != 0);
Correctness: the code in the previous line is actually buggy. If it is scattered around, you may well not even see the bug, and if you fix it in one place, it will remain somewhere else. But with this code inside a function named strlen, you know what it should do, and you can fix it once (see the sketch at the end of this answer).
Efficiency: sometimes, in certain situations, compilers may do a better job when compiling a code inside a function. You probably won't know it in advance, though.
Cons:
Splitting code into functions just because it is A Good Thing is not a good idea. If you find it hard to give the function a good name (in your mother tongue, not only in C), that is suspicious. doThisAndThat() is probably two functions, not one. part1() is simply wrong.
Function call may cost you in execution time and stack memory. This is not as severe as it sounds, most of the time you should not care about it, but it's there.
When abused, it may lead to many functions doing partial work and delegating other parts from here to there. Too many arguments may impede readability, too.
There are basically two types of functions: functions that perform a sequence of operations (these are called "procedures" in some contexts), and functions that do some form of calculation. These two types are often mixed in a single function, but it helps to remember the distinction.
There is another distinction between kinds of functions: those that keep state (like strtok), those that may have side effects (like printf), and those that are "pure" (like sin). Functions like strtok are essentially a special kind of a different construct, called an object in object-oriented programming.
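To make the strlen example above concrete, here is a minimal sketch of the corrected loop wrapped in one function (my_strlen is just an illustrative name; the bug was the post-increment overcounting by one):

#include <stddef.h>

static size_t my_strlen(const char *st)
{
    size_t i = 0;
    while (st[i] != '\0')   /* no post-increment in the test, so no overcount */
        i++;
    return i;
}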
You should use functions that perform one logical task each, at a level of abstraction that makes the function of each function easy to logically verify. For instance:
void create_ui() {
    create_window();
    show_window();
}

void create_window() {
    create_border();
    create_menu_bar();
    create_body();
}

void create_menu_bar() {
    for (int i = 0; i < N_MENUS; i++) {
        create_menu(menus[i]);
    }
    assemble_menus();
}

void create_menu(arg) {
    ...
}
Now, as far as creating a UI is concerned, this isn't quite the way one would do it (you would probably want to pass in and return various components), but the logical structure is what I'm trying to emphasize. Break your task down into a few subtasks, and make each subtask its own function.
Don't try to avoid functions for optimization. If it's reasonable to do so, the compiler will inline them for you; if not, the overhead is still quite minimal. The gain in readability you get from this is a great deal more important than any speed you might get from putting everything in a monolithic function.
As for your title question, "as much as possible," no. Within reason, enough to see what each function does at a comfortable level of abstraction, no less and no more.
One rule of thumb you can use: if part of the code will be reused or rewritten, put it in a function.
I guess I think of functions like Lego bricks. You have hundreds of small pieces that you can put together into a whole. As a result of all of those well-designed, generic, small pieces you can make anything. If you had a single Lego piece that looked like an entire house, you couldn't then use it to build a plane or a train. Similarly, one huge piece of code is not so useful.
Functions are the bricks that you use when you design your project. A well-chosen separation of functionality into small, easily testable, self-contained functions makes building and maintaining your whole project easy. Their benefits far outweigh any possible efficiency issues you may think are there.
To be honest, the art of coding any sizeable project is in how you break it down into smaller pieces, so functions are key to that.

How much faster is C than R in practice?

I wrote a Gibbs sampler in R and decided to port it to C to see whether it would be faster. A lot of pages I have looked at claim that C will be up to 50 times faster, but every time I have used it, it's only about five or six times faster than R. My question is: is this to be expected, or are there tricks which I am not using which would make my C code significantly faster than this (like how using vectorization speeds up code in R)? I basically took the code and rewrote it in C, replacing matrix operations with for loops and making all the variables pointers.
Also, does anyone know of good resources for C from the point of view of an R programmer? There's an excellent book called The Art of R Programming by Matloff, but it seems to be written from the perspective of someone who already knows C.
Also, the screen tends to freeze when my C code is running in the standard R GUI for Windows. It doesn't crash; it unfreezes once the code has finished running, but it stops me from doing anything else in the GUI. Does anybody know how I could avoid this? I am calling the function using .C()
Many of the existing posts have explicit examples you can run. For example, Darren Wilkinson has several posts on his blog analyzing this in different languages, and later even on different hardware (e.g., comparing his high-end laptop to his netbook and to a Raspberry Pi). Some of his posts are
the initial (then revised) post
another later post
and there are many more on his site -- these often compare C, Java, Python and more.
Now, I also turned this into a version using Rcpp -- see this blog post. We also used the same example in a comparison between Julia, Python and R/C++ at useR this summer so you should find plenty other examples and references. MCMC is widely used, and "easy pickings" for speedups.
Given these examples, allow me to add that I disagree with the two earlier comments your question received. The speed will not be the same; it is easy to do better in an example such as this, and your C/C++ skills will mostly determine how much better.
Finally, an often overlooked aspect is that the speed of the RNG matters a lot. Running down loops and adding things up is cheap -- doing "good" draws is not, and a lot of inter-system variation comes from that too.
About the GUI freezing, you might want to call R_CheckUserInterrupt and perhaps R_ProcessEvents every now and then.
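As a rough sketch of that suggestion (assuming a function called through .C(), so it takes pointer arguments and returns void; the function name and the interrupt-check interval here are made up):

#include <R_ext/Utils.h>   /* declares R_CheckUserInterrupt() */

void gibbs_kernel(double *x, int *n_iter)
{
    for (int i = 0; i < *n_iter; i++) {
        /* ... one sweep of the sampler over x ... */

        if (i % 1000 == 0)
            R_CheckUserInterrupt();   /* lets R react to Esc / Ctrl-C */
    }
}

On Windows you may also want R_ProcessEvents() to keep the GUI redrawing; check Writing R Extensions for the exact header, since that part is platform-specific.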
I would say C, done properly, is much faster than R.
Some easy gains you could try:
Set the compiler to optimize for more speed.
Compiling with the -march flag.
Also if you're using VS, make sure you're compiling with release options, not debug.
Your observed performance difference will depend on a number of things: the type of operations that you are doing, how you write the C code, what type of compiler-level optimizations you use, your target CPU architecture, etc etc.
You can write basic, sloppy C and get something that works and runs with decent efficiency. You can also fine-tune your code for the unique characteristics of your target CPU - perhaps invoking specialized assembly instructions - and squeeze every last drop of performance that you can out of the code. You could even write code that runs significantly slower than the R version. C gives you a lot of flexibility. The limiting factor here is how much time that you want to put into writing and optimizing the C code.
The reverse is also true (duplicate the previous paragraph here, but swap "C" and "R").
I'm not trying to sound facetious, but there's really not a straightforward answer to your question. The only way to tell how much faster your C version would be is to write the code both ways and benchmark them.
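If it helps, a bare-bones way to time a candidate C implementation is the standard clock() facility; the work() function below is just a stand-in for your sampler:

#include <stdio.h>
#include <time.h>

static double work(long n)            /* stand-in for the real computation */
{
    double s = 0.0;
    for (long i = 0; i < n; i++)
        s += (double)i * 0.5;
    return s;
}

int main(void)
{
    clock_t start = clock();
    double result = work(100000000L);
    clock_t end = clock();
    printf("result=%g, cpu time=%.3f s\n", result,
           (double)(end - start) / CLOCKS_PER_SEC);
    return 0;
}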

Is it okay to use functions to stay organized in C?

I'm a relatively new C programmer, and I've noticed that many conventions from other higher-level OOP languages don't exactly hold true on C.
Is it okay to use short functions to keep your code organized (even though a function will likely be called only once)? An example of this would be 10-15 lines in something like void init_file(void), which is then called first in main().
I would have to say not only is it OK, but it's generally encouraged. Just don't overly fragment the train of thought by creating myriads of tiny functions. Try to ensure that each function performs a single cohesive, well... function, with a clean interface (too many parameters can be a hint that the function is performing work which is not sufficiently separate from its caller).
Furthermore, well-named functions can serve to replace comments that would otherwise be needed. As well as providing re-use, functions can also (or instead) provide a means to organize the code and break it down into smaller units which can be more readily understood. Using functions in this way is very much like creating packages and classes/modules, though at a more fine-grained level.
Yes. Please. Don't write long functions. Write short ones that do one thing and do it well. The fact that they may only be called once is fine. One benefit is that if you name your function well, you can avoid writing comments that will get out of sync with the code over time.
If I can take the liberty to do some quoting from Code Complete:
(These reason details have been abbreviated and in spots paraphrased, for the full explanation see the complete text.)
Valid Reasons to Create a Routine
Note the reasons overlap and are not intended to be independent of each other.
Reduce complexity - The single most important reason to create a routine is to reduce a program's complexity (hide away details so you don't need to think about them).
Introduce an intermediate, understandable abstraction - Putting a section of code into a well-named routine is one of the best ways to document its purpose.
Avoid duplicate code - The most popular reason for creating a routine. Saves space and is easier to maintain (only have to check and/or modify one place).
Hide sequences - It's a good idea to hide the order in which events happen to be processed.
Hide pointer operations - Pointer operations tend to be hard to read and error prone. Isolating them into routines shifts focus to the intent of the operation instead of the mechanics of pointer manipulation.
Improve portability - Use routines to isolate nonportable capabilities.
Simplify complicated boolean tests - Putting complicated boolean tests into a function makes the code more readable because the details of the test are out of the way and a descriptive function name summarizes the purpose of the tests (see the sketch after this list).
Improve performance - You can optimize the code in one place instead of several.
To ensure all routines are small? - No. With so many good reasons for putting code into a routine, this one is unnecessary. (This is the one thrown into the list to make sure you are paying attention!)
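A hypothetical illustration of the "simplify complicated boolean tests" point above (the struct and field names are invented for the example):

#include <stdbool.h>

struct request {
    int  attempts;
    int  max_attempts;
    bool cancelled;
    int  error;
};

static bool should_retry(const struct request *req)
{
    return req->attempts < req->max_attempts
        && !req->cancelled
        && req->error != 0;
}

/* The call site then reads like a sentence:
 *     if (should_retry(&req))
 *         resubmit(&req);
 */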
And one final quote from the text (Chapter 7: High-Quality Routines)
One of the strongest mental blocks to creating effective routines is a reluctance to create a simple routine for a simple purpose. Constructing a whole routine to contain two or three lines of code might seem like overkill, but experience shows how helpful a good small routine can be.
If a group of statements can be thought of as a thing - then make them a function
I think it is more than OK - I would recommend it! Short, easy-to-prove-correct functions with well-thought-out names lead to code which is more self-documenting than long, complex functions.
Any compiler worth using will be able to inline these calls to generate efficient code if needed.
Functions are absolutely necessary to stay organized. You first need to design the problem, and then split it into functions according to the functionality you need. A segment of code that is used multiple times probably needs to be written as a function.
First think about what problem you have at hand, break down its components, and for each component try writing a function. When writing a function, if you see some code segment doing the same thing, break it into a sub-function; if there is a sub-module, it is also a candidate for another function. But at some point this breaking-up should stop, and that depends on you. In general, do not make too many huge functions or too many tiny ones.
When constructing a function, please consider the design: aim for high cohesion and low coupling.
EDIT: You might also want to consider separate modules. For example, if you need a stack or a queue for some application, make it a separate module whose functions can be called from other functions. This way you avoid re-coding commonly used functionality, by programming it as a group of functions stored separately.
Yes
I follow a few guidelines:
DRY (aka DIE)
Keep Cyclomatic Complexity low
Functions should fit in a Terminal window
Each one of these principles at some point will require that a function be broken up, although I suppose #2 could imply that two functions with straight-line code should be combined. It's somewhat more common to do what is called method extraction than actually splitting a function into a top and bottom half, because the usual reason is to extract common code to be called more than once.
#1 is quite useful as a decision aid. It's the same thing as saying, as I do, "never copy code".
#2 gives you a good reason to break up a function even if there is no repeated code. If the decision logic passes a certain complexity threshold, we break it up into more functions that make fewer decisions.
It is indeed a good practice to refactor code into functions, irrespective of the language being used. Even if your code is short, it will make it more readable.
If your function is quite short, you can consider inlining it.
IBM Publib article on inlining
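A minimal sketch of what "inlining" a short helper looks like in C (the clamp function is just an illustrative example); a modern compiler will usually inline such a small function on its own at normal optimization levels, with or without the keyword:

static inline int clamp(int value, int lo, int hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}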

Which is faster for large "for" loop: function call or inline coding?

I have programmed an embedded software (using C of course) and now I'm considering ways to improve the running time of the system. The most important single module in my system is one very large nested for loop module.
That module consists of two nested for loops that loop at most 122500 times. That's not very much yet, but the problem is that inside that nested loop I have a call to a function that lives in another source file. That function itself consists mostly of another two nested for loops which always loop 22500 times. So I end up making that function call 122500 times.
I have made the function that is called much lighter and shorter (it still works as it should), and now I have started to wonder: would it be faster to remove the function call and write its body directly inside the first two for loops?
The processor in that system is an ARM7TDMI and its frequency is 55MHz. The system itself isn't very time critical, so it doesn't have to be real-time capable. However, the faster it can process its duties the better.
Also, would it be faster to use while loops instead of for loops? Any advice about how to improve the running time is appreciated.
-zaplec
TRY IT AND SEE!!
It'll almost certainly make a difference. Function call overhead isn't usually that much of an issue, but at over 100K repetitions it starts to add up.
...But whether or not it makes any real-world difference is something only you can answer, after trying it and timing the results.
As for for vs while... it shouldn't matter unless you actually change the behavior when changing the loop. If in doubt, make your compiler spit out assembler code for both and compare... or just change it and time it.
You need to be careful in the optimizations you make because you aren't always clear on which optimizations the compiler is making for you. Pre-optimization is a common mistake people make. Is it important that your code is readable and easily maintained or slightly faster? Like others have suggested, the best approach is to benchmark the different ways and see if there is a noticeable difference.
If you don't believe your compiler does much in the way of optimization I would look at some older concepts in optimizing C (searches on SO or google should provide some good links).
The ARM processor has an instruction pipeline (cache). When the processor encounters a branch (call) instruction, it must clear the pipeline and reload, thus wasting some time. One objective when optimizing for speed is to reduce the number of reloads to the instruction pipeline. This means reducing branch instructions.
As others have stated in SO, compile your code with optimization set for speed, and profile. I prefer to look at the assembly language listing as well (either printed from the compiler or displayed interwoven in the debugger). Use this as a baseline. If you can't profile, you can use assembly instruction counting as a rough estimate.
The next step is to reduce the number of branches, or the number of times a branch is taken. Unrolling loops helps to reduce the number of times a branch is taken. Inlining helps reduce the number of branches. Before applying these fine-tuning techniques, review the design and code implementation to see if branches can be reduced. For example, reduce the number of "if" statements by using Boolean arithmetic or Karnaugh maps. My favorite is reducing requirements and eliminating code that doesn't need to be executed.
In the code implementation, move code that doesn't change outside of the for or while loops. Some loops may be reduced to equations (for example, replacing a loop of additions with a multiplication). Also, reduce the number of iterations by asking "does this loop really need to be executed this many times?"
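Two hypothetical illustrations of those last two points (the function names are made up for the example):

#include <string.h>
#include <stddef.h>

/* Hoisting invariant work: strlen(s) is computed once, not on every
 * iteration of the loop. */
static size_t count_chars(const char *s, char c)
{
    size_t len = strlen(s);
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == c)
            count++;
    return count;
}

/* Replacing a loop of additions with a multiplication:
 * 0 + 1 + ... + (n-1) collapses to a closed form. */
static unsigned long sum_below(unsigned long n)
{
    return n * (n - 1) / 2;
}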
Another technique is to optimize for Data Oriented Design. Also check this reference.
Just remember to set a limit for optimizing. This is where you decide any more optimization is not generating any ROI or customer satisfaction. Also, apply optimizations in stages; which will allow you to have a deliverable when your manager asks for one.
Run a profiler on your code. If you are just guessing at where you are spending your time, you are probably wrong. A profiler will show which function is taking the most time, and you can focus on that. You could be doing something in the function that takes longer than the function call itself. Did you check whether you can change floating-point operations to integer ones, or integer math to shifts? You can spend a lot of time fiddling with things that don't make much difference. Run a profiler on your code and know for sure that the things you are changing will make a difference.
For function vs. inline, unfortunately there is no easy answer. I.e. it depends. See this FAQ. For "for" vs. "while", I wouldn't think there is any significant difference in performance.
In general, a function call should have more overhead than inlining. You really should profile however, as this can be affected quite a bit by your compiler (especially the compile/optimization settings). Some compilers will automatically inline code for example.
