dividing big functions efficiently

I am currently developing a guideline to improve testability for model-based C programming in the area of embedded systems.
The first thing I came across was that they created very big functions (e.g. 4k lines).
For testing purposes I would like to suggest dividing the code into smaller pieces.
But I found a problem and would like to know the most efficient way to solve it.
Imagine the former 4k-line function had, let's say, 10 local variables.
Now I need to pass these variables around to each function that uses them.
What would be the best, meaning most efficient, way to do this?
At present I have two ideas, both with some disadvantages:
Pass the locals by reference to the functions that need them. I need to use references because many functions change the variables. Disadvantage: the reference goes to the heap -> speed decreases.
Make the locals visible at file level. Disadvantage: more memory usage overall because the variables have a longer lifespan.
Is there maybe another way that reduces the disadvantages?

If the entire 4,000-line function manages to get by on 10 variables, I think you can count yourself lucky.
I would clean it by collecting the variables in a structure, which is instantiated inside the first function, and which is then shared with sub-functions by passing them a pointer. Performance should be very close to what you had.
Also, there are no "references" in C, so using that terminology can be confusing.

Form one structure that holds the variables you want to pass, and pass that structure (or a pointer to it) to any functions you call. This reduces the number of arguments you have to pass.
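A minimal sketch of the struct approach suggested in the answers above. The names ctrl_state, read_inputs, compute_output and control_step, the fields, and the gain of 2 are all hypothetical; the locals of the original large function would take their place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the locals of the original large function. */
struct ctrl_state {
    int setpoint;
    int measured;
    int error;
    int output;
};

/* Each sub-function receives a pointer and touches only the fields it needs. */
static void read_inputs(struct ctrl_state *s, int setpoint, int measured)
{
    assert(s != NULL);
    s->setpoint = setpoint;
    s->measured = measured;
}

static void compute_output(struct ctrl_state *s)
{
    assert(s != NULL);
    s->error = s->setpoint - s->measured;
    s->output = 2 * s->error;   /* placeholder gain of 2 (assumption) */
}

/* The former large function: the struct lives in its own stack frame. */
int control_step(int setpoint, int measured)
{
    struct ctrl_state s = {0};

    read_inputs(&s, setpoint, measured);
    compute_output(&s);
    return s.output;
}
```

The struct has automatic storage here, so nothing is placed on the heap and its lifetime is the same as that of the original locals.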

Related

Dealing with a large amount of variables

I am writing a large C program and have one file file.c that contains a large number of variables. Let's say that it has a function named func() that is being used by other files.
Unfortunately, the function func() has to use a large amount of variables and since it contains a lot of code I've decided to write some functions that are used by func() in order to make the code more readable. Now, I have 3 main possibilities:
Declare the variables as global.
Declare the variables as local inside func() and pass these variables as arguments to functions inside file.c using a struct (or any other trick that you can think of).
Declare the variables as local and instead of using other functions just throw all the code inside func() which will result in very long and unreadable code.
Any suggestions ?

In general, the second option is the best.
Overuse of global variables is considered bad practice.
It may be a source of poor performance in some cases because a global has to be loaded from memory, whereas a variable that stays local can be kept in a register.
Also, code reuse is much easier. A function that receives a parameter is more general and can be called in various use cases. A function that uses a global can be used in only one context.
In addition, globals are available, and thus very probably also used, everywhere in the program. This means that they are read and written from many places. Validating their values can cause real overhead.
Bottom line: Modular code is really more readable and easier to maintain.

A "large number of variables" sounds to me like a mess that can be organized into one or several structs. If I were you, the first thing I would do is analyze file.c to get a good idea of which variables are used most and how they relate to each other. With that information I would group the variables into structs that can then be passed as pointers to the functions. The least used variables have a reasonable chance of becoming constants at the function call level, so be aware of that. I wouldn't be surprised to find variables with duplicated meaning and content, so I believe you have a good chance for optimization here. Think carefully: maybe what you see as a large number of variables can be reduced to just one or two compact structs.
In general, avoid global variables as much as possible. Use them only as a last resort, to export something to other modules, and when any other solution implies a performance loss.
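A sketch of the second option, assuming the variables fall into a read-only group and a mutable group. The struct names, fields, helper functions, and the signature given to func() are all hypothetical, since the question does not show them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grouping: values that the helpers only read... */
struct limits {
    int min;
    int max;
};

/* ...and values that the helpers update. */
struct totals {
    int accepted;
    int rejected;
};

/* static keeps the helper private to file.c; const documents read-only access. */
static int in_range(const struct limits *lim, int value)
{
    assert(lim != NULL);
    return value >= lim->min && value <= lim->max;
}

static void record(struct totals *t, int ok)
{
    assert(t != NULL);
    if (ok)
        t->accepted++;
    else
        t->rejected++;
}

/* The public entry point; returns how many samples were within the limits. */
int func(const int *samples, size_t count)
{
    const struct limits lim = { 0, 100 };   /* assumed limits */
    struct totals t = { 0, 0 };

    assert(samples != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        record(&t, in_range(&lim, samples[i]));
    return t.accepted;
}
```

Splitting the variables by how they are used lets the helper signatures show which data a helper may change.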

Why pass a pointer to a global structure?

This is probably going to be a stupid question, but I am reviewing some code and I just don't see the point in what this guy is doing. In one C file he has defined a global structure that has many elements of many types. From function "A" there is a call to function "B". In the call they pass a pointer to the global structure, and then in function "B" some stuff is done and part of the global is updated. This all seems like superfluous overkill since it is already a global. If the structure were local to function "A", I could totally see passing the address of the structure into function "B". However, the memory is already permanently allocated at the very top of the C file. In fact, I can argue that there is a potential problem with someone else coming in and changing something without realizing they have created a bug.
So I am sure there is a "good coding practice" BKM or something like that for doing this but I just can't see it. So in short, why create an address pointer and pass that to a function unnecessarily when the variable is already a global?

Passing the pointer is good style, primarily because globals are bad style. Perhaps the original developer is thinking about the possibility that the global may not be global, or the function that accepts it might possibly operate on a different variable (which may or may not also be global, but still needs to be identified).

If the structure instance is global and both code files can access it, then passing a pointer to it is clearly redundant. But the previous developer may have planned to create other instances, in which case accessing the global directly would have compromised the function's reusability.
It is good practice to pass a pointer to the structure between functions, but if no major code change is planned, then using the global directly is not a bad idea.

Function B was most likely being written with an eye towards reusability, and for whatever reason was never actually re-used.
Ideally, functions should communicate with each other exclusively through parameters and return values (and exceptions, where supported), rather than sharing global data. This allows you to more easily re-use code in other programs where the global data variables are not present (or have different names).
If you're really squeezed for stack space, or have some other real technical limitation that makes using global data a significantly more attractive / less expensive option than passing arguments around, then globals are the right answer, but that should be rare.
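A short sketch of why function "B" may take a pointer even though the only existing instance is global. The struct, its fields, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct device_status {      /* hypothetical fields */
    int error_count;
    int last_code;
};

/* File-level instance, as in the code under review. */
static struct device_status g_status;

/* "Function B": works on whichever instance the caller passes. */
void report_error(struct device_status *st, int code)
{
    assert(st != NULL);
    st->error_count++;
    st->last_code = code;
}

/* "Function A": passes the global explicitly. */
void function_a(void)
{
    report_error(&g_status, 42);
}

/* A test, or a second device, can use its own instance without touching g_status. */
int count_after_two_errors(void)
{
    struct device_status local = { 0, 0 };

    report_error(&local, 1);
    report_error(&local, 2);
    return local.error_count;
}
```

Only function_a knows about g_status, so report_error can be moved to another program or exercised in a unit test unchanged.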

Using Structs in Functions

I have a function in which I access a struct's members many times.
What I was wondering about is what is the good practice to go about this?
For example:
struct s
{
    int x;
    int y;
};
and I have allocated memory for 10 objects of that struct using malloc.
So, whenever I need to use only one of the object in a function, I usually create (or is passed as argument) pointer and point it to the required object (My superior told me to avoid array indexing because it adds a calculation when accessing any member of the struct)
But is this the right way? I understand that dereferencing is not as expensive as creating a copy, but what if I'm dereferencing a number of times (like 20 to 30) in the function.
Would it be better if I created temporary variables for the struct members (only the ones I need; I certainly don't use all of them), copied the values over, and then wrote them back to the actual struct before returning?
Also, is this unnecessary micro optimization? Please note that this is for embedded devices.

This is for an embedded system. So, I can't make any assumptions about what the compiler will do. I can't make any assumptions about word size, or the number of registers, or the cost of accessing off the stack, because you didn't tell me what the architecture is. I used to do embedded code on 8080s when they were new...
OK, so what to do?
Pick a real section of code and code it up. Code it up each of the different ways you have listed above. Compile it. Find the compiler option that forces it to print out the assembly code that is produced. Compile each piece of code with every different set of optimization options. Grab the reference manual for the processor and count the cycles used by each case.
Now you will have real data on which to base a decision. Real data is much better than the opinions of a million highly experienced expert programmers. Sit down with your lead programmer and show him the code and the data. He may well show you better ways to code it. If so, recode it his way, compile it, and count the cycles used by his code. Show him how his way worked out.
At the very worst you will have spent a weekend learning something very important about the way your compiler works. You will have examined N ways to code things times M different sets of optimization options. You will have learned a lot about the instruction set of the machine. You will have learned how good, or bad, the compiler is. You will have had a chance to get to know your lead programmer better. And, you will have real data.
Real data is the kind of data that you must have to answer this question. Without that data, anything anyone tells you is nothing but an ego-based guess. Data answers the question.
Bob Pendleton

First of all, indexing an array is not very expensive (only like one operation more expensive than a pointer dereference, or sometimes none, depending on the situation).
Secondly, most compilers will perform what is called RVO or return value optimisation when returning structs by value. This is where the caller allocates space for the return value of the function it calls, and secretly passes the address of that memory to the function for it to use, and the effect is that no copies are made. It does this automatically, so
struct mystruct blah = func();
Only constructs one object, passes it to func for it to use transparently to the programmer, and no copying need be done.
What I do not know is if you assign an array index the return value of the function, like this:
someArray[0] = func();
will the compiler pass the address of someArray[0] and do RVO that way, or will it just not do that optimisation? You'll have to get a more experienced programmer to answer that. I would guess that the compiler is smart enough to do it though, but it's just a guess.
And yes, I would call it micro optimisation. But we're C programmers. And that's how we roll.

Generally, the case in which you want to make a copy of a passed struct in C is when you want to manipulate the data without modifying the original. That is to say, your changes are not reflected in the struct itself but only in the return value. As for which is more expensive, it depends on a lot of things, many of which change from implementation to implementation, so I would need more specific information to be more helpful. I would expect, though, that in an embedded environment your memory is at a greater premium than your processing power. Really, this reads like needless micro-optimization; your compiler should handle it.

In this case creating temp variable on the stack will be faster. But if your structure is much bigger then you might be better with dereferencing.
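The two alternatives from the question can be written side by side so that the generated assembly can be compared, as the first answer recommends. This is only a sketch: the loop body is a placeholder and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct s {
    int x;
    int y;
};

/* Variant 1: access every member through the pointer. */
void step_via_pointer(struct s *p, int n)
{
    assert(p != NULL);
    for (int i = 0; i < n; i++) {
        p->x += p->y;
        p->y += 1;
    }
}

/* Variant 2: copy the members into locals, work on them, write back once. */
void step_via_locals(struct s *p, int n)
{
    assert(p != NULL);
    int x = p->x;
    int y = p->y;

    for (int i = 0; i < n; i++) {
        x += y;
        y += 1;
    }
    p->x = x;
    p->y = y;
}
```

With GCC, the -S option writes the assembly to a file, so each variant can be inspected at every optimization level. Other embedded toolchains have their own option for this. Which variant is cheaper depends on the compiler and target, so the listing and the processor manual are the place to look.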

Why do we pass by reference when we have a choice to make the variable external?

Suppose we have an array say:
int arr[1000];
and I have a function that works on that array say:
void Func(void);
Why would there ever be a need to pass by reference (by changing the void), when I can have arr[1000] as an external variable outside main()?
What is the difference? Is there any difference?
Why do people prefer passing by reference rather than making it external? (I myself think that making it external is easier).

If you use a global variable arr, Func is limited to always being used with that one variable and nothing else. Here are some reasons why that might be bad:
arr is part of the "current document" you're working with, and you later decide you want your program to support having more than one document open.
You later decide (or someone using your code as a library decides) to use threads, and suddenly your program randomly crashes when two threads clobber each other's work in arr.
You later decide to make your code a library, and now it makes sense for the caller (in case there's more than one point at which the library gets used in a program) to provide the buffer; otherwise independent parts of the calling code would have to be aware of one another's implementations.
All of these problems go away as soon as you eliminate global variables and make your functions take pointers to the data they need to operate on.
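A minimal sketch of that change, assuming Func simply doubles every element (the original never says what Func does). The helper sum_after_doubling is hypothetical and only shows two independent arrays going through the same function.

```c
#include <assert.h>
#include <stddef.h>

/* Takes the array and its length instead of relying on a global arr. */
void Func(int *arr, size_t len)
{
    assert(arr != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        arr[i] *= 2;        /* placeholder operation (assumption) */
}

/* Two independent "documents" processed by the same function. */
int sum_after_doubling(void)
{
    int first[3]  = { 1, 2, 3 };
    int second[2] = { 10, 20 };

    Func(first, 3);
    Func(second, 2);
    return first[2] + second[1];   /* 6 + 40 */
}
```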

I think you're asking if global variables are bad. Quoting an excellent answer:
"The problem with global variables is that since every function has access to these, it becomes increasingly hard to figure out which functions actually read and write these variables.
To understand how the application works, you pretty much have to take into account every function which modifies the global state. That can be done, but as the application grows it will get harder to the point of being virtually impossible (or at least a complete waste of time).
If you don't rely on global variables, you can pass state around between different functions as needed. That way you stand a much better chance of understanding what each function does, as you don't need to take the global state into account."

If arr is external then anyone can modify it, not just Func. This is Officially Bad.
Passing arguments ensures that you know what data you are changing and who is changing it.
EDIT: Where Officially Bad means "Usually bad, but not always. Generally don't do it unless you have a good reason." Just like all the other "rules" of software development :)

By making the variable external to the function, the function is now tightly coupled to the module that defines the variable, and is thus harder to reuse in other programs. It also means that your function can only ever work on that one array, which limits the function's flexibility. Suppose one day your requirements change, and now you have to process multiple arrays with Func.
By passing the array as a parameter (along with the array size), the function becomes more easily decoupled from the module using it (meaning it can be more easily used by other programs/modules), and you can now use the function to process more than one array.
From a general code maintenance standpoint, it's best that functions and their callers communicate through parameters and return values rather than rely on shared variables.

It's largely a matter of scope; If you make all your variables external/global in scope, how confusing is that going to get?
Not only that, but you'll have a large number of variables that simply do not need to exist at any given time. Passing function arguments around instead of having lots of global variables lets you more easily get rid of things you no longer need.

Passing by reference (rather than using a global variable) makes it more clear to someone reading the code that the function may change the values of the array.
Additionally, if you wanted to perform the action on more than one array, you could use the same function over and over and pass a different array to it each time.
Another reason is that when writing multi-threaded code you usually want each thread to exclusively own as much of the data that it has to work on (sharing writable data is expensive and may result in race conditions if not done properly). By restricting global variable access and making local variables and passing references you can more easily write code that is more thread (and signal handler) friendly.
As an example, let's look at the simple puts function.
int puts(const char *s);
This function writes a C string to standard output, which can be useful. You might write some complicated code that uses puts to output messages about what it is doing at different stages of execution.
int my_complicated_code( int x, int y, int z);
Now, imagine that you call the function several times in the program, but one of those times you don't want it to write to standard output, but to some other FILE *. If all of your calls to puts were actually calls to fputs, which takes a FILE * that says which file to print to, this would be easy to accomplish by changing my_complicated_code to take a FILE * as well as its other arguments.
int my_complicated_code(int x, int y, int z, FILE * out_file);
Now you can decide which file it will print to at the time when you call my_complicated_code by passing it a reference to any FILE * you have (that is open for writing).
The same thing follows for arrays. The memcpy function would be much less useful if it only copied data to one particular location. Or if it only copied from one particular location, since it actually takes two references to arrays.
It is often easier to write unit tests for functions that take references too since they don't make assumptions about where the data they need is or what its name is. You don't have to keep updating an array with a certain name to mimic the input you want to test, just create a different array for each test and pass it to your function.
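A sketch of the FILE * version described above. The body of my_complicated_code is hypothetical (the answer only gives the prototype), and first_line_is_starting is an assumed helper showing how a test might capture the output in a temporary file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical body: writes progress messages to whatever stream it is given. */
int my_complicated_code(int x, int y, int z, FILE *out_file)
{
    assert(out_file != NULL);
    fputs("starting\n", out_file);
    int result = x + y + z;          /* placeholder computation */
    fprintf(out_file, "result=%d\n", result);
    return result;
}

/* A test can capture the output in a temporary file instead of stdout. */
int first_line_is_starting(void)
{
    FILE *tmp = tmpfile();
    char line[32];
    int ok;

    if (tmp == NULL)
        return 0;                    /* no temporary file available */
    my_complicated_code(1, 2, 3, tmp);
    rewind(tmp);
    ok = fgets(line, sizeof line, tmp) != NULL
         && strcmp(line, "starting\n") == 0;
    fclose(tmp);
    return ok;
}
```

Callers that want the old behavior pass stdout; nothing else about the function has to change.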
In many simple programs it may seem like it is easier to write code using global variables like this, but as programs get bigger this is not the case.

As an addition to all the other answers already giving good reasons: Every single decision in programming is a tradeoff between different advantages and disadvantages. Decades of programming experience by generations of programmers have shown that global state is a bad thing in most cases. There is even a programming paradigm built around the avoidance of it, taking it to the extreme of avoiding state at all:
http://en.wikipedia.org/wiki/Functional_programming
You may find it easier at the moment, but as your projects keep growing, at some point you will find that you have implemented so many workarounds for the problems that came up in the meantime that you can no longer maintain your own code.

There is a difference in scope. If you declare "int arr[1000]" in your main(), for instance, you cannot access it in your function "another_function()". You would have to explicitly pass it by reference to every other function in which you want to use it. If it were external, it would be accessible in every function.
See (1.)

It's a maintenance issue too. Why would I want to have to track down some external somewhere when I can just look at the function and see what it is supposed to be?

When is it ok to use a global variable in C?

Apparently there's a lot of variety in opinions out there, ranging from, "Never! Always encapsulate (even if it's with a mere macro!)" to "It's no big deal – use them when it's more convenient than not."
So.
Specific, concrete reasons (preferably with an example)
Why global variables are dangerous
When global variables should be used in place of alternatives
What alternatives exist for those that are tempted to use global variables inappropriately
While this is subjective, I will pick one answer (that to me best represents the love/hate relationship every developer should have with globals) and the community will vote theirs to just below.
I believe it's important for newbies to have this sort of reference, but please don't clutter it up if another answer exists that's substantially similar to yours – add a comment or edit someone else's answer.

Variables should always have the smallest scope possible. The argument behind that is that every time you increase the scope, more code can potentially modify the variable, and thus more complexity is introduced into the solution.
It is thus clear that avoiding using global variables is preferred if the design and implementation naturally allow that. Due to this, I prefer not to use global variables unless they are really needed.
I can not agree with the 'never' statement either. Like any other concept, global variables are something that should be used only when needed. I would rather use global variables than using some artificial constructs (like passing pointers around), which would only mask the real intent.
Some good examples where global variables are used are singleton pattern implementations or register access in embedded systems.
On how to actually detect excessive usages of global variables: inspection, inspection, inspection. Whenever I see a global variable I have to ask myself: Is that REALLY needed at a global scope?

The only way you can make global variables work is to give them names that assure they're unique.
That name usually has a prefix associated with some "module" or collection of functions for which the global variable is particularly focused or meaningful.
This means that the variable "belongs" to those functions: it's part of them. Indeed, the global can usually be "wrapped" with a little function that goes along with the other functions, in the same .h file and with the same name prefix.
Bonus.
When you do that, suddenly, it isn't really global any more. It's now part of some module of related functions.
This can always be done. With a little thinking every formerly global variable can be assigned to some collection of functions, allocated to a specific .h file, and isolated with functions that allow you to change the variable without breaking anything.
Rather than say "never use global variables", you can say "assign the global variable's responsibilities to some module where it makes the most sense."
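A minimal sketch of such a module, assuming a hypothetical motor.c / motor.h pair. The motor_ prefix, the variable, and the 6000 rpm limit are invented for illustration.

```c
#include <assert.h>

#define MOTOR_MAX_RPM 6000   /* assumed limit, for illustration only */

/* motor.c: the variable is static, so only this file can see it. */
static int motor_speed_rpm;

/* Would be declared in motor.h, with the same motor_ prefix. */
int motor_get_speed(void)
{
    /* Invariant maintained by motor_set_speed. */
    assert(motor_speed_rpm >= 0 && motor_speed_rpm <= MOTOR_MAX_RPM);
    return motor_speed_rpm;
}

/* The single place that writes the variable, so it can validate the value. */
int motor_set_speed(int rpm)
{
    if (rpm < 0 || rpm > MOTOR_MAX_RPM)
        return -1;              /* rejected, state unchanged */
    motor_speed_rpm = rpm;
    return 0;
}
```

Every change to the value now passes through motor_set_speed, which is also a convenient spot for a breakpoint or a log line.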

Global variables in C are useful to make code more readable if a variable is required by multiple methods (rather than passing the variable into each method). However, they are dangerous because all locations have the ability to modify that variable, making it potentially difficult to track down bugs. If you must use a global variable, always ensure it is only modified directly by one method and have all other callers use that method. This will make it much easier to debug issues relating to changes in that variable.

Consider this koan: "if the scope is narrow enough, everything is global".
It is still very possible in this age to need to write a very quick utility program to do a one-time job.
In such cases, the energy required to create safe access to variables is greater than the energy saved by debugging problems in such a small utility.
This is the only case I can think of offhand where global variables are wise, and it is relatively rare. Useful, novel programs so small they can be held completely within the brain's short-term memory are increasingly infrequent, but they still exist.
In fact, I could boldly claim that if the program is not this small, then global variables should be illegal.
If the variable will never change, then it is a constant, not a variable.
If the variable requires universal access, then two subroutines should exist for getting and setting it, and they should be synchronized.
If the program starts small, and might be larger later, then code as if the program is large today, and abolish global variables. Not all programs will grow! (Although of course, that assumes the programmer is willing to throw away code at times.)

When you're not worried about thread-safe code: use them wherever it makes sense, in other words wherever it makes sense to express something as a global state.
When your code may be multi-threaded: avoid at all costs. Abstract global variables into work queues or some other thread-safe structure, or if absolutely necessary wrap them in locks, keeping in mind that these are likely bottlenecks in the program.

I came from the "never" camp, until I started working in the defense industry. There are some industry standards that require software to use global variables instead of dynamic (malloc in the C case) memory. I'm having to rethink my approach to dynamic memory allocation for some of the projects that I work on. If you can protect "global" memory with the appropriate semaphores, threads, etc. then this can be an acceptable approach to your memory management.

Code complexity is not the only optimization of concern. For many applications, performance optimization has a far greater priority. But more importantly, use of global variables can drastically REDUCE code complexity in many situations. There are many, perhaps specialized, situations in which global variables are not only an acceptable solution, but preferred. My favorite specialized example is their use to provide communication between the main thread of an application with an audio callback function running in a real-time thread.
It is misleading to suggest that global variables are a liability in multi-threaded applications as ANY variable, regardless of scope, is a potential liability if it is exposed to change on more than one thread.
Use global variables sparingly. Data structures should be used whenever possible to organize and isolate the use of the global namespace.
Variable scope avails programmers very useful protection -- but it can have a cost. I came to write about global variables tonight because I am an experienced Objective-C programmer who often gets frustrated with the barriers object-orientation places on data access. I would argue that anti-global zealotry comes mostly from younger, theory-steeped programmers experienced principally with object-oriented APIs in isolation without a deep, practical experience of system level APIs and their interaction in application development. But I have to admit that I get frustrated when vendors use the namespace sloppily. Several linux distros had "PI" and "TWOPI" predefined globally, for example, which broke much of my personal code.

When Not to Use: Global variables are dangerous because the only way to know how a global variable changed is to trace the entire source code of the .c file in which it is declared (or all .c files, if it is extern as well). If your code goes buggy, you have to search your entire source file(s) to see which functions change it, and when. It is a nightmare to debug when it goes wrong. We often take for granted the ingenuity behind the concept of local variables gracefully going out of scope: it's easy to trace.
When to Use: Global variables should be used when their use is not excessively masked and when the cost of using local variables is so high that it compromises readability. By this, I mean having to add an additional parameter to function arguments and returns and passing pointers around, amongst other things. Three classic examples follow. First, a stack with push and pop operations: the stack is shared between functions. Of course I could use local variables, but then I would have to pass pointers around as an additional parameter. The second classic example can be found in K&R's "The C Programming Language", where they define getch() and ungetch() functions that share a global character buffer array. Once again, we don't need to make it global, but is the added complexity worth it when it's pretty hard to mess up the use of the buffer? The third example is something you'll find in the embedded space amongst Arduino hobbyists. A lot of functions within the main loop function all call millis(), which returns the time at the moment it is invoked. Because clock speed isn't infinite, the value of millis() will differ within a single loop iteration. To make it constant, take a snapshot of the time at the start of every loop iteration and save it in a global variable. The time snapshot will then be the same for all the functions that access it.
Alternatives: Not much. Stick to local scoping as much as possible, especially at the beginning of the project, rather than vice versa. As the project grows, and if you feel complexity can be lowered by using global variables, then do so, but only if it meets the requirements of point two. And remember, using local scope and having more complicated code is the lesser evil compared to irresponsibly using global variables.
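The getch()/ungetch() pair mentioned above can be sketched roughly as follows. This is written in the spirit of the K&R example rather than copied from it: the buffer is made static here, and ungetch returns a status code, both of which are choices made for this sketch.

```c
#include <assert.h>
#include <stdio.h>

#define BUFSIZE 100

/* Shared by getch and ungetch only; static hides them from other files. */
static char buf[BUFSIZE];   /* pushed-back characters */
static int bufp = 0;        /* next free position in buf */

/* Return a pushed-back character if there is one, otherwise read from stdin. */
int getch(void)
{
    assert(bufp >= 0 && bufp <= BUFSIZE);
    return (bufp > 0) ? buf[--bufp] : getchar();
}

/* Push a character back; returns 0 on success, -1 if the buffer is full. */
int ungetch(int c)
{
    if (bufp >= BUFSIZE)
        return -1;
    buf[bufp++] = (char)c;
    return 0;
}
```

Passing the buffer and its index to both functions would work too, but every caller would then have to carry state it never looks at.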

You need to consider the context in which the global variable will be used as well. In the future, will you want to duplicate this code?
For example, suppose you are using a socket within the system to access a resource. In the future, will you want to access more than one of these resources? If the answer is yes, I would stay away from globals from the start, so that a major refactor will not be required.

Global variables should be used when multiple functions need to access the data or write to an object. For example, if you had to pass data or a reference to multiple functions such as a single log file, a connection pool, or a hardware reference that needs to be accessed across the application. This prevents very long function declarations and large allocations of duplicated data.
You should typically not use global variables unless absolutely necessary because global variables are only cleaned up when explicitly told to do so or your program ends. If you are running a multi-threaded application, multiple functions can write to the variable at the same time. If you have a bug, tracking that bug down can be more difficult because you don't know which function is changing the variable. You also run into the problem of naming conflicts unless you use a naming convention that explicitly gives global variables a unique name.

It's a tool like any other, usually overused, but I don't think they are evil.
For example I have a program that really acts like an online database. The data is stored in memory but other programs can manipulate it. There are internal routines that act much like stored procedures and triggers in a database.
This program has hundreds of global variables, but if you think about it, what is a database but a huge number of global variables?
This program has been in use for about ten years now through many versions and it's never been a problem and I'd do it again in a minute.
I will admit that in this case the global vars are objects that have methods used for changing the object's state. So tracking down who changed the object while debugging isn't a problem since I can always set a break point on the routine that changes the object's state. Or even simpler I just turn on the built in logging that logs the changes.

When you declare constants.

I can think of several reasons:
debugging/testing purposes (warning - haven't tested this code):
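#include <stdio.h>
#define MAX_INPUT 46  /* fib(46) is the largest value that fits in a 32-bit int */
int runs = 0;         /* global call counter, used only for instrumentation */

/* Naive recursion: the number of calls grows exponentially with n. */
int fib1(int n) {
    ++runs;
    return n > 2 ? fib1(n - 1) + fib1(n - 2) : 1;
}

/* Memoized version: cache[i] holds fib(i+1); *len is the number of valid entries. */
int fib2(int n, int *cache, int *len) {
    ++runs;
    if (n <= 2) {
        if (*len >= 2)
            return 1;
        *len = 2;
        return cache[0] = cache[1] = 1;
    } else if (*len >= n) {
        return cache[n - 1];
    } else {
        if (*len != n - 1)
            fib2(n - 1, cache, len);
        *len = n;
        return cache[n - 1] = cache[n - 2] + cache[n - 3];
    }
}

int main(void) {
    int n;
    /* Keep reading values until input ends or a value is out of range. */
    while (scanf("%i", &n) == 1 && n >= 1 && n <= MAX_INPUT) {
        int cache[MAX_INPUT];
        int len = 0;
        runs = 0;
        printf("fib1(%i)==%i", n, fib1(n));
        printf(", %i run(s)\n", runs);
        runs = 0;
        printf("fib2(%i)==%i", n, fib2(n, cache, &len));
        printf(", %i run(s)\n", runs);
    }
    return 0;
}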
I used scoped variables for fib2, but that's one more scenario where globals might be useful (pure mathematical functions which need to store data to avoid taking forever).
programs used only once (eg for a contest), or when development time needs to be shortened
globals are useful as typed constants, where a function somewhere requires *int instead of int.
I generally avoid globals if I intend to use the program for more than a day.

I believe we have an edge case in our firm, which prevents me from entering the "never use global variables camp".
We need to write an embedded application that runs in our box and pulls medical data from devices in a hospital.
It should run indefinitely, even when the medical device is unplugged, the network is gone, or the settings of our box change. Settings are read from a .txt file, which can be changed during runtime, preferably with no trouble.
That is why the Singleton pattern is no use to me. So we go back from time to time (after 1000 data items have been read) and read the settings like so:
public static SettingsForIncubator settings;
public static void main(String[] args) {
    while (true) {
        // Re-read the settings into the global before each batch.
        settings = getSettings(args);
        int counter = 0;
        while (medicalDeviceIsGivingData && counter < 1000) {
            readData(); // using settings
            // a lot of other functions that use settings
            counter++;
        }
    }
}

Global constants are useful - you get more type safety than pre-processor macros and it's still just as easy to change the value if you decide you need to.
Global variables have some uses, for example if the operation of many parts of a program depend on a particular state in the state machine. As long as you limit the number of places that can MODIFY the variable tracking down bugs involving it isn't too bad.
Global variables become dangerous almost as soon as you create more than one thread. In that case you really should limit the scope to (at most) a file global (by declaring it static) variable and getter/setter methods that protect it from multiple access where that could be dangerous.
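A sketch of a file-scope variable behind getter/setter functions that are safe to call from several threads. It assumes a C11 toolchain that provides <stdatomic.h>. On many embedded targets the same idea would instead be built on an RTOS mutex or on briefly disabling interrupts. The names and the non-negative state values are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* File-scope and static: not visible outside this translation unit. */
static atomic_int machine_state;   /* hypothetical state-machine variable */

int state_get(void)
{
    return atomic_load(&machine_state);
}

void state_set(int new_state)
{
    assert(new_state >= 0);        /* states are assumed to be non-negative */
    atomic_store(&machine_state, new_state);
}

/* Change the state only if it still has the expected value; returns 1 on success. */
int state_transition(int expected, int new_state)
{
    return atomic_compare_exchange_strong(&machine_state, &expected, new_state);
}
```

state_transition covers the read-check-write case, which a separate get followed by a set cannot do safely when another thread may write in between.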

I'm in the "never" camp here; if you need a global variable, at least use a singleton pattern. That way, you reap the benefits of lazy instantiation, and you don't clutter up the global namespace.
