I have legacy C code base at work and I find a lot of function implementations in the style below.
char *DoStuff(char *inPtr, char *outPtr, char *error, long *amount)
{
    *error = 0;
    *amount = 0;
    // Read bytes from inPtr and decode them as a long storing in amount
    // before returning as a formatted string in outPtr.
    return (outPtr);
}
Using DoStuff:
myOutPtr = DoStuff(myInPtr, myOutPtr, myError, &myAmount);
I find that pretty obtuse and when I need to implement a similar function I end up doing:
long NewDoStuff(char *inPtr, char *error)
{
    long amount = 0;
    *error = 0;
    // Read bytes from inPtr and decode them as a long storing in amount.
    return amount;
}
Using NewDoStuff:
myAmount = NewDoStuff(myInPtr, myError);
myOutPtr += sprintf(myOutPtr, "%ld", myAmount);
I can't help wondering if there is something I'm missing with the top example. Is there a good reason to use that type of approach?
One advantage is that if you have many, many calls to these functions in your code, it will quickly become tedious to have to repeat the sprintf calls over and over again.
Also, returning the out pointer makes it possible for you to do things like:
DoOtherStuff(DoStuff(myInPtr, myOutPtr, myError, &myAmount), &myOther);
With your new approach, the equivalent code is quite a lot more verbose:
myAmount = NewDoStuff(myInPtr, myError);
myOutPtr += sprintf(myOutPtr, "%ld", myAmount);
myOther = DoOtherStuff(myInPtr, myError);
myOutPtr += sprintf(myOutPtr, "%ld", myOther);
It is the C standard library style. The return value is there to aid chaining of function calls.
Also, DoStuff is cleaner IMO. And you really should be using snprintf. With DoStuff, a change in the internals of buffer management does not affect your code; that is no longer true with NewDoStuff.
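For illustration only, here is a minimal sketch of what a bounded-buffer DoStuff could look like; the DoStuffBounded name, the outSize parameter and the strtol-based decoding are assumptions of mine, not part of the original code:
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bounded variant: outSize is an added parameter, and strtol
   stands in for whatever decoding the real code does. */
char *DoStuffBounded(const char *inPtr, char *outPtr, size_t outSize,
                     char *error, long *amount)
{
    *error = 0;
    *amount = strtol(inPtr, NULL, 10);         /* decode the bytes as a long */
    snprintf(outPtr, outSize, "%ld", *amount); /* bounded write into outPtr  */
    return outPtr;                             /* returning outPtr keeps chaining possible */
}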
The code you presented is a little unclear (for example, why are you adding the result of the sprintf to myOutPtr?).
However, in general, what you're essentially describing is breaking one function that does two things into a function that does one thing plus caller-side code that does the other thing (the formatting/concatenation).
Separating responsibilities into two functions is a good idea, but then you would want a separate function for the concatenation and formatting as well; as written, that part stays scattered around the call sites.
In addition, every time you break one function call into multiple calls at each call site, you are creating code replication. Code replication is never a good idea, so you would need a function that bundles the two steps back together, and you will end up (this being C) with something that looks like your original DoStuff.
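For example, the re-bundling helper might end up looking like the sketch below; the DoStuffAndFormat name, the %ld format and the pointer-advancing return are my assumptions:
#include <stdio.h>

long NewDoStuff(char *inPtr, char *error);   /* as defined in the question */

/* Hypothetical wrapper: decode with NewDoStuff, format into outPtr, and
   return the advanced pointer so the caller can keep appending. */
char *DoStuffAndFormat(char *inPtr, char *outPtr, char *error, long *amount)
{
    *amount = NewDoStuff(inPtr, error);
    return outPtr + sprintf(outPtr, "%ld", *amount);
}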
So I am not sure that there is much you can do about this. One of the limitations of non-OOP languages is that you end up passing large numbers of parameters (unless you use structs). You might not be able to avoid the giant interface.
If you wind up having to do the sprintf call after every call to NewDoStuff, then you are repeating yourself (and therefore violating the DRY principle). When you realize that you need to format it differently you will need to change it in every location instead of just the one.
As a rule of thumb, if the interface to one of my functions exceeds 110 columns, I look strongly at using a structure (and at whether I'm taking the best approach). What I don't (ever) want to do is take a function that does 5 things and break it into 5 functions, unless some functionality within the function is not only useful, but needed on its own.
I would favor the first function, but I'm also quite accustomed to the standard C style.
My program answers incoming messages and does some logic based on IDs and data included in the messages.
I have a different function for each ID.
The project is pure C.
To make the code easy to work with I have adjusted all functions to the same style (same return type and parameters).
I also want to avoid long switch-case constructions and make the code easier to edit later, so I have created the following function:
AnswerStruct IDHandler(Request Message)
{
    AnswerStruct ANS;
    SIDHandler = IDfunctions[Message.ID];
    ANS = SIDHandler(Message);
    return ANS;
}
AnswerStruct is the struct for answer messages.
Request is the struct for incoming messages.
IDfunctions is an array of pointers to functions, which looks like this:
AnswerStruct func1(Request);
AnswerStruct func4(Request);
...
typedef AnswerStruct(*f)(Request);
AnswerStruct (*SIDHandler)(Request);
static f IDfunctions[IDMax] = {0, func1, 0, 0, func4, ...};
Function pointers are placed in the array cells matching their IDs, for example:
func1 relates to messages with ID=1.
func4 relates to messages with ID=4.
I think that by using this array I make my life much easier.
I can call the function I need in one step (just go to IDfunctions[ID]).
Also, adding new functions becomes a two-step operation (just add the function to IDfunctions and write its logic).
I doubt the efficiency of the selected solution; it seems clunky to me.
The question is - Is this a good architecture?
If not, how can I change my solution to make it better?
Thanks.
I doubt the efficiency of the selected solution; it seems clunky to me.
It can be less efficient to call a function via a function pointer than to call it directly by name, because the former denies the compiler any opportunity to optimize the call. But you have to consider whether that actually matters. In a system that dispatches function calls based on messages received from an external source, the I/O involved in receiving the messages is likely to be much more expensive than the indirect function calls, so the difference in call performance is unlikely to be significant.
On the other hand, your approach affords simpler logic and many fewer lines of code, which is a different and potentially more valuable kind of efficiency.
The question is - Is this a good architecture?
The general approach is perfectly good, and I don't see much to complain about in the implementation sketch provided.
Personally, I would declare the array IDfunctions to be const (supposing, of course, that you don't intend to replace any of its members after their initialization), but that's a minor safety / performance detail, where again the performance dimension is probably irrelevant.
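As a sketch of that, reusing the typedef f and the types from the question (the bounds check and the hypothetical defaultAnswer fallback are my additions, not something the original necessarily needs):
static AnswerStruct defaultAnswer(Request Message);   /* hypothetical fallback */

/* const prevents the table from being accidentally reassigned at run time. */
static const f IDfunctions[IDMax] = { 0, func1, 0, 0, func4 /* , ... */ };

AnswerStruct IDHandler(Request Message)
{
    if (Message.ID < IDMax && IDfunctions[Message.ID] != 0)
        return IDfunctions[Message.ID](Message);
    return defaultAnswer(Message);   /* unknown or unhandled ID */
}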
I'm writing my first C library and I'm not sure which way to go about it. For example, a function to retrieve a string value from some data store could look like:
int get_value(void * store, char ** result);
or
char * get_value(void * store, int * error);
I'm having a hard time coming up with any objective reason to prefer one over the other, but then again, I don't write C that much. The return-error-code style looks more consistent when multiple output parameters are present; however, the return-value style could be a bit easier to use? Not sure.
Is there a general consensus on which style is better and why, or is it just personal preference?
There tend not to be good, "hard" answers to style-based questions like this. What follows are my opinions; others will disagree.
Having a function simply return its return value usually makes it easier for the caller -- as long as the caller is interested in getting answers, not necessarily in optimal error handling.
Having all functions return success/failure codes -- returning any other data via "result" parameters -- makes for clean and consistent error handling, but tends to be less convenient for callers. You're always having to declare extra variables (of the proper type) to hold return values. You can't necessarily write things like a = f(g());.
Returning ordinary values ordinarily, and indicating errors via an "out of band" value of the ordinary return type, is a popular technique -- the canonical example is the Standard C getchar function -- but it can feel rather ad-hoc and error-prone.
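A getchar-style sketch of that pattern (the function name and the choice of -1 as the error marker are mine, and it only works because -1 cannot be a legitimate value here):
#include <stddef.h>

/* Returns the stored value, or -1 as an out-of-band "no value" marker. */
long get_value_or_fail(const long *store)
{
    if (store == NULL)
        return -1;        /* error reported in the same channel as the data */
    return *store;
}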
Returning values via the return value, and error codes via a "return" parameter, is unusual. I can see the attraction, but I can't say I've ever used that technique, or would. If there needs to be an error return distinct from the return value, the "C way" (though of course this is generally a pretty bad idea, and now pretty strongly deprecated) is to use some kind of globalish variable, à la errno.
If you want to pay any heed to the original "spirit of C", it was very much for programmer convenience, and was not too worried about rigid consistency, and was generally okay with healthy dollops of inconsistency and ad-hocciness. So using out-of-band error returns is fine.
If you want to pay heed to modern usage, it seems to be increasingly slanted towards conformity and correctness, meaning that consistent error return schemes are a good thing, even if they're less convenient. So having the return value be a success/failure code, and data returned by a result parameter, is fine.
If you want to have the return value be the return value, for convenience, and if errors are unusual, but for those callers who care you want to give them a way of getting fine-grained error information, a good compromise is sometimes to have a separate function to fetch the details of the most-recent error. This can still lead to the same kinds of race conditions as a global variable, but if your library uses some kind of "descriptors" or "handles", such that you can arrange to have this error-details function return the details of the most recent operation on a particular handle, it can work pretty well.
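A sketch of that handle-based compromise; every name here is hypothetical rather than an existing API:
#include <stddef.h>

#define STORE_ERR_NONE      0
#define STORE_ERR_NOT_FOUND 1

typedef struct store {
    /* ...the actual data... */
    int last_error;                      /* error from the most recent call */
} store;

char *store_lookup(store *s, const char *key);   /* hypothetical internal lookup */

/* Convenient return value for callers who just want the answer. */
char *get_value(store *s, const char *key)
{
    char *result = store_lookup(s, key);
    s->last_error = (result == NULL) ? STORE_ERR_NOT_FOUND : STORE_ERR_NONE;
    return result;
}

/* Fine-grained error details for callers who care. */
int get_last_error(const store *s)
{
    return s->last_error;
}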
I have a 3500-line-long C function written by someone else that I need to break apart without causing any regression. Since it is in C, the main problem I face is maintaining the state of the variables. For example, if I break a small part of the code into another function, I will need to pass 10 arguments, and some of them will actually change inside the new function, so effectively I will need to pass pointers to them. It becomes very messy. Is there a better way of dealing with such refactoring? Any "best practices"?
Unit testing. Extract small portions of the code that depend on 3 variables or fewer (1 variable is best) and test the hell out of them. Replace that code in the original function with a call to the new function.
Each function should do one thing that is easy to figure out from examining the code.
Instead of passing 10 variables around, put them into a structure and pass that around.
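A sketch of that, with made-up field names standing in for whatever the real locals are:
/* Hypothetical context struct: the locals the extracted pieces actually share. */
typedef struct parse_ctx {
    const char *cursor;   /* current read position in the input  */
    long        amount;   /* running value being accumulated     */
    int         error;    /* first error encountered, 0 = none   */
} parse_ctx;

/* An extracted piece takes one pointer instead of ten parameters; anything
   it changes is visible to the caller through the struct. */
static void decode_amount(parse_ctx *ctx)
{
    /* ...body lifted out of the big function... */
    (void)ctx;
}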
In my opinion, the best thing you can do is to thoroughly study that function and fully understand its internals. It is more than probable that this function has a lot of antipatterns inside it, so I would not try to refactor it: once I knew how it works (which, I understand, can take a lot of time), I'd throw it away and rewrite the equivalent smaller functions from scratch.
Pack the local variables that are shared between multiple of the sub-functions into a struct and hand the struct around?
Are you stuck with C? I have sometimes converted such functions into a C++ class, where I convert some (or all) of the local variables into member variables. Once this step has been done, you can easily break out parts of the code into methods that work on the member variables.
In practice this means that a function like:
... do_xxx(...)
{
.. some thousand lines of code...
}
can be converted into:
class xxx_handler
{
public:
    xxx_handler(...);
    ... run(...)
    {
        part1();
        part2();
        part3();
        return ...;
    }
private:
    // Member variables go here.
};

// New replacement function.
... do_xxx(...)
{
    xxx_handler handler(...);
    return handler.run(...);
}
One thing to start with, as a first step to taking out parts of the function as independent functions, is to move "function global" temp variables to be in tighter scope, something like:
int temp;
temp = 5;
while(temp > 0) {...}
...
temp = open(...);
if (temp < 0) {...}
converted to
{
    int temp = 5;
    while(temp > 0) {...}
}
...
{
    int temp = open(...);
    if (temp < 0) {...}
}
After that, it's easier to move each {} block into a separate function, which does one well-defined thing.
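For example, the second block above could become its own function; the name, the path parameter, and O_RDONLY are placeholders for whatever the original code actually uses:
#include <fcntl.h>

static int open_input(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        /* ...the original error handling... */
    }
    return fd;
}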
But then, the most important guideline after having unit tests:
Use version control which supports "cherry-picking" (like git). Commit often, basically whenever it compiles after refactoring something, then commit again (or amend the previous commit if you don't want to have the first commit version around) when it actually works. Learn to use version control's diff tool, and cherry-picking, when you need to roll back after breaking something.
An Example
Suppose we have some text to write that could be converted to uppercase or lowercase, and that can be printed at the left, center, or right.
Specific case implementation (too many functions)
writeInUpperCaseAndCentered(char *str){//..}
writeInLowerCaseAndCentered(char *str){//..}
writeInUpperCaseAndLeft(char *str){//..}
and so on...
vs
Many-argument function (bad readability, and even hard to code without a nice autocompletion IDE)
write( char *str , int toUpper, int centered ){//..}
vs
Context-dependent (hard to reuse, hard to code, uses ugly globals, and sometimes it is even impossible to "detect" a context)
void writeComplex(char *str)
{
    // analyze str and perhaps some global variables and
    // (under who knows what rules) put it center/left/right and upper/lowercase
}
And perhaps there are other options... (and they are welcome).
The question is:
Is there any good practice or experience/academic advice for this (recurrent) trilemma?
EDIT:
What I usually do is combine the "specific case" implementation with an internal (I mean, not in the header) general many-argument function, implementing only the cases that are actually used and hiding the ugly code, but I don't know if there is a better way. This kind of thing makes me realize why OOP was invented.
I'd avoid your first option because, as you say, the number of functions you end up having to implement (though possibly only as macros) can grow out of control. The count doubles when you decide to add italic support, and doubles again for underline.
I'd probably avoid the second option as well. Again, consider what happens when you find it necessary to add support for italics or underlines. Now you need to add another parameter to the function, find all the places where you called it, and update those calls. In short, annoying, though once again you could probably simplify the process with appropriate use of macros.
That leaves the third option. You can actually get some of the benefits of the other alternatives by using bit flags. For example:
#define WRITE_FORMAT_LEFT 1
#define WRITE_FORMAT_RIGHT 2
#define WRITE_FORMAT_CENTER 4
#define WRITE_FORMAT_BOLD 8
#define WRITE_FORMAT_ITALIC 16
....
void write(char *string, unsigned int format)
{
    if (format & WRITE_FORMAT_LEFT)
    {
        // write left
    }
    ...
}
EDIT: To answer Greg S.
I think that the biggest improvement is that if I decide, at this point, to add support for underlined text, it takes two steps:
Add #define WRITE_FORMAT_UNDERLINE 32 to the header
Add the support for underlines in write().
At this point I can call write(..., ... | WRITE_FORMAT_UNDERLINE) wherever I like. More to the point, I don't need to modify pre-existing calls to write, which I would have to do if I added a parameter to its signature.
Another potential benefit is that it allows you do something like the following:
#define WRITE_ALERT_FORMAT (WRITE_FORMAT_CENTER | \
WRITE_FORMAT_BOLD | \
WRITE_FORMAT_ITALIC)
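A call then only has to name the style it wants; the message text here is arbitrary:
write("Disk almost full", WRITE_ALERT_FORMAT);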
I prefer the argument way.
Because there's going to be some code that all the different scenarios need to use. Making a function out of each scenario will produce code duplication, which is bad.
Instead of using an argument for each different case (toUpper, centered etc..), use a struct. If you need to add more cases then you only need to alter the struct:
typedef struct {
    int toUpper;
    int centered;
    // etc...
} cases;

write(char *str, cases c){ //.. }
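A call site might then look like this (str is assumed to be an existing string, and designated initializers require C99):
cases c = { .toUpper = 1, .centered = 0 };   /* arbitrary example values */
write(str, c);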
I'd go for a combination of methods 1 and 2.
Code a method (A) that has all the arguments you need or can think of right now, and a "bare" version (B) with no extra arguments. This version can call the first method with the default values. If your language supports it, add default arguments. I'd also recommend that you use meaningful names for your arguments and, where possible, enumerations rather than magic numbers or a series of true/false flags. This will make it far easier to read your code and see what values are actually being passed, without having to look up the method definition.
This gives you a limited set of methods to maintain and 90% of your usages will be the basic method.
If you need to extend the functionality later add a new method with the new arguments and modify (A) to call this. You might want to modify (B) to call this as well, but it's not necessary.
I've run into exactly this situation a number of times -- my preference is none of the above, but instead to use a single formatter object. I can supply it with the number of arguments necessary to specify a particular format.
One major advantage of this is that I can create objects that specify logical formats instead of physical formats. This allows, for example, something like:
Format title = {upper_case, centered, bold};
Format body = {lower_case, left, normal};
write(title, "This is the title");
write(body, "This is some plain text");
Decoupling the logical format from the physical format gives you roughly the same kind of capabilities as a style sheet. If you want to change all your titles from italic to bold-face, change your body style from left justified to fully justified, etc., it becomes relatively easy to do that. With your current code, you're likely to end up searching through all your code and examining "by hand" to figure out whether a particular lower-case, left-justified item is body-text that you want to re-format, or a foot-note that you want to leave alone...
As you already mentioned, one striking point is readability: writeInUpperCaseAndCentered("Foobar!") is much easier to understand than write("Foobar!", true, true), although you could eliminate that problem by using enumerations. On the other hand, having arguments avoids awkward constructions like:
if (foo)
    writeInUpperCaseAndCentered("Foobar!");
else if (bar)
    writeInLowerCaseAndCentered("Foobar!");
else
    ...
In my humble opinion, this is a very strong argument (no pun intended) for the argument way.
I suggest more cohesive functions as opposed to superfunctions that can do all kinds of things unless a superfunction is really called for (printf would have been quite awkward if it only printed one type at a time). Signature redundancy should generally not be considered redundant code. Technically speaking it is more code, but you should focus more on eliminating logical redundancies in your code. The result is code that's much easier to maintain with very concise, well-defined behavior. Think of this as the ideal when it seems redundant to write/use multiple functions.
Important: Please see this very much related question: Return multiple values in C++.
I'm after how to do the same thing in ANSI C. Would you use a struct or pass the addresses of the params to the function? I'm after extremely efficient (fast) code (time and space), even at the cost of readability.
EDIT: Thanks for all the answers. Ok, I think I owe some explanation: I'm writing this book about a certain subset of algorithms for a particular domain. I have set myself the quite arbitrary goal of making the most efficient (time and space) implementations for all my algos to put up on the web, at the cost of readability and other stuff. That is in part the nature of my (general) question.
Answer: I hope I get this straight, from (possibly) fastest to more common-sensical (all of this a priori, i.e. without testing):
1. Store outvalues in a global object (I would assume something like outvals[2]?), or
2. Pass outvalues as params in the function (foo(int in, int *out1, int *out2)), or
3. Return a struct with both outvals, or
4. (3) only if the values are semantically related.
Does this make sense? If so, I think Jason's response is the closest, even though they all provide some piece of the "puzzle". Robert's is fine, but at this time semantics is not what I'm after (although his advice is duly noted).
Both ways are valid, certainly, but I would consider the semantics (struct vs. parameter reference) to decide which way best communicates your intentions to the programmer.
If the values you are returning are tightly coupled, then it is okay to return them as a structure. But if you are simply creating an artificial mechanism to return values together (as a struct), then you should use parameter references (i.e., pass the addresses of the variables) to return the values back to the calling function.
As Neil says, you need to judge it for yourself.
To avoid the cost of passing anything, use a global. Next best is a single structure passed by pointer/reference. After that are individual pointer/reference params.
However, if you have to pack data into the structure and then read it back out after the call, you may be better off passing individual parameters.
If you're not sure, just write a bit of quick test code using both approaches, execute each a few hundred thousand times, and time them to see which is best.
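A minimal harness along those lines might look like the sketch below; by_pointer and by_struct are stand-ins you would replace with your real implementations, the loop count is arbitrary, and clock() is fairly coarse:
#include <stdio.h>
#include <time.h>

struct pair { int a, b; };

void by_pointer(int in, int *out1, int *out2);   /* supply your own definition */
struct pair by_struct(int in);                   /* supply your own definition */

int main(void)
{
    enum { N = 1000000 };
    int o1, o2, i;
    struct pair p = { 0, 0 };
    clock_t t0, t1;

    t0 = clock();
    for (i = 0; i < N; i++)
        by_pointer(i, &o1, &o2);
    t1 = clock();
    printf("out-params: %f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    t0 = clock();
    for (i = 0; i < N; i++)
        p = by_struct(i);
    t1 = clock();
    printf("struct:     %f s (last = %d,%d)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC, p.a, p.b);

    return 0;
}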
You have described the two possible solutions and your perceived performance constraint. Where you go from here is really up to you - we don't have enough information to make an informed judgement.
Passing addresses into the function is easiest to read, and it should be fast too; pushes and pops are cheap:
void somefunction (int inval1, int inval2, int *outval1, int *outval2) {
    int x = inval1;
    int y = inval2;
    // do some processing
    *outval1 = x;
    *outval2 = y;
    return;
}
The fastest quick-and-dirty way that I can think of is to pass the values in a global object; this way you skip the stack operations. Just keep in mind that it won't be thread-safe.
I think that when you return a struct pointer, you probably need to find memory for it manually. Addresses passed in the parameter list point to variables allocated on the stack, which is much faster.
Keep in mind that it is sometimes faster to pass parameters by value and update them on return (or make local copies on the stack) than to pass them by reference... This is very evident with small structures or few parameters and lots of accesses.
This depends massively on your architecture, and also if you expect (or can have) the function inlined. I'd first write the code in the simplest way, and then worry about speed if that shows up as an expensive part of your code.
I would pass the address to a struct. If the information to be returned isn't complex, then just passing in the addresses to the values would work too.
Personally, it really comes down to how messy the interface would be.
void SomeFunction( ReturnStruct* myReturnVals )
{
// Fill in the values
}
// Do some stuff
ReturnStruct returnVals;
SomeFunction( &returnVals);
// Do more stuff
In either case, you're passing references, so performance should be similar. If there is a chance that the function never actually produces a value, the malloc'd-struct approach at least lets you skip the allocation in that case, since you'd simply return NULL.
My personal preference is to return a dynamically allocated (malloc'd) struct. I avoid using function arguments for output because I think it makes code more confusing and less maintainable in the long-term.
Returning a pointer to a local structure is bad because, if the struct was declared as non-static inside the function, its storage is gone once you exit the function.
And to all the folks suggesting references, well the OP did say "C," and C doesn't have them (references).
And sweet feathery Jesus, can I wake up tomorrow and not have to see anything about the King of Flop on TV?