Re-implementing strlen (C)

Noob alert:
For learning purposes I've been given the task to re-implement the strlen() function. I've gotten the notion that this would be best done with a function-like macro rather than a function; my reasoning is that with a macro I wouldn't have to deal with passing a string to a function.
What are your thoughts? Is it better to create a proper function or a macro in this case?

Macros are expanded exactly once, when the program is compiled. Since time travel is not part of the C language, it is impossible for a future execution of a program to retroactively change the consequence of a macro. So if a computation, such as computing the length of a string, depends on information not known when the program is compiled, a macro is completely useless. Unless the string happens to be a literal, this will be the case. And I venture to assert that in the vast majority of cases, the string whose length is required did not exist when the program was compiled.
A clear understanding of what macros actually do -- modify the program text before compilation -- will help avoid distractions such as the suggestion in this question.
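As an aside, here is a minimal sketch of the task itself, re-implemented as a proper function (the name my_strlen is mine, to avoid clashing with the standard library; any decent compiler can inline calls to it):
#include <stddef.h>

size_t my_strlen(const char *s) {
    const char *p = s;
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}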
It can occasionally be useful to use strlen on a constant string literal, in order to avoid bugs which might be introduced in the future when the string literal is modified. For example, the following (which tests whether line starts with the text Hello):
/* Code smell: magic number */
if (strncmp(line, "Hello", 5) == 0) { ... }
Would be better written as:
/* Code smell: redundant repetition, see below */
if (strncmp(line, "Hello", strlen("Hello")) == 0) { ... }
Obviously, if a computation can be performed once at compile-time, it is better to do so rather than repeat it every time the program runs. Once upon a time, when compilers were primitive and almost incapable of understanding control flow, it made some sense to worry about such things, although even then many of the hand-optimisations were far too complicated for the minor benefits achieved.
Today, even this excuse is unavailable to the premature optimizer. Most modern C compilers are perfectly capable of replacing strlen("Hello") with the constant 5, so that the library function is never called. No macro magic is required to achieve this optimisation.
As indicated, the test in the example still has an unnecessary repetition of the prefix string. What we really want to write is:
if (startsWith(line, "Hello")) { ... }
Here the temptation to define startsWith as a macro will be very strong, since it seems like simple compile-time substitution would suffice. This temptation should be avoided. Modern compilers are also capable of "inlining" function calls; that is, inserting the body of the call directly into the code.
So the definition:
static int startsWith(const char* line, const char* prefix) {
    return strncmp(line, prefix, strlen(prefix)) == 0;
}
will be just as fast as its macro equivalent, and unlike the macro it will not lead to problems when it is called with a second argument with side effects:
/* Bad style but people do it */
if (startsWith(line, prefixes[++i])) { doAction(i); }
Once the call is inlined, the compiler can then proceed to apply other optimisations, such as the elimination of the call to strlen in case the prefix argument is a string literal.
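For contrast, here is a sketch of what the macro version might look like (STARTS_WITH is my name for this hypothetical macro) and why the side-effect example above breaks it:
/* Naive macro version: `prefix` appears twice in the expansion */
#define STARTS_WITH(line, prefix) \
    (strncmp((line), (prefix), strlen(prefix)) == 0)

/* STARTS_WITH(line, prefixes[++i]) expands to
   (strncmp((line), (prefixes[++i]), strlen(prefixes[++i])) == 0)
   so i is incremented twice (formally undefined behaviour, since the
   two increments are unsequenced) and the two uses may refer to
   different strings. */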

Related

Unit Testing a Function Macro

I'm writing unit tests for some function macros, which are just wrappers around some function calls, with a little housekeeping thrown in.
I've been writing tests all morning and I'm starting to get tedium of the brainpan, so this might just be a case of tunnel vision, but:
Is there a valid case to be made for unit testing macro expansion? By that I mean checking that the correct function behavior is produced for the various source-code forms of the function macro's arguments. For example, function arguments can take the form, in source code, of a:
literal
variable
operator expression
struct member access
pointer-to-struct member access
pointer dereference
array index
function call
macro expansion
(feel free to point out any that I've missed)
If the macro doesn't expand properly, then the code usually won't even compile. So then, is there even any sensible point in a different unit test if the argument was a float literal or a float variable, or the result of a function call?
Should the expansion be part of the unit test?
As I noted in a comment:
Using expressions such as value & 1 could reveal that the macros are careless, but code inspections can do that too.
I think going through the full panoply of tests is overkill; the tedium is giving you a relevant warning.
There is an additional mode of checking that might be relevant, namely side-effects such as: x++ + ++y as an argument. If the argument to the macro is evaluated more than once, the side-effects will probably be scrambled, or at least repeated. An I/O function (getchar(), or printf("Hello World\n")) as the argument might also reveal mismanagement of arguments.
It also depends in part on the rules you want to apply to the macros. However, if they're supposed to look and behave like function calls, they should evaluate each of their arguments exactly once: if a macro doesn't evaluate an argument at all, then side effects that would occur with a real function call won't occur.
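A minimal sketch of such a test, assuming a hypothetical macro TWICE that is specified to evaluate its argument exactly once; the side effect is a counter that the test checks afterwards:
#include <assert.h>

#define TWICE(x) (2 * (x))                  /* macro under test */

static int calls = 0;
static int next(void) { return ++calls; }   /* countable side effect */

int main(void) {
    int r = TWICE(next());   /* expands to (2 * (next())) */
    assert(r == 2);          /* first call returned 1 */
    assert(calls == 1);      /* argument evaluated exactly once */
    return 0;
}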
Also, don't underestimate the value of inline functions.
Based on the comments and some of the points made in @Jonathan Leffler's answer, I've come to the conclusion that this is something that is better tested in functional testing, preferably with a fuzz tester.
That way, using a couple of automation scripts, the fuzzer can throw a jillion arguments at the function macro and log those that either don't compile, produce compiler warnings, or compile and run but produce an incorrect result.
Since fuzz tests aren't expected to run quickly the way unit tests are, there's no problem just adding it to the fuzz suite and letting it run over the weekend.
The goal of testing is to find errors. And, your macro definitions can contain errors. Therefore, there is a case for testing macros in general, and unit-testing in particular can find many specific errors, as will be explained below.
Code inspection can obviously also be used to find errors; however, there are good reasons for doing both: unit tests can cheaply be repeated whenever the respective code is modified, say, for refactoring.
Code inspections cannot cheaply be repeated (at least, they cause more effort than re-running unit tests), but they can also find issues that tests can never detect, like wrong or bad documentation, or design problems such as code duplication.
That said, there are a number of issues you can find when unit-testing macros, some of which were already mentioned. It may in principle be possible that there are fuzz testers which also check for such problems, but I doubt that problems with macro definitions are yet a focus of fuzz testers:
wrong algorithm: Expressions and statements in macro definitions can be just as wrong as they can be in normal non-macro code.
insufficient parenthesization (as mentioned in the comments): This is a potential problem with macros, and it can be detected, possibly even at compile time, by passing expressions with low-precedence operators as macro arguments. For example, calling FOO(x = 2) in test code will lead to a compile error if FOO(a) is defined as (2 * a) instead of (2 * (a)).
unintended multiple use of arguments in the expansion (as mentioned by Jonathan): This also is a potential problem specific to macros. It should be part of the specification of a macro how often its arguments will be evaluated in the expanded code (and sometimes no fixed number can be given, see assert). Such statements about evaluation counts can be tested by passing macro arguments with side effects that the test code checks afterwards. For example, if FOO(a) is defined to be ((a) * (a)), then the call FOO(++x) will result in x being incremented twice rather than once.
unintended expansion: Sometimes a macro shall expand in a way that causes no code to be produced. assert with NDEBUG is an example here, which shall expand such that the expanded code will be optimized away completely. Whether a macro shall expand in such a way typically depends on configuration macros. To check that a macro actually 'disappears' for the respective configuration, syntactically wrong macro arguments can be used: FOO(++ ++) for example can be a compile-time test to see if instead of the empty expansion one of the non-empty expansions was used (whether this works, however, depends on whether the non-empty expansions use the argument).
bad semicolon: to ensure that a function-like macro expands cleanly into a compound statement (with a proper do-while(0) wrapper but without a trailing semicolon), a compile-time check like if (0) FOO(42); else .. can be used; see the sketch after this note.
Note: Those tests I mentioned as compile-time tests are, strictly speaking, just a form of static analysis. In contrast to using a static analysis tool, such tests have the benefit of specifically testing those properties that the macros are expected to have according to their design. For example, static analysis tools typically issue warnings when macro arguments are used without parentheses in the expansion; however, in many expansions parentheses are intentionally not used.
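As a sketch, the parenthesization and bad-semicolon checks described above might look like this in test code (FOO and bar are hypothetical names; FOO is written here to pass both checks):
void bar(int);

#define FOO(a) do { bar(2 * (a)); } while (0)

void compile_time_checks(int x) {
    /* Parenthesization check: had FOO used (2 * a) instead of
       (2 * (a)), this would expand to bar(2 * x = 2), which does
       not compile. */
    FOO(x = 2);

    /* Bad-semicolon check: compiles only if FOO expands to a single
       statement without a trailing semicolon. */
    if (0) FOO(42); else (void)0;
}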

What is better: function or define

I have a couple of simple functions like
#define JacobiLog(x1,x2) ((x1>x2)?x1:x2)+log(1+exp(-fabs(x1-x2)))
What is better to implement (code, compile, memory...) - as above with define or to write some simple function
double JacobiLog(double x1, double x2)
{
    return ((x1 > x2) ? x1 : x2) + log(1 + exp(-fabs(x1 - x2)));
}
The compiler will probably inline your function automatically. You should use the function, not the define.
Using the function will also avoid unexpected behaviour in cases where you use your define like this:
double num = JacobiLog(x++, y++);
I'll let you imagine the problems the textual replacement causes...
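Concretely, the preprocessor turns that call into the following (each argument is pasted in textually three times):
((x++>y++)?x++:y++)+log(1+exp(-fabs(x++-y++)))
/* x and y each get incremented two or three times, and since the two
   operands of the outer + are unsequenced, the result is undefined
   behaviour. The function version evaluates each argument once. */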
The define can possibly be a little faster, but most probably the compiler will inline the function anyway (or you can mark it inline), and they will be the same. But the function is better, because it is more readable and easier to debug.
The function is better, assuming a good compiler.
With the function, it is left to the compiler whether the code is inlined, or not (assuming the definition of the function is accessible to everyone who uses it, for example if it is an inline function declared in a header for C++, or just a plain function with all of its users in the same translation unit). With the macro, it is always inlined, which is not necessarily faster, as it may lead to code bloat and therefore more cache misses and page faults.
Not to mention macros are difficult to read and, even worse, to debug.
Even though the define is faster (since it avoids a function call), the compiler can optimise and inline your function, and make it just as fast.
If you are in a C++ environment, you should always use templates and functions. They will make your program more readable and prevent type errors.
In C, macros can be useful since the type is not specified (see the example below):
/* Will work with int, long, double, short, etc. */
#define HIGHER(VAL1, VAL2) ((VAL1) > (VAL2) ? (VAL1) : (VAL2))
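For instance, the same macro works unchanged across types:
int    i = HIGHER(3, 7);       /* 7, evaluated in int arithmetic */
long   l = HIGHER(10L, 2L);    /* 10L, long arithmetic */
double d = HIGHER(2.5, 1.0);   /* 2.5, double arithmetic */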
It's a micro-optimization. Unless you're doing embedded programming and every instruction counts, go with the function. Not to mention that the log is likely about 100x slower than the overhead to call a function. So you can only get about a 1% saving if your program consists mainly of calling this function. [1] Once your program starts doing significant other things, this saving will be reduced to basically unnoticeable.
The compiler is free to inline the function wherever possible, which would make the two identical. However, you can't force the compiler to do so. There is an inline keyword in C++, but it is just a hint; the compiler is free to ignore it.
See this for some differences between the two (this covers inline versus non-inline functions, but, as stated above, inline functions are essentially the same as #define's). The basic conclusion to the link is "it depends".
Also note that, behaviourally, a #define and a function are not 100% equivalent.
[1]: Figures largely made up. Benchmark if you want accurate results.
First (for a complete answer) we have to acknowledge that using a macro can have surprise side-effects which you might not intend, and that a function ensures that you know the incoming types and you know that each parameter is evaluated exactly once.
More often than not, these effects of using a macro are a source of problems.
Generally a compiler will inline the function as appropriate, and if it does its job right then it should have nearly all the advantages of a macro but without the rarely-intended side-effects.
Occasionally, though, you can actually get some benefits that an inlining compiler mightn't recognise. For example, your macro defers converting the arguments to double if they were int or long, and performs more of the operations in integer arithmetic (which might have a performance or precision advantage). You might also get integer overflow and incorrect results.
Since you included 'memory' in your list of "better" factors, it's tempting to say that the function is smaller (assuming you configure your compiler to optimise for size), but this isn't necessarily true.
Obviously as a function you need only one copy of it in memory and all callers can use that same code, whereas inlined or expanded at every use duplicates the code. Your compiler is very unlikely to isolate a macro and convert it into a function called from many different places in the code.
Where a never-inlined function can fail to be smaller is where it stands in the way of simplifications. There are three common cases I can think of:
If all of the uses of the function involve constant parameters, the inlined simplifications might come out smaller than the whole original function.
The register marshalling code required to execute a function call with the parameters in the correct registers can be longer than the function itself.
Adding a function call can add to the register pressure in the caller, forcing it to generate more complicated code, possibly forcing it to create a stack frame and save more registers on entry and exit.

Using a macro for a small operation, is this good practice?

I have a small piece of code that requires to read 4-bit values bitpacked in a 32-bit integer. Since I need to call this operation several times, even if it's simple, I require max speed on it.
I was pondering macros and inline functions, and so I made this macro:
#define UI32TO4(x, p) (x >> ((p - 1) *4) & 15)
And I have an inline function that does the same thing.
static inline Uint8 foo_getval(Uint32 bits, int pos) {
    return (bits >> ((pos - 1) * 4)) & 15;
}
Considering the simplicity of the operation, and that the values are already prepared for this call (so no possibility of calling on the wrong types, or pass values that are too big or that stuff), what would be the best one to use? Or, at least, the most comprehensible for someone else potentially reading/modifying the code later on?
EDIT! Forgot to mention, I am using C99.
The function is safer. Your assumption that the values are always "right" only holds while you're developing the code. You can't tell whether someone down the line (or you yourself, when tired) will pass unexpected values.
The compiler will do the inlining when it sees it as effective. Use type-safe functions whenever you can, use macros only when you have no other practical choice.
I would use the inline function because macros can cause unwanted side effects. Use macros only to save typing if necessary.
If a macro has the same name as a function in another compilation unit, you can get strange compilation errors. These problems can be hard to find, especially if the macro is expanded elsewhere and no error occurs there.
Additionally, a function warns you about parameter types and would not let you pass a double for pos. The macro would allow this.
It's late, and I'm grumpy (and I'll probably delete this post later), but I get tired of hearing the same arguments against macros parroted over and over again (a double redundancy):
Joachim Pileborg (above) states "using a function allows the compiler to do better typechecking". This is often stated, but I don't believe it. With macros, the compiler already has all the available type information at its fingertips. Functions simply destroy this. (And possibly destroy optimization, by pushing registers out to the stack, but that's a side issue.)
And frast (above) states "macros can cause unwanted side effects". True -- but so can functions. I think the rule is to always use UPPER_CASE for macros which don't have function semantics. That rule has often been broken. But it doesn't apply here: the OP has redundantly used both uppercase and function semantics.
But I would suggest a tiny improvement. The OP has quite correctly placed parentheses around the whole macro, but there should also be parentheses around each argument:
#define UI32TO4(x, p) ((x) >> (((p) - 1) * 4) & 15)
Always enclose your macro args in parentheses, unless you are doing string or token concatenation, etc.
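To see why, consider what the original, unparenthesized macro does to an operator expression passed as the first argument:
/* UI32TO4(a | b, p) with the original macro expands to
       (a | b >> ((p - 1) *4) & 15)
   and since >> and & bind tighter than |, that parses as
       a | ((b >> ((p - 1) * 4)) & 15)
   which is not the intended extraction. The fully parenthesized
   version above is immune to this. */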
Macros are, of course, dangerous, but so are functions. (And the less said of STL, the better).

C++ assignment - stylish or performance?

Having been writing Java code for many years, I was amazed when I saw this C++ statement:
int a,b;
int c = (a=1, b=a+2, b*3);
My question is: Is this a choice of coding style, or does it have a real benefit? (I am looking for a practical use case.)
I think the compiler will see it the same as the following:
int a=1, b=a+2;
int c = b*3;
(What's the official name for this? I assume it's standard C/C++ syntax.)
It's the comma operator, used twice. You are correct about the result, and I don't see much point in using it that way.
Looks like an obscure use of a , (comma) operator.
It's not a representative way of doing things in C++.
The only "good-style" use for the comma operator might be in a for statement that has multiple loop variables, used something like this:
// Copy from source buffer to destination buffer until we see a zero
for (char *src = source, *dst = dest; *src != 0; ++src, ++dst) {
    *dst = *src;
}
I put "good-style" in scare quotes because there is almost always a better way than to use the comma operator.
Another context where I've seen this used is with the ternary operator, when you want to have multiple side effects, e.g.,
bool didStuff = DoWeNeedToDoStuff() ? (Foo(), Bar(), Baz(), true) : false;
Again, there are better ways to express this kind of thing. These idioms are holdovers from the days when we could only see 24 lines of text on our monitors, and squeezing a lot of stuff into each line had some practical importance.
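For instance, the ternary example above is usually clearer as a plain if:
bool didStuff = false;
if (DoWeNeedToDoStuff()) {
    Foo();
    Bar();
    Baz();
    didStuff = true;
}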
Dunno its name, but it seems to be missing from the Job Security Coding Guidelines!
Seriously: C++ allows you to a do a lot of things in many contexts, even when they are not necessarily sound. With great power comes great responsibility...
This is called 'obfuscated C'. It is legal, but intended to confuse the reader. And it seems to have worked. Unless you're trying to be obscure it's best avoided.
Hotei
Your sample code uses two features of C expressions that are not very well known by beginners (but not really hidden either):
the comma operator: a normal binary operator whose role is to return the last of its two operands. If the operands are expressions, they are evaluated from left to right.
assignment as an operator that returns a value. C assignment is not a statement as in some other languages; it returns the value that has been assigned.
Most use cases of both these features involve some form of obfuscation, but there are some legitimate ones. The point is that you can use them anywhere you can provide an expression: inside an if or a while conditional, in a for loop iteration block, in function call parameters (if using a comma there, you must add parentheses to avoid confusion with actual function parameters), in macro parameters, etc.
The most usual use of the comma is probably in loop control, when you want to change two variables at once, or store some value before performing the loop test or the loop iteration.
For example, a reverse function can be written as below, thanks to the comma operator:
/* assuming a SWAP macro along these lines: */
#define SWAP(a, b) do { int tmp = (a); (a) = (b); (b) = tmp; } while (0)

void reverse(int *d, int len) {
    int i, j;
    for (i = 0, j = len - 1; i < j; i++, j--) {
        SWAP(d[i], d[j]);
    }
}
Another legitimate (not obfuscated, really) use of the comma operator I have in mind is a DEBUG macro I found in some project, defined as:
#if defined(DEBUGMODE)
#define DEBUG(x) printf x
#else
#define DEBUG(x) x
#endif
You use it like:
DEBUG(("my debug message with some value=%d\n", d));
If DEBUGMODE is on then you'll get a printf; if not, printf is not called, but the expression between parentheses is still valid C. The point is that any side effect in the printing code will apply in both release code and debug code, like the one introduced by:
DEBUG(("my debug message with some value=%d\n", d++));
With the above macro d will always be incremented regardless of debug or release mode.
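Spelled out, the two expansions of that call are:
/* DEBUGMODE defined: the DEBUG((...)) call becomes */
printf ("my debug message with some value=%d\n", d++);

/* DEBUGMODE undefined: it becomes a bare comma-operator expression */
("my debug message with some value=%d\n", d++);
/* the string literal is evaluated and discarded, but d++ still runs */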
There are probably some other rare cases where comma and assignment values are useful and the code is easier to write when you use them.
I agree that the assignment operator is a great source of errors because it can easily be confused with == in a conditional.
I agree that, as the comma is also used with a different meaning in other contexts (function calls, initialisation lists, declaration lists), it was not a very good choice for an operator. But basically it's no worse than using < and > for template parameters in C++, and it has been in C since much older days.
It's strictly coding style and won't make any difference in your program, especially since any decent C++ compiler will optimise it to
int a=1;
int b=3;
int c=9;
The math won't even be performed during assignment at runtime. (and some of the variables may even be eliminated entirely).
As to choice of coding style, I prefer the second example. Most of the time, less nesting is better, and you won't need the extra parentheses. Since the use of commas exhibited will be known to virtually all C++ programmers, you have some choice of style. Otherwise, I would say put each assignment on its own line.
Is this a choice of coding style, or does it have a real benefit? (I am looking for a practicle use case)
It's both a choice of coding style and it has a real benefit.
It's clearly a different coding style as compared to your equivalent example.
The benefit is that I already know I would never want to employ the person who wrote it, not as a programmer anyway.
A use case: Bob comes to me with a piece of code containing that line. I have him transferred to marketing.
You have found a hideous abuse of the comma operator written by a programmer who probably wishes that C++ had multiple assignment. It doesn't. I'm reminded of the old saw that you can write FORTRAN in any language. Evidently you can try to write Dijkstra's language of guarded commands in C++.
To answer your question, it is purely a matter of (bad) style, and the compiler doesn't care—the compiler will generate exactly the same code as from something a C++ programmer would consider sane and sensible.
You can see this for yourself if you make two little example functions and compile both with the -S option.

When to use function-like macros in C

I was reading some code written in C this evening, and at the top of the file was the function-like macro HASH:
#define HASH(fp) (((unsigned long)fp)%NHASH)
This left me wondering, why would somebody choose to implement a function this way using a function-like macro instead of implementing it as a regular vanilla C function? What are the advantages and disadvantages of each implementation?
Thanks a bunch!
Macros like that avoid the overhead of a function call.
It might not seem like much. But in your example, the macro turns into 1-2 machine language instructions, depending on your CPU:
Get the value of fp out of memory and put it in a register
Take the value in the register, do a modulus (%) calculation by a fixed value, and leave that in the same register
whereas the function equivalent would involve a lot more machine language instructions, generally something like:
Stick the value of fp on the stack
Call the function, which also puts the next (return) address on the stack
Maybe build a stack frame inside the function, depending on the CPU architecture and ABI convention
Get the value of fp off the stack and put it in a register
Take the value in the register, do a modulus (%) calculation by a fixed value, and leave that in the same register
Maybe take the value from the register and put it back on the stack, depending on CPU and ABI
If a stack frame was built, unwind it
Pop the return address off the stack and resume executing instructions there
A lot more code, eh? If you're doing something like rendering every one of the tens of thousands of pixels in a window in a GUI, things run an awful lot faster if you use the macro.
Personally, I prefer using C++ inline as being more readable and less error-prone, but inlines are also really more of a hint to the compiler which it doesn't have to take. Preprocessor macros are a sledge hammer the compiler can't argue with.
One important advantage of a macro-based implementation is that it is not tied to any concrete parameter type. A function-like macro in C acts, in many respects, like a template function in C++ (templates in C++ were born as "more civilized" macros, BTW). In this particular case the argument of the macro has no concrete type. It might be absolutely anything that is convertible to type unsigned long. For example, if the user so pleases (and if they are willing to accept the implementation-defined consequences), they can pass pointer types to this macro.
Anyway, I have to admit that this macro is not the best example of the type-independent flexibility of macros, but in general that flexibility comes in handy quite often. Again, when certain functionality is implemented by a function, it is restricted to specific parameter types. In many cases, in order to apply a similar operation to different types, it is necessary to provide several functions with different parameter types (and different names, since this is C), while the same can be done by just one function-like macro. For example, the macro
#define ABS(x) ((x) >= 0 ? (x) : -(x))
works with all arithmetic types, while a function-based implementation has to provide quite a few variants (I'm referring to the standard abs, labs, llabs and fabs). (And yes, I'm aware of the traditionally mentioned dangers of such a macro.)
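For example, a single macro covers calls that would otherwise need three different standard functions:
int    i = ABS(-5);     /* 5,   would otherwise be abs()  */
long   l = ABS(-5L);    /* 5L,  would otherwise be labs() */
double d = ABS(-5.0);   /* 5.0, would otherwise be fabs() */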
Macros are not perfect, but the popular maxim about "function-like macros being no longer necessary because of inline functions" is just plain nonsense. In order to fully replace function-like macros, C would need function templates (as in C++) or at least function overloading (as in C++ again). Without those, function-like macros are and will remain an extremely useful mainstream tool in C.
On one hand, macros are bad because they're handled by the preprocessor, which doesn't understand anything about the language and does text replacement. They usually have plenty of limitations. I can't see any in the example above, but usually macros are ugly solutions.
On the other hand, they are at times even faster than a static inline method. I was heavily optimizing a short program and found that calling a static inline method takes about twice as much time (just overhead, not actual function body) as compared with a macro.
The most common (and most often wrong) reason people give for using macros (in "plain old C") is the efficiency argument. Using them for efficiency is fine if you have actually profiled your code and are optimizing a true bottleneck (or are writing a library function that might be a bottleneck for somebody someday). But most people who insist on using them have not actually analyzed anything and are just creating confusion where it adds no benefit.
Macros can also be used for some handy search-and-replace type substitutions which the regular C language is not capable of.
Some problems I have had in maintaining code written by macro abusers are that the macros can look quite like functions but do not show up in the symbol table, so it can be very annoying trying to trace them back to their origins in sprawling code bases (where is this thing defined?!). Writing macros in ALL CAPS is obviously helpful to future readers.
If they are more than fairly simple substitutions, they can also create some confusion if you have to step-trace through them with a debugger.
Your example is not really a function at all,
#define HASH(fp) (((unsigned long)fp)%NHASH)
// this is a cast ^^^^^^^^^^^^^^^
// this is your value 'fp' ^^
// this is a MOD operation ^^^^^^
I'd think this was just a way of writing more readable code, with the casting and mod operation wrapped into a single macro HASH(fp).
Now, if you decide to write a function for this, it would probably look like,
int hashThis(int fp)
{
    return ((fp) % NHASH);
}
Quite overkill for a function, as it:
introduces a call point
introduces call-stack setup and restore
The C Preprocessor can be used to create inline functions. In your example, the code will appear to call the function HASH, but instead is just inline code.
The benefits of macro functions were largely eliminated when C++ introduced inline functions. Many older APIs like MFC and ATL still use macro functions for preprocessor tricks, but it just leaves the code convoluted and harder to read.
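For comparison, a minimal C99 sketch of the inline-function alternative to HASH (NHASH is assumed to be defined elsewhere, as in the original code; the value here is made up):
#define NHASH 101   /* assumption: some hash-table size */

/* Type-checked alternative; a modern compiler will expand this at the
   call site just like the macro. The pointer-to-integer conversion is
   implementation-defined, exactly as in the macro version. */
static inline unsigned long hash(const void *fp) {
    return (unsigned long)fp % NHASH;
}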
