Naming convention for allocated/non-allocated strings

Naming convention for allocated/non-allocated strings - c

Are there any common naming conventions to differentiate between strings that are allocated and not allocated? What I am looking for is hopefully something similar to us/s from
Making Wrong Code Look Wrong, but I'd rather use something common than make up my own.

The us / s convention in the article is somewhat useful, but it misses a greater point. The point is that variable naming conventions are only as good at protecting from missteps as documentation.
For example:
#Version 1
char* sHello = "Hello";
printf("%s\n", sHello);
which gets modified eventually to
#Version 2
char* sHello = "Hello";
... 100 lines of code ...
sHello = realloc(sHello, strlen(sHello)+7);
strcpy(sHello+strlen(sHello), " World");
... 100 lines of code ...
printf("%s\n", sHello);
When you really want to make a coding convention work, you can't rely on some sort of arbitrary (and yes, they are all arbitrary) naming conventions. You have to back up that naming convention with a means of failing the build. In the above case, judicious use of the keyword const would have done the trick, but the coding convention will lull people into complacency that it's right because it's named a certain way.
It's the documentation problem all over again, eventually in-code documentation gets out-of-sync with the code, and your in-code naming convention will eventually get out-of-sync with the usage of those names.
Now, if you want to add typing information to C (which is really what you're getting at), do it via a system that actually adds types to c, typedef. Then put in place a system that can verify if the type is not being used when it is required and fail the build. Anything else will cost you too much after-the-fact searching / maintenance / cleanup, and half of an implemented coding style is just so much worse than even a bad coding style.

Related

Using C preprocessor macros for function naming idiomatic?

I'm writing a Scheme interpreter. For each built-in type (integer, character, string, etc) I want to have the read and print functions named consistently:
READ_ERROR Scheme_read_integer(FILE *in, Value *val);
READ_ERROR Scheme_read_character(FILE *in, Value *val);
I want to ensure consistency in the naming of these functions
#define SCHEME_READ(type_) Scheme_read_##type_
#define DEF_READER(type_, in_strm_, val_) READ_ERROR SCHEME_READ(type_)(FILE *in_strm_, Value *val_)
So that now, instead of the above, in code I can write
DEF_READER(integer, in, val)
{
// Code here ...
}
DEF_READER(character, in, val)
{
// Code here ...
}
and
if (SOME_ERROR != SCHEME_READ(integer)(stdin, my_value)) do_stuff(); // etc.
Now is this considered an unidiomatic use of the preprocessor? Am I shooting myself in the foot somewhere unknowingly? Should I instead just go ahead and use the explicit names of the functions?
If not are there examples in the wild of this sort of thing done well?

I've seen this done extensively in a project, and there's a severe danger of foot-shooting going on.
The problem happens when you try to maintain the code. Even though your macro-ized function definitions are all neat and tidy, under the covers you get function names like Scheme_read_integer. Where this can become an issue is when something like Scheme_read_integer appears on a crash stack. If someone does a search of the source pack for Scheme_read_integer, they won't find it. This can cause great pain and gnashing of teeth ;)
If you're the only developer, and the code base isn't that big, and you remember using this technique years down the road and/or it's well documented, you may not have an issue. In my case it was a very large code base, poorly documented, with none of the original developers around. The result was much tooth-gnashing.
I'd go out on a limb and suggest using a C++ template, but I'm guessing that's not an option since you specifically mentioned C.
Hope this helps.

I'm usually a big fan of macros, but you should probably consider inlined wrapper functions instead. They will add negligible runtime overhead and will appear in stack backtraces, etc., when you're debugging.

Managing without Objects in C - And, why can I declare variables anywhere in a function in C?

everyone. I actually have two questions, somewhat related.
Question #1: Why is gcc letting me declare variables after action statements? I thought the C89 standard did not allow this. (GCC Version: 4.4.3) It even happens when I explicitly use --std=c89 on the compile line. I know that most compilers implement things that are non-standard, i.e. C compilers allowing // comments, when the standard does not specify that. I'd like to learn just the standard, so that if I ever need to use just the standard, I don't snag on things like this.
Question #2: How do you cope without objects in C? I program as a hobby, and I have not yet used a language that does not have Objects (a.k.a. OO concepts?) -- I already know some C++, and I'd like to learn how to use C on it's own. Supposedly, one way is to make a POD struct and make functions similar to StructName_constructor(), StructName_doSomething(), etc. and pass the struct instance to each function - is this the 'proper' way, or am I totally off?
EDIT: Due to some minor confusion, I am defining what my second question is more clearly: I am not asking How do I use Objects in C? I am asking How do you manage without objects in C?, a.k.a. how do you accomplish things without objects, where you'd normally use objects?
In advance, thanks a lot. I've never used a language without OOP! :)
EDIT: As per request, here is an example of the variable declaration issue:
/* includes, or whatever */
int main(int argc, char *argv[]) {
int myInt = 5;
printf("myInt is %d\n", myInt);
int test = 4; /* This does not result in a compile error */
printf("Test is %d\n", test);
return 0;
}

c89 doesn't allow this, but c99 does. Although it's taken a long time to catch on, some compilers (including gcc) are finally starting to implement c99 features.
IMO, if you want to use OOP, you should probably stick to C++ or try out Objective C. Trying to reinvent OOP built on top of C again just doesn't make much sense.
If you insist on doing it anyway, yes, you can pass a pointer to a struct as an imitation of this -- but it's still not a good idea.
It does often make sense to pass (pointers to) structs around when you need to operate on a data structure. I would not, however, advise working very hard at grouping functions together and having them all take a pointer to a struct as their first parameter, just because that's how other languages happen to implement things.
If you happen to have a number of functions that all operate on/with a particular struct, and it really makes sense for them to all receive a pointer to that struct as their first parameter, that's great -- but don't feel obliged to force it just because C++ happens to do things that way.
Edit: As far as how you manage without objects: well, at least when I'm writing C, I tend to operate on individual characters more often. For what it's worth, in C++ I typically end up with a few relatively long lines of code; in C, I tend toward a lot of short lines instead.
There is more separation between the code and data, but to some extent they're still coupled anyway -- a binary tree (for example) still needs code to insert nodes, delete nodes, walk the tree, etc. Likewise, the code for those operations needs to know about the layout of the structure, and the names given to the pointers and such.
Personally, I tend more toward using a common naming convention in my C code, so (for a few examples) the pointers to subtrees in a binary tree are always just named left and right. If I use a linked list (rare) the pointer to the next node is always named next (and if it's doubly-linked, the other is prev). This helps a lot with being able to write code without having to spend a lot of time looking up a structure definition to figure out what name I used for something this time.

#Question #1: I don't know why there is no error, but you are right, variables have to be declared at the beginning of a block. Good thing is you can declare blocks anywhere you like :). E.g:
{
int some_local_var;
}
#Question #2: actually programming C without inheritance is sometimes quite annoying. but there are possibilities to have OOP to some degree. For example, look at the GTK source code and you will find some examples.
You are right, functions like the ones you have shown are common, but the constructor is commonly devided into an allocation function and an initialization function. E.G:
someStruct* someStruct_alloc() { return (someStruct*)malloc(sizeof(someStruct)); }
void someStruct_init(someStruct* this, int arg1, arg2) {...}
In some libraries, I have even seen some sort of polymorphism, where function pointers are stored within the struct (which have to be set in the initializing function, of course). This results in a C++ like API:
someStruct* str = someStruct_alloc();
someStruct_init(str);
str->someFunc(10, 20, 30);

Regarding OOP in C, have you looked at some of the topics on SO? For instance, Can you write object oriented code in C?.
I can't put my finger on an example, but I think they enforce an OO like discipline in Linux kernel programming as well.

In terms of learning how C works, as opposed to OO in C++, you might find it easier to take a short course in some other language that doesn't have an OO derivative -- say, Modula-2 (one of my favorites) or even BASIC (if you can still find a real BASIC implementation -- last time I wrote BASIC code it was with the QBASIC that came with DOS 5.0, later compiled in full Quick BASIC).
The methods you use to get things done in Modula-2 or Pascal (barring the strong typing, which protects against certain types of errors but makes it more complicated to do certain things) are exactly those used in non-OO C, and working in a language with different syntax might (probably will, IMO) make it easier to learn the concepts without your "programming reflexes" kicking in and trying to do OO operations in a nearly-familiar language.

Good OO naming scheme for c

I find myself with a lot of functions that are oo style in c(nothing with fancy macros or function pointers just struct + functions that take that struct type as a first argument.
Is my_func_my_type a good scheme for such things? If I use that should I try to be consistent? what if it one of my functions name becomes > 20 characters? > 25? Should I keep the style consistent even then?
Also what is a good naming scheme for constructors/initializers/destructors?
Is there something beter then new_my_type , init_my_type, free_my_type ?
P.S. Is there a good name for this/self ptr or should I just name the first parameter of OO like function as I would in a normal function (to_init, some_guy,ect.)

I would personally use a scheme consistent with most C libraries that put the library name as a prefix of the function. Which would give for instance void MyObject_MyFunction. For construction/destruction, you can stay consistent and use MyObject_Construct, MyObject_Destroy. I'd say the name don't matter much as long as you stay consistent.

'good' is a bit subjective, but I find it the easiest to maintain to choose a certain scheme (incorporating struct name into method) and follow it strictly. Really long function names I tend to avoid though, if a function is so complicated that it need such a long name it's probably time to refactor it, split into subfunctions etc.
I also do not use underscores, but that's a matter of taste.
A sample of what's used here:
typedef struct Discriminator
{
//members
} Discriminator;
DiscriminatorConstruct( Discriminator* p );
DiscriminatorDestruct( Discriminator* p );
DiscriminatorFunctionA( Discriminator* p, int arg1, int arg2 );

For lack of one within the C world, C++ provides a useful precedent in the naming of its functions:
[[namespace::...]struct::]function
There's nothing like namespaces in C, but you can just think of and model them as prefixes shared by related structs. For readability, it helps to have something to visually separate the distinct naming components. If you like using underscores in your functions, you might consider two underscores as a separator, otherwise perhaps one. (Technically, two underscores may be reserved for the implementation, but I've never seen any implementation identifier that wasn't prefixed with underscores put a double underscore internally).
Keeping the names close to C++ also helps programmers span both languages, and in migrating code back and forth if necessary. Similarly, C++ terminology may be adopted: constructor, destructor, possibly new and delete (though these names may become misnomers if the C code is ported to C++ but continues to use free/malloc).
IMHO, consistent and clear naming saves more hassles than long identifiers cause, but if something becomes a pain then seek a localised workaround such as a macro, inline wrapper function, function pointer etc..

C99 mixed declarations and code in open source projects?

Why is still C99 mixed declarations and code not used in open source C projects like the Linux kernel or GNOME?
I really like mixed declarations and code since it makes the code more readable and prevents hard to see bugs by restricting the scope of the variables to the narrowest possible. This is recommended by Google for C++.
For example, Linux requires at least GCC 3.2 and GCC 3.1 has support for C99 mixed declarations and code

You don't need mixed declaration and code to limit scope. You can do:
{
int c;
c = 1;
{
int d = c + 1;
}
}
in C89. As for why these projects haven't used mixed declarations (assuming this is true), it's most likely a case of "If it ain't broke don't fix it."

This is an old question but I'm going to suggest that inertia is the reason that most of these projects still use ANSI C declarations rules.
However there are a number of other possibilities, ranging from valid to ridiculous:
Portability. Many open source projects work under the assumption that pedantic ANSI C is the most portable way to write software.
Age. Many of these projects predate the C99 spec and the authors may prefer a consistent coding style.
Ignorance. The programmers submitting predate C99 and are unaware of the benefits of mixed declarations and code. (Alternate interpretation: Developers are fully aware of the potential tradeoffs and decide that mixed declarations and statements are not worth the effort. I highly disagree, but it's rare that two programmers will agree on anything.)
FUD. Programmers view mixed declarations and code as a "C++ism" and dislike it for that reason.

There is little reason to rewrite the Linux kernel to make cosmetic changes that offer no performance gains.
If the code base is working, so why change it for cosmetic reasons?

There is no benefit. Declaring all variables at the beginning of the function (pascal like) is much more clear, in C89 you can also declare variables at the beginning of each scope (inside loops example) which is both practical and concise.

I don't remember any interdictions against this in the style guide for kernel code. However, it does say that functions should be as small as possible, and only do one thing. This would explain why a mixture of declarations and code is rare.
In a small function, declaring variables at the start of scope acts as a sort of Introit, telling you something about what's coming soon after. In this case the movement of the variable declaration is so limited that it would likely either have no effect, or serve to hide some information about the functionality by pushing the barker into the crowd, so to speak. There is a reason that the arrival of a king was declared before he entered a room.
OTOH, a function which must mix variables and code to be readable is probably too big. This is one of the signs (along with too-nested blocks, inline comments and other things) that some sections of a function need to be abstracted into separate functions (and declared static, so the optimizer can inline them).
Another reason to keep declarations at the beginning of the functions: should you need to reorder the execution of statements in the code, you may move a variable out of its scope without realizing it, since the scope of a variable declared in the middle of code is not evident in the indentation (unless you use a block to show the scope). This is easily fixed, so it's just an annoyance, but new code often undergoes this kind of transformation, and annoyance can be cumulative.
And another reason: you might be tempted to declare a variable to take the error return code from a function, like so:
void_func();
int ret = func_may_fail();
if (ret) { handle_fail(ret) }
Perfectly reasonable thing to do. But:
void_func();
int ret = func_may_fail();
if (ret) { handle_fail(ret) }
....
int ret = another_func_may_fail();
if (ret) { handle_other_fail(ret); }
Ooops! ret is defined twice. "So? Remove the second declaration." you say. But this makes the code asymmetric, and you end up with more refactoring limitations.
Of course, I mix declarations and code myself; no reason to be dogmatic about it (or else your karma may run over your dogma :-). But you should know what the concomitant problems are.

Maybe it's not needed, maybe the separation is good? I do it in C++, which has this feature as well.

There is no reason to change the code away like this, and C99 is still not widely supported by compilers. It is mostly about portability.

What style to use when naming types in C

According to this stack overflow answer, the "_t" postfix on type names is reserved in C. When using typedef to create a new opaque type, I'm used to having some sort of indication in the name that this is a type. Normally I would go with something like hashmap_t but now I need something else.
Is there any standard naming scheme for types in C? In other languages, using CapsCase like Hashmap is common, but a lot of C code I see doesn't use upper case at all. CapsCase works fairly nicely with a library prefix too, like XYHashmap.
So is there a common rule or standard for naming types in C?

Yes, POSIX reserves names ending _t if you include any of the POSIX headers, so you are advised to stay clear of those - in theory. I work on a project that has run afoul of such names two or three times over the last twenty or so years. You can minimize the risk of collision by using a corporate prefix (your company's TLA and an underscore, for example), or by using mixed case names (as well as the _t suffix); all the collisions I've seen have been short and all-lower case (dec_t, loc_t, ...).
Other than the system-provided (and system-reserved) _t suffix, there is no specific widely used convention. One of the mixed-case systems (camelCase or InitialCaps) works well. A systematic prefix works well too - the better libraries tend to be careful about these.
If you do decide to use lower-case and _t suffix, do make sure that you use long enough names and check diligently against the POSIX standard, the primary platforms you work on, and any you think you might work on to avoid unnecessary conflicts. The worst problems come when you release some name example_t to customers and then find there is a conflict on some new platform. Then you have to think about making customers change their code, which they are always reluctant to do. It is better to avoid the problem up front.

The Indian Hill style guidelines have some suggestions:
Individual projects will no doubt have
their own naming conventions. There
are some general rules however.
Names with leading and trailing underscores are reserved for system
purposes and should not be used for
any user-created names. Most systems
use them for names that the user
should not have to know. If you must
have your own private identifiers,
begin them with a letter or two
identifying the package to which they
belong.
#define constants should be in all CAPS.
Enum constants are Capitalized or in all CAPS
Function, typedef, and variable names, as well as struct, union, and
enum tag names should be in lower
case.
Many macro "functions" are in all CAPS. Some macros (such as getchar and
putchar) are in lower case since they
may also exist as functions.
Lower-case macro names are only
acceptable if the macros behave like a
function call, that is, they evaluate
their parameters exactly once and do
not assign values to named parameters.
Sometimes it is impossible to write a
macro that behaves like a function
even though the arguments are
evaluated exactly once.
Avoid names that differ only in case, like foo and Foo. Similarly,
avoid foobar and foo_bar. The
potential for confusion is
considerable.
Similarly, avoid names that look like each other. On many terminals and
printers, 'l', '1' and 'I' look quite
similar. A variable named 'l' is
particularly bad because it looks so
much like the constant '1'.
In general, global names (including
enums) should have a common prefix
identifying the module that they
belong with. Globals may alternatively
be grouped in a global structure.
Typedeffed names often have "_t"
appended to their name.
Avoid names that might conflict with
various standard library names. Some
systems will include more library code
than you want. Also, your program may
be extended someday.

C only reserves some uses of a _t suffix. As far as I can tell, this is only current identifiers ending with _t plus any identifier that starts int or uint (7.26.8). However, POSIX may reserve more.
It's a general problem in C, since you have extremely flat namespaces, and there's no silver bullet. If you're familiar with CapCase names and they work well for you, then you should continue to use them. Otherwise, you'll have to evaluate the goals of the current project and see which solution best meets them.

CapsCase is often used for types in C.
For instance, if you look at projects in the GNOME ecosystem (GTK+, GDK, GLib, GObject, Clutter, etc.), you'll see types like GtkButton or ClutterStageWindow. They only use CapsCase for data types; function names and variables are all lower-case with underscore separators - e.g. clutter_actor_get_geometry().
Type naming schemes are like indentation conventions - they generate religious wars with people asserting some sort of moral superiority for their preferred approach. It is certainly preferable to follow the style in existing code, or in related projects (e.g. for me, GNOME over the last few years.)
However, if you're starting from scratch and have no template, there's no hard-and-fast rule. If you're interested in coding efficiently and leaving work at reasonable hour so you can go home and have a beer or whatever, you certainly should pick a style and stick to it for your project, but it matters very little exactly which style you pick.

One alternate solution that works reasonably well is to use uppercase for all type names and macro names. Global variables may be CapCase (CamelBack) and all local variables lower case.
This technique helps to improve readability and also takes advantage of language syntax which reduces the number of pollution characters in variable names; e.g. gvar, kvar, type_t, etc. For example, data types cannot be syntatically confused with any other type.
Global variables are easily distinguished from locals by having at least one upper case letter.
I agree that prefixed or postfixed underscores should be avoided in all token names.
Lets look at the example below.
Its readily clear that InvertedCount is a global due to its case. It's equally clear that INT32U and RET_ERR are types due to their sytax. Its also clear that INVERT_VAL() is a macro due to the fact thats its on the right hand side and there is no cast so it cant be a data type.
One thing is for sure though. Whichever method you use, it should be inline with your organizations coding standard. For me, the least amount of clutter, the better.
Of course, style is a different issue.
#define INVERT_VAL(x) (~x)
#define CALIBRATED_VAL 100u
INT32U InvertedCount;
typedef enum {
ERR_NONE = 0,
...
} RET_ERR;
RET_ERR my_func (void)
{
INT32U val;
INT32U check_sum;
val = CALIBRATED_VAL; // --> Lower case local variable.
check_sum = INVERT_VAL(val); // --> Clear use of macris.
InvertedCount = checksum; // --> Upper case global variable.
// Looks different no g prefix required.
...
return (ERR_NONE);
}

There are many ideas and opinion on this subject, but there is no one universal standard for naming types. The most important thing is to be consistent. In the absence of coding standards, when maintaining code, resist the urge to use another naming convention. Introducing a new naming convention, even if it's perfect, can add unnecessary complexity.
This is actually a great topic to raise when interviewing people. I've never come across a good programmer that didn't have an opinion on this. No opinion or no passion in the answer indicates that the person isn't an experienced programmer.