Unfamiliar C Syntax in the Declaration of Function - c

Currently looking at some C code that doesn't make any sense to me. What is (elementSize)? How am I supposed to pass arguments to this static function? What is the name of this syntax style so I can learn more about it?
static int torch_Tensor_(elementSize)(lua_State *L)
{
luaT_pushinteger(L, THStorage_(elementSize)());
return 1;
}
https://github.com/torch/torch7/blob/master/generic/Tensor.c
This is the file I am trying to understand for reference.

Normally
static int torch_Tensor_(elementSize)(lua_State *L)
would mean torch_Tensor_ is a function that takes a single parameter called elementSize that has no type (?! - syntax error) and returns a function that takes a pointer to lua_State and returns an int. This is blatantly invalid (functions cannot return other functions).
But what's actually going on here is that torch_Tensor_ is defined as a function-like macro, so before the compiler even sees this declaration, torch_Tensor_(elementSize) is replaced by something else.
In https://github.com/torch/torch7/blob/master/Tensor.c there is
#include "general.h"
#define torch_Storage_(NAME) TH_CONCAT_4(torch_,Real,Storage_,NAME)
#define torch_Storage TH_CONCAT_STRING_3(torch.,Real,Storage)
#define torch_Tensor_(NAME) TH_CONCAT_4(torch_,Real,Tensor_,NAME)
#define torch_Tensor TH_CONCAT_STRING_3(torch.,Real,Tensor)
#include "generic/Tensor.c"
#include "THGenerateAllTypes.h"
#include "generic/Tensor.c"
#include "THGenerateHalfType.h"
with TH_CONCAT_... defined in lib/TH/THGeneral.h.in:
#define TH_CONCAT_STRING_3(x,y,z) TH_CONCAT_STRING_3_EXPAND(x,y,z)
#define TH_CONCAT_STRING_3_EXPAND(x,y,z) #x #y #z
#define TH_CONCAT_4_EXPAND(x,y,z,w) x ## y ## z ## w
#define TH_CONCAT_4(x,y,z,w) TH_CONCAT_4_EXPAND(x,y,z,w)
So torch_Tensor_ is defined as a macro before generic/Tensor.c is included.
torch_Tensor_(elementSize)
expands to
TH_CONCAT_4(torch_,Real,Tensor_,elementSize)
which expands to
TH_CONCAT_4_EXPAND(torch_,...,Tensor_,elementSize)
... is a placeholder, not real code. Real is defined as a macro in the various THGenerate*Type.h files, so this line actually becomes
TH_CONCAT_4_EXPAND(torch_,char,Tensor_,elementSize)
TH_CONCAT_4_EXPAND(torch_,int,Tensor_,elementSize)
TH_CONCAT_4_EXPAND(torch_,float,Tensor_,elementSize)
...
depending on context. Anyway, the end result is a single identifier of the form
torch_charTensor_elementSize
torch_intTensor_elementSize
torch_floatTensor_elementSize
...
(one token).
The resulting function definition thus looks like e.g.
static int torch_charTensor_elementSize(lua_State *L)
{
...
}
depending on which context generic/Tensor.c was included in.
The reason things are done this way is to have what amounts to the same code, but for multiple different types. In C++ you would write a function template:
namespace torch {
template<typename Real>
static int Tensor_elementSize(lua_State *L) { ... }
}
But C has no templates (nor namespaces), so the only way to get "generic" code like this is to do it manually with macros and preprocessing tricks (and manually "decorating" names; e.g. the elementSize function for floats is really called torch_floatTensor_elementSize).
All we're really trying to do is abstract over a type parameter, here called Real.
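The same pattern can be reproduced in a single file, as a minimal sketch. The names demo_Tensor_, DEMO_CONCAT_4 and the lowercase real macro are made up for illustration and are not taken from torch7; redefining Real before each copy of the function stands in for including generic/Tensor.c once per type.

```c
#include <stddef.h>

/* Two levels, so that Real is macro-expanded before ## pastes the tokens. */
#define DEMO_CONCAT_4_EXPAND(x, y, z, w) x ## y ## z ## w
#define DEMO_CONCAT_4(x, y, z, w) DEMO_CONCAT_4_EXPAND(x, y, z, w)
#define demo_Tensor_(NAME) DEMO_CONCAT_4(demo_, Real, Tensor_, NAME)

/* "Instantiation" for float (hypothetical names, not torch7's). */
#define Real Float
#define real float
static size_t demo_Tensor_(elementSize)(void) { return sizeof(real); }
#undef real
#undef Real

/* The same source text, instantiated again for double. */
#define Real Double
#define real double
static size_t demo_Tensor_(elementSize)(void) { return sizeof(real); }
#undef real
#undef Real
```

After preprocessing, the two definitions are named demo_FloatTensor_elementSize and demo_DoubleTensor_elementSize; running the file through gcc -E should show exactly that.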

Related

C preprocessor #if condition

I am building some generic things in C.
Here is the code:
// main.c
#include <stdio.h>
#define T int;
#include "test.h"
int main()
{
return 0;
}
// test.h
#define _array_is_pointer(T) ( \
{ \
T _value; \
__builtin_classify_type(_value) == 5; \
})
#ifdef T
#if _array_is_pointer(T)
struct array_s
{
T *items;
}
void array_push(struct array_s * array, T value)
{
// push method for pointer.
}
#else
struct array_s
{
T *items;
}
void array_push(struct array_s * array, T value)
{
// push method for non-pointer.
}
#endif
#endif
** edited: add more code in test.h **
I would like the preprocessor to run different code depending on whether T is a pointer or a non-pointer.
But I got the error: token "{" is not valid in preprocessor expressions.
Is it possible to do that?
I would like the preprocessor to run different code depending on whether T is a pointer or a non-pointer.
Is it possible to do that?
No, it is not possible. Preprocessor is not aware of types.
If you really want this, pass a mark if T is a pointer or not as a separate macro.
#define T int*
#define T_IS_A_POINTER 1
#include "test.h"
Or have separate calls:
#define T int*
#include "test_a_pointer.h"
#define T int
#include "test_not_a_pointer.h"
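A rough sketch of how the header side of the first suggestion might look, written as a single file for brevity. The count member and the function demo_array_is_set are hypothetical and only serve to give the two branches different behaviour; the two #define lines stand in for what the including file would provide.

```c
#include <stddef.h>

/* What the including file would define before #include "test.h". */
#define T int *
#define T_IS_A_POINTER 1

struct array_s {
    T *items;
    size_t count;
};

#if T_IS_A_POINTER
/* Pointer variant: an element counts as "set" only if it is non-null. */
static int demo_array_is_set(const struct array_s *array, size_t i)
{
    return i < array->count && array->items[i] != NULL;
}
#else
/* Non-pointer variant: every stored element counts as "set". */
static int demo_array_is_set(const struct array_s *array, size_t i)
{
    return i < array->count;
}
#endif
```

The preprocessor only ever tests the integer value of T_IS_A_POINTER; keeping that flag consistent with T remains the responsibility of whoever defines both.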
The preprocessor doesn't know whether T is a pointer, because preprocessing happens before semantic analysis of the program. All the preprocessor sees are tokens; it knows that 42 is a number and take42 is an identifier, but that's it. The only definitions it knows about are preprocessor #defines.
Moreover, in C, functions --even builtin constant functions like sizeof and __builtin_classify_type-- cannot be evaluated by the preprocessor. The preprocessor cannot evaluate block expressions either, but there wouldn't be much point, since it has no idea what a variable is and thus doesn't need declarations. The only identifiers you can use in an #if preprocessor conditional are macros which expand to integer constants (or to entire expressions containing only arithmetic operations on integer constants).
There is the _Generic construct introduced in C11, which allows you to generate different expressions based on the type of a controlling expression. But it can only be used to generate expressions, not declarations, so it's probably not much help either.
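For illustration, here is a small sketch of what _Generic can do here, assuming a C11 compiler. The macro DEMO_IS_INT_POINTER and the two helper functions are invented for this example. Note that the selection happens in the compiler proper, so the result still cannot be used in an #if.

```c
/* Selects 1 when the controlling expression has type int *, otherwise 0. */
#define DEMO_IS_INT_POINTER(x) _Generic((x), int *: 1, default: 0)

static int demo_pointer_case(void)
{
    int value = 0;
    int *p = &value;
    return DEMO_IS_INT_POINTER(p);      /* type int * matches the first association */
}

static int demo_value_case(void)
{
    int value = 0;
    return DEMO_IS_INT_POINTER(value);  /* type int falls through to default */
}
```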
There is no issue while writing multi-line code-snippet in
#define _array_is_pointer(T) ( \
{ \
T _value; \
__builtin_classify_type(_value) == 5; \
})
But, as you know, the first step before the code is passed to the compiler is to create the expanded source code. In this step, all five lines would be pasted wherever you have written _array_is_pointer(T), and hence the resulting code would contain:
#if (
{
T _value;
__builtin_classify_type(_value) == 5;
})
and here is the blunder. You cannot write multiple statements like this in an #if condition, nor can you use {} there. Hence you got the error: token "{" is not valid in preprocessor expressions.
You would have to write a single constant expression in the preprocessor's #if condition.
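As a small sketch of what an #if condition can and cannot see (all DEMO_ and demo_ names are invented for this example): macros that expand to integer constants are evaluated, while any identifier that is not a macro, even the name of a real C object, is simply treated as 0.

```c
#define DEMO_T_IS_A_POINTER 1
#define DEMO_WORD_BITS 32

/* Macros expanding to integer constants, combined with arithmetic: fine. */
#if DEMO_T_IS_A_POINTER && (DEMO_WORD_BITS * 2 == 64)
enum { DEMO_MACRO_BRANCH = 1 };
#else
enum { DEMO_MACRO_BRANCH = 0 };
#endif

/* A C object is invisible to the preprocessor, whatever its value. */
static const int demo_flag = 1;

#if demo_flag   /* not a macro, so it is replaced by 0 here */
enum { DEMO_OBJECT_BRANCH = 1 };
#else
enum { DEMO_OBJECT_BRANCH = 0 };
#endif
```

With GCC, the -Wundef option warns about the second case.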

Is it possible to determine the type of object on-line in one-pass, including macros?

I have a very simple parser that handles a small subset of the C language; it looks at a well-formed translation unit and, in one pass and online, determines what the global symbols and their types are (function, struct, union, variable), provided no one is trying to trick it. However, I'm having trouble determining whether it's a struct or a function in this example,
#define CAT_(x, y) x ## y
#define CAT(x, y) CAT_(x, y)
#define F_(thing) CAT(foo, thing)
static struct F_(widget) { int i; }
F_(widget);
static struct F_(widget) a(void) { int i;
return i = 42, F_(widget).i = i, F_(widget); }
int main(void) {
a();
return 0;
}
It assumes that the parenthesis means a function and parses it this way,
[ID<stati>, ID<struc>, ID<F_>, LPAR<(>, ID<widge>, RPAR<)>, LBRA<{>, RBRA<}>].
[ID<F_>, LPAR<(>, ID<widge>, RPAR<)>, SEMI<;>].
[ID<stati>, ID<struc>, ID<F_>, LPAR<(>, ID<widge>, RPAR<)>, ID<a>, LPAR<(>, ID<void>, RPAR<)>, LBRA<{>, RBRA<}>].
[ID<int>, ID<main>, LPAR<(>, ID<void>, RPAR<)>, LBRA<{>, RBRA<}>].
When in fact, what it thinks is the function at the top is actually a struct declaration, and the top two should be concatenated. What is the simplest way to recognise this?
Two-pass, emulating what actually happens in macro replacement; I would have to build a subset of the C pre-processor;
like the C lexer hack, except with macros;
backtrack with the semicolon at the end; that seems hard;
somehow recognise the difference at the beginning, (probably requiring me to add struct to my symbol table.)
As mentioned in the comments, if you want to be able to handle preprocessor macros, you will need to implement (or borrow) a preprocessor.
Writing a preprocessor mostly involves coming to terms with the formal description in the C standard, but it is not otherwise particularly challenging. It can be done online with the resulting token stream fed into a parser, so it doesn't really require a second pass.
(This depends on how you define a "pass" I suppose, but in my usage a one-pass parser reads the input only once without creating and rereading a temporary file. And that is definitely doable.)

Function Pointer declaration and function definition together

I saw some piece of code in one of the old files.
void (*const m_exec[N_EXECS])(void) =
{
#define PROCESS_DEF_TIMED(name) name, // defines macro for use in proclist.h
#define PROCESS_TIMED // define switch for section in proclist.h
#include "proclist.h"
#undef PROCESS_TIMED // undefine switch
#undef PROCESS_DEF_TIMED // undefines macro
};
I am unable to understand the meaning of this code. Is this a function pointer with declaration and function definition together? But if I try to declare similar function pointer like below, I get compilation error
void (*voidFptr)(void) =
{
printf("Hello\n");
}
Also what is #define here? Why this is inside the function I am not sure.
This:
void (*const m_exec[N_EXECS])(void)
is the way you declare an array of function pointers in C. You are not alone in finding this difficult to read. It declares an array of length N_EXECS, where each element in the array is a const pointer to a function that takes no arguments and returns void.
The brace-enclosed block after it is the array initializer; probably proclist.h contains a list of PROCESS_DEF_TIMED(...) entries, and since that macro expands to its argument followed by a comma, the #include is essentially pasting a list of function names into this array. If you want to see what's actually happening after the #include, you can use the -E flag of your compiler. So if this were in main.c, you would run:
gcc -E -Ipath/to/headers -Iother/path/to/headers main.c
And it would give you a (probably huge) dump of source code, which is the result of pushing that file through the preprocessor and evaluating all of the #include statements.
Edit: missed your last question.
Probably (and this is conjecture without seeing proclist.h), the things it's defining change the contents of proclist.h. For example, if it contained:
#ifdef PROCESS_TIMED
&function1_timed,
&function2_timed
#else
&function1,
&function2
#endif
Then #define PROCESS_TIMED would change what ended up in your m_exec array.
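The overall technique can be sketched in one file. Since proclist.h is not shown, the list macro DEMO_PROCESS_LIST below is a stand-in for its contents, and every demo_ name is hypothetical.

```c
#include <stddef.h>

static int demo_calls;

static void demo_task_a(void) { demo_calls += 1; }
static void demo_task_b(void) { demo_calls += 10; }

/* Stand-in for proclist.h: one entry per process. */
#define DEMO_PROCESS_LIST(X) \
    X(demo_task_a)           \
    X(demo_task_b)

/* Array of const pointers to functions taking no arguments and returning void. */
#define DEMO_AS_ENTRY(name) name,
static void (*const demo_exec[])(void) = {
    DEMO_PROCESS_LIST(DEMO_AS_ENTRY)
};
#undef DEMO_AS_ENTRY

enum { DEMO_N_EXECS = sizeof demo_exec / sizeof demo_exec[0] };

/* Call every registered function once, in list order. */
static void demo_run_all(void)
{
    for (size_t i = 0; i < DEMO_N_EXECS; i++)
        demo_exec[i]();
}
```

The benefit of keeping the list in one place is that the same entries can be expanded again with a different helper macro, for example to generate prototypes or an enum of indices.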

How to make the first invocation of a macro different from all the next ones ?

That may be really simple but I'm unable to find a good answer.
How can I make a macro representing first a certain value and then a different one?
I know that's nasty but I need it to implicitly declare a variable the first time and then do nothing.
This variable is required by other macros that I'm implementing.
Should I leverage "argument prescan"?
The thing you need to know is the fact I'm generating the code:
#define INC_X x++ //should be declared if needed to
#define PRINT_X printf("VALUE OF X: %d\n", x)
int func() {
[...]
INC_X;
[...]
INC_X;
[...]
PRINT_X;
[...]
}
As far as I know, this is impossible. I know of no way for the expansion of a macro to control the way another macro -- or itself -- will be expanded after. C99 introduced _Pragma so that #pragma things can be done in macros, but there is no equivalent for #define or #undef.
#include <stdio.h>
#define FOO &s[ (!c) ? (c++, 0) : (4) ]
static int c = 0;
const char s[] = { 'f', 'o', 'o', '\0', 'b', 'a', 'r', '\0' };
int main() {
puts(FOO);
puts(FOO);
return 0;
}
Does the above help?
From the look of it, you could try if Boost.Preprocessor contains what you are looking for.
Look at this tutorial
http://www.boostpro.com/tmpbook/preprocessor.html
from the excellent C++ Template Metaprogramming book.
With the edit, I'll have a go at an answer. It requires your compiler to support __FUNCTION__, which MSVC and GCC both do.
First, write a set of functions which maps strings to integers in memory, all stored in some global instance of a structure. This is left as an exercise for the reader, functionally it's a hashmap, but I'll call the resulting instance "global_x_map". The function get_int_ptr is defined to return a pointer to the int corresponding to the specified string, and if it doesn't already exist to create it and initialize it to 0. reset_int_ptr just assigns 0 to the counter for now, you'll see later why I didn't just write *_inc_x_tmp = 0;.
#define INC_X do {\
int *_inc_x_tmp = get_int_ptr(&global_x_map, __FILE__ "{}" __FUNCTION__); \
/* maybe some error-checking here, but not sure what you'd do about it */ \
++*_inc_x_tmp; \
} while(0)
#define PRINT_X do {\
int *_inc_x_tmp = get_int_ptr(&global_x_map, __FILE__ "{}" __FUNCTION__); \
printf("%d\n", *_inc_x_tmp); \
reset_int_ptr(&global_x_map, _inc_x_tmp); \
} while(0)
I've chosen the separator "{}" on the basis that it won't occur in a mangled C function name - if your compiler for some reason might put that in a mangled function name then of course you'd have to change it. Using something which can't appear in a file name on your platform would also work.
Note that functions which use the macro are not re-entrant, so it is not quite the same as defining an automatic variable. I think it's possible to make it re-entrant, though. Pass __LINE__ as an extra parameter to get_int_ptr. When the entry is created, store the value of __LINE__.
Now, the map should store not just an int for each function, but a stack of ints. When it's called with that first-seen line value, it should push a new int onto the stack, and return a pointer to that int thereafter whenever it's called for that function with any other line value. When reset_int_ptr is called, instead of setting the counter to 0, it should pop the stack, so that future calls will return the previous int.
This only works of course if the "first" call to INC_X is always the same, is called only once per execution of the function, and that call doesn't appear on the same line as another call. If it's in a loop, if() block, etc, it goes wrong. But if it's inside a block, then declaring an automatic variable would go wrong too. It also only works if PRINT_X is always called (check your early error exits), otherwise you don't restore the stack.
This may all sound like a crazy amount of engineering, but essentially it is how Perl implements dynamically scoped variables: it has a stack for each symbol name. The difference is that like C++ with RAII, Perl automatically pops that stack on scope exit.
If you need it to be thread-safe as well as re-entrant, then make global_x_map thread-local instead of global.
Edit: That __FILE__ "{}" __FUNCTION__ identifier still isn't unique if you have static functions defined in header files - the different versions in different TUs will use the same counter in the non-re-entrant version. It's OK in the re-entrant version, though, I think. You'll also have problems if __FILE__ is a basename, not a full path, since you could get collisions for static functions of the same name defined in files of the same name. That scuppers even the re-entrant version. Finally, none of this is tested.
What about having the macro #define some flag at the end of its execution and check for that flag first?
#def printFoo
#ifdef backagain
bar
#else
foo
#def backagain
Need to add some \ chars to make it work - and you probably don't want to actually do this compared to an inline func()
An alternative to some of the methods proposed thus far would be to use function pointers. It might not be quite what you are looking for, but they can still be a powerful tool.
#include <stdio.h>
void foo (void);
void bar (void);
void (*_func_foo)(void) = foo;
void foo (void) {
puts ("foo\n");
}
void bar (void) {
puts ("bar\n");
}
#define FOO() _func_foo(); \
_func_foo = bar;
int main (void) {
FOO();
FOO();
FOO();
return 0;
}
#define FOO __COUNTER__ ? bar : foo
Edit: removed all unneeded code
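A slightly fuller sketch of the __COUNTER__ idea follows. __COUNTER__ is a compiler extension (GCC, Clang and MSVC provide it), and the DEMO_ names are invented here. The counter advances once per textual expansion, in source order, so this distinguishes the first place the macro is written, not the first time it executes.

```c
/* Record the counter value here so that earlier uses do not matter. */
enum { DEMO_BASE = __COUNTER__ };

/* Each textual expansion of DEMO_FOO sees the next counter value. */
#define DEMO_FOO ((__COUNTER__ - DEMO_BASE - 1) ? "bar" : "foo")

/* First expansion in the file: the difference is 0, so "foo" is chosen. */
static const char *demo_first_use(void)  { return DEMO_FOO; }

/* Second expansion in the file: the difference is 1, so "bar" is chosen. */
static const char *demo_second_use(void) { return DEMO_FOO; }
```

Calling demo_first_use() repeatedly keeps returning "foo", because the choice was fixed when that line was preprocessed.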

#undef-ing in Practice?

I'm wondering about the practical use of #undef in C. I'm working through K&R, and am up to the preprocessor. Most of this was material I (more or less) understood, but something on page 90 (second edition) stuck out at me:
Names may be undefined with #undef,
usually to ensure that a routine is
really a function, not a macro:
#undef getchar
int getchar(void) { ... }
Is this a common practice to defend against someone #define-ing a macro with the same name as your function? Or is this really more of a sample that wouldn't occur in reality? (EG, no one in his right, wrong nor insane mind should be rewriting getchar(), so it shouldn't come up.) With your own function names, do you feel the need to do this? Does that change if you're developing a library for others to use?
What it does
If you read Plauger's The Standard C Library (1992), you will see that the <stdio.h> header is allowed to provide getchar() and getc() as function-like macros (with special permission for getc() to evaluate its file pointer argument more than once!). However, even if it provides macros, the implementation is also obliged to provide actual functions that do the same job, primarily so that you can access a function pointer called getchar() or getc() and pass that to other functions.
That is, by doing:
#include <stdio.h>
#undef getchar
extern int some_function(int (*)(void));
int core_function(void)
{
int c = some_function(getchar);
return(c);
}
As written, the core_function() is pretty meaningless, but it illustrates the point. You can do the same thing with the isxxxx() macros in <ctype.h> too, for example.
Normally, you don't want to do that - you don't normally want to remove the macro definition. But, when you need the real function, you can get hold of it. People who provide libraries can emulate the functionality of the standard C library to good effect.
Seldom needed
Also note that one of the reasons you seldom need to use the explicit #undef is because you can invoke the function instead of the macro by writing:
int c = (getchar)();
Because the token after getchar is not an (, it is not an invocation of the function-like macro, so it must be a reference to the function. Similarly, the first example above would compile and run correctly even without the #undef.
If you implement your own function with a macro override, you can use this to good effect, though it might be slightly confusing unless explained.
/* function.h */
…
extern int function(int c);
extern int other_function(int c, FILE *fp);
#define function(c) other_function(c, stdout)
…
/* function.c */
…
/* Provide function despite macro override */
int (function)(int c)
{
return function(c);
}
The function definition line doesn't invoke the macro because the token after function is not (. The return line does invoke the macro.
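The rule described above can be checked with a tiny self-contained sketch; demo_twice and the two wrapper functions are invented names. A function-like macro is only expanded when its name is immediately followed by (, so wrapping the name in parentheses reaches the real function.

```c
/* The real function. */
static int demo_twice(int x)
{
    return 2 * x;
}

/* A function-like macro with the same name, defined afterwards. */
#define demo_twice(x) ((x) + 1000)

static int demo_call_macro(int x)
{
    return demo_twice(x);      /* name followed by '(' : the macro expands */
}

static int demo_call_function(int x)
{
    return (demo_twice)(x);    /* name followed by ')' : the real function */
}
```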
Macros are often used to generate the bulk of some code. It's often a pretty localized usage, and it's safe to #undef any helper macros at the end of the particular header in order to avoid name clashes, so that only the actual generated code gets imported elsewhere and the macros used to generate the code don't.
/Edit: As an example, I've used this to generate structs for me. The following is an excerpt from an actual project:
#define MYLIB_MAKE_PC_PROVIDER(name) \
struct PcApi##name { \
many members …
};
MYLIB_MAKE_PC_PROVIDER(SA)
MYLIB_MAKE_PC_PROVIDER(SSA)
MYLIB_MAKE_PC_PROVIDER(AF)
#undef MYLIB_MAKE_PC_PROVIDER
Because preprocessor #defines are all in one global namespace, it's easy for namespace conflicts to result, especially when using third-party libraries. For example, if you wanted to create a function named OpenFile, it might not compile correctly, because the header file <windows.h> defines the token OpenFile to map to either OpenFileA or OpenFileW (depending on if UNICODE is defined or not). The correct solution is to #undef OpenFile before defining your function.
Although I think Jonathan Leffler gave you the right answer, here is a very rare case where I use an #undef. Normally a macro should be reusable inside many functions; that's why you define it at the top of a file or in a header file. But sometimes you have some repetitive code inside a function that can be shortened with a macro.
int foo(int x, int y)
{
#define OUT_OF_RANGE(v, vlower, vupper) \
if (v < vlower) {v = vlower; goto EXIT;} \
else if (v > vupper) {v = vupper; goto EXIT;}
/* do some calcs */
x += (x + y)/2;
OUT_OF_RANGE(x, 0, 100);
y += (x - y)/2;
OUT_OF_RANGE(y, -10, 50);
/* do some more calcs and range checks*/
...
EXIT:
/* undefine OUT_OF_RANGE, because we don't need it anymore */
#undef OUT_OF_RANGE
...
return x;
}
To show the reader that this macro is only useful inside of the function, it is undefined at the end. I don't want to encourage anyone to use such hackish macros. But if you have to, #undef them at the end.
I only use it when a macro in an #included file is interfering with one of my functions (e.g., it has the same name). Then I #undef the macro so I can use my own function.
Is this a common practice to defend against someone #define-ing a macro with the same name as your function? Or is this really more of a sample that wouldn't occur in reality? (EG, no one in his right, wrong nor insane mind should be rewriting getchar(), so it shouldn't come up.)
A little of both. Good code will not require use of #undef, but there's lots of bad code out there you have to work with. #undef can prove invaluable when somebody pulls a trick like #define bool int.
In addition to fixing problems with macros polluting the global namespace, another use of #undef is the situation where a macro might be required to have a different behavior in different places. This is not a really common scenario, but a couple that come to mind are:
the assert macro can have its definition changed in the middle of a compilation unit for the case where you might want to perform debugging on some portion of your code but not others. In addition to assert itself needing to be #undef'ed to do this, the NDEBUG macro needs to be redefined to reconfigure the desired behavior of assert
I've seen a technique used to ensure that globals are defined exactly once by using a macro to declare the variables as extern, but the macro would be redefined to nothing for the single case where the header/declarations are used to define the variables.
Something like (I'm not saying this is necessarily a good technique, just one I've seen in the wild):
/* globals.h */
/* ------------------------------------------------------ */
#undef GLOBAL
#ifdef DEFINE_GLOBALS
#define GLOBAL
#else
#define GLOBAL extern
#endif
GLOBAL int g_x;
GLOBAL char* g_name;
/* ------------------------------------------------------ */
/* globals.c */
/* ------------------------------------------------------ */
#include "some_master_header_that_happens_to_include_globals.h"
/* define the globals here (and only here) using globals.h */
#define DEFINE_GLOBALS
#include "globals.h"
/* ------------------------------------------------------ */
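The first scenario in the list above (switching assert on and off within one file) can be sketched as follows. It relies on the standard rule that <assert.h> redefines assert according to the current state of NDEBUG every time it is included; the demo_ function names are made up, and the sketch assumes the build does not already define NDEBUG.

```c
#include <assert.h>

/* Checks are active here (assuming NDEBUG is not defined by the build). */
static int demo_checked_half(int x)
{
    assert(x % 2 == 0);
    return x / 2;
}

#define NDEBUG
#include <assert.h>   /* re-inclusion redefines assert as a no-op */

static int demo_unchecked_half(int x)
{
    assert(x % 2 == 0);   /* expands to ((void)0) in this region */
    return x / 2;
}

#undef NDEBUG
#include <assert.h>   /* checks are active again from here on */
```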
If a macro can be def'ed, there must be a facility to undef.
A memory tracker I use defines its own new/delete macros to track file/line information. This macro breaks the standard C++ library (SC++L).
#pragma push_macro( "new" )
#undef new
#include <vector>
#pragma pop_macro( "new" )
Regarding your more specific question: namespaces are often emulated in C by prefixing library functions with an identifier.
Blindly undefing macros is going to add confusion, reduce maintainability, and may break things that rely on the original behavior. If you were forced, at least use push/pop to preserve the original behavior everywhere else.
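A minimal sketch of the push/pop approach in plain C, assuming a compiler that supports these pragmas (MSVC, GCC and Clang do). DEMO_LIMIT and the two functions are hypothetical names used only to make the effect observable.

```c
#define DEMO_LIMIT 10

/* Save the current definition, then replace it for a limited region. */
#pragma push_macro("DEMO_LIMIT")
#undef DEMO_LIMIT
#define DEMO_LIMIT 99

static int demo_inner_limit(void) { return DEMO_LIMIT; }  /* sees 99 */

/* Restore the saved definition for everything that follows. */
#pragma pop_macro("DEMO_LIMIT")

static int demo_outer_limit(void) { return DEMO_LIMIT; }  /* sees 10 again */
```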

Resources