I want to fill a multidimensional array using a macro so that the people using it think that they are using a function and passing only one string. The macro will use this string and at compile time convert it so that it would appear as a multidimensional array, like this:
make_array ("string1,{string2,{string3,{...,{stringN");
So the macro will replace this function call with a multidimensional array, cutting the string wherever it encounters ,{. The code above will turn into something like this:
make_array = { "string1", "string2", "string3", ..., "stringN"};
I'm using GCC; how can I accomplish this?
Update: I thought I could strip the quotes from the string using a macro, so I would have the bare text instead of a string literal and could edit it in the macro, but GCC does not accept a macro declaration that replaces double quotes (as shown below).
#define macro_array ( "text") text
So the text would appear without double quotes, and I could find the ,{ mark, cut the text there, and then use stringification to turn the pieces back into strings.
You can get a moderate approximation to what you are after with C99 and variable arguments in macros:
Source
#define make_array(name, dim1, dim2, ...) \
static const char *name[dim1][dim2] = { __VA_ARGS__ }
make_array(mine, 2, 2, "abc", "def", "ghi", "jkl");
Output
$: gcc -E xx.c
# 1 "xx.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "xx.c"
static const char *mine[2][2] = { "abc", "def", "ghi", "jkl" };
$
However, you cannot readily split a string in the preprocessor as requested - it is simply the wrong tool for the job.
You cannot do string processing with macros.
I'm not sure I understand exactly what you want to do, but this is probably best achieved with a function.
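As a rough illustration of the function-based route, here is a minimal sketch that splits a writable string at every ,{ marker at run time. The name split_on_marker and its signature are hypothetical, chosen only for this example; nothing like it appears in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `input` in place at every ",{" marker and
 * stores a pointer to each piece in `out`. Returns the number of pieces
 * stored (at most `max_out`). `input` must be writable. */
size_t split_on_marker(char *input, char *out[], size_t max_out)
{
    static const char marker[] = ",{";
    size_t count = 0;
    char *cursor = input;

    assert(input != NULL && out != NULL);

    while (count < max_out) {
        out[count++] = cursor;
        char *hit = strstr(cursor, marker);
        if (hit == NULL)
            break;                            /* last piece reached */
        *hit = '\0';                          /* terminate the current piece */
        cursor = hit + (sizeof marker - 1);   /* skip past ",{" */
    }
    return count;
}
```

The caller passes a char array (not a string literal, since the buffer is modified) and an array of pointers large enough for the expected number of pieces.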
Forget it. I see way too many questions along the lines of "How can I make my code look like this?" when the question for achieving good code should be "How can I make my code work like this?" What is the goal you're trying to achieve?
If you want to import external data that was formatted in the weird notation you specified, does the data vary at runtime or is it constant? In the former case you'll need a parser in your program and a good deal of dynamic allocation. In the latter case, you need to write a program that runs before compiling the main program which parses and converts the data into C. But if there's no legitimate reason for the data to be in this weird format to begin with, you should simply write it in C from the start rather than trying to force C to look like something else.
Get rid of the quotes and use a variadic macro
Write a function to split the string and call it from the macro
Don't use macros for this, just make a static inline function. It is just as fast.
As Oli has already said, this kind of string processing is impossible with macros. Concatenation and replacement of other strings is about as much as you can do with macros.
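To make "concatenation and replacement" concrete, the sketch below shows roughly the limit of what the preprocessor offers for strings: stringifying a macro argument and relying on adjacent string literals being merged. The VERSION_* names and values are invented for the example.

```c
#include <assert.h>
#include <string.h>

/* Two-step stringification so that macro arguments are expanded first. */
#define STRINGIFY_RAW(x) #x
#define STRINGIFY(x) STRINGIFY_RAW(x)

/* Hypothetical example values. */
#define VERSION_MAJOR 1
#define VERSION_MINOR 4

/* Adjacent string literals are merged by the compiler into one literal. */
#define VERSION_STRING "v" STRINGIFY(VERSION_MAJOR) "." STRINGIFY(VERSION_MINOR)

const char *version_string(void)
{
    /* Four characters plus the terminating NUL: it really is one literal. */
    assert(sizeof(VERSION_STRING) == 5);
    return VERSION_STRING;
}
```

Note that this only builds strings up; nothing here can look inside a literal or cut it apart.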
I think the answer here is a question: why does the input string have to be in that format? Writing your required result takes no more effort than writing your input, so why would you want to go through the pain of processing it?
Note that I'm trying to modify existing code written by someone else, so answers to other, similar questions most likely don't apply: they can start from code designed around function-like macros, which makes preprocessor acrobatics much easier, and that isn't the case for me.
So this is what I want to do. Let's say I have 1 function and 1 variable:
void Foo1(void);
int Foo1 = 0;
I want to turn them into:
void FOO1(void);
int FOO1 = 0;
Now as everyone knows, doing
#define Foo FOO
will not work as the C preprocessor will not treat Foo out of Foo1 as a single token but rather, it will treat Foo1, the whole of it, as a token.
So I need to somehow "trick" the C preprocessor into believing Foo is a token, then work on it.
I tried this:
#define a_random_thing Foo
#define Foo FOO
naively believing the preprocessor would treat "Foo" as the product of expanding "a_random_thing" and then perform another round of expansion on "Foo".
That sadly and obviously didn't work.
So what exactly should I do?
The code I'm working on is an updated library, and tons of the variable names and function names were modified ever so slightly, just enough to give me 100+ compilation errors; very thoughtful on their part. I'm trying to make it backward compatible.
If what I'm trying to accomplish is not possible, please also tell me, thanks!
It's not really possible unless you go ahead and define Foo1, Foo2, Foo3... individually.
You simply can't split Foo1 into two different preprocessor tokens, which would be required in order to solve the problem generically.
The closest thing you can do is #define foo(n) FOO##n and call it as foo(1) to get FOO1.
...or you could probably just do search & replace in a text editor.
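The two preprocessor options mentioned above might look like this in practice. This is only a sketch: FOO1 and FOO2 stand in for the updated library's names, and the macro is spelled Foo(n) here to match the question's capitalization.

```c
#include <assert.h>

/* The updated library's names (hypothetical stand-ins). */
int FOO1 = 0;
int FOO2(void) { return 2; }

/* Option 1: one explicit alias per renamed identifier. */
#define Foo1 FOO1
#define Foo2 FOO2

/* Option 2: build the new name by pasting; call sites must change to Foo(n). */
#define Foo(n) FOO##n

int use_old_names(void)
{
    Foo1 = 40;              /* expands to FOO1 = 40 */
    return Foo1 + Foo2();   /* expands to FOO1 + FOO2() */
}

int use_pasted_names(void)
{
    Foo(1) = 10;                /* expands to FOO1 = 10 */
    return Foo(1) + Foo(2)();   /* expands to FOO1 + FOO2() */
}
```

Option 1 keeps old call sites untouched but needs one line per name, which is why a scripted search and replace may end up being less work for 100+ identifiers.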
The reason this won't work is that tokenization happens in the lexing phase of the compilation process. In this phase, the C compiler splits the sequence of bytes into tokens according to C's lexical rules (individual elements of the code: keywords, names of variables and functions, operators such as + - * /, and so on).
Lexing happens before macro replacement. By the time the preprocessor looks for tokens matching a macro definition and replaces them, your program already consists of a token called Foo1 and a macro named Foo; since Foo1 is a single token that is not Foo, it is never replaced.
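A small sketch of that whole-token behaviour, with invented values: the macro Foo is replaced only where Foo appears as a complete identifier, and longer identifiers that merely begin with Foo are left alone.

```c
#include <assert.h>

#define Foo 100          /* matches only the complete identifier Foo */

int Foo1 = 1;            /* one token, Foo1: the macro does not apply */
int FooBar = 2;          /* one token, FooBar: likewise untouched */

int sum_demo(void)
{
    return Foo + Foo1 + FooBar;   /* 100 + 1 + 2 */
}
```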
I would suggest you look at string replacement tools such as sed or awk to solve your problem.
I have a string array char version[][8] = {"new", "old", "latest", "oldest", "ancient"};
and I have a macro
#define FS(file, attr) \
filesys(file, file_ ##attr## _ops)
How can I pass members of the version string array into the FS macro?
You cannot. Macros are expanded at compile time, so the preprocessor cannot splice the runtime contents of your strings into an identifier the way you want. Instead, try using strcat(); just don't forget that you need to keep track of how large your string arrays are.
You can't use any C language features in macros, because the preprocessor knows nothing about C itself.
Bear in mind that it is compile-time token substitution.
Use normal functions instead.
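One way the "normal function" suggestion could look is a runtime lookup table that maps each version string to the object the macro would have named. Everything below is an assumption made for illustration: struct file_ops, the three file_*_ops objects, and lookup_ops are hypothetical, since the question does not show what type the real file_<attr>_ops objects have.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for whatever type the file_<attr>_ops objects have. */
struct file_ops { int id; };

struct file_ops file_new_ops = { 1 };
struct file_ops file_old_ops = { 2 };
struct file_ops file_latest_ops = { 3 };

/* Pairs each version name with the object the macro would have named. */
static const struct {
    const char *name;
    struct file_ops *ops;
} ops_table[] = {
    { "new",    &file_new_ops },
    { "old",    &file_old_ops },
    { "latest", &file_latest_ops },
};

/* Returns the ops object for `name`, or NULL if the name is unknown. */
struct file_ops *lookup_ops(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof ops_table / sizeof ops_table[0]; i++) {
        if (strcmp(ops_table[i].name, name) == 0)
            return ops_table[i].ops;
    }
    return NULL;
}
```

A call site would then pass lookup_ops(version[i]) to filesys directly instead of going through FS, after checking the result for NULL.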
I am trying to use a function-like macro to generate an object-like macro name (generically, a symbol). The following will not work because __func__ (C99 6.4.2.2-1) puts quotes around the function name.
#define MAKE_AN_IDENTIFIER(x) __func__##__##x
The desired result of calling MAKE_AN_IDENTIFIER(NULL_POINTER_PASSED) would be MyFunctionName__NULL_POINTER_PASSED. There may be other reasons this would not work (such as __func__ being taken literally and not interpreted, but I could fix that) but my question is what will provide a predefined macro like __func__ except without the quotes? I believe this is not possible within the C99 standard so valid answers could be references to other preprocessors.
Presently I have simply created my own object-like macro and redefined it manually before each function to be the function name. Obviously this is a poor and probably unacceptable practice. I am aware that I could take an existing cpp program or library and modify it to provide this functionality. I am hoping there is either a commonly used cpp replacement which provides this or a preprocessor library (prefer Python) which is designed for extensibility so as to allow me to 'configure' it to create the macro I need.
I wrote the above to try to provide a concise and well-defined question, but it is certainly the Y referred to by @Ruud. The X is...
I am trying to manage unique values for reporting errors in an embedded system. The values will be passed as a parameter to a(some) particular function(s). I have already written a Python program using pycparser to parse my code and identify all symbols being passed to the function(s) of interest. It generates a .h file of #defines maintaining the values of previously existing entries, commenting out removed entries (to avoid reusing the value and also allow for reintroduction with the same value), assigning new unique numbers for new identifiers, reporting malformed identifiers, and also reporting multiple use of any given identifier. This means that I can simply write:
void MyFunc(int * p)
{
if (p == NULL)
{
myErrorFunc(MYFUNC_NULL_POINTER_PASSED);
return;
}
// do something actually interesting here
}
and the Python program will create the #define MYFUNC_NULL_POINTER_PASSED 7 (or whatever next available number) for me with all the listed considerations. I have also written a set of macros that further simplify the above to:
#define FUNC MYFUNC
void MyFunc(int * p)
{
RETURN_ASSERT_NOT_NULL(p);
// do something actually interesting here
}
assuming I provide the #define FUNC. I want to use the function name since that will be constant throughout many changes (as opposed to __LINE__) and will make it much easier for someone to transfer the value from the old generated #define to the new generated #define when the function itself is renamed. Honestly, I think the only reason I am trying to 'solve' this 'issue' is because I have to work in C rather than C++. At work we are writing fairly object-oriented C, and so there is a lot of NULL pointer checking and IsInitialized checking. I have two-line functions that turn into 30 because of all these basic checks (these macros reduce those lines by a factor of five). While I do enjoy the challenge of crazy macro development, I much prefer to avoid them. That said, I dislike repeating myself and hiding the functional code in a pile of error checking even more than I dislike crazy macros.
If you prefer to take a stab at this issue, have at it.
__FUNCTION__ used to compile to a string literal (I think in gcc 2.96), but it hasn't for many years. Now we have __func__ instead, which behaves like a static character array rather than a string literal, and GCC keeps __FUNCTION__ as another name for it for backward compatibility. (The change was a bit painful.)
But in neither case was it possible to use this predefined macro to generate a valid C identifier (i.e. "remove the quotes").
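For contrast, here is a sketch of what __func__ can do: it is usable as a runtime string, so it can be combined with a stringified tag in an error report. The report_error hook, the REPORT macro, and get_last_error are hypothetical names for this example, and the result is text built at run time, not an identifier the preprocessor or an external generator could see.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reporting hook: records the most recent error as text. */
static char last_error[128];

void report_error(const char *function, const char *code)
{
    snprintf(last_error, sizeof last_error, "%s__%s", function, code);
}

/* __func__ is an ordinary char array here, and #code stringifies the tag. */
#define REPORT(code) report_error(__func__, #code)

void MyFunc(const int *p)
{
    if (p == NULL) {
        REPORT(NULL_POINTER_PASSED);
        return;
    }
    /* real work would go here */
}

const char *get_last_error(void)
{
    return last_error;
}
```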
But could you instead use the line number rather than function name as part of your identifier?
If so, the following would work. As an example, compiling the following 5-line source file:
#define CONCAT_TOKENS4(a,b,c,d) a##b##c##d
#define EXPAND_THEN_CONCAT4(a,b,c,d) CONCAT_TOKENS4(a,b,c,d)
#define MAKE_AN_IDENTIFIER(x) EXPAND_THEN_CONCAT4(line_,__LINE__,__,x)

static int MAKE_AN_IDENTIFIER(NULL_POINTER_PASSED);
will generate the warning:
foo.c:5: warning: 'line_5__NULL_POINTER_PASSED' defined but not used
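The extra EXPAND_THEN_CONCAT4 level matters because operands of ## are not macro-expanded before pasting. The sketch below shows the difference, using a hypothetical LINE_NO macro in place of __LINE__ so the result does not depend on where the code sits in the file, and stringifying the pasted name so it can be inspected.

```c
#include <assert.h>
#include <string.h>

#define PASTE_DIRECT(a, b) a##b
#define PASTE_EXPANDED(a, b) PASTE_DIRECT(a, b)

/* Stringify after expansion, so the pasted result can be inspected. */
#define STR_RAW(x) #x
#define STR(x) STR_RAW(x)

#define LINE_NO 5   /* hypothetical stand-in for __LINE__ */

/* Pasting directly glues the macro's name, not its value. */
const char *direct_name(void)
{
    return STR(PASTE_DIRECT(line_, LINE_NO));
}

/* Going through one more macro expands LINE_NO before the paste. */
const char *expanded_name(void)
{
    return STR(PASTE_EXPANDED(line_, LINE_NO));
}
```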
As pointed out by others, there is no macro that returns the (unquoted) function name (mainly because the C preprocessor has insufficient syntactic knowledge to recognize functions). You would have to define such a macro explicitly, as you already did:
#define FUNC MYFUNC
To avoid having to do this manually, you could write your own preprocessor to add the macro definition automatically. A similar question is this: How to automatically insert pragmas in your program
If your source code has a consistent coding style (particularly indentation), then a simple line-based filter (sed, awk, perl) might do. In its most naive form: every function starts with a line that does not start with a hash or whitespace, and ends with a closing parenthesis or a comma. With awk:
{
print $0;
}
/^[^# \t].*[,\)][ \t]*$/ {
sub(/\(.*$/, "");
sub(/^.*[ \t]/, "");
print "#define FUNC " toupper($0);
}
For a more robust solution, you need a compiler framework like ROSE.
GNU C has a __FUNCTION__ macro, but sadly even that cannot be used in the way you are asking.
Ignoring that there are sometimes better non-macro ways to do this (I have good reasons, sadly), I need to write a big bunch of generic code using macros. Essentially a macro library that will generate a large number of functions for some pre-specified types.
To avoid breaking a large number of pre-existing unit tests, one of the things the library must do is, for every type, generate the name of that type in all caps for printing. E.g. a type "flag" must be printed as "FLAG".
I could just manually write out constants for each type, e.g.
#define flag_ALLCAPSNAME FLAG
but this is not ideal. I'd like to be able to do this programmatically.
At present, I've hacked this together:
char capname_buf[BUFSIZ];
#define __MACRO_TO_UPPERCASE(arg) strcpy(capname_buf, arg); \
for(char *c=capname_buf;*c;c++)*c = (*c >= 'a' && *c <= 'z')? *c - 'a' + 'A': *c;
__MACRO_TO_UPPERCASE(#flag)
which does what I want to some extent (i.e. after this bit of code, capname_buf has "FLAG" as its contents), but I would prefer a solution that would allow me to define a string literal using macros instead, avoiding the need for this silly buffer.
I can't see how to do this, but perhaps I'm missing something obvious?
I have a variadic foreach loop macro written (like this one), but I can't mutate the contents of the string literal produced by #flag, and in any case, my loop macro would need a list of character pointers to iterate over (i.e. it iterates over lists, not over indices or the like).
Thoughts?
It is not possible in portable C99 to have a macro which converts a constant string to all uppercase letters (in particular because the notion of a letter depends on the character encoding: a UTF-8 letter is not the same as an ASCII one).
However, you might consider some other solutions.
customize your editor to do that. For example, you could write some emacs code which would update each C source file as you require.
use some preprocessor on your C source code (perhaps a simple C code generator script which would emit a bunch of #define in some #include-d file).
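use GCC extensions (statement expressions and __COUNTER__) to have perhaps
#define TO_UPPERCASE(Str) TO_UPPERCASE_EXPAND(Str,__COUNTER__)
/* extra level so that __COUNTER__ is expanded before ## pastes it */
#define TO_UPPERCASE_EXPAND(Str,Cnt) TO_UPPERCASE_COUNTED(Str,Cnt)
/* needs <ctype.h> for toupper() and <stddef.h> for size_t */
#define TO_UPPERCASE_COUNTED(Str,Cnt) ({ \
  static char buf_##Cnt[sizeof(Str)+4]; \
  const char *str_##Cnt = Str; \
  size_t ix_##Cnt = 0; \
  for (; *str_##Cnt; str_##Cnt++, ix_##Cnt++) \
    if (ix_##Cnt < sizeof(buf_##Cnt)-1) \
      buf_##Cnt[ix_##Cnt] = toupper((unsigned char)*str_##Cnt); \
  buf_##Cnt; })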
customize GCC, perhaps using MELT (a domain specific language to extend GCC), to provide your __builtin_capitalize_constant to do the job (edit: MELT is now an inactive project). Or code in C++ your own GCC plugin doing that (caveat, it will work with only one given GCC version).
It's not possible to do this entirely with the C preprocessor. The reason is that the preprocessor reads its input as (atomic) pp-tokens, from which it composes the output. There is no construct that lets the preprocessor decompose a pp-token into individual characters (none that would help you here, anyway).
In your example, when the preprocessor reads the string literal "flag", it is basically an atomic chunk of text as far as the preprocessor is concerned. It has constructs to conditionally remove such chunks or to glue them together into larger chunks.
The only construct that lets you decompose a pp-token in some sense is evaluation in certain expressions. However, these expressions only work on arithmetic types, which is why they won't help you here.
Your approach circumvents this problem by using C language constructs, i.e. you do the conversion at runtime. The only thing the preprocessor does then is insert the C code that converts the string.
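If the conversion has to happen at run time anyway, the division of labour described above can be made explicit: the preprocessor only stringifies the type name, and an ordinary function does the uppercasing into a caller-supplied buffer. This is a minimal sketch; TYPE_NAME and to_upper_copy are hypothetical names, and toupper is assumed to run in the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TYPE_NAME(type) #type   /* the preprocessor's part: yields "flag" */

/* Copies `src` into `dst` in upper case, truncating to fit, and always
 * NUL-terminates. Returns dst. `size` must be at least 1. */
char *to_upper_copy(char *dst, size_t size, const char *src)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL && size > 0);

    for (; src[i] != '\0' && i + 1 < size; i++)
        dst[i] = (char)toupper((unsigned char)src[i]);
    dst[i] = '\0';
    return dst;
}
```

Compared with the global capname_buf in the question, the buffer is owned by the caller, so several names can be converted without overwriting each other.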
Is there a function or method to access C's keywords as mentioned in the question? The only way I can think of it is creating constants that will just be checked to see if any match, but that could be a lot to type, since there are a lot of keywords. I was hoping there was something. (New to C)
It is for homework, so I cannot use regular expressions or parsing libraries. The purpose of the homework is to give my program a function and have it return just the identifiers, which is why I was hoping there was an easier way to access the keywords than typing them all.
Example:
int foo (int args)
{
int x = 7;
char c = 'a';
args = x + c;
return args;
}
And it should return foo, args, x, c.
I am not looking for an answer, so a good hint if there is one would be great! If not, then just let me know that the tedious way is the only option.
To identify the identifiers (as distinct from other token kinds) in the source, you need to lex the source.
One of the easiest ways to do this is to implement Thompson's Algorithm and use the preprocessing grammar from the C99 language specification. Once the source is lexed (or during lexing), you just need to create the list of preprocessing identifiers that are not C99 keywords. It's quite straightforward to implement this in a couple hundred lines of code.
You will need to write a program to read the file, building 'words' from sequences of alphanumeric characters. You'll need a list of the keywords in C - which is quite short. Then you'll compare the words you read against the list of keywords and print out the first occurrence of each (so you'll also need to store the words you've seen).
You'll need to know what you're expected to do with preprocessor directives; you may be able to ignore them. You'll need to know how to recognize numbers, character strings and character constants. You'll need to know how to recognize both /* ... */ and // ... to EOL comments (or maybe not in the first version).
Eventually, you might get sucked into nastinesses such as strings that extend over line breaks and comments such as:
/\
\
* This is a C comment
*\
\
/
However, you can almost certainly omit those subtleties in a first pass.
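The word-building step described above could be sketched roughly as follows. The function name next_word and its interface are hypothetical, and the sketch deliberately takes the "first pass" shortcuts: it does not skip comments, string literals, character constants, or preprocessor directives, so words inside those would still be reported.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Finds the next identifier-shaped word in `src`, starting at *pos.
 * Copies it into `word` (truncating to fit) and advances *pos past it.
 * Returns 1 if a word was found, 0 at end of input. */
int next_word(const char *src, size_t *pos, char *word, size_t size)
{
    size_t i = *pos, n = 0;

    assert(src != NULL && pos != NULL && word != NULL && size > 0);

    /* Skip anything that cannot start an identifier. A number is skipped
     * as a whole so that suffixes such as the u in 10u are not reported. */
    while (src[i] != '\0' && !(isalpha((unsigned char)src[i]) || src[i] == '_')) {
        if (isdigit((unsigned char)src[i])) {
            while (isalnum((unsigned char)src[i]) || src[i] == '_')
                i++;
        } else {
            i++;
        }
    }
    if (src[i] == '\0') {
        *pos = i;
        return 0;
    }

    /* Collect letters, digits, and underscores. */
    while (isalnum((unsigned char)src[i]) || src[i] == '_') {
        if (n + 1 < size)
            word[n++] = src[i];
        i++;
    }
    word[n] = '\0';
    *pos = i;
    return 1;
}
```

Each word returned would then be compared against the keyword list and against the words already seen.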
There is no built-in way of accessing the language from inside itself. Welcome to C, the land of do-it-yourself. Yes, you're going to have to tokenize the input stream and test each word. For tokenizing, check out the strcspn() function (a complement string of " \t\n" (space, tab, newline) is probably good enough to get you going there).
Then build a NULL-terminated array of strings, e.g.
/* the words to filter out, i.e. the C keywords */
const char *keywords[] = {
"int",
"continue",
NULL
};
and iterate over that, doing strcmp() on the input vs the members of the array. If you hit the terminating NULL, you know it's not in the array (bonus points for using a sorted array and libc's bsearch(3) utilities!).
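The "bonus points" variant might look like the sketch below. It uses a sorted array with an explicit element count instead of a NULL terminator, because bsearch needs the count. The table is only a subset of the C99 keywords picked for the example, and is_keyword and compare_keyword are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A sorted subset of the C99 keywords; a full table would list them all. */
static const char *const keywords[] = {
    "char", "continue", "else", "for", "if", "int", "return", "void", "while"
};

/* bsearch passes a pointer to the key and a pointer to an array element. */
static int compare_keyword(const void *key, const void *element)
{
    return strcmp((const char *)key, *(const char *const *)element);
}

/* Returns 1 if `word` is in the table, 0 otherwise. */
int is_keyword(const char *word)
{
    assert(word != NULL);
    return bsearch(word, keywords,
                   sizeof keywords / sizeof keywords[0],
                   sizeof keywords[0], compare_keyword) != NULL;
}
```

The table must stay sorted in strcmp order for bsearch to work; for a list this short, a linear scan with strcmp is equally reasonable.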