C syntax: character combinations valid in any context

Here's a formal grammar brain teaser (maybe :P)
I'm fairly certain there is no context where the character sequence => may appear in a valid C program (except obviously within a string). However, I'm unable to prove this to myself. Can you either:
Describe a method that I can use for an arbitrary character sequence to determine whether it is possible in a valid C program (outside a string/comment). Better solutions require less intuition.
Point out a program that does this. I have a weak gut feeling this could be undecidable but it'd be great if I was wrong.
To get your minds working, other combos I've been thinking about:
:- (b ? 1:-1), !? don't think so, ?! (b ?!x:y), <<< don't think so.
If anyone cares: I'm interested because I'm creating a little custom C pre-processor for personal use and was hoping to not have to parse any C for it. In the end I will probably just have my tokens start with $ or maybe a backquote but I still found this question interesting enough to post.
Edit: It was quickly pointed out that header names have almost no restrictions so let me amend that I'm particularly interested in non-pre-processor code, alternatively, we could consider characters within the <> of #include <...> as a string literal.
Re-edit: I guess macros/pre-processor directives beat this question any which way I ask it :P but if anyone can answer the question for pure (read: non-macro'd) C code, I think it's an interesting one.

#include <abc=>
is valid in a C program. The text inside the <...> can be any member of the source character set except a newline and >.
This means that most character sequences, including !? and <<<, could theoretically appear.

In addition to all the other quibbles, there are a variety of cases involving macros.
The arguments to a macro expansion don't need to be syntactically correct, although of course they would need to be syntactically correct in the context of their expansion. But then, they might never be expanded:
#include <errno.h>
#define S_(a) #a
#define _(a,cosmetic,c) [a]=#a" - "S_(c)
const char* err_names[] = {
_(EAGAIN, =>,Resource temporarily unavailable),
_(EINTR, =>,Interrupted system call),
_(ENOENT, =>,No such file or directory),
_(ENOTDIR, =>,Not a directory),
_(EPERM, =>,Operation not permitted),
_(ESRCH, =>,No such process),
};
#undef _
const int nerr = sizeof(err_names)/sizeof(err_names[0]);
Or, they could be used but in stringified form:
#define _(a,b,c) [a]=#a" "S_(b)" "S_(c)
Note: Why #a but S_(c)? Because EAGAIN and friends are macros, not constants, and in this case we don't want them to be expanded before stringification.
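A minimal sketch of the same point, reduced to two macros. FIRST and TEXT are hypothetical names, not taken from the example above. The sequence => is passed as a macro argument that is either discarded or stringified, so the compiler proper never sees it as a pair of operators:

```c
/* FIRST and TEXT are illustrative names, not part of any library. */
#define FIRST(a, ignored) a   /* the second argument is never emitted */
#define TEXT(x) #x            /* stringifies the spelling of the argument */

/* The tokens = and > vanish during preprocessing; this returns 42. */
int answer(void) { return FIRST(42, =>); }

/* After preprocessing this is the ordinary string literal "=>". */
const char *arrow(void) { return TEXT(=>); }
```

Both functions compile as plain C, although => appears outside any string literal or comment in the source text.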

/*=>*/
//=>
"=>"
'=>'
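For comparison, the two combinations from the question that do occur in ordinary expressions can be checked with a small sketch. The function names sign and pick are made up for illustration:

```c
/* "b ? 1:-1" contains ":-": the colon of ?: followed by unary minus. */
int sign(int b) { return b ? 1:-1; }

/* "b ?!x:y" contains "?!": the question mark of ?: followed by logical not. */
int pick(int b, int x, int y) { return b ?!x:y; }
```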

Related

where is __DARWIN_NULL macro definition?

I have to return nul in a function but I'm not allowed to include any library. I tried to find out how NULL is defined, which led me to sys/_types/_null.h, only to find that NULL is actually __DARWIN_NULL. Great! Now I have no idea where to search for the __DARWIN_NULL definition...

I have to return nul in a function but I'm not allowed to include any library.
The solution to this problem is far simpler than you're making it; you've actually asked a form of XY problem.
You won't find the NUL character defined in any standard library; the best way to return that will be using the constant '\0' or 0.
If your professor is teaching you to avoid using <stddef.h> to find NULL then he/she has set a silly exercise which involves using something other than the most appropriate tool for the job, a tool which is guaranteed by the standard to exist, by the way... I would be raising this as a concern.
Nonetheless, sometimes professors don't care and will teach you to do stupid things anyway. NULL is defined as an implementation-defined null pointer constant, usually 0 or a conversion of 0 to void * like so: ((void *) 0). That may not be the implementation-defined value your NULL resolves to; in that case, adjust to suit :)
You could add a preprocessor definition such as #define NULL ((void *) 0) and then you would be able to return NULL; from your functions. Ta-da! Stupid exercises deserve stupid solutions.
If your professor says you're not allowed to use #define, either, I would be tempted to ask him what you are allowed to use. Changing requirements on the fly is not fair. Most compilers will allow you to set preprocessor constants using a command line argument, for example cc -DNULL='((void *) 0)' .... This is useful for exposing compile-time configuration options, but again, using this to define NULL is dumb.
The question in your title is different to the rest of your post. __DARWIN_NULL could also be defined using either of the above, providing it isn't already defined, but I really don't think that's required to answer your actual question.
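A minimal sketch of the #define approach described above, assuming no headers may be included. MY_NULL and find_char are hypothetical names chosen to avoid clashing with a real NULL; the NUL character, by contrast, is simply written '\0':

```c
/* MY_NULL is a hypothetical name; NULL itself normally comes from <stddef.h>. */
#define MY_NULL ((void *)0)

/* Returns a pointer to the first occurrence of c in s, or a null pointer. */
const char *find_char(const char *s, char c)
{
    for (; *s != '\0'; ++s) {   /* '\0' is the NUL character, not NULL */
        if (*s == c) {
            return s;
        }
    }
    return MY_NULL;             /* void * converts implicitly in C */
}
```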

How can I get the function name as text not string in a macro?

I am trying to use a function-like macro to generate an object-like macro name (generically, a symbol). The following will not work because __func__ (C99 6.4.2.2-1) puts quotes around the function name.
#define MAKE_AN_IDENTIFIER(x) __func__##__##x
The desired result of calling MAKE_AN_IDENTIFIER(NULL_POINTER_PASSED) would be MyFunctionName__NULL_POINTER_PASSED. There may be other reasons this would not work (such as __func__ being taken literally and not interpreted, but I could fix that) but my question is what will provide a predefined macro like __func__ except without the quotes? I believe this is not possible within the C99 standard so valid answers could be references to other preprocessors.
Presently I have simply created my own object-like macro and redefined it manually before each function to be the function name. Obviously this is a poor and probably unacceptable practice. I am aware that I could take an existing cpp program or library and modify it to provide this functionality. I am hoping there is either a commonly used cpp replacement which provides this or a preprocessor library (prefer Python) which is designed for extensibility so as to allow me to 'configure' it to create the macro I need.
I wrote the above to try to provide a concise and well defined question but it is certainly the Y referred to by @Ruud. The X is...
I am trying to manage unique values for reporting errors in an embedded system. The values will be passed as a parameter to a(some) particular function(s). I have already written a Python program using pycparser to parse my code and identify all symbols being passed to the function(s) of interest. It generates a .h file of #defines maintaining the values of previously existing entries, commenting out removed entries (to avoid reusing the value and also allow for reintroduction with the same value), assigning new unique numbers for new identifiers, reporting malformed identifiers, and also reporting multiple use of any given identifier. This means that I can simply write:
void MyFunc(int * p)
{
if (p == NULL)
{
myErrorFunc(MYFUNC_NULL_POINTER_PASSED);
return;
}
// do something actually interesting here
}
and the Python program will create the #define MYFUNC_NULL_POINTER_PASSED 7 (or whatever next available number) for me with all the listed considerations. I have also written a set of macros that further simplify the above to:
#define FUNC MYFUNC
void MyFunc(int * p)
{
RETURN_ASSERT_NOT_NULL(p);
// do something actually interesting here
}
assuming I provide the #define FUNC. I want to use the function name since that will be constant throughout many changes (as opposed to __LINE__) and will be much easier for someone to transfer the value from the old generated #define to the new generated #define when the function itself is renamed. Honestly, I think the only reason I am trying to 'solve' this 'issue' is because I have to work in C rather than C++. At work we are writing fairly object oriented C and so there is a lot of NULL pointer checking and IsInitialized checking. I have two line functions that turn into 30 because of all these basic checks (these macros reduce those lines by a factor of five). While I do enjoy the challenge of crazy macro development, I much prefer to avoid them. That said, I dislike repeating myself and hiding the functional code in a pile of error checking even more than I dislike crazy macros.
If you prefer to take a stab at this issue, have at.

__FUNCTION__ used to compile to a string literal (I think in gcc 2.96), but it hasn't for many years. Now instead we have __func__, which compiles to a string array, and __FUNCTION__ is a deprecated alias for it. (The change was a bit painful.)
But in neither case was it possible to use this predefined macro to generate a valid C identifier (i.e. "remove the quotes").
But could you instead use the line number rather than function name as part of your identifier?
If so, the following would work. As an example, compiling the following 5-line source file:
/* foo.c */
#define CONCAT_TOKENS4(a,b,c,d) a##b##c##d
#define EXPAND_THEN_CONCAT4(a,b,c,d) CONCAT_TOKENS4(a,b,c,d)
#define MAKE_AN_IDENTIFIER(x) EXPAND_THEN_CONCAT4(line_,__LINE__,__,x)
static int MAKE_AN_IDENTIFIER(NULL_POINTER_PASSED);
will generate the warning:
foo.c:5: warning: 'line_5__NULL_POINTER_PASSED' defined but not used

As pointed out by others, there is no macro that returns the (unquoted) function name (mainly because the C preprocessor has insufficient syntactic knowledge to recognize functions). You would have to define such a macro explicitly, as you already did:
#define FUNC MYFUNC
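A sketch of how a manually defined FUNC can feed the identifier-building macro from the question. CONCAT3, CONCAT3_ and the value 7 are assumptions made for illustration. The extra macro level is what lets FUNC expand before ## pastes the tokens together:

```c
/* CONCAT3_ pastes its arguments as written; CONCAT3 expands them first. */
#define CONCAT3_(a, b, c) a##b##c
#define CONCAT3(a, b, c) CONCAT3_(a, b, c)
#define MAKE_AN_IDENTIFIER(x) CONCAT3(FUNC, __, x)

/* Stand-in for an entry of the generated header (value chosen arbitrarily). */
#define MYFUNC__NULL_POINTER_PASSED 7

#define FUNC MYFUNC
/* Returns the error value for a null argument, 0 otherwise. */
int MyFunc(const int *p)
{
    if (p == 0) {
        /* Expands to MYFUNC__NULL_POINTER_PASSED, and from there to 7. */
        return MAKE_AN_IDENTIFIER(NULL_POINTER_PASSED);
    }
    return 0;
}
#undef FUNC
```

Pasting directly with FUNC##__##x would instead produce the literal identifier FUNC__NULL_POINTER_PASSED, because operands of ## are not macro-expanded.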
To avoid having to do this manually, you could write your own preprocessor to add the macro definition automatically. A similar question is this: How to automatically insert pragmas in your program
If your source code has a consistent coding style (particularly indentation), then a simple line-based filter (sed, awk, perl) might do. In its most naive form: every function starts with a line that does not start with a hash or whitespace, and ends with a closing parenthesis or a comma. With awk:
{
print $0;
}
/^[^# \t].*[,\)][ \t]*$/ {
sub(/\(.*$/, "");
sub(/^.*[ \t]/, "");
print "#define FUNC " toupper($0);
}
For a more robust solution, you need a compiler framework like ROSE.

GNU C has a __FUNCTION__ macro, but sadly even that cannot be used in the way you are asking.
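A short sketch of why this is so, using the standard __func__ (C99). Inside a function it behaves like a static const char array holding the name, so it exists only after preprocessing is over. The names report_name and is_report_name are hypothetical:

```c
#include <string.h>

/* Inside this function, __func__ acts like
   static const char __func__[] = "report_name";
   it is an object, not a preprocessing token. */
const char *report_name(void) { return __func__; }

/* The name can be compared as a string at run time, but not pasted with ##. */
int is_report_name(const char *candidate)
{
    return strcmp(candidate, report_name()) == 0;
}
```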

C macros: advantage/intent of apparently useless macro

I have some experience in programming in C but I would not dare to call myself proficient.
Recently, I encountered the following macro:
#define CONST(x) (x)
I find it typically used in expressions like for instance:
double x, y;
x = CONST(2.0)*y;
Completely baffled by the point of this macro, I extensively researched the advantages/disadvantages and properties of macros but still I can not figure out what the use of this particular macro would be. Am I missing something?

As presented in the question, you are right that the macro does nothing.
This looks like some artificial structure imposed by whoever wrote that code, maybe to make it abundantly clear where the constants are, and be able to search for them? I could see the advantage in having searchable constants, but this is not the best way to achieve that goal.
It's also possible that this was part of some other macro scheme that either never got implemented or was only partially removed.

Some (old) C compilers do not support the const keyword, and this macro is most probably a remnant of a more elaborate set of macros that handled different compilers. Used as in x = CONST(2.0)*y;, though, it makes no sense.
You can check this section from the Autoconf documentation for more details.
EDIT: Another purpose of this macro might be custom preprocessing (for extracting and/or replacing certain constants for example), like Qt Framework's Meta Object Compiler does.

There is absolutely no benefit of that macro and whoever wrote it must be confused. The code is completely equivalent to x = 2.0*y;.

Well, this kind of macro could actually be useful when there is a need to work around the macro expansion.
A typical example of such a need is the stringification macro. Refer to the following question for an example: C Preprocessor, Stringify the result of a macro
Now in your specific case, I don't see the benefit apart from extreme documentation or code parsing purposes.
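The stringification case mentioned above can be sketched as follows. VERSION, STR_DIRECT and STR are hypothetical names. The pass-through macro forces its argument to be expanded before # is applied:

```c
#define VERSION 42            /* hypothetical object-like macro */
#define STR_DIRECT(x) #x      /* # sees the argument unexpanded */
#define STR(x) STR_DIRECT(x)  /* the extra level expands x first */

const char *direct_text(void)   { return STR_DIRECT(VERSION); } /* "VERSION" */
const char *expanded_text(void) { return STR(VERSION); }        /* "42" */
```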

Another use could be to reserve those values as future function invocations, something like this:
/* #define CONST(x) (x) */
#define CONST(x) some_function(x)
// ...
double x, y;
x = CONST(2.0)*y; // x = some_function(2.0)*y;

Another good thing about this macro would be something like this
result=CONST(number+number)*2;
or something related to comparisons
result=CONST(number>0)*2;
If there is some problem with this macro, it is probably the name. This "CONST" thing isn't related with constants but with some other thing. It would be nice to look for the rest of the code to know why the author called it CONST.

This macro does have the effect of wrapping parentheses around x during the macro expansion.
I'm guessing someone is trying to allow for something along the lines of
CONST(3+2)*y
which, without the parens, would become
3+2*y
but with the parens becomes
(3+2)*y
I seem to recall that we had the need for something like this in a previous development lifetime.
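The difference is easy to check with a small sketch. BARE is a hypothetical variant of the macro without parentheses, added only for comparison:

```c
#define CONST(x) (x)   /* the macro from the question */
#define BARE(x) x      /* hypothetical variant without parentheses */

int with_parens(int y)    { return CONST(3+2)*y; }  /* (3+2)*y */
int without_parens(int y) { return BARE(3+2)*y; }   /* 3+2*y   */
```

With y equal to 4, the first function yields 20 and the second yields 11.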

C macro/#define indentation?

I'm curious as to why I see nearly all C macros formatted like this:
#ifndef FOO
# define FOO
#endif
Or this:
#ifndef FOO
#define FOO
#endif
But never this:
#ifndef FOO
    #define FOO
#endif
(moreover, vim's = operator only seems to count the first two as correct.)
Is this due to portability issues among compilers, or is it just a standard practice?

I've seen it done all three ways; it seems to be a matter of style, not of syntax.
While the second example is usually the most common, I've seen cases where the first (or third) is used to help distinguish multiple levels of #ifdefs. Sometimes the logic can become deeply nested, and the only way to understand it at a glance is to use indentation, much as it is common practice to indent blocks of code between { and }.
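A sketch of nested conditionals in which the indentation carries the structure. LOG_LEVEL and LOG_VERBOSE are made-up names. The first block uses spaces after the #, the second indents the whole directive, and both forms are accepted by a standard-conforming preprocessor:

```c
#ifndef LOG_LEVEL
#  define LOG_LEVEL 2            /* first style: spaces after the # */
#endif

#if LOG_LEVEL > 0
    #if LOG_LEVEL > 1
        #define LOG_VERBOSE 1    /* third style: whole directive indented */
    #else
        #define LOG_VERBOSE 0
    #endif
#else
    #define LOG_VERBOSE 0
#endif

int log_verbose(void) { return LOG_VERBOSE; }
```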

IIRC, older C preprocessors required the # to be the first character on the line (though I've never actually encountered one that had this requirement).
I have never seen code like your first example. I usually wrote preprocessor directives as in your second example. I found that it visually interfered with the indentation of the actual code less (not that I write in C anymore).
The GNU C Preprocessor manual says:
Preprocessing directives are lines in your program that start with '#'. Whitespace is allowed before and after the '#'.

For preference I use the third style, with the exception of include guards, for which I use the second style.
I don't like the first style at all - I think of #define as being a preprocessor instruction, even though really of course it isn't, it's a # followed by the preprocessor instruction define. But since I do think of it that way, it seems wrong to separate them. I expect text editors written by people who advocate that style will have a block indent/un-indent that works on code written in that style. But I would hate to encounter it using a text editor that didn't.
There's no point pandering to ancient preprocessors where the # has to be the first character of the line, unless you can also list off the top of your head all the other differences between those implementations and standard C, in order to avoid the other things you could possibly do that they would not support. Of course if you genuinely are working with a pre-standard compiler, fair enough.

Preprocessor directives are lines included in our programs that are not actually program statements but directives for the preprocessor. These lines are always preceded by a hash sign (#). Whitespace is allowed before and after the '#'. As soon as a newline character is found, the preprocessor directive is considered to end.
There is no other rule as far as the C/C++ standards are concerned, so it remains a matter of style and readability. I have only seen and written programs in the second way that you posted, although the third one seems more readable.

Finding the name of a variable in C

I was asked a question in C last night and I did not know the answer since I have not used C much since college so I thought maybe I could find the answer here instead of just forgetting about it.
If a person has a define such as:
#define count 1
Can that person find the variable name count using the 1 that is inside it?
I did not think so since I thought the count would point to the 1 but do not see how the 1 could point back to count.

Building on @Cade Roux's answer, if you use a preprocessor #define to associate a value with a symbol, the code won't have any reference to the symbol once the preprocessor has run:
#define COUNT (1)
...
int myVar = COUNT;
...
After the preprocessor runs:
...
int myVar = (1);
...
So as others have noted, this basically means "no", for the above reason.

The simple answer is no, they can't. #defines like that are dealt with by the preprocessor, and they only point in one direction. Of course the other problem is that even the compiler wouldn't know, as a "1" could point to anything: multiple variables can have the same value at the same time.

Can that person find the variable name "count" using the 1 that is inside it?
No

As I'm sure someone more eloquent and versed than me will point out, #define'd things aren't compiled into the program; what you have is a pre-processor macro, and the pre-processor will go through the source and replace every instance of 'count' it finds with a '1'.
However, to shed more light on the question you were asked, because C is a compiled language down to the machine code you are never going to have the reflection and introspection you have with a language like Java, or C#. All the naming is lost after compilation unless you have a framework built around your source/compiler to do some nifty stuff.
Hope this helps. (excuse the pun)

Unfortunately this is not possible.
#define statements are instructions for the preprocessor, all instances of count are replaced with 1. At runtime there is no memory location associated with count, so the effort is obviously futile.
Even if you're using variables, after compilation there will be no remnants of the original identifiers used in the program. This is generally only possible in dynamic languages.

One trick used in C is using the # syntax in macros to obtain the string literal of the macro parameter.
#include <stdio.h>
#define displayInt(val) printf("%s: %d\n",#val,val)
#define displayFloat(val) printf("%s: %.3f\n",#val,val)
#define displayString(val) printf("%s: %s\n",#val,val)
int main(){
int foo=123;
float bar=456.789;
char thud[]="this is a string";
displayInt(foo);
displayFloat(bar);
displayString(thud);
return 0;
}
The output should look something like the following:
foo: 123
bar: 456.789
thud: this is a string

#define count 1 is a very bad idea, because it prevents you from naming any variables or structure fields count.
For example:
void copyString(char* dst, const char* src, size_t count) {
...
}
Your count macro will cause the variable name to be replaced with 1, preventing this function from compiling:
void copyString(char* dst, const char* src, size_t 1) {
...
}

C defines are a pre-processor directive, not a variable. The pre-processor will go through your C file and replace where you write count with what you've defined it as, before compiling. Look at the obfuscated C contest entries for some particularly enlightened uses of this and other pre-processor directives.
The point is that there is no 'count' to point at a '1' value. It's just a simple find/replace operation that happens before the code is even really compiled.
I'll leave this editable for someone who actually really knows C to correct.

count isn't a variable. It has no storage allocated to it and no entry in the symbol table. It's a macro that gets replaced by the preprocessor before passing the source code to the compiler.
On the off chance that you aren't asking quite the right question, there is a way to get the name using macros:
#define SHOW(sym) (printf(#sym " = %d\n", sym))
#define count 1
SHOW(count); // prints "count = 1"
The # operator converts a macro argument to a string literal.

#define is a pre-processor directive, as such it is not a "variable"

What you have there is actually not a variable, it is a preprocessor directive. When you compile the code the preprocessor will go through and replace all instances of the word 'count' in that file with 1.
You might be asking: if I know 1, can I find out that count refers to it? No. Because the relationship between variable names and values is not a bijection, there is no way back. Consider
int count = 1;
int count2 = 1;
perfectly legal but what should 1 resolve to?

In general, no.
Firstly, a #define is not a variable, it is a compiler preprocessor macro.
By the time the main phase of the compiler gets to work, the name has been replaced with the value, and the name "count" will not exist anywhere in the code that is compiled.
For variables, it is not possible to find out variable names in C code at runtime. That information is not kept. Unlike languages like Java or C#, C does not keep much metadata at all; it compiles down to assembly language.

Directives starting with "#" are handled by the pre-processor, which usually does text substitution before passing the code to the 'real' compiler. As such, there is no variable called count; it's as if all "count" strings in your code are magically replaced with the "1" string.
So, no, no way to find that "variable".

In the case of a macro, the source is preprocessed and the resulting output is compiled. So there is absolutely no way to find out that name, because after the preprocessor finishes its job the resulting file contains '1' instead of 'count' everywhere.
So the answer is no.

If they are looking at the C source code (which they will be in a debugger), then they will see something like
int i = count;
at that point, they can search back and find the line
#define count 1
If, however, all they have is the variable iDontKnowWhat, and they can see it contains 1, there is no way to track that back to 'count'.
Why? Because the #define is evaluated at preprocessor time, which happens even before compilation (though for almost everyone, it can be viewed as the first stage of compilation). Consequently the source code is the only thing that has any information about 'count', like knowing that it ever existed. By the time the compiler gets a look in, every reference to 'count' has been replaced by the number '1'.

It's not a pointer, it's just a string/token substitution. The preprocessor replaces all the #defines before your code ever compiles. Most compilers include a -E or similar argument to emit preprocessed code, so you can see what the code looks like after all the #directives are processed.
More directly to your question, there's no way to tell that a token is being replaced in code. Your code can't even tell the difference between (count == 1) and (1 == 1).
If you really want to do that, it might be possible using source file text analysis, say using a diff tool.

What do you mean by "finding"?
The line
#define count 1
defines a symbol "count" that has value 1.
The first step of the compilation process (called preprocessing) will replace every occurrence of the symbol count with 1 so that if you have:
if (x > count) ...
it will be replaced by:
if (x > 1) ...
If you get this, you may see why "finding count" is meaningless.

The person asking the question (was it an interview question?) may have been trying to get you to differentiate between using #define constants versus enums. For example:
#define ZERO 0
#define ONE 1
#define TWO 2
vs
enum {
ZERO,
ONE,
TWO
};
Given the code:
x = TWO;
If you use enumerations instead of the #defines, some debuggers will be able to show you the symbolic form of the value, TWO, instead of just the numeric value of 2.
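If the program itself, rather than a debugger, needs the symbolic name, it has to store that name explicitly, since C keeps no such metadata at run time. A minimal sketch, assuming a hand-maintained lookup table; number_name is a hypothetical helper:

```c
enum number { ZERO, ONE, TWO };

/* Maps an enumerator back to its spelling; the table is maintained by hand. */
const char *number_name(enum number n)
{
    static const char *const names[] = {
        [ZERO] = "ZERO",
        [ONE]  = "ONE",
        [TWO]  = "TWO",
    };
    /* Values outside the table (including negative ones) get a placeholder. */
    if ((unsigned)n >= sizeof names / sizeof names[0]) {
        return "?";
    }
    return names[n];
}
```

The table still has to be kept in sync with the enum by hand, which is why generated code or X-macros are often used for larger sets.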
