Is there a limit to the length of a macro's contents? - c

For embedded programs, I often convert data tables into header #defines, which get dropped into variables/arrays in the .c program.
I've just written a conversion tool that can potentially produce massive output in this format, and now I'm wondering if I should be aware of any limitations of this pattern.
Header example:
#define BIG_IMAGE_BLOCK \
0x00, 0x01, 0x02, 0x03, \
0x04, 0x05, 0x06, 0x07, \
/* this goes on ... */ \
0xa8, 0xa9, 0xaa, 0xab
Code example (avr-gcc):
const uint8_t ImageData[] PROGMEM = {
BIG_IMAGE_BLOCK
};
I can't seem to find an answer to this particular question; it seems to be drowned out by everyone asking about identifier length, line length, and macro re-evaluation limits.

C17 Section 5.2.4.1, clause 1, lists a number of minimum translation limits. This means implementations are permitted, but not required, to exceed those limits. In the quote below, I've omitted a couple of references to footnotes, and highlighted one that is most likely relevant to this question.
The implementation shall be able to translate and execute at least one program that contains at least
one instance of every one of the following limits:
— 127 nesting levels of blocks
— 63 nesting levels of conditional inclusion
— 12 pointer, array, and function declarators (in any combinations) modifying an arithmetic, structure, union, or void type in a declaration
— 63 nesting levels of parenthesized declarators within a full declarator
— 63 nesting levels of parenthesized expressions within a full expression
— 63 significant initial characters in an internal identifier or a macro name (each universal character name or extended source character is considered a single character)
— 31 significant initial characters in an external identifier (each universal character name specifying a short identifier of 0000FFFF or less is considered 6 characters, each universal character name specifying a short identifier of 00010000 or more is considered 10 characters, and each extended source character is considered the same number of characters as the corresponding universal character name, if any)
— 4095 external identifiers in one translation unit
— 511 identifiers with block scope declared in one block
— 4095 macro identifiers simultaneously defined in one preprocessing translation unit
— 127 parameters in one function definition
— 127 arguments in one function call
— 127 parameters in one macro definition
— 127 arguments in one macro invocation
— 4095 characters in a logical source line
— 4095 characters in a string literal (after concatenation)
— 65535 bytes in an object (in a hosted environment only)
— 15 nesting levels for #included files
— 1023 case labels for a switch statement (excluding those for any nested switch statements)
— 1023 members in a single structure or union
— 1023 enumeration constants in a single enumeration
— 63 levels of nested structure or union definitions in a single struct-declaration-list
Relevance of the number of characters in a logical source line comes about because expansion of a macro will be into a single logical source line. For example, if \ is used in a macro definition to indicate a multi-line macro, all the parts are spliced into a single source line. This is required by Section 5.1.1.2, clause 1, second item.
Depending on how the macro is defined, it may be affected by other limits as well.
In practice, most implementations (compilers and their preprocessors) exceed these limits. For example, the length of a logical source line that the GNU compiler (GCC) accepts is limited only by available memory.
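To make the line-splicing point concrete, here is a minimal sketch of the pattern from the question at a small scale. The names SMALL_IMAGE_BLOCK, small_image, small_image_length and small_image_at are made up for illustration, PROGMEM is left out so the sketch is not AVR-specific, and static_assert assumes a C11 or later compiler. The check only shows that the expansion produced the expected number of elements; it says nothing about where a particular compiler's real limit lies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a generated header; a real one would be far longer. */
#define SMALL_IMAGE_BLOCK \
    0x00, 0x01, 0x02, 0x03, \
    0x04, 0x05, 0x06, 0x07

/* After line splicing, the macro body above is one logical source line. */
static const uint8_t small_image[] = {
    SMALL_IMAGE_BLOCK
};

/* Fails the build if the expansion did not yield the expected element count. */
static_assert(sizeof small_image / sizeof small_image[0] == 8,
              "SMALL_IMAGE_BLOCK should expand to 8 elements");

/* Number of elements the macro expanded to. */
size_t small_image_length(void)
{
    return sizeof small_image / sizeof small_image[0];
}

/* Bounds-checked read of one element. */
uint8_t small_image_at(size_t index)
{
    assert(index < small_image_length());
    return small_image[index];
}
```

If the conversion tool also writes the expected element count into the header, a check like this would catch a truncated or mis-generated table at build time.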

The C standard is very lax about specifying such limits. A C implementation must be able to translate “at least one program” with 4095 characters on a logical source line (C 2018 5.2.4.1). However, it may fail in other situations with shorter lines. The length of macro replacement text (measured in either characters or preprocessor tokens) is not explicitly addressed.
So, C implementations may have limits on the lengths of macro replacement text and other text, but these limits are not controlled by the C standard and are often poorly documented, or not documented at all, by C implementations.
A common technique for preparing complicated or massive data needed in source code is to write a separate program to be executed at compile time to process the data and write the desired source text. This is generally preferable to abusing the C preprocessor features.
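A rough sketch of such a generator step, assuming the tool is itself written in C and writes a complete array definition instead of a macro. The function name emit_table, the output layout (eight values per row) and the choice of unsigned char are all assumptions for illustration, not something taken from the question's tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes "const unsigned char NAME[] = { ... };" to out.
   Returns 0 on success, -1 on a write error or an empty table
   (an empty initializer list is not valid before C23). */
int emit_table(FILE *out, const char *name,
               const unsigned char *data, size_t len)
{
    assert(out != NULL && name != NULL && data != NULL);
    if (len == 0)
        return -1;

    if (fprintf(out, "const unsigned char %s[] = {\n", name) < 0)
        return -1;

    for (size_t i = 0; i < len; i++) {
        const char *indent = (i % 8 == 0) ? "    " : "";
        const char *sep;
        if (i + 1 == len)
            sep = "\n";        /* last value: no trailing comma */
        else if (i % 8 == 7)
            sep = ",\n";       /* end of a row of eight values */
        else
            sep = ", ";
        if (fprintf(out, "%s0x%02x%s", indent, (unsigned)data[i], sep) < 0)
            return -1;
    }

    return (fprintf(out, "};\n") < 0) ? -1 : 0;
}
```

The generated file can then be compiled as an ordinary .c file (or included), so the data never passes through a macro definition at all.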

Related

External and internal Identifier

I know how to code in C well, but I thought of learning C from the book C - The Complete Reference by Herbert Schildt. Here is a quote from Chapter 2:
In C89, at least the first 6 characters of an external identifier and at
least the first 31 characters of an internal identifier will be significant. C99 has increased these values. In C99, an external identifier has at least 31 significant characters, and an internal identifier has at least 63 significant characters.
Can somebody explain what it means to be significant?
It means that those characters are used by the compiler to distinguish between different names.
E.g. if only the first 6 characters are significant, when having two variables:
int abcdef_1;
int abcdef_2;
They will be treated as the same variable, and possibly the compiler will generate a warning or error.
About the minimal significance:
Maybe the compiler/assembler can handle more, but the linker cannot. Or maybe external tools which are out of control of the manufacturer of the assembler/linker can handle less, thus a minimum value (per type, internal/external) is defined in the C standard(s).
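The idea of "significant" characters can be modelled with a small helper. This is only an illustration of the concept, assuming a hypothetical translator that keeps exactly the first N characters of each name; names_collide is a made-up function and is not how any particular compiler implements identifier lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if two different names would be indistinguishable to a hypothetical
   translator that keeps only the first `significant` characters. */
bool names_collide(const char *a, const char *b, size_t significant)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) != 0 && strncmp(a, b, significant) == 0;
}
```

With a limit of 6, names_collide("abcdef_1", "abcdef_2", 6) is true; with a limit of 31 it is false, because the names differ at their eighth character.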

In which phase of the compiler, the error of an identifier name being too long is detected?

If an identifier has a very long name, in which phase of the compiler can this error be detected?
Also, if I assign a very large constant to a variable, is that an error?
int a=1987655321467890008766555890765433111223;
The C standard defines eight phases of translation:
1. Physical source multibyte characters and trigraph sequences are mapped to characters of the source character set.
2. Each backslash followed by a new-line is deleted (splicing together two lines).
3. The source characters are grouped into preprocessing tokens, and each sequence of white-space characters is replaced by one space, except new-lines are kept.
4. Preprocessing directives and _Pragma operators are executed, and macro invocations are expanded.
5. Source characters in strings and character constants are converted to the execution character set.
6. Adjacent string literals are concatenated.
7. Each preprocessing token is converted into a grammar token, and white-space characters separating tokens are discarded. The resulting tokens are analyzed and translated (compiled).
8. All external references are resolved (the program is linked).
The C standard does not specify in which phase problems in names or values are detected, and the phases are largely conceptual. The phases explain how the C language is understood, not how a compiler must execute.
However, given that, phase 3 is a logical time to detect names that are too long, particularly since names can be preprocessing identifiers, not just identifiers for variables in the program. But this could also be done in phase 4 for preprocessing identifiers or 7 for other identifiers. Also, the compiler might accept long identifiers up to phase 7, but the linker in phase 8 might have a shorter limit, so errors could occur in 8.
Numbers that are much too large for the compiler to handle at all might be detected in phase 3, but 7 is more likely. For numbers that are too large for the object they are being used to initialize, phase 7 is the logical time to detect the problem.
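A few of these phases have directly observable effects, which the following small sketch tries to show. The names SPLICED_SUM, spliced_value, joined and the phase_demo_* functions are invented for the example, and static_assert assumes C11 or later. It does not demonstrate where any compiler detects over-long names or out-of-range constants.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Phase 2: the backslash-newline is deleted, so this is one logical line. */
#define SPLICED_SUM 1 + \
                    2

/* Phase 4 expands the macro; phase 7 evaluates the constant expression. */
enum { spliced_value = SPLICED_SUM };

/* Phase 6: adjacent string literals are concatenated into one. */
static const char joined[] = "phase" "-" "six";

static_assert(spliced_value == 3, "spliced macro should evaluate to 3");
static_assert(sizeof joined == sizeof "phase-six",
              "concatenation should yield a single literal");

int phase_demo_sum(void)
{
    return spliced_value;
}

const char *phase_demo_string(void)
{
    return joined;
}

size_t phase_demo_length(void)
{
    return strlen(joined);
}
```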

Limit of preprocessor __VA_ARGS__ length?

I'm using a preprocessor macro va_args hack for SQL code to allow pasting directly in sqlite3.exe for that quick no-build debugging:
#define QUOTE(...) #__VA_ARGS__
char const example[] = QUOTE(
INSERT INTO Some_Table(p, q) VALUES(?, ?);
);
https://stackoverflow.com/a/17996915/1848654
However this is hardly how __VA_ARGS__ is supposed to be used. In particular my SQL code consists of hundreds of tokens.
What is the limit of __VA_ARGS__ length (if any)?
The only thing I can find is this bit in C99 (C11 still contains the same text):
5.2.4.1 Translation limits
The implementation shall be able to translate and execute at least one program that
contains at least one instance of every one of the following limits:13)
[...]
127 arguments in one macro invocation
[...]
4095 characters in a character string literal or wide string literal (after concatenation)
[...]
[...]
13) Implementations should avoid imposing fixed translation limits whenever possible.
So according to the standard there is no fixed limit. You will have to check whether your compiler documents any limits or just tries to support whatever you throw at it (until it runs out of RAM or something).
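For what it's worth, the result of the stringification is an ordinary string literal, so the 4095-character minimum for string literals is the listed limit most likely to matter. A small sketch, reusing the QUOTE macro from the question; example_sql and example_sql_length are names made up here, and static_assert assumes C11 or later:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUOTE(...) #__VA_ARGS__

/* Leading and trailing white space is dropped and each inner run of
   white space (including the new-lines) becomes a single space. */
static const char example_sql[] = QUOTE(
    INSERT INTO Some_Table(p, q) VALUES(?, ?);
);

/* Stay within the minimum the standard lists for string literals;
   many compilers accept far longer literals. */
static_assert(sizeof example_sql - 1 <= 4095,
              "SQL text exceeds the minimum guaranteed string literal length");

/* Length of the resulting string, excluding the terminating NUL. */
size_t example_sql_length(void)
{
    return strlen(example_sql);
}
```

A check like the static_assert above at least flags SQL text that has grown past the portable minimum, even if the compiler in use would accept it.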

In C, what is the maximum number of identifiers you can have?

What is the maximum number of variables/identifiers you can have in C? While learning compiler theory and interpreter design, I've learned that identifiers and their values are stored in a symbol dictionary/hashmap.
Considering that hashmaps/dictionaries have a RAM limit, what would be the max amount of hashed identifiers possible in the C programming language?
In general the number of identifiers is a quality-of-implementation issue. All compilers I know are only limited by available resources (memory).
There is, however, a (nearly useless) specification of minimum limits in the C Standard, C11, emphasis for identifiers by me:
5.2.4.1 Translation limits
The implementation shall be able to translate and execute at least one
program that contains at least one instance of every one of the
following limits:
127 nesting levels of blocks
63 nesting levels of conditional inclusion
12 pointer, array, and function declarators (in any combinations) modifying an arithmetic, structure, union, or void type in a
declaration
63 nesting levels of parenthesized declarators within a full declarator
63 nesting levels of parenthesized expressions within a full expression
63 significant initial characters in an internal identifier or a macro name (each universal character name or extended source character
is considered a single character)
31 significant initial characters in an external identifier (each universal character name specifying a short identifier of 0000FFFF or
less is considered 6 characters, each universal character name
specifying a short identifier of 00010000 or more is considered 10
characters, and each extended source character is considered the same
number of characters as the corresponding universal character name, if
any)
4095 external identifiers in one translation unit
511 identifiers with block scope declared in one block
4095 macro identifiers simultaneously defined in one preprocessing translation unit
127 parameters in one function definition
127 arguments in one function call
127 parameters in one macro definition
127 arguments in one macro invocation
4095 characters in a logical source line
4095 characters in a string literal (after concatenation)
65535 bytes in an object (in a hosted environment only)
15 nesting levels for #included files
1023 case labels for a switch statement (excluding those for any nested switch statements)
1023 members in a single structure or union
1023 enumeration constants in a single enumeration
63 levels of nested structure or union definitions in a single struct-declaration-list
I consider it nearly useless due to the "at least one program" part. I think the intent is clear, but if your vendor sells you a compiler able to translate exactly one program testing these limits, then you won't get your money back :-)
The standard doesn't specify a limit so it's down to the compiler or interpreter to make the choice.
You should also note that identifiers can be compiled out in the final binary.
There does not seem to be any information in the C standard, but the C++ standard does mention some minimum recommendations which you probably could use as a guideline:
Annex B (informative)
Implementation quantities
[implimits]
(2.8) — Identifiers with block scope declared in one block [1 024].

Ambiguous behavior of variable declaration in c

I have the following code:
#include <stdio.h>

int main(void)
{
    int a12345678901234567890123456789012345;
    int a123456789012345678901234567890123456;
    int sum;

    scanf("%d", &a12345678901234567890123456789012345);
    scanf("%d", &a123456789012345678901234567890123456);
    sum = a12345678901234567890123456789012345 + a123456789012345678901234567890123456;
    printf("%d\n", sum);
    return 0;
}
The problem is: we know that the ANSI standard recognizes variable names up to 31 characters, but both names are the same for the first 35 characters. Still, the program compiles without any error or warning and gives the correct output.
But how?
Shouldn't it give a redeclaration error?
Many compilers are built to exceed ANSI specification (for instance, in recognizing longer than 31 character variable names) as a protection to programmers. While it works in the compiler you're using, you can't count on it working in just any C compiler...
[...] we know that ANSI standard recognizes variables upto 31 characters [...] shouldn't it give an error of redeclaration?
Well, not necessarily. Since you mentioned ANSI C, this is the relevant part of the C89 standard:
"Implementation limits"
The implementation shall treat at least the first 31 characters of an internal name (a macro name or an identifier that does not have external linkage) as significant. Corresponding lower-case and upper-case letters are different. The implementation may further restrict the significance of an external name (an identifier that has external linkage) to six characters and may ignore distinctions of alphabetical case for such names.10 These limitations on identifiers are all implementation-defined.
Any identifiers that differ in a significant character are different identifiers. If two identifiers differ in a non-significant character, the behavior is undefined.
http://port70.net/~nsz/c/c89/c89-draft.html#3.1.2 (emphasis mine)
It's also explicitly described as a common extension:
Lengths and cases of identifiers
All characters in identifiers (with or without external linkage) are significant and case distinctions are observed (3.1.2)
http://port70.net/~nsz/c/c89/c89-draft.html#A.6.5.3
So, you're just exploiting a C implementation choice of your compiler.
The C89 rationale elaborates on this:
3.1.2 Identifiers
While an implementation is not obliged to remember more than the first
31 characters of an identifier for the purpose of name matching, the
programmer is effectively prohibited from intentionally creating two
different identifiers that are the same in the first 31 characters.
Implementations may therefore store the full identifier; they are not
obliged to truncate to 31.
The decision to extend significance to 31 characters for internal
names was made with little opposition, but the decision to retain the
old six-character case-insensitive restriction on significance of
external names was most painful. While strong sentiment was expressed
for making C ``right'' by requiring longer names everywhere, the
Committee recognized that the language must, for years to come,
coexist with other languages and with older assemblers and linkers.
Rather than undermine support for the Standard, the severe
restrictions have been retained.
Compilers like GCC may store the full identifier.
The number of significant initial characters in an identifier (C90 6.1.2, C90, C99 and C11 5.2.4.1, C99 and C11 6.4.2).
For internal names, all characters are significant. For external
names, the number of significant characters are defined by the linker;
for almost all targets, all characters are significant.
A conforming implementation must support at least 31 characters for an external identifier (and your identifiers are internal, where the limit is 63 for C99 and C11).
In fact, having all characters significant is the intent of the standard, but the committee doesn't want to make implementations non-conforming for not providing it. The limits for external identifiers originate from some linkers being unable to handle more (in C89, only 6 characters were required to be significant, which is why the old standard library functions have names no longer than 6 characters).
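As a sketch of why the question's code works, the two names can be put side by side. The first has 36 characters and the second 37, so they differ within the 63 characters that C99 and C11 list for internal identifiers, while a compiler holding strictly to the C89 minimum of 31 would be free to treat them as the same name. The function combine_long_names is made up for the example, and static_assert assumes C11 or later.

```c
#include <assert.h>

/* Document the lengths of the two names taken from the question. */
static_assert(sizeof "a12345678901234567890123456789012345" - 1 == 36,
              "first name has 36 characters");
static_assert(sizeof "a123456789012345678901234567890123456" - 1 == 37,
              "second name has 37 characters");

/* The two locals share their first 36 characters but are distinct objects
   on a compiler that treats at least 37 characters as significant. */
int combine_long_names(int first, int second)
{
    int a12345678901234567890123456789012345 = first;
    int a123456789012345678901234567890123456 = second;

    return a12345678901234567890123456789012345 * 10
         + a123456789012345678901234567890123456;
}
```

On such a compiler, combine_long_names(1, 2) yields 12, which would not be possible if both names referred to one variable.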
To be precise, the standard doesn't exactly mandate these limits, the language in the standard is quite permissive:
C11 (n1570) 5.2.4.1 Translation limits
The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:18)
[...]
63 significant initial characters in an internal identifier or a macro name (each universal character name or extended source character is considered a single character)
31 significant initial characters in an external identifier (each universal character name specifying a short identifier of 0000FFFF or less is considered 6 characters, each universal character name specifying a short identifier of 00010000 or more is considered 10 characters, and each extended source character is considered the same number of characters as the corresponding universal character name, if any)19)
[...]
Footnote 18) clearly expresses the intent:
Implementations should avoid imposing fixed translation limits whenever possible.
Footnote 19) refers to Future language directions 6.11.3:
Restriction of the significance of an external name to fewer than 255 characters (considering each universal character name or extended source character as a single character) is an obsolescent feature that is a concession to existing implementations.
And to explain the permissiveness in the first sentence of 5.2.4.1, cf. the C99 rationale (5.10)
5.2.4 Environmental limits
The C89 Committee agreed that the Standard must say something about certain capacities and limitations, but just how to enforce these treaty points was the topic of considerable debate.
5.2.4.1 Translation limits
The Standard requires that an implementation be able to translate and execute some program that meets each of the stated limits. This criterion was felt to give a useful latitude to the implementor in meeting these limits. While a deficient implementation could probably contrive a program that meets this requirement, yet still succeed in being useless, the C89 Committee felt that such ingenuity would probably require more work than making something useful. The sense of both the C89 and C99 Committees was that implementors should not construe the translation limits as the values of hard-wired parameters, but rather as a set of criteria by which an implementation will be judged.
There is no limit.
Actually, there is a limit: the name has to be small enough to fit in memory, but otherwise no. If there is a built-in limit (I don't believe there is), it is so huge that you would be really hard-pressed to reach it. I generated C++ code with 2 variables with a differing last character to ensure that names that long are distinct. I got to a 64KB file and thought that was enough.
