Determine #defined string length at compile time in C

I have a C-program (an Apache module, i.e. the program runs often), which is going to write() a 0-terminated string over a socket, so I need to know its length.
The string is #defined as:
#define POLICY "<?xml version=\"1.0\"?>\n" \
"<!DOCTYPE cross-domain-policy SYSTEM\n" \
"\"http://www.adobe.com/xml/dtds/cross-domain-policy.dtd\">\n" \
"<cross-domain-policy>\n" \
"<allow-access-from domain=\"*\" to-ports=\"8080\"/>\n" \
"</cross-domain-policy>\0"
Is there, please, a better way than using strlen(POLICY)+1 at runtime (and thus calculating the length again and again)?
A preprocessor directive that would allow setting POLICY_LENGTH already at compile time?

Use sizeof. For example, sizeof("blah") evaluates to 5 at compile time (5, not 4, because the string literal always includes an implicit null terminator).

Use 1+strlen(POLICY) and turn on compiler optimizations. GCC will replace strlen(S) with the length of S at compile time when S is a string literal known at compile time.

I have a similar problem when using an outdated compiler (VisualDSP) on an embedded platform which does not yet support C++11 (and so I can't use constexpr).
I don't need to evaluate the string length in the precompiler, but I do need to optimize it into a single assignment.
Just in case someone needs this in the future, here's my extremely hacky solution which should work on even crappy compilers as long as they do proper optimization:
#define STRLENS(a,i) !a[i] ? i : // repetitive stuff
#define STRLENPADDED(a) (STRLENS(a,0) STRLENS(a,1) STRLENS(a,2) STRLENS(a,3) STRLENS(a,4) STRLENS(a,5) STRLENS(a,6) STRLENS(a,7) STRLENS(a,8) STRLENS(a,9) -1)
#define STRLEN(a) STRLENPADDED((a "\0\0\0\0\0\0\0\0\0")) // padding required to prevent 'index out of range' issues.
This STRLEN macro will give you the length of the string literal you provide, as long as it is less than 10 characters long. In my case this is enough, but in the OP's case the macro may need to be extended (a lot). Since it is highly repetitive, you could easily write a script to create a macro that accepts 1000 characters.
PS: This is just a simple offshoot of the problem I was really trying to fix, which is a statically computed HASH value for a string so I don't need to use any strings in my embedded system. In case anyone is interested (it would have saved me a day of searching and solving), this will do an FNV hash on a small string literal that can be optimized away into a single assignment:
#ifdef _MSC_BUILD
#define HASH_FNV_OFFSET_BASIS 0x811c9dc5ULL
#define HASH_TYPE int
#else // properly define for your own compiler to get rid of overflow warnings
#define HASH_FNV_OFFSET_BASIS 0x811c9dc5UL
#define HASH_TYPE int
#endif
#define HASH_FNV_PRIME 16777619
#define HASH0(a) (a[0] ? ((HASH_TYPE)(HASH_FNV_OFFSET_BASIS * HASH_FNV_PRIME)^(HASH_TYPE)a[0]) : HASH_FNV_OFFSET_BASIS)
#define HASH2(a,i,b) ((b * (a[i] ? HASH_FNV_PRIME : 1))^(HASH_TYPE)(a[i] ? a[i] : 0))
#define HASHPADDED(a) HASH2(a,9,HASH2(a,8,HASH2(a,7,HASH2(a,6,HASH2(a,5,HASH2(a,4,HASH2(a,3,HASH2(a,2,HASH2(a,1,HASH0(a))))))))))
#define HASH(a) HASHPADDED((a "\0\0\0\0\0\0\0\0\0"))

sizeof works at compile time
#define POLICY "<?xml version=\"1.0\"?>\n" \
"<!DOCTYPE cross-domain-policy SYSTEM\n" \
"\"http://www.adobe.com/xml/dtds/cross-domain-policy.dtd\">\n" \
"<cross-domain-policy>\n" \
"<allow-access-from domain=\"*\" to-ports=\"8080\"/>\n" \
"</cross-domain-policy>\0"
char pol[sizeof POLICY];
strcpy(pol, POLICY); /* safe, with an extra char to boot */
If you need a pre-processor symbol with the size, just count the characters and write the symbol yourself :-)
#define POLICY_LENGTH 78 /* just made that number up! */

Per other answers, sizeof(STRING) gives the length (including the \0 terminator) for string literals. However, it has one downside: if you accidentally pass it a char* pointer expression instead of a string literal, it will return an incorrect value, namely the pointer size (4 for 32-bit programs, 8 for 64-bit), as the following program demonstrates:
#include <stdio.h>
#define foo "foo: we will give the right answer for this string"
char bar[] = "bar: and give the right answer for this string too";
char *baz = "baz: but for this string our answer is quite wrong";
#define PRINT_LENGTH(s) printf("LENGTH(%s)=%zu\n", (s), sizeof(s))
int main(int argc, char **argv) {
PRINT_LENGTH(foo);
PRINT_LENGTH(bar);
PRINT_LENGTH(baz);
return 0;
}
However, if you are using C11 or later, you can use _Generic to write a macro which will refuse to compile if passed something other than a char[] array:
#include <stdio.h>
#define SIZEOF_CHAR_ARRAY(s) (_Generic(&(s), char(*)[sizeof(s)]: sizeof(s)))
#define foo "foo: we will give the right answer for this string"
char bar[] = "bar: and give the right answer for this string too";
char *baz = "baz: but for this string our answer is quite wrong";
#define PRINT_LENGTH(s) printf("LENGTH(%s)=%zu\n", (s), SIZEOF_CHAR_ARRAY(s))
int main(int argc, char **argv) {
PRINT_LENGTH(foo);
PRINT_LENGTH(bar);
// Will fail to compile if you uncomment incorrect next line:
// PRINT_LENGTH(baz);
return 0;
}
Note this doesn't only work for string literals: it also works correctly for mutable arrays, provided they are actual char[] arrays of fixed length, as a string literal is.
As written, the above SIZEOF_CHAR_ARRAY macro will fail for const arrays (although you'd think string literals ought to be const, for backward-compatibility reasons they are not):
const char quux[] = "quux is const";
// Next line will fail to compile:
PRINT_LENGTH(quux);
However, we can improve our SIZEOF_CHAR_ARRAY macro so the above example will also work:
#define SIZEOF_CHAR_ARRAY(s) (_Generic(&(s), \
char(*)[sizeof(s)]: sizeof(s), \
const char(*)[sizeof(s)]: sizeof(s) \
))

Related

How to write message longer than 8 bytes with char pointer in C TCP Socket server? [duplicate]

Given an array of pointers to string literals:
char *textMessages[] = {
"Small text message",
"Slightly larger text message",
"A really large text message that "
"is spread over multiple lines"
};
How does one determine the length of a particular string literal, say the third one? I have tried using the sizeof operator as follows:
int size = sizeof(textMessages[2]);
But the result seems to be the number of pointers in the array, rather than the length of the string literal.
If you want the number computed at compile time (as opposed to at runtime with strlen) it is perfectly okay to use an expression like
sizeof "A really large text message that "
"is spread over multiple lines";
You might want to use a macro to avoid repeating the long literal, though:
#define LONGLITERAL "A really large text message that " \
"is spread over multiple lines"
Note that the value returned by sizeof includes the terminating NUL, so is one more than strlen.
My suggestion would be to use strlen and turn on compiler optimizations.
For example, with gcc 4.7 on x86:
#include <string.h>
static const char *textMessages[3] = {
"Small text message",
"Slightly larger text message",
"A really large text message that "
"is spread over multiple lines"
};
size_t longmessagelen(void)
{
return strlen(textMessages[2]);
}
After running make CFLAGS="-ggdb -O3" example.o:
$ gdb example.o
(gdb) disassemble longmessagelen
0x00000000 <+0>: mov $0x3e,%eax
0x00000005 <+5>: ret
I.e. the compiler has replaced the call to strlen with the constant value 0x3e = 62.
Don't waste time performing optimizations that the compiler can do for you!
strlen gives you the length of the string, whereas sizeof gives you the size in bytes of its operand's type, which for a char* argument is the size of the pointer.
You could exploit the fact, that values in an array are consecutive:
const char *messages[] = {
"footer",
"barter",
"banger"
};
size_t sizeOfMessage1 = (messages[1] - messages[0]) / sizeof(char); // 7 (6 chars + '\0')
The size is determined by using the boundaries of the elements. The space between the beginning of the first and beginning of the second element is the size of the first.
This includes the terminating \0. The trick, of course, only works for constant strings laid out consecutively; applying sizeof to one of the pointers would instead give you the size of a pointer, not the length of the string.
This is not guaranteed to work. If the literals are padded for alignment, this may yield wrong sizes, and the compiler may introduce other caveats, such as merging identical strings.
You also need at least two elements in your array.
strlen is executed at run time, whereas sizeof("string_literal") - 1 is fast and evaluated at compile time. The problem is how to use sizeof on string literals pointed at by your pointer array: we can't.
Now assuming you want this as fast as possible and also done at compile-time for performance reasons... Everything in C is possible if you throw enough ugly macros at the problem. Here's such a solution that favours performance and maintainability at the cost of readability.
We can move the string initializer list out of the array and into a macro. For example by declaring so-called "X-macros", like this:
#define STRING_LIST(X) \
X("Small text message") \
X("Slightly larger text message") \
X("A really large text message that " \
"is spread over multiple lines")
This macro can now be reused for various purposes, by defining another macro and passing it as parameter "X" to the above list. For example the array declaration could be done as:
#define STRING_INIT_LIST(str) str,
char *textMessages[] =
{
STRING_LIST(STRING_INIT_LIST)
};
And if we want a 1-to-1 corresponding look-up table containing the sizes of each string:
#define STRING_SIZES(str) (sizeof(str)-1),
const size_t sizes[] =
{
STRING_LIST(STRING_SIZES)
};
Complete example containing both a look-up table version as well as a directly compile-time processing version:
#include <stdio.h>
#define STRING_LIST(X) \
X("Small text message") \
X("Slightly larger text message") \
X("A really large text message that " \
"is spread over multiple lines")
int main (void)
{
#define STRING_INIT_LIST(str) str,
char *textMessages[] =
{
STRING_LIST(STRING_INIT_LIST)
};
#define STRING_SIZES(str) (sizeof(str)-1),
const size_t sizes[] =
{
STRING_LIST(STRING_SIZES)
};
puts("The strings are:");
#define STRING_PRINT(str) printf(str ", size:%zu\n", sizeof(str)-1);
STRING_LIST(STRING_PRINT)
printf("\nOr if you will:\n");
for(size_t i=0; i<sizeof(textMessages)/sizeof(*textMessages); i++)
{
printf("%s, size:%zu\n", textMessages[i], sizes[i]);
}
}
Output:
The strings are:
Small text message, size:18
Slightly larger text message, size:28
A really large text message that is spread over multiple lines, size:62
Or if you will:
Small text message, size:18
Slightly larger text message, size:28
A really large text message that is spread over multiple lines, size:62
The machine code of this boils down to printing a bunch of strings and constants from memory, no overhead strlen calls at all.
strlen maybe?
size_t size = strlen(textMessages[2]);
You should use the strlen() library function to get the length of a string. sizeof will give you the size of textMessages[2], a pointer, which is machine-dependent (4 or 8 bytes).

Is there a way to detect u8"" literals at preprocessor or compile time in C?

C11 now provides several kinds of string literal:
"old school literals"
u8"UTF-8 encoded literals"
u"char16_t encoded literals"
U"char32_t encoded literals"
L"wchar_t literals, whatever size it may be"
Because of the different types involved, one could discriminate the differently sized literals using a _Generic() expression. Sadly, there is neither a size nor a type difference between native "quoted literals" and u8"quoted literals".
I wondered if preprocessor magic could be used, but it appears that GCC either treats the u8"text" as an indivisible token, or it eats the u8 during an early stage. Regardless, I wasn't able to grab the "u8" prefix with a macro. :-(
So, I am left to wonder: is there any way to differentiate between native-encoded literals and UTF-8 encoded literals without "just knowing"?
The context is my library code wanting to intelligently convert a passed string to UTF-8. If I can wrap the calls in a macro that figures out whether I need to transcode the string, that would be great. (Otherwise, of course, I have to rely on the user. And you know what an idiot he is.)
You can use _Generic and then do some pre-processor trick to tell the difference. First the _Generic part, which in this case I made to return a string for printing:
#define LITERAL_TYPE(s) \
_Generic((s), \
char*: U8_TYPE(s), \
wchar_t*: "wchar_t", \
char16_t*: "char16_t", \
char32_t*: "char32_t")
Then the U8_TYPE macro:
#define U8_TYPE(s) (#s[0]=='\"'? "old school":"u8")
This macro simply checks if the first character in the pre-processor token is a " or not. It can be made a bit more advanced and look for 'u' and '8' as well, with some && checks, though you have to check for the ending '"' too in that case so that you don't access out of bounds.
Test code:
#include <stdio.h>
#include <wchar.h>
#include <uchar.h>
#define U8_TYPE(s) (#s[0]=='\"'? "old school":"u8")
#define LITERAL_TYPE(s) \
_Generic((s), \
char*: U8_TYPE(s), \
wchar_t*: "wchar_t", \
char16_t*: "char16_t", \
char32_t*: "char32_t")
int main(void)
{
puts(LITERAL_TYPE("hello"));
puts(LITERAL_TYPE(L"hello"));
puts(LITERAL_TYPE(u8"hello"));
puts(LITERAL_TYPE(u"hello"));
puts(LITERAL_TYPE(U"hello"));
}
Output:
old school
wchar_t
u8
char16_t
char32_t

Can a macro remove characters from its arguments?

Is it possible to define a macro that will trim off a portion of the string argument passed in?
For example:
//can this be defined?
#define MACRO(o) ???
int main(){
printf(MACRO(ObjectT)); //prints "Object" not "ObjectT"
}
Would it be possible for a macro to trim off the last character, 'T'?
You can do it for specific strings that you know in advance, presented to the macro as symbols rather than as string literals, but not for general symbols and not for string literals at all. For example:
#include <stdio.h>
#define STRINGIFY(s) # s
#define EXPAND_TO_STRING(x) STRINGIFY(x)
#define TRUNCATE_ObjectT Object
#define TRUNCATE_MrT Pity da fool
#define TRUNCATE(s) EXPAND_TO_STRING(TRUNCATE_ ## s)
int main(){
printf(TRUNCATE(ObjectT)); // prints "Object"
printf(TRUNCATE(MrT)); // prints "Pity da fool"
}
That relies on the token-pasting operator, ##, to construct the name of a macro that expands to the truncated text (or, really, the replacement text), and the stringification operator, #, to convert the expanded result to a string literal. There's a little bit of required macro indirection in there, too, to ensure that all the needed expansions are performed.
Well, at least it should print "Object"...
//can this be defined?
#define MACRO(o) #o "\b \b"
int main(){
printf(MACRO(ObjectT)); //prints "Object" not "ObjectT"
}
And no, you can't strip character using preprocessor only without actual C code (say, malloc+strncpy) to do that.
With the preprocessor? No. It sounds like what you really want to do is something like this:
Code not tested
#include <stdio.h>
#include <string.h>
#define STRINGIFY(o) #o
char* serialize(char* s)
{
if (strcmp(s, "ObjectT") == 0) return "Object";
return s; /* fall back for strings we don't know */
}
int main(){
printf(serialize(STRINGIFY(ObjectT))); //prints "Object" not "ObjectT"
}

Having troubles formatting scanf while keeping my code understandable

I need a function to read a file name, with a max length of MAX_FILE_NAME_SIZE, which is a symbolic constant, I did this the following way:
char * readFileName()
{
char format[6];
char * fileName = malloc(MAX_FILE_NAME_SIZE * sizeof(fileName[0]));
if (fileName == NULL)
return NULL;
sprintf(format, "%%%ds", MAX_FILE_NAME_SIZE-1);
scanf(format, fileName);
fileName = realloc(fileName, (strlen(fileName) + 1) * sizeof(fileName[0]));
return fileName;
}
I'd really like to get rid of the sprintf part (and also the format array). What's the cleanest and most efficient way to do this?
Solution
You can make a little Preprocessor hack:
#define MAX_BUFFER 30
#define FORMAT(s) "%" #s "s"
#define FMT(s) FORMAT(s)
int main(void)
{
char buffer[MAX_BUFFER + 1];
scanf(FMT(MAX_BUFFER), buffer);
printf("string: %s\n", buffer);
printf("length: %zu\n", strlen(buffer));
return 0;
}
The FORMAT and FMT macros are necessary for the preprocessor to translate them correctly. If you call FORMAT directly with FORMAT(MAX_BUFFER), it will translate into "%" "MAX_BUFFER" "s" which is no good.
You can verify that using gcc -E scanf.c. However, if you call it through another macro, the macro name is expanded first, and it translates to "%" "30" "s", which is a fine format string for scanf.
Edit
As correctly pointed out by @Jonathan Leffler in the comments, you can't do any math in that macro, so you need to declare buffer with one extra character for the terminating null byte, since the macro expands to %30s, which reads up to 30 characters and then stores a null byte.
So the correct buffer declaration should be char buffer[MAX_BUFFER + 1];.
Requested Explanation
As asked in the comments, the one-macro version won't work because the preprocessor operator # turns an argument into a string (stringification, see below). So, when you call it with FORMAT(MAX_BUFFER), it just stringifies MAX_BUFFER instead of macro-expanding it, giving you the result: "%" "MAX_BUFFER" "s".
Section 3.4 Stringification of the C Preprocessor Manual says this:
Sometimes you may want to convert a macro argument into a string constant. Parameters are not replaced inside string constants, but you can use the ‘#’ preprocessing operator instead. When a macro parameter is used with a leading ‘#’, the preprocessor replaces it with the literal text of the actual argument, converted to a string constant. Unlike normal parameter replacement, the argument is not macro-expanded first. This is called stringification.
This is the output of the gcc -E scanf.c command on a file with the one macro version (the last part of it):
int main(void)
{
char buffer[30 + 1];
scanf("%" "MAX_BUFFER" "s", buffer);
printf("string: %s\n", buffer);
printf("length: %zu\n", strlen(buffer));
return 0;
}
As expected. Now, for the two levels, I couldn't explain better than the documentation itself, and in the last part of it there's an actual example of this specific case (two macros):
If you want to stringify the result of expansion of a macro argument, you have to use two levels of macros.
#define xstr(s) str(s)
#define str(s) #s
#define foo 4
str (foo)
==> "foo"
xstr (foo)
==> xstr (4)
==> str (4)
==> "4"
s is stringified when it is used in str, so it is not macro-expanded first. But s is an ordinary argument to xstr, so it is completely macro-expanded before xstr itself is expanded (see Argument Prescan). Therefore, by the time str gets to its argument, it has already been macro-expanded.
Resource
The C Preprocessor

Macro expands correctly, but gives me "expected expression" error

I've made a trivial reduction of my issue:
#define STR_BEG "
#define STR_END "
int main()
{
char * s = STR_BEG abc STR_END;
printf("%s\n", s);
}
When compiling this, I get the following error:
static2.c:12:16: error: expected expression
char * s = STR_BEG abc STR_END;
^
static2.c:7:17: note: expanded from macro 'STR_BEG'
#define STR_BEG "
Now, if I just run the preprocessor, gcc -E myfile.c, I get:
int main()
{
char * s = " abc ";
printf("%s\n", s);
}
Which is exactly what I wanted, and perfectly legal resultant code. So what's the deal?
The macro isn't really expanding "correctly", because this isn't a valid C preprocessor program. As Kerrek says, the preprocessor doesn't quite work on arbitrary character sequences; it works on whole tokens. Tokens are punctuation characters, identifiers, numbers, strings, etc. of the same form (more or less) as the ones that form valid C code.
Those defines do not describe valid strings: they open them and fail to close them before the end of the line, so an invalid token sequence is being passed to the preprocessor. The fact that it manages to produce output from an invalid program is arguably handy, but it doesn't make it correct, and it almost certainly guarantees garbage output from the preprocessor at best. You need to terminate your strings for them to form whole tokens; right now they form garbage input.
To actually wrap a token, or token sequence, in quotes, use the stringification operator #:
#define STRFY(A) #A
STRFY(abc) // -> "abc"
GCC and similar compilers will warn you about mistakes like this if you compile or preprocess with the -Wall flag enabled.
(I assume you only get errors when you try to compile as C, but not when you do it in two passes, because internally to the compiler, it retains the information that these are "broken" tokens, which is lost if you write out an intermediate file and then compile the preprocessed source in a second pass... if so, this is an implementation detail, don't rely on it.)
One possible solution to your actual problem might look like this:
#define LPR (
#define start STRFY LPR
#define end )
#define STRFY(A) #A
#define ID(...) __VA_ARGS__
ID(
char * s = start()()()end; // -> char * s = "()()()";
)
The ID wrapper is necessary, though. There's no way to do it without that (it can go around any number of lines, or even your whole program, but it must exist for reasons that are well-covered in other questions).
