Mangling __FILE__ and __LINE__ in code for quoting? - c

Is there a way to get the C/C++ preprocessor or a template or such to mangle/hash the __FILE__ and __LINE__ and perhaps some other external input like a build-number into a single short number that can be quoted in logs or error messages?
(The intention is to be able to reverse it, to a list of candidates if it's lossy, when a customer quotes it in a bug report.)

You will have to use a function to perform the hashing and create a code from __LINE__ and __FILE__ as the C preprocessor is not able to do such complex tasks.
Anyway, you can take inspiration from this article to see whether a different solution is better suited to your situation.
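As a minimal sketch of the function-based approach, assuming a 32-bit FNV-1a hash is good enough for the purpose: the names fnv1a, location_code and LOCATION_CODE are invented for this illustration and do not come from any library.

```c
#include <stdint.h>

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}

/* Mix file name, line and build number into a 24-bit code (6 hex digits).
 * The mixing steps are an arbitrary choice for this sketch. */
static uint32_t location_code(const char *file, int line, unsigned build)
{
    uint32_t h = fnv1a(file);
    h = (h ^ (uint32_t)line) * 16777619u;
    h = (h ^ (uint32_t)build) * 16777619u;
    return (h ^ (h >> 24)) & 0xFFFFFFu;
}

/* Hypothetical convenience macro: captures the call site. */
#define LOCATION_CODE(build) location_code(__FILE__, __LINE__, (build))
```

To reverse a quoted code, one would run location_code over every (file, line) pair of the build in question and list the matches. The code is lossy, so more than one candidate can come back.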

Well... you could use something like:
((*(int*)__FILE__ & 0xFFFF0000) | version << 8 | __LINE__ )
It wouldn't be perfectly unique, but it might work for what you want. Could change those ORs to +, which might work better for some things.
Naturally, if you can actually create a hashcode, you'll probably want to do that.

I needed serial values in a project of mine and got them by making a template that specialized on __LINE__ and __FILE__ and resulted in an int, as well as generating (as compile-time output to stdout) a template specialization for its inputs that resulted in the line number of that template. These were collected the first time through the compiler and then dumped into a code file, and the program was compiled again. That time each location that used the template got a different number.
(done in D so it might not be possible in C++)
template Serial(char[] file, int line)
{
pragma(msg,
"template Serial(char[] file : \""~file~"\", int line : "~line.stringof~")"
"{const int Serial = __LINE__;}");
const int Serial = -1;
}

A simpler solution would be to keep a global static "error location" variable.
#ifdef DEBUG
#define trace_here(version) printf("[%d]%s:%d {%d}\n", version, __FILE__, __LINE__, errloc++);
#else
#define trace_here(version) printf("{%lu}\n", (unsigned long)((version)<<16|errloc++));
#endif
Or do it without the printf: just increment errloc every time you cross a tracepoint. Then you can correlate the value to the file/line/version printed by your debug builds pretty easily.
You'd need to include version or build number, because those error locations could change with any build.
Doesn't work well if you can't reproduce the code paths.

__FILE__ is a pointer into the constants segment of your program. If you output the difference between that and some other constant you should get a result that's independent of any relocation, etc:
extern const char g_DebugAnchor;
#define FILE_STR_OFFSET (__FILE__ - &g_DebugAnchor)
You can then report that, or combine it in some way with the line number, etc. The middle bits of FILE_STR_OFFSET are likely the most interesting.

Well, if you're displaying the message to the user yourself (as opposed to having a crash address or function be displayed by the system), there's nothing to keep you from displaying exactly what you want.
For example:
typedef union ErrorCode {
struct {
unsigned int file: 15;
unsigned int line: 12; /* Better than 5 bits, still not great
Thanks commenters!! */
unsigned int build: 5;
} bits;
unsigned int code;
} ErrorCode;
unsigned int buildErrorCodes(const char *file, int line, int build)
{
ErrorCode code;
code.bits.line=line & ((1<<12) - 1);
code.bits.build=build & ((1<< 5) - 1);
code.bits.file=some_hash_function(file) & ((1<<15) - 1);
return code.code;
}
You'd use that as
buildErrorCodes(__FILE__, __LINE__, BUILD_CODE)
and output it in hex. It wouldn't be very hard to decode...
(Edited -- the commenters are correct, I must have been nuts to specify 5 bits for the line number. Modulo 4096, however, lines with error messages aren't likely to collide. 5 bits for build is still fine - modulo 32 means that only 32 builds can be outstanding AND have the error still happen at the same line.)
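Decoding could look roughly like the sketch below. It assumes the same 15/12/5 split but packs the fields with shifts rather than a bit-field union, because bit-field layout is implementation-defined. The names pack_error_code, unpack_error_code and DecodedError are illustrative, and the file hash is passed in already computed.

```c
#include <stdint.h>

typedef struct DecodedError {
    unsigned file_hash;  /* low 15 bits of the file-name hash */
    unsigned line;       /* line number modulo 4096 */
    unsigned build;      /* build number modulo 32 */
} DecodedError;

/* Assumed layout: bits 0-14 file hash, bits 15-26 line, bits 27-31 build. */
static uint32_t pack_error_code(unsigned file_hash, unsigned line, unsigned build)
{
    return  (uint32_t)(file_hash & 0x7FFFu)
         | ((uint32_t)(line      & 0x0FFFu) << 15)
         | ((uint32_t)(build     & 0x001Fu) << 27);
}

/* Split a reported code back into its three fields. */
static DecodedError unpack_error_code(uint32_t code)
{
    DecodedError d;
    d.file_hash = code & 0x7FFFu;
    d.line      = (code >> 15) & 0x0FFFu;
    d.build     = (code >> 27) & 0x001Fu;
    return d;
}
```

The decoded file_hash still has to be matched against the hashes of the known source file names to get back a list of candidate files.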

Related

C compile-time lookup table generation

In this answer about brute-forcing 2048 AI, a lookup table storing "2048 array shifts" is precomputed to save needless repetitive calculation. In C, to compute this lookup table at compile time, the way I know of is the "caveman-simple solution" where the table itself is generated as another file that is then #included, something like this python script to generate lut.include (replace with 2048-specific code):
#!/usr/bin/python3
def swapbits(x):
    ret = 0
    for i in range(8):
        if x & (1 << i):
            ret |= 1 << (7 - i)
    return ret

print("const uint8_t bitswap[] = {", end=" ")
print(", ".join("0x%02x" % swapbits(x) for x in range(256)), end=" ")
print("};")
Is there any cleaner approach? That is, maybe some preprocessor trickery to generate these tables? With C++ this should be possible with constexpr.
C preprocessor has no loops. You have to write all numbers in C preprocessor, one way or the other.
Cleaner meaning does not involve another script and included file
First, implement a constant expression swapping bits:
#define SWAPBITS(x) ((((x)&0x01)<<7)|(((x)&0x02)<<5)|(((x)&0x04)<<3)|(((x)&0x08)<<1)|(((x)&0x10)>>1)|(((x)&0x20)>>3)|(((x)&0x40)>>5)|(((x)&0x80)>>7))
Then you just write all the numbers:
const uint8_t arr[] = {
SWAPBITS(0),
SWAPBITS(1),
... etc. ...
// Because you do not want to use external script, write all the numbers manually here.
};
If the "no included file" restriction can be lifted, you can use BOOST_PP_REPEAT. Note that the implementation of that macro literally lists every iteration it can be called with. P99_REPEAT from the P99 library is similar.
#include <boost/preprocessor/repetition/repeat.hpp>
#define SWAPBITS(x) .. as above ..
#define SWAPBITS_DECL(z, n, t) SWAPBITS(n),
const uint8_t arr[] = {
BOOST_PP_REPEAT(256, SWAPBITS_DECL,)
};
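Without Boost, the same effect can be had in plain C by nesting a few helper macros, each of which emits four copies of the level below. This is only a sketch: the helper names S1 to S256 are made up here, and SWAPBITS is written out in full so the fragment stands alone.

```c
#include <stdint.h>

/* Reverse the bit order of one byte as a constant expression. */
#define SWAPBITS(x) ( (((x) & 0x01) << 7) | (((x) & 0x02) << 5) \
                    | (((x) & 0x04) << 3) | (((x) & 0x08) << 1) \
                    | (((x) & 0x10) >> 1) | (((x) & 0x20) >> 3) \
                    | (((x) & 0x40) >> 5) | (((x) & 0x80) >> 7) )

/* Each level expands to four copies of the level below,
 * so S256(0) lists SWAPBITS(0) through SWAPBITS(255) in order. */
#define S1(n)   SWAPBITS(n),
#define S4(n)   S1(n)  S1((n) + 1)   S1((n) + 2)    S1((n) + 3)
#define S16(n)  S4(n)  S4((n) + 4)   S4((n) + 8)    S4((n) + 12)
#define S64(n)  S16(n) S16((n) + 16) S16((n) + 32)  S16((n) + 48)
#define S256(n) S64(n) S64((n) + 64) S64((n) + 128) S64((n) + 192)

/* The trailing comma left by the last S1 is allowed in an initializer list. */
static const uint8_t bitswap[256] = { S256(0) };
```

The table is built entirely by the preprocessor and the compiler's constant folding, with no generated file, at the cost of five helper macros.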

Using ENUMs as bitmaps, how to validate in C

I am developing firmware for an embedded application with memory constraints. I have a set of commands that need to be processed as they are received. Each command falls under a different 'bucket' and each 'bucket' gets a range of valid command numbers. I created two ENUMs as shown below to achieve this.
enum
{
BUCKET_1 = 0x100, // Range of 0x100 to 0x1FF
BUCKET_2 = 0x200, // Range of 0x200 to 0x2FF
BUCKET_3 = 0x300, // Range of 0x300 to 0x3FF
...
...
BUCKET_N = 0xN00 // Range of 0xN00 to 0xNFF
} cmd_buckets;
enum
{
//BUCKET_1 commands
CMD_BUCKET_1_START = BUCKET_1,
BUCKET_1_CMD_1,
BUCKET_1_CMD_2,
BUCKET_1_CMD_3,
BUCKET_1_CMD_4,
//Add new commands above this line
BUCKET_1_CMD_MAX,
//BUCKET_2 commands
CMD_BUCKET_2_START = BUCKET_2,
BUCKET_2_CMD_1,
BUCKET_2_CMD_2,
BUCKET_2_CMD_3,
//Add new commands above this line
BUCKET_2_CMD_MAX,
//BUCKET_3 commands
...
...
...
//BUCKET_N commands
CMD_BUCKET_N_START = BUCKET_N,
BUCKET_N_CMD_1,
BUCKET_N_CMD_2,
BUCKET_N_CMD_3,
BUCKET_N_CMD_4,
//Add new commands above this line
BUCKET_N_CMD_MAX,
} cmd_codes;
When my command handler function receives a command code, it needs to check if the command is enabled before processing it. I plan to use a bitmap for this. Commands can be enabled or disabled from processing during run-time. I can use an int for each group (giving me 32 commands per group; I realize that only 0xN00 to 0xN1F are valid command codes and that the other codes in the range are wasted). Even though command codes are wasted, the design choice has the benefit of easily telling the group of the command code when seeing raw data on a console.
Since many developers can add commands to the 'cmd_codes' enum (even new buckets may be added as needed to the 'cmd_buckets' enum), I want to make sure that the number of command codes in each bucket does not exceed 32 (bitmap is int). I want to catch this at compile time rather than run time. Other than checking each BUCKET_N_CMD_MAX value as below and throwing a compile time error, is there a better solution?
#if (BUCKET_1_CMD_MAX > 0x20)
#error ("Number of commands in BUCKET_1 exceeded 32")
#endif
#if (BUCKET_2_CMD_MAX > 0x20)
#error ("Number of commands in BUCKET_2 exceeded 32")
#endif
#if (BUCKET_3_CMD_MAX > 0x20)
#error ("Number of commands in BUCKET_3 exceeded 32")
#endif
...
...
...
#if (BUCKET_N_CMD_MAX > 0x20)
#error ("Number of commands in BUCKET_N exceeded 32")
#endif
Please also suggest if there is a more elegant way to design this.
Thanks, I appreciate your time and patience.
First fix the bug in the code. As mentioned in comments, you have a constant BUCKET_1 = 0x100 which you then assign to CMD_BUCKET_1_START. The following enumerators will therefore get the values 0x101, 0x102, ... and BUCKET_1_CMD_MAX will be 0x105. Since 0x105 is always larger than 0x20, a check of BUCKET_1_CMD_MAX against 0x20 would always trigger.
Fix that so that it actually checks the total number of items in the enum instead, like this:
#define BUCKET_1_CMD_N (BUCKET_1_CMD_MAX - CMD_BUCKET_1_START)
#define BUCKET_2_CMD_N (BUCKET_2_CMD_MAX - CMD_BUCKET_2_START)
...
Assuming the above is fixed, then you can replace the numerous checks with a single macro. Not a great improvement, but at least it reduces code repetition:
#define BUCKET_MAX 32 // use a defined constant instead of a magic number
// some helper macros:
#define CHECK(n) BUCKET_ ## n ## _CMD_N
#define STRINGIFY(n) #n
// the actual macro:
#define BUCKET_CHECK(n) \
_Static_assert(CHECK(n) <= BUCKET_MAX, \
"Number of commands in BUCKET_" STRINGIFY(n) "_CMD_N exceeds BUCKET_MAX.");
// usage:
int main (void)
{
BUCKET_CHECK(1);
BUCKET_CHECK(2);
}
Output from gcc in case one constant is too large:
error: static assertion failed: "Number of commands in BUCKET_1_CMD_N exceeds BUCKET_MAX."
note: in expansion of macro 'BUCKET_CHECK'
EDIT
If combining the bug fix with the check macro, you would get this:
#define BUCKET_MAX 32
#define CHECK(n) (BUCKET_##n##_CMD_MAX - CMD_BUCKET_##n##_START)
#define STRINGIFY(n) #n
#define BUCKET_CHECK(n) \
_Static_assert(CHECK(n) <= BUCKET_MAX, \
"Number of commands in BUCKET " STRINGIFY(n) " exceeds BUCKET_MAX.");
int main (void)
{
BUCKET_CHECK(1);
BUCKET_CHECK(2);
}
First of all, preprocessor commands do not work that way. The C preprocessor is able to "see" only names introduced by the #define directive or passed as compiler flags. It is not able to see constants defined as part of an enum or with the const keyword. You should use _Static_assert to validate the commands instead of the preprocessor.
As for the commands, I would suggest to have all the commands numbered in the range 0..0x20:
enum {
BUCKET_1_CMD_1,
BUCKET_1_CMD_2,
...
BUCKET_1_CMD_MAX,
};
enum {
BUCKET_2_CMD_1,
BUCKET_2_CMD_2,
...
BUCKET_2_CMD_MAX,
};
Then you need only a single guard value to check if all the commands are in valid range:
#define MAX_COMMAND 0x20
_Static_assert(BUCKET_1_CMD_MAX <= MAX_COMMAND, "too many bucket 1 commands");
_Static_assert(BUCKET_2_CMD_MAX <= MAX_COMMAND, "too many bucket 2 commands");
To use the commands, bitwise-or them together with the bucket "offset":
enum {
BUCKET_1 = 0x100,
BUCKET_2 = 0x200,
};
...
int cmd = BUCKET_2 | BUCKET_2_CMD_1;
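Putting the pieces together, a run-time enable check on top of this layout might look like the sketch below. It assumes two buckets, a C11 compiler for _Static_assert, and one uint32_t of enable bits per bucket. The names BUCKET_COUNT, enabled, cmd_set_enabled and cmd_is_enabled are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_COMMAND 32            /* one enable bit per command in a 32-bit word */

enum { BUCKET_1 = 0x100, BUCKET_2 = 0x200 };
enum { BUCKET_COUNT = 2 };

enum { BUCKET_1_CMD_1, BUCKET_1_CMD_2, BUCKET_1_CMD_MAX };
enum { BUCKET_2_CMD_1, BUCKET_2_CMD_2, BUCKET_2_CMD_3, BUCKET_2_CMD_MAX };

_Static_assert(BUCKET_1_CMD_MAX <= MAX_COMMAND, "too many bucket 1 commands");
_Static_assert(BUCKET_2_CMD_MAX <= MAX_COMMAND, "too many bucket 2 commands");

/* enabled[b] holds the enable bits for bucket number b + 1. */
static uint32_t enabled[BUCKET_COUNT];

/* True if cmd names an existing bucket and a command index below MAX_COMMAND. */
static bool cmd_in_range(int cmd)
{
    unsigned bucket = ((unsigned)cmd >> 8) - 1u;   /* wraps to a huge value for cmd < 0x100 */
    return bucket < BUCKET_COUNT && (cmd & 0xFF) < MAX_COMMAND;
}

static void cmd_set_enabled(int cmd, bool on)
{
    if (!cmd_in_range(cmd))
        return;
    uint32_t bit = (uint32_t)1 << (cmd & 0x1F);
    if (on)
        enabled[(cmd >> 8) - 1] |= bit;
    else
        enabled[(cmd >> 8) - 1] &= ~bit;
}

static bool cmd_is_enabled(int cmd)
{
    return cmd_in_range(cmd) && ((enabled[(cmd >> 8) - 1] >> (cmd & 0x1F)) & 1u);
}
```

Because the command index occupies the low byte and the bucket number the next nibble, the raw code still shows its bucket at a glance on a console.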

What's with the '#' symbols in this C code?

I'm digging around the guts of numpy, trying to figure out why it's not building for me (64-bit Cygwin, Windows 8.1), and I've come to this file.
When compilation hits the rad2deg() function (pasted below), I get a segfault. Looking at the file, there are a ton of '#' symbols sprinkled throughout the code. It looks like some kind of wildcard token, or a preprocessor token, but I can't find any info on it anywhere.
#define LOGE2 NPY_LOGE2#c#
#define LOG2E NPY_LOG2E#c#
#define RAD2DEG (180.0#c#/NPY_PI#c#)
#define DEG2RAD (NPY_PI#c#/180.0#c#)
#type# npy_rad2deg#c#(#type# x) {
return x*RAD2DEG;
}
There are other places in the code where compiler doesn't choke with the '#' characters.
Can anyone point me to a search term that might explain this?
Okay, I've figured it out. This is what I get for posting a question after a long nap and a dose of cough medicine.
This is some non-standard pre-pre-processor trick, probably implemented in the Python code which builds the C code for numpy.
/**begin repeat
* #type = npy_float, npy_double, npy_longdouble#
* #c = f, ,l#
* #C = F, ,L#
*/
#define LOGE2 NPY_LOGE2#c#
#define LOG2E NPY_LOG2E#c#
#define RAD2DEG (180.0#c#/NPY_PI#c#)
#define DEG2RAD (NPY_PI#c#/180.0#c#)
#type# npy_rad2deg#c#(#type# x)
{
return x*RAD2DEG;
}
/**end repeat**/
It iterates across the code, replacing the #-surrounded tokens in the code with the tokens in the comment block, generating three nearly identical code blocks operating on different data types.
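For the first iteration (type = npy_float, c = f), the substitution would produce roughly the following. The typedef and the NPY_PIf value are stand-ins added here so the fragment compiles on its own; in numpy they come from its headers.

```c
/* Stand-ins (assumptions) for definitions that numpy's headers provide. */
typedef float npy_float;
#define NPY_PIf 3.14159265358979323846f

/* Result of replacing #type# with npy_float and #c# with f. */
#define RAD2DEG (180.0f/NPY_PIf)
#define DEG2RAD (NPY_PIf/180.0f)

npy_float npy_rad2degf(npy_float x)
{
    return x*RAD2DEG;
}
```

The second and third iterations would differ only in the type name and in the suffix on the function name and constants (none for double, l for long double).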
I suspect the segfault may be coming from improper data types; we'll see.
Thanks all!

How does this sfrw(x,x_) macro work (msp430)?

I just ran into an interesting phenomenon with msp430f5529 (TI launchpad). After trying different approaches I was able to find a solution, but I don't understand what is going on here.
This code is part of a timer interrupt service routine (ISR). The special function register (SFR) TA0IV is supposed to hold the value of the interrupt number that triggered the ISR.
1 unsigned int index;
2
3 index = TA0IV; // Gives wrong value: 19874
4 index = *((volatile unsigned int *) TA0IV_); // Correct value: 4
TA0IV is defined with macros here:
5 #define sfrw_(x,x_) volatile __MSPGCC_PERIPHERAL__ unsigned int x __asm__("__" #x)
6 #define sfrw(x,x_) extern sfrw_(x,x_)
7 #define TA0IV_ 0x036E /* Timer0_A5 Interrupt Vector Word */
8 sfrw(TA0IV, TA0IV_);
What does this part of the first macro on line 5 do?
asm("__" #x)
Why is there no "x_" on the right hand side in the macro on line 5?
Last and most important question: Why does the usual typecasting on line 4 work as expected, but the one on line 3 doesn't?
BTW I use gcc-4.7.0.
Edit: More info
9 #define __MSPGCC_PERIPHERAL__ __attribute__((__d16__))
1) The # is a preprocessor "stringify" operator. You can see the impact of this using the -E compiler switch. Google "c stringify" for details.
2) Couldn't say. It isn't required that all parameters get used, and apparently whoever wrote this decided they didn't need it.
3) I'll take a shot at this one, but since I don't have all the source code or the hardware and can't experiment, I probably won't get it quite right. Maybe close enough for what you need though.
The first thing to understand is what the asm bit is doing. Normally (ok, sometimes) when you declare a variable (foo), the compiler assigns its own 'internal' name to the variable (e.g. _foo). However, when interfacing with asm modules (or other languages), sometimes you need to be able to specify the exact name to use, not allowing the compiler to mangle it in any fashion. That's what this asm is doing (see Asm Labels). So when you brush aside all the #define nonsense, what you've got is:
extern volatile __MSPGCC_PERIPHERAL__ unsigned int TA0IV __asm__("__TA0IV");
Since the definition you have posted is "extern," presumably somewhere (not shown), there's a symbol named __TA0IV that's getting defined. And since accessing it isn't working right, it appears that it is getting MIS-defined.
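The effect of "__" #x can be checked on any host compiler, no MSP430 needed. In this small sketch, ASM_NAME and ta0iv_symbol are names invented for the demonstration; TA0IV is used only as a token and does not need to be declared.

```c
/* #x turns the macro argument into a string literal; the adjacent
 * literals "__" and "TA0IV" are then joined by the compiler. */
#define ASM_NAME(x) "__" #x

static const char ta0iv_symbol[] = ASM_NAME(TA0IV);
```

Running the compiler with -E on the original header shows the same concatenation inside the __asm__(...) label.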
With the caveat that I HAVEN'T TRIED THIS, I would find this to be somewhat more readable:
#define TA0IV_ 0x036E
inline int ReadInterruptNumber()
{
int retval;
asm volatile("movl (%c1), %0": "=rm" (retval) : "i" (TA0IV_));
return retval;
}
FWIW.

Calculating parity bit with the preprocessor (parity functional style with call by ref)

Consider I want to generate parities at compile time. The parity calculation is given literal constants and with any decent optimizer it will boil down to a single constant itself. Now look at the following parity calculation with the C preprocessor:
#define PARITY16(u16) (PARITY8((u16)&0xff) ^ PARITY8((u16)>>8))
#define PARITY8(u8) (PARITY4((u8)&0x0f) ^ PARITY4((u8)>>4))
#define PARITY4(u4) (PARITY2((u4)&0x03) ^ PARITY2((u4)>>2))
#define PARITY2(u2) (PARITY1((u2)&0x01) ^ PARITY1((u2)>>1))
#define PARITY1(u1) (u1)
int message[] = { 0x1234, 0x5678, PARITY16(0x1234^0x5678) };
This will calculate the parity at compile time, but it will produce an enormous amount of intermediate code, expanding to 16 instances of the expression u16 which itself can be e.g. an arbitrary complex expression. The problem is that the C preprocessor can't evaluate intermediary expressions and in the general case only expands text (you can force it to do integer arithmetic in-situ but only for trivial cases, or with gigabytes of #defines).
I have found that the parity for 3 bits can be generated at once by an arithmetic expression: ([0..7]*3+1)/4. This reduces the 16-bit parity to the following macro:
#define PARITY16(u16) ((4 & ((((u16)&7)*3+1) ^ \
((((u16)>>3)&7)*3+1) ^ \
((((u16)>>6)&7)*3+1) ^ \
((((u16)>>9)&7)*3+1) ^ \
((((u16)>>12)&7)*3+1) ^ \
((((u16)>>15)&1)*3+1))) >> 2)
which expands u16 only 6 times. Is there an even cheaper way (in terms of number of expansions), e.g. a direct formula for a 4-bit, 5-bit, etc. parity? I couldn't find a linear expression of the form (x*k+d)/m with acceptable (non-overflowing) values of k, d, m for a range wider than 3 bits. Anyone out there with a more clever shortcut for preprocessor parity calculation?
Is something like this what you are looking for?
The following "PARITY16(u16)" preprocessor macro can be used as a literal constant in structure assignments, and it only evaluates the argument once.
/* parity.c
* test code to test out bit-twiddling cleverness
* 2013-05-12: David Cary started.
*/
// works for all 0...0xFFFF
// and only evaluates u16 one time.
#define PARITYodd33(u33) \
( \
((((((((((((((( \
(u33) \
&0x555555555)*5)>>2) \
&0x111111111)*0x11)>>4) \
&0x101010101)*0x101)>>8) \
&0x100010001)*0x10001)>>16) \
&0x100000001)*0x100000001)>>32) \
&1)
#define PARITY16(u16) PARITYodd33(((unsigned long long)(u16))*0x20001)
// works for all 0...0xFFFF
// but, alas, generates 16 instances of u16.
#define PARITY_16(u16) (PARITY8((u16)&0xff) ^ PARITY8((u16)>>8))
#define PARITY8(u8) (PARITY4((u8)&0x0f) ^ PARITY4((u8)>>4))
#define PARITY4(u4) (PARITY2((u4)&0x03) ^ PARITY2((u4)>>2))
#define PARITY2(u2) (PARITY1((u2)&0x01) ^ PARITY1((u2)>>1))
#define PARITY1(u1) (u1)
int message1[] = { 0x1234, 0x5678, PARITY16(0x1234^0x5678) };
int message2[] = { 0x1234, 0x5678, PARITY_16(0x1234^0x5678) };
#include <stdio.h>
int main(void){
int errors = 0;
int i=0;
printf(" Testing parity ...\n");
printf(" 0x%x = message with PARITY16\n", message1[2] );
printf(" 0x%x = message with PARITY_16\n", message2[2] );
for(i=0; i<0x10000; i++){
int left = PARITY_16(i);
int right = PARITY16(i);
if( left != right ){
printf(" 0x%x: (%d != %d)\n", i, left, right );
errors++;
return 0;
};
};
printf(" 0x%x errors detected. \n", errors );
} /* vim: set shiftwidth=4 expandtab ignorecase : */
Much like the original code you posted, it pairs up bits and (in effect) calculates the XOR between each pair, then from the results it pairs up the bits again, halving the number of bits each time until only a single parity bit remains.
But is that really what you wanted?
Many people say they are calculating "the parity" of a message.
But in my experience, most of the time they are really generating
an error-detection code bigger than a single parity bit:
an LRC, a CRC, a Hamming code, etc.
further details
If the current system is compiling in a reasonable amount of time,
and it's giving the correct answers, I would leave it alone.
Refactoring "how the pre-processor generates some constant"
will produce bit-for-bit identically the same runtime executable.
I'd rather have easy-to-read source
even if it takes a full second longer to compile.
Many people use a language easier-to-read than the standard C preprocessor to generate C source code.
See pycrc, the character set extractor, "using Python to generate C", etc.
If the current system is taking way too long to compile,
rather than tweak the C preprocessor,
I would be tempted to put that message, including the parity, in a separate ".h" file
with hard-coded constants (rather than force the C pre-processor to calculate them every time),
and "#include" that ".h" file in the ".c" file for the embedded system.
Then I would make a completely separate program (perhaps in C or Python)
that does the parity calculations and
prints out the contents of that ".h" file as pre-calculated C source code,
something like
printf("int message[] = { 0x%x, 0x%x, 0x%x };\n",
M[0], M[1], parity( M[0]^M[1] ) );
and tweak my MAKEFILE to run that Python (or whatever) program to regenerate that ".h" file
if, and only if, it is necessary.
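A generator along those lines could be as small as the sketch below, written in C here. The function names parity16_of and write_message_header are invented for the example, and the two message words are passed in rather than read from anywhere.

```c
#include <stdio.h>

/* Parity of the low 16 bits: 1 if an odd number of bits are set. */
static unsigned parity16_of(unsigned v)
{
    unsigned p = 0;
    for (int i = 0; i < 16; i++)
        p ^= (v >> i) & 1u;
    return p;
}

/* Write the pre-calculated message table as C source to `out`. */
static void write_message_header(FILE *out, unsigned m0, unsigned m1)
{
    fprintf(out, "/* Generated file, do not edit. */\n");
    fprintf(out, "int message[] = { 0x%x, 0x%x, 0x%x };\n",
            m0, m1, parity16_of(m0 ^ m1));
}
```

The generator's main would call write_message_header(stdout, 0x1234, 0x5678), and the makefile rule would redirect that output into the ".h" file.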
As mfontanini says, an inline function is much better.
If you insist on a macro, you can define a temporary variable.
With gcc, you can do it and still have the macro which behaves as an expression:
#define PARITY(x) ({int tmp=x; PARITY16(tmp);})
If you want to stick to the standard, you have to make the macro a statement:
#define PARITY(x, target) do { int tmp=x; target=PARITY16(tmp); } while(0)
In both cases, you can have ugly bugs if tmp ends up a name used in the function (even worse - used within the parameter passed to the macro).
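For comparison, the inline-function route avoids both the temporary-name problem and the repeated expansion. This is a sketch only, and parity16_inline is a name chosen for this example.

```c
#include <stdint.h>

/* XOR-fold the 16 bits down to one. The argument is evaluated exactly once
 * because it is an ordinary function parameter. */
static inline unsigned parity16_inline(uint16_t x)
{
    unsigned v = x;
    v ^= v >> 8;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}
```

One trade-off: a function call is not a constant expression in C, so unlike the macros it cannot initialize a static array such as message[]. With constant arguments an optimizer will usually still fold it to a constant at the call site.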
