Regexp puzzle : encapsulated structures - c

So here I am trying to do something all programmers probably had to done as well one day.
I have all those nested macros in my code and I want to comment next to each #endif which #if it closes - if there is no comment yet. Can a regexp do that for me ? Until now * and + have been too greedy in Notepad++, even if I used the theoretically lazy version *? and +? ...
For example this
#if A
Code
#if B
Code
#if C
Code
#endif
Code
#endif /* B */
Code
#endif
Into this
#if A
Code
#if B
Code
#if C
Code
#endif /* C */
Code
#endif /* B */
Code
#endif /* A */

Can this be done with a single regular expression? Hell no!
Can this be done in Notepad++ using only regular expressions? HELL YEAH!!!
Turn OFF . matches newline.
Replace all \r\n with \n. This makes the regular expressions much simpler on Windows. You can change them back later if you want.
Clear all existing #endif markings by replacing ^#endif.* with #endif.
We need to focus on only #if... and #endif lines, so we ignore all the others by replacing ^(?!#(if|endif)).* with ;$0. So ; is our 'ignore' marker.
Next, we repeatedly alter #if - #endif pairs that only have ignored lines between them. So, repeatedly 'Replace All' ^#if(.+)\n((;.*\n)*)#endifwith;#if$1\n$2;#endif /* $1 */. This applies the comment to all #if -
#endifs at the lowest nesting level, and then marks them with ;, so the next time you click 'Replace All', they are ignored.
When you are done, remove the ignore markers by replacing ^;(.+?\n) with $1.
VOILA!

Related

How to use/convert string defined in macro to integer?

I would like to compute something according to the version of a library (which I can't change the values) by using C language.
However, the version of the library, that I am using, is defined as string by using #defines like:
/* major version */
#define MAJOR_VERSION "2"
/* minor version */
#define MINOR_VERSION "2"
Then, my question is: how to do define the macro STR_TO_INT in order to convert the strings MINOR_VERSION and MAJOR_VERSION to integer?
#if ((STR_TO_INT(MAJOR_VERSION) == 2 && STR_TO_INT(MINOR_VERSION) >= 2) || (STR_TO_INT(MAJOR_VERSION > 2))
//I perform an action...
#else
//I perform a different action
#endif
I prefer to define it as macro since I am using a lot of function from this library. Please feel free to give me any idea.
Preprocess the official library header, libheader.h, to generate your more useful information without the quotes in a new header, libversion.h:
sed -n -e '/^#define \(M[AI][JN]OR\)_VERSION "\([0-9][0-9]*\)".*/ {
s//#define NUM_\1_VERSION \2/p
}' libheader.h >libversion.h
You might need to be more flexible about allowing spaces and tabs around the separate parts of #, define and the macro name. I also assume there are no comments in the definition (trailing comments are handled):
/* This starts in column 1 - unlike the next line */
# define /* No comment here */ MAJOR_VERSION /* Nor here */ "2"
Now you can include both libheader.h and libversion.h and compare the numeric versions with impunity (as long as you get the expressions correct):
#include "libheader.h"
#include "libversion.h"
#if ((NUM_MAJOR_VERSION == 2 && NUM_MINOR_VERSION >= 2) || NUM_MAJOR_VERSION > 2)
…perform the new action…
#else
…perform the old action…
#endif
Strictly, the sed script will also convert MIJOR_VERSION and MANOR_VERSION; however, they're unlikely to appear in the library header, and you can ignore the generated numeric versions with ease. There are ways to deal with that if you really think it is an actual rather than hypothetical problem.
More seriously, if the library has complicated controls on the version information, it could be that a single header can masquerade as different versions of the library — there could be multiple lines defining the major and minor versions. If that's the case, you have to work a lot harder.
#define MAJOR_VERSION 2 will work anywhere, as an int, as you have, 2, there is no need for string/ conversions. You can directly do:
if (MAJOR_VERSION == 2) { /* version 2 */ }
else { /* not version 2 */ }

Using ENUMs as bitmaps, how to validate in C

I am developing firmware for an embedded application with memory constraints. I have a set of commands that need to processed as they are received. Each command falls under different 'buckets' and each 'bucket' gets a range of valid command numbers. I created two ENUMs as shown below to achieve this.
enum
{
BUCKET_1 = 0x100, // Range of 0x100 to 0x1FF
BUCKET_2 = 0x200, // Range of 0x200 to 0x2FF
BUCKET_3 = 0x300, // Range of 0x300 to 0x3FF
...
...
BUCKET_N = 0xN00 // Range of 0xN00 to 0xNFF
} cmd_buckets;
enum
{
//BUCKET_1 commands
CMD_BUCKET_1_START = BUCKET_1,
BUCKET_1_CMD_1,
BUCKET_1_CMD_2,
BUCKET_1_CMD_3,
BUCKET_1_CMD_4,
//Add new commands above this line
BUCKET_1_CMD_MAX,
//BUCKET_2 commands
CMD_BUCKET_2_START = BUCKET_2,
BUCKET_2_CMD_1,
BUCKET_2_CMD_2,
BUCKET_2_CMD_3,
//Add new commands above this line
BUCKET_2_CMD_MAX,
//BUCKET_3 commands
...
...
...
//BUCKET_N commands
CMD_BUCKET_N_START = BUCKET_N
BUCKET_N_CMD_1,
BUCKET_N_CMD_2,
BUCKET_N_CMD_3,
BUCKET_N_CMD_4,
//Add new commands above this line
BUCKET_N_CMD_MAX,
}cmd_codes
When my command handler function receives a command code, it needs to check if the command is enabled before processing it. I plan to use a bitmap for this. Commands can be enabled or disabled from processing during run-time. I can use an int for each group (giving me 32 commands per group, I realize that 0xN00 to 0xN20 are valid command codes and that others codes in the range are wasted). Even though commands codes are wasted, the design choice has the benefit of easily telling the group of the command code when seeing raw data on a console.
Since many developers can add commands to the 'cmd_codes' enum (even new buckets may be added as needed to the 'cmd_buckets' enum), I want to make sure that the number of command codes in each bucket does not exceed 32 (bitmap is int). I want to catch this at compile time rather than run time. Other than checking each BUCKET_N_CMD_MAX value as below and throwing a compile time error, is there a better solution?
#if (BUCKET_1_CMD_MAX > 0x20)
#error ("Number of commands in BUCKET_1 exceeded 32")
#endif
#if (BUCKET_2_CMD_MAX > 0x20)
#error ("Number of commands in BUCKET_2 exceeded 32")
#endif
#if (BUCKET_3_CMD_MAX > 0x20)
#error ("Number of commands in BUCKET_3 exceeded 32")
#endif
...
...
...
#if (BUCKET_N_CMD_MAX > 0x20)
#error ("Number of commands in BUCKET_N exceeded 32")
#endif
Please also suggest if there is a more elegant way to design this.
Thanks, I appreciate your time and patience.
First fix the bug in the code. As mentioned in comments, you have a constant BUCKET_1 = 0x100 which you then assign CMD_BUCKET_1_START = BUCKET_1. The trailing enums will therefore get values 0x101, 0x102, ... and BUCKET_1_CMD_MAX will be 0x106. Since 0x106 is always larger than 0x20, your static assert will always trigger.
Fix that so that it actually checks the total number of items in the enum instead, like this:
#define BUCKET_1_CMD_N (BUCKET_1_CMD_MAX - CMD_BUCKET_1_START)
#define BUCKET_2_CMD_N (BUCKET_2_CMD_MAX - CMD_BUCKET_2_START)
...
Assuming the above is fixed, then you can replace the numerous checks with a single macro. Not a great improvement, but at least it reduces code repetition:
#define BUCKET_MAX 32 // use a defined constant instead of a magic number
// some helper macros:
#define CHECK(n) BUCKET_ ## n ## _CMD_N
#define STRINGIFY(n) #n
// the actual macro:
#define BUCKET_CHECK(n) \
_Static_assert(CHECK(n) <= BUCKET_MAX, \
"Number of commands in BUCKET_" STRINGIFY(n) "_CMD_N exceeds BUCKET_MAX.");
// usage:
int main (void)
{
BUCKET_CHECK(1);
BUCKET_CHECK(2);
}
Output from gcc in case one constant is too large:
error: static assertion failed: "Number of commands in BUCKET_1_CMD_N exceeds BUCKET_MAX."
note: in expansion of macro 'BUCKET_CHECK'
EDIT
If combining the bug fix with the check macro, you would get this:
#define BUCKET_MAX 32
#define CHECK(n) (BUCKET_##n##_CMD_MAX - CMD_BUCKET_##n##_START)
#define STRINGIFY(n) #n
#define BUCKET_CHECK(n) \
_Static_assert(CHECK(n) <= BUCKET_MAX, \
"Number of commands in BUCKET " STRINGIFY(n) " exceeds BUCKET_MAX.");
int main (void)
{
BUCKET_CHECK(1);
BUCKET_CHECK(2);
}
First of all, preprocessor commands do not work that way. The C preprocessor is able to "see" only names instruced by the #define statement or passes as compiler flags. It is not able to see constants defined as part of an enum or with the const keyword. You should use _Static_assert to validate the commands instead of the preprocessor.
As for the commands, I would suggest to have all the commands numbered in the range 0..0x20:
enum {
BUCKET_1_CMD_1,
BUCKET_1_CMD_2,
...
BUCKET_1_CMD_MAX,
};
enum {
BUCKET_2_CMD_1,
BUCKET_2_CMD_2,
...
BUCKET_2_CMD_MAX,
};
Then you need only a single guard value to check if all the commands are in valid range:
#define MAX_COMMAND 0x20
_Static_assert(BUCKET_1_CMD_MAX <= MAX_COMMAND, "too many bucket 1 commands");
_Static_assert(BUCKET_2_CMD_MAX <= MAX_COMMAND, "too many bucket 2 commands");
To use the commands, bitwise-or them together with the bucket "offset":
enum {
BUCKET_1 = 0x100,
BUCKET_2 = 0x200,
};
...
int cmd = BUCKET_2 | BUCKET_2_CMD_1;

Getting the preprocessor to process a block of code

I have a system where I specify on the command line the verbosity level. In my functions I check against what was specified to determine if I enter a code block or not:
#ifdef DEBUG
if (verbose_get_bit(verbose_level_1)) {
// arbitrary debugging/printing style code generally goes in here, e.g.:
printf("I am printing this because it was specified and I am compiling debug build\n");
}
#endif
I'd like to make this less tedious to set up, so here's what I have so far:
// from "Verbose.h"
bool verbose_get_bit(verbose_group_name name); // verbose_group_name is an enum
#ifdef DEBUG
#define IF_VERBOSE_BIT_D(x) if (verbose_get_bit(x))
#else // not debug: desired behavior is the entire block that follows gets optimized out
#define IF_VERBOSE_BIT_D(x) if (0)
#endif // not debug
Now, I can do this:
IF_VERBOSE_BIT_D(verbose_GL_debug) {
printf("I don't want the release build to execute any of this code");
glBegin(GL_LINES);
// ... and so on
}
I like this because it looks like an if-statement, it functions as an if-statement, it's clear that it's a macro, and it does not get run in the release build.
I'd be reasonably sure that the code will get optimized out since it will be wrapped in a if(false) block but I would prefer it if there was some way I can get the preprocessor to actually throw the entire block away. Can it be done?
I can't think of a way to do it without wrapping the entire block in a macro.
But this might work for your purposes:
#if DEBUG
#define IF_VERBOSE_BIT_D(x) {x}
#else
#define IF_VERBOSE_BIT_D(x)
#endif
IF_VERBOSE_BIT_D(
cout << "this is" << endl;
cout << "in verbose" << endl;
printf("Code = %d\n", 1);
)
Indeed the compiler should be able to optimize out an if (0), but I often do something similar to this when the code inside the block will not compile at all when not in debug mode.
Not anywhere near as neatly as you just did it. Don't worry, your compiler will fully optimize out any if(0) block.
You can if you so desire check this by writing a program in which you have it as you described and compiling it. If you then remove the if(false) blocks, it should compile to the exact same binary, as shown by an MD5 hash. But that's not necessary, I promise your compiler can figure it out!
Just create a condition in the if-statement which is starts with "false &&" if you want to have it disabled entirely. Unless you compile without any optimization the compiler typically removes dead code.

Disable functions using MACROS

After searching quite a bit in the Internet for a solution I decided to ask here if my solution is fine.
I'm trying to write a simple and modular C logging library intended to be simple to disable and specially helping PhD students and researchers to debug an algorithm reducing as much as possibile the impact of the logging system.
My problem is that I want make possible for the user of the library to disable the logging system at compile time producing an executable in which the cost of the logger is 0.
The C code would look like this:
...
logger_t * logger;
result = logger_init(logger);
if(result == -1) {...}
...
this will simply initialize the logger. Looking for an example code I have checked the assert.h header but that soulution results in a list of warnings in my case. In fact, if logger_init() is substituted by 0 using the macro this will result in the variable logger never used.
For this reason I've decided to use this approach:
int logger_init(logger_t *logger);
#ifndef NLOG /* NLOG not defined -> enable logging */
int logger_init(logger_t *logger) {
...
}
#else /* NLOG defined --> the logging system must be disabled */
#define logger_init(logger) (int)((logger = NULL) == NULL)
#endif /* NLOG */
this does not result in warnings and I also avoid the overhead of calling the function. In fact my first try was to do like this:
int logger_init(logger_t *logger) {
#ifndef NLOG /* NLOG not defined -> enable logging */
...
#endif
return 0;
}
keep calling the function even if I do not need it.
Do you think my solution could be considered a good solution? Is there a better solution?
Thanks a lot, guys!
Cheers,
Armando
The standard idiom for that, at least in the 90s, was:
#ifndef NLOG
void logger_init(logger_t *logger);
void logger_log(logger_t *logger, ...);
#else
#define logger_init (void)sizeof
#define logger_log (void)sizeof
#endif
Remember that sizeof operands are not evaluated, although they are syntax checked.
This trick works also with variadic functions, because the sizeof operator will see an expresion with several comma operators:
logger_log(log, 1, 2, 3);
Converts to:
(void)sizeof(log, 1, 2, 3);
Those commas are not separating parameters (sizeof is not a function but an operator), but they are comma operators.
Note that I changed the return value from int to void. No real need for that, but the sizeof return would be mostly meaningless.
Can't your disabled version simply be a constant:
#ifndef NLOG /* NLOG not defined -> enable logging */
int logger_init(logger_t *logger) {
...
}
#else /* NLOG defined --> the logging system must be disabled */
#define logger_init(logger) 0
#endif /* NLOG */
This way, you just have (after pre-compilation): result = 0; which shouldn't produce any warnings.

Mangling __FILE__ and __LINE__ in code for quoting?

Is there a way to get the C/C++ preprocessor or a template or such to mangle/hash the __FILE__ and __LINE__ and perhaps some other external input like a build-number into a single short number that can be quoted in logs or error messages?
(The intention would be to be able to reverse it (to a list of candidates if its lossy) when needed when a customer quotes it in a bug report.)
You will have to use a function to perform the hashing and create a code from __LINE__ and __FILE__ as the C preprocessor is not able to do such complex tasks.
Anyway, you can take inspiration by this article to see if a different solution can be better suited to your situation.
Well... you could use something like:
((*(int*)__FILE__ && 0xFFFF0000) | version << 8 | __LINE__ )
It wouldn't be perfectly unique, but it might work for what you want. Could change those ORs to +, which might work better for some things.
Naturally, if you can actually create a hashcode, you'll probably want to do that.
I needed serial valuse in a project of mine and got them by making a template that specialized on __LINE__ and __FILE__ and resulted in an int as well as generating (as compile time output to stdout) a template specialization for it's inputs that resulted in the line number of that template. These were collected the first time through the compiler and then dumped into a code file and the program was compiled again. That time each location that the template was used got a different number.
(done in D so it might not be possible in C++)
template Serial(char[] file, int line)
{
prgams(msg,
"template Serial(char[] file : \"~file~"\", int line : "~line.stringof~")"
"{const int Serial = __LINE__;");
const int Serial = -1;
}
A simpler solution would be to keep a global static "error location" variable.
#ifdef DEBUG
#define trace_here(version) printf("[%d]%s:%d {%d}\n", version, __FILE__, __LINE__, errloc++);
#else
#define trace_here(version) printf("{%lu}\n", version<<16|errloc++);
#endif
Or without the printf.. Just increment the errloc everytime you cross a tracepoint. Then you can correlate the value to the line/number/version spit out by your debug builds pretty easily.
You'd need to include version or build number, because those error locations could change with any build.
Doesn't work well if you can't reproduce the code paths.
__FILE__ is a pointer into the constants segment of your program. If you output the difference between that and some other constant you should get a result that's independent of any relocation, etc:
extern const char g_DebugAnchor;
#define FILE_STR_OFFSET (__FILE__ - &g_DebugAnchor)
You can then report that, or combine it in some way with the line number, etc. The middle bits of FILE_STR_OFFSET are likely the most interesting.
Well, if you're displaying the message to the user yourself (as opposed to having a crash address or function be displayed by the system), there's nothing to keep you from displaying exactly what you want.
For example:
typedef union ErrorCode {
struct {
unsigned int file: 15;
unsigned int line: 12; /* Better than 5 bits, still not great
Thanks commenters!! */
unsigned int build: 5;
} bits;
unsigned int code;
} ErrorCode;
unsigned int buildErrorCodes(const char *file, int line, int build)
{
ErrorCode code;
code.bits.line=line & ((1<<12) - 1);
code.bits.build=build & ((1<< 5) - 1);
code.bits.file=some_hash_function(file) & ((1<<15) - 1);
return code.code;
}
You'd use that as
buildErrorCodes(__FILE__, __LINE__, BUILD_CODE)
and output it in hex. It wouldn't be very hard to decode...
(Edited -- the commenters are correct, I must have been nuts to specify 5 bits for the line number. Modulo 4096, however, lines with error messages aren't likely to collide. 5 bits for build is still fine - modulo 32 means that only 32 builds can be outstanding AND have the error still happen at the same line.)

Resources