How to define an array of structs at compile time composed of static (private) structs from separate modules? - c

This question is something of a trick C question or a trick clang/gcc question. I'm not sure which.
I phrased it like I did because the final array is in main.c, but the structs that are in the array are defined in C modules.
The end goal is to define structs in separate C modules and then have those structs available in a contiguous array right from program start. I do not want to use any dynamic code to declare the array and put in the elements.
I would like it all done at compile or link time -- not at run time.
I'm looking to end up with a monolithic blob of memory that gets setup right from program start.
For the sake of the Stack Overflow question, I thought it would make sense to imagine these as "drivers" (like in the Linux kernel). Going with that...
Each module is a driver. Because the team is complex, I do not know how many drivers there will ultimately be.
Requirements:
Loaded into contiguous memory (an array)
Loaded into memory at program start
installed by the compiler/linker, not dynamic code
a driver exists because source code exists for it (no dynamic code to load them up)
Avoid cluttering up the code
Here is a contrived example:
// myapp.h
//////////////////////////
struct state
{
int16_t data[10];
};
struct driver
{
char name[255];
int16_t (*on_do_stuff) (struct state *state);
/* other stuff snipped out */
};
// drivera.c
//////////////////////////
#include "myapp.h"
static int16_t _on_do_stuff(struct state *state)
{
/* do stuff */
}
static const struct driver _driver = {
.name = "drivera",
.on_do_stuff = _on_do_stuff
};
// driverb.c
//////////////////////////
#include "myapp.h"
static int16_t _on_do_stuff(struct state *state)
{
/* do stuff */
}
static const struct driver _driver = {
.name = "driverb",
.on_do_stuff = _on_do_stuff
};
// driverc.c
//////////////////////////
#include "myapp.h"
static int16_t _on_do_stuff(struct state *state)
{
/* do stuff */
}
static const struct driver _driver = {
.name = "driverc",
.on_do_stuff = _on_do_stuff
};
// main.c
//////////////////////////
#include <stdio.h>
static struct driver the_drivers[] = {
{drivera somehow},
{driverb somehow},
{driverc somehow},
{0}
};
int main(void)
{
struct state state;
struct driver *current = the_drivers;
while (current != 0)
{
printf("we are up to %s\n", current->name);
current->on_do_stuff(&state);
current += sizeof(struct driver);
}
return 0;
}
This doesn't work exactly.
Ideas:
On the module-level structs, I could remove the static const keywords, but I'm not sure how to get them into the array at compile time
I could move all of the module-level structs to main.c, but then I would need to remove the static keyword from all of the on_do_stuff functions, and thereby clutter up the namespace.
In the Linux kernel, they somehow define kernel modules in separate files and then, through linker magic, they can be linked into a monolithic image.

Use a dedicated ELF section to "collect" the data structures.
For example, define your data structure in info.h as
#ifndef INFO_H
#define INFO_H
#ifndef INFO_ALIGNMENT
#if defined(__LP64__)
#define INFO_ALIGNMENT 16
#else
#define INFO_ALIGNMENT 8
#endif
#endif
struct info {
long key;
long val;
} __attribute__((__aligned__(INFO_ALIGNMENT)));
#define INFO_NAME(counter) INFO_CAT(info_, counter)
#define INFO_CAT(a, b) INFO_DUMMY() a ## b
#define INFO_DUMMY()
#define DEFINE_INFO(data...) \
static struct info INFO_NAME(__COUNTER__) \
__attribute__((__used__, __section__("info"))) \
= { data }
#endif /* INFO_H */
The INFO_ALIGNMENT macro is the alignment used by the linker to place each symbol, separately, to the info section. It is important that the C compiler agrees, as otherwise the section contents cannot be treated as an array. (You'll obtain an incorrect number of structures, and only the first one (plus every N'th) will be correct, the rest of the structures garbled. Essentially, the C compiler and the linker disagreed on the size of each structure in the section "array".)
Note that you can add preprocessor macros to fine-tune the INFO_ALIGNMENT for each of the architectures you use, but you can also override it for example in your Makefile, at compile time. (For GCC, supply -DINFO_ALIGNMENT=32 for example.)
The used attribute ensures that the definition is emitted in the object file, even though it is not referenced otherwise in the same data file. The section("info") attribute puts the data into a special info section in the object file. The section name (info) is up to you.
Those are the critical parts, otherwise it is completely up to you how you define the macro, or whether you define it at all. Using the macro is easy, because one does not need to worry about using unique variable name for the structure. Also, if at least one member is specified, all others will be initialized to zero.
In the source files, you define the data objects as e.g.
#include "info.h"
/* Suggested, easy way */
DEFINE_INFO(.key = 5, .val = 42);
/* Alternative way, without relying on any macros */
static struct info foo __attribute__((__used__, __section__("info"))) = {
.key = 2,
.val = 1
};
The linker provides symbols __start_info and __stop_info, to obtain the structures in the info section. In your main.c, use for example
#include <stdlib.h>
#include <stdio.h>
#include "info.h"
extern struct info __start_info[];
extern struct info __stop_info[];
#define NUM_INFO ((size_t)(__stop_info - __start_info))
#define INFO(i) ((__start_info) + (i))
so you can enumerate all info structures. For example,
int main(void)
{
size_t i;
printf("There are %zu info structures:\n", NUM_INFO);
for (i = 0; i < NUM_INFO; i++)
printf(" %zu. key=%ld, val=%ld\n", i,
__start_info[i].key, INFO(i)->val);
return EXIT_SUCCESS;
}
For illustration, I used both the __start_info[] array access (you can obviously #define SOMENAME __start_info if you want, just make sure you do not use SOMENAME elsewhere in main.c, so you can use SOMENAME[] as the array instead), as well as the INFO() macro.
Let's look at a practical example, an RPN calculator.
We use section ops to define the operations, using facilities defined in ops.h:
#ifndef OPS_H
#define OPS_H
#include <stdlib.h>
#include <errno.h>
#ifndef ALIGN_SECTION
#if defined(__LP64__) || defined(_LP64)
#define ALIGN_SECTION __attribute__((__aligned__(16)))
#elif defined(__ILP32__) || defined(_ILP32)
#define ALIGN_SECTION __attribute__((__aligned__(8)))
#else
#define ALIGN_SECTION
#endif
#endif
typedef struct {
size_t maxsize; /* Number of values allocated for */
size_t size; /* Number of values in stack */
double *value; /* Values, oldest first */
} stack;
#define STACK_INITIALIZER { 0, 0, NULL }
struct op {
const char *name; /* Operation name */
const char *desc; /* Description */
int (*func)(stack *); /* Implementation */
} ALIGN_SECTION;
#define OPS_NAME(counter) OPS_CAT(op_, counter, _struct)
#define OPS_CAT(a, b, c) OPS_DUMMY() a ## b ## c
#define OPS_DUMMY()
#define DEFINE_OP(name, func, desc) \
static struct op OPS_NAME(__COUNTER__) \
__attribute__((__used__, __section__("ops"))) = { name, desc, func }
static inline int stack_has(stack *st, const size_t num)
{
if (!st)
return EINVAL;
if (st->size < num)
return ENOENT;
return 0;
}
static inline int stack_pop(stack *st, double *to)
{
if (!st)
return EINVAL;
if (st->size < 1)
return ENOENT;
st->size--;
if (to)
*to = st->value[st->size];
return 0;
}
static inline int stack_push(stack *st, double val)
{
if (!st)
return EINVAL;
if (st->size >= st->maxsize) {
const size_t maxsize = (st->size | 127) + 129;
double *value;
value = realloc(st->value, maxsize * sizeof (double));
if (!value)
return ENOMEM;
st->maxsize = maxsize;
st->value = value;
}
st->value[st->size++] = val;
return 0;
}
#endif /* OPS_H */
The basic set of operations is defined in ops-basic.c:
#include "ops.h"
static int do_neg(stack *st)
{
double temp;
int retval;
retval = stack_pop(st, &temp);
if (retval)
return retval;
return stack_push(st, -temp);
}
static int do_add(stack *st)
{
int retval;
retval = stack_has(st, 2);
if (retval)
return retval;
st->value[st->size - 2] = st->value[st->size - 1] + st->value[st->size - 2];
st->size--;
return 0;
}
static int do_sub(stack *st)
{
int retval;
retval = stack_has(st, 2);
if (retval)
return retval;
st->value[st->size - 2] = st->value[st->size - 1] - st->value[st->size - 2];
st->size--;
return 0;
}
static int do_mul(stack *st)
{
int retval;
retval = stack_has(st, 2);
if (retval)
return retval;
st->value[st->size - 2] = st->value[st->size - 1] * st->value[st->size - 2];
st->size--;
return 0;
}
static int do_div(stack *st)
{
int retval;
retval = stack_has(st, 2);
if (retval)
return retval;
st->value[st->size - 2] = st->value[st->size - 1] / st->value[st->size - 2];
st->size--;
return 0;
}
DEFINE_OP("neg", do_neg, "Negate current operand");
DEFINE_OP("add", do_add, "Add current and previous operands");
DEFINE_OP("sub", do_sub, "Subtract previous operand from current one");
DEFINE_OP("mul", do_mul, "Multiply previous and current operands");
DEFINE_OP("div", do_div, "Divide current operand by the previous operand");
The calculator expects each value and operand to be a separate command-line argument for simplicity. Our main.c contains operation lookup, basic usage, value parsing, and printing the result (or error):
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
#include "ops.h"
extern struct op __start_ops[];
extern struct op __stop_ops[];
#define NUM_OPS ((size_t)(__stop_ops - __start_ops))
static int do_op(stack *st, const char *opname)
{
struct op *curr_op;
if (!st || !opname)
return EINVAL;
for (curr_op = __start_ops; curr_op < __stop_ops; curr_op++)
if (!strcmp(opname, curr_op->name))
break;
if (curr_op >= __stop_ops)
return ENOTSUP;
return curr_op->func(st);
}
static int usage(const char *argv0)
{
struct op *curr_op;
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv0);
fprintf(stderr, " %s RPN-EXPRESSION\n", argv0);
fprintf(stderr, "\n");
fprintf(stderr, "Where RPN-EXPRESSION is an expression using reverse\n");
fprintf(stderr, "Polish notation, and each argument is a separate value\n");
fprintf(stderr, "or operator. The following operators are supported:\n");
for (curr_op = __start_ops; curr_op < __stop_ops; curr_op++)
fprintf(stderr, "\t%-14s %s\n", curr_op->name, curr_op->desc);
fprintf(stderr, "\n");
return EXIT_SUCCESS;
}
int main(int argc, char *argv[])
{
stack all = STACK_INITIALIZER;
double val;
size_t i;
int arg, err;
char dummy;
if (argc < 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help"))
return usage(argv[0]);
for (arg = 1; arg < argc; arg++)
if (sscanf(argv[arg], " %lf %c", &val, &dummy) == 1) {
err = stack_push(&all, val);
if (err) {
fprintf(stderr, "Cannot push %s to stack: %s.\n", argv[arg], strerror(err));
return EXIT_FAILURE;
}
} else {
err = do_op(&all, argv[arg]);
if (err == ENOTSUP) {
fprintf(stderr, "%s: Operation not supported.\n", argv[arg]);
return EXIT_FAILURE;
} else
if (err) {
fprintf(stderr, "%s: Cannot perform operation: %s.\n", argv[arg], strerror(err));
return EXIT_FAILURE;
}
}
if (all.size < 1) {
fprintf(stderr, "No result.\n");
return EXIT_FAILURE;
} else
if (all.size > 1) {
fprintf(stderr, "Multiple results:\n");
for (i = 0; i < all.size; i++)
fprintf(stderr, " %.9f\n", all.value[i]);
return EXIT_FAILURE;
}
printf("%.9f\n", all.value[0]);
return EXIT_SUCCESS;
}
Note that if there were many operations, constructing a hash table to speed up the operation lookup would make a lot of sense.
Finally, we need a Makefile to tie it all together:
CC := gcc
CFLAGS := -Wall -O2 -std=c99
LDFLAGS := -lm
OPS := $(wildcard ops-*.c)
OPSOBJS := $(OPS:%.c=%.o)
PROGS := rpncalc
.PHONY: all clean
all: clean $(PROGS)
clean:
rm -f *.o $(PROGS)
%.o: %.c
$(CC) $(CFLAGS) -c $^
rpncalc: main.o $(OPSOBJS)
$(CC) $(CFLAGS) $^ $(LDFLAGS) -o $@
Because this forum does not preserve Tabs, and make requires them for indentation, you probably need to fix the indentation after copy-pasting the above. I use sed -e 's|^ *|\t|' -i Makefile
If you compile (make clean all) and run (./rpncalc) the above, you'll see the usage information:
Usage: ./rpncalc [ -h | --help ]
./rpncalc RPN-EXPRESSION
Where RPN-EXPRESSION is an expression using reverse
Polish notation, and each argument is a separate value
or operator. The following operators are supported:
div Divide current operand by the previous operand
mul Multiply previous and current operands
sub Subtract previous operand from current one
add Add current and previous operands
neg Negate current operand
and if you run e.g. ./rpncalc 3.0 4.0 5.0 sub mul neg, you get the result -3.000000000 (sub leaves 5 - 4 = 1, mul gives 1 * 3 = 3, and neg flips the sign).
Now, let's add some new operations, ops-sqrt.c:
#include <math.h>
#include "ops.h"
static int do_sqrt(stack *st)
{
double temp;
int retval;
retval = stack_pop(st, &temp);
if (retval)
return retval;
return stack_push(st, sqrt(temp));
}
DEFINE_OP("sqrt", do_sqrt, "Take the square root of the current operand");
Because the Makefile above compiles all C source files beginning with ops- into the final binary, the only thing you need to do is recompile the source: make clean all. Running ./rpncalc now outputs
Usage: ./rpncalc [ -h | --help ]
./rpncalc RPN-EXPRESSION
Where RPN-EXPRESSION is an expression using reverse
Polish notation, and each argument is a separate value
or operator. The following operators are supported:
sqrt Take the square root of the current operand
div Divide current operand by the previous operand
mul Multiply previous and current operands
sub Subtract previous operand from current one
add Add current and previous operands
neg Negate current operand
and you have the new sqrt operator available.
Testing e.g. ./rpncalc 1 1 1 1 add add add sqrt yields 2.000000000, as expected.

Related

Is it possible to map an unversioned glibc function (xdr_wrapstring) symbol to a versioned symbol (xdr_wrapstring@GLIBC_2.2.5) at link time?

I have a 3rd party C static library that uses xdr_wrapstring. I am moving to RH 8, where these symbols are not available in the default /lib64/libc.so.6, but are available as versioned symbols (xdr_wrapstring@GLIBC_2.2.5). Is there a way to tell the linker to resolve xdr_wrapstring to xdr_wrapstring@GLIBC_2.2.5?
I can't link with libtirpc (which provides unversioned symbols) due to it requiring libssl.so & libcrypto.so via libk5crypto.so
Prefaced by the top comments ...
The assembler .symver shows some promise. A web search on it shows:
http://web.mit.edu/rhel-doc/3/rhel-as-en-3/symver.html
https://man7.org/conf/lca2006/shared_libraries/slide19a.html
Linking against older symbol version in a .so file
From this, I've created an xdrver.s file with stubs that seem to work on my system [which has the same versioned symbols issue].
However, do have a look at those linked pages: symver is also available as an attribute, so it may be possible to do this with inline asm from a .c file.
I've created a crude test program:
// xdrtest.c -- print address of xdr_wrapstring
#include <stdio.h>
void xdr_wrapstring(void);
int
main(void)
{
void *ptr = xdr_wrapstring;
printf("%p\n",ptr);
return 0;
}
Here is a better test program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include <rpc/types.h>
#include <rpc/xdr.h>
#define ALEN 1000
#define sysfault(_fmt...) \
do { \
fprintf(stderr,_fmt); \
exit(1); \
} while (0)
XDR xdrs;
void
sendstring(const char *str)
{
static char buf[ALEN];
char *bp;
strcpy(buf,str);
bp = buf;
if (! xdr_wrapstring(&xdrs,&bp))
sysfault("sendstring: xdr_wrapstring fail -- str='%s' buf='%s'\n",
str,buf);
}
void
recvstring(const char *str)
{
static char buf[ALEN];
char *bp;
bp = buf;
if (! xdr_wrapstring(&xdrs,&bp))
sysfault("recvstring: xdr_wrapstring fail -- str='%s' buf='%s'\n",
str,buf);
fprintf(stderr,"buf=%p bp=%p str='%s' bp='%s'\n",buf,bp,str,bp);
if (strcmp(bp,str) != 0)
sysfault("recvstring: MISMATCH\n");
}
void
writer(void)
{
xdrstdio_create(&xdrs, stdout, XDR_ENCODE);
sendstring("hello");
sendstring("world");
sendstring("goodbye");
sendstring("galaxy");
}
void
reader(void)
{
xdrstdio_create(&xdrs, stdin, XDR_DECODE);
recvstring("hello");
recvstring("world");
recvstring("goodbye");
recvstring("galaxy");
}
int
main(int argc,char **argv)
{
int opt_dir = -1;
--argc;
++argv;
for (; argc > 0; --argc, ++argv) {
char *cp = *argv;
if (*cp != '-')
break;
cp += 2;
switch (cp[-1]) {
case 'r':
opt_dir = 0;
break;
case 'w':
opt_dir = 1;
break;
}
}
switch (opt_dir) {
case 0:
reader();
break;
case 1:
writer();
break;
default:
sysfault("main: -r/-w not specified\n");
break;
}
return 0;
}
Here is the "magic" xdrver.s file:
.globl xdrstdio_create
.symver foo, xdrstdio_create@GLIBC_2.2.5
xdrstdio_create:
jmp foo
.globl xdr_wrapstring
.symver bar, xdr_wrapstring@GLIBC_2.2.5
xdr_wrapstring:
jmp bar
Compile with (e.g.):
cc -o xdrtest xdrtest.c xdrver.s
Or, of course, we can create xdrver.o and link with whatever program we want.
Anyway, to test the program:
./xdrtest -w | ./xdrtest -r
And, the output is:
buf=0x4044a0 bp=0x4044a0 str='hello' bp='hello'
buf=0x4044a0 bp=0x4044a0 str='world' bp='world'
buf=0x4044a0 bp=0x4044a0 str='goodbye' bp='goodbye'
buf=0x4044a0 bp=0x4044a0 str='galaxy' bp='galaxy'

Remove code between #if 0 and #endif when exporting a C file to a new one

I want to remove all comments in a toy.c file. From Remove comments from C/C++ code I see that I could use
gcc -E -fpreprocessed -P -dD toy.c
But some of my code (say, deprecated functions that I don't want to compile) is wrapped between #if 0 and #endif, as if it were commented out.
On one hand, the above command does not remove this type of "comment" because its removal is only possible during macro expansion, which -fpreprocessed prevents;
On the other hand, I have other macros I don't want to expand, so dropping -fpreprocessed is a bad idea.
I see a dilemma here. Is there a way out of this situation? Thanks.
The following toy example "toy.c" is sufficient to illustrate the problem.
#define foo 3 /* this is a macro */
// a toy function
int main (void) {
return foo;
}
// this is deprecated
#if 0
int main (void) {
printf("%d\n", foo);
return 0;
}
#endif
gcc -E -fpreprocessed -P -dD toy.c gives
#define foo 3
int main (void) {
return foo;
}
#if 0
int main (void) {
printf("%d\n", foo);
return 0;
}
#endif
while gcc -E -P toy.c gives
int main (void) {
return 3;
}
There's a pair of programs, sunifdef ("Son of unifdef", which is available from unifdef) and coan, that can be used to do what you want. The question Is there a C pre-processor which eliminates #ifdef blocks based on values defined/undefined? has answers which discuss these programs.
For example, given "xyz37.c":
#define foo 3 /* this is a macro */
// a toy function
int main (void) {
return foo;
}
// this is deprecated
#if 0
int main (void) {
printf("%d\n", foo);
}
#endif
Using sunifdef
sunifdef -DDEFINED -ned < xyz37.c
gives
#define foo 3 /* this is a macro */
// a toy function
int main (void) {
return foo;
}
// this is deprecated
and given this file "xyz23.c":
#if 0
This is deleted
#else
This is not deleted
#endif
#if 0
Deleted
#endif
#if defined(XYZ)
XYZ is defined
#else
XYZ is not defined
#endif
#if 1
This is persistent
#else
This is inconsistent
#endif
The program
sunifdef -DDEFINE -ned < xyz23.c
gives
This is not deleted
#if defined(XYZ)
XYZ is defined
#else
XYZ is not defined
#endif
This is persistent
This is, I think, what you're after. The -DDEFINED option seems to be necessary; choose any name that you do not use in your code. You could use -UNEVER_DEFINE_THIS instead, if you prefer. The -ned option evaluates the constant terms and eliminates the relevant code. Without it, the constant terms like 0 and 1 are not eliminated.
I've used sunifdef happily for a number of years (encroaching on a decade). I've not yet found it to make a mistake, and I've used it to clean up some revoltingly abstruse sets of 'ifdeffery'. The program coan is a development of sunifdef with even more capabilities.
The preprocessor doesn't make exceptions. You cannot use it here to do that.
A simple state machine in Python can work. It even handles nesting (though maybe not every case, such as a nested #if 0; you can compare the source before and after and validate manually). Commented-out code isn't handled either (but it seems that you have that covered).
the input (slightly more complex than yours for the demo):
#define foo 3
int main (void) {
return foo;
}
#if 0
int main (void) {
#ifdef DDD
printf("%d\n", foo);
#endif
}
#endif
void other_function()
{}
now the code, using regexes to detect #if & #endif.
import re
rif0 = re.compile(r"\s*#if\s+0")
rif = re.compile(r"\s*#(if|ifn?def)")
endif = re.compile(r"\s*#endif")
if_nesting = 0   # current depth of #if/#ifdef/#ifndef nesting
if0_nesting = 0  # depth at which the active "#if 0" was opened
suppress = False
with open("input.c") as fin, open("output.c", "w") as fout:
    for l in fin:
        if rif.match(l):
            if_nesting += 1
            if not suppress and rif0.match(l):
                suppress = True
                if0_nesting = if_nesting
        elif endif.match(l):
            closes_if0 = suppress and if0_nesting == if_nesting
            if_nesting -= 1
            if closes_if0:
                suppress = False
                continue  # don't write the #endif that closes "#if 0"
        if not suppress:
            fout.write(l)
the output file contains:
#define foo 3
int main (void) {
return foo;
}
void other_function()
{}
so the nesting worked and the #if 0 part was successfully removed. That is not something that sed '/#if 0/,/#endif/d' can achieve.
Thanks for the other two answers.
I am now aware of unifdef and sunifdef. I am happy to know the existence of these tools, and that I am not the only one who wants to do this kind of code cleaning.
I have also written rm_if0_endif.c (attached below) for removing an #if 0 ... #endif block, which is sufficient for me. Its philosophy is based on text processing. It scans an input C file, locating #if 0 and the matching #endif, so that this block can be omitted during char-to-char copying.
The text processing approach is limited, as it is designed for #if 0 ... #endif case only, but is all I need for now. A C program is not the only way for this kind of text processing. Jean-François Fabre's answer demonstrates how to do it in Python. I can also do something similar in R, using readLines, startsWith and writeLines. I chose to do it in C as I am not yet an expert in C so this task drives me to learn. Here is a demo of my rm_if0_endif.c. Note that the program can concatenate several C files and add header for each file.
original input file input.c
#define foo 3 /* this is a macro */
// a toy function
int test1 (void) {
return foo;
}
#if 0
#undef foo
#define foo 4
#ifdef bar
#warning "??"
#endif
// this is deprecated
int main (void) {
printf("%d\n", foo);
return 0;
}
#endif
// another toy
int test2 (void) {
return foo;
}
gcc pre-processing output "gcc_output.c" (taken as input for my program)
gcc -E -fpreprocessed -P -dD input.c > gcc_output.c
#define foo 3
int test1 (void) {
return foo;
}
#if 0
#undef foo
#define foo 4
#ifdef bar
#warning "??"
#endif
int main (void) {
printf("%d\n", foo);
return 0;
}
#endif
int test2 (void) {
return foo;
}
final output final_output.c from my program
rm_if0_endif.c has a utility function pattern_matching and a workhorse function rm_if0_endif:
void rm_if0_endif (char *InputFile,
char *OutputFile, char *WriteMode, char *OutputHeader);
The attached file below has a main function, doing
rm_if0_endif("gcc_output.c",
"final_output.c", "w", "// this is a demo of 'rm_if0_endif.c'\n");
It produces:
// this is a demo of 'rm_if0_endif.c'
#define foo 3
int test1 (void) {
return foo;
}
int test2 (void) {
return foo;
}
Appendix: rm_if0_endif.c
#include <stdio.h>
int pattern_matching (FILE *fp, const char *pattern, int length_pattern) {
int flag = 1;
int i, c;
for (i = 0; i < length_pattern; i++) {
c = fgetc(fp);
if (c != pattern[i]) {
flag = 0; break;
}
}
return flag;
}
void rm_if0_endif (char *InputFile,
char *OutputFile, char *WriteMode, char *OutputHeader) {
FILE *fp_r = fopen(InputFile, "r");
FILE *fp_w = fopen(OutputFile, WriteMode);
fpos_t pos;
if (fp_r == NULL) perror("error when opening input file!");
fputs(OutputHeader, fp_w);
int c, i, a1, a2;
int if_0_flag, if_flag, endif_flag, EOF_flag;
const char *if_0 = "if 0";
const char *endif = "endif";
EOF_flag = 0;
while (EOF_flag == 0) {
do {
c = fgetc(fp_r);
while ((c != '#') && (c != EOF)) {
fputc(c, fp_w);
c = fgetc(fp_r);
}
if (c == EOF) {
EOF_flag = 1; break;
}
fgetpos(fp_r, &pos);
if_0_flag = pattern_matching(fp_r, if_0, 4);
fsetpos(fp_r, &pos);
if (if_0_flag == 0) fputc('#', fp_w);
} while (if_0_flag == 0);
if (EOF_flag == 1) break;
a1 = 1; a2 = 0;
do {
c = fgetc(fp_r);
while (c != '#') c = fgetc(fp_r);
fgetpos(fp_r, &pos);
if_flag = pattern_matching(fp_r, if_0, 2);
fsetpos(fp_r, &pos);
if (if_flag == 1) a1++;
fgetpos(fp_r, &pos);
endif_flag = pattern_matching(fp_r, endif, 5);
fsetpos(fp_r, &pos);
if (endif_flag == 1) a2++;
} while (a1 != a2);
for (i = 0; i < 5; i++) c = fgetc(fp_r);
if (c == EOF) {
EOF_flag = 1;
}
}
fclose(fp_r);
fclose(fp_w);
}
int main (void) {
rm_if0_endif("gcc_output.c",
"final_output.c", "w", "// this is a demo of 'rm_if0_endif.c'\n");
return 0;
}

How to determine if a pointer equals an element of an array?

I have code in Code Review that "works" as expected, yet may have UB.
Code has an array of same-sized char arrays called GP2_format[]. To detect if the pointer format has the same value as the address of one of the elements GP2_format[][0], the code below simply tests whether the pointer is >= the smallest element and <= the greatest. As the elements are size 1, no further checking is needed.
const char GP2_format[GP2_format_N + 1][1];
const char *format = ...;
if (format >= GP2_format[0] && format <= GP2_format[GP2_format_N]) Inside()
else Outside();
C11 §6.5.8/5 Relational operators < > <= >= appears to define this as the dreaded Undefined Behavior when comparing a pointer from outside the array.
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the
same object, ... of the same array object, they compare equal. ...(same object OK) .... (same union OK) .... (same array OK) ... In all other cases, the behavior is undefined.
Q1 Is code's pointer compare in GP2_get_type() UB?
Q2 If so, what is a well defined alternate, search O(1), to the questionable GP2_get_type()?
Slower solutions
Code could sequentially test format against each GP2_format[] or convert the values to intptr_t, sort one time and do an O(log2(n)) search.
Similar
...if a pointer is part of a set, but this "set" is not random, it is an array.
intptr_t approach - maybe UB.
#include <stdio.h>
typedef enum {
GP2_set_precision,
GP2_set_w,
GP2_setios_flags_,
GP2_string_,
GP2_unknown_,
GP2_format_N
} GP2_type;
const char GP2_format[GP2_format_N + 1][1];
static int GP2_get_type(const char *format) {
// candidate UB with pointer compare
if (format >= GP2_format[0] && format <= GP2_format[GP2_format_N]) {
return (int) (format - GP2_format[0]);
}
return GP2_format_N;
}
int main(void) {
printf("%d\n", GP2_get_type(GP2_format[1]));
printf("%d\n", GP2_get_type("Hello World")); // potential UB
return 0;
}
Output (as expected, yet potentially UB)
1
5
If you want to comply with the C Standard then your options are:
Perform individual == or != tests against each pointer in the target range
You could use a hash table or search tree or something to speed this up, if it is a very large set
Redesign your code to not require this check.
A "probably works" method would be to cast all of the values to uintptr_t and then do relational comparison. If the system has a memory model with absolute ordering then it should define uintptr_t and preserve that ordering; and if it doesn't have such a model then the relational compare idea never would have worked anyway.
This is not an answer to the stated question, but an answer to the underlying problem.
Unless I am mistaken, the entire problem can be avoided by making GP2_format a string. This way the problem simplifies to checking whether a pointer points to within a known string, and that is not UB. (If it is, then using strchr() to find a character and compute its index in the string would be UB, which would be completely silly. That would be a serious bug in the standard, in my opinion. Then again, I'm not a language lawyer, just a programmer that tries to write robust, portable C. Fortunately, the standard states it's written to help people like me, and not compiler writers who want to avoid doing hard work by generating garbage whenever a technicality in the standard lets them.)
Here is a full example of the approach I had in mind. This also compiles with clang-3.5, since the newest GCC I have on the machine I'm currently using is version 4.8.4, which has no _Generic() support. If you use a different version of clang, or gcc, change the first line in the Makefile accordingly, or run e.g. make CC=gcc.
First, Makefile:
CC := clang-3.5
CFLAGS := -Wall -Wextra -std=c11 -O2
LD := $(CC)
LDFLAGS :=
PROGS := example
.PHONY: all clean
all: clean $(PROGS)
clean:
rm -f *.o $(PROGS)
%.o: %.c
$(CC) $(CFLAGS) -c $^
example: out.o main.o
$(LD) $^ $(LDFLAGS) -o $@
Next, out.h:
#ifndef OUT_H
#define OUT_H 1
#include <stdio.h>
typedef enum {
out_char,
out_int,
out_double,
out_FILE,
out_set_fixed,
out_set_width,
out_set_decimals,
out_count
} out_type;
extern const char out_formats[out_count + 1];
extern int outf(FILE *, ...);
#define out(x...) outf(stdout, x)
#define err(x...) outf(stderr, x)
#define OUT(x) _Generic( (x), \
FILE *: out_formats + out_FILE, \
double: out_formats + out_double, \
int: out_formats + out_int, \
char: out_formats + out_char ), (x)
#define OUT_END ((const char *)0)
#define OUT_EOL "\n", ((const char *)0)
#define OUT_fixed(x) (out_formats + out_set_fixed), ((int)(x))
#define OUT_width(x) (out_formats + out_set_width), ((int)(x))
#define OUT_decimals(x) (out_formats + out_set_decimals), ((int)(x))
#endif /* OUT_H */
Note that the above OUT() macro expands to two subexpressions separated by a comma. The first subexpression uses _Generic() to emit a pointer within out_formats based on the type of the macro argument. The second subexpression is the macro argument itself.
Having the first argument to the outf() function be a fixed one (the initial stream to use) simplifies the function implementation quite a bit.
Next, out.c:
#include <stdlib.h>
#include <stdarg.h>
#include <stdio.h>
#include <errno.h>
#include "out.h"
/* out_formats is a string consisting of ASCII NULs,
* i.e. an array of zero chars.
* We only check if a char pointer points to within out_formats,
* if it points to a zero char; otherwise, it's just a normal
* string we print as-is.
*/
const char out_formats[out_count + 1] = { 0 };
int outf(FILE *out, ...)
{
va_list args;
int fixed = 0;
int width = -1;
int decimals = -1;
if (!out)
return EINVAL;
va_start(args, out);
while (1) {
const char *const format = va_arg(args, const char *);
if (!format) {
va_end(args);
return 0;
}
if (*format) {
if (fputs(format, out) == EOF) {
va_end(args);
return EIO;
}
} else
if (format >= out_formats && format < out_formats + sizeof out_formats) {
switch ((out_type)(format - out_formats)) {
case out_char:
if (fprintf(out, "%c", va_arg(args, int)) < 0) {
va_end(args);
return EIO;
}
break;
case out_int:
if (fprintf(out, "%*d", width, (int)va_arg(args, int)) < 0) {
va_end(args);
return EIO;
}
break;
case out_double:
if (fprintf(out, fixed ? "%*.*f" : "%*.*e", width, decimals, va_arg(args, double)) < 0) {
va_end(args);
return EIO;
}
break;
case out_FILE:
out = va_arg(args, FILE *);
if (!out) {
va_end(args);
return EINVAL;
}
break;
case out_set_fixed:
fixed = !!va_arg(args, int);
break;
case out_set_width:
width = va_arg(args, int);
break;
case out_set_decimals:
decimals = va_arg(args, int);
break;
case out_count:
break;
}
}
}
}
Note that the above lacks even OUT("string literal") support; it's quite a minimal implementation.
Finally, the main.c to show an example of using the above:
#include <stdlib.h>
#include "out.h"
int main(void)
{
double q = 1.0e6 / 7.0;
int x;
out("Hello, world!\n", OUT_END);
out("One seventh of a million is ", OUT_decimals(3), OUT(q), " = ", OUT_fixed(1), OUT(q), ".", OUT_EOL);
for (x = 1; x <= 9; x++)
out(OUT(stderr), OUT(x), " ", OUT_width(2), OUT(x*x), OUT_EOL);
return EXIT_SUCCESS;
}
In a comment, chux pointed out that we can get rid of the pointer inequality comparisons, if we fill the out_formats array; then (assuming, just for paranoia's sake, we skip the zero index), we can use (*format > 0 && *format < out_type_max && format == out_formats + *format) for the check. This seems to work just fine.
I also applied Pascal Cuoq's answer on how to make string literals decay into char * for _Generic(), so this does support out(OUT("literal")). Here is the modified out.h:
#ifndef OUT_H
#define OUT_H 1
#include <stdio.h>
typedef enum {
out_string = 1,
out_int,
out_double,
out_set_FILE,
out_set_fixed,
out_set_width,
out_set_decimals,
out_type_max
} out_type;
extern const char out_formats[out_type_max + 1];
extern int outf(FILE *, ...);
#define out(...) outf(stdout, __VA_ARGS__)
#define err(...) outf(stderr, __VA_ARGS__)
#define OUT(x) _Generic( (0,x), \
FILE *: out_formats + out_set_FILE, \
double: out_formats + out_double, \
int: out_formats + out_int, \
char *: out_formats + out_string ), (x)
#define OUT_END ((const char *)0)
#define OUT_EOL "\n", ((const char *)0)
#define OUT_fixed(x) (out_formats + out_set_fixed), ((int)(x))
#define OUT_width(x) (out_formats + out_set_width), ((int)(x))
#define OUT_decimals(x) (out_formats + out_set_decimals), ((int)(x))
#endif /* OUT_H */
Here is the correspondingly modified implementation, out.c:
#include <stdlib.h>
#include <stdarg.h>
#include <stdio.h>
#include <errno.h>
#include "out.h"
const char out_formats[out_type_max + 1] = {
[ out_string ] = out_string,
[ out_int ] = out_int,
[ out_double ] = out_double,
[ out_set_FILE ] = out_set_FILE,
[ out_set_fixed ] = out_set_fixed,
[ out_set_width ] = out_set_width,
[ out_set_decimals ] = out_set_decimals,
};
int outf(FILE *stream, ...)
{
va_list args;
/* State (also, stream is included in state) */
int fixed = 0;
int width = -1;
int decimals = -1;
va_start(args, stream);
while (1) {
const char *const format = va_arg(args, const char *);
if (!format) {
va_end(args);
return 0;
}
if (*format > 0 && *format < out_type_max && format == out_formats + (size_t)(*format)) {
switch ((out_type)(*format)) {
case out_string:
{
const char *s = va_arg(args, char *);
if (s && *s) {
if (!stream) {
va_end(args);
return EINVAL;
}
if (fputs(s, stream) == EOF) {
va_end(args);
return EIO;
}
}
}
break;
case out_int:
if (!stream) {
va_end(args);
return EINVAL;
}
if (fprintf(stream, "%*d", width, (int)va_arg(args, int)) < 0) {
va_end(args);
return EIO;
}
break;
case out_double:
if (!stream) {
va_end(args);
return EINVAL;
}
if (fprintf(stream, fixed ? "%*.*f" : "%*.*e", width, decimals, va_arg(args, double)) < 0) {
va_end(args);
return EIO;
}
break;
case out_set_FILE:
stream = va_arg(args, FILE *);
if (!stream) {
va_end(args);
return EINVAL;
}
break;
case out_set_fixed:
fixed = !!va_arg(args, int);
break;
case out_set_width:
width = va_arg(args, int);
break;
case out_set_decimals:
decimals = va_arg(args, int);
break;
case out_type_max:
/* This is a bug. */
break;
}
} else
if (*format) {
if (!stream) {
va_end(args);
return EINVAL;
}
if (fputs(format, stream) == EOF) {
va_end(args);
return EIO;
}
}
}
}
If you find a bug or have a suggestion, please let me know in the comments. I don't actually need such code for anything, but I do find the approach very interesting.

Check for Integer Overflow with Boolean

This little project is based on this discussion about the best way to detect integer overflow before an operation is performed. I want a program that demonstrates the effectiveness of the check: unchecked (-u), it should produce an integer overflow for some inputs, whereas with the check flag (-c) it should quit before performing the operation. The -m flag selects multiplication.
The program runs fine without the boolean part, but I need some help with the boolean part that performs the highestOneBitPosition check. I get compilation errors after adding the true/false logic, and I am not sure whether I am calling and using the highestOneBitPosition function properly. Thanks!
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/*boolean */
#define true 1
#define false 0
typedef int bool;
void ShowUsage ()
{
printf (
"Integer Overflow Check before performing an arithmetic.\n"
"=======================================================\n"
"Usage:\n"
"Integer Operant (-a, -s, -m, -d) Checked/Unchecked (-u, -c)\n"
"Example: ./overflowcheck 2 -a 2 -u\n"
"\n"
);
}
size_t highestOneBitPosition(uint32_t a) {
size_t bits=0;
while (a!=0) {
++bits;
a>>=1;
};
return bits;
}
int main(int argc, char *argv[]) {
if (argc != 5) {ShowUsage (); return (0);}
else if (strcmp(argv[2],"-m") == 0 && strcmp(argv[4],"-u") == 0)
{printf("%s * %s = %d -- Not checked for integer overflow.\n",argv[1],argv[3], atoi(argv[1])*atoi(argv[3]));return 0;}
/*Works fine so far */
else if (strcmp(argv[2],"-m") == 0 && strcmp(argv[4],"-c") == 0)
{
bool multiplication_is_safe(uint32_t a, uint32_t b) {
a = atoi( argv[1] );
b = atoi( argv[3] );
size_t a_bits=highestOneBitPosition(a), b_bits=highestOneBitPosition(b);
return (a_bits+b_bits<=32);}
if (multiplication_is_safe==true)
{printf("%s * %s = %d -- Checked for integer overflow.\n",argv[1],argv[3], atoi(argv[1])*atoi(argv[3]));return 0;}
if (multiplication_is_safe==false)
{printf("Operation not safe, integer overflow likely.\n");}
}
ShowUsage ();
return (0);}
compilation:
gcc integer_overflow2.c -o integer_overflow
integer_overflow2.c:40:61: error: function definition is not allowed here
bool multiplication_is_safe(uint32_t a, uint32_t b) {
^
integer_overflow2.c:45:17: error: use of undeclared identifier
'multiplication_is_safe'
if (multiplication_is_safe==true)
^
integer_overflow2.c:47:17: error: use of undeclared identifier
'multiplication_is_safe'
if (multiplication_is_safe==false)
^
[too long for a comment]
Nested functions are not supported in standard C.
Properly indented C sources might look like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/*boolean */
#define true 1
#define false 0
typedef int bool;
void ShowUsage()
{
printf("Integer Overflow Check before performing an arithmetic.\n"
"=======================================================\n"
"Usage:\n"
"Integer Operant (-a, -s, -m, -d) Checked/Unchecked (-u, -c)\n"
"Example: ./overflowcheck 2 -a 2 -u\n"
"\n");
}
size_t highestOneBitPosition(uint32_t a)
{
size_t bits = 0;
while (a != 0)
{
++bits;
a >>= 1;
};
return bits;
}
bool multiplication_is_safe(uint32_t a, uint32_t b)
{
a = atoi(argv[1]);
b = atoi(argv[3]);
size_t a_bits = highestOneBitPosition(a), b_bits = highestOneBitPosition(b);
return (a_bits + b_bits <= 32);
}
int main(int argc, char *argv[])
{
if (argc != 5)
{
ShowUsage();
return (0);
}
else if (strcmp(argv[2], "-m") == 0 && strcmp(argv[4], "-u") == 0)
{
printf("%s * %s = %d -- Not checked for integer overflow.\n", argv[1],
argv[3], atoi(argv[1]) * atoi(argv[3]));
return 0;
}
/*Works fine so far */
else if (strcmp(argv[2], "-m") == 0 && strcmp(argv[4], "-c") == 0)
{
if (multiplication_is_safe == true)
{
printf("%s * %s = %d -- Checked for integer overflow.\n", argv[1],
argv[3], atoi(argv[1]) * atoi(argv[3]));
return 0;
}
if (multiplication_is_safe == false)
{
printf("Operation not safe, integer overflow likely.\n");
}
}
ShowUsage();
return (0);
}
However, there is still a bug, which you might like to find and fix yourself. Look closely at what the compiler warns you about. To enable more warnings with gcc, use -Wall -Wextra -pedantic.
Check the link below:
Nested function in C
Standard C doesn't support nested functions, which is why you are seeing compilation errors.
Please move your function outside main() and call it from main().

tinyc compiler - libtcc, how to bound check?

I'm using libtcc to compile C code on the fly. I'm going to use it on a cloud computer, accessed over the internet.
How do I use TinyCC's built-in memory and bounds checker?
Any help would be great. Thank you!
Here's an example that comes with the libtcc library:
/*
* Simple Test program for libtcc
*
* libtcc can be useful to use tcc as a "backend" for a code generator.
*/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "libtcc.h"
/* this function is called by the generated code */
int add(int a, int b)
{
return a + b;
}
char my_program[] =
"int fib(int n)\n"
"{\n"
" if (n <= 2)\n"
" return 1;\n"
" else\n"
" return fib(n-1) + fib(n-2);\n"
"}\n"
"\n"
"int foo(int n)\n"
"{\n"
" printf(\"Hello World!\\n\");\n"
" printf(\"fib(%d) = %d\\n\", n, fib(n));\n"
" printf(\"add(%d, %d) = %d\\n\", n, 2 * n, add(n, 2 * n));\n"
" return 0;\n"
"}\n";
int main(int argc, char **argv)
{
TCCState *s;
int (*func)(int);
void *mem;
int size;
s = tcc_new();
if (!s) {
fprintf(stderr, "Could not create tcc state\n");
exit(1);
}
/* if tcclib.h and libtcc1.a are not installed, where can we find them */
if (argc == 2 && !memcmp(argv[1], "lib_path=",9))
tcc_set_lib_path(s, argv[1]+9);
/* MUST BE CALLED before any compilation */
tcc_set_output_type(s, TCC_OUTPUT_MEMORY);
if (tcc_compile_string(s, my_program) == -1)
return 1;
/* as a test, we add a symbol that the compiled program can use.
You may also open a dll with tcc_add_dll() and use symbols from that */
tcc_add_symbol(s, "add", add);
/* get needed size of the code */
size = tcc_relocate(s, NULL);
if (size == -1)
return 1;
/* allocate memory and copy the code into it */
mem = malloc(size);
tcc_relocate(s, mem);
/* get entry symbol */
func = tcc_get_symbol(s, "foo");
if (!func)
return 1;
/* delete the state */
tcc_delete(s);
/* run the code */
func(32);
free(mem);
return 0;
}
You can enable bounds checking manually with:
s->do_bounds_check = 1; // s here is a TCCState *
Just make sure libtcc was compiled with CONFIG_TCC_BCHECK defined.
You may also want to enable debugging with:
s->do_debug = 1;
The command-line option -b does exactly the same to enable bounds checking (it enables debugging as well).

Resources