Initializing a C array with repetitive data

I would like to initialize an array of structs with the same element repeated, i.e.
struct st ar[] = { {1,2}, {1,2}, {1,2} };
However, I do NOT want to run any code for that. I want the memory to be laid out this way when the program starts, without any CPU instructions involved (running code would increase boot time on a very slow CPU with relatively large arrays).
This makes sense when the array is used as a mini ad-hoc database (mapping an id to a struct) and one wants a default value for all database entries.
My best solution was to use something of the form
#define I {1,2}
struct st ar[SIZE_OF_ARRAY] = { I,I,I };
so that the compiler will warn me if I have too many or too few Is. But this is far from ideal.
I think there's no solution for that in ANSI-C, but I thought that maybe there's a macro-abuse, or gcc extension that would do the work. Ideally I would like a standard solution, but even compiler specific ones would suffice.
I thought that I could somehow define a macro recursively so that I(27) would expand to 27 {1,2}s, but I don't think that's possible. Maybe I'm mistaken; is there any hack for that?
Maybe inline assembly would do the trick? It would be very easy to define such a memory layout with MASM or TASM, but I'm not sure it is possible to embed memory layout directives within C code.
Is there any linker trick that would lure it to initialize memory according to my orders?
PS
I know I can automatically generate a C file with some script, but using custom scripts is not desirable. If I used a custom script, I'd invent a C macro REP(count,exp,sep) and write a mini C preprocessor to replace it with exp sep exp sep ... exp (exp appears count times).

The boost preprocessor library (which works fine for C) could help.
#include <boost/preprocessor/repetition/enum.hpp>
#define VALUE(z, n, text) {1,2}
struct st ar[] = {
BOOST_PP_ENUM(27, VALUE, _)
};
#undef VALUE
If you wish to use it, you'll just need the boost/preprocessor directory from boost - it's entirely self contained.
It does, however, have some arbitrary limits on the number of elements (I think it's 256 repetitions in this case). There is an alternative called Chaos which doesn't, but it's experimental and will only work with preprocessors that follow the standard precisely (GCC's does).
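If pulling in a preprocessor library is not an option, a handful of hand-written doubling macros can build the repetition as well. This is only a sketch: the REP* macro names, the field names of struct st, and the count of 27 are assumptions made for illustration, and the variadic macros require C99 or later.

```c
/* Assumed layout of the struct from the question. */
struct st { int a; int b; };

#define I {1, 2}

/* Variadic parameters are used because the initializer itself contains a
   comma. Each macro doubles the previous one, so a few lines cover large
   counts. */
#define REP1(...)  __VA_ARGS__
#define REP2(...)  REP1(__VA_ARGS__), REP1(__VA_ARGS__)
#define REP4(...)  REP2(__VA_ARGS__), REP2(__VA_ARGS__)
#define REP8(...)  REP4(__VA_ARGS__), REP4(__VA_ARGS__)
#define REP16(...) REP8(__VA_ARGS__), REP8(__VA_ARGS__)

/* 27 = 16 + 8 + 2 + 1, so combine the matching powers of two. */
struct st ar[] = { REP16(I), REP8(I), REP2(I), REP1(I) };

/* Element count, derived from the initializer at compile time. */
enum { AR_LEN = sizeof ar / sizeof ar[0] };
```

The count still has to be decomposed by hand, but no code runs at startup: the array is an ordinary statically initialized object.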

The easiest way I can think of is to write a script to generate the initialisers that you can include into your C code:
{1,2},
{1,2},
{1,2}
Then in your source:
struct st ar[] = {
#include "init.inc"
};
You can arrange your Makefile to generate this automatically if you like.

I'm guessing you can, on the file with the array definition, do:
#include "rep_array_autogen.c"
struct st ar[SIZE_OF_ARRAY] = REPARRAY_DATA;
and have your Makefile generate rep_array_autogen.c with a format like
#define SIZE_OF_ARRAY 3
#define REPARRAY_DATA {{1, 2}, {1, 2}, {1, 2}, }
Since rep_array_autogen.c is built on your machine, it would be just as fast as hand-coding it in there.

What you have in the form of Is is the best Standard C can give you. I don't understand how you can be averse to standard initialization code but okay with assembly. Embedding assembly is necessarily a non-standard non-portable mechanism.
The problem with recursive macros is that you would not know where to stop.
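Since the question also accepts compiler-specific solutions, one option worth noting is GCC's range-designator extension (Clang accepts it too), which initializes a run of elements to the same value at compile time. The sketch below assumes struct st has two int fields and picks SIZE_OF_ARRAY arbitrarily. It is not standard C, and other compilers may reject it.

```c
/* Assumed layout of the struct from the question. */
struct st { int a; int b; };

#define SIZE_OF_ARRAY 27  /* arbitrary size chosen for illustration */

/* GNU C extension: [first ... last] = value initializes the whole range.
   Keep the spaces around "..." so the dots are not lexed as part of a
   number. */
struct st ar[SIZE_OF_ARRAY] = { [0 ... SIZE_OF_ARRAY - 1] = {1, 2} };
```

Because this is a plain static initializer, the values are part of the program image and no initialization code runs at startup.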

Elazar,
Can you give data to support your view that pre-initialized data is faster than run-time initialization?
If you are running code on a standard machine with Hard Disk Drive (HDD) and your program resides on the HDD, then the time to copy the data from data section of your binary image to RAM is essentially the same amount of time as using some type of run-time initialization.
Kip Leitner

Related

#define in C, legal character

There is a C structure
struct a
{
int val1,val2;
};
I have made changes to the code like
struct b
{
int val2;
};
struct a
{
int val1;
struct b b_obj;
};
Now, usage of val2 in the other C files is like a_obj->val2;.
I want to replace its declaration usage and there are a lot of them, so I have defined a macro in the header file where the struct a is defined as follows:
#define a_obj->val2 (a_obj->b_obj.val2)
It's not working. Is -> illegal in the identifier part of a macro definition #define?
Could someone please tell me where I am going wrong?
Edit, as suggested by @Basile:
It's a legacy source code, a very huge project. Not sure of LOC.
I want to make such changes because I want to make it more modular.
For example I want to group similar fields of the structure under a same name and that's the reason I want to create another struct B with fields which are related to B feature and also common to A.
I can't use Find Replace feature of other text editors, I am using VIM.

This kind of macro magic will get you into trouble soon,
because it is making your source code unreadable and brittle (credits Basile for the phrasing).
But this should work for what you describe.
struct b
{
int val2m;
};
struct a
{
int val1;
struct b b_obj;
};
#define val2 b_obj.val2m
The trick is to give the actual identifier inside the struct declaration a new name (val2m), so that the name all the other code uses can be turned into a magic alias,
which then can contain the modified access to take a detour via the additionally introduced inner struct.
This is only a kind of band-aid for the problematic situation of having to change something behind the scenes in existing code with many references. Only use it if there is no chance of refactoring the code cleanly. (Credit to StoryTeller for the apt "band-aid" image.)
I explicitly recommend looking at Basile's answer for a cleaner, more "future-proof" way. It is the way to go to avoid the trouble I predict with this macro magic. Use it unless very good reasons force you to do otherwise.
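To see the alias at work, here is a minimal sketch. The function legacy_sum is a hypothetical stand-in for the existing code that still spells the access as a_obj->val2.

```c
struct b { int val2m; };   /* field renamed so val2 is free for the macro */
struct a { int val1; struct b b_obj; };

/* Every later use of the token val2 now expands to b_obj.val2m. */
#define val2 b_obj.val2m

/* Legacy-style code, unchanged: the preprocessor turns a_obj->val2
   into a_obj->b_obj.val2m before the compiler sees it. */
int legacy_sum(const struct a *a_obj)
{
    return a_obj->val1 + a_obj->val2;
}
```

Note that the macro rewrites every val2 token that follows it, including unrelated local variables or fields of other structs that happen to use the same name, which is exactly where the brittleness comes from.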

As others explained, the preprocessor works only on tokens, and you can only #define a name. Read the documentation of cpp and the C11 standard n1570.
What you want to do is very ugly (and there are few occasions where it is worthwhile). It makes your code messy, unreadable, and brittle.
Learn to use your source code editor better (you probably have some interactive replace, or interactive replace with regexps; if you don't, switch to a better editor like GNU emacs or vim, and study the documentation of your editor). You could also use scripting tools like ed, sed, grep, awk, etc. to help you do those replacements.
In a small project, replacing relevant occurrences of ->val2 (or .val2) with ->b_obj.val2 (or .b_obj.val2) is really easy, even if you have a hundred of them. And that keeps your code readable. Don't forget to use some version control system (to keep both old and new versions of your code).
In a large project of at least a million lines of source code, you might ask how to find every occurrence of field usage of val2 for a given type (but you should probably name val2 well enough that most occurrences of it are relevant; in other words, take care in naming your fields). That is a very different question (e.g. you could write some GCC plugin to find such occurrences and help you replace the relevant ones).
If you are refactoring an old and large legacy codebase, you need to be sure to keep it readable, and you don't want fancy macro tricks. For example, you might add some static inline function to access that field. It could then be worthwhile to use some better tools (e.g. a compiler plugin, some kind of C parser, etc.) to help you with that refactoring.
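As a rough illustration of the accessor idea, the sketch below wraps the field in two static inline functions. The names a_get_val2 and a_set_val2 are hypothetical, and static inline assumes C99 or later.

```c
struct b { int val2; };
struct a { int val1; struct b b_obj; };

/* Hypothetical accessors: callers no longer depend on where val2 lives,
   so moving the field again only means editing these two functions. */
static inline int a_get_val2(const struct a *a_obj)
{
    return a_obj->b_obj.val2;
}

static inline void a_set_val2(struct a *a_obj, int value)
{
    a_obj->b_obj.val2 = value;
}
```

Call sites change once, from a_obj->val2 to a_get_val2(a_obj), and stay readable afterwards.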
Keep the source code readable by human developers. Otherwise, you are shooting yourself in the foot. What you want to do is unreasonable, it decreases the readability of the code base.
"I can't use Find Replace feature of other text editors, I am using VIM."
vim is scriptable (e.g. in lua) and accepts plugins (so if interactive replace is not enough, consider writing some vim plugin or script to help you), and has powerful find-replace-regexp facilities. You might also use some combination of scripts to help you. In many cases they are enough. If they are not, you should explain why.
Also, you could temporarily replace the val2 field of struct a with a unique name like val2_3TYRxW1PuK7 (or whatever is appropriate, making some unique "random-looking" name is easy). Then you run your full build (e.g. after some make clean). The compiler would emit error messages for every place where you need to replace val2 used as a field of struct a (but won't mind for any other occurrence of the val2 name used for some other purpose). That could help you a lot -once you have corrected your code to get rid of all errors- (especially when combined with some editor scripting) because then you just need to replace val2_3TYRxW1PuK7 with b_obj.val2 everywhere.

Is -> illegal in #define?
Yes.
The macro name in a #define must be an identifier, so it can contain only letters, digits, and underscores (and cannot start with a digit).

Macro names must be regular identifiers, so you can't use special characters like - or >.
I thought that maybe you could use a union, like this (the anonymous union requires C11 or a compiler extension):
struct b
{
int val2;
};
struct a
{
int val1;
union {
struct b b_obj;
int val2;
};
};
so you can still use a_obj->val2.

For loop macro which unrolled on the pre-processor phase?

I want to use the gcc preprocessor to write almost the same declaration 500 times. Let's say, for demonstration purposes, that I would like to use a macro FOR_MACRO:
#define FOR_MACRO(x) \
#for i in {1 ... x}: \
const int arr_len_##x[i] = {i};
and calling FOR_MACRO(100) will be converted into:
const int arr_len_1[1] = {1};
const int arr_len_2[2] = {2};
...
const int arr_len_100[100] = {100};

This is not a good idea:
While possible in principle, using the preprocessor means you have to manually unroll the loop at least once, you end up with some arbitrary implementation-defined limit on loop depth and all statements will be generated in a single line.
Better use the scripting language of your choice to generate the code (possibly in a separate includeable file) and integrate that with your build process.
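For a sense of what the manual unrolling looks like, here is a minimal sketch limited to four iterations. The macro names DECL and FOR_1 to FOR_4 are made up for this example; reaching 500 would take 500 such FOR_n lines.

```c
/* One declaration per index, as requested in the question. */
#define DECL(i) const int arr_len_##i[i] = {i};

/* The "loop" has to be unrolled by hand: each step names the previous one. */
#define FOR_1(M) M(1)
#define FOR_2(M) FOR_1(M) M(2)
#define FOR_3(M) FOR_2(M) M(3)
#define FOR_4(M) FOR_3(M) M(4)

/* Expands to the declarations of arr_len_1 through arr_len_4,
   all emitted on a single source line. */
FOR_4(DECL)
```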

You can use Order-PP for this, if you desperately need to.
It's a scripting language implemented in the preprocessor. This means it's conceptually similar to using a scripting language to generate C code (in fact, the same) except there are no external tools and the script runs at the same time as the C compiler: everything is done with C macros. Despite being built on the preprocessor, there are no real limits to loop iterations, recursion depth, or anything like that (the limit is somewhere in the billions, you don't need to worry about it).
To emit the code requested in the question example, you could write:
#include <order/interpreter.h>
ORDER_PP( // runs Order code
8for_each_in_range(8fn(8I,
8print( 8cat(8(const int arr_len_), 8I)
([) 8I (] = {) 8I (};) )),
1, 101)
)
I can't fathom why you would do this instead of simply integrating an external language like Python into your build process (Order might be implemented using macros, but it's still a separate language to understand), but the option is there.
Order only works with GCC as far as I know; other preprocessors run out of stack too quickly (even Clang), or are not perfectly standard-compliant.

Instead of providing you with a solution for exactly your problem, are you sure it cannot be handled in a better way?
Maybe it would be better to either
use one array with one more dimension, or
fill in the data at runtime, since you apparently only want to set the first entry of each array. If you leave the array uninitialized, it will (provided it is defined at module level) be put into the .bss segment instead of .data and will probably need less space in the binary file.

You could use e.g. P99 to do such preprocessor code unrolling. But because of the limited capabilities of the preprocessor this comes with a limit, and that limit is normally way below 500.

Namespacing in C with structs

It is possible to imitate namespaces in C like this:
#include <stdio.h>
#include <math.h>
struct math_namespace {
double (*sin)(double);
};
const struct math_namespace math = {sin};
int main() {
printf("%f\n", math.sin(3));
return 0;
}
Are there any disadvantages to this, or just situations where a prefix makes more sense? It just seems cleaner to do it this way.

This method is already used in real projects such as the C Containers Library by Jacob Navia. C is not designed for object-oriented programming, and this approach is not really efficient, since you have to (1) access the structure and (2) dereference the function pointer. If you really want prefixes, I think changing your identifiers remains the best solution.

I have used this style for a while now. It helps organize the program without all of the excess baggage of an OOP language. There is no performance penalty because accessing a function pointer in C is the same as directly accessing the function. I like it enough that I even wrote a very short paper about it. It can be found on http://slkpg.1eko.com under the link "C with Structs" at the bottom of the page.
The direct link is http://slkpg.1eko.com/cstructs.html.

Why reinvent the wheel? One disadvantage is all the setting up which could go out of sync, and also to add to the namespace you have to change the structure.
And there's no 'using namespace', so you always have to specify it. And what about functions with different parameter types?

Well, this does allow you to export your namespace and it does allow a client module to use a static or local version of something that's named sin. So, in that sense, it does actually work.
The downside is that it's not terribly ELF-friendly. The struct initialization is buried in the middle of a writable data page, and it needs to be patched up. Unless you are statically linking, this is a load-time fix-up. On the bright side, it just duplicates what the ELF dispatch table would have done, so I bet it isn't even any slower. On Windows I think the considerations are similar.
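The point about a client keeping its own identically named function can be sketched as follows. Everything here (the shapes namespace, its members, and client_demo) is hypothetical and only meant to show that struct member names live in their own name space.

```c
/* Module side: the "namespace" is a struct of function pointers. */
static double shapes_square_area(double side) { return side * side; }
static double shapes_rect_area(double w, double h) { return w * h; }

struct shapes_namespace {
    double (*square_area)(double);
    double (*rect_area)(double, double);
};

const struct shapes_namespace shapes = { shapes_square_area, shapes_rect_area };

/* Client side: a local function reuses a member's name on purpose
   (here it computes a perimeter, to make the difference visible). */
static double square_area(double side) { return 4.0 * side; }

double client_demo(void)
{
    /* shapes.square_area and the local square_area do not collide. */
    return shapes.square_area(3.0) + square_area(3.0);
}
```

The initializer of shapes holds function addresses, which is where the load-time fix-up mentioned above comes from when the code is not statically linked.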

Prettiest way to declare a C array either fixed size or variable size?

I am writing a small C code for an algorithm. The main target are embedded microcontrollers, however, for testing purposes, a Matlab/Python interface is required.
I am following an embedded programming standard (MISRA-C 2004), which requires the use of C90, and discourage the use of malloc and friends. Therefore, all the arrays in the code have their memory allocated at compile time. If you change the size of the input arrays, you need to recompile the code, which is alright in the microcontroller scenario.
However, when prototyping with Matlab/Python, the size of the input arrays change rather often, and recompiling every time does not seem like an option. In this case, the use of C99 is acceptable, and the size of the arrays should be determined in runtime.
The question is: what options do I have in C to make these two scenarios coexist in the same code, while keeping the code clean?
I must emphasize that my main concern is how to make the code easy to maintain. I have considered using #ifdef to select either the statically allocated array or the dynamically allocated array. But there are too many arrays, and I think #ifdef makes the code look ugly.

I've thought of a way that you can get away with only one #ifdef. I would personally just bite the bullet and recompile my code when I need to. The idea of using a different dialect of C for production and test makes me a bit nervous.
Anyway, here's what you can do.
#ifdef EMBEDDED
#define ARRAY_SIZE(V,S) (S)
#else
#define ARRAY_SIZE(V,S) (V)
#endif
int myFunc(int n)
{
int myArray[ARRAY_SIZE(n, 6)];
/* work with myArray */
return 0;
}
The ARRAY_SIZE macro chooses the variable V, if not in the embedded environment; or the fixed size S, if in the embedded environment.

MISRA-C:2004 forbids C99 and thereby VLAs, so if you are writing strictly-conforming MISRA code you can't use them. It is also very likely that VLAs will be explicitly banned in the upcoming MISRA-C standard.
Is it not an option to use statically allocated arrays of unspecified size? That is:
uint8_t arr[] = { ... };
...
n = sizeof(arr)/sizeof(uint8_t);
This is most likely the "prettiest" way. Alternatively you can have a debug build in C99 with VLAs, and then change it to statically allocated arrays in the release build.
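A minimal sketch of that idiom follows. The array contents, the ARR_LEN macro, and the sum_bytes function are placeholders invented for the example. The size comes from the initializer list, so no VLA is involved.

```c
#include <stddef.h>
#include <stdint.h>

/* The size is taken from the initializer list; values are placeholders. */
static const uint8_t arr[] = { 3, 1, 4, 1, 5 };

/* Element count computed at compile time. */
#define ARR_LEN (sizeof(arr) / sizeof(arr[0]))

/* Hypothetical consumer that only ever sees a pointer and a length,
   so it works unchanged whatever the array size is. */
unsigned sum_bytes(const uint8_t *data, size_t n)
{
    unsigned total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

Passing (arr, ARR_LEN) to the algorithm keeps the code itself independent of how the array was allocated.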

Finding the elements in a structure in C language

Is it possible to determine the elements (name and data type) of a structure in a C library? If yes, how can it be done in C? If C does not support it, is it possible to get the structure elements by other tricks, or is there any tool for it?

Do you mean find out when you are programming, or dynamically at runtime?
For the former, sure. Just find the .h file which you are including and you will find the struct definition there including all the fields.
For the latter, no, it is not possible. C compiles structs to machine code in such a way that all of this information is lost. For example, if you have a struct {int x; float y; int z;}, and you have some code which says
a = mystruct.y;
in the machine code, all that will remain is something like finding the pointer to mystruct, adding 4 to it (the size of the int), and reading 4 bytes from there, then doing some floating point operations to it. Neither the names nor the types of those struct fields will be accessible at all, and therefore, there is no way to find them out at runtime.
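The "base pointer plus fixed offset" access can be spelled out in C itself. In this sketch, struct sample and read_y_by_offset are hypothetical names. The actual offset of y is whatever the compiler chose (4 on typical platforms, but that is not guaranteed).

```c
#include <stddef.h>
#include <string.h>

struct sample { int x; float y; int z; };

/* Reads field y roughly the way compiled code does: base address plus a
   fixed byte offset. The name "y" appears only inside offsetof, which the
   compiler folds into a plain constant. */
float read_y_by_offset(const struct sample *s)
{
    float value;
    const unsigned char *base = (const unsigned char *)s;
    memcpy(&value, base + offsetof(struct sample, y), sizeof value);
    return value;
}
```

Nothing in the resulting machine code records that the bytes at that offset were called y or had type float.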

No, it isn't possible. C has no inbuilt reflection-style support.

If by "determine the elements of a structure" you mean "get the declaration of that structure type programmatically", then I do not believe that it is possible, at least not portably. Contrary to more modern languages like C++ or Java, C does not keep type information in a form available to the actual program.
EDIT:
To clarify my comment about it being impossible "portably":
There could very well be some compiler+debugging format combination that would embed the necessary information in the object files that it produces, although I can't say I know of one. You could then, hypothetically, have the program open its own executable file and parse the debugging information. But this is a cumbersome and fragile approach, at best...
Why do you need to do something like that?
