Want to reduce a function by looping through structs - c

Good Morning All,
I'm trying to reduce a function that's very repetitive, but each "repetition" has two structs with struct A.element1 setting struct B.element1. At the moment I have myFunction() with about twelve different reqFunction() calls to set B to A. Basically what I have now is:
void myFunction( structB *B )
{
structA A;
if( reqGetFunction( GLOBAL_IN_1, ( void *)&A, SIZE ) != 0 )
{
A.element3 = -1;
printf( "element3 failed\n" );
}
B->element7 = A.element3; // A is filled in when reqGetFunction() is called
.
.
.
if( reqGetFunction( GLOBAL_IN_12, ( void *)&A, SIZE ) != 0 )
{
A.element14 = -1;
printf( "element14 failed\n" );
}
B->element18 = A.element14;
}
reqGetFunction() can't be changed. I have a static global array for other functions that would loop through GLOBAL_IN, and I could make structA A a static global.
I want to have something like myFunctionSingle() that will do one block, and myFunctionAll() that will have a for loop to cycle through the GLOBAL_IN array as well as the elements of struct's A and B and input them to myFunctionSingle().
So I guess my real question is how could I cycle through the elements of the structs as I can with an array, because everything there (like the structs' setups and reqGetFunction) are set in stone. I've tried a few things and searched around, but am currently stumped. I'm honestly not sure if this is possible or even worth it. Thank you in advance for your input!

Your function calls differ in three ways: 1) the GLOBAL_IN_XX value, 2) the A.elementxx that you modify, and 3) the B.elementxx that you modify.
What you need to do is create a struct containing a value for GLOBAL_IN_XX and pointers to A.element and B.element, whatever type they are, for example:
struct call_parms
{
int global_parm;
int* a_ptr;
int* b_ptr;
};
Then, you need to create an array of those and initialize it accordingly, for example:
struct call_parms callParmsArray[MAX_CALLS]= {{GLOBAL_IN_1,&A.element3,&(B->element5)}, ... };
Then, just iterate over array and call your reqGetFunction with the parameters specified in each array element,something along the lines of:
for(int i = 0; i<MAX_CALLS;i++)
{
reqGetFunction( callParmsArray[i].global_parm, ( void *)&A, SIZE );
}
You may also want to factor the pointer to B->element into the struct and deal with it accordingly, as it is also repetitive. This will likely involve creating a wrapper around reqGetFunction() which also deals with B:
struct call_parms
{
int global_parm;
int* a_ptr;
int* b_ptr;
};
bool myReqFn(struct call_parms* parm)
{
bool res;
if( res = reqGetFunction( parm->global_parm, ( void *)&A, SIZE ) != 0 )
{
*(parm->a_ptr) = -1;
printf( "element %d failed\n",parm->global_parm );
}
*(parm->b_ptr) = *(parm->a_ptr);
return res;
}
for(int i = 0; i<MAX_CALLS;i++)
{
myReqFn( &callParmsArray[i]);
}
The rest is left as an exercise to the reader, as they say...
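To make the table-driven idea concrete, here is a minimal sketch. The struct layouts, the GLOBAL_IN values and the reqGetFunction() stub are all hypothetical stand-ins, since the real ones are not shown in the question. The table is built inside the function because its pointers refer to a local A and to the caller's B.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins: the real structA, structB and reqGetFunction()
   belong to the asker's code base and are not shown in the question. */
typedef struct { int element3; int element14; } structA;
typedef struct { int element7; int element18; } structB;
enum { GLOBAL_IN_1 = 1, GLOBAL_IN_12 = 12 };

/* Stub: fills A for request 1 and reports failure for anything else. */
static int reqGetFunction(int id, void *out, size_t size)
{
    structA *a = out;
    assert(size == sizeof *a);
    if (id != GLOBAL_IN_1)
        return -1;
    a->element3 = 42;
    return 0;
}

struct call_parms {
    int global_parm;
    int *a_ptr;   /* member of A that the request fills in */
    int *b_ptr;   /* member of B that receives the value   */
};

/* Runs every request in the table; returns the number of failures. */
static int myFunction(structB *B)
{
    structA A = {0};
    const struct call_parms table[] = {
        { GLOBAL_IN_1,  &A.element3,  &B->element7  },
        { GLOBAL_IN_12, &A.element14, &B->element18 },
    };
    int failures = 0;

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (reqGetFunction(table[i].global_parm, &A, sizeof A) != 0) {
            *table[i].a_ptr = -1;
            printf("request %d failed\n", table[i].global_parm);
            failures++;
        }
        *table[i].b_ptr = *table[i].a_ptr;
    }
    return failures;
}
```

Adding a thirteenth request then means adding one row to the table rather than another copy of the if block.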

One way to cycle through a struct that I know of is to use pointer arithmetic. I'm not sure what datatype your struct members have, but if you have a contiguous run of members of the same datatype, numbered from j to k, your code would look something like this (pointer arithmetic already advances in units of the pointed-to type, so no sizeof scaling is needed):
_datatype_ *a = &(A.elementj);
_datatype_ *b = &(B.elementj);
int i;
for (i = j; i < k; i++)
{
*(b + (i - j)) = *(a + (i - j));
}
EDIT: This is also, of course, assuming that you want to duplicate each pair of corresponding elements in order, but you can probably tweak it around to get the desired effect.
EDIT: This also assumes you allocate your entire struct (including variables) at the same time, so be careful.

Does GLOBAL_IN_XXX mean GLOBAL_IN[XXX] etc? And does GLOBAL_IN_XXX always map to A.element(XXX+2)? And is it always B.element(N+1) = A.elementN?
I'm also going to assume that you can't change A.element1, A.element2 into A.element[], otherwise the solution would be fairly simple, wouldn't it?
The most portable solution is to know the offset of each element in A and B (in case there are data alignment gotchas in the structures, which could occur if you don't have N consecutive ELEMENT_TYPE members, etc.):
#include <stddef.h>
// NOTE: These arrays are clumsy but avoid making assumptions about member alignment
// in structs.
static size_t const A_Offsets[] = {
offsetof(structA, element1),
offsetof(structA, element2),
offsetof(structA, element3),
...
...
offsetof(structA, elementN) };
static size_t const B_Offsets[] = {
offsetof(structB, element1),
offsetof(structB, element2),
offsetof(structB, element3),
...
...
offsetof(structB, elementN) };
void myFunctionSingle( structB *B, unsigned int index )
{
structA A;
ELEMENT_TYPE *elAPtr = (ELEMENT_TYPE *)((char *)&A + A_Offsets[index + 2]);
ELEMENT_TYPE *elBPtr = (ELEMENT_TYPE *)((char *)B + B_Offsets[index + 6]);
if( reqGetFunction( GLOBAL_IN[index], ( void *)&A, SIZE ) != 0 )
{
*elAPtr = -1;
printf( "element%u failed\n", index);
}
*elBPtr = *elAPtr; // A is filled in when reqGetFunction() is called
}
void myFunction( structB *B )
{
unsigned int i = 1;
for(; i < MAX_INDEX; ++i)
myFunctionSingle(B, i);
}
EDIT: I'm not sure if the offsetof() stuff is necessary, because if your structure has only ELEMENT_TYPE data in it, the members are probably packed tight, but I'm not sure. If they are packed tight, then you don't have any data alignment issues, so you could use the solution presented in Boston Walker's answer.
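As a compact illustration of the offsetof() approach, the sketch below pairs each source offset with its destination offset in one table. The struct layouts are hypothetical (deliberately not tightly packed) and the members are assumed to be int; substitute the real ELEMENT_TYPE and member names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts with unrelated members between the ints, to show
   that the technique does not depend on the members being adjacent. */
typedef struct { char tag; int element1; double pad; int element2; } structA;
typedef struct { int element1; int element2; } structB;

/* One row per copy: where to read in A and where to write in B. */
struct field_map { size_t a_off; size_t b_off; };

static const struct field_map map[] = {
    { offsetof(structA, element1), offsetof(structB, element1) },
    { offsetof(structA, element2), offsetof(structB, element2) },
};

/* Copies every mapped int member from *A into *B. */
static void copy_mapped(structB *B, const structA *A)
{
    assert(A != NULL && B != NULL);
    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++) {
        const int *src = (const int *)((const char *)A + map[i].a_off);
        int *dst = (int *)((char *)B + map[i].b_off);
        *dst = *src;
    }
}
```

Because each offset comes from offsetof(), padding inserted by the compiler is accounted for automatically.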

Related

Static to Dynamic nature in C

I have implemented naive Bayes using static memory allocation.
I want to convert it to dynamic allocation, but I cannot work out how.
#define COLS 4 //including class label
#define BINS 100
#define CLASS_COL 0
#define CLASS 2
The idea is to read the values above from a configuration file and set them at run time.
struct each_col //Probability for each feature based on classes
{
double col_PB[BINS][CLASS];
};
struct NB_Class_Map
{
char label[250];
unsigned int label_value;
double class_PB;
};
struct NB //Probability for entire feature
{
struct NB_Class_Map classes[CLASS];
struct each_col cols[COLS];
};
NB nb = {0}; //global value
The function to train NB:
long strhash(const char *str)
{
long hash = 5381;
int c;
printf("IN: %s ",str);
while (c = *str++)
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
printf("OUT: %ld ||",hash);
return hash;
}
int setup_train_NB(vector<vector<string> > &data)
{
//Finding the feature count
static int class_label = -1;
for(unsigned int i=0;i<data.size();i++)
{
unsigned int Class;
printf("\n===========New ROW==============\n");
int k;
for(k=0;k<CLASS;k++)
{
if(strcmp(data[i][CLASS_COL].c_str(), nb.classes[k].label) == 0)
{
printf("MATCHED\n");
Class = nb.classes[k].label_value;
break;
}
}
if(k==CLASS)
{
printf("NOT MATCHED\n");
class_label++;
nb.classes[class_label].label_value = class_label;
strcpy( nb.classes[class_label].label, data[i][CLASS_COL].c_str());
Class = nb.classes[class_label].label_value;
}
printf("Class: %d ||\n", Class);
for(unsigned j=0;j<data[0].size();j++)
{
printf("\n===========New COLUMN==============\n");
if(j == CLASS_COL)
{
nb.classes[Class].class_PB++;
continue;
}
unsigned int bin = strhash((data[i][j].c_str()))%BINS;
printf("Bin: %d ||", bin);
printf("Class: %d ||\n", Class);
nb.cols[j].col_PB[bin][Class]++; //[feature][BINS][CLASS]
}
}
//Finding the feature PB
for(unsigned int i=0;i<COLS;i++)
{
if(i==CLASS_COL)
continue;
for(unsigned j=0;j<BINS;j++)
{
for(unsigned k=0;k<CLASS;k++)
{
// nb.cols[i].col_PB[j][k] /= nb.classes[k].class_PB; //without laplacian smoothing
nb.cols[i].col_PB[j][k] = (nb.cols[i].col_PB[j][k] + 1) / (nb.classes[k].class_PB + COLS - 1); //with laplace smoothing
}
}
}
int k = 0;
int sum = 0;
while(k<CLASS)
{
sum += nb.classes[k].class_PB;
k++;
}
//Finding the class PB
k = 0;
while(k<CLASS)
{
nb.classes[k].class_PB /= sum;
k++;
}
return 0;
}
The program is supposed to be written in C, but for the moment I use vector to fetch the data from a CSV file. Please ignore that for now. The actual question is how I can remove those hardcoded #define values and still declare my structs.
Although it does not matter, the CSV file looks like this, and the number of columns and labels may change. The first line is ignored and not put into data.
Person,height,weight,foot
male,654,180,12
female,5,100,6
female,55,150,8
female,542,130,7
female,575,150,9
What I am actually doing is putting each value into a bin and then, for each of those values, finding the probability for the CLASS/label, i.e. male = 0, female = 1.
Basically:
Define variables instead of preprocessor macro constants: size_t cols; size_t bins; etc.
Replace your 1-dimensional fixed-size arrays with pointers (initialized to NULL!) and length variables. Alternatively, you could use a struct mytype_span { size_t length; mytype* data; }
Replace your 2-dimensional fixed-size arrays with "1-dimensional" pointers (also initialized to NULL of course) and pairs of dimension variables. Again, you could use a struct.
Replace your 2-d array accesses a[x][y] with a "linearized" access, i.e. a[x * row_length_of_a + y] (or you could do this in an inline function which takes the relevant arguments, or a struct mytype_span)
When you've read your configuration values from, um, wherever - set the relevant length variables (see above).
Use the malloc() library function to allocate the correct amount of space; remember to check the malloc() return value to make sure it's not null before using the pointer values!
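The steps above can be sketched as follows for the probability tables. The names nb_table, nb_table_init, nb_at and nb_table_free are made up for this illustration; the sketch uses calloc() rather than malloc() so the counters start at zero, and it assumes the product of the three dimensions does not overflow size_t.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical replacement for the fixed-size col_PB tables: one flat
   block indexed as [col][bin][cls], with dimensions chosen at run time. */
struct nb_table {
    size_t cols, bins, classes;
    double *pb;               /* cols * bins * classes doubles, or NULL */
};

/* Returns 0 on success, -1 if the allocation failed. */
static int nb_table_init(struct nb_table *t, size_t cols, size_t bins,
                         size_t classes)
{
    t->cols = cols;
    t->bins = bins;
    t->classes = classes;
    t->pb = calloc(cols * bins * classes, sizeof *t->pb);
    return t->pb != NULL ? 0 : -1;
}

/* Linearized equivalent of &cols[col].col_PB[bin][cls]. */
static double *nb_at(struct nb_table *t, size_t col, size_t bin, size_t cls)
{
    assert(col < t->cols && bin < t->bins && cls < t->classes);
    return &t->pb[(col * t->bins + bin) * t->classes + cls];
}

static void nb_table_free(struct nb_table *t)
{
    free(t->pb);
    t->pb = NULL;
}
```

With this in place, a line such as nb.cols[j].col_PB[bin][Class]++ would become (*nb_at(&table, j, bin, Class))++.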
Your use of struct is probably wrong, except for struct NB_Class_Map. You shouldn't use a struct just to put big arrays in the same variable. Instead, define a separate variable for each array, outside any struct, and replace each array with a pointer. Then you can allocate memory for the pointer, e.g.:
struct mydata {
type1 field1;
type2 field2;
etc...
} *myarray;
myarray = calloc(number_of_record_you_need, sizeof(struct mydata));
// here, error checking code, etc.
Now, having done that, if you really want, you may put your different pointers into a global structure, but each of your tables should be allocated separately.
Edit (about your variables) :
NB has no real interest as a structure. It's just 2 variables you glued together:
struct NB_Class_Map classes[CLASS];
struct each_col cols[COLS];
NB.Cols is not really a structure. It's just a three dimensional array
double cols[COLS][BINS][CLASS];
The only real structure is
struct NB_Class_Map
{
char label[250];
unsigned int label_value;
double class_PB;
};
So you just have to replace
struct NB_Class_Map classes[CLASS];
with
struct NB_Class_Map *classes;
and
double cols[COLS][BINS][CLASS];
with
double (*cols)[BINS][CLASS];
Or if you want a type name :
typedef double each_col[BINS][CLASS];
each_col *cols;
and allocate memory space for classes and cols with calloc.
Now, if you really want this struct NB :
typedef double each_col[BINS][CLASS];
struct NB
{
struct NB_Class_Map *classes;
each_col *cols;
};
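A minimal sketch of the allocation step for this version of struct NB follows. The helper name nb_alloc and its parameters n_classes and n_cols are hypothetical (for example, values read from the configuration file). Note that with this typedef only the number of classes and columns becomes a run-time value; BINS and CLASS inside each_col are still compile-time constants.

```c
#include <assert.h>
#include <stdlib.h>

#define BINS 100
#define CLASS 2

struct NB_Class_Map {
    char label[250];
    unsigned int label_value;
    double class_PB;
};

typedef double each_col[BINS][CLASS];

struct NB {
    struct NB_Class_Map *classes;
    each_col *cols;
};

/* Allocates zero-filled tables; returns 0 on success, -1 on failure. */
static int nb_alloc(struct NB *nb, size_t n_classes, size_t n_cols)
{
    assert(nb != NULL);
    nb->classes = calloc(n_classes, sizeof *nb->classes);
    nb->cols = calloc(n_cols, sizeof *nb->cols);
    if (nb->classes == NULL || nb->cols == NULL) {
        free(nb->classes);   /* free(NULL) is a no-op */
        free(nb->cols);
        nb->classes = NULL;
        nb->cols = NULL;
        return -1;
    }
    return 0;
}
```

After a successful call, nb.cols[j][bin][cls] indexes exactly like the original three-dimensional array.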

Initializing pointers in a struct in a minimum of source lines

I'm new to C programming and would appreciate any tips.
Is there a shorter way to initialize struct pointers in C without removing the pointer tags?
typedef struct {
int x, y, z;
} Point3;
typedef struct {
Point3 *pos, *direction;
} Vector;
int main() {
Vector *p;
p = malloc(sizeof(Vector));
p->pos = malloc(sizeof(Point3));
p->direction = malloc(sizeof(Point3));
return 0;
}
Yes, there is a shorter way — one which is one malloc() call shorter.
Vector *p = malloc(sizeof(Vector));
if (p != 0)
{
p->pos = malloc(2 * sizeof(Point3));
if (p->pos != 0)
p->direction = &p->pos[1];
}
Allocate an array of 2 Point3 values. p->pos points to the first, and p->direction points to the second (or vice versa).
It is still 3 statements (plus error checking) and two calls to malloc(), though.
In practice, you could almost certainly get away with:
Vector *p = malloc(sizeof(Vector) + 2 * sizeof(Point3));
if (p != 0)
{
p->pos = (void *)((char *)p + sizeof(Vector));
p->direction = (void *)((char *)p + sizeof(Vector) + sizeof(Point3));
}
I am not sure that is sanctioned by the C standard, but I can't immediately think of a plausible platform configuration where it would actually fail to work correctly. It would fail if you found some bizarre platform where addresses were 16-bits each but int was 8 bytes and had to be 8-byte aligned, but that's hardly plausible.
To me, it makes far more sense to put the Point3 members directly in the Vector, instead of pointers. Fewer allocations, less memory fragmentation, fewer de-references, fewer cache-misses.
typedef struct {
int x, y, z;
} Point3;
typedef struct {
Point3 pos, direction;
} Vector;
int main(void) {
/* Local (stack) allocation of a Vector, initialized to all zeros */
Vector v = {0}; /* empty braces are not valid C before C23 */
/* Dynamic (heap) allocation of a Vector, initialized to all zeros */
Vector *p;
p = malloc(sizeof(Vector));
if (!p) {
return 1; // failure
}
*p = (Vector){0};
return 0;
}
Unfortunately, there is no other way. You can simplify memory allocation with another function, like this
Vector* allocate_vector( ) {
Vector* v = (Vector*)malloc( sizeof(Vector) );
if( v == NULL ) {
/**/
}
v->pos = (Point3*)malloc( sizeof(Point3) );
if( v->pos == NULL ) {
/**/
}
v->direction = (Point3*)malloc( sizeof(Point3) );
if( v->direction == NULL ) {
/**/
}
return v;
}
And then use it, when you need new Vector.
Vector* v = allocate_vector( );
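The empty failure branches above need filling in before the function is safe to use. One possible way, sketched here with a hypothetical free_vector() counterpart and a small vector_set() helper that are not part of the original answer, is to release whatever was allocated and return NULL:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int x, y, z; } Point3;
typedef struct { Point3 *pos, *direction; } Vector;

/* Releases a Vector and both of its points; accepts NULL. */
static void free_vector(Vector *v)
{
    if (v == NULL)
        return;
    free(v->pos);
    free(v->direction);
    free(v);
}

/* Returns a zero-initialized Vector, or NULL if any allocation failed. */
static Vector *allocate_vector(void)
{
    Vector *v = calloc(1, sizeof *v);
    if (v == NULL)
        return NULL;
    v->pos = calloc(1, sizeof *v->pos);
    v->direction = calloc(1, sizeof *v->direction);
    if (v->pos == NULL || v->direction == NULL) {
        free_vector(v);   /* frees whichever half succeeded */
        return NULL;
    }
    return v;
}

/* Copies the given points into an allocated Vector. */
static void vector_set(Vector *v, Point3 pos, Point3 direction)
{
    assert(v != NULL && v->pos != NULL && v->direction != NULL);
    *v->pos = pos;
    *v->direction = direction;
}
```

Callers then only have to check a single return value and call free_vector() once when they are done.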

qsort structure array deletes everything

So I am having trouble using qsort to sort an array of structures.
I used this link as an example: http://support.microsoft.com/kb/73853
When I run the program it gives me blanks for the names that were originally in the structure and zeroes for all the values of gp.
typedef int (*compfn)(const void*, const void*);
struct record
{
char player[20];
int gp;
};
struct record entries[15];
int compare(struct record *, struct record *);
void show ()
{
int v;
qsort((void *)entries, 10, sizeof(struct record), (compfunc)compare);
struct record *p = entries;
for(v=0;v<counter;v++, p++)
{
printf("%s ..... %d \n", p->player , p->gp);
}
}
int compare(struct record * p1, struct record * p2)
{
if( p1->gp < p2->gp)
return -1;
else if (p1->gp > p2->gp)
return 1;
else
return 0;
}
Edit: Hey everyone, thanks so much for all your help, but I have tried everything you have suggested and it still just turns every value to zero.
Your call can be simplified, no need to cast to void *:
qsort(entries, 10, sizeof entries[0], compare);
Note use of sizeof entries[0] to avoid pointless repetition of the array type.
There should be no cast of the comparison function either, since it should simply be defined to match the prototype:
static int compare(const void *a, const void *b)
{
const struct record *ra = a, *rb = b;
if( ra->gp < rb->gp)
return -1;
if (ra->gp > rb->gp)
return 1;
return 0;
}
By the way, just to be informational, here's a classic (?) way to tersify the 3-way testing that you sometimes see in places like these:
return (ra->gp < rb->gp) ? -1 : (ra->gp > rb->gp);
I do not argue for this way of expressing it, especially not if you're a beginner, but thought I'd include it since it's relevant and might be instructional to have seen.
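Putting the pieces of this answer together, a minimal sketch might look like the following. The wrapper sort_records() is a hypothetical helper; it takes the number of entries actually filled in, on the assumption that sorting the unused, zero-filled slots of a larger array is one plausible way to end up with blank names and zero values at the front.

```c
#include <assert.h>
#include <stdlib.h>

struct record
{
    char player[20];
    int gp;
};

/* Matches the prototype qsort() expects, so no cast is needed. */
static int compare_gp(const void *a, const void *b)
{
    const struct record *ra = a, *rb = b;
    return (ra->gp > rb->gp) - (ra->gp < rb->gp);
}

/* Sorts the first count records by gp, ascending. */
static void sort_records(struct record *r, size_t count)
{
    assert(r != NULL);
    qsort(r, count, sizeof r[0], compare_gp);
}
```

In the question's code this would be called as sort_records(entries, counter), so that the loop that prints counter entries sees the same range that was sorted.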
Apart from the fact that the microsoft support pages are a real mess and not a good source for learning C, your code is missing an & here:
...
qsort((void *)entries, 10, sizeof(struct record), (compfunc)compare);
...
should be
...
qsort((void *)&entries, 10, sizeof(struct record), (compfunc)compare);
...
also, I think you meant to write
...
qsort((void *)&entries, 15, sizeof(struct record), (compfn)compare);
...

Pointer Conventions with: Array of pointers to certain elements

This question is about the best practices to handle this pointer problem I've dug myself into.
I have an array of structures that is dynamically generated in a function that reads a csv.
int init_from_csv(instance **instances,char *path) {
... open file, get line count
*instances = (instance*) malloc( (size_t) sizeof(instance) * line_count );
... parse and set values of all instances
return count_of_valid_instances_read;
}
// in main()
instance *instances;
int ins_len = init_from_csv(&instances, "some/path/file.csv");
Now, I have to perform functions on this raw data, split it, and perform the same functions again on the splits. This data set can be fairly large so I do not want to duplicate the instances, I just want an array of pointers to structs that are in the split.
instance **split = (instance**) malloc (sizeof(instance*) * split_len_max);
int split_function(instance *instances, int ins_len, instance **split){
int i, c;
c = 0;
for (i = 0; i < ins_len; i++) {
if (some_criteria_is_true) {
split[c++] = &instances[i];
}
}
return c;
}
Now my question what would be the best practice or most readable way to perform a function on both the array of structs and the array of pointers? For a simple example count_data().
int count_data (instance **ins, int ins_len, float crit) {
int i,c;
c = 0;
for (i = 0; i < ins_len; i++) {
if (ins[i]->data > crit) {
++c;
}
}
return c;
}
// code smell-o-vision going off by now
int c1 = count_data (split, ins_len, 0.05); // works
int c2 = count_data (&instances, ins_len, 0.05); // obviously seg faults
I could make my init_from_csv malloc an array of pointers to instances, and then malloc my array of instances. I want to learn how a seasoned C programmer would handle this sort of thing, though, before I start changing a bunch of code.
This might seem a bit grungey, but if you really want to pass that instances** pointer around and want it to work for both the main data set and the splits, you really need to make an array of pointers for the main data set too. Here's one way you could do it...
size_t i, mem_reqd;
instance **list_seg, *data_seg;
/* Allocate list and data segments in one large block */
mem_reqd = (sizeof(instance*) + sizeof(instance)) * line_count;
list_seg = (instance**) malloc( mem_reqd );
data_seg = (instance*) &list_seg[line_count];
/* Index into the data segment */
for( i = 0; i < line_count; i++ ) {
list_seg[i] = &data_seg[i];
}
*instances = list_seg;
Now you can always operate on an array of instance* pointers, whether it's your main list or a split. I know you didn't want to use extra memory, but if your instance struct is not trivially small, then allocating an extra pointer for each instance to prevent confusing code duplication is a good idea.
When you're done with your main instance list, you can do this:
void free_instances( instance** instances )
{
free( instances );
}
I would be tempted to implement this as a struct:
typedef struct instance_list {
instance ** data;
size_t length;
int owner;
} instance_list;
That way, you can return this from your functions in a nicer way:
instance_list* alloc_list( size_t length, int owner )
{
size_t i, mem_reqd;
instance_list *list;
instance *data_seg;
/* Allocate list and data segments in one large block */
mem_reqd = sizeof(instance_list) + sizeof(instance*) * length;
if( owner ) mem_reqd += sizeof(instance) * length;
list = (instance_list*) malloc( mem_reqd );
list->data = (instance**) &list[1];
list->length = length;
list->owner = owner;
/* Index the list */
if( owner ) {
data_seg = (instance*) &list->data[length];
for( i = 0; i < length; i++ ) {
list->data[i] = &data_seg[i];
}
}
return list;
}
void free_list( instance_list * list )
{
free(list);
}
void erase_list( instance_list * list )
{
if( list->owner ) return;
memset((void*)list->data, 0, sizeof(instance*) * list->length);
}
Now, your function that loads from CSV doesn't have to focus on the details of creating this monster, so it can simply do the task it's supposed to do. You can now return lists from other functions, whether they contain the data or simply point into other lists.
instance_list* load_from_csv( char *path )
{
/* get line count... */
instance_list *list = alloc_list( line_count, 1 );
/* parse csv ... */
return list;
}
etc... Well, you get the idea. No guarantees this code will compile or work, but it should be close. I think it's important, whenever you're doing something with arrays that's even slightly more complicated than just a simple array, it's useful to make that tiny extra effort to encapsulate it. This is the major data structure you'll be working with for your analysis or whatever, so it makes sense to give it a little bit of stature in that it has its own data type.
I dunno, was that overkill? =)
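For comparison, here is a lighter sketch of the same underlying idea: give the main data set its own array of pointers so that one count_data() works on both the full set and a split. It uses a separate allocation for the index rather than the single-block layout above, and the instance layout and the make_index() helper are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { float data; } instance;  /* hypothetical layout */

/* Builds an array of n pointers, one per element of items.
   Returns NULL if the allocation fails; the caller frees the result. */
static instance **make_index(instance *items, size_t n)
{
    instance **idx = malloc(n * sizeof *idx);
    if (idx == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        idx[i] = &items[i];
    return idx;
}

/* Counts the instances whose data exceeds crit. */
static size_t count_data(instance *const *ins, size_t n, float crit)
{
    size_t c = 0;
    assert(ins != NULL);
    for (size_t i = 0; i < n; i++) {
        if (ins[i]->data > crit)
            ++c;
    }
    return c;
}
```

The index costs one pointer per instance, which is the same trade-off described above.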

Stabilizing the standard library qsort?

I'm assuming that the good old qsort function in stdlib is not stable, because the man page doesn't say anything about it. This is the function I'm talking about:
#include <stdlib.h>
void qsort(void *base, size_t nmemb, size_t size,
int(*compar)(const void *, const void *));
I assume that if I change my comparison function to also include the address of that which I'm comparing, it will be stable. Is that correct?
Eg:
int compareFoos( const void* pA, const void *pB ) {
Foo *pFooA = (Foo*) pA;
Foo *pFooB = (Foo*) pB;
if( pFooA->id < pFooB->id ) {
return -1;
} else if( pFooA->id > pFooB->id ) {
return 1;
} else if( pA < pB ) {
return -1;
} else if( pA > pB ) {
return 1;
} else {
return 0;
}
}
No, you cannot rely on that unfortunately. Let's assume you have the array (two fields in each record used for checking but only first field used for sorting):
B,1
B,2
A,3
A non-stable sort may compare B,1 with A,3 and swap them, giving:
A,3
B,2
B,1
If the next step were to compare B,2 with B,1, the keys would be the same and, since B,2 has an address less than B,1, no swap will take place. For a stable sort, you should have ended up with:
A,3
B,1
B,2
The only way to do it would be to record the starting address of each element (not its current address) and sort using that as well as the other keys. That way, the original address becomes the minor part of the sort key, so that B,1 will eventually end up before B,2 regardless of where the two B lines go during the sorting process.
The canonical solution is to make (i.e. allocate memory for and fill) an array of pointers to the elements of the original array, and qsort this new array, using an extra level of indirection and falling back to comparing pointer values when the things they point to are equal. This approach has the potential side benefit that you don't modify the original array at all - but if you want the original array to be sorted in the end, you'll have to permute it to match the order in the array of pointers after qsort returns.
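A minimal sketch of this pointer-array approach follows. The Foo layout and the function names are hypothetical. The tie-break compares the pointed-to addresses, which never change because only the pointer array is rearranged, and which are comparable because they all point into the same original array.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int id; char tag; } Foo;  /* hypothetical record */

/* Compares two elements of the pointer array: by id first, then by the
   address of the pointed-to record, i.e. its original position. */
static int cmp_foo_ptr(const void *a, const void *b)
{
    const Foo *fa = *(const Foo *const *)a;
    const Foo *fb = *(const Foo *const *)b;
    if (fa->id != fb->id)
        return (fa->id > fb->id) - (fa->id < fb->id);
    return (fa > fb) - (fa < fb);
}

/* Returns a malloc'd array of n pointers into items, in stable sorted
   order, or NULL on allocation failure. items itself is not modified. */
static const Foo **stable_order(const Foo *items, size_t n)
{
    assert(items != NULL && n > 0);
    const Foo **order = malloc(n * sizeof *order);
    if (order == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        order[i] = &items[i];
    qsort(order, n, sizeof *order, cmp_foo_ptr);
    return order;
}
```

Since no two entries ever compare equal, the result is the same whichever algorithm the library's qsort uses.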
Comparing current addresses does not work because elements move during the sort, so the same two elements can compare differently at different times. What I do to make good old-fashioned qsort stable is to store the initial index inside my struct and initialize that value before passing the array to qsort.
typedef struct __bundle {
data_t some_data;
int sort_score;
size_t init_idx;
} bundle_t;
/*
.
.
.
.
*/
int bundle_cmp(const void *ptr1, const void *ptr2) {
const bundle_t *b1, *b2;
b1 = (const bundle_t *) ptr1;
b2 = (const bundle_t *) ptr2;
if (b1->sort_score < b2->sort_score) {
return -1;
}
if (b1->sort_score > b2->sort_score) {
return 1;
}
if (b1->init_idx < b2->init_idx) {
return -1;
}
if (b1->init_idx > b2->init_idx) {
return 1;
}
return 0;
}
void sort_bundle_arr(bundle_t *b, size_t sz) {
size_t i;
for (i = 0; i < sz; i++) {
b[i].init_idx = i;
}
qsort(b, sz, sizeof(bundle_t), bundle_cmp);
}
