Stabilizing the standard library qsort? - c

I'm assuming that the good old qsort function in stdlib is not stable, because the man page doesn't say anything about it. This is the function I'm talking about:
#include <stdlib.h>
void qsort(void *base, size_t nmemb, size_t size,
int(*compar)(const void *, const void *));
I assume that if I change my comparison function to also include the address of that which I'm comparing, it will be stable. Is that correct?
Eg:
int compareFoos( const void* pA, const void *pB ) {
Foo *pFooA = (Foo*) pA;
Foo *pFooB = (Foo*) pB;
if( pFooA->id < pFooB->id ) {
return -1;
} else if( pFooA->id > pFooB->id ) {
return 1;
} else if( pA < pB ) {
return -1;
} else if( pA > pB ) {
return 1;
} else {
return 0;
}
}

No, you cannot rely on that unfortunately. Let's assume you have the array (two fields in each record used for checking but only first field used for sorting):
B,1
B,2
A,3
A non-stable sort may compare B,1 with A,3 and swap them, giving:
A,3
B,2
B,1
If the next step were to compare B,2 with B,1, the keys would be the same and, since B,2 has an address less than B,1, no swap will take place. For a stable sort, you should have ended up with:
A,3
B,1
B,2
The only way to do it would be to attach each element's starting address (not its current address) to the record and sort using that as well as the other keys. That way, the original address becomes the minor part of the sort key, so that B,1 will eventually end up before B,2 regardless of where the two B lines go during the sorting process.

The canonical solution is to make (i.e. allocate memory for and fill) an array of pointers to the elements of the original array, and qsort this new array, using an extra level of indirection and falling back to comparing pointer values when the things they point to are equal. This approach has the potential side benefit that you don't modify the original array at all - but if you want the original array to be sorted in the end, you'll have to permute it to match the order in the array of pointers after qsort returns.
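The pointer-array approach described above might look like the following minimal sketch. The Foo layout and the names compare_foo_ptrs and stable_sort_foos are assumptions made for illustration, not part of the question's code. Ties are broken by the address in the original array, which stays fixed because only the pointer array is reordered during the sort.

```c
#include <stdlib.h>

/* Hypothetical record type for illustration: sorted by id only. */
typedef struct {
    int id;
    char tag;
} Foo;

/* Each qsort element is a Foo*, so the comparator receives Foo**.
   Order by id first, then by the address in the ORIGINAL array. */
static int compare_foo_ptrs(const void *pa, const void *pb)
{
    const Foo *a = *(const Foo *const *)pa;
    const Foo *b = *(const Foo *const *)pb;

    if (a->id != b->id)
        return (a->id > b->id) - (a->id < b->id);
    return (a > b) - (a < b);   /* both point into the same array */
}

/* Stable sort of arr[0..n-1]. Returns 0 on success, -1 if allocation fails. */
int stable_sort_foos(Foo *arr, size_t n)
{
    if (n < 2)
        return 0;

    Foo **ptrs = malloc(n * sizeof *ptrs);
    Foo *sorted = malloc(n * sizeof *sorted);
    if (ptrs == NULL || sorted == NULL) {
        free(ptrs);
        free(sorted);
        return -1;
    }

    for (size_t i = 0; i < n; i++)
        ptrs[i] = &arr[i];

    qsort(ptrs, n, sizeof *ptrs, compare_foo_ptrs);

    /* Permute through a scratch copy, then copy back into arr. */
    for (size_t i = 0; i < n; i++)
        sorted[i] = *ptrs[i];
    for (size_t i = 0; i < n; i++)
        arr[i] = sorted[i];

    free(ptrs);
    free(sorted);
    return 0;
}
```

The scratch copy keeps the permutation step simple at the cost of extra memory; an in-place cycle-following permutation would avoid it but is harder to get right.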

Comparing addresses does not work because elements move during the sort, so two elements with equal keys will not compare consistently from one call to the next. What I do to make good old-fashioned qsort stable is to add the initial index inside my struct and initialize that value before passing it to qsort.
typedef struct bundle {
data_t some_data;
int sort_score;
size_t init_idx;
} bundle_t;
/*
.
.
.
.
*/
int bundle_cmp(const void *ptr1, const void *ptr2) {
const bundle_t *b1, *b2;
b1 = (const bundle_t *) ptr1;
b2 = (const bundle_t *) ptr2;
if (b1->sort_score < b2->sort_score) {
return -1;
}
if (b1->sort_score > b2->sort_score) {
return 1;
}
if (b1->init_idx < b2->init_idx) {
return -1;
}
if (b1->init_idx > b2->init_idx) {
return 1;
}
return 0;
}
void sort_bundle_arr(bundle_t *b, size_t sz) {
size_t i;
for (i = 0; i < sz; i++) {
b[i].init_idx = i;
}
qsort(b, sz, sizeof(bundle_t), bundle_cmp);
}

Related

Is there a C function that returns the second, third, etc. instance an int value occurs?

I have implemented code that imports data from a file containing 5 different values, one of them being Time. I have converted the time given in the format Hour.Minute.Second.Millisecond into just Milliseconds.
With this data I created a function Find that finds the data for a given time. This is where the problem arises: since there are multiple days of data here, the same time will appear multiple times. Is there a function in the C library that returns all instances of a value? For example, given arr = [2,3,4,1,2], I want it to tell me where the second 2 appears, returning index 4.
Edit: For better clarity
These are the functions
void Find(SortedLinkedList *list,int target,int date, char *search) {
if(strcmp(search, "Time") == 0){
Sate *found = findTime(list, target,date);
printf("The Node with time:%d\n Is from the date:%d\n Contains the following:",found->Time,found->Date);
printf("RMag:%6.3f ", found->rmag);
printf("NSmag:%6.3f ", found->NSmag);
printf("azmag:%6.3f ", found->azmag);
printf("avgmag:%6.3f \n", found->avgmag);
}
}
Sate *findTime(SortedLinkedList *list, int target,int date){
    Node *current = list->head;
    for (int i = 0; i < (list->size)+1 && current != NULL; i++) {
        if(current->data->Time == target && current->data->Date == date)
            return current->data;
        else{
            current = current->next;
        }
    }
    return NULL; // no matching node; the caller must check for this
}
Right now for it to work I implemented a date insert to differentiate between the times but I'm wondering if it can be done without it.
The Standard C library has no general function for iterating over a collection. The closest thing is strtok(), which iterates over a text string, splitting it at the delimiter characters you provide.
There is also bsearch(), but it searches a sorted array for a single matching item, which is not really what you want either.
It sounds like you want something like the following. It demonstrates one way to implement the algorithm; I am not sure what your time point data looks like, so that is something you will need to provide.
typedef unsigned long long TimePoint; // made up time data item
typedef struct {
int bFound;
unsigned long ulCurOffset; // array position where item found if bFound is true.
unsigned long ulOffset; // next array position to test
unsigned long ulCount; // count of times found
} IteratorThing;
IteratorThing IterateFunc (IteratorThing x, TimePoint *array, size_t len, TimePoint search)
{
x.bFound = 0; // assume we didn't find one.
// resuming from the current place in the array, search until we
// find a match or we reach the end of the array.
for ( ; x.ulOffset < len; x.ulOffset++) {
// this is a simple comparison for equality which may need to be
// more complex for your specific application.
if (array[x.ulOffset] == search) {
// we have found a match so lets update counts, etc.
x.ulCount++; // count of this search item found.
x.bFound = 1; // indicate we found one.
x.ulCurOffset = x.ulOffset; // remember where we found it.
x.ulOffset++; // point to the next array item to look at
break;
}
}
return x;
}
This would be used as in:
void main_xfun(void)
{
TimePoint array[] = { 1, 2, 3, 2, 3, 4, 0 };
TimePoint search = 2;
size_t len = sizeof(array) / sizeof(array[0]);
{
IteratorThing x = { 0 }; // define and initialize our iterator
while ((x = IterateFunc(x, array, len, search)).bFound) {
// do what is needed when we find a time value
// array offset to the item is x.ulCurOffset
// current count of times found is in x.ulCount;
printf(" found item %ld at offset %lu count is %lu\n", (long)array[x.ulCurOffset], x.ulCurOffset, x.ulCount);
}
printf(" item %ld found %lu time\n", (long)search, x.ulCount);
}
{
IteratorThing x = { 0 }; // define and initialize our iterator
search = 25;
while ((x = IterateFunc(x, array, len, search)).bFound) {
// do what is needed when we find a time value
// array offset to the item is x.ulCurOffset
// current count of times found is in x.ulCount;
printf(" found item %ld at offset %lu count is %lu\n", (long)array[x.ulCurOffset], x.ulCurOffset, x.ulCount);
}
printf(" item %ld found %lu time\n", (long)search, x.ulCount);
}
}
produces output of
found item 2 at offset 1 count is 1
found item 2 at offset 3 count is 2
item 2 found 2 time
item 25 found 0 time
To restart the search from the beginning just initialize the iterator struct to all zeros again.
What would be really interesting is to provide a pointer to a comparison function in the interface of the function IterateFunc() which would be called to do the comparisons. This would be along the lines of the bsearch() function which requires a pointer to a comparison function but then that is probably overkill for your specific needs.
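As a rough sketch of that idea, the iterator could take the element size and a comparison callback in the style of bsearch(). The names MatchIter, next_match, and compare_ints are made up for this illustration, and the callback is assumed to return 0 on a match.

```c
#include <stddef.h>

/* Hypothetical iterator state; zero-initialize it to start a new search. */
typedef struct {
    int found;      /* nonzero if the last call found a match */
    size_t cur;     /* index of that match, valid when found is nonzero */
    size_t next;    /* next index to examine */
    size_t count;   /* number of matches found so far */
} MatchIter;

/* Resume a linear search of base[0..nmemb-1] (elements of `size` bytes)
   for an element that compar() reports as equal to *key. */
MatchIter next_match(MatchIter it, const void *key, const void *base,
                     size_t nmemb, size_t size,
                     int (*compar)(const void *, const void *))
{
    const unsigned char *bytes = base;

    it.found = 0;
    for (; it.next < nmemb; it.next++) {
        if (compar(key, bytes + it.next * size) == 0) {
            it.found = 1;
            it.cur = it.next;
            it.count++;
            it.next++;      /* the next call resumes after this match */
            break;
        }
    }
    return it;
}

/* Example callback for int elements. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}
```

With arr = {2,3,4,1,2} and key 2, the first call reports index 0 and the second reports index 4; a third call reports no match.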
If you want this hypothetical function to work for either an array of integers or for your time indexed structures, you will probably need to write a generic function.
If POSIX functions are available to you, you can use lfind() as a starting point for such a generic function.
The lsearch() function shall linearly search the table and return a pointer into the table for the matching entry. If the entry does not occur, it shall be added at the end of the table. ...
The lfind() function shall be equivalent to lsearch(), except that if the entry is not found, it is not added to the table. Instead, a null pointer is returned.
Since lfind() returns the first instance, you need to invoke lfind() again, starting just past that instance, to find the second one.
void * lfind_Nth (const void *key, const void *base, size_t *nelp,
size_t width, int (*compar)(const void *, const void *),
int N)
{
const char (*array)[width] = base;
char (*p)[width] = NULL;
size_t n = *nelp;
while (N-- > 0) {
p = n ? lfind(key, array, &n, width, compar) : NULL;
if (p == NULL) break;
n -= (p + 1) - array;
array = p + 1;
}
return p;
}
For your integer array example:
int compar_int (const void *a, const void *b) {
return *(const int *)a != *(const int *)b;
}
int where_Nth_int(int key, int *arr, size_t nelm, int N) {
int *w = lfind_Nth(&key, arr, &nelm, sizeof(*arr),
compar_int, N);
return w ? w - arr : -1;
}
int main (void) {
int arr[] = {2,3,4,1,2,};
int nelm = sizeof(arr)/sizeof(*arr);
printf("Second 2 # %d\n", where_Nth_int(2, arr, nelm, 2));
}

Problems with realloc, makes program crash

Hello, I'm implementing a smart vector in C, and I'm having problems with the reallocation of the buffer.
This is the struct that contains the array and its info:
struct _vector
{
item* vec;
size_t elements;
size_t size;
};
item is just a typedef that in this case happens to be int.
I made several functions to manage the array, but the one that should resize it gives me problems.
(Vector is also a typedef for struct _vector* by the way)
This is the function:
void insertVector(const Vector vec,const int pos,const item a)
{
if(vec->elements==vec->size)
{
item* temp=realloc(vec->vec,(vec->size*2)*sizeof(item));
if(temp==NULL)
{
puts("Error: space unavailable");
return;
}
//vec->vec=realloc(vec->vec,(vec->size*2)*sizeof(item));
vec->vec=temp;
vec->size*=2;
}
int size=vec->elements;
if(pos>=0&&pos<=size)
{
for(int i=size;i>pos;i--)
{
vec->vec[i]=vec->vec[i-1];
}
vec->vec[pos]=a;
vec->elements+=1;
printf("size is %zu\nelements are %zu\n",vec->size,vec->elements);
}
}
I just shift the contents to make space for the new element, and it works fine; the problem is when the array is reallocated.
When the number of valid elements is equal to the actual size of the array,
I do a realloc to double the actual size.
As soon as that if triggers, though, the realloc makes the program crash with this error: incorrect checksum for freed object.
The problem is in the if, because it only crashes when the size and elements are equal; if I comment out that section, everything works.
I don't know what it could be.
EDIT:
The functions that I used to create and initialise the instance I'm working with are:
Vector newVector(void)
{
Vector new=malloc(sizeof(*new));
new->vec=NULL;
new->elements=0;
new->size=0;
return new;
}
and
void initVector(const Vector vec,const size_t size)
{
vec->vec=calloc(size,sizeof(item));
vec->elements=size;
vec->size=size*2;
}
Based on your comment:
I created a new vector setting to zero every field, then I used this function:
void initVector(const Vector vec,const size_t size)
{
vec->vec=calloc(size,sizeof(item));
vec->elements=size;
vec->size=size*2;
}
I think you are treating the size and the number of elements incorrectly. The
initVector function just allocates memory for the vec->vec array, so
vec->elements should be 0, not size. And vec->size should be size, not
size*2. So the correct function should be
// remove the const, you are modifying the data vec is pointing to
int initVector(Vector vec, size_t size)
{
if(vec == NULL)
return 0;
vec->vec = calloc(size, sizeof *vec->vec);
if(vec->vec == NULL)
return 0;
vec->elements = 0;
vec->size = size;
return 1;
}
Now insertVector will only allocate new space when all allocated slots are in use.
I also suggest you use memmove to shift the elements:
// again, remove the const here
int insertVector(Vector vec, const size_t pos, const item a)
{
if(vec == NULL)
return 0;
if(vec->elements==vec->size)
{
item* temp=realloc(vec->vec,(vec->size*2)*sizeof *temp);
if(temp==NULL)
{
fprintf(stderr, "Error: space unavailable\n");
return 0;
}
vec->vec=temp;
vec->size*=2;
}
// I use vec->elements as upper limit,
// otherwise you could have "holes" in the array and
// you wouldn't realize it.
if(pos > vec->elements) // pos is a size_t, so it can never be negative
{
fprintf(stderr, "invalid position\n");
return 0;
}
memmove(vec->vec + pos + 1, vec->vec + pos, (vec->elements - pos) * sizeof *vec->vec);
vec->vec[pos] = a;
vec->elements += 1;
printf("size is %zu\nelements are %zu\n",vec->size,vec->elements);
return 1;
}
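One edge case is worth noting with the doubling strategy: if the capacity is 0 (as it is right after newVector), doubling it still gives 0. A minimal sketch of a growth helper that handles this is shown below. The names VecData and vec_reserve_one and the starting capacity of 4 are assumptions for illustration, not part of the question's code.

```c
#include <stdint.h>
#include <stdlib.h>

typedef int item;           /* as in the question */

/* Hypothetical stand-in for the question's struct, same three fields. */
typedef struct {
    item *vec;
    size_t elements;        /* slots in use */
    size_t size;            /* slots allocated */
} VecData;

/* Make sure there is room for at least one more element.
   Returns 1 on success, 0 on failure (the vector is left unchanged). */
int vec_reserve_one(VecData *v)
{
    if (v->elements < v->size)
        return 1;           /* still room, nothing to do */

    /* 4 is an arbitrary starting capacity for an empty vector. */
    size_t new_size = v->size ? v->size * 2 : 4;

    /* Reject sizes whose computation wrapped around or would overflow. */
    if (new_size < v->size || new_size > SIZE_MAX / sizeof *v->vec)
        return 0;

    item *tmp = realloc(v->vec, new_size * sizeof *tmp);
    if (tmp == NULL)
        return 0;           /* v->vec is still valid here */

    v->vec = tmp;
    v->size = new_size;
    return 1;
}
```

An insert function could call this helper first and only then shift elements and store the new value.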
In your initVector function, you set the size incorrectly, to two times what you actually allocated with calloc. As you add new elements, you therefore write past the end of the allocated block, and that corrupted memory is the reason the free fails when you finally invoke realloc. Change initVector to:
void initVector(const Vector vec,const size_t size)
{
vec->vec=calloc(size,sizeof(item));
vec->elements=size;
vec->size=size;
}

qsort endless loop causes error C

So I'm stuck with this sort function, because everything seems to work fine when I debug it and there are no errors or warnings whatsoever, but it somehow gets stuck in an infinite loop.
My struct (if it helps):
typedef struct raeume{
char number[5];
char klasse[6];
int tische;
}raeume;
my start of the qsort function:
void ausgabesortiert(struct raeume *arr[],int used,int size)
{
qsort(*arr,size,sizeof(raeume),cmp);
ausgabesortiert(arr,size,used);
}
my compare function:
int cmp(const void * a, const void * b)
{
raeume *raumA = (raeume *) a;
raeume *raumB = (raeume *) b;
int tempA = raumA->klasse[0] - '0';
int tempB = raumB->klasse[0] - '0';
if(tempA < tempB)
{
return -1;
}
else if(tempA > tempB)
{
return 1;
}
else if(tempA == tempB)
{
if(raumA->tische > raumB->tische)
{
return -1;
}
else if(raumA->tische < raumB->tische)
{
return 1;
}
else if(raumA->tische == raumB->tische)
{
return 0;
}
}
return 0;
}
The declaration of your ausgabesortiert function
void ausgabesortiert(struct raeume *arr[],int used,int size)
clearly suggests that array arr contains pointers to struct raeume objects, not the objects themselves.
But the call to qsort
qsort(*arr,size,sizeof(raeume),cmp);
and the comparison function are written as if you are trying to sort an array of struct raeume objects themselves, beginning at the location arr[0] points to.
While there's nothing formally invalid in this, it still looks rather strange. Is this really your intent? What exactly are you trying to sort? The arr array, or some other array pointed to by arr[0]? I suspect that it is the former, in which case you need to fix the qsort call and the comparison function.
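If the intent really is to sort the array of pointers, a sketch of the fixed call and comparison function could look like this. The names cmp_raeume_ptrs and sort_raeume_ptrs are made up here, and the sketch assumes that used is the number of valid entries in arr. Note also that ausgabesortiert as posted calls itself unconditionally after qsort, which on its own would recurse without end; the sketch leaves that call out.

```c
#include <stdlib.h>

typedef struct raeume {
    char number[5];
    char klasse[6];
    int tische;
} raeume;

/* Each qsort element is a raeume*, so the comparator receives raeume**. */
int cmp_raeume_ptrs(const void *a, const void *b)
{
    const raeume *raumA = *(const raeume *const *)a;
    const raeume *raumB = *(const raeume *const *)b;

    int klasseA = raumA->klasse[0] - '0';
    int klasseB = raumB->klasse[0] - '0';

    /* Ascending by the first digit of klasse. */
    if (klasseA != klasseB)
        return (klasseA > klasseB) - (klasseA < klasseB);

    /* Same class: more tables first, as in the original cmp. */
    return (raumA->tische < raumB->tische) - (raumA->tische > raumB->tische);
}

/* Sort the first `used` pointers of arr; the structs themselves do not move. */
void sort_raeume_ptrs(raeume *arr[], size_t used)
{
    qsort(arr, used, sizeof arr[0], cmp_raeume_ptrs);
}
```

The element size passed to qsort is the size of a pointer, not sizeof(raeume), because the things being reordered are the pointers.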

How to design a function which returns an array of oids

As already written in issue #2217, I want to design a function which returns a list of oids in the first out param.
Should I:
Return the list of oids as a pointer to pointer?
int git_commit_tree_last_commit_id(git_oid **out, git_repository *repo, const git_commit *commit, char *path)
Or return the list of oids as a pointer to a custom struct?
int git_commit_tree_last_commit_id(git_oid_xx_struct *out, git_repository *repo, const git_commit *commit, char *path)
What is your advice?
The question is, how do you know how many OIDs are in the returned array, and who allocates the underlying memory.
For the first part there are several possibilities:
Return the number in a separate return parameter.
Use a sentinel value to terminate the list.
Return a new struct type, like git_strarray, that contains the count and the raw data.
For the second part, either:
The caller can allocate the underlying memory.
The function can allocate the memory.
The new struct type can manage the memory.
Which path you go down depends upon what you want the code to look like, how much you expect it to be reused, how critical performance is etc.
To start with I'd go with the simplest option, which IMO is for the function to return the count and allocate the memory.
That means my function would have to look like this:
int get_some_oids_in_an_array(OID** array, int * count, ... ) {
    int i;
    ...
    *count = number_of_oids;
    *array = (OID*)malloc( sizeof(OID)*number_of_oids);
    if (*array == NULL) {
        return -1; /* allocation failed */
    }
    for(i=0; i<number_of_oids; ++i) {
        (*array)[i]=...; /* parentheses needed: [] binds tighter than unary * */
    }
    ...
    return 0;
}
/* Example of usage */
void use_get_oids() {
    OID* oids;
    int n_oids;
    int i;
    int ok = get_some_oids_in_an_array(&oids, &n_oids, ...);
    if (ok != 0) {
        return; /* nothing to use or free */
    }
    for(i=0; i<n_oids; ++i ) {
        ... use oids[i] ...
    }
    free(oids);
}
Note: I'm returning an array of OID, rather than an array of OID*, either is a valid option, and which will work best for you will vary.
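For a compilable version of the count-plus-allocated-array pattern, here is a minimal sketch. The OID struct below is a made-up 20-byte stand-in rather than the real library type, and the wanted parameter and the placeholder fill are assumptions for illustration only.

```c
#include <stdlib.h>

/* Hypothetical 20-byte id type standing in for the real OID type. */
typedef struct {
    unsigned char id[20];
} OID;

/* On success: returns 0, stores a malloc'd array in *out and its length
   in *count; the caller frees *out. On failure: returns -1 and leaves
   *out == NULL and *count == 0. */
int get_some_oids(OID **out, size_t *count, size_t wanted)
{
    *out = NULL;
    *count = 0;

    if (wanted == 0)
        return 0;           /* empty result, nothing to allocate */

    OID *oids = calloc(wanted, sizeof *oids);
    if (oids == NULL)
        return -1;

    /* Placeholder data: a real implementation would fill in actual ids. */
    for (size_t i = 0; i < wanted; i++)
        oids[i].id[0] = (unsigned char)i;

    *out = oids;
    *count = wanted;
    return 0;
}
```

Setting the outputs to NULL and 0 up front means the caller can always pass *out to free(), whether or not the call succeeded.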
If it turned out I was using this kind of pattern often, then I would consider switching to the struct route.
typedef struct oidarray {
    size_t count;
    OID* oids;
} oidarray;
int get_some_oids( oidarray * oids, ... ) {
    size_t i;
    oidarray_ensure_size(oids, number_of_oids);
    for(i=0; i<number_of_oids; ++i) {
        oidarray_set_value(oids, i, ...);
    }
    return 0;
}
/* Example of usage */
void use_get_oids() {
    oidarray oids = {0};
    size_t i;
    get_some_oids(&oids, ...);
    for(i=0; i<oids.count; ++i) {
        ... use oids.oids[i] ...
    }
    oidarray_release(&oids);
}

Want to reduce a function by looping through structs

Good Morning All,
I'm trying to reduce a function that's very repetitive, but each "repetition" has two structs with struct A.element1 setting struct B.element1. At the moment I have myFunction() with about twelve different reqFunction() calls to set B to A. Basically what I have now is:
void myFunction( structB *B )
{
structA A;
if( reqGetFunction( GLOBAL_IN_1, ( void *)&A, SIZE ) != 0 )
{
A.element3 = -1;
printf( "element3 failed\n" );
}
B->element7 = A.element3; // A is filled in when reqGetFunction() is called
.
.
.
if( reqGetFunction( GLOBAL_IN_12, ( void *)&A, SIZE ) != 0 )
{
A.element14 = -1;
printf( "element14 failed\n" );
}
B->element18 = A.element14;
}
reqGetFunction() can't be changed. I have a static global array for other functions that would loop through GLOBAL_IN, and I could make structA A a static global.
I want to have something like myFunctionSingle() that will do one block, and myFunctionAll() that will have a for loop to cycle through the GLOBAL_IN array as well as the elements of struct's A and B and input them to myFunctionSingle().
So I guess my real question is how could I cycle through the elements of the structs as I can with an array, because everything there (like the structs' setups and reqGetFunction) are set in stone. I've tried a few things and searched around, but am currently stumped. I'm honestly not sure if this is possible or even worth it. Thank you in advance for your input!
Your function calls differ in three things: 1) the GLOBAL_IN_XX value, 2) the A.elementXX that you modify, and 3) the B.elementXX that you modify.
What you need to do is create a struct containing a value for GLOBAL_IN_XX and pointers to A.element and B.element, whatever type they are, for example:
struct call_parms
{
int global_parm;
int* a_ptr;
int* b_ptr;
};
Then, you need to create an array of those and initialize it accordingly, for example:
struct call_parms callParmsArray[MAX_CALLS]= {{GLOBAL_IN_1,&A.element3,&(B->element7)}, ... };
Then, just iterate over the array and call your reqGetFunction with the parameters specified in each array element, something along the lines of:
for(int i = 0; i<MAX_CALLS;i++)
{
reqGetFunction( callParmsArray[i].global_parm, ( void *)&A, SIZE );
}
You may also want to factor the handling of the pointer to B->element into this, as it is also repetitive. This will likely involve creating a wrapper around reqGetFunction() which will also deal with B and such:
struct call_parms
{
int global_parm;
int* a_ptr;
int* b_ptr;
};
bool myReqFn(struct call_parms* parm)
{
    /* res is true when reqGetFunction() reports failure (nonzero return) */
    bool res = ( reqGetFunction( parm->global_parm, ( void *)&A, SIZE ) != 0 );
    if( res )
    {
        *(parm->a_ptr) = -1;
        printf( "element %d failed\n",parm->global_parm );
    }
    *(parm->b_ptr) = *(parm->a_ptr);
    return res;
}
for(int i = 0; i<MAX_CALLS;i++)
{
myReqFn( &callParmsArray[i]);
}
The rest is left as an exercise to the reader, as they say...
One way to cycle through a struct that I know of is to use pointer math. I'm not sure what datatype your struct members have, but if you have a contiguous set of identically typed members numbered from j to k, your code would look something like this:
_datatype_ *a = &(A.elementj);
_datatype_ *b = &(B.elementj);
int i;
for (i = j; i < k; i++)
{
*(b + (i - j)) = *(a + (i - j)); /* pointer arithmetic already scales by the element size */
}
EDIT: This is also, of course, assuming that you want to duplicate each pair of corresponding elements in order, but you can probably tweak it around to get the desired effect.
EDIT: This also assumes the members are laid out back to back in memory, with no padding between them, so be careful.
Does GLOBAL_IN_XXX mean GLOBAL_IN[XXX] etc? And does GLOBAL_IN_XXX always map to A.element(XXX+2)? And is it always B.element(N+1) = A.elementN?
I'm also going to assume that you can't change A.element1, A.element2 into A.element[], otherwise the solution would be fairly simple, wouldn't it?
The most portable solution is to know the offset of each element in A and B (in case there are data alignment gotchas in the structures... this could occur if you don't have N consecutive ELEMENT_TYPEs etc)
#include <stddef.h>
// NOTE: These arrays are clumsy but avoid making assumptions about member alignment
// in structs.
static size_t const A_Offsets[] = {
offsetof(structA, element1),
offsetof(structA, element2),
offsetof(structA, element3),
...
...
offsetof(structA, elementN) };
static size_t const B_Offsets[] = {
offsetof(structB, element1),
offsetof(structB, element2),
offsetof(structB, element3),
...
...
offsetof(structB, elementN) };
void myFunctionSingle( structB *B, unsigned int index )
{
    structA A;
    // A is an object, so take its address; B is already a pointer
    ELEMENT_TYPE *elAPtr = (ELEMENT_TYPE *)((char *)&A + A_Offsets[index + 2]);
    ELEMENT_TYPE *elBPtr = (ELEMENT_TYPE *)((char *)B + B_Offsets[index + 6]);
    if( reqGetFunction( GLOBAL_IN[index], ( void *)&A, SIZE ) != 0 )
    {
        *elAPtr = -1;
        printf( "element%u failed\n", index);
    }
    *elBPtr = *elAPtr; // A is filled in when reqGetFunction() is called
}
void myFunction( structB *B )
{
unsigned int i = 1;
for(; i < MAX_INDEX; ++i)
myFunctionSingle(B, i);
}
EDIT: I'm not sure if the offsetof() stuff is necessary, because if your structure has only ELEMENT_TYPE data in it the members are probably packed tight, but I'm not sure... if they are packed tight, then you don't have any data alignment issues, so you could use the solution presented in Boston Walker's answer.
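To make the offsetof() idea concrete, here is a small self-contained sketch that drives the copies from a table. Everything in it is a stand-in invented for illustration: the three-member structA and structB, the GLOBAL_IN_1 and GLOBAL_IN_2 values, the CopyRule table, and the stub reqGetFunction (which fails for GLOBAL_IN_2). The real definitions are fixed in the question and would replace these, and the sketch passes sizeof A where the question passes SIZE.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real, unchangeable definitions. */
typedef struct { int element1, element2, element3; } structA;
typedef struct { int element1, element2, element3; } structB;

enum { GLOBAL_IN_1 = 1, GLOBAL_IN_2 = 2 };

/* Stub for illustration: fills A, but reports failure for GLOBAL_IN_2. */
int reqGetFunction(int id, void *out, size_t size)
{
    structA *a = out;
    (void)size;
    if (id == GLOBAL_IN_2)
        return 1;
    a->element1 = 11;
    a->element2 = 22;
    a->element3 = 33;
    return 0;
}

/* One row per repeated block: which request, which A member to read,
   which B member to write. */
typedef struct {
    int id;
    size_t a_off;
    size_t b_off;
    const char *name;
} CopyRule;

static const CopyRule rules[] = {
    { GLOBAL_IN_1, offsetof(structA, element3), offsetof(structB, element1), "element3" },
    { GLOBAL_IN_2, offsetof(structA, element2), offsetof(structB, element2), "element2" },
};

void myFunctionAll(structB *B)
{
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        structA A = {0};
        /* Locate the members by byte offset from the start of each struct. */
        int *src = (int *)((char *)&A + rules[i].a_off);
        int *dst = (int *)((char *)B + rules[i].b_off);

        if (reqGetFunction(rules[i].id, &A, sizeof A) != 0) {
            *src = -1;
            printf("%s failed\n", rules[i].name);
        }
        *dst = *src;
    }
}
```

Keeping the A offset, the B offset, and the request id in one row avoids relying on any arithmetic relationship between the element numbers.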
