Split C Array on element - c

Say I have an array T*array and a predicate p. I want to split the array into sub-arrays T**subs at every element matching p.
So something like:
typedef bool (*P) (T element);
T**subs(T*array,P p){....}
What could the code for subs() look like?
Note that the code is just pseudocode; you can use variables like array_length and so on in your example, because I just want to get the idea of how to implement subs().

Of more importance than "what would the code look like" is the question "what data structure do you want/need to use?"
For example, if you need to change the sub-arrays without changing the original values, you need to copy the array elements into new arrays. If you do not change the values of the sub-arrays, you can just return an array of pointers or indices into the original array. Alternatively, the collection of pointers could be a list instead of an array.
Once you have decided on a data structure that matches your requirements, you can develop the algorithm. But if your algorithm turns out to be cumbersome or slow, you might need to adapt your data structure to allow faster processing.
So you see, your question needs a lot of "design" and decisions from you, based on your requirements.

Assuming there will be no gaps between the sub-arrays, you can return a pointer to a dynamically created array T ** result of size N for M = N-2 detected elements.
This array needs to be NULL-terminated to indicate its size; that is, result[N-1] needs to be NULL.
Each element of result points into the source array, indicating the start (the first element) of a sub-array.
result[N-2] points just beyond the last element of the source array.
The size of sub-array i (for i = 0 ... M-1) can then be derived as result[i+1] - result[i].
No copying and no additional array to indicate the sub-arrays' sizes is needed. Only the source array's size needs to be passed to subs().
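A minimal sketch of this boundary-pointer idea might look like the following. It assumes T is int, takes the length as an explicit array_length parameter, and uses a hypothetical example predicate is_zero; none of these come from the question. Unlike the layout described above, it always records the start of the array as the first boundary, so it allocates one slot more than N, and a match on the very first element yields an empty first sub-array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef int T;                 /* assumption: T is int for this sketch */
typedef bool (*P)(T element);

/* Hypothetical example predicate: split on zeros. */
bool is_zero(T element)
{
    return element == 0;
}

/* Returns a malloc'd, NULL-terminated array of boundary pointers into
   `array`: result[0] is the start, each following entry points at a
   matching element, and the last non-NULL entry points one past the end.
   The caller frees the result; the source array is not copied. */
T **subs(T *array, size_t array_length, P p)
{
    assert(array != NULL && p != NULL);

    /* First pass: count matches so the result can be sized exactly. */
    size_t matches = 0;
    for (size_t i = 0; i < array_length; ++i)
        if (p(array[i]))
            ++matches;

    /* start + one per match + end boundary + NULL terminator */
    T **result = malloc((matches + 3) * sizeof *result);
    if (result == NULL)
        return NULL;

    /* Second pass: record the boundaries. */
    size_t n = 0;
    result[n++] = array;
    for (size_t i = 0; i < array_length; ++i)
        if (p(array[i]))
            result[n++] = array + i;
    result[n++] = array + array_length;
    result[n] = NULL;
    return result;
}
```

A caller can walk the result while result[i+1] is not NULL and compute each sub-array's length as result[i+1] - result[i].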

We call that a callback function, not predicate.
typedef bool (*P) (T element);
// array_length and max_groups are assumed to be known, as in the question's pseudocode
T * * subs(T * array, P callback) {
    T * * retval = malloc(sizeof(T*) * max_groups); // either count before, or realloc as needed
    size_t group = 0;
    retval[group] = array; // the first group starts at the beginning of the array
    for (size_t i = 0; i < array_length; ++i) {
        if (callback(array[i])) {
            retval[++group] = array + i; // a matching element starts a new group
        }
    }
    return retval;
}
This reuses the memory of the argument array and doesn't return any information about the lengths of the groups, but since you only wanted a general idea of how to solve this, I think this should be enough of a starting point for you to get exactly what you want.

Related

Loop through a void**

I would like to update my insertion sort so that it calculates the size inside the function instead of taking it as a parameter. How can I loop over a void**? Or are there specific methods to calculate the length of the array?
void insertion_sort(int size, void** array, CompFunction compare){
int i,j;
void* key;
for(i = 1; i<size;i++){
key = array[i];
for(j = i-1; j>=0 && compare(array[j],key)>=0;j--){
swap(&array[j+1],&array[j]);
}
array[j+1] = key;
}
}
Your function is not receiving an array. It is receiving a pointer (to a void *). Supposing that the received pointer is valid, the object to which it points can be considered to be the first element of an array (even if it's declared by the caller as a scalar), but the pointer does not convey the length of that array. You must communicate that separately. The most common techniques for that are:
Passing the length as a separate parameter, as you demonstrate, or
Including an end-of-data sentinel value as the last in the array, as the standard library's string functions expect you to do.
Unless the length is somehow encoded in the pointed-to array itself (e.g. via a sentinel), the called function cannot determine the array size from a pointer alone. Think about that for a moment: a pointer is just an address, so it has nowhere to carry a length.
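As a rough illustration of the sentinel technique, the sketch below counts the elements of a void** that the caller has terminated with NULL. The function name count_until_null is made up for this example, and the approach only works under the assumption that NULL never appears as real data.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the elements stored before the NULL sentinel.
   Assumes the caller always terminates the array with NULL and
   never stores NULL as a real element. */
size_t count_until_null(void **array)
{
    assert(array != NULL);
    size_t n = 0;
    while (array[n] != NULL)
        ++n;
    return n;
}
```

The sort could then call count_until_null(array) once at the top instead of receiving size, at the cost of one extra pass over the array.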

Dynamic Array Allocation confusion

I am to read in several values from the user and store those in an array. Then I need to create an array which is big enough to store all those values. Using some functions I wrote I sort/lsearch/bsearch through the array for given values.
I already have my program written and everything, but for a static array implementation. I am sort of getting confused on where to actually use the dynamic array.
It makes sense to use it when the user starts entering values, since I can't assume how many values he enters, so the array needs to be big enough to hold it. It also makes sense (Sort of) to use it when I am creating a big enough array that can hold all the value (Acts as a copy of the first array).
I'm not asking for any code, everything is done but on a static approach. I am just trying to visualize where I would need to use darrays here. My thoughts are:
When the user first enters the values
When I copy arr1 into a new arr2 that needs to be big enough to hold all of arr1's values.
Am I right or wrong on this?
Start by using malloc or calloc to allocate an array of some known starting size, and keep track of the current capacity in a variable.
As you're reading values in, if your array isn't big enough, then use realloc to double the size of the array.
Avoid copying the entire array each time the user inputs a value: the demands on malloc and free will be heavy, and get worse with larger arrays.
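A possible sketch of that grow-by-doubling step is shown below. The helper name append_int, the starting capacity of 4, and the choice to report failure through a bool are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Appends `value`, doubling the buffer with realloc when it is full.
   Returns false and leaves the buffer untouched if allocation fails.
   Start with *data == NULL, *used == 0 and *capacity == 0. */
bool append_int(int **data, size_t *used, size_t *capacity, int value)
{
    assert(data != NULL && used != NULL && capacity != NULL);

    if (*used == *capacity) {
        size_t new_capacity = (*capacity == 0) ? 4 : *capacity * 2;
        /* Use a temporary so the old block is not lost if realloc fails. */
        int *grown = realloc(*data, new_capacity * sizeof **data);
        if (grown == NULL)
            return false;
        *data = grown;
        *capacity = new_capacity;
    }
    (*data)[(*used)++] = value;
    return true;
}
```

Because the capacity doubles, reading n values triggers only about log2(n) reallocations instead of one per value.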
You need to compute the size of your array from the number of elements given as input:
int* array = newArray(10);
int* newArray(int size) {
return malloc(size * sizeof(int));
}
Keep in mind that an int* can be indexed like an array, so you can still do array[3]. But, if you centralize the storage of the number of used elements and the current size, you can allocate a few elements and only grow when the available elements are exhausted.
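struct DynamicIntArray {
    int used;      // number of elements currently stored
    int size;      // number of elements the storage can hold
    int* storage;
};
void add(struct DynamicIntArray* array, int value) {
    if (array->used == array->size) {
        // storage is full: allocate a bigger block and copy the old elements over
        // (error handling for a failed malloc is omitted here)
        int newSize = array->size + 10;
        int* newStorage = malloc(newSize * sizeof(int));
        int* oldStorage = array->storage;
        for (int i = 0; i < array->used; i++) {
            newStorage[i] = oldStorage[i];
        }
        array->storage = newStorage;
        array->size = newSize;
        free(oldStorage);
    }
    // there is now room for at least one more element
    array->storage[array->used] = value;
    array->used++;
}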
With such an example, you should be able to write the newDynamicIntArray(...) function, the freeDynamicIntArray(struct DynamicIntArray* array) function, and any other functions you care about.
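One way those two functions could be written is sketched here. The parameter initialSize and the decision to heap-allocate the struct itself are assumptions; the answer only names the functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct DynamicIntArray {
    int used;      /* number of elements currently stored */
    int size;      /* number of elements the storage can hold */
    int *storage;
};

/* Hypothetical constructor: allocates the struct plus its initial storage.
   Returns NULL if either allocation fails. */
struct DynamicIntArray *newDynamicIntArray(int initialSize)
{
    assert(initialSize > 0);
    struct DynamicIntArray *array = malloc(sizeof *array);
    if (array == NULL)
        return NULL;
    array->storage = malloc((size_t)initialSize * sizeof *array->storage);
    if (array->storage == NULL) {
        free(array);
        return NULL;
    }
    array->used = 0;
    array->size = initialSize;
    return array;
}

/* Releases the storage first, then the struct itself. NULL is accepted. */
void freeDynamicIntArray(struct DynamicIntArray *array)
{
    if (array == NULL)
        return;
    free(array->storage);
    free(array);
}
```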
I think you ask the wrong question.
The question is:
Is a dynamic array (a contiguous block of memory) the proper data structure to hold and process the data in your application?
There is only one especially useful application for arrays, and that is as an associative array, which means that the array index itself has a meaning and can be used to retrieve the contents you are searching for in O(1).
For example, a list of track runners could be stored in an array, where the array index equals the track number. This is the perfect data structure if you want to display the names of the runners per track. It's a terrible data structure if you want to sort the names of all runners alphabetically.
But according to your application description, the array index has no meaning for you. This is an indication that an array is not the best choice.
If you are not sure how many entries will be inserted at runtime, I suggest you use a linked list. It avoids reserving memory for entries that are never used.

Generics and casting by size rather than by type

I've written up a large collection of abstract data types (ie: hash tables, heaps, etc) and simple algorithms (ie: search, sort, list manipulation, etc) that all work on arrays of int. I've modified some of my algorithms to be like what I have below so I can generalize the code without re-writing the same algorithm for each and every data type I want to sort/compare:
void* bubbleSort(void* inputArray, void (*functionPtr)(void*,void*), size_t dataTypeSize, size_t numElements)
The idea is that I want to be able to sort an array of any arbitrary data type (ie: custom structs), and to accommodate this, I cast the input array as a pointer-to-void, and the sorting algorithm requires a function pointer to a specific comparison function so it knows how to compare any two elements of the same data/struct type. So far so good.
The one thing I can't figure out is how to properly cast the array within the function so I can access a single element at a time. I'm trying to accomplish something like:
someRandomDataType* tempArray = (someRandomDataType*)inputArray;
However, I can't find any means of doing this at run time without the use of macros (which I'd like to avoid in this case if possible). It seems in my case, all I really need to be able to do is cast inputArray so that it is seen as some array with elements of an arbitrary size. Is there some way to cast a pointer-to-array so that, dynamically, it equates to:
(typeThatIsDataTypeSize*) tempArray = (typeThatIsDataTypeSize*)inputArray;
Where "dataTypeSize" refers to the size_t input value passed to the sorting function? Thanks!
Your bubbleSort function already has all the information it needs: the size of one element (dataTypeSize, called size below) and the number of elements (numElements).
inputArray points to the very first element. When you want to move to the next element, you simply advance the pointer by the size of one element.
void* second_element = ( char* )inputArray + size ;
size_t n = 123 ;
void* nth_element = ( char* )inputArray + size * n ;
This is how you index your array.
Moving an element is done with memcpy, passing size as the last parameter. A swap is done by declaring temporary memory of size bytes and using memcpy to move the data around.
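Putting the indexing and the memcpy-based swap together, a size-driven bubble sort could be sketched as follows. This version assumes a comparison function that returns an int like a qsort comparator (the question's prototype returns void), sorts in place, and swaps through a small fixed buffer in chunks so that no variable-sized temporary is needed. swap_elements and compare_ints are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Swaps two non-overlapping elements of `size` bytes using memcpy,
   going through a small fixed buffer in chunks. */
static void swap_elements(void *a, void *b, size_t size)
{
    unsigned char tmp[64];
    unsigned char *pa = a, *pb = b;
    while (size > 0) {
        size_t n = size < sizeof tmp ? size : sizeof tmp;
        memcpy(tmp, pa, n);
        memcpy(pa, pb, n);
        memcpy(pb, tmp, n);
        pa += n;
        pb += n;
        size -= n;
    }
}

/* Example comparator for int elements: negative, zero or positive. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorts the array in place; elements are addressed as base + i * size. */
void bubbleSort(void *inputArray, int (*compare)(const void *, const void *),
                size_t dataTypeSize, size_t numElements)
{
    assert(inputArray != NULL && compare != NULL && dataTypeSize > 0);
    unsigned char *base = inputArray;
    for (size_t pass = 1; pass < numElements; ++pass) {
        for (size_t i = 0; i + pass < numElements; ++i) {
            void *left = base + i * dataTypeSize;
            void *right = base + (i + 1) * dataTypeSize;
            if (compare(left, right) > 0)
                swap_elements(left, right, dataTypeSize);
        }
    }
}
```

The cast to unsigned char* is what makes the pointer arithmetic count in bytes, which is exactly the "cast by size" the question asks for.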

How we can insert array elements when array size is already fixed in C?

Whenever I read about the differences between linked lists and arrays, I see on a lot of sites that inserting an element into an array is very costly because a lot of data has to be moved. But one thing I never understood is how we can create space for one more element while inserting, as the size of the array (or the number of elements in the array) is fixed at compile time. Can anyone please let me know how we can insert an element into a fixed-size array? And is there any concept called a dynamic array in C?
There is, indeed, the concept of a dynamic array. You just need a pointer and to reserve memory of the size you want with malloc. You need also to keep track of the number of elements you have.
int* my_array = malloc(10 * sizeof(int));
int n_used_elements = 0; // Need to keep track of the used elements and the size
int my_array_size = 10; // reserved size
However, when you exceed the number of elements in your array, you need to reserve the whole thing again and copy it again to the new reserved memory, which is also costly.
Usually, when using arrays for dynamically increasing and shrinking amounts of data, one of the most typical approaches goes with the following idea: when you exceed the size of your array, you double the size (i.e. you do not just add one more slot, but reserve extra elements in anticipation of having to grow again), copy the elements of the old, smaller array, and keep going. Whenever you exceed the size, you double it. On the other hand, to avoid wasting memory, if fewer than a certain number of elements are occupied, you sometimes halve the size of the array.
Inserting a new element in an array is very costly because you have to shift all the elements after the inserted index one position to the right. The bigger the array, the bigger the cost (i.e. it is proportional to the size of the array). And you always need to consider the possibility of exceeding the size of the array.
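The shifting cost is easy to see in code. The sketch below inserts into an int array that still has spare capacity; the name insert_at and its signature are made up for this example, and growing a full array is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Inserts `value` at position `index` in an array that currently holds
   `*count` elements and has room for `capacity`.
   Returns false when the array is already full. */
bool insert_at(int *array, size_t *count, size_t capacity,
               size_t index, int value)
{
    assert(array != NULL && count != NULL && index <= *count);
    if (*count >= capacity)
        return false;   /* the caller must grow the array first */

    /* Shift the tail one slot to the right; memmove handles the overlap.
       This moves (*count - index) elements, hence the linear cost. */
    memmove(&array[index + 1], &array[index],
            (*count - index) * sizeof array[0]);
    array[index] = value;
    ++*count;
    return true;
}
```

Inserting at the end moves nothing, while inserting at index 0 moves every element.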
In C, there is no "native" concept of a dynamic array. You can create fixed length arrays via declaration:
int myArray[10];
Or dynamically via malloc/calloc:
int* myArray = calloc(10, sizeof(int)); // or: malloc(10 * sizeof(int))
The reason that "inserting" into a fixed array is so costly, is because you need to:
Create a new, bigger array.
Copy the old data into the new array.
Insert the new element into the appropriate spot in the new array.
Your options are to create your own storage mechanism (e.g. stack, queue, linked list), or to use an existing implementation of one.
If you have an array like int a[10]; (and you use all 10 elements) it is not possible to resize it to fit another element.
For a dynamic size you have to use a pointer int* a;, allocate memory yourself with a = malloc(10*sizeof(int)); and take care of moving elements around when you insert in the middle.
There's no built-in dynamic array in C. If you need a dynamic array, you can't escape pointers.
typedef struct {
int *array;
size_t used;
size_t size;
} Array;
void insertArray(Array *a, int element) {
if (a->used == a->size) {
a->size *= 2; // double the size when exceeding the size of the array
a->array = (int *)realloc(a->array, a->size * sizeof(int));
}
a->array[a->used++] = element;
}
Check out this post for more details and examples.

How to check if "set" in c

If I allocate a C array like this:
int array[ 5 ];
Then, set only one object:
array[ 0 ] = 7;
How can I check whether all the other keys ( array[1], array[2], …) are storing a value? (In this case, of course, they aren't.)
Is there a function like PHP's isset()?
if ( isset(array[ 1 ]) ) ...
There is nothing like this in C. A static array's content is always "set". You could, however, fill in some special value to pretend it is uninitialized, e.g.
// make sure this value isn't really used.
#define UNINITIALIZED 0xcdcdcdcd
int array[5] = {UNINITIALIZED, UNINITIALIZED, UNINITIALIZED, UNINITIALIZED, UNINITIALIZED};
array[0] = 7;
if (array[1] != UNINITIALIZED) {
...
You can't.
Their values are all undefined (thus effectively random).
You could explicitly zero out all values to start with so you at least have a good starting point. But using magic numbers to detect if an object has been initialized is considered bad practice (but initializing variables is considered good practice).
int array[ 5 ] = {0}; // an empty {} is not accepted by C compilers before C23
But if you want to explicitly check if they have been explicitly set (without using magic numbers) since creation you need to store that information in another structure.
int array[ 5 ] = {0}; // Init all to 0
int setFlags[ 5 ] = {0}; // Init all to 0 (false)
int getVal(int index) {return array[index];}
int isSet(int index) {return setFlags[index];}
void setVal(int index,int val) {array[index] = val; setFlags[index] = 1; }
In C, all the elements will have values (garbage) at the time of allocation. So you cannot really have a function like the one you are asking for.
However, you can fill the array with some standard value by default, like 0 (using memset()) or INT_MIN (using a loop, since memset() sets individual bytes), and then write your own isset() function.
I don't know php, but one of two things is going on here:
the php array is actually a hash-map (awk does that)
the php array is being filled with nullable types
In either case there is a meaningful concept of "not set" for the values of the array. On the other hand, a C array of a built-in type has some value in every cell at all times. If the array is uninitialized and is automatic or was allocated on the heap, those values may be random, but they exist.
To get the php behavior:
Implement (or find a library with) and use a hashmap instead of an array.
Make it an array of structures which include an isNull field.
Initialize the array to some sentinel value in all cells.
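The second option could be sketched like this. The struct name NullableInt and the helper functions are hypothetical; only the idea of an isNull field comes from the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "nullable int": value is only meaningful while
   isNull is false. */
struct NullableInt {
    bool isNull;
    int value;
};

/* Marks every cell as not set. */
void nullable_clear(struct NullableInt *cells, size_t count)
{
    assert(cells != NULL);
    for (size_t i = 0; i < count; ++i) {
        cells[i].isNull = true;
        cells[i].value = 0;
    }
}

/* Stores a value and marks the cell as set. */
void nullable_set(struct NullableInt *cell, int value)
{
    assert(cell != NULL);
    cell->isNull = false;
    cell->value = value;
}

/* Rough equivalent of isset() for one cell. */
bool nullable_isset(const struct NullableInt *cell)
{
    assert(cell != NULL);
    return !cell->isNull;
}
```

Compared with a sentinel value, this keeps the full range of int available for real data, at the cost of extra memory per cell.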
One solution perhaps is to use a separate array of flags. When you assign one of the elements, set the flag in the boolean array.
You can also use pointers. You can use null pointers to represent data which has not been assigned yet. I made an example below:
int * p_array[3] = {NULL,NULL,NULL};
p_array[0] = malloc(sizeof(int));
*p_array[0] = (int)0;
p_array[2] = malloc(sizeof(int));
*p_array[2] = (int)4;
for (int x = 0; x < 3; x++) {
if (p_array[x] != NULL) {
printf("Element at %i is assigned and the value is %i\n",x,*p_array[x]);
}else{
printf("Element at %i is not assigned.\n",x);
}
}
You could make a function which allocates the memory and sets the data and another function which works like the isset function in PHP by testing for NULL for you.
I hope that helps you.
Edit: Make sure the memory is deallocated once you have finished. Another function could be used to deallocate certain elements or the entire array.
I've used NULL pointers before to signify data has not been created yet or needs to be recreated.
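The helper functions mentioned above might look roughly like this. The names slot_set, slot_isset and slot_unset are invented for the sketch, and it assumes every slot starts out as NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stores `value` in *slot, allocating the element on first use.
   Returns false if the allocation fails. */
bool slot_set(int **slot, int value)
{
    assert(slot != NULL);
    if (*slot == NULL) {
        *slot = malloc(sizeof **slot);
        if (*slot == NULL)
            return false;
    }
    **slot = value;
    return true;
}

/* An element counts as "set" when its pointer is not NULL. */
bool slot_isset(const int *element)
{
    return element != NULL;
}

/* Frees the element and marks the slot as unassigned again. */
void slot_unset(int **slot)
{
    assert(slot != NULL);
    free(*slot);   /* free(NULL) is a no-op */
    *slot = NULL;
}
```

With these, the loop above becomes a call to slot_isset(p_array[x]), and cleanup is a loop calling slot_unset(&p_array[x]) on every index.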
An approach I like is to make 2 arrays, one a bit-array flagging which indices of the array are set, and the other containing the actual values. Even in cases where you don't need to know whether an item in the array is "set" or not, it can be a useful optimization. Zeroing a 1-bit-per-element bit array is a lot faster than initializing an 8-byte-per-element array of size_t, especially if the array will remain sparse (mostly unfilled) for its entire lifetime.
One practical example where I used this trick is in a substring search function, using a Boyer-Moore-style bad-character skip table. The table requires 256 entries of type size_t, but only the ones corresponding to characters which actually appear in the needle string need to be filled. A 1kb (or 2kb on 64-bit) memset would dominate cpu usage in the case of very short searches, leading other implementations to throw around heuristics for whether or not to use the table. But instead, I let the skip table go uninitialized, and used a 256-bit bit array (only 32 bytes to feed to memset) to flag which entries are in use.
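A stripped-down sketch of that bit-array trick is shown below. It is not the actual search code: the struct SparseTable and its helpers are hypothetical, and it only demonstrates clearing the flags while leaving the value table uninitialized.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 256   /* one entry per 8-bit character value */

struct SparseTable {
    /* one flag bit per entry */
    unsigned char in_use[(TABLE_SIZE + CHAR_BIT - 1) / CHAR_BIT];
    /* deliberately never cleared; read only where the flag is set */
    size_t value[TABLE_SIZE];
};

/* Only the small bit array is zeroed; `value` stays uninitialized. */
void table_reset(struct SparseTable *t)
{
    assert(t != NULL);
    memset(t->in_use, 0, sizeof t->in_use);
}

/* Tests the flag bit that belongs to `key`. */
bool table_isset(const struct SparseTable *t, unsigned char key)
{
    assert(t != NULL);
    return (t->in_use[key / CHAR_BIT] >> (key % CHAR_BIT)) & 1u;
}

/* Stores a value and sets the matching flag bit. */
void table_set(struct SparseTable *t, unsigned char key, size_t v)
{
    assert(t != NULL);
    t->in_use[key / CHAR_BIT] |= (unsigned char)(1u << (key % CHAR_BIT));
    t->value[key] = v;
}
```

A lookup first checks table_isset() and falls back to a default when the bit is clear, so an uninitialized value is never read.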

Resources