I'm making a data structure library. One function I am allowing users to call is:
unsigned int index(struct myDataStructure, void* value);
It searches my data structure and returns the index location of where that value exists in it.
ex.
{ 'A', 'D', 'C' }
char val1 = 'A';
unsigned int location = index(s, &val1); // location = 0
char val2 = 'C';
location = index(s, &val2); // location = 2
If the element does not exist in the list, then I don't know what to return. Here are the options that I've ruled out so far:
Using an Assert or system exit or exception to end the run-time. I don't think there would be much use in that because the user would have to call contains to make sure the element was in the structure before calling index()
Returning UINT_MAX or any other constant in the range 0 to UINT_MAX. Those could be valid index values.
Changing the return type to long, so I can return -1. I don't want to switch data types.
Having the user pass in an unsigned int ** so that I can either point it to NULL if the value does not exist or to a real index value. This is not user-friendly for people reading my API.
The best solution I had was to do:
// Change return type to pointer.
unsigned int* index(struct myDataStructure, void* value)
{
static int val;
if (value exists...)
{
val = correct index value
return &val;
}
else
{
return NULL;
}
}
But I still feel like this solution is very poor.
You're right that returning a pointer to a static variable is bad in many ways. It's not thread-safe and, worse, it invites users to write code like
unsigned int *aIndexLoc = index(data, &a);
unsigned int *bIndexLoc = index(data, &b);
if (aIndexLoc && bIndexLoc)
printf("a's loc is %u; b's loc is %u\n", *aIndexLoc, *bIndexLoc);
and of course get the wrong answer: both pointers refer to the same static variable, so both print b's index.
First... If you want your library to be future-proof, then don't return unsigned for an array index. Return size_t.
Then... There are several idioms for dealing with error returns. The most common is to return an int or enum error code as the function value and deliver the actual result through a pointer argument. By convention, 0 means "okay" and non-zero values are various error codes. Also, if your data structure is more than a few bytes, don't pass a complete copy of it. Pass a pointer:
typedef int ERROR;
ERROR index(size_t *result, struct myDataStructure *myStruct, void *valueToFind);
and thence something like:
size_t loc[1];
struct myDataStructure someData[1];
int aValue[1];
initialize(someData);
get(aValue);
ERROR error = index(loc, someData, aValue);
if (error) {
fprintf(stderr, "Couldn't find the value. Error code: %d\n", error);
return;
}
The 1-element array thing is a trick that lets you code the same way whether an object is allocated on the stack or heap. You can treat the name of the array as a pointer. E.g. someData->fieldName and *loc work just fine.
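As a minimal sketch of the error-code idiom, assuming a plain char array stands in for the data structure (the names find_status and find_char are hypothetical, not part of the asker's library):

```c
#include <stddef.h>

/* Hypothetical error codes; 0 means success by convention. */
enum find_status { FIND_OK = 0, FIND_NOT_FOUND = 1, FIND_BAD_ARG = 2 };

/* Searches len chars for value. On success the index is stored in *result;
   on failure *result is left untouched. */
enum find_status find_char(size_t *result, const char *items, size_t len, char value)
{
    if (result == NULL || items == NULL)
        return FIND_BAD_ARG;
    for (size_t i = 0; i < len; i++) {
        if (items[i] == value) {
            *result = i;
            return FIND_OK;
        }
    }
    return FIND_NOT_FOUND;
}
```

Because the status and the index travel separately, every size_t value remains available as a real index.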
Several possibilities:
On error return (unsigned int) -1 using
unsigned int index(struct myDataStructure, void* value)
On error return -1 using
ssize_t index(struct myDataStructure, void* value)
(ssize_t is POSIX)
Pass in the address of an unsigned int to point to the result and return -1 on error and 0 on success, using
int index(struct myDataStructure, void* value, unsigned int * result)
Using
an assertion I feel is not appropriate here, as it ends your program.
a static buffer is coding style of the last millennium. It makes your library unusable in a multithreaded context.
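For illustration, the first option might look like the sketch below. It assumes size_t instead of unsigned int and a hypothetical INDEX_NOT_FOUND constant. The sentinel could only collide with a real index if the array had more elements than size_t can count, which an in-memory array cannot have.

```c
#include <stddef.h>

/* Hypothetical sentinel: the largest size_t value, never a valid array index. */
#define INDEX_NOT_FOUND ((size_t)-1)

/* Returns the index of the first occurrence of value, or INDEX_NOT_FOUND. */
size_t index_of_char(const char *items, size_t len, char value)
{
    for (size_t i = 0; i < len; i++) {
        if (items[i] == value)
            return i;
    }
    return INDEX_NOT_FOUND;
}
```

Callers compare the result against INDEX_NOT_FOUND before using it as an index.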
You could pass a pointer to a bool that the function sets to true or false depending on whether the value was found:
unsigned int index(struct myDataStructure, void *value, bool *ok);
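A possible shape for this, sketched over a plain char array (index_with_flag is a made-up name, and accepting a NULL flag is an assumption, not something the answer specifies):

```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch: *ok reports whether the returned value is a real index.
   Passing NULL for ok is allowed and simply skips the report. */
unsigned int index_with_flag(const char *items, unsigned int len, char value, bool *ok)
{
    for (unsigned int i = 0; i < len; i++) {
        if (items[i] == value) {
            if (ok != NULL)
                *ok = true;
            return i;
        }
    }
    if (ok != NULL)
        *ok = false;
    return 0; /* meaningless when *ok is false */
}
```

The drawback is that a caller who ignores the flag cannot tell "not found" from "found at index 0".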
I'm passing an array of single-precision floating point values to a function in C. The function has no knowledge of the size of the array and I'd like to keep it that way, primarily because while the underlying array is of course fixed-length I won't always be filling it completely so I'd need to be able to find the end anyway. With a string you use a null-terminator, but with this implementation all possible values are potentially valid. Is the best I can do like a "code word" to mark the end using multiple values in order, something like ASCII 'STOP'? That leaves open the possibility of coincidentally having that code word in the array of valid data...
You'll see array/size pairs being passed around in C a lot, it's really the only way to do this reliably. Even C strings, which are NUL terminated, are often sent with a length parameter to be sure you don't inadvertently walk off the end of the array and into other memory.
This approach also permits you to use substrings, or subsets of the array, instead of being committed to use the whole thing, the problem you're basically trying to solve. Having a terminator is both a blessing and a curse, as anyone who's ever tried to battle a pernicious buffer-overflow bug can attest to.
In your case, the function signature should look like:
void process(float* v, size_t n)
Where v is the array of floating-point values to process and n is how many of them to use. n should be less than or equal to however many valid entries are in the v array.
If you're passing this kind of thing around a lot you may even encapsulate it in a simple struct that defines the data and size. You can then wrap around that some simple allocator/populator tools.
For example:
struct float_array {
float* values;
size_t size;
};
Where you can then define something like:
struct float_array* make_float_array(size_t n);
void free_float_array(struct float_array* f);
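One way these two helpers could be implemented is sketched below. The zero-filled storage and the handling of n == 0 are assumptions made for the example.

```c
#include <stdlib.h>

struct float_array {
    float* values;
    size_t size;
};

/* Allocates the struct plus storage for n floats; returns NULL on failure. */
struct float_array* make_float_array(size_t n)
{
    struct float_array* f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;
    /* calloc gives zero-filled storage; an empty array keeps values == NULL */
    f->values = (n > 0) ? calloc(n, sizeof *f->values) : NULL;
    if (n > 0 && f->values == NULL) {
        free(f);
        return NULL;
    }
    f->size = n;
    return f;
}

/* Releases both the storage and the struct; NULL is accepted. */
void free_float_array(struct float_array* f)
{
    if (f != NULL) {
        free(f->values);
        free(f);
    }
}
```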
You don't need to pass the array maximum length, just the length currently being used for this call along with the pointer.
You can use NAN this way, assuming that's not a valid value for your dataset:
#include <math.h>
float average(float *array)
{
float sum = 0.0; // Declare this as double for better precision
size_t index = 0;
// x == NAN will return false for all x including NAN, so we need
// the function isnan()
while(! isnan(array[index]))
sum += array[index++];
return sum/index;
}
Since you probably want to do this in many functions, I recommend writing a function that calculates the length:
size_t farray_length(float *array)
{
size_t len = 0;
while(! isnan(array[len])) len++;
return len;
}
But the usual way of solving these problems in C is to send the size as a separate parameter.
float average(float *array, size_t size)
{
float sum = 0.0;
for(size_t i=0; i<size; i++)
sum += array[i];
return sum/size;
}
A third way, which can be useful for instance if you're coding a library with objects you don't want the user to mess with directly, is to declare a struct.
struct float_array {
float *array;
size_t size;
};
float average(struct float_array array) {
...
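A complete version of the struct-based variant might look like the following sketch. Returning 0 for an empty array is an assumed convention, chosen only to avoid dividing by zero.

```c
#include <stddef.h>

struct float_array {
    float *array;
    size_t size;
};

/* Averages the elements; returns 0 for an empty array (assumed convention). */
float average(struct float_array a)
{
    double sum = 0.0; /* accumulate in double for better precision */

    if (a.size == 0)
        return 0.0f;
    for (size_t i = 0; i < a.size; i++)
        sum += a.array[i];
    return (float)(sum / (double)a.size);
}
```

Passing the struct by value copies only the pointer and the size, not the elements.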
With a string you use a null-terminator, but with this implementation all possible values are potentially valid.
If all values are valid, a sentinel value cannot be implemented. It's as simple as that (which is why EOF is an int value outside the range of unsigned char, so it cannot collide with any valid character).
The function has no knowledge of the size of the array and I'd like to keep it that way...
Assuming NaN is an invalid value, you could use the isnan() macro to test for a sentinel value.
However, if NaN is a valid value...
I'd need to be able to find the end anyway.
The only option left is to actually pass the array length along with the array.
If you can't add the array length as a separate argument, you could (probably) store the length of the array as the first member - either using a struct (recommended) or using type punning (don't try this at home unless you know what you're doing).
i.e.
typedef struct float_array_s {
unsigned int len;
float f[];
} float_array_s;
static unsigned int float_array_len(float_array_s * arr) { return arr->len; }
static float float_array_index(float_array_s * arr, unsigned int index) { return arr->f[index]; }
There's really no reason to spend computation cycles searching for the end if you can simply pass the length of the valid data along with the array.
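Allocating such a length-prefixed struct could be sketched as follows. The names float_vec and float_vec_from are hypothetical, and the size computation is not checked for overflow.

```c
#include <stdlib.h>
#include <string.h>

struct float_vec {
    size_t len;
    float f[]; /* flexible array member (C99) */
};

/* Allocates header and elements in one block and copies n values from src.
   Returns NULL on allocation failure; release the result with free(). */
struct float_vec *float_vec_from(const float *src, size_t n)
{
    struct float_vec *v = malloc(sizeof *v + n * sizeof v->f[0]);
    if (v == NULL)
        return NULL;
    v->len = n;
    if (n > 0)
        memcpy(v->f, src, n * sizeof v->f[0]);
    return v;
}
```

The length and the data live in a single allocation, so one pointer is enough to pass both.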
Edit (type punning)
I highly recommend avoiding this approach, since type lengths could cause hard to detect bugs. However...
It's possible to store the length of the array in the first float member, by using the same bytes (memory) to store an integer.
Note that this might crash (or worse, fail silently) if unsigned int is larger than float (which it might be, even though they usually have the same size in bytes).
i.e.
#include <math.h>
#include <stdint.h>
#include <stdio.h>
/* Returns the member at `index`. */
static float float_array_index_get(float *arr, unsigned int index) {
return arr[index + 1];
}
/* Sets the member at `index` to `val`. */
static void float_array_index_set(float *arr, unsigned int index, float val) {
arr[index + 1] = val;
}
/* Returns the array's length. */
static unsigned int float_array_length_get(float *arr) {
if (sizeof(unsigned int) > sizeof(float)) {
fprintf(
stderr,
"ERROR: (%s:%d) type size overflow, code won't work on this system\n",
__FILE__, __LINE__);
}
union {
float f;
unsigned int i;
} pn;
pn.f = arr[0];
return pn.i;
}
/* Sets the array's length. */
static void float_array_length_set(float *arr, unsigned int len) {
if (sizeof(unsigned int) > sizeof(float)) {
fprintf(
stderr,
"ERROR: (%s:%d) type size overflow, code won't work on this system\n",
__FILE__, __LINE__);
}
union {
float f;
unsigned int i;
} pn;
pn.i = len;
arr[0] = pn.f;
}
/* Pushes a member to the array, increasing its length. */
static void float_array_index_push(float *arr, float val) {
unsigned int len = float_array_length_get(arr);
float_array_index_set(arr, len, val);
float_array_length_set(arr, len + 1);
}
/* Pops a member from the array...
* ... returning nan if the member was nan or if the array is empty.
*/
static float float_array_index_pop(float *arr) {
unsigned int len = float_array_length_get(arr);
if (!len)
return nan("");
float_array_length_set(arr, len - 1);
return float_array_index_get(arr, len - 1);
}
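As a variation on the length helpers, the size check can be moved to compile time and the bytes copied with memcpy instead of going through a union. This is only a sketch, assuming a C11 compiler; packed_length_get and packed_length_set are made-up names.

```c
#include <string.h>

/* Refuse to compile where the length would not fit in the first float slot. */
_Static_assert(sizeof(unsigned int) <= sizeof(float),
               "unsigned int does not fit in a float slot");

/* Reads the length stored in the bytes of arr[0]. */
unsigned int packed_length_get(const float *arr)
{
    unsigned int len;
    memcpy(&len, &arr[0], sizeof len);
    return len;
}

/* Stores the length in the bytes of arr[0]. */
void packed_length_set(float *arr, unsigned int len)
{
    memcpy(&arr[0], &len, sizeof len);
}
```

With this check the program fails to build on an unsuitable system instead of printing an error at run time and carrying on.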
P.S.
I hope you'll stick to the simple func(float * arr, size_t len) now that you see how much extra code you need just to avoid passing the length of the array.
I am having trouble formulating my objective in words, so I am not sure how to express this.
Say I have two functions with the following signatures:
myBigStruct_t function1()
int function2()
with a definition of myBigStruct_t (that stores a lot of data and is somewhere else) and a union definition that can support the size of both return types:
typedef union myUnion{
myBigStruct_t A;
int B;
} myData_t;
union my2ndUnion{
myData_t data_;
char myArray[sizeof(myData_t)];
} un2;
Can I do the following:
un2.myArray = function1();
...
if( something ){
myExpress = un2.data_.A;
}else{
myOtherExpress = un2.data_.B;
}
...
un2.myArray = function2();
if( something ){
myExpress = un2.data_.A;
}else{
myOtherExpress = un2.data_.B;
}
I know array data is normally passed by reference but most C compilers have a means of passing large data types that at least appear to be by value (regardless of whether or not a secret pointer is used).
I know it's a bit contrived; I am just trying to get my head around unions.
Yes, that is exactly what unions do. I think that answers your question (let me know if it doesn't), but I'll give some more background.
If myBigStruct_t is defined like this:
typedef struct {
char someChars[256];
int someInts[512];
} myBigStruct_t;
Then when you do the assignment from function1(), the whole struct, arrays included, is copied by value. You never see a pointer, whatever mechanism the compiler uses internally.
On the other hand, if myBigStruct_t is defined like this:
typedef struct {
char *someChars; //This gets malloc'd in function1
int *someInts; //This gets malloc'd in function1
} myBigStruct_t;
Then only the two pointers are copied; the data they point to is shared, not copied.
Note that your code as-is won't work because of the type mismatch between the return values of function1 and myArray, and the same for function2. You'll have to assign the specific union members:
un2.data_.A = function1();
un2.data_.B = function2();
I don't think there's any reason not to do it this way.
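The "something" condition in the question has to come from somewhere, because a union does not remember which member was written last. A common approach is a tag stored next to the union. The sketch below assumes a small stand-in for myBigStruct_t; tagged_t and its helper functions are hypothetical names.

```c
/* Hypothetical stand-in for the asker's large struct. */
typedef struct {
    char someChars[16];
    int someInts[4];
} myBigStruct_t;

enum payload_kind { PAYLOAD_BIG, PAYLOAD_INT };

/* The tag records which union member was written last. */
typedef struct {
    enum payload_kind kind;
    union {
        myBigStruct_t A;
        int B;
    } data;
} tagged_t;

tagged_t tagged_from_big(myBigStruct_t value)
{
    tagged_t t;
    t.kind = PAYLOAD_BIG;
    t.data.A = value;
    return t;
}

tagged_t tagged_from_int(int value)
{
    tagged_t t;
    t.kind = PAYLOAD_INT;
    t.data.B = value;
    return t;
}

/* Returns the int member only if it is the active one, else fallback. */
int tagged_int_or(const tagged_t *t, int fallback)
{
    return (t->kind == PAYLOAD_INT) ? t->data.B : fallback;
}
```

A caller would write tagged_from_big(function1()) or tagged_from_int(function2()) and later branch on the kind field.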
Edit (in response to your comment):
Why not just pass the array as a parameter instead of return value? A function prototype would be like: void foo(void* buffer, size_t buffer_len). I assume you fully understand pointers (otherwise function pointers are probably not the right way to solve your problem). A complete program might look like:
#include <stddef.h> //size_t
#include <stdint.h> //uint32_t
//All the functions use this prototype, although it isn't strictly necessary (your loop would just have to be smarter)
typedef void(*GenericFunctionCall)(void* buffer, size_t buffer_len);
//A struct with only a few bytes
typedef struct SmallStruct_s{
char value;
} SmallStruct_t;
//A struct with more bytes
typedef struct BigStruct_s {
char value[1024];
} BigStruct_t;
//Defining this makes it easy to get the maximum size of all the structs
typedef union AllStructs_s {
SmallStruct_t small;
BigStruct_t big;
} AllStructs_t;
//This function takes the buffer, casts it to a SmallStruct_t, and then does something with it (presumably sets param->value to something)
void smallFunction(void* buffer, size_t buffer_len) {
SmallStruct_t * param = (SmallStruct_t*)buffer;
//do something with param
}
//This function does the same with BigStruct_t
void bigFunction(void* buffer, size_t buffer_len) {
BigStruct_t * param = (BigStruct_t*)buffer;
//do something with param
}
int main() {
//This allocates memory for all the values generated by smallFunction and bigFunction.
AllStructs_t param;
//This is your table of function pointers
GenericFunctionCall functions[2];
functions[0] = smallFunction;
functions[1] = bigFunction;
//Loop through the functions and do something with the results
for (uint32_t function_index = 0; function_index < 2; ++function_index) {
functions[function_index]((void*)&param, sizeof(AllStructs_t));
//Do something with param here
}
}
Second edit:
Okay, I now see what you're trying to do. You can't use a union to accept an arbitrary return value; in this respect unions are no different from any other data type (i.e. I can't assign a SmallStruct_t to a BigStruct_t).
Here's why: when the compiler generates code to handle the return value of a function, it is using the caller's memory to store the value. Because of that, you're not allowed to get a pointer to the return value for a function. In compiler-speak: the return value of a function is not an lvalue. The options for solving this are:
Store the return type of each function in the table, and use that to assign to the appropriate variable
Write a wrapper function for each function from the table, and have it convert the return type from return value to pointer parameter. Call the wrapper instead of the original function
Refactor the code completely
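The wrapper option could be sketched like this. The body of function2 is invented for the example, and returning a status code from the wrapper is an assumption; the GenericFunctionCall type shown earlier returns void.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical function from the table that returns its result by value. */
int function2(void)
{
    return 42;
}

/* Wrapper with a buffer-based signature: copies the return value into the
   caller's buffer when it fits. Returns 0 on success, -1 otherwise. */
int function2_wrapper(void *buffer, size_t buffer_len)
{
    int value = function2();

    if (buffer == NULL || buffer_len < sizeof value)
        return -1;
    memcpy(buffer, &value, sizeof value);
    return 0;
}
```

One wrapper per table entry keeps the table uniform while the original functions stay unchanged.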
This is not a 'How can a mixed data type (int, float, char, etc) be stored in an array?' question; please read carefully!
Suppose I have the following void pointer, whose type I don't know until runtime:
void* data;
now I know I can do the following, when I know the type of data(e.g. int):
int typed_data = *(int*)data;
using a switch case statement I could check a variable to determine which cast to perform:
switch(type_id) {
case INT:
int typed_data = *(int*)data;
break;
case FLOAT:
float typed_data = *(float*)data;
break;
// ...
// etc.
}
But this way I will not be able to access typed_data outside the switch block. Consider the function below as an example: it takes two void pointers and, according to the value of type_id, casts s and x to the correct data types, then does other things with the newly defined typed data:
int sequential_search(int n, void* s, void* x, int type_id) {
int location = 0;
switch(type_id) {
case INT:
int *list = s;
int element = *(int*)x;
break;
case FLOAT:
float *list = s;
float element = *(float*)x;
break;
// ...
// etc.
}
while(location < n && list[location] != element) { // <---This will cause a compile error
location++;
if(location > n - 1) {
location = -1;
}
}
return location;
}
In the above function, list and element are not accessible outside the switch block: even if type_id matches one of the case values and they are defined, they go out of scope at the end of the switch block. So when the compiler reaches the while line, it complains that list and element are not defined. But these typed variables are needed by the rest of the function. So how do I solve this? Should I copy and paste the while block into every case? That doesn't look like a very good solution. What if I had longer code that needed these variables in 100 different places?
Sounds like you need generics: the ability to define functions with compile-time type parameters.
Unfortunately, C doesn't natively have generics. Fortunately, you can use macros as pseudo-generics to make the preprocessor automatically generate multiple versions of your code.
Adapted from the linked answer:
// sequential_search.h
/* Provide some helpers that generate a name of the form of sequential_search_T,
unique for each type argument */
#define TOKENPASTE(x, y) x ## y
#define SEQ_SEARCH(T) TOKENPASTE(sequential_search_, T)
/* Provide the generic type definition of your function */
int SEQ_SEARCH(TYPE) (int n, void* s, void* x) {
    TYPE* list = s;
    TYPE element = *(TYPE*)x;
    /* Stop at the first match; -1 means the element was not found */
    for (int location = 0; location < n; location++) {
        if (list[location] == element) {
            return location;
        }
    }
    return -1;
}
Instantiate it once for each type argument you intend to pass:
// sequential_search.c
#define TYPE int
#include "sequential_search.h"
#undef TYPE
#define TYPE float
#include "sequential_search.h"
#undef TYPE
// etc.
Finally, create a (statically resolvable) call spot that will switch on the type id you have (the runtime information) and then immediately dispatch to one of the generic versions:
int sequential_search(int n, void* s, void* x, int type_id) {
    switch(type_id) {
        case INT: return sequential_search_int(n, s, x);
        case FLOAT: return sequential_search_float(n, s, x);
        // etc.
        default: return -1; // unknown type id
    }
}
It's not possible to choose the type of a cast at runtime in C; the target type must be known at compile time.
Your code also doesn't check that the cast is valid. If type_id doesn't match the real type of the data, you will either lose some data or risk a segmentation fault, since data types aren't all the same size.
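A different technique from the macro approach, and the one the standard library uses for qsort and bsearch, is to keep a single search function and pass in the element size plus a comparison callback. The sketch below is an illustration only; generic_search, equal_fn, int_equal and float_equal are made-up names.

```c
#include <stddef.h>

/* Equality callback: returns nonzero when *a equals *b. */
typedef int (*equal_fn)(const void *a, const void *b);

/* Linear search over n elements of elem_size bytes each.
   Returns the index of the first match, or -1 if there is none. */
int generic_search(const void *base, int n, size_t elem_size,
                   const void *key, equal_fn equal)
{
    const unsigned char *bytes = base;

    for (int i = 0; i < n; i++) {
        if (equal(bytes + (size_t)i * elem_size, key))
            return i;
    }
    return -1;
}

int int_equal(const void *a, const void *b)
{
    return *(const int *)a == *(const int *)b;
}

int float_equal(const void *a, const void *b)
{
    return *(const float *)a == *(const float *)b;
}
```

The runtime type id then only has to select the element size and the callback, and the loop is written once.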
Apologies if this is a "basic" question, but I am new to C and can't find the answer. My question is regarding the need to malloc a variable and return its pointer within a function, compared with creating the variable within the function and returning the result.
The first issue that comes to mind is that any variables declared within the scope of the function will be destroyed once the function terminates; so why is it then that the following is valid:
int add(int a, int b)
{
int result;
result = a + b;
return result;
}
But the following is not?
char *concat(char* a, char* b)
{
char result[10];
strcat(result, a);
strcat(result, b);
return result;
}
The warning you get is that you are returning an address to a local variable, but this is what we are also doing in the first function? Does the behaviour differ depending on the type?
For a more real example, I'm very confused regarding which of the following 2 functions I should be using, as they both work perfectly well for my program:
struct Card *card_create(enum Rank rank, enum Suit suit)
{
struct Card *card = malloc(sizeof(struct Card));
if(card == NULL) {
fprintf(stderr, "malloc: %s", strerror(errno));
return NULL;
}
card->rank = rank;
card->suit = suit;
return card;
}
Or:
struct Card card_create(enum Rank rank, enum Suit suit)
{
struct Card card;
card.rank = rank;
card.suit = suit;
return card;
}
Again, sorry if this is a nooby question, but I'd really appreciate an explanation. Thanks!
In your add() function, you return the value that is held (until the function exits) in its local variable result. The storage for that variable is no longer available once the function returns, but the value that was stored there is just a number, not in itself dependent on that storage.
In your concat() function, the expression result evaluates to a pointer to the local storage for an array of 10 char. You can still return the pointer's value, but once the function exits the meaning of that value is no longer defined.
So, no, the behavior of returning a value does not itself differ between those cases, but the usefulness -- indeed the risk -- associated with doing so varies greatly.
Does the behaviour differ depending on the type?
Kind of. When you return something in C, you return the value, not a pointer to the value or anything that depends on any variables still existing. However, there's an unrelated rule that says that in almost all contexts, a value of array type is implicitly converted to a pointer to the array's first element. Thus, the second code snippet is returning a pointer into the array, while the first snippet just returns an int.
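For the concat() case, one common fix is to return heap storage that outlives the call. This is a minimal sketch, assuming the caller is willing to free() the result; concat_alloc is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding a followed by b,
   or NULL if the allocation fails. The caller must free() it. */
char *concat_alloc(const char *a, const char *b)
{
    size_t len_a = strlen(a);
    size_t len_b = strlen(b);
    char *result = malloc(len_a + len_b + 1);

    if (result == NULL)
        return NULL;
    memcpy(result, a, len_a);
    memcpy(result + len_a, b, len_b + 1); /* +1 copies the terminator */
    return result;
}
```

The returned pointer is still just a value, but the storage it refers to stays valid after the function returns.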
Very interesting question.
When a function returns a local variable by value, a copy of that value is handed to the caller, typically in a CPU register (depending on the compiler). After that, the local variable's storage is released.
int add(int a, int b)
{
int result;
result = a + b;
return result; // 1. The value of result is copied out, typically into a CPU register (depends on the compiler)
// 2. The storage for result is released at the end of the function
}
Returning a struct by value works the same way in C. In C++ the process is very similar, except that a copy constructor may be called for the returned object.
struct Card card_create(enum Rank rank, enum Suit suit)
{
struct Card card;
card.rank = rank;
card.suit = suit;
return card; // 1. The value of card is copied out to the caller (where it is placed depends on the compiler)
// 2. The storage for card is released at the end of the function
}
Hope this helps.
In the first code snip, the value of an int variable is being returned. Very OK.
In the second code snip, the address of a local (stack) variable is being returned. Not OK.
And, perhaps the line:
char[10] result;
would be better as:
char result[10];
If the second example declared result as follows:
char result;
then the value in result could be returned:
return(result);
However, the second example defines result as an array of char, and in the return statement the array name is converted to a pointer to the beginning of the array. Hence, the value actually returned in the second example is an address in local stack memory (which becomes invalid when the function returns).
The first example captures the value of result (an integer value) and sends the value back to the caller, (not the address of the value in the function's local scope).
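The value-versus-address distinction also explains why an array wrapped in a struct can be returned safely: the struct is returned by value, so its bytes are copied out. A small sketch, with text10 and text10_from as made-up names and silent truncation as an assumed policy:

```c
#include <string.h>

/* A fixed-size buffer wrapped in a struct so it can be returned by value. */
struct text10 {
    char chars[10];
};

/* Copies at most 9 characters of src and always terminates the result. */
struct text10 text10_from(const char *src)
{
    struct text10 t;
    size_t n = strlen(src);

    if (n > sizeof t.chars - 1)
        n = sizeof t.chars - 1; /* truncate to fit */
    memcpy(t.chars, src, n);
    t.chars[n] = '\0';
    return t; /* the whole struct, array included, is copied to the caller */
}
```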
C is a reasonably new language to me, most of my programming knowledge is based around Java, or web-based languages - so please be gentle if I come across as a complete noob with this question!
I have an array of type unsigned long, with size 100000, declared in main(). When a certain condition of user input is met, a record() function call is made which initiates some hardware to begin audio recording (not really important to the scope of the question).
At the record() function call, as long as a 'ready' flag in a register is initialised, the contents of the register is copied to an array cell - this process iterates until all 100000 cells of the array have been recorded to.
The array from the record() function now needs be returned to a variable in main(). I have tried this by simply returning a variable of type unsigned long from the method call - but I can't seem to make this work. I have also tried this using pointers - but my inexperience with C is showing when I try this.
My code for using pointers is:
int main(void){
...
unsigned long recordingOne[100000];
unsigned long *ptrOne;
ptrOne = &recordingOne;
...
initiateRecording(ptrOne);
}
void initiateRecording(unsigned long *ptr){
unsigned long returnOne[100000];
for(i = 0; i<100001; i++){
returnOne[i] = AD0DR1 //AD0DR1 corresponds to hardware register
}
*ptr = returnOne;
}
For this I get two warnings:
(in function main) warning: assignment from incompatible pointer type [enabled by default]
(in function initiateRecording) warning: assignment makes integer from pointer without a cast [enabled by default]
When I tried this previously without pointers, I tried passing an array as a parameter, and then returning an array. That looked something like this:
int main(void){
...
unsigned long recordingOne[100000];
...
recordingOne = initiateRecording();
}
unsigned long[] initiateRecording(){
unsigned long toReturnOne[100000];
for(i = 0; i<100001; i++){
toReturnOne[i] = AD0DR1 //AD0DR1 corresponds to hardware register
}
return toReturnOne;
}
The compiler wasn't a fan of this either - I'm struggling to declare a return object of type unsigned long that is also an array.
As always, your help is very much appreciated!
Here is the best method:
int main(void)
{
...
unsigned long recordingOne[100000];
unsigned long *ptrOne; // <<<--- Don't need this
ptrOne = &recordingOne; // <<<--- Don't need this
...
initiateRecording(recordingOne); // <<<--- Pass address of array directly
}
void initiateRecording(unsigned long *ptr)
{
for(unsigned long i = 0; i<100000; i++) // <<<--- Use 100000, not 100001: writing ptr[100000] is undefined behavior and may cause a segmentation fault
ptr[i] = AD0DR1; // <<<--- write directly to array
}
The accessible elements in recordingOne are from [0] to [99999]
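The same idea with the length passed alongside the pointer, so the function is not tied to one array size. This is a sketch only: read_sample is a hypothetical stand-in for reading the AD0DR1 register and simply returns increasing numbers here.

```c
#include <stddef.h>

/* Hypothetical stand-in for reading the hardware register. */
unsigned long read_sample(void)
{
    static unsigned long next = 0;
    return next++;
}

/* Fills the caller's buffer; the caller owns the storage, nothing is returned. */
void fill_recording(unsigned long *dest, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dest[i] = read_sample();
}
```

A call such as fill_recording(recordingOne, 100000) then writes exactly the elements [0] to [99999].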
The source of the first compiler warning is the following: when you declare a C array like int array[100];, the expression array is converted, in most contexts, to a pointer to the first element of the array (type int *). The expression &array, however, has type int (*)[100], a pointer to the whole array. It holds the same address, but its type is incompatible with int *, so int *p = &array triggers the warning. What you want is int *p = &array[0]. The second warning comes from *ptr = returnOne;, which tries to store a pointer in a single unsigned long element.
Edit: or int *p = array works just fine too, as pointed out in a comment in another answer.
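A short sketch of how array, &array[0] and &array relate in standard C (the function name is made up for the example):

```c
#include <stddef.h>

/* Returns 1 when array, &array[0] and &array all refer to the same address
   and the pointer-to-array type spans all 100 elements. */
int array_pointers_agree(void)
{
    int array[100] = { 0 };
    int *first = array;          /* array converts to int * */
    int *also_first = &array[0]; /* explicit address of element 0 */
    int (*whole)[100] = &array;  /* pointer to the entire array */

    return (void *)first == (void *)also_first
        && (void *)first == (void *)whole
        && sizeof *whole == 100 * sizeof(int);
}
```

The three pointers hold the same address; only their types differ, which is what the compiler warning is about.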