Most high-level languages (Python, Ruby, even Java) use pass-by-reference. Obviously, we don't have references in C, but we can imitate them using pointers. There are a few benefits to doing this. For example:
int findChar(char ch, char* in)
{
int i = 0;
for(i = 0; in[i] != '\0'; i++)
if(in[i] == ch)
return i;
return -1;
}
This is a common C paradigm: catch an abnormal or erroneous situation by returning some error value (in this case, return -1 if the character is not in the string).
The problem with this is: what if you want to support strings more than 2^31 - 1 characters long? The obvious solution is to return an unsigned int but that won't work with this error value.
The solution is something like this:
unsigned int* findChar(char ch, char* in)
{
unsigned int i = 0;
for(i = 0; in[i] != '\0'; i++)
if(in[i] == ch)
{
unsigned int *index = (unsigned int*) malloc(sizeof(unsigned int));
*index = i;
return index;
}
return NULL;
}
There are some obvious optimizations which I didn't make for simplicity's sake, but you get the idea; return NULL as your error value.
If you do this with all your functions, you should also pass your arguments in as pointers, so that you can pass the results of one function to the arguments of another.
Are there any downsides to this approach (besides memory usage) that I'm missing?
EDIT: I'd like to add (if it isn't completely obvious by my question) that I've got some experience in C++, but I'm pretty much a complete beginner at C.
It is a bad idea because the caller is responsible for freeing the index; otherwise you leak memory. Alternatively, you can use a static int and return its address every time. There will be no leaks, but the function becomes non-reentrant, which is risky (but acceptable if you bear it in mind).
Much better would be to return a pointer to the character the function finds, or NULL if it is not present. That's the way strchr() works, BTW.
Edited to reflect changes in original post.
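The strchr() approach can be sketched as follows. This is a minimal illustration, not part of the original answer: the helper name find_index is hypothetical, and it recovers the index by pointer subtraction only when the caller actually needs one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 and stores the index of ch in *pos,
 * or returns 0 if ch does not occur in s. Nothing is allocated. */
int find_index(const char *s, char ch, size_t *pos)
{
    const char *hit = strchr(s, ch);   /* NULL when ch is absent */
    if (hit == NULL)
        return 0;
    *pos = (size_t)(hit - s);          /* distance from the start of s */
    return 1;
}
```

Because strchr() returns a pointer into the caller's own string, there is nothing to free afterwards.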
Without the malloc, the position can still be a stack variable, and you can use the call in an if statement:
int findChar(char ch, char* in, int* pos)
{
int i = 0;
for(i = 0; in[i] != '\0'; i++)
{
if(in[i] == ch)
{
*pos = i;
return 1;
}
}
return 0;
}
In the specific example, you should use size_t as the return type: this is the data type that adequately represents how large strings can get on any system. I.e. you can't possibly have a string that is longer than a size_t can represent. Then, you can fairly safely use (size_t)-1 as an error indicator: realistically, you also cannot put a string with that size into memory, since you also need some address space for the code you are executing; it becomes a limitation of your API that such long strings would not be supported if they existed.
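A minimal sketch of the size_t variant described above. The names find_char and NOT_FOUND are chosen for this illustration only:

```c
#include <stddef.h>

/* Sentinel meaning "not found"; NOT_FOUND is a name invented for this sketch.
 * (size_t)-1 is the largest value a size_t can hold. */
#define NOT_FOUND ((size_t)-1)

size_t find_char(char ch, const char *in)
{
    size_t i;
    for (i = 0; in[i] != '\0'; i++) {
        if (in[i] == ch)
            return i;          /* index of the first match */
    }
    return NOT_FOUND;          /* reached the terminator without a match */
}
```

A caller would compare the result against NOT_FOUND before using it as an index.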
Your approach not only has the disadvantage of using more memory, but also the disadvantage of being slower: the callee needs to malloc, the caller needs to free. Those are fairly expensive operations.
There is one other standard approach relevant here: errno. In case of an error indicator, you don't know what the error is. So in C, rather than using an out parameter, we typically put the error details into a global or thread-local variable.
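As an illustration of the errno convention, here is a sketch built on the standard strtol(), which reports overflow by setting errno to ERANGE. The wrapper name parse_long and its return codes are assumptions made for this example:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical wrapper: parses text as a base-10 long.
 * Returns 0 on success and -1 on failure. After a failure, errno is
 * ERANGE if the value did not fit, or 0 if no digits were found. */
int parse_long(const char *text, long *out)
{
    char *end;
    errno = 0;                          /* clear any stale error first */
    long value = strtol(text, &end, 10);
    if (errno == ERANGE)                /* strtol reported overflow */
        return -1;
    if (end == text)                    /* nothing was consumed */
        return -1;
    *out = value;
    return 0;
}
```

The return value only says that something went wrong; the caller inspects errno to find out what.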
The function needs to dereference the parameters, which takes more time than accessing the stack.
The pointers can be uninitialized, causing unexpected results.
There is no standard way to specify which pointer is for input, which is for output and which is for both (there are extensions and naming tricks, but it's still an issue).
I am not an expert, but I think a ton of small mallocs can cause problems. First, you have to take care of freeing the memory after the use of the value. Then you also have to deal with the fragmentation of the free memory. Passing as pointer is more suitable for complex structures.
I'd say the most severe downside to your code is that you use one return value to represent both a general failure and the result if successful.
While this is a common practice, it can lead to weird scenarios when requirements change, just like the one you described. An alternative practice would be to separate the return values, i.e. something like this:
int findChar(char ch, char const * const in, unsigned int * const index)
{
if ( in != NULL && index != NULL)
{
unsigned int i;
for(i = 0; in[i]; i++)
{
if(in[i] == ch)
{
*index = i;
return EXIT_SUCCESS;
}
}
}
return EXIT_FAILURE;
}
...where the function return value tells you whether the function was successful or not, separately from the value of 'index'.
Then again, as fortran noted, there is no way to enforce whether the pointers are input values, output values, or both (i.e. modified inside the function).
The biggest downside is that it requires findChar()'s callers to free() the returned memory, or create a memory leak. You've reinvented the strchr() wheel poorly.
I also don't see why you're thinking that returning a pointer to unsigned int is such a big step forward. First, you could just return an unsigned int, if all you're after is the ability to return values up to 2^32 on a 32-bit machine instead of 2^31-1. Second, your stated goal is to avoid a problem with large strings. Well, what if you're on a 64-bit machine, where 'int' and 'unsigned int' remain 32 bits? What you really want here is a long, but returning pointers doesn't actually help here.
ELIDED BOGUS CRITICISM
As an experiment, I created a function to initialize an array that has a built-in length, like in Java:
int *create_arr(int len) {
void *ptr = malloc(sizeof(int[len + 1]));
int *arr = ptr + sizeof(int);
arr[-1] = len;
return arr;
}
which can later be used like this:
int *arr = create_arr(12);
and allows the length to be found at arr[-1]. I was wondering whether this is common practice, and whether there is an error in what I did.
First of all, your code has some bugs, mainly that in standard C you can't do arithmetic on void pointers (as commented by MikeCAT). Probably a more typical way to write it would be:
int *create_arr(int len) {
int *ptr = malloc((len + 1) * sizeof(int));
if (ptr == NULL) {
return NULL; // allocation failed; do not fall through and write to ptr[0]
}
ptr[0] = len;
return ptr + 1;
}
This is legal but no, it's not common. It's more idiomatic to keep track of the length in a separate variable, not as part of the array itself. An exception is functions that try to reproduce the effect of malloc, where the caller will later pass back the pointer to the array but not the size.
One other issue with this approach is that it limits your array length to the maximum value of an int. On, let's say, a 64-bit system with 32-bit ints, you could conceivably want an array whose length did not fit in an int. Normally you'd use size_t for array lengths instead, but that won't work if you need to fit the length in an element of the array itself. (And of course this limitation would be much more severe if you wanted an array of short or char or bool :-) )
Note that, as Andrew Henle comments, the pointer returned by your function could be used for an array of int, but would not be safe to use for other arbitrary types as you have destroyed the alignment promised by malloc. So if you're trying to make a general wrapper or replacement for malloc, this doesn't do it.
Apart from the small mistakes that have already been pointed out in the comments, this is not common, because C programmers are used to handling arrays as an initial pointer and a size. I have mainly seen it in mixed programming environments, for example in Windows COM/DCOM, where C++ programs can exchange data with VB programs.
Your array with a built-in size is close to the WinAPI BSTR: an array of 16-bit wide chars whose length is stored in a prefix just before the first character (a 4-byte integer holding the byte count). So there is nothing really bad with it.
But in the general case, you could have an alignment problem. malloc returns a pointer with a suitable alignment for any type, and you should make sure that index 0 of your returned array also has a suitable alignment. If int does not have the strictest alignment requirement, it could fail for other element types.
Furthermore, as the pointer is not at the beginning of the allocated memory, the array requires a special function for its deallocation. This should probably be documented in a red flashing font, because it would be very unexpected for most C programmers.
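A sketch of what such a matching deallocation function could look like, assuming the length is stored in one int immediately before the first element. The names arr_len and destroy_arr are hypothetical:

```c
#include <stdlib.h>

/* Allocates len ints preceded by one hidden int holding the length. */
int *create_arr(int len)
{
    if (len < 0)
        return NULL;
    int *base = malloc(((size_t)len + 1) * sizeof *base);
    if (base == NULL)
        return NULL;
    base[0] = len;
    return base + 1;        /* the caller sees only the elements */
}

/* Reads the hidden length stored just before the first element. */
int arr_len(const int *arr)
{
    return arr[-1];
}

/* Must be used instead of free(): the allocation starts one int earlier. */
void destroy_arr(int *arr)
{
    if (arr != NULL)
        free(arr - 1);
}
```

Passing the pointer returned by create_arr directly to free() would be undefined behavior, which is why the pair of functions has to be documented together.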
This technique is not as uncommon as people expect. For example, the stb header-only library for image processing uses this method to implement a type-safe, vector-like container in C. See https://github.com/nothings/stb/blob/master/stretchy_buffer.h
It would be more idiomatic to do something like:
struct array {
int *d;
size_t s;
};
struct array *
create_arr(size_t len)
{
struct array *a = malloc(sizeof *a);
if( a ){
a->d = malloc(len * sizeof *a->d);
a->s = a->d ? len : 0;
}
return a;
}
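With the struct layout above, cleanup and bounds checking could be sketched like this. The functions destroy_arr and arr_get are hypothetical additions for illustration, not part of the original answer:

```c
#include <stdlib.h>

struct array {
    int *d;      /* elements */
    size_t s;    /* number of elements */
};

/* Hypothetical counterpart to create_arr(): releases both allocations. */
void destroy_arr(struct array *a)
{
    if (a != NULL) {
        free(a->d);   /* free(NULL) is a no-op, so a failed inner malloc is fine */
        free(a);
    }
}

/* Hypothetical checked accessor: returns 0 on success, -1 if i is out of range. */
int arr_get(const struct array *a, size_t i, int *out)
{
    if (a == NULL || i >= a->s)
        return -1;
    *out = a->d[i];
    return 0;
}
```

Since the length travels with the data in a size_t, it is not limited by the element type.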
I'm trying to understand C-pointers. As background, I'm used to coding in both C# and Python3.
I understand that pointers can be used to save the address of a variable (writing something like type* ptr = &var;) and that incrementing a pointer is equivalent to incrementing the index into an array of objects of that type. But what I don't understand is whether or not you can use pointers and dereferenced objects of the type (e.g. int) without referencing an already-defined variable.
I couldn't think of a way to do this, and most of the examples of C/C++ pointers all seem to use them to reference a variable. So it might be that what I'm asking is either impossible and/or bad coding practice. If so, it would be helpful to understand why.
For example, to clarify my confusion, if there is no way to use pointers without using predefined hard-coded variables, why would you use pointers at all instead of the basic object directly, or arrays of objects?
There is a short piece of code below to describe my question formally.
Many thanks for any advice!
// Learning about pointers and C-coding techniques.
#include <stdio.h>
/* Is there a way to define the int-pointer age WITHOUT the int variable auxAge? */
int main() // no command-line params being passed
{
int auxAge = 12345;
int* age = &auxAge;
// *age is an int, and age is an int* (i.e. age is a pointer-to-an-int, just an address to somewhere in memory where data defining some int is expected)
// do stuff with my *age int e.g. "(*age)++;" or "*age = 37;"
return 0;
}
Yes, you can use dynamic memory (also known as "heap") allocation:
#include <stdlib.h>
int * const integer = malloc(sizeof *integer);
if (integer != NULL)
{
*integer = 4711;
printf("forty seven eleven is %d\n", *integer);
free(integer);
// At this point we can no longer use the pointer, the memory is not ours any more.
}
This asks the C library to allocate some memory from the operating system and return a pointer to it. Allocating sizeof *integer bytes makes the allocation fit an int exactly, and we can then use *integer to dereference the pointer, which works much like using an int variable directly.
There are many good reasons to use pointers in C, and one of them is that C only passes by value; you cannot pass by reference. Passing a pointer to an existing variable therefore saves you the overhead of copying it onto the stack. As an example, let's assume this very large structure:
struct very_large_structure {
uint8_t kilobyte[1024];
};
And now assume a function which needs to use this structure:
bool has_zero(struct very_large_structure structure) {
for (int i = 0; i < sizeof(structure); i++) {
if (0 == structure.kilobyte[i]) {
return true;
}
}
return false;
}
So to call this function, the whole structure has to be copied onto the stack, which can be an unacceptable cost, especially on embedded platforms where C is widely used.
If you pass the structure via a pointer instead, only the pointer itself is copied, typically a 32-bit or 64-bit value:
bool has_zero(struct very_large_structure *structure) {
for (int i = 0; i < sizeof(*structure); i++) {
if (0 == structure->kilobyte[i]) {
return true;
}
}
return false;
}
This is by no means the only or the most important use of pointers, but it clearly shows why pointers are important in C.
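A self-contained variant of the pointer version, as a sketch. The const qualifier is an addition made here: it documents that the function only reads through the pointer, and the loop bound uses the array's own size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct very_large_structure {
    uint8_t kilobyte[1024];
};

/* Takes a pointer to const: no copy of the structure is made,
 * and the callee cannot modify the caller's data through it. */
bool has_zero(const struct very_large_structure *structure)
{
    for (size_t i = 0; i < sizeof structure->kilobyte; i++) {
        if (structure->kilobyte[i] == 0)
            return true;
    }
    return false;
}
```

A caller passes the address of an existing object, for example has_zero(&s).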
But what I don't understand is whether or not you can use pointers and dereferenced objects of the type (e.g. int) without referencing an already-defined variable.
Yes, there are two cases where this is possible.
The first case occurs with dynamic memory allocation. You use the malloc, calloc, or realloc functions to allocate memory from a dynamic memory pool (the "heap"):
int *ptr = malloc( sizeof *ptr ); // allocate enough memory for a single `int` object
*ptr = some_value;
The second case occurs where you have a fixed, well-defined address for an I/O channel or port or something:
char *port = (char *) 0xDEADBEEF;
although this is more common in embedded systems than general applications programming.
EDIT
Regarding the second case, chapter and verse:
6.3.2.3 Pointers
...
5 An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.67)
67) The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to
be consistent with the addressing structure of the execution environment.
Parameters to a function in C are always pass by value, so changing a parameter value in a function isn't reflected in the caller. You can however use pointers to emulate pass by reference. For example:
void clear(int *x)
{
*x = 0;
}
int main()
{
int a = 4;
printf("a=%d\n", a); // prints 4
clear(&a);
printf("a=%d\n", a); // prints 0
return 0;
}
You can also use pointers to point to dynamically allocated memory:
int *getarray(int size)
{
int *array = malloc(size * sizeof *array);
if (!array) {
perror("malloc failed");
exit(1);
}
return array;
}
These are just a few examples.
Most common reason: because you wish to modify the contents without passing them around.
Analogy:
If you want your living room painted, you don't want to place your house on a truck trailer, move it to the painter, let him do the job and then haul it back. It would be expensive and time-consuming. And if your house is too wide to be hauled around on the streets, the truck might crash. You would rather tell the painter which address you live at, and have him go there and do the job.
In C terms, if you have a big struct or similar, you'll want a function to access this struct without making a copy of it, passing the copy to the function, and then copying the modified contents back into the original variable.
// BAD CODE, DONT DO THIS
typedef struct { ... } really_big;
really_big rb;
rb = do_stuff(rb);
...
really_big do_stuff (really_big thing) // pass by value, return by value
{
thing.something = ...;
...
return thing;
}
This makes a copy of rb called thing. It is placed on the stack, wasting lots of memory and needlessly increasing the stack space used, increasing the possibility of stack overflow. And copying the contents from rb to thing takes lots of execution time. Then when it is returned, you make yet another copy, from thing back to rb.
By passing a pointer to the struct, none of the copying takes place, but the end result is the very same:
void do_stuff (really_big* thing)
{
thing->something = ...;
}
I am trying to understand a portion of code. I am leaving out a lot of the code in order to make it simpler to explain, and to avoid unnecessary confusion.
typedef void *UP_T;
void FunctionC(void *pvD, int Offset) {
unsigned long long int temp;
void *pvFD = NULL;
pvFD = pvD + Offset;
temp = (unsigned long long int)*(int *)pvFD;
}
void FunctionB(UP_T s) {
FunctionC(s, 8);
}
void FunctionA() {
char *tempstorage=(char *)malloc(0);
FunctionB(tempstorage);
}
int main () {
FunctionA();
return 0;
}
Like I said, I am leaving out a ton of code, hence the functions that appear useless because they only have two lines of code.
What is temp? That is what is confusing me. When I run something similar to this code, and use printf() statements along the way, I get a random number for pvD, and pvFD is that random number plus eight.
But, I could also be printing the values incorrectly (using %llu instead of %d, or something like that). I am pretty sure it's a pointer to the location in memory of tempstorage plus 8. Is this correct? I just want to be certain before I continue under that assumption.
The standard specifies that malloc(0) returns either NULL or a valid pointer, but that pointer must never be dereferenced. There aren't any constraints on the actual implementation, so you can't rely on the address 8 bytes past the returned pointer being usable.
It's random in the sense that malloc is typically non-deterministic (i.e. gives different results from run to run).
The result of malloc(0) is implementation-defined (but perfectly valid), you just shouldn't ever dereference it. Nor should you attempt to do arithmetic on it (but this is generally true; you shouldn't use arithmetic to create pointers beyond the bounds of the allocated memory). However, calling free on it is still fine.
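Because malloc(0) may legitimately return NULL, a NULL result is ambiguous for zero-sized requests. One common workaround, sketched here with a hypothetical helper named alloc_bytes, is to never ask for zero bytes in the first place:

```c
#include <stdlib.h>

/* Hypothetical helper: allocates n bytes, treating n == 0 as a request
 * for 1 byte so that NULL always means the allocation failed. */
void *alloc_bytes(size_t n)
{
    return malloc(n ? n : 1);
}
```

Either way, whatever malloc returns can be passed to free(), including NULL.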
I am working my way through Cyclone: A Safe Dialect of C for a PL class. The paper's authors explain that they've added a special 'fat' pointer that stores bounds information to prevent buffer overflows. But they don't specify if the check on this pointer is static or dynamic. The example they give seems to imply that the programmer must remember to check the size of the array in order to check that they don't exceed the buffer. This seems to open up the possibility of programming errors, just like in C. I thought the whole idea of Cyclone was to make such errors impossible. Does the language have a check? Does it just make it harder to make programming mistakes?
int strlen(const char ?s) {
int i, n;
if (!s) return 0;
n = s.size; //what if the programmer forgets to do this.. or accidentally adds an n++;
for (i = 0; i < n; i++,s++)
if (!*s) return i;
return n;
}
"Fat" pointers support pointer arithmetic with run-time bounds checking.
Obtained from Wikipedia by googling for “fat pointers”.
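To make "run-time bounds checking" concrete, here is a hand-written emulation in plain C. It is only an illustration of the idea, not Cyclone's implementation; the names fat_ptr and fat_read are invented for this sketch.

```c
#include <stddef.h>

/* A hand-rolled "fat" pointer: the address plus the number of valid elements. */
struct fat_ptr {
    const char *base;
    size_t size;
};

/* Checked read: returns 0 and stores the element, or -1 if i is out of bounds. */
int fat_read(struct fat_ptr p, size_t i, char *out)
{
    if (p.base == NULL || i >= p.size)
        return -1;          /* the run-time bounds check */
    *out = p.base[i];
    return 0;
}
```

In this emulation the programmer has to remember to call the checked accessor; the point of building the check into the language is that every access goes through it automatically.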
Is there a possibility that strcat can ever fail?
If we pass an incorrect buffer or string, it might lead to memory corruption. But apart from that, is it possible for this function to report failure, such as strcat returning NULL even though the destination string passed in is non-NULL? If not, why does strcat have a return type at all?
I have just mentioned strcat as an example. But, this question applies to many string and memory related (like memcpy etc) functions. I just want to know the reasoning behind some of these seemingly "always successful" functions having return types.
Returning a pointer to the target string makes it easy to use the output in this sort of (perhaps not-so-clever) way:
int len = strlen(strcat(firstString, secondString));
Most of them go back to a time when C didn't include 'void', so there was no way to specify that it had no return value. As a result, they specified them to return something, even if it was pretty useless.
The implicit contract of these functions is the following: if you pass in pointers to valid strings, then the functions will perform as advertised. Pass in a NULL pointer, and the function may do anything (usually, it will raise a SIGSEGV). Given that the arguments are valid (i.e., point to strings), the algorithms used cannot fail.
I always ignored the return types (wondering who uses them) until today I saw this in glibc-2.11 (copied exactly from the source file) and everything became much more clear:
wchar_t *
wcsdup (s)
const wchar_t *s;
{
size_t len = (__wcslen (s) + 1) * sizeof (wchar_t);
void *new = malloc (len);
if (new == NULL)
return NULL;
return (wchar_t *) memcpy (new, (void *) s, len);
}
It makes it easier to write less code ("chain" it?) I guess.
Here's a pretty standard implementation of strcat from OpenBSD:
char *
strcat(char *s, const char *append)
{
char *save = s;
for (; *s; ++s);
while ((*s++ = *append++) != '\0');
return(save);
}
As long as the inputs passed to it are valid (i.e. append is properly terminated and s is large enough to concatenate it), this can't really fail - it's a simple memory manipulation. That memory is entirely under the control of the caller.
The return value here could be used to chain concatenations, for example:
strcat(strcat(s, t1), t2);
Although this is hardly efficient...
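The inefficiency comes from each strcat call rescanning the destination from the start to find its end. A sketch of one alternative, assuming the caller tracks the current length and the buffer capacity itself (the helper name append is hypothetical):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends src at dst + *len and updates *len.
 * Returns 0 on success, -1 if the result would not fit in cap bytes. */
int append(char *dst, size_t cap, size_t *len, const char *src)
{
    size_t n = strlen(src);
    if (*len >= cap || n >= cap - *len)
        return -1;                       /* no room for src plus the terminator */
    memcpy(dst + *len, src, n + 1);      /* copies the terminator too */
    *len += n;
    return 0;
}
```

Unlike strcat, this version can fail, so here the return value carries real information.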